
TensorDiffEq: Scalable Multi-GPU Forward and Inverse Solvers for Physics Informed Neural Networks

Levi D. McClenny, [email protected]
Department of Electrical Engineering, Texas A&M University, College Station, TX 77840, USA

Mulugeta A. Haile, [email protected]
US Army CCDC Army Research Lab, Aberdeen Proving Ground, Aberdeen, MD, USA

Ulisses M. Braga-Neto, [email protected]
Department of Electrical Engineering, Texas A&M University, College Station, TX 77840, USA
Abstract

Physics-Informed Neural Networks promise to revolutionize science and engineering practice by introducing domain-aware deep machine learning models into scientific computation. Several software suites have emerged to make the implementation and usage of these architectures available to the research and industry communities. Here we introduce TensorDiffEq, built on Tensorflow 2.x, which presents an intuitive Keras-like interface for problem domain definition, model definition, and solution of forward and inverse problems using physics-aware deep learning methods. TensorDiffEq takes full advantage of Tensorflow 2.x infrastructure for deployment on multiple GPUs, allowing the implementation of large, high-dimensional, and complex models. Simultaneously, TensorDiffEq supports the Keras API for custom neural network architecture definitions. In the case of smaller or simpler models, the package allows for rapid deployment on smaller-scale CPU platforms with negligible changes to the implementation scripts. We demonstrate the basic usage and capabilities of TensorDiffEq in solving forward, inverse, and data assimilation problems of varying sizes and levels of complexity. The source code is available at https://github.com/tensordiffeq, and package documentation is available at https://docs.tensordiffeq.io.

Keywords: Scientific Machine Learning, PINNs, Neural PDEs, Scientific Computation, Numerical Methods, Neural Networks, Physics-Informed Deep Learning

1 Introduction

As part of the burgeoning field of scientific machine learning (Baker et al., 2019), physics-informed neural networks (PINNs) have emerged recently as an alternative to traditional partial differential equation (PDE) solvers (Raissi et al., 2019; Raissi, 2018; Wight and Zhao, 2020; Wang et al., 2020b), and have given rise to the larger field of study in neural network approximation of PDE systems, generally referred to as Neural PDEs. Typical black-box deep learning methodologies do not take into account the underlying physics of the problem domain. The Neural PDE approach is based on constraining the output of a deep neural network to satisfy a physical model specified by a PDE. PINNs typically perform this task via PDE-constrained regularization, using a residual function obtained by passing the solution network approximation u through the physics of the PDE model, with the applicable derivatives of u calculated via reverse-mode automatic differentiation in a modern deep learning framework such as Tensorflow (Abadi et al., 2016).

The potential of using neural networks as universal function approximators to solve PDEs has been recognized since the 1990s (Dissanayake and Phan-Thien, 1994; Lagaris et al., 1998). However, Physics-Informed Neural Networks promise to take this approach to a different level through deep neural networks, an exploration now possible due to the vast advances in computational capabilities and training algorithms since that time (Abadi et al., 2016; Revels et al., 2016) and modern automatic differentiation software (Baydin et al., 2017; Paszke et al., 2017).

A great advantage of the PINN architecture over traditional time-stepping PDE solvers is that the entire spatial-temporal domain can be solved at once using collocation points distributed quasi-randomly (rather than on a grid) across the spatial-temporal domain, in a process that can be massively parallelized on GPUs. As GPU capabilities continue to increase, methods that rely on this parallelism in training iterations could emerge as a predominant approach in scientific computing. To this end, while other software suites exist to define and solve PINNs (Rackauckas and Nie, 2017; Lu et al., 2021; Hennigh et al., 2020; Haghighat and Juanes, 2021), many of those platforms are either restricted to single-GPU implementation or are not fully open-source. Additionally, with full support for and customization of the Keras neural network ecosystem built into the package, researchers and practitioners can define and train their own custom neural network architectures to approximate the solution of their problem domains. TensorDiffEq provides these scalable, modular, and customizable multi-GPU architectures and solvers in a fully open-source platform, tapping into the collective intelligence of the field to improve the implementation of the software and provide input on the direction, structure, and feature coverage of the framework.

2 Mathematical Underpinnings of PINNs

Consider a general nonlinear PDE of the form:

\mathcal{N}_{\boldsymbol{x},t}[u(\boldsymbol{x},t)]=0\,,\quad \boldsymbol{x}\in\Omega\,,\ t\in[0,T]\,, (1)
u(\boldsymbol{x},t)=g(\boldsymbol{x},t)\,,\quad \boldsymbol{x}\in\partial\Omega\,,\ t\in[0,T]\,, (2)
u(\boldsymbol{x},0)=h(\boldsymbol{x})\,,\quad \boldsymbol{x}\in\Omega\,, (3)

where \boldsymbol{x}\in\Omega is a spatial vector variable in a domain \Omega\subset R^{d}, t is time, and \mathcal{N}_{\boldsymbol{x},t} is a spatial-temporal differential operator. Following Raissi et al. (2019), let u(\boldsymbol{x},t) be approximated by the output u(\boldsymbol{x},t;\boldsymbol{w}) of a deep neural network with inputs \boldsymbol{x} and t. Define the residual network r(\boldsymbol{x},t;\boldsymbol{w}), which shares the same network weights \boldsymbol{w} as the approximation network u(\boldsymbol{x},t;\boldsymbol{w}) and satisfies:

r(\boldsymbol{x},t;\boldsymbol{w}):=\mathcal{N}_{\boldsymbol{x},t}[u(\boldsymbol{x},t;\boldsymbol{w})]\,, (4)

where all partial derivatives can be computed by automatic differentiation methods (Baydin et al., 2017; Paszke et al., 2017). The shared network weights \boldsymbol{w} are trained by minimizing a loss function that penalizes the output for not satisfying (1)-(3):

\mathcal{L}(\boldsymbol{w})=\mathcal{L}_{s}(\boldsymbol{w})+\mathcal{L}_{r}(\boldsymbol{w})+\mathcal{L}_{b}(\boldsymbol{w})+\mathcal{L}_{0}(\boldsymbol{w})\,, (5)

where \mathcal{L}_{s} is the loss corresponding to sample data (if any), \mathcal{L}_{r} is the loss corresponding to the residual (4), \mathcal{L}_{b} is the loss due to the boundary conditions (2), and \mathcal{L}_{0} is the loss due to the initial conditions (3):

\mathcal{L}_{s}(\boldsymbol{w}) = \frac{1}{N_{s}}\sum^{N_{s}}_{i=1}|u(\boldsymbol{x}^{i}_{s},t^{i}_{s};\boldsymbol{w})-y^{i}_{s}|^{2}, (6)
\mathcal{L}_{r}(\boldsymbol{w}) = \frac{1}{N_{r}}\sum^{N_{r}}_{i=1}r(\boldsymbol{x}^{i}_{r},t^{i}_{r};\boldsymbol{w})^{2}, (7)
\mathcal{L}_{b}(\boldsymbol{w}) = \frac{1}{N_{b}}\sum^{N_{b}}_{i=1}|u(\boldsymbol{x}^{i}_{b},t^{i}_{b};\boldsymbol{w})-g^{i}_{b}|^{2}, (8)
\mathcal{L}_{0}(\boldsymbol{w}) = \frac{1}{N_{0}}\sum^{N_{0}}_{i=1}|u(\boldsymbol{x}^{i}_{0},0;\boldsymbol{w})-h_{0}^{i}|^{2}, (9)

where \{\boldsymbol{x}_{s}^{i},t_{s}^{i},y_{s}^{i}\}_{i=1}^{N_{s}} are sample data (if any), \{\boldsymbol{x}_{0}^{i},h_{0}^{i}=h(\boldsymbol{x}_{0}^{i})\}_{i=1}^{N_{0}} are initial condition points, \{\boldsymbol{x}_{b}^{i},t^{i}_{b},g_{b}^{i}=g(\boldsymbol{x}_{b}^{i},t^{i}_{b})\}_{i=1}^{N_{b}} are boundary condition points, \{\boldsymbol{x}_{r}^{i},t^{i}_{r}\}_{i=1}^{N_{r}} are collocation points randomly distributed in the domain \Omega, and N_{s}, N_{0}, N_{b}, and N_{r} denote the total number of sample data, initial points, boundary points, and collocation points, respectively. The network weights \boldsymbol{w} can be tuned by minimizing the total training loss \mathcal{L}(\boldsymbol{w}) via standard gradient descent procedures used in deep learning.
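To make the structure of (5)-(9) concrete, the following is a minimal Tensorflow sketch of how such a composite loss could be assembled. It is for illustration only and does not reflect TensorDiffEq's internal implementation; the tensor names (Xs, ys, X0, h0, Xb, gb, Xr) are placeholders, and f_model is a residual function in the style of Section 3.2.

import tensorflow as tf

def total_loss(u_model, f_model, Xs, ys, X0, h0, Xb, gb, Xr):
    # L_s: sample-data misfit (if labeled data are available)
    loss_s = tf.reduce_mean(tf.square(u_model(Xs) - ys))
    # L_0: initial-condition misfit against h(x) at t = 0
    loss_0 = tf.reduce_mean(tf.square(u_model(X0) - h0))
    # L_b: boundary-condition misfit against g(x, t) on the spatial boundary
    loss_b = tf.reduce_mean(tf.square(u_model(Xb) - gb))
    # L_r: mean-squared PDE residual at the collocation points
    x_r, t_r = Xr[:, 0:1], Xr[:, 1:2]
    loss_r = tf.reduce_mean(tf.square(f_model(u_model, x_r, t_r)))
    return loss_s + loss_r + loss_b + loss_0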

3 Using TensorDiffEq for Forward Problems

TensorDiffEq has a boilerplate workflow that can loosely be followed in most uses of the package. For forward problems, this process generally proceeds in the following order:

  1. Define the problem domain
  2. Describe the physics of the model
  3. Define the Initial Conditions and Boundary Conditions (IC/BCs)
  4. Define the neural network architecture
  5. Select and define the solver
  6. Solve the PDE using the fit method

Each of these steps has multiple options and definitions in the TensorDiffEq solution suite. The following sections will provide a brief overview of some of the built-in functionality of the package.

3.1 Define the Problem Domain

A Domain object is the first essential component of defining a problem in TensorDiffEq. The Domain object contains primitives for defining the problem scope, which are used later in the definitions of boundary and initial conditions and, eventually, to sample the collocation points that are fed into the PINN solver.

The Domain object is defined iteratively. As many dimensions as are required can simply be added to the domain using the add method. This means TensorDiffEq can be used to solve spatial (steady-state) or spatiotemporal problems in two, three, or, in general, N dimensions.
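As a concrete sketch of this workflow for a 1D-space/1D-time problem (the class and method names below follow the package documentation at the time of writing and may change across versions):

from tensordiffeq.domains import DomainND  # import path per the docs; may differ by version

# declare the independent variables and flag the temporal one
Domain = DomainND(["x", "t"], time_var="t")

# add each dimension iteratively: name, range, and number of points along it
Domain.add("x", [-1.0, 1.0], 256)
Domain.add("t", [0.0, 1.0], 100)

# sample the collocation points used later by the residual loss
N_f = 10000
Domain.generate_collocation_points(N_f)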

3.2 Describe the Physics of the Model

Since TensorDiffEq is built on top of Tensorflow (Abadi et al., 2015), the physics of the model can be defined via a strong-form PDE, with gradients computed using the built-in tf.gradients function. This allows for a definition similar to that seen in Raissi et al. (2019). An example of defining the PDE for a viscous Burgers system is shown below:

import math
import tensorflow as tf

def f_model(u_model, x, t):
    # forward pass of the solution network u(x, t)
    u = u_model(tf.concat([x, t], 1))
    # derivatives of u via automatic differentiation
    u_x = tf.gradients(u, x)
    u_xx = tf.gradients(u_x, x)
    u_t = tf.gradients(u, t)
    # strong-form residual of the viscous Burgers equation
    f_u = u_t + u * u_x - (0.01 / tf.constant(math.pi)) * u_xx
    return f_u
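The residual computed above corresponds to the strong form of the viscous Burgers equation in the notation of (4):

r(\boldsymbol{x},t;\boldsymbol{w}) = u_{t} + u\,u_{x} - \frac{0.01}{\pi}\,u_{xx}\,.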

Due to the way the PDE system is defined in TensorDiffEq, one could define additional solution variables alongside u and write a coupled PDE definition in a similar style to the one shown above, as sketched below.
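Purely as an illustration of that style, a coupled two-component residual could be written along the following lines. The two-output network, the coefficients, and the reaction terms are hypothetical, and a multi-output approximation may require the custom-network route discussed in Section 3.4.

import tensorflow as tf

def f_model(u_model, x, t):
    # hypothetical two-output network: column 0 approximates u, column 1 approximates v
    uv = u_model(tf.concat([x, t], 1))
    u, v = uv[:, 0:1], uv[:, 1:2]
    u_t = tf.gradients(u, t)
    u_xx = tf.gradients(tf.gradients(u, x), x)
    v_t = tf.gradients(v, t)
    v_xx = tf.gradients(tf.gradients(v, x), x)
    # illustrative coupled reaction-diffusion residuals
    f_u = u_t - 0.01 * u_xx - u * v
    f_v = v_t - 0.05 * v_xx + u * v
    return f_u, f_v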

3.3 Define the ICs/BCs

TensorDiffEq supports various types of ICs and BCs, and the list continues to grow. ICs and BCs that require functions are defined as ordinary Python functions of the system variables, which allows for nonlinear and discontinuous definitions; one could define piecewise functions, Boolean functions, etc., and they would be valid input to TensorDiffEq's solvers. At the time of this writing, TensorDiffEq supports constant Dirichlet, function Dirichlet, and periodic BCs, as well as function-based ICs. TensorDiffEq takes the ICs and BCs as a list, so one can add as many as necessary to define the system. If a BC is not defined on a particular boundary, or is overlooked in the problem definition, the solver will attempt to approximate that boundary using PDE-constrained regularization of the interior points on or near that boundary.
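As a brief sketch (function and argument names follow the package documentation at the time of writing and may change across versions), the Burgers problem from Section 3.2 could declare its IC and constant Dirichlet BCs as:

import math
import tensorflow as tf
from tensordiffeq.boundaries import IC, dirichletBC  # names per the docs; may differ by version

# initial condition u(x, 0) = -sin(pi x), written as an ordinary Python function
def func_ic(x):
    return -tf.sin(math.pi * x)

init = IC(Domain, [func_ic], var=[["x"]])

# constant Dirichlet conditions u(-1, t) = u(1, t) = 0 on the two x-boundaries
lower_x = dirichletBC(Domain, val=0.0, var="x", target="lower")
upper_x = dirichletBC(Domain, val=0.0, var="x", target="upper")

# ICs and BCs are collected into a single list and handed to the solver
BCs = [init, lower_x, upper_x]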

3.4 Define the Neural Network Architecture

The default architecture of the neural network is a fully connected MLP defined in the Keras API (Chollet et al., 2015). To take advantage of the built-in MLP, a list of hidden layer sizes is passed into the solver. However, this baseline architecture can be overwritten by any Keras neural network. Currently, the solver requires the number of inputs of the neural network to be the same as the number of dimensions of the system, and the output is the scalar value of the approximation of u(\textbf{X}) at that combination of input points. This “single-network” output architecture is actively being expanded at the time of this writing.

In the event one desires to add batch normalization, residual blocks, etc., the Keras API can be used to define the model, and the internal parameters of TensorDiffEq's solvers can be modified to use that network as the solution network for u(\textbf{X}). As long as the input to the neural network has the correct dimensionality for the system (e.g., 3 nodes for a problem with x, y, t dimensions) and the output has the correct number of dimensions, one can build any architecture the Keras API allows and pass it into the solver; a sketch follows below. This feature also allows for custom neural network layer support via the Keras lambda layer ecosystem, allowing for complete autonomy in the definition of the neural network model internals and training via built-in Keras optimizers.
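For instance, a custom network with batch normalization could be assembled with standard Keras layers as sketched below; the attribute used to attach it to the solver (u_model in the final comment) is an assumption about the solver internals rather than a documented API.

import tensorflow as tf

# custom solution network: 2 inputs (x, t) and one scalar output u(x, t),
# with batch normalization between the hidden layers
custom_net = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="tanh", input_shape=(2,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(128, activation="tanh"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1),
])

# the network can then be attached to a compiled solver in place of the default MLP;
# the attribute name u_model is an assumption about the solver internals
# model.u_model = custom_net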

3.5 Select and Define The Solver

TensorDiffEq is a suite designed to provide forward and inverse PINN solvers. As such, there are various solvers to perform these tasks. At the time of this writing, there are Collocation Method solvers for forward modeling and the Discovery Model for inverse modeling.

Hyperparameter selection can be modified by the user by overwriting the default Adam optimizer (Kingma and Ba, 2014) with any of the other optimizers available in Keras, including AdaDelta (Zeiler, 2012), Root-Mean-Square Propagation (RMSProp), SGD, and others. Some of these optimization techniques prove more stable in training than others, and there exist various methods of modifying the loss function of the collocation solver to improve convergence (Wang et al., 2020b, a). To this end, TensorDiffEq supports self-adaptive training methods, which have proven effective in helping semi-linear PDE systems, such as Allen-Cahn (Allen and Cahn, 1972), converge where the baseline collocation method fails (McClenny and Braga-Neto, 2020). Other methods of improving convergence in Neural PDE and PINN training are continuously being considered.
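As a minimal sketch of such an optimizer swap, assuming the compiled solver exposes its Keras optimizer as an attribute (the name tf_optimizer below is an assumption and may differ by release):

import tensorflow as tf

# swap the default Adam optimizer for RMSprop before calling fit();
# `model.tf_optimizer` is an assumed attribute name, not a documented API
model.tf_optimizer = tf.keras.optimizers.RMSprop(learning_rate=5e-4)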

3.6 Solve the PDE

Each solver has a compile and fit method, to give the package a feel similar to modern popular machine learning or deep learning frameworks such as Keras (Chollet et al., 2015) or scikit-learn (Pedregosa et al., 2011). In most instances, the compile function places parameters such as the domain size and shape, neural network sizes, and BCs/ICs into the solver, and the fit function takes only the number of Keras optimizer iterations and Newton solver iterations.
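A minimal compile/fit sketch for the Burgers problem assembled in the preceding sections is shown below; class and argument names follow the package documentation at the time of writing and may change across versions.

from tensordiffeq.models import CollocationSolverND  # name per the docs; may differ by version

# default MLP: 2 inputs (x, t), four hidden layers of 128 units, one scalar output
layer_sizes = [2, 128, 128, 128, 128, 1]

model = CollocationSolverND()
model.compile(layer_sizes, f_model, Domain, BCs)

# tf_iter: Keras (Adam) iterations; newton_iter: Newton-type refinement iterations
model.fit(tf_iter=10000, newton_iter=10000)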

A feature unique to TensorDiffEq is that the Keras neural network model can be exported and saved for later use. This allows, for instance, training on a data center platform and inference on a local machine. Additionally, being able to export the Keras neural network model opens the door to transfer learning possibilities that were previously difficult with the Neural PDE solvers currently in circulation. In the case of TensorDiffEq, this is a natural result of leaning on the Tensorflow/Keras APIs.
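A hedged sketch of this export/reload workflow, assuming the trained solution network is exposed as a u_model attribute (an assumption about the solver internals) and using an arbitrary SavedModel path:

import tensorflow as tf

# the trained solution network is a standard Keras model and can be saved to disk
model.u_model.save("burgers_u_model")

# reload later, e.g. on a different machine, for inference or transfer learning
u_reloaded = tf.keras.models.load_model("burgers_u_model")
u_pred = u_reloaded(tf.constant([[0.5, 0.25]]))  # evaluate u at (x, t) = (0.5, 0.25)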

4 Solving Inverse Problems

TensorDiffEq comes with a base class solver for inverse problems. Inverse problems can involve parameter estimation or even estimation of the interactions between nonlinear operators (Lu et al., 2019) from data. TensorDiffEq contains solvers that perform parameter estimation in a PDE system. These parameters can be mobility parameters, diffusivity parameters, etc., where there is some level of a priori physical knowledge about the system in question but a specific parameter is unknown. This built-in support extends to N-dimensional systems. Parameters are defined as variables that are learned over the course of training, so a natural output is a trained u(\textbf{X},t) solution as well as estimates of the parameters in question, as sketched below.
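As a hedged sketch, an unknown coefficient can be declared as a tf.Variable and used directly inside the residual definition; the variable is then handed to the inverse (Discovery) solver together with the observed data, whose exact compile/fit signature should be taken from the package documentation.

import tensorflow as tf

# unknown viscosity, declared as a trainable variable with an initial guess
nu = tf.Variable(0.1, dtype=tf.float32)

def f_model(u_model, x, t):
    u = u_model(tf.concat([x, t], 1))
    u_x = tf.gradients(u, x)
    u_xx = tf.gradients(u_x, x)
    u_t = tf.gradients(u, t)
    # nu is learned jointly with the network weights from the observed data
    return u_t + u * u_x - nu * u_xx

# nu (and any other unknowns) is then passed to the Discovery solver along with the
# observed data; see the package documentation for the exact call signature.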

5 Conclusion

In this article, the authors introduce TensorDiffEq, a scalable multi-GPU solver for PINNs/Neural PDEs. Some of the main highlights of the software are covered, and more features are currently underway. TensorDiffEq contains support for various types of initial and boundary conditions and allows users to custom-define the PDE system for their specific problem. In the event that inverse modeling is required, TensorDiffEq contains solvers that accommodate parameter estimation of a PDE system. Currently, TensorDiffEq is the only software suite to support self-adaptive solving, demonstrated to improve training convergence and accuracy of the final solution. TensorDiffEq takes a step forward in modern implementations of PINN solvers and fills a unique niche as a fully open-source multi-GPU PINN solver in the current ecosystem of Scientific Machine Learning software offerings.


Acknowledgments

The authors would like to acknowledge the support of the D3EM program funded through NSF Award DGE-1545403. The authors would further like to thank the US Army CCDC Army Research Lab for their generous support and affiliation, as well as the Nvidia DGX Station hardware support, which allowed the development of the software highlighted in this publication.

References

  • Abadi et al. (2015) Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.
  • Abadi et al. (2016) Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
  • Allen and Cahn (1972) Samuel Miller Allen and John W Cahn. Ground state structures in ordered binary alloys with second neighbor interactions. Acta Metallurgica, 20(3):423–433, 1972.
  • Baker et al. (2019) Nathan Baker, Frank Alexander, Timo Bremer, Aric Hagberg, Yannis Kevrekidis, Habib Najm, Manish Parashar, Abani Patra, James Sethian, Stefan Wild, Karen Willcox, and Steven Lee. Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence, 2 2019.
  • Baydin et al. (2017) Atılım Günes Baydin, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey. The Journal of Machine Learning Research, 18(1):5595–5637, 2017.
  • Chollet et al. (2015) François Chollet et al. Keras. https://keras.io, 2015.
  • Dissanayake and Phan-Thien (1994) MWMG Dissanayake and N Phan-Thien. Neural-network-based approximations for solving partial differential equations. communications in Numerical Methods in Engineering, 10(3):195–201, 1994.
  • Haghighat and Juanes (2021) Ehsan Haghighat and Ruben Juanes. Sciann: A keras/tensorflow wrapper for scientific computations and physics-informed deep learning using artificial neural networks. Computer Methods in Applied Mechanics and Engineering, 373:113552, 2021.
  • Hennigh et al. (2020) Oliver Hennigh, Susheela Narasimhan, Mohammad Amin Nabian, Akshay Subramaniam, Kaustubh Tangsali, Max Rietmann, Jose del Aguila Ferrandis, Wonmin Byeon, Zhiwei Fang, and Sanjay Choudhry. NVIDIA SimNet™: An AI-accelerated multi-physics simulation framework. arXiv preprint arXiv:2012.07938, 2020.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Lagaris et al. (1998) Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.
  • Lu et al. (2019) Lu Lu, Pengzhan Jin, and George Em Karniadakis. Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193, 2019.
  • Lu et al. (2021) Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. DeepXDE: A deep learning library for solving differential equations. SIAM Review, 63(1):208–228, 2021. doi: 10.1137/19M1274067.
  • McClenny and Braga-Neto (2020) Levi McClenny and Ulisses Braga-Neto. Self-adaptive physics-informed neural networks using a soft attention mechanism. arXiv preprint arXiv:2009.04544, 2020.
  • Paszke et al. (2017) Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017.
  • Pedregosa et al. (2011) Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in python. Journal of machine learning research, 12(Oct):2825–2830, 2011.
  • Rackauckas and Nie (2017) Christopher Rackauckas and Qing Nie. DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia. The Journal of Open Research Software, 5(1), 2017. doi: 10.5334/jors.151.
  • Raissi (2018) Maziar Raissi. Forward-backward stochastic neural networks: Deep learning of high-dimensional partial differential equations. arXiv preprint arXiv:1804.07010, 2018.
  • Raissi et al. (2019) Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • Revels et al. (2016) Jarrett Revels, Miles Lubin, and Theodore Papamarkou. Forward-mode automatic differentiation in julia. arXiv preprint arXiv:1607.07892, 2016.
  • Wang et al. (2020a) Sifan Wang, Yujun Teng, and Paris Perdikaris. Understanding and mitigating gradient pathologies in physics-informed neural networks. arXiv preprint arXiv:2001.04536, 2020a.
  • Wang et al. (2020b) Sifan Wang, Xinling Yu, and Paris Perdikaris. When and why pinns fail to train: A neural tangent kernel perspective. arXiv preprint arXiv:2007.14527, 2020b.
  • Wight and Zhao (2020) Colby L Wight and Jia Zhao. Solving allen-cahn and cahn-hilliard equations using the adaptive physics informed neural networks. arXiv preprint arXiv:2007.04542, 2020.
  • Zeiler (2012) Matthew D Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.