
A deep learning framework for solution and discovery in solid mechanics

Ehsan Haghighat (Massachusetts Institute of Technology, Cambridge, MA), Maziar Raissi (University of Colorado Boulder, Boulder, CO), Adrian Moure and Hector Gomez (Purdue University, West Lafayette, IN), and Ruben Juanes (Massachusetts Institute of Technology, Cambridge, MA)
Abstract

We present the application of a class of deep learning, known as Physics-Informed Neural Networks (PINN), to learning and discovery in solid mechanics. We explain how to incorporate the momentum balance and constitutive relations into PINN, explore in detail the application to linear elasticity, and illustrate the extension to nonlinear problems through an example that showcases von Mises elastoplasticity. While common PINN algorithms are based on training one deep neural network (DNN), we propose a multi-network model that results in a more accurate representation of the field variables. To validate the model, we test the framework on synthetic data generated from analytical and numerical reference solutions. We study the convergence of the PINN model, and show that Isogeometric Analysis (IGA) results in superior accuracy and convergence characteristics compared with the classic low-order Finite Element Method (FEM). We also show the applicability of the framework for transfer learning, and find vastly accelerated convergence during network re-training. Finally, we find that honoring the physics leads to improved robustness: a PINN model trained on only a few parameter values can accurately predict the solution for a wide range of parameters new to the network, thus pointing to an important application of this framework to sensitivity analysis and surrogate modeling.

keywords:
Artificial neural network , Physics-informed deep learning , Inversion , Transfer learning , Linear elasticity , Elastoplasticity

1 Introduction

Over the past few years, there has been a revolution in the successful application of Artificial Neural Networks (ANN), also commonly referred to as Deep Neural Networks (DNN) and Deep Learning (DL), in various fields including image classification, handwriting recognition, speech recognition and translation, and computer vision. These ANN approaches have led to a sea change in the performance of search engines, autonomous driving, e-commerce, and photography (see [1, 2, 3] for a review). In engineering and science, ANNs have been applied to an increasing number of areas, including geosciences [4, 5, 6, 7, 8], material science [9, 10, 11, 12], fluid mechanics [13, 14], genetics [15], and infrastructure health monitoring [16, 17], to name a few examples. In the solid and geomechanics community, deep learning has been used primarily for material modeling, in an attempt to replace classical constitutive models with ANNs [18, 19, 20]. In these applications, training of the network, i.e., evaluation of the network parameters, is carried out by minimizing the norm of the distance between the network output (prediction) and the true output (training data). In this paper, we will refer to ANNs trained in this way as “data-driven.”

A different class of ANNs, known as Physics-Informed Neural Networks (PINN), was introduced recently [21, 22, 23, 24, 25]. This concept of ANNs was developed to endow the network model with known equations that govern the physics of a system. The training of PINNs is performed with a cost function that, in addition to data, includes the governing equations, initial and boundary conditions. This architecture can be used for solution and discovery (finding parameters) of systems of ordinary differential equations (ODEs) and partial differential equations (PDEs). While solving ODEs and PDEs with ANNs is not a new topic, e.g., [26, 27, 28], the success of these recent studies can be broadly attributed to: (1) the choice of network architecture, i.e., the set of inputs and outputs of the ANN, so that one can impose governing equations on the network; (2) algorithmic advances, including graph-based automatic differentiation for accurate differentiation of ANN functionals and for error back-propagation; and (3) availability of advanced machine-learning software with CPU and GPU parallel processing capabilities including Theano [29] and TensorFlow [30].

This framework has been used for solution and discovery of Schrodinger, Allen–Cahn, and Navier–Stokes equations [22, 21]. It has also been used for solution of high-dimensional stochastic PDEs [23]. As pointed out in [23], this approach can be considered as a class of Reinforcement Learning [31], where the learning is on maximizing an incentive or minimizing a loss rather than direct training on data. If the network prediction does not satisfy a governing equation, it will result in an increase in the cost and therefore the learning traverses a path that minimizes that cost.

Here, we focus on the novel application of PINNs to solution and discovery in solid mechanics. We study linear elasticity in detail, and then illustrate the performance on nonlinear von Mises elastoplasticity. Since parameters of the governing PDEs can also be defined as trainable parameters, the framework inherently allows us to perform parameter identification (model inversion). We validate the framework on synthetic data generated from low-order and high-order Finite Element Methods (FEM) and from Isogeometric Analysis (IGA) [32, 33]. These datasets satisfy the governing equations with different orders of accuracy, where the error can be considered as noise in the data. We find that the training converges faster on more accurate datasets, pointing to the importance of higher-order numerical methods for pre-training ANNs. We also find that if the data is pre-processed properly, the training converges to the correct solution and correct parameters even on data generated with a coarse mesh and low-order FEM, an important result that illustrates the robustness of the proposed approach. Finally, we find that, due to the imposition of the physics constraints, the training converges on a very sparse dataset, which is a crucial property in practice given that the installation of a dense network of sensors can be very costly.

Parameter estimation (identification) of complex models is a challenging task that requires a large number of forward simulations, depending on model complexity and the number of parameters. As a result, most inversion techniques have been applied to simplified models. The use of PINNs, however, allows us to perform identification simultaneously with fitting the ANN model on data [22]. This property highlights the potential of this approach compared with classical methods. We explore the application of PINN models for identification of multiple datasets generated with different parameters. Similar to transfer learning, where a pre-trained model is used as the initial state of the network [34], we perform re-training on new datasets starting from a previously trained network on a different dataset (with different parameters). We find that the re-training and identification of other datasets take far less time. Since the successfully trained PINN model should also satisfy the physics constraints, it is in effect a surrogate model that can be used for extrapolation on unexplored data. To test this property, we train a network on four datasets with different parameters and then test it on a wide range of new parameter sets, and find that the results remain relatively accurate. This property points to the applicability of PINN models for sensitivity analysis, where classical approaches typically require an exceedingly large number of forward simulations.

2 Physics-Informed Neural Networks: Linear Elasticity

In this section, we review the equations of linear elastostatics with emphasis on PINN implementation.

2.1 Linear elasticity

The equations expressing momentum balance, the constitutive model and the kinematic relations are, respectively,

\begin{split}
\sigma_{ij,j}+f_{i}&=0,\\
\sigma_{ij}&=\lambda\delta_{ij}\varepsilon_{kk}+2\mu\varepsilon_{ij},\\
\varepsilon_{ij}&=\frac{1}{2}\left(u_{i,j}+u_{j,i}\right).
\end{split} (1)

Here, $\sigma_{ij}$ denotes the Cauchy stress tensor. For the two-dimensional problems considered here, $i,j=1,2$ (or $i,j=x,y$). We use the summation convention, and a comma in the subscript denotes partial differentiation. The function $f_{i}$ denotes a body force, $u_{i}$ represents the displacements, $\varepsilon_{ij}$ is the infinitesimal strain tensor, and $\delta_{ij}$ is the Kronecker delta. The Lamé parameters $\lambda$ and $\mu$ are the quantities to be inferred using PINN.

2.2 Introduction to Physics-Informed Neural Networks

In this section, we provide an overview of the Physics-Informed Neural Networks (PINN) architecture, with emphasis on their application to model inversion. Let $\mathcal{N}(\mathbf{x};\mathbf{W},\mathbf{b}):\mathbb{R}^{d_{\mathbf{x}}}\rightarrow\mathbb{R}^{d_{\mathbf{y}}}$ be an $L$-layer neural network with input vector $\mathbf{x}$, output vector $\mathbf{y}$, and network parameters $\mathbf{W},\mathbf{b}$. This network is a feed-forward network, meaning that each layer creates data for the next layer through the following nested transformations:

\mathbf{z}^{l}=\sigma^{l}\left(\mathbf{W}^{l}\mathbf{z}^{l-1}+\mathbf{b}^{l}\right),\quad l=1,\dots,L, \qquad (2)

where $\mathbf{z}^{0}\equiv\mathbf{x}$ and $\mathbf{z}^{L}\equiv\mathbf{y}$ are the inputs and outputs of the model, and $\mathbf{W}^{l},\mathbf{b}^{l}$ are the parameters of each layer $l$, known as weights and biases, respectively. The functions $\sigma^{l}$ are called activation functions and make the network nonlinear with respect to the inputs. For instance, an ANN functional of some field variable, such as the displacement $u(\mathbf{x})$, with three hidden layers and with $\sigma^{l}=\tanh$ as the activation function for all layers except the last can be written as

\begin{split}
\mathbf{z}^{1}(\mathbf{x})&=\tanh(\mathbf{W}^{0}\mathbf{x}+\mathbf{b}^{0}),\\
\mathbf{z}^{2}(\mathbf{x})&=\tanh(\mathbf{W}^{1}\mathbf{z}^{1}+\mathbf{b}^{1}),\\
\mathbf{z}^{3}(\mathbf{x})&=\tanh(\mathbf{W}^{2}\mathbf{z}^{2}+\mathbf{b}^{2}),\\
u(\mathbf{x})&=\mathbf{W}^{3}\mathbf{z}^{3}+\mathbf{b}^{3}.
\end{split} (3)

This model can be considered as an approximate solution for the field variable $u$ of a partial differential equation.
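As a concrete illustration, the network of Eq. (3) can be written in a few lines of Keras. This is a minimal sketch; the layer width of 20 neurons is an illustrative assumption, not a value prescribed above:

```python
import tensorflow as tf

# Sketch of the network in Eq. (3): three tanh hidden layers and a linear
# output layer approximating a scalar field u(x). The layer width is an
# illustrative assumption.
def make_field_net(n_inputs=2, width=20):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_inputs,)),          # features, e.g. (x, y)
        tf.keras.layers.Dense(width, activation="tanh"),   # z^1
        tf.keras.layers.Dense(width, activation="tanh"),   # z^2
        tf.keras.layers.Dense(width, activation="tanh"),   # z^3
        tf.keras.layers.Dense(1, activation=None),         # u(x), linear last layer
    ])
```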

In the PINN architecture, the network inputs (also known as features) are the space and time variables, i.e., $(x,y,z,t)$ in Cartesian coordinates, which makes it meaningful to perform differentiation of the network’s output with respect to any of the input variables. Classical implementations based on finite-difference approximations are not accurate when applied to deep networks (see [35] for a review). Thanks to the modern graph-based implementation of the feed-forward network (e.g., Theano [29], TensorFlow [30], MXNet [36]), this can be carried out using automatic differentiation at machine precision, therefore allowing for many hidden layers to represent nonlinear response. Hence, evaluation of a partial differential operator $\mathcal{P}$ acting on $u$ is achieved naturally with graph-based differentiation and can then be incorporated in the cost function along with initial and boundary conditions as:

\mathcal{L}=|u-u^{*}|+|\mathcal{P}u-0^{*}|+|u-u^{*}|_{\partial\Omega}+|u_{0}-u_{0}^{*}|, \qquad (4)

where $\partial\Omega$ is the domain boundary, $u_{0}-u_{0}^{*}$ is the initial condition at $t=t_{0}$, and $0^{*}$ indicates the expected (true) value for the differential relation $\mathcal{P}u$ at any given training point. The norm $|\cdot|$ of a generic quantity $g$ defined in $\Omega$ denotes $\frac{1}{N}\sum_{i=1}^{N}g(\mathbf{x}_{i})^{2}$, where the $\mathbf{x}_{i}$'s are the spatial points where the data is known. The dataset is then fed to the neural network and an optimization is performed to evaluate all the parameters of the model, including the parameters of the PDE.
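The sketch below illustrates how such a cost function can be assembled with graph-based automatic differentiation in TensorFlow. The operator is taken as $\mathcal{P}u=\partial u/\partial x$ purely for illustration, and `u_net`, `x_train`, and `u_star` are hypothetical placeholders:

```python
import tensorflow as tf

# Sketch of a PINN loss of the form of Eq. (4): a data-mismatch term plus the
# mean-squared residual of a differential operator evaluated with automatic
# differentiation. Here P u = du/dx is a stand-in for the actual operator.
def pinn_loss(u_net, x_train, u_star):
    with tf.GradientTape() as tape:
        tape.watch(x_train)
        u = u_net(x_train)
    du_dx = tape.gradient(u, x_train)                    # derivative of the ANN output
    data_term = tf.reduce_mean(tf.square(u - u_star))    # |u - u*|
    physics_term = tf.reduce_mean(tf.square(du_dx))      # |P u - 0*|
    return data_term + physics_term
```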

2.3 Training PINN

Different algorithms can be used to train a neural network. Among the choices available in Keras [37] we use the Adam optimization scheme [38], which we have found to outperform other choices, such as Adagrad [39], for this task. Several algorithmic parameters affect the rate of convergence of the network training. Here we adopt the terminology in Keras [37], but the terminology in other modern machine-learning packages is similar. The algorithmic parameters include batch-size, epochs, shuffle, and patience. Batch-size controls the number of samples from a dataset used to evaluate one gradient update. A batch-size of 1 corresponds to full stochastic gradient descent. One epoch is one round of training on a dataset. If a dataset is shuffled, then a new round of training (epoch) results in an updated parameter set because the batch gradients are evaluated on different batches. It is common to re-shuffle a dataset many times and perform the back-propagation updates. The optimizer may, however, stop earlier if it finds that new rounds of epochs are not improving the cost function. That is where the last keyword, patience, comes in. This is mainly because we are dealing with non-convex optimization and we need to test the training from different starting points and in different directions to build confidence in the parameters evaluated from minimization of the cost function on a dataset. Patience is the parameter that controls when the optimizer should stop the training.

There are three ways to train the network: (1) generate a sufficiently large number of datasets and perform a one-epoch training on each dataset, (2) work on one dataset over many epochs by reshuffling the data, or (3) a combination of these. When dealing with synthetic data, all approaches are feasible. However, strategy (1) is usually impossible to apply in practice, especially in the spatial setting, where sensors are installed at a fixed and limited number of locations. In the original work on PINN [22], approach (1) was used to train the model, where datasets are generated on random space discretizations at each epoch. Here, we follow approach (2) to use training data that we could realistically have in practice. For all examples, unless otherwise noted, we use a batch-size of 64, a limit of 10,000 epochs with shuffling, and a patience of 500 to perform the training.
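In Keras, these training controls can be expressed as in the following sketch; `model`, `X`, and `Y` are placeholders for a compiled PINN model and its training data, and the learning rate is an illustrative assumption:

```python
from tensorflow import keras

# Illustrative training call using the algorithmic parameters discussed above:
# batch-size 64, up to 10,000 shuffled epochs, and early stopping with a
# patience of 500 epochs on the training loss.
early_stop = keras.callbacks.EarlyStopping(monitor="loss", patience=500)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
model.fit(X, Y, batch_size=64, epochs=10000, shuffle=True, callbacks=[early_stop])
```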

3 Illustrative Example and Discussion

In this section, we use the PINN architecture on an illustrative linear elasticity problem.

3.1 Problem setup

To illustrate the application of the proposed approach, we consider an elastic plane-strain problem on the unit square (Fig. 1), subject to the boundary conditions depicted in the figure. The body forces are:

\begin{split}
f_{x}&=\lambda\left[4\pi^{2}\cos(2\pi x)\sin(\pi y)-\pi\cos(\pi x)Qy^{3}\right]\\
&+\mu\left[9\pi^{2}\cos(2\pi x)\sin(\pi y)-\pi\cos(\pi x)Qy^{3}\right],\\
f_{y}&=\lambda\left[-3\sin(\pi x)Qy^{2}+2\pi^{2}\sin(2\pi x)\cos(\pi y)\right]\\
&+\mu\left[-6\sin(\pi x)Qy^{2}+2\pi^{2}\sin(2\pi x)\cos(\pi y)+\pi^{2}\sin(\pi x)Qy^{4}/4\right].
\end{split} (5)

The exact solution of this problem is

u_{x}(x,y)=\cos(2\pi x)\sin(\pi y), \qquad (6)
u_{y}(x,y)=\sin(\pi x)Qy^{4}/4, \qquad (7)
Figure 1: Problem setup and boundary conditions.

The exact solution is plotted in Fig. 2 for parameter values $\lambda=1$, $\mu=0.5$, and $Q=4$.

Figure 2: Exact solution in Eqs. (6)–(7) for parameter values $\lambda=1$, $\mu=0.5$, and $Q=4$.

3.2 Neural Network setup

Due to the symmetry of the stress and strain tensors, the quantities of interest for a two-dimensional problem are $u_{x}$, $u_{y}$, $\varepsilon_{xx}$, $\varepsilon_{yy}$, $\varepsilon_{xy}$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{xy}$. There are a few potential architectures that we can use to design our network. For all network choices, the input features (variables) are the spatial coordinates $(x,y)$. For the outputs, one possible design is a densely connected network with two outputs, $(u_{x},u_{y})$. Another option is to have two densely connected independent networks with one output each, associated with $u_{x}$ and $u_{y}$, respectively (Fig. 3). The remaining quantities of interest, i.e., $\sigma_{ij}$ and $\varepsilon_{ij}$, can then be obtained through differentiation. Alternatively, we may have $(u_{x},u_{y},\sigma_{xx},\sigma_{yy},\sigma_{xy})$ or $(u_{x},u_{y},\varepsilon_{xx},\varepsilon_{yy},\varepsilon_{xy})$ as outputs of one network or of multiple independent networks. As can be seen from Fig. 3, these choices affect the number of parameters of the network and how different quantities of interest are correlated. Equation (3) shows that the feed-forward neural network imposes a special functional form on the network that may not necessarily follow the cross-dependence between variables in the governing equations (1). Our experiments show that using separate networks for each variable is a far more effective strategy. Therefore, we choose to define the variables $u_{x},u_{y},\sigma_{xx},\sigma_{yy},\sigma_{xy}$ as independent ANNs (see Fig. 4), i.e.,

\begin{split}
u_{x}(\mathbf{x})&\approx\mathcal{N}_{u_{x}}(\mathbf{x}),\\
u_{y}(\mathbf{x})&\approx\mathcal{N}_{u_{y}}(\mathbf{x}),\\
\sigma_{xx}(\mathbf{x})&\approx\mathcal{N}_{\sigma_{xx}}(\mathbf{x}),\\
\sigma_{yy}(\mathbf{x})&\approx\mathcal{N}_{\sigma_{yy}}(\mathbf{x}),\\
\sigma_{xy}(\mathbf{x})&\approx\mathcal{N}_{\sigma_{xy}}(\mathbf{x}).
\end{split} (8)
Figure 3: Potential PINN network choices, with $u_{x}$ and $u_{y}$ as outputs of a single network (left), or as outputs of two independent networks with different parameters (right).

The cost function is defined as

\begin{split}
\mathcal{L}&=|u_{x}-u_{x}^{*}|+|u_{y}-u^{*}_{y}|+|\sigma_{xx}-\sigma_{xx}^{*}|+|\sigma_{yy}-\sigma_{yy}^{*}|+|\sigma_{xy}-\sigma_{xy}^{*}|\\
&+|\sigma_{xx,x}+\sigma_{xy,y}+f_{x}^{*}|+|\sigma_{xy,x}+\sigma_{yy,y}+f_{y}^{*}|\\
&+|(\lambda+2\mu)\varepsilon_{xx}+\lambda\varepsilon_{yy}-\sigma_{xx}|+|(\lambda+2\mu)\varepsilon_{yy}+\lambda\varepsilon_{xx}-\sigma_{yy}|+|2\mu\varepsilon_{xy}-\sigma_{xy}|.
\end{split} (9)

The quantities with asterisks represent given data. We will train the networks so that their output values are as close as possible to the data, which may be real field data or, in this paper, synthetic data from the exact solution to the problem or the result of a high-fidelity simulation. The values without asterisks represent either direct outputs of the network (e.g., $u_{x}$ or $\sigma_{xx}$; see Eq. (8)) or quantities obtained through automatic graph-based differentiation [35] of the network outputs (e.g., $\varepsilon_{xx}=u_{x,x}$). In Eq. (9), $f_{x}^{*}$ and $f_{y}^{*}$ represent data on the body forces obtained as $f_{i}^{*}=-\sigma_{ij,j}^{*}$.

The different terms in the cost function represent measures of the error in the displacement and stress fields, the momentum balance, and the constitutive law. This cost function can be used for deep-learning-based solution of PDEs as well as for identification of the model parameters. For the solution of PDEs, $\lambda$ and $\mu$ are treated as fixed numbers in the network. For parameter identification, $\lambda$ and $\mu$ are treated as network parameters that change during the training phase (see Fig. 4). In TensorFlow [30] this can be accomplished by defining $\lambda$ and $\mu$ as Constant (PDE solution) or Variable (parameter identification) objects, respectively. We set up the problem using the SciANN [40] framework, a high-level Keras [37] wrapper for physics-informed deep learning and scientific computations. Experimenting with all of the previously mentioned network choices can be easily done in SciANN with minimal coding; the code for some of the examples solved here is available at https://github.com/sciann/examples.
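As a hedged illustration of this multi-network setup, the sketch below constructs the displacement and stress networks of Eq. (8) and the constitutive residuals of Eq. (9) in SciANN, with $\lambda$ and $\mu$ as trainable parameters. Layer sizes, initial parameter values, and some argument names are assumptions and may differ between SciANN versions; the momentum-balance terms, which require the body-force data, are omitted for brevity:

```python
import sciann as sn

# Hedged sketch of the multi-network architecture of Fig. 4 (layer sizes and
# argument names are illustrative and may vary across SciANN versions).
x, y = sn.Variable("x"), sn.Variable("y")
layers = 10 * [20]
ux  = sn.Functional("ux",  [x, y], layers, "tanh")
uy  = sn.Functional("uy",  [x, y], layers, "tanh")
sxx = sn.Functional("sxx", [x, y], layers, "tanh")
syy = sn.Functional("syy", [x, y], layers, "tanh")
sxy = sn.Functional("sxy", [x, y], layers, "tanh")

# Lame parameters defined as trainable parameters for identification; for a
# forward (solution) problem they would instead be fixed constants.
lmbd = sn.Parameter(2.0, inputs=[x, y], name="lambda")
mu   = sn.Parameter(1.0, inputs=[x, y], name="mu")

# Strains obtained by automatic differentiation of the displacement networks.
exx = sn.diff(ux, x)
eyy = sn.diff(uy, y)
exy = 0.5 * (sn.diff(ux, y) + sn.diff(uy, x))

# Constitutive residuals entering the loss (9); data terms and momentum-balance
# terms would be added analogously.
c_xx = (lmbd + 2 * mu) * exx + lmbd * eyy - sxx
c_yy = (lmbd + 2 * mu) * eyy + lmbd * exx - syy
c_xy = 2 * mu * exy - sxy

model = sn.SciModel([x, y], [ux, uy, sxx, syy, sxy, c_xx, c_yy, c_xy])
```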

Figure 4: Network architecture of choice used in this study. We define five networks, one for each variable of interest, i.e., $u_{x},u_{y},\sigma_{xx},\sigma_{yy},\sigma_{xy}$. Each network has $(x,y)$ as input features.

3.3 Identification of model parameters: PINN trained on the exact solution

Here, we use PINN to identify the model parameters $\lambda$ and $\mu$. Our data corresponds to the exact solution with parameter values $\lambda=1$, $\mu=0.5$, and $Q=4$. Our default dataset consists of $100\times 100$ sample points, uniformly distributed. We study how the accuracy and the efficiency of the identification process depend on the architecture and functional form of the network; the available data; and whether we use one or several independent networks for the different quantities of interest. To study the impact of the architecture and functional form of the ANN, we use four different networks with either 5 or 10 hidden layers, and either 20 or 50 neurons per layer; see Table 1. The role of the network functional form is studied by comparing the performance of the two most widely used activation functions, i.e., $\tanh$ and ReLU, where $\textrm{ReLU}(x)=\max(0,x)$ [1].

Studying the impact of the available data on the identification process is crucial because we are interested in identifying the model parameters with as little data as possible. We undertake the analysis considering two scenarios:

  (a) Stress-complete data: In this case, we have data at a set of points for the displacements and their first-order derivatives, that is, $u_{x}^{*}$, $u_{y}^{*}$, $\sigma_{xx}^{*}$, $\sigma_{yy}^{*}$, $\sigma_{xy}^{*}$. Because our cost function (9) also involves data that depends on the stress derivatives ($f_{x}^{*}$ and $f_{y}^{*}$), this approach relies on an additional algorithmic procedure for differentiation of the stresses. In this section we compute the stress derivatives using second-order central finite-difference approximations (a sketch of this pre-processing step is given after this list).

  (b) Force-complete data: In this scenario, we have data at a set of points for the displacements, their first derivatives, and their second derivatives. The availability of the displacement second derivatives allows us to determine data for the body forces $f_{x}^{*}$ and $f_{y}^{*}$ using the momentum balance equation without resorting to any differentiation algorithm.
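For the stress-complete case, the body forces can be recovered from the sampled stresses with NumPy; the following is a minimal sketch assuming the stresses are given on a uniform grid with spacings dx and dy:

```python
import numpy as np

# Stress-complete pre-processing: recover the body forces from the momentum
# balance, f_i = -sigma_ij,j, using central differences (np.gradient is
# second-order accurate in the interior). Arrays are indexed as [i, j] ~ (x_i, y_j).
def body_forces_from_stress(sxx, syy, sxy, dx, dy):
    fx = -(np.gradient(sxx, dx, axis=0) + np.gradient(sxy, dy, axis=1))
    fy = -(np.gradient(sxy, dx, axis=0) + np.gradient(syy, dy, axis=1))
    return fx, fy
```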

Table 1: Architecture and number of parameters of the networks used for PINN training.

Network | Hidden layers | Neurons per layer | Parameters (independent networks) | Parameters (single network)
i       | 5             | 20                | 12,336                            | 1,893
ii      | 5             | 50                | 72,816                            | 10,713
iii     | 10            | 20                | 27,036                            | 3,993
iv      | 10            | 50                | 162,066                           | 23,463

In Fig. 5 we compare the evolution of the cost function for stress-complete data (Fig. 5a) and force-complete data (Fig. 5b). Both figures show a comparison of the four network architectures that we study; see Table 1. We find that training on the force-complete data performs slightly better (lower loss) at a given epoch.

The result of convergence of the model identification is shown in Fig. 6. The training converges to the true values of the parameters, i.e., $\lambda=1$ and $\mu=1/2$, for all cases. We find that the optimization converges very quickly on the parameters, while it takes far more epochs to fit the network on the field variables. Additionally, we observe that deeper networks produce less accurate parameters. We attribute the loss of accuracy as we increase the ANN complexity to over-fitting [1, 3]. Convergence of the individual terms in the loss function (9) is shown in Fig. 7 for Net-ii (see Table 1). We find that all terms in the loss, i.e., data-driven and physics-informed, show oscillations during the optimization. Therefore, no individual term is solely responsible for the oscillations in the total loss (Fig. 5).

Figure 5: The result of training networks i, ii, iii, and iv on the analytical dataset $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, and $\sigma_{xy}$; (a) body forces are evaluated from central-difference differentiation of the stress components, (b) body forces are also given analytically.
Figure 6: The result of identification of $\lambda=1$, $\mu=1/2$ for networks i, ii, iii, and iv on the analytical dataset $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, and $\sigma_{xy}$; (a) body forces are evaluated from central-difference differentiation of the stress components, (b) body forces are also given analytically.
Figure 7: Individual terms of the total loss (9) for network ii on the analytical dataset $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, and $\sigma_{xy}$; (a) body forces are evaluated from central-difference differentiation of the stress components, (b) body forces are also given analytically.

The impact of the ANN functional form can be examined by comparing the data in Figs. 5b and 8a, which show the evolution of the cost function using the activation functions $\tanh$ and ReLU, respectively. The function ReLU has discontinuous derivatives, which explains its poor performance for physics-informed deep learning, whose effectiveness relies heavily on accurate evaluation of derivatives.

A comparison of Figs. 5b and 8b shows that using independent networks for displacements and stresses is more effective than using a single network. We find that the single network leads to less accurate elastic parameters because the cross-dependencies of the network outputs through the kinematic and constitutive relations may not be adequately represented by the $\tanh$ activation function.

Figure 8: (a) ReLU activation function on the analytical dataset $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{xy}$, $f_{x}$, and $f_{y}$. (b) Connected network.

Fig. 9 analyzes the effect of the availability of data on the training. We computed the exact solution on four different uniform grids of size $10\times 10$, $40\times 40$, $160\times 160$, and $640\times 640$, and carried out the parameter identification process. We performed the comparison using force-complete data and a network with 10 layers and 20 neurons per layer (network iii). The training process found good approximations to the parameters for all cases, including that with only $10\times 10$ points. The results show that fewer data points require many more epoch cycles, but the overall computational cost is far lower.

Figure 9: Training on datasets of different sizes. The parameters are accurately identified in all cases, although the solution exhibits different levels of accuracy.

3.4 PINN models trained on the FEM solution

Here, we generate synthetic data from FEM solutions, and then perform the training. The domain is discretized with a mesh comprised of $40\times 40$ elements. Four datasets are prepared using quadrilateral bilinear, biquadratic, bicubic, and biquartic Lagrange $C^{0}$ elements using the commercial FEM software COMSOL [41]. We evaluate the FEM displacements, strains, stresses, and stress derivatives at the center of each element. Then, we map the data to a $100\times 100$ training grid using SciPy's griddata module with cubic interpolation. This step is performed as a data-augmentation procedure, which is a common practice in machine learning [1].
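This mapping step can be sketched as follows; the FEM sample locations and values (`points_fem`, `values_fem`) are placeholders, and the unit-square grid bounds are taken from the problem setup:

```python
import numpy as np
from scipy.interpolate import griddata

# Data-augmentation sketch: interpolate quantities sampled at FEM element
# centers onto a uniform 100x100 training grid with cubic interpolation.
xg, yg = np.meshgrid(np.linspace(0.0, 1.0, 100), np.linspace(0.0, 1.0, 100))

def map_to_training_grid(points_fem, values_fem):
    # points_fem: (n, 2) array of element-center coordinates
    # values_fem: (n,) array of a sampled field (e.g., u_x or sigma_xx)
    return griddata(points_fem, values_fem, (xg, yg), method="cubic")
```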

To analyze the importance of data satisfying the governing equations of the system, we focus our attention on network ii and we study cases with stress- and force-complete data. The results of training are presented in Fig. 10. As can be seen here, the bilinear element performs poorly on the learning and identification. The performance of training on the other elements is good, comparable to that using the analytical solution. Further analysis shows that this is indeed expected as FEM differentiation of bilinear elements provides a poor approximation of the body forces. The error in the body forces is shown in Fig. 11, which indicates a high error for bilinear elements. We conclude that the standard bilinear elements are not suitable for this problem to generate numerical data for deep learning. Fig. 10a2 confirms that pre-processing the data can remove the error that was present in the numerical solution with bilinear elements, and enable the optimization to successfully complete the identification.

Figure 10: (a1) Training on the FEM dataset using the $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{xy}$, $f_{x}$, and $f_{y}$ components. (a2) Training with body forces $f_{x}$ and $f_{y}$ evaluated from central-difference differentiation of the stress components.
Figure 11: The error in the bilinear, biquadratic, bicubic, and biquartic FEM data, evaluated as the difference between the FEM evaluation of the momentum relation, i.e., $\sigma_{ij,j}$, and the true body forces $f_{i}^{*}$ in the $x$ (top) and $y$ (bottom) directions.

3.5 PINN models trained on the IGA solution

Having observed the lowest loss $\mathcal{L}$ on the analytical solution, we decided to study the influence of the global continuity of the numerical solution. We generated a $C^{3}$-continuous dataset using Isogeometric Analysis [42], again on a $40\times 40$ grid. The data are then mapped onto a $100\times 100$ grid and used to train the PINN models. The training results are shown in Fig. 12. The results are very similar to those obtained with the high-order FEM datasets.

Figure 12: (a1) Training on the IGA dataset using the $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{xy}$, $f_{x}$, and $f_{y}$ components. (a2) Training with body forces $f_{x}$ and $f_{y}$ evaluated from central-difference differentiation of the stress components.

3.6 Identification using transfer learning

Here we explore the applicability of our PINN framework to transfer learning: a pre-trained neural network is used to perform identification on a new dataset. The expectation is that, since the initial state of the neural network is no longer randomly chosen, training should converge faster to the solution and parameters of the new data. This is crucial for many practical applications, including adaptation to new data for online search or purchase history [34], or in geosciences, where we can train a representative PINN in highly instrumented regions and use it at other locations with limited observational datasets. To this end, we use the pre-trained model on Net-iii (Fig. 5), which was trained on a dataset with $\lambda=1.0$ and $\mu=0.5$, and we explore how the loss evolves and the training converges when data is generated with different values of $\mu\in\{2.0,1.5,1.0,0.1\}$.

In Fig. 13 we show the convergence of the model on the different datasets. Note that the loss is normalized by the initial value $\mathcal{L}_{0}$ from the network pre-trained on $\mu=0.5$ (Fig. 5). As can be seen here, re-training on a new dataset costs only a few hundred epochs and starts from a smaller initial value of the loss. This points to an advantage of deep learning and PINN: re-training on similar data is much less costly than classical methods that rely on forward simulations.

Figure 13: Identification on new datasets generated with different values of $\mu$ using a neural network pre-trained on $\mu=0.5$. The re-training takes far fewer epochs to converge, with a much smaller initial value of the loss $\mathcal{L}$.

3.7 Application to sensitivity analysis

Performing sensitivity analysis is an expensive task when the analytical solution is not available, since it requires performing many forward numerical simulations. Alternatively, if we can construct a surrogate model that is a function of the parameters of interest, then performing sensitivity analysis becomes tractable. However, construction of such a surrogate model is itself an expensive task within classical frameworks. Within PINN, however, this seems to be naturally possible. Let us suppose that the parameter of interest is the shear modulus $\mu$. Consider an ANN model with inputs $(x,y,\mu)$ and outputs $(u_{x},u_{y},\sigma_{xx},\sigma_{yy},\sigma_{xy})$. We can, therefore, use a similar framework to construct a model that is a function of $\mu$ in addition to the space variables. Again, PINN can constrain the model to adapt to the physics of interest, and therefore less data is needed to construct such a model.
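A minimal sketch of such a surrogate network, with $\mu$ appended to the input features, is shown below; the layer sizes are illustrative assumptions:

```python
import tensorflow as tf

# Surrogate-model sketch for sensitivity analysis: the shear modulus mu is an
# extra input feature, so each output network maps (x, y, mu) to one field
# variable. Width and depth are illustrative.
def make_surrogate_net(width=50, depth=5):
    layers = [tf.keras.layers.Input(shape=(3,))]               # (x, y, mu)
    layers += [tf.keras.layers.Dense(width, activation="tanh") for _ in range(depth)]
    layers += [tf.keras.layers.Dense(1, activation=None)]      # e.g., u_x(x, y, mu)
    return tf.keras.Sequential(layers)
```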

Here, we explore whether a PINN model trained on multiple datasets generated with various material parameters, i.e., different values of $\mu$, can be used as a surrogate model to perform sensitivity analysis. The network in Fig. 4 is now slightly adapted to carry $\mu$ as an extra input (in addition to $x,y$). The training set is prepared based on $\lambda=1$ and $\mu\in\{1/4,2/3,3/2,4\}$. Note that there is no identification in this case, and therefore the parameters $\lambda$ and $\mu$ are known at any given training point. The results of the analysis are shown in Fig. 14. For a wide range of values of $\mu\in(0,9)$, the model performs very well in terms of displacements; it is less accurate, but still very useful, in terms of stresses, with a maximum error for near-incompressible conditions, $\mu\approx 0$.

Figure 14: Application to sensitivity analysis: the model is trained on multiple datasets generated with different values of $\mu\in\{1/4,2/3,3/2,4\}$ (highlighted with dot-dashed lines). The model is then tested on a continuous range of values $\mu\in(0,9)$. The error is defined as $|\circ-\circ^{*}|/|\circ^{*}|$ at the point where $\circ^{*}$ is maximum.

4 Extension to Nonlinear Elastoplasticity

In this section, we discuss the application of PINN to nonlinear solid mechanics problems undergoing elastic and plastic deformation. We use the von Mises elastoplastic constitutive model—a commonly used model to describe mechanical behavior of solid materials, in particular metals. We first describe the extension of the linear-elasticity relations in Eq. (1) to the von Mises elastoplastic relations. We then discuss the neural-network setup and apply the PINN framework to identify parameters of a classic problem: a perforated strip subjected to uniaxial extension.

4.1 von Mises elastoplasticity

We adopt the classic elastoplasticity postulate of additive decomposition of the strain tensor [43],

\varepsilon_{ij}=\varepsilon_{ij}^{e}+\varepsilon_{ij}^{p}. \qquad (10)

The stress tensor is now linearly dependent on the elastic strain tensor:

\sigma_{ij}=\lambda\varepsilon_{kk}^{e}\delta_{ij}+2\mu\varepsilon_{ij}^{e}. \qquad (11)

The plastic part of the strain tensor is evaluated through a plasticity model. The von Mises model implies that plastic deformation occurs in the direction normal to the yield surface $\mathcal{F}$, defined as $\mathcal{F}(\sigma_{ij}):=q-\sigma_{Y}$, as

\varepsilon^{p}_{ij}=\gamma\frac{\partial\mathcal{F}}{\partial\sigma_{ij}}, \qquad (12)

where $\sigma_{Y}$ is the yield stress and $q$ is the equivalent stress, defined as $q=\sqrt{3/2\,s_{ij}s_{ij}}$, with $s_{ij}$ the components of the deviatoric stress tensor, $s_{ij}=\sigma_{ij}-\sigma_{kk}/3\,\delta_{ij}$. The strain remains strictly elastic as long as the state of stress $\sigma_{ij}$ remains inside the yield surface, $\mathcal{F}<0$. Plastic deformation occurs when the state of stress is on the yield surface, $\mathcal{F}=0$. The condition $\mathcal{F}>0$ is associated with an inadmissible state of stress. The parameter $\gamma$ is the plastic multiplier, subject to the condition $\gamma\geq 0$, and evaluated through a predictor–corrector algorithm by imposing the condition $\mathcal{F}\leq 0$ [43]. In the case of von Mises plasticity, the volumetric plastic deformation is zero, $\varepsilon_{kk}^{p}=0$. It can be shown that the plastic multiplier $\gamma$ is equal to the equivalent plastic strain $\bar{\varepsilon}^{p}=\sqrt{2/3\,e_{ij}^{p}e_{ij}^{p}}$, where $e_{ij}$ are the components of the deviatoric strain tensor, $e_{ij}=\varepsilon_{ij}-\varepsilon_{kk}/3\,\delta_{ij}$.

Therefore, the elastoplastic relations for a plane-strain problem can be summarized as:

\begin{split}
\sigma_{ij,j}+f_{i}&=0,\\
\sigma_{ij}&=\lambda\varepsilon_{kk}\delta_{ij}+2\mu\varepsilon^{e}_{ij}=(\lambda+2/3\,\mu)\varepsilon_{kk}\delta_{ij}+s_{ij},\\
s_{ij}&=2\mu\,(e_{ij}-e^{p}_{ij}),\\
e^{p}_{ij}&=\bar{\varepsilon}^{p}\frac{\partial\mathcal{F}}{\partial\sigma_{ij}}=\bar{\varepsilon}^{p}\,\frac{3}{2}\,\frac{s_{ij}}{q},
\end{split} (13)

subject to the elastoplasticity conditions:

\bar{\varepsilon}^{p}\geq 0,\quad\mathcal{F}\leq 0,\quad\bar{\varepsilon}^{p}\mathcal{F}=0, \qquad (14)

also known as the Karush–Kuhn–Tucker (KKT) conditions. For the von Mises model, the plastic multiplier $\bar{\varepsilon}^{p}$ can be expressed as

\bar{\varepsilon}^{p}=\bar{\varepsilon}-\frac{\sigma_{Y}}{3\mu}\geq 0, \qquad (15)

where $\bar{\varepsilon}$ is the total equivalent strain, i.e., $\bar{\varepsilon}=\sqrt{2/3\,e_{ij}e_{ij}}$. Therefore, the parameters of this model are the Lamé elastic parameters $\lambda$ and $\mu$, and the yield stress $\sigma_{Y}$.
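The sketch below evaluates these von Mises quantities for a plane-strain state from the in-plane strain components; it is a minimal NumPy illustration of Eqs. (13)–(15), assuming the inputs are arrays of equal shape:

```python
import numpy as np

# Von Mises quantities under plane strain (eps_zz = 0): deviatoric strains,
# total equivalent strain, and the plastic multiplier of Eq. (15), clipped at
# zero so that the elastic regime gives eps_bar_p = 0.
def von_mises_quantities(exx, eyy, exy, mu, sigma_y):
    ekk = exx + eyy                                   # volumetric strain
    dev_xx, dev_yy, dev_zz = exx - ekk / 3, eyy - ekk / 3, -ekk / 3
    eps_bar = np.sqrt(2.0 / 3.0 * (dev_xx**2 + dev_yy**2 + dev_zz**2 + 2.0 * exy**2))
    eps_bar_p = np.maximum(eps_bar - sigma_y / (3.0 * mu), 0.0)
    return eps_bar, eps_bar_p
```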

4.2 Neural Network setup

The solution variables for a two-dimensional problem are $u_{x}$, $u_{y}$, $\varepsilon_{xx}$, $\varepsilon_{yy}$, $\varepsilon_{xy}$, $\varepsilon_{xx}^{p}$, $\varepsilon_{yy}^{p}$, $\varepsilon_{zz}^{p}$, $\varepsilon_{xy}^{p}$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{zz}$, $\sigma_{xy}$. Since the out-of-plane components are no longer zero, they must be reflected in the choice of independent networks. Following the discussion for the linear elasticity case, we approximate the displacement and stress components $u_{x},u_{y},\sigma_{xx},\sigma_{yy},\sigma_{zz},\sigma_{xy}$ with nonlinear neural networks as:

\begin{split}
u_{x}&\approx\mathcal{N}_{u_{x}}(\mathbf{x}),\\
u_{y}&\approx\mathcal{N}_{u_{y}}(\mathbf{x}),\\
\sigma_{xx}&\approx\mathcal{N}_{\sigma_{xx}}(\mathbf{x}),\\
\sigma_{yy}&\approx\mathcal{N}_{\sigma_{yy}}(\mathbf{x}),\\
\sigma_{zz}&\approx\mathcal{N}_{\sigma_{zz}}(\mathbf{x}),\\
\sigma_{xy}&\approx\mathcal{N}_{\sigma_{xy}}(\mathbf{x}).
\end{split} (16)

The associated cost function is then defined as

\begin{split}
\mathcal{L}&=|u_{x}-u_{x}^{*}|+|u_{y}-u^{*}_{y}|\\
&+|\sigma_{xx}-\sigma_{xx}^{*}|+|\sigma_{yy}-\sigma_{yy}^{*}|+|\sigma_{zz}-\sigma_{zz}^{*}|+|\sigma_{xy}-\sigma_{xy}^{*}|\\
&+|\sigma_{xx,x}+\sigma_{xy,y}-f_{x}^{*}|+|\sigma_{xy,x}+\sigma_{yy,y}-f_{y}^{*}|\\
&+|(\lambda+2/3\,\mu)\varepsilon_{kk}+2\mu(e_{xx}-e^{p}_{xx})-\sigma_{xx}|\\
&+|(\lambda+2/3\,\mu)\varepsilon_{kk}+2\mu(e_{yy}-e^{p}_{yy})-\sigma_{yy}|\\
&+|(\lambda+2/3\,\mu)\varepsilon_{kk}+2\mu(e_{zz}-e^{p}_{zz})-\sigma_{zz}|\\
&+|2\mu(e_{xy}-e^{p}_{xy})-\sigma_{xy}|+|(\bar{\varepsilon}-\sigma_{Y}/3\mu)-\bar{\varepsilon}^{p}|\\
&+|(1-\textrm{sign}(\bar{\varepsilon}^{p}))\,|\bar{\varepsilon}^{p}||+|(1+\textrm{sign}(\mathcal{F}))\,|\mathcal{F}||+|\bar{\varepsilon}^{p}\mathcal{F}|.
\end{split} (17)

The KKT positivity and negativity conditions are imposed through penalty constraints in the loss function. For instance, $\bar{\varepsilon}^{p}\geq 0$ is incorporated in the loss as $(1-\textrm{sign}(\bar{\varepsilon}^{p}))|\bar{\varepsilon}^{p}|$. Therefore, for values of $\bar{\varepsilon}^{p}<0$, the resulting ‘cost’ is $2|\bar{\varepsilon}^{p}|$, which should vanish.
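These penalty terms can be assembled as in the following TensorFlow sketch, where `eps_bar_p` and `F` are the plastic multiplier and yield function evaluated at the training points:

```python
import tensorflow as tf

# Sketch of the sign-based KKT penalties at the end of the loss (17):
# eps_bar_p >= 0, F <= 0, and the complementarity condition eps_bar_p * F = 0.
def kkt_penalty(eps_bar_p, F):
    positivity = (1.0 - tf.sign(eps_bar_p)) * tf.abs(eps_bar_p)  # nonzero only if eps_bar_p < 0
    negativity = (1.0 + tf.sign(F)) * tf.abs(F)                  # nonzero only if F > 0
    complementarity = tf.abs(eps_bar_p * F)
    return tf.reduce_mean(positivity + negativity + complementarity)
```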

4.3 Illustrative example

We use a classic example to illustrate our framework: a perforated strip subjected to uniaxial extension [44, 43]. Consider a plate of dimensions $200~\text{mm}\times 360~\text{mm}$, with a circular hole of diameter $100~\text{mm}$ located in the center of the plate. The plate is subjected to extension displacements of $\delta=1~\text{mm}$ along the short edge, under plane-strain conditions, and without body forces, $f_{i}=0$. The material parameters are $\lambda=19.44~\text{GPa}$, $\mu=29.17~\text{GPa}$, and $\sigma_{Y}=243.0~\text{MPa}$. Due to symmetry, only a quarter of the domain needs to be considered in the simulation. The synthetic data is generated from a high-fidelity FEM simulation using the COMSOL software [41] on a mesh of 13,041 quartic triangular elements (Fig. 15). The plate undergoes significant plastic deformation around the circular hole, as can be seen from the $\varepsilon_{ij}^{p}$ contours in Fig. 15. This results in localized deformation in the form of a shear band. While the strain exhibits localization, the stress field remains continuous and smooth, a behavior that is due to the choice of a perfect-plasticity model with no hardening.

We use 2,000 data points from this reference solution, randomly distributed in the simulation domain, to provide the training data. The PINN training is performed using networks with 4 layers, each with 100 neurons, and with a hyperbolic-tangent activation function. The optimization parameters are the same as those used for the linear elasticity problem. The results predicted by the PINN approach match the reference results very closely, as evidenced by: (1) the very small errors in each of the components of the solution, except for the out-of-plane plastic strain components (Fig. 16); and (2) the precise identification of the yield stress $\sigma_{Y}$ and the relatively accurate identification of the elastic parameters $\lambda$ and $\mu$, yielding estimated values $\lambda=18.3~\text{GPa}$, $\mu=27.6~\text{GPa}$, and $\sigma_{Y}=243.0~\text{MPa}$.

Figure 15: Reference solution of the extension loading of a perforated plate from a high-fidelity FEM simulation. The true parameters are $\lambda=19.44~\text{GPa}$, $\mu=29.17~\text{GPa}$, and $\sigma_{Y}=243.0~\text{MPa}$.
Figure 16: Error in predicted values from the PINN framework for displacements, strains, plastic strains and stresses.

5 Conclusions

We study the application of a class of deep learning, known as Physics-Informed Neural Networks (PINN), for solution and discovery in solid mechanics. In this work, we formulate and apply the framework to a linear elastostatics problem, which we analyze in detail, but then illustrate the application of the method to nonlinear elastoplasticity. We study the sensitivity of the proposed framework to noise in data coming from different numerical techniques. We find that the optimizer performs much better on data from high-order classical finite elements, or with methods with enhanced continuity such as Isogeometric Analysis. We analyze the impact of the size and depth of the network, and the size of the dataset from uniform sampling of the numerical solution—an aspect that is important in practice given the cost of a dense monitoring network. We find that the proposed PINN approach is able to converge to the solution and identify the parameters quite efficiently with as little as 100 data points.

We also explore transfer learning, that is, the use of a pre-trained neural network to perform training on new datasets with different parameters. We find that training converges much faster in this case. Lastly, we study the applicability of the model as a surrogate model for sensitivity analysis. To this end, we introduce the shear modulus $\mu$ as an input variable to the network. When training on only four values of $\mu$, we find that the network predicts the solution quite accurately over a wide range of values of $\mu$, a feature that is indicative of the robustness of the approach.

Despite the success exhibited by the PINN approach, we have found that it faces challenges when dealing with problems with discontinuous solutions. The network architecture is less accurate on problems with localized high gradients resulting from discontinuities in the material properties or boundary conditions. We find that, in those cases, the results are artificially diffuse where they should be sharp. We speculate that the underlying reason for this behavior is the particular architecture of the network, where the input variables are only the spatial dimensions ($x$ and $y$), which renders the network unable to produce the variability required by gradient-based optimization to capture solutions with high gradients. Addressing this limitation is an exciting avenue for future work in machine-learning applications to solid mechanics.

Acknowledgements

This work was funded by the KFUPM-MIT collaborative agreement ‘Multiscale Reservoir Science’.

References