
A deep learning framework for solution and discovery in solid mechanics

Ehsan Haghighat (Massachusetts Institute of Technology, Cambridge, MA), Maziar Raissi (University of Colorado Boulder, Boulder, CO), Adrian Moure and Hector Gomez (Purdue University, West Lafayette, IN), and Ruben Juanes (Massachusetts Institute of Technology, Cambridge, MA)
Abstract

We present the application of a class of deep learning, known as Physics-Informed Neural Networks (PINN), to learning and discovery in solid mechanics. We explain how to incorporate the momentum balance and constitutive relations into PINN, explore in detail the application to linear elasticity, and illustrate the extension to nonlinear problems through an example that showcases von Mises elastoplasticity. While common PINN algorithms are based on training one deep neural network (DNN), we propose a multi-network model that results in a more accurate representation of the field variables. To validate the model, we test the framework on synthetic data generated from analytical and numerical reference solutions. We study the convergence of the PINN model, and show that Isogeometric Analysis (IGA) results in superior accuracy and convergence characteristics compared with the classic low-order Finite Element Method (FEM). We also show the applicability of the framework for transfer learning, and find vastly accelerated convergence during network re-training. Finally, we find that honoring the physics leads to improved robustness: a PINN model trained on only a few parameter values can accurately predict the solution for a wide range of parameters new to the network, thus pointing to an important application of this framework to sensitivity analysis and surrogate modeling.

keywords:
Artificial neural network , Physics-informed deep learning , Inversion , Transfer learning , Linear elasticity , Elastoplasticity

1 Introduction

Over the past few years, there has been a revolution in the successful application of Artificial Neural Networks (ANN), also commonly referred to as Deep Neural Networks (DNN) and Deep Learning (DL), in various fields including image classification, handwriting recognition, speech recognition and translation, and computer vision. These ANN approaches have led to a sea change in the performance of search engines, autonomous driving, e-commerce, and photography (see [1, 2, 3] for a review). In engineering and science, ANNs have been applied to an increasing number of areas, including geosciences [4, 5, 6, 7, 8], material science [9, 10, 11, 12], fluid mechanics [13, 14], genetics [15], and infrastructure health monitoring [16, 17], to name a few examples. In the solid and geomechanics community, deep learning has been used primarily for material modeling, in an attempt to replace classical constitutive models with ANNs [18, 19, 20]. In these applications, training of the network, i.e., evaluation of the network parameters, is carried out by minimizing the norm of the distance between the network output (prediction) and the true output (training data). In this paper, we will refer to ANNs trained in this way as “data-driven.”

A different class of ANNs, known as Physics-Informed Neural Networks (PINN), was introduced recently [21, 22, 23, 24, 25]. This concept of ANNs was developed to endow the network model with known equations that govern the physics of a system. The training of PINNs is performed with a cost function that, in addition to data, includes the governing equations, initial and boundary conditions. This architecture can be used for solution and discovery (finding parameters) of systems of ordinary differential equations (ODEs) and partial differential equations (PDEs). While solving ODEs and PDEs with ANNs is not a new topic, e.g., [26, 27, 28], the success of these recent studies can be broadly attributed to: (1) the choice of network architecture, i.e., the set of inputs and outputs of the ANN, so that one can impose governing equations on the network; (2) algorithmic advances, including graph-based automatic differentiation for accurate differentiation of ANN functionals and for error back-propagation; and (3) availability of advanced machine-learning software with CPU and GPU parallel processing capabilities including Theano [29] and TensorFlow [30].

This framework has been used for solution and discovery of Schrodinger, Allen–Cahn, and Navier–Stokes equations [22, 21]. It has also been used for solution of high-dimensional stochastic PDEs [23]. As pointed out in [23], this approach can be considered as a class of Reinforcement Learning [31], where the learning is on maximizing an incentive or minimizing a loss rather than direct training on data. If the network prediction does not satisfy a governing equation, it will result in an increase in the cost and therefore the learning traverses a path that minimizes that cost.

Here, we focus on the novel application of PINNs to solution and discovery in solid mechanics. We study linear elasticity in detail, and then illustrate the performance on nonlinear von Mises elastoplasticity. Since parameters of the governing PDEs can also be defined as trainable parameters, the framework inherently allows us to perform parameter identification (model inversion). We validate the framework on synthetic data generated from low-order and high-order Finite Element Methods (FEM) and from Isogeometric Analysis (IGA) [32, 33]. These datasets satisfy the governing equations with different orders of accuracy, where the error can be considered as noise in the data. We find that the training converges faster on more accurate datasets, pointing to the importance of higher-order numerical methods for pre-training ANNs. We also find that if the data is pre-processed properly, the training converges to the correct solution and correct parameters even on data generated with a coarse mesh and low-order FEM, an important result that illustrates the robustness of the proposed approach. Finally, we find that, due to the imposition of the physics constraints, the training converges on a very sparse dataset, which is a crucial property in practice given that the installation of a dense network of sensors can be very costly.

Parameter estimation (identification) of complex models is a challenging task that requires a large number of forward simulations, depending on model complexity and the number of parameters. As a result, most inversion techniques have been applied to simplified models. The use of PINNs, however, allows us to perform identification simultaneously with fitting the ANN model on data [22]. This property highlights the potential of this approach compared with classical methods. We explore the application of PINN models for identification of multiple datasets generated with different parameters. Similar to transfer learning, where a pre-trained model is used as the initial state of the network [34], we perform re-training on new datasets starting from a previously trained network on a different dataset (with different parameters). We find that the re-training and identification of other datasets take far less time. Since the successfully trained PINN model should also satisfy the physics constraints, it is in effect a surrogate model that can be used for extrapolation on unexplored data. To test this property, we train a network on four datasets with different parameters and then test it on a wide range of new parameter sets, and find that the results remain relatively accurate. This property points to the applicability of PINN models for sensitivity analysis, where classical approaches typically require an exceedingly large number of forward simulations.

2 Physics-Informed Neural Networks: Linear Elasticity

In this section, we review the equations of linear elastostatics with emphasis on PINN implementation.

2.1 Linear elasticity

The equations expressing momentum balance, the constitutive model and the kinematic relations are, respectively,

\begin{split}
\sigma_{ij,j}+f_{i}&=0,\\
\sigma_{ij}&=\lambda\delta_{ij}\varepsilon_{kk}+2\mu\varepsilon_{ij},\\
\varepsilon_{ij}&=\frac{1}{2}\left(u_{i,j}+u_{j,i}\right).
\end{split} (1)

Here, $\sigma_{ij}$ denotes the Cauchy stress tensor. For the two-dimensional problems considered here, $i,j=1,2$ (or $i,j=x,y$). We use the summation convention, and a comma in the subscript denotes partial differentiation. The function $f_{i}$ denotes a body force, $u_{i}$ represents the displacements, $\varepsilon_{ij}$ is the infinitesimal strain tensor, and $\delta_{ij}$ is the Kronecker delta. The Lamé parameters $\lambda$ and $\mu$ are the quantities to be inferred using PINN.

2.2 Introduction to Physics-Informed Neural Networks

In this section, we provide an overview of the Physics-Informed Neural Networks (PINN) architecture, with emphasis on their application to model inversion. Let $\mathcal{N}(\mathbf{x};\mathbf{W},\mathbf{b}):\mathbb{R}^{d_{\mathbf{x}}}\rightarrow\mathbb{R}^{d_{\mathbf{y}}}$ be an $L$-layer neural network with input vector $\mathbf{x}$, output vector $\mathbf{y}$, and network parameters $\mathbf{W},\mathbf{b}$. This network is a feed-forward network, meaning that each layer creates data for the next layer through the following nested transformations:

\mathbf{z}^{l}=\sigma^{l}\left(\mathbf{W}^{l}\mathbf{z}^{l-1}+\mathbf{b}^{l}\right),\quad l=1,\dots,L, \qquad (2)

where $\mathbf{z}^{0}\equiv\mathbf{x}$ and $\mathbf{z}^{L}\equiv\mathbf{y}$ are the inputs and outputs of the model, and $\mathbf{W}^{l},\mathbf{b}^{l}$ are the parameters of each layer $l$, known as weights and biases, respectively. The functions $\sigma^{l}$ are called activation functions and make the network nonlinear with respect to the inputs. For instance, an ANN functional of some field variable, such as the displacement $u(\mathbf{x})$, with three hidden layers and with $\sigma^{l}=\tanh$ as the activation function for all layers except the last can be written as

\begin{split}
\mathbf{z}^{1}(\mathbf{x})&=\tanh(\mathbf{W}^{0}\mathbf{x}+\mathbf{b}^{0}),\\
\mathbf{z}^{2}(\mathbf{x})&=\tanh(\mathbf{W}^{1}\mathbf{z}^{1}+\mathbf{b}^{1}),\\
\mathbf{z}^{3}(\mathbf{x})&=\tanh(\mathbf{W}^{2}\mathbf{z}^{2}+\mathbf{b}^{2}),\\
u(\mathbf{x})&=\mathbf{W}^{3}\mathbf{z}^{3}+\mathbf{b}^{3}.
\end{split} (3)

This model can be considered as an approximate solution for the field variable $u$ of a partial differential equation.
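As a concrete illustration, the network of Eq. (3) can be written in a few lines of Keras. This is a minimal sketch; the layer width of 20 neurons is an illustrative assumption, not a value prescribed above:

```python
import tensorflow as tf

# Sketch of the network in Eq. (3): three tanh hidden layers and a linear
# output layer approximating a scalar field u(x). The layer width is an
# illustrative assumption.
def make_field_net(n_inputs=2, width=20):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_inputs,)),          # features, e.g. (x, y)
        tf.keras.layers.Dense(width, activation="tanh"),   # z^1
        tf.keras.layers.Dense(width, activation="tanh"),   # z^2
        tf.keras.layers.Dense(width, activation="tanh"),   # z^3
        tf.keras.layers.Dense(1, activation=None),         # u(x), linear last layer
    ])
```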

In the PINN architecture, the network inputs (also known as features) are the space and time variables, i.e., $(x,y,z,t)$ in Cartesian coordinates, which makes it meaningful to perform differentiation of the network’s output with respect to any of the input variables. Classical implementations based on finite-difference approximations are not accurate when applied to deep networks (see [35] for a review). Thanks to the modern graph-based implementation of the feed-forward network (e.g., Theano [29], TensorFlow [30], MXNet [36]), this can be carried out using automatic differentiation at machine precision, therefore allowing for many hidden layers to represent nonlinear response. Hence, evaluation of a partial differential operator $\mathcal{P}$ acting on $u$ is achieved naturally with graph-based differentiation and can then be incorporated in the cost function along with initial and boundary conditions as:

\mathcal{L}=|u-u^{*}|+|\mathcal{P}u-0^{*}|+|u-u^{*}|_{\partial\Omega}+|u_{0}-u_{0}^{*}|, \qquad (4)

where $\partial\Omega$ is the domain boundary, $u_{0}-u_{0}^{*}$ is the initial condition at $t=t_{0}$, and $0^{*}$ indicates the expected (true) value for the differential relation $\mathcal{P}u$ at any given training point. The norm $|\cdot|$ of a generic quantity $g$ defined in $\Omega$ denotes $\frac{1}{N}\sum_{i=1}^{N}g(\mathbf{x}_{i})^{2}$, where the $\mathbf{x}_{i}$'s are the spatial points where the data is known. The dataset is then fed to the neural network and an optimization is performed to evaluate all the parameters of the model, including the parameters of the PDE.
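The sketch below illustrates how such a cost function can be assembled with graph-based automatic differentiation in TensorFlow. The operator is taken as $\mathcal{P}u=\partial u/\partial x$ purely for illustration, and `u_net`, `x_train`, and `u_star` are hypothetical placeholders:

```python
import tensorflow as tf

# Sketch of a PINN loss of the form of Eq. (4): a data-mismatch term plus the
# mean-squared residual of a differential operator evaluated with automatic
# differentiation. Here P u = du/dx is a stand-in for the actual operator.
def pinn_loss(u_net, x_train, u_star):
    with tf.GradientTape() as tape:
        tape.watch(x_train)
        u = u_net(x_train)
    du_dx = tape.gradient(u, x_train)                    # derivative of the ANN output
    data_term = tf.reduce_mean(tf.square(u - u_star))    # |u - u*|
    physics_term = tf.reduce_mean(tf.square(du_dx))      # |P u - 0*|
    return data_term + physics_term
```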

2.3 Training PINN

Different algorithms can be used to train a neural network. Among the choices available in Keras [37] we use the Adam optimization scheme [38], which we have found to outperform other choices, such as Adagrad [39], for this task. Several algorithmic parameters affect the rate of convergence of the network training. Here we adopt the terminology in Keras [37], but the terminology in other modern machine-learning packages is similar. The algorithmic parameters include batch-size, epochs, shuffle, and patience. Batch-size controls the number of samples from a dataset used to evaluate one gradient update. A batch-size of 1 corresponds to full stochastic gradient descent. One epoch is one round of training on a dataset. If a dataset is shuffled, then a new round of training (epoch) results in an updated parameter set because the batch gradients are evaluated on different batches. It is common to re-shuffle a dataset many times and perform the back-propagation updates. The optimizer may, however, stop earlier if it finds that new rounds of epochs are not improving the cost function. That is where the last keyword, patience, comes in. This is mainly because we are dealing with non-convex optimization and we need to test the training from different starting points and in different directions to build confidence in the parameters evaluated from minimization of the cost function on a dataset. Patience is the parameter that controls when the optimizer should stop the training.

There are three ways to train the network: (1) generate a sufficiently large number of datasets and perform a one-epoch training on each dataset, (2) work on one dataset over many epochs by reshuffling the data, or (3) a combination of these. When dealing with synthetic data, all approaches are feasible. However, strategy (1) is usually impossible to apply in practice, especially in the spatial setting, where sensors are installed at a fixed and limited number of locations. In the original work on PINN [22], approach (1) was used to train the model, where datasets are generated on random space discretizations at each epoch. Here, we follow approach (2) to use training data that we could realistically have in practice. For all examples, unless otherwise noted, we use a batch-size of 64, a limit of 10,000 epochs with shuffling, and a patience of 500 to perform the training.
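In Keras, these training controls can be expressed as in the following sketch; `model`, `X`, and `Y` are placeholders for a compiled PINN model and its training data, and the learning rate is an illustrative assumption:

```python
from tensorflow import keras

# Illustrative training call using the algorithmic parameters discussed above:
# batch-size 64, up to 10,000 shuffled epochs, and early stopping with a
# patience of 500 epochs on the training loss.
early_stop = keras.callbacks.EarlyStopping(monitor="loss", patience=500)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
model.fit(X, Y, batch_size=64, epochs=10000, shuffle=True, callbacks=[early_stop])
```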

3 Illustrative Example and Discussion

In this section, we use the PINN architecture on an illustrative linear elasticity problem.

3.1 Problem setup

To illustrate the application of the proposed approach, we consider an elastic plane-strain problem on the unit square (Fig. 1), subject to the boundary conditions depicted in the figure. The body forces are:

\begin{split}
f_{x}&=\lambda\left[4\pi^{2}\cos(2\pi x)\sin(\pi y)-\pi\cos(\pi x)Qy^{3}\right]\\
&+\mu\left[9\pi^{2}\cos(2\pi x)\sin(\pi y)-\pi\cos(\pi x)Qy^{3}\right],\\
f_{y}&=\lambda\left[-3\sin(\pi x)Qy^{2}+2\pi^{2}\sin(2\pi x)\cos(\pi y)\right]\\
&+\mu\left[-6\sin(\pi x)Qy^{2}+2\pi^{2}\sin(2\pi x)\cos(\pi y)+\pi^{2}\sin(\pi x)Qy^{4}/4\right].
\end{split} (5)

The exact solution of this problem is

u_{x}(x,y)=\cos(2\pi x)\sin(\pi y), \qquad (6)
u_{y}(x,y)=\sin(\pi x)Qy^{4}/4, \qquad (7)
Figure 1: Problem setup and boundary conditions.

The exact solution is plotted in Fig. 2 for parameter values $\lambda=1$, $\mu=0.5$, and $Q=4$.

Figure 2: Exact solution in Eqs. (6)–(7) for parameter values $\lambda=1$, $\mu=0.5$, and $Q=4$.

3.2 Neural Network setup

Due to the symmetry of the stress and strain tensors, the quantities of interest for a two-dimensional problem are $u_{x}$, $u_{y}$, $\varepsilon_{xx}$, $\varepsilon_{yy}$, $\varepsilon_{xy}$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{xy}$. There are a few potential architectures that we can use to design our network. For all network choices, the input features (variables) are the spatial coordinates $(x,y)$. For the outputs, one possible design is a densely connected network with two outputs, $(u_{x},u_{y})$. Another option is to have two densely connected independent networks with one output each, associated with $u_{x}$ and $u_{y}$, respectively (Fig. 3). The remaining quantities of interest, i.e., $\sigma_{ij}$ and $\varepsilon_{ij}$, can then be obtained through differentiation. Alternatively, we may have $(u_{x},u_{y},\sigma_{xx},\sigma_{yy},\sigma_{xy})$ or $(u_{x},u_{y},\varepsilon_{xx},\varepsilon_{yy},\varepsilon_{xy})$ as outputs of one network or of multiple independent networks. As can be seen from Fig. 3, these choices affect the number of parameters of the network and how different quantities of interest are correlated. Equation (3) shows that the feed-forward neural network imposes a special functional form on the network that may not necessarily follow the cross-dependence between variables in the governing equations (1). Our experiments show that using separate networks for each variable is a far more effective strategy. Therefore, we choose to define the variables $u_{x},u_{y},\sigma_{xx},\sigma_{yy},\sigma_{xy}$ as independent ANNs (see Fig. 4), i.e.,

\begin{split}
u_{x}(\mathbf{x})&\approx\mathcal{N}_{u_{x}}(\mathbf{x}),\\
u_{y}(\mathbf{x})&\approx\mathcal{N}_{u_{y}}(\mathbf{x}),\\
\sigma_{xx}(\mathbf{x})&\approx\mathcal{N}_{\sigma_{xx}}(\mathbf{x}),\\
\sigma_{yy}(\mathbf{x})&\approx\mathcal{N}_{\sigma_{yy}}(\mathbf{x}),\\
\sigma_{xy}(\mathbf{x})&\approx\mathcal{N}_{\sigma_{xy}}(\mathbf{x}).
\end{split} (8)
Figure 3: Potential PINN network choices, with $u_{x}$ and $u_{y}$ as outputs of a single network (left), or as outputs of two independent networks with different parameters (right).

The cost function is defined as

\begin{split}
\mathcal{L}&=|u_{x}-u_{x}^{*}|+|u_{y}-u^{*}_{y}|+|\sigma_{xx}-\sigma_{xx}^{*}|+|\sigma_{yy}-\sigma_{yy}^{*}|+|\sigma_{xy}-\sigma_{xy}^{*}|\\
&+|\sigma_{xx,x}+\sigma_{xy,y}+f_{x}^{*}|+|\sigma_{xy,x}+\sigma_{yy,y}+f_{y}^{*}|\\
&+|(\lambda+2\mu)\varepsilon_{xx}+\lambda\varepsilon_{yy}-\sigma_{xx}|+|(\lambda+2\mu)\varepsilon_{yy}+\lambda\varepsilon_{xx}-\sigma_{yy}|+|2\mu\varepsilon_{xy}-\sigma_{xy}|.
\end{split} (9)

The quantities with asterisks represent given data. We will train the networks so that their output values are as close as possible to the data, which may be real field data or, in this paper, synthetic data from the exact solution to the problem or the result of a high-fidelity simulation. The values without asterisks represent either direct outputs of the network (e.g., $u_{x}$ or $\sigma_{xx}$; see Eq. (8)) or quantities obtained through automatic graph-based differentiation [35] of the network outputs (e.g., $\varepsilon_{xx}=u_{x,x}$). In Eq. (9), $f_{x}^{*}$ and $f_{y}^{*}$ represent data on the body forces obtained as $f_{i}^{*}=-\sigma_{ij,j}^{*}$.

The different terms in the cost function represent measures of the error in the displacement and stress fields, the momentum balance, and the constitutive law. This cost function can be used for deep-learning-based solution of PDEs as well as for identification of the model parameters. For the solution of PDEs, $\lambda$ and $\mu$ are treated as fixed numbers in the network. For parameter identification, $\lambda$ and $\mu$ are treated as network parameters that change during the training phase (see Fig. 4). In TensorFlow [30] this can be accomplished by defining $\lambda$ and $\mu$ as Constant (PDE solution) or Variable (parameter identification) objects, respectively. We set up the problem using the SciANN [40] framework, a high-level Keras [37] wrapper for physics-informed deep learning and scientific computations. Experimenting with all of the previously mentioned network choices can be easily done in SciANN with minimal coding; the code for some of the examples solved here is available at https://github.com/sciann/examples.
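As a hedged illustration of this multi-network setup, the sketch below constructs the displacement and stress networks of Eq. (8) and the constitutive residuals of Eq. (9) in SciANN, with $\lambda$ and $\mu$ as trainable parameters. Layer sizes, initial parameter values, and some argument names are assumptions and may differ between SciANN versions; the momentum-balance terms, which require the body-force data, are omitted for brevity:

```python
import sciann as sn

# Hedged sketch of the multi-network architecture of Fig. 4 (layer sizes and
# argument names are illustrative and may vary across SciANN versions).
x, y = sn.Variable("x"), sn.Variable("y")
layers = 10 * [20]
ux  = sn.Functional("ux",  [x, y], layers, "tanh")
uy  = sn.Functional("uy",  [x, y], layers, "tanh")
sxx = sn.Functional("sxx", [x, y], layers, "tanh")
syy = sn.Functional("syy", [x, y], layers, "tanh")
sxy = sn.Functional("sxy", [x, y], layers, "tanh")

# Lame parameters defined as trainable parameters for identification; for a
# forward (solution) problem they would instead be fixed constants.
lmbd = sn.Parameter(2.0, inputs=[x, y], name="lambda")
mu   = sn.Parameter(1.0, inputs=[x, y], name="mu")

# Strains obtained by automatic differentiation of the displacement networks.
exx = sn.diff(ux, x)
eyy = sn.diff(uy, y)
exy = 0.5 * (sn.diff(ux, y) + sn.diff(uy, x))

# Constitutive residuals entering the loss (9); data terms and momentum-balance
# terms would be added analogously.
c_xx = (lmbd + 2 * mu) * exx + lmbd * eyy - sxx
c_yy = (lmbd + 2 * mu) * eyy + lmbd * exx - syy
c_xy = 2 * mu * exy - sxy

model = sn.SciModel([x, y], [ux, uy, sxx, syy, sxy, c_xx, c_yy, c_xy])
```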

Figure 4: Network architecture of choice used in this study. We define five networks, one for each variable of interest, i.e., $u_{x},u_{y},\sigma_{xx},\sigma_{yy},\sigma_{xy}$. Each network has $(x,y)$ as input features.

3.3 Identification of model parameters: PINN trained on the exact solution

Here, we use PINN to identify the model parameters $\lambda$ and $\mu$. Our data corresponds to the exact solution with parameter values $\lambda=1$, $\mu=0.5$, and $Q=4$. Our default dataset consists of $100\times 100$ sample points, uniformly distributed. We study how the accuracy and the efficiency of the identification process depend on the architecture and functional form of the network; the available data; and whether we use one or several independent networks for the different quantities of interest. To study the impact of the architecture and functional form of the ANN, we use four different networks with either 5 or 10 hidden layers, and either 20 or 50 neurons per layer; see Table 1. The role of the network functional form is studied by comparing the performance of the two most widely used activation functions, i.e., $\tanh$ and ReLU, where $\textrm{ReLU}(x)=\max(0,x)$ [1].

Studying the impact of the available data on the identification process is crucial because we are interested in identifying the model parameters with as little data as possible. We undertake the analysis considering two scenarios:

  (a) Stress-complete data: In this case, we have data at a set of points for the displacements and their first-order derivatives, that is, $u_{x}^{*}$, $u_{y}^{*}$, $\sigma_{xx}^{*}$, $\sigma_{yy}^{*}$, $\sigma_{xy}^{*}$. Because our cost function (9) also involves data that depends on the stress derivatives ($f_{x}^{*}$ and $f_{y}^{*}$), this approach relies on an additional algorithmic procedure for differentiation of the stresses. In this section we compute the stress derivatives using second-order central finite-difference approximations (a sketch of this pre-processing step is given after this list).

  (b) Force-complete data: In this scenario, we have data at a set of points for the displacements, their first derivatives, and their second derivatives. The availability of the displacement second derivatives allows us to determine data for the body forces $f_{x}^{*}$ and $f_{y}^{*}$ using the momentum balance equation without resorting to any differentiation algorithm.
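For the stress-complete case, the body forces can be recovered from the sampled stresses with NumPy; the following is a minimal sketch assuming the stresses are given on a uniform grid with spacings dx and dy:

```python
import numpy as np

# Stress-complete pre-processing: recover the body forces from the momentum
# balance, f_i = -sigma_ij,j, using central differences (np.gradient is
# second-order accurate in the interior). Arrays are indexed as [i, j] ~ (x_i, y_j).
def body_forces_from_stress(sxx, syy, sxy, dx, dy):
    fx = -(np.gradient(sxx, dx, axis=0) + np.gradient(sxy, dy, axis=1))
    fy = -(np.gradient(sxy, dx, axis=0) + np.gradient(syy, dy, axis=1))
    return fx, fy
```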

Table 1: Architecture and number of parameters of the networks used for PINN training.

Network | Hidden layers | Neurons per layer | Parameters (independent networks) | Parameters (single network)
i       | 5             | 20                | 12,336                            | 1,893
ii      | 5             | 50                | 72,816                            | 10,713
iii     | 10            | 20                | 27,036                            | 3,993
iv      | 10            | 50                | 162,066                           | 23,463

In Fig. 5 we compare the evolution of the cost function for stress-complete data (Fig. 5a) and force-complete data (Fig. 5b). Both figures show a comparison of the four network architectures that we study; see Table 1. We find that training on the force-complete data performs slightly better (lower loss) at a given epoch.

The result of convergence of the model identification is shown in Fig. 6. The training converges to the true values of the parameters, i.e., $\lambda=1$ and $\mu=1/2$, for all cases. We find that the optimization converges very quickly on the parameters, while it takes far more epochs to fit the network on the field variables. Additionally, we observe that deeper networks produce less accurate parameters. We attribute the loss of accuracy as we increase the ANN complexity to over-fitting [1, 3]. Convergence of the individual terms in the loss function (9) is shown in Fig. 7 for Net-ii (see Table 1). We find that all terms in the loss, i.e., data-driven and physics-informed, show oscillations during the optimization. Therefore, no individual term is solely responsible for the oscillations in the total loss (Fig. 5).

Figure 5: The result of training networks i, ii, iii, and iv on the analytical dataset $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, and $\sigma_{xy}$; (a) body forces are evaluated from central-difference differentiation of the stress components, (b) body forces are also given analytically.
Figure 6: The result of identification of $\lambda=1$, $\mu=1/2$ for networks i, ii, iii, and iv on the analytical dataset $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, and $\sigma_{xy}$; (a) body forces are evaluated from central-difference differentiation of the stress components, (b) body forces are also given analytically.
Figure 7: Individual terms of the total loss (9) for network ii on the analytical dataset $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, and $\sigma_{xy}$; (a) body forces are evaluated from central-difference differentiation of the stress components, (b) body forces are also given analytically.

The impact of the ANN functional form can be examined by comparing the data in Figs. 5b and 8a, which show the evolution of the cost function using the activation functions $\tanh$ and ReLU, respectively. The function ReLU has discontinuous derivatives, which explains its poor performance for physics-informed deep learning, whose effectiveness relies heavily on accurate evaluation of derivatives.

A comparison of Figs. 5b and 8b shows that using independent networks for displacements and stresses is more effective than using a single network. We find that the single network leads to less accurate elastic parameters because the cross-dependencies of the network outputs through the kinematic and constitutive relations may not be adequately represented by the $\tanh$ activation function.

Figure 8: (a) ReLU activation function on the analytical dataset $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{xy}$, $f_{x}$, and $f_{y}$. (b) Connected network.

Fig. 9 analyzes the effect of the availability of data on the training. We computed the exact solution on four different uniform grids of size $10\times 10$, $40\times 40$, $160\times 160$, and $640\times 640$, and carried out the parameter identification process. We performed the comparison using force-complete data and a network with 10 layers and 20 neurons per layer (network iii). The training process found good approximations to the parameters for all cases, including that with only $10\times 10$ points. The results show that fewer data points require many more epoch cycles, but the overall computational cost is far lower.

Figure 9: Training on datasets of different sizes. The parameters are accurately identified in all cases, although the solution exhibits different levels of accuracy.

3.4 PINN models trained on the FEM solution

Here, we generate synthetic data from FEM solutions, and then perform the training. The domain is discretized with a mesh comprised of $40\times 40$ elements. Four datasets are prepared using quadrilateral bilinear, biquadratic, bicubic, and biquartic Lagrange $C^{0}$ elements using the commercial FEM software COMSOL [41]. We evaluate the FEM displacements, strains, stresses, and stress derivatives at the center of each element. Then, we map the data to a $100\times 100$ training grid using SciPy's griddata module with cubic interpolation. This step is performed as a data-augmentation procedure, which is a common practice in machine learning [1].
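This mapping step can be sketched as follows; the FEM sample locations and values (`points_fem`, `values_fem`) are placeholders, and the unit-square grid bounds are taken from the problem setup:

```python
import numpy as np
from scipy.interpolate import griddata

# Data-augmentation sketch: interpolate quantities sampled at FEM element
# centers onto a uniform 100x100 training grid with cubic interpolation.
xg, yg = np.meshgrid(np.linspace(0.0, 1.0, 100), np.linspace(0.0, 1.0, 100))

def map_to_training_grid(points_fem, values_fem):
    # points_fem: (n, 2) array of element-center coordinates
    # values_fem: (n,) array of a sampled field (e.g., u_x or sigma_xx)
    return griddata(points_fem, values_fem, (xg, yg), method="cubic")
```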

To analyze the importance of data satisfying the governing equations of the system, we focus our attention on network ii and we study cases with stress- and force-complete data. The results of training are presented in Fig. 10. As can be seen here, the bilinear element performs poorly on the learning and identification. The performance of training on the other elements is good, comparable to that using the analytical solution. Further analysis shows that this is indeed expected as FEM differentiation of bilinear elements provides a poor approximation of the body forces. The error in the body forces is shown in Fig. 11, which indicates a high error for bilinear elements. We conclude that the standard bilinear elements are not suitable for this problem to generate numerical data for deep learning. Fig. 10a2 confirms that pre-processing the data can remove the error that was present in the numerical solution with bilinear elements, and enable the optimization to successfully complete the identification.

Figure 10: (a1) Training on the FEM dataset using the $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{xy}$, $f_{x}$, and $f_{y}$ components. (a2) Training with body forces $f_{x}$ and $f_{y}$ evaluated from central-difference differentiation of the stress components.
Figure 11: The error in the bilinear, biquadratic, bicubic, and biquartic FEM data, evaluated as the difference between the FEM evaluation of the momentum relation, i.e., $\sigma_{ij,j}$, and the true body forces $f_{i}^{*}$ in the $x$ (top) and $y$ (bottom) directions.

3.5 PINN models trained on the IGA solution

Having observed the lowest loss $\mathcal{L}$ on the analytical solution, we decided to study the influence of the global continuity of the numerical solution. We generated a $C^{3}$-continuous dataset using Isogeometric Analysis [42], again on a $40\times 40$ grid. The data are then mapped onto a $100\times 100$ grid and used to train the PINN models. The training results are shown in Fig. 12. The results are very similar to those obtained with the high-order FEM datasets.

Figure 12: (a1) Training on the IGA dataset using the $u_{x}$, $u_{y}$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{xy}$, $f_{x}$, and $f_{y}$ components. (a2) Training with body forces $f_{x}$ and $f_{y}$ evaluated from central-difference differentiation of the stress components.

3.6 Identification using transfer learning

Here we explore the applicability of our PINN framework to transfer learning: a pre-trained neural network is used to perform identification on a new dataset. The expectation is that, since the initial state of the neural network is no longer randomly chosen, training should converge faster to the solution and parameters of the new data. This is crucial for many practical applications, including adaptation to new data for online search or purchase history [34], or in geosciences, where we can train a representative PINN in highly instrumented regions and use it at other locations with limited observational datasets. To this end, we use the pre-trained model on Net-iii (Fig. 5), which was trained on a dataset with $\lambda=1.0$ and $\mu=0.5$, and we explore how the loss evolves and the training converges when data is generated with different values of $\mu\in\{2.0,1.5,1.0,0.1\}$.

In Fig. 13 we show the convergence of the model on the different datasets. Note that the loss is normalized by the initial value $\mathcal{L}_{0}$ from the network pre-trained on $\mu=0.5$ (Fig. 5). As can be seen here, re-training on a new dataset costs only a few hundred epochs and starts from a smaller initial value of the loss. This points to an advantage of deep learning and PINN: re-training on similar data is much less costly than classical methods that rely on forward simulations.

Figure 13: Identification on new datasets generated with different values of $\mu$ using a neural network pre-trained on $\mu=0.5$. The re-training takes far fewer epochs to converge, with a much smaller initial value of the loss $\mathcal{L}$.

3.7 Application to sensitivity analysis

Performing sensitivity analysis is an expensive task when the analytical solution is not available, since it requires performing many forward numerical simulations. Alternatively, if we can construct a surrogate model that is a function of the parameters of interest, then performing sensitivity analysis becomes tractable. However, construction of such a surrogate model is itself an expensive task within classical frameworks. Within PINN, however, this seems to be naturally possible. Let us suppose that the parameter of interest is the shear modulus $\mu$. Consider an ANN model with inputs $(x,y,\mu)$ and outputs $(u_{x},u_{y},\sigma_{xx},\sigma_{yy},\sigma_{xy})$. We can, therefore, use a similar framework to construct a model that is a function of $\mu$ in addition to the space variables. Again, PINN can constrain the model to adapt to the physics of interest, and therefore less data is needed to construct such a model.
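A minimal sketch of such a surrogate network, with $\mu$ appended to the input features, is shown below; the layer sizes are illustrative assumptions:

```python
import tensorflow as tf

# Surrogate-model sketch for sensitivity analysis: the shear modulus mu is an
# extra input feature, so each output network maps (x, y, mu) to one field
# variable. Width and depth are illustrative.
def make_surrogate_net(width=50, depth=5):
    layers = [tf.keras.layers.Input(shape=(3,))]               # (x, y, mu)
    layers += [tf.keras.layers.Dense(width, activation="tanh") for _ in range(depth)]
    layers += [tf.keras.layers.Dense(1, activation=None)]      # e.g., u_x(x, y, mu)
    return tf.keras.Sequential(layers)
```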

Here, we explore whether a PINN model trained on multiple datasets generated with various material parameters, i.e., different values of $\mu$, can be used as a surrogate model to perform sensitivity analysis. The network in Fig. 4 is now slightly adapted to carry $\mu$ as an extra input (in addition to $x,y$). The training set is prepared based on $\lambda=1$ and $\mu\in\{1/4,2/3,3/2,4\}$. Note that there is no identification in this case, and therefore the parameters $\lambda$ and $\mu$ are known at any given training point. The results of the analysis are shown in Fig. 14. For a wide range of values of $\mu\in(0,9)$, the model performs very well in terms of displacements; it is less accurate, but still very useful, in terms of stresses, with a maximum error for near-incompressible conditions, $\mu\approx 0$.

Figure 14: Application to sensitivity analysis: the model is trained on multiple datasets generated with different values of $\mu\in\{1/4,2/3,3/2,4\}$ (highlighted with dot-dashed lines). The model is then tested on a continuous range of values $\mu\in(0,9)$. The error is defined as $|\circ-\circ^{*}|/|\circ^{*}|$ at the point where $\circ^{*}$ is maximum.

4 Extension to Nonlinear Elastoplasticity

In this section, we discuss the application of PINN to nonlinear solid mechanics problems undergoing elastic and plastic deformation. We use the von Mises elastoplastic constitutive model—a commonly used model to describe mechanical behavior of solid materials, in particular metals. We first describe the extension of the linear-elasticity relations in Eq. (1) to the von Mises elastoplastic relations. We then discuss the neural-network setup and apply the PINN framework to identify parameters of a classic problem: a perforated strip subjected to uniaxial extension.

4.1 von Mises elastoplasticity

We adopt the classic elastoplasticity postulate of additive decomposition of the strain tensor [43],

\varepsilon_{ij}=\varepsilon_{ij}^{e}+\varepsilon_{ij}^{p}. \qquad (10)

The stress tensor is now linearly dependent on the elastic strain tensor:

\sigma_{ij}=\lambda\varepsilon_{kk}^{e}\delta_{ij}+2\mu\varepsilon_{ij}^{e}. \qquad (11)

The plastic part of the strain tensor is evaluated through a plasticity model. The von Mises model implies that plastic deformation occurs in the direction normal to the yield surface $\mathcal{F}$, defined as $\mathcal{F}(\sigma_{ij}):=q-\sigma_{Y}$, as

\varepsilon^{p}_{ij}=\gamma\frac{\partial\mathcal{F}}{\partial\sigma_{ij}}, \qquad (12)

where $\sigma_{Y}$ is the yield stress and $q$ is the equivalent stress, defined as $q=\sqrt{3/2\,s_{ij}s_{ij}}$, with $s_{ij}$ the components of the deviatoric stress tensor, $s_{ij}=\sigma_{ij}-\sigma_{kk}/3\,\delta_{ij}$. The strain remains strictly elastic as long as the state of stress $\sigma_{ij}$ remains inside the yield surface, $\mathcal{F}<0$. Plastic deformation occurs when the state of stress is on the yield surface, $\mathcal{F}=0$. The condition $\mathcal{F}>0$ is associated with an inadmissible state of stress. The parameter $\gamma$ is the plastic multiplier, subject to the condition $\gamma\geq 0$, and evaluated through a predictor–corrector algorithm by imposing the condition $\mathcal{F}\leq 0$ [43]. In the case of von Mises plasticity, the volumetric plastic deformation is zero, $\varepsilon_{kk}^{p}=0$. It can be shown that the plastic multiplier $\gamma$ is equal to the equivalent plastic strain $\bar{\varepsilon}^{p}=\sqrt{2/3\,e_{ij}^{p}e_{ij}^{p}}$, where $e_{ij}$ are the components of the deviatoric strain tensor, $e_{ij}=\varepsilon_{ij}-\varepsilon_{kk}/3\,\delta_{ij}$.

Therefore, the elastoplastic relations for a plane-strain problem can be summarized as:

\begin{split}
\sigma_{ij,j}+f_{i}&=0,\\
\sigma_{ij}&=\lambda\varepsilon_{kk}\delta_{ij}+2\mu\varepsilon^{e}_{ij}=(\lambda+2/3\,\mu)\varepsilon_{kk}\delta_{ij}+s_{ij},\\
s_{ij}&=2\mu\,(e_{ij}-e^{p}_{ij}),\\
e^{p}_{ij}&=\bar{\varepsilon}^{p}\frac{\partial\mathcal{F}}{\partial\sigma_{ij}}=\bar{\varepsilon}^{p}\,\frac{3}{2}\,\frac{s_{ij}}{q},
\end{split} (13)

subject to the elastoplasticity conditions:

\bar{\varepsilon}^{p}\geq 0,\quad\mathcal{F}\leq 0,\quad\bar{\varepsilon}^{p}\mathcal{F}=0, \qquad (14)

also known as the Karush–Kuhn–Tucker (KKT) conditions. For the von Mises model, the plastic multiplier $\bar{\varepsilon}^{p}$ can be expressed as

\bar{\varepsilon}^{p}=\bar{\varepsilon}-\frac{\sigma_{Y}}{3\mu}\geq 0, \qquad (15)

where $\bar{\varepsilon}$ is the total equivalent strain, i.e., $\bar{\varepsilon}=\sqrt{2/3\,e_{ij}e_{ij}}$. Therefore, the parameters of this model are the Lamé elastic parameters $\lambda$ and $\mu$, and the yield stress $\sigma_{Y}$.
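The sketch below evaluates these von Mises quantities for a plane-strain state from the in-plane strain components; it is a minimal NumPy illustration of Eqs. (13)–(15), assuming the inputs are arrays of equal shape:

```python
import numpy as np

# Von Mises quantities under plane strain (eps_zz = 0): deviatoric strains,
# total equivalent strain, and the plastic multiplier of Eq. (15), clipped at
# zero so that the elastic regime gives eps_bar_p = 0.
def von_mises_quantities(exx, eyy, exy, mu, sigma_y):
    ekk = exx + eyy                                   # volumetric strain
    dev_xx, dev_yy, dev_zz = exx - ekk / 3, eyy - ekk / 3, -ekk / 3
    eps_bar = np.sqrt(2.0 / 3.0 * (dev_xx**2 + dev_yy**2 + dev_zz**2 + 2.0 * exy**2))
    eps_bar_p = np.maximum(eps_bar - sigma_y / (3.0 * mu), 0.0)
    return eps_bar, eps_bar_p
```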

4.2 Neural Network setup

The solution variables for a two-dimensional problem are $u_{x}$, $u_{y}$, $\varepsilon_{xx}$, $\varepsilon_{yy}$, $\varepsilon_{xy}$, $\varepsilon_{xx}^{p}$, $\varepsilon_{yy}^{p}$, $\varepsilon_{zz}^{p}$, $\varepsilon_{xy}^{p}$, $\sigma_{xx}$, $\sigma_{yy}$, $\sigma_{zz}$, $\sigma_{xy}$. Since the out-of-plane components are no longer zero, they must be reflected in the choice of independent networks. Following the discussion for the linear elasticity case, we approximate the displacement and stress components $u_{x},u_{y},\sigma_{xx},\sigma_{yy},\sigma_{zz},\sigma_{xy}$ with nonlinear neural networks as:

\begin{split}
u_{x}&\approx\mathcal{N}_{u_{x}}(\mathbf{x}),\\
u_{y}&\approx\mathcal{N}_{u_{y}}(\mathbf{x}),\\
\sigma_{xx}&\approx\mathcal{N}_{\sigma_{xx}}(\mathbf{x}),\\
\sigma_{yy}&\approx\mathcal{N}_{\sigma_{yy}}(\mathbf{x}),\\
\sigma_{zz}&\approx\mathcal{N}_{\sigma_{zz}}(\mathbf{x}),\\
\sigma_{xy}&\approx\mathcal{N}_{\sigma_{xy}}(\mathbf{x}).
\end{split} (16)

The associated cost function is then defined as

\begin{split}
\mathcal{L}&=|u_{x}-u_{x}^{*}|+|u_{y}-u^{*}_{y}|\\
&+|\sigma_{xx}-\sigma_{xx}^{*}|+|\sigma_{yy}-\sigma_{yy}^{*}|+|\sigma_{zz}-\sigma_{zz}^{*}|+|\sigma_{xy}-\sigma_{xy}^{*}|\\
&+|\sigma_{xx,x}+\sigma_{xy,y}-f_{x}^{*}|+|\sigma_{xy,x}+\sigma_{yy,y}-f_{y}^{*}|\\
&+|(\lambda+2/3\,\mu)\varepsilon_{kk}+2\mu(e_{xx}-e^{p}_{xx})-\sigma_{xx}|\\
&+|(\lambda+2/3\,\mu)\varepsilon_{kk}+2\mu(e_{yy}-e^{p}_{yy})-\sigma_{yy}|\\
&+|(\lambda+2/3\,\mu)\varepsilon_{kk}+2\mu(e_{zz}-e^{p}_{zz})-\sigma_{zz}|\\
&+|2\mu(e_{xy}-e^{p}_{xy})-\sigma_{xy}|+|(\bar{\varepsilon}-\sigma_{Y}/3\mu)-\bar{\varepsilon}^{p}|\\
&+|(1-\textrm{sign}(\bar{\varepsilon}^{p}))\,|\bar{\varepsilon}^{p}||+|(1+\textrm{sign}(\mathcal{F}))\,|\mathcal{F}||+|\bar{\varepsilon}^{p}\mathcal{F}|.
\end{split} (17)

The KKT positivity and negativity conditions are imposed through penalty constraints in the loss function. For instance, $\bar{\varepsilon}^{p}\geq 0$ is incorporated in the loss as $(1-\textrm{sign}(\bar{\varepsilon}^{p}))|\bar{\varepsilon}^{p}|$. Therefore, for values of $\bar{\varepsilon}^{p}<0$, the resulting ‘cost’ is $2|\bar{\varepsilon}^{p}|$, which should vanish.
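These penalty terms can be assembled as in the following TensorFlow sketch, where `eps_bar_p` and `F` are the plastic multiplier and yield function evaluated at the training points:

```python
import tensorflow as tf

# Sketch of the sign-based KKT penalties at the end of the loss (17):
# eps_bar_p >= 0, F <= 0, and the complementarity condition eps_bar_p * F = 0.
def kkt_penalty(eps_bar_p, F):
    positivity = (1.0 - tf.sign(eps_bar_p)) * tf.abs(eps_bar_p)  # nonzero only if eps_bar_p < 0
    negativity = (1.0 + tf.sign(F)) * tf.abs(F)                  # nonzero only if F > 0
    complementarity = tf.abs(eps_bar_p * F)
    return tf.reduce_mean(positivity + negativity + complementarity)
```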

4.3 Illustrative example

We use a classic example to illustrate our framework: a perforated strip subjected to uniaxial extension [44, 43]. Consider a plate of dimensions $200~\text{mm}\times 360~\text{mm}$, with a circular hole of diameter $100~\text{mm}$ located in the center of the plate. The plate is subjected to extension displacements of $\delta=1~\text{mm}$ along the short edge, under plane-strain conditions, and without body forces, $f_{i}=0$. The material parameters are $\lambda=19.44~\text{GPa}$, $\mu=29.17~\text{GPa}$, and $\sigma_{Y}=243.0~\text{MPa}$. Due to symmetry, only a quarter of the domain needs to be considered in the simulation. The synthetic data is generated from a high-fidelity FEM simulation using the COMSOL software [41] on a mesh of 13,041 quartic triangular elements (Fig. 15). The plate undergoes significant plastic deformation around the circular hole, as can be seen from the $\varepsilon_{ij}^{p}$ contours in Fig. 15. This results in localized deformation in the form of a shear band. While the strain exhibits localization, the stress field remains continuous and smooth, a behavior that is due to the choice of a perfect-plasticity model with no hardening.

We use 2,000 data points from this reference solution, randomly distributed in the simulation domain, to provide the training data. The PINN training is performed using networks with 4 layers, each with 100 neurons, and with a hyperbolic-tangent activation function. The optimization parameters are the same as those used for the linear elasticity problem. The results predicted by the PINN approach match the reference results very closely, as evidenced by: (1) the very small errors in each of the components of the solution, except for the out-of-plane plastic strain components (Fig. 16); and (2) the precise identification of the yield stress $\sigma_{Y}$ and the relatively accurate identification of the elastic parameters $\lambda$ and $\mu$, yielding estimated values $\lambda=18.3~\text{GPa}$, $\mu=27.6~\text{GPa}$, and $\sigma_{Y}=243.0~\text{MPa}$.

Figure 15: Reference solution of the extension loading of a perforated plate from a high-fidelity FEM simulation. The true parameters are $\lambda=19.44~\text{GPa}$, $\mu=29.17~\text{GPa}$, and $\sigma_{Y}=243.0~\text{MPa}$.
Figure 16: Error in predicted values from the PINN framework for displacements, strains, plastic strains and stresses.

5 Conclusions

We study the application of a class of deep learning, known as Physics-Informed Neural Networks (PINN), for solution and discovery in solid mechanics. In this work, we formulate and apply the framework to a linear elastostatics problem, which we analyze in detail, but then illustrate the application of the method to nonlinear elastoplasticity. We study the sensitivity of the proposed framework to noise in data coming from different numerical techniques. We find that the optimizer performs much better on data from high-order classical finite elements, or with methods with enhanced continuity such as Isogeometric Analysis. We analyze the impact of the size and depth of the network, and the size of the dataset from uniform sampling of the numerical solution—an aspect that is important in practice given the cost of a dense monitoring network. We find that the proposed PINN approach is able to converge to the solution and identify the parameters quite efficiently with as little as 100 data points.

We also explore transfer learning, that is, the use of a pre-trained neural network to perform training on new datasets with different parameters. We find that training converges much faster in this case. Lastly, we study the applicability of the model as a surrogate model for sensitivity analysis. To this end, we introduce the shear modulus $\mu$ as an input variable to the network. When training on only four values of $\mu$, we find that the network predicts the solution quite accurately over a wide range of values of $\mu$, a feature that is indicative of the robustness of the approach.

Despite the success exhibited by the PINN approach, we have found that it faces challenges when dealing with problems with discontinuous solutions. The network architecture is less accurate on problems with localized high gradients resulting from discontinuities in the material properties or boundary conditions. We find that, in those cases, the results are artificially diffuse where they should be sharp. We speculate that the underlying reason for this behavior is the particular architecture of the network, where the input variables are only the spatial dimensions ($x$ and $y$), which renders the network unable to produce the variability required by gradient-based optimization to capture solutions with high gradients. Addressing this limitation is an exciting avenue for future work in machine-learning applications to solid mechanics.

Acknowledgements

This work was funded by the KFUPM-MIT collaborative agreement ‘Multiscale Reservoir Science’.

References