
Discovering Phase Field Models from Image Data with the Pseudo-spectral Physics Informed Neural Networks

Jia Zhao (corresponding author)

Department of Mathematics & Statistics, Utah State University, Logan, UT, USA. Email: [email protected]
Abstract

In this paper, we introduce a new deep learning framework for discovering phase field models from existing image data. The framework combines the approximation power of physics informed neural networks (PINN) with the computational efficiency of pseudo-spectral methods, and we name it the pseudo-spectral PINN, or SPINN. Unlike the baseline PINN, the pseudo-spectral PINN has several advantages. First of all, it requires less training data; a minimum of two snapshots with uniform spatial resolution is adequate. Secondly, it is computationally efficient, as the pseudo-spectral method is used for spatial discretization. Thirdly, it requires fewer trainable parameters than the baseline PINN, which significantly simplifies the training process and makes it less prone to local minima or saddle points. We illustrate the effectiveness of the pseudo-spectral PINN through several numerical examples. The newly proposed pseudo-spectral PINN is rather general, and it can be readily applied to discover other PDE-based models from image data.

keywords:
Phase Field; Linear Scheme; Cahn-Hilliard Equation; Physics Informed Neural Network

1 Introduction

Phase field models have been widely used for investigating moving interface and multiphase problems. Among them, the best-known models are the Allen-Cahn (AC) equation and the Cahn-Hilliard (CH) equation [3]. In general, most existing phase field models are thermodynamically consistent, i.e., they satisfy the first and second laws of thermodynamics. In particular, for most scenarios, the temperature is assumed constant, so that the Helmholtz free energy is non-increasing in time. Therefore, denoting the state variable as ϕ and the free energy as E, phase field models can usually be written in the generic gradient flow form

tϕ=𝒢δEδϕ,\partial_{t}\phi=\mathcal{G}\frac{\delta E}{\delta\phi}, (1.1)

with proper initial values and boundary conditions. Here 𝒢 is a negative semi-definite operator known as the mobility operator. In general, the free energy E contains a conformational entropy part and a bulk potential part. In this paper, we consider the free energy

E=Ω[ε22|ϕ|2+F(ϕ)]𝑑x.E=\int_{\Omega}\Big{[}\frac{\varepsilon^{2}}{2}|\nabla\phi|^{2}+F(\phi)\Big{]}dx. (1.2)

with ε\varepsilon a model parameter. We use f(ϕ)=F(ϕ)f(\phi)=F^{\prime}(\phi) to denote the bulk chemical potential. For the Allen-Cahn equation, the mobility is usually chosen as 𝒢=M\mathcal{G}=-M, with M>0M>0 a constant. Then, the Allen-Cahn equation reads

tϕ=M(ε2Δϕf(ϕ)).\partial_{t}\phi=M(\varepsilon^{2}\Delta\phi-f(\phi)). (1.3)

Similarly, for the Cahn-Hilliard equation, the mobility is usually chosen as 𝒢=MΔ\mathcal{G}=M\Delta, with M>0M>0 a constant. Then the Cahn-Hilliard equation reads as

tϕ=MΔ(ε2Δϕ+f(ϕ)).\partial_{t}\phi=M\Delta(-\varepsilon^{2}\Delta\phi+f(\phi)). (1.4)

Here the choice of the bulk potential F(ϕ) depends on the material properties; the double-well potential and the Flory-Huggins free energy are common choices.

Though there is a considerable amount of work on solving phase field models [19, 18, 8, 6, 22, 7], there is much less work on the inverse problem, i.e., discovering phase field models from data. In the classical approach, PDE models are usually derived based on empirical observation: a PDE model with free parameters is introduced first, and the free parameters are fitted to data afterward. In this paper, we introduce a novel method using the physics informed neural network approach, where the model is learned directly from data. In particular, we assume the bulk potential F(ϕ) is unknown, and we discover it by learning from existing data with a deep neural network.

Deep neural networks have been widely used to investigate problems in various fields. Mathematically, a feed-forward neural network can be defined as a composition of nonlinear functions. Given an input x ∈ ℝ^{n_1}, denote the output of the l-th layer by a^{[l]} ∈ ℝ^{n_l}, which serves as the input to the (l+1)-th layer. In general, we can define the neural network as [10]

a[1]=xn1,a[k]=σ(W[k]a[k1]+b[k])nk, for k=2,3,,L,\begin{array}[]{l}a^{[1]}=x\in\mathbb{R}^{n_{1}},\\ a^{[k]}=\sigma\Big{(}W^{[k]}a^{[k-1]}+b^{[k]}\Big{)}\in\mathbb{R}^{n_{k}},\quad\mbox{ for }k=2,3,\cdots,L,\end{array} (1.5)

where W^{[k]} ∈ ℝ^{n_k × n_{k-1}} and b^{[k]} ∈ ℝ^{n_k} denote the weights and biases at layer k, respectively, and σ denotes the activation function. Much interesting work has been published on solving or discovering partial differential equations [16, 13, 15, 14, 2, 20, 9, 5, 1, 11, 23]. Here we briefly discuss some work on PDE discovery with deep neural networks or machine learning methods that is relevant to this paper. In [16], the authors assume the PDE model is a linear combination of a few known terms and fit the coefficients to data; based on the format of the available data, they propose two approaches for computing the loss function. In [17], the authors assume the PDE model is a linear combination of terms in a predefined dictionary (with many terms) and use sparse regularization to force most coefficients to zero; thus, after fitting to the data, only a few non-zero terms remain, and the model is discovered. In [12], the authors introduce a feed-forward deep neural network to predict the PDE dynamics and simultaneously uncover the underlying PDE model (in a black-box form).
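To make (1.5) concrete, a minimal NumPy sketch of such a feed-forward network is given below. The layer widths, the tanh activation, and the linear output layer (a common variant of (1.5), which formally applies σ at every layer) are illustrative choices of ours, anticipating the small network used later in Section 4.

```python
import numpy as np

def init_network(layer_sizes, rng=np.random.default_rng(0)):
    """Initialize the weights W[k] and biases b[k] of the network (1.5)."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), size=(n_out, n_in))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(params, x):
    """Evaluate the network: tanh on the hidden layers, linear output layer."""
    a = np.asarray(x, dtype=float)
    for W, b in params[:-1]:
        a = np.tanh(W @ a + b)
    W, b = params[-1]
    return W @ a + b

# Example: a scalar-to-scalar network with two hidden layers of width 20.
params = init_network([1, 20, 20, 1])
print(forward(params, [0.3]))
```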

However, most of the existing work makes strong assumptions on the availability of data. For instance, most existing PDE discovery strategies require well-sampled data across the temporal and spatial domains, and they usually assume the data collected at different time slots come from a single time sequence. In reality, especially in lab experiments, the data are usually sampled as snapshots (images) at various time slots, and the images might have limited spatial resolution, i.e., the solution values are only known at certain locations. The major goal of this paper is to address the problem of PDE discovery from image snapshots. We therefore add to these efforts by introducing the pseudo-spectral PINN, which provides a handy approach for dealing with data provided as image snapshots. In particular, we focus on phase field models.

The rest of this paper is organized as follows. In Section 2, we introduce some notation and set up the PDE discovery problem. In Section 3, we introduce the pseudo-spectral physics informed neural network by describing the network structure and several definitions of the loss function. Afterward, several examples are shown to demonstrate the approximation power of the pseudo-spectral PINN in Section 4. A brief conclusion is drawn in the last section.

2 Problem Setup

Recall from (1.1) that we use ϕ(𝐱, t) to denote the state variable, where 𝐱 ∈ Ω ⊂ ℝ^d are the spatial coordinates and t ∈ (0, T] denotes the time. The baseline PINN [16] would require a rather large number of data pairs {(𝐱_i, t_i, ϕ(𝐱_i, t_i))}_{i=1}^N properly sampled from the domain Ω × [0, T]. It also requires that all the data come from a single time sequence, since a deep neural network 𝒩(𝐱, t) is needed to approximate ϕ(𝐱, t) as an auxiliary network for discovering the PDE. However, this is usually not how data are collected. In practice, the data are usually sampled as image snapshots at various time slots. For instance, in certain experiments, a picture is taken at t_0, and after a time lag δt, another picture is taken at t_1 = t_0 + δt.

To clearly describe the problem, we need to introduce some notations. Let Nx,NyN_{x},N_{y} be two positive even integers. The spatial domain Ω=[0,Lx]×[0,Ly]\Omega=[0,L_{x}]\times[0,L_{y}] is uniformly partitioned with mesh size hx=Lx/Nx,hy=Ly/Nyh_{x}=L_{x}/N_{x},h_{y}=L_{y}/N_{y} and

Ωh={(xj,yk)|xj=jhx,yk=khy,0jNx1,0kNy1}.\Omega_{h}=\left\{(x_{j},y_{k})|x_{j}=jh_{x},y_{k}=kh_{y},~{}0\leq j\leq N_{x}-1,0\leq k\leq N_{y}-1\right\}.

In order to derive the algorithm conveniently, we denote the discrete gradient operator and the discrete Laplace operator

\nabla_{h}=\left(\begin{array}{c}\mathbf{D}_{1}^{x}\\ \mathbf{D}_{1}^{y}\end{array}\right),\quad\Delta_{h}=\nabla_{h}\cdot\nabla_{h}=(\mathbf{D}_{1}^{x})^{2}+(\mathbf{D}_{1}^{y})^{2},

where D_1^x and D_1^y denote the pseudo-spectral first-order differentiation operators acting in the x and y directions, respectively, following our previous work [4].
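As a hedged illustration of how these operators are realized on the uniform periodic grid, the sketch below evaluates Δ_h via the two-dimensional FFT; the function names and domain arguments are our own choices, not part of the original implementation.

```python
import numpy as np

def wavenumbers(N, L):
    """Angular wavenumbers 2*pi*j/L associated with an N-point periodic grid."""
    return 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)

def laplacian_h(Phi, Lx, Ly):
    """Pseudo-spectral discrete Laplacian Delta_h applied to a grid function Phi."""
    Nx, Ny = Phi.shape
    KX, KY = np.meshgrid(wavenumbers(Nx, Lx), wavenumbers(Ny, Ly), indexing="ij")
    return np.real(np.fft.ifft2(-(KX**2 + KY**2) * np.fft.fft2(Phi)))
```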

Let V_h = {u | u = {u_{j,k}}, (x_j, y_k) ∈ Ω_h} be the space of grid functions on Ω_h. For such cases, the data are collected in the form

{(Φi(1),Φi(2),δi)}i=1NVh×Vh×+,\{(\Phi_{i}^{(1)},\Phi_{i}^{(2)},\delta_{i})\}_{i=1}^{N}\subset V_{h}\times V_{h}\times\mathbb{R}^{+}, (2.1)

where Φ_i^{(1)} ∈ V_h and Φ_i^{(2)} ∈ V_h are two snapshots with a time lag δ_i > 0 between the two states, and N is the total number of snapshot pairs. The goal is to discover the bulk function f in the phase field models from data collected in the form of (2.1).

We emphasize that δ_i is not required to be constant, removing the assumption in [21]. Also, different data pairs (Φ_i^{(1)}, Φ_i^{(2)}) and (Φ_j^{(1)}, Φ_j^{(2)}), i ≠ j, are not necessarily from the same time sequence, removing the requirement of the baseline PINN [16].

3 Pseudo-spectral Physics Informed Neural Networks

3.1 Neural Network Structure

For simplicity of notation, we introduce our idea by applying it to a generic example. Consider the PDE problem

tϕ=𝒢[g(ϕ,ϕ,Δϕ)+f(ϕ)],𝐱Ω,t[0,T],\partial_{t}\phi=\mathcal{G}\Big{[}g(\phi,\nabla\phi,\Delta\phi)+f(\phi)\Big{]},\quad\mathbf{x}\in\Omega,\quad t\in[0,T], (3.1)

with periodic boundary conditions, where the operator g is known but the operator f is unknown. Provided with data pairs in the form of (2.1), the major goal of this paper is to identify the operator/functional f, and thus discover the PDE model in (3.1).

The generic way to solve it is by introducing a deep neural network

𝒩f:ϕ𝒩f(ϕ;θ),\mathcal{N}_{f}:\phi\rightarrow\mathcal{N}_{f}(\phi;\theta), (3.2)

to approximate f(ϕ), where θ represents the free parameters. In this paper, we assume f is only a function of ϕ for simplicity. Notice that the idea applies readily to cases where f is a function of ϕ and its derivatives, say f := f(ϕ, ∇ϕ, Δϕ).

Since the data pairs (2.1) are uniformly sampled and the periodic boundary condition is considered for the PDE problem (3.1), it is advisable to simplify the problem (3.1) by discretizing in space with the pseudo-spectral method. Afterwards, the PDE problem in (3.1) can be written as

\partial_{t}\Phi_{ij}=\mathcal{G}_{h}\Big[g(\Phi_{ij},\nabla_{h}\Phi_{ij},\Delta_{h}\Phi_{ij})+f(\Phi_{ij})\Big],\quad 0\leq i\leq N_{x}-1,\ 0\leq j\leq N_{y}-1,\quad t\in[0,T], (3.3)

where 𝒢_h is the spatially discretized mobility operator. Here we use Φ ∈ V_h to denote the discrete function values of ϕ on Ω_h. We emphasize that the periodic boundary condition is considered in this paper; if other types of boundary conditions are considered, finite-difference or finite-element methods for spatial discretization might be more appropriate.

Remark 3.1.

Notice that, in the baseline PINN, an auxiliary neural network 𝒩: (𝐱, t) → 𝒩(𝐱, t) is introduced to approximate Φ(𝐱, t), which is computationally expensive and is applicable only to a single time sequence. It will be apparent that applying the physics informed neural network to the semi-discrete problem (3.3) has several advantages.

There are two main ingredients in the deep neural network method: first, we define what the neural network is meant to approximate (along with its structure, input, output, and hidden layers); second, we define the loss function (which enforces the known physics). For the problem above, we have introduced the neural network 𝒩_f: ϕ → 𝒩_f(ϕ; θ) to approximate f. Next, we need to define the loss function.

3.2 Loss function

We split gg in (3.3) as

g(Φ,hΦ,ΔhΦ)=Lg(Φ)+Ng(Φ),g(\Phi,\nabla_{h}\Phi,\Delta_{h}\Phi)=L_{g}(\Phi)+N_{g}(\Phi),

where LgL_{g} is the linear operator and NgN_{g} is the rest, i.e.

tΦ=𝒢h[Lg(Φ)+Ng(Φ)+f(Φ)].\partial_{t}\Phi=\mathcal{G}_{h}\Big{[}L_{g}(\Phi)+N_{g}(\Phi)+f(\Phi)\Big{]}. (3.4)

To solve it numerically in the time interval [ti,ti+δi][t_{i},t_{i}+\delta_{i}], we can propose the stabilized scheme

1δi(Φti+δiΦti)=𝒢h[Lg(Φti+δi)+Ng(Φti)+f(Φti)+C(Φti+δiΦti)],\frac{1}{\delta_{i}}(\Phi_{t_{i}+\delta_{i}}-\Phi_{t_{i}})=\mathcal{G}_{h}\Big{[}L_{g}(\Phi_{t_{i}+\delta_{i}})+N_{g}(\Phi_{t_{i}})+f(\Phi_{t_{i}})+C(\Phi_{t_{i}+\delta_{i}}-\Phi_{t_{i}})\Big{]}, (3.5)

where CC is a stabilizing operator, which could be chosen as

C=\sum_{i=0}^{2}(-1)^{i}c_{i}(\Delta_{h})^{i}, (3.6)

with the c_i's constants. Then, we have

Φti+δi=(1δi(𝒢h(C+Lg)))1(Φti+δi𝒢h[Ng(Φti)+f(Φti)CΦti]).\Phi_{t_{i}+\delta_{i}}=(1-\delta_{i}(\mathcal{G}_{h}(C+L_{g})))^{-1}\Big{(}\Phi_{t_{i}}+\delta_{i}\mathcal{G}_{h}\Big{[}N_{g}(\Phi_{t_{i}})+f(\Phi_{t_{i}})-C\Phi_{t_{i}}\Big{]}\Big{)}. (3.7)

Notice that the scheme (3.7) is first-order accurate in time: when δ_i is small enough, the expression for Φ_{t_i+δ_i} is an accurate approximation. Inspired by this, we introduce our linear SPINN to discover (3.1) from the data (2.1).
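For illustration, specialized to the Allen-Cahn setting of this paper (𝒢_h = −M, L_g(Φ) = −ε²Δ_hΦ, N_g = 0, C a constant), the update (3.7) diagonalizes in Fourier space and can be sketched as follows. The function names and default parameter values are our own illustrative assumptions, not the code used for the experiments.

```python
import numpy as np

def linear_step_ac(Phi, delta, f_fun, M=10.0, eps=0.02, C0=2.0, Lx=2.0, Ly=2.0):
    """One step of the stabilized scheme (3.7) for the Allen-Cahn case:
    G_h = -M, L_g(Phi) = -eps^2*Delta_h*Phi, N_g = 0, C = C0 (a constant).
    f_fun is any pointwise approximation of the bulk function f, e.g. the network N_f."""
    Nx, Ny = Phi.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(Nx, d=Lx / Nx)
    ky = 2.0 * np.pi * np.fft.fftfreq(Ny, d=Ly / Ny)
    KX, KY = np.meshgrid(kx, ky, indexing="ij")
    k2 = KX**2 + KY**2                      # symbol of -Delta_h
    # Right-hand side of (3.7): Phi + delta*G_h[ f(Phi) - C*Phi ].
    rhs = Phi + delta * (-M) * (f_fun(Phi) - C0 * Phi)
    # Symbol of (1 - delta*G_h(C + L_g)) = 1 + delta*M*(C0 + eps^2*k^2).
    denom = 1.0 + delta * M * (C0 + eps**2 * k2)
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))

# Usage with the exact double-well bulk f(phi) = phi^3 - phi:
# Phi_next = linear_step_ac(Phi, 0.05, lambda p: p**3 - p)
```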

Definition 3.2 (Linear SPINN loss function).

Denote the neural network 𝒩f:Φ𝒩f(Φ;θ)\mathcal{N}_{f}:\Phi\rightarrow\mathcal{N}_{f}(\Phi;\theta) where θ\theta are the free parameters. Given (Φi(1),δi)(\Phi_{i}^{(1)},\delta_{i}), we can approximate Φi(2)\Phi_{i}^{(2)} via 𝒩R(Φi(1),δi)\mathcal{N}_{R}(\Phi_{i}^{(1)},\delta_{i}), which is defined by

𝒩R:(Φi(1),δi)(1δi(𝒢h(C+Lg)))1(Φi(1)+δi𝒢h[Ng(Φi(1))+𝒩f(Φi(1);θ)CΦi(1)]).\mathcal{N}_{R}:(\Phi_{i}^{(1)},\delta_{i})\rightarrow(1-\delta_{i}(\mathcal{G}_{h}(C+L_{g})))^{-1}\Big{(}\Phi_{i}^{(1)}+\delta_{i}\mathcal{G}_{h}\Big{[}N_{g}(\Phi_{i}^{(1)})+\mathcal{N}_{f}(\Phi_{i}^{(1)};\theta)-C\Phi_{i}^{(1)}\Big{]}\Big{)}. (3.8)

Hence, the loss function is defined as

L(θ)=i=1NΦi(2)𝒩R(Φi(1),δi;θ)22.L(\theta)=\sum_{i=1}^{N}\|\Phi_{i}^{(2)}-\mathcal{N}_{R}(\Phi_{i}^{(1)},\delta_{i};\theta)\|_{2}^{2}. (3.9)
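A minimal sketch of the loss (3.9), reusing the linear_step_ac update sketched above with the network in place of f. In the actual training the forward map has to be written in the automatic-differentiation framework (here TensorFlow) so that gradients with respect to θ are available; this NumPy sketch deliberately omits that part.

```python
import numpy as np

def linear_spinn_loss(nn_f, data):
    """Linear SPINN loss (3.9): data is a list of triples (Phi1, Phi2, delta)
    as in (2.1); nn_f maps grid values of phi to approximate bulk values f(phi)."""
    loss = 0.0
    for Phi1, Phi2, delta in data:
        Phi2_pred = linear_step_ac(Phi1, delta, nn_f)   # N_R(Phi1, delta; theta)
        loss += np.sum((Phi2 - Phi2_pred) ** 2)
    return loss
```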

When the time step δ_i is small, by minimizing L(θ) we can identify 𝒩_f, which in turn discovers the PDE problem in (3.1). For certain cases, δ_i might be large, and a single-step marching scheme might be inaccurate. We thus introduce a loss function based on a more general, recursive linear SPINN.

Definition 3.3 (Recursive Linear SPINN loss function).

Denote the neural network 𝒩f:Φ𝒩f(Φ;θ)\mathcal{N}_{f}:\Phi\rightarrow\mathcal{N}_{f}(\Phi;\theta). Given (Φi(1),δi)(\Phi_{i}^{(1)},\delta_{i}), we can approximate Φi(2)\Phi_{i}^{(2)} via a mapping 𝒩RK(Φi(1),δi)=RK\mathcal{N}_{R_{K}}(\Phi_{i}^{(1)},\delta_{i})=R_{K}, where RKR_{K} is defined recursively as

\begin{array}{l}R_{0}=\Phi_{i}^{(1)},\\ R_{j}=\Big(1-\frac{\delta_{i}}{K}\mathcal{G}_{h}(C+L_{g})\Big)^{-1}\Big(R_{j-1}+\frac{\delta_{i}}{K}\mathcal{G}_{h}\Big[N_{g}(R_{j-1})+\mathcal{N}_{f}(R_{j-1};\theta)-CR_{j-1}\Big]\Big),\quad j=1,2,\cdots,K,\end{array} (3.10)

with K and the c_i's hyper-parameters. The loss function is then defined as

L(θ)=iΦi(2)𝒩RK(Φi(1),δi;θ)22.L(\theta)=\sum_{i}\|\Phi_{i}^{(2)}-\mathcal{N}_{R_{K}}(\Phi_{i}^{(1)},\delta_{i};\theta)\|_{2}^{2}. (3.11)
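The recursion (3.10) simply composes the one-step map K times with sub-step δ_i/K; a short sketch, again building on linear_step_ac above:

```python
def recursive_linear_spinn(nn_f, Phi1, delta, K):
    """K-stage recursion (3.10): march from Phi1 over the lag delta in K sub-steps."""
    R = Phi1
    for _ in range(K):
        R = linear_step_ac(R, delta / K, nn_f)
    return R
```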
Remark 3.4.

Similarly, we can design the neural network inspired by a second-order or higher-order numerical scheme. For instance, consider the following second-order predictor-corrector scheme. To solve (3.4) numerically in the time interval [t_i, t_i + δ_i], we introduce the stabilized second-order scheme in two steps:

  • First of all, we can obtain Φ^ti+δi2\hat{\Phi}_{t_{i}+\frac{\delta_{i}}{2}} via

    Φ^ti+δi2Φtiδi/2=𝒢h[Lg(Φ^ti+δi2)+Ng(Φti)+f(Φti)+C(Φ^ti+δi2Φti)].\frac{\hat{\Phi}_{t_{i}+\frac{\delta_{i}}{2}}-\Phi_{t_{i}}}{\delta_{i}/2}=\mathcal{G}_{h}\Big{[}L_{g}(\hat{\Phi}_{t_{i}+\frac{\delta_{i}}{2}})+N_{g}(\Phi_{t_{i}})+f(\Phi_{t_{i}})+C(\hat{\Phi}_{t_{i}+\frac{\delta_{i}}{2}}-\Phi_{t_{i}})\Big{]}.\\ (3.12)
  • Next, we can obtain Φti+δi\Phi_{t_{i}+\delta_{i}} via solving

    Φti+δiΦtiδi=𝒢h[Lg(Φti+δi+Φti2)+Ng(Φ^ti+δi2)+f(Φ^ti+δi2)+C(Φti+δi+Φti2Φ^ti+δi2)],\frac{\Phi_{t_{i}+\delta_{i}}-\Phi_{t_{i}}}{\delta_{i}}=\mathcal{G}_{h}\Big{[}L_{g}(\frac{\Phi_{t_{i}+\delta_{i}}+\Phi_{t_{i}}}{2})+N_{g}(\hat{\Phi}_{t_{i}+\frac{\delta_{i}}{2}})+f(\hat{\Phi}_{t_{i}+\frac{\delta_{i}}{2}})+C(\frac{\Phi_{t_{i}+\delta_{i}}+\Phi_{t_{i}}}{2}-\hat{\Phi}_{t_{i}+\frac{\delta_{i}}{2}})\Big{]}, (3.13)

Therefore, if we define the neural network 𝒩_f: ϕ → 𝒩_f(ϕ; θ), then given (Φ_i^{(1)}, δ_i) we can approximate Φ_i^{(2)} via the mapping defined as

𝒩R:(Φi(1),δi)(1δi2(𝒢h(C+Lg)))1(Φi(1)+δi𝒢h[Lg(Φi(1)2)+Ng(Φ^i+δi2)+𝒩f(Φ^i+δi2;θ)+C(Φi(1)2Φ^i+δi2)]),\begin{array}[]{l}\mathcal{N}_{R}:(\Phi_{i}^{(1)},\delta_{i})\rightarrow(1-\frac{\delta_{i}}{2}(\mathcal{G}_{h}(C+L_{g})))^{-1}\Big{(}\Phi_{i}^{(1)}\\ \qquad\qquad+\delta_{i}\mathcal{G}_{h}\Big{[}L_{g}(\frac{\Phi_{i}^{(1)}}{2})+N_{g}(\hat{\Phi}_{i+\frac{\delta_{i}}{2}})+\mathcal{N}_{f}(\hat{\Phi}_{i+\frac{\delta_{i}}{2}};\theta)+C(\frac{\Phi_{i}^{(1)}}{2}-\hat{\Phi}_{i+\frac{\delta_{i}}{2}})\Big{]}\Big{)},\end{array} (3.14)

where Φ^i+δi/2\hat{\Phi}_{i+\delta_{i}/2} is defined by

Φ^i+δi2=(1δi2(𝒢h(C+Lg)))1(Φi(1)+δi2𝒢h[Ng(Φi(1))+𝒩f(Φi(1);θ)CΦi(1)]).\hat{\Phi}_{i+\frac{\delta_{i}}{2}}=(1-\frac{\delta_{i}}{2}(\mathcal{G}_{h}(C+L_{g})))^{-1}\Big{(}\Phi_{i}^{(1)}+\frac{\delta_{i}}{2}\mathcal{G}_{h}\Big{[}N_{g}(\Phi_{i}^{(1)})+\mathcal{N}_{f}(\Phi_{i}^{(1)};\theta)-C\Phi_{i}^{(1)}\Big{]}\Big{)}. (3.15)

Then the loss function can be defined similarly. Other ideas, such as exponential time integration, can also be utilized to design the neural network structure. For brevity, these ideas will not be pursued in this paper; interested readers are encouraged to explore these topics.

As an analogy, we can mimic the Runge-Kutta method for solving (3.1) to design neural networks. Inspired by the four-stage explicit Runge-Kutta method for the time discretization, we can introduce the following loss function.

Definition 3.5 (Four-stage explicit Runge-Kutta SPINN loss function).

Denote the neural network 𝒩f:Φ𝒩f(Φ;θ)\mathcal{N}_{f}:\Phi\rightarrow\mathcal{N}_{f}(\Phi;\theta). Given (Φi(1),δi)(\Phi_{i}^{(1)},\delta_{i}), we can approximate Φi(2)\Phi_{i}^{(2)} by the mapping 𝒩R(Φi(1),δi)\mathcal{N}_{R}(\Phi_{i}^{(1)},\delta_{i}) defined by the four stage method

𝒩R:(Φi(1),δi)Φi(1)+δi6(K1+2K2+2K3+K4),\mathcal{N}_{R}:(\Phi_{i}^{(1)},\delta_{i})\rightarrow\Phi_{i}^{(1)}+\frac{\delta_{i}}{6}(K_{1}+2K_{2}+2K_{3}+K_{4}), (3.16)

where

\begin{array}{l}K_{1}=\mathcal{G}_{h}\Big[g(\Phi_{i}^{(1)})+\mathcal{N}_{f}(\Phi_{i}^{(1)};\theta)\Big],\\ K_{2}=\mathcal{G}_{h}\Big[g(\Phi_{i}^{(1)}+\frac{\delta_{i}}{2}K_{1})+\mathcal{N}_{f}(\Phi_{i}^{(1)}+\frac{\delta_{i}}{2}K_{1};\theta)\Big],\\ K_{3}=\mathcal{G}_{h}\Big[g(\Phi_{i}^{(1)}+\frac{\delta_{i}}{2}K_{2})+\mathcal{N}_{f}(\Phi_{i}^{(1)}+\frac{\delta_{i}}{2}K_{2};\theta)\Big],\\ K_{4}=\mathcal{G}_{h}\Big[g(\Phi_{i}^{(1)}+\delta_{i}K_{3})+\mathcal{N}_{f}(\Phi_{i}^{(1)}+\delta_{i}K_{3};\theta)\Big].\end{array} (3.17)

Then the loss function can be defined as

L(θ)=i=1NΦi(2)𝒩R(Φi(1),δi)2.L(\theta)=\sum_{i=1}^{N}\|\Phi_{i}^{(2)}-\mathcal{N}_{R}(\Phi_{i}^{(1)},\delta_{i})\|^{2}. (3.18)
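A sketch of the forward map (3.16)-(3.17) for the Allen-Cahn case, with the unknown bulk term replaced by the network; laplacian_h is the pseudo-spectral Laplacian sketched in Section 2, and the default parameters are illustrative assumptions of ours.

```python
def rk4_spinn_step(nn_f, Phi, delta, M=10.0, eps=0.02, Lx=2.0, Ly=2.0):
    """One explicit four-stage Runge-Kutta step (3.16)-(3.17), Allen-Cahn case:
    d/dt Phi = G_h[g(Phi) + N_f(Phi)] = -M*(-eps^2*Delta_h*Phi + N_f(Phi))."""
    def rhs(P):
        return -M * (-eps**2 * laplacian_h(P, Lx, Ly) + nn_f(P))
    K1 = rhs(Phi)
    K2 = rhs(Phi + 0.5 * delta * K1)
    K3 = rhs(Phi + 0.5 * delta * K2)
    K4 = rhs(Phi + delta * K3)
    return Phi + delta / 6.0 * (K1 + 2.0 * K2 + 2.0 * K3 + K4)
```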

Similarly, when the time step δi\delta_{i} is large, we can define the loss function via the explicit RK SPINN recursively as below.

Definition 3.6 (Recursive four-stage Runge-Kutta SPINN loss function).

Denote the neural network 𝒩f:Φ𝒩f(Φ;θ)\mathcal{N}_{f}:\Phi\rightarrow\mathcal{N}_{f}(\Phi;\theta). Given the data (Φi(1),δi)(\Phi_{i}^{(1)},\delta_{i}), we can approximate Φi(2)\Phi_{i}^{(2)} via the KK-step recursive mapping denoted as 𝒩RK:(Φi(1),δi)RK\mathcal{N}_{R_{K}}:(\Phi_{i}^{(1)},\delta_{i})\rightarrow R_{K}, where

\begin{array}{l}R_{0}=\Phi_{i}^{(1)},\\ R_{j}=R_{j-1}+\frac{\delta_{i}}{6K}(K_{1}^{j-1}+2K_{2}^{j-1}+2K_{3}^{j-1}+K_{4}^{j-1}),\quad j=1,2,\cdots,K,\\ K_{1}^{j-1}=\mathcal{G}_{h}\Big[g(R_{j-1})+\mathcal{N}_{f}(R_{j-1};\theta)\Big],\\ K_{2}^{j-1}=\mathcal{G}_{h}\Big[g(R_{j-1}+\frac{\delta_{i}}{2K}K_{1}^{j-1})+\mathcal{N}_{f}(R_{j-1}+\frac{\delta_{i}}{2K}K_{1}^{j-1};\theta)\Big],\\ K_{3}^{j-1}=\mathcal{G}_{h}\Big[g(R_{j-1}+\frac{\delta_{i}}{2K}K_{2}^{j-1})+\mathcal{N}_{f}(R_{j-1}+\frac{\delta_{i}}{2K}K_{2}^{j-1};\theta)\Big],\\ K_{4}^{j-1}=\mathcal{G}_{h}\Big[g(R_{j-1}+\frac{\delta_{i}}{K}K_{3}^{j-1})+\mathcal{N}_{f}(R_{j-1}+\frac{\delta_{i}}{K}K_{3}^{j-1};\theta)\Big].\end{array} (3.19)

And the loss function is defined by

L(\theta)=\sum_{i=1}^{N}\|\Phi_{i}^{(2)}-\mathcal{N}_{R_{K}}(\Phi_{i}^{(1)},\delta_{i})\|^{2}. (3.20)
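As with the linear case, the recursion (3.19) is just the RK4 step applied K times with sub-step δ_i/K; a short sketch reusing rk4_spinn_step above:

```python
def recursive_rk4_spinn(nn_f, Phi1, delta, K):
    """K-stage recursion (3.19): apply the RK4 step with sub-step delta/K."""
    R = Phi1
    for _ in range(K):
        R = rk4_spinn_step(nn_f, R, delta / K)
    return R
```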
Remark 3.7.

When the time step is even larger, one can use the implicit Runge-Kutta method for the time discretization. However, given that the intermediate stages are unknown, we need to introduce an extra neural network to approximate them:

𝒩q:(Φi(1),δi)(𝒩c1(Φi(1),δi),𝒩c2(Φi(1),δi),,𝒩cq(Φi(1),δi)),\mathcal{N}_{q}:(\Phi_{i}^{(1)},\delta_{i})\rightarrow(\mathcal{N}_{c_{1}}(\Phi_{i}^{(1)},\delta_{i}),\mathcal{N}_{c_{2}}(\Phi_{i}^{(1)},\delta_{i}),\cdots,\mathcal{N}_{c_{q}}(\Phi_{i}^{(1)},\delta_{i})), (3.21)

where 𝒩cj(Φi(1),δi)\mathcal{N}_{c_{j}}(\Phi_{i}^{(1)},\delta_{i}) is to approximate Φ(ti+cjδi)\Phi(t_{i}+c_{j}\delta_{i}). By following the idea in [16], we can define the following loss function

\begin{array}{l}L(\theta)=\sum_{j=1}^{q}\Big\|\Phi_{i}^{(1)}-\Big[\mathcal{N}_{c_{j}}(\Phi_{i}^{(1)},\delta_{i})-\delta_{i}\sum_{k=1}^{q}a_{jk}\,\mathcal{G}_{h}\big[g(\mathcal{N}_{c_{k}}(\Phi_{i}^{(1)},\delta_{i}))+\mathcal{N}_{f}(\mathcal{N}_{c_{k}}(\Phi_{i}^{(1)},\delta_{i});\theta)\big]\Big]\Big\|^{2}\\ \qquad+\sum_{j=1}^{q}\Big\|\Phi_{i}^{(2)}-\Big[\mathcal{N}_{c_{j}}(\Phi_{i}^{(1)},\delta_{i})+\delta_{i}\sum_{k=1}^{q}(b_{k}-a_{jk})\,\mathcal{G}_{h}\big[g(\mathcal{N}_{c_{k}}(\Phi_{i}^{(1)},\delta_{i}))+\mathcal{N}_{f}(\mathcal{N}_{c_{k}}(\Phi_{i}^{(1)},\delta_{i});\theta)\big]\Big]\Big\|^{2}.\end{array} (3.22)

Here a_{jk}, b_k, and c_j are the Butcher coefficients of the q-stage implicit Runge-Kutta method. Such an approach incurs expensive computational costs, and we will not investigate it in the current paper.

4 Numerical examples

Next, we investigate the proposed SPINN with several examples, identifying phase field models from data snapshots.

Recall that we assume f(ϕ) is unknown, and the goal is to identify it by learning from existing data in the form of (2.1). As explained, we define a neural network 𝒩_f: ϕ → 𝒩_f(ϕ; θ) to approximate f and use the loss functions defined in the previous section. In the rest of this paper, we take 𝒩_f to be a feed-forward neural network with 2 hidden layers, each having 20 neurons; the tanh activation function is applied in both hidden layers. During the training process, the Adam method with the default learning rate is used for 10,000 training iterations, followed by an L-BFGS-B optimization stage. The algorithms are implemented with TensorFlow.
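A hedged TensorFlow sketch of this setup is given below: the network 𝒩_f has two hidden layers of 20 tanh units and is trained with Adam at the default learning rate. Here spinn_loss stands for one of the SPINN losses of Section 3, which would have to be written with TensorFlow ops (e.g. tf.signal.fft2d for the spectral part) so that gradients propagate; the follow-up L-BFGS-B stage is omitted. None of the names below are from the original implementation.

```python
import tensorflow as tf

# N_f: scalar phi -> approximate bulk value f(phi); applied pointwise on the grid
# (reshape the grid values to shape (-1, 1) and back).
nn_f = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam()  # default learning rate

def train(spinn_loss, data, iterations=10000):
    """Adam stage of the training; spinn_loss(nn_f, data) returns a scalar tensor."""
    for _ in range(iterations):
        with tf.GradientTape() as tape:
            loss = spinn_loss(nn_f, data)
        grads = tape.gradient(loss, nn_f.trainable_variables)
        optimizer.apply_gradients(zip(grads, nn_f.trainable_variables))
```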

For simplicity of discussion, in all the examples below rand(𝐱) generates random numbers in [-1, 1], we choose the domain Ω = [-1, 1]^2, and we set N_x = N_y = 128 in Ω_h, i.e., the collected data Φ_i^{(1)}, Φ_i^{(2)} are matrices in ℝ^{128×128}. We first solve the PDE with a high-order-accurate scheme with uniform time steps and uniform spatial discretization; the numerical solutions are then randomly sampled at different time slots as training data to inversely discover the bulk function f. Given the free energy in (1.2), we get g(ϕ) = -ε²Δϕ and L_g(Φ) = -ε²Δ_hΦ. We choose C = 2 for the AC equation and C = -2Δ_h for the CH equation.

Example 1. In the first example, we generate data by solving the Allen-Cahn equation in (1.3) with F(ϕ) = (1/4)(1 - ϕ²)², which means f(ϕ) = ϕ³ - ϕ. The parameters used are ε = 0.02 and M = 10, with the initial condition ϕ(𝐱, t = 0) = 0.25 rand(𝐱). Some snapshots of ϕ at various time slots are shown in Figure 4.1.

Figure 4.1: Snapshots of the solution Φ for the Allen-Cahn equation at various time slots t = 0.325, 0.375, 0.575, 0.825.

First of all, we test the Runge-Kutta SPINN approach. We choose N = 1, i.e., we use only a single data pair (Φ^{(1)}, Φ^{(2)}, δ) to train the neural network. We randomly choose Φ^{(1)} and study how the size of δ affects the learned result. One experiment, with Φ^{(1)} chosen at t = 0.325 and δ = 0.05, 0.1, 0.25, 0.5, is summarized in Figure 4.2. We observe that when the two snapshots are close, i.e., δ is small, the Runge-Kutta SPINN approach can accurately learn the bulk function f. However, when the time step δ is large, its accuracy drops.

Figure 4.2: Bulk function f predicted using the Runge-Kutta SPINN loss function with δ = 0.05, 0.1, 0.25, 0.5.

To overcome the inaccuracy when δ is large, we utilize the recursive Runge-Kutta SPINN approach. Here we fix the time step δ = 0.25 and test the effect of using different numbers of recursive stages, K = 1, 20, 50, 100. The results are summarized in Figure 4.3. We observe that, by increasing the number of recursive stages, the accuracy improves, and the bulk function f can be learned accurately when sufficiently many stages are used.

Figure 4.3: Bulk function f predicted using the recursive Runge-Kutta SPINN loss function with K = 1, 50, 100. The figure indicates that, with sufficient recursive stages, the method gives an accurate prediction of f.

Example 2. However, when the time step is large enough, the recursive Runge-Kutta SPINN approach no longer provides an accurate approximation of f. For instance, with the time step δ = 0.5, the recursive Runge-Kutta SPINN approach fails, whereas the linear SPINN approach shows superior accuracy. As an example, we use a single data pair (Φ^{(1)}, Φ^{(2)}, δ) with a fixed time step δ = 0.5. We vary the number of recursive stages K for the recursive linear SPINN, and the results for the learned function f are summarized in Figure 4.4. We observe that, even when the time step δ is large, the recursive linear SPINN approach still provides an accurate approximation of f, so long as the number of recursive stages K is large enough.

Figure 4.4: Predicted bulk function f with K = 1, 10, 50, 100. It shows that, as K increases, the accuracy of the SPINN improves.

We remark that the training strategy and the quality of training data might also be factors for the approximation accuracy, which we will not pursue in detail. Interested readers are strongly encouraged to explore.

Example 3. In the next example, we increase the problem complexity to identify a highly nonlinear bulk function. In detail, we generate the data by solving an Allen-Cahn equation with the Flory-Huggins free energy F(ϕ) = ϕ ln(ϕ) + 0.5(1 - ϕ)ln(1 - ϕ) + 2ϕ(1 - ϕ), which means the bulk function f = ln ϕ - 0.5 ln(1 - ϕ) + 1.5 - 2ϕ. The parameters used are M = 10 and ε = 0.1, along with the initial condition ϕ(𝐱, t = 0) = (1/2)(1 + tanh((0.8 - √(x² + y²))/(√2 ε))). We randomly sample two snapshots with δt = 0.05 and train the neural network. An example using two snapshots at t = 0.325, 0.375 is shown in Figure 4.5, where the two snapshots are shown in Figure 4.5(a) and the predicted function f is shown in Figure 4.5(b). We observe that the linear SPINN approach can accurately learn the bulk function f from only two images.

Figure 4.5: Predicted bulk function f for the AC equation with Flory-Huggins bulk free energy using the linear SPINN with K = 10. (a) Snapshots of the solution ϕ for the Allen-Cahn equation at t = 0.325, 0.375. (b) The predicted bulk function f learned from the data in (a).

Example 4. In the last example, we use the linear SPINN approach to discover the bulk function f from solution snapshots of the Cahn-Hilliard equation in (1.4). Here the data are obtained by solving the Cahn-Hilliard equation with F(ϕ) = (1/4)(ϕ² - 1)², i.e., f(ϕ) = ϕ³ - ϕ, M = 1, ε = 0.05, and initial condition ϕ(t = 0) = 0.2 + 0.001 rand(𝐱). For the loss function, we add an extra term 10³‖𝒩_f(0)‖² to enforce the uniqueness of 𝒩_f. Some temporal snapshots of Φ are shown in Figure 4.6. We randomly choose data {(Φ_i^{(1)}, Φ_i^{(2)}, δ_i)}_{i=1}^N, with N = 1, 5, 10 data pairs and a fixed time lag δ_i = 0.05. The learned result f is summarized in Figure 4.7. We observe that the predicted bulk function f has improved accuracy when more data are used to train the neural network.
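To illustrate how the linear SPINN forward map (3.8) specializes to this Cahn-Hilliard setting (𝒢_h = MΔ_h, L_g(Φ) = -ε²Δ_hΦ, N_g = 0, C = -2Δ_h), a NumPy sketch of one step in Fourier space is given below; the extra penalty 10³‖𝒩_f(0)‖² mentioned above would simply be added to the loss. This is an assumption-laden sketch of ours, not the code used to generate the figures.

```python
import numpy as np

def linear_step_ch(Phi, delta, nn_f, M=1.0, eps=0.05, Lx=2.0, Ly=2.0):
    """One step of (3.8) for the Cahn-Hilliard case:
    G_h = M*Delta_h, L_g(Phi) = -eps^2*Delta_h*Phi, N_g = 0, C = -2*Delta_h."""
    Nx, Ny = Phi.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(Nx, d=Lx / Nx)
    ky = 2.0 * np.pi * np.fft.fftfreq(Ny, d=Ly / Ny)
    KX, KY = np.meshgrid(kx, ky, indexing="ij")
    k2 = KX**2 + KY**2                        # symbol of -Delta_h
    Phi_hat = np.fft.fft2(Phi)
    f_hat = np.fft.fft2(nn_f(Phi))
    # G_h[ f(Phi) - C*Phi ] in Fourier space: -M*k2*f_hat + 2*M*k2^2*Phi_hat.
    rhs_hat = Phi_hat + delta * M * (-k2 * f_hat + 2.0 * k2**2 * Phi_hat)
    # Symbol of (1 - delta*G_h(C + L_g)) = 1 + delta*M*(2 + eps^2)*k2^2.
    denom = 1.0 + delta * M * (2.0 + eps**2) * k2**2
    return np.real(np.fft.ifft2(rhs_hat / denom))
```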

Figure 4.6: Snapshots of the solution Φ for the Cahn-Hilliard equation at various time slots t = 0.05, 0.325, 0.375, 1.

Figure 4.7: Predicted bulk function f for the Cahn-Hilliard equation with various numbers of data pairs N = 1, 5, 10.

5 Conclusion

In this paper, we introduce a pseudo-spectral physics informed neural network to discover the bulk function in phase field models. The newly proposed method fits well with the common data collection strategy, i.e., taking snapshots/images at various time slots. The effectiveness of the proposed pseudo-spectral PINN, or SPINN, has been verified through identifying the bulk function f of several phase field models. The idea of the pseudo-spectral PINN is rather general, and it can be readily applied to discover other PDE models from image data, which will be investigated in our later research projects.

Acknowledgment

Jia Zhao would like to acknowledge the support from NSF DMS-1816783.

Conflict of Interest Statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

References

  • [1] J. Berg and K. Nystrom. A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing, 317:28–41, 2018.
  • [2] S. L. Brunton, J. L. Proctor, and J. N. Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016.
  • [3] J. W. Cahn and J. E. Hilliard. Free energy of a nonuniform system. I. interfacial free energy. Journal of Chemical Physics, 28:258–267, 1958.
  • [4] L. Chen, J. Zhao, and Y. Gong. A novel second-order scheme for the molecular beam epitaxy model with slope selection. Commun. Comput. Phys, x(x):1–21, 2019.
  • [5] W. E and B. Yu. The deep ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6:1–12, 2018.
  • [6] F. Guillen-Gonzalez and G. Tierra. Second order schemes and time-step adaptivity for Allen-Cahn and Cahn-Hilliard models. Computers Mathematics with Applications, 68(8):821–846, 2014.
  • [7] Y. Gong and J. Zhao. Energy-stable Runge–Kutta schemes for gradient flow models using the energy quadratization approach. Applied Mathematics Letters, 94:224–231, 2019.
  • [8] D. Han and X. Wang. A second order in time uniquely solvable unconditionally stable numerical schemes for Cahn-Hilliard-Navier-Stokes equation. Journal of Computational Physics, 290(1):139–156, 2015.
  • [9] J. Han, A. Jentzen, and W. E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
  • [10] C. Higham and D. Higham. Deep learning: An introduction for applied mathematicians. arXiv, page 1801.05894, 2018.
  • [11] B. Li, S. Tang, and H. Yu. Better approximations of high dimensional smooth functions by deep neural networks with rectified power units. Communications in Computational Physics, 27:379–411, 2020.
  • [12] Z. Long, Y. Lu, X. Ma, and B. Dong. Pde-net: Learning pdes from data. Proceedings of the 35th International Conference on Machine Learning, 80:3208–3216, 2018.
  • [13] L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis. Deepxde: A deep learning library for solving differential equations. arXiv, page 1907.04502, 2019.
  • [14] T. Qin, K. Wu, and D. Xiu. Data driven governing equations approximation using deep neural networks. Journal of Computational Physics, 395:620–635, 2019.
  • [15] M. Raissi. Deep hidden physics models: Deep learning of nonlinear partial differential equations. The Journal of Machine Learning Research, 19:1–24, 2018.
  • [16] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • [17] S. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz. Data-driven discovery of partial differential equations. Science Advances, 3(4):e1602614, 2017.
  • [18] J. Shen and X. Yang. Numerical approximations of Allen-Cahn and Cahn-Hilliard equations. Disc. Conti. Dyn. Sys.-A, 28:1669–1691, 2010.
  • [19] C. Wang, X. Wang, and S. Wise. Unconditionally stable schemes for equations of thin film epitaxy. Discrete and Continuous Dynamic Systems, 28(1):405–423, 2010.
  • [20] Y. Wang and C. Lin. Runge-Kutta neural network for identification of dynamical systems in high accuracy. IEEE Transactions on Neural Networks, 9(2):294–307, 1998.
  • [21] K. Xu and D. Xiu. Data-driven deep learning of partial differential equations in modal space. Journal of Computational Physics, 408:109307, 2020.
  • [22] X. Yang, J. Zhao, and Q. Wang. Numerical approximations for the molecular beam epitaxial growth model based on the invariant energy quadratization method. Journal of Computational Physics, 333:102–127, 2017.
  • [23] H. Zhao, B. Storey, R. Braatz, and M. Bazant. Learning the physics of pattern formation from images. Physical Review Letters, 124, 060201, 2020.
  • [24] J. Zhao, X. Yang, Y. Gong, X. Zhao, X.G. Yang, J. Li, and Q. Wang. A general strategy for numerical approximations of non-equilibrium models–Part I: thermodynamical systems. International Journal of Numerical Analysis & Modeling, 15(6):884–918, 2018.