
Hybrid explicit-implicit learning for multiscale problems with time dependent source

Yalchin Efendiev (Department of Mathematics, Texas A&M University, College Station, TX 77843, USA), Wing Tat Leung (Department of Mathematics, City University of Hong Kong, Hong Kong SAR), Wenyuan Li (Department of Mathematics, Texas A&M University, College Station, TX 77843, USA), Zecheng Zhang (Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA)

Abstract

The splitting method is a powerful method for solving partial differential equations. Various splitting methods have been designed to separate different physics, nonlinearities, and so on. Recently, a new splitting approach has been proposed where some degrees of freedom are handled implicitly while other degrees of freedom are handled explicitly. As a result, the scheme contains two equations, one implicit and the other explicit. The stability of this approach has been studied, and it was shown that the time step scales as the coarse spatial mesh size, which can provide a significant computational advantage. However, the implicit solution part can be expensive, especially for nonlinear problems. In this paper, we introduce modified partial machine learning algorithms to replace the implicit solution part of the algorithm. These algorithms were first introduced in ‘HEI: Hybrid Explicit-Implicit Learning For Multiscale Problems’, where a homogeneous source term is considered and a Transformer, a neural network that can predict future dynamics, is used. In this paper, we consider time-dependent source terms, which is a generalization of the previous work. Moreover, we use the whole history of the solution to train the network. As the implicit part of the equations is more complicated to solve, we design a neural network to predict it based on training. Furthermore, we compute the explicit part of the solution using our splitting strategy. In addition, we use Proper Orthogonal Decomposition based model reduction in machine learning. The machine learning algorithms provide computational savings without sacrificing accuracy. We present three numerical examples which show that our machine learning scheme is stable and accurate.

1 Introduction

Many physical problems vary at multiple space and time scales. The solutions of these problems are difficult to compute as we need to resolve the time and space scales. For example, porous media flow and transport occur over many space and time scales. Many multiscale and splitting algorithms have been proposed to solve these problems [28, 18, 41]. In this paper, we would like to combine machine learning and splitting/multiscale approaches to predict the dynamics of complex forward problems. Our goal is to perform machine learning on a small part of the solution space, while computing “the rest” of the solution fast. Our proposed splitting algorithms, which we describe below, make this possible.

Our approaches use multiscale methods in their construction to achieve coarser time stepping. Multiscale methods have extensively been studied in the literature. For linear problems, homogenization-based approaches [23, 39, 40], multiscale finite element methods [23, 31, 35], generalized multiscale finite element methods (GMsFEM) [12, 13, 14, 17, 21], Constraint Energy Minimizing GMsFEM (CEM-GMsFEM) [15, 16, 7], nonlocal multi-continua (NLMC) approaches [19], metric-based upscaling [44], heterogeneous multiscale method [20], localized orthogonal decomposition (LOD) [30], equation-free approaches [45, 46], multiscale stochastic approaches [33, 34, 32], and hierarchical multiscale methods [5] have been studied. Approaches such as GMsFEM and NLMC are designed to handle high-contrast problems. For nonlinear problems, we can replace linear multiscale basis functions with nonlinear maps [26, 27, 22].

Splitting methods have been introduced for solving dynamic problems [3, 47, 29]. There is a wide range of applications for splitting methods, e.g., splitting physics, scales, and so on. Splitting methods allow simplifying a large-scale problem into smaller-scale problems; after we solve the smaller-scale problems, we couple them and obtain the final solution. Recently, a partially explicit splitting method [28, 18, 24] has been proposed for linear and nonlinear multiscale problems. In this approach, we treat some solution parts implicitly while treating the other parts explicitly. The stability analysis shows that the time step only depends on the explicit part of the solution. With a careful splitting of the space, we are able to make the time step scale as the coarse mesh size and be independent of the contrast.

To illustrate the main idea of our methods, we consider

u_{t}+f(u)=g(x,t).

In this paper, we combine machine learning and splitting approaches (such as the partially explicit method). In the splitting algorithm, we write the solution as u=u_{1}+u_{2}, where u_{1} is computed implicitly and u_{2} is computed explicitly. In our previous works [25, 11], the Transformer has been used to solve this kind of problem. In [25], we consider a zero source term and use the Transformer to predict the future dynamics of the solution. In our current paper, we consider full dynamics and time-dependent source terms. The latter occur in porous media applications and represent the rates of wells, which vary in time. The goal in these simulations is to compute the solutions at all time steps for a given time-dependent source term in an efficient way, which has not been addressed before.

Many machine learning and multiscale methods are designed for solving multiscale parametric PDEs [11, 48, 49, 9, 43, 42]. The advantage of machine learning is that it treats PDEs as an input-to-output map. By training, and possibly using observed data, machine learning avoids computing solutions for each input (such as parameters in boundary conditions, permeability fields or sources). In this paper, we use machine learning to learn the difficult-to-compute part of the solution, which is, in general, computed implicitly. This is a coarse-grid part of the solution, which can also be observed given some dynamic data. The splitting algorithm gives two coupled equations. The first equation for u_{1} is implicit while the second equation for u_{2} is explicit. We design a neural network to predict the first component of the solution. Then, we calculate the second component using the discretized equations. The inputs of our neural network are the parameters that characterize the source term of the equation. The outputs are the whole history of u_{1}, i.e. u_{1} at all time steps. The output dimension of our neural network is the dimension of u_{1} times the number of time steps, which is considerably large. Therefore, we use Proper Orthogonal Decomposition (POD) [48, 10, 38] to reduce the dimension of the solution space. We concatenate all solution vectors of every training input together to generate a large-scale matrix. After doing the singular value decomposition, we select several dominant eigenvectors to form the POD basis vectors. Thus, the output becomes the coordinate vectors with respect to the POD basis vectors, and its dimension is greatly reduced, as is the number of parameters inside the neural networks.

In this paper, we show several numerical examples to illustrate our learning framework, which is based on the partially explicit scheme. In our numerical examples, the diffusion term f(u) is -\mathrm{div}(\kappa(x,u)\nabla u) and g(x,t) is a source term that is nonzero on several coarse blocks and zero elsewhere. We choose such a g(x,t) because it simulates the wells in petroleum applications. By comparing with the solution obtained using fine grid basis functions, we find that the relative L^{2} error is small and similar to the scheme where we calculate both components. Thus, the learning scheme can achieve similar accuracy while saving computation.

This paper is organized as follows. We present preliminaries and review the partially explicit splitting scheme in Section 2. In Section 3, we show details about our learning framework. We present numerical experiments in Section 4.

2 Problem Setting

We consider the following parabolic partial differential equation

u_{t}+f(u)=g(x,t). (2.1)

Let V be a Hilbert space. In this equation, f=\cfrac{\delta F}{\delta u}\in V^{\ast} is the variational derivative of the energy F(u):=\int_{\Omega}E(u). In this paper, we use (\cdot,\cdot) to denote (\cdot,\cdot)_{V^{*},V} for simplicity. We assume that f is contrast dependent (linear or nonlinear). Here, g(x,t) is a time-dependent source term with parameters. The weak formulation of Equation (2.1) is the following: find u\in V such that

(u_{t},v)+(f(u),v)=(g(x,t),v)\quad\forall v\in V. (2.2)

Here, (\cdot,\cdot) is the L^{2} inner product. We assume that

(f(v),v)>0,\quad\forall v\in V. (2.3)
Example 1.

In the heat equation, F(u^{\prime})=\cfrac{1}{2}\int_{\Omega}\kappa|\nabla u^{\prime}|^{2}. Let V=H^{1}(\Omega). We have f(u)=\cfrac{\delta F}{\delta u^{\prime}}(u)\in(H^{1}(\Omega))^{*}=H^{1}(\Omega). Then,

(\cfrac{\delta F}{\delta u^{\prime}}(u),v)=\int_{\Omega}\kappa\nabla u\cdot\nabla v.

Therefore, we have

f(u)=-\nabla\cdot(\kappa\nabla u).

The weak formulation (2.2) becomes

(u_{t},v)-(\nabla\cdot(\kappa\nabla u),v)=(g(x,t),v).

The standard approach to solving this problem is the finite element method. Let V_{H} be the finite element space and let u_{H}\in V_{H} be the numerical solution satisfying

(u_{H,t},v)+(f(u_{H}),v)=(g(x,t),v)\quad\forall v\in V_{H}. (2.4)

The Backward Euler (implicit) time discretization of Equation (2.4) is

(\frac{u^{n+1}_{H}-u^{n}_{H}}{\Delta t},v)+(f(u^{n+1}_{H}),v)=(g(x,t^{n+1}),v)\quad\forall v\in V_{H}. (2.5)

The Forward Euler (explicit) time discretization for Equation (2.4) is

(\frac{u^{n+1}_{H}-u^{n}_{H}}{\Delta t},v)+(f(u^{n}_{H}),v)=(g(x,t^{n}),v)\quad\forall v\in V_{H}. (2.6)

One can find details about the stability analysis of the Backward and Forward Euler method in [18].
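For illustration, a minimal sketch of one step of each scheme for a linear f (so that (f(u_{H}),v) corresponds to a stiffness matrix A, with mass matrix M and load vector b assembled on V_{H}; all variable names are ours and dense solves are used only for clarity) might look as follows.

```python
import numpy as np

def backward_euler_step(M, A, b_next, u_n, dt):
    """One implicit step of (2.5): (M/dt + A) u^{n+1} = (M/dt) u^n + b^{n+1}."""
    lhs = M / dt + A
    rhs = M @ u_n / dt + b_next
    return np.linalg.solve(lhs, rhs)

def forward_euler_step(M, A, b_n, u_n, dt):
    """One explicit step of (2.6): (M/dt) u^{n+1} = (M/dt) u^n + b^n - A u^n."""
    rhs = M @ u_n / dt + b_n - A @ u_n
    return np.linalg.solve(M / dt, rhs)
```

The implicit step requires a linear solve with M/\Delta t+A at every time level, while the explicit step only inverts the mass matrix, which is the cost difference that the partially explicit splitting below exploits.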

2.1 Partially Explicit Scheme

In this paper, we use the partially explicit splitting scheme introduced in [18]. We split the space V_{H} into the direct sum of two subspaces V_{H,1} and V_{H,2}, i.e. V_{H}=V_{H,1}\oplus V_{H,2}. Then the solution u_{H}=u_{H,1}+u_{H,2} with u_{H,1}\in V_{H,1} and u_{H,2}\in V_{H,2} satisfies

((u_{H,1}+u_{H,2})_{t},v_{1})+(f(u_{H,1}+u_{H,2}),v_{1})=(g(x,t),v_{1})\quad\forall v_{1}\in V_{H,1},
((u_{H,1}+u_{H,2})_{t},v_{2})+(f(u_{H,1}+u_{H,2}),v_{2})=(g(x,t),v_{2})\quad\forall v_{2}\in V_{H,2}.

We use the partially explicit time discretization for this problem. Namely, we consider finding \{u_{H}^{n}\}_{n=0}^{K} (with u_{H}^{n}=u_{H,1}^{n}+u_{H,2}^{n},\;\forall n) such that

(\frac{u_{H,1}^{n+1}-u_{H,1}^{n}}{\Delta t}+\frac{u_{H,2}^{n}-u_{H,2}^{n-1}}{\Delta t},v_{1})+(f(u_{H,1}^{n+1}+u_{H,2}^{n}),v_{1})=(g(x,t^{n}),v_{1})\quad\forall v_{1}\in V_{H,1}, (2.7)
(\frac{u_{H,1}^{n}-u_{H,1}^{n-1}}{\Delta t}+\frac{u_{H,2}^{n+1}-u_{H,2}^{n}}{\Delta t},v_{2})+(f(u_{H,1}^{n+1}+u_{H,2}^{n}),v_{2})=(g(x,t^{n}),v_{2})\quad\forall v_{2}\in V_{H,2}. (2.8)

We refer to [28, 18] for the details of convergence assumptions and stability analysis of the above scheme.
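In matrix form, for a linear f and with the mass and stiffness matrices split into blocks according to V_{H,1} and V_{H,2} (denoted M11, M12, A11, A12, etc.; the names and the dense solves below are ours, for illustration only), one time step of (2.7)-(2.8) can be sketched as follows.

```python
import numpy as np

def partially_explicit_step(M11, M12, M21, M22, A11, A12, A21, A22,
                            c1_n, c1_nm1, c2_n, c2_nm1, b1_n, b2_n, dt):
    """One step of (2.7)-(2.8) for linear f: the V_{H,1} coefficients c1 are
    updated implicitly, the V_{H,2} coefficients c2 explicitly."""
    # Implicit update from Eq. (2.7): solve for c1^{n+1}
    rhs1 = b1_n + M11 @ c1_n / dt - M12 @ (c2_n - c2_nm1) / dt - A12 @ c2_n
    c1_np1 = np.linalg.solve(M11 / dt + A11, rhs1)
    # Explicit update from Eq. (2.8): only the mass block M22 is inverted
    rhs2 = (b2_n + M22 @ c2_n / dt - M21 @ (c1_n - c1_nm1) / dt
            - A21 @ c1_np1 - A22 @ c2_n)
    c2_np1 = np.linalg.solve(M22 / dt, rhs2)
    return c1_np1, c2_np1
```

The implicit solve involves only the (typically small) CEM block M11/\Delta t+A11, while the stiff, contrast-dependent part acting on V_{H,2} is treated explicitly; this is the part of the computation that we later replace by the learned prediction.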

2.1.1 Construction of V_{H,1} and V_{H,2}

In this section, we present how to construct the two subspaces for our partially explicit splitting scheme. We first present the CEM method, which is used to construct V_{H,1}. Next, with the application of eigenvalue problems, we build V_{H,2}. In the following discussion, for S\subset\Omega we define V(S)=H_{0}^{1}(S).

CEM method

We introduce the CEM finite element method in this section. Let \mathcal{T}_{H} be a coarse grid partition of \Omega. For K_{i}\in\mathcal{T}_{H}, we need to construct a collection of auxiliary basis functions in V(K_{i}). We assume that \{\chi_{i}\} are basis functions that form a partition of unity, e.g., piecewise linear functions, see [4]. We first find \lambda_{j}^{(i)} and \psi_{j}^{(i)} such that the following equation holds:

\int_{K_{i}}\kappa\nabla\psi_{j}^{(i)}\cdot\nabla v=\lambda_{j}^{(i)}s_{i}(\psi_{j}^{(i)},v),\quad\forall v\in V(K_{i}),

where

s_{i}(u,v)=\int_{K_{i}}\tilde{\kappa}uv,\quad\tilde{\kappa}=\kappa H^{-2}\;\text{or}\;\tilde{\kappa}=\kappa\sum_{i}\left|\nabla\chi_{i}\right|^{2}.

After rearranging if necessary, we select the L_{i} eigenfunctions corresponding to the first L_{i} smallest eigenvalues. We define

V_{aux}^{(i)}:=\text{span}\{\psi_{j}^{(i)}:\;1\leq j\leq L_{i}\}.
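For illustration, this local spectral problem can be solved with a standard generalized eigensolver. The following minimal sketch (function and variable names are ours) assumes the local stiffness matrix and the s_{i}-weighted mass matrix on K_{i} have been assembled as dense arrays.

```python
import numpy as np
from scipy.linalg import eigh

def auxiliary_basis(A_loc, S_loc, L_i):
    """Solve the generalized eigenproblem A_loc psi = lambda S_loc psi on K_i
    and keep the eigenvectors of the L_i smallest eigenvalues."""
    # eigh returns eigenvalues in ascending order for a symmetric-definite pencil
    lam, psi = eigh(A_loc, S_loc)
    return lam[:L_i], psi[:, :L_i]   # columns span V_aux^{(i)}
```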

Define a projection operator \Pi:L^{2}(\Omega)\mapsto V_{aux}\subset L^{2}(\Omega) by

s(\Pi u,v)=s(u,v),\quad\forall v\in V_{aux}:=\sum_{i=1}^{N_{e}}V_{aux}^{(i)},

with s(u,v):=\sum_{i=1}^{N_{e}}s_{i}(u|_{K_{i}},v|_{K_{i}}) and N_{e} being the number of coarse elements. We denote by K_{i}^{+} an oversampling domain which is several coarse blocks larger than K_{i} [15]. For every auxiliary basis \psi_{j}^{(i)}, we solve the following two equations and find \phi_{j}^{(i)}\in V(K_{i}^{+}) and \mu_{j}^{(i)}\in V_{aux}:

a(\phi_{j}^{(i)},v)+s(\mu_{j}^{(i)},v)=0,\quad\forall v\in V(K_{i}^{+}),
s(\phi_{j}^{(i)},\nu)=s(\psi_{j}^{(i)},\nu),\quad\forall\nu\in V_{aux}(K_{i}^{+}).

Then we can define the CEM finite element space as

V_{cem}:=\text{span}\{\phi_{j}^{(i)}:\;1\leq i\leq N_{e},1\leq j\leq L_{i}\}.

We define \tilde{V}:=\{v\in V:\;\Pi(v)=0\} and we will need it when we construct V_{H,2} in the next subsection.
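The constrained problem defining \phi_{j}^{(i)} is a saddle-point system in matrix form. A minimal dense sketch (names are ours; boundary conditions on K_{i}^{+} and sparse solvers are omitted for brevity) might read as follows, where A_plus is the fine-scale stiffness matrix on K_{i}^{+}, B_aux[k, :] encodes the bilinear form s(\cdot,\psi_{k}) against the fine-grid basis, and rhs_aux[k]=s(\psi_{j}^{(i)},\psi_{k}).

```python
import numpy as np

def cem_basis(A_plus, B_aux, rhs_aux):
    """Solve  a(phi, v) + s(mu, v) = 0  and  s(phi, nu) = s(psi_j, nu)
    as one saddle-point system with Lagrange multipliers mu."""
    n, m = A_plus.shape[0], B_aux.shape[0]
    K = np.block([[A_plus, B_aux.T],
                  [B_aux, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), rhs_aux])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]          # fine-grid coefficients of phi_j^{(i)}
```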

Construction of V_{H,2}

We build the second subspace V_{H,2} based on V_{H,1} and \tilde{V}. Similarly, we first have to construct the auxiliary basis functions by solving the eigenvalue equations. For every coarse element K_{i}, we search for (\xi_{j}^{(i)},\gamma_{j}^{(i)})\in(V(K_{i})\cap\tilde{V})\times\mathbb{R} such that

\int_{K_{i}}\kappa\nabla\xi_{j}^{(i)}\cdot\nabla v=\gamma_{j}^{(i)}\int_{K_{i}}\xi_{j}^{(i)}v,\quad\forall v\in V(K_{i})\cap\tilde{V}. (2.9)

We choose the J_{i} eigenfunctions corresponding to the smallest J_{i} eigenvalues. The second type of auxiliary space is defined as V_{aux,2}:=\text{span}\{\xi_{j}^{(i)}:1\leq i\leq N_{e},1\leq j\leq J_{i}\}. For every auxiliary basis \xi_{j}^{(i)}\in V_{aux,2}, we look for a basis function \zeta_{j}^{(i)}\in V(K_{i}^{+}) such that for some \mu_{j}^{(i),1}\in V_{aux}, \mu_{j}^{(i),2}\in V_{aux,2}, we have

a(\zeta_{j}^{(i)},v)+s(\mu_{j}^{(i),1},v)+(\mu_{j}^{(i),2},v)=0,\quad\forall v\in V(K_{i}^{+}), (2.10)
s(\zeta_{j}^{(i)},\nu)=0,\quad\forall\nu\in V_{aux}, (2.11)
(\zeta_{j}^{(i)},\nu)=(\xi_{j}^{(i)},\nu),\quad\forall\nu\in V_{aux,2}. (2.12)

We define

V_{H,2}=\text{span}\{\zeta_{j}^{(i)}\,|\;1\leq i\leq N_{e},\;1\leq j\leq J_{i}\}.

3 Machine Learning Approach

In Equation (2.1), we consider g(x,t) as a source term with varying parameters representing temporal changes. For every parameter, we want to find the corresponding solution u(x,t). For the traditional finite element methods, we need to construct meshes and basis functions, assemble the stiffness and mass matrices, and solve the resulting matrix equations, which makes the computational cost large, especially for nonlinear PDEs since we have to assemble the stiffness matrix at every time step. Designing machine learning algorithms for the partially explicit scheme can alleviate this problem. Once the neural network is trained, computing the solution reduces to matrix multiplications and applications of activation functions, which is more efficient than the finite element computations.

Equation (2.7) is implicit in u^{n+1}_{H,1}, which makes the computations more complicated. We design a machine learning algorithm to predict u^{n+1}_{H,1} and then compute u^{n+1}_{H,2} using Equation (2.8).

Figure 3.1: Flowchart.

We present a flowchart in Figure 3.1 to show the workflow of our method. We first need to generate the dataset which will be used to train the neural network. Let w_{m}=(w_{m1},w_{m2},\cdots,w_{mk}) be the parameters in g_{w}(x,t). We set w_{mi}\in[L,U] for i=1,2,\cdots,k, with L and U being the lower bound and upper bound, respectively. We generate the training space \{w_{m}\}_{m=1}^{M} by drawing samples from a uniform distribution over the interval [L,U]. For every w_{m} in the training space, we calculate its corresponding partially explicit solution using Equations (2.7) and (2.8). Let N be the number of time steps and M be the size of our training sample space. We denote this solution space as \{\{u_{H,m}^{n}\}_{n=2}^{N}\}_{m=1}^{M}, where u_{H,m}^{n}=u_{H,m,1}^{n}+u_{H,m,2}^{n}. We remark that n starts from 2 because we need u^{0} and u^{1} to compute u^{2} in the partially explicit scheme; u^{1} is computed by the Backward Euler scheme using basis functions from both V_{H,1} and V_{H,2}. We only need \{\{u_{H,m,1}^{n}\}_{n=2}^{N}\}_{m=1}^{M} for our machine learning algorithm.
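A minimal sketch of this data-generation step is given below; `partially_explicit_solve` is a hypothetical placeholder for a solver implementing Equations (2.7) and (2.8), for instance built from the step function sketched in Section 2.1, and the bounds correspond to our first example.

```python
import numpy as np

rng = np.random.default_rng(0)
M, k, L, U = 1000, 4, 1.0, 10.0           # samples, parameters, bounds
W_train = rng.uniform(L, U, size=(M, k))  # training parameters w_m

snapshots = []                            # histories of u_{H,m,1}
for w in W_train:
    # placeholder: returns u1_hist with shape (N-1, dim V_{H,1})
    u1_hist, u2_hist = partially_explicit_solve(w)
    snapshots.append(u1_hist)
```

Before we move on, we discuss a useful tool called Proper Orthogonal Decomposition (POD) [6].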

For a given set of vectors \{x_{i}\}_{i=1}^{c} with x_{i}\in\mathbb{R}^{d}, we want to construct a smaller dimensional vector space that can represent the space spanned by \{x_{i}\}_{i=1}^{c} accurately, i.e., we want to find orthonormal vectors \{y_{j}\}_{j=1}^{l} that solve

\min\limits_{y_{1},\cdots,y_{l}}\sum_{j=1}^{c}\|x_{j}-\sum_{i=1}^{l}\langle x_{j},y_{i}\rangle_{l^{2}}y_{i}\|^{2}_{l^{2}}, (3.1)

where l\leq r and r is the rank of (x_{1},x_{2},\cdots,x_{c}). We apply the singular value decomposition for this. Next, we briefly recall the POD procedure [6]. We first define a matrix X=(x_{1},x_{2},\cdots,x_{c}) whose dimension is d\times c. Then X^{T}X and XX^{T} are c\times c and d\times d dimensional matrices. Consider the eigenvalues and eigenvectors of X^{T}X as well as XX^{T},

X^{T}Xz_{i}=\sigma_{i}^{2}z_{i},\quad XX^{T}y_{i}=\sigma_{i}^{2}y_{i}.

Next, we define

Y=(y_{1},\cdots,y_{d}),\quad Z=(z_{1},\cdots,z_{c}).

We now have

Y^{T}XZ=\Sigma,

where \Sigma is the diagonal matrix whose main diagonal entries are the singular values \sigma_{i}. After rearranging if necessary, we may assume

\sigma_{1}^{2}\geq\sigma_{2}^{2}\geq\cdots>0.

The solution of (3.1) is given by the l eigenvectors of XX^{T} corresponding to the largest l eigenvalues. Thus, the desired l-dimensional subspace is the one spanned by these eigenvectors.
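In practice, these dominant eigenvectors are obtained directly from a truncated SVD of X; a minimal sketch (function name is ours) is:

```python
import numpy as np

def pod_basis(X, l):
    """Return the l dominant left singular vectors of the snapshot matrix X
    (columns are snapshots), i.e. the minimizers in (3.1)."""
    # Columns of Y are eigenvectors of X X^T, ordered by decreasing singular value
    Y, sigma, _ = np.linalg.svd(X, full_matrices=False)
    return Y[:, :l], sigma[:l]
```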

There are (N-1)\times M column vectors (each in \mathbb{R}^{\dim(V_{H,1})}) in the solution space \{\{u_{H,m}^{n}\}_{n=2}^{N}\}_{m=1}^{M}. We would like to output the solution \{u_{H}^{n}\}_{n=2}^{N} at all time steps at once so that we do not have to iterate over every time step recursively. In this case, the output dimension is \dim(V_{H,1})\times(N-1), which is too large (\dim(V_{H,1}) is 300 in our examples). As a result, the number of parameters in the neural network would also be large, which increases the computational cost. We define

S=(u_{H,1,1}^{2},u_{H,1,1}^{3},\cdots,u_{H,1,1}^{N},u_{H,2,1}^{2},u_{H,2,1}^{3},\cdots,u_{H,2,1}^{N},\cdots,u_{H,M,1}^{2},u_{H,M,1}^{3},\cdots,u_{H,M,1}^{N}).

Hence, S is a \dim(V_{H,1})\times((N-1)\cdot M) matrix. After applying the singular value decomposition to S, we define

P=(p_{1},p_{2},\cdots,p_{l}),

where p_{i}\in\mathbb{R}^{\dim(V_{H,1})} is the eigenvector corresponding to the i-th largest eigenvalue. We use P to get a reduced-dimensional coordinate vector for every u_{H,m,1}^{n}, i.e.

u_{H,m,1}^{n,\prime}=(P^{T}P)^{-1}P^{T}u_{H,m,1}^{n}\in\mathbb{R}^{l}. (3.2)

Thus, the output dimension is reduced to l\times(N-1). In the testing stage, after we get the output \{u_{H,1}^{n,\prime}\}_{n=2}^{N}, we use P to get the coarse grid solution \{u_{H,1}^{n}=Pu_{H,1}^{n,\prime}\}_{n=2}^{N}.
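Putting the pieces together, a minimal sketch of assembling S, projecting with (3.2), and reconstructing (reusing the hypothetical `pod_basis` and `snapshots` from the earlier sketches) is:

```python
import numpy as np

# Columns of S are the vectors u_{H,m,1}^n; each u1_hist has shape (N-1, dim V_{H,1})
S = np.concatenate([u1_hist.T for u1_hist in snapshots], axis=1)
P, _ = pod_basis(S, l=15)

def reduce_coeffs(u1):                          # Eq. (3.2)
    return np.linalg.solve(P.T @ P, P.T @ u1)   # equals P.T @ u1 since P is orthonormal

def reconstruct(u1_prime):
    return P @ u1_prime                         # coarse-grid coefficients of u_{H,1}
```

Since the POD vectors are orthonormal, (P^{T}P)^{-1}P^{T} reduces to P^{T}; we keep the general formula to match (3.2).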

As discussed above, in our neural networks, the inputs are the parameters w in g_{w}(x,t) and the outputs are \{u_{H,m,1}^{n,\prime}\}_{n}. That is,

\{u_{H,m,1}^{n,\prime}\}_{n}=\mathcal{N}(w),

where \mathcal{N} is the neural network. To illustrate its structure,

Y_{1}=\phi_{1}(wW_{1}+b_{1}),

where Y_{1}, \phi_{1}, w, W_{1} and b_{1} are the output of the first layer, the activation function, the input, the weight matrix and the bias term, respectively. Here, Y_{1}, w and b_{1} are row vectors. Similarly,

Y_{i+1}=\phi_{i+1}(Y_{i}W_{i+1}+b_{i+1})

and

Y_{I}=Y_{I-1}W_{I}+b_{I},

where I is the number of layers and i=1,2,\cdots,I-1. Then, we can obtain \{u_{H,m,1}^{n,\prime}\}_{n} by reshaping Y_{I}.
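For completeness, this forward pass can be written in a few lines; the SELU activation shown here is the one we use in the experiments of Section 4, and the helper names are ours.

```python
import numpy as np

def selu(x):
    """Scaled exponential linear unit with the standard constants."""
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def forward(w, weights, biases):
    """Evaluate N(w): hidden layers use the activation, the last layer is linear."""
    Y = np.atleast_2d(w)                  # input as a row vector
    for W, b in zip(weights[:-1], biases[:-1]):
        Y = selu(Y @ W + b)               # Y_{i+1} = phi_{i+1}(Y_i W_{i+1} + b_{i+1})
    return Y @ weights[-1] + biases[-1]   # Y_I = Y_{I-1} W_I + b_I
```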

Next, we would like to prove two theoretical results. Under the setting in Section 3, we first define the map from the input to the output of our neural network as

h:W\rightarrow\mathbb{R}^{l\times(N-1)},
w\mapsto(u_{H,w,1}^{2,\prime,1},u_{H,w,1}^{2,\prime,2},\dots,u_{H,w,1}^{2,\prime,l},u_{H,w,1}^{3,\prime,1},u_{H,w,1}^{3,\prime,2},\dots,u_{H,w,1}^{3,\prime,l},\dots,u_{H,w,1}^{N,\prime,1},u_{H,w,1}^{N,\prime,2},\dots,u_{H,w,1}^{N,\prime,l}),

where W is a compact subset of \mathbb{R}^{k} with norm \|\cdot\|_{W}. We will show that the neural network is capable of approximating h precisely. We then show that when the loss function (mean squared error) goes to zero, the output of the neural network converges to the exact solution (which is computed via the partially explicit scheme). We state the following well-known theorem [36].

Theorem 3.1.

Universal Approximation Theorem. Let \sigma:\mathbb{R}\rightarrow\mathbb{R} be any non-affine continuous function which is continuously differentiable at at least one point, with nonzero derivative at that point. Let \mathcal{N}_{k,D}^{\sigma} be the space of neural networks with k inputs, D outputs, \sigma as the activation function for every hidden layer and the identity function as the activation function for the output layer. Then given any \epsilon>0 and any function \mathcal{H}\in C\left(W,\mathbb{R}^{D}\right), there exists \hat{\mathcal{H}}\in\mathcal{N}_{k,D}^{\sigma} such that

\sup_{w\in W}\|\hat{\mathcal{H}}(w)-\mathcal{H}(w)\|_{l_{\infty}(\mathbb{R}^{D})}<\epsilon.

The universal approximation theorem relies on the continuity assumption of the mapping \mathcal{H}. More specifically, to apply the universal approximation theorem, we now need to show that h is continuous.

Lemma 3.1.

We define \|u\|_{a}^{2}=a(u,u)=(f(u),u) and assume f is linear. We know V_{H}=V_{H,1}\oplus V_{H,2}. The two subspaces V_{H,1} and V_{H,2} are finite dimensional subspaces with trivial intersection. By the strengthened Cauchy-Schwarz inequality [2], there exists a constant \gamma depending on V_{H,1} and V_{H,2} such that

0<\gamma:=\sup_{v_{1}\in V_{H,1},v_{2}\in V_{H,2}}\cfrac{(v_{1},v_{2})}{\|v_{1}\|\|v_{2}\|}<1. (3.3)

If

\Delta t\leq(1-\gamma)\left(\sup_{v_{2}\in V_{H,2}}\cfrac{\|v_{2}\|_{a}^{2}}{\|v_{2}\|^{2}}\right)^{-1}, (3.4)

then h is continuous with respect to w\in W.

The proof is given in Appendix A. With the continuity of h, we have the following theorem.

Theorem 3.2.

For any \epsilon>0, there exists \hat{h}\in\mathcal{N}_{k,l(N-1)}^{\sigma} such that

\sup_{w\in W}\|\hat{h}(w)-h(w)\|_{l_{\infty}(\mathbb{R}^{l(N-1)})}<\epsilon/3.

The theorem follows from the continuity of h and the Universal Approximation Theorem. Finally, we present the estimation of the generalization error.

Theorem 3.3.

Let W_{T}\subset W be the set containing all training samples. Fix \epsilon>0. The functions h and \hat{h} are uniformly continuous since they are continuous and W is compact. It follows that there exists \delta>0 such that

\|\hat{h}(w^{\prime})-\hat{h}(w^{\prime\prime})\|_{l_{\infty}}<\epsilon/3,\quad\|h(w^{\prime})-h(w^{\prime\prime})\|_{l_{\infty}}<\epsilon/3,

for any w^{\prime},w^{\prime\prime}\in W with \|w^{\prime}-w^{\prime\prime}\|<\delta. We assume that for any w\in W, there exists w_{i}\in W_{T} such that \|w-w_{i}\|_{W}<\delta. Let \hat{h}\in\mathcal{N}_{k,l(N-1)}^{\sigma} be the mapping in Theorem 3.2. Then we have

\sup_{w\in W}\|\hat{h}(w)-h(w)\|_{l_{\infty}(\mathbb{R}^{l(N-1)})}<\epsilon.
Proof.

For any w\in W, we can find w_{i}\in W_{T} such that \|w-w_{i}\|_{W}<\delta; it follows that

\|\hat{h}(w)-h(w)\|_{l_{\infty}}=\|\hat{h}(w)-\hat{h}(w_{i})+\hat{h}(w_{i})-h(w_{i})+h(w_{i})-h(w)\|_{l_{\infty}}
\leq\|\hat{h}(w)-\hat{h}(w_{i})\|_{l_{\infty}}+\|\hat{h}(w_{i})-h(w_{i})\|_{l_{\infty}}+\|h(w_{i})-h(w)\|_{l_{\infty}}
\leq\epsilon/3+\epsilon/3+\epsilon/3=\epsilon.

As \epsilon and w are arbitrary, we obtain the conclusion. ∎

4 Numerical Results

In this section, we present numerical examples. In all cases, the domain \Omega is [0,1]^{2}. The permeability field \kappa we use is shown in Figure 4.1. Note that there are high contrast channels in the permeability field. The fine mesh size is \sqrt{2}/100 and the coarse mesh size is \sqrt{2}/10.

Figure 4.1: The permeability field \kappa.

We define the following notation.

  • u_{H,l} is obtained by predicting u_{H,1} and computing u_{H,2} by Equation (2.8).

  • u_{H,c} is obtained by computing u_{H,1} and u_{H,2} by Equations (2.7) and (2.8).

  • u_{H,f} is obtained using the fine grid basis functions and Equation (2.5).

We also need to define the following relative errors for better illustration. Let \|\cdot\| be the L^{2} norm. We know that u_{H,l}=u_{H,l,1}+u_{H,l,2} and u_{H,c}=u_{H,c,1}+u_{H,c,2}.

  • e_{1}=\cfrac{\|u_{H,l}-u_{H,f}\|}{\|u_{H,f}\|}\times 100\%, \quad e_{2}=\cfrac{\|u_{H,c}-u_{H,f}\|}{\|u_{H,f}\|}\times 100\%.

  • e_{3}=\cfrac{\|u_{H,l,1}-u_{H,c,1}\|}{\|u_{H,c,1}\|}\times 100\%, \quad e_{4}=\cfrac{\|u_{H,l}-u_{H,c}\|}{\|u_{H,c}\|}\times 100\%.

e_{1} and e_{2} are used to measure the difference in accuracy between our learning scheme and the scheme in which we compute both u_{H,1} and u_{H,2}. e_{3} and e_{4} are defined to measure how close the learning output is to its target. The neural network is fully connected and is activated by the scaled exponential linear unit (SELU). To solve the minimization problem, we use the gradient-based optimizer Adam [37]. The loss function we use is the mean squared error (MSE). Our machine learning algorithm is implemented using the open-source library Keras [8] with the TensorFlow [1] backend. The neural network we use is a simple neural network with 5 fully connected layers. We remark that with this neural network, we can obtain high accuracy.
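A minimal Keras sketch of this setup is given below; the layer widths are illustrative choices of ours, since only the depth (5 fully connected layers), the SELU activation, the Adam optimizer and the MSE loss are fixed above.

```python
from tensorflow import keras

k, l, N = 4, 15, 100                       # source parameters, POD modes, time steps
model = keras.Sequential(
    [keras.layers.Dense(200, activation="selu", input_shape=(k,))]
    + [keras.layers.Dense(200, activation="selu") for _ in range(3)]
    + [keras.layers.Dense(l * (N - 1))]    # linear output layer of size l*(N-1)
)
model.compile(optimizer=keras.optimizers.Adam(), loss="mse")
# model.fit(W_train, U1_reduced_train, epochs=..., batch_size=...)
```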

In the first two examples, we consider the following linear parabolic partial differential equation:

u_{t}-\nabla\cdot(\kappa\nabla u)=g_{w}(x,t),\;(x,t)\in\Omega\times[0,T],
\nabla u(x)\cdot\vec{n}=0,\;x\in\partial\Omega,
u(x,0)=u_{0},

where \vec{n} is the outward normal vector and u_{0} is the initial condition, which is presented in Figure 4.2. In our first example, the source term is defined as follows,

g_{w}(x,t)=\begin{cases}100(w_{1}\sin(\cfrac{2\pi t}{T})+w_{2}\sin(\cfrac{5.2\pi t}{T}))&\text{ for }x\in[0.2,0.3]^{2}\\ 100(w_{3}\sin(\cfrac{2.4\pi t}{T})+w_{4}\sin(\cfrac{4\pi t}{T}))&\text{ for }x\in[0.8,0.9]^{2}\\ 0&\text{ otherwise, }\end{cases}

where (w_{1},w_{2},w_{3},w_{4}) are parameters. In this case, we let 1\leq w_{i}\leq 10 for i=1,2,3,4. We show a simple sketch of g_{w}(x,t) in Figure 4.2; it only shows the positions of the coarse blocks where the source term is nonzero. We choose this source because it has some similarities to the wells in petroleum engineering. The final time is T=0.01, there are N=100 time steps, and the time step size is \Delta t=T/N=10^{-4}.
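For concreteness, a direct Python transcription of this source term reads (the function name is ours):

```python
import numpy as np

T = 0.01

def g_w(x, t, w):
    """Source term of the first example; x = (x1, x2), w = (w1, w2, w3, w4)."""
    x1, x2 = x
    if 0.2 <= x1 <= 0.3 and 0.2 <= x2 <= 0.3:
        return 100.0 * (w[0] * np.sin(2.0 * np.pi * t / T)
                        + w[1] * np.sin(5.2 * np.pi * t / T))
    if 0.8 <= x1 <= 0.9 and 0.8 <= x2 <= 0.9:
        return 100.0 * (w[2] * np.sin(2.4 * np.pi * t / T)
                        + w[3] * np.sin(4.0 * np.pi * t / T))
    return 0.0
```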

Figure 4.2: Left: Initial condition u_{0}. Right: Source term g(x,t). The square boxes indicate the positions of the wells.

The training space contains 1000 samples and the testing space contains 500 samples. Every sample's parameters w are drawn from the uniform distribution over [1,10]. The number of POD basis vectors is l=15; we choose this l based on the eigenvalue decay in the singular value decomposition. We present e_{1} and e_{2} defined before in Figure 4.3. The error is calculated at each time step as the average over all samples in the testing space. From the plot in Figure 4.3, we find that the curves for e_{1} and e_{2} coincide, which means our learning algorithm can obtain similar accuracy as the scheme in which we compute u_{H} by Equations (2.7) and (2.8). The averages of e_{3} and e_{4} over time are 0.151\% and 0.155\%, respectively, which means the output of our learning scheme closely matches the corresponding target (the solution we obtain by computing both equations).

Figure 4.3: e_{1} and e_{2} of the linear case with 4 parameters.

In the next case, we test our algorithms using a more complicated source term with more parameters. The source term we use is,

g_{w}(x,t)=\begin{cases}100(w_{1}\sin(\cfrac{\pi t}{T})+w_{2}\cos(\cfrac{3.2\pi t}{T}))&\text{ for }x\in[0.2,0.3]^{2}\\ 100(w_{3}\sin(\cfrac{2.2\pi t}{T})+w_{4}\sin(\cfrac{1.6\pi t}{T}))&\text{ for }x\in[0.8,0.9]^{2}\\ 100(w_{5}\cos(\cfrac{3\pi t}{T})+w_{6}\cos(\cfrac{4.6\pi t}{T}))&\text{ for }x\in[0.2,0.3]\times[0.8,0.9]\\ 100(w_{7}\cos(\cfrac{1.4\pi t}{T})+w_{8}\sin(\cfrac{5\pi t}{T}))&\text{ for }x\in[0.8,0.9]\times[0.2,0.3]\\ 100(w_{9}\sin(\cfrac{2.8\pi t}{T})+w_{10}\sin(\cfrac{4\pi t}{T}))&\text{ for }x\in[0.5,0.6]^{2}\\ 0&\text{ otherwise, }\end{cases}

where (w_{1},w_{2},w_{3},\cdots,w_{10}) are parameters. In this case, the parameters w_{i} of every sample are drawn from the uniform distribution over [1,20]. We present a simple sketch of g_{w}(x,t) in Figure 4.4 to illustrate the positions of the sources. The initial condition u_{0} is also shown in Figure 4.4. The final time T, the number of time steps N and the time step size are 0.01, 100 and 10^{-4}, respectively.

Figure 4.4: Left: Initial condition u_{0}. Right: Source term g_{w}(x,t). The square boxes in the plot indicate the positions of the wells.

The model is trained using 2000 samples and we test it with 1000 samples. The number of POD basis vectors is l=25. In Figure 4.5, e_{1} and e_{2} are presented. By observing the error plot, we find that e_{1} and e_{2} are close to each other, which implies that our learning algorithm can obtain similar accuracy as the scheme where we calculate both u_{H,1} and u_{H,2} using Equations (2.7) and (2.8). The averages of e_{3} and e_{4} over time are 0.673\% and 0.677\%, respectively. Such small e_{3} and e_{4} indicate that the prediction is accurate.

Figure 4.5: e_{1} and e_{2} of the linear case with 10 parameters.

In the next case, we consider the following equation

u_{t}-\nabla\cdot(\kappa e^{u}\nabla u)=g(x,t),\;(x,t)\in\Omega\times[0,T],
\nabla u(x)\cdot\vec{n}=0,\;x\in\partial\Omega,
u(x,0)=u_{0},

where \vec{n} is the outward normal vector. This equation is nonlinear and its computational cost is much larger than in the linear case because we have to assemble the stiffness matrix at every time step. We use the same initial condition u_{0} and source term g(x,t) as in the second example. The final time is T=0.001, the number of time steps is N=40 and the time step size is \Delta t=T/N=2.5\times 10^{-5}. We train our neural network using 1000 samples and the network is tested using 500 samples. In Figure 4.6, we present e_{1} and e_{2} defined before. From the plot, we find that the curves for e_{1} and e_{2} nearly coincide, which implies that our machine learning algorithm can achieve the same accuracy as when we compute the solution by Equations (2.7) and (2.8). The averages of e_{3} and e_{4} over time are 0.460\% and 0.450\%, respectively, which means our neural network approximates the target well and achieves high accuracy.

Figure 4.6: e_{1} and e_{2} of the nonlinear case with 10 parameters.

5 Conclusion

In this paper, we construct neural networks based on the partially explicit splitting scheme. We predict u_{1}, the implicit part of the solution, while solving for u_{2}, the explicit part of the solution. The output dimension resulting from u_{1} is large, and we apply POD to reduce it. The POD transformation greatly reduces the output dimension and, at the same time, the number of parameters in the neural networks. The POD error is small compared to the overall error. We discuss three numerical examples and present a theoretical justification. We conclude that our machine learning algorithms can obtain similar accuracy as the partially explicit scheme.

Appendix A Proof of Lemma 3.1

In this section, we prove Lemma 3.1, i.e., we show that the function

h:W\rightarrow\mathbb{R}^{l\times(N-1)},
w\mapsto(u_{H,w,1}^{2,\prime,1},u_{H,w,1}^{2,\prime,2},\dots,u_{H,w,1}^{2,\prime,l},u_{H,w,1}^{3,\prime,1},u_{H,w,1}^{3,\prime,2},\dots,u_{H,w,1}^{3,\prime,l},\dots,u_{H,w,1}^{N,\prime,1},u_{H,w,1}^{N,\prime,2},\dots,u_{H,w,1}^{N,\prime,l})

is continuous. We equip \mathbb{R}^{l\times(N-1)} with the l_{\infty} norm. Let us decompose the mapping as h=\mathcal{P}\circ h_{2}\circ h_{1}, where

h_{1}:(W,\|\cdot\|_{\infty})\to(L^{2}(\Omega\times[0,T],\mathbb{R}),\|\cdot\|_{L^{2}}),\quad w\mapsto g_{w}(x,t),
h_{2}:(L^{2}(\Omega\times[0,T],\mathbb{R}),\|\cdot\|_{L^{2}})\to(H^{1}(\Omega),\|\cdot\|_{H^{1}}),\quad g_{w}(x,t)\mapsto\{u_{H}(x,t^{n})\}_{n=1}^{N},

and \mathcal{P} is the POD transformation defined by (3.2). The map \mathcal{P} is a projection, thus continuous. We assume that g_{w} is continuous with respect to w and therefore h_{1} is continuous. To show that h is continuous, we only need to prove that h_{2} is continuous. The proof is given in the next lemma.

Lemma A.1.

Let us assume f is linear, and we define a bilinear form a(u,v):=(f(u),v) and its corresponding norm as \|u\|_{a}:=\sqrt{a(u,u)}. If

\Delta t\leq(1-\gamma)\left(\sup_{v_{2}\in V_{H,2}}\cfrac{\|v_{2}\|_{a}^{2}}{\|v_{2}\|^{2}}\right)^{-1}, (A.1)

the function h_{2} is continuous with respect to the source term g_{w}(x,t).

Proof.

Case I. The solution u_{i}(x,\Delta t) for i=1,2 is continuous with respect to the source term g_{w}(x,t).
The initial condition u(x,0) is given and the solution at time step 1\cdot\Delta t is calculated via the fully implicit scheme. Thus, we want to show the continuity from the source term to the solution u_{i}^{1} of the following scheme

(\frac{u^{1}_{H}-u^{0}_{H}}{\Delta t},v)+(f(u^{1}_{H}),v)=(g_{w}(x,t^{1}),v)\quad\forall v\in V_{H},

where V_{H}=V_{H,1}\oplus V_{H,2}.

In the following proof, we omit the subscript H for simplicity.

Let w_{1} and w_{2} be two different parameters, and let us denote the corresponding sources as g_{w_{1}} and g_{w_{2}}. The discretized solutions u^{1}_{w_{1}} and u^{1}_{w_{2}} evolved for one time step then satisfy:

(\frac{u^{1}_{w_{1}}-u^{0}}{\Delta t},v)+(f(u^{1}_{w_{1}}),v)=(g_{w_{1}}(x,t^{1}),v)\quad\forall v\in V_{H},
(\frac{u^{1}_{w_{2}}-u^{0}}{\Delta t},v)+(f(u^{1}_{w_{2}}),v)=(g_{w_{2}}(x,t^{1}),v)\quad\forall v\in V_{H}.

It follows immediately that,

(\frac{u^{1}_{w_{1}}-u^{1}_{w_{2}}}{\Delta t},v)+(f(u^{1}_{w_{1}})-f(u^{1}_{w_{2}}),v)=(g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1}),v)\quad\forall v\in V_{H}.

Testing with v=u^{1}_{w_{1}}-u^{1}_{w_{2}}, the linearity of f then yields

\frac{1}{\Delta t}\|u^{1}_{w_{1}}-u^{1}_{w_{2}}\|^{2}+(f(u^{1}_{w_{1}}-u^{1}_{w_{2}}),u^{1}_{w_{1}}-u^{1}_{w_{2}})\leq\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|\|u^{1}_{w_{1}}-u^{1}_{w_{2}}\|
\leq\frac{\Delta t}{2}\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|^{2}+\frac{1}{2\Delta t}\|u^{1}_{w_{1}}-u^{1}_{w_{2}}\|^{2}.

After rearrangement, we have

\frac{1}{2\Delta t}\|u^{1}_{w_{1}}-u^{1}_{w_{2}}\|^{2}+\|u^{1}_{w_{1}}-u^{1}_{w_{2}}\|_{a}^{2}\leq\frac{\Delta t}{2}\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|^{2}.

Applying the strengthened Cauchy-Schwarz inequality (3.3) and letting u_{w_{m}}^{n}=u_{1,w_{m}}^{n}+u_{2,w_{m}}^{n} with u_{1,w_{m}}^{n}\in V_{H,1} and u_{2,w_{m}}^{n}\in V_{H,2}, we have

\|u_{w_{1}}^{n}-u_{w_{2}}^{n}\|^{2}=\|u_{1,w_{1}}^{n}-u_{1,w_{2}}^{n}\|^{2}+\|u_{2,w_{1}}^{n}-u_{2,w_{2}}^{n}\|^{2}+2(u_{1,w_{1}}^{n}-u_{1,w_{2}}^{n},u_{2,w_{1}}^{n}-u_{2,w_{2}}^{n})
\geq\|u_{1,w_{1}}^{n}-u_{1,w_{2}}^{n}\|^{2}+\|u_{2,w_{1}}^{n}-u_{2,w_{2}}^{n}\|^{2}-2\gamma\|u_{1,w_{1}}^{n}-u_{1,w_{2}}^{n}\|\|u_{2,w_{1}}^{n}-u_{2,w_{2}}^{n}\|
\geq(1-\gamma)(\|u_{1,w_{1}}^{n}-u_{1,w_{2}}^{n}\|^{2}+\|u_{2,w_{1}}^{n}-u_{2,w_{2}}^{n}\|^{2}),

for all n>0.

Thus,

\frac{1-\gamma}{2\Delta t}\sum_{i}\|u^{1}_{i,w_{1}}-u^{1}_{i,w_{2}}\|^{2}+\|u^{1}_{w_{1}}-u^{1}_{w_{2}}\|_{a}^{2}\leq\frac{\Delta t}{2}\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|^{2}. (A.2)

Case II. The solution u_{1}(x,n\Delta t) is continuous with respect to the source term g_{w}(x,t), for n=2,3,\dots,N.

In the partially explicit scheme, for the source term g_{w_{1}}, the solution u_{i,w_{1}}^{n+1} satisfies

(\frac{u_{1,w_{1}}^{n+1}-u_{1,w_{1}}^{n}}{\Delta t}+\frac{u_{2,w_{1}}^{n}-u_{2,w_{1}}^{n-1}}{\Delta t},v_{1})+(f(u_{1,w_{1}}^{n+1}+u_{2,w_{1}}^{n}),v_{1})=(g_{w_{1}}(x,t^{n}),v_{1})\quad\forall v_{1}\in V_{H,1}, (A.3)
(\frac{u_{2,w_{1}}^{n+1}-u_{2,w_{1}}^{n}}{\Delta t}+\frac{u_{1,w_{1}}^{n}-u_{1,w_{1}}^{n-1}}{\Delta t},v_{2})+(f(u_{1,w_{1}}^{n+1}+u_{2,w_{1}}^{n}),v_{2})=(g_{w_{1}}(x,t^{n}),v_{2})\quad\forall v_{2}\in V_{H,2}. (A.4)

Similarly, for g_{w_{2}}, the solution u_{i,w_{2}}^{n+1} satisfies

(\frac{u_{1,w_{2}}^{n+1}-u_{1,w_{2}}^{n}}{\Delta t}+\frac{u_{2,w_{2}}^{n}-u_{2,w_{2}}^{n-1}}{\Delta t},v_{1})+(f(u_{1,w_{2}}^{n+1}+u_{2,w_{2}}^{n}),v_{1})=(g_{w_{2}}(x,t^{n}),v_{1})\quad\forall v_{1}\in V_{H,1}, (A.5)
(\frac{u_{2,w_{2}}^{n+1}-u_{2,w_{2}}^{n}}{\Delta t}+\frac{u_{1,w_{2}}^{n}-u_{1,w_{2}}^{n-1}}{\Delta t},v_{2})+(f(u_{1,w_{2}}^{n+1}+u_{2,w_{2}}^{n}),v_{2})=(g_{w_{2}}(x,t^{n}),v_{2})\quad\forall v_{2}\in V_{H,2}. (A.6)

Let us denote r^{n}_{i}=u_{i,w_{1}}^{n}-u_{i,w_{2}}^{n} for i=1,2. Subtracting (A.5) from (A.3) and (A.6) from (A.4), it follows that

(\frac{r_{1}^{n+1}-r_{1}^{n}}{\Delta t}+\frac{r_{2}^{n}-r_{2}^{n-1}}{\Delta t},v_{1})+(f(u_{1,w_{1}}^{n+1}+u_{2,w_{1}}^{n})-f(u_{1,w_{2}}^{n+1}+u_{2,w_{2}}^{n}),v_{1})=(g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n}),v_{1})\quad\forall v_{1}\in V_{H,1},
(\frac{r_{2}^{n+1}-r_{2}^{n}}{\Delta t}+\frac{r_{1}^{n}-r_{1}^{n-1}}{\Delta t},v_{2})+(f(u_{1,w_{1}}^{n+1}+u_{2,w_{1}}^{n})-f(u_{1,w_{2}}^{n+1}+u_{2,w_{2}}^{n}),v_{2})=(g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n}),v_{2})\quad\forall v_{2}\in V_{H,2}.

The linearity of f then yields

(\frac{r_{1}^{n+1}-r_{1}^{n}}{\Delta t}+\frac{r_{2}^{n}-r_{2}^{n-1}}{\Delta t},v_{1})+(f(r_{1}^{n+1}+r_{2}^{n}),v_{1})=(g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n}),v_{1})\quad\forall v_{1}\in V_{H,1},
(\frac{r_{2}^{n+1}-r_{2}^{n}}{\Delta t}+\frac{r_{1}^{n}-r_{1}^{n-1}}{\Delta t},v_{2})+(f(r_{1}^{n+1}+r_{2}^{n}),v_{2})=(g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n}),v_{2})\quad\forall v_{2}\in V_{H,2}.

Let v_{1}=r_{1}^{n+1}-r_{1}^{n} and v_{2}=r_{2}^{n+1}-r_{2}^{n}. We get

(\frac{r_{1}^{n+1}-r_{1}^{n}}{\Delta t}+\frac{r_{2}^{n}-r_{2}^{n-1}}{\Delta t},r_{1}^{n+1}-r_{1}^{n})+(f(r_{1}^{n+1}+r_{2}^{n}),r_{1}^{n+1}-r_{1}^{n})=(g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n}),r_{1}^{n+1}-r_{1}^{n}),
(\frac{r_{2}^{n+1}-r_{2}^{n}}{\Delta t}+\frac{r_{1}^{n}-r_{1}^{n-1}}{\Delta t},r_{2}^{n+1}-r_{2}^{n})+(f(r_{1}^{n+1}+r_{2}^{n}),r_{2}^{n+1}-r_{2}^{n})=(g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n}),r_{2}^{n+1}-r_{2}^{n}).

Summing up these two equations, we have

\frac{1}{\Delta t}\sum_{i}\|r_{i}^{n+1}-r_{i}^{n}\|^{2}+\frac{1}{\Delta t}\sum_{i\neq j}(r_{i}^{n+1}-r_{i}^{n},r_{j}^{n}-r_{j}^{n-1})+(f(r_{1}^{n+1}+r_{2}^{n}),r^{n+1}-r^{n})=(g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n}),r^{n+1}-r^{n}), (A.7)

where r^{n}=r^{n}_{1}+r^{n}_{2}. The second term on the left-hand side of (A.7) can be further estimated as

\frac{1}{\Delta t}\sum_{i\neq j}|(r_{i}^{n+1}-r_{i}^{n},r_{j}^{n}-r_{j}^{n-1})|\leq\frac{\gamma}{\Delta t}\sum_{i\neq j}\|r_{i}^{n+1}-r_{i}^{n}\|\|r_{j}^{n}-r_{j}^{n-1}\|
\leq\frac{\gamma}{2\Delta t}\sum_{i\neq j}(\|r_{i}^{n+1}-r_{i}^{n}\|^{2}+\|r_{j}^{n}-r_{j}^{n-1}\|^{2}),

where we apply the strengthened Cauchy-Schwarz inequality. It follows that

\frac{1}{\Delta t}\sum_{i}\|r_{i}^{n+1}-r_{i}^{n}\|^{2}+\frac{1}{\Delta t}\sum_{i\neq j}(r_{i}^{n+1}-r_{i}^{n},r_{j}^{n}-r_{j}^{n-1})
\geq\frac{2-\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n+1}-r_{i}^{n}\|^{2}-\frac{\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n}-r_{i}^{n-1}\|^{2}
=(\frac{\gamma}{2\Delta t}+\frac{1-\gamma}{\Delta t})\sum_{i}\|r_{i}^{n+1}-r_{i}^{n}\|^{2}-\frac{\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n}-r_{i}^{n-1}\|^{2}.

Noting that (f(r_{1}^{n+1}+r_{2}^{n}),r^{n+1}-r^{n})=a(r_{1}^{n+1}+r_{2}^{n},r^{n+1}-r^{n}), we can then estimate the third term of (A.7):

(f(r_{1}^{n+1}+r_{2}^{n}),r^{n+1}-r^{n})=a(r^{n+1},r^{n+1}-r^{n})-a(r_{2}^{n+1}-r_{2}^{n},r^{n+1}-r^{n})
=\frac{1}{2}(\|r^{n+1}\|_{a}^{2}+\|r^{n+1}-r^{n}\|^{2}_{a}-\|r^{n}\|_{a}^{2})-a(r_{2}^{n+1}-r_{2}^{n},r^{n+1}-r^{n})
\geq\frac{1}{2}(\|r^{n+1}\|_{a}^{2}+\|r^{n+1}-r^{n}\|^{2}_{a}-\|r^{n}\|_{a}^{2})-\frac{1}{2}\|r_{2}^{n+1}-r_{2}^{n}\|_{a}^{2}-\frac{1}{2}\|r^{n+1}-r^{n}\|_{a}^{2}
=\frac{1}{2}(\|r^{n+1}\|_{a}^{2}-\|r^{n}\|_{a}^{2})-\frac{1}{2}\|r_{2}^{n+1}-r_{2}^{n}\|_{a}^{2}.

Now (A.7) can be estimated as:

(\frac{\gamma}{2\Delta t}+\frac{1-\gamma}{\Delta t})\sum_{i}\|r_{i}^{n+1}-r_{i}^{n}\|^{2}-\frac{\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n}-r_{i}^{n-1}\|^{2}+\frac{1}{2}(\|r^{n+1}\|_{a}^{2}-\|r^{n}\|_{a}^{2})-\frac{1}{2}\|r_{2}^{n+1}-r_{2}^{n}\|_{a}^{2}
\leq(g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n}),r^{n+1}-r^{n}).

After rearranging, we have

\frac{\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n+1}-r_{i}^{n}\|^{2}+\frac{1}{2}\|r^{n+1}\|_{a}^{2}+\frac{1-\gamma}{\Delta t}\sum_{i}\|r_{i}^{n+1}-r_{i}^{n}\|^{2}-\frac{1}{2}\|r_{2}^{n+1}-r_{2}^{n}\|_{a}^{2}
\leq\frac{\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n}-r_{i}^{n-1}\|^{2}+\frac{1}{2}\|r^{n}\|_{a}^{2}+(g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n}),r^{n+1}-r^{n})
\leq\frac{\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n}-r_{i}^{n-1}\|^{2}+\frac{1}{2}\|r^{n}\|_{a}^{2}+\gamma\|g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n})\|\|r^{n+1}-r^{n}\|
\leq\frac{\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n}-r_{i}^{n-1}\|^{2}+\frac{1}{2}\|r^{n}\|_{a}^{2}+\frac{\gamma^{2}\Delta t}{1-\gamma}\|g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n})\|^{2}+\frac{1-\gamma}{4\Delta t}\|r^{n+1}-r^{n}\|^{2}
\leq\frac{\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n}-r_{i}^{n-1}\|^{2}+\frac{1}{2}\|r^{n}\|_{a}^{2}+\frac{\gamma^{2}\Delta t}{1-\gamma}\|g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n})\|^{2}+\frac{1-\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n+1}-r_{i}^{n}\|^{2}.

We assume the CFL condition (A.1), specifically,

(\frac{1-\gamma}{\Delta t}-\frac{1-\gamma}{2\Delta t})\sum_{i}\|r_{i}^{n+1}-r_{i}^{n}\|^{2}-\frac{1}{2}\|r_{2}^{n+1}-r_{2}^{n}\|_{a}^{2}\geq 0.

This yields,

\frac{\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n+1}-r_{i}^{n}\|^{2}+\frac{1}{2}\|r^{n+1}\|_{a}^{2}
\leq\frac{\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n}-r_{i}^{n-1}\|^{2}+\frac{1}{2}\|r^{n}\|_{a}^{2}+\frac{\gamma^{2}\Delta t}{1-\gamma}\|g_{w_{1}}(x,t^{n})-g_{w_{2}}(x,t^{n})\|^{2}
\leq\frac{\gamma}{2\Delta t}\sum_{i}\|u_{i,w_{1}}^{1}-u_{i,w_{2}}^{1}-(u_{i,w_{1}}^{0}-u_{i,w_{2}}^{0})\|^{2}+\frac{1}{2}\|u_{w_{1}}^{1}-u_{w_{2}}^{1}\|_{a}^{2}+\frac{\gamma^{2}\Delta t}{1-\gamma}\sum_{k=1}^{n}\|g_{w_{1}}(x,t^{k})-g_{w_{2}}(x,t^{k})\|^{2}
=\frac{\gamma}{2\Delta t}\sum_{i}\|u_{i,w_{1}}^{1}-u_{i,w_{2}}^{1}\|^{2}+\frac{1}{2}\|u_{w_{1}}^{1}-u_{w_{2}}^{1}\|_{a}^{2}+\frac{\gamma^{2}\Delta t}{1-\gamma}\sum_{k=1}^{n}\|g_{w_{1}}(x,t^{k})-g_{w_{2}}(x,t^{k})\|^{2}
\leq\frac{\gamma\Delta t}{2(1-\gamma)}\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|^{2}+\frac{\gamma^{2}\Delta t}{1-\gamma}\sum_{k=1}^{n}\|g_{w_{1}}(x,t^{k})-g_{w_{2}}(x,t^{k})\|^{2}, (A.8)

where we use (A.2) in the last inequality. Then,

\frac{\gamma}{2\Delta t}\sum_{i}\|r_{i}^{n+1}\|^{2}\leq\frac{\gamma}{\Delta t}\sum_{i}(\|r_{i}^{n+1}-r_{i}^{n}\|^{2}+\|r_{i}^{n}\|^{2})
\leq\frac{\gamma\Delta t}{1-\gamma}\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|^{2}+\frac{2\gamma^{2}\Delta t}{1-\gamma}\sum_{k=1}^{n}\|g_{w_{1}}(x,t^{k})-g_{w_{2}}(x,t^{k})\|^{2}+\frac{\gamma}{\Delta t}\sum_{i}\|r_{i}^{n}\|^{2}
\leq\frac{n\gamma\Delta t}{1-\gamma}\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|^{2}+\frac{2\gamma^{2}\Delta t}{1-\gamma}\sum_{ind=1}^{n}\sum_{k=1}^{ind}\|g_{w_{1}}(x,t^{k})-g_{w_{2}}(x,t^{k})\|^{2}+\frac{\gamma}{\Delta t}\sum_{i}\|r_{i}^{1}\|^{2}
\leq\frac{(n+1)\gamma\Delta t}{1-\gamma}\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|^{2}+\frac{2\gamma^{2}\Delta t}{1-\gamma}\sum_{ind=1}^{n}\sum_{k=1}^{ind}\|g_{w_{1}}(x,t^{k})-g_{w_{2}}(x,t^{k})\|^{2},

where we apply (A.2) in the last inequality.

Therefore,

\sum_{i}\|r_{i}^{n+1}\|^{2}\leq\frac{2(n+1)(\Delta t)^{2}}{1-\gamma}\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|^{2}+\frac{4\gamma(\Delta t)^{2}}{1-\gamma}\sum_{ind=1}^{n}\sum_{k=1}^{ind}\|g_{w_{1}}(x,t^{k})-g_{w_{2}}(x,t^{k})\|^{2}
\leq\frac{2T(\Delta t)}{1-\gamma}\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|^{2}+\frac{4\gamma T^{2}}{1-\gamma}\max_{1\leq k\leq N-1}\|g_{w_{1}}(x,t^{k})-g_{w_{2}}(x,t^{k})\|^{2},

where T=N\Delta t is the final time. Also, from (A.8), we have

\|r^{n+1}\|_{a}^{2}\leq\frac{\gamma\Delta t}{1-\gamma}\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|^{2}+\frac{2\gamma^{2}\Delta t}{1-\gamma}\sum_{k=1}^{n}\|g_{w_{1}}(x,t^{k})-g_{w_{2}}(x,t^{k})\|^{2}
\leq\frac{\gamma\Delta t}{1-\gamma}\|g_{w_{1}}(x,t^{1})-g_{w_{2}}(x,t^{1})\|^{2}+\frac{2\gamma^{2}T}{1-\gamma}\max_{1\leq k\leq N-1}\|g_{w_{1}}(x,t^{k})-g_{w_{2}}(x,t^{k})\|^{2}.

With these two inequalities, we can obtain the conclusion.

References

  • [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
  • [2] J. Aldaz. Strengthened Cauchy-Schwarz and Hölder inequalities. arXiv preprint arXiv:1302.2254, 2013.
  • [3] U. M. Ascher, S. J. Ruuth, and R. J. Spiteri. Implicit-explicit Runge-Kutta methods for time-dependent partial differential equations. Applied Numerical Mathematics, 25(2-3):151–167, 1997.
  • [4] I. Babuška and J. M. Melenk. The partition of unity method. Int. J. Numer. Meth. Engrg., 40:727–758, 1997.
  • [5] D. L. Brown, Y. Efendiev, and V. H. Hoang. An efficient hierarchical multiscale finite element method for Stokes equations in slowly varying media. Multiscale Modeling & Simulation, 11(1):30–58, 2013.
  • [6] A. Chatterjee. An introduction to the proper orthogonal decomposition. Current Science, 78(7):808–817, 2000.
  • [7] B. Chetverushkin, E. Chung, Y. Efendiev, S.-M. Pun, and Z. Zhang. Computational multiscale methods for quasi-gas dynamic equations. Journal of Computational Physics, 440:110352, 2021.
  • [8] F. Chollet et al. Keras, 2015.
  • [9] E. Chung, Y. Efendiev, W. T. Leung, S.-M. Pun, and Z. Zhang. Multi-agent reinforcement learning accelerated mcmc on multiscale inversion problem. arXiv preprint arXiv:2011.08954, 2020.
  • [10] E. Chung, Y. Efendiev, S.-M. Pun, and Z. Zhang. Computational multiscale methods for parabolic wave approximations in heterogeneous media. arXiv preprint arXiv:2104.02283, 2021.
  • [11] E. Chung, W. T. Leung, S.-M. Pun, and Z. Zhang. A multi-stage deep learning based algorithm for multiscale model reduction. Journal of Computational and Applied Mathematics, 394:113506, 2021.
  • [12] E. T. Chung, Y. Efendiev, and T. Hou. Adaptive multiscale model reduction with generalized multiscale finite element methods. Journal of Computational Physics, 320:69–95, 2016.
  • [13] E. T. Chung, Y. Efendiev, and C. Lee. Mixed generalized multiscale finite element methods and applications. SIAM Multiscale Model. Simul., 13:338–366, 2014.
  • [14] E. T. Chung, Y. Efendiev, and W. T. Leung. Generalized multiscale finite element methods for wave propagation in heterogeneous media. Multiscale Modeling & Simulation, 12(4):1691–1721, 2014.
  • [15] E. T. Chung, Y. Efendiev, and W. T. Leung. Constraint energy minimizing generalized multiscale finite element method. Computer Methods in Applied Mechanics and Engineering, 339:298–319, 2018.
  • [16] E. T. Chung, Y. Efendiev, and W. T. Leung. Constraint energy minimizing generalized multiscale finite element method in the mixed formulation. Computational Geosciences, 22(3):677–693, 2018.
  • [17] E. T. Chung, Y. Efendiev, and W. T. Leung. Fast online generalized multiscale finite element method using constraint energy minimization. Journal of Computational Physics, 355:450–463, 2018.
  • [18] E. T. Chung, Y. Efendiev, W. T. Leung, and W. Li. Contrast-independent partially explicit time discretizations for nonlinear multiscale problems, 2021.
  • [19] E. T. Chung, Y. Efendiev, W. T. Leung, M. Vasilyeva, and Y. Wang. Non-local multi-continua upscaling for flows in heterogeneous fractured media. Journal of Computational Physics, 372:22–34, 2018.
  • [20] W. E and B. Engquist. Heterogeneous multiscale methods. Comm. Math. Sci., 1(1):87–132, 2003.
  • [21] Y. Efendiev, J. Galvis, and T. Hou. Generalized multiscale finite element methods (GMsFEM). Journal of Computational Physics, 251:116–135, 2013.
  • [22] Y. Efendiev, J. Galvis, G. Li, and M. Presho. Generalized multiscale finite element methods. nonlinear elliptic equations. Communications in Computational Physics, 15(3):733–755, 2014.
  • [23] Y. Efendiev and T. Hou. Multiscale Finite Element Methods: Theory and Applications, volume 4 of Surveys and Tutorials in the Applied Mathematical Sciences. Springer, New York, 2009.
  • [24] Y. Efendiev, W. T. Leung, W. Li, S.-M. Pun, and P. N. Vabishchevich. Nonlocal transport equations in multiscale media. modeling, dememorization, and discretizations, 2022.
  • [25] Y. Efendiev, W. T. Leung, G. Lin, and Z. Zhang. HEI: Hybrid explicit-implicit learning for multiscale problems, 2021.
  • [26] Y. Efendiev and A. Pankov. Numerical homogenization of monotone elliptic operators. SIAM J. Multiscale Modeling and Simulation, 2(1):62–79, 2003.
  • [27] Y. Efendiev and A. Pankov. Homogenization of nonlinear random parabolic operators. Advances in Differential Equations, 10(11):1235–1260, 2005.
  • [28] E. T. Chung, Y. Efendiev, W. T. Leung, and P. N. Vabishchevich. Contrast-independent partially explicit time discretizations for multiscale flow problems. arXiv preprint arXiv:2101.04863.
  • [29] J. Frank, W. Hundsdorfer, and J. Verwer. On the stability of implicit-explicit linear multistep methods. Applied Numerical Mathematics, 25(2-3):193–205, 1997.
  • [30] P. Henning, A. Målqvist, and D. Peterseim. A localized orthogonal decomposition method for semi-linear elliptic problems. ESAIM: Mathematical Modelling and Numerical Analysis, 48(5):1331–1349, 2014.
  • [31] T. Hou and X. Wu. A multiscale finite element method for elliptic problems in composite materials and porous media. J. Comput. Phys., 134:169–189, 1997.
  • [32] T. Y. Hou, D. Huang, K. C. Lam, and P. Zhang. An adaptive fast solver for a general class of positive definite matrices via energy decomposition. Multiscale Modeling & Simulation, 16(2):615–678, 2018.
  • [33] T. Y. Hou, Q. Li, and P. Zhang. Exploring the locally low dimensional structure in solving random elliptic pdes. Multiscale Modeling & Simulation, 15(2):661–695, 2017.
  • [34] T. Y. Hou, D. Ma, and Z. Zhang. A model reduction method for multiscale elliptic pdes with random coefficients using an optimization approach. Multiscale Modeling & Simulation, 17(2):826–853, 2019.
  • [35] P. Jenny, S. Lee, and H. Tchelepi. Multi-scale finite volume method for elliptic problems in subsurface flow simulation. J. Comput. Phys., 187:47–67, 2003.
  • [36] P. Kidger and T. Lyons. Universal Approximation with Deep Narrow Networks. In J. Abernethy and S. Agarwal, editors, Proceedings of Thirty Third Conference on Learning Theory, volume 125 of Proceedings of Machine Learning Research, pages 2306–2327. PMLR, 09–12 Jul 2020.
  • [37] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization, 2017.
  • [38] K. Kunisch and S. Volkwein. Galerkin proper orthogonal decomposition methods for parabolic problems. Numerische Mathematik, 90:117–148, 2001.
  • [39] C. Le Bris, F. Legoll, and A. Lozinski. An MsFEM type approach for perforated domains. Multiscale Modeling & Simulation, 12(3):1046–1077, 2014.
  • [40] W. T. Leung, G. Lin, and Z. Zhang. Nh-pinn: Neural homogenization based physics-informed neural network for multiscale problems. arXiv preprint arXiv:2108.12942, 2021.
  • [41] W. Li, A. Alikhanov, Y. Efendiev, and W. T. Leung. Partially explicit time discretization for nonlinear time fractional diffusion equations. arXiv preprint arXiv:2110.13248, 2021.
  • [42] G. Lin, C. Moya, and Z. Zhang. Accelerated replica exchange stochastic gradient langevin diffusion enhanced bayesian deeponet for solving noisy parametric pdes. arXiv preprint arXiv:2111.02484, 2021.
  • [43] G. Lin, Y. Wang, and Z. Zhang. Multi-variance replica exchange stochastic gradient mcmc for inverse and forward bayesian physics-informed neural network. arXiv preprint arXiv:2107.06330, 2021.
  • [44] H. Owhadi and L. Zhang. Metric-based upscaling. Comm. Pure. Appl. Math., 60:675–723, 2007.
  • [45] A. Roberts and I. Kevrekidis. General tooth boundary conditions for equation free modeling. SIAM J. Sci. Comput., 29(4):1495–1510, 2007.
  • [46] G. Samaey, I. Kevrekidis, and D. Roose. Patch dynamics with buffers for homogenization problems. J. Comput. Phys., 213(1):264–287, 2006.
  • [47] H. Shi and Y. Li. Local discontinuous Galerkin methods with implicit-explicit multistep time-marching for solving the nonlinear Cahn-Hilliard equation. Journal of Computational Physics, 394:719–731, 2019.
  • [48] Y. Yang, M. Ghasemi, E. Gildin, Y. Efendiev, and V. Calo. Fast Multiscale Reservoir Simulations With POD-DEIM Model Reduction. SPE Journal, 21(06):2141–2154, 06 2016.
  • [49] Z. Zhang, E. T. Chung, Y. Efendiev, and W. T. Leung. Learning algorithms for coarsening uncertainty space and applications to multiscale simulations. Mathematics, 8(5):720, 2020.