
Convolutional neural network based reduced order modeling for multiscale problems

Xuehan Zhang School of Mathematical Sciences, Tongji University, Shanghai 200092, China. ([email protected]).    Lijian Jiang School of Mathematical Sciences, Tongji University, Shanghai 200092, China. ([email protected]).

Abstract

In this paper, we combine convolutional neural networks (CNNs) with reduced order modeling (ROM) for efficient simulations of multiscale problems. These problems are modeled by partial differential equations with high-dimensional random inputs. The proposed method, called CNN-based ROM, involves two separate CNNs: Basis CNNs and Coefficient CNNs (Coef CNNs), which correspond to the two main stages of ROM. The former learns input-specific basis functions from snapshots of fine-scale solutions. An activation function, inspired by Galerkin projection, is utilized at the output layer to reconstruct fine-scale solutions from the basis functions. Numerical results show that the basis functions learned by the Basis CNNs resemble the solution data, which helps to significantly reduce the number of basis functions. Moreover, CNN-based ROM is less sensitive to data fluctuations caused by numerical errors than traditional ROM. Since the Basis CNNs still require the fine-scale stiffness matrix and load vector at test time, they cannot be directly applied to nonlinear problems on their own. The Coef CNNs, which are designed to determine the coefficients of the linear combination of basis functions, remove this restriction and extend the method to nonlinear problems. In addition, two applications of CNN-based ROM are presented, including predicting MsFEM basis functions within oversampling regions and building accurate surrogates for inverse problems.

keywords: reduced-basis methods; convolutional neural network; random multiscale problems

1 Introduction

Multiscale problems, which involve phenomena characterized by significant variations across multiple scales, arise widely in fields such as materials science, environmental science, biomechanics and many others. Examples include the behavior of composite materials, fluid flow in porous media, biological processes and turbulent fluid dynamics. Many of these problems are modeled by partial differential equations (PDEs) with random inputs, where the randomness is caused by the heterogeneity of physical properties. For groundwater, the dispersed phases (e.g., pores or fractures), which may be randomly distributed in space, lead to fluctuations of the hydraulic conductivity [1]. Moreover, the conductivity contrast between phases is typically high. To capture the fine-scale features of such systems, direct numerical simulation using finite element methods is often infeasible due to the large computational cost. Therefore, it is necessary to design efficient multiscale simulation methods, especially for real-time applications (e.g., inverse problems and stochastic control [2, 3]) and many-query contexts (e.g., design or optimization [4]).

During the last decades, the idea of reduced-order modeling has been developed for multiscale problems. The construction of reduced-order basis functions is the core of ROM. By fully exploiting the local properties of differential operators, a class of multiscale reduction methods uses a set of multiscale basis functions that can effectively capture fine-scale effects. These multiscale methods include the multiscale finite element method (MsFEM) [5, 6], the generalized multiscale finite element method (GMsFEM) [7, 8], the constraint energy minimizing generalized multiscale finite element method (CEM-GMsFEM) [9, 10] and so on. Based on finite element methods and homogenization theory, the convergence of these methods can be rigorously analyzed. In addition, the same basis functions can be used for different source terms since the construction of multiscale basis functions only involves the differential operators. However, many local problems must be solved for each sample of the random inputs to obtain the basis functions, which prevents efficient online computation. Unlike multiscale reduction methods, reduced-basis methods (RB methods) obtain a universal collection of basis functions by data-driven methods such as greedy algorithms [11] and proper orthogonal decomposition (POD) [12]. However, most existing works have difficulty scaling to high-dimensional problems, such as random inputs based on Gaussian processes [13].

To obtain the approximated solutions, ROM usually performs Galerkin projection onto a low-dimensional space spanned by the constructed basis functions. Furthermore, techniques such as the Discrete Empirical Interpolation Method (DEIM) [14, 15] are used to obtain affine decompositions for nonlinear problems and achieve fast online computation. Although DEIM can significantly reduce online computational costs, a couple of limitations remain. Firstly, a large number of basis functions are required for high-dimensional and highly nonlinear problems, which degrades the efficiency of DEIM. Secondly, ROM with DEIM is still nonlinear and needs to call nonlinear solvers such as Newton's method [16].

The ROM for multiscale problems has two stages: the construction of reduced-order basis functions and the computation of the coarsened model. To improve online efficiency, it is desirable to learn a quick response from samples of random inputs to basis functions. ROM for multiscale problems assisted by deep learning techniques, which have powerful representation and generalization abilities, has gained more and more attention. For multiscale reduction methods, M. Wang et al. [17] learn the mapping from permeability to GMsFEM basis functions and coarse-scale stiffness matrices on local coarse grids using fully connected neural networks (FNNs). A. Choubineh et al. [18] reconstruct the basis functions of mixed GMsFEM from the permeability matrix using convolutional neural networks (CNNs). For reduced-basis methods, S. Cheung et al. [19] first use proper orthogonal decomposition (POD) to construct a set of global nodal basis functions; an FNN is then used to approximate the evolution of the coefficients and, therefore, the porous media flow. In addition, learning the mapping from parameters to coefficients using deep learning techniques is not a new idea in the field of non-intrusive reduced-order modeling [20, 21, 22, 23, 24].

In this paper, we consider the case where the multiscale problems are represented by steady partial differential equations with random inputs. Our goal is to develop a data-driven method to model multiscale problems and make fast online predictions. Similar to RB methods, we assume that snapshots of fine-scale solutions are available. Two distinct CNNs are designed following the two main stages of ROM, which is why we call the proposed method CNN-based ROM. The first ones are called Basis CNNs, which learn the mapping from samples of random inputs to basis functions. Activation functions, inspired by Galerkin projection, are utilized at the output layers to reconstruct fine-scale solutions from the basis functions. Besides, the condition number under the Frobenius norm is included in the loss function to ensure the stability of training. Note that Basis CNNs provide input-specific basis functions like multiscale reduction methods, which can break the limitation of RB methods for high-dimensional problems. However, the predictions of Basis CNNs still depend on the fine-scale stiffness matrix and load vector of the FEM, which are not available for nonlinear equations at the online stage. To overcome this issue, we design the second CNN, called Coefficient CNNs (Coef CNNs), to learn the final linear combination of the approximated solutions using the same dataset as the Basis CNNs.

This article is structured as follows. In Section 2, we give the definitions and examples of multiscale problems, which is followed by a brief introduction to reduced basis methods and convolutional neural networks in Section 3. Section 4 focuses on the proposed method, CNN-based ROM. In this section, we also explore the connections between the proposed method and MsFEM. In Section 5, a few numerical results are presented to illustrate the efficacy of the proposed method and compare it with POD-based RB methods. Two applications of CNN-based ROM are also given in this section, including learning MsFEM basis functions with oversampling techniques and serving as surrogates in inverse problems. Finally, some conclusions are given.

2 Problem setup

In this paper, we consider the following multiscale problems defined in a bounded domain $\mathcal{S}\subseteq\mathbb{R}^{2}$ with boundary $\partial\mathcal{S}$:

\left\{\begin{aligned}
&\mathcal{L}\big(u(x);\kappa(x,\xi)\big)=f(u,x), && x\in\mathcal{S},\\
&\mathcal{D}u(x)=0, && x\in\partial\mathcal{S},
\end{aligned}\right. \qquad (2.1)

where $\mathcal{L}$ denotes a nonlinear differential operator encoded with the random input $\kappa(x,\xi)$. More specifically, $\kappa(x,\xi)$ is a random variable defined on a probability space $\mathcal{P}=(\Omega,\mathcal{F},\mathbb{P})$ for any fixed point $x\in\mathcal{S}$. Besides, $f(u,x)$ is a nonlinear source term with sufficient regularity. The equation (2.1) thus has a unique solution with the boundary condition $\mathcal{D}u(x)=0$. Next, we give an example of (2.1).

  • Diffusion equations. In this case, we introduce a linear elliptic equation with Dirichlet boundary conditions as follows,

    {(κ(x,ξ)u(x))=f(x),x𝒮,u(x)=g(x),x𝒮,\left\{\begin{aligned} &-\nabla\cdot\big{(}\kappa(x,\xi)\nabla u(x)\big{)}=f(x),\ x\in\mathcal{S},\\ &u(x)=g(x),\ x\in\partial{\mathcal{S}},\end{aligned}\right. (2.2)

    where κ(x,ξ)\kappa(x,\xi) is a permeability field with multiscale features, and g(x)g(x) is Dirichlet boundary condition.

A numerical approximation of the solution $u$ of problem (2.1) can be obtained by finite element methods. Denoting by $V$ the appropriate solution space, the variational form of problem (2.1) with homogeneous Dirichlet boundary conditions reads: find the approximated solution $\hat{u}\in V$ such that

((u^(x);κ(x;ξ)),v)=(f(u^,x),v),vV,\big{(}\mathcal{L}(\hat{u}(x);\kappa(x;\xi)),v\big{)}=\big{(}f(\hat{u},x),v\big{)},\ \forall v\in V,

where

((u^(x);κ(x;ξ)),v)=𝒮(u^(x);κ(x,ξ))v(x)𝑑x,(f(u^),v)=𝒮f(u^(x),x)v(x)𝑑x.\big{(}\mathcal{L}(\hat{u}(x);\kappa(x;\xi)),v\big{)}=\int_{\mathcal{S}}\mathcal{L}(\hat{u}(x);\kappa(x,\xi))v(x)dx,\big{(}f(\hat{u}),v\big{)}=\int_{\mathcal{S}}f(\hat{u}(x),x)v(x)dx.

If we partition the spatial domain with grid size $h$ (the corresponding mesh is denoted by $\mathcal{M}$), we can obtain a finite-dimensional approximation $u_{h}\in V_{h}\subseteq\mathbb{R}^{N_{h}\times N_{h}}$ of $u$ by solving the following nonlinear algebraic equation,

Ah(uh;K)uh(K)=Fh(uh;K),A_{h}(u_{h};K)u_{h}(K)=F_{h}(u_{h};K), (2.3)

Here, the random input $\kappa$ is discretized over the mesh $\mathcal{M}$ and is therefore equivalent to a high-dimensional random matrix $K\in\mathbb{R}^{N_{h}\times N_{h}}$. The notation $K_{i}$ represents the $i$-th sample of $K$.

To obtain a high-fidelity approximation of problem (2.1) with multiscale features, the number of degrees of freedom $N_{h}$ is usually required to be large. Given the dataset $\{K_{i},u_{h}(K_{i})\}_{i=1}^{M}$, our goal is to find a low-dimensional approximation $u_{N}$ of $u_{h}$ so that the variational problem (2.3) can be solved efficiently while retaining the essential features of the map $K\rightarrow u_{h}$.

3 Preliminary knowledge

In this section, we will introduce the preliminary knowledge for the proposed method. The problem outlined in Section 2 will be considered in ROM. In Subsection 3.1, we give a short review of RB methods, accompanied by an analysis of their limitations. Following this, Subsection 3.2 will provide a brief introduction to CNNs and compare them with fully connected neural networks.

3.1 The reduced basis method

The problem (2.3) is a parametrized PDE with a high-dimensional parameter space. In the field of model reduction of parametrized PDEs, the RB method has been one of the most widely used methods. In the following, we briefly review the main ideas of the RB method, which is the foundation of our work. For additional details we refer to [25, 26].

The RB method assumes that the high-fidelity approximation $u_{h}$ has low-dimensional features, i.e., $u_{h}$ can be well represented by a linear combination of $N$ basis functions with $N\ll N_{h}$. The $N$ basis functions span a space called the reduced space $V_{N}$ (the RB space), which is algebraically represented by the matrix $P_{N}:=[p_{1},p_{2},\ldots,p_{N}]\in\mathbb{R}^{N_{h}\times N}$ with $p_{i}\in\mathbb{R}^{N_{h}\times 1}$, $i=1,\ldots,N$. Then we can obtain the following RB problem

AN(uN;K)uN(K)=FN(uN;K),A_{N}(u_{N};K)u_{N}(K)=F_{N}(u_{N};K), (3.1)

where $A_{N}\in\mathbb{R}^{N\times N}$, $F_{N}\in\mathbb{R}^{N\times 1}$ and $u_{N}(K)\in\mathbb{R}^{N\times 1}$ is the reduced vector of degrees of freedom, called the RB solution. A common criterion to obtain the RB problem (3.1) from (2.3) is Galerkin projection [27], that is

A_{N}(u_{N};K)=P_{N}^{T}A_{h}(P_{N}u_{N};K)P_{N},\quad F_{N}(u_{N};K)=P_{N}^{T}F_{h}(P_{N}u_{N};K). \qquad (3.2)

The computational procedure of RB methods is commonly decomposed into offline and online phases: during the offline phase, reduced basis functions are generated and auxiliary quantities are precomputed; in the online phase, the RB solution can be efficiently computed for different instances of the parameters.

3.1.1 The construction of RB space

The goal of RB methods is to find linear approximation spaces VNV_{N} for which the worst best-approximation error for elements of VhV_{h},

dVN(Vh):=supuhVhinfνVNuhν,d_{V_{N}}(V_{h}):=\mathop{\sup}\limits_{u_{h}\in V_{h}}\mathop{\inf}\limits_{\nu\in V_{N}}\|u_{h}-\nu\|,

is near the Kolmogorov N-width of VhV_{h}

dN(Vh):=infWVhdimWNsupuhVhinfνWuhν.d_{N}(V_{h}):=\mathop{\inf}\limits_{\begin{subarray}{c}W\subseteq V_{h}\\ \dim W\leq N\end{subarray}}\mathop{\sup}\limits_{u_{h}\in V_{h}}\mathop{\inf}\limits_{\nu\in W}\|u_{h}-\nu\|.

The existence of the optimal subspace VNV_{N} is guaranteed by the Theorem II.2.3 in [28].

In practice, the reduced basis functions $P_{N}$ of $V_{N}$ can be generated by either greedy algorithms or proper orthogonal decomposition (POD). We briefly recall the latter, as we will use it in the numerical results for comparison. Given a snapshot matrix of data

𝒰={uh(K1),uh(K2),,uh(KM)}\mathcal{U}=\{u_{h}(K_{1}),u_{h}(K_{2}),\ldots,u_{h}(K_{M})\}

the main idea of POD is to find a finite subspace VNVhV_{N}\subseteq V_{h} with orthogonal basis PN:=[p1,p2,,pN]P_{N}:=[p_{1},p_{2},\ldots,p_{N}] to minimize the projection error,

PN=argminPTP=IpiVNi=1Muh(Ki)PPTuh(Ki)22.P_{N}=\mathop{\arg\min}\limits_{\begin{subarray}{c}P^{T}P=I\\ p_{i}\in V_{N}\end{subarray}}\sum_{i=1}^{M}\|u_{h}(K_{i})-PP^{T}u_{h}(K_{i})\|_{2}^{2}.

By taking advantage of the singular value decomposition (SVD) of 𝒰\mathcal{U}

U=PΣZT,U=P\Sigma Z^{T},

we can obtain the matrix $P_{N}$ whose columns are the $N$ orthogonal basis vectors of $V_{N}$: namely, the $N$ columns of $P$ corresponding to the largest diagonal elements of $\Sigma$.
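For concreteness, the following is a minimal NumPy sketch of the POD construction described above; the snapshot matrix, the truncation rank $N$ and the mesh size used in the example call are placeholders, not values prescribed by the paper.

    import numpy as np

    def pod_basis(snapshots, N):
        """Compute N POD basis vectors from a snapshot matrix.

        snapshots: array of shape (N_h, M); each column is a high-fidelity
                   solution u_h(K_i) flattened over the FEM mesh.
        Returns P_N of shape (N_h, N) with orthonormal columns.
        """
        # Thin SVD; columns of P are the left singular vectors, already
        # ordered by decreasing singular value.
        P, sigma, _ = np.linalg.svd(snapshots, full_matrices=False)
        return P[:, :N]

    # Example with synthetic snapshots (sizes are illustrative only)
    U = np.random.rand(3481, 200)     # e.g. N_h = 59*59 free nodes, M = 200 samples
    P_N = pod_basis(U, N=10)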

We note that POD is a powerful tool to exploit the low-rank features of data. However, there are still two bottlenecks:

  • POD only provides the optimal linear representation of the data in the $l_{2}$ sense, and it is not guaranteed that the resulting basis functions are the best in a broader sense.

  • RB methods, including POD-based RB methods, try to represent the solutions for different parameters in the same subspace of $V_{h}$, which means that a large number of basis functions are needed to obtain an accurate approximation of parametrized PDEs with high-dimensional parameter spaces. This affects both the efficiency and the accuracy of RB methods.

3.1.2 Online stage of RB methods

Efficient implementation of RB methods relies on affine decompositions of the stiffness matrix and load vector, that is, $A_{h}(u_{h};K)$ and $F_{h}(u_{h};K)$ can be written in parameter-separable form as follows

A_{h}(P_{N}u_{N};K)=\sum_{q=1}^{Q_{A}}c_{A}^{q}(u_{N},K)A_{h}^{q},\quad F_{h}(P_{N}u_{N};K)=\sum_{q=1}^{Q_{F}}c_{F}^{q}(u_{N},K)F_{h}^{q}.

According to (3.2), we can obtain that

AN(uN;K)=q=1QAcAq(uN,K)ANq,FN(uN;K)=q=1QFcFq(uN,K)FNq,A_{N}(u_{N};K)=\sum_{q=1}^{Q_{A}}c_{A}^{q}(u_{N},K)A_{N}^{q},\ F_{N}(u_{N};K)=\sum_{q=1}^{Q_{F}}c_{F}^{q}(u_{N},K)F_{N}^{q},

where $A_{N}^{q}=P_{N}^{T}A_{h}^{q}P_{N}$ and $F_{N}^{q}=P_{N}^{T}F_{h}^{q}$. By precomputing $\{A_{N}^{q}\}_{q=1}^{Q_{A}}$ and $\{F_{N}^{q}\}_{q=1}^{Q_{F}}$ in the offline phase, we can avoid online operations with a complexity depending on $N_{h}$.
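A small NumPy sketch of this offline/online split is given below; the lists of parameter-independent terms and the coefficient values are assumed inputs, since their number and form depend on the specific affine decomposition.

    import numpy as np

    def offline_projection(P_N, A_terms, F_terms):
        # Project each parameter-independent term once; only this step
        # has a cost depending on N_h.
        A_N_terms = [P_N.T @ A_q @ P_N for A_q in A_terms]
        F_N_terms = [P_N.T @ F_q for F_q in F_terms]
        return A_N_terms, F_N_terms

    def online_solve(A_N_terms, F_N_terms, c_A, c_F):
        # Assemble and solve the N x N reduced system; independent of N_h.
        A_N = sum(c * A_q for c, A_q in zip(c_A, A_N_terms))
        F_N = sum(c * F_q for c, F_q in zip(c_F, F_N_terms))
        return np.linalg.solve(A_N, F_N)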

However, the affine decomposition of bilinear (or linear) forms is a strong assumption, which is not satisfied in many cases. DEIM is used to provide an approximate parameter-separable representation. During the offline phase, POD is used to obtain the parameter-independent terms $\{A_{N}^{q}\}_{q=1}^{Q_{A}}$ and $\{F_{N}^{q}\}_{q=1}^{Q_{F}}$. Then, during the online phase, an interpolation problem is solved to obtain the coefficients $\{c_{A}^{q}(u_{N},K)\}_{q=1}^{Q_{A}}$ and $\{c_{F}^{q}(u_{N},K)\}_{q=1}^{Q_{F}}$ for each new parameter $K$.

DEIM techniques are a powerful tool for dealing with nonlinear and non-affine RB problems; however, there are two main limitations:

  • For complex problems (i.e., high-dimensional and nonlinear problems), a large number of basis functions must be employed to obtain an accurate approximation, which limits the efficiency of online computation.

  • For nonlinear problems, DEIM can help to reduce the computational cost but cannot avoid iteration (such as Newton's method), since it is an interpolation method.

3.2 Convolutional Neural Network

(a) Full connections.
(b) Local connections.
(c) Shared weights.
Figure 3.1: Architectures of the FNN and the CNN.

In the field of machine learning, the problem (2.1) can be regarded as an image-to-image regression task [13]. Both the discrete solution $u_{h}$ and the sample of the random input $K$ are matrices defined on the FE mesh $\mathcal{M}$, which are usually high-dimensional. Using feed-forward neural networks (FNNs) for such tasks usually leads to a large number of trainable parameters and is prone to overfitting, because they are usually fully connected, that is, each neuron in one layer is connected to all neurons in the next layer (see Figure 3.1(a)). Convolutional neural networks are a special type of deep neural network inspired by the human visual system [29] and have been widely used in many fields such as image recognition and natural language processing. There are two special aspects in the architecture of a CNN, i.e., local connections (see Figure 3.1(b)) and shared weights (see Figure 3.1(c)), which make it well suited to image-to-image regression tasks.

Figure 3.2: Convolutional layer in the CNN.

A complete CNN contains convolutional layers and pooling layers. We briefly introduce the former, as we will use it in the proposed method. The value of a neuron $y_{ln}^{x}$ at position $x$ of the $n$-th feature map in the $l$-th layer is defined as (see Figure 3.2)

ylnx=σ(bln+mk=0Kl1wlnmky(l1)mx+k),y_{ln}^{x}=\sigma\Big{(}b_{ln}+\sum_{m}\sum_{k=0}^{K_{l}-1}w_{lnm}^{k}y_{(l-1)m}^{x+k}\Big{)},

where $m$ indexes the feature maps in the previous ($(l-1)$-th) layer, $w_{lnm}^{k}$ is the kernel weight connecting to the $m$-th feature map, $K_{l}$ is the width of the kernel, $b_{ln}$ is the bias of the $n$-th feature map in the current layer, and $\sigma$ is the activation function.
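As a direct transcription of the formula above, the following sketch evaluates one neuron of one feature map in the one-dimensional case; ReLU is chosen as the activation purely for illustration.

    import numpy as np

    def conv_neuron(y_prev, w, b, x):
        """Value of one neuron of the current feature map at position x.

        y_prev: array (M, L) -- the M feature maps of the previous layer, length L
        w:      array (M, K) -- one kernel per previous feature map, width K
        b:      scalar bias of the current feature map
        """
        M, K = w.shape
        s = b
        for m in range(M):          # sum over previous feature maps
            for k in range(K):      # sum over the kernel width
                s += w[m, k] * y_prev[m, x + k]
        return max(s, 0.0)          # ReLU chosen as an example activation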

As shown in Figure 3.2, convolution operations change the size of the inputs. To preserve the spatial dimensions of the inputs or feature maps, padding techniques [30], which refer to the addition of extra pixels around the borders of the input images or feature maps, can be used.

4 Convolutional neural network based reduced-order modeling (CNN-based ROM)

In this section, we will introduce a novel framework for approximating PDEs with random inputs by coupling CNNs with the ideas of reduced-order modeling, which we call CNN-based ROM. There are two main components in the proposed architecture, namely Basis CNNs and Coefficient CNNs (Coef CNNs). We first learn the input-specific basis functions $P_{N}(K)$ by Basis CNNs. Besides the dataset $\{K_{i},u_{h}(K_{i})\}_{i=1}^{M}$, the FEM bilinear forms (or linear forms) $\{A_{h}(u_{h}(K_{i});K_{i}),F_{h}(u_{h}(K_{i});K_{i})\}_{i=1}^{M}$ are involved as parts of the activation function at the output layer to obtain the reduced-order solution $u_{N}$. However, the Basis CNNs alone can only be used for linear problems, since the unknown $u_{h}(K_{i})$ still appears in the FEM bilinear forms (or linear forms). Based on the results of the Basis CNNs, the Coef CNNs are thus designed to resolve this issue by learning the mapping from $K$ to the coefficients, which also improves the online efficiency.

Remark 4.1.

To simplify the notations, we will describe CNN-based ROM for linear problems, i.e.,

Ah(K)uh(K)=Fh(K).A_{h}(K)u_{h}(K)=F_{h}(K). (4.1)

However, the proposed framework also covers the nonlinear case (2.3), since we can linearize the discrete equations by replacing $u_{h}$ in $A_{h}(u_{h};K_{i})$ and $F_{h}(u_{h};K_{i})$ with the data $u_{h}(K_{i})$.

Figure 4.1: The schema of CNN-based ROM.

4.1 The Basis CNNs

PDEs with random inputs theoretically have an infinite-dimensional parameter space. In practice, to capture the multiscale features of the random inputs, we usually sample the random input $\kappa$ on a very fine FEM mesh $\mathcal{M}$. Thus, the problems considered here have a much higher-dimensional parameter space than those in traditional RB methods. The success of RB methods strongly depends on the quality of the RB basis functions, which should retain the features of the mapping $K\rightarrow u_{h}$ while realizing model reduction. This requires the RB method to capture the common features of the PDEs over the whole parameter space; it is therefore crucial to choose representative points in the parameter space to form the reduced basis functions, which is intractable in high dimensions. As a consequence, a large number of reduced basis functions are often needed for high-dimensional and nonlinear problems.

For multiscale reduction methods, a set of local problems needs to be solved at the online stage to obtain the basis functions, which may be time-consuming for large-scale problems. In addition, basis functions constructed by multiscale reduction methods lack global information due to localization.

To solve these issues, we learn the basis functions from data for each instance of random inputs, i.e.,

\Phi^{Base}_{\theta}:\mathbb{R}^{N_{h}\times N_{h}}\rightarrow\mathbb{R}^{N_{h}^{2}\times N},\qquad P_{N}(K)=\Phi^{Base}_{\theta}(K).

To capture the multiscale features of the image-like inputs $K$, a CNN is a good choice. Compared with an FNN, the local connections and shared weights of a CNN help to extract significant local features and greatly reduce the number of trainable parameters. To maintain the integrity of spatial information, pooling layers are not included in the Basis CNNs, and padding is used to keep the size of the feature maps after convolution. In addition, we apply Batch Normalization (BN) after the convolutional layers (Conv) and before the activation functions (see Figure 4.2(a)) to accelerate convergence and avoid overfitting [31]. In the last layer, a $1\times 1$ convolutional layer is used to adjust the number of basis functions and fuse the features of different channels. The specific architecture of the Basis CNNs is shown in Figure 4.2(b).
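A possible Keras realization of this architecture is sketched below, following the description above and Table 1; the choice of ReLU as the hidden activation and the exact layer arguments are assumptions for illustration rather than the paper's exact configuration.

    import tensorflow as tf

    def conv_block(filters, kernel_size=25):
        # Conv -> BatchNorm -> activation; 'same' padding keeps the spatial size
        return tf.keras.Sequential([
            tf.keras.layers.Conv2D(filters, kernel_size, padding="same"),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Activation("relu"),
        ])

    def build_basis_cnn(n_h=59, n_basis=10):
        inp = tf.keras.Input(shape=(n_h, n_h, 1))           # sample of K on the free nodes
        x = inp
        for filters in (32, 32, 64, 64, 128, 128):           # six convolutional blocks
            x = conv_block(filters)(x)
        x = tf.keras.layers.Conv2D(n_basis, 1)(x)            # 1x1 conv sets the number of basis functions
        out = tf.keras.layers.Reshape((n_h * n_h, n_basis))(x)   # P_N(K), shape (N_h^2, N)
        return tf.keras.Model(inp, out)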

(a) Convolutional block.
(b) Total architecture.
Figure 4.2: The architecture of Basis CNNs.

4.1.1 Galerkin-projection activation function

In this subsection, we will introduce an activation function at the output layer to reconstruct the high-fidelity solution $u_{h}(K)$ from the basis functions, that is, we use $\hat{u}_{h}(K)=P_{N}(K)u_{N}(K)$ to approximate the high-fidelity solution $u_{h}(K)$. Given $P_{N}(K)$ by the Basis CNNs, we use Galerkin projection to obtain $u_{N}(K)$ by solving the algebraic equation (3.1). Define the residual by

(uh,K)=Ah(K)uh(K)Fh(K).\mathcal{R}(u_{h},K)=A_{h}(K)u_{h}(K)-F_{h}(K).

Forcing the residual of $\hat{u}_{h}$ to be orthogonal to $V_{N}$, $u_{N}(K)$ satisfies

PN(K)T(u^h,K)=PN(K)T(Ah(K)PN(K)uN(K)Fh(K))=0.P_{N}(K)^{T}\mathcal{R}(\hat{u}_{h},K)=P_{N}(K)^{T}\Big{(}A_{h}(K)P_{N}(K)u_{N}(K)-F_{h}(K)\Big{)}=0.

We can then obtain the approximated solution $\hat{u}_{h}$ by solving the above algebraic equation, i.e.,

u^h(K)=PN(K)(PN(K)TAh(K)PN(K))1PN(K)TFh(K),\hat{u}_{h}(K)=P_{N}(K)\Big{(}P_{N}(K)^{T}A_{h}(K)P_{N}(K)\Big{)}^{-1}P_{N}(K)^{T}F_{h}(K),

where $P_{N}(K)=\Phi_{\theta}^{Base}(K)$. We call this the Galerkin-projection activation function $\sigma_{G}$ and define it as a function of $P_{N}$, that is

σG(PN;Ah,Fh)=PN(PNTAhPN)1(PNTFh)=PNAN1FN,\sigma_{G}(P_{N};A_{h},F_{h})=P_{N}(P_{N}^{T}A_{h}P_{N})^{-1}(P_{N}^{T}F_{h})=P_{N}A_{N}^{-1}F_{N},

where $A_{N}=P_{N}^{T}A_{h}P_{N}$ and $F_{N}=P_{N}^{T}F_{h}$.
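A minimal TensorFlow sketch of this activation is shown below, assuming dense batched tensors; solving the small $N\times N$ system is used instead of forming an explicit inverse, and all operations are differentiable so gradients flow back into $P_{N}$.

    import tensorflow as tf

    def galerkin_projection(P, A_h, F_h):
        """Galerkin-projection activation: u_hat = P (P^T A_h P)^{-1} P^T F_h.

        P:   (batch, N_h^2, N)      basis functions predicted by the Basis CNNs
        A_h: (batch, N_h^2, N_h^2)  fine-scale stiffness matrices
        F_h: (batch, N_h^2, 1)      fine-scale load vectors
        """
        A_N = tf.matmul(P, tf.matmul(A_h, P), transpose_a=True)   # (batch, N, N)
        F_N = tf.matmul(P, F_h, transpose_a=True)                 # (batch, N, 1)
        u_N = tf.linalg.solve(A_N, F_N)                           # reduced solution
        return tf.matmul(P, u_N)                                  # reconstructed u_hat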

Combining the above, we can write the forward propagation of the Basis CNNs as

u^h(K)=σGΦθBase(K).\hat{u}_{h}(K)=\sigma_{G}\circ\Phi_{\theta}^{Base}(K). (4.2)
Remark 4.2.

For elliptic equations, such as the diffusion equation (2.2), Céa's lemma [32] guarantees that $\hat{u}_{h}(K)$ defined in (4.2) is, up to a constant, the best approximation of the true solution $u(K)$ in the linear space $V_{N}$ spanned by $P_{N}$. More formally, we denote the bounded and $V_{N}$-elliptic bilinear form by $a(\cdot,\cdot)$, which induces the norm $\|u\|_{V_{N}}=\sqrt{a(u,u)}$. Then there is a constant $C>0$ such that

u(K)u^h(K)VNCinfνVNu(K)νVN.\|u(K)-\hat{u}_{h}(K)\|_{V_{N}}\leq C\mathop{\inf}\limits_{\nu\in V_{N}}\|u(K)-\nu\|_{V_{N}}.
Remark 4.3.

When training the networks, back-propagation requires the derivatives of the loss function with respect to the trainable parameters. Therefore, a well-defined activation function $\sigma_{G}$ must be differentiable with respect to its input variables, i.e., the reduced basis functions $P_{N}=[p_{ij}]\in\mathbb{R}^{N_{h}^{2}\times N}$. The $j$-th column of $P_{N}$ is denoted by $p_{j}$, $j=1,\ldots,N$. Denoting the reduced-order solution by $u_{N}=A_{N}^{-1}F_{N}$ and the entries of the matrix $A_{h}$ by $A_{h}=[a_{ij}]\in\mathbb{R}^{N_{h}^{2}\times N_{h}^{2}}$, we can compute the derivatives as follows

σGpij\displaystyle\frac{\partial{\sigma_{G}}}{\partial{p_{ij}}} =uNejPNAN1ANpijAN1FN+PNAN1FNej,\displaystyle=u_{N}*e_{j}-P_{N}A_{N}^{-1}\frac{\partial{A_{N}}}{\partial{p_{ij}}}A_{N}^{-1}F_{N}+P_{N}A_{N}^{-1}F_{N}*e_{j},
=uNejPNAN1HijAN1FN+PNAN1FNej,\displaystyle=u_{N}*e_{j}-P_{N}A_{N}^{-1}H_{ij}A_{N}^{-1}F_{N}+P_{N}A_{N}^{-1}F_{N}*e_{j},

where $e_{i}=[0,\ldots,1,\ldots,0]^{T}\in\mathbb{R}^{N\times 1}$ denotes the $i$-th unit vector. Besides, the entries $h_{kt}$ of the matrix $H_{ij}\in\mathbb{R}^{N\times N}$ are

hkt=pkTAhptpij=m=1Nh2n=1Nh2pmkpntamnpij.h_{kt}=\frac{\partial p_{k}^{T}A_{h}p_{t}}{\partial p_{ij}}=\sum_{m=1}^{N_{h}^{2}}\sum_{n=1}^{N_{h}^{2}}\frac{\partial p_{mk}p_{nt}a_{mn}}{\partial p_{ij}}.

4.1.2 Loss function

Given the dataset $\{K_{i},u_{h}(K_{i})\}_{i=1}^{M}$ and the FEM bilinear forms (or linear forms) $\{A_{h}(K_{i}),F_{h}(K_{i})\}_{i=1}^{M}$, we define the loss function as follows

Loss_{RB}(\theta)
=\frac{1}{M}\sum_{i=1}^{M}\Big[\|u_{h}(K_{i})-\hat{u}_{h}(K_{i})\|_{L_{2}}^{2}+\lambda_{G}\,cond_{F}\big(A_{N}(K_{i})\big)^{2}\Big]
=\frac{1}{M}\sum_{i=1}^{M}\Big[\|u_{h}(K_{i})-\sigma_{G}\big(\Phi_{\theta}^{Base}(K_{i});A_{h}(K_{i}),F_{h}(K_{i})\big)\|_{L_{2}}^{2}+\lambda_{G}\,cond_{F}\Big(\big(\Phi_{\theta}^{Base}(K_{i})\big)^{T}A_{h}(K_{i})\,\Phi_{\theta}^{Base}(K_{i})\Big)^{2}\Big], \qquad (4.3)

where $\|\cdot\|_{L_{2}}$ is the $L_{2}$ norm. In practice, we approximate the $L_{2}$ norm in the loss function at the discrete points of the FEM mesh $\mathcal{M}$, and $cond_{F}$ is the condition number under the Frobenius norm, i.e.,

cond_{F}(A)=\|A\|_{F}\,\|A^{-1}\|_{F}.

The first term of the loss function (4.3) trains the neural networks to approximate the data. As shown in Remark 4.3, the training depends on the inverse of $A_{N}$. However, the $A_{N}$ learned by the neural network may be ill-conditioned because it does not possess good properties such as orthogonality [33]. To guarantee the stability of training and of the reduced-order model (3.1), we impose the second term in the loss function. In theoretical analysis, the condition number under the 2-norm is widely used, but it is intractable to compute during training; therefore, we choose the condition number under the Frobenius norm here. The stability parameter $\lambda_{G}$ is a hyperparameter.
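A sketch of this loss, combining the data misfit with the Frobenius condition-number penalty, could look as follows; the value of the hyperparameter lambda_g is a placeholder, and the discrete $L_2$ norm is approximated by a plain sum of squares over the mesh points.

    import tensorflow as tf

    def basis_loss(u_h, P, A_h, F_h, lambda_g=1e-3):
        """Data misfit plus condition-number penalty, cf. (4.3).

        u_h: (batch, N_h^2, 1) high-fidelity solutions; P, A_h, F_h as in sigma_G.
        """
        A_N = tf.matmul(P, tf.matmul(A_h, P), transpose_a=True)
        F_N = tf.matmul(P, F_h, transpose_a=True)
        u_hat = tf.matmul(P, tf.linalg.solve(A_N, F_N))
        misfit = tf.reduce_mean(tf.reduce_sum(tf.square(u_h - u_hat), axis=[1, 2]))
        # cond_F(A_N) = ||A_N||_F * ||A_N^{-1}||_F
        cond = tf.norm(A_N, ord="fro", axis=[-2, -1]) * \
               tf.norm(tf.linalg.inv(A_N), ord="fro", axis=[-2, -1])
        return misfit + lambda_g * tf.reduce_mean(tf.square(cond))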

4.2 The Coefficient CNNs (Coef CNNs)

The Basis CNNs can generate reduced-order basis functions $P_{N}$ of the problem (4.1) for a given input $K$, and the corresponding reduced-order model can be written as follows

AN(K)uN(K)=FN(K),A_{N}(K)u_{N}(K)=F_{N}(K), (4.4)

where $A_{N}(K)=P_{N}^{T}A_{h}(K)P_{N}$ and $F_{N}(K)=P_{N}^{T}F_{h}(K)$. Here, the computation of $A_{N}(K)$ and $F_{N}(K)$ still scales with $N_{h}^{2}$, which limits the efficiency of the online stage. Besides, we should note that the Basis CNNs alone are not applicable to nonlinear equations because $A_{h}$ and $F_{h}$ depend not only on the input $K$ but also on the unknown solution $u_{h}$. To overcome these issues and extend the idea to nonlinear problems, the Coef CNNs are designed to learn the coefficients $c(K)$ of the linear combination $\tilde{u}_{h}(K)=\Phi_{\theta}^{Base}(K)c(K)$ that approximates the high-fidelity solution $u_{h}(K)$, i.e.,

ΦϕCoef\displaystyle\Phi_{\phi}^{Coef} :Nh×NhN×1,\displaystyle:\mathbb{R}^{N_{h}\times N_{h}}\rightarrow\mathbb{R}^{N\times 1},
c(K)\displaystyle c(K) =ΦϕCoef(K),\displaystyle=\Phi_{\phi}^{Coef}(K),

where $c(K)$ is an $N$-dimensional vector with elements $c_{i}(K)$, $i=1,\ldots,N$. Combined with the result of the Basis CNNs, we obtain the following surrogate of the high-fidelity solution of (2.3),

\tilde{u}_{h}(K)=P_{N}(K)c(K)=\Phi_{\theta}^{Base}(K)\,\Phi_{\phi}^{Coef}(K).
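At the online stage this surrogate amounts to two forward passes and one matrix-vector product, as in the following sketch; basis_cnn and coef_cnn stand for the two trained networks and are placeholder names.

    import tensorflow as tf

    def online_predict(basis_cnn, coef_cnn, K_batch):
        # K_batch: batch of discretized random inputs.  No stiffness matrix,
        # load vector, or nonlinear solver is required at this stage.
        P_N = basis_cnn(K_batch)        # (batch, N_h^2, N)
        c = coef_cnn(K_batch)           # (batch, N, 1)
        return tf.matmul(P_N, c)        # u_tilde, (batch, N_h^2, 1)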

The architecture of the Coef CNNs is similar to the encoder and bottleneck of a convolutional autoencoder [34]. Convolutional blocks (see Figure 4.3(a)) are first stacked to capture local features from the input $K$. Fully connected (FC) layers in the last few layers then compress the extracted features into a smaller vector representation (see Figure 4.3(b)).

(a) Convolutional block.
(b) Total architecture.
Figure 4.3: The architecture of Coef CNNs.

Finally, we use the following loss function to train the Coef CNNs,

LossCoef(ϕ)=1Mi=1Muh(Ki)u~h(Ki)L22.\displaystyle Loss_{Coef}(\phi)=\frac{1}{M}\sum_{i=1}^{M}\|u_{h}(K_{i})-\tilde{u}_{h}(K_{i})\|_{L_{2}}^{2}.

4.3 Connections with multiscale finite element methods

Figure 4.4: The computational domain of MsFEM.

CNN-based ROM obtains the reduced basis functions by extracting multiscale features from data. Therefore, the Basis CNNs need to be retrained for different source terms. Rather than addressing this issue directly, we explore the connections between the proposed method and another class of multiscale model reduction methods, the multiscale finite element method (MsFEM) [5]. For convenience, we use the diffusion problem (2.2) as an example.

The idea of learning input-specific basis functions $P_{N}$ by the Basis CNNs comes from MsFEM. MsFEM is also a model reduction method, which is realized by constructing multiscale finite element basis functions that adapt to the local properties of the differential operators. Therefore, the basis functions of MsFEM are $K$-specific for the given differential operators. Unlike traditional FEM, which needs a fine mesh $\mathcal{M}_{h}$ to capture small-scale features accurately, MsFEM enriches the basis functions by solving local problems on a coarse grid $\mathcal{M}_{H}$. For each element $\omega\subset\mathcal{M}_{H}$, the four nodal basis functions $\{\Phi_{ms}^{i},i=1,2,3,4\}$ satisfy

(κ(x,ξ)Φmsi)=0.-\nabla\cdot\Big{(}\kappa(x,\xi)\nabla\Phi_{ms}^{i}\Big{)}=0. (4.5)

Let $x_{j}\in\partial\omega$ be the four nodal points of $\omega$. We impose the constraints $\Phi_{ms}^{i}(x_{j})=\delta_{ij}$ and assume that the basis functions vary linearly along the element boundary $\partial\omega$. Here, $\delta_{ij}$ is the Kronecker delta function.

Using CNN-based ROM to learn multiscale finite element basis functions in oversampling domains. Like other numerical upscaling methods, MsFEM suffers from "resonance" effects, and oversampling is a common strategy to overcome this difficulty. In practice, to avoid repeated computations, the spatial domain $\mathcal{S}$ is usually decomposed into a number of large sampling regions, each of which contains many coarse elements. The choice of the sampling region size is a balance between accuracy and efficiency: a large size usually leads to a faster convergence rate, but it is not practical at the online stage. Here, we use CNN-based ROM to learn the basis functions on a relatively large oversampling domain, which can even coincide with the computational domain $\mathcal{S}$. This application not only helps to reduce the online time of MsFEM with oversampling techniques but also relaxes some limitations of the proposed method. The implementation details and numerical results can be found in Section 5.1.2.

5 Numerical results

In this section, the numerical results for various multiscale problems with random inputs are presented to show the accuracy and efficiency of CNN-based ROM method.

To illustrate numerical results quantitatively, we define the relative test mean error ϵtest\epsilon_{test} as follows,

ϵtest=1Mi=1Muh(Ki)uCNN(Ki)22uh(Ki)22,\epsilon_{test}=\frac{1}{M}\sum_{i=1}^{M}\frac{\|u_{h}(K_{i})-u_{CNN}(K_{i})\|_{2}^{2}}{\|u_{h}(K_{i})\|_{2}^{2}}, (5.1)

where $u_{CNN}$ has different meanings for different cases; specifically, $u_{CNN}=\hat{u}_{h}$ for the Basis CNNs and $u_{CNN}=\tilde{u}_{h}$ for the Coef CNNs. The reference solutions $u_{h}$ are obtained by FEM on the mesh $\mathcal{M}$ with either square or triangular elements. The inputs $K_{i}$ of both neural networks are the samples of $K$ defined in Section 2. It should be noted that FEM requires the values of $\kappa$ at the integration points, while we only have the values at the nodes. To resolve this, we use element means at the integration points (see Figure 5.1).
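A minimal NumPy sketch of the error metric (5.1) is given below; the arrays are assumed to stack the M test samples row-wise.

    import numpy as np

    def relative_test_mean_error(u_ref, u_cnn):
        """Relative test mean error (5.1); u_ref, u_cnn have shape (M, N_dof)."""
        num = np.sum((u_ref - u_cnn) ** 2, axis=1)
        den = np.sum(u_ref ** 2, axis=1)
        return np.mean(num / den)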

Figure 5.1: The values of the inputs $K$. The points in the figure represent spatial sampling points of $K$. Left: training CNNs. Middle: FEM with rectangular elements. Right: FEM with triangular elements.

5.1 Applications to Darcy flow

The Darcy flow model, which describes how a fluid moves through a porous medium, has been widely studied in civil, geotechnical, and petroleum engineering [35]. In this subsection, we consider the single-phase Darcy flow for the pressure $u(x)$, i.e.,

\left\{\begin{aligned}
&-\nabla\cdot\big(\kappa(x,\xi)\nabla u(x)\big)=1, && x\in\mathcal{S}:=[0,1]\times[0,1],\\
&u(x)=0, && x\in\partial\mathcal{S}.
\end{aligned}\right. \qquad (5.2)

The weak form of the above equation reads:

Findu,uV,such that,𝒮κuvdx=𝒮v𝑑x,vV.\text{Find}\ u,u\in V,\text{such that},\int_{\mathcal{S}}\kappa\nabla u\cdot\nabla vdx=\int_{\mathcal{S}}vdx,\forall v\in V.

In this section, we use a finite-dimensional space $V_{h}$ with bilinear basis functions $\varphi_{1},\ldots,\varphi_{N_{h}^{2}}$ defined on rectangular elements to obtain the high-fidelity approximation of $u$. The stiffness matrix $A_{h}(K)\in\mathbb{R}^{N_{h}^{2}\times N_{h}^{2}}$ then has entries

ah(i,j)=𝒮Kφiφjdx,a_{h}(i,j)=\int_{\mathcal{S}}K\nabla\varphi_{i}\cdot\nabla\varphi_{j}dx, (5.3)

and the load vector Fh(K)F_{h}(K) belongs to Nh2×1\mathbb{R}^{N_{h}^{2}\times 1} and has entries

fh(i)=𝒮φi𝑑x.f_{h}(i)=\int_{\mathcal{S}}\varphi_{i}dx. (5.4)

The approximated solution can thus be obtained by

uh(K)=Ah(K)1Fh(K).u_{h}(K)=A_{h}(K)^{-1}F_{h}(K). (5.5)

Next, we will show that the proposed method works for different types of random inputs $\kappa$. In addition, since the Darcy flow (5.2) is a linear equation, we only use the Basis CNNs to learn the mapping from $K$ to $u_{h}$ for convenience. The efficacy of the Coef CNNs will be demonstrated in Section 5.2.

5.1.1 Darcy flow with binomial point process

In the first test case, we consider a binomial point process $\kappa(x,\xi)$ defined on $\mathcal{P}=(\Omega,\mathcal{F},\mathbb{P})$ for any given $x\in\mathcal{S}$. Let $\mathcal{B}$ be any subset of $\mathcal{S}$ and let $\{X_{i}\}_{i=1}^{n}$ be $n$ i.i.d. points with distribution $\mathbb{P}(\mathcal{S})$ and density $p(x)$. Then the number of points in $\mathcal{B}$ can be defined as the following random variable

ξ()=i=1nδ(Xi),\xi(\mathcal{B})=\sum_{i=1}^{n}\delta(X_{i}\in\mathcal{B}),

where δ(Xi)\delta(X_{i}\in\mathcal{B}) is the indicator function and takes value as

δ(Xi)={1ifXi,0otherwise.\delta(X_{i}\in\mathcal{B})=\left\{\begin{aligned} &1\ \ \ \mathrm{if}\ X_{i}\in\mathcal{B},\\ &0\ \ \ \mathrm{otherwise}.\end{aligned}\right.

The probability of each random variables XiX_{i} in \mathcal{B} is the same and can be defined as follows,

p=(Xi)=p(x)𝑑x𝒮p(x)𝑑x.p=\mathbb{P}(X_{i}\in\mathcal{B})=\frac{\int_{\mathcal{B}}p(x)dx}{\int_{\mathcal{S}}p(x)dx}.

Then $\xi(\mathcal{B})$ follows a binomial distribution with parameters $n$ and $p$. We introduce the set

F(ξ)=i=1n{x2|xxi2r},F(\xi)=\cup_{i=1}^{n}\{x\in\mathbb{R}^{2}|\|x-x_{i}\|_{2}\leq r\},

where $x_{i}$ is a sample of $X_{i}\sim p(\mathcal{S})$ and $r$ is the radius of the inclusions. Then we can define the two-scale random permeability $\kappa(x,\xi)$, in which inclusions of permeability $\kappa^{(1)}$ are randomly distributed in a background medium of conductivity $\kappa^{(0)}$. Specifically,

κ(x,ξ)={κ(1)ifxF(ξ),κ(0)otherwise.\kappa(x,\xi)=\left\{\begin{aligned} &\kappa^{(1)}\ \ \ \mathrm{if}\ x\in F(\xi),\\ &\kappa^{(0)}\ \ \ \mathrm{otherwise}.\end{aligned}\right.

Here, we consider the special case where $\mathcal{B}=\mathcal{S}$, $\mathbb{P}$ is the uniform distribution on $\mathcal{S}$, and $n=5$, $r=0.05$, $\kappa^{(1)}=1000$, $\kappa^{(0)}=1$. In the following experiments, we discretize $\kappa(x,\xi)$ over the uniform FEM mesh $\mathcal{M}$ with rectangular elements and $61\times 61$ nodes; the discretized field is denoted by $K$. Several instances of $K$ are given in Figure 5.2.
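A sketch of how such samples could be generated is shown below, using the values just listed; the function name and grid handling are illustrative assumptions.

    import numpy as np

    def sample_permeability(n=5, r=0.05, kappa1=1000.0, kappa0=1.0, n_grid=61, rng=None):
        """Draw one sample of the two-scale permeability on a uniform n_grid x n_grid mesh."""
        rng = np.random.default_rng() if rng is None else rng
        centers = rng.uniform(0.0, 1.0, size=(n, 2))          # X_i ~ uniform on S
        grid = np.linspace(0.0, 1.0, n_grid)
        X, Y = np.meshgrid(grid, grid, indexing="ij")
        K = np.full((n_grid, n_grid), kappa0)
        for cx, cy in centers:
            inside = (X - cx) ** 2 + (Y - cy) ** 2 <= r ** 2  # disk of radius r around each point
            K[inside] = kappa1
        return K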

Figure 5.2: Three samples of binomial point process defined in Section 5.1.1.

In this paper, we only approximate the solutions at the free nodes, i.e., the spatial resolution of the inputs is $59\times 59$. The configuration of the Basis CNNs is shown in Figure 4.2(b), with more details in Table 1. The second and third columns of Table 1 show the spatial resolution $H\times W$ of the feature maps and the number of parameters of each layer. This configuration is used in all experiments, except that the spatial resolution of the feature maps varies with the cases.

Table 1: The architecture of Basis CNNs
Layers Resolution H ×\times W The number of parameters
Input 59×\times 59 -
Conv_\_p block (25×\times 25,32,1) 59×\times 59 20000
Conv_\_p block (25×\times 25,32,1) 59×\times 59 20000
Conv_\_p block (25×\times 25,64,1) 59×\times 59 40000
Conv_\_p block (25×\times 25,64,1) 59×\times 59 40000
Conv_\_p block (25×\times 25,128,1) 59×\times 59 80000
Conv_\_p block (25×\times 25,128,1) 59×\times 59 80000
Conv(25×\times 25,1,1) 59×\times 59 625
Reshape 3481 ×\times 10 -

The Basis CNNs are trained with the Adam optimizer and the loss (4.3). The initial learning rate is 0.0001 and cosine decay, a built-in schedule of TensorFlow, is used. The batch size is set to 32 and the model is trained for 100 epochs. Each sample in the dataset is composed of four variables: the permeability $K$, the high-fidelity solution $u_{h}(K)$, the stiffness matrix $A_{h}(K)$ and the load vector $F_{h}(K)$ (see equations (5.3)-(5.5)). We collect data pairs for 12740 different samples of $K$, 10240 of which are selected as training data, while the rest are used to test the generalization ability of CNN-based ROM.
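A sketch of this training setup is given below, reusing the basis_loss function from the sketch in Section 4.1.2; the decay horizon and other unspecified details are assumptions.

    import tensorflow as tf

    steps_per_epoch = 10240 // 32        # training samples / batch size
    lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
        initial_learning_rate=1e-4, decay_steps=100 * steps_per_epoch)
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

    @tf.function
    def train_step(K, u_h, A_h, F_h, model):
        with tf.GradientTape() as tape:
            P = model(K, training=True)                 # predicted basis functions
            loss = basis_loss(u_h, P, A_h, F_h)         # loss (4.3)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss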

Comparison with POD-based RB methods. In the following, we compare the predictive performance of CNN-based ROM with the POD-based RB methods introduced in Section 3. Figure 5.3 illustrates that CNN-based ROM can approximate the high-fidelity solution accurately with a small number of RB basis functions ($N=10$), which cannot be achieved by POD-based RB methods. Since the accuracy of POD-based RB methods depends strongly on the number of RB basis functions $N$, we let $N$ equal 10, 20, 30, 40, 50, 800, 900 and 1000, respectively, and record the relative test mean errors in Table 2. As shown in the table, the POD-based RB method requires 900-1000 basis functions to achieve the same accuracy as the proposed method, which directly affects the efficiency of the online computation.

Figure 5.3: The predictive results of CNN-based ROM and POD-based RB methods with ten basis functions at three test instances of $K$.
Table 2: Darcy flow with binomial point process: comparison of relative test mean errors for different basis numbers between CNN-based ROM and POD-based RB method.
The number of basis 10 20 30 40 50 800 900 1000
POD-based RB methods 81.42% 65.56% 54.50% 47.96% 43.47% 4.46% 3.68% 3.08%
CNN-based ROM 3.52% 3.39% 3.29% 3.23% 3.43% - - -

RB methods exploit low-rank features of the data, usually called modes, that characterize the physical processes. Therefore, we show the RB basis functions learned by CNN-based ROM and by the POD-based RB method in Figure 5.4. Note that the POD basis functions are distributed around the center of the domain and the dominant modes alternate in arrangement. In contrast, the shapes of the CNN-based basis functions closely resemble the data. Such input-specific basis functions help to reduce the number of basis functions needed for approximation. However, they lack interpretability, which is a common problem of deep learning techniques.

Figure 5.4: Basis functions for the test instance in the first row of Figure 5.3. The first two rows show the basis functions of CNN-based ROM with $N=10$. The ten POD basis functions corresponding to the ten largest eigenvalues are listed in the last two rows.

Our training data are generated by FEM, which has approximation errors. For instance, FEM solutions on a mesh with $31\times 31$ nodes are less accurate than those on a mesh with $61\times 61$ nodes (see Figure 5.5). We use a Gaussian distribution with mean zero and standard deviation 0.001 to simulate the errors caused by numerical methods, which is shown in the second column of Figure 5.6.

Figure 5.5: From left to right: the FEM solution on the mesh with $61\times 61$ nodes, the FEM solution on the mesh with $31\times 31$ nodes, and their absolute error.

Using the noisy data, we train the Basis CNNs with $N=10$ and again run the POD-based RB methods with $N=100,1000$. The numerical results are presented in Figure 5.6. CNN-based ROM shows a clear advantage and a de-noising effect.

Figure 5.6: The predictive results for data with Gaussian noise at a level of 0.001.

5.1.2 Darcy flow in a channelized aquifer

In this subsection, we consider the random input in Figure 5.7, which simulates the hydraulic conductivity field of a channelized aquifer [36]. The image has $2500\times 2500$ pixels. Channel and matrix materials are assigned hydraulic conductivity values of 1000 and 1, respectively. To introduce randomness, we extract 23104 small images of resolution $64\times 64$ from the large image by moving along both the horizontal and vertical directions with an interval of 16 pixels per step. To obtain enough training data, we additionally flip the generated images horizontally (see Figure 5.7). We finally obtain 46208 images, among which 40960 are used as the training dataset and the rest for testing. The Basis CNNs with the architecture in Table 1 are then trained using the Adam optimizer for 50 epochs. The relative test mean error is 2.16% and the generalization ability of the proposed method is illustrated in Figure 5.8.
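The patch extraction and flip augmentation described above could be implemented as in the following sketch; the exact patch count depends on how the boundary is handled and may differ slightly from the numbers reported.

    import numpy as np

    def extract_patches(field, patch=64, stride=16, flip=True):
        """Cut overlapping patches from the large conductivity image, optionally with flips."""
        patches = []
        H, W = field.shape
        for i in range(0, H - patch + 1, stride):
            for j in range(0, W - patch + 1, stride):
                p = field[i:i + patch, j:j + patch]
                patches.append(p)
                if flip:
                    patches.append(np.fliplr(p))   # horizontal flip doubles the dataset
        return np.stack(patches)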

Figure 5.7: The hydraulic conductivity field of a channelized aquifer.
Figure 5.8: The predictive results of CNN-based ROM with ten basis functions at three test instances of $K$.

Next, we use the upper-left quarter of this background field, i.e., $1250\times 1250$ pixels, to validate the ideas in Section 4.3. Here, we consider an extreme case in which the oversampling domain covers the whole computational region. Linear boundary conditions, as described in Section 4.3, are used for the oversampling domain, and four nodal basis functions are obtained with different boundary conditions. Rather than training four different CNNs, we exploit the rotational symmetry of the boundary conditions, i.e., each boundary condition can be rotated clockwise by 90 degrees to coincide with another boundary condition. This can be accomplished by rotating the background field. More specifically, we use the Basis CNNs to learn the following equation,

\left\{\begin{aligned}
&-\nabla\cdot\big(\kappa(x,\xi)\nabla\tilde{\Phi}_{ms}(x)\big)=0, && x\in\mathcal{S}:=[0,1]\times[0,1],\\
&\tilde{\Phi}_{ms}(0,1)=1,\ \tilde{\Phi}_{ms}(0,0)=0,\ \tilde{\Phi}_{ms}(1,0)=0,\ \tilde{\Phi}_{ms}(1,1)=0.
\end{aligned}\right.

To generate the training and test datasets, we first obtain 5625 sub-images of resolution $65\times 65$ by moving along both the horizontal and vertical directions with an interval of 16 pixels per step. Each sample is then rotated clockwise by 90 degrees three times to acquire three additional samples (see Figure 5.9).

Figure 5.9: The hydraulic conductivity field of a channelized aquifer for learning multiscale finite element basis functions.

The relative test mean error is 1.38% and several test instances are shown in Figure 5.10. Note that the basis functions learned by the Basis CNNs are intermediate quantities; we construct the actual multiscale finite element basis functions $\Phi_{ms}^{i}$, $i=1,2,3,4$, in each coarse element $\omega$ from the linear combination of $\tilde{\Phi}_{ms}^{i}$, $i=1,2,3,4$, i.e.,

Φmsi=k=14cikΦ~msk,\Phi_{ms}^{i}=\sum_{k=1}^{4}c_{ik}\tilde{\Phi}_{ms}^{k},

where the coefficients $c_{ik}$ are determined by the conditions $\Phi_{ms}^{i}(x_{j})=\delta_{ij}$. This leads to $N$ non-conforming basis functions, which are dealt with by simply taking averages on the boundary because there is only an $O(1)$ jump across $\partial\omega$ [5]. The numerical results depicted in Figure 5.10 demonstrate that the same basis functions can be used for different types of source terms and that the numerical solution converges to the reference as the number of basis functions becomes large enough.

Figure 5.10: MsFEM with multiscale basis functions learned by CNN-based ROM for different number of basis functions N=81,289,1089N=81,289,1089. The source terms for each rows in the figure are f(x)=exp(x1+x2)f(x)=\exp(x_{1}+x_{2}) and f(x)=sin(2πx1+2πx2)f(x)=\sin(2\pi x_{1}+2\pi x_{2}), respectively.

5.2 Applications to nonlinear flows

In this subsection, we consider more complex cases, namely nonlinear flows. The logarithm of the random input $\kappa(x,\xi)$ is taken to be a Gaussian random field, i.e.,

κ(x,ξ)=exp(ξ(x)),ξ()𝒩(m(),k(,)),\kappa(x,\xi)=\exp\Big{(}\xi(x)\Big{)},\xi(\cdot)\sim\mathcal{N}\Big{(}m(\cdot),k(\cdot,\cdot)\Big{)}, (5.6)

where $m:\mathcal{S}\rightarrow\mathbb{R}$ is a constant mean function with value $m$. The covariance function $k(\cdot,\cdot)$ is a positive semi-definite function, also called a reproducing kernel on $\mathcal{S}$ [13]. In this test case, we take $m=0$ and the covariance function $k$ is specified in the following form using the vector 2-norm,

k(x1,x2)=exp(x1x22l),k(x_{1},x_{2})=\exp\Big{(}-\frac{\|x_{1}-x_{2}\|_{2}}{l}\Big{)}, (5.7)

where $l$ is a hyperparameter called the bandwidth. Decreasing the bandwidth makes the covariance between points $x_{1}$ and $x_{2}$ decay faster with respect to the Euclidean distance $\|x_{1}-x_{2}\|_{2}$, so that the discretized field realizations vary strongly even across adjacent grid points (see Figure 5.11). In the following two subsections, we take $l=0.1$ and use the Karhunen-Loève expansion (KLE) to sample from $\kappa(x,\xi)$, i.e.,

\xi(x)=\sum_{i=1}^{Q}\sqrt{\lambda_{i}}\,z_{i}\,g_{i}(x), \qquad (5.8)

where $\lambda_{i}$ and $g_{i}(x)$ are the eigenvalues and eigenfunctions of the covariance function $k(\cdot,\cdot)$, $z_{i}$ are i.i.d. standard normal random variables, and $Q$ is the number of expansion terms, which controls the intrinsic dimensionality.
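A simple way to realize this sampling is to form the covariance matrix of the kernel (5.7) on the grid and truncate its eigendecomposition, as in the sketch below; the grid size and the number of retained terms Q are placeholders, and a dense eigendecomposition is used purely for illustration.

    import numpy as np

    def kle_log_gaussian_sample(n_grid=63, l=0.1, Q=64, rng=None):
        """Sample kappa = exp(xi), with xi given by the truncated KLE (5.8) of kernel (5.7)."""
        rng = np.random.default_rng() if rng is None else rng
        grid = np.linspace(0.0, 1.0, n_grid)
        X, Y = np.meshgrid(grid, grid, indexing="ij")
        pts = np.column_stack([X.ravel(), Y.ravel()])
        dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        C = np.exp(-dist / l)                         # covariance matrix on the grid
        eigvals, eigvecs = np.linalg.eigh(C)          # ascending order
        eigvals, eigvecs = eigvals[::-1][:Q], eigvecs[:, ::-1][:, :Q]
        z = rng.standard_normal(Q)                    # i.i.d. standard normal coefficients
        xi = eigvecs @ (np.sqrt(np.maximum(eigvals, 0.0)) * z)
        return np.exp(xi).reshape(n_grid, n_grid)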

Figure 5.11: Gaussian random field with bandwidth 0.1 (left) and 0.01 (right).

5.2.1 Nonlinear source terms

We first consider the case that the source term is nonlinear as follows,

{(κ(x,ξ)u(x))=sin(10πu(x))+cos(10πu(x)),x𝒮,u(x)=0,x𝒮.\left\{\begin{aligned} &-\nabla\cdot\Big{(}\kappa(x,\xi)\nabla u(x)\Big{)}=\sin\Big{(}10\pi u(x)\Big{)}+\cos\Big{(}10\pi u(x)\Big{)},x\in\mathcal{S},\\ &u(x)=0,x\in\partial{\mathcal{S}}.\end{aligned}\right. (5.9)

The weak form of the above equation is: find $u\in V$ such that

𝒮κuvdx=𝒮(sin(10πu)+cos(10πu))v𝑑x,vV.\int_{\mathcal{S}}\kappa\nabla u\nabla v\ dx=\int_{\mathcal{S}}\Big{(}\sin(10\pi u)+\cos(10\pi u)\Big{)}\ v\ dx,\ \forall v\in V. (5.10)

We solve the above variational problem by Newton's method on a fixed mesh with rectangular elements and $65\times 65$ nodes. Once we obtain the high-fidelity solution $u_{h}(K)$, the entries of the stiffness matrix $A_{h}(K)\in\mathbb{R}^{4225\times 4225}$ and the load vector $F_{h}(K)\in\mathbb{R}^{4225\times 1}$ are

a_{h}(i,j)=\int_{\mathcal{S}}K\,\nabla\varphi_{i}\cdot\nabla\varphi_{j}\,dx,\qquad
f_{h}(i)=\int_{\mathcal{S}}\Big(\sin\big(10\pi u_{h}(K)\big)+\cos\big(10\pi u_{h}(K)\big)\Big)\varphi_{i}\,dx,

where $\{\varphi_{1},\ldots,\varphi_{4225}\}$ are the bilinear basis functions of $V_{h}$. Here, we use element means to approximate the value of $u_{h}$ at the integration points (as in Figure 5.1). The Basis CNNs with the architecture in Table 1 are first trained using 5120 data pairs. Since $u_{h}(K)$ is unknown for test instances, the Basis CNNs alone cannot realize online computation. We therefore use the Coef CNNs to learn the mapping from $K$ to the coefficients $c(K)$. The specific architecture of the Coef CNNs is presented in Table 3.

Table 3: The architecture of Coef CNNs
Layers Resolution H ×\times W The number of parameters
Input 63×\times 63 -
Conv block (25×\times 25,32,1) 63×\times 63 20000
Conv block (25×\times 25,32,2) 32×\times 32 20000
Conv block (25×\times 25,64,1) 32×\times 32 40000
Conv block (25×\times 25,64,2) 16×\times 16 40000
Conv block (25×\times 25,128,1) 16×\times 16 80000
Conv block (25×\times 25,128,2) 8×\times 8 80000
FC(300) 300×\times 1 2457600
FC(300) 300×\times 1 90000
FC(10) 10×\times 1 3000

The same dataset is used to train the Coef CNNs. The relative test mean error is 4.43%, estimated over 2500 test instances. Several results are shown in Figure 5.12. Compared to traditional ROM, which needs iterative methods at the online stage, CNN-based ROM directly yields predictions of the solutions.

Figure 5.12: The predictive results of CNN-based ROM with ten basis functions at three test instances of $K$.

5.2.2 Nonlinear differential operators

In this section, we consider the p-Laplacian equations as follows,

\left\{\begin{aligned}
&-\nabla\cdot\big(\kappa(x,\xi)\,|\nabla u(x)|^{p-2}\nabla u(x)\big)=1, && x\in\mathcal{S},\\
&u(x)=0, && x\in\partial\mathcal{S}.
\end{aligned}\right. \qquad (5.11)

The p-Laplacian operator on the left-hand side of equation (5.11) is derived from a nonlinear Darcy law and the continuity equation. For $p=2$, equation (5.11) reduces to the linear equation in Section 5.1. Here, we take $p=3$ and minimize the total potential energy of the flow, i.e.,

u=argminvV1p𝒮κv2p𝒮v𝑑x.u=\mathop{\arg\min}\limits_{v\in V}\frac{1}{p}\int_{\mathcal{S}}\kappa\|\nabla v\|_{2}^{p}-\int_{\mathcal{S}}vdx.

We approximate $u$ on a mesh with triangular elements and $65\times 65$ nodes. The above optimization problem is solved for 12740 samples of the log-Gaussian random field with bandwidth 0.1 by an efficient algorithm [37]; 10240 instances are used as the training dataset and the rest for testing. The stiffness matrix $A_{h}(K)\in\mathbb{R}^{4225\times 4225}$ is computed by assuming that $u(K)$ is accurately approximated by $u_{h}(K)$, and has entries

a_{h}(i,j)=\int_{\mathcal{S}}K\,|\nabla u_{h}(K)|^{p-2}\,\nabla\varphi_{i}\cdot\nabla\varphi_{j}\,dx,

where $\{\varphi_{1},\ldots,\varphi_{4225}\}$ are the piecewise linear basis functions of $V_{h}$. The load vector $F_{h}(K)\in\mathbb{R}^{4225\times 1}$ is defined as in equation (5.4). We train both the Basis CNNs and the Coef CNNs with Adam for 50 epochs. The relative test mean error is 1.49% and several test instances are shown in Figure 5.13.

Figure 5.13: The predictive results of CNN-based ROM with ten basis functions at three test instances of $K$.

5.2.3 Application to inverse problems

Figure 5.14: The schema of Surrogate-constrained VAE.

As shown by the above numerical results, the proposed method provides an efficient and accurate surrogate model for the random multiscale equation (2.1). In this section, we apply CNN-based ROM to inverse problems. Given the dataset $\mathcal{Y}=\{y_{1},y_{2},\ldots,y_{N_{obs}}\}$ observed by $N_{obs}$ sensors at the spatial points $\{x_{1},x_{2},\ldots,x_{N_{obs}}\}$, we want to estimate the coefficient vector $\bm{z}:=[z_{1},\ldots,z_{Q}]$ of the KLE expansion of the Gaussian random field defined in equation (5.8).

A method that combines the Variational Autoencoder (VAE) [39] with CNN-based ROM is introduced to solve this problem; its schema is presented in Figure 5.14. VAE is a type of generative model belonging to the family of autoencoders, specifically designed for learning latent representations of the input data and for generating new data samples. It is composed of a recognition model, which learns the mapping from the input data to the parameters of the posteriors of the latent variables, and a likelihood model, which reconstructs the data from the latent space. In a standard VAE, both the recognition model and the likelihood model are represented by neural networks. As an unsupervised method, the VAE learns the latent variables purely from data, which is not suitable for inverse problems. Hence, we use the trained CNN-based ROM as part of the likelihood model to constrain the learning of the latent variables, which is inspired by PDE-constrained Bayesian inversion methods [38]. We therefore call the method Surrogate-constrained VAE; the unknown parameters $\Psi$ are contained only in the recognition model, that is

ψ=ΦΨRecog(𝒴),\psi=\Phi_{\Psi}^{Recog}(\mathcal{Y}),

where $\psi$ are the local variational parameters that define the posterior distribution $q(\bm{z}|\mathcal{Y};\psi)$ approximating the true posterior $p(\bm{z}|\mathcal{Y})$. The mapping $\Phi_{\Psi}^{Recog}$ is a fully connected neural network. The problem thus becomes solving for the global variational parameters $\Psi$. With such a neural network, we can obtain the posterior estimate of the coefficients $z_{i}$ by passing the observation data $\mathcal{Y}$ through $\Phi_{\Psi}^{Recog}$, which cannot be achieved by PDE-constrained Bayesian inversion methods.

Next, we use variational inference to learn the parameters \Psi. Because the likelihood of the data p(y) is hard to compute, we instead maximize its lower bound, known as the evidence lower bound (ELBO),

\text{ELBO}(\Psi)=\mathbb{E}_{q(\bm{z}|\mathcal{Y};\psi)}\Big{(}\log\ p(y|\bm{z};\Phi_{\theta}^{Base},\Phi_{\phi}^{Coef},\sigma_{obs})\Big{)}-\text{KL}\Big{(}q(\bm{z}|\mathcal{Y};\psi)\|p(\bm{z})\Big{)}, (5.12)

where p(y|\bm{z};\Phi_{\theta}^{Base},\Phi_{\phi}^{Coef},\sigma_{obs}) is the likelihood model with i.i.d. Gaussian observation noise of standard deviation \sigma_{obs}, i.e.,

\mathcal{Y}_{i}=\Phi_{\theta}^{Base}(K_{i})\Phi_{\phi}^{Coef}(K_{i})+\epsilon_{i},\quad\epsilon_{i}\sim\mathcal{N}(0,\sigma_{obs}^{2}I).
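To fix ideas, the mean of this likelihood model can be sketched as a frozen forward map from z to the predicted sensor values, where K is reconstructed from z through the KLE (5.8) and the trained Basis and Coef CNNs reproduce the fine-scale solution; the output shapes assumed for the two CNNs and the sensor restriction are illustrative assumptions, not a description of our implementation.

```python
import torch
import torch.nn as nn

class SensorSurrogate(nn.Module):
    """Sketch of the likelihood-model mean: z -> K(z) -> fine-scale solution -> sensors.

    basis_cnn, coef_cnn : the trained (and frozen) Basis CNNs and Coef CNNs
    eigvals, eigvecs    : KLE eigenvalues (Q,) and discretized eigenfunctions (Q, 65*65)
    sensor_idx          : flat indices of the sensor locations on the fine grid
    """
    def __init__(self, basis_cnn, coef_cnn, eigvals, eigvecs, sensor_idx):
        super().__init__()
        self.basis_cnn, self.coef_cnn = basis_cnn, coef_cnn
        self.register_buffer("eigvals", eigvals)
        self.register_buffer("eigvecs", eigvecs)
        self.register_buffer("sensor_idx", sensor_idx)

    def forward(self, z):
        # K = exp( sum_j sqrt(lambda_j) z_j g_j ), evaluated on the 65 x 65 grid
        K = torch.exp((torch.sqrt(self.eigvals) * z) @ self.eigvecs)
        K_img = K.reshape(1, 1, 65, 65)
        basis = self.basis_cnn(K_img)                         # assumed (1, n_dof, n_basis)
        coef = self.coef_cnn(K_img)                           # assumed (1, n_basis)
        u = torch.bmm(basis, coef.unsqueeze(-1)).squeeze(-1)  # (1, n_dof) reconstruction
        return u[0, self.sensor_idx]                          # predicted sensor values
```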

The prior p(\bm{z}) is taken to be the standard Gaussian distribution because the KLE coefficients of a Gaussian random field are i.i.d. standard Gaussian variables. The approximate posterior distribution q(\bm{z}|\mathcal{Y};\psi) is also Gaussian, with mean vector \mu and covariance matrix \sigma^{2}I, i.e.,

\bm{z}|\mathcal{Y}\sim\mathcal{N}(\mu,\sigma^{2}I).

Furthermore, we can understand equation (5.12) from the perspective of inverse problems. The first term aims at maximizing the likelihood of the reconstructed data. The second term is the Kullback-Leibler divergence between two probability densities, which is defined as

\text{KL}(p\|q)=\int p(x)\log\frac{p(x)}{q(x)}\ dx.

It acts as a regularization term that keeps the approximate posterior close to the prior, which helps to alleviate the ill-posedness of the inverse problem. By means of the reparametrization trick [39], we can use gradient descent algorithms, such as Adam, to optimize the neural network parameters \Psi; a minimal training-step sketch is given below.
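The following sketch shows one possible realization of such a training step: it draws z with the reparametrization trick, evaluates the Gaussian log-likelihood through a frozen surrogate such as the SensorSurrogate sketched above, and uses the closed-form KL divergence between \mathcal{N}(\mu,\sigma^{2}I) and the standard Gaussian prior; the sample count and optimizer settings are illustrative assumptions.

```python
import torch

def negative_elbo(y_obs, recog_net, surrogate, sigma_obs, n_samples=8):
    """Negative ELBO (5.12) for one observation vector y_obs; minimized with Adam."""
    mu, log_var = recog_net(y_obs)
    std = torch.exp(0.5 * log_var)
    log_lik = 0.0
    for _ in range(n_samples):
        z = mu + std * torch.randn_like(std)        # reparametrization trick
        y_pred = surrogate(z)
        # Gaussian log-likelihood up to an additive constant
        log_lik = log_lik - ((y_obs - y_pred) ** 2).sum() / (2.0 * sigma_obs ** 2)
    log_lik = log_lik / n_samples
    # KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form
    kl = 0.5 * (mu ** 2 + log_var.exp() - log_var - 1.0).sum()
    return -(log_lik - kl)

# one optimization step for the recognition parameters Psi (illustrative usage)
# optimizer = torch.optim.Adam(recog_net.parameters(), lr=1e-3)
# loss = negative_elbo(y_obs, recog_net, surrogate, sigma_obs=0.01)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```

Once the recognition model is trained, we can also estimate the expectation of K by the Monte Carlo method, i.e.,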

\mathbb{E}_{q(\bm{z}|\mathcal{Y};\psi)}(K)\approx\frac{1}{M}\sum_{i=1}^{M}\exp\Big{(}\sum_{j=1}^{Q}\sqrt{\lambda_{j}}z_{j}^{(i)}\bm{g}_{j}\Big{)},\quad\bm{z}\sim q(\bm{z}|\mathcal{Y};\psi), (5.13)

where \bm{g}_{j} is the discrete vector of the eigenfunction g_{j} in the KLE (5.8). Similarly, we can obtain the variance of K based on the estimation (5.13):

Var_{q(\bm{z}|\mathcal{Y};\psi)}(K)\approx\frac{1}{M}\sum_{i=1}^{M}\Bigg{(}\exp\Big{(}\sum_{j=1}^{Q}\sqrt{\lambda_{j}}z_{j}^{(i)}\bm{g}_{j}\Big{)}-\mathbb{E}_{q(\bm{z}|\mathcal{Y};\psi)}(K)\Bigg{)}^{2},\quad\bm{z}\sim q(\bm{z}|\mathcal{Y};\psi). (5.14)
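A minimal sketch of the Monte Carlo estimates (5.13)-(5.14), assuming the trained recognition network and the discretized KLE eigenpairs are available (all names are illustrative):

```python
import torch

def posterior_moments_of_K(recog_net, y_obs, eigvals, eigvecs, M=1000):
    """Estimate the posterior mean (5.13) and variance (5.14) of K by Monte Carlo."""
    mu, log_var = recog_net(y_obs)
    std = torch.exp(0.5 * log_var)
    samples = []
    for _ in range(M):
        z = mu + std * torch.randn_like(std)            # z ~ q(z | Y; psi)
        log_K = (torch.sqrt(eigvals) * z) @ eigvecs     # sum_j sqrt(lambda_j) z_j g_j
        samples.append(torch.exp(log_K))
    K_samples = torch.stack(samples)                    # (M, number of grid points)
    mean_K = K_samples.mean(dim=0)
    var_K = ((K_samples - mean_K) ** 2).mean(dim=0)     # divides by M, as in (5.14)
    return mean_K, var_K
```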

Finally, we demonstrate the performance of the Surrogate-constrained VAE for the p-Laplacian equation of Section 5.2.2. We first place 15\times 15 sensors uniformly in the spatial domain, as shown in the first column of Figure 5.15, and assume that the observation noise is 0.01. We can see from Figure 5.15 that the Surrogate-constrained VAE accurately estimates the expectation of the permeability K. The posterior densities of the coefficients z_{i} for the test instance in the first row of Figure 5.15 are further drawn in Figure 5.16. We also vary the noise level; Figure 5.17 shows that the standard deviations decrease as the noise level decreases.

Figure 5.15: The expectations of the random inputs, estimated by the surrogate-constrained VAE with 15\times 15 sensors (\sigma_{obs}=0.01).
Figure 5.16: The posterior densities of the coefficients z_{i}. The number of sensors is 225 and the observation noise is at the level of 0.01. The true coefficients are denoted by red stars, and the posterior densities of the estimated z_{i} are represented by blue lines.
Figure 5.17: The expectations and standard deviations of the random inputs, estimated by the surrogate-constrained VAE with 8\times 8 sensors. The observation noises from the top row to the bottom row are at the levels of 0.05, 0.01 and 0.005, respectively.

To further illustrate the performance of the proposed method, we fix the noise level and use sparser observation data with 8\times 8 sensors. The estimation in Figure 5.18 only captures the basic shapes of the permeability due to the relatively high noise level. We thus decrease the noise level to 0.001; the performance, shown in Figure 5.19, improves as expected. It is observed that the standard deviations are higher where the estimation is less accurate.

Figure 5.18: The expectations of the random inputs, estimated by the surrogate-constrained VAE with 8\times 8 sensors (\sigma_{obs}=0.01).
Figure 5.19: The expectations and standard deviations of the random inputs, estimated by the surrogate-constrained VAE with 8\times 8 sensors (\sigma_{obs}=0.001).

6 Conclusion

In this paper, we have provided a data-driven ROM method for multiscale problems with high-dimensional random inputs, which is called CNN-based ROM. After reviewing multiscale reduction methods and reduced-basis methods, we first summarized ROM into two main parts: the construction of reduced-order basis functions and the computation on the coarse grid, which inspired the designs of the two distinct CNNs in the proposed method.

The first CNN, called Basis CNNs, was used to learn input-specific basis functions for linear model reduction. At the output layer, an activation function defined by Galerkin projection was utilized to reconstruct fine-scale solutions and to constrain the learning of the basis functions. On the one hand, such a setting could break the limitations of RB methods for high-dimensional problems. On the other hand, compared to multiscale reduction methods, there is no need to solve many local problems online to obtain the basis functions. Because the reduced-order model learned by CNNs might be ill-conditioned, we used the condition number under the Frobenius norm to guarantee the stability of training. In addition, numerical results showed that the basis functions learned by Basis CNNs were similar to fine-scale solutions, so the number of reduced-order basis functions could be smaller than for POD-based RB methods. Moreover, ROM assisted by deep learning techniques was less sensitive to data fluctuation caused by numerical methods than traditional ROM. The fine-scale stiffness matrix and load vector were still needed in Basis CNNs, which is computationally expensive and not available for nonlinear problems at the online stage. To overcome these limitations, we designed the second CNN, called Coefficient CNNs (Coef CNNs), to learn the coefficients of the linear combination of basis functions.

We also provided two applications of CNN-based ROM. We first explored the connections between the proposed method and MsFEM. It was shown that CNN-based ROM could be used for learning MsFEM basis functions within large oversampling regions; thus, the learned basis functions could be reused for different source terms. In the last subsection of the numerical results, it was demonstrated that CNN-based ROM could serve as a surrogate for solving inverse problems of the p-Laplacian equation with a Gaussian random field input.

Finally, we would like to remark on some limitations of our method, which can be investigated in the future. Firstly, the basis functions learned by CNN-based ROM itself cannot be reused for different boundary conditions and source terms. Secondly, the dataset consisting of fine-scale solutions is not easy to obtain; ROM based on mixed data of coarse-scale and fine-scale simulations can be considered. Finally, we will extend our method to dynamical systems.

References

  • [1] M. Alotaibi, H. Chen, S. Sun, Generalized multiscale finite element methods for the reduced model of Darcy flow in fractured porous media, Journal of Computational and Applied Mathematics. 413 (2022) 114305.
  • [2] Y. Xia, N. Zabaras, Bayesian multiscale deep generative model for the solution of high-dimensional inverse problems, Journal of Computational Physics. 455 (2022) 111008.
  • [3] L. Jiang, L. Ma, A hybrid model reduction method for stochastic parabolic optimal control problems, Computer Methods in Applied Mechanics and Engineering. 370 (2020) 113244.
  • [4] J. Capers, L. Stanfield, J. Sambles, S. Bayes, A. Powell, A. Hibbins, S. Horsley, Physical Review Applied. 21 (2024) 014005.
  • [5] T. Hou, X. Wu, A multiscale finite element method for elliptic problems in composite materials and porous media, Journal of Computational Physics. 134 (1997) 169-189.
  • [6] L. Jiang, Y. Efendiev, G. Victor, Multiscale methods for parabolic equations with continuum spatial scales, Discrete and Continuous Dynamical Systems. 8 (2012) 833-859.
  • [7] Y. Efendiev, J. Galvis, T. Hou, Generalized multiscale finite element methods (GMsFEM), Journal of Computational Physics. 251 (2013) 116-135.
  • [8] E. Chung, Y. Efendiev, T. Hou, Adaptive multiscale model reduction with generalized multiscale finite element methods, Journal of Computational Physics. 320 (2016) 69-95.
  • [9] E. Chung, Y. Efendiev, W. Leung, Constraint energy minimizing generalized multiscale finite element method, Computer Methods in Applied Mechanics and Engineering. 339 (2018) 298-319.
  • [10] M. Li, E. Chung, L. Jiang, A constraint energy minimizing generalized multiscale finite element method for parabolic equations, Multiscale Modeling and Simulation. 17 (2019) 996-1018.
  • [11] M. Yano, Discontinuous Galerkin reduced basis empirical quadrature procedure for model reduction of parametrized nonlinear conservation laws, Advances in Computational Mathematics. 45 (2019) 2287-2320.
  • [12] R. Zimmermann, S. Görtz, Improved extrapolation of steady turbulent aerodynamics using a non-linear POD-based reduced order model, The Aeronautical Journal. 116 (2012) 1079-1100.
  • [13] Y. Zhu, N. Zabaras, Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification, Journal of Computational Physics. 366 (2018) 415-447.
  • [14] S. Chaturantabut, D. Sorensen, Discrete Empirical Interpolation for Nonlinear Model Reduction, in: Proceedings of the 48h IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference, 2009, pp. 4316-4321.
  • [15] F. Negri, A. Manzoni, D. Amsallem, Efficient model reduction of parametrized systems by matrix discrete empirical interpolation, Journal of Computational Physics. 303 (2015) 431-454.
  • [16] S. Brunton, J. Kutz, Data-driven science and engineering: machine learning, dynamical systems and control, Cambridge university press, 2022, pp. 500-504.
  • [17] M. Wang, S. Cheung, E. Chung, Y. Efendiev, W. Leung, Y. Wang, Prediction of discretization of GMsFEM Using Deep Learning, Mathematics. 7 (2019) 412.
  • [18] A. Choubineh, J. Chen, F. Coenen, F. Ma, An innovative application of deep learning in multiscale modeling of subsurface fluid flow: Reconstructing the basis functions of the mixed GMsFEM, Journal of Petroleum Science and Engineering. 216 (2022) 110751.
  • [19] S. Cheung, E. Chung, Y. Efendiev, E. Gildin, Y. Wang, J. Zhang, Deep global model reduction learning in porous media flow simulation, Computational Geosciences. 24 (2020) 261-274.
  • [20] J. Hesthaven, S. Ubbiali, Non-intrusive reduced order modeling of nonlinear problems using neural networks, Journal of Computational Physics. 363 (2018) 55-78.
  • [21] S. Fresca, L. Dede, A. Manzoni, A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs, Journal of Scientific Computing. 87 (2021) 1-36.
  • [22] M. Mignolet, A. Przekop, S. Rizzi, S. Spottswood, A review of indirect/non-intrusive reduced order modeling of nonlinear geometric structures, Journal of Sound and Vibration. 332 (2013) 2437-2460.
  • [23] Q. Wang, J. Hesthaven, D. Ray, Non-intrusive reduced order modeling of unsteady flows using artificial neural networks with application to a combustion problem, Journal of Computational Physics. 384 (2019) 289-307.
  • [24] X. Zhang, L. Jiang, Conditional variational autoencoder with Gaussian process regression recognition for parametric models, Journal of Computational and Applied Mathematics. 438 (2024) 115532.
  • [25] A. Quarteroni, A. Manzoni, F. Negri, Reduced basis methods for partial differential equations, Springer, 2016.
  • [26] M. Ohlberger, S. Rave, Reduced basis methods: success, limitations and future challenges, Proceedings of ALGORITMY, 2016, pp. 1-12.
  • [27] N. Santo, S. Deparis, L. Pegolotti, Data-driven approximation of parametrized PDEs by reduced basis and neural networks, Journal of Computational Physics. 416 (2020) 109550.
  • [28] A. Pinkus, n-widths in approximation theory, volume 7 of Ergebnisse der Mathematik und ihrer Grenzgebiete, Springer-Verlag, 1985.
  • [29] N. Kruger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, L. Piater, A. Rodriguez-Sanchez, L. Wiskott, Deep hierarchies in the primate visual cortex: what can we learn for computer vision?, IEEE Trans Pattern Anal Mach Intell. 35 (2013) 1847-1871.
  • [30] R. Yamashita, M. Nishio, R. DO, K. Togashi, Convolutional neural networks: an overview and application in radiology, Insights into Imaging. 9 (2018) 611-629.
  • [31] S. Ioffe, C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Proceedings of the 32nd International Conference on Machine Learning. 37 (2015) 448-456.
  • [32] M. Gockenbach, Understanding and Implementing the Finite Element Method, Society for Industrial and Applied Mathematics, 2006, pp. 59-62.
  • [33] P. Benner, A. Cohen, M. Ohlberger, K. Willcox, Model Reduction and Approximation: Theory and Algorithms, Society for Industrial and Applied Mathematics, 2016, pp. 65-136.
  • [34] J. Masci, U. Meier, D. Ciresan, J. Schmidhuber, Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction, Springer-Verlag. 6791 (2011) 52-59.
  • [35] A. Masud, T. Hughes, A stabilized mixed finite element method for Darcy flow, Computer methods in applied mechanics and engineering. 191 (2002) 4341-4370.
  • [36] E. Laloy, R. Herault, D. Jacques, N. Linde, Training-Image Based Geostatistical Inversion Using a Spatial Generative Adversarial Neural Network. Water Resources Research. 54 (2018) 381-406.
  • [37] S. Loisel, Efficient algorithms for solving the p-Laplacian in polynomial time, Numerische Mathematik. 146 (2020) 369-400.
  • [38] J. Cockayne, C. Oates, T. Sullivan, M. Girolami, Probabilistic numerical methods for PDE-constrained Bayesian inverse problems, AIP Conference Proceedings, 1853 (2017) 060001.
  • [39] L. Cinelli, M. Marins, E. Silva, S. Netto, Variational Methods for Machine Learning with Applications to Deep Networks, Springer, 2021, pp. 151-152.