
Automatic stabilization of finite-element simulations using neural networks and hierarchical matrices

Tomasz Służalec(1), Mateusz Dobija(1), Anna Paszyńska(1), Ignacio Muga(2),
Maciej Paszyński(3)
(1)Jagiellonian University, Kraków, Poland
(2) Pontificia Universidad Católica de Valparaíso, Chile
(3) AGH University of Science and Technology, Kraków, Poland
e-mail: [email protected]
Abstract

Petrov-Galerkin formulations with optimal test functions allow for the stabilization of finite element simulations. In particular, given a discrete trial space, the optimal test space induces a numerical scheme delivering the best approximation in terms of a problem-dependent energy norm. This ideal approach has two shortcomings: first, we need to explicitly know the set of optimal test functions; and second, the optimal test functions may have large supports inducing expensive dense linear systems.

Nevertheless, parametric families of PDEs are an example where it is worth investing some (offline) computational effort to obtain stabilized linear systems that can be solved efficiently, for a given set of parameters, in an online stage. Therefore, as a remedy for the first shortcoming, we explicitly compute (offline) a function mapping any PDE-parameter to the matrix of coefficients of optimal test functions (in a basis expansion) associated with that PDE-parameter. Next, as a remedy for the second shortcoming, we use low-rank approximation to hierarchically compress the (non-square) matrix of coefficients of optimal test functions. In order to accelerate this process, we train a neural network to learn a critical bottleneck of the compression algorithm (for a given set of PDE-parameters). When solving online the resulting (compressed) Petrov-Galerkin formulation, we employ a GMRES iterative solver with inexpensive matrix-vector multiplications thanks to the low-rank features of the compressed matrix. We perform experiments showing that the full online procedure is as fast as the original (unstable) Galerkin approach. We illustrate our findings by means of the 2D Eriksson-Johnson and Helmholtz model problems.

keywords:
Petrov-Galerkin method, optimal test functions, parametric PDEs, automatic stabilization, neural networks, hierarchical matrices

1 Introduction

Difficult finite-element simulations solved with the Galerkin method (where we employ the same trial and test space) often generate incorrect numerical results with oscillations or spurious behavior. Examples of such problems are the advection-dominated diffusion equation [7] or the Helmholtz equation [10].

Petrov-Galerkin formulations [18] (i.e., formulations where trial and test spaces are not equal, although they share the same dimension) with optimal test functions allow for automatic stabilization of difficult finite-element simulations. This particular Petrov-Galerkin approach is equivalent to the residual minimization (RM) method [7], whose applications include advection-diffusion [15, 6], Navier-Stokes [16], or space-time formulations [20].

In general, for a fixed trial space, RM allows for stabilization by enriching the discrete test space where the optimal test functions of the associated Petrov-Galerkin method live (contrary to standard stabilization methods like SUPG [5, 13], there are no special stabilization terms modifying the weak formulation). If we knew explicitly the formulas for the optimal test functions expanded in the enriched discrete test space, then we could return to the Petrov-Galerkin formulation and solve problems in the best possible way for a given trial space.

This scenario has two shortcomings. The first problem is that the computation of optimal test functions is expensive. It requires solving a large system of linear equations (as large as the dimension of the enriched discrete test space) with multiple right-hand sides (one right-hand side per basis function of the trial space). The second problem is that the optimal test functions can have global supports, and thus the Petrov-Galerkin method with optimal test functions can generate a dense matrix, which is expensive to solve.

Nevertheless, parametric families of partial differential equations (PDEs) are an example where it is worth investing some (offline) computational effort to obtain stabilized linear systems that can be solved efficiently, for a given set of parameters, in an online stage. Therefore, in the context of a parametric family of PDEs, and as a remedy for the first shortcoming, we explicitly compute (offline) a function mapping any PDE-parameter to the matrix of coefficients of optimal test functions (expanded in the basis of the enriched discrete test space) associated with that PDE-parameter. We emphasize that this last procedure is independent of particular right-hand side sources and/or prescribed boundary data of the PDE family.

The obtained matrix $\mathbb{W}$ of optimal test function coefficients is dense. The Petrov-Galerkin method induces a linear system of the form $\mathbb{B}^{T}\mathbb{W}\,x=(L^{T}\mathbb{W})^{T}$, where $L$ is a right-hand side vector and $\mathbb{B}$ is the matrix associated with the bilinear form of the underlying PDE. To avoid dense matrix computations, we compress $\mathbb{W}$ using the approach of hierarchical matrices [11]. Having the matrix $\mathbb{W}$ compressed into a hierarchical matrix $\mathbb{H}$, we employ the GMRES method [19], which involves computations of the residual $R=\mathbb{B}^{T}\mathbb{H}\,x-(L^{T}\mathbb{H})^{T}$; the hierarchical matrix enables the matrix-vector multiplications $\mathbb{H}\,x$ and $L^{T}\mathbb{H}$ at a quasi-linear computational cost.

However, compressing the matrix $\mathbb{W}$ for each PDE-parameter is an expensive procedure. Thus, with the help of an artificial neural network, we learn (offline) the critical bottleneck of the compression algorithm. We obtain a stabilized method with an additional quasi-linear cost resulting from the matrix-vector multiplications within the GMRES method. From our numerical results for the Eriksson-Johnson and Helmholtz model problems, this cost of stabilization (the cost of compression into the hierarchical matrix and the cost of GMRES with hierarchical matrix-vector multiplications) is of the same order as the cost of the solution of the original Galerkin problem without stabilization. Thus, we claim we can obtain stabilization with neural networks and hierarchical matrices practically for free.

1.1 One-dimensional illustration of stabilization

Let us illustrate how we stabilize difficult computational problems by means of a one-dimensional example for the advection-dominated diffusion model.

Given $0<\epsilon\leq 1$, consider the following differential equation:

\[
\left\{\begin{array}{rl}
-\epsilon u'' + u' = 0 & \text{in } (0,1);\\
-\epsilon u'(0) + u(0) = 1 & \text{and } u(1)=0.
\end{array}\right. \qquad (1)
\]

In weak form, equation (1) translates into: find $u\in H_{0)}^{1}(0,1):=\{v\in H^{1}(0,1):v(1)=0\}$ such that:

\[
\epsilon\int_{0}^{1}u'v' + \int_{0}^{1}u'v + u(0)v(0) = v(0), \qquad \forall v\in H_{0)}^{1}(0,1). \qquad (2)
\]
Figure 1: Discrete spaces $U_h$ (left) and $V_h$ (right).

We define discrete spaces $U_h$ and $V_h$ as depicted in Figure 1. Given a regular mesh, $U_h$ will be the space of piecewise-linear continuous functions, while $V_h$ will be the space of piecewise-quadratic continuous functions. On one side, we discretize formulation (2) using a standard Galerkin method where trial and test spaces are both equal to $V_h$. On the other side, we discretize (2) by means of a residual minimization (RM) technique that uses $U_h$ as the trial space and $V_h$ as the test space. Figure 2 compares the discrete solutions delivered by these two methods for different values of $\epsilon$ and different meshes parametrized by the number of elements $n$. We observe a superior performance of the residual minimization method, even though the approximation (trial) space used is poorer than that of the Galerkin method.

Figure 2: Comparison of the exact solution with the Galerkin method (trial = test = quadratic B-splines with $C^0$ separators) and the residual minimization method (linear B-splines for trial and quadratic B-splines with $C^0$ separators for test).

1.2 Outline of the paper

The structure of the paper is the following. Section 2 contains all the theoretical ingredients needed to understand our approach. Namely, Petrov-Galerkin formulations with optimal test functions (section 2.1); optimal test functions for an affine family of PDEs (section 2.2); the hierarchical compression of the optimal test functions matrix of coefficients (section 2.3); the fast hierarchical matrix-vector multiplication and related fast implementation of the GMRES solver (section 2.4); and the neural network acceleration of the hierarchical matrix compression (section 2.5). Next, in section 3 we apply our automatic stabilization procedure to well-known unstable parametric model problems. First, for the two-dimensional Eriksson-Johnson model problem (section 3.1); and next, for a two-dimensional Helmholtz equation (section 3.2). We write our conclusions in section 4. All the pseudo-code algorithms needed to follow our methodology have been shifted to the Appendix.

2 Theoretical ingredients

2.1 Petrov-Galerkin formulations with optimal test functions

Let $U$ and $V$ be Hilbert spaces. We consider a general variational formulation of a PDE, which is to find $u\in U$ such that

\[
b(u,v)=\ell(v)\,,\quad\forall v\in V\,, \qquad (3)
\]

where $b:U\times V\to\mathbb{R}$ is a continuous bilinear form, and $\ell:V\to\mathbb{R}$ is a continuous linear functional (i.e., $\ell\in V'$, the dual space of $V$).

The dual space $V'$ has a norm inherited from the norm of $V$. Indeed, if $(\cdot,\cdot)_{V}$ denotes the inner product of the Hilbert space $V$, then these norms are given by the following expressions:

\[
V\ni v\mapsto\|v\|_{V}:=\sqrt{(v,v)_{V}}\qquad\text{and}\qquad V'\ni f\mapsto\|f\|_{V'}:=\sup_{\|v\|_{V}=1}f(v).
\]

We will assume well-posedness of problem (3), which translates into the well-known inf-sup conditions (see, e.g., [9, Theorem 2.6]):

\[
\exists\,\gamma>0\ \text{such that}\ \|b(w,\cdot)\|_{V'}\geq\gamma\|w\|_{U}\,,\quad\forall w\in U; \qquad (4a)
\]
\[
\{v\in V : b(w,v)=0\,,\ \forall w\in U\}=\{0\}. \qquad (4b)
\]

Notice that the continuity assumption on the bilinear form $b(\cdot,\cdot)$ also implies the existence of a constant $M\geq\gamma$ such that:

\[
\|b(w,\cdot)\|_{V'}\leq M\|w\|_{U}\,,\quad\forall w\in U. \qquad (5)
\]

Given a discrete finite-element space $U_h\subset U$, a natural candidate to approximate the solution $u\in U$ to problem (3) is the residual minimizer

\[
u_{h}=\operatorname*{arg\,min}_{w_{h}\in U_{h}}\|b(w_{h},\cdot)-\ell(\cdot)\|_{V'}\,. \qquad (6)
\]

Indeed, combining (4a), (6), and (5), the residual minimizer automatically satisfies the quasi-optimality property:

\[
\gamma\|u_{h}-u\|_{U}\leq\|b(u_{h},\cdot)-\ell(\cdot)\|_{V'}=\inf_{w_{h}\in U_{h}}\|b(w_{h},\cdot)-\ell(\cdot)\|_{V'}\leq M\inf_{w_{h}\in U_{h}}\|w_{h}-u\|_{U}\,.
\]

Thus, the residual minimizer inherits stability properties from the continuous problem.

It is well-known (see, e.g., [8]) that the saddle-point formulation of residual minimization (6) becomes the mixed problem that aims to find $u_h\in U_h$ and a residual representative $r\in V$ such that:

\[
(r,v)_{V}-b(u_{h},v) = -\ell(v), \quad \forall v\in V, \qquad (7a)
\]
\[
b(w_{h},r) = 0, \quad \forall w_{h}\in U_{h}. \qquad (7b)
\]

Although it seems harmless, the mixed problem (7) is still infinite-dimensional in the test space $V$. To obtain a computable version, we introduce a discrete test space $V_h\subset V$ that turns (7) into the fully discrete problem of finding $u_h\in U_h$ (notice the abuse of notation: this new $u_h\in U_h$ solving (8) does not equal the exact residual minimizer solving (6), or equivalently (7)) and a residual representative $r_h\in V_h$ such that:

\[
(r_{h},v_{h})_{V}-b(u_{h},v_{h}) = -\ell(v_{h}), \quad \forall v_{h}\in V_{h}, \qquad (8a)
\]
\[
b(w_{h},r_{h}) = 0, \quad \forall w_{h}\in U_{h}. \qquad (8b)
\]

Problem (8) corresponds to the saddle-point formulation of a discrete-dual residual minimization, in which the dual norm $\|\cdot\|_{V'}$ in (6) is replaced by the discrete-dual norm $\|\cdot\|_{V_h'}$. Well-posedness and stability of (8) have been extensively studied in [17] and depend on a Fortin compatibility condition between the discrete spaces $U_h$ and $V_h$. Moreover, once this condition is fulfilled (and in order to gain stability), it is possible to enrich the test space $V_h$ without changing the trial space $U_h$. Obviously, this process will enlarge the linear system (8). Nevertheless, we know that there is an equivalent linear system of the same size as the trial space, delivering the same $u_h\in U_h$ solving (8). This is known as the Petrov-Galerkin method with optimal test functions, which we describe below. Our goal will be to make this generally impractical method practical.
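To make the algebraic structure of (8) concrete, the following minimal sketch (in Python/NumPy) solves the discrete saddle-point system for the residual representative and the discrete solution. The sizes and the random placeholder matrices are illustrative assumptions standing in for an actual finite-element assembly, not the discretizations used later in the paper.

```python
import numpy as np

# Minimal sketch of the discrete mixed problem (8) written as one block system:
#   [ G  -B ] [r]   [-L]
#   [ B^T 0 ] [u] = [ 0 ]
# G is the m x m Gram matrix of the test inner product (.,.)_V,
# B is the m x n matrix with B[i, j] = b(w_j, v_i), and L[i] = l(v_i).
rng = np.random.default_rng(0)
m, n = 12, 6                                   # dim V_h > dim U_h (enriched test space)
G = 0.1 * rng.standard_normal((m, m))
G = 0.5 * (G + G.T) + m * np.eye(m)            # symmetric positive definite Gram matrix
B = rng.standard_normal((m, n))
L = rng.standard_normal(m)

K = np.block([[G, -B], [B.T, np.zeros((n, n))]])
rhs = np.concatenate([-L, np.zeros(n)])
sol = np.linalg.solve(K, rhs)
r_h, u_h = sol[:m], sol[m:]                    # residual representative and discrete solution
```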

Let us now introduce the concept of optimal test functions. For each $w_h\in U_h$, the optimal test function $Tw_h\in V_h$ is defined as the Riesz representative of the functional $b(w_h,\cdot)\in V_h'$, i.e.,

\[
(Tw_{h},v_{h})_{V}=b(w_{h},v_{h})\,,\quad\forall v_{h}\in V_{h}. \qquad (9)
\]

Testing equation (8a) with optimal test functions $Tw_h$, and using (9) and (8b), we arrive at the following Petrov-Galerkin system with optimal test functions:

\[
\left\{\begin{array}{ll}
\text{Find } u_{h}\in U_{h} \text{ such that}\\
b(u_{h},Tw_{h})=\ell(Tw_{h})\,, & \forall w_{h}\in U_{h}\,.
\end{array}\right. \qquad (10)
\]

In order to obtain an explicit matrix expression for (10), let us set $U_h:=\operatorname{span}\{w_1,\dots,w_n\}$ and $V_h:=\operatorname{span}\{v_1,\dots,v_m\}$. Consider the matrix $\mathbb{B}$ linked to the bilinear form $b(\cdot,\cdot)$ such that its $(i,j)$-entry is $\mathbb{B}_{ij}=b(w_j,v_i)$. Analogously, we consider the Gram matrix $\mathbb{G}$ linked to the inner product $(\cdot,\cdot)_V$ such that $\mathbb{G}_{ki}=(v_k,v_i)_V$. The optimal test space is defined as $V_h^{\text{opt}}:=\operatorname{span}\{Tw_1,\dots,Tw_n\}$. Thus, using (9), we observe that the matrix containing the coefficients of the optimal test functions, when expanded in the basis of $V_h$, is $\mathbb{W}:=\mathbb{G}^{-1}\mathbb{B}$. Moreover, the Petrov-Galerkin system (10) becomes

\[
\mathbb{B}^{T}\mathbb{W}\,x=(L^{T}\mathbb{W})^{T}, \qquad (11)
\]

where the vector $x$ contains the coefficients of the expansion of $u_h$ in the basis of $U_h$, $L^{T}=[\ell(v_1)\,\cdots\,\ell(v_m)]$, and we have used the fact that $\mathbb{G}^{T}=\mathbb{G}$. Therefore, if we aim to solve the Petrov-Galerkin linear system (11), an optimized matrix-vector multiplication to perform $\mathbb{W}\,x$ and $L^{T}\mathbb{W}$ becomes critical. Section 2.3 is devoted to the hierarchical compression of $\mathbb{W}$, which allows for fast matrix-vector multiplications.
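As a small illustration of (11), the sketch below (again with placeholder matrices) builds the coefficient matrix $\mathbb{W}=\mathbb{G}^{-1}\mathbb{B}$ by a single solve with multiple right-hand sides and then solves the reduced $n\times n$ Petrov-Galerkin system; with the same $\mathbb{G}$, $\mathbb{B}$ and $L$ it reproduces the $u_h$ of the saddle-point sketch of the previous subsection. All sizes and names are illustrative assumptions.

```python
import numpy as np

# Sketch of the Petrov-Galerkin system (11) with optimal test functions.
# G (m x m), B (m x n) and L (m) are placeholders for the assembled
# Gram matrix, bilinear-form matrix, and load vector.
rng = np.random.default_rng(0)
m, n = 12, 6
G = 0.1 * rng.standard_normal((m, m))
G = 0.5 * (G + G.T) + m * np.eye(m)
B = rng.standard_normal((m, n))
L = rng.standard_normal(m)

# Coefficients of the optimal test functions in the basis of V_h:
# one linear solve with n right-hand sides (one per trial basis function).
W = np.linalg.solve(G, B)                     # W = G^{-1} B, dense m x n

# Reduced n x n Petrov-Galerkin system  B^T W x = (L^T W)^T.
x = np.linalg.solve(B.T @ W, W.T @ L)
```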

2.2 Optimal test functions for an affine family of parametric PDEs

Assume we want to solve parametric PDEs in variational form, i.e.,

\[
\text{Find } u_{\mu}\in U \text{ such that } b_{\mu}(u_{\mu},v)=\ell_{\mu}(v),\quad\forall v\in V,
\]

where for each set of parameters $\mu\in\mathcal{P}\subset\mathbb{R}^{d}$, the bilinear form $b_{\mu}(\cdot,\cdot)$ is continuous and inf-sup stable, with constants that may depend on $\mu$. Moreover, we assume that $b_{\mu}(\cdot,\cdot)$ has the affine decomposition:

\[
b_{\mu}(\cdot,\cdot)=b_{0}(\cdot,\cdot)+\sum_{k=1}^{d}\theta_{k}(\mu)\,b_{k}(\cdot,\cdot)\,,
\]

where $\theta_{k}:\mathcal{P}\to\mathbb{R}$ and $b_{k}:U\times V\to\mathbb{R}$ are accessible and easy to compute. When the trial and test spaces are discretized, the bilinear form $b_{\mu}(\cdot,\cdot)$ induces a matrix of the form:

\[
\mathbb{B}_{\mu}=\mathbb{B}_{0}+\sum_{k=1}^{d}\theta_{k}(\mu)\,\mathbb{B}_{k}\,.
\]

Thus, the matrix of coefficients of the optimal test functions $\mathbb{W}_{\mu}:=\mathbb{G}^{-1}\mathbb{B}_{\mu}$ becomes in this case:

\[
\mathbb{W}_{\mu}=\mathbb{G}^{-1}\mathbb{B}_{0}+\sum_{k=1}^{d}\theta_{k}(\mu)\,\mathbb{G}^{-1}\mathbb{B}_{k}\,. \qquad (12)
\]

Equation (18) shows the particular form that expression (12) takes for the Eriksson-Johnson model problem, where knowing $\mathbb{G}^{-1}\mathbb{B}_{0}$ and $\mathbb{G}^{-1}\mathbb{B}_{1}$ implies the knowledge of $\mathbb{W}_{\epsilon}$ for any $\epsilon>0$.
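A sketch of the offline/online split implied by (12) is given below for a one-parameter family with $\theta_{1}(\mu)=\mu$ (as in the Eriksson-Johnson case (18), where $\mu=\epsilon$). The placeholder matrices and the function name are illustrative assumptions.

```python
import numpy as np

# Offline/online split for the affine family (12) with a single parameter
# and theta_1(mu) = mu. Placeholder matrices stand in for FE assembly.
rng = np.random.default_rng(1)
m, n = 12, 6
G = 0.1 * rng.standard_normal((m, m))
G = 0.5 * (G + G.T) + m * np.eye(m)
B0 = rng.standard_normal((m, n))       # parameter-independent part (e.g. advection)
B1 = rng.standard_normal((m, n))       # part scaled by the parameter (e.g. diffusion)

# Offline: one multi-right-hand-side solve per term of the decomposition.
W0 = np.linalg.solve(G, B0)            # G^{-1} B_0
W1 = np.linalg.solve(G, B1)            # G^{-1} B_1

def W_mu(mu):
    # Online: W_mu = G^{-1} B_0 + mu * G^{-1} B_1 is a cheap linear combination,
    # independent of the right-hand side and of the boundary data.
    return W0 + mu * W1
```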

2.3 Hierarchical compression of the optimal test functions matrix of coefficients

Hierarchical matrices were introduced by Hackbusch [11]. The main idea of the hierarchical compression of a matrix is to store the matrix in a tree-like structure, where:

  1. the root node corresponds to the whole matrix;
  2. the root node has some number of sons (in our approach, 4 sons) corresponding to submatrices of the main matrix;
  3. each node can have sons (in our approach, 4 sons) corresponding to submatrices (blocks), or can be a leaf representing the corresponding block;
  4. each leaf stores its associated block in the SVD-compressed form or as a zero matrix;
  5. at each node, the decision about storing the block in SVD form, or dividing the block into submatrices, depends on an admissibility condition of the block.

Figure 3: Hierarchical compression of a matrix.

An exemplary hierarchical compression of a matrix in the form of a tree is presented in Figure 3, while the algorithm for compression of a matrix into the hierarchical matrix format is presented in Algorithm 1.
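A minimal sketch of the node structure behind such a tree is given below. The class name and fields are illustrative assumptions rather than the authors' implementation: a leaf stores either nothing (for a zero block) or the factors of a truncated SVD with the singular values folded into the second factor, while an internal node stores its four sons.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class HNode:
    """One node of the hierarchical-matrix quad-tree (illustrative sketch)."""
    rows: int                                # number of rows of the represented block
    cols: int                                # number of columns of the represented block
    # Leaf data: truncated SVD factors U (rows x k) and V (k x cols),
    # with the singular values already folded into V; both None for a zero block.
    U: Optional[np.ndarray] = None
    V: Optional[np.ndarray] = None
    # Internal node: four sons ordered [top-left, top-right, bottom-left, bottom-right].
    sons: List["HNode"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.sons
```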

The admissibility condition controls the process of creation of the tree: it decides whether the matrix should be divided (or not) into submatrices. In our case, the admissibility condition is defined by:

  1. the size of the matrix: if the matrix is bigger than a pre-defined maximal admissible size $l\gg 1$, then the matrix should be divided into submatrices;
  2. the first $r$ singular values: if the $(r+1)$-st singular value is greater than a pre-defined threshold $\delta>0$, then the matrix should be divided into submatrices.

In the leaves of the tree, we perform a reduced Singular Value Decomposition (rSVD). A reduced singular value decomposition of an $(n\times m)$-matrix $\mathbb{M}$ of rank $k$ is a factorization of the form

\[
\mathbb{M}=\mathbb{U}\,\mathbb{D}\,\mathbb{V}^{T},
\]

with matrices $\mathbb{U}\in\mathbb{R}^{n\times k}$ and $\mathbb{V}\in\mathbb{R}^{m\times k}$ having orthonormal columns, and a diagonal matrix $\mathbb{D}\in\mathbb{R}^{k\times k}$ whose diagonal entries satisfy $\mathbb{D}_{11}\geq\mathbb{D}_{22}\geq\cdots\geq\mathbb{D}_{kk}>0$ (see Figure 4).

Figure 4: Reduced Singular Value Decomposition.

The diagonal entries of $\mathbb{D}$ are called the singular values of $\mathbb{M}$. The computational complexity of the reduced SVD is ${\cal O}((m+n)k^{2})$.
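Combining the admissibility condition with the leaf compression, a compact sketch of the recursive procedure behind Algorithms 1, 2, and 4 could look as follows (reusing the HNode class sketched above). For simplicity it computes a full SVD of every visited block via NumPy, which is adequate for illustration only; a truncated or randomized SVD would be used in practice, and the default rank, threshold, and minimal block size are illustrative.

```python
import numpy as np

def compress_block(M, r=16, delta=1e-7, min_size=8):
    """Recursively compress the dense block M into the HNode quad-tree sketched above."""
    n_rows, n_cols = M.shape
    if not M.any():                          # zero block: store nothing
        return HNode(n_rows, n_cols)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    small = min(n_rows, n_cols) <= min_size
    low_rank = len(s) <= r or s[r] < delta   # (r+1)-st singular value below delta
    if small or low_rank:
        k = min(r, len(s))
        # Leaf: keep the first k singular triplets and fold D into V.
        return HNode(n_rows, n_cols, U=U[:, :k], V=s[:k, None] * Vt[:k, :])
    i, j = n_rows // 2, n_cols // 2          # otherwise split into four quarters
    sons = [compress_block(M[:i, :j], r, delta, min_size),
            compress_block(M[:i, j:], r, delta, min_size),
            compress_block(M[i:, :j], r, delta, min_size),
            compress_block(M[i:, j:], r, delta, min_size)]
    return HNode(n_rows, n_cols, sons=sons)
```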

2.4 Matrix-vector multiplication with $\mathbb{H}$-matrices and GMRES solver speedup

The computational cost of a matrix-vector multiplication using a compressed $\mathbb{H}$-matrix of rank $r$ and $s$ right-hand-side vectors is ${\cal O}((m+n)rs)$. This is illustrated in Figure 5(a).

The multiplication of a matrix compressed into SVD blocks is performed recursively, as illustrated in Figure 5(b). The resulting computational cost of the multiplication is ${\cal O}(Nrs)$, where $N:=\max\{n,m\}$.

Figure 5: (a) The computational cost of matrix-vector multiplication with a compressed $\mathbb{H}$-matrix and $s$ vectors is ${\cal O}(rms+rns)$, which for $n=N\gg r$ reduces to ${\cal O}(Nrs)$. (b) Multiplication of a matrix compressed into four SVD blocks by a vector partitioned into two blocks, following $\begin{bmatrix}C_{2}(C_{1}X_{1})+D_{2}(D_{1}X_{2})\\ E_{2}(E_{1}X_{1})+F_{2}(F_{1}X_{2})\end{bmatrix}$. The resulting computational cost is ${\cal O}(Nrs)$.

The GMRES algorithm employed for computing the solution includes multiplications of the problem matrix by vectors (see lines 1, 4, and 5 in Algorithm 5). Applying the stabilization requires replacing the matrix $\mathbb{B}$ by $\mathbb{B}^{T}\mathbb{H}$. If we can apply the matrix-vector multiplication $\mathbb{H}\,x$ at (quasi-)linear cost, we can say that our stabilization comes for free.
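As a sketch of this idea, the compressed matrix can be wrapped into a matrix-free operator and handed to an off-the-shelf GMRES implementation (here SciPy's), reusing compress_block and h_matvec from the previous sketches. For illustration we take $\mathbb{G}=\mathbb{I}$, so that $\mathbb{W}=\mathbb{B}$; the sizes and the dense evaluation of the right-hand side are illustrative simplifications (in practice $(L^{T}\mathbb{H})^{T}$ would also be formed hierarchically, via a transposed $\mathbb{H}$-matvec).

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

# Sketch: plug the hierarchical matvec into an off-the-shelf GMRES.
# With G = I we have W = B, and the reduced system (11) reads
#   B^T (H x) = (L^T H)^T,  where H is the compressed version of W.
rng = np.random.default_rng(4)
m, n = 64, 48
B = rng.standard_normal((m, n))         # placeholder for the assembled matrix B
L = rng.standard_normal(m)              # placeholder for the load vector
H = compress_block(B)                   # hierarchical compression of W (= B here)

def matvec(x):
    # Only products of the form B^T (H x) are ever formed;
    # the dense n x n matrix B^T W is never assembled.
    return B.T @ h_matvec(H, x.reshape(-1, 1)).ravel()

A_op = LinearOperator((n, n), matvec=matvec)
rhs = B.T @ L                           # = (L^T W)^T for this illustration
x, info = gmres(A_op, rhs)              # info == 0 signals convergence
```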

2.5 Neural network learning the hierarchical matrices

The hierarchical matrix is obtained by constructing a tree with SVD decompositions of different blocks of the full matrix. The root level corresponds to the entire matrix, and the children correspond to sub-blocks. Only the leaf nodes store blocks in the SVD-decomposition format. The most expensive part of the compression algorithm is checking the admissibility condition; in particular, checking whether the singular values of a given block beyond the first $r$ are smaller than $\delta$, which decides whether we partition the block or run the SVD decomposition. The SVD data for the blocks of different sizes can be precomputed and stored in a list, see Figure 6.

Figure 6: Truncated SVD decompositions of matrix blocks, starting from the root level, then the second level, the third level, and the following levels.

From the set $\mathcal{P}\subset\mathbb{R}^{d}$ of PDE-parameters, we can construct the neural network

\[
\mathcal{P}\ni\mu\rightarrow\operatorname{DNN}(\mu)=\{\mathbb{U}_{i}(\mu),\mathbb{D}_{i}(\mu),\mathbb{V}_{i}(\mu)\}_{i=1,\dots,N_{B}}, \qquad (13)
\]

where $\operatorname{DNN}(\mu)$ is the list of SVD decompositions for all $N_{B}$ blocks of different dimensions.

Unfortunately, training the neural networks $\mathbb{U}_{i}(\mu)$ and $\mathbb{V}_{i}(\mu)$ for the different blocks as functions of $\mu$ does not work. However, it is possible to train the singular values for the different blocks as functions of $\mu$, i.e.,

\[
\mathcal{P}\ni\mu\rightarrow\operatorname{DNN}(\mu)=\{\mathbb{D}_{i}(\mu)\}_{i=1,\dots,N_{B}}\,. \qquad (14)
\]

Knowing $\mathbb{D}_{i}(\mu)$ a priori for a given $\mu$ allows us to construct the structure of the hierarchical matrix and call the truncated SVD only for the leaves of the matrix, which considerably speeds up the compression algorithm. An illustration of the architecture of the neural network used to learn the singular values is depicted in Figure 7.
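A minimal PyTorch sketch of the map (14) for a single block is given below: a small fully-connected network is trained to predict the leading $r+1$ singular values of that block as a function of the scalar PDE-parameter. The block generator, the network architecture, the parameter sampling, and all hyper-parameters are illustrative assumptions, not the exact setup used in the experiments.

```python
import numpy as np
import torch
import torch.nn as nn

r = 16
def block_of_W(mu):
    # Placeholder for one block of W_mu; in the real setting this would be a
    # block of the (precomputed) affine decomposition evaluated at mu.
    rng = np.random.default_rng(0)
    A0, A1 = rng.standard_normal((32, 32)), rng.standard_normal((32, 32))
    return A0 + mu * A1

# Offline training data: sample the parameter, record the leading r+1 singular values.
mus = np.linspace(0.0, 1.0, 200)
sigmas = np.stack([np.linalg.svd(block_of_W(mu), compute_uv=False)[:r + 1] for mu in mus])

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, r + 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
X = torch.tensor(mus, dtype=torch.float32).unsqueeze(1)
Y = torch.tensor(sigmas, dtype=torch.float32)

for epoch in range(2000):                 # offline training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), Y)
    loss.backward()
    optimizer.step()

# Online: the predicted sigma_{r+1} decides whether this block is an admissible
# leaf (sigma_{r+1} < delta) or must be subdivided, without computing any SVD here.
```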

Figure 7: The architecture of the neural network learning the singular values for blocks of the matrix.

3 Application

3.1 Two-dimensional Eriksson-Johnson problem

Given $\Omega=(0,1)^{2}\subset\mathbb{R}^{2}$ and $\beta=(1,0)$, we seek the solution of the advection-diffusion problem

\[
\left\{\begin{array}{rl}
-\epsilon\,\Delta u+\beta\cdot\nabla u=0 & \text{in }\Omega,\\
u=\sin(k\pi y)\,\chi_{\{x=0\}} & \text{over }\partial\Omega\,,
\end{array}\right. \qquad (15)
\]

where $\chi_{\{x=0\}}$ denotes the characteristic function of the inflow boundary $x=0$.

Figure 8: Boundary layer of the Eriksson-Johnson problem for $\epsilon=0.01$.

The problem is driven by the inflow Dirichlet boundary condition and develops a boundary layer of width $\epsilon$ near the outflow $x=1$, as shown in Figure 8.

The weak form with general Dirichlet boundary data $g\in H^{1/2}(\partial\Omega)$ reads: find $u^{\epsilon}\in H^{1}(\Omega)$ such that

\[
\left\{\begin{array}{rl}
\underbrace{\displaystyle\epsilon\int_{\Omega}\nabla u^{\epsilon}\cdot\nabla v+\int_{\Omega}(\beta\cdot\nabla u^{\epsilon})\,v}_{\epsilon\,b_{1}(u^{\epsilon},v)+b_{0}(u^{\epsilon},v)=:b_{\epsilon}(u^{\epsilon},v)}=0, & \forall v\in H_{0}^{1}(\Omega),\\[2ex]
u^{\epsilon}=g, & \text{over }\partial\Omega.
\end{array}\right. \qquad (16)
\]

To simplify the discussion, we approximate the solution using tensor products of one-dimensional B-spline basis functions $\{B_{i;p}(x)B_{j;p}(y)\}_{i,j}$ of uniform order $p$ in all directions. This discrete trial space $U_h^p\subset H^1(\Omega)$ will be split as $U_h^p=U_{h,0}^p+U_{h,\partial\Omega}^p$, where $U_{h,0}^p\subset H_0^1(\Omega)$ contains all the basis functions vanishing at $\partial\Omega$, and $U_{h,\partial\Omega}^p$ is the complementary subspace containing the basis functions associated with boundary nodes. Our discrete solution will be $u_h^\epsilon=u_{h,0}^\epsilon+u_{h,g}^\epsilon$, where $u_{h,0}^\epsilon\in U_{h,0}^p$ is unknown and $u_{h,g}^\epsilon\in U_{h,\partial\Omega}^p$ is directly obtained using the boundary data $g$.

We build the test space using a larger polynomial order $q>p$. That is, we use tensor products of one-dimensional B-spline basis functions $\{B_{s;q}(x)B_{t;q}(y)\}_{s,t}$ of order $q$ vanishing over $\partial\Omega$. This discrete test space will be denoted by $V_{h,0}^q\subset H_0^1(\Omega)$. The discrete residual minimization problem is to find $u_{h,0}^\epsilon\in U_{h,0}^p$ and $r_h\in V_{h,0}^q$ such that

\[
(\nabla r_{h},\nabla v_{h})_{L^{2}(\Omega)}-b_{\epsilon}(u^{\epsilon}_{h,0},v_{h}) = b_{\epsilon}(u^{\epsilon}_{h,g},v_{h})\,, \quad\forall v_{h}\in V_{h,0}^{q}\,; \qquad (17a)
\]
\[
b_{\epsilon}(w_{h},r_{h}) = 0\,, \quad\forall w_{h}\in U_{h,0}^{p}\,. \qquad (17b)
\]

The reduced matrix system associated with (17) takes the form:

\[
\mathbb{B}_{\epsilon}^{T}\mathbb{W}_{\epsilon}\,x=(L^{T}\mathbb{W}_{\epsilon})^{T},\quad\text{ where }\ \mathbb{W}_{\epsilon}=\mathbb{G}^{-1}\mathbb{B}_{\epsilon}=\epsilon\,\mathbb{G}^{-1}\mathbb{B}_{1}+\mathbb{G}^{-1}\mathbb{B}_{0}. \qquad (18)
\]
Figure 9: The entire matrix $\mathbb{W}_{\epsilon}$ of the coefficients of the optimal test functions as the input of the first hierarchical level.

We train a neural network for the diagonal matrix $\mathbb{D}(\epsilon)$ of the SVD decomposition $[\mathbb{U}(\epsilon),\mathbb{D}(\epsilon),\mathbb{V}(\epsilon)]$ of the entire matrix $\mathbb{W}_{\epsilon}$ (see Figure 9), as well as for the matrices $\mathbb{D}_{ij}(\epsilon)$ from the SVD decompositions $\{\mathbb{U}_{ij}(\epsilon),\mathbb{D}_{ij}(\epsilon),\mathbb{V}_{ij}(\epsilon)\}_{i=1,\dots,j^{2}}$ of the sub-matrices obtained by $j\times j$ partitions of $\mathbb{W}_{\epsilon}$, for $j=2,4,8,16$. The white parts correspond to the boundary nodes, where we have enforced the boundary conditions. The convergence of the training procedures is presented in Figure 10.

Figure 10: Eriksson-Johnson problem. Training of the neural network for the diagonal matrices $\mathbb{D}_{ij}(\epsilon)$ at several levels.

In an online stage, for a given diffusion coefficient $\epsilon$, we perform the compression of the matrix $\mathbb{W}_{\epsilon}$ into the hierarchical matrix $\mathbb{H}_{\epsilon}$ using Algorithm 6, where the admissibility condition is now provided by the trained neural network. The compressed hierarchical matrices are illustrated in Figure 11.

Figure 11: Matrix of the coefficients of the optimal test functions compressed using the recursive SVD algorithm for $\epsilon=0.1$ and $\epsilon=10^{-6}$. The compression algorithm removes singular values smaller than $\delta=10^{-7}$ and employs $r=16$.

Having the compressed matrix $\mathbb{H}_{\epsilon}$, we employ the GMRES algorithm [19] for solving the linear system (18). To avoid the computation with a dense $\mathbb{B}_{\epsilon}^{T}\mathbb{H}_{\epsilon}$ matrix, we note that the GMRES method involves computations of the residual $R=\mathbb{B}_{\epsilon}^{T}\mathbb{H}_{\epsilon}\,x-(L^{T}\mathbb{H}_{\epsilon})^{T}$, and the hierarchical matrix $\mathbb{H}_{\epsilon}$ enables the matrix-vector multiplications $\mathbb{H}_{\epsilon}\,x$ and $L^{T}\mathbb{H}_{\epsilon}$ at a quasi-linear computational cost.

$\epsilon$ | Compress $\mathbb{H}_{\epsilon}$ flops, with DNN | Compress $\mathbb{H}_{\epsilon}$ flops, without DNN | $\mathbb{H}_{\epsilon}x$ flops | $\mathbb{A}\,\mathbb{H}_{\epsilon}x$ flops | # iter GMRES | Total flops, $\mathbb{H}$-matrix
0.1 | 14,334 | 158,946 | 41,163 | 9,152 | 90 | 3,728,166
$10^{-6}$ | 31,880 | 117,922 | 33,100 | 9,002 | 76 | 3,232,392
Table 1: Computational costs of the stabilized Eriksson-Johnson solver using neural networks, hierarchical matrices, and the GMRES solver.
$\epsilon$ | # iter GMRES (Galerkin) | Flops per iteration | Total flops
0.1 | 89 | 17,536 | 1,560,704
$10^{-6}$ | 65 | 17,536 | 1,139,840
Table 2: Computational costs of the Galerkin method for the Eriksson-Johnson problem using the GMRES solver.

In Table 1 we summarize the computational costs of our solver for two values of $\epsilon\in\{0.1,10^{-6}\}$. The computational mesh was a tensor product of quadratic B-splines with 26 elements along the $x$-axis and quadratic B-splines with 10 elements along the $y$-axis. As we can read from the second and third columns, the DNN speeds up the compression of the matrix of optimal test function coefficients around ten times. We employ the GMRES solver that computes the residual. The costs of the multiplications $\mathbb{H}_{\epsilon}x$ and $\mathbb{A}\,\mathbb{H}_{\epsilon}x$ are included in the fourth and fifth columns of Table 1. The total cost of the GMRES solver with hierarchical matrices augmented by DNN compression equals the compression cost of $\mathbb{H}_{\epsilon}$ with DNN plus the number of iterations times the multiplication costs of $\mathbb{H}_{\epsilon}x$ and $\mathbb{A}\,\mathbb{H}_{\epsilon}x$. The total cost is presented in the last column of Table 1.

For comparison, we run the GMRES algorithm on the Galerkin method. The comparison is summarized in Table 2. The number of iterations, the cost per iteration, and the total cost are presented there. We can see that the cost of the stabilized solution is of the same order as the cost of the Galerkin solution. We compare here the costs of the correct solution obtained from the stable Petrov-Galerkin method with the cost of the incorrect solution from the unstable Galerkin method.

The numerical results are compared with the exact solutions in Figure 12 for inflow data $g=\sin(\pi y)$, and in Figure 13 for inflow data $g=\sin(2\pi y)$.

Figure 12: Case of inflow data $g=\sin(\pi y)$. Comparison of the exact solution (first column) and the solutions obtained from the GMRES solver using the $\mathbb{H}$-matrix (second column) for $\epsilon=10^{-6}$ (first row) and $\epsilon=0.1$ (second row). The iterative solver was executed with accuracy $10^{-10}$.
Figure 13: Case of inflow data $g=\sin(2\pi y)$. Solutions obtained from the Petrov-Galerkin formulation with $\epsilon=10^{-6}$ and $\epsilon=0.1$.

3.2 Helmholtz problem

Given $\Omega=(0,1)^{2}\subset\mathbb{R}^{2}$ and $\kappa\in[1,10]$, we seek the solution of the Helmholtz problem

\[
\left\{\begin{array}{rl}
\Delta u+\kappa^{2}u=f & \text{in }\Omega,\\
u=g & \text{over }\partial\Omega\,,
\end{array}\right. \qquad (19)
\]

with right-hand sides $f$ and $g$ such that the exact solution is $u(x,y)=\sin(\kappa\pi x)\sin(\kappa\pi y)$.

We employ a $20\times 20$ finite element mesh. The trial space is constructed from quadratic B-splines. The test space is constructed from quadratic B-splines with $C^{0}$ separators (equivalent to Lagrange polynomials). The dependence of the coefficients of the optimal test functions on $\kappa$ for the Helmholtz problem has the affine structure described in Section 2.2; thus, we can construct offline the function

\[
\mathcal{P}\ni\kappa\rightarrow\mathbb{W}(\kappa), \qquad (20)
\]

where $\mathbb{W}(\kappa)$ is the matrix of the coefficients of the optimal test functions. We fix the trial and test spaces used for the approximation of the solution and the stabilization of the Petrov-Galerkin formulation. Next, we consider blocks of different sizes of the matrix $\mathbb{W}(\kappa)$, and we train the SVDs of these blocks as functions of $\kappa$, i.e.,

\[
\mathcal{P}\ni\kappa\rightarrow\operatorname{DNN}(\kappa)=\{\mathbb{U}_{i}(\kappa),\mathbb{D}_{i}(\kappa),\mathbb{V}_{i}(\kappa)\}_{i=1,\dots,N_{B}}\,. \qquad (21)
\]
Figure 14: Helmholtz problem. Training of the neural networks for the matrices $\mathbb{U}_{ij}(\kappa)$, $\mathbb{D}_{ij}(\kappa)$, and $\mathbb{V}_{ij}(\kappa)$ for one block.

The convergence of the training procedure is presented in Figure 14. Knowing $\mathbb{D}_{i}(\kappa)$ a priori for a given $\kappa$ allows us to construct the structure of the hierarchical matrix, and we obtain $\mathbb{U}_{i}(\kappa)$ and $\mathbb{V}_{i}(\kappa)$ from the neural networks.

Figure 15 depicts exemplary resulting hierarchical matrices.

Table 3 summarizes the computational costs of our solver for two values of $\kappa\in\{1,10\}$. The computational mesh was a tensor product of quadratic B-splines with 10 elements along the $x$ and $y$ axes. The DNN allows us to obtain the compression of the matrix of optimal test function coefficients for free. We employ the GMRES solver that computes the residual. The costs of the multiplications $\mathbb{H}_{\kappa}x$ and $\mathbb{A}\,\mathbb{H}_{\kappa}x$ are included in the fourth and fifth columns of Table 3. The total cost of the GMRES solver with hierarchical matrices augmented by DNN compression equals the number of iterations times the multiplication costs of $\mathbb{H}_{\kappa}x$ and $\mathbb{A}\,\mathbb{H}_{\kappa}x$. The total cost is presented in the last column of Table 3.

We present in Table 4 the cost of the GMRES algorithm executed on the Galerkin method. We present the number of iterations, the cost per iteration, and the total cost. We can see that the cost of obtaining the stabilized solution is of the same order as the cost of the Galerkin method. We compare here the costs of the correct solution obtained from the Petrov-Galerkin method with the cost of the incorrect solution from the unstable Galerkin method.

The comparison of the solutions obtained with the Petrov-Galerkin formulation (with the optimal test functions generated by the DNN) and the exact solutions is presented in Figure 16.

Figure 15: The hierarchical matrices for $\kappa=1$ (left panel) and $\kappa=10$ (right panel).
Figure 16: The solutions obtained for the Helmholtz problem for $\kappa=1,2,4,8$. The solutions obtained from the Petrov-Galerkin formulation with the optimal test functions provided by the neural network (first row); the exact solutions (second row).
$\kappa$ | Compress $\mathbb{W}_{\kappa}$ flops, with DNN | Compress $\mathbb{W}_{\kappa}$ flops, without DNN | $\mathbb{W}_{\kappa}x$ flops | $\mathbb{A}\,\mathbb{W}_{\kappa}x$ flops | # iter GMRES | Total flops, $\mathbb{H}$-matrix
1 | 0 | 129,788 | 47,649 | 8,901 | 10 | 448,284
10 | 0 | 129,788 | 47,649 | 8,877 | 31 | 1,752,306
Table 3: Computational costs of the stabilized Helmholtz solver using neural networks, hierarchical matrices, and the GMRES solver.
$\kappa$ | # iter GMRES (Galerkin) | Flops per iteration | Total flops
1 | 10 | 14,444 | 144,440
10 | 31 | 14,444 | 447,764
Table 4: Computational costs of the Galerkin method for the Helmholtz problem using the GMRES solver.

4 Conclusions

We have employed the Petrov-Galerkin formulation with optimal test functions for the stabilization of difficult problems, focusing on advection-dominated diffusion and Helmholtz problems. During the offline phase, we explicitly compute the matrix of coefficients of optimal test functions for any PDE-parameter. We have also trained neural networks to compute, for each PDE-parameter, the bottleneck of the hierarchical matrix compression. During the online phase, we rapidly compute the matrix compression using the neural networks, and we run the GMRES iterative solver on the reduced Petrov-Galerkin linear system, where matrix-vector multiplications are performed at a quasi-linear computational cost due to the hierarchical structure of the low-rank decomposition used. Thus, we obtain the online stabilization practically for free.

Acknowledgments

This work was supported by the European Union's Horizon 2020 Research and Innovation Program under the Marie Skłodowska-Curie grant agreement No. 777778, MATHROCKS. The research project was partly supported by the program "Excellence initiative – research university" for the University of Science and Technology.

Appendix A Algorithms

A.1 Recursive hierarchical compression of the matrix

Require: $t_{\min},t_{\max},s_{\min},s_{\max}\in\mathbb{N}$ (row and column index ranges), $r$ rank of the blocks, $\delta$ compression threshold,
$1\leq t_{\min}\leq t_{\max}\leq n$, $1\leq s_{\min}\leq s_{\max}\leq m$, where $n\times m$ is the size of the matrix to be compressed
  if Admissible(t_min, t_max, s_min, s_max, r, δ) then
     v = CompressMatrix(t_min, t_max, s_min, s_max, r)
  else
     // Create a new node with four sons corresponding to the four quarters of the matrix,
     // splitting at the midpoints t_newmax = ⌊(t_min + t_max)/2⌋ and s_newmax = ⌊(s_min + s_max)/2⌋
     create new node v
     AppendChild(v, CreateTree(t_min, t_newmax, s_min, s_newmax))
     AppendChild(v, CreateTree(t_min, t_newmax, s_newmax + 1, s_max))
     AppendChild(v, CreateTree(t_newmax + 1, t_max, s_min, s_newmax))
     AppendChild(v, CreateTree(t_newmax + 1, t_max, s_newmax + 1, s_max))
  end if
  RETURN v
Algorithm 1: Recursive hierarchical compression of the matrix: CreateTree(t_min, t_max, s_min, s_max, r, δ), where r is the rank used for the compression and δ is the threshold for the singular values.

A.2 Checking of the admissibility condition

Require: $t_{\min},t_{\max},s_{\min},s_{\max}$ (index ranges of the block), $\delta$ compression threshold, $r$ maximum rank
  if block (t_min, t_max, s_min, s_max) consists of zeros then
     return true
  end if
  [U, D, V] ← truncatedSVD(t_min, t_max, s_min, s_max, r + 1); σ ← diag(D)
  if σ(r + 1) < δ then
     return true
  end if
  return false
Algorithm 2: Checking of the admissibility condition: result = Admissible(t_min, t_max, s_min, s_max, r, δ)

A.3 Matrix vector multiplication

Require: node $v$ representing a compressed matrix $\mathbb{H}(v)\in{\cal M}^{m\times n}$, $X\in{\cal M}^{n\times c}$ vectors to multiply
  if v.nr_sons == 0 then
     if v.rank == 0 then
        return zeros(m, c)      // zero leaf: contributes a zero block with m rows
     end if
     return v.U * (v.V * X)
  end if
  rows = size(X).rows
  X_1 = X(1 : rows/2, *)
  X_2 = X(rows/2 + 1 : rows, *)
  // Recursive calls on the four sons (ordered top-left, top-right, bottom-left, bottom-right);
  // when the sons are SVD leaves this reduces to the block formula of Figure 5(b).
  return [ matrix_vector_mult(v.son(1), X_1) + matrix_vector_mult(v.son(2), X_2)
           matrix_vector_mult(v.son(3), X_1) + matrix_vector_mult(v.son(4), X_2) ]
Algorithm 3: Matrix-vector multiplication: Y = matrix_vector_mult(v, X)

A.4 rSVD compression of a block

Require: $t_{\min},t_{\max},s_{\min},s_{\max}$ (index ranges of the block), $\delta$ compression threshold, $r$ maximum rank
  if block (t_min, t_max, s_min, s_max) consists of zeros then
     create new node v; v.rank ← 0; v.size ← size(t_min, t_max, s_min, s_max); return v
  end if
  [U, D, V] ← truncatedSVD(t_min, t_max, s_min, s_max, r); σ ← diag(D)
  rank ← rank(D)
  create new node v; v.rank ← rank
  v.singularvalues ← σ(1 : rank)
  v.U ← U(*, 1 : rank)
  v.V ← D(1 : rank, 1 : rank) * V(1 : rank, *)
  v.sons ← ∅; v.size ← size(t_min, t_max, s_min, s_max)
  return v
Algorithm 4: rSVD compression of a block: node = CompressMatrix(t_min, t_max, s_min, s_max, r)

A.5 Pseudo-code of the GMRES algorithm

Require: matrix $A$, right-hand-side vector $b$, starting point $x_0$
  1: Compute $r_0 = b - A x_0$ and $\beta = \|r_0\|_2$
  2: Compute $v_1 = r_0/\beta$
  3: for $j = 1,2,\dots,k$ do
  4:   Compute $h_{i,j} = (A v_j, v_i)$ for $i = 1,2,\dots,j$
  5:   Compute $\hat{v}_{j+1} = A v_j - \sum_{i=1,\dots,j} h_{i,j} v_i$
  6:   Compute $h_{j+1,j} = \|\hat{v}_{j+1}\|_2$
  7:   Compute $v_{j+1} = \hat{v}_{j+1}/h_{j+1,j}$
  8: end for
  9: Form the solution $x_k = x_0 + V_k y_k$, where $V_k = [v_1 \cdots v_k]$ and $y_k$ minimizes $J(y)=\|\beta e_1-\hat{H}_k y\|$ with
\[
\hat{H}_k=\begin{bmatrix}h_{1,1}&h_{1,2}&\cdots&h_{1,k}\\ h_{2,1}&h_{2,2}&\cdots&h_{2,k}\\ 0&\ddots&\ddots&\vdots\\ \vdots&\ddots&h_{k,k-1}&h_{k,k}\\ 0&\cdots&0&h_{k+1,k}\end{bmatrix}
\]
Algorithm 5: Pseudo-code of the GMRES algorithm

A.6 Recursive hierarchical compression of the matrix augmented by neural network

Require: $t_{\min},t_{\max},s_{\min},s_{\max}\in\mathbb{N}$ (row and column index ranges), $r$ rank of the blocks, $\delta$ accuracy of compression, $\mu$ PDE parameter,
$1\leq t_{\min}\leq t_{\max}\leq n$, $1\leq s_{\min}\leq s_{\max}\leq m$, where $n\times m$ is the size of the matrix to be compressed
  i = block index for (t_min, t_max, s_min, s_max)
  if D_i(μ)[r + 1] < δ (asking the neural network for the singular values of the block) then
     if block (t_min, t_max, s_min, s_max) consists of zeros then
        create new node v; v.rank ← 0; v.size ← size(t_min, t_max, s_min, s_max); return v
     end if
     [U, D, V] ← truncatedSVD(t_min, t_max, s_min, s_max, r); σ ← diag(D)
     rank ← rank(D)
     create new node v; v.rank ← rank
     v.singularvalues ← σ(1 : rank)
     v.U ← U(*, 1 : rank)
     v.V ← D(1 : rank, 1 : rank) * V(1 : rank, *)
     v.sons ← ∅; v.size ← size(t_min, t_max, s_min, s_max)
     return v
  else
     // Create a new node with four sons corresponding to the four quarters of the matrix,
     // splitting at the midpoints as in Algorithm 1
     create new node v
     AppendChild(v, CreateTreeNN(t_min, t_newmax, s_min, s_newmax, r, δ, μ))
     AppendChild(v, CreateTreeNN(t_min, t_newmax, s_newmax + 1, s_max, r, δ, μ))
     AppendChild(v, CreateTreeNN(t_newmax + 1, t_max, s_min, s_newmax, r, δ, μ))
     AppendChild(v, CreateTreeNN(t_newmax + 1, t_max, s_newmax + 1, s_max, r, δ, μ))
  end if
  RETURN v
Algorithm 6: Recursive hierarchical compression of the matrix augmented by the neural network: CreateTreeNN(1, rowsof(A), 1, columnsof(A), r, δ, μ), where r is the rank used for the compression, δ is the threshold for the r-th singular value, and μ is the PDE parameter.

References

  • [1] A. F. Agarap, Deep learning using rectified linear units (ReLU), arXiv preprint arXiv:1803.08375 (2018).
  • [2] P. R. Amestoy and I. S. Duff, Multifrontal parallel distributed symmetric and unsymmetric solvers, Computer Methods in Applied Mechanics and Engineering, 184 (2000) 501-520.
  • [3] P. R. Amestoy, I. S. Duff, J. Koster and J-Y L’Excellent, A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM Journal on Matrix Analysis and Applications, 1(23) (2001) 15-41.
  • [4] P. R. Amestoy, A. Guermouche, J-Y L’Excellent and S. Pralet, Hybrid scheduling for the parallel solution of linear systems, Computer Methods in Applied Mechanics and Engineering 2(32) (2001), 136–156.
  • [5] V. M. Calo, Residual-based multiscale turbulence modeling: Finite volume simulations of bypass transition, Stanford University, Ph.D. Thesis (2005)
  • [6] V. M. Calo, M. Łoś, Q. Deng, I. Muga, M. Paszyński, Isogeometric residual minimization method (iGRM) with direction splitting preconditioner for stationary advection-dominated diffusion problems, Computer Methods in Applied Mechanics and Engineering 373 (2021) 113214.
  • [7] J. Chan, J. A. Evans, A minimal-residual finite element method for the convection-diffusion equations, ICES-REPORT 13-12 (2013)
  • [8] J. Chan, J. A. Evans, W. Qiu, A dual Petrov–Galerkin finite element method for the convection–diffusion equation, Computers & Mathematics with Applications 68(11) (2014) 1513–1529
  • [9] A. Ern, J.-L. Guermond. Theory and practice of finite elements. Vol. 159. Springer, 2013
  • [10] O.G. Ernst, M. J. Gander, Why it is Difficult to Solve Helmholtz Problems with Classical Iterative Methods. In: Graham, I., Hou, T., Lakkis, O., Scheichl, R. (eds.) Numerical Analysis of Multiscale Problems. Lecture Notes in Computational Science and Engineering, vol 83. Springer, Berlin, Heidelberg (2012)
  • [11] W. Hackbusch, Hierarchical Matrices: Algorithms and Analysis, Springer (2009)
  • [12] R. A. Horn, C. R. Johnson, Matrix Analysis, Cambridge University Press (1990)
  • [13] T.J.R. Hughes, L.P. Franca, M. Mallet, A new finite element formulation for fluid dynamics: VI. Convergence analysis of the generalized SUPG formulation for linear time– dependent multidimensional advective–diffusive systems, Computer Methods in Applied Mechanics and Engineering, 6 (1987) 97–112.
  • [14] D. P. Kingma, J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  • [15] M. Łoś, I. Muga, J. Muñoz-Matute, M. Paszyński, Isogeometric Residual Minimization Method (iGRM) with direction splitting for non-stationary advection–diffusion problems, Computers & Mathematics with Applications, 79(2) (2020) 213-229.
  • [16] M. Łoś, I. Muga, J. Muñoz-Matute, M. Paszyński, Isogeometric residual minimization (iGRM) for non-stationary Stokes and Navier–Stokes problems, Computers & Mathematics with Applications, 95(1) (2021) 200-214.
  • [17] I. Muga, K. G. van der Zee, Discretization of Linear Problems in Banach Spaces: Residual Minimization, Nonlinear Petrov–Galerkin, and Monotone Mixed Methods, SIAM Journal on Numerical Analysis 58(6) (2020) 3406–3426.
  • [18] J. N. Reddy: An introduction to the finite element method, Mcgraw–Hill (2006)
  • [19] Y. Saad, Iterative Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics; 2nd edition (2003)
  • [20] R. Stevenson, J. Westerdiep, Minimal residual space-time discretizations of parabolic equations: Asymmetric spatial operators, Computers & Mathematics with Applications 101 (2021) 107-118