
A hybrid iterative method based on MIONet for PDEs: Theory and numerical examples

The research is supported by NSFC project 12288101.

Jun Hu (School of Mathematical Sciences, Peking University, Beijing 100871, China, [email protected]) and Pengzhan Jin (School of Mathematical Sciences, Peking University, Beijing 100871, China, [email protected])
Abstract

We propose a hybrid iterative method based on MIONet for PDEs, which combines the traditional numerical iterative solver and the recent powerful machine learning method of neural operator, and we further systematically analyze its theoretical properties, including the convergence condition, the spectral behavior, as well as the convergence rate, in terms of the errors of the discretization and the model inference. We show the theoretical results for the frequently-used smoothers, i.e. Richardson (damped Jacobi) and Gauss-Seidel. We give an upper bound of the convergence rate of the hybrid method w.r.t. the model correction period, which indicates a minimum point at which the hybrid iteration converges fastest. Several numerical examples, including the hybrid Richardson (Gauss-Seidel) iteration for the 1-d (2-d) Poisson equation, are presented to verify our theoretical results and also demonstrate an excellent acceleration effect. As a meshless acceleration method, it holds enormous potential for practical applications.

1 Introduction

Partial differential equations (PDEs) play a crucial role in describing physical reality and modeling engineering problems. As the vast majority of PDEs have no computable analytical solutions, seeking numerical solutions is an important issue. Many classical numerical methods have been developed for solving PDEs, such as the finite element method (FEM) [3], the finite difference method (FDM) [32], the finite volume method [6], and the spectral method [29]. With these methods, linear PDEs are usually discretized as linear algebraic systems, which are subsequently solved by direct methods (e.g. Gaussian elimination) or iterative methods such as Richardson, (damped) Jacobi, Gauss-Seidel, SOR, and SSOR [33]. When the scale of the problem is very large, iterative methods show significant advantages such as low memory requirements. However, a large-scale system results in an unacceptably long solving time. To accelerate the iteration, multilevel iterative methods have been developed [8, 34, 38]; a multigrid solver greatly reduces the number of iteration steps and leads to fast convergence. The shortcoming of the multigrid method is that it is mesh-dependent and requires users to generate several levels of meshes, which may be a very heavy task. To overcome this drawback, the algebraic multigrid (AMG) methods [2, 22, 28, 39] have been proposed, which deal with the algebraic system directly without geometric meshes. To further promote the development of fast and low-cost iterative methods, in this work we propose a hybrid iterative method by introducing neural network-based tools.

The rise of neural networks (NNs) has brought a huge impact on the search for numerical solutions of PDEs. It has been shown that NNs have powerful expressivity [4, 30], thus NNs are considered as universal approximators for the solutions of PDEs. Mainstream neural network-based methods for solving PDEs optimize the parameters of such NNs to minimize loss functions constructed from the strong or variational forms of the PDEs, e.g. the PINNs [14, 21, 26], the deep Ritz method [5], and the deep Galerkin method [31]. These methods show the possibility of solving high-dimensional problems, which is indeed a challenge for traditional methods due to the curse of dimensionality. As meshless methods, they are easy to implement and friendly for coders. The drawbacks of such methods are also obvious. On the one hand, it is usually hard for these methods to obtain solutions as accurate as those of traditional methods like FEM; on the other hand, one needs to restart the training process once some parameters of the given PDE problem (e.g. the right-hand side of the Poisson equation) are changed, which makes dealing with parametric PDE problems costly. In recent years, a new research field called neural operators has emerged to achieve fast solving of PDEs based on neural networks. The pioneering works are the DeepONet [19, 20] and the FNO [17, 18], which directly learn the solution operators of PDEs, for example, the mapping from the right-hand side of the Poisson equation to its solution. These methods present a new landscape for understanding physical reality: one may study the causality in a data-driven manner besides the PDE model. Recently, numerous neural operators have been proposed to achieve higher performance [7, 9, 12, 13, 25, 27, 37], and this field is developing rapidly. There are also theoretical works on the above neural operators, including error analysis [15, 16, 23].

The most relevant neural operator for this work is the multiple-input operator network (MIONet) [13]. MIONet is the first to address operator learning problems with multiple inputs, and it generalizes the architecture and the theory of DeepONet via the tool of tensor decomposition. Such an architecture also induces the tensor-based models for solving eigenvalue problems [10, 36]. An important characteristic of the DeepONet/MIONet series of architectures is that they encode the output functions as neural networks, rather than discrete vectors. This feature leads to efficient interpolation of the predicted solutions, and also allows computing derivatives of the output functions during training, which prompts the work of physics-informed DeepONets [35] that learn the solution operator without any input-output pairs of data, by training a PINN-style loss. Another interesting work based on DeepONet is the HINTS [41], which in fact inspires this work. Almost all works on neural operators solve PDE problems in a pure machine learning style and break away from traditional PDE numerical methods. The neural operators are developed for fast solving but usually have limited prediction accuracy, while the traditional numerical methods can provide solutions as accurate as needed at the cost of long solving time. Surprisingly, HINTS combines the two methods, i.e. the traditional numerical solver and the neural operator. A spectral bias/frequency principle of the training behavior of NNs has been shown in [24, 40], i.e. NNs often fit functions from low to high frequency during training. Based on this feature, [11] employs deep learning solutions (e.g. PINNs) to initialize iterative methods for PDEs. HINTS further utilizes the spectral bias: a pre-trained DeepONet is periodically used to eliminate the low-frequency components of the error during the iteration process, thus achieving acceleration, since the smoothers (e.g. Gauss-Seidel) are inefficient at eliminating the low-frequency errors. This idea is similar to the multigrid method. A significant difference is that HINTS does not require users to generate meshes, and the cost of this method is transferred to the developers who are responsible for the pre-trained DeepONets.

Contributions. HINTS studies the hybrid method empirically, and the lack of theoretical guidance leaves this method imperfect. In this work, we propose a hybrid iterative method based on MIONet. We theoretically show that MIONet is a more reasonable framework to complete the hybrid iterative method. An ideal operator regression model for the hybrid method is expected to satisfy the following three key points:

  • 1.

    Supporting multiple inputs. Taking the Poisson equation

    \begin{cases}-\nabla\cdot(k\nabla u)=f&\quad\mbox{in}\;\Omega,\\ u=0&\quad\mbox{on}\;\partial\Omega,\end{cases} (1)

    as an example, its solution operator u=\mathcal{G}(k,f) has two inputs, hence the chosen operator regression model should efficiently deal with such a multiple-input case.

  • 2.

    Separating linearity and nonlinearity. Since the solution operator \mathcal{G}(k,f) of (1) is linear w.r.t. f but nonlinear w.r.t. k, the operator regression model \mathcal{M}(k,f) is also expected to be linear w.r.t. f and nonlinear w.r.t. k. The linearity in f will guarantee the convergence of the hybrid iterative method, which will be discussed later.

  • 3.

    Supporting efficient interpolation. After obtaining the predicted solution by the operator regression model \mathcal{M}, we have to compute the values of \mathcal{M}(k,f)(\cdot) at grid points which are independent of the model \mathcal{M}, so that the model \mathcal{M} should support efficient interpolation.

Fortunately, we find that MIONet satisfies all three points perfectly, while other models do not (note that MIONet is a high-level framework, and one may design a targeted architecture for each branch net on a specific problem, rather than directly using FNNs). With the proposed MIONet-based hybrid iterative method, we systematically analyze its theoretical properties, including the convergence condition, the spectral behavior, as well as the convergence rate, in terms of the errors of the discretization and the model inference. We show the theoretical results for the frequently-used smoothers, i.e. Richardson (damped Jacobi) and Gauss-Seidel. We give an upper bound of the convergence rate of the hybrid method w.r.t. the model correction period, which indicates a minimum point that makes the hybrid iteration converge fastest. Several numerical examples including the Richardson (Gauss-Seidel) iteration for the 1-d (2-d) Poisson equation are shown to verify our theoretical results.

This paper is organized as follows. We first briefly introduce the operator regression model MIONet as the necessary tool in Section 2. In Section 3, we present the algorithm of the proposed hybrid iterative method in detail. We then comprehensively analyze the theoretical properties of this method in Section 4. The numerical experiments are shown in Section 5. Finally, Section 6 summarizes this work.

2 Operator regression with MIONet

Let us briefly introduce the operator regression model MIONet, which is a necessary tool for the later discussion of the hybrid iterative method. All the content of this section can be found in the MIONet paper [13], and interested readers may refer to that paper for more details. Readers familiar with MIONet may skip this section.

Let X_{1},X_{2},...,X_{n} and Y be n+1 Banach spaces, and let K_{i}\subset X_{i} (i=1,...,n) be compact sets. Now we aim to learn a continuous operator

\mathcal{G}:K_{1}\times\cdots\times K_{n}\to Y,\quad(v_{1},...,v_{n})\mapsto u,

where v_{i}\in K_{i} and u=\mathcal{G}(v_{1},...,v_{n}). Such operators \mathcal{G} form the space C(K_{1}\times\cdots\times K_{n},Y) studied below. Note that Y could be any Banach space. We usually consider Y as a function space, such as Y=C(K_{0}),L^{2}(K_{0}),H^{1}(K_{0}) for a compact domain K_{0}.

To deal with the inputs from infinite-dimensional Banach spaces, it is necessary to firstly project them onto finite-dimensional spaces. Here we utilize the tools of Schauder basis and canonical projections.

Definition 1 (Schauder basis).

Let X be an infinite-dimensional normed linear space. A sequence \{e_{i}\}_{i=1}^{\infty} in X is called a Schauder basis of X, if for every x\in X there is a unique sequence of scalars \{a_{i}\}_{i=1}^{\infty}, called the coordinates of x, such that

x=\sum_{i=1}^{\infty}a_{i}e_{i}.

We show two useful examples of Schauder basis as follows.

Example 1.

Faber-Schauder basis of C[0,1]. Given distinct points \{t_{i}\}_{i=1}^{\infty} forming a dense subset of [0,1] with t_{1}=0, t_{2}=1, let e_{1}(t)=1, e_{2}(t)=t, and choose e_{k+1} such that \{e_{1},...,e_{k},e_{k+1}\} is a basis of the (k+1)-dimensional space consisting of all the piecewise linear functions with grid points \{t_{i}\}_{i=1}^{k+1}.

Example 2.

Fourier basis of L^{2}[0,1]. Any orthogonal basis of a separable Hilbert space is a Schauder basis.

We denote the coordinate functional of e_{i} by e_{i}^{*}, and thus

x=\sum_{i=1}^{\infty}e_{i}^{*}(x)e_{i},\quad\forall x\in X.

Then for a positive integer n, the canonical projection P_{n} is defined as

P_{n}(x)=P_{n}\left(\sum_{i=1}^{\infty}e_{i}^{*}(x)e_{i}\right)=\sum_{i=1}^{n}e_{i}^{*}(x)e_{i}.

We have the following property for P_{n}, according to which we can represent points in an infinite-dimensional Banach space by finite coordinates within a sufficiently small projection error.

Property 1 (Canonical projection).

Assume that K is a compact set in a Banach space X equipped with a Schauder basis and corresponding canonical projections P_{n}, then we have

\lim_{n\to\infty}\sup_{x\in K}\left\lVert x-P_{n}(x)\right\rVert=0.

For convenience, we decompose P_{n} as

P_{n}=\psi_{n}\circ\varphi_{n},

where \varphi_{n}:X\to\mathbb{R}^{n} and \psi_{n}:\mathbb{R}^{n}\to X are defined as

\varphi_{n}(x)=\left(e_{1}^{*}(x),...,e_{n}^{*}(x)\right)^{T},\quad\psi_{n}(\alpha_{1},...,\alpha_{n})=\sum_{i=1}^{n}\alpha_{i}e_{i}.

The \varphi_{n}(x) are essentially the truncated coordinates of x. Moreover, sometimes we can further replace \{e_{1},...,e_{n}\} with an equivalent basis for the decomposition of P_{n}, i.e.,

\hat{\varphi}_{n}(x)=Q(e_{1}^{*}(x),...,e_{n}^{*}(x))^{T},\quad\hat{\psi}_{n}(\alpha_{1},...,\alpha_{n})=(e_{1},...,e_{n})Q^{-1}(\alpha_{1},...,\alpha_{n})^{T},

with a nonsingular matrix Q\in\mathbb{R}^{n\times n}. For example, when applying the Faber-Schauder basis (Example 1), instead of using the coordinates based on the sequence \{e_{i}\}_{i=1}^{\infty}, we adopt the function values evaluated at certain grid points as the coordinates, which is the same as the linear element basis in the finite element method.
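As a small illustration of this choice of coordinates, the following NumPy sketch (our own example, assuming a uniform grid on [0,1]) realizes the pair \varphi_{n}, \psi_{n}: \varphi_{n} samples a function at the grid points, and \psi_{n} returns the piecewise linear interpolant built on these values.

import numpy as np

def phi_n(v, grid):
    # Truncated coordinates of v: its values at the grid points.
    return np.array([v(t) for t in grid])

def psi_n(coords, grid):
    # Reconstruction: the continuous piecewise linear interpolant on the grid.
    return lambda t: np.interp(t, grid, coords)

# Example: P_n(v) = psi_n(phi_n(v)) for v(t) = sin(pi t) on an 11-point uniform grid.
grid = np.linspace(0.0, 1.0, 11)
v = lambda t: np.sin(np.pi * t)
v_n = psi_n(phi_n(v, grid), grid)
t = np.linspace(0.0, 1.0, 1001)
print(np.max(np.abs(v_n(t) - v(t))))   # projection error sup_t |v - P_n(v)|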

By considering the canonical projection property and the injective tensor product decomposition

C(K_{1}\times K_{2}\times\cdots\times K_{n},Y)\cong C(K_{1})\hat{\otimes}_{\varepsilon}C(K_{2})\hat{\otimes}_{\varepsilon}\cdots\hat{\otimes}_{\varepsilon}C(K_{n})\hat{\otimes}_{\varepsilon}Y, (2)

we can easily obtain the following approximation result.

Theorem 1.

Suppose that X_{1},...,X_{n},Y are Banach spaces, K_{i}\subset X_{i} are compact sets, and each X_{i} has a Schauder basis with canonical projections P_{q}^{i}=\psi_{q}^{i}\circ\varphi_{q}^{i} (truncating the first q coordinates). Assume that \mathcal{G}:K_{1}\times\cdots\times K_{n}\to Y is a continuous operator, then for any \epsilon>0, there exist positive integers p,q_{i}, continuous functions g_{j}^{i}\in C(\mathbb{R}^{q_{i}}), and u_{j}\in Y, such that

\sup_{v_{i}\in K_{i}}\left\lVert\mathcal{G}(v_{1},...,v_{n})-\sum_{j=1}^{p}g_{j}^{1}(\varphi_{q_{1}}^{1}(v_{1}))\cdots g_{j}^{n}(\varphi_{q_{n}}^{n}(v_{n}))\cdot u_{j}\right\rVert_{Y}<\epsilon. (3)

In addition, if \mathcal{G} is linear with respect to v_{i}, then linear g_{j}^{i} are sufficient.

The MIONet is immediately constructed based on this theorem, where we just replace the g_{j}^{i} with neural networks. Assume now that Y=C(K_{0}) (L^{2}(K_{0}), H^{1}(K_{0}), etc. are also available) for a compact domain K_{0}.

MIONet. We construct the MIONet according to Eq. (3). Specifically, \mathbf{g}_{i}=(g_{1}^{i},...,g_{p}^{i})^{T}\in C(\mathbb{R}^{q_{i}},\mathbb{R}^{p}) and \mathbf{f}=(u_{1},...,u_{p})^{T}\in C(K_{0},\mathbb{R}^{p}) are approximated by neural networks \tilde{\mathbf{g}}_{i} (called branch net i) and \tilde{\mathbf{f}} (called trunk net). Then the network (Figure 1) is

\tilde{\mathcal{G}}(v_{1},...,v_{n})(y)=\mathcal{S}\left(\underbrace{\tilde{\mathbf{g}}_{1}(\varphi_{q_{1}}^{1}(v_{1}))}_{\text{branch}_{1}}\odot\cdots\odot\underbrace{\tilde{\mathbf{g}}_{n}(\varphi_{q_{n}}^{n}(v_{n}))}_{\text{branch}_{n}}\odot\underbrace{\tilde{\mathbf{f}}(y)}_{\text{trunk}}\right)+b, (4)

where \varphi_{q_{i}}^{i} truncates the first q_{i} coordinates w.r.t. the corresponding Schauder basis as defined in Eq. (3), \odot is the Hadamard product (element-wise product), e.g.

(x_{1},...,x_{p})\odot(y_{1},...,y_{p})=(x_{1}y_{1},...,x_{p}y_{p}), (5)

and \mathcal{S} is the summation of all the components of a vector, and b\in\mathbb{R} is a trainable bias.
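For concreteness, a minimal PyTorch sketch of Eq. (4) with n=2 input functions is given below; the widths, depth, activation, and the choice of framework are illustrative assumptions of ours and not necessarily the settings used in the experiments.

import torch
import torch.nn as nn

class MIONet(nn.Module):
    # Two branch nets and one trunk net merged by the Hadamard product, cf. Eq. (4).
    def __init__(self, q1, q2, d, p=64, width=128):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Linear(q1, width), nn.Tanh(), nn.Linear(width, p))
        # If G is linear w.r.t. the second input (cf. Theorem 2), branch2 may be taken as
        # a single linear layer without activation and bias.
        self.branch2 = nn.Sequential(nn.Linear(q2, width), nn.Tanh(), nn.Linear(width, p))
        self.trunk = nn.Sequential(nn.Linear(d, width), nn.Tanh(), nn.Linear(width, p))
        self.b = nn.Parameter(torch.zeros(1))   # trainable bias

    def forward(self, v1, v2, y):
        # v1: (batch, q1), v2: (batch, q2) truncated coordinates; y: (batch, d) query points.
        out = self.branch1(v1) * self.branch2(v2) * self.trunk(y)   # Hadamard product
        return out.sum(dim=-1, keepdim=True) + self.b               # summation S plus bias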

The universal approximation theorem for MIONet can be easily obtained via Theorem 1 and the universal approximation theorems for the applied neural networks (i.e. \tilde{\mathbf{g}}_{i} and \tilde{\mathbf{f}}) such as FNNs.

Theorem 2 (Universal approximation theorem for MIONet).

Under the setting of Theorem 1 with Y=C(K_{0}),L^{2}(K_{0}),H^{1}(K_{0}), etc., for any \epsilon>0, there exists a MIONet \tilde{\mathcal{G}}, such that

\left\lVert\mathcal{G}-\tilde{\mathcal{G}}\right\rVert_{C(K_{1}\times\cdots\times K_{n},Y)}<\epsilon.

In addition, the branch nets can be chosen as linear maps (i.e. one-layer FNNs without activation and bias) if \mathcal{G} is linear with respect to the corresponding inputs.

Up to now, we have already shown the architecture and properties of MIONet, which will be applied to the iterative methods later.

Figure 1: An illustration of the architecture of MIONet for general operators. All the branch nets and the trunk net have the same output dimension, whose outputs are merged together via the Hadamard product and then a summation.

3 A hybrid iterative method

We present a new MIONet-based hybrid iterative method for general linear equations:

\begin{cases}\mathcal{L}_{\alpha}u=f&\quad\mbox{in}\;\Omega,\\ u=g&\quad\mbox{on}\;\partial\Omega,\end{cases} (6)

where \mathcal{L}_{\alpha} is a linear differential operator depending on \alpha, and \alpha=\{\alpha_{1},\alpha_{2},...\} is a finite set consisting of several parameters which can be scalars, vectors, functions, etc. Other boundary conditions are also available.

Without loss of generality we take the well-known Poisson equation as an example. Consider

\begin{cases}-\nabla\cdot(k\nabla u)=f&\quad\mbox{in}\;\Omega,\\ u=0&\quad\mbox{on}\;\partial\Omega,\end{cases} (7)

where \Omega\subset\mathbb{R}^{d} is a bounded open set. Here f\in L^{2}(\Omega), k\in L^{\infty}(\Omega), and k\geq C a.e. for a constant C>0, so that it is a classical second-order elliptic equation. The variational formulation is

\begin{split}&{\rm Finding\ }u\in H_{0}^{1}(\Omega){\rm\ such\ that}\\ &\int_{\Omega}k\nabla u\cdot\nabla vdx=\int_{\Omega}fvdx,\quad\forall v\in H_{0}^{1}(\Omega).\\ \end{split} (8)

Now assume that \Omega is a bounded polyhedral domain and \mathcal{T}_{h} is a quasi-uniform and shape-regular triangulation of \Omega. We apply the simple linear element space

V_{h}=\{v\in C(\overline{\Omega}):v|_{K}\in P_{1}(K),\forall K\in\mathcal{T}_{h}\}\subset V:=H_{0}^{1}(\Omega), (9)

where P_{1}(K) denotes the function space consisting of all the linear polynomials over K. The finite element method is

\begin{split}&{\rm Finding\ }u_{h}\in V_{h}{\rm\ such\ that}\\ &\int_{\Omega}k\nabla u_{h}\cdot\nabla v_{h}dx=\int_{\Omega}fv_{h}dx,\quad\forall v_{h}\in V_{h}.\\ \end{split} (10)

Let \{\phi_{i}\}_{i=1}^{n_{h}} be the nodal basis functions corresponding to all the interior nodes of \mathcal{T}_{h}. Then we obtain a discrete linear system for Eq. (10) as

A_{h}\mu_{h}=b_{h}, (11)

where A_{h}=(\int_{\Omega}k\nabla\phi_{i}\cdot\nabla\phi_{j}dx)_{n_{h}\times n_{h}}, b_{h}=(\int_{\Omega}f\phi_{i}dx)_{n_{h}\times 1}. Denote \mu_{h}=(\alpha_{1},...,\alpha_{n_{h}})^{T}, then u_{h}:=\sum_{i=1}^{n_{h}}\alpha_{i}\phi_{i} is the numerical solution obtained from the above system. What we do next is to solve Eq. (11) via a suitable iterative method.

For convenience, we temporarily omit the "h" in the above notations, but we keep in mind that they indeed depend on h. For the discrete linear system

A\mu=b, (12)

we consider the iterative form

\mu^{(m+1)}=\mu^{(m)}+B(b-A\mu^{(m)}),\quad\mu^{(0)}=0, (13)

where B is an approximation of A^{-1}. The frequently-used choices of B are

B=(\omega^{-1}I)^{-1},  Richardson (14)
B=(\omega^{-1}D)^{-1},  Jacobi (damped) (15)
B=(\omega^{-1}D+L)^{-1},  Gauss-Seidel (SOR) (16)

where A=D+L+U, D is the diagonal of A, and L and U are the strictly lower and upper triangular parts of A, respectively. Here \omega is the relaxation parameter to be chosen. By performing a suitable iterative method, we will get a numerical solution \mu for the system (12).
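For reference, a minimal NumPy sketch of the stationary iteration (13) with the three choices of B above; the relaxation parameter, tolerance, and the explicit formation of B are illustrative simplifications (in practice B is applied by element-wise division or forward substitution rather than by matrix inversion).

import numpy as np

def stationary_solve(A, b, smoother="gauss-seidel", omega=1.0, tol=1e-8, max_iter=10000):
    # Iterate mu <- mu + B (b - A mu) with B from Eqs. (14)-(16).
    n = A.shape[0]
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)
    if smoother == "richardson":
        B = omega * np.eye(n)                 # B = (omega^{-1} I)^{-1}
    elif smoother == "jacobi":
        B = np.linalg.inv(D / omega)          # B = (omega^{-1} D)^{-1}
    else:
        B = np.linalg.inv(D / omega + L)      # B = (omega^{-1} D + L)^{-1}, Gauss-Seidel/SOR
    mu = np.zeros(n)
    for _ in range(max_iter):
        r = b - A @ mu
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        mu = mu + B @ r
    return mu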

Briefly, our hybrid iterative method simply inserts some inference steps via MIONet into the iterative process. Now we return to the operator regression model MIONet. For the problem (7), what we are concerned with is in fact the solution operator

\mathcal{G}:(k,f)\mapsto u. (17)

Now assume that we have a dataset \mathcal{T}=\{(k_{i},f_{i}),u_{i}\}_{i=1}^{N} satisfying \mathcal{G}(k_{i},f_{i})=u_{i}. We pre-train a MIONet, denoted by \mathcal{M}, on this dataset with the loss function

L(\theta)=\frac{1}{N}\sum_{i=1}^{N}\|\mathcal{M}(k_{i},f_{i};\theta)-u_{i}\|_{Y}^{2}, (18)

where Y could be C(\overline{\Omega}), L^{2}(\Omega), H^{1}(\Omega), etc. Note that here we set \mathcal{M} linear with respect to f due to Theorem 1. An illustration is presented in Figure 2.

Figure 2: An illustration of the architecture of MIONet for the 2-d Poisson equation.

Denote r^{(m)}:=b-A\mu^{(m)}, the residual vector at the m-th step. We expect to find a residual function related to the residual vector r^{(m)}. Define the residual function \bar{r}^{(m)}:=\sum_{i=1}^{n}\beta_{i}\phi_{i}, where \beta=(\beta_{1},...,\beta_{n})^{T} satisfies

H\beta=r^{(m)},\quad H:=\left(\int_{\Omega}\phi_{i}\phi_{j}dx\right)_{n\times n}. (19)

In particular, we let \bar{r}^{(0)}=f. Note that \bar{r}^{(m)} is linear with respect to r^{(m)}; we denote their relationship by a linear operator \mathcal{H} as

\bar{r}^{(m)}=\mathcal{H}r^{(m)},\quad\mathcal{H}:=(\phi_{1},...,\phi_{n})H^{-1},\quad m\geq 1. (20)

We further define the interpolation operator as

\mathbb{I}(u):=(u(x_{1}),...,u(x_{n}))^{T}, (21)

where x_{i}\in\Omega is the nodal coordinate corresponding to \phi_{i}, i.e. \phi_{i}(x_{j})=\delta_{i}^{j} (Kronecker delta).

Then the hybrid iterative method can be written as

\mu^{(m+1)}=\mu^{(m)}+\mathbb{I}(\mathcal{M}(k,\mathcal{H}(b-A\mu^{(m)}))),\quad m\equiv 0{\rm\ mod\ }M, (22)
\mu^{(m+1)}=\mu^{(m)}+B(b-A\mu^{(m)}),\quad\text{otherwise}, (23)

where M is a positive integer to be chosen, which we regard as the correction period. The algorithm is summarized in Algorithm 1.

Algorithm 1 Hybrid iterative method for Poisson equation
Input: parameters k,f, mesh \mathcal{T}_{h}, iterative method B, pre-trained MIONet \mathcal{M}, and correction period M
Output: numerical solution \mu
  1. Obtain a discrete system A\mu=b based on \mathcal{T}_{h} for Eq. (7).
  2. Initialize \mu^{(1)}=\mathbb{I}(\mathcal{M}(k,f)), r^{(1)}=b-A\mu^{(1)}, m=1.
  3. while \mu^{(m)} does not satisfy the convergence condition:
       if m\equiv 0{\rm\ mod\ }M:
         \mu^{(m+1)}=\mu^{(m)}+\mathbb{I}(\mathcal{M}(k,\mathcal{H}r^{(m)}))
       else:
         \mu^{(m+1)}=\mu^{(m)}+Br^{(m)}   (in practice we update \mu^{(m)} in place without computing B)
       r^{(m+1)}=b-A\mu^{(m+1)}
       m=m+1
     end while
  4. return \mu=\mu^{(m)}
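To make the control flow explicit, here is a NumPy-style sketch of Algorithm 1. The names mionet_predict and to_function are hypothetical placeholders standing for, respectively, the inference of the pre-trained MIONet at given points and the residual-to-function map \mathcal{H} (cf. Eq. (25) below); smoother_B stands for one of the matrices in (14)-(16).

import numpy as np

def hybrid_solve(A, b, k, f, nodes, smoother_B, mionet_predict, to_function,
                 M=10, tol=1e-10, max_iter=10000):
    # Algorithm 1: smoothing steps interleaved with MIONet correction steps every M iterations.
    mu = mionet_predict(k, f, nodes)               # step 2: mu^(1) = I(M(k, f))
    r = b - A @ mu
    m = 1
    while np.linalg.norm(r) > tol * np.linalg.norm(b) and m < max_iter:
        if m % M == 0:
            mu = mu + mionet_predict(k, to_function(r), nodes)   # correction step (22)
        else:
            mu = mu + smoother_B @ r                             # smoothing step (23)
        r = b - A @ mu
        m += 1
    return mu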

Approximation and padding for residual function. There is a new linear system (19) involved. We may choose an approximation for \beta, for example:

\beta\approx{\rm diag}\left(\sum_{j}\int_{\Omega}\phi_{i}\cdot\phi_{j}dx\right)^{-1}r^{(m)}\approx{\rm diag}\left(\left(\int_{\Omega}\phi_{i}dx\right)^{-1}\right)r^{(m)}. (24)

With the solved \beta, we obtain \bar{r}^{(m)}=\mathcal{H}r^{(m)} expressed as a continuous piecewise linear function w.r.t. \{\phi_{i}\}, which does not contain the boundary nodes. Hence the current \bar{r}^{(m)} equals 0 on \partial\Omega, which is regarded as a "zero padding". Here we may use a better padding for \bar{r}^{(m)}, for example, add boundary nodes whose values equal the values of their corresponding nearest interior nodes (we can also apply other padding strategies like reflection padding, etc.). For convenience, we still use the notations \bar{r}^{(m)} and \mathcal{H}, i.e.

\bar{r}^{(m)}=\mathcal{H}r^{(m)}:={\rm Padding}\left((\phi_{1},...,\phi_{n}){\rm diag}\left(\left(\int_{\Omega}\phi_{1}dx\right)^{-1},...,\left(\int_{\Omega}\phi_{n}dx\right)^{-1}\right)r^{(m)}\right). (25)

Note that the above approximation brings an acceptable error for this hybrid method. Since \bar{r}^{(0)}=f, \mu^{(1)} is exactly the prediction by MIONet. We may also think of \mu^{(1)} as an initial guess provided by MIONet for the iterative method. When feeding \bar{r}^{(m)} to \mathcal{M}, we need to evaluate the linear element function \bar{r}^{(m)} at some sampling points (usually much fewer than the number of nodes n) for \mathcal{M}.
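As an example of the map \mathcal{H}, a sketch of the approximation (25) on a uniform grid of \Omega=(0,1), where \int_{\Omega}\phi_{i}dx=h for the interior hat functions and the boundary values are filled by replication padding (the domain and the padding choice are illustrative):

import numpy as np

def residual_function_1d(r):
    # Approximate residual function r_bar = H r on a uniform grid of (0, 1), cf. Eq. (25).
    n = len(r)
    h = 1.0 / (n + 1)                    # int_0^1 phi_i dx = h for interior hat functions
    beta = r / h                         # diag((int phi_i dx)^{-1}) r
    beta_pad = np.concatenate([[beta[0]], beta, [beta[-1]]])   # replication padding
    nodes = np.linspace(0.0, 1.0, n + 2)                       # grid including boundary nodes
    return lambda x: np.interp(x, nodes, beta_pad)             # continuous piecewise linear r_bar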

Efficient interpolation. The interpolation \mathbb{I} is easy to compute, since the prediction

\mathcal{M}(k,\bar{r}^{(m)})(\cdot) (26)

is expressed by a neural network (encoded as a trunk net) rather than a discrete vector, and we only need to feed the interpolation points directly into the trunk net of the ONet. This is indeed an important advantage of the ONet-series architectures.

Hybrid iteration with multigrid method. Since there is no requirement for the basic iterative method in this algorithm, we can also utilize other advanced iterative methods such as the multigrid method here. We will show more details in the experiment part.

Inhomogeneous boundary condition. Here we further investigate the inhomogeneous boundary condition. Consider

\begin{cases}-\nabla\cdot(k\nabla u)=f&\quad\mbox{in}\;\Omega,\\ Tu=g&\quad\mbox{on}\;\partial\Omega,\end{cases} (27)

where T is the trace operator. In such a case, we aim to learn an operator mapping from (k,f,g) to the solution u, i.e.,

\mathcal{G}:(k,f,g)\mapsto u. (28)

Of course we can train an MIONet \mathcal{M}(k,f,g) to learn this operator. However, this strategy loses the linearity w.r.t. the residual vector/function during the iteration, and we strive to preserve this linearity to guarantee the convergence of the hybrid method. Denote by E the trace extension operator for (27), then T(Eg)=g for g\in H^{1/2}(\partial\Omega). We derive that

\begin{cases}-\nabla\cdot(k\nabla u_{0})=f+\nabla\cdot(k\nabla Eg)&\quad\mbox{in}\;\Omega,\\ Tu_{0}=0&\quad\mbox{on}\;\partial\Omega,\end{cases} (29)

where u=u_{0}+Eg. Therefore the model can be modified to

\tilde{\mathcal{M}}(k,f,g):=\mathcal{M}(k,f+\nabla\cdot(k\nabla Eg))+Eg, (30)

where \mathcal{M} is a standard MIONet which is linear w.r.t. the second input as before. Although the trace extension operator E is uniquely defined, we may choose a suitably constructed extension as needed, for example the continuous piecewise linear function related to a grid which takes 0 at the interior nodes and matches g at the boundary nodes. Note that here we regard f+\nabla\cdot(k\nabla Eg) as an element of H^{-1}(\Omega).

There is in fact another way which is much easier to implement. It is not difficult to find that \mathcal{G}(k,f,g) is linear w.r.t. (f,g)\in L^{2}(\Omega)\times H^{1/2}(\partial\Omega), i.e.,

\mathcal{G}(k,(\lambda_{1}(f_{1},g_{1})+\lambda_{2}(f_{2},g_{2})))=\lambda_{1}\mathcal{G}(k,(f_{1},g_{1}))+\lambda_{2}\mathcal{G}(k,(f_{2},g_{2})), (31)

then we naturally apply the model

\tilde{\mathcal{M}}(k,f,g):=\mathcal{M}(k,(f,g)), (32)

where (f,g) is a simple Cartesian product of f and g, and it will be fed into the linear branch net of \mathcal{M}. In such a case, the discrete algebraic system of (27) is augmented with several rows and columns taking into account the boundary nodes, where the boundary block is the identity. The unknown and residual vectors then contain not only the coefficients of the interior nodes but also those of the boundary nodes. We apply this strategy in our experiments.

Necessary convergence condition for operator regression models. There is a necessary condition for the convergence of the hybrid iterative method. Assume that the hybrid method converges with a model \mathcal{M}, then

\lim_{s\to\infty}\mu^{(sM+1)}=\lim_{s\to\infty}\left(\mu^{(sM)}+\mathbb{I}(\mathcal{M}(k,\mathcal{H}r^{(sM)}))\right), (33)

and we have

\mathbb{I}(\mathcal{M}(k,0))=0. (34)

As the interpolation operator \mathbb{I} is related to the mesh \mathcal{T}_{h}, and the above equation holds for arbitrary \mathbb{I}_{\mathcal{T}_{h}}, we know

\mathcal{M}(k,0)=0. (35)

Obviously, Eq. (35) holds as long as \mathcal{M} is linear w.r.t. the second input.

4 Theoretical analysis

Consider the one-dimensional two-point boundary value problem of system (7) with k=1, \Omega=(0,1), i.e.,

\begin{cases}-u^{\prime\prime}=f,\quad x\in(0,1),\\ u(0)=u(1)=0.\end{cases} (36)

We choose a uniform grid with size h=\frac{1}{n+1} and apply the linear element basis functions \{\phi_{i}\}_{i=1}^{n} whose nodal points are interior, then the above system leads to a discrete system as

A\mu=b, (37)

where

A=\frac{1}{h}\begin{pmatrix}2&-1&&&\\ -1&2&-1&&\\ &\ddots&\ddots&\ddots&\\ &&-1&2&-1\\ &&&-1&2\\ \end{pmatrix},\quad\mu=\begin{pmatrix}\mu_{1}\\ \mu_{2}\\ \vdots\\ \mu_{n-1}\\ \mu_{n}\\ \end{pmatrix},\quad b=\begin{pmatrix}b_{1}\\ b_{2}\\ \vdots\\ b_{n-1}\\ b_{n}\\ \end{pmatrix}, (38)

and b_{i}=\int_{0}^{1}f\phi_{i}dx. We can easily write out the eigenvalues and the eigenvectors of A as

\lambda_{i}=\frac{4}{h}\sin^{2}(\frac{\pi}{2}hi),\quad\xi^{i}=(\xi^{i}_{1},...,\xi^{i}_{n})^{T},\quad\xi^{i}_{j}=\sqrt{2h}\sin(i\pi hj),\quad 1\leq i,j\leq n. (39)

Denote \Xi=(\xi^{1},...,\xi^{n}), then \Xi is symmetric and orthogonal, i.e. \Xi^{T}=\Xi and \Xi^{-1}=\Xi.
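These formulas can be checked numerically; a short NumPy sketch (the value of n is illustrative):

import numpy as np

n = 31
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / h

i = np.arange(1, n + 1)
lam = 4.0 / h * np.sin(np.pi * h * i / 2) ** 2              # eigenvalues in Eq. (39)
Xi = np.sqrt(2 * h) * np.sin(np.pi * h * np.outer(i, i))    # Xi[:, k] is the eigenvector xi^{k+1}

print(np.allclose(A @ Xi, Xi @ np.diag(lam)))   # A xi^i = lambda_i xi^i
print(np.allclose(Xi @ Xi, np.eye(n)))          # Xi symmetric and orthogonal: Xi^2 = I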

4.1 Convergence condition

Recall that we apply the hybrid iterative method (22-23) to solve (37). For convenience, here we simply set \bar{r}^{(0)}=\mathcal{H}b=\mathcal{H}r^{(0)}, so that we have \bar{r}^{(m)}=\mathcal{H}r^{(m)} for all m\geq 0. We further denote

\bar{\mathcal{M}}(r):=\mathbb{I}(\mathcal{M}(1,\mathcal{H}r)), (40)

thus \bar{\mathcal{M}}:\mathbb{R}^{n}\to\mathbb{R}^{n} is a linear map. Denote

e^{(m)}:=\mu-\mu^{(m)}, (41)

then we find that

e^{(1)}=\mu-\bar{\mathcal{M}}(r^{(0)})=(I-\bar{\mathcal{M}}A)e^{(0)}, (42)

and for 1\leq m\leq M-1,

e^{(m+1)}=(I-BA)e^{(m)}, (43)

thus

e^{(M)}=(I-BA)^{M-1}e^{(1)}=(I-BA)^{M-1}(I-\bar{\mathcal{M}}A)e^{(0)}. (44)

Consequently we have

e^{(sM)}=U^{s}e^{(0)}, (45)

where

U:=(I-BA)^{M-1}(I-\bar{\mathcal{M}}A). (46)

Up to now, we have derived the convergence results of the hybrid iterative method.

Theorem 3.

The hybrid iterative method (22-23) converges if and only if

\rho\left((I-BA)^{M-1}(I-\bar{\mathcal{M}}A)\right)<1, (47)

where \rho(\cdot) denotes the spectral radius.

Corollary 1.

There exists an M large enough such that the hybrid iterative method (22-23) converges, if the applied original iterative method converges, i.e.

\rho(I-BA)<1. (48)

The corollary is also obvious since (I-BA)^{M-1}(I-\bar{\mathcal{M}}A)\to 0 as M\to\infty if \rho(I-BA)<1.

Corollary 2.

The hybrid iterative method (22-23) with Richardson iteration, i.e. B=\omega I, converges if

0<\omega<\frac{2}{\rho(A)},\quad M\geq 2+\left\lfloor-\frac{\ln\|I-\bar{\mathcal{M}}A\|_{2}}{\ln\rho(I-\omega A)}\right\rfloor. (49)
Proof.

If 0<\omega<\frac{2}{\rho(A)}, we have

\rho(I-BA)=\rho(I-\omega A)<1,

thus the Richardson iteration converges. Furthermore, if Eq. (49) holds, then

\begin{split}\rho\left((I-BA)^{M-1}(I-\bar{\mathcal{M}}A)\right)&\leq\left\lVert(I-\omega A)^{M-1}(I-\bar{\mathcal{M}}A)\right\rVert_{2}\\ &\leq\left\lVert I-\omega A\right\rVert_{2}^{M-1}\cdot\left\lVert I-\bar{\mathcal{M}}A\right\rVert_{2}\\ &=\rho(I-\omega A)^{M-1}\cdot\left\lVert I-\bar{\mathcal{M}}A\right\rVert_{2}\\ &<\rho(I-\omega A)^{1-\frac{\ln\|I-\bar{\mathcal{M}}A\|_{2}}{\ln\rho(I-\omega A)}-1}\cdot\left\lVert I-\bar{\mathcal{M}}A\right\rVert_{2}\\ &=1.\end{split}

It is noteworthy that there is no requirement on the loss of the MIONet \mathcal{M} to guarantee the convergence. However, an \mathcal{M} with a large loss cannot accelerate the iteration, which will be discussed later.
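These conditions can also be checked numerically once \bar{\mathcal{M}} is assembled as a matrix (e.g. column by column from the model's responses to the nodal residual vectors); a minimal NumPy sketch under that assumption:

import numpy as np

def hybrid_converges(A, Mbar, omega, M):
    # Condition of Theorem 3 for Richardson smoothing, B = omega * I.
    n = A.shape[0]
    U = np.linalg.matrix_power(np.eye(n) - omega * A, M - 1) @ (np.eye(n) - Mbar @ A)
    return np.max(np.abs(np.linalg.eigvals(U))) < 1.0

def sufficient_period(A, Mbar, omega):
    # The correction period guaranteed by Corollary 2 (sufficient, not necessarily minimal).
    n = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvals(np.eye(n) - omega * A)))
    norm = np.linalg.norm(np.eye(n) - Mbar @ A, 2)
    return 2 + int(np.floor(-np.log(norm) / np.log(rho)))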

4.2 Error of model inference

Since the MIONet usually learns the low-frequency data, we assume that the dataset includes the first several eigenfunctions, i.e.

\mathcal{T}=\{(1,\bar{\lambda}_{i}\bar{\xi}^{i}),\bar{\xi}^{i}\}_{i=1}^{n_{0}},\quad 1<n_{0}\leq n, (50)

where \bar{\lambda}_{i},\bar{\xi}^{i} are the eigenvalues and eigenfunctions of the original system, i.e.,

\bar{\lambda}_{i}=i^{2}\pi^{2},\quad\bar{\xi}^{i}(x)=\sin(i\pi x),\quad 1\leq i\leq n. (51)

For convenience, we consider the loss

L(\theta)=\frac{1}{n_{0}}\sum_{i=1}^{n_{0}}\ell_{i}(\theta):=\frac{1}{n_{0}}\sum_{i=1}^{n_{0}}\|\mathcal{M}(1,\bar{\lambda}_{i}\bar{\xi}^{i};\theta)-\bar{\xi}^{i}\|_{C[0,1]}^{2}. (52)

We further impose a generalization assumption on the MIONet for this case.

Assumption 1 (generalization assumption).

Assume that the MIONet keeps a consistent Lipschitz constant L>0 w.r.t. the second input during training, i.e.

\left\lVert\mathcal{M}(k,f_{1};\theta)-\mathcal{M}(k,f_{2};\theta)\right\rVert_{C[0,1]}\leq L\left\lVert f_{1}-f_{2}\right\rVert_{C[0,1]} (53)

holds for f_{1},f_{2}\in C[0,1] and \theta\in\{\theta_{1},\theta_{2},...\}. The set \{\theta_{t}\} is regarded as the sequence of parameters during the training process. L depends only on k.

Let \mathcal{H} be defined as in Eq. (25) and extended with replication padding. Then

(h2ξi)=(ϕ0,ϕ1,,ϕn,ϕn+1)12h(ξ1i,ξ1i,ξ2i,,ξn1i,ξni,ξni)T,\mathcal{H}(\sqrt{\frac{h}{2}}\xi^{i})=(\phi_{0},\phi_{1},...,\phi_{n},\phi_{n+1})\cdot\sqrt{\frac{1}{2h}}(\xi_{1}^{i},\xi_{1}^{i},\xi_{2}^{i},...,\xi_{n-1}^{i},\xi_{n}^{i},\xi_{n}^{i})^{T}, (54)

where ϕ0\phi_{0} and ϕn+1\phi_{n+1} are the two boundary nodal functions, hence

(h2ξi)ξ¯iC[0,1]=max(sin(iπh)ξ¯iC[0,h],sin(iπh)ϕ1+sin(2iπh)ϕ2ξ¯iC[h,2h],,sin((n1)iπh)ϕn1+sin(niπh)ϕnξ¯iC[(n1)h,nh],sin(niπh)ξ¯iC[nh,1])iπh.\begin{split}&\left\lVert\mathcal{H}(\sqrt{\frac{h}{2}}\xi^{i})-\bar{\xi}^{i}\right\rVert_{C[0,1]}\\ =&\max\big{(}\left\lVert\sin(i\pi h)-\bar{\xi}^{i}\right\rVert_{C[0,h]},\left\lVert\sin(i\pi h)\phi_{1}+\sin(2i\pi h)\phi_{2}-\bar{\xi}^{i}\right\rVert_{C[h,2h]},...,\\ &\left\lVert\sin((n-1)i\pi h)\phi_{n-1}+\sin(ni\pi h)\phi_{n}-\bar{\xi}^{i}\right\rVert_{C[(n-1)h,nh]},\left\lVert\sin(ni\pi h)-\bar{\xi}^{i}\right\rVert_{C[nh,1]}\big{)}\\ \leq&i\pi h.\end{split} (55)

Consequently,

¯(h2ξi)𝕀((1,ξ¯i))=𝕀((1,(h2ξi)ξ¯i))(1,(h2ξi)ξ¯i)C[0,1]L1(h2ξi)ξ¯iC[0,1]L1iπh,\begin{split}\left\lVert\bar{\mathcal{M}}(\sqrt{\frac{h}{2}}\xi^{i})-\mathbb{I}(\mathcal{M}(1,\bar{\xi}^{i}))\right\rVert_{\infty}=&\left\lVert\mathbb{I}(\mathcal{M}(1,\mathcal{H}(\sqrt{\frac{h}{2}}\xi^{i})-\bar{\xi}^{i}))\right\rVert_{\infty}\\ \leq&\left\lVert\mathcal{M}(1,\mathcal{H}(\sqrt{\frac{h}{2}}\xi^{i})-\bar{\xi}^{i})\right\rVert_{C[0,1]}\\ \leq&L_{1}\left\lVert\mathcal{H}(\sqrt{\frac{h}{2}}\xi^{i})-\bar{\xi}^{i}\right\rVert_{C[0,1]}\\ \leq&L_{1}i\pi h,\end{split} (56)

where L1L_{1} is the Lipschitz constant defined as in Assumption 1. By Eq. (52),

¯(λ¯ih2ξi)12hξi=¯(λ¯ih2ξi)𝕀((1,λ¯iξ¯i))+𝕀((1,λ¯iξ¯i))𝕀(ξ¯i)λ¯i¯(h2ξi)𝕀((1,ξ¯i))+𝕀((1,λ¯iξ¯i))𝕀(ξ¯i)λ¯iL1iπh+i=L1π3i3h+i.\begin{split}\left\lVert\bar{\mathcal{M}}(\bar{\lambda}_{i}\sqrt{\frac{h}{2}}\xi^{i})-\sqrt{\frac{1}{2h}}\xi^{i}\right\rVert_{\infty}=&\left\lVert\bar{\mathcal{M}}(\bar{\lambda}_{i}\sqrt{\frac{h}{2}}\xi^{i})-\mathbb{I}(\mathcal{M}(1,\bar{\lambda}_{i}\bar{\xi}^{i}))+\mathbb{I}(\mathcal{M}(1,\bar{\lambda}_{i}\bar{\xi}^{i}))-\mathbb{I}(\bar{\xi}^{i})\right\rVert_{\infty}\\ \leq&\bar{\lambda}_{i}\left\lVert\bar{\mathcal{M}}(\sqrt{\frac{h}{2}}\xi^{i})-\mathbb{I}(\mathcal{M}(1,\bar{\xi}^{i}))\right\rVert_{\infty}+\left\lVert\mathbb{I}(\mathcal{M}(1,\bar{\lambda}_{i}\bar{\xi}^{i}))-\mathbb{I}(\bar{\xi}^{i})\right\rVert_{\infty}\\ \leq&\bar{\lambda}_{i}L_{1}i\pi h+\ell_{i}\\ =&L_{1}\pi^{3}i^{3}h+\ell_{i}.\end{split} (57)

Now denote

ei:=ξi¯(λiξi),1in,e_{i}:=\xi^{i}-\bar{\mathcal{M}}(\lambda_{i}\xi^{i}),\quad 1\leq i\leq n, (58)

and we get the estimate

ei2=ξi¯(λiξi)2=λiλ¯i2h¯(λ¯ih2ξi)12hξi+12hξiλ¯iλih2ξi2λiλ¯i2h(¯(λ¯ih2ξi)12hξi2+12hξiλ¯iλih2ξi2)λiλ¯i2h(n¯(λ¯ih2ξi)12hξi+12hξiλ¯iλih2ξi2).\begin{split}\left\lVert e_{i}\right\rVert_{2}=&\left\lVert\xi^{i}-\bar{\mathcal{M}}(\lambda_{i}\xi^{i})\right\rVert_{2}\\ =&\frac{\lambda_{i}}{\bar{\lambda}_{i}}\sqrt{\frac{2}{h}}\left\lVert\bar{\mathcal{M}}(\bar{\lambda}_{i}\sqrt{\frac{h}{2}}\xi^{i})-\sqrt{\frac{1}{2h}}\xi^{i}+\sqrt{\frac{1}{2h}}\xi^{i}-\frac{\bar{\lambda}_{i}}{\lambda_{i}}\sqrt{\frac{h}{2}}\xi^{i}\right\rVert_{2}\\ \leq&\frac{\lambda_{i}}{\bar{\lambda}_{i}}\sqrt{\frac{2}{h}}\left(\left\lVert\bar{\mathcal{M}}(\bar{\lambda}_{i}\sqrt{\frac{h}{2}}\xi^{i})-\sqrt{\frac{1}{2h}}\xi^{i}\right\rVert_{2}+\left\lVert\sqrt{\frac{1}{2h}}\xi^{i}-\frac{\bar{\lambda}_{i}}{\lambda_{i}}\sqrt{\frac{h}{2}}\xi^{i}\right\rVert_{2}\right)\\ \leq&\frac{\lambda_{i}}{\bar{\lambda}_{i}}\sqrt{\frac{2}{h}}\left(\sqrt{n}\left\lVert\bar{\mathcal{M}}(\bar{\lambda}_{i}\sqrt{\frac{h}{2}}\xi^{i})-\sqrt{\frac{1}{2h}}\xi^{i}\right\rVert_{\infty}+\left\lVert\sqrt{\frac{1}{2h}}\xi^{i}-\frac{\bar{\lambda}_{i}}{\lambda_{i}}\sqrt{\frac{h}{2}}\xi^{i}\right\rVert_{2}\right).\\ \end{split} (59)

With the results of (57-59) and the orthogonality of Ξ\Xi, we further derive that

ei2λiλ¯i2h(n(L1π3i3h+i)+12hξiλ¯iλih2ξi2)=4sin2(π2hi)π2i2h22(1h)(L1π3i3h+i)+|4sin2(π2hi)π2i2h21|.\begin{split}\left\lVert e_{i}\right\rVert_{2}\leq&\frac{\lambda_{i}}{\bar{\lambda}_{i}}\sqrt{\frac{2}{h}}\left(\sqrt{n}(L_{1}\pi^{3}i^{3}h+\ell_{i})+\left\lVert\sqrt{\frac{1}{2h}}\xi^{i}-\frac{\bar{\lambda}_{i}}{\lambda_{i}}\sqrt{\frac{h}{2}}\xi^{i}\right\rVert_{2}\right)\\ =&\frac{4\sin^{2}(\frac{\pi}{2}hi)}{\pi^{2}i^{2}h^{2}}\sqrt{2(1-h)}(L_{1}\pi^{3}i^{3}h+\ell_{i})+\left|\frac{4\sin^{2}(\frac{\pi}{2}hi)}{\pi^{2}i^{2}h^{2}}-1\right|.\end{split} (60)

It is elementary to see

113x2sin2(x)x21,0x,1-\frac{1}{3}x^{2}\leq\frac{\sin^{2}(x)}{x^{2}}\leq 1,\quad\forall 0\neq x\in\mathbb{R}, (61)

so that

ei24sin2(π2hi)π2i2h22(1h)(L1π3i3h+i)+|4sin2(π2hi)π2i2h21|2L1π3i3h+2i+1(113(π2hi)2)=2L1π3i3h+π212i2h2+2i,1in0.\begin{split}\left\lVert e_{i}\right\rVert_{2}\leq&\frac{4\sin^{2}(\frac{\pi}{2}hi)}{\pi^{2}i^{2}h^{2}}\sqrt{2(1-h)}(L_{1}\pi^{3}i^{3}h+\ell_{i})+\left|\frac{4\sin^{2}(\frac{\pi}{2}hi)}{\pi^{2}i^{2}h^{2}}-1\right|\\ \leq&\sqrt{2}L_{1}\pi^{3}i^{3}h+\sqrt{2}\ell_{i}+1-(1-\frac{1}{3}(\frac{\pi}{2}hi)^{2})\\ =&\sqrt{2}L_{1}\pi^{3}i^{3}h+\frac{\pi^{2}}{12}i^{2}h^{2}+\sqrt{2}\ell_{i},\quad 1\leq i\leq n_{0}.\end{split} (62)

It is summarized as

\left\lVert e_{i}\right\rVert_{2}=\mathcal{O}(i^{3}h)+\mathcal{O}(\ell_{i}),\quad 1\leq i\leq n_{0}. (63)

The error includes two parts: the terms \mathcal{O}(i^{3}h) and \mathcal{O}(\ell_{i}) are caused by the errors of the discretization and the loss, respectively. The first part of this error is inevitable even if the loss \ell_{i} has been trained to zero, since there exists a gap between the eigenvector (eigenvalue) of the discrete system (37) and the eigenfunction (eigenvalue) of the original system (36), which becomes larger as i increases.

The estimate (63) can be equivalently written as

\left\lVert e_{i}\right\rVert_{2}=\mathcal{O}\left(\frac{i^{3}}{n}\right)+\mathcal{O}(\ell_{i}),\quad 1\leq i\leq n_{0}. (64)

Eq. (64) tells us that i\ll n is necessary to ensure that \left\lVert e_{i}\right\rVert_{2} is small enough. Therefore n_{0}, the size of the dataset \mathcal{T} defined in (50), need not be too large compared to n. We reasonably assume that n_{0}\ll n. For a fixed 1\leq i\leq n_{0}, we have derived that

\lim_{h\to 0,\ell_{i}\to 0}\left\lVert e_{i}\right\rVert_{2}=0,\quad 1\leq i\leq n_{0}. (65)

Note that here we consider the ground truth, i.e. the analytical eigenvectors and eigenvalues, as data. If we utilize numerical solutions as data, the error will decrease, since the gap between the data and the discrete system becomes smaller. As a matter of fact, if we replace the \bar{\lambda}_{i} in the dataset (50) by the corresponding numerical eigenvalue \frac{\lambda_{i}}{h}, then the error term \frac{\pi^{2}}{12}i^{2}h^{2} in the last equality of (62) will disappear.

4.3 Spectral analysis

With the above discussion, we are able to analyze the spectral behavior and the convergence speed of the proposed hybrid iterative method. Now we express e^{(0)},e^{(1)} (see Section 4.1) in terms of the eigenvectors of A as

\begin{cases}e^{(0)}=\mu-\mu^{(0)}=\mu=\sum_{i=1}^{n}\alpha_{i}\xi^{i}=\Xi\cdot\alpha,\\ e^{(1)}=(I-\bar{\mathcal{M}}A)e^{(0)}=\sum_{i=1}^{n}\alpha_{i}(I-\lambda_{i}\bar{\mathcal{M}})\xi^{i}=:\sum_{i=1}^{n}\beta_{i}\xi^{i}=\Xi\cdot\beta,\end{cases} (66)

where \alpha:=(\alpha_{1},...,\alpha_{n})^{T}, \beta:=(\beta_{1},...,\beta_{n})^{T}. We further set

\begin{cases}Y:=\Xi^{-1}\bar{\mathcal{M}}\Xi=(\xi^{1},...,\xi^{n})^{-1}\bar{\mathcal{M}}(\xi^{1},...,\xi^{n}),\\ Z:=\Xi^{-1}B\Xi=(\xi^{1},...,\xi^{n})^{-1}B(\xi^{1},...,\xi^{n}),\end{cases} (67)

then by direct computation we have

\beta=(I-Y\Lambda)\alpha,\quad\Lambda={\rm diag}(\lambda_{1},...,\lambda_{n}), (68)

so that

e^{(1)}=\Xi\cdot(I-Y\Lambda)\alpha. (69)

Similarly, we can derive that

e^{(M)}=\Xi\cdot(I-Z\Lambda)^{M-1}(I-Y\Lambda)\alpha. (70)

Consequently, we obtain the final error as

e^{(sM)}=\Xi\cdot T^{s}\alpha, (71)

where

T:=(I-Z\Lambda)^{M-1}(I-Y\Lambda). (72)

Next we proceed with the discussion of T. Note that the Y defined in (67) depends on the trained operator regression model MIONet, i.e., \mathcal{M}. Assume that \bar{\mathcal{M}} perfectly learns the eigenvectors of Eq. (37), so that

A\bar{\mathcal{M}}(\xi^{i})=\xi^{i}, (73)

which leads to

\bar{\mathcal{M}}(\xi^{i})=\lambda_{i}^{-1}\xi^{i}. (74)

Then

Y=\Xi^{-1}\bar{\mathcal{M}}\Xi=\Lambda^{-1}, (75)

thus

Y\Lambda=I,\quad T=0. (76)

In such a situation, the first iterative step by MIONet in fact gives the exact solution of the discrete system (37). However, we have to take into account the error of MIONet. Keep in mind that Y is in fact an approximation of \Lambda^{-1}, therefore we expect

I-Y\Lambda\approx 0, (77)

at least for the first few columns. As discussed in Section 4.2, we recall that

e_{i}=\xi^{i}-\bar{\mathcal{M}}(\lambda_{i}\xi^{i}),\quad 1\leq i\leq n, (78)

and

\left\lVert e_{i}\right\rVert_{2}=\mathcal{O}(i^{3}h)+\mathcal{O}(\ell_{i}),\quad 1\leq i\leq n_{0}. (79)

The errors of MIONet on the first n_{0} eigenvectors can be controlled by h and the loss \ell_{i}. Hence we assume that e_{1},...,e_{n_{0}} are small enough with sufficient training for MIONet, while e_{n_{0}+1},...,e_{n} are uncontrolled. Therefore,

\begin{split}I-Y\Lambda=&\Xi^{-1}\cdot(\Xi-\bar{\mathcal{M}}\Xi\Lambda)\\ =&\Xi\cdot(e_{1},...,e_{n_{0}},e_{n_{0}+1},...,e_{n})\\ =&\begin{bmatrix}\Xi\cdot(e_{1},...,e_{n_{0}})&\Xi\cdot(e_{n_{0}+1},...,e_{n})\end{bmatrix},\end{split} (80)

whose first n_{0} columns are small enough. This indicates that a well-trained MIONet can eliminate the low-frequency components of the error.

Returning to the final error e^{(sM)}, we have

\begin{split}e^{(sM)}&=\Xi\cdot((I-Z\Lambda)^{M-1}(I-Y\Lambda))^{s}\alpha\\ &=\Xi\cdot(I-Z\Lambda)^{M-1}\left((I-Y\Lambda)(I-Z\Lambda)^{M-1}\right)^{s-1}(I-Y\Lambda)\alpha\\ &=\Xi\cdot(I-Z\Lambda)^{M-1}\tilde{T}^{s-1}(I-Y\Lambda)\alpha,\end{split} (81)

where

\tilde{T}:=(I-Y\Lambda)(I-Z\Lambda)^{M-1}. (82)

We subsequently study how \tilde{T} acts on a vector v=(v_{1},...,v_{n})^{T}\in\mathbb{R}^{n}. Now assume that we choose the Richardson iteration and set \omega=\frac{h}{4}, which satisfies the convergence condition, then Z=\frac{h}{4}I and

T~v=(IYΛ)(Ih4Λ)M1v=Ξ1(Ξ¯ΞΛ)(cos2(M1)(π2h)v1,,cos2(M1)(π2hn)vn)T=Ξ(e1,,en)(cos2(M1)(π2h)v1,,cos2(M1)(π2hn)vn)T=Ξ(i=1ncos2(M1)(π2hi)viei),\begin{split}\tilde{T}v&=(I-Y\Lambda)(I-\frac{h}{4}\Lambda)^{M-1}v\\ &=\Xi^{-1}\cdot(\Xi-\bar{\mathcal{M}}\Xi\Lambda)\left(\cos^{2(M-1)}(\frac{\pi}{2}h)v_{1},...,\cos^{2(M-1)}(\frac{\pi}{2}hn)v_{n}\right)^{T}\\ &=\Xi\cdot(e_{1},...,e_{n})\left(\cos^{2(M-1)}(\frac{\pi}{2}h)v_{1},...,\cos^{2(M-1)}(\frac{\pi}{2}hn)v_{n}\right)^{T}\\ &=\Xi\cdot\left(\sum_{i=1}^{n}\cos^{2(M-1)}(\frac{\pi}{2}hi)v_{i}e_{i}\right),\end{split} (83)

therefore

T~v2=i=1ncos2(M1)(π2hi)viei2i=1ncos2(M1)(π2hi)|vi|ei2(i=1ncos4(M1)(π2hi)ei22)12(i=1n|vi|2)12(i=1ncos2(M1)(π2hi)ei2)v2.\begin{split}\left\lVert\tilde{T}v\right\rVert_{2}&=\left\lVert\sum_{i=1}^{n}\cos^{2(M-1)}(\frac{\pi}{2}hi)v_{i}e_{i}\right\rVert_{2}\\ &\leq\sum_{i=1}^{n}\cos^{2(M-1)}(\frac{\pi}{2}hi)|v_{i}|\left\lVert e_{i}\right\rVert_{2}\\ &\leq\left(\sum_{i=1}^{n}\cos^{4(M-1)}(\frac{\pi}{2}hi)\left\lVert e_{i}\right\rVert_{2}^{2}\right)^{\frac{1}{2}}\left(\sum_{i=1}^{n}|v_{i}|^{2}\right)^{\frac{1}{2}}\\ &\leq\left(\sum_{i=1}^{n}\cos^{2(M-1)}(\frac{\pi}{2}hi)\left\lVert e_{i}\right\rVert_{2}\right)\left\lVert v\right\rVert_{2}.\end{split} (84)

As

{cos2(π2hi)cos2(π2h),1in0,cos2(π2hi)cos2(π2h(n0+1)),n0+1in,\begin{cases}\cos^{2}(\frac{\pi}{2}hi)\leq\cos^{2}(\frac{\pi}{2}h),\quad 1\leq i\leq n_{0},\\ \cos^{2}(\frac{\pi}{2}hi)\leq\cos^{2}(\frac{\pi}{2}h(n_{0}+1)),\quad n_{0}+1\leq i\leq n,\end{cases} (85)

we obtain the estimate

T~v2(cos2(M1)(π2h)i=1n0ei2+cos2(M1)(π2h(n0+1))i=n0+1nei2)v2,\left\lVert\tilde{T}v\right\rVert_{2}\leq\left(\cos^{2(M-1)}(\frac{\pi}{2}h)\sum_{i=1}^{n_{0}}\left\lVert e_{i}\right\rVert_{2}+\cos^{2(M-1)}(\frac{\pi}{2}h(n_{0}+1))\sum_{i=n_{0}+1}^{n}\left\lVert e_{i}\right\rVert_{2}\right)\left\lVert v\right\rVert_{2}, (86)

which shows the change of the error within a period of M steps. Subsequently, we write down the total error as

e(sM)2=(IZΛ)M1T~s1(IYΛ)α2=(Ih4Λ)M1T~s1(IΞ1¯ΞΛ)α2cos2(M1)(π2h)T~s1(IΞ1¯ΞΛ)α2cos2(M1)(π2h)(cos2(M1)(π2h)i=1n0ei2+cos2(M1)(π2h(n0+1))i=n0+1nei2)s1(IΞ1¯ΞΛ)α2.\begin{split}&\left\lVert e^{(sM)}\right\rVert_{2}\\ =&\left\lVert(I-Z\Lambda)^{M-1}\tilde{T}^{s-1}(I-Y\Lambda)\alpha\right\rVert_{2}\\ =&\left\lVert(I-\frac{h}{4}\Lambda)^{M-1}\tilde{T}^{s-1}(I-\Xi^{-1}\bar{\mathcal{M}}\Xi\Lambda)\alpha\right\rVert_{2}\\ \leq&\cos^{2(M-1)}(\frac{\pi}{2}h)\left\lVert\tilde{T}^{s-1}(I-\Xi^{-1}\bar{\mathcal{M}}\Xi\Lambda)\alpha\right\rVert_{2}\\ \leq&\cos^{2(M-1)}(\frac{\pi}{2}h)\left(\cos^{2(M-1)}(\frac{\pi}{2}h)\sum_{i=1}^{n_{0}}\left\lVert e_{i}\right\rVert_{2}+\cos^{2(M-1)}(\frac{\pi}{2}h(n_{0}+1))\sum_{i=n_{0}+1}^{n}\left\lVert e_{i}\right\rVert_{2}\right)^{s-1}\\ &\cdot\left\lVert(I-\Xi^{-1}\bar{\mathcal{M}}\Xi\Lambda)\alpha\right\rVert_{2}.\end{split} (87)

Moreover, the last term in the above inequality is

(IΞ1¯ΞΛ)α2=(Ξ¯ΞΛ)α2=(e1,,en)α2i=1n0|αi|ei2+i=n0+1n|αi|ei2C(cos2(M1)(π2h)i=1n0ei2+cos2(M1)(π2h(n0+1))i=n0+1nei2),\begin{split}&\left\lVert(I-\Xi^{-1}\bar{\mathcal{M}}\Xi\Lambda)\alpha\right\rVert_{2}\\ =&\left\lVert(\Xi-\bar{\mathcal{M}}\Xi\Lambda)\alpha\right\rVert_{2}\\ =&\left\lVert(e_{1},...,e_{n})\alpha\right\rVert_{2}\\ \leq&\sum_{i=1}^{n_{0}}|\alpha_{i}|\left\lVert e_{i}\right\rVert_{2}+\sum_{i=n_{0}+1}^{n}|\alpha_{i}|\left\lVert e_{i}\right\rVert_{2}\\ \leq&C\left(\cos^{2(M-1)}(\frac{\pi}{2}h)\sum_{i=1}^{n_{0}}\left\lVert e_{i}\right\rVert_{2}+\cos^{2(M-1)}(\frac{\pi}{2}h(n_{0}+1))\sum_{i=n_{0}+1}^{n}\left\lVert e_{i}\right\rVert_{2}\right),\end{split} (88)

where C>0 is a constant independent of s. We hence have the simple relationship that

e(sM)2C(cos2(M1)(π2h)i=1n0ei2+cos2(M1)(π2h(n0+1))i=n0+1nei2)s.\left\lVert e^{(sM)}\right\rVert_{2}\leq C\cdot\left(\cos^{2(M-1)}(\frac{\pi}{2}h)\sum_{i=1}^{n_{0}}\left\lVert e_{i}\right\rVert_{2}+\cos^{2(M-1)}(\frac{\pi}{2}h(n_{0}+1))\sum_{i=n_{0}+1}^{n}\left\lVert e_{i}\right\rVert_{2}\right)^{s}. (89)

For comparison, we write down the error of the original Richardson method as

eRi(sM)2=(i=1ncos4sM(π2hi)αi2)12cos2sM(π2h)|α1|=CRi(cos2M(π2h))s,\left\lVert e^{(sM)}_{\rm Ri}\right\rVert_{2}=\left(\sum_{i=1}^{n}\cos^{4sM}(\frac{\pi}{2}hi)\alpha_{i}^{2}\right)^{\frac{1}{2}}\geq\cos^{2sM}(\frac{\pi}{2}h)|\alpha_{1}|=C_{\rm Ri}\cdot\left(\cos^{2M}(\frac{\pi}{2}h)\right)^{s}, (90)

where C_{\rm Ri}>0 is a constant under the assumption that \alpha_{1}\neq 0.

Now we show the errors of the Richardson iterative method and the corresponding hybrid method as

\left\lVert e^{(sM)}_{\rm Ri}\right\rVert_{2}\geq C_{\rm Ri}\cdot\left(\eta_{1}^{M}\right)^{s},  Richardson (91)
\left\lVert e^{(sM)}\right\rVert_{2}\leq C\cdot\left(\epsilon\cdot\eta_{1}^{M-1}+R\cdot\eta_{2}^{M-1}\right)^{s},  Hybrid (92)

where

\eta_{1}=\cos^{2}(\frac{\pi}{2}h),\quad\eta_{2}=\cos^{2}(\frac{\pi}{2}h(n_{0}+1)),\quad\epsilon=\sum_{i=1}^{n_{0}}\left\lVert e_{i}\right\rVert_{2},\quad R=\sum_{i=n_{0}+1}^{n}\left\lVert e_{i}\right\rVert_{2}. (93)

Note that \eta_{1}\approx 1, 1>\eta_{1}>\eta_{2}>0, and \epsilon is usually small enough while R is uncontrolled. Up to now, we are able to compare the convergence speeds of these two methods.

Theorem 4.

Assume that \epsilon<\eta_{1}. Then there exists a K>0, such that for any integer M>K, it holds that

\left\lVert e^{(sM)}\right\rVert_{2}<\left\lVert e^{(sM)}_{\rm Ri}\right\rVert_{2} (94)

for s large enough. Briefly, the hybrid iterative method converges faster than the original Richardson iterative method for M large enough, as long as \epsilon<\eta_{1}.

Proof.

If \epsilon<\eta_{1}, we have

\lim_{M\to\infty}\frac{\epsilon\cdot\eta_{1}^{M-1}+R\cdot\eta_{2}^{M-1}}{\eta_{1}^{M}}=\lim_{M\to\infty}\frac{\epsilon}{\eta_{1}}+\frac{R}{\eta_{2}}\cdot\left(\frac{\eta_{2}}{\eta_{1}}\right)^{M}=\frac{\epsilon}{\eta_{1}}<1. (95)

Next we investigate how M influences the convergence rate of the hybrid iterative method. According to Eqs. (91-92), the average convergence rates of the above methods are derived as follows.

Theorem 5 (convergence rate).

The convergence rate of the Richardson iterative method and an upper bound of the average convergence rate of the corresponding hybrid iterative method are

\lim_{s\to\infty}\left\lVert e^{(sM)}_{\rm Ri}\right\rVert_{2}^{\frac{1}{sM}}=\eta_{1},  Richardson (96)
\varlimsup_{s\to\infty}\left\lVert e^{(sM)}\right\rVert_{2}^{\frac{1}{sM}}\leq\eta_{1}\cdot\left(\frac{\epsilon}{\eta_{1}}+\frac{R}{\eta_{2}}\cdot\left(\frac{\eta_{2}}{\eta_{1}}\right)^{M}\right)^{\frac{1}{M}}.  Hybrid (97)

This theorem clearly points out how the network correction step accelerates the iteration.

Denote

{\rm Rate}(M)=\eta_{1}\cdot\left(\frac{\epsilon}{\eta_{1}}+\frac{R}{\eta_{2}}\cdot\left(\frac{\eta_{2}}{\eta_{1}}\right)^{M}\right)^{\frac{1}{M}}. (98)

To visually observe the tendency of Rate(M), we set, for example, \eta_{1}=0.999,\eta_{2}=0.5,\epsilon=0.1,R=10, and plot the curve in Figure 3. It shows that {\rm Rate}(M) first decreases and then increases, so there exists a minimum point of {\rm Rate}(M). For M that is too small, the rate will be larger than 1, which means the hybrid iterative method will diverge. The rate increases and tends to the convergence rate of Richardson as M\to\infty.
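The curve and its minimizer can be reproduced with a few lines of NumPy, using the parameter values stated above:

import numpy as np

def rate(M, eta1=0.999, eta2=0.5, eps=0.1, R=10.0):
    # Upper bound (98) of the average convergence rate of the hybrid method.
    return eta1 * (eps / eta1 + R / eta2 * (eta2 / eta1) ** M) ** (1.0 / M)

Ms = np.arange(2, 201).astype(float)
rates = rate(Ms)
print(int(Ms[np.argmin(rates)]), rates.min())   # correction period minimizing the bound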

Figure 3: An example of {\rm Rate}(M). In this case \eta_{1}=0.999,\eta_{2}=0.5,\epsilon=0.1,R=10.

4.4 Discussion for complicated cases

At this point, we have systematically analyzed the hybrid iterative method for the one-dimensional case with Richardson iteration (or equivalently damped Jacobi iteration). For more complicated cases, such as high-dimensional problems with Gauss-Seidel iteration, which is a better smoother, we have to apply other effective tools [1, 8]. The smoothing property of GS is conveniently analyzed by the local mode analysis introduced by Brandt [1]. Here let us show the idea and extend the discussion to the hybrid Gauss-Seidel iteration.

Recall that the 2-d Poisson equation with the uniform grid and the lexicographical order can be discretized as

4\mu_{i,j}-(\mu_{i-1,j}+\mu_{i,j-1}+\mu_{i,j+1}+\mu_{i+1,j})=b_{i,j}. (99)

To begin with, we add some idealized assumptions and simplifications as in the local mode analysis:

  • 1.

    Assume that boundary conditions are neglected and the problem is defined on an infinite grid with a period of n+1, i.e.

    \mu_{i+k\cdot(n+1),j+l\cdot(n+1)}=\mu_{i,j},\quad b_{i+k\cdot(n+1),j+l\cdot(n+1)}=b_{i,j},\quad\forall i,j,k,l\in\mathbb{Z}. (100)
  • 2.

    Assume that the Gauss-Seidel iteration is implicitly defined as

    4\tilde{\mu}_{i,j}-(\tilde{\mu}_{i-1,j}+\tilde{\mu}_{i,j-1}+\bar{\mu}_{i,j+1}+\bar{\mu}_{i+1,j})=b_{i,j},\quad\forall i,j\in\mathbb{Z}, (101)

    where \tilde{\mu} is updated from \bar{\mu}. Note that Eq. (101) leads to an algebraic system for \tilde{\mu}, which is uniquely solvable since it is strictly diagonally dominant.

Denote \tilde{e}_{i,j}=\mu_{i,j}-\tilde{\mu}_{i,j} and \bar{e}_{i,j}=\mu_{i,j}-\bar{\mu}_{i,j}, then

4\tilde{e}_{i,j}-(\tilde{e}_{i-1,j}+\tilde{e}_{i,j-1}+\bar{e}_{i,j+1}+\bar{e}_{i+1,j})=0. (102)

We expand these errors by the discrete Fourier transformation, as

e~i,j=θΘc~(θ)Ψi,jθ,e¯i,j=θΘc¯(θ)Ψi,jθ,0i,jn,\tilde{e}_{i,j}=\sum_{\theta\in\Theta}\tilde{c}(\theta)\Psi^{\theta}_{i,j},\quad\bar{e}_{i,j}=\sum_{\theta\in\Theta}\bar{c}(\theta)\Psi^{\theta}_{i,j},\quad 0\leq i,j\leq n, (103)

where

Ψi,jθ=1n+1𝐞𝐢(iθ1+jθ2),𝐢2=1,θ=(θ1,θ2),\Psi^{\theta}_{i,j}=\frac{1}{n+1}\mathbf{e}^{\mathbf{i}(i\theta_{1}+j\theta_{2})},\quad\mathbf{i}^{2}=-1,\quad\theta=(\theta_{1},\theta_{2}), (104)

and

Θ={2πn+1(k,l)|n2k,ln+12,k,l}.\Theta=\left\{\frac{2\pi}{n+1}(k,l)\bigg{|}-\frac{n}{2}\leq k,l\leq\frac{n+1}{2},\quad k,l\in\mathbb{Z}\right\}. (105)

Substituting (103) into (102), we obtain

ζ(θ):=|c~(θ)c¯(θ)|=|𝐞𝐢θ1+𝐞𝐢θ24𝐞𝐢θ1𝐞𝐢θ2|.\zeta(\theta):=\left|\frac{\tilde{c}(\theta)}{\bar{c}(\theta)}\right|=\left|\frac{\mathbf{e}^{\mathbf{i}\theta_{1}}+\mathbf{e}^{\mathbf{i}\theta_{2}}}{4-\mathbf{e}^{\mathbf{i}\theta_{1}}-\mathbf{e}^{\mathbf{i}\theta_{2}}}\right|. (106)

The smoothing factor is defined as

ζρ=maxρπ|θ|πζ(θ),|θ|:=max(|θ1|,|θ2|),0<ρ<1.\zeta_{\rho}=\max_{\rho\pi\leq|\theta|\leq\pi}\zeta(\theta),\quad|\theta|:=\max(|\theta_{1}|,|\theta_{2}|),\quad 0<\rho<1. (107)

It can be shown that $\zeta_{1/2}=\zeta(\frac{\pi}{2},\arccos(\frac{4}{5}))=\frac{1}{2}$, which means the GS iteration can efficiently eliminate the high-frequency errors.

It is worth noting that $\zeta(0,0)=1$, which indicates that the lowest-frequency component will never be eliminated. In fact, the problem (99) does not determine a unique solution, since adding a constant to $\mu$ does not affect the relationship. For convenience, we only consider the error excluding the lowest-frequency component in the following.
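These properties of $\zeta$ are easy to check numerically. Below is a minimal sketch (assuming Python with NumPy; the helper `zeta` is ours) that evaluates the symbol (106) on the high-frequency region $\frac{\pi}{2}\leq|\theta|\leq\pi$ and at the particular points mentioned above.

```python
import numpy as np

def zeta(t1, t2):
    """Amplification of the Fourier mode theta = (t1, t2) under one GS sweep, Eq. (106)."""
    return np.abs((np.exp(1j * t1) + np.exp(1j * t2)) /
                  (4.0 - np.exp(1j * t1) - np.exp(1j * t2)))

t = np.linspace(-np.pi, np.pi, 401)
T1, T2 = np.meshgrid(t, t)
high = np.maximum(np.abs(T1), np.abs(T2)) >= 0.5 * np.pi      # region with rho = 1/2

print("zeta_{1/2} over the high-frequency region:", zeta(T1, T2)[high].max())   # about 0.5
print("zeta(pi/2, arccos(4/5)) =", zeta(np.pi / 2, np.arccos(0.8)))             # exactly 0.5
print("zeta(0, 0) =", zeta(0.0, 0.0))                                           # 1, never damped
```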

Based on this framework, we proceed to discuss the network correction step. Assume that the MIONet-based iterative step $\bar{\mathcal{M}}$ sufficiently learns the low-frequency modes, i.e.,

4[¯Ψθ]i,j([¯Ψθ]i1,j+[¯Ψθ]i,j1+[¯Ψθ]i,j+1+[¯Ψθ]i+1,j)Ψi,jθ,0<|θ|ρπ,4[\bar{\mathcal{M}}\Psi^{\theta}]_{i,j}-([\bar{\mathcal{M}}\Psi^{\theta}]_{i-1,j}+[\bar{\mathcal{M}}\Psi^{\theta}]_{i,j-1}+[\bar{\mathcal{M}}\Psi^{\theta}]_{i,j+1}+[\bar{\mathcal{M}}\Psi^{\theta}]_{i+1,j})\approx\Psi^{\theta}_{i,j},\quad 0<|\theta|\leq\rho\pi, (108)

where $0<\rho<1$ is a constant that determines the (low-)frequency components learned. Denote

¯Ψθ=φΘαθ(φ)Ψφ,\bar{\mathcal{M}}\Psi^{\theta}=\sum_{\varphi\in\Theta}\alpha_{\theta}(\varphi)\cdot\Psi^{\varphi}, (109)

we substitute (109) into (108) and similarly derive that

φΘ(42cos(φ1)2cos(φ2))αθ(φ)ΨφΨθ,0<|θ|ρπ,\sum_{\varphi\in\Theta}(4-2\cos(\varphi_{1})-2\cos(\varphi_{2}))\alpha_{\theta}(\varphi)\Psi^{\varphi}\approx\Psi^{\theta},\quad 0<|\theta|\leq\rho\pi, (110)

which leads to

(42cos(φ1)2cos(φ2))αθ(φ)δθφ(Diracdeltafunction),0<|θ|ρπ.(4-2\cos(\varphi_{1})-2\cos(\varphi_{2}))\alpha_{\theta}(\varphi)\approx\delta_{\theta}^{\varphi}\quad({\rm Dirac\ delta\ function}),\quad 0<|\theta|\leq\rho\pi. (111)

Denote

γθφ:=δθφ(42cos(φ1)2cos(φ2))αθ(φ),\gamma_{\theta}^{\varphi}:=\delta_{\theta}^{\varphi}-(4-2\cos(\varphi_{1})-2\cos(\varphi_{2}))\alpha_{\theta}(\varphi), (112)

then we suppose that $|\gamma_{\theta}^{\varphi}|$ is as small as needed for $0<|\theta|\leq\rho\pi$.

Now we consider the MIONet iteration step, i.e.,

μ~=μ¯+¯(r¯),\tilde{\mu}=\bar{\mu}+\bar{\mathcal{M}}(\bar{r}), (113)

or equivalently

e~=e¯¯(r¯),\tilde{e}=\bar{e}-\bar{\mathcal{M}}(\bar{r}), (114)

where $\tilde{e}=\mu-\tilde{\mu}$, $\bar{e}=\mu-\bar{\mu}$, and

r¯i,j=bi,j(4μ¯i,j(μ¯i1,j+μ¯i,j1+μ¯i,j+1+μ¯i+1,j))=4μi,j(μi1,j+μi,j1+μi,j+1+μi+1,j)(4μ¯i,j(μ¯i1,j+μ¯i,j1+μ¯i,j+1+μ¯i+1,j))=4e¯i,j(e¯i1,j+e¯i,j1+e¯i,j+1+e¯i+1,j).\begin{split}\bar{r}_{i,j}=&b_{i,j}-(4\bar{\mu}_{i,j}-(\bar{\mu}_{i-1,j}+\bar{\mu}_{i,j-1}+\bar{\mu}_{i,j+1}+\bar{\mu}_{i+1,j}))\\ =&4\mu_{i,j}-(\mu_{i-1,j}+\mu_{i,j-1}+\mu_{i,j+1}+\mu_{i+1,j})\\ &-(4\bar{\mu}_{i,j}-(\bar{\mu}_{i-1,j}+\bar{\mu}_{i,j-1}+\bar{\mu}_{i,j+1}+\bar{\mu}_{i+1,j}))\\ =&4\bar{e}_{i,j}-(\bar{e}_{i-1,j}+\bar{e}_{i,j-1}+\bar{e}_{i,j+1}+\bar{e}_{i+1,j}).\end{split} (115)

We expand $\tilde{e},\bar{e},\bar{r}$ as before, i.e.,

e~=θΘc~(θ)Ψθ,e¯=θΘc¯(θ)Ψθ,r¯=θΘ(42cos(θ1)2cos(θ2))c¯(θ)Ψθ,\tilde{e}=\sum_{\theta\in\Theta}\tilde{c}(\theta)\Psi^{\theta},\quad\bar{e}=\sum_{\theta\in\Theta}\bar{c}(\theta)\Psi^{\theta},\quad\bar{r}=\sum_{\theta\in\Theta}(4-2\cos(\theta_{1})-2\cos(\theta_{2}))\bar{c}(\theta)\Psi^{\theta}, (116)

then we have

θΘc~(θ)Ψθ=θΘc¯(θ)Ψθ¯(θΘ(42cos(θ1)2cos(θ2))c¯(θ)Ψθ)=θΘc¯(θ)ΨθθΘ(42cos(θ1)2cos(θ2))c¯(θ)φΘαθ(φ)Ψφ=θΘc¯(θ)ΨθθΘφΘ(42cos(φ1)2cos(φ2))c¯(φ)αφ(θ)Ψθ.\begin{split}\sum_{\theta\in\Theta}\tilde{c}(\theta)\Psi^{\theta}=&\sum_{\theta\in\Theta}\bar{c}(\theta)\Psi^{\theta}-\bar{\mathcal{M}}\left(\sum_{\theta\in\Theta}(4-2\cos(\theta_{1})-2\cos(\theta_{2}))\bar{c}(\theta)\Psi^{\theta}\right)\\ =&\sum_{\theta\in\Theta}\bar{c}(\theta)\Psi^{\theta}-\sum_{\theta\in\Theta}(4-2\cos(\theta_{1})-2\cos(\theta_{2}))\bar{c}(\theta)\sum_{\varphi\in\Theta}\alpha_{\theta}(\varphi)\cdot\Psi^{\varphi}\\ =&\sum_{\theta\in\Theta}\bar{c}(\theta)\Psi^{\theta}-\sum_{\theta\in\Theta}\sum_{\varphi\in\Theta}(4-2\cos(\varphi_{1})-2\cos(\varphi_{2}))\bar{c}(\varphi)\alpha_{\varphi}(\theta)\cdot\Psi^{\theta}.\end{split} (117)

Consequently,

c~(θ)=c¯(θ)φΘc¯(φ)(42cos(φ1)2cos(φ2))αφ(θ)=φΘc¯(φ)42cos(φ1)2cos(φ2)42cos(θ1)2cos(θ2)γφθ,|θ|>0.\begin{split}\tilde{c}(\theta)=&\bar{c}(\theta)-\sum_{\varphi\in\Theta}\bar{c}(\varphi)(4-2\cos(\varphi_{1})-2\cos(\varphi_{2}))\alpha_{\varphi}(\theta)\\ =&\sum_{\varphi\in\Theta}\bar{c}(\varphi)\cdot\frac{4-2\cos(\varphi_{1})-2\cos(\varphi_{2})}{4-2\cos(\theta_{1})-2\cos(\theta_{2})}\cdot\gamma^{\theta}_{\varphi},\quad|\theta|>0.\end{split} (118)
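Before returning to the algorithm, the relation between $\bar{r}$ and $\bar{e}$ used in (115)-(116), namely that the periodic 5-point stencil scales each Fourier mode $\Psi^{\theta}$ by $4-2\cos(\theta_{1})-2\cos(\theta_{2})$, can be checked numerically. Below is a minimal FFT-based sketch (assuming Python with NumPy; the grid size is arbitrary).

```python
import numpy as np

N = 64                                    # period n + 1 of the idealized periodic grid
e = np.random.randn(N, N)                 # an arbitrary periodic error field e_bar

# residual r_bar obtained by applying the 5-point stencil to e_bar, cf. Eq. (115)
r = 4 * e - (np.roll(e, 1, 0) + np.roll(e, -1, 0) + np.roll(e, 1, 1) + np.roll(e, -1, 1))

theta = 2 * np.pi * np.fft.fftfreq(N)     # theta_1, theta_2 on the Fourier grid
T1, T2 = np.meshgrid(theta, theta, indexing="ij")
symbol = 4 - 2 * np.cos(T1) - 2 * np.cos(T2)

# each Fourier coefficient of r_bar equals the coefficient of e_bar times the symbol
print(np.allclose(np.fft.fft2(r), symbol * np.fft.fft2(e)))   # True
```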

Returning to the algorithm of the hybrid iterative method, the network correction step is performed after $M-1$ GS iteration steps, so that $\bar{c}(\varphi)$ can be written as

c¯(φ)=c¯(M1)(φ),\bar{c}(\varphi)=\bar{c}^{(M-1)}(\varphi), (119)

then we have the estimate

|c¯(φ)|=|c¯(M1)(φ)|(ζρ)M1|c¯(0)(φ)|,|φ|ρπ,|\bar{c}(\varphi)|=|\bar{c}^{(M-1)}(\varphi)|\leq(\zeta_{\rho})^{M-1}|\bar{c}^{(0)}(\varphi)|,\quad|\varphi|\geq\rho\pi, (120)

and the low-frequency component $|\bar{c}^{(M-1)}(\varphi)|$ ($0<|\varphi|<\rho\pi$) has a smoothing factor $\zeta_{0}$ greater than $\zeta_{\rho}$, i.e.

|c¯(M1)(φ)|(ζ0)M1|c¯(0)(φ)|,ζρ<ζ0<1,0<|φ|<ρπ.|\bar{c}^{(M-1)}(\varphi)|\leq(\zeta_{0})^{M-1}|\bar{c}^{(0)}(\varphi)|,\quad\zeta_{\rho}<\zeta_{0}<1,\quad 0<|\varphi|<\rho\pi. (121)

Now denote

e(M):=φΘ,|φ|>0c~(φ)Ψφ,e(0):=φΘ,|φ|>0c¯(0)(φ)Ψφ,e^{(M)}:=\sum_{\varphi\in\Theta,|\varphi|>0}\tilde{c}(\varphi)\Psi^{\varphi},\quad e^{(0)}:=\sum_{\varphi\in\Theta,|\varphi|>0}\bar{c}^{(0)}(\varphi)\Psi^{\varphi}, (122)

then

e(M)2=|θ|>0|c~(θ)|2|θ|>0|c~(θ)||θ|>0|φ|>0|c¯(φ)||42cos(φ1)2cos(φ2)42cos(θ1)2cos(θ2)||γφθ|,\begin{split}\left\lVert e^{(M)}\right\rVert_{2}=&\sqrt{\sum_{|\theta|>0}|\tilde{c}(\theta)|^{2}}\\ \leq&\sum_{|\theta|>0}|\tilde{c}(\theta)|\\ \leq&\sum_{|\theta|>0}\sum_{|\varphi|>0}|\bar{c}(\varphi)|\cdot\left|\frac{4-2\cos(\varphi_{1})-2\cos(\varphi_{2})}{4-2\cos(\theta_{1})-2\cos(\theta_{2})}\right|\cdot|\gamma^{\theta}_{\varphi}|,\\ \end{split} (123)

by (118). With (120) and (121), we further derive that

e(M)2|φ|>0|c¯(M1)(φ)||θ|>0|42cos(φ1)2cos(φ2)42cos(θ1)2cos(θ2)||γφθ|0<|φ|<ρπ(ζ0)M1|c¯(0)(φ)||θ|>0|42cos(φ1)2cos(φ2)42cos(θ1)2cos(θ2)||γφθ|+|φ|ρπ(ζρ)M1|c¯(0)(φ)||θ|>0|42cos(φ1)2cos(φ2)42cos(θ1)2cos(θ2)||γφθ|((ζ0)M1ϵ+(ζρ)M1R)e(0)2,\begin{split}\left\lVert e^{(M)}\right\rVert_{2}\leq&\sum_{|\varphi|>0}|\bar{c}^{(M-1)}(\varphi)|\cdot\sum_{|\theta|>0}\left|\frac{4-2\cos(\varphi_{1})-2\cos(\varphi_{2})}{4-2\cos(\theta_{1})-2\cos(\theta_{2})}\right|\cdot|\gamma^{\theta}_{\varphi}|\\ \leq&\sum_{0<|\varphi|<\rho\pi}(\zeta_{0})^{M-1}|\bar{c}^{(0)}(\varphi)|\cdot\sum_{|\theta|>0}\left|\frac{4-2\cos(\varphi_{1})-2\cos(\varphi_{2})}{4-2\cos(\theta_{1})-2\cos(\theta_{2})}\right|\cdot|\gamma^{\theta}_{\varphi}|\\ &+\sum_{|\varphi|\geq\rho\pi}(\zeta_{\rho})^{M-1}|\bar{c}^{(0)}(\varphi)|\cdot\sum_{|\theta|>0}\left|\frac{4-2\cos(\varphi_{1})-2\cos(\varphi_{2})}{4-2\cos(\theta_{1})-2\cos(\theta_{2})}\right|\cdot|\gamma^{\theta}_{\varphi}|\\ \leq&\left((\zeta_{0})^{M-1}\cdot\epsilon+(\zeta_{\rho})^{M-1}\cdot R\right)\left\lVert e^{(0)}\right\rVert_{2},\end{split} (124)

where

{ϵ=0<|φ|<ρπ,|θ|>0|42cos(φ1)2cos(φ2)42cos(θ1)2cos(θ2)||γφθ|,R=|φ|ρπ,|θ|>0|42cos(φ1)2cos(φ2)42cos(θ1)2cos(θ2)||γφθ|,\begin{cases}\epsilon=\sum_{0<|\varphi|<\rho\pi,|\theta|>0}\left|\frac{4-2\cos(\varphi_{1})-2\cos(\varphi_{2})}{4-2\cos(\theta_{1})-2\cos(\theta_{2})}\right|\cdot|\gamma^{\theta}_{\varphi}|,\\ R=\sum_{|\varphi|\geq\rho\pi,|\theta|>0}\left|\frac{4-2\cos(\varphi_{1})-2\cos(\varphi_{2})}{4-2\cos(\theta_{1})-2\cos(\theta_{2})}\right|\cdot|\gamma^{\theta}_{\varphi}|,\end{cases} (125)

and $\epsilon$ is supposed to be small as discussed before. The relationship

e(M)2((ζ0)M1ϵ+(ζρ)M1R)e(0)2\left\lVert e^{(M)}\right\rVert_{2}\leq\left((\zeta_{0})^{M-1}\cdot\epsilon+(\zeta_{\rho})^{M-1}\cdot R\right)\left\lVert e^{(0)}\right\rVert_{2} (126)

immediately leads to the final upper bound of the average convergence rate

Rate(M)=ζ0(ϵζ0+Rζρ(ζρζ0)M1)1M.{\rm Rate}(M)=\zeta_{0}\cdot\left(\frac{\epsilon}{\zeta_{0}}+\frac{R}{\zeta_{\rho}}\cdot\left(\frac{\zeta_{\rho}}{\zeta_{0}}\right)^{M-1}\right)^{\frac{1}{M}}. (127)

According to the above deduction, we have studied the convergence rate of this hybrid iterative method for the 2-d problem with the GS iteration, and the result is consistent with Eq. (98). We expect to extend this discussion and obtain further theoretical results for more complicated cases, which we leave for future research due to space limitations.

5 Numerical experiments

The experiments are implemented in C++ with the Eigen library for sparse matrices and basic iterations, and the LibTorch library for model inference. The models used are pre-trained via PyTorch. The running devices are an Intel(R) Core(TM) i7-10750H CPU and an NVIDIA GeForce RTX 2070 GPU.

5.1 One-dimensional Poisson equation

Consider the 1-d Poisson equation

{(ku)=fin(0,1),u(0)=u(1)=0.\begin{cases}-\nabla\cdot(k\nabla u)=f\quad\mbox{in}\ (0,1),\\ u(0)=u(1)=0.\end{cases} (128)

To construct the dataset, we generate $k$ and $f$ by a Gaussian process (GP) with the RBF kernel: $k$ is generated by a GP with mean 1, std 0.2, and length scale 0.1, while $f$ is generated with mean 0, std 1, and length scale 0.1. We generate 5000 data points in total as the training set $\mathcal{T}=\{(k_{i},f_{i}),u_{i}\}_{i=1}^{5000}$. After training an MIONet on the dataset $\mathcal{T}$, we obtain an MIONet-based iteration step $\bar{\mathcal{M}}:\mathbb{R}^{n}\to\mathbb{R}^{n}$ (linear), where $n=48$ is the number of interior nodal points in the uniform grid for the interval $(0,1)$. Here the size of MIONet is $[50,100,100,100]$ for $k$'s branch net, $[50,100]$ for $f$'s branch net, and $[1,100,100,100]$ for the trunk net. Note that we remove the bias in $f$'s branch net and also the bias $b$ in Eq. (4) to guarantee the linearity w.r.t. $f$. Figure 4 shows an example of a pair of inputs $k$ and $f$ as well as the corresponding solution $u$ and the MIONet prediction.

Figure 4: An example of one prediction of MIONet for the 1-d Poisson equation.
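For illustration, such samples could be drawn as in the following minimal sketch (assuming Python with NumPy; the helper `gp_sample`, the Cholesky-based sampling and the choice of 50 sensor points are ours, and the actual data-generation code may differ).

```python
import numpy as np

def gp_sample(x, mean, std, length_scale, rng):
    """One GP sample with RBF kernel std^2 * exp(-|x - x'|^2 / (2 * length_scale^2))."""
    K = std**2 * np.exp(-0.5 * (x[:, None] - x[None, :])**2 / length_scale**2)
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(x)))   # small jitter for stability
    return mean + L @ rng.standard_normal(len(x))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)                            # sensor points of the branch nets
k = gp_sample(x, mean=1.0, std=0.2, length_scale=0.1, rng=rng)
f = gp_sample(x, mean=0.0, std=1.0, length_scale=0.1, rng=rng)
# the reference solution u_i of -(k u')' = f would then be computed by a standard FEM solve
```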

We then apply the $P_{1}$ element together with the Richardson iteration (with $\omega=\frac{h}{4}=\frac{1}{4\times 49}$) to solve this 1-d Poisson equation for $k=1$. The MIONet iteration $\bar{\mathcal{M}}$ is performed every $M=20$ steps to correct the low-frequency errors in the Richardson-MIONet method (we also show the convergence rate versus $M$ in Figure 5, which is consistent with the theoretical results in Eq. (98) and Figure 3). The error threshold for stopping the iteration is $1\times 10^{-14}$. We list the numbers of iterations of the Richardson and the Richardson-MIONet methods in Table 1. The number of iterations of the original Richardson method is 157 times that of the Richardson-MIONet method. Since the 1-d case converges too fast, we leave the study of the convergence time to the later 2-d experiment.

                    #Iterations   Speed up
Richardson          28396         /
Richardson-MIONet   181           ×157
Table 1: Comparison of the Richardson iterative method and the Richardson-MIONet hybrid iterative method for the 1-d Poisson equation. The number of iterations of the original Richardson method is 157 times that of the Richardson-MIONet method.
Figure 5: Convergence rate of the Richardson-MIONet method versus the correction period $M$. The tendency is consistent with the theoretical results in Eq. (98) and Figure 3.
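The structure of the Richardson-MIONet loop can be sketched as below (assuming Python with NumPy). Since the pre-trained MIONet is not reproduced here, the correction step is replaced by a hypothetical stand-in `low_freq_correction` that solves the residual equation exactly on a few low-frequency sine modes; it plays the same role of removing the low-frequency error every $M$ steps, while the true method calls the MIONet inference instead.

```python
import numpy as np

n = 48
h = 1.0 / (n + 1)
# P1 stiffness matrix of -u'' (k = 1) on the uniform grid, and the lumped load for f = 1
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / h
b = h * np.ones(n)
x = np.linspace(h, 1.0 - h, n)

omega, M, tol = h / 4.0, 20, 1e-14
V = np.array([np.sin(k * np.pi * x) for k in range(1, 6)]).T   # a few low-frequency modes

def low_freq_correction(r):
    """Hypothetical stand-in for the MIONet step: Galerkin solve on the low-frequency subspace."""
    return V @ np.linalg.solve(V.T @ A @ V, V.T @ r)

u = np.zeros(n)
for it in range(1, 200000):
    r = b - A @ u
    if np.linalg.norm(r) < tol:
        break
    u += low_freq_correction(r) if it % M == 0 else omega * r   # correction every M-th step
print("converged after", it, "iterations")
```

In the real implementation the smoother and the correction are written in C++ with Eigen and LibTorch, as described at the beginning of this section.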

We further observe the spectral error of the iterative methods, and the results are shown in Figure 6. The high-frequency errors are quickly eliminated by the Richardson iteration in both methods. However, the low-frequency errors are difficult to eliminate via the original Richardson method. In each correction step, MIONet efficiently removes the low-frequency errors. Although MIONet brings in new high-frequency errors, they are quickly cleared up. After about 180 steps, the Richardson-MIONet method has already achieved machine accuracy, while the original Richardson method still retains a significant error.

Figure 6: The spectral error of the iterative methods. The numerical value of each pixel is $\log_{10}|{\rm spectral\ error}|$. The high-frequency errors are quickly eliminated by the Richardson iteration in both methods. However, the low-frequency errors are difficult to eliminate via the original Richardson method. MIONet efficiently removes the low-frequency errors every $M=20$ steps. Although MIONet brings in new high-frequency errors, they are quickly cleared up. After about 180 steps, the Richardson-MIONet method has already achieved machine accuracy, while the original Richardson method still retains a significant error.
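The spectral error shown in Figure 6 can be monitored by expanding the current error in the sine eigenbasis of the 1-d problem. Below is a small sketch (assuming Python with NumPy; the helper `spectral_error` is ours).

```python
import numpy as np

def spectral_error(e, n, h):
    """log10 of the coefficients of the error e in the sine eigenbasis of the 1-d Laplacian."""
    k = np.arange(1, n + 1)
    S = np.sqrt(2.0 * h) * np.sin(np.outer(k, k) * np.pi * h)   # orthonormal sine transform
    return np.log10(np.abs(S @ e) + 1e-16)
```

Stacking these vectors over the iterations and plotting them as an image yields pictures of the type shown in Figure 6.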

5.2 Two-dimensional Poisson equation

Consider the 2-d Poisson equation

{(ku)=finΩ,u=0onΩ,\begin{cases}-\nabla\cdot(k\nabla u)=f&\quad\mbox{in}\;\Omega,\\ u=0&\quad\mbox{on}\;\partial\Omega,\end{cases} (129)

where $\Omega=(0,1)^{2}$. We generate $k$ and $f$ with the same setting as in the 1-d case, except that the length scale is set to 0.2. We generate 5000 data points in total as the training set $\mathcal{T}=\{(k_{i},f_{i}),u_{i}\}_{i=1}^{5000}$. After training an MIONet on the dataset $\mathcal{T}$, we obtain an approximate solution operator. Here the size of MIONet is $[100^{2},500,500,500]$ for $k$'s branch net, $[100^{2},500]$ for $f$'s branch net, and $[2,500,500,500]$ for the trunk net. We also remove the corresponding biases as in the 1-d case. An example of a pair of inputs $k$ and $f$ as well as the corresponding solution $u$ and the MIONet prediction is shown in Figure 7.

Figure 7: An example of one prediction of MIONet for the 2-d Poisson equation.

We then apply the $P_{1}$ element together with the Gauss-Seidel (GS) iteration to solve this 2-d Poisson equation with grid size $1025\times 1025$. The MIONet iteration $\bar{\mathcal{M}}:\mathbb{R}^{n}\to\mathbb{R}^{n}$ ($n=1023\times 1023$) is performed every $M$ steps to correct the low-frequency errors in the GS-MIONet method. The error threshold for stopping the iteration is $1\times 10^{-12}$. We list the numbers of convergence iterations and the convergence times of GS-MIONet for different $M$ in Tables 2 and 3. We also show the trends of the convergence steps and the convergence time versus $M$ in Figure 8. The best speed of the GS-MIONet method is nearly 30 times that of the original GS method at $M=8000$, while the number of convergence iterations reaches its minimum at $M=4000$. This gap exists since the inference time of MIONet accounts for a significant portion of the hybrid iteration. When $M>2556893$ (i.e. the number of convergence iterations of the GS method), MIONet only gives an initial guess during the hybrid iteration, which provides a $22\%$ acceleration.

M             GS        500    1000     2000    4000    8000    16000    32000    64000
#Iterations   2556893   div.   167001   94001   78471   82484   96001    160001   256001
Time (s)      27103     div.   2641     1222    923     919     1049     1722     2750
Speed up      /         div.   ×10.3    ×22.2   ×29.4   ×29.5   ×25.8    ×15.7    ×9.9
Table 2: The number of convergence iterations and the convergence time of the GS-MIONet hybrid iterative method for the 2-d Poisson equation versus the correction period $M$ (part 1). The size of the grid is $1025\times 1025$. "div." means the iteration diverges under the given $M$.
M             128000   256000   512000    1024000   2048000   4096000   8192000
#Iterations   384640   768001   1024001   1439557   2048001   2071460   2071460
Time (s)      4079     8173     10804     15467     22058     22181     22181
Speed up      ×6.6     ×3.3     ×2.5      ×1.75     ×1.23     ×1.22     ×1.22
Table 3: The number of convergence iterations and the convergence time of the GS-MIONet hybrid iterative method for the 2-d Poisson equation versus the correction period $M$ (part 2). The size of the grid is $1025\times 1025$. When $M>2556893$ (i.e. the number of convergence iterations of the GS method), MIONet only gives an initial guess during the hybrid iteration.
Figure 8: Performance of GS-MIONet versus the correction period $M$. We use the MIONet trained on the $100\times 100$ grid to perform the hybrid iteration on the $1025\times 1025$ grid. The number of convergence iterations reaches its minimum at $M=4000$, while the convergence time reaches its minimum at $M=8000$. The best convergence time of the GS-MIONet method is 919 seconds, which is nearly 30 times faster than the original GS method. When $M$ is large enough, MIONet only gives an initial guess during the hybrid iteration, which provides a $22\%$ acceleration.
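For reference, one lexicographic GS sweep for the 5-point stencil (99) can be sketched as follows (assuming Python with NumPy; in our experiments the sweep is implemented in C++ with Eigen). The hybrid driver is the same as in the 1-d case, with this sweep as the smoother and the MIONet correction inserted every $M$ sweeps.

```python
import numpy as np

def gs_sweep(u, b):
    """One in-place lexicographic GS sweep for 4 u_ij - (sum of the four neighbors) = b_ij."""
    n = u.shape[0]
    for i in range(1, n - 1):                 # interior points only, zero boundary kept fixed
        for j in range(1, n - 1):
            u[i, j] = 0.25 * (b[i, j] + u[i - 1, j] + u[i, j - 1] + u[i, j + 1] + u[i + 1, j])
    return u

# usage on a small boundary-padded grid (the experiments use 1025 x 1025)
u, b = np.zeros((65, 65)), np.ones((65, 65))
for _ in range(10):
    gs_sweep(u, b)
```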

Next we compare the hybrid iterative method to the multigrid method. We first solve this equation by multigrid methods with different numbers of levels, and then, for each case, we test the corresponding multigrid-MIONet hybrid iterative method. In this hybrid method, we regard a V-cycle of the multigrid method as one step, and insert one network correction step every $M$ V-cycles. The results are recorded in Table 4 and visually presented in Figure 9. We observe that the proposed hybrid iterative method can flexibly utilize the advanced multigrid method. The speed of GS-MIONet is between those of the multigrid methods of levels 3 and 4. There is an acceleration effect all the way up to level 6. For a high-level ($\geq 7$) multigrid method, the convergence is so fast that the inference time of MIONet exceeds the multigrid's convergence time; unsurprisingly, in such a case the hybrid method converges more slowly than the original multigrid method. However, in practice it is costly to construct a high-level multigrid hierarchy, while the use of the hybrid iterative method puts no extra burden on users as long as the pre-trained model is provided.

Method            GS      Mg2     Mg3     Mg4     Mg5    Mg6    Mg7    Mg8   Mg9   Mg10
Basic time (s)    27103   8880    2210    567     139    35     8.7    2.4   1.7   1.7
Hybrid time (s)   919     337     118     56      34     22     12.2   7.0   6.4   6.4
Speed up          ×29.5   ×26.4   ×18.7   ×10.1   ×4.1   ×1.6   /      /     /     /
Table 4: Multigrid-MIONet method. The time of the multigrid methods and the corresponding hybrid methods on the $1025\times 1025$ grid. The correction period $M$ is tuned for each hybrid case.
Figure 9: Multigrid-MIONet method for the 2-d Poisson equation on the $1025\times 1025$ grid. There is an acceleration effect all the way up to level 6, and the speed of GS-MIONet is between those of the multigrid methods of levels 3 and 4.
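The multigrid-MIONet driver described above can be summarized by the following skeleton (assuming Python with NumPy; `v_cycle`, `mionet_correction` and `residual` are hypothetical callables standing in for one V-cycle of the multigrid solver, the pre-trained model inference, and the residual computation, respectively).

```python
import numpy as np

def multigrid_mionet(u, b, residual, v_cycle, mionet_correction, M, tol, max_iter=10**6):
    """Hybrid driver: one V-cycle per step, one network correction every M V-cycles."""
    for it in range(1, max_iter + 1):
        r = residual(u, b)
        if np.linalg.norm(r) < tol:
            return u, it
        # the correction step replaces the V-cycle at every M-th iteration
        u = u + mionet_correction(r) if it % M == 0 else v_cycle(u, b)
    return u, max_iter
```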

5.3 Inhomogeneous boundary condition

Finally we consider the 2-d Poisson equation with inhomogeneous boundary condition:

{(ku)=finΩ,u=gonΩ,\begin{cases}-\nabla\cdot(k\nabla u)=f&\quad\mbox{in}\;\Omega,\\ u=g&\quad\mbox{on}\;\partial\Omega,\end{cases} (130)

where $\Omega=(0,1)^{2}$, and the boundary condition $g$ is also changing. We generate $k$ and $f$ with the same setting as in the previous 2-d case. The boundary data $g$ is regarded as a function with a period of 4 (anticlockwise along the boundary), which is generated by a GP with the exponential sine squared kernel, where we set the mean, std, length scale, and period to 0, 0.05, 1, and 4, respectively. We generate 5000 data points in total as the training set $\mathcal{T}=\{(k_{i},(f_{i},g_{i})),u_{i}\}_{i=1}^{5000}$. After training an MIONet on the dataset $\mathcal{T}$, we obtain an approximate solution operator. Here the size of MIONet is $[100^{2},500,500,500]$ for $k$'s branch net, $[100^{2},500]$ for $(f,g)$'s branch net, and $[2,500,500,500]$ for the trunk net. We remove the corresponding biases as before. An example of the inputs $k$, $f$ and $g$ as well as the corresponding solution $u$ and the MIONet prediction is shown in Figure 10.

Figure 10: An example of one prediction of MIONet for the 2-d Poisson equation with inhomogeneous boundary condition.
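For illustration, the boundary data could be drawn as in the following minimal sketch (assuming Python with NumPy; the helper `sample_g` and the Cholesky-based sampling are ours, and the actual code may differ), which samples $g$ along the arc-length parametrization of $\partial\Omega$ from a GP with the exponential sine squared kernel $\sigma^{2}\exp(-2\sin^{2}(\pi d/p)/\ell^{2})$.

```python
import numpy as np

def sample_g(n_pts=400, std=0.05, length_scale=1.0, period=4.0, seed=0):
    """One sample of g on the boundary parameter s in [0, 4), exponential sine squared kernel."""
    s = np.linspace(0.0, 4.0, n_pts, endpoint=False)    # anticlockwise arc-length parameter
    d = np.abs(s[:, None] - s[None, :])
    K = std**2 * np.exp(-2.0 * np.sin(np.pi * d / period)**2 / length_scale**2)
    L = np.linalg.cholesky(K + 1e-10 * np.eye(n_pts))   # small jitter for stability
    return s, L @ np.random.default_rng(seed).standard_normal(n_pts)

s, g = sample_g()   # g is drawn from a 4-periodic GP with mean 0
```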

We then apply the $P_{1}$ element together with the GS iteration to solve this equation on the $500\times 500$ uniform grid. The MIONet iteration $\bar{\mathcal{M}}:\mathbb{R}^{n}\to\mathbb{R}^{n}$ ($n=500\times 500$) is performed every $M=1600$ steps to correct the low-frequency errors in the GS-MIONet method. The error threshold for stopping the iteration is $1\times 10^{-12}$. We show the number of convergence iterations and the convergence time of the GS(-MIONet) method in Table 5. The speed of the GS-MIONet method is nearly 24 times that of the original GS method.

            #Iterations   Time (s)   Speed up
GS          672230        1652       /
GS-MIONet   20963         69         ×23.9
Table 5: Comparison of the GS iterative method and the GS-MIONet hybrid iterative method for the 2-d Poisson equation with inhomogeneous boundary condition. The size of the grid is $500\times 500$. The speed of the GS-MIONet method is 23.9 times that of the original GS method.

6 Conclusions

We have proposed a hybrid iterative method based on MIONet for PDEs, which combines the traditional numerical iterative solver and the neural operator, and we have systematically analyzed its theoretical properties, including the convergence condition, the spectral behavior, and the convergence rate, in terms of the errors of the discretization and the model inference. We have shown the theoretical results for Richardson (damped Jacobi) and Gauss-Seidel. An upper bound of the convergence rate of the hybrid method w.r.t. the correction period $M$ is given, which indicates an optimal $M$ that makes the hybrid iteration converge fastest. Several numerical examples, including the hybrid Richardson (Gauss-Seidel) iteration for the 1-d (2-d) Poisson equation, are presented to verify our theoretical results. In addition, we have tested the hybrid iteration utilizing the multigrid method; it still shows an excellent acceleration effect unless the level of multigrid is very high. Furthermore, we have achieved the hybrid iteration for the Poisson equation with inhomogeneous boundary condition, which has not been studied in the work of HINTS [41]. So far, we are able to deal with the Poisson equation in which all the parameters are changing. As a meshless acceleration method, it holds enormous potential for practical applications.

References

  • [1] A. Brandt. Multi-level adaptive solutions to boundary-value problems. Mathematics of computation, 31(138):333–390, 1977.
  • [2] A. Brandt. Algebraic multigrid theory: The symmetric case. Applied mathematics and computation, 19(1-4):23–56, 1986.
  • [3] S. C. Brenner. The mathematical theory of finite element methods. Springer, 2008.
  • [4] W. E, C. Ma, and L. Wu. Barron spaces and the compositional function spaces for neural network models. arXiv preprint arXiv:1906.08039, 2019.
  • [5] W. E and B. Yu. The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
  • [6] R. Eymard, T. Gallouët, and R. Herbin. Finite volume methods. Handbook of numerical analysis, 7:713–1018, 2000.
  • [7] G. Gupta, X. Xiao, and P. Bogdan. Multiwavelet-based operator learning for differential equations. Advances in neural information processing systems, 34:24048–24062, 2021.
  • [8] W. Hackbusch. Multi-grid methods and applications, volume 4. Springer Science & Business Media, 2013.
  • [9] J. He, X. Liu, and J. Xu. MgNO: Efficient parameterization of linear operators via multigrid. arXiv preprint arXiv:2310.19809, 2023.
  • [10] J. Hu and P. Jin. Experimental observation on a low-rank tensor model for eigenvalue problems. arXiv preprint arXiv:2302.00538, 2023.
  • [11] J. Huang, H. Wang, and H. Yang. Int-deep: A deep learning initialized iterative method for nonlinear problems. Journal of computational physics, 419:109675, 2020.
  • [12] Z. Jiang, M. Zhu, D. Li, Q. Li, Y. O. Yuan, and L. Lu. Fourier-MIONet: Fourier-enhanced multiple-input neural operators for multiphase modeling of geological carbon sequestration. arXiv preprint arXiv:2303.04778, 2023.
  • [13] P. Jin, S. Meng, and L. Lu. MIONet: Learning multiple-input operators via tensor product. SIAM Journal on Scientific Computing, 44(6):A3490–A3514, 2022.
  • [14] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021.
  • [15] N. Kovachki, S. Lanthaler, and S. Mishra. On universal approximation and error bounds for Fourier neural operators. The Journal of Machine Learning Research, 22(1):13237–13312, 2021.
  • [16] S. Lanthaler, S. Mishra, and G. E. Karniadakis. Error estimates for DeepONets: A deep learning framework in infinite dimensions. Transactions of Mathematics and Its Applications, 6(1):tnac001, 2022.
  • [17] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020.
  • [18] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020.
  • [19] L. Lu, P. Jin, and G. E. Karniadakis. DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193, 2019.
  • [20] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature machine intelligence, 3(3):218–229, 2021.
  • [21] L. Lu, R. Pestourie, W. Yao, Z. Wang, F. Verdugo, and S. G. Johnson. Physics-informed neural networks with hard constraints for inverse design. SIAM Journal on Scientific Computing, 43(6):B1105–B1132, 2021.
  • [22] J. Mandel. Algebraic study of multigrid methods for symmetric, definite problems. Applied mathematics and computation, 25(1):39–56, 1988.
  • [23] C. Marcati and C. Schwab. Exponential convergence of deep operator networks for elliptic partial differential equations. SIAM Journal on Numerical Analysis, 61(3):1513–1545, 2023.
  • [24] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville. On the spectral bias of neural networks. In International Conference on Machine Learning, pages 5301–5310. PMLR, 2019.
  • [25] M. A. Rahman, Z. E. Ross, and K. Azizzadenesheli. U-no: U-shaped neural operators. arXiv preprint arXiv:2204.11127, 2022.
  • [26] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
  • [27] B. Raonić, R. Molinaro, T. Rohner, S. Mishra, and E. de Bezenac. Convolutional Neural Operators. arXiv preprint arXiv:2302.01178, 2023.
  • [28] J. W. Ruge and K. Stüben. Algebraic multigrid. In Multigrid methods, pages 73–130. SIAM, 1987.
  • [29] J. Shen, T. Tang, and L.-L. Wang. Spectral methods: algorithms, analysis and applications, volume 41. Springer Science & Business Media, 2011.
  • [30] J. W. Siegel and J. Xu. High-order approximation rates for shallow neural networks with cosine and ReLUk{\rm ReLU}^{k} activation functions. Applied and Computational Harmonic Analysis, 58:1–26, 2022.
  • [31] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of computational physics, 375:1339–1364, 2018.
  • [32] G. D. Smith. Numerical solution of partial differential equations: finite difference methods. Oxford university press, 1985.
  • [33] L. N. Trefethen and D. Bau. Numerical linear algebra, volume 181. SIAM, 2022.
  • [34] U. Trottenberg, C. W. Oosterlee, and A. Schuller. Multigrid. Elsevier, 2000.
  • [35] S. Wang, H. Wang, and P. Perdikaris. Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Science advances, 7(40):eabi8605, 2021.
  • [36] Y. Wang, P. Jin, and H. Xie. Tensor neural network and its numerical integration. arXiv preprint arXiv:2207.02754, 2022.
  • [37] H. Wu, T. Hu, H. Luo, J. Wang, and M. Long. Solving High-Dimensional PDEs with Latent Spectral Models. arXiv preprint arXiv:2301.12664, 2023.
  • [38] J. Xu. Theory of multilevel methods, volume 8924558. Cornell University Ithaca, NY, 1989.
  • [39] J. Xu and L. Zikatanov. Algebraic multigrid methods. Acta Numerica, 26:591–721, 2017.
  • [40] Z.-Q. J. Xu, Y. Zhang, T. Luo, Y. Xiao, and Z. Ma. Frequency principle: Fourier analysis sheds light on deep neural networks. arXiv preprint arXiv:1901.06523, 2019.
  • [41] E. Zhang, A. Kahana, E. Turkel, R. Ranade, J. Pathak, and G. E. Karniadakis. A hybrid iterative numerical transferable solver (HINTS) for PDEs based on deep operator network and relaxation methods. arXiv preprint arXiv:2208.13273, 2022.