
SPFNO: Spectral operator learning for PDEs with Dirichlet and Neumann boundary conditions

Ziyuan Liu [email protected], Yuhang Wu [email protected], Daniel Zhengyu Huang [email protected], Hong Zhang [email protected], Xu Qian [email protected], Songhe Song [email protected]
Abstract

Neural operators have been validated as promising deep surrogate models for solving partial differential equations (PDEs). Despite the critical role of boundary conditions (BCs) in PDEs, however, only a limited number of neural operators robustly enforce these conditions. In this paper we introduce the semi-periodic Fourier neural operator (SPFNO), a novel spectral operator learning method for solving PDEs with non-periodic BCs. This method extends our previous work (arXiv:2206.12698), which showed significant improvements from employing enhanced neural operators that precisely satisfy the boundary conditions. However, the previous work is tied to Gaussian grids, restricting comprehensive comparisons across most public datasets. In response, we present numerical results of SPFNO for various PDEs, such as the viscous Burgers' equation, Darcy flow, incompressible pipe flow, and coupled reaction-diffusion equations. These results demonstrate the computational efficiency, resolution-invariance property, and BC-satisfaction behavior of the proposed model. An accuracy improvement of approximately 1.7X-4.7X over the non-BC-satisfying baselines is also achieved. Furthermore, our studies on SOL underscore the significance of satisfying BCs as a criterion for deep surrogate models of PDEs.

keywords:
neural operator, deep learning-based PDE solver, AI4Science, scientific machine learning, spectral method
journal: Elsevier
Affiliations: (1) Department of Mathematics, National University of Defense Technology, Changsha 410073, China; (2) Beijing International Center for Mathematical Research, Peking University, Beijing 100871, China; (3) State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China

{highlights}

Precisely satisfies the boundary conditions.

State-of-the-art (SOTA) performance on 5 publicly available PDE datasets.

$O(N\log N)$ time complexity and discretization invariance.

Flexibility across various tasks and general geometries.

1 Introduction

Partial Differential Equations (PDEs) play a pivotal role in various scientific and engineering fields, modeling phenomena such as heat conduction, fluid flow, electromagnetic waves, and quantum mechanics. Given that a substantial majority of PDEs lack analytical solutions, numerical methods, such as spectral methods and finite difference methods, are the primary means of solving PDEs. Recently, researchers have discovered that deep-learning methods can serve as more convenient and efficient alternatives to these traditional methods. At present, two primary deep learning methodologies are employed for solving PDEs: either directly approximating the solution using neural networks [1, 2, 3, 4, 5, 6, 7], e.g., deep Ritz methods [8, 9, 10], physics-informed neural networks (PINNs, [11, 12, 13, 14, 15, 16, 17, 18, 19]), and extreme learning machines [20]; or approximating the nonlinear operator between the input and output functions [21, 22, 23, 24, 25, 26, 27, 28, 29, 30], which is known as operator learning and is the focus of this paper.

The boundary conditions (BCs) of a PDE play a crucial role in defining the behavior of the system. Due to the significant physical information contained in BCs, there has been emerging interest in accurately enforcing these conditions when directly learning the solution using neural networks [31, 32, 33, 34]. Owing to the complexity and architectural differences involved in learning a nonlinear operator, however, the direct generalization of these methods to neural operators is non-trivial.


Figure 1: (a) Comparisons of the accuracy between SPFNO and non-BC-satisfying baseline models. The lower, the better. (b)–(e) Examples of numerical experiments pertaining to various PDE learning tasks.

The Fourier neural operator (FNO, [27, 35]) is a popular neural operator with applications in various fields [36, 37, 38, 39, 40], followed by many derivative neural operators that adopt a similar spectral-analysis backbone with the Fourier transform replaced. While the treatment of BCs is also the most crucial issue for spectral methods, it is hard for the majority of these neural operators to accurately enforce the most commonly used BCs, namely the Dirichlet, Neumann, and Robin BCs (also known as the first-, second-, and third-type BCs, respectively). By generalizing the spectral backbone of FNO using the techniques of spectral methods with a suitable basis, we introduced a general framework named spectral operator learning (SOL) in [41], in which the enhanced neural operators satisfy the BCs exactly. As the first SOL instance, the orthogonal polynomial neural operator (OPNO) showed state-of-the-art performance in solving PDEs with non-periodic BCs. Moreover, when solving the heat transfer equation with Robin BCs, it achieved an unprecedented relative $L^2$ norm error of 1e-6 among all implementations of neural operators. In addition, based on similar ideas, Bonev et al. [42] developed a spherical Fourier neural operator (SFNO) that strictly satisfies behavioral boundary conditions on the sphere [43], while the Boundary enforcing Operator Network (BOON, [44]) enforces the BCs on neural operators through a refinement procedure.

Meanwhile, there have been multiple studies on the application of the well-known transformer models to operator learning [45, 46], among which the recently developed latent spectral model (LSM, [47]) ranked first in solving multiple PDE datasets. Thus, transformers are significant challengers to neural operators in solving PDEs.

Unfortunately, the fast algorithms for the Chebyshev transformation in OPNO depend on Gaussian grids, while most public datasets only provide values on uniform grids, limiting comprehensive comparisons between the SOL architecture and non-BC-satisfying models. Introducing a novel SOL model with fast algorithms on a uniform grid and comparing it with state-of-the-art baselines across a wider range of datasets will undoubtedly strengthen the persuasiveness of our viewpoint. To address this technical issue, we introduce SPFNO, whose fast transformation algorithm runs in $O(N\log N)$ time on an $N$-point uniform grid, to solve PDEs with Dirichlet and Neumann BCs. The name SPFNO is derived from the Semi-Periodic Fourier Neural Operator and also represents the SPecified Fourier Neural Operator with non-periodic BCs. Its specified trigonometric bases allow the errors on the BCs to reach machine precision. In addition, SPFNO possesses the following appealing properties that are expected of a neural operator.

1. Invariance to discretization. Without the need for retraining, an SPFNO model trained on a coarse grid can directly predict the solution on any finer grid. This property is discussed in detail in Sec. 3.6.2.

2. Efficient and accurate spatial differentiation. Differentiating the output function of SPFNO requires only $O(N\log N)$ operations using the spectral method.

Comparisons of the performance and several examples are given in Fig. 1. All datasets used are already public, and the code, pre-trained model and reproducible training process are made publicly available at https://github.com/liu-ziyuan-math/SPFNO.

2 Methodology

2.1 Spectral Operator Learning

SOL is a specifically designed neural architecture [41] that involves learnable spectral operators induced by suitable basis functions and implemented through the corresponding transforms. Simultaneously, it emphasizes the concept of strictly satisfying BCs in deep surrogate models for PDEs. This architecture allows various techniques of spectral methods to be utilized in deep learning and leads to improved accuracy and physical validity of the neural operator models.

We now briefly introduce this architecture through the task of predicting time-dependent PDEs. Its application, however, is not limited to this scenario, and other cases can be found in the numerical experiments and Fig. 1. Consider the following PDE

$$u_t(x,t) + \mathcal{N}(u(x,t)) = 0, \quad x \in \Omega,\ t \in [0,T]$$

with initial-boundary conditions

$$u(x,0) = u_0(x), \quad x \in \Omega,$$
$$\mathcal{B}(u(x,t)) = 0, \quad x \in \partial\Omega, \qquad (1)$$

where $\mathcal{N}$ is a continuous operator and $\mathcal{B}$ is an operator corresponding to the specific BCs. The task is to learn the solution operator $\mathcal{S}_\tau$ which evolves the initial condition $u_0$ to the solution at time $\tau$, namely, $\mathcal{S}_\tau(u_0(x)) = u(x, t=\tau)$. We let $\{u_k(x)\}_{k\in\mathbb{N}}$ be a set of basis functions satisfying the BC (1) and $\mathcal{T}$ the linear transform induced by such a basis. Then the SOL model for $\mathcal{S}_\tau$ consists of a stack of neural spectral layers $\mathcal{L}^{(l)}$ that are induced by $\{u_k\}$ and $\mathcal{T}$, of the form

$$u^{(l+1)} = \sigma\left(W_l u^{(l)} + \mathcal{L}^{(l)} u^{(l)}\right) := \sigma\left(W_l u^{(l)} + \mathcal{T}^{-1} A_l \mathcal{T} u^{(l)}\right), \qquad (2)$$

where $\sigma$ is the nonlinear activation function, $W_l$ is an auxiliary pointwise shallow neural network, and $A_l$ is a learnable spectrum-wise matrix.
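To make (2) concrete, the following is a minimal PyTorch sketch of a single neural spectral layer for a 1D problem with Dirichlet BCs, where $\mathcal{T}$ is a DST-I realized by an FFT of the odd extension. The class name, weight initialization, and choice of activation are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dst1(u):
    # DST-I via an FFT of the odd extension (u holds interior values).
    N = u.shape[-1]
    z = torch.zeros_like(u[..., :1])
    ext = torch.cat([z, u, z, -u.flip(-1)], dim=-1)      # odd, length 2N + 2
    return -torch.fft.rfft(ext, dim=-1).imag[..., 1:N + 1]

def idst1(c):
    # DST-I is its own inverse up to the factor 2(N + 1).
    return dst1(c) / (2 * (c.shape[-1] + 1))

class SpectralLayer(nn.Module):
    """One SOL layer: u -> sigma(W u + T^{-1} A T u), cf. Eq. (2)."""
    def __init__(self, width, k_max):
        super().__init__()
        self.k_max = k_max
        self.A = nn.Parameter(0.02 * torch.randn(width, width, k_max))
        self.W = nn.Conv1d(width, width, kernel_size=1)   # pointwise path W_l

    def forward(self, u):                                 # u: (batch, width, N)
        coeffs = dst1(u)                                  # T u
        out = torch.zeros_like(coeffs)
        out[..., :self.k_max] = torch.einsum(             # A_l acts mode-wise
            "iok,bik->bok", self.A, coeffs[..., :self.k_max])
        return F.gelu(self.W(u) + idst1(out))             # sigma(W u + T^{-1} A T u)

layer = SpectralLayer(width=50, k_max=20)
print(layer(torch.randn(8, 50, 254)).shape)               # torch.Size([8, 50, 254])
```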

The form of architecture (2) was first demonstrated by Li et al. for FNO [27] and has since been adopted by multiple neural operators, such as the multiwavelet-based neural operator (MWT-NO, [48]), the wavelet neural operator (WNO, [49]), the spectral neural operator (SNO, [50]), the orthogonal polynomial neural operator (OPNO, [41]), the multi-channel IAE-Net [51], and so on. Some examples are listed in Tab. 1. While matching the BCs by selecting a suitable underlying basis is a fundamental criterion in spectral methods, this criterion was first introduced to operator learning in [41], and we specifically refer to those approaches that inherently satisfy the BCs of the problem as spectral operator learning. Thus, when solving PDEs with periodic BCs, FNO also falls into the category of SOL.

Table 1: The fundamental spectral bases, the specified BCs, and the types of grids used by neural operators based on spectral analysis. Only a subset of models is listed; SFNO, SPFNO, and OPNO are SOL models. When the FNO model is applied to periodic BCs, it also fulfills the definition of the SOL architecture.

| Model | MWT-NO | WNO | IAE-Net | FNO | SFNO | SPFNO | OPNO |
|---|---|---|---|---|---|---|---|
| Basis | multiwavelets | wavelets | integral | Fourier | spherical harmonics | trig-poly | Shen-poly |
| BCs | -- | -- | -- | periodic (implicitly) | sphere | Dirichlet, Neumann, mixed | Dirichlet, Neumann, Robin, mixed |
| Grids | uniform | uniform | uniform | uniform | spherical coordinates | uniform | Gaussian |

2.2 SPFNO

Figure 2: (a) Sketch of SPFNO, where $A_l$ is a learnable $b$-diagonal matrix with a truncation of $k_{max}$, and $b$ denotes the bandwidth. (b) Diagram of odd/even extensions. The Fourier expansions of odd/even extensions contain only sine or cosine components, respectively. (c) Sketch of the optional projection filter.

In this subsection, the domain $\Omega$ is restricted to the one-dimensional interval $[0,1]$ for convenience; the conclusions generalize easily to separable multi-dimensional domains. For Dirichlet or Neumann BCs, the basis of specified trigonometric polynomials with a "semi-period" is a competitive choice due to its fast transform algorithms. More concretely, for Dirichlet BCs it comprises the Fourier sine polynomials $u_k(x) = \sin k\pi x$, $x \in [0,1]$, together with the (discrete) sine transform; for Neumann BCs, the cosine polynomials $u_k(x) = \cos k\pi x$, $x \in [0,1]$, together with the (discrete) cosine transform.

The spectral methods built on these specified trigonometric polynomials have been investigated in [52, 53]. The SPFNO model is a natural derivative of these methods under the SOL framework, with $\mathcal{T}$ replaced by the sine or cosine transform.

The discrete sine/cosine transforms (DST/DCT) of the first kind take the discrete Fourier transform of the odd/even extension of the given input, respectively. With a slight abuse of notation, we denote by $\hat{f}$ the odd extension of a function $f \in C[0,1]$ if $f$ satisfies Dirichlet BCs, or the even extension if $f$ satisfies Neumann BCs:

$$\text{Odd: } \hat{f}(x) = \begin{cases} f(x), & x \in [0,1], \\ -f(2-x), & x \in [1,2]; \end{cases} \qquad \text{Even: } \hat{f}(x) = \begin{cases} f(x), & x \in [0,1], \\ f(2-x), & x \in [1,2]. \end{cases}$$

The solutions are thus discretized on uniform grids and the BCs are imposed on the boundary points; see Fig. 2b. The sine or cosine polynomials then form a basis for $\hat{f}$.
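As a small numerical illustration of these extensions (a sketch under our grid conventions, not the authors' code), the FFT of an odd extension contains only sine components and that of an even extension only cosine components:

```python
import numpy as np

N = 64
x = np.linspace(0.0, 1.0, N + 1)
f_dir = np.sin(np.pi * x) * x * (1 - x)            # satisfies Dirichlet BCs: f(0) = f(1) = 0
f_neu = np.cos(np.pi * x) + np.cos(3 * np.pi * x)  # satisfies Neumann BCs: f'(0) = f'(1) = 0

odd = np.concatenate([f_dir, -f_dir[-2:0:-1]])     # odd extension onto [0, 2)
even = np.concatenate([f_neu, f_neu[-2:0:-1]])     # even extension onto [0, 2)

print(np.abs(np.fft.rfft(odd).real).max())         # ~1e-15: no cosine components
print(np.abs(np.fft.rfft(even).imag).max())        # ~1e-14: no sine components
```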

Theorem 2.1.

Suppose $f \in C[0,1]$. The extended function $\hat{f}$ can be uniquely expanded in the cosine polynomials $\{\cos k\pi x\}_{k\in\mathbb{N}}$ if and only if $f$ satisfies the Neumann BCs, and can be uniquely expanded in the sine polynomials $\{\sin k\pi x\}_{k\in\mathbb{N}^+}$ if and only if $f$ satisfies the Dirichlet BCs.

The proof of Theorem 2.1 is presented in Sec. 2.3. Theorem 2.1 ensures the effectiveness of the DST and DCT as decomposition transforms when solving PDEs with the corresponding BCs. It also leads to the following conclusion:

Corollary 2.2.

The outputs of the SPFNO model strictly satisfy the Dirichlet and Neumann BCs, respectively.

Finally, the structure of SPFNO is given in Fig. 2. The projection filter illustrated in Fig. 2c is an optional component that eliminates the remaining tiny errors of the output with respect to the BCs. Generally, its influence on performance is minimal; for example, we present the results of SPFNO both with and without it in Sec. 3.5. It is worth noting that an FNO with a similar DFT projection filter satisfies periodic BCs exactly.
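A plausible realization of such a filter for Dirichlet BCs (our sketch; the released implementation may differ) re-expands the network output in the sine basis, which reproduces the interior values exactly while forcing the boundary values to exact zeros:

```python
import numpy as np
from scipy.fft import dst, idst

def dirichlet_projection(u):
    """u: values on a uniform grid over [0, 1], boundary points included."""
    coeffs = dst(u[1:-1], type=1)        # sine coefficients of the interior values
    v = np.empty_like(u)
    v[1:-1] = idst(coeffs, type=1)       # the interior round trip is exact
    v[0] = v[-1] = 0.0                   # every sin(k*pi*x) vanishes at x = 0, 1
    return v

u = np.random.rand(65)                   # a hypothetical output violating the BCs
print(dirichlet_projection(u)[[0, -1]])  # [0. 0.]
```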

2.3 Proof for Theorem 2.1

Since $f$ is typically a strong solution to a PDE in the implementation of SPFNO, it is reasonable to assume that $f$ possesses sufficient smoothness, such as $f \in H^1([0,1])$. For the following proof, however, we only need $f \in C[0,1]$, plus differentiability at the boundary in the Neumann case.

2.3.1 Case of $f$ satisfying Dirichlet BCs and $\hat{f}$ being its odd extension

First, we prove that if the odd extension $\hat{f}$ can be uniquely expanded in sine polynomials, i.e.,

$$\hat{f} = \sum_{k=1}^{\infty} b_k \sin k\pi x, \quad x \in [0,2],$$

then $f$ satisfies the Dirichlet BCs:

$$f(0) = f(1) = 0.$$

The proof is straightforward: substituting $x=0$ and $x=1$ gives

$$f(0) = \hat{f}(0) = 0 = \hat{f}(1) = f(1).$$

This equality follows from the fact that every sine term vanishes at these points, and a countable sum of zeros is zero. On the other hand, let $f \in C[0,1]$ be an arbitrary function satisfying homogeneous Dirichlet BCs, i.e.,

$$f(0) = f(1) = 0,$$

its odd extension is of the form

$$\hat{f}(x) = \begin{cases} f(x), & x \in [0,1], \\ -f(2-x), & x \in [1,2], \end{cases}$$

which means that $\hat{f}$ is a continuous function on a closed interval. Consequently, the Weierstrass approximation theorem for trigonometric series implies that the Fourier series of $\hat{f}$ converges to $\hat{f}$ uniformly; it is of the form

$$\hat{f}(x) = \sum_{m=0}^{\infty} a_m \cos m\pi x + \sum_{n=1}^{\infty} b_n \sin n\pi x, \quad x \in [0,2], \qquad (3)$$

where

$$a_m = \int_0^2 \hat{f}(x)\cos m\pi x\,\mathrm{d}x, \quad m \in \mathbb{N}, \qquad b_n = \int_0^2 \hat{f}(x)\sin n\pi x\,\mathrm{d}x, \quad n \in \mathbb{N}^+. \qquad (4)$$

It then follows that

$$\hat{f}(x) - \sum_{n=1}^{\infty} b_n \sin n\pi x = \sum_{m=0}^{\infty} a_m \cos m\pi x, \quad x \in [0,2]. \qquad (5)$$

Note that the left-hand side of Eq. (5) is an odd function while the right-hand side is even; the only possibility is that both sides vanish identically. As a result,

$$\hat{f}(x) = \sum_{n=1}^{\infty} b_n \sin n\pi x,$$

where $b_n$ is determined by Eq. (4).

2.3.2 Case of $f$ satisfying Neumann BCs and $\hat{f}$ being its even extension

The proof of this part is analogous to Sec. 2.3.1. Assume that $f \in C[0,1]$ is differentiable at $x=0$ and $x=1$. On the one hand, suppose the even extension $\hat{f}$ can be uniquely expanded in cosine polynomials, i.e.,

$$\hat{f} = \sum_{k=0}^{\infty} a_k \cos k\pi x, \quad x \in [0,2]. \qquad (6)$$

Note that $\hat{f}$ is an even extension. Consequently, we have

$$f'(1) = \hat{f}'(1) = \lim_{h\to 0}\frac{\hat{f}(1+h)-\hat{f}(1-h)}{2h} = \lim_{h\to 0}\frac{f(1-h)-f(1-h)}{2h} = 0.$$

Moreover, since Eq. (6) holds, we can further extend $\hat{f}$ to $\mathbb{R}$ with period $2$; denote this extension by $\tilde{f}$. The periodicity of $\tilde{f}$ yields the following at $x=0$:

$$f'(0) = \tilde{f}'(0) = \lim_{h\to 0}\frac{\tilde{f}(0+h)-\tilde{f}(0-h)}{2h} = \lim_{h\to 0}\frac{f(0+h)-\hat{f}(2-h)}{2h} = \lim_{h\to 0}\frac{f(h)-f(h)}{2h} = 0.$$

So the Neumann BCs are satisfied by ff.

On the other hand, similarly to Sec. 2.3.1, the Fourier series of $\hat{f}$ converges to $\hat{f}$ uniformly, so Eqs. (3) and (4) also hold for the even extension $\hat{f}$ when $f$ satisfies Neumann BCs. It then follows that

$$\hat{f}(x) - \sum_{m=0}^{\infty} a_m \cos m\pi x = \sum_{n=1}^{\infty} b_n \sin n\pi x \equiv 0, \quad x \in [0,2].$$

As a result,

$$\hat{f}(x) = \sum_{m=0}^{\infty} a_m \cos m\pi x,$$

where $a_m$ is determined by Eq. (4).

Corollary 2.3.

The sets of specified trigonometric functions $\{\cos k\pi x\}_{k\in\mathbb{N}}$ and $\{\sin k\pi x\}_{k\in\mathbb{N}^+}$ form orthonormal bases for the corresponding function spaces, namely the even extensions $\{\hat{f}_{\mathrm{even}} \mid f \in C[0,1],\ f'(0)=f'(1)=0\}$ and the odd extensions $\{\hat{f}_{\mathrm{odd}} \mid f \in C[0,1],\ f(0)=f(1)=0\}$, respectively.

Proof.

Since Theorem 2.1 has shown that the specified trigonometric functions form a basis for the corresponding function space, it remains to prove the orthonormality of the normalized bases. A simple calculation yields

$$\int_0^2 \sin k\pi x\,\sin m\pi x\,\mathrm{d}x = -\frac{1}{2}\int_0^2 \big[\cos\pi(k+m)x - \cos\pi(k-m)x\big]\,\mathrm{d}x = \delta_{k,m},$$
$$\int_0^2 \cos k\pi x\,\cos m\pi x\,\mathrm{d}x = \frac{1}{2}\int_0^2 \big[\cos\pi(k+m)x + \cos\pi(k-m)x\big]\,\mathrm{d}x = \delta_{k,m},$$
$$\int_0^2 \sin k\pi x\,\cos m\pi x\,\mathrm{d}x = \frac{1}{2}\int_0^2 \big[\sin\pi(k+m)x + \sin\pi(k-m)x\big]\,\mathrm{d}x = 0.$$
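These relations are easily confirmed numerically; a quick trapezoidal-quadrature check (our sketch) for a few modes:

```python
import numpy as np

x = np.linspace(0.0, 2.0, 4001)
ip = lambda f, g: np.trapz(f * g, x)        # L2 inner product on [0, 2]
s = lambda k: np.sin(k * np.pi * x)
c = lambda k: np.cos(k * np.pi * x)

print(ip(s(3), s(3)), ip(s(3), s(5)))       # ~1.0, ~0.0
print(ip(c(2), c(2)), ip(c(2), c(4)))       # ~1.0, ~0.0
print(ip(s(3), c(2)))                       # ~0.0
```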

3 Numerical experiments

In order to verify the accuracy and efficiency of the SOL architecture and the importance of the BC-satisfying property for neural operators, we compare SPFNO with multiple popular but non-BC-satisfying architectures by solving the following five problems on publicly available datasets: (1) 1D and (2) 2D Burgers' equations with Neumann BCs; (3) the Darcy flow problem with Dirichlet BCs; (4) the coupled reaction-diffusion equations with Neumann BCs; and (5) 2D incompressible flow through a pipe. These five problems comprehensively cover the common operator learning tasks that we are primarily concerned with:

1. single-step predictions for time-dependent PDEs;

2. multi-step predictions for time-dependent PDEs;

3. solving parametric PDEs with variable coefficients as input;

4. autoregressive learning for time-dependent PDEs;

5. learning the solution to PDEs on general geometries.

The accuracy of the models is measured by the average relative $L^2$ norm error (also known as the relative root mean square error, RMSE) between the predicted and reference solutions, together with the $L^\infty$ norm error on the corresponding BCs. The maximum $L^2$ norm error over the test dataset, which represents the empirical worst-case performance and is crucial for assessing the credibility of models, is also taken into consideration.
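For concreteness, the following sketch shows one plausible realization of these metrics for a batch of 1D solutions with Neumann BCs; the placeholder data and the one-sided difference used for the boundary derivative are our assumptions.

```python
import torch

def relative_l2(pred, ref):
    # relative L2 norm error per test sample; mean/max give average and worst case
    return (pred - ref).flatten(1).norm(dim=1) / ref.flatten(1).norm(dim=1)

def neumann_bc_error(pred, h):
    # L-infinity error of u_x on the boundary, via first-order one-sided differences
    left = (pred[:, 1] - pred[:, 0]) / h
    right = (pred[:, -1] - pred[:, -2]) / h
    return torch.max(left.abs().max(), right.abs().max())

pred, ref = torch.rand(100, 256), torch.rand(100, 256)   # placeholder data
errs = relative_l2(pred, ref)
print(errs.mean().item(), errs.max().item())              # average and worst-case error
print(neumann_bc_error(pred, h=2.0 / 255).item())         # BC error on a [-1, 1] grid
```

The baseline models are listed below.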

1. FNO [27] is a state-of-the-art neural operator for parametric PDEs, especially those involving periodic BCs. It has many applications and has achieved impressive accuracy in practice due to its architecture, which resembles that of a spectral method. In the experiments, zero-padding is applied to the input of FNO to improve accuracy under non-periodic BCs.

2. OPNO [41] is the first proposed SOL method for non-periodic BCs such as Dirichlet, Neumann, and Robin BCs. It provided the first numerical example verifying that a deep-learning-based surrogate model can be competitive in accuracy with numerical methods, with relative errors reaching the order of $10^{-6}$. Note that OPNO is tested if and only if the dataset provides data on Chebyshev-Gauss-Lobatto grids.

3. U-Net [54] is a popular autoencoding deep learning architecture that combines convolutional and deconvolutional layers. It has proven to be a powerful model for image segmentation tasks. It is also used as a baseline model in the PDEBench datasets and demonstrates considerable accuracy on specific PDE tasks.

4. LSM [47] is a cutting-edge transformer-based neural PDE solver consisting of an autoencoding backbone and innovative neural spectral blocks. It successfully trained a neural network of considerable depth and outperformed 14 existing models, including neural operators, autoencoders, and transformers, across 7 PDE-solving tasks. In this paper, the LSM model also serves as the most representative non-neural-operator model for PDEs known to us.

Table 2: Hyperparameter configurations for the SPFNO networks and the training process in the following experiments. The number of modes, depth of the spectral backbone, width of the channel domain, and bandwidth of $A_l$ are denoted by $k_{max}$, $L$, $W$, and $b$, respectively.

| Experiment | epochs | $L$ | $k_{max}$ | $W$ | $b$ | batch size | weight decay | scheduler |
|---|---|---|---|---|---|---|---|---|
| 1D-Burgers | 5000 | 4 | 20 | 50 | 4 | 20 | 1e-4 | StepLR |
| 2D-Burgers | 3000 | 4 | 16 | 24 | 4 | 20 | 1e-4 | StepLR |
| 2D React-Diff | 500 | 4 | 24 | 24 | 1 | 5 | 1e-4 | StepLR |
| Darcy flow | 500 | 4 | 24 | 32 | 1 | 20 | 1e-6 | Plateau |
| Corrected pipe | 500 | 4 | 24 | 32 | (3, 1) | 20 | 1e-6 | Plateau |

In addition, the BOON [44] model is an alternative and, to the best of our knowledge, the only other type of BC-satisfaction technique distinct from SOL. It applies a corrective projection that enforces the BCs on each spectral layer of a given neural operator, instead of using a basis that inherently satisfies the BCs. It has also been shown that this BC-satisfying correction significantly increases the accuracy of neural operators. In Sec. 3.6, we select three representative problems from the BOON datasets [44] as supplementary experiments: (6.a) the 1D Burgers' equation with Dirichlet BCs, on which BOON showed its highest performance improvement (30X) over its baseline; (6.b) the 1D heat equation with time-dependent BCs; and (6.c) the 2D wave equation with Neumann BCs. The selected datasets precisely cover all the model dimensions considered in [44] (1D, 1D+time, 2D+time). In addition, the currently available portion of the BOON code allows us to include it in the comparison in Experiment 1.

Except for Sec. 3.6.2, all models are evaluated at the same resolution as used during training. They are trained using the Adam optimizer and the relative $L^2$ (RMSE) loss function, with the same weight decay and number of epochs, and the random seeds fixed to 0 for fairness. For the neural operators, considering their similar spectral structures, we choose the same number of modes (denoted by $k_{max}$), depth of the spectral backbone ($L$), width of the channel domain ($W$), and mini-batch size. It is very common in spectral methods to compare the accuracy of various spectral bases while controlling the number of modes; a representative example is found in Boyd's work [55], where the number of modes needed to discretize a given function was used as a decisive criterion for comparing spectral methods with different bases, such as Chebyshev, Fourier, and spherical harmonics. The non-neural-operator baselines are also finely tuned, and only the best performance of baselines of comparable scale is reported. All experiments are performed on an Nvidia A100 80GB GPU.
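A sketch of this shared training configuration for the 1D Burgers' case is given below; the stand-in model, synthetic data, learning rate, and StepLR parameters are illustrative assumptions, while the seed, batch size, weight decay, and epoch count follow Tab. 2.

```python
import torch

torch.manual_seed(0)                                    # seeds fixed to 0 for fairness
model = torch.nn.Conv1d(1, 1, 5, padding=2)             # stand-in for a neural operator
data = [(torch.randn(20, 1, 256), torch.randn(20, 1, 256))
        for _ in range(5)]                              # synthetic mini-batches of size 20

# lr and the StepLR parameters below are illustrative; Tab. 2 fixes only the
# epochs, batch size, weight decay, and scheduler type.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.5)

def rel_l2_loss(pred, ref):                             # the relative L2 (RMSE) loss
    return ((pred - ref).flatten(1).norm(dim=1)
            / ref.flatten(1).norm(dim=1)).mean()

for epoch in range(5000):                               # 5000 epochs as in Tab. 2
    for u0, u1 in data:
        optimizer.zero_grad()
        rel_l2_loss(model(u0), u1).backward()
        optimizer.step()
    scheduler.step()
```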

3.1 Experiment 1: 1D viscous Burgers’ equation with Neumann BCs

We first consider the one-dimensional viscous Burgers' equation

$$\frac{\partial u}{\partial t}(x,t) + u(x,t)\frac{\partial u}{\partial x}(x,t) = \nu\frac{\partial^2 u}{\partial x^2}(x,t), \quad x \in \Omega$$

subject to the initial-boundary conditions

$$u(x,0) = u_0(x), \quad x \in \Omega,$$
$$\frac{\partial u}{\partial x}(x,t) = 0, \quad x \in \partial\Omega,$$

where $\Omega = [-1,1]$. Burgers' equation is a fundamental PDE with applications in modeling turbulence, nonlinear acoustics, and traffic flow. The complexity of the dynamical system it describes poses challenges for the learning of deep models, so it has been adopted as one of the most popular benchmark problems in the field of AI4Science.

The dataset used in this experiment is sourced from the OPNO paper [41], where $\nu$ is fixed to $0.1/\pi$ and the task is to learn the solution operator $S_1: u_0(x) \to u(x, t=1)$. Because the input functions in the PDE training and test datasets are usually drawn from the same distribution, the sample generator should be able to approximate any function in the input space and possess a sufficient number of degrees of freedom in order to ensure the generalization of the trained model. In this dataset, the initial condition $u_0(x)$ is generated by sampling from a Gaussian random field, $u_0 \sim \mathcal{N}(0, 625(-4\Delta + 25I)^{-2})$, with Neumann BCs to maintain the complexity of the sample space.
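One way to sample such a field (a sketch via a truncated Karhunen-Loeve expansion in the Neumann cosine eigenbasis; the mode truncation and normalization conventions are ours and may differ from the dataset generator, and the constant mode is omitted for simplicity):

```python
import numpy as np

def sample_u0(N=1024, n_modes=256, seed=0):
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, N)
    k = np.arange(1, n_modes + 1)
    lam = (k * np.pi / 2.0) ** 2           # Neumann eigenvalues of -Laplacian on [-1, 1]
    std = 25.0 / (4.0 * lam + 25.0)        # sqrt of the KL variances 625 (4 lam + 25)^(-2)
    xi = rng.standard_normal(n_modes)
    phi = np.cos(np.outer(k * np.pi / 2.0, x + 1.0))   # Neumann eigenfunctions
    return (std * xi) @ phi                # each draw satisfies u0'(+-1) = 0

u0 = sample_u0()
print(u0.shape)                            # (1024,)
```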

As we concluded in [41], the BC-satisfying property is crucial for enhancing the accuracy and credibility of surrogate models for PDEs. One can reasonably anticipate that other BC-satisfying neural operators should also demonstrate notable superiority, even when employed on different grids, since neural operators are specifically designed to learn the underlying operator rather than the values of a function on a particular discrete grid.

Our results first rule out the suspicion that the advantage of SPFNO over FNO arises from a larger bandwidth rather than from BC satisfaction. This is illustrated in Fig. 3, which also indicates that the larger-bandwidth technique is not suitable for FNO; the bandwidth of FNO is therefore kept at 1.

The results presented in Tab. 3 and Fig. 3 show that SPFNO achieves comparable average accuracy and superior worst-case performance when compared with the previous SOL model, OPNO. On the other hand, it significantly outperforms all non-BC-satisfying baseline models as well as the BC-satisfying BOON model. Although the FNO and LSM models achieve acceptable global errors in some cases, their errors on the BCs can be several orders of magnitude larger.

Table 3: Evaluation of (a) the relative $L^2$ norm error ($\times 10^{-2}$) and worst error ($\times 10^{-2}$), and (b) the $L^\infty$ error on the Neumann BCs, for the 1D Burgers' equation at resolution $N$. The best result is in bold and the second best in italics. The SOL models (SPFNO and OPNO), which achieve the SOTA performance, are listed in the last rows.

(a) Relative $L^2$ norm error and worst error ($\times 10^{-2}$)

| Model | $L^2$ (N=256) | worst (N=256) | $L^2$ (N=1024) | worst (N=1024) | $L^2$ (N=4096) | worst (N=4096) |
|---|---|---|---|---|---|---|
| FNO | 1.57 | 14.9 | 1.68 | 14.1 | 1.69 | 16.6 |
| U-Net | 6.27 | 56.9 | 27.6 | 126.3 | 33.0 | 146.3 |
| LSM | 4.87 | 47.2 | 21.1 | 108.6 | 38.7 | 167.7 |
| BOON | 1.20 | 10.1 | 1.28 | 10.2 | 1.42 | 10.0 |
| OPNO | **0.770** | *4.63* | **0.781** | *4.40* | **0.782** | *3.86* |
| SPFNO | *0.868* | **3.58** | *0.862* | **3.55** | *0.873* | **3.65** |

(b) $L^\infty$ error on the BCs

| Model | N=256 | N=1024 | N=4096 |
|---|---|---|---|
| FNO | 2.9E-1 | 4.1E-1 | 5.5E-1 |
| U-Net | 1.1E-2 | 5.7E-2 | 8.9E-2 |
| LSM | 4.8E-1 | 5.1E-1 | 9.8E-1 |
| BOON | 0 | 0 | 0 |
| OPNO | 6.0E-12 | 1.1E-10 | 1.9E-9 |
| SPFNO | 0 | 0 | 0 |

Moreover, it is noteworthy that the BOON model also performs better than any non-BC-satisfying model. This further supports our viewpoint on the BC-satisfaction of neural operators, as it is corroborated across different neural architectures.

Figure 3: Comparisons of (a) the relative $L^2$ errors and (b) the max errors for various bandwidths $b$ on the test dataset of the 1D Burgers' equation. While the numerical experiments demonstrate that the quasi-diagonalizing technique improves the performance of SPFNO, it cannot substitute for the BC-satisfaction property, seeing that an FNO with a $b$-diagonal learnable matrix does not obtain a substantial accuracy improvement.

3.2 Experiment 2: 2D viscous Burgers’ equation with Neumann BCs

With the computational domain set to Ω=[1,1]2\Omega=[-1,1]^{2}, the following 2D Burgers’ equation is considered in this experiment:

$$\begin{cases} \dfrac{\partial u}{\partial t} + u\dfrac{\partial u}{\partial x} + u\dfrac{\partial u}{\partial y} = \nu\left(\dfrac{\partial^2}{\partial x^2} + \dfrac{\partial^2}{\partial y^2}\right)u, \\ \dfrac{\partial u}{\partial \mathbf{n}}(x,t) = 0, \quad x \in \partial\Omega. \end{cases}$$

In this experiment, the output of the target operator consists of solutions at multiple fixed time steps, i.e., $S_{\tau_1,\dots,\tau_n}: u_0 \mapsto \{u(\cdot,\tau_1),\dots,u(\cdot,\tau_n)\}$, so that the time-dependent PDE can be solved efficiently with a single forward propagation. As a result, both the operator and the task are more complicated than in the 1D case. We take the dataset from [41] and choose a subset of the time discretization by fixing $\{\tau_i\} = \{0.2, 0.6, 1.0\}$. In this dataset, $u_0$ is generated according to the Gaussian random field $\mathcal{N}(0, 16(\Delta + 16I)^{-2})$. During training, the bandwidth of the learnable matrix $A_l$ is also set to 4. The results are illustrated in Tab. 4, where the results of the different SOL models are again very close and significantly outperform the baseline models.

Table 4: Evaluation of (a) the relative $L^2$ norm error ($\times 10^{-2}$) and worst error ($\times 10^{-2}$), and (b) the $L^\infty$ error on the Neumann BCs, for the 2D Burgers' equation at resolution $N \times N$. The best result is in bold and the second best in italics.

(a) Relative $L^2$ norm error and worst error ($\times 10^{-2}$)

| Model | $L^2$ (N=50) | worst (N=50) | $L^2$ (N=100) | worst (N=100) | $L^2$ (N=200) | worst (N=200) |
|---|---|---|---|---|---|---|
| FNO | 0.528 | 9.02 | 0.589 | 10.03 | 0.672 | 9.71 |
| U-Net | 1.64 | 16.0 | 2.31 | 17.6 | 2.28 | 17.1 |
| LSM | 2.43 | 9.23 | 2.93 | 12.4 | 3.19 | 11.6 |
| OPNO | **0.371** | **3.37** | **0.336** | **3.68** | **0.335** | **3.68** |
| SPFNO | *0.386* | *4.98* | *0.378* | *5.00* | *0.378* | *5.05* |

(b) $L^\infty$ error on the BCs

| Model | N=50 | N=100 | N=200 |
|---|---|---|---|
| FNO | 1.6E-1 | 3.6E-1 | 7.8E-1 |
| U-Net | 4.2E-2 | 1.3E+0 | 4.2E+0 |
| LSM | 1.9E-1 | 4.1E-1 | 1.6E+0 |
| OPNO | 2.9E-12 | 2.0E-12 | 7.9E-12 |
| SPFNO | 0 | 0 | 0 |

3.3 Experiment 3: Coupled 2D Reaction–Diffusion Equations with Neumann BCs

The coupled reaction-diffusion (Allen-Cahn) equations are formulated as follows

$$\begin{cases} \dfrac{\partial u}{\partial t} = d_u\dfrac{\partial^2 u}{\partial x^2} + d_u\dfrac{\partial^2 u}{\partial y^2} + R_u(u,v), \\ \dfrac{\partial v}{\partial t} = d_v\dfrac{\partial^2 v}{\partial x^2} + d_v\dfrac{\partial^2 v}{\partial y^2} + R_v(u,v), \end{cases}$$

where

$$R_u(u,v) = u - u^3 - k - v, \qquad R_v(u,v) = u - v, \qquad d_u = 0.001, \quad d_v = k = 0.005.$$

The nonlinearly coupled variables $u$ and $v$ represent the activator and the inhibitor of the system, respectively, and Neumann BCs are imposed on both. The dataset is provided by PDEBench [56], a comprehensive benchmark suite for scientific machine learning. Since the data are given on a staggered uniform grid, direct sub-sampling would yield an unintended non-uniform grid, so we only perform the experiment at the original resolution of $128 \times 128$.

In [56], models are trained and evaluated in an autoregressive fashion, where the output at each time step serves as the input for the next step of the time series. This approach is often applied when predicting time-dependent PDEs [27] and global weather [36] with neural operators, but it may lead to training instability and GPU memory limitations. The results are shown in Tab. 5(a), where SPFNO achieves the lowest errors.
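A minimal sketch of this autoregressive protocol (the stand-in model and tensor shapes are placeholders):

```python
import torch

@torch.no_grad()
def rollout(model, u_init, n_steps):
    traj, u = [u_init], u_init
    for _ in range(n_steps):
        u = model(u)                   # each one-step prediction becomes the next input
        traj.append(u)
    return torch.stack(traj, dim=1)    # (batch, n_steps + 1, channels, H, W)

model = torch.nn.Conv2d(2, 2, 3, padding=1)   # stand-in; 2 channels for (u, v)
u_init = torch.randn(1, 2, 128, 128)
print(rollout(model, u_init, 10).shape)       # torch.Size([1, 11, 2, 128, 128])
```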

Furthermore, it should be noted that the difference in efficiency between SPFNO and FNO is mainly due to the constant factor arising from performing the FFT on odd/even extensions of length $2N$ for the DST/DCT. By employing optimized butterfly operations, however, it is possible to reach the same efficiency as the FFT.
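For the Neumann case, for example, the DCT-I coincides with an FFT of the even extension of roughly twice the length, which can be checked directly (a sketch using SciPy's conventions):

```python
import numpy as np
from scipy.fft import dct

u = np.random.rand(129)                          # values on an (N+1)-point uniform grid
ext = np.concatenate([u, u[-2:0:-1]])            # even extension of length 2N = 256
print(np.allclose(np.fft.rfft(ext).real, dct(u, type=1)))   # True
```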

Table 5: Relative $L^2$ norm error ($\times 10^{-2}$) and worst error ($\times 10^{-2}$) for (a) the 2D reaction-diffusion equations of Experiment 3 and (b) the 2D Darcy flow problem of Experiment 4. The number of network parameters (#Param) and the average GPU time per epoch during training (#Time) are also given. The best result is in bold and the second best in italics.

(a) Reaction-diffusion equations

| Model | $L^2$ | worst | #Param | #Time |
|---|---|---|---|---|
| FNO | *5.19* | *6.37* | 1.3m | 150s |
| U-Net | 68.9 | 77.6 | 7.8m | 217s |
| LSM | 7.20 | 13.7 | 1.2m | 606s |
| SPFNO | **1.13** | **1.60** | 1.4m | 275s |

(b) Darcy flow problem

| Model | $L^2$ | worst | #Param | #Time |
|---|---|---|---|---|
| FNO | 0.688 | 6.04 | 2.4m | 7.44s |
| U-Net | 0.989 | *4.81* | 7.8m | 6.53s |
| LSM | *0.468* | **2.72** | 19.2m | 62.7s |
| SPFNO | **0.283** | 4.89 | 2.4m | 32.9s |

3.4 Experiment 4: 2-D Darcy flow with Dirichlet BCs

Darcy’s law describes the flow of fluid through a porous medium. It has been widely implemented in various fields, including hydrogeology, petroleum engineering, and soil mechanics. In this experiment, the 2-D steady-state Darcy flow equations in a unit box are formulated as the following boundary value problem (BVP):

$$-\nabla\cdot\left(a(\mathbf{x})\nabla u(\mathbf{x})\right) = f, \quad \mathbf{x} \in [0,1]^2. \qquad (7)$$

Moreover, homogeneous Dirichlet BCs are imposed. The task is to learn the operator $\mathcal{G}(a) = u$ that maps the diffusion coefficient $a$ to the solution $u$, where the input $a$ may be discontinuous. This problem is another of the most commonly used benchmarks for deep PDE solvers, as a dataset is provided in [27]. In that dataset, the diffusion coefficient $a(x)$ is piecewise constant, while the reference solution is generated with a finite difference method; however, the nondifferentiable variable coefficient makes the second-order finite difference method unsuitable for the problem. We therefore use the 2D-Darcy dataset of PDEBench [56], with $f$ fixed at 100.0, instead. Its input functions are also piecewise constant, and its larger size ($10^4$ samples compared with $10^3$ in [27]) helps the models reach higher accuracy. The experiment is conducted at the original spatial resolution of $128 \times 128$.

Compared with the Neumann BC, the Dirichlet BC is much easier to learn because it does not involve any derivative. On the other hand, the heterogeneity between the input and output functions leads to a much more complicated spectral structure of the target operator. In fact, multiple non-neural-operator methods have been reported to surpass neural operators in solving Eq. (7), especially the transformers, among which LSM achieves the highest accuracy known to us.

We adopt the ReduceLROnPlateau scheduler for SPFNO to accelerate training; a similar scheduler is employed for baseline models whenever it improves their accuracy. The results in Tab. 5(b) show that SPFNO again attains the lowest relative error. Notably, however, LSM achieves the lowest maximum error. The main reason is that the input $a(x)$ in the dataset is discontinuous, and when a discontinuous input exhibits rapid variations, the accuracy of spectral methods decreases significantly. Similarly, the worst-case performances of FNO and SPFNO are slightly inferior to that of U-Net. Introducing local structures such as convolutional layers or radial basis functions to capture extremely high-frequency information might help address such issues, but could also sacrifice flexibility or discretization invariance [57].

3.5 Experiment 5: 2D Navier-Stokes equation with mixed BCs

We now focus on the ability and flexibility of our model in handling general geometries. This experiment considers incompressible flow through a pipe, governed by the incompressible Navier-Stokes equations

$$\begin{cases} \dfrac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u} = -\nabla p + \mu\nabla^2\mathbf{u}, \\ \nabla\cdot\mathbf{u} = 0, \end{cases}$$

where $\mathbf{u}$ is the velocity vector, $p$ is the pressure, and $\mu$ denotes the viscosity. The Pipe dataset is provided with the geometry-aware Fourier neural operator (Geo-FNO) [58]: the shape of the pipe is randomly generated, the maximum-velocity condition $u_{max} = [1; 0]$ is imposed at the inlet, a free boundary condition at the outlet, and a no-slip boundary condition at the inner wall of the pipe. However, we identified a few instances with artificial discontinuities at the upper edge of the outlet in the current Pipe dataset (see Fig. 4a). These discontinuities arise from bugs in the numerical solver when handling the free boundary condition, so we recomputed the reference solutions to address the issue. The updated dataset is available at https://drive.google.com/drive/folders/1WPAs6bXttCPOWrDaUudC8B4dKPoju1OO. Notably, after excluding the errors caused by the dataset, the accuracy of all models improves on the updated dataset.

Figure 4: (a) Examples of the differences between the updated pipe dataset and the previous version. (b) Diagram of the dual Fourier extension.

In the experiment, we use 1000 instances for training and 1000 instances for testing. The goal is to learn the operator that maps the mesh point locations to the horizontal fluid velocity at those mesh points. The experiment is conducted at a resolution of $129 \times 129$.

The solution satisfies a fixed Dirichlet BC at the inlet and a Neumann BC at the outlet. To approximate these mixed BCs in the horizontal direction, the Fourier transform is applied to the dual Fourier extension, as shown in Fig. 4b. When a continuous function $f \in C[0,1]$ satisfies $f(0) = 0$ and $f'(1) = 0$, the Fourier expansion of its odd-even dual extension consists solely of odd-order sine terms. The corresponding transform is also known as the "WAWS" transform [53], an abbreviation for Whole-sample Antisymmetry on the left boundary and Whole-sample Symmetry on the right boundary. In the vertical direction, we use SPFNO with the sine basis to impose the Dirichlet BCs. In summary, this experiment uses a WAWS transform in the horizontal direction for the mixed BCs and a sine transform in the vertical direction for the Dirichlet BCs.
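A small numerical check of this dual extension (a sketch; the grid conventions are ours): constructing the WAWS extension of a function with $f(0) = 0$ and $f'(1) = 0$ and inspecting its FFT confirms that only odd-order sine modes survive.

```python
import numpy as np

N = 128
x = np.linspace(0.0, 1.0, N + 1)
f = np.sin(1.5 * np.pi * x)                     # satisfies f(0) = 0 and f'(1) = 0

right = np.concatenate([f, f[-2:0:-1]])         # even (WS) reflection about x = 1 -> [0, 2)
full = np.concatenate([right, [0.0], -right[:0:-1]])  # odd (WA) about x = 0, period 4

spec = np.fft.fft(full)
print(np.abs(spec.real).max())                  # ~0: a pure sine spectrum
print(np.abs(spec.imag[0::2]).max())            # ~0: even-order sine modes vanish
print(np.abs(spec.imag[1::2]).max())            # O(N): odd-order sine modes carry f
```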

Beyond the mesh deviating from the uniform grid assumed in the design of SPFNO, the problem is further complicated by the pipe's artificial outlet not being perpendicular to the longitudinal axis, which can introduce errors in the Neumann BC we impose on the outlet mesh. To improve the flexibility of our model on general geometries, the projection filter is dropped. As shown in Tab. 6, the SPFNO models have significantly lower errors than the baseline models, and the projection filter has only a minimal impact on performance. These results demonstrate the flexibility of SPFNO in handling general geometries.

Table 6: Relative $L^2$ norm error ($\times 10^{-3}$) and worst error ($\times 10^{-3}$) for the pipe flow.

| Model | $L^2$ error | worst | #Param | #Time(s) |
|---|---|---|---|---|
| Geo-FNO | 1.78 | 6.31 | 2.4m | 2.6 |
| U-Net | 4.68 | 9.35 | 7.8m | 1.4 |
| LSM | 1.69 | 4.08 | 10.8m | 6.7 |
| SPFNO | 0.980 | 3.56 | 7.2m | 5.8 |
| SPFNO (with PF) | 0.999 | 3.83 | 7.2m | 6.0 |

3.6 Additional experiments

3.6.1 Evaluation and comparison on the dataset of BOON

We additionally compare our model with the BOON model on the Dirichlet and Neumann datasets that the latter provides. Parameters similar to those of BOON in [44] are chosen (see Tab. 7), and the bandwidth $b$ is fixed to 1. Note that one complex Fourier mode consists of two basis functions.

Table 7: Parameters of the SPFNO model for training on the BOON datasets.

| Problem (dimension) | Modes | Channels | #Param |
|---|---|---|---|
| 1D Burgers' equation (1D) | 32 | 64 | 0.55m |
| 1D heat equation (1D space + time) | 24 | 32 | 0.11m |
| 2D wave equation (2D space + time) | 16 | 20 | 0.43m |

To accelerate the training process, a ReduceLROnPlateau scheduler is adopted as in Experiment 4 for the SPFNO model.

(a) 1D Burgers' equation with Dirichlet BCs: this dataset considers the following Riemann problem for the viscous Burgers' equation with Dirichlet BCs:

$$u_t + (u^2/2)_x = \nu u_{xx}, \quad x \in [0,1],\ t \geq 0,$$
$$u_0(x) = \begin{cases} u_L, & x \leq 0.5, \\ u_R, & x > 0.5, \end{cases}$$
$$u(0,t) = u_{\mathrm{exact}}(0,t), \quad u(1,t) = u_{\mathrm{exact}}(1,t), \quad t > 0.$$

The task is to learn the solution operator $\mathcal{S}: u_0(x) \to u(x, t=1.2)$.

(b) 1D heat transfer equation with Neumann BCs: this dataset considers the following 1D heat equation with time-dependent Neumann BCs:

$$u_t - k u_{xx} = f(x,t), \quad x \in [0,1],\ t \geq 0,$$
$$u_0(x) = \cos(\omega\pi x), \quad x \in [0,1],$$
$$u_x(0,t) = 0, \quad u_x(1,t) = U\sin\pi t, \quad t \geq 0.$$

The task is to learn the solution operator $\mathcal{S}: u_0(x) \to \{u(x, t=t_M)\}$, where $M=25$.

(c) 2D wave equation with Neumann BCs: this dataset considers the following 2D wave equation with Neumann BCs:

$$u_{tt} = c^2(u_{xx} + u_{yy}), \quad (x,y) \in [0,1]^2,\ t \geq 0,$$

to which the following analytical solution exists

$$u_{\mathrm{exact}}(x,y,t) = k\cos(\pi x)\cos(\pi y)\cos(c\sqrt{2}\,\pi t).$$

The task is to learn the solution operator $\mathcal{S}: u_0(x,y) \to \{u(x,y,t=t_M)\}$, where $M=25$.

Table 8: Performance of the SPFNO and BOON models on multiple BOON datasets. The relative $L^2$ test error is given, with the error on the BCs in parentheses; the average GPU time per epoch during training is given in square brackets or in a separate column. The best result is in bold.

(a) Single-step predictions for the 1D Burgers' equation with Dirichlet BCs and varying viscosities $\nu$; $N=500$.

| Model | ν=0.1 | ν=0.05 | ν=0.02 | ν=0.005 | ν=0.002 | #Time |
|---|---|---|---|---|---|---|
| SPFNO | **1.12e-5** (0) | **1.26e-5** (0) | **5.95e-5** (0) | 5.04e-4 (0) | 7.28e-4 (0) | 0.12s |
| BOON-FNO [44] | 1.2e-4 (0) | 1.0e-4 (0) | 8.4e-5 (0) | **1.0e-4** (0) | 1.27e-3 (0) | 1.3s |
| BOON-MWT [44] | 2.0e-4 (0) | 2.5e-4 (0) | 2.2e-4 (0) | 2.0e-4 (0) | **3.4e-4** (0) | -- |

(b) Multi-step predictions for the 1D heat equation with Neumann BCs and varying resolutions $N$; $M=25$.

| Model | N=256 | N=512 | N=1024 | N=2048 |
|---|---|---|---|---|
| SPFNO | **5.39e-4** (0) [0.51s] | **3.53e-4** (0) [0.90s] | **2.63e-4** (0) [0.96s] | **2.90e-4** (0) [1.13s] |
| BOON | 3.18e-2 (0) [3.5s] | 3.67e-2 (0) [5.9s] | 4.24e-2 (0) [10.4s] | 6.76e-3 (0) [20.0s] |

(c) Multi-step predictions for the 2D wave equation with Neumann BCs and varying resolutions $N$; $M=25$.

| Model | N=25 | N=50 | N=100 |
|---|---|---|---|
| SPFNO | **1.14e-4** (0) | **4.69e-5** (0) | **1.93e-4** (0) |
| BOON [44] | 9.7e-4 (0) | 8.93e-4 (0) | 9.6e-4 (0) |

The results are illustrated in Tab. 8, where SPFNO outperforms BOON in almost all cases; we also observed that continuing the training would allow SPFNO to surpass the baseline in all scenarios, though we did not carry this out for reasons given below. Since SPFNO adopts a global spectral method, in contrast to BOON, which employs a local finite difference approximation of the BCs, SPFNO achieves higher global accuracy by directly injecting the physical information from the boundaries into the interior system.

Furthermore, SPFNO is roughly an order of magnitude more efficient than BOON, owing to its use of bases that inherently satisfy the BCs, whereas BOON requires an additional BC-correction operation in each spectral layer.

It is worth noting, however, that the primary factor behind the extremely high accuracy of the models in these additional experiments is likely the dataset itself. The problems considered here have analytical solutions, and the generator of input functions involves only a small number of degrees of freedom. This enables the model to fit a manifold of significantly reduced dimensionality while learning the solution operator. Hence, although the existing results sufficiently illustrate the approximation capability of all tested models on these problems, the practical significance of further error reduction remains limited.

3.6.2 Invariance to discretization: the out-of-domain evaluations of the pre-trained model on different grids

We take the pre-trained model from Experiment 1 as an example. As a consequence of the spectral structure of the SOL architecture, a model trained on the coarse grid ($N=256$) can directly predict on fine grids without significant loss of accuracy; see Tab. 9 and Fig. 5. The data on the finer evaluation grids remain untouched during training. A similar observation holds for the model trained on the fine grid ($N=4096$) and evaluated on coarser ones.
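The underlying reason can be sketched as follows: the spectral layers act only on the leading $k_{max}$ sine coefficients, which are, up to the transform's normalization, independent of the sampling resolution for smooth inputs (a sketch; the normalization convention is ours):

```python
import numpy as np
from scipy.fft import dst

u = lambda x: np.sin(np.pi * x) - 0.3 * np.sin(4 * np.pi * x)

def leading_modes(N, k_max=8):
    x = np.linspace(0.0, 1.0, N + 1)[1:-1]       # interior of an (N+1)-point uniform grid
    return dst(u(x), type=1)[:k_max] / N         # normalized DST-I coefficients

print(np.round(leading_modes(256), 6))           # [ 1.  0.  0. -0.3  0.  0.  0.  0.]
print(np.round(leading_modes(4096), 6))          # identical up to rounding
```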

Table 9: Relative $L^2$ errors (with errors on the BCs in parentheses) of the SPFNO model for the 1D Burgers' equation, trained on a grid of resolution $N$ but evaluated at resolution $N'$.

| | N'=256 | N'=512 | N'=1024 | N'=2048 | N'=4096 |
|---|---|---|---|---|---|
| N=256 | 0.0086766 (0) | 0.0086748 (0) | 0.0086737 (0) | 0.0086732 (0) | 0.0086729 (0) |
| N=4096 | 0.0087394 (0) | 0.0087364 (0) | 0.0087351 (0) | 0.0087344 (0) | 0.0087341 (0) |
Figure 5: Predictions of the SPFNO model trained on an $N=256$ grid but evaluated on grids of resolution (a) $N'=256$, (b) $N'=512$, (c) $N'=1024$, (d) $N'=2048$, and (e) $N'=4096$.

4 Discussion and future work

We presented a novel spectral operator learning (SOL) architecture for PDEs with Dirichlet and Neumann BCs. The method, named SPFNO, combines traditional spectral methods with the neural operator architecture so that it satisfies the corresponding BCs exactly. The BC-satisfying property was established both theoretically and numerically. Numerical experiments also showed that SOL methods yield very close results regardless of the types of grids they are associated with. Compared with baselines, including non-BC-satisfying models and the BC-satisfying BOON model, the proposed framework achieves state-of-the-art performance on a variety of widely adopted benchmark problems. From a machine learning perspective, the BC-satisfying spectral structure is an inductive bias that effectively shrinks the hypothesis space.

Although this paper focuses on the data-driven training of neural operators, one can also learn the target operator directly by utilizing physics constraints and minimizing the residuals of the equations, which can dramatically reduce the dependence on data. Readers may note that the BC residuals, which are usually difficult to handle, vanish identically for an SOL model, making such training considerably easier; this is a promising direction for future exploration. Moreover, while the bandwidth of the learnable matrix is generally considered to be determined by the orthogonality of the basis, a suitable increase of the bandwidth improves the approximation capability of SPFNO, and its theoretical analysis is an interesting topic for further research. In addition, developing other SOL instances for more complex BCs and geometries, e.g., radiation BCs or unbounded domains, is an important subject of future research.

Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work the author(s) used ERNIE Bot in order to improve language and readability. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

Acknowledgement

This work was supported by the National Natural Science Foundation of China (Grant Nos. 12271523, 11901577, 11971481, 12071481), by the Defense Science Foundation of China (Grant No. 2021-JCJQ-JJ-0538), by the National Key R&D Program of China (Grant No. SQ2020YFA0709803), by the Science and Technology Innovation Program of Hunan Province (Grant Nos. 2022RC1192, 2021RC3082), and by the Research Fund of National University of Defense Technology (Grant Nos. ZK19-37, ZZKY-JJ-21-01).

References

  • [1] J. Han, A. Jentzen, W. E, Solving high-dimensional partial differential equations using deep learning, Proceedings of the National Academy of Sciences 115 (34) (2018) 8505–8510.
  • [2] J. Sirignano, K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, Journal of Computational Physics 375 (2018) 1339–1364.
  • [3] J. He, J. Xu, MgNet: A unified framework of multigrid and convolutional neural network, Science China Mathematics 62 (2019) 1331–1354.
  • [4] Y. Zhu, N. Zabaras, Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification, Journal of Computational Physics 366 (2018) 415–447.
  • [5] Y. Khoo, J. Lu, L. Ying, Solving parametric PDE problems with artificial neural networks, European Journal of Applied Mathematics 32 (3) (2021) 421–435.
  • [6] P. Jin, Z. Zhang, A. Zhu, Y. Tang, G. E. Karniadakis, SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems, Neural Networks 132 (2020) 166–179.
  • [7] L. Lyu, Z. Zhang, M. Chen, J. Chen, MIM: A deep mixed residual method for solving high-order partial differential equations, Journal of Computational Physics 452 (2022) 110930.
  • [8] B. Yu, et al., The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems, Communications in Mathematics and Statistics 6 (1) (2018) 1–12.
  • [9] E. Weinan, C. Ma, L. Wu, Barron spaces and the compositional function spaces for neural network models, arXiv preprint arXiv:1906.08039 (2019).
  • [10] C. Duan, Y. Jiao, Y. Lai, D. Li, X. Lu, J. Z. Yang, Convergence rate analysis for deep ritz method, Communications in Computational Physics 31 (4) (2022) 1020–1048. doi:https://doi.org/10.4208/cicp.OA-2021-0195.
  • [11] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686–707.
  • [12] Z. Wang, M. Chen, J. Chen, Solving multiscale elliptic problems by sparse radial basis function neural networks, Journal of Computational Physics 492 (2023) 112452.
  • [13] Z. Mao, A. D. Jagtap, G. E. Karniadakis, Physics-informed neural networks for high-speed flows, Computer Methods in Applied Mechanics and Engineering 360 (2020) 112789.
  • [14] Z. Gao, L. Yan, T. Zhou, Failure-informed adaptive sampling for PINNs, SIAM Journal on Scientific Computing 45 (4) (2023) A1971–A1994.
  • [15] Y. Shin, J. Darbon, G. Em Karniadakis, On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs, Communications in Computational Physics 28 (5) (2020) 2042–2074. doi:https://doi.org/10.4208/cicp.OA-2020-0193.
  • [16] L. Sun, J.-X. Wang, Physics-constrained bayesian neural network for fluid flow reconstruction with sparse and noisy data, Theoretical and Applied Mechanics Letters 10 (3) (2020) 161–169.
  • [17] A. D. Jagtap, K. Kawaguchi, G. E. Karniadakis, Adaptive activation functions accelerate convergence in deep and physics-informed neural networks, Journal of Computational Physics 404 (2020) 109136.
  • [18] L. Guo, H. Wu, X. Yu, T. Zhou, Monte Carlo fPINNs: Deep learning method for forward and inverse problems involving high dimensional fractional partial differential equations, Computer Methods in Applied Mechanics and Engineering 400 (2022) 115523.
  • [19] Y. Jiao, D. Li, X. Lu, J. Z. Yang, C. Yuan, GAS: A Gaussian mixture distribution-based adaptive sampling method for PINNs, arXiv preprint arXiv:2303.15849 (2023).
  • [20] S. Dong, J. Yang, On computing the hyperparameter of extreme learning machines: Algorithm and application to computational PDEs, and comparison with classical and high-order finite elements, Journal of Computational Physics 463 (2022) 111290.
  • [21] L. Lu, P. Jin, G. Pang, Z. Zhang, G. E. Karniadakis, Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators, Nature Machine Intelligence 3 (3) (2021) 218–229.
  • [22] C. R. Gin, D. E. Shea, S. L. Brunton, J. N. Kutz, DeepGreen: deep learning of Green’s functions for nonlinear boundary value problems, Scientific Reports 11 (1) (2021) 21614.
  • [23] Y. Fan, L. Ying, Solving electrical impedance tomography with deep learning, Journal of Computational Physics 404 (2020) 109119.
  • [24] S. Cai, Z. Wang, L. Lu, T. A. Zaki, G. E. Karniadakis, DeepM&Mnet: Inferring the electroconvection multiphysics fields based on operator approximation by neural networks, Journal of Computational Physics 436 (2021) 110296.
  • [25] P. Jin, S. Meng, L. Lu, MIONet: Learning multiple-input operators via tensor product, SIAM Journal on Scientific Computing 44 (6) (2022) A3490–A3514.
  • [26] N. H. Nelsen, A. M. Stuart, The random feature model for input-output maps between Banach spaces, SIAM Journal on Scientific Computing 43 (5) (2021) A3212–A3243.
  • [27] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, Fourier Neural Operator for Parametric Partial Differential Equations, in: International Conference on Learning Representations, 2021.
  • [28] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, A. Stuart, K. Bhattacharya, A. Anandkumar, Multipole graph neural operator for parametric partial differential equations, Advances in Neural Information Processing Systems 33 (2020) 6755–6766.
  • [29] B. Raonic, R. Molinaro, T. De Ryck, T. Rohner, F. Bartolucci, R. Alaifari, S. Mishra, E. de Bezenac, Convolutional neural operators for robust and accurate learning of PDEs, in: Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • [30] F. Bartolucci, E. de Bezenac, B. Raonic, R. Molinaro, S. Mishra, R. Alaifari, Representation equivalent neural operators: a framework for alias-free operator learning, in: Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • [31] L. Lu, R. Pestourie, W. Yao, Z. Wang, F. Verdugo, S. G. Johnson, Physics-informed neural networks with hard constraints for inverse design, SIAM Journal on Scientific Computing 43 (6) (2021) B1105–B1132.
  • [32] M. Horie, N. Mitsume, Physics-Embedded Neural Networks: Graph Neural PDE Solvers with Mixed Boundary Conditions, Advances in Neural Information Processing Systems 35 (2022) 23218–23229.
  • [33] N. Sukumar, A. Srivastava, Exact imposition of boundary conditions with distance functions in physics-informed deep neural networks, Computer Methods in Applied Mechanics and Engineering 389 (2022) 114333.
  • [34] S. N. Cohen, D. Jiang, J. Sirignano, Neural Q-learning for solving PDEs, Journal of Machine Learning Research 24 (236) (2023) 1–49.
  • [35] N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, A. Anandkumar, Neural operator: Learning maps between function spaces, arXiv preprint arXiv:2108.08481 (2021).
  • [36] J. Pathak, S. Subramanian, P. Harrington, S. Raja, A. Chattopadhyay, M. Mardani, T. Kurth, D. Hall, Z. Li, K. Azizzadenesheli, et al., FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators, arXiv preprint arXiv:2202.11214 (2022).
  • [37] K. Zhang, Y. Zuo, H. Zhao, X. Ma, J. Gu, J. Wang, Y. Yang, C. Yao, J. Yao, Fourier neural operator for solving subsurface oil/water two-phase flow partial differential equation, SPE Journal (2022) 1–15.
  • [38] T. J. Grady II, R. Khan, M. Louboutin, Z. Yin, P. A. Witte, R. Chandra, R. J. Hewett, F. J. Herrmann, Towards large-scale learned solvers for parametric PDEs with model-parallel Fourier neural operators, arXiv preprint arXiv:2204.01205 (2022).
  • [39] G. Wen, Z. Li, K. Azizzadenesheli, A. Anandkumar, S. M. Benson, U-FNO—An enhanced Fourier neural operator-based deep-learning model for multiphase flow, Advances in Water Resources 163 (2022) 104180.
  • [40] T. Zhou, X. Wan, D. Z. Huang, Z. Li, Z. Peng, A. Anandkumar, J. F. Brady, P. W. Sternberg, C. Daraio, AI-aided Geometric Design of Anti-infection Catheters, arXiv preprint arXiv:2304.14554 (2023).
  • [41] Z. Liu, H. Wang, H. Zhang, K. Bao, X. Qian, S. Song, Render unto Numerics: Orthogonal Polynomial Neural Operator for PDEs with Non-periodic Boundary Conditions, arXiv preprint arXiv:2206.12698 (2022).
  • [42] B. Bonev, T. Kurth, C. Hundt, J. Pathak, M. Baust, K. Kashinath, A. Anandkumar, Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere, arXiv preprint arXiv:2306.03838 (2023).
  • [43] J. P. Boyd, Chebyshev and Fourier spectral methods, Courier Corporation, 2001.
  • [44] N. Saad, G. Gupta, S. Alizadeh, D. C. Maddix, Guiding continuous operator learning through Physics-based boundary constraints, in: The Eleventh International Conference on Learning Representations, 2023.
  • [45] S. Cao, Choose a transformer: Fourier or Galerkin, Advances in Neural Information Processing Systems 34 (2021) 24924–24940.
  • [46] X. Liu, B. Xu, L. Zhang, HT-Net: Hierarchical transformer based operator learning model for multiscale PDEs, arXiv preprint arXiv:2210.10890 (2022).
  • [47] H. Wu, T. Hu, H. Luo, J. Wang, M. Long, Solving High-Dimensional PDEs with Latent Spectral Models, in: International Conference on Machine Learning, 2023.
  • [48] G. Gupta, X. Xiao, P. Bogdan, Multiwavelet-based operator learning for differential equations, Advances in Neural Information Processing Systems 34 (2021) 24048–24062.
  • [49] T. Tripura, S. Chakraborty, Wavelet Neural Operator for solving parametric partial differential equations in computational mechanics problems, Computer Methods in Applied Mechanics and Engineering 404 (2023) 115783.
  • [50] V. Fanaskov, I. Oseledets, Spectral Neural Operators, arXiv preprint arXiv:2205.10573 (2022).
  • [51] Y. Z. Ong, Z. Shen, H. Yang, Integral autoencoder network for discretization-invariant learning, Journal of Machine Learning Research 23 (286) (2022) 1–45.
  • [52] A. Bueno-Orovio, D. Kay, K. Burrage, Fourier spectral methods for fractional-in-space reaction-diffusion equations, BIT Numerical Mathematics 54 (2014) 937–954.
  • [53] E. S. Wise, J. Jaros, B. T. Cox, B. E. Treeby, Pseudospectral Time-Domain (PSTD) Methods for the Wave Equation: Realizing Boundary Conditions with Discrete Sine and Cosine Transforms, Journal of Theoretical and Computational Acoustics 29 (04) (2021) 2050021.
  • [54] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, arXiv preprint arXiv:1505.04597 (2015).
  • [55] J. P. Boyd, The choice of spectral functions on a sphere for boundary and eigenvalue problems: A comparison of Chebyshev, Fourier and associated Legendre expansions, Monthly Weather Review 106 (8) (1978) 1184–1191.
  • [56] M. Takamoto, T. Praditia, R. Leiteritz, D. MacKinlay, F. Alesiani, D. Pflüger, M. Niepert, PDEBench: An extensive benchmark for scientific machine learning, Advances in Neural Information Processing Systems 35 (2022) 1596–1611.
  • [57] W. Xiong, X. Huang, Z. Zhang, R. Deng, P. Sun, Y. Tian, Koopman neural operator as a mesh-free solver of non-linear partial differential equations, arXiv preprint arXiv:2301.10022 (2023).
  • [58] Z. Li, D. Z. Huang, B. Liu, A. Anandkumar, Fourier neural operator with learned deformations for PDEs on general geometries, arXiv preprint arXiv:2207.05209 (2022).