
Number Theoretic Accelerated Learning of Physics-Informed Neural Networks

Takashi Matsubara1, Takaharu Yaguchi2
Abstract

Physics-informed neural networks solve partial differential equations by training neural networks. Since this method approximates infinite-dimensional PDE solutions with finite collocation points, minimizing discretization errors by selecting suitable points is essential for accelerating the learning process. Inspired by number theoretic methods for numerical analysis, we introduce good lattice training (GLT) together with periodization tricks that ensure the conditions required by the theory. Our experiments demonstrate that GLT requires 2–7 times fewer collocation points, resulting in lower computational cost, while achieving competitive performance compared to typical sampling methods.

Introduction

Many real-world phenomena can be modeled as partial differential equations (PDEs), and solving PDEs has been a central topic in computational science. The applications include, but are not limited to, weather forecasting, vehicle design (Hirsch2006), economic analysis (Achdou2014), and computer vision (Logan2015). A PDE is expressed as $\mathcal{N}[u]=0$, where $\mathcal{N}$ is a (possibly nonlinear) differential operator and $u:\Omega\rightarrow\mathbb{R}$ is an unknown function on the domain $\Omega\subset\mathbb{R}^{s}$. For most PDEs that appear in physical simulations, well-posedness (the uniqueness of the solution $u$ and its continuous dependence on the initial and boundary conditions) has been well studied and is typically guaranteed under certain conditions. To solve PDEs, various computational techniques have been explored, including finite difference methods, finite volume methods, and spectral methods (Furihata2010; Morton2005; Thomas1995). However, improvements in computer architecture have slowed, leading to a growing need for computationally efficient alternatives. A promising approach is physics-informed neural networks (PINNs) (Raissi2019), which train a neural network by minimizing the physics-informed loss (Wang2022b; Wang2023). This is typically the squared error of the neural network's output $\tilde{u}$ with respect to the PDE $\mathcal{N}[u]=0$, averaged over a finite set of collocation points $\bm{x}_{j}$, i.e., $\frac{1}{N}\sum_{j=0}^{N-1}\|\mathcal{N}[\tilde{u}](\bm{x}_{j})\|^{2}$, encouraging the output $\tilde{u}$ to satisfy $\mathcal{N}[\tilde{u}](\bm{x}_{j})=0$.

Figure 1: Examples of sampled collocation points (uniformly random, uniformly spaced, LHS, Sobol sequence, proposed GLT, and proposed GLT with folding): 128 points for the Sobol sequence and 144 points for the others.

However, the solutions $u$ to PDEs are inherently infinite-dimensional, and any distance involving the output $\tilde{u}$ or the solution $u$ needs to be defined by an integral over the domain $\Omega$. In this regard, the physics-informed loss serves as a finite approximation to the squared 2-norm $|\mathcal{N}[\tilde{u}]|^{2}_{2}=\int_{\Omega}\|\mathcal{N}[\tilde{u}](\bm{x})\|^{2}\,\mathrm{d}\bm{x}$ on the function space $L^{2}(\Omega)$ for $\mathcal{N}[u]\in L^{2}(\Omega)$, and hence the discretization errors should affect the training efficiency. A smaller number $N$ of collocation points leads to a less accurate approximation and inferior performance, while a larger number $N$ increases the computational cost (Bihlo2022; Sharma2022). Despite the importance of selecting appropriate collocation points, insufficient emphasis has been placed on this aspect. Raissi2019, Zeng2023, and many other studies employed Latin hypercube sampling (LHS) to determine the collocation points. Alternative approaches include uniformly random sampling (i.e., the Monte Carlo method) (Jin2021; Krishnapriyan2022) and uniformly spaced sampling (Wang2021c; Wang2022a). These methods are exemplified in Fig. 1.

In the field of numerical analysis, the relationship between integral approximation and collocation points has been extensively investigated. Accordingly, some studies have used quasi-Monte Carlo methods, specifically the Sobol sequence, which approximate integrals more accurately than the Monte Carlo method (Lye2020; Longo2021; Mishra2021). For further improvement, this paper proposes good lattice training (GLT) for PINNs and their variants, such as the competitive PINN (CPINN) (Zeng2023) and the physics-informed neural operator (Li2021e; Rosofsky2023). The GLT is inspired by number theoretic methods for numerical analysis, providing an optimal set of collocation points depending on the initial and boundary conditions, as shown in Fig. 1. Our experiments demonstrate that the proposed GLT requires far fewer collocation points than the compared methods while achieving similar errors, significantly reducing computational cost. The contribution and significance of the proposed GLT are threefold.

Computationally Efficient: The proposed GLT offers an optimal set of collocation points for computing a loss function that can be regarded as a finite approximation to an integral over the domain, such as the physics-informed loss, provided that the activation functions of the neural network are sufficiently smooth. It requires significantly fewer collocation points to achieve solution accuracy and system-identification accuracy comparable to other methods, or achieves lower errors within the same computational budget.

Applicable to PINN Variants: As the proposed GLT changes only the collocation points, it can be applied to various variants of PINNs without modifying the learning algorithm or objective function. In this study, we investigate a specific variant, the CPINN (Zeng2023), and demonstrate that CPINNs using the proposed GLT achieve superior convergence speed with significantly fewer collocation points than CPINNs using LHS.

Theoretically Solid: Number theory provides a theoretical basis for the efficacy of the proposed GLT. Existing methods based on quasi-Monte Carlo methods are inferior to the proposed GLT in theoretical performance, or at least require prior knowledge about the smoothness $\alpha$ of the solution $u$ and careful adjustment of hyperparameters (Longo2021). In contrast, the proposed GLT is free from such prior knowledge or adjustments and achieves better performance depending on the smoothness $\alpha$ and the neural network, which is a significant advantage.

Related Work

Neural networks are a powerful tool for processing information and have achieved significant success in various fields (He2015a; Vaswani2017), including black-box system identification, that is, learning the dynamics of physical phenomena from data and predicting their future behaviors (Chen2018e; Chen1990; Wang1998). By integrating knowledge from analytical mechanics, neural networks can learn dynamics that adhere to physical laws and even uncover these laws from data (Finzi2020; Greydanus2019; Matsubara2023ICLR).

Neural networks have also gained attention as computational tools for solving differential equations, particularly PDEs (Dissanayake1994; Lagaris1998). Recently, Raissi2019 introduced an elegant refinement of this approach and named it PINNs. The key concept behind PINNs is the physics-informed loss (Wang2022b; Wang2023). This loss function evaluates the extent to which the output $\tilde{u}$ of the neural network satisfies a given PDE $\mathcal{N}[u]=0$ and its associated initial and boundary conditions $\mathcal{B}[u]=0$. The physics-informed loss can be integrated into other models like DeepONet (Lu2021a; Wang2023) or used for white-box system identification (that is, adjusting the parameters of known PDEs so that their solutions fit observations).

PINNs have been applied to various PDEs (Bihlo2022; Jin2021; Mao2020), with significant efforts devoted to improving learning algorithms and objective functions (Hao2023; Heldmann2023; Lu2022; Pokkunuru2023; Sharma2022; Zeng2023). Objective functions are generally based on PDEs evaluated at a finite set of collocation points rather than on data. Bihlo2022 and Sharma2022 have shown a trade-off between the number of collocation points (and hence the computational cost) and the accuracy of the solution. Thus, selecting collocation points that efficiently cover the entire domain $\Omega$ is essential for achieving better results. Some studies have employed quasi-Monte Carlo methods, specifically the Sobol sequence, to determine the collocation points (Lye2020; Mishra2021), but their effectiveness depends on knowledge of the solution's smoothness $\alpha$ and on hyperparameter adjustments (Longo2021).

Method

Theoretical Error Estimate of PINNs

For simplicity, we consider PDEs defined on an $s$-dimensional unit cube $[0,1]^{s}$. PINNs employ the PDE that describes the target physical phenomenon as a loss function. Specifically, an appropriate set of collocation points $L^{*}=\{\bm{x}_{j}\mid j=0,\ldots,N-1\}$ is first determined, and then the average of the PDE residuals at these points,

$$\frac{1}{N}\sum_{j=0}^{N-1}\mathcal{P}[\tilde{u}](\bm{x}_{j})=\frac{1}{N}\sum_{\bm{x}_{j}\in L^{*}}\mathcal{P}[\tilde{u}](\bm{x}_{j}), \quad (1)$$

is minimized as a loss function, where $\mathcal{P}$ is a differential operator. The physics-informed loss corresponds to $\mathcal{P}[\tilde{u}](\bm{x})=\|\mathcal{N}[\tilde{u}](\bm{x})\|^{2}$. Then, the neural network's output $\tilde{u}$ becomes an approximate solution of the PDE. However, for $\tilde{u}$ to be the exact solution, $\mathcal{P}[\tilde{u}](\bm{x})$ should be zero for all $\bm{x}\in\Omega$, not just at the collocation points. Therefore, the following integral must be minimized as the loss function:

$$\int_{[0,1]^{s}}\mathcal{P}[\tilde{u}](\bm{x})\,\mathrm{d}\bm{x}. \quad (2)$$

In other words, minimizing (1) in practice amounts to minimizing a finite approximation of (2), with the expectation that (2) itself becomes small enough and hence $\tilde{u}$ becomes an accurate approximation to the exact solution.
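As a concrete illustration, the following is a minimal sketch (not the authors' implementation) of evaluating the loss (1) with $\mathcal{P}[\tilde{u}](\bm{x})=\|\mathcal{N}[\tilde{u}](\bm{x})\|^{2}$ at a set of collocation points, using PyTorch automatic differentiation and a two-dimensional Poisson residual $\mathcal{N}[u]=\Delta u - f$ as a placeholder operator; the network architecture, the source term $f$, and the sampling of the collocation points are illustrative assumptions.

```python
import torch

# Placeholder network; any sufficiently smooth architecture (e.g., a tanh MLP) works here.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def poisson_residual(x):
    """N[u](x) = Laplacian(u)(x) - f(x) for a 2D Poisson equation (illustrative choice)."""
    x = x.requires_grad_(True)
    u = net(x)
    grad = torch.autograd.grad(u.sum(), x, create_graph=True)[0]        # du/dx_k, shape (N, 2)
    lap = sum(
        torch.autograd.grad(grad[:, k].sum(), x, create_graph=True)[0][:, k]
        for k in range(x.shape[1])
    )                                                                    # Laplacian of u, shape (N,)
    f = torch.ones(x.shape[0])                                           # placeholder source term
    return lap - f

# Loss (1): the mean of P[u](x_j) = ||N[u](x_j)||^2 over the N collocation points.
x_col = torch.rand(144, 2)                # any sampling scheme; the GLT is introduced below
loss = poisson_residual(x_col).pow(2).mean()
loss.backward()
```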

More precisely, we show the following theorem, which improves on an existing error analysis (Mishra2023estimates) in the sense that it takes the approximation error bound of neural networks into account.

Theorem 1.

Suppose that the class of neural networks used for PINNs includes an $\varepsilon_{1}$-approximator $\tilde{u}_{\mathrm{opt}}$ to the exact solution $u^{*}$ of the PDE $\mathcal{N}[u]=0$, i.e., $\|u^{*}-\tilde{u}_{\mathrm{opt}}\|\leq\varepsilon_{1}$, and that (1) is an $\varepsilon_{2}$-approximation of (2) for both the approximated solution $\tilde{u}$ and $\tilde{u}_{\mathrm{opt}}$, i.e., $|\int_{[0,1]^{s}}\mathcal{P}[u](\bm{x})\,\mathrm{d}\bm{x}-\frac{1}{N}\sum_{\bm{x}_{j}\in L^{*}}\mathcal{P}[u](\bm{x}_{j})|\leq\varepsilon_{2}$ for $u=\tilde{u}$ and $u=\tilde{u}_{\mathrm{opt}}$. Suppose also that there exist $c_{\mathrm{p}}>0$ and $c_{\mathrm{L}}>0$ such that $\frac{1}{c_{\mathrm{p}}}\|u-v\|\leq\|\mathcal{N}[u]-\mathcal{N}[v]\|\leq c_{\mathrm{L}}\|u-v\|$. Then,

$$\|u^{*}-\tilde{u}\|\leq(1+c_{\mathrm{p}}c_{\mathrm{L}})\varepsilon_{1}+c_{\mathrm{p}}\sqrt{\frac{1}{N}\sum_{\bm{x}_{j}\in L^{*}}\mathcal{P}[\tilde{u}](\bm{x}_{j})+\varepsilon_{2}}.$$

For a proof, see Appendix “Theoretical Background.” $\varepsilon_{1}$ is determined by the architecture of the network and the function space to which the solution belongs. For example, approximation rates of neural networks in Sobolev spaces are given in GUHRING2021107. This explains why increasing the number of collocation points beyond a certain point does not further reduce the error. $\varepsilon_{2}$ depends on the accuracy of the approximation of the integral. In this paper, we investigate a training method that easily yields a small $\varepsilon_{2}$.

One standard strategy often used in practice is to set the $\bm{x}_{j}$'s to uniformly distributed random numbers, which can be interpreted as a Monte Carlo approximation of the integral (2). As is widely known, the Monte Carlo method approximates the integral within an error of $O(1/N^{\frac{1}{2}})$ independently of the number $s$ of dimensions (Sloan1994-cl). However, most PDEs for physical simulations are two- to four-dimensional, comprising a three-dimensional space and a one-dimensional time. Hence, in this paper, we propose a sampling strategy specialized for low-dimensional cases, inspired by number theoretic numerical analysis.

Note that some variants, such as the CPINN (Zeng2023), employ alternative objective functions. Without loss of generality, we hereafter denote by (1) any variant of the physics-informed loss, as long as it can be regarded as a finite approximation to an integral over the domain, as in (2).

Good Lattice Training

In this section, we propose the good lattice training (GLT), in which number theoretic numerical analysis is used to accelerate the training of PINNs (Niederreiter1992-qb; Sloan1994-cl; Zaremba1972-gj). In the following, we use several tools from this theory.

While our target is a PDE on the unit cube $[0,1]^{s}$, we now treat the loss function $\mathcal{P}[\tilde{u}]$ as periodic on $\mathbb{R}^{s}$ with a period of 1. Then, we define a lattice.

Definition 2 (Sloan1994-cl).

A lattice $L$ in $\mathbb{R}^{s}$ is defined as a discrete set of points in $\mathbb{R}^{s}$ that is closed under addition and subtraction.

Given a lattice $L$, the set of collocation points is defined as $L^{*}=\{\bm{x}_{j}\mid j=0,\dots,N-1\}\coloneqq\{\mbox{the decimal part of }\bm{x}\mid\bm{x}\in L\}\subset[0,1]^{s}$. Considering that the loss function to be minimized is (2), it is desirable to determine the lattice $L$ (and hence the set of collocation points $\bm{x}_{j}$'s) so that the difference $|(2)-(1)|$ between the two losses is minimized.

Suppose that $\varepsilon(\bm{x})\coloneqq\mathcal{P}[\tilde{u}](\bm{x})$ is smooth enough, admitting the Fourier series expansion:

$$\varepsilon(\bm{x})\coloneqq\mathcal{P}[\tilde{u}](\bm{x})=\sum_{\bm{h}}\hat{\varepsilon}(\bm{h})\exp(2\pi\mathrm{i}\,\bm{h}\cdot\bm{x}),$$

where $\mathrm{i}$ denotes the imaginary unit and $\bm{h}=(h_{1},h_{2},\ldots,h_{s})\in\mathbb{Z}^{s}$. Substituting this into (1) yields

$$|(2)-(1)|=\left|\frac{1}{N}\sum_{j=0}^{N-1}\sum_{\bm{h}\in\mathbb{Z}^{s},\,\bm{h}\neq 0}\hat{\varepsilon}(\bm{h})\exp(2\pi\mathrm{i}\,\bm{h}\cdot\bm{x}_{j})\right|, \quad (3)$$

because the Fourier mode of $\bm{h}=0$ is equal to the integral $\int_{[0,1]^{s}}\varepsilon(\bm{x})\,\mathrm{d}\bm{x}$. Before optimizing (3), the dual lattice of a lattice $L$ and an insightful lemma are introduced as follows.

Definition 3 (Zaremba1972-gj).

The dual lattice $L^{\top}$ of a lattice $L$ is defined as $L^{\top}\coloneqq\{\bm{h}\in\mathbb{R}^{s}\mid\bm{h}\cdot\bm{x}\in\mathbb{Z},\ \forall\bm{x}\in L\}$.

Lemma 4 (Zaremba1972-gj).

For $\bm{h}\in\mathbb{Z}^{s}$, it holds that

$$\frac{1}{N}\sum_{j=0}^{N-1}\exp(2\pi\mathrm{i}\,\bm{h}\cdot\bm{x}_{j})=\begin{cases}1&(\bm{h}\in L^{\top})\\ 0&(\text{otherwise}).\end{cases}$$

Lemma 4 follows directly from the properties of Fourier series. Based on this lemma, we restrict the lattice $L$ to the form $\{\bm{x}\mid\bm{x}=\frac{j}{N}\bm{z}\mbox{ for }j\in\mathbb{Z}\}$ with a fixed integer vector $\bm{z}$; the set $L^{*}$ of collocation points is then $\{\mbox{the decimal part of }\frac{j}{N}\bm{z}\mid j=0,\ldots,N-1\}$. Then, instead of searching for the $\bm{x}_{j}$'s, the vector $\bm{z}$ is searched. With this restriction, the $\bm{x}_{j}$'s are obtained automatically from a given $\bm{z}$, so the optimal collocation points do not need to be stored as a table of numbers, which is a significant advantage in implementation. Another advantage is theoretical: the optimization problem for the collocation points can be reformulated in a number theoretic way. In fact, for $L$ as above, it is confirmed that $L^{\top}=\{\bm{h}\mid\bm{h}\cdot\bm{z}\equiv 0\pmod{N}\}$. If $\bm{h}\cdot\bm{z}\equiv 0\pmod{N}$, then there exists an $m\in\mathbb{Z}$ such that $\bm{h}\cdot\bm{z}=mN$ and hence $\frac{j}{N}\bm{h}\cdot\bm{z}=mj\in\mathbb{Z}$. Conversely, if $\bm{h}\cdot\bm{z}\not\equiv 0\pmod{N}$, clearly $\frac{1}{N}\bm{h}\cdot\bm{z}\notin\mathbb{Z}$.
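A minimal sketch (assuming $N$ and $\bm{z}$ are given) of generating the collocation set $L^{*}$ and numerically checking Lemma 4; the choice $N=144$, $\bm{z}=(1,89)$ is a Fibonacci pair used only for illustration (see below).

```python
import numpy as np

def good_lattice_points(N, z):
    """Collocation set L*: the decimal parts of (j/N) z for j = 0, ..., N-1."""
    j = np.arange(N).reshape(-1, 1)              # shape (N, 1)
    z = np.asarray(z).reshape(1, -1)             # shape (1, s)
    return np.mod(j * z / N, 1.0)                # shape (N, s), each row in [0, 1)^s

points = good_lattice_points(144, (1, 89))       # illustrative N and z (a Fibonacci pair, see below)

# Numerical check of Lemma 4: the average of exp(2*pi*i h.x_j) over L* is
# 1 if h.z = 0 (mod N) and 0 otherwise.
print(np.mean(np.exp(2j * np.pi * points @ np.array([89, -1]))))   # h.z = 0 (mod 144)      -> approx. 1
print(np.mean(np.exp(2j * np.pi * points @ np.array([2, 34]))))    # h.z = 3028, not 0 mod N -> approx. 0
```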

From the above lemma,

$$(3)\leq\sum_{\bm{h}\in\mathbb{Z}^{s},\,\bm{h}\neq 0,\,\bm{h}\cdot\bm{z}\equiv 0\!\pmod{N}}|\hat{\varepsilon}(\bm{h})|, \quad (4)$$

and hence the vector $\bm{z}$ (and thus the collocation points $\bm{x}_{j}$'s) should be determined so that (4) becomes small. This is a number theoretic problem in the sense that it is a minimization over integer vectors $\bm{z}$, where the sum runs over the integer vectors $\bm{h}$ subject to the condition $\bm{h}\cdot\bm{z}\equiv 0\pmod{N}$. This problem has been considered in the field of number theoretic numerical analysis. In particular, optimal solutions have been investigated for integrands in the Korobov spaces, which are spaces of functions satisfying a certain smoothness condition.

Definition 5 (Zaremba1972-gj).

The function space defined as $E_{\alpha}=\{f:[0,1]^{s}\to\mathbb{R}\mid\exists c,\ |\hat{f}(\bm{h})|\leq\frac{c}{(\bar{h}_{1}\bar{h}_{2}\cdots\bar{h}_{s})^{\alpha}}\}$ is called the Korobov space, where $\hat{f}(\bm{h})$ denotes the Fourier coefficients of $f$ and $\bar{k}=\max(1,|k|)$ for $k\in\mathbb{R}$.

It is known that if $\alpha$ is an integer, a sufficient condition for a function $f$ to be in $E_{\alpha}$ is that $f$ has continuous partial derivatives $\frac{\partial^{q_{1}+q_{2}+\cdots+q_{s}}}{\partial x_{1}^{q_{1}}\partial x_{2}^{q_{2}}\cdots\partial x_{s}^{q_{s}}}f$ for $0\leq q_{k}\leq\alpha\ (k=1,\ldots,s)$. For example, if a function $f(x,y):\mathbb{R}^{2}\to\mathbb{R}$ has continuous $f_{x}$, $f_{y}$, and $f_{xy}$, then $f\in E_{1}$. Hence, if $\mathcal{P}[\tilde{u}]$ and the neural network belong to the Korobov space,

$$(4)\leq\sum_{\bm{h}\in\mathbb{Z}^{s},\,\bm{h}\neq 0,\,\bm{h}\cdot\bm{z}\equiv 0\!\pmod{N}}\frac{c}{(\bar{h}_{1}\bar{h}_{2}\cdots\bar{h}_{s})^{\alpha}}. \quad (5)$$

Here, we introduce a theorem in the field of number theoretic numerical analysis:

Theorem 6 (Sloan1994-cl).

For integers $N\geq 2$ and $s\geq 2$, there exists a $\bm{z}\in\mathbb{Z}^{s}$ such that

$$P_{\alpha}(\bm{z},N)\leq\frac{(2\log N)^{\alpha s}}{N^{\alpha}}+O\left(\frac{(\log N)^{\alpha s-1}}{N^{\alpha}}\right),$$

where $P_{\alpha}(\bm{z},N)=\sum_{\bm{h}\in\mathbb{Z}^{s},\,\bm{h}\neq 0,\,\bm{h}\cdot\bm{z}\equiv 0\pmod{N}}\frac{1}{(\bar{h}_{1}\bar{h}_{2}\cdots\bar{h}_{s})^{\alpha}}$.

The main result of this paper is the following.

Theorem 7.

Suppose that the activation function of $\tilde{u}$, and hence $\tilde{u}$ itself, is sufficiently smooth so that there exists an $\alpha>0$ such that $\mathcal{P}[\tilde{u}]\in E_{\alpha}$. Then, for given integers $N\geq 2$ and $s\geq 2$, there exists an integer vector $\bm{z}\in\mathbb{Z}^{s}$ such that $L^{*}=\{\mbox{the decimal part of }\frac{j}{N}\bm{z}\mid j=0,\ldots,N-1\}$ is a “good lattice” in the sense that

$$\left|\int_{[0,1]^{s}}\mathcal{P}[\tilde{u}](\bm{x})\,\mathrm{d}\bm{x}-\frac{1}{N}\sum_{\bm{x}_{j}\in L^{*}}\mathcal{P}[\tilde{u}](\bm{x}_{j})\right|=O\left(\frac{(\log N)^{\alpha s}}{N^{\alpha}}\right). \quad (6)$$

Intuitively, if $\mathcal{P}[\tilde{u}]$ satisfies certain conditions, we can find a set $L^{*}$ of collocation points with which the objective function (1) approximates the integral (2) within an error of only $O(\frac{(\log N)^{\alpha s}}{N^{\alpha}})$. This rate is much better than that of uniformly random sampling (i.e., the Monte Carlo method), which is $O(1/N^{\frac{1}{2}})$ (Sloan1994-cl), provided that the activation function of $\tilde{u}$, and hence the neural network $\tilde{u}$ itself, is sufficiently smooth so that $\mathcal{P}[\tilde{u}]\in E_{\alpha}$ for a large $\alpha$. Hence, in this paper, we call the training method that minimizes (1) for a lattice $L$ satisfying (6) the good lattice training (GLT).

While any set of collocation points satisfying the above condition attains the same convergence rate, a set constructed from the vector $\bm{z}\in\mathbb{Z}^{s}$ that minimizes (5) leads to better accuracy. When $s=2$, it is known that a good lattice can be constructed by setting $N=F_{k}$ and $\bm{z}=(1,F_{k-1})$, where $F_{k}$ denotes the $k$-th Fibonacci number (Niederreiter1992-qb; Sloan1994-cl). In general, an algorithm exists that can determine the optimal $\bm{z}$ with a computational cost of $O(N^{2})$; see Appendix “Theoretical Background” for more details and the brute-force sketch below. Alternatively, the optimal $\bm{z}$ can be retrieved from numerical tables found in references such as Fang1994 and Keng1981.
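The following is an illustrative brute-force sketch of this search (not the authors' algorithm) for $s=2$ and even smoothness $\alpha=2$, using the standard closed-form expression of $P_{\alpha}(\bm{z},N)$ in terms of the Bernoulli polynomial $B_{2}(x)=x^{2}-x+\frac{1}{6}$ (Sloan1994-cl); each candidate costs $O(N)$ to evaluate, so scanning $\bm{z}=(1,a)$ costs $O(N^{2})$ in total.

```python
import numpy as np

def P2(z, N):
    """P_2(z, N) via the identity P_2 = -1 + (1/N) sum_j prod_k (1 + 2 pi^2 B_2({j z_k / N})),
    where B_2(x) = x^2 - x + 1/6 (valid for Korobov smoothness alpha = 2)."""
    frac = np.mod(np.outer(np.arange(N), np.asarray(z)) / N, 1.0)   # {j z_k / N}, shape (N, s)
    b2 = frac**2 - frac + 1.0 / 6.0
    return -1.0 + np.mean(np.prod(1.0 + 2.0 * np.pi**2 * b2, axis=1))

def search_z_2d(N):
    """Scan z = (1, a) with gcd(a, N) = 1 and return the candidate minimizing P_2."""
    candidates = [a for a in range(1, N) if np.gcd(a, N) == 1]
    best = min(candidates, key=lambda a: P2((1, a), N))
    return (1, best)

# For N = F_k, the Fibonacci rule z = (1, F_{k-1}) is known to perform well;
# the brute-force search should return a z with a comparable or smaller P_2 value.
print(search_z_2d(144), P2((1, 89), 144))
```

In higher dimensions, one common strategy is to restrict the scan to Korobov-type vectors $\bm{z}=(1,a,a^{2},\ldots,a^{s-1}) \bmod N$, which keeps the search one-dimensional.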

Periodization and Randomization Tricks

The integrand $\mathcal{P}[\tilde{u}]$ of the loss function (2) does not always belong to the Korobov space $E_{\alpha}$ with high smoothness $\alpha$. To align the proposed GLT with the theoretical expectations, we propose periodization tricks that ensure periodicity and smoothness.

Given an initial condition at time $t=0$, periodicity is ensured by extending the lattice to twice its length along the time coordinate and folding it. Specifically, instead of $t$, we use $\hat{t}$ as the time coordinate, which satisfies $t=2\hat{t}$ if $\hat{t}<0.5$ and $t=2(1-\hat{t})$ otherwise (see the lower right panel of Fig. 1, where time is on the horizontal axis). Also, while not mandatory, we combine the initial condition $u_{0}(\bm{x}_{/t})$ and the neural network's output $\tilde{u}(t,\cdot)$ as $\exp(-t)u_{0}(\bm{x}_{/t})+(1-\exp(-t))\tilde{u}(t,\bm{x}_{/t})$, thereby enforcing the initial condition, where $\bm{x}_{/t}$ denotes the set of coordinates except for the time coordinate $t$. A similar idea was proposed in Lagaris1998, which, however, does not enforce the initial condition strictly. If a periodic boundary condition is imposed on the $k$-th space coordinate, we bypass learning it and instead map the coordinate $x_{k}$ to the unit circle in two-dimensional space. Specifically, we map $x_{k}$ to $(x_{k}^{(1)},x_{k}^{(2)})=(\cos(2\pi x_{k}),\sin(2\pi x_{k}))$, ensuring that the loss function $\mathcal{P}[\tilde{u}]$ takes the same value at both edges ($x_{k}=0$ and $x_{k}=1$) and is periodic. Given a Dirichlet boundary condition $u=0$ on $\partial\Omega$ along the $k$-th axis, we multiply the neural network's output $\tilde{u}(\dots,x_{k},\dots)$ by $x_{k}(1-x_{k})$ and treat the result as the approximated solution. This ensures that the Dirichlet boundary condition is met and that the loss function $\mathcal{P}[\tilde{u}]$ vanishes at the boundary $\partial\Omega$, thereby ensuring periodicity. If a more complicated Dirichlet boundary condition is given, one can fold the lattice along the space coordinate in the same manner as the time coordinate to ensure the periodicity of the loss function $\mathcal{P}[\tilde{u}]$.
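A minimal NumPy sketch of these transforms follows; the helper names are illustrative, and in practice the transforms would be composed inside the network's forward pass within an autodiff framework.

```python
import numpy as np

def fold_time(t_hat):
    """Time folding: t = 2*t_hat if t_hat < 0.5, else 2*(1 - t_hat), making the loss
    periodic in t_hat with period 1."""
    return np.where(t_hat < 0.5, 2.0 * t_hat, 2.0 * (1.0 - t_hat))

def embed_periodic(x_k):
    """Map a periodic space coordinate x_k in [0, 1] onto the unit circle."""
    return np.cos(2.0 * np.pi * x_k), np.sin(2.0 * np.pi * x_k)

def blend_initial_condition(t, u0, u_net):
    """Blend the initial condition u0(x) and the raw network output u_net(t, x)
    so that the initial condition holds exactly at t = 0."""
    return np.exp(-t) * u0 + (1.0 - np.exp(-t)) * u_net

def apply_dirichlet(x_k, u_net):
    """Multiply the output by x_k * (1 - x_k) to enforce u = 0 at x_k = 0 and x_k = 1."""
    return x_k * (1.0 - x_k) * u_net
```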

These periodization tricks aim to satisfy the periodicity conditions necessary for GLT to exhibit the performance shown in Theorem 7. However, they can also be used with other sampling methods and potentially improve practical performance by freeing the network from having to learn the initial and boundary conditions.

Since the GLT rests on a Fourier series foundation, it allows for a periodic shift of the lattice. Hence, we randomize the collocation points as

$$L^{*}=\{\mbox{the decimal part of }\tfrac{j}{N}\bm{z}+\bm{r}\mid j=0,\ldots,N-1\},$$

where $\bm{r}$ follows the uniform distribution over the unit cube $[0,1]^{s}$. Our preliminary experiments confirmed that, when using the stochastic gradient descent (SGD) algorithm, resampling the random shift $\bm{r}$ at each training iteration prevents the neural network from overfitting and improves training efficiency. We call this approach the randomization trick.
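A sketch of the randomization trick, assuming the lattice generator sketched above and a generic training loop; the specific $N$ and $\bm{z}$ are again illustrative.

```python
import numpy as np

def randomized_good_lattice(N, z, rng):
    """Randomly shifted lattice: decimal parts of (j/N) z + r with r ~ U([0, 1]^s)."""
    z = np.asarray(z, dtype=float)
    j = np.arange(N).reshape(-1, 1)
    r = rng.uniform(size=(1, z.size))                  # one shift per call (i.e., per iteration)
    return np.mod(j * z.reshape(1, -1) / N + r, 1.0)

rng = np.random.default_rng(0)
for step in range(3):                                  # stand-in for the SGD loop
    x_col = randomized_good_lattice(144, (1, 89), rng) # resample the shift at every iteration
    # ... evaluate the physics-informed loss at x_col and take an optimizer step ...
```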

Limitations

This limitation is not unique to the proposed GLT: most strategies for determining collocation points are not directly applicable to a non-rectangular or non-flat domain $\Omega$ (Shankar2018). To achieve the best performance, the PDE should be transformed onto a rectangular domain by an appropriate coordinate transformation; see Knupp2020-ix and Thompson1985-tx for examples.

Previous studies on numerical analysis addressed the periodicity and smoothness conditions on the integrand by variable transformations (Sloan1994-cl) (see also Appendix “Theoretical Background”). However, our preliminary experiments confirmed that this approach did not perform optimally in typical problem settings. Intuitively, these variable transformations reduce the weights of regions that are difficult to integrate, thereby suppressing the discretization error. This implies that, when used for training, the regions with small weights remain unlearned. As a viable alternative, we introduced the periodization tricks to ensure periodicity.

The performance depends on the smoothness of the physics-informed loss, and hence on the smoothness of the neural network and the true solution. See Appendix “Theoretical Background” for details.

Experiments and Results

Physics-Informed Neural Networks

We modified the code from the official repository (https://github.com/maziarraissi/PINNs, MIT license) of Raissi2019, the original paper on PINNs. We obtained the datasets of the nonlinear Schrödinger (NLS) equation, Korteweg–De Vries (KdV) equation, and Allen–Cahn (AC) equation from the repository. The NLS equation governs wave functions in quantum mechanics, the KdV equation models shallow water waves, and the AC equation characterizes phase separation in co-polymer melts. These datasets provide numerical solutions to initial value problems with periodic boundary conditions. Although they contain numerical errors, we treated them as the true solutions $u$. These equations are nonlinear versions of hyperbolic or parabolic PDEs. Additionally, as a nonlinear version of an elliptic PDE, we created a dataset for the $s$-dimensional Poisson equation, which admits analytically solvable solutions with $2^{s}$ modes under the Dirichlet boundary condition. We examined the cases $s\in\{2,4\}$. See Appendix “Experimental Settings” for further information.

Unless otherwise stated, we followed the repository's experimental settings for the NLS equation. The physics-informed loss was defined as $\frac{1}{N}\sum_{j=0}^{N-1}\|\mathcal{N}[\tilde{u}](\bm{x}_{j})\|^{2}$ given $N$ collocation points $\{\bm{x}_{j}\}_{j=0}^{N-1}$. This can be regarded as a finite approximation to the squared 2-norm $|\mathcal{N}[\tilde{u}]|_{2}^{2}=\int_{\Omega}\|\mathcal{N}[\tilde{u}](\bm{x})\|^{2}\,\mathrm{d}\bm{x}$. The state of the NLS equation is complex-valued; we simply treated it as a two-dimensional real vector for training and used its absolute value for evaluation and visualization. Following Raissi2019, we evaluated the performance using the relative error, i.e., the normalized error $\mathcal{L}(\tilde{u},u;\bm{x}_{j})=(\sum_{j=0}^{N_{e}-1}\|\tilde{u}(\bm{x}_{j})-u(\bm{x}_{j})\|^{2})^{\frac{1}{2}}/(\sum_{j=0}^{N_{e}-1}\|u(\bm{x}_{j})\|^{2})^{\frac{1}{2}}$ at $N_{e}$ predefined evaluation points $\{\bm{x}_{j}\}_{j=0}^{N_{e}-1}$. This is also a finite approximation to $|\tilde{u}-u|_{2}/|u|_{2}$.
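For reference, a one-function sketch of this relative error metric, assuming the predictions and the reference solution are given as arrays of matching shape:

```python
import numpy as np

def relative_error(u_pred, u_true):
    """Relative L2 error over the evaluation points, approximating |u~ - u|_2 / |u|_2."""
    return np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)
```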

We applied the above periodization tricks to each test and model for a fair comparison. For the Poisson equation with $s=2$, which admits exact solutions, we followed the original learning strategy using the L-BFGS-B method preceded by the Adam optimizer (Kingma2014b) for 50,000 iterations to ensure precise convergence. For the other datasets, which contain numerical solutions, we trained PINNs using the Adam optimizer with a single-cycle cosine decay to zero (Loshchilov2017) for 200,000 iterations and sampled a different set of collocation points at each iteration. See Appendix “Experimental Settings” for details.

We determined the collocation points using uniformly random sampling, uniformly spaced sampling, LHS, the Sobol sequence, and the proposed GLT. For the GLT, we took the number $N$ of collocation points and the corresponding integer vector $\bm{z}$ from the numerical tables in Fang1994 and Keng1981. We used the same values of $N$ for uniformly random sampling and LHS to maintain consistency. For uniformly spaced sampling, we selected numbers $N=m^{s}$ for $m\in\mathbb{N}$ closest to a number used in the GLT, forming a grid of $m$ points along each side of the unit cube, and additionally applied the randomization trick. For the Sobol sequence, we used $N=2^{m}$ for $m\in\mathbb{N}$ owing to its theoretical background. We conducted five trials for each number $N$ and each method. All experiments were conducted using Python v3.7.16 and TensorFlow v1.15.5 (tensorflow) on servers with Intel Xeon Platinum 8368 CPUs.

Figure 2: The number $N$ of collocation points versus the standard deviation of the physics-informed loss, which approximates the discretization error $|(2)-(1)|$.
Table 1: Trade-off between the number $N$ of collocation points and the relative error $\mathcal{L}$.

method             | # of points $N^{\dagger}$                              | relative error $\mathcal{L}^{\ddagger}$
                   | NLS     KdV     AC     Poisson $s=2$  Poisson $s=4$    | NLS    KdV    AC     Poisson $s=2$  Poisson $s=4$
uniformly random   | >4,181  >4,181  4,181  >4,181         1,019            | 3.11   2.97   1.55   28.53          0.28