Number Theoretic Accelerated Learning of Physics-Informed Neural Networks
Abstract
Physics-informed neural networks solve partial differential equations (PDEs) by training neural networks. Since this method approximates infinite-dimensional PDE solutions with a finite set of collocation points, minimizing discretization errors by selecting suitable points is essential for accelerating the learning process. Inspired by number theoretic methods for numerical analysis, we introduce good lattice training (GLT) together with periodization tricks, which ensure the conditions required by the theory. Our experiments demonstrate that GLT requires 2–7 times fewer collocation points, resulting in lower computational cost, while achieving competitive performance compared to typical sampling methods.
Introduction
Many real-world phenomena can be modeled as partial differential equations (PDEs), and solving PDEs has been a central topic in computational science. The applications include, but are not limited to, weather forecasting, vehicle design (Hirsch2006), economic analysis (Achdou2014), and computer vision (Logan2015). A PDE is expressed as $\mathcal{D}[u] = 0$, where $\mathcal{D}$ is a (possibly nonlinear) differential operator and $u$ is an unknown function on the domain $\Omega$. For most PDEs that appear in physical simulations, the well-posedness (the uniqueness of the solution and the continuous dependence on the initial and boundary conditions) has been well-studied and is typically guaranteed under certain conditions. To solve PDEs, various computational techniques have been explored, including finite difference methods, finite volume methods, and spectral methods (Furihata2010; Morton2005; Thomas1995). However, the development of computer architecture has become slower, leading to a growing need for computationally efficient alternatives. A promising approach is physics-informed neural networks (PINNs) (Raissi2019), which train a neural network by minimizing the physics-informed loss (Wang2022b; Wang2023). This is typically the squared error of the neural network's output $u_{\mathrm{NN}}$ from the PDE averaged over a finite set of collocation points $\{x_k\}_{k=0}^{N-1}$, $\frac{1}{N}\sum_{k=0}^{N-1}\|\mathcal{D}[u_{\mathrm{NN}}](x_k)\|^2$, encouraging the output to satisfy the equation $\mathcal{D}[u_{\mathrm{NN}}] = 0$.
Figure 1: Examples of collocation points: uniformly random, uniformly spaced, LHS, Sobol sequence, proposed GLT, and proposed GLT (folded).
However, the solutions to PDEs are inherently infinite-dimensional, and any distance involving the output $u_{\mathrm{NN}}$ or the solution $u$ needs to be defined by an integral over the domain $\Omega$. In this regard, the physics-informed loss serves as a finite approximation to the squared 2-norm $\int_{\Omega}\|\mathcal{D}[u_{\mathrm{NN}}](x)\|^2\,\mathrm{d}x$ on the function space, and hence the discretization errors should affect the training efficiency. A smaller number of collocation points leads to a less accurate approximation and inferior performance, while a larger number increases the computational cost (Bihlo2022; Sharma2022). Despite the importance of selecting appropriate collocation points, insufficient emphasis has been placed on this aspect. Raissi2019, Zeng2023, and many other studies employed Latin hypercube sampling (LHS) to determine the collocation points. Alternative approaches include uniformly random sampling (i.e., the Monte Carlo method) (Jin2021; Krishnapriyan2022) and uniformly spaced sampling (Wang2021c; Wang2022a). These methods are exemplified in Fig. 1.
In the field of numerical analysis, the relationship between integral approximation and collocation points has been extensively investigated. Accordingly, some studies have used quasi-Monte Carlo methods, specifically the Sobol sequence, which approximate integrals more accurately than the Monte Carlo method (Lye2020; Longo2021; Mishra2021). For further improvement, this paper proposes good lattice training (GLT) for PINNs and their variants, such as the competitive PINN (CPINN) (Zeng2023) and physics-informed neural operator (Li2021e; Rosofsky2023). The GLT is inspired by number theoretic methods for numerical analysis, providing an optimal set of collocation points depending on the initial and boundary conditions, as shown in Fig. 1. Our experiments demonstrate that the proposed GLT requires far fewer collocation points than comparison methods while achieving similar errors, significantly reducing computational cost. The contribution and significance of the proposed GLT are threefold.
Computationally Efficient: The proposed GLT offers an optimal set of collocation points for computing a loss function that can be regarded as a finite approximation to an integral over the domain, such as the physics-informed loss, provided the activation functions of the neural networks are smooth enough. It requires significantly fewer collocation points to achieve solution and system-identification accuracy comparable to other methods, or achieves lower errors within the same computational budget.
Applicable to PINNs Variants: As the proposed GLT changes only the collocation points, it can be applied to various variants of PINNs without modifying the learning algorithm or objective function. In this study, we investigate a specific variant, the CPINNs (Zeng2023), and demonstrate that CPINNs using the proposed GLT achieve superior convergence speed with significantly fewer collocation points than CPINNs using LHS.
Theoretically Solid: Number theory provides a theoretical basis for the efficacy of the proposed GLT. Existing methods based on quasi-Monte Carlo methods are inferior to the proposed GLT in theoretical performance, or at least require prior knowledge of the smoothness of the solution and careful adjustment of hyperparameters (Longo2021). On the other hand, the proposed GLT is free from such prior knowledge and adjustments and achieves better performance depending on the smoothness of the solution and of the neural network, which is a significant advantage.
Related Work
Neural networks are a powerful tool for processing information and have achieved significant success in various fields (He2015a; Vaswani2017), including black-box system identification, that is, learning the dynamics of physical phenomena from data and predicting their future behaviors (Chen2018e; Chen1990; Wang1998). By integrating knowledge from analytical mechanics, neural networks can learn dynamics that adhere to physical laws and even uncover these laws from data (Finzi2020; Greydanus2019; Matsubara2023ICLR).
Neural networks have also gained attention as computational tools for solving differential equations, particularly PDEs (Dissanayake1994; Lagaris1998). Recently, Raissi2019 introduced an elegant refinement to this approach and named it PINNs. The key concept behind PINNs is the physics-informed loss (Wang2022b; Wang2023). This loss function evaluates the extent to which the output of the neural network satisfies a given PDE and its associated initial and boundary conditions. The physics-informed loss can be integrated into other models like DeepONet (Lu2021a; Wang2023) or used for white-box system identification (that is, adjusting the parameters of known PDEs so that their solutions fit observations).
PINNs are applied to various PDEs (Bihlo2022; Jin2021; Mao2020), with significant efforts in improving learning algorithms and objective functions (Hao2023; Heldmann2023; Lu2022; Pokkunuru2023; Sharma2022; Zeng2023). Objective functions are generally based on PDEs evaluated at a finite set of collocation points rather than data. Bihlo2022; Sharma2022 have shown a trade-off between the number of collocation points (and hence computational cost) and the accuracy of the solution. Thus, selecting collocation points that efficiently cover the entire domain is essential for achieving better results. Some studies have employed quasi-Monte Carlo methods, specifically the Sobol sequence, to determine the collocation points (Lye2020; Mishra2021), but their effectiveness depends on knowledge of the solution’s smoothness and hyperparameter adjustments (Longo2021).
Method
Theoretical Error Estimate of PINNs
For simplicity, we consider PDEs defined on an $s$-dimensional unit cube $\Omega = [0,1]^s$. PINNs employ a PDE that describes the target physical phenomena as a loss function. Specifically, first, an appropriate set of collocation points $\{x_k\}_{k=0}^{N-1} \subset \Omega$ is determined, and then the mean of the squared residuals of the PDE at these points
$\mathcal{L}_N(u_{\mathrm{NN}}) = \frac{1}{N}\sum_{k=0}^{N-1}\|\mathcal{D}[u_{\mathrm{NN}}](x_k)\|^2$ (1)
is minimized as a loss function, where $\mathcal{D}$ is a differential operator. The physics-informed loss satisfies $\mathcal{L}_N(u_{\mathrm{NN}}) \ge 0$, and it vanishes when the residual $\mathcal{D}[u_{\mathrm{NN}}]$ is zero at all collocation points. Then, the neural network's output $u_{\mathrm{NN}}$ becomes an approximate solution of the PDE. However, for $u_{\mathrm{NN}}$ to be the exact solution, the residual $\mathcal{D}[u_{\mathrm{NN}}](x)$ should be 0 for all $x \in \Omega$, not just at the collocation points. Therefore, the following integral must be minimized as the loss function:
$\mathcal{L}(u_{\mathrm{NN}}) = \int_{[0,1]^s}\|\mathcal{D}[u_{\mathrm{NN}}](x)\|^2\,\mathrm{d}x$ (2)
In other words, the practical minimization of (1) essentially minimizes an approximation of (2) with the expectation that (2) will be small enough, and hence $u_{\mathrm{NN}}$ becomes an accurate approximation to the exact solution.
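To make the relationship between (1) and (2) concrete, the following is a minimal sketch of a physics-informed loss for a toy one-dimensional Poisson problem, written in PyTorch for brevity (the experiments reported later use TensorFlow 1.x; the network, PDE, and sampling here are illustrative only, not those of the paper):

```python
import torch

# Toy PDE: u''(x) + pi^2 sin(pi x) = 0 on [0, 1], so D[u](x) = u''(x) + pi^2 sin(pi x).
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def physics_informed_loss(x):
    # x: collocation points of shape (N, 1) in [0, 1]
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = d2u + torch.pi ** 2 * torch.sin(torch.pi * x)   # D[u](x_k)
    return (residual ** 2).mean()   # Eq. (1): finite approximation to the integral (2)

x = torch.rand(128, 1)              # e.g., uniformly random collocation points
loss = physics_informed_loss(x)
loss.backward()
```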
More precisely, we show the following theorem, which is an improvement of an existing error analysis (Mishra2023estimates) in the sense that the approximation error bound of neural networks is considered.
Theorem 1.
Suppose that the class of neural networks used for PINNs includes an $\epsilon_1$-approximator $\tilde u$ to the exact solution $u^*$ to the PDE $\mathcal{D}[u] = 0$: $\mathcal{L}(\tilde u) \le \epsilon_1$, and that (1) is an $\epsilon_2$-approximation of (2) for the approximated solution $\hat u$ (the minimizer of (1)) and also for $\tilde u$: $|\mathcal{L}_N(v) - \mathcal{L}(v)| \le \epsilon_2$ for $v = \hat u$ and $v = \tilde u$. Suppose also that there exist $C > 0$ and $\alpha > 0$ such that $\|v - u^*\| \le C\,\mathcal{L}(v)^{\alpha}$ for any $v$ in this class. Then, $\|\hat u - u^*\| \le C\,(\epsilon_1 + 2\epsilon_2)^{\alpha}$.
For a proof, see Appendix "Theoretical Background." $\epsilon_1$ is determined by the architecture of the network and the function space to which the solution belongs. For example, approximation rates of neural networks in Sobolev spaces are given in GUHRING2021107. This explains why increasing the number of collocation points beyond a certain point does not further reduce the error. $\epsilon_2$ depends on the accuracy of the approximation of the integral. In this paper, we investigate a training method that easily gives a small $\epsilon_2$.
One standard strategy often used in practice is to set the $x_k$'s to be uniformly distributed random numbers, which can be interpreted as the Monte Carlo approximation of the integral (2). As is widely known, the Monte Carlo method can approximate the integral within an error of $O(N^{-1/2})$ independently of the number of dimensions (Sloan1994-cl). However, most PDEs for physical simulations are two to four-dimensional, incorporating a three-dimensional space and a one-dimensional time. Hence, in this paper, we propose a sampling strategy specialized for low-dimensional cases, inspired by number-theoretic numerical analysis.
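As a quick numerical illustration of this $O(N^{-1/2})$ behavior, the following NumPy-only sketch estimates a smooth two-dimensional integral (whose exact value is 1, by construction of the test integrand) with random points and reports the root-mean-square error over repeated trials:

```python
import numpy as np

rng = np.random.default_rng(0)
# Test integrand on [0,1]^2 with exact integral 1: product of (1 + sin(2 pi x_j)).
f = lambda x: np.prod(1.0 + np.sin(2 * np.pi * x), axis=-1)

for n in [100, 1_000, 10_000, 100_000]:
    estimates = [f(rng.random((n, 2))).mean() for _ in range(20)]
    rmse = np.sqrt(np.mean((np.array(estimates) - 1.0) ** 2))
    print(f"N={n:>6}: RMSE ~ {rmse:.1e}  (reference N^-1/2 = {n ** -0.5:.1e})")
```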
Good Lattice Training
In this section, we propose the good lattice training (GLT), in which number theoretic numerical analysis is used to accelerate the training of PINNs (Niederreiter1992-qb; Sloan1994-cl; Zaremba1972-gj). In the following, we use some tools from this theory.
While our target is a PDE on the unit cube $[0,1]^s$, we now treat the integrand of the loss function, $f(x) := \|\mathcal{D}[u_{\mathrm{NN}}](x)\|^2$, as a periodic function on $\mathbb{R}^s$ with a period of 1 in each coordinate. Then, we define a lattice.
Definition 2 (Sloan1994-cl).
A lattice in $[0,1)^s$ is defined as a finite set of points in $[0,1)^s$ that is closed under addition and subtraction modulo 1.
Given a lattice $L$, the set of collocation points is defined as $\{x_k\}_{k=0}^{N-1} = L$, where $N$ is the number of lattice points. Considering that the loss function to be minimized is (2), it is desirable to determine the lattice $L$ (and hence the set of collocation points $x_k$'s) so that the difference between the two loss functions (1) and (2) is minimized.
Suppose that the integrand $f$ is smooth enough, admitting the Fourier series expansion $f(x) = \sum_{h \in \mathbb{Z}^s} \hat f(h)\exp(2\pi i\, h\cdot x)$, where $i$ denotes the imaginary unit and $\hat f(h) = \int_{[0,1]^s} f(x)\exp(-2\pi i\, h\cdot x)\,\mathrm{d}x$. Substituting this into (1) yields
$\mathcal{L}_N(u_{\mathrm{NN}}) = \frac{1}{N}\sum_{k=0}^{N-1} f(x_k) = \mathcal{L}(u_{\mathrm{NN}}) + \sum_{h \in \mathbb{Z}^s\setminus\{0\}} \hat f(h)\,\frac{1}{N}\sum_{k=0}^{N-1}\exp(2\pi i\, h\cdot x_k)$ (3)
because the zeroth Fourier mode $\hat f(0)$ of $f$ is equal to the integral (2). Before optimizing (3), the dual lattice of a lattice and an insightful lemma are introduced as follows.
Definition 3 (Zaremba1972-gj).
The dual lattice $L^{\perp}$ of a lattice $L$ is defined as $L^{\perp} = \{h \in \mathbb{Z}^s \mid h\cdot x \in \mathbb{Z} \text{ for all } x \in L\}$.
Lemma 4 (Zaremba1972-gj).
For $h \in \mathbb{Z}^s$, it holds that $\frac{1}{N}\sum_{k=0}^{N-1}\exp(2\pi i\, h\cdot x_k) = 1$ if $h \in L^{\perp}$ and $0$ otherwise.
Lemma 4 follows directly from the properties of Fourier series. Based on this lemma, we restrict the lattice points to the form $x_k = \{kz/N\}$ for $k = 0, \dots, N-1$ with a fixed integer vector $z \in \mathbb{Z}^s$, where $\{\cdot\}$ denotes the component-wise fractional part; the set of collocation points is $\{\{kz/N\}\}_{k=0}^{N-1}$. Then, instead of searching for the $x_k$'s, a vector $z$ is searched. By restricting to this form, the $x_k$'s can be obtained automatically from a given $z$, and hence the optimal collocation points do not need to be stored as a table of numbers, which is a significant advantage in implementation. Another advantage is theoretical; the optimization problem of the collocation points can be reformulated in a number theoretic way. In fact, for $x_k = \{kz/N\}$ as shown above, it is confirmed that $L^{\perp} = \{h \in \mathbb{Z}^s \mid h\cdot z \equiv 0 \pmod{N}\}$. If $h\cdot z \equiv 0 \pmod{N}$, then there exists an integer $m$ such that $h\cdot z = mN$, and hence $h\cdot x_k \equiv k\,h\cdot z/N = km \equiv 0 \pmod{1}$, that is, $h \in L^{\perp}$. Conversely, if $h \in L^{\perp}$, then in particular $h\cdot x_1 = h\cdot\{z/N\} \in \mathbb{Z}$, and clearly $h\cdot z \equiv 0 \pmod{N}$.
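The following minimal NumPy sketch generates such a rank-1 lattice from a given pair $(N, z)$ and numerically checks the characterization of the dual lattice used above (the particular $N$, $z$, and test vectors $h$ are illustrative):

```python
import numpy as np

def lattice_points(N, z):
    # Rank-1 lattice: x_k = {k z / N}, k = 0, ..., N-1 (braces = fractional part).
    k = np.arange(N)[:, None]
    return (k * np.asarray(z)[None, :] / N) % 1.0

def exp_sum(h, x):
    # (1/N) sum_k exp(2 pi i h . x_k): equals 1 if h is in the dual lattice, else 0.
    return np.exp(2j * np.pi * (x @ np.asarray(h))).mean()

N, z = 21, (1, 13)
x = lattice_points(N, z)
for h in [(13, 20), (1, 1)]:
    # h = (13, 20): h.z = 273 = 13 * 21, so h lies in the dual lattice; h = (1, 1) does not.
    print(h, np.dot(h, z) % N, abs(exp_sum(h, x)))
```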
From the above lemma,
$|\mathcal{L}_N(u_{\mathrm{NN}}) - \mathcal{L}(u_{\mathrm{NN}})| = \Bigl|\sum_{h \in L^{\perp}\setminus\{0\}} \hat f(h)\Bigr| \le \sum_{h \in L^{\perp}\setminus\{0\}} |\hat f(h)|$, (4)
and hence the collocation points $x_k$'s should be determined so that (4) becomes small. This problem is number theoretic in the sense that it is a minimization problem of finding an integer vector $z$, where the summation runs over the integer vectors $h \ne 0$ subject to the condition $h\cdot z \equiv 0 \pmod{N}$. This problem has been considered in the field of number theoretic numerical analysis. In particular, optimal solutions have been investigated for integrands in the Korobov spaces, which are spaces of functions that satisfy a certain smoothness condition.
Definition 5 (Zaremba1972-gj).
The function space $E_s^{\alpha}(c)$ defined as $E_s^{\alpha}(c) = \{f \mid |\hat f(h)| \le c\,(\bar h_1 \bar h_2 \cdots \bar h_s)^{-\alpha}\}$ is called the Korobov space, where $\hat f(h)$ denotes the Fourier coefficients of $f$ and $\bar h_j = \max(1, |h_j|)$ for $j = 1, \dots, s$.
It is known that if $\alpha$ is an integer, for a function $f$ to be in $E_s^{\alpha}(c)$, it is sufficient that $f$ has continuous mixed partial derivatives $\partial^{q_1+\cdots+q_s} f/\partial x_1^{q_1}\cdots\partial x_s^{q_s}$ for all $0 \le q_j \le \alpha$. For example, if a function $f$ of two variables has a continuous mixed derivative $\partial^2 f/\partial x_1\partial x_2$, then $f \in E_2^1(c)$ for some $c$. Hence, if the solution and the neural network $u_{\mathrm{NN}}$ are smooth enough for the integrand $f$ to belong to a Korobov space $E_s^{\alpha}(c)$,
$|\mathcal{L}_N(u_{\mathrm{NN}}) - \mathcal{L}(u_{\mathrm{NN}})| \le \sum_{h \in L^{\perp}\setminus\{0\}} |\hat f(h)| \le c\,P_{\alpha}(z, N), \quad P_{\alpha}(z, N) := \sum_{\substack{h \in \mathbb{Z}^s\setminus\{0\} \\ h\cdot z \equiv 0 \ (\mathrm{mod}\ N)}} (\bar h_1 \cdots \bar h_s)^{-\alpha}$. (5)
Here, we introduce a theorem in the field of number theoretic numerical analysis:
Theorem 6 (Sloan1994-cl).
For integers $N \ge 2$ and $s \ge 2$, there exists a $z \in \mathbb{Z}^s$ such that $P_{\alpha}(z, N) = O\!\left(\frac{(\log N)^{\alpha s}}{N^{\alpha}}\right)$ for $\alpha > 1$.
The main result of this paper is the following.
Theorem 7.
Suppose that the activation function of $u_{\mathrm{NN}}$, and hence the integrand $f$ itself, are sufficiently smooth so that there exists an $\alpha > 1$ such that $f \in E_s^{\alpha}(c)$. Then, for given integers $N \ge 2$ and $s \ge 2$, there exists an integer vector $z \in \mathbb{Z}^s$ such that the lattice $\{x_k = \{kz/N\}\}_{k=0}^{N-1}$ is a "good lattice" in the sense that
$|\mathcal{L}_N(u_{\mathrm{NN}}) - \mathcal{L}(u_{\mathrm{NN}})| = O\!\left(\frac{(\log N)^{\alpha s}}{N^{\alpha}}\right)$. (6)
Intuitively, if $u_{\mathrm{NN}}$ satisfies certain conditions, we can find a set of collocation points with which the objective function (1) approximates the integral (2) within an error of only $O((\log N)^{\alpha s}/N^{\alpha})$. This rate is much better than that of uniformly random sampling (i.e., the Monte Carlo method), which is of $O(N^{-1/2})$ (Sloan1994-cl), if the activation function and hence the neural network $u_{\mathrm{NN}}$ itself are sufficiently smooth so that $f \in E_s^{\alpha}(c)$ for a large $\alpha$. Hence, in this paper, we call the training method that minimizes (1) for a lattice satisfying (6) the good lattice training (GLT).
While any set of collocation points that satisfies the above condition achieves the same convergence rate, a set constructed from the vector $z$ that minimizes (5) leads to better accuracy. When $s = 2$, it is known that a good lattice can be constructed by setting $N = F_m$ and $z = (1, F_{m-1})$, where $F_m$ denotes the $m$-th Fibonacci number (Niederreiter1992-qb; Sloan1994-cl). In general, an algorithm exists that can determine the optimal $z$ at a cost that grows only polynomially in $N$. See Appendix "Theoretical Background" for more details. Also, we can retrieve the optimal $z$ from numerical tables found in references, such as Fang1994; Keng1981.
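As an illustration (a sketch, not the authors' implementation; the tables in Fang1994 and Keng1981 serve the same purpose), the Fibonacci construction for $s = 2$ and a brute-force Korobov-type search for general $s$ can be written as follows. The search scores a candidate $z = (1, a, a^2, \dots) \bmod N$ by $P_2(z, N)$, evaluated with the standard Bernoulli-polynomial identity $P_2(z, N) = -1 + \frac{1}{N}\sum_{k}\prod_{j}\bigl(1 + 2\pi^2 B_2(\{k z_j/N\})\bigr)$, where $B_2(x) = x^2 - x + 1/6$:

```python
import numpy as np

def fibonacci_lattice(m):
    # s = 2: N = F_m, z = (1, F_{m-1}), with F_1 = F_2 = 1.
    a, b = 1, 1
    for _ in range(m - 2):
        a, b = b, a + b
    return b, (1, a)

def P2(z, N):
    # P_2(z, N) via the Bernoulli polynomial B_2(x) = x^2 - x + 1/6.
    k = np.arange(N)[:, None]
    x = (k * np.asarray(z)[None, :] / N) % 1.0
    B2 = x ** 2 - x + 1.0 / 6.0
    return np.prod(1.0 + 2.0 * np.pi ** 2 * B2, axis=1).mean() - 1.0

def korobov_search(N, s):
    # Brute-force search over Korobov-type vectors z = (1, a, a^2, ...) mod N.
    best = min(range(1, N), key=lambda a: P2([pow(a, j, N) for j in range(s)], N))
    return [pow(best, j, N) for j in range(s)]

N, z = fibonacci_lattice(16)          # N = 987, z = (1, 610)
print(N, z, P2(z, N))
print(korobov_search(101, 3))         # a small three-dimensional example
```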
Periodization and Randomization Tricks
The integrand of the loss function (2) does not always belong to the Korobov space with high smoothness $\alpha$. To align the proposed GLT with theoretical expectations, we propose periodization tricks for ensuring periodicity and smoothness.
Given an initial condition at time $t = 0$, the periodicity is ensured by extending the lattice twice as much along the time coordinate and folding it. Specifically, instead of the lattice coordinate $\tau \in [0, 1)$, we use the time coordinate $t$ that satisfies $t = 2\tau$ if $\tau \le 1/2$ and $t = 2 - 2\tau$ otherwise (see the lower right panel of Fig. 1, where the time is put on the horizontal axis). Also, while not mandatory, we combine the initial condition $u_0$ and the neural network's output $u_{\mathrm{NN}}$ into a single ansatz that equals $u_0$ at $t = 0$ (e.g., $u_0(x) + t\,u_{\mathrm{NN}}(x, t)$), thereby ensuring the initial condition, where $x$ denotes the set of coordinates except for the time coordinate $t$. A similar idea was proposed in Lagaris1998, which however does not ensure the initial condition strictly. If a periodic boundary condition is given to the $j$-th space coordinate, we bypass learning it and instead map the coordinate to a unit circle in two-dimensional space. Specifically, we map $x_j$ to $(\cos 2\pi x_j, \sin 2\pi x_j)$, assuring the loss function to take the same value at both edges ($x_j = 0$ and $x_j = 1$) and be periodic. Given a Dirichlet boundary condition $u = 0$ at $x_j \in \{0, 1\}$ on the $j$-th axis, we multiply the neural network's output by a factor that vanishes at the boundary (e.g., $\sin(\pi x_j)$), and treat the result as the approximated solution. This ensures the Dirichlet boundary condition is met, and the loss function takes zero at the boundary $x_j \in \{0, 1\}$, thereby ensuring the periodicity. If a more complicated Dirichlet boundary condition is given, one can fold the lattice along the space coordinate in the same manner as the time coordinate and ensure the periodicity of the loss function.
These periodization tricks aim to satisfy the periodicity conditions necessary for GLT to exhibit the performance shown in Theorem 7. However, they are also applicable to other sampling methods and potentially improve the practical performance by liberating them from the effort of learning initial and boundary conditions.
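A minimal NumPy sketch of the two coordinate transformations described above, namely the folding of the time coordinate and the circular embedding of a periodic space coordinate (the ansatz combining the initial condition with the network output is omitted here):

```python
import numpy as np

def fold_time(tau):
    # Fold a lattice coordinate tau in [0, 1) into the time t in [0, 1]:
    # t = 2*tau on [0, 1/2], reflected on (1/2, 1), so the loss is periodic in tau.
    return np.where(tau <= 0.5, 2.0 * tau, 2.0 - 2.0 * tau)

def embed_periodic(x):
    # Map x in [0, 1) to (cos 2*pi*x, sin 2*pi*x) so the network input takes
    # identical values at x = 0 and x = 1.
    return np.stack([np.cos(2 * np.pi * x), np.sin(2 * np.pi * x)], axis=-1)

tau = np.linspace(0.0, 1.0, 9, endpoint=False)
print(fold_time(tau))
print(embed_periodic(tau).shape)
```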
Since the GLT is grounded on the Fourier series foundation, it allows for a periodic shifting of the lattice. Hence, we randomize the collocation points as $x_k = \{kz/N + \Delta\}$, where $\Delta$ follows the uniform distribution over the unit cube $[0, 1)^s$. Our preliminary experiments confirmed that, when using the stochastic gradient descent (SGD) algorithm, resampling the random shift $\Delta$ at each training iteration prevents the neural network from overfitting and improves training efficiency. We call this approach the randomization trick.
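A sketch of this trick, assuming the rank-1 lattice form above (the training-loop body is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def shifted_lattice(N, z, rng):
    # Shift the whole lattice by a fresh uniform random vector Delta and wrap modulo 1.
    k = np.arange(N)[:, None]
    delta = rng.random(len(z))                      # Delta ~ U([0, 1)^s)
    return (k * np.asarray(z)[None, :] / N + delta) % 1.0

for _ in range(3):                                  # e.g., three training iterations
    x = shifted_lattice(89, (1, 55), rng)           # resample the shift at each step
    # ... compute the physics-informed loss at x and update the network ...
```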
Limitations
This limitation is not unique to the proposed GLT: most strategies for determining collocation points are not directly applicable to non-rectangular or non-flat domains (Shankar2018). To achieve the best performance, the PDE should be transformed onto a rectangular domain by an appropriate coordinate transformation. See Knupp2020-ix; Thompson1985-tx for examples.
Previous studies on numerical analysis addressed the periodicity and smoothness conditions on the integrand by variable transformations (Sloan1994-cl) (see also Appendix "Theoretical Background"). However, our preliminary experiments confirmed that they did not perform optimally in typical problem settings. Intuitively, these variable transformations reduce the weights of regions that are difficult to integrate, suppressing the discretization error. This implies that, when used for training, the regions with small weights remain unlearned. As a viable alternative, we introduced the periodization tricks to ensure periodicity.
The performance depends on the smoothness of the physics-informed loss, and hence on the smoothness of the neural network and the true solution. See Appendix “Theoretical Background” for details.
Experiments and Results
Physics-Informed Neural Networks
We modified the code from the official repository (https://github.com/maziarraissi/PINNs; MIT license) of Raissi2019, the original paper on PINNs. We obtained the datasets of the nonlinear Schrödinger (NLS) equation, Korteweg–De Vries (KdV) equation, and Allen–Cahn (AC) equation from the repository. The NLS equation governs wave functions in quantum mechanics, while the KdV equation models shallow water waves, and the AC equation characterizes phase separation in co-polymer melts. These datasets provide numerical solutions to initial value problems with periodic boundary conditions. Although they contain numerical errors, we treated them as the true solutions. These equations are nonlinear versions of hyperbolic or parabolic PDEs. Additionally, as a nonlinear version of an elliptic PDE, we created a dataset for the $s$-dimensional Poisson equation, which admits analytical solutions composed of multiple modes under the Dirichlet boundary condition. We examined several values of $s$. See Appendix "Experimental Settings" for further information.
Unless otherwise stated, we followed the repository's experimental settings for the NLS equation. The physics-informed loss was defined as $\frac{1}{N}\sum_{k=0}^{N-1}\|\mathcal{D}[u_{\mathrm{NN}}](x_k)\|^2$ given collocation points $\{x_k\}_{k=0}^{N-1}$. This can be regarded as a finite approximation to the squared 2-norm $\int_{\Omega}\|\mathcal{D}[u_{\mathrm{NN}}](x)\|^2\,\mathrm{d}x$. The state of the NLS equation is complex-valued; we simply treated it as a 2D real vector for training and used its absolute value for evaluation and visualization. Following Raissi2019, we evaluated the performance using the relative error, which is the normalized squared error at predefined collocation points. This is also a finite approximation to the squared relative $L^2$-error $\|u_{\mathrm{NN}} - u\|_2^2/\|u\|_2^2$.
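For reference, a minimal sketch of this metric in its non-squared form (squaring it gives the squared variant mentioned above), assuming the prediction and the reference solution are sampled on the same evaluation points:

```python
import numpy as np

def relative_error(u_pred, u_true):
    # Relative L2 error ||u_pred - u_true|| / ||u_true|| over the evaluation points.
    u_pred, u_true = np.asarray(u_pred), np.asarray(u_true)
    return float(np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true))
```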
We applied the above periodization tricks to each test and model for a fair comparison. For the Poisson equation, for which exact solutions are available, we followed the original learning strategy using the L-BFGS-B method preceded by the Adam optimizer (Kingma2014b) for 50,000 iterations to ensure precise convergence. For the other datasets, which contain numerical solutions, we trained PINNs using the Adam optimizer with cosine decay of a single cycle to zero (Loshchilov2017) for 200,000 iterations and sampled a different set of collocation points at each iteration. See Appendix "Experimental Settings" for details.
We determined the collocation points using uniformly random sampling, uniformly spaced sampling, LHS, the Sobol sequence, and the proposed GLT. For the GLT, we took the number $N$ of collocation points and the corresponding integer vector $z$ from numerical tables in Fang1994; Keng1981. We used the same values of $N$ for uniformly random sampling and LHS to maintain consistency. For uniformly spaced sampling, we selected the number of points per axis so that the total number of grid points was closest to a number $N$ used in the GLT, creating a grid with an equal number of points on each side of the unit cube. We additionally applied the randomization trick. For the Sobol sequence, we used powers of two for $N$ due to its theoretical background. We conducted five trials for each number $N$ and each method. All experiments were conducted using Python v3.7.16 and tensorflow v1.15.5 (tensorflow) on servers with Intel Xeon Platinum 8368 CPUs.
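As a rough illustration of these sampling schemes, the following sketch assumes SciPy's qmc module and NumPy; the values of $N$ and $z$ are illustrative and not taken from the paper's tables:

```python
import numpy as np
from scipy.stats import qmc   # assumed dependency; not from the paper's code base

rng = np.random.default_rng(0)

# Uniformly random and LHS: same N as the GLT (here the Fibonacci number 987).
N = 987
x_random = rng.random((N, 2))
x_lhs = qmc.LatinHypercube(d=2, seed=0).random(N)

# Uniformly spaced: the grid size closest to N, here 31 x 31 = 961 points.
g = (np.arange(31) + 0.5) / 31
x_grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)

# Sobol sequence: a power of two, N = 2^10 = 1024.
x_sobol = qmc.Sobol(d=2, scramble=True, seed=0).random_base2(m=10)

# Proposed GLT: rank-1 lattice with N = 987 = F_16 and z = (1, 610) = (1, F_15).
k = np.arange(N)[:, None]
x_glt = (k * np.array([1, 610]) / N) % 1.0
```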
| | # of points | relative error |
| | NLS | KdV | AC | Poisson | NLS | KdV | AC | Poisson |
| uniformly random | 4,181 | 4,181 | 4,181 | 4,181 | 1,019 | 3.11 | 2.97 | 1.55 | 28.53 | 0.28 |