
The Random Feature Method for Time-dependent Problems

Jingrun Chen School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China and Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215123, China. [email protected] Weinan E AI for Science Institute, Beijing and Center for Machine Learning Research and School of Mathematical Sciences, Peking University. [email protected]  and  Yixin Luo University of Science and Technology of China, Hefei 230026, China and Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215123, China. [email protected] Dedicated to Professor Tao Tang on the occasion of his 60th birthday
Abstract.

We present a framework for solving time-dependent partial differential equations (PDEs) in the spirit of the random feature method. The numerical solution is constructed using a space-time partition of unity and random feature functions. Two different ways of constructing the random feature functions are investigated: feature functions that treat the spatial and temporal variables on the same footing (STC), or functions that are the product of two random feature functions depending on the spatial and temporal variables separately (SoV). Boundary and initial conditions are enforced by penalty terms. We also study two ways of solving the resulting least-squares problem: the problem is solved as a whole or solved using the block time-marching strategy. The former is termed “the space-time random feature method” (ST-RFM). Numerical results for a series of problems show that the proposed methods, i.e. ST-RFM with STC and ST-RFM with SoV, have spectral accuracy in both space and time. In addition, ST-RFM only requires collocation points, not a mesh. This is important for solving problems with complex geometry. We demonstrate this by using ST-RFM to solve a two-dimensional wave equation over a complex domain. The two strategies differ significantly in terms of their behavior in time. When block time-marching is used, we prove a lower error bound that shows an exponentially growing factor with respect to the number of blocks in time. For ST-RFM, we prove an upper bound with a sublinearly growing factor with respect to the number of subdomains in time. These estimates are also confirmed by numerical results.

Key words and phrases:
time-dependent PDEs, partition of unity method, random feature method, collocation method, separation-of-variables random features
2010 Mathematics Subject Classification:
Primary: 65M20,65M55,65M70.

1. Introduction

Time-dependent partial differential equations (PDEs), such as the diffusion equation, the wave equation, Maxwell's equations, and the Schrödinger equation, are widely used for modeling the dynamic evolution of physical systems. Numerical methods, including finite difference methods [11], finite element methods [17], and spectral methods [15], have been proposed to solve these PDEs. Despite their great success in theory and application, these methods still face challenges such as complex geometry, mesh generation, and possibly high dimensionality.

Along another line, the success of deep learning in computer vision and natural language processing [8] has attracted great attention in the scientific computing community. As a special class of functions, neural networks have been proved to be universal approximators of continuous functions [3]. Many researchers have sought to solve ordinary and partial differential equations with neural networks [6, 9, 5, 16, 19, 14, 7]. Since the PDE solution can be defined in variational (if it exists), strong, and weak forms, the deep Ritz method [5], the deep Galerkin method [16] and physics-informed neural networks [14], and the weak adversarial network [19] have been proposed, using loss (objective) functions in the variational, strong, and weak forms, respectively. Deep learning-based algorithms have now made it fairly routine to solve a large class of PDEs in high dimensions without the need for mesh generation of any kind.

For low-dimensional problems, traditional methods are accurate, with reliable error control, stability analysis, and affordable cost. In practice, however, coming up with a suitable mesh is often a highly non-trivial task, especially for complex geometry. In contrast, machine-learning methods are mesh-free and only require collocation points. Even for low-dimensional problems, this feature is very attractive. What bothers a user is the lack of reliable error control in machine-learning methods. For example, without an exact solution, the numerical approximation given by a machine-learning method does not show a clear trend of convergence as the number of parameters increases.

There have been efforts to combine the merits of traditional methods and deep learning-based methods. The key ingredient is to replace deep neural networks with a special class of two-layer neural networks whose inner parameters are fixed, known as random features [12, 13] or extreme learning machines [10]. Random feature functions have also been proved to be universal approximators; meanwhile, only the parameters of the output layer need to be optimized, leading to a convex optimization problem. Extreme learning machines are employed to solve ordinary and partial differential equations in [18] and [1], respectively. Spectral accuracy is obtained for problems with analytic solutions, and the simplicity of the network architecture reduces the training difficulty in terms of execution time and solution accuracy, compared to deep neural networks. In [4], a special kind of partition of unity (PoU), termed domain decomposition, is combined with extreme learning machines to approximate the PDE solution, and the block time-marching strategy is proposed for long time simulations. Spectral accuracy is obtained in both space and time for analytic solutions, but the error grows exponentially fast in most cases as the simulation time increases. In [2], combining PoU and random feature functions, the random feature method (RFM) is proposed to solve static PDEs with complex geometries. An automatic rescaling strategy is also proposed to balance the weights of equations and boundary conditions, which is found to work well for linear elasticity and Stokes flow over complex geometries.

The objective of this article is to propose a methodology for solving time-dependent PDEs that shares the merits of both traditional and machine learning-based algorithms. This new class of algorithms can be made spectrally accurate in both space and time. Meanwhile, they are also mesh-free, making them easy to use even in settings with complex geometry. Our starting point is a combination of rather simple and well-known ideas: We use a space-time PoU and random feature functions to represent the approximate solution, the collocation method to take care of the PDE as well as the boundary conditions in the least-squares sense, and a rescaling procedure to balance the contributions from the PDE and the initial/boundary conditions in the loss function. This method is called ST-RFM. For time-dependent problems, a random feature function can either depend jointly on the spatial and temporal variables, i.e. take a space-time concatenation input (STC), or be the product of two random feature functions depending on the spatial and temporal variables separately (SoV). STC can be viewed as a natural extension of the random feature method for static problems [2], while SoV may be a better choice for some time-dependent PDEs. Both STC and SoV are proved to be universal approximators. For long time intervals, a PoU for the temporal variable, i.e. domain decomposition along the time direction, is proposed to solve time-dependent problems. Error estimates of the ST-RFM and the block time-marching strategy are provided. ST-RFM yields spectrally accurate results with slowly growing error in terms of the number of subdomains in time, while the error generated by the block time-marching strategy in [4] grows exponentially fast in terms of the number of blocks. These findings are confirmed by numerical results in one and two dimensions.

This article is organized as follows. In Section 2, we present the ST-RFM and prove the approximation property of STC and SoV. An upper-bound error estimate for the ST-RFM and a lower-bound error estimate for the block time-marching strategy are provided. In Section 3, numerical experiments in one and two dimensions for heat, wave, and Schrödinger equations are conducted to show the spectral accuracy of the proposed method and to confirm the error estimates for long time simulations. An application of the ST-RFM to a two-dimensional wave equation over a complex geometry is also provided. Conclusions are drawn in Section 4.

2. Random Feature Method for Solving Time-dependent PDEs

For completeness, we first recall the RFM for solving static problems. We then introduce the ST-RFM for solving time-dependent PDEs, and prove the universal approximation property and the error estimate.

2.1. Random Feature Method for Static Problems

Let 𝒙Ωdx\boldsymbol{x}\in\Omega\subset\mathbb{R}^{d_{x}}, where dx+d_{x}\in\mathbb{N}^{+} is the dimension of 𝒙\boldsymbol{x}, and let du+d_{u}\in\mathbb{N}^{+} be the dimension of output. Consider the following boundary-value problem

(1) {𝒖(𝒙)=𝒇(𝒙),𝒙Ω,𝒖(𝒙)=𝒈(𝒙),𝒙Ω,\left\{\begin{aligned} \mathcal{L}\boldsymbol{u}(\boldsymbol{x})&=\boldsymbol{f}(\boldsymbol{x}),&\quad\boldsymbol{x}\in\Omega,\\ \mathcal{B}\boldsymbol{u}(\boldsymbol{x})&=\boldsymbol{g}(\boldsymbol{x}),&\quad\boldsymbol{x}\in\partial\Omega,\end{aligned}\right.

where 𝒇\boldsymbol{f} and 𝒈\boldsymbol{g} are known functions, \mathcal{L} and \mathcal{B} are differential and boundary operators, respectively. Ω\partial\Omega is the boundary of Ω\Omega.

The RFM has the following components. First, N points \{\hat{\boldsymbol{x}}_{i}\}, typically uniformly distributed, are chosen from \Omega. Then, \Omega is decomposed into N subdomains \{\Omega_{i}\} with \hat{\boldsymbol{x}}_{i}\in\Omega_{i}; thus, we have \Omega\subset\cup_{i}\Omega_{i}. For each \Omega_{i}, a PoU function \psi_{i} with support \Omega_{i}, i.e. \mathrm{supp}(\psi_{i})=\Omega_{i}, is constructed. In one dimension, two PoU functions are commonly used:

(2a) ψa(x)\displaystyle\psi_{a}(x) =𝕀[1,1](x),\displaystyle=\mathds{I}_{[-1,1]}(x),
(2b) ψb(x)\displaystyle\psi_{b}(x) =𝕀[54,34](x)1+sin(2πx)2+𝕀[34,34](x)+𝕀[34,54](x)1sin(2πx)2.\displaystyle=\mathds{I}_{[-\frac{5}{4},-\frac{3}{4}]}(x)\frac{1+\sin(2\pi x)}{2}+\mathds{I}_{[-\frac{3}{4},\frac{3}{4}]}(x)+\mathds{I}_{[\frac{3}{4},\frac{5}{4}]}(x)\frac{1-\sin(2\pi x)}{2}.

The first PoU is discontinuous while the second one is continuously differentiable. In high dimensions, the PoU function ψ\psi can be constructed as a tensor product of dxd_{x} one-dimensional PoU functions ψ\psi, i.e. ψ(𝒙)=Πi=1dxψ(xi)\psi({\boldsymbol{x}})=\Pi_{i=1}^{d_{x}}\psi(x_{i}).
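The following is a minimal sketch (in Python, not part of the original paper) of the two one-dimensional PoU functions (2a)-(2b) and their tensor-product extension to higher dimensions; the function names are ours.

```python
import numpy as np

def psi_a(x):
    """Discontinuous PoU (2a): indicator of [-1, 1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.where(np.abs(x) <= 1.0, 1.0, 0.0)

def psi_b(x):
    """C^1 PoU (2b): constant 1 on [-3/4, 3/4], sine blending down to 0 at +-5/4."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    left  = (x >= -1.25) & (x < -0.75)
    mid   = (x >= -0.75) & (x <= 0.75)
    right = (x > 0.75) & (x <= 1.25)
    out[left]  = (1.0 + np.sin(2.0 * np.pi * x[left])) / 2.0
    out[mid]   = 1.0
    out[right] = (1.0 - np.sin(2.0 * np.pi * x[right])) / 2.0
    return out

def psi_tensor(x, psi=psi_b):
    """High-dimensional PoU as a tensor product of 1D PoU functions: psi(x) = prod_k psi(x_k)."""
    return np.prod(psi(np.atleast_2d(x)), axis=1)
```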

Then, for each Ωi\Omega_{i}, RFM constructs Jn+J_{n}\in\mathbb{N}^{+} random features ϕij\phi_{ij} by a two-layer neural network with random but fixed parameters 𝒌ij\boldsymbol{k}_{ij} and bijb_{ij}, i.e.

(3) ϕij(𝒙)=σ(𝒌ijli(𝒙)+bij),j=1,2,,Jn,\phi_{ij}(\boldsymbol{x})=\sigma(\boldsymbol{k}_{ij}^{\top}l_{i}(\boldsymbol{x})+b_{ij}),\qquad j=1,2,\cdots,J_{n},

where the nonlinear activation function σ\sigma is chosen as tanh or trigonometric functions in [2]. In (3), each component of 𝒌ij\boldsymbol{k}_{ij} and bijb_{ij} is independently sampled from 𝕌(Rm,Rm)\mathds{U}(-R_{m},R_{m}), where Rm+R_{m}\in\mathbb{R}^{+} controls the magnitude of parameters. In particular, both the weights and the biases are fixed in the optimization procedure. Moreover, li(𝒙)l_{i}(\boldsymbol{x}) is a linear transformation to transform inputs in Ωi\Omega_{i} to [1,1]dx[-1,1]^{d_{x}}. The approximate solution in RFM is the linear combination of these random features together with the PoU

(4) uM(𝒙)=i=1Nψi(𝒙)j=1Jnuijϕij(𝒙),u_{M}(\boldsymbol{x})=\sum_{i=1}^{N}\psi_{i}(\boldsymbol{x})\sum_{j=1}^{J_{n}}u_{ij}\phi_{ij}(\boldsymbol{x}),

where u_{ij}\in\mathbb{R} are unknown coefficients to be sought and M=N\cdot J_{n} denotes the number of degrees of freedom. For vectorial solutions, we approximate each component of the solution by (4), i.e.

(5) 𝒖M(𝒙)=(uM1(𝒙),,uMdu(𝒙)).\boldsymbol{u}_{M}(\boldsymbol{x})=(u_{M}^{1}(\boldsymbol{x}),\cdots,u_{M}^{d_{u}}(\boldsymbol{x}))^{\top}.
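As an illustration, the following sketch (assuming one spatial dimension, tanh activation, and uniform sampling of the fixed inner parameters, as described above) builds the random features (3) on each subdomain and evaluates the RFM representation (4) for given coefficients; all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_rfm(centers, radius, J_n, R_m=1.0):
    """Return u_M(x, coeffs): J_n fixed tanh features per subdomain, combined by a C^1 PoU."""
    K = rng.uniform(-R_m, R_m, size=(len(centers), J_n))   # inner weights k_ij (sampled once, then fixed)
    B = rng.uniform(-R_m, R_m, size=(len(centers), J_n))   # biases b_ij (fixed)

    def pou(y):                                            # PoU of the form (2b) in the local coordinate y
        a = np.abs(y)
        return np.where(a <= 0.75, 1.0,
                        np.where(a <= 1.25, (1.0 - np.sin(2.0 * np.pi * a)) / 2.0, 0.0))

    def u_M(x, coeffs):                                    # coeffs has shape (N, J_n), cf. (4)
        x = np.atleast_1d(np.asarray(x, dtype=float))
        out = np.zeros_like(x)
        for i, c in enumerate(centers):
            y = (x - c) / radius                           # l_i(x), the rescaled local coordinate
            out += pou(y) * (np.tanh(np.outer(y, K[i]) + B[i]) @ coeffs[i])
        return out

    return u_M

# hypothetical usage: two subdomains covering [0, 1]
u_M = build_rfm(centers=np.array([0.25, 0.75]), radius=0.25, J_n=50)
print(u_M(np.linspace(0.0, 1.0, 5), np.zeros((2, 50))))    # identically zero for zero coefficients
```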

To find the optimal set of parameters {uij}\{u_{ij}\}, RFM evaluates the original problem on collocation points and formulates a least-squares problem. To be specific, RFM samples Q+Q\in\mathbb{N}^{+} collocation points 𝒙i,1,,𝒙i,Q\boldsymbol{x}_{i,1},\cdots,\boldsymbol{x}_{{i,Q}} in each Ωi\Omega_{i}, and computes the rescaling parameters λi,k,qp>0\lambda_{i,k,q}^{p}>0 and λi,k,eb>0\lambda_{i,k,e}^{b}>0 for i[N]i\in[N], k[du]k\in[d_{u}], q[Q]q\in[Q] and e[Q]e\in[Q] satisfying 𝒙i,eΩ\boldsymbol{x}_{i,e}\in\partial\Omega. Let λi,qp=diag(λi,1,qp,,λi,du,qp)\lambda_{i,q}^{p}=\mathrm{diag}(\lambda_{i,1,q}^{p},\cdots,\lambda_{i,d_{u},q}^{p})^{\top} and λi,eb=diag(λi,1,eb,,λi,du,eb)\lambda_{i,e}^{b}=\mathrm{diag}(\lambda_{i,1,e}^{b},\cdots,\lambda_{i,d_{u},e}^{b})^{\top}. Then, the random feature method minimizes

(6) \mathrm{Loss}(\{u_{i,j,k}\})=\sum_{i=1}^{N}\left(\sum_{q=1}^{Q}\|\lambda^{p}_{i,q}(\mathcal{L}\boldsymbol{u}_{M}(\boldsymbol{x}_{i,q})-\boldsymbol{f}(\boldsymbol{x}_{i,q}))\|_{2}^{2}+\sum_{\boldsymbol{x}_{i,e}\in\partial\Omega}\|\lambda^{b}_{i,e}(\mathcal{B}\boldsymbol{u}_{M}(\boldsymbol{x}_{i,e})-\boldsymbol{g}(\boldsymbol{x}_{i,e}))\|_{2}^{2}\right),

where \boldsymbol{u}_{M} is of the form (5) and u_{i,j,k} is the coefficient of the random feature \phi_{ij} in the k-th component of \boldsymbol{u}_{M}.

The above problem (6) is a linear least-squares problem when \mathcal{L} and \mathcal{B} are linear operators. Moreover, when the discontinuous PoU ψa\psi_{a} is used, continuity conditions between adjacent subdomains must be imposed by adding regularization terms in (6), while no regularization is required when ψb\psi_{b} is used for second-order equations. By minimizing (6), the optimal coefficients 𝒖=(uijk,)\boldsymbol{u}^{*}=(u_{ijk}^{*},)^{\top} are obtained and the numerical solution is constructed by (5).
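To make the least-squares formulation (6) concrete, here is a small self-contained sketch for a manufactured 1D Poisson problem -u'' = f on [0, 1] with u(0) = u(1) = 0, a single subdomain, tanh features, and unit rescaling parameters. The model problem and all parameter values are illustrative choices of ours, not taken from the paper; since the operators are linear, the loss is quadratic in the coefficients and the minimizer is obtained from one linear least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(1)
J_n, Q, R_m = 200, 100, 3.0
c, r = 0.5, 0.5                          # l(x) = (x - c) / r maps [0, 1] onto [-1, 1]
k = rng.uniform(-R_m, R_m, J_n)          # fixed inner weights
b = rng.uniform(-R_m, R_m, J_n)          # fixed biases

def z(x):      return np.outer((x - c) / r, k) + b
def phi(x):    return np.tanh(z(x))                              # random features (3)
def phi_xx(x):                                                   # d^2/dx^2 of tanh(k (x - c)/r + b)
    t = np.tanh(z(x))
    return (k / r) ** 2 * (-2.0 * t * (1.0 - t ** 2))

x_int = np.linspace(0.0, 1.0, Q)                 # collocation points
f = np.pi ** 2 * np.sin(np.pi * x_int)           # manufactured source; exact solution sin(pi x)
A = np.vstack([-phi_xx(x_int),                   # PDE rows:      -u_M''(x_q) = f(x_q)
               phi(np.array([0.0, 1.0]))])       # boundary rows:  u_M(0) = u_M(1) = 0
rhs = np.concatenate([f, [0.0, 0.0]])
u_coef, *_ = np.linalg.lstsq(A, rhs, rcond=None) # minimizes the quadratic loss (6)

x_test = np.linspace(0.0, 1.0, 500)
print("max error:", np.max(np.abs(phi(x_test) @ u_coef - np.sin(np.pi * x_test))))
```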

2.2. Space-Time Random Feature Method

Now, we consider time-dependent PDEs of the following form with the final time T>0T>0

(7) {𝒖(𝒙,t)=𝒇(𝒙,t),𝒙,tΩ×[0,T],𝒖(𝒙,t)=𝒈(𝒙,t),𝒙,tΩ×[0,T],𝒖(𝒙,0)=𝒉(𝒙),𝒙Ω,\left\{\begin{aligned} \mathcal{L}\boldsymbol{u}(\boldsymbol{x},t)&=\boldsymbol{f}(\boldsymbol{x},t),&\quad\boldsymbol{x},t\in\Omega\times[0,T],\\ \mathcal{B}\boldsymbol{u}(\boldsymbol{x},t)&=\boldsymbol{g}(\boldsymbol{x},t),&\quad\boldsymbol{x},t\in\partial\Omega\times[0,T],\\ \mathcal{I}\boldsymbol{u}(\boldsymbol{x},0)&=\boldsymbol{h}(\boldsymbol{x}),&\quad\boldsymbol{x}\in\Omega,\\ \end{aligned}\right.

where 𝒇\boldsymbol{f}, 𝒈\boldsymbol{g} and 𝒉\boldsymbol{h} are known functions and \mathcal{I} is the initial operator.

Following the same routine as in RFM, we construct a partition of \Omega\times[0,T]. First, we decompose the spatial domain \Omega into N_{x} subdomains, where each subdomain \Omega_{i} contains a central point \hat{\boldsymbol{x}}_{i}. We then decompose the temporal interval [0,T] into N_{t} subdomains, i.e. [0,T)=[t_{0},t_{1})\cup[t_{1},t_{2})\cup\cdots\cup[t_{N_{t}-1},t_{N_{t}}), where each subdomain contains a central point \hat{t}_{i}=\frac{t_{i-1}+t_{i}}{2}. The product of the PoUs in space and in time results in the space-time PoU, i.e.

(8) ψix,it(𝒙,t)=ψix(𝒙)ψit(t),\psi_{i_{x},i_{t}}(\boldsymbol{x},t)=\psi_{i_{x}}(\boldsymbol{x})\psi_{i_{t}}(t),

where ixi_{x} and iti_{t} are indices for spatial and temporal subdomains, respectively.

Next, we generalize spatial random features (3) to space-time random features. There are two options. The first can be viewed as a natural extension of (3), where the concatenation of spatial and temporal variables is fed into ϕij\phi_{ij} as the input and the output is a random feature function, i.e.

(9) ϕix,it,j(𝒙,t)=σ((𝒌ix,it,jx)lix(𝒙)+kix,it,jtlit(t)+bix,it,j).\phi_{i_{x},i_{t},j}(\boldsymbol{x},t)=\sigma((\boldsymbol{k}_{i_{x},i_{t},j}^{x})^{\top}l_{i_{x}}(\boldsymbol{x})+k^{t}_{i_{x},i_{t},j}l_{i_{t}}(t)+b_{i_{x},i_{t},j}).

Here 𝒌ix,it,jx\boldsymbol{k}_{i_{x},i_{t},j}^{x} and kix,it,jtk_{i_{x},i_{t},j}^{t} are the weights associated with spatial and temporal inputs, respectively. bix,it,jb_{i_{x},i_{t},j} is the bias, lix(𝒙)l_{i_{x}}(\boldsymbol{x}) and lit(t)l_{i_{t}}(t) are linear transformations from 𝒙Ωix\boldsymbol{x}\in\Omega_{i_{x}} to [1,1]dx[-1,1]^{d_{x}} and from t[tit1,tit]t\in[t_{i_{t}-1},t_{i_{t}}] to [1,1][-1,1], respectively. (9) is called STC.

The second option is to use separation of variables, which mimics the technique of separation of variables for solving PDEs

(10) ϕix,it,j(𝒙,t)=σ((𝒌ix,it,jx)lix(𝒙)+bix,it,jx)σ(kix,it,jtlit(t)+bix,it,jt).\phi_{i_{x},i_{t},j}(\boldsymbol{x},t)=\sigma((\boldsymbol{k}_{i_{x},i_{t},j}^{x})^{\top}l_{i_{x}}(\boldsymbol{x})+b^{x}_{i_{x},i_{t},j})\sigma(k^{t}_{i_{x},i_{t},j}l_{i_{t}}(t)+b_{i_{x},i_{t},j}^{t}).

In this formulation, the space-time random feature is the product of a spatial random feature and a temporal random feature; thus we term it SoV. For both types of random features, the number of degrees of freedom is M=N_{x}\cdot N_{t}\cdot J_{n}.
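The distinction between (9) and (10) can be summarized in a short sketch (hypothetical parameter choices, tanh activation): STC feeds the rescaled space-time input through a single activation, while SoV multiplies a spatial feature by a temporal one.

```python
import numpy as np

rng = np.random.default_rng(2)
J_n, R_m, d_x = 100, 1.0, 2

kx = rng.uniform(-R_m, R_m, size=(J_n, d_x))   # spatial weights (fixed)
kt = rng.uniform(-R_m, R_m, size=J_n)          # temporal weights (fixed)
b  = rng.uniform(-R_m, R_m, size=J_n)          # STC bias
bx = rng.uniform(-R_m, R_m, size=J_n)          # SoV spatial bias
bt = rng.uniform(-R_m, R_m, size=J_n)          # SoV temporal bias

def features_stc(lx, lt):
    """STC, Eq. (9): a single tanh of the concatenated (space, time) input."""
    return np.tanh(lx @ kx.T + np.outer(lt, kt) + b)

def features_sov(lx, lt):
    """SoV, Eq. (10): product of a spatial random feature and a temporal random feature."""
    return np.tanh(lx @ kx.T + bx) * np.tanh(np.outer(lt, kt) + bt)

lx = rng.uniform(-1.0, 1.0, size=(5, d_x))     # rescaled spatial inputs l_x(x)
lt = rng.uniform(-1.0, 1.0, size=5)            # rescaled temporal inputs l_t(t)
print(features_stc(lx, lt).shape, features_sov(lx, lt).shape)   # (5, J_n) in both cases
```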

The combination of these space-time random features and PoU leads to the following approximate solution in the scalar case

(11) uM(𝒙,t)=ix=1Nxit=1Ntψix(𝒙)ψit(t)j=1Jnuix,it,jϕix,it,j(𝒙,t),u_{M}(\boldsymbol{x},t)=\sum_{i_{x}=1}^{N_{x}}\sum_{i_{t}=1}^{N_{t}}\psi_{i_{x}}(\boldsymbol{x})\psi_{i_{t}}(t)\sum_{j=1}^{J_{n}}u_{i_{x},i_{t},j}\phi_{i_{x},i_{t},j}(\boldsymbol{x},t),

where uix,it,ju_{i_{x},i_{t},j} is the unknown coefficient to be sought. A vectorial solution 𝒖M\boldsymbol{u}_{M} can be formulated in the same way as that in (5), i.e.

(12) 𝒖M(𝒙,t)=(uM1(𝒙,t),,uMdu(𝒙,t)).\boldsymbol{u}_{M}(\boldsymbol{x},t)=(u_{M}^{1}(\boldsymbol{x},t),\cdots,u_{M}^{d_{u}}(\boldsymbol{x},t))^{\top}.

For the kk-th component of 𝒖M\boldsymbol{u}_{M}, we denote the coefficient associated to ϕix,it,j\phi_{i_{x},i_{t},j} by uix,it,j,ku_{i_{x},i_{t},j,k}.

By substituting (11) or (12) into (7), we define the loss function on collocation points. Denote by \{\boldsymbol{x}_{i_{x},i_{t},1},\cdots,\boldsymbol{x}_{i_{x},i_{t},Q_{x}}\} the spatial collocation points in the i_{x}-th spatial subdomain and by (i_{t}-1)T/N_{t}=t_{i_{x},i_{t},0}<t_{i_{x},i_{t},1}<\cdots<t_{i_{x},i_{t},Q_{t}}=i_{t}T/N_{t} the temporal collocation points in the i_{t}-th temporal subdomain. As in RFM, we define the rescaling parameters \lambda_{i_{x},i_{t},q_{x},q_{t}}^{e}, \lambda_{i_{x},i_{t},e,q_{t}}^{b} and \lambda_{i_{x},i_{t},q_{x}}^{i} for i_{x}\in[N_{x}], i_{t}\in[N_{t}], q_{x}\in[Q_{x}], q_{t}\in[Q_{t}] and e\in[Q_{x}] satisfying \boldsymbol{x}_{i_{x},i_{t},e}\in\partial\Omega. Then, the ST-RFM minimizes

(13) Loss({uix,it,j,k})=ix=1Nxit=1Nt\displaystyle\mathrm{Loss}(\{u_{i_{x},i_{t},j,k}\})=\sum_{i_{x}=1}^{N_{x}}\sum_{i_{t}=1}^{N_{t}} (qx=1Qxqt=1Qtλix,it,qx,qte(𝒖(𝒙ix,it,qx,tix,it,qt)𝒇(𝒙ix,it,qx,tix,it,qt))2)+\displaystyle\left(\sum_{q_{x}=1}^{Q_{x}}\sum_{q_{t}=1}^{Q_{t}}\|\lambda^{e}_{i_{x},i_{t},q_{x},q_{t}}(\mathcal{L}\boldsymbol{u}(\boldsymbol{x}_{i_{x},i_{t},q_{x}},t_{i_{x},i_{t},q_{t}})-\boldsymbol{f}(\boldsymbol{x}_{i_{x},i_{t},q_{x}},t_{i_{x},i_{t},q_{t}}))\|^{2}\right)+
(𝒙ix,it,eΩqt=1Qtλix,it,e,qtb(𝒖(𝒙ix,it,e,tix,it,qt)𝒈(𝒙ix,it,e,tix,it,qt))2)+\displaystyle\left(\sum_{\boldsymbol{x}_{i_{x},i_{t},e}\in\partial\Omega}\sum_{q_{t}=1}^{Q_{t}}\|\lambda^{b}_{i_{x},i_{t},e,q_{t}}(\mathcal{B}\boldsymbol{u}(\boldsymbol{x}_{i_{x},i_{t},e},t_{i_{x},i_{t},q_{t}})-\boldsymbol{g}(\boldsymbol{x}_{i_{x},i_{t},e},t_{i_{x},i_{t},q_{t}}))\|^{2}\right)+
𝕀it=1(qx=1Qxλix,it,qxi(𝒖(𝒙ix,it,qx,0)𝒉(𝒙ix,it,qx))2)\displaystyle\mathds{I}_{i_{t}=1}\left(\sum_{q_{x}=1}^{Q_{x}}\|\lambda^{i}_{i_{x},i_{t},q_{x}}(\mathcal{I}\boldsymbol{u}(\boldsymbol{x}_{i_{x},i_{t},q_{x}},0)-\boldsymbol{h}(\boldsymbol{x}_{i_{x},i_{t},q_{x}}))\|^{2}\right)

to find the numerical solution of the form (12). Moreover, when Nx>1N_{x}>1 or Nt>1N_{t}>1 and ψa\psi_{a} is used, continuity conditions between adjacent spatial (temporal) subdomains must be imposed by adding regularization terms in (13), while no regularization is required when ψb\psi_{b} is used for second-order equations.

For better illustration of this method, we consider an example with du=1d_{u}=1, the initial operator \mathcal{I} being identity and rescaling parameters being one. For convenience, we set Nx=1N_{x}=1 and relabel iti_{t} by ii without confusion. Under these settings, we introduce two matrix-valued functions 𝚽i(t)\mathbf{\Phi}_{i}(t), 𝐋i(t)\mathbf{L}_{i}(t) and three induced matrices as follows:

𝚽i(t)=\displaystyle\mathbf{\Phi}_{i}(t)= (ϕi,j(𝒙i,q,t),)Qx×Jn,𝚽i,0=𝚽i(ti,0),𝚽i,1=𝚽i(ti,Qt),\displaystyle(\phi_{i,j}(\boldsymbol{x}_{i,q},t),)\in\mathbb{R}^{Q_{x}\times J_{n}},\quad\mathbf{\Phi}_{i,0}=\mathbf{\Phi}_{i}(t_{i,0}),\quad\mathbf{\Phi}_{i,1}=\mathbf{\Phi}_{i}(t_{i,Q_{t}}),
𝐋i(t)=\displaystyle\mathbf{L}_{i}(t)= (ϕi,j(𝒙i,q,t),)Qx×Jn,𝐋i=[𝐋i(ti,0),,𝐋i(ti,Qt1)],\displaystyle(\mathcal{L}\phi_{i,j}(\boldsymbol{x}_{i,q},t),)\in\mathbb{R}^{Q_{x}\times J_{n}},\quad\mathbf{L}_{i}=[\mathbf{L}_{i}(t_{i,0})^{\top},\cdots,\mathbf{L}_{i}(t_{i,Q_{t}-1})^{\top}]^{\top},

where qq is the row index and jj is the column index. Then, we construct the matrix 𝐀\mathbf{A} as follows:

𝐀i=[𝟎(Jn×(i1)(Qt+1)Qx),𝚽i,0,𝐋i,𝚽i,1,𝟎(Jn×((Nti)(Qt+1)1)Qx)],𝐀=[𝐀1,,𝐀Nt].\mathbf{A}_{i}=[\mathbf{0}_{(J_{n}\times(i-1)(Q_{t}+1)Q_{x})},\mathbf{\Phi}^{\top}_{i,0},\mathbf{L}_{i}^{\top},-\mathbf{\Phi}_{i,1}^{\top},\mathbf{0}_{(J_{n}\times((N_{t}-i)(Q_{t}+1)-1)Q_{x})}]^{\top},\quad\mathbf{A}=[\mathbf{A}_{1},\cdots,\mathbf{A}_{N_{t}}].

Define the following vectors

𝒉=[h(𝒙1,1),h(𝒙1,2),,h(𝒙1,Qx)],\displaystyle\boldsymbol{h}=[h(\boldsymbol{x}_{1,1}),h(\boldsymbol{x}_{1,2}),\cdots,h(\boldsymbol{x}_{1,Q_{x}})]^{\top},
𝒇i,j=[f(𝒙i,1,ti,j),f(𝒙i,2,ti,j),,f(𝒙i,Qx,ti,j)],\displaystyle\boldsymbol{f}_{i,j}=[f(\boldsymbol{x}_{i,1},t_{i,j}),f(\boldsymbol{x}_{i,2},t_{i,j}),\cdots,f(\boldsymbol{x}_{i,Q_{x}},t_{i,j})]^{\top},
𝒇i=[𝒇i,0,𝒇i,1,,𝒇i,Qt1],\displaystyle\boldsymbol{f}_{i}=[\boldsymbol{f}_{i,0}^{\top},\boldsymbol{f}_{i,1}^{\top},\cdots,\boldsymbol{f}_{i,Q_{t}-1}^{\top}]^{\top},

then we construct the vector 𝒃=i=1Nt𝒃i\boldsymbol{b}=\sum_{i=1}^{N_{t}}\boldsymbol{b}_{i}, where

𝒃1=[𝒉,𝒇1,𝟎((Nt1)(Qt+1)Qx)],\displaystyle\boldsymbol{b}_{1}=[\boldsymbol{h}^{\top},\boldsymbol{f}_{1}^{\top},\boldsymbol{0}_{((N_{t}-1)(Q_{t}+1)Q_{x})}^{\top}]^{\top},
𝒃i=[𝟎(((i1)(Qt+1)+1)Qx),𝒇i,𝟎((Nti)(Qt+1)Qx)],fori=2,,Nt.\displaystyle\boldsymbol{b}_{i}=[\boldsymbol{0}_{(((i-1)(Q_{t}+1)+1)Q_{x})}^{\top},\boldsymbol{f}_{i}^{\top},\boldsymbol{0}_{((N_{t}-i)(Q_{t}+1)Q_{x})}^{\top}],\quad\mathrm{for}\;i=2,\cdots,N_{t}.

Let 𝒖NtJn\boldsymbol{u}\in\mathbb{R}^{N_{t}J_{n}}, then the optimal coefficient 𝒖S\boldsymbol{u}^{S} by ST-RFM is obtained by

(14) \boldsymbol{u}^{S}=\arg\min_{\boldsymbol{u}}\|\mathbf{A}\boldsymbol{u}-\boldsymbol{b}\|^{2}.
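The following sketch illustrates the structure behind (14) on a toy problem: the scalar ODE u' = -u with u(0) = 1 is used in place of (7), with Q_x = 1 and tanh features. PDE-residual rows and initial/continuity rows over all N_t temporal subdomains are assembled into one matrix and solved at once; the model problem and parameter values are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N_t, J_n, Q_t, R_m = 4.0, 4, 60, 40, 2.0
dt = T / N_t
k = rng.uniform(-R_m, R_m, J_n)
b = rng.uniform(-R_m, R_m, J_n)

def phi(t, i):    # features on subdomain i; l_i(t) maps [i*dt, (i+1)*dt] onto [-1, 1]
    return np.tanh(np.outer((t - (i + 0.5) * dt) / (dt / 2), k) + b)

def phi_t(t, i):  # time derivative of the features
    return (1.0 - phi(t, i) ** 2) * (k / (dt / 2))

rows, rhs = [], []
for i in range(N_t):
    t_col = np.linspace(i * dt, (i + 1) * dt, Q_t)
    block = np.zeros((Q_t, N_t * J_n))
    block[:, i * J_n:(i + 1) * J_n] = phi_t(t_col, i) + phi(t_col, i)     # residual of u' + u = 0
    glue = np.zeros((1, N_t * J_n))
    if i == 0:
        glue[:, :J_n] = phi(np.array([0.0]), 0)                           # initial condition u(0) = 1
        g = np.array([1.0])
    else:
        glue[:, (i - 1) * J_n:i * J_n] = -phi(np.array([i * dt]), i - 1)  # continuity at t = i*dt
        glue[:, i * J_n:(i + 1) * J_n] = phi(np.array([i * dt]), i)
        g = np.array([0.0])
    rows += [glue, block]
    rhs += [g, np.zeros(Q_t)]

A, bvec = np.vstack(rows), np.concatenate(rhs)
u_coef, *_ = np.linalg.lstsq(A, bvec, rcond=None)

err = 0.0
for i in range(N_t):
    tt = np.linspace(i * dt, (i + 1) * dt, 50)
    err = max(err, np.max(np.abs(phi(tt, i) @ u_coef[i * J_n:(i + 1) * J_n] - np.exp(-tt))))
print(f"L_inf error over [0, T]: {err:.2e}")
```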

For time-dependent partial differential equations, Dong et al. [4] proposed the block time-marching strategy. This strategy solves Eq. (7) in each time block individually and uses the numerical solution of the i-th block at its terminal time as the initial condition of the (i+1)-th block. Let N_{b}\in\mathbb{N}^{+} be the number of time blocks, and let N_{t}=1 for simplicity. For equations that are first order in time, we present the detailed procedure of the block time-marching strategy in Algorithm 1; the optimal coefficient \boldsymbol{u}^{B} is the concatenation of the optimal coefficients of all time blocks.

Output: 𝒖B\boldsymbol{u}^{B}
1 \tilde{\mathbf{A}}_{1}=[\mathbf{\Phi}_{1,0}^{\top},\mathbf{L}_{1}^{\top}]^{\top}, \tilde{\boldsymbol{b}}_{1}=[\boldsymbol{h}^{\top},\boldsymbol{f}_{1}^{\top}]^{\top} ;
2 \boldsymbol{u}_{1}^{B}=\arg\min_{\boldsymbol{u}}\|\tilde{\mathbf{A}}_{1}\boldsymbol{u}-\tilde{\boldsymbol{b}}_{1}\|^{2} ;
3 for i=2,3,\cdots,N_{b} do
4       \tilde{\mathbf{A}}_{i}=[\mathbf{\Phi}_{i,0}^{\top},\mathbf{L}_{i}^{\top}]^{\top}, \tilde{\boldsymbol{b}}_{i}=[(\boldsymbol{u}_{i-1}^{B})^{\top}\mathbf{\Phi}_{i-1,1}^{\top},\boldsymbol{f}_{i}^{\top}]^{\top} ;
5       \boldsymbol{u}_{i}^{B}=\arg\min_{\boldsymbol{u}}\|\tilde{\mathbf{A}}_{i}\boldsymbol{u}-\tilde{\boldsymbol{b}}_{i}\|^{2} ;
6      
7 end for
8𝒖B=[(𝒖1B),(𝒖2B),,(𝒖NbB)]\boldsymbol{u}^{B}=[(\boldsymbol{u}_{1}^{B})^{\top},(\boldsymbol{u}_{2}^{B})^{\top},\cdots,(\boldsymbol{u}_{N_{b}}^{B})^{\top}]^{\top} ;
Algorithm 1 Block time-marching strategy to solve time-dependent PDEs
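For comparison, here is a minimal sketch of Algorithm 1 on the same toy ODE u' = -u, u(0) = 1 used in the sketch above: each block is solved separately and its terminal value is passed to the next block as the initial condition.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N_b, J_n, Q_t, R_m = 4.0, 4, 60, 40, 2.0
dt = T / N_b
k = rng.uniform(-R_m, R_m, J_n)
b = rng.uniform(-R_m, R_m, J_n)
phi   = lambda t, i: np.tanh(np.outer((t - (i + 0.5) * dt) / (dt / 2), k) + b)
phi_t = lambda t, i: (1.0 - phi(t, i) ** 2) * (k / (dt / 2))

u_blocks, g = [], 1.0                                    # g carries the current initial value
for i in range(N_b):
    t_col = np.linspace(i * dt, (i + 1) * dt, Q_t)
    A_i = np.vstack([phi(np.array([i * dt]), i),         # initial-condition row
                     phi_t(t_col, i) + phi(t_col, i)])   # ODE-residual rows for u' + u = 0
    b_i = np.concatenate([[g], np.zeros(Q_t)])
    u_i, *_ = np.linalg.lstsq(A_i, b_i, rcond=None)
    u_blocks.append(u_i)
    g = (phi(np.array([(i + 1) * dt]), i) @ u_i).item()  # terminal value -> next block
print(f"u(T) ~ {g:.6f}, exact {np.exp(-T):.6f}")
```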

2.3. Approximation Properties

Both concatenation random features and separation-of-variables random features are universal approximators. To prove these, we first recall the definition of sigmoidal functions in [3].

Definition 2.1.

σ\sigma is said to be sigmoidal if

σ(z){1asz+,0asz.\sigma(z)\to\begin{cases}1\quad&\mathrm{as}\;z\to+\infty,\\ 0\quad&\mathrm{as}\;z\to-\infty.\\ \end{cases}

The approximation property of STC is given in the following theorem.

Theorem 2.1.

Let σ\sigma be any continuous sigmoidal function. Then finite sums of

(15) GC(𝒙,t)=j=1Nujσ((𝒌jx)𝒙+kjtt+bj)G_{C}(\boldsymbol{x},t)=\sum_{j=1}^{N}u_{j}\sigma((\boldsymbol{k}_{j}^{x})^{\top}\boldsymbol{x}+k_{j}^{t}t+b_{j})

are dense in C(Ω×[0,T])C(\Omega\times[0,T]). In other words, given any fC(Ω×[0,T])f\in C(\Omega\times[0,T]) and ϵ>0\epsilon>0, there exists a sum GC(x,t)G_{C}(x,t), such that

|GC(𝒙,t)f(𝒙,t)|<ϵforall𝒙,tΩ×[0,T].|G_{C}(\boldsymbol{x},t)-f(\boldsymbol{x},t)|<\epsilon\quad\mathrm{for}\;\mathrm{all}\;\boldsymbol{x},t\in\Omega\times[0,T].
Proof.

It is a direct consequence of [3, Theorem 2]. ∎

Corollary 2.2.

When using tanh as the activation function in (15), the finite sums (15) are dense in C(Ω×[0,T])C(\Omega\times[0,T]).

Proof.

Consider a continuous sigmoidal function σ~(z)=11+ez\tilde{\sigma}(z)=\frac{1}{1+e^{-z}}, then tanh(z)=2σ~(2z)1\mathrm{tanh}(z)=2\tilde{\sigma}(2z)-1. From Theorem 2.1, there is a sum G~J(𝒙,t)=j=1N~u~jσ~((𝒌~jx)𝒙+k~jtt+bj~)\tilde{G}_{J}(\boldsymbol{x},t)=\sum_{j=1}^{\tilde{N}}\tilde{u}_{j}\tilde{\sigma}((\boldsymbol{\tilde{k}}_{j}^{x})^{\top}\boldsymbol{x}+\tilde{k}_{j}^{t}t+\tilde{b_{j}}), such that

|G~J(𝒙,t)f(𝒙,t)|<ϵ2.|\tilde{G}_{J}(\boldsymbol{x},t)-f(\boldsymbol{x},t)|<\frac{\epsilon}{2}.

Define u^=12j=1N~u~j\hat{u}=\frac{1}{2}\sum_{j=1}^{\tilde{N}}\tilde{u}_{j}. If u^=0\hat{u}=0, then the finite sum GC(𝒙,t)=j=1N~12u~jtanh(12((𝒌~jx)𝒙+k~jtt+bj~))G_{C}(\boldsymbol{x},t)=\sum_{j=1}^{\tilde{N}}\frac{1}{2}\tilde{u}_{j}\mathrm{tanh}(\frac{1}{2}((\boldsymbol{\tilde{k}}_{j}^{x})^{\top}\boldsymbol{x}+\tilde{k}_{j}^{t}t+\tilde{b_{j}})) satisfies

|GC(𝒙,t)f(𝒙,t)|\displaystyle|G_{C}(\boldsymbol{x},t)-f(\boldsymbol{x},t)| =|j=1N~12u~j(2σ~((𝒌~jx)𝒙+k~jtt+bj~)1)f(𝒙,t)|=|G~J(𝒙,t)f(x,t)|<ϵ2<ϵ.\displaystyle=|\sum_{j=1}^{\tilde{N}}\frac{1}{2}\tilde{u}_{j}(2\tilde{\sigma}((\boldsymbol{\tilde{k}}_{j}^{x})^{\top}\boldsymbol{x}+\tilde{k}_{j}^{t}t+\tilde{b_{j}})-1)-f(\boldsymbol{x},t)|=|\tilde{G}_{J}(\boldsymbol{x},t)-f(x,t)|<\frac{\epsilon}{2}<\epsilon.

If u^0\hat{u}\neq 0, since Ωdx\Omega\subset\mathbb{R}^{d_{x}} is compact and thus is bounded and closed, there exist 𝒌^x\boldsymbol{\hat{k}}^{x}, k^t\hat{k}^{t} and b^\hat{b} such that

|\mathrm{tanh}((\boldsymbol{\hat{k}}^{x})^{\top}\boldsymbol{x}+\hat{k}^{t}t+\hat{b})-1|<\frac{\epsilon}{2|\hat{u}|}

holds for all (\boldsymbol{x},t)\in\Omega\times[0,T]. Then, the finite sum G_{C}(\boldsymbol{x},t)=\sum_{j=1}^{\tilde{N}}\frac{1}{2}\tilde{u}_{j}\mathrm{tanh}(\frac{1}{2}((\boldsymbol{\tilde{k}}_{j}^{x})^{\top}\boldsymbol{x}+\tilde{k}_{j}^{t}t+\tilde{b}_{j}))+\hat{u}\,\mathrm{tanh}((\boldsymbol{\hat{k}}^{x})^{\top}\boldsymbol{x}+\hat{k}^{t}t+\hat{b}) satisfies

|G_{C}(\boldsymbol{x},t)-f(\boldsymbol{x},t)| =\left|\sum_{j=1}^{\tilde{N}}\frac{1}{2}\tilde{u}_{j}(2\tilde{\sigma}((\boldsymbol{\tilde{k}}_{j}^{x})^{\top}\boldsymbol{x}+\tilde{k}_{j}^{t}t+\tilde{b}_{j})-1)+\hat{u}\,\mathrm{tanh}((\boldsymbol{\hat{k}}^{x})^{\top}\boldsymbol{x}+\hat{k}^{t}t+\hat{b})-f(\boldsymbol{x},t)\right|
=\left|\tilde{G}_{J}(\boldsymbol{x},t)-f(\boldsymbol{x},t)+\hat{u}(\mathrm{tanh}((\boldsymbol{\hat{k}}^{x})^{\top}\boldsymbol{x}+\hat{k}^{t}t+\hat{b})-1)\right|
\leq|\tilde{G}_{J}(\boldsymbol{x},t)-f(\boldsymbol{x},t)|+|\hat{u}|\,|\mathrm{tanh}((\boldsymbol{\hat{k}}^{x})^{\top}\boldsymbol{x}+\hat{k}^{t}t+\hat{b})-1|
<\frac{\epsilon}{2}+\frac{\epsilon}{2}=\epsilon

for all (𝒙,t)Ω×[0,T](\boldsymbol{x},t)\in\Omega\times[0,T]. ∎

For SoV, we first extend the definition of discriminatory functions in [3] as follows.

Definition 2.2.

σ\sigma is said to be discriminatory if for a measure μ(Ω×[0,T])\mu\in\mathcal{M}(\Omega\times[0,T])

\int_{\Omega\times[0,T]}\sigma((\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b^{x})\sigma(k^{t}t+b^{t})\,\mathrm{d}\mu(\boldsymbol{x},t)=0

for all 𝐤xdx\boldsymbol{k}^{x}\in\mathbb{R}^{d_{x}}, ktk^{t}\in\mathbb{R}, bxb^{x}\in\mathbb{R} and btb^{t}\in\mathbb{R} implies μ=0\mu=0.

Lemma 2.3.

Any bounded, measurable sigmoidal function σ\sigma is discriminatory. In particular, any continuous sigmoidal function is discriminatory.

Proof.

For any 𝒙,t,𝒌=((𝒌x),kt),b,θ=(θx,θt)\boldsymbol{x},t,\boldsymbol{k}=((\boldsymbol{k}^{x})^{\top},k_{t})^{\top},b,\theta=(\theta^{x},\theta^{t}), we have

σ(λ((𝒌x)𝒙+b)+θx){1for(𝒌x)𝒙+b>0asλ+,0for(𝒌x)𝒙+b<0asλ+,=σ(θx)for(𝒌x)𝒙+b=0forallλ.\sigma(\lambda((\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b)+\theta_{x})\begin{cases}\to 1&\mathrm{for}\quad(\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b>0\quad\mathrm{as}\quad\lambda\to+\infty,\\ \to 0&\mathrm{for}\quad(\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b<0\quad\mathrm{as}\quad\lambda\to+\infty,\\ =\sigma(\theta_{x})&\mathrm{for}\quad(\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b=0\quad\mathrm{for}\;\mathrm{all}\;\lambda.\end{cases}

Thus, σx,λ(𝒙)=σ(λ((𝒌x)𝒙+b)+θx)\sigma_{x,\lambda}(\boldsymbol{x})=\sigma(\lambda((\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b)+\theta_{x}) converges in the pointwise and bounded sense to

γx(𝒙)={1for(𝒌x)𝒙+b>0,0for(𝒌x)𝒙+b<0,σ(θx)for(𝒌x)𝒙+b=0\gamma_{x}(\boldsymbol{x})=\begin{cases}1&\mathrm{for}\quad(\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b>0,\\ 0&\mathrm{for}\quad(\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b<0,\\ \sigma(\theta_{x})&\mathrm{for}\quad(\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b=0\end{cases}

as \lambda\to+\infty. Similarly, \sigma_{t,\lambda}(t)=\sigma(\lambda(k^{t}t+b)+\theta_{t}) converges in the pointwise and bounded sense to

γt(t)={1forktt+b>0,0forktt+b<0,σ(θt)forktt+b=0\gamma_{t}(t)=\begin{cases}1&\mathrm{for}\quad k^{t}t+b>0,\\ 0&\mathrm{for}\quad k^{t}t+b<0,\\ \sigma(\theta_{t})&\mathrm{for}\quad k^{t}t+b=0\end{cases}

as λ+\lambda\to+\infty.

Denote \Pi_{k,b}^{x}=\{(\boldsymbol{x},t)\,|\,(\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b=0,\,k^{t}t+b>0\} and \Pi_{k,b}^{t}=\{(\boldsymbol{x},t)\,|\,(\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b>0,\,k^{t}t+b=0\}. Let I_{k,b} be the set defined by \{(\boldsymbol{x},t)\,|\,(\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b=0,\,k^{t}t+b=0\} and H_{k,b} be the quarter space defined by \{(\boldsymbol{x},t)\,|\,(\boldsymbol{k}^{x})^{\top}\boldsymbol{x}+b>0,\,k^{t}t+b>0\}. Then, by the Lebesgue Bounded Convergence Theorem, we have

0\displaystyle 0 =Ω×[0,T]σx,λ(𝒙)σt,λ(t)dμ(𝒙,t)=Ω×[0,T]γx(𝒙)γt(t)dμ(𝒙,t)\displaystyle=\int_{\Omega\times[0,T]}\sigma_{x,\lambda}(\boldsymbol{x})\sigma_{t,\lambda}(t)\mathrm{d}\mu(\boldsymbol{x},t)=\int_{\Omega\times[0,T]}\gamma_{x}(\boldsymbol{x})\gamma_{t}(t)\mathrm{d}\mu(\boldsymbol{x},t)
=μ(Hk,b)+σ(θx)μ(Πk,bx)+σ(θt)μ(Πk,bt)+σ(θx)σ(θt)μ(Ik,b)\displaystyle=\mu(H_{k,b})+\sigma(\theta_{x})\mu(\Pi_{k,b}^{x})+\sigma(\theta_{t})\mu(\Pi_{k,b}^{t})+\sigma(\theta_{x})\sigma(\theta_{t})\mu(I_{k,b})

for all θ,b,𝒌\theta,b,\boldsymbol{k}.

We now show that the measure of all quarter spaces being zero implies that the measure μ\mu itself must be zero. This would be trivial if μ\mu was a positive measure but here it is not.

For a fixed 𝒌\boldsymbol{k} and a bounded measurable function hh, we define a linear functional FF as

F(h)=Ω×[0,T]h((𝒌x)𝒙)h(ktt)dμ(𝒙,t).F(h)=\int_{\Omega\times[0,T]}h((\boldsymbol{k}^{x})^{\top}\boldsymbol{x})h(k^{t}t)\mathrm{d}\mu(\boldsymbol{x},t).

Note that F is a bounded functional on L^{\infty}(\mathbb{R}) since \mu is a finite signed measure. Set h to be the indicator function of the interval [\theta,\infty), i.e. h(u)=1 if u\geq\theta and h(u)=0 if u<\theta; then

F(h)=\int_{\Omega\times[0,T]}h((\boldsymbol{k}^{x})^{\top}\boldsymbol{x})h(k^{t}t)\,\mathrm{d}\mu(\boldsymbol{x},t)=\mu(I_{k,-\theta})+\mu(\Pi_{k,-\theta}^{x})+\mu(\Pi_{k,-\theta}^{t})+\mu(H_{k,-\theta})=0.

Similarly, F(h)=0 if h is the indicator function of the open interval (\theta,\infty). By linearity, F(h)=0 holds for the indicator function of any interval and hence for any simple function (finite sum of indicator functions of intervals). Since simple functions are dense in L^{\infty}(\mathbb{R}), F=0.

In particular, a substitution of bounded measurable functions s(𝒙,t)=sin(𝒎x𝒙+mtt)s(\boldsymbol{x},t)=\sin(\boldsymbol{m}_{x}^{\top}\boldsymbol{x}+m_{t}t) and c(𝒙,t)=cos(𝒎x𝒙+mtt)c(\boldsymbol{x},t)=\cos(\boldsymbol{m}_{x}^{\top}\boldsymbol{x}+m_{t}t) gives

F(c+is)=\int_{\Omega\times[0,T]}\big(\cos(\boldsymbol{m}_{x}^{\top}\boldsymbol{x}+m_{t}t)+i\sin(\boldsymbol{m}_{x}^{\top}\boldsymbol{x}+m_{t}t)\big)\,\mathrm{d}\mu(\boldsymbol{x},t)=\int_{\Omega\times[0,T]}\exp(i(\boldsymbol{m}_{x}^{\top}\boldsymbol{x}+m_{t}t))\,\mathrm{d}\mu(\boldsymbol{x},t)=0

for all 𝒎=(𝒎x,mt)\boldsymbol{m}=(\boldsymbol{m}_{x},m_{t}). The Fourier transform of μ\mu is zero and we have μ=0\mu=0. Hence, σ\sigma is discriminatory. ∎

Remark 2.4.

Following the same routine, we can prove the approximation property of SoV. For any continuous sigmoidal function σ\sigma, the finite sums

(16) G_{S}(\boldsymbol{x},t)=\sum_{j=1}^{N}u_{j}\sigma((\boldsymbol{k}_{j}^{x})^{\top}\boldsymbol{x}+b_{j}^{x})\sigma(k_{j}^{t}t+b_{j}^{t})

are dense in C(Ω×[0,T])C(\Omega\times[0,T]).

Corollary 2.5.

For the tanh activation function in (16), the finite sums of the form

(17) GS(𝒙,t)=j=1Nujtanh((𝒌jx)𝒙+bjx)tanh(kjtt+bjt)G_{S}(\boldsymbol{x},t)=\sum_{j=1}^{N}u_{j}\mathrm{tanh}((\boldsymbol{k}_{j}^{x})^{\top}\boldsymbol{x}+b_{j}^{x})\mathrm{tanh}(k_{j}^{t}t+b_{j}^{t})

are dense in C(Ω×[0,T])C(\Omega\times[0,T]).

Proof.

The proof is similar to that of Corollary 2.2. ∎

2.4. Error Estimates of Space-time Random Feature Method

For convenience, we set N_{b}=N_{t} to analyze the errors of the block time-marching strategy and the ST-RFM. Let \boldsymbol{u}^{S} and \boldsymbol{u}^{B} be the exact solutions of Eq. (14) and of the block time-marching strategy in Algorithm 1, respectively, and let \hat{\boldsymbol{u}}^{S} and \hat{\boldsymbol{u}}^{B} be the corresponding numerical solutions computed by a numerical solver, such as a direct or iterative method. Then we have the following results.

Theorem 2.6.

Assume that for any linear least-squares problem min𝐮𝐀𝐮𝐛2\min_{\boldsymbol{u}}\|\mathbf{A}\boldsymbol{u}-\boldsymbol{b}\|^{2} in ST-RFM and block time-marching strategy, there exists 𝐮\boldsymbol{u}^{*} such that 𝐀𝐮=𝐛\mathbf{A}\boldsymbol{u}^{*}=\boldsymbol{b}. Then 𝐮B=𝐮S\boldsymbol{u}^{B}=\boldsymbol{u}^{S}.

Proof.

Under the assumption, for each subproblem in Algorithm 1, the optimal solution \boldsymbol{u}_{i}^{B} satisfies

𝐀~i𝒖iB=𝒃~i,fori=1,2,,Nt.\tilde{\mathbf{A}}_{i}\boldsymbol{u}_{i}^{B}=\tilde{\boldsymbol{b}}_{i},\quad\mathrm{for}\;i=1,2,\cdots,N_{t}.

Recalling the definition of 𝐀i\mathbf{A}_{i} in ST-RFM, we have

𝐀i𝒖iB=\displaystyle\mathbf{A}_{i}\boldsymbol{u}_{i}^{B}= [𝟎((i1)(Qt+1)Qx),𝒃~i,(𝒖iB)𝚽i,1,𝟎(((Nti)(Qt+1)1)Qx)]\displaystyle[\boldsymbol{0}_{((i-1)(Q_{t}+1)Q_{x})}^{\top},\tilde{\boldsymbol{b}}_{i}^{\top},-(\boldsymbol{u}_{i}^{B})^{\top}\mathbf{\Phi}_{i,1}^{\top},\mathbf{0}_{(((N_{t}-i)(Q_{t}+1)-1)Q_{x})}^{\top}]^{\top}
=\displaystyle= [𝟎((i1)(Qt+1)Qx),(𝒖i1B)𝚽i1,1,𝒇i,(𝒖iB)𝚽i,1,𝟎(((Nti)(Qt+1)1)Qx)]\displaystyle[\boldsymbol{0}_{((i-1)(Q_{t}+1)Q_{x})}^{\top},(\boldsymbol{u}_{i-1}^{B})^{\top}\mathbf{\Phi}_{i-1,1}^{\top},\boldsymbol{f}_{i}^{\top},-(\boldsymbol{u}_{i}^{B})^{\top}\mathbf{\Phi}_{i,1}^{\top},\mathbf{0}_{(((N_{t}-i)(Q_{t}+1)-1)Q_{x})}^{\top}]^{\top}
=\displaystyle= [𝟎((i1)(Qt+1)Qx),(𝒖i1B)𝚽i1,1,𝟎(((Nti+1)(Qt+1)1)Qx)]+\displaystyle[\boldsymbol{0}_{((i-1)(Q_{t}+1)Q_{x})}^{\top},(\boldsymbol{u}_{i-1}^{B})^{\top}\mathbf{\Phi}_{i-1,1}^{\top},\mathbf{0}_{(((N_{t}-i+1)(Q_{t}+1)-1)Q_{x})}^{\top}]^{\top}+
[𝟎(((i1)(Qt+1)+1)Qx),𝒇i,𝟎((Nti)(Qt+1)Qx)]\displaystyle[\boldsymbol{0}_{(((i-1)(Q_{t}+1)+1)Q_{x})}^{\top},\boldsymbol{f}_{i}^{\top},\mathbf{0}_{((N_{t}-i)(Q_{t}+1)Q_{x})}^{\top}]^{\top}-
[𝟎(i(Qt+1)Qx),(𝒖iB)𝚽i,1,𝟎(((Nti)(Qt+1)1)Qx)].\displaystyle[\boldsymbol{0}_{(i(Q_{t}+1)Q_{x})}^{\top},(\boldsymbol{u}_{i}^{B})^{\top}\mathbf{\Phi}_{i,1}^{\top},\mathbf{0}_{(((N_{t}-i)(Q_{t}+1)-1)Q_{x})}^{\top}]^{\top}.

After some algebraic calculations, we have

𝐀𝒖B=i=1Nt𝐀i𝒖iB=𝒃.\displaystyle\mathbf{A}\boldsymbol{u}^{B}=\sum_{i=1}^{N_{t}}\mathbf{A}_{i}\boldsymbol{u}_{i}^{B}=\boldsymbol{b}.

Since the optimal solution of linear least-squares problem is unique under the assumption, we have 𝒖B=𝒖S\boldsymbol{u}^{B}=\boldsymbol{u}^{S}. ∎

From Theorem 2.6, solving a time-dependent PDE by the block time-marching strategy is equivalent to solving the same problem by the ST-RFM. In practice, however, the numerical solution \hat{\boldsymbol{u}} differs from the optimal solution \boldsymbol{u}, and we denote the difference by \delta u. For long time intervals, when N_{b} or N_{t} increases by one, the solution error in the newly added block or subdomain is expected to be larger than in the previous blocks or subdomains due to error accumulation. To quantitatively analyze the error propagation in terms of N_{t} and N_{b}, we first introduce a matrix \mathbf{B}_{i} as

𝐁i=(𝐀~i𝐀~i)1𝚽i,0𝚽i1,1Jn×Jn,fori=1,,Nt.\mathbf{B}_{i}=(\tilde{\mathbf{A}}_{i}^{\top}\tilde{\mathbf{A}}_{i})^{-1}\mathbf{\Phi}_{i,0}^{\top}\mathbf{\Phi}_{i-1,1}\in\mathbb{R}^{J_{n}\times J_{n}},\qquad\mathrm{for}\;i=1,\cdots,N_{t}.

For simplicity, random feature functions are set to be the same over different time subdomains, i.e. ϕi1,j=ϕi2,j\phi_{i_{1},j}=\phi_{i_{2},j} for all 1i1i2Nt1\leq i_{1}\neq i_{2}\leq N_{t}. Then, all 𝐁i\mathbf{B}_{i} are the same and can be denoted by 𝐁\mathbf{B}. To proceed, we need the following assumption for 𝐁\mathbf{B}.

Assumption 2.1.

𝐁\mathbf{B} is diagonalizable.

Remark 2.7.

Although this assumption cannot be proved, we find that \mathbf{B} is numerically diagonalizable for all the examples we have tested. It is not a strong assumption, since it is only used to provide a lower-bound error estimate of the block time-marching strategy and is verified numerically as well.

Under Assumption 2.1, there exist pairs \{(\lambda_{1},\boldsymbol{b}_{1}),\cdots,(\lambda_{J_{n}},\boldsymbol{b}_{J_{n}})\} such that \boldsymbol{b}_{k} is an eigenvector of \mathbf{B} with eigenvalue \lambda_{k} and \{\boldsymbol{b}_{k}\} is an orthonormal basis of \mathbb{R}^{J_{n}}. Denote by \lambda_{m} the eigenvalue with the largest modulus, i.e. |\lambda_{m}|=\max_{k}(|\lambda_{k}|). Since \{\boldsymbol{b}_{k}\} forms a basis, there exists \{\delta u_{k}\}_{k=1}^{J_{n}} such that \delta u=\sum_{k=1}^{J_{n}}\delta u_{k}\boldsymbol{b}_{k}, where \delta u_{k}, k=1,\cdots,J_{n}, are independent and identically distributed random variables with \mathbb{E}[\delta u_{k}]=0, \mathbb{E}[|\delta u_{k}|]=\mu>0 and \mathbb{E}[(\delta u_{k})^{2}]=\delta^{2}>0.
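Numerically, the matrix \mathbf{B} and |\lambda_{m}| can be inspected directly. The sketch below does this for the toy ODE used in the earlier sketches; the pseudo-inverse of \tilde{\mathbf{A}}_{i} is used in place of the normal-equations inverse (the two coincide when \tilde{\mathbf{A}}_{i} has full column rank), and whether |\lambda_{m}| exceeds one indicates which branch of \beta applies.

```python
import numpy as np

rng = np.random.default_rng(3)
dt, J_n, Q_t, R_m = 1.0, 40, 100, 2.0
k = rng.uniform(-R_m, R_m, J_n)
b = rng.uniform(-R_m, R_m, J_n)
phi   = lambda s: np.tanh(np.outer(s, k) + b)             # s: local coordinate in [-1, 1]
phi_t = lambda s: (1.0 - phi(s) ** 2) * (k / (dt / 2))    # derivative w.r.t. physical time

s_col = np.linspace(-1.0, 1.0, Q_t)
A_i = np.vstack([phi(-1.0), phi_t(s_col) + phi(s_col)])   # [Phi_{i,0}; L_i] for u' + u = 0
rhs = np.vstack([phi(1.0), np.zeros((Q_t, J_n))])         # [Phi_{i-1,1}; 0]
B = np.linalg.pinv(A_i) @ rhs                             # error-propagation matrix B
lam_m = np.max(np.abs(np.linalg.eigvals(B)))
print(f"|lambda_m| = {lam_m:.4f}  (beta(N_t) grows "
      f"{'exponentially' if lam_m > 1 else 'at most linearly'})")
```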

We need the following lemma to bound a geometric series from above and from below.

Lemma 2.8.

For any λ\lambda\in\mathbb{C} with |λ|1|\lambda|\neq 1 and ϵ(0,1)\epsilon\in(0,1), there exists N+N\in\mathbb{N}^{+} such that

(1+ϵ)Cλn|1λ||i=0n1λi|(1ϵ)Cλn|1λ|(1+\epsilon)\frac{C_{\lambda}^{n}}{|1-\lambda|}\geq|\sum_{i=0}^{n-1}\lambda^{i}|\geq(1-\epsilon)\frac{C_{\lambda}^{n}}{|1-\lambda|}

holds for n>Nn>N, where Cλ=max(1,|λ|)C_{\lambda}=\max(1,|\lambda|).

Proof.

When |λ|<1|\lambda|<1, we have

limn|i=0n1λi||1λ|=limn|1λn1λ||1λ|=limn|1λn|=1.\lim_{n\to\infty}\left|\sum_{i=0}^{n-1}\lambda^{i}\right||1-\lambda|=\lim_{n\to\infty}\left|\frac{1-\lambda^{n}}{1-\lambda}\right||1-\lambda|=\lim_{n\to\infty}|1-\lambda^{n}|=1.

Therefore, there exists N1+N_{1}\in\mathbb{N}^{+} such that

1+ϵ|i=0n1λi||1λ|1ϵ.1+\epsilon\geq\left|\sum_{i=0}^{n-1}\lambda^{i}\right||1-\lambda|\geq 1-\epsilon.

holds for n>N1n>N_{1}. Similarly, when |λ|>1|\lambda|>1, we have

limn|i=0n1λi||1λ||λ|n=limn|1λn1λ||1λ||λ|n=limn|λn1|=1.\lim_{n\to\infty}\left|\sum_{i=0}^{n-1}\lambda^{i}\right|\frac{|1-\lambda|}{|\lambda|^{n}}=\lim_{n\to\infty}\left|\frac{1-\lambda^{n}}{1-\lambda}\right|\frac{|1-\lambda|}{|\lambda|^{n}}=\lim_{n\to\infty}|\lambda^{-n}-1|=1.

Therefore, there exists N2+N_{2}\in\mathbb{N}^{+} such that

1+ϵ|i=0n1λi||1λ||λn|1ϵ.1+\epsilon\geq\left|\sum_{i=0}^{n-1}\lambda^{i}\right|\frac{|1-\lambda|}{|\lambda^{n}|}\geq 1-\epsilon.

Thus, the conclusion holds when N=max(N1,N2)N=\max(N_{1},N_{2}). ∎

Now we are ready to characterize the difference between 𝒖^B\hat{\boldsymbol{u}}^{B} and 𝒖B\boldsymbol{u}^{B} under the perturbation δu\delta u.

Lemma 2.9.

Under a perturbation \delta u to the solution, there exist N\in\mathbb{N}^{+} and \alpha>0 such that, when N_{t}>N, the solution error \|\hat{\boldsymbol{u}}^{B}-\boldsymbol{u}^{B}\| of the block time-marching strategy satisfies

𝔼δu[𝒖^B𝒖B]αμNtβ(Nt),\mathbb{E}_{\delta u}[\|\hat{\boldsymbol{u}}^{B}-\boldsymbol{u}^{B}\|]\geq\alpha\mu\sqrt{N_{t}}\beta(N_{t}),

where

β(x)={1,ifmax(|λk|)1and 1{λk},x,ifmax(|λk|)=1and 1{λk},|λm|x/2,ifmax(|λk|)>1.\beta(x)=\begin{cases}1,&\mathrm{if}\;\max(|\lambda_{k}|)\leq 1\;\mathrm{and}\;1\notin\{\lambda_{k}\},\\ x,&\mathrm{if}\;\max(|\lambda_{k}|)=1\;\mathrm{and}\;1\in\{\lambda_{k}\},\\ |\lambda_{m}|^{x/2},&\mathrm{if}\;\max(|\lambda_{k}|)>1.\end{cases}
Proof.

The difference between 𝒖^iB\hat{\boldsymbol{u}}_{i}^{B} and 𝒖iB\boldsymbol{u}_{i}^{B} satisfies

𝒖^iB𝒖iB=\displaystyle\hat{\boldsymbol{u}}^{B}_{i}-\boldsymbol{u}^{B}_{i}= δu+(𝐀~i𝐀~i)1𝐀~i𝒃^i𝒖iB\displaystyle\delta u+(\tilde{\mathbf{A}}_{i}^{\top}\tilde{\mathbf{A}}_{i})^{-1}\tilde{\mathbf{A}}_{i}^{\top}\hat{\boldsymbol{b}}_{i}-\boldsymbol{u}_{i}^{B}
=\displaystyle= δu+(𝐀~i𝐀~i)1𝐀~i(𝒃~i+[𝚽i1,1,𝟎(Jn×QtQx)](𝒖^i1B𝒖i1B))𝒖iB\displaystyle\delta u+(\tilde{\mathbf{A}}_{i}^{\top}\tilde{\mathbf{A}}_{i})^{-1}\tilde{\mathbf{A}}_{i}^{\top}(\tilde{\boldsymbol{b}}_{i}+[\mathbf{\Phi}_{i-1,1}^{\top},\boldsymbol{0}_{(J_{n}\times Q_{t}Q_{x})}^{\top}]^{\top}(\hat{\boldsymbol{u}}_{i-1}^{B}-\boldsymbol{u}_{i-1}^{B}))-\boldsymbol{u}_{i}^{B}
=\displaystyle= δu+𝐁(𝒖^i1B𝒖i1B)\displaystyle\delta u+\mathbf{B}(\hat{\boldsymbol{u}}_{i-1}^{B}-\boldsymbol{u}_{i-1}^{B})
=\displaystyle= δu+𝐁(δu+𝐁(𝒖^i2B𝒖i2B))\displaystyle\delta u+\mathbf{B}(\delta u+\mathbf{B}(\hat{\boldsymbol{u}}_{i-2}^{B}-\boldsymbol{u}_{i-2}^{B}))
=\displaystyle= \displaystyle\cdots
=\displaystyle= (j=0i1𝐁j)δu,\displaystyle\left(\sum_{j=0}^{i-1}\mathbf{B}^{j}\right)\delta u,

where 𝒃^i𝒃~i\hat{\boldsymbol{b}}_{i}\neq\tilde{\boldsymbol{b}}_{i} due to 𝒖^i1B𝒖i1B𝟎\hat{\boldsymbol{u}}_{i-1}^{B}-\boldsymbol{u}_{i-1}^{B}\neq\boldsymbol{0}. Notice that {𝒃k}\{\boldsymbol{b}_{k}\} is an orthonormal basis of Jn\mathbb{R}^{J_{n}}, thus

𝒖^B𝒖B2=\displaystyle\|\hat{\boldsymbol{u}}^{B}-\boldsymbol{u}^{B}\|^{2}= i=1Nt𝒖^iB𝒖iB2\displaystyle\sum_{i=1}^{N_{t}}\|\hat{\boldsymbol{u}}^{B}_{i}-\boldsymbol{u}^{B}_{i}\|^{2}
=\displaystyle= i=1Ntk=1Jnj=0i1δukλkj𝒃k2\displaystyle\sum_{i=1}^{N_{t}}\left\|\sum_{k=1}^{J_{n}}\sum_{j=0}^{i-1}\delta u_{k}\lambda_{k}^{j}\boldsymbol{b}_{k}\right\|^{2} (𝐁j𝒃k=λk𝐁j1𝒃k==λkj𝒃k)\displaystyle\quad{\scriptstyle(\mathbf{B}^{j}\boldsymbol{b}_{k}=\lambda_{k}\mathbf{B}^{j-1}\boldsymbol{b}_{k}=\cdots=\lambda_{k}^{j}\boldsymbol{b}_{k})}
=\displaystyle= i=1Ntk=1Jn|j=0i1λkj|2δuk2\displaystyle\sum_{i=1}^{N_{t}}\sum_{k=1}^{J_{n}}\left|\sum_{j=0}^{i-1}\lambda_{k}^{j}\right|^{2}\delta u_{k}^{2} (Unitary orthogonality of {𝒃k}\{\boldsymbol{b}_{k}\})
\displaystyle\geq i=1Nt|j=0i1λmj|2δum2.\displaystyle\sum_{i=1}^{N_{t}}\left|\sum_{j=0}^{i-1}\lambda_{m}^{j}\right|^{2}\delta u_{m}^{2}.

The lower bound can be estimated for four different cases.

  1. (1)

    When |λm|<1|\lambda_{m}|<1, we have

    𝔼δu[𝒖^B𝒖B]\displaystyle\mathbb{E}_{\delta u}[\|\hat{\boldsymbol{u}}^{B}-\boldsymbol{u}^{B}\|]\geq 𝔼δu[((i=1N+i=N+1Nt)|j=0i1λmj|2δum2)12]\displaystyle\mathbb{E}_{\delta u}\left[\left(\left(\sum_{i=1}^{N}+\sum_{i=N+1}^{N_{t}}\right)\left|\sum_{j=0}^{i-1}\lambda_{m}^{j}\right|^{2}\delta u_{m}^{2}\right)^{\frac{1}{2}}\right]
\displaystyle\geq \mathbb{E}_{\delta u}\left[\left(\sum_{i=N+1}^{N_{t}}(1-\epsilon)\frac{C_{\lambda_{m}}^{2i}}{|1-\lambda_{m}|^{2}}\delta u_{m}^{2}\right)^{\frac{1}{2}}\right] \quad{\scriptstyle(\text{Lemma 2.8})}
    \displaystyle\geq αμNt,\displaystyle\alpha\mu\sqrt{N_{t}}, (NtNNtN+1)\displaystyle\quad{\scriptstyle(N_{t}-N\geq\frac{N_{t}}{N+1})}

    where α=(1ϵ|1λm|2(N+1))12\alpha=\left(\frac{1-\epsilon}{|1-\lambda_{m}|^{2}(N+1)}\right)^{\frac{1}{2}}.

  2. (2)

    When |λm|=1|\lambda_{m}|=1 and λm1\lambda_{m}\neq 1, let λ=ejθ\lambda=e^{j\theta}, where j2=1j^{2}=-1 and θ(0,2π)\theta\in(0,2\pi), then we have

    𝔼δu[𝒖^B𝒖B]\displaystyle\mathbb{E}_{\delta u}[\|\hat{\boldsymbol{u}}^{B}-\boldsymbol{u}^{B}\|]\geq 𝔼δu[(i=1Nt|1ejiθ1ejθ|2δum2)12]\displaystyle\mathbb{E}_{\delta u}\left[\left(\sum_{i=1}^{N_{t}}\left|\frac{1-e^{ji\theta}}{1-e^{j\theta}}\right|^{2}\delta u_{m}^{2}\right)^{\frac{1}{2}}\right]
    =\displaystyle= 𝔼δu[(i=1Nt1cosiθ1cosθ)12|δum|]\displaystyle\mathbb{E}_{\delta u}\left[\left(\sum_{i=1}^{N_{t}}\frac{1-\cos i\theta}{1-\cos\theta}\right)^{\frac{1}{2}}|\delta u_{m}|\right]
    =\displaystyle= 𝔼δu[1(1cosθ)12(NtsinNt+12θsin12θ2sin12θ)12|δum|]\displaystyle\mathbb{E}_{\delta u}\left[\frac{1}{(1-\cos\theta)^{\frac{1}{2}}}\left(N_{t}-\frac{\sin\frac{N_{t}+1}{2}\theta-\sin\frac{1}{2}\theta}{2\sin\frac{1}{2}\theta}\right)^{\frac{1}{2}}|\delta u_{m}|\right]
    \displaystyle\geq 𝔼δu[(Nt1sin12θ1cosθ)12|δum|]\displaystyle\mathbb{E}_{\delta u}\left[\left(\frac{N_{t}-\frac{1}{\sin\frac{1}{2}\theta}}{1-\cos\theta}\right)^{\frac{1}{2}}|\delta u_{m}|\right]
    =\displaystyle= αμNt,\displaystyle\alpha\mu\sqrt{N_{t}}, (Ntz1zNtforz[0,1))\displaystyle\quad{\scriptstyle(\sqrt{N_{t}-z}\geq\sqrt{1-z}\sqrt{N_{t}}\;\mathrm{for}\;z\in[0,1))}

    where α=2(sin12θ1sin12θsin32θ)12\alpha=2\left(\frac{\sin\frac{1}{2}\theta-1}{\sin\frac{1}{2}\theta-\sin\frac{3}{2}\theta}\right)^{\frac{1}{2}}.

  3. (3)

    When |λm|=1|\lambda_{m}|=1 and λm=1\lambda_{m}=1, we have

    𝔼δu[𝒖^B𝒖B]\displaystyle\mathbb{E}_{\delta u}[\|\hat{\boldsymbol{u}}^{B}-\boldsymbol{u}^{B}\|]\geq 𝔼δu[(16Nt(Nt+1)(2Nt+1)δum2)12]\displaystyle\mathbb{E}_{\delta u}\left[\left(\frac{1}{6}N_{t}(N_{t}+1)(2N_{t}+1)\delta u_{m}^{2}\right)^{\frac{1}{2}}\right] (i=1Nti2=16Nt(Nt+1)(2Nt+1))\displaystyle\quad{\scriptstyle(\sum_{i=1}^{N_{t}}i^{2}=\frac{1}{6}N_{t}(N_{t}+1)(2N_{t}+1))}
    \displaystyle\geq αμNtNt,\displaystyle\alpha\mu\sqrt{N_{t}}N_{t},

    where α=1/3\alpha=1/\sqrt{3}.

  4. (4)

    When |λm|>1|\lambda_{m}|>1, we have

    𝔼δu[𝒖^B𝒖B]\displaystyle\mathbb{E}_{\delta u}[\|\hat{\boldsymbol{u}}^{B}-\boldsymbol{u}^{B}\|]\geq 𝔼δu[((i=1N+i=N+1Nt)|j=0i1λmj|2δum2)12]\displaystyle\mathbb{E}_{\delta u}\left[\left(\left(\sum_{i=1}^{N}+\sum_{i=N+1}^{N_{t}}\right)\left|\sum_{j=0}^{i-1}\lambda_{m}^{j}\right|^{2}\delta u_{m}^{2}\right)^{\frac{1}{2}}\right]
    \displaystyle\geq 𝔼δu[(i=N+1Nt(1ϵ)Cλm2i|1λm|2δum2)12]\displaystyle\mathbb{E}_{\delta u}\left[\left(\sum_{i=N+1}^{N_{t}}(1-\epsilon)\frac{C_{\lambda_{m}}^{2i}}{|1-\lambda_{m}|^{2}}\delta u_{m}^{2}\right)^{\frac{1}{2}}\right] (Lemma 2.8)\displaystyle\quad{\scriptstyle(\text{Lemma~{}\ref{lem:method:2}})}
    \displaystyle\geq 1ϵ|1λm|𝔼δu[((NtN)|λm|NtN+1δum2)12]\displaystyle\frac{\sqrt{1-\epsilon}}{|1-\lambda_{m}|}\mathbb{E}_{\delta u}\left[\left((N_{t}-N)|\lambda_{m}|^{N_{t}-N+1}\delta u_{m}^{2}\right)^{\frac{1}{2}}\right] (Arithmetic geometric mean inequality)\displaystyle\quad{\scriptstyle(\text{Arithmetic geometric mean inequality})}
    \displaystyle\geq αμNtβ(Nt),\displaystyle\alpha\mu\sqrt{N_{t}}\beta(N_{t}), (NtNNtN+1)\displaystyle\quad{\scriptstyle(N_{t}-N\geq\frac{N_{t}}{N+1})}

    where α=((1ϵ)|λm|1N|1λm|2(N+1))12\alpha=\left(\frac{(1-\epsilon)|\lambda_{m}|^{1-N}}{|1-\lambda_{m}|^{2}(N+1)}\right)^{\frac{1}{2}}.

If different random feature functions \{\phi_{i,j}\} are chosen over different time subdomains, we observe that \|\hat{\boldsymbol{u}}^{B}-\boldsymbol{u}^{B}\| still satisfies the estimate in Lemma 2.9.

Lemma 2.10.

For the space-time random feature method, there exists N+N\in\mathbb{N}^{+} and α>0\alpha>0 such that

𝔼δu[𝒖^S𝒖S]αδNt.\mathbb{E}_{\delta u}[\|\hat{\boldsymbol{u}}^{S}-\boldsymbol{u}^{S}\|]\leq\alpha\delta\sqrt{N_{t}}.
Proof.
(\mathbb{E}_{\delta u}[\|\hat{\boldsymbol{u}}^{S}-\boldsymbol{u}^{S}\|])^{2}\leq\mathbb{E}_{\delta u}[\|\hat{\boldsymbol{u}}^{S}-\boldsymbol{u}^{S}\|^{2}] \quad{\scriptstyle(\text{Schwarz inequality})}
=\displaystyle= 𝔼δu[i=1Ntk=1Mδuk,i𝒃k2]\displaystyle\mathbb{E}_{\delta u}\left[\left\|\sum_{i=1}^{N_{t}}\sum_{k=1}^{M}\delta u_{k,i}\boldsymbol{b}_{k}\right\|^{2}\right]
=\displaystyle= αδ2Nt,\displaystyle\alpha\delta^{2}N_{t},

where α=M\alpha=\sqrt{M}. ∎

Lemma 2.11.

Let two functions uM(𝐱,t)u_{M}(\boldsymbol{x},t) and u^M(𝐱,t)\hat{u}_{M}(\boldsymbol{x},t) be of the form (11) with Nx=1N_{x}=1 and the coefficients being 𝐮\boldsymbol{u}, 𝐮^\hat{\boldsymbol{u}}, respectively, then there exists C1C2>0C_{1}\geq C_{2}>0 such that

C1𝒖^𝒖u^M(𝒙,t)uM(𝒙,t)L2C2𝒖^𝒖.C_{1}\|\hat{\boldsymbol{u}}-\boldsymbol{u}\|\geq\|\hat{u}_{M}(\boldsymbol{x},t)-u_{M}(\boldsymbol{x},t)\|_{L^{2}}\geq C_{2}\|\hat{\boldsymbol{u}}-\boldsymbol{u}\|.
Proof.

Notice that

u^M(𝒙,t)uM(𝒙,t)L2=(n=1Nti=1Jn(u^n,iun,i)2Ω×[0,T]ϕn,i(𝒙,t)2d𝒙dt)12C1𝒖^𝒖,\|\hat{u}_{M}(\boldsymbol{x},t)-u_{M}(\boldsymbol{x},t)\|_{L^{2}}=\left(\sum_{n=1}^{N_{t}}\sum_{i=1}^{J_{n}}(\hat{u}_{n,i}-u_{n,i})^{2}\int_{\Omega\times[0,T]}\phi_{n,i}(\boldsymbol{x},t)^{2}\;\mathrm{d}\boldsymbol{x}\mathrm{d}t\right)^{\frac{1}{2}}\leq C_{1}\|\boldsymbol{\hat{u}}-\boldsymbol{u}\|,

where C1=maxn,iϕn,i(𝒙,t)2C_{1}=\max_{n,i}\|\phi_{n,i}(\boldsymbol{x},t)\|_{2}, and

u^M(𝒙,t)uM(𝒙,t)L2=(n=1Nti=1Jn(u^n,iun,i)2Ω×[0,T]ϕn,i(𝒙,t)2d𝒙dt)12C2𝒖^𝒖,\|\hat{u}_{M}(\boldsymbol{x},t)-u_{M}(\boldsymbol{x},t)\|_{L^{2}}=\left(\sum_{n=1}^{N_{t}}\sum_{i=1}^{J_{n}}(\hat{u}_{n,i}-u_{n,i})^{2}\int_{\Omega\times[0,T]}\phi_{n,i}(\boldsymbol{x},t)^{2}\;\mathrm{d}\boldsymbol{x}\mathrm{d}t\right)^{\frac{1}{2}}\geq C_{2}\|\boldsymbol{\hat{u}}-\boldsymbol{u}\|,

where C2=minn,iϕn,i(𝒙,t)2C_{2}=\min_{n,i}\|\phi_{n,i}(\boldsymbol{x},t)\|_{2}. Moreover, for all ϕn,i2(Ω×[0,T])\phi_{n,i}\in\mathcal{L}^{2}(\Omega\times[0,T]), we have C1,C2<+C_{1},C_{2}<+\infty. ∎

Theorem 2.12 (Lower bound).

Denote ueu_{e} the exact solution. Let δu\delta u be the error between the numerical solution and the exact solution to the least-squares problem. Given ϵ>0\epsilon>0, we assume that

uMB(𝒙,t)ue(𝒙,t)L2(Ω×[tn,tn+1])ϵ,n=0,,N.\|u^{B}_{M}(\boldsymbol{x},t)-u_{e}(\boldsymbol{x},t)\|_{L^{2}(\Omega\times[t_{n},t_{n+1}])}\leq\epsilon,\quad\forall\;n=0,\cdots,N.

Then, there exists N+N\in\mathbb{N}^{+} and α>0\alpha>0 such that when Nt>NN_{t}>N,

(18) 𝔼δu[u^MB(𝒙,t)ue(𝒙,t)L2]αμNtβ(Nt)ϵNt.\mathbb{E}_{\delta u}[\|\hat{u}^{B}_{M}(\boldsymbol{x},t)-u_{e}(\boldsymbol{x},t)\|_{L^{2}}]\geq\alpha\mu\sqrt{N_{t}}\beta(N_{t})-\epsilon\sqrt{N_{t}}.
Proof.

Notice that

𝔼δu[u^MB(𝒙,t)ue(𝒙,t)L2]\displaystyle\mathbb{E}_{\delta u}[\|\hat{u}^{B}_{M}(\boldsymbol{x},t)-u_{e}(\boldsymbol{x},t)\|_{L^{2}}]\geq 𝔼δu[u^MB(𝒙,t)uMB(𝒙,t)L2]uMB(𝒙,t)ue(𝒙,t)L2\displaystyle\mathbb{E}_{\delta u}[\|\hat{u}^{B}_{M}(\boldsymbol{x},t)-u^{B}_{M}(\boldsymbol{x},t)\|_{L^{2}}]-\|u^{B}_{M}(\boldsymbol{x},t)-u_{e}(\boldsymbol{x},t)\|_{L_{2}} (Triangle inequality)\displaystyle\quad{\scriptstyle(\text{Triangle inequality})}
\displaystyle\geq αμNtβ(Nt)(n=1Ntϵ2)12\displaystyle\alpha\mu\sqrt{N_{t}}\beta(N_{t})-\left(\sum_{n=1}^{N_{t}}\epsilon^{2}\right)^{\frac{1}{2}} (Lemma 2.9 and Lemma 2.11)\displaystyle\quad{\scriptstyle(\text{Lemma~{}\ref{thm:method:3} and Lemma~{}\ref{thm:method:5}})}
=\displaystyle= αμNtβ(Nt)ϵNt.\displaystyle\alpha\mu\sqrt{N_{t}}\beta(N_{t})-\epsilon\sqrt{N_{t}}.

Theorem 2.13 (Upper bound).

Denote ueu_{e} the exact solution. Let δu\delta u be the error between the numerical solution and the exact solution to the least-squares problem. Given ϵ>0\epsilon>0, we assume that

uMS(𝒙,t)ue(𝒙,t)L2(Ω×[tn,tn+1])ϵ,n=0,,N.\|u^{S}_{M}(\boldsymbol{x},t)-u_{e}(\boldsymbol{x},t)\|_{L^{2}(\Omega\times[t_{n},t_{n+1}])}\leq\epsilon,\quad\forall\;n=0,\cdots,N.

Then, there exists N+N\in\mathbb{N}^{+} and α>0\alpha>0 such that when Nt>NN_{t}>N,

(19) 𝔼δu[u^S(𝒙,t)ue(𝒙,t)L2]αδNt+ϵNt.\mathbb{E}_{\delta u}[\|\hat{u}^{S}(\boldsymbol{x},t)-u_{e}(\boldsymbol{x},t)\|_{L^{2}}]\leq\alpha\delta\sqrt{N_{t}}+\epsilon\sqrt{N_{t}}.
Proof.

Notice that

\mathbb{E}_{\delta u}[\|\hat{u}^{S}(\boldsymbol{x},t)-u_{e}(\boldsymbol{x},t)\|_{L^{2}}]\leq\mathbb{E}_{\delta u}[\|\hat{u}^{S}(\boldsymbol{x},t)-u^{S}(\boldsymbol{x},t)\|_{L^{2}}]+\|u^{S}(\boldsymbol{x},t)-u_{e}(\boldsymbol{x},t)\|_{L^{2}} \quad{\scriptstyle(\text{Triangle inequality})}
\leq\alpha\delta\sqrt{N_{t}}+\left(\sum_{n=1}^{N_{t}}\epsilon^{2}\right)^{\frac{1}{2}} \quad{\scriptstyle(\text{Lemma 2.10 and Lemma 2.11})}
=\alpha\delta\sqrt{N_{t}}+\sqrt{N_{t}}\epsilon.

From Theorem 2.12 and Theorem 2.13, we see that the error of solving the least-squares problem by the block time-marching strategy can grow exponentially with the number of time blocks (through the factor \beta(N_{t}) when |\lambda_{m}|>1), while the error in the ST-RFM grows at most like \sqrt{N_{t}} and does not suffer from this problem. These estimates are also confirmed by the numerical results given below.
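The contrast between the two growth laws can be illustrated with a toy simulation (ours, not from the paper): in the block time-marching recursion the per-block perturbation is propagated through \mathbf{B} and accumulates, whereas in the space-time solve each subdomain only sees its own perturbation. A scalar multiple of the identity is used as a stand-in for a diagonalizable \mathbf{B} with |\lambda_{m}|>1.

```python
import numpy as np

rng = np.random.default_rng(4)
N_t, J_n, sigma = 40, 20, 1e-10
lam = 1.05                                    # surrogate |lambda_m| > 1
B = lam * np.eye(J_n)                         # simplest diagonalizable stand-in for B

e_block, e_bt, e_st = np.zeros(J_n), [], []
for i in range(N_t):
    delta = sigma * rng.standard_normal(J_n)  # i.i.d. round-off-like perturbation
    e_block = B @ e_block + delta             # block time-marching recursion (Lemma 2.9)
    e_bt.append(np.linalg.norm(e_block))
    e_st.append(np.linalg.norm(sigma * rng.standard_normal(J_n)))  # ST-RFM: no accumulation

print("block time-marching error, last subdomain:", e_bt[-1])
print("space-time RFM error, last subdomain:     ", e_st[-1])
```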

3. Numerical Results

In this section, we present numerical results for one-dimensional and two-dimensional problems with simple geometry and a two-dimensional problem with complex geometry to demonstrate the effectiveness of the ST-RFM and confirm theoretical results.

3.1. One-dimensional Problems

3.1.1. Heat Equation

Consider the following problem

(20) {tu(x,t)α2x2u(x,t)=0,x[x0,x1],t[0,T],u(x0,t)=g1(t),t[0,T],u(x1,t)=g2(t),t[0,T],u(x,0)=h(x),x[x0,x1],\left\{\begin{aligned} &\partial_{t}u(x,t)-\alpha^{2}\partial_{x}^{2}u(x,t)=0,\qquad&x\in[x_{0},x_{1}],t\in[0,T],\\ &u(x_{0},t)=g_{1}(t),&t\in[0,T],\\ &u(x_{1},t)=g_{2}(t),&t\in[0,T],\\ &u(x,0)=h(x),&x\in[x_{0},x_{1}],\\ \end{aligned}\right.

where α=π/2\alpha=\pi/2, x0=0x_{0}=0, x1=12x_{1}=12 and T=10T=10. The exact solution is chosen to be

(21) ue(x,t)=2sin(αx)et.u_{e}(x,t)=2\sin(\alpha x)e^{-t}.

We choose the initial condition h(x)h(x) by restricting Eq. (21) to t=0t=0, and the boundary conditions g1(t)g_{1}(t) and g2(t)g_{2}(t) by restricting Eq. (21) to x=x0x=x_{0} and x=x1x=x_{1}, respectively.
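For reference, here is a small sketch (not the authors' code) of the manufactured data for this test: the exact solution (21) and the initial and boundary data obtained by restricting it.

```python
import numpy as np

alpha, x0, x1, T = np.pi / 2, 0.0, 12.0, 10.0

def u_e(x, t):                      # exact solution (21)
    return 2.0 * np.sin(alpha * x) * np.exp(-t)

h  = lambda x: u_e(x, 0.0)          # initial condition
g1 = lambda t: u_e(x0, t)           # boundary condition at x = x_0
g2 = lambda t: u_e(x1, t)           # boundary condition at x = x_1
```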

We set the default hyper-parameters N_{x}=2, N_{t}=5, Q_{x}=20, Q_{t}=20, J_{n}=400 and N_{b}=1. Numerical solutions and errors of STC (Eq. (9)) and SoV (Eq. (10)) are plotted in Figure 1. The L^{\infty} error of the ST-RFM is small (below 4.5\times 10^{-6}), which indicates that both types of random feature functions have strong approximation properties.

Figure 1. Results for the heat equation. (a) STC: Left, numerical solution; right, absolute error. (b) SoV: Left, numerical solution; right, absolute error.

Next, we report the convergence behavior with respect to different parameters in Figure 2(a)-(d). In Figure 2(a), we set Nb=1N_{b}=1, Nx=2N_{x}=2, Jn=400J_{n}=400, Qx=Qt=20Q_{x}=Q_{t}=20 and Nt=1,,5N_{t}=1,\cdots,5 to verify the convergence with respect to NtN_{t}. In Figure 2(b), we set Nx=2N_{x}=2, Nt=1N_{t}=1, Jn=400J_{n}=400, Qx=Qt=20Q_{x}=Q_{t}=20 and Nb=1,,5N_{b}=1,\cdots,5 to verify the convergence with respect to NbN_{b}. In Figure 2(c), we set Nb=5N_{b}=5, Nx=2N_{x}=2, Nt=1N_{t}=1, Qx=Qt=20Q_{x}=Q_{t}=20 and Jn=50,100,200,300,400J_{n}=50,100,200,300,400 to verify the convergence with respect to JnJ_{n}. In Figure 2(d), we set Nb=5N_{b}=5, Nx=2N_{x}=2, Nt=1N_{t}=1, Jn=400J_{n}=400 and Qx=Qt=5,10,15,20,25Q_{x}=Q_{t}=5,10,15,20,25 to verify the convergence with respect to QxQ_{x}/QtQ_{t}. A clear trend of spectral accuracy is observed for the ST-RFM in both spatial and temporal directions.

Now, we compare STC and SoV in Figure 2(e), where the default hyper-parameter setting is used. For this example, STC performs better than SoV. The comparison between the block time-marching strategy and the ST-RFM is plotted in Figure 2(f), where we set Nb=5N_{b}=5 and Nt=1N_{t}=1 for the block time-marching strategy and Nb=1N_{b}=1 and Nt=5N_{t}=5 for the ST-RFM. The L2L^{2} error of the solution by the block time-marching strategy increases exponentially fast with respect to the number of blocks, while the error in the ST-RFM remains almost flat over all time subdomains.

Figure 2. Convergence behavior as different hyper-parameters are varied for the heat equation. (a) the number of time subdomains in the ST-RFM; (b) the number of time blocks in the block time-marching strategy; (c) the number of random features in each direction; (d) the number of collocation points in each direction; (e) comparison of STC and SoV; (f) comparison of the ST-RFM and the block time-marching strategy.

3.1.2. Heat Equation with Nonsmooth Initial Condition

Consider the heat equation (20) with x0=0x_{0}=0, x1=8x_{1}=8 and the following nonsmooth initial condition:

(22) h(x)=2𝕀0x<4sin(πx/2)+2𝕀4x8sin(πx).h(x)=2\mathds{I}_{0\leq x<4}\sin\left(\pi x/2\right)+2\mathds{I}_{4\leq x\leq 8}\sin\left({\pi}x\right).

It is easy to check that h(x)h(x) is continuous but not continuously differentiable at x=4x=4 (the one-sided derivatives there are π\pi and 2π2\pi), so h(x)h(x) only belongs to C([0,8])C([0,8]).
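A quick numerical check of this claim (with illustrative tolerances) could read:

```python
import numpy as np

# The piecewise initial condition (22) and its one-sided slopes at the kink x = 4.
def h(x):
    x = np.asarray(x, dtype=float)
    return np.where(x < 4.0, 2.0 * np.sin(np.pi * x / 2.0), 2.0 * np.sin(np.pi * x))

eps = 1e-7
print(h(4.0 - eps), h(4.0 + eps))            # both close to 0: h is continuous at x = 4
print((h(4.0) - h(4.0 - eps)) / eps,         # approximately pi
      (h(4.0 + eps) - h(4.0)) / eps)         # approximately 2*pi, so h is not C^1
```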

Since an exact solution is not available analytically for this problem, we obtain a reference solution with the hyper-parameters Nx=2N_{x}=2, Nt=5N_{t}=5, Qx=30Q_{x}=30, Qt=50Q_{t}=50, Jn=250J_{n}=250, and Nb=1N_{b}=1. Numerical convergence of the ST-RFM, measured by the relative L2L^{2} error against this reference, is depicted in Figure 3; STC random feature functions are used here. Spectral accuracy is again observed. However, the error saturates at around 10310^{-3}, two orders of magnitude larger than in the smooth case in Section 3.1.1, which we attribute to the reduced regularity of the initial condition.

Figure 3. Convergence behavior as different hyper-parameters are varied for the heat equation with nonsmooth initial condition. (a) the number of time subdomains in the ST-RFM; (b) the number of random features in each direction; (c) the number of collocation points in the temporal direction.

3.1.3. Wave Equation

Consider the following problem

(23) {t2u(x,t)α2x2u(x,t)=0,(x,t)[x0,x1]×[0,T],u(x0,t)=u(x1,t)=0,t[0,T],u(x,0)=g1(x),x[x0,x1],tu(x,0)=g2(x),x[x0,x1].\left\{\begin{aligned} &\partial_{t}^{2}u(x,t)-\alpha^{2}\partial_{x}^{2}u(x,t)=0,&\quad(x,t)\in[x_{0},x_{1}]\times[0,T],\\ &u(x_{0},t)=u(x_{1},t)=0,&\quad t\in[0,T],\\ &u(x,0)=g_{1}(x),&\quad x\in[x_{0},x_{1}],\\ &\partial_{t}u(x,0)=g_{2}(x),&\quad x\in[x_{0},x_{1}].\\ \end{aligned}\right.

where x0=0x_{0}=0, x1=6πx_{1}=6\pi, α=1\alpha=1 and T=10T=10. The exact solution is chosen to be

(24) ue(x,t)=cos(απlt)sin(πlx)+[cos(2απlt)+l2απsin(2απlt)]sin(2πlx),l=x1x0.u_{e}(x,t)=\cos\left(\frac{\alpha\pi}{l}t\right)\sin\left(\frac{\pi}{l}x\right)+\left[\cos\left(\frac{2\alpha\pi}{l}t\right)+\frac{l}{2\alpha\pi}\sin\left(\frac{2\alpha\pi}{l}t\right)\right]\sin\left(\frac{2\pi}{l}x\right),\quad l=x_{1}-x_{0}.

Initial conditions g1(x)g_{1}(x) and g2(x)g_{2}(x) are chosen accordingly.
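As a sanity check, the manufactured solution (24) and the induced initial data can be verified symbolically; a short sketch using sympy (variable names are ours):

```python
import sympy as sp

x, t = sp.symbols("x t", real=True)
alpha, l = 1, 6 * sp.pi                      # alpha = 1 and l = x1 - x0 = 6*pi as in the text

# The manufactured solution (24).
u = (sp.cos(alpha * sp.pi / l * t) * sp.sin(sp.pi / l * x)
     + (sp.cos(2 * alpha * sp.pi / l * t)
        + l / (2 * alpha * sp.pi) * sp.sin(2 * alpha * sp.pi / l * t)) * sp.sin(2 * sp.pi / l * x))

print(sp.simplify(sp.diff(u, t, 2) - alpha**2 * sp.diff(u, x, 2)))   # 0: (24) solves (23)
print(sp.simplify(u.subs(t, 0)))                                     # g1(x) = sin(x/6) + sin(x/3)
print(sp.simplify(sp.diff(u, t).subs(t, 0)))                         # g2(x) = sin(x/3)
```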

Set the default hyper-parameters Nx=5N_{x}=5, Nt=5N_{t}=5, Qx=30Q_{x}=30, Qt=30Q_{t}=30, Jn=300J_{n}=300 and Nb=1N_{b}=1. Numerical solutions and errors of STC and SoV are plotted in Figure 4. The LL^{\infty} error is smaller than 2.5e82.5e-8.

Figure 4. Results for the wave equation. (a) ST-RFM with STC: Left, numerical solution; right, absolute error. (b) ST-RFM with SoV: Left, numerical solution; right, absolute error.

Next, we report the convergence behavior with respect to different parameters in Figure 5(a-d). In Figure 5(a), we set Nb=1N_{b}=1, Nx=5N_{x}=5, Jn=300J_{n}=300, Qx=Qt=30Q_{x}=Q_{t}=30 and Nt=1,,5N_{t}=1,\cdots,5 to verify the convergence with respect to NtN_{t}. In Figure 5(b), we set Nx=5N_{x}=5, Nt=1N_{t}=1, Jn=300J_{n}=300, Qx=Qt=30Q_{x}=Q_{t}=30 and Nb=1,,5N_{b}=1,\cdots,5 to verify the convergence with respect to NbN_{b}. In Figure 5(c), we set Nb=5N_{b}=5, Nx=5N_{x}=5, Nt=1N_{t}=1, Qx=Qt=30Q_{x}=Q_{t}=30 and Jn=100,150,200,250,300J_{n}=100,150,200,250,300 to verify the convergence with respect to JnJ_{n}. In Figure 5(d), we set Nb=5N_{b}=5, Nx=5N_{x}=5, Nt=1N_{t}=1, Jn=300J_{n}=300 and Qx=Qt=10,15,20,25,30Q_{x}=Q_{t}=10,15,20,25,30 to verify the convergence with respect to QxQ_{x}/QtQ_{t}. A clear trend of spectral accuracy is observed for the ST-RFM in both spatial and temporal directions.

Now, we compare STC and SoV in Figure 5(e), where we set Nb=1N_{b}=1, Nx=5N_{x}=5, Nt=1,,5N_{t}=1,\cdots,5, Qx=Qt=30Q_{x}=Q_{t}=30 and Jn=300J_{n}=300. For this example, SoV performs better than STC. The comparison between the block time-marching strategy and the ST-RFM is plotted in Figure 5(f), where we set Nb=5N_{b}=5 and Nt=1N_{t}=1 for block time-marching strategy and Nb=1N_{b}=1 and Nt=5N_{t}=5 for ST-RFM. The L2L^{2} error of the solution by the block time-marching strategy increases exponentially fast with respect to the number of blocks, while the error in the ST-RFM remains almost flat over all time subdomains.

Figure 5. Convergence behavior as different hyper-parameters are varied for the wave equation. (a) the number of time subdomains in the ST-RFM; (b) the number of time blocks in the block time-marching strategy; (c) the number of random features in each direction; (d) the number of collocation points in each direction; (e) comparison of STC and SoV; (f) comparison of the ST-RFM and the block time-marching strategy.

3.1.4. Schrödinger Equation

Consider the following problem

(25) {itψ(x,t)+0.5Δψ(x,t)=0,(x,t)[x0,x1]×[0,T],ψ(x,0)=g(x),x[x0,x1],ψ(x0,t)=ψ(x1,t),t[0,T],xψ(x0,t)=xψ(x1,t),t[0,T],\left\{\begin{aligned} &i\partial_{t}\psi(x,t)+0.5\Delta\psi(x,t)=0,&\quad(x,t)\in[x_{0},x_{1}]\times[0,T],\\ &\psi(x,0)=g(x),&\quad x\in[x_{0},x_{1}],\\ &\psi(x_{0},t)=\psi(x_{1},t),&\quad t\in[0,T],\\ &\partial_{x}\psi(x_{0},t)=\partial_{x}\psi(x_{1},t),&\quad t\in[0,T],\end{aligned}\right.

where x0=0x_{0}=0, x1=5x_{1}=5 and T=10T=10. The exact solution is chosen to be

(26) ψ(x,t)=eiω2t/2(2cos(ωx)+sin(ωx)),ω=2πx1x0,\psi(x,t)=e^{-i\omega^{2}t/2}(2\cos(\omega x)+\sin(\omega x)),\quad\omega=\frac{2\pi}{x_{1}-x_{0}},

and g(x)g(x) is chosen accordingly, i.e., g(x)=2cos(ωx)+sin(ωx)g(x)=2\cos(\omega x)+\sin(\omega x).
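Since ψ\psi is complex-valued, the resulting least-squares problem is complex. One possible way to handle this, sketched below, is to approximate the real and imaginary parts with two real coefficient vectors and stack the real and imaginary parts of each collocation residual; this is an illustrative choice, not necessarily the implementation used for the results below.

```python
import numpy as np

# Illustrative handling of the complex-valued collocation conditions of (25): write
# psi = phi @ cu + 1j * (phi @ cv) with two real coefficient vectors cu, cv, and stack
# the real and imaginary parts of the residual of i*psi_t + 0.5*psi_xx = 0.  Here A_t
# and A_xx are the real feature matrices of the time derivative and the Laplacian at
# the interior collocation points (as in the heat-equation sketch above).
def schroedinger_rows(A_t, A_xx):
    top = np.hstack([0.5 * A_xx, -A_t])      # real part:  0.5*u_xx - v_t = 0
    bot = np.hstack([A_t, 0.5 * A_xx])       # imaginary part:  u_t + 0.5*v_xx = 0
    return np.vstack([top, bot])             # unknowns ordered as [cu, cv]
```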

Set the default hyper-parameters Nx=5N_{x}=5, Nt=3N_{t}=3, Qx=30Q_{x}=30, Qt=30Q_{t}=30, Jn=300J_{n}=300 and Nb=1N_{b}=1. Numerical solutions and errors of STC and SoV are plotted in Figure 6. The LL^{\infty} error in the ST-RFM is smaller than 2.0e92.0e-9.

Figure 6. Results for the Schrödinger equation. (a) real part, STC; (b) real part, SoV; (c) imaginary part, STC; (d) imaginary part, SoV. In each panel: left, numerical solution; right, absolute error.

Next, we report the convergence behavior with respect to different parameters in Figure 7(a-d) (real part) and Figure 8(a-d) (imaginary part). In Figure 7(a) and Figure 8(a), we set Nb=1N_{b}=1, Nx=5N_{x}=5, Jn=300J_{n}=300, Qx=Qt=30Q_{x}=Q_{t}=30 and Nt=1,2,3N_{t}=1,2,3 to verify the convergence with respect to NtN_{t}. In Figure 7(b) and Figure 8(b), we set Nx=5N_{x}=5, Nt=1N_{t}=1, Jn=300J_{n}=300, Qx=Qt=30Q_{x}=Q_{t}=30 and Nb=1,2,3N_{b}=1,2,3 to verify the convergence with respect to NbN_{b}. In Figure 7(c) and Figure 8(c), we set Nb=3N_{b}=3, Nx=5N_{x}=5, Nt=1N_{t}=1, Qx=Qt=30Q_{x}=Q_{t}=30 and Jn=100,150,200,250,300J_{n}=100,150,200,250,300 to verify the convergence with respect to JnJ_{n}. In Figure 7(d) and Figure 8(d), we set Nb=3N_{b}=3, Nx=5N_{x}=5, Nt=1N_{t}=1, Jn=300J_{n}=300 and Qx=Qt=10,15,20,25,30Q_{x}=Q_{t}=10,15,20,25,30 to verify the convergence with respect to QxQ_{x}/QtQ_{t}. A clear trend of spectral accuracy is observed for the ST-RFM in both spatial and temporal directions.

Now, we compare STC and SoV in Figure 7(e) and Figure 8(e), where we set Nb=1N_{b}=1, Nx=5N_{x}=5, Nt=1,2,3N_{t}=1,2,3, Qx=Qt=30Q_{x}=Q_{t}=30 and Jn=300J_{n}=300. For this example, SoV performs better than STC. The comparison between the block time-marching strategy and the ST-RFM is plotted in Figure 7(f) and Figure 8(f), where we set Nb=3N_{b}=3 and Nt=1N_{t}=1 for the block time-marching strategy and Nb=1N_{b}=1 and Nt=3N_{t}=3 for the ST-RFM. The L2L^{2} error of the solution by the block time-marching strategy increases exponentially fast with respect to the number of blocks, while the error in the ST-RFM increases much more slowly over the time subdomains.

Figure 7. Convergence behavior as different hyper-parameters are varied for Schrödinger equation (real part). (a) the number of time subdomains for ST-RFM; (b) the number of time blocks in the block time-marching strategy; (c) the number of random features in each direction; (d) the number of collocation points in each direction; (e) comparison of STC and SoV; (f) comparison of the ST-RFM and the block time-marching strategy.
Figure 8. Convergence behavior as different hyper-parameters are varied for Schrödinger equation (imaginary part). (a) the number of time subdomains for ST-RFM; (b) the number of time blocks in the block time-marching strategy; (c) the number of random features in each direction; (d) the number of collocation points in each direction; (e) comparison of STC and SoV; (f) comparison of the ST-RFM and the block time-marching strategy.

3.1.5. Verification of Theoretical Results

First, we verify Assumption 2.1 for all one-dimensional problems using the sufficient condition that the number of distinct eigenvalues of 𝐁\mathbf{B}, denoted by #unique λk\lambda_{k}, equals the number of random features JnJ_{n}. Results are recorded in Table 1 (a sketch of this check is given after the table). The number of distinct eigenvalues of 𝐁\mathbf{B} equals JnJ_{n} in all cases, so Assumption 2.1 is verified.

Heat Wave Schrödinger
JnJ_{n} #unique λk\lambda_{k} JnJ_{n} #unique λk\lambda_{k} JnJ_{n} #unique λk\lambda_{k}
100 100 100 100 200 200
150 150 150 150 300 300
200 200 200 200 400 400
250 250 250 250 500 500
300 300 300 300 600 600
350 350 350 350 700 700
400 400 400 400 800 800
Table 1. Eigenvalue distribution of 𝐁\mathbf{B} in one-dimensional problems for the verification of Assumption 2.1.
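Assuming the matrix 𝐁\mathbf{B} of Assumption 2.1 (defined in Section 2) is available as a dense array, the sufficient condition above can be checked by counting distinct eigenvalues up to a rounding tolerance; a minimal sketch:

```python
import numpy as np

def num_distinct_eigenvalues(B, decimals=8):
    """Count distinct eigenvalues of B, merging values that agree up to rounding."""
    lam = np.linalg.eigvals(np.asarray(B))
    return len(np.unique(np.round(lam, decimals)))

# The sufficient condition reported in Table 1:
#   num_distinct_eigenvalues(B) == J_n
```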

Next, we use the wave equation to verify the error estimate in Theorem 2.12. For the block time-marching strategy, we set Nb=20,Nx=2,Nt=1,Jn=100,Qx=10,Qt=10N_{b}=20,N_{x}=2,N_{t}=1,J_{n}=100,Q_{x}=10,Q_{t}=10, and for ST-RFM, we set Nb=1,Nx=2,Nt=20,Jn=100,Qx=10,Qt=10N_{b}=1,N_{x}=2,N_{t}=20,J_{n}=100,Q_{x}=10,Q_{t}=10. Eigenvalues of 𝐁\mathbf{B} are plotted in Figure 9(a), and the L2L^{2} error is plotted in Figure 9(b).

The largest modulus of the eigenvalues of 𝐁\mathbf{B} in Figure 9(a) is 187.75187.75, indicating that β(x)13.70x\beta(x)\approx 13.70^{x}, while the L2L^{2} error of the block time-marching strategy in Figure 9(b) increases exponentially at the rate 17.7017.70. Therefore, the lower bound estimate in Theorem 2.12 is verified. In addition, the L2L^{2} error of the ST-RFM remains almost flat and is mainly controlled by the approximation power of the space-time random features.
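The growth rate quoted above can be extracted by a log-linear fit of the per-block error; a sketch, assuming errs holds the L2L^{2} error on each of the Nb=20N_{b}=20 blocks and B is the matrix from Figure 9(a):

```python
import numpy as np

def fitted_growth_rate(errs):
    """Exponential growth rate of the per-block errors from a log-linear fit."""
    n = np.arange(1, len(errs) + 1)
    slope, _ = np.polyfit(n, np.log(np.asarray(errs, dtype=float)), 1)
    return float(np.exp(slope))              # errors grow roughly like rate**n

# Comparison used in the text (errs and B are assumed to be available from the run):
#   print(fitted_growth_rate(errs), np.abs(np.linalg.eigvals(B)).max())
```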

Figure 9. (a) Modulus of the eigenvalues λk\lambda_{k} of the matrix 𝐁\mathbf{B}; (b) L2L^{2} error for the wave equation. We see good agreement between Theorem 2.12, Theorem 2.13, and the actual numerical results.

3.2. Two-dimensional Problems

3.2.1. Membrane Vibration over a Simple Geometry

Consider the following problem

(27) {ttu(x,y,t)α2Δu(x,y,t)=0,(x,y)Ω,t[0,T],u(x,y,0)=ϕ(x,y),(x,y)Ω,tu(x,y,0)=ψ(x,y),(x,y)Ω,u(x,y,t)=0,(x,y)Ω,t[0,T],\left\{\begin{aligned} &\partial_{tt}u(x,y,t)-\alpha^{2}\Delta u(x,y,t)=0,&\quad(x,y)\in\Omega,\;t\in[0,T],\\ &u(x,y,0)=\phi(x,y),&\quad(x,y)\in\Omega,\\ &\partial_{t}u(x,y,0)=\psi(x,y),&\quad(x,y)\in\Omega,\\ &u(x,y,t)=0,&\quad(x,y)\in\partial\Omega,\;t\in[0,T],\\ \end{aligned}\right.

where Ω=[x0,x1]×[y0,y1]=[0,5]×[0,4]\Omega=[x_{0},x_{1}]\times[y_{0},y_{1}]=[0,5]\times[0,4], α=1\alpha=1 and T=10T=10. The exact solution is chosen to be

(28) ue(x,y,t)=sin(μx)sin(νy)(2cos(λt)+sin(λt)),μ=2πx1x0,ν=2πy1y0,λ=μ2+ν2,u_{e}(x,y,t)=\sin(\mu x)\sin(\nu y)(2\cos(\lambda t)+\sin(\lambda t)),\qquad\mu=\frac{2\pi}{x_{1}-x_{0}},\nu=\frac{2\pi}{y_{1}-y_{0}},\lambda=\sqrt{\mu^{2}+\nu^{2}},

and the initial data are chosen accordingly, i.e., ϕ(x,y)=2sin(μx)sin(νy)\phi(x,y)=2\sin(\mu x)\sin(\nu y) and ψ(x,y)=λsin(μx)sin(νy)\psi(x,y)=\lambda\sin(\mu x)\sin(\nu y).

Set the default hyper-parameters Nx=Ny=2N_{x}=N_{y}=2, Nt=5N_{t}=5, Qx=Qy=Qt=30Q_{x}=Q_{y}=Q_{t}=30, Jn=400J_{n}=400 and Nb=1N_{b}=1. We report the convergence behavior with respect to different parameters in Figure 10(a-c). In Figure 10(a), we set Nb=1N_{b}=1, Nx=Ny=2N_{x}=N_{y}=2, Jn=400J_{n}=400, Qx=Qy=Qt=30Q_{x}=Q_{y}=Q_{t}=30 and Nt=1,,6N_{t}=1,\cdots,6 to verify the convergence with respect to NtN_{t}. In Figure 10(b), we set Nb=1N_{b}=1, Nx=Ny=Nt=2N_{x}=N_{y}=N_{t}=2, Qx=Qy=Qt=30Q_{x}=Q_{y}=Q_{t}=30 and Jn=100,150,200,250,300,350,400J_{n}=100,150,200,250,300,350,400 to verify the convergence with respect to JnJ_{n}. In Figure 10(c), we set Nb=5N_{b}=5, Nx=Ny=Nt=5N_{x}=N_{y}=N_{t}=5, Jn=400J_{n}=400 and Qx=Qy=Qt=10,15,20,25,30Q_{x}=Q_{y}=Q_{t}=10,15,20,25,30 to verify the convergence with respect to the number of collocation points. A clear trend of spectral accuracy is observed for the ST-RFM in both spatial and temporal directions. Now, we compare STC and SoV in Figure 10(d), when Nb=1N_{b}=1, Nx=Ny=2N_{x}=N_{y}=2, Nt=1,,6N_{t}=1,\cdots,6, Qx=Qy=Qt=30Q_{x}=Q_{y}=Q_{t}=30 and Jn=400J_{n}=400. Performances of STC and SoV are close.

Figure 10. Convergence behavior as different hyper-parameters are varied for the membrane vibration equation (27). (a) the number of time subdomains in the ST-RFM; (b) the number of random features; (c) the number of collocation points in each direction; (d) comparison of STC and SoV.

3.2.2. Membrane Vibration over a Complex Geometry

Consider the complex geometry Ω\Omega shown in Figure 11

Figure 11. The complex geometry for the membrane vibration problem.

and the following membrane vibration problem

(29) {ttu(x,y,t)α2Δu(x,y,t)=1,(x,y)Ω,t[0,T],u(x,y,0)=ϕ(x,y),(x,y)Ω,tu(x,y,0)=ψ(x,y),(x,y)Ω,u(x,y,t)=0,(x,y)Ω,t[0,T],\left\{\begin{aligned} &\partial_{tt}u(x,y,t)-\alpha^{2}\Delta u(x,y,t)=1,&\quad(x,y)\in\Omega,\;t\in[0,T],\\ &u(x,y,0)=\phi(x,y),&\quad(x,y)\in\Omega,\\ &\partial_{t}u(x,y,0)=\psi(x,y),&\quad(x,y)\in\Omega,\\ &u(x,y,t)=0,&\quad(x,y)\in\partial\Omega,\;t\in[0,T],\\ \end{aligned}\right.

where T=10.0T=10.0, α=1\alpha=1. The same ϕ(x,y)\phi(x,y) and ψ(x,y)\psi(x,y) are used as in Section 3.2.1. Set the default hyper-parameters Nx=5N_{x}=5, Nt=1N_{t}=1, Qx=30Q_{x}=30, Qt=30Q_{t}=30, Jn=400J_{n}=400 and Nb=5N_{b}=5. Numerical solutions of SoV at different times are plotted in Figure 12.

Figure 12. Numerical solution of the membrane vibration problem over a complex geometry at times t=0.0, 2.5, 5.0 and 7.5.

No exact solution is available, so numerical self-convergence is shown here. The L2L^{2} errors uMurefL2\|u_{M}-u_{ref}\|_{L^{2}} with respect to different parameters are recorded in Table 2, Table 3 and Table 4, where the solution with the largest number of degrees of freedom is chosen as the reference solution. As each parameter is refined, the numerical solution converges to the reference solution, indicating the robustness of the ST-RFM in solving time-dependent partial differential equations over a complex geometry.
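The reported errors are discrete L2L^{2} errors between a numerical solution and the reference solution, presumably evaluated on a common set of test points; a sketch of one such computation (array names and the quadrature weights are illustrative):

```python
import numpy as np

def discrete_l2_error(u_num, u_ref, weights=None):
    """Discrete L^2 error between two solutions sampled on the same test points."""
    diff = np.asarray(u_num) - np.asarray(u_ref)
    if weights is None:                       # plain root-mean-square if no quadrature weights
        weights = np.full(diff.shape, 1.0 / diff.size)
    return float(np.sqrt(np.sum(weights * diff**2)))
```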

# Time Blocks NxN_{x}/NyN_{y} NtN_{t} JnJ_{n} QxQ_{x}/QyQ_{y}/QtQ_{t} Solution error
1 2 1 400 30 5.98e-1
1 2 2 400 30 7.07e-2
1 2 3 400 30 4.57e-2
1 2 4 400 30 3.89e-2
1 2 5 400 30 3.31e-2
1 2 6 400 30 Reference
Table 2. L2L^{2} error with respect to the number of time subdomains.
# Time Blocks NxN_{x}/NyN_{y} NtN_{t} JnJ_{n} QxQ_{x}/QyQ_{y}/QtQ_{t} Solution error
5 2 2 400 10 7.71e-2
5 2 2 400 15 2.16e-2
5 2 2 400 20 1.44e-2
5 2 2 400 25 1.25e-2
5 2 2 400 30 Reference
Table 3. L2L^{2} error with respect to the number of collocation points.
# Time Blocks NxN_{x}/NyN_{y} NtN_{t} JnJ_{n} QxQ_{x}/QyQ_{y}/QtQ_{t} Solution error
5 2 2 100 30 1.05e-1
5 2 2 150 30 5.25e-2
5 2 2 200 30 3.96e-2
5 2 2 250 30 2.82e-2
5 2 2 300 30 1.80e-2
5 2 2 350 30 1.46e-2
5 2 2 400 30 Reference
Table 4. L2L^{2} error with respect to the number of random features.

4. Concluding Remarks

In this work, we study numerical algorithms for solving time-dependent partial differential equations in the framework of the random feature method. Two types of random feature functions are considered: space-time concatenation random feature functions (STC) and space-time separation-of-variables random feature functions (SoV). A space-time partition of unity is used to piece together local random feature functions to approximate the solution. We tested these ideas for a number of time-dependent problems. Our numerical results show that ST-RFM with both STC and SoV has spectral accuracy in space and time. The error in ST-RFM remains almost flat as the number of time subdomains increases, while the error grows exponentially fast when the block time-marching strategy is used. Consistent theoretical error estimates are also proved. A two-dimensional problem over a complex geometry is used to show that the method is insensitive to the complexity of the underlying domain.

Acknowledgments

The work is supported by National Key R&D Program of China (No. 2022YFA1005200 and No. 2022YFA1005203), NSFC Major Research Plan - Interpretable and General-purpose Next-generation Artificial Intelligence (No. 92270001 and No. 92270205), Anhui Center for Applied Mathematics, and the Major Project of Science & Technology of Anhui Province (No. 202203a05020050).
