
Lattice-based kernel approximation and
serendipitous weights for parametric PDEs in
very high dimensions

Vesa Kaarnioja (Department of Mathematics and Computer Science, Free University of Berlin, Arnimallee 6, 14195 Berlin, Germany. Email: [email protected])
Frances Y. Kuo (School of Mathematics and Statistics, UNSW Sydney, Sydney NSW 2052, Australia. Email: [email protected])
Ian H. Sloan (School of Mathematics and Statistics, UNSW Sydney, Sydney NSW 2052, Australia. Email: [email protected])
(August 2023)
Abstract

We describe a fast method for solving elliptic partial differential equations (PDEs) with uncertain coefficients using kernel interpolation at a lattice point set. By representing the input random field of the system using the model proposed by Kaarnioja, Kuo, and Sloan (SIAM J. Numer. Anal. 2020), in which a countable number of independent random variables enter the random field as periodic functions, it was shown by Kaarnioja, Kazashi, Kuo, Nobile, and Sloan (Numer. Math. 2022) that the lattice-based kernel interpolant can be constructed for the PDE solution as a function of the stochastic variables in a highly efficient manner using fast Fourier transform (FFT). In this work, we discuss the connection between our model and the popular “affine and uniform model” studied widely in the literature of uncertainty quantification for PDEs with uncertain coefficients. We also propose a new class of product weights entering the construction of the kernel interpolant, which dramatically improve the computational performance of the kernel interpolant for PDE problems with uncertain coefficients, and allow us to tackle function approximation problems up to very high dimensionalities. Numerical experiments are presented to showcase the performance of the new weights.

1 Introduction

As in the major survey [4], we consider parametric elliptic partial differential equations (PDEs) of the form

$$-\nabla\cdot(\widetilde{a}(\bm{x},\bm{z})\nabla\widetilde{u}(\bm{x},\bm{z}))=q(\bm{x}),\quad \bm{x}\in D\subset\mathbb{R}^{d},\quad \bm{z}\in[-1,1]^{s}, \tag{1}$$

subject to the homogeneous Dirichlet boundary condition $\widetilde{u}(\bm{x},\bm{z})=0$ for $\bm{x}\in\partial D$, where $D\subset\mathbb{R}^{d}$ is a bounded Lipschitz domain, with $d$ typically $2$ or $3$, and $\widetilde{a}$ is an uncertain input field given in terms of parameters $\bm{z}=(z_{1},z_{2},\ldots,z_{s})\in[-1,1]^{s}$ by

$$\widetilde{a}(\bm{x},\bm{z}):=a_{0}(\bm{x})+\sum_{j=1}^{s}z_{j}\,\psi_{j}(\bm{x}),\quad \bm{x}\in D,\quad \bm{z}\in[-1,1]^{s}. \tag{2}$$

Here the parameters $z_{j}$ represent independent random variables distributed on the interval $[-1,1]$ according to a given probability distribution with density $\rho$. The essential feature giving rise to major difficulties is that $s$ may be very large.

We assume that the univariate density $\rho$ has the Chebyshev or arcsine form

$$\rho(z):=\frac{1}{\pi\sqrt{1-z^{2}}},\quad z\in[-1,1]. \tag{3}$$

This is one of only two specific probability densities considered in the recent monograph [1], the other being the constant density $\rho(z)=\frac{1}{2}$. In the method of generalized polynomial chaos (GPC), see [33], the density (3) is associated with Chebyshev polynomials of the first kind as the univariate basis functions. It might be argued that in many applications the choice between these two densities is a matter of taste rather than conviction.

In Section 2 we set $\bm{z}=\sin(2\pi\bm{y})$ component-wise, and so recast the problem (1)–(2) with the density (3) as one in which the dependence on a new stochastic variable $\bm{y}$ is periodic, and the solution becomes

$$u(\bm{x},\bm{y}):=\widetilde{u}(\bm{x},\sin(2\pi\bm{y})),\quad \bm{x}\in D,\quad \bm{y}\in[0,1]^{s}=:U_{s}. \tag{4}$$

More precisely, we show that $u(\bm{x},\bm{y})$ satisfies

$$-\nabla\cdot(a(\bm{x},\bm{y})\nabla u(\bm{x},\bm{y}))=q(\bm{x}),\quad \bm{x}\in D\subset\mathbb{R}^{d},\quad \bm{y}\in U_{s}, \tag{5}$$

and that (2) together with the probability density (3) can be replaced by

$$a(\bm{x},\bm{y}):=\widetilde{a}(\bm{x},\sin(2\pi\bm{y}))=a_{0}(\bm{x})+\sum_{j=1}^{s}\sin(2\pi y_{j})\,\psi_{j}(\bm{x}),\quad \bm{x}\in D,\quad \bm{y}\in U_{s}, \tag{6}$$

with $\bm{y}=(y_{1},y_{2},\ldots,y_{s})$, where the parameters $y_{j}$ represent i.i.d. random variables uniformly distributed on $[0,1]$. The equivalence of (2) subject to the density (3) on the one hand, with (6) subject to the uniform density on the other, is meant in the sense that both fields have exactly the same joint probability distribution in the domain $D$; for a precise statement see Theorem 2 and Corollary 3 below.

In Section 3 we describe the lattice-based kernel interpolant of [20] and discuss its properties, error bounds, and the cost of construction and evaluation. The essence of the method is that the dependence on the parameter $\bm{y}$ is approximated by a linear combination of periodic kernels, each with one leg fixed at a point $\bm{t}_{k}$, $k=1,\ldots,n$, of a carefully chosen $s$-dimensional rank-$1$ lattice, with coefficients fixed by interpolation. The kernel is the reproducing kernel of a weighted $s$-variate Hilbert space $H$ of dominating mixed smoothness $\alpha\in\mathbb{N}$. (The parameter labelled $\alpha$ in [20] is here replaced by $2\alpha$, to make $\alpha$ correspond to the order of mixed derivatives; see the norm (9) below.)

In Section 4 we summarize the results from the paper [20] on applying the lattice-based kernel interpolant to the PDE problem (5)–(6). The main result is that, provided the fluctuation coefficients $\psi_{j}$ in (6) decay suitably, the $L_{2}$ error (with respect to both $\bm{x}$ and $\bm{y}$) of the kernel interpolant to the PDE solution is of the order

$${\mathcal{O}}(n^{-\alpha/2+\delta}),\quad\delta>0, \tag{7}$$

where the implied constant depends on $\delta$ but is independent of $s$. The convergence rate $n^{-\alpha/2}$ is known (see [3]) to be the best that can be achieved in the sense of worst case $L_{2}$ error for any approximation that (as in [20] and here) uses information only at the points of a rank-$1$ lattice.

The present paper improves upon the method as presented in [20] in two different ways: first by making the method much more efficient, and second by making it much more accurate in difficult cases. The combined effect is to greatly increase the range of dimensions $s$ that are realistically achievable. We demonstrate this by carrying out explicit nontrivial calculations with $s=1000$, compared with $s=100$ in [20].

Both benefits are achieved in the present paper by the introduction of a new kind of “weights”. As mentioned earlier, our approximation makes explicit use of the reproducing kernel of a certain weighted Hilbert space $H$. The definition of that Hilbert space involves positive numbers $\gamma_{\mathfrak{u}}$ called weights, see (9) below. The weights naturally occur also in the kernel, see (10). The best performing weights found in [20] were of the so-called “SPOD” form (see (22) below). For SPOD weights the cost of evaluating the kernel interpolant is high, because the cost of each kernel evaluation is high.

In Section 5 we introduce novel weights, which are “product” weights rather than SPOD weights, allowing the kernel to be evaluated at a cost of order $s$. We still have a rigorous error bound of the form (7). The theoretical downside is that the implied constant is no longer independent of $s$. We refer to these weights as “serendipitous”, the word “serendipity” meaning “happy discovery by accident”.

In Section 6 we give numerical results, which show that with the lattice-based kernel interpolant from [20] and these weights, problems with dimension $s$ of a thousand or more can easily be studied. Not only are they cheaper and easier to use than SPOD weights, but in difficult problems they also lead empirically (and to our surprise) to much smaller errors, while producing errors similar to SPOD weights for easier problems.

A different kind of product weight was developed in [20] by adhering to the requirement that the error bound be independent of dimension, but those weights were found to have limited applicability, and did not show the remarkable performance reported here.

Many other methods have been proposed for $L_{2}$ approximation in the multivariate setting. The review article [4] and the monograph [1] take the approximating function to be a multivariate polynomial; as a result a major part of their analysis is inevitably concerned with sparsification of the basis set, since otherwise the “curse of dimensionality” would preclude large values of $s$. Recently other methods have been proposed [2, 3, 22, 23], some of which are optimal with respect to order of convergence, in the sense of producing error bounds of order

$${\mathcal{O}}(n^{-\alpha})\quad\text{or}\quad{\mathcal{O}}(n^{-\alpha}(\log n)^{\beta}) \tag{8}$$

for some $\beta$, a rate with respect to the exponent of $n$ that is the same as for approximation numbers, see [25, 31, 9]. Obviously (8) displays a better convergence rate than (7), but it has yet to be demonstrated that any of these methods has the potential for solving in practice the very high-dimensional problems seen in the numerical calculations of this paper.

The application of lattice point sets together with kernel interpolation has gained a lot of attention in recent years. The paper [34] appears to have been the first to consider this approach; later, the paper [35] obtained dimension-independent error estimates in weighted spaces using product weights.

The use of lattice points for approximation has been facilitated by the development of fast component-by-component (CBC) constructions for lattices under different assumptions on the form of the weights, see [30, 5, 6]. This enables the generation of tailored lattices for large-scale high-dimensional approximation problems such as those arising in uncertainty quantification.

This paper is organized as follows. In Section 2, we describe the connection between the so-called “affine model” and the “periodic model” introduced in [21]. The lattice-based kernel interpolant of [20] is summarized in Section 3. The application to parametric elliptic PDEs with uncertain coefficients is discussed in Section 4. The new class of serendipitous weights for the construction of the kernel interpolant for parametric PDE problems is introduced in Section 5. Numerical experiments assessing the performance of those weights are presented in Section 6. The paper ends with some conclusions.

2 Transforming to the periodic setting

The equivalence of the affine probability model given by (2) and (3) with the periodic formulation in (6) is expressed in Corollary 3 below. It states that the probability distributions in the two cases are identical. A more general result is stated in Theorem 2. It rests in turn on the following lemma, stating that if $Y$ is a random variable uniformly distributed on $[0,1]$, then $\sin(2\pi Y)$ has exactly the same probability distribution as a random variable distributed on $[-1,1]$ with the arcsine probability density (3). While the proof is elementary, it has one slightly unusual feature: the change of variable from $y\in[0,1]$ to $z=\sin(2\pi y)$ is not monotone.

Lemma 1.

Let $Z$ be a random variable distributed on $[-1,1]$ with density $\rho$ given by (3), and let $Y$ be a random variable uniformly distributed on $[0,1]$. Then for all $t\in\mathbb{R}$ we have

$$\mathbb{P}[Z\leq t]=\mathbb{P}[\sin(2\pi Y)\leq t].$$
Proof.

We first write the left-hand side explicitly as an integral:

$$\mathbb{P}[Z\leq t]=\int_{-1}^{1}\mathrm{ind}(t-z)\,\frac{1}{\pi\sqrt{1-z^{2}}}\,\mathrm{d}z,$$

where $\mathrm{ind}$ is the indicator function, taking the value $1$ if the argument is non-negative, and the value $0$ if it is negative. The next step is to make a change of variable $z=\sin(2\pi y)$, but note that this is not permissible for $y$ on the whole interval $[0,1]$ because $\sin(2\pi y)$ is not monotone. Noting that $\sin(2\pi y)$ is $1$-periodic, it is sufficient to consider $y$ in the two intervals $[-1/4,1/4]$ and $[1/4,3/4]$ separately (the point being that $\sin(2\pi y)$ is monotone in each subinterval, and that together the two subintervals make a full period). For the first subinterval we have

$$z=\sin(2\pi y),\quad y=\frac{\arcsin(z)}{2\pi}\in[-1/4,1/4],$$

so that we obtain

$$\mathbb{P}[Z\leq t]=\int_{-1/4}^{1/4}\mathrm{ind}(t-\sin(2\pi y))\,\frac{2\pi\cos(2\pi y)}{\pi\cos(2\pi y)}\,\mathrm{d}y=\int_{-1/4}^{1/4}\mathrm{ind}(t-\sin(2\pi y))\,2\,\mathrm{d}y.$$

Similarly, for the second interval $[1/4,3/4]$ we have

$$z=\sin(2\pi y),\quad y=\frac{1}{2}-\frac{\arcsin(z)}{2\pi}\in[1/4,3/4],$$

so that

$$\mathbb{P}[Z\leq t]=\int_{1/4}^{3/4}\mathrm{ind}(t-\sin(2\pi y))\,2\,\mathrm{d}y.$$

Adding the two results together, and dividing by 2, we obtain

$$\mathbb{P}[Z\leq t]=\int_{-1/4}^{3/4}\mathrm{ind}(t-\sin(2\pi y))\,\mathrm{d}y=\int_{0}^{1}\mathrm{ind}(t-\sin(2\pi y))\,\mathrm{d}y=\mathbb{P}[\sin(2\pi Y)\leq t],$$

with the second equality following from the periodicity of $\sin(2\pi y)$. This completes the proof of the lemma. ∎
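Lemma 1 is easy to verify numerically. The following short Python sketch (our own illustration, not part of [20] or [21]) compares the empirical CDF of $\sin(2\pi Y)$ for uniform $Y$ against samples drawn directly from the arcsine density (3) by inverse transform sampling; the arcsine CDF is $F(z)=\tfrac12+\arcsin(z)/\pi$, so $F^{-1}(u)=-\cos(\pi u)$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**6

# Periodic model: Z' = sin(2*pi*Y) with Y ~ U[0,1].
y = rng.random(N)
z_periodic = np.sin(2 * np.pi * y)

# Affine model: sample Z from the arcsine density (3) by inverse
# transform sampling, F^{-1}(u) = sin(pi*(u - 1/2)) = -cos(pi*u).
u = rng.random(N)
z_arcsine = -np.cos(np.pi * u)

# Lemma 1 asserts that both empirical CDFs agree with the exact one.
for t in [-0.9, -0.5, 0.0, 0.5, 0.9]:
    exact = 0.5 + np.arcsin(t) / np.pi
    print(f"t={t:+.1f}: periodic {np.mean(z_periodic <= t):.4f}, "
          f"arcsine {np.mean(z_arcsine <= t):.4f}, exact {exact:.4f}")
```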

Theorem 2.

Let $\widetilde{Q}(\bm{Z})=\widetilde{Q}(Z_{1},Z_{2},\ldots,Z_{s})$ be a real-valued random variable that depends on $s$ i.i.d. real-valued random variables $Z_{1},Z_{2},\ldots,Z_{s}$, where each $Z_{j}$ is distributed on $[-1,1]$ with density $\rho$ given by (3). Let $Q(\bm{Y})=Q(Y_{1},Y_{2},\ldots,Y_{s})$ be another random variable defined by

$$Q(\bm{Y})=\widetilde{Q}(\sin(2\pi\bm{Y})),$$

where the $Y_{j}$ are i.i.d. uniformly distributed random variables on $[0,1]$. Then for all $q\in\mathbb{R}$ we have

$$\mathbb{P}[Q(\cdot)\leq q]=\mathbb{P}[\widetilde{Q}(\cdot)\leq q],$$

where the probability on the left is with respect to uniform density, while that on the right is with respect to a product of the univariate densities (3).

Proof.

This follows immediately by applying Lemma 1 to each random variable, together with Fubini’s theorem. ∎

It follows from the theorem that the input parametric field given by (2) subject to the density (3) can with equal validity be expressed as (6), with each $y_{j}$ uniformly distributed on $[0,1]$. We state this as a corollary:

Corollary 3.

Let $\bm{Z}$ and $\bm{Y}$ be as in Theorem 2, and let $\widetilde{u}(\bm{x},\bm{z})$ and $u(\bm{x},\bm{y})$ be as in (1) and (4). The random variable $\widetilde{u}(\bm{x},\bm{Z})$ for $\bm{x}\in D$ has the same joint probability distribution with respect to the product density $\prod_{j=1}^{s}\rho(z_{j})$ as $u(\bm{x},\bm{Y})$ has with respect to the uniform product density.

This is the periodic formulation introduced by [21], except for the trivial difference of a different normalising factor: in [21] the sum was multiplied by $1/\sqrt{6}$ to ensure that the variance of the field was the same as that of a uniformly distributed affine variable defined on $[-1/2,1/2]$. In effect we are here redefining the fluctuation coefficients $\psi_{j}$.

3 The kernel interpolant

We assume that $f:[0,1]^{s}\to\mathbb{R}$ belongs to the weighted periodic unanchored Sobolev space $H$ of dominating mixed smoothness of order $\alpha\in\mathbb{N}$, with norm

$$\|f\|_{H}^{2}:=\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^{|\mathfrak{u}|}}\bigg|\int_{[0,1]^{s-|\mathfrak{u}|}}\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^{\alpha}}{\partial y_{j}^{\alpha}}\bigg)f(\bm{y})\,\mathrm{d}\bm{y}_{-\mathfrak{u}}\bigg|^{2}\,\mathrm{d}\bm{y}_{\mathfrak{u}}, \tag{9}$$

where $\{1:s\}:=\{1,2,\ldots,s\}$, $\bm{y}_{\mathfrak{u}}:=(y_{j})_{j\in\mathfrak{u}}$ and $\bm{y}_{-\mathfrak{u}}:=(y_{j})_{j\in\{1:s\}\setminus\mathfrak{u}}$. The inner product $\langle\cdot,\cdot\rangle_{H}$ is defined in the obvious way. Here we have replaced the traditional notation of the smoothness parameter $\alpha$ by $2\alpha$, so that our $\alpha$ corresponds exactly to the order of derivatives in the norm (9). The space $H$ is a special case of the weighted Korobov space, which has a real smoothness parameter $\alpha$ characterizing the rate of decay of Fourier coefficients, see e.g. [32, 30, 5, 6].

The parameters $\gamma_{\mathfrak{u}}$ for $\mathfrak{u}\subseteq\{1:s\}$ in the norm (9) are weights that are used to moderate the relative importance between subsets of variables in the norm, with $\gamma_{\emptyset}:=1$. There are $2^{s}$ weights in full generality, far too many to prescribe one by one. In practice we must work with weights of restricted forms. The following forms of weights have been considered in the literature (a small computational sketch follows the list):

• Product weights [32]: $\gamma_{\mathfrak{u}}=\prod_{j\in\mathfrak{u}}\gamma_{j}$, specified by one sequence $(\gamma_{j})_{j\geq 1}$.

• POD weights (product and order dependent) [29]: $\gamma_{\mathfrak{u}}=\Gamma_{|\mathfrak{u}|}\prod_{j\in\mathfrak{u}}\gamma_{j}$, specified by two sequences $(\Gamma_{\ell})_{\ell\geq 0}$ and $(\gamma_{j})_{j\geq 1}$.

• SPOD weights (smoothness-driven product and order dependent) [8]: $\gamma_{\mathfrak{u}}=\sum_{\bm{\nu}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}\Gamma_{|\bm{\nu}_{\mathfrak{u}}|}\prod_{j\in\mathfrak{u}}\gamma_{j,\nu_{j}}$, specified by the sequences $(\Gamma_{\ell})_{\ell\geq 0}$ and $(\gamma_{j,\nu})_{j\geq 1}$ for each $\nu=1,\ldots,\alpha$, where $|\bm{\nu}_{\mathfrak{u}}|:=\sum_{j\in\mathfrak{u}}\nu_{j}$.
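To make the three forms concrete, the following Python sketch (our own, with purely illustrative example sequences $\gamma_{j}=1/j^{2}$ and $\Gamma_{\ell}=\ell!$) evaluates $\gamma_{\mathfrak{u}}$ for a given subset $\mathfrak{u}$ under each form:

```python
import math
from itertools import product

def product_weight(u, gamma):
    # Product weights: gamma_u = prod_{j in u} gamma_j.
    return math.prod(gamma[j] for j in u)

def pod_weight(u, Gamma, gamma):
    # POD weights: gamma_u = Gamma_{|u|} * prod_{j in u} gamma_j.
    return Gamma[len(u)] * math.prod(gamma[j] for j in u)

def spod_weight(u, Gamma, gamma2, alpha):
    # SPOD weights: sum over nu_u in {1,...,alpha}^{|u|} of
    # Gamma_{|nu_u|} * prod_{j in u} gamma_{j, nu_j}.
    total = 0.0
    for nu in product(range(1, alpha + 1), repeat=len(u)):
        total += Gamma[sum(nu)] * math.prod(
            gamma2[(j, nu_j)] for j, nu_j in zip(u, nu))
    return total

# Illustrative example sequences (not the weights used in the paper).
s, alpha = 6, 2
gamma = {j: 1.0 / j**2 for j in range(1, s + 1)}
Gamma = {l: math.factorial(l) for l in range(0, alpha * s + 1)}
gamma2 = {(j, nu): 1.0 / j**(2 * nu) for j in range(1, s + 1)
          for nu in range(1, alpha + 1)}

u = (1, 3, 4)
print(product_weight(u, gamma), pod_weight(u, Gamma, gamma),
      spod_weight(u, Gamma, gamma2, alpha))
```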

The space $H$ is a reproducing kernel Hilbert space (RKHS) with the kernel

$$K(\bm{y},\bm{y}'):=\sum_{\mathfrak{u}\subseteq\{1:s\}}\gamma_{\mathfrak{u}}\prod_{j\in\mathfrak{u}}\eta_{\alpha}(y_{j},y_{j}'),\quad \bm{y},\bm{y}'\in[0,1]^{s}, \tag{10}$$

where

$$\eta_{\alpha}(y,y'):=\frac{(2\pi)^{2\alpha}}{(-1)^{\alpha+1}(2\alpha)!}\,B_{2\alpha}(\{y-y'\}),\quad y,y'\in[0,1],$$

with $B_{\ell}(y)$ denoting the Bernoulli polynomial of degree $\ell$, and the braces $\{\cdot\}$ denoting the fractional part of each component of the argument. As concrete examples, we have $B_{2}(y)=y^{2}-y+1/6$ and $B_{4}(y)=y^{4}-2y^{3}+y^{2}-1/30$. The kernel $K$ is easily seen to satisfy the two defining properties of a reproducing kernel, namely that $K(\cdot,\bm{y})\in H$ for all $\bm{y}\in[0,1]^{s}$ and $\langle f,K(\cdot,\bm{y})\rangle_{H}=f(\bm{y})$ for all $f\in H$ and all $\bm{y}\in[0,1]^{s}$.
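For product weights the sum over subsets in (10) factorizes as $K(\bm{y},\bm{y}')=\prod_{j=1}^{s}\big(1+\gamma_{j}\,\eta_{\alpha}(y_{j},y_{j}')\big)$, which is what makes kernel evaluation $\mathcal{O}(s)$ per point pair. A minimal Python sketch of our own (assuming product weights; the Bernoulli polynomials are hard-coded for $\alpha=1,2$):

```python
import numpy as np
from math import factorial, pi

def bernoulli_poly(degree, x):
    # Bernoulli polynomials needed for alpha = 1, 2 (degrees 2 and 4).
    if degree == 2:
        return x**2 - x + 1.0 / 6.0
    if degree == 4:
        return x**4 - 2 * x**3 + x**2 - 1.0 / 30.0
    raise NotImplementedError

def eta(alpha, y, yp):
    # eta_alpha(y, y') = (2*pi)^{2a} / ((-1)^{a+1} (2a)!) * B_{2a}({y - y'}).
    frac = np.mod(y - yp, 1.0)   # fractional part, componentwise
    return ((2 * pi)**(2 * alpha)
            / ((-1)**(alpha + 1) * factorial(2 * alpha))
            * bernoulli_poly(2 * alpha, frac))

def kernel_product_weights(alpha, gamma, y, yp):
    # For product weights, (10) factorizes:
    # K(y, y') = prod_j (1 + gamma_j * eta_alpha(y_j, y'_j)).
    return np.prod(1.0 + gamma * eta(alpha, y, yp))

# Illustrative usage with made-up weights gamma_j = 1/j^2.
s, alpha = 4, 1
gamma = 1.0 / np.arange(1, s + 1)**2
print(kernel_product_weights(alpha, gamma,
                             np.array([0.1, 0.2, 0.3, 0.4]), np.zeros(s)))
```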

For a given $f\in H$, we consider an approximation of the form

$$A_{n}^{*}(f)(\bm{y}):=f_{n}(\bm{y}):=\sum_{k=1}^{n}a_{k}\,K(\bm{t}_{k},\bm{y}),\quad \bm{y}\in[0,1]^{s}, \tag{11}$$

where $a_{1},\ldots,a_{n}\in\mathbb{R}$ and the nodes

$$\bm{t}_{k}:=\bigg\{\frac{k\,\bm{z}_{\rm gen}}{n}\bigg\}\quad\text{for}\ k=1,\ldots,n \tag{12}$$

are the points of a rank-$1$ lattice, with $\bm{z}_{\rm gen}\in\{1,\ldots,n-1\}^{s}$ being the lattice generating vector. Since the kernel $K$ is periodic, the braces $\{\cdot\}$ indicating the fractional part in the definition of $\bm{t}_{k}$ can be omitted when we evaluate the kernel.

The kernel interpolant $f_{n}\in H$ is a function of the form (11) that interpolates $f$ at the lattice points,

$$f_{n}(\bm{t}_{k'})=f(\bm{t}_{k'})\quad\text{for all}\ k'=1,\ldots,n,$$

with the coefficients $a_{k}$ therefore satisfying the linear system

$$\sum_{k=1}^{n}\mathcal{K}_{k,k'}\,a_{k}=f(\bm{t}_{k'})\quad\text{for all}\ k'=1,\ldots,n, \tag{13}$$

where

$$\mathcal{K}_{k,k'}=\mathcal{K}_{k',k}:=K(\bm{t}_{k},\bm{t}_{k'})=K\bigg(\frac{(k-k')\,\bm{z}_{\rm gen}}{n},\bm{0}\bigg)\quad\text{for}\ k,k'=1,\ldots,n. \tag{14}$$

The solution of the square linear system (13) exists and is unique because $K$, by virtue of being a reproducing kernel, is positive definite.

Moreover, because the nodes form a lattice, the matrix $\mathcal{K}$ is circulant, and thus the coefficient vector $\bm{a}:=[a_{1},\ldots,a_{n}]^{\rm T}$ in (13) can be obtained in $\mathcal{O}(n\log n)$ time using the fast Fourier transform (FFT) via

$$\bm{a}={\tt ifft}({\tt fft}(\bm{f})\ ./\ {\tt fft}(\mathcal{K}_{:,1})),$$

where $./$ indicates component-wise division of two vectors, $\bm{f}:=[f(\bm{t}_{1}),\ldots,f(\bm{t}_{n})]^{\rm T}$, and $\mathcal{K}_{:,1}$ denotes the first column of the matrix $\mathcal{K}$. This is a major advantage of using lattice points for the construction of the kernel interpolant.
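A minimal Python sketch of this construction step, assuming product weights and reusing the hypothetical helpers eta and kernel_product_weights from the sketch above; the generating vector z_gen below is a placeholder (in practice it comes from a CBC construction):

```python
import numpy as np

def kernel_interpolant_coeffs(f, z_gen, n, alpha, gamma):
    # Rank-1 lattice points t_k = {k * z_gen / n}, k = 1, ..., n.
    k = np.arange(1, n + 1)[:, None]
    t = np.mod(k * z_gen[None, :] / n, 1.0)

    # First column of the circulant kernel matrix:
    # c_i = K(i * z_gen / n, 0) for i = 0, ..., n - 1.
    shifts = np.mod((k - 1) * z_gen[None, :] / n, 1.0)
    first_col = np.array(
        [kernel_product_weights(alpha, gamma, row, np.zeros_like(row))
         for row in shifts])

    # Circulant solve via FFT: a = ifft(fft(f_vals) ./ fft(K[:, 1])).
    f_vals = np.array([f(row) for row in t])
    a = np.fft.ifft(np.fft.fft(f_vals) / np.fft.fft(first_col)).real
    return t, a

# Illustrative usage: interpolate a simple periodic test function.
s, n, alpha = 4, 257, 1
gamma = 1.0 / np.arange(1, s + 1)**2
z_gen = np.array([1, 75, 103, 60])      # placeholder generating vector
f = lambda y: np.prod(1 + np.sin(2 * np.pi * y))
t, a = kernel_interpolant_coeffs(f, z_gen, n, alpha, gamma)
```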

Important properties regarding the kernel interpolant were summarized or proved in [20]:

• The kernel interpolant is the minimal norm interpolant in the sense that it has the smallest $H$-norm among all interpolants using the same function values of $f$, see [20, Theorem 2.1].

• The kernel interpolant is optimal in the sense that it has the smallest worst case error (measured in any norm $\|\cdot\|_{W}$ with $H\subset W$) among all linear or non-linear algorithms that use the same function values of $f$, see [20, Theorem 2.2]. Recall that the worst case error, measured in the $W$-norm, of an algorithm $A$ in $H$ is defined by
$$e^{\rm wor}(A;H;W):=\sup_{f\in H,\,\|f\|_{H}\leq 1}\|f-A(f)\|_{W}.$$
• Any (linear or non-linear) algorithm $A_{n}$ (with $A_{n}(0)=0$) that uses function values of $f$ only at the lattice points has the lower bound
$$e^{\rm wor}(A_{n};H;L_{p})\geq C\,n^{-\alpha/2},\quad p\in[1,\infty],$$
with an explicit constant $C>0$, see [20, Theorem 3.1] and [3]. However, it is known that there exist other algorithms based on function values that can achieve an $L_{2}$ approximation error upper bound of order $n^{-\alpha}$, see [25, 31, 9]. Hence, our lattice-based kernel interpolant can at best attain half of the optimal convergence rate for the $L_{2}$ approximation error.

• A component-by-component (CBC) construction from [5, 6] can be used to obtain a lattice generating vector such that our lattice-based kernel interpolant achieves
$$e^{\rm wor}(A_{n}^{*};H;L_{2})\leq\frac{\kappa}{n^{1/(4\lambda)}}\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)^{1/(2\lambda)} \tag{15}$$
for all $\lambda\in(1/(2\alpha),1]$, with $\kappa:=\sqrt{2}\,(2.5+2^{4\alpha\lambda+1})^{1/(4\lambda)}$ and $\zeta(x):=\sum_{k=1}^{\infty}k^{-x}$ denoting the Riemann zeta function for $x>1$. Hence, by taking $\lambda\to 1/(2\alpha)$ we conclude that
$$e^{\rm wor}(A_{n}^{*};H;L_{2})={\mathcal{O}}(n^{-\alpha/2+\delta}),\quad\delta>0,$$
where the implied constant depends on $\delta$ but is independent of $s$ if the weights $\gamma_{\mathfrak{u}}$ are such that the sum in (15) can be bounded independently of $s$, see [20, Theorem 3.2]. Consequently, for any $f\in H$ we have
$$\|f-f_{n}\|_{L_{2}}\leq e^{\rm wor}(A_{n}^{*};H;L_{2})\,\|f\|_{H}={\mathcal{O}}(n^{-\alpha/2+\delta}).$$

The bound (15) was proved initially in [5] only for prime $n$, but has since been generalized to composite $n$ and extensible lattice sequences in [26].

Although the theoretical error bound (15) holds for general weights $\gamma_{\mathfrak{u}}$, practical implementation of the CBC construction must take into account the specific form of the weights for computational efficiency. Fast (FFT-based) CBC constructions were developed in [6] for product weights, POD weights and SPOD weights, with varying computational cost. Evaluating the kernel interpolant (11) also requires evaluations of the kernel (10), again at a cost that depends on the form of the weights, see [20, Section 5.2]. Furthermore, if we are interested in evaluating the kernel interpolant $f_{n}(\bm{y}_{\ell})$ at $L$ arbitrary points $\bm{y}_{\ell}$, $\ell=1,\ldots,L$, then, due to the matrix (14) being circulant, we can evaluate the kernel interpolant at all the shifted points $f_{n}(\bm{y}_{\ell}+\bm{t}_{k'})$, $k'=1,\ldots,n$, with only an extra logarithmic factor in the cost, see [20, Section 5.1]. We summarize these cost considerations in Table 1 (taken from [20, Table 1]). Clearly, product weights are the most efficient form of weights in all considerations.

Table 1: Cost breakdown for the kernel interpolant $f_{n}$ based on $n$ lattice points $\bm{t}_{k}$ in $s$ dimensions, evaluated at $L$ arbitrary points $\bm{y}_{\ell}$. Here $X$ is the cost of one evaluation of $f$.

| Operation \ Weights | Product | POD | SPOD |
|---|---|---|---|
| Fast CBC construction for $\bm{z}_{\rm gen}$ | $s\,n\log(n)$ | $s\,n\log(n)+s^{2}\log(s)\,n$ | $s\,n\log(n)+s^{3}\alpha^{2}\,n$ |
| Compute $K(\bm{t}_{k},\bm{0})$ for all $k$ | $s\,n$ | $s^{2}\,n$ | $s^{2}\alpha^{2}\,n$ |
| Evaluate $f(\bm{t}_{k})$ for all $k$ | $X\,n$ | $X\,n$ | $X\,n$ |
| Linear solve for all coefficients $a_{k}$ | $n\log(n)$ | $n\log(n)$ | $n\log(n)$ |
| Compute $K(\bm{t}_{k},\bm{y}_{\ell})$ for all $k,\ell$ | $s\,n\,L$ | $s^{2}\,n\,L$ | $s^{2}\alpha^{2}\,n\,L$ |
| Assemble $f_{n}(\bm{y}_{\ell})$ for all $\ell$ | $n\,L$ | $n\,L$ | $n\,L$ |
| OR assemble $f_{n}(\bm{y}_{\ell}+\bm{t}_{k})$ for all $\ell,k$ | $n\log(n)\,L$ | $n\log(n)\,L$ | $n\log(n)\,L$ |

4 Kernel interpolant for parametric elliptic PDEs

In the literature of “tailored” quasi-Monte Carlo (QMC) rules for parametric PDEs, it is customary to analyze the parametric regularity of the PDE solutions. This information can be used to construct QMC rules satisfying rigorous error bounds. Many of these studies have been carried out under the assumption of the so-called “affine and uniform setting” as in (2). Examples include the source problem for elliptic PDEs with random coefficients [29, 8, 27, 28, 10], spectral eigenvalue problems under uncertainty [12, 13, 14], Bayesian inverse problems [11, 7, 19], domain uncertainty quantification [18], PDE-constrained optimization under uncertainty [15, 16], and many others. When the input random field is modified to involve a composition with a periodic function as in (6), the regularity bound naturally changes, as we have encountered in [21, 20, 17].

We return now to the PDE problem (5) together with the periodic random field (6). We remark that in many studies the input random field is modeled as an infinite series and the effect of dimension truncation is analyzed. We will take this point of view below, as we did in [20]. Thus we now have a countably infinite parameter sequence $\bm{y}\in U:=[0,1]^{\mathbb{N}}$, and $U_{s}$ in (4), (5) and (6) is now replaced by $U$. We will abuse the notation from the introduction and use $a(\cdot,\bm{y})$ and $u(\cdot,\bm{y})$ to denote the corresponding random field and PDE solution, while we write $a_{s}(\cdot,\bm{y}):=a(\cdot,(y_{1},\ldots,y_{s},0,0,\ldots))$ and $u_{s}(\cdot,\bm{y}):=u(\cdot,(y_{1},\ldots,y_{s},0,0,\ldots))$ to denote the dimension-truncated counterparts.

Since we have two sets of variables $\bm{x}\in D$ and $\bm{y}\in U$, from now on we will make the domains $D$ and $U$ explicit in our notation. Let $H_{0}^{1}(D)$ denote the subspace of $H^{1}(D)$ with zero trace on $\partial D$. We equip $H_{0}^{1}(D)$ with the norm $\|v\|_{H_{0}^{1}(D)}:=\|\nabla v\|_{L_{2}(D)}$. Let $H^{-1}(D)$ denote the topological dual space of $H_{0}^{1}(D)$, and let $\langle\cdot,\cdot\rangle_{H^{-1}(D),H_{0}^{1}(D)}$ denote the duality pairing between $H^{-1}(D)$ and $H_{0}^{1}(D)$. We have the parametric weak formulation: for $\bm{y}\in U$, find $u(\cdot,\bm{y})\in H_{0}^{1}(D)$ such that

$$\int_{D}a(\bm{x},\bm{y})\nabla u(\bm{x},\bm{y})\cdot\nabla v(\bm{x})\,\mathrm{d}\bm{x}=\langle q,v\rangle_{H^{-1}(D),H_{0}^{1}(D)}\quad\text{for all}\ v\in H_{0}^{1}(D), \tag{16}$$

where $q\in H^{-1}(D)$. Following the problem formulation in [20], we make these standing assumptions:

(A1) $a_{0}\in L_{\infty}(D)$ and $\psi_{j}\in L_{\infty}(D)$ for all $j\geq 1$, and $\sum_{j\geq 1}\|\psi_{j}\|_{L_{\infty}(D)}<\infty$;

(A2) there exist $a_{\min}$ and $a_{\max}$ such that $0<a_{\min}\leq a(\bm{x},\bm{y})\leq a_{\max}<\infty$ for all $\bm{x}\in D$ and $\bm{y}\in U$;

(A3) $\sum_{j\geq 1}\|\psi_{j}\|_{L_{\infty}(D)}^{p}<\infty$ for some $p\in(0,1)$;

(A4) $a_{0}\in W^{1,\infty}(D)$ and $\sum_{j\geq 1}\|\psi_{j}\|_{W^{1,\infty}(D)}<\infty$, where $\|v\|_{W^{1,\infty}(D)}:=\max\{\|v\|_{L_{\infty}(D)},\|\nabla v\|_{L_{\infty}(D)}\}$;

(A5) $\|\psi_{1}\|_{L_{\infty}(D)}\geq\|\psi_{2}\|_{L_{\infty}(D)}\geq\cdots$;

(A6) the spatial domain $D\subset\mathbb{R}^{d}$, $d\in\{1,2,3\}$, is a convex and bounded polyhedron.

In practical computations, one typically needs to discretize the PDE (5) using, e.g., a finite element method. While the weak solution of the PDE problem is in general a Sobolev function and may not be pointwise well-defined with respect to the spatial variable $\bm{x}\in D$, the finite element solution is pointwise well-defined with respect to $\bm{x}\in D$, which we may exploit in the construction of the kernel interpolant. To this end, let $\{V_{h}\}_{h}$ be a family of conforming finite element subspaces $V_{h}\subset H_{0}^{1}(D)$, indexed by the mesh size $h>0$ and spanned by continuous, piecewise linear finite element basis functions. Furthermore, we assume that the triangulation corresponding to each $V_{h}$ is obtained from an initial, regular triangulation of $D$ by recursive, uniform partition of simplices. For $\bm{y}\in U$, the finite element solution $u_{h}(\cdot,\bm{y})\in V_{h}$ satisfies

$$\int_{D}a(\bm{x},\bm{y})\nabla u_{h}(\bm{x},\bm{y})\cdot\nabla v_{h}(\bm{x})\,\mathrm{d}\bm{x}=\langle q,v_{h}\rangle_{H^{-1}(D),H_{0}^{1}(D)}\quad\text{for all}\ v_{h}\in V_{h}.$$

Let $\bm{\nu}\in\mathbb{N}_{0}^{\infty}$ denote a multi-index with finite order $|\bm{\nu}|:=\sum_{j\geq 1}\nu_{j}<\infty$, and let $\partial^{\bm{\nu}}:=\prod_{j\geq 1}(\partial/\partial y_{j})^{\nu_{j}}$ denote the mixed partial derivatives with respect to $\bm{y}$. The standing assumptions (A1)–(A6) together with the Lax–Milgram lemma ensure that the weak formulation (16) has a unique solution such that for all $\bm{y}\in U$ (see [21] for a proof),

$$\|\partial^{\bm{\nu}}u(\cdot,\bm{y})\|_{H_{0}^{1}(D)}\leq\frac{\|q\|_{H^{-1}(D)}}{a_{\min}}\,(2\pi)^{|\bm{\nu}|}\sum_{\bm{m}\leq\bm{\nu}}|\bm{m}|!\,\prod_{j\geq 1}\big(b_{j}^{m_{j}}\,S(\nu_{j},m_{j})\big), \tag{17}$$

where (with no factor $1/\sqrt{6}$ here, compared to [21])

$$b_{j}:=\frac{\|\psi_{j}\|_{L_{\infty}(D)}}{a_{\min}}\quad\text{for all}\ j\geq 1, \tag{18}$$

and $S(\nu,m)$ denotes the Stirling number of the second kind for integers $\nu\geq m\geq 0$, under the convention $S(\nu,0)=\delta_{\nu,0}$.

Note that the parametric regularity bound (17) also holds when $u(\cdot,\bm{y})$ is replaced by a conforming finite element approximation $u_{h}(\cdot,\bm{y})$. The same is true for the dimension-truncated solution $u_{s}(\cdot,\bm{y})$ and its corresponding finite element approximation $u_{s,h}(\cdot,\bm{y})$.

Let $H(U_{s})=H$ denote the RKHS of functions with respect to $\bm{y}\in U_{s}$ from Section 3. In the framework of Section 3, for every $\bm{x}\in D$, we wish to approximate the dimension-truncated finite element solution $f:=u_{s,h}(\bm{x},\cdot)$ at $\bm{x}$ as a function of $\bm{y}$, and we define

$$f_{n}:=u_{s,h,n}(\bm{x},\cdot):=A_{n}^{*}(u_{s,h}(\bm{x},\cdot))\in H(U_{s})$$

to be the corresponding kernel interpolant. Then we are interested in the joint $L_{2}$ error

$$\|u-u_{s,h,n}\|_{L_{2}(D\times U)}:=\sqrt{\int_{D}\int_{U}\big(u(\bm{x},\bm{y})-u_{s,h,n}(\bm{x},\bm{y})\big)^{2}\,\mathrm{d}\bm{y}\,\mathrm{d}\bm{x}},$$

where we may interchange the order of integration using Fubini’s theorem. Using the triangle inequality, we split this error into three parts, with $C>0$ in each case denoting a generic constant independent of $s$, $h$ and $n$:

1. The dimension truncation error satisfies, see [20, Theorem 4.1],
$$\|u-u_{s}\|_{L_{2}(D\times U)}\leq C\,\|q\|_{H^{-1}(D)}\,s^{-(1/p-1/2)},\quad\text{with $p$ as in (A3)}. \tag{19}$$

2. The finite element error satisfies, see [20, Theorem 4.3],
$$\|u_{s}-u_{s,h}\|_{L_{2}(D\times U)}\leq C\,\|q\|_{H^{-1+t}(D)}\,h^{1+t}\quad\text{as}\ h\to 0,\quad t\in[0,1].$$

3. The kernel interpolation error satisfies, see [20, Theorem 4.4],
$$\|u_{s,h}-u_{s,h,n}\|_{L_{2}(D\times U)}\leq C\,\|q\|_{H^{-1}(D)}\,n^{-1/(4\lambda)}\,C_{s}(\lambda) \tag{20}$$
for all $\alpha\in\mathbb{N}$ and $\lambda\in(\frac{1}{2\alpha},1]$, where
$$[C_{s}(\lambda)]^{2\lambda}:=\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{\gamma_{\mathfrak{u}}}\bigg(\sum_{\bm{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}|\bm{m}_{\mathfrak{u}}|!\prod_{j\in\mathfrak{u}}\big(b_{j}^{m_{j}}S(\alpha,m_{j})\big)\bigg)^{2}\bigg)^{\lambda}.$$

Specifically, the bound (20) was obtained by writing

$$\|u_{s,h}-u_{s,h,n}\|_{L_{2}(D\times U)}^{2}=\int_{D}\|u_{s,h}(\bm{x},\cdot)-A_{n}^{*}(u_{s,h}(\bm{x},\cdot))\|_{L_{2}(U_{s})}^{2}\,\mathrm{d}\bm{x}\leq\big[e^{\rm wor}(A_{n}^{*};H(U_{s});L_{2}(U_{s}))\big]^{2}\int_{D}\|u_{s,h}(\bm{x},\cdot)\|_{H(U_{s})}^{2}\,\mathrm{d}\bm{x}.$$

The worst case error can be bounded by (15), while the integral of the squared $H(U_{s})$-norm can be bounded by a sum over $\mathfrak{u}\subseteq\{1:s\}$ involving $\|\partial^{\bm{\nu}}u_{s,h}(\cdot,\bm{y})\|_{H_{0}^{1}(D)}^{2}$, where each $\nu_{j}$ equals $\alpha$ for $j\in\mathfrak{u}$ and $0$ otherwise. The latter $H_{0}^{1}(D)$-norm can be bounded by (17), leading ultimately to the constant $C_{s}(\lambda)$ in (20).

The difficulty of the parametric PDE problem is largely determined by the summability exponent $p$ in (A3). We see in (19) that the smaller $p$ is, the faster the dimension truncation error decays. Naturally, the kernel interpolation error (20) should also be linked with the summability exponent $p$. In this application, the parameter $\alpha$ of the RKHS is actually a free parameter for us to choose, and so are the weights $\gamma_{\mathfrak{u}}$. This is more than just a theoretical exercise: to be able to implement the kernel interpolant in practice, we must specify $\alpha$ and the weights $\gamma_{\mathfrak{u}}$, since they appear in the formula for the kernel (10). The paper [20] proposed a number of choices, all with the aim of optimizing the convergence rate while keeping the constant $C_{s}(\lambda)$ bounded independently of $s$. The best convergence rate obtained in [20] was

$$\|u_{s,h}-u_{s,h,n}\|_{L_{2}(D\times U)}\leq C\,\|q\|_{H^{-1}(D)}\,n^{-r},\quad\text{with}\quad r=\frac{1}{2p}-\frac{1}{4}, \tag{21}$$

and this was achieved by a choice of SPOD weights. More precisely:

• We can choose weights (of a very complicated form) to minimize $C_{s}(\lambda)$, and they achieve (21), see [20, Theorem 4.5].

• We can choose SPOD weights to mimic the previous weights, and they also achieve (21), see [20, Theorem 4.5]. These SPOD weights are given explicitly by
$$\gamma_{\mathfrak{u}}:=\sum_{\bm{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\bm{m}_{\mathfrak{u}}|!)^{\frac{2}{1+\lambda}}\prod_{j\in\mathfrak{u}}\bigg(\frac{b_{j}^{m_{j}}S(\alpha,m_{j})}{\sqrt{2\mathrm{e}^{1/\mathrm{e}}\zeta(2\alpha\lambda)}}\bigg)^{\frac{2}{1+\lambda}}, \tag{22}$$
with
$$\alpha:=\bigg\lfloor\frac{1}{p}+\frac{1}{2}\bigg\rfloor\quad\text{and}\quad\lambda:=\frac{p}{2-p}. \tag{23}$$
• If $p\in\bigcup_{k=1}^{\infty}(\frac{2}{2k+1},\frac{1}{k})$ in (A3), then we can choose POD weights to achieve (21), see [20, Theorem 4.6].

• We can choose product weights to achieve (21) with a reduced rate $r$, around one half of $\frac{1}{2p}-\frac{1}{4}$, see [20, Theorem 4.7].

Hence, SPOD weights and POD weights can achieve theoretically better convergence rates than product weights, but they are much more costly, as seen in Table 1. The paper [20] reported comprehensive numerical experiments with the different choices of weights for PDE problems of varying difficulty, and concluded that SPOD weights indeed mostly perform better than POD weights and product weights. However, the greater computational cost of SPOD weights is definitely real. We therefore set out to seek better weights.

5 Seeking better weights

In implementations of the lattice-based kernel interpolant of [20], the weights $\gamma_{\mathfrak{u}}$ play a dual role. On the one hand, they appear in the component-by-component (CBC) construction for determining the lattice generating vector $\bm{z}_{\rm gen}$, which in turn determines the interpolation points through (12). On the other hand, they appear in the formula (10) for the kernel. In both roles only special forms of weights are feasible, given that there are $2^{s}$ subsets $\mathfrak{u}\subseteq\{1:s\}$. The SPOD weights described above are feasible (and were used in the computations in [20]), but encounter two computational bottlenecks:

1. The CBC construction used to obtain the generating vector $\bm{z}_{\rm gen}$ using SPOD weights has cost $\mathcal{O}(s\,n\log(n)+s^{3}\alpha^{2}\,n)$, see Table 1 and [5, 6].

2. Evaluating the kernel interpolant at $L$ arbitrary points using SPOD weights has cost $\mathcal{O}(s^{2}\alpha^{2}\,n\,L)$, see Table 1 and [20, Section 5.2].

While the cost of obtaining the generating vector could be regarded as an offline step that only needs to be performed once for a given set of problem parameters, the cost of evaluating the kernel interpolant makes its online use unattractive for high-dimensional problems.

We propose the following new formula for product weights to be used in both roles for the construction of the kernel interpolant:

$$\gamma_{\mathfrak{u}}:=\sum_{\bm{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}\prod_{j\in\mathfrak{u}}\bigg(\frac{b_{j}^{m_{j}}S(\alpha,m_{j})}{\sqrt{2\mathrm{e}^{1/\mathrm{e}}\zeta(2\alpha\lambda)}}\bigg)^{\frac{2}{1+\lambda}}=\prod_{j\in\mathfrak{u}}\bigg(\sum_{m=1}^{\alpha}\frac{b_{j}^{m}S(\alpha,m)}{\sqrt{2\mathrm{e}^{1/\mathrm{e}}\zeta(2\alpha\lambda)}}\bigg)^{\frac{2}{1+\lambda}}, \tag{24}$$

where $\alpha$ and $\lambda$ are given by (23). The weights (24) have been obtained from the SPOD weights (22) simply by leaving out the factorial factor $(|\bm{m}_{\mathfrak{u}}|!)^{2/(1+\lambda)}$. We call these serendipitous weights.
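Since (24) is of product form, only the per-dimension factors need to be formed. A small Python sketch of our own illustrating this (the Stirling numbers $S(\alpha,m)$ are computed by their standard recurrence, and $\zeta$ is evaluated by truncating the series):

```python
import math

def stirling2(n, k):
    # Stirling numbers of the second kind via the recurrence
    # S(n, k) = k*S(n-1, k) + S(n-1, k-1), with S(0, 0) = 1.
    if n == k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def zeta(x, terms=10**6):
    # Truncated series for the Riemann zeta function, x > 1.
    return sum(k**(-x) for k in range(1, terms + 1))

def serendipitous_factors(b, alpha, lam):
    # Per-dimension factors of the product weights (24);
    # gamma_u is then simply the product of the factors over j in u.
    denom = math.sqrt(2 * math.e**(1 / math.e) * zeta(2 * alpha * lam))
    return [
        (sum(bj**m * stirling2(alpha, m) for m in range(1, alpha + 1))
         / denom) ** (2 / (1 + lam))
        for bj in b
    ]

# Illustrative usage with the experimental setup of Section 6:
# b_j = c*j^{-theta}/a_min, and alpha, lambda from (23) with p = 1/2.2.
p, theta, c = 1 / 2.2, 2.4, 0.4 / math.sqrt(6)
alpha = math.floor(1 / p + 1 / 2)
lam = p / (2 - p)
a_min = 1 - c * zeta(theta)
b = [c * j**(-theta) / a_min for j in range(1, 11)]
print(serendipitous_factors(b, alpha, lam))
```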

The performance of these weights will be compared against the SPOD weights (22) in a series of numerical experiments in Section 6. In addition to the smaller observed errors, the serendipitous weights (because they are product weights) have obvious computational advantages:

1. The CBC construction used to obtain the generating vector $\bm{z}_{\rm gen}$ using product weights has cost $\mathcal{O}(s\,n\log(n))$, see Table 1 and [5, 6].

2. Evaluating the kernel interpolant at $L$ arbitrary points using product weights has cost $\mathcal{O}(s\,n\,L)$, see Table 1 and [20, Section 5.2].

As we shall see, the serendipitous weights (24) allow us to tackle very high-dimensional approximation problems successfully. Moreover, we still have the rigorous error bound given in (20). We no longer have a guarantee of a constant independent of $s$, but the observed performance will be seen to be excellent.

6 Numerical experiments

We consider the parametric PDE problem (1) converted to the periodic form (5), with the periodic diffusion coefficient (6). The domain is $D=(0,1)^{2}$, and the source term is $q(\bm{x})=x_{2}$. For the mean field we set $a_{0}(\bm{x})=1$, and for the stochastic fluctuations we take the functions

$$\psi_{j}(\bm{x}):=c\,j^{-\theta}\sin(j\pi x_{1})\sin(j\pi x_{2}),\quad \bm{x}=(x_{1},x_{2})\in D,\ j\geq 1,$$

where $c>0$ is a constant determining the magnitude of the fluctuations, and $\theta>1$ is the decay rate of the stochastic fluctuations. The sequence $(b_{j})_{j\geq 1}$ defined by (18) becomes

$$b_{j}:=\frac{c\,j^{-\theta}}{a_{\min}},\quad j\geq 1, \tag{25}$$

where for simplicity we take

$$a_{\min}:=1-c\,\zeta(\theta)\quad\text{and}\quad a_{\max}:=1+c\,\zeta(\theta),$$

and enforce $c<\frac{1}{\zeta(\theta)}$ to ensure the uniform ellipticity condition.

For each fixed $\bm{y}\in[0,1]^{s}$ we solve the PDE using a piecewise linear finite element method with mesh size $h=2^{-5}$. We construct a kernel interpolant $u_{s,h,n}(\bm{x},\cdot)=A_{n}^{*}(u_{s,h}(\bm{x},\cdot))$ for the finite element solution $u_{s,h}(\bm{x},\cdot)$ using both the SPOD weights (22) and the serendipitous weights (24). These weights also enter the CBC construction used to obtain a lattice generating vector satisfying (20): specifically, the kernel interpolant is constructed over the point set $\bm{t}_{k}=\{k\,\bm{z}_{\rm gen}/n\}$, where $\bm{z}_{\rm gen}$ is obtained using the algorithm described in [6].

The kernel interpolation error is estimated by computing

$${\rm error}=\sqrt{\int_{D}\int_{[0,1]^{s}}\big(u_{s,h}(\bm{x},\bm{y})-u_{s,h,n}(\bm{x},\bm{y})\big)^{2}\,\mathrm{d}\bm{y}\,\mathrm{d}\bm{x}}\approx\sqrt{\frac{1}{Ln}\sum_{\ell=1}^{L}\sum_{k=1}^{n}\int_{D}\big(u_{s,h}(\bm{x},\bm{y}_{\ell}+\bm{t}_{k})-u_{s,h,n}(\bm{x},\bm{y}_{\ell}+\bm{t}_{k})\big)^{2}\,\mathrm{d}\bm{x}}, \tag{26}$$

where $\bm{y}_{\ell}$ for $\ell=1,\ldots,L$ denotes the sequence of Sobol points with $L=100$. Note that since the functions $u_{s,h}(\bm{x},\bm{y})$ and $u_{s,h,n}(\bm{x},\bm{y})$ are $1$-periodic with respect to $\bm{y}$, the kernel interpolant can be evaluated efficiently over the union of shifted points $\bm{y}_{\ell}+\bm{t}_{k}$ for $\ell=1,\ldots,L$ and $k=1,\ldots,n$ using FFT, see Table 1 and [20, Section 5.1].

6.1 Fixing the parameters in the weights

To implement the kernel interpolant with either SPOD or serendipitous weights, one first has to choose the parameters $c$ and $\theta$ in (25). The next step is to decide on a value of $p\in(0,1)$ that satisfies (A3). This clearly restricts $p$ to the smaller interval $p\in(1/\theta,1)$. The choice of $p$ in turn determines the parameters $\alpha$ and $\lambda$ through (23).

In the experiments we choose three different values for the decay parameter $\theta$, namely $\theta=3.6$, $2.4$ and $1.2$, ranging from the very easy to the very difficult. Correspondingly, we choose $p=\frac{1}{3.3}$, $\frac{1}{2.2}$ and $\frac{1}{1.1}$, respectively, leading to values of the smoothness parameter $\alpha=3$, $2$ and $1$, respectively. We also use different values for the parameter $c=\frac{0.2}{\sqrt{6}}$, $\frac{0.4}{\sqrt{6}}$, and $\frac{1.5}{\sqrt{6}}$, again ranging from easy to difficult. (The factor $\frac{1}{\sqrt{6}}$ has been included here to facilitate comparisons with the numerical results in [20].)
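As a quick arithmetic check of (23) for these three choices of $p$, a tiny Python sketch of our own:

```python
import math

# alpha = floor(1/p + 1/2) and lambda = p/(2 - p), as in (23).
for p in (1 / 3.3, 1 / 2.2, 1 / 1.1):
    alpha = math.floor(1 / p + 1 / 2)
    lam = p / (2 - p)
    print(f"p = {p:.4f}: alpha = {alpha}, lambda = {lam:.4f}")
# Output: alpha = 3, 2, 1, matching the values quoted above.
```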

6.2 Comparing SPOD weights with serendipitous weights

Figure 1: The kernel interpolation errors of the PDE problem (5)–(6) with $\theta=3.6$, $p=1/3.3$, $c\in\{\frac{0.2}{\sqrt{6}},\frac{1.5}{\sqrt{6}}\}$, and $s\in\{10,100\}$. The results are displayed for kernel interpolants constructed using SPOD weights (22) and serendipitous weights (24).
Figure 2: The kernel interpolation errors of the PDE problem (5)–(6) with $\theta=2.4$, $p=1/2.2$, $c\in\{\frac{0.2}{\sqrt{6}},\frac{1.5}{\sqrt{6}}\}$, and $s\in\{10,100\}$. The results are displayed for kernel interpolants constructed using SPOD weights (22) and serendipitous weights (24).
Figure 3: The kernel interpolation errors of the PDE problem (5)–(6) with $\theta=1.2$, $p=1/1.1$, $c\in\{\frac{0.2}{\sqrt{6}},\frac{0.4}{\sqrt{6}}\}$, and $s\in\{10,100\}$. The results are displayed for kernel interpolants constructed using SPOD weights (22) and serendipitous weights (24).

Here we compare the kernel interpolation errors using both the SPOD weights (22) and the serendipitous weights (24). The computed quantity in each case is the estimated $L_{2}$ error with respect to both the spatial and stochastic variables, given by (26).

The results are displayed in Figures 1, 2 and 3 for the three different $\theta$ values, ranging from the easiest to the hardest. In each case the graphs on the left are for dimension $s=10$, those on the right for $s=100$. Each figure also gives the computed errors for two different values of the parameter $c$, with the easier (i.e., smaller) value used in the upper pair, and the harder (i.e., larger) value in the lower pair. Each graph also shows (dashed line) the theoretical convergence rate (21) for the given value of $p$.

The striking fact is that the serendipitous weights perform about as well as SPOD weights for all the easier cases (all graphs in Figure 1, the upper graphs in Figures 2 and 3), while dramatically outperforming the SPOD weights for the harder cases (the lower graphs in Figures 2 and 3).

One way to assess the hardness of a particular parameter set is to inspect the values of $a_{\min}$ and $a_{\max}$ given in the legend above each graph. In particular, the hardest problem is the fourth graph in Figure 3, where the dimensionality is $s=100$, and the random field has values ranging from around $0.1$ to near $2$. For this case the SPOD weights are seen to fail completely. Yet even in this case the serendipitous weights perform superbly.

The plateau in the convergence graph for the SPOD weights in Figure 3 can be explained as follows: the SPOD weights in this case become very large with increasing dimension and, in consequence, the kernel interpolant becomes very spiky at the lattice points and near zero elsewhere. Thus we are effectively seeing just the $L_{2}$ norm of the target function $u_{s,h}$ for feasible values of $n$.

The intuition behind the serendipitous weights is that the problem of overly large weights is overcome by omitting the factorials in the SPOD weight formula (22).

Finally, it is worth emphasizing that the construction cost with serendipitous weights is considerably cheaper than with SPOD weights. Yet the quality of the kernel interpolant appears to be just as good as, or (as seen in Figures 2 and 3) dramatically better than, that of the interpolant based on SPOD weights. Putting these two aspects together, we resolved to repeat the experiments for a still larger value of $s$, well beyond the reach of SPOD weights.

6.3 1000-dimensional examples

Figure 4: The kernel interpolation errors of the PDE problem (5)–(6) with $s=1000$ and parameters $\theta=3.6$, $c=\frac{1.5}{\sqrt{6}}$, and $p=1/3.3$ (top); $\theta=2.4$, $c=\frac{1.5}{\sqrt{6}}$, and $p=1/2.2$ (middle); $\theta=1.2$, $c=\frac{0.4}{\sqrt{6}}$, and $p=1/1.1$ (bottom). The results are displayed for kernel interpolants constructed using serendipitous weights (24).

Since the serendipitous weights (24) are product weights, we are able to carry out computations in much higher dimensions than before. We consider the previous set of three $\theta$ values together with the harder value of $c$ in each case, and now set the upper limit of the series (6) to $s=1000$. The results are displayed in Figure 4.

The computational performance of the kernel interpolant using serendipitous weights is seen in Figure 4 to remain excellent even when $s=1000$. The method works well even for the most difficult experiment, illustrated at the bottom of Figure 4. While the pre-asymptotic regime is even longer than in the case $s=100$ (shown at the bottom right of Figure 3), the kernel interpolation error does not stall as it does in the case $s=100$ for SPOD weights. Thus the kernel interpolant based on serendipitous weights appears to be robust in practice.

7 Conclusions

We have introduced a new class of product weights, called the serendipitous weights, to be used in conjunction with the lattice-based kernel interpolant presented in [20]. Numerical experiments illustrate that this family of weights appears to be robust when it comes to kernel interpolation of parametric PDEs.

Numerical experiments in the paper comparing the performance with previously studied SPOD weights show that not only are the new weights cheaper and easier to work with, but also they give much better results for hard problems.

Acknowledgements

F. Y. Kuo and I. H. Sloan acknowledge support from the Australian Research Council (DP210100831). This research includes computations using the computational cluster Katana [24] supported by Research Technology Services at UNSW Sydney.

References

  • [1] Adcock, B., Brugiapaglia, S., Webster, C. G.: Sparse Polynomial Approximation of High-Dimensional Functions. Society for Industrial and Applied Mathematics (2022)
  • [2] Bartel, F., Kämmerer, L., Potts, D., Ullrich, T.: On the reconstruction of function values at subsampled quadrature points. Preprint arXiv:2208.13597 [math.NA] (2022)
  • [3] Byrenheid, G., Kämmerer, L., Ullrich, T., Volkmer, T.: Tight error bounds for rank-1 lattice sampling in spaces of hybrid mixed smoothness. Numer. Math. 136, 993–1034 (2017)
  • [4] Cohen, A., DeVore, R.: Approximation of high-dimensional parametric PDEs. Acta Numer. 24, 1–159 (2015)
  • [5] Cools, R., Kuo, F. Y., Nuyens, D., Sloan, I. H.: Lattice algorithms for multivariate approximation in periodic spaces with general weight parameters. In: Celebrating 75 Years of Mathematics of Computation (S. C. Brenner, I. Shparlinski, C.-W. Shu, and D. Szyld, eds.), Contemporary Mathematics, 754, AMS, 93–113 (2020)
  • [6] Cools, R., Kuo, F. Y., Nuyens, D., Sloan, I. H.: Fast component-by-component construction of lattice algorithms for multivariate approximation with POD and SPOD weights. Math. Comp. 90, 787–812 (2021)
  • [7] Dick, J., Gantner, R. N., Le Gia, Q. T., Schwab, Ch.: Higher order quasi-Monte Carlo integration for Bayesian PDE inversion. Comput. Math. Appl. 77, 144–172 (2019)
  • [8] Dick, J., Kuo, F. Y., Le Gia, Q. T., Nuyens, D., Schwab, Ch.: Higher order QMC Galerkin discretization for parametric operator equations. SIAM J. Numer. Anal. 52, 2676–2702 (2014)
  • [9] Dolbeault, M., Krieg, D., Ullrich, M.: A sharp upper bound for sampling numbers in $L_{2}$. Appl. Comp. Harm. Anal. 63, 113–134 (2023)
  • [10] Gantner, R. N., Herrmann, L., Schwab, Ch.: Quasi–Monte Carlo integration for affine-parametric, elliptic PDEs: local supports and product weights. SIAM J. Numer. Anal. 56, 111–135 (2018)
  • [11] Gantner, R. N., Peters, M. D.: Higher-order quasi-Monte Carlo for Bayesian shape inversion. SIAM/ASA J. Uncertain. Quantif. 6, 707–736 (2018)
  • [12] Gilbert, A. D., Graham, I. G., Kuo, F. Y., Scheichl, R., Sloan, I. H.: Analysis of quasi-Monte Carlo methods for elliptic eigenvalue problems with stochastic coefficients. Numer. Math. 142, 863–915 (2019)
  • [13] Gilbert, A. D., Scheichl, R.: Multilevel quasi-Monte Carlo for random elliptic eigenvalue problems I: regularity and error analysis. To appear in IMA J. Numer. Anal. (2023)
  • [14] Gilbert, A. D., Scheichl, R.: Multilevel quasi-Monte Carlo for random elliptic eigenvalue problems II: efficient algorithms and numerical results. To appear in IMA J. Numer. Anal. (2023)
  • [15] Guth, P. A., Kaarnioja, V., Kuo, F. Y., Schillings, C., Sloan, I. H.: A quasi-Monte Carlo method for optimal control under uncertainty. SIAM/ASA J. Uncertain. Quantif. 9, 354–383 (2021)
  • [16] Guth, P. A., Kaarnioja, V., Kuo, F. Y., Schillings, C., Sloan, I. H.: Parabolic PDE-constrained optimal control under uncertainty with entropic risk measure using quasi-Monte Carlo integration. Preprint arXiv:2208.02767 [math.NA] (2022)
  • [17] Hakula, H., Harbrecht, H., Kaarnioja, V., Kuo, F. Y., Sloan, I. H.: Uncertainty quantification for random domains using periodic random variables. Preprint arXiv:2210.17329 [math.NA] (2022)
  • [18] Harbrecht, H., Peters, M., Siebenmorgen, M.: Analysis of the domain mapping method for elliptic diffusion problems on random domains. Numer. Math. 134, 823–856 (2016)
  • [19] Herrmann, L., Keller, M., Schwab, Ch.: Quasi-Monte Carlo Bayesian estimation under Besov priors in elliptic inverse problems. Math. Comp. 90, 1831–1860 (2021)
  • [20] Kaarnioja, V., Kazashi, Y., Kuo, F. Y., Nobile, F., Sloan, I. H.: Fast approximation by periodic kernel-based lattice-point interpolation with application in uncertainty quantification. Numer. Math. 150, 33–77 (2022)
  • [21] Kaarnioja, V., Kuo, F. Y., Sloan, I. H.: Uncertainty quantification using periodic random variables. SIAM J. Numer. Anal. 58, 1068–1091 (2020)
  • [22] Kämmerer, L., Potts, D., Volkmer, T.: Approximation of multivariate periodic functions by trigonometric polynomials based on rank-1 lattice sampling. J. Complexity 31, 543–576 (2015)
  • [23] Kämmerer, L., Volkmer, T.: Approximation of multivariate periodic functions based on sampling along multiple rank-1 lattices. J. Approx. Theory 246, 1–17 (2019)
  • [24] Katana. Published online, DOI:10.26190/669X-A286 (2010)
  • [25] Krieg, D., Ullrich, M.: Function values are enough for $L_{2}$-approximation. Found. Comput. Math. 21, 1141–1151 (2021)
  • [26] Kuo, F. Y., Mo, W., Nuyens, D.: Constructing embedded lattice-based algorithms for multivariate function approximation with a composite number of points. Preprint arXiv:2209.01002 [math.NA] (2022)
  • [27] Kuo, F. Y., Nuyens, D.: Application of quasi-Monte Carlo methods to elliptic PDEs with random diffusion coefficients – a survey of analysis and implementation. Found. Comput. Math. 16, 1631–1696 (2016)
  • [28] Kuo, F. Y., Nuyens, D.: Application of quasi-Monte Carlo methods to PDEs with random coefficients – an overview and tutorial. In: A. B. Owen, P. W. Glynn (eds.), Monte Carlo and Quasi-Monte Carlo Methods 2016, pp. 53–71. Springer (2018)
  • [29] Kuo, F. Y., Schwab, Ch., Sloan, I. H.: Quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients. SIAM J. Numer. Anal. 50, 3351–3374 (2012)
  • [30] Kuo, F. Y., Sloan, I. H., Woźniakowski, H.: Lattice rules for multivariate approximation in the worst case setting. In: H. Niederreiter, D. Talay (eds.), Monte Carlo and Quasi-Monte Carlo Methods 2004, pp. 289–330. Springer (2006)
  • [31] Nagel, N., Schäfer, M., Ullrich, T.: A new upper bound for sampling numbers. Found. Comput. Math. 22, 445–468 (2022)
  • [32] Sloan, I. H., Woźniakowski, H.: Tractability of multivariate integration for weighted Korobov classes. J. Complexity 17, 697–721 (2001)
  • [33] Xiu, D., Karniadakis, G. E.: The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24, 619–644 (2002)
  • [34] Zeng, X. Y., Leung, K. T., Hickernell, F. J.: Error analysis of splines for periodic problems using lattice designs. In: H. Niederreiter, D. Talay (eds.), Monte Carlo and Quasi-Monte Carlo Methods 2004, pp. 501–514. Springer (2006)
  • [35] Zeng, X. Y., Kritzer, P., Hickernell, F. J.: Spline methods using integration lattices and digital nets. Constr. Approx. 30, 529–555 (2009)