
Lattice-based kernel approximation and
serendipitous weights for parametric PDEs in
very high dimensions

Vesa Kaarnioja (Department of Mathematics and Computer Science, Free University of Berlin, Arnimallee 6, 14195 Berlin, Germany. Email: [email protected])
Frances Y. Kuo (School of Mathematics and Statistics, UNSW Sydney, Sydney NSW 2052, Australia. Email: [email protected])
Ian H. Sloan (School of Mathematics and Statistics, UNSW Sydney, Sydney NSW 2052, Australia. Email: [email protected])
(August 2023)
Abstract

We describe a fast method for solving elliptic partial differential equations (PDEs) with uncertain coefficients using kernel interpolation at a lattice point set. By representing the input random field of the system using the model proposed by Kaarnioja, Kuo, and Sloan (SIAM J. Numer. Anal. 2020), in which a countable number of independent random variables enter the random field as periodic functions, it was shown by Kaarnioja, Kazashi, Kuo, Nobile, and Sloan (Numer. Math. 2022) that the lattice-based kernel interpolant can be constructed for the PDE solution as a function of the stochastic variables in a highly efficient manner using fast Fourier transform (FFT). In this work, we discuss the connection between our model and the popular “affine and uniform model” studied widely in the literature of uncertainty quantification for PDEs with uncertain coefficients. We also propose a new class of product weights entering the construction of the kernel interpolant, which dramatically improve the computational performance of the kernel interpolant for PDE problems with uncertain coefficients, and allow us to tackle function approximation problems up to very high dimensionalities. Numerical experiments are presented to showcase the performance of the new weights.

1 Introduction

As in the major survey [4], we consider parametric elliptic partial differential equations (PDEs) of the form

$$-\nabla\cdot(\widetilde{a}(\bm{x},\bm{z})\nabla\widetilde{u}(\bm{x},\bm{z}))=q(\bm{x}),\quad \bm{x}\in D\subset\mathbb{R}^{d},\quad \bm{z}\in[-1,1]^{s}, \tag{1}$$

subject to the homogeneous Dirichlet boundary condition $\widetilde{u}(\bm{x},\bm{z})=0$ for $\bm{x}\in\partial D$, where $D\subset\mathbb{R}^{d}$ is a bounded Lipschitz domain, with $d$ typically $2$ or $3$, and $\widetilde{a}$ is an uncertain input field given in terms of parameters $\bm{z}=(z_{1},z_{2},\ldots,z_{s})\in[-1,1]^{s}$ by

$$\widetilde{a}(\bm{x},\bm{z}):=a_{0}(\bm{x})+\sum_{j=1}^{s}z_{j}\,\psi_{j}(\bm{x}),\quad \bm{x}\in D,\quad \bm{z}\in[-1,1]^{s}. \tag{2}$$

Here the parameters $z_{j}$ represent independent random variables distributed on the interval $[-1,1]$ according to a given probability distribution with density $\rho$. The essential feature giving rise to major difficulties is that $s$ may be very large.

We assume that the univariate density $\rho$ has the Chebyshev or arcsine form

$$\rho(z):=\frac{1}{\pi\sqrt{1-z^{2}}},\quad z\in[-1,1]. \tag{3}$$

This is one of only two specific probability densities considered in the recent monograph [1], the other being the constant density $\rho(z)=\frac{1}{2}$. In the method of generalized polynomial chaos (GPC), see [33], the density (3) is associated with Chebyshev polynomials of the first kind as the univariate basis functions. It might be argued that in many applications the choice between these two densities is a matter of taste rather than conviction.

In Section 2 we set $\bm{z}=\sin(2\pi\bm{y})$ component-wise, and so recast the problem (1)–(2) with the density (3) as one in which the dependence on a new stochastic variable $\bm{y}$ is periodic, and the solution becomes

$$u(\bm{x},\bm{y}):=\widetilde{u}(\bm{x},\sin(2\pi\bm{y})),\quad \bm{x}\in D,\quad \bm{y}\in[0,1]^{s}=:U_{s}. \tag{4}$$

More precisely, we show that $u(\bm{x},\bm{y})$ satisfies

$$-\nabla\cdot(a(\bm{x},\bm{y})\nabla u(\bm{x},\bm{y}))=q(\bm{x}),\quad \bm{x}\in D\subset\mathbb{R}^{d},\quad \bm{y}\in U_{s}, \tag{5}$$

and that (2) together with the probability density (3) can be replaced by

$$a(\bm{x},\bm{y}):=\widetilde{a}(\bm{x},\sin(2\pi\bm{y}))=a_{0}(\bm{x})+\sum_{j=1}^{s}\sin(2\pi y_{j})\,\psi_{j}(\bm{x}),\quad \bm{x}\in D,\quad \bm{y}\in U_{s}, \tag{6}$$

with $\bm{y}=(y_{1},y_{2},\ldots,y_{s})$, where the parameters $y_{j}$ represent i.i.d. random variables uniformly distributed on $[0,1]$. The equivalence of (2) subject to the density (3) on the one hand, with (6) subject to the uniform density on the other, is meant in the sense that both fields have exactly the same joint probability distribution in the domain $D$; for a precise statement see Theorem 2 and Corollary 3 below.

In Section 3 we describe the lattice-based kernel interpolant of [20] and discuss its properties, error bounds, and the cost of construction and evaluation. The essence of the method is that the dependence on the parameter $\bm{y}$ is approximated by a linear combination of periodic kernels, each with one leg fixed at a point $\bm{t}_{k}$, $k=1,\ldots,n$, of a carefully chosen $s$-dimensional rank-$1$ lattice, with coefficients fixed by interpolation. The kernel is the reproducing kernel of a weighted $s$-variate Hilbert space $H$ of dominating mixed smoothness $\alpha\in\mathbb{N}$. (The parameter labelled $\alpha$ in [20] is here replaced by $2\alpha$, to make $\alpha$ correspond to the order of mixed derivatives; see the norm (9) below.)

In Section 4 we summarize the results from the paper [20] on applying the lattice-based kernel interpolant to the PDE problem (5)–(6). The main result is that, provided the fluctuation coefficients $\psi_{j}$ in (6) decay suitably, the $L_{2}$ error (with respect to both $\bm{x}$ and $\bm{y}$) of the kernel interpolant to the PDE solution is of the order

$${\mathcal{O}}(n^{-\alpha/2+\delta}),\quad\delta>0, \tag{7}$$

where the implied constant depends on $\delta$ but is independent of $s$. The convergence rate $n^{-\alpha/2}$ is known (see [3]) to be the best that can be achieved in the sense of worst case $L_{2}$ error for any approximation that (as in [20] and here) uses information only at the points of a rank-$1$ lattice.

The present paper improves upon the method as presented in [20] in two different ways: first by making the method much more efficient, and second by making it much more accurate in difficult cases. The combined effect is to greatly increase the range of dimensions $s$ that are realistically achievable. We demonstrate this by carrying out explicit nontrivial calculations with $s=1000$, compared with $s=100$ in [20].

Both benefits are achieved in the present paper by the introduction of a new kind of “weights”. As mentioned earlier, our approximation makes explicit use of the reproducing kernel of a certain weighted Hilbert space $H$. The definition of that Hilbert space involves positive numbers $\gamma_{\mathfrak{u}}$ called weights, see (9) below. The weights naturally occur also in the kernel, see (10). The best performing weights found in [20] were of the so-called “SPOD” form (see (22) below). For SPOD weights the cost of evaluating the kernel interpolant is high, because the cost of each kernel evaluation is high.

In Section 5 we introduce novel weights, which are “product” weights rather than SPOD weights, allowing the kernel to be evaluated at a cost of order $s$. We still have a rigorous error bound of the form (7). The theoretical downside is that the implied constant is no longer independent of $s$. We refer to these weights as “serendipitous”, the word “serendipity” meaning “happy discovery by accident”.

In Section 6 we give numerical results, which show that with the lattice-based kernel interpolant from [20] and these weights, problems with dimension $s$ of a thousand or more can easily be studied. Not only are they cheaper and easier to use than SPOD weights, but in difficult problems they also lead empirically (and to our surprise) to much smaller errors, while producing errors similar to SPOD weights for easier problems.

A different kind of product weight was developed in [20] by adhering to the requirement that the error bound be independent of dimension, but those weights were found to have limited applicability, and did not show the remarkable performance reported here.

Many other methods have been proposed for $L_{2}$ approximation in the multivariate setting. The review article [4] and the monograph [1] take the approximating function to be a multivariate polynomial; as a result a major part of their analysis is inevitably concerned with sparsification of the basis set, since otherwise the “curse of dimensionality” would preclude large values of $s$. Recently other methods have been proposed [2, 3, 22, 23], some of which are optimal with respect to order of convergence, in the sense of producing error bounds of order

$${\mathcal{O}}(n^{-\alpha})\quad\text{or}\quad{\mathcal{O}}(n^{-\alpha}(\log n)^{\beta}) \tag{8}$$

for some $\beta$, a rate with respect to the exponent of $n$ that is the same as for approximation numbers, see [25, 31, 9]. Obviously (8) displays a better convergence rate than (7), but it has yet to be demonstrated that any of these methods has the potential for solving in practice the very high-dimensional problems seen in the numerical calculations of this paper.

The application of lattice point sets together with kernel interpolation has gained a lot of attention in recent years. The paper [34] appears to have been the first to consider this approach; later, the paper [35] obtained dimension-independent error estimates in weighted spaces using product weights.

The use of lattice points for approximation has been facilitated by the development of fast component-by-component (CBC) constructions for lattices under different assumptions on the form of the weights, see [30, 5, 6]. This enables the generation of tailored lattices for large-scale high-dimensional approximation problems such as those arising in uncertainty quantification.

This paper is organized as follows. In Section 2, we describe the connection between the so-called “affine model” and the “periodic model” introduced in [21]. The lattice-based kernel interpolant of [20] is summarized in Section 3. The application to parametric elliptic PDEs with uncertain coefficients is discussed in Section 4. The new class of serendipitous weights for the construction of the kernel interpolant for parametric PDE problems is introduced in Section 5. Numerical experiments assessing the performance of those weights are presented in Section 6. The paper ends with some conclusions.

2 Transforming to the periodic setting

The equivalence of the affine probability model given by (2) and (3) with the periodic formulation in (6) is expressed in Corollary 3 below. It states that the probability distributions in the two cases are identical. A more general result is stated in Theorem 2. It rests in turn on the following lemma, stating that if $Y$ is a random variable uniformly distributed on $[0,1]$, then $\sin(2\pi Y)$ has exactly the same probability distribution as a random variable distributed on $[-1,1]$ with the arcsine probability density (3). While the proof is elementary, it has one slightly unusual feature: the change of variable from $y\in[0,1]$ to $z=\sin(2\pi y)$ is not monotone.

Lemma 1.

Let $Z$ be a random variable distributed on $[-1,1]$ with density $\rho$ given by (3), and let $Y$ be a random variable uniformly distributed on $[0,1]$. Then for all $t\in\mathbb{R}$ we have

$$\mathbb{P}[Z\leq t]=\mathbb{P}[\sin(2\pi Y)\leq t].$$
Proof.

We first write the left-hand side explicitly as an integral:

$$\mathbb{P}[Z\leq t]=\int_{-1}^{1}\mathrm{ind}(t-z)\,\frac{1}{\pi\sqrt{1-z^{2}}}\,\mathrm{d}z,$$

where $\mathrm{ind}$ is the indicator function, taking the value $1$ if the argument is non-negative, and the value $0$ if it is negative. The next step is to make a change of variable $z=\sin(2\pi y)$, but note that this is not permissible for $y$ on the whole interval $[0,1]$ because $\sin(2\pi y)$ is not monotone. Noting that $\sin(2\pi y)$ is $1$-periodic, it is sufficient to consider $y$ in the two intervals $[-1/4,1/4]$ and $[1/4,3/4]$ separately (the point being that $\sin(2\pi y)$ is monotone in each subinterval, and that together the two subintervals make a full period). For the first subinterval we have

$$z=\sin(2\pi y),\quad y=\frac{\arcsin(z)}{2\pi}\in[-1/4,1/4],$$

so that we obtain

$$\mathbb{P}[Z\leq t]=\int_{-1/4}^{1/4}\mathrm{ind}(t-\sin(2\pi y))\,\frac{2\pi\cos(2\pi y)}{\pi\cos(2\pi y)}\,\mathrm{d}y=\int_{-1/4}^{1/4}\mathrm{ind}(t-\sin(2\pi y))\,2\,\mathrm{d}y.$$

Similarly, for the second interval $[1/4,3/4]$ we have

$$z=\sin(2\pi y),\quad y=\frac{1}{2}-\frac{\arcsin(z)}{2\pi}\in[1/4,3/4],$$

so that

$$\mathbb{P}[Z\leq t]=\int_{1/4}^{3/4}\mathrm{ind}(t-\sin(2\pi y))\,2\,\mathrm{d}y.$$

Adding the two results together, and dividing by 2, we obtain

$$\mathbb{P}[Z\leq t]=\int_{-1/4}^{3/4}\mathrm{ind}(t-\sin(2\pi y))\,\mathrm{d}y=\int_{0}^{1}\mathrm{ind}(t-\sin(2\pi y))\,\mathrm{d}y=\mathbb{P}[\sin(2\pi Y)\leq t],$$

with the second equality following from the periodicity of $\sin(2\pi y)$. This completes the proof of the lemma. ∎
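Lemma 1 is easy to verify numerically. The following short Python sketch (our own illustration, not part of [20] or [21]) compares the empirical CDF of $\sin(2\pi Y)$ for uniform $Y$ against samples drawn directly from the arcsine density (3) by inverse transform sampling; the arcsine CDF is $F(z)=\tfrac12+\arcsin(z)/\pi$, so $F^{-1}(u)=-\cos(\pi u)$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**6

# Periodic model: Z' = sin(2*pi*Y) with Y ~ U[0,1].
y = rng.random(N)
z_periodic = np.sin(2 * np.pi * y)

# Affine model: sample Z from the arcsine density (3) by inverse
# transform sampling, F^{-1}(u) = sin(pi*(u - 1/2)) = -cos(pi*u).
u = rng.random(N)
z_arcsine = -np.cos(np.pi * u)

# Lemma 1 asserts that both empirical CDFs agree with the exact one.
for t in [-0.9, -0.5, 0.0, 0.5, 0.9]:
    exact = 0.5 + np.arcsin(t) / np.pi
    print(f"t={t:+.1f}: periodic {np.mean(z_periodic <= t):.4f}, "
          f"arcsine {np.mean(z_arcsine <= t):.4f}, exact {exact:.4f}")
```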

Theorem 2.

Let $\widetilde{Q}(\bm{Z})=\widetilde{Q}(Z_{1},Z_{2},\ldots,Z_{s})$ be a real-valued random variable that depends on $s$ i.i.d. real-valued random variables $Z_{1},Z_{2},\ldots,Z_{s}$, where each $Z_{j}$ is distributed on $[-1,1]$ with density $\rho$ given by (3). Let $Q(\bm{Y})=Q(Y_{1},Y_{2},\ldots,Y_{s})$ be another random variable defined by

$$Q(\bm{Y})=\widetilde{Q}(\sin(2\pi\bm{Y})),$$

where the $Y_{j}$ are i.i.d. uniformly distributed random variables on $[0,1]$. Then for all $q\in\mathbb{R}$ we have

$$\mathbb{P}[Q(\cdot)\leq q]=\mathbb{P}[\widetilde{Q}(\cdot)\leq q],$$

where the probability on the left is with respect to uniform density, while that on the right is with respect to a product of the univariate densities (3).

Proof.

This follows immediately by applying Lemma 1 to each random variable, together with Fubini’s theorem. ∎

It follows from the theorem that the input parametric field given by (2) subject to the density (3) can with equal validity be expressed as (6), with each $y_{j}$ uniformly distributed on $[0,1]$. We state this as a corollary:

Corollary 3.

Let $\bm{Z}$ and $\bm{Y}$ be as in Theorem 2, and let $\widetilde{u}(\bm{x},\bm{z})$ and $u(\bm{x},\bm{y})$ be as in (1) and (4). The random variable $\widetilde{u}(\bm{x},\bm{Z})$ for $\bm{x}\in D$ has the same joint probability distribution with respect to the product density $\prod_{j=1}^{s}\rho(z_{j})$ as $u(\bm{x},\bm{Y})$ has with respect to the uniform product density.

This is the periodic formulation introduced by [21], except for the trivial difference of a different normalising factor: in [21] the sum was multiplied by $1/\sqrt{6}$ to ensure that the variance of the field was the same as that of a uniformly distributed affine variable defined on $[-1/2,1/2]$. In effect we are here redefining the fluctuation coefficients $\psi_{j}$.

3 The kernel interpolant

We assume that $f:[0,1]^{s}\to\mathbb{R}$ belongs to the weighted periodic unanchored Sobolev space $H$ of dominating mixed smoothness of order $\alpha\in\mathbb{N}$, with norm

$$\|f\|_{H}^{2}:=\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{(2\pi)^{2\alpha|\mathfrak{u}|}\gamma_{\mathfrak{u}}}\int_{[0,1]^{|\mathfrak{u}|}}\bigg|\int_{[0,1]^{s-|\mathfrak{u}|}}\bigg(\prod_{j\in\mathfrak{u}}\frac{\partial^{\alpha}}{\partial y_{j}^{\alpha}}\bigg)f(\bm{y})\,\mathrm{d}\bm{y}_{-\mathfrak{u}}\bigg|^{2}\,\mathrm{d}\bm{y}_{\mathfrak{u}}, \tag{9}$$

where $\{1:s\}:=\{1,2,\ldots,s\}$, $\bm{y}_{\mathfrak{u}}:=(y_{j})_{j\in\mathfrak{u}}$ and $\bm{y}_{-\mathfrak{u}}:=(y_{j})_{j\in\{1:s\}\setminus\mathfrak{u}}$. The inner product $\langle\cdot,\cdot\rangle_{H}$ is defined in the obvious way. Here we have replaced the traditional notation of the smoothness parameter $\alpha$ by $2\alpha$, so that our $\alpha$ corresponds exactly to the order of derivatives in the norm (9). The space $H$ is a special case of the weighted Korobov space, which has a real smoothness parameter $\alpha$ characterizing the rate of decay of Fourier coefficients, see e.g. [32, 30, 5, 6].

The parameters $\gamma_{\mathfrak{u}}$ for $\mathfrak{u}\subseteq\{1:s\}$ in the norm (9) are weights that are used to moderate the relative importance between subsets of variables in the norm, with $\gamma_{\emptyset}:=1$. There are $2^{s}$ weights in full generality, far too many to prescribe one by one. In practice we must work with weights of restricted forms. The following forms of weights have been considered in the literature (a small computational sketch follows the list):

• Product weights [32]: $\gamma_{\mathfrak{u}}=\prod_{j\in\mathfrak{u}}\gamma_{j}$, specified by one sequence $(\gamma_{j})_{j\geq 1}$.

• POD weights (product and order dependent) [29]: $\gamma_{\mathfrak{u}}=\Gamma_{|\mathfrak{u}|}\prod_{j\in\mathfrak{u}}\gamma_{j}$, specified by two sequences $(\Gamma_{\ell})_{\ell\geq 0}$ and $(\gamma_{j})_{j\geq 1}$.

• SPOD weights (smoothness-driven product and order dependent) [8]: $\gamma_{\mathfrak{u}}=\sum_{\bm{\nu}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}\Gamma_{|\bm{\nu}_{\mathfrak{u}}|}\prod_{j\in\mathfrak{u}}\gamma_{j,\nu_{j}}$, specified by the sequences $(\Gamma_{\ell})_{\ell\geq 0}$ and $(\gamma_{j,\nu})_{j\geq 1}$ for each $\nu=1,\ldots,\alpha$, where $|\bm{\nu}_{\mathfrak{u}}|:=\sum_{j\in\mathfrak{u}}\nu_{j}$.
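To make the three forms concrete, the following Python sketch (our own, with purely illustrative example sequences $\gamma_{j}=1/j^{2}$ and $\Gamma_{\ell}=\ell!$) evaluates $\gamma_{\mathfrak{u}}$ for a given subset $\mathfrak{u}$ under each form:

```python
import math
from itertools import product

def product_weight(u, gamma):
    # Product weights: gamma_u = prod_{j in u} gamma_j.
    return math.prod(gamma[j] for j in u)

def pod_weight(u, Gamma, gamma):
    # POD weights: gamma_u = Gamma_{|u|} * prod_{j in u} gamma_j.
    return Gamma[len(u)] * math.prod(gamma[j] for j in u)

def spod_weight(u, Gamma, gamma2, alpha):
    # SPOD weights: sum over nu_u in {1,...,alpha}^{|u|} of
    # Gamma_{|nu_u|} * prod_{j in u} gamma_{j, nu_j}.
    total = 0.0
    for nu in product(range(1, alpha + 1), repeat=len(u)):
        total += Gamma[sum(nu)] * math.prod(
            gamma2[(j, nu_j)] for j, nu_j in zip(u, nu))
    return total

# Illustrative example sequences (not the weights used in the paper).
s, alpha = 6, 2
gamma = {j: 1.0 / j**2 for j in range(1, s + 1)}
Gamma = {l: math.factorial(l) for l in range(0, alpha * s + 1)}
gamma2 = {(j, nu): 1.0 / j**(2 * nu) for j in range(1, s + 1)
          for nu in range(1, alpha + 1)}

u = (1, 3, 4)
print(product_weight(u, gamma), pod_weight(u, Gamma, gamma),
      spod_weight(u, Gamma, gamma2, alpha))
```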

The space $H$ is a reproducing kernel Hilbert space (RKHS) with the kernel

$$K(\bm{y},\bm{y}'):=\sum_{\mathfrak{u}\subseteq\{1:s\}}\gamma_{\mathfrak{u}}\prod_{j\in\mathfrak{u}}\eta_{\alpha}(y_{j},y_{j}'),\quad \bm{y},\bm{y}'\in[0,1]^{s}, \tag{10}$$

where

$$\eta_{\alpha}(y,y'):=\frac{(2\pi)^{2\alpha}}{(-1)^{\alpha+1}(2\alpha)!}\,B_{2\alpha}(\{y-y'\}),\quad y,y'\in[0,1],$$

with $B_{\ell}(y)$ denoting the Bernoulli polynomial of degree $\ell$, and the braces $\{\cdot\}$ denoting the fractional part of each component of the argument. As concrete examples, we have $B_{2}(y)=y^{2}-y+1/6$ and $B_{4}(y)=y^{4}-2y^{3}+y^{2}-1/30$. The kernel $K$ is easily seen to satisfy the two defining properties of a reproducing kernel, namely that $K(\cdot,\bm{y})\in H$ for all $\bm{y}\in[0,1]^{s}$ and $\langle f,K(\cdot,\bm{y})\rangle_{H}=f(\bm{y})$ for all $f\in H$ and all $\bm{y}\in[0,1]^{s}$.
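For product weights the sum over subsets in (10) factorizes as $K(\bm{y},\bm{y}')=\prod_{j=1}^{s}\big(1+\gamma_{j}\,\eta_{\alpha}(y_{j},y_{j}')\big)$, which is what makes kernel evaluation $\mathcal{O}(s)$ per point pair. A minimal Python sketch of our own (assuming product weights; the Bernoulli polynomials are hard-coded for $\alpha=1,2$):

```python
import numpy as np
from math import factorial, pi

def bernoulli_poly(degree, x):
    # Bernoulli polynomials needed for alpha = 1, 2 (degrees 2 and 4).
    if degree == 2:
        return x**2 - x + 1.0 / 6.0
    if degree == 4:
        return x**4 - 2 * x**3 + x**2 - 1.0 / 30.0
    raise NotImplementedError

def eta(alpha, y, yp):
    # eta_alpha(y, y') = (2*pi)^{2a} / ((-1)^{a+1} (2a)!) * B_{2a}({y - y'}).
    frac = np.mod(y - yp, 1.0)   # fractional part, componentwise
    return ((2 * pi)**(2 * alpha)
            / ((-1)**(alpha + 1) * factorial(2 * alpha))
            * bernoulli_poly(2 * alpha, frac))

def kernel_product_weights(alpha, gamma, y, yp):
    # For product weights, (10) factorizes:
    # K(y, y') = prod_j (1 + gamma_j * eta_alpha(y_j, y'_j)).
    return np.prod(1.0 + gamma * eta(alpha, y, yp))

# Illustrative usage with made-up weights gamma_j = 1/j^2.
s, alpha = 4, 1
gamma = 1.0 / np.arange(1, s + 1)**2
print(kernel_product_weights(alpha, gamma,
                             np.array([0.1, 0.2, 0.3, 0.4]), np.zeros(s)))
```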

For a given $f\in H$, we consider an approximation of the form

$$A_{n}^{*}(f)(\bm{y}):=f_{n}(\bm{y}):=\sum_{k=1}^{n}a_{k}\,K(\bm{t}_{k},\bm{y}),\quad \bm{y}\in[0,1]^{s}, \tag{11}$$

where $a_{1},\ldots,a_{n}\in\mathbb{R}$ and the nodes

$$\bm{t}_{k}:=\bigg\{\frac{k\,\bm{z}_{\rm gen}}{n}\bigg\}\quad\text{for}\ k=1,\ldots,n \tag{12}$$

are the points of a rank-$1$ lattice, with $\bm{z}_{\rm gen}\in\{1,\ldots,n-1\}^{s}$ being the lattice generating vector. Since the kernel $K$ is periodic, the braces $\{\cdot\}$ indicating the fractional part in the definition of $\bm{t}_{k}$ can be omitted when we evaluate the kernel.

The kernel interpolant $f_{n}\in H$ is a function of the form (11) that interpolates $f$ at the lattice points,

$$f_{n}(\bm{t}_{k'})=f(\bm{t}_{k'})\quad\text{for all}\ k'=1,\ldots,n,$$

with the coefficients $a_{k}$ therefore satisfying the linear system

$$\sum_{k=1}^{n}\mathcal{K}_{k,k'}\,a_{k}=f(\bm{t}_{k'})\quad\text{for all}\ k'=1,\ldots,n, \tag{13}$$

where

$$\mathcal{K}_{k,k'}=\mathcal{K}_{k',k}:=K(\bm{t}_{k},\bm{t}_{k'})=K\bigg(\frac{(k-k')\,\bm{z}_{\rm gen}}{n},\bm{0}\bigg)\quad\text{for}\ k,k'=1,\ldots,n. \tag{14}$$

The solution of the square linear system (13) exists and is unique because $K$, by virtue of being a reproducing kernel, is positive definite.

Moreover, because the nodes form a lattice, the matrix $\mathcal{K}$ is circulant, and thus the coefficient vector $\bm{a}:=[a_{1},\ldots,a_{n}]^{\rm T}$ in (13) can be obtained in $\mathcal{O}(n\log n)$ time using the fast Fourier transform (FFT) via

$$\bm{a}={\tt ifft}({\tt fft}(\bm{f})\ ./\ {\tt fft}(\mathcal{K}_{:,1})),$$

where $./$ indicates component-wise division of two vectors, $\bm{f}:=[f(\bm{t}_{1}),\ldots,f(\bm{t}_{n})]^{\rm T}$, and $\mathcal{K}_{:,1}$ denotes the first column of the matrix $\mathcal{K}$. This is a major advantage of using lattice points for the construction of the kernel interpolant.
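A minimal Python sketch of this construction step, assuming product weights and reusing the hypothetical helpers eta and kernel_product_weights from the sketch above; the generating vector z_gen below is a placeholder (in practice it comes from a CBC construction):

```python
import numpy as np

def kernel_interpolant_coeffs(f, z_gen, n, alpha, gamma):
    # Rank-1 lattice points t_k = {k * z_gen / n}, k = 1, ..., n.
    k = np.arange(1, n + 1)[:, None]
    t = np.mod(k * z_gen[None, :] / n, 1.0)

    # First column of the circulant kernel matrix:
    # c_i = K(i * z_gen / n, 0) for i = 0, ..., n - 1.
    shifts = np.mod((k - 1) * z_gen[None, :] / n, 1.0)
    first_col = np.array(
        [kernel_product_weights(alpha, gamma, row, np.zeros_like(row))
         for row in shifts])

    # Circulant solve via FFT: a = ifft(fft(f_vals) ./ fft(K[:, 1])).
    f_vals = np.array([f(row) for row in t])
    a = np.fft.ifft(np.fft.fft(f_vals) / np.fft.fft(first_col)).real
    return t, a

# Illustrative usage: interpolate a simple periodic test function.
s, n, alpha = 4, 257, 1
gamma = 1.0 / np.arange(1, s + 1)**2
z_gen = np.array([1, 75, 103, 60])      # placeholder generating vector
f = lambda y: np.prod(1 + np.sin(2 * np.pi * y))
t, a = kernel_interpolant_coeffs(f, z_gen, n, alpha, gamma)
```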

Important properties regarding the kernel interpolant were summarized or proved in [20]:

• The kernel interpolant is the minimal norm interpolant in the sense that it has the smallest $H$-norm among all interpolants using the same function values of $f$, see [20, Theorem 2.1].

• The kernel interpolant is optimal in the sense that it has the smallest worst case error (measured in any norm $\|\cdot\|_{W}$ with $H\subset W$) among all linear or non-linear algorithms that use the same function values of $f$, see [20, Theorem 2.2]. Recall that the worst case error, measured in the $W$-norm, of an algorithm $A$ in $H$ is defined by
$$e^{\rm wor}(A;H;W):=\sup_{f\in H,\,\|f\|_{H}\leq 1}\|f-A(f)\|_{W}.$$
• Any (linear or non-linear) algorithm $A_{n}$ (with $A_{n}(0)=0$) that uses function values of $f$ only at the lattice points has the lower bound
$$e^{\rm wor}(A_{n};H;L_{p})\geq C\,n^{-\alpha/2},\quad p\in[1,\infty],$$
with an explicit constant $C>0$, see [20, Theorem 3.1] and [3]. However, it is known that there exist other algorithms based on function values that can achieve an $L_{2}$ approximation error upper bound of order $n^{-\alpha}$, see [25, 31, 9]. Hence, our lattice-based kernel interpolant can at best attain half of the optimal convergence rate for the $L_{2}$ approximation error.

• A component-by-component (CBC) construction from [5, 6] can be used to obtain a lattice generating vector such that our lattice-based kernel interpolant achieves
$$e^{\rm wor}(A_{n}^{*};H;L_{2})\leq\frac{\kappa}{n^{1/(4\lambda)}}\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)^{1/(2\lambda)} \tag{15}$$
for all $\lambda\in(1/(2\alpha),1]$, with $\kappa:=\sqrt{2}\,(2.5+2^{4\alpha\lambda+1})^{1/(4\lambda)}$ and $\zeta(x):=\sum_{k=1}^{\infty}k^{-x}$ denoting the Riemann zeta function for $x>1$. Hence, by taking $\lambda\to 1/(2\alpha)$ we conclude that
$$e^{\rm wor}(A_{n}^{*};H;L_{2})={\mathcal{O}}(n^{-\alpha/2+\delta}),\quad\delta>0,$$
where the implied constant depends on $\delta$ but is independent of $s$ if the weights $\gamma_{\mathfrak{u}}$ are such that the sum in (15) can be bounded independently of $s$, see [20, Theorem 3.2]. Consequently, for any $f\in H$ we have
$$\|f-f_{n}\|_{L_{2}}\leq e^{\rm wor}(A_{n}^{*};H;L_{2})\,\|f\|_{H}={\mathcal{O}}(n^{-\alpha/2+\delta}).$$

The bound (15) was proved initially in [5] only for prime $n$, but has since been generalized to composite $n$ and extensible lattice sequences in [26].

Although the theoretical error bound (15) holds for general weights $\gamma_{\mathfrak{u}}$, practical implementation of the CBC construction must take into account the specific form of the weights for computational efficiency. Fast (FFT-based) CBC constructions were developed in [6] for product weights, POD weights and SPOD weights, with varying computational cost. Evaluating the kernel interpolant (11) also requires evaluations of the kernel (10), again at a cost that depends on the form of the weights, see [20, Section 5.2]. Furthermore, if we are interested in evaluating the kernel interpolant $f_{n}(\bm{y}_{\ell})$ at $L$ arbitrary points $\bm{y}_{\ell}$, $\ell=1,\ldots,L$, then, due to the matrix (14) being circulant, we can evaluate the kernel interpolant at all the shifted points $f_{n}(\bm{y}_{\ell}+\bm{t}_{k'})$, $k'=1,\ldots,n$, with only an extra logarithmic factor in the cost, see [20, Section 5.1]. We summarize these cost considerations in Table 1 (taken from [20, Table 1]). Clearly, product weights are the most efficient form of weights in all considerations.

Table 1: Cost breakdown for the kernel interpolant $f_{n}$ based on $n$ lattice points $\bm{t}_{k}$ in $s$ dimensions, evaluated at $L$ arbitrary points $\bm{y}_{\ell}$. Here $X$ is the cost of one evaluation of $f$.

| Operation \ Weights | Product | POD | SPOD |
|---|---|---|---|
| Fast CBC construction for $\bm{z}_{\rm gen}$ | $s\,n\log(n)$ | $s\,n\log(n)+s^{2}\log(s)\,n$ | $s\,n\log(n)+s^{3}\alpha^{2}\,n$ |
| Compute $K(\bm{t}_{k},\bm{0})$ for all $k$ | $s\,n$ | $s^{2}\,n$ | $s^{2}\alpha^{2}\,n$ |
| Evaluate $f(\bm{t}_{k})$ for all $k$ | $X\,n$ | $X\,n$ | $X\,n$ |
| Linear solve for all coefficients $a_{k}$ | $n\log(n)$ | $n\log(n)$ | $n\log(n)$ |
| Compute $K(\bm{t}_{k},\bm{y}_{\ell})$ for all $k,\ell$ | $s\,n\,L$ | $s^{2}\,n\,L$ | $s^{2}\alpha^{2}\,n\,L$ |
| Assemble $f_{n}(\bm{y}_{\ell})$ for all $\ell$ | $n\,L$ | $n\,L$ | $n\,L$ |
| OR assemble $f_{n}(\bm{y}_{\ell}+\bm{t}_{k})$ for all $\ell,k$ | $n\log(n)\,L$ | $n\log(n)\,L$ | $n\log(n)\,L$ |

4 Kernel interpolant for parametric elliptic PDEs

In the literature of “tailored” quasi-Monte Carlo (QMC) rules for parametric PDEs, it is customary to analyze the parametric regularity of the PDE solutions. This information can be used to construct QMC rules satisfying rigorous error bounds. Many of these studies have been carried out under the assumption of the so-called “affine and uniform setting” as in (2). Examples include the source problem for elliptic PDEs with random coefficients [29, 8, 27, 28, 10], spectral eigenvalue problems under uncertainty [12, 13, 14], Bayesian inverse problems [11, 7, 19], domain uncertainty quantification [18], PDE-constrained optimization under uncertainty [15, 16], and many others. When the input random field is modified to involve a composition with a periodic function as in (6), the regularity bound naturally changes, as we have encountered in [21, 20, 17].

We return now to the PDE problem (5) together with the periodic random field (6). We remark that in many studies the input random field is modeled as an infinite series and the effect of dimension truncation is analyzed. We will take this point of view below, as we did in [20]. Thus we now have a countably infinite parameter sequence $\bm{y}\in U:=[0,1]^{\mathbb{N}}$, and $U_{s}$ in (4), (5) and (6) is now replaced by $U$. We will abuse the notation from the introduction and use $a(\cdot,\bm{y})$ and $u(\cdot,\bm{y})$ to denote the corresponding random field and PDE solution, while we write $a_{s}(\cdot,\bm{y}):=a(\cdot,(y_{1},\ldots,y_{s},0,0,\ldots))$ and $u_{s}(\cdot,\bm{y}):=u(\cdot,(y_{1},\ldots,y_{s},0,0,\ldots))$ to denote the dimension-truncated counterparts.

Since we have two sets of variables $\bm{x}\in D$ and $\bm{y}\in U$, from now on we will make the domains $D$ and $U$ explicit in our notation. Let $H_{0}^{1}(D)$ denote the subspace of $H^{1}(D)$ with zero trace on $\partial D$. We equip $H_{0}^{1}(D)$ with the norm $\|v\|_{H_{0}^{1}(D)}:=\|\nabla v\|_{L_{2}(D)}$. Let $H^{-1}(D)$ denote the topological dual space of $H_{0}^{1}(D)$, and let $\langle\cdot,\cdot\rangle_{H^{-1}(D),H_{0}^{1}(D)}$ denote the duality pairing between $H^{-1}(D)$ and $H_{0}^{1}(D)$. We have the parametric weak formulation: for $\bm{y}\in U$, find $u(\cdot,\bm{y})\in H_{0}^{1}(D)$ such that

$$\int_{D}a(\bm{x},\bm{y})\nabla u(\bm{x},\bm{y})\cdot\nabla v(\bm{x})\,\mathrm{d}\bm{x}=\langle q,v\rangle_{H^{-1}(D),H_{0}^{1}(D)}\quad\text{for all}\ v\in H_{0}^{1}(D), \tag{16}$$

where $q\in H^{-1}(D)$. Following the problem formulation in [20], we make these standing assumptions:

(A1) $a_{0}\in L_{\infty}(D)$ and $\psi_{j}\in L_{\infty}(D)$ for all $j\geq 1$, and $\sum_{j\geq 1}\|\psi_{j}\|_{L_{\infty}(D)}<\infty$;

(A2) there exist $a_{\min}$ and $a_{\max}$ such that $0<a_{\min}\leq a(\bm{x},\bm{y})\leq a_{\max}<\infty$ for all $\bm{x}\in D$ and $\bm{y}\in U$;

(A3) $\sum_{j\geq 1}\|\psi_{j}\|_{L_{\infty}(D)}^{p}<\infty$ for some $p\in(0,1)$;

(A4) $a_{0}\in W^{1,\infty}(D)$ and $\sum_{j\geq 1}\|\psi_{j}\|_{W^{1,\infty}(D)}<\infty$, where $\|v\|_{W^{1,\infty}(D)}:=\max\{\|v\|_{L_{\infty}(D)},\|\nabla v\|_{L_{\infty}(D)}\}$;

(A5) $\|\psi_{1}\|_{L_{\infty}(D)}\geq\|\psi_{2}\|_{L_{\infty}(D)}\geq\cdots$;

(A6) the spatial domain $D\subset\mathbb{R}^{d}$, $d\in\{1,2,3\}$, is a convex and bounded polyhedron.

In practical computations, one typically needs to discretize the PDE (5) using, e.g., a finite element method. While the weak solution of the PDE problem is in general a Sobolev function and may not be pointwise well-defined with respect to the spatial variable $\bm{x}\in D$, the finite element solution is pointwise well-defined with respect to $\bm{x}\in D$, which we may exploit in the construction of the kernel interpolant. To this end, let $\{V_{h}\}_{h}$ be a family of conforming finite element subspaces $V_{h}\subset H_{0}^{1}(D)$, indexed by the mesh size $h>0$ and spanned by continuous, piecewise linear finite element basis functions. Furthermore, we assume that the triangulation corresponding to each $V_{h}$ is obtained from an initial, regular triangulation of $D$ by recursive, uniform partition of simplices. For $\bm{y}\in U$, the finite element solution $u_{h}(\cdot,\bm{y})\in V_{h}$ satisfies

$$\int_{D}a(\bm{x},\bm{y})\nabla u_{h}(\bm{x},\bm{y})\cdot\nabla v_{h}(\bm{x})\,\mathrm{d}\bm{x}=\langle q,v_{h}\rangle_{H^{-1}(D),H_{0}^{1}(D)}\quad\text{for all}\ v_{h}\in V_{h}.$$

Let $\bm{\nu}\in\mathbb{N}_{0}^{\infty}$ denote a multi-index with finite order $|\bm{\nu}|:=\sum_{j\geq 1}\nu_{j}<\infty$, and let $\partial^{\bm{\nu}}:=\prod_{j\geq 1}(\partial/\partial y_{j})^{\nu_{j}}$ denote the mixed partial derivatives with respect to $\bm{y}$. The standing assumptions (A1)–(A6) together with the Lax–Milgram lemma ensure that the weak formulation (16) has a unique solution such that for all $\bm{y}\in U$ (see [21] for a proof),

$$\|\partial^{\bm{\nu}}u(\cdot,\bm{y})\|_{H_{0}^{1}(D)}\leq\frac{\|q\|_{H^{-1}(D)}}{a_{\min}}\,(2\pi)^{|\bm{\nu}|}\sum_{\bm{m}\leq\bm{\nu}}|\bm{m}|!\,\prod_{j\geq 1}\big(b_{j}^{m_{j}}\,S(\nu_{j},m_{j})\big), \tag{17}$$

where (with no factor $1/\sqrt{6}$ here, compared to [21])

$$b_{j}:=\frac{\|\psi_{j}\|_{L_{\infty}(D)}}{a_{\min}}\quad\text{for all}\ j\geq 1, \tag{18}$$

and $S(\nu,m)$ denotes the Stirling number of the second kind for integers $\nu\geq m\geq 0$, under the convention $S(\nu,0)=\delta_{\nu,0}$.

Note that the parametric regularity bound (17) also holds when $u(\cdot,\bm{y})$ is replaced by a conforming finite element approximation $u_{h}(\cdot,\bm{y})$. The same is true for the dimension-truncated solution $u_{s}(\cdot,\bm{y})$ and its corresponding finite element approximation $u_{s,h}(\cdot,\bm{y})$.

Let $H(U_{s})=H$ denote the RKHS of functions with respect to $\bm{y}\in U_{s}$ from Section 3. In the framework of Section 3, for every $\bm{x}\in D$, we wish to approximate the dimension-truncated finite element solution $f:=u_{s,h}(\bm{x},\cdot)$ at $\bm{x}$ as a function of $\bm{y}$, and we define

$$f_{n}:=u_{s,h,n}(\bm{x},\cdot):=A_{n}^{*}(u_{s,h}(\bm{x},\cdot))\in H(U_{s})$$

to be the corresponding kernel interpolant. Then we are interested in the joint $L_{2}$ error

$$\|u-u_{s,h,n}\|_{L_{2}(D\times U)}:=\sqrt{\int_{D}\int_{U}\big(u(\bm{x},\bm{y})-u_{s,h,n}(\bm{x},\bm{y})\big)^{2}\,\mathrm{d}\bm{y}\,\mathrm{d}\bm{x}},$$

where we may interchange the order of integration using Fubini’s theorem. Using the triangle inequality, we split this error into three parts, with $C>0$ in each case denoting a generic constant independent of $s$, $h$ and $n$:

1. The dimension truncation error satisfies, see [20, Theorem 4.1],
$$\|u-u_{s}\|_{L_{2}(D\times U)}\leq C\,\|q\|_{H^{-1}(D)}\,s^{-(1/p-1/2)},\quad\text{with $p$ as in (A3)}. \tag{19}$$

2. The finite element error satisfies, see [20, Theorem 4.3],
$$\|u_{s}-u_{s,h}\|_{L_{2}(D\times U)}\leq C\,\|q\|_{H^{-1+t}(D)}\,h^{1+t}\quad\text{as}\ h\to 0,\quad t\in[0,1].$$

3. The kernel interpolation error satisfies, see [20, Theorem 4.4],
$$\|u_{s,h}-u_{s,h,n}\|_{L_{2}(D\times U)}\leq C\,\|q\|_{H^{-1}(D)}\,n^{-1/(4\lambda)}\,C_{s}(\lambda) \tag{20}$$
for all $\alpha\in\mathbb{N}$ and $\lambda\in(\frac{1}{2\alpha},1]$, where
$$[C_{s}(\lambda)]^{2\lambda}:=\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\max(|\mathfrak{u}|,1)\,\gamma_{\mathfrak{u}}^{\lambda}\,[2\zeta(2\alpha\lambda)]^{|\mathfrak{u}|}\bigg)\bigg(\sum_{\mathfrak{u}\subseteq\{1:s\}}\frac{1}{\gamma_{\mathfrak{u}}}\bigg(\sum_{\bm{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}|\bm{m}_{\mathfrak{u}}|!\prod_{j\in\mathfrak{u}}\big(b_{j}^{m_{j}}S(\alpha,m_{j})\big)\bigg)^{2}\bigg)^{\lambda}.$$

Specifically, the bound (20) was obtained by writing

$$\|u_{s,h}-u_{s,h,n}\|_{L_{2}(D\times U)}^{2}=\int_{D}\|u_{s,h}(\bm{x},\cdot)-A_{n}^{*}(u_{s,h}(\bm{x},\cdot))\|_{L_{2}(U_{s})}^{2}\,\mathrm{d}\bm{x}\leq\big[e^{\rm wor}(A_{n}^{*};H(U_{s});L_{2}(U_{s}))\big]^{2}\int_{D}\|u_{s,h}(\bm{x},\cdot)\|_{H(U_{s})}^{2}\,\mathrm{d}\bm{x}.$$

The worst case error can be bounded by (15), while the integral of the squared $H(U_{s})$-norm can be bounded by a sum over $\mathfrak{u}\subseteq\{1:s\}$ involving $\|\partial^{\bm{\nu}}u_{s,h}(\cdot,\bm{y})\|_{H_{0}^{1}(D)}^{2}$, where each $\nu_{j}$ equals $\alpha$ for $j\in\mathfrak{u}$ and $0$ otherwise. The latter $H_{0}^{1}(D)$-norm can be bounded by (17), leading ultimately to the constant $C_{s}(\lambda)$ in (20).

The difficulty of the parametric PDE problem is largely determined by the summability exponent $p$ in (A3). We see in (19) that the smaller $p$ is, the faster the dimension truncation error decays. Naturally, the kernel interpolation error (20) should also be linked with the summability exponent $p$. In this application, the parameter $\alpha$ of the RKHS is actually a free parameter for us to choose, and so are the weights $\gamma_{\mathfrak{u}}$. This is more than just a theoretical exercise: to be able to implement the kernel interpolant in practice, we must specify $\alpha$ and the weights $\gamma_{\mathfrak{u}}$, since they appear in the formula for the kernel (10). The paper [20] proposed a number of choices, all with the aim of optimizing the convergence rate while keeping the constant $C_{s}(\lambda)$ bounded independently of $s$. The best convergence rate obtained in [20] was

$$\|u_{s,h}-u_{s,h,n}\|_{L_{2}(D\times U)}\leq C\,\|q\|_{H^{-1}(D)}\,n^{-r},\quad\text{with}\quad r=\frac{1}{2p}-\frac{1}{4}, \tag{21}$$

and this was achieved by a choice of SPOD weights. More precisely:

• We can choose weights (of a very complicated form) to minimize $C_{s}(\lambda)$, and they achieve (21), see [20, Theorem 4.5].

• We can choose SPOD weights to mimic the previous weights, and they also achieve (21), see [20, Theorem 4.5]. These SPOD weights are given explicitly by
$$\gamma_{\mathfrak{u}}:=\sum_{\bm{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}(|\bm{m}_{\mathfrak{u}}|!)^{\frac{2}{1+\lambda}}\prod_{j\in\mathfrak{u}}\bigg(\frac{b_{j}^{m_{j}}S(\alpha,m_{j})}{\sqrt{2\mathrm{e}^{1/\mathrm{e}}\zeta(2\alpha\lambda)}}\bigg)^{\frac{2}{1+\lambda}}, \tag{22}$$
with
$$\alpha:=\bigg\lfloor\frac{1}{p}+\frac{1}{2}\bigg\rfloor\quad\text{and}\quad\lambda:=\frac{p}{2-p}. \tag{23}$$
• If $p\in\bigcup_{k=1}^{\infty}(\frac{2}{2k+1},\frac{1}{k})$ in (A3), then we can choose POD weights to achieve (21), see [20, Theorem 4.6].

• We can choose product weights to achieve (21) with a reduced rate $r$, around one half of $\frac{1}{2p}-\frac{1}{4}$, see [20, Theorem 4.7].

Hence, SPOD weights and POD weights can achieve theoretically better convergence rates than product weights, but they are much more costly, as seen in Table 1. The paper [20] reported comprehensive numerical experiments with the different choices of weights for PDE problems of varying difficulty, and concluded that SPOD weights indeed mostly perform better than POD weights and product weights. However, the greater computational cost of SPOD weights is definitely real. We therefore set out to seek better weights.

5 Seeking better weights

In implementations of the lattice-based kernel interpolant of [20], the weights $\gamma_{\mathfrak{u}}$ play a dual role. On the one hand, they appear in the component-by-component (CBC) construction for determining the lattice generating vector $\bm{z}_{\rm gen}$, which in turn determines the interpolation points through (12). On the other hand, they appear in the formula (10) for the kernel. In both roles only special forms of weights are feasible, given that there are $2^{s}$ subsets $\mathfrak{u}\subseteq\{1:s\}$. The SPOD weights described above are feasible (and were used in the computations in [20]), but encounter two computational bottlenecks:

1. The CBC construction used to obtain the generating vector $\bm{z}_{\rm gen}$ using SPOD weights has cost $\mathcal{O}(s\,n\log(n)+s^{3}\alpha^{2}\,n)$, see Table 1 and [5, 6].

2. Evaluating the kernel interpolant at $L$ arbitrary points using SPOD weights has cost $\mathcal{O}(s^{2}\alpha^{2}\,n\,L)$, see Table 1 and [20, Section 5.2].

While the cost of obtaining the generating vector could be regarded as an offline step that only needs to be performed once for a given set of problem parameters, the cost of evaluating the kernel interpolant makes its online use unattractive for high-dimensional problems.

We propose the following new formula for product weights to be used in both roles for the construction of the kernel interpolant:

$$\gamma_{\mathfrak{u}}:=\sum_{\bm{m}_{\mathfrak{u}}\in\{1:\alpha\}^{|\mathfrak{u}|}}\prod_{j\in\mathfrak{u}}\bigg(\frac{b_{j}^{m_{j}}S(\alpha,m_{j})}{\sqrt{2\mathrm{e}^{1/\mathrm{e}}\zeta(2\alpha\lambda)}}\bigg)^{\frac{2}{1+\lambda}}=\prod_{j\in\mathfrak{u}}\bigg(\sum_{m=1}^{\alpha}\frac{b_{j}^{m}S(\alpha,m)}{\sqrt{2\mathrm{e}^{1/\mathrm{e}}\zeta(2\alpha\lambda)}}\bigg)^{\frac{2}{1+\lambda}}, \tag{24}$$

where $\alpha$ and $\lambda$ are given by (23). The weights (24) have been obtained from the SPOD weights (22) simply by leaving out the factorial factor $(|\bm{m}_{\mathfrak{u}}|!)^{2/(1+\lambda)}$. We call these serendipitous weights.
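Since (24) is of product form, only the per-dimension factors need to be formed. A small Python sketch of our own illustrating this (the Stirling numbers $S(\alpha,m)$ are computed by their standard recurrence, and $\zeta$ is evaluated by truncating the series):

```python
import math

def stirling2(n, k):
    # Stirling numbers of the second kind via the recurrence
    # S(n, k) = k*S(n-1, k) + S(n-1, k-1), with S(0, 0) = 1.
    if n == k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def zeta(x, terms=10**6):
    # Truncated series for the Riemann zeta function, x > 1.
    return sum(k**(-x) for k in range(1, terms + 1))

def serendipitous_factors(b, alpha, lam):
    # Per-dimension factors of the product weights (24);
    # gamma_u is then simply the product of the factors over j in u.
    denom = math.sqrt(2 * math.e**(1 / math.e) * zeta(2 * alpha * lam))
    return [
        (sum(bj**m * stirling2(alpha, m) for m in range(1, alpha + 1))
         / denom) ** (2 / (1 + lam))
        for bj in b
    ]

# Illustrative usage with the experimental setup of Section 6:
# b_j = c*j^{-theta}/a_min, and alpha, lambda from (23) with p = 1/2.2.
p, theta, c = 1 / 2.2, 2.4, 0.4 / math.sqrt(6)
alpha = math.floor(1 / p + 1 / 2)
lam = p / (2 - p)
a_min = 1 - c * zeta(theta)
b = [c * j**(-theta) / a_min for j in range(1, 11)]
print(serendipitous_factors(b, alpha, lam))
```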

The performance of these weights will be compared against the SPOD weights (22) in a series of numerical experiments in Section 6. In addition to the smaller observed errors, the serendipitous weights (because they are product weights) have obvious computational advantages:

1. The CBC construction used to obtain the generating vector $\bm{z}_{\rm gen}$ using product weights has cost $\mathcal{O}(s\,n\log(n))$, see Table 1 and [5, 6].

2. Evaluating the kernel interpolant at $L$ arbitrary points using product weights has cost $\mathcal{O}(s\,n\,L)$, see Table 1 and [20, Section 5.2].

As we shall see, the serendipitous weights (24) allow us to tackle very high-dimensional approximation problems successfully. Moreover, we still have the rigorous error bound given in (20). We no longer have a guarantee of a constant independent of $s$, but the observed performance will be seen to be excellent.

6 Numerical experiments

We consider the parametric PDE problem (1) converted to the periodic form (5), with the periodic diffusion coefficient (6). The domain is $D=(0,1)^{2}$, and the source term is $q(\bm{x})=x_{2}$. For the mean field we set $a_{0}(\bm{x})=1$, and for the stochastic fluctuations we take the functions

$$\psi_{j}(\bm{x}):=c\,j^{-\theta}\sin(j\pi x_{1})\sin(j\pi x_{2}),\quad \bm{x}=(x_{1},x_{2})\in D,\ j\geq 1,$$

where $c>0$ is a constant determining the magnitude of the fluctuations, and $\theta>1$ is the decay rate of the stochastic fluctuations. The sequence $(b_{j})_{j\geq 1}$ defined by (18) becomes

$$b_{j}:=\frac{c\,j^{-\theta}}{a_{\min}},\quad j\geq 1, \tag{25}$$

where for simplicity we take

$$a_{\min}:=1-c\,\zeta(\theta)\quad\text{and}\quad a_{\max}:=1+c\,\zeta(\theta),$$

and enforce $c<\frac{1}{\zeta(\theta)}$ to ensure the uniform ellipticity condition.

For each fixed $\bm{y}\in[0,1]^{s}$ we solve the PDE using a piecewise linear finite element method with mesh size $h=2^{-5}$. We construct a kernel interpolant $u_{s,h,n}(\bm{x},\cdot)=A_{n}^{*}(u_{s,h}(\bm{x},\cdot))$ for the finite element solution $u_{s,h}(\bm{x},\cdot)$ using both the SPOD weights (22) and the serendipitous weights (24). These weights also enter the CBC construction used to obtain a lattice generating vector satisfying (20): specifically, the kernel interpolant is constructed over the point set $\bm{t}_{k}=\{k\,\bm{z}_{\rm gen}/n\}$, where $\bm{z}_{\rm gen}$ is obtained using the algorithm described in [6].

The kernel interpolation error is estimated by computing

$${\rm error}=\sqrt{\int_{D}\int_{[0,1]^{s}}\big(u_{s,h}(\bm{x},\bm{y})-u_{s,h,n}(\bm{x},\bm{y})\big)^{2}\,\mathrm{d}\bm{y}\,\mathrm{d}\bm{x}}\approx\sqrt{\frac{1}{Ln}\sum_{\ell=1}^{L}\sum_{k=1}^{n}\int_{D}\big(u_{s,h}(\bm{x},\bm{y}_{\ell}+\bm{t}_{k})-u_{s,h,n}(\bm{x},\bm{y}_{\ell}+\bm{t}_{k})\big)^{2}\,\mathrm{d}\bm{x}}, \tag{26}$$

where $\bm{y}_{\ell}$ for $\ell=1,\ldots,L$ denotes the sequence of Sobol points with $L=100$. Note that since the functions $u_{s,h}(\bm{x},\bm{y})$ and $u_{s,h,n}(\bm{x},\bm{y})$ are $1$-periodic with respect to $\bm{y}$, the kernel interpolant can be evaluated efficiently over the union of shifted points $\bm{y}_{\ell}+\bm{t}_{k}$ for $\ell=1,\ldots,L$ and $k=1,\ldots,n$ using FFT, see Table 1 and [20, Section 5.1].

6.1 Fixing the parameters in the weights

To implement the kernel interpolant with either SPOD or serendipitous weights, one first has to choose the parameters $c$ and $\theta$ in (25). The next step is to decide on a value of $p\in(0,1)$ that satisfies (A3). This clearly restricts $p$ to the smaller interval $p\in(1/\theta,1)$. The choice of $p$ in turn determines the parameters $\alpha$ and $\lambda$ through (23).

In the experiments we choose three different values for the decay parameter $\theta$, namely $\theta=3.6$, $2.4$ and $1.2$, ranging from the very easy to the very difficult. Correspondingly, we choose $p=\frac{1}{3.3}$, $\frac{1}{2.2}$ and $\frac{1}{1.1}$, respectively, leading to values of the smoothness parameter $\alpha=3$, $2$ and $1$, respectively. We also use different values for the parameter $c=\frac{0.2}{\sqrt{6}}$, $\frac{0.4}{\sqrt{6}}$, and $\frac{1.5}{\sqrt{6}}$, again ranging from easy to difficult. (The factor $\frac{1}{\sqrt{6}}$ has been included here to facilitate comparisons with the numerical results in [20].)
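As a quick arithmetic check of (23) for these three choices of $p$, a tiny Python sketch of our own:

```python
import math

# alpha = floor(1/p + 1/2) and lambda = p/(2 - p), as in (23).
for p in (1 / 3.3, 1 / 2.2, 1 / 1.1):
    alpha = math.floor(1 / p + 1 / 2)
    lam = p / (2 - p)
    print(f"p = {p:.4f}: alpha = {alpha}, lambda = {lam:.4f}")
# Output: alpha = 3, 2, 1, matching the values quoted above.
```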

6.2 Comparing SPOD weights with serendipitous weights

Figure 1: The kernel interpolation errors of the PDE problem (5)–(6) with $\theta=3.6$, $p=1/3.3$, $c\in\{\frac{0.2}{\sqrt{6}},\frac{1.5}{\sqrt{6}}\}$, and $s\in\{10,100\}$. The results are displayed for kernel interpolants constructed using SPOD weights (22) and serendipitous weights (24).
Figure 2: The kernel interpolation errors of the PDE problem (5)–(6) with $\theta=2.4$, $p=1/2.2$, $c\in\{\frac{0.2}{\sqrt{6}},\frac{1.5}{\sqrt{6}}\}$, and $s\in\{10,100\}$. The results are displayed for kernel interpolants constructed using SPOD weights (22) and serendipitous weights (24).
Figure 3: The kernel interpolation errors of the PDE problem (5)–(6) with $\theta=1.2$, $p=1/1.1$, $c\in\{\frac{0.2}{\sqrt{6}},\frac{0.4}{\sqrt{6}}\}$, and $s\in\{10,100\}$. The results are displayed for kernel interpolants constructed using SPOD weights (22) and serendipitous weights (24).

Here we compare the kernel interpolation errors using both the SPOD weights (22) and the serendipitous weights (24). The computed quantity in each case is the estimated $L_{2}$ error with respect to both the spatial and stochastic variables, given by (26).

The results are displayed in Figures 1, 2 and 3 for the three different $\theta$ values, ranging from the easiest to the hardest. In each case the graphs on the left are for dimension $s=10$, those on the right for $s=100$. Each figure also gives the computed errors for two different values of the parameter $c$, with the easier (i.e., smaller) value used in the upper pair, and the harder (i.e., larger) value in the lower pair. Each graph also shows (dashed line) the theoretical convergence rate (21) for the given value of $p$.

The striking fact is that the serendipitous weights perform about as well as SPOD weights for all the easier cases (all graphs in Figure 1, the upper graphs in Figures 2 and 3), while dramatically outperforming the SPOD weights for the harder cases (the lower graphs in Figures 2 and 3).

One way to assess the hardness of a particular parameter set is to inspect the values of $a_{\min}$ and $a_{\max}$ given in the legend above each graph. In particular, the hardest problem is the fourth graph in Figure 3, where the dimensionality is $s=100$, and the random field has values ranging from around $0.1$ to near $2$. For this case the SPOD weights are seen to fail completely. Yet even in this case the serendipitous weights perform superbly.

The plateau in the convergence graph for the SPOD weights in Figure 3 can be explained as follows: the SPOD weights in this case become very large with increasing dimension and, in consequence, the kernel interpolant becomes very spiky at the lattice points and near zero elsewhere. Thus we are effectively seeing just the $L_{2}$ norm of the target function $u_{s,h}$ for feasible values of $n$.

The intuition behind the serendipitous weights is that the problem of overly large weights is overcome by omitting the factorials in the SPOD weight formula (22).

Finally, it is worth emphasizing that the construction cost with serendipitous weights is considerably cheaper than with SPOD weights. Yet the quality of the kernel interpolant appears to be just as good as, or (as seen in Figures 2 and 3) dramatically better than, that of the interpolant based on SPOD weights. Putting these two aspects together, we resolved to repeat the experiments for a still larger value of $s$, well beyond the reach of SPOD weights.

6.3 1000-dimensional examples

Figure 4: The kernel interpolation errors of the PDE problem (5)–(6) with $s=1000$ and parameters $\theta=3.6$, $c=\frac{1.5}{\sqrt{6}}$, and $p=1/3.3$ (top); $\theta=2.4$, $c=\frac{1.5}{\sqrt{6}}$, and $p=1/2.2$ (middle); $\theta=1.2$, $c=\frac{0.4}{\sqrt{6}}$, and $p=1/1.1$ (bottom). The results are displayed for kernel interpolants constructed using serendipitous weights (24).

Since the serendipitous weights (24) are product weights, we are able to carry out computations in much higher dimensions than before. We consider the previous set of three $\theta$ values together with the harder value of $c$ in each case, and now set the upper limit of the series (6) to $s=1000$. The results are displayed in Figure 4.

The computational performance of the kernel interpolant using serendipitous weights is seen in Figure 4 to remain excellent even when $s=1000$. The method works well even for the most difficult experiment, illustrated at the bottom of Figure 4. While the pre-asymptotic regime is even longer than in the case $s=100$ (shown at the bottom right of Figure 3), the kernel interpolation error does not stall as it does in the case $s=100$ for SPOD weights. Thus the kernel interpolant based on serendipitous weights appears to be robust in practice.

7 Conclusions

We have introduced a new class of product weights, called the serendipitous weights, to be used in conjunction with the lattice-based kernel interpolant presented in [20]. Numerical experiments illustrate that this family of weights appears to be robust when it comes to kernel interpolation of parametric PDEs.

Numerical experiments in the paper comparing the performance with previously studied SPOD weights show that not only are the new weights cheaper and easier to work with, but also they give much better results for hard problems.

Acknowledgements

F. Y. Kuo and I. H. Sloan acknowledge support from the Australian Research Council (DP210100831). This research includes computations using the computational cluster Katana [24] supported by Research Technology Services at UNSW Sydney.

References

  • [1] Adcock, B., Brugiapaglia, S., Webster, C. G.: Sparse Polynomial Approximation of High-Dimensional Functions. Society for Industrial and Applied Mathematics (2022)
  • [2] Bartel, F., Kämmerer, L., Potts, D., Ullrich, T.: On the reconstruction of function values at subsampled quadrature points. Preprint arXiv:2208.13597 [math.NA] (2022)
  • [3] Byrenheid, G., Kämmerer, L., Ullrich, T., Volkmer, T.: Tight error bounds for rank-1 lattice sampling in spaces of hybrid mixed smoothness. Numer. Math. 136, 993–1034 (2017)
  • [4] Cohen, A., DeVore, R.: Approximation of high-dimensional parametric PDEs. Acta Numer. 24, 1–159 (2015)
  • [5] Cools, R., Kuo, F. Y., Nuyens, D., Sloan, I. H.: Lattice algorithms for multivariate approximation in periodic spaces with general weight parameters. In: Celebrating 75 Years of Mathematics of Computation (S. C. Brenner, I. Shparlinski, C.-W. Shu, and D. Szyld, eds.), Contemporary Mathematics, 754, AMS, 93–113 (2020)
  • [6] Cools, R., Kuo, F. Y., Nuyens, D., Sloan, I. H.: Fast component-by-component construction of lattice algorithms for multivariate approximation with POD and SPOD weights. Math. Comp. 90, 787–812 (2021)
  • [7] Dick, J., Gantner, R. N., Le Gia, Q. T., Schwab, Ch.: Higher order quasi-Monte Carlo integration for Bayesian PDE inversion. Comput. Math. Appl. 77, 144–172 (2019)
  • [8] Dick, J., Kuo, F. Y., Le Gia, Q. T., Nuyens, D., Schwab, Ch.: Higher order QMC Galerkin discretization for parametric operator equations. SIAM J. Numer. Anal. 52, 2676–2702 (2014)
  • [9] Dolbeault, M., Krieg, D., Ullrich, M.: A sharp upper bound for sampling numbers in $L_{2}$. Appl. Comp. Harm. Anal. 63, 113–134 (2023)
  • [10] Gantner, R. N., Herrmann, L., Schwab, Ch.: Quasi–Monte Carlo integration for affine-parametric, elliptic PDEs: local supports and product weights. SIAM J. Numer. Anal. 56, 111–135 (2018)
  • [11] Gantner, R. N., Peters, M. D.: Higher-order quasi-Monte Carlo for Bayesian shape inversion. SIAM/ASA J. Uncertain. Quantif. 6, 707–736 (2018)
  • [12] Gilbert, A. D., Graham, I. G., Kuo, F. Y., Scheichl, R., Sloan, I. H.: Analysis of quasi-Monte Carlo methods for elliptic eigenvalue problems with stochastic coefficients. Numer. Math. 142, 863–915 (2019)
  • [13] Gilbert, A. D., Scheichl, R.: Multilevel quasi-Monte Carlo for random elliptic eigenvalue problems I: regularity and error analysis. To appear in IMA J. Numer. Anal. (2023)
  • [14] Gilbert, A. D., Scheichl, R.: Multilevel quasi-Monte Carlo for random elliptic eigenvalue problems II: efficient algorithms and numerical results. To appear in IMA J. Numer. Anal. (2023)
  • [15] Guth, P. A., Kaarnioja, V., Kuo, F. Y., Schillings, C., Sloan, I. H.: A quasi-Monte Carlo method for optimal control under uncertainty. SIAM/ASA J. Uncertain. Quantif. 9, 354–383 (2021)
  • [16] Guth, P. A., Kaarnioja, V., Kuo, F. Y., Schillings, C., Sloan, I. H.: Parabolic PDE-constrained optimal control under uncertainty with entropic risk measure using quasi-Monte Carlo integration. Preprint arXiv:2208.02767 [math.NA] (2022)
  • [17] Hakula, H., Harbrecht, H., Kaarnioja, V., Kuo, F. Y., Sloan, I. H.: Uncertainty quantification for random domains using periodic random variables. Preprint arXiv:2210.17329 [math.NA] (2022)
  • [18] Harbrecht, H., Peters, M., Siebenmorgen, M.: Analysis of the domain mapping method for elliptic diffusion problems on random domains. Numer. Math. 134, 823–856 (2016)
  • [19] Herrmann, L., Keller, M., Schwab, Ch.: Quasi-Monte Carlo Bayesian estimation under Besov priors in elliptic inverse problems. Math. Comp. 90, 1831–1860 (2021)
  • [20] Kaarnioja, V., Kazashi, Y., Kuo, F. Y., Nobile, F., Sloan, I. H.: Fast approximation by periodic kernel-based lattice-point interpolation with application in uncertainty quantification. Numer. Math. 150, 33–77 (2022)
  • [21] Kaarnioja, V., Kuo, F. Y., Sloan, I. H.: Uncertainty quantification using periodic random variables. SIAM J. Numer. Anal. 58, 1068–1091 (2020)
  • [22] Kämmerer, L., Potts, D., Volkmer, T.: Approximation of multivariate periodic functions by trigonometric polynomials based on rank-1 lattice sampling. J. Complexity 31, 543–576 (2015)
  • [23] Kämmerer, L., Volkmer, T.: Approximation of multivariate periodic functions based on sampling along multiple rank-1 lattices. J. Approx. Theory 246, 1–17 (2019)
  • [24] Katana. Published online, DOI:10.26190/669X-A286 (2010)
  • [25] Krieg, D., Ullrich, M.: Function values are enough for $L_{2}$-approximation. Found. Comput. Math. 21, 1141–1151 (2021)
  • [26] Kuo, F. Y., Mo, W., Nuyens, D.: Constructing embedded lattice-based algorithms for multivariate function approximation with a composite number of points. Preprint arXiv:2209.01002 [math.NA] (2022)
  • [27] Kuo, F. Y., Nuyens, D.: Application of quasi-Monte Carlo methods to elliptic PDEs with random diffusion coefficients – a survey of analysis and implementation. Found. Comput. Math. 16, 1631–1696 (2016)
  • [28] Kuo, F. Y., Nuyens, D.: Application of quasi-Monte Carlo methods to PDEs with random coefficients – an overview and tutorial. In: A. B. Owen, P. W. Glynn (eds.), Monte Carlo and Quasi-Monte Carlo Methods 2016, pp. 53–71. Springer (2018)
  • [29] Kuo, F. Y., Schwab, Ch., Sloan, I. H.: Quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients. SIAM J. Numer. Anal. 50, 3351–3374 (2012)
  • [30] Kuo, F. Y., Sloan, I. H., Woźniakowski, H.: Lattice rules for multivariate approximation in the worst case setting. In: H. Niederreiter, D. Talay (eds.), Monte Carlo and Quasi-Monte Carlo Methods 2004, pp. 289–330. Springer (2006)
  • [31] Nagel, N., Schäfer, M., Ullrich, T.: A new upper bound for sampling numbers. Found. Comput. Math. 22, 445–468 (2022)
  • [32] Sloan, I. H., Woźniakowski, H.: Tractability of multivariate integration for weighted Korobov classes. J. Complexity 17, 697–721 (2001)
  • [33] Xiu, D., Karniadakis, G. E.: The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24, 619–644 (2002)
  • [34] Zeng, X. Y., Leung, K. T., Hickernell, F. J.: Error analysis of splines for periodic problems using lattice designs. In: H. Niederreiter, D. Talay (eds.), Monte Carlo and Quasi-Monte Carlo Methods 2004, pp. 501–514. Springer (2006)
  • [35] Zeng, X. Y., Kritzer, P., Hickernell, F. J.: Spline methods using integration lattices and digital nets. Constr. Approx. 30, 529–555 (2009)