This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Spectral convergence of probability densities for forward problems in uncertainty quantification

Amir Sagiv Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, USA [email protected]
Abstract.

The estimation of probability density functions (PDF) using approximate maps is a fundamental building block in computational probability. We consider forward problems in uncertainty quantification: the inputs or the parameters of a deterministic model are random with a known distribution. The scalar quantity of interest is a fixed measurable function of the parameters, and is therefore a random variable as a well. Often, the quantity of interest map is not explicitly known and difficult to compute, and so the computational problem is to design a good approximation (surrogate model) of the quantity of interest. For the goal of approximating the moments of the quantity of interest, there is a well developed body of research. One widely popular approach is generalized Polynomial Chaos (gPC) and its many variants, which approximate moments with spectral accuracy. However, it is not clear whether the quantity of interest can be approximated with spectral accuracy as well. This result does not follow directly from spectrally accurate moment estimation. In this paper, we prove convergence rates for PDFs using collocation and Galerkin gPC methods with Legendre polynomials in all dimensions. In particular, exponential convergence of the densities is guaranteed for analytic quantities of interest. In one dimension, we provide more refined results with stronger convergence rates, as well as an alternative proof strategy based on optimal-transport techniques.

Key words and phrases:
Density, Polynomial Chaos, Spectral, Uncertainty Quantification, Wasserstein, Legendre, pushforward.
2010 Mathematics Subject Classification:
33C47, 62G07, 65C50, 65D15

1. Introduction

In the modeling of physical phenomena, taking uncertainties into consideration is often done by introducing random parameters into deterministic models. If the quantity of interest (model output) is a measurable function of the inputs, then it is a random as well. In this problem, known as the forward problems in uncertainty quantification (UQ), the computational task is to compute the induced probability measure of the quantity of interest given a prescribed probability measure on the inputs and parameters. The type of relevant information varies between applications, but in many cases the full probability density function (PDF) is of interest; See e.g., [1, 10, 11, 17, 19, 27, 29, 33, 30, 42]. This paper will rigorously establish that there exist methods which approximate these PDFs with spectral accuracy.

One avenue to approximate the PDF in forward UQ problems is by nonparametric statistical approaches, e.g., a discretized cumulative distribution function, the histogram method, or kernel density estimators (KDE) [41, 48].

To obtain better accuracy, one needs to utilize the underlying structure of forward UQ problems. That is, denote the input Borel probability space by (Ω,ϱ)(\Omega,\varrho) and the Borel measurable model-output map by f:Ωf:\Omega\to\mathbb{R}. The measure of interest μ=fϱ\mu=~{}f_{\sharp}\varrho is the pushforward of ϱ\varrho by ff, i.e., μ(A)=ϱ(f1(A))\mu(A)=\varrho(f^{-1}(A)) for every Borel set AA\subseteq~{}\mathbb{R}. To take this structure into account, we turn to approximation-based methods (surrogate models). In this class of methods, ff is approximated by a simpler function gg for which the pushforward gϱg_{\sharp}\varrho is easier to compute [20, 50]. For smooth quantities of interest with low- or moderate-dimensional domains, surrogate models have become the standard [20, 37], most prominently the polynomial-based methods known as generalized polynomial chaos (gPC)  [21, 28, 36, 50] and their many variants, see e.g., [25, 39, 45]. The success of the gPC methods relies on their spectral L2(Ω)L^{2}(\Omega) convergence, when the original map is sufficiently smooth and specific domain and boundary conditions are met [51, 52]. By spectral accuracy we mean the following: let gng_{n} be the gPC polynomial of order nn (see Sec. 2). Then fgnL2\|f-g_{n}\|_{L^{2}} decays polynomially in nn and the order of decay improves with the order of regularity of ff. Crucially, the error decays exponentially in nn if ff is analytic. L2(Ω)L^{2}(\Omega) convergence implies convergence of the moments of νn:=(gn)ϱ\nu_{n}:\,=(g_{n})_{\sharp}\varrho to those of μ\mu. Practically, this means that for analytic functions of interest ff, the moments converge exponentially fast in the order of approximation.

Question.

If gng_{n} is the gPC approximation of ff, does pνnp_{\nu_{n}} converge to pμp_{\mu} with spectral accuracy?

That a sequence of functions gng_{n} converges to ff in L2L^{2} does not guarantee that the resulting PDFs pνnp_{\nu_{n}} converge to pμp_{\mu} in any Lq()L^{q}(\mathbb{R}) norm, see counter-example in [15]. Despite its practical relevance, the problem of density estimation using surrogate models has so far received little theoretical treatment, see one notable example in [6]. We previously showed that L2L^{2} convergence of gng_{n} to ff guarantees convergence of νn\nu_{n} to μ\mu in the Wasserstein-pp metric, see details in [32] and in Sec. 5. Convergence in the Wasserstein metric, however, is in general weaker than convergence of the PDFs [22, 26]. To guarantee PDF approximation, Ditkowski, Fibich, and the author [15] proved that more strict regularity conditions and C1C^{1} convergence of gng_{n} to ff guarantee convergence of pνnp_{\nu_{n}} to pμp_{\mu} in Lq()L^{q}(\mathbb{R}) for all 1q<1\leq q<\infty (see Theorem 4 below) and provided a spline-based method with provable convergence rates for the PDFs. Related, but not equivalent results, have also been obtained for certain monotonic triangular maps on [1,1]d[-1,1]^{d} [53], and for the approximation of stochastic PDEs [9]. From a practical standpoint, it is desirable to have a surrogate model with spectrally convergent PDFs. One would like to prove that a well-known trade-off in function approximation (e.g., in L2L^{2}) holds for PDF approximation as well: on the one hand, local methods (like splines) are more robust to high derivatives and non-smoothness than spectral methods (like gPC). On the other hand, local methods are usually restricted to a fixed polynomial order of convergence, whereas spectral methods converge extremely fast for smooth functions. The question of spectral PDF convergence is therefore open and interesting.

In this work, we answer the above question affirmatively for the gPC expansion with respect to Legendre polynomials. To the best of our knowledge, these are the first results of this kind. Under high regularity conditions, the PDFs obtained by the gPC method converge spectrally (Theorem 1), and in particular converge exponentially fast for analytic quantities of interest. In the one-dimensional case we prove a stronger result, that pμpνnL1()fgnH1\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g_{n}\|_{H^{1}} (Theorem 2), and provide an alternative spectral convergence result (Theorem 5), for which the proof relies on the weak Wasserstein-pp convergence combined with a recent result by Chae and Walker [7]. In all these results, the analysis relies on three main analytic components - the relation between ff and pμp_{\mu} (5), the approximation power of gPC in Sobolev spaces (4), and Sobolev-Morrey embeddings in compact domains (10). In this work, we study pushforward measures induced by full-grid (or tensor-product grid) polynomial approximations. In high dimensions, the use of tensor-product grids becomes computationally expansive due to the so-called curse of dimensionality - the computational cost required to achieve a certain accuracy increases exponentially with the dimension. Several approaches have been developed to perform high-dimensional uncertainty propagation, such as sparse grids [3, 5, 50, 51] and adaptive methods [45]. While these methods are beyond the scope is of this work, our analysis lays the numerical analysis foundation to understand the pushforward measures they induce, and thus their adequacy for density estimation.

1.1. Paper structure

The remainder of the paper is organized as follows: Sec. 2 provides the preliminaries on the gPC method, pushforward measures and a more detailed account of previous results. Secs. 3 and 4 detail the paper’s main theoretical results and their proofs, respectively. We then turn in Sec. 5 to provide an alternative, optimal-transport based analysis of the measure pushforward problem. Finally, in Sec. 6 we present numerical experiments and outline open problems.

2. Preliminaries

2.1. Generalized Polynomial Chaos

We present a brief review of the gPC method, see [50] for a thorough introduction. For clarity, we first review the gPC method for the one-dimensional case. The Legendre polynomials {pn(α)}n=0\{p_{n}(\alpha)\}_{n=0}^{\infty} are the set of orthogonal polynomials with respect to the Lebesgue measure on Ω=[1,1]\Omega=~{}[-1,1], see [38]

Deg(pn)=n,pn,pmL2(Ω)=δn,m.{\rm Deg}(p_{n})=n\,,\qquad\langle p_{n},p_{m}\rangle_{L^{2}(\Omega)}=\delta_{n,m}\,.

The Legendre polynomials constitute an orthonormal basis of L2(Ω,dα)L^{2}(\Omega,d\alpha), i.e.,

(1) f(α)=n=0f^(n)pn(α),f^(n):=pn,fL2(Ω),fL2(Ω,dα).f(\alpha)=\sum\limits_{n=0}^{\infty}\hat{f}(n)p_{n}(\alpha)\,,\qquad\hat{f}(n):=\langle p_{n},f\rangle_{L^{2}(\Omega)}\,,\qquad f\in L^{2}(\Omega,d\alpha)\,.

The Galerkin-gPC expansion of a function fL2(Ω)f\in L^{2}(\Omega) of order nn is the projection of ff to its first n+1n+1 modes, i.e., gn(α)=j=0nf^(n)pn(α)g_{n}(\alpha)=\sum_{j=0}^{n}\hat{f}(n)p_{n}(\alpha). In the multidimensional case Ω=[1,1]d\Omega=[-1,1]^{d}, the expansion of order nn of ff is defined as the L2(Ω)L^{2}(\Omega) projection of ff into the space of polynomials of maximal degree nn [8, 21, 52]

(2) gn=Pnf=𝐣np𝐣,fL2(Ω)p𝐣,n,g_{n}=P_{n}f=\sum\limits_{\|{\bf j}\|_{\infty}\leq n}\langle p_{\bf j},f\rangle_{L^{2}(\Omega)}p_{\bf j}\,,\qquad n\in\mathbb{N}\,,

where p𝐣p_{\bf j} is the tensor-product Legendre polynomial of multi-index 𝐣=(j1,,jd){\bf j}=(j_{1},\ldots,j_{d}) and 𝐣=maxi|ji|\|{\bf j}\|_{\infty}=\max_{i}|j_{i}|. If ff can only be evaluated at a discrete set of points, the integrals in expansion coefficients f^(n)\hat{f}(n) cannot be exactly computed. The collocation-gPC approach is to approximate these coefficients using the Gauss quadrature formula: Let q:Ωq:\Omega\to\mathbb{R} be an integrable function, then the Gauss-Legendre quadrature formula of order nn is 11q(α)𝑑αk=1ng(αk)wk\int_{-1}^{1}q(\alpha)\,d\alpha\approx\sum_{k=1}^{n}g(\alpha_{k})w_{k}, where {αk}k=1N\{\alpha_{k}\}_{k=1}^{N} are the distinct and real roots of pN(α)p_{N}(\alpha), wk:=Ωlk(α)𝑑μ(α)w_{k}:=\int_{\Omega}l_{k}(\alpha)\,d\mu(\alpha) are the weights, and lk(α)l_{k}(\alpha) are the Lagrange interpolation polynomials with respect at the quadrature points [14]. Taking q=pj(α)f(α)q=p_{j}(\alpha)f(\alpha) yields the gPC collocation method in one dimension [51],

(3a) gN(α):=j=0N1f^N(j)pj(α),g_{N}(\alpha):\,=\sum\limits_{j=0}^{N-1}\hat{f}_{N}(j)p_{j}(\alpha)\,,
(3b) f^(j)f^N(j):=k=1Nf(αk)pj(αk)wk,j=0,1,,N1.\hat{f}(j)\approx\hat{f}_{N}(j):\,=\sum\limits_{k=1}^{N}f\left(\alpha_{k}\right)p_{j}\left(\alpha_{k}\right)w_{k},\qquad j=0,1,\ldots,N-1\,.

The generalization of (3) to multiple dimensions is a direct result of integration by tensor-grid Gauss quadratures, see [8, 14, 51] for details. We further note that the gPC-collocation polynomial (3) is also the polynomial interpolant of degree N1N-1 in the Guass-Legendre points, i.e., gn(α𝐤)=f(α𝐤)g_{n}(\alpha_{\bf k})=f(\alpha_{\bf k}) for all quadrature points [12].

2.2. The approximation power of Legendre polynomials

Recall that for A=[1,1]dA=[-1,1]^{d} or d\mathbb{R}^{d}, and any pair of integers k,p1k,p\geq 1, the corresponding Sobolev space is defined as [2, 18]

Wk,p(A):={u:A|max𝐣1kD𝐣uLp(A)<},W^{k,p}(A):\,=\left\{u:A\to\mathbb{C}~{}~{}|~{}~{}\max\limits_{\|{\bf j}\|_{1}\leq k}\|D^{\bf j}u\|_{L^{p}(A)}<\infty~{}\right\}\,,

where 𝐣{\bf j} is a multi-index and D𝐣u=(Πk=1dαkjk)u(α)D^{\bf j}u=(\Pi_{k=1}^{d}\partial_{\alpha_{k}}^{j_{k}})u(\alpha). For p=2p=2 we will adopt the conventional notation Wk,2(A)=Hk(A)W^{k,2}(A)=H^{k}(A). Both the Galerkin (2) and the collocation expansion (3) converge spectrally in L2(Ω)L^{2}(\Omega), where the underlying measure is the Lebsegue measure [24, 40, 47, 50]. By this we mean that if fHk(Ω)f\in H^{k}(\Omega) for some k0k\geq 0, then [8, 24, 50]

fgnL2(Ω)cnkfHk(Ω),\|f-g_{n}\|_{L^{2}(\Omega)}\leq cn^{-k}\|f\|_{H^{k}(\Omega)}\,,

and if ff is analytic (in the multivariate sense), then fgnL2(Ω)\|f-g_{n}\|_{L^{2}(\Omega)} decays exponentially in nn [13], See [53] and the references therein for details. From a probability and UQ point of view, the convergence in L2(Ω)L^{2}(\Omega) guarantees that the moments of gng_{n} converge to the moments of ff. Indeed, if ϱ\varrho is the Lebesgue measure on Ω\Omega, a simple application of the Cauchy-Schwartz inequality shows that

|𝔼f𝔼gn|Ω(f(α)g(α))𝑑αfgnL2(Ω),|\mathbb{E}f-\mathbb{E}g_{n}|\leq\int_{\Omega}(f(\alpha)-g(\alpha))\,d\alpha\lesssim\|f-g_{n}\|_{L^{2}(\Omega)}\,,

and therefore for e.g., a smooth quantity of interest ff, the gPC method provides an accurate approximation of 𝔼f\mathbb{E}f for a relatively low order nn. The convergence of other moments follow similarly, see e.g., [15].111For measures ϱ\varrho which are not the Lebesgue measure, see [16]. Both gPC expansions (2) and (3) also converge in higher-regularity Sobolev spaces, as given by the now classical result:

Theorem (Canuto and Quarteroni [8]).

For any 1βσ1\leq\beta\leq\sigma, there exists a constant C=C(β,σ)C=C(\beta,\sigma) such that

(4a) fgnHβ(Ω)Cne(β,σ)fHσ(Ω),\|f-g_{n}\|_{H^{\beta}(\Omega)}\leq Cn^{-e(\beta,\sigma)}\|f\|_{H^{\sigma}(\Omega)}\,,
where gng_{n} is given by either (2) or (3), and
(4b) e(β,σ)=σ+122β.e(\beta,\sigma)=\sigma+\frac{1}{2}-2\beta\,.

If ff is analytic (for d=1d=1) on an ellipse in \mathbb{C} with foci at ±1\pm 1 and with both axis summing to r>1r>1, then in one dimension f^(n)=O(n1/2rn)\hat{f}(n)=O(n^{-1/2}r^{-n}) [46, 47]. By Parseval identity (with respect to Legendre polynomials),

fgnL2(Ω)2=j=n+1|f^(n)|2j=n+1n1r2n,\|f-g_{n}\|_{L^{2}(\Omega)}^{2}=\sum\limits_{j=n+1}^{\infty}|\hat{f}(n)|^{2}\lesssim\sum\limits_{j=n+1}^{\infty}n^{-1}r^{-2n}\,,

and so the L2L^{2} approximation error is almost exponential as well. Furthermore, by uniform convergence of gng_{n} to ff on Ω\Omega, one can differentiate gng_{n} term-wise, yielding faster than polynomial (exponential for all practical purposes) convergence of gng_{n} to ff in all Sobolev spaces Hk(Ω)H^{k}(\Omega) with r0r\geq 0.

2.3. Pushforward measures and prior results

The pushforward of a Borel measure ϱ\varrho on Ω\Omega by a measurable f:Ωf:\Omega\to\mathbb{R} is a measure μ:=fϱ\mu:\,=f_{*}\varrho defined by μ(A)=ϱ(f1(A))\mu(A)=\varrho(f^{-1}(A)) for any Borel subset AA\subseteq\mathbb{R}. If a measure μ\mu on \mathbb{R} is absolutely continuous with respect to the Lebesgue measure, its probability density function (PDF) is its Radon-Nykodim derivative, i.e., pμL1()p_{\mu}\in L^{1}(\mathbb{R}) which satisfies μ(A)=Apμ(y)𝑑y\mu(A)=\int_{A}p_{\mu}(y)\,dy for any Borel set AA\subseteq\mathbb{R}. Alternatively, if the cumulative distribution function (CDF) Fμ(y)=μ(f1(,y))F_{\mu}(y)=\mu(f^{-1}(-\infty,y)) is differentiable, then pμ(y)=dFμ(y)/dyp_{\mu}(y)=dF_{\mu}(y)/dy. In the one-dimensional case, if ff is piecewise C1C^{1} and piecewise monotonic, and ϱ\varrho is an absolutely continuous probability measure, then [15]

(5) pμ(y)=f(α)=ypϱ(α)|f(α)|.p_{\mu}(y)=\sum\limits_{f(\alpha)=y}\frac{p_{\varrho}(\alpha)}{|f^{\prime}(\alpha)|}\,.

This relation is the source of many of the difficulties and peculiarities in understanding PDFs of pushforward measures. For example, if f(J)=cf(J)=c for some constant cc on an interval JJ with ϱ(J)>0\varrho(J)>0, then μ\mu has a singular part at cc and therefore has no PDF. Even for a non-constant smooth monotonic function such as f(α)=α2f(\alpha)=\alpha^{2} and a simple ϱ\varrho such as the uniform measure on [0,1][0,1], then pμ(y)1/yp_{\mu}(y)\sim 1/\sqrt{y} is singular. Analogously for d>1d>1, then pμf1(y)pϱ|f|1𝑑σp_{\mu}\sim\int_{f^{-1}(y)}p_{\varrho}|\nabla f|^{-1}d\sigma, where dσd\sigma is the (d1)(d-1)-dimensional surface elements.

The practical goal of surrogate models in the context of PDF approximation is to approximate μ\mu by νn=(gn)ϱ\nu_{n}=(g_{n})_{\sharp}\varrho with small error terms pμpνnLq()\|p_{\mu}-p_{\nu_{n}}\|_{L^{q}(\mathbb{R})} for some q1q\geq 1, while maintaining nn small, as nn is a good proxy to the computational cost. To see that, first note that since sampling arbitrarily many times from the polynomial gng_{n} is computationally cheap, approximating pνnp_{\nu_{n}} to arbitrary precision given gng_{n} is relatively cheap as well, see the analysis in [15]. Therefore, it is constructing gng_{n} which is costly. In the collocation gPC (3), the larger the degree of approximation g=gng=g_{n} is, the more evaluations of ff are required. If for example ff models the response of a partial differential equation (PDE), each such evaluation amounts to a solution of the PDE, which is usually computationally expansive. In the Galerkin-type gPC (2), one usually projects the original PDE with random parameters to several PDEs with deterministic parameters, the number of which also grows with the degree nn, see [50] for details. Therefore, in either case guaranteeing good accuracy for a small degree nn should be the goal of our analysis.

As noted in the introduction, L2(Ω)L^{2}(\Omega) convergence alone does not guarantee convergence of the pushforward PDFs. Indeed, one can construct a sequence such that gnfg_{n}\to f in L2(Ω)L^{2}(\Omega) but pνnpμLq()>const(q)\|p_{\nu_{n}}-p_{\mu}\|_{L^{q}(\mathbb{R})}>{\rm const}(q) for all n,q1n,q\geq 1 [15]. Previously however [32], we proved that L2L^{2} convergence is sufficient to establish convergence in the weaker Wasserstein metric, or more precisely, that Wass2(fϱ,gϱ)fgL2(Ω,ϱ)\rm Wass_{2}(f_{\sharp}\varrho,g_{\sharp}\varrho)\leq\|f-g\|_{L^{2}(\Omega,\varrho)} (see relevant definitions in Sec. 5). Furthermore, we showed that uniform boundedness of fgn\|f-g_{n}\|_{\infty} combined with L2(Ω)L^{2}(\Omega) convergence guarantees Wasserstein-pp convergence for any 1p<1\leq p<\infty, see (26) below. Since Wass1(μ,ν)\rm Wass_{1}(\mu,\nu) is equal to the L1()L^{1}(\mathbb{R}) convergence of the CDFs [34, 43], the convergence of the CDFs in L1L^{1} is a corollary of our result. A result in the direction of PDF approximation by surrogate methods was obtain by Ditkowski, Fibich, and the author in [15]. If gng_{n} is taken to be the spline interpolant of ff of order mm, on e.g., a uniform grid with a total of NN grid points, then pμpνnLq()Nm/d\|p_{\mu}-p_{\nu_{n}}\|_{L^{q}(\mathbb{R})}\lesssim N^{-m/d} for any 1q<1\leq q<\infty. The main characteristic of splines that is useful to establish this result is the pointwise C1(Ω)C^{1}(\Omega) convergence of gng_{n} to ff, see Theorem 4 below. We will use this tool tool later to prove Theorem 1.

2.4. Problem formulation

Let Ω=Ω(d)=[1,1]d\Omega=\Omega(d)=[-1,1]^{d} be equipped with an absolutely continuous probability measure ϱ\varrho. Consider a smooth function of interest f:Ωf:\Omega\to\mathbb{R} and let gn:Ωg_{n}:\Omega\to\mathbb{R} be its generalized polynomial chaos (gPC) approximation - either its L2(Ω)L^{2}(\Omega) projection to the space of Legendre polynomials of order n\leq n (Galerkin type) or its polynomial interpolant on the Gauss-Legendre quadrature points of order nn (collocation type). Consider the pushforward measures μ:=fϱ\mu:\,=f_{*}\varrho and νn:=(gn)ϱ\nu_{n}:\,=(g_{n})_{*}\varrho and denote their respective probability density functions (PDF) by pμp_{\mu} and pνnp_{\nu_{n}}. In what follows, we see under what conditions pνnp_{\nu_{n}} converges to pμp_{\mu} in Lq()L^{q}(\mathbb{R}) for 1q<1\leq q<\infty, and find the convergence rates.

3. Main results

The convergence of the pushed-forward PDFs is guaranteed by the following:

Theorem 1.

Let Ω=[1,1]d\Omega=[-1,1]^{d} for any d1d\geq 1, let fHσ(Ω)f\in H^{\sigma}(\Omega) where

(6) σσmin(d)={512+d,deven,412+d,dodd,\sigma\geq\sigma_{\min}(d)=\left\{\begin{array}[]{ll}5\frac{1}{2}+d\,,&d~{}{\rm even}\,,\\ 4\frac{1}{2}+d\,,&d~{}{\rm odd}\,,\end{array}\right.

let dϱ(α)=r(α)dαd\varrho(\alpha)=r(\alpha)d\alpha with rC1(Ω)r\in C^{1}(\Omega), and assume that |f|>κf>0|\nabla f|>\kappa_{f}>0. Then for any 1q<1\leq q<\infty,

pμpνnLq()fgnC1(Ω)nσ+σmin2fHσ(Ω).\|p_{\mu}-p_{\nu_{n}}\|_{L^{q}(\mathbb{R})}\lesssim\|f-g_{n}\|_{C^{1}(\Omega)}\lesssim n^{-\sigma+\sigma_{\min}-2}\|f\|_{H^{\sigma}(\Omega)}\,.

For the one-dimensional case, the result can be improved:

Theorem 2.

Let Ω=[1,1]\Omega=[-1,1], let fHσ(Ω)f\in H^{\sigma}(\Omega) with σ6\sigma\geq 6 and let dϱ(α)=r(α)dαd\varrho(\alpha)=r(\alpha)d\alpha with rC1(Ω)r\in C^{1}(\Omega). If f(αj)=0f^{\prime}(\alpha_{j})=0 for finitely many points α1,,αJΩ\alpha_{1},\ldots,\alpha_{J}\in\Omega and there exist kj2k_{j}\geq 2 for each 1jJ1\leq j\leq J such that |f(kj+1)(αj)|>0|f^{(k_{j}+1)}(\alpha_{j})|>0, then

(7) pμpνnL1()fgnH1(Ω)12k+1fHσ(Ω)12k+1n2σ32(2k+1),\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g_{n}\|_{H^{1}(\Omega)}^{\frac{1}{2k+1}}\lesssim\|f\|_{H^{\sigma}(\Omega)}^{\frac{1}{2k+1}}n^{-\frac{2\sigma-3}{2(2k+1)}}\,,

where k=maxjkjk=\max_{j}k_{j}. In particular, if |f(α)|>κf>0|f^{\prime}(\alpha)|>\kappa_{f}>0 for all αΩ\alpha\in\Omega, then

(8) pμpνnL1()fgnH1(Ω)fHσ(Ω)n32σ.\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g_{n}\|_{H^{1}(\Omega)}\lesssim\|f\|_{H^{\sigma}(\Omega)}n^{\frac{3}{2}-\sigma}\,.

Furthermore, if ff is analytic on an ellipse in \mathbb{C} with foci at ±1\pm 1 and with both radius summing to r>1r>1, pμpνL1()\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})} converges faster than any polynomial in nn.

Theorem 2 improves Theorem 1 in the case of d=1d=1 in two ways - First, it allows for points with f=0f^{\prime}=0. Second, Theorem 2 improves the convergence rate by three orders, from n412σn^{4\frac{1}{2}-\sigma} to n32σn^{\frac{3}{2}-\sigma}. Generally, Theorem 2 also yields much smaller constants. These improvements are owed to our ability to link the PDF convergence directly to H1(Ω)H^{1}(\Omega) convergence of gng_{n} to ff. Since the PDFs depend on ff^{\prime}, it is conjectured that convergence of gng_{n} to ff in H1(Ω)H^{1}(\Omega) is the weakest Hk(ω)H^{k}(\omega) convergence that guarantees convergence of the PDFs, thus possibly leading to a sharp rate in (8). In Theorem 1, we use C1(Ω)C^{1}(\Omega) convergence of gng_{n} to ff, which by Sobolev embedding depends on the much stronger, and therefore much slower, H2+d/2(Ω)H^{2+\lfloor d/2\rfloor}(\Omega) convergence. The improvement of rates also implies the improvement of constants, as can be observed from the Canuto and Quarteroni result (4). Suppose one wishes to guarantee pμpνn1Cn9/2\|p_{\mu}-p_{\nu_{n}}\|_{1}\leq Cn^{-9/2}, and that f>κf>0f^{\prime}>\kappa_{f}>0. Since Theorem 2 uses H1H^{1} convergence, the constant would increase with fH6\|f\|_{H^{6}}, whereas in Theorem 1 it would depend on fH9\|f\|_{H^{9}}.

That the convergence constants depend on high-regularity Sobolev norm is emblematic of global methods in general, and spectral methods in particular: To approximate a function with high derivatives well, i.e., in the asymptotically guaranteed rate, one has to to use a gPC polynomial of a relatively high order. This high threshold resolution is often embodied in the constants of the upper bound. Here one draws a distinction between the spline-based surrogate proposed in [15] and the spectral methods: splines guarantee polynomial convergence with low-sensitivity to high-derivatives. Global polynomial methods, such as gPC, can provide exponential accuracy if ff is very smooth in comparison to the sampling resolution. Another way in which even Theorem 2 depends on high-order Sobolev norms, is that it requires that ff is at least in H6H^{6}. This is a technical requirement that is needed to guarantee that fC2\|f\|_{C^{2}} does not grow with nn, see Lemma 3. Simulations seem to suggest that this is not a sharp requirement, see Sec. 6.

4. Proofs of main results

Throughout this paper we need to establish the C1C^{1} approximation of ff by gng_{n}, and that gnC2(Ω)\|g_{n}\|_{C^{2}(\Omega)} is uniformly bounded in nn.

Lemma 3.

Under the conditions of Theorem 1, gnC2(Ω)\|g_{n}\|_{C^{2}(\Omega)} is uniformly bounded for all nn\in\mathbb{N}, and

(9) fgnC1(Ω)nσ+σmin+2fHσ(Ω).\|f-g_{n}\|_{C^{1}(\Omega)}\lesssim n^{-\sigma+\sigma_{\min}+2}\|f\|_{H^{\sigma}(\Omega)}\,.
Proof.

Recall the Sobolev-Morrey embedding theorem [2, 18]: If uHs(Ω)u\in H^{s}(\Omega), and s>d/2s>d/2, then

(10) uCsd21(Ω)uHs(Ω),\|u\|_{C^{s-\lfloor\frac{d}{2}\rfloor-1}(\Omega)}\lesssim\|u\|_{H^{s}(\Omega)}\,,

where x\lfloor x\rfloor is the lower integer value for any x0x\geq 0.222In this study, we will not use a stronger version of these embeddings for Hölder norms Ck,αC^{k,\alpha}, but just the integer-power CkC^{k} norms. By (10), choosing β=3+d/2\beta=3+\lfloor d/2\rfloor yields

(11) fgnC2(Ω)fgnHβ(Ω).\|f-g_{n}\|_{C^{2}(\Omega)}\lesssim\|f-g_{n}\|_{H^{\beta}(\Omega)}\,.

Applying the Sobolev approximation result (4) to (11) yields

(12) fgnC2ne(β,σ)fHσ.\|f-g_{n}\|_{C^{2}}\lesssim n^{-e(\beta,\sigma)}\|f\|_{H^{\sigma}}\,.

To guarantee a uniform bound for all nn\in\mathbb{N}, it is sufficient to choose σσmin\sigma\geq\sigma_{\rm min} such that e(β,σmin)=0e(\beta,\sigma_{\rm min})=0. Since β>1\beta>1, then

0\displaystyle 0 =e(β,σmin)\displaystyle=e(\beta,\sigma_{\min})
=σmin+122β\displaystyle=\sigma_{\min}+\frac{1}{2}-2\beta
=σmin+122(3+d2).\displaystyle=\sigma_{\min}+\frac{1}{2}-2(3+\lfloor\frac{d}{2}\rfloor)\,.
σmin={512+d,deven,412+d,dodd.\displaystyle\Longrightarrow\sigma_{\min}=\left\{\begin{array}[]{ll}5\frac{1}{2}+d\,,&d~{}{\rm even}\,,\\ 4\frac{1}{2}+d\,,&d~{}{\rm odd}\,.\end{array}\right.

For σσmin\sigma\geq\sigma_{\min}, then e(β,σ)0e(\beta,\sigma)\geq 0 in (12), and so fgnC2(Ω)fHσ(Ω)\|f-g_{n}\|_{C^{2}(\Omega)}\lesssim\|f\|_{H^{\sigma}(\Omega)}. Hence, since fC2(Ωd)f\in C^{2}(\Omega_{d}), then

gnC2(Ω)\displaystyle\|g_{n}\|_{C^{2}(\Omega)} gnfC2(Ω)+fC2(Ω)\displaystyle\lesssim\|g_{n}-f\|_{C^{2}(\Omega)}+\|f\|_{C^{2}(\Omega)}
fHσ(Ω)+fC2(Ω),n.\displaystyle\lesssim\|f\|_{H^{\sigma}(\Omega)}+\|f\|_{C^{2}(\Omega)}\,,\qquad n\in\mathbb{N}\,.

We proceed to prove the estimate (9). By Sobolev-Morrey inequality (10) and (4), we have that

fPnfC1(Ω)\displaystyle\|f-P_{n}f\|_{C^{1}(\Omega)} fPnfH2+d2(Ω)\displaystyle\lesssim\|f-P_{n}f\|_{H^{2+\lfloor\frac{d}{2}\rfloor}(\Omega)}
fHσne(2+d2,σ)(Ω),\displaystyle\lesssim\|f\|_{H^{\sigma}}n^{e(2+\lfloor\frac{d}{2}\rfloor,\sigma)(\Omega)}\,,

where

e(2+d2,σ)\displaystyle e(2+\lfloor\frac{d}{2}\rfloor,\sigma) =σ+1242d2\displaystyle=\sigma+\frac{1}{2}-4-2\lfloor\frac{d}{2}\rfloor
={σ312d,deven,σ212d,dodd,\displaystyle=\left\{\begin{array}[]{ll}\sigma-3\frac{1}{2}-d\,,&d~{}{\rm even}\,,\\ \sigma-2\frac{1}{2}-d\,,&d~{}{\rm odd}\,,\end{array}\right.

which is positive for all σ>σmin\sigma>\sigma_{\min}. ∎

4.1. Proof of Theorem 1

Local C1C^{1} convergence as established in Lemma 3 implies convergence of PDFs by the following result:

Theorem 4 (Corollary 5.5, [15]).

Let (gn)n=1C1(Ω)(g_{n})_{n=1}^{\infty}\subset C^{1}(\Omega), and consider fC1(Ω)f\in C^{1}(\Omega) such that |f|>κf>0|\nabla f|>\kappa_{f}>0. Then if

(13a) gnC2(Ωd)K,\|g_{n}\|_{C^{2}(\Omega_{d})}\leq K\,,
for some constant KK and for all nn\in\mathbb{N}, and if
(13b) fgnC1(Ω)Knτ,τ,K>0,\|f-g_{n}\|_{C^{1}(\Omega)}\leq Kn^{-\tau}\,,\qquad\tau,K>0\,,

then |pμ(y)pνn(y)|nτ|p_{\mu}(y)-p_{\nu_{n}}(y)|\lesssim n^{-\tau} for all but o(nτ)o(n^{-\tau}) points, and therefore

pμpνLq()nτ,\|p_{\mu}-p_{\nu}\|_{L^{q}(\mathbb{R})}\lesssim n^{-\tau}\,,

for all 1q<1\leq q<\infty. Furthermore, if d=1d=1 and gng_{n} interpolates ff at the endpoints f(±1)=gn(±1)f(\pm 1)=g_{n}(\pm 1), then the uniform estimate fgnLnτ\|f-g_{n}\|_{L^{\infty}}\lesssim n^{-\tau} holds.

Indeed, (13a) is guaranteed explicitly and (9) guarantees (13b) with an explicit convergence rate τ=σσmin2>0\tau=\sigma-\sigma_{\min}-2>0.

4.2. Proof of Theorem 2

For brevity, we omit the nn subscripts, denoting gn=gg_{n}=g and ν=νn\nu=\nu_{n}. First, we treat the case where |f|>κf>0|f^{\prime}|>\kappa_{f}>0. Suppose ff is monotonic increasing, i.e., f>κf>0f^{\prime}>\kappa_{f}>0. By the embedding theorem (10), that fH6(Ω)f\in H^{6}(\Omega) implies that fC2(Ω)f\in C^{2}(\Omega) and so

(14) pμ(y)=f(α)=ypϱ(α)|f(α)|=r(f1(y))f(f1(y)),p_{\mu}(y)=\sum\limits_{f(\alpha)=y}\frac{p_{\varrho}(\alpha)}{|f^{\prime}(\alpha)|}=\frac{r(f^{-1}(y))}{f^{\prime}(f^{-1}(y))}\,,

for every yrange(f)y\in{\rm range}(f) [15, Lemma 4.1]. Since gg is a polynomial, gC1(Ω)g\in C^{1}(\Omega). Furthermore, by Lemma 3 we have that fgC1n5/2fH4(Ω)\|f-g\|_{C^{1}}\lesssim n^{-5/2}\|f\|_{H^{4}(\Omega)}, and so for sufficiently large nn, g(α)>κf/2>0g^{\prime}(\alpha)>\kappa_{f}/2>0 for all αΩ\alpha\in\Omega. Therefore pν(y)=r(g1(y))/g(g1(y))p_{\nu}(y)=r(g^{-1}(y))/g^{\prime}(g^{-1}(y)) for every yrange(g)y\in{\rm range}(g). It might be that the ranges of ff and gg, which are the supports of pμp_{\mu} and pνp_{\nu}, respectively, do not overlap. Assume for simplicity that f(1)=g(1)f(-1)=g(-1) but g(1)>f(1)g(1)>f(1), the other cases can be treated similarly. Then

(15) pμpνL1()=f(1)f(1)|pμ(y)pν(y)|𝑑y+f(1)g(1)pν(y)𝑑y.\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}=\int\limits_{f(-1)}^{f(1)}|p_{\mu}(y)-p_{\nu}(y)|\,dy+\int\limits_{f(1)}^{g(1)}p_{\nu}(y)\,dy\,.

We begin with the second integral -

f(1)g(1)pν(y)𝑑y\displaystyle\int\limits_{f(1)}^{g(1)}p_{\nu}(y)\,dy =f(1)g(1)r(g1(y))g(g1(y))𝑑y\displaystyle=\int\limits_{f(1)}^{g(1)}\frac{r(g^{-1}(y))}{g^{\prime}(g^{-1}(y))}\,dy
2κfr|g(1)f(1)|\displaystyle\leq\frac{2}{\kappa_{f}}\|r\|_{\infty}|g(1)-f(1)|
2κfrgfC0(Ω)\displaystyle\leq\frac{2}{\kappa_{f}}\|r\|_{\infty}\|g-f\|_{C^{0}(\Omega)}
2κfrgfH1(Ω),\displaystyle\lesssim\frac{2}{\kappa_{f}}\|r\|_{\infty}\|g-f\|_{H^{1}(\Omega)}\,,

where the last inequality is due to the Sobolev-Morrey embedding (10). Therefore, we need only to consider the first integral in (15), and we therefore assume without loss of generality that range(f)=range(g){\rm range}(f)={\rm range}(g), and so

(16) pμpνL1()=f(1)f(1)|r(f1(y))f(f1(y))r(g1(y))g(g1(y))|𝑑y.\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}=\int\limits_{f(-1)}^{f(1)}\left|\frac{r(f^{-1}(y))}{f^{\prime}(f^{-1}(y))}-\frac{r(g^{-1}(y))}{g^{\prime}(g^{-1}(y))}\right|\,dy\,.

Denote α=f1(y)\alpha=f^{-1}(y) and α:=α(α)=g1(f(α))\alpha_{*}:\,=\alpha_{*}(\alpha)=g^{-1}(f(\alpha)), then by change of variables

pμpνL1()\displaystyle\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})} =11|r(α)f(α)r(α)g(α)|f(α)𝑑α\displaystyle=\int\limits_{-1}^{1}\left|\frac{r(\alpha)}{f^{\prime}(\alpha)}-\frac{r(\alpha_{*})}{g^{\prime}(\alpha_{*})}\right|f^{\prime}(\alpha)\,d\alpha
=11|r(α)g(α)f(α)r(α)|g(α)𝑑α.\displaystyle=\int\limits_{-1}^{1}\frac{|r(\alpha)g^{\prime}(\alpha_{*})-f^{\prime}(\alpha)r(\alpha_{*})|}{g^{\prime}(\alpha^{*})}\,d\alpha\,.

Since g(α)g^{\prime}(\alpha) and r(α)r(\alpha) are differentiable, then for any αΩ\alpha\in\Omega

(17) |r(α)g(α)\displaystyle|r(\alpha)g^{\prime}(\alpha_{*})- f(α)r(α)|\displaystyle f^{\prime}(\alpha)r(\alpha_{*})|
(18) r(α)|g(α)g(α)|+r(α)|g(α)f(α)|+f(α)|r(α)r(α)|\displaystyle\leq r(\alpha)|g^{\prime}(\alpha_{*})-g^{\prime}(\alpha)|+r(\alpha)|g^{\prime}(\alpha)-f^{\prime}(\alpha)|+f^{\prime}(\alpha)|r(\alpha)-r(\alpha_{*})|
(19) D|αα|+r(α)|f(α)g(α)|,\displaystyle\leq D|\alpha-\alpha_{*}|+r(\alpha)|f^{\prime}(\alpha)-g^{\prime}(\alpha)|\,,\qquad

where D:=[rg′′+rf]D:\,=\left[\|r\|_{\infty}\|g^{\prime\prime}\|_{\infty}+\|r^{\prime}\|_{\infty}\|f\|_{\infty}\right]. In general, D=DnD=D_{n} as g=Pnfg=P_{n}f depends on nn, and g′′\|g^{\prime\prime}\|_{\infty} might not be bounded. However, as in Lemma 3, g′′\|g^{\prime\prime}\|_{\infty} and therefore DD are uniformly bounded for all n1n\geq 1. Since gκf/2g^{\prime}\geq\kappa_{f}/2 for sufficiently large nn, then substituting (17) in the integral yields

(20) pμpνL1()1κf11r(α)|f(α)g(α)|𝑑α:=I+Dκf11|αα|𝑑α:=II.\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}\lesssim\frac{1}{\kappa_{f}}\underbrace{\int\limits_{-1}^{1}r(\alpha)|f^{\prime}(\alpha)-g^{\prime}(\alpha)|\,d\alpha}_{:\,={\rm I}}+\underbrace{\frac{D}{\kappa_{f}}\int\limits_{-1}^{1}|\alpha-\alpha_{*}|\,d\alpha}_{:\,={\rm II}}\,.

Since r(α),(fg)L2(Ω)r(\alpha),(f^{\prime}-g^{\prime})\in L^{2}(\Omega), then by the Cauchy-Schwartz inequality I{\rm I} is bounded from above by

|r,fgL2(Ω)|rL2(Ω)fgL2(Ω)rL2(Ω)fgH1(Ω)\left|\langle r,f^{\prime}-g^{\prime}\rangle_{L^{2}(\Omega)}\right|\leq\|r\|_{L^{2}(\Omega)}\cdot\|f^{\prime}-g^{\prime}\|_{L^{2}(\Omega)}\leq\|r\|_{L^{2}(\Omega)}\cdot\|f-g\|_{H^{1}(\Omega)}\,

To bound II{\rm II} in (20) from above, we first note that by Lagrange’s mean-value theorem, there exists β\beta between α\alpha and α\alpha_{*} such that g(β)(αα)=g(α)g(α)=g(α)f(α)g^{\prime}(\beta)(\alpha-\alpha_{*})=g(\alpha)-g(\alpha_{*})=g(\alpha)-f(\alpha), and therefore |αα||g(α)f(α)|/κf|\alpha-\alpha_{*}|\leq|g(\alpha)-f(\alpha)|/\kappa_{\rm f}. From here, the process of bounding II{\rm II} from above is the same as bounding I{\rm I}, which yields IID/κffgL2(Ω){\rm II}\leq D/\kappa_{f}\|f-g\|_{L^{2}(\Omega)}. Therefore pμpνL1()fgH1(Ω)\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g\|_{H^{1}(\Omega)}. Applying the relevant Sobolev approximation theorem (4), settles the case where |f|>κf>0|f^{\prime}|>\kappa_{f}>0.

We now turn to the case where gg^{\prime} and ff^{\prime} vanish at finitely many points. As we show, it does not matter whether these zero points coincide or not, and we will therefore treat the case where f(1)=0f^{\prime}(-1)=0, g(1)>0g^{\prime}(-1)>0 and f(α),g(α)>0f^{\prime}(\alpha),g^{\prime}(\alpha)>0 for all α(1,1]\alpha\in(-1,1]. Assume without loss of generality that f(1)=g(1)f(-1)=g(-1).333We showed how to treat the case where the ranges of ff and gg do not overlap above. Fix ε>0\varepsilon>0 and divide the integral for pμpν1\|p_{\mu}-p_{\nu}\|_{1} on the left hand side of (16) into two domains - (1) isolating the singular point in the PDF, i.e., y[g(1),g(1+ε)]y\in[g(-1),g(-1+\varepsilon)] and (2) the rest of the domain, y[g(1+ε),g(1)]y\in[g(-1+\varepsilon),g(1)].

  1. (1)

    On the first domain [g(1),g(1+ε)][g(-1),g(-1+\varepsilon)], we take a crude estimation

    (21a) g(1)g(1+ε)|pμ(y)pν(y)|𝑑yg(1)g(1+ε)pμ(y)+pν(y)dy.\int\limits_{g(-1)}^{g(-1+\varepsilon)}|p_{\mu}(y)-p_{\nu}(y)|\,dy\leq\int\limits_{g(-1)}^{g(-1+\varepsilon)}p_{\mu}(y)+p_{\nu}(y)\,dy\,.
    Taking, for example pμ(y)p_{\mu}(y), by a change of variables f(α)=yf(\alpha)=y we get
    (21b) g(1)g(1+ε)pμ(y)=g(1)g(1+ε)r(f1(y))f(f1(y))𝑑y=11+εr(α)𝑑αrε.\int\limits_{g(-1)}^{g(-1+\varepsilon)}p_{\mu}(y)=\int\limits_{g(-1)}^{g(-1+\varepsilon)}\frac{r(f^{-1}(y))}{f^{\prime}(f^{-1}(y))}\,dy=\int\limits_{-1}^{-1+\varepsilon}r(\alpha)\,d\alpha\leq\|r\|_{\infty}\varepsilon\,.
  2. (2)

    The integral on [g(1+ε),g(1)][g(-1+\varepsilon),g(1)] reads the same as (20), where κf\kappa_{f} is replaced by κfε:=minx[1+ε,1]|f|\kappa_{f}^{\varepsilon}:\,=\min\limits_{x\in[-1+\varepsilon,1]}|f^{\prime}|, yielding

    (22) g(1+ε)g(1)|pμ(y)pν(y)|𝑑yD~(κfε)2fgH1,\int\limits_{g(-1+\varepsilon)}^{g(1)}|p_{\mu}(y)-p_{\nu}(y)|\,dy\leq\frac{\tilde{D}}{(\kappa_{f}^{\varepsilon})^{2}}\|f-g\|_{H^{1}}\,,

    where D^\hat{D} is ε\varepsilon-independent. How does κfε\kappa_{f}^{\varepsilon} depend on ε\varepsilon? Since in the worst case f(1)=f(1)=f(k)(1)=0f(-1)=f^{\prime}(-1)=\cdots f^{(k)}(-1)=0 but f(k+1)(1)=0f^{(k+1)}(-1)=0, then by Taylor expansion, κfεεk\kappa_{f}^{\varepsilon}\approx\varepsilon^{k} for sufficiently small ε\varepsilon.

By combining both upper bounds, we have that

pμpνL1()ε2kfgH1(Ω)+rε.\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}\lesssim\varepsilon^{-2k}\|f-g\|_{H^{1}(\Omega)}+\|r\|_{\infty}\varepsilon\,.

This upper bound is minimized by equating its ε\varepsilon derivative to zero, which yields r2kfgε2k1\|r\|_{\infty}\sim 2k\|f-g\|\varepsilon^{-2k-1}. Hence, for sufficiently small fgH1\|f-g\|_{H^{1}} (i.e., for sufficiently large nn), pμpνL1()fgH1(Ω)1/(2k+1)\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g\|_{H^{1}(\Omega)}^{1/(2k+1)}.

Finally, we reduce the general case where ff^{\prime} has finitely many nodal points α1,,αL\alpha_{1},\ldots,\alpha_{L}. In the integral f(1)f(1)|pμ(y)pν(y)|𝑑y\int_{f(-1)}^{f(1)}|p_{\mu}(y)-p_{\nu}(y)|\,dy, we take out the intervals f((αjε,αj+ε))f((\alpha_{j}-\varepsilon,\alpha_{j}+\varepsilon)) and treat them separately, as in (21). For yy value outside these intervals, we claim that ff^{\prime} and gg^{\prime} have the same sign, for sufficiently large nn. We split the discussion into two cases. First, if ff^{\prime} changes its sign at αj\alpha_{j}, then there are two points in the interval (αjε,αj+ε)(\alpha_{j}-\varepsilon,\alpha_{j}+\varepsilon) where ff^{\prime} is maximized and minimized. Taking nn to be sufficiently large, then gg^{\prime} has to be positive and negative at these points, respectively, due to the pointwise C1C^{1} approximation (9). Therefore, gg^{\prime} must change its sign in between. It might be that gg^{\prime} changes its sign within this interval again, but outside of this interval, since f>κfε>0f^{\prime}>\kappa_{f}^{\varepsilon}>0, then gg^{\prime} has the same sign as ff^{\prime} there for sufficiently large nn. The second case is easier - if ff does not change its sign at αj\alpha_{j}, then outside the interval f>κfε>0f^{\prime}>\kappa_{f}^{\varepsilon}>0, and so by (9) g>κfε/2>0g^{\prime}>\kappa_{f}^{\varepsilon}/2>0.

5. A transport-based convergence result for d=1d=1

In this section, we present a different convergence result for the one-dimensional collocation gPC. This method of proof highlights the role that the “weaker” Wasserstein metric can play in understanding PDFs, and the potential such methods have for future works. The result only applied to the Gauss-Lobatto (GL), defined as the roots of pn(α)p_{n}^{\prime}(\alpha), the derivative of the Legendre polynomial of order nn, plus the endpoints {1,1}\{-1,1\}. The polynomial interpolant at GL quadrature points admits the same Sobolev approximation theory as stated in (4) for the Gauss-Legendre points, see [8] and [31, Chapter 10]. As we will see, the following result is restricted to the GL interpolant since the condition f(±1)=gn(±1)f(\pm 1)=g_{n}(\pm 1) guarantees that range(f)=range(gn){\rm range}(f)={\rm range}(g_{n}).

Theorem 5.

Let Ω=[1,1]\Omega=[-1,1]. For any integer m1m\geq 1, and any function fWσ,2(Ω)f\in W^{\sigma,2}(\Omega) with σ2m+4\sigma\geq 2m+4 and |f|κf>0|f^{\prime}|\geq\kappa_{f}>0, let gng_{n} be its polynomial interpolant at the GL quadrature points of order nn, and suppose the dϱ(α)=r(α)dαd\varrho(\alpha)=r(\alpha)d\alpha where rWm,1(Ω)r\in W^{m,1}(\Omega). Then

(23) pμpνnL1()pμWm,1(A)1m+1fWσ,2(Ω)1m+1nmm+1(σ56),\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|p_{\mu}\|_{W^{m,1}(A)}^{\frac{1}{m+1}}\|f\|_{W^{\sigma,2}(\Omega)}^{\frac{1}{m+1}}n^{-\frac{m}{m+1}\left(\sigma-\frac{5}{6}\right)}\,,

where A=image(f)A={\rm image}(f).

For analytic functions, both Theorems 2 and 5 guarantee faster than polynomial convergence. For functions fHσHσ1f\in H^{\sigma}\setminus H^{\sigma-1}, however, Theorem 2 guarantees slightly better convergence rates. Theorem 5 does improve on Theorem 2 in that it relaxes the demand rC1r\in C^{1} to rWm,1r\in W^{m,1}. But as noted, the importance of Theorem 5 is the method of its proof, which relies on the Wasserstein distance.

Given two probability measures ω1\omega_{1} and ω2\omega_{2} on \mathbb{R}, the Wasserstein-11 distance is defined as444To avoid confusion - Wk,pW^{k,p} denotes the Sobolev spaces of functions with kk derivatives which are pp integrable, and Wassp\rm Wass_{p} denotes the Wasserstein-pp distance (instead of the standard WpW_{p}).

(24a) Wass1(ω1,ω2):=infγΓ2|xy|dγ(x,y),\rm Wass_{1}(\omega_{1},\omega_{2}):\,=\inf\limits_{\gamma\in\Gamma}\int\limits_{\mathbb{R}^{2}}|x-y|\,d\gamma(x,y)\,,
where Γ\Gamma is the set of all measures γ\gamma on 2\mathbb{R}^{2} for which ω1\omega_{1} and ω2\omega_{2} are marginals, i.e., for any Borel BB\subseteq\mathbb{R},
(24b) ω1(B)=×Bγ(x,y)𝑑y,ω2(B)=B×γ(x,y)𝑑x.\omega_{1}(B)=\int\limits_{\mathbb{R}\times B}\gamma(x,y)\,dy\,,\qquad\omega_{2}(B)=\int\limits_{B\times\mathbb{R}}\gamma(x,y)\,dx\,.

Since ω1()=ω2()=1\omega_{1}(\mathbb{R})=\omega_{2}(\mathbb{R})=1, a minimizer of (24) exists, and so Wass1(ω1,ω2)\rm Wass_{1}(\omega_{1},\omega_{2}) is finite, and it is a metric [35, 44]. Intuitively, the Wasserstein distance is often referred to as the earth-mover’s distance; Wass1(ω1,ω2)\rm Wass_{1}(\omega_{1},\omega_{2}) computes the minimal work (distance times force) by which one can transfer a mound of earth in the mold of ω1\omega_{1} to a one that is in the mold of ω2\omega_{2}.

How does the Wasserstein distance relate to the problem of densities? In general, Wass1(ω1,ω2)pω1pω2L1()\rm Wass_{1}(\omega_{1},\omega_{2})\lesssim\|p_{\omega_{1}}-p_{\omega_{2}}\|_{L^{1}(\mathbb{R})}, but not the other way around [22]. This is why, in general, the Wasserstein distance induces a weaker topology on the space of probability measures than that induced by the L1L^{1} distance between the PDFs. Moreover, the Wasserstein distance is even well-defined for singular measures, which do not have densities at all. A recent result due to Chae and Walker, however, shows that if the densities are sufficiently regular, the Wasserstein-11 metric bounds from above the L1L^{1} distance between the densities.

Theorem (Chae and Walker [7]).

Let ω1\omega_{1} and ω2\omega_{2} be two Borel measures on AA\subseteq\mathbb{R} with PDFs pω1,pω2Wm,1(A)p_{\omega_{1}},p_{\omega_{2}}\in W^{m,1}(A) for some m1m\geq 1. Then

(25) pω1pω2L1(A)(pω1Wm,1(A)+pω2Wm,1(A))1m+1Wass1mm+1(ω1,ω2).\|p_{\omega_{1}}-p_{\omega_{2}}\|_{L^{1}(A)}\lesssim\left(\|p_{\omega_{1}}\|_{W^{m,1}(A)}+\|p_{\omega_{2}}\|_{W^{m,1}(A)}\right)^{\frac{1}{m+1}}\rm Wass_{1}^{\frac{m}{m+1}}(\omega_{1},\omega_{2})\,.

As we will show in Lemma 6, one can verify under what conditions pμp_{\mu} and pνnp_{\nu_{n}} are sufficiently regular. Therefore, to prove L1()L^{1}(\mathbb{R}) convergence of pνnp_{\nu_{n}} to pμp_{\mu}, it is sufficient to prove the convergence of Wass1(μ,νn)\rm Wass_{1}(\mu,\nu_{n}). The weak convergence in the Wasserstein metric has recently been established under much more general conditions by the author:

Theorem ([32]).

For any compact Borel set Ωd\Omega\subset\mathbb{R}^{d}, and for every 1p,q<1\leq p,q<\infty,

(26) Wassp(μ,νn)fgnLq(Ω)qq+pfgnL(Ω)pq+p,\rm Wass_{p}(\mu,\nu_{n})\lesssim\|f-g_{n}\|_{L^{q}(\Omega)}^{\frac{q}{q+p}}\|f-g_{n}\|_{L^{\infty}(\Omega)}^{\frac{p}{q+p}}\,,

where the implicit constant depends only on Ω\Omega, pp, and qq.

Proof of Theorem 5.

To apply (25) to μ\mu and νn\nu_{n}, we need to show that their densities are sufficiently regular.

Lemma 6.

Let m1m\geq 1 and let fW2m+4,2(Ω)f\in W^{2m+4,2}(\Omega) with |f(α)|κf>0|f^{\prime}(\alpha)|\geq\kappa_{f}>0 for all αI\alpha\in I, and denote A=range(f)A={\rm range}(f)\subset\mathbb{R}. Then pμWm,1(A)p_{\mu}\in W^{m,1}(A), pνnWm,1(A)p_{\nu_{n}}\in W^{m,1}(A) for sufficiently large nn, and

pνnWm,1(A)pμWm,1(A),\|p_{\nu_{n}}\|_{W^{m,1}(A)}\lesssim\|p_{\mu}\|_{W^{m,1}(A)}\,,

where the implicit constant does not depend on nn.

proof of lemma.

We begin, for simplicity, with m=1m=1. By (5), if ff is monotonic, formally,

(27) ddypμ(y)=r(f1(y))f′′(f1(y))|f(f1(y))|3+r(f1(y))|f(f1(y))|2.\frac{d}{dy}p_{\mu}(y)=-r(f^{-1}(y))\frac{f^{{}^{\prime\prime}}(f^{-1}(y))}{|f^{\prime}(f^{-1}(y))|^{3}}+\frac{r^{\prime}(f^{-1}(y))}{|f^{\prime}(f^{-1}(y))|^{2}}\,.

By Sobolev-Morrey embedding (10), fW6,2(Ω)C2(Ω)f\in W^{6,2}(\Omega)\subseteq C^{2}(\Omega). Combined with the fact that |f|>κf>0|f^{\prime}|>\kappa_{f}>0, then ddypμ(y)W1,1()\frac{d}{dy}p_{\mu}(y)\in W^{1,1}(\mathbb{R}). For gng_{n}, due to Lemma 3, |gn(α)|>κf/2>0|g_{n}^{\prime}(\alpha)|>\kappa_{f}/2>0 on Ω\Omega for all sufficiently large nn, and gn′′\|g_{n}^{\prime\prime}\|_{\infty} is also uniformly bounded for all n1n\geq 1 (gng_{n} is a polynomial, and therefore smooth). By the analog of (27) for νn\nu_{n} and gng_{n}, we have that pνnW1,1()p_{\nu_{n}}\in W^{1,1}(\mathbb{R}). Since |gn′′||g_{n}^{\prime\prime}| is uniformly bounded in terms of ff and its derivatives, pνnW1,1()pμW1,1()\|p_{\nu_{n}}\|_{W^{1,1}(\mathbb{R})}\lesssim\|p_{\mu}\|_{W^{1,1}(\mathbb{R})} for all sufficiently large nn. Here it is key that our domain is one-dimensional, that gng_{n} interpolates ff at the endpoints α=±1\alpha=\pm 1, and that both functions are monotonic. These three facts ensure that

A=supp(μ)=image(f)=image(gn)=supp(νn).A={\rm supp}(\mu)={\rm image}(f)={\rm image}(g_{n})={\rm supp}(\nu_{n})\,.

Suppose otherwise that, without loss of generality, maxf(α)maxgn(α)=y\max f(\alpha)\geq\max g_{n}(\alpha)=y_{*}. In this case pνnp_{\nu_{n}} would generically have a step-like discontinuity at yAy_{*}\in A, and therefore pνnW1,1(A)p_{\nu_{n}}\not\in W^{1,1}(A).

For a general m1m\geq 1, by direct differentiation one has that dmdympμ(y)\frac{d^{m}}{dy^{m}}p_{\mu}(y) is a sum of rational functions where the numerators depend on f,,f(m+1)f^{\prime},\ldots,f^{(m+1)} and in r,,r(m)r,\ldots,r^{(m)}, and the denominators are monomials in ff^{\prime}. One then generalizes Lemma 3 to show that adequately-high derivatives of gng_{n} are uniformly bounded in nn, which concludes the Lemma. ∎

Lemma 6 implies that (25) is applicable in our settings, and that (pμWm,1(A)+pνnWm,1(A))1/m+1pμWm,1(A)1/m+1(\|p_{\mu}\|_{W^{m,1}(A)}+\|p_{\nu_{n}}\|_{W^{m,1}(A)})^{1/m+1}\lesssim\|p_{\mu}\|_{W^{m,1}(A)}^{1/m+1}, where the upper bound depends only on ff and its derivatives. Therefore

(28) pμpνnL1()=pμpνnL1(A)pμWm,1(A)1m+1Wass11m+1(μ,νn).\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}=\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(A)}\lesssim\|p_{\mu}\|_{W^{m,1}(A)}^{\frac{1}{m+1}}\rm Wass_{1}^{\frac{1}{m+1}}(\mu,\nu_{n})\,.

Since Ω\Omega is compact, (26) can be applied to Wass1(μ,νn)\rm Wass_{1}(\mu,\nu_{n}) such that (28) yields

(29) pμpνnL1()pμWm,1(A)1m+1fgnL2(Ω)23(m+1)fgnL(Ω)13(m+1).\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|p_{\mu}\|_{W^{m,1}(A)}^{\frac{1}{m+1}}\|f-g_{n}\|_{L^{2}(\Omega)}^{\frac{2}{3(m+1)}}\|f-g_{n}\|_{L^{\infty}(\Omega)}^{\frac{1}{3(m+1)}}\,.

By the spectral L2L^{2} convergence, fgnL2Wσ,2nσ\|f-g_{n}\|_{L^{2}}\lesssim W^{\sigma,2}n^{-\sigma}, see (4). To bound the LL^{\infty} error, we use (10) in conjuction with (4) again, yielding fgnC0fWσ,2nσ+212\|f-g_{n}\|_{C^{0}}\lesssim\|f\|_{W^{\sigma,2}}n^{-\sigma+2\frac{1}{2}}. Applying both of these upper bounds to (29) than yields

pμpνnL1()\displaystyle\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})} pμWm,1(A)1m+1fWσ,2(Ω)1m+1nmm+1(σ56),\displaystyle\lesssim\|p_{\mu}\|_{W^{m,1}(A)}^{\frac{1}{m+1}}\|f\|_{W^{\sigma,2}(\Omega)}^{\frac{1}{m+1}}n^{-\frac{m}{m+1}(\sigma-\frac{5}{6})}\,,

The following heuristic argument suggests that the condition |f|>κf>0|f^{\prime}|>\kappa_{f}>0 is necessary for the proof of Theorem 5. Let r1/2r\equiv 1/2 be the uniform density, and suppose without loss of generality that ff is monotonic increasing, that f(0)=f(0)=0f(0)=f^{\prime}(0)=0, and that by Taylor expansion f(δ)=cδk+o(δk)f(\delta)=c\delta^{k}+o(\delta^{k}) for some integer k2k\geq 2 and |c|>0|c|>0 as δ0\delta\to 0. Then f(δ)=kcδk1+o(δk1)f^{\prime}(\delta)=kc\delta^{k-1}+o(\delta^{k-1}) and by direct substitution into (27)

ddypμ(y)y2+1/k,y0,\frac{d}{dy}p_{\mu}^{\prime}(y)\sim y^{-2+1/k}\,,\qquad y\to 0\,,

which is not integrable in any neighborhood of y=0y=0. Hence, pμWk,1(A)p_{\mu}\not\in W^{k,1}(A) for any k1k\geq 1, and we cannot use (25).

6. Numerical experiments and open questions

Refer to caption
Figure 1. PDF approximation for μ=fϱ\mu=f_{\sharp}\varrho where f(α)=sin(20α)f(\alpha)=\sin(20\alpha) and ϱ\varrho is the uniform probability measure on [1,1][-1,1]. (a) f(α)f(\alpha). (b) pμp_{\mu} (solid, black), pν10p_{\nu_{10}} (dots, red) and pν50p_{\nu_{50}} (dashes, blue), where νn=(gn)ϱ\nu_{n}=(g_{n})_{\sharp}\varrho where gng_{n} is the gPC collocation approximation of order nn. pμp_{\mu} and pν50p_{\nu_{50}} are nearly indistinguishable. (c) pμpνn1\|p_{\mu}-p_{\nu_{n}}\|_{1} as a function of nn.

We highlight some aspects of density-approximation using the gPC collocation method (3) by numerical experiments on Ω=[1,1]\Omega=[-1,1] where dϱ(α)=1/2dαd\varrho(\alpha)=1/2\,d\alpha is the uniform probability measure on Ω\Omega. We first consider f(α)=sin(20α)f(\alpha)=\sin(20\alpha), see Fig. 1(a). The approximation in L2(Ω)L^{2}(\Omega) and H1(Ω)H^{1}(\Omega) of ff by polynomials is quite standard: a small number of collocation points (conversely, a low-order polynomial) does not suffice to resolve the oscillations of ff. Once nn is sufficiently large, since ff is analytic, we expect gng_{n} to converge to ff in Hs(Ω)H^{s}(\Omega) exponentially fast, for every s0s\geq 0. For the PDF, we see that indeed pν10p_{\nu_{10}} approximates pμp_{\mu} quite poorly, whereas pν50p_{\nu_{50}} is nearly indistinguishable from pμp_{\mu}; see Fig. 1(b). Quantitatively, the L1()L^{1}(\mathbb{R}) error between the PDFs follows the expected pattern - no convergence for n30n\leq 30, but then a sharp, exponential decay until machine-precision is reached; see Fig. 1(c). Another interesting facet of this example is that since f=0f^{\prime}=0 at several points, pμp_{\mu} is singular at ±1\pm 1, see (5). Theorem 2 therefore implies that pμpνnL1()fgnH1(Ω)1/3\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g_{n}\|_{H^{1}(\Omega)}^{1/3} since f′′0f^{\prime\prime}\neq 0 at the maximas and minimas, see (7). However, since fgnH1(Ω)\|f-g_{n}\|_{H^{1}(\Omega)} decays exponentially with nn, the effect of this loss of accuracy is hardly noticeable.

Refer to caption
Figure 2. pμpνn1\|p_{\mu}-p_{\nu_{n}}\|_{1} as a function of nn in higher dimensions. (a) f(x,y)=sin(10(x+y))f(x,y)=\sin(10(x+y)). (b) f(x,y)=sin(x+y)f(x,y)=\sin(x+y). (c) f(x,y,z)=sin(x+y+z)f(x,y,z)=~{}\sin(x+y+z).

We repeat the same experiment in higher dimension, using the gPC interpolant on the Gauss-Legendre tensor-product quadrature points (using the Sparse Grid Matlab Kit [3]). First, for d=2d=2, we test f(x,y)=sin(10(x+y))f(x,y)=\sin(10(x+y)), for which we get similar behavior as we got for d=1d=1; compare Fig. 1(c) with Fig. 2(a). Next, to observe “simpler” exponential convergence, we test in d=2,3d=2,3 the functions f(x,y)=sin(x+y)f(x,y)=\sin(x+y) and f(x,y,z)=sin(x+y+z)f(x,y,z)=\sin(x+y+z), in Figs. 2(b) and 2(c), respectively. Note that in all three figures the error convergence is plotted against nn, the maximal polynomial degree, see e.g., (2). The computational cost, however, as represented by the number of grid point, hence the number of evaluations of ff, scales is (n+1)d(n+1)^{d}.

Refer to caption
Figure 3. Same as Fig. 1 for f(α)=|α|3f(\alpha)=|\alpha|^{3}. In (c), a polynomial fit of 2.62n1.442.62n^{-1.44} is presented (dots, black).

Our next two examples are of non-smooth functions. It is generally not advisable to approximate such functions with global polynomial methods such as gPC, and we do not promote this as a strategy in this work. Rather, we consider non-smooth functions to examine the sharpness and scope of our theory. In Fig. 3 we consider f(α)=|α|3f(\alpha)=|\alpha|^{3}. This is case not covered by Theorem 2, as fH3(Ω)f\in~{}H^{3}(\Omega) but not in H4(Ω)H^{4}(\Omega). Notwithstanding, we observe numerically that pμpνnL1()n1.44\|p_{\mu}-~{}p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim n^{-1.44}, which is comparable to fgnH1(Ω)\|f-g_{n}\|_{H^{1}(\Omega)} which by (4) converges as n1.5n^{-1.5}. Recall that the requirement in Theorem 2 that fH6f\in H^{6} stems from our use of Sobolev embedding in Lemma 3 to guarantee that gnC2\|g_{n}\|_{C^{2}} is uniformly bounded in nn. This boundedness is numerically satisfied in this example (results not shown), even though it is not guaranteed by our current theory.

Numerically, it seems that the kk-dependence of the upper bound (7) might not be tight. Since f(0)=f′′(0)=0f^{\prime}(0)=f^{\prime\prime}(0)=0, by applying (7) with k=2k=2 we expect that pμpνnL1()fgnH1(Ω)1/5n0.3\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g_{n}\|_{H^{1}(\Omega)}^{1/5}\lesssim-n^{-0.3}, which is much slower then what we observe in practice. Finally, consider a third function, f(α)=|α0.5|f(\alpha)=|\alpha-0.5|, which is in H1(Ω)H^{1}(\Omega) but not in H2(Ω)H^{2}(\Omega), see Fig. 4. While it is certainly not good practice to approximate such non-smooth functions with global polynomials, it is interesting to see that even so, pμpνnL1()n0.78\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim n^{-0.78}. In comparison, since fH2(Ω)f\not\in H^{2}(\Omega), the theoretically-predicted H1(Ω)H^{1}(\Omega) convergence rate of gng_{n} to ff by (4) is slower than 1/2-1/2, but was computed to be roughly 0.6-0.6 (results not shown). That convergence is obtained implies that stronger mechanisms of PDF convergence at at play than what our current theory accounts for.

Refer to caption
Figure 4. Same as Fig. 1 for f(α)=|α0.5|f(\alpha)=|\alpha-0.5|. In (c), a fit of 0.91n0.780.91n^{-0.78} is presented (dots, black).

This work also lays the foundations to the study of pushforwards by sparse-grid representations, which are of great practical importance. A key impediment to approximation in the multidimensional case d>1d>1 is the so-called curse of dimensionality - the computational cost of constructing gng_{n} grows exponentially with the dimension dd. For example, the collocation gPC on a tensor-grid of quadrature-points requires sampling ff at ndn^{d} points, which becomes non-feasible already for relatively moderate dimensions. A common approach to this problem in L2L^{2} approximation (which in the UQ context is equivalent to the approximation of moments) is by using sparse grid, which allow for superior orders of convergence [3, 5, 50, 51]. Applying our current analytic approach to sparse grids would require a detailed analysis of the convergence of sparse quadratures in high-order Sobolev spaces.555Not to be confused with the more standard topic of L2(Ω)L^{2}(\Omega) or H1(Ω)H^{1}(\Omega) convergence sparse-representations of high-regularity functions fHs(Ω)f\in H^{s}(\Omega) for s>1s>1. To the best of our knowledge, there are few relevant results in this direction, see [23] and the references therein. However, as Theorem 2 suggests for d=1d=1, it might be that a sharper result is possible in d>1d>1 as well. Such improvements of Theorem 1 might relax the dependence in high-order Sobolev norms to H1(Ω)H^{1}(\Omega) convergence, a widely studied and well understood topic [4, 5].

7. Acknowledgments

The author would like to thank A. Ditkowski, G. Fibich, S. Steinerberger, L. Tamellini, H. Wang, and M.I. Weinstein for useful comments and discussions. The author acknowledges the support of the AMS-Simons Travel Grant.

References

  • [1] M.J. Ablowitz and T.P. Horikis. Interacting nonlinear wave envelopes and rogue wave formation in deep water, Phys. Fluids, 27 (2015), pp. 012107.
  • [2] RA Adams and JJ Fournier, Sobolev Spaces, Elsevier, 2003.
  • [3] J Bäck, F Nobile, L Tamellini, and R Tempone. Stochastic spectral Galerkin and collocation methods for PDEs with random coefficients: a numerical comparison. In J.S. Hesthaven and E.M. Ronquist, editors, Spectral and High Order Methods for Partial Differential Equations, volume 76 of Lecture Notes in Computational Science and Engineering, pages 43–62. Springer, 2011.
  • [4] V Berthelmann, E Novak, and K Ritter, High dimensional polynomial interpolation on sparse grids, Adv. Comput. Math., 12, 1999, pp. 273–288.
  • [5] HJ Bungartz and M Griebel, Sparse grids, Acta Numerica, 13, 2004, pp. 147–269.
  • [6] T Butler, J Jakeman, and T Widely, Convergence of probability densities using approximate models for forward and inverse problems in uncertainty quantification, SIAM J. Sci. Comput., 40, 2018, A3523–A3548.
  • [7] Chae M and Walker SG, Wasserstein upper bounds of the total variation for smooth densities, Stats. Probab. Lett., 163 (2020), pp. 108771.
  • [8] C Canuto and A Quanteroni, Approximation results for orthogonal polynomials in Sobolev spaces, Math. Comput., 38 (1982), pp. 67–86.
  • [9] G Capodaglio, M Gunzburger, and HP Wynn, Approximation of probability density functions for SPDEs using truncated series expansions, arXiv preprint arXiv:1810.01028, 2018.
  • [10] Q.Y Chen, D. Gottlieb, and J.S. Hesthaven. Uncertainty analysis for the steady-state flows in a dual throat nozzle, J. Comput. Phys., 204 (2005), pp. 378–398.
  • [11] I. Colombo, F. Nobile, G. Porta, A. Scotti, and L. Tamellini. Uncertainty quantification of geochemical and mechanical compaction in layered sedimentary basins, Comput. Methods Appl. Mech. Engrg., 328 (2018), pp. 122–146.
  • [12] PG Constantine, MS Eldred, and ET Phipps, Sparse pseudospectral approximation method, Comput. Methods Appl. Mech. Engrg., 229 (2012), pp. 1–12.
  • [13] PJ Davis, Interpolation and Approximation, Dover, 1975.
  • [14] PJ Davis and P Rabinowitz, Numerical Integration, Academic, New York, 1975.
  • [15] A Ditkowski, G Fibich, and A Sagiv, Density estimation in uncertainty propagation problems using a surrogate model, SIAM/ASA J. Uncertain. Quantif., 8 (2020), pp. 261-300.
  • [16] A Ditkowski and R Katz, On spectral approximations with nonstandard weight functions and their implementations to generalized chaos expansions, J. Sci. Comput., 79 (2019), pp. 1985–2005.
  • [17] D Estep, A Malqvist, and S Tavener, Nonparametric density estimation for randomly perturbed elliptic problems I: computational methods, a posteriori analysis, and adaptive error control, SIAM J. Sci. Comput., 31, 2009, pp. 2935–2959.
  • [18] LC Evans, Partial Differential Equations, (Vol. 19) American Mathematical Society, 2010.
  • [19] B. Ganapathysubramanian and N. Zabaras. Sparse grid collocation schemes for stochastic natural convection problems, J. Comp. Phys., 225 (2007), pp. 652–685.
  • [20] R. Ghanem, D. Higdon, and H. Owhadi. Handbook of Uncertainty Quantification, Springer, New York, 2017.
  • [21] R. Ghanem and P.D. Spanos. Stochastic Finite Elements: a Spectral Approach, Springer-Verlag, New-York, 1991.
  • [22] A.L. Gibbs and F.E. Su, On choosing and bounding probability metrics, Int. Stats. Rev., 70:419–435, 2002.
  • [23] M Griebel and S Knapek, Optimized general sparse grid approximation spaces for operator equations, Math. Comput., 78, 2009, pp. 2223–2257.
  • [24] J.S. Hesthaven, S. Gottlieb, and D. Gottlieb. Spectral Methods for Time-Dependent Problem, Cambridge, UK, 2007.
  • [25] O.P. Le Maître, O.M. Knio, H.N. Najm, and R. Ghanem. Uncertainty propagation using Wiener–Haar expansions, J. Comp. Phys., 197 (2004), pp. 28–57.
  • [26] L. Le Cam and G.L. Yang. Asymptotics in Statistics: Some Basic Concepts, Springer Science & Business Media, New York, 2012.
  • [27] O.P. Le Maître, L. Mathelin, O.M. Knio, and M.Y. Hussaini. Asynchronous time integration for polynomial chaos expansion of uncertain periodic dynamics, Discrete Contin. Dyn. Syst, 28 (2010), pp. 199–226.
  • [28] A. O’Hagan. Polynomial chaos: A tutorial and critique from a statistician’s perspective. SIAM/ASA J. Uncertain. Quantif., 20 (2013), pp. 1–20.
  • [29] G. Patwardhan, X. Gao, A. Sagiv, A. Dutt, J. Ginsberg, A. Ditkowski, G. Fibich, and A.L. Gaeta. Loss of polarization of elliptically polarized collapsing beams. Phys. Rev. A, 99 (2019), pp. 033824.
  • [30] Piazzola C, Tamellini L, Pellegrini R, Broglia R, Serani A, and Diez M, Uncertainty quantification of ship resistance via multi-index stochastic collocation and radial basis function surrogates: a comparison, AIAA Aviation 2020 Forum, pp. 3160, 2020.
  • [31] A Quarteroni, R Sacco, and F Salero, Numerical Mathematics, Springer-Verlag, New York NY, USA, 2000.
  • [32] A Sagiv, The Wasserstein distances between pushed-forward measures with applications to uncertainty quantification, arXiv preprints, arXiv:1902.05451 (to appear in Comm. Math. Sci.)
  • [33] A Sagiv, A Ditkowski, and G Fibich, Loss of phase and universality of stochastic interactions between laser beams, Opt. Exp., 25 (2017), pp. 24387–24399.
  • [34] T. Salvemini, Sul calcolo degli indici di concordanza tra due caratteri quantitativi, Atti della I Riunione della Soc. Ital. di Statistica, Roma, 1943.
  • [35] F Santambrogio, Optimal Transport for Applied Mathematicians. Calculus of Variations, PDEs, and Modeling, Progress in Nonlinear Differential Equations and their Applications, Birkäuser, New York, 2015.
  • [36] G. Stefanou. The stochastic finite element method: past, present and future, Comput. Methods Appl. Mech. Engrg., 198 (2009), pp. 1031–1051.
  • [37] B. Sudret and A. Der Kiureghian. Stochastic Finite Element Methods and Reliability: a State-of-the-Art Report, Department of Civil and Environmental Engineering, University of California Berkeley, Berkeley, CA, 2000.
  • [38] G. Szego. Orthogonal Polynomials, Amer. Math. Soc. Colloq. Publ., 23, American Mathematical Society, New York, 1939.
  • [39] R Tipireddy and R Ghanem, Basis adaptation in homogeneous chaos spaces, J. Comput. Phys., 259, 2014, pp. 304–317.
  • [40] L.N. Trefethen. Approximation Theory and Approximation Practice, SIAM, Philadelphia, PA, 2013.
  • [41] A.B. Tsybakov. Introduction to Nonparametric Estimation, Springer, New York, 2009.
  • [42] S. Ullmann and J. Lang. POD-Galerkin modeling and sparse-grid collocation for a natural convection problem with stochastic boundary conditions, in Sparse Grids and Applications, Munich, Spring (2012), pp. 295–315.
  • [43] S.S. Vallender, Calculation of the Wassertein distance between probability distributions on the line, SIAM Theory Prob. Appl., 18:784–786, 1974.
  • [44] C Villani, Topics in Optimal Transportation, American Mathematical Society, 2003.
  • [45] X Wan and GE Karniadakis, An adaptive multi-element generalized polynomial chaos method for stochastic differential equations, J. Comput. Phys., 209 (2005), pp. 617–642.
  • [46] H Wang, How fast does the best polynomial approximation converge than Legendre projection? Numer. Math., 147 (2021), pp. 481–583.
  • [47] H. Wang and S. Xiang, On the convergence rates of Legendre approximation, Math. Comp., 81 (2012), pp. 861–877.
  • [48] L Wasserman, All of Nonparametric Statistics, Springer Science & Business Media, 2006.
  • [49] L Wasserman, All of Statistics: A Concise Course in Statistical Inference, Springer Science & Business Media, New York, 2004.
  • [50] D. Xiu. Numerical Methods for Stochastic Computations: a Spectral Method Approach. Princeton University, Princeton, NJ, 2010.
  • [51] D. Xiu and J.S. Hesthaven. High-order collocation methods for differential equations with random inputs. SIAM J. Sci. Comput., 27 (2005), pp. 1118–1139.
  • [52] D. Xiu and G.E. Karniadakis. The Wiener–Askey polynomial chaos for stochastic differential equations, SIAM J. Sci. Comput., 24 (2002), pp. 619–644.
  • [53] J Zech and Y. Marzouk, Sparse approximation of triangular transports on bounded domains, arXiv preprint arXiv:2006.06994, 2020.