Spectral convergence of probability densities for forward problems in uncertainty quantification

Amir Sagiv Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, USA [email protected]

Abstract.

The estimation of probability density functions (PDF) using approximate maps is a fundamental building block in computational probability. We consider forward problems in uncertainty quantification: the inputs or the parameters of a deterministic model are random with a known distribution. The scalar quantity of interest is a fixed measurable function of the parameters, and is therefore a random variable as a well. Often, the quantity of interest map is not explicitly known and difficult to compute, and so the computational problem is to design a good approximation (surrogate model) of the quantity of interest. For the goal of approximating the moments of the quantity of interest, there is a well developed body of research. One widely popular approach is generalized Polynomial Chaos (gPC) and its many variants, which approximate moments with spectral accuracy. However, it is not clear whether the quantity of interest can be approximated with spectral accuracy as well. This result does not follow directly from spectrally accurate moment estimation. In this paper, we prove convergence rates for PDFs using collocation and Galerkin gPC methods with Legendre polynomials in all dimensions. In particular, exponential convergence of the densities is guaranteed for analytic quantities of interest. In one dimension, we provide more refined results with stronger convergence rates, as well as an alternative proof strategy based on optimal-transport techniques.

Key words and phrases:

Density, Polynomial Chaos, Spectral, Uncertainty Quantification, Wasserstein, Legendre, pushforward.

2010 Mathematics Subject Classification:

33C47, 62G07, 65C50, 65D15

1. Introduction

In the modeling of physical phenomena, taking uncertainties into consideration is often done by introducing random parameters into deterministic models. If the quantity of interest (model output) is a measurable function of the inputs, then it is a random as well. In this problem, known as the forward problems in uncertainty quantification (UQ), the computational task is to compute the induced probability measure of the quantity of interest given a prescribed probability measure on the inputs and parameters. The type of relevant information varies between applications, but in many cases the full probability density function (PDF) is of interest; See e.g., [1, 10, 11, 17, 19, 27, 29, 33, 30, 42]. This paper will rigorously establish that there exist methods which approximate these PDFs with spectral accuracy.

One avenue to approximate the PDF in forward UQ problems is by nonparametric statistical approaches, e.g., a discretized cumulative distribution function, the histogram method, or kernel density estimators (KDE) [41, 48].

To obtain better accuracy, one needs to utilize the underlying structure of forward UQ problems. That is, denote the input Borel probability space by $(\Omega,\varrho)$ and the Borel measurable model-output map by $f:\Omega\to\mathbb{R}$ . The measure of interest $\mu=~{}f_{\sharp}\varrho$ is the pushforward of $\varrho$ by $f$ , i.e., $\mu(A)=\varrho(f^{-1}(A))$ for every Borel set $A\subseteq~{}\mathbb{R}$ . To take this structure into account, we turn to approximation-based methods (surrogate models). In this class of methods, $f$ is approximated by a simpler function $g$ for which the pushforward $g_{\sharp}\varrho$ is easier to compute [20, 50]. For smooth quantities of interest with low- or moderate-dimensional domains, surrogate models have become the standard [20, 37], most prominently the polynomial-based methods known as generalized polynomial chaos (gPC) [21, 28, 36, 50] and their many variants, see e.g., [25, 39, 45]. The success of the gPC methods relies on their spectral $L^{2}(\Omega)$ convergence, when the original map is sufficiently smooth and specific domain and boundary conditions are met [51, 52]. By spectral accuracy we mean the following: let $g_{n}$ be the gPC polynomial of order $n$ (see Sec. 2). Then $\|f-g_{n}\|_{L^{2}}$ decays polynomially in $n$ and the order of decay improves with the order of regularity of $f$ . Crucially, the error decays exponentially in $n$ if $f$ is analytic. $L^{2}(\Omega)$ convergence implies convergence of the moments of $\nu_{n}:\,=(g_{n})_{\sharp}\varrho$ to those of $\mu$ . Practically, this means that for analytic functions of interest $f$ , the moments converge exponentially fast in the order of approximation.

Question.

If $g_{n}$ is the gPC approximation of $f$ , does $p_{\nu_{n}}$ converge to $p_{\mu}$ with spectral accuracy?

That a sequence of functions $g_{n}$ converges to $f$ in $L^{2}$ does not guarantee that the resulting PDFs $p_{\nu_{n}}$ converge to $p_{\mu}$ in any $L^{q}(\mathbb{R})$ norm, see counter-example in [15]. Despite its practical relevance, the problem of density estimation using surrogate models has so far received little theoretical treatment, see one notable example in [6]. We previously showed that $L^{2}$ convergence of $g_{n}$ to $f$ guarantees convergence of $\nu_{n}$ to $\mu$ in the Wasserstein- $p$ metric, see details in [32] and in Sec. 5. Convergence in the Wasserstein metric, however, is in general weaker than convergence of the PDFs [22, 26]. To guarantee PDF approximation, Ditkowski, Fibich, and the author [15] proved that more strict regularity conditions and $C^{1}$ convergence of $g_{n}$ to $f$ guarantee convergence of $p_{\nu_{n}}$ to $p_{\mu}$ in $L^{q}(\mathbb{R})$ for all $1\leq q<\infty$ (see Theorem 4 below) and provided a spline-based method with provable convergence rates for the PDFs. Related, but not equivalent results, have also been obtained for certain monotonic triangular maps on $[-1,1]^{d}$ [53], and for the approximation of stochastic PDEs [9]. From a practical standpoint, it is desirable to have a surrogate model with spectrally convergent PDFs. One would like to prove that a well-known trade-off in function approximation (e.g., in $L^{2}$ ) holds for PDF approximation as well: on the one hand, local methods (like splines) are more robust to high derivatives and non-smoothness than spectral methods (like gPC). On the other hand, local methods are usually restricted to a fixed polynomial order of convergence, whereas spectral methods converge extremely fast for smooth functions. The question of spectral PDF convergence is therefore open and interesting.

In this work, we answer the above question affirmatively for the gPC expansion with respect to Legendre polynomials. To the best of our knowledge, these are the first results of this kind. Under high regularity conditions, the PDFs obtained by the gPC method converge spectrally (Theorem 1), and in particular converge exponentially fast for analytic quantities of interest. In the one-dimensional case we prove a stronger result, that $\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g_{n}\|_{H^{1}}$ (Theorem 2), and provide an alternative spectral convergence result (Theorem 5), for which the proof relies on the weak Wasserstein- $p$ convergence combined with a recent result by Chae and Walker [7]. In all these results, the analysis relies on three main analytic components - the relation between $f$ and $p_{\mu}$ (5), the approximation power of gPC in Sobolev spaces (4), and Sobolev-Morrey embeddings in compact domains (10). In this work, we study pushforward measures induced by full-grid (or tensor-product grid) polynomial approximations. In high dimensions, the use of tensor-product grids becomes computationally expansive due to the so-called curse of dimensionality - the computational cost required to achieve a certain accuracy increases exponentially with the dimension. Several approaches have been developed to perform high-dimensional uncertainty propagation, such as sparse grids [3, 5, 50, 51] and adaptive methods [45]. While these methods are beyond the scope is of this work, our analysis lays the numerical analysis foundation to understand the pushforward measures they induce, and thus their adequacy for density estimation.

1.1. Paper structure

The remainder of the paper is organized as follows: Sec. 2 provides the preliminaries on the gPC method, pushforward measures and a more detailed account of previous results. Secs. 3 and 4 detail the paper’s main theoretical results and their proofs, respectively. We then turn in Sec. 5 to provide an alternative, optimal-transport based analysis of the measure pushforward problem. Finally, in Sec. 6 we present numerical experiments and outline open problems.

2. Preliminaries

2.1. Generalized Polynomial Chaos

We present a brief review of the gPC method, see [50] for a thorough introduction. For clarity, we first review the gPC method for the one-dimensional case. The Legendre polynomials $\{p_{n}(\alpha)\}_{n=0}^{\infty}$ are the set of orthogonal polynomials with respect to the Lebesgue measure on $\Omega=~{}[-1,1]$ , see [38]

{\rm Deg}(p_{n})=n\,,\qquad\langle p_{n},p_{m}\rangle_{L^{2}(\Omega)}=\delta_{n,m}\,.

The Legendre polynomials constitute an orthonormal basis of $L^{2}(\Omega,d\alpha)$ , i.e.,

(1)

f(\alpha)=\sum\limits_{n=0}^{\infty}\hat{f}(n)p_{n}(\alpha)\,,\qquad\hat{f}(n):=\langle p_{n},f\rangle_{L^{2}(\Omega)}\,,\qquad f\in L^{2}(\Omega,d\alpha)\,.

The Galerkin-gPC expansion of a function $f\in L^{2}(\Omega)$ of order $n$ is the projection of $f$ to its first $n+1$ modes, i.e., $g_{n}(\alpha)=\sum_{j=0}^{n}\hat{f}(n)p_{n}(\alpha)$ . In the multidimensional case $\Omega=[-1,1]^{d}$ , the expansion of order $n$ of $f$ is defined as the $L^{2}(\Omega)$ projection of $f$ into the space of polynomials of maximal degree $n$ [8, 21, 52]

(2)

g_{n}=P_{n}f=\sum\limits_{\|{\bf j}\|_{\infty}\leq n}\langle p_{\bf j},f\rangle_{L^{2}(\Omega)}p_{\bf j}\,,\qquad n\in\mathbb{N}\,,

where $p_{\bf j}$ is the tensor-product Legendre polynomial of multi-index ${\bf j}=(j_{1},\ldots,j_{d})$ and $\|{\bf j}\|_{\infty}=\max_{i}|j_{i}|$ . If $f$ can only be evaluated at a discrete set of points, the integrals in expansion coefficients $\hat{f}(n)$ cannot be exactly computed. The collocation-gPC approach is to approximate these coefficients using the Gauss quadrature formula: Let $q:\Omega\to\mathbb{R}$ be an integrable function, then the Gauss-Legendre quadrature formula of order $n$ is $\int_{-1}^{1}q(\alpha)\,d\alpha\approx\sum_{k=1}^{n}g(\alpha_{k})w_{k}$ , where $\{\alpha_{k}\}_{k=1}^{N}$ are the distinct and real roots of $p_{N}(\alpha)$ , $w_{k}:=\int_{\Omega}l_{k}(\alpha)\,d\mu(\alpha)$ are the weights, and $l_{k}(\alpha)$ are the Lagrange interpolation polynomials with respect at the quadrature points [14]. Taking $q=p_{j}(\alpha)f(\alpha)$ yields the gPC collocation method in one dimension [51],

(3a)		$g_{N}(\alpha):\,=\sum\limits_{j=0}^{N-1}\hat{f}_{N}(j)p_{j}(\alpha)\,,$
(3b)		$\hat{f}(j)\approx\hat{f}_{N}(j):\,=\sum\limits_{k=1}^{N}f\left(\alpha_{k}\right)p_{j}\left(\alpha_{k}\right)w_{k},\qquad j=0,1,\ldots,N-1\,.$

The generalization of (3) to multiple dimensions is a direct result of integration by tensor-grid Gauss quadratures, see [8, 14, 51] for details. We further note that the gPC-collocation polynomial (3) is also the polynomial interpolant of degree $N-1$ in the Guass-Legendre points, i.e., $g_{n}(\alpha_{\bf k})=f(\alpha_{\bf k})$ for all quadrature points [12].

2.2. The approximation power of Legendre polynomials

Recall that for $A=[-1,1]^{d}$ or $\mathbb{R}^{d}$ , and any pair of integers $k,p\geq 1$ , the corresponding Sobolev space is defined as [2, 18]

W^{k,p}(A):\,=\left\{u:A\to\mathbb{C}~{}~{}|~{}~{}\max\limits_{\|{\bf j}\|_{1}\leq k}\|D^{\bf j}u\|_{L^{p}(A)}<\infty~{}\right\}\,,

where ${\bf j}$ is a multi-index and $D^{\bf j}u=(\Pi_{k=1}^{d}\partial_{\alpha_{k}}^{j_{k}})u(\alpha)$ . For $p=2$ we will adopt the conventional notation $W^{k,2}(A)=H^{k}(A)$ . Both the Galerkin (2) and the collocation expansion (3) converge spectrally in $L^{2}(\Omega)$ , where the underlying measure is the Lebsegue measure [24, 40, 47, 50]. By this we mean that if $f\in H^{k}(\Omega)$ for some $k\geq 0$ , then [8, 24, 50]

\|f-g_{n}\|_{L^{2}(\Omega)}\leq cn^{-k}\|f\|_{H^{k}(\Omega)}\,,

and if $f$ is analytic (in the multivariate sense), then $\|f-g_{n}\|_{L^{2}(\Omega)}$ decays exponentially in $n$ [13], See [53] and the references therein for details. From a probability and UQ point of view, the convergence in $L^{2}(\Omega)$ guarantees that the moments of $g_{n}$ converge to the moments of $f$ . Indeed, if $\varrho$ is the Lebesgue measure on $\Omega$ , a simple application of the Cauchy-Schwartz inequality shows that

|\mathbb{E}f-\mathbb{E}g_{n}|\leq\int_{\Omega}(f(\alpha)-g(\alpha))\,d\alpha\lesssim\|f-g_{n}\|_{L^{2}(\Omega)}\,,

and therefore for e.g., a smooth quantity of interest $f$ , the gPC method provides an accurate approximation of $\mathbb{E}f$ for a relatively low order $n$ . The convergence of other moments follow similarly, see e.g., [15].¹¹1For measures $\varrho$ which are not the Lebesgue measure, see [16]. Both gPC expansions (2) and (3) also converge in higher-regularity Sobolev spaces, as given by the now classical result:

Theorem (Canuto and Quarteroni [8]).

For any $1\leq\beta\leq\sigma$ , there exists a constant $C=C(\beta,\sigma)$ such that

(4a)		$\\|f-g_{n}\\|_{H^{\beta}(\Omega)}\leq Cn^{-e(\beta,\sigma)}\\|f\\|_{H^{\sigma}(\Omega)}\,,$
where $g_{n}$ is given by either (2) or (3), and
(4b)		$e(\beta,\sigma)=\sigma+\frac{1}{2}-2\beta\,.$

If $f$ is analytic (for $d=1$ ) on an ellipse in $\mathbb{C}$ with foci at $\pm 1$ and with both axis summing to $r>1$ , then in one dimension $\hat{f}(n)=O(n^{-1/2}r^{-n})$ [46, 47]. By Parseval identity (with respect to Legendre polynomials),

\|f-g_{n}\|_{L^{2}(\Omega)}^{2}=\sum\limits_{j=n+1}^{\infty}|\hat{f}(n)|^{2}\lesssim\sum\limits_{j=n+1}^{\infty}n^{-1}r^{-2n}\,,

and so the $L^{2}$ approximation error is almost exponential as well. Furthermore, by uniform convergence of $g_{n}$ to $f$ on $\Omega$ , one can differentiate $g_{n}$ term-wise, yielding faster than polynomial (exponential for all practical purposes) convergence of $g_{n}$ to $f$ in all Sobolev spaces $H^{k}(\Omega)$ with $r\geq 0$ .

2.3. Pushforward measures and prior results

The pushforward of a Borel measure $\varrho$ on $\Omega$ by a measurable $f:\Omega\to\mathbb{R}$ is a measure $\mu:\,=f_{*}\varrho$ defined by $\mu(A)=\varrho(f^{-1}(A))$ for any Borel subset $A\subseteq\mathbb{R}$ . If a measure $\mu$ on $\mathbb{R}$ is absolutely continuous with respect to the Lebesgue measure, its probability density function (PDF) is its Radon-Nykodim derivative, i.e., $p_{\mu}\in L^{1}(\mathbb{R})$ which satisfies $\mu(A)=\int_{A}p_{\mu}(y)\,dy$ for any Borel set $A\subseteq\mathbb{R}$ . Alternatively, if the cumulative distribution function (CDF) $F_{\mu}(y)=\mu(f^{-1}(-\infty,y))$ is differentiable, then $p_{\mu}(y)=dF_{\mu}(y)/dy$ . In the one-dimensional case, if $f$ is piecewise $C^{1}$ and piecewise monotonic, and $\varrho$ is an absolutely continuous probability measure, then [15]

(5)

p_{\mu}(y)=\sum\limits_{f(\alpha)=y}\frac{p_{\varrho}(\alpha)}{|f^{\prime}(\alpha)|}\,.

This relation is the source of many of the difficulties and peculiarities in understanding PDFs of pushforward measures. For example, if $f(J)=c$ for some constant $c$ on an interval $J$ with $\varrho(J)>0$ , then $\mu$ has a singular part at $c$ and therefore has no PDF. Even for a non-constant smooth monotonic function such as $f(\alpha)=\alpha^{2}$ and a simple $\varrho$ such as the uniform measure on $[0,1]$ , then $p_{\mu}(y)\sim 1/\sqrt{y}$ is singular. Analogously for $d>1$ , then $p_{\mu}\sim\int_{f^{-1}(y)}p_{\varrho}|\nabla f|^{-1}d\sigma$ , where $d\sigma$ is the $(d-1)$ -dimensional surface elements.

The practical goal of surrogate models in the context of PDF approximation is to approximate $\mu$ by $\nu_{n}=(g_{n})_{\sharp}\varrho$ with small error terms $\|p_{\mu}-p_{\nu_{n}}\|_{L^{q}(\mathbb{R})}$ for some $q\geq 1$ , while maintaining $n$ small, as $n$ is a good proxy to the computational cost. To see that, first note that since sampling arbitrarily many times from the polynomial $g_{n}$ is computationally cheap, approximating $p_{\nu_{n}}$ to arbitrary precision given $g_{n}$ is relatively cheap as well, see the analysis in [15]. Therefore, it is constructing $g_{n}$ which is costly. In the collocation gPC (3), the larger the degree of approximation $g=g_{n}$ is, the more evaluations of $f$ are required. If for example $f$ models the response of a partial differential equation (PDE), each such evaluation amounts to a solution of the PDE, which is usually computationally expansive. In the Galerkin-type gPC (2), one usually projects the original PDE with random parameters to several PDEs with deterministic parameters, the number of which also grows with the degree $n$ , see [50] for details. Therefore, in either case guaranteeing good accuracy for a small degree $n$ should be the goal of our analysis.

As noted in the introduction, $L^{2}(\Omega)$ convergence alone does not guarantee convergence of the pushforward PDFs. Indeed, one can construct a sequence such that $g_{n}\to f$ in $L^{2}(\Omega)$ but $\|p_{\nu_{n}}-p_{\mu}\|_{L^{q}(\mathbb{R})}>{\rm const}(q)$ for all $n,q\geq 1$ [15]. Previously however [32], we proved that $L^{2}$ convergence is sufficient to establish convergence in the weaker Wasserstein metric, or more precisely, that $\rm Wass_{2}(f_{\sharp}\varrho,g_{\sharp}\varrho)\leq\|f-g\|_{L^{2}(\Omega,\varrho)}$ (see relevant definitions in Sec. 5). Furthermore, we showed that uniform boundedness of $\|f-g_{n}\|_{\infty}$ combined with $L^{2}(\Omega)$ convergence guarantees Wasserstein- $p$ convergence for any $1\leq p<\infty$ , see (26) below. Since $\rm Wass_{1}(\mu,\nu)$ is equal to the $L^{1}(\mathbb{R})$ convergence of the CDFs [34, 43], the convergence of the CDFs in $L^{1}$ is a corollary of our result. A result in the direction of PDF approximation by surrogate methods was obtain by Ditkowski, Fibich, and the author in [15]. If $g_{n}$ is taken to be the spline interpolant of $f$ of order $m$ , on e.g., a uniform grid with a total of $N$ grid points, then $\|p_{\mu}-p_{\nu_{n}}\|_{L^{q}(\mathbb{R})}\lesssim N^{-m/d}$ for any $1\leq q<\infty$ . The main characteristic of splines that is useful to establish this result is the pointwise $C^{1}(\Omega)$ convergence of $g_{n}$ to $f$ , see Theorem 4 below. We will use this tool tool later to prove Theorem 1.

2.4. Problem formulation

Let $\Omega=\Omega(d)=[-1,1]^{d}$ be equipped with an absolutely continuous probability measure $\varrho$ . Consider a smooth function of interest $f:\Omega\to\mathbb{R}$ and let $g_{n}:\Omega\to\mathbb{R}$ be its generalized polynomial chaos (gPC) approximation - either its $L^{2}(\Omega)$ projection to the space of Legendre polynomials of order $\leq n$ (Galerkin type) or its polynomial interpolant on the Gauss-Legendre quadrature points of order $n$ (collocation type). Consider the pushforward measures $\mu:\,=f_{*}\varrho$ and $\nu_{n}:\,=(g_{n})_{*}\varrho$ and denote their respective probability density functions (PDF) by $p_{\mu}$ and $p_{\nu_{n}}$ . In what follows, we see under what conditions $p_{\nu_{n}}$ converges to $p_{\mu}$ in $L^{q}(\mathbb{R})$ for $1\leq q<\infty$ , and find the convergence rates.

3. Main results

The convergence of the pushed-forward PDFs is guaranteed by the following:

Theorem 1.

Let $\Omega=[-1,1]^{d}$ for any $d\geq 1$ , let $f\in H^{\sigma}(\Omega)$ where

(6)

\sigma\geq\sigma_{\min}(d)=\left\{\begin{array}[]{ll}5\frac{1}{2}+d\,,&d~{}{\rm even}\,,\\ 4\frac{1}{2}+d\,,&d~{}{\rm odd}\,,\end{array}\right.

let $d\varrho(\alpha)=r(\alpha)d\alpha$ with $r\in C^{1}(\Omega)$ , and assume that $|\nabla f|>\kappa_{f}>0$ . Then for any $1\leq q<\infty$ ,

\|p_{\mu}-p_{\nu_{n}}\|_{L^{q}(\mathbb{R})}\lesssim\|f-g_{n}\|_{C^{1}(\Omega)}\lesssim n^{-\sigma+\sigma_{\min}-2}\|f\|_{H^{\sigma}(\Omega)}\,.

For the one-dimensional case, the result can be improved:

Theorem 2.

Let $\Omega=[-1,1]$ , let $f\in H^{\sigma}(\Omega)$ with $\sigma\geq 6$ and let $d\varrho(\alpha)=r(\alpha)d\alpha$ with $r\in C^{1}(\Omega)$ . If $f^{\prime}(\alpha_{j})=0$ for finitely many points $\alpha_{1},\ldots,\alpha_{J}\in\Omega$ and there exist $k_{j}\geq 2$ for each $1\leq j\leq J$ such that $|f^{(k_{j}+1)}(\alpha_{j})|>0$ , then

(7)

\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g_{n}\|_{H^{1}(\Omega)}^{\frac{1}{2k+1}}\lesssim\|f\|_{H^{\sigma}(\Omega)}^{\frac{1}{2k+1}}n^{-\frac{2\sigma-3}{2(2k+1)}}\,,

where $k=\max_{j}k_{j}$ . In particular, if $|f^{\prime}(\alpha)|>\kappa_{f}>0$ for all $\alpha\in\Omega$ , then

(8)

\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g_{n}\|_{H^{1}(\Omega)}\lesssim\|f\|_{H^{\sigma}(\Omega)}n^{\frac{3}{2}-\sigma}\,.

Furthermore, if $f$ is analytic on an ellipse in $\mathbb{C}$ with foci at $\pm 1$ and with both radius summing to $r>1$ , $\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}$ converges faster than any polynomial in $n$ .

Theorem 2 improves Theorem 1 in the case of $d=1$ in two ways - First, it allows for points with $f^{\prime}=0$ . Second, Theorem 2 improves the convergence rate by three orders, from $n^{4\frac{1}{2}-\sigma}$ to $n^{\frac{3}{2}-\sigma}$ . Generally, Theorem 2 also yields much smaller constants. These improvements are owed to our ability to link the PDF convergence directly to $H^{1}(\Omega)$ convergence of $g_{n}$ to $f$ . Since the PDFs depend on $f^{\prime}$ , it is conjectured that convergence of $g_{n}$ to $f$ in $H^{1}(\Omega)$ is the weakest $H^{k}(\omega)$ convergence that guarantees convergence of the PDFs, thus possibly leading to a sharp rate in (8). In Theorem 1, we use $C^{1}(\Omega)$ convergence of $g_{n}$ to $f$ , which by Sobolev embedding depends on the much stronger, and therefore much slower, $H^{2+\lfloor d/2\rfloor}(\Omega)$ convergence. The improvement of rates also implies the improvement of constants, as can be observed from the Canuto and Quarteroni result (4). Suppose one wishes to guarantee $\|p_{\mu}-p_{\nu_{n}}\|_{1}\leq Cn^{-9/2}$ , and that $f^{\prime}>\kappa_{f}>0$ . Since Theorem 2 uses $H^{1}$ convergence, the constant would increase with $\|f\|_{H^{6}}$ , whereas in Theorem 1 it would depend on $\|f\|_{H^{9}}$ .

That the convergence constants depend on high-regularity Sobolev norm is emblematic of global methods in general, and spectral methods in particular: To approximate a function with high derivatives well, i.e., in the asymptotically guaranteed rate, one has to to use a gPC polynomial of a relatively high order. This high threshold resolution is often embodied in the constants of the upper bound. Here one draws a distinction between the spline-based surrogate proposed in [15] and the spectral methods: splines guarantee polynomial convergence with low-sensitivity to high-derivatives. Global polynomial methods, such as gPC, can provide exponential accuracy if $f$ is very smooth in comparison to the sampling resolution. Another way in which even Theorem 2 depends on high-order Sobolev norms, is that it requires that $f$ is at least in $H^{6}$ . This is a technical requirement that is needed to guarantee that $\|f\|_{C^{2}}$ does not grow with $n$ , see Lemma 3. Simulations seem to suggest that this is not a sharp requirement, see Sec. 6.

4. Proofs of main results

Throughout this paper we need to establish the $C^{1}$ approximation of $f$ by $g_{n}$ , and that $\|g_{n}\|_{C^{2}(\Omega)}$ is uniformly bounded in $n$ .

Lemma 3.

Under the conditions of Theorem 1, $\|g_{n}\|_{C^{2}(\Omega)}$ is uniformly bounded for all $n\in\mathbb{N}$ , and

(9)

\|f-g_{n}\|_{C^{1}(\Omega)}\lesssim n^{-\sigma+\sigma_{\min}+2}\|f\|_{H^{\sigma}(\Omega)}\,.

Proof.

Recall the Sobolev-Morrey embedding theorem [2, 18]: If $u\in H^{s}(\Omega)$ , and $s>d/2$ , then

(10)

\|u\|_{C^{s-\lfloor\frac{d}{2}\rfloor-1}(\Omega)}\lesssim\|u\|_{H^{s}(\Omega)}\,,

where $\lfloor x\rfloor$ is the lower integer value for any $x\geq 0$ .²²2In this study, we will not use a stronger version of these embeddings for Hölder norms $C^{k,\alpha}$ , but just the integer-power $C^{k}$ norms. By (10), choosing $\beta=3+\lfloor d/2\rfloor$ yields

(11)

\|f-g_{n}\|_{C^{2}(\Omega)}\lesssim\|f-g_{n}\|_{H^{\beta}(\Omega)}\,.

Applying the Sobolev approximation result (4) to (11) yields

(12)

\|f-g_{n}\|_{C^{2}}\lesssim n^{-e(\beta,\sigma)}\|f\|_{H^{\sigma}}\,.

To guarantee a uniform bound for all $n\in\mathbb{N}$ , it is sufficient to choose $\sigma\geq\sigma_{\rm min}$ such that $e(\beta,\sigma_{\rm min})=0$ . Since $\beta>1$ , then

	$\displaystyle 0$	$\displaystyle=e(\beta,\sigma_{\min})$
		$\displaystyle=\sigma_{\min}+\frac{1}{2}-2\beta$
		$\displaystyle=\sigma_{\min}+\frac{1}{2}-2(3+\lfloor\frac{d}{2}\rfloor)\,.$
		$\displaystyle\Longrightarrow\sigma_{\min}=\left\{\begin{array}[]{ll}5\frac{1}{2}+d\,,&d~{}{\rm even}\,,\\ 4\frac{1}{2}+d\,,&d~{}{\rm odd}\,.\end{array}\right.$

For $\sigma\geq\sigma_{\min}$ , then $e(\beta,\sigma)\geq 0$ in (12), and so $\|f-g_{n}\|_{C^{2}(\Omega)}\lesssim\|f\|_{H^{\sigma}(\Omega)}$ . Hence, since $f\in C^{2}(\Omega_{d})$ , then

	$\displaystyle\\|g_{n}\\|_{C^{2}(\Omega)}$	$\displaystyle\lesssim\\|g_{n}-f\\|_{C^{2}(\Omega)}+\\|f\\|_{C^{2}(\Omega)}$
		$\displaystyle\lesssim\\|f\\|_{H^{\sigma}(\Omega)}+\\|f\\|_{C^{2}(\Omega)}\,,\qquad n\in\mathbb{N}\,.$

We proceed to prove the estimate (9). By Sobolev-Morrey inequality (10) and (4), we have that

	$\displaystyle\\|f-P_{n}f\\|_{C^{1}(\Omega)}$	$\displaystyle\lesssim\\|f-P_{n}f\\|_{H^{2+\lfloor\frac{d}{2}\rfloor}(\Omega)}$
		$\displaystyle\lesssim\\|f\\|_{H^{\sigma}}n^{e(2+\lfloor\frac{d}{2}\rfloor,\sigma)(\Omega)}\,,$

where

	$\displaystyle e(2+\lfloor\frac{d}{2}\rfloor,\sigma)$	$\displaystyle=\sigma+\frac{1}{2}-4-2\lfloor\frac{d}{2}\rfloor$
		$\displaystyle=\left\{\begin{array}[]{ll}\sigma-3\frac{1}{2}-d\,,&d~{}{\rm even}\,,\\ \sigma-2\frac{1}{2}-d\,,&d~{}{\rm odd}\,,\end{array}\right.$

which is positive for all $\sigma>\sigma_{\min}$ . ∎

4.1. Proof of Theorem 1

Local $C^{1}$ convergence as established in Lemma 3 implies convergence of PDFs by the following result:

Theorem 4 (Corollary 5.5, [15]).

Let $(g_{n})_{n=1}^{\infty}\subset C^{1}(\Omega)$ , and consider $f\in C^{1}(\Omega)$ such that $|\nabla f|>\kappa_{f}>0$ . Then if

(13a)		$\\|g_{n}\\|_{C^{2}(\Omega_{d})}\leq K\,,$
for some constant $K$ and for all $n\in\mathbb{N}$ , and if
(13b)		$\\|f-g_{n}\\|_{C^{1}(\Omega)}\leq Kn^{-\tau}\,,\qquad\tau,K>0\,,$

then $|p_{\mu}(y)-p_{\nu_{n}}(y)|\lesssim n^{-\tau}$ for all but $o(n^{-\tau})$ points, and therefore

\|p_{\mu}-p_{\nu}\|_{L^{q}(\mathbb{R})}\lesssim n^{-\tau}\,,

for all $1\leq q<\infty$ . Furthermore, if $d=1$ and $g_{n}$ interpolates $f$ at the endpoints $f(\pm 1)=g_{n}(\pm 1)$ , then the uniform estimate $\|f-g_{n}\|_{L^{\infty}}\lesssim n^{-\tau}$ holds.

Indeed, (13a) is guaranteed explicitly and (9) guarantees (13b) with an explicit convergence rate $\tau=\sigma-\sigma_{\min}-2>0$ .

4.2. Proof of Theorem 2

For brevity, we omit the $n$ subscripts, denoting $g_{n}=g$ and $\nu=\nu_{n}$ . First, we treat the case where $|f^{\prime}|>\kappa_{f}>0$ . Suppose $f$ is monotonic increasing, i.e., $f^{\prime}>\kappa_{f}>0$ . By the embedding theorem (10), that $f\in H^{6}(\Omega)$ implies that $f\in C^{2}(\Omega)$ and so

(14)

p_{\mu}(y)=\sum\limits_{f(\alpha)=y}\frac{p_{\varrho}(\alpha)}{|f^{\prime}(\alpha)|}=\frac{r(f^{-1}(y))}{f^{\prime}(f^{-1}(y))}\,,

for every $y\in{\rm range}(f)$ [15, Lemma 4.1]. Since $g$ is a polynomial, $g\in C^{1}(\Omega)$ . Furthermore, by Lemma 3 we have that $\|f-g\|_{C^{1}}\lesssim n^{-5/2}\|f\|_{H^{4}(\Omega)}$ , and so for sufficiently large $n$ , $g^{\prime}(\alpha)>\kappa_{f}/2>0$ for all $\alpha\in\Omega$ . Therefore $p_{\nu}(y)=r(g^{-1}(y))/g^{\prime}(g^{-1}(y))$ for every $y\in{\rm range}(g)$ . It might be that the ranges of $f$ and $g$ , which are the supports of $p_{\mu}$ and $p_{\nu}$ , respectively, do not overlap. Assume for simplicity that $f(-1)=g(-1)$ but $g(1)>f(1)$ , the other cases can be treated similarly. Then

(15)

\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}=\int\limits_{f(-1)}^{f(1)}|p_{\mu}(y)-p_{\nu}(y)|\,dy+\int\limits_{f(1)}^{g(1)}p_{\nu}(y)\,dy\,.

We begin with the second integral -

	$\displaystyle\int\limits_{f(1)}^{g(1)}p_{\nu}(y)\,dy$	$\displaystyle=\int\limits_{f(1)}^{g(1)}\frac{r(g^{-1}(y))}{g^{\prime}(g^{-1}(y))}\,dy$
		$\displaystyle\leq\frac{2}{\kappa_{f}}\\|r\\|_{\infty}\|g(1)-f(1)\|$
		$\displaystyle\leq\frac{2}{\kappa_{f}}\\|r\\|_{\infty}\\|g-f\\|_{C^{0}(\Omega)}$
		$\displaystyle\lesssim\frac{2}{\kappa_{f}}\\|r\\|_{\infty}\\|g-f\\|_{H^{1}(\Omega)}\,,$

where the last inequality is due to the Sobolev-Morrey embedding (10). Therefore, we need only to consider the first integral in (15), and we therefore assume without loss of generality that ${\rm range}(f)={\rm range}(g)$ , and so

(16)

\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}=\int\limits_{f(-1)}^{f(1)}\left|\frac{r(f^{-1}(y))}{f^{\prime}(f^{-1}(y))}-\frac{r(g^{-1}(y))}{g^{\prime}(g^{-1}(y))}\right|\,dy\,.

Denote $\alpha=f^{-1}(y)$ and $\alpha_{*}:\,=\alpha_{*}(\alpha)=g^{-1}(f(\alpha))$ , then by change of variables

	$\displaystyle\\|p_{\mu}-p_{\nu}\\|_{L^{1}(\mathbb{R})}$	$\displaystyle=\int\limits_{-1}^{1}\left\|\frac{r(\alpha)}{f^{\prime}(\alpha)}-\frac{r(\alpha_{})}{g^{\prime}(\alpha_{})}\right\|f^{\prime}(\alpha)\,d\alpha$
		$\displaystyle=\int\limits_{-1}^{1}\frac{\|r(\alpha)g^{\prime}(\alpha_{})-f^{\prime}(\alpha)r(\alpha_{})\|}{g^{\prime}(\alpha^{*})}\,d\alpha\,.$

Since $g^{\prime}(\alpha)$ and $r(\alpha)$ are differentiable, then for any $\alpha\in\Omega$

(17)	$\displaystyle\|r(\alpha)g^{\prime}(\alpha_{*})-$	$\displaystyle f^{\prime}(\alpha)r(\alpha_{*})\|$
(18)		$\displaystyle\leq r(\alpha)\|g^{\prime}(\alpha_{})-g^{\prime}(\alpha)\|+r(\alpha)\|g^{\prime}(\alpha)-f^{\prime}(\alpha)\|+f^{\prime}(\alpha)\|r(\alpha)-r(\alpha_{})\|$
(19)		$\displaystyle\leq D\|\alpha-\alpha_{*}\|+r(\alpha)\|f^{\prime}(\alpha)-g^{\prime}(\alpha)\|\,,\qquad$

where $D:\,=\left[\|r\|_{\infty}\|g^{\prime\prime}\|_{\infty}+\|r^{\prime}\|_{\infty}\|f\|_{\infty}\right]$ . In general, $D=D_{n}$ as $g=P_{n}f$ depends on $n$ , and $\|g^{\prime\prime}\|_{\infty}$ might not be bounded. However, as in Lemma 3, $\|g^{\prime\prime}\|_{\infty}$ and therefore $D$ are uniformly bounded for all $n\geq 1$ . Since $g^{\prime}\geq\kappa_{f}/2$ for sufficiently large $n$ , then substituting (17) in the integral yields

(20)

\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}\lesssim\frac{1}{\kappa_{f}}\underbrace{\int\limits_{-1}^{1}r(\alpha)|f^{\prime}(\alpha)-g^{\prime}(\alpha)|\,d\alpha}_{:\,={\rm I}}+\underbrace{\frac{D}{\kappa_{f}}\int\limits_{-1}^{1}|\alpha-\alpha_{*}|\,d\alpha}_{:\,={\rm II}}\,.

Since $r(\alpha),(f^{\prime}-g^{\prime})\in L^{2}(\Omega)$ , then by the Cauchy-Schwartz inequality ${\rm I}$ is bounded from above by

\left|\langle r,f^{\prime}-g^{\prime}\rangle_{L^{2}(\Omega)}\right|\leq\|r\|_{L^{2}(\Omega)}\cdot\|f^{\prime}-g^{\prime}\|_{L^{2}(\Omega)}\leq\|r\|_{L^{2}(\Omega)}\cdot\|f-g\|_{H^{1}(\Omega)}\,

To bound ${\rm II}$ in (20) from above, we first note that by Lagrange’s mean-value theorem, there exists $\beta$ between $\alpha$ and $\alpha_{*}$ such that $g^{\prime}(\beta)(\alpha-\alpha_{*})=g(\alpha)-g(\alpha_{*})=g(\alpha)-f(\alpha)$ , and therefore $|\alpha-\alpha_{*}|\leq|g(\alpha)-f(\alpha)|/\kappa_{\rm f}$ . From here, the process of bounding ${\rm II}$ from above is the same as bounding ${\rm I}$ , which yields ${\rm II}\leq D/\kappa_{f}\|f-g\|_{L^{2}(\Omega)}$ . Therefore $\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g\|_{H^{1}(\Omega)}$ . Applying the relevant Sobolev approximation theorem (4), settles the case where $|f^{\prime}|>\kappa_{f}>0$ .

We now turn to the case where $g^{\prime}$ and $f^{\prime}$ vanish at finitely many points. As we show, it does not matter whether these zero points coincide or not, and we will therefore treat the case where $f^{\prime}(-1)=0$ , $g^{\prime}(-1)>0$ and $f^{\prime}(\alpha),g^{\prime}(\alpha)>0$ for all $\alpha\in(-1,1]$ . Assume without loss of generality that $f(-1)=g(-1)$ .³³3We showed how to treat the case where the ranges of $f$ and $g$ do not overlap above. Fix $\varepsilon>0$ and divide the integral for $\|p_{\mu}-p_{\nu}\|_{1}$ on the left hand side of (16) into two domains - (1) isolating the singular point in the PDF, i.e., $y\in[g(-1),g(-1+\varepsilon)]$ and (2) the rest of the domain, $y\in[g(-1+\varepsilon),g(1)]$ .

(1)

On the first domain $[g(-1),g(-1+\varepsilon)]$ , we take a crude estimation

(21a)		$\int\limits_{g(-1)}^{g(-1+\varepsilon)}\|p_{\mu}(y)-p_{\nu}(y)\|\,dy\leq\int\limits_{g(-1)}^{g(-1+\varepsilon)}p_{\mu}(y)+p_{\nu}(y)\,dy\,.$
Taking, for example $p_{\mu}(y)$ , by a change of variables $f(\alpha)=y$ we get
(21b)		$\int\limits_{g(-1)}^{g(-1+\varepsilon)}p_{\mu}(y)=\int\limits_{g(-1)}^{g(-1+\varepsilon)}\frac{r(f^{-1}(y))}{f^{\prime}(f^{-1}(y))}\,dy=\int\limits_{-1}^{-1+\varepsilon}r(\alpha)\,d\alpha\leq\\|r\\|_{\infty}\varepsilon\,.$

(2)

The integral on $[g(-1+\varepsilon),g(1)]$ reads the same as (20), where $\kappa_{f}$ is replaced by $\kappa_{f}^{\varepsilon}:\,=\min\limits_{x\in[-1+\varepsilon,1]}|f^{\prime}|$ , yielding

(22)

\int\limits_{g(-1+\varepsilon)}^{g(1)}|p_{\mu}(y)-p_{\nu}(y)|\,dy\leq\frac{\tilde{D}}{(\kappa_{f}^{\varepsilon})^{2}}\|f-g\|_{H^{1}}\,,

where $\hat{D}$ is $\varepsilon$ -independent. How does $\kappa_{f}^{\varepsilon}$ depend on $\varepsilon$ ? Since in the worst case $f(-1)=f^{\prime}(-1)=\cdots f^{(k)}(-1)=0$ but $f^{(k+1)}(-1)=0$ , then by Taylor expansion, $\kappa_{f}^{\varepsilon}\approx\varepsilon^{k}$ for sufficiently small $\varepsilon$ .

By combining both upper bounds, we have that

\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}\lesssim\varepsilon^{-2k}\|f-g\|_{H^{1}(\Omega)}+\|r\|_{\infty}\varepsilon\,.

This upper bound is minimized by equating its $\varepsilon$ derivative to zero, which yields $\|r\|_{\infty}\sim 2k\|f-g\|\varepsilon^{-2k-1}$ . Hence, for sufficiently small $\|f-g\|_{H^{1}}$ (i.e., for sufficiently large $n$ ), $\|p_{\mu}-p_{\nu}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g\|_{H^{1}(\Omega)}^{1/(2k+1)}$ .

Finally, we reduce the general case where $f^{\prime}$ has finitely many nodal points $\alpha_{1},\ldots,\alpha_{L}$ . In the integral $\int_{f(-1)}^{f(1)}|p_{\mu}(y)-p_{\nu}(y)|\,dy$ , we take out the intervals $f((\alpha_{j}-\varepsilon,\alpha_{j}+\varepsilon))$ and treat them separately, as in (21). For $y$ value outside these intervals, we claim that $f^{\prime}$ and $g^{\prime}$ have the same sign, for sufficiently large $n$ . We split the discussion into two cases. First, if $f^{\prime}$ changes its sign at $\alpha_{j}$ , then there are two points in the interval $(\alpha_{j}-\varepsilon,\alpha_{j}+\varepsilon)$ where $f^{\prime}$ is maximized and minimized. Taking $n$ to be sufficiently large, then $g^{\prime}$ has to be positive and negative at these points, respectively, due to the pointwise $C^{1}$ approximation (9). Therefore, $g^{\prime}$ must change its sign in between. It might be that $g^{\prime}$ changes its sign within this interval again, but outside of this interval, since $f^{\prime}>\kappa_{f}^{\varepsilon}>0$ , then $g^{\prime}$ has the same sign as $f^{\prime}$ there for sufficiently large $n$ . The second case is easier - if $f$ does not change its sign at $\alpha_{j}$ , then outside the interval $f^{\prime}>\kappa_{f}^{\varepsilon}>0$ , and so by (9) $g^{\prime}>\kappa_{f}^{\varepsilon}/2>0$ .

5. A transport-based convergence result for $d=1$

In this section, we present a different convergence result for the one-dimensional collocation gPC. This method of proof highlights the role that the “weaker” Wasserstein metric can play in understanding PDFs, and the potential such methods have for future works. The result only applied to the Gauss-Lobatto (GL), defined as the roots of $p_{n}^{\prime}(\alpha)$ , the derivative of the Legendre polynomial of order $n$ , plus the endpoints $\{-1,1\}$ . The polynomial interpolant at GL quadrature points admits the same Sobolev approximation theory as stated in (4) for the Gauss-Legendre points, see [8] and [31, Chapter 10]. As we will see, the following result is restricted to the GL interpolant since the condition $f(\pm 1)=g_{n}(\pm 1)$ guarantees that ${\rm range}(f)={\rm range}(g_{n})$ .

Theorem 5.

Let $\Omega=[-1,1]$ . For any integer $m\geq 1$ , and any function $f\in W^{\sigma,2}(\Omega)$ with $\sigma\geq 2m+4$ and $|f^{\prime}|\geq\kappa_{f}>0$ , let $g_{n}$ be its polynomial interpolant at the GL quadrature points of order $n$ , and suppose the $d\varrho(\alpha)=r(\alpha)d\alpha$ where $r\in W^{m,1}(\Omega)$ . Then

(23)

\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|p_{\mu}\|_{W^{m,1}(A)}^{\frac{1}{m+1}}\|f\|_{W^{\sigma,2}(\Omega)}^{\frac{1}{m+1}}n^{-\frac{m}{m+1}\left(\sigma-\frac{5}{6}\right)}\,,

where $A={\rm image}(f)$ .

For analytic functions, both Theorems 2 and 5 guarantee faster than polynomial convergence. For functions $f\in H^{\sigma}\setminus H^{\sigma-1}$ , however, Theorem 2 guarantees slightly better convergence rates. Theorem 5 does improve on Theorem 2 in that it relaxes the demand $r\in C^{1}$ to $r\in W^{m,1}$ . But as noted, the importance of Theorem 5 is the method of its proof, which relies on the Wasserstein distance.

Given two probability measures $\omega_{1}$ and $\omega_{2}$ on $\mathbb{R}$ , the Wasserstein- $1$ distance is defined as⁴⁴4To avoid confusion - $W^{k,p}$ denotes the Sobolev spaces of functions with $k$ derivatives which are $p$ integrable, and $\rm Wass_{p}$ denotes the Wasserstein- $p$ distance (instead of the standard $W_{p}$ ).

(24a)		$\rm Wass_{1}(\omega_{1},\omega_{2}):\,=\inf\limits_{\gamma\in\Gamma}\int\limits_{\mathbb{R}^{2}}\|x-y\|\,d\gamma(x,y)\,,$
where $\Gamma$ is the set of all measures $\gamma$ on $\mathbb{R}^{2}$ for which $\omega_{1}$ and $\omega_{2}$ are marginals, i.e., for any Borel $B\subseteq\mathbb{R}$ ,
(24b)		$\omega_{1}(B)=\int\limits_{\mathbb{R}\times B}\gamma(x,y)\,dy\,,\qquad\omega_{2}(B)=\int\limits_{B\times\mathbb{R}}\gamma(x,y)\,dx\,.$

Since $\omega_{1}(\mathbb{R})=\omega_{2}(\mathbb{R})=1$ , a minimizer of (24) exists, and so $\rm Wass_{1}(\omega_{1},\omega_{2})$ is finite, and it is a metric [35, 44]. Intuitively, the Wasserstein distance is often referred to as the earth-mover’s distance; $\rm Wass_{1}(\omega_{1},\omega_{2})$ computes the minimal work (distance times force) by which one can transfer a mound of earth in the mold of $\omega_{1}$ to a one that is in the mold of $\omega_{2}$ .

How does the Wasserstein distance relate to the problem of densities? In general, $\rm Wass_{1}(\omega_{1},\omega_{2})\lesssim\|p_{\omega_{1}}-p_{\omega_{2}}\|_{L^{1}(\mathbb{R})}$ , but not the other way around [22]. This is why, in general, the Wasserstein distance induces a weaker topology on the space of probability measures than that induced by the $L^{1}$ distance between the PDFs. Moreover, the Wasserstein distance is even well-defined for singular measures, which do not have densities at all. A recent result due to Chae and Walker, however, shows that if the densities are sufficiently regular, the Wasserstein- $1$ metric bounds from above the $L^{1}$ distance between the densities.

Theorem (Chae and Walker [7]).

Let $\omega_{1}$ and $\omega_{2}$ be two Borel measures on $A\subseteq\mathbb{R}$ with PDFs $p_{\omega_{1}},p_{\omega_{2}}\in W^{m,1}(A)$ for some $m\geq 1$ . Then

(25)

\|p_{\omega_{1}}-p_{\omega_{2}}\|_{L^{1}(A)}\lesssim\left(\|p_{\omega_{1}}\|_{W^{m,1}(A)}+\|p_{\omega_{2}}\|_{W^{m,1}(A)}\right)^{\frac{1}{m+1}}\rm Wass_{1}^{\frac{m}{m+1}}(\omega_{1},\omega_{2})\,.

As we will show in Lemma 6, one can verify under what conditions $p_{\mu}$ and $p_{\nu_{n}}$ are sufficiently regular. Therefore, to prove $L^{1}(\mathbb{R})$ convergence of $p_{\nu_{n}}$ to $p_{\mu}$ , it is sufficient to prove the convergence of $\rm Wass_{1}(\mu,\nu_{n})$ . The weak convergence in the Wasserstein metric has recently been established under much more general conditions by the author:

Theorem ([32]).

For any compact Borel set $\Omega\subset\mathbb{R}^{d}$ , and for every $1\leq p,q<\infty$ ,

(26)

\rm Wass_{p}(\mu,\nu_{n})\lesssim\|f-g_{n}\|_{L^{q}(\Omega)}^{\frac{q}{q+p}}\|f-g_{n}\|_{L^{\infty}(\Omega)}^{\frac{p}{q+p}}\,,

where the implicit constant depends only on $\Omega$ , $p$ , and $q$ .

Proof of Theorem 5.

To apply (25) to $\mu$ and $\nu_{n}$ , we need to show that their densities are sufficiently regular.

Lemma 6.

Let $m\geq 1$ and let $f\in W^{2m+4,2}(\Omega)$ with $|f^{\prime}(\alpha)|\geq\kappa_{f}>0$ for all $\alpha\in I$ , and denote $A={\rm range}(f)\subset\mathbb{R}$ . Then $p_{\mu}\in W^{m,1}(A)$ , $p_{\nu_{n}}\in W^{m,1}(A)$ for sufficiently large $n$ , and

\|p_{\nu_{n}}\|_{W^{m,1}(A)}\lesssim\|p_{\mu}\|_{W^{m,1}(A)}\,,

where the implicit constant does not depend on $n$ .

proof of lemma.

We begin, for simplicity, with $m=1$ . By (5), if $f$ is monotonic, formally,

(27)

\frac{d}{dy}p_{\mu}(y)=-r(f^{-1}(y))\frac{f^{{}^{\prime\prime}}(f^{-1}(y))}{|f^{\prime}(f^{-1}(y))|^{3}}+\frac{r^{\prime}(f^{-1}(y))}{|f^{\prime}(f^{-1}(y))|^{2}}\,.

By Sobolev-Morrey embedding (10), $f\in W^{6,2}(\Omega)\subseteq C^{2}(\Omega)$ . Combined with the fact that $|f^{\prime}|>\kappa_{f}>0$ , then $\frac{d}{dy}p_{\mu}(y)\in W^{1,1}(\mathbb{R})$ . For $g_{n}$ , due to Lemma 3, $|g_{n}^{\prime}(\alpha)|>\kappa_{f}/2>0$ on $\Omega$ for all sufficiently large $n$ , and $\|g_{n}^{\prime\prime}\|_{\infty}$ is also uniformly bounded for all $n\geq 1$ ( $g_{n}$ is a polynomial, and therefore smooth). By the analog of (27) for $\nu_{n}$ and $g_{n}$ , we have that $p_{\nu_{n}}\in W^{1,1}(\mathbb{R})$ . Since $|g_{n}^{\prime\prime}|$ is uniformly bounded in terms of $f$ and its derivatives, $\|p_{\nu_{n}}\|_{W^{1,1}(\mathbb{R})}\lesssim\|p_{\mu}\|_{W^{1,1}(\mathbb{R})}$ for all sufficiently large $n$ . Here it is key that our domain is one-dimensional, that $g_{n}$ interpolates $f$ at the endpoints $\alpha=\pm 1$ , and that both functions are monotonic. These three facts ensure that

A={\rm supp}(\mu)={\rm image}(f)={\rm image}(g_{n})={\rm supp}(\nu_{n})\,.

Suppose otherwise that, without loss of generality, $\max f(\alpha)\geq\max g_{n}(\alpha)=y_{*}$ . In this case $p_{\nu_{n}}$ would generically have a step-like discontinuity at $y_{*}\in A$ , and therefore $p_{\nu_{n}}\not\in W^{1,1}(A)$ .

For a general $m\geq 1$ , by direct differentiation one has that $\frac{d^{m}}{dy^{m}}p_{\mu}(y)$ is a sum of rational functions where the numerators depend on $f^{\prime},\ldots,f^{(m+1)}$ and in $r,\ldots,r^{(m)}$ , and the denominators are monomials in $f^{\prime}$ . One then generalizes Lemma 3 to show that adequately-high derivatives of $g_{n}$ are uniformly bounded in $n$ , which concludes the Lemma. ∎

Lemma 6 implies that (25) is applicable in our settings, and that $(\|p_{\mu}\|_{W^{m,1}(A)}+\|p_{\nu_{n}}\|_{W^{m,1}(A)})^{1/m+1}\lesssim\|p_{\mu}\|_{W^{m,1}(A)}^{1/m+1}$ , where the upper bound depends only on $f$ and its derivatives. Therefore

(28)

\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}=\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(A)}\lesssim\|p_{\mu}\|_{W^{m,1}(A)}^{\frac{1}{m+1}}\rm Wass_{1}^{\frac{1}{m+1}}(\mu,\nu_{n})\,.

Since $\Omega$ is compact, (26) can be applied to $\rm Wass_{1}(\mu,\nu_{n})$ such that (28) yields

(29)

\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|p_{\mu}\|_{W^{m,1}(A)}^{\frac{1}{m+1}}\|f-g_{n}\|_{L^{2}(\Omega)}^{\frac{2}{3(m+1)}}\|f-g_{n}\|_{L^{\infty}(\Omega)}^{\frac{1}{3(m+1)}}\,.

By the spectral $L^{2}$ convergence, $\|f-g_{n}\|_{L^{2}}\lesssim W^{\sigma,2}n^{-\sigma}$ , see (4). To bound the $L^{\infty}$ error, we use (10) in conjuction with (4) again, yielding $\|f-g_{n}\|_{C^{0}}\lesssim\|f\|_{W^{\sigma,2}}n^{-\sigma+2\frac{1}{2}}$ . Applying both of these upper bounds to (29) than yields

\displaystyle\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}

\displaystyle\lesssim\|p_{\mu}\|_{W^{m,1}(A)}^{\frac{1}{m+1}}\|f\|_{W^{\sigma,2}(\Omega)}^{\frac{1}{m+1}}n^{-\frac{m}{m+1}(\sigma-\frac{5}{6})}\,,

∎

The following heuristic argument suggests that the condition $|f^{\prime}|>\kappa_{f}>0$ is necessary for the proof of Theorem 5. Let $r\equiv 1/2$ be the uniform density, and suppose without loss of generality that $f$ is monotonic increasing, that $f(0)=f^{\prime}(0)=0$ , and that by Taylor expansion $f(\delta)=c\delta^{k}+o(\delta^{k})$ for some integer $k\geq 2$ and $|c|>0$ as $\delta\to 0$ . Then $f^{\prime}(\delta)=kc\delta^{k-1}+o(\delta^{k-1})$ and by direct substitution into (27)

\frac{d}{dy}p_{\mu}^{\prime}(y)\sim y^{-2+1/k}\,,\qquad y\to 0\,,

which is not integrable in any neighborhood of $y=0$ . Hence, $p_{\mu}\not\in W^{k,1}(A)$ for any $k\geq 1$ , and we cannot use (25).

6. Numerical experiments and open questions

Refer to caption — Figure 1. PDF approximation for $\mu=f_{\sharp}\varrho$ where $f(\alpha)=\sin(20\alpha)$ and $\varrho$ is the uniform probability measure on $[-1,1]$ . (a) $f(\alpha)$ . (b) $p_{\mu}$ (solid, black), $p_{\nu_{10}}$ (dots, red) and $p_{\nu_{50}}$ (dashes, blue), where $\nu_{n}=(g_{n})_{\sharp}\varrho$ where $g_{n}$ is the gPC collocation approximation of order $n$ . $p_{\mu}$ and $p_{\nu_{50}}$ are nearly indistinguishable. (c) $\|p_{\mu}-p_{\nu_{n}}\|_{1}$ as a function of $n$ .

We highlight some aspects of density-approximation using the gPC collocation method (3) by numerical experiments on $\Omega=[-1,1]$ where $d\varrho(\alpha)=1/2\,d\alpha$ is the uniform probability measure on $\Omega$ . We first consider $f(\alpha)=\sin(20\alpha)$ , see Fig. 1(a). The approximation in $L^{2}(\Omega)$ and $H^{1}(\Omega)$ of $f$ by polynomials is quite standard: a small number of collocation points (conversely, a low-order polynomial) does not suffice to resolve the oscillations of $f$ . Once $n$ is sufficiently large, since $f$ is analytic, we expect $g_{n}$ to converge to $f$ in $H^{s}(\Omega)$ exponentially fast, for every $s\geq 0$ . For the PDF, we see that indeed $p_{\nu_{10}}$ approximates $p_{\mu}$ quite poorly, whereas $p_{\nu_{50}}$ is nearly indistinguishable from $p_{\mu}$ ; see Fig. 1(b). Quantitatively, the $L^{1}(\mathbb{R})$ error between the PDFs follows the expected pattern - no convergence for $n\leq 30$ , but then a sharp, exponential decay until machine-precision is reached; see Fig. 1(c). Another interesting facet of this example is that since $f^{\prime}=0$ at several points, $p_{\mu}$ is singular at $\pm 1$ , see (5). Theorem 2 therefore implies that $\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g_{n}\|_{H^{1}(\Omega)}^{1/3}$ since $f^{\prime\prime}\neq 0$ at the maximas and minimas, see (7). However, since $\|f-g_{n}\|_{H^{1}(\Omega)}$ decays exponentially with $n$ , the effect of this loss of accuracy is hardly noticeable.

We repeat the same experiment in higher dimension, using the gPC interpolant on the Gauss-Legendre tensor-product quadrature points (using the Sparse Grid Matlab Kit [3]). First, for $d=2$ , we test $f(x,y)=\sin(10(x+y))$ , for which we get similar behavior as we got for $d=1$ ; compare Fig. 1(c) with Fig. 2(a). Next, to observe “simpler” exponential convergence, we test in $d=2,3$ the functions $f(x,y)=\sin(x+y)$ and $f(x,y,z)=\sin(x+y+z)$ , in Figs. 2(b) and 2(c), respectively. Note that in all three figures the error convergence is plotted against $n$ , the maximal polynomial degree, see e.g., (2). The computational cost, however, as represented by the number of grid point, hence the number of evaluations of $f$ , scales is $(n+1)^{d}$ .

Our next two examples are of non-smooth functions. It is generally not advisable to approximate such functions with global polynomial methods such as gPC, and we do not promote this as a strategy in this work. Rather, we consider non-smooth functions to examine the sharpness and scope of our theory. In Fig. 3 we consider $f(\alpha)=|\alpha|^{3}$ . This is case not covered by Theorem 2, as $f\in~{}H^{3}(\Omega)$ but not in $H^{4}(\Omega)$ . Notwithstanding, we observe numerically that $\|p_{\mu}-~{}p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim n^{-1.44}$ , which is comparable to $\|f-g_{n}\|_{H^{1}(\Omega)}$ which by (4) converges as $n^{-1.5}$ . Recall that the requirement in Theorem 2 that $f\in H^{6}$ stems from our use of Sobolev embedding in Lemma 3 to guarantee that $\|g_{n}\|_{C^{2}}$ is uniformly bounded in $n$ . This boundedness is numerically satisfied in this example (results not shown), even though it is not guaranteed by our current theory.

Numerically, it seems that the $k$ -dependence of the upper bound (7) might not be tight. Since $f^{\prime}(0)=f^{\prime\prime}(0)=0$ , by applying (7) with $k=2$ we expect that $\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim\|f-g_{n}\|_{H^{1}(\Omega)}^{1/5}\lesssim-n^{-0.3}$ , which is much slower then what we observe in practice. Finally, consider a third function, $f(\alpha)=|\alpha-0.5|$ , which is in $H^{1}(\Omega)$ but not in $H^{2}(\Omega)$ , see Fig. 4. While it is certainly not good practice to approximate such non-smooth functions with global polynomials, it is interesting to see that even so, $\|p_{\mu}-p_{\nu_{n}}\|_{L^{1}(\mathbb{R})}\lesssim n^{-0.78}$ . In comparison, since $f\not\in H^{2}(\Omega)$ , the theoretically-predicted $H^{1}(\Omega)$ convergence rate of $g_{n}$ to $f$ by (4) is slower than $-1/2$ , but was computed to be roughly $-0.6$ (results not shown). That convergence is obtained implies that stronger mechanisms of PDF convergence at at play than what our current theory accounts for.

This work also lays the foundations to the study of pushforwards by sparse-grid representations, which are of great practical importance. A key impediment to approximation in the multidimensional case $d>1$ is the so-called curse of dimensionality - the computational cost of constructing $g_{n}$ grows exponentially with the dimension $d$ . For example, the collocation gPC on a tensor-grid of quadrature-points requires sampling $f$ at $n^{d}$ points, which becomes non-feasible already for relatively moderate dimensions. A common approach to this problem in $L^{2}$ approximation (which in the UQ context is equivalent to the approximation of moments) is by using sparse grid, which allow for superior orders of convergence [3, 5, 50, 51]. Applying our current analytic approach to sparse grids would require a detailed analysis of the convergence of sparse quadratures in high-order Sobolev spaces.⁵⁵5Not to be confused with the more standard topic of $L^{2}(\Omega)$ or $H^{1}(\Omega)$ convergence sparse-representations of high-regularity functions $f\in H^{s}(\Omega)$ for $s>1$ . To the best of our knowledge, there are few relevant results in this direction, see [23] and the references therein. However, as Theorem 2 suggests for $d=1$ , it might be that a sharper result is possible in $d>1$ as well. Such improvements of Theorem 1 might relax the dependence in high-order Sobolev norms to $H^{1}(\Omega)$ convergence, a widely studied and well understood topic [4, 5].

7. Acknowledgments

The author would like to thank A. Ditkowski, G. Fibich, S. Steinerberger, L. Tamellini, H. Wang, and M.I. Weinstein for useful comments and discussions. The author acknowledges the support of the AMS-Simons Travel Grant.

References

[1] M.J. Ablowitz and T.P. Horikis. Interacting nonlinear wave envelopes and rogue wave formation in deep water, Phys. Fluids, 27 (2015), pp. 012107.
[2] RA Adams and JJ Fournier, Sobolev Spaces, Elsevier, 2003.
[3] J Bäck, F Nobile, L Tamellini, and R Tempone. Stochastic spectral Galerkin and collocation methods for PDEs with random coefficients: a numerical comparison. In J.S. Hesthaven and E.M. Ronquist, editors, Spectral and High Order Methods for Partial Differential Equations, volume 76 of Lecture Notes in Computational Science and Engineering, pages 43–62. Springer, 2011.
[4] V Berthelmann, E Novak, and K Ritter, High dimensional polynomial interpolation on sparse grids, Adv. Comput. Math., 12, 1999, pp. 273–288.
[5] HJ Bungartz and M Griebel, Sparse grids, Acta Numerica, 13, 2004, pp. 147–269.
[6] T Butler, J Jakeman, and T Widely, Convergence of probability densities using approximate models for forward and inverse problems in uncertainty quantification, SIAM J. Sci. Comput., 40, 2018, A3523–A3548.
[7] Chae M and Walker SG, Wasserstein upper bounds of the total variation for smooth densities, Stats. Probab. Lett., 163 (2020), pp. 108771.
[8] C Canuto and A Quanteroni, Approximation results for orthogonal polynomials in Sobolev spaces, Math. Comput., 38 (1982), pp. 67–86.
[9] G Capodaglio, M Gunzburger, and HP Wynn, Approximation of probability density functions for SPDEs using truncated series expansions, arXiv preprint arXiv:1810.01028, 2018.
[10] Q.Y Chen, D. Gottlieb, and J.S. Hesthaven. Uncertainty analysis for the steady-state flows in a dual throat nozzle, J. Comput. Phys., 204 (2005), pp. 378–398.
[11] I. Colombo, F. Nobile, G. Porta, A. Scotti, and L. Tamellini. Uncertainty quantification of geochemical and mechanical compaction in layered sedimentary basins, Comput. Methods Appl. Mech. Engrg., 328 (2018), pp. 122–146.
[12] PG Constantine, MS Eldred, and ET Phipps, Sparse pseudospectral approximation method, Comput. Methods Appl. Mech. Engrg., 229 (2012), pp. 1–12.
[13] PJ Davis, Interpolation and Approximation, Dover, 1975.
[14] PJ Davis and P Rabinowitz, Numerical Integration, Academic, New York, 1975.
[15] A Ditkowski, G Fibich, and A Sagiv, Density estimation in uncertainty propagation problems using a surrogate model, SIAM/ASA J. Uncertain. Quantif., 8 (2020), pp. 261-300.
[16] A Ditkowski and R Katz, On spectral approximations with nonstandard weight functions and their implementations to generalized chaos expansions, J. Sci. Comput., 79 (2019), pp. 1985–2005.
[17] D Estep, A Malqvist, and S Tavener, Nonparametric density estimation for randomly perturbed elliptic problems I: computational methods, a posteriori analysis, and adaptive error control, SIAM J. Sci. Comput., 31, 2009, pp. 2935–2959.
[18] LC Evans, Partial Differential Equations, (Vol. 19) American Mathematical Society, 2010.
[19] B. Ganapathysubramanian and N. Zabaras. Sparse grid collocation schemes for stochastic natural convection problems, J. Comp. Phys., 225 (2007), pp. 652–685.
[20] R. Ghanem, D. Higdon, and H. Owhadi. Handbook of Uncertainty Quantification, Springer, New York, 2017.
[21] R. Ghanem and P.D. Spanos. Stochastic Finite Elements: a Spectral Approach, Springer-Verlag, New-York, 1991.
[22] A.L. Gibbs and F.E. Su, On choosing and bounding probability metrics, Int. Stats. Rev., 70:419–435, 2002.
[23] M Griebel and S Knapek, Optimized general sparse grid approximation spaces for operator equations, Math. Comput., 78, 2009, pp. 2223–2257.
[24] J.S. Hesthaven, S. Gottlieb, and D. Gottlieb. Spectral Methods for Time-Dependent Problem, Cambridge, UK, 2007.
[25] O.P. Le Maître, O.M. Knio, H.N. Najm, and R. Ghanem. Uncertainty propagation using Wiener–Haar expansions, J. Comp. Phys., 197 (2004), pp. 28–57.
[26] L. Le Cam and G.L. Yang. Asymptotics in Statistics: Some Basic Concepts, Springer Science & Business Media, New York, 2012.
[27] O.P. Le Maître, L. Mathelin, O.M. Knio, and M.Y. Hussaini. Asynchronous time integration for polynomial chaos expansion of uncertain periodic dynamics, Discrete Contin. Dyn. Syst, 28 (2010), pp. 199–226.
[28] A. O’Hagan. Polynomial chaos: A tutorial and critique from a statistician’s perspective. SIAM/ASA J. Uncertain. Quantif., 20 (2013), pp. 1–20.
[29] G. Patwardhan, X. Gao, A. Sagiv, A. Dutt, J. Ginsberg, A. Ditkowski, G. Fibich, and A.L. Gaeta. Loss of polarization of elliptically polarized collapsing beams. Phys. Rev. A, 99 (2019), pp. 033824.
[30] Piazzola C, Tamellini L, Pellegrini R, Broglia R, Serani A, and Diez M, Uncertainty quantification of ship resistance via multi-index stochastic collocation and radial basis function surrogates: a comparison, AIAA Aviation 2020 Forum, pp. 3160, 2020.
[31] A Quarteroni, R Sacco, and F Salero, Numerical Mathematics, Springer-Verlag, New York NY, USA, 2000.
[32] A Sagiv, The Wasserstein distances between pushed-forward measures with applications to uncertainty quantification, arXiv preprints, arXiv:1902.05451 (to appear in Comm. Math. Sci.)
[33] A Sagiv, A Ditkowski, and G Fibich, Loss of phase and universality of stochastic interactions between laser beams, Opt. Exp., 25 (2017), pp. 24387–24399.
[34] T. Salvemini, Sul calcolo degli indici di concordanza tra due caratteri quantitativi, Atti della I Riunione della Soc. Ital. di Statistica, Roma, 1943.
[35] F Santambrogio, Optimal Transport for Applied Mathematicians. Calculus of Variations, PDEs, and Modeling, Progress in Nonlinear Differential Equations and their Applications, Birkäuser, New York, 2015.
[36] G. Stefanou. The stochastic finite element method: past, present and future, Comput. Methods Appl. Mech. Engrg., 198 (2009), pp. 1031–1051.
[37] B. Sudret and A. Der Kiureghian. Stochastic Finite Element Methods and Reliability: a State-of-the-Art Report, Department of Civil and Environmental Engineering, University of California Berkeley, Berkeley, CA, 2000.
[38] G. Szego. Orthogonal Polynomials, Amer. Math. Soc. Colloq. Publ., 23, American Mathematical Society, New York, 1939.
[39] R Tipireddy and R Ghanem, Basis adaptation in homogeneous chaos spaces, J. Comput. Phys., 259, 2014, pp. 304–317.
[40] L.N. Trefethen. Approximation Theory and Approximation Practice, SIAM, Philadelphia, PA, 2013.
[41] A.B. Tsybakov. Introduction to Nonparametric Estimation, Springer, New York, 2009.
[42] S. Ullmann and J. Lang. POD-Galerkin modeling and sparse-grid collocation for a natural convection problem with stochastic boundary conditions, in Sparse Grids and Applications, Munich, Spring (2012), pp. 295–315.
[43] S.S. Vallender, Calculation of the Wassertein distance between probability distributions on the line, SIAM Theory Prob. Appl., 18:784–786, 1974.
[44] C Villani, Topics in Optimal Transportation, American Mathematical Society, 2003.
[45] X Wan and GE Karniadakis, An adaptive multi-element generalized polynomial chaos method for stochastic differential equations, J. Comput. Phys., 209 (2005), pp. 617–642.
[46] H Wang, How fast does the best polynomial approximation converge than Legendre projection? Numer. Math., 147 (2021), pp. 481–583.
[47] H. Wang and S. Xiang, On the convergence rates of Legendre approximation, Math. Comp., 81 (2012), pp. 861–877.
[48] L Wasserman, All of Nonparametric Statistics, Springer Science & Business Media, 2006.
[49] L Wasserman, All of Statistics: A Concise Course in Statistical Inference, Springer Science & Business Media, New York, 2004.
[50] D. Xiu. Numerical Methods for Stochastic Computations: a Spectral Method Approach. Princeton University, Princeton, NJ, 2010.
[51] D. Xiu and J.S. Hesthaven. High-order collocation methods for differential equations with random inputs. SIAM J. Sci. Comput., 27 (2005), pp. 1118–1139.
[52] D. Xiu and G.E. Karniadakis. The Wiener–Askey polynomial chaos for stochastic differential equations, SIAM J. Sci. Comput., 24 (2002), pp. 619–644.
[53] J Zech and Y. Marzouk, Sparse approximation of triangular transports on bounded domains, arXiv preprint arXiv:2006.06994, 2020.

	$\displaystyle\int\limits_{f(1)}^{g(1)}p_{\nu}(y)\,dy$	$\displaystyle=\int\limits_{f(1)}^{g(1)}\frac{r(g^{-1}(y))}{g^{\prime}(g^{-1}(y))}\,dy$
		$\displaystyle\leq\frac{2}{\kappa_{f}}\\|r\\|_{\infty}\|g(1)-f(1)\|$
		$\displaystyle\leq\frac{2}{\kappa_{f}}\\|r\\|_{\infty}\\|g-f\\|_{C^{0}(\Omega)}$
		$\displaystyle\lesssim\frac{2}{\kappa_{f}}\\|r\\|_{\infty}\\|g-f\\|_{H^{1}(\Omega)}\,,$

(17)	$\displaystyle\|r(\alpha)g^{\prime}(\alpha_{*})-$	$\displaystyle f^{\prime}(\alpha)r(\alpha_{*})\|$
(18)		$\displaystyle\leq r(\alpha)\|g^{\prime}(\alpha_{})-g^{\prime}(\alpha)\|+r(\alpha)\|g^{\prime}(\alpha)-f^{\prime}(\alpha)\|+f^{\prime}(\alpha)\|r(\alpha)-r(\alpha_{})\|$
(19)		$\displaystyle\leq D\|\alpha-\alpha_{*}\|+r(\alpha)\|f^{\prime}(\alpha)-g^{\prime}(\alpha)\|\,,\qquad$