
A note on sampling recovery of multivariate functions in the uniform norm

Kateryna Pozharska, Tino Ullrich (corresponding author: [email protected])

Institute of Mathematics of NAS of Ukraine, 01024 Kyiv, Ukraine
TU Chemnitz, Faculty of Mathematics, 09107 Chemnitz, Germany
Abstract

We study the recovery of multivariate functions from reproducing kernel Hilbert spaces in the uniform norm. Our main interest is to obtain preasymptotic estimates for the corresponding sampling numbers. We obtain results in terms of the decay of related singular numbers of the compact embedding into $L_2(D,\varrho_D)$ multiplied with the supremum of the Christoffel function of the subspace spanned by the first $m$ singular functions. Here the measure $\varrho_D$ is at our disposal. As an application we obtain near optimal upper bounds for the sampling numbers for periodic Sobolev type spaces with a general smoothness weight. These can be bounded in terms of the corresponding benchmark approximation numbers in the uniform norm, which allows for preasymptotic bounds. By applying a recently introduced sub-sampling technique related to Weaver’s conjecture we mostly lose a factor of $\sqrt{\log n}$ and sometimes even less. Finally we point out a relation to the corresponding Kolmogorov numbers.

Keywords and phrases : Sampling recovery, least squares approximation, random sampling, rate of convergence, uniform norm

2010 AMS Mathematics Subject Classification : 41A25, 41A60, 41A63, 42A10, 68W40, 94A20

1 Introduction

We consider the problem of the recovery of complex-valued multivariate functions on a compact domain $D\subset\mathds{R}^d$ in the uniform norm. We are allowed to use function samples on a suitable set of nodes $\mathbf{X}:=\{\mathbf{x}^1,\dots,\mathbf{x}^n\}\subset D$. The functions are modeled as elements from some reproducing kernel Hilbert space $H(K)$ with kernel $K\colon D\times D\to\mathds{C}$. The nodes $\mathbf{X}$ are drawn independently according to a tailored probability measure depending on the spectral properties of the embedding of $H(K)$ in $L_2(D,\varrho_D)$. This “random information” is used for the whole class of functions, and our main focus in this paper is on worst-case errors. We additionally apply a recently introduced sub-sampling technique, see [24], to further reduce the sampling budget. Note that in one scenario the measure $\varrho_D$ is at our disposal and represents a certain degree of freedom. In another scenario the measure $\varrho_D$ is fixed, namely the one from which the data is coming.

Authors distinguish two main paradigms for the recovery of functions from “samples”. Here “samples” refers to the available information. The first one uses general “linear” information about functions, i.e., evaluations given by $n$ arbitrary linear continuous functionals, for instance inner products of the Hilbert space such as Fourier or wavelet coefficients. The corresponding quantity that measures the minimal worst-case error is given by the Gelfand numbers/widths, see the monographs [26, 27, 29]. In case of a Hilbert source space (as in our setting) these quantities equal the so-called approximation numbers, which are usually denoted by $a_n$ and represent the worst-case error obtained by linear recovery algorithms. Precisely, let $H(K)$ be a reproducing kernel Hilbert space (RKHS) of multivariate complex-valued functions $f(\mathbf{x})=f(x_1,\dots,x_d)$, $\mathbf{x}\in D\subset\mathds{R}^d$, with the kernel $K\colon D\times D\rightarrow\mathds{C}$. Consider the identity operator $\mathrm{Id}\colon H(K)\rightarrow F$. Further, by $\mathcal{L}(H(K),F)$ we denote the set of linear continuous operators $A\colon H(K)\rightarrow F$. The approximation numbers $a_n(\mathrm{Id})$ are defined as follows

a_n(\mathrm{Id}\colon H(K)\rightarrow F):=\inf\limits_{\substack{A\in\mathcal{L}(H(K),F)\\ \operatorname{rank} A<n}}\ \sup\limits_{\|f\|_{H(K)}\leq 1}\|f-Af\|_F. \qquad (1.1)

If $F$ is a Hilbert space, then the $a_n$ coincide with the singular numbers $\sigma_n$ of the corresponding embedding.

Several practical applications are modeled with “standard information”, i.e., a finite number of function values at some points. Let $f(\mathbf{x}^1),\dots,f(\mathbf{x}^n)$ be the given data, where $\mathbf{X}=\{\mathbf{x}^1,\dots,\mathbf{x}^n\}$, $\mathbf{x}^i\in D$, $i=1,\dots,n$, are the nodes. The corresponding sampling numbers are defined as follows:

g_n(\mathrm{Id}\colon H(K)\rightarrow F):=\inf_{\mathbf{X}=\{\mathbf{x}^1,\dots,\mathbf{x}^n\}}\ \inf_{R\in\mathcal{L}(\mathds{C}^n,F)}\ \sup_{\|f\|_{H(K)}\leq 1}\|f-R(f(\mathbf{X}))\|_F. \qquad (1.2)

Note that we always have $a_n(\mathrm{Id})\leq g_n(\mathrm{Id})$.

Our goal is to recover functions using a (weighted) least squares algorithm and measure the error in $\ell_\infty$. Function values are taken at independently drawn random nodes in $D$ according to a tailored probability measure $\varrho$. The main feature of this approach is that the nodes are drawn once for the whole class. This is usually termed “random information”. Note that we do not investigate Monte Carlo algorithms in this paper.

A recent breakthrough was made by D. Krieg and M. Ullrich [14] in establishing a new connection between sampling numbers and singular values for $L_2$-approximation. This has been extended by L. Kämmerer, T. Ullrich and T. Volkmer [9], where bounds with high probability have been obtained; see also M. Moeller and T. Ullrich [23] and M. Ullrich [38] for further refinements. In all the mentioned contributions a logarithmic oversampling is needed, i.e., $n\asymp m\log m$. A new sub-sampling technique has been applied by N. Nagel, M. Schäfer and T. Ullrich in [24], where it is shown that $n=\mathcal{O}(m)$ samples are sufficient. For non-Hilbert function classes we refer to D. Krieg and M. Ullrich [13] and V.N. Temlyakov [35], where a connection to Kolmogorov widths is pointed out. There is also a relation in the $\ell_\infty$-setting, see [32] for details.

In the present paper we obtain worst-case recovery guarantees with high probability for the weighted least squares recovery operator $\widetilde{S}_{\mathbf{X}}^m$ (see Algorithm 1), where the error is measured in the supremum norm of the space $\ell_\infty(D)$ of functions bounded on $D$. The operator $\widetilde{S}_{\mathbf{X}}^m$ uses a density function $\varrho_m$ tailored to the spectral properties of the corresponding embedding into $L_2(D,\varrho_D)$, see (3.3) and (3.4). We need to impose stronger conditions, namely we consider a compact set $D\subset\mathds{R}^d$, a continuous kernel $K(\mathbf{x},\mathbf{y})$ such that $\sup_{\mathbf{x}\in D}\sqrt{K(\mathbf{x},\mathbf{x})}<\infty$, and a Borel measure $\varrho_D$ with full support (such that Mercer’s theorem is applicable). Then we get for $m:=\lfloor\frac{n}{c_1 r\log n}\rfloor$, $r>1$, the following error bound with probability larger than $1-c_2 n^{1-r}$

\sup\limits_{\|f\|_{H(K)}\leq 1}\|f-\widetilde{S}^m_{\mathbf{X}}f\|^2_{\ell_\infty(D)}\leq c_3\max\Big\{\frac{N_{K,\varrho_D}(m)}{m}\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_k^2,\ \sum\limits_{k\geq\lfloor m/2\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}\Big\} \qquad (1.3)

with three universal constants $c_1,c_2,c_3>0$, where $N_{K,\varrho_D}(m)$ denotes the supremum of the spectral function (Christoffel function) which is defined in Section 2. When applying this theorem to periodic spaces with mixed smoothness $H^s_{\rm mix}(\mathds{T}^d)$ we obtain the following result (see Theorem 7.8).

If $s>(1+\log_2 d)/2$, $r>1$, $m=\lfloor n/(10r\log n)\rfloor$ and $\beta=2s/(1+\log_2 d)>1$, then

\sup_{\|f\|_{H^s_{\rm mix}(\mathds{T}^d)}\leq 1}\|f-S_{\mathbf{X}}^m f\|^2_{L_\infty(\mathds{T}^d)}\leq 1612\left(\frac{16}{3}\right)^{\beta}\frac{\beta}{\beta-1}\left(\frac{m}{2}-1\right)^{-\beta+1} \qquad (1.4)

with probability larger than $1-3n^{1-r}$. Of course, the bound depends on the chosen norm in $H^s_{\rm mix}(\mathds{T}^d)$; in the above statement we used the $\#$-norm, see (7.7). This is a preasymptotic statement.
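To get a feeling for the size of the right-hand side of (1.4), the bound can simply be evaluated numerically. The following minimal sketch does this for the hypothetical parameter choice $d=4$, $s=2$ (so that $\beta=4/3>1$); these values are only illustrative and not taken from the paper.

```python
import numpy as np

def bound_1_4(m, d, s):
    # right-hand side of (1.4), a bound on the *squared* worst-case error;
    # requires s > (1 + log2 d)/2, i.e. beta > 1
    beta = 2 * s / (1 + np.log2(d))
    return 1612 * (16 / 3) ** beta * beta / (beta - 1) * (m / 2 - 1) ** (-beta + 1)

for m in [64, 256, 1024, 4096]:
    print(m, bound_1_4(m, d=4, s=2))
```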

Note that the sampling operator uses $n\asymp m\log m$ sample points, so that we obtain a bound for $g_{\lfloor cm\log m\rfloor}$. By a recently introduced sub-sampling technique (see [24]) we obtain another result for the sampling numbers, which may be better in many cases, namely

g_n(\mathrm{Id}\colon H(K)\rightarrow\ell_\infty(D))^2\leq c_4\max\Big\{\frac{N_{K,\varrho_D}(n)\log n}{n}\sum\limits_{k\geq\lfloor c_5 n\rfloor}\sigma_k^2,\ \sum\limits_{k\geq\lfloor c_5 n\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}\Big\}\,, \qquad (1.5)

where $c_4,c_5>0$ are universal constants. We evaluate this bound for several relevant examples to obtain new bounds for sampling numbers. Here the question arises which assumptions are reasonable to ensure a certain polynomial order $q^*$ of the error decay, i.e., the supremum over all $q$ such that the worst-case error is of the order $n^{-q}$. We improve a result by F. Y. Kuo, G. W. Wasilkowski and H. Woźniakowski [21] and show that $q^{\rm std}_\infty=p-1/2$ in case $N_{K,\varrho_D}(k)=\mathcal{O}(k)$ and $\sigma_n=\mathcal{O}(n^{-p})$. We actually prove more: if there is a measure $\varrho_D$ such that $N_{K,\varrho_D}(k)=\mathcal{O}(k)$, we obtain as a consequence of (1.3) and (1.5) together with [4, Lem. 3.3] the relation

g_n(\mathrm{Id})\leq C_{\varrho_D,K}\min\{a_{\lfloor n/(c_6\log n)\rfloor}(\mathrm{Id}),\ \sqrt{\log n}\cdot a_{\lfloor c_5 n\rfloor}(\mathrm{Id})\}\,, \qquad (1.6)

where $c_6>0$ is an absolute constant and $C_{\varrho_D,K}>0$ depends on the kernel $K$ and $\varrho_D$, see Corollary 4.3. This improves on a recent result by L. Kämmerer, see [8, Theorem 4.11], which was stated under stronger conditions (and which is, in turn, an improvement of the previous result by L. Kämmerer and T. Volkmer [10]). It shows that in case $N_{K,\varrho_D}(k)=\mathcal{O}(k)$ we only lose a $\sqrt{\log n}$ or sometimes even less (assuming a polynomial decay of the $a_n$). In case that $N_{K,\varrho_D}(k)=\mathcal{O}(k^u)$ with $u>1$, we have at least that $q^{\rm std}_\infty\geq p-u/2$.

As to spaces with mixed smoothness $H^s_{\rm mix}(\mathds{T}^d)$, $s>1/2$, we know by the work of V.N. Temlyakov [34] that

g_n(\operatorname{I}_s)\asymp_{s,d} n^{-s+1/2}(\log n)^{(d-1)s}\,,

where the optimal algorithm is a (deterministic) Smolyak type interpolation operator. Our algorithm yields the bound

g_n(\operatorname{I}_s\colon H^s_{\rm mix}(\mathds{T}^d)\to L_\infty(\mathds{T}^d))\lesssim_{s,d} n^{-s+1/2}(\log n)^{(d-1)s+\min\{s-1/2,1/2\}}\,,

which is only worse by a small $d$-independent $\log$-term. However, as indicated in (1.4), we also obtain preasymptotic bounds for $g_n(\operatorname{I}_s)$ from (1.3) and (1.5). The open question remains whether random i.i.d. nodes are asymptotically optimal for this framework. Such a question has been recently answered positively for isotropic Sobolev spaces in [11].

Let us point out that the above general results are also applicable to the more general periodic Sobolev type spaces $H^w(\mathds{T}^d)$ with a weight sequence $(w(\mathbf{k}))_{\mathbf{k}\in\mathds{Z}^d}$. If $(\tau_n)_n$ represents the non-increasing rearrangement of the square summable sequence $(1/w(\mathbf{k}))_{\mathbf{k}\in\mathds{Z}^d}$, then we have

g_n(\operatorname{I}_w\colon H^w(\mathds{T}^d)\to L_\infty(\mathds{T}^d))^2\leq c_7\min\Big\{\log n\sum\limits_{k\geq\lfloor c_9 n\rfloor}\tau_k^2,\ \sum\limits_{k\geq\lfloor n/(c_8\log n)\rfloor}\tau_k^2\Big\}

with three universal constants $c_7,c_8,c_9>0$. In most of the cases we do not know whether there is a deterministic method which may compete with this bound.

Notation. By $\mathds{N}$ we denote the set of natural numbers, putting $\mathds{N}_0=\mathds{N}\cup\{0\}$; by $\mathds{Z}$ the set of integers, by $\mathds{R}$ the real and by $\mathds{C}$ the complex numbers. As usual, $a\in\mathds{R}$ is decomposed into $a=\lfloor a\rfloor+\{a\}$, where $0\leq\{a\}<1$ and $\lfloor a\rfloor\in\mathds{Z}$. Let further $\mathds{N}^d$, $\mathds{N}_0^d$, $\mathds{Z}^d$, $\mathds{R}^d$, $\mathds{C}^d$, $d\geq 2$, be the spaces of $d$-dimensional vectors $\mathbf{x}=(x_1,\dots,x_d)$ with coordinates from $\mathds{N}$, $\mathds{N}_0$, $\mathds{Z}$, $\mathds{R}$, and $\mathds{C}$, respectively. The notation $\mathbf{x}\leq\mathbf{y}$, where $\mathbf{x},\mathbf{y}\in\mathds{C}^d$, stands for $x_i\leq y_i$, $i=1,\dots,d$ (the relations $\geq$, $<$, $>$ and $=$ are defined analogously); $\overline{\mathbf{x}}$ is the complex conjugate of $\mathbf{x}$ and $\mathbf{x}^\top$ the transpose of $\mathbf{x}$. For $\mathbf{k}\in\mathds{Z}^d$ and $\mathbf{x}\in\mathds{C}^d$ we write $\mathbf{x}^{\mathbf{k}}:=\prod_{i=1}^d x_i^{k_i}$, assuming that $0^0:=1$. Let also $|\mathbf{k}|=(|k_1|,\dots,|k_d|)$ and let $|\mathbf{k}|_0$ be the number of nonzero components (sparsity) of $\mathbf{k}$. The symbols $(\mathbf{x},\mathbf{y})$ or $\mathbf{x}\cdot\mathbf{y}$ denote the Euclidean scalar product in $\mathds{R}^d$ or $\mathds{C}^d$. We use the notation $\log$ for the natural logarithm. By $\mathds{C}^{m\times n}$ we denote the set of all $m\times n$ matrices $\mathbf{L}$ with complex entries, where, in what follows, $\|\mathbf{L}\|$ or $\|\mathbf{L}\|_{2\to 2}$ is their spectral norm (the largest singular value) and $\mathbf{L}^*$ is the adjoint matrix of $\mathbf{L}$, i.e., the matrix obtained by taking the transpose and then the complex conjugate of each entry of $\mathbf{L}$. For an operator $A\in\mathcal{L}(X,Y)$ between two Banach spaces we use the symbol $\|A\|$ for its operator norm, i.e.,

\|A\|:=\sup\limits_{f\in X,\,f\neq 0}\frac{\|Af\|_Y}{\|f\|_X}.

If $X,Y$ are Hilbert spaces, by $A^*$ we denote the operator $A^*\colon Y\rightarrow X$ adjoint to $A$, i.e., for $f\in X$, $g\in Y$ the corresponding scalar products satisfy $(Af,g)_Y=(f,A^*g)_X$. The notation $A^{-1}$ stands for the inverse operator. For two sequences $(a_n)_{n=1}^\infty,(b_n)_{n=1}^\infty\subset\mathds{R}$ we write $a_n\lesssim b_n$ if there exists a constant $c>0$ such that $a_n\leq cb_n$ for all $n$. We write $a_n\asymp b_n$ if $a_n\lesssim b_n$ and $b_n\lesssim a_n$. If the constant $c$ depends on the dimension $d$ and the smoothness $s$, we indicate this by $\lesssim_{s,d}$ and $\asymp_{s,d}$.

2 Reproducing kernel Hilbert spaces

We will work in the framework of reproducing kernel Hilbert spaces. The relevant theoretical background can be found in [1, Chapt. 1] and [3, Chapt. 4].

Let $D\subset\mathds{R}^d$ be an arbitrary subset and let $\varrho_D$ be a measure on $D$. By $L_2(D,\varrho_D)$ we denote the space of complex-valued square-integrable functions with respect to $\varrho_D$, where

(f,g)_{L_2(D,\varrho_D)}:=\int_D f(\mathbf{x})\overline{g(\mathbf{x})}\,{\rm d}\varrho_D(\mathbf{x}),

and, respectively,

\|f\|_{L_2(D,\varrho_D)}=\left(\int_D|f(\mathbf{x})|^2\,{\rm d}\varrho_D(\mathbf{x})\right)^{1/2}.

By $\ell_\infty(D)$ we denote the space of functions bounded on $D$, for which

\|f\|_{\ell_\infty(D)}:=\sup\limits_{\mathbf{x}\in D}|f(\mathbf{x})|.

Note that one does not need the measure $\varrho_D$ for this embedding. In fact, here we mean “boundedness” in the strong sense (in contrast to essential boundedness w.r.t. the measure $\varrho_D$).

We further consider a reproducing kernel Hilbert space $H(K)$ with a Hermitian positive definite kernel $K(\mathbf{x},\mathbf{y})$ on $D\times D$. The crucial property is the identity

f(\mathbf{x})=(f,K(\cdot,\mathbf{x}))_{H(K)}

for all $\mathbf{x}\in D$. It ensures that point evaluations are continuous functionals on $H(K)$. We will use the notation from [3, Chapt. 4]. In the framework of this paper, the boundedness of the kernel $K$ is assumed, i.e.,

\|K\|_\infty:=\sup\limits_{\mathbf{x}\in D}\sqrt{K(\mathbf{x},\mathbf{x})}<\infty. \qquad (2.1)

The condition (2.1) implies that $H(K)$ is continuously embedded into $\ell_\infty(D)$, i.e.,

\|f\|_{\ell_\infty(D)}\leq\|K\|_\infty\cdot\|f\|_{H(K)}\,. \qquad (2.2)

The embedding operator

\mathrm{Id}\colon H(K)\to L_2(D,\varrho_D) \qquad (2.3)

is Hilbert-Schmidt under the finite trace condition on the kernel

\operatorname{tr}(K):=\|K\|_2^2=\int_D K(\mathbf{x},\mathbf{x})\,{\rm d}\varrho_D(\mathbf{x})<\infty, \qquad (2.4)

see [7], [33, Lemma 2.3], which is a weaker condition than (2.1). Since we study in this paper the recovery of functions in the uniform norm, we assume from now on that the kernel $K$ is bounded. We additionally assume that $H(K)$ is infinite dimensional.

Consider the embedding $\mathrm{Id}$ from (2.3) and $W_{\varrho_D}=\mathrm{Id}^*\circ\mathrm{Id}\colon H(K)\rightarrow H(K)$. Due to the compactness of $\mathrm{Id}$, the operator $W_{\varrho_D}$ provides an (at most) countable system of strictly positive eigenvalues. Let $(\lambda_n)_{n=1}^\infty$ be a non-increasing rearrangement of these eigenvalues and let $(\sigma_n)_{n=1}^\infty$ be the sequence of singular numbers of $\mathrm{Id}$, i.e., $\sigma_k:=\sqrt{\lambda_k}$, $k=1,2,\dots$. Further, by $(e^*_n(\mathbf{x}))_{n=1}^\infty\subset H(K)$ we denote the system of right eigenfunctions (corresponding to the ordered system $(\lambda_n)_{n=1}^\infty$), and by $(\eta_n(\mathbf{x}))_{n=1}^\infty\subset L_2(D,\varrho_D)$ the system consisting of $\eta_k(\mathbf{x}):=\sigma_k^{-1}e^*_k(\mathbf{x})$, $k=1,2,\dots$, which both represent orthonormal systems (ONS) in the respective spaces.
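For a concrete kernel this spectral data can be approximated numerically by discretizing the integral operator associated with $K$ and $\varrho_D$. The following minimal sketch does this under illustrative assumptions of our own choosing (not taken from the paper): $D=[0,1]$ with the uniform measure and a simple translation-invariant Mercer kernel whose eigenvalues are known in closed form, so the output can be checked.

```python
import numpy as np

# Illustrative kernel: K(x, y) = sum_k (1 + |k|)^{-2s} e^{2 pi i k (x - y)} on [0,1]
# with the uniform measure; its eigenvalues are exactly (1 + |k|)^{-2s}.
s, kmax, N = 1.5, 100, 400
x = (np.arange(N) + 0.5) / N                      # quadrature nodes for rho_D
ks = np.arange(-kmax, kmax + 1)
w2 = (1.0 + np.abs(ks)) ** (-2 * s)               # exact eigenvalues lambda_k (unordered)
phase = np.exp(2j * np.pi * np.outer(x, ks))      # e^{2 pi i k x_j}
Kmat = (phase * w2) @ phase.conj().T              # kernel matrix K(x_i, x_j)

lam, U = np.linalg.eigh(Kmat / N)                 # discretized integral operator
lam, U = lam[::-1], U[:, ::-1]                    # non-increasing eigenvalues lambda_k
eta = np.sqrt(N) * U                              # approximate eta_k at the nodes

print("computed:", np.round(lam[:5], 5))
print("exact:   ", np.round(np.sort(w2)[::-1][:5], 5))
print("L2 norms of eta_1..eta_3:", np.round(np.sqrt((np.abs(eta[:, :3])**2).mean(0)), 4))
```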

We would like to emphasize that the embedding (2.3) is not necessarily injective. As discussed in [3, p. 127], $\mathrm{Id}$ is in general not injective, as it maps a function to an equivalence class. As a consequence, the system of eigenvectors $(e^*_n(\mathbf{x}))_{n=1}^\infty$ may not be a basis in $H(K)$ (note that $H(K)$ may not even be separable). However, there are conditions which ensure that the ONS $(e^*_n(\mathbf{x}))_{n=1}^\infty$ is an orthonormal basis (ONB) in $H(K)$, see [3, Chapt. 4.5], which is related to Mercer’s theorem, see [3, Theorem 4.49]. Indeed, if we additionally assume that the kernel $K$ is bounded and continuous on $D\times D$ (for a compact domain $D\subset\mathds{R}^d$), then $H(K)$ is separable and consists of continuous functions, see [1, Theorems 16, 17]. If we finally assume that the measure $\varrho_D$ is a Borel measure with full support, then $(e^*_n(\mathbf{x}))_{n=1}^\infty$ is a complete orthonormal system, i.e., an ONB in $H(K)$.

In this case we have the pointwise identity (see, e.g., [1, Cor. 4])

K(\mathbf{x},\mathbf{y})=\sum\limits_{k=1}^\infty\overline{e^*_k(\mathbf{y})}\,e^*_k(\mathbf{x}),\quad\quad\mathbf{x},\mathbf{y}\in D\,. \qquad (2.5)

In what follows, we assume that the kernel $K$ is continuous on a compact domain $D$ (a so-called Mercer kernel). Such kernels satisfy all the conditions indicated above (see [3, Theorem 4.49]), and we even have (2.5) with absolute and uniform convergence on $D\times D$. Taking into account the spectral theorem [3, Chapt. 4.5], the system $(e^*_n(\mathbf{x}))_{n=1}^\infty$ of eigenfunctions of the non-negative compact self-adjoint operator $W_{\varrho_D}$ is an orthonormal basis in $H(K)$. Note that $(\eta_n(\mathbf{x}))_{n=1}^\infty$ is also an ONB in $L_2(D,\varrho_D)$.

Hence, for every $f\in H(K)$ with a Mercer kernel $K$, it holds

f(\mathbf{x})=\sum_{k=1}^\infty(f,e^*_k)_{H(K)}\,e^*_k(\mathbf{x}), \qquad (2.6)

where $(f,e^*_k)_{H(K)}$, $k=1,2,\dots$, are the Fourier coefficients of $f$ with respect to the system $(e^*_n(\mathbf{x}))_{n=1}^\infty$. Let us further denote by

P_m f:=\sum\limits_{k=1}^m (f,e^*_k)_{H(K)}\,e^*_k(\cdot)

the projection onto the space $\operatorname{span}\{e^*_1(\cdot),\dots,e^*_m(\cdot)\}$. Due to Parseval’s identity, the norm of the space $H(K)$ takes the following form

\|f\|^2_{H(K)}=\sum_{k=1}^\infty|(f,e^*_k)_{H(K)}|^2. \qquad (2.7)

Note that for $m\in\mathds{N}$ the function

N_{K,\varrho_D}(m,\mathbf{x})=\sum_{k=1}^{m-1}|\eta_k(\mathbf{x})|^2

is often called “Christoffel function” in the literature (see [6] and the references therein). For $V_m=\operatorname{span}\{\eta_1(\cdot),\dots,\eta_{m-1}(\cdot)\}$, we define

N_{K,\varrho_D}(m):=\sup_{\mathbf{x}\in D}N_{K,\varrho_D}(m,\mathbf{x})=\sup\limits_{\substack{f\in V_m\\ f\neq 0}}\|f\|_\infty^2/\|f\|_2^2\,. \qquad (2.8)
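To illustrate the growth behaviour of $N_{K,\varrho_D}(m)$, the following minimal sketch evaluates (2.8) numerically for two example orthonormal systems chosen by us for illustration (they are not prescribed here): the trigonometric system on $[0,1)$ with the uniform measure, where $N(m)=m-1$, and the $L_2$-normalized Legendre polynomials on $[-1,1]$ with the Lebesgue measure, where $N(m)$ grows like $m^2$.

```python
import numpy as np
from numpy.polynomial import legendre

def christoffel_trig(m, x):
    # |eta_k(x)| = 1 for every exponential, so N(m, x) = m - 1 (B = 1 in (2.10))
    return np.full_like(x, float(m - 1))

def christoffel_legendre(m, x):
    # eta_n = sqrt((2n+1)/2) P_n are L2([-1,1], dx)-normalized Legendre polynomials
    total = np.zeros_like(x)
    for n in range(m - 1):
        c = np.zeros(n + 1); c[n] = 1.0
        total += (2 * n + 1) / 2.0 * legendre.legval(x, c) ** 2
    return total

x = np.linspace(-1, 1, 2001)
for m in [5, 17, 65]:
    # Legendre maximum (m-1)^2/2 is attained at the endpoints, matching the
    # quadratic growth mentioned in Remark 3.4
    print(m, christoffel_trig(m, x).max(), christoffel_legendre(m, x).max())
```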

We may use the quantity $N_{K,\varrho_D}(m)$ to bound the operator norm $\|\mathrm{Id}-P_{m-1}\|_{K,\infty}$ (we use the abbreviation $\|A\|_{K,\infty}:=\|A\|_{H(K)\to\ell_\infty(D)}$). It holds

\sup\limits_{\|f\|_{H(K)}\leq 1}\|f-P_{m-1}f\|_\infty=\sup\limits_{\mathbf{x}\in D}\sup\limits_{\|f\|_{H(K)}\leq 1}|f(\mathbf{x})-P_{m-1}f(\mathbf{x})|=\sup\limits_{\mathbf{x}\in D}\sup\limits_{\|f\|_{H(K)}\leq 1}\Big|\sum\limits_{k\geq m}(f,e_k^*(\cdot))_{H(K)}\,e_k^*(\mathbf{x})\Big|\,.

This implies

\|\mathrm{Id}-P_{m-1}\|^2_{K,\infty}=\sup\limits_{\mathbf{x}\in D}\sum\limits_{k\geq m}|e_k^*(\mathbf{x})|^2\,.

Using that $e_k^*(\mathbf{x})=\sigma_k\eta_k(\mathbf{x})$, where the sequence $(\sigma_n)_{n=1}^\infty$ is non-increasing, i.e., for all $2^l\leq k<2^{l+1}$ it holds $\sigma_k^2\leq\frac{1}{2^{l-1}}\sum_{2^{l-1}\leq j<2^l}\sigma_j^2$, we obtain by a straightforward calculation

\|\mathrm{Id}-P_{m-1}\|^2_{K,\infty}=\sup\limits_{\mathbf{x}\in D}\sum_{l=\lfloor\log_2 m\rfloor}^\infty\sum_{2^l\leq k<2^{l+1}}|\sigma_k\eta_k(\mathbf{x})|^2\leq\sum_{l=\lfloor\log_2 m\rfloor}^\infty\frac{N_{K,\varrho_D}(2^{l+1})}{2^{l-1}}\sum_{2^{l-1}\leq j<2^l}\sigma_j^2.

Hence, in view of the relations $2^{l+1}\leq 4j$ and $1/2^{l-1}<2/j$, which hold for all $2^{l-1}\leq j<2^l$, we get

\begin{aligned}
\|\mathrm{Id}-P_{m-1}\|^2_{K,\infty} &\leq\sum_{l=\lfloor\log_2 m\rfloor}^\infty\sum_{2^{l-1}\leq j<2^l}\frac{2N_{K,\varrho_D}(4j)}{j}\sigma_j^2\leq 2\sum_{k=\lfloor 2^{\log_2 m-1}\rfloor}^\infty\frac{N_{K,\varrho_D}(4k)}{k}\sigma_k^2\\
&=2\sum\limits_{k\geq\lfloor m/2\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}. \qquad (2.9)
\end{aligned}

Note that the system $(\eta_k(\mathbf{x}))_{k=1}^\infty$ is not necessarily a uniformly $\ell_\infty$-bounded ONS in $L_2(D,\varrho_D)$; nevertheless the identity (2.5) holds. For systems $(\eta_k(\mathbf{x}))_{k=1}^\infty$ satisfying

\|\eta_k(\mathbf{x})\|_{\ell_\infty(D)}\leq B,\quad k\in\mathds{N},

we have

N_{K,\varrho_D}(m)\leq(m-1)B^2. \qquad (2.10)

3 Least squares approximation in the uniform norm

Let further $\mathbf{X}=\{\mathbf{x}^1,\dots,\mathbf{x}^n\}$, $\mathbf{x}^i=({\rm x}_1^i,\dots,{\rm x}_d^i)\in D$, $i=1,\dots,n$, be nodes drawn independently according to a Borel measure $\varrho_D$ on $D$ with full support. In [9] a recovery operator $S^m_{\mathbf{X}}$ was constructed which computes the best least squares fit $S^m_{\mathbf{X}}f$ to the given data $\mathbf{f}(\mathbf{X}):=(f(\mathbf{x}^1),\dots,f(\mathbf{x}^n))$ from the finite-dimensional space spanned by $\eta_k(\mathbf{x})$, $k=1,\dots,m-1$.

Namely, let $\mathbf{f}:=(f(\mathbf{x}^1),\dots,f(\mathbf{x}^n))^\top$, $\mathbf{c}:=(c_1,\dots,c_{m-1})^\top$, $(\eta_k(\mathbf{x}))_{k=1}^\infty=(\sigma_k^{-1}e^*_k(\mathbf{x}))_{k=1}^\infty$ and

\mathbf{L}_{n,m}:=\mathbf{L}_{n,m}(\mathbf{X})=\begin{pmatrix}\eta_1(\mathbf{x}^1)&\eta_2(\mathbf{x}^1)&\cdots&\eta_{m-1}(\mathbf{x}^1)\\ \vdots&\vdots&\ddots&\vdots\\ \eta_1(\mathbf{x}^n)&\eta_2(\mathbf{x}^n)&\cdots&\eta_{m-1}(\mathbf{x}^n)\end{pmatrix}. \qquad (3.1)

One solves the over-determined linear system $\mathbf{L}_{n,m}\cdot\mathbf{c}=\mathbf{f}$ via least squares (e.g., directly or via the LSQR algorithm [31]), i.e., computes

\mathbf{c}:=(\mathbf{L}_{n,m}^\ast\mathbf{L}_{n,m})^{-1}\,\mathbf{L}_{n,m}^\ast\cdot\mathbf{f}.

The output $\mathbf{c}\in\mathds{C}^{m-1}$ gives the coefficients of the approximant

S^m_{\mathbf{X}}f:=\sum_{k=1}^{m-1}c_k\eta_k. \qquad (3.2)
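A minimal numerical sketch of (3.1)-(3.2), under assumptions of our own choosing that are not part of the paper: $D=[0,1)$ with the uniform measure, the trigonometric system $\eta_k$ ordered by frequency $0,1,-1,2,-2,\dots$, and a hypothetical smooth periodic test function $f$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 21                                     # n samples, m - 1 basis functions

freqs = np.array([(k // 2) * (1 if k % 2 == 0 else -1) for k in range(1, m)])
f = lambda x: 1.0 / (1.2 - np.cos(2 * np.pi * x))  # hypothetical smooth 1-periodic target

X = rng.random(n)                                  # i.i.d. uniform nodes
L = np.exp(2j * np.pi * np.outer(X, freqs))        # matrix L_{n,m} from (3.1)
c, *_ = np.linalg.lstsq(L, f(X).astype(complex), rcond=None)   # coefficients (3.2)

xg = np.linspace(0, 1, 1000, endpoint=False)
approx = np.exp(2j * np.pi * np.outer(xg, freqs)) @ c
print("sup-norm error on a grid:", np.abs(f(xg) - approx).max())
```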

Further, let us consider a weighted least squares algorithm (see below) using the following weight/density function, which was introduced by D. Krieg and M. Ullrich in [14]. We set

\varrho_m(\mathbf{x})=\frac{1}{2}\Big(\frac{1}{m-1}\sum_{k=1}^{m-1}|\eta_k(\mathbf{x})|^2+\frac{1}{\sum_{k=m}^\infty\lambda_k}\sum\limits_{k=m}^\infty|e_k^\ast(\mathbf{x})|^2\Big). \qquad (3.3)

We will also use the algorithm below with the simpler density function

\varrho'_m(\mathbf{x})=\frac{1}{2(m-1)}\sum\limits_{k=1}^{m-1}|\eta_k(\mathbf{x})|^2+\frac{1}{2}\,, \qquad (3.4)

in case $\varrho_D$ is a probability measure (see Remark 3.4). This density function has been implicitly used by V.N. Temlyakov in [35] and by Cohen and Migliorati [5]. In many relevant cases both densities yield similar bounds. However, as we point out in Remark 7.9, it can make a big difference even in the univariate situation.

Algorithm 1 Weighted least squares regression [14], [5].
Input: $\mathbf{X}=(\mathbf{x}^1,\dots,\mathbf{x}^n)\in D^n$ set of distinct sampling nodes,
$\mathbf{f}=(f(\mathbf{x}^1),\dots,f(\mathbf{x}^n))^\top$ samples of $f$ evaluated at the nodes from $\mathbf{X}$,
$m\in\mathds{N}$, $m<n$, such that the matrix $\widetilde{\mathbf{L}}_{n,m}$ in (3.5) has full (column) rank.
  Compute reweighted samples $\boldsymbol{g}:=(g_j)_{j=1}^n$ with $g_j:=\begin{cases}0,&\varrho_m(\mathbf{x}^j)=0,\\ f(\mathbf{x}^j)/\sqrt{\varrho_m(\mathbf{x}^j)},&\varrho_m(\mathbf{x}^j)\neq 0\,.\end{cases}$
  Solve the over-determined linear system
\widetilde{\mathbf{L}}_{n,m}\cdot(\tilde{c}_1,\dots,\tilde{c}_{m-1})^\top=\mathbf{g}\,,\quad\widetilde{\mathbf{L}}_{n,m}:=\big(l_{j,k}\big)_{j=1,k=1}^{n,m-1},\quad l_{j,k}:=\begin{cases}0,&\varrho_m(\mathbf{x}^j)=0,\\ \eta_k(\mathbf{x}^j)/\sqrt{\varrho_m(\mathbf{x}^j)},&\varrho_m(\mathbf{x}^j)\neq 0,\end{cases} \qquad (3.5)
via least squares (e.g. directly or via the LSQR algorithm [31]), i.e., compute
(\tilde{c}_1,\dots,\tilde{c}_{m-1})^\top:=(\widetilde{\mathbf{L}}_{n,m}^\ast\widetilde{\mathbf{L}}_{n,m})^{-1}\,\widetilde{\mathbf{L}}_{n,m}^\ast\cdot\mathbf{g}.

Output: $\mathbf{\tilde{c}}=(\tilde{c}_1,\dots,\tilde{c}_{m-1})^\top\in\mathds{C}^{m-1}$ coefficients of the approximant $\widetilde{S}_{\mathbf{X}}^m f:=\sum_{k=1}^{m-1}\tilde{c}_k\eta_k$.
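The following compact sketch implements Algorithm 1 with the simpler density (3.4). The setting is again of our own choosing and only illustrative: $D=[-1,1]$, $\varrho_D={\rm d}x/2$, the $L_2(\varrho_D)$-normalized Legendre system $\eta_n=\sqrt{2n+1}\,P_n$, nodes drawn from $\varrho'_m\,{\rm d}\varrho_D$ by rejection sampling, and a hypothetical non-smooth test function.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(1)
n, m = 600, 21
f = lambda x: np.abs(x - 0.3) ** 2.5                # hypothetical target function

def eta_matrix(x):                                  # columns: eta_0(x), ..., eta_{m-2}(x)
    V = legendre.legvander(x, m - 2)                # P_0, ..., P_{m-2}
    return V * np.sqrt(2 * np.arange(m - 1) + 1)    # normalize w.r.t. rho_D = dx/2

def rho_m(x):                                       # density (3.4) w.r.t. rho_D
    return 0.5 * (eta_matrix(x) ** 2).sum(axis=1) / (m - 1) + 0.5

# draw nodes i.i.d. from rho_m d rho_D by rejection sampling (sup rho_m <= m/2)
X = np.empty(0)
while X.size < n:
    cand = rng.uniform(-1, 1, 4 * n)
    keep = rng.uniform(0, m / 2, 4 * n) <= rho_m(cand)
    X = np.concatenate([X, cand[keep]])
X = X[:n]

w = 1.0 / np.sqrt(rho_m(X))                         # reweighting from Algorithm 1
Lt = eta_matrix(X) * w[:, None]                     # matrix (3.5)
g = f(X) * w                                        # reweighted samples
c, *_ = np.linalg.lstsq(Lt, g, rcond=None)          # coefficients c~_1, ..., c~_{m-1}

xg = np.linspace(-1, 1, 2001)
print("sup-norm error:", np.abs(f(xg) - eta_matrix(xg) @ c).max())
```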

Theorem 3.1.

Let $H(K)$ be a reproducing kernel Hilbert space of complex-valued functions defined on a compact domain $D\subset\mathds{R}^d$ with a continuous and bounded kernel $K(\mathbf{x},\mathbf{y})$, $K\colon D\times D\rightarrow\mathds{C}$, i.e., $\|K\|_\infty:=\sup_{\mathbf{x}\in D}\sqrt{K(\mathbf{x},\mathbf{x})}<\infty$, and let $\varrho_D$ be a finite Borel measure with full support on $D$. We denote by $(\sigma_n)_{n=1}^\infty$ the non-increasing sequence of singular numbers of the embedding $\mathrm{Id}\colon H(K)\rightarrow L_2(D,\varrho_D)$. The points $\mathbf{X}=(\mathbf{x}^1,\dots,\mathbf{x}^n)$ are drawn i.i.d. with respect to the measure defined by the density $\varrho_m(\cdot)\,{\rm d}\varrho_D$. There are absolute constants $c_1,c_2,c_3>0$ such that for

m:=\Big\lfloor\frac{n}{c_1 r\log n}\Big\rfloor

with $r>1$, the reconstruction operator $\widetilde{S}_{\mathbf{X}}^m$ (see Algorithm 1) satisfies

\begin{split}\mathds{P}&\Big(\sup\limits_{\|f\|_{H(K)}\leq 1}\|f-\widetilde{S}_{\mathbf{X}}^m f\|^2_{\ell_\infty(D)}\leq c_3\max\Big\{\frac{N_{K,\varrho_D}(m)}{m}\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_k^2,\sum\limits_{k\geq\lfloor m/2\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}\Big\}\Big)\\ &\geq 1-c_2 n^{1-r}\,.\end{split} \qquad (3.6)
Proof.

Let $f\in H(K)$ with $\|f\|_{H(K)}\leq 1$. For the reconstruction operator $\widetilde{S}_{\mathbf{X}}^m f$ from Algorithm 1 it holds

\|f-\widetilde{S}_{\mathbf{X}}^m f\|_{\ell_\infty(D)}\leq\|f-P_{m-1}f\|_{\ell_\infty(D)}+\|P_{m-1}f-\widetilde{S}_{\mathbf{X}}^m f\|_{\ell_\infty(D)}, \qquad (3.7)

where the projection $P_m f$ is defined in Section 2.

From (2.9) we obtain that

\sup\limits_{\|f\|_{H(K)}\leq 1}\|f-P_{m-1}f\|_{\ell_\infty(D)}\leq\sqrt{2\sum\limits_{k\geq\lfloor m/2\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}}. \qquad (3.8)

Clearly, $\widetilde{S}^m_{\mathbf{X}}(P_{m-1}f)=P_{m-1}f$. Therefore, for the second summand on the right-hand side of (3.7) we can write

\begin{aligned}
\|P_{m-1}f-\widetilde{S}_{\mathbf{X}}^m f\|_{\ell_\infty(D)} &=\|\widetilde{S}_{\mathbf{X}}^m(f-P_{m-1}f)\|_{\ell_\infty(D)}=\Big\|\sum_{k=1}^{m-1}\tilde{c}_k\eta_k(\mathbf{x})\Big\|_{\ell_\infty(D)}\\
&\leq\sqrt{N_{K,\varrho_D}(m)}\Big(\sum_{k=1}^{m-1}|\tilde{c}_k|^2\Big)^{1/2}. \qquad (3.9)
\end{aligned}

Let us estimate the quantity $\big(\sum_{k=1}^{m-1}|\tilde{c}_k|^2\big)^{1/2}$, i.e., the $\ell_2$-norm of the coefficients of the operator $\widetilde{S}_{\mathbf{X}}^m$. We have

\begin{aligned}
\Big(\sum_{k=1}^{m-1}|\tilde{c}_k|^2\Big)^{1/2} &=\left\|(\widetilde{\mathbf{L}}_{n,m}^\ast\widetilde{\mathbf{L}}_{n,m})^{-1}\,\widetilde{\mathbf{L}}_{n,m}^\ast\begin{pmatrix}\frac{(f-P_{m-1}f)(\mathbf{x}^1)}{\sqrt{\varrho_m(\mathbf{x}^1)}}\\ \vdots\\ \frac{(f-P_{m-1}f)(\mathbf{x}^n)}{\sqrt{\varrho_m(\mathbf{x}^n)}}\end{pmatrix}\right\|_{\ell_2} \qquad (3.13)\\
&\leq\left\|(\widetilde{\mathbf{L}}_{n,m}^\ast\widetilde{\mathbf{L}}_{n,m})^{-1}\,\widetilde{\mathbf{L}}_{n,m}^\ast\right\|_{2\to 2}\Big(\sum_{i=1}^n\frac{|f(\mathbf{x}^i)-P_{m-1}f(\mathbf{x}^i)|^2}{\varrho_m(\mathbf{x}^i)}\Big)^{1/2}. \qquad (3.14)
\end{aligned}

Now we estimate the spectral norm. For the function

\widetilde{N}(m):=\sup_{\mathbf{x}\in D}\sum_{k=1}^{m-1}\frac{|\eta_k(\mathbf{x})|^2}{\varrho_m(\mathbf{x})}\,,

it holds $\widetilde{N}(m)\leq 2(m-1)$ by the choice (3.3) of $\varrho_m$. Hence, in view of [24, Theorem 5.1], requiring

\widetilde{N}(m)\leq 2(m-1)\leq\frac{n}{10r\log n}

for $r>1$, i.e., imposing the condition

m\leq\frac{n}{c_1 r\log n}\,, \qquad (3.15)

we obtain

\left\|(\widetilde{\mathbf{L}}_{n,m}^\ast\widetilde{\mathbf{L}}_{n,m})^{-1}\,\widetilde{\mathbf{L}}_{n,m}^\ast\right\|_{2\to 2}\leq\sqrt{\frac{2}{n}} \qquad (3.16)

with probability at least $1-3n^{1-r}$. Combining (3.14) and (3.16) and taking into account (2.6), we arrive at

\begin{aligned}
\sum_{k=1}^{m-1}|\tilde{c}_k|^2 &\leq\frac{2}{n}\sum_{i=1}^n\frac{|f(\mathbf{x}^i)-P_{m-1}f(\mathbf{x}^i)|^2}{\varrho_m(\mathbf{x}^i)}\\
&=\frac{2}{n}\sum_{i=1}^n\Big|\sum_{k=m}^\infty(f,e^*_k)_{H(K)}\frac{e^*_k(\mathbf{x}^i)}{\sqrt{\varrho_m(\mathbf{x}^i)}}\Big|^2=\frac{2}{n}(\widetilde{\mathbf{\Phi}}\mathbf{\hat{f}},\widetilde{\mathbf{\Phi}}\mathbf{\hat{f}})_{\ell_2}, \qquad (3.17)
\end{aligned}

where

\widetilde{\mathbf{\Phi}}:=\left(\frac{e^*_k(\mathbf{x}^i)}{\sqrt{\varrho_m(\mathbf{x}^i)}}\right)_{i=1,k=m}^{n,\infty}=\begin{pmatrix}\mathbf{y}^1\\ \vdots\\ \mathbf{y}^n\end{pmatrix}

is an infinite matrix with rows $\mathbf{y}^i:=\varrho_m(\mathbf{x}^i)^{-1/2}(e^*_m(\mathbf{x}^i),e^*_{m+1}(\mathbf{x}^i),\dots)$, $i=1,\dots,n$, and $\mathbf{\hat{f}}:=((f,e^*_k)_{H(K)})_{k=m}^\infty$. Then, (3.17) yields

\begin{aligned}
\sum_{k=1}^{m-1}|\tilde{c}_k|^2 &\leq\frac{2}{n}(\widetilde{\mathbf{\Phi}}^*\widetilde{\mathbf{\Phi}}\mathbf{\hat{f}},\mathbf{\hat{f}})_{\ell_2}\leq\frac{2}{n}\big\|\widetilde{\mathbf{\Phi}}^*\widetilde{\mathbf{\Phi}}\big\|_{2\to 2}(\mathbf{\hat{f}},\mathbf{\hat{f}})_{\ell_2}\leq\frac{2}{n}\big\|\widetilde{\mathbf{\Phi}}^*\widetilde{\mathbf{\Phi}}\big\|_{2\to 2}\|f\|^2_{H(K)}\\
&\leq 2\left(\Big\|\frac{1}{n}\widetilde{\mathbf{\Phi}}^*\widetilde{\mathbf{\Phi}}-\mathbf{\Lambda}\Big\|_{2\to 2}+\|\mathbf{\Lambda}\|_{2\to 2}\right) \qquad (3.18)
\end{aligned}

with $\mathbf{\Lambda}:=\mathrm{diag}(\sigma_m^2,\sigma_{m+1}^2,\dots)$.

We can now use [23, Prop. 3.8], in view of the fact that

\|\mathbf{y}^i\|^2_2\leq\sup_{\mathbf{x}\in D}\sum_{k=m}^\infty\frac{|e^*_k(\mathbf{x})|^2}{\varrho_m(\mathbf{x})}\leq 2\sum\limits_{k=m}^\infty\lambda_k=:M^2.

We get with high probability that

\Big\|\frac{1}{n}\widetilde{\mathbf{\Phi}}^*\widetilde{\mathbf{\Phi}}-\mathbf{\Lambda}\Big\|_{2\to 2}\leq\max\left\{\frac{8r\log n}{n}M^2\kappa^2,\|\mathbf{\Lambda}\|_{2\to 2}\right\}=:F,

where $\kappa=\frac{1+\sqrt{5}}{2}$ and $\|\mathbf{\Lambda}\|_{2\to 2}=\sigma_m^2$. Therefore, from (3.18) we get

\begin{aligned}
\sum_{k=1}^{m-1}|\tilde{c}_k|^2 &\leq 2(F+\sigma_m^2)=2\left(\max\left\{\frac{16\kappa^2 r\log n}{n}\sum\limits_{k=m}^\infty\lambda_k,\sigma_m^2\right\}+\sigma_m^2\right)\\
&\leq 4\max\left\{\frac{16\kappa^2 r\log n}{n}\sum\limits_{k=m}^\infty\lambda_k,\sigma_m^2\right\}. \qquad (3.19)
\end{aligned}

Choosing the maximal $m$ that satisfies the relation (3.15), i.e., $m=\big\lfloor\frac{n}{c_1 r\log n}\big\rfloor$, we get

\frac{1}{m}=\frac{1}{\big\lfloor\frac{n}{c_1 r\log n}\big\rfloor}\geq\frac{1}{\frac{n}{c_1 r\log n}}=\frac{c_1 r\log n}{n}\,,

that is

\frac{r\log n}{n}\leq\frac{1}{c_1 m}\,,

and hence,

\begin{aligned}
\|P_{m-1}f-\widetilde{S}_{\mathbf{X}}^m f\|_{\ell_\infty(D)} &\leq 2\sqrt{N_{K,\varrho_D}(m)}\max\left\{\frac{4\kappa}{\sqrt{c_1}}\sqrt{\frac{\sum_{k=m}^\infty\sigma_k^2}{m}},\sigma_m\right\}\\
&\leq C'\sqrt{\frac{N_{K,\varrho_D}(m)}{m}\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_k^2}\,. \qquad (3.20)
\end{aligned}

Finally, from (3.7), (3.8), and (3.20) we obtain

\sup\limits_{\|f\|_{H(K)}\leq 1}\|f-\widetilde{S}_{\mathbf{X}}^m f\|_{\ell_\infty(D)}\leq\sqrt{2\sum\limits_{k\geq\lfloor m/2\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}}+C'\sqrt{\frac{N_{K,\varrho_D}(m)}{m}\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_k^2}\,, \qquad (3.21)

which yields the statement of Theorem 3.1. ∎

Remark 3.2.

Note that we can give explicit values for the constants $c_1,c_2,c_3$ in Theorem 3.1. Indeed, following the proof we see that $c_2=3$. As to $c_1,c_3$, the relation (3.20) yields

\|P_{m-1}f-\widetilde{S}_{\mathbf{X}}^m f\|_{\ell_\infty(D)}\leq 2\left(\frac{4\kappa}{\sqrt{c_1}}+1\right)\sqrt{N_{K,\varrho_D}(m)}\max\left\{\sqrt{\frac{\sum_{k=m}^\infty\sigma_k^2}{m}},\sigma_m\right\}.

Then, taking into account (see Remark 2 on p. 16 of [24]) that

\sigma_m^2+\frac{1}{m}\sum_{k=m}^\infty\sigma_k^2\leq\frac{2}{m}\sum_{k\geq\lfloor m/2\rfloor}\sigma_k^2,

we get

\|P_{m-1}f-\widetilde{S}_{\mathbf{X}}^m f\|_{\ell_\infty(D)}\leq 2\sqrt{2}\left(\frac{4\kappa}{\sqrt{c_1}}+1\right)\sqrt{\frac{N_{K,\varrho_D}(m)}{m}\sum_{k\geq\lfloor m/2\rfloor}\sigma_k^2},

where $\kappa=\frac{1+\sqrt{5}}{2}$. Accordingly, we put $C'=2\sqrt{2}\left(\frac{4\kappa}{\sqrt{c_1}}+1\right)$ in (3.20), and from (3.21) we get

\begin{aligned}
\sup\limits_{\|f\|_{H(K)}\leq 1}\|f-\widetilde{S}_{\mathbf{X}}^m f\|_{\ell_\infty(D)} &\leq(\sqrt{2}+C')\left(\sqrt{\sum\limits_{k\geq\lfloor m/2\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}}+\sqrt{\frac{N_{K,\varrho_D}(m)}{m}\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_k^2}\right)\\
&\leq 2\sqrt{2}\left(\frac{8\kappa}{\sqrt{c_1}}+3\right)\max\Big\{\sqrt{\sum\limits_{k\geq\lfloor m/2\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}},\sqrt{\frac{N_{K,\varrho_D}(m)}{m}\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_k^2}\Big\}.
\end{aligned}

Hence, $c_3=8\left(\frac{8\kappa}{\sqrt{c_1}}+3\right)^2$ in the general estimate (3.6).

For arbitrary orthonormal systems $(\eta_k(\mathbf{x}))_{k=1}^\infty$ we can take $c_1=20$ in (3.15) and then obtain $c_3=278$. If $(\eta_k(\mathbf{x}))_{k=1}^\infty$ is a uniformly $\ell_\infty$-bounded ONS in $L_2(D,\varrho_D)$, then we can use the plain (non-weighted) least squares algorithm and take $c_1=10$, hence put $c_3=403$ (see Section 7 for further comments).
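These numerical values follow directly from the formula $c_3=8\big(8\kappa/\sqrt{c_1}+3\big)^2$; a one-line check:

```python
import numpy as np

# c3 = 8 * (8*kappa/sqrt(c1) + 3)^2 with kappa = (1 + sqrt(5))/2, cf. Remark 3.2
kappa = (1 + np.sqrt(5)) / 2
c3 = lambda c1: 8 * (8 * kappa / np.sqrt(c1) + 3) ** 2
print(f"c1 = 20: c3 = {c3(20):.1f},  c1 = 10: c3 = {c3(10):.1f}")   # ~278 and ~403
```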

Corollary 3.3.

If there is a measure $\varrho_D$ on $D$ such that $N_{K,\varrho_D}(k)=\mathcal{O}(k)$, we obtain

\sup\limits_{\|f\|_{H(K)}\leq 1}\|f-\widetilde{S}_{\mathbf{X}}^m f\|^2_{\ell_\infty(D)}\leq C_{\varrho_D,K}\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_k^2\leq C_{\varrho_D,K}\,a_{\lfloor m/2\rfloor}(\mathrm{Id}_{K,\infty})^2 \qquad (3.22)

for a constant $C_{\varrho_D,K}>0$ depending on the kernel $K$ and the measure $\varrho_D$.

Proof.

The statement follows directly from Theorem 3.1 in combination with [4, Lem. 3.3]. ∎

Remark 3.4 (Other density).

Instead of (3.3) one could also use the simpler density function (3.4) defined above, in case that $\varrho_D$ is a probability measure on $D$ (otherwise with a proper normalization). Then the estimate is slightly worse. Instead of the term

\frac{N_{K,\varrho_D}(m)}{m}\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_k^2

in (3.6) we obtain

\frac{N_{K,\varrho_D}(m)}{m}\sum\limits_{k\geq\lfloor cm\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}\,,

where $c>0$ is an absolute constant. This does not make a difference if $N_{K,\varrho_D}(m)=\mathcal{O}(m)$. However, it does make a difference if, for instance, $N_{K,\varrho_D}(m)$ grows like $m^2$, which is the case, e.g., for Legendre polynomials. See Example 7.3 below.

Remark 3.5 (Non-weighted version).

Note that one could even use the non-weighted least squares algorithm (see (3.2)), without the additional density function $\varrho_m$. If the measure $\varrho_D$ is fixed, namely the one from which the data is coming, then we get the bound

\sup\limits_{\|f\|_{H(K)}\leq 1}\|f-S_{\mathbf{X}}^m f\|^2_{\ell_\infty(D)}\leq C\sum\limits_{k\geq\lfloor m^*/2\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}\,,

where $C>0$ is an absolute constant which can be calculated precisely, and $m^*$ is the largest number such that $N_{K,\varrho_D}(m^*)\leq n/(10r\log n)$.

4 Improved bounds for sampling numbers

By exactly the same strategy as used in the recent paper by N. Nagel, M. Schäfer and T. Ullrich [24], we can improve the bound (3.6) for the sampling numbers associated to the compact embedding of an RKHS with Mercer kernel into the space of functions bounded on $D$, applying a modification of the Weaver sub-sampling strategy.

Theorem 4.1 ([24, 28, 22]).

Let $k_1,k_2,k_3>0$ and $\mathbf{u}_1,\dots,\mathbf{u}_n\in\mathds{C}^m$ with $\|\mathbf{u}_i\|_2^2\leq k_1\frac{m}{n}$ for all $i=1,\dots,n$ and

k_2\|\mathbf{w}\|_2^2\leq\sum_{i=1}^n|\langle\mathbf{w},\mathbf{u}_i\rangle|^2\leq k_3\|\mathbf{w}\|_2^2

for all $\mathbf{w}\in\mathds{C}^m$. Then there is a $J\subseteq[n]$ of size $\#J\leq C_1 m$ with

C_2\cdot\frac{m}{n}\|\mathbf{w}\|_2^2\leq\sum_{i\in J}|\langle\mathbf{w},\mathbf{u}_i\rangle|^2\leq C_3\cdot\frac{m}{n}\|\mathbf{w}\|_2^2

for all $\mathbf{w}\in\mathds{C}^m$, where $C_1,C_2,C_3$ only depend on $k_1,k_2,k_3$. More precisely, we can choose

C_1=1642\frac{k_1}{k_2}\,,\quad C_2=(2+\sqrt{2})^2 k_1\,,\quad C_3=1642\frac{k_1 k_3}{k_2}

in case $\frac{n}{m}\geq 47\frac{k_1}{k_2}$. In the regime $1\leq\frac{n}{m}<47\frac{k_1}{k_2}$ one may put $C_1=47k_1/k_2$, $C_2=k_2$, $C_3=47k_1k_3/k_2$.

Theorem 4.2.

Under the assumptions of Theorem 3.1 we obtain the following bounds for the sampling numbers $g_n(\mathrm{Id}\colon H(K)\to\ell_\infty(D))$. The measure $\varrho_D$ is at our disposal.

(i) There is an absolute constant $b>0$ such that

g_{\lfloor bm\log m\rfloor}(\mathrm{Id})^2\leq c_3\max\Big\{\frac{N_{K,\varrho_D}(m)}{m}\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_k^2\,,\ \sum\limits_{k\geq\lfloor m/2\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}\Big\}\,.

(ii) There are absolute constants $c_4,c_5>0$ such that

g_m(\mathrm{Id})^2\leq c_4\max\Big\{\frac{N_{K,\varrho_D}(m)\log m}{m}\sum\limits_{k\geq\lfloor c_5 m\rfloor}\sigma_k^2\,,\ \sum\limits_{k\geq\lfloor c_5 m\rfloor}\frac{N_{K,\varrho_D}(4k)\sigma_k^2}{k}\Big\}\,.
Proof.

The statement (i) follows immediately from Theorem 3.1.

To get (ii), we apply a sampling operator $\widetilde{S}_J^m$ using $O(m)$ sampling nodes $(\mathbf{x}^i)_{i\in J}\subset\mathbf{X}$, constructed out of a random draw in combination with a sub-sampling procedure, instead of $O(m\log m)$ nodes (as in Algorithm 1). That is, we omit some rows of the matrix $\mathbf{L}_{n,m}$ from (3.1) (respectively, of $\widetilde{\mathbf{L}}_{n,m}$) by constructing a sub-matrix $\widetilde{\mathbf{L}}_{J,m}$ having $\#J=O(m)$ rows, such that for all $\mathbf{w}\in\mathds{C}^{m-1}$ (see Theorem 4.1) it holds

c_2\|\mathbf{w}\|_2^2\leq\frac{1}{m}\|\widetilde{\mathbf{L}}_{J,m}\mathbf{w}\|_2^2\leq C_2\|\mathbf{w}\|_2^2\,.

With this matrix we perform the least squares method, applied to the reduced vector of function samples $(f(\mathbf{x}^i))_{i\in J}$, see the proof of Theorem 5 in [24] for the details. ∎

Corollary 4.3.

If there is a measure $\varrho_D$ on $D$ such that $N_{K,\varrho_D}(k)=\mathcal{O}(k)$, we obtain

g_m(\mathrm{Id})\leq C_{\varrho_D,K}\min\{a_{\lfloor m/(c_6\log m)\rfloor}(\mathrm{Id})\,,\ \sqrt{\log m}\cdot a_{\lfloor c_5 m\rfloor}(\mathrm{Id})\}\,. \qquad (4.1)
Proof.

In the case $N_{K,\varrho_D}(k)=\mathcal{O}(k)$, from Theorem 4.2 we obtain

g_m(\mathrm{Id})^2\leq C\min\Big\{\sum\limits_{k\geq\lfloor m/(c_6\log m)\rfloor}\sigma_k^2\,,\ \log m\sum\limits_{k\geq\lfloor c_5 m\rfloor}\sigma_k^2\Big\}

with an absolute constant $C>0$. Taking into account that

\sum_{j=n}^\infty\sigma_j^2\leq\varrho_D(D)\cdot a_n(\mathrm{Id})^2 \qquad (4.2)

(see F. Cobos, T. Kühn and W. Sickel [4, Lem. 3.3], where this was proved in a more general setting), we get (4.1). ∎

Remark 4.4.

The estimate from Corollary 4.3 improves the upper bound for the sampling numbers of embeddings of RKHS, consisting of continuous periodic functions, into $L_\infty([0,1)^d)$ that was established in a recent result by L. Kämmerer, see [8, Theorem 4.11]. Note also that this result was stated under stronger conditions on the weight function. It is, in turn, an improvement of the previous result by L. Kämmerer and T. Volkmer [10].

Remark 4.5.

The estimate in (ii) of Theorem 4.2 is non-constructive, since we only know that such an algorithm exists.

Remark 4.6.

The bound in (4.2) appears at several places in the literature. See for instance R. Kunsch [19, Prop. 2.7], K. Yu. Osipenko and O. G. Parfenov [30, Theorem 3], F. Cobos, T. Kühn and W. Sickel [4, Lem. 3.3] and also F. Kuo, G. W. Wasilkowski and H. Woźniakowski [20].

5 Sampling and Kolmogorov numbers

For a compact set $F\subset\ell_\infty(D)$ of complex-valued functions defined on a compact domain $D\subset\mathds{R}^d$, let us denote by $d_m(\mathrm{Id}\colon F\rightarrow\ell_\infty(D))$ its $m$th Kolmogorov number, i.e.,

d_m(\mathrm{Id}):=\inf\limits_{V_m}\sup\limits_{\|f\|_F\leq 1}\inf_{g\in V_m}\|f-g\|_{\ell_\infty(D)},

where $V_m$ is an $(m-1)$-dimensional subspace of $F$. Assume also that we know a near-optimal subspace for $d_m(\mathrm{Id})$, i.e., a subspace $V_m^*$ such that

\sup\limits_{\|f\|_F\leq 1}\inf_{g\in V_m^*}\|f-g\|_{\ell_\infty(D)}\leq\frac{3}{2}d_m(\mathrm{Id}).

Let further $\varrho$ be a finite measure on $D$ and let $(\phi_n)_{n=1}^{m-1}$ be any ONS spanning $V_m^*$ with respect to $\varrho$. The following statement holds.

Proposition 5.1.

Let $F$ be a compactly embedded subspace (of complex-valued continuous functions) of $\ell_\infty(D)$, where $D\subset\mathds{R}^d$ is a compact subset. Then there is an absolute constant $b>0$ such that

g_{\lfloor bm\log m\rfloor}(\mathrm{Id}\colon F\to\ell_\infty(D))\leq(2+\sqrt{\varrho(D)/(m-1)})\sqrt{N_{\varrho,V_m^*}}\,d_m(\mathrm{Id})

holds, where

N_{\varrho,V_m^*}:=\sup\limits_{f\in V_m^*}\frac{\|f\|^2_{\ell_\infty(D)}}{\|f\|^2_{L_2(D,\varrho)}}, \qquad (5.1)

and the finite measure $\varrho$ on $D$ is at our disposal.

Proof.

Following the proof of Theorem 3.1, for $f\in F$, $g\in V_m^*$ and the recovery operator $\widetilde{S}_{\mathbf{X}}^m f$ from Algorithm 1 that uses the density function

\varrho_m(\mathbf{x})=\frac{1}{2}\Big(\frac{1}{m-1}\sum_{k=1}^{m-1}|\phi_k(\mathbf{x})|^2+1\Big),

(compare to (3.4)), we get with high probability that

\begin{aligned}
\|f-\widetilde{S}_{\mathbf{X}}^m f\|_{\ell_\infty(D)} &\leq\|f-g\|_{\ell_\infty(D)}+\|g-\widetilde{S}_{\mathbf{X}}^m f\|_{\ell_\infty(D)}=\|f-g\|_{\ell_\infty(D)}+\Big\|\sum_{k=1}^{m-1}\tilde{c}_k\phi_k(\mathbf{x})\Big\|_{\ell_\infty(D)}\\
&\leq\|f-g\|_{\ell_\infty(D)}+\sqrt{N_{\varrho,V_m^*}}\sqrt{\frac{2}{n}\sum_{i=1}^n\frac{|f(\mathbf{x}^i)-g(\mathbf{x}^i)|^2}{1/2}}\\
&\leq\|f-g\|_{\ell_\infty(D)}+2\sqrt{N_{\varrho,V_m^*}}\|f-g\|_{\ell_\infty(D)}. \qquad (5.2)
\end{aligned}

Since for all $m\geq 2$ it holds

m-1=\int_D\sum\limits_{k=1}^{m-1}|\phi_k(\mathbf{x})|^2\,d\varrho(\mathbf{x})\leq\varrho(D)\sup\limits_{\mathbf{x}\in D}\sum\limits_{k=1}^{m-1}|\phi_k(\mathbf{x})|^2,

we have

N_{\varrho,V_m^*}=\sup_{\mathbf{x}\in D}\sum_{k=1}^{m-1}|\phi_k(\mathbf{x})|^2\geq(m-1)/\varrho(D)\,.

It follows from (5.2) that

\|f-\widetilde{S}_{\mathbf{X}}^m f\|_{\ell_\infty(D)}\leq(2+\sqrt{\varrho(D)/(m-1)})\sqrt{N_{\varrho,V_m^*}}\|f-g\|_{\ell_\infty(D)}.

Remark 5.2.

Note that Proposition 5.1 can be extended to more general function classes $F$, similarly as in [35] and [13].

The following corollary is in the spirit of [35, Theorem 1.1].

Corollary 5.3.

There are universal constants $b,C>0$ such that

g_{\lfloor bm\rfloor}(\mathrm{Id})\leq C\sqrt{N_{\varrho,V_m^*}}\,d_m(\mathrm{Id})\,,\quad m\geq\varrho(D)\,.
Proof.

This follows by an application of the subsampling technique in Theorem 4.1. ∎

Remark 5.4.

It was shown by V.N. Temlyakov [35, Theorem 1.1] that for a compact set $F$ of complex-valued functions continuous on $D\subset\mathds{R}^d$, there exist two positive constants $b$ and $B$ such that

g_{\lfloor bm\rfloor}(\mathrm{Id}\colon F\rightarrow L_2(D,\varrho))\leq B\,d_m(\mathrm{Id}\colon F\rightarrow L_\infty(D,\varrho)).

For special sets $F$ (in the reproducing kernel Hilbert space setting) the estimate

g_m(\mathrm{Id}\colon F\rightarrow L_2(D,\varrho))^2\leq C\frac{\log m}{m}\sum\limits_{k\geq\lfloor cm\rfloor}d_k(\mathrm{Id}\colon F\rightarrow L_2(D,\varrho))^2,

with universal constants $C,c>0$, was proved earlier by N. Nagel, M. Schäfer and T. Ullrich [24], based on D. Krieg and M. Ullrich [14]. Similar results for non-Hilbert function spaces were obtained in the second part of the work by D. Krieg and M. Ullrich [13].

Remark 5.5.

The above bound on sampling numbers in terms of Kolmogorov numbers motivates the investigation of Kolmogorov numbers in certain situations. It further motivates the study of the Christoffel function of the related optimal subspaces. Currently we do not know how this bound can be made more explicit and therefore do not have convincing examples. We leave it as an open problem to control the quantity $\sqrt{N_{\varrho,V_m^*}}$ for the “optimal” subspace $V_m^*$. Note that the measure $\varrho$ is at our disposal.

Remark 5.6 (Approximation in the uniform norm).

In the recent papers by V.N. Temlyakov and T. Ullrich [36, 37], the behavior of several asymptotic characteristics of multivariate function classes in the uniform norm was studied in the case of “small smoothness”. The crucial point is that in the small smoothness setting the corresponding approximation numbers are not square summable. The established estimates for the Kolmogorov widths of Sobolev classes and of classes of functions with bounded mixed differences serve as a powerful tool to investigate the sampling recovery problem in $L_2$ and $L_\infty$ (see also Section 7 in [36] and Section 5 in [37]).

6 The power of standard information in the uniform norm

Now we discuss the optimal order of convergence of algorithms that realize upper bounds for the approximation numbers $a_n(\mathrm{Id})$ and sampling numbers $g_n(\mathrm{Id})$ defined by (1.1) and (1.2), for the embedding operator $\mathrm{Id}\colon H(K)\rightarrow F$.

The optimal order of convergence $q^{\rm lin}_F$ among all algorithms that use linear information for the identity operator $\mathrm{Id}$ is defined (see the monograph by E. Novak and H. Woźniakowski [29, Chapt. 26.6.1]) as

q^{\rm lin}_F:=\sup\left\{q\geq 0\colon\ \lim_{n\rightarrow\infty}n^q a_n(\mathrm{Id})=0\right\},

and, respectively, for standard information

q^{\rm std}_F:=\sup\left\{q\geq 0\colon\ \lim_{n\rightarrow\infty}n^q g_n(\mathrm{Id})=0\right\}.

In the case $F=L_2(D,\varrho_D)$ we write $q^{\rm lin}_{2,\varrho_D}:=q^{\rm lin}_{L_2(D,\varrho_D)}$ and $q^{\rm std}_{2,\varrho_D}:=q^{\rm std}_{L_2(D,\varrho_D)}$; if $F=\ell_\infty(D)$, we put $q^{\rm lin}_\infty:=q^{\rm lin}_{\ell_\infty(D)}$ and $q^{\rm std}_\infty:=q^{\rm std}_{\ell_\infty(D)}$. Note that, when investigating the approximation numbers $a_n(\mathrm{Id})$, one can take $F=L_\infty(D)$ with $\|f\|_{L_\infty(D)}=\operatorname{ess\,sup}_{\mathbf{x}\in D}|f(\mathbf{x})|$ instead of considering the supremum norm of the space $\ell_\infty(D)$.

Let us introduce the following two assumptions:

  1. (i)

    NK,ϱD(k)=𝒪(ku)N_{K,\varrho_{D}}(k)=\mathcal{O}(k^{u});

  2. (ii)

    there exist p>1/2p>1/2, C2>0C_{2}>0 such that σjC2jp\sigma_{j}\leq C_{2}j^{-p}, j=1,2,j=1,2,\dots.

Note that the values of uu and pp are optimal in the sense that

u\displaystyle u :=inf{u:NK,ϱD(k)=𝒪(ku)},\displaystyle\mathrel{\mathop{\mathchar 58\relax}}=\inf\left\{u\colon\ N_{K,\varrho_{D}}(k)=\mathcal{O}(k^{u})\right\},
p\displaystyle p :=sup{p:σjC2jp,j=1,2,}.\displaystyle\mathrel{\mathop{\mathchar 58\relax}}=\sup\{p\colon\ \sigma_{j}\leq C_{2}j^{-p},\,j=1,2,\dots\}.

Stronger conditions on the integral operator were introduced earlier by F. Y. Kuo, G. W. Wasilkowski and H. Woźniakowski in [20, 21]. Namely, instead of condition (i) the authors assume that

  1. (i’)

    there exists C1>0C_{1}>0 such that ηj(D)C1\|\eta_{j}\|_{\ell_{\infty}(D)}\leq C_{1}, j=1,2,j=1,2,\dots,

i.e., they consider the special case NK,ϱD(k)=𝒪(k)N_{K,\varrho_{D}}(k)=\mathcal{O}(k). Note that the constants C1,C2C_{1},C_{2} may depend on the dimension dd. Below we consider concrete examples of RKHS satisfying these conditions.

If the second assumption holds, then the optimal rate of convergence among all algorithms that use linear information in the worst case setting, with the error measured in the L2(D,ϱD)L_{2}(D,\varrho_{D})-norm, is q2,ϱDlin=pq^{\rm lin}_{2,\varrho_{D}}=p. If (i’) and (ii) are satisfied, then, when switching from the L2(D,ϱD)L_{2}(D,\varrho_{D})- to the L(D)L_{\infty}(D)-norm, we lose 1/21/2 in the optimal order [20]:

qlin=q2,ϱDlin1/2=p1/2,q^{\rm lin}_{\infty}=q^{\rm lin}_{2,\varrho_{D}}-1/2=p-1/2\,,

where p>1/2p>1/2. Clearly, we only need the weaker assumption (i) for such a statement.

Now we turn to the (D)\ell_{\infty}(D)-error among all algorithms that use standard information in the worst case setting. It was proved in [21] that under the assumptions (i’) and (ii) the optimal order of convergence satisfies

qstd[2p2p+1(p12),p12].q^{\rm std}_{\infty}\in\left[\frac{2p}{2p+1}\left(p-\frac{1}{2}\right),p-\frac{1}{2}\right]. (6.1)

The upper bound cannot be improved even for linear information. From Theorem 4.2 we get the following improvement over (6.1).

Corollary 6.1.

Suppose that (i) and (ii) hold true with 2p>u2p>u. Then qstdpu/2q^{\rm std}_{\infty}\geq p-u/2. In case u=1u=1 (or if (i’) holds) we have qstd=qlin=p1/2q^{\rm std}_{\infty}=q^{\rm lin}_{\infty}=p-1/2.

Proof.

We use the inequality

k=n+1kαntαdt=nα+1α1,\sum\limits_{k=n+1}^{\infty}k^{-\alpha}\leq\int_{n}^{\infty}t^{-\alpha}{\rm d}t=\frac{n^{-\alpha+1}}{\alpha-1}\,,

which holds for any α>1\alpha>1. We obtain

max{NK,ϱD(m)mkm/2σk2,km/2NK,ϱD(4k)σk2k}\displaystyle\max\Big{\{}\frac{N_{K,\varrho_{D}}(m)}{m}\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_{k}^{2},\sum\limits_{k\geq\lfloor m/2\rfloor}\frac{N_{K,\varrho_{D}}(4k)\sigma_{k}^{2}}{k}\Big{\}}
max{mu1km/2k2p,km/2ku1k2p}\displaystyle\lesssim\max\Big{\{}m^{u-1}\sum\limits_{k\geq\lfloor m/2\rfloor}k^{-2p},\sum\limits_{k\geq\lfloor m/2\rfloor}k^{u-1}k^{-2p}\Big{\}}
max{mu1(m/22p+12p1m/22p+1),m/2(2pu+1)+12pum/2(2pu)}\displaystyle\leq\max\Bigg{\{}m^{u-1}\left(\lfloor m/2\rfloor^{-2p}+\frac{1}{2p-1}\lfloor m/2\rfloor^{-2p+1}\right),\lfloor m/2\rfloor^{-(2p-u+1)}+\frac{1}{2p-u}\lfloor m/2\rfloor^{-(2p-u)}\Bigg{\}}
max{2p2p1mu1m/22p+1,2pu+12pum/2(2pu)}pm(2pu).\displaystyle\leq\max\Bigg{\{}\frac{2p}{2p-1}m^{u-1}\lfloor m/2\rfloor^{-2p+1},\frac{2p-u+1}{2p-u}\lfloor m/2\rfloor^{-(2p-u)}\Bigg{\}}\lesssim_{p}m^{-(2p-u)}. (6.2)

Consequently, for n=bmlogmn=\lfloor bm\log m\rfloor, where b>0b>0 is an absolute constant, estimate (i) of Theorem 4.2 yields

gn(Id)pn(pu/2)(logn)pu/2.g_{n}(\mathrm{Id})\lesssim_{p}n^{-(p-u/2)}(\log n)^{p-u/2}.

This shows that under the considered assumptions qstdpu/2q^{\rm std}_{\infty}\geq p-u/2. In case u=1u=1, i.e., when (i) coincides with (i’), we have qstd=qlin=p1/2q^{\rm std}_{\infty}=q^{\rm lin}_{\infty}=p-1/2. ∎
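For illustration only, the elementary tail estimate used at the beginning of the proof can be checked numerically. The following minimal Python sketch (the truncation point and the values of α are arbitrary choices made for this example) compares a truncated tail sum with the integral bound.

```python
# Sanity check of sum_{k=n+1}^infty k^{-alpha} <= n^{1-alpha}/(alpha-1) for alpha > 1.
# The infinite tail is truncated, so the computed sum slightly underestimates it.
import numpy as np

n = 50
for alpha in (1.5, 2.0, 3.0):
    k = np.arange(n + 1, 200_000, dtype=float)
    tail = np.sum(k ** (-alpha))
    bound = n ** (1.0 - alpha) / (alpha - 1.0)
    print(f"alpha = {alpha}: truncated tail = {tail:.5f} <= bound = {bound:.5f}")
```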

7 Examples

7.1 Sobolev type spaces

Let us begin with an application of the obtained results to classes of periodic functions. We compare the terms in Corollary 4.3 in the case when σms,dms(logm)α\sigma_{m}\asymp_{s,d}m^{-s}(\log m)^{\alpha} with some α>0\alpha>0. This holds, in particular, for widely used mixed Sobolev classes.

Let 𝕋=[0,2π]\mathds{T}=[0,2\pi] be the torus, where the endpoints of the interval are identified. By 𝕋d\mathds{T}^{d} we denote the dd-dimensional torus, which we equip with the normalized Lebesgue measure (2π)dd𝐱(2\pi)^{-d}{\rm d}\mathbf{x}.

Let further L2(𝕋d)L_{2}(\mathds{T}^{d}) be the space of all (equivalence classes of) measurable functions f(𝐱)=f(x1,,xd)f(\mathbf{x})=f(x_{1},\dots,x_{d}) on 𝕋d\mathds{T}^{d}, 2π2\pi-periodic in each component, such that

fL2(𝕋d)=(2π)d/2(𝕋d|f(𝐱)|2d𝐱)1/2<.\|f\|_{L_{2}(\mathds{T}^{d})}=(2\pi)^{-d/2}\left(\int_{\mathds{T}^{d}}|f(\mathbf{x})|^{2}{\rm d}\mathbf{x}\right)^{1/2}<\infty.

The functions fL2(𝕋d)f\in L_{2}(\mathds{T}^{d}) are completely determined by their Fourier coefficients c𝐤(f)c_{\mathbf{k}}(f), 𝐤d\mathbf{k}\in\mathds{Z}^{d}, with respect to the trigonometric system {ei𝐤𝐱}𝐤d\left\{e^{i\mathbf{k}\cdot\mathbf{x}}\right\}_{\mathbf{k}\in\mathds{Z}^{d}} (which forms an orthonormal basis of L2(𝕋d)L_{2}(\mathds{T}^{d})):

c𝐤(f)=(2π)d𝕋df(𝐱)ei𝐤𝐱d𝐱,𝐤d,c_{\mathbf{k}}(f)=(2\pi)^{-d}\int_{\mathds{T}^{d}}f(\mathbf{x})e^{-i\mathbf{k}\cdot\mathbf{x}}{\rm d}\mathbf{x},\quad\mathbf{k}\in\mathds{Z}^{d}, (7.1)

namely, it holds

f(𝐱)=𝐤dc𝐤(f)ei𝐤𝐱.f(\mathbf{x})=\sum_{\mathbf{k}\in\mathds{Z}^{d}}c_{\mathbf{k}}(f)e^{i\mathbf{k}\cdot\mathbf{x}}.

The notation HwH^{w} stands for a Hilbert space of integrable functions on the torus 𝕋d\mathds{T}^{d} such that

fHw(𝕋d)=(𝐤d(w(𝐤))2|c𝐤(f)|2)1/2<,\|f\|_{H^{w}(\mathds{T}^{d})}=\left(\sum_{\mathbf{k}\in\mathds{Z}^{d}}(w(\mathbf{k}))^{2}|c_{\mathbf{k}}(f)|^{2}\right)^{1/2}<\infty,

where w(𝐤)>0w(\mathbf{k})>0, 𝐤d\mathbf{k}\in\mathds{Z}^{d}, are certain weights.

F. Cobos, T. Kühn and W. Sickel [4, Theorem 3.1] established a necessary and sufficient condition on the weights w(𝐤)w(\mathbf{k}) which guarantees the compactness of the embedding of Hw(𝕋d)H^{w}(\mathds{T}^{d}) into L(𝕋d)L_{\infty}(\mathds{T}^{d}), where fL(𝕋d)=esssup𝐱𝕋d|f(𝐱)|\|f\|_{L_{\infty}(\mathds{T}^{d})}=\operatorname{ess\,sup}_{\mathbf{x}\in\mathds{T}^{d}}|f(\mathbf{x})| (recall that the space (𝕋d)\ell_{\infty}(\mathds{T}^{d}) carries the supremum norm). Namely, it was proved that Hw(𝕋d)L(𝕋d)H^{w}(\mathds{T}^{d})\hookrightarrow L_{\infty}(\mathds{T}^{d}) compactly if and only if

𝐤d1(w(𝐤))2<.\sum_{\mathbf{k}\in\mathds{Z}^{d}}\frac{1}{(w(\mathbf{k}))^{2}}<\infty\,. (7.2)

Now we turn to the RKHS setting. Let us define the kernel

Kw(𝐱,𝐲)=𝐤dei𝐤(𝐱𝐲)(w(𝐤))2,K_{w}(\mathbf{x},\mathbf{y})=\sum_{\mathbf{k}\in\mathds{Z}^{d}}\frac{e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}}{\left(w(\mathbf{k})\right)^{2}}\,, (7.3)

which is bounded under the condition (7.2). The space Hw(𝕋d)H^{w}(\mathds{T}^{d}) is a reproducing kernel Hilbert space with the kernel (7.3), and the boundedness of KwK_{w} implies that Hw(𝕋d)H^{w}(\mathds{T}^{d}) is also compactly embedded into (𝕋d)\ell_{\infty}(\mathds{T}^{d}). The eigenvalues of the operator WwW_{w} are given by λ𝐤:=(w(𝐤))2\lambda_{\mathbf{k}}\mathrel{\mathop{\mathchar 58\relax}}=(w(\mathbf{k}))^{-2}. Let us consider the frequency set

I(R):={𝐤d:w(𝐤)R}I(R)\mathrel{\mathop{\mathchar 58\relax}}=\Big{\{}\mathbf{k}\in\mathds{Z}^{d}~{}\mathrel{\mathop{\mathchar 58\relax}}~{}w(\mathbf{k})\leq R\Big{\}}

and let m(R)m(R) denote its cardinality. The reconstruction operator S𝐗m(R)S^{m(R)}_{\mathbf{X}} is defined by (3.2) for the system of functions η𝐤()=exp(i𝐤)\eta_{\mathbf{k}}(\cdot)=\exp(i\mathbf{k}\cdot), 𝐤I(R)\mathbf{k}\in I(R). Note that here we use the plain (non-weighted) least squares algorithm, since η𝐤(𝕋d)=1\|\eta_{\mathbf{k}}\|_{\ell_{\infty}(\mathds{T}^{d})}=1 for all 𝐤d\mathbf{k}\in\mathds{Z}^{d}.
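For illustration, the following minimal Python sketch mimics this construction numerically: it enumerates I(R) for the weight w(𝐤)=∏_{j}(1+|k_j|)^s from Subsection 7.2 below with d=2, draws i.i.d. nodes uniformly on the torus, and performs a plain least squares fit with the exponentials indexed by I(R). The concrete parameters and the test function are illustrative assumptions and not part of the analysis; in particular, the sketch does not reproduce the precise operator (3.2).

```python
# Minimal sketch of plain least squares recovery on the 2d torus for the weight
# w(k) = prod_j (1 + |k_j|)^s; the choices of s, R, n and the test function are
# illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
d, s, R, n = 2, 2.0, 16.0, 500

# frequency set I(R) = {k in Z^d : w(k) <= R}, enumerated over a sufficiently large box
kmax = int(R ** (1.0 / s))
grid = np.arange(-kmax, kmax + 1)
K = np.array([(k1, k2) for k1 in grid for k2 in grid
              if ((1 + abs(k1)) * (1 + abs(k2))) ** s <= R])
m = len(K)                                       # m(R) = |I(R)|

X = rng.uniform(0.0, 2.0 * np.pi, size=(n, d))   # i.i.d. nodes, uniform on the torus

def f(x):                                        # test function with frequencies in I(R)
    return np.cos(x[:, 0]) * np.cos(x[:, 1]) + 0.5 * np.sin(2.0 * x[:, 0])

L = np.exp(1j * X @ K.T)                         # L[j, l] = exp(i <k^l, x^j>)
coef, *_ = np.linalg.lstsq(L, f(X).astype(complex), rcond=None)

# approximate the uniform error on a fine grid
t = np.linspace(0.0, 2.0 * np.pi, 201)
G = np.stack(np.meshgrid(t, t), axis=-1).reshape(-1, d)
err = np.max(np.abs(np.exp(1j * G @ K.T) @ coef - f(G)))
print(f"m(R) = {m}, n = {n}, approximate sup-norm error = {err:.2e}")
```

Since the test function lies in the span of the selected exponentials, the printed error is of the order of machine precision; it only serves to check that the assembled least squares system behaves as expected.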

From Theorem 3.1 (Remark 3.5) we get

Theorem 7.1.

Let Hw(𝕋d)H^{w}(\mathds{T}^{d}) be a Sobolev type space given by weights w(𝐤)w(\mathbf{k}) satisfying the condition

𝐤d1(w(𝐤))2<.\sum_{\mathbf{k}\in\mathds{Z}^{d}}\frac{1}{(w(\mathbf{k}))^{2}}<\infty\,.

Let further R1R\geq 1, r>1r>1, and let nn be the smallest natural number such that

m(R)nc1rlogn,m(R)\leq\Big{\lfloor}\frac{n}{c_{1}r\log n}\Big{\rfloor}\,,

where c1>0c_{1}>0 is an absolute constant. Then the random reconstruction operator S𝐗m(R)S_{\mathbf{X}}^{m(R)} satisfies

(supfH(Kw)1fS𝐗m(R)fL(𝕋d)2C𝐤:w(𝐤)>R(w(𝐤))2)13n1r\mathds{P}\left(\sup\limits_{\|f\|_{H(K_{w})}\leq 1}\|f-S_{\mathbf{X}}^{m(R)}f\|^{2}_{L_{\infty}(\mathds{T}^{d})}\leq C\sum\limits_{\mathbf{k}\mathrel{\mathop{\mathchar 58\relax}}w(\mathbf{k})>R}\left(w(\mathbf{k})\right)^{-2}\right)\geq 1-3n^{1-r} (7.4)

with universal constant C>0C>0.

In what follows, we consider concrete examples of weights w(𝐤)w(\mathbf{k}), 𝐤d\mathbf{k}\in\mathds{Z}^{d}, and the corresponding Sobolev classes of functions. Note that the study of various approximation characteristics of these classes has recently attracted much interest, due to the role of Sobolev classes in a number of applications, such as event modeling, quantum chemistry, and signal reconstruction.

Theorem 7.2.

Under the assumptions of Theorem 7.1, the estimate

gn(Iw)Cmin{an/blogn(Iw),lognacn(Iw)}g_{n}(\operatorname{I}_{w})\leq C\min\left\{a_{\lfloor n/b\log n\rfloor}(\operatorname{I}_{w})\,,\sqrt{\log n}\cdot a_{\lfloor cn\rfloor}(\operatorname{I}_{w})\right\} (7.5)

holds for the sampling numbers of the embedding Iw:Hw(𝕋d)L(𝕋d)\operatorname{I}_{w}\colon H^{w}(\mathds{T}^{d})\rightarrow L_{\infty}(\mathds{T}^{d}), where C,b,c>0C,b,c>0 are absolute constants. Note that

an(Iw)2kn+1τk2,a_{n}(\operatorname{I}_{w})^{2}\leq\sum\limits_{k\geq n+1}\tau_{k}^{2}\,,

where (τn)n(\tau_{n})_{n} is the non-increasing rearrangement of (1/w(𝐤))𝐤d(1/w(\mathbf{k}))_{\mathbf{k}\in\mathds{Z}^{d}} .

Proof.

The relation (7.5) follows from Corollary 4.3 and the fact that Nw(m)=N(m)=m1mN_{w}(m)=N(m)=m-1\leq m. Note that the eigenfunctions η𝐤()\eta_{\mathbf{k}}(\cdot) are always exp(i𝐤)\exp(i\mathbf{k}\cdot) no matter what w()w(\cdot) is. ∎
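For a concrete weight, the quantities appearing in Theorem 7.2 are easy to evaluate numerically. The following sketch (with the illustrative choices d=2 and s=2 for the weight w(𝐤)=∏_{j}(1+|k_j|)^s considered in the next subsection, and a truncated frequency box, which slightly underestimates the infinite tail) computes the non-increasing rearrangement (τ_n)_n and the resulting bound on a_n(I_w).

```python
# Sketch: evaluate the bound a_n(I_w)^2 <= sum_{k >= n+1} tau_k^2 from Theorem 7.2
# for the illustrative weight w(k) = prod_j (1 + |k_j|)^s with d = 2, s = 2.
# The frequency grid is truncated, so the computed tail slightly underestimates
# the true (infinite) sum.
import numpy as np

s, kmax = 2.0, 200
grid = np.arange(-kmax, kmax + 1)
K1, K2 = np.meshgrid(grid, grid)
w = ((1.0 + np.abs(K1)) * (1.0 + np.abs(K2))) ** s

tau = np.sort((1.0 / w).ravel())[::-1]           # non-increasing rearrangement
for n in (10, 100, 1000):
    bound = np.sqrt(np.sum(tau[n:] ** 2))        # skip the n largest entries
    print(f"n = {n:4d}:  a_n(I_w) <= {bound:.3e}")
```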

Remark 7.3.

Theorem 7.2 improves the recent estimate by L. Kämmerer [8, Theorem 4.11]. There it was shown

gm(Iw)Clogmam/logm(Iw)g_{m}(\operatorname{I}_{w})\leq C\log m\cdot a_{\lfloor m/\log m\rfloor}(\operatorname{I}_{w}) (7.6)

under some additional conditions, which we do not need for the relation (7.5). Note also that (7.6) is itself an improvement of the result by L. Kämmerer and T. Volkmer [10].

7.2 Sobolev Hilbert spaces with mixed smoothness

Let us specify the weight sequence (w(𝐤))𝐤(w(\mathbf{k}))_{\mathbf{k}} as follows. First, let us define ws#(𝐤):=j=1d(1+|kj|)sw_{s}^{\#}(\mathbf{k})\mathrel{\mathop{\mathchar 58\relax}}=\prod_{j=1}^{d}(1+|k_{j}|)^{s} , 𝐤d\mathbf{k}\in\mathds{Z}^{d}, s>1/2s>1/2. By Hmixs,#(𝕋d)H^{s,\#}_{{\rm mix}}(\mathds{T}^{d}) we denote the periodic Sobolev space (see, e.g., [17])

Hmixs,#(𝕋d)={fL2(𝕋d):fHmixs,#(𝕋d)=(𝐤d|c𝐤(f)|2j=1d(1+|kj|)2s)1/2<}.H^{s,\#}_{{\rm mix}}(\mathds{T}^{d})=\left\{f\in L_{2}(\mathds{T}^{d})\colon\quad\|f\|_{H^{s,\#}_{{\rm mix}}(\mathds{T}^{d})}=\left(\sum_{\mathbf{k}\in\mathds{Z}^{d}}|c_{\mathbf{k}}(f)|^{2}\prod_{j=1}^{d}(1+|k_{j}|)^{2s}\right)^{1/2}<\infty\right\}. (7.7)

One could also choose the weight of the form ws+(𝐤):=j=1d(1+|kj|2)s/2w_{s}^{+}(\mathbf{k})\mathrel{\mathop{\mathchar 58\relax}}=\prod_{j=1}^{d}\left(1+|k_{j}|^{2}\right)^{s/2}, 𝐤d\mathbf{k}\in\mathds{Z}^{d}, s>1/2s>1/2. The corresponding classes are denoted by Hmixs,+(𝕋d)H^{s,+}_{{\rm mix}}(\mathds{T}^{d}), i.e.,

Hmixs,+(𝕋d):={fL2(𝕋d):fHmixs,+(𝕋d)=(𝐤d|c𝐤(f)|2j=1d(1+|kj|2)s)1/2<}.H^{s,+}_{{\rm mix}}(\mathds{T}^{d})\mathrel{\mathop{\mathchar 58\relax}}=\Bigg{\{}f\in L_{2}(\mathds{T}^{d})\colon\quad\|f\|_{H^{s,+}_{{\rm mix}}(\mathds{T}^{d})}=\Bigg{(}\sum_{\mathbf{k}\in\mathds{Z}^{d}}|c_{\mathbf{k}}(f)|^{2}\prod_{j=1}^{d}\left(1+|k_{j}|^{2}\right)^{s}\Bigg{)}^{1/2}<\infty\Bigg{\}}.

The norms of the classes Hmixs,#(𝕋d)H^{s,\#}_{{\rm mix}}(\mathds{T}^{d}) and Hmixs,+(𝕋d)H^{s,+}_{{\rm mix}}(\mathds{T}^{d}) are equivalent (with constants that may depend on the dimension dd and the smoothness parameter ss). Therefore, the asymptotic behavior of the sampling numbers of these classes coincides. When considering asymptotic questions we write Hmixs(𝕋d)H^{s}_{{\rm mix}}(\mathds{T}^{d}) as a common notation for both classes. For the singular numbers of the embedding operator Id:Hmixs(𝕋d)L2(𝕋d)\mathrm{Id}\colon H^{s}_{{\rm mix}}(\mathds{T}^{d})\rightarrow L_{2}(\mathds{T}^{d}) it is known that

σns,dns(logn)s(d1).\sigma_{n}\lesssim_{s,d}n^{-s}(\log n)^{s(d-1)}\,. (7.8)
Theorem 7.4.

Let r>1r>1, m=nc1rlognm=\Big{\lfloor}\frac{n}{c_{1}r\log n}\Big{\rfloor} , where c1>0c_{1}>0 is an absolute constant. Then

(supfHmixs(𝕋d)1fS𝐗mfL(𝕋d)s,d,rns+1/2(logn)sd1/2)13n1r\mathds{P}\left(\sup_{\|f\|_{H^{s}_{{\rm mix}}(\mathds{T}^{d})}\leq 1}\|f-S_{\mathbf{X}}^{m}f\|_{L_{\infty}(\mathds{T}^{d})}\lesssim_{s,d,r}n^{-s+1/2}\left(\log n\right)^{sd-1/2}\right)\geq 1-3n^{1-r} (7.9)

holds for all sufficiently large nn.

Proof.

The choice of mm\in\mathds{N} yields m1C1rlognnm^{-1}\leq C_{1}\frac{r\log n}{n} , C1>0C_{1}>0, hence, from (7.8) we get

σms,drsns(logn)sd,\sigma_{m}\lesssim_{s,d}r^{s}n^{-s}\left(\log n\right)^{sd}\,, (7.10)

and therefore

km/2σk2\displaystyle\sum_{k\geq\lfloor m/2\rfloor}\sigma_{k}^{2} s,dr2skm/2k2s(logk)2s(d1)r2s(m/2)2s+1(log(m/2))2s(d1)\displaystyle\lesssim_{s,d}r^{2s}\sum_{k\geq\lfloor m/2\rfloor}k^{-2s}(\log k)^{2s(d-1)}\leq r^{2s}(m/2)^{-2s+1}\left(\log(m/2)\right)^{2s(d-1)}
sr2s1n2s+1(logn)2s1(logn)2s(d1)=r2s1n2s+1(logn)2sd1.\displaystyle\lesssim_{s}r^{2s-1}n^{-2s+1}\left(\log n\right)^{2s-1}\left(\log n\right)^{2s(d-1)}=r^{2s-1}n^{-2s+1}\left(\log n\right)^{2sd-1}\,.

Consequently, from Theorem 7.1 we obtain the estimate (7.9). ∎

Theorem 7.5.

For Hmixs(𝕋d)H^{s}_{{\rm mix}}(\mathds{T}^{d}), s>1/2s>1/2, it holds

(i) There is an absolute constant b>0b>0 such that

gbmlogm(Id:Hmixs(𝕋d)L(𝕋d))s,dms+1/2(logm)s(d1).g_{\lfloor bm\log m\rfloor}(\mathrm{Id}\colon H^{s}_{{\rm mix}}(\mathds{T}^{d})\rightarrow L_{\infty}(\mathds{T}^{d}))\lesssim_{s,d}m^{-s+1/2}(\log m)^{s(d-1)}\,.

(ii) There is an absolute constant c>0c>0 such that

gcn(Id:Hmixs(𝕋d)L(𝕋d))s,dns+1/2(logn)s(d1)+1/2.g_{\lfloor cn\rfloor}(\mathrm{Id}\colon H^{s}_{{\rm mix}}(\mathds{T}^{d})\rightarrow L_{\infty}(\mathds{T}^{d}))\lesssim_{s,d}n^{-s+1/2}(\log n)^{s(d-1)+1/2}\,. (7.11)
Proof.

The statement (i) follows immediately from Theorem 7.4.

To get (ii), instead of Theorem 7.1 we use the improved bound for sampling numbers from Section 4, namely statement (ii) of Theorem 4.2 (Corollary 4.3). ∎

Remark 7.6.

Comparing the estimates (i) and (ii) from Theorem 7.5, we see that, once both bounds are expressed in terms of the total number of samples, (ii) has a smaller exponent in the logarithmic term whenever s>1s>1.

Remark 7.7.

V.N. Temlyakov [34] proved the following estimate for sparse grids:

gn(Id:Hmixs(𝕋d)L(𝕋d))s,dns+1/2(logn)(d1)s,s>1/2.g_{n}(\mathrm{Id}\colon H^{s}_{{\rm mix}}(\mathds{T}^{d})\rightarrow L_{\infty}(\mathds{T}^{d}))\asymp_{s,d}n^{-s+1/2}(\log n)^{(d-1)s}\,,\quad s>1/2.

The estimate of Theorem 7.5 is slightly worse with respect to the power of the logarithm. Let us now comment on preasymptotic estimates, which provide precise constants for sufficiently smooth function classes equipped with the #\#-norm. Similar results can be obtained for the “++”-norm, using (7.19).

Note that we can use the plain (non-weighted) least squares algorithm here, which is why we obtain a slightly better scaling between mm and nn.

Theorem 7.8.

Let s>1+log2d2s>\frac{1+\log_{2}d}{2}, β:=2s/(1+log2d)>1\beta\mathrel{\mathop{\mathchar 58\relax}}=2s/(1+\log_{2}d)>1 and r>1r>1. For nn\in\mathds{N}, n3n\geq 3, we define mm\in\mathds{N} by

m=n10rlogn.m=\Bigl{\lfloor}\frac{n}{10r\log n}\Bigr{\rfloor}.

Then

(supfHmixs,#(𝕋d)1fS𝐗mfL(𝕋d)21612(163)βββ1(m21)β+1)13n1r.\mathds{P}\left(\sup_{\|f\|_{H^{s,\#}_{{\rm mix}}(\mathds{T}^{d})}\leq 1}\|f-S_{\mathbf{X}}^{m}f\|^{2}_{L_{\infty}(\mathds{T}^{d})}\leq 1612\left(\frac{16}{3}\right)^{\beta}\frac{\beta}{\beta-1}\left(\frac{m}{2}-1\right)^{-\beta+1}\right)\geq 1-3n^{1-r}. (7.12)
Proof.

In [15, Theorem 4.1] it was established that

σn#(163n)s1+log2d,n6.\sigma_{n}^{\#}\leq\left(\frac{16}{3n}\right)^{\frac{s}{1+\log_{2}d}}\,,\quad n\geq 6.

We obtain

km/2(σk#)2\displaystyle\sum_{k\geq\lfloor m/2\rfloor}\left(\sigma_{k}^{\#}\right)^{2} (163)βkm/2kβ(163)β(m/2β+1β1m/2β+1)\displaystyle\leq\left(\frac{16}{3}\right)^{\beta}\sum_{k\geq\lfloor m/2\rfloor}k^{-\beta}\leq\left(\frac{16}{3}\right)^{\beta}\left(\lfloor m/2\rfloor^{-\beta}+\frac{1}{\beta-1}\lfloor m/2\rfloor^{-\beta+1}\right)
(163)βββ1m/2β+1(163)βββ1(m21)β+1.\displaystyle\leq\left(\frac{16}{3}\right)^{\beta}\frac{\beta}{\beta-1}\lfloor m/2\rfloor^{-\beta+1}\leq\left(\frac{16}{3}\right)^{\beta}\frac{\beta}{\beta-1}\left(\frac{m}{2}-1\right)^{-\beta+1}. (7.13)

Hence, from Theorem 3.1 (see Remark 3.2), we have with high probability that

supfHmixs,#(𝕋d)1fS𝐗mfL(𝕋d)2403max{km/2σk2, 4km/2σk2},\sup_{\|f\|_{H^{s,\#}_{{\rm mix}}(\mathds{T}^{d})}\leq 1}\|f-S_{\mathbf{X}}^{m}f\|^{2}_{L_{\infty}(\mathds{T}^{d})}\leq 403\max\Bigg{\{}\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_{k}^{2},\ \ 4\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_{k}^{2}\Bigg{\}}\,,

which, in combination with (7.13), yields the estimate (7.12). ∎
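Since all constants in (7.12) are explicit, the bound can be evaluated directly. The following small sketch does so for illustrative values of s, d, r and n (these choices are assumptions made for the example only).

```python
# Evaluate the explicit error bound of (7.12) together with the success probability
# 1 - 3 n^{1-r}; the parameters s, d, r below are illustrative only.
import numpy as np

s, d, r = 4.0, 4, 2.0
beta = 2.0 * s / (1.0 + np.log2(d))              # beta = 2s / (1 + log_2 d) > 1

for n in (10**3, 10**4, 10**5):
    m = int(n / (10.0 * r * np.log(n)))          # m = floor(n / (10 r log n))
    rhs = 1612.0 * (16.0 / 3.0) ** beta * beta / (beta - 1.0) * (m / 2.0 - 1.0) ** (1.0 - beta)
    print(f"n = {n:6d}, m = {m:4d}: sup-error <= {np.sqrt(rhs):.3e} "
          f"with probability >= {1.0 - 3.0 * n ** (1.0 - r):.4f}")
```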

7.3 A univariate example

Theorems 3.1 and 4.2 can also be applied to unbounded orthonormal systems. A large source of examples arises already in the univariate case from second order differential operators in connection with orthogonal polynomials; see for instance [25, P. 108, Lem. 5], where the growth of the Christoffel function for the space of polynomials of degree nn is given for different Jacobi weight parameters α,β>1\alpha,\beta>-1. It turns out that N(n)=𝒪(n)N(n)=\mathcal{O}(n) if α=β1/2\alpha=\beta\leq-1/2, and otherwise not. We focus on Legendre polynomials (α=β=0\alpha=\beta=0) and the corresponding second order differential operator. Here we consider D=[1,1]D=[-1,1] together with the uniform measure dx\rm dx on DD. The second order operator AA, defined by

Af(x)=((1x2)f),Af(x)=-((1-x^{2})f^{\prime})^{\prime}\,,

characterizes, for s>1s>1, the weighted Sobolev spaces

H(Ks):={fL2(D):As/2fL2(D)}.H(K_{s})\mathrel{\mathop{\mathchar 58\relax}}=\{f\in L_{2}(D)\colon\ A^{s/2}f\in L_{2}(D)\}\,.

These spaces are RKHS with the kernel

Ks(x,y)=k(1+(k(k+1))s)1𝒫k(x)𝒫k(y),K_{s}(x,y)=\sum_{k\in\hbox{\msbm{N}}}(1+(k(k+1))^{s})^{-1}\mathcal{P}_{k}(x)\mathcal{P}_{k}(y)\,,

where 𝒫k:D\mathcal{P}_{k}\colon D\rightarrow\mathds{R}, kk\in\mathds{N}, are L2(D)L_{2}(D)-normalized Legendre polynomials 𝒫k(x)\mathcal{P}_{k}(x).

In this setting, we have (ηk)k=1=(𝒫k)k=1(\eta_{k})_{k=1}^{\infty}=(\mathcal{P}_{k})_{k=1}^{\infty}, (ek)k=1=((1+(k(k+1))s)1/2𝒫k)k=1(e^{*}_{k})_{k=1}^{\infty}=((1+(k(k+1))^{s})^{-1/2}\mathcal{P}_{k})_{k=1}^{\infty}, and, accordingly, σk=(1+(k(k+1))s)1/2\sigma_{k}=(1+(k(k+1))^{s})^{-1/2}. As for the spectral function, we get

N(m)=supxDk=1m1|𝒫k(x)|2=k=0m22k+12=(m1)22.N(m)=\sup_{x\in D}\sum_{k=1}^{m-1}|\mathcal{P}_{k}(x)|^{2}=\sum_{k=0}^{m-2}\frac{2k+1}{2}=\frac{(m-1)^{2}}{2}\,.

Hence,

km/2σk2=km/2(1+(k(k+1))s)1km/2k2ssm2s+1\displaystyle\sum\limits_{k\geq\lfloor m/2\rfloor}\sigma_{k}^{2}=\sum\limits_{k\geq\lfloor m/2\rfloor}(1+(k(k+1))^{s})^{-1}\lesssim\sum\limits_{k\geq\lfloor m/2\rfloor}k^{-2s}\lesssim_{s}m^{-2s+1}
km/2NK,ϱD(4k)σk2kkm/2kk2ssm2s+2,\displaystyle\sum\limits_{k\geq\lfloor m/2\rfloor}\frac{N_{K,\varrho_{D}}(4k)\sigma_{k}^{2}}{k}\lesssim\sum\limits_{k\geq\lfloor m/2\rfloor}kk^{-2s}\lesssim_{s}m^{-2s+2}\,,
supfH(Ks)1fS~𝐗mfL(D)sms+1ns+1(logn)s1.\sup\limits_{\|f\|_{H(K_{s})}\leq 1}\|f-\widetilde{S}_{\mathbf{X}}^{m}f\|_{L_{\infty}(D)}\lesssim_{s}m^{-s+1}\lesssim n^{-s+1}(\log n)^{s-1}\,. (7.14)

Applying the improved bounds for sampling numbers from Theorem 4.2, we arrive at the estimate

gn(Id)sns+1(logn)min{s1,1/2}.g_{n}(\mathrm{Id})\lesssim_{s}n^{-s+1}(\log n)^{\min\{s-1,1/2\}}\,. (7.15)

Note that Ch. Bernardi and Y. Maday [2, Theorem 6.2] obtained estimates with the optimal main rate for approximation by a polynomial operator at Gauss points in the space L2(D)L_{2}(D). We are not aware of any existing L(D)L_{\infty}(D) bounds in the literature. In the paper by L. Kämmerer, T. Ullrich and T. Volkmer [9], the authors obtained improved worst-case error estimates with high probability in the space L2(D)L_{2}(D) (see Examples 5.7 and 5.10 there).
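The closed-form expression N(m)=(m−1)²/2 used above can be confirmed numerically. A minimal sketch with numpy's Legendre polynomials (L2(dx)-normalized on [−1,1]; the supremum is attained at the endpoints x=±1) reads as follows.

```python
# Check N(m) = sup_x sum_{k=1}^{m-1} |P_k(x)|^2 = (m-1)^2/2, where the P_k are the
# L2(dx)-normalized Legendre polynomials of degrees 0, ..., m-2 on [-1, 1].
import numpy as np
from numpy.polynomial.legendre import Legendre

m = 12
x = np.linspace(-1.0, 1.0, 5001)
christoffel = sum((2 * deg + 1) / 2.0 * Legendre.basis(deg)(x) ** 2 for deg in range(m - 1))
print(np.max(christoffel), (m - 1) ** 2 / 2.0)   # both give 60.5 for m = 12
```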

Remark 7.9.

Note that, using the density function ϱm\varrho_{m} from Remark 3.4, we obtain the less sharp bound

supfH(Ks)1fS~𝐗mfL(D)smm2s+2ns+3/2(logn)s3/2.\sup\limits_{\|f\|_{H(K_{s})}\leq 1}\|f-\widetilde{S}_{\mathbf{X}}^{m}f\|_{L_{\infty}(D)}\lesssim_{s}\sqrt{m\cdot m^{-2s+2}}\lesssim n^{-s+3/2}(\log n)^{s-3/2}\,. (7.16)

Without the weighted least squares (see Remark 3.5), we obtain

supfH(Ks)1fS𝐗mfL(D)sms+1\sup\limits_{\|f\|_{H(K_{s})}\leq 1}\|f-S_{\mathbf{X}}^{m}f\|_{L_{\infty}(D)}\lesssim_{s}m^{-s+1}

which is of the same order with respect to the parameter mm as the bound in (7.14). However, we need to impose the condition N(m)m2cn/lognN(m)\sim m^{2}\leq c\,n/\log n, i.e., mCn/lognm\leq C\,\sqrt{n/\log n}. Hence, in terms of the number nn of used samples we get

supfH(Ks)1fS𝐗mfL(D)sn(s1)/2(logn)(s1)/2,\sup\limits_{\|f\|_{H(K_{s})}\leq 1}\|f-S_{\mathbf{X}}^{m}f\|_{L_{\infty}(D)}\lesssim_{s}n^{-(s-1)/2}(\log n)^{(s-1)/2}\,, (7.17)

which is sometimes even better than (7.16), namely if 1<s<21<s<2. However, (7.16) and (7.17) are never better than the right-hand side of (7.15).

Appendix. Preasymptotic estimates for singular numbers.

Note that the asymptotic rate of the singular numbers σn(Id:Hmixs(𝕋d)L2(𝕋d))\sigma_{n}(\mathrm{Id}\colon H^{s}_{{\rm mix}}(\mathds{T}^{d})\rightarrow L_{2}(\mathds{T}^{d})) as nn\to\infty has been known for a long time in many cases. For the periodic Sobolev classes Hmixs(𝕋d)H^{s}_{{\rm mix}}(\mathds{T}^{d}) of functions with dominating mixed smoothness, the order of the approximation numbers σn\sigma_{n} was obtained by B.S. Mityagin in 1962 (in a slightly more general setting). For the isotropic classes Hs(𝕋d)H^{s}(\mathds{T}^{d}), the order of the corresponding approximation numbers was established by J.W. Jerome in 1967. Many authors were involved in finding the optimal orders of convergence for linear algorithms on different function classes, but in most cases the constants in the estimates were not specified. In practical applications, however, the dependence of these constants on the smoothness parameter of the spaces and on the dimension of the underlying domain is a crucial point.

Various comments on the literature concerning existing preasymptotic results for mixed order Sobolev functions on the dd-torus can be found in [17, Chapt. 4.5]. We mention only a few that are closely connected to our research.

It was shown by T. Kühn, W. Sickel and T. Ullrich [17] that for s>0s>0 and dd\in\mathds{N} it holds

limnnsσn(Id:Hmixs(𝕋d)L2(𝕋d))(logn)s(d1)=(2d(d1)!)s\lim\limits_{n\to\infty}\frac{n^{s}\sigma_{n}(\mathrm{Id}\colon H^{s}_{{\rm mix}}(\mathds{T}^{d})\rightarrow L_{2}(\mathds{T}^{d}))}{(\log n)^{s(d-1)}}=\left(\frac{2^{d}}{(d-1)!}\right)^{s}

(they showed the existence of the limit and that its value is independent of the choice of the norm). The preasymptotic estimates (for small nn) depend heavily on the choice of the norm in the space Hmixs(𝕋d)H^{s}_{{\rm mix}}(\mathds{T}^{d}). Therefore, the authors further considered several natural norms (#,\#,* and ++) on the space Hmixs(𝕋d)H^{s}_{{\rm mix}}(\mathds{T}^{d}); in particular, they showed that for the classes Hmixs,#(𝕋d)H^{s,\#}_{{\rm mix}}(\mathds{T}^{d}), defined exactly as in (7.7), for s>0s>0, dd\in\mathds{N}, d2d\geq 2 and 1n4d1\leq n\leq 4^{d} the estimate

σn(Id:Hmixs,#(𝕋d)L2(𝕋d))(e2n)s2+log2d\sigma_{n}(\mathrm{Id}\colon H^{s,\#}_{{\rm mix}}(\mathds{T}^{d})\rightarrow L_{2}(\mathds{T}^{d}))\leq\left(\frac{e^{2}}{n}\right)^{\frac{s}{2+\log_{2}d}}

is true (with the corresponding non-matching lower bound).

This result was improved by T. Kühn [15], who showed that for s>0s>0, dd\in\mathds{N} and all n6n\geq 6 it holds

σn(Id:Hmixs,#(𝕋d)L2(𝕋d))(163n)s1+log2d.\sigma_{n}(\mathrm{Id}\colon H^{s,\#}_{{\rm mix}}(\mathds{T}^{d})\rightarrow L_{2}(\mathds{T}^{d}))\leq\left(\frac{16}{3n}\right)^{\frac{s}{1+\log_{2}d}}. (7.18)

The estimate (7.18) covers all values of the parameter nn, not only the preasymptotic range; moreover, both the factor and the exponent on the right-hand side are better.

In the case of the “+”-norm, a new preasymptotic bound for classes of functions with anisotropic mixed smoothness was obtained by T. Kühn, W. Sickel and T. Ullrich [18, Theorem 4.11]. In particular, it was proved that for s>0s>0, d3d\geq 3 and all n2n\geq 2 it holds

σn(Id:Hmixs,+(𝕋d)L2(𝕋d))(C(d)n)s2(1+log2(d1)),\sigma_{n}(\mathrm{Id}\colon H^{s,+}_{{\rm mix}}(\mathds{T}^{d})\rightarrow L_{2}(\mathds{T}^{d}))\leq\left(\frac{C(d)}{n}\right)^{\frac{s}{2(1+\log_{2}(d-1))}}, (7.19)

where the constant C(d)=(1+1d1(1+2log2(d1)))d1C(d)=\left(1+\frac{1}{d-1}\left(1+\frac{2}{\log_{2}(d-1)}\right)\right)^{d-1}. Note that this estimate was stated in a more general setting. In the same paper one can also find an improved upper bound in the range 2ned12\leq n\leq e^{d-1}.
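For orientation, the constant C(d) can be evaluated directly from the displayed formula; a short sketch for a few dimensions follows.

```python
# Evaluate C(d) = (1 + (1 + 2/log2(d-1)) / (d-1))^(d-1) from (7.19) for several d >= 3.
import numpy as np

for d in (3, 5, 9, 17, 33):
    C = (1.0 + (1.0 + 2.0 / np.log2(d - 1)) / (d - 1)) ** (d - 1)
    print(f"d = {d:2d}:  C(d) = {C:.3f}")
```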

Note also that D. Krieg [12, Section 4.1] established earlier results on the asymptotic and preasymptotic behavior of tensor powers of arbitrary sequences of polynomial decay, as well as further estimates for approximation numbers of embeddings of mixed order Sobolev functions on the dd-cube (with different equivalent norms) into L2([a,b]d)L_{2}([a,b]^{d}), where [a,b]{[0,1],[1,1],[0,2π]}[a,b]\in\{[0,1],[-1,1],[0,2\pi]\}.

As for the isotropic Sobolev classes Hs(𝕋d)H^{s}(\mathds{T}^{d}), T. Kühn, S. Mayer and T. Ullrich [16] established a connection between approximation numbers of periodic Sobolev type spaces, where the smoothness weights on the Fourier coefficients are influenced by a (quasi-)norm \|\cdot\| on d\mathds{R}^{d}, and entropy numbers of embeddings of finite dimensional balls. This allowed them to obtain preasymptotic estimates for isotropic Sobolev spaces and spaces of Gevrey type (in terms of equivalences where the hidden constants do not depend on nn and dd). However, in [16] the authors did not indicate explicitly the dependence of these constants on the parameters ss and pp. The exact dependence on the dimension dd and the smoothness ss was given later by T. Kühn [15].

Acknowledgment.

T.U. would like to acknowledge support by the DFG Ul-403/2-1. The authors would like to thank Felix Bartel, David Krieg, Stefan Kunis and Mario Ullrich for valuable comments.

References

  • [1] A. Berlinet and C. Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Kluwer Academic Publishers, Boston, MA, 2004. With a preface by Persi Diaconis.
  • [2] C. Bernardi and Y. Maday. Polynomial interpolation results in Sobolev spaces. J. Comput. Appl. Math., 43(1):53–80, 1992.
  • [3] A. Christmann and I. Steinwart. Support Vector Machines. Springer, 2008.
  • [4] F. Cobos, T. Kühn, and W. Sickel. Optimal approximation of multivariate periodic Sobolev functions in the sup-norm. J. Funct. Anal., 270(11):4196–4212, 2016.
  • [5] A. Cohen and G. Migliorati. Optimal weighted least-squares methods. SMAI J. Comput. Math., 3:181–203, 2017.
  • [6] K. Gröchenig. Sampling, Marcinkiewicz-Zygmund inequalities, approximation, and quadrature rules. J. Approx. Theory, 257, 2020.
  • [7] M. Hein and O. Bousquet. Kernels, associated structures and generalizations. Technical Report 127, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2004.
  • [8] L. Kämmerer. Multiple lattice rules for multivariate LL_{\infty} approximation in the worst-case setting. arXiv: math/1909.02290v1, 2019.
  • [9] L. Kämmerer, T. Ullrich, and T. Volkmer. Worst-case recovery guarantees for least squares approximation using random samples. arXiv: math/1911.10111, 2019.
  • [10] L. Kämmerer and T. Volkmer. Approximation of multivariate periodic functions based on sampling along multiple rank-1 lattices. J. Approx. Theory, 246:1–27, 2019.
  • [11] D. Krieg and M. Sonnleitner. Random points are optimal for the approximation of Sobolev functions. arXiv: math/2009.11275, 2020.
  • [12] D. Krieg. Tensor power sequences and the approximation of tensor product operators. J. Complexity, 44:30–51, 2018.
  • [13] D. Krieg and M. Ullrich. Function values are enough for L2{L}_{2}-approximation: Part II. J. Complexity, to appear. arXiv: math/2011.01779.
  • [14] D. Krieg and M. Ullrich. Function values are enough for L2{L}_{2}-approximation. Found. Comp. Math., to appear. arXiv:math/1905.02516v3.
  • [15] T. Kühn. New preasymptotic estimates for the approximation of periodic Sobolev functions. In 2018 MATRIX annals, volume 3 of MATRIX Book Ser., pages 97–112. Springer, Cham., 2020.
  • [16] T. Kühn, S. Mayer, and T. Ullrich. Counting via entropy: new preasymptotics for the approximation numbers of Sobolev embeddings. SIAM J. Numer. Anal., 54(6):3625–3647, 2016.
  • [17] T. Kühn, W. Sickel, and T. Ullrich. Approximation of mixed order Sobolev functions on the dd-torus: asymptotics, preasymptotics and dd-dependence. Constr. Approx., 42:353–398, 2015.
  • [18] T. Kühn, W. Sickel, and T. Ullrich. How anisotropic mixed smoothness affects the decay of singular numbers of Sobolev embeddings. J. Complexity, to appear, arXiv: math/2001.09022v3.
  • [19] R. J. Kunsch. Breaking the curse for uniform approximation in Hilbert spaces via Monte Carlo methods. J. Complexity, 48:15–35, 2018.
  • [20] F. Y. Kuo, G. W. Wasilkowski, and H. Woźniakowski. Multivariate LL_{\infty} approximation in the worst case setting over reproducing kernel Hilbert spaces. J. Approx. Theory, 152(2):135–160, 2008.
  • [21] F. Y. Kuo, G. W. Wasilkowski, and H. Woźniakowski. On the power of standard information for multivariate approximation in the worst case setting. J. Approx. Theory, 158(1):97–125, 2009.
  • [22] I. Limonova and V. N. Temlyakov. On sampling discretization in L2{L}_{2}. arXiv: math/2009.10789v1, 2020.
  • [23] M. Moeller and T. Ullrich. L2L_{2}-norm sampling discretization and recovery of functions from RKHS with finite trace. arXiv: math/2009.11940v1, 2020.
  • [24] N. Nagel, M. Schäfer, and T. Ullrich. A new upper bound for sampling numbers. Found. Comp. Math., to appear, arXiv: math/2010.00327v1.
  • [25] P. G. Nevai. Orthogonal polynomials. Mem. Amer. Math. Soc., 18(213):v+185, 1979.
  • [26] E. Novak and H. Woźniakowski. Tractability of multivariate problems. Vol. 1: Linear information, volume 6 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2008.
  • [27] E. Novak and H. Woźniakowski. Tractability of multivariate problems. Volume II: Standard information for functionals, volume 12 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2010.
  • [28] S. Nitzan, A. Olevskii and A. Ulanovskii. Exponential frames on unbounded sets. Proc. Amer. Math. Soc., 144(1):109–118, 2016.
  • [29] E. Novak and H. Woźniakowski. Tractability of multivariate problems. Volume III: Standard information for operators, volume 18 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2012.
  • [30] K. Y. Osipenko and O. G. Parfenov. Ismagilov type theorems for linear, Gel’fand and Bernstein nn-widths. J. Complexity, 11(4):474–492, 1995.
  • [31] C. C. Paige and M. A. Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8:43–71, 1982.
  • [32] K. Pozharska and T. Ullrich. A note on sampling recovery of multivariate functions in the uniform norm. arXiv: math/2103.11124, 2021.
  • [33] I. Steinwart and C. Scovel. Mercer's theorem on general domains: on the interaction between measures, kernels, and RKHSs. Constr. Approx., 35, 2012.
  • [34] V. N. Temlyakov. On approximate recovery of functions with bounded mixed derivative. J. Complexity, 9(1):41–59, 1993.
  • [35] V. N. Temlyakov. On optimal recovery in L2L_{2}. J. Complexity, to appear. arXiv: math/2010.03103.
  • [36] V. N. Temlyakov and T. Ullrich. Approximation of functions with small mixed smoothness in the uniform norm. arXiv: math/2012.11983, 2020.
  • [37] V. N. Temlyakov and T. Ullrich. Bounds on Kolmogorov widths and sampling recovery for classes with small mixed smoothness. arXiv: math/2012.09925, 2020.
  • [38] M. Ullrich. On the worst-case error of least squares algorithms for L2{L}_{2}-approximation with high probability. J. Complexity, 60, 2020.
  • [39] G. W. Wasilkowski and H. Woźniakowski. On the power of standard information for weighted approximation. Found. Comput. Math., 1(4):417–434, 2001.