A note on sampling recovery of multivariate functions in the uniform norm
Abstract
We study the recovery of multivariate functions from reproducing kernel Hilbert spaces in the uniform norm.
Our main interest is to obtain preasymptotic estimates for the corresponding sampling numbers. We obtain results in terms of the decay of the related singular numbers of the compact embedding into L_2, multiplied with the supremum of the Christoffel function of the subspace spanned by the first singular functions. Here the measure is at our disposal. As an application we obtain near-optimal upper bounds for the sampling numbers of periodic Sobolev type spaces with a general smoothness weight. These can be bounded in terms of the corresponding benchmark approximation numbers in the uniform norm, which allows for preasymptotic bounds. By applying a recently introduced sub-sampling technique related to Weaver’s conjecture we mostly lose a √(log n) and sometimes even less. Finally, we point out a relation to the corresponding Kolmogorov numbers.
Keywords and phrases: Sampling recovery, least squares approximation, random sampling, rate of convergence, uniform norm
2010 AMS Mathematics Subject Classification: 41A25, 41A60, 41A63, 42A10, 68W40, 94A20
1 Introduction
We consider the problem of the recovery of complex-valued multivariate functions on a compact domain in the uniform norm. We are allowed to use function samples on a suitable set of nodes. The functions are modeled as elements of some reproducing kernel Hilbert space with kernel K. The nodes are drawn independently according to a tailored probability measure depending on the spectral properties of the embedding of the space into L_2. This “random information” is used for the whole class of functions, and our main focus in this paper is on worst-case errors. We additionally apply a recently introduced sub-sampling technique, see [24], to further reduce the sampling budget. Note that in one scenario the measure is at our disposal and represents a certain degree of freedom. In another scenario the measure is fixed, namely the one from which the data is coming.
Authors distinguish two main paradigms for the recovery of functions from “samples”. Here “samples” refers to the available information. The first paradigm uses general “linear” information about functions, i.e., evaluations of arbitrary linear continuous functionals, such as inner products of the Hilbert space, e.g., Fourier or wavelet coefficients. The corresponding quantity that measures the minimal worst-case error is given by the Gelfand numbers/widths, see the monographs [26, 27, 29]. In case of a Hilbert source space (as in our setting) these quantities equal the so-called approximation numbers, which are usually denoted by a_n and represent the worst-case error obtained by linear recovery algorithms. Precisely, let H(K) be a reproducing kernel Hilbert space (RKHS) of multivariate complex-valued functions with kernel K. Consider the identity operator Id : H(K) → ℓ_∞. Further, by A_n we denote the set of linear continuous operators of rank at most n. The approximation numbers are defined as follows
(1.1)  a_n := inf_{A ∈ A_n} sup_{‖f‖_{H(K)} ≤ 1} ‖f − A f‖_∞.
If the target space is a Hilbert space, then a_n represents the n-th singular number of the corresponding embedding.
Several practical applications are modeled with “standard information”, i.e., a finite number of function values at some points. Let f(x^1), …, f(x^n) be the given data, where x^1, …, x^n ∈ D are the nodes. The corresponding sampling numbers are defined as follows:
(1.2)  g_n := inf_{x^1, …, x^n ∈ D} inf_{φ : C^n → ℓ_∞} sup_{‖f‖_{H(K)} ≤ 1} ‖f − φ(f(x^1), …, f(x^n))‖_∞.
Note that we always have a_n ≤ g_n.
Our goal is to recover functions using a (weighted) least squares algorithm and measure the error in the uniform norm. Function values are taken at independently drawn random nodes according to a tailored probability measure. The main feature of this approach is that the nodes are drawn once for the whole class. This is usually termed “random information”. Note that we do not investigate Monte Carlo algorithms in this paper.
A recent breakthrough was made by D. Krieg and M. Ullrich [14] in establishing a new connection between sampling numbers and singular values for L_2-approximation. This has been extended by L. Kämmerer, T. Ullrich and T. Volkmer [9], where bounds with high probability have been obtained; see also M. Moeller and T. Ullrich [23] and M. Ullrich [38] for further refinements. All the mentioned contributions require a logarithmic oversampling, i.e., n ≍ m log m samples for an m-dimensional approximation space. A new sub-sampling technique has been applied by N. Nagel, M. Schäfer and T. Ullrich in [24], where it is shown that O(m) samples are sufficient. For non-Hilbert function classes we refer to D. Krieg and M. Ullrich [13] and V.N. Temlyakov [35], where a connection to Kolmogorov widths is pointed out. There is also a relation in the -setting, see [32] for details.
In the present paper we obtain worst-case recovery guarantees with high probability for the weighted least squares recovery operator (see Algorithm 1), where the error is measured in the supremum norm of the space ℓ_∞(D) of functions bounded on D. The operator uses a density function tailored to the spectral properties of the corresponding embedding into L_2, see (3.3) and (3.4). We need to impose stronger conditions, namely we consider a compact set D, a continuous and bounded kernel K, and a Borel measure with full support (such that Mercer’s theorem is applicable). Then we get for suitable n and m the following error bound with probability larger than
(1.3) |
with three universal constants, where N(m) denotes the supremum of the spectral function (Christoffel function) which is defined in Section 2. When applying this theorem to periodic spaces with mixed smoothness we obtain the following result (see Theorem 7.8).
If , , and then
(1.4) |
with probability larger than . Clearly, we have to specify the norm used in the space. In the above statement we used the norm from (7.7). This is a preasymptotic statement.
Note that the sampling operator uses n ≍ m log m sample points, so that we obtain a bound for the corresponding sampling numbers. By a recently introduced sub-sampling technique (see [24]) we obtain for the sampling numbers another result which may be better in many cases, namely
(1.5) |
where the constants are universal. We evaluate this bound for several relevant examples to obtain new bounds for sampling numbers. Here the question of reasonable assumptions arises to ensure a polynomial order of the error decay, i.e., the supremum over all α such that the worst-case error is of order n^{−α}. We improve a result by F. Y. Kuo, G. W. Wasilkowski and H. Woźniakowski [21] and show that in case and . We actually prove more: If there is a measure such that we obtain as a consequence of (1.3) and (1.5) together with [4, Lem. 3.3] the relation
(1.6) |
where the first constant is absolute and the second depends on the kernel, see Corollary 4.3. This improves on a recent result by L. Kämmerer, see [8, Theorem 4.11], which was stated under stronger conditions (and which is, in turn, an improvement of the previous result by L. Kämmerer and T. Volkmer [10]). It shows that in this case we only lose a √(log n) or sometimes even less (assuming a polynomial decay of the singular numbers). In case that , with , we have at least that .
As to spaces with mixed smoothness , , we know by the work of V.N. Temlyakov [34] that
where the optimal algorithm is a (deterministic) Smolyak type interpolation operator. Our algorithm yields the bound
which is only worse by a small d-independent logarithmic factor. However, as indicated in (1.4), we also obtain preasymptotic bounds from (1.3) and (1.5). The open question remains whether random i.i.d. nodes are asymptotically optimal for this framework. Such a question has recently been answered positively for isotropic Sobolev spaces in [11].
Let us point out that the above general results are also applicable to the more general periodic Sobolev type spaces with a weight sequence . If represents the non-increasing rearrangement of the square summable weight sequence then we have
with three universal constants. In most cases we do not know whether there is a deterministic method which may compete with this bound.
Notation. By N we denote the set of natural numbers, putting N_0 := N ∪ {0}; by Z the set of integers, by R the real and by C the complex numbers. As usual, a complex number is decomposed into its real and imaginary parts. Let further N^d, N_0^d, Z^d, R^d and C^d be the spaces of d-dimensional vectors with coordinates from the respective sets. Coordinate-wise relations between vectors (such as ≤) are defined analogously. The bar denotes the complex conjugate and the superscript T the transpose. Let also ‖x‖_0 be the number of nonzero components (sparsity) of a vector x. The symbol ⟨·, ·⟩ means the Euclidean scalar product in R^d or C^d. We use the notation log for the natural logarithm. By C^{n×m} we denote the set of all n × m matrices with complex entries; ‖·‖ is their spectral norm (the largest singular value), and the adjoint matrix is obtained by taking the transpose and then the complex conjugate of each entry. For an operator between two Banach spaces we use the same symbol for its operator norm, i.e.,
If the spaces are Hilbert spaces, we denote by A^* the adjoint of an operator A, such that the corresponding scalar products satisfy ⟨Af, g⟩ = ⟨f, A^*g⟩. The notation A^{−1} stands for the inverse operator. For two sequences we write a_n ≲ b_n if there exists a constant c > 0 such that a_n ≤ c b_n for all n. We will write a_n ≍ b_n if a_n ≲ b_n and b_n ≲ a_n. If the constant depends on the dimension d and smoothness s, we indicate it by ≲_{d,s} and ≍_{d,s}.
2 Reproducing kernel Hilbert spaces
We will work in the framework of reproducing kernel Hilbert spaces. The relevant theoretical background can be found in [1, Chapt. 1] and [3, Chapt. 4].
Let D be an arbitrary subset of R^d and ϱ be a measure on D. By L_2(D, ϱ) we denote the space of complex-valued square-integrable functions with respect to ϱ, where
and, respectively,
With ℓ_∞(D) we denote the space of functions bounded on D, for which
Note that one does not need the measure ϱ for this embedding. In fact, here we mean “boundedness” in the strong sense (in contrast to essential boundedness w.r.t. the measure ϱ).
We further consider a reproducing kernel Hilbert space H(K) with a Hermitian positive definite kernel K on D × D. The crucial property is the reproducing identity
f(x) = ⟨f, K(·, x)⟩_{H(K)}
for all x ∈ D and f ∈ H(K). It ensures that point evaluations are continuous functionals on H(K). We will use the notation from [3, Chapt. 4]. In the framework of this paper, the boundedness of the kernel is assumed, i.e.,
(2.1)  ‖K‖_∞ := sup_{x ∈ D} √(K(x, x)) < ∞.
The condition (2.1) implies that H(K) is continuously embedded into ℓ_∞(D), i.e.,
(2.2)  ‖f‖_{ℓ_∞(D)} ≤ ‖K‖_∞ · ‖f‖_{H(K)}.
The embedding operator
(2.3)  Id : H(K) → L_2(D, ϱ)
is Hilbert-Schmidt under the finite trace condition of the kernel
(2.4)  ∫_D K(x, x) dϱ(x) < ∞,
see [7], [33, Lemma 2.3], which is less strong than (2.1). In view of the recovery of functions in the uniform norm considered in this paper, we assume from now on that the kernel is bounded. We additionally assume that H(K) is infinite dimensional.
Consider the embedding Id from (2.3) and the operator W := Id^* ∘ Id. Due to the compactness of Id, the operator W provides an (at most) countable system of strictly positive eigenvalues. Let (λ_k) be a non-increasing rearrangement of these eigenvalues and (σ_k), σ_k := √λ_k, the sequence of singular numbers of Id. Further, with (e_k) we denote the system of right eigenfunctions corresponding to the ordered system (λ_k), and with (η_k) the system consisting of η_k := e_k/σ_k, k ∈ N; these represent orthonormal systems (ONS) in H(K) and L_2(D, ϱ), respectively.
We would like to emphasize that the embedding (2.3) is not necessarily injective. As discussed in [3, p. 127], Id is in general not injective, as it maps a function to an equivalence class. As a consequence, the system of eigenvectors may not be a basis in H(K) (note that H(K) may not even be separable). However, there are conditions which ensure that the ONS (η_k) is an orthonormal basis (ONB) in L_2(D, ϱ), see [3, Chapt. 4.5], which is related to Mercer’s theorem, see [3, Theorem 4.49]. Indeed, if we additionally assume that the kernel is bounded and continuous on D × D (for a compact domain D), then H(K) is separable and consists of continuous functions, see [1, Theorems 16, 17]. If we finally assume that the measure ϱ is a Borel measure with full support, then (η_k) is a complete orthonormal system, i.e., an ONB in L_2(D, ϱ).
In this case we have the pointwise identity (see, e.g., [1, Cor. 4])
(2.5)  K(x, y) = Σ_{k=1}^∞ σ_k² η_k(x) η̄_k(y).
In what follows, we assume that the kernel is continuous on the compact domain D × D (a so-called Mercer kernel). Such kernels satisfy all the conditions indicated above (see [3, Theorem 4.49]), and we even have (2.5) with absolute and uniform convergence on D × D. Taking into account the spectral theorem [3, Chapt. 4.5], we have that the system (η_k) of eigenfunctions of the non-negative compact self-adjoint operator W is an orthonormal basis in L_2(D, ϱ). Note that (e_k) is also an ONB in H(K).
Hence, for every f ∈ H(K) with a Mercer kernel K, it holds
(2.6)  f = Σ_{k=1}^∞ c_k(f) η_k,
where c_k(f) := ⟨f, η_k⟩_{L_2(D, ϱ)}, k ∈ N, are the Fourier coefficients of f with respect to the system (η_k). Let us further denote by P_m f := Σ_{k=1}^m c_k(f) η_k the projection onto the space V_m := span{η_1, …, η_m}. Due to Parseval’s identity, the norm of the space H(K) takes the following form
(2.7)  ‖f‖²_{H(K)} = Σ_{k=1}^∞ |c_k(f)|² / σ_k².
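The Parseval-type identity just stated can be used directly in computations: the H(K)-norm of a function is read off from its L_2-Fourier coefficients and the singular numbers. The following minimal sketch is our own illustration (function and variable names are not from the paper):

```python
import numpy as np

def hk_norm(coeffs, sigmas):
    """||f||_{H(K)}^2 = sum_k |c_k|^2 / sigma_k^2, where c_k are the Fourier
    coefficients of f w.r.t. the L2-orthonormal system and sigma_k are the
    singular numbers of the embedding (Parseval-type identity)."""
    coeffs = np.asarray(coeffs)
    sigmas = np.asarray(sigmas)
    return np.sqrt(np.sum(np.abs(coeffs) ** 2 / sigmas ** 2))

# Toy data: two coefficients and two singular numbers.
val = hk_norm([1.0, 0.5], [1.0, 0.5])  # 1 + 0.25/0.25 = 2, so the norm is sqrt(2)
```

Note how a fast decay of the singular numbers forces the Fourier coefficients of a norm-bounded function to decay as well.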
Note that the function
N(m, x) := Σ_{k=1}^m |η_k(x)|²
is often called the “Christoffel function” in the literature (see [6] and references therein). For m ∈ N, we define
(2.8)  N(m) := sup_{x ∈ D} N(m, x).
We may use the quantity to bound the operator norm (We use the abbreviation ). It holds
This implies
Using that the sequence (σ_k) is non-increasing, i.e., σ_{k+1} ≤ σ_k for all k, we obtain by a straightforward calculation
Hence, in view of the relations and , that are true for all , we get
(2.9) |
Note that the system (η_k) is not necessarily a uniformly ℓ_∞-bounded ONS in L_2(D, ϱ); nevertheless the equality (2.5) is true. For systems with ‖η_k‖_∞ ≤ B for all k ∈ N, we have
(2.10)  N(m) ≤ B² m.
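For a uniformly bounded ONS the spectral function thus grows at most linearly in m; for the trigonometric system one even has N(m, x) = m for every x. The following numerical check is our own illustration (names hypothetical):

```python
import numpy as np

def spectral_function(basis_vals):
    """N(m, x) = sum_{k=1}^m |eta_k(x)|^2 for an ONS evaluated on a grid.
    basis_vals has shape (m, n_points), row k holding eta_k at the grid points."""
    return np.sum(np.abs(basis_vals) ** 2, axis=0)

# Trigonometric system eta_k(x) = exp(i k x), orthonormal w.r.t. the uniform
# probability measure on [0, 2*pi): |eta_k(x)| = 1 everywhere, hence
# N(m, x) = m identically (so B = 1, with equality in the linear bound).
m = 8
xs = np.linspace(0.0, 2.0 * np.pi, 50)
vals = np.exp(1j * np.outer(np.arange(m), xs))
N = spectral_function(vals)  # constant array with value m
```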
3 Least squares approximation in the uniform norm
Let further x^1, …, x^n ∈ D be nodes drawn independently according to a Borel measure on D that has full support. In [9] a recovery operator was constructed which computes the best least squares fit to the given data from the finite-dimensional space spanned by η_1, …, η_m.
Namely, let f := (f(x^1), …, f(x^n))^T and
(3.1)  L := (η_k(x^j))_{j=1,…,n; k=1,…,m}.
We solve the over-determined linear system L c = f via least squares (e.g. directly or via the LSQR algorithm [31]), i.e., compute
c := argmin_{c ∈ C^m} ‖L c − f‖_2.
The output c = (c_1, …, c_m)^T gives the coefficients of the approximant
(3.2)  S_X^m f := Σ_{k=1}^m c_k η_k.
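The plain least squares step just described can be sketched numerically. This is an illustrative reconstruction under assumed data (trigonometric system, noise-free samples, names ours), not the paper's implementation; SciPy's LSQR [31] could replace the dense solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# n i.i.d. nodes and a trigonometric polynomial with m unknown coefficients.
m, n = 5, 40
nodes = rng.uniform(0.0, 2.0 * np.pi, size=n)
c_true = rng.standard_normal(m) + 1j * rng.standard_normal(m)

# System matrix L[j, k] = eta_k(x^j) for the system eta_k(x) = exp(i k x).
L = np.exp(1j * np.outer(nodes, np.arange(m)))
samples = L @ c_true  # the given data f = (f(x^1), ..., f(x^n))

# Least squares fit c = argmin ||L c - f||_2; the approximant is then the
# trigonometric polynomial with coefficients c_hat.
c_hat, *_ = np.linalg.lstsq(L, samples, rcond=None)
```

Since the data is noise-free and n ≥ m with distinct nodes, the fit recovers the coefficients exactly up to rounding.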
Further, let us consider a weighted least squares algorithm (see below) using the following weight/density function, which was introduced by D. Krieg and M. Ullrich in [14]. We set
(3.3)  ρ_m(x) := (1/2) ( (1/m) Σ_{k=1}^m |η_k(x)|² + Σ_{k=m+1}^∞ σ_k² |η_k(x)|² / Σ_{k=m+1}^∞ σ_k² ).
This density first appeared in [14]. We will also use the algorithm below with the simpler density function
(3.4)  ρ_m(x) := (1/m) Σ_{k=1}^m |η_k(x)|²
in case ϱ is a probability measure (see Remark 3.4). This density function has been implicitly used by V.N. Temlyakov in [35] and by A. Cohen and G. Migliorati [5]. In many relevant cases both densities yield similar bounds. However, as we point out in Remark 7.9, it can make a big difference even in the univariate situation.
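Drawing the i.i.d. nodes from such a density can be done, e.g., by plain rejection sampling. The sketch below is our own illustration with a hypothetical stand-in density; in the actual setting the density would be assembled from the orthonormal functions and the spectral tail as above.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_nodes(density, n, lo, hi, bound):
    """Draw n i.i.d. points on [lo, hi] from a probability density by
    rejection sampling against the uniform proposal; `bound` must
    dominate the density on [lo, hi]."""
    out = []
    while len(out) < n:
        x = rng.uniform(lo, hi)
        if rng.uniform(0.0, bound) < density(x):
            out.append(x)
    return np.array(out)

# Hypothetical stand-in density on [0, 2*pi) (integrates to one).
rho = lambda x: (1.0 + 0.5 * np.cos(x)) / (2.0 * np.pi)
pts = draw_nodes(rho, 200, 0.0, 2.0 * np.pi, 1.5 / (2.0 * np.pi))
```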
Input: X = (x^1, …, x^n) — set of distinct sampling nodes;
f = (f(x^1), …, f(x^n))^T — samples of f evaluated at the nodes from X,
such that the matrix in (3.5) has full (column) rank.
(3.5) |
Output: coefficients c of the approximant.
Theorem 3.1.
Let H(K) be a reproducing kernel Hilbert space of complex-valued functions defined on a compact domain D with a continuous and bounded kernel K, i.e., ‖K‖_∞ < ∞, and let ϱ be a finite Borel measure with full support on D. We denote with (σ_k) the non-increasing sequence of singular numbers of the embedding (2.3). The points x^1, …, x^n are drawn i.i.d. with respect to the measure defined by the density (3.3). There are absolute constants such that for
with , the reconstruction operator (see Algorithm 1), satisfies
(3.6) |
Proof.
Let such that . For the reconstruction operator from Algorithm 1 it holds
(3.7) |
where the projection is defined in Section 2.
From (2.9) we obtain that
(3.8) |
Clearly, . Therefore, for the second summand on the right-hand side of (3.7) we can write
(3.9) |
Let us estimate the quantity , i.e., -norm of the coefficients of the operator . We have
(3.13) | ||||
(3.14) |
Now we estimate the spectral norm. For the function
by the choice (3.3) of , it holds . Hence, in view of [24, Theorem 5.1], we get with
for the condition
(3.15) |
and have
(3.16) |
with probability at least . Combining (3.14) and (3.16), and taking into account (2.6), we arrive at
(3.17) |
where
is an infinite matrix with , , . Then, (3.17) yields
(3.18) |
with .
We can now use [23, Prop. 3.8], in view of the fact that
Remark 3.2.
Note that we can get explicit values for the constants in Theorem 3.1. Indeed, following the proof we see that . As to , the relation (3.20) yields
Then, taking into account (see Remark 2 on p. 16 of [24]) that
we get
where . Respectively, we put in (3.20), and from (3.21) get
Hence, in the general estimate (3.6).
Corollary 3.3.
If there is a measure on such that we obtain
(3.22) |
for a constant depending on the kernel and the measure.
Remark 3.4 (Other density).
Instead of (3.3) one could also use the simpler density function (3.4) defined above in case ϱ is a probability measure on D (otherwise with a proper normalization). Then the estimate would be slightly worse. Instead of the term
in (3.6) we obtain
where the constant is absolute. This does not make a difference if . However, it makes a difference if we have, for instance, N(m) growing as m², which is the case, e.g., for Legendre polynomials. See Example 7.3 below.
Remark 3.5 (Non-weighted version).
Note that one could even use a non-weighted least squares algorithm (see (3.2)), without considering the additional density function. If the measure ϱ is fixed, namely the one from which the data is coming, then we get a bound
where is an absolute constant which can be calculated precisely, is the largest number such that .
4 Improved bounds for sampling numbers
By exactly the same strategy as used in the recent paper by N. Nagel, M. Schäfer and T. Ullrich [24], we can improve the bound (3.6) for the sampling numbers associated to the compact embedding of an RKHS with Mercer kernel into the space ℓ_∞(D) of functions bounded on D, applying a modification of the Weaver sub-sampling strategy.
Theorem 4.1 ([24, 28, 22]).
Let and with for all and
for all . Then there is a of size with
for all , where only depend on . More precisely, we can choose
in case . In the regime one may put , , .
Theorem 4.2.
Under the assumptions of Theorem 3.1 we obtain the following bounds for the sampling numbers . The measure is at our disposal.
(i) There is an absolute constant such that
(ii) There are absolute constants such that
Proof.
The statement (i) follows immediately from Theorem 3.1.
To get (ii), we apply a sampling operator using a reduced number of sampling nodes constructed out of a random draw in combination with a sub-sampling procedure, instead of the full draw (as in Algorithm 1). That is, we omit some rows in the matrix L from (3.1) (respectively, in its weighted version) by constructing a sub-matrix with fewer rows, such that for all (see Theorem 4.1) it holds
With this matrix we perform the least squares method, applied to the shrunken vector of function samples; see the proof of Theorem 5 in [24] for the details. ∎
Corollary 4.3.
If there is a measure on such that we obtain
(4.1) |
Proof.
Remark 4.4.
The estimate from Corollary 4.3 improves the upper bound for sampling numbers of embeddings of RKHS, consisting of continuous periodic functions, into the uniform norm, that was established in a recent result by L. Kämmerer, see [8, Theorem 4.11]. Note also that this result was stated under stronger conditions on a weight function. It is, in turn, an improvement of the previous result by L. Kämmerer and T. Volkmer [10].
Remark 4.5.
The estimate from (ii) of Theorem 4.2 is non-constructive; we only have the existence of such an algorithm.
5 Sampling and Kolmogorov numbers
For a compact set F of complex-valued functions defined on a compact domain D, let us denote by d_n(F) its n-th Kolmogorov number, i.e.,
d_n(F) := inf_{V_n} sup_{f ∈ F} inf_{g ∈ V_n} ‖f − g‖_∞,
where the infimum is taken over all n-dimensional subspaces V_n of ℓ_∞(D). Assume also that we know the optimal subspace for d_n(F), i.e., a subspace such that
Let further, be a finite measure on and be any ONS in with respect to . The following statement holds.
Proposition 5.1.
Let F be a compactly embedded subspace (of complex-valued continuous functions) of ℓ_∞(D), where D is a compact subset. Then there is an absolute constant such that
holds, where
(5.1) |
and the finite measure on D is at our disposal.
Proof.
Remark 5.2.
The following corollary is in the spirit of [35, Theorem 1.1].
Corollary 5.3.
There are universal constants such that it holds
Proof.
This follows by an application of the subsampling technique in Theorem 4.1. ∎
Remark 5.4.
It was shown by V.N. Temlyakov [35, Theorem 1.1] that, for a compact set of complex-valued functions continuous on a compact domain, there exist two positive constants such that
For special sets (in the reproducing kernel Hilbert space setting) the estimate
with universal constants, was proved earlier by N. Nagel, M. Schäfer and T. Ullrich [24], based on D. Krieg and M. Ullrich [14]. Similar results for non-Hilbert function spaces were obtained in the second part of the research by D. Krieg and M. Ullrich [13].
Remark 5.5.
The above bound on sampling numbers in terms of Kolmogorov numbers motivates the investigation of Kolmogorov numbers in certain situations. It further motivates the study of the Christoffel function of the related optimal subspaces. Currently we do not know how this bound can be made more explicit, and therefore we do not have convincing examples. We leave it as an open problem to control the quantity (5.1) for the “optimal” subspace. Note that the measure is at our disposal.
Remark 5.6 (Approximation in the uniform norm).
In the recent papers by V.N. Temlyakov and T. Ullrich [36, 37], the behavior of certain asymptotic characteristics of multivariate function classes in the uniform norm was studied in the case of “small smoothness”. The crucial point here is the fact that in the small smoothness setting the corresponding approximation numbers are not square-summable. The established estimates for Kolmogorov widths of the Sobolev classes of functions and of classes of functions with bounded mixed differences serve as a powerful tool for investigating the sampling recovery problem (see also Section 7 in [36] and Section 5 in [37]).
6 The power of standard information in the uniform norm
Now we discuss the optimal order of convergence of algorithms that realize upper bounds for the approximation numbers and sampling numbers defined by (1.1) and (1.2) for the embedding operator.
The optimal order of convergence among all algorithms that use linear information for the identity operator is defined (see the monograph by E. Novak and H. Woźniakowski [29, Chapt. 26.6.1]) as
and, respectively, for standard information
In the case we write and , if put and . Note that, when investigating the approximation numbers , one can take with instead of considering the supremum norm of space.
Let us introduce the following two assumptions:
(i) ;
(ii) there exist , such that , .
Note that the values of and are optimal in the sense that
The stronger conditions on the integral operator were earlier introduced by F. Kuo, G. W. Wasilkowski and H. Woźniakowski in [20, 21]. Namely, instead of the condition (i) the authors assume that
(i’) there exists such that , ,
i.e., consider the special case . Note that the constants can depend on the dimension d. We consider concrete examples of RKHS which satisfy the indicated conditions.
If the second assumption is true, then the optimal rate of convergence among all algorithms that use linear information in the worst-case setting with the error measured in the L_2-norm is . If (i’) and (ii) are satisfied, then, when switching from the L_2- to the L_∞-norm, we lose in the optimal order [20]:
where . Clearly, we only need the weaker assumption (i) for such a statement.
Now we move to the L_∞-error among all algorithms that use standard information in the worst-case setting. It was proved in [21] that under the assumptions (i’) and (ii) the optimal order of convergence satisfies
(6.1) |
The upper bound cannot be improved even for linear information. From Theorem 4.2 we get the following improvement over (6.1).
Corollary 6.1.
Suppose that (i) and (ii) hold true with . Then . In case (or if (i’) holds) we have .
Proof.
Due to the inequality
This holds for any . We obtain
(6.2) |
Respectively, for , where is an absolute constant, the estimate (i) of Theorem 4.2 yields
We have now proved that, under the considered assumptions, . In case , i.e., where (i) = (i’), we have that . ∎
7 Examples
7.1 Sobolev type spaces
Let us begin with an application of the obtained results to classes of periodic functions. We compare the terms in Corollary 4.3 in the case when the singular numbers decay polynomially. This holds, in particular, for the widely used mixed Sobolev classes.
So, let T = [0, 2π] be the torus, where the endpoints of the interval are identified. By T^d we denote the d-dimensional torus, which we equip with the Lebesgue measure.
Let further L_2(T^d) be the space of all (equivalence classes of) measurable functions on T^d which are 2π-periodic in each component and such that
The functions are completely determined by their Fourier coefficients c_k(f), k ∈ Z^d, with respect to the trigonometric system (which forms an orthonormal basis in L_2(T^d)):
(7.1)  e_k(x) := (2π)^{−d/2} exp(i k · x), k ∈ Z^d,
namely, it holds
The notation stands for a Hilbert space of integrable functions on the torus such that
where , , are certain weights.
F. Cobos, T. Kühn and W. Sickel [4, Theorem 3.1] established a necessary and sufficient condition on the weights which guarantees the compactness of the embedding of the Sobolev type space into ℓ_∞(T^d) (recall that for the space ℓ_∞ we use the supremum norm). Namely, it was proved that the embedding is compact if and only if
(7.2)  Σ_{k ∈ Z^d} w(k)^{−2} < ∞.
Now we move to the concept of RKHS. Let us define the kernel
(7.3)  K(x, y) := (2π)^{−d} Σ_{k ∈ Z^d} w(k)^{−2} exp(i k · (x − y)),
which is bounded under the condition (7.2). The Sobolev type space is a reproducing kernel Hilbert space with the kernel (7.3), and the boundedness of the kernel implies that it is compactly embedded also into ℓ_∞(T^d). The eigenvalues of the corresponding operator W are given by λ_k = w(k)^{−2}, k ∈ Z^d. Let us consider the frequency set
and let m denote its cardinality. The reconstruction operator is defined for this frequency set by (3.2). Note that here we use the non-weighted least squares algorithm, since the trigonometric system is uniformly bounded: |e_k(x)| = (2π)^{−d/2} for all k ∈ Z^d and x ∈ T^d.
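For weights of product type the resulting frequency sets are hyperbolic crosses. The following sketch enumerates such a set for an assumed model weight w(k) = Π_j (1 + |k_j|)^s; the weight and threshold are illustrative, not the paper's exact choice:

```python
import itertools
import numpy as np

def hyperbolic_cross(d, s, T):
    """Frequency set { k in Z^d : w(k) <= T } for the model weight
    w(k) = prod_j (1 + |k_j|)^s (an assumed product-type weight)."""
    kmax = int(np.ceil(T ** (1.0 / s)))  # 1-d cutoff: (1 + |k_j|)^s <= T
    idx = []
    for k in itertools.product(range(-kmax, kmax + 1), repeat=d):
        if np.prod([(1.0 + abs(kj)) ** s for kj in k]) <= T:
            idx.append(k)
    return idx

I = hyperbolic_cross(d=2, s=2.0, T=16.0)  # all k with (1+|k_1|)(1+|k_2|) <= 4
```

The cardinality of such a set grows like T^{1/s} times a logarithmic factor in T, which is the source of the (log n)-terms in the mixed-smoothness bounds.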
Theorem 7.1.
Let a Sobolev type space be given by weights satisfying the condition
Let further be given, and let m be the smallest number such that
where is an absolute constant. Then the random reconstruction operator satisfies
(7.4) |
with universal constant .
In what follows, we consider concrete examples of weights and the respective Sobolev classes of functions. Note that the study of different characteristics of these classes has recently attracted interest, due to the application of Sobolev classes of functions in a number of practical areas, such as event modeling, quantum chemistry, and signal reconstruction.
Theorem 7.2.
Under the assumptions of Theorem 7.1, the estimate
(7.5) |
holds for the sampling numbers of the embedding , where are absolute constants. Note that
where (σ_n) is the non-increasing rearrangement of the reciprocal weights.
Proof.
7.2 Sobolev Hilbert spaces with mixed smoothness
Let us specify the weight sequence as follows. First, let us define , , . By we denote the periodic Sobolev space (see, e.g., [17])
(7.7) |
One could also choose the weight of the form , , . The corresponding classes are denoted by , i.e.,
The norms of the two families of classes are equivalent (up to constants that may depend on the dimension d and the smoothness parameter s). Therefore, the asymptotic behavior of the sampling numbers of these classes coincides. When considering asymptotic questions let us write , which is a common notation for both introduced classes. For the singular numbers of the embedding operator into L_2, it is known that
(7.8)  σ_n ≍_{d,s} n^{−s} (log n)^{(d−1)s}.
Theorem 7.4.
Let , , where is an absolute constant. Then
(7.9) |
is true as .
Proof.
Theorem 7.5.
For , , it holds
(i) There is an absolute constant such that
(ii) There is an absolute constant such that
(7.11) |
Proof.
The statement (i) follows immediately from Theorem 7.4.
Remark 7.6.
Comparing the estimates (i) and (ii) from Theorem 7.5, we see a reduced exponent in the logarithmic term of (ii) for .
Remark 7.7.
V.N. Temlyakov [34] proved the following estimate for sparse grids:
The estimate of Theorem 7.5 is a little worse if we look at the power of the logarithm. Let us comment on the preasymptotic estimates, which give us precise constants for sufficiently smooth function classes with the norm from (7.7). Similar results can be given for the “+”-norm, where we use (7.19).
Note that we can use the plain (non-weighted) least squares algorithm here. This is why we have a slightly better scaling here.
Theorem 7.8.
Let , and . For , , we define by
Then
(7.12) |
7.3 A univariate example
Theorems 3.1 and 4.2 can be applied to orthonormal systems which are not uniformly bounded. There is a large source of examples already in the univariate case, stemming from second-order differential operators in connection with orthogonal polynomials, see for instance [25, p. 108, Lem. 5]. There, the growth of the Christoffel function for spaces of polynomials of fixed degree is given for different Jacobi weight parameters. It turns out that uniform boundedness holds for certain parameter ranges and fails otherwise. We focus on Legendre polynomials (Jacobi parameters α = β = 0) and the corresponding second-order differential operator. Here we consider the interval [−1, 1] together with the uniform measure. The second-order operator, defined by
characterizes weighted Sobolev spaces
These spaces are RKHS with the kernel
where p_k, k ∈ N_0, are the L_2-normalized Legendre polynomials.
In this setting, we have , , and, accordingly, . As for the spectral function, we get
Hence,
(7.14) |
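The quadratic growth of the spectral function for the Legendre system can be checked numerically: with the uniform probability measure on [−1, 1] the normalized polynomials are p_k = √(2k+1)·P_k, and P_k(±1) = (±1)^k forces the supremum to sit at the endpoints with value m². A sketch (our own check, not from the paper):

```python
import numpy as np
from numpy.polynomial import legendre

def spectral_function_legendre(m, x):
    """N(m, x) = sum_{k<m} |p_k(x)|^2 for Legendre polynomials normalized in
    L2 of the uniform probability measure on [-1, 1]: p_k = sqrt(2k+1) P_k."""
    total = 0.0
    for k in range(m):
        coeffs = np.zeros(k + 1)
        coeffs[k] = 1.0          # select the k-th Legendre polynomial P_k
        total += (2 * k + 1) * legendre.legval(x, coeffs) ** 2
    return total

# P_k(1) = 1 for all k, so the supremum over x is attained at the endpoints:
# sup_x N(m, x) = sum_{k<m} (2k+1) = m^2 -- quadratic growth in m, in
# contrast to the linear growth for uniformly bounded systems.
N_endpoint = spectral_function_legendre(10, 1.0)
```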
Applying the improved bounds for sampling numbers from Theorem 4.2, we arrive at the estimate
(7.15) |
Note that Ch. Bernardi and Y. Maday [2, Theorem 6.2] obtained estimates, optimal in the main rate, for approximation by a polynomial operator at Gauss points. We are not aware of any existing bounds in the literature for the present setting. In the paper by L. Kämmerer, T. Ullrich and T. Volkmer [9], the authors obtained improved worst-case error estimates with high probability (see Examples 5.7 and 5.10 there).
Remark 7.9.
Note that, using the density function from Remark 3.4, we have the less sharp bound
(7.16) |
Without the weighted least squares (see Remark 3.5), we obtain
which is of the same order with respect to the parameter as (7.14). But we need to impose the condition , i.e., . Hence, in terms of the number of used samples we get
(7.17) |
which is sometimes even better than (7.16), namely if . However, (7.16) and (7.17) are never better than the right-hand side of (7.15).
Appendix. Preasymptotic estimates for singular numbers.
Note that the asymptotic rate of the singular numbers has been known for a long time in many cases. For the periodic Sobolev classes of functions with dominating mixed smoothness, the order of the approximation numbers was obtained by B.S. Mityagin in 1962 (in a slightly more general setting). For the isotropic classes, the order of the corresponding approximation numbers was proved by J.W. Jerome in 1967. Many researchers were involved in finding the optimal orders of convergence for linear algorithms on different function classes, but mainly the constants in the estimates were not specified. In practical applications, however, the dependence of these constants on the smoothness parameter of the spaces and the dimension of the underlying domain is a crucial point.
Various comments on the literature concerning existing preasymptotic results for mixed-order Sobolev functions on the d-torus can be found in [17, Chapt. 4.5]. We comment only on a few that are closely connected to our research.
It was shown by T. Kühn, W. Sickel and T. Ullrich [17] that for and it holds
(they showed the existence of the limit and that its value is independent of the norm choice). The preasymptotic estimates (for small n) depend heavily on the choice of the corresponding norm in the space. Therefore, the authors further considered several natural norms in the space; in particular they showed that for the classes defined exactly as in (7.7), for , , the estimate
is true (with the corresponding non-matching lower bound).
This result was improved by T. Kühn [15], who showed that for , and all it holds
(7.18) |
The estimate (7.18) covers all values of the parameter, not only the preasymptotic range; moreover, the factor and the exponent on the corresponding right-hand side are better.
In the case of the “+”-norm, a new preasymptotic bound for the classes of functions with anisotropic mixed smoothness was obtained by T. Kühn, W. Sickel and T. Ullrich [18, Theorem 4.11]. In particular, it was proved that for , and all it holds
(7.19) |
with an explicitly given constant. Note that this estimate was stated in a more general setting; in the same paper one can also find an improved upper bound for a special case.
Note also that D. Krieg [12, Section 4.1] earlier established results on the asymptotic and preasymptotic behavior of tensor powers of arbitrary sequences of polynomial decay, as well as further estimates for approximation numbers of embeddings of mixed order Sobolev functions on the $d$-cube (with different equivalent norms) into Lebesgue spaces.
As to the isotropic Sobolev classes, T. Kühn, S. Mayer and T. Ullrich [16] established a connection between approximation numbers of periodic Sobolev type spaces, where the smoothness weights on the Fourier coefficients are induced by a (quasi-)norm on $\mathbb{R}^d$, and entropy numbers of embeddings of finite-dimensional balls. This allowed them to obtain preasymptotic estimates for isotropic Sobolev spaces and spaces of Gevrey type (in terms of two-sided estimates in which the hidden constants do not depend on the dimension). However, in [16] the authors did not indicate explicitly the dependence of these constants on the parameters involved. The exact dependence on the dimension $d$ and the smoothness $s$ was given later by T. Kühn [15].
Acknowledgment.
T.U. would like to acknowledge support by the DFG Ul-403/2-1. The authors would like to thank Felix Bartel, David Krieg, Stefan Kunis and Mario Ullrich for valuable comments.
References
- [1] A. Berlinet and C. Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Kluwer Academic Publishers, Boston, MA, 2004. With a preface by Persi Diaconis.
- [2] C. Bernardi and Y. Maday. Polynomial interpolation results in Sobolev spaces. J. Comput. Appl. Math., 43(1):53–80, 1992.
- [3] A. Christmann and I. Steinwart. Support Vector Machines. Springer, 2008.
- [4] F. Cobos, T. Kühn, and W. Sickel. Optimal approximation of multivariate periodic Sobolev functions in the sup-norm. J. Funct. Anal., 270(11):4196–4212, 2016.
- [5] A. Cohen and G. Migliorati. Optimal weighted least-squares methods. SMAI J. Comput. Math., 3:181–203, 2017.
- [6] K. Gröchenig. Sampling, Marcinkiewicz-Zygmund inequalities, approximation, and quadrature rules. J. Approx. Theory, 257, 2020.
- [7] M. Hein and O. Bousquet. Kernels, associated structures and generalizations. Technical Report 127, Max Planck Institute for Biological Cybernetics, Tübingen, Germany, 2004.
- [8] L. Kämmerer. Multiple lattice rules for multivariate approximation in the worst-case setting. arXiv:1909.02290v1, 2019.
- [9] L. Kämmerer, T. Ullrich, and T. Volkmer. Worst-case recovery guarantees for least squares approximation using random samples. arXiv:1911.10111, 2019.
- [10] L. Kämmerer and T. Volkmer. Approximation of multivariate periodic functions based on sampling along multiple rank-1 lattices. J. Approx. Theory, 246:1–27, 2019.
- [11] D. Krieg and M. Sonnleitner. Random points are optimal for the approximation of Sobolev functions. arXiv:2009.11275, 2020.
- [12] D. Krieg. Tensor power sequences and the approximation of tensor product operators. J. Complexity, 44:30–51, 2018.
- [13] D. Krieg and M. Ullrich. Function values are enough for $L_2$-approximation: Part II. J. Complexity, to appear. arXiv:2011.01779.
- [14] D. Krieg and M. Ullrich. Function values are enough for $L_2$-approximation. Found. Comp. Math., to appear. arXiv:1905.02516v3.
- [15] T. Kühn. New preasymptotic estimates for the approximation of periodic Sobolev functions. In 2018 MATRIX annals, volume 3 of MATRIX Book Ser., pages 97–112. Springer, Cham., 2020.
- [16] T. Kühn, S. Mayer, and T. Ullrich. Counting via entropy: new preasymptotics for the approximation numbers of Sobolev embeddings. SIAM J. Numer. Anal., 54(6):3625–3647, 2016.
- [17] T. Kühn, W. Sickel, and T. Ullrich. Approximation of mixed order Sobolev functions on the $d$-torus: asymptotics, preasymptotics and $d$-dependence. Constr. Approx., 42:353–398, 2015.
- [18] T. Kühn, W. Sickel, and T. Ullrich. How anisotropic mixed smoothness affects the decay of singular numbers of Sobolev embeddings. J. Complexity, to appear. arXiv:2001.09022v3.
- [19] R. J. Kunsch. Breaking the curse for uniform approximation in Hilbert spaces via Monte Carlo methods. J. Complexity, 48:15–35, 2018.
- [20] F. Y. Kuo, G. W. Wasilkowski, and H. Woźniakowski. Multivariate approximation in the worst case setting over reproducing kernel Hilbert spaces. J. Approx. Theory, 152(2):135–160, 2008.
- [21] F. Y. Kuo, G. W. Wasilkowski, and H. Woźniakowski. On the power of standard information for multivariate approximation in the worst case setting. J. Approx. Theory, 158(1):97–125, 2009.
- [22] I. Limonova and V. N. Temlyakov. On sampling discretization in $L_2$. arXiv:2009.10789v1, 2020.
- [23] M. Moeller and T. Ullrich. $L_2$-norm sampling discretization and recovery of functions from RKHS with finite trace. arXiv:2009.11940v1, 2020.
- [24] N. Nagel, M. Schäfer, and T. Ullrich. A new upper bound for sampling numbers. Found. Comp. Math., to appear. arXiv:2010.00327v1.
- [25] P. G. Nevai. Orthogonal polynomials. Mem. Amer. Math. Soc., 18(213):v+185, 1979.
- [26] E. Novak and H. Woźniakowski. Tractability of multivariate problems. Vol. 1: Linear information, volume 6 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2008.
- [27] E. Novak and H. Woźniakowski. Tractability of multivariate problems. Volume II: Standard information for functionals, volume 12 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2010.
- [28] S. Nitzan, A. Olevskii and A. Ulanovskii. Exponential frames on unbounded sets. Proc. Amer. Math. Soc., 144(1):109–118, 2016.
- [29] E. Novak and H. Woźniakowski. Tractability of multivariate problems. Volume III: Standard information for operators, volume 18 of EMS Tracts in Mathematics. European Mathematical Society (EMS), Zürich, 2012.
- [30] K. Y. Osipenko and O. G. Parfenov. Ismagilov type theorems for linear, Gel’fand and Bernstein $n$-widths. J. Complexity, 11(4):474–492, 1995.
- [31] C. C. Paige and M. A. Saunders. LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8:43–71, 1982.
- [32] K. Pozharska and T. Ullrich. A note on sampling recovery of multivariate functions in the uniform norm. arXiv:2103.11124, 2021.
- [33] I. Steinwart and C. Scovel. Mercer’s theorem on general domains: on the interaction between measures, kernels, and RKHSs. Constr. Approx., 35, 2012.
- [34] V. N. Temlyakov. On approximate recovery of functions with bounded mixed derivative. J. Complexity, 9(1):41–59, 1993.
- [35] V. N. Temlyakov. On optimal recovery in $L_2$. J. Complexity, to appear. arXiv:2010.03103.
- [36] V. N. Temlyakov and T. Ullrich. Approximation of functions with small mixed smoothness in the uniform norm. arXiv:2012.11983, 2020.
- [37] V. N. Temlyakov and T. Ullrich. Bounds on Kolmogorov widths and sampling recovery for classes with small mixed smoothness. arXiv:2012.09925, 2020.
- [38] M. Ullrich. On the worst-case error of least squares algorithms for $L_2$-approximation with high probability. J. Complexity, 60, 2020.
- [39] G. W. Wasilkowski and H. Woźniakowski. On the power of standard information for weighted approximation. Found. Comput. Math., 1(4):417–434, 2001.