

Local transfer learning from one data space to another


H. N. Mhaskar and Ryan O'Dowd

Institute of Mathematical Sciences, Claremont Graduate University, Claremont, CA 91711. The research of HNM was supported in part by ARO grant W911NF2110218 and NSF DMS grant 2012355. Emails: [email protected], ryan.o'[email protected].
Abstract

A fundamental problem in manifold learning is to approximate a functional relationship in data chosen randomly from a probability distribution supported on a low-dimensional sub-manifold of a high-dimensional ambient Euclidean space. The manifold is essentially defined by the data set itself and, typically, designed so that the data is dense on the manifold in some sense. The notion of a data space is an abstraction of a manifold encapsulating the essential properties that allow for function approximation. The problem of transfer learning (meta-learning) is to use the learning of a function on one data set to learn a similar function on a new data set. In terms of function approximation, this means lifting a function on one data space (the base data space) to another (the target data space). This viewpoint enables us to connect some inverse problems in applied mathematics (such as the inverse Radon transform) with transfer learning. In this paper we examine the question of such lifting when the data is assumed to be known only on a part of the base data space. We are interested in determining the subsets of the target data space on which the lifting can be defined, and in how the local smoothness of the function and its lifting are related.

1 Introduction

A fundamental problem in machine learning is the following. Data of the form \{(x_{j},y_{j})\} are given, assumed to be sampled from an unknown probability distribution. The goal is to approximate the function f(x)=\mathbb{E}(y|x) from the data. Typically, the points x_{j} belong to an ambient Euclidean space of a very high dimension, leading to the so-called curse of dimensionality. One of the strategies to counter this "curse" is to assume the manifold hypothesis; i.e., to assume that the points x_{j} lie on an unknown low-dimensional submanifold of the ambient space. Examples of well-known techniques in this direction, dimensionality reduction in particular, are Isomap tenenbaum2000global , maximum variance unfolding (MVU) (also called semidefinite programming (SDP)) weinberger2005nonlinear , locally linear embedding (LLE) roweis2000nonlinear , the local tangent space alignment method (LTSA) zhang2004principal , Laplacian eigenmaps (Leigs) belkin2003laplacian , Hessian locally linear embedding (HLLE) david2003hessian , diffusion maps (Dmaps) coifmanlafondiffusion , and the randomized anisotropic transform chuiwang2010 . A recent survey of these methods is given by Chui and Wang in chuidimred2015 . An excellent introduction to the subject of diffusion geometry can be found in the special issue achaspissue of Applied and Computational Harmonic Analysis, 2006. The application areas are too numerous to list exhaustively. They include, for example, document analysis coifmanmauro2006 , face recognition niyogiface ; ageface2011 ; chuiwang2010 , hyperspectral imaging chuihyper , semi-supervised learning niyogi1 ; niyogi2 , image processing donoho2005image ; arjuna1 , cataloguing of galaxies donoho2002multiscale , and social networking bertozzicommunity .

A good deal of research in the theory of manifold learning deals with the problem of understanding the geometry of the data-defined manifold. For example, it is shown in jones2010universal ; jones2008parameter that an atlas on the unknown manifold can be defined in terms of the heat kernel corresponding to the Laplace-Beltrami operator on the manifold. Other constructions of the atlas are given in chui_deep ; shaham2018provable ; schmidt2019deep with applications to the study of deep networks. Function approximation on manifolds based on scattered data (i.e., data points x_{j} whose locations are not prescribed analytically) has been studied in detail in many papers, starting with mauropap , e.g., frankbern ; modlpmz ; eignet ; compbio ; heatkernframe ; mhaskar2020kernel . This theory was applied successfully in mhas_sergei_maryke_diabetes2017 to construct deep networks for predicting blood sugar levels based on continuous glucose monitoring devices.

A fundamental role in this theory is played by the heat kernel on the manifold corresponding to an appropriate elliptic partial differential operator. In coifmanmauro2006 ; heatkernframe , a multi-resolution analysis is constructed using the heat kernel. Another important tool is the theory of localized kernels based on the eigen-decomposition of the heat kernel. These were introduced in mauropap based on certain assumptions on the spectral function and the property of finite speed of wave propagation. In the context of manifolds, this latter property was proved in sikora2004riesz ; frankbern to be equivalent to the so-called Gaussian upper bounds on the heat kernels. Although such bounds are studied in many contexts by many authors, e.g., grigoryan1995upper ; grigor1997gaussian ; davies1990heat ; kordyukov1991p , we could not locate a reference where such a bound was proved for a general smooth manifold. We have therefore supplied a proof in mhaskar2020kernel . In [tauberian, Theorem 4.3], we have given a very general recipe that yields localized kernels based on the Gaussian upper bound on the heat kernel in what we have termed a data defined space (or a data space in some other papers).

The problem of transfer learning (or meta-learning) involves learning the parameters of an approximation process based on one data set, and using this information to quickly learn the corresponding parameters on another data set, e.g., valeriyasmartphone ; maskey2023transferability ; maurer2013sparse . In the context of manifold learning, a data set (point cloud) determines a manifold, so that different data sets would correspond to different manifolds. In the context of data spaces, we can therefore interpret transfer learning as “lifting” a function from one data space (the base data space) to another (the target data space). This viewpoint allows us to unify the topic of transfer learning with the study of some inverse problems in image/signal processing. For example, the problem of synthetic aperture radar (SAR) imaging can be described in terms of an inverse Radon transform nolan2002synthetic ; cheney2009fundamentals ; munson1983tomographic . The domain and range of the Radon transform are different, and hence, the problem amounts to approximating the actual image on one domain based on observations of its Radon transform, which are located on a different domain. Another application is in analyzing hyperspectral images changing with time coifmanhirn . A similar problem arises in analyzing the progress of Alzheimer’s disease from MRI images of the brain taken over time, where one is interested in the development of the cortical thickness as a function on the surface of the brain, a manifold which is changing over time kim2014multi .

Motivated by these applications and the paper coifmanhirn of Coifman and Hirn, we studied in tauberian the question of lifting a function from one data space to another, when certain landmarks on one data space were identified with those on the other data space. For example, it is known lerch2005focal that, in spite of the changing brain, one can think of each brain as being parametrized by an inner sphere, and the cortical thickness at certain standard points based on this parametrization is important in the prognosis of the disease. In tauberian we investigated certain conditions on the two data spaces which allow the lifting of a function from one to the other, and analyzed the effect on the smoothness of the function as it is lifted.

In many applications, the data about the function is available only on a part of the base data space. The novel part of this paper is to investigate the following questions of interest: (1) to determine on which subsets of the target data space the lifting is defined, and (2) to determine how the local smoothness on the base data space translates into the local smoothness of the lifted function. In limited angle tomography, one observes the Radon transform on a limited part of a cylinder and needs to reconstruct the image as a function on a ball from this data. A rudimentary introduction to the subject is given in the book natterer2001mathematics of Natterer. We do not aim to solve the limited angle tomography problem itself, but we will study in detail an example motivated by the singular value decomposition of the Radon transform, which involves two different systems of orthogonal polynomials on the interval [-1,1]. The theory of transplantation theorems muckenhoupt1986transplantation deals with the following problem. We are given the coefficients in the expansion of a function f on [-1,1] in terms of Jacobi polynomials with certain parameters (the base space expansion in our language), and use them as the coefficients in an expansion in terms of Jacobi polynomials with respect to a different set of parameters (the target space in our language). Under what conditions on f and the parameters of the two Jacobi polynomial systems will the expansion in the target space converge, and in which L^{p} spaces? While old-fashioned, the topic appears to be of recent interest diaz2021discrete ; arenas2019weighted . We will illustrate our general theory by obtaining a localized transplantation theorem for uniform approximation.

In Section 2, we review certain important results in the context of a single data space (our abstraction of a manifold). In particular, we present a characterization of local approximation of functions on such spaces. In Section 3, we review the notion of joint data spaces (introduced under a different name in tauberian ). The main novelty of our paper is the study of the lifting of a function from a subset (typically, a ball) of one data space to another. These results are discussed in Section 4. The proofs are given in Section 5. An essential ingredient in our constructions is the notion of localized kernels which, in turn, depend upon a Tauberian theorem. For the convenience of the reader, this theorem is presented in Appendix 5.1. Appendix 5.2 lists some important properties of Jacobi polynomials which are required in our examples.

2 Data spaces

As mentioned in the introduction, a good deal of research on manifold learning is devoted to the question of learning the geometry of the manifold. For the purpose of harmonic analysis and approximation theory on the manifold, we do not need the full strength of the differentiability structure on the manifold. Our own understanding of the correct hypotheses required to study these questions has evolved, resulting in a plethora of terminology such as data defined manifolds, admissible systems, data defined spaces, etc., culminating in our current understanding with the definition of a data space given in mhaskar2020kernel . For the sake of simplicity, we will restrict our attention in this paper to the case of compact spaces. We do not expect any serious problems in extending the theory to the general case, except for a great deal of technical details.

Thus, the set up is the following.

We consider a compact metric measure space 𝕏{\mathbb{X}} with metric dd and a probability measure μ\mu^{*}. We take {λk}k=0\{\lambda_{k}\}_{k=0}^{\infty} to be a non-decreasing sequence of real numbers with λ0=0\lambda_{0}=0 and λk\lambda_{k}\to\infty as kk\to\infty, and {ϕk}k=0\{\phi_{k}\}_{k=0}^{\infty} to be an orthonormal set in L2(μ)L^{2}(\mu^{*}). We assume that each ϕk\phi_{k} is continuous. The elements of the space

Πn=𝗌𝗉𝖺𝗇{ϕk:λk<n}\Pi_{n}=\mathsf{span}\{\phi_{k}:\lambda_{k}<n\} (1)

are called diffusion polynomials (of order <n<n). We write Π=n>0Πn\displaystyle\Pi_{\infty}=\bigcup_{n>0}\Pi_{n}. We introduce the following notation.

𝔹(x,r)={y𝕏:d(x,y)r},x𝕏,r>0.{\mathbb{B}}(x,r)=\{y\in{\mathbb{X}}:d(x,y)\leq r\},\qquad x\in{\mathbb{X}},\ r>0. (2)

If A𝕏A\subseteq{\mathbb{X}} we define

𝔹(A,r)=xA𝔹(x,r).{\mathbb{B}}(A,r)=\bigcup_{x\in A}{\mathbb{B}}(x,r). (3)

With this set up, the definition of a compact data space is the following.

Definition 2.1.

The tuple Ξ=(𝕏,d,μ,{λk}k=0,{ϕk}k=0)\Xi=({\mathbb{X}},d,\mu^{*},\{\lambda_{k}\}_{k=0}^{\infty},\{\phi_{k}\}_{k=0}^{\infty}) is called a (compact) data space if each of the following conditions is satisfied.

1.

    For each x𝕏x\in{\mathbb{X}}, r>0r>0, 𝔹(x,r)\mathbb{B}(x,r) is compact.

2.

    (Ball measure condition) There exist q1q\geq 1 and κ>0\kappa>0 with the following property: For each x𝕏x\in{\mathbb{X}}, r>0r>0,

    μ(𝔹(x,r))=μ({y𝕏:d(x,y)<r})κrq.\mu^{*}(\mathbb{B}(x,r))=\mu^{*}\left(\{y\in{\mathbb{X}}:d(x,y)<r\}\right)\leq\kappa r^{q}. (4)

    (In particular, μ({y𝕏:d(x,y)=r})=0\mu^{*}\left(\{y\in{\mathbb{X}}:d(x,y)=r\}\right)=0.)

3.

    (Gaussian upper bound) There exist κ1,κ2>0\kappa_{1},\kappa_{2}>0 such that for all x,y𝕏x,y\in{\mathbb{X}}, 0<t10<t\leq 1,

    |k=0exp(λk2t)ϕk(x)ϕk(y)|κ1tq/2exp(κ2d(x,y)2t).\left|\sum_{k=0}^{\infty}\exp(-\lambda_{k}^{2}t)\phi_{k}(x)\phi_{k}(y)\right|\leq\kappa_{1}t^{-q/2}\exp\left(-\kappa_{2}\frac{d(x,y)^{2}}{t}\right). (5)

We refer to qq as the exponent for Ξ\Xi.

The primary example of a data space is, of course, a Riemannian manifold.

Example 2.1.

Let {\mathbb{X}} be a smooth, compact, connected Riemannian manifold (without boundary), d be the geodesic distance on {\mathbb{X}}, \mu^{*} be the Riemannian volume measure normalized to be a probability measure, \{\lambda_{k}\} be the sequence of square roots of the eigenvalues of the (negative) Laplace-Beltrami operator on {\mathbb{X}}, and \phi_{k} be the eigenfunction corresponding to the eigenvalue \lambda_{k}^{2}; in particular, \phi_{0}\equiv 1. We have proved in [mhaskar2020kernel, Appendix A] that the Gaussian upper bound is satisfied. Therefore, if the condition in Equation (4) is satisfied, then ({\mathbb{X}},d,\mu^{*},\{\lambda_{k}\}_{k=0}^{\infty},\{\phi_{k}\}_{k=0}^{\infty}) is a data space with exponent equal to the dimension of the manifold.

Remark 2.2.

In friedman2004wave , Friedman and Tillich give a construction for an orthonormal system on a graph which leads to a finite speed of wave propagation. It is shown in frankbern that this, in turn, implies the Gaussian upper bound. Therefore, it is an interesting question whether appropriate definitions of measures and distances can be defined on a graph to satisfy the assumptions of a data space.

The constant convention. In the sequel, c,c1,c,c_{1},\cdots will denote generic positive constants depending only on the fixed quantities under discussion such as Ξ\Xi, qq, κ,κ1,κ2\kappa,\kappa_{1},\kappa_{2}, the various smoothness parameters and the filters to be introduced. Their value may be different at different occurrences, even within a single formula. The notation ABA\lesssim B means AcBA\leq cB, ABA\gtrsim B means BAB\lesssim A and ABA\sim B means ABAA\lesssim B\lesssim A.

Example 2.2.

In this example, we let 𝕏=[0,π]\mathbb{X}=[0,\pi] and for θ1,θ2𝕏\theta_{1},\theta_{2}\in\mathbb{X} we simply define the distance as

d(θ1,θ2)=|θ1θ2|.d(\theta_{1},\theta_{2})=\left|\theta_{1}-\theta_{2}\right|. (6)

We will consider the so-called trigonometric Jacobi functions nowak2011sharp

ϕn(α,β)(θ)=(1cosθ)α/2+1/4(1+cosθ)β/2+1/4pn(α,β)(cosθ),\phi_{n}^{(\alpha,\beta)}(\theta)=(1-\cos\theta)^{\alpha/2+1/4}(1+\cos\theta)^{\beta/2+1/4}p_{n}^{(\alpha,\beta)}(\cos\theta), (7)

where pn(α,β)p_{n}^{(\alpha,\beta)} are orthonormalized Jacobi polynomials defined as in Appendix 5.2 and α,β1/2\alpha,\beta\geq-1/2. We define

dμ(θ)=1πdθ.d\mu^{*}(\theta)=\frac{1}{\pi}d\theta. (8)

We see that a change of variables x=cosθx=\cos\theta in Equation (96) results in the following orthogonality condition

0πϕn(α,β)(θ)ϕm(α,β)(θ)𝑑θ=δn,m.\int_{0}^{\pi}\phi_{n}^{(\alpha,\beta)}(\theta)\phi_{m}^{(\alpha,\beta)}(\theta)d\theta=\delta_{n,m}. (9)

So our orthonormal set of functions with respect to μ\mu^{*} will be {πϕn(α,β)}\{\sqrt{\pi}\phi_{n}^{(\alpha,\beta)}\}. It was proven in nowak2011sharp that with

λn=n+α+β+12,\lambda_{n}=n+\frac{\alpha+\beta+1}{2}, (10)

we have

πn=0exp(λn2t)ϕn(α,β)(θ1)ϕn(α,β)(θ2)t1/2exp(cd(θ1,θ2)2t),θ1,θ2𝕏.\pi\sum_{n=0}^{\infty}\exp\left(-\lambda_{n}^{2}t\right)\phi_{n}^{(\alpha,\beta)}(\theta_{1})\phi_{n}^{(\alpha,\beta)}(\theta_{2})\lesssim t^{-1/2}\exp\left(-c\frac{d(\theta_{1},\theta_{2})^{2}}{t}\right),\quad\theta_{1},\theta_{2}\in\mathbb{X}. (11)

In conclusion,

Ξ=(𝕏,d,μ,{λn},{πϕn(α,β)})\Xi=(\mathbb{X},d,\mu^{*},\{\lambda_{n}\},\{\sqrt{\pi}\phi_{n}^{(\alpha,\beta)}\}) (12)

is a data space with exponent 1.
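To make this example concrete, the following minimal Python sketch builds the functions \phi_{n}^{(\alpha,\beta)} from the classical Jacobi polynomials and checks the orthogonality relation (9) numerically. The parameter values and the discretization are sample choices, and the normalization constant used is the standard L^{2} norm of the classical Jacobi polynomial.

```python
import numpy as np
from scipy.special import eval_jacobi, gammaln

def p(n, a, b, x):
    # Orthonormalized Jacobi polynomial: the classical P_n^{(a,b)} divided by its
    # L^2 norm for the weight (1-x)^a (1+x)^b on [-1,1]. (For n = 0 with
    # a + b = -1 the norm requires a limiting value; that case is avoided here.)
    log_h = ((a + b + 1) * np.log(2.0) + gammaln(n + a + 1) + gammaln(n + b + 1)
             - gammaln(n + a + b + 1) - gammaln(n + 1) - np.log(2 * n + a + b + 1))
    return eval_jacobi(n, a, b, x) * np.exp(-0.5 * log_h)

def phi(n, a, b, theta):
    # Trigonometric Jacobi function, Equation (7).
    c = np.cos(theta)
    return (1 - c) ** (a / 2 + 0.25) * (1 + c) ** (b / 2 + 0.25) * p(n, a, b, c)

alpha, beta = 0.3, 0.7                        # sample parameters >= -1/2
theta = np.linspace(1e-9, np.pi - 1e-9, 40001)
G = np.array([phi(n, alpha, beta, theta) for n in range(4)])
gram = (G * (theta[1] - theta[0])) @ G.T      # Riemann sum for Equation (9)
print(np.round(gram, 4))                      # approximately the 4 x 4 identity
```

The printed Gram matrix agrees with the identity up to the accuracy of the quadrature.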

The following example illustrates how a manifold with boundary can be transformed into a closed manifold as in Example 2.1. We will use the notation and facts from Appendix 5.2 without always referring to them explicitly. We adopt the notation

SSq={𝐱q+1:|𝐱|=1},SS+q={𝐱SSq:xq+10}.\SS^{q}=\{{\bf x}\in{\mathbb{R}}^{q+1}:|{\bf x}|=1\},\qquad\SS^{q}_{+}=\{{\bf x}\in\SS^{q}:x_{q+1}\geq 0\}. (13)
Example 2.3.

Let \mu^{*}_{q} denote the volume measure of \mathbb{S}^{q}, normalized to be a probability measure. Let {\mathbb{H}}_{n}^{q} be the space of the restrictions to \SS^{q} of homogeneous harmonic polynomials of degree n in q+1 variables, and \{Y_{n,k}\}_{k} be an orthonormal (with respect to \mu^{*}_{q}) basis for {\mathbb{H}}_{n}^{q}. The polynomials Y_{n,k} are eigenfunctions of the Laplace-Beltrami operator on the manifold \SS^{q} with eigenvalues n(n+q-1). The geodesic distance between \xi,\eta\in\mathbb{S}^{q} is \arccos(\xi\cdot\eta), so the Gaussian upper bound for manifolds takes the form

n,kexp(n(n+q1)t)Yn,k(𝐱)Yn,k(𝐲)¯tq/2exp(c(arccos(𝐱𝐲))2t).\sum_{n,k}\exp(-n(n+q-1)t)Y_{n,k}({\bf x})\overline{Y_{n,k}({\bf y})}\lesssim t^{-q/2}\exp\left(-c\frac{(\arccos({\bf x}\cdot{\bf y}))^{2}}{t}\right). (14)

As a result, (\mathbb{S}^{q},\arccos(\circ\cdot\circ),\mu^{*}_{q},\{\lambda_{n}\}_{n},\{Y_{n,k}\}_{n,k}), with \lambda_{n}=\sqrt{n(n+q-1)}, is a data space with exponent q.

Now we consider

𝕏=𝔹q={𝐱q:|𝐱|1}.{\mathbb{X}}={\mathbb{B}}^{q}=\{{\bf x}\in{\mathbb{R}}^{q}:|{\bf x}|\leq 1\}.

We can identify 𝔹q{\mathbb{B}}^{q} with SS+q\SS^{q}_{+} as follows. Any point 𝐱𝔹q{\bf x}\in{\mathbb{B}}^{q} has the form 𝐱=ωsinθ{\bf x}=\omega\sin\theta for some ωSSq1\omega\in\SS^{q-1}, θ[0,π/2]\theta\in[0,\pi/2]. We write 𝐱^=(ωsinθ,cosθ)SS+q\hat{{\bf x}}=(\omega\sin\theta,\cos\theta)\in\SS^{q}_{+}. With this identification, SS+q\SS^{q}_{+} is parameterized by 𝔹q{\mathbb{B}}^{q} and we define

dμ(𝐱)=dμq(𝐱^)=\displaystyle d\mu^{*}({\bf x})=d\mu^{*}_{q}(\hat{{\bf x}})= Vol(𝔹q)Vol(𝕊+q)(1|𝐱|2)1/2dm(𝐱)\displaystyle\frac{\operatorname{Vol}(\mathbb{B}^{q})}{\operatorname{Vol}(\mathbb{S}^{q}_{+})}(1-|{\bf x}|^{2})^{-1/2}dm^{*}({\bf x}) (15)
=\displaystyle= Γ((q+1)/2)πΓ(q/2+1)(1|𝐱|2)1/2dm(𝐱),\displaystyle\frac{\Gamma((q+1)/2)}{\sqrt{\pi}\Gamma(q/2+1)}(1-|{\bf x}|^{2})^{-1/2}dm^{*}({\bf x}),

where μq\mu^{*}_{q} is the probability volume measure on 𝕊+q\mathbb{S}^{q}_{+}, and mm^{*} is the probability volume measure on 𝔹q\mathbb{B}^{q}. It is also convenient to define the distance on 𝔹q{\mathbb{B}}^{q} by

d(𝐱1,𝐱2)=arccos(𝐱^1𝐱^2)=arccos(𝐱1𝐱2+1|𝐱1|21|𝐱2|2).d({\bf x}_{1},{\bf x}_{2})=\arccos(\hat{{\bf x}}_{1}\cdot\hat{{\bf x}}_{2})=\arccos({\bf x}_{1}\cdot{\bf x}_{2}+\sqrt{1-|{\bf x}_{1}|^{2}}\sqrt{1-|{\bf x}_{2}|^{2}}). (16)

All spherical harmonics of degree 2n2n are even functions on 𝕊q\mathbb{S}^{q}. So with the identification of measures as above, one can represent the even spherical harmonics as an orthonormal system of functions on 𝔹q\mathbb{B}^{q}. That is, by defining

P2n,k(𝐱)=2Y2n,k(𝐱^),P_{2n,k}({\bf x})=\sqrt{2}Y_{2n,k}(\hat{{\bf x}}), (17)

we have

𝔹qP2n,k(𝐱)P2n,k(𝐱)¯𝑑μ(𝐱)=\displaystyle\int_{{\mathbb{B}}^{q}}P_{2n,k}({\bf x})\overline{P_{2n^{\prime},k^{\prime}}({\bf x})}d\mu^{*}({\bf x})= 2𝕊+qY2n,k(𝐱^)Y2n,k(𝐱^)¯𝑑μq(𝐱^)\displaystyle 2\int_{\mathbb{S}_{+}^{q}}Y_{2n,k}(\hat{\bf x})\overline{Y_{2n^{\prime},k^{\prime}}(\hat{\bf x})}d\mu^{*}_{q}(\hat{\bf x}) (18)
=\displaystyle= 𝕊qY2n,k(ξ)Y2n,k(ξ)¯𝑑μq(ξ)\displaystyle\int_{\mathbb{S}^{q}}Y_{2n,k}(\xi)\overline{Y_{2n^{\prime},k^{\prime}}(\xi)}d\mu^{*}_{q}(\xi)
=\displaystyle= δ(n,k),(n,k).\displaystyle\delta_{(n,k),(n^{\prime},k^{\prime})}.

To show the Gaussian upper bound for the system \{P_{2n,k}\} on \mathbb{B}^{q}, we first deduce, in view of the addition formula (101) and Equation (98), that

k=1dim(2nq)P2n,k(𝐱)P2n,k(𝐲)¯\displaystyle\sum_{k=1}^{\operatorname{dim}(\mathbb{H}_{2n}^{q})}P_{2n,k}({\bf x})\overline{P_{2n,k}({\bf y})} (19)
=\displaystyle= k=1dim(2nq)Y2n,k(𝐱^)Y2n,k(𝐲^)¯\displaystyle\sum_{k=1}^{\operatorname{dim}(\mathbb{H}_{2n}^{q})}Y_{2n,k}(\hat{\bf x})\overline{Y_{2n,k}(\hat{\bf y})}
=\displaystyle= ωqωq1p2n(q/21,q/21)(1)p2n(q/21,q/21)(𝐱^𝐲^)\displaystyle\frac{\omega_{q}}{\omega_{q-1}}p_{2n}^{(q/2-1,q/2-1)}(1)p_{2n}^{(q/2-1,q/2-1)}(\hat{\bf x}\cdot\hat{\bf y})
=\displaystyle= ωqωq12(q1)/2pn(q/21,1/2)(1)pn(q/21,1/2)(cos(2arccos(𝐱^𝐲^))).\displaystyle\frac{\omega_{q}}{\omega_{q-1}}2^{(q-1)/2}p_{n}^{(q/2-1,-1/2)}(1)p_{n}^{(q/2-1,-1/2)}(\cos(2\arccos(\hat{\bf x}\cdot\hat{\bf y}))).

In light of Equation (95) we define

λn=n(n+q/21/2),\lambda_{n}=\sqrt{n(n+q/2-1/2)}, (20)

which is conveniently not dependent upon kk. Using (LABEL:eq:specialjacobigauss), we see that for t>0t>0

n=0k=1dim(2nq)exp(λn2t)P2n,k(𝐱)P2n,k(𝐲)¯\displaystyle\sum_{n=0}^{\infty}\sum_{k=1}^{\operatorname{dim}(\mathbb{H}_{2n}^{q})}\exp\left(-\lambda_{n}^{2}t\right)P_{2n,k}(\mathbf{x})\overline{P_{2n,k}({\bf y})} (21)
\displaystyle\sim n=0exp(n(n+q/21/2)t)pn(q/21,1/2)(1)pn(q/21,1/2)(cos(2arccos(𝐱^𝐲^)))\displaystyle\sum_{n=0}^{\infty}\exp(-n(n+q/2-1/2)t)p_{n}^{(q/2-1,-1/2)}(1)p_{n}^{(q/2-1,-1/2)}(\cos(2\arccos(\hat{\bf x}\cdot\hat{\bf y})))
\displaystyle\lesssim t^{-q/2}\exp\left(-4c\frac{\arccos(\hat{\bf x}\cdot\hat{\bf y})^{2}}{t}\right)=t^{-q/2}\exp\left(-4c\frac{d({\bf x},{\bf y})^{2}}{t}\right).

Therefore, (𝔹q,d,μ,{λn}n,{P2n,k}n,k)(\mathbb{B}^{q},d,\mu^{*},\{\lambda_{n}\}_{n},\{P_{2n,k}\}_{n,k}) is a data space with exponent qq.
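The identification of {\mathbb{B}}^{q} with \SS^{q}_{+} is straightforward to compute with; the following sketch implements the map {\bf x}\mapsto\hat{{\bf x}} and the metric of Equation (16). The test points are sample choices.

```python
import numpy as np

def hat(x):
    # Identify x in the closed unit ball B^q with the point x_hat on S^q_+.
    x = np.asarray(x, dtype=float)
    return np.append(x, np.sqrt(max(0.0, 1.0 - x @ x)))

def d_ball(x1, x2):
    # d(x1, x2) = arccos(x1 . x2 + sqrt(1-|x1|^2) sqrt(1-|x2|^2)), Equation (16).
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    s = x1 @ x2 + np.sqrt(max(0.0, 1 - x1 @ x1)) * np.sqrt(max(0.0, 1 - x2 @ x2))
    return np.arccos(np.clip(s, -1.0, 1.0))

print(hat([0.0, 0.0]))                    # center of B^2 -> north pole (0, 0, 1)
print(d_ball([1.0, 0.0], [-1.0, 0.0]))    # antipodal boundary points: pi
print(d_ball([0.0, 0.0], [1.0, 0.0]))     # center to boundary: pi / 2
```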

In the rest of this section, we will assume \Xi to be a fixed data space and omit its mention from the notation; we will mention it again in later parts of the paper in order to avoid confusion. Next, we define smoothness classes of functions on {\mathbb{X}}. In the absence of any differentiability structure, we do this in a manner that is customary in approximation theory. We define first the degree of approximation of a function f\in L^{p}(\mu^{*}) by

En(p,f)=minPΠnfPp,μ,n>0,1p,fLp(μ).E_{n}(p,f)=\min_{P\in\Pi_{n}}\|f-P\|_{p,\mu^{*}},\qquad n>0,1\leq p\leq\infty,\ f\in L^{p}(\mu^{*}). (22)

We find it convenient to denote by X^{p} the space \{f\in L^{p}(\mu^{*}):\lim_{n\to\infty}E_{n}(p,f)=0\}; e.g., in the manifold case, X^{p}=L^{p}(\mu^{*}) if 1\leq p<\infty and X^{\infty}=C({\mathbb{X}}). In the case of Example 2.3, we need to restrict ourselves to even functions.

Definition 2.3.

Let 1p1\leq p\leq\infty, γ>0\gamma>0.
(a) For fXpf\in X^{p}, we define

fWγ,p=fp,μ+supn>0nγEn(p,f),\|f\|_{W_{\gamma,p}}=\|f\|_{p,\mu^{*}}+\sup_{n>0}n^{\gamma}E_{n}(p,f), (23)

and note that

fWγ,pfp,μ+supn+2nγE2n(p,f).\|f\|_{W_{\gamma,p}}\sim\|f\|_{p,\mu^{*}}+\sup_{n\in{\mathbb{Z}}_{+}}2^{n\gamma}E_{2^{n}}(p,f). (24)

The space Wγ,pW_{\gamma,p} comprises all ff for which fWγ,p<\|f\|_{W_{\gamma,p}}<\infty.
(b) We write C=γ>0Wγ,C^{\infty}=\displaystyle\bigcap_{\gamma>0}W_{\gamma,\infty}. If BB is a ball in 𝕏{\mathbb{X}}, C(B)C^{\infty}(B) comprises functions fCf\in C^{\infty} which are supported on BB.
(c) If x0𝕏x_{0}\in{\mathbb{X}}, the space Wγ,p(x0)W_{\gamma,p}(x_{0}) comprises functions ff such that there exists r>0r>0 with the property that for every ϕC(𝔹(x0,r))\phi\in C^{\infty}(\mathbb{B}(x_{0},r)), ϕfWγ,p\phi f\in W_{\gamma,p}. If A𝕏A\subset{\mathbb{X}}, the space Wγ,p(A)=x0AWγ,p(x0)W_{\gamma,p}(A)=\displaystyle\bigcap_{x_{0}\in A}W_{\gamma,p}(x_{0}); i.e., Wγ,p(A)W_{\gamma,p}(A) comprises functions which are in Wγ,p(x0)W_{\gamma,p}(x_{0}) for each x0Ax_{0}\in A.

A central theme in approximation theory is to characterize the smoothness spaces Wγ,pW_{\gamma,p} in terms of the degree of approximation from some spaces; in our case we consider Πn\Pi_{n}’s.

For this purpose, we define some localized kernels and operators.

The kernels are defined by

Φn(H;x,y)=m=0H(λmn)ϕm(x)ϕm(y),\Phi_{n}(H;x,y)=\sum_{m=0}^{\infty}H\left(\frac{\lambda_{m}}{n}\right)\phi_{m}(x)\phi_{m}(y), (25)

where H:H:{\mathbb{R}}\to{\mathbb{R}} is a compactly supported function.

The operators corresponding to the kernels Φn\Phi_{n} are defined by

σn(H;f,x)=𝕏Φn(H;x,y)f(y)𝑑μ(y)=k:λk<nH(λkn)f^(k)ϕk(x),\sigma_{n}(H;f,x)=\int_{\mathbb{X}}\Phi_{n}(H;x,y)f(y)d\mu^{*}(y)=\sum_{k:\lambda_{k}<n}H\left(\frac{\lambda_{k}}{n}\right)\hat{f}(k)\phi_{k}(x), (26)

where

f^(k)=𝕏f(y)ϕk(y)𝑑μ(y).\hat{f}(k)=\int_{\mathbb{X}}f(y)\phi_{k}(y)d\mu^{*}(y). (27)

The following proposition recalls an important property of these kernels. Proposition 2.4 is proved in mauropap , and more recently in much greater generality in [tauberian, Theorem 4.3].

Proposition 2.4.

Let S>q+1S>q+1 be an integer, H:H:\mathbb{R}\to\mathbb{R} be an even, SS times continuously differentiable, compactly supported function. Then for every x,y𝕏x,y\in\mathbb{X}, N>0N>0,

|ΦN(H;x,y)|Nqmax(1,(Nd(x,y))S),|\Phi_{N}(H;x,y)|\lesssim\frac{N^{q}}{\max(1,(Nd(x,y))^{S})}, (28)

where the constant may depend upon HH and SS, but not on NN, xx, or yy.

In the remainder of this paper, we fix a filter hh; i.e., an infinitely differentiable function h:[0,)[0,1]h:[0,\infty)\to[0,1], such that h(t)=1h(t)=1 for 0t1/20\leq t\leq 1/2, h(t)=0h(t)=0 for t1t\geq 1. The domain of the filter hh can be extended to {\mathbb{R}} by setting h(t)=h(t)h(-t)=h(t). Since hh is fixed, its mention will be omitted from the notation unless we feel that this would cause a confusion. The following theorem gives a crucial property of the operators, proved in several papers of ours in different contexts, see mhaskar2020kernel for a recent proof.
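For definiteness, one admissible filter can be assembled from the classical \exp(-1/x) gluing; the sketch below is one such construction, and any C^{\infty} function with the stated properties serves equally well.

```python
import numpy as np

def _g(u):
    # The classical C-infinity gluing function: exp(-1/u) for u > 0, else 0.
    u = np.asarray(u, dtype=float)
    return np.where(u > 0, np.exp(-1.0 / np.where(u > 0, u, 1.0)), 0.0)

def filter_h(t):
    # h(t) = 1 for |t| <= 1/2, h(t) = 0 for |t| >= 1, infinitely differentiable,
    # extended evenly to all of R; values lie in [0, 1].
    t = np.abs(np.asarray(t, dtype=float))
    up, down = _g(2.0 * t - 1.0), _g(2.0 - 2.0 * t)
    return down / (up + down)
```

The smoothness of h is what drives the localization in Proposition 2.4; a filter with only finitely many derivatives would still define the operators \sigma_{n}, but with a correspondingly slower off-diagonal decay.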

Theorem 2.5.

Let n>0n>0. If PΠn/2P\in\Pi_{n/2}, then σn(P)=P\sigma_{n}(P)=P. Also, for any pp with 1p1\leq p\leq\infty,

σn(f)pfp,fLp.\|\sigma_{n}(f)\|_{p}\lesssim\|f\|_{p},\qquad f\in L^{p}. (29)

If 1p1\leq p\leq\infty, and fLp(𝕏)f\in L^{p}({\mathbb{X}}), then

En(p,f)fσn(f)p,μEn/2(p,f).E_{n}(p,f)\leq\|f-\sigma_{n}(f)\|_{p,\mu^{*}}\lesssim E_{n/2}(p,f). (30)
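For intuition, the following numerical sketch exercises \sigma_{n} in the setting of Example 2.2 with \alpha=\beta=-1/2, where the orthonormal system \{\sqrt{\pi}\phi_{n}^{(\alpha,\beta)}\} reduces to the cosine system 1,\sqrt{2}\cos k\theta with \lambda_{k}=k. The test function and discretization are sample choices, and a piecewise-linear surrogate replaces the C^{\infty} filter for brevity (this affects only constants, not the construction).

```python
import numpy as np

theta = np.linspace(0.0, np.pi, 20001)
w = np.full_like(theta, (theta[1] - theta[0]) / np.pi)  # weights for dmu* = dtheta/pi
w[[0, -1]] *= 0.5                                       # trapezoid end corrections

def psi(k):
    # Orthonormal system in L^2(dtheta/pi): psi_0 = 1, psi_k = sqrt(2) cos(k theta).
    return np.ones_like(theta) if k == 0 else np.sqrt(2.0) * np.cos(k * theta)

h = lambda t: np.clip(2.0 - 2.0 * np.abs(t), 0.0, 1.0)  # surrogate filter

f = np.abs(theta - np.pi / 2)                           # Lipschitz, one "kink"
for n in [8, 16, 32, 64]:
    sigma = sum(h(k / n) * np.sum(w * f * psi(k)) * psi(k) for k in range(n))
    print(n, np.max(np.abs(f - sigma)))
```

The printed errors decay roughly like n^{-1}, in accordance with Equation (30) and the fact that E_{n}(\infty,f)\sim n^{-1} for this Lipschitz f.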

While Theorem 2.5 gives, in particular, a characterization of the global smoothness spaces Wγ,pW_{\gamma,p}, the characterization of local smoothness requires two more assumptions: the partition of unity and product assumption.

Definition 2.6 (Partition of unity).

We say that a set XX has a partition of unity if for every r>0r>0, there exists a countable family r={ψk,r}k=0\mathcal{F}_{r}=\{\psi_{k,r}\}_{k=0}^{\infty} of CC^{\infty} functions with the following properties:

1.

    Each ψk,rr\psi_{k,r}\in\mathcal{F}_{r} is supported on 𝔹(xk,r)\mathbb{B}(x_{k},r) for some xkXx_{k}\in X.

2.

    For every ψk,rr\psi_{k,r}\in\mathcal{F}_{r} and xXx\in X, 0ψk,r(x)10\leq\psi_{k,r}(x)\leq 1.

3.

    For every xXx\in X there exists a finite subset r(x)r\mathcal{F}_{r}(x)\subseteq\mathcal{F}_{r} (with cardinality bounded independently of xx) such that for all y𝔹(x,r)y\in\mathbb{B}(x,r)

    ψk,rr(x)ψk,r(y)=1.\sum_{\psi_{k,r}\in\mathcal{F}_{r}(x)}\psi_{k,r}(y)=1. (31)
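On [0,\pi], for instance, such a family is easy to construct explicitly; the sketch below normalizes translated C^{\infty} bumps. The spacing r/2 of the centers is a sample choice that keeps the family locally finite while making the denominator strictly positive.

```python
import numpy as np

def bump(u):
    # C-infinity bump: positive on (-1, 1), identically zero outside.
    u = np.asarray(u, dtype=float)
    inside = np.abs(u) < 1
    return np.where(inside, np.exp(-1.0 / np.where(inside, 1 - u * u, 1.0)), 0.0)

def partition_of_unity(x, r, length=np.pi):
    # psi_k is supported on B(x_k, r); dividing by the (locally finite, strictly
    # positive) sum yields values in [0, 1] that sum to 1 at every point.
    centers = np.arange(0.0, length + r / 4, r / 2)
    raw = np.array([bump((x - c) / r) for c in centers])
    return raw / raw.sum(axis=0)

x = np.linspace(0.0, np.pi, 9)
print(partition_of_unity(x, 0.5).sum(axis=0))   # all entries equal 1
```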

Definition 2.7 (Product assumption).

We say that a data space Ξ\Xi satisfies the product assumption if there exists A1A^{*}\geq 1 and a family {Rj,k,nΠAn}\{R_{j,k,n}\in\Pi_{A^{*}n}\} such that for every S>0S>0,

limnnS(maxλk,λj<nϕkϕjRj,k,n𝕏)=0.\lim_{n\to\infty}n^{S}\left(\max_{\lambda_{k},\lambda_{j}<n}\left|\left|\phi_{k}\phi_{j}-R_{j,k,n}\right|\right|_{\mathbb{X}}\right)=0. (32)

If instead for every n>0n>0 and P,QΠnP,Q\in\Pi_{n} we have PQΠAnPQ\in\Pi_{A^{*}n}, then we say that Ξ\Xi satisfies the strong product assumption.

In the most important manifold case, the partition of unity assumption is always satisfied [docarmo_riemannian, Chapter 0, Theorem 5.6]. It is shown in geller2011band ; modlpmz that the strong product assumption is satisfied if the \phi_{k}'s are eigenfunctions of certain differential operators on a Riemannian manifold and the \lambda_{k}'s are the corresponding eigenvalues. We do not know of any example where this property does not hold, yet we cannot prove that it holds in general. Hence, we have listed it as an assumption.

Our characterization of local smoothness (compbio ; heatkernframe ; mhaskar2020kernel ) is the following.

Theorem 2.8.

Let 1p1\leq p\leq\infty, γ>0\gamma>0, fXpf\in X^{p}, x0𝕏x_{0}\in{\mathbb{X}}. We assume the partition of unity and the product assumption. Then the following are equivalent.
(a) fWγ,p(x0)f\in W_{\gamma,p}(x_{0}).
(b) There exists a ball 𝔹{\mathbb{B}} centered at x0x_{0} such that

supn02nγfσ2n(f)p,μ,𝔹<.\sup_{n\geq 0}2^{n\gamma}\|f-\sigma_{2^{n}}(f)\|_{p,\mu^{*},{\mathbb{B}}}<\infty. (33)

A direct corollary is the following.

Corollary 2.9.

Let 1p1\leq p\leq\infty, γ>0\gamma>0, fXpf\in X^{p}, AA be a compact subset of 𝕏{\mathbb{X}}. We assume the partition of unity and the product assumption. Then the following are equivalent.
(a) fWγ,p(A)f\in W_{\gamma,p}(A).
(b) There exists r>0r>0 such that

supn02nγfσ2n(f)p,μ,𝔹(A,r)<.\sup_{n\geq 0}2^{n\gamma}\|f-\sigma_{2^{n}}(f)\|_{p,\mu^{*},{\mathbb{B}}(A,r)}<\infty. (34)

3 Joint data spaces

In order to motivate our definitions in this section, we first consider a couple of examples.

Example 3.1.

Let Ξj=(𝕏j,dj,μj,{λj,k}k=0,{ϕj,k}k=0)\Xi_{j}=({\mathbb{X}}_{j},d_{j},\mu_{j}^{*},\{\lambda_{j,k}\}_{k=0}^{\infty},\{\phi_{j,k}\}_{k=0}^{\infty}), j=1,2j=1,2 be two data spaces with exponent qq. We denote the heat kernel in each case by

K_{j,t}(x,y)=\sum_{k=0}^{\infty}\exp(-\lambda_{j,k}^{2}t)\phi_{j,k}(x)\phi_{j,k}(y),\qquad j=1,2,\ x,y\in{\mathbb{X}}_{j},\ t>0.

In the paper coifmanhirn , Coifman and Hirn assumed that 𝕏1=𝕏2=𝕏{\mathbb{X}}_{1}={\mathbb{X}}_{2}={\mathbb{X}}, μ1=μ2=μ\mu_{1}^{*}=\mu_{2}^{*}=\mu^{*}, and proposed the diffusion distance between points x1,x2x_{1},x_{2} to be the square root of

K_{1,2t}(x_{1},x_{1})+K_{2,2t}(x_{2},x_{2})-2\int_{\mathbb{X}}K_{1,t}(x_{1},y)K_{2,t}(y,x_{2})d\mu^{*}(y).

Writing, in this example only,

Aj,k=𝕏ϕ1,j(y)ϕ2,k(y)𝑑μ(y),A_{j,k}=\int_{\mathbb{X}}\phi_{1,j}(y)\phi_{2,k}(y)d\mu^{*}(y), (35)

we get

𝕏K1,t(x1,y)K2,t(y,x2)𝑑μ(y)=j,kexp((λ1,j2+λ2,k2)t)Aj,kϕ1,j(x1)ϕ2,k(x2).\int_{\mathbb{X}}K_{1,t}(x_{1},y)K_{2,t}(y,x_{2})d\mu^{*}(y)=\sum_{j,k}\exp\left(-(\lambda_{1,j}^{2}+\lambda_{2,k}^{2})t\right)A_{j,k}\phi_{1,j}(x_{1})\phi_{2,k}(x_{2}). (36)

Furthermore, the Gaussian upper bound conditions imply that

𝕏K1,t(x1,y)K2,t(y,x2)𝑑μ(y)\displaystyle\int_{\mathbb{X}}K_{1,t}(x_{1},y)K_{2,t}(y,x_{2})d\mu^{*}(y) tq𝕏exp(cd1(x1,y)2+d2(y,x2)2t)𝑑μ(y)\displaystyle\lesssim t^{-q}\int_{\mathbb{X}}\exp\left(-c\frac{d_{1}(x_{1},y)^{2}+d_{2}(y,x_{2})^{2}}{t}\right)d\mu^{*}(y) (37)
tqexp(c(miny𝕏(d1(x1,y)+d2(y,x2)))2t).\displaystyle\lesssim t^{-q}\exp\left(-c\frac{\left(\min_{y\in{\mathbb{X}}}\left(d_{1}(x_{1},y)+d_{2}(y,x_{2})\right)\right)^{2}}{t}\right).

Writing, in this example only,

d1,2(x1,x2)=miny𝕏(d1(x1,y)+d2(y,x2))=d2,1(x2,x1),d_{1,2}(x_{1},x_{2})=\min_{y\in{\mathbb{X}}}\left(d_{1}(x_{1},y)+d_{2}(y,x_{2})\right)=d_{2,1}(x_{2},x_{1}),

we observe that for any x1,x1,x2,x2𝕏x_{1},x_{1}^{\prime},x_{2},x_{2}^{\prime}\in{\mathbb{X}},

d1,2(x1,x2)\displaystyle d_{1,2}(x_{1},x_{2}) d1,2(x1,x2)+d1(x1,x1),\displaystyle\leq d_{1,2}(x_{1}^{\prime},x_{2})+d_{1}(x_{1},x_{1}^{\prime}),
d_{1,2}(x_{1},x_{2}) \leq d_{1,2}(x_{1},x_{2}^{\prime})+d_{2}(x_{2},x_{2}^{\prime}).

Example 3.2.

In this example we let \alpha_{i},\beta_{i}\geq-1/2 for i=1,2 and assume that a=\left|\alpha_{1}-\alpha_{2}\right|/2 and b=\left|\beta_{1}-\beta_{2}\right|/2 are nonnegative integers. Then we select the following two data spaces as defined in Example 2.2:

Ξi=([0,π],di,1πdθ,{λi,n},{πϕn(αi,βi)}).\Xi_{i}=([0,\pi],d_{i},\frac{1}{\pi}d\theta,\{\lambda_{i,n}\},\{\sqrt{\pi}\phi^{(\alpha_{i},\beta_{i})}_{n}\}). (38)

Since both spaces already have the same distance, we will define a joint distance for the systems accordingly:

d1,2(θ1,θ2)=d1(θ1,θ2)=d2(θ1,θ2)=|θ1θ2|.d_{1,2}(\theta_{1},\theta_{2})=d_{1}(\theta_{1},\theta_{2})=d_{2}(\theta_{1},\theta_{2})=|\theta_{1}-\theta_{2}|. (39)

Similarly to Example 3.1 above, we are considering two data spaces with the same underlying space and measure. However, we now proceed in a different manner. Let us denote

Ω(θ)=(1cosθ)a(1+cosθ)b.\varOmega(\theta)=(1-\cos\theta)^{a}(1+\cos\theta)^{b}. (40)

Let α¯=max(α1,α2)\overline{\alpha}=\max(\alpha_{1},\alpha_{2}) and β¯=max(β1,β2)\overline{\beta}=\max(\beta_{1},\beta_{2}). Then we define

Am,n=\displaystyle A_{m,n}= 0πϕm(α1,β1)(θ)ϕn(α2,β2)(θ)Ω(θ)𝑑θ\displaystyle\int_{0}^{\pi}\phi_{m}^{(\alpha_{1},\beta_{1})}(\theta)\phi_{n}^{(\alpha_{2},\beta_{2})}(\theta)\varOmega(\theta)d\theta (41)
=\displaystyle= 0πpm(α1,β1)(cosθ)pn(α2,β2)(cosθ)(1cosθ)α¯+1/2(1+cosθ)β¯+1/2𝑑θ.\displaystyle\int_{0}^{\pi}p_{m}^{(\alpha_{1},\beta_{1})}(\cos\theta)p_{n}^{(\alpha_{2},\beta_{2})}(\cos\theta)(1-\cos\theta)^{\overline{\alpha}+1/2}(1+\cos\theta)^{\overline{\beta}+1/2}d\theta.

The orthogonality of the Jacobi polynomials tells us that Am,n=0A_{m,n}=0 at least when m>n+2a+2bm>n+2a+2b or n>m+2a+2bn>m+2a+2b. Furthermore, we have the following two sums

nAm,nϕn(α2,β2)(θ)=Ω(θ)ϕm(α1,β1)(θ),mAm,nϕm(α1,β1)(θ)=Ω(θ)ϕn(α2,β2)(θ).\sum_{n}A_{m,n}\phi_{n}^{(\alpha_{2},\beta_{2})}(\theta)=\varOmega(\theta)\phi_{m}^{(\alpha_{1},\beta_{1})}(\theta),\hskip 5.0pt\sum_{m}A_{m,n}\phi_{m}^{(\alpha_{1},\beta_{1})}(\theta)=\varOmega(\theta)\phi_{n}^{(\alpha_{2},\beta_{2})}(\theta). (42)

We define \ell_{m,n}=\sqrt{\lambda_{1,m}^{2}+\lambda_{2,n}^{2}}, utilize the Gaussian upper bound property for \Xi_{i} and Equation (42), and deduce as in Example 3.1 that

|πm,nexp(m,n2t)Am,nϕm(α1,β1)(θ1)ϕn(α2,β2)(θ2)|\displaystyle\left|\pi\sum_{m,n}\exp\left(-\ell_{m,n}^{2}t\right)A_{m,n}\phi_{m}^{(\alpha_{1},\beta_{1})}(\theta_{1})\phi_{n}^{(\alpha_{2},\beta_{2})}(\theta_{2})\right| (43)
=\displaystyle= |0πK1,t(θ1,ϕ)K2,t(ϕ,θ2)Ω(ϕ)𝑑ϕ|\displaystyle\left|\int_{0}^{\pi}K_{1,t}(\theta_{1},\phi)K_{2,t}(\phi,\theta_{2})\varOmega(\phi)d\phi\right|
\displaystyle\lesssim t1exp(cd1,2(θ1,θ2)2t).\displaystyle t^{-1}\exp\left(-c\frac{d_{1,2}(\theta_{1},\theta_{2})^{2}}{t}\right).

We note (cf. [mhaskar2020kernel, Lemma 5.2]) that

πm,n:m,n<N|Am,nϕm(α1,β1)(θ1)ϕn(α2,β2)(θ2)|\displaystyle\pi\sum_{m,n:\ell_{m,n}<N}\left|A_{m,n}\phi_{m}^{(\alpha_{1},\beta_{1})}(\theta_{1})\phi_{n}^{(\alpha_{2},\beta_{2})}(\theta_{2})\right| (44)
\displaystyle\lesssim Ω[0,π]m:λ1,m<N|ϕm(α1,β1)(θ1)ϕm(α1,β1)(θ2)|\displaystyle\left|\left|\varOmega\right|\right|_{[0,\pi]}\sum_{m:\lambda_{1,m}<N}\left|\phi_{m}^{(\alpha_{1},\beta_{1})}(\theta_{1})\phi_{m}^{(\alpha_{1},\beta_{1})}(\theta_{2})\right|
\displaystyle\lesssim N.\displaystyle N.
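The band structure of the connection coefficients observed above is easy to verify numerically. In the sketch below, the parameters (chosen so that a=1 and b=0) are sample values, and Gauss-Jacobi quadrature evaluates Equation (41) exactly after the substitution x=\cos\theta.

```python
import numpy as np
from scipy.special import eval_jacobi, gammaln, roots_jacobi

def p(n, a, b, x):
    # Orthonormalized Jacobi polynomial: P_n^{(a,b)} divided by its L^2 norm.
    log_h = ((a + b + 1) * np.log(2.0) + gammaln(n + a + 1) + gammaln(n + b + 1)
             - gammaln(n + a + b + 1) - gammaln(n + 1) - np.log(2 * n + a + b + 1))
    return eval_jacobi(n, a, b, x) * np.exp(-0.5 * log_h)

a1, b1, a2, b2 = -0.5, 0.5, 1.5, 0.5    # sample parameters: a = 1, b = 0
abar, bbar = max(a1, a2), max(b1, b2)

# A_{m,n} = int p_m^{(a1,b1)}(x) p_n^{(a2,b2)}(x) (1-x)^{abar} (1+x)^{bbar} dx,
# i.e., Equation (41) after x = cos(theta); the integrand is a polynomial times
# the Gauss-Jacobi weight, so the 40-node rule below is exact.
x, wq = roots_jacobi(40, abar, bbar)
A = np.array([[np.sum(wq * p(m, a1, b1, x) * p(n, a2, b2, x))
               for n in range(8)] for m in range(8)])
print(np.round(A, 10))   # vanishes whenever |m - n| > 2a + 2b = 2
```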

Motivated by these examples, we now give a series of definitions, culminating in Definition 3.3. First, we define the notion of a joint distance.

Definition 3.1.

Let 𝕏1{\mathbb{X}}_{1}, 𝕏2{\mathbb{X}}_{2} be metric spaces, with each 𝕏j{\mathbb{X}}_{j} having a metric djd_{j}. A function d1,2:𝕏1×𝕏2[0,)d_{1,2}:\mathbb{X}_{1}\times\mathbb{X}_{2}\to[0,\infty) will be called a joint distance if the following generalized triangle inequalities are satisfied for x1,x1𝕏1x_{1},x_{1}^{\prime}\in\mathbb{X}_{1} and x2,x2𝕏2x_{2},x_{2}^{\prime}\in\mathbb{X}_{2}:

d1,2(x1,x2)\displaystyle d_{1,2}(x_{1},x_{2}) d1(x1,x1)+d1,2(x1,x2),\displaystyle\leq d_{1}(x_{1},x_{1}^{\prime})+d_{1,2}(x_{1}^{\prime},x_{2}), (45)
d1,2(x1,x2)\displaystyle d_{1,2}(x_{1},x_{2}) d1,2(x1,x2)+d2(x2,x2).\displaystyle\leq d_{1,2}(x_{1},x_{2}^{\prime})+d_{2}(x_{2}^{\prime},x_{2}).

For convenience of notation we denote d2,1(x2,x1)=d1,2(x1,x2)d_{2,1}(x_{2},x_{1})=d_{1,2}(x_{1},x_{2}). Then for r>0r>0, x1𝕏1x_{1}\in\mathbb{X}_{1}, x2𝕏2x_{2}\in\mathbb{X}_{2}, A1𝕏1A_{1}\subset{\mathbb{X}}_{1}, A2𝕏2A_{2}\subset{\mathbb{X}}_{2}, we define

𝔹1(x1,r)\displaystyle\mathbb{B}_{1}(x_{1},r) ={z𝕏1:d1(x1,z)r},\displaystyle=\{z\in\mathbb{X}_{1}:d_{1}(x_{1},z)\leq r\}, 𝔹2(x2,r)\displaystyle\mathbb{B}_{2}(x_{2},r) ={z𝕏2:d2(x2,z)r},\displaystyle=\{z\in\mathbb{X}_{2}:d_{2}(x_{2},z)\leq r\},
𝔹1,2(x1,r)\displaystyle\mathbb{B}_{1,2}(x_{1},r) ={z𝕏2:d1,2(x1,z)r},\displaystyle=\{z\in\mathbb{X}_{2}:d_{1,2}(x_{1},z)\leq r\}, 𝔹2,1(x2,r)\displaystyle\mathbb{B}_{2,1}(x_{2},r) ={z𝕏1:d2,1(x2,z)r},\displaystyle=\{z\in\mathbb{X}_{1}:d_{2,1}(x_{2},z)\leq r\},
d1,2(A1,x2)=\displaystyle d_{1,2}(A_{1},x_{2})= infxA1𝕏1d1,2(x,x2),\displaystyle\inf_{x\in A_{1}\subseteq\mathbb{X}_{1}}d_{1,2}(x,x_{2}), (46)
d1,2(x1,A2)=d2,1(A2,x1)=\displaystyle d_{1,2}(x_{1},A_{2})=d_{2,1}(A_{2},x_{1})= infyA2𝕏2d2,1(y,x1).\displaystyle\inf_{y\in A_{2}\subseteq\mathbb{X}_{2}}d_{2,1}(y,x_{1}).

We recall here that an infimum over an empty set is defined to be \infty.

Definition 3.2.

Let 𝐀=(Aj,k)j,k=0\mathbf{A}=(A_{j,k})_{j,k=0}^{\infty} (connection coefficients) and 𝐋=(j,k)j,k=0\mathbf{L}=(\ell_{j,k})_{j,k=0}^{\infty} (joint eigenvalues) be bi-infinite matrices. For x1𝕏1x_{1}\in{\mathbb{X}}_{1}, x2𝕏2x_{2}\in{\mathbb{X}}_{2}, t>0t>0, the joint heat kernel is defined formally by

Kt(Ξ1,Ξ2;x1,x2)=\displaystyle K_{t}(\Xi_{1},\Xi_{2};x_{1},x_{2})= Kt(Ξ1,Ξ2;𝐀,𝐋;x1,x2)\displaystyle K_{t}(\Xi_{1},\Xi_{2};\mathbf{A},\mathbf{L};x_{1},x_{2}) (47)
=\displaystyle= j,k=0exp(j,k2t)Aj,kϕ1,j(x1)ϕ2,k(x2)\displaystyle\sum_{j,k=0}^{\infty}\exp(-\ell_{j,k}^{2}t)A_{j,k}\phi_{1,j}(x_{1})\phi_{2,k}(x_{2})
=\displaystyle= limnj,k:j,k<nexp(j,k2t)Aj,kϕ1,j(x1)ϕ2,k(x2).\displaystyle\lim_{n\to\infty}\sum_{j,k:\ell_{j,k}<n}\exp(-\ell_{j,k}^{2}t)A_{j,k}\phi_{1,j}(x_{1})\phi_{2,k}(x_{2}).

Definition 3.3.

For m=1,2m=1,2, let Ξm=(𝕏m,dm,μm,{λm,k}k=0,{ϕm,k}k=0)\Xi_{m}=\left({\mathbb{X}}_{m},d_{m},\mu_{m}^{*},\{\lambda_{m,k}\}_{k=0}^{\infty},\{\phi_{m,k}\}_{k=0}^{\infty}\right) be compact data spaces. With the notation above, assume each j,k0\ell_{j,k}\geq 0 and that for any u>0u>0, the set {(j,k):j,k<u}\{(j,k):\ell_{j,k}<u\} is finite. A joint (compact) data space Ξ\Xi is a tuple

(Ξ1,Ξ2,d1,2,𝐀,𝐋),(\Xi_{1},\Xi_{2},d_{1,2},\mathbf{A},\mathbf{L}),

where each of the following conditions is satisfied for some Q>0Q>0:

1.

    (Joint regularity) There exist q1,q2>0q_{1},q_{2}>0 such that

    μ1(𝔹2,1(x2,r))crq1,μ2(𝔹1,2(x1,r))crq2,x1𝕏1,x2𝕏2,r>0.\mu_{1}^{*}({\mathbb{B}}_{2,1}(x_{2},r))\leq cr^{q_{1}},\quad\mu_{2}^{*}({\mathbb{B}}_{1,2}(x_{1},r))\leq cr^{q_{2}},\qquad x_{1}\in{\mathbb{X}}_{1},\ x_{2}\in{\mathbb{X}}_{2},\ r>0. (48)
2.

    (Variation bound) For each n>0n>0,

    j,k:j,k<n|Aj,kϕ1,j(x1)ϕ2,k(x2)|nQ,x1𝕏1,x2𝕏2.\sum_{j,k:\ell_{j,k}<n}\left|A_{j,k}\phi_{1,j}(x_{1})\phi_{2,k}(x_{2})\right|\lesssim n^{Q},\qquad x_{1}\in{\mathbb{X}}_{1},\ x_{2}\in{\mathbb{X}}_{2}. (49)
3.

    (Joint Gaussian upper bound) The limit in (47) exists for all x1𝕏1x_{1}\in{\mathbb{X}}_{1}, x2𝕏2x_{2}\in{\mathbb{X}}_{2}, and

    |Kt(Ξ1,Ξ2;x1,x2)|c1tcexp(c2d1,2(x1,x2)2t),x1𝕏1,x2𝕏2.|K_{t}(\Xi_{1},\Xi_{2};x_{1},x_{2})|\leq c_{1}t^{-c}\exp\left(-c_{2}\frac{d_{1,2}(x_{1},x_{2})^{2}}{t}\right),\qquad x_{1}\in{\mathbb{X}}_{1},\ x_{2}\in{\mathbb{X}}_{2}. (50)

We refer to (Q,q1,q2)(Q,q_{1},q_{2}) as the (joint) exponents of the joint data space.

The kernel corresponding to the one defined in Equation (25) is the following, where H:[0,)[0,)H:[0,\infty)\to[0,\infty) is a compactly supported function.

Φn(H,Ξ1,Ξ2;x1,x2)=j,k=0H(j,kn)Aj,kϕ1,j(x1)ϕ2,k(x2).\Phi_{n}(H,\Xi_{1},\Xi_{2};x_{1},x_{2})=\sum_{j,k=0}^{\infty}H\left(\frac{\ell_{j,k}}{n}\right)A_{j,k}\phi_{1,j}(x_{1})\phi_{2,k}(x_{2}). (51)

For fL1(μ2)+L(μ2)f\in L^{1}(\mu^{*}_{2})+L^{\infty}(\mu^{*}_{2}) and x1𝕏1x_{1}\in\mathbb{X}_{1}, we also define

σn(H,Ξ1,Ξ2;f)(x1)=\displaystyle\sigma_{n}(H,\Xi_{1},\Xi_{2};f)(x_{1})= 𝕏2f(x2)Φn(H,Ξ1,Ξ2;x1,x2)𝑑μ2(x2)\displaystyle\int_{\mathbb{X}_{2}}f(x_{2})\Phi_{n}(H,\Xi_{1},\Xi_{2};x_{1},x_{2})d\mu^{*}_{2}(x_{2}) (52)
=\displaystyle= j,k=0H(j,kn)Aj,kf^(Ξ2;k)ϕ1,j(x1).\displaystyle\sum_{j,k=0}^{\infty}H\left(\frac{\ell_{j,k}}{n}\right)A_{j,k}\hat{f}(\Xi_{2};k)\phi_{1,j}(x_{1}).
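In coefficient form, Equation (52) is a filtered matrix-vector product; the schematic sketch below makes this explicit, with finite arrays standing in for the bi-infinite matrices \mathbf{A} and \mathbf{L} (the names and shapes here are illustrative only).

```python
import numpy as np

def lifted_coefficients(A, L, fhat, n, h):
    """Coefficients of sigma_n(H, Xi_1, Xi_2; f) in the basis {phi_{1,j}}.

    A    : (J, K) array of connection coefficients A_{j,k}
    L    : (J, K) array of joint eigenvalues ell_{j,k}
    fhat : (K,) coefficients hat{f}(Xi_2; k) of f on the target space
    h    : the filter; h(ell_{j,k} / n) = 0 for ell_{j,k} >= n truncates the sum
    """
    return (h(L / n) * A) @ fhat        # shape (J,)
```

Evaluating \sum_{j}c_{j}\phi_{1,j}(x_{1}) with the returned coefficients then gives the lifted approximant on {\mathbb{X}}_{1}.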

The localization property of the kernels is given in the following proposition (cf. [tauberian, Eqn. (4.5)]).

Proposition 3.4.

Let S>Q+1S>Q+1 be an integer, H:H:\mathbb{R}\to\mathbb{R} be an even, SS times continuously differentiable, compactly supported function. Then for every x1𝕏1x_{1}\in\mathbb{X}_{1}, x2𝕏2x_{2}\in\mathbb{X}_{2}, N>0N>0,

|ΦN(H,Ξ1,Ξ2;x1,x2)|NQmax(1,(Nd1,2(x1,x2))S),|\Phi_{N}(H,\Xi_{1},\Xi_{2};x_{1},x_{2})|\lesssim\frac{N^{Q}}{\max(1,(Nd_{1,2}(x_{1},x_{2}))^{S})}, (53)

where the constant involved may depend upon HH, and SS, but not on NN, x1x_{1}, x2x_{2}.

In the sequel, we will fix HH to be the filter hh introduced in Section 2, and will omit its mention from all notations. Also, we take S>max(Q,q1,q2)+1S>\max(Q,q_{1},q_{2})+1 to be fixed, although we may put additional conditions on SS as needed. As before, all constants may depend upon hh and SS.

In the remainder of this paper, we will take p=p=\infty, work only with continuous functions on 𝕏1{\mathbb{X}}_{1} or 𝕏2{\mathbb{X}}_{2}, and use fK\|f\|_{K} to denote the supremum norm of ff on a set KK. Accordingly, we will omit the index pp from the notation for the smoothness classes; e.g., we will write Wγ(Ξ1;B)W_{\gamma}(\Xi_{1};B) instead of Wγ,(Ξ1;B)W_{\gamma,\infty}(\Xi_{1};B). The results in the sequel are similar in the case where p<p<\infty due to the Riesz-Thorin interpolation theorem, but more notationally exhausting without adding any apparent new insights.

We end the section with a condition on the operator defined in Equation (52) that is useful for our purposes.

Definition 3.5 (Polynomial preservation condition).

Let (Ξ1,Ξ2,d1,2,𝐀,𝐋)(\Xi_{1},\Xi_{2},d_{1,2},\mathbf{A},\mathbf{L}) be a joint data space. We say the polynomial preservation condition is satisfied if there exists some c>0c^{*}>0 with the property that if PnΠn(Ξ2)P_{n}\in\Pi_{n}(\Xi_{2}), then σm(Ξ1,Ξ2;Pn)=σcn(Ξ1,Ξ2;Pn)\sigma_{m}(\Xi_{1},\Xi_{2};P_{n})=\sigma_{c^{*}n}(\Xi_{1},\Xi_{2};P_{n}) for all mcnm\geq c^{*}n.

Remark 3.6.

The polynomial preservation condition is satisfied if, for any n>0n>0, we have the following inclusion:

{(i,j):Ai,j0,λ2,j<n}{(i,j):i,jcn,λ1,i<cn}.\{(i,j):A_{i,j}\neq 0,\lambda_{2,j}<n\}\subseteq\{(i,j):\ell_{i,j}\leq c^{*}n,\lambda_{1,i}<c^{*}n\}. (54)
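For the banded Jacobi connection matrices of Example 3.2, the inclusion (54) can be checked directly. The sketch below does so for sample parameters with bandwidth 2a+2b=2, confirming that c^{*}=4 is one admissible constant.

```python
import numpy as np

a1, b1, a2, b2, band = -0.5, 0.5, 1.5, 0.5, 2   # sample parameters; 2a + 2b = 2
lam1 = lambda i: i + (a1 + b1 + 1) / 2          # Equation (10) for Xi_1
lam2 = lambda j: j + (a2 + b2 + 1) / 2          # Equation (10) for Xi_2
ell = lambda i, j: np.hypot(lam1(i), lam2(j))   # joint eigenvalues

c_star = 4.0
ok = all(ell(i, j) <= c_star * n and lam1(i) < c_star * n
         for n in range(1, 100)
         for j in range(200) if lam2(j) < n     # lambda_{2,j} < n
         for i in range(max(0, j - band), j + band + 1))  # where A_{i,j} can be nonzero
print(ok)                                       # True: Equation (54) holds
```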

Example 3.3.

We utilize the same notation as in Examples 2.2 and 3.2. We now see, in light of Definition 3.3, that (\Xi_{1},\Xi_{2},d_{1,2},\mathbf{A},\mathbf{L}) is a joint data space with exponents (1,1,1). It is clear that both the partition of unity and the strong product assumption hold in these spaces. One may also recall that A_{m,n}=0 at least whenever m>n+2a+2b, so there exists c^{*} such that Equation (54) is satisfied. As a result, we conclude that the polynomial preservation condition holds.

4 Local approximation in joint data spaces

In this section, we assume a fixed joint data space as in Section 3. We are interested in the following questions. Suppose f\in C({\mathbb{X}}_{2}), and we have information about f only in a neighborhood of a compact set A\subseteq{\mathbb{X}}_{2}. Under what conditions on f and a subset B\subseteq{\mathbb{X}}_{1} can f be lifted to a function \mathcal{E}(f) on B? Moreover, how does the local smoothness of \mathcal{E}(f) on B depend upon the local smoothness of f on A? We now give the definitions of \mathcal{E}(f), A, and B under which we consider these questions.

Definition 4.1.

Given fC(𝕏2)f\in C(\mathbb{X}_{2}), we define the lifted function (f)\mathcal{E}(f) to be the limit

(f)=limnσn(Ξ1,Ξ2;f),\mathcal{E}(f)=\lim_{n\to\infty}\sigma_{n}(\Xi_{1},\Xi_{2};f), (55)

if the limit exists.

Definition 4.2.

Let r,s>0r,s>0 and A𝕏2A\subseteq\mathbb{X}_{2} be a compact subset with the property that there exists a compact subset B𝕏1B^{-}\subset\mathbb{X}_{1} such that

B{x1:d1,2(x1,𝕏2A)s+r}B^{-}\subseteq\{x_{1}:d_{1,2}(x_{1},\mathbb{X}_{2}\setminus A)\geq s+r\} (56)

We then define the image set of A by

(r,s;A)=𝔹1(B,s)={x1:d1(x1,B)s}.\mathcal{I}(r,s;A)={\mathbb{B}}_{1}(B^{-},s)=\{x_{1}:d_{1}(x_{1},B^{-})\leq s\}. (57)

If the set BB^{-} does not exist, then we define (r,s;A)=\mathcal{I}(r,s;A)=\emptyset.

Remark 4.3.

In the sequel we fix r,s>0r,s>0 and a compact subset A𝕏2A\subseteq\mathbb{X}_{2} such that BB^{-} defined in Equation (56) is nonempty. We write B=(r,s;A)B=\mathcal{I}(r,s;A). We note that, due to the generalized triangle inequality (45), we have the important property

BB{x1:d1,2(x1,𝕏2A)r}.B^{-}\varsubsetneq B\subseteq\{x_{1}:d_{1,2}(x_{1},\mathbb{X}_{2}\setminus A)\geq r\}. (58)

We now state our main theorem. Although there is no explicit mention of B^{-} in the statement of the theorem, Remark 4.5 and Example 4.1 clarify the benefit of such a construction.

Theorem 4.4.

Let (Ξ1,Ξ2,d1,2,𝐀,𝐋)(\Xi_{1},\Xi_{2},d_{1,2},\mathbf{A},\mathbf{L}) be a joint data space with exponents (Q,q1,q2)(Q,q_{1},q_{2}). We assume that the polynomial preservation condition holds with parameter cc^{*}. Suppose 𝕏2\mathbb{X}_{2} has a partition of unity.
(a) Let f\in C(\mathbb{X}_{2}) satisfy

m=02m(Qq2)σ2m+1(Ξ2;f)σ2m(Ξ2;f)A<.\sum_{m=0}^{\infty}2^{m(Q-q_{2})}\left|\left|\sigma_{2^{m+1}}(\Xi_{2};f)-\sigma_{2^{m}}(\Xi_{2};f)\right|\right|_{A}<\infty. (59)

Then (f)\mathcal{E}(f) as defined in Definition 4.1 exists on BB and for cr2n1c^{*}r2^{n}\geq 1 we have

(f)σc2n(Ξ1,Ξ2;f)B\displaystyle\left|\left|\mathcal{E}(f)-\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)\right|\right|_{B}\lesssim 2n(Qq2)fσ2n(Ξ2;f)A+f𝕏22n(QS)rq2S\displaystyle 2^{n(Q-q_{2})}\left|\left|f-\sigma_{2^{n}}(\Xi_{2};f)\right|\right|_{A}+\left|\left|f\right|\right|_{\mathbb{X}_{2}}2^{n(Q-S)}r^{q_{2}-S} (60)
+m=n2m(Qq2)σ2m+1(Ξ2;f)σ2m(Ξ2;f)A.\displaystyle+\sum_{m=n}^{\infty}2^{m(Q-q_{2})}\left|\left|\sigma_{2^{m+1}}(\Xi_{2};f)-\sigma_{2^{m}}(\Xi_{2};f)\right|\right|_{A}.

In particular, if Ξ1\Xi_{1} satisfies the strong product assumption, 𝕏1\mathbb{X}_{1} has a partition of unity, and α>0\alpha>0 is given such that αj,kλ1,j\alpha\ell_{j,k}\geq\lambda_{1,j} for all j,kj,k\in\mathbb{N}, then σn(Ξ1,Ξ2;f)Παn(Ξ1)\sigma_{n}(\Xi_{1},\Xi_{2};f)\in\Pi_{\alpha n}(\Xi_{1}).
(b) If additionally, fWγ(Ξ2;A)f\in W_{\gamma}(\Xi_{2};A) with Qq2<γ<Sq2Q-q_{2}<\gamma<S-q_{2}, then (f)\mathcal{E}(f) is continuous on BB and for ϕC(B)\phi\in C^{\infty}(B), we have ϕ(f)WγQ+q2(Ξ1)\phi\mathcal{E}(f)\in W_{\gamma-Q+q_{2}}(\Xi_{1}).

Remark 4.5.

Given the assumptions of Theorem 4.4, \mathcal{E}(f) is not guaranteed to be continuous on the entirety of \mathbb{X}_{1} (or even defined outside of B). As a result, in the setting of Theorem 4.4(b) we cannot say that \mathcal{E}(f) belongs to any of the smoothness classes defined in this paper. However, we can still say, for instance, that

infPΠ2n(Ξ1)(f)PB2n(γQ+q2)\inf_{P\in\Pi_{2^{n}}(\Xi_{1})}\left|\left|\mathcal{E}(f)-P\right|\right|_{B^{-}}\lesssim 2^{-n(\gamma-Q+q_{2})} (61)

(this can be seen directly by taking ϕC(Ξ1)\phi\in C^{\infty}(\Xi_{1}) such that ϕ(x)=1\phi(x)=1 when xBx\in B^{-} and ϕ(x)=0\phi(x)=0 when x𝕏1Bx\in\mathbb{X}_{1}\setminus B). Consequently, if it happens that (f)C(Ξ1)\mathcal{E}(f)\in C(\Xi_{1}), then (f)WγQ+q2(Ξ1;B)\mathcal{E}(f)\in W_{\gamma-Q+q_{2}}(\Xi_{1};B^{-}).

Example 4.1.

We now conclude the running Examples 2.2, 3.2, and 3.3 by demonstrating how one may utilize Theorem 4.4. We assume the notation given in each of the prior examples listed. First, we find the image set for A=\mathbb{B}_{2}(\theta_{0},r_{0}) given some \theta_{0}\in[0,\pi] and r_{0}>0. We let r=s=r_{0}/8 in correspondence with Definition 4.2 and define

B=\displaystyle B^{-}= 𝔹1(θ0,3r04)\displaystyle\mathbb{B}_{1}\left(\theta_{0},\frac{3r_{0}}{4}\right) (62)
=\displaystyle= {θ1[0,π]:d1(θ1,[0,π]𝔹1(θ0,r0))r04}\displaystyle\left\{\theta_{1}\in[0,\pi]:d_{1}(\theta_{1},[0,\pi]\setminus\mathbb{B}_{1}(\theta_{0},r_{0}))\geq\frac{r_{0}}{4}\right\}
=\displaystyle= {θ1[0,π]:d1,2(θ1,[0,π]A)r+s}.\displaystyle\left\{\theta_{1}\in[0,\pi]:d_{1,2}(\theta_{1},[0,\pi]\setminus A)\geq r+s\right\}.

Then we can let B=\mathbb{B}_{1}\left(\theta_{0},\frac{7r_{0}}{8}\right)=\mathbb{B}_{1}(B^{-},s). By Theorem 4.4(a), f\in C([0,\pi]) can be lifted to \mathbb{B}_{1}\left(\theta_{0},\frac{7r_{0}}{8}\right) (where we note that Equation (59) is automatically satisfied due to Q=q_{2}=1). Since \ell_{m,n}\geq\lambda_{1,m}, we may take \alpha=1 in Theorem 4.4(a), so \sigma_{n}(\Xi_{1},\Xi_{2};f)\in\Pi_{n}(\Xi_{1}). If we suppose f\in W_{\gamma}(\Xi_{2};A) for some \gamma>0 (with h chosen so that S is sufficiently large), then Theorem 4.4(b) informs us that \phi\mathcal{E}(f)\in W_{\gamma}(\Xi_{1}) for \phi\in C^{\infty}(B). Lastly, as a result of Equation (61), we can conclude that

\inf_{P\in\Pi_{2^{n}}(\Xi_{1})}\left|\left|\mathcal{E}(f)-P\right|\right|_{\mathbb{B}_{1}\left(\theta_{0},\frac{3r_{0}}{4}\right)}\lesssim 2^{-n\gamma}. (63)
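As a closing illustration, the lift in this example can be computed end-to-end. From the second sum in Equation (42), \sigma_{n}(\Xi_{1},\Xi_{2};\cdot) applied to a basis function \sqrt{\pi}\phi_{k}^{(\alpha_{2},\beta_{2})} eventually equals \varOmega\cdot\sqrt{\pi}\phi_{k}^{(\alpha_{2},\beta_{2})}, so for smooth f supported in A the lifted function should agree with \varOmega f on B. The sketch below checks this numerically; all parameter values are sample choices, and a piecewise-linear surrogate again replaces the C^{\infty} filter.

```python
import numpy as np
from scipy.special import eval_jacobi, gammaln

def p(n, a, b, x):
    # Orthonormalized Jacobi polynomial: P_n^{(a,b)} divided by its L^2 norm.
    log_h = ((a + b + 1) * np.log(2.0) + gammaln(n + a + 1) + gammaln(n + b + 1)
             - gammaln(n + a + b + 1) - gammaln(n + 1) - np.log(2 * n + a + b + 1))
    return eval_jacobi(n, a, b, x) * np.exp(-0.5 * log_h)

def Phi(n, a, b, th):
    # sqrt(pi) * phi_n^{(a,b)}(theta): orthonormal with respect to dmu* = dtheta/pi.
    c = np.cos(th)
    return (np.sqrt(np.pi) * (1 - c) ** (a / 2 + 0.25)
            * (1 + c) ** (b / 2 + 0.25) * p(n, a, b, c))

a1, b1, a2, b2, N = -0.5, 0.5, 1.5, 0.5, 96     # sample parameters: a = 1, b = 0
th = np.linspace(1e-9, np.pi - 1e-9, 20001)
w = np.full_like(th, (th[1] - th[0]) / np.pi)   # quadrature weights for dmu*

th0, r0 = np.pi / 2, 0.8                        # A = B_2(theta_0, r_0)
u = (th - th0) / r0
inside = np.abs(u) < 1
f = np.where(inside, np.exp(-1.0 / np.where(inside, 1 - u * u, 1.0)), 0.0)

h = lambda t: np.clip(2.0 - 2.0 * np.abs(t), 0.0, 1.0)      # surrogate filter
P1 = np.array([Phi(m, a1, b1, th) for m in range(N)])       # basis on Xi_1
P2 = np.array([Phi(k, a2, b2, th) for k in range(N)])       # basis on Xi_2
Omega = 1.0 - np.cos(th)                                    # (1-cos)^a (1+cos)^b

A = (P1 * (w * Omega)) @ P2.T       # connection coefficients, Equation (41)
fhat = (P2 * w) @ f                 # hat{f}(Xi_2; k)
lam = lambda n, a, b: n + (a + b + 1) / 2
ell = np.hypot(lam(np.arange(N), a1, b1)[:, None],
               lam(np.arange(N), a2, b2)[None, :])

lift = ((h(ell / N) * A) @ fhat) @ P1           # sigma_N(Xi_1, Xi_2; f)
B = np.abs(th - th0) <= 7 * r0 / 8
print(np.max(np.abs(lift - Omega * f)[B]))      # small, and shrinks as N grows
```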

5 Proofs

In this section, we give a proof of Theorem 4.4 after proving some preparatory results. We assume that (\Xi_{1},\Xi_{2},d_{1,2},\mathbf{A},\mathbf{L}) is a joint data space with exponents (Q,q_{1},q_{2}).

Lemma 5.1.

Let x1𝕏1x_{1}\in{\mathbb{X}}_{1}, r>0r>0. We have

𝕏2𝔹1,2(x1,r)|Φn(Ξ1,Ξ2;x1,x2)|𝑑μ2(x2)nQq2(max(1,nr))q2S.\int_{\mathbb{X}_{2}\setminus\mathbb{B}_{1,2}(x_{1},r)}\left|\Phi_{n}(\Xi_{1},\Xi_{2};x_{1},x_{2})\right|d\mu_{2}^{*}(x_{2})\lesssim n^{Q-q_{2}}(\max(1,nr))^{q_{2}-S}. (64)

In particular,

𝕏2|Φn(Ξ1,Ξ2;x1,x2)|𝑑μ2(x2)nQq2.\int_{\mathbb{X}_{2}}\left|\Phi_{n}(\Xi_{1},\Xi_{2};x_{1},x_{2})\right|d\mu_{2}^{*}(x_{2})\lesssim n^{Q-q_{2}}. (65)

Proof 5.2.

In this proof only, define

A0=𝔹1,2(x1,r),Am=𝔹1,2(x1,r2m)𝔹1,2(x1,r2m1) for all m.A_{0}=\mathbb{B}_{1,2}(x_{1},r),\qquad A_{m}=\mathbb{B}_{1,2}(x_{1},r2^{m})\setminus\mathbb{B}_{1,2}(x_{1},r2^{m-1})\text{ for all $m\in\mathbb{N}$.} (66)

Then the joint regularity condition (48) implies μ2(Am)(r2m)q2,\mu^{*}_{2}(A_{m})\lesssim(r2^{m})^{q_{2}}, for each mm. We can also see by definition that when xAmx\in A_{m}, then d1,2(x1,x)>r2m1d_{1,2}(x_{1},x)>r2^{m-1}. Since S>q2S>q_{2}, we deduce that for rn1rn\geq 1,

𝕏2A0|Φn(Ξ1,Ξ2;x1,x2)|𝑑μ2(x2)\displaystyle\int_{\mathbb{X}_{2}\setminus A_{0}}\left|\Phi_{n}(\Xi_{1},\Xi_{2};x_{1},x_{2})\right|d\mu_{2}^{*}(x_{2})\lesssim m=1nQμ2(Am)(rn2m1)S\displaystyle\sum_{m=1}^{\infty}\frac{n^{Q}\mu_{2}^{*}(A_{m})}{(rn2^{m-1})^{S}} (67)
\displaystyle\lesssim rq2SnQSm=12m(q2S)\displaystyle r^{q_{2}-S}n^{Q-S}\sum_{m=1}^{\infty}2^{m(q_{2}-S)}
\displaystyle\lesssim nQq2(nr)q2S.\displaystyle n^{Q-q_{2}}(nr)^{q_{2}-S}.

This completes the proof of (64) when nr1nr\geq 1. The joint regularity condition and Proposition 3.4 show further that

A0|Φn(Ξ1,Ξ2;x1,x2)|𝑑μ2(x2)nQμ2(A0)nQrq2=nQq2(nr)q2.\int_{A_{0}}\left|\Phi_{n}(\Xi_{1},\Xi_{2};x_{1},x_{2})\right|d\mu_{2}^{*}(x_{2})\lesssim n^{Q}\mu_{2}^{*}(A_{0})\lesssim n^{Q}r^{q_{2}}=n^{Q-q_{2}}(nr)^{q_{2}}. (68)

We use r=1/nr=1/n in the estimates (67) and (68) and add the estimates to arrive at both (65) and the case r1/nr\leq 1/n of (64).

The next lemma gives a local bound on the operators \sigma_{n} defined in (52).

Lemma 5.3.

Let AA and BB be as defined in Remark 4.3. For a continuous f:Af:A\to{\mathbb{R}}, we have

σn(Ξ1,Ξ2;f)BnQq2{fA+f𝕏2(max(1,nr))q2S}.\|\sigma_{n}(\Xi_{1},\Xi_{2};f)\|_{B}\lesssim n^{Q-q_{2}}\left\{\|f\|_{A}+\|f\|_{{\mathbb{X}}_{2}}(\max(1,nr))^{q_{2}-S}\right\}. (69)

Proof 5.4.

Let x_{1}\in B. In view of Equation (58), we have d_{1,2}(x_{1},x_{2})\geq r for all x_{2}\in{\mathbb{X}}_{2}\setminus A. Therefore, Lemma 5.1 shows that

|σn(Ξ1,Ξ2;f)(x1)|\displaystyle|\sigma_{n}(\Xi_{1},\Xi_{2};f)(x_{1})|\leq 𝕏2|f(x2)Φn(Ξ1,Ξ2;x1,x2)|𝑑μ2(x2)\displaystyle\int_{{\mathbb{X}}_{2}}|f(x_{2})\Phi_{n}(\Xi_{1},\Xi_{2};x_{1},x_{2})|d\mu_{2}^{*}(x_{2}) (70)
=\displaystyle= A|f(x2)Φn(Ξ1,Ξ2;x1,x2)|𝑑μ2(x2)\displaystyle\int_{A}|f(x_{2})\Phi_{n}(\Xi_{1},\Xi_{2};x_{1},x_{2})|d\mu_{2}^{*}(x_{2})
+𝕏2A|f(x2)Φn(Ξ1,Ξ2;x1,x2)|𝑑μ2(x2)\displaystyle+\int_{{\mathbb{X}}_{2}\setminus A}|f(x_{2})\Phi_{n}(\Xi_{1},\Xi_{2};x_{1},x_{2})|d\mu_{2}^{*}(x_{2})
\displaystyle\lesssim nQq2fA+f𝕏2𝕏2𝔹1,2(x1,r)|Φn(Ξ1,Ξ2;x1,x2)|𝑑μ2(x2)\displaystyle n^{Q-q_{2}}\|f\|_{A}+\|f\|_{{\mathbb{X}}_{2}}\int_{{\mathbb{X}}_{2}\setminus{\mathbb{B}}_{1,2}(x_{1},r)}|\Phi_{n}(\Xi_{1},\Xi_{2};x_{1},x_{2})|d\mu_{2}^{*}(x_{2})
\displaystyle\lesssim nQq2{fA+f𝕏2(max(1,nr))q2S}.\displaystyle n^{Q-q_{2}}\left\{\|f\|_{A}+\|f\|_{{\mathbb{X}}_{2}}(\max(1,nr))^{q_{2}-S}\right\}.

Lemma 5.5.

We assume the polynomial preservation condition with parameter cc^{*}. Let fC(𝕏2)f\in C({\mathbb{X}}_{2}) satisfy (59). Then

(f)=limnσ2n(Ξ1,Ξ2;f)\mathcal{E}(f)=\lim_{n\to\infty}\sigma_{2^{n}}(\Xi_{1},\Xi_{2};f) (71)

exists on BB. Furthermore, when c2n>1/rc^{*}2^{n}>1/r, we have

(f)(σ2n(Ξ2;f))B\displaystyle\left|\left|\mathcal{E}(f)-\mathcal{E}(\sigma_{2^{n}}(\Xi_{2};f))\right|\right|_{B} (72)
\displaystyle\lesssim\sum_{m=n}^{\infty}2^{m(Q-q_{2})}\left|\left|f-\sigma_{2^{m}}(\Xi_{2};f)\right|\right|_{A}+\left|\left|f\right|\right|_{\mathbb{X}_{2}}2^{n(Q-S)}r^{q_{2}-S}.

Proof 5.6.

In this proof only, we denote P_{n}=\sigma_{n}(\Xi_{2};f). Since P_{n}\in\Pi_{n}(\Xi_{2}), the polynomial preservation condition (Definition 3.5) implies that

\mathcal{E}(P_{n})=\sigma_{c^{*}n}(\Xi_{1},\Xi_{2};P_{n})=\lim_{k\to\infty}\sigma_{k}(\Xi_{1},\Xi_{2};P_{n}) (73)

is defined on 𝕏1\mathbb{X}_{1}. Theorem 2.5 and Lemma 5.3 then imply that

(P2m+1)(P2m)B\displaystyle\left|\left|\mathcal{E}(P_{2^{m+1}})-\mathcal{E}(P_{2^{m}})\right|\right|_{B} (74)
=\displaystyle= σc2m+1(Ξ1,Ξ2;P2m+1)σc2m+1(Ξ1,Ξ2;P2m)B\displaystyle\left|\left|\sigma_{c^{*}2^{m+1}}(\Xi_{1},\Xi_{2};P_{2^{m+1}})-\sigma_{c^{*}2^{m+1}}(\Xi_{1},\Xi_{2};P_{2^{m}})\right|\right|_{B}
\displaystyle\lesssim 2m(Qq2)(P2m+1P2mA+f𝕏2(max(1,2mr))q2S).\displaystyle 2^{m(Q-q_{2})}(\left|\left|P_{2^{m+1}}-P_{2^{m}}\right|\right|_{A}+\left|\left|f\right|\right|_{\mathbb{X}_{2}}(\max(1,2^{m}r))^{q_{2}-S}).

We conclude that

(P1)B+m=0(P2m+1)(P2m)B\displaystyle\left|\left|\mathcal{E}(P_{1})\right|\right|_{B}+\sum_{m=0}^{\infty}\left|\left|\mathcal{E}(P_{2^{m+1}})-\mathcal{E}(P_{2^{m}})\right|\right|_{B} (75)
\displaystyle\lesssim\left|\left|P_{1}\right|\right|_{\mathbb{X}_{2}}+\sum_{m=0}^{\infty}2^{m(Q-q_{2})}\left|\left|P_{2^{m+1}}-P_{2^{m}}\right|\right|_{A}
+f𝕏2(c2m1/r2m(Qq2)+rq2Sc2m>1/r2m(QS))<.\displaystyle+\left|\left|f\right|\right|_{\mathbb{X}_{2}}\left(\sum_{c^{*}2^{m}\leq 1/r}2^{m(Q-q_{2})}+r^{q_{2}-S}\sum_{c^{*}2^{m}>1/r}2^{m(Q-S)}\right)<\infty.

Thus,

(f)=(P1)+m=0((P2m+1)(P2m))\mathcal{E}(f)=\mathcal{E}(P_{1})+\sum_{m=0}^{\infty}\left(\mathcal{E}(P_{2^{m+1}})-\mathcal{E}(P_{2^{m}})\right) (76)

is defined on B. In particular, when c^{*}2^{n}\geq 1/r, it follows that

(f)(P2n)Bm=n2m(Qq2)P2m+1P2mA+f𝕏22n(QS)rq2S.\left|\left|\mathcal{E}(f)-\mathcal{E}(P_{2^{n}})\right|\right|_{B}\leq\sum_{m=n}^{\infty}2^{m(Q-q_{2})}\left|\left|P_{2^{m+1}}-P_{2^{m}}\right|\right|_{A}+\left|\left|f\right|\right|_{\mathbb{X}_{2}}2^{n(Q-S)}r^{q_{2}-S}. (77)

Now we give the proof of Theorem 4.4.

Proof 5.7.

In this proof only, denote P_{n}=\sigma_{n}(\Xi_{2};f)\in\Pi_{n}(\Xi_{2}). We can deduce from Theorem 2.5 and Lemma 5.3 that for c^{*}r2^{n}\geq 1,

σc2n(Ξ1,Ξ2;f)σc2n(Ξ1,Ξ2;P2n)B\displaystyle\left|\left|\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)-\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};P_{2^{n}})\right|\right|_{B} (78)
\displaystyle\lesssim 2n(Qq2)(fP2nA+fP2n𝕏22n(q2S)rq2S)\displaystyle 2^{n(Q-q_{2})}(\left|\left|f-P_{2^{n}}\right|\right|_{A}+\left|\left|f-P_{2^{n}}\right|\right|_{\mathbb{X}_{2}}2^{n(q_{2}-S)}r^{q_{2}-S})
\displaystyle\lesssim 2n(Qq2)(fP2nA+f𝕏22n(q2S)rq2S).\displaystyle 2^{n(Q-q_{2})}(\left|\left|f-P_{2^{n}}\right|\right|_{A}+\left|\left|f\right|\right|_{\mathbb{X}_{2}}2^{n(q_{2}-S)}r^{q_{2}-S}).

The polynomial preservation condition (Definition 3.5) gives us that

\left|\left|\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};P_{2^{n}})-\mathcal{E}(P_{2^{n}})\right|\right|_{B}=0. (79)

Then, utilizing Equation (78) and Lemma 5.5, we see that

\displaystyle\left|\left|\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)-\mathcal{E}(f)\right|\right|_{B} (80)
\displaystyle\leq\left|\left|\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)-\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};P_{2^{n}})\right|\right|_{B}
\displaystyle\quad+\left|\left|\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};P_{2^{n}})-\mathcal{E}(P_{2^{n}})\right|\right|_{B}+\left|\left|\mathcal{E}(P_{2^{n}})-\mathcal{E}(f)\right|\right|_{B}
\displaystyle\lesssim 2^{n(Q-q_{2})}\left|\left|f-P_{2^{n}}\right|\right|_{A}+\sum_{m=n}^{\infty}2^{m(Q-q_{2})}\left|\left|P_{2^{m+1}}-P_{2^{m}}\right|\right|_{A}+\left|\left|f\right|\right|_{\mathbb{X}_{2}}2^{n(Q-S)}r^{q_{2}-S}.

This proves Equation (60).

In particular, when \alpha\ell_{j,k}\geq\lambda_{1,j} and \alpha>0, the only \phi_{1,j}(x_{1}) with non-zero coefficients in Equation (52) are those where \ell_{j,k}<n, which implies \lambda_{1,j}<\alpha n, and further that \sigma_{n}(\Xi_{1},\Xi_{2};f)\in\Pi_{\alpha n}(\Xi_{1}). This completes the proof of part (a).

In the proof of part (b), we may assume without loss of generality that \|f\|_{W_{\gamma}(\Xi_{2};A)}+\left|\left|f\right|\right|_{\mathbb{X}_{2}}=1. We can see from Corollary 2.9 that, for each m,

\left|\left|P_{2^{m+1}}-P_{2^{m}}\right|\right|_{A}\leq\left|\left|P_{2^{m+1}}-f\right|\right|_{A}+\left|\left|P_{2^{m}}-f\right|\right|_{A}\lesssim 2^{-m\gamma}, (81)

which implies that, whenever Q-q_{2}<\gamma, we have

\sum_{m=n}^{\infty}2^{m(Q-q_{2})}\left|\left|P_{2^{m+1}}-P_{2^{m}}\right|\right|_{A}\lesssim 2^{n(Q-q_{2}-\gamma)}. (82)

Further, the assumption that \gamma<S-q_{2} gives us

2^{n(Q-S)}\lesssim 2^{n(Q-q_{2}-\gamma)}. (83)

Since f\in W_{\gamma}(\Xi_{2};A), we have from Corollary 2.9 that

\left|\left|f-P_{2^{n}}\right|\right|_{A}\lesssim 2^{-n\gamma}. (84)

Using Equation (60) from part (a), we see

\left|\left|\mathcal{E}(f)-\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)\right|\right|_{B}\lesssim\left(1+r^{q_{2}-S}\left|\left|f\right|\right|_{\mathbb{X}_{2}}\right)2^{n(Q-q_{2}-\gamma)}. (85)

Thus, \{\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)\} is a sequence of continuous functions converging uniformly to \mathcal{E}(f) on B, so \mathcal{E}(f) itself is continuous on B. For each n, let R_{c^{*}\alpha 2^{n}}\in\Pi_{c^{*}\alpha 2^{n}} be chosen so that \left|\left|R_{c^{*}\alpha 2^{n}}-\phi\right|\right|_{\mathbb{X}_{1}}\lesssim 2^{-n\gamma}. Theorem 2.5 and the strong product assumption (Definition 2.7) allow us to write

\sigma_{c^{*}A^{*}\alpha 2^{n+1}}(\Xi_{1};R_{c^{*}\alpha 2^{n}}\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f))=R_{c^{*}\alpha 2^{n}}\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f). (86)

Using Equations (65) and (86), Theorem 2.5, and the fact that \phi is supported on B, we can deduce

\displaystyle\left|\left|\sigma_{c^{*}A^{*}\alpha 2^{n+1}}\big{(}\Xi_{1};R_{c^{*}\alpha 2^{n}}\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)-\phi\mathcal{E}(f)\big{)}\right|\right|_{\mathbb{X}_{1}} (87)
\displaystyle\lesssim\left|\left|R_{c^{*}\alpha 2^{n}}\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)-\phi\mathcal{E}(f)\right|\right|_{\mathbb{X}_{1}}
\displaystyle\lesssim\left|\left|\phi\mathcal{E}(f)-\phi\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)\right|\right|_{\mathbb{X}_{1}}+\left|\left|R_{c^{*}\alpha 2^{n}}-\phi\right|\right|_{\mathbb{X}_{1}}\left|\left|\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)\right|\right|_{\mathbb{X}_{1}}
\displaystyle\lesssim\left|\left|\mathcal{E}(f)-\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)\right|\right|_{B}+2^{n(Q-q_{2}-\gamma)}\left|\left|f\right|\right|_{\mathbb{X}_{2}}.

In view of Equations (85) and (87), we can conclude that

\displaystyle E_{c^{*}A^{*}\alpha 2^{n+1}}(\Xi_{1},\phi\mathcal{E}(f)) (88)
\displaystyle\lesssim\left|\left|\phi\mathcal{E}(f)-\sigma_{c^{*}A^{*}\alpha 2^{n+1}}(\Xi_{1},\phi\mathcal{E}(f))\right|\right|_{\mathbb{X}_{1}}
\displaystyle\leq\left|\left|\phi\mathcal{E}(f)-R_{c^{*}\alpha 2^{n}}\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)\right|\right|_{\mathbb{X}_{1}}
\displaystyle\quad+\left|\left|\sigma_{c^{*}A^{*}\alpha 2^{n+1}}\big{(}\Xi_{1};R_{c^{*}\alpha 2^{n}}\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)-\phi\mathcal{E}(f)\big{)}\right|\right|_{\mathbb{X}_{1}}
\displaystyle\lesssim\left|\left|\mathcal{E}(f)-\sigma_{c^{*}2^{n}}(\Xi_{1},\Xi_{2};f)\right|\right|_{B}+\left|\left|f\right|\right|_{\mathbb{X}_{2}}2^{n(Q-q_{2}-\gamma)}
\displaystyle\lesssim\left(1+\left|\left|f\right|\right|_{\mathbb{X}_{2}}(1+r^{q_{2}-S})\right)2^{n(Q-q_{2}-\gamma)}.

Thus, \phi\mathcal{E}(f)\in W_{\gamma-Q+q_{2}}(\Xi_{1}), completing the proof of part (b).

Appendix

5.1 Tauberian theorem

For the convenience of the reader, we reproduce the Tauberian theorem from [tauberian, Theorem 4.3].

We recall that if \mu is an extended complex valued Borel measure on \mathbb{R}, then its total variation measure is defined for a Borel set B by

|\mu|(B)=\sup\sum|\mu(B_{k})|,

where the sum is over a partition \{B_{k}\} of B comprising Borel sets, and the supremum is over all such partitions.
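For instance, for the combination of point masses \mu=2\delta_{1/2}-3\delta_{3/2}, one has |\mu|=2\delta_{1/2}+3\delta_{3/2}, so that |\mu|([0,u)) equals 0 for u\leq 1/2, equals 2 for 1/2<u\leq 3/2, and equals 5 for u>3/2.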

A measure \mu on \mathbb{R} is called an even measure if \mu((-u,u))=2\mu([0,u)) for all u>0, and \mu(\{0\})=0. If \mu is an extended complex valued measure on [0,\infty), and \mu(\{0\})=0, we define a measure \mu_{e} on \mathbb{R} by

\mu_{e}(B)=\mu\left(\{|x|:x\in B\}\right),

and observe that \mu_{e} is an even measure such that \mu_{e}(B)=\mu(B) for B\subset[0,\infty). In the sequel, we will assume that all measures on [0,\infty) which do not associate a nonzero mass with the point 0 are extended in this way, and will abuse the notation \mu also to denote the measure \mu_{e}. Moreover, the phrase “measure on \mathbb{R}” will refer to an extended complex valued Borel measure having bounded total variation on compact intervals in \mathbb{R}, and similarly for measures on [0,\infty).

Our main Tauberian theorem is the following.

Theorem 5.8.

Let \mu be an extended complex valued measure on [0,\infty) with \mu(\{0\})=0. We assume that there exist Q,r>0 such that each of the following conditions is satisfied.

  1. 
    |\!|\!|\mu|\!|\!|_{Q}:=\sup_{u\in[0,\infty)}\frac{|\mu|([0,u))}{(u+2)^{Q}}<\infty, (89)
  2. 
    There are constants c_{1},C>0 such that

    \left|\int_{\mathbb{R}}\exp(-u^{2}t)\,d\mu(u)\right|\leq c_{1}t^{-C}\exp(-r^{2}/t)\,|\!|\!|\mu|\!|\!|_{Q},\qquad 0<t\leq 1. (90)

Let H:[0,\infty)\to\mathbb{R}, let S>Q+1 be an integer, and suppose that there exists a measure H^{[S]} such that

H(u)=\int_{0}^{\infty}(v^{2}-u^{2})_{+}^{S}\,dH^{[S]}(v),\qquad u\in\mathbb{R}, (91)

and

V_{Q,S}(H)=\max\left(\int_{0}^{\infty}(v+2)^{Q}v^{2S}\,d|H^{[S]}|(v),\int_{0}^{\infty}(v+2)^{Q}v^{S}\,d|H^{[S]}|(v)\right)<\infty. (92)

Then, for n\geq 1,

\left|\int_{0}^{\infty}H(u/n)\,d\mu(u)\right|\leq c\,\frac{n^{Q}}{\max(1,(nr)^{S})}\,V_{Q,S}(H)\,|\!|\!|\mu|\!|\!|_{Q}. (93)
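For example, the choice H^{[S]}=\delta_{1} in (91) gives the kernel H(u)=(1-u^{2})_{+}^{S}, for which V_{Q,S}(H)=3^{Q}<\infty, so that (93) applies in particular to the summability kernels u\mapsto(1-(u/n)^{2})_{+}^{S}.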

Proposition 2.4 is proved using this theorem with

\mu(u)=\mu_{x,y}(u)=\sum_{k:\lambda_{k}<u}\phi_{k}(x)\phi_{k}(y).

Proposition 3.4 is proved using this theorem with

\mu(u)=\mu_{x_{1},x_{2}}(u)=\sum_{j,k:\ell_{j,k}<u}A_{j,k}\phi_{1,j}(x_{1})\phi_{2,k}(x_{2}).
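To make the hypotheses concrete in the simplest setting, the following sketch takes the circle as the data space, with \phi_{k}(x)=e^{ikx} and \lambda_{k}=|k| (an illustrative assumption of ours, not one of the spaces treated above). Then \int\exp(-u^{2}t)\,d\mu_{x,y}(u)=\sum_{k}e^{-k^{2}t}e^{ik(x-y)} is the heat kernel, and Poisson summation yields the off-diagonal Gaussian decay required by condition (90), with generic constants in the exponent.

```python
import numpy as np

# Sketch (assumption: the circle as data space, phi_k(x) = e^{ikx}, lambda_k = |k|).
# Then int exp(-u^2 t) dmu_{x,y}(u) = sum_k exp(-k^2 t) e^{ik(x-y)} is the heat
# kernel; Poisson summation gives ~ sqrt(pi/t) exp(-|x-y|^2/(4t)) off the
# diagonal, i.e. the Gaussian decay of condition (90) up to generic constants.
def heat_integral(x, y, t, K=2000):
    k = np.arange(-K, K + 1)
    return np.sum(np.exp(-k**2 * t) * np.exp(1j * k * (x - y))).real

x, y = 0.0, 1.0                       # |x - y| = 1 plays the role of r
for t in [0.2, 0.1, 0.05, 0.02]:
    lhs = heat_integral(x, y, t)
    rhs = np.sqrt(np.pi / t) * np.exp(-(x - y)**2 / (4 * t))
    print(f"t={t:5.2f}  kernel={lhs:.6e}  Gaussian profile={rhs:.6e}")
```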

5.2 Jacobi polynomials

For \alpha,\beta>-1, x\in(-1,1), and integer \ell\geq 0, the Jacobi polynomials p_{\ell}^{(\alpha,\beta)} are defined by Rodrigues' formula [szego, Formulas (4.3.1), (4.3.4)]

\displaystyle(1-x)^{\alpha}(1+x)^{\beta}p_{\ell}^{(\alpha,\beta)}(x) (94)
\displaystyle=\left\{\frac{2\ell+\alpha+\beta+1}{2^{\alpha+\beta+1}}\frac{\ell!(\ell+\alpha+\beta)!}{(\ell+\alpha)!(\ell+\beta)!}\right\}^{1/2}\frac{(-1)^{\ell}}{2^{\ell}\ell!}\frac{d^{\ell}}{dx^{\ell}}\left((1-x)^{\ell+\alpha}(1+x)^{\ell+\beta}\right),

where z! denotes \Gamma(z+1). The Jacobi polynomials satisfy the following well-known differential equation:

{p^{\prime\prime}_{n}}^{(\alpha,\beta)}(x)(1-x^{2})+(\beta-\alpha-(\alpha+\beta+2)x){p^{\prime}_{n}}^{(\alpha,\beta)}(x)=-n(n+\alpha+\beta+1)p_{n}^{(\alpha,\beta)}(x). (95)

Each p_{\ell}^{(\alpha,\beta)} is a polynomial of degree \ell with positive leading coefficient, satisfying the orthogonality relation

\int_{-1}^{1}p_{\ell}^{(\alpha,\beta)}(x)p_{j}^{(\alpha,\beta)}(x)(1-x)^{\alpha}(1+x)^{\beta}\,dx=\delta_{\ell,j}, (96)

and

p_{\ell}^{(\alpha,\beta)}(1)=\left\{\frac{2\ell+\alpha+\beta+1}{2^{\alpha+\beta+1}}\frac{\ell!(\ell+\alpha+\beta)!}{(\ell+\alpha)!(\ell+\beta)!}\right\}^{1/2}\frac{(\ell+\alpha)!}{\alpha!\ell!}\sim\ell^{\alpha+1/2}. (97)

It follows that p_{\ell}^{(\alpha,\beta)}(-x)=(-1)^{\ell}p_{\ell}^{(\beta,\alpha)}(x). In particular, p_{2\ell}^{(\alpha,\alpha)} is an even polynomial, and p_{2\ell+1}^{(\alpha,\alpha)} is an odd polynomial. We note (cf. [szego, Theorem 4.1]) that

\displaystyle p_{2\ell}^{(\alpha,\alpha)}(x)=2^{\alpha/2+1/4}p_{\ell}^{(\alpha,-1/2)}(2x^{2}-1)=2^{\alpha/2+1/4}(-1)^{\ell}p_{\ell}^{(-1/2,\alpha)}(1-2x^{2}) (98)
\displaystyle p_{2\ell+1}^{(\alpha,\alpha)}(x)=2^{\alpha/2+1/2}xp_{\ell}^{(\alpha,1/2)}(2x^{2}-1)=2^{\alpha/2+1/2}(-1)^{\ell}xp_{\ell}^{(1/2,\alpha)}(1-2x^{2}).
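Since several normalizations of Jacobi polynomials are in use, a short numerical check can guard against mismatches. The sketch below is our own illustration: it rescales scipy's eval_jacobi, which computes Szegő's P_{\ell}^{(\alpha,\beta)}, by its L^{2} norm (cf. szego) to obtain the orthonormal p_{\ell}^{(\alpha,\beta)} of (94), and verifies the orthonormality (96) and the value (97) by Gauss–Jacobi quadrature.

```python
import numpy as np
from scipy.special import eval_jacobi, roots_jacobi, gammaln

def p_orthonormal(l, a, b, x):
    # log of the squared L2 norm of Szego's P_l^{(a,b)} w.r.t. (1-x)^a (1+x)^b dx
    logh = ((a + b + 1) * np.log(2.0) - np.log(2 * l + a + b + 1)
            + gammaln(l + a + 1) + gammaln(l + b + 1)
            - gammaln(l + a + b + 1) - gammaln(l + 1))
    return eval_jacobi(l, a, b, x) * np.exp(-0.5 * logh)

a, b = 1.5, 0.5
x, w = roots_jacobi(40, a, b)   # nodes/weights for the weight (1-x)^a (1+x)^b

# orthonormality (96): the Gram matrix should be the identity
P = np.array([p_orthonormal(l, a, b, x) for l in range(8)])
print(np.max(np.abs(P @ np.diag(w) @ P.T - np.eye(8))))   # ~1e-14

# value at x = 1, equation (97)
l = 5
rhs = np.exp(0.5 * (np.log(2 * l + a + b + 1) - (a + b + 1) * np.log(2.0)
                    + gammaln(l + 1) + gammaln(l + a + b + 1)
                    - gammaln(l + a + 1) - gammaln(l + b + 1))
             + gammaln(l + a + 1) - gammaln(a + 1) - gammaln(l + 1))
print(p_orthonormal(l, a, b, 1.0), rhs)                   # the two values agree
```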

It is known nowak2011sharp that for \alpha,\beta\geq-1/2 and \theta,\phi\in[0,\pi],

\displaystyle\sum_{j=0}^{\infty}\exp(-j(j+\alpha+\beta+1)t)p_{j}^{(\alpha,\beta)}(\cos\theta)p_{j}^{(\alpha,\beta)}(\cos\phi) (99)
\displaystyle\lesssim(t+\theta\phi)^{-\alpha-1/2}(t+(\pi-\theta)(\pi-\phi))^{-\beta-1/2}t^{-1/2}\exp\left(-c\frac{(\theta-\phi)^{2}}{t}\right).

We note that when \beta=-1/2, this yields

\displaystyle\sum_{j=0}^{\infty}\exp(-j(j+\alpha+1/2)t)p_{j}^{(\alpha,-1/2)}(\cos\theta)p_{j}^{(\alpha,-1/2)}(\cos\phi) (100)
\displaystyle\lesssim t^{-\alpha-1}\exp\left(-c\frac{(\theta-\phi)^{2}}{t}\right).

If \ell\geq 0 is an integer, and \{Y_{\ell,k}\} is an orthonormal basis for the space \mathbb{H}_{\ell}^{q} of restrictions to the sphere \mathbb{S}^{q} (with respect to the probability volume measure) of (q+1)-variate homogeneous harmonic polynomials of total degree \ell, then one has the well-known addition formula mullerbk and [batemanvol2, Chapter XI, Theorem 4] connecting the Y_{\ell,k} with the Jacobi polynomials defined in (94):

\sum_{k=1}^{d\,^{q}_{\ell}}Y_{\ell,k}({\bf x})\overline{Y_{\ell,k}({\bf y})}=\frac{\omega_{q}}{\omega_{q-1}}p_{\ell}^{(q/2-1,q/2-1)}(1)p_{\ell}^{(q/2-1,q/2-1)}({\bf x}\cdot{\bf y}),\qquad\ell=0,1,\cdots, (101)

where \omega_{q}=\operatorname{Vol}(\mathbb{S}^{q}) and d\,^{q}_{\ell}=\operatorname{dim}(\mathbb{H}_{\ell}^{q}).
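For q=2 the identity (101) reduces, after accounting for normalizations, to the classical addition theorem \sum_{m=-\ell}^{\ell}Y_{\ell,m}(\mathbf{x})\overline{Y_{\ell,m}(\mathbf{y})}=(2\ell+1)P_{\ell}(\mathbf{x}\cdot\mathbf{y}), with P_{\ell} the Legendre polynomial. The sketch below checks this numerically; it assumes scipy's (legacy) sph_harm convention, which is orthonormal with respect to the surface measure, so each Y is rescaled by \sqrt{\omega_{2}}=\sqrt{4\pi} to match the probability normalization used here.

```python
import numpy as np
from scipy.special import sph_harm, eval_legendre

# Sketch for q = 2: scipy's sph_harm(m, l, theta, phi) (theta = azimuth,
# phi = polar angle) is orthonormal w.r.t. the surface measure; rescaling each
# Y by sqrt(4*pi) matches the probability-measure normalization, under which
# the right-hand side of (101) becomes (2l+1) P_l(x . y).
rng = np.random.default_rng(0)

def random_unit_vector():
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)

def to_angles(v):
    return np.arctan2(v[1], v[0]) % (2 * np.pi), np.arccos(v[2])

x, y = random_unit_vector(), random_unit_vector()
tx, px = to_angles(x)
ty, py = to_angles(y)
l = 4
lhs = 4 * np.pi * sum(sph_harm(m, l, tx, px) * np.conj(sph_harm(m, l, ty, py))
                      for m in range(-l, l + 1))
rhs = (2 * l + 1) * eval_legendre(l, float(np.dot(x, y)))
print(lhs.real, rhs)   # the two sides agree
```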

References

  • [1] A. Arenas, Ó. Ciaurri, and E. Labarga. A weighted transplantation theorem for Jacobi coefficients. Journal of Approximation Theory, 248:105297, 2019.
  • [2] H. Bateman, A. Erdélyi, W. Magnus, F. Oberhettinger, and F. G. Tricomi. Higher transcendental functions, volume 2. McGraw-Hill New York, 1955.
  • [3] M. Belkin, I. Matveeva, and P. Niyogi. Regularization and semi-supervised learning on large graphs. In Learning theory, pages 624–638. Springer, 2004.
  • [4] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373–1396, 2003.
  • [5] M. Belkin and P. Niyogi. Semi-supervised learning on Riemannian manifolds. Machine learning, 56(1-3):209–239, 2004.
  • [6] A. L. Bertozzi and A. Flenner. Diffuse interface models on graphs for classification of high dimensional data. Multiscale Modeling & Simulation, 10(3):1090–1118, 2012.
  • [7] M. Cheney and B. Borden. Fundamentals of radar imaging. SIAM, 2009.
  • [8] C. K. Chui and D. L. Donoho. Special issue: Diffusion maps and wavelets. Appl. and Comput. Harm. Anal., 21(1), 2006.
  • [9] C. K. Chui and H. N. Mhaskar. Deep nets for local manifold learning. Frontiers in Applied Mathematics and Statistics, 4:12, 2018.
  • [10] C. K. Chui and J. Wang. Dimensionality reduction of hyperspectral imagery data for feature classification. In Handbook of Geomathematics, pages 1005–1047. Springer, 2010.
  • [11] C. K. Chui and J. Wang. Randomized anisotropic transform for nonlinear dimensionality reduction. GEM-International Journal on Geomathematics, 1(1):23–50, 2010.
  • [12] C. K. Chui and J. Wang. Nonlinear methods for dimensionality reduction. In Handbook of Geomathematics, pages 1–46. Springer, 2015.
  • [13] R. R. Coifman and M. J. Hirn. Diffusion maps for changing data. Applied and Computational Harmonic Analysis, 36(1):79–107, 2014.
  • [14] R. R. Coifman and S. Lafon. Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30, 2006.
  • [15] R. R. Coifman and M. Maggioni. Diffusion wavelets. Applied and Computational Harmonic Analysis, 21(1):53–94, 2006.
  • [16] D. L. Donoho and C. Grimes. Hessian eigenmaps: new locally linear embedding techniques for high-dimensional data. TR2003-08, Dept. of Statistics, Stanford University, 2003.
  • [17] E. B. Davies. Heat kernels and spectral theory, volume 92. Cambridge University Press, 1990.
  • [18] A. Díaz-González, F. Marcellán, H. Pijeira-Cabrera, and W. Urbina. Discrete–continuous Jacobi–Sobolev spaces and Fourier series. Bulletin of the Malaysian Mathematical Sciences Society, 44:571–598, 2021.
  • [19] M. P. do Carmo Valero. Riemannian geometry. Birkhäuser, 1992.
  • [20] D. L. Donoho and C. Grimes. Image manifolds which are isometric to Euclidean space. Journal of Mathematical Imaging and Vision, 23(1):5–24, 2005.
  • [21] D. L. Donoho, O. Levi, J.-L. Starck, and V. Martinez. Multiscale geometric analysis for 3D catalogs. In Astronomical Telescopes and Instrumentation, pages 101–111. International Society for Optics and Photonics, 2002.
  • [22] M. Ehler, F. Filbir, and H. N. Mhaskar. Locally learning biomedical data using diffusion frames. Journal of Computational Biology, 19(11):1251–1264, 2012.
  • [23] F. Filbir and H. N. Mhaskar. A quadrature formula for diffusion polynomials corresponding to a generalized heat kernel. Journal of Fourier Analysis and Applications, 16(5):629–657, 2010.
  • [24] F. Filbir and H. N. Mhaskar. Marcinkiewicz–Zygmund measures on manifolds. Journal of Complexity, 27(6):568–596, 2011.
  • [25] J. Friedman and J.-P. Tillich. Wave equations for graphs and the edge-based Laplacian. Pacific Journal of Mathematics, 216(2):229–266, 2004.
  • [26] D. Geller and I. Z. Pesenson. Band-limited localized Parseval frames and Besov spaces on compact homogeneous manifolds. Journal of Geometric Analysis, 21(2):334–371, 2011.
  • [27] A. Grigor’yan. Upper bounds of derivatives of the heat kernel on an arbitrary complete manifold. Journal of Functional Analysis, 127(2):363–389, 1995.
  • [28] A. Grigor’yan. Gaussian upper bounds for the heat kernel on arbitrary manifolds. J. Diff. Geom., 45:33–52, 1997.
  • [29] X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang. Face recognition using Laplacianfaces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 27(3):328–340, 2005.
  • [30] P. W. Jones, M. Maggioni, and R. Schul. Manifold parametrizations by eigenfunctions of the Laplacian and heat kernels. Proceedings of the National Academy of Sciences, 105(6):1803–1808, 2008.
  • [31] P. W. Jones, M. Maggioni, and R. Schul. Universal local parametrizations via heat kernels and eigenfunctions of the Laplacian. Ann. Acad. Sci. Fenn. Math., 35:131–174, 2010.
  • [32] W. H. Kim, V. Singh, M. K. Chung, C. Hinrichs, D. Pachauri, O. C. Okonkwo, S. C. Johnson, A. D. N. Initiative, et al. Multi-resolutional shape features via non-euclidean wavelets: Applications to statistical analysis of cortical thickness. NeuroImage, 93:107–123, 2014.
  • [33] Y. A. Kordyukov. L^{p}-theory of elliptic differential operators on manifolds of bounded geometry. Acta Applicandae Mathematica, 23(3):223–260, 1991.
  • [34] J. P. Lerch, J. C. Pruessner, A. Zijdenbos, H. Hampel, S. J. Teipel, and A. C. Evans. Focal decline of cortical thickness in Alzheimer's disease identified by computational neuroanatomy. Cerebral Cortex, 15(7):995–1001, 2005.
  • [35] Z. Li, U. Park, and A. K. Jain. A discriminative model for age invariant face recognition. Information Forensics and Security, IEEE Transactions on, 6(3):1028–1037, 2011.
  • [36] M. Maggioni and H. N. Mhaskar. Diffusion polynomial frames on metric measure spaces. Applied and Computational Harmonic Analysis, 24(3):329–353, 2008.
  • [37] S. Maskey, R. Levie, and G. Kutyniok. Transferability of graph neural networks: an extended graphon approach. Applied and Computational Harmonic Analysis, 63:48–83, 2023.
  • [38] A. Maurer, M. Pontil, and B. Romera-Paredes. Sparse coding for multitask and transfer learning. In International conference on machine learning, pages 343–351. PMLR, 2013.
  • [39] H. Mhaskar. A unified framework for harmonic analysis of functions on directed graphs and changing data. Applied and Computational Harmonic Analysis, 44(3):611–644, 2018.
  • [40] H. N. Mhaskar. Eignets for function approximation on manifolds. Applied and Computational Harmonic Analysis, 29(1):63–87, 2010.
  • [41] H. N. Mhaskar. A generalized diffusion frame for parsimonious representation of functions on data defined manifolds. Neural Networks, 24(4):345–359, 2011.
  • [42] H. N. Mhaskar. Kernel-based analysis of massive data. Frontiers in Applied Mathematics and Statistics, 6:30, 2020.
  • [43] H. N. Mhaskar, S. V. Pereverzyev, and M. D. van der Walt. A deep learning approach to diabetic blood glucose prediction. Frontiers in Applied Mathematics and Statistics, 3:14, 2017.
  • [44] B. Muckenhoupt. Transplantation theorems and multiplier theorems for Jacobi series. American Mathematical Soc., 1986.
  • [45] C. Müller. Spherical harmonics, volume 17. Springer, 2006.
  • [46] D. C. Munson, J. D. O’Brien, and W. K. Jenkins. A tomographic formulation of spotlight-mode synthetic aperture radar. Proceedings of the IEEE, 71(8):917–925, 1983.
  • [47] F. Natterer. The mathematics of computerized tomography. SIAM, 2001.
  • [48] V. Naumova, L. Nita, J. U. Poulsen, and S. V. Pereverzyev. Meta-learning based blood glucose predictor for diabetic smartphone app, 2014.
  • [49] C. J. Nolan and M. Cheney. Synthetic aperture inversion. Inverse Problems, 18(1):221, 2002.
  • [50] A. Nowak and P. Sjögren. Sharp estimates of the Jacobi heat kernel. arXiv preprint arXiv:1111.3145, 2011.
  • [51] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
  • [52] J. Schmidt-Hieber. Deep ReLU network approximation of functions on a manifold. arXiv preprint arXiv:1908.00695, 2019.
  • [53] U. Shaham, A. Cloninger, and R. R. Coifman. Provable approximation properties for deep neural networks. Applied and Computational Harmonic Analysis, 44(3):537–557, 2018.
  • [54] A. Sikora. Riesz transform, Gaussian bounds and the method of wave equation. Mathematische Zeitschrift, 247(3):643–662, 2004.
  • [55] G. Szegő. Orthogonal Polynomials, volume 23 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, 1975.
  • [56] J. B. Tenenbaum, V. De Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
  • [57] Y. van Gennip, B. Hunter, R. Ahn, P. Elliott, K. Luh, M. Halvorson, S. Reid, M. Valasik, J. Wo, G. E. Tita, et al. Community detection using spectral clustering on sparse geosocial data. SIAM Journal on Applied Mathematics, 73(1):67–83, 2013.
  • [58] K. Q. Weinberger, B. D. Packer, and L. K. Saul. Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization. In Proceedings of the tenth international workshop on artificial intelligence and statistics, pages 381–388. Citeseer, 2005.
  • [59] Z.-Y. Zhang and H.-Y. Zha. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. Journal of Shanghai University (English Edition), 8(4):406–424, 2004.