
Spectral Barron space and deep neural network approximation

Yulei Liao and Pingbing Ming LSEC, Institute of Computational Mathematics and Scientific/Engineering Computing, AMSS, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China [email protected], [email protected]
Abstract.

We prove the sharp embedding between the spectral Barron space and the Besov space. Given the spectral Barron space as the target function space, we prove a dimension-free result: if the neural network contains $L$ hidden layers with $N$ units per layer, then the upper and lower bounds of the $L^{2}$-approximation error are $\mathcal{O}(N^{-sL})$ with $0<sL\leq 1/2$, where $s$ is the smoothness index of the spectral Barron space.

Key words and phrases:
Spectral Barron space; Deep neural network; Approximation theory
2020 Mathematics Subject Classification:
32C22, 32K05, 33C20, 41A25, 41A46, 42A38, 68T07
The authors thank Professor Renjin Jiang at Capital Normal University for the helpful discussion. The work of Liao and Ming was supported by the National Natural Science Foundation of China through Grants No. 11971467 and No. 12371438.

1. Introduction

A series of works have been devoted to studying the neural network approximation error and generalization error with the Barron class [Barron:1992, Barron:1993, Barron:1994, Barron:2018] as the target space. For $f$ a complex-valued function and $s\geq 0$, the spectral norm $\upsilon_{f,s}$ is defined as

\[\upsilon_{f,s}{:}=\int_{\mathbb{R}^{d}}\lvert\xi\rvert^{s}\lvert\widehat{f}(\xi)\rvert\,\mathrm{d}\xi,\]

where $\widehat{f}$ is the Fourier transform of $f$ in the distribution sense. A function $f$ is said to belong to the Barron class if its spectral norm $\upsilon_{f,s}$ is finite and the Fourier inversion holds pointwise. However, it is important to note that this definition lacks rigor, as it does not specify the conditions under which the pointwise Fourier inversion is valid. Addressing this issue is a nontrivial matter, as discussed in [Pinsky:1997]. Subsequently, the authors in [Ma:2017, Xu:2020, Siegel:2022, Siegel:2023] assume $f\in L^{1}(\mathbb{R}^{d})$, and define

\[\widehat{\mathscr{B}}^{s}(\mathbb{R}^{d}){:}=\left\{\,f\in L^{1}(\mathbb{R}^{d})\,\mid\,\upsilon_{f,0}+\upsilon_{f,s}<\infty\,\right\}.\tag{1.1}\]

For functions in $\widehat{\mathscr{B}}^{s}(\mathbb{R}^{d})$, the Fourier transform and the pointwise Fourier inversion are valid. Unfortunately, we shall prove in Lemma 2.3 that $\widehat{\mathscr{B}}^{s}(\mathbb{R}^{d})$ equipped with the norm $\upsilon_{f,0}+\upsilon_{f,s}$ is not complete. Therefore, it does not qualify as a Banach space.

To address this issue, an alternative class of function spaces has been proposed, which can be traced back to the work of Hörmander [Hormander:1963]. It is defined as

\[\mathscr{F}L^{s}_{p}(\mathbb{R}^{d}){:}=\left\{\,f\in\mathscr{S}^{\prime}(\mathbb{R}^{d})\,\mid\,(1+\lvert\xi\rvert^{s})\widehat{f}(\xi)\in L^{p}(\mathbb{R}^{d})\,\right\}\]

for $1\leq p\leq\infty$ and $s\geq 0$. This space has been studied extensively and is referred to by different names. Some call it the Hörmander space, as in [Hormander:1963, Messina:2001, DGSM60:2014, Ivec:2021]; others refer to it as the Fourier Lebesgue space, as in [Grochenig:2002, Pilipovic:2010, BenyiOh:2013, Kato:2020]. We are interested in $p=1$, and call it the spectral Barron space:

\[\mathscr{B}^{s}(\mathbb{R}^{d}){:}=\left\{\,f\in\mathscr{S}^{\prime}(\mathbb{R}^{d})\,\mid\,\upsilon_{f,0}+\upsilon_{f,s}<\infty\,\right\},\]

which is equipped with the norm

\[\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}{:}=\upsilon_{f,0}+\upsilon_{f,s}=\int_{\mathbb{R}^{d}}(1+\lvert\xi\rvert^{s})\lvert\widehat{f}(\xi)\rvert\,\mathrm{d}\xi.\]

We show in Lemma 2.1 that the pointwise Fourier inversion is valid for functions in $\mathscr{B}^{s}(\mathbb{R}^{d})$ with a nonnegative $s$. Some authors also refer to $\mathscr{B}^{s}(\mathbb{R}^{d})$ as the Fourier algebra or Wiener algebra, whose algebraic properties, such as the Wiener-Lévy theorem [Wiener:1932, Levy:1935, Helson:1959], have been extensively studied in [ReitherStegeman:2000, Liflyand:2012].
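To make the norm $\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}$ concrete, the following is a minimal numerical sketch, not part of the original argument. It assumes $d=1$, the Gaussian $f(x)=e^{-\pi x^{2}}$, and the Fourier convention with kernel $e^{-2\pi i x\xi}$, under which $\widehat{f}(\xi)=e^{-\pi\xi^{2}}$ and $\upsilon_{f,s}=\pi^{-(s+1)/2}\Gamma((s+1)/2)$. All function names are hypothetical.

```python
import math


def gaussian_hat(xi):
    """Fourier transform of f(x) = exp(-pi x^2), assuming the e^{-2 pi i x xi} convention."""
    return math.exp(-math.pi * xi * xi)


def spectral_norm_1d(f_hat, s, cutoff=8.0, n=100_000):
    """Midpoint-rule approximation of the integral of |xi|^s |f_hat(xi)| over [-cutoff, cutoff].

    Illustrative only: it assumes f_hat is negligible outside the cutoff.
    """
    h = 2.0 * cutoff / n
    total = 0.0
    for i in range(n):
        xi = -cutoff + (i + 0.5) * h
        total += abs(xi) ** s * abs(f_hat(xi))
    return total * h


def gaussian_spectral_norm_exact(s):
    """Closed form pi^{-(s+1)/2} * Gamma((s+1)/2) for the Gaussian above."""
    return math.pi ** (-(s + 1) / 2) * math.gamma((s + 1) / 2)


def barron_norm_1d(f_hat, s):
    """Spectral Barron norm upsilon_{f,0} + upsilon_{f,s} in one dimension."""
    return spectral_norm_1d(f_hat, 0.0) + spectral_norm_1d(f_hat, s)
```

For this particular $f$ the quadrature can be compared against the closed form, which is a convenient sanity check before trying functions whose Fourier transform is only known numerically.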

Another popular space for analyzing shallow neural networks is the Barron space [E:2019, EMW:2022], which can be viewed as shallow neural networks with infinite width. The authors in [Wojtowytsch:2022, E:2022] claimed that the spectral Barron space is much smaller than the Barron space. As observed in [Caragea:2023], this statement is not accurate because it does not distinguish the smoothness index $s$ in $\mathscr{B}^{s}(\mathbb{R}^{d})$. In addition, the variation space, introduced in [Barron:2008], has been studied in relation to the spectral Barron space $\mathscr{B}^{s}(\mathbb{R}^{d})$ and the Barron space in [SiegelXu:2022, Siegel:2023]. These spaces have been exploited to study the regularity of partial differential equations [ChenLuLu:2021, Lu:2021, E:2022, Chen:2023]. Recently, a new space [ParhiNowak:2022] originating from variational spline theory, which is closely related to the variation space, has also been exploited as the target function space for neural network approximation [ParhiNowak:2023].

The first objective of the present work is to study the analytical properties of $\mathscr{B}^{s}(\mathbb{R}^{d})$. In Lemma 2.2, we show that $\mathscr{B}^{s}(\mathbb{R}^{d})$ is complete, while Lemma 2.3 shows that $\widehat{\mathscr{B}}^{s}(\mathbb{R}^{d})$ is not complete. This distinction highlights a key difference between these two spaces. Furthermore, Lemma 2.5 provides an example showing that functions in $\mathscr{B}^{s}(\mathbb{R}^{d})$ may decay arbitrarily slowly. This example, constructed using the generalized hypergeometric function, reveals relationships between the Fourier transform and the decay rate of the functions. Additionally, we study the relations between $\mathscr{B}^{s}(\mathbb{R}^{d})$ and some classical function spaces. In Theorem 2.9, we establish the connections between $\mathscr{B}^{s}(\mathbb{R}^{d})$ and the Besov space. Furthermore, in Corollary 2.12, we establish the connections between $\mathscr{B}^{s}(\mathbb{R}^{d})$ and the Sobolev spaces. Notably, we prove the embedding relation

\[B^{s+d/2}_{2,1}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d})\hookrightarrow B^{s}_{\infty,1}(\mathbb{R}^{d}),\]

which is an optimal result that appears to be missing in the existing literature. This embedding may serve as a bridge to study how the Barron space, the variation space and the space in [ParhiNowak:2022] are related to classical function spaces such as the Besov space; cf. [ParhiNowak:2022, § 5].

The second objective of the present work is to explore the neural network approximation on a bounded domain. Building upon Barron's seminal works on approximating functions in $\mathscr{B}^{1}(\mathbb{R}^{d})$ in the $L^{2}$-norm, recent studies extended the approximation to functions in $\mathscr{B}^{k+1}(\mathbb{R}^{d})$ in the $H^{k}$-norm, as demonstrated in [Siegel:2020, Xu:2020]. Furthermore, improved approximation rates have been achieved for functions in $\mathscr{B}^{s}(\mathbb{R}^{d})$ with large $s$ in works such as [Bresler:2020, MaSiegelXu:2022, Siegel:2022]. These advancements contribute to a deeper understanding of the approximation capabilities of neural networks.

The distinction between deep ReLU networks and shallow networks has been highlighted in the separation theorems presented in [Eldan:2016, Telgarsky:2016, Shamir:2022]. These theorems provide examples that can be well approximated by three-layer ReLU neural networks but not by two-layer ReLU neural networks with a width that grows polynomially with the dimension. This sheds light on the differences in expressive power between shallow and deep networks. Moreover, the approximation rates for neural networks targeting mixed derivative Besov/Sobolev spaces, spectral Barron spaces, and Hölder spaces have also been investigated; see [Du:2019, BreslerNagaraj:2020, Bolcskei:2021, LuShenYangZhang:2021, Suzuki:2021].

We focus on the $L^{2}$-approximation properties for functions in $\mathscr{B}^{s}(\mathbb{R}^{d})$ when $s$ is small. In Theorem 3.9, we establish that a neural network with $L$ hidden layers and $N$ units in each layer can approximate functions in $\mathscr{B}^{s}(\mathbb{R}^{d})$ with a convergence rate of $\mathcal{O}(N^{-sL})$ when $0<sL\leq 1/2$. This bound is sharp, as demonstrated in Theorem 3.11. For deep neural networks, a similar result has been presented in [BreslerNagaraj:2020] with a convergence rate of $\mathcal{O}(N^{-sL/2})$. For shallow neural networks, i.e., $L=1$, convergence rates of $\mathcal{O}(N^{-1/2})$ have been established in [MengMing:2022, Siegel:2022] when $s=1/2$. However, the constants in their estimates depend on the dimension at least polynomially, or even exponentially, and require other bounded norms besides $\upsilon_{f,s}$. Our results achieve the optimal convergence rates without the additional dependency on the dimension or on other bounded norms.

The remaining part of the paper is structured as follows. In Section 2, we demonstrate that the spectral Barron space is a Banach space and examine its relationship with other function spaces. This analysis provides a foundation for understanding the properties of the spectral Barron space. In Section 3, we delve into the error estimation for approximating functions in the spectral Barron space using deep neural networks with finite depth and infinite width. By investigating the convergence properties of these networks, we gain insights into their approximation capabilities and provide error bounds for their performance. Finally, in Section 4, we conclude our work by summarizing the key findings and contributions of this study. We also discuss potential avenues for future research and highlight the significance of our results in the broader context of function approximation using neural networks. Certain technical results are postponed to the Appendix.

2. Completeness of $\mathscr{B}^{s}$ and its relation to other function spaces

This part discusses the completeness of the spectral Barron space and its embedding relations to other classical function spaces. First, we fix some notation. Let $\mathscr{S}$ be the Schwartz space and let $\mathscr{S}^{\prime}$ be its topological dual space, i.e., the space of tempered distributions. The Gamma function is defined as

\[\Gamma(s){:}=\int_{0}^{\infty}t^{s-1}e^{-t}\,\mathrm{d}t,\qquad s>0.\]

We denote the surface area of the unit sphere $\mathbb{S}^{d-1}$ by $\omega_{d-1}=2\pi^{d/2}/\Gamma(d/2)$. The volume of the unit ball is $\nu_{d}=\omega_{d-1}/d$. The Beta function is defined as

\[B(\alpha,\beta){:}=\int_{0}^{1}t^{\alpha-1}(1-t)^{\beta-1}\,\mathrm{d}t=\dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)},\qquad\alpha,\beta>0.\]
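The constants above are easy to sanity-check numerically. The sketch below is an illustration only, with hypothetical helper names: it evaluates $\omega_{d-1}$, $\nu_{d}$ and $B(\alpha,\beta)$ from the Gamma function and compares the Beta function with a crude midpoint quadrature of its defining integral.

```python
import math


def sphere_area(d):
    """omega_{d-1} = 2 * pi^{d/2} / Gamma(d/2), the surface area of the unit sphere in R^d."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)


def ball_volume(d):
    """nu_d = omega_{d-1} / d, the volume of the unit ball in R^d."""
    return sphere_area(d) / d


def beta(a, b):
    """B(a, b) expressed through the Gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_quadrature(a, b, n=100_000):
    """Midpoint rule for the integral defining B(a, b).

    Only meant for a, b >= 1, where the integrand has no endpoint singularity.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1.0 - t) ** (b - 1)
    return total * h
```

For $d=2$ and $d=3$ the helpers reproduce the familiar values $2\pi$ and $4\pi$ for the circumference of the unit circle and the area of the unit sphere.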

The Bessel function of the first kind is defined by the series

\[J_{\nu}(x){:}=(x/2)^{\nu}\sum_{k=0}^{\infty}(-1)^{k}\dfrac{(x/2)^{2k}}{\Gamma(\nu+k+1)\,k!}.\]

This definition may be found in [Luke:1962, § 1.4.1, Eq. (1)].
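As an illustration of the series definition, the following sketch sums a fixed number of terms. It is not a robust implementation: it assumes $x>0$ of moderate size and $\nu>-1$, and the function name and the truncation level are hypothetical choices. The half-integer case offers a simple check, since $J_{1/2}(x)=\sqrt{2/(\pi x)}\,\sin x$.

```python
import math


def bessel_j(nu, x, terms=60):
    """Partial sum of the power series for J_nu(x).

    Assumes x > 0 of moderate size and nu > -1; for large x the alternating
    series suffers from cancellation and this sketch is not appropriate.
    """
    half = x / 2.0
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * half ** (2 * k) / (math.gamma(nu + k + 1) * math.factorial(k))
    return half ** nu * total
```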

For $f\in L^{1}(\mathbb{R}^{d})$, its Fourier transform is defined as

\[\widehat{f}(\xi){:}=\int_{\mathbb{R}^{d}}f(x)e^{-2\pi ix\cdot\xi}\,\mathrm{d}x,\]

and the inverse Fourier transform is defined as

\[f^{\vee}(x){:}=\int_{\mathbb{R}^{d}}f(\xi)e^{2\pi ix\cdot\xi}\,\mathrm{d}\xi.\]

If $f\in\mathscr{S}^{\prime}(\mathbb{R}^{d})$, then the Fourier transform in the sense of distributions means

\[\langle\,\widehat{f},\varphi\rangle=\langle\,f,\widehat{\varphi}\rangle\qquad\text{for any}\quad\varphi\in\mathscr{S}(\mathbb{R}^{d})\subset L^{1}(\mathbb{R}^{d}).\]

We shall frequently use the following Hausdorff-Young inequality. Let $1\leq p\leq 2$ and $f\in L^{p}(\mathbb{R}^{d})$, then

\[\|\,\widehat{f}\,\|_{L^{p^{\prime}}(\mathbb{R}^{d})}\leq\|\,f\,\|_{L^{p}(\mathbb{R}^{d})},\tag{2.1}\]

where $p^{\prime}$ is the conjugate exponent of $p$; i.e., $1/p+1/p^{\prime}=1$.

We shall use the following pointwise Fourier inversion theorem.

Lemma 2.1.

Let $g\in L^{1}(\mathbb{R}^{d})$, then $\widehat{g^{\vee}}=g$ in $\mathscr{S}^{\prime}(\mathbb{R}^{d})$. Furthermore, let $f\in\mathscr{S}^{\prime}(\mathbb{R}^{d})$ and $\widehat{f}\in L^{1}(\mathbb{R}^{d})$, then $(\widehat{f})^{\vee}=f$ a.e. on $\mathbb{R}^{d}$.

Proof.

By definition, there holds

\[\langle\,\widehat{g^{\vee}},\varphi\rangle=\langle\,g^{\vee},\widehat{\varphi}\rangle=\langle\,g,\varphi\rangle\qquad\text{for any}\quad\varphi\in\mathscr{S}(\mathbb{R}^{d}).\]

Therefore, $\widehat{g^{\vee}}=g$ in $\mathscr{S}^{\prime}(\mathbb{R}^{d})$. Since $\widehat{f}\in L^{1}(\mathbb{R}^{d})$, we have

\[\langle\,(\widehat{f})^{\vee},\varphi\rangle=\langle\,\widehat{f},\varphi^{\vee}\rangle=\langle\,f,\varphi\rangle\qquad\text{for any}\quad\varphi\in\mathscr{S}(\mathbb{R}^{d}).\]

By the Hausdorff-Young inequality (2.1),

\[\|\,(\widehat{f})^{\vee}\,\|_{L^{\infty}(\mathbb{R}^{d})}\leq\|\,\widehat{f}\,\|_{L^{1}(\mathbb{R}^{d})}.\]

Therefore, $f$ extends to a bounded linear functional on $L^{1}(\mathbb{R}^{d})$; i.e., $f\in[L^{1}(\mathbb{R}^{d})]^{*}=L^{\infty}(\mathbb{R}^{d})$, because $\mathscr{S}(\mathbb{R}^{d})$ is dense in $L^{1}(\mathbb{R}^{d})$ and

\[\lvert\langle\,f,\varphi\rangle\rvert=\lvert\langle\,(\widehat{f})^{\vee},\varphi\rangle\rvert\leq\|\,(\widehat{f})^{\vee}\,\|_{L^{\infty}(\mathbb{R}^{d})}\|\,\varphi\,\|_{L^{1}(\mathbb{R}^{d})}\leq\|\,\widehat{f}\,\|_{L^{1}(\mathbb{R}^{d})}\|\,\varphi\,\|_{L^{1}(\mathbb{R}^{d})}.\]

Hence, $(\widehat{f})^{\vee}=f$ a.e. on $\mathbb{R}^{d}$ because $(\widehat{f})^{\vee}-f\in L^{\infty}(\mathbb{R}^{d})$ [Brezis:2011, Corollary 4.24]. ∎

A direct consequence of Lemma 2.1 is that the pointwise Fourier inversion is valid for functions in $\mathscr{B}^{s}(\mathbb{R}^{d})$. We shall frequently use this fact later on.

2.1. Completeness of the spectral Barron space

Lemma 2.2.
  (1) $\mathscr{B}^{s}(\mathbb{R}^{d})$ is a Banach space.

  (2) When $s>0$, $\mathscr{B}^{s}(\mathbb{R}^{d})$ is not a Banach space if the norm $\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}$ is replaced by $\upsilon_{f,s}$.

Proof.

We give a brief proof of the first claim for the reader's convenience; the claim has been stated in [Hormander:1963, Theorem 2.2.1].

It is sufficient to check the completeness of $\mathscr{B}^{s}(\mathbb{R}^{d})$. For any Cauchy sequence $\{f_{k}\}_{k=1}^{\infty}\subset\mathscr{B}^{s}(\mathbb{R}^{d})$, there exists $g\in L^{1}(\mathbb{R}^{d})$ such that $\widehat{f}_{k}\to g$ in $L^{1}(\mathbb{R}^{d})$. Therefore there exists a subsequence of $\{f_{k}\}_{k=1}^{\infty}$ (still denoted by $f_{k}$) such that $\widehat{f}_{k}\to g$ a.e. on $\mathbb{R}^{d}$.

Define the measure $\mu$ by setting, for any measurable set $E\subset\mathbb{R}^{d}$,

\[\mu(E){:}=\int_{E}\lvert\xi\rvert^{s}\,\mathrm{d}\xi.\]

Then $\{\widehat{f}_{k}\}_{k=1}^{\infty}$ is a Cauchy sequence in $L^{1}(\mathbb{R}^{d},\mu)$ and there exists $h\in L^{1}(\mathbb{R}^{d},\mu)$ such that $\widehat{f}_{k}\to h$ in $L^{1}(\mathbb{R}^{d},\mu)$. Therefore there exists a subsequence of $\{f_{k}\}_{k=1}^{\infty}$ (still denoted by $f_{k}$) such that $\widehat{f}_{k}\to h$ $\mu$-a.e. on $\mathbb{R}^{d}$. Note that for any measurable set $E\subset\mathbb{R}^{d}$, $\mu(E)=0$ is equivalent to $\lvert E\rvert=0$. Therefore $\widehat{f}_{k}\to h$ a.e. on $\mathbb{R}^{d}$. By the uniqueness of the limit, $h=g$ a.e. on $\mathbb{R}^{d}$.

Define $f=g^{\vee}$. Lemma 2.1 shows that $\widehat{f}=g$ in $\mathscr{S}^{\prime}(\mathbb{R}^{d})$. Therefore $f\in\mathscr{B}^{s}(\mathbb{R}^{d})$ and $f_{k}\to f$ in $\mathscr{B}^{s}(\mathbb{R}^{d})$. Hence $\mathscr{B}^{s}(\mathbb{R}^{d})$ is complete and it is a Banach space.

The proof for (2) is a reductio ad absurdum.

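Suppose the claim does not hold, i.e., $\mathscr{B}^{s}(\mathbb{R}^{d})$ equipped with $\upsilon_{f,s}$ is a Banach space. Since $\upsilon_{f,s}\leq\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}$, the bounded inverse theorem implies that there exists $C$ depending only on $s$ and $d$ such that for any $f\in\mathscr{B}^{s}(\mathbb{R}^{d})$,

\[\upsilon_{f,0}\leq C\upsilon_{f,s}.\tag{2.2}\]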

We shall show this is false by the following example.

For some $\delta>-1$, let

\[f_{n}(x)=\left(\sum_{k=1}^{n}2^{kd}(1-2^{2k}\lvert\xi\rvert^{2})_{+}^{\delta}\right)^{\vee}(x).\]

To bound $\upsilon_{f_{n},0}$ and $\upsilon_{f_{n},s}$, we introduce the Bochner-Riesz multipliers

\[\phi_{R}=\left(\left(1-\dfrac{\lvert\xi\rvert^{2}}{R^{2}}\right)_{+}^{\delta}\right)^{\vee},\qquad\delta>-1.\]

We claim

\[\phi_{R}(x)=\dfrac{\Gamma(\delta+1)}{\pi^{\delta}\lvert x\rvert^{\delta+d/2}}R^{-\delta+d/2}J_{\delta+d/2}(2\pi\lvert x\rvert R),\tag{2.3}\]

and

\[\upsilon_{\phi_{R},s}=\dfrac{\omega_{d-1}}{2}B\left(\dfrac{s+d}{2},\delta+1\right)R^{s+d}.\tag{2.4}\]

The proof is postponed to Appendix A.1. It follows from (2.3) that

\[f_{n}(x)=\dfrac{\Gamma(\delta+1)}{\pi^{\delta}\lvert x\rvert^{\delta+d/2}}\sum_{k=1}^{n}2^{k(\delta+d/2)}J_{\delta+d/2}(2^{1-k}\pi\lvert x\rvert),\tag{2.5}\]

and $f_{n}\in\mathscr{B}^{s}(\mathbb{R}^{d})$ with

\[\upsilon_{f_{n},s}=\sum_{k=1}^{n}2^{kd}\upsilon_{\phi_{2^{-k}},s}=\dfrac{1-2^{-ns}}{2^{s+1}-2}\,\omega_{d-1}B\left(\dfrac{s+d}{2},\delta+1\right),\]

and

\[\upsilon_{f_{n},0}=\sum_{k=1}^{n}2^{kd}\upsilon_{\phi_{2^{-k}},0}=\dfrac{\omega_{d-1}}{2}B\left(\dfrac{d}{2},\delta+1\right)n,\]

where we have used (2.4). It is clear that

\[\dfrac{\omega_{d-1}}{2^{s+1}}B\left(\dfrac{s+d}{2},\delta+1\right)\leq\upsilon_{f_{n},s}\leq\dfrac{\omega_{d-1}}{2^{s+1}-2}B\left(\dfrac{s+d}{2},\delta+1\right).\]

Hence $\upsilon_{f_{n},0}\simeq\mathcal{O}(n)$ while $\upsilon_{f_{n},s}\simeq\mathcal{O}(1)$. This shows that (2.2) fails for sufficiently large $n$, which proves the second claim. ∎
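The growth of $\upsilon_{f_{n},0}$ against the boundedness of $\upsilon_{f_{n},s}$ can be observed numerically. The sketch below is illustrative only: it takes formula (2.4) for granted, sums the contributions $2^{kd}\upsilon_{\phi_{2^{-k}},s}$ term by term as in the proof, and uses hypothetical helper names and sample parameters.

```python
import math


def sphere_area(d):
    """omega_{d-1} = 2 * pi^{d/2} / Gamma(d/2)."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)


def beta(a, b):
    """B(a, b) expressed through the Gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def upsilon_phi(R, s, d, delta):
    """Spectral norm of the Bochner-Riesz multiplier phi_R, taken from formula (2.4)."""
    return 0.5 * sphere_area(d) * beta((s + d) / 2, delta + 1) * R ** (s + d)


def upsilon_fn(n, s, d, delta):
    """upsilon_{f_n,s} for f_n = sum_{k=1}^n 2^{kd} phi_{2^{-k}}, summed term by term."""
    return sum(2.0 ** (k * d) * upsilon_phi(2.0 ** (-k), s, d, delta) for k in range(1, n + 1))


def ratio(n, s, d, delta):
    """upsilon_{f_n,0} / upsilon_{f_n,s}; an estimate (2.2) would force this to stay below C."""
    return upsilon_fn(n, 0.0, d, delta) / upsilon_fn(n, s, d, delta)
```

Evaluating `ratio(n, 0.5, 3, 0.0)` for increasing `n` gives an increasing sequence, consistent with the conclusion that no constant $C$ can make (2.2) hold.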

Similar to $\mathscr{B}^{s}(\mathbb{R}^{d})$, the space $\widehat{\mathscr{B}}^{s}(\mathbb{R}^{d})$ defined in (1.1) has been exploited as the target space for neural network approximation by several authors [Ma:2017, Xu:2020, Siegel:2022, Siegel:2023]. The advantage of this space is that the Fourier transform is well-defined and the pointwise Fourier inversion is true for functions belonging to $\widehat{\mathscr{B}}^{s}(\mathbb{R}^{d})$. Unfortunately, $\widehat{\mathscr{B}}^{s}(\mathbb{R}^{d})$ is not a Banach space, as we shall show below.

Lemma 2.3.

The space $\widehat{\mathscr{B}}^{s}(\mathbb{R}^{d})$ defined in (1.1) equipped with the norm $\upsilon_{f,0}+\upsilon_{f,s}$ is not a Banach space.

To prove Lemma 2.3, we recall the Barron spectrum space defined by Meng and Ming in [MengMing:2022]: For $s\in\mathbb{R}$ and $1\leq p\leq 2$,

\[\mathscr{B}^{s}_{p}(\mathbb{R}^{d}){:}=\left\{\,f\in L^{p}(\mathbb{R}^{d})\,\mid\,\|\,f\,\|_{L^{p}(\mathbb{R}^{d})}+\upsilon_{f,s}<\infty\,\right\}\tag{2.6}\]

equipped with the norm $\|\,f\,\|_{\mathscr{B}^{s}_{p}(\mathbb{R}^{d})}{:}=\|\,f\,\|_{L^{p}(\mathbb{R}^{d})}+\upsilon_{f,s}$. A useful interpolation inequality that compares the spectral norms of different orders has been proved in [MengMing:2022, Lemma 2.1]. For $1\leq p\leq 2$ and $-d/p<s_{1}<s_{2}$, there exists $C$ depending on $s_{1},s_{2},d$ and $p$ such that

\[\upsilon_{f,s_{1}}\leq C\|\,f\,\|_{L^{p}(\mathbb{R}^{d})}^{\gamma}\upsilon_{f,s_{2}}^{1-\gamma},\tag{2.7}\]

where $\gamma=(s_{2}-s_{1})/(s_{2}+d/p)$. For any $\varepsilon>0$, let $f_{\varepsilon}(x){:}=f(x/\varepsilon)$. Using the fact

\[\upsilon_{f_{\varepsilon},s}=\varepsilon^{-s}\upsilon_{f,s},\]

we observe that the inequality (2.7) is dilation invariant; i.e., it is unchanged if we replace $f$ by $f_{\varepsilon}$.
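The dilation invariance can be verified by tracking the powers of $\varepsilon$ on both sides of (2.7). The following worked computation is a sketch that uses only the Fourier convention fixed above and the definition $f_{\varepsilon}(x)=f(x/\varepsilon)$.

```latex
\begin{aligned}
\widehat{f_{\varepsilon}}(\xi)
  &= \int_{\mathbb{R}^{d}} f(x/\varepsilon)\,e^{-2\pi i x\cdot\xi}\,\mathrm{d}x
   = \varepsilon^{d}\,\widehat{f}(\varepsilon\xi), \\
\upsilon_{f_{\varepsilon},s}
  &= \int_{\mathbb{R}^{d}} \lvert\xi\rvert^{s}\,\varepsilon^{d}\,
     \lvert\widehat{f}(\varepsilon\xi)\rvert\,\mathrm{d}\xi
   = \varepsilon^{-s}\,\upsilon_{f,s}, \\
\|\,f_{\varepsilon}\,\|_{L^{p}(\mathbb{R}^{d})}
  &= \varepsilon^{d/p}\,\|\,f\,\|_{L^{p}(\mathbb{R}^{d})}, \\
\|\,f_{\varepsilon}\,\|_{L^{p}(\mathbb{R}^{d})}^{\gamma}\,
\upsilon_{f_{\varepsilon},s_{2}}^{1-\gamma}
  &= \varepsilon^{\gamma(s_{2}+d/p)-s_{2}}\,
     \|\,f\,\|_{L^{p}(\mathbb{R}^{d})}^{\gamma}\,\upsilon_{f,s_{2}}^{1-\gamma}
   = \varepsilon^{-s_{1}}\,
     \|\,f\,\|_{L^{p}(\mathbb{R}^{d})}^{\gamma}\,\upsilon_{f,s_{2}}^{1-\gamma}.
\end{aligned}
```

The last equality uses $\gamma(s_{2}+d/p)=s_{2}-s_{1}$. Both sides of (2.7) therefore scale by the same factor $\varepsilon^{-s_{1}}$, which also explains why $\gamma$ must take this particular value.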

Proof of Lemma 2.3.

The authors in [MengMing:2022] have proved that $\mathscr{B}^{s}_{p}(\mathbb{R}^{d})$ equipped with the norm $\|\,f\,\|_{\mathscr{B}^{s}_{p}(\mathbb{R}^{d})}$ is a Banach space. For any $f\in\mathscr{B}^{s}_{1}(\mathbb{R}^{d})$, taking $s_{1}=0$, $s_{2}=s$ and $p=1$ in (2.7), we obtain that there exists $C$ depending only on $d$ and $s$ such that

\[\upsilon_{f,0}\leq C\|\,f\,\|_{L^{1}(\mathbb{R}^{d})}^{\gamma}\upsilon_{f,s}^{1-\gamma}\leq C\|\,f\,\|_{\mathscr{B}^{s}_{1}(\mathbb{R}^{d})},\]

where $\gamma=s/(s+d)$.

We prove the assertion by reductio ad absurdum. Suppose that $\widehat{\mathscr{B}}^{s}(\mathbb{R}^{d})$ equipped with the norm $\upsilon_{f,0}+\upsilon_{f,s}$ is also a Banach space. Then, by the bounded inverse theorem and the above interpolation inequality, there exists $C$ depending only on $s$ and $d$ such that for any $f\in\mathscr{B}^{s}_{1}(\mathbb{R}^{d})$,

\[\|\,f\,\|_{L^{1}(\mathbb{R}^{d})}\leq C(\upsilon_{f,0}+\upsilon_{f,s}).\tag{2.8}\]

We shall show this is not the case by the following example.

For some $\delta>(d-1)/2$, we define

\[f_{n}(x){:}=\left(\sum_{k=1}^{n}(1-2^{2k}\lvert\xi\rvert^{2})_{+}^{\delta}\right)^{\vee}(x).\]

Using (2.3) and noting $f_{n}=\sum_{k=1}^{n}\phi_{2^{-k}}$, we have the explicit form of $f_{n}$ as

\[f_{n}(x)=\dfrac{\Gamma(\delta+1)}{\pi^{\delta}\lvert x\rvert^{\delta+d/2}}\sum_{k=1}^{n}2^{k(\delta-d/2)}J_{\delta+d/2}(2^{1-k}\pi\lvert x\rvert).\tag{2.9}\]

Using (2.4), we get

\[\upsilon_{f_{n},s}=\sum_{k=1}^{n}\upsilon_{\phi_{2^{-k}},s}=\dfrac{1-2^{-n(s+d)}}{2^{s+d+1}-2}\,\omega_{d-1}B\left(\dfrac{s+d}{2},\delta+1\right),\]

and

\[\dfrac{\omega_{d-1}}{2^{s+d+1}}B\left(\dfrac{s+d}{2},\delta+1\right)\leq\upsilon_{f_{n},s}\leq\dfrac{\omega_{d-1}}{2^{s+d+1}-2}B\left(\dfrac{s+d}{2},\delta+1\right).\]

Proceeding along the same lines, we obtain

\[\upsilon_{f_{n},0}=\sum_{k=1}^{n}\upsilon_{\phi_{2^{-k}},0}=\dfrac{1-2^{-nd}}{2^{d+1}-2}\,\omega_{d-1}B\left(\dfrac{d}{2},\delta+1\right),\]

and

\[\dfrac{\omega_{d-1}}{2^{d+1}}B\left(\dfrac{d}{2},\delta+1\right)\leq\upsilon_{f_{n},0}\leq\dfrac{\omega_{d-1}}{2^{d+1}-2}B\left(\dfrac{d}{2},\delta+1\right).\]

Hence,

\[\upsilon_{f_{n},0}+\upsilon_{f_{n},s}\leq\dfrac{\omega_{d-1}}{2}\left(\dfrac{B(d/2,\delta+1)}{2^{d}-1}+\dfrac{B((s+d)/2,\delta+1)}{2^{s+d}-1}\right).\tag{2.10}\]

By (2.3) and the change of variables $x\mapsto x/(2\pi R)$, a direct calculation gives

\[\begin{aligned}
\|\,\phi_{R}\,\|_{L^{1}(\mathbb{R}^{d})}&=\dfrac{\Gamma(\delta+1)}{\pi^{\delta}R^{\delta-d/2}}\int_{\mathbb{R}^{d}}\dfrac{\lvert J_{\delta+d/2}(2\pi\lvert x\rvert R)\rvert}{\lvert x\rvert^{\delta+d/2}}\,\mathrm{d}x\\
&=\dfrac{2^{\delta-d/2}\Gamma(\delta+1)}{\pi^{d/2}}\int_{\mathbb{R}^{d}}\dfrac{\lvert J_{\delta+d/2}(\lvert x\rvert)\rvert}{\lvert x\rvert^{\delta+d/2}}\,\mathrm{d}x.
\end{aligned}\]

Invoking [Grafakos:2014, Appendix B.6, B.7], there exists $C$ that depends on $\nu$ such that

\[\lvert J_{\nu}(x)\rvert\leq C\begin{cases}\lvert x\rvert^{\nu}\qquad&\lvert x\rvert\leq 1,\\ \lvert x\rvert^{-1/2}\qquad&\lvert x\rvert>1.\end{cases}\]

It follows that there exists $C$ depending only on $d$ and $\delta$ such that

\[\begin{aligned}
\|\,\phi_{R}\,\|_{L^{1}(\mathbb{R}^{d})}&=\dfrac{2^{\delta-d/2}\Gamma(\delta+1)}{\pi^{d/2}}\left(\int_{\lvert x\rvert\leq 1}\dfrac{\lvert J_{\delta+d/2}(\lvert x\rvert)\rvert}{\lvert x\rvert^{\delta+d/2}}\,\mathrm{d}x+\int_{\lvert x\rvert>1}\dfrac{\lvert J_{\delta+d/2}(\lvert x\rvert)\rvert}{\lvert x\rvert^{\delta+d/2}}\,\mathrm{d}x\right)\\
&\leq C\left(\int_{\lvert x\rvert\leq 1}\mathrm{d}x+\int_{\lvert x\rvert>1}\lvert x\rvert^{-1/2-\delta-d/2}\,\mathrm{d}x\right)\\
&\leq C\left(1+\dfrac{1}{\delta-(d-1)/2}\right),
\end{aligned}\]

where we have used the fact $\delta>(d-1)/2$ in the last step. Therefore, $\|\,\phi_{R}\,\|_{L^{1}(\mathbb{R}^{d})}$ is bounded by a constant that depends only on $\delta$ and $d$ but is independent of $R$. Moreover,

\[\|\,f_{n}\,\|_{L^{1}(\mathbb{R}^{d})}\leq\sum_{k=1}^{n}\|\,\phi_{2^{-k}}\,\|_{L^{1}(\mathbb{R}^{d})}\leq n\|\,\phi_{1}\,\|_{L^{1}(\mathbb{R}^{d})},\]

and by the Hausdorff-Young inequality (2.1),

\[\|\,f_{n}\,\|_{L^{1}(\mathbb{R}^{d})}\geq\|\,\widehat{f}_{n}\,\|_{L^{\infty}(\mathbb{R}^{d})}=\widehat{f}_{n}(0)=n.\]

This means that $\|\,f_{n}\,\|_{L^{1}(\mathbb{R}^{d})}$ grows linearly in $n$, while the right-hand side of (2.10) is independent of $n$. Hence the inequality (2.8) cannot be true for sufficiently large $n$, and we conclude that $\widehat{\mathscr{B}}^{s}(\mathbb{R}^{d})$ is not a Banach space. ∎
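The contradiction can be made quantitative. The sketch below is illustrative only, with hypothetical helper names and sample parameters: it evaluates the sums of (2.4) for $f_{n}=\sum_{k}\phi_{2^{-k}}$, compares them with the right-hand side of (2.10), and returns the first $n$ for which the lower bound $\|\,f_{n}\,\|_{L^{1}}\geq n$ exceeds $C$ times that bound for a given candidate constant $C$.

```python
import math


def sphere_area(d):
    """omega_{d-1} = 2 * pi^{d/2} / Gamma(d/2)."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)


def beta(a, b):
    """B(a, b) expressed through the Gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def upsilon_sum(n, s, d, delta):
    """sum_{k=1}^n upsilon_{phi_{2^{-k}}, s}, with each term taken from formula (2.4)."""
    c = 0.5 * sphere_area(d) * beta((s + d) / 2, delta + 1)
    return c * sum(2.0 ** (-k * (s + d)) for k in range(1, n + 1))


def bound_2_10(s, d, delta):
    """Right-hand side of (2.10); it does not depend on n."""
    return 0.5 * sphere_area(d) * (
        beta(d / 2, delta + 1) / (2 ** d - 1)
        + beta((s + d) / 2, delta + 1) / (2 ** (s + d) - 1)
    )


def smallest_violating_n(C, s, d, delta):
    """Smallest n with n > C * bound_2_10, so that (2.8) with constant C fails for f_n."""
    return math.floor(C * bound_2_10(s, d, delta)) + 1
```

However large the candidate constant `C` is chosen, `smallest_violating_n` returns a finite index, which mirrors the argument above.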

2.2. Embedding relations of the spectral Barron spaces

In this part we discuss the embedding of the spectral Barron spaces.

Lemma 2.4.
  (1) Interpolation inequality: For any $0\leq s_{1}\leq s\leq s_{2}$ satisfying $s=\alpha s_{1}+(1-\alpha)s_{2}$ with $0\leq\alpha\leq 1$, and $f\in\mathscr{B}^{s_{1}}(\mathbb{R}^{d})$, there holds

    \[\upsilon_{f,s}\leq\upsilon_{f,s_{1}}^{\alpha}\upsilon_{f,s_{2}}^{1-\alpha},\tag{2.11}\]

    and

    \[\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}\leq\|\,f\,\|_{\mathscr{B}^{s_{1}}(\mathbb{R}^{d})}^{\alpha}\|\,f\,\|_{\mathscr{B}^{s_{2}}(\mathbb{R}^{d})}^{1-\alpha}.\tag{2.12}\]

  (2) Let $0\leq s_{1}\leq s_{2}$, there holds $\mathscr{B}^{s_{2}}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s_{1}}(\mathbb{R}^{d})$ with

    \[\|\,f\,\|_{\mathscr{B}^{s_{1}}(\mathbb{R}^{d})}\leq\left(2-\dfrac{s_{1}}{s_{2}}\right)\|\,f\,\|_{\mathscr{B}^{s_{2}}(\mathbb{R}^{d})}\qquad\forall f\in\mathscr{B}^{s_{2}}(\mathbb{R}^{d}).\tag{2.13}\]

The embedding (2.13) has been stated in [Hormander:1963, Theorem 2.2.2] without tracing the embedding constant.

Proof.

We start with the interpolation inequality (2.11) for the spectral norm. For any $0\leq s_{1}\leq s\leq s_{2}$ with $s=\alpha s_{1}+(1-\alpha)s_{2}$, using Hölder's inequality, we obtain

\[\upsilon_{f,s}=\int_{\mathbb{R}^{d}}\left(\lvert\xi\rvert^{s_{1}}\lvert\widehat{f}(\xi)\rvert\right)^{\alpha}\left(\lvert\xi\rvert^{s_{2}}\lvert\widehat{f}(\xi)\rvert\right)^{1-\alpha}\mathrm{d}\xi\leq\upsilon_{f,s_{1}}^{\alpha}\upsilon_{f,s_{2}}^{1-\alpha}.\]

This gives (2.11).

Next, for $a,b,c>0$, by Young's inequality, we have

\[\begin{aligned}
\dfrac{a+b^{\alpha}c^{1-\alpha}}{(a+b)^{\alpha}(a+c)^{1-\alpha}}&=\left(\dfrac{a}{a+b}\right)^{\alpha}\left(\dfrac{a}{a+c}\right)^{1-\alpha}+\left(\dfrac{b}{a+b}\right)^{\alpha}\left(\dfrac{c}{a+c}\right)^{1-\alpha}\\
&\leq\alpha\dfrac{a}{a+b}+(1-\alpha)\dfrac{a}{a+c}+\alpha\dfrac{b}{a+b}+(1-\alpha)\dfrac{c}{a+c}\\
&=1.
\end{aligned}\]

This yields

\[a+b^{\alpha}c^{1-\alpha}\leq(a+b)^{\alpha}(a+c)^{1-\alpha}.\]

Let $a=\upsilon_{f,0}$, $b=\upsilon_{f,s_{1}}$ and $c=\upsilon_{f,s_{2}}$, we obtain

\[\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}=\upsilon_{f,0}+\upsilon_{f,s}\leq\upsilon_{f,0}+\upsilon_{f,s_{1}}^{\alpha}\upsilon_{f,s_{2}}^{1-\alpha}\leq\|\,f\,\|_{\mathscr{B}^{s_{1}}(\mathbb{R}^{d})}^{\alpha}\|\,f\,\|_{\mathscr{B}^{s_{2}}(\mathbb{R}^{d})}^{1-\alpha}.\]

This implies (2.12).

Next, if we take $s_{1}=0$ in (2.11) and $s=(1-\alpha)s_{2}$ with $\alpha=1-s/s_{2}$, then by Young's inequality,

\[\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}\leq\upsilon_{f,0}+\upsilon_{f,0}^{\alpha}\upsilon_{f,s_{2}}^{1-\alpha}\leq(1+\alpha)\upsilon_{f,0}+(1-\alpha)\upsilon_{f,s_{2}}\leq(1+\alpha)\|\,f\,\|_{\mathscr{B}^{s_{2}}(\mathbb{R}^{d})}.\]

Since $1+\alpha=2-s/s_{2}$, this leads to (2.13) with $s$ in place of $s_{1}$ and completes the proof. ∎

The next lemma shows that $\mathscr{B}^{s}_{p}(\mathbb{R}^{d})$ is a proper subspace of $\mathscr{B}^{s}(\mathbb{R}^{d})$.

Lemma 2.5.

For $s\geq 0$ and $1\leq p\leq 2$, there holds $\mathscr{B}^{s}_{p}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d})$, and the inclusion is proper in the sense that for any $1\leq p<\infty$, there exists $f_{p}\in\mathscr{B}^{s}(\mathbb{R}^{d})$ such that $f_{p}\not\in L^{p}(\mathbb{R}^{d})$.

Proof.

It follows from the interpolation inequality (2.7) that $\upsilon_{f,0}\leq C\|\,f\,\|_{\mathscr{B}^{s}_{p}(\mathbb{R}^{d})}$. Hence

\[
\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}\leq C\|\,f\,\|_{\mathscr{B}^{s}_{p}(\mathbb{R}^{d})}.
\]

This implies $\mathscr{B}^{s}_{p}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d})$ for any $s\geq 0$ and $1\leq p\leq 2$.

We shall show below that the inclusion is proper. Let

\[
f_{p}(x){:}=\left(\lvert\xi\rvert^{-d/p^{\prime}}\chi_{[0,1)}(\lvert\xi\rvert)\right)^{\vee}(x),
\]

where $\chi_{\Omega}(t)$ is the characteristic function on $\mathbb{R}$ that equals one if $t\in\Omega$ and zero otherwise. It is straightforward to verify that $f_{p}\in\mathscr{B}^{s}(\mathbb{R}^{d})$.

We shall show below that $f_{p}\notin L^{p}(\mathbb{R}^{d})$, based on the following explicit formula for $f_{p}$, which is derived in Appendix A.2:

\[
f_{p}(x)={}_{1}F_{2}(d/(2p);1+d/(2p),d/2;-\pi^{2}\lvert x\rvert^{2})\,p\nu_{d},\tag{2.14}
\]

where the generalized hypergeometric function ${}_{n}F_{m}$ is defined as follows. For nonnegative integers $n,m$, and provided that none of the parameters $\{\beta_{j}\}_{j=1}^{m}$ is a negative integer or zero,

\[
{}_{n}F_{m}(\alpha_{1},\dots,\alpha_{n};\beta_{1},\dots,\beta_{m};x){:}=\sum_{k=0}^{\infty}\dfrac{\prod_{j=1}^{n}(\alpha_{j})_{k}}{\prod_{j=1}^{m}(\beta_{j})_{k}}\dfrac{x^{k}}{k!}.
\]

The series defining ${}_{n}F_{m}$ converges for all finite $x$ if $n\leq m$. In particular, ${}_{n}F_{m}(\alpha_{1},\dots,\alpha_{n};\beta_{1},\dots,\beta_{m};0)=1$. Hence $f_{p}(x)$ is finite for any $x$. Using [MathaiSaxena:1973]*Appendix, we obtain

\[
{}_{1}F_{2}(\alpha;\beta,\gamma;-x^{2}/4)\simeq\mathcal{O}(\lvert x\rvert^{\alpha-\beta-\gamma+1/2}+\lvert x\rvert^{-2\alpha})\qquad\text{when}\quad\lvert x\rvert\to\infty.
\]

Therefore,

\[
f_{p}(x)\simeq\mathcal{O}(\lvert x\rvert^{-(d+1)/2}+\lvert x\rvert^{-d/p})\qquad\text{when}\quad\lvert x\rvert\to\infty.
\]

This immediately implies $f_{p}\not\in L^{p}(\mathbb{R}^{d})$. ∎

Remark 2.6.

The representation (2.14) is rather complicated, so we give explicit formulas for certain special cases. When $d=1$,

\[
f_{1}(x)=\dfrac{\sin(2\pi x)}{\pi x},\qquad f_{2}(x)=\dfrac{2}{\sqrt{\lvert x\rvert}}C(2\sqrt{\lvert x\rvert}),
\]

where $C$ is the Fresnel cosine integral given by

\[
C(x)=\int_{0}^{x}\cos(\pi t^{2}/2)\mathrm{d}t\to\dfrac{1}{2}\qquad\text{when}\quad x\to\infty.
\]

Indeed, for $d=p=1$, using the relation [Luke:1969]*§ 6.2.1, Eq. (10)

\[
\sin x={}_{0}F_{1}(;3/2;-x^{2}/4)\,x,
\]

we obtain

\[
f_{1}(x)=2\,{}_{1}F_{2}(1/2;3/2,1/2;-\pi^{2}x^{2})=2\,{}_{0}F_{1}(;3/2;-\pi^{2}x^{2})=\dfrac{\sin(2\pi x)}{\pi x}.
\]

When $p=2$, using the identity [Luke:1969]*§ 6.2.11, Eq. (41)

\[
C(\sqrt{2x/\pi})=\sqrt{\dfrac{2x}{\pi}}\,{}_{1}F_{2}(1/4;5/4,1/2;-x^{2}/4)\qquad\text{when}\quad x>0,
\]

we obtain

\[
f_{2}(x)=4\,{}_{1}F_{2}(1/4;5/4,1/2;-\pi^{2}x^{2})=\dfrac{2}{\sqrt{\lvert x\rvert}}C(2\sqrt{\lvert x\rvert}).
\]
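The case $d=p=1$ of the remark can be checked numerically by summing the hypergeometric series directly. The sketch below is illustrative only: `hyp_pfq` is a hypothetical helper that truncates the defining series after a fixed number of terms, which is adequate for moderate arguments but is not a robust special-function implementation.

```python
import math


def hyp_pfq(a_params, b_params, x, terms=200):
    """Truncated series for the generalized hypergeometric function nFm."""
    total = 0.0
    term = 1.0  # the k = 0 term equals 1
    for k in range(terms):
        total += term
        # Ratio of consecutive terms: prod(a_j + k) / prod(b_j + k) * x / (k + 1)
        ratio = x / (k + 1)
        for a in a_params:
            ratio *= a + k
        for b in b_params:
            ratio /= b + k
        term *= ratio
    return total


def f1_series(x):
    """f_1 for d = p = 1 via formula (2.14): 2 * 1F2(1/2; 3/2, 1/2; -pi^2 x^2)."""
    return 2.0 * hyp_pfq([0.5], [1.5, 0.5], -(math.pi * x) ** 2)


def f1_closed(x):
    """Closed form sin(2 pi x) / (pi x) from the remark (x != 0)."""
    return math.sin(2.0 * math.pi * x) / (math.pi * x)
```

For moderate `x`, the two functions should agree to near machine precision; for large `x` the alternating series suffers from cancellation and a dedicated library would be preferable.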

2.3. Relations to some classical function spaces

In this part, we establish the embedding between the spectral Barron space $\mathscr{B}^{s}(\mathbb{R}^{d})$ and the Besov space, and hence we bridge $\mathscr{B}^{s}(\mathbb{R}^{d})$ and the Sobolev space as in [MengMing:2022]. We first recall the definition of the Besov space.

Definition 2.7 (Besov space).

Let $\{\varphi_{j}\}_{j=0}^{\infty}\subset\mathscr{S}(\mathbb{R}^{d})$ satisfy $0\leq\varphi_{j}\leq 1$ and

\[
\begin{cases}\text{supp}(\varphi_{0})\subset\Gamma_{0}{:}=\left\{\,x\in\mathbb{R}^{d}\,\mid\,\lvert x\rvert\leq 2\,\right\},\\ \text{supp}(\varphi_{j})\subset\Gamma_{j}{:}=\left\{\,x\in\mathbb{R}^{d}\,\mid\,2^{j-1}\leq\lvert x\rvert\leq 2^{j+1}\,\right\},\qquad j=1,2,\dots.\end{cases}
\]

For every multi-index $\alpha$, there exists a positive number $c_{\alpha}$ such that

\[
2^{j\lvert\alpha\rvert}\lvert\nabla^{\alpha}\varphi_{j}(x)\rvert\leq c_{\alpha}\quad\text{for all}\quad j=0,1,\dots,\quad\text{for all}\quad x\in\mathbb{R}^{d},
\]

and

\[
\sum_{j=0}^{\infty}\varphi_{j}(x)=1\quad\text{for every}\quad x\in\mathbb{R}^{d}.
\]

Let $\alpha\in\mathbb{R}$ and $1\leq p,q\leq\infty$. Define the Besov space

\[
B^{\alpha}_{p,q}(\mathbb{R}^{d}){:}=\left\{\,f\in\mathscr{S}^{\prime}(\mathbb{R}^{d})\,\mid\,\|\,f\,\|_{B^{\alpha}_{p,q}(\mathbb{R}^{d})}<\infty\,\right\}
\]

equipped with the norm

\[
\|\,f\,\|_{B^{\alpha}_{p,q}(\mathbb{R}^{d})}{:}=\left(\sum_{j=0}^{\infty}2^{\alpha qj}\|\,(\varphi_{j}\widehat{f})^{\vee}\,\|_{L^{p}(\mathbb{R}^{d})}^{q}\right)^{1/q}\qquad\text{when}\quad q<\infty,
\]

and

\[
\|\,f\,\|_{B_{p,\infty}^{\alpha}(\mathbb{R}^{d})}{:}=\sup_{j\geq 0}2^{\alpha j}\|\,(\varphi_{j}\widehat{f})^{\vee}\,\|_{L^{p}(\mathbb{R}^{d})}.
\]

We first recall the following embedding for the Besov space, which was first proved in the series of works by Taibleson [Taibleson:1964, Taibleson:1965, Taibleson:1966]. We retain the proof in Appendix A.3 for the reader's convenience.

Lemma 2.8.

There holds $B_{p_{1},q_{1}}^{\alpha_{1}}(\mathbb{R}^{d})\hookrightarrow B_{p_{2},q_{2}}^{\alpha_{2}}(\mathbb{R}^{d})$ if and only if $p_{1}\leq p_{2}$ and one of the following conditions holds:

  (1) $\alpha_{1}-d/p_{1}>\alpha_{2}-d/p_{2}$ and $q_{1},q_{2}$ are arbitrary;

  (2) $\alpha_{1}-d/p_{1}=\alpha_{2}-d/p_{2}$ and $q_{1}\leq q_{2}$.
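The criterion of Lemma 2.8 is purely arithmetic, so it can be encoded as a small predicate. This is a sketch under stated assumptions: the function name `besov_embeds`, the use of `math.inf` for $p=\infty$ or $q=\infty$ (with $d/\infty$ read as $0$), and the floating-point tolerance are illustrative choices.

```python
import math


def besov_embeds(d, a1, p1, q1, a2, p2, q2, tol=1e-12):
    """Check the condition of Lemma 2.8 for B^{a1}_{p1,q1} -> B^{a2}_{p2,q2}."""
    if p1 > p2:
        return False

    def differential_index(alpha, p):
        # alpha - d/p, with d/p = 0 when p is infinite
        return alpha - (0.0 if math.isinf(p) else d / p)

    left = differential_index(a1, p1)
    right = differential_index(a2, p2)
    if left > right + tol:
        return True  # case (1): q1, q2 arbitrary
    return abs(left - right) <= tol and q1 <= q2  # case (2)
```

For instance, with $d=3$ and $s=1/2$, `besov_embeds(3, 2.0, 2, 1, 0.5, math.inf, 1)` evaluates the pair $B^{s+d/2}_{2,1}$ and $B^{s}_{\infty,1}$, which falls under case (2).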

The main result on the embedding is:

Theorem 2.9.

  (1) There holds

\[
B_{2,1}^{s+d/2}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d})\hookrightarrow B_{\infty,1}^{s}(\mathbb{R}^{d}).\tag{2.15}
\]

  (2) The above embedding is optimal in the sense that $B_{2,1}^{s+d/2}(\mathbb{R}^{d})$ is the biggest among all $B_{p,q}^{\alpha}(\mathbb{R}^{d})$ satisfying $B_{p,q}^{\alpha}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d})$, and $B_{\infty,1}^{s}(\mathbb{R}^{d})$ is the smallest among all $B_{p,q}^{\alpha}(\mathbb{R}^{d})$ satisfying $\mathscr{B}^{s}(\mathbb{R}^{d})\hookrightarrow B_{p,q}^{\alpha}(\mathbb{R}^{d})$.

Proof.

To prove (1), first, for any $f\in\mathscr{B}^{s}(\mathbb{R}^{d})$, the Cauchy-Schwarz inequality gives

\[
\begin{aligned}
\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}={}&\sum_{j=0}^{\infty}\int_{\mathbb{R}^{d}}(1+\lvert\xi\rvert^{s})\varphi_{j}(\xi)\lvert\widehat{f}(\xi)\rvert\mathrm{d}\xi\\
\leq{}&\sum_{j=0}^{\infty}\left(\int_{\text{supp}\;\varphi_{j}}(1+\lvert\xi\rvert^{s})^{2}\mathrm{d}\xi\right)^{1/2}\|\,\varphi_{j}\widehat{f}\,\|_{L^{2}(\mathbb{R}^{d})}.
\end{aligned}
\]

A direct calculation gives, for $j=0,1,\dots$,

\[
\begin{aligned}
\int_{\text{supp}\;\varphi_{j}}(1+\lvert\xi\rvert^{s})^{2}\mathrm{d}\xi&\leq\int_{0\leq\lvert\xi\rvert\leq 2^{j+1}}(1+\lvert\xi\rvert^{s})^{2}\mathrm{d}\xi\\
&=\omega_{d-1}\int_{0}^{2^{j+1}}(1+r^{s})^{2}r^{d-1}\mathrm{d}r\\
&\leq 2\omega_{d-1}\int_{0}^{2^{j+1}}(1+r^{2s})r^{d-1}\mathrm{d}r\\
&\leq 2\omega_{d-1}\left(\dfrac{2^{(j+1)d}}{d}+\dfrac{2^{(j+1)(2s+d)}}{2s+d}\right)\\
&\leq 4\nu_{d}2^{(j+1)(2s+d)}.
\end{aligned}
\]

Using Plancherel's theorem, we get

\[
\begin{aligned}
\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}&\leq 2^{s+1+d/2}\sqrt{\nu_{d}}\sum_{j=0}^{\infty}2^{j(s+d/2)}\|\,\varphi_{j}\widehat{f}\,\|_{L^{2}(\mathbb{R}^{d})}\\
&=2^{s+1+d/2}\sqrt{\nu_{d}}\sum_{j=0}^{\infty}2^{j(s+d/2)}\|\,(\varphi_{j}\widehat{f})^{\vee}\,\|_{L^{2}(\mathbb{R}^{d})}\\
&=2^{s+1+d/2}\sqrt{\nu_{d}}\|\,f\,\|_{B_{2,1}^{s+d/2}(\mathbb{R}^{d})}.
\end{aligned}
\]

Next, for any $f\in\mathscr{B}^{s}(\mathbb{R}^{d})$, Lemma 2.1 gives $\varphi_{j}\widehat{f}\in L^{1}(\mathbb{R}^{d})$. Using the Hausdorff-Young inequality (2.1), we obtain

\[
\begin{aligned}
\|\,f\,\|_{B_{\infty,1}^{s}(\mathbb{R}^{d})}={}&\sum_{j=0}^{\infty}2^{sj}\|\,(\varphi_{j}\widehat{f})^{\vee}\,\|_{L^{\infty}(\mathbb{R}^{d})}\leq\sum_{j=0}^{\infty}2^{sj}\|\,\varphi_{j}\widehat{f}\,\|_{L^{1}(\mathbb{R}^{d})}\\
\leq{}&\|\,\varphi_{0}\widehat{f}\,\|_{L^{1}(\mathbb{R}^{d})}+2^{s}\sum_{j=1}^{\infty}\int_{\mathbb{R}^{d}}\varphi_{j}(\xi)\lvert\xi\rvert^{s}\lvert\widehat{f}(\xi)\rvert\mathrm{d}\xi\\
\leq{}&\upsilon_{f,0}+2^{s}\upsilon_{f,s}.
\end{aligned}
\]

Therefore, $\|\,f\,\|_{B_{\infty,1}^{s}(\mathbb{R}^{d})}\leq 2^{s}\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}$. This proves (2.15) with

\[
2^{-s}\|\,f\,\|_{B_{\infty,1}^{s}(\mathbb{R}^{d})}\leq\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}\leq 2^{s+1+d/2}\sqrt{\nu_{d}}\|\,f\,\|_{B_{2,1}^{s+d/2}(\mathbb{R}^{d})}.
\]

It remains to show that the embedding (2.15) is optimal. Suppose that there exists $B_{p,q}^{\alpha}(\mathbb{R}^{d})$ such that

\[
B_{2,1}^{s+d/2}(\mathbb{R}^{d})\hookrightarrow B_{p,q}^{\alpha}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d})\hookrightarrow B_{\infty,1}^{s}(\mathbb{R}^{d}).
\]

Using Lemma 2.8, we would have $2\leq p\leq\infty$, $\alpha=s+d/p$ and $q=1$. In what follows, we shall exploit an example adapted from [Lieb:2001]*Ch. 5, Ex. 9 to show that $B_{p,1}^{\alpha}(\mathbb{R}^{d})\not\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d})$ when $2<p<\infty$ and $\alpha>0$. Therefore, $B^{s+d/p}_{p,1}(\mathbb{R}^{d})\not\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d})$ for any $2<p\leq\infty$.

Let

\[
\psi_{n}(x)=(1+in)^{-d/2}e^{-\pi\lvert x\rvert^{2}/(1+in)}.
\]

A direct calculation gives $\widehat{\psi}_{n}(\xi)=e^{-\pi(1+in)\lvert\xi\rvert^{2}}$. Hence $\lvert\widehat{\psi}_{n}(\xi)\rvert=e^{-\pi\lvert\xi\rvert^{2}}\in\mathscr{S}(\mathbb{R}^{d})$ and

\[
\|\,\psi_{n}\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}=1+\dfrac{\Gamma((s+d)/2)}{\Gamma(d/2)\pi^{s/2}},
\]

which is independent of $n$. We shall prove in Appendix A.4 that when $1\leq p<\infty$ and $\alpha>0$,

\[
\|\,\psi_{n}\,\|_{B_{p,1}^{\alpha}(\mathbb{R}^{d})}\leq C(1+n^{2})^{-d(p-2)/(4p)}.\tag{2.16}
\]

Therefore $\|\,\psi_{n}\,\|_{B_{p,1}^{\alpha}(\mathbb{R}^{d})}\to 0$ when $p>2$ and $n\to\infty$.
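The exponent in (2.16) can be made plausible by looking at the plain $L^{p}$ norm of $\psi_{n}$, which is simpler than the Besov norm treated in Appendix A.4. Since $\lvert\psi_{n}(x)\rvert=(1+n^{2})^{-d/4}e^{-\pi\lvert x\rvert^{2}/(1+n^{2})}$, a Gaussian integral gives $\|\psi_{n}\|_{L^{p}}=p^{-d/(2p)}(1+n^{2})^{-d(p-2)/(4p)}$. The sketch below compares this closed form, which is our own side computation and not a statement from the paper, with a midpoint-rule quadrature in dimension $d=1$; all function names are hypothetical.

```python
import cmath
import math


def psi(x, n):
    """psi_n(x) in dimension d = 1."""
    z = complex(1.0, n)
    return z ** (-0.5) * cmath.exp(-math.pi * x * x / z)


def lp_norm_numeric(n, p, steps=20000):
    """Midpoint-rule approximation of the L^p norm of psi_n on the real line."""
    radius = 6.0 * math.sqrt(1.0 + n * n)  # the Gaussian tail beyond is negligible
    h = 2.0 * radius / steps
    total = 0.0
    for k in range(steps):
        x = -radius + (k + 0.5) * h
        total += abs(psi(x, n)) ** p
    return (total * h) ** (1.0 / p)


def lp_norm_closed(n, p, d=1):
    """Closed form p^{-d/(2p)} (1 + n^2)^{-d(p-2)/(4p)}."""
    return p ** (-d / (2.0 * p)) * (1.0 + n * n) ** (-d * (p - 2.0) / (4.0 * p))
```

For $p>2$ the norm decays as $n$ grows, for $p=2$ it is constant, and for $p<2$ it grows, matching the sign of the exponent in (2.16).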

On the other hand, we cannot expect that there exists some $p<\infty$ such that $\mathscr{B}^{s}(\mathbb{R}^{d})\hookrightarrow B^{s+d/p}_{p,1}(\mathbb{R}^{d})$. Otherwise, we would have

\[
\mathscr{B}^{s}(\mathbb{R}^{d})\hookrightarrow B^{s+d/p}_{p,1}(\mathbb{R}^{d})\hookrightarrow L^{p}(\mathbb{R}^{d})
\]

because of Lemma 2.8 and [Triebel:1983]*§ 2.5.7, Proposition. This contradicts the fact that $\mathscr{B}^{s}(\mathbb{R}^{d})\not\hookrightarrow L^{p}(\mathbb{R}^{d})$, which has been proved in Lemma 2.5. ∎

As a consequence of Theorem 2.9 and Lemma 2.5, we establish the embedding between the spectral Barron space and the Sobolev spaces.

Definition 2.10 (Fractional Sobolev space).

Let $1\leq p<\infty$ and let $\alpha>0$ be a non-integer. The fractional Sobolev space is defined as

\[
W^{\alpha}_{p}(\mathbb{R}^{d}){:}=\left\{\,f\in W^{\lfloor\alpha\rfloor}_{p}(\mathbb{R}^{d})\,\mid\,\iint_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\dfrac{\lvert\nabla^{\lfloor\alpha\rfloor}f(x)-\nabla^{\lfloor\alpha\rfloor}f(y)\rvert^{p}}{\lvert x-y\rvert^{d+(\alpha-\lfloor\alpha\rfloor)p}}\mathrm{d}x\mathrm{d}y<\infty\,\right\}
\]

equipped with the norm

\[
\|\,f\,\|_{W^{\alpha}_{p}(\mathbb{R}^{d})}{:}=\|\,f\,\|_{W^{\lfloor\alpha\rfloor}_{p}(\mathbb{R}^{d})}+\left(\iint_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\dfrac{\lvert\nabla^{\lfloor\alpha\rfloor}f(x)-\nabla^{\lfloor\alpha\rfloor}f(y)\rvert^{p}}{\lvert x-y\rvert^{d+(\alpha-\lfloor\alpha\rfloor)p}}\mathrm{d}x\mathrm{d}y\right)^{1/p}.
\]

We first recall the relation between the Sobolev space and $\mathscr{B}^{s}_{p}(\mathbb{R}^{d})$, which has been proved in [MengMing:2022]*Theorem 4.3.

Lemma 2.11 ([MengMing:2022]*Theorem 4.3).

  (1) If $1\leq p\leq 2$ and $\alpha>s+d/p>0$, then

\[
W^{\alpha}_{p}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}_{p}(\mathbb{R}^{d}).
\]

  (2) If $s>-d$ is not an integer, or $s>-d$ is an integer and $d\geq 2$, then

\[
W^{s+d}_{1}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}_{1}(\mathbb{R}^{d}).
\]

It follows from the above lemma and Lemma 2.5 that

Corollary 2.12.

  (1) If $1\leq p\leq 2$ and $\alpha>s+d/p$, there holds

\[
W^{\alpha}_{p}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d})\hookrightarrow C^{s}(\mathbb{R}^{d}).\tag{2.17}
\]

  (2) If $s$ is not an integer, or $s$ is an integer and $d\geq 2$, then

\[
W^{s+d}_{1}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d}).
\]

The first embedding with $p=2$ and $s=1$ is implicit in [Barron:1993]*§ II, Para. 7; § IX, 15.

Proof.

By Lemma 2.11 and Lemma 2.5, when $\alpha>s+d/p$ and $1\leq p\leq 2$, we have

\[
W^{\alpha}_{p}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}_{p}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d}).
\]

When $s$ is not an integer, or $s$ is an integer and $d\geq 2$, there holds

\[
W^{s+d}_{1}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}_{1}(\mathbb{R}^{d})\hookrightarrow\mathscr{B}^{s}(\mathbb{R}^{d}).
\]

It remains to prove the right-hand side of (2.17). We have

\[
\mathscr{B}^{s}(\mathbb{R}^{d})\hookrightarrow B^{s}_{\infty,1}(\mathbb{R}^{d})\hookrightarrow C^{s}(\mathbb{R}^{d})
\]

due to Theorem 2.9, Lemma 2.8 and [Triebel:1983]*§ 2.3.5, Eq. (1); § 2.5.7, Eq. (2), (9), (11). ∎

3. Application to deep neural network approximation

The embedding results proved in Theorem 2.9 and Corollary 2.12 indicate that $s$ is a smoothness index. Consequently, we are interested in exploring the approximation rate when $s$ is small, with $\mathscr{B}^{s}$ as the target function space. To facilitate our analysis, we shall focus on the hypercube $\Omega{:}=[0,1]^{d}$, and the spectral norm for a function $f$ defined on $\Omega$ is

\[
\upsilon_{f,s,\Omega}=\inf_{Ef|_{\Omega}=f}\int_{\mathbb{R}^{d}}\|\,\xi\,\|_{1}^{s}\lvert\widehat{Ef}(\xi)\rvert\mathrm{d}\xi,
\]

where the infimum is taken over all extension operators $E:\Omega\to\mathbb{R}^{d}$. To simplify the notation, we subsequently write $f$ for $Ef$. We replace $\lvert\xi\rvert$ by $\|\,\xi\,\|_{1}$ in the definition of $\upsilon_{f,s,\Omega}$; the latter seems more natural for studying approximation over the hypercube, as suggested by [Barron:1993]*§ V.

Definition 3.1.

A sigmoidal function is a bounded function $\sigma:\mathbb{R}\to\mathbb{R}$ such that

\[
\lim_{t\to-\infty}\sigma(t)=0,\qquad\lim_{t\to+\infty}\sigma(t)=1.
\]

For example, the Heaviside function $\chi_{[0,\infty)}$ is a sigmoidal function.

A classical idea for the approximation error of neural networks with sigmoidal activation functions $\sigma$ is to use the Heaviside function $\chi_{[0,\infty)}$ as a transition. Caragea et al. [Caragea:2023] pointed out that the gap between a sigmoidal function $\sigma$ and the Heaviside function $\chi_{[0,\infty)}$ cannot be dismissed in $L^{\infty}(\Omega)$, whereas this gap does not exist in $L^{2}(\Omega)$.

Lemma 3.2.

For fixed $\omega\in\mathbb{R}^{d}\backslash\{0\}$ and $b\in\mathbb{R}$,

\[
\lim_{\tau\to\infty}\|\,\sigma(\tau(\omega\cdot x+b))-\chi_{[0,\infty)}(\omega\cdot x+b)\,\|_{L^{2}(\Omega)}=0.
\]

Proof.

Note that

\[
\lim_{t\to\pm\infty}\lvert\sigma(t)-\chi_{[0,\infty)}(t)\rvert=0.
\]

We divide the cube $\Omega$ into $\Omega_{1}{:}=\left\{\,x\in\Omega\,\mid\,\lvert\tau(\omega\cdot x+b)\rvert<\delta\,\right\}$ and $\Omega_{2}{:}=\Omega\setminus\Omega_{1}$. Given $\varepsilon>0$, first choose $\delta>0$ so large that $\lvert\sigma(t)-\chi_{[0,\infty)}(t)\rvert\leq\varepsilon$ for $\lvert t\rvert\geq\delta$; the contribution of $\Omega_{2}$ to the $L^{2}$-distance is then at most $\varepsilon$. Since $\sigma$ is bounded and $\Omega_{1}$ is contained in a slab of width $2\delta/(\tau\lvert\omega\rvert)$, the contribution of $\Omega_{1}$ tends to zero as $\tau\to\infty$. Hence the $L^{2}$-distance between $\sigma(\tau(\omega\cdot x+b))$ and $\chi_{[0,\infty)}(\omega\cdot x+b)$ is arbitrarily small for $\tau$ large enough. ∎
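The convergence in Lemma 3.2 can be observed numerically in one dimension. The sketch below is an illustration only: it assumes the logistic function as the sigmoidal $\sigma$, takes $d=1$ with the hypothetical choices $\omega=1$ and $b=-1/2$, and approximates the $L^{2}(0,1)$ norm by a midpoint rule.

```python
import math


def logistic(t):
    """Numerically stable logistic sigmoid, one example of a sigmoidal function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def heaviside(t):
    """Characteristic function of [0, infinity)."""
    return 1.0 if t >= 0 else 0.0


def l2_gap(tau, omega=1.0, b=-0.5, steps=20000):
    """Midpoint-rule L^2(0,1) distance between sigma(tau(omega x + b)) and the Heaviside ridge."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        u = omega * (k + 0.5) * h + b
        total += (logistic(tau * u) - heaviside(u)) ** 2
    return math.sqrt(total * h)
```

For the logistic function the gap decays like $\tau^{-1/2}$, so `l2_gap(10)`, `l2_gap(100)` and `l2_gap(1000)` should form a decreasing sequence.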

For a shallow neural network, the following lemma in [Barron:1993] is proved for real-valued functions, and it is straightforward to extend the proof to complex-valued functions.

Lemma 3.3 ([Barron:1993]*Theorem 1).

Let $f\in\mathscr{B}^{1}(\mathbb{R}^{d})$. Then there exists

\[
f_{N}(x)=\sum_{i=1}^{N}c_{i}\sigma(\omega_{i}\cdot x+b_{i})\tag{3.1}
\]

with $\omega_{i}\in\mathbb{R}^{d}$, $b_{i}\in\mathbb{R}$ and $c_{i}\in\mathbb{C}$ such that

\[
\|\,f-f_{N}\,\|_{L^{2}(\Omega)}\leq\dfrac{2\upsilon_{f,1,\Omega}}{\sqrt{N}}.
\]

In this part, we shall show the approximation error for the deep neural network. We use the $(L,N)$-network to describe a neural network with $L$ hidden layers and at most $N$ units per layer. Here $L$ denotes the number of hidden layers, e.g., the shallow neural network expressed as (3.1) is a $(1,N)$-network.

Definition 3.4 ((L,N)(L,N)-network).

An $(L,N)$-network represents a neural network with $L$ hidden layers and at most $N$ units per layer. The activation functions of the first $L-1$ hidden layers are all ReLU and the activation function of the last hidden layer is the sigmoidal function. The connection weights between the input layer and the first hidden layer, and between consecutive hidden layers, are all real numbers. The connection weights between the last hidden layer and the output layer are complex numbers.
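To make the definition concrete, the following sketch evaluates such a network on one input. It is a minimal illustration under our own assumptions: the data layout (a list of `(W, b)` pairs with real entries plus a list of complex output weights), the function names, and the use of the logistic function as the sigmoidal activation are not prescribed by the paper.

```python
import math


def relu(t):
    return max(t, 0.0)


def sigmoid(t):
    """Logistic function, used here as one possible sigmoidal activation."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def forward(x, hidden_layers, out_weights):
    """Evaluate an (L, N)-network.

    hidden_layers: list of L pairs (W, b) with real weights and biases;
    out_weights:   complex weights from the last hidden layer to the output.
    """
    h = list(x)
    last = len(hidden_layers) - 1
    for idx, (W, b) in enumerate(hidden_layers):
        # ReLU on the first L - 1 hidden layers, sigmoidal on the last one
        act = sigmoid if idx == last else relu
        h = [act(sum(w * v for w, v in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
    return sum(c * v for c, v in zip(out_weights, h))
```

With a single hidden layer, `forward` reduces to the shallow form (3.1).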

Here we make some preparations for the rest of this work. The analysis in this part owes the most to [BreslerNagaraj:2020], with certain improvements that will be detailed later on. For any function $g$ defined on $[0,1]$ and symmetric about $x=1/2$, we use the notation $g_{,n}$ to denote the function obtained by repeating $g$ periodically $n$ times on the interval $[0,1]$, i.e.,

\[
g_{,n}(t)=g(nt-j),\quad j=0,\dots,n-1,\quad 0\leq nt-j\leq 1.\tag{3.2}
\]

Define

\[
\beta(t)=\text{ReLU}(2t)-2\text{ReLU}(2t-1)+\text{ReLU}(2t-2)=\begin{cases}2t,&0\leq t\leq 1/2,\\ 2-2t,&1/2\leq t\leq 1,\\ 0,&\text{otherwise}.\end{cases}
\]

By definition (3.2), $\beta_{,n}$ represents a triangle function with $n$ peaks and can be represented by $3n$ ReLUs:

\[
\beta_{,n}(t)=\sum_{j=0}^{n-1}\beta(nt-j),\quad 0\leq t\leq 1.
\]
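The hat function $\beta$ and its $n$-fold repetition translate directly into code. This is a minimal sketch of the formulas above; the function names are our own.

```python
def relu(t):
    return max(t, 0.0)


def beta(t):
    """Hat function built from three ReLUs: peak 1 at t = 1/2, zero outside [0, 1]."""
    return relu(2.0 * t) - 2.0 * relu(2.0 * t - 1.0) + relu(2.0 * t - 2.0)


def beta_n(t, n):
    """Triangle wave with n peaks on [0, 1], written as a sum of 3n ReLUs."""
    return sum(beta(n * t - j) for j in range(n))
```

The peaks of `beta_n(t, n)` sit at $t=(2j+1)/(2n)$ and the zeros at $t=j/n$.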
Lemma 3.5.

Let $g$ be a function defined on $[0,1]$ and symmetric about $x=1/2$. Then $g_{,n_{2}}\circ\beta_{,n_{1}}=g_{,2n_{1}n_{2}}$ on $[0,1]$.

The above lemma is a rigorous statement of [Telgarsky:2016]*Proposition 5.1. A key example is $\cos(2\pi n_{2}\beta_{,n_{1}}(t))=\cos(4\pi n_{1}n_{2}t)$ when $t\in[0,1]$. A geometrical explanation may be found in [Bolcskei:2021]*Figure 3. We postpone the rigorous proof to Appendix A.5.
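The key example of the composition rule, the identity $\cos(2\pi n_{2}\beta_{,n_{1}}(t))=\cos(4\pi n_{1}n_{2}t)$, is easy to test on a grid. The sketch below is illustrative; the helper names and the grid size are our own choices.

```python
import math


def relu(t):
    return max(t, 0.0)


def beta(t):
    """Hat function on [0, 1] built from three ReLUs."""
    return relu(2.0 * t) - 2.0 * relu(2.0 * t - 1.0) + relu(2.0 * t - 2.0)


def beta_n(t, n):
    """Triangle wave with n peaks on [0, 1]."""
    return sum(beta(n * t - j) for j in range(n))


def max_composition_error(n1, n2, grid=1000):
    """Largest gap between cos(2 pi n2 beta_{,n1}(t)) and cos(4 pi n1 n2 t) on a grid."""
    worst = 0.0
    for i in range(grid + 1):
        t = i / grid
        composed = math.cos(2.0 * math.pi * n2 * beta_n(t, n1))
        direct = math.cos(4.0 * math.pi * n1 * n2 * t)
        worst = max(worst, abs(composed - direct))
    return worst
```

Composing with the triangle wave thus multiplies the frequency by $2n_{1}$, which is the mechanism that lets depth replace width in the construction that follows.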

For $r\in(0,1)$, we define

\[
\alpha(t,r)=\chi_{[0,\infty)}(t-r/2)-\chi_{[0,\infty)}(t-(1-r)/2)=\begin{cases}\chi_{[r/2,(1-r)/2]}(t),&0<r\leq 1/2,\\ -\chi_{[(1-r)/2,r/2]}(t),&1/2\leq r<1,\end{cases}
\]

then $\text{supp}(\alpha(\cdot,r))\subset(0,1/2)$ and $\alpha(t,r)$ is symmetric about $t=1/4$. Define

\[
\gamma(t,r)=\alpha(t+1/4,r)-\alpha(t-1/4,r)+\alpha(t-3/4,r).
\]

Then $\gamma(t,r)$ is symmetric about $t=1/2$ because

\[
\begin{aligned}
\gamma(1-t,r)={}&\alpha(5/4-t,r)-\alpha(3/4-t,r)+\alpha(1/4-t,r)\\
={}&\alpha(t-3/4,r)-\alpha(t-1/4,r)+\alpha(t+1/4,r)=\gamma(t,r).
\end{aligned}
\]

By definition (3.2), $\gamma_{,n}(\cdot,r)$ is well defined on $[0,1]$ and

\[
\begin{aligned}
\gamma_{,n}(t,r)={}&\begin{cases}\alpha(nt-j+1/4,r),&0\leq nt-j\leq 1/4,\\ -\alpha(nt-j-1/4,r),&1/4\leq nt-j\leq 3/4,\\ \alpha(nt-j-3/4,r),&3/4\leq nt-j\leq 1,\end{cases}&&j=0,\dots,n-1\\
={}&\begin{cases}\alpha(nt-j+1/4,r),&-1/4\leq nt-j\leq 1/4,\\ -\alpha(nt-j-1/4,r),&1/4\leq nt-j\leq 3/4,\end{cases}&&j=0,\dots,n.
\end{aligned}
\]

Moreover, $\gamma_{,n}(\cdot,r)$ on $[0,1]$ can be represented by $4n$ Heaviside functions $\chi_{[0,\infty)}$, because $\alpha(nt+1/4,r)$ and $\alpha(nt-n+1/4,r)$ each need only one Heaviside function on $[0,1]$:

\[
\gamma_{,n}(t,r)=\sum_{j=0}^{n}\alpha(nt-j+1/4,r)-\sum_{j=0}^{n-1}\alpha(nt-j-1/4,r).
\]

A direct consequence of the above construction is

Lemma 3.6.

For $t\in[0,1]$, there holds

\[
\dfrac{\pi}{2}\int_{0}^{1}\cos(\pi r)\gamma_{,n}(t,r)\mathrm{d}r=\cos(2\pi nt).\tag{3.3}
\]

Proof.

For any $t\in[0,1/2]$, a direct calculation gives

\[
\dfrac{\pi}{2}\int_{0}^{1}\cos(\pi r)\alpha(t,r)\mathrm{d}r=\pi\int_{0}^{2t}\cos(\pi r)\mathrm{d}r=\sin(2\pi t).
\]

Fix $t\in[0,1]$. If there exists an integer $j$ satisfying $0\leq j\leq n$ and $-1/4\leq nt-j\leq 1/4$, then $\gamma_{,n}(t,r)=\alpha(nt-j+1/4,r)$ and

\[
\dfrac{\pi}{2}\int_{0}^{1}\cos(\pi r)\alpha(nt-j+1/4,r)\mathrm{d}r=\sin(2\pi(nt-j+1/4))=\cos(2\pi nt).
\]

Otherwise there exists an integer $j$ satisfying $0\leq j\leq n-1$ and $1/4\leq nt-j\leq 3/4$. Then $\gamma_{,n}(t,r)=-\alpha(nt-j-1/4,r)$ and

\[
-\dfrac{\pi}{2}\int_{0}^{1}\cos(\pi r)\alpha(nt-j-1/4,r)\mathrm{d}r=-\sin(2\pi(nt-j-1/4))=\cos(2\pi nt).
\]

This completes the proof of (3.3). ∎
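Identity (3.3) can also be checked by direct quadrature in $r$. The sketch below implements $\alpha$, $\gamma$ and $\gamma_{,n}$ from their definitions and approximates the integral with a midpoint rule; the function names and the number of quadrature points are our own choices, and because the integrand has jumps in $r$ the agreement is only up to an error of the order of the step size.

```python
import math


def heaviside(t):
    """Characteristic function of [0, infinity)."""
    return 1.0 if t >= 0 else 0.0


def alpha(t, r):
    return heaviside(t - r / 2.0) - heaviside(t - (1.0 - r) / 2.0)


def gamma(t, r):
    return alpha(t + 0.25, r) - alpha(t - 0.25, r) + alpha(t - 0.75, r)


def gamma_n(t, r, n):
    """n-fold periodic repetition of gamma(., r) on [0, 1], following (3.2)."""
    j = min(int(math.floor(n * t)), n - 1)
    return gamma(n * t - j, r)


def lhs_33(t, n, steps=20000):
    """Midpoint-rule value of (pi/2) * int_0^1 cos(pi r) gamma_{,n}(t, r) dr."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        total += math.cos(math.pi * r) * gamma_n(t, r, n)
    return 0.5 * math.pi * h * total
```

Comparing `lhs_33(t, n)` with `math.cos(2 * math.pi * n * t)` for a few values of `t` and `n` gives agreement up to the quadrature error.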

Now we are ready to give an approximation result for deep neural networks, which follows the framework of [BreslerNagaraj:2020], while we achieve a higher-order convergence rate and the prefactor is dimension-free.

Lemma 3.7.

Let $L$ be a positive integer and let $f\in\mathscr{B}^{s}(\mathbb{R}^{d})$ with $0<sL\leq 1/2$ and $\mathrm{supp}\;\widehat{f}\subset\left\{\,\xi\in\mathbb{R}^{d}\,\mid\,\|\,\xi\,\|_{1}\geq 1\,\right\}$. For any positive integer $N$, there exists an $(L,N)$-network $f_{N}$ such that

\[
\|\,f-f_{N}\,\|_{L^{2}(\Omega)}\leq\dfrac{22\upsilon_{f,s,\Omega}}{N^{sL}}.\tag{3.4}
\]
Proof.

Assume first that $f$ is real-valued. By Lemma 2.1, for $f\in\mathscr{B}^{s}(\mathbb{R}^{d})$,

\[
f(x)=\int_{\mathbb{R}^{d}}\widehat{f}(\xi)e^{2\pi i\xi\cdot x}\mathrm{d}\xi=\int_{\mathbb{R}^{d}}\lvert\widehat{f}(\xi)\rvert\cos(2\pi(\xi\cdot x+\theta(\xi)))\mathrm{d}\xi,
\]

with a proper choice of $\theta(\xi)$ such that $0\leq\xi\cdot x+\theta(\xi)\leq\|\,\xi\,\|_{1}+1$. For fixed $\xi$, choose $n_{\xi}=2^{L-1}\lceil(\|\,\xi\,\|_{1}+1)^{1/L}\rceil^{L}$ and $t_{\xi}(x)=(\xi\cdot x+\theta(\xi))/n_{\xi}$. Then $0\leq t_{\xi}(x)\leq 1$ and, by Lemma 3.6,

\[
f(x)=\int_{\mathbb{R}^{d}}\lvert\widehat{f}(\xi)\rvert\cos(2\pi n_{\xi}t_{\xi}(x))\mathrm{d}\xi=\dfrac{\pi}{2}\int_{\mathbb{R}^{d}}\lvert\widehat{f}(\xi)\rvert\mathrm{d}\xi\int_{0}^{1}\cos(\pi r)\gamma_{,n_{\xi}}(t_{\xi}(x),r)\mathrm{d}r.
\]

Define the probability measure

\[
\mu(\mathrm{d}\xi,\mathrm{d}r)=\dfrac{1}{Q}\|\,\xi\,\|_{1}^{-s}\lvert\widehat{f}(\xi)\rvert\chi_{(0,1)}(r)\mathrm{d}\xi\mathrm{d}r,\tag{3.5}
\]

where $Q$ is the normalizing factor

\[
Q=\int_{\mathbb{R}^{d}}\|\,\xi\,\|_{1}^{-s}\lvert\widehat{f}(\xi)\rvert\mathrm{d}\xi\int_{0}^{1}\mathrm{d}r\leq\upsilon_{f,s,\Omega}.
\]

Therefore $f(x)=\mathbb{E}_{(\xi,r)\sim\mu}F(x,\xi,r)$ with

\[
F(x,\xi,r)=\dfrac{\pi Q}{2}\|\,\xi\,\|_{1}^{s}\cos(\pi r)\gamma_{,n_{\xi}}(t_{\xi}(x),r).
\]

If $\{\xi_{i},r_{i}\}_{i=1}^{m}$ is an i.i.d. sequence of random samples from $\mu$, and

\[
\tilde{f}=\dfrac{1}{m}\sum_{i=1}^{m}F(x,\xi_{i},r_{i}),
\]

then using Fubini's theorem, we obtain

\[
\begin{aligned}
\mathbb{E}_{(\xi_{i},r_{i})\sim\mu}\|\,f-\tilde{f}\,\|_{L^{2}(\Omega)}^{2}={}&\int_{\Omega}\mathbb{E}_{(\xi_{i},r_{i})\sim\mu}\lvert\mathbb{E}_{(\xi,r)\sim\mu}F(x,\xi,r)-\tilde{f}(x)\rvert^{2}\mathrm{d}x\\
={}&\dfrac{1}{m}\int_{\Omega}\mathrm{Var}_{(\xi,r)\sim\mu}F(x,\xi,r)\mathrm{d}x\\
\leq{}&\dfrac{1}{m}\mathbb{E}_{(\xi,r)\sim\mu}\|\,F(\cdot,\xi,r)\,\|_{L^{\infty}(\Omega)}^{2}.
\end{aligned}
\]

Noting that

\[
\|\,F(\cdot,\xi,r)\,\|_{L^{\infty}(\Omega)}\leq\dfrac{\pi Q}{2}\|\,\xi\,\|_{1}^{s},
\]

we obtain

\[
\mathbb{E}_{(\xi_{i},r_{i})\sim\mu}\|\,f-\tilde{f}\,\|_{L^{2}(\Omega)}^{2}\leq\dfrac{1}{m}\mathbb{E}_{(\xi,r)\sim\mu}\|\,F(\cdot,\xi,r)\,\|_{L^{\infty}(\Omega)}^{2}\leq\dfrac{\pi^{2}Q\upsilon_{f,s,\Omega}}{4m}.
\]

By Markov's inequality, with probability at least $(1+\varepsilon)/(2+\varepsilon)$, for some $\varepsilon>0$ to be chosen later on, we obtain

\[
\|\,f-\tilde{f}\,\|_{L^{2}(\Omega)}^{2}\leq\dfrac{(2+\varepsilon)\pi^{2}Q\upsilon_{f,s,\Omega}}{4m}.\tag{3.6}
\]

It remains to calculate the number of units in each layer. For each $\gamma_{,n_{\xi}}(t_{\xi}(x),r)$, choose $n_{1}=\dots=n_{L}=\lceil(\|\,\xi\,\|_{1}+1)^{1/L}\rceil$. Then $n_{\xi}=2^{L-1}n_{1}\dots n_{L}$, and by Lemma 3.5, $\gamma_{,n_{\xi}}(\cdot,r)=\gamma_{,n_{L}}(\cdot,r)\circ\beta_{,n_{L-1}}\circ\dots\circ\beta_{,n_{1}}$ on $[0,1]$. Lemma 3.2 shows that the Heaviside function $\chi_{[0,\infty)}$ can be approximated by $\sigma$, and we need at most

\[
\max\{3n_{1},\dots,3n_{L-1},4n_{L}\}\leq 4\lceil(\|\,\xi\,\|_{1}+1)^{1/L}\rceil\leq 12\|\,\xi\,\|_{1}^{1/L}
\]

units in each layer to represent $\gamma_{,n_{\xi}}(t_{\xi}(x),r)$. Denote by $N$ the total number of units in each layer. Then $N\leq 12\sum_{i=1}^{m}\|\,\xi_{i}\,\|_{1}^{1/L}$ and, since $2sL\leq 1$,

\[
\mathbb{E}_{(\xi_{i},r_{i})\sim\mu}N^{2sL}\leq 12\sum_{i=1}^{m}\mathbb{E}_{(\xi_{i},r_{i})\sim\mu}\|\,\xi_{i}\,\|_{1}^{2s}\leq\dfrac{12m\upsilon_{f,s,\Omega}}{Q}.
\]

Again, by Markov's inequality, with probability at least $(1+\varepsilon)/(2+\varepsilon)$, we obtain

\[\dfrac{Q}{m}\leq\dfrac{12(2+\varepsilon)\upsilon_{f,s,\Omega}}{N^{2sL}}.\tag{3.7}\]

Combining (3.6) and (3.7), with probability at least $\varepsilon/(2+\varepsilon)$, there exists an $(L,N)$-network $f_{N}$ such that

\[\|\,f-f_{N}\,\|_{L^{2}(\Omega)}\leq\dfrac{\sqrt{3}(2+\varepsilon)\pi\upsilon_{f,s,\Omega}}{N^{sL}}\leq\dfrac{11\upsilon_{f,s,\Omega}}{N^{sL}},\]

with a proper choice of $\varepsilon$ in the last step. Finally, if $f$ is complex-valued, we approximate the real and imaginary parts of the function separately to obtain (3.4). ∎
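The numerical constants used in this proof can be checked directly. The following is a minimal sketch, not part of the original argument; the function names are hypothetical and only the displayed inequalities are taken from the text.

```python
import math

def units_per_layer(xi_norm1: float, L: int) -> int:
    """Width 4*ceil((||xi||_1 + 1)^(1/L)) appearing in the unit count."""
    return 4 * math.ceil((xi_norm1 + 1.0) ** (1.0 / L))

def check_width_bound(xi_norm1: float, L: int) -> bool:
    """True if 4*ceil((a+1)^(1/L)) <= 12*a^(1/L) for a = ||xi||_1 >= 1."""
    return units_per_layer(xi_norm1, L) <= 12.0 * xi_norm1 ** (1.0 / L)

def final_constant(eps: float) -> float:
    """Constant sqrt(3)*(2+eps)*pi in the last display of the proof."""
    return math.sqrt(3.0) * (2.0 + eps) * math.pi

def success_probability(eps: float) -> float:
    """Union bound: (3.6) and (3.7) hold together with probability >= eps/(2+eps)."""
    p_each = (1.0 + eps) / (2.0 + eps)
    return 1.0 - 2.0 * (1.0 - p_each)
```

For instance, `final_constant(0.02)` stays below 11, which is one admissible choice of $\varepsilon$ in the last step.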

Remark 3.8.

We assume $\mathrm{supp}\;\widehat{f}\subset\left\{\,\xi\in\mathbb{R}^{d}\,\mid\,\|\,\xi\,\|_{1}\geq 1\,\right\}$ in Lemma 3.7 because we want to obtain an upper bound depending only on $\upsilon_{f,s,\Omega}$. If we give up this condition, then the upper bound in (3.4) changes to $C\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}/N^{sL}$ for some dimension-free constant $C$. The proof is essentially the same provided that the probability measure (3.5) is replaced by

\[\mu(\mathrm{d}\xi,\mathrm{d}r)=\dfrac{1}{Q}(1+\|\,\xi\,\|_{1})^{-s}\lvert\widehat{f}(\xi)\rvert\chi_{(0,1)}(r)\mathrm{d}\xi\mathrm{d}r.\]

We leave the details to the interested reader.

There is relatively little work on the approximation rate of deep neural networks with the spectral Barron space as the target space. For deep ReLU networks, [BreslerNagaraj:2020] proved approximation results of order $sL/2$. We show below that this may be improved to order $sL$, at the cost that $\upsilon_{f,s,\Omega}$ appears in the estimate.

Theorem 3.9.

Let $L$ be a positive integer and $f\in\mathscr{B}^{s}(\mathbb{R}^{d})$ with $0<sL\leq 1/2$. For any positive integer $N$ there exists an $(L,N+2)$-network $f_{N}$ such that

\[\|\,f-f_{N}\,\|_{L^{2}(\Omega)}\leq\dfrac{29\upsilon_{f,s,\Omega}}{N^{sL}}.\tag{3.8}\]

Moreover, if $f$ is a real-valued function, then the connection weights in $f_{N}$ are all real.

Proof.

We may write $f=f_{1}+f_{2}$ with

\[f_{1}(x)=\int_{\|\,\xi\,\|_{1}<1}\widehat{f}(\xi)e^{2\pi i\xi\cdot x}\mathrm{d}\xi,\qquad f_{2}(x)=\int_{\|\,\xi\,\|_{1}\geq 1}\widehat{f}(\xi)e^{2\pi i\xi\cdot x}\mathrm{d}\xi.\]

Then $\upsilon_{f_{1},1,\Omega}\leq\upsilon_{f,s,\Omega}$ and $\upsilon_{f_{2},s,\Omega}\leq\upsilon_{f,s,\Omega}$ because

\[\widehat{f}_{1}(\xi)=\widehat{f}(\xi)\chi_{[0,1)}(\|\,\xi\,\|_{1})\qquad\text{and}\qquad\widehat{f}_{2}(\xi)=\widehat{f}(\xi)\chi_{[1,\infty)}(\|\,\xi\,\|_{1}).\]

We approximate $f_{1}$ by an $(L,n_{1})$-network with $n_{1}=\lceil N/6\rceil$. By Lemma 3.3, there exists a $(1,n_{1})$-network $f_{1,n_{1}}$ such that

\[\|\,f_{1}-f_{1,n_{1}}\,\|_{L^{2}(\Omega)}\leq\dfrac{2\upsilon_{f_{1},1,\Omega}}{n_{1}^{1/2}}\leq\dfrac{2\sqrt{6}\upsilon_{f,s,\Omega}}{N^{sL}},\]

where the last step uses $n_{1}\geq N/6$ and $sL\leq 1/2$.

Note that a $(1,n_{1})$-network can be represented by an $(L,n_{1})$-network: it suffices to fill the remaining hidden layers with the identity map, written as

\[t=\begin{cases}\text{ReLU}(t),&t\geq 0,\\ -\text{ReLU}(-t),&t<0.\end{cases}\]
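The padding step can be illustrated numerically. The sketch below is only an illustration under the assumption that a single real value is carried through the extra layers; the helper names are hypothetical. It uses the equivalent one-line form $t=\text{ReLU}(t)-\text{ReLU}(-t)$ of the case distinction.

```python
def relu(t: float) -> float:
    """Rectified linear unit."""
    return max(t, 0.0)

def identity_via_relu(t: float) -> float:
    """One hidden layer reproducing t exactly: t = ReLU(t) - ReLU(-t)."""
    return relu(t) - relu(-t)

def pad_to_depth(value: float, extra_layers: int) -> float:
    """Carry a value unchanged through `extra_layers` additional ReLU layers."""
    for _ in range(extra_layers):
        value = identity_via_relu(value)
    return value
```

The value is reproduced exactly, so the padding does not affect the approximation error.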

Meanwhile, we approximate $f_{2}$ by an $(L,n_{2})$-network with $n_{2}=\lceil 5N/6\rceil$. By Lemma 3.7, there exists an $(L,n_{2})$-network $f_{2,n_{2}}$ such that

\[\|\,f_{2}-f_{2,n_{2}}\,\|_{L^{2}(\Omega)}\leq\dfrac{22\upsilon_{f_{2},s,\Omega}}{n_{2}^{sL}}\leq\dfrac{22\sqrt{6/5}\,\upsilon_{f,s,\Omega}}{N^{sL}}.\]

These together with the triangle inequality give the estimate (3.8), and the total number of units in each layer is

\[n_{1}+n_{2}=\lceil N/6\rceil+\lceil 5N/6\rceil\leq N+2.\]

If $f$ is a real-valued function, then we let $f_{N}=\mathrm{Re}(f_{1,n_{1}}+f_{2,n_{2}})$, and the upper bound (3.8) still holds. ∎
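The width count $\lceil N/6\rceil+\lceil 5N/6\rceil\leq N+2$ is elementary; a short check in integer arithmetic, with hypothetical helper names, might look as follows.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a/b for positive integers, without floating point."""
    return -(-a // b)

def split_widths(N: int):
    """Widths ceil(N/6) and ceil(5N/6) of the two sub-networks."""
    return ceil_div(N, 6), ceil_div(5 * N, 6)

def total_width(N: int) -> int:
    """Sum of the two widths, to be compared with N + 2."""
    n1, n2 = split_widths(N)
    return n1 + n2
```

For example, `split_widths(7)` gives the pair `(2, 6)`, whose sum 8 does not exceed 9.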

As far as we know, the above theorem is the best available in the literature so far. For shallow neural networks ($L=1$), the authors in [MengMing:2022] proved the $1/2$-convergence rate with $\mathscr{B}^{1/2}_{p}(\mathbb{R}^{d})$ as the target function space, which is smaller than $\mathscr{B}^{1/2}(\mathbb{R}^{d})$, and their estimate depends on the dimension as $d^{1/4}$. The upper bound in [Siegel:2022] depends on $\upsilon_{f,0}+\upsilon_{f,1/2}$, while (2.5) exemplifies that $\upsilon_{f,0}$ may be much larger than $\upsilon_{f,s}$ for some functions in $\mathscr{B}^{s}(\mathbb{R}^{d})$, and the estimate depends on the dimension exponentially. In contrast to these two results, the upper bound in Theorem 3.9 depends only on $\upsilon_{f,s,\Omega}$ and is independent of the dimension.

For deep neural networks, a similar result for ReLU was proved in [BreslerNagaraj:2020] with order $sL/2$, which is not optimal compared with our estimate. At first glance, our result may seem to contradict [BreslerNagaraj:2020, Theorem 2]. This is not the case, because the upper bound therein is $\sqrt{\upsilon_{f,0}\upsilon_{f,s}}+\upsilon_{f,0}$, which requires $f\in\mathscr{B}^{s}(\mathbb{R}^{d})$ but is usually smaller than $\|\,f\,\|_{\mathscr{B}^{s}(\mathbb{R}^{d})}$ for oscillatory functions; cf. Lemma A.1.

Remark 3.10.

The activation function of the last hidden layer of the $(L,N)$-network in Theorem 3.9 may be replaced by many other familiar activation functions, such as the hyperbolic tangent, SoftPlus, ELU, Leaky ReLU, $\text{ReLU}^{k}$ and so on, because all these activation functions can be reduced to sigmoidal functions by a suitable shifting and scaling argument; e.g., for SoftPlus, we observe that $\text{SoftPlus}(t)-\text{SoftPlus}(t-1)$ is a sigmoidal function. Unfortunately, it is not easy to replace the ReLU in the first $L-1$ hidden layers by other activation functions.

In what follows, we show that Theorem 3.9 is sharp if the activation function of the last hidden layer is the Heaviside function. The example is adapted from [BreslerNagaraj:2020]. We state the result briefly for completeness and postpone the proof to Appendix A.6.

Theorem 3.11.

For any fixed positive integers $L,N$ and real numbers $\varepsilon,s$ with $0<\varepsilon,\,sL\leq 1/2$, there exists $f\in\mathscr{B}^{s}(\mathbb{R}^{d})$ satisfying $\upsilon_{f,s,\Omega}\leq 1+\varepsilon$ such that for any $(L,N)$-network $f_{N}$ whose activation function $\sigma$ in the last layer is the Heaviside function $\chi_{[0,\infty)}$, there holds

\[\|\,f-f_{N}\,\|_{L^{2}(\Omega)}\geq\dfrac{1-\varepsilon}{8N^{sL}}.\tag{3.9}\]

4. Conclusion

We discuss the analytical properties of the spectral Barron space. Sharp embeddings between the spectral Barron spaces and various classical function spaces have been established. An approximation rate has been proved for deep ReLU neural networks when the spectral Barron space with a small smoothness index is employed as the target function space. Several problems remain open, such as the sup-norm error, higher-order convergence results for larger $s$, the relations among Barron-type spaces, the variational space and the Radon bounded variation space, and how these spaces are related to the classical function spaces. These will be pursued in subsequent work.

References

Appendix A Some proof details

A.1. Proof for (2.3) and (2.4)

Proof.

Note that $\widehat{\phi}_{R}$ is a radial function. By Lemma 2.1 and $\widehat{\phi}_{R}\in L^{1}(\mathbb{R}^{d})$,

\[\begin{aligned}\phi_{R}(x)&=\int_{B_{R}}\left(1-\dfrac{\lvert\xi\rvert^{2}}{R^{2}}\right)^{\delta}e^{2\pi ix\cdot\xi}\mathrm{d}\xi\\&=\int_{-R}^{R}e^{2\pi i\lvert x\rvert\xi_{1}}\mathrm{d}\xi_{1}\int_{\xi_{2}^{2}+\dots+\xi_{d}^{2}<R^{2}-\xi_{1}^{2}}\left(1-\dfrac{\lvert\xi\rvert^{2}}{R^{2}}\right)^{\delta}\mathrm{d}\xi_{2}\dots\mathrm{d}\xi_{d}.\end{aligned}\]

Performing the polar transformation and changing the variable $t=r^{2}/(R^{2}-\xi_{1}^{2})$, we obtain

\[\begin{aligned}&\int_{\xi_{2}^{2}+\dots+\xi_{d}^{2}<R^{2}-\xi_{1}^{2}}\left(1-\dfrac{\lvert\xi\rvert^{2}}{R^{2}}\right)^{\delta}\mathrm{d}\xi_{2}\dots\mathrm{d}\xi_{d}\\&\quad=\omega_{d-2}\int_{0}^{\sqrt{R^{2}-\xi_{1}^{2}}}r^{d-2}\left(1-\dfrac{\xi_{1}^{2}+r^{2}}{R^{2}}\right)^{\delta}\mathrm{d}r\\&\quad=\dfrac{\omega_{d-2}}{2}R^{d-1}\left(1-\dfrac{\xi_{1}^{2}}{R^{2}}\right)^{\delta+(d-1)/2}\int_{0}^{1}t^{(d-3)/2}(1-t)^{\delta}\mathrm{d}t\\&\quad=\dfrac{\omega_{d-2}}{2}R^{d-1}\left(1-\dfrac{\xi_{1}^{2}}{R^{2}}\right)^{\delta+(d-1)/2}B\left(\dfrac{d-1}{2},\delta+1\right).\end{aligned}\]

Substituting this equation into the previous one and changing the variable $\xi_{1}=R\cos\theta$, we get

\[\begin{aligned}\phi_{R}(x)&=\dfrac{\omega_{d-2}}{2}R^{d-1}B\left(\dfrac{d-1}{2},\delta+1\right)\int_{-R}^{R}\left(1-\dfrac{\xi_{1}^{2}}{R^{2}}\right)^{\delta+(d-1)/2}e^{2\pi i\lvert x\rvert\xi_{1}}\mathrm{d}\xi_{1}\\&=\dfrac{\pi^{(d-1)/2}\Gamma(\delta+1)R^{d}}{\Gamma(\delta+(d+1)/2)}\int_{0}^{\pi}\cos(2\pi\lvert x\rvert R\cos\theta)\sin^{2\delta+d}\theta\,\mathrm{d}\theta\\&=\dfrac{\Gamma(\delta+1)}{\pi^{\delta}\lvert x\rvert^{\delta+d/2}}R^{-\delta+d/2}J_{\delta+d/2}(2\pi\lvert x\rvert R),\end{aligned}\]

where we have used

\[J_{\nu}(x)=\dfrac{(x/2)^{\nu}}{\pi^{1/2}\Gamma(\nu+1/2)}\int_{0}^{\pi}\cos(x\cos\theta)\sin^{2\nu}\theta\,\mathrm{d}\theta,\qquad\nu>-\dfrac{1}{2},\]

with $\nu=\delta+d/2$ in the last step. The above integral representation of the Bessel function of the first kind may be found in [Luke:1962, § 1.4.5, Eq. (4)].

It remains to prove (2.4). For $s\geq 0$, a direct calculation gives

\[\begin{aligned}\upsilon_{\phi_{R},s}&=\int_{B_{R}}\lvert\xi\rvert^{s}\left(1-\dfrac{\lvert\xi\rvert^{2}}{R^{2}}\right)^{\delta}\mathrm{d}\xi=\omega_{d-1}\int_{0}^{R}r^{s+d-1}\left(1-\dfrac{r^{2}}{R^{2}}\right)^{\delta}\mathrm{d}r\\&=\dfrac{\omega_{d-1}}{2}R^{s+d}\int_{0}^{1}t^{(s+d)/2-1}(1-t)^{\delta}\mathrm{d}t\\&=\dfrac{\omega_{d-1}}{2}B\left(\dfrac{s+d}{2},\delta+1\right)R^{s+d}.\end{aligned}\]

Therefore $\phi_{R}\in\mathscr{B}^{s}(\mathbb{R}^{d})$. ∎
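The closed form for $\upsilon_{\phi_{R},s}$ can be compared with a direct radial quadrature. The following is a rough sketch, assuming that $\omega_{d-1}=2\pi^{d/2}/\Gamma(d/2)$ denotes the surface area of the unit sphere in $\mathbb{R}^{d}$; the function names and the midpoint rule are choices made here for illustration only.

```python
import math

def sphere_area(d: int) -> float:
    """omega_{d-1}: surface area of the unit sphere in R^d (assumed convention)."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)

def beta_fn(a: float, b: float) -> float:
    """Euler Beta function B(a, b) via Gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def spectral_norm_closed_form(d: int, s: float, delta: float, R: float) -> float:
    """(omega_{d-1}/2) * B((s+d)/2, delta+1) * R^(s+d)."""
    return 0.5 * sphere_area(d) * beta_fn((s + d) / 2.0, delta + 1.0) * R ** (s + d)

def spectral_norm_quadrature(d: int, s: float, delta: float, R: float,
                             steps: int = 20000) -> float:
    """Midpoint rule for omega_{d-1} * int_0^R r^(s+d-1) (1 - r^2/R^2)^delta dr."""
    h = R / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        total += r ** (s + d - 1) * (1.0 - (r / R) ** 2) ** delta
    return sphere_area(d) * total * h
```

The two functions should agree up to quadrature error for parameters such as $d=3$, $s=1/2$, $\delta=1$, $R=2$.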

A.2. Proof for (2.14)

Proof.

To prove (2.14), we start with the following representation formula. If $\widehat{f}\in L^{1}(\mathbb{R}^{d})$ is a radial function with $\widehat{f}(\xi)=g_{0}(\lvert\xi\rvert)$, then

\[f(x)=\dfrac{2\pi}{\lvert x\rvert^{d/2-1}}\int_{0}^{\infty}g_{0}(r)r^{d/2}J_{d/2-1}(2\pi\lvert x\rvert r)\mathrm{d}r.\tag{A.1}\]

If $d=1$, then using Lemma 2.1, we obtain

\[f(x)=\int_{\mathbb{R}}g_{0}(\lvert\xi\rvert)e^{2\pi ix\xi}\mathrm{d}\xi=2\int_{0}^{\infty}g_{0}(r)\cos(2\pi\lvert x\rvert r)\mathrm{d}r,\]

which gives (A.1), where we have used the relation [Luke:1962, § 1.4.6, Eq. (7)]

\[J_{-1/2}(x)=\sqrt{\dfrac{2}{\pi x}}\cos(x)\]

in the last step.

For $d\geq 2$, combining Lemma 2.1 and [Stein:1971, Ch. IV, Theorem 3.3], we obtain (A.1), which immediately implies

\[\begin{aligned}f_{p}(x)&=\dfrac{2\pi}{\lvert x\rvert^{d/2-1}}\int_{0}^{1}r^{d(1/2-1/p^{\prime})}J_{d/2-1}(2\pi\lvert x\rvert r)\mathrm{d}r\\&=\omega_{d-1}\int_{0}^{1}r^{d/p-1}{}_{0}F_{1}(;d/2;-\pi^{2}\lvert x\rvert^{2}r^{2})\mathrm{d}r,\end{aligned}\]

where we have used the relation [Luke:1962, § 1.4.1, Eq. (1)]

\[J_{\nu}(x)=\dfrac{(x/2)^{\nu}}{\Gamma(\nu+1)}{}_{0}F_{1}(;\nu+1;-x^{2}/4)\]

in the last step. Changing the variable $t=r^{2}$ and using the identity [Luke:1962, § 1.3.2, Eq. (2)]

\[{}_{1}F_{2}(\rho;\rho+\sigma,\beta;x)=\dfrac{1}{B(\rho,\sigma)}\int_{0}^{1}t^{\rho-1}(1-t)^{\sigma-1}{}_{0}F_{1}(;\beta;xt)\mathrm{d}t,\]

we get (2.14). ∎
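The two Bessel relations quoted from [Luke:1962] are consistent with each other, which can be seen numerically from the power series of ${}_{0}F_{1}$. The sketch below truncates that series; the truncation length and the function names are assumptions of this illustration.

```python
import math

def hyp0f1(b: float, z: float, terms: int = 80) -> float:
    """Truncated series 0F1(; b; z) = sum_k z^k / ((b)_k k!)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # Ratio of consecutive terms: z / ((b + k) (k + 1)).
        term *= z / ((b + k) * (k + 1))
    return total

def bessel_j(nu: float, x: float) -> float:
    """J_nu(x), x > 0, via (x/2)^nu / Gamma(nu+1) * 0F1(; nu+1; -x^2/4)."""
    return (x / 2.0) ** nu / math.gamma(nu + 1.0) * hyp0f1(nu + 1.0, -x * x / 4.0)

def j_minus_half(x: float) -> float:
    """Closed form J_{-1/2}(x) = sqrt(2/(pi x)) cos(x)."""
    return math.sqrt(2.0 / (math.pi * x)) * math.cos(x)
```

For moderate arguments, `bessel_j(-0.5, x)` and `j_minus_half(x)` agree to near machine precision, matching the $d=1$ case of (A.1).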

A.3. Proof for Lemma 2.8

Proof.

The “if”-part is standard by [Triebel:1983, § 2.3.2, Proposition 2; § 2.7.1, Theorem]. We illustrate the “only if”-part with an example, which is taken from [Triebel:1983, § 2.3.9, Proof of Theorem].

Let $f_{0}\in\mathscr{S}(\mathbb{R}^{d})$ with $\mathrm{supp}(\widehat{f}_{0})\subset\left\{\,x\in\mathbb{R}^{d}\,\mid\,1\leq\lvert x\rvert\leq 3/2\,\right\}$. Let $f_{n}(x)=f_{0}(2^{n}x)$ for an integer $n$; then

\[\widehat{f}_{n}(\xi)=2^{-dn}\widehat{f}_{0}(2^{-n}\xi)\qquad\text{and}\qquad\mathrm{supp}(\widehat{f}_{n})\subset\left\{\,x\in\mathbb{R}^{d}\,\mid\,2^{n}\leq\lvert x\rvert\leq 3\times 2^{n-1}\,\right\}.\]

Choose a suitable family $\{\varphi_{j}\}_{j=0}^{\infty}\subset\mathscr{S}(\mathbb{R}^{d})$ in the definition of the Besov space such that $\varphi_{0}(x)=1$ when $\lvert x\rvert\leq 3/2$ and $\varphi_{j}=1$ on $\mathrm{supp}(\widehat{f}_{j})$ for $j\geq 1$; then

\[\mathrm{supp}(\widehat{f}_{n})\cap\mathrm{supp}(\varphi_{j})=\emptyset\quad\text{if}\quad n\geq 0\quad\text{and}\quad n\neq j.\]

A direct calculation gives that

\[(\varphi_{j}\widehat{f}_{n})^{\vee}=\delta_{0j}f_{n}\quad\text{if}\quad n\leq 0\qquad\text{and}\qquad(\varphi_{j}\widehat{f}_{n})^{\vee}=\delta_{jn}f_{n}\quad\text{if}\quad n>0.\]

By definition, when $n<0$,

\[\|\,f_{n}\,\|_{B_{p,q}^{\alpha}(\mathbb{R}^{d})}=\|\,f_{n}\,\|_{L^{p}(\mathbb{R}^{d})}=2^{-dn/p}\|\,f_{0}\,\|_{L^{p}(\mathbb{R}^{d})}.\]

Letting $n\to-\infty$ in the embedding relation $B_{p_{1},q_{1}}^{\alpha_{1}}(\mathbb{R}^{d})\hookrightarrow B_{p_{2},q_{2}}^{\alpha_{2}}(\mathbb{R}^{d})$ yields $p_{1}\leq p_{2}$. Similarly, when $n>0$,

\[\|\,f_{n}\,\|_{B_{p,q}^{\alpha}(\mathbb{R}^{d})}=2^{\alpha n}\|\,f_{n}\,\|_{L^{p}(\mathbb{R}^{d})}=2^{(\alpha-d/p)n}\|\,f_{0}\,\|_{L^{p}(\mathbb{R}^{d})}.\]

Letting $n\to+\infty$ in the embedding relation implies $\alpha_{1}-d/p_{1}\geq\alpha_{2}-d/p_{2}$. Finally, if $\alpha_{1}-d/p_{1}=\alpha_{2}-d/p_{2}$, then $q_{1}\leq q_{2}$, as proved in [Triebel:1995, Theorem 3.2.1]. ∎

A.4. Proof for (2.16)

Proof.

If $1\leq p<\infty$ and $\alpha>0$, then $B_{p,1}^{\alpha}(\mathbb{R}^{d})$ is equivalent to the space defined in [Triebel:1983, § 2.5.7, Theorem],

\[\Lambda_{p,1}^{\alpha}(\mathbb{R}^{d}){:}=\left\{\,f\in W^{[\alpha],p}(\mathbb{R}^{d})\,\mid\,\int_{\mathbb{R}^{d}}\dfrac{\|\,\Delta_{h}^{2}(\nabla^{[\alpha]}f)\,\|_{L^{p}(\mathbb{R}^{d})}}{\lvert h\rvert^{d+\{\alpha\}}}\mathrm{d}h<\infty\,\right\},\]

equipped with the norm

\[\|\,f\,\|_{\Lambda_{p,1}^{\alpha}(\mathbb{R}^{d})}{:}=\|\,f\,\|_{W^{[\alpha],p}(\mathbb{R}^{d})}+\int_{\mathbb{R}^{d}}\dfrac{\|\,\Delta_{h}^{2}(\nabla^{[\alpha]}f)\,\|_{L^{p}(\mathbb{R}^{d})}}{\lvert h\rvert^{d+\{\alpha\}}}\mathrm{d}h,\]

where $\alpha=[\alpha]+\{\alpha\}$ with integer $[\alpha]$ and $0<\{\alpha\}\leq 1$, and $\Delta_{h}^{2}f(x)=f(x+2h)-2f(x+h)+f(x)$ [Triebel:1983, § 2.2.2, Eq. (9)].

For any nonnegative integer $k$, a direct calculation gives

\[\nabla^{k}\psi_{n}(x)=\sum_{j=0}^{k}\dfrac{c_{j}x^{\beta_{j}}}{(1+in)^{d/2+j}}e^{-\pi\lvert x\rvert^{2}/(1+in)}\]

for some constants $\{c_{j}\}_{j=0}^{k}$, where the multi-index $\beta_{j}=(\beta_{j1},\dots,\beta_{jd})$ satisfies $\lvert\beta_{j}\rvert\leq j$ and $x^{\beta_{j}}=x_{1}^{\beta_{j1}}\dots x_{d}^{\beta_{jd}}$. Then

\[\|\,\nabla^{k}\psi_{n}\,\|_{L^{p}(\mathbb{R}^{d})}\leq C\sum_{j=0}^{k}\dfrac{1}{(1+n^{2})^{d/4+j/2}}\|\,\lvert x\rvert^{\lvert\beta_{j}\rvert}e^{-\pi\lvert x\rvert^{2}/(1+n^{2})}\,\|_{L^{p}(\mathbb{R}^{d})}.\]

A direct calculation gives

\[\begin{aligned}\int_{\mathbb{R}^{d}}\lvert x\rvert^{\lvert\beta_{j}\rvert p}e^{-\pi p\lvert x\rvert^{2}/(1+n^{2})}\mathrm{d}x&=(1+n^{2})^{(d+\lvert\beta_{j}\rvert p)/2}\int_{\mathbb{R}^{d}}\lvert y\rvert^{\lvert\beta_{j}\rvert p}e^{-\pi p\lvert y\rvert^{2}}\mathrm{d}y\\&=\dfrac{\omega_{d-1}\Gamma\left((\lvert\beta_{j}\rvert p+d)/2\right)}{2(p\pi)^{(\lvert\beta_{j}\rvert p+d)/2}}(1+n^{2})^{(d+\lvert\beta_{j}\rvert p)/2}.\end{aligned}\]

Therefore, there exists $C$ depending only on $d,p,k$ such that

\[\|\,\nabla^{k}\psi_{n}\,\|_{L^{p}(\mathbb{R}^{d})}\leq C\sum_{j=0}^{k}\dfrac{(1+n^{2})^{d/(2p)+\lvert\beta_{j}\rvert/2}}{(1+n^{2})^{d/4+j/2}}\leq C(1+n^{2})^{-d(p-2)/(4p)}.\tag{A.2}\]

If $f\in W^{2}_{p}(\mathbb{R}^{d})$, then

\[\Delta_{h}^{2}f(x)=\int_{0}^{1}\mathrm{d}t\int_{t}^{1+t}\nabla^{2}f(x+sh)\,\mathrm{d}s\;h\cdot h=\int_{0}^{2}\nabla^{2}f(x+sh)\,\mathrm{d}s\int_{\max(s-1,0)}^{\min(s,1)}\mathrm{d}t\;h\cdot h.\]

Therefore,

\[\lvert\Delta_{h}^{2}f(x)\rvert\leq\lvert h\rvert^{2}\int_{0}^{2}\lvert\nabla^{2}f(x+sh)\rvert\mathrm{d}s.\]

By Minkowski's inequality, we obtain

\[\|\,\Delta_{h}^{2}f\,\|_{L^{p}(\mathbb{R}^{d})}\leq\lvert h\rvert^{2}\int_{0}^{2}\|\,\nabla^{2}f(\cdot+sh)\,\|_{L^{p}(\mathbb{R}^{d})}\mathrm{d}s=2\lvert h\rvert^{2}\|\,\nabla^{2}f\,\|_{L^{p}(\mathbb{R}^{d})}.\]

Splitting the integral part of the $\Lambda_{p,1}^{\alpha}$-norm into two parts, we get

\[\begin{aligned}&\int_{\lvert h\rvert<1}\dfrac{\|\,\Delta_{h}^{2}f\,\|_{L^{p}(\mathbb{R}^{d})}}{\lvert h\rvert^{d+\{\alpha\}}}\mathrm{d}h+\int_{\lvert h\rvert>1}\dfrac{\|\,\Delta_{h}^{2}f\,\|_{L^{p}(\mathbb{R}^{d})}}{\lvert h\rvert^{d+\{\alpha\}}}\mathrm{d}h\\&\quad\leq 2\|\,\nabla^{2}f\,\|_{L^{p}(\mathbb{R}^{d})}\int_{\lvert h\rvert<1}\lvert h\rvert^{2-d-\{\alpha\}}\mathrm{d}h+4\|\,f\,\|_{L^{p}(\mathbb{R}^{d})}\int_{\lvert h\rvert>1}\lvert h\rvert^{-d-\{\alpha\}}\mathrm{d}h\\&\quad=2\omega_{d-1}\left(\dfrac{\|\,\nabla^{2}f\,\|_{L^{p}(\mathbb{R}^{d})}}{2-\{\alpha\}}+\dfrac{2\|\,f\,\|_{L^{p}(\mathbb{R}^{d})}}{\{\alpha\}}\right)\\&\quad\leq C\|\,f\,\|_{W^{2}_{p}(\mathbb{R}^{d})}.\end{aligned}\]

Since $\nabla^{[\alpha]}\psi_{n}\in W^{2}_{p}(\mathbb{R}^{d})$, a combination of the above inequality and (A.2) yields

\[\|\,\psi_{n}\,\|_{\Lambda_{p,1}^{\alpha}(\mathbb{R}^{d})}\leq C\|\,\psi_{n}\,\|_{W^{[\alpha]+2}_{p}(\mathbb{R}^{d})}\leq C(1+n^{2})^{-d(p-2)/(4p)},\]

where $C$ is a constant depending on $p,\alpha$ and $d$ but independent of $n$. The same bound holds for $\|\,\psi_{n}\,\|_{B_{p,1}^{\alpha}(\mathbb{R}^{d})}$. ∎
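The integral representation of the second difference used in this proof can be checked in one dimension, where the inner $t$-integral reduces to the triangular weight $\min(s,1)-\max(s-1,0)$. The following is a small sketch with hypothetical function names and a midpoint rule chosen here for simplicity.

```python
import math

def second_difference(f, x: float, h: float) -> float:
    """Delta_h^2 f(x) = f(x + 2h) - 2 f(x + h) + f(x)."""
    return f(x + 2.0 * h) - 2.0 * f(x + h) + f(x)

def triangle_weight(s: float) -> float:
    """Length of the t-interval [max(s-1, 0), min(s, 1)] for s in [0, 2]."""
    return min(s, 1.0) - max(s - 1.0, 0.0)

def second_difference_integral(d2f, x: float, h: float, steps: int = 20000) -> float:
    """Midpoint rule for h^2 * int_0^2 f''(x + s h) * triangle_weight(s) ds."""
    ds = 2.0 / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * ds
        total += d2f(x + s * h) * triangle_weight(s)
    return h * h * total * ds
```

With `f = math.sin` and `d2f = lambda u: -math.sin(u)`, both functions return the same value up to quadrature error, and the absolute value is bounded by $2h^{2}\max\lvert f''\rvert$ as in the estimate above.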

A.5. Proof for Lemma 3.5

Proof.

First we show that $g_{,n}$ is symmetric about $t=1/2$. By definition,

\[g_{,n}(1-t)=g(n(1-t)-j)=g(nt-n+j+1)=g(nt-k)=g_{,n}(t)\]

for some integers $j,k$ satisfying $0\leq j,k\leq n-1$ and $k=n-j-1$.

For a fixed $t\in[0,1]$, there exist integers $j,k$ satisfying $0\leq j\leq 2n_{1}-1$, $0\leq k\leq n_{2}-1$ such that $0\leq 2n_{1}n_{2}t-n_{2}j-k\leq 1$; then $0\leq n_{2}(2n_{1}t-j)-k\leq 1$ and

\[g_{,2n_{1}n_{2}}(t)=g(2n_{1}n_{2}t-n_{2}j-k)=g(n_{2}(2n_{1}t-j)-k)=g_{,n_{2}}(2n_{1}t-j).\]

By definition,

\[\beta_{,n}(t)=\begin{cases}2nt-2j,&0\leq nt-j\leq 1/2,\\ 2+2j-2nt,&1/2\leq nt-j\leq 1,\end{cases}\quad j=0,\dots,n-1.\]

If $j$ is even, then $j=2l$ for some integer $l$ satisfying $0\leq l\leq n_{1}-1$ and $0\leq n_{1}t-l\leq 1/2$. Therefore

\[g_{,n_{2}}(\beta_{,n_{1}}(t))=g_{,n_{2}}(2n_{1}t-2l)=g_{,n_{2}}(2n_{1}t-j).\]

Otherwise $j$ is odd; then $j=2l+1$ for some integer $l$ satisfying $0\leq l\leq n_{1}-1$ and $1/2\leq n_{1}t-l\leq 1$. Therefore, by the symmetry of $g_{,n_{2}}$,

\[g_{,n_{2}}(\beta_{,n_{1}}(t))=g_{,n_{2}}(2+2l-2n_{1}t)=g_{,n_{2}}(1+j-2n_{1}t)=g_{,n_{2}}(2n_{1}t-j).\]

This gives $g_{,n_{2}}\circ\beta_{,n_{1}}=g_{,2n_{1}n_{2}}$ on $[0,1]$. ∎
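The composition identity can be observed numerically. The sketch below is only an illustration: the profile $g(u)=\sin(\pi u)$ is a hypothetical choice made here because it is symmetric about $1/2$, and the function names are not from the paper.

```python
import math

def g(u: float) -> float:
    """Hypothetical profile on [0, 1], symmetric about 1/2 (assumption of this sketch)."""
    return math.sin(math.pi * u)

def periodize(n: int, t: float) -> float:
    """g_{,n}(t) = g(n t - j) for t in [j/n, (j+1)/n], j = 0, ..., n-1."""
    j = min(int(math.floor(n * t)), n - 1)
    return g(n * t - j)

def beta_n(n: int, t: float) -> float:
    """Sawtooth beta_{,n}: 2(nt - j) on the first half of each cell, 2 - 2(nt - j) on the second."""
    j = min(int(math.floor(n * t)), n - 1)
    u = n * t - j
    return 2.0 * u if u <= 0.5 else 2.0 - 2.0 * u

def max_composition_gap(n1: int, n2: int, samples: int = 5000) -> float:
    """Largest sampled value of |g_{,n2}(beta_{,n1}(t)) - g_{,2 n1 n2}(t)| on [0, 1]."""
    return max(
        abs(periodize(n2, beta_n(n1, k / samples)) - periodize(2 * n1 * n2, k / samples))
        for k in range(samples + 1)
    )
```

The gap is at the level of rounding error, so one sawtooth layer of width parameter $n_{1}$ multiplies the number of oscillations by $2n_{1}$.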

A.6. Proof for Theorem 3.11

Lemma A.1.

Given $n,R>0$, let $f(x)=\cos(2\pi nx_{1})e^{-\pi\lvert x\rvert^{2}/R}$. Then $f\in\mathscr{B}^{s}(\mathbb{R}^{d})$ with

\[\upsilon_{f,s,\Omega}\leq\left(n+\dfrac{d}{\pi\sqrt{R}}\right)^{s}\qquad\text{for}\quad 0\leq s\leq 1.\tag{A.3}\]
Proof.

For any $R>0$, the Fourier transform of the dilated Gaussian function $e^{-\pi x_{j}^{2}/R}$ reads

\[\widehat{e^{-\pi x_{j}^{2}/R}}=\sqrt{R}e^{-\pi R\xi_{j}^{2}}.\]

A direct calculation gives

\[\begin{aligned}\widehat{f}(\xi)&=\dfrac{1}{2}\int_{\mathbb{R}}e^{-\pi x_{1}^{2}/R}\Bigl(e^{-2\pi ix_{1}(\xi_{1}-n)}+e^{-2\pi ix_{1}(\xi_{1}+n)}\Bigr)\mathrm{d}x_{1}\prod_{j=2}^{d}\int_{\mathbb{R}}e^{-\pi x_{j}^{2}/R-2\pi ix_{j}\xi_{j}}\mathrm{d}x_{j}\\&=\dfrac{R^{d/2}}{2}\Bigl(e^{-\pi R(\xi_{1}-n)^{2}}+e^{-\pi R(\xi_{1}+n)^{2}}\Bigr)\prod_{j=2}^{d}e^{-\pi R\xi_{j}^{2}}\\&=R^{d/2}e^{-\pi R(\lvert\xi\rvert^{2}+n^{2})}\cosh(2\pi nR\xi_{1}).\end{aligned}\]

It is clear that $f,\widehat{f}\in L^{1}(\mathbb{R}^{d})$ and the pointwise Fourier inversion theorem holds, and

\[\upsilon_{f,0,\Omega}=\int_{\mathbb{R}^{d}}\lvert\widehat{f}(\xi)\rvert\mathrm{d}\xi=\int_{\mathbb{R}^{d}}\widehat{f}(\xi)\mathrm{d}\xi=f(0)=1,\]

where we have used the positivity of $\widehat{f}$.

Next, using the elementary identities

\[\sqrt{R}\int_{\mathbb{R}}e^{-\pi R\xi_{j}^{2}}\mathrm{d}\xi_{j}=1\quad\text{and}\quad\sqrt{R}\int_{\mathbb{R}}\lvert\xi_{j}\rvert e^{-\pi R\xi_{j}^{2}}\mathrm{d}\xi_{j}=\dfrac{1}{\pi\sqrt{R}},\]

we obtain

\[\begin{aligned}\upsilon_{f,1,\Omega}&=R^{d/2}\int_{\mathbb{R}}\lvert\xi_{1}\rvert e^{-\pi R(\xi_{1}-n)^{2}}\mathrm{d}\xi_{1}\prod_{j=2}^{d}\int_{\mathbb{R}}e^{-\pi R\xi_{j}^{2}}\mathrm{d}\xi_{j}\\&\quad+R^{d/2}\int_{\mathbb{R}}e^{-\pi R(\xi_{1}-n)^{2}}\mathrm{d}\xi_{1}\sum_{j=2}^{d}\int_{\mathbb{R}}\lvert\xi_{j}\rvert e^{-\pi R\xi_{j}^{2}}\mathrm{d}\xi_{j}\prod_{k\neq 1,j}\int_{\mathbb{R}}e^{-\pi R\xi_{k}^{2}}\mathrm{d}\xi_{k}\\&\leq\sqrt{R}\int_{\mathbb{R}}(\lvert\xi_{1}-n\rvert+n)e^{-\pi R(\xi_{1}-n)^{2}}\mathrm{d}\xi_{1}+\dfrac{d-1}{\pi\sqrt{R}}\\&=n+\dfrac{d}{\pi\sqrt{R}}.\end{aligned}\]

Using the interpolation inequality (2.11), we obtain (A.3). ∎
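The two elementary Gaussian identities, and the resulting bound on $\upsilon_{f,1,\Omega}$ in the one-dimensional case, can be confirmed by quadrature. The sketch below assumes $d=1$ and truncates the integrals to a finite window; the window size, step count and function names are choices of this illustration.

```python
import math

def midpoint(fn, a: float, b: float, steps: int = 20000) -> float:
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(fn(a + (k + 0.5) * h) for k in range(steps))

def gaussian_mass(R: float, cutoff: float = 12.0) -> float:
    """sqrt(R) * int exp(-pi R xi^2) d xi, truncated to |xi| <= cutoff / sqrt(R)."""
    w = cutoff / math.sqrt(R)
    return math.sqrt(R) * midpoint(lambda xi: math.exp(-math.pi * R * xi * xi), -w, w)

def gaussian_first_moment(R: float, cutoff: float = 12.0) -> float:
    """sqrt(R) * int |xi| exp(-pi R xi^2) d xi on the same window."""
    w = cutoff / math.sqrt(R)
    return math.sqrt(R) * midpoint(
        lambda xi: abs(xi) * math.exp(-math.pi * R * xi * xi), -w, w)

def upsilon_1_dim1(n: float, R: float, cutoff: float = 12.0) -> float:
    """d = 1: int |xi| f_hat(xi) d xi with f_hat the sum of two shifted Gaussians."""
    w = n + cutoff / math.sqrt(R)
    def f_hat(xi: float) -> float:
        return 0.5 * math.sqrt(R) * (math.exp(-math.pi * R * (xi - n) ** 2)
                                     + math.exp(-math.pi * R * (xi + n) ** 2))
    return midpoint(lambda xi: abs(xi) * f_hat(xi), -w, w)
```

For $n=3$ and $R=2$ the computed $\upsilon_{f,1,\Omega}$ is close to $n$ and stays below $n+1/(\pi\sqrt{R})$, in line with (A.3) for $s=1$, $d=1$.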

Proof for Theorem 3.11.

Define $n=2^{L+2}N^{L}$ and $f(x)=n^{-s}\cos(2\pi nx_{1})e^{-\pi\lvert x\rvert^{2}/R}$ with $R$ large enough such that $\upsilon_{f,s,\Omega}\leq 1+\varepsilon$ by Lemma A.1 and $e^{-\pi\lvert x\rvert^{2}/R}\geq 1-\varepsilon$ when $x\in\Omega$. Fix $x_{2},\dots,x_{d}$; then any $(L,N)$-network $f_{N}$ can be viewed as a one-dimensional $(L,N)$-network, i.e., $f_{N}(\cdot,x_{2},\dots,x_{d}):[0,1]\to\mathbb{C}$. Divide $[0,1]$ into $n$ intervals $[j/n,(j+1)/n]$ with $j=0,\dots,n-1$. There exist at least $n-2^{L+1}N^{L}=2^{L+1}N^{L}$ intervals on which $f_{N}$ does not change sign [Telgarsky:2016, Lemma 3.2]. Without loss of generality, we assume $f_{N}(\cdot,x_{2},\dots,x_{d})\geq 0$ on some interval $[j/n,(j+1)/n]$; then

\[\int_{j/n}^{(j+1)/n}(f(x)-f_{N}(x))^{2}\mathrm{d}x_{1}\geq\dfrac{(1-\varepsilon)^{2}}{n^{2s}}\int_{(4j+1)/(4n)}^{(4j+3)/(4n)}\cos^{2}(2\pi nx_{1})\mathrm{d}x_{1}\geq\dfrac{(1-\varepsilon)^{2}}{4n^{2s+1}},\]

because $\cos(2\pi nx_{1})\leq 0$ when $2\pi j+\pi/2\leq 2\pi nx_{1}\leq 2\pi j+3\pi/2$.

Summing over these $n-2^{L+1}N^{L}$ intervals gives

\[\begin{aligned}\|\,f-f_{N}\,\|_{L^{2}(\Omega)}^{2}&\geq\int_{[0,1]^{d-1}}\mathrm{d}x_{2}\dots\mathrm{d}x_{d}\int_{0}^{1}(f(x)-f_{N}(x))^{2}\mathrm{d}x_{1}\\&\geq\dfrac{2^{L+1}N^{L}(1-\varepsilon)^{2}}{4n^{2s+1}}\geq\dfrac{(1-\varepsilon)^{2}}{2^{2sL+4s+3}N^{2sL}}\geq\dfrac{(1-\varepsilon)^{2}}{64N^{2sL}}.\end{aligned}\]

Taking square roots of both sides of the inequality, we obtain (3.9). ∎
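The arithmetic in the last chain of inequalities can be verified for sample parameters. This is a minimal sketch with hypothetical function names; it only evaluates the quantities displayed in the proof.

```python
def lower_bound_squared(L: int, N: int, s: float, eps: float) -> float:
    """2^(L+1) N^L (1-eps)^2 / (4 n^(2s+1)) with n = 2^(L+2) N^L."""
    n = 2 ** (L + 2) * N ** L
    good_intervals = n - 2 ** (L + 1) * N ** L  # equals 2^(L+1) N^L
    per_interval = (1.0 - eps) ** 2 / (4.0 * n ** (2.0 * s + 1.0))
    return good_intervals * per_interval

def claimed_bound_squared(L: int, N: int, s: float, eps: float) -> float:
    """(1-eps)^2 / (64 N^(2sL)), the square of the right-hand side of (3.9)."""
    return (1.0 - eps) ** 2 / (64.0 * N ** (2.0 * s * L))

def bound_holds(L: int, N: int, s: float, eps: float) -> bool:
    """True if the lower bound dominates the claimed one, up to rounding."""
    return lower_bound_squared(L, N, s, eps) >= claimed_bound_squared(L, N, s, eps) * (1.0 - 1e-12)
```

Equality is attained for $L=1$ and $s=1/2$, where the exponent $2sL+4s+3$ equals $6$; for smaller $s$ the inequality is strict.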