

Gaussian and Bootstrap Approximations for Suprema of Empirical Processes

Alexander Giessing Department of Statistics, University of Washington, Seattle, WA. E-mail: [email protected].
Abstract

In this paper we develop non-asymptotic Gaussian approximation results for the sampling distribution of suprema of empirical processes when the indexing function class \mathcal{F}_{n} varies with the sample size n and may not be Donsker. Prior approximations of this type required upper bounds on the metric entropy of \mathcal{F}_{n} and uniform lower bounds on the variance of f\in\mathcal{F}_{n}, both of which limited their applicability to high-dimensional inference problems. In contrast, the results in this paper hold under simpler conditions on boundedness, continuity, and the strong variance of the approximating Gaussian process. The results are broadly applicable and yield a novel procedure for bootstrapping the distribution of empirical process suprema based on the truncated Karhunen-Loève decomposition of the approximating Gaussian process. We demonstrate the flexibility of this new bootstrap procedure by applying it to three fundamental problems in high-dimensional statistics: simultaneous inference on parameter vectors, inference on the spectral norm of covariance matrices, and construction of simultaneous confidence bands for functions in reproducing kernel Hilbert spaces.
 
Keywords: Gaussian Approximation; Gaussian Comparison; Gaussian Process Bootstrap; High-Dimensional Inference.

1 Introduction

1.1 Approximating the sampling distribution of suprema of empirical processes

This paper is concerned with non-asymptotic bounds on the Kolmogorov distance between the sampling distribution of suprema of empirical processes and the distribution of suprema of Gaussian proxy processes. Consider a simple random sample of independent and identically distributed random variables X_{1},\ldots,X_{n} with common law P taking values in the measurable space (S,\mathcal{S}). Let \mathcal{F}_{n} be a class of measurable functions f:S\rightarrow\mathbb{R} and define the empirical process

\mathbb{G}_{n}(f)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big(f(X_{i})-Pf\big),\qquad f\in\mathcal{F}_{n}.

Also, denote by \{G(f):f\in\mathcal{F}_{n}\} a centered Gaussian process with some positive semi-definite covariance function. Within this minimal setup (and a few additional technical assumptions), the paper addresses the problem of deriving non-asymptotic bounds on

\varrho_{n}:=\sup_{s\geq 0}\left|\mathbb{P}\Big\{\sup_{f\in\mathcal{F}_{n}}|\mathbb{G}_{n}(f)|\leq s\Big\}-\mathbb{P}\Big\{\sup_{f\in\mathcal{F}_{n}}|G(f)|\leq s\Big\}\right|, (1)

and, consequently, conditions under which an empirical process indexed by \mathcal{F}_{n} is Gaussian approximable, i.e. \varrho_{n}\rightarrow 0 as n\rightarrow\infty. Evidently, if \mathcal{F}_{n} is a Donsker class, then the empirical process is trivially Gaussian approximable by the centered Gaussian process with covariance function (f,g)\mapsto P(fg)-(Pf)(Pg). Thus, in this paper we are concerned with conditions that are strictly weaker than those that guarantee a central limit theorem.
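As a concrete special case, when \mathcal{F}_{n} is finite the empirical process reduces to a d-dimensional sample mean and \varrho_{n} to a Kolmogorov distance between maximum norms. The following minimal Python sketch (purely illustrative, not part of the paper; the centered-exponential design and AR(1)-type covariance are assumptions made only for this example) estimates \varrho_{n} by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, reps = 200, 50, 2000

# AR(1)-type covariance, positive definite; small jitter for the Cholesky factor.
Sigma = np.array([[0.5 ** abs(j - k) for k in range(d)] for j in range(d)])
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(d))

sn = np.empty(reps)  # replicates of ||S_n||_inf
z = np.empty(reps)   # replicates of ||Z||_inf with Z ~ N(0, Sigma)
for r in range(reps):
    X = rng.exponential(1.0, size=(n, d)) - 1.0  # centered, non-Gaussian, unit variance
    X = X @ L.T                                  # rows now have covariance Sigma
    sn[r] = np.abs(X.sum(axis=0) / np.sqrt(n)).max()
    z[r] = np.abs(L @ rng.standard_normal(d)).max()

# Empirical Kolmogorov distance over a grid of thresholds s >= 0.
grid = np.linspace(0.0, max(sn.max(), z.max()), 400)
rho_hat = max(abs((sn <= s).mean() - (z <= s).mean()) for s in grid)
print("Monte Carlo estimate of the Kolmogorov distance:", rho_hat)
```

With moderate n the estimated distance is already small, consistent with Gaussian approximability of this (finite, hence Donsker) class.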

Gaussian approximation results of this type have gained significant interest recently, as they prove to be effective in tackling high- and infinite-dimensional inference problems. Applications include simultaneous inference on high-dimensional parameters (Dezeure et al., 2017; Zhang and Cheng, 2017), inference on nonparametric models (Chernozhukov et al., 2014a; Chen et al., 2015, 2016), testing for spurious correlations (Fan et al., 2018), testing for shape restrictions (Chetverikov, 2019), inference on large covariance matrices (Chen, 2018; Han et al., 2018; Lopes et al., 2019), goodness-of-fit tests for high-dimensional linear models (Janková et al., 2020), simultaneous confidence bands in functional data analysis (Lopes et al., 2020; Singh and Vijaykumar, 2023), and error quantification in randomized algorithms (Lopes et al., 2023).

The theoretical groundwork on Gaussian approximation was laid in three seminal papers by Chernozhukov et al. (2013, 2014b, 2015). Since then, subsequent theoretical works have developed numerous refinements and extensions (e.g. Chernozhukov et al., 2016, 2019, 2020; Deng and Zhang, 2020; Fang and Koike, 2021; Kuchibhotla et al., 2021; Cattaneo et al., 2022; Lopes, 2022a, b; Bong et al., 2023). In this paper, we resolve two limitations of Chernozhukov et al.’s (2014b) original results that have not been previously addressed:

  • Entropy conditions. The original bounds on the Kolmogorov distance \varrho_{n} depend on the metric entropy of the function class \mathcal{F}_{n}. These upper bounds are non-trivial only if the metric entropy grows at a much slower rate than the sample size n. As a result, most applications only address sparse inference problems involving function classes with either a finite number of functions, discretized functions, or a small VC-index. The new upper bounds in this paper no longer depend on the metric entropy of the function class and therefore open the possibility to tackle non-sparse, high-dimensional inference problems. We call these bounds dimension-/entropy-free.

  • Lower bounds on weak variances. The original bounds on \varrho_{n} require a strictly positive lower bound on the weak variance of the Gaussian proxy process, i.e. \inf_{f\in\mathcal{F}_{n}}\mathrm{Var}\big(G(f)\big)>0. This condition limits the scope of the original results to problems with standardized function classes (studentized statistics) and excludes situations with variance decay, typically observed in non-sparse, high-dimensional problems. The new Gaussian approximation results in this paper only depend on the strong variance \mathrm{Var}\big(\sup_{f\in\mathcal{F}_{n}}|G(f)|\big) and are therefore applicable to a broad range of problems, including those with degenerate distributions. We say that these bounds are weak variance-free.

Even though these limitations (and our solution) are quite technical, the resulting new Gaussian and bootstrap approximations have immediate practical consequences. We present three substantial applications to problems in high-dimensional inference in Section 4 and two toy examples that cast light on why and when the original approximation results by Chernozhukov et al. (2014b) fail and the new results succeed in Appendix A.

Notably, in the special case of inference on spectral norms of covariance matrices, entropy-free bounds on \varrho_{n} already exist. Namely, for \mathcal{F}_{n}=\{x\mapsto f(x)=(x^{\prime}u)(x^{\prime}v):u,v\in S^{d-1}\} with S^{d-1}=\{u\in\mathbb{R}^{d}:\|u\|_{2}=1\}, Lopes et al. (2019, 2020) and Lopes (2022a, b) have devised an approach to bounding \varrho_{n} that combines a specific variance decay assumption with a truncation argument. For a carefully chosen truncation level, the resulting bound on \varrho_{n} depends only on the sample size n and the parameter characterizing the variance decay. Since their approach is intimately related to the bilinearity of the functions in \mathcal{F}_{n} and the specific variance decay assumption, it does not easily extend to arbitrary function classes. We therefore develop a different strategy in this paper.

Furthermore, for finite function classes |\mathcal{F}_{n}|<\infty, Chernozhukov et al. (2020) and Deng and Zhang (2020) have been able to slightly relax the requirements on the weak variance of the Gaussian proxy process. However, their results do not generalize to arbitrary function classes with |\mathcal{F}_{n}|=\infty. Our strategy for replacing the weak variance with the strong variance in the upper bounds on \varrho_{n} is therefore conceptually completely different from theirs. For details, we refer to our companion paper on Gaussian anti-concentration inequalities, Giessing (2023).

1.2 Contributions and overview of the results

This paper consists of three parts which contribute to probability theory (Section 2), mathematical statistics and bootstrap methodology (Section 3), and high-dimensional inference (Section 4). Appendices A–F contain additional supporting results and all proofs.

Section 2 contains the main mathematical innovations of this paper. We establish dimension- and weak variance-free Gaussian and bootstrap approximations for maxima of sums of independent and identically distributed high-dimensional random vectors (Section 2.1). Specifically, we derive a Gaussian approximation inequality (Proposition 1), a Gaussian comparison inequality (Proposition 3), and two bootstrap approximation inequalities (Propositions 4 and 5). At the core of these four theoretical results is a new proof of a multivariate Berry-Esseen-type bound which leverages two new technical auxiliary results: an anti-concentration inequality for suprema of separable Gaussian processes (Lemma 6 in Appendix B.2) and a smoothing inequality for partial derivatives of the Ornstein-Uhlenbeck semigroup associated to a multivariate normal measure (Lemma 2 in Appendix B.1). We conclude this section with a comparison to results from the literature (Section 2.2).

Section 3 contains our contributions to mathematical statistics and bootstrap methodology. The results in this section generalize the four basic results of Section 2 from finite dimensional vectors to empirical processes indexed by totally bounded (and hence separable) function classes. As one would expect, the dimension- (a.k.a. entropy-) and weak variance-freeness of the finite dimensional results carry over to empirical processes. We establish Gaussian approximation inequalities (Section 3.2), Gaussian comparison inequalities (Section 3.3), and an abstract bootstrap approximation inequality (Section 3.4). The latter result motivates a new procedure for bootstrapping the sampling distribution of suprema of empirical processes. We call this procedure the Gaussian process bootstrap (Algorithm 1) and discuss practical aspects of its implementation via the truncated Karhunen-Loève decomposition of a Gaussian proxy process (Section 3.5). We include a selective comparison with results from the literature (Section 3.6).

In Section 4 we showcase the flexibility of the Gaussian process bootstrap by applying it to three fundamental problems in high-dimensional statistics: simultaneous inference on parameter vectors (Section 4.1), inference on the spectral norm of covariance matrices (Section 4.2), and construction of simultaneous confidence bands for functions in reproducing kernel Hilbert spaces (Section 4.3). For each of these three examples we include a brief comparison with alternative bootstrap methods from the literature to explain in what sense our method improves over (or matches with) existing results.

2 Dimension- and weak variance-free results for the maximum norm of sums of i.i.d. random vectors

2.1 Four basic results

We begin with four propositions on non-asymptotic Gaussian and bootstrap approximations for finite-dimensional random vectors. These propositions are our main mathematical contribution and essential for the statistical results to come. In Section 3 we lift these propositions to general empirical processes which are widely applicable to problems in mathematical statistics. Throughout this section, \|x\|_{\infty}=\max_{1\leq k\leq d}|x_{k}| denotes the maximum norm of a vector x\in\mathbb{R}^{d}.

The first result is a Gaussian approximation inequality. This inequality provides a non-asymptotic bound on the Kolmogorov distance between the laws of the maximum norm of sums of independent and identically distributed random vectors and a Gaussian proxy statistic.

Proposition 1 (Gaussian approximation).

Let X,X_{1},\ldots,X_{n}\in\mathbb{R}^{d} be i.i.d. random vectors with mean zero and positive semi-definite covariance matrix \Sigma\neq\mathbf{0}\in\mathbb{R}^{d\times d}. Set S_{n}=n^{-1/2}\sum_{i=1}^{n}X_{i} and Z\sim N(0,\Sigma). Then, for M\geq 0, n\geq 1,

\sup_{s\geq 0}\Big|\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|Z\|_{\infty}\leq s\right\}\Big|
\lesssim\frac{(\mathrm{E}[\|X\|_{\infty}^{3}])^{1/3}}{\sqrt{n^{1/3}\mathrm{Var}(\|Z\|_{\infty})}}+\frac{\mathrm{E}\left[\|X\|_{\infty}^{3}\mathbf{1}\{\|X\|_{\infty}>M\}\right]}{\mathrm{E}\left[\|X\|_{\infty}^{3}\right]}+\frac{\mathrm{E}[\|Z\|_{\infty}]+M}{\sqrt{n\mathrm{Var}(\|Z\|_{\infty})}},

where \lesssim hides an absolute constant independent of n, d, M, and the distribution of the X_{i}'s.

Remark 1 (Extension to independent, non-identically distributed random vectors).

The proof of Proposition 1 involves an inductive argument inspired by Theorem 3.7.1 in Nourdin and Peccati (2012) who attribute it to Bolthausen (1984). This argument relies crucially on independent and identically distributed data. Generalizing it to independent but non-identically distributed data would require additional assumptions on the variances similar to the uniform asymptotic negligibility condition in the classical Lindeberg-Feller CLT for triangular arrays. For empirical processes such a generalization is less relevant. We therefore leave this to future research.

Remark 2 (Extension to non-centered random vectors).

If the covariance matrix \Sigma is strictly positive definite, then Proposition 1 also holds for i.i.d. random vectors X,X_{1},\ldots,X_{n}\in\mathbb{R}^{d} with non-zero mean \mu\in\mathbb{R}^{d} and Z\sim N(\mu,\Sigma). Nevertheless, in the broader context of empirical processes a strictly positive definite covariance (function) is a very strong assumption. Therefore, in this paper, we do not pursue this refinement. Instead, the interested reader may consult the companion paper Giessing (2023).

Remark 3 (Gaussian approximation versus CLT).

Proposition 1 is strictly weaker than a multivariate CLT because the Kolmogorov distance between the maximum norms of two sequences of random vectors induces a topology on the set of probability measures on \mathbb{R}^{d} which is coarser than the topology of convergence in distribution. Indeed, consider U_{n}=(X,Y)^{\prime}\in\mathbb{R}^{2} and V_{n}=(Y,X)^{\prime}\in\mathbb{R}^{2} for n\geq 1 and X,Y arbitrary random variables. Then, the Kolmogorov distance between \|U_{n}\|_{\infty} and \|V_{n}\|_{\infty} is zero for all n\geq 1, but U_{n}\overset{d}{=}V_{n} if and only if X and Y are exchangeable. For another perspective on the same issue, see Section 2.2.

The above Gaussian approximation result differs in several ways from related results in the literature. The two most striking differences are the following: First, the non-asymptotic upper bound in Proposition 1 does not explicitly depend on the dimension d, but only on the (truncated) third moments of \|X\|_{\infty} and \|Z\|_{\infty} and the variance of \|Z\|_{\infty}. Under suitable conditions on the marginal and/or joint distribution of the coordinates of X these three quantities grow substantially slower than the dimension d. Among other things, this opens up the possibility of applying this bound in the context of high-dimensional random vectors when d\gg n. Second, Proposition 1 holds without stringent assumptions on the distribution of X and applies even to degenerate distributions that do not have a strictly positive definite covariance matrix \Sigma. In particular, the proposition does not require a lower bound on the variances of the coordinates of X or the minimum eigenvalue of \Sigma. This is fundamental for an effortless extension to general empirical processes. For lower bounds on the variance of \|Z\|_{\infty} the reader may consult Giessing (2023). For a comprehensive comparison of Proposition 1 with previous work, we refer to Section 2.2.
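To illustrate the first point numerically: for Z\sim N(0,I_{d}) the classical bound \mathrm{E}[\|Z\|_{\infty}]\leq\sqrt{2\log(2d)} holds, so this moment grows only logarithmically in d. A short Monte Carlo check (illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# For Z ~ N(0, I_d), E||Z||_inf is bounded by sqrt(2 log(2d)) and grows
# logarithmically in d -- far slower than the dimension itself.
ests = {}
for d in (10, 100, 1000, 10000):
    Z = rng.standard_normal((1000, d))
    ests[d] = np.abs(Z).max(axis=1).mean()
    print(d, round(ests[d], 3), round(np.sqrt(2 * np.log(2 * d)), 3))
```

The printed estimates increase from roughly 2 to roughly 4 while d grows by three orders of magnitude.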

If we are willing to impose a lower bound on the variances of the coordinates of X, we can deduce the following useful inequality:

Corollary 2.

Recall the setup of Proposition 1. In addition, suppose that \sigma_{(1)}^{2}=\min_{1\leq j\leq d}\mathrm{Var}(X^{(j)})>0 and that X has coordinate-wise finite 3+\delta moments, \delta>0. Define \widetilde{X}=(X^{(j)}/\sigma_{(1)})_{j=1}^{d} and \widetilde{Z}=(Z^{(j)}/\sigma_{(1)})_{j=1}^{d}. Then, for all n\geq 1,

\sup_{s\geq 0}\Big|\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|Z\|_{\infty}\leq s\right\}\Big|
\lesssim\frac{1}{n^{1/6}}\left(\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3+\delta}]+\mathrm{E}[\|\widetilde{Z}\|_{\infty}^{3+\delta}]\right)^{\frac{1}{3+\delta}}\mathrm{E}[\|\widetilde{Z}\|_{\infty}]+\frac{1}{n^{\delta/3}}\left(\frac{\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3+\delta}]^{\frac{1}{3+\delta}}}{\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3}]^{1/3}}\right)^{3},

where \lesssim hides an absolute constant independent of n, d, \sigma_{(1)}^{2}, and the distribution of the X_{i}'s. If the coordinates of X are equicorrelated with correlation coefficient \rho\in(0,1], then the above inequality holds with \mathrm{E}[\|\widetilde{Z}\|_{\infty}] replaced by 1/\rho.

Since \mathrm{E}[\|\widetilde{X}\|_{\infty}^{3}]^{1/3}\geq\max_{1\leq j\leq d}\mathrm{E}[|\widetilde{X}^{(j)}|^{2}]^{1/2}=\sigma_{(n)}/\sigma_{(1)} for \sigma_{(n)}^{2}=\max_{1\leq j\leq d}\mathrm{Var}(X^{(j)}), the right hand side of the inequality in Corollary 2 can be easily upper bounded under a variety of moment assumptions on the marginal distributions of the coordinates of X. For a few concrete examples relevant in high-dimensional statistics we refer to Lemmas 1–3 in Giessing and Fan (2023).

The second basic result is a Gaussian comparison inequality. The novelty of this result is (again) that it holds even for degenerate Gaussian laws with singular covariance matrices and does not explicitly depend on the dimension of the random vectors.

Proposition 3 (Gaussian comparison).

Let Y,Z\in\mathbb{R}^{d} be Gaussian random vectors with mean zero and positive semi-definite covariance matrices \Sigma and \Omega\neq\mathbf{0}\in\mathbb{R}^{d\times d}, respectively. Then,

\sup_{s\geq 0}\Big|\mathbb{P}\left\{\|Y\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|Z\|_{\infty}\leq s\right\}\Big|\lesssim\left(\frac{\max_{j,k}|\Omega_{jk}-\Sigma_{jk}|}{\mathrm{Var}(\|Y\|_{\infty})\vee\mathrm{Var}(\|Z\|_{\infty})}\right)^{1/3},

where \lesssim hides an absolute constant independent of d, \Sigma, \Omega.
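As a quick numerical illustration (not from the paper; the covariances below are arbitrary assumptions for this example), when the entries of \Omega and \Sigma are close, the distributions of \|Y\|_{\infty} and \|Z\|_{\infty} are close as well. Since \lesssim hides an unspecified absolute constant, the script below only performs a sanity check of the two sides, not a verification of the constant:

```python
import numpy as np

rng = np.random.default_rng(2)
d, reps = 30, 4000

A = rng.standard_normal((d, d))
Sigma = A @ A.T / d                  # positive semi-definite covariance
Omega = Sigma + 0.01 * np.eye(d)     # small entrywise perturbation, still psd

Y = np.abs(rng.multivariate_normal(np.zeros(d), Sigma, size=reps)).max(axis=1)
Z = np.abs(rng.multivariate_normal(np.zeros(d), Omega, size=reps)).max(axis=1)

# Empirical Kolmogorov distance and the (constant-free) right-hand side.
grid = np.linspace(0.0, max(Y.max(), Z.max()), 400)
dist = max(abs((Y <= s).mean() - (Z <= s).mean()) for s in grid)
ratio = (np.abs(Omega - Sigma).max() / max(Y.var(), Z.var())) ** (1 / 3)
print("Kolmogorov distance:", dist, " bound up to a constant:", ratio)
```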

Since the Gaussian distribution is fully characterized by its first two moments, Proposition 1 instantly suggests that it should be possible to approximate the sampling distribution of \|S_{n}\|_{\infty} with the sampling distribution of \|Z_{n}\|_{\infty}, where Z_{n}\mid X_{1},\ldots,X_{n}\sim N(0,\widehat{\Sigma}_{n}) and \widehat{\Sigma}_{n} is a positive semi-definite estimate of \Sigma. The third basic result formalizes this idea; it is a simple consequence of the triangle inequality combined with Propositions 1 and 3:

Proposition 4 (Bootstrap approximation of the sampling distribution).

Let X,X_{1},\ldots,X_{n}\in\mathbb{R}^{d} be i.i.d. random vectors with mean zero and positive semi-definite covariance matrix \Sigma\neq\mathbf{0}\in\mathbb{R}^{d\times d}. Let \widehat{\Sigma}_{n}\equiv\widehat{\Sigma}_{n}(X_{1},\ldots,X_{n}) be any positive semi-definite estimate of \Sigma. Set S_{n}=n^{-1/2}\sum_{i=1}^{n}X_{i}, Z_{n}\mid X_{1},\ldots,X_{n}\sim N(0,\widehat{\Sigma}_{n}), and Z\sim N(0,\Sigma). Then, for M\geq 0, n\geq 1,

\sup_{s\geq 0}\Big|\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|Z_{n}\|_{\infty}\leq s\mid X_{1},\ldots,X_{n}\right\}\Big|
\lesssim\frac{(\mathrm{E}[\|X\|_{\infty}^{3}])^{1/3}}{\sqrt{n^{1/3}\mathrm{Var}(\|Z\|_{\infty})}}+\frac{\mathrm{E}\left[\|X\|_{\infty}^{3}\mathbf{1}\{\|X\|_{\infty}>M\}\right]}{\mathrm{E}\left[\|X\|_{\infty}^{3}\right]}
+\frac{\mathrm{E}[\|Z\|_{\infty}]+M}{\sqrt{n\mathrm{Var}(\|Z\|_{\infty})}}+\left(\frac{\max_{j,k}|\widehat{\Sigma}_{n,jk}-\Sigma_{jk}|}{\mathrm{Var}(\|Z\|_{\infty})}\right)^{1/3},

where \lesssim hides an absolute constant independent of n, d, M, and the distribution of the X_{i}'s.

Typically, statistical applications require estimates of the quantiles of the sampling distribution. Since the covariance matrix \Sigma is unknown, the quantiles of \|Z\|_{\infty} with Z\sim N(0,\Sigma) are infeasible. Hence, for arbitrary \alpha\in(0,1), we define the feasible Gaussian bootstrap quantiles as

c_{n}(\alpha;\widehat{\Sigma}_{n}):=\inf\left\{s\geq 0:\mathbb{P}\left\{\|Z_{n}\|_{\infty}\leq s\mid X_{1},\ldots,X_{n}\right\}\geq\alpha\right\},\quad\text{where}\quad Z_{n}\mid X_{1},\ldots,X_{n}\sim N(0,\widehat{\Sigma}_{n}).

Since this quantity is random, it is not immediately obvious that it is a valid approximation of the \alpha-quantile of the sampling distribution of \|S_{n}\|_{\infty}. However, combining Proposition 4 with standard arguments (e.g. Chernozhukov et al., 2013) we obtain the fourth basic result:

Proposition 5 (Bootstrap approximation of quantiles).

Consider the setup of Proposition 4. Let (\Theta_{n})_{n\geq 1} be a sequence of arbitrary real-valued random variables, not necessarily independent of X,X_{1},\ldots,X_{n}. Then, for M\geq 0, n\geq 1,

\sup_{\alpha\in(0,1)}\Big|\mathbb{P}\left\{\|S_{n}\|_{\infty}+\Theta_{n}\leq c_{n}(\alpha;\widehat{\Sigma}_{n})\right\}-\alpha\Big|
\lesssim\frac{(\mathrm{E}[\|X\|_{\infty}^{3}])^{1/3}}{\sqrt{n^{1/3}\mathrm{Var}(\|Z\|_{\infty})}}+\frac{\mathrm{E}\left[\|X\|_{\infty}^{3}\mathbf{1}\{\|X\|_{\infty}>M\}\right]}{\mathrm{E}\left[\|X\|_{\infty}^{3}\right]}+\frac{\mathrm{E}[\|Z\|_{\infty}]+M}{\sqrt{n\mathrm{Var}(\|Z\|_{\infty})}}
+\inf_{\delta>0}\left\{\left(\frac{\delta}{\mathrm{Var}(\|Z\|_{\infty})}\right)^{1/3}+\mathrm{P}\left(\max_{j,k}|\widehat{\Sigma}_{n,jk}-\Sigma_{jk}|>\delta\right)\right\}
+\inf_{\eta>0}\left\{\frac{\eta}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})}}+\mathrm{P}\left(|\Theta_{n}|>\eta\right)\right\},

where \lesssim hides an absolute constant independent of n, d, M, and the distribution of the X_{i}'s.

Remark 4 (On the purpose of the random variables (\Theta_{n})_{n\geq 1}).

In applications (\Theta_{n})_{n\geq 1} may be a higher-order approximation error such as the remainder of a first-order Taylor approximation. For concrete examples we refer to Corollary 2 and Theorem 10 in Giessing and Fan (2023). The random variables (\Theta_{n})_{n\geq 1} may also be taken identically equal to zero. In this case, the expression in the last line of Proposition 5 vanishes.

Propositions 4 and 5 inherit the dimension- and weak variance-freeness from Propositions 1 and 3. Since these propositions do not require lower bounds on the minimum eigenvalue of the estimate of \Sigma, we can always use the naive sample covariance matrix \widehat{\Sigma}_{n}=n^{-1}\sum_{i=1}^{n}(X_{i}-\bar{X}_{n})(X_{i}-\bar{X}_{n})^{\prime} with \bar{X}_{n}=n^{-1}\sum_{i=1}^{n}X_{i} to estimate \Sigma, even if d\gg n. If additional information about \Sigma is available (viz. low-rank, bandedness, or approximate sparsity), we can of course use more sophisticated estimators \widehat{\Sigma}_{n} to improve the non-asymptotic bounds. This will prove particularly effective when we lift Propositions 4 and 5 to general empirical processes (see Section 3.4). The reader can find several more examples in Section 4.2 of the companion paper Giessing and Fan (2023).
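The procedure behind Propositions 4 and 5 can be sketched in a few lines of Python (illustrative only; the Gaussian design, sample sizes, and Monte Carlo budget are assumptions made for this example, and \Theta_{n}\equiv 0): estimate \Sigma by the naive sample covariance, compute the feasible quantile c_{n}(\alpha;\widehat{\Sigma}_{n}) by simulation, and check the coverage of the event \{\|S_{n}\|_{\infty}\leq c_{n}(\alpha;\widehat{\Sigma}_{n})\}.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, B, alpha = 300, 100, 1000, 0.90

def c_n(X, alpha, B, rng):
    """Gaussian bootstrap quantile c_n(alpha; Sigma_hat) via Monte Carlo."""
    Xc = X - X.mean(axis=0)
    Sigma_hat = Xc.T @ Xc / X.shape[0]   # naive sample covariance, usable even if d >> n
    Zb = rng.multivariate_normal(np.zeros(X.shape[1]), Sigma_hat, size=B)
    return np.quantile(np.abs(Zb).max(axis=1), alpha)

# Coverage check: P(||S_n||_inf <= c_n(alpha; Sigma_hat)) should be close to alpha.
cover = 0
for _ in range(200):
    X = rng.standard_normal((n, d))                 # mean-zero data, Sigma = I_d
    Sn = np.abs(X.sum(axis=0) / np.sqrt(n)).max()   # ||S_n||_inf
    cover += Sn <= c_n(X, alpha, B, rng)
coverage = cover / 200
print("empirical coverage:", coverage)
```

The empirical coverage should land near the nominal level \alpha=0.90 up to Monte Carlo error.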

2.2 Relation to previous work

Here, we compare Propositions 1 and 3 with the relevant results in the literature. From a mathematical point of view Propositions 4 and 5 are just an afterthought. For a comprehensive review of the entire literature on Gaussian and bootstrap approximations of maxima of sums of random vectors see Chernozhukov et al. (2023).

  • On the dependence on the dimension. As emphasized above, the upper bounds in Propositions 1 and 3 are dimension-free. This is a significant improvement over Theorems 2.2, 2.1, 3.2, 2.1 in Chernozhukov et al. (2013, 2017, 2019, 2020), Theorem 1.1 and Corollary 1.3 in Fang and Koike (2021), and Theorems 2.1 and 2.2 in Lopes (2022a), which all feature logarithmic factors of the dimension. Such bounds generalize poorly to empirical processes since the resulting upper bounds necessarily depend on the \varepsilon-entropy of the function class. This precludes (or, at the very least, substantially complicates) applications to objects as basic as the operator norm of a high-dimensional covariance matrix.

  • On the moment assumptions. The above mentioned results by Chernozhukov et al. (2013, 2017, 2019, 2020), Fang and Koike (2021), and Lopes (2022a) all require strictly positive lower bounds on the variances of the components of the random vector X and/or on the minimum eigenvalue of the covariance matrix \Sigma. The strictly positive lower bounds are especially awkward if we try to extend their finite dimensional results to general empirical processes: While it is often sensible to impose an upper bound on the variances of the increments of an empirical process, lower bounds on the variances are much harder to justify and in general only achievable via a discretization (or thinning) of the function class. The weak variance-free upper bounds in Propositions 1 and 3 give us greater leeway and increase the scope of our approximation results considerably (see also Appendix A).

  • On the sharpness of the upper bounds. The results in Chernozhukov et al. (2020), Fang and Koike (2021), and Lopes (2022a) show that the upper bound in Proposition 1 is sub-optimal and that its dependence on the sample size n can be improved to n^{-1/2}. The proof techniques in Chernozhukov et al. (2020) and Fang and Koike (2021) are both based on delicate estimates of Hermite polynomials and thus inherently dimension dependent. Extending their approaches to the coordinate-free Wiener chaos decomposition (which would yield dimension-free results) is a formidable research task. The proof strategy in Lopes (2022a) is very different and requires sub-Gaussian data. The results in the aforementioned papers also show that under additional distributional assumptions the exponent on the upper bound in Proposition 3 can be improved to 1/2.

  • Proper generalizations of classical CLTs. In a certain regard, Propositions 1 and 3 are strictly weaker than the results in Chernozhukov et al. (2017, 2020), Fang and Koike (2021), and Lopes (2022a). Proposition 1 (and similarly Proposition 3) provides a bound on

    \sup_{s\geq 0}\Big|\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|Z\|_{\infty}\leq s\right\}\Big|=\sup_{A\in\mathcal{Q}_{d}}\Big|\mathbb{P}\left\{S_{n}\in A\right\}-\mathbb{P}\left\{Z\in A\right\}\Big|,

    where \mathcal{Q}_{d} denotes the collection of all hypercubes in \mathbb{R}^{d} with center at the origin. Chernozhukov et al. (2017), Fang and Koike (2021), and Lopes (2022a) provide bounds on the above quantity not for the supremum over \mathcal{Q}_{d} but for the supremum over the much larger collection of all hyper-rectangles in \mathbb{R}^{d}. When the dimension d is fixed, their results imply convergence in distribution of S_{n} to Z as n\rightarrow\infty and are therefore stronger than our results (see also Remark 3). In particular, their results can be considered proper generalizations of classical CLTs to high dimensions.

    The results in this paper depend on a dimension-free anti-concentration inequality for \|Z\|_{\infty} (or, in other words, for the supremum over \mathcal{Q}_{d}) and a smoothing inequality specific to the map x\mapsto\|x\|_{\infty} (i.e. Lemmas 2 and 6). Since these inequalities do not apply to the class of all hyper-rectangles in \mathbb{R}^{d}, the arguments in this paper cannot be easily modified to yield dimension-free generalizations of the classical CLTs to high dimensions.

3 Approximation results for suprema of empirical processes

3.1 Empirical process notation and definitions

In the previous section we got away with intuitive standard notation. We now introduce precise empirical process notation for the remainder of the paper: We denote by X,X_{1},X_{2},\ldots a sequence of i.i.d. random variables taking values in a measurable space (S,\mathcal{S}) with common distribution P, i.e. X_{i}:S^{\infty}\rightarrow S, i\geq 1, are the coordinate projections of the infinite product probability space (\Omega,\mathcal{A},\mathbb{P})=\left(S^{\infty},\mathcal{S}^{\infty},P^{\infty}\right) with law \mathbb{P}_{X_{i}}=P. If auxiliary variables independent of the X_{i}'s are involved, the underlying probability space is assumed to be of the form (\Omega,\mathcal{A},\mathbb{P})=\left(S^{\infty},\mathcal{S}^{\infty},P^{\infty}\right)\times(Z,\mathcal{Z},Q). We define the empirical measures P_{n} associated with observations X_{1},\ldots,X_{n} as random measures on \left(S^{\infty},\mathcal{S}^{\infty}\right) given by P_{n}(\omega):=n^{-1}\sum_{i=1}^{n}\delta_{X_{i}(\omega_{i})} for all \omega\in S^{\infty}, where \delta_{x} is the Dirac measure at x.

For a class \mathcal{F} of measurable functions from S into the real line \mathbb{R} we define the empirical process indexed by \mathcal{F} as

\mathbb{G}_{n}(f):=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big(f(X_{i})-Pf\big),\qquad f\in\mathcal{F}.

Further, we denote by \{G_{P}(f):f\in\mathcal{F}\} the Gaussian P-bridge process with mean zero and the same covariance function \mathcal{C}_{P}:\mathcal{F}\times\mathcal{F}\rightarrow\mathbb{R} as the process \{f(X):f\in\mathcal{F}\}, i.e.

(f,g)\mapsto\mathcal{C}_{P}(f,g):=\mathrm{E}[G_{P}(f)G_{P}(g)]=Pfg-(Pf)(Pg). (2)

Moreover, we denote by \{Z_{Q}(f):f\in\mathcal{F}\} the Gaussian Q-motion with mean zero and covariance function \mathcal{E}_{Q}:\mathcal{F}\times\mathcal{F}\rightarrow\mathbb{R} given by

(f,g)\mapsto\mathcal{E}_{Q}(f,g):=\mathrm{E}[Z_{Q}(f)Z_{Q}(g)]=Qfg. (3)

For probability measures Q on (S,\mathcal{S}) we define the L_{q}(Q)-norm, q\geq 1, of a function f\in\mathcal{F} by \|f\|_{Q,q}=(Q|f|^{q})^{1/q}, the L_{2}(Q)-semimetric by e_{Q}(f,g):=\|f-g\|_{Q,2}, and the intrinsic standard deviation metric by d_{P}(f,g):=e_{P}(f-Pf,g-Pg)=\sqrt{P(f-g)^{2}-(P(f-g))^{2}}, f,g\in\mathcal{F}. We denote by L_{q}(S,\mathcal{S},Q), q\geq 1, the space of all real-valued measurable functions f on (S,\mathcal{S}) with finite L_{q}(Q)-norm. A function F:S\rightarrow\mathbb{R} is an envelope for the class \mathcal{F} if it satisfies |f(x)|\leq F(x) for all f\in\mathcal{F} and x\in S. For a semimetric space (T,d) and any \varepsilon>0 we write N(T,d,\varepsilon) for the \varepsilon-covering number of T with respect to d. For two deterministic sequences (a_{n})_{n\geq 1} and (b_{n})_{n\geq 1}, we write a_{n}\lesssim b_{n} if there exists an absolute constant C>0 such that a_{n}\leq Cb_{n} for all n\geq 1, and a_{n}\asymp b_{n} if there exist absolute constants C_{1},C_{2}>0 such that C_{1}b_{n}\leq a_{n}\leq C_{2}b_{n} for all n\geq 1. For a,b\in\mathbb{R} we write a\wedge b for \min\{a,b\} and a\vee b for \max\{a,b\}.
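To make the notation concrete, here is a minimal Python sketch (an illustration, not from the paper) with S=[0,1], P the uniform distribution, and \mathcal{F} the class of half-interval indicators f_{t}(x)=\mathbf{1}\{x\leq t\}. Then Pf_{t}=t and \sup_{f\in\mathcal{F}}|\mathbb{G}_{n}(f)| is the scaled Kolmogorov-Smirnov statistic:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
X = rng.uniform(size=n)             # i.i.d. sample from P = Uniform(0, 1)

# F = {x -> 1{x <= t}} over a grid of t; Pf_t = t, so G_n(f_t) is the
# scaled deviation of the empirical CDF from the true CDF.
ts = np.linspace(0.0, 1.0, 201)
Gn = np.sqrt(n) * ((X[:, None] <= ts).mean(axis=0) - ts)
sup_Gn = np.abs(Gn).max()
print("sup_f |G_n(f)| =", sup_Gn)
```

This class is Donsker, and the corresponding Gaussian P-bridge process is the classical Brownian bridge on [0,1].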

3.2 Gaussian approximation inequalities

We present the first main theoretical result of this paper: a Gaussian approximation inequality for empirical processes indexed by function classes that need not be Donsker. This result generalizes Proposition 1 from finite dimensional index sets to general function classes.

Theorem 1 (Gaussian approximation).

Let (n,ρ)(\mathcal{F}_{n},\rho) be a totally bounded pseudo-metric space. Further, let nL2(S,𝒮,P)\mathcal{F}_{n}\subset L_{2}(S,\mathcal{S},P) have envelope function FnL3(S,𝒮,P)F_{n}\in L_{3}(S,\mathcal{S},P). Suppose that there exist functions ψn,ϕn\psi_{n},\phi_{n} such that for δ>0\delta>0,

EGPn,δψn(δ)Var(GPn)and𝔾nn,δP,1ϕn(δ)Var(GPn),\displaystyle\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime}}\lesssim\psi_{n}(\delta)\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\quad{}\quad{}\text{and}\quad{}\quad{}\big{\|}\|\mathbb{G}_{n}\|_{\mathcal{F}_{n,\delta}^{\prime}}\big{\|}_{P,1}\lesssim\phi_{n}(\delta)\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}, (4)

where n,δ={fg:f,gn,ρ(f,g)<δFnP,2}\mathcal{F}_{n,\delta}^{\prime}=\{f-g:f,g\in\mathcal{F}_{n},\>\rho(f,g)<\delta\|F_{n}\|_{P,2}\}. Let rn=inf{ψn(δ)ϕn(δ):δ>0}r_{n}=\inf\big{\{}\psi_{n}(\delta)\vee\phi_{n}(\delta):\delta>0\big{\}}. Then, for each M0M\geq 0,

sups0|{𝔾nns}{GPns}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}\Big{|}
FnP,3n1/3Var(GPn)+Fn𝟏{Fn>M}P,33FnP,33+EGPn+MnVar(GPn)+rn,\displaystyle\quad{}\quad{}\quad\lesssim\frac{\|F_{n}\|_{P,3}}{\sqrt{n^{1/3}\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}+\frac{\|F_{n}\mathbf{1}\{F_{n}>M\}\|_{P,3}^{3}}{\|F_{n}\|_{P,3}^{3}}+\frac{\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}+M}{\sqrt{n\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}+\sqrt{r_{n}},

where \lesssim hides an absolute constant independent of n,rn,M,n,Fn,ψn,ϕnn,r_{n},M,\mathcal{F}_{n},F_{n},\psi_{n},\phi_{n}, and PP.

Remark 1 (Lower and upper bounds on Var(GPn)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})).

Since n\mathcal{F}_{n} may change with the sample size, so may Var(GPn)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}}). In a few special cases it is possible to compute the variance exactly (Giessing and Fan, 2023), but in most cases one can only derive lower and upper bounds. In our companion work Giessing (2023) we derive such bounds under mild conditions on the Gaussian PP-bridge process. For the reader’s convenience we include the relevant results from that paper in Appendix B.2.

The total boundedness of n\mathcal{F}_{n} in the above theorem is a standard assumption which allows us to reduce the proof of Theorem 1 to an application of Proposition 1 combined with a discretization argument. Since n\mathcal{F}_{n} is totally bounded whenever EGPn<\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}<\infty (see Lemma 16 in Appendix B.4), this is a rather mild technical assumption. For a detailed discussion of this result and a comparison with the literature we refer to Section 3.6.
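To see Theorem 1 at work in its simplest form, consider a finite class n={f1,,fd}\mathcal{F}_{n}=\{f_{1},\ldots,f_{d}\}, for which 𝔾nn\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}} reduces to the maximum of dd normalized sums. The following simulation is our own hedged illustration (the uniform design and the coordinate projections are our choices, not the paper's): it compares the Monte Carlo distribution of the empirical supremum with that of the matching Gaussian maximum.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, reps = 200, 20, 2000

# F_n = {f_1, ..., f_d} with f_j(x) = x_j and X uniform on [-1, 1]^d,
# so P f_j = 0 and C_P(f_j, f_k) = (1/3) * 1{j = k}.
def sup_empirical_process():
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    G_n = np.sqrt(n) * X.mean(axis=0)       # G_n(f_j) = sqrt(n)(P_n f_j - P f_j)
    return np.abs(G_n).max()

emp = np.array([sup_empirical_process() for _ in range(reps)])
gau = np.abs(rng.normal(0.0, np.sqrt(1.0 / 3.0), size=(reps, d))).max(axis=1)

# Monte Carlo estimate of the Kolmogorov distance between the two suprema
grid = np.linspace(0.0, 3.0, 300)
emp_cdf = (emp[None, :] <= grid[:, None]).mean(axis=1)
gau_cdf = (gau[None, :] <= grid[:, None]).mean(axis=1)
kol = np.abs(emp_cdf - gau_cdf).max()
```

For moderate nn the estimated Kolmogorov distance is already small, consistent with the n1/6n^{-1/6}-type rate in the upper bound.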

Under more specific smoothness assumptions on the Gaussian PP-bridge process (beyond the abstract control of the moduli of continuity in eq. (4)) Theorem 1 simplifies as follows:

Corollary 2.

Let nL2(S,𝒮,P)\mathcal{F}_{n}\subset L_{2}(S,\mathcal{S},P) have envelope FnL3(S,𝒮,P)F_{n}\in L_{3}(S,\mathcal{S},P). If the Gaussian PP-bridge process {GP(f):fn}\{G_{P}(f):f\in\mathcal{F}_{n}\} has almost surely uniformly dPd_{P}-continuous sample paths and EGPn<\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}<\infty, then for each M0M\geq 0,

sups0|{𝔾nns}{GPns}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}\Big{|}
FnP,3n1/3Var(GPn)+Fn𝟏{Fn>M}P,33FnP,33+EGPn+MnVar(GPn),\displaystyle\quad{}\quad{}\quad{}\lesssim\frac{\|F_{n}\|_{P,3}}{\sqrt{n^{1/3}\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}+\frac{\|F_{n}\mathbf{1}\{F_{n}>M\}\|_{P,3}^{3}}{\|F_{n}\|_{P,3}^{3}}+\frac{\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}+M}{\sqrt{n\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}},

where \lesssim hides an absolute constant independent of n,M,n,Fnn,M,\mathcal{F}_{n},F_{n}, and PP.

Remark 2 (Further simplification with lower bound on second moment).

Recall the setup of Corollary 2. If in addition there exists κn>0\kappa_{n}>0 such that Pf2κn2>0Pf^{2}\geq\kappa_{n}^{2}>0 for all fnf\in\mathcal{F}_{n} and FnL3+δ(S,𝒮,P)F_{n}\in L_{3+\delta}(S,\mathcal{S},P), δ>0\delta>0, then

sups0|{𝔾nns}{GPns}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}\Big{|}
1n1/6κn2(FnP,3+δ3+δ+E[GPn3+δ])13+δE[GPn]+1nδ/3(FnP,3+δFnP,3)3,\displaystyle\quad\quad\quad\lesssim\frac{1}{n^{1/6}\kappa^{2}_{n}}\left(\|F_{n}\|_{P,3+\delta}^{3+\delta}+\mathrm{E}[\|G_{P}\|_{\mathcal{F}_{n}}^{3+\delta}]\right)^{\frac{1}{3+\delta}}\mathrm{E}[\|G_{P}\|_{\mathcal{F}_{n}}]+\frac{1}{n^{\delta/3}}\left(\frac{\|F_{n}\|_{P,3+\delta}}{\|F_{n}\|_{P,3}}\right)^{3},

where \lesssim hides an absolute constant independent of n,d,κn2n,d,\kappa_{n}^{2}, and the distribution of the XiX_{i}’s. This is the empirical process analogue of Corollary 2. Since the proof of this inequality is identical to that of Corollary 2 and differs only in notation, we omit the details.

Since centered Gaussian processes are either almost surely uniformly continuous w.r.t. their intrinsic standard deviation metric or almost surely discontinuous, the smoothness condition in Corollary 2 is natural. (For the proofs to go through, uniform dPd_{P}-continuity in probability would be sufficient.) In Lemmas 17 and 18 in Appendix B.4 we provide simple sufficient and necessary conditions for almost sure uniform dPd_{P}-continuity of GPG_{P} on n\mathcal{F}_{n}. Importantly, n1n\geq 1 is fixed; the uniform dPd_{P}-continuity is a point-wise requirement and does not need to hold when taking the limit nn\rightarrow\infty. This is especially relevant in high-dimensional statistics where Gaussian processes are typically unbounded and have diverging sample paths as nn\rightarrow\infty, and, more generally, whenever the function classes are non-Donsker. Of course, just as with Proposition 1, the conclusion of Corollary 2 is weaker than weak convergence (see Section 2.2).

3.3 Gaussian comparison inequalities

Our second main result is a comparison inequality which bounds the Kolmogorov distance between a Gaussian PP-bridge process and a Gaussian QQ-motion both indexed by (possibly) different function classes n\mathcal{F}_{n} and 𝒢n\mathcal{G}_{n}. This generalizes Proposition 3 to Gaussian processes.

To state the theorem, we introduce the following notation: Given the pseudo-metric spaces (n,ρi)(\mathcal{F}_{n},\rho_{i}) and (𝒢n,ρi)(\mathcal{G}_{n},\rho_{i}), i{1,2}i\in\{1,2\}, we write π𝒢n1:n𝒢n𝒢n\pi^{1}_{\mathcal{G}_{n}}:\mathcal{F}_{n}\cup\mathcal{G}_{n}\rightarrow\mathcal{G}_{n} to denote the projection from n𝒢n\mathcal{F}_{n}\cup\mathcal{G}_{n} onto 𝒢n\mathcal{G}_{n} defined by ρ1(h,π𝒢n1h)=infg𝒢nρ1(h,g)\rho_{1}(h,\pi^{1}_{\mathcal{G}_{n}}h)=\inf_{g\in\mathcal{G}_{n}}\rho_{1}(h,g) for all hn𝒢nh\in\mathcal{F}_{n}\cup\mathcal{G}_{n}. In the case that the image π𝒢n1h\pi^{1}_{\mathcal{G}_{n}}h is not a singleton for some hn𝒢nh\in\mathcal{F}_{n}\cup\mathcal{G}_{n}, we choose any one of the equivalent points in 𝒢n\mathcal{G}_{n}. Similarly, we write πn2:n𝒢nn\pi^{2}_{\mathcal{F}_{n}}:\mathcal{F}_{n}\cup\mathcal{G}_{n}\rightarrow\mathcal{F}_{n} for the projection from n𝒢n\mathcal{F}_{n}\cup\mathcal{G}_{n} onto n\mathcal{F}_{n} defined via ρ2\rho_{2}.

Theorem 3 (Gaussian comparison).

Let (n,ρ1)(\mathcal{F}_{n},\rho_{1}) and (𝒢n,ρ2)(\mathcal{G}_{n},\rho_{2}) be totally bounded pseudo-metric spaces. Further, let nL2(S,𝒮,P)\mathcal{F}_{n}\subset L_{2}(S,\mathcal{S},P) and 𝒢nL2(S,𝒮,Q)\mathcal{G}_{n}\subset L_{2}(S,\mathcal{S},Q) have envelope functions FnL2(S,𝒮,P)F_{n}\in L_{2}(S,\mathcal{S},P) and GnL2(S,𝒮,Q)G_{n}\in L_{2}(S,\mathcal{S},Q), respectively. Suppose that there exist functions ψn,ϕn\psi_{n},\phi_{n} such that

EGPn,δψn(δ)Var(GPn)andEZQ𝒢n,δϕn(δ)Var(ZQ𝒢n),\displaystyle\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime}}\lesssim\psi_{n}(\delta)\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\quad{}\quad{}\text{and}\quad{}\quad{}\mathrm{E}\|Z_{Q}\|_{\mathcal{G}_{n,\delta}^{\prime}}\lesssim\phi_{n}(\delta)\sqrt{\mathrm{Var}(\|Z_{Q}\|_{\mathcal{G}_{n}})},

where n,δ={fg:f,gn,ρ1(f,g)<δFnP,2}\mathcal{F}_{n,\delta}^{\prime}=\{f-g:f,g\in\mathcal{F}_{n},\>\rho_{1}(f,g)<\delta\|F_{n}\|_{P,2}\} and 𝒢n,δ={fg:f,g𝒢n,ρ2(f,g)<δGnQ,2}\mathcal{G}_{n,\delta}^{\prime}=\{f-g:f,g\in\mathcal{G}_{n},\>\rho_{2}(f,g)<\delta\|G_{n}\|_{Q,2}\}. Let rn=inf{ψn(δ)ϕn(δ):δ>0}r_{n}=\inf\big{\{}\psi_{n}(\delta)\vee\phi_{n}(\delta):\delta>0\big{\}}. Then,

sups0|{GPns}{ZQ𝒢ns}|(ΔP,Q(n,𝒢n)Var(GPn)Var(ZQ𝒢n))1/3+rn,\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|Z_{Q}\|_{\mathcal{G}_{n}}\leq s\right\}\Big{|}\lesssim\left(\frac{\Delta_{P,Q}(\mathcal{F}_{n},\mathcal{G}_{n})}{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})\vee\mathrm{Var}(\|Z_{Q}\|_{\mathcal{G}_{n}})}\right)^{1/3}+\sqrt{r_{n}},

where

ΔP,Q(n,𝒢n):=supf1,f2n|𝒞P(f1,f2)Q(π𝒢n1f1,π𝒢n1f2)|supg1,g2𝒢n|Q(g1,g2)𝒞P(πn2g1,πn2g2)|,\displaystyle\begin{split}\Delta_{P,Q}(\mathcal{F}_{n},\mathcal{G}_{n})&:=\sup_{f_{1},f_{2}\in\mathcal{F}_{n}}\big{|}\mathcal{C}_{P}(f_{1},f_{2})-\mathcal{E}_{Q}(\pi^{1}_{\mathcal{G}_{n}}f_{1},\pi^{1}_{\mathcal{G}_{n}}f_{2})\big{|}\\ &\quad\quad\quad\bigvee\sup_{g_{1},g_{2}\in\mathcal{G}_{n}}\big{|}\mathcal{E}_{Q}(g_{1},g_{2})-\mathcal{C}_{P}(\pi^{2}_{\mathcal{F}_{n}}g_{1},\pi^{2}_{\mathcal{F}_{n}}g_{2})\big{|},\end{split} (5)

where \lesssim hides an absolute constant independent of n,n,𝒢n,Fn,Gn,ψn,ϕn,Pn,\mathcal{F}_{n},\mathcal{G}_{n},F_{n},G_{n},\psi_{n},\phi_{n},P and QQ.

Remark 3 (Comparison inequality for Gaussian QQ-bridge processes).

Theorem 3 holds without changes (and identical proof) for Gaussian QQ-bridge processes {GQ(f):fn}\{G_{Q}(f):f\in\mathcal{F}_{n}\}. This easily follows from the almost sure representation GQ(f)=ZQ(f)+(Pf)ZG_{Q}(f)=Z_{Q}(f)+(Pf)Z for all fnf\in\mathcal{F}_{n}, where ZN(0,1)Z\sim N(0,1) is independent of ZQ(f)Z_{Q}(f) for all fnf\in\mathcal{F}_{n}. We state the comparison inequality for QQ-motions because this is the version that we use in the context of the Gaussian process bootstrap in Section 3.4.

Informally speaking, Theorem 3 states that the distributions of the suprema of any two Gaussian processes are close whenever their covariance functions are not too different. Note that we compare two Gaussian processes with covariance functions 𝒞P\mathcal{C}_{P} and Q\mathcal{E}_{Q} that differ not only in their measures (PP and QQ) but also in their functional form (viz. eq. (2) and (3) in Section 3.1). This turns out to be essential for developing alternatives to the classical Gaussian multiplier bootstrap procedure for suprema of empirical processes proposed by Chernozhukov et al. (2014a) and Chernozhukov et al. (2014b). We discuss this in detail in Section 3.4.
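A hedged numerical illustration of this comparison principle in the finite-dimensional case (the toy covariance matrices below are our own, not from the paper): two centered Gaussian vectors whose covariance matrices are uniformly close have nearly indistinguishable suprema distributions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, reps = 30, 5000

# Two covariance matrices differing by at most 0.01 in sup-norm.
A = np.eye(d)
B = np.eye(d) + 0.01 * np.ones((d, d))      # still positive semi-definite

sup_A = np.abs(rng.multivariate_normal(np.zeros(d), A, size=reps)).max(axis=1)
sup_B = np.abs(rng.multivariate_normal(np.zeros(d), B, size=reps)).max(axis=1)

# Monte Carlo Kolmogorov distance between the two suprema
grid = np.linspace(0.0, 5.0, 400)
cdf_A = (sup_A[None, :] <= grid[:, None]).mean(axis=1)
cdf_B = (sup_B[None, :] <= grid[:, None]).mean(axis=1)
kol = np.abs(cdf_A - cdf_B).max()           # small, as Theorem 3 predicts
```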

Under additional smoothness conditions on the Gaussian processes and assumptions on the function classes Theorem 3 simplifies. The special cases 𝒢n=n\mathcal{G}_{n}=\mathcal{F}_{n} and 𝒢nn\mathcal{G}_{n}\subseteq\mathcal{F}_{n} are of particular interest in the context of the bootstrap. They are the content of the next two corollaries.

Corollary 4.

Let nL2(S,𝒮,P)L2(S,𝒮,Q)\mathcal{F}_{n}\subset L_{2}(S,\mathcal{S},P)\cap L_{2}(S,\mathcal{S},Q). If the Gaussian PP-bridge process {GP(f):fn}\{G_{P}(f):f\in\mathcal{F}_{n}\} and the Gaussian QQ-motion {ZQ(f):fn}\{Z_{Q}(f):f\in\mathcal{F}_{n}\} have almost surely uniformly continuous sample paths w.r.t. their respective standard deviation metrics dPd_{P} and dQd_{Q}, and EGPnEZQn<\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}\vee\mathrm{E}\|Z_{Q}\|_{\mathcal{F}_{n}}<\infty, then

sups0|{GPns}{ZQns}|(supf,gn|𝒞P(f,g)Q(f,g)|Var(GPn)Var(ZQn))1/3,\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|Z_{Q}\|_{\mathcal{F}_{n}}\leq s\right\}\Big{|}\lesssim\left(\frac{\sup_{f,g\in\mathcal{F}_{n}}\big{|}\mathcal{C}_{P}(f,g)-\mathcal{E}_{Q}(f,g)\big{|}}{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})\vee\mathrm{Var}(\|Z_{Q}\|_{\mathcal{F}_{n}})}\right)^{1/3},

where \lesssim hides an absolute constant independent of n,n,Pn,\mathcal{F}_{n},P, and QQ.

Corollary 5.

Let nL2(S,𝒮,P)\mathcal{F}_{n}\subset L_{2}(S,\mathcal{S},P) have envelope function FnL2(S,𝒮,P)F_{n}\in L_{2}(S,\mathcal{S},P) and let 𝒢nn\mathcal{G}_{n}\subseteq\mathcal{F}_{n} be a δFnP,2\delta\|F_{n}\|_{P,2}-net of n\mathcal{F}_{n} with respect to dPd_{P}, δ>0\delta>0. If the Gaussian PP-bridge process {GP(f):fn}\{G_{P}(f):f\in\mathcal{F}_{n}\} has almost surely uniformly continuous sample paths w.r.t. the metric dPd_{P} and EGPn<\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}<\infty, then

sups0|{GPns}{GP𝒢ns}|(δFnP,2supfnPf2Var(GPn)Var(GP𝒢n))1/3,\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|G_{P}\|_{\mathcal{G}_{n}}\leq s\right\}\Big{|}\lesssim\left(\frac{\delta\|F_{n}\|_{P,2}\sup_{f\in\mathcal{F}_{n}}\sqrt{Pf^{2}}}{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})\vee\mathrm{Var}(\|G_{P}\|_{\mathcal{G}_{n}})}\right)^{1/3},

where \lesssim hides an absolute constant independent of δ,n,n\delta,n,\mathcal{F}_{n}, and PP.

3.4 Gaussian process bootstrap

In this section we develop a general framework for bootstrapping the distribution of suprema of empirical processes. We recover the classical Gaussian multiplier bootstrap as a special case of this more general framework. For concrete applications to statistical problems we refer to Section 4.

The following abstract approximation result generalizes Proposition 4 to empirical processes.

Theorem 6 (Abstract bootstrap approximation).

Let (n,ρ1)(\mathcal{F}_{n},\rho_{1}) and (𝒢n,ρ2)(\mathcal{G}_{n},\rho_{2}) be totally bounded pseudo-metric spaces. Further, let nL2(S,𝒮,P)\mathcal{F}_{n}\subset L_{2}(S,\mathcal{S},P) and 𝒢nL2(S,𝒮,Q)\mathcal{G}_{n}\subset L_{2}(S,\mathcal{S},Q) have envelope functions FnL3(S,𝒮,P)F_{n}\in L_{3}(S,\mathcal{S},P) and GnL2(S,𝒮,Q)G_{n}\in L_{2}(S,\mathcal{S},Q), respectively. Suppose that there exist functions ψn,ϕn\psi_{n},\phi_{n} such that for δ>0\delta>0,

EGPn,δ𝔾nn,δP,1ψn(δ)Var(GPn),EZQ𝒢n,δϕn(δ)Var(ZQ𝒢n),\displaystyle\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime}}\vee\big{\|}\|\mathbb{G}_{n}\|_{\mathcal{F}_{n,\delta}^{\prime}}\big{\|}_{P,1}\lesssim\psi_{n}(\delta)\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})},\quad\mathrm{E}\|Z_{Q}\|_{\mathcal{G}_{n,\delta}^{\prime}}\lesssim\phi_{n}(\delta)\sqrt{\mathrm{Var}(\|Z_{Q}\|_{\mathcal{G}_{n}})}, (6)

where n,δ={fg:f,gn,ρ1(f,g)<δFnP,2}\mathcal{F}_{n,\delta}^{\prime}=\{f-g:f,g\in\mathcal{F}_{n},\>\rho_{1}(f,g)<\delta\|F_{n}\|_{P,2}\} and 𝒢n,δ={fg:f,g𝒢n,ρ2(f,g)<δGnQ,2}\mathcal{G}_{n,\delta}^{\prime}=\{f-g:f,g\in\mathcal{G}_{n},\>\rho_{2}(f,g)<\delta\|G_{n}\|_{Q,2}\}. Let rn=inf{ψn(δ)ϕn(δ):δ>0}r_{n}=\inf\big{\{}\psi_{n}(\delta)\vee\phi_{n}(\delta):\delta>0\big{\}}. Then, for each M0M\geq 0,

sups0|{𝔾nns}{ZQ𝒢ns}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|Z_{Q}\|_{\mathcal{G}_{n}}\leq s\right\}\Big{|}
FnP,3n1/3Var(GPn)+Fn𝟏{Fn>M}P,33FnP,33+EGPn+MnVar(GPn)\displaystyle\quad{}\quad{}\quad{}\lesssim\frac{\|F_{n}\|_{P,3}}{\sqrt{n^{1/3}\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}+\frac{\|F_{n}\mathbf{1}\{F_{n}>M\}\|_{P,3}^{3}}{\|F_{n}\|_{P,3}^{3}}+\frac{\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}+M}{\sqrt{n\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}
+(ΔP,Q(n,𝒢n)Var(GPn)Var(ZQ𝒢n))1/3+rn,\displaystyle\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}+\left(\frac{\Delta_{P,Q}(\mathcal{F}_{n},\mathcal{G}_{n})}{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})\vee\mathrm{Var}(\|Z_{Q}\|_{\mathcal{G}_{n}})}\right)^{1/3}+\sqrt{r_{n}},

where ΔP,Q(n,𝒢n)\Delta_{P,Q}(\mathcal{F}_{n},\mathcal{G}_{n}) is given in eq. (5) and \lesssim hides an absolute constant independent of n,rn,M,n,r_{n},M, n,\mathcal{F}_{n}, 𝒢n,Fn,Gn,ψn,ϕn,P\mathcal{G}_{n},F_{n},G_{n},\psi_{n},\phi_{n},P, and QQ.

Remark 4 (Bootstrap approximation with Gaussian QQ-bridge process).

Theorem 6 holds without changes (and identical proof) with the Gaussian QQ-bridge process {GQ(f):fn}\{G_{Q}(f):f\in\mathcal{F}_{n}\} substituted for the Gaussian QQ-motion {ZQ(f):fn}\{Z_{Q}(f):f\in\mathcal{F}_{n}\} (see also Remark 3).

Theorem 6 is an immediate consequence of Theorems 1 and 3 and the triangle inequality, and thus a mathematical triviality. What matters is its statistical interpretation: It implies that we can approximate the sampling distribution of the supremum of an empirical process {𝔾n(f):fn}\{\mathbb{G}_{n}(f):f\in\mathcal{F}_{n}\} with the distribution of the supremum of a Gaussian QQ-motion {ZQ(g):g𝒢n}\{Z_{Q}(g):g\in\mathcal{G}_{n}\} provided that their covariance functions 𝒞P\mathcal{C}_{P} and Q\mathcal{E}_{Q} do not differ by too much. Importantly, up to the smoothness condition in (6), we are completely free to choose whatever QQ-measure and function class 𝒢n\mathcal{G}_{n} suit us best.

This interpretation of Theorem 6 motivates the following bootstrap procedure for estimating the distribution of the supremum of an empirical process {𝔾n(f):fn}\{\mathbb{G}_{n}(f):f\in\mathcal{F}_{n}\} when the PP-measure is unknown.

Algorithm 1 (Gaussian process bootstrap).

Let X1,,XnX_{1},\ldots,X_{n} be a simple random sample drawn from distribution PP.

  • Step 1:

    Construct a positive semi-definite estimate 𝒞^n\widehat{\mathcal{C}}_{n} of the covariance function 𝒞P\mathcal{C}_{P} of the empirical process {𝔾n(f):fn}\{\mathbb{G}_{n}(f):f\in\mathcal{F}_{n}\}.

  • Step 2:

    Construct a Gaussian process Z^n={Z^n(f):fn}\widehat{Z}_{n}=\{\widehat{Z}_{n}(f):f\in\mathcal{F}_{n}\} such that for all f,gnf,g\in\mathcal{F}_{n},

    E[Z^n(f)X1,,Xn]=0andE[Z^n(f)Z^n(g)X1,,Xn]=𝒞^n(f,g).\displaystyle\mathrm{E}\big{[}\widehat{Z}_{n}(f)\mid X_{1},\ldots,X_{n}\big{]}=0\quad\mathrm{and}\quad\mathrm{E}\big{[}\widehat{Z}_{n}(f)\widehat{Z}_{n}(g)\mid X_{1},\ldots,X_{n}\big{]}=\widehat{\mathcal{C}}_{n}(f,g). (7)
  • Step 3:

    Approximate the distribution of 𝔾nn\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}} by drawing Monte Carlo samples from Z^nn\|\widehat{Z}_{n}\|_{\mathcal{F}_{n}}.

By Kolmogorov’s consistency theorem the Gaussian process Z^n\widehat{Z}_{n} always exists. If the estimate 𝒞^n\widehat{\mathcal{C}}_{n} is uniformly consistent and if the envelope function FnF_{n} and the supremum of the Gaussian PP-bridge process satisfy certain moment conditions, Theorem 6 readily implies uniform consistency of the Gaussian process bootstrap as nn\rightarrow\infty. The challenge is to actually construct a (measurable) version of Z^n\widehat{Z}_{n} from which we can draw Monte Carlo samples. This is the content of the next section.
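For a finite function class n={f1,,fd}\mathcal{F}_{n}=\{f_{1},\ldots,f_{d}\}, Steps 1–3 of Algorithm 1 can be sketched in a few lines. The estimator (the sample covariance), the design, and the class below are our own minimal choices for illustration, not the paper's prescription:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, B = 500, 10, 2000

X = rng.standard_normal((n, d))             # sample; f_j(x) = x_j for concreteness

# Step 1: positive semi-definite estimate of the covariance function C_P
C_hat = np.cov(X, rowvar=False, bias=True)  # sample covariance, PSD by construction

# Step 2: Gaussian process with conditional mean zero and covariance C_hat
L = np.linalg.cholesky(C_hat + 1e-12 * np.eye(d))
Z_hat = rng.standard_normal((B, d)) @ L.T   # B conditional draws of (Z_hat(f_1), ..., Z_hat(f_d))

# Step 3: Monte Carlo approximation of the distribution of ||G_n||_{F_n}
boot_sup = np.abs(Z_hat).max(axis=1)
c_95 = np.quantile(boot_sup, 0.95)          # e.g. a bootstrap critical value
```

On general index sets Step 2 is the nontrivial part; the Cholesky factorization used here is a finite-dimensional stand-in for the Karhunen-Loève construction of Section 3.5.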

Remark 5 (Relation to the Gaussian multiplier bootstrap by Chernozhukov et al. (2014b)).

The Gaussian multiplier bootstrap is a special case of the Gaussian process bootstrap. To see this, note that if we estimate the covariance function 𝒞P\mathcal{C}_{P} nonparametrically via the sample covariance function 𝒞Pn(f,g)=Pnfg(Pnf)(Png)\mathcal{C}_{P_{n}}(f,g)=P_{n}fg-(P_{n}f)(P_{n}g), f,gnf,g\in\mathcal{F}_{n}, then the corresponding Gaussian PnP_{n}-bridge process {GPn(f):fn}\{G_{P_{n}}(f):f\in\mathcal{F}_{n}\} can be expanded into a finite sum of non-orthogonal functions

GPn(f)=1ni=1nξi(f(Xi)Pnf),fn,\displaystyle G_{P_{n}}(f)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}\big{(}f(X_{i})-P_{n}f\big{)},\quad{}f\in\mathcal{F}_{n},

where ξ1,,ξn\xi_{1},\ldots,\xi_{n} are i.i.d. standard normal random variables independent of X1,,XnX_{1},\ldots,X_{n}. While this representation is extremely simple (in theory and practice!), unlike the Gaussian process bootstrap it does not allow integrating additional structural information about the covariance function. Empirically, this often results in less accurate bootstrap estimates (Giessing and Fan, 2023).
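The multiplier representation above translates directly into code. A hedged sketch for a finite class with coordinate projections fj(x)=xjf_{j}(x)=x_{j} (our own toy setup):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, B = 500, 10, 2000

X = rng.standard_normal((n, d))             # f_j(x) = x_j for concreteness
Xc = X - X.mean(axis=0)                     # f_j(X_i) - P_n f_j

# G_{P_n}(f_j) = n^{-1/2} sum_i xi_i (f_j(X_i) - P_n f_j), xi_i iid N(0, 1)
xi = rng.standard_normal((B, n))            # fresh multipliers for each of B draws
G = (xi @ Xc) / np.sqrt(n)                  # (B, d) multiplier-bootstrap draws
mult_sup = np.abs(G).max(axis=1)
q_95 = np.quantile(mult_sup, 0.95)
```

Conditional on the data, the draws have exactly the sample covariance 𝒞Pn\mathcal{C}_{P_{n}}, which is the sense in which this is a special case of Algorithm 1.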

3.5 A practical guide to the Gaussian process bootstrap

To implement the Gaussian process bootstrap we need two things: a uniformly consistent estimate of the covariance function and a systematic way of constructing Gaussian processes for given covariance functions.

Finding consistent estimates of the covariance function is relatively straightforward. For example, a natural estimate is the nonparametric sample covariance function defined as 𝒞Pn(f,g):=Pn(fg)(Pnf)(Png)\mathcal{C}_{P_{n}}(f,g):=P_{n}(fg)-(P_{n}f)(P_{n}g), f,gnf,g\in\mathcal{F}_{n}. Under suitable conditions on the probability measure PP and the function class n\mathcal{F}_{n} this estimate is uniformly consistent in high dimensions (e.g. Koltchinskii and Lounici, 2017). Of course, an important feature of the Gaussian process bootstrap is that it can be combined with any positive semi-definite estimate of the covariance function. In particular, we can use (semi-)parametric estimators to exploit structural constraints induced by PP and n\mathcal{F}_{n}. For concrete examples we refer to Section 4 and also Giessing and Fan (2023).

Constructing Gaussian processes for given covariance functions and defined on potentially arbitrary index sets is a more challenging problem. We propose to base the construction on an almost sure version of the classical Karhunen-Loève decomposition.

To develop this decomposition in the framework of this paper, we need new notation and concepts. In the following, (n,n,μ)(\mathcal{F}_{n},\mathcal{B}_{n},\mu) denotes a measure space, where μ\mu is a finite measure with support n\mathcal{F}_{n}. Typically, n\mathcal{B}_{n} is the Borel σ\sigma-algebra on n\mathcal{F}_{n} and the measure μ\mu is chosen for convenience. For example, μ\mu can be set to the Lebesgue measure when n\mathcal{F}_{n} is an interval, or the counting measure when n\mathcal{F}_{n} is a finite (discrete) set. The space L2(n,n,μ)L_{2}(\mathcal{F}_{n},\mathcal{B}_{n},\mu) equipped with the inner product ψ,ϕ:=nψ(f)ϕ(f)𝑑μ(f)\langle\psi,\phi\rangle:=\int_{\mathcal{F}_{n}}\psi(f)\phi(f)d\mu(f) is a Hilbert space. Given a positive semi-definite and continuous kernel K:n×nK:\mathcal{F}_{n}\times\mathcal{F}_{n}\rightarrow\mathbb{R} we define the linear operator TKT_{K} on L2(n,n,μ)L_{2}(\mathcal{F}_{n},\mathcal{B}_{n},\mu) via

(TKψ)(g):=K(,g),ψ=nK(f,g)ψ(f)𝑑μ(f),ψL2(n,n,μ),gn.\displaystyle(T_{K}\psi)(g):=\langle K(\cdot,g),\psi\rangle=\int_{\mathcal{F}_{n}}K(f,g)\psi(f)d\mu(f),\quad\psi\in L^{2}(\mathcal{F}_{n},\mathcal{B}_{n},\mu),\quad g\in\mathcal{F}_{n}. (8)

If n×nK2(f,g)𝑑μ(f)𝑑μ(g)<\int_{\mathcal{F}_{n}\times\mathcal{F}_{n}}K^{2}(f,g)d\mu(f)d\mu(g)<\infty, then TKT_{K} is a bounded linear operator. If (n,dQ)(\mathcal{F}_{n},d_{Q}) is a compact metric space, then TKT_{K} is a compact operator. In this case, TKT_{K} has a spectral decomposition which depends on the kernel KK alone; the measure μ\mu is exogenous. For further details we refer the reader to Chapters 4 and 7 in Hsing and Eubank (2015).

The following result is well-known (Jain and Kallianpur, 1970); since it is slightly adapted to our setting we have included its proof in the appendix.

Proposition 7 (Karhunen-Loève decomposition of Gaussian processes).

Let (n,eQ)(\mathcal{F}_{n},e_{Q}) be a compact pseudo-metric space and ZQ={ZQ(f):fn}Z_{Q}=\{Z_{Q}(f):f\in\mathcal{F}_{n}\} be a Gaussian QQ-motion. Suppose that ZQZ_{Q} has a continuous covariance function Q\mathcal{E}_{Q} and EZQn<\mathrm{E}\|Z_{Q}\|_{\mathcal{F}_{n}}<\infty. Let {(λk,φk)}k=1\{(\lambda_{k},\varphi_{k})\}_{k=1}^{\infty} be the eigenvalue and eigenfunction pairs of the linear operator TQT_{\mathcal{E}_{Q}} corresponding to Q\mathcal{E}_{Q}, and {ξk}k=1\{\xi_{k}\}_{k=1}^{\infty} be a sequence of i.i.d. standard normal random variables. Then, with probability one,

limmZQmZQn=0,\displaystyle\lim_{m\rightarrow\infty}\|Z_{Q}^{m}-Z_{Q}\|_{\mathcal{F}_{n}}=0,

where ZQm(f):=k=1mξkλkφk(f)Z_{Q}^{m}(f):=\sum_{k=1}^{m}\xi_{k}\sqrt{\lambda_{k}}\varphi_{k}(f), fnf\in\mathcal{F}_{n}, is an almost surely bounded and continuous Gaussian process on n\mathcal{F}_{n} for all mm\in\mathbb{N}.

This proposition justifies the following constructive approximation of the Gaussian proxy process Z^n\widehat{Z}_{n} defined in (7): Let 𝒞^n\widehat{\mathcal{C}}_{n} be a generic estimate of the covariance function 𝒞P\mathcal{C}_{P} of the empirical process {𝔾n(f):fn}\{\mathbb{G}_{n}(f):f\in\mathcal{F}_{n}\}. Suppose that 𝒞^n\widehat{\mathcal{C}}_{n} is such that the associated integral operator T𝒞^nT_{\widehat{\mathcal{C}}_{n}} defined in (8) admits a spectral decomposition. Denote by {(λ^k,φ^k)}k=1\big{\{}(\widehat{\lambda}_{k},\widehat{\varphi}_{k})\big{\}}_{k=1}^{\infty} its eigenvalue and eigenfunction pairs, and, for concreteness, assume that the λ^k\widehat{\lambda}_{k}’s are sorted in nonincreasing order. Given a sequence of i.i.d. standard normal random variables {ξk}k=1\{\xi_{k}\}_{k=1}^{\infty} and truncation level m1m\geq 1, define the Gaussian process and associated covariance function

Z^nm(f):=k=1mξkλ^kφ^k(f),𝒞^nm(f,g):=k=1mλ^kφ^k(f)φ^k(g),f,gn.\displaystyle\widehat{Z}_{n}^{m}(f):=\sum_{k=1}^{m}\xi_{k}\sqrt{\widehat{\lambda}_{k}}\widehat{\varphi}_{k}(f),\quad\quad\widehat{\mathcal{C}}_{n}^{m}(f,g):=\sum_{k=1}^{m}\widehat{\lambda}_{k}\widehat{\varphi}_{k}(f)\widehat{\varphi}_{k}(g),\quad f,g\in\mathcal{F}_{n}. (9)

While there are only a few situations in which the eigenvalues and eigenfunctions of T𝒞^nT_{\widehat{\mathcal{C}}_{n}} can be found analytically, from a computational perspective this is a standard problem and efficient numerical solvers exist (Ghanem and Spanos, 2003; Berlinet and Thomas-Agnan, 2004). Thus, constructing the process Z^nm\widehat{Z}_{n}^{m} in (9) does not pose any practical challenges. Proposition 7 now guarantees (under appropriate boundedness and smoothness conditions on Z^n\widehat{Z}_{n}) that the approximation error between Z^n\widehat{Z}_{n} and Z^nm\widehat{Z}_{n}^{m} can be made arbitrarily small by choosing m1m\geq 1 sufficiently large. In fact, by Mercer’s theorem the error can be quantified in terms of the operator norm of the difference between the covariance functions 𝒞^n\widehat{\mathcal{C}}_{n} and 𝒞^nm\widehat{\mathcal{C}}_{n}^{m} (e.g. Hsing and Eubank, 2015, Lemma 4.6.6 and Corollary 8).
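A minimal sketch of this construction on a discretized index set (the grid, the kernel class, the sample size, and the truncation level mm are all our own choices): the estimated covariance function becomes a matrix, its eigendecomposition plays the role of the spectral decomposition of T𝒞^nT_{\widehat{\mathcal{C}}_{n}}, and eq. (9) becomes a finite linear combination.

```python
import numpy as np

rng = np.random.default_rng(5)

# Discretized index set: functions f_t(x) = cos(t x), t on a grid of size 50.
ts = np.linspace(0.0, np.pi, 50)
n = 400
X = rng.standard_normal(n)
F = np.cos(np.outer(ts, X))                 # (50, n) matrix of f_t(X_i)

# Admissible estimate on the grid: the sample covariance function (symmetric, PSD).
C_hat = np.cov(F, bias=True)

# Spectral decomposition of the discretized operator and truncation as in eq. (9).
lam, phi = np.linalg.eigh(C_hat)
lam, phi = lam[::-1], phi[:, ::-1]          # eigenvalues in nonincreasing order
lam = np.clip(lam, 0.0, None)               # guard against tiny negative round-off
m = 5
xi = rng.standard_normal(m)
Z_m = phi[:, :m] @ (np.sqrt(lam[:m]) * xi)  # one draw of the truncated process

# Rank-m covariance and its sup-norm distance to C_hat (the error term in Corollary 8)
C_m = (phi[:, :m] * lam[:m]) @ phi[:, :m].T
err = np.abs(C_hat - C_m).max()
```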

In the above discussion we have implicitly imposed several assumptions on the estimate 𝒞^n\widehat{\mathcal{C}}_{n}. For future reference, we summarize these assumptions in a single definition.

Definition 1 (Admissibility of 𝒞^n\widehat{\mathcal{C}}_{n}).

We say that an estimate 𝒞^n:n×n\widehat{\mathcal{C}}_{n}:\mathcal{F}_{n}\times\mathcal{F}_{n}\rightarrow\mathbb{R} of the covariance function 𝒞P\mathcal{C}_{P} is admissible if it is continuous, symmetric, positive semi-definite, and its associated integral operator T𝒞^nT_{\widehat{\mathcal{C}}_{n}} is a bounded linear operator on L2(n,n,μ)L_{2}(\mathcal{F}_{n},\mathcal{B}_{n},\mu) for some finite measure μ\mu with support n\mathcal{F}_{n}.

Remark 6 (Admissible estimates exist).

Under the assumption that n\mathcal{F}_{n} has an envelope function, the nonparametric sample covariance 𝒞Pn(f,g)=Pn(fg)(Pnf)(Png)\mathcal{C}_{P_{n}}(f,g)=P_{n}(fg)-(P_{n}f)(P_{n}g) is admissible. Indeed, by definition, 𝒞Pn\mathcal{C}_{P_{n}} is continuous (w.r.t. ePne_{P_{n}}), symmetric, positive semi-definite, and, by existence of the envelope function, n×n𝒞Pn2(f,g)𝑑Pn(f)𝑑Pn(g)<\int_{\mathcal{F}_{n}\times\mathcal{F}_{n}}\mathcal{C}_{P_{n}}^{2}(f,g)dP_{n}(f)dP_{n}(g)<\infty. Also, the existence of an envelope function implies that (n,ePn)(\mathcal{F}_{n},e_{P_{n}}) is a compact metric space. See also Section 4.

The next result establishes consistency of the Gaussian process bootstrap based on the truncated Karhunen-Loève expansion in (9). It is a simple corollary of Theorem 6. Note that the intrinsic standard deviation metric associated with (the pushforward probability measure induced by) Z^nm\widehat{Z}_{n}^{m} can be expressed in terms of its covariance function as d𝒞^nm2(f,g)=𝒞^nm(f,f)+𝒞^nm(g,g)2𝒞^nm(f,g)d_{\widehat{\mathcal{C}}_{n}^{m}}^{2}(f,g)=\widehat{\mathcal{C}}_{n}^{m}(f,f)+\widehat{\mathcal{C}}_{n}^{m}(g,g)-2\widehat{\mathcal{C}}_{n}^{m}(f,g).

Corollary 8 (Consistency of the Gaussian process bootstrap).

Let 𝒞^n\widehat{\mathcal{C}}_{n} be an admissible estimate of 𝒞P\mathcal{C}_{P} and 𝒞^nm\widehat{\mathcal{C}}_{n}^{m} its best rank-mm approximation. Let nL2(S,𝒮,P)\mathcal{F}_{n}\subset L_{2}(S,\mathcal{S},P) have envelope FnL3(S,𝒮,P)F_{n}\in L_{3}(S,\mathcal{S},P). If the Gaussian processes {GP(f):fn}\{G_{P}(f):f\in\mathcal{F}_{n}\} and {Z^nm(f):fn}\{\widehat{Z}_{n}^{m}(f):f\in\mathcal{F}_{n}\} have almost surely uniformly continuous sample paths w.r.t. their respective standard deviation metrics dPd_{P} and d𝒞^nmd_{\widehat{\mathcal{C}}_{n}^{m}}, and EGPnE[Z^nmnX1,,Xn]<\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}\vee\mathrm{E}\big{[}\|\widehat{Z}_{n}^{m}\|_{\mathcal{F}_{n}}\mid X_{1},\ldots,X_{n}\big{]}<\infty, then, for each M0M\geq 0,

sups0|{𝔾nns}{Z^nmnsX1,,Xn}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|\widehat{Z}_{n}^{m}\|_{\mathcal{F}_{n}}\leq s\mid X_{1},\ldots,X_{n}\right\}\Big{|}
FnP,3n1/3Var(GPn)+Fn𝟏{Fn>M}P,33FnP,33+EGPn+MnVar(GPn)\displaystyle\quad{}\quad{}\quad{}\lesssim\frac{\|F_{n}\|_{P,3}}{\sqrt{n^{1/3}\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}+\frac{\|F_{n}\mathbf{1}\{F_{n}>M\}\|_{P,3}^{3}}{\|F_{n}\|_{P,3}^{3}}+\frac{\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}+M}{\sqrt{n\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}
+(supf,gn|𝒞P(f,g)𝒞^n(f,g)|Var(GPn))1/3+(supf,gn|𝒞^n(f,g)𝒞^nm(f,g)|Var(GPn))1/3,\displaystyle\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}+\left(\frac{\sup_{f,g\in\mathcal{F}_{n}}\big{|}\mathcal{C}_{P}(f,g)-\widehat{\mathcal{C}}_{n}(f,g)\big{|}}{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right)^{1/3}+\left(\frac{\sup_{f,g\in\mathcal{F}_{n}}\big{|}\widehat{\mathcal{C}}_{n}(f,g)-\widehat{\mathcal{C}}_{n}^{m}(f,g)\big{|}}{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right)^{1/3},

where \lesssim hides an absolute constant independent of n,m,M,n,Fn,Pnn,m,M,\mathcal{F}_{n},F_{n},P_{n}, and PP.

Moreover, if n\mathcal{F}_{n} is compact w.r.t. d𝒞^nd_{\widehat{\mathcal{C}}_{n}}, then the last term on the right hand side in above display vanishes as mm\rightarrow\infty.

In general, the strong variance Var(GPn)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}}) depends on the dimension of the function class n\mathcal{F}_{n}. Hence, the truncation level mm has to be chosen in inverse proportion to Var(GPn)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}}) to ensure that in the upper bound of Corollary 8 the deterministic approximation error supf,gn|𝒞^n(f,g)𝒞^nm(f,g)|\sup_{f,g\in\mathcal{F}_{n}}\big{|}\widehat{\mathcal{C}}_{n}(f,g)-\widehat{\mathcal{C}}_{n}^{m}(f,g)\big{|} is negligible compared to the stochastic estimation errors.
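As a quick sanity check on the identity d𝒞^nm2(f,g)=𝒞^nm(f,f)+𝒞^nm(g,g)2𝒞^nm(f,g)d_{\widehat{\mathcal{C}}_{n}^{m}}^{2}(f,g)=\widehat{\mathcal{C}}_{n}^{m}(f,f)+\widehat{\mathcal{C}}_{n}^{m}(g,g)-2\widehat{\mathcal{C}}_{n}^{m}(f,g) stated before Corollary 8, one can verify it by simulation on a two-point index set (the toy numbers are ours):

```python
import numpy as np

C = np.array([[2.0, 0.5],
              [0.5, 1.0]])                  # toy covariance of (Z(f), Z(g))
d2 = C[0, 0] + C[1, 1] - 2.0 * C[0, 1]      # d^2(f, g) via the covariance identity

rng = np.random.default_rng(7)
Z = rng.multivariate_normal(np.zeros(2), C, size=200_000)
emp = np.mean((Z[:, 0] - Z[:, 1]) ** 2)     # Monte Carlo E (Z(f) - Z(g))^2
```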

We conclude this section with a consistency result on the bootstrap approximation of quantiles of suprema of empirical processes. This result is the empirical process analogue to Proposition 5. It is relevant in the context of hypothesis testing and construction of confidence regions. For α(0,1)\alpha\in(0,1) we denote the conditional α\alpha-quantile of the supremum of {Z^nm(f):fn}\{\widehat{Z}_{n}^{m}(f):f\in\mathcal{F}_{n}\} by

cn(α;𝒞^nm)\displaystyle c_{n}(\alpha;\widehat{\mathcal{C}}_{n}^{m}) :=inf{s0:{Z^nmnsX1,,Xn}α}.\displaystyle:=\inf\left\{s\geq 0:\mathbb{P}\left\{\|\widehat{Z}_{n}^{m}\|_{\mathcal{F}_{n}}\leq s\mid X_{1},\ldots,X_{n}\right\}\geq\alpha\right\}.

We have the following result:

Theorem 9 (Quantiles of the Gaussian process bootstrap).

Consider the setup of Corollary 8. Let (Θn)n1(\Theta_{n})_{n\geq 1} be a sequence of arbitrary real-valued random variables, not necessarily independent of X1,,XnX_{1},\ldots,X_{n}. Then, for each M0M\geq 0,

supα(0,1)|{𝔾nn+Θncn(α;𝒞^nm)}α|\displaystyle\sup_{\alpha\in(0,1)}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}+\Theta_{n}\leq c_{n}(\alpha;\widehat{\mathcal{C}}_{n}^{m})\right\}-\alpha\Big{|}
FnP,3n1/3Var(GPn)+Fn𝟏{Fn>M}P,33FnP,33+EGPn+MnVar(GPn)\displaystyle\quad{}\quad{}\quad{}\lesssim\frac{\|F_{n}\|_{P,3}}{\sqrt{n^{1/3}\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}+\frac{\|F_{n}\mathbf{1}\{F_{n}>M\}\|_{P,3}^{3}}{\|F_{n}\|_{P,3}^{3}}+\frac{\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}+M}{\sqrt{n\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}
+infδ>0{(δVar(GPn))1/3+{supf,gn|𝒞P(f,g)𝒞^nm(f,g)|>δ}}\displaystyle\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}+\inf_{\delta>0}\left\{\left(\frac{\delta}{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right)^{1/3}+\mathbb{P}\left\{\sup_{f,g\in\mathcal{F}_{n}}\big{|}\mathcal{C}_{P}(f,g)-\widehat{\mathcal{C}}_{n}^{m}(f,g)\big{|}>\delta\right\}\right\}
+infη>0{ηVar(GPn)+{|Θn|>η}},\displaystyle\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}+\inf_{\eta>0}\left\{\frac{\eta}{\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}+\mathbb{P}\left\{|\Theta_{n}|>\eta\right\}\right\},

where \lesssim hides an absolute constant independent of n,m,M,n,Fn,Pnn,m,M,\mathcal{F}_{n},F_{n},P_{n}, and PP.

In statistical applications, the statistic of interest is rarely a simple empirical process. Instead, the empirical process usually arises as the leading term of a (functional) Taylor expansion. The random sequence (Θn)n1(\Theta_{n})_{n\geq 1} in the above theorem can be used to capture the higher-order approximation errors of such an expansion.

Remark 7 (Additional practical considerations).

It is often infeasible to draw Monte Carlo samples directly from Z^nmn\|\widehat{Z}_{n}^{m}\|_{\mathcal{F}_{n}}. In practice, we suggest approximating n\mathcal{F}_{n} via a finite δFnP,2\delta\|F_{n}\|_{P,2}-net 𝒢nn\mathcal{G}_{n}\subseteq\mathcal{F}_{n} with respect to dPd_{P}. With this additional approximation step and by Corollary 5, the conclusion of Corollary 8 holds with Z^nm𝒢n\|\widehat{Z}_{n}^{m}\|_{\mathcal{G}_{n}} substituted for Z^nmn\|\widehat{Z}_{n}^{m}\|_{\mathcal{F}_{n}} and the additional quantity (δFnP,2supfnPf2)1/3Var(GPn)1/3\big{(}\delta\|F_{n}\|_{P,2}\sup_{f\in\mathcal{F}_{n}}\sqrt{Pf^{2}}\big{)}^{1/3}\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})^{-1/3} in the upper bound on the Kolmogorov distance. This shows that the level of discretization δ>0\delta>0 should be chosen proportional to Var(GPn)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}}).
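To illustrate the discretization step, the following sketch builds a greedy net in Euclidean distance (a stand-in for the metric dPd_{P}; the function name and data are illustrative):

```python
import numpy as np

def greedy_net(points, delta):
    """Greedy delta-net: keep a point only if it lies farther than delta
    (Euclidean distance, standing in for d_P) from every point kept so far.
    Returns the indices of the net points."""
    pts = np.asarray(points, dtype=float)
    net = []
    for i, p in enumerate(pts):
        if all(np.linalg.norm(p - pts[j]) > delta for j in net):
            net.append(i)
    return net

# Equally spaced points on a line; delta = 1.5 keeps every other point
idx = greedy_net([[0.0], [1.0], [2.0], [3.0]], delta=1.5)
```

Every discarded point lies within delta of some retained point, so suprema over the net approximate suprema over the full collection up to the discretization error discussed above.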

3.6 Relation to previous work

The papers by Chernozhukov et al. (2014a, b, 2016) are currently the only other existing works on Gaussian and bootstrap approximations of suprema of empirical processes indexed by (potentially) non-Donsker function classes. In this section, we compare their results to our Theorems 1–9. To keep the comparison concise, we focus on the key aspects that motivated us to write this paper.

It is important to note that the results presented in Chernozhukov et al. (2014a, b, 2016) differ slightly in nature from ours. Instead of establishing bounds on Kolmogorov distances, Chernozhukov and his co-authors derive coupling inequalities. However, through standard arguments (Strassen’s theorem and anti-concentration), these coupling inequalities indirectly yield bounds on Kolmogorov distances. The implied bounds on the Kolmogorov distances are at most as sharp as the ones in the coupling inequalities, up to multiplicative constants. Since we are not concerned with absolute constants, but only with the dependence of the upper bounds on characteristics of the function class n\mathcal{F}_{n} and the law PP, we can meaningfully compare their findings with ours.

  • Unbounded function classes. In classical empirical process theory the function class n\mathcal{F}_{n} is typically assumed to be uniformly bounded, i.e. supxS|f(x)|<\sup_{x\in S}|f(x)|<\infty for all fnf\in\mathcal{F}_{n} (e.g. van der Vaart and Wellner, 1996). A key feature of Theorems 1–9 as well as Theorems A.2, 2.1, and 2.1-2.3 in Chernozhukov et al. (2014a, b, 2016) is that they hold for unbounded function classes and only require the envelope function FnF_{n} to have finite moments. Relaxing the uniform boundedness of the function class is useful, among other things, for inference on high-dimensional statistical models, functional data analysis, nonparametric regression, and series/sieve estimation (Chernozhukov et al., 2014a, b; Giessing and Fan, 2023).

  • Entropy conditions. The upper bounds provided in Theorems A.2, 2.1, and 2.1-2.3 in Chernozhukov et al. (2014a, b, 2016) depend on a combination of (truncated) second and qqth (q3q\geq 3) moments of max1inFn(Xi)\max_{1\leq i\leq n}F_{n}(X_{i}), “local quantities” of order supfnP|f|3\sup_{f\in\mathcal{F}_{n}}P|f|^{3} and supfnPf4\sup_{f\in\mathcal{F}_{n}}\sqrt{Pf^{4}} (disregarding (logn)(\log n)-factors), EGPn,δ𝔾nn,δP,1\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime}}\vee\big{\|}\|\mathbb{G}_{n}\|_{\mathcal{F}_{n,\delta}^{\prime}}\big{\|}_{P,1}, and the entropy number logN(,eP,δFnP,2)\log N(\mathcal{F},e_{P},\delta\|F_{n}\|_{P,2}), δ>0\delta>0 arbitrary. These upper bounds are not only more complex than ours but also weaker in terms of their implications: since the metric entropy with respect to the intrinsic standard deviation metric ePe_{P} typically scales linearly in the (VC-)dimension of the statistical model (see Appendix A for two (counter-)examples), the upper bounds in Chernozhukov et al. (2014a, b, 2016) are vacuous in high-dimensional situations without sparsity or when the (VC-)dimension exceeds the sample size. In contrast, the upper bounds in our Theorems 1–9 depend only on the expected value and standard deviation of the supremum of the Gaussian PP-bridge process and the (truncated) third moments of the envelope. Under mild assumptions, these quantities can be upper bounded independently of the (VC-)dimension, thus offering useful bounds even in high-dimensional problems (see Section 4).

    The entropy term in Chernozhukov et al. (2014a, b, 2016) is due to their proofs relying on a dimension-dependent version of Proposition 1.

  • Lower bounds on weak variances. Lemmas 2.3 and 2.4 in Chernozhukov et al. (2014b) and all of the results in Chernozhukov et al. (2014a, 2016) require a strictly positive lower bound on the weak variance of the Gaussian PP-bridge process, i.e. inffnVar(GP(f))σ¯2>0\inf_{f\in\mathcal{F}_{n}}\mathrm{Var}\big{(}G_{P}(f)\big{)}\geq\underline{\sigma}^{2}>0. This assumption automatically limits the applicability of these lemmas to studentized statistics and standardized function classes (Chernozhukov et al., 2014a, b) and excludes relevant scenarios with variance decay (Lopes et al., 2020; Lopes, 2022b; Lopes et al., 2023). In contrast, our Theorems 1–9 apply to all function classes for which the strong variance of the Gaussian PP-bridge process, Var(GPn)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}}), does not vanish “too fast”. A slowly vanishing strong variance is a weaker requirement than a strictly positive lower bound on the weak variance (see Giessing (2023) and also Lemmas 6 and 7).

    The lower bound on the weak variance is an artifact of Chernozhukov et al.’s (2014a; 2014b; 2016) proof technique which requires the joint distribution of the finite-dimensional marginals of the Gaussian PP-bridge process to be non-degenerate.

4 Applications

4.1 Confidence ellipsoids for high-dimensional parameter vectors

Confidence regions are fundamental to uncertainty quantification in multivariate statistics. Typically, their asymptotic validity relies on multivariate CLTs. However, in high dimensions, when the number of parameters exceeds the sample size, the validity of confidence regions needs to be justified differently. In this section, we show how the Gaussian process bootstrap offers a practical solution to this problem. Existing work on bootstrap confidence regions in high dimensions has focused exclusively on conservative, rectangular confidence regions (e.g. Chernozhukov et al., 2013, 2014b, 2023). For the first time, our results allow the construction of tighter, elliptical confidence regions without sparsity assumptions.

Let θ0d\theta_{0}\in\mathbb{R}^{d} be an unknown parameter and θ^nθ^n(X1,,Xn)\hat{\theta}_{n}\equiv\hat{\theta}_{n}(X_{1},\ldots,X_{n}) an estimator for θ0\theta_{0} based on the simple random sample X1,,XnX_{1},\ldots,X_{n}. Then, an asymptotic 1α1-\alpha confidence ellipsoid for θ0\theta_{0} can be constructed as

n(c):={θd:nθ^nθ2c},\displaystyle\mathcal{E}_{n}(c):=\left\{\theta\in\mathbb{R}^{d}:\sqrt{n}\|\hat{\theta}_{n}-\theta\|_{2}\leq c\right\}, (10)

where 2\|\cdot\|_{2} denotes the Euclidean norm and c>0c>0 solves

limn{nθ^nθ02c}=1α.\displaystyle\lim_{n\rightarrow\infty}\mathbb{P}\left\{\sqrt{n}\|\hat{\theta}_{n}-\theta_{0}\|_{2}\leq c\right\}=1-\alpha.

In general, it is difficult to determine cc from the above equation because the sampling distribution of nθ^nθ02\sqrt{n}\|\hat{\theta}_{n}-\theta_{0}\|_{2} often does not have an explicit form. However, in many cases, θ^n\hat{\theta}_{n} admits an asymptotically linear expansion of the form

θ^nθ0=1ni=1nψi+Rn,\displaystyle\hat{\theta}_{n}-\theta_{0}=\frac{1}{n}\sum_{i=1}^{n}\psi_{i}+R_{n},

where ψ1,,ψn\psi_{1},\ldots,\psi_{n} are centered i.i.d. random vectors (influence functions) and RnR_{n} is a remainder term. Thus, in these cases, we have

nθ^nθ02=supuSd1|1ni=1nψiu|+Θn,where|Θn|nRn2.\displaystyle\sqrt{n}\|\hat{\theta}_{n}-\theta_{0}\|_{2}=\sup_{u\in S^{d-1}}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\psi_{i}^{\prime}u\right|+\Theta_{n},\quad\quad\text{where}\quad\quad|\Theta_{n}|\leq\sqrt{n}\|R_{n}\|_{2}.

Using this formulation, we can now apply Theorem 9 to approximate the quantile cc and complete the construction of the asymptotic confidence ellipsoid. By Algorithm 1, we need to construct a Gaussian process with index set Sd1S^{d-1} whose covariance function approximates the bilinear map (u,v)uΩψv(u,v)\mapsto u^{\prime}\Omega_{\psi}v, where Ωψ=E[ψ1ψ1]\Omega_{\psi}=\mathrm{E}[\psi_{1}\psi_{1}^{\prime}]. Clearly, the following process does the job:

{Z^ψu:uSd1},whereZ^ψX1,,XnN(0,Ω^ψ),\displaystyle\{\widehat{Z}_{\psi}^{\prime}u:u\in S^{d-1}\},\quad\quad\text{where}\quad\quad\widehat{Z}_{\psi}\mid X_{1},\ldots,X_{n}\sim N(0,\widehat{\Omega}_{\psi}),

and Ω^ψ=n1i=1n(ψiψ¯n)(ψiψ¯n)\widehat{\Omega}_{\psi}=n^{-1}\sum_{i=1}^{n}(\psi_{i}-\overline{\psi}_{n})(\psi_{i}-\overline{\psi}_{n})^{\prime} and ψ¯n=n1i=1nψi\overline{\psi}_{n}=n^{-1}\sum_{i=1}^{n}\psi_{i}. We denote the α\alpha-quantile of the supremum of this process by

cn(α):=inf{c0:{Z^ψ2cX1,,Xn}α}.\displaystyle c_{n}(\alpha):=\inf\left\{c\geq 0:\mathbb{P}\left\{\|\widehat{Z}_{\psi}\|_{2}\leq c\mid X_{1},\ldots,X_{n}\right\}\geq\alpha\right\}. (11)
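In practice, the conditional quantile (11) can be approximated by Monte Carlo: draw repeatedly from N(0,Ω^ψ)N(0,\widehat{\Omega}_{\psi}) and take an empirical quantile of the Euclidean norms. A minimal NumPy sketch (the function name, the number of draws BB, and the synthetic data are illustrative):

```python
import numpy as np

def ellipsoid_quantile(psi, alpha=0.95, B=2000, rng=None):
    """Monte Carlo estimate of the conditional alpha-quantile of
    ||Z_psi||_2 with Z_psi | data ~ N(0, Omega_hat), where Omega_hat is
    the sample covariance of the estimated influence functions (n x d)."""
    rng = np.random.default_rng(rng)
    psi = np.asarray(psi, dtype=float)
    centered = psi - psi.mean(axis=0)
    omega_hat = centered.T @ centered / len(psi)
    draws = rng.multivariate_normal(np.zeros(psi.shape[1]), omega_hat, size=B)
    return float(np.quantile(np.linalg.norm(draws, axis=1), alpha))

# Synthetic influence functions with (approximately) identity covariance
psi = np.random.default_rng(0).standard_normal((500, 3))
c_hat = ellipsoid_quantile(psi, alpha=0.95, rng=1)
```

With Ω^ψI3\widehat{\Omega}_{\psi}\approx I_{3}, the resulting quantile is close to the square root of the 0.95-quantile of a chi-squared distribution with 3 degrees of freedom.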

To show that these quantiles uniformly approximate the quantiles of the (asymptotic) distribution of nθ^nθ02\sqrt{n}\|\hat{\theta}_{n}-\theta_{0}\|_{2}, we introduce the following assumptions:

Definition 2 (Sub-Gaussian random vector).

We call a centered random vector XdX\in\mathbb{R}^{d} sub-Gaussian if Xuψ22E[(Xu)2]\|X^{\prime}u\|_{\psi_{2}}^{2}\lesssim\mathrm{E}[(X^{\prime}u)^{2}] for all udu\in\mathbb{R}^{d}.

Assumption 1 (Sub-Gaussian influence functions).

The influence functions ψ,ψ1,,ψnd\psi,\psi_{1},\ldots,\psi_{n}\in\mathbb{R}^{d} are i.i.d. sub-Gaussian random vectors with covariance matrix Ωψ\Omega_{\psi} and (i) r(Ωψ)/r(Ωψ2)=O(1)r(\Omega_{\psi})/r(\Omega_{\psi}^{2})=O(1), (ii) r(Ωψ)=o(n1/6)r(\Omega_{\psi})=o(n^{1/6}), and (iii) nRn2=op(1)\sqrt{n}\|R_{n}\|_{2}=o_{p}(1), where r(A)=tr(A)/Aopr(A)=\mathrm{tr}(A)/\|A\|_{op} is the effective rank of the matrix AA.

Assumption 2 (Heavy-tailed influence functions).

The influence functions ψ,ψ1,,ψnd\psi,\psi_{1},\ldots,\psi_{n}\in\mathbb{R}^{d} are centered i.i.d. random vectors with covariance matrix Ωψ\Omega_{\psi} such that Mn2:=E[max1inψi22]/Ωψop<M_{n}^{2}:=\mathrm{E}[\max_{1\leq i\leq n}\|\psi_{i}\|_{2}^{2}]/\|\Omega_{\psi}\|_{op}<\infty and m2,ss:=E[ψ12s]/Ωψops/2<m_{2,s}^{s}:=\mathrm{E}[\|\psi_{1}\|_{2}^{s}]/\|\Omega_{\psi}\|_{op}^{s/2}<\infty for some s>3s>3. Furthermore, (i) r(Ωψ)/r(Ωψ2)=O(1)r(\Omega_{\psi})/r(\Omega_{\psi}^{2})=O(1), (ii) m2,s/m2,3=o(n1/31/s)m_{2,s}/m_{2,3}=o(n^{1/3-1/s}), (iii) m2,3=o(n1/6)m_{2,3}=o(n^{1/6}), (iv) Mn2(logn)=o(n)M_{n}^{2}(\log n)=o(n), and (v) nRn2=op(1)\sqrt{n}\|R_{n}\|_{2}=o_{p}(1), where r(A)=tr(A)/Aopr(A)=\mathrm{tr}(A)/\|A\|_{op} is the effective rank of the matrix AA.

Under either assumption, we have the following result:

Proposition 1 (Bootstrap confidence ellipsoid).

If either Assumption 1 or Assumption 2 holds, then

limnsupα(0,1)|{θ0n(cn(1α))}(1α)|=0,\displaystyle\lim_{n\rightarrow\infty}\sup_{\alpha\in(0,1)}\left|\mathbb{P}\Big{\{}\theta_{0}\in\mathcal{E}_{n}\big{(}c_{n}(1-\alpha)\big{)}\Big{\}}-(1-\alpha)\right|=0,

with the confidence ellipsoid n\mathcal{E}_{n} as defined in (10) and the conditional quantile cn(1α)c_{n}(1-\alpha) as in (11).

It is worth noting that neither Assumption 1 nor 2 requires sparsity of the parameter θ0\theta_{0}, the estimator θ^n\hat{\theta}_{n}, or the influence functions ψ1,,ψn\psi_{1},\ldots,\psi_{n}. Instead, Assumptions 1 and 2 already hold if the covariance matrix Ωψ\Omega_{\psi} has bounded effective rank (see Giessing and Fan, 2023, Section 2.1 and Appendices A.1 and A.2). In this context it is important that, unlike Chernozhukov et al. (2014b), we do not require a strictly positive lower bound on infu2=1E[(Zψu)2]λmin(Ωψ)\inf_{\|u\|_{2}=1}\mathrm{E}[(Z_{\psi}^{\prime}u)^{2}]\equiv\lambda_{\min}(\Omega_{\psi}), the smallest eigenvalue of the covariance matrix Ωψ\Omega_{\psi}. If such a lower bound were needed, the effective rank r(Ωψ)r(\Omega_{\psi}) would grow linearly in the dimension dd and Assumptions 1 and 2 would be violated if dnd\gg n (see Appendix A). Moreover, Proposition 1 cannot be deduced from Theorem 2.1 in Chernozhukov et al. (2014b) because the upper bound in their coupling inequality would feature the term dlog(1+1/ϵ)/n1/4\sqrt{d\log(1+1/\epsilon)}/n^{1/4} (among other terms), which is a remnant of the entropy number of the ϵ\epsilon-net discretization of the dd-dimensional Euclidean unit ball.

4.2 Inference on the spectral norm of high-dimensional covariance matrices

Spectral statistics of random matrices play an important role in multivariate statistical analysis. The asymptotic distributions of spectral statistics are well established in low dimensions (e.g. Anderson, 1963; Waternaux, 1976; Fujikoshi, 1980) and when the dimension is comparable to the sample size (e.g. Johnstone, 2001; El Karoui, 2007; Péché, 2009; Bao et al., 2015). In the high-dimensional case, when asymptotic arguments do not apply, bootstrap procedures have proved to be effective in approximating distributions of certain maximum-type spectral statistics (e.g. Han et al., 2018; Naumov et al., 2019; Lopes et al., 2019, 2020; Lopes, 2022b). Here, we demonstrate that the Gaussian process bootstrap is a viable alternative to these bootstrap procedures in approximating the distribution of the spectral norm of a high-dimensional sample covariance matrix.

Let X1,,XndX_{1},\ldots,X_{n}\in\mathbb{R}^{d} be i.i.d. random vectors with law PP, mean zero, and covariance matrix Σd×d\Sigma\in\mathbb{R}^{d\times d}. Consider the spectral statistic

Tn:=nΣ^nΣop,Σ^n=n1i=1nXiXi.\displaystyle T_{n}:=\sqrt{n}\|\widehat{\Sigma}_{n}-\Sigma\|_{op},\quad\quad{}\widehat{\Sigma}_{n}=n^{-1}\sum_{i=1}^{n}X_{i}X_{i}^{\prime}.

Since there does not exist a closed form expression of the sampling distribution of TnT_{n} when dnd\gg n, we apply Algorithm 1 to obtain an approximation based on a Gaussian proxy process. We introduce the following notation: Let x,u,vdx,u,v\in\mathbb{R}^{d} and note that (xu)(xv)=vech(xx)Hd(vu)(x^{\prime}u)(x^{\prime}v)=\mathrm{vech}^{\prime}(xx^{\prime})H_{d}^{\prime}(v\otimes u), where \otimes denotes the Kronecker product, vech()\mathrm{vech}(\cdot) the half-vectorization operator which turns a symmetric matrix Ad×dA\in\mathbb{R}^{d\times d} into a d(d+1)/2d(d+1)/2 column vector (of unique entries), and HdH_{d} the duplication matrix such that Hdvech(A)=vec(A)H_{d}\>\mathrm{vech}(A)=\mathrm{vec}(A) where vec()\mathrm{vec}(\cdot) is the ordinary vectorization operator. Whence, TnnQnQnT_{n}\equiv\sqrt{n}\|Q_{n}-Q\|_{{\mathcal{F}_{n}}} where QnQ_{n} is the empirical measure of the collection of Yi=vech(XiXi)HdY_{i}=\mathrm{vech}^{\prime}(X_{i}X_{i}^{\prime})H_{d}^{\prime}, 1in1\leq i\leq n, QQ the pushforward of PP under the map XY=vech(XX)HdX\mapsto Y=\mathrm{vech}^{\prime}(XX^{\prime})H_{d}^{\prime}, and n={yy(vu):u,vSd1}\mathcal{F}_{n}=\{y\mapsto y(v\otimes u):u,v\in S^{d-1}\}. Since each fnf\in\mathcal{F}_{n} has a (not necessarily unique) representation in terms of u,vSd1u,v\in S^{d-1}, in the following we identify fnf\in\mathcal{F}_{n} with pairs (u,v)Sd1×Sd1(u,v)\in S^{d-1}\times S^{d-1}. The covariance function associated with the empirical process {n(QnQ)(f):fn}\{\sqrt{n}(Q_{n}-Q)(f):f\in\mathcal{F}_{n}\} is thus given by

((u1,v1),(u2,v2))\displaystyle\big{(}(u_{1},v_{1}),(u_{2},v_{2})\big{)} (v1u1)HdΩHd(v2u2),\displaystyle\mapsto(v_{1}\otimes u_{1})^{\prime}H_{d}\Omega H_{d}^{\prime}(v_{2}\otimes u_{2}),

where Ω=E[vech(XXΣ)vech(XXΣ)]d(d+1)/2×d(d+1)/2\Omega=\mathrm{E}\left[\mathrm{vech}(XX^{\prime}-\Sigma)\mathrm{vech}^{\prime}(XX^{\prime}-\Sigma)\right]\in\mathbb{R}^{d(d+1)/2\times d(d+1)/2}. Thus, in light of Algorithm 1, the natural choice for the Gaussian bootstrap process is

{Z^nHd(vu):u,vSd1},whereZ^nX1,,XnN(0,Ω^n),\displaystyle\big{\{}\widehat{Z}_{n}^{\prime}H_{d}^{\prime}(v\otimes u):u,v\in S^{d-1}\big{\}},\quad\quad\text{where}\quad\quad\widehat{Z}_{n}\mid X_{1},\ldots,X_{n}\sim N(0,\widehat{\Omega}_{n}), (12)

and Ω^n=n1i=1nvech(XiXiΣ^n)vech(XiXiΣ^n)\widehat{\Omega}_{n}=n^{-1}\sum_{i=1}^{n}\mathrm{vech}(X_{i}X_{i}^{\prime}-\widehat{\Sigma}_{n})\mathrm{vech}^{\prime}(X_{i}X_{i}^{\prime}-\widehat{\Sigma}_{n}) is the sample analogue of Ω\Omega.
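The bootstrap process in (12) can be simulated without forming the duplication matrix HdH_{d} explicitly: sample on the half-vectorized scale and reshape each draw into a symmetric matrix. A minimal sketch (the function name and the number of draws BB are illustrative):

```python
import numpy as np

def spectral_norm_bootstrap(X, B=1000, rng=None):
    """Draws of ||S_hat_n||_op as in (12): sample z | data ~ N(0, Omega_hat_n)
    on the half-vectorized scale and reshape each draw into a symmetric
    d x d matrix; the duplication matrix H_d is never formed explicitly."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    sigma_hat = X.T @ X / n
    iu = np.triu_indices(d)
    # Rows are vech(X_i X_i' - Sigma_hat_n)
    W = np.stack([(np.outer(x, x) - sigma_hat)[iu] for x in X])
    omega_hat = W.T @ W / n
    z = rng.multivariate_normal(np.zeros(len(iu[0])), omega_hat, size=B)
    sups = np.empty(B)
    for b in range(B):
        S = np.zeros((d, d))
        S[iu] = z[b]
        S = S + S.T - np.diag(np.diag(S))   # symmetric "unvech"
        sups[b] = np.linalg.norm(S, 2)      # operator (spectral) norm
    return sups

sups = spectral_norm_bootstrap(np.random.default_rng(0).standard_normal((200, 4)),
                               B=200, rng=1)
```

An empirical quantile of the returned suprema then serves as the bootstrap critical value for TnT_{n}.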

We make the following assumption:

Assumption 3 (Sub-Gaussian data).

The data X,X1,,XndX,X_{1},\ldots,X_{n}\in\mathbb{R}^{d} are i.i.d. sub-Gaussian random vectors with covariance matrix Σ\Sigma and infu2=1Var((Xu)2)κ>0\inf_{\|u\|_{2}=1}\mathrm{Var}((X^{\prime}u)^{2})\geq\kappa>0.

Remark 1 (On the lower bound on the variances).

The strictly positive lower bound κ>0\kappa>0 on the variance of the quadratic form uXXuu^{\prime}XX^{\prime}u is mild. The existence of the lower bound κ>0\kappa>0 is equivalent to E[(Xu)4]1/4>E[(Xu)2]1/2\mathrm{E}[(X^{\prime}u)^{4}]^{1/4}>\mathrm{E}[(X^{\prime}u)^{2}]^{1/2} for all uSd1u\in S^{d-1}. The latter inequality holds if the law of XX does not concentrate on a lower dimensional subspace of d\mathbb{R}^{d}. Since the bounds in Proposition 2 are explicit in κ>0\kappa>0, Proposition 2 also applies to scenarios in which κκ(n,d)0\kappa\equiv\kappa(n,d)\rightarrow 0 as n,dn,d\rightarrow\infty. Similar lower bounds on the variance of uXXuu^{\prime}XX^{\prime}u appear in Lopes (2022b); Lopes et al. (2023).

We have the following result:

Proposition 2 (Bootstrap approximation of the distribution of spectral norms of covariance matrices).

Suppose that Assumption 3 holds. Let S^n=vec1(HdZ^n)d×d\widehat{S}_{n}=\mathrm{vec}^{-1}(H_{d}\widehat{Z}_{n})\in\mathbb{R}^{d\times d} with Z^n\widehat{Z}_{n} as given in (12). Then,

sups0|{nΣ^nΣops}{S^nopsX1,,Xn}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\sqrt{n}\|\widehat{\Sigma}_{n}-\Sigma\|_{op}\leq s\right\}-\mathbb{P}\left\{\|\widehat{S}_{n}\|_{op}\leq s\mid X_{1},\ldots,X_{n}\right\}\Big{|}
Op((Σop2/3κ2/3Σop4/3κ4/3)((logen)2r7(Σ)n(logen)2r4(Σ)n)1/3)\displaystyle\quad\quad\quad\lesssim O_{p}\left(\left(\frac{\|\Sigma\|_{op}^{2/3}}{\kappa^{2/3}}\vee\frac{\|\Sigma\|_{op}^{4/3}}{\kappa^{4/3}}\right)\left(\sqrt{\frac{(\log en)^{2}\mathrm{r}^{7}(\Sigma)}{n}}\vee\frac{(\log en)^{2}\mathrm{r}^{4}(\Sigma)}{n}\right)^{1/3}\right)
+(ΣopκΣop2κ2)r2(Σ)n1/6,\displaystyle\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}+\left(\frac{\|\Sigma\|_{op}}{\kappa}\vee\frac{\|\Sigma\|_{op}^{2}}{\kappa^{2}}\right)\frac{r^{2}(\Sigma)}{n^{1/6}},

where r(Σ)=tr(Σ)/Σopr(\Sigma)=\mathrm{tr}(\Sigma)/\|\Sigma\|_{op} is the effective rank of Σ\Sigma and \lesssim hides an absolute constant independent of n,d,Σn,d,\Sigma, and κ\kappa.

Remark 2.

Note that the matrix S^n\widehat{S}_{n} is symmetric just as the target matrix Σ^nΣ\widehat{\Sigma}_{n}-\Sigma.

We conclude this section with a comparison of Proposition 2 to existing results in the literature: First, if (Σop/κΣop2/κ2)r2(Σ)=o(n1/6)(\|\Sigma\|_{op}/\kappa\vee\|\Sigma\|_{op}^{2}/\kappa^{2})r^{2}(\Sigma)=o(n^{1/6}), then the upper bound in the above proposition is asymptotically negligible. In this case, the bootstrapped distribution of the Gaussian proxy statistic S^nop\|\widehat{S}_{n}\|_{op} consistently approximates the distribution of Σ^nΣop\|\widehat{\Sigma}_{n}-\Sigma\|_{op}. Since this rate depends only on the effective rank, it is dimension-free and cannot be derived through the results in Chernozhukov et al. (2014b). Second, unlike the results in Lopes (2022b) and Lopes et al. (2023), Proposition 2 does not rely on specific assumptions about the decay of the eigenvalues of Σ\Sigma. And yet, under certain circumstances, the consistency rate provided by Proposition 2 can be faster than theirs. Specifically, the bootstrap procedure described in Lopes (2022b) achieves consistency at a rate of nβ1/22β+4+ϵn^{-\frac{\beta-1/2}{2\beta+4+\epsilon}} for ϵ>0\epsilon>0 and β>1/2\beta>1/2. In our context, the parameter β\beta determines the rate at which the eigenvalues of Σ\Sigma decrease, i.e. λk(Σ)k2β\lambda_{k}(\Sigma)\asymp k^{-2\beta}, k=1,,dk=1,\ldots,d. To achieve a rate faster than n1/6n^{-1/6}, β\beta must be greater than (7+ϵ)/4(7+\epsilon)/4, which requires an extremely fast decay of the eigenvalues. Third, Lopes et al. (2023) conduct extensive numerical experiments and observe that the bootstrap approximation exhibits a sharp phase transition from accurate to inaccurate when β\beta switches from greater than 1/21/2 to less than 1/21/2. This observation aligns not only with their own theoretical findings but also with the upper bound presented in Proposition 2, since, under their modeling assumptions, the effective rank r(Σ)r(\Sigma) remains bounded if β>1/2\beta>1/2 but diverges if β1/2\beta\leq 1/2.

4.3 Simultaneous confidence bands for functions in reproducing kernel Hilbert spaces

Reproducing kernel Hilbert spaces (RKHS) are an integral part of statistics, with applications in classical non-parametric statistics (Wahba, 1990), machine learning (Schölkopf and Smola, 2002; Steinwart and Christmann, 2008) and, most recently, (deep) neural nets (Belkin et al., 2018; Jacot et al., 2018; Bohn et al., 2019; Unser, 2019; Chen and Xu, 2020). In this section, we consider constructing simultaneous confidence bands for functions in RKHS by bootstrapping the distribution of a bias-corrected kernel ridge regression estimator. Recently, this problem has been addressed by Singh and Vijaykumar (2023) with a symmetrized multiplier bootstrap. Here, we propose an alternative based on the Gaussian process bootstrap using a truncated Karhunen-Loève decomposition. We point out several commonalities and differences between the two procedures.

In the following, \mathcal{H} denotes the RKHS of continuous functions from SS to \mathbb{R} associated to the symmetric and positive definite kernel k:S×Sk:S\times S\rightarrow\mathbb{R}. We allow \mathcal{H}, SS, and kk to change with the sample size nn, but we do not make this dependence explicit. We write \|\cdot\|_{\mathcal{H}} for the norm induced by the inner product ,\langle\cdot,\cdot\rangle_{\mathcal{H}} and \|\cdot\|_{\infty} for the supremum norm. For xSx\in S, we let kx:Sk_{x}:S\rightarrow\mathbb{R} be the function yk(x,y)y\mapsto k(x,y). Then, kxk_{x}\in\mathcal{H} for all xSx\in S and, by the reproducing property, f(x)=f,kxf(x)=\langle f,k_{x}\rangle_{\mathcal{H}} for all ff\in\mathcal{H} and xSx\in S. The kernel induces the so-called kernel metric dk(x,y):=kxkyd_{k}(x,y):=\|k_{x}-k_{y}\|_{\mathcal{H}}, x,ySx,y\in S. Given ff\in\mathcal{H} (zSz\in S) we denote its dual by ff^{*}\in\mathcal{H}^{*} (zSz^{*}\in S^{*}\subset\mathcal{H}). For f,gf,g\in\mathcal{H} we define the tensor product fg:f\otimes g^{*}:\mathcal{H}\rightarrow\mathcal{H} by h(fg)(h):=g,hfh\mapsto(f\otimes g^{*})(h):=\langle g,h\rangle_{\mathcal{H}}f. For operators on \mathcal{H} we use op\|\cdot\|_{op}, HS\|\cdot\|_{HS}, and tr()\mathrm{tr}(\cdot) to denote operator norm, Hilbert-Schmidt norm, and trace, respectively. For further details on RKHSs we refer to Berlinet and Thomas-Agnan (2004).

Let YY\in\mathbb{R} and XSX\in S have joint law PP. Given a simple random sample (Y1,X1),,(Yn,(Y_{1},X_{1}),\ldots,(Y_{n}, Xn)X_{n}) our goal is to construct uniform confidence bands for the conditional mean function xE[YX=x]x\mapsto\mathrm{E}[Y\mid X=x] or, rather, its best approximation in hypothesis space \mathcal{H}, i.e.

f0argminfE[(Yf(X))2].\displaystyle f_{0}\in\operatorname*{\arg\min}_{f\in\mathcal{H}}\mathrm{E}\left[\big{(}Y-f(X)\big{)}^{2}\right].

To this end, consider the classical kernel ridge regression estimator

f^nargminf{1ni=1n(Yif(Xi))2+λf2},λ>0,\displaystyle\widehat{f}_{n}\in\operatorname*{\arg\min}_{f\in\mathcal{H}}\left\{\frac{1}{n}\sum_{i=1}^{n}\big{(}Y_{i}-f(X_{i})\big{)}^{2}+\lambda\|f\|_{\mathcal{H}}^{2}\right\},\quad\lambda>0,

and define its bias-corrected version as

f^nbc:=f^n+λ(T^n+λ)1f^n,T^n=1ni=1n(kXikXi).\displaystyle\widehat{f}^{\mathrm{bc}}_{n}:=\widehat{f}_{n}+\lambda(\widehat{T}_{n}+\lambda)^{-1}\widehat{f}_{n},\quad\quad\widehat{T}_{n}=\frac{1}{n}\sum_{i=1}^{n}\left(k_{X_{i}}\otimes k_{X_{i}}^{*}\right). (13)

We propose to construct simultaneous 1α1-\alpha confidence bands for f0f_{0} based on f^nbc\widehat{f}^{\mathrm{bc}}_{n} via the rectangle

n(c):={f:nf^nbcfc},\displaystyle\mathcal{R}_{n}(c):=\left\{f\in\mathcal{F}:\sqrt{n}\|\widehat{f}^{\mathrm{bc}}_{n}-f\|_{\infty}\leq c\right\}, (14)

where c>0c>0 approximates the (asymptotic) 1α1-\alpha quantile of the law of nf^nbcf0\sqrt{n}\|\widehat{f}^{\mathrm{bc}}_{n}-f_{0}\|_{\infty}. To compute c>0c>0 we proceed in two steps: First, we show that nf^nbcf0\sqrt{n}\|\widehat{f}^{\mathrm{bc}}_{n}-f_{0}\|_{\infty} can be written as the sum of the supremum of an empirical process and a negligible remainder term. Then, we apply the strategy developed in Section 3.5 to bootstrap the supremum of the empirical process.

By Lemma 20 in Appendix B.5,

f^nbcf0=(T+λ)2T(1ni=1n(Yif0(Xi))kXi)+Rn,\displaystyle\widehat{f}^{\mathrm{bc}}_{n}-f_{0}=(T+\lambda)^{-2}T\left(\frac{1}{n}\sum_{i=1}^{n}\big{(}Y_{i}-f_{0}(X_{i})\big{)}k_{X_{i}}\right)+R_{n},

where T=E[kXkX]T=\mathrm{E}[k_{X}\otimes k_{X}^{*}] and RnR_{n} is a higher-order remainder term. Since kk is a reproducing kernel, we have kx(z)=kx,kz=kx,zk_{x}(z)=\langle k_{x},k_{z}\rangle_{\mathcal{H}}=\langle k_{x},z^{*}\rangle_{\mathcal{H}} for all x,zSx,z\in S. Hence, the above expansion implies

nf^nbcf0=supuS|1ni=1nVi,u|+Θn,where|Θn|nRn,\displaystyle\sqrt{n}\|\widehat{f}^{\mathrm{bc}}_{n}-f_{0}\|_{\infty}=\sup_{u\in S^{*}}\left|\left\langle\frac{1}{\sqrt{n}}\sum_{i=1}^{n}V_{i},u\right\rangle_{\mathcal{H}}\right|+\Theta_{n},\quad\quad\text{where}\quad\quad|\Theta_{n}|\leq\sqrt{n}\|R_{n}\|_{\infty},

and Vi=(T+λ)2T(Yif0(Xi))kXiV_{i}=(T+\lambda)^{-2}T\big{(}Y_{i}-f_{0}(X_{i})\big{)}k_{X_{i}}, 1in1\leq i\leq n. By Lemma 21 in Appendix B.5, nRn\sqrt{n}\|R_{n}\|_{\infty} is negligible with high probability. Moreover, since f0f_{0} is the best approximation in square loss, the random elements ViV_{i}’s have mean zero. Thus, supuS|n1/2i=1nVi,u|nQnQn\sup_{u\in S^{*}}|\langle n^{-1/2}\sum_{i=1}^{n}V_{i},u\rangle_{\mathcal{H}}|\equiv\sqrt{n}\|Q_{n}-Q\|_{\mathcal{F}_{n}}, where QnQ_{n} is the empirical measure of the ViV_{i}’s, QQ the pushforward of PP under the map (Y,X)V=(T+λ)2T(Yf0(X))kX(Y,X)\mapsto V=(T+\lambda)^{-2}T\big{(}Y-f_{0}(X)\big{)}k_{X}, and n={vv,u:uS}\mathcal{F}_{n}=\{v\mapsto\langle v,u\rangle_{\mathcal{H}}:u\in S^{*}\}. Since the functions fnf\in\mathcal{F}_{n} are just the evaluation functionals of uSu\in S^{*}, in the following we identify fnf\in\mathcal{F}_{n} with its corresponding uSu\in S^{*}. The covariance function 𝒞:S×S\mathcal{C}:S^{*}\times S^{*}\rightarrow\mathbb{R} associated with the empirical process {n(QnQ)(f):fn}\{\sqrt{n}(Q_{n}-Q)(f):f\in\mathcal{F}_{n}\} is thus given by

(u1,u2)E[V,u1V,u2]=E[(VV)u1,u2]=Ωu1,u2,\displaystyle(u_{1},u_{2})\mapsto\mathrm{E}[\langle V,u_{1}\rangle_{\mathcal{H}}\langle V,u_{2}\rangle_{\mathcal{H}}]=\mathrm{E}\big{[}\big{\langle}(V\otimes V^{*})u_{1},u_{2}\big{\rangle}_{\mathcal{H}}\big{]}=\big{\langle}\Omega u_{1},u_{2}\big{\rangle}_{\mathcal{H}},

with covariance operator

Ω:=σ02T(T+λ)2T(T+λ)2T,whereσ02:=E[(Yf0(X))2],\displaystyle\Omega:=\sigma_{0}^{2}T(T+\lambda)^{-2}T(T+\lambda)^{-2}T,\quad\quad\text{where}\quad\quad\sigma_{0}^{2}:=\mathrm{E}\big{[}\big{(}Y-f_{0}(X)\big{)}^{2}\big{]},

where we have used that the operators TT and (T+λ)1(T+\lambda)^{-1} commute.

We proceed to construct a Gaussian proxy process as outlined in Section 3.5: Let Ω^n=σ^n2T^n(T^n+λ)2T^n(T^n+λ)2T^n\widehat{\Omega}_{n}=\widehat{\sigma}_{n}^{2}\widehat{T}_{n}(\widehat{T}_{n}+\lambda)^{-2}\widehat{T}_{n}(\widehat{T}_{n}+\lambda)^{-2}\widehat{T}_{n} and σ^n2=n1i=1n(Yif^n(Xi))2\widehat{\sigma}_{n}^{2}=n^{-1}\sum_{i=1}^{n}\big{(}Y_{i}-\widehat{f}_{n}(X_{i})\big{)}^{2} be the plug-in estimates of the covariance operator Ω\Omega and the variance σ02\sigma_{0}^{2}, respectively. Define 𝒞^n:S×S\widehat{\mathcal{C}}_{n}:S^{*}\times S^{*}\rightarrow\mathbb{R} by (u1,u2)Ω^nu1,u2(u_{1},u_{2})\mapsto\big{\langle}\widehat{\Omega}_{n}u_{1},u_{2}\big{\rangle}_{\mathcal{H}}. Recall definition (8) of the integral operator TKT_{K}. In the present setup, T𝒞^n=Ω^nT_{\widehat{\mathcal{C}}_{n}}=\widehat{\Omega}_{n} by Fubini’s theorem. Denote by {(λ^k,φ^k)}k=1\big{\{}(\widehat{\lambda}_{k},\widehat{\varphi}_{k})\big{\}}_{k=1}^{\infty} the eigenvalue and eigenfunction pairs of Ω^n\widehat{\Omega}_{n}. Further, let {ξk}k=1\{\xi_{k}\}_{k=1}^{\infty} be a sequence of i.i.d. standard normal random variables. Then, for m1m\geq 1 and u,u1,u2Su,u_{1},u_{2}\in S^{*} define

Z^nm(u):=k=1mξkλ^kφ^k(u),𝒞^nm(u1,u2):=k=1mλ^kφ^k(u1)φ^k(u2)=Ω^nmu1,u2,\displaystyle\widehat{Z}_{n}^{m}(u):=\sum_{k=1}^{m}\xi_{k}\sqrt{\widehat{\lambda}_{k}}\widehat{\varphi}_{k}(u),\quad\quad\widehat{\mathcal{C}}_{n}^{m}(u_{1},u_{2}):=\sum_{k=1}^{m}\widehat{\lambda}_{k}\widehat{\varphi}_{k}(u_{1})\widehat{\varphi}_{k}(u_{2})=\big{\langle}\widehat{\Omega}_{n}^{m}u_{1},u_{2}\big{\rangle}_{\mathcal{H}}, (15)

where Ω^nm\widehat{\Omega}_{n}^{m} is the best rank-mm approximation of Ω^n\widehat{\Omega}_{n}.
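On a finite evaluation grid, the truncated Karhunen-Loève draw in (15) reduces to an eigendecomposition of the estimated covariance matrix followed by simulation from the top-mm eigenpairs. A minimal sketch (a covariance matrix on a grid stands in for Ω^n\widehat{\Omega}_{n}; all names are illustrative):

```python
import numpy as np

def kl_sup_draws(cov, m, B=1000, rng=None):
    """Rank-m Karhunen-Loeve draws as in (15): keep the top-m eigenpairs
    of the estimated covariance matrix (a finite-grid stand-in for the
    covariance operator) and return sup_u |Z^m(u)| for each draw."""
    lam, phi = np.linalg.eigh(np.asarray(cov, dtype=float))
    lam, phi = lam[::-1], phi[:, ::-1]                 # descending eigenvalues
    lam_m = np.clip(lam[:m], 0.0, None)                # guard tiny negatives
    phi_m = phi[:, :m]
    xi = np.random.default_rng(rng).standard_normal((B, m))
    Z = (xi * np.sqrt(lam_m)) @ phi_m.T                # B x (grid size)
    return np.abs(Z).max(axis=1)

cov = np.array([[2.0, 1.0], [1.0, 2.0]])               # toy two-point grid
sups = kl_sup_draws(cov, m=1, B=500, rng=0)
```

Empirical quantiles of the returned suprema approximate the conditional quantiles in (16), up to the truncation error controlled by Corollary 8.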

Given Proposition 7, we postulate that the process Z^n\widehat{Z}_{n}^{\infty} is an almost sure version of a Gaussian process on SS^{*} with covariance function 𝒞^n\widehat{\mathcal{C}}_{n} (or, equivalently, with covariance operator Ω^n\widehat{\Omega}_{n}). Consequently, Theorem 9 guarantees validity of the Gaussian process bootstrap based on Z^nm\widehat{Z}_{n}^{m}. To make all these claims rigorous, consider the following assumptions:

Assumption 4 (On the kernel).

The kernel k:S×Sk:S\times S\rightarrow\mathbb{R} is symmetric, positive semi-definite, continuous, and bounded, i.e. supxS|k(x,x)|=:κ<\sup_{x\in S}\sqrt{|k(x,x)|}=:\kappa<\infty.

Remark 3.

The assumptions on the kernel are standard and important (Berlinet and Thomas-Agnan, 2004). The continuity of kk guarantees that kXk_{X} is a random element on \mathcal{H} whenever XSX\in S is a random variable. It also implies that the RKHS \mathcal{H} is separable whenever SS is separable. The boundedness and the reproducing property of kk imply that fκf\|f\|_{\infty}\leq\kappa\|f\|_{\mathcal{H}} for all ff\in\mathcal{H}.

Assumption 5 (On the data).

The data (Y1,X1),,(Yn,Xn)×S(Y_{1},X_{1}),\ldots,(Y_{n},X_{n})\in\mathbb{R}\times S are i.i.d. random elements defined on an abstract product probability space (Ω,𝒜,)(\Omega,\mathcal{A},\mathbb{P}). The YiY_{i}’s are almost surely bounded, i.e. there exists an absolute constant B>0B>0 such that max1in|Yi|B\max_{1\leq i\leq n}|Y_{i}|\leq B almost surely.

Remark 4.

The almost sure boundedness of the YiY_{i}’s is a strong assumption. We introduce this assumption to keep technical arguments at a minimum. Singh and Vijaykumar (2023) impose an equivalent boundedness condition on the pseudo-residuals εi=Yif0(Xi)\varepsilon_{i}=Y_{i}-f_{0}(X_{i}), 1in1\leq i\leq n.

Assumption 6 (On the population and sample covariance operators).

For all n1n\geq 1 (i) there exists ωS>0\omega_{S}>0 such that infuSΩu,uωS\inf_{u\in S^{*}}\langle\Omega u,u\rangle_{\mathcal{H}}\geq\omega_{S} and (ii) tr(Ω)tr(Ω^n)<\mathrm{tr}(\Omega)\vee\mathrm{tr}(\widehat{\Omega}_{n})<\infty.

Remark 5.

Condition (i) is the Hilbert space equivalent of the lower bound on the variance in Assumption 3. While Singh and Vijaykumar (2023) do not explicitly impose a lower bound on the covariance operator, such a lower bound is implied by their Assumption 5.2 (see Giessing, 2023, for details). In the appendix we provide a general non-asymptotic complement to Proposition 3 below which applies even if ωSωS(n)0\omega_{S}\equiv\omega_{S}(n)\rightarrow 0 as nn\rightarrow\infty. Condition (ii) is a classical assumption in learning theory on RKHS (Mendelson, 2002). Together with Condition (i) it implies that Ω\Omega and Ω^n\widehat{\Omega}_{n} are finite rank and trace class operators, respectively.

Denote the α\alpha-quantile of the supremum of the Gaussian proxy process Z^nm\widehat{Z}_{n}^{m} in (15) by

cnm(α)\displaystyle c_{n}^{m}(\alpha) :=inf{s0:{supuS|Z^nm,u|s(Y1,X1),,(Yn,Xn)}α}.\displaystyle:=\inf\left\{s\geq 0:\mathbb{P}\left\{\sup_{u\in S^{*}}\left|\big{\langle}\widehat{Z}_{n}^{m},u\big{\rangle}_{\mathcal{H}}\right|\leq s\mid(Y_{1},X_{1}),\ldots,(Y_{n},X_{n})\right\}\geq\alpha\right\}. (16)
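In practice, the conditional quantile in (16) is approximated by the empirical quantile of Monte Carlo draws of the supremum of Z^nm\widehat{Z}_{n}^{m}. A minimal sketch (the array `sup_draws` is assumed to contain such draws, e.g. obtained by simulating the truncated process in (15)):

```python
import numpy as np

def bootstrap_quantile(sup_draws, alpha):
    """Empirical analogue of c_n^m(alpha) in (16): the smallest draw s
    such that at least a fraction alpha of the bootstrap suprema lie
    at or below s."""
    draws = np.sort(np.asarray(sup_draws, dtype=float))
    k = int(np.ceil(alpha * len(draws))) - 1
    return draws[max(k, 0)]
```

The returned value converges to cnm(α)c_{n}^{m}(\alpha) as the number of Monte Carlo draws grows.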

An application of Theorem 9 yields:

Proposition 3 (Bootstrap quantiles).

Let (S,dk)(S,d_{k}) be a compact metric space such that 0logN(S,dk,ε)𝑑ε<\int_{0}^{\infty}\sqrt{\log N(S,d_{k},\varepsilon)}\,d\varepsilon<\infty. If Assumptions 4, 5, and 6 and the rates in eq. (77) in Appendix E hold, then

supα(0,1)|{nf^nbcf0cnm(α)}α|\displaystyle\sup_{\alpha\in(0,1)}\left|\mathbb{P}\left\{\sqrt{n}\|\widehat{f}^{\mathrm{bc}}_{n}-f_{0}\|_{\infty}\leq c_{n}^{m}(\alpha)\right\}-\alpha\right|
=o(1)+{Ω^nΩ^nmop>(κ𝔫1(λ)+σ¯2)T3(T+λ)4oplognnλ2},\displaystyle\quad=o(1)+\mathbb{P}\left\{\big{\|}\widehat{\Omega}_{n}-\widehat{\Omega}_{n}^{m}\big{\|}_{op}>\big{(}\kappa\mathfrak{n}_{1}(\lambda)+\bar{\sigma}^{2}\big{)}\|T^{3}(T+\lambda)^{-4}\|_{op}\sqrt{\frac{\log n}{n\lambda^{2}}}\right\},

where σ¯2σ02κ2(B+κf0)21\bar{\sigma}^{2}\geq\sigma_{0}^{2}\vee\kappa^{2}(B+\kappa\|f_{0}\|_{\mathcal{H}})^{2}\vee 1 and 𝔫12(λ)=tr((T+λ)2T)\mathfrak{n}_{1}^{2}(\lambda)=\mathrm{tr}\left((T+\lambda)^{-2}T\right).

The finite metric entropy condition on the set SS ensures that the Gaussian bootstrap process (15) is almost surely bounded and uniformly continuous on SS^{*}\subset\mathcal{H} (as required by Theorem 9). This condition is not merely technical but also intuitive: Since the RKHS \mathcal{H} is the completion of span({kx:xS})\mathrm{span}(\{k_{x}:x\in S\}) and kk is bounded and continuous, conditions that guarantee the continuity of Gaussian bootstrap processes on (a subset of) \mathcal{H} should indeed be attributable to properties of SS. Importantly, the metric entropy condition on SS does not impose restrictions on the dimension of \mathcal{H}. Only Assumption 6 (ii) implicitly imposes restrictions on the dimension of \mathcal{H}.

Since under the conditions of Proposition 3, limmΩ^nΩ^nmop=0\lim_{m\rightarrow\infty}\|\widehat{\Omega}_{n}-\widehat{\Omega}_{n}^{m}\|_{op}=0 almost surely for all n1n\geq 1, it follows that the bootstrap confidence band proposed in (14) is asymptotically valid:

Corollary 4 (Simultaneous bootstrap confidence bands).

Under the setup of Proposition 3,

limnlimmsupα(0,1)|{f0n(cnm(1α))}(1α)|=0,\displaystyle\lim_{n\rightarrow\infty}\lim_{m\rightarrow\infty}\sup_{\alpha\in(0,1)}\left|\mathbb{P}\Big{\{}f_{0}\in\mathcal{R}_{n}\big{(}c_{n}^{m}(1-\alpha)\big{)}\Big{\}}-(1-\alpha)\right|=0,

with the uniform confidence band n\mathcal{R}_{n} as defined in (14) and quantile cnm(1α)c_{n}^{m}(1-\alpha) as in (16).

A thorough comparison of the Gaussian process bootstrap and Singh and Vijaykumar’s (2023) symmetrized multiplier bootstrap is beyond the scope of this paper. In practice, both methods yield biased confidence bands for f0f_{0}, albeit for different reasons: Singh and Vijaykumar’s (2023) bias stems from constructing a confidence band for the pseudo-true regression function fλ=(T+λ)1Tf0f_{\lambda}=(T+\lambda)^{-1}Tf_{0} without correcting the regularization bias induced by λ>0\lambda>0; ours is due to using an mm-truncated Karhunen-Loève decomposition based on a finite number of eigenfunctions. In future work we will explore ways to mitigate these biases by judiciously choosing λλ(n)0\lambda\equiv\lambda(n)\rightarrow 0 and mm(n)m\equiv m(n)\rightarrow\infty.

5 Conclusion

In this paper we have developed a new approach to the theory and practice of Gaussian and bootstrap approximations of the sampling distribution of suprema of empirical processes. We have put special emphasis on non-asymptotic approximations that are entropy- and weak variance-free, and have allowed the function class n\mathcal{F}_{n} to vary with the sample size nn and to be non-Donsker. We have shown that such general approximation results are useful, among other things, for inference on high-dimensional statistical models and reproducing kernel Hilbert spaces. However, theory and methodology in this paper have three limitations that need to be addressed in future work:

  • Reliance on independent and identically distributed data. All statistically relevant results in this paper depend on Proposition 1, which heavily relies on the assumption of independent and identically distributed data. Expanding Proposition 1 to accommodate non-identically distributed data would be a first step towards solving simultaneous and large-scale two-sample testing problems and conducting inference in high-dimensional fixed design settings. Currently, the results in this paper are exclusively applicable to one-sample testing and unconditional inference.

  • Lack of tight lower bounds on the strong variances of most Gaussian processes. One of the most notable features of the results in this paper is the fact that all upper bounds depend on the inverse of the strong variance of some Gaussian proxy process. Unfortunately, in statistical applications this poses a formidable challenge since up until now only a few techniques exist to derive tight lower bounds on these strong variances (Giessing and Fan, 2023; Giessing, 2023). We either need new tools or we need to develop Gaussian and bootstrap approximations for statistics other than maxima/suprema. The latter will require new anti-concentration inequalities.

  • Biased quantile estimates due to bootstrapping a non-pivotal statistic. The Gaussian process bootstrap is based on a non-pivotal statistic, i.e. the sampling distribution of the supremum depends on the unknown population covariance function. In practice, when bootstrapping non-pivotal statistics the estimated quantiles often differ substantially from the true quantiles. Several bias correction schemes have been proposed in the classical setting (Davison et al., 1986; Beran, 1987; Hall and Martin, 1988; Shi, 1992). Since the Gaussian process bootstrap is not a re-sampling procedure in the classical sense, these techniques do not apply. In Giessing and Fan (2023) we therefore develop the spherical bootstrap to improve accuracy and efficiency when bootstrapping p\ell_{p}-statistics. However, this approach does not generalize to arbitrary empirical processes such as the one in Section 4.3. Thus, there is an urgent need for new bias correction schemes.

Acknowledgement

Alexander Giessing is supported by NSF grant DMS-2310578.

References

  • Anderson (2003) T. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley Series in Probability and Statistics. Wiley, 2003. ISBN 9780471360919.
  • Anderson (1963) T. W. Anderson. Asymptotic Theory for Principal Component Analysis. The Annals of Mathematical Statistics, 34(1):122 – 148, 1963.
  • Bao et al. (2015) Z. Bao, G. Pan, and W. Zhou. Universality for the largest eigenvalue of sample covariance matrices with general population. The Annals of Statistics, 43(1):382–421, 2015.
  • Belkin et al. (2018) M. Belkin, S. Ma, and S. Mandal. To understand deep learning we need to understand kernel learning. In J. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 541–549. PMLR, 2018.
  • Bentkus (2003) V. Bentkus. On the dependence of the Berry–Esseen bound on dimension. Journal of Statistical Planning and Inference, 113(2):385 – 402, 2003. ISSN 0378-3758.
  • Beran (1987) R. Beran. Prepivoting to reduce level error of confidence sets. Biometrika, 74(3):457–468, 1987.
  • Berlinet and Thomas-Agnan (2004) A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, 2004.
  • Bhattacharya and Holmes (2010) R. Bhattacharya and S. Holmes. An exposition of Götze’s estimation of the rate of convergence in the multivariate central limit theorem, 2010.
  • Bohn et al. (2019) B. Bohn, C. Rieger, and M. Griebel. A representer theorem for deep kernel learning. The Journal of Machine Learning Research, 20(1):2302–2333, 2019.
  • Bolthausen (1984) E. Bolthausen. An estimate of the remainder in a combinatorial central limit theorem. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 66(3):379–386, 1984.
  • Bong et al. (2023) H. Bong, A. K. Kuchibhotla, and A. Rinaldo. Dual induction clt for high-dimensional m-dependent data, 2023.
  • Cattaneo et al. (2022) M. D. Cattaneo, R. P. Masini, and W. G. Underwood. Yurinskii’s coupling for martingales, 2022.
  • Chen and Xu (2020) L. Chen and S. Xu. Deep neural tangent kernel and laplace kernel have the same rkhs. In International Conference on Learning Representations, 2020.
  • Chen (2018) X. Chen. Gaussian and bootstrap approximations for high-dimensional u-statistics and their applications. Ann. Statist., 46(2):642–678, 04 2018.
  • Chen et al. (2015) Y.-C. Chen, C. R. Genovese, and L. Wasserman. Asymptotic theory for density ridges. The Annals of Statistics, 43(5):1896–1928, 2015.
  • Chen et al. (2016) Y.-C. Chen, C. R. Genovese, R. J. Tibshirani, and L. Wasserman. Nonparametric modal regression. The Annals of Statistics, 44(2):489 – 514, 2016.
  • Chernozhukov et al. (2013) V. Chernozhukov, D. Chetverikov, and K. Kato. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 41(6):2786–2819, 12 2013.
  • Chernozhukov et al. (2014a) V. Chernozhukov, D. Chetverikov, and K. Kato. Anti-concentration and honest, adaptive confidence bands. The Annals of Statistics, 42(5):1787–1818, 2014a.
  • Chernozhukov et al. (2014b) V. Chernozhukov, D. Chetverikov, and K. Kato. Gaussian approximation of suprema of empirical processes. The Annals of Statistics, 42(4):1564–1597, 08 2014b.
  • Chernozhukov et al. (2015) V. Chernozhukov, D. Chetverikov, and K. Kato. Comparison and anti-concentration bounds for maxima of gaussian random vectors. Probability Theory and Related Fields, 162(1):47–70, Jun 2015.
  • Chernozhukov et al. (2016) V. Chernozhukov, D. Chetverikov, and K. Kato. Empirical and multiplier bootstraps for suprema of empirical processes of increasing complexity, and related gaussian couplings. Stochastic Processes and their Applications, 126(12):3632–3651, 2016. ISSN 0304-4149. In Memoriam: Evarist Giné.
  • Chernozhukov et al. (2017) V. Chernozhukov, D. Chetverikov, and K. Kato. Central limit theorems and bootstrap in high dimensions. The Annals of Probability, 45(4):2309–2352, 07 2017.
  • Chernozhukov et al. (2019) V. Chernozhukov, D. Chetverikov, K. Kato, and Y. Koike. Improved central limit theorem and bootstrap approximations in high dimensions. arXiv preprint, arXiv:1912.10529, 12 2019.
  • Chernozhukov et al. (2020) V. Chernozhukov, D. Chetverikov, and Y. Koike. Nearly optimal central limit theorem and bootstrap approximations in high dimensions. 2020.
  • Chernozhukov et al. (2023) V. Chernozhukov, D. Chetverikov, K. Kato, and Y. Koike. High-dimensional data bootstrap. Annual Review of Statistics and Its Application, 10(1):427–449, 2023.
  • Chetverikov (2019) D. Chetverikov. Testing regression monotonicity in econometric models. Econometric Theory, 35(4):729–776, 2019.
  • Davison et al. (1986) A. C. Davison, D. V. Hinkley, and E. Schechtman. Efficient bootstrap simulation. Biometrika, 73(3):555–566, 1986.
  • Deng and Zhang (2020) H. Deng and C.-H. Zhang. Beyond gaussian approximation: Bootstrap for maxima of sums of independent random vectors. arXiv preprint, arXiv:1705.09528, 2020.
  • Dezeure et al. (2017) R. Dezeure, P. Bühlmann, and C.-H. Zhang. High-dimensional simultaneous inference with the bootstrap. Test, 26(4):685–719, 2017.
  • Dudley (2002) R. Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2002.
  • Dudley (2014) R. Dudley. Uniform Central Limit Theorems. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2014.
  • El Karoui (2007) N. El Karoui. Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices. The Annals of Probability, 35(2):663 – 714, 2007.
  • Fan et al. (2018) J. Fan, Q.-M. Shao, and W.-X. Zhou. Are discoveries spurious? Distributions of maximum spurious correlations and their applications. Ann. Statist., 46(3):989–1017, 06 2018.
  • Fang and Koike (2021) X. Fang and Y. Koike. High-dimensional central limit theorems by Stein’s method. The Annals of Applied Probability, 31(4):1660 – 1686, 2021.
  • Folland (1999) G. Folland. Real Analysis: Modern Techniques and Their Applications. Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. Wiley, 1999.
  • Fujikoshi (1980) Y. Fujikoshi. Asymptotic expansions for the distributions of the sample roots under nonnormality. Biometrika, 67(1):45–51, 1980.
  • Ghanem and Spanos (2003) R. Ghanem and P. Spanos. Stochastic Finite Elements: A Spectral Approach. Dover Publications, 2003.
  • Giessing (2023) A. Giessing. Anti-Concentration of Suprema of Gaussian Processes and Gaussian Order Statistics. Working Paper, 2023.
  • Giessing and Fan (2023) A. Giessing and J. Fan. A bootstrap hypothesis test for high-dimensional mean vectors. Working Paper, 2023.
  • Giessing and Wang (2021) A. Giessing and J. Wang. Debiased inference on heterogeneous quantile treatment effects with regression rank-scores, 2021.
  • Giné and Nickl (2016) E. Giné and R. Nickl. Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press, 2016.
  • Götze (1991) F. Götze. On the rate of convergence in the multivariate clt. The Annals of Probability, 19(2):724–739, 1991.
  • Hall and Martin (1988) P. Hall and M. A. Martin. On bootstrap resampling and iteration. Biometrika, 75(4):661–671, 1988.
  • Han et al. (2018) F. Han, S. Xu, and W.-X. Zhou. On Gaussian comparison inequality and its application to spectral analysis of large random matrices. Bernoulli, 24(3):1787 – 1833, 2018.
  • Hardy (2006) M. Hardy. Combinatorics of partial derivatives. Electronic Journal of Combinatorics, 2006.
  • Hsing and Eubank (2015) T. Hsing and R. Eubank. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley, 2015.
  • Jacot et al. (2018) A. Jacot, F. Gabriel, and C. Hongler. Neural tangent kernel: Convergence and generalization in neural networks. NIPS’18, page 8580–8589, 2018.
  • Jain and Kallianpur (1970) N. C. Jain and G. Kallianpur. A Note on Uniform Convergence of Stochastic Processes. The Annals of Mathematical Statistics, 41(4):1360 – 1362, 1970.
  • Janková et al. (2020) J. Janková, R. D. Shah, P. Bühlmann, and R. J. Samworth. Goodness-of-fit Testing in High Dimensional Generalized Linear Models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(3):773–795, 05 2020.
  • Johnstone (2001) I. M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 29(2):295 – 327, 2001.
  • Koltchinskii and Lounici (2017) V. Koltchinskii and K. Lounici. Concentration inequalities and moment bounds for sample covariance operators. Bernoulli, 23(1):110 – 133, 2017.
  • Kuchibhotla et al. (2021) A. K. Kuchibhotla, S. Mukherjee, and D. Banerjee. High-dimensional CLT: Improvements, non-uniform extensions and large deviations. Bernoulli, 27(1):192 – 217, 2021.
  • Le Cam (1986) L. Le Cam. Asymptotic Methods in Statistical Decision Theory. Springer Series in Statistics. Springer New York, 1986.
  • Lopes (2022a) M. E. Lopes. Central limit theorem and bootstrap approximation in high dimensions: Near 1/n1/\sqrt{n} rates via implicit smoothing. The Annals of Statistics, 50(5):2492 – 2513, 2022a.
  • Lopes (2022b) M. E. Lopes. Improved rates of bootstrap approximation for the operator norm: A coordinate-free approach, 2022b.
  • Lopes et al. (2019) M. E. Lopes, A. Blandino, and A. Aue. Bootstrapping spectral statistics in high dimensions. Biometrika, 106(4):781–801, 09 2019.
  • Lopes et al. (2020) M. E. Lopes, Z. Lin, and H.-G. Müller. Bootstrapping max statistics in high dimensions: Near-parametric rates under weak variance decay and application to functional and multinomial data. Annals of Statistics, 48(2):1214–1229, 04 2020.
  • Lopes et al. (2023) M. E. Lopes, N. B. Erichson, and M. W. Mahoney. Bootstrapping the operator norm in high dimensions: Error estimation for covariance matrices and sketching. Bernoulli, 29(1):428 – 450, 2023.
  • Mendelson (2002) S. Mendelson. Geometric parameters of kernel machines. In International conference on computational learning theory, pages 29–43. Springer, 2002.
  • Naumov et al. (2019) A. Naumov, V. Spokoiny, and V. Ulyanov. Bootstrap confidence sets for spectral projectors of sample covariance. Probability Theory and Related Fields, 174(3):1091–1132, 2019.
  • Nourdin and Peccati (2012) I. Nourdin and G. Peccati. Normal approximations with Malliavin calculus: from Stein’s method to universality. Number 192. Cambridge University Press, 2012.
  • Péché (2009) S. Péché. Universality results for the largest eigenvalues of some sample covariance matrix ensembles. Probability Theory and Related Fields, 143(3):481–516, 2009.
  • Raič (2019) M. Raič. A multivariate Berry–Esseen theorem with explicit constants. Bernoulli, 25(4A):2824–2853, 11 2019.
  • Rockafellar (1999) R. T. Rockafellar. Second-order convex analysis. J. Nonlinear Convex Anal, 1(1-16):84, 1999.
  • Schölkopf and Smola (2002) B. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Adaptive computation and machine learning. MIT Press, 2002.
  • Sedaghat (2003) H. Sedaghat. Nonlinear Difference Equations: Theory with Applications to Social Science Models. Springer, 2003.
  • Shi (1992) S. G. Shi. Accurate and efficient double-bootstrap confidence limit method. Computational Statistics and Data Analysis, 13(1):21–32, 1992.
  • Singh and Vijaykumar (2023) R. Singh and S. Vijaykumar. Kernel ridge regression inference, 2023.
  • Steinwart and Christmann (2008) I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.
  • Tanguy (2017) K. Tanguy. Quelques inégalités de superconcentration: théorie et applications. PhD thesis, Université Paul Sabatier-Toulouse III, 2017.
  • Unser (2019) M. Unser. A representer theorem for deep neural networks. Journal of Machine Learning Research, 20, 2019.
  • van der Vaart and Wellner (1996) A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
  • Vershynin (2018) R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018.
  • Wahba (1990) G. Wahba. Spline Models for Observational Data. SIAM, 1990.
  • Waternaux (1976) C. M. Waternaux. Asymptotic distribution of the sample roots for a nonnormal population. Biometrika, 63(3):639–645, 1976.
  • Yurinsky (2006) V. Yurinsky. Sums and Gaussian Vectors. Springer, 2006.
  • Zhang and Cheng (2017) X. Zhang and G. Cheng. Simultaneous inference for high-dimensional linear models. Journal of the American Statistical Association, 112(518):757–768, 2017.

Supplementary Materials for “Gaussian and Bootstrap Approximations for Empirical Processes” Alexander Giessing


Appendix A Two toy examples

In this section we present two examples to illustrate the limitations of existing Gaussian approximation results and the advantages of the new ones. The presentation is deliberately expository; we omit proofs as much as possible. In both examples, we take n={xf(x)=xu:uSd1}\mathcal{F}_{n}=\{x\mapsto f(x)=x^{\prime}u:u\in S^{d-1}\} with Sd1={ud:u2=1}S^{d-1}=\{u\in\mathbb{R}^{d}:\|u\|_{2}=1\}, which plays a role in the construction of high-dimensional confidence regions and multiple testing problems (see Section 4.1).

Consider a simple random sample X1,,XndX_{1},\ldots,X_{n}\in\mathbb{R}^{d} from the law of X=aξX=a\cdot\xi, where ada\in\mathbb{R}^{d} is a fixed vector and ξ\xi\in\mathbb{R} a centered random variable with finite third moment. Then, supfn|𝔾n(f)|=n1/2i=1nXi2=a2|n1/2i=1nξi|\sup_{f\in\mathcal{F}_{n}}|\mathbb{G}_{n}(f)|=\|n^{-1/2}\sum_{i=1}^{n}X_{i}\|_{2}=\|a\|_{2}|n^{-1/2}\sum_{i=1}^{n}\xi_{i}|. Hence, the supremum of the empirical process reduces to the average of i.i.d. scalar-valued random variables and the classical univariate Berry-Esseen theorem yields the following bound on the Kolmogorov distance:

ϱnE[|ξ|3]nE[|ξ|2]3/20asn.\displaystyle\varrho_{n}\lesssim\frac{\mathrm{E}[|\xi|^{3}]}{\sqrt{n}\>\mathrm{E}[|\xi|^{2}]^{3/2}}\rightarrow 0\quad\mathrm{as}\quad n\rightarrow\infty. (17)
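The reduction of the supremum to a scalar average is easy to check numerically. The following sketch (with hypothetical choices of nn, dd, and the law of ξ\xi) verifies that the supremum over the unit sphere coincides with a2|n1/2i=1nξi|\|a\|_{2}|n^{-1/2}\sum_{i=1}^{n}\xi_{i}|:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 50                       # hypothetical sample size and dimension
a = rng.standard_normal(d)           # fixed direction a
xi = rng.standard_normal(n)          # centered xi with finite third moment
X = np.outer(xi, a)                  # rank-one data X_i = a * xi_i

# the sup over u in S^{d-1} of the empirical process equals the
# Euclidean norm of the scaled sample mean ...
sup_emp = np.linalg.norm(X.sum(axis=0)) / np.sqrt(n)
# ... which collapses to ||a||_2 |n^{-1/2} sum_i xi_i|
reduced = np.linalg.norm(a) * abs(xi.sum()) / np.sqrt(n)
assert np.isclose(sup_emp, reduced)
```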

This non-asymptotic bound is independent of the dimension dd and the weak variance of the Gaussian proxy process. If we ignore the low-rank structure of the data and instead use the Gaussian approximation results in Chernozhukov et al. (2014b) (Theorem 2.1 combined with Lemma 2.3) we obtain (qualitatively)

ϱnpolylog(N(n,eP,ε)n)n1/6infuSd1Var(Zu),\displaystyle\varrho_{n}\lesssim_{\star}\frac{\mathrm{polylog}\left(N\left(\mathcal{F}_{n},e_{P},\varepsilon\right)\vee n\right)}{n^{1/6}\sqrt{\inf_{u\in S^{d-1}}\mathrm{Var}(Z^{\prime}u)}}, (18)

where ZN(0,E[ξ2]aa)Z\sim N\left(0,\mathrm{E}[\xi^{2}]\cdot aa^{\prime}\right) and \lesssim_{\star} hides a complicated multiplicative factor which, among other things, depends on the sample size nn, third and higher moments of Z2\|Z\|_{2}, and the inverse of the discretization level ε>0\varepsilon>0 (see Section 3.6). Since the covariance matrix E[ξ2]aa\mathrm{E}[\xi^{2}]\cdot aa^{\prime} has rank one, the unit ball in n\mathcal{F}_{n} with respect to the intrinsic standard deviation metric ePe_{P} is isometrically isomorphic to the interval [1,1][-1,1]\subset\mathbb{R}. Hence, the metric entropy can be upper bounded independently of the dimension dd; in particular, we have logN(n,eP,ε)log(1+2/ε)\log N\left(\mathcal{F}_{n},e_{P},\varepsilon\right)\leq\log(1+2/\varepsilon). However, the low-rank structure of the data also implies that the weak variance of the associated Gaussian proxy process vanishes; indeed, infuSd1Var(Zu)=E[ξ2]infuSd1(au)2=0\inf_{u\in S^{d-1}}\mathrm{Var}(Z^{\prime}u)=\mathrm{E}[\xi^{2}]\cdot\inf_{u\in S^{d-1}}(a^{\prime}u)^{2}=0. Thus, the upper bound in (18) is in fact invalid (or trivial if we interpret 1/0=1/0=\infty) and fails to replicate the univariate Berry-Esseen bound in (17).

In contrast, the new Gaussian approximation inequality (Theorem 1 in Section 3.2; Theorem A.1 in Giessing and Fan (2023)) is agnostic to the covariance structure of the data and yields

ϱn(E[X23])1/3n1/6Var(Z2)+E[X23𝟏{X23>nE[X23]}]E[X23]+E[Z2]nVar(Z2),\displaystyle\varrho_{n}\lesssim\frac{(\mathrm{E}[\|X\|_{2}^{3}])^{1/3}}{n^{1/6}\sqrt{\mathrm{Var}(\|Z\|_{2})}}+\frac{\mathrm{E}\left[\|X\|_{2}^{3}\mathbf{1}\{\|X\|_{2}^{3}>n\>\mathrm{E}[\|X\|_{2}^{3}]\}\right]}{\mathrm{E}\left[\|X\|_{2}^{3}\right]}+\frac{\mathrm{E}[\|Z\|_{2}]}{\sqrt{n\mathrm{Var}(\|Z\|_{2})}}, (19)

where \lesssim hides an absolute constant independent of n,dn,d, and the distribution of the XiX_{i}’s and ZN(0,E[ξ2]aa)Z\sim N\left(0,\mathrm{E}[\xi^{2}]\cdot aa^{\prime}\right). The upper bound in this inequality depends only on the third moment of the envelope function xx2x\mapsto\|x\|_{2} and the strong variance of the Gaussian proxy process Var(Z2)\mathrm{Var}(\|Z\|_{2}). If we use that X2=a2|ξ|\|X\|_{2}=\|a\|_{2}\cdot|\xi| and Z2=a2|g|\|Z\|_{2}=\|a\|_{2}\cdot|g| for gN(0,1)g\sim N(0,1), we obtain

ϱnE[|ξ|3]1/3n1/6E[|ξ|2]1/2+E[|ξ|3𝟏{|ξ|3>nE[|ξ|3]}]E[|ξ|3]0asn.\displaystyle\varrho_{n}\lesssim\frac{\mathrm{E}[|\xi|^{3}]^{1/3}}{n^{1/6}\>\mathrm{E}[|\xi|^{2}]^{1/2}}+\frac{\mathrm{E}\left[|\xi|^{3}\mathbf{1}\{|\xi|^{3}>n\>\mathrm{E}[|\xi|^{3}]\}\right]}{\mathrm{E}\left[|\xi|^{3}\right]}\rightarrow 0\quad\mathrm{as}\quad n\rightarrow\infty. (20)

This inequality is obviously dimension- and weak variance-free. In this sense, it recovers the essential feature of the univariate Berry-Esseen bound in (17) and improves over the bound in (18). The dependence on the sample size nn and the moments of ξ\xi is still sub-optimal; but refinements in this direction are beyond the scope of this paper. Obviously, in this example, we have chosen a rank one covariance matrix only to be able to compare the Gaussian approximation result with the Berry-Esseen theorem. Any low-rank structure implies a vanishing weak variance and, hence, a breakdown of the results in Chernozhukov et al. (2014b).

Next, suppose that the data X1,,XndX_{1},\ldots,X_{n}\in\mathbb{R}^{d} are a simple random sample drawn from the law of a random vector X=(X(1),,X(d))X=(X^{(1)},\ldots,X^{(d)})^{\prime} with mean zero, element-wise bounded entries max1kd|X(k)|<B\max_{1\leq k\leq d}|X^{(k)}|<B almost surely for some B>0B>0, and equi-correlated covariance matrix Σ=(1ρ)Id+ρ𝟏d𝟏d\Sigma=(1-\rho)I_{d}+\rho\mathbf{1}_{d}\mathbf{1}_{d}^{\prime} for some ρ(1/(d1),1)\rho\in(-1/(d-1),1). The constraints on ρ\rho guarantee that Σ\Sigma has full rank. Therefore, the multivariate Berry-Esseen theorem by Bentkus (2003) (Theorem 1.1) implies

ϱnd1/4E[Σ1/2X23]n.\displaystyle\varrho_{n}\lesssim\frac{d^{1/4}\>\mathrm{E}[\|\Sigma^{-1/2}X\|_{2}^{3}]}{\sqrt{n}}. (21)

This upper bound is not useful in high-dimensional settings because the expected value is polynomial in the dimension dd, i.e. E[Σ1/2X23]E[Σ1/2X22]3/2=d3/2\mathrm{E}[\|\Sigma^{-1/2}X\|_{2}^{3}]\geq\mathrm{E}[\|\Sigma^{-1/2}X\|_{2}^{2}]^{3/2}=d^{3/2}.

The Gaussian approximation results by Chernozhukov et al. (2014b) yield (again) inequality (18). Since the covariance matrix Σ\Sigma has full rank, the weak variance is now strictly positive and equal to the smallest eigenvalue of Σ\Sigma, i.e. infuSd1Var(Zu)=infuSd1uΣu=1ρ>0\inf_{u\in S^{d-1}}\mathrm{Var}(Z^{\prime}u)=\inf_{u\in S^{d-1}}u^{\prime}\Sigma u=1-\rho>0. However, in this example, the metric entropy with respect to the intrinsic standard deviation metric ePe_{P} poses a problem. Indeed, let λ1=1+(d1)ρ\lambda_{1}=1+(d-1)\rho and λ2==λd=1ρ\lambda_{2}=\ldots=\lambda_{d}=1-\rho be the eigenvalues of Σ\Sigma. Then, the unit ball in n\mathcal{F}_{n} with respect to ePe_{P} can be identified with the weighted Euclidean ball Bλd(0,ε):={ud:k=1dλkuk2ε2}B_{\lambda}^{d}(0,\varepsilon):=\{u\in\mathbb{R}^{d}:\sum_{k=1}^{d}\lambda_{k}u_{k}^{2}\leq\varepsilon^{2}\}. Let Bd(0,1)B^{d}(0,1) be the standard unit ball in d\mathbb{R}^{d} with respect to the Euclidean distance. Since λk1ρ\lambda_{k}\geq 1-\rho for all kk, we have Bλd(0,ε)Bd(0,ε/(1ρ))B_{\lambda}^{d}(0,\varepsilon)\subseteq B^{d}(0,\varepsilon/(1-\rho)) for ρ(0,1)\rho\in(0,1). Therefore, standard volume arguments yield N(n,eP,ε)vol(Bd(0,1))/vol(Bλd(0,ε))vol(Bd(0,1))/vol(Bd(0,ε/(1ρ)))=((1ρ)/ε)dN\left(\mathcal{F}_{n},e_{P},\varepsilon\right)\geq\mathrm{vol}\left(B^{d}(0,1)\right)/\mathrm{vol}\left(B_{\lambda}^{d}(0,\varepsilon)\right)\geq\mathrm{vol}\left(B^{d}(0,1)\right)/\mathrm{vol}\left(B^{d}(0,\varepsilon/(1-\rho))\right)=\left((1-\rho)/\varepsilon\right)^{d}. Thus, the metric entropy grows linearly in the dimension dd, i.e. logN(n,eP,ε)dlog((1ρ)/ε)\log N\left(\mathcal{F}_{n},e_{P},\varepsilon\right)\geq d\log\left((1-\rho)/\varepsilon\right) for ε0\varepsilon\downarrow 0. We conclude that the results in Chernozhukov et al. (2014b) are again not useful in high dimensions with dnd\gg n.

The new Gaussian approximation inequality implies (again) (19). Since max1kd|X(k)|<B\max_{1\leq k\leq d}|X^{(k)}|<B almost surely and tr(Σ)=d\mathrm{tr}(\Sigma)=d it follows that d=(E[Z22])1/2=(E[X22])1/2(E[X23])1/3Bd\sqrt{d}=(\mathrm{E}[\|Z\|_{2}^{2}])^{1/2}=(\mathrm{E}[\|X\|_{2}^{2}])^{1/2}\leq(\mathrm{E}[\|X\|_{2}^{3}])^{1/3}\leq B\sqrt{d}. Moreover, by Theorem A.6 in Giessing and Fan (2023) and since tr(Σ2)=d(1ρ2)+ρ2d2\mathrm{tr}(\Sigma^{2})=d(1-\rho^{2})+\rho^{2}d^{2}, we have Var(Z2)tr(Σ2)/tr(Σ)=1ρ2+ρ2d\mathrm{Var}(\|Z\|_{2})\geq\mathrm{tr}(\Sigma^{2})/\mathrm{tr}(\Sigma)=1-\rho^{2}+\rho^{2}d. Therefore, inequality (19) simplifies to

ϱnBn1/6ρ+B3𝟏{B>n}0asn.\displaystyle\varrho_{n}\lesssim\frac{B}{n^{1/6}\rho}+B^{3}\mathbf{1}\{B>n\}\rightarrow 0\quad\mathrm{as}\quad n\rightarrow\infty. (22)

This inequality is not only dimension- and weak variance-free but also improves qualitatively over both the results by Chernozhukov et al. (2014b) and Bentkus (2003). Intuitively, the reason why we are able to shed the d1/4d^{1/4}-factor from the upper bound compared to the results in Bentkus (2003) is that we only take the supremum over all Euclidean balls with center at the origin whereas he takes the supremum over all convex sets in d\mathbb{R}^{d}. (Note that to apply his bound in the context of this example, we need to take the supremum over at least all weighted Euclidean balls Bλd(0,ε):={ud:k=1dλkuk2ε2}B_{\lambda}^{d}(0,\varepsilon):=\{u\in\mathbb{R}^{d}:\sum_{k=1}^{d}\lambda_{k}u_{k}^{2}\leq\varepsilon^{2}\}.) This second example is related to the first one insofar as the covariance matrix has “approximately” rank one. Indeed, as the dimension increases the law of the random vector XX concentrates in the neighborhood of the one-dimensional subspace spanned by the eigenvector associated with the largest eigenvalue of Σ\Sigma. This becomes even more obvious if we consider the standardized covariance Σ/Σop\Sigma/\|\Sigma\|_{op} with eigenvalues λ1=ρ+(1ρ)/dρ\lambda_{1}=\rho+(1-\rho)/d\rightarrow\rho and λ2==λd=(1ρ)/d0\lambda_{2}=\ldots=\lambda_{d}=(1-\rho)/d\rightarrow 0 as dd\rightarrow\infty.
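The spectral facts used in this second example are easily checked numerically. A quick sketch (with hypothetical values d=200d=200 and ρ=0.3\rho=0.3):

```python
import numpy as np

d, rho = 200, 0.3                     # hypothetical dimension and correlation
Sigma = (1 - rho) * np.eye(d) + rho * np.ones((d, d))

eig = np.sort(np.linalg.eigvalsh(Sigma))
# one spiked eigenvalue 1 + (d-1)rho; the remaining d-1 equal 1 - rho
assert np.isclose(eig[-1], 1 + (d - 1) * rho)
assert np.allclose(eig[:-1], 1 - rho)

# closed form of the trace ratio that lower-bounds the strong variance:
# tr(Sigma^2)/tr(Sigma) = 1 - rho^2 + rho^2 * d
ratio = np.trace(Sigma @ Sigma) / np.trace(Sigma)
assert np.isclose(ratio, 1 - rho ** 2 + rho ** 2 * d)
```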

In both examples, the exact and approximate low-rank structures of the data are crucial in order to go from the abstract bound (19) to the dimension- and entropy-free bounds (20) and (22), respectively. This is not coincidental and is in fact representative of the entire theory that we develop in this paper: While our Gaussian and bootstrap approximation inequalities hold without assumptions on the metric entropy, in concrete examples they often only yield entropy-free upper bounds if the trace of the covariance operator is bounded or grows at a much slower rate than the sample size nn (e.g. low-rank, bounded effective rank, or variance decay, see Section 4). This is certainly a limitation of our theory, but a low-rank covariance (function) is an empirically well-documented property of many data sets and therefore a common assumption in multivariate and high-dimensional statistics (e.g. Anderson, 2003; Vershynin, 2018).

Appendix B Auxiliary results

B.1 Smoothing inequalities and partial derivatives

Lemma 1.

Let X,ZX,Z\in\mathbb{R} be arbitrary random variables. There exists a map hs,λCc()h_{s,\lambda}\in C^{\infty}_{c}(\mathbb{R}) such that

  • (i)

    for all s,ts,t\in\mathbb{R} and λ>0\lambda>0,

    |Dkhs,λ(t)|Ckλk𝟏{sts+3λ},\displaystyle|D^{k}h_{s,\lambda}(t)|\leq C_{k}\lambda^{-k}\mathbf{1}\{s\leq t\leq s+3\lambda\},

    where Ck>0C_{k}>0 is a constant depending only on k0k\in\mathbb{N}_{0}; and

  • (ii)

    for all λ>0\lambda>0,

    |sups|{Xs}{Zs}|sups|E[hs,λ(X)hs,λ(Z)]||ζ3λ(X)ζ3λ(Z),\displaystyle\left|\sup_{s\in\mathbb{R}}\left|\mathbb{P}\left\{X\leq s\right\}-\mathbb{P}\left\{Z\leq s\right\}\right|-\sup_{s\in\mathbb{R}}\big{|}\mathrm{E}[h_{s,\lambda}(X)-h_{s,\lambda}(Z)]\big{|}\right|\leq\zeta_{3\lambda}(X)\wedge\zeta_{3\lambda}(Z),

    where ζλ(V):=sups{sVs+λ}\zeta_{\lambda}(V):=\sup_{s\in\mathbb{R}}\mathbb{P}\{s\leq V\leq s+\lambda\} for real-valued VV\in\mathbb{R}.

Remark 1.

We can take C0=C1=1C_{0}=C_{1}=1 (see proof).

Lemma 2.

Let hs,λCc()h_{s,\lambda}\in C^{\infty}_{c}(\mathbb{R}) be the map from Lemma 1 and define xh(x):=hs,λ(x)x\mapsto h(x):=h_{s,\lambda}(\|x\|_{\infty}) for xdx\in\mathbb{R}^{d}. Let (Pt)t0(P_{t})_{t\geq 0} be the Ornstein-Uhlenbeck semi-group with stationary measure N(0,Σ)N(0,\Sigma) and positive definite covariance matrix Σ\Sigma.

  • (i)

    For arbitrary indices 1i1,,ikd1\leq i_{1},\ldots,i_{k}\leq d, k1k\geq 1, and all x0dx_{0}\in\mathbb{R}^{d},

    kxi1xik(0Pth(x)𝑑t)|x=x0=0ektPt(khxi1xik)(x0)𝑑t;\displaystyle\frac{\partial^{k}}{\partial x_{i_{1}}\cdots\partial x_{i_{k}}}\left(\int_{0}^{\infty}P_{t}h(x)dt\right)\Big{|}_{x=x_{0}}=\int_{0}^{\infty}e^{-kt}P_{t}\left(\frac{\partial^{k}h}{\partial x_{i_{1}}\cdots\partial x_{i_{k}}}\right)(x_{0})dt;
  • (ii)

    for almost every x0dx_{0}\in\mathbb{R}^{d} the absolute value of the derivative in (i) can be upper bounded by

    Ckλk0ektE[𝟏{sV0ts+3λ}𝟏{|V0i1t||V0t|,i1}]𝑑t 1{i1==ik},\displaystyle C_{k}\lambda^{-k}\int_{0}^{\infty}e^{-kt}\mathrm{E}\left[\mathbf{1}\left\{s\leq\|V_{0}^{t}\|_{\infty}\leq s+3\lambda\right\}\mathbf{1}\left\{|V_{0{i_{1}}}^{t}|\geq|V_{0\ell}^{t}|,\>\ell\neq{i_{1}}\right\}\right]dt\>\mathbf{1}\{i_{1}=\ldots=i_{k}\},

    where V0t=etx0+1e2tZV_{0}^{t}=e^{-t}x_{0}+\sqrt{1-e^{-2t}}Z, ZN(0,Σ)Z\sim N(0,\Sigma), and Ck>0C_{k}>0 is the absolute constant from Lemma 1.

Remark 2.

While claim (i) looks a lot like a “differentiating under the integral sign” type result, it is more accurate to think of it as a specific smoothing property of the Ornstein-Uhlenbeck semigroup with stationary measure N(0,Σ)N(0,\Sigma) when applied to (compositions of Lipschitz continuous functions with) the map xxx\mapsto\|x\|_{\infty}.

Lemma 3.

Let ϱL1()\varrho\in L^{1}(\mathbb{R}) with ϱ(r)𝑑r=1\int\varrho(r)dr=1. For η>0\eta>0 and a map hh on d\mathbb{R}^{d} set

(ϱη(i)h)(x):=ϱ(r)h(xrηei)𝑑r,\displaystyle(\varrho_{\eta}\ast_{(i)}h)(x):=\int\varrho(r)h(x-r\eta e_{i})dr,

where eie_{i} denotes the iith standard unit vector in d\mathbb{R}^{d}. For kk\in\mathbb{N}, f1,,fkB(d)f_{1},\ldots,f_{k}\in B(\mathbb{R}^{d}), gL1(d)g\in L^{1}(\mathbb{R}^{d}), and arbitrary indices 1i1,,ikd1\leq i_{1},\ldots,i_{k}\leq d,

j=1k(ϱη(ij)fj)gj=1kfjg10asη0.\displaystyle\left\|\prod_{j=1}^{k}(\varrho_{\eta}\ast_{(i_{j})}f_{j})g-\prod_{j=1}^{k}f_{j}g\right\|_{1}\rightarrow 0\quad{}\mathrm{as}\quad{}\eta\rightarrow 0.
Remark 3.

The proof of this result on “partially regularized” functions is similar to the one on “fully regularized” functions (Folland, 1999, Theorem 8.14 (a)). The conditions on the functions f1,,fk,g,ϱf_{1},\ldots,f_{k},g,\varrho can probably be relaxed, but they are sufficiently general to apply to the situations that we encounter.

Lemma 4 (Partial derivatives of compositions of almost everywhere diff’able functions).

Let gCk()g\in C^{k}(\mathbb{R}) and fCk(d𝒩)f\in C^{k}(\mathbb{R}^{d}\setminus\mathcal{N}), where 𝒩\mathcal{N} is a null set with respect to the Lebesgue measure on d\mathbb{R}^{d}. Then, gfCk(d𝒩)g\circ f\in C^{k}(\mathbb{R}^{d}\setminus\mathcal{N}) and, for arbitrary indices 1i1,,ikd1\leq i_{1},\ldots,i_{k}\leq d, k1k\geq 1, and all xd𝒩x\in\mathbb{R}^{d}\setminus\mathcal{N},

k(gf)xi1xik(x)=πΠ(D|π|gf)(x)Bπ|B|fjBxj(x),\displaystyle\frac{\partial^{k}(g\circ f)}{\partial x_{i_{1}}\cdots\partial x_{i_{k}}}(x)=\sum_{\pi\in\Pi}(D^{|\pi|}g\circ f)(x)\prod_{B\in\pi}\frac{\partial^{|B|}f}{\prod_{j\in B}\partial x_{j}}(x),

where Π\Pi is the set of all partitions of {i1,,ik}\{i_{1},\ldots,i_{k}\}, |π||\pi| denotes the number of “blocks of indices” in partition πΠ\pi\in\Pi, and |B||B| is the number of indices in block BπB\in\pi.

Remark 4.

This result is a trivial modification of the multivariate version of Faà di Bruno’s formula due to Hardy (2006). The modification is that we only require ff to be kk-times differentiable almost everywhere on d\mathbb{R}^{d}. The formula is generally false if gg is only kk-times differentiable almost everywhere on \mathbb{R}.

Remark 5.

The first three partial derivatives are of particular interest to us. They are given by (whenever they exist)

(gf)xi(x)\displaystyle\frac{\partial(g\circ f)}{\partial x_{i}}(x) =(Dgf)(x)fxi(x),\displaystyle=\left(Dg\circ f\right)(x)\frac{\partial f}{\partial x_{i}}(x),
2(gf)xixj(x)\displaystyle\frac{\partial^{2}(g\circ f)}{\partial x_{i}\partial x_{j}}(x) =(D2gf)(x)fxi(x)fxj(x)+(Dgf)(x)2fxixj(x),\displaystyle=\left(D^{2}g\circ f\right)(x)\frac{\partial f}{\partial x_{i}}(x)\frac{\partial f}{\partial x_{j}}(x)+\left(Dg\circ f\right)(x)\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}(x),
3(gf)xixjxk(x)\displaystyle\frac{\partial^{3}(g\circ f)}{\partial x_{i}\partial x_{j}\partial x_{k}}(x) =(D3gf)(x)fxi(x)fxj(x)fxk(x)\displaystyle=\left(D^{3}g\circ f\right)(x)\frac{\partial f}{\partial x_{i}}(x)\frac{\partial f}{\partial x_{j}}(x)\frac{\partial f}{\partial x_{k}}(x)
+(D2gf)(x)[2fxixj(x)fxk(x)+2fxixk(x)fxj(x)+2fxkxj(x)fxi(x)]\displaystyle\quad{}+\left(D^{2}g\circ f\right)(x)\left[\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}(x)\frac{\partial f}{\partial x_{k}}(x)+\frac{\partial^{2}f}{\partial x_{i}\partial x_{k}}(x)\frac{\partial f}{\partial x_{j}}(x)+\frac{\partial^{2}f}{\partial x_{k}\partial x_{j}}(x)\frac{\partial f}{\partial x_{i}}(x)\right]
+(Dgf)(x)3fxixjxk(x).\displaystyle\quad{}+\left(Dg\circ f\right)(x)\frac{\partial^{3}f}{\partial x_{i}\partial x_{j}\partial x_{k}}(x).
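The second-order formula in Remark 5 can be sanity-checked symbolically. The sketch below is a minimal verification with the arbitrary choices g = exp (so that Dg = D²g = exp) and a smooth inner function f; it is an illustration of the formula, not part of the proof.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)   # an arbitrary smooth inner function
g = sp.exp                 # outer function, so Dg = D^2 g = exp

# Left-hand side: direct mixed partial of the composition g(f)
lhs = sp.diff(g(f), x, y)

# Right-hand side: second-order formula from Remark 5,
# (D^2 g o f) f_x f_y + (D g o f) f_xy, with g = exp
rhs = g(f) * sp.diff(f, x) * sp.diff(f, y) + g(f) * sp.diff(f, x, y)

assert sp.simplify(lhs - rhs) == 0
```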
Lemma 5 (Partial and total derivatives of \ell_{\infty}-norms).

The map f(x)=max1kd|xk|f(x)=\max_{1\leq k\leq d}|x_{k}| is partially differentiable of any order almost everywhere on d\mathbb{R}^{d} with partial derivatives (whenever they exist)

fxi(x)=sign(xi)𝟏{|xi||xk|,k}andkfxi1xik(x)=0,\displaystyle\frac{\partial f}{\partial x_{i}}(x)=\mathrm{sign}(x_{i})\mathbf{1}\{|x_{i}|\geq|x_{k}|,\>\forall k\}\quad{}\quad{}\text{and}\quad{}\quad{}\frac{\partial^{k}f}{\partial x_{i_{1}}\cdots\partial x_{i_{k}}}(x)=0,

for 1i,i1,,ikd1\leq i,i_{1},\ldots,i_{k}\leq d and k2k\geq 2. Moreover, ff is twice totally differentiable almost everywhere on d\mathbb{R}^{d} with Jacobian and Hessian matrices (whenever they exist)

Df(x)=[fx1(x),,fxd(x)]andD2f(x)=𝟎d×d.\displaystyle Df(x)=\left[\frac{\partial f}{\partial x_{1}}(x),\ldots,\frac{\partial f}{\partial x_{d}}(x)\right]\quad{}\quad{}\text{and}\quad{}\quad{}D^{2}f(x)=\mathbf{0}\in\mathbb{R}^{d\times d}.
Remark 6.

Note that the first partial derivative can be re-written (less compactly) as a piecewise linear function.
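Away from the null set of ties, the first-order formula in Lemma 5 can be checked against central finite differences. The sketch below is a minimal numerical illustration; the seed, dimension, and step size are arbitrary.

```python
import numpy as np

def f(x):
    # f(x) = max_k |x_k|
    return np.max(np.abs(x))

def grad_f(x):
    # sign(x_i) on the coordinate attaining the max (a.e. unique), 0 elsewhere
    g = np.zeros_like(x)
    i = np.argmax(np.abs(x))
    g[i] = np.sign(x[i])
    return g

rng = np.random.default_rng(0)
x = rng.normal(size=5)      # ties among |x_k| occur with probability zero
eps = 1e-6
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(5)])
assert np.allclose(num, grad_f(x), atol=1e-5)
```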

B.2 Anti-concentration inequalities and lower bounds on variances

Lemma 6 (Giessing, 2023).

Let X=(Xu)uUX=(X_{u})_{u\in U} be a centered separable Gaussian process indexed by a semi-metric space UU. Set Z=supuUXuZ=\sup_{u\in U}X_{u} and assume that 0Z<0\leq Z<\infty a.s. For all ε0\varepsilon\geq 0,

ε/12Var(Z)+ε2/12supt0{tZt+ε}ε12Var(Z)+ε2/12.\displaystyle\frac{\varepsilon/\sqrt{12}}{\sqrt{\mathrm{Var}(Z)+\varepsilon^{2}/12}}\leq\sup_{t\geq 0}\mathbb{P}\left\{t\leq Z\leq t+\varepsilon\right\}\leq\frac{\varepsilon\sqrt{12}}{\sqrt{\mathrm{Var}(Z)+\varepsilon^{2}/12}}.

The result remains true if ZZ is replaced by Z~=supuU|Xu|\widetilde{Z}=\sup_{u\in U}|X_{u}|.

Remark 7.

If the covariance function of XX is positive definite, then the above inequalities hold even for uncentered X=(Xu)uUX=(X_{u})_{u\in U} and Z[,)Z\in[-\infty,\infty) a.s.
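The two-sided anti-concentration bound of Lemma 6 can be sanity-checked by Monte Carlo in the simplest finite-dimensional case, taking U = {1, …, d} and X the coordinates of an i.i.d. standard Gaussian vector (so Σ = I). This is an illustrative sketch; the grid resolution, dimension, and sample size are arbitrary, and the assertions carry Monte Carlo slack.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 50, 200_000
# Z~ = sup_u |X_u| for (X_1, ..., X_d) i.i.d. N(0,1)
Z = np.abs(rng.normal(size=(n, d))).max(axis=1)

eps = 0.1
ts = np.linspace(0.0, Z.max(), 400)
# Crude grid estimate of the concentration function sup_t P{t <= Z <= t+eps}
conc = max(np.mean((Z >= t) & (Z <= t + eps)) for t in ts)

denom = np.sqrt(Z.var() + eps**2 / 12)
lower = (eps / np.sqrt(12)) / denom
upper = (eps * np.sqrt(12)) / denom
assert lower <= conc + 0.01 and conc <= upper + 0.01  # Monte Carlo slack
```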

Lemma 7 (Giessing, 2023).

Let X=(Xu)uUX=(X_{u})_{u\in U} be a separable Gaussian process indexed by a semi-metric space UU such that E[Xu]=0\mathrm{E}[X_{u}]=0, 0<σ¯2E[Xu2]σ¯2<0<\underline{\sigma}^{2}\leq\mathrm{E}[X_{u}^{2}]\leq\bar{\sigma}^{2}<\infty, and |Corr(Xu,Xv)|ρ|\mathrm{Corr}(X_{u},X_{v})|\leq\rho for all u,vUu,v\in U. Set Z=supuUXuZ=\sup_{u\in U}X_{u} and assume that Z<Z<\infty a.s. Then, 0E[Z]<0\leq\mathrm{E}[Z]<\infty and there exist absolute constants c,C>0c,C>0 such that

1C(σ¯1+E[Z/σ¯])2Var(Z)C[σ¯2(σ¯2ρ+(σ¯(E[Z/σ¯]c)+)2)],\displaystyle\frac{1}{C}\left(\frac{\underline{\sigma}}{1+\mathrm{E}[Z/\underline{\sigma}]}\right)^{2}\leq\mathrm{Var}(Z)\leq C\left[\bar{\sigma}^{2}\wedge\left(\bar{\sigma}^{2}\rho+\left(\frac{\bar{\sigma}}{(\mathrm{E}[Z/\bar{\sigma}]-c)_{+}}\right)^{2}\right)\right],

with the convention that “1/0=1/0=\infty”. The result remains true if ZZ is replaced by Z~=supuU|Xu|\widetilde{Z}=\sup_{u\in U}|X_{u}|.

Lemma 8 (Le Cam, 1986, p. 402).

For X,ZX,Z\in\mathbb{R} arbitrary random variables and λ>0\lambda>0,

sups0|{Xs}{Zs}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{X\leq s\right\}-\mathbb{P}\left\{Z\leq s\right\}\Big{|} {|XZ|>λ}+ζλ(X)ζλ(Z),\displaystyle\leq\mathbb{P}\left\{|X-Z|>\lambda\right\}+\zeta_{\lambda}(X)\wedge\zeta_{\lambda}(Z),

where ζλ(V):=sups{sVs+λ}\zeta_{\lambda}(V):=\sup_{s\in\mathbb{R}}\mathbb{P}\{s\leq V\leq s+\lambda\} for real-valued VV\in\mathbb{R}.

Lemma 9.

Let X,ZX,Z\in\mathbb{R} be arbitrary random variables. Then,

|Var(X)Var(XZ)|Var((ZX)+),\displaystyle\left|\sqrt{\mathrm{Var}(X)}-\sqrt{\mathrm{Var}(X\vee Z)}\right|\leq\sqrt{\mathrm{Var}\big{(}(Z-X)_{+}\big{)}},

where (a)+=max(a,0)(a)_{+}=\max(a,0) for aa\in\mathbb{R}. Moreover, if E[Z]E[X]\mathrm{E}[Z]\geq\mathrm{E}[X], then

Var((ZX)+)Var(ZX).\displaystyle\mathrm{Var}\big{(}(Z-X)_{+}\big{)}\leq\mathrm{Var}(Z-X).
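Both inequalities of Lemma 9 can be checked by Monte Carlo. The sketch below uses independent normals with E[Z] ≥ E[X]; all distributional choices and the mean shift 0.3 are arbitrary, and the assertions include slack for simulation error.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
X = rng.normal(size=n)            # E[X] = 0
Z = rng.normal(loc=0.3, size=n)   # E[Z] = 0.3 >= E[X]

lhs = abs(np.sqrt(X.var()) - np.sqrt(np.maximum(X, Z).var()))
mid = np.sqrt(np.maximum(Z - X, 0.0).var())   # sd of (Z - X)_+
rhs = np.sqrt((Z - X).var())                  # sd of Z - X

assert lhs <= mid + 1e-2   # |sd(X) - sd(X v Z)| <= sd((Z - X)_+)
assert mid <= rhs + 1e-2   # sd((Z - X)_+) <= sd(Z - X) since E[Z] >= E[X]
```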

B.3 Quantile comparison lemmas

Throughout this section, ZnX1,,XnN(0,Σ^n)Z_{n}\mid X_{1},\ldots,X_{n}\sim N(0,\widehat{\Sigma}_{n}) and ZN(0,Σ)Z\sim N(0,\Sigma). For α(0,1)\alpha\in(0,1) we define the α\alphath quantile of Zn\|Z_{n}\|_{\infty} and Z\|Z\|_{\infty}, respectively, by

cn(α;Σ^n):=inf{s0:{ZnsX1,,Xn}α},cn(α;Σ):=inf{s0:{Zs}α}.\displaystyle\begin{split}c_{n}(\alpha;\widehat{\Sigma}_{n})&:=\inf\left\{s\geq 0:\mathbb{P}\left\{\|Z_{n}\|_{\infty}\leq s\mid X_{1},\ldots,X_{n}\right\}\geq\alpha\right\},\\ c_{n}(\alpha;\Sigma)&:=\inf\left\{s\geq 0:\mathbb{P}\left\{\|Z\|_{\infty}\leq s\right\}\geq\alpha\right\}.\end{split} (23)
Lemma 10.

For all δ>0\delta>0,

infα(0,1){cn(α;Σ^n)cn(πn(δ)+α;Σ)}1{maxj,k|Σ^n,jkΣjk|>δ},and\displaystyle\inf_{\alpha\in(0,1)}\mathbb{P}\Big{\{}c_{n}(\alpha;\widehat{\Sigma}_{n})\leq c_{n}(\pi_{n}(\delta)+\alpha;\Sigma)\Big{\}}\geq 1-\mathbb{P}\left\{\max_{j,k}|\widehat{\Sigma}_{n,jk}-\Sigma_{jk}|>\delta\right\},\quad{}\text{and}
infα(0,1){cn(α;Σ)cn(πn(δ)+α;Σ^n)}1{maxj,k|Σ^n,jkΣjk|>δ},\displaystyle\inf_{\alpha\in(0,1)}\mathbb{P}\Big{\{}c_{n}(\alpha;\Sigma)\leq c_{n}(\pi_{n}(\delta)+\alpha;\widehat{\Sigma}_{n})\Big{\}}\geq 1-\mathbb{P}\left\{\max_{j,k}|\widehat{\Sigma}_{n,jk}-\Sigma_{jk}|>\delta\right\},

where πn(δ)=Kδ1/3(Var(Σ1/2Z))1/3\pi_{n}(\delta)=K\delta^{1/3}\left(\mathrm{Var}(\|\Sigma^{1/2}Z\|_{\infty})\right)^{-1/3} and K>0K>0 is an absolute constant.

If we use non-Gaussian proxy statistics to test hypotheses (such as Efron’s empirical bootstrap), we replace Lemma 10 by the following result:

Lemma 11.

Let Tn=Sn+Rn0T_{n}=S_{n}+R_{n}\geq 0 be a statistic; SnS_{n} and RnR_{n} need not be independent. For α(0,1)\alpha\in(0,1) arbitrary define cnT(α):=inf{s0:P(TnsX1,,Xn)α}c_{n}^{T}(\alpha):=\inf\{s\geq 0:\mathrm{P}(T_{n}\leq s\mid X_{1},\ldots,X_{n})\geq\alpha\}. Then, for all δ,η>0\delta,\eta>0,

infα(0,1){cnT(α)cn(κn(δ)+η+α;Σ)}1{γn+ρn(δ)>η}and\displaystyle\inf_{\alpha\in(0,1)}\mathbb{P}\left\{c_{n}^{T}(\alpha)\leq c_{n}(\kappa_{n}(\delta)+\eta+\alpha;\Sigma)\right\}\geq 1-\mathbb{P}\left\{\gamma_{n}+\rho_{n}(\delta)>\eta\right\}\quad{}\text{and}
infα(0,1){cn(α;Σ)cnT(κn(δ)+η+α)}1{γn+ρn(δ)>η},\displaystyle\inf_{\alpha\in(0,1)}\mathbb{P}\left\{c_{n}(\alpha;\Sigma)\leq c_{n}^{T}(\kappa_{n}(\delta)+\eta+\alpha)\right\}\geq 1-\mathbb{P}\left\{\gamma_{n}+\rho_{n}(\delta)>\eta\right\},

where γn=sups0|{SnsX1,,Xn}{Σn1/2Zs}|\gamma_{n}=\sup_{s\geq 0}|\mathbb{P}\{S_{n}\leq s\mid X_{1},\ldots,X_{n}\}-\mathbb{P}\{\|\Sigma_{n}^{1/2}Z\|_{\infty}\leq s\}|, κn(δ)=Kδ(Var(Σn1/2Z))1/2\kappa_{n}(\delta)=K\delta(\mathrm{Var}(\|\Sigma_{n}^{1/2}Z\|_{\infty}))^{-1/2}, ρn(δ)={|Rn|>δX1,,Xn}\rho_{n}(\delta)=\mathbb{P}\{|R_{n}|>\delta\mid X_{1},\ldots,X_{n}\}, and K>0K>0 is an absolute constant.

For α(0,1)\alpha\in(0,1) we define the (conditional) α\alphath quantile of the supremum of {Z^nm(f):fn}\{\widehat{Z}_{n}^{m}(f):f\in\mathcal{F}_{n}\} by

cn(α;𝒞^nm)\displaystyle c_{n}(\alpha;\widehat{\mathcal{C}}_{n}^{m}) :=inf{s0:{Z^nmnsX1,,Xn}α},\displaystyle:=\inf\left\{s\geq 0:\mathbb{P}\left\{\|\widehat{Z}_{n}^{m}\|_{\mathcal{F}_{n}}\leq s\mid X_{1},\ldots,X_{n}\right\}\geq\alpha\right\},

and the α\alphath quantile of the supremum of the Gaussian PP-bridge process {GP(f):fn}\{G_{P}(f):f\in\mathcal{F}_{n}\} by

cn(α;𝒞P)\displaystyle c_{n}(\alpha;\mathcal{C}_{P}) :=inf{s0:{GPns}α}.\displaystyle:=\inf\left\{s\geq 0:\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}\geq\alpha\right\}.

The following two lemmas are straightforward generalizations of the preceding lemmas to empirical processes. The proofs of Lemmas 12 and 13 are identical to those of Lemmas 10 and 11, respectively, and are therefore omitted.

Lemma 12.

For all δ>0\delta>0,

infα(0,1){cn(α;𝒞^nm)cn(πn(δ)+α;𝒞P)}1{supf,gn|𝒞P(f,g)𝒞^nm(f,g)|>δ},and\displaystyle\inf_{\alpha\in(0,1)}\mathbb{P}\left\{c_{n}(\alpha;\widehat{\mathcal{C}}_{n}^{m})\leq c_{n}(\pi_{n}(\delta)+\alpha;\mathcal{C}_{P})\right\}\geq 1-\mathbb{P}\left\{\sup_{f,g\in\mathcal{F}_{n}}\big{|}\mathcal{C}_{P}(f,g)-\widehat{\mathcal{C}}_{n}^{m}(f,g)\big{|}>\delta\right\},\quad{}\text{and}
infα(0,1){cn(α;𝒞P)cn(πn(δ)+α;𝒞^nm)}1{supf,gn|𝒞P(f,g)𝒞^nm(f,g)|>δ},\displaystyle\inf_{\alpha\in(0,1)}\mathbb{P}\left\{c_{n}(\alpha;\mathcal{C}_{P})\leq c_{n}(\pi_{n}(\delta)+\alpha;\widehat{\mathcal{C}}_{n}^{m})\right\}\geq 1-\mathbb{P}\left\{\sup_{f,g\in\mathcal{F}_{n}}\big{|}\mathcal{C}_{P}(f,g)-\widehat{\mathcal{C}}_{n}^{m}(f,g)\big{|}>\delta\right\},

where πn(δ)=Kδ1/3(Var(GPn))1/3\pi_{n}(\delta)=K\delta^{1/3}\left(\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})\right)^{-1/3} and K>0K>0 is an absolute constant.

Lemma 13.

Let {Zn(f)=Zn1(f)+Zn2(f):fn}\{Z_{n}(f)=Z_{n}^{1}(f)+Z_{n}^{2}(f):f\in\mathcal{F}_{n}\} be an arbitrary stochastic process. For α(0,1)\alpha\in(0,1) arbitrary define cn,Zn(α):=inf{s0:{ZnnsX1,,Xn}α}c_{n,Z_{n}}(\alpha):=\inf\{s\geq 0:\mathbb{P}\{\|Z_{n}\|_{\mathcal{F}_{n}}\leq s\mid X_{1},\ldots,X_{n}\}\geq\alpha\}. Then, for all δ,η>0\delta,\eta>0,

infα(0,1){cn,Zn(α)cn(κn(δ)+η+α;𝒞P)}1{γn,Zn1+ρn,Zn2(δ)>η},and\displaystyle\inf_{\alpha\in(0,1)}\mathbb{P}\left\{c_{n,Z_{n}}(\alpha)\leq c_{n}(\kappa_{n}(\delta)+\eta+\alpha;\mathcal{C}_{P})\right\}\geq 1-\mathbb{P}\left\{\gamma_{n,Z_{n}^{1}}+\rho_{n,Z_{n}^{2}}(\delta)>\eta\right\},\quad{}\text{and}
infα(0,1){cn(α;𝒞P)cn,Zn(κn(δ)+η+α)}1{γn,Zn1+ρn,Zn2(δ)>η},\displaystyle\inf_{\alpha\in(0,1)}\mathbb{P}\left\{c_{n}(\alpha;\mathcal{C}_{P})\leq c_{n,Z_{n}}(\kappa_{n}(\delta)+\eta+\alpha)\right\}\geq 1-\mathbb{P}\left\{\gamma_{n,Z_{n}^{1}}+\rho_{n,Z_{n}^{2}}(\delta)>\eta\right\},

where γn,Zn1=sups0|{Zn1nsX1,,Xn}{GPns}|\gamma_{n,Z_{n}^{1}}=\sup_{s\geq 0}|\mathbb{P}\{\|Z_{n}^{1}\|_{\mathcal{F}_{n}}\leq s\mid X_{1},\ldots,X_{n}\}-\mathbb{P}\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\}|, κn(δ)=Kδ/Var(GPn)\kappa_{n}(\delta)=K\delta/\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}, ρn,Zn2(δ)={Zn2n>δX1,,Xn}\rho_{n,Z_{n}^{2}}(\delta)=\mathbb{P}\left\{\|Z_{n}^{2}\|_{\mathcal{F}_{n}}>\delta\mid X_{1},\ldots,X_{n}\right\}, and K>0K>0 is an absolute constant.

B.4 Boundedness and continuity of centered Gaussian processes

This section contains classical results on boundedness and continuity of the sample paths of centered Gaussian processes. We provide a proof for Lemma 18 in Appendix F.4; all other results (with proofs or references to proofs) can be found in Appendix A of van der Vaart and Wellner (1996).

Throughout this section X=(Xu)uUX=(X_{u})_{u\in U} denotes a centered separable Gaussian process indexed by a semi-metric space UU, Z=supuU|Xu|Z=\sup_{u\in U}|X_{u}|, σ2=supuUE[Xu2]\sigma^{2}=\sup_{u\in U}\mathrm{E}[X_{u}^{2}], and dXd_{X} the intrinsic standard deviation metric associated with XX.

Lemma 14 (Equivalence of bounded sample path and finite expectation).

XX is almost surely bounded on UU if and only if E[Z]<\mathrm{E}[Z]<\infty.

Lemma 15 (Reverse Liapunov inequality).

If XX is almost surely bounded on UU, then there exist constants Kp,q>0K_{p,q}>0 depending on 0<pq<0<p\leq q<\infty only such that

(E[Zq])1/qKp,q(E[Zp])1/p.\displaystyle\left(\mathrm{E}[Z^{q}]\right)^{1/q}\leq K_{p,q}\left(\mathrm{E}[Z^{p}]\right)^{1/p}.
Lemma 16 (Sudakov’s lower bound).

Let N(U,dX,ε)N(U,d_{X},\varepsilon) be the ε\varepsilon-covering number of UU w.r.t. dXd_{X}. Then, there exists an absolute constant K>0K>0 such that

supε>0εlogN(U,dX,ε)KE[Z].\displaystyle\sup_{\varepsilon>0}\varepsilon\sqrt{\log N(U,d_{X},\varepsilon)}\leq K\mathrm{E}[Z].

Consequently, if E[Z]<\mathrm{E}[Z]<\infty, then UU is totally bounded w.r.t. dXd_{X}.
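For intuition, Sudakov's bound can be checked numerically in the simplest finite case: with U = {1, …, d} and X_u i.i.d. N(0,1), distinct index points are at mutual d_X-distance √2, so N(U, d_X, ε) = d for every ε < √2, and the bound (in its usual form with the √(log N) entropy term) requires ε√(log d) ≤ K E[Z]. The sketch below uses the arbitrary illustrative constant K = 2, not the sharp one.

```python
import numpy as np

rng = np.random.default_rng(2)
for d in (10, 100, 1000):
    # Z = sup_u |X_u| for d i.i.d. standard normals; Monte Carlo estimate of E[Z]
    Z = np.abs(rng.normal(size=(10_000, d))).max(axis=1)
    EZ = Z.mean()
    # Take eps = 1 < sqrt(2), so N(U, d_X, eps) = d
    sudakov = 1.0 * np.sqrt(np.log(d))
    assert sudakov <= 2.0 * EZ   # K = 2 is an arbitrary illustrative constant
```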

Lemma 17 (Metric entropy condition for bounded and continuous sample paths).

Let N(U,dX,ε)N(U,d_{X},\varepsilon) be the ε\varepsilon-covering number of UU w.r.t. dXd_{X}. If 0logN(U,dX,ε)𝑑ε<\int_{0}^{\infty}\sqrt{\log N(U,d_{X},\varepsilon)}d\varepsilon<\infty, then there exists a version of XX that is almost surely bounded and has almost surely uniformly dXd_{X}-continuous sample paths.

Lemma 18 (Continuous sample paths and modulus of continuity).

If XX is almost surely bounded on UU, then the sample paths of XX on UU are almost surely uniformly dXd_{X}-continuous if and only if

limδ0E[supdX(u,v)<δ|XuXv|]=0.\displaystyle\lim_{\delta\rightarrow 0}\mathrm{E}\left[\sup_{d_{X}(u,v)<\delta}|X_{u}-X_{v}|\right]=0. (24)
Remark 8.

Sufficiency of (24) holds for arbitrary stochastic processes on general metric spaces. Necessity of (24) holds only for Gaussian processes. See comment in proof.

B.5 Auxiliary results for applications

In this section we collect several technical results needed for the applications in Section 4.

Lemma 19.

Let X,X1,,XndX,X_{1},\ldots,X_{n}\in\mathbb{R}^{d} be i.i.d. sub-Gaussian random vectors with mean zero and covariance matrix Σ\Sigma. Define T4:d×d×d×dT_{4}:\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{R}^{d}\rightarrow\mathbb{R} by T4(t,u,v,w):=E[(Xt)(Xu)(Xv)(Xw)]T_{4}(t,u,v,w):=\mathrm{E}[(X^{\prime}t)(X^{\prime}u)(X^{\prime}v)(X^{\prime}w)]. Then,

E1ni=1nXiXiXiXiT4opr(Σ)Σop2((logen)2r(Σ)n(logen)2r(Σ)n),\displaystyle\mathrm{E}\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}\otimes X_{i}\otimes X_{i}\otimes X_{i}-T_{4}\right\|_{op}\lesssim\mathrm{r}(\Sigma)\|\Sigma\|_{op}^{2}\left(\sqrt{\frac{(\log en)^{2}\mathrm{r}(\Sigma)}{n}}\vee\frac{(\log en)^{2}\mathrm{r}(\Sigma)}{n}\right),

where r(Σ)=tr(Σ)/Σop\mathrm{r}(\Sigma)=\mathrm{tr}(\Sigma)/\|\Sigma\|_{op} and \lesssim hides an absolute constant independent of nn, dd, and Σ\Sigma. (Here, we tacitly identify the XiX_{i}’s with linear maps d\mathbb{R}^{d}\rightarrow\mathbb{R} and \otimes denotes the tensor product between these linear maps.)

Remark 9.

This result is useful because the upper bound is dimension-free in the sense that it only depends on the effective rank r(Σ)\mathrm{r}(\Sigma) and the operator norm Σop\|\Sigma\|_{op}. However, the dependence on the sample size nn is sub-optimal.

Lemma 20.

The bias-corrected kernel ridge regression estimator defined in (13) satisfies

f^nbcf0=(T+λ)2T(1ni=1n(Yif0(Xi))kXi)+Rn,\displaystyle\widehat{f}^{\mathrm{bc}}_{n}-f_{0}=(T+\lambda)^{-2}T\left(\frac{1}{n}\sum_{i=1}^{n}\big{(}Y_{i}-f_{0}(X_{i})\big{)}k_{X_{i}}\right)+R_{n},

where RnR_{n} is a higher-order remainder term and

nRn\displaystyle\sqrt{n}\|R_{n}\|_{\infty} κ(λ1T^nTop+λ2T^nTop2)1ni=1n(T+λ)1εikXi\displaystyle\lesssim\kappa\left(\lambda^{-1}\|\widehat{T}_{n}-T\|_{op}+\lambda^{-2}\|\widehat{T}_{n}-T\|_{op}^{2}\right)\left\|\frac{1}{n}\sum_{i=1}^{n}(T+\lambda)^{-1}\varepsilon_{i}k_{X_{i}}\right\|_{\mathcal{H}}
+κ(nT^nTop2+nλ2)(T+λ)2f0,\displaystyle\quad\quad+\kappa\left(\sqrt{n}\|\widehat{T}_{n}-T\|_{op}^{2}+\sqrt{n}\lambda^{2}\right)\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}},

where \lesssim hides an absolute constant.

Lemma 21.

Let δ(0,1)\delta\in(0,1) and RnR_{n} be the remainder term in (13). Let SS be a separable metric space (w.r.t. some metric). If Assumptions 4 and 5 hold, then, with probability at least 1δ1-\delta,

nRn\displaystyle\sqrt{n}\|R_{n}\|_{\infty} (σ¯2𝔫12(λ)nλ2σ¯2nλ2)(κ4log3(2/δ)nλκ2log2(2/δ))\displaystyle\lesssim\left(\sqrt{\frac{\bar{\sigma}^{2}\mathfrak{n}_{1}^{2}(\lambda)}{n\lambda^{2}}}\vee\frac{\bar{\sigma}^{2}}{n\lambda^{2}}\right)\left(\frac{\kappa^{4}\log^{3}(2/\delta)}{\sqrt{n}\lambda}\vee\kappa^{2}\log^{2}(2/\delta)\right)
+(κ4log2(2/δ)nnλ2)κ(T+λ)2f0,\displaystyle\quad\quad+\left(\frac{\kappa^{4}\log^{2}(2/\delta)}{\sqrt{n}}\vee\sqrt{n}\lambda^{2}\right)\kappa\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}},

where σ¯2σ02κ2(B+κf0)21\bar{\sigma}^{2}\geq\sigma_{0}^{2}\vee\kappa^{2}(B+\kappa\|f_{0}\|_{\mathcal{H}})^{2}\vee 1, 𝔫12(λ)=tr((T+λ)2T)\mathfrak{n}_{1}^{2}(\lambda)=\mathrm{tr}\left((T+\lambda)^{-2}T\right), and \lesssim hides an absolute constant.

Remark 10.

The quantity 𝔫1(λ)\mathfrak{n}_{1}(\lambda) also appears in Singh and Vijaykumar (2023). For an interpretation and its relation to the effective rank of the operator TT we refer to Section H (ibid., pp. 80ff).

Remark 11.

The above upper bound is op(1)o_{p}(1) if (i) σ¯κ3(logn)3σ¯κ2(logn)2𝔫1(λ)=o(nλ)\bar{\sigma}\kappa^{3}(\log n)^{3}\vee\bar{\sigma}\kappa^{2}(\log n)^{2}\mathfrak{n}_{1}(\lambda)=o(\sqrt{n}\lambda), (ii) κ5(logn)2(T+λ)2f0=o(n)\kappa^{5}(\log n)^{2}\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}=o(\sqrt{n}), and (iii) λ2κ(T+λ)2f0=o(1)\lambda^{2}\kappa\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}=o(1).

Lemma 22.

Let δ(0,1)\delta\in(0,1) and let Ω\Omega and Ω^n\widehat{\Omega}_{n} be the covariance operator and its sample analogue as defined in Section 4.3. If Assumptions 4 and 5 hold, then, with probability at least 1δ1-\delta,

Ω^nΩopT3(T+λ)4op(κ4+κ2𝔫12(λ)+σ¯4)log(2/δ)nλ2,\displaystyle\|\widehat{\Omega}_{n}-\Omega\|_{op}\lesssim\|T^{3}(T+\lambda)^{-4}\|_{op}\sqrt{\frac{(\kappa^{4}+\kappa^{2}\mathfrak{n}_{1}^{2}(\lambda)+\bar{\sigma}^{4})\log(2/\delta)}{n\lambda^{2}}},

where σ¯2σ02κ2(B+κf0)21\bar{\sigma}^{2}\geq\sigma_{0}^{2}\vee\kappa^{2}(B+\kappa\|f_{0}\|_{\mathcal{H}})^{2}\vee 1, 𝔫12(λ)=tr((T+λ)2T)\mathfrak{n}_{1}^{2}(\lambda)=\mathrm{tr}\left((T+\lambda)^{-2}T\right), and \lesssim hides an absolute constant.

Remark 12.

The above upper bound is op(1)o_{p}(1) if T3(T+λ)4op(κ2κ𝔫1(λ)σ¯2)logn=o(nλ)\|T^{3}(T+\lambda)^{-4}\|_{op}(\kappa^{2}\vee\kappa\mathfrak{n}_{1}(\lambda)\vee\bar{\sigma}^{2})\sqrt{\log n}=o(\sqrt{n}\lambda).

Lemma 23.

Let δ(0,1)\delta\in(0,1) and α0\alpha\in\mathbb{N}_{0}. Let SS be a separable metric space (w.r.t. some metric). If Assumptions 4 and 5 hold, then, with probability at least 1δ1-\delta,

(i)\displaystyle(i) 1ni=1n(T+λ)α(Yif0(Xi))kXiσ02𝔫α2(λ)log(2/δ)nλακ(B+κf0)log(2/δ)n,\displaystyle\quad\quad\left\|\frac{1}{n}\sum_{i=1}^{n}(T+\lambda)^{-\alpha}\big{(}Y_{i}-f_{0}(X_{i})\big{)}k_{X_{i}}\right\|_{\mathcal{H}}\lesssim\sqrt{\frac{\sigma_{0}^{2}\mathfrak{n}_{\alpha}^{2}(\lambda)\log(2/\delta)}{n}}\vee\frac{\lambda^{-\alpha}\kappa(B+\kappa\|f_{0}\|_{\mathcal{H}})\log(2/\delta)}{n},
(ii)\displaystyle(ii) 1ni=1n(T+λ)α((kXikXi)T)HSκ2𝔫α2(λ)log(2/δ)nλακ2log(2/δ)n,\displaystyle\quad\quad\left\|\frac{1}{n}\sum_{i=1}^{n}(T+\lambda)^{-\alpha}\left((k_{X_{i}}\otimes k_{X_{i}}^{*})-T\right)\right\|_{HS}\lesssim\sqrt{\frac{\kappa^{2}\mathfrak{n}_{\alpha}^{2}(\lambda)\log(2/\delta)}{n}}\vee\frac{\lambda^{-\alpha}\kappa^{2}\log(2/\delta)}{n},

where σ¯2σ02κ2(B+κf0)21\bar{\sigma}^{2}\geq\sigma_{0}^{2}\vee\kappa^{2}(B+\kappa\|f_{0}\|_{\mathcal{H}})^{2}\vee 1, 𝔫α2(λ)=tr((T+λ)2αT)\mathfrak{n}_{\alpha}^{2}(\lambda)=\mathrm{tr}\left((T+\lambda)^{-2\alpha}T\right) and \lesssim hides an absolute constant.

Remark 13.

If \mathcal{H} is pre-Gaussian, then weaker conditions and generic chaining arguments yield tighter bounds (e.g. Koltchinskii and Lounici, 2017, Theorem 9).

The next lemma is a version of Bernstein’s exponential tail bound for random elements on separable Hilbert spaces.

Lemma 24 (Theorem 3.3.4, Yurinsky, 2006; Lemma G.2, Singh and Vijaykumar, 2023).

Let X,X1,,XnX,X_{1},\ldots,X_{n} be i.i.d. centered random elements on a separable Hilbert space \mathcal{H} with induced norm \|\cdot\|_{\mathcal{H}}. Suppose that there exist absolute constants ν,σ>0\nu,\sigma>0 such that i=1nEXik(k!/2)σ2νk2\sum_{i=1}^{n}\mathrm{E}\|X_{i}\|_{\mathcal{H}}^{k}\leq(k!/2)\sigma^{2}\nu^{k-2} for all k2k\geq 2. Then, for t>0t>0 arbitrary,

{max1mni=1mXi>tσ}2exp(t2/21+tν/σ).\displaystyle\mathbb{P}\left\{\max_{1\leq m\leq n}\left\|\sum_{i=1}^{m}X_{i}\right\|_{\mathcal{H}}>t\sigma\right\}\leq 2\exp\left(\frac{-t^{2}/2}{1+t\nu/\sigma}\right).

In particular, for δ(0,1)\delta\in(0,1) arbitrary, with probability at least 1δ1-\delta,

1ni=1nXiσ2log(2/δ)nνlog(2/δ)n,\displaystyle\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}\right\|_{\mathcal{H}}\lesssim\sqrt{\frac{\sigma^{2}\log(2/\delta)}{n}}\vee\frac{\nu\log(2/\delta)}{n},

where \lesssim hides an absolute constant independent of δ,n,ν,σ\delta,n,\nu,\sigma, and \mathcal{H}.

Appendix C Proofs of the results in Section 2

C.1 Proof of Proposition 1

Proof of Proposition 1.

Our proof is inspired by Nourdin and Peccati (2012) (Theorem 3.7.1) who establish a Berry-Esseen type bound for the univariate case using an inductive argument (attributed to Bolthausen, 1984). The multivariate case requires several modifications, some of which we take from Götze (1991), Bhattacharya and Holmes (2010), and Fang and Koike (2021). We also borrow a truncation argument from Chernozhukov et al. (2017), which explains the qualitative similarity between their bound and ours. Original ideas in our proof are mostly those related to the way in which we use our Gaussian anti-concentration inequality (Lemma 6) and exploit the mollifying properties of the Ornstein-Uhlenbeck semi-group operator to by-pass dimension-dependent smoothing inequalities (Lemmas 1 and 2).

Our proof strategy has two drawbacks: First, the inductive argument relies substantially on the i.i.d. assumption of the data. Generalizing this argument to independent but non-identical data requires additional assumptions on the variances similar to the uniform asymptotic negligibility condition in the classical Lindeberg-Feller CLT. We leave this generalization to future research. Second, the recent results by Chernozhukov et al. (2020) suggest that our Berry-Esseen type bound is not sharp. Unfortunately, their proof technique (based on delicate estimates of Hermite polynomials) is inherently dimension dependent. Extending their approach to the coordinate-free Wiener chaos decomposition is another interesting research task.

The case of positive definite Σ\Sigma.

Suppose that Σd×d\Sigma\in\mathbb{R}^{d\times d} is positive definite. Let ZN(0,Σ)Z\sim N(0,\Sigma) be independent of X,X1,,XndX,X_{1},\ldots,X_{n}\in\mathbb{R}^{d} and define, for each n1n\geq 1,

Δn:=sups,t0|{etSn+1e2tZs}{Zs}|.\displaystyle\Delta_{n}:=\sup_{s,t\geq 0}\Big{|}\mathbb{P}\left\{\|e^{-t}S_{n}+\sqrt{1-e^{-2t}}Z\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|Z\|_{\infty}\leq s\right\}\Big{|}.

Further, for each n1n\geq 1, let Cn,d1C_{n,d}\geq 1 be the smallest constant greater than or equal to one such that, for all i.i.d. random variables X,X1,,XndX,X_{1},\ldots,X_{n}\in\mathbb{R}^{d} with E[X3]<\mathrm{E}[\|X\|_{\infty}^{3}]<\infty, E[X]=0\mathrm{E}[X]=0, and E[XX]=Σ\mathrm{E}[XX^{\prime}]=\Sigma,

ΔnCn,dBn,\displaystyle\Delta_{n}\leq C_{n,d}B_{n},

where

Bn:=(E[X3])1/3n1/6Var(Z)+E[X3𝟏{X>M}]E[X3]+12E[Z]+MnVar(Z).\displaystyle B_{n}:=\frac{(\mathrm{E}[\|X\|_{\infty}^{3}])^{1/3}}{n^{1/6}\sqrt{\mathrm{Var}(\|Z\|_{\infty})}}+\frac{\mathrm{E}\left[\|X\|_{\infty}^{3}\mathbf{1}\{\|X\|_{\infty}>M\}\right]}{\mathrm{E}\left[\|X\|_{\infty}^{3}\right]}+\frac{12\mathrm{E}[\|Z\|_{\infty}]+M}{\sqrt{n\mathrm{Var}(\|Z\|_{\infty})}}.

The factor 12 in front of E[Z]\mathrm{E}[\|Z\|_{\infty}] ensures that Bnn1B_{n}\sqrt{n}\geq 1 so that Cn,dnC_{n,d}\leq\sqrt{n}. Indeed, one easily computes (E[Z2])1/22(2π+1)E[Z]12E[Z](\mathrm{E}[\|Z\|_{\infty}^{2}])^{1/2}\leq 2(2\sqrt{\pi}+1)\mathrm{E}[\|Z\|_{\infty}]\leq 12\mathrm{E}[\|Z\|_{\infty}] (by the equivalence of moments of suprema of Gaussian processes, e.g. van der Vaart and Wellner, 1996, Proposition A.2.4) and, hence, BnnE[Z2]/Var(Z)1B_{n}\sqrt{n}\geq\sqrt{\mathrm{E}[\|Z\|_{\infty}^{2}]/\mathrm{Var}(\|Z\|_{\infty})}\geq 1.

While the upper bound Cn,dnC_{n,d}\leq\sqrt{n} is too loose to conclude the proof, it is nonetheless an important first step towards a tighter bound. For the moment, assume that there exists an absolute constant K1K\geq 1 such that

ΔnBn[(1+2K2)Cn1,d+1]n2.\displaystyle\Delta_{n}\leq B_{n}\left[\left(1+2K^{2}\right)\sqrt{C_{n-1,d}}+1\right]\quad{}\forall n\geq 2. (25)

Then, by construction of Cn,dC_{n,d},

C1,d=1,Cn,d[(1+2K2)Cn1,d+1]nn2.\displaystyle C_{1,d}=1,\quad{}C_{n,d}\leq\left[\left(1+2K^{2}\right)\sqrt{C_{n-1,d}}+1\right]\wedge\sqrt{n}\quad{}\forall n\geq 2. (26)

We shall now show that this difference inequality implies that supd1supn1Cn,d<\sup_{d\geq 1}\sup_{n\geq 1}C_{n,d}<\infty independent of the distribution of the XiX_{i}’s: Define the map xF(x):=(1+2K2)x+1x\mapsto F(x):=(1+2K^{2})\sqrt{x}+1 and consider the nonlinear first-order difference equation

x1=1,xn=F(xn1)n2.\displaystyle x_{1}=1,\quad{}x_{n}=F(x_{n-1})\quad{}\forall n\geq 2.

We easily verify that the fixed point x>0x^{*}>0 solving x=F(x)x=F(x) satisfies

x=12(4K4+4K2+(2K2+1)2(4K4+4K2+5)+3),\displaystyle x^{*}=\frac{1}{2}\left(4K^{4}+4K^{2}+\sqrt{(2K^{2}+1)^{2}(4K^{4}+4K^{2}+5)}+3\right),

and that F(x)>xF(x)>x for all x(0,x)x\in(0,x^{*}) and F(x)<xF(x)<x for all x(x,)x\in(x^{*},\infty). We also notice that FF is monotone increasing on +\mathbb{R}_{+}. Thus,

[(FF)(x)x](xx)<0x+{x}.\displaystyle[(F\circ F)(x)-x](x-x^{*})<0\quad{}\quad{}\forall x\in\mathbb{R}_{+}\setminus\{x^{*}\}.

Hence, by Theorem 2.1.2 in Sedaghat (2003) every trajectory {Fn(x1)}n1\{F^{n}(x_{1})\}_{n\geq 1} with x1+x_{1}\in\mathbb{R}_{+} converges to xx^{*}. In particular, limnxn=limnFn(1)=x\lim_{n\rightarrow\infty}x_{n}=\lim_{n\rightarrow\infty}F^{n}(1)=x^{*}. Returning to the inequality (26) we conclude that there exists N02N_{0}\geq 2 such that Cn,dx+1C_{n,d}\leq x^{*}+1 for all n>N0n>N_{0} and all d1d\geq 1. Since Cn,dnC_{n,d}\leq\sqrt{n} for all n1n\geq 1, it follows that Cn,d(x+1)N0<C_{n,d}\leq(x^{*}+1)\vee\sqrt{N_{0}}<\infty for all n1n\geq 1 and d1d\geq 1.
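The convergence of the recursion to the fixed point x* can be verified numerically. The sketch below uses the arbitrary choice K = 1; the closed form for x* is an equivalent rewriting of the displayed expression, obtained by solving y² − ay − 1 = 0 for y = √x* with a = 1 + 2K².

```python
import math

K = 1.0
a = 1 + 2 * K**2                       # F(x) = a * sqrt(x) + 1
F = lambda x: a * math.sqrt(x) + 1.0

# Closed-form fixed point: with y = sqrt(x*), y^2 - a*y - 1 = 0
x_star = ((a + math.sqrt(a**2 + 4)) / 2) ** 2

x = 1.0                                # x_1 = 1
for _ in range(100):
    x = F(x)                           # iterate x_n = F(x_{n-1})

assert abs(F(x_star) - x_star) < 1e-9  # x_star is indeed a fixed point
assert abs(x - x_star) < 1e-9          # the trajectory converges to x_star
```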

To complete the proof of the theorem, it remains to show that eq. (25) holds. Let ss\in\mathbb{R}, t,λ0t,\lambda\geq 0 be arbitrary and hs,λh_{s,\lambda} be the map from Lemma 1. Define xh(x):=hs,λ(x)x\mapsto h(x):=h_{s,\lambda}(\|x\|_{\infty}). By Lemma 6 and Lemma 1 (ii),

Δnsups,t0|E[Pth(Sn)h(Z)]|+23λVar(Z)+λ2/12,\displaystyle\Delta_{n}\leq\sup_{s\in\mathbb{R},t\geq 0}\big{|}\mathrm{E}[P_{t}h(S_{n})-h(Z)]\big{|}+\frac{2\sqrt{3}\lambda}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})+\lambda^{2}/12}}, (27)

where PthP_{t}h denotes the Ornstein-Uhlenbeck semi-group with stationary measure N(0,Σ)N(0,\Sigma), i.e.

Pth(x):=E[h(etx+1e2tZ)]xd.\displaystyle P_{t}h(x):=\mathrm{E}\left[h\left(e^{-t}x+\sqrt{1-e^{-2t}}Z\right)\right]\quad{}\quad{}\forall x\in\mathbb{R}^{d}.

Since xPth(x)E[h(Z)]x\mapsto P_{t}h(x)-\mathrm{E}[h(Z)] is Lipschitz continuous (with constant λ1et\lambda^{-1}e^{-t}) and Σ\Sigma positive definite, Proposition 4.3.2 in Nourdin and Peccati (2012) implies that

E[Pth(Sn)h(Z)]=E[tr(ΣD2Gh(Sn))SnDGh(Sn)],\displaystyle\mathrm{E}\left[P_{t}h(S_{n})-h(Z)\right]=\mathrm{E}\left[\mathrm{tr}\left(\Sigma D^{2}G_{h}(S_{n})\right)-S_{n}^{\prime}DG_{h}(S_{n})\right], (28)

where GhC(d)G_{h}\in C^{\infty}(\mathbb{R}^{d}) and

Gh(x):=0(E[h(Z)]PuPth(x))𝑑uxd.\displaystyle G_{h}(x):=\int_{0}^{\infty}\Big{(}\mathrm{E}[h(Z)]-P_{u}P_{t}h(x)\Big{)}du\quad{}\quad{}\forall x\in\mathbb{R}^{d}.

Since PuPtf=Pu+tfP_{u}P_{t}f=P_{u+t}f almost surely for all integrable maps ff (semi-group property!), we have

Gh(x)=tE[h(Z)h(eux+1e2uZ)]𝑑u.\displaystyle G_{h}(x)=\int_{t}^{\infty}\mathrm{E}\left[h(Z)-h\left(e^{-u}x+\sqrt{1-e^{-2u}}Z\right)\right]du.
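The semi-group property used here reduces, coordinate-wise, to the elementary variance identity e^{-2u}(1 - e^{-2t}) + (1 - e^{-2u}) = 1 - e^{-2(t+u)} for the Gaussian noise; a two-line numerical check (u and t below are arbitrary illustrative time points):

```python
import math

u, t = 0.7, 1.3  # arbitrary illustrative time points
# Composing two OU steps, e^{-u}(e^{-t} x + sqrt(1-e^{-2t}) Z1) + sqrt(1-e^{-2u}) Z2,
# gives mean part e^{-(t+u)} x and Gaussian noise of total variance:
var_composed = math.exp(-2 * u) * (1 - math.exp(-2 * t)) + (1 - math.exp(-2 * u))
var_direct = 1 - math.exp(-2 * (t + u))  # noise variance of a single P_{t+u} step
print(abs(var_composed - var_direct) < 1e-12)  # True
```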

We proceed by re-writing eq. (28) in multi-index notation as

|E[Pth(Sn)h(Z)]|\displaystyle\big{|}\mathrm{E}[P_{t}h(S_{n})-h(Z)]\big{|} =|E[tr(ΣD2Gh(Sn))SnDGh(Sn)]|\displaystyle=\left|\mathrm{E}\left[\mathrm{tr}\left(\Sigma D^{2}G_{h}(S_{n})\right)-S_{n}^{\prime}DG_{h}(S_{n})\right]\right|
=|i=1nE[1ntr(X~iX~iD2Gh(Sn))XinDGh(Sn)]|\displaystyle=\left|\sum_{i=1}^{n}\mathrm{E}\left[\frac{1}{n}\mathrm{tr}\left(\widetilde{X}_{i}\widetilde{X}_{i}D^{2}G_{h}(S_{n})\right)-\frac{X_{i}^{\prime}}{\sqrt{n}}DG_{h}(S_{n})\right]\right|
=|i=1nE[|α|=2DαGh(Sn)(X~in)α|α|=1DαGh(Sn)(Xin)α]|,\displaystyle=\left|\sum_{i=1}^{n}\mathrm{E}\left[\sum_{|\alpha|=2}D^{\alpha}G_{h}(S_{n})\left(\frac{\widetilde{X}_{i}}{\sqrt{n}}\right)^{\alpha}-\sum_{|\alpha|=1}D^{\alpha}G_{h}\left(S_{n}\right)\left(\frac{X_{i}}{\sqrt{n}}\right)^{\alpha}\right]\right|,

where the X~i\widetilde{X}_{i}’s are independent copies of the XiX_{i}’s. A Taylor expansion around Sni:=Snn1/2XiS_{n}^{i}:=S_{n}-n^{-1/2}X_{i} with exact integral remainder term yields

|E[Pth(Sn)h(Z)]|\displaystyle\big{|}\mathrm{E}[P_{t}h(S_{n})-h(Z)]\big{|}
=|E[i=1n|α|=2DαGh(Sni)(X~in)α]+E[i=1n|α|=2|β|=1Dα+βGh(Sni+θXin)(X~in)α(Xin)β]\displaystyle\quad{}=\left|\mathrm{E}\left[\sum_{i=1}^{n}\sum_{|\alpha|=2}D^{\alpha}G_{h}(S_{n}^{i})\left(\frac{\widetilde{X}_{i}}{\sqrt{n}}\right)^{\alpha}\right]+\mathrm{E}\left[\sum_{i=1}^{n}\sum_{|\alpha|=2}\sum_{|\beta|=1}D^{\alpha+\beta}G_{h}\left(S_{n}^{i}+\theta\frac{X_{i}}{\sqrt{n}}\right)\left(\frac{\widetilde{X}_{i}}{\sqrt{n}}\right)^{\alpha}\left(\frac{X_{i}}{\sqrt{n}}\right)^{\beta}\right]\right.
E[i=1n|α|=1DαGh(Sni)(Xin)α]E[i=1n|α|=1|β|=1Dα+βGh(Sni)(Xin)α+β]\displaystyle\quad{}\left.\quad{}-\mathrm{E}\left[\sum_{i=1}^{n}\sum_{|\alpha|=1}D^{\alpha}G_{h}\left(S_{n}^{i}\right)\left(\frac{X_{i}}{\sqrt{n}}\right)^{\alpha}\right]-\mathrm{E}\left[\sum_{i=1}^{n}\sum_{|\alpha|=1}\sum_{|\beta|=1}D^{\alpha+\beta}G_{h}(S_{n}^{i})\left(\frac{X_{i}}{\sqrt{n}}\right)^{\alpha+\beta}\right]\right.
E[i=1n|α|=1|β|=2Dα+βGh(Sni+θXin)(Xin)α+β2(1θ)β!]|\displaystyle\quad{}\left.\quad{}-\mathrm{E}\left[\sum_{i=1}^{n}\sum_{|\alpha|=1}\sum_{|\beta|=2}D^{\alpha+\beta}G_{h}\left(S_{n}^{i}+\theta\frac{X_{i}}{\sqrt{n}}\right)\left(\frac{X_{i}}{\sqrt{n}}\right)^{\alpha+\beta}\frac{2(1-\theta)}{\beta!}\right]\right|
=|E[i=1n|α|=2|β|=1Dα+βGh(Sni+θXin){(X~in)α(Xin)β(Xin)α+β2(1θ)α!}]|,\displaystyle\quad{}=\left|\mathrm{E}\left[\sum_{i=1}^{n}\sum_{|\alpha|=2}\sum_{|\beta|=1}D^{\alpha+\beta}G_{h}\left(S_{n}^{i}+\theta\frac{X_{i}}{\sqrt{n}}\right)\left\{\left(\frac{\widetilde{X}_{i}}{\sqrt{n}}\right)^{\alpha}\left(\frac{X_{i}}{\sqrt{n}}\right)^{\beta}-\left(\frac{X_{i}}{\sqrt{n}}\right)^{\alpha+\beta}\frac{2(1-\theta)}{\alpha!}\right\}\right]\right|, (29)

where θUnif(0,1)\theta\sim Unif(0,1) is independent of the XiX_{i}’s and X~i\widetilde{X}_{i}’s. The first and fourth terms cancel out and the third term vanishes because SniS_{n}^{i} and XiX_{i} are independent and mean zero. (Eq. (29) is essentially a re-statement of Lemmas 2.9 and 2.4 in Götze (1991) and Raič (2019), respectively.)

Notice that GhC(d)G_{h}\in C^{\infty}(\mathbb{R}^{d}) because it is the convolution of a bounded Lipschitz map with the density of N(0,Σ)N(0,\Sigma) (e.g. Nourdin and Peccati, 2012, Proposition 4.2.2). Derivatives of GhG_{h} are usually obtained by differentiating the density of N(0,Σ)N(0,\Sigma) (e.g. Götze, 1991, Lemma 2.1; Raič, 2019, Lemmas 2.5 and 2.6; Fang and Koike, 2021, Lemma 2.2; and Chernozhukov et al., 2020, Lemmas 6.1, 6.2, 6.3). Here, we proceed differently. Let 1j,k,d1\leq j,k,\ell\leq d be the indices corresponding to the multi-indices |α|=2|\alpha|=2 and |β|=1|\beta|=1. By Lemma 2 (i) we have

Dα+βGh(Sni+θXin)\displaystyle D^{\alpha+\beta}G_{h}\left(S_{n}^{i}+\theta\frac{X_{i}}{\sqrt{n}}\right) =3Ghxjxkx(Sni+θXin)\displaystyle=\frac{\partial^{3}G_{h}}{\partial x_{j}\partial x_{k}\partial x_{\ell}}\left(S_{n}^{i}+\theta\frac{X_{i}}{\sqrt{n}}\right)
=te3uEZ[3hxjxkx(eux+1e2uZ)|x=Sni+θXin]𝑑u,\displaystyle=-\int_{t}^{\infty}e^{-3u}\mathrm{E}_{Z}\left[\frac{\partial^{3}h}{\partial x_{j}\partial x_{k}\partial x_{\ell}}\left(e^{-u}x+\sqrt{1-e^{-2u}}Z\right)\Big{|}_{x=S_{n}^{i}+\theta\frac{X_{i}}{\sqrt{n}}}\right]du,

where EZ\mathrm{E}_{Z} denotes the expectation taken with respect to the law of ZZ only. And by Lemma 2 (ii),

j,k,e3uEZ[|3hxjxkx(eux+1e2uZ)|x=Sni+θXin|]\displaystyle\sum_{j,k,\ell}e^{-3u}\mathrm{E}_{Z}\left[\left|\frac{\partial^{3}h}{\partial x_{j}\partial x_{k}\partial x_{\ell}}\left(e^{-u}x+\sqrt{1-e^{-2u}}Z\right)\Big{|}_{x=S_{n}^{i}+\theta\frac{X_{i}}{\sqrt{n}}}\right|\right]
C3λ3e3uj=1dEZ[𝟏[s,s+3λ](Vi)𝟏{|Vij||Vim|,m}],\displaystyle\quad{}\leq C_{3}\lambda^{-3}e^{-3u}\sum_{j=1}^{d}\mathrm{E}_{Z}\left[\mathbf{1}_{[s,s+3\lambda]}\left(\left\|V_{i}\right\|_{\infty}\right)\mathbf{1}\left\{\left|V_{ij}\right|\geq\left|V_{im}\right|,\>\forall m\right\}\right],

where

Vi:=eu(Sni+θXin)+1e2uZ,1in.\displaystyle V_{i}:=e^{-u}\left(S_{n}^{i}+\theta\frac{X_{i}}{\sqrt{n}}\right)+\sqrt{1-e^{-2u}}Z,\quad\quad 1\leq i\leq n.

Let αk,{1,2}\alpha_{k,\ell}\in\{1,2\} be the value of α!\alpha! for |α|=2|\alpha|=2 corresponding to the indices k,k,\ell. An application of Hölder’s inequality yields

e3un3/2i=1nj,k,EZ[|3hxjxkx(eux+1e2uZ)|x=Sni+θXin|]|X~ijX~ikXiXijXikXi2(1θ)αk,|\displaystyle\frac{e^{-3u}}{n^{3/2}}\sum_{i=1}^{n}\sum_{j,k,\ell}\mathrm{E}_{Z}\left[\left|\frac{\partial^{3}h}{\partial x_{j}\partial x_{k}\partial x_{\ell}}\left(e^{-u}x+\sqrt{1-e^{-2u}}Z\right)\Big{|}_{x=S_{n}^{i}+\theta\frac{X_{i}}{\sqrt{n}}}\right|\right]\left|\widetilde{X}_{ij}\widetilde{X}_{ik}X_{i\ell}-X_{ij}X_{ik}X_{i\ell}\frac{2(1-\theta)}{\alpha_{k,\ell}}\right|
C3e3un3/2λ3EZ[i=1nj=1d𝟏{|Vij||Vim|,mj}𝟏[s,s+3λ](Vi)|X~ij2XijXij32(1θ)αj,j|]\displaystyle\leq\frac{C_{3}e^{-3u}}{n^{3/2}\lambda^{3}}\mathrm{E}_{Z}\left[\sum_{i=1}^{n}\sum_{j=1}^{d}\mathbf{1}\left\{\left|V_{ij}\right|\geq\left|V_{im}\right|,\>m\neq j\right\}\mathbf{1}_{[s,s+3\lambda]}\left(\left\|V_{i}\right\|_{\infty}\right)\left|\widetilde{X}_{ij}^{2}X_{ij}-X_{ij}^{3}\frac{2(1-\theta)}{\alpha_{j,j}}\right|\right]
C3e3un3/2λ3EZ[i=1n𝟏[s,s+3λ](Vi)(j=1d𝟏{|Vij||Vim|,mj})max1jd(X~ij2|Xij|+|Xij|3)]\displaystyle\leq\frac{C_{3}e^{-3u}}{n^{3/2}\lambda^{3}}\mathrm{E}_{Z}\left[\sum_{i=1}^{n}\mathbf{1}_{[s,s+3\lambda]}\left(\left\|V_{i}\right\|_{\infty}\right)\left(\sum_{j=1}^{d}\mathbf{1}\left\{\left|V_{ij}\right|\geq\left|V_{im}\right|,\>m\neq j\right\}\right)\max_{1\leq j\leq d}\left(\widetilde{X}_{ij}^{2}|X_{ij}|+|X_{ij}|^{3}\right)\right]
=C3e3un3/2λ3EZ[i=1n𝟏[s,s+3λ](Vi)(X~i2Xi+Xi3)],\displaystyle=\frac{C_{3}e^{-3u}}{n^{3/2}\lambda^{3}}\mathrm{E}_{Z}\left[\sum_{i=1}^{n}\mathbf{1}_{[s,s+3\lambda]}\left(\left\|V_{i}\right\|_{\infty}\right)\left(\|\widetilde{X}_{i}\|_{\infty}^{2}\|X_{i}\|_{\infty}+\|X_{i}\|_{\infty}^{3}\right)\right], (30)

where in the last line we have used that j=1d𝟏{|Vij||Vim|,m}=1\sum_{j=1}^{d}\mathbf{1}\left\{\left|V_{ij}\right|\geq\left|V_{im}\right|,\>\forall m\right\}=1 almost surely because |Corr(Zj,Zk)|=|Corr(X1j,X1k)|<1|Corr(Z_{j},Z_{k})|=|Corr(X_{1j},X_{1k})|<1 for all jkj\neq k and 1in1\leq i\leq n (since Σ\Sigma is positive definite, no pair of entries in XiX_{i} and ZZ can be perfectly (positively or negatively) correlated!). Taking expectation with respect to the XiX_{i}’s and θ\theta over the above inequality, we obtain

C3e3un3/2λ3E[i=1n𝟏[s,s+3λ](Vi)(X~i2Xi+Xi3)]\displaystyle\frac{C_{3}e^{-3u}}{n^{3/2}\lambda^{3}}\mathrm{E}\left[\sum_{i=1}^{n}\mathbf{1}_{[s,s+3\lambda]}\left(\left\|V_{i}\right\|_{\infty}\right)\left(\|\widetilde{X}_{i}\|_{\infty}^{2}\|X_{i}\|_{\infty}+\|X_{i}\|_{\infty}^{3}\right)\right]
=C3e3un3/2λ3E[i=1nE[𝟏[s,s+3λ](Vi)Xi,θ](X~i2Xi+Xi3)]\displaystyle\quad{}=\frac{C_{3}e^{-3u}}{n^{3/2}\lambda^{3}}\mathrm{E}\left[\sum_{i=1}^{n}\mathrm{E}\left[\mathbf{1}_{[s,s+3\lambda]}\left(\left\|V_{i}\right\|_{\infty}\right)\mid X_{i},\theta\right]\left(\|\widetilde{X}_{i}\|_{\infty}^{2}\|X_{i}\|_{\infty}+\|X_{i}\|_{\infty}^{3}\right)\right]
=C3e3un3/2λ3E[i=1n{sVis+3λXi,θ}(X~i2Xi+Xi3)].\displaystyle\quad{}=\frac{C_{3}e^{-3u}}{n^{3/2}\lambda^{3}}\mathrm{E}\left[\sum_{i=1}^{n}\mathbb{P}\left\{s\leq\|V_{i}\|_{\infty}\leq s+3\lambda\mid X_{i},\theta\right\}\left(\|\widetilde{X}_{i}\|_{\infty}^{2}\|X_{i}\|_{\infty}+\|X_{i}\|_{\infty}^{3}\right)\right]. (31)

Notice that E[Sni]=0\mathrm{E}[S_{n}^{i}]=0 and E[Sni(Sni)]=n1nE[SnSn]\mathrm{E}[S_{n}^{i}(S_{n}^{i})^{\prime}]=\frac{n-1}{n}\mathrm{E}[S_{n}S_{n}^{\prime}], and Z=𝑑n1nZ+1nZ~Z\overset{d}{=}\sqrt{\frac{n-1}{n}}Z+\frac{1}{\sqrt{n}}\widetilde{Z}, where Z~\widetilde{Z} is an independent copy of ZN(0,Σ)Z\sim N(0,\Sigma). Set Zn:=n1nZZ_{n}:=\sqrt{\frac{n-1}{n}}Z and bound the probability in line (31) by

{sVis+3λXi,θ}\displaystyle\mathbb{P}\left\{s\leq\|V_{i}\|_{\infty}\leq s+3\lambda\mid X_{i},\theta\right\}
={Vi>sXi,θ}{Vi>s+3λXi,θ}\displaystyle\quad{}=\mathbb{P}\left\{\|V_{i}\|_{\infty}>s\mid X_{i},\theta\right\}-\mathbb{P}\left\{\|V_{i}\|_{\infty}>s+3\lambda\mid X_{i},\theta\right\}
{euSni+1e2uZn+1e2un1/2Z~+eun1/2θXi>sXi,θ}\displaystyle\quad{}\leq\mathbb{P}\left\{\|e^{-u}S_{n}^{i}+\sqrt{1-e^{-2u}}Z_{n}\|_{\infty}+\sqrt{1-e^{-2u}}n^{-1/2}\|\widetilde{Z}\|_{\infty}+e^{-u}n^{-1/2}\|\theta X_{i}\|_{\infty}>s\mid X_{i},\theta\right\}
{euSni+1e2uZn1e2un1/2Z~eun1/2θXi>s+3λXi,θ}\displaystyle\quad{}\quad{}-\mathbb{P}\left\{\|e^{-u}S_{n}^{i}+\sqrt{1-e^{-2u}}Z_{n}\|_{\infty}-\sqrt{1-e^{-2u}}n^{-1/2}\|\widetilde{Z}\|_{\infty}-e^{-u}n^{-1/2}\|\theta X_{i}\|_{\infty}>s+3\lambda\mid X_{i},\theta\right\}
={euSni+1e2uZn+1e2un1/2Z~+eun1/2θXi>sXi,θ}\displaystyle\quad{}=\mathbb{P}\left\{\|e^{-u}S_{n}^{i}+\sqrt{1-e^{-2u}}Z_{n}\|_{\infty}+\sqrt{1-e^{-2u}}n^{-1/2}\|\widetilde{Z}\|_{\infty}+e^{-u}n^{-1/2}\|\theta X_{i}\|_{\infty}>s\mid X_{i},\theta\right\}
±{Zn+1e2un1/2Z~+eun1/2θXi>sXi,θ}\displaystyle\quad{}\quad{}\pm\mathbb{P}\left\{\|Z_{n}\|_{\infty}+\sqrt{1-e^{-2u}}n^{-1/2}\|\widetilde{Z}\|_{\infty}+e^{-u}n^{-1/2}\|\theta X_{i}\|_{\infty}>s\mid X_{i},\theta\right\}
±{Zn1e2un1/2Z~eun1/2θXi>s+3λXi,θ}\displaystyle\quad{}\quad{}\pm\mathbb{P}\left\{\|Z_{n}\|_{\infty}-\sqrt{1-e^{-2u}}n^{-1/2}\|\widetilde{Z}\|_{\infty}-e^{-u}n^{-1/2}\|\theta X_{i}\|_{\infty}>s+3\lambda\mid X_{i},\theta\right\}
{euSni+1e2uZn1e2un1/2Z~eun1/2θXi>s+3λXi,θ}\displaystyle\quad{}\quad{}-\mathbb{P}\left\{\|e^{-u}S_{n}^{i}+\sqrt{1-e^{-2u}}Z_{n}\|_{\infty}-\sqrt{1-e^{-2u}}n^{-1/2}\|\widetilde{Z}\|_{\infty}-e^{-u}n^{-1/2}\|\theta X_{i}\|_{\infty}>s+3\lambda\mid X_{i},\theta\right\}
2Δn1+{s1e2un1/2Z~eun1/2θXiZns+3λ+1e2un1/2Z~+eun1/2θXiXi,θ},\displaystyle\begin{split}&\quad{}\leq 2\Delta_{n-1}+\mathbb{P}\left\{s-\sqrt{1-e^{-2u}}n^{-1/2}\|\widetilde{Z}\|_{\infty}-e^{-u}n^{-1/2}\|\theta X_{i}\|_{\infty}\leq\|Z_{n}\|_{\infty}\leq s+3\lambda\right.\\ &\left.\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}+\sqrt{1-e^{-2u}}n^{-1/2}\|\widetilde{Z}\|_{\infty}+e^{-u}n^{-1/2}\|\theta X_{i}\|_{\infty}\mid X_{i},\theta\right\},\end{split} (32)

where we have used that under the i.i.d. assumption

Δn1\displaystyle\Delta_{n-1} sups,t0|{etSni+1e2tZns}{Zns}|.\displaystyle\equiv\sup_{s\in\mathbb{R},t\geq 0}\Big{|}\mathbb{P}\left\{\|e^{-t}S_{n}^{i}+\sqrt{1-e^{-2t}}Z_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|Z_{n}\|_{\infty}\leq s\right\}\Big{|}.
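The distributional identity Z =_d sqrt((n-1)/n) Z + n^{-1/2} Z~ invoked above holds because the covariances of the two independent Gaussian summands add up to Σ; a minimal numerical sketch (the matrix Sigma below is an arbitrary positive definite example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)  # an arbitrary positive definite covariance matrix

# Z and Ztilde are independent N(0, Sigma), so the covariance of
# sqrt((n-1)/n) * Z + (1/sqrt(n)) * Ztilde is the sum of the scaled covariances:
Sigma_sum = (n - 1) / n * Sigma + (1 / n) * Sigma
print(np.allclose(Sigma_sum, Sigma))  # True: both sides have the same N(0, Sigma) law
```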

By Lemma 6, Lemma 1 (ii), and monotonicity and concavity of the map xx/a+x2x\mapsto x/\sqrt{a+x^{2}}, a>0a>0, we have, for arbitrary M>0M>0,

{s1e2un1/2Z~eun1/2θXiZns+3λ\displaystyle\mathbb{P}\left\{s-\sqrt{1-e^{-2u}}n^{-1/2}\|\widetilde{Z}\|_{\infty}-e^{-u}n^{-1/2}\|\theta X_{i}\|_{\infty}\leq\|Z_{n}\|_{\infty}\leq s+3\lambda\right.
+1e2un1/2Z~+eun1/2θXiXi,θ}\displaystyle\left.\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}+\sqrt{1-e^{-2u}}n^{-1/2}\|\widetilde{Z}\|_{\infty}+e^{-u}n^{-1/2}\|\theta X_{i}\|_{\infty}\mid X_{i},\theta\right\}
63λ+231e2un1/2E[Z~]+43eun1/2θXiVar(Z)+(1e2u)n1E[Z~]2/12+3λ2/4+e2un1θXi2/3\displaystyle\quad{}\leq\frac{6\sqrt{3}\lambda+2\sqrt{3}\sqrt{1-e^{-2u}}n^{-1/2}\mathrm{E}[\|\widetilde{Z}\|_{\infty}]+4\sqrt{3}e^{-u}n^{-1/2}\|\theta X_{i}\|_{\infty}}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})+(1-e^{-2u})n^{-1}\mathrm{E}[\|\widetilde{Z}\|_{\infty}]^{2}/12+3\lambda^{2}/4+e^{-2u}n^{-1}\|\theta X_{i}\|_{\infty}^{2}/3}}
43n1/2MVar(Z)+n1M2/3𝟏{XiM}+12𝟏{Xi>M}+63λVar(Z)+3λ2/4+23n1/2E[Z]Var(Z)+n1E[Z]2/12.\displaystyle\begin{split}&\quad{}\leq\frac{4\sqrt{3}n^{-1/2}M}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})+n^{-1}M^{2}/3}}\mathbf{1}\{\|X_{i}\|_{\infty}\leq M\}+12\cdot\mathbf{1}\{\|X_{i}\|_{\infty}>M\}\\ &\quad{}\quad{}+\frac{6\sqrt{3}\lambda}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})+3\lambda^{2}/4}}+\frac{2\sqrt{3}n^{-1/2}\mathrm{E}[\|Z\|_{\infty}]}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})+n^{-1}\mathrm{E}[\|Z\|_{\infty}]^{2}/12}}.\end{split} (33)

Combine eq. (31)–(33) and integrate over u(t,)u\in(t,\infty) to conclude via eq. (27)–(29) and the i.i.d. assumption that there exists an absolute constant K1K\geq 1 such that

ΔnKλVar(Z)+Kn1/2λ3E[X3]Δn1+Kn1/2λ2E[X3]Var(Z)+Knλ3E[X3]E[Z]Var(Z)+Knλ3E[X3]MVar(Z)+Kn1/2λ3E[X3𝟏{X>M}],\displaystyle\begin{split}\Delta_{n}&\leq\frac{K\lambda}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})}}+\frac{K}{n^{1/2}\lambda^{3}}\mathrm{E}\left[\|X\|_{\infty}^{3}\right]\Delta_{n-1}\\ &\quad{}+\frac{K}{n^{1/2}\lambda^{2}}\frac{\mathrm{E}[\|X\|_{\infty}^{3}]}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})}}+\frac{K}{n\lambda^{3}}\frac{\mathrm{E}[\|X\|_{\infty}^{3}]\mathrm{E}[\|Z\|_{\infty}]}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})}}\\ &\quad{}+\frac{K}{n\lambda^{3}}\frac{\mathrm{E}[\|X\|_{\infty}^{3}]M}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})}}+\frac{K}{n^{1/2}\lambda^{3}}\mathrm{E}\left[\|X\|_{\infty}^{3}\mathbf{1}\{\|X\|_{\infty}>M\}\right],\end{split} (34)

where we have used Harris’ association inequality to simplify several summands, i.e.

i=1nE[X~i2]E[Xi𝟏{XiM}]i=1nE[Xi2]E[Xi]i=1nE[Xi3],\displaystyle\sum_{i=1}^{n}\mathrm{E}[\|\widetilde{X}_{i}\|_{\infty}^{2}]\mathrm{E}[\|X_{i}\|_{\infty}\mathbf{1}\{\|X_{i}\|_{\infty}\leq M\}]\leq\sum_{i=1}^{n}\mathrm{E}[\|X_{i}\|_{\infty}^{2}]\mathrm{E}[\|X_{i}\|_{\infty}]\leq\sum_{i=1}^{n}\mathrm{E}\left[\|X_{i}\|_{\infty}^{3}\right],

and

i=1nE[X~i2]E[Xi𝟏{Xi>M}]i=1nE[Xi3𝟏{Xi>M}].\displaystyle\sum_{i=1}^{n}\mathrm{E}[\|\widetilde{X}_{i}\|_{\infty}^{2}]\mathrm{E}\left[\|X_{i}\|_{\infty}\mathbf{1}\{\|X_{i}\|_{\infty}>M\}\right]\leq\sum_{i=1}^{n}\mathrm{E}\left[\|X_{i}\|_{\infty}^{3}\mathbf{1}\{\|X_{i}\|_{\infty}>M\}\right].
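Harris’ association inequality states E[g(W)]E[h(W)] ≤ E[g(W)h(W)] for nondecreasing g and h; for a finite sample this is Chebyshev’s sum inequality and can be checked directly (the half-normal sample below is an illustrative stand-in for the norms of the X_i’s):

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.abs(rng.standard_normal(100_000))  # illustrative proxy for ||X_i||_inf
g, h = w**2, w  # two nondecreasing functions of the same underlying variable
# Harris / Chebyshev association: E[g(W)] E[h(W)] <= E[g(W) h(W)]
print(g.mean() * h.mean() <= (g * h).mean())  # True
```

For similarly ordered samples this inequality holds deterministically, which is exactly why the two displays above can be simplified term by term.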

Observe that eq. (34) holds for arbitrary λ>0\lambda>0. Setting

λ=K(Cn1,dn1)1/6(E[X3])1/3\displaystyle\lambda=K\left(\frac{C_{n-1,d}}{n-1}\right)^{1/6}\left(\mathrm{E}[\|X\|_{\infty}^{3}]\right)^{1/3}

we deduce from eq. (34), the definition of Cn1,dC_{n-1,d}, and KCn1,d1/61KC_{n-1,d}^{1/6}\geq 1 (because K,Cn1,d1K,C_{n-1,d}\geq 1!) that, for all n2n\geq 2,

Δn\displaystyle\Delta_{n} KλVar(Z)+Bn1Cn1,d+Bn\displaystyle\leq\frac{K\lambda}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})}}+B_{n-1}\sqrt{C_{n-1,d}}+B_{n}
Bn[(nn1)1/6K2Cn1,d1/6+(nn1)1/2Cn1,d+1]\displaystyle\leq B_{n}\left[\left(\frac{n}{n-1}\right)^{1/6}K^{2}C_{n-1,d}^{1/6}+\left(\frac{n}{n-1}\right)^{1/2}\sqrt{C_{n-1,d}}+1\right]
Bn[(1+2K2)Cn1,d+1].\displaystyle\leq B_{n}\left[\left(1+2K^{2}\right)\sqrt{C_{n-1,d}}+1\right].

We have thus established eq. (25). This concludes the proof of the proposition in the case of a positive definite covariance matrix.

The case of positive semi-definite Σ𝟎\Sigma\neq\mathbf{0}.

Suppose that Σd×d\Sigma\in\mathbb{R}^{d\times d} is positive semi-definite but not identical to zero.

Take Y,Y1,,YnN(0,Id)Y,Y_{1},\ldots,Y_{n}\sim N(0,I_{d}) and ZN(0,Σ)Z\sim N(0,\Sigma) such that X,X1,,Xn,Y,Y1,,Yn,ZdX,X_{1},\ldots,X_{n},Y,Y_{1},\ldots,Y_{n},Z\in\mathbb{R}^{d} are mutually independent. Let η>0\eta>0 be arbitrary and define Zη:=Z+ηYZ^{\eta}:=Z+\eta Y, Xη:=X+ηYX^{\eta}:=X+\eta Y, and Snη:=n1/2i=1nXiηS_{n}^{\eta}:=n^{-1/2}\sum_{i=1}^{n}X_{i}^{\eta} with Xiη:=Xi+ηYiX_{i}^{\eta}:=X_{i}+\eta Y_{i}. Clearly, ZηN(0,Σ+η2Id)Z^{\eta}\sim N(0,\Sigma+\eta^{2}I_{d}) and the XηiX^{\eta}_{i}’s are i.i.d. with mean zero and positive definite covariance Σ+η2Id\Sigma+\eta^{2}I_{d}. Hence, by the first part of the proof there exists an absolute constant C1C_{*}\geq 1 independent of nn, dd, and the distribution of the XiηX_{i}^{\eta}’s (and hence, independent of η>0\eta>0!) such that for M0M\geq 0 and n1n\geq 1,

ΔnηCBnη,\displaystyle\Delta_{n}^{\eta}\leq C_{*}B_{n}^{\eta}, (35)

where

Δnη:=sups,t0\Delta_{n}^{\eta}:=\sup_{s\in\mathbb{R},t\geq 0}\Big{|}\mathbb{P}\left\{\|e^{-t}S_{n}^{\eta}+\sqrt{1-e^{-2t}}Z^{\eta}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|Z^{\eta}\|_{\infty}\leq s\right\}\Big{|},

and

Bnη:=(E[Xη3])1/3n1/6Var(Zη)+E[Xη3𝟏{Xη>M}]E[Xη3]+12E[Zη]+MnVar(Zη).\displaystyle B_{n}^{\eta}:=\frac{(\mathrm{E}[\|X^{\eta}\|_{\infty}^{3}])^{1/3}}{n^{1/6}\sqrt{\mathrm{Var}(\|Z^{\eta}\|_{\infty})}}+\frac{\mathrm{E}\left[\|X^{\eta}\|_{\infty}^{3}\mathbf{1}\{\|X^{\eta}\|_{\infty}>M\}\right]}{\mathrm{E}\left[\|X^{\eta}\|_{\infty}^{3}\right]}+\frac{12\mathrm{E}[\|Z^{\eta}\|_{\infty}]+M}{\sqrt{n\mathrm{Var}(\|Z^{\eta}\|_{\infty})}}.

At this point, it is tempting to take η0\eta\rightarrow 0 in eq. (35). However, this would yield the desired result only for the case in which the law of Sn\|S_{n}\|_{\infty} is continuous. (Alternatively, we could replace the supremum over s[0,)s\in[0,\infty) by the supremum over s𝒞ns\in\mathcal{C}_{n}, where 𝒞n\mathcal{C}_{n} is the set of continuity points of the law of Sn\|S_{n}\|_{\infty}.) Therefore, we proceed differently. Recall eq. (27), i.e.

Δnsups,t0|E[Pth(Sn)h(Z)]|+23λVar(Z)+λ2/12,\displaystyle\Delta_{n}\leq\sup_{s\in\mathbb{R},t\geq 0}\big{|}\mathrm{E}[P_{t}h(S_{n})-h(Z)]\big{|}+\frac{2\sqrt{3}\lambda}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})+\lambda^{2}/12}},

where PthP_{t}h denotes the Ornstein-Uhlenbeck semi-group with stationary measure N(0,Σ)N(0,\Sigma),

Pth(x):=E[h(etx+1e2tZ)]xd.\displaystyle P_{t}h(x):=\mathrm{E}\left[h\left(e^{-t}x+\sqrt{1-e^{-2t}}Z\right)\right]\quad{}\quad{}\forall x\in\mathbb{R}^{d}.

Let Ptηh(x)P_{t}^{\eta}h(x) be the Ornstein-Uhlenbeck semi-group with stationary measure N(0,Σ+η2Id)N(0,\Sigma+\eta^{2}I_{d}) and expand the above inequality to obtain the following modified version of eq. (27):

Δn\displaystyle\Delta_{n} sups,t0|E[Ptηh(Snη)h(Zη)]|+23λVar(Z)+λ2/12+sups,t0|E[Pth(Snη)Pth(Sn)]|\displaystyle\leq\sup_{s\in\mathbb{R},t\geq 0}\big{|}\mathrm{E}[P_{t}^{\eta}h(S_{n}^{\eta})-h(Z^{\eta})]\big{|}+\frac{2\sqrt{3}\lambda}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})+\lambda^{2}/12}}+\sup_{s\in\mathbb{R},t\geq 0}\big{|}\mathrm{E}[P_{t}h(S_{n}^{\eta})-P_{t}h(S_{n})]\big{|}
+sups,t0|E[h(Zη)h(Z)]|+sups,t0|E[Pth(Snη)Ptηh(Snη)]|\displaystyle\quad{}+\sup_{s\in\mathbb{R},t\geq 0}\big{|}\mathrm{E}[h(Z^{\eta})-h(Z)]\big{|}+\sup_{s\in\mathbb{R},t\geq 0}\big{|}\mathrm{E}[P_{t}h(S_{n}^{\eta})-P_{t}^{\eta}h(S_{n}^{\eta})]\big{|}
(a)sups,t0|E[Ptηh(Snη)h(Zη)]|+23λVar(Z)+λ2/12\displaystyle\overset{(a)}{\leq}\sup_{s\in\mathbb{R},t\geq 0}\big{|}\mathrm{E}[P_{t}^{\eta}h(S_{n}^{\eta})-h(Z^{\eta})]\big{|}+\frac{2\sqrt{3}\lambda}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})+\lambda^{2}/12}}
+ηλE[n1/2i=1nYi]+ηλE[Y]+ηλE[Y]\displaystyle\quad{}+\frac{\eta}{\lambda}\mathrm{E}\left[\big{\|}n^{-1/2}\sum_{i=1}^{n}Y_{i}\big{\|}_{\infty}\right]+\frac{\eta}{\lambda}\mathrm{E}[\|Y\|_{\infty}]+\frac{\eta}{\lambda}\mathrm{E}[\|Y\|_{\infty}]
sups,t0|E[Ptηh(Snη)h(Zη)]|+23λVar(Z)+λ2/12+3ηλ2log2d,\displaystyle\leq\sup_{s\in\mathbb{R},t\geq 0}\big{|}\mathrm{E}[P_{t}^{\eta}h(S_{n}^{\eta})-h(Z^{\eta})]\big{|}+\frac{2\sqrt{3}\lambda}{\sqrt{\mathrm{Var}(\|Z\|_{\infty})+\lambda^{2}/12}}+\frac{3\eta}{\lambda}\sqrt{2\log 2d},

where (a) holds because hh is λ1\lambda^{-1}-Lipschitz and PthP_{t}h is λ1et\lambda^{-1}e^{-t}-Lipschitz w.r.t. the metric induced by the \ell_{\infty}-norm.

Since Σ+η2Id\Sigma+\eta^{2}I_{d} is positive definite we can proceed as in the first part of the proof and arrive at the following modified version of eq. (34): There exists an absolute constant K1K\geq 1 such that

ΔnKλVar(Zη)+Kn1/2λ3E[Xη3]Δn1η+Kn1/2λ2E[Xη3]Var(Zη)+Knλ3E[Xη3]E[Zη]Var(Zη)+Knλ3E[Xη3]MVar(Zη)+Kn1/2λ3E[Xη3𝟏{Xη>M}]+3ηλ2log2d.\displaystyle\begin{split}\Delta_{n}&\leq\frac{K\lambda}{\sqrt{\mathrm{Var}(\|Z^{\eta}\|_{\infty})}}+\frac{K}{n^{1/2}\lambda^{3}}\mathrm{E}\left[\|X^{\eta}\|_{\infty}^{3}\right]\Delta_{n-1}^{\eta}\\ &\quad{}+\frac{K}{n^{1/2}\lambda^{2}}\frac{\mathrm{E}[\|X^{\eta}\|_{\infty}^{3}]}{\sqrt{\mathrm{Var}(\|Z^{\eta}\|_{\infty})}}+\frac{K}{n\lambda^{3}}\frac{\mathrm{E}[\|X^{\eta}\|_{\infty}^{3}]\mathrm{E}[\|Z^{\eta}\|_{\infty}]}{\sqrt{\mathrm{Var}(\|Z^{\eta}\|_{\infty})}}\\ &\quad{}+\frac{K}{n\lambda^{3}}\frac{\mathrm{E}[\|X^{\eta}\|_{\infty}^{3}]M}{\sqrt{\mathrm{Var}(\|Z^{\eta}\|_{\infty})}}+\frac{K}{n^{1/2}\lambda^{3}}\mathrm{E}\left[\|X^{\eta}\|_{\infty}^{3}\mathbf{1}\{\|X^{\eta}\|_{\infty}>M\}\right]\\ &\quad{}+\frac{3\eta}{\lambda}\sqrt{2\log 2d}.\end{split} (36)

Set

λ=K(Cn1)1/6(E[Xη3])1/3\displaystyle\lambda=K\left(\frac{C_{*}}{n-1}\right)^{1/6}\left(\mathrm{E}[\|X^{\eta}\|_{\infty}^{3}]\right)^{1/3}

and combine eq. (35) and eq. (36) to conclude (as in the first part of the proof) that for all n2n\geq 2,

Δn\displaystyle\Delta_{n} KλVar(Zη)+Bηn1C+Bnη+3ηλ2log2d\displaystyle\leq\frac{K\lambda}{\sqrt{\mathrm{Var}(\|Z^{\eta}\|_{\infty})}}+B^{\eta}_{n-1}\sqrt{C_{*}}+B_{n}^{\eta}+\frac{3\eta}{\lambda}\sqrt{2\log 2d}
Bnη[(nn1)1/6K2C1/6+(nn1)1/2C+1]+3ηλ2log2d\displaystyle\leq B_{n}^{\eta}\left[\left(\frac{n}{n-1}\right)^{1/6}K^{2}C_{*}^{1/6}+\left(\frac{n}{n-1}\right)^{1/2}\sqrt{C_{*}}+1\right]+\frac{3\eta}{\lambda}\sqrt{2\log 2d}
Bnη[(1+2K2)C+1]+3ηK2log2d(E[Xη3])1/3(n1C)1/6.\displaystyle\leq B_{n}^{\eta}\left[\left(1+2K^{2}\right)\sqrt{C_{*}}+1\right]+\frac{3\eta}{K}\frac{\sqrt{2\log 2d}}{\left(\mathrm{E}[\|X^{\eta}\|_{\infty}^{3}]\right)^{1/3}}\left(\frac{n-1}{C_{*}}\right)^{1/6}. (37)

Since η>0\eta>0 was arbitrary, we can take η0\eta\rightarrow 0. To complete the proof, we only need to find the limit of the expression on the right-hand side of the above display.

Let (ηk)k1(\eta_{k})_{k\geq 1} be a monotonically decreasing null sequence. For all 0<ηk10<\eta_{k}\leq 1 and 1p31\leq p\leq 3, XηpX+ηYp2p1Xp+2p1Yp\|X^{\eta}\|_{\infty}^{p}\equiv\|X+\eta Y\|_{\infty}^{p}\leq 2^{p-1}\|X\|_{\infty}^{p}+2^{p-1}\|Y\|_{\infty}^{p} a.s. and E[Xp]<\mathrm{E}[\|X\|_{\infty}^{p}]<\infty (otherwise the upper bound in Proposition 1 is trivial!) and E[Yp]<\mathrm{E}[\|Y\|_{\infty}^{p}]<\infty (e.g. van der Vaart and Wellner, 1996, Proposition A.2.4). Hence, (Xηp)k1(\|X^{\eta}\|_{\infty}^{p})_{k\geq 1} is uniformly integrable for p{1,2}p\in\{1,2\}. Since in addition XηpXp\|X^{\eta}\|_{\infty}^{p}\rightarrow\|X\|_{\infty}^{p} a.s., the sequence (Xη)k1(\|X^{\eta}\|_{\infty})_{k\geq 1} converges in L2L^{2}. Arguing similarly, we conclude that the sequence (Zη2)k1(\|Z^{\eta}\|_{\infty}^{2})_{k\geq 1} converges in L2L^{2} as well. Thus,

Bnη(E[X3])1/3n1/6Var(Z)+E[X3𝟏{X>M}]E[X3]+12E[Z]+MnVar(Z)Bnask,\displaystyle B_{n}^{\eta}\rightarrow\frac{(\mathrm{E}[\|X\|_{\infty}^{3}])^{1/3}}{n^{1/6}\sqrt{\mathrm{Var}(\|Z\|_{\infty})}}+\frac{\mathrm{E}\left[\|X\|_{\infty}^{3}\mathbf{1}\{\|X\|_{\infty}>M\}\right]}{\mathrm{E}\left[\|X\|_{\infty}^{3}\right]}+\frac{12\mathrm{E}[\|Z\|_{\infty}]+M}{\sqrt{n\mathrm{Var}(\|Z\|_{\infty})}}\equiv B_{n}\quad{}\mathrm{as}\quad{}k\rightarrow\infty,

and

3ηkK2log2d(E[Xη3])1/3(n1C)1/60ask.\displaystyle\frac{3\eta_{k}}{K}\frac{\sqrt{2\log 2d}}{\left(\mathrm{E}[\|X^{\eta}\|_{\infty}^{3}]\right)^{1/3}}\left(\frac{n-1}{C_{*}}\right)^{1/6}\rightarrow 0\quad{}\mathrm{as}\quad{}k\rightarrow\infty.

Hence, by eq. (37) we have shown that ΔnBn\Delta_{n}\lesssim B_{n} for all M0M\geq 0 and n1n\geq 1. This completes the proof of the proposition. ∎

C.2 Proof of Corollary 2

Proof of Corollary 2.

Define X~=(X(j)/σ(1))j=1d\widetilde{X}=(X^{(j)}/\sigma_{(1)})_{j=1}^{d} and Z~=(Z(j)/σ(1))j=1d\widetilde{Z}=(Z^{(j)}/\sigma_{(1)})_{j=1}^{d} where σ(1)2=min1jdΣjj\sigma_{(1)}^{2}=\min_{1\leq j\leq d}\Sigma_{jj}. Then, by Lemma 7,

Var(Z)(σ(1)1+E[Z~])2.\displaystyle\mathrm{Var}(\|Z\|_{\infty})\gtrsim\left(\frac{\sigma_{(1)}}{1+\mathrm{E}[\|\widetilde{Z}\|_{\infty}]}\right)^{2}. (38)

Moreover, for M,s>0M,s>0 arbitrary,

E[X~3𝟏{X~>M}]E[X~3+s𝟏{X~>M}]MsE[X~3+s]Ms,\displaystyle\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3}\mathbf{1}\{\|\widetilde{X}\|_{\infty}>M\}]\leq\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3+s}\mathbf{1}\{\|\widetilde{X}\|_{\infty}>M\}]M^{-s}\leq\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3+s}]M^{-s},

and, hence, for M3+δ:=E[X~3+δ]1/(3+δ)M_{3+\delta}:=\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3+\delta}]^{1/(3+\delta)} and s=δs=\delta,

E[X~3𝟏{X~>n1/3M3+δ}]E[X~3]1nδ/3E[X~3+δ]M3+δδE[X~3]1nδ/3M3+δ3E[X~3].\displaystyle\frac{\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3}\mathbf{1}\{\|\widetilde{X}\|_{\infty}>n^{1/3}M_{3+\delta}\}]}{\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3}]}\leq\frac{1}{n^{\delta/3}}\frac{\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3+\delta}]}{M_{3+\delta}^{\delta}\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3}]}\leq\frac{1}{n^{\delta/3}}\frac{M_{3+\delta}^{3}}{\mathrm{E}[\|\widetilde{X}\|_{\infty}^{3}]}. (39)
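The Markov-type truncation bound above rests on the pointwise inequality x^3 1{x > M} ≤ M^{-s} x^{3+s}; a small Monte Carlo sketch (the Student-t sample and the choices of M and s below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.abs(rng.standard_t(df=8, size=200_000))  # heavy-tailed proxy for ||X~||_inf
M, s = 2.0, 0.5

lhs = np.mean(x**3 * (x > M))        # truncated third moment
rhs = M**(-s) * np.mean(x**(3 + s))  # Markov-type bound
print(lhs <= rhs)  # True: x^3 * 1{x > M} <= M^{-s} * x^{3+s} holds pointwise
```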

Combine the upper bound in Proposition 1 with eq. (38) and (39) and simplify the expression to conclude the proof of the first claim. (Obviously, this bound is not tight, but it is aesthetically pleasing.) The second claim about equicorrelated coordinates in XX follows from the lower bound on the variance of Z~\|\widetilde{Z}\|_{\infty} in Proposition 4.1.1 in Tanguy (2017) combined with the upper bound in Proposition 1 and eq. (39). ∎

C.3 Proof of Proposition 3

Proof of Proposition 3.

The main proof idea is standard, e.g. similar arguments have been used in proofs by Fang and Koike (2021) (Theorem 1.1) and Chernozhukov et al. (2020) (Theorem 3.2). While our bound is dimension-free, it is not sharp (e.g. Chernozhukov et al., 2020, Proposition 2.1).

The case of positive definite Ω\Omega.

Suppose that Σ\Sigma is positive semi-definite and Ω\Omega is positive definite. To simplify notation, we set

Δ:=sups0|{Ys}{Zs}|.\displaystyle\Delta:=\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|Y\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|Z\|_{\infty}\leq s\right\}\Big{|}.

Moreover, for ss\in\mathbb{R}, λ0\lambda\geq 0 arbitrary denote by hs,λh_{s,\lambda} the map from Lemma 1 and define yh(y):=hs,λ(y)y\mapsto h(y):=h_{s,\lambda}(\|y\|_{\infty}). Then, by Lemma 6 and Lemma 1 (ii),

Δsups|E[h(Y)h(Z)]|+23λVar(Y)Var(Z)+λ2/12.\displaystyle\Delta\leq\sup_{s\in\mathbb{R}}\big{|}\mathrm{E}[h(Y)-h(Z)]\big{|}+\frac{2\sqrt{3}\lambda}{\sqrt{\mathrm{Var}(\|Y\|_{\infty})\vee\mathrm{Var}(\|Z\|_{\infty})+\lambda^{2}/12}}. (40)

Since yh(y)E[h(Z)]y\mapsto h(y)-\mathrm{E}[h(Z)] is Lipschitz continuous (with constant λ1\lambda^{-1}) and Ω\Omega is positive definite, Proposition 4.3.2 in Nourdin and Peccati (2012) implies that

E[h(Y)h(Z)]=E[tr(ΩD2Gh(Y))YDGh(Y)],\displaystyle\mathrm{E}\left[h(Y)-h(Z)\right]=\mathrm{E}\left[\mathrm{tr}\left(\Omega D^{2}G_{h}(Y)\right)-Y^{\prime}DG_{h}(Y)\right], (41)

where GhC(d)G_{h}\in C^{\infty}(\mathbb{R}^{d}) and

Gh(y):=0(E[h(Z)]Pth(y))dtyd,\displaystyle G_{h}(y):=\int_{0}^{\infty}\Big{(}\mathrm{E}[h(Z)]-P_{t}h(y)\Big{)}dt\quad{}\quad{}\forall y\in\mathbb{R}^{d},

and PthP_{t}h denotes the Ornstein-Uhlenbeck semi-group with stationary measure N(0,Ω)N(0,\Omega), i.e.

Pth(y):=E[h(ety+1e2tZ)]yd.\displaystyle P_{t}h(y):=\mathrm{E}\left[h\left(e^{-t}y+\sqrt{1-e^{-2t}}Z\right)\right]\quad{}\quad{}\forall y\in\mathbb{R}^{d}.

Using Stein’s lemma we re-write eq. (41) as

E[h(Y)h(Z)]=E[tr(ΩD2Gh(Y))tr(ΣD2Gh(Y))].\displaystyle\mathrm{E}\left[h(Y)-h(Z)\right]=\mathrm{E}\left[\mathrm{tr}\left(\Omega D^{2}G_{h}(Y)\right)-\mathrm{tr}\left(\Sigma D^{2}G_{h}(Y)\right)\right].

Notice that the above identity holds even if Σ\Sigma is only positive semi-definite (e.g. Nourdin and Peccati, 2012, Lemma 4.1.3)! By Hölder’s inequality for matrix inner products,

E[h(Y)h(Z)](maxj,k|ΩjkΣjk|)(j,kE[|2Ghxjxk(Y)|]).\displaystyle\mathrm{E}\left[h(Y)-h(Z)\right]\leq\left(\max_{j,k}|\Omega_{jk}-\Sigma_{jk}|\right)\left(\sum_{j,k}\mathrm{E}\left[\left|\frac{\partial^{2}G_{h}}{\partial x_{j}\partial x_{k}}(Y)\right|\right]\right). (42)
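Hölder’s inequality for matrix inner products, |tr(ΔH)| ≤ max_{j,k}|Δ_{jk}| · Σ_{j,k}|H_{jk}|, can be illustrated with random symmetric matrices (here Delta plays the role of Ω − Σ and H that of the Hessian of G_h; both are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
Delta = rng.standard_normal((d, d))
Delta = (Delta + Delta.T) / 2  # symmetric, plays the role of Omega - Sigma
H = rng.standard_normal((d, d))
H = (H + H.T) / 2              # symmetric, plays the role of D^2 G_h(Y)

lhs = abs(np.trace(Delta @ H))               # |<Delta, H>| as a matrix inner product
rhs = np.abs(Delta).max() * np.abs(H).sum()  # max-norm times entrywise l1-norm
print(lhs <= rhs)  # True
```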

To complete the proof, we now bound the second derivative D2GhD^{2}G_{h}. By Lemma 2 (i), for arbitrary indices 1j,kd1\leq j,k\leq d,

2Ghyjyk(Y)=0e2tEZ[2hyjyk(ety+1e2tZ)|y=Y]dt,\displaystyle\frac{\partial^{2}G_{h}}{\partial y_{j}\partial y_{k}}(Y)=-\int_{0}^{\infty}e^{-2t}\mathrm{E}_{Z}\left[\frac{\partial^{2}h}{\partial y_{j}\partial y_{k}}\left(e^{-t}y+\sqrt{1-e^{-2t}}Z\right)\Big{|}_{y=Y}\right]dt,

and, hence, by Lemma 2 (ii),

e2tj,kEZ[|2hyjyk(ety+1e2tZ)|y=Y|]\displaystyle e^{-2t}\sum_{j,k}\mathrm{E}_{Z}\left[\left|\frac{\partial^{2}h}{\partial y_{j}\partial y_{k}}\left(e^{-t}y+\sqrt{1-e^{-2t}}Z\right)\Big{|}_{y=Y}\right|\right]
C2λ2e2tEZ[j=1d𝟏[s,s+3λ](Vt)𝟏{|Vjt||Vmt|,mj}],\displaystyle\quad{}\leq C_{2}\lambda^{-2}e^{-2t}\mathrm{E}_{Z}\left[\sum_{j=1}^{d}\mathbf{1}_{[s,s+3\lambda]}\left(\|V^{t}\|_{\infty}\right)\mathbf{1}\left\{|V_{j}^{t}|\geq|V_{m}^{t}|,\>m\neq j\right\}\right],

where

Vt:=etY+1e2tZ.\displaystyle V^{t}:=e^{-t}Y+\sqrt{1-e^{-2t}}Z.

Since Ω\Omega is positive definite, no pair of entries in ZZ can be perfectly (positively or negatively) correlated. Therefore, j=1d𝟏{|Vjt||Vmt|,mj}=1\sum_{j=1}^{d}\mathbf{1}\left\{|V_{j}^{t}|\geq|V_{m}^{t}|,\>m\neq j\right\}=1 almost surely, while 𝟏[s,s+3λ](Vt)1\mathbf{1}_{[s,s+3\lambda]}\left(\|V^{t}\|_{\infty}\right)\leq 1. Hence,

e2tj,kE[|2hyjyk(ety+1e2tZ)|y=Y|]C2λ2e2t.\displaystyle e^{-2t}\sum_{j,k}\mathrm{E}\left[\left|\frac{\partial^{2}h}{\partial y_{j}\partial y_{k}}\left(e^{-t}y+\sqrt{1-e^{-2t}}Z\right)\Big{|}_{y=Y}\right|\right]\leq C_{2}\lambda^{-2}e^{-2t}. (43)

Combine eq. (42)–(43), integrate over t(0,)t\in(0,\infty), and conclude that

E[h(Y)h(Z)]C2λ2(maxj,k|ΩjkΣjk|).\displaystyle\mathrm{E}\left[h(Y)-h(Z)\right]\leq C_{2}\lambda^{-2}\left(\max_{j,k}|\Omega_{jk}-\Sigma_{jk}|\right). (44)

To conclude, combine eq. (40) and eq. (44) and optimize over λ>0\lambda>0.

The case of positive semi-definite Ω𝟎\Omega\neq\mathbf{0}.

Suppose that both Σ\Sigma and Ω\Omega are positive semi-definite. To avoid trivialities, we assume that Ω\Omega is not identical to zero.

Take WN(0,Id)W\sim N(0,I_{d}) independent of ZZ and, for η>0\eta>0 arbitrary, define Zη:=Z+ηWZ^{\eta}:=Z+\eta W. Now, consider eq. (40), i.e.

Δsups|E[h(Y)h(Z)]|+23λVar(Y)Var(Z)+λ2/12.\displaystyle\Delta\leq\sup_{s\in\mathbb{R}}\big{|}\mathrm{E}[h(Y)-h(Z)]\big{|}+\frac{2\sqrt{3}\lambda}{\sqrt{\mathrm{Var}(\|Y\|_{\infty})\vee\mathrm{Var}(\|Z\|_{\infty})+\lambda^{2}/12}}.

Expanding the above inequality yields

Δ\displaystyle\Delta sups|E[h(Y)h(Zη)]|+23λVar(Y)Var(Z)+λ2/12+sups|E[h(Zη)h(Z)]|\displaystyle\leq\sup_{s\in\mathbb{R}}\big{|}\mathrm{E}[h(Y)-h(Z^{\eta})]\big{|}+\frac{2\sqrt{3}\lambda}{\sqrt{\mathrm{Var}(\|Y\|_{\infty})\vee\mathrm{Var}(\|Z\|_{\infty})+\lambda^{2}/12}}+\sup_{s\in\mathbb{R}}\big{|}\mathrm{E}[h(Z^{\eta})-h(Z)]\big{|}
sups|E[h(Y)h(Zη)]|+23λVar(Y)Var(Z)+λ2/12+ηλ2log2d,\displaystyle\leq\sup_{s\in\mathbb{R}}\big{|}\mathrm{E}[h(Y)-h(Z^{\eta})]\big{|}+\frac{2\sqrt{3}\lambda}{\sqrt{\mathrm{Var}(\|Y\|_{\infty})\vee\mathrm{Var}(\|Z\|_{\infty})+\lambda^{2}/12}}+\frac{\eta}{\lambda}\sqrt{2\log 2d}, (45)

where the last inequality holds because hh is λ1\lambda^{-1}-Lipschitz continuous w.r.t. the metric induced by the \ell_{\infty}-norm.

Since ZηN(0,Ω+η2Id)Z^{\eta}\sim N(0,\Omega+\eta^{2}I_{d}) has a positive definite covariance matrix, we can bound the first term on the far right-hand side of the above display using eq. (44), i.e.

sups|E[h(Y)h(Zη)]|C2λ2(maxj,k|ΩjkΣjk+η2𝟏{j=k}|).\displaystyle\sup_{s\in\mathbb{R}}\big{|}\mathrm{E}[h(Y)-h(Z^{\eta})]\big{|}\leq C_{2}\lambda^{-2}\left(\max_{j,k}|\Omega_{jk}-\Sigma_{jk}+\eta^{2}\mathbf{1}\{j=k\}|\right). (46)

Combine eq. (45) and eq. (46) to obtain

Δ\displaystyle\Delta C2λ2(maxj,k|ΩjkΣjk+η2𝟏{j=k}|)+23λVar(Y)Var(Z)+λ2/12\displaystyle\leq C_{2}\lambda^{-2}\left(\max_{j,k}|\Omega_{jk}-\Sigma_{jk}+\eta^{2}\mathbf{1}\{j=k\}|\right)+\frac{2\sqrt{3}\lambda}{\sqrt{\mathrm{Var}(\|Y\|_{\infty})\vee\mathrm{Var}(\|Z\|_{\infty})+\lambda^{2}/12}}
+ηλ2log2d.\displaystyle\quad{}+\frac{\eta}{\lambda}\sqrt{2\log 2d}.

Letting η0\eta\rightarrow 0 and optimizing over λ>0\lambda>0 gives the desired bound on Δ\Delta. ∎
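The regularization step Z^η = Z + ηW is simply a device to reduce the positive semi-definite case to the positive definite one, since Ω + η²I_d is always positive definite. A minimal numerical illustration (the matrix below is a hypothetical example, not from the paper):

```python
import numpy as np

# Hypothetical rank-deficient PSD covariance: the first two coordinates of Z
# are perfectly correlated, so Omega is singular.
Omega = np.array([[1.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 2.0]])
eta = 1e-3

# Z^eta = Z + eta * W with W ~ N(0, I_d) independent of Z has covariance
# Omega + eta^2 * I_d, which is positive definite although Omega is not.
Omega_eta = Omega + eta**2 * np.eye(3)

print(np.linalg.eigvalsh(Omega)[0])      # ~ 0: Omega is singular
print(np.linalg.eigvalsh(Omega_eta)[0])  # ~ eta^2 > 0: Omega_eta is invertible
```

Letting η → 0 at the end then removes the extra η² diagonal term from the bound, exactly as in the proof.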

C.4 Proofs of Propositions 4 and 5

Proof of Proposition 4.

By the triangle inequality,

sups0|{Sns}{Σ^n1/2ZsX1,,Xn}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|\widehat{\Sigma}_{n}^{1/2}Z\|_{\infty}\leq s\mid X_{1},\ldots,X_{n}\right\}\Big{|}
sups0|{Sns}{Σn1/2Zs}|\displaystyle\quad{}\leq\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|\Sigma_{n}^{1/2}Z\|_{\infty}\leq s\right\}\Big{|}
+sups0|{Σn1/2Zs}{Σ^n1/2ZsX1,,Xn}|.\displaystyle\quad{}\quad{}+\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\Sigma_{n}^{1/2}Z\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|\widehat{\Sigma}_{n}^{1/2}Z\|_{\infty}\leq s\mid X_{1},\ldots,X_{n}\right\}\Big{|}.

Now, apply Proposition 1 to the first summand and Proposition 3 to the second summand. This completes the proof. ∎
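Proposition 4 licenses a plug-in procedure: approximate the law of ‖S_n‖_∞ by the conditional law of ‖Σ̂_n^{1/2}Z‖_∞ given the data. A purely illustrative Monte Carlo sketch of this proxy distribution and its quantile (the sampling law, dimensions, Monte Carlo size, and jitter are assumptions made here for the example, not part of the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, B = 500, 20, 10000  # sample size, dimension, Monte Carlo draws (illustrative)

# A sample X_1, ..., X_n; here i.i.d. N(0, I_d) purely for illustration.
X = rng.multivariate_normal(np.zeros(d), np.eye(d), size=n)
S_n = np.sqrt(n) * X.mean(axis=0)  # the normalized partial sum S_n

# Plug-in Gaussian proxy: conditionally on the data, draw Sigma_hat^{1/2} Z with Z ~ N(0, I_d).
Sigma_hat = np.cov(X, rowvar=False)
L = np.linalg.cholesky(Sigma_hat + 1e-10 * np.eye(d))  # tiny jitter for numerical stability
proxy = np.abs(L @ rng.standard_normal((d, B))).max(axis=0)

# Monte Carlo estimate of the conditional (1 - alpha)-quantile of ||Sigma_hat^{1/2} Z||_inf.
c_hat = np.quantile(proxy, 0.9)
print(c_hat)
```

Proposition 5 then quantifies how accurate tests and confidence sets based on such a quantile `c_hat` are.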

Proof of Proposition 5.

The proof is an adaptation of the proof of Theorem 3.1 in Chernozhukov et al. (2013) to our setup. To simplify notation we write cn(α):=cn(α;Σ^n)c_{n}^{*}(\alpha):=c_{n}(\alpha;\widehat{\Sigma}_{n}) and cn(α):=cn(α;Σ)c_{n}(\alpha):=c_{n}(\alpha;\Sigma); see also eq. (23). Note that

supα(0,1)|{Sn+Θcn(α)}α|\displaystyle\sup_{\alpha\in(0,1)}\left|\mathbb{P}\left\{\|S_{n}\|_{\infty}+\Theta\leq c_{n}^{*}(\alpha)\right\}-\alpha\right|
supα(0,1)|{Sn+Θcn(α)}{Sn+Θcn(α)}|+supα(0,1)|{Sn+Θcn(α)}{Σ1/2Zcn(α)}|.\displaystyle\begin{split}&\quad{}\leq\sup_{\alpha\in(0,1)}\left|\mathbb{P}\left\{\|S_{n}\|_{\infty}+\Theta\leq c_{n}^{*}(\alpha)\right\}-\mathbb{P}\left\{\|S_{n}\|_{\infty}+\Theta\leq c_{n}(\alpha)\right\}\right|\\ &\quad{}\quad{}+\sup_{\alpha\in(0,1)}\left|\mathbb{P}\left\{\|S_{n}\|_{\infty}+\Theta\leq c_{n}(\alpha)\right\}-\mathbb{P}\left\{\|\Sigma^{1/2}Z\|_{\infty}\leq c_{n}(\alpha)\right\}\right|.\end{split} (47)

For arbitrary δ>0\delta>0 and η>0\eta>0, the first term can be upper bounded by Lemma 10 as

supα(0,1){cn(απn(δ))<Sn+Θcn(α+πn(δ))}+2{maxj,k|Σ^n,jkΣjk|>δ}\displaystyle\sup_{\alpha\in(0,1)}\mathbb{P}\Big{\{}c_{n}\big{(}\alpha-\pi_{n}(\delta)\big{)}<\|S_{n}\|_{\infty}+\Theta\leq c_{n}\big{(}\alpha+\pi_{n}(\delta)\big{)}\Big{\}}+2\mathbb{P}\left\{\max_{j,k}|\widehat{\Sigma}_{n,jk}-\Sigma_{jk}|>\delta\right\}
supα(0,1){cn(απn(δ))η<Sncn(α+πn(δ))+η}\displaystyle\quad\leq\sup_{\alpha\in(0,1)}\mathbb{P}\Big{\{}c_{n}\big{(}\alpha-\pi_{n}(\delta)\big{)}-\eta<\|S_{n}\|_{\infty}\leq c_{n}\big{(}\alpha+\pi_{n}(\delta)\big{)}+\eta\Big{\}} (48)
+{|Θ|>η}+2{maxj,k|Σ^n,jkΣjk|>δ}\displaystyle\quad\quad+\mathbb{P}\left\{|\Theta|>\eta\right\}+2\mathbb{P}\left\{\max_{j,k}|\widehat{\Sigma}_{n,jk}-\Sigma_{jk}|>\delta\right\}
supα(0,1){cn(απn(δ))η<Σ1/2Zcn(α+πn(δ))+η}+{|Θ|>η}\displaystyle\quad{}\leq\sup_{\alpha\in(0,1)}\mathbb{P}\Big{\{}c_{n}\big{(}\alpha-\pi_{n}(\delta)\big{)}-\eta<\|\Sigma^{1/2}Z\|_{\infty}\leq c_{n}\big{(}\alpha+\pi_{n}(\delta)\big{)}+\eta\Big{\}}+\mathbb{P}\left\{|\Theta|>\eta\right\}
+2sups0|{Sns}{Σ1/2Zs}|+2{maxj,k|Σ^n,jkΣjk|>δ}\displaystyle\quad{}\quad{}+2\sup_{s\geq 0}\left|\mathbb{P}\big{\{}\|S_{n}\|_{\infty}\leq s\big{\}}-\mathbb{P}\big{\{}\|\Sigma^{1/2}Z\|_{\infty}\leq s\big{\}}\right|+2\mathbb{P}\left\{\max_{j,k}|\widehat{\Sigma}_{n,jk}-\Sigma_{jk}|>\delta\right\}
supα(0,1){cn(απn(δ))<Σ1/2Zcn(α+πn(δ))}\displaystyle\quad{}\leq\sup_{\alpha\in(0,1)}\mathbb{P}\Big{\{}c_{n}\big{(}\alpha-\pi_{n}(\delta)\big{)}<\|\Sigma^{1/2}Z\|_{\infty}\leq c_{n}\big{(}\alpha+\pi_{n}(\delta)\big{)}\Big{\}}
+η83Var(Σ1/2Z)+η2/3+{|Θ|>η}\displaystyle\quad{}\quad+\frac{\eta 8\sqrt{3}}{\sqrt{\mathrm{Var}(\|\Sigma^{1/2}Z\|_{\infty})+\eta^{2}/3}}+\mathbb{P}\left\{|\Theta|>\eta\right\}
+2sups0|{Sns}{Σ1/2Zs}|+2{maxj,k|Σ^n,jkΣjk|>δ}\displaystyle\quad{}\quad{}+2\sup_{s\geq 0}\left|\mathbb{P}\big{\{}\|S_{n}\|_{\infty}\leq s\big{\}}-\mathbb{P}\big{\{}\|\Sigma^{1/2}Z\|_{\infty}\leq s\big{\}}\right|+2\mathbb{P}\left\{\max_{j,k}|\widehat{\Sigma}_{n,jk}-\Sigma_{jk}|>\delta\right\}
2πn(δ)+η83Var(Σ1/2Z)+η2/3+{|Θ|>η}+2sups0|{Sns}{Σ1/2Zs}|+2{maxj,k|Σ^n,jkΣjk|>δ},\displaystyle\begin{split}&\quad{}\leq 2\pi_{n}(\delta)+\frac{\eta 8\sqrt{3}}{\sqrt{\mathrm{Var}(\|\Sigma^{1/2}Z\|_{\infty})+\eta^{2}/3}}+\mathbb{P}\left\{|\Theta|>\eta\right\}\\ &\quad{}\quad{}+2\sup_{s\geq 0}\left|\mathbb{P}\big{\{}\|S_{n}\|_{\infty}\leq s\big{\}}-\mathbb{P}\big{\{}\|\Sigma^{1/2}Z\|_{\infty}\leq s\big{\}}\right|+2\mathbb{P}\left\{\max_{j,k}|\widehat{\Sigma}_{n,jk}-\Sigma_{jk}|>\delta\right\},\\ \end{split} (49)

where the second inequality follows from (several applications of) Lemma 6, and the third from the definition of quantiles together with the fact that Σ1/2Z\|\Sigma^{1/2}Z\|_{\infty} has no point masses.

With η>0\eta>0 as above, we now bound the second term on the right-hand side of eq. (47) by

supα(0,1)|{Sn+Θcn(α)}{Sncn(α)}|\displaystyle\sup_{\alpha\in(0,1)}\left|\mathbb{P}\left\{\|S_{n}\|_{\infty}+\Theta\leq c_{n}(\alpha)\right\}-\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq c_{n}(\alpha)\right\}\right|
+sups0|{Sns}{Σ1/2Zs}|\displaystyle\quad{}\quad{}\quad{}+\sup_{s\geq 0}\left|\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|\Sigma^{1/2}Z\|_{\infty}\leq s\right\}\right|
{|Θ|>η}+sups0{sηSns+η}\displaystyle\quad{}\quad{}\lesssim\mathbb{P}\left\{|\Theta|>\eta\right\}+\sup_{s\geq 0}\mathbb{P}\left\{s-\eta\leq\|S_{n}\|_{\infty}\leq s+\eta\right\}
+sups0|{Sns}{Σ1/2Zs}|\displaystyle\quad{}\quad{}\quad{}\quad{}+\sup_{s\geq 0}\left|\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|\Sigma^{1/2}Z\|_{\infty}\leq s\right\}\right|
{|Θ|>η}+sups0{sηΣ1/2Zs+η}\displaystyle\quad{}\quad{}\lesssim\mathbb{P}\left\{|\Theta|>\eta\right\}+\sup_{s\geq 0}\mathbb{P}\left\{s-\eta\leq\|\Sigma^{1/2}Z\|_{\infty}\leq s+\eta\right\}
+sups0|{Sns}{Σ1/2Zs}|\displaystyle\quad{}\quad{}\quad{}\quad{}+\sup_{s\geq 0}\left|\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|\Sigma^{1/2}Z\|_{\infty}\leq s\right\}\right|
{|Θ|>η}+η43Var(Σ1/2Z)+η2/3+sups0|{Sns}{Σ1/2Zs}|,\displaystyle\begin{split}&\quad{}\quad{}\lesssim\mathbb{P}\left\{|\Theta|>\eta\right\}+\frac{\eta 4\sqrt{3}}{\sqrt{\mathrm{Var}(\|\Sigma^{1/2}Z\|_{\infty})+\eta^{2}/3}}\\ &\quad{}\quad{}\quad{}\quad{}+\sup_{s\geq 0}\left|\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|\Sigma^{1/2}Z\|_{\infty}\leq s\right\}\right|,\end{split} (50)

where the third inequality follows from Lemma 6.

Combine eq. (47), (49), and (50) to obtain

supα(0,1)|{Sn+Θcn(α)}α|\displaystyle\sup_{\alpha\in(0,1)}\left|\mathbb{P}\left\{\|S_{n}\|_{\infty}+\Theta\leq c_{n}^{*}(\alpha)\right\}-\alpha\right|
sups0|{Sns}{Σ1/2Zs}|+infδ>0{πn(δ)+{maxj,k|Σ^n,jkΣjk|>δ}}\displaystyle\quad{}\lesssim\sup_{s\geq 0}\left|\mathbb{P}\left\{\|S_{n}\|_{\infty}\leq s\right\}-\mathbb{P}\left\{\|\Sigma^{1/2}Z\|_{\infty}\leq s\right\}\right|+\inf_{\delta>0}\left\{\pi_{n}(\delta)+\mathbb{P}\left\{\max_{j,k}|\widehat{\Sigma}_{n,jk}-\Sigma_{jk}|>\delta\right\}\right\}
+infη>0{ηVar(Σ1/2Z)+{|Θ|>η}}.\displaystyle\quad{}\quad{}+\inf_{\eta>0}\left\{\frac{\eta}{\sqrt{\mathrm{Var}(\|\Sigma^{1/2}Z\|_{\infty})}}+\mathbb{P}\left\{|\Theta|>\eta\right\}\right\}.

To complete the proof, bound the first term on the right-hand side using Proposition 1. ∎

Appendix D Proofs of the results in Section 3

D.1 Proofs of Theorem 1 and Corollary 2

Proof of Theorem 1.

Let δ>0\delta>0 be arbitrary and define rn(δ):=ψn(δ)ϕn(δ)r_{n}(\delta):=\psi_{n}(\delta)\vee\phi_{n}(\delta). Let n,δn\mathcal{H}_{n,\delta}\subset\mathcal{F}_{n} be a δFnP,2\delta\|F_{n}\|_{P,2}-net of n\mathcal{F}_{n} and set n,δ={fg:f,gn,ρ(f,g)<δFnP,2}\mathcal{F}_{n,\delta}^{\prime}=\{f-g:f,g\in\mathcal{F}_{n},\>\rho(f,g)<\delta\|F_{n}\|_{P,2}\}. Since n\mathcal{F}_{n} is totally bounded with respect to ρ\rho, the n,δ\mathcal{H}_{n,\delta}’s are finite. Moreover, for each δ>0\delta>0 there exists a map πδ:nn,δ\pi_{\delta}:\mathcal{F}_{n}\rightarrow\mathcal{H}_{n,\delta} such that ρ(f,πδf)<δFnP,2\rho(f,\pi_{\delta}f)<\delta\|F_{n}\|_{P,2} for all fnf\in\mathcal{F}_{n} and, hence,

|GPn,δGPn|GP(πδid)nGPn,δ,\displaystyle\big{|}\|G_{P}\|_{\mathcal{H}_{n,\delta}}-\|G_{P}\|_{\mathcal{F}_{n}}\big{|}\leq\|G_{P}\circ(\pi_{\delta}-id)\|_{\mathcal{F}_{n}}\leq\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime}},

where the first inequality holds by the reverse triangle inequality and the prelinearity of the Gaussian PP-bridge process (e.g. Dudley, 2014, p. 65, eq. 2.4). (Here and in the following idid stands for the identity map.) Since the map f(PnP)ff\mapsto(P_{n}-P)f is obviously linear, the same inequality holds for the empirical process {𝔾n(f):fn}\{\mathbb{G}_{n}(f):f\in\mathcal{F}_{n}\}.

By Lemma 8 and Lemma 6,

sups0|{𝔾nns}{GPns}|{|𝔾nnGPn|>3rn(δ)Var(GPn)}+63rn(δ).\displaystyle\begin{split}&\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}\Big{|}\\ &\quad{}\quad{}\leq\mathbb{P}\left\{\big{|}\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}-\|G_{P}\|_{\mathcal{F}_{n}}\big{|}>3\sqrt{r_{n}(\delta)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right\}+6\sqrt{3}\sqrt{r_{n}(\delta)}.\end{split}

By the triangle inequality,

{|𝔾nnGPn|>3rn(δ)Var(GPn)}{|𝔾nn,δGPn,δ|>rn(δ)Var(GPn)}+{GPn,δ>rn(δ)Var(GPn)}+{𝔾nn,δ>rn(δ)Var(GPn)},\displaystyle\begin{split}&\mathbb{P}\left\{\big{|}\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}-\|G_{P}\|_{\mathcal{F}_{n}}\big{|}>3\sqrt{r_{n}(\delta)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right\}\\ &\quad{}\quad{}\leq\mathbb{P}\left\{\big{|}\|\mathbb{G}_{n}\|_{\mathcal{H}_{n,\delta}}-\|G_{P}\|_{\mathcal{H}_{n,\delta}}\big{|}>\sqrt{r_{n}(\delta)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right\}\\ &\quad{}\quad{}\quad{}+\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime}}>\sqrt{r_{n}(\delta)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right\}+\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n,\delta}^{\prime}}>\sqrt{r_{n}(\delta)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right\},\end{split}

Since n,δ\mathcal{H}_{n,\delta} is finite, Proposition 1 implies, for all s0s\geq 0,

{𝔾nn,δs}{GPn,δs}+KBn(δ),\displaystyle\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{H}_{n,\delta}}\leq s\right\}\leq\mathbb{P}\left\{\|G_{P}\|_{\mathcal{H}_{n,\delta}}\leq s\right\}+KB_{n}(\delta),

where K>0K>0 is an absolute constant and

Bn(δ):=FnP,3n1/6Var(GPn,δ)+Fn𝟏{Fn>Mn}P,33FnP,33+EGPn,δ+MnnVar(GPn,δ).\displaystyle B_{n}(\delta):=\frac{\|F_{n}\|_{P,3}}{n^{1/6}\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{H}_{n,\delta}})}}+\frac{\|F_{n}\mathbf{1}\{F_{n}>M_{n}\}\|_{P,3}^{3}}{\|F_{n}\|_{P,3}^{3}}+\frac{\mathrm{E}\|G_{P}\|_{\mathcal{H}_{n,\delta}}+M_{n}}{\sqrt{n\mathrm{Var}(\|G_{P}\|_{\mathcal{H}_{n,\delta}})}}.

Therefore, by Strassen’s theorem (e.g. Dudley, 2002, Theorem 11.6.2),

sups0|{𝔾nns}{GPns}|KBn(δ)+63rn(δ)+{GPn,δ>rn(δ)Var(GPn)}+{𝔾nn,δ>rn(δ)Var(GPn)},\displaystyle\begin{split}&\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}\Big{|}\\ &\quad{}\quad{}\leq KB_{n}(\delta)+6\sqrt{3}\sqrt{r_{n}(\delta)}+\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime}}>\sqrt{r_{n}(\delta)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right\}\\ &\quad{}\quad{}\quad{}+\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n,\delta}^{\prime}}>\sqrt{r_{n}(\delta)\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right\},\end{split} (51)

where K>0K>0 is an absolute constant and

Bn(δ)=FnP,3n1/6Var(GPn,δ)+Fn𝟏{Fn>Mn}P,33FnP,33+EGPn,δ+MnnVar(GPn,δ).\displaystyle B_{n}(\delta)=\frac{\|F_{n}\|_{P,3}}{n^{1/6}\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{H}_{n,\delta}})}}+\frac{\|F_{n}\mathbf{1}\{F_{n}>M_{n}\}\|_{P,3}^{3}}{\|F_{n}\|_{P,3}^{3}}+\frac{\mathrm{E}\|G_{P}\|_{\mathcal{H}_{n,\delta}}+M_{n}}{\sqrt{n\mathrm{Var}(\|G_{P}\|_{\mathcal{H}_{n,\delta}})}}.

By the hypothesis on the moduli of continuity of the Gaussian PP-bridge and the empirical process, we can upper bound the right-hand side of eq. (51) by

KBn(δ)+63rn(δ)+2ψn(δ)ϕn(δ)rn(δ)\displaystyle KB_{n}(\delta)+6\sqrt{3}\sqrt{r_{n}(\delta)}+2\frac{\psi_{n}(\delta)\vee\phi_{n}(\delta)}{\sqrt{r_{n}(\delta)}} <KBn(δ)+(2+63)rn(δ)\displaystyle<KB_{n}(\delta)+(2+6\sqrt{3})\sqrt{r_{n}(\delta)}
<KBn(δ)+13rn(δ).\displaystyle<KB_{n}(\delta)+13\sqrt{r_{n}(\delta)}. (52)

Since rn(δ)r_{n}(\delta) is non-increasing in δ>0\delta>0, we can assume without loss of generality that there exists N1N\geq 1 such that for all δ>0\delta>0 and all n>Nn>N, rn(δ)<1/13\sqrt{r_{n}(\delta)}<1/13; otherwise the upper bound is trivial by eq. (52). Thus, since GPnGPn,δ\|G_{P}\|_{\mathcal{F}_{n}}\geq\|G_{P}\|_{\mathcal{H}_{n,\delta}}, Lemma 9 with X=GPn,δX=\|G_{P}\|_{\mathcal{H}_{n,\delta}} and Z=GPnZ=\|G_{P}\|_{\mathcal{F}_{n}} yields, for all n>Nn>N,

Var(GPn)\displaystyle\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})} Var(GPn,δ)+Var(GPnGPn,δ)\displaystyle\leq\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{H}_{n,\delta}})}+\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}}-\|G_{P}\|_{\mathcal{H}_{n,\delta}})}
(a)Var(GPn,δ)+EGPn,δ2\displaystyle\overset{(a)}{\leq}\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{H}_{n,\delta}})}+\sqrt{\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime}}^{2}}
(b)Var(GPn,δ)+12/13Var(GPn),\displaystyle\overset{(b)}{\leq}\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{H}_{n,\delta}})}+12/13\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})},

where (a) holds because 0GPnGPn,δGPn,δ0\leq\|G_{P}\|_{\mathcal{F}_{n}}-\|G_{P}\|_{\mathcal{H}_{n,\delta}}\leq\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime}}, and (b) holds because rn(δ)<1/13\sqrt{r_{n}(\delta)}<1/13 and because, for suprema of Gaussian processes, Liapunov’s inequality can be reversed (Lemma 15; the unknown constant can be absorbed into rn(δ)r_{n}(\delta)). Conclude that for all n>Nn>N,

(112/13)Var(GPn)Var(GPn,δ).\displaystyle(1-12/13)\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\leq\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{H}_{n,\delta}})}. (53)

Combine eq. (51)–(53) and obtain for all n>Nn>N,

sups0|{𝔾nns}{GPns}|13(KBn+rn(δ)),\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}\Big{|}\leq 13\left(KB_{n}+\sqrt{r_{n}(\delta)}\right),

where

Bn=FnP,3n1/6Var(GPn)+Fn𝟏{Fn>Mn}P,33FnP,33+EGPn+MnnVar(GPn).\displaystyle B_{n}=\frac{\|F_{n}\|_{P,3}}{n^{1/6}\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}+\frac{\|F_{n}\mathbf{1}\{F_{n}>M_{n}\}\|_{P,3}^{3}}{\|F_{n}\|_{P,3}^{3}}+\frac{\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}+M_{n}}{\sqrt{n\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}.

Increase the constant 1313 on the right hand side until the bound also holds for all 1nN1\leq n\leq N. Since δ>0\delta>0 is arbitrary, set δ=δ\delta=\delta^{*} such that rn(δ)=rn:=inf{rn(δ):δ>0}r_{n}(\delta^{*})=r_{n}:=\inf\big{\{}r_{n}(\delta):\delta>0\big{\}}. ∎
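The proof relies on finite δ-nets, which exist by total boundedness. For intuition only, here is a standard greedy construction of such a net, with a finite stand-in for the function class and the Euclidean metric as a stand-in for ρ (all names and sizes are illustrative assumptions):

```python
import numpy as np

def greedy_delta_net(points, delta, dist):
    """Greedy delta-net: keep a point only if it is farther than delta from all
    kept points. Every input point then lies within delta of some net point."""
    net = []
    for p in points:
        if all(dist(p, q) > delta for q in net):
            net.append(p)
    return net

rng = np.random.default_rng(1)
# Finite stand-in for F_n: 200 random vectors in R^5, with the Euclidean
# metric playing the role of the pseudo-metric rho.
F = [rng.standard_normal(5) for _ in range(200)]
dist = lambda f, g: float(np.linalg.norm(f - g))

net = greedy_delta_net(F, delta=2.0, dist=dist)
# The projection pi_delta maps each f to a net point within distance delta.
assert all(min(dist(f, h) for h in net) <= 2.0 for f in F)
print(len(net))
```

The map π_δ in the proof is exactly this projection onto the nearest net point.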

Proof of Corollary 2.

We only need to verify that under the stated assumptions Theorem 1 applies with rn0r_{n}\equiv 0. Let n1n\geq 1 be arbitrary and define n,δ:=n,δn,δ\mathcal{F}_{n,\delta}^{\prime}:=\mathcal{F}_{n,\delta}^{\prime\prime}\cap\mathcal{F}_{n,\delta}^{\prime\prime\prime}, where n,δ={fg:f,gn,dP(f,g)<δFnP,2}\mathcal{F}_{n,\delta}^{\prime\prime}=\{f-g:f,g\in\mathcal{F}_{n},\>d_{P}(f,g)<\delta\|F_{n}\|_{P,2}\} and n,δ={fg:f,gn,dPn(f,g)<δFnP,2}\mathcal{F}_{n,\delta}^{\prime\prime\prime}=\{f-g:f,g\in\mathcal{F}_{n},\>d_{P_{n}}(f,g)<\delta\|F_{n}\|_{P,2}\}. Note that n,δ={fg:f,gn,(dPdPn)(f,g)<δFnP,2}\mathcal{F}_{n,\delta}^{\prime}=\{f-g:f,g\in\mathcal{F}_{n},\>(d_{P}\vee d_{P_{n}})(f,g)<\delta\|F_{n}\|_{P,2}\}. First, by Lemma 16, n\mathcal{F}_{n} is totally bounded w.r.t. the pseudo-metric dPd_{P}. Second, by Lemma 18, EGPn,δEGPn,δ0\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime}}\leq\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime\prime}}\rightarrow 0 as δ0\delta\rightarrow 0. Thus, for each n1n\geq 1, there exists some ψn\psi_{n} such that ψn(δ)0\psi_{n}(\delta)\rightarrow 0 as δ0\delta\rightarrow 0. Third, 𝔾nn,δP,1𝔾nn,δP,12nFnP,2δ\big{\|}\|\mathbb{G}_{n}\|_{\mathcal{F}_{n,\delta}^{\prime}}\big{\|}_{P,1}\leq\big{\|}\|\mathbb{G}_{n}\|_{\mathcal{F}_{n,\delta}^{\prime\prime\prime}}\big{\|}_{P,1}\leq 2\sqrt{n}\|F_{n}\|_{P,2}\delta. Thus, for each n1n\geq 1, there also exists ϕn\phi_{n} such that ϕn(δ)0\phi_{n}(\delta)\rightarrow 0 as δ0\delta\rightarrow 0. This completes the proof.

(Note that the last argument does not imply that the empirical process is asymptotically dPd_{P}-equicontinuous: Not only do we use a different norm, but we keep the sample size nn fixed and only consider the limit δ0\delta\rightarrow 0. This shows once again that Gaussian approximability is a strictly weaker concept than weak convergence, see also Remark 3 and the discussion in Section 2.2.) ∎

D.2 Proofs of Theorem 3 and Corollaries 4 and 5

Proof of Theorem 3.

The key idea is to construct “entangled δ\delta-nets” of the function classes n\mathcal{F}_{n} and 𝒢n\mathcal{G}_{n}. The precise meaning of this term and our reasoning for this idea will become clear in the course of the proof.

Let δ>0\delta>0 be arbitrary and define rn(δ):=ψn(δ)ϕn(δ)r_{n}(\delta):=\psi_{n}(\delta)\vee\phi_{n}(\delta). Let n,δn\mathcal{H}_{n,\delta}\subseteq\mathcal{F}_{n} and n,δ𝒢n\mathcal{I}_{n,\delta}\subseteq\mathcal{G}_{n} be δFnP,2\delta\|F_{n}\|_{P,2}- and δGnP,2\delta\|G_{n}\|_{P,2}-nets of n\mathcal{F}_{n} and 𝒢n\mathcal{G}_{n} w.r.t. ρ1\rho_{1} and ρ2\rho_{2}, respectively. Since n\mathcal{F}_{n} and 𝒢n\mathcal{G}_{n} are totally bounded w.r.t. ρ1\rho_{1} and ρ2\rho_{2}, respectively, the n,δ\mathcal{H}_{n,\delta}’s and n,δ\mathcal{I}_{n,\delta}’s are finite.

Next, let π1𝒢n\pi^{1}_{\mathcal{G}_{n}} be the projection from n𝒢n\mathcal{F}_{n}\cup\mathcal{G}_{n} onto 𝒢n\mathcal{G}_{n} defined by ρ1(h,π1𝒢nh)=infg𝒢nρ1(h,g)\rho_{1}(h,\pi^{1}_{\mathcal{G}_{n}}h)=\inf_{g\in\mathcal{G}_{n}}\rho_{1}(h,g) for all hn𝒢nh\in\mathcal{F}_{n}\cup\mathcal{G}_{n}. If this projection is not unique for some hn𝒢nh\in\mathcal{F}_{n}\cup\mathcal{G}_{n}, choose any of the equivalent points. Define the projection π2n\pi^{2}_{\mathcal{F}_{n}} from n𝒢n\mathcal{F}_{n}\cup\mathcal{G}_{n} onto n\mathcal{F}_{n} analogously. Finally, define the “entangled δ\delta-nets” of n\mathcal{F}_{n} and 𝒢n\mathcal{G}_{n} as

~n,δ\displaystyle\widetilde{\mathcal{H}}_{n,\delta} :=π2n(n,δn,δ)=n,δπ2n(n,δ)n,and\displaystyle:=\pi^{2}_{\mathcal{F}_{n}}\big{(}\mathcal{H}_{n,\delta}\cup\mathcal{I}_{n,\delta}\big{)}=\mathcal{H}_{n,\delta}\cup\pi^{2}_{\mathcal{F}_{n}}\big{(}\mathcal{I}_{n,\delta}\big{)}\subset\mathcal{F}_{n},\quad\text{and}
~n,δ\displaystyle\widetilde{\mathcal{I}}_{n,\delta} :=π1𝒢n(n,δn,δ)=π1𝒢n(n,δ)n,δ𝒢n.\displaystyle:=\pi^{1}_{\mathcal{G}_{n}}\big{(}\mathcal{H}_{n,\delta}\cup\mathcal{I}_{n,\delta}\big{)}=\pi^{1}_{\mathcal{G}_{n}}\big{(}\mathcal{H}_{n,\delta}\big{)}\cup\mathcal{I}_{n,\delta}\subset\mathcal{G}_{n}.

Note that the entangled sets ~n,δ\widetilde{\mathcal{H}}_{n,\delta} and ~n,δ\widetilde{\mathcal{I}}_{n,\delta} are still δFnP,2\delta\|F_{n}\|_{P,2}- and δGnP,2\delta\|G_{n}\|_{P,2}-nets of n\mathcal{F}_{n} and 𝒢n\mathcal{G}_{n} w.r.t. ρ1\rho_{1} and ρ2\rho_{2}. Hence, for each δ>0\delta>0, there exist maps π~δ:n~n,δ\tilde{\pi}_{\delta}:\mathcal{F}_{n}\rightarrow\widetilde{\mathcal{H}}_{n,\delta} and θ~δ:𝒢n~n,δ\tilde{\theta}_{\delta}:\mathcal{G}_{n}\rightarrow\widetilde{\mathcal{I}}_{n,\delta} such that ρ1(f,π~δf)δFnP,2\rho_{1}(f,\tilde{\pi}_{\delta}f)\leq\delta\|F_{n}\|_{P,2} for all fnf\in\mathcal{F}_{n} and ρ2(g,θ~δg)δGnQ,2\rho_{2}(g,\tilde{\theta}_{\delta}g)\leq\delta\|G_{n}\|_{Q,2} for all g𝒢ng\in\mathcal{G}_{n}. Therefore, by the reverse triangle inequality and the linearity of Gaussian PP-bridge processes and QQ-motions,

|GP~n,δGPn|GP(π~δid)nGPn,δ,and\displaystyle\big{|}\|G_{P}\|_{\widetilde{\mathcal{H}}_{n,\delta}}-\|G_{P}\|_{\mathcal{F}_{n}}\big{|}\leq\|G_{P}\circ(\tilde{\pi}_{\delta}-id)\|_{\mathcal{F}_{n}}\leq\|G_{P}\|_{\mathcal{F}_{n,\delta}^{\prime}},\quad{}\text{and}
|ZQ~n,δZQ𝒢n|ZQ(θ~δid)𝒢nZQ𝒢n,δ,\displaystyle\big{|}\|Z_{Q}\|_{\widetilde{\mathcal{I}}_{n,\delta}}-\|Z_{Q}\|_{\mathcal{G}_{n}}\big{|}\leq\|Z_{Q}\circ(\tilde{\theta}_{\delta}-id)\|_{\mathcal{G}_{n}}\leq\|Z_{Q}\|_{\mathcal{G}_{n,\delta}^{\prime}},

where n,δ={f1f2:f1,f2n,ρ1(f1,f2)δFnP,2}\mathcal{F}_{n,\delta}^{\prime}=\{f_{1}-f_{2}:f_{1},f_{2}\in\mathcal{F}_{n},\>\rho_{1}(f_{1},f_{2})\leq\delta\|F_{n}\|_{P,2}\} and 𝒢n,δ={g1g2:g1,g2𝒢n,ρ2(g1,g2)δGnP,2}\mathcal{G}_{n,\delta}^{\prime}=\{g_{1}-g_{2}:g_{1},g_{2}\in\mathcal{G}_{n},\>\rho_{2}(g_{1},g_{2})\leq\delta\|G_{n}\|_{P,2}\}.

By Lemma 8 and Lemma 6,

sups0|{GPns}{ZQ𝒢ns}|{|GPnZQ𝒢n|>3rn(δ)Var(GPn)Var(ZQ𝒢n)}+63rn(δ).\displaystyle\begin{split}&\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|Z_{Q}\|_{\mathcal{G}_{n}}\leq s\right\}\Big{|}\\ &\quad{}\quad{}\leq\mathbb{P}\left\{\big{|}\|G_{P}\|_{\mathcal{F}_{n}}-\|Z_{Q}\|_{\mathcal{G}_{n}}\big{|}>3\sqrt{r_{n}(\delta)}\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})\vee\mathrm{Var}(\|Z_{Q}\|_{\mathcal{G}_{n}})}\right\}+6\sqrt{3}\sqrt{r_{n}(\delta)}.\end{split} (54)

By the triangle inequality,

{|GPnZQ𝒢n|>3rn(δ)Var(GPn)Var(ZQ𝒢n)}{|GP~n,δZQ~n,δ|>rn(δ)Var(GPn)Var(ZQ𝒢n)}+{GPδ>rn(δ)Var(GPn)}+{ZQ𝒢δ>rn(δ)Var(ZQ𝒢n)}.\displaystyle\begin{split}&\mathbb{P}\left\{\big{|}\|G_{P}\|_{\mathcal{F}_{n}}-\|Z_{Q}\|_{\mathcal{G}_{n}}\big{|}>3\sqrt{r_{n}(\delta)}\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})\vee\mathrm{Var}(\|Z_{Q}\|_{\mathcal{G}_{n}})}\right\}\\ &\quad{}\quad{}\leq\mathbb{P}\left\{\big{|}\|G_{P}\|_{\widetilde{\mathcal{H}}_{n,\delta}}-\|Z_{Q}\|_{\widetilde{\mathcal{I}}_{n,\delta}}\big{|}>\sqrt{r_{n}(\delta)}\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})\vee\mathrm{Var}(\|Z_{Q}\|_{\mathcal{G}_{n}})}\right\}\\ &\quad{}\quad{}\quad{}+\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{\delta}^{\prime}}>\sqrt{r_{n}(\delta)}\sqrt{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right\}+\mathbb{P}\left\{\|Z_{Q}\|_{\mathcal{G}_{\delta}^{\prime}}>\sqrt{r_{n}(\delta)}\sqrt{\mathrm{Var}(\|Z_{Q}\|_{\mathcal{G}_{n}})}\right\}.\end{split} (55)

Since GP~n,δGPπ2nn,δn,δ\|G_{P}\|_{\widetilde{\mathcal{H}}_{n,\delta}}\equiv\|G_{P}\circ\pi^{2}_{\mathcal{F}_{n}}\|_{\mathcal{H}_{n,\delta}\cup\mathcal{I}_{n,\delta}} and ZQ~n,δZQπ1𝒢nn,δn,δ\|Z_{Q}\|_{\widetilde{\mathcal{I}}_{n,\delta}}\equiv\|Z_{Q}\circ\pi^{1}_{\mathcal{G}_{n}}\|_{\mathcal{H}_{n,\delta}\cup\mathcal{I}_{n,\delta}} (note: two finite dimensional mean zero Gaussian vectors with the same (!) index sets; to achieve this we needed the “entangled δ\delta-nets”) we have by Proposition 3, for all s0s\geq 0,

{GP~n,δs}{ZQ~n,δs}+K(Δn(δ)Var(GP~n,δ)Var(ZQ~n,δ))1/3,\displaystyle\mathbb{P}\left\{\|G_{P}\|_{\widetilde{\mathcal{H}}_{n,\delta}}\leq s\right\}\leq\mathbb{P}\left\{\|Z_{Q}\|_{\widetilde{\mathcal{I}}_{n,\delta}}\leq s\right\}+K\left(\frac{\Delta_{n}(\delta)}{\mathrm{Var}(\|G_{P}\|_{\widetilde{\mathcal{H}}_{n,\delta}})\vee\mathrm{Var}(\|Z_{Q}\|_{\widetilde{\mathcal{I}}_{n,\delta}})}\right)^{1/3},

where K>0K>0 is an absolute constant and, for all δ>0\delta>0,

Δn(δ)\displaystyle\Delta_{n}(\delta) :=suph1,h2n,δn,δ|E[(GPπ2n)(h1)(GPπ2n)(h2)]E[(ZQπ1𝒢n)(h1)(ZQπ1𝒢n)(h2)]|\displaystyle:=\sup_{h_{1},h_{2}\in\mathcal{H}_{n,\delta}\cup\mathcal{I}_{n,\delta}}\big{|}\mathrm{E}[(G_{P}\circ\pi^{2}_{\mathcal{F}_{n}})(h_{1})(G_{P}\circ\pi^{2}_{\mathcal{F}_{n}})(h_{2})]-\mathrm{E}[(Z_{Q}\circ\pi^{1}_{\mathcal{G}_{n}})(h_{1})(Z_{Q}\circ\pi^{1}_{\mathcal{G}_{n}})(h_{2})]\big{|}
supf1,f2n|E[GP(f1)GP(f2)]E[ZQ(π1𝒢nf1)ZQ(π1𝒢nf2)]|\displaystyle\leq\sup_{f_{1},f_{2}\in\mathcal{F}_{n}}\big{|}\mathrm{E}[G_{P}(f_{1})G_{P}(f_{2})]-\mathrm{E}[Z_{Q}(\pi^{1}_{\mathcal{G}_{n}}f_{1})Z_{Q}(\pi^{1}_{\mathcal{G}_{n}}f_{2})]\big{|}
supg1,g2𝒢n|E[ZQ(g1)ZQ(g2)]E[GP(π2ng1)GP(π2ng2)]|\displaystyle\quad\quad\quad\quad\quad\quad\bigvee\sup_{g_{1},g_{2}\in\mathcal{G}_{n}}\big{|}\mathrm{E}[Z_{Q}(g_{1})Z_{Q}(g_{2})]-\mathrm{E}[G_{P}(\pi^{2}_{\mathcal{F}_{n}}g_{1})G_{P}(\pi^{2}_{\mathcal{F}_{n}}g_{2})]\big{|}
=:ΔP,Q(n,𝒢n).\displaystyle=:\Delta_{P,Q}(\mathcal{F}_{n},\mathcal{G}_{n}).

Therefore, by eq. (54) and eq. (55), Strassen’s theorem (e.g. Dudley, 2002, Theorem 11.6.2; see also the proof of Theorem 1), and Markov’s inequality,

sups0|{GPns}{ZQ𝒢ns}|K(ΔP,Q(n,𝒢n)Var(GP~n,δ)Var(ZQ~n,δ))1/3+13rn(δ).\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|Z_{Q}\|_{\mathcal{G}_{n}}\leq s\right\}\Big{|}\leq K\left(\frac{\Delta_{P,Q}(\mathcal{F}_{n},\mathcal{G}_{n})}{\mathrm{Var}(\|G_{P}\|_{\widetilde{\mathcal{H}}_{n,\delta}})\vee\mathrm{Var}(\|Z_{Q}\|_{\widetilde{\mathcal{I}}_{n,\delta}})}\right)^{1/3}+13\sqrt{r_{n}(\delta)}. (56)

By the same arguments as in the proof of Theorem 1, there exists N>0N>0 such that for all δ>0\delta>0 and all nNn\geq N we have rn(δ)<1/13\sqrt{r_{n}(\delta)}<1/13 and, hence,

(ΔP,Q(n,𝒢n)Var(GP~n,δ)Var(ZQ~n,δ))1/313(ΔP,Q(n,𝒢n)Var(GPn)Var(ZQ𝒢n))1/3.\displaystyle\left(\frac{\Delta_{P,Q}(\mathcal{F}_{n},\mathcal{G}_{n})}{\mathrm{Var}(\|G_{P}\|_{\widetilde{\mathcal{H}}_{n,\delta}})\vee\mathrm{Var}(\|Z_{Q}\|_{\widetilde{\mathcal{I}}_{n,\delta}})}\right)^{1/3}\leq 13\left(\frac{\Delta_{P,Q}(\mathcal{F}_{n},\mathcal{G}_{n})}{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})\vee\mathrm{Var}(\|Z_{Q}\|_{\mathcal{G}_{n}})}\right)^{1/3}. (57)

Combine eq. (56) and eq. (57) to conclude that the claim of the theorem holds for all nNn\geq N. Since for rn(δ)1/13\sqrt{r_{n}(\delta)}\geq 1/13 the upper bound is trivial by eq. (56), the theorem holds in fact for all n1n\geq 1. Since δ>0\delta>0 is arbitrary, set δ=δ\delta=\delta^{*} such that rn(δ)=rn:=inf{rn(δ):δ>0}r_{n}(\delta^{*})=r_{n}:=\inf\big{\{}r_{n}(\delta):\delta>0\big{\}}. ∎

Proof of Corollary 4.

We only need to verify that under the stated assumptions Theorem 3 applies with rn0r_{n}\equiv 0. This follows by the same argument as in the proof of Corollary 2; we therefore omit the details. ∎

Proof of Corollary 5.

First, apply Theorem 3 (and Remark 3) with ρ1=ρ2=dP\rho_{1}=\rho_{2}=d_{P}, Q=PQ=P, and 𝒢nn\mathcal{G}_{n}\subseteq\mathcal{F}_{n} a δFP,2\delta\|F\|_{P,2}-net of n\mathcal{F}_{n} with respect to dPd_{P}. Next, compute

ΔP,P(n,𝒢n)\displaystyle\Delta_{P,P}(\mathcal{F}_{n},\mathcal{G}_{n}) =supf1,f2n|E[GP(f1)GP(f2)]E[GP(π𝒢nf1)GP(π𝒢nf2)]|\displaystyle=\sup_{f_{1},f_{2}\in\mathcal{F}_{n}}\big{|}\mathrm{E}[G_{P}(f_{1})G_{P}(f_{2})]-\mathrm{E}[G_{P}(\pi_{\mathcal{G}_{n}}f_{1})G_{P}(\pi_{\mathcal{G}_{n}}f_{2})]\big{|}
supf1,f2nE[GP2(f1)(GP(f2)GP(π𝒢nf2))2]1/2\displaystyle\leq\sup_{f_{1},f_{2}\in\mathcal{F}_{n}}\mathrm{E}\left[G_{P}^{2}(f_{1})\big{(}G_{P}(f_{2})-G_{P}(\pi_{\mathcal{G}_{n}}f_{2})\big{)}^{2}\right]^{1/2}
+supf1,f2nE[GP2(π𝒢nf2)(GP(f1)GP(π𝒢nf1))2]1/2\displaystyle\quad+\sup_{f_{1},f_{2}\in\mathcal{F}_{n}}\mathrm{E}\left[G_{P}^{2}(\pi_{\mathcal{G}_{n}}f_{2})\big{(}G_{P}(f_{1})-G_{P}(\pi_{\mathcal{G}_{n}}f_{1})\big{)}^{2}\right]^{1/2}
2δFP,2supfnPf2.\displaystyle\leq 2\delta\|F\|_{P,2}\sup_{f\in\mathcal{F}_{n}}\sqrt{Pf^{2}}.

Lastly, verify that under the stated assumptions Theorem 3 applies with rn0r_{n}\equiv 0. This again follows by the same argument as in the proof of Corollary 2. This completes the proof. ∎

D.3 Proofs of Theorems 6 and 9, Proposition 7, and Corollary 8

Proof of Theorem 6.

By the triangle inequality

sups0|{𝔾nns}{ZQ𝒢nsX1,,Xn}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|Z_{Q}\|_{\mathcal{G}_{n}}\leq s\mid X_{1},\ldots,X_{n}\right\}\Big{|}
sups0|{𝔾nns}{GPns}|\displaystyle\quad{}\leq\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}\Big{|}
+sups0|{GPns}{ZQ𝒢nsX1,,Xn}|.\displaystyle\quad{}\quad{}+\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|G_{P}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|Z_{Q}\|_{\mathcal{G}_{n}}\leq s\mid X_{1},\ldots,X_{n}\right\}\Big{|}.

To complete the proof, apply Theorem 1 to the first summand and Theorem 3 to the second summand. ∎

Proof of Proposition 7.

Our proof is modeled after the proof of Theorem 1 in Jain and Kallianpur (1970). However, unlike them we do not argue via the reproducing kernel Hilbert space associated to Q\mathcal{E}_{Q}. Instead, we leverage the fact that under the stated assumptions ZQZ_{Q} has a version that is both a mean-square continuous stochastic process and a random element on some Hilbert space.

Since n\mathcal{F}_{n} is compact, it is totally bounded and, by Lemma 16, separable w.r.t. eQe_{Q}. Hence, the process ZQ={ZQ(f):fn}Z_{Q}=\{Z_{Q}(f):f\in\mathcal{F}_{n}\} has a separable and jointly measurable version Z~Q\widetilde{Z}_{Q} (e.g. Giné and Nickl, 2016, Proposition 2.1.12; note that eQe_{Q} is the intrinsic standard deviation metric of the Gaussian QQ-motion ZQZ_{Q}). Since n\mathcal{F}_{n} is compact and Q\mathcal{E}_{Q} is continuous, TQT_{\mathcal{E}_{Q}} is a bounded linear operator. Let {(λk,φk)}k=1\{(\lambda_{k},\varphi_{k})\}_{k=1}^{\infty} be the eigenvalue and eigenfunction pairs of TQT_{\mathcal{E}_{Q}} and define

Z~Qm(f):=k=1mZ~Q,φkφk(f),fn.\displaystyle\widetilde{Z}_{Q}^{m}(f):=\sum_{k=1}^{m}\langle\widetilde{Z}_{Q},\varphi_{k}\rangle\varphi_{k}(f),\quad f\in\mathcal{F}_{n}. (58)

By continuity of Q\mathcal{E}_{Q}, Z~Q\widetilde{Z}_{Q} is a mean-square continuous stochastic process in L2(n,n,μ)L_{2}(\mathcal{F}_{n},\mathcal{B}_{n},\mu). Moreover, by joint measurability, Z~Q\widetilde{Z}_{Q} is also a random element on the Hilbert space (L2(n,n,μ),,)\big{(}L_{2}(\mathcal{F}_{n},\mathcal{B}_{n},\mu),\langle\cdot,\cdot\rangle\big{)}. Hence, Theorems 7.3.5 and 7.4.3 in Hsing and Eubank (2015) apply, and we conclude that the partial sums Z~Qm(f)\widetilde{Z}_{Q}^{m}(f) converge to Z~Q(f)\widetilde{Z}_{Q}(f) in L2(Ω,𝒜,)L_{2}(\Omega,\mathcal{A},\mathbb{P}) as mm\rightarrow\infty pointwise in fnf\in\mathcal{F}_{n}.

Now, observe that

E[Z~Q,φk]=0andE[Z~Q,φkZ~Q,φj]=λk𝟏{j=k},j,k.\displaystyle\mathrm{E}[\langle\widetilde{Z}_{Q},\varphi_{k}\rangle]=0\quad\quad\mathrm{and}\quad\quad\mathrm{E}[\langle\widetilde{Z}_{Q},\varphi_{k}\rangle\langle\widetilde{Z}_{Q},\varphi_{j}\rangle]=\lambda_{k}\mathbf{1}\{j=k\},\quad\forall j,k\in\mathbb{N}. (59)

Thus, the Z~Q,φk\langle\widetilde{Z}_{Q},\varphi_{k}\rangle’s are uncorrelated random variables. Since the Z~Q,φk\langle\widetilde{Z}_{Q},\varphi_{k}\rangle’s are necessarily Gaussian (inner products of a Gaussian random element with deterministic functions φk\varphi_{k}), they are in fact independent Gaussian random variables with mean zero and variance λk\lambda_{k}. Therefore, by Lévy’s theorem, Z~Qm(f)\widetilde{Z}_{Q}^{m}(f) converges to Z~Q(f)\widetilde{Z}_{Q}(f) almost surely pointwise in fnf\in\mathcal{F}_{n}. Thus, for fnf\in\mathcal{F}_{n},

Z~Q(f)=Z~Q(f)a.s.\displaystyle\widetilde{Z}_{Q}(f)=\widetilde{Z}_{Q}^{\infty}(f)\quad a.s. (60)

Since EZQn<\mathrm{E}\|Z_{Q}\|_{\mathcal{F}_{n}}<\infty, Lemma 14 implies that Z~Q\widetilde{Z}_{Q} is almost surely bounded. Moreover, since Q\mathcal{E}_{Q} is continuous and n\mathcal{F}_{n} compact, Lemma 4.6.6 (4) in Hsing and Eubank (2015) implies that Z~Q\widetilde{Z}_{Q} is uniformly continuous. Consequently, by construction, Z~Qm\widetilde{Z}_{Q}^{m} is almost surely bounded and uniformly continuous, too. Thus, Z~Q\widetilde{Z}_{Q} and Z~Qm\widetilde{Z}_{Q}^{m} can be considered random elements on the Banach space C(n)C(\mathcal{F}_{n}) of bounded continuous functions on n\mathcal{F}_{n} equipped with the supremum norm.

Recall that the dual space C(n)C^{*}(\mathcal{F}_{n}) is the space of finite signed Borel measures μ\mu on n\mathcal{F}_{n} equipped with the total variation norm. The dual pairing between XC(n)X\in C(\mathcal{F}_{n}) and μC(n)\mu\in C^{*}(\mathcal{F}_{n}) is X,μ:=nX(f)dμ(f)\langle X,\mu\rangle:=\int_{\mathcal{F}_{n}}X(f)d\mu(f). We compute

E|Z~Qm,μZ~Q,μ|\displaystyle\mathrm{E}\left|\langle\widetilde{Z}_{Q}^{m},\mu\rangle-\langle\widetilde{Z}_{Q},\mu\rangle\right| =E|n(Z~Qm(f)Z~Q(f))dμ(f)|\displaystyle=\mathrm{E}\left|\int_{\mathcal{F}_{n}}\big{(}\widetilde{Z}_{Q}^{m}(f)-\widetilde{Z}_{Q}(f)\big{)}d\mu(f)\right|
supfnE[|Z~Qm(f)Z~Q(f)|]μTV\displaystyle\leq\sup_{f\in\mathcal{F}_{n}}\mathrm{E}\left[\big{|}\widetilde{Z}_{Q}^{m}(f)-\widetilde{Z}_{Q}(f)\big{|}\right]\|\mu\|_{TV}
supfnE[(Z~Qm(f)Z~Q(f))2]1/2μTV\displaystyle\leq\sup_{f\in\mathcal{F}_{n}}\mathrm{E}\left[\big{(}\widetilde{Z}_{Q}^{m}(f)-\widetilde{Z}_{Q}(f)\big{)}^{2}\right]^{1/2}\|\mu\|_{TV}
=(a)supfnE[(Z~Qm(f)Z~Q(f))2]1/2μTV\displaystyle\overset{(a)}{=}\sup_{f\in\mathcal{F}_{n}}\mathrm{E}\left[\big{(}\widetilde{Z}_{Q}^{m}(f)-\widetilde{Z}_{Q}^{\infty}(f)\big{)}^{2}\right]^{1/2}\|\mu\|_{TV}
=supfn(k=m+1λkφk2(f))1/2μTV\displaystyle=\sup_{f\in\mathcal{F}_{n}}\left(\sum_{k=m+1}^{\infty}\lambda_{k}\varphi_{k}^{2}(f)\right)^{1/2}\|\mu\|_{TV}
(b)0asm,\displaystyle\overset{(b)}{\rightarrow}0\quad\text{as}\quad m\rightarrow\infty, (61)

where (a) holds by (60) and (b) holds by Lemma 4.6.6 (3) in Hsing and Eubank (2015). Since (61) implies convergence of the dual pairings in probability for every μC(n)\mu\in C^{*}(\mathcal{F}_{n}), we have, by the Itô–Nisio theorem,

Z~QmZ~Qn0asm\displaystyle\big{\|}\widetilde{Z}_{Q}^{m}-\widetilde{Z}_{Q}\big{\|}_{\mathcal{F}_{n}}\rightarrow 0\quad\text{as}\quad m\rightarrow\infty

almost surely. Recall that Z~Q\widetilde{Z}_{Q} is a version of ZQZ_{Q}. This completes the proof. ∎

Proof of Corollary 8.

By Theorem 6 and arguments as in the proof of Corollary 2 we have, for each M0M\geq 0,

sups0|{𝔾nns}{Z^nmnsX1,,Xn}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{\|\mathbb{G}_{n}\|_{\mathcal{F}_{n}}\leq s\right\}-\mathbb{P}\left\{\|\widehat{Z}_{n}^{m}\|_{\mathcal{F}_{n}}\leq s\mid X_{1},\ldots,X_{n}\right\}\Big{|}
FnP,3n1/3Var(GPn)+Fn𝟏{Fn>M}P,33FnP,33+EGPn+MnVar(GPn)\displaystyle\quad{}\quad{}\quad{}\lesssim\frac{\|F_{n}\|_{P,3}}{\sqrt{n^{1/3}\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}+\frac{\|F_{n}\mathbf{1}\{F_{n}>M\}\|_{P,3}^{3}}{\|F_{n}\|_{P,3}^{3}}+\frac{\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}+M}{\sqrt{n\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}
+(supf,gn|𝒞P(f,g)𝒞^nm(f,g)|Var(GPn))1/3,\displaystyle\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}+\left(\frac{\sup_{f,g\in\mathcal{F}_{n}}\big{|}\mathcal{C}_{P}(f,g)-\widehat{\mathcal{C}}_{n}^{m}(f,g)\big{|}}{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right)^{1/3},

where \lesssim hides an absolute constant independent of n,m,M,n,Fn,Pnn,m,M,\mathcal{F}_{n},F_{n},P_{n}, and PP.

The approximation error above can be further upper bounded by

supf,gn|𝒞P(f,g)𝒞^nm(f,g)|supf,gn|𝒞P(f,g)𝒞^n(f,g)|+supf,gn|𝒞^n(f,g)𝒞^nm(f,g)|.\displaystyle\sup_{f,g\in\mathcal{F}_{n}}\big{|}\mathcal{C}_{P}(f,g)-\widehat{\mathcal{C}}_{n}^{m}(f,g)\big{|}\leq\sup_{f,g\in\mathcal{F}_{n}}\big{|}\mathcal{C}_{P}(f,g)-\widehat{\mathcal{C}}_{n}(f,g)\big{|}+\sup_{f,g\in\mathcal{F}_{n}}\big{|}\widehat{\mathcal{C}}_{n}(f,g)-\widehat{\mathcal{C}}_{n}^{m}(f,g)\big{|}.

If n\mathcal{F}_{n} is compact w.r.t. d𝒞^nd_{\widehat{\mathcal{C}}_{n}}, then by Mercer’s theorem (e.g. Hsing and Eubank, 2015, Lemma 4.6.6),

supf,gn|𝒞^n(f,g)𝒞^nm(f,g)|=supf,gnk=m+1|λ^kφ^k(f)φ^k(g)|0asm.\displaystyle\sup_{f,g\in\mathcal{F}_{n}}\big{|}\widehat{\mathcal{C}}_{n}(f,g)-\widehat{\mathcal{C}}_{n}^{m}(f,g)\big{|}=\sup_{f,g\in\mathcal{F}_{n}}\sum_{k=m+1}^{\infty}\left|\widehat{\lambda}_{k}\widehat{\varphi}_{k}(f)\widehat{\varphi}_{k}(g)\right|\rightarrow 0\quad\text{as}\quad m\rightarrow\infty.

This completes the proof of the corollary. ∎
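In finite dimensions, the Mercer truncation step in this proof reduces to a spectral truncation of a covariance matrix. A minimal numpy sketch (the matrix, grid size, and truncation ranks are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
C = A @ A.T / 50                        # a PSD "covariance" on a 50-point grid

lam, phi = np.linalg.eigh(C)            # eigenvalues in ascending order
lam, phi = lam[::-1], phi[:, ::-1]      # sort descending, as in the KL expansion

def truncated(m):
    # rank-m truncation: sum_{k<=m} lambda_k phi_k phi_k'
    return (phi[:, :m] * lam[:m]) @ phi[:, :m].T

errs = [np.max(np.abs(C - truncated(m))) for m in (5, 25, 50)]
print(errs)  # entrywise sup error of the rank-m truncation
```

The entrywise sup error vanishes once all eigenpairs are retained, the finite-dimensional analogue of the uniform convergence supplied by Mercer's theorem.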

Proof of Theorem 9.

The proof is identical to that of Proposition 5. The only differences are the notation and that we use Lemma 12 instead of Lemma 10, and Corollary 2 and Theorem 6 instead of Proposition 1. ∎

Appendix E Proofs of the results in Section 4

Proof of Proposition 1.

Since the problem is finite dimensional we do not have to develop the Karhunen-Loève expansion from Section 3.5. The covariance function 𝒞ψ\mathcal{C}_{\psi} of the empirical process 1ni=1nψiu\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\psi_{i}^{\prime}u with uSd1u\in S^{d-1} has the explicit form (u,v)𝒞ψ(u,v)=uE[ψ1ψ1]v(u,v)\mapsto\mathcal{C}_{\psi}(u,v)=u^{\prime}\mathrm{E}[\psi_{1}\psi_{1}^{\prime}]v. Hence, a natural estimate of 𝒞ψ\mathcal{C}_{\psi} is (u,v)𝒞^ψ(u,v)=uΩ^ψv(u,v)\mapsto\widehat{\mathcal{C}}_{\psi}(u,v)=u^{\prime}\widehat{\Omega}_{\psi}v. A version of a centered Gaussian process defined on Sd1S^{d-1} and with covariance function 𝒞^ψ\widehat{\mathcal{C}}_{\psi} is {Z^ψu:uSd1}\{\widehat{Z}_{\psi}^{\prime}u:u\in S^{d-1}\} where Z^ψX1,,XnN(0,Ω^ψ)\widehat{Z}_{\psi}\mid X_{1},\ldots,X_{n}\sim N(0,\widehat{\Omega}_{\psi}). It is standard to verify that, under the assumptions of the theorem, the metric entropy integrals associated to the Gaussian processes GPG_{P} and Z^ψ\widehat{Z}_{\psi} are finite for every fixed nn and dd. Thus, by Lemma 17 there exist versions of GPG_{P} and Z^ψ\widehat{Z}_{\psi} which are almost surely bounded and almost surely uniformly continuous. Hence, the modulus of continuity condition (6) holds for these versions and we can take rn=0r_{n}=0. It follows by Theorem 6 that

supα(0,1)|{nθ^nθ02cn(α;Ω^ψ)}α|\displaystyle\sup_{\alpha\in(0,1)}\Big{|}\mathbb{P}\left\{\sqrt{n}\|\hat{\theta}_{n}-\theta_{0}\|_{2}\leq c_{n}(\alpha;\widehat{\Omega}_{\psi})\right\}-\alpha\Big{|}
(E[ψ123])1/3n1/6Var(Zψ2)+E[ψ123𝟏{ψ123>nE[ψ123]}]E[ψ123]+E[Zψ2]nVar(Zψ2)\displaystyle\quad{}\quad{}\lesssim\frac{(\mathrm{E}[\|\psi_{1}\|_{2}^{3}])^{1/3}}{n^{1/6}\sqrt{\mathrm{Var}(\|Z_{\psi}\|_{2})}}+\frac{\mathrm{E}\left[\|\psi_{1}\|_{2}^{3}\mathbf{1}\{\|\psi_{1}\|_{2}^{3}>n\>\mathrm{E}[\|\psi_{1}\|_{2}^{3}]\}\right]}{\mathrm{E}\left[\|\psi_{1}\|_{2}^{3}\right]}+\frac{\mathrm{E}[\|Z_{\psi}\|_{2}]}{\sqrt{n\mathrm{Var}(\|Z_{\psi}\|_{2})}}
+infδ>0{(δVar(Zψ2))1/3+{Ω^ψE[ψ1ψ1]op>δ}}\displaystyle\quad{}\quad{}\quad{}\quad{}+\inf_{\delta>0}\left\{\left(\frac{\delta}{\mathrm{Var}(\|Z_{\psi}\|_{2})}\right)^{1/3}+\mathbb{P}\left\{\|\widehat{\Omega}_{\psi}-\mathrm{E}[\psi_{1}\psi_{1}^{\prime}]\|_{op}>\delta\right\}\right\}
+infη>0{ηVar(Zψ2)+{|Θn|>η}}.\displaystyle\quad{}\quad{}\quad{}\quad{}\quad{}+\inf_{\eta>0}\left\{\frac{\eta}{\sqrt{\mathrm{Var}(\|Z_{\psi}\|_{2})}}+\mathbb{P}\left\{|\Theta_{n}|>\eta\right\}\right\}.

Under Assumption 1 or 2, the right-hand side of the above inequality vanishes as nn\rightarrow\infty. For details we refer to Appendix A.1 in Giessing and Fan (2023). This completes the proof. ∎

Proof of Proposition 2.

Since the problem is finite dimensional we do not have to develop the Karhunen-Loève expansion from Section 3.5. Consider TnnQnQnT_{n}\equiv\sqrt{n}\|Q_{n}-Q\|_{{\mathcal{F}_{n}}} where QnQ_{n} is the empirical measure of the collection of Yi=vech(XiXi)HdY_{i}=\mathrm{vech}^{\prime}(X_{i}X_{i}^{\prime})H_{d}^{\prime}, 1in1\leq i\leq n, QQ the pushforward of PP under the map Xvech(XX)HdX\mapsto\mathrm{vech}^{\prime}(XX^{\prime})H_{d}^{\prime}, and n={yy(vu):u,vSd1}\mathcal{F}_{n}=\{y\mapsto y(v\otimes u):u,v\in S^{d-1}\}. Since each fnf\in\mathcal{F}_{n} has a (not necessarily unique) representation in terms of u,vSd1u,v\in S^{d-1}, we will identify fnf\in\mathcal{F}_{n} with pairs (u,v)Sd1×Sd1(u,v)\in S^{d-1}\times S^{d-1} when there is no danger of confusion. Since |f(y)|y2vu2=y2|f(y)|\leq\|y\|_{2}\|v\otimes u\|_{2}=\|y\|_{2}, the function Fn(y)=y2F_{n}(y)=\|y\|_{2} is an envelope for n\mathcal{F}_{n}; in terms of the original data this is Fn(X)=X22F_{n}(X)=\|X\|_{2}^{2}, since y2=vec(XX)2=X22\|y\|_{2}=\|\mathrm{vec}(XX^{\prime})\|_{2}=\|X\|_{2}^{2}.

Define the Gaussian processes {Z^nm(f):fn}{Z^nHd(vu):u,vSd1}\{\widehat{Z}_{n}^{m}(f):f\in\mathcal{F}_{n}\}\equiv\{\widehat{Z}_{n}^{\prime}H_{d}^{\prime}(v\otimes u):u,v\in S^{d-1}\}, where Z^nX1,,XnN(0,Ω^n)\widehat{Z}_{n}\mid X_{1},\ldots,X_{n}\sim N(0,\widehat{\Omega}_{n}) with Ω^n=n1i=1nvech(XiXiΣ^n)vech(XiXiΣ^n)d(d+1)/2×d(d+1)/2\widehat{\Omega}_{n}=n^{-1}\sum_{i=1}^{n}\mathrm{vech}(X_{i}X_{i}^{\prime}-\widehat{\Sigma}_{n})\mathrm{vech}^{\prime}(X_{i}X_{i}^{\prime}-\widehat{\Sigma}_{n})\in\mathbb{R}^{d(d+1)/2\times d(d+1)/2}, and {GQ(f):fn}{ZHd(vu):u,vSd1}\{G_{Q}(f):f\in\mathcal{F}_{n}\}\equiv\{Z^{\prime}H_{d}^{\prime}(v\otimes u):u,v\in S^{d-1}\}, where ZN(0,Ω)Z\sim N(0,\Omega) with Ω=E[vech(XXΣ)vech(XXΣ)]d(d+1)/2×d(d+1)/2\Omega=\mathrm{E}[\mathrm{vech}(XX^{\prime}-\Sigma)\mathrm{vech}^{\prime}(XX^{\prime}-\Sigma)]\in\mathbb{R}^{d(d+1)/2\times d(d+1)/2}.

Without loss of generality, we can assume that tr(Ω)\mathrm{tr}(\Omega) and Ωop\|\Omega\|_{op} are finite since otherwise the statement is trivially true. Thus, under the assumptions of the theorem, the metric entropy integrals associated to the Gaussian processes Z^nm\widehat{Z}_{n}^{m} and GQG_{Q} are finite for every fixed nn and dd. Therefore by Lemma 17 there exist versions of Z^nm\widehat{Z}_{n}^{m} and GQG_{Q} (which we also denote by Z^nm\widehat{Z}_{n}^{m} and GQG_{Q}) that are almost surely bounded and almost surely uniformly continuous. The modulus of continuity condition (6) holds for these versions and, in particular, we can take rn=0r_{n}=0. By Theorem 6,

sups0|{Tns}{S^nopsX1,,Xn}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{T_{n}\leq s\right\}-\mathbb{P}\left\{\|\widehat{S}_{n}\|_{op}\leq s\mid X_{1},\ldots,X_{n}\right\}\Big{|}
FnP,3n1/3Var(GPn)+Fn𝟏{Fn>n1/3FnP,3}P,33FnP,33+EGPnnVar(GPn)+(supu,vSd1|(uv)Hd(ΩΩ^n)Hd(uv)|Var(GPn))1/3.\displaystyle\begin{split}&\quad{}\quad{}\quad{}\lesssim\frac{\|F_{n}\|_{P,3}}{\sqrt{n^{1/3}\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}+\frac{\|F_{n}\mathbf{1}\{F_{n}>n^{1/3}\|F_{n}\|_{P,3}\}\|_{P,3}^{3}}{\|F_{n}\|_{P,3}^{3}}+\frac{\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}}{\sqrt{n\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}}\\ &\quad{}\quad{}\quad{}\quad{}\quad{}\quad{}+\left(\frac{\sup_{u,v\in S^{d-1}}\big{|}(u\otimes v)^{\prime}H_{d}(\Omega-\widehat{\Omega}_{n})H_{d}^{\prime}(u\otimes v)\big{|}}{\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})}\right)^{1/3}.\end{split} (62)

In the remainder of the proof we derive upper bounds on the quantities on the right-hand side of the above display.

Note that

tr(Σ)=E[X22](a)E[X26]1/3=FnP,3Fn(X)ψ1(b)tr(Σ),\displaystyle\mathrm{tr}(\Sigma)=\mathrm{E}\big{[}\|X\|_{2}^{2}\big{]}\overset{(a)}{\leq}\mathrm{E}\big{[}\|X\|_{2}^{6}\big{]}^{1/3}=\|F_{n}\|_{P,3}\lesssim\|F_{n}(X)\|_{\psi_{1}}\overset{(b)}{\lesssim}\mathrm{tr}(\Sigma), (63)

where (a) follows from Hölder’s inequality and (b) holds because X=(X1,,Xd)dX=(X_{1},\ldots,X_{d})^{\prime}\in\mathbb{R}^{d} is sub-Gaussian with mean zero and covariance Σ\Sigma and therefore

Fn(X)ψ1=XXψ1k=1dXk2ψ1k=1dXkψ22k=1dVar(Xk)=tr(Σ).\displaystyle\|F_{n}(X)\|_{\psi_{1}}=\left\|X^{\prime}X\right\|_{\psi_{1}}\leq\sum_{k=1}^{d}\left\|X_{k}^{2}\right\|_{\psi_{1}}\leq\sum_{k=1}^{d}\left\|X_{k}\right\|_{\psi_{2}}^{2}\lesssim\sum_{k=1}^{d}\mathrm{Var}(X_{k})=\mathrm{tr}(\Sigma).

Next, note that uv22=(uv)(uv)=uuvv=1\|u\otimes v\|_{2}^{2}=(u^{\prime}\otimes v^{\prime})(u\otimes v)=u^{\prime}u\otimes v^{\prime}v=1 for all u,vSd1u,v\in S^{d-1}. Thus, we compute

EGPn=ESop\displaystyle\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}=\mathrm{E}\|S\|_{op} =E[supu,vSd1|vec(S)(uv)|]\displaystyle=\mathrm{E}\left[\sup_{u,v\in S^{d-1}}|\mathrm{vec}(S)^{\prime}(u\otimes v)|\right]
Evec(S)2(Evec(S)22)1/2=tr(HdΩHd).\displaystyle\leq\mathrm{E}\|\mathrm{vec}(S)\|_{2}\leq\left(\mathrm{E}\|\mathrm{vec}(S)\|_{2}^{2}\right)^{1/2}=\sqrt{\mathrm{tr}(H_{d}\Omega H_{d}^{\prime})}.

Also, since vec(A)=Hdvech(A)\mathrm{vec}(A)=H_{d}\mathrm{vech}(A) for all symmetric matrices AA, we have

tr(HdΩHd)=E[vec(XXΣ)vec(XXΣ)]=E[tr((XXΣ)(XXΣ))]\displaystyle\mathrm{tr}(H_{d}\Omega H_{d}^{\prime})=\mathrm{E}\left[\mathrm{vec}(XX^{\prime}-\Sigma)^{\prime}\mathrm{vec}(XX^{\prime}-\Sigma)\right]=\mathrm{E}\left[\mathrm{tr}\left((XX^{\prime}-\Sigma)(XX^{\prime}-\Sigma)\right)\right]
=E[tr(XXXX)]tr(Σ2)=E[X24]tr(Σ2)tr2(Σ)tr(Σ2),\displaystyle=\mathrm{E}\left[\mathrm{tr}\left(XX^{\prime}XX^{\prime}\right)\right]-\mathrm{tr}(\Sigma^{2})=\mathrm{E}[\|X\|_{2}^{4}]-\mathrm{tr}(\Sigma^{2})\lesssim\mathrm{tr}^{2}(\Sigma)-\mathrm{tr}(\Sigma^{2}),

where the last inequality follows from similar arguments as used to derive the upper bound in (63). Thus,

EGPntr2(Σ)tr(Σ2).\displaystyle\mathrm{E}\|G_{P}\|_{\mathcal{F}_{n}}\lesssim\sqrt{\mathrm{tr}^{2}(\Sigma)-\mathrm{tr}(\Sigma^{2})}. (64)
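The two Kronecker-product identities used in this computation, ‖u⊗v‖₂ = ‖u‖₂‖v‖₂ and vec(S)′(u⊗v) = v′Su for symmetric S, are easy to sanity-check numerically; the sketch below works with vec directly rather than the duplication matrix H_d (dimension and test matrix are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
u = rng.standard_normal(d); u /= np.linalg.norm(u)
v = rng.standard_normal(d); v /= np.linalg.norm(v)
S = rng.standard_normal((d, d)); S = S + S.T     # symmetric test matrix

# ||u (x) v||_2 = ||u||_2 ||v||_2 = 1 for unit vectors
assert np.isclose(np.linalg.norm(np.kron(u, v)), 1.0)

# vec(S)'(u (x) v) = v' S u, with vec stacking columns (Fortran order)
lhs = S.flatten(order="F") @ np.kron(u, v)
rhs = v @ S @ u
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```

Since vec(A) = H_d vech(A) for symmetric A, the check with vec is equivalent to the identity (u⊗v)′H_d vech(A) = v′Au invoked in the proof.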

Moreover, since (uv)Hdvech(A)=vAu(u\otimes v)^{\prime}H_{d}\mathrm{vech}(A)=v^{\prime}Au for all symmetric matrices AA and matching vectors u,vu,v, we have, for arbitrary fnf\in\mathcal{F}_{n},

Var(GP(f))\displaystyle\mathrm{Var}(G_{P}(f)) infu,vSd1Var(vec(S)(uv))\displaystyle\geq\inf_{u,v\in S^{d-1}}\mathrm{Var}\left(\mathrm{vec}(S)^{\prime}(u\otimes v)\right)
=infu,vSd1(uv)HdΩHd(uv)=infu,vSd1E[(u(XXΣ)v)2]\displaystyle=\inf_{u,v\in S^{d-1}}(u\otimes v)^{\prime}H_{d}\Omega H_{d}^{\prime}(u\otimes v)=\inf_{u,v\in S^{d-1}}\mathrm{E}\left[\big{(}u^{\prime}(XX^{\prime}-\Sigma)v\big{)}^{2}\right]
=infuSd1E[(Xu)4]Σu22=infuSd1Var((Xu)2)\displaystyle=\inf_{u\in S^{d-1}}\mathrm{E}[(X^{\prime}u)^{4}]-\|\Sigma u\|_{2}^{2}=\inf_{u\in S^{d-1}}\mathrm{Var}\big{(}(X^{\prime}u)^{2}\big{)}
κ.\displaystyle\geq\kappa. (65)

Combine eq. (64)–(65) with Lemma 7 to obtain

Var(GPn)(κ1+E[GPn/κ])2(κ2κ+tr2(Σ)tr(Σ2))2(κ2κ+tr(Σ))2.\displaystyle\mathrm{Var}(\|G_{P}\|_{\mathcal{F}_{n}})\gtrsim\left(\frac{\kappa}{1+\mathrm{E}[\|G_{P}\|_{\mathcal{F}_{n}}/\kappa]}\right)^{2}\gtrsim\left(\frac{\kappa^{2}}{\kappa+\sqrt{\mathrm{tr}^{2}(\Sigma)-\mathrm{tr}(\Sigma^{2})}}\right)^{2}\gtrsim\left(\frac{\kappa^{2}}{\kappa+\mathrm{tr}(\Sigma)}\right)^{2}. (66)

Next, since (again!) (uv)Hdvech(A)=vAu(u\otimes v)^{\prime}H_{d}\mathrm{vech}(A)=v^{\prime}Au for all symmetric matrices AA and matching vectors u,vu,v, we compute

supu,vSd1|(uv)Hd(ΩΩ^n)Hd(uv)|\displaystyle\sup_{u,v\in S^{d-1}}\left|(u\otimes v)^{\prime}H_{d}(\Omega-\widehat{\Omega}_{n})H_{d}^{\prime}(u\otimes v)\right|
supu,vSd1|1ni=1n(Xiu)2(Xiv)2E[(Xiu)2(Xiv)2]|+supu,vSd1u(Σ^Σ)v\displaystyle\quad\leq\sup_{u,v\in S^{d-1}}\left|\frac{1}{n}\sum_{i=1}^{n}(X_{i}^{\prime}u)^{2}(X_{i}^{\prime}v)^{2}-\mathrm{E}[(X_{i}^{\prime}u)^{2}(X_{i}^{\prime}v)^{2}]\right|+\sup_{u,v\in S^{d-1}}u^{\prime}(\widehat{\Sigma}-\Sigma)v
=Op(r(Σ)Σop2((logen)2r(Σ)n(logen)2r(Σ)n)),\displaystyle\quad=O_{p}\left(\mathrm{r}(\Sigma)\|\Sigma\|_{op}^{2}\left(\sqrt{\frac{(\log en)^{2}\mathrm{r}(\Sigma)}{n}}\vee\frac{(\log en)^{2}\mathrm{r}(\Sigma)}{n}\right)\right), (67)

where the last line holds since by Markov’s inequality and Lemma 19,

supu,vSd1|1ni=1n(Xiu)2(Xiv)2E[(Xiu)2(Xiv)2]|\displaystyle\sup_{u,v\in S^{d-1}}\left|\frac{1}{n}\sum_{i=1}^{n}(X_{i}^{\prime}u)^{2}(X_{i}^{\prime}v)^{2}-\mathrm{E}[(X_{i}^{\prime}u)^{2}(X_{i}^{\prime}v)^{2}]\right|
=Op(r(Σ)Σop2((logen)2r(Σ)n(logen)2r(Σ)n)),\displaystyle\quad\quad=O_{p}\left(\mathrm{r}(\Sigma)\|\Sigma\|_{op}^{2}\left(\sqrt{\frac{(\log en)^{2}\mathrm{r}(\Sigma)}{n}}\vee\frac{(\log en)^{2}\mathrm{r}(\Sigma)}{n}\right)\right),

and by Theorem 4 in Koltchinskii and Lounici (2017)

supu,vSd1u(Σ^Σ)v=Op(Σop(r(Σ)nr(Σ)n)).\displaystyle\sup_{u,v\in S^{d-1}}u^{\prime}(\widehat{\Sigma}-\Sigma)v=O_{p}\left(\|\Sigma\|_{op}\left(\sqrt{\frac{\mathrm{r}(\Sigma)}{n}}\vee\frac{\mathrm{r}(\Sigma)}{n}\right)\right).

Lastly, for r>0r>0, by the upper and lower bounds in (63),

Fn𝟏{Fn>n1/3FnP,3}P,33FnP,33E[Fn3+r]nr/3E[Fn3]r/3+1=FnP,3+r3+rnr/3FnP,33+rnr/3.\displaystyle\frac{\|F_{n}\mathbf{1}\{F_{n}>n^{1/3}\|F_{n}\|_{P,3}\}\|_{P,3}^{3}}{\|F_{n}\|_{P,3}^{3}}\leq\frac{\mathrm{E}[F_{n}^{3+r}]}{n^{r/3}\mathrm{E}[F_{n}^{3}]^{r/3+1}}=\frac{\|F_{n}\|_{P,3+r}^{3+r}}{n^{r/3}\|F_{n}\|_{P,3}^{3+r}}\lesssim n^{-r/3}. (68)
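The truncation bound (68) rests on the pointwise Markov-type inequality F³𝟏{F>t} ≤ F^{3+r}/t^r with t = n^{1/3}‖F‖_{P,3}, so it holds exactly for empirical moments as well; a quick numerical check (the sample distribution and the choices of n and r are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
F = np.abs(rng.standard_normal(200_000)) + 0.1   # a positive "envelope" sample
n, r = 50, 1.0
t = n ** (1 / 3) * np.mean(F ** 3) ** (1 / 3)    # truncation level n^{1/3} ||F||_{P,3}

# left side: normalized truncated third moment; right side: the bound in (68)
lhs = np.mean(F ** 3 * (F > t)) / np.mean(F ** 3)
rhs = np.mean(F ** (3 + r)) / (n ** (r / 3) * np.mean(F ** 3) ** (r / 3 + 1))
print(lhs, rhs)
assert lhs <= rhs + 1e-12
```

Since F³𝟏{F>t} ≤ F^{3+r}/t^r holds sample point by sample point, the inequality between the empirical averages is deterministic, not a Monte Carlo artifact.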

Now, combining (62)–(68), we obtain for r=1r=1 and under Assumption 3,

sups0|{Tns}{S^nopsX1,,Xn}|\displaystyle\sup_{s\geq 0}\Big{|}\mathbb{P}\left\{T_{n}\leq s\right\}-\mathbb{P}\left\{\|\widehat{S}_{n}\|_{op}\leq s\mid X_{1},\ldots,X_{n}\right\}\Big{|}
tr(Σ)n1/6κ+tr2(Σ)n1/6κ2+n1/3+tr2(Σ)tr(Σ2)n1/2κ+tr2(Σ)tr(Σ2)n1/2κ2\displaystyle\quad{}\lesssim\frac{\mathrm{tr}(\Sigma)}{n^{1/6}\kappa}+\frac{\mathrm{tr}^{2}(\Sigma)}{n^{1/6}\kappa^{2}}+n^{-1/3}+\frac{\sqrt{\mathrm{tr}^{2}(\Sigma)-\mathrm{tr}(\Sigma^{2})}}{n^{1/2}\kappa}+\frac{\mathrm{tr}^{2}(\Sigma)-\mathrm{tr}(\Sigma^{2})}{n^{1/2}\kappa^{2}}
+Op((r(Σ)Σop2((logen)2r(Σ)n(logen)2r(Σ)n))1/3(κ+tr(Σ)κ2)2/3)\displaystyle\quad{}\quad{}+O_{p}\left(\left(\mathrm{r}(\Sigma)\|\Sigma\|_{op}^{2}\left(\sqrt{\frac{(\log en)^{2}\mathrm{r}(\Sigma)}{n}}\vee\frac{(\log en)^{2}\mathrm{r}(\Sigma)}{n}\right)\right)^{1/3}\left(\frac{\kappa+\mathrm{tr}(\Sigma)}{\kappa^{2}}\right)^{2/3}\right)
Σopκr(Σ)n1/6Σop2κ2r2(Σ)n1/6\displaystyle\quad{}\lesssim\frac{\|\Sigma\|_{op}}{\kappa}\frac{r(\Sigma)}{n^{1/6}}\vee\frac{\|\Sigma\|_{op}^{2}}{\kappa^{2}}\frac{r^{2}(\Sigma)}{n^{1/6}}
+Op((r1/3(Σ)Σop2/3κ2/3r(Σ)Σop4/3κ4/3)((logen)2r(Σ)n(logen)2r(Σ)n)1/3).\displaystyle\quad{}\quad{}+O_{p}\left(\left(\frac{r^{1/3}(\Sigma)\|\Sigma\|_{op}^{2/3}}{\kappa^{2/3}}\vee\frac{r(\Sigma)\|\Sigma\|_{op}^{4/3}}{\kappa^{4/3}}\right)\left(\sqrt{\frac{(\log en)^{2}\mathrm{r}(\Sigma)}{n}}\vee\frac{(\log en)^{2}\mathrm{r}(\Sigma)}{n}\right)^{1/3}\right).

To complete the proof, adjust some constants. ∎

Proof of Proposition 3.

Let n={vv,u:uS}\mathcal{F}_{n}=\{v\mapsto\langle v,u\rangle_{\mathcal{H}}:u\in S^{*}\}. Further, let QnQ_{n} be the empirical measure based on the Vi=(T+λ)2T(Yif0(Xi))kXiV_{i}=(T+\lambda)^{-2}T\big{(}Y_{i}-f_{0}(X_{i})\big{)}k_{X_{i}}, 1in1\leq i\leq n and QQ the pushforward measure of PP under the map (Y,X)V=(T+λ)2T(Yf0(X))kX(Y,X)\mapsto V=(T+\lambda)^{-2}T\big{(}Y-f_{0}(X)\big{)}k_{X}. Since f0f_{0} is the best approximation in square loss, E[Yf0(X)]=0\mathrm{E}[Y-f_{0}(X)]=0 and VV and the ViV_{i}’s have mean zero. Hence,

nf^bcnf0=supuS|1ni=1nVi,u|+ΘnnQnQn+Θn,\displaystyle\sqrt{n}\|\widehat{f}^{\mathrm{bc}}_{n}-f_{0}\|_{\infty}=\sup_{u\in S^{*}}\left|\left\langle\frac{1}{\sqrt{n}}\sum_{i=1}^{n}V_{i},u\right\rangle_{\mathcal{H}}\right|+\Theta_{n}\equiv\sqrt{n}\|Q_{n}-Q\|_{\mathcal{F}_{n}}+\Theta_{n},

where Θn\Theta_{n} is a remainder term which satisfies |Θn|nRn|\Theta_{n}|\leq\sqrt{n}\|R_{n}\|_{\infty}. We plan to apply Theorem 9 to the far right-hand side of the above display. To this end, we need to (i) find an envelope for the function class n\mathcal{F}_{n}, (ii) establish compactness of n\mathcal{F}_{n}, (iii) construct certain Gaussian processes (to be defined below) with continuous covariance functions, and (iv) establish almost sure uniform continuity of these processes w.r.t. their intrinsic standard deviation metrics.

Recall that k(x,y)κ\sqrt{k(x,y)}\leq\kappa for all x,ySx,y\in S and that v,x=v,kx\langle v,x^{*}\rangle_{\mathcal{H}}=\langle v,k_{x}\rangle_{\mathcal{H}} for all vv\in\mathcal{H} and xSx\in S. Hence, |v,u|vsupxSkxκv|\langle v,u\rangle_{\mathcal{H}}|\leq\|v\|_{\mathcal{H}}\sup_{x\in S}\|k_{x}\|_{\mathcal{H}}\leq\kappa\|v\|_{\mathcal{H}} for all vv\in\mathcal{H} and uSu\in S^{*}. Thus, vκvv\mapsto\kappa\|v\|_{\mathcal{H}} is an envelope of the function class n\mathcal{F}_{n}.
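This envelope bound is the reproducing property combined with Cauchy-Schwarz, |v(x)| = |⟨v, k_x⟩_ℋ| ≤ ‖v‖_ℋ √k(x,x); a small sketch with a Gaussian kernel, for which κ = 1 (the centers and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

def k(x, y):                       # Gaussian kernel, so kappa = sup_x sqrt(k(x, x)) = 1
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2)

centers = rng.uniform(-1, 1, 8)
a = rng.standard_normal(8)         # v = sum_i a_i k_{x_i} lies in the RKHS
K = k(centers, centers)
v_norm = np.sqrt(a @ K @ a)        # ||v||_H^2 = a' K a

grid = np.linspace(-3, 3, 601)
v_vals = k(grid, centers) @ a      # v(x) = <v, k_x>_H by the reproducing property
print(np.max(np.abs(v_vals)), v_norm)
assert np.max(np.abs(v_vals)) <= v_norm + 1e-10
```

The sup of |v| over the grid never exceeds κ‖v‖_ℋ, which is exactly the envelope property used above.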

Next, let ZZ and Z^nm\widehat{Z}_{n}^{m} be centered Gaussian random elements on \mathcal{H} with covariance operators Ω\Omega and Ω^nm\widehat{\Omega}_{n}^{m}, respectively, i.e. for all u,vu,v\in\mathcal{H},

Cov(Z,u,Z,v)=𝒞(u,v),where𝒞(u,v)=Ωu,v,\displaystyle\mathrm{Cov}\big{(}\langle Z,u\rangle_{\mathcal{H}},\langle Z,v\rangle_{\mathcal{H}}\big{)}=\mathcal{C}(u,v),\quad\quad\text{where}\quad\quad\mathcal{C}(u,v)=\langle\Omega u,v\rangle_{\mathcal{H}},
Cov(Z^nm,u,Z^nm,v)=𝒞^nm(u,v),where𝒞^nm(u,v)=Ω^nmu,v.\displaystyle\mathrm{Cov}\big{(}\langle\widehat{Z}_{n}^{m},u\rangle_{\mathcal{H}},\langle\widehat{Z}_{n}^{m},v\rangle_{\mathcal{H}}\big{)}=\widehat{\mathcal{C}}_{n}^{m}(u,v),\quad\quad\text{where}\quad\quad\mathcal{\widehat{C}}_{n}^{m}(u,v)=\langle\widehat{\Omega}_{n}^{m}u,v\rangle_{\mathcal{H}}.

By Cauchy-Schwarz, these covariance functions are continuous w.r.t. \|\cdot\|_{\mathcal{H}}. Denote the standard deviation metrics associated with the above Gaussian random elements by d𝒞d_{\mathcal{C}} and d𝒞^nmd_{\widehat{\mathcal{C}}_{n}^{m}}. Then, for all x,ySx,y\in S,

d𝒞2(x,y)\displaystyle d_{\mathcal{C}}^{2}(x^{*},y^{*}) =E[(Z(x)Z(y))2]=E[Z,xy2]EZ2xy2=tr(Ω)dk2(x,y),\displaystyle=\mathrm{E}\big{[}(Z(x^{*})-Z(y^{*}))^{2}\big{]}=\mathrm{E}\big{[}\langle Z,x^{*}-y^{*}\rangle_{\mathcal{H}}^{2}\big{]}\leq\mathrm{E}\|Z\|_{\mathcal{H}}^{2}\|x^{*}-y^{*}\|_{\mathcal{H}}^{2}=\mathrm{tr}(\Omega)d_{k}^{2}(x,y),

and, completely analogously,

d𝒞^nm2(x,y)tr(Ω^nm)dk2(x,y).\displaystyle d_{\widehat{\mathcal{C}}_{n}^{m}}^{2}(x^{*},y^{*})\leq\mathrm{tr}(\widehat{\Omega}_{n}^{m})d_{k}^{2}(x,y).

Since (S,dk)(S,d_{k}) is compact and tr(Ω)tr(Ω^n)<\mathrm{tr}(\Omega)\vee\mathrm{tr}(\widehat{\Omega}_{n})<\infty by assumption, these inequalities imply that both SS^{*} and (by continuity of the inner product) n\mathcal{F}_{n} are compact w.r.t. the standard deviation metrics d𝒞d_{\mathcal{C}} and d𝒞^nmd_{\widehat{\mathcal{C}}_{n}^{m}}. Moreover, since 0N(S,dk,ε)dε<\int_{0}^{\infty}\sqrt{N(S,d_{k},\varepsilon)}d\varepsilon<\infty, these inequalities also imply that 0N(n,d𝒞,ε)dε<\int_{0}^{\infty}\sqrt{N(\mathcal{F}_{n},d_{\mathcal{C}},\varepsilon)}d\varepsilon<\infty and 0N(n,d𝒞^nm,ε)dε<\int_{0}^{\infty}\sqrt{N(\mathcal{F}_{n},d_{\widehat{\mathcal{C}}_{n}^{m}},\varepsilon)}d\varepsilon<\infty. Hence, by Lemma 17 there exist versions of ZZ and Z^nm\widehat{Z}_{n}^{m} that are almost surely bounded and have almost surely uniformly d𝒞d_{\mathcal{C}}- and d𝒞^nmd_{\widehat{\mathcal{C}}_{n}^{m}}-continuous sample paths. In the following, we keep using ZZ and Z^nm\widehat{Z}_{n}^{m} to denote these versions.

Hence, Theorem 9 applies and we have

supα(0,1)|{nf^bcnf0cnm(α)}α|κE[V3]1/3n1/3Var(Zn)+E[V3𝟏{V>n1/3E[V3]1/3}]E[V3]+EZnnVar(Zn)+infδ>0{(δVar(Zn))1/3+{ΩΩ^nmop>δ}}+infη>0{ηVar(Zn)+{nRn>η}}.\displaystyle\begin{split}&\sup_{\alpha\in(0,1)}\Big{|}\mathbb{P}\left\{\sqrt{n}\|\widehat{f}^{\mathrm{bc}}_{n}-f_{0}\|_{\infty}\leq c_{n}^{m}(\alpha)\right\}-\alpha\Big{|}\\ &\quad{}\quad{}\lesssim\frac{\kappa\mathrm{E}[\|V\|_{\mathcal{H}}^{3}]^{1/3}}{\sqrt{n^{1/3}\mathrm{Var}(\|Z\|_{\mathcal{F}_{n}})}}+\frac{\mathrm{E}[\|V\|_{\mathcal{H}}^{3}\mathbf{1}\{\|V\|_{\mathcal{H}}>n^{1/3}\mathrm{E}[\|V\|_{\mathcal{H}}^{3}]^{1/3}\}]}{\mathrm{E}[\|V\|_{\mathcal{H}}^{3}]}+\frac{\mathrm{E}\|Z\|_{\mathcal{F}_{n}}}{\sqrt{n\mathrm{Var}(\|Z\|_{\mathcal{F}_{n}})}}\\ &\quad{}\quad{}\quad{}+\inf_{\delta>0}\left\{\left(\frac{\delta}{\mathrm{Var}(\|Z\|_{\mathcal{F}_{n}})}\right)^{1/3}+\mathbb{P}\left\{\big{\|}\Omega-\widehat{\Omega}_{n}^{m}\big{\|}_{op}>\delta\right\}\right\}\\ &\quad{}\quad{}\quad{}+\inf_{\eta>0}\left\{\frac{\eta}{\sqrt{\mathrm{Var}(\|Z\|_{\mathcal{F}_{n}})}}+\mathbb{P}\left\{\sqrt{n}\|R_{n}\|_{\infty}>\eta\right\}\right\}.\end{split} (69)

In the remainder of the proof we derive upper bounds on the quantities on the right-hand side of the above display.

First, from the proof of Lemma 20 we know that

V=(T+λ)1(Yf0(X))kXλ(T+λ)2(Yf0(X))kX.\displaystyle V=(T+\lambda)^{-1}\big{(}Y-f_{0}(X)\big{)}k_{X}-\lambda(T+\lambda)^{-2}\big{(}Y-f_{0}(X)\big{)}k_{X}.

Hence, by the proof of Lemma 22

V\displaystyle\|V\|_{\mathcal{H}} |Yf0(X)|(T+λ)1kX+λ|Yf0(X)|(T+λ)2kX\leq\left|Y-f_{0}(X)\right|\left\|(T+\lambda)^{-1}k_{X}\right\|_{\mathcal{H}}+\lambda\left|Y-f_{0}(X)\right|\left\|(T+\lambda)^{-2}k_{X}\right\|_{\mathcal{H}}
λ12κ(B+κf0)a.s.\displaystyle\leq\lambda^{-1}2\kappa(B+\kappa\|f_{0}\|_{\mathcal{H}})\quad a.s.

and

EV2\displaystyle\mathrm{E}\|V\|_{\mathcal{H}}^{2} 2σ02E[(T+λ)1kX2]+2λ2σ02E[(T+λ)2kX2]\leq 2\sigma_{0}^{2}\mathrm{E}\left[\left\|(T+\lambda)^{-1}k_{X}\right\|_{\mathcal{H}}^{2}\right]+2\lambda^{2}\sigma_{0}^{2}\mathrm{E}\left[\left\|(T+\lambda)^{-2}k_{X}\right\|_{\mathcal{H}}^{2}\right]
2σ02tr((T+λ)2T)+2σ02E[(T+λ)1kX2]\displaystyle\leq 2\sigma_{0}^{2}\mathrm{tr}\left((T+\lambda)^{-2}T\right)+2\sigma_{0}^{2}\mathrm{E}\left[\left\|(T+\lambda)^{-1}k_{X}\right\|_{\mathcal{H}}^{2}\right]
4σ02tr((T+λ)2T).\displaystyle\leq 4\sigma_{0}^{2}\mathrm{tr}\left((T+\lambda)^{-2}T\right).

Therefore,

EV3\displaystyle\mathrm{E}\|V\|_{\mathcal{H}}^{3} λ18σ02tr((T+λ)2T)κ(B+κf0)λ1σ¯3𝔫22(λ),\displaystyle\leq\lambda^{-1}8\sigma_{0}^{2}\mathrm{tr}\left((T+\lambda)^{-2}T\right)\kappa(B+\kappa\|f_{0}\|_{\mathcal{H}})\lesssim\lambda^{-1}\bar{\sigma}^{3}\mathfrak{n}_{2}^{2}(\lambda), (70)

where σ¯2σ02κ2(B+κf0)21\bar{\sigma}^{2}\geq\sigma_{0}^{2}\vee\kappa^{2}(B+\kappa\|f_{0}\|_{\mathcal{H}})^{2}\vee 1 and 𝔫α2(λ)=tr((T+λ)2αT)\mathfrak{n}_{\alpha}^{2}(\lambda)=\mathrm{tr}\left((T+\lambda)^{-2\alpha}T\right), α0\alpha\in\mathbb{N}_{0}.

Second, by Cauchy-Schwarz,

EZn=E[supuS|Z,u|]EZsupuSutr(Ω)κ<.\displaystyle\mathrm{E}\|Z\|_{\mathcal{F}_{n}}=\mathrm{E}\left[\sup_{u\in S^{*}}\big{|}\langle Z,u\rangle_{\mathcal{H}}\big{|}\right]\leq\mathrm{E}\|Z\|_{\mathcal{H}}\sup_{u\in S^{*}}\|u\|_{\mathcal{H}}\leq\sqrt{\mathrm{tr}(\Omega)}\kappa<\infty. (71)

Third, for r>0r>0,

E[V3𝟏{V>n1/3E[V3]1/3}]E[V3]\displaystyle\frac{\mathrm{E}[\|V\|_{\mathcal{H}}^{3}\mathbf{1}\{\|V\|_{\mathcal{H}}>n^{1/3}\mathrm{E}[\|V\|_{\mathcal{H}}^{3}]^{1/3}\}]}{\mathrm{E}[\|V\|_{\mathcal{H}}^{3}]} E[V3+r]nr/3E[V3]r/3+1\displaystyle\leq\frac{\mathrm{E}[\|V\|_{\mathcal{H}}^{3+r}]}{n^{r/3}\mathrm{E}[\|V\|_{\mathcal{H}}^{3}]^{r/3+1}}
=nr/3(E[V3+r]1/(3+r)E[V3]1/3)3+r\displaystyle=n^{-r/3}\left(\frac{\mathrm{E}[\|V\|_{\mathcal{H}}^{3+r}]^{1/(3+r)}}{\mathrm{E}[\|V\|_{\mathcal{H}}^{3}]^{1/3}}\right)^{3+r}
nr/3(λ12κ(B+κf0))r.\displaystyle\lesssim n^{-r/3}\left(\lambda^{-1}2\kappa(B+\kappa\|f_{0}\|_{\mathcal{H}})\right)^{r}. (72)

Fourth, for uSu\in S^{*} arbitrary,

Var(Z(u))infuSVar(Z,u)=infuSΩu,uωS>0.\displaystyle\mathrm{Var}\big{(}Z(u)\big{)}\geq\inf_{u\in S^{*}}\mathrm{Var}\big{(}\langle Z,u\rangle_{\mathcal{H}}\big{)}=\inf_{u\in S^{*}}\langle\Omega u,u\rangle_{\mathcal{H}}\geq\omega_{S}>0. (73)

Combined with (71) and Lemma 7 this lower bound yields

Var(Zn)(ωS1+EZn/ωS)2(ωS2ωS+tr(Ω)κ)2ωS4tr(Ω)κ2.\displaystyle\mathrm{Var}(\|Z\|_{\mathcal{F}_{n}})\gtrsim\left(\frac{\omega_{S}}{1+\mathrm{E}\|Z\|_{\mathcal{F}_{n}}/\omega_{S}}\right)^{2}\gtrsim\left(\frac{\omega_{S}^{2}}{\omega_{S}+\sqrt{\mathrm{tr}(\Omega)}\kappa}\right)^{2}\gtrsim\frac{\omega_{S}^{4}}{\mathrm{tr}(\Omega)\kappa^{2}}. (74)

Fifth, by Lemma 22, with probability at least 1δ1-\delta,

ΩΩ^nmop\displaystyle\big{\|}\Omega-\widehat{\Omega}_{n}^{m}\big{\|}_{op} ΩΩ^nop+Ω^nΩ^nmop\displaystyle\leq\big{\|}\Omega-\widehat{\Omega}_{n}\big{\|}_{op}+\big{\|}\widehat{\Omega}_{n}-\widehat{\Omega}_{n}^{m}\big{\|}_{op}
Ω^nΩ^nmop+T3(T+λ)4op(κ4+κ2𝔫12(λ)+σ¯4)log(2/δ)nλ2.\displaystyle\lesssim\big{\|}\widehat{\Omega}_{n}-\widehat{\Omega}_{n}^{m}\big{\|}_{op}+\|T^{3}(T+\lambda)^{-4}\|_{op}\sqrt{\frac{(\kappa^{4}+\kappa^{2}\mathfrak{n}_{1}^{2}(\lambda)+\bar{\sigma}^{4})\log(2/\delta)}{n\lambda^{2}}}. (75)

Finally, Lemma 21 combined with (69)–(75), r=1r=1 and δ(0,1/n)\delta\in(0,1/n) implies that, with probability at least 1δ1-\delta,

supα(0,1)|{nf^bcnf0cnm(α)}α|κ(σ¯3𝔫22(λ)nλ)1/3tr(Ω)κωS2+δ+σ¯n1/3λ+tr(Ω)κ2nωS2+{Ω^nΩ^nmop>T3(T+λ)4op(κ4+κ2𝔫12(λ)+σ¯4)log(2/δ)nλ2}+(T3(T+λ)4op(κ4+κ2𝔫12(λ)+σ¯4)log(2/δ)nλ2tr(Ω)κ2ωS4)1/3+δ+(σ¯2𝔫12(λ)nλ2σ¯2nλ2)(κ4log3(2/δ)nλκ2log2(2/δ))tr(Ω)κωS2+(κ4log2(2/δ)nnλ2)κ(T+λ)2f0tr(Ω)κωS2\displaystyle\begin{split}&\sup_{\alpha\in(0,1)}\Big{|}\mathbb{P}\left\{\sqrt{n}\|\widehat{f}^{\mathrm{bc}}_{n}-f_{0}\|_{\infty}\leq c_{n}^{m}(\alpha)\right\}-\alpha\Big{|}\\ &\quad{}\quad{}\lesssim\kappa\left(\frac{\bar{\sigma}^{3}\mathfrak{n}_{2}^{2}(\lambda)}{\sqrt{n}\lambda}\right)^{1/3}\frac{\sqrt{\mathrm{tr}(\Omega)}\kappa}{\omega_{S}^{2}}+\delta+\frac{\bar{\sigma}}{n^{1/3}\lambda}+\frac{\mathrm{tr}(\Omega)\kappa^{2}}{\sqrt{n}\omega_{S}^{2}}\\ &\quad{}\quad{}\quad{}+\mathbb{P}\left\{\big{\|}\widehat{\Omega}_{n}-\widehat{\Omega}_{n}^{m}\big{\|}_{op}>\|T^{3}(T+\lambda)^{-4}\|_{op}\sqrt{\frac{(\kappa^{4}+\kappa^{2}\mathfrak{n}_{1}^{2}(\lambda)+\bar{\sigma}^{4})\log(2/\delta)}{n\lambda^{2}}}\right\}\\ &\quad\quad\quad+\left(\|T^{3}(T+\lambda)^{-4}\|_{op}\sqrt{\frac{(\kappa^{4}+\kappa^{2}\mathfrak{n}_{1}^{2}(\lambda)+\bar{\sigma}^{4})\log(2/\delta)}{n\lambda^{2}}}\frac{\mathrm{tr}(\Omega)\kappa^{2}}{\omega_{S}^{4}}\right)^{1/3}+\delta\\ &\quad{}\quad{}\quad{}+\left(\sqrt{\frac{\bar{\sigma}^{2}\mathfrak{n}_{1}^{2}(\lambda)}{n\lambda^{2}}}\vee\frac{\bar{\sigma}^{2}}{n\lambda^{2}}\right)\left(\frac{\kappa^{4}\log^{3}(2/\delta)}{\sqrt{n}\lambda}\vee\kappa^{2}\log^{2}(2/\delta)\right)\frac{\sqrt{\mathrm{tr}(\Omega)}\kappa}{\omega_{S}^{2}}\\ &\quad\quad\quad+\left(\frac{\kappa^{4}\log^{2}(2/\delta)}{\sqrt{n}}\vee\sqrt{n}\lambda^{2}\right)\kappa\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}\frac{\sqrt{\mathrm{tr}(\Omega)}\kappa}{\omega_{S}^{2}}\end{split} (76)
=(a)o(1)+{Ω^nΩ^nmop>T3(T+λ)4op(κ4+κ2𝔫12(λ)+σ¯4)lognnλ2},\displaystyle\overset{(a)}{=}o(1)+\mathbb{P}\left\{\big{\|}\widehat{\Omega}_{n}-\widehat{\Omega}_{n}^{m}\big{\|}_{op}>\|T^{3}(T+\lambda)^{-4}\|_{op}\sqrt{\frac{(\kappa^{4}+\kappa^{2}\mathfrak{n}_{1}^{2}(\lambda)+\bar{\sigma}^{4})\log n}{n\lambda^{2}}}\right\},

where (a) holds provided that the sample size nn, the regularization parameter λ\lambda, the kernel kk, and the operators Ω\Omega and TT are such that

(i)σ¯2κ4(logn)3𝔫1(λ)tr(Ω)T3(T+λ)4op=o(nλωS2)(ii)κ6(logn)2tr(Ω)(T+λ)2f0=o(nωS2)(iii)λ2κ2tr(Ω)(T+λ)2f0=o(ωS2)(iv)σ¯2κ2logntr(Ω)𝔫1(λ)T3(T+λ)4op=o(nλωS4)(v)σ¯3κ3(tr(Ω))3/2𝔫22(λ)=o(nλωS6)(vi)σ¯=o(n1/3λ),\displaystyle\begin{split}(i)&\quad\quad\bar{\sigma}^{2}\kappa^{4}(\log n)^{3}\mathfrak{n}_{1}(\lambda)\sqrt{\mathrm{tr}(\Omega)}\|T^{3}(T+\lambda)^{-4}\|_{op}=o(\sqrt{n}\lambda\omega_{S}^{2})\\ (ii)&\quad\quad\kappa^{6}(\log n)^{2}\sqrt{\mathrm{tr}(\Omega)}\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}=o(\sqrt{n}\omega_{S}^{2})\\ (iii)&\quad\quad\lambda^{2}\kappa^{2}\sqrt{\mathrm{tr}(\Omega)}\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}=o(\omega_{S}^{2})\\ (iv)&\quad\quad\bar{\sigma}^{2}\kappa^{2}\sqrt{\log n}\mathrm{tr}(\Omega)\mathfrak{n}_{1}(\lambda)\|T^{3}(T+\lambda)^{-4}\|_{op}=o(\sqrt{n}\lambda\omega_{S}^{4})\\ (v)&\quad\quad\bar{\sigma}^{3}\kappa^{3}\big{(}\mathrm{tr}(\Omega)\big{)}^{3/2}\mathfrak{n}_{2}^{2}(\lambda)=o(\sqrt{n}\lambda\omega_{S}^{6})\\ (vi)&\quad\quad\bar{\sigma}=o(n^{1/3}\lambda),\end{split} (77)

where σ¯2σ02κ2(B+κf0)21\bar{\sigma}^{2}\geq\sigma_{0}^{2}\vee\kappa^{2}(B+\kappa\|f_{0}\|_{\mathcal{H}})^{2}\vee 1 and 𝔫β2(λ)=tr((T+λ)2βT)\mathfrak{n}_{\beta}^{2}(\lambda)=\mathrm{tr}\left((T+\lambda)^{-2\beta}T\right), β0\beta\in\mathbb{N}_{0}.

Simplifying these rates is beyond the scope of this illustrative example. The interested reader may consult Appendix H in Singh and Vijaykumar (2023) and Sections 6 and 7 in Lopes (2022b) for potentially useful results. If the Hilbert space \mathcal{H} is finite dimensional and TT is invertible, then these rates are satisfied for λ0\lambda\rightarrow 0 and n1/3λn^{1/3}\lambda\rightarrow\infty (the exact rates feature an additional (logn)c(\log n)^{c} factor for some c1c\geq 1). ∎

Appendix F Proofs of the results in Section B

F.1 Proofs of Lemmas 1, 4, and 5

Proof of Lemma 1.

For s,ts,t\in\mathbb{R} and λ>0\lambda>0 arbitrary, define

gs,λ(t):=((1+stλ)1)0andgs,0(t):=𝟏(,s](t).\displaystyle g_{s,\lambda}(t):=\left(\left(1+\frac{s-t}{\lambda}\right)\wedge 1\right)\vee 0\quad{}\quad{}\mathrm{and}\quad{}\quad{}g_{s,0}(t):=\mathbf{1}_{(-\infty,s]}(t). (78)
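A quick numerical check of the sandwich g_{s,0} ≤ g_{s,λ} ≤ g_{s+λ,0} that underlies the next chain of inequalities (grid and parameter values are arbitrary):

```python
import numpy as np

def g(s, lam, t):
    """g_{s,lam}(t) = min(1 + (s - t)/lam, 1) clipped below at 0 for lam > 0;
    the indicator 1{t <= s} for lam = 0."""
    if lam == 0:
        return (t <= s).astype(float)
    return np.clip(1 + (s - t) / lam, 0.0, 1.0)

t = np.linspace(-5, 5, 2001)
s, lam = 0.3, 0.5
# g_{s,0} <= g_{s,lam} <= g_{s+lam,0}: the ramp interpolates the two indicators
assert np.all(g(s, 0, t) <= g(s, lam, t) + 1e-12)
assert np.all(g(s, lam, t) <= g(s + lam, 0, t) + 1e-12)
print("sandwich holds on the grid")
```

The ramp g_{s,λ} equals 1 on (-∞, s], decays linearly to 0 on [s, s+λ], and vanishes beyond, which is exactly the property exploited when it is shifted and convolved with the mollifier below.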

Since gs,0gs,λgs+λ,0g_{s,0}\leq g_{s,\lambda}\leq g_{s+\lambda,0} for all ss\in\mathbb{R} and λ>0\lambda>0, it follows that (draw a sketch!)

gs,0gs+λ,λgs+λ,λϱλgs+2λ,λgs+3λ,0,\displaystyle g_{s,0}\leq g_{s+\lambda,\lambda}\leq g_{s+\lambda,\lambda}\ast\varrho_{\lambda}\leq g_{s+2\lambda,\lambda}\leq g_{s+3\lambda,0}, (79)

where ϱλ():=λ1ϱ(λ1)\varrho_{\lambda}(\cdot):=\lambda^{-1}\varrho\left(\>\cdot\>\lambda^{-1}\right) and

ϱ(t)=C0exp(1t21)𝟏[1,1](t),\displaystyle\varrho(t)=C_{0}\exp\left(\frac{1}{t^{2}-1}\right)\mathbf{1}_{[-1,1]}(t),

where $C_{0}>0$ is an absolute constant such that $\int\varrho(t)dt=1$. Since $\varrho\in C_{c}^{\infty}(\mathbb{R})$ with support $[-1,1]$, we easily verify that the map $t\mapsto(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(t)$ is infinitely differentiable and its $k$th derivative satisfies

|Dk(gs+λ,λϱλ)(t)|=|(gs+λ,λ(Dkϱλ))(t)|Ckλk𝟏[s,s+3λ](t),\displaystyle\left|D^{k}\left(g_{s+\lambda,\lambda}\ast\varrho_{\lambda}\right)(t)\right|=\left|\left(g_{s+\lambda,\lambda}\ast(D^{k}\varrho_{\lambda})\right)(t)\right|\leq C_{k}\lambda^{-k}\mathbf{1}_{[s,s+3\lambda]}(t), (80)

where Ck>0C_{k}>0 is a constant depending only on k0k\in\mathbb{N}_{0}. This establishes the first claim of the lemma with hs,λgs+λ,λϱλh_{s,\lambda}\equiv g_{s+\lambda,\lambda}\ast\varrho_{\lambda}. To prove the second claim of the lemma we proceed in two steps: First, we show that

sups|{Xs}{Zs}|\displaystyle\sup_{s\in\mathbb{R}}\left|\mathbb{P}\left\{X\leq s\right\}-\mathbb{P}\left\{Z\leq s\right\}\right| sups|E[(gs+λ,λϱλ)(X)(gs+λ,λϱλ)(Z)]|+ζ3λ(X)ζ3λ(Z).\displaystyle\leq\sup_{s\in\mathbb{R}}\big{|}\mathrm{E}[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(X)-(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)]\big{|}+\zeta_{3\lambda}(X)\wedge\zeta_{3\lambda}(Z).

By the chain of inequalities in eq. (79), for ss\in\mathbb{R} arbitrary,

{Xs}{Zs}\displaystyle\mathbb{P}\left\{X\leq s\right\}-\mathbb{P}\left\{Z\leq s\right\}
=E[gs,0(X)gs,0(Z)]\displaystyle\quad{}=\mathrm{E}\left[g_{s,0}(X)-g_{s,0}(Z)\right]
|E[(gs+λ,λϱλ)(X)(gs+λ,λϱλ)(Z)]|+|E[(gs+λ,λϱλ)(Z)gs,0(Z)]|\displaystyle\quad{}\leq\left|\mathrm{E}\left[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(X)-(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)\right]\right|+\left|\mathrm{E}\left[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)-g_{s,0}(Z)\right]\right|
|E[(gs+λ,λϱλ)(X)(gs+λ,λϱλ)(Z)]|+|E[gs+3λ,0(Z)gs,0(Z)]|\displaystyle\quad{}\leq\left|\mathrm{E}\left[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(X)-(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)\right]\right|+\left|\mathrm{E}\left[g_{s+3\lambda,0}(Z)-g_{s,0}(Z)\right]\right|
|E[(gs+λ,λϱλ)(X)(gs+λ,λϱλ)(Z)]|+{sZs+3λ}.\displaystyle\quad{}\leq\left|\mathrm{E}\left[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(X)-(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)\right]\right|+\mathbb{P}\left\{s\leq Z\leq s+3\lambda\right\}. (81)

Similarly,

{Zs+3λ}{Xs+3λ}\displaystyle\mathbb{P}\left\{Z\leq s+3\lambda\right\}-\mathbb{P}\left\{X\leq s+3\lambda\right\}
=E[gs+3λ,0(Z)gs+3λ,0(X)]\displaystyle\quad{}=\mathrm{E}\left[g_{s+3\lambda,0}(Z)-g_{s+3\lambda,0}(X)\right]
|E[(gs+λ,λϱλ)(Z)(gs+λ,λϱλ)(X)]|+|E[(gs+λ,λϱλ)(Z)gs+3λ,0(Z)]|\displaystyle\quad{}\leq\left|\mathrm{E}\left[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)-(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(X)\right]\right|+\left|\mathrm{E}\left[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)-g_{s+3\lambda,0}(Z)\right]\right|
|E[(gs+λ,λϱλ)(Z)(gs+λ,λϱλ)(X)]|+|E[gs+3λ(Z)gs,0(Z)]|\displaystyle\quad{}\leq\left|\mathrm{E}\left[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)-(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(X)\right]\right|+\left|\mathrm{E}\left[g_{s+3\lambda}(Z)-g_{s,0}(Z)\right]\right|
|E[(gs+λ,λϱλ)(Z)(gs+λ,λϱλ)(X)]|+{sZs+3λ}.\displaystyle\quad{}\leq\left|\mathrm{E}\left[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)-(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(X)\right]\right|+\mathbb{P}\left\{s\leq Z\leq s+3\lambda\right\}. (82)

Now, take the supremum over $s\in\mathbb{R}$ in eq. (81) and (82) and switch the roles of $X$ and $Z$. Next, we show

sups|E[(gs+λ,λϱλ)(X)(gs+λ,λϱλ)(Z)]|\displaystyle\sup_{s\in\mathbb{R}}\big{|}\mathrm{E}[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(X)-(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)]\big{|} sups|{Xs}{Zs}|+ζ3λ(X)ζ3λ(Z).\displaystyle\leq\sup_{s\in\mathbb{R}}\left|\mathbb{P}\left\{X\leq s\right\}-\mathbb{P}\left\{Z\leq s\right\}\right|+\zeta_{3\lambda}(X)\wedge\zeta_{3\lambda}(Z).

As before, we compute

E[(gs+λ,λϱλ)(X)(gs+λ,λϱλ)(Z)]\displaystyle\mathrm{E}\left[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(X)-(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)\right]
E[gs+3λ,0(X)gs+3λ,0(Z)]+E[gs+3λ,0(Z)gs,0(Z)]\displaystyle\quad{}\leq\mathrm{E}\left[g_{s+3\lambda,0}(X)-g_{s+3\lambda,0}(Z)\right]+\mathrm{E}\left[g_{s+3\lambda,0}(Z)-g_{s,0}(Z)\right]
|{Xs+3λ}{Zs+3λ}|+{sZs+3λ},\displaystyle\quad{}\leq\left|\mathbb{P}\left\{X\leq s+3\lambda\right\}-\mathbb{P}\left\{Z\leq s+3\lambda\right\}\right|+\mathbb{P}\left\{s\leq Z\leq s+3\lambda\right\}, (83)

and

E[(gs+λ,λϱλ)(Z)(gs+λ,λϱλ)(X)]\displaystyle\mathrm{E}\left[(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(Z)-(g_{s+\lambda,\lambda}\ast\varrho_{\lambda})(X)\right]
E[gs,0(Z)gs,0(X)]+E[gs+3λ,0(Z)gs,0(Z)]\displaystyle\quad{}\leq\mathrm{E}\left[g_{s,0}(Z)-g_{s,0}(X)\right]+\mathrm{E}\left[g_{s+3\lambda,0}(Z)-g_{s,0}(Z)\right]
|{Xs}{Zs}|+{sZs+3λ}.\displaystyle\quad{}\leq\left|\mathbb{P}\left\{X\leq s\right\}-\mathbb{P}\left\{Z\leq s\right\}\right|+\mathbb{P}\left\{s\leq Z\leq s+3\lambda\right\}. (84)

To conclude, take the supremum over $s\in\mathbb{R}$ in eq. (83) and (84) and switch the roles of $X$ and $Z$. ∎
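The sandwich inequality (79) that drives this proof can also be verified numerically. The following sketch (an illustration only, not part of the proof; the choices $s=0.3$, $\lambda=0.25$ are arbitrary, and the mollifier is normalized on the grid rather than via the exact constant $C_{0}$) checks the outer bounds $g_{s,0}\leq g_{s+\lambda,\lambda}\ast\varrho_{\lambda}\leq g_{s+3\lambda,0}$, which are the ones used in eq. (81)–(84):

```python
import numpy as np

# Numerical sanity check (illustration only) of the outer sandwich in eq. (79):
#   g_{s,0} <= g_{s+lam,lam} * rho_lam <= g_{s+3*lam,0}.
s, lam, dt = 0.3, 0.25, 1e-3

def g(t, shift, lam):
    # ramp g_{shift,lam}(t) = ((1 + (shift - t)/lam) ∧ 1) ∨ 0 from eq. (78)
    return np.clip(1.0 + (shift - t) / lam, 0.0, 1.0)

def g0(t, shift):
    # indicator g_{shift,0}(t) = 1{t <= shift}
    return (t <= shift).astype(float)

def rho(r):
    # unnormalized standard mollifier supported on [-1, 1]
    out = np.zeros_like(r)
    inside = np.abs(r) < 1
    out[inside] = np.exp(1.0 / (r[inside] ** 2 - 1.0))
    return out

u = np.arange(-lam, lam + dt, dt)
w = rho(u / lam) / lam
w /= w.sum() * dt                       # enforce integral of rho_lam = 1 on the grid

t = np.arange(-2.0, 2.0, dt)
# (g_{s+lam,lam} * rho_lam)(t), evaluated as a Riemann sum
smooth = (g(t[:, None] - u[None, :], s + lam, lam) * w).sum(axis=1) * dt

tol = 1e-8
assert np.all(g0(t, s) <= smooth + tol)             # lower bound g_{s,0}
assert np.all(smooth <= g0(t, s + 3 * lam) + tol)   # upper bound g_{s+3*lam,0}
```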

Proof of Lemma 4.

The set on which gfg\circ f is not differentiable is contained in the null set NN on which ff is not differentiable. Thus, gfg\circ f is kk-times differentiable almost everywhere. The derivatives now follow from Hardy (2006). ∎

Proof of Lemma 5.

Since $f$ is a piecewise linear function, it has partial derivatives of any order at every point at which it is locally linear. Since the set of points at which $f$ fails to be locally linear forms a null set with respect to the Lebesgue measure on $\mathbb{R}^{d}$, $f$ is differentiable almost everywhere on $\mathbb{R}^{d}$. (These partial derivatives need not be continuous!) The expressions for these partial derivatives follow from direct calculation.

Even more is true: Since $f$ is Lipschitz continuous, Rademacher’s theorem implies that $f$ is in fact totally differentiable almost everywhere on $\mathbb{R}^{d}$. Furthermore, since $f$ is convex, Alexandrov’s theorem implies that $f$ satisfies a second-order quadratic expansion almost everywhere on $\mathbb{R}^{d}$, i.e. for almost all $x\in\mathbb{R}^{d}$ there exists a matrix $H$ such that, as $\|a\|_{2}\rightarrow 0$,

|f(x+a)f(x)Df(x)a12aHa|=o(a22).\displaystyle\left|f(x+a)-f(x)-Df(x)a-\frac{1}{2}a^{\prime}Ha\right|=o(\|a\|_{2}^{2}).

According to Rockafellar (1999) the matrix Hd×dH\in\mathbb{R}^{d\times d} is symmetric, positive semi-definite, and equal to the Jacobian of Df(x)Df(x) for all xdx\in\mathbb{R}^{d} at which Df(x)Df(x) exists. Thus, HH can be identified with the second derivative D2f(x)D^{2}f(x) for almost all xdx\in\mathbb{R}^{d}. ∎

F.2 Proofs of Lemmas 2 and 3

Proof of Lemma 2.

Throughout the proof we write $h(x)=(h_{s,\lambda}\circ f)(x)$ with $h_{s,\lambda}\in C^{\infty}_{c}(\mathbb{R})$ as defined in Lemma 1 and $f(x)=\|x\|_{\infty}$. To simplify the notation, we denote partial derivatives w.r.t. $x_{j}$ by $\partial_{j}$, e.g. we write $\partial_{i_{1}}h$ for $\frac{\partial h}{\partial x_{i_{1}}}$, $\partial_{i_{2}}f$ for $\frac{\partial f}{\partial x_{i_{2}}}$, etc. We use $\mathcal{L}^{d}$ to denote the Lebesgue measure on $\mathbb{R}^{d}$.

Proof of part (i).

Special case k=1k=1.

Let $\epsilon>0$ and let $e_{i_{1}}\in\mathbb{R}^{d}$ be the $i_{1}$-th standard unit vector in $\mathbb{R}^{d}$. Recall that $h=h_{s,\lambda}\circ f$, where $h_{s,\lambda}$ is $\lambda^{-1}$-Lipschitz w.r.t. the metric induced by the absolute value and $f(x)=\|x\|_{\infty}$ is $1$-Lipschitz w.r.t. the metric induced by the $\ell_{\infty}$-norm. Thus, for $x_{0},z\in\mathbb{R}^{d}$ and $t,\epsilon>0$ arbitrary,

Δi1(x0,z,t;ϵ)\displaystyle\Delta_{i_{1}}(x_{0},z,t;\epsilon) :=ϵ1[h(etx0+etϵei1+1e2tz)h(etx0+1e2tz)]\displaystyle:=\epsilon^{-1}\left[h\left(e^{-t}x_{0}+e^{-t}\epsilon e_{i_{1}}+\sqrt{1-e^{-2t}}z\right)-h\left(e^{-t}x_{0}+\sqrt{1-e^{-2t}}z\right)\right]
ϵ1λ1etϵei1\displaystyle\leq\epsilon^{-1}\lambda^{-1}\|e^{-t}\epsilon e_{i_{1}}\|_{\infty}
=λ1et.\displaystyle=\lambda^{-1}e^{-t}.

Hence, the difference quotient Δi1(x0,z,t;ϵ)\Delta_{i_{1}}(x_{0},z,t;\epsilon) is bounded uniformly in x0,zdx_{0},z\in\mathbb{R}^{d} and t,ϵ>0t,\epsilon>0. Furthermore, by Lemma 5 hh is differentiable d\mathcal{L}^{d}-a.e. Therefore, the conditions of Corollary A.5 in Dudley (2014) are met and we can pass the derivative through both integrals (over t>0t>0 and zdz\in\mathbb{R}^{d}) to obtain

i1(0Pth(x)dt)|x=x0=0etPti1h(x0)dt.\displaystyle\partial_{i_{1}}\left(\int_{0}^{\infty}P_{t}h(x)dt\right)\Big{|}_{x=x_{0}}=\int_{0}^{\infty}e^{-t}P_{t}\partial_{i_{1}}h(x_{0})dt. (85)
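The scalar analogue of the differentiation rule behind eq. (85) can be checked numerically. The following sketch (an illustration only; the smooth test function $h=\tanh$ stands in for $h_{s,\lambda}\circ f$, the choices $t=0.5$, $x_{0}=0.3$ are arbitrary, and the Gaussian expectation is computed by quadrature) confirms $\partial_{x}P_{t}h(x)=e^{-t}P_{t}h^{\prime}(x)$ in $d=1$ with $\Sigma=1$:

```python
import numpy as np

# Numerical check (1-d illustration) of the differentiation rule behind eq. (85):
#   d/dx E[h(e^{-t} x + sqrt(1-e^{-2t}) Z)] = e^{-t} E[h'(e^{-t} x + sqrt(1-e^{-2t}) Z)].
def trapz(y, x):
    # trapezoidal rule (avoids the np.trapz/np.trapezoid rename across NumPy versions)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

t, x0 = 0.5, 0.3
a, b = np.exp(-t), np.sqrt(1.0 - np.exp(-2.0 * t))

z = np.linspace(-8.0, 8.0, 200001)
phi = np.exp(-z ** 2 / 2.0) / np.sqrt(2.0 * np.pi)  # standard normal density

def Pth(x):
    # P_t h(x) = E[h(e^{-t} x + sqrt(1-e^{-2t}) Z)], computed by quadrature
    return trapz(np.tanh(a * x + b * z) * phi, z)

eps = 1e-5
lhs = (Pth(x0 + eps) - Pth(x0 - eps)) / (2.0 * eps)     # central difference in x
rhs = a * trapz((1.0 - np.tanh(a * x0 + b * z) ** 2) * phi, z)  # e^{-t} P_t h'
assert abs(lhs - rhs) < 1e-6
```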

The cases k2k\geq 2.

“Off-the-shelf” differentiation under the integral sign seems to be possible only in the case $k=1$. In all other cases, Corollary A.5 and related results in Dudley (2014) do not apply, because the higher-order difference quotients of $h$ are not uniformly integrable (considered as a collection of random variables indexed by a null sequence $(\epsilon_{n})_{n\geq 1}$). Therefore, to prove the claim for $k\geq 2$ we develop a more specific inductive argument which is tailored explicitly to the map $x\mapsto\|x\|_{\infty}$ and the fact that we integrate w.r.t. a non-degenerate Gaussian measure.

Base case k=2k=2 and i1i2i_{1}\neq i_{2}.

Let YN(0,Id)Y\sim N(0,I_{d}), A=[A1,,Ad]=Σ1/2A=[A_{1},\ldots,A_{d}]=\Sigma^{-1/2}, and φ\varphi be the density of the law of YY. By a change of variable we can re-write eq. (85) as

i1(0Pth(x)dt)|x=x0\displaystyle\partial_{i_{1}}\left(\int_{0}^{\infty}P_{t}h(x)dt\right)\Big{|}_{x=x_{0}} =0etPti1h(x0)dt\displaystyle=\int_{0}^{\infty}e^{-t}P_{t}\partial_{i_{1}}h(x_{0})dt
=0deti1h(etx0+1e2tA1y)φ(y)dydt\displaystyle=\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}\partial_{i_{1}}h\left(e^{-t}x_{0}+\sqrt{1-e^{-2t}}A^{-1}y\right)\varphi(y)dydt
=0deti1h(u)ψ(x0,u,t)dudt,\displaystyle=\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}\partial_{i_{1}}h\left(u\right)\psi(x_{0},u,t)dudt, (86)

where

ψ(x,u,t):=(1e2t)d/2det(A)φ(A(uetx)1e2t).\displaystyle\psi(x,u,t):=\left(1-e^{-2t}\right)^{-d/2}\det(A)\varphi\left(\frac{A(u-e^{-t}x)}{\sqrt{1-e^{-2t}}}\right).

Since $A$ is full rank, $\psi$ is a smooth function of $x\in\mathbb{R}^{d}$. Hence, the map $x\mapsto e^{-t}\partial_{i_{1}}h\left(u\right)\partial_{i_{2}}\psi(x,u,t)$ exists and is continuous for $\mathcal{L}^{d}$-a.e. $u\in\mathbb{R}^{d}$. Furthermore, the map is uniformly bounded and integrable in $u\in\mathbb{R}^{d}$ and $t\geq 0$ for every $x\in\mathbb{R}^{d}$. Hence, by Corollary A.4 in Dudley (2014) we can pass a second partial derivative through the integral in (86) and compute

i2i1(0Pth(x)dt)|x=x0\displaystyle\partial_{i_{2}}\partial_{i_{1}}\left(\int_{0}^{\infty}P_{t}h(x)dt\right)\Big{|}_{x=x_{0}} =0deti1h(u)i2ψ(x0,u,t)dudt.\displaystyle=\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}\partial_{i_{1}}h\left(u\right)\partial_{i_{2}}\psi(x_{0},u,t)dudt. (87)

In the following we show that one can “pull back” the partial derivative $\partial_{i_{2}}$ from $\psi$ onto $\partial_{i_{1}}h$. Our argument involves a version of Stein’s lemma (cf. Chernozhukov et al., 2020, Theorem 11.1) and a regularization of $\partial_{i_{1}}h$. Several computations also crucially depend on the fact that $h(x)=h_{s,\lambda}(\|x\|_{\infty})$.

Let ϱC()\varrho\in C^{\infty}(\mathbb{R}) be the standard mollifier

ϱ(r)=C0exp(1r21)𝟏{1r1},\displaystyle\varrho(r)=C_{0}\exp\left(\frac{1}{r^{2}-1}\right)\mathbf{1}\{-1\leq r\leq 1\},

where C0>0C_{0}>0 is an absolute constant such that ϱ(r)dr=1\int\varrho(r)dr=1. For η>0\eta>0, set ϱη()=η1ϱ(η1)\varrho_{\eta}(\cdot)=\eta^{-1}\varrho\left(\>\cdot\>\eta^{-1}\right) and define the “partial regularization” of a function gg on d\mathbb{R}^{d} in its iith coordinate by

\displaystyle(\varrho_{\eta}\ast_{(i)}g)(u):=\int_{-1}^{1}\varrho(r)\>g(u-r\eta e_{i})dr, (88)

where eie_{i} is the iith standard unit vector in d\mathbb{R}^{d}. With this notation, define

hi1η,i2(u)\displaystyle h_{i_{1}}^{\eta,i_{2}}(u) :=(Dhs,λf)(u)(ϱη(i2)i1f)(u)\displaystyle:=(Dh_{s,\lambda}\circ f)(u)(\varrho_{\eta}\ast_{(i_{2})}\partial_{i_{1}}f)(u)
(Dhs,λf)(u)11ϱ(r)i1f(urηei2)dr\displaystyle\equiv(Dh_{s,\lambda}\circ f)(u)\int_{-1}^{1}\varrho(r)\>\partial_{i_{1}}f(u-r\eta e_{i_{2}})dr
\displaystyle\equiv(Dh_{s,\lambda}\circ f)(u)\>\mathrm{sign}(u_{i_{1}})\>\mathbf{1}\{|u_{i_{1}}|\geq|u_{j}|,\>\forall j\neq i_{2}\}\big{(}\varrho_{\eta}\ast\mathbf{1}\{|u_{i_{1}}|\geq|\cdot|\}\big{)}(u_{i_{2}}).

Obviously, the regularization $h_{i_{1}}^{\eta,i_{2}}(u)$ is just $\partial_{i_{1}}h$ with the discontinuity in the $u_{i_{2}}$-coordinate smoothed out. (We comment on the rationale behind this partial regularization after eq. (94).) In particular, $u\mapsto\partial_{i_{2}}h_{i_{1}}^{\eta,i_{2}}(u)$ exists for all $u\in\mathbb{R}^{d}$, and, by Lemmas 4 and 5,

i2hi1η,i2(u)\displaystyle\partial_{i_{2}}h_{i_{1}}^{\eta,i_{2}}(u) =(D2hs,λf)(u)(ϱη(i2)i1f)(u)i2f(u)\displaystyle=\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(u)\>(\varrho_{\eta}\ast_{(i_{2})}\partial_{i_{1}}f)(u)\>\partial_{i_{2}}f(u) (89)
\displaystyle\quad{}+\big{(}Dh_{s,\lambda}\circ f\big{)}(u)\>\partial_{i_{2}}(\varrho_{\eta}\ast_{(i_{2})}\partial_{i_{1}}f)(u). (90)

Observe that by Leibniz’s integral rule the partial derivative i2\partial_{i_{2}} in line (90) takes the form

i2(ϱη(i2)i1f)(u)=sign(ui1)𝟏{|ui1||uj|,ji2}i2(ϱη𝟏{|ui1|||})(ui2),\displaystyle\partial_{i_{2}}(\varrho_{\eta}\ast_{(i_{2})}\partial_{i_{1}}f)(u)=\mathrm{sign}(u_{i_{1}})\mathbf{1}\{|u_{i_{1}}|\geq|u_{j}|,\>j\neq i_{2}\}\>\partial_{i_{2}}\big{(}\varrho_{\eta}\ast\mathbf{1}\{|u_{i_{1}}|\geq|\cdot|\}\big{)}(u_{i_{2}}),

where

i2(ϱη𝟏{|ui1|||})(ui2)\displaystyle\partial_{i_{2}}\big{(}\varrho_{\eta}\ast\mathbf{1}\{|u_{i_{1}}|\geq|\cdot|\}\big{)}(u_{i_{2}}) =i2(ui2|ui1|)/η(ui2+|ui1|)/ηϱ(r)dr\displaystyle=\partial_{i_{2}}\int_{(u_{i_{2}}-|u_{i_{1}}|)/\eta}^{(u_{i_{2}}+|u_{i_{1}}|)/\eta}\varrho(r)dr
=ϱη(ui2+|ui1|)ϱη(ui2|ui1|)\displaystyle=\varrho_{\eta}(u_{i_{2}}+|u_{i_{1}}|)-\varrho_{\eta}(u_{i_{2}}-|u_{i_{1}}|)
=(ϱη𝟏|ui1|)(ui2)(ϱη𝟏|ui1|)(ui2),\displaystyle=\left(\varrho_{\eta}\ast\mathbf{1}_{-|u_{i_{1}}|}\right)(u_{i_{2}})-\left(\varrho_{\eta}\ast\mathbf{1}_{|u_{i_{1}}|}\right)(u_{i_{2}}), (91)

where x𝟏a(x)=𝟏{a=x}x\mapsto\mathbf{1}_{a}(x)=\mathbf{1}\{a=x\}, a,xa,x\in\mathbb{R}. Thus, by Lemma 1 (i) we easily verify that hi1η,i2h_{i_{1}}^{\eta,i_{2}} and i2hi1η,i2\partial_{i_{2}}h_{i_{1}}^{\eta,i_{2}} are bounded and integrable.
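The Leibniz computation culminating in eq. (91) admits a simple numerical sanity check (an illustration only; the values $a=0.7$, $\eta=0.2$ are arbitrary and the normalizing constant $C_{0}$ is computed by quadrature). Since pointwise finite differences of a quadrature are fragile, the sketch verifies the derivative formula in integrated form, via the fundamental theorem of calculus:

```python
import numpy as np

# Check (illustration only) of eq. (91) in integrated form: the function
# x |-> (rho_eta * 1{|a| >= |.|})(x) has derivative rho_eta(x+|a|) - rho_eta(x-|a|).
def trapz(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

a, eta = 0.7, 0.2

def rho(r):
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    inside = np.abs(r) < 1
    out[inside] = np.exp(1.0 / (r[inside] ** 2 - 1.0))
    return out

rg = np.linspace(-1.0, 1.0, 200001)
C0 = 1.0 / trapz(rho(rg), rg)            # normalizing constant of the mollifier

def rho_eta(x):
    return C0 * rho(np.asarray(x, dtype=float) / eta) / eta

def smoothed(x):
    # (rho_eta * 1{|a| >= |.|})(x) = C0 * int_{(x-|a|)/eta}^{(x+|a|)/eta} rho(r) dr
    lo, hi = max((x - abs(a)) / eta, -1.0), min((x + abs(a)) / eta, 1.0)
    if lo >= hi:
        return 0.0
    r = np.linspace(lo, hi, 20001)
    return C0 * trapz(rho(r), r)

# fundamental-theorem check of the closed-form derivative in eq. (91)
for (x1, x2) in [(-1.2, -0.4), (-0.2, 0.2), (0.4, 1.2)]:
    xs = np.linspace(x1, x2, 20001)
    rhs = trapz(rho_eta(xs + abs(a)) - rho_eta(xs - abs(a)), xs)
    lhs = smoothed(x2) - smoothed(x1)
    assert abs(lhs - rhs) < 1e-6, (x1, x2, lhs, rhs)
```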

We return to eq. (87). Adding and subtracting hi1η,i2h_{i_{1}}^{\eta,i_{2}} we expand its right hand side as

0dethi1η,i2(u)i2ψ(x0,u,t)dudt\displaystyle\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}h_{i_{1}}^{\eta,i_{2}}(u)\partial_{i_{2}}\psi(x_{0},u,t)dudt (92)
+0det(i1h(u)hi1η,i2(u))i2ψ(x0,u,t)dudt.\displaystyle\quad{}+\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}\left(\partial_{i_{1}}h(u)-h_{i_{1}}^{\eta,i_{2}}(u)\right)\partial_{i_{2}}\psi(x_{0},u,t)dudt. (93)

Consider the integral in line (92). A change of variable yields

0dethi1η,i2(u)i2ψ(x0,u,t)dudt\displaystyle\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}h_{i_{1}}^{\eta,i_{2}}(u)\partial_{i_{2}}\psi(x_{0},u,t)dudt
=0de2t1e2thi1η,i2(u)(1e2t)d/2det(A)φ(A(uetx0)1e2t)(A(uetx0)1e2t)Ai2dudt\displaystyle\quad{}=\int_{0}^{\infty}\int_{\mathbb{R}^{d}}\frac{e^{-2t}}{\sqrt{1-e^{-2t}}}h_{i_{1}}^{\eta,i_{2}}(u)\left(1-e^{-2t}\right)^{-d/2}\det(A)\varphi\left(\frac{A(u-e^{-t}x_{0})}{\sqrt{1-e^{-2t}}}\right)\left(\frac{A(u-e^{-t}x_{0})}{\sqrt{1-e^{-2t}}}\right)^{\prime}A_{i_{2}}dudt
=0de2t1e2thi1η,i2(etx0+1e2tA1y)φ(y)y(A1)AAi2dydt\displaystyle\quad{}=\int_{0}^{\infty}\int_{\mathbb{R}^{d}}\frac{e^{-2t}}{\sqrt{1-e^{-2t}}}h_{i_{1}}^{\eta,i_{2}}\left(e^{-t}x_{0}+\sqrt{1-e^{-2t}}A^{-1}y\right)\varphi(y)y^{\prime}(A^{-1})^{\prime}A^{\prime}A_{i_{2}}dydt
=0e2t1e2tE[hi1η,i2(etx0+1e2tZ)Z]dtAAi2.\displaystyle\quad{}=\int_{0}^{\infty}\frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\mathrm{E}\left[h_{i_{1}}^{\eta,i_{2}}\left(e^{-t}x_{0}+\sqrt{1-e^{-2t}}Z\right)Z^{\prime}\right]dt\>A^{\prime}A_{i_{2}}.

By Stein’s lemma for non-differentiable (but bounded and integrable) functions (cf. Chernozhukov et al., 2020, Theorem 11.1) the last line in the above display is equal to

0e2tDE[hi1η,i2(x+1e2tZ)]|x=etx0dtΣAAi2\displaystyle\int_{0}^{\infty}e^{-2t}D\mathrm{E}\left[h_{i_{1}}^{\eta,i_{2}}\left(x+\sqrt{1-e^{-2t}}Z\right)\right]\Big{|}_{x=e^{-t}x_{0}}dt\>\Sigma A^{\prime}A_{i_{2}}
=0e2tDE[hi1η,i2(x+1e2tZ)]|x=etx0dtA1Ai2\displaystyle\quad{}=\int_{0}^{\infty}e^{-2t}D\mathrm{E}\left[h_{i_{1}}^{\eta,i_{2}}\left(x+\sqrt{1-e^{-2t}}Z\right)\right]\Big{|}_{x=e^{-t}x_{0}}dt\>A^{-1}A_{i_{2}}
=0e2tDE[hi1η,i2(x+1e2tZ)]|x=etx0dtei2\displaystyle\quad{}=\int_{0}^{\infty}e^{-2t}D\mathrm{E}\left[h_{i_{1}}^{\eta,i_{2}}\left(x+\sqrt{1-e^{-2t}}Z\right)\right]\Big{|}_{x=e^{-t}x_{0}}dt\>e_{i_{2}}
=0e2ti2E[hi1η,i2(x+1e2tZ)]|x=etx0dt,\displaystyle\quad{}=\int_{0}^{\infty}e^{-2t}\partial_{i_{2}}\mathrm{E}\left[h_{i_{1}}^{\eta,i_{2}}\left(x+\sqrt{1-e^{-2t}}Z\right)\right]\Big{|}_{x=e^{-t}x_{0}}dt,

where ei2e_{i_{2}} denotes the i2i_{2}-th standard unit vector in d\mathbb{R}^{d}.
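The Stein step just applied reduces, in one dimension, to the classical identity $\mathrm{E}[g(Z)Z]=\mathrm{E}[g^{\prime}(Z)]$ for $Z\sim N(0,1)$. The following sketch (an illustration only; the test function $g(z)=z e^{-z^{2}/4}$ is an arbitrary smooth, bounded choice) verifies this scalar analogue by quadrature:

```python
import numpy as np

# Quadrature check (scalar illustration) of Stein's identity E[g(Z) Z] = E[g'(Z)]
# for Z ~ N(0,1), the one-dimensional analogue of the step applied to line (92).
def trapz(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

z = np.linspace(-10.0, 10.0, 400001)
phi = np.exp(-z ** 2 / 2.0) / np.sqrt(2.0 * np.pi)   # standard normal density

g = z * np.exp(-z ** 2 / 4.0)                        # smooth, bounded test function
dg = (1.0 - z ** 2 / 2.0) * np.exp(-z ** 2 / 4.0)    # its derivative

lhs = trapz(g * z * phi, z)   # E[g(Z) Z]
rhs = trapz(dg * phi, z)      # E[g'(Z)]
assert abs(lhs - rhs) < 1e-7
```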

Consider the expression in the previous line. Since $x\mapsto\partial_{i_{2}}h_{i_{1}}^{\eta,i_{2}}(x+u)$ exists and is uniformly bounded in $x,u\in\mathbb{R}^{d}$, we can push the partial derivative $\partial_{i_{2}}$ through the expectation (e.g. Folland, 1999, Theorem 2.27 (b); the integrability condition is satisfied because we integrate w.r.t. the non-degenerate law of $N(0,\Sigma)$). Thus, we have shown that the integral in line (92) equals

0e2tE[i2hi1η,i2(etx0+1e2tZ)]dt.\displaystyle\int_{0}^{\infty}e^{-2t}\mathrm{E}\left[\partial_{i_{2}}h_{i_{1}}^{\eta,i_{2}}\left(e^{-t}x_{0}+\sqrt{1-e^{-2t}}Z\right)\right]dt. (94)

Obviously, we could have derived eq. (94) under any kind of smoothing. However, under the “partial regularization” of $h_{i_{1}}$ in its $i_{2}$th coordinate the partial derivative $\partial_{i_{2}}h_{i_{1}}^{\eta,i_{2}}$ takes on a simple closed-form expression (eq. (89)–(91)). The simple form of the expression in eq. (91) is particularly important as it strongly suggests that $\partial_{i_{2}}h_{i_{1}}^{\eta,i_{2}}$ does not “blow up” as $\eta\downarrow 0$. Indeed, if we had “fully” regularized $h_{i_{1}}$ in all its coordinates, the expression in eq. (91) would not be as simple and its asymptotic behavior (as $\eta\downarrow 0$) would be less clear.

We now show that the integral in eq. (94) converges to

0e2tE[(D2hs,λf)(V0t)i1f(V0t)i2f(V0t)]dt,\displaystyle\int_{0}^{\infty}e^{-2t}\mathrm{E}\left[\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>\partial_{i_{1}}f(V_{0}^{t})\>\partial_{i_{2}}f(V_{0}^{t})\right]dt, (95)

as η0\eta\downarrow 0, where

V0t:=etx0+1e2tZ.\displaystyle V_{0}^{t}:=e^{-t}x_{0}+\sqrt{1-e^{-2t}}Z.

We record the following useful bounds: First, by Lemma 1 (i), for all udu\in\mathbb{R}^{d},

|(Dhs,λf)(u)|\displaystyle\Big{|}(Dh_{s,\lambda}\circ f)(u)\Big{|} λ1𝟏{sus+3λ},\displaystyle\leq\lambda^{-1}\mathbf{1}\{s\leq\|u\|_{\infty}\leq s+3\lambda\}, (96)
|(D2hs,λf)(u)|\displaystyle\Big{|}(D^{2}h_{s,\lambda}\circ f)(u)\Big{|} λ2𝟏{sus+3λ}.\displaystyle\leq\lambda^{-2}\mathbf{1}\{s\leq\|u\|_{\infty}\leq s+3\lambda\}. (97)

Second, there exists CA,d>0C_{A,d}>0 (depending on AA and dd) such that for all udu\in\mathbb{R}^{d} and t0t\geq 0,

|ψ|(x0,u,t)|i2ψ|(x0,u,t)CA,dfor d-a.e. x0d.\displaystyle\left|\psi\right|(x_{0},u,t)\vee\left|\partial_{i_{2}}\psi\right|(x_{0},u,t)\leq C_{A,d}\quad{}\text{for\>\>\>}\mathcal{L}^{d}\text{-a.e.\>\>\>}x_{0}\in\mathbb{R}^{d}. (98)

By eq. (97) and (98),

|0e2tE[(D2hs,λf)(V0t)(ϱη(i2)i1f)(V0t)i2f(V0t)]dt\displaystyle\left|\int_{0}^{\infty}e^{-2t}\mathrm{E}\left[\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>(\varrho_{\eta}\ast_{(i_{2})}\partial_{i_{1}}f)(V_{0}^{t})\>\partial_{i_{2}}f(V_{0}^{t})\right]dt\right.
0e2tE[(D2hs,λf)(V0t)i1f(V0t)i2f(V0t)]dt|\displaystyle\quad{}\quad{}\quad{}\quad{}\left.-\int_{0}^{\infty}e^{-2t}\mathrm{E}\left[\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>\partial_{i_{1}}f(V_{0}^{t})\>\partial_{i_{2}}f(V_{0}^{t})\right]dt\right|
=|0de2t(D2hs,λf)(u)((ϱη(i2)i1f)(u)i1f(u))i2f(u)ψ(x0,u,t)dudt|\displaystyle\quad{}=\left|\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-2t}\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(u)\>\Big{(}(\varrho_{\eta}\ast_{(i_{2})}\partial_{i_{1}}f)(u)-\partial_{i_{1}}f(u)\Big{)}\>\partial_{i_{2}}f(u)\psi(x_{0},u,t)dudt\right|
CA,d2λ2d|(ϱη(i2)i1f)(u)i1f(u)|𝟏{sus+3λ}du\displaystyle\quad{}\leq\frac{C_{A,d}}{2\lambda^{2}}\int_{\mathbb{R}^{d}}\Big{|}(\varrho_{\eta}\ast_{(i_{2})}\partial_{i_{1}}f)(u)-\partial_{i_{1}}f(u)\Big{|}\mathbf{1}\{s\leq\|u\|_{\infty}\leq s+3\lambda\}du
0asη0,\displaystyle\quad{}\rightarrow 0\quad{}\mathrm{as}\quad{}\eta\rightarrow 0, (99)

where the limit in the last line follows from Lemma 3.

Since the law of N(0,Σ)N(0,\Sigma) is non-degenerate,

|Corr(V0,i1t,V0,i2t)|=|Corr(Zi1,Zi2)|<1,\displaystyle\big{|}Corr(V_{0,i_{1}}^{t},V_{0,i_{2}}^{t})\big{|}=\big{|}Corr(Z_{i_{1}},Z_{i_{2}})\big{|}<1,

and, hence, the event $\{|V_{0,i_{1}}^{t}|=|V_{0,i_{2}}^{t}|\}$ is a $N(0,\Sigma)$-null set for all $t\geq 0$ and $\mathcal{L}^{d}$-a.e. $x_{0}\in\mathbb{R}^{d}$ (for a more detailed argument, see the proof of part (ii) of this lemma below). Thus,

0e2tE[(Dhs,λf)(V0t)sign(V0,i1t)𝟏{|V0,i1t||V0,jt|,ji2}(𝟏|V0,i1t|(V0,i2t)+𝟏|V0,i1t|(V0,i2t))]dt\displaystyle\int_{0}^{\infty}e^{-2t}\mathrm{E}\Big{[}\big{(}Dh_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>\mathrm{sign}(V_{0,i_{1}}^{t})\mathbf{1}\{|V_{0,i_{1}}^{t}|\geq|V_{0,j}^{t}|,\>j\neq i_{2}\}\left(-\mathbf{1}_{-|V_{0,i_{1}}^{t}|}(V_{0,i_{2}}^{t})+\mathbf{1}_{|V_{0,i_{1}}^{t}|}(V_{0,i_{2}}^{t})\right)\Big{]}dt
=0.\displaystyle\quad{}=0.

Therefore, by eq. (91), (96), and (98),

|0e2tE[(Dhs,λf)(V0t)i2(ϱη(i2)i1f)(V0t)]dt|\displaystyle\left|\int_{0}^{\infty}e^{-2t}\mathrm{E}\Big{[}\big{(}Dh_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>\partial_{i_{2}}(\varrho_{\eta}\ast_{(i_{2})}\partial_{i_{1}}f)(V_{0}^{t})\Big{]}dt\right|
=|0de2t(Dhs,λf)(u)sign(ui1)𝟏{|ui1||uj|,ji2}\displaystyle\quad{}=\left|\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-2t}\big{(}Dh_{s,\lambda}\circ f\big{)}(u)\>\mathrm{sign}(u_{i_{1}})\mathbf{1}\{|u_{i_{1}}|\geq|u_{j}|,\>j\neq i_{2}\}\right.
×[(ϱη[𝟏|ui1|𝟏|ui1|])(ui2)(𝟏|ui1|(ui2)𝟏|ui1|(ui2))]ψ(x0,u,t)dt|\displaystyle\quad{}\quad{}\quad{}\quad{}\left.\phantom{\int_{0}^{\infty}}\times\left[\left(\varrho_{\eta}\ast\left[\mathbf{1}_{-|u_{i_{1}}|}-\mathbf{1}_{|u_{i_{1}}|}\right]\right)(u_{i_{2}})-\left(\mathbf{1}_{-|u_{i_{1}}|}(u_{i_{2}})-\mathbf{1}_{|u_{i_{1}}|}(u_{i_{2}})\right)\right]\psi(x_{0},u,t)dt\right|
CA,d2λd|(ϱη𝟏|ui1|)(ui2)𝟏|ui1|(ui2)|𝟏{sus+3λ}du\displaystyle\quad{}\leq\frac{C_{A,d}}{2\lambda}\int_{\mathbb{R}^{d}}\left|\left(\varrho_{\eta}\ast\mathbf{1}_{-|u_{i_{1}}|}\right)(u_{i_{2}})-\mathbf{1}_{-|u_{i_{1}}|}(u_{i_{2}})\right|\mathbf{1}\{s\leq\|u\|_{\infty}\leq s+3\lambda\}du
+CA,d2λd|(ϱη𝟏|ui1|)(ui2)𝟏|ui1|(ui2)|𝟏{sus+3λ}du\displaystyle\quad{}\quad{}+\frac{C_{A,d}}{2\lambda}\int_{\mathbb{R}^{d}}\left|\left(\varrho_{\eta}\ast\mathbf{1}_{|u_{i_{1}}|}\right)(u_{i_{2}})-\mathbf{1}_{|u_{i_{1}}|}(u_{i_{2}})\right|\mathbf{1}\{s\leq\|u\|_{\infty}\leq s+3\lambda\}du
0asη0,\displaystyle\quad{}\rightarrow 0\quad{}\mathrm{as}\quad{}\eta\rightarrow 0, (100)

where the limit follows from Lemma 3. The limit in line (95) now follows by combining eq. (99) and (100) with eq. (89) and (90).

Lastly, we return to the integral in line (93). By eq. (97) and (98),

|0det(i1h(u)hi1η,i2(u))i2ψ(x0,u,t)dudt|\displaystyle\left|\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}\left(\partial_{i_{1}}h(u)-h_{i_{1}}^{\eta,i_{2}}(u)\right)\partial_{i_{2}}\psi(x_{0},u,t)dudt\right|
=|0det(Dhs,λf)(u)(i1f(u)(ϱη(i2)i1f)(u))i2ψ(x0,u,t)dudt|\displaystyle\quad{}=\left|\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}(Dh_{s,\lambda}\circ f)(u)\Big{(}\partial_{i_{1}}f(u)-(\varrho_{\eta}\ast_{(i_{2})}\partial_{i_{1}}f)(u)\Big{)}\partial_{i_{2}}\psi(x_{0},u,t)dudt\right|
CA,dλd|i1f(u)(ϱη(i2)i1f)(u)|𝟏{sus+3λ}du\displaystyle\quad{}\leq\frac{C_{A,d}}{\lambda}\int_{\mathbb{R}^{d}}\Big{|}\partial_{i_{1}}f(u)-(\varrho_{\eta}\ast_{(i_{2})}\partial_{i_{1}}f)(u)\Big{|}\mathbf{1}\{s\leq\|u\|_{\infty}\leq s+3\lambda\}du
0asη0,\displaystyle\quad{}\rightarrow 0\quad{}\mathrm{as}\quad{}\eta\rightarrow 0, (101)

where the limit follows from Lemma 3.

Combine eq. (95) and (101) with eq. (87), (92), and (93) to conclude via Lemma 5 that for $\mathcal{L}^{d}$-a.e. $x_{0}\in\mathbb{R}^{d}$,

i2i1(0Pth(x)dt)|x=x0=0e2tE[(D2hs,λf)(V0t)i1f(V0t)i2f(V0t)]dt0de2ti2i1h(u)ψ(x0,u,t)dudt0e2tPt(i2i1h)(x0)dt.\displaystyle\begin{split}\partial_{i_{2}}\partial_{i_{1}}\left(\int_{0}^{\infty}P_{t}h(x)dt\right)\Big{|}_{x=x_{0}}&=\int_{0}^{\infty}e^{-2t}\mathrm{E}\left[\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>\partial_{i_{1}}f(V_{0}^{t})\>\partial_{i_{2}}f(V_{0}^{t})\right]dt\\ &\equiv\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-2t}\partial_{i_{2}}\partial_{i_{1}}h(u)\psi(x_{0},u,t)dudt\\ &\equiv\int_{0}^{\infty}e^{-2t}P_{t}\left(\partial_{i_{2}}\partial_{i_{1}}h\right)(x_{0})dt.\end{split} (102)

Notice that the order in which we take the partial derivatives i1\partial_{i_{1}} and i2\partial_{i_{2}} does not matter.

Base case k=2k=2 and i1=i2i_{1}=i_{2}.

The strategy is identical to that of the preceding case $i_{1}\neq i_{2}$. The only difference is the regularization of $\partial_{i_{1}}h$. Recall the notion of “partial regularization” from eq. (88) and define, for $\eta>0$,

hi1η,i1(u)\displaystyle h_{i_{1}}^{\eta,i_{1}}(u) :=(Dhs,λf)(u)(ϱη(i1)i1f)(u)\displaystyle:=(Dh_{s,\lambda}\circ f)(u)(\varrho_{\eta}\ast_{(i_{1})}\partial_{i_{1}}f)(u)
(Dhs,λf)(u)11ϱ(r)i1f(urηei1)dr\displaystyle\equiv(Dh_{s,\lambda}\circ f)(u)\int_{-1}^{1}\varrho(r)\>\partial_{i_{1}}f(u-r\eta e_{i_{1}})dr
(Dhs,λf)(u)(ϱηsign()𝟏{|||uj|,j})(ui1).\displaystyle\equiv(Dh_{s,\lambda}\circ f)(u)\big{(}\varrho_{\eta}\ast\mathrm{sign}(\cdot)\mathbf{1}\{|\cdot|\geq|u_{j}|,\>\forall j\}\big{)}(u_{i_{1}}).

The map ui1hi1η,i1(u)u\mapsto\partial_{i_{1}}h_{i_{1}}^{\eta,i_{1}}(u) exists for all udu\in\mathbb{R}^{d}, and, by Lemmas 4 and 5,

\displaystyle\partial_{i_{1}}h_{i_{1}}^{\eta,i_{1}}(u) =\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(u)\>(\varrho_{\eta}\ast_{(i_{1})}\partial_{i_{1}}f)(u)\>\partial_{i_{1}}f(u) (103)
\displaystyle\quad{}+\big{(}Dh_{s,\lambda}\circ f\big{)}(u)\>\partial_{i_{1}}(\varrho_{\eta}\ast_{(i_{1})}\partial_{i_{1}}f)(u). (104)

By Leibniz’s integral rule we find that the partial derivative i1\partial_{i_{1}} in line (104) equals

i1(ϱη(i1)i1f)(u)\displaystyle\partial_{i_{1}}(\varrho_{\eta}\ast_{(i_{1})}\partial_{i_{1}}f)(u)
=i1(ϱηsign()𝟏{|||uj|,j})(ui1)\displaystyle\quad{}=\partial_{i_{1}}\big{(}\varrho_{\eta}\ast\mathrm{sign}(\cdot)\mathbf{1}\{|\cdot|\geq|u_{j}|,\>\forall j\}\big{)}(u_{i_{1}})
=i1((ui1+maxji1|uj|)/η1ϱ(r)sign(ui1rη)dr+1(ui1maxji1|uj|)/ηϱ(r)sign(ui1rη)dr)\displaystyle\quad{}=\partial_{i_{1}}\left(\int_{(u_{i_{1}}+\max_{j\neq i_{1}}|u_{j}|)/\eta}^{1}\varrho(r)\mathrm{sign}(u_{i_{1}}-r\eta)dr+\int_{-1}^{(u_{i_{1}}-\max_{j\neq i_{1}}|u_{j}|)/\eta}\varrho(r)\mathrm{sign}(u_{i_{1}}-r\eta)dr\right)
=i1((ui1+maxji1|uj|)/η1ϱ(r)dr+1(ui1maxji1|uj|)/ηϱ(r)dr)\displaystyle\quad{}=\partial_{i_{1}}\left(-\int_{(u_{i_{1}}+\max_{j\neq i_{1}}|u_{j}|)/\eta}^{1}\varrho(r)dr+\int_{-1}^{(u_{i_{1}}-\max_{j\neq i_{1}}|u_{j}|)/\eta}\varrho(r)dr\right)
=ϱη(ui1+maxji1|uj|)ϱη(ui1maxji1|uj|)\displaystyle\quad{}=\varrho_{\eta}(u_{i_{1}}+\max_{j\neq i_{1}}|u_{j}|)-\varrho_{\eta}(u_{i_{1}}-\max_{j\neq i_{1}}|u_{j}|)
=(ϱη𝟏maxji1|uj|)(ui1)(ϱη𝟏maxji1|uj|)(ui1),\displaystyle\quad{}=\left(\varrho_{\eta}\ast\mathbf{1}_{-\max_{j\neq i_{1}}|u_{j}|}\right)(u_{i_{1}})-\left(\varrho_{\eta}\ast\mathbf{1}_{\max_{j\neq i_{1}}|u_{j}|}\right)(u_{i_{1}}), (105)

where x𝟏a(x)=𝟏{a=x}x\mapsto\mathbf{1}_{a}(x)=\mathbf{1}\{a=x\}, a,xa,x\in\mathbb{R}. Thus, from Lemma 1 (i) we infer that hi1η,i1h_{i_{1}}^{\eta,i_{1}} and i1hi1η,i1\partial_{i_{1}}h_{i_{1}}^{\eta,i_{1}} are both bounded and integrable.

The same arguments that led to eq. (87), (92), and (93) also yield

2xi12(0Pth(x)dt)|x=x0\displaystyle\frac{\partial^{2}}{\partial x_{i_{1}}^{2}}\left(\int_{0}^{\infty}P_{t}h(x)dt\right)\Big{|}_{x=x_{0}} =0dethi1η,i1(u)i1ψ(x0,u,t)dudt\displaystyle=\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}h_{i_{1}}^{\eta,i_{1}}(u)\partial_{i_{1}}\psi(x_{0},u,t)dudt (106)
+0det(i1h(u)hi1η,i1(u))i1ψ(x0,u,t)dudt.\displaystyle\quad{}+\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}\left(\partial_{i_{1}}h(u)-h_{i_{1}}^{\eta,i_{1}}(u)\right)\partial_{i_{1}}\psi(x_{0},u,t)dudt. (107)

Repeating the arguments that gave eq. (94) we find that the integral in line (106) equals

0e2tE[i1hi1η,i1(etx0+1e2tZ)]dt.\displaystyle\int_{0}^{\infty}e^{-2t}\mathrm{E}\left[\partial_{i_{1}}h_{i_{1}}^{\eta,i_{1}}\left(e^{-t}x_{0}+\sqrt{1-e^{-2t}}Z\right)\right]dt. (108)

We now study the behavior of the integrals in lines (107) and (108) as η0\eta\downarrow 0. The arguments are similar to those used in the case i1i2i_{1}\neq i_{2}; we provide them for completeness only. As before, let V0t=etx0+1e2tZV_{0}^{t}=e^{-t}x_{0}+\sqrt{1-e^{-2t}}Z. By eq. (97) and (98) and Lemma 3,

|0e2tE[(D2hs,λf)(V0t)(ϱη(i1)i1f)(V0t)i1f(V0t)]dt\displaystyle\left|\int_{0}^{\infty}e^{-2t}\mathrm{E}\left[\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>(\varrho_{\eta}\ast_{(i_{1})}\partial_{i_{1}}f)(V_{0}^{t})\>\partial_{i_{1}}f(V_{0}^{t})\right]dt\right.
0e2tE[(D2hs,λf)(V0t)i1f(V0t)i1f(V0t)]dt|\displaystyle\quad{}\quad{}\quad{}\quad{}\left.-\int_{0}^{\infty}e^{-2t}\mathrm{E}\left[\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>\partial_{i_{1}}f(V_{0}^{t})\>\partial_{i_{1}}f(V_{0}^{t})\right]dt\right|
=|0de2t(D2hs,λf)(u)((ϱη(i1)i1f)(u)i1f(u))i1f(u)ψ(x0,u,t)dudt|\displaystyle\quad{}=\left|\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-2t}\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(u)\>\Big{(}(\varrho_{\eta}\ast_{(i_{1})}\partial_{i_{1}}f)(u)-\partial_{i_{1}}f(u)\Big{)}\>\partial_{i_{1}}f(u)\psi(x_{0},u,t)dudt\right|
CA,d2λ2d|(ϱη(i1)i1f)(u)i1f(u)|𝟏{sus+3λ}du\displaystyle\quad{}\leq\frac{C_{A,d}}{2\lambda^{2}}\int_{\mathbb{R}^{d}}\Big{|}(\varrho_{\eta}\ast_{(i_{1})}\partial_{i_{1}}f)(u)-\partial_{i_{1}}f(u)\Big{|}\mathbf{1}\{s\leq\|u\|_{\infty}\leq s+3\lambda\}du
\displaystyle\quad{}\rightarrow 0\quad{}\mathrm{as}\quad{}\eta\rightarrow 0. (109)

Since the law of $N(0,\Sigma)$ is non-degenerate, by eq. (105), (96), and (98) and Lemma 3,

\displaystyle\left|\int_{0}^{\infty}e^{-2t}\mathrm{E}\Big{[}\big{(}Dh_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>\partial_{i_{1}}(\varrho_{\eta}\ast_{(i_{1})}\partial_{i_{1}}f)(V_{0}^{t})\Big{]}dt\right|
=|0de2t(Dhs,λf)(u)\displaystyle\quad{}=\left|\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-2t}\big{(}Dh_{s,\lambda}\circ f\big{)}(u)\right.
\left.\phantom{\int_{0}^{\infty}}\times\left[\left(\varrho_{\eta}\ast\left[\mathbf{1}_{-\max_{j\neq i_{1}}|u_{j}|}-\mathbf{1}_{\max_{j\neq i_{1}}|u_{j}|}\right]\right)(u_{i_{1}})-\left(\mathbf{1}_{-\max_{j\neq i_{1}}|u_{j}|}-\mathbf{1}_{\max_{j\neq i_{1}}|u_{j}|}\right)(u_{i_{1}})\right]\psi(x_{0},u,t)dt\right|
CA,d2λd𝟏{sus+3λ}|(ϱη𝟏maxji1|uj|)(ui1)𝟏maxji1|uj|(ui1)|du\displaystyle\quad{}\leq\frac{C_{A,d}}{2\lambda}\int_{\mathbb{R}^{d}}\mathbf{1}\{s\leq\|u\|_{\infty}\leq s+3\lambda\}\left|\left(\varrho_{\eta}\ast\mathbf{1}_{-\max_{j\neq i_{1}}|u_{j}|}\right)(u_{i_{1}})-\mathbf{1}_{-\max_{j\neq i_{1}}|u_{j}|}(u_{i_{1}})\right|du
+CA,d2λd𝟏{sus+3λ}|(ϱη𝟏maxji1|uj|)(ui1)𝟏maxji1|uj|(ui1)|du\displaystyle\quad{}\quad{}+\frac{C_{A,d}}{2\lambda}\int_{\mathbb{R}^{d}}\mathbf{1}\{s\leq\|u\|_{\infty}\leq s+3\lambda\}\left|\left(\varrho_{\eta}\ast\mathbf{1}_{\max_{j\neq i_{1}}|u_{j}|}\right)(u_{i_{1}})-\mathbf{1}_{\max_{j\neq i_{1}}|u_{j}|}(u_{i_{1}})\right|du
0asη0.\displaystyle\quad{}\rightarrow 0\quad{}\mathrm{as}\quad{}\eta\rightarrow 0. (110)

Combine eq. (109) and (110) to conclude that, as η0\eta\downarrow 0, the integral in (108) converges to

0e2tE[(D2hs,λf)(V0t)i1f(V0t)i1f(V0t)]dt.\displaystyle\int_{0}^{\infty}e^{-2t}\mathrm{E}\left[\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>\partial_{i_{1}}f(V_{0}^{t})\>\partial_{i_{1}}f(V_{0}^{t})\right]dt. (111)

Lastly, we turn to the integral in line (107). By eq. (97) and (98) and Lemma 3,

|0det(i1h(u)hi1η,i1(u))i1ψ(x0,u,t)dudt|\displaystyle\left|\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}\left(\partial_{i_{1}}h(u)-h_{i_{1}}^{\eta,i_{1}}(u)\right)\partial_{i_{1}}\psi(x_{0},u,t)dudt\right|
=|0det(Dhs,λf)(u)(i1f(u)(ϱη(i1)i1f)(u))i1ψ(x0,u,t)dudt|\displaystyle\quad{}=\left|\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-t}(Dh_{s,\lambda}\circ f)(u)\Big{(}\partial_{i_{1}}f(u)-(\varrho_{\eta}\ast_{(i_{1})}\partial_{i_{1}}f)(u)\Big{)}\partial_{i_{1}}\psi(x_{0},u,t)dudt\right|
CA,dλd|i1f(u)(ϱη(i1)i1f)(u)|𝟏{sus+3λ}du\displaystyle\quad{}\leq\frac{C_{A,d}}{\lambda}\int_{\mathbb{R}^{d}}\Big{|}\partial_{i_{1}}f(u)-(\varrho_{\eta}\ast_{(i_{1})}\partial_{i_{1}}f)(u)\Big{|}\mathbf{1}\{s\leq\|u\|_{\infty}\leq s+3\lambda\}du
\displaystyle\quad{}\rightarrow 0\quad{}\mathrm{as}\quad{}\eta\rightarrow 0. (112)

Thus, combining eq. (111) and (112) with eq. (106) and (107) and invoking Lemma 5, we conclude that, for d\mathcal{L}^{d}-a.e. x0dx_{0}\in\mathbb{R}^{d},

i1i1(0Pth(x)dt)|x=x0=0e2tE[(D2hs,λf)(V0t)i1f(V0t)i1f(V0t)]dt0de2ti1i1h(u)ψ(x0,u,t)dudt0e2tPt(i1i1h)(x0)dt.\displaystyle\begin{split}\partial_{i_{1}}\partial_{i_{1}}\left(\int_{0}^{\infty}P_{t}h(x)dt\right)\Big{|}_{x=x_{0}}&=\int_{0}^{\infty}e^{-2t}\mathrm{E}\left[\big{(}D^{2}h_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>\partial_{i_{1}}f(V_{0}^{t})\>\partial_{i_{1}}f(V_{0}^{t})\right]dt\\ &\equiv\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-2t}\partial_{i_{1}}\partial_{i_{1}}h(u)\psi(x_{0},u,t)dudt\\ &\equiv\int_{0}^{\infty}e^{-2t}P_{t}\left(\partial_{i_{1}}\partial_{i_{1}}h\right)(x_{0})dt.\end{split} (113)

Inductive step from kk to k+1k+1.

Suppose that for arbitrary indices 1i1,,ikd1\leq i_{1},\ldots,i_{k}\leq d, k2k\geq 2,

iki1(0Pth(x)dt)|x=x0=0dektiki1h(u)ψ(x0,u,t)dudt0ektPt(iki1h)(x0)dt.\displaystyle\begin{split}\partial_{i_{k}}\cdots\partial_{i_{1}}\left(\int_{0}^{\infty}P_{t}h(x)dt\right)\Big{|}_{x=x_{0}}&=\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-kt}\partial_{i_{k}}\cdots\partial_{i_{1}}h(u)\psi(x_{0},u,t)dudt\\ &\equiv\int_{0}^{\infty}e^{-kt}P_{t}\left(\partial_{i_{k}}\cdots\partial_{i_{1}}h\right)(x_{0})dt.\end{split} (114)

Under the induction hypothesis (114) the argument that gave identity (87) also gives

ik+1i1(0Pth(x)dt)|x=x0=0dektiki1h(u)ik+1ψ(x0,u,t)dudt.\displaystyle\partial_{i_{k+1}}\cdots\partial_{i_{1}}\left(\int_{0}^{\infty}P_{t}h(x)dt\right)\Big{|}_{x=x_{0}}=\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-kt}\partial_{i_{k}}\cdots\partial_{i_{1}}h(u)\partial_{i_{k+1}}\psi(x_{0},u,t)dudt. (115)

As in the case k=2k=2 above, we now show that one can “pull back” the partial derivative ik+1\partial_{i_{k+1}} from ψ\psi onto iki1h\partial_{i_{k}}\cdots\partial_{i_{1}}h. Let 1ik+1d1\leq i_{k+1}\leq d be an arbitrary index and, for η>0\eta>0, define

hi1,,ikη,ik+1(u)\displaystyle h_{i_{1},\ldots,i_{k}}^{\eta,i_{k+1}}(u) :=(Dkhs,λf)(u)j=1k(ϱη(ik+1)ijf)(u),\displaystyle:=(D^{k}h_{s,\lambda}\circ f)(u)\prod_{j=1}^{k}(\varrho_{\eta}\ast_{(i_{k+1})}\partial_{i_{j}}f)(u),

where (ϱη(ik+1)ijf)(u)(\varrho_{\eta}\ast_{(i_{k+1})}\partial_{i_{j}}f)(u) denotes the “partial regularization” in the ik+1i_{k+1}th coordinate as defined in eq. (88). The map uik+1hi1,,ikη,ik+1(u)u\mapsto\partial_{i_{k+1}}h_{i_{1},\ldots,i_{k}}^{\eta,i_{k+1}}(u) exists for all udu\in\mathbb{R}^{d}, and satisfies, by the chain rule,

ik+1hi1,,ikη,ik+1(u)\displaystyle\partial_{i_{k+1}}h_{i_{1},\ldots,i_{k}}^{\eta,i_{k+1}}(u)
=(Dk+1hs,λf)(u)(j=1k(ϱη(ik+1)ijf)(u))ik+1f(u)\displaystyle\quad{}=\big{(}D^{k+1}h_{s,\lambda}\circ f\big{)}(u)\left(\prod_{j=1}^{k}(\varrho_{\eta}\ast_{(i_{k+1})}\partial_{i_{j}}f)(u)\right)\partial_{i_{k+1}}f(u) (116)
+(Dk+1hs,λf)(u)j:ijik+1(j(ϱη(ik+1)if)(u))ik+1(ϱη(ik+1)ijf)(u)\displaystyle\quad{}\quad{}+\big{(}D^{k+1}h_{s,\lambda}\circ f\big{)}(u)\sum_{j:i_{j}\neq i_{k+1}}\left(\prod_{\ell\neq j}(\varrho_{\eta}\ast_{(i_{k+1})}\partial_{i_{\ell}}f)(u)\right)\partial_{i_{k+1}}(\varrho_{\eta}\ast_{(i_{k+1})}\partial_{i_{j}}f)(u) (117)
+(Dk+1hs,λf)(u)j:ij=ik+1(j(ϱη(ik+1)if)(u))ik+1(ϱη(ik+1)ik+1f)(u).\displaystyle\quad{}\quad{}+\big{(}D^{k+1}h_{s,\lambda}\circ f\big{)}(u)\sum_{j:i_{j}=i_{k+1}}\left(\prod_{\ell\neq j}(\varrho_{\eta}\ast_{(i_{k+1})}\partial_{i_{\ell}}f)(u)\right)\partial_{i_{k+1}}(\varrho_{\eta}\ast_{(i_{k+1})}\partial_{i_{k+1}}f)(u). (118)

Notice that the partial derivative ik+1(ϱη(ik+1)ijf)(u)\partial_{i_{k+1}}(\varrho_{\eta}\ast_{(i_{k+1})}\partial_{i_{j}}f)(u) in eq. (117) and ik+1(ϱη(ik+1)ik+1f)(u)\partial_{i_{k+1}}(\varrho_{\eta}\ast_{(i_{k+1})}\partial_{i_{k+1}}f)(u) in eq. (118) follow the patterns derived in eq. (F.2) and (F.2), respectively.

Next, expand the right hand side of (115) as

0dekthi1,,ikη,ik+1(u)ik+1ψ(x0,u,t)dudt\displaystyle\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-kt}h_{i_{1},\ldots,i_{k}}^{\eta,i_{k+1}}(u)\partial_{i_{k+1}}\psi(x_{0},u,t)dudt (119)
+0dekt(i1,,ikh(u)hi1,,ikη,ik+1(u))ik+1ψ(x0,u,t)dudt.\displaystyle\quad{}+\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-kt}\left(\partial_{i_{1},\ldots,i_{k}}h(u)-h_{i_{1},\ldots,i_{k}}^{\eta,i_{k+1}}(u)\right)\partial_{i_{k+1}}\psi(x_{0},u,t)dudt. (120)

Recall the argument developed to establish the limit (95) for η0\eta\downarrow 0. The same argument, now combined with (116)–(118), also yields that the integral in (119) converges to

0e(k+1)tE[(Dk+1hs,λf)(V0t)j=1k+1ijf(V0t)]dt\displaystyle\int_{0}^{\infty}e^{-(k+1)t}\mathrm{E}\left[\big{(}D^{k+1}h_{s,\lambda}\circ f\big{)}(V_{0}^{t})\>\prod_{j=1}^{k+1}\partial_{i_{j}}f(V_{0}^{t})\right]dt
0de(k+1)tik+1i1h(u)ψ(x0,u,t)dudt\displaystyle\quad{}\equiv\int_{0}^{\infty}\int_{\mathbb{R}^{d}}e^{-(k+1)t}\partial_{i_{k+1}}\cdots\partial_{i_{1}}h(u)\psi(x_{0},u,t)dudt
0e(k+1)tPt(ik+1i1h)(x0)dt.\displaystyle\quad{}\equiv\int_{0}^{\infty}e^{-(k+1)t}P_{t}\left(\partial_{i_{k+1}}\cdots\partial_{i_{1}}h\right)(x_{0})dt. (121)

Similarly, the same argument used to show that the integral in (93) vanishes for η0\eta\downarrow 0 also guarantees that the integral in (120) vanishes. Combine eq. (115) and (119)–(121) to conclude the inductive step from kk to k+1k+1 for k2k\geq 2.

Proof of part (ii).

Combine Lemma 1 (i), 4, and 5 and conclude that, for all xd𝒩x\in\mathbb{R}^{d}\setminus\mathcal{N},

|iki1h|(x)\displaystyle\left|\partial_{i_{k}}\ldots\partial_{i_{1}}h\right|(x)
\displaystyle\quad{}=\left|\left(D^{k}h_{s,\lambda}\circ f\right)\right|(x)\left|\partial_{i_{1}}f\right|(x)\cdots\left|\partial_{i_{k}}f\right|(x)
Ckλk𝟏{sxs+3λ}𝟏{|xi1||x|,}𝟏{|xik||x|,},\displaystyle\quad{}\leq C_{k}\lambda^{-k}\mathbf{1}\left\{s\leq\|x\|_{\infty}\leq s+3\lambda\right\}\mathbf{1}\left\{|x_{i_{1}}|\geq|x_{\ell}|,\>\forall\ell\right\}\cdots\mathbf{1}\left\{|x_{i_{k}}|\geq|x_{\ell}|,\>\forall\ell\right\},

where Ck>0C_{k}>0 is the absolute constant from Lemma 1. Notice that

\displaystyle\mathbf{1}\left\{|x_{i_{1}}|\geq|x_{\ell}|,\>\forall\ell\right\}\cdots\mathbf{1}\left\{|x_{i_{k}}|\geq|x_{\ell}|,\>\forall\ell\right\}=1\quad{}\Longleftrightarrow\quad{}|x_{i_{1}}|=\ldots=|x_{i_{k}}|=\|x\|_{\infty},

and, hence,

d({xd:𝟏{|xi1||x|,}𝟏{|xik||x|,}=1})>0i1==ik.\displaystyle\mathcal{L}^{d}\left(\left\{x\in\mathbb{R}^{d}:\mathbf{1}\left\{|x_{i_{1}}|\geq|x_{\ell}|,\>\forall\ell\right\}\cdots\mathbf{1}\left\{|x_{i_{k}}|\geq|x_{\ell}|,\>\forall\ell\right\}=1\right\}\right)>0\>\>\Leftrightarrow\>\>i_{1}=\ldots=i_{k}.

Thus, there exists a d\mathcal{L}^{d}-null set 𝒩𝒩\mathcal{N}^{\prime}\supseteq\mathcal{N} such that for all xd𝒩x\in\mathbb{R}^{d}\setminus\mathcal{N}^{\prime},

|iki1h|(x)Ckλk𝟏{sxs+3λ}𝟏{|xi1||x|,i1}𝟏{i1==ik}.\displaystyle\left|\partial_{i_{k}}\ldots\partial_{i_{1}}h\right|(x)\leq C_{k}\lambda^{-k}\mathbf{1}\left\{s\leq\|x\|_{\infty}\leq s+3\lambda\right\}\mathbf{1}\left\{|x_{i_{1}}|\geq|x_{\ell}|,\>\ell\neq{i_{1}}\right\}\mathbf{1}\{i_{1}=\ldots=i_{k}\}. (122)

Since Σ\Sigma is positive definite, N(0,Σ)N(0,\Sigma) is absolutely continuous with respect to d\mathcal{L}^{d}. Thus, the d\mathcal{L}^{d}-a.e. upper bound (122) continues to hold when evaluated at V0t=etx0+1e2tZV_{0}^{t}=e^{-t}x_{0}+\sqrt{1-e^{-2t}}Z and integrated over ZN(0,Σ)Z\sim N(0,\Sigma) and tExp(k)t\sim Exp(k). We conclude that for all x0dx_{0}\in\mathbb{R}^{d},

|0ektPt(iki1h)(x0)dt|Ckλk0ektE[𝟏{sV0ts+3λ}𝟏{|V0i1t||V0t|,}]dt 1{i1==ik}.\displaystyle\begin{split}&\left|\int_{0}^{\infty}e^{-kt}P_{t}\left(\partial_{i_{k}}\ldots\partial_{i_{1}}h\right)(x_{0})dt\right|\\ &\>\>\>\leq C_{k}\lambda^{-k}\int_{0}^{\infty}e^{-kt}\mathrm{E}\left[\mathbf{1}\left\{s\leq\|V_{0}^{t}\|_{\infty}\leq s+3\lambda\right\}\mathbf{1}\left\{|V_{0{i_{1}}}^{t}|\geq|V_{0\ell}^{t}|,\>\forall\ell\right\}\right]dt\>\mathbf{1}\{i_{1}=\ldots=i_{k}\}.\end{split} (123)

Proof of Lemma 3.

For any function hh on d\mathbb{R}^{d} and vector ydy\in\mathbb{R}^{d}, we define the translation operator τ\tau by τyh(x)=h(xy)\tau_{y}h(x)=h(x-y). With this notation,

\displaystyle(\varrho_{\eta}\ast_{(i)}h)(x)=\int\varrho(r)h(x-r\eta e_{i})dr=\int\varrho(r)\tau_{r\eta e_{i}}h(x)dr.

Without loss of generality we can assume that the fjf_{j}’s are bounded by one. Also, since ϱ\varrho integrates to one, we have

(ϱη(ij)fj)g(x)fj(x)g(x)=ϱ(r)(τrηeijfj(x)fj(x))g(x)dr.\displaystyle(\varrho_{\eta}\ast_{(i_{j})}f_{j})g(x)-f_{j}(x)g(x)=\int\varrho(r)\left(\tau_{r\eta e_{i_{j}}}f_{j}(x)-f_{j}(x)\right)g(x)dr.

Hence, by the product comparison inequality,

\displaystyle\left\|\prod_{j=1}^{k}(\varrho_{\eta}\ast_{(i_{j})}f_{j})g-\prod_{j=1}^{k}f_{j}g\right\|_{1}\leq\sum_{j=1}^{k}\left\|(\varrho_{\eta}\ast_{(i_{j})}f_{j})g-f_{j}g\right\|_{1}
\displaystyle=\sum_{j=1}^{k}\int\left|\int\varrho(r)\left(\tau_{r\eta e_{i_{j}}}f_{j}(x)-f_{j}(x)\right)g(x)dr\right|dx
\displaystyle\leq\sum_{j=1}^{k}\int|\varrho(r)|\left\|(\tau_{r\eta e_{i_{j}}}f_{j})g-f_{j}g\right\|_{1}dr. (124)

Next, compute

\displaystyle\int|\varrho(r)|\|\tau_{r\eta e_{i_{j}}}f_{j}g-f_{j}g\|_{1}dr \displaystyle\leq\int|\varrho(r)|\|(\tau_{r\eta e_{i_{j}}}f_{j})g-\tau_{r\eta e_{i_{j}}}(f_{j}g)\|_{1}dr+\int|\varrho(r)|\|\tau_{r\eta e_{i_{j}}}(f_{j}g)-f_{j}g\|_{1}dr
=|ϱ(r)|(τrηeijfj)(gτrηeijg)1dr+|ϱ(r)|τrηeij(fjg)fjg1dr\displaystyle=\int|\varrho(r)|\|(\tau_{r\eta e_{i_{j}}}f_{j})(g-\tau_{r\eta e_{i_{j}}}g)\|_{1}dr+\int|\varrho(r)|\|\tau_{r\eta e_{i_{j}}}(f_{j}g)-f_{j}g\|_{1}dr
|ϱ(r)|gτrηeijg1dr+|ϱ(r)|τrηeij(fjg)fjg1dr.\displaystyle\leq\int|\varrho(r)|\|g-\tau_{r\eta e_{i_{j}}}g\|_{1}dr+\int|\varrho(r)|\|\tau_{r\eta e_{i_{j}}}(f_{j}g)-f_{j}g\|_{1}dr.

Since gτzg1\|g-\tau_{z}g\|_{1} and τz(fjg)fjg1\|\tau_{z}(f_{j}g)-f_{j}g\|_{1} are both bounded by 2g1<2\|g\|_{1}<\infty for all zdz\in\mathbb{R}^{d}, Proposition 8.5 in Folland (1999) implies that, for all rr\in\mathbb{R},

\displaystyle\|g-\tau_{r\eta e_{i_{j}}}g\|_{1}\vee\|\tau_{r\eta e_{i_{j}}}(f_{j}g)-f_{j}g\|_{1}\rightarrow 0\quad{}\mathrm{as}\quad{}\eta\rightarrow 0.

Hence, by the dominated convergence theorem

|ϱ(r)|τrηeijfjgfjg1dr0asη0.\displaystyle\int|\varrho(r)|\|\tau_{r\eta e_{i_{j}}}f_{j}g-f_{j}g\|_{1}dr\rightarrow 0\quad{}\mathrm{as}\quad{}\eta\rightarrow 0.

Conclude that each summand in eq. (124) vanishes as η0\eta\downarrow 0. This completes the proof. ∎
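The L1-continuity of translation invoked above (Proposition 8.5 in Folland, 1999) can be illustrated numerically. The following sketch (our own illustration, not part of the proof) approximates gτhg1\|g-\tau_{h}g\|_{1} on a grid for gg the standard normal density and checks that the gap shrinks as the translation parameter hh tends to zero; the grid and the test values of hh are arbitrary choices.

```python
import numpy as np

# Riemann-sum approximation of || g - tau_h g ||_1 for the standard
# normal density g, where tau_h g(x) = g(x - h).
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]

def gauss(u):
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def l1_translation_gap(h):
    return np.sum(np.abs(gauss(x) - gauss(x - h))) * dx

# The gap decreases to 0 as the translation parameter shrinks.
gaps = [l1_translation_gap(h) for h in (0.5, 0.05, 0.005)]
```

For the Gaussian density the gap equals 2(2Φ(h/2)1)2(2\Phi(h/2)-1), so it vanishes linearly in hh.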

F.3 Proofs of Lemmas 8, 9, 10, and 11

Proof of Lemma 8.

This result has already been established by Le Cam (1986, Lemma 2, p. 402); we provide a full proof for completeness. We compute,

{Xs}\displaystyle\mathbb{P}\left\{X\leq s\right\} ={Xs,Zs+ε}+{Xs,Z>s+ε}\displaystyle=\mathbb{P}\left\{X\leq s,\>Z\leq s+\varepsilon\right\}+\mathbb{P}\left\{X\leq s,\>Z>s+\varepsilon\right\}
{Zs+ε}+{|ZX|>ε}\displaystyle\leq\mathbb{P}\left\{Z\leq s+\varepsilon\right\}+\mathbb{P}\left\{|Z-X|>\varepsilon\right\}
{Zs}+{sZs+ε}+{|ZX|>ε}.\displaystyle\leq\mathbb{P}\left\{Z\leq s\right\}+\mathbb{P}\left\{s\leq Z\leq s+\varepsilon\right\}+\mathbb{P}\left\{|Z-X|>\varepsilon\right\}.

For the reverse inequality,

{Zsε}\displaystyle\mathbb{P}\left\{Z\leq s-\varepsilon\right\} {Zsε,Xs}+{Zsε,X>s}\displaystyle\leq\mathbb{P}\left\{Z\leq s-\varepsilon,\>X\leq s\right\}+\mathbb{P}\left\{Z\leq s-\varepsilon,\>X>s\right\}
{Xs}+{|ZX|>ε},\displaystyle\leq\mathbb{P}\left\{X\leq s\right\}+\mathbb{P}\left\{|Z-X|>\varepsilon\right\},

and, hence,

{Zs}{Xs}+{|ZX|>ε}+{sεZs}.\displaystyle\mathbb{P}\left\{Z\leq s\right\}\leq\mathbb{P}\left\{X\leq s\right\}+\mathbb{P}\left\{|Z-X|>\varepsilon\right\}+\mathbb{P}\left\{s-\varepsilon\leq Z\leq s\right\}.

Take the supremum over s0s\geq 0 and combine both inequalities. Then switch the roles of XX and ZZ to conclude the proof. ∎
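Since the displayed chains of inequalities hold for an arbitrary joint law of (X,Z)(X,Z), they can be verified exactly under the empirical measure of a simulated sample. The sketch below (our own sanity check, not part of the proof; the sample size, ε\varepsilon, and the grid of thresholds ss are arbitrary) does so for the first chain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)            # draws of X
z = x + 0.1 * rng.normal(size=n)  # draws of Z, coupled to X
eps = 0.2

def p(event):
    # probability of an event under the empirical measure of the sample
    return float(np.mean(event))

# Largest violation of
#   P{X<=s} <= P{Z<=s} + P{s<=Z<=s+eps} + P{|Z-X|>eps}
# over a grid of thresholds s; it should be non-positive up to rounding.
worst = max(
    p(x <= s) - (p(z <= s) + p((s <= z) & (z <= s + eps)) + p(np.abs(z - x) > eps))
    for s in np.linspace(-3.0, 3.0, 61)
)
```

The inequality uses only additivity and monotonicity of probability, so it holds exactly for the empirical measure, not just approximately.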

Proof of Lemma 9.

Observe that XZ=X+(ZX)+X\vee Z=X+(Z-X)_{+}. Now, by Cauchy-Schwarz,

Var(XZ)\displaystyle\mathrm{Var}(X\vee Z) =Var(X)+Var((ZX)+)+2Cov(X,(ZX)+)\displaystyle=\mathrm{Var}(X)+\mathrm{Var}\big{(}(Z-X)_{+}\big{)}+2\mathrm{Cov}\big{(}X,(Z-X)_{+}\big{)}
Var(X)+Var((ZX)+)+2Var(X)Var((ZX)+)\displaystyle\leq\mathrm{Var}(X)+\mathrm{Var}\big{(}(Z-X)_{+}\big{)}+2\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}\big{(}(Z-X)_{+}\big{)}}
=(Var(X)+Var((ZX)+))2.\displaystyle=\left(\sqrt{\mathrm{Var}(X)}+\sqrt{\mathrm{Var}\big{(}(Z-X)_{+}\big{)}}\right)^{2}.

Similarly, use the estimate 2Cov(X,(ZX)+)2Var(X)Var((ZX)+)2\mathrm{Cov}\big{(}X,(Z-X)_{+}\big{)}\geq-2\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}\big{(}(Z-X)_{+}\big{)}} in the first line of the above display to obtain

Var(XZ)(Var(X)Var((ZX)+))2.\displaystyle\mathrm{Var}(X\vee Z)\geq\left(\sqrt{\mathrm{Var}(X)}-\sqrt{\mathrm{Var}\big{(}(Z-X)_{+}\big{)}}\right)^{2}.

Combine both inequalities to obtain the desired two-sided inequality. Further, if E[ZX]0\mathrm{E}[Z-X]\geq 0, then by convexity of the map a(a)+a\mapsto(a)_{+}, aa\in\mathbb{R}, and Jensen’s inequality,

\displaystyle\mathrm{Var}\big{(}(Z-X)_{+}\big{)}=\mathrm{E}\big{[}(Z-X)_{+}^{2}\big{]}-\mathrm{E}\big{[}(Z-X)_{+}\big{]}^{2}\leq\mathrm{E}\big{[}(Z-X)^{2}\big{]}-\big{(}\mathrm{E}[Z-X]\big{)}_{+}^{2}=\mathrm{Var}(Z-X).
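All three bounds of the lemma hold for any square-integrable pair, so they can be checked exactly under the empirical measure of a simulated sample (population variances, i.e. ddof=0). A sketch of our own, with an arbitrary coupling chosen so that the sample mean of ZXZ-X is nonnegative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
z = x + 0.5 + rng.normal(size=n)   # shift makes the sample mean of z - x positive

pos_part = np.maximum(z - x, 0.0)  # (Z - X)_+
v_max = np.var(np.maximum(x, z))   # Var(X v Z), population variance (ddof=0)
v_x, v_p = np.var(x), np.var(pos_part)

# two-sided bound from the Cauchy-Schwarz argument
lower = (np.sqrt(v_x) - np.sqrt(v_p)) ** 2
upper = (np.sqrt(v_x) + np.sqrt(v_p)) ** 2
```

Both the two-sided bound and the Jensen step apply verbatim to the empirical measure, so the checks below hold exactly up to floating-point rounding.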

Proof of Lemma 10.

The proof is identical to the one of Lemma 3.2 in Chernozhukov et al. (2013). Let πn(δ)=δ1/3(Var(Σ1/2Z))1/3\pi_{n}(\delta)=\delta^{1/3}\left(\mathrm{Var}(\|\Sigma^{1/2}Z\|_{\infty})\right)^{-1/3}. By Lemma 3 there exists an absolute constant K>0K>0 such that on the event {maxj,k|Σ^n,jkΣjk|δ}\left\{\max_{j,k}|\widehat{\Sigma}_{n,jk}-\Sigma_{jk}|\leq\delta\right\}, for all s0s\geq 0,

|{Σ^n1/2ZsX1,,Xn}{Σn1/2Zs}|Kπn(δ).\displaystyle\Big{|}\mathbb{P}\left\{\|\widehat{\Sigma}_{n}^{1/2}Z\|_{\infty}\leq s\mid X_{1},\ldots,X_{n}\right\}-\mathbb{P}\left\{\|\Sigma_{n}^{1/2}Z\|_{\infty}\leq s\right\}\Big{|}\leq K\pi_{n}(\delta).

In particular, for s=cn(Kπn(δ)+α;Σ)s=c_{n}(K\pi_{n}(\delta)+\alpha;\Sigma), we obtain

\displaystyle\mathbb{P}\left\{\|\widehat{\Sigma}_{n}^{1/2}Z\|_{\infty}\leq c_{n}(K\pi_{n}(\delta)+\alpha;\Sigma)\mid X_{1},\ldots,X_{n}\right\} \displaystyle\geq\mathbb{P}\left\{\|\Sigma_{n}^{1/2}Z\|_{\infty}\leq c_{n}(K\pi_{n}(\delta)+\alpha;\Sigma)\right\}-K\pi_{n}(\delta)
Kπn(δ)+αKπn(δ)=α.\displaystyle\geq K\pi_{n}(\delta)+\alpha-K\pi_{n}(\delta)=\alpha.

To conclude the proof of the first statement, apply the definition of quantiles. The second claim follows in the same way. ∎

Proof of Lemma 11.

Let δ,η>0\delta,\eta>0 be arbitrary and set γn=sups0|{SnsX1,,Xn}{Σn1/2Zs}|\gamma_{n}=\sup_{s\geq 0}|\mathbb{P}\{S_{n}\leq s\mid X_{1},\ldots,X_{n}\}-\mathbb{P}\{\|\Sigma_{n}^{1/2}Z\|_{\infty}\leq s\}|, κn(δ)=δ(Var(Σn1/2Z))1/2\kappa_{n}(\delta)=\delta(\mathrm{Var}(\|\Sigma_{n}^{1/2}Z\|_{\infty}))^{-1/2}, and ρn(δ)={|Rn|>δX1,,Xn}\rho_{n}(\delta)=\mathbb{P}\{|R_{n}|>\delta\mid X_{1},\ldots,X_{n}\}. By Lemma 6 there exists an absolute constant K>0K>0 such that on the event {γn+ρn(δ)η}\{\gamma_{n}+\rho_{n}(\delta)\leq\eta\},

|{TnsX1,,Xn}{Σn1/2Zs}|\displaystyle\left|\mathbb{P}\left\{T_{n}\leq s\mid X_{1},\ldots,X_{n}\right\}-\mathbb{P}\left\{\|\Sigma_{n}^{1/2}Z\|_{\infty}\leq s\right\}\right|
|{Tns,|Rn|δX1,,Xn}{Σn1/2Zs}|+{|Rn|>δX1,,Xn}\displaystyle\quad\leq\left|\mathbb{P}\left\{T_{n}\leq s,|R_{n}|\leq\delta\mid X_{1},\ldots,X_{n}\right\}-\mathbb{P}\left\{\|\Sigma_{n}^{1/2}Z\|_{\infty}\leq s\right\}\right|+\mathbb{P}\left\{|R_{n}|>\delta\mid X_{1},\ldots,X_{n}\right\}
sups0|{SnsX1,,Xn}{Σn1/2Zs}|\displaystyle\quad\leq\sup_{s\geq 0}\left|\mathbb{P}\left\{S_{n}\leq s\mid X_{1},\ldots,X_{n}\right\}-\mathbb{P}\left\{\|\Sigma_{n}^{1/2}Z\|_{\infty}\leq s\right\}\right|
+sups0{sΣn1/2Zs+δ}+{|Rn|>δX1,,Xn}\displaystyle\quad\quad+\sup_{s\geq 0}\mathbb{P}\left\{s\leq\|\Sigma_{n}^{1/2}Z\|_{\infty}\leq s+\delta\right\}+\mathbb{P}\left\{|R_{n}|>\delta\mid X_{1},\ldots,X_{n}\right\}
\displaystyle\quad\leq\gamma_{n}+K\kappa_{n}(\delta)+\rho_{n}(\delta)
Kκn(δ)+η.\displaystyle\quad\leq K\kappa_{n}(\delta)+\eta.

Hence, for s=cn(Kκn(δ)+η+α;Σ)s=c_{n}(K\kappa_{n}(\delta)+\eta+\alpha;\Sigma), we obtain

{Tncn(Kκn(δ)+η+α;Σ)}\displaystyle\mathbb{P}\left\{T_{n}\leq c_{n}(K\kappa_{n}(\delta)+\eta+\alpha;\Sigma)\right\}
{Σn1/2Zcn(Kκn(δ)+η+α;Σ)}Kκn(δ)η\displaystyle\quad\geq\mathbb{P}\left\{\|\Sigma_{n}^{1/2}Z\|_{\infty}\leq c_{n}(K\kappa_{n}(\delta)+\eta+\alpha;\Sigma)\right\}-K\kappa_{n}(\delta)-\eta
Kκn(δ)+η+αKκn(δ)η\displaystyle\quad\geq K\kappa_{n}(\delta)+\eta+\alpha-K\kappa_{n}(\delta)-\eta
=α.\displaystyle\quad=\alpha.

The first statement now follows from the definition of quantiles. The second claim follows in the same way. ∎

F.4 Proof of Lemma 18

Proof of Lemma 18.

First, we show necessity: Suppose that the sample paths of XX on UU are almost surely uniformly continuous w.r.t. dXd_{X}, i.e. for almost every ωΩ\omega\in\Omega,

limδ0supdX(u,v)<δ|Xu(ω)Xv(ω)|=0.\displaystyle\lim_{\delta\rightarrow 0}\sup_{d_{X}(u,v)<\delta}|X_{u}(\omega)-X_{v}(\omega)|=0.

Since XX is Gaussian, by Lemma 14, for δ>0\delta>0 arbitrary,

\displaystyle\mathrm{E}\left[\sup_{d_{X}(u,v)<\delta}|X_{u}-X_{v}|\right]\leq 2\mathrm{E}[Z]<\infty.

Hence, the claim follows from the dominated convergence theorem.

Next, we show sufficiency: Given the premise, we can find a null sequence {δn}n1\{\delta_{n}\}_{n\geq 1} such that

\displaystyle\mathrm{E}\left[\sup_{d_{X}(u,v)<\delta_{n}}|X_{u}-X_{v}|\right]\leq 2^{-n}. (125)

Define events

An={supdX(u,v)<δn2n|XuXv|>2n/2}.\displaystyle A_{n}=\left\{\sup_{d_{X}(u,v)<\delta_{n}\wedge 2^{-n}}|X_{u}-X_{v}|>2^{-n/2}\right\}.

By Markov’s inequality and (125),

\displaystyle\sum_{n=1}^{\infty}\mathbb{P}\{A_{n}\}\leq\sum_{n=1}^{\infty}2^{n/2}\mathrm{E}\left[\sup_{d_{X}(u,v)<\delta_{n}\wedge 2^{-n}}|X_{u}-X_{v}|\right]\leq\sum_{n=1}^{\infty}2^{-n/2}=1+\sqrt{2}<\infty.

Thus, by Borel-Cantelli, {lim supnAn}=0\mathbb{P}\{\limsup_{n\rightarrow\infty}A_{n}\}=0, i.e. XX is almost surely uniformly continuous on UU. (Note that we have not used the fact that XX is Gaussian and dXd_{X} the intrinsic standard deviation. Hence, sufficiency holds for arbitrary stochastic processes on general metric spaces.) ∎
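The numerical value of the geometric series in the Borel–Cantelli step can be confirmed directly; a trivial sketch:

```python
import math

# sum_{n>=1} 2^{-n/2} = r / (1 - r) with r = 2^{-1/2}, i.e. 1 + sqrt(2)
partial = sum(2 ** (-n / 2) for n in range(1, 200))
closed_form = 1 + math.sqrt(2)
```

The truncation error after 199 terms is of order 21002^{-100} and hence negligible.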

F.5 Proofs of Lemmas 19, 20, 21, 22, and 23

Proof of Lemma 19.

Throughout the proof \otimes denotes the tensor product between linear maps.

Let Mn=max1inXi22M_{n}=\max_{1\leq i\leq n}\|X_{i}\|_{2}^{2}. Define the map xφ(x):=x2𝟏{|x|Mn}+Mn2𝟏{|x|>Mn}x\mapsto\varphi(x):=x^{2}\mathbf{1}\{|x|\leq M_{n}\}+M_{n}^{2}\mathbf{1}\{|x|>M_{n}\}. This map is Lipschitz continuous with Lipschitz constant 2Mn2M_{n} and φ(0)=0\varphi(0)=0. Let ε1,,εn\varepsilon_{1},\ldots,\varepsilon_{n} be i.i.d. Rademacher random variables independent of X1,,XnX_{1},\ldots,X_{n}. Then the symmetrization and contraction principles applied to φ()/(2Mn)\varphi(\cdot)/(2M_{n}) yield

E1ni=1nXiXiXiXiT4op\displaystyle\mathrm{E}\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}\otimes X_{i}\otimes X_{i}\otimes X_{i}-T_{4}\right\|_{op} 4E[Mnsupu2=1v2=1|1ni=1nεi(Xiu)(Xiv)|].\displaystyle\leq 4\mathrm{E}\left[M_{n}\sup_{\begin{subarray}{c}\|u\|_{2}=1\\ \|v\|_{2}=1\end{subarray}}\left|\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}(X_{i}^{\prime}u)(X_{i}^{\prime}v)\right|\right].

We upper bound the right-hand side in the above display using Cauchy-Schwarz, Hoffmann-Jørgensen, and de-symmetrization inequalities by

(E[Mn2])1/2(E[supu2=1v2=1(1ni=1nεi(Xiu)(Xiv))2])1/2\displaystyle\left(\mathrm{E}\left[M_{n}^{2}\right]\right)^{1/2}\left(\mathrm{E}\left[\sup_{\begin{subarray}{c}\|u\|_{2}=1\\ \|v\|_{2}=1\end{subarray}}\left(\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}(X_{i}^{\prime}u)(X_{i}^{\prime}v)\right)^{2}\right]\right)^{1/2}
(E[Mn2])1/2{E1ni=1nXiXiΣop+n1(E[Mn2])1/2}.\displaystyle\quad\lesssim\left(\mathrm{E}\left[M_{n}^{2}\right]\right)^{1/2}\left\{\mathrm{E}\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}\otimes X_{i}-\Sigma\right\|_{op}+n^{-1}\left(\mathrm{E}\left[M_{n}^{2}\right]\right)^{1/2}\right\}.

Since X=(X1,,Xd)dX=(X_{1},\ldots,X_{d})^{\prime}\in\mathbb{R}^{d} is sub-Gaussian with mean zero and covariance Σ\Sigma, we compute

XXψ1k=1dXk2ψ1k=1dXkψ22k=1dVar(Xk)=tr(Σ).\displaystyle\left\|X^{\prime}X\right\|_{\psi_{1}}\leq\sum_{k=1}^{d}\left\|X_{k}^{2}\right\|_{\psi_{1}}\leq\sum_{k=1}^{d}\left\|X_{k}\right\|_{\psi_{2}}^{2}\lesssim\sum_{k=1}^{d}\mathrm{Var}(X_{k})=\mathrm{tr}(\Sigma).

Hence, by Lemma 35 in Giessing and Wang (2021) and Lemma 2.2.2 in van der Vaart and Wellner (1996) (see Exercise 2.14.1 (ibid.) for how to handle the non-convexity of the map inducing the ψ1/2\psi_{1/2}-norm)

E[Mn2]=E[max1in(XiXi)2](logen)2max1in(XiXi)2ψ1/2(logen)2tr2(Σ).\displaystyle\mathrm{E}\left[M_{n}^{2}\right]=\mathrm{E}\left[\max_{1\leq i\leq n}(X_{i}^{\prime}X_{i})^{2}\right]\lesssim(\log en)^{2}\max_{1\leq i\leq n}\left\|(X_{i}^{\prime}X_{i})^{2}\right\|_{\psi_{1/2}}\lesssim(\log en)^{2}\mathrm{tr}^{2}(\Sigma).

Moreover, by Theorem 4 in Koltchinskii and Lounici (2017),

E1ni=1nXiXiΣopΣop(r(Σ)nr(Σ)n).\displaystyle\mathrm{E}\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}\otimes X_{i}-\Sigma\right\|_{op}\lesssim\|\Sigma\|_{op}\left(\sqrt{\frac{\mathrm{r}(\Sigma)}{n}}\vee\frac{\mathrm{r}(\Sigma)}{n}\right).

Thus, we conclude that

E1ni=1nXiXiXiXiT4op\displaystyle\mathrm{E}\left\|\frac{1}{n}\sum_{i=1}^{n}X_{i}\otimes X_{i}\otimes X_{i}\otimes X_{i}-T_{4}\right\|_{op}
(logen)r(Σ)Σop2(r(Σ)nr(Σ)n)+(logen)2r2(Σ)Σop2n.\displaystyle\quad\lesssim(\log en)\mathrm{r}(\Sigma)\|\Sigma\|_{op}^{2}\left(\sqrt{\frac{\mathrm{r}(\Sigma)}{n}}\vee\frac{\mathrm{r}(\Sigma)}{n}\right)+\frac{(\log en)^{2}\mathrm{r}^{2}(\Sigma)\|\Sigma\|_{op}^{2}}{n}.

Adjust some absolute constants to complete the proof. ∎
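As an aside, the truncation map φ(x)=x2𝟏{|x|Mn}+Mn2𝟏{|x|>Mn}\varphi(x)=x^{2}\mathbf{1}\{|x|\leq M_{n}\}+M_{n}^{2}\mathbf{1}\{|x|>M_{n}\} used at the start of the proof equals min{x2,Mn2}\min\{x^{2},M_{n}^{2}\}, so its Lipschitz constant 2Mn2M_{n} and the property φ(0)=0\varphi(0)=0 are easy to spot-check on a grid (our own sketch; the level M and the grid are arbitrary):

```python
import numpy as np

M = 3.0
grid = np.linspace(-10.0, 10.0, 801)

def phi(x):
    # x^2 below the truncation level M, capped at M^2 above it
    return np.minimum(x**2, M**2)

# largest secant slope |phi(x) - phi(y)| / |x - y| over all grid pairs
xs, ys = np.meshgrid(grid, grid)
mask = xs != ys
slopes = np.abs(phi(xs) - phi(ys))[mask] / np.abs(xs - ys)[mask]
```

The maximal slope is attained near |x| = M, where the derivative of the un-truncated branch equals 2M.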

Proof of Lemma 20.

To simplify notation we write εi=Yif0(Xi)\varepsilon_{i}=Y_{i}-f_{0}(X_{i}) for 1in1\leq i\leq n.

We begin with the following fundamental identity: Let II be the identity operator and PP be such that I+PI+P is invertible. Then,

(I+P)1PI=(I+P)1.\displaystyle(I+P)^{-1}P-I=-(I+P)^{-1}.

Indeed, we have I=(I+P)1(I+P)=(I+P)1+(I+P)1PI=(I+P)^{-1}(I+P)=(I+P)^{-1}+(I+P)^{-1}P. Now, re-arrange the terms on the far left and right hand side of this identity to conclude. Applied to P=λ1T^nP=\lambda^{-1}\widehat{T}_{n}, we obtain

(T^n+λ)1T^nI=(λ1T^n+I)1λ1T^nI=(λ1T^n+I)1=λ(T^n+λ)1.\displaystyle(\widehat{T}_{n}+\lambda)^{-1}\widehat{T}_{n}-I=(\lambda^{-1}\widehat{T}_{n}+I)^{-1}\lambda^{-1}\widehat{T}_{n}-I=-(\lambda^{-1}\widehat{T}_{n}+I)^{-1}=-\lambda(\widehat{T}_{n}+\lambda)^{-1}. (126)

Hence, we compute

f^bcnf0\displaystyle\widehat{f}^{\mathrm{bc}}_{n}-f_{0} =(T^n+λ)11ni=1nkXiYi+λ(T^n+λ)1f^nf0\displaystyle=(\widehat{T}_{n}+\lambda)^{-1}\frac{1}{n}\sum_{i=1}^{n}k_{X_{i}}Y_{i}+\lambda(\widehat{T}_{n}+\lambda)^{-1}\widehat{f}_{n}-f_{0}
=(T^n+λ)11ni=1nεikXi+(T^n+λ)1T^nf0+λ(T^n+λ)1f^nf0\displaystyle=(\widehat{T}_{n}+\lambda)^{-1}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}+(\widehat{T}_{n}+\lambda)^{-1}\widehat{T}_{n}f_{0}+\lambda(\widehat{T}_{n}+\lambda)^{-1}\widehat{f}_{n}-f_{0}
=(a)(T^n+λ)11ni=1nεikXi+λ(T^n+λ)1(f^nf0)\displaystyle\overset{(a)}{=}(\widehat{T}_{n}+\lambda)^{-1}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}+\lambda(\widehat{T}_{n}+\lambda)^{-1}(\widehat{f}_{n}-f_{0})
=(T^n+λ)11ni=1nεikXi+λ(T^n+λ)21ni=1nεikXi\displaystyle=(\widehat{T}_{n}+\lambda)^{-1}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}+\lambda(\widehat{T}_{n}+\lambda)^{-2}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}
+λ(T^n+λ)1((T^n+λ)1T^nf0f0)\displaystyle\quad+\lambda(\widehat{T}_{n}+\lambda)^{-1}\left((\widehat{T}_{n}+\lambda)^{-1}\widehat{T}_{n}f_{0}-f_{0}\right)
=(b)(T^n+λ)11ni=1nεikXiλ(T^n+λ)21ni=1nεikXiλ2(T^n+λ)2f0\displaystyle\overset{(b)}{=}(\widehat{T}_{n}+\lambda)^{-1}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}-\lambda(\widehat{T}_{n}+\lambda)^{-2}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}-\lambda^{2}(\widehat{T}_{n}+\lambda)^{-2}f_{0}
=(T^n+λ)1(Iλ(T^n+λ)1)1ni=1nεikXiλ2(T^n+λ)2f0\displaystyle=(\widehat{T}_{n}+\lambda)^{-1}\left(I-\lambda(\widehat{T}_{n}+\lambda)^{-1}\right)\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}-\lambda^{2}(\widehat{T}_{n}+\lambda)^{-2}f_{0}
=(c)(T^n+λ)2T^n1ni=1nεikXiλ2(T^n+λ)2f0,\displaystyle\overset{(c)}{=}(\widehat{T}_{n}+\lambda)^{-2}\widehat{T}_{n}\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}-\lambda^{2}(\widehat{T}_{n}+\lambda)^{-2}f_{0}, (127)

where (a), (b), and (c) follow from identity (126).
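Identity (126) holds for any operator for which T^n+λ\widehat{T}_{n}+\lambda is invertible, so it can be spot-checked on a random positive semi-definite matrix. A sketch of our own, not part of the argument; the dimension and λ\lambda are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
d, lam = 5, 0.3
A = rng.normal(size=(d, d))
T_hat = A @ A.T                # random PSD stand-in for the operator T_hat_n
I = np.eye(d)

# (T + lam)^{-1} T - I  versus  -lam (T + lam)^{-1}
lhs = np.linalg.solve(T_hat + lam * I, T_hat) - I
rhs = -lam * np.linalg.inv(T_hat + lam * I)
```

The identity is exact, so the two matrices agree up to the conditioning of the linear solve.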

We now further upper bound the terms in (F.5). Re-walking the steps from (c) to (b) we obtain

((T^n+λ)2T^n(T+λ)2T)1ni=1nεikXi\displaystyle\left\|\left((\widehat{T}_{n}+\lambda)^{-2}\widehat{T}_{n}-(T+\lambda)^{-2}T\right)\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}\right\|_{\infty}
((T^n+λ)1(T+λ)1)1ni=1nεikXi+λ((T^n+λ)2(T+λ)2)1ni=1nεikXi.\displaystyle\begin{split}&\quad\quad\leq\left\|\left((\widehat{T}_{n}+\lambda)^{-1}-(T+\lambda)^{-1}\right)\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}\right\|_{\infty}\\ &\quad\quad\quad+\lambda\left\|\left((\widehat{T}_{n}+\lambda)^{-2}-(T+\lambda)^{-2}\right)\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}\right\|_{\infty}.\end{split} (128)

Note that

(T^n+λ)1(T+λ)Iop\displaystyle\|(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)-I\|_{op} (T^n+λ)1opTT^nopλ1TT^nop.\displaystyle\leq\|(\widehat{T}_{n}+\lambda)^{-1}\|_{op}\|T-\widehat{T}_{n}\|_{op}\leq\lambda^{-1}\|T-\widehat{T}_{n}\|_{op}. (129)
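Inequality (129) only uses (T^n+λ)1opλ1\|(\widehat{T}_{n}+\lambda)^{-1}\|_{op}\leq\lambda^{-1}, which is valid for positive semi-definite T^n\widehat{T}_{n}. A matrix sanity check (our own sketch; the dimension, λ\lambda, and the perturbation are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
d, lam = 6, 0.2
B = rng.normal(size=(d, d))
C = B + 0.1 * rng.normal(size=(d, d))
T, T_hat = B @ B.T, C @ C.T            # two nearby PSD stand-ins for T, T_hat_n
I = np.eye(d)

op = lambda M: np.linalg.norm(M, 2)    # spectral (operator) norm

# || (T_hat + lam)^{-1} (T + lam) - I ||  <=  lam^{-1} || T - T_hat ||
lhs = op(np.linalg.solve(T_hat + lam * I, T + lam * I) - I)
rhs = op(T - T_hat) / lam
```

The check rests on the factorization (T^n+λ)1(T+λ)I=(T^n+λ)1(TT^n)(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)-I=(\widehat{T}_{n}+\lambda)^{-1}(T-\widehat{T}_{n}), exactly as in the display above.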

Recall that |a21|3max{|a1|,|a1|2}|a^{2}-1|\leq 3\max\{|a-1|,|a-1|^{2}\} for all a0a\geq 0 (write a21=(a1)2+2(a1)a^{2}-1=(a-1)^{2}+2(a-1) and bound each term). Let UU_{\mathcal{H}} be the unit ball of \mathcal{H}. We have

(T^n+λ)2(T+λ)2Iop\displaystyle\|(\widehat{T}_{n}+\lambda)^{-2}(T+\lambda)^{2}-I\|_{op}
=(a)supuU|((T^n+λ)2(T+λ)2I)u,u|\displaystyle\quad\overset{(a)}{=}\sup_{u\in U_{\mathcal{H}}}\left|\left\langle\left((\widehat{T}_{n}+\lambda)^{-2}(T+\lambda)^{2}-I\right)u,u\right\rangle_{\mathcal{H}}\right|
=(b)supuU|(T^n+λ)1(T+λ)u,(T^n+λ)1(T+λ)u1|\displaystyle\quad\overset{(b)}{=}\sup_{u\in U_{\mathcal{H}}}\left|\left\langle(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)u,(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)u\right\rangle_{\mathcal{H}}-1\right|
=supuU|(T^n+λ)1(T+λ)u21|\displaystyle\quad=\sup_{u\in U_{\mathcal{H}}}\left|\left\|(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)u\right\|_{\mathcal{H}}^{2}-1\right|
3supuU|(T^n+λ)1(T+λ)u1|3supuU|(T^n+λ)1(T+λ)u1|2\displaystyle\quad\leq 3\sup_{u\in U_{\mathcal{H}}}\left|\left\|(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)u\right\|_{\mathcal{H}}-1\right|\vee 3\sup_{u\in U_{\mathcal{H}}}\left|\left\|(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)u\right\|_{\mathcal{H}}-1\right|^{2}
(c)3(T^n+λ)1(T+λ)Iop3(T^n+λ)1(T+λ)Iop2\displaystyle\quad\overset{(c)}{\leq}3\left\|(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)-I\right\|_{op}\vee 3\left\|(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)-I\right\|_{op}^{2}
(d)3λ1TT^nop3λ2TT^nop2,\displaystyle\quad\overset{(d)}{\leq}3\lambda^{-1}\|T-\widehat{T}_{n}\|_{op}\vee 3\lambda^{-2}\|T-\widehat{T}_{n}\|_{op}^{2}, (130)

where (a) and (b) hold because (T^n+λ)1(\widehat{T}_{n}+\lambda)^{-1} and (T+λ)(T+\lambda) are self-adjoint, (c) follows from the reverse triangle inequality applied to \|\cdot\|_{\mathcal{H}} and the definition of the operator norm, and (d) follows from (129).
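The elementary inequality |a21|3max{|a1|,|a1|2}|a^{2}-1|\leq 3\max\{|a-1|,|a-1|^{2}\} invoked in step (c) is easily confirmed on a grid; a quick sketch of our own, not part of the argument:

```python
import numpy as np

# check |a^2 - 1| <= 3 * max{|a - 1|, |a - 1|^2} on a fine grid of a >= 0
a = np.linspace(0.0, 10.0, 100_001)
lhs = np.abs(a**2 - 1)
rhs = 3 * np.maximum(np.abs(a - 1), (a - 1) ** 2)
```

The inequality follows from a21=(a1)(a+1)a^{2}-1=(a-1)(a+1): for a2a\leq 2 one bounds a+13a+1\leq 3, and for a>2a>2 one bounds a+13(a1)a+1\leq 3(a-1).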

By the reproducing property of the kernel kk and (129),

((T^n+λ)1(T+λ)1)1ni=1nεikXiκλ1T^nTop1ni=1n(T+λ)1εikXi,\displaystyle\begin{split}&\left\|\left((\widehat{T}_{n}+\lambda)^{-1}-(T+\lambda)^{-1}\right)\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}\right\|_{\infty}\leq\kappa\lambda^{-1}\|\widehat{T}_{n}-T\|_{op}\left\|\frac{1}{n}\sum_{i=1}^{n}(T+\lambda)^{-1}\varepsilon_{i}k_{X_{i}}\right\|_{\mathcal{H}},\end{split} (131)

and similarly, by (130),

λ((T^n+λ)2(T+λ)2)1ni=1nεikXiκλ1T^nTop(3+3λ1T^nTop)1ni=1n(T+λ)1εikXi,\displaystyle\begin{split}&\lambda\left\|\left((\widehat{T}_{n}+\lambda)^{-2}-(T+\lambda)^{-2}\right)\frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}\right\|_{\infty}\\ &\quad\leq\kappa\lambda^{-1}\|\widehat{T}_{n}-T\|_{op}\left(3+3\lambda^{-1}\|\widehat{T}_{n}-T\|_{op}\right)\left\|\frac{1}{n}\sum_{i=1}^{n}(T+\lambda)^{-1}\varepsilon_{i}k_{X_{i}}\right\|_{\mathcal{H}},\end{split} (132)

and (129) again,

λ2(T^n+λ)2f0\displaystyle\lambda^{2}\|(\widehat{T}_{n}+\lambda)^{-2}f_{0}\|_{\infty} λ2κ(T^n+λ)1(T+λ)op2(T+λ)2f0\displaystyle\leq\lambda^{2}\kappa\|(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)\|_{op}^{2}\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}
\displaystyle\leq\lambda^{2}\kappa\|(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)-I\|_{op}^{2}\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}+\lambda^{2}\kappa\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}
κ(T^nTop2+λ2)(T+λ)2f0.\displaystyle\leq\kappa\left(\|\widehat{T}_{n}-T\|_{op}^{2}+\lambda^{2}\right)\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}. (133)

Combine (127), (128), and (131)–(133) to conclude that

n(f^bcnf0)=(T+λ)2T1ni=1nεikXi+nRn,\displaystyle\sqrt{n}(\widehat{f}^{\mathrm{bc}}_{n}-f_{0})=(T+\lambda)^{-2}T\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\varepsilon_{i}k_{X_{i}}+\sqrt{n}R_{n},

where

nRn\displaystyle\sqrt{n}\|R_{n}\|_{\infty} κλ1T^nTop(1+λ1T^nTop)1ni=1n(T+λ)1εikXi\displaystyle\lesssim\kappa\lambda^{-1}\|\widehat{T}_{n}-T\|_{op}\left(1+\lambda^{-1}\|\widehat{T}_{n}-T\|_{op}\right)\left\|\frac{1}{n}\sum_{i=1}^{n}(T+\lambda)^{-1}\varepsilon_{i}k_{X_{i}}\right\|_{\mathcal{H}}
+κ(nT^nTop2+nλ2)(T+λ)2f0.\displaystyle+\kappa\left(\sqrt{n}\|\widehat{T}_{n}-T\|_{op}^{2}+\sqrt{n}\lambda^{2}\right)\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}.

Proof of Lemma 21.

Observe that tr(T)κ2\mathrm{tr}(T)\leq\kappa^{2}. Therefore, by Lemma 23, with probability at least 1δ1-\delta,

λ1T^nTop1ni=1n(T+λ)1εikXiκ2log2(2/δ)(σ¯2𝔫12(λ)nλ2σ¯2nλ2),\displaystyle\lambda^{-1}\|\widehat{T}_{n}-T\|_{op}\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(T+\lambda)^{-1}\varepsilon_{i}k_{X_{i}}\right\|_{\mathcal{H}}\lesssim\kappa^{2}\log^{2}(2/\delta)\left(\sqrt{\frac{\bar{\sigma}^{2}\mathfrak{n}_{1}^{2}(\lambda)}{n\lambda^{2}}}\vee\frac{\bar{\sigma}^{2}}{n\lambda^{2}}\right),

and

λ2T^nTop21ni=1n(T+λ)1εikXiκ4log3(2/δ)nλ(σ¯2𝔫12(λ)nλ2σ¯2nλ2),\displaystyle\lambda^{-2}\|\widehat{T}_{n}-T\|_{op}^{2}\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(T+\lambda)^{-1}\varepsilon_{i}k_{X_{i}}\right\|_{\mathcal{H}}\lesssim\frac{\kappa^{4}\log^{3}(2/\delta)}{\sqrt{n}\lambda}\left(\sqrt{\frac{\bar{\sigma}^{2}\mathfrak{n}_{1}^{2}(\lambda)}{n\lambda^{2}}}\vee\frac{\bar{\sigma}^{2}}{n\lambda^{2}}\right),

and

\displaystyle\kappa\left(\sqrt{n}\|\widehat{T}_{n}-T\|_{op}^{2}+\sqrt{n}\lambda^{2}\right)\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}\lesssim\kappa\left(\frac{\kappa^{4}\log^{2}(2/\delta)}{\sqrt{n}}+\sqrt{n}\lambda^{2}\right)\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}.

Combining the above three inequalities, we conclude that, with probability at least 1-\delta,

nRn\displaystyle\sqrt{n}\|R_{n}\|_{\infty} (σ¯2𝔫12(λ)nλ2σ¯2nλ2)(κ2log2(2/δ)+κ4log3(2/δ)nλ)\displaystyle\lesssim\left(\sqrt{\frac{\bar{\sigma}^{2}\mathfrak{n}_{1}^{2}(\lambda)}{n\lambda^{2}}}\vee\frac{\bar{\sigma}^{2}}{n\lambda^{2}}\right)\left(\kappa^{2}\log^{2}(2/\delta)+\frac{\kappa^{4}\log^{3}(2/\delta)}{\sqrt{n}\lambda}\right)
+κ(T+λ)2f0(κ4log2(2/δ)n+nλ2).\displaystyle+\kappa\|(T+\lambda)^{-2}f_{0}\|_{\mathcal{H}}\left(\frac{\kappa^{4}\log^{2}(2/\delta)}{\sqrt{n}}+\sqrt{n}\lambda^{2}\right).

Proof of Lemma 22.

Note that AA and (A+λ)1(A+\lambda)^{-1} commute for any operator AA. Hence, we compute

Ω^nΩop|σ^n2σ02|(T+λ)4T3op+σ02(T^n+λ)4T^n3(T+λ)4T3op=𝐈+𝐈𝐈.\displaystyle\|\widehat{\Omega}_{n}-\Omega\|_{op}\leq\left|\widehat{\sigma}_{n}^{2}-\sigma_{0}^{2}\right|\|(T+\lambda)^{-4}T^{3}\|_{op}+\sigma_{0}^{2}\|(\widehat{T}_{n}+\lambda)^{-4}\widehat{T}_{n}^{3}-(T+\lambda)^{-4}T^{3}\|_{op}=\mathbf{I}+\mathbf{II}.

Bound on term \mathbf{I}. Since \mathrm{E}[(Y_{i}-f_{0}(X_{i}))^{2}-\sigma_{0}^{2}]=0, |(Y_{i}-f_{0}(X_{i}))^{2}-\sigma_{0}^{2}|\leq(B+\kappa\|f_{0}\|_{\mathcal{H}})^{2}+\sigma_{0}^{2} almost surely, and \mathrm{Var}\big((Y_{i}-f_{0}(X_{i}))^{2}\big)\leq 2(B+\kappa\|f_{0}\|_{\mathcal{H}})^{4}+2\sigma_{0}^{4}, Bernstein’s inequality for real-valued random variables implies that, with probability at least 1-\delta,

|σ^n2σ02|((B+κf0)2+σ02)(log(2/δ)nlog(2/δ)n).\displaystyle\left|\widehat{\sigma}_{n}^{2}-\sigma_{0}^{2}\right|\lesssim((B+\kappa\|f_{0}\|_{\mathcal{H}})^{2}+\sigma_{0}^{2})\left(\sqrt{\frac{\log(2/\delta)}{n}}\vee\frac{\log(2/\delta)}{n}\right).

Also,

(T+λ)4T3opλ1(T+λ)1Top3.\displaystyle\|(T+\lambda)^{-4}T^{3}\|_{op}\leq\lambda^{-1}\|(T+\lambda)^{-1}T\|_{op}^{3}.
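Since T and (T+\lambda)^{-1} commute, this bound can be verified directly by factoring out a single resolvent and using \|(T+\lambda)^{-1}\|_{op}\leq\lambda^{-1}:

```latex
\|(T+\lambda)^{-4}T^{3}\|_{op}
=\big\|(T+\lambda)^{-1}\big((T+\lambda)^{-1}T\big)^{3}\big\|_{op}
\leq\|(T+\lambda)^{-1}\|_{op}\,\|(T+\lambda)^{-1}T\|_{op}^{3}
\leq\lambda^{-1}\|(T+\lambda)^{-1}T\|_{op}^{3}.
```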

Hence, with probability at least 1δ1-\delta,

𝐈((B+κf0)2+σ02)(T+λ)1Top3(log(2/δ)nλ2log(2/δ)nλ).\displaystyle\mathbf{I}\lesssim((B+\kappa\|f_{0}\|_{\mathcal{H}})^{2}+\sigma_{0}^{2})\|(T+\lambda)^{-1}T\|_{op}^{3}\left(\sqrt{\frac{\log(2/\delta)}{n\lambda^{2}}}\vee\frac{\log(2/\delta)}{n\lambda}\right).

Bound on term 𝐈𝐈\mathbf{II}. Compute

(T^n+λ)4T^n3(T+λ)4T3op\displaystyle\|(\widehat{T}_{n}+\lambda)^{-4}\widehat{T}_{n}^{3}-(T+\lambda)^{-4}T^{3}\|_{op}
(T^n3T3)(T+λ)4op+(T^n+λ)4(T+λ)4Iop(T+λ)4T3op.\displaystyle\quad\leq\|(\widehat{T}_{n}^{3}-T^{3})(T+\lambda)^{-4}\|_{op}+\|(\widehat{T}_{n}+\lambda)^{-4}(T+\lambda)^{4}-I\|_{op}\|(T+\lambda)^{-4}T^{3}\|_{op}.

Since (a3b3)c4=(ab)c2((ab)2c2+3(ab)cbc+3b2c2)(a^{3}-b^{3})c^{4}=(a-b)c^{2}\big{(}(a-b)^{2}c^{2}+3(a-b)cbc+3b^{2}c^{2}\big{)} for a,b,ca,b,c\in\mathbb{R}, it follows that

(T^n3T3)(T+λ)4op\displaystyle\|(\widehat{T}_{n}^{3}-T^{3})(T+\lambda)^{-4}\|_{op}
λ1(T^nT)(T+λ)1op((T^nT)(T+λ)1op2\displaystyle\quad\leq\lambda^{-1}\|(\widehat{T}_{n}-T)(T+\lambda)^{-1}\|_{op}\left(\|(\widehat{T}_{n}-T)(T+\lambda)^{-1}\|_{op}^{2}\right.
+3(T^nT)(T+λ)1opT(T+λ)1op+3T(T+λ)1op2)\displaystyle\quad\quad\left.+3\|(\widehat{T}_{n}-T)(T+\lambda)^{-1}\|_{op}\|T(T+\lambda)^{-1}\|_{op}+3\|T(T+\lambda)^{-1}\|_{op}^{2}\right)
3λ1(T^nT)(T+λ)1op((T^nT)(T+λ)1op+T(T+λ)1op)2.\displaystyle\quad\leq 3\lambda^{-1}\|(\widehat{T}_{n}-T)(T+\lambda)^{-1}\|_{op}\left(\|(\widehat{T}_{n}-T)(T+\lambda)^{-1}\|_{op}+\|T(T+\lambda)^{-1}\|_{op}\right)^{2}.
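For scalars, the identity used above can be checked by expanding the right-hand side; in the operator application the factors need not commute, which is why the middle term is recorded as (a-b)cbc rather than (a-b)bc^{2}:

```latex
(a-b)c^{2}\big((a-b)^{2}c^{2}+3(a-b)cbc+3b^{2}c^{2}\big)
=(a-b)\big((a-b)^{2}+3(a-b)b+3b^{2}\big)c^{4}
=(a-b)(a^{2}+ab+b^{2})c^{4}
=(a^{3}-b^{3})c^{4}.
```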

Hence, by Lemma 23, with probability at least 1δ1-\delta,

(T^n3T3)(T+λ)4op(κ2𝔫12(λ)log(2/δ)nλ2κ2log(2/δ)nλ2)×(κ2𝔫12(λ)log(2/δ)nλ2κ2log(2/δ)nλ2T(T+λ)1op)2.\displaystyle\begin{split}\|(\widehat{T}_{n}^{3}-T^{3})(T+\lambda)^{-4}\|_{op}&\lesssim\left(\sqrt{\frac{\kappa^{2}\mathfrak{n}_{1}^{2}(\lambda)\log(2/\delta)}{n\lambda^{2}}}\vee\frac{\kappa^{2}\log(2/\delta)}{n\lambda^{2}}\right)\\ &\quad\times\left(\sqrt{\frac{\kappa^{2}\mathfrak{n}_{1}^{2}(\lambda)\log(2/\delta)}{n\lambda^{2}}}\vee\frac{\kappa^{2}\log(2/\delta)}{n\lambda^{2}}\vee\|T(T+\lambda)^{-1}\|_{op}\right)^{2}.\end{split} (134)

Next, recall the approach that led to the bound in (F.5) in the proof of Lemma 20. We iterate this approach to obtain

(T^n+λ)4(T+λ)4Iop\displaystyle\|(\widehat{T}_{n}+\lambda)^{-4}(T+\lambda)^{4}-I\|_{op}
3(T^n+λ)2(T+λ)2Iop3(T^n+λ)2(T+λ)2Iop2\displaystyle\quad\leq 3\|(\widehat{T}_{n}+\lambda)^{-2}(T+\lambda)^{2}-I\|_{op}\vee 3\|(\widehat{T}_{n}+\lambda)^{-2}(T+\lambda)^{2}-I\|_{op}^{2}
9(T^n+λ)1(T+λ)Iop9(T^n+λ)1(T+λ)Iop4\displaystyle\quad\leq 9\|(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)-I\|_{op}\vee 9\|(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)-I\|_{op}^{4}
\displaystyle\quad\leq 9\lambda^{-1}\|T-\widehat{T}_{n}\|_{op}\vee 9\lambda^{-4}\|T-\widehat{T}_{n}\|_{op}^{4}.
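Each step of this iteration rests on the elementary identity B^{2}-I=(B-I)^{2}+2(B-I), valid for any bounded operator B (applied here, up to the ordering of the non-commuting factors, with B=(\widehat{T}_{n}+\lambda)^{-2}(T+\lambda)^{2} and B=(\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)), which yields

```latex
\|B^{2}-I\|_{op}
\leq\|B-I\|_{op}^{2}+2\|B-I\|_{op}
\leq 3\big(\|B-I\|_{op}\vee\|B-I\|_{op}^{2}\big).
```

The last line then uses (\widehat{T}_{n}+\lambda)^{-1}(T+\lambda)-I=(\widehat{T}_{n}+\lambda)^{-1}(T-\widehat{T}_{n}) together with \|(\widehat{T}_{n}+\lambda)^{-1}\|_{op}\leq\lambda^{-1}.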

Thus, by Lemma 23 and since tr(T)κ2\mathrm{tr}(T)\leq\kappa^{2}, with probability at least 1δ1-\delta,

(T^n+λ)4(T+λ)4Iop(T+λ)1Top3(κ4log(2/δ)nλ2(κ2log(2/δ)nλ2)4)(T+λ)4T3op.\displaystyle\begin{split}&\|(\widehat{T}_{n}+\lambda)^{-4}(T+\lambda)^{4}-I\|_{op}\|(T+\lambda)^{-1}T\|_{op}^{3}\\ &\quad\lesssim\left(\sqrt{\frac{\kappa^{4}\log(2/\delta)}{n\lambda^{2}}}\vee\left(\frac{\kappa^{2}\log(2/\delta)}{n\lambda^{2}}\right)^{4}\right)\|(T+\lambda)^{-4}T^{3}\|_{op}.\end{split} (135)

Combine (134) and (135) and conclude that, with probability at least 1δ1-\delta,

𝐈𝐈\displaystyle\mathbf{II} (κ2𝔫12(λ)log(2/δ)nλ2κ2log(2/δ)nλ2)(T+λ)1Top2\displaystyle\lesssim\left(\sqrt{\frac{\kappa^{2}\mathfrak{n}_{1}^{2}(\lambda)\log(2/\delta)}{n\lambda^{2}}}\vee\frac{\kappa^{2}\log(2/\delta)}{n\lambda^{2}}\right)\|(T+\lambda)^{-1}T\|_{op}^{2}
+(κ2𝔫12(λ)log(2/δ)nλ2κ2log(2/δ)nλ2)3\displaystyle\quad+\left(\sqrt{\frac{\kappa^{2}\mathfrak{n}_{1}^{2}(\lambda)\log(2/\delta)}{n\lambda^{2}}}\vee\frac{\kappa^{2}\log(2/\delta)}{n\lambda^{2}}\right)^{3}
+(κ4log(2/δ)nλ2(κ2log(2/δ)nλ2)4)T3(T+λ)4op.\displaystyle\quad+\left(\sqrt{\frac{\kappa^{4}\log(2/\delta)}{n\lambda^{2}}}\vee\left(\frac{\kappa^{2}\log(2/\delta)}{n\lambda^{2}}\right)^{4}\right)\|T^{3}(T+\lambda)^{-4}\|_{op}.

Since 0\preceq(T+\lambda)^{-1}T\preceq I in the sense of self-adjoint operators, it follows that \|(T+\lambda)^{-1}T\|_{op}\leq 1. Hence, under the rate conditions the upper bounds on terms \mathbf{I} and \mathbf{II} simplify as stated in the lemma. ∎

Proof of Lemma 23.

Proof of claim (i). Let 1in1\leq i\leq n and α\alpha\in\mathbb{N} be arbitrary. Since f0f_{0} minimizes the expected square loss, we have

E[(T+λ)α(Yif0(Xi))kXi]=0.\displaystyle\mathrm{E}\left[(T+\lambda)^{-\alpha}\big{(}Y_{i}-f_{0}(X_{i})\big{)}k_{X_{i}}\right]=0.

Therefore, as in the proof of Lemma G.4 in Singh and Vijaykumar (2023), we compute,

(T+λ)1(T+λ)α(Yif0(Xi))kXi\displaystyle\left\|(T+\lambda)^{-1}(T+\lambda)^{-\alpha}\big{(}Y_{i}-f_{0}(X_{i})\big{)}k_{X_{i}}\right\|_{\mathcal{H}} |Yif0(Xi)|(T+λ)αkXi\displaystyle\leq\left|Y_{i}-f_{0}(X_{i})\right|\left\|(T+\lambda)^{-\alpha}k_{X_{i}}\right\|_{\mathcal{H}}
λακ(B+κf0)a.s.\displaystyle\leq\lambda^{-\alpha}\kappa(B+\kappa\|f_{0}\|_{\mathcal{H}})\quad a.s.

and

E[(T+λ)α(Yif0(Xi))kXi2]σ02E[(T+λ)αkXi2]σ02tr((T+λ)2αT).\displaystyle\mathrm{E}\left[\left\|(T+\lambda)^{-\alpha}\big{(}Y_{i}-f_{0}(X_{i})\big{)}k_{X_{i}}\right\|_{\mathcal{H}}^{2}\right]\leq\sigma_{0}^{2}\mathrm{E}\left[\left\|(T+\lambda)^{-\alpha}k_{X_{i}}\right\|_{\mathcal{H}}^{2}\right]\leq\sigma_{0}^{2}\mathrm{tr}\left((T+\lambda)^{-2\alpha}T\right).
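The trace identity in the last display follows from T=\mathrm{E}[k_{X_{i}}\otimes k_{X_{i}}^{*}]:

```latex
\mathrm{E}\big[\|(T+\lambda)^{-\alpha}k_{X_{i}}\|_{\mathcal{H}}^{2}\big]
=\mathrm{E}\big[\mathrm{tr}\big((T+\lambda)^{-2\alpha}(k_{X_{i}}\otimes k_{X_{i}}^{*})\big)\big]
=\mathrm{tr}\big((T+\lambda)^{-2\alpha}T\big)
=\mathfrak{n}_{\alpha}^{2}(\lambda).
```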

Moreover, since SS is separable and kk is continuous, \mathcal{H} is separable as well. Hence, the conditions of Lemma 24 are satisfied with ν=λακB+λακ2f0\nu=\lambda^{-\alpha}\kappa B+\lambda^{-\alpha}\kappa^{2}\|f_{0}\|_{\mathcal{H}} and σ=σ0𝔫α(λ)=σ0tr((T+λ)2αT)\sigma=\sigma_{0}\mathfrak{n}_{\alpha}(\lambda)=\sigma_{0}\sqrt{\mathrm{tr}\left((T+\lambda)^{-2\alpha}T\right)}. This completes the proof of the first claim.

Proof of claim (ii). Let 1\leq i\leq n be arbitrary. Then \mathrm{E}[(T+\lambda)^{-\alpha}((k_{X_{i}}\otimes k_{X_{i}}^{*})-T)]=0. Moreover,

(T+λ)α((kXikXi)T)HS\displaystyle\left\|(T+\lambda)^{-\alpha}\left((k_{X_{i}}\otimes k_{X_{i}}^{*})-T\right)\right\|_{HS} (T+λ)αkXikXiHS+(T+λ)αTHS\displaystyle\leq\left\|(T+\lambda)^{-\alpha}k_{X_{i}}\otimes k_{X_{i}}^{*}\right\|_{HS}+\left\|(T+\lambda)^{-\alpha}T\right\|_{HS}
λακ2+λ2αtr(T2)\displaystyle\leq\lambda^{-\alpha}\kappa^{2}+\sqrt{\lambda^{-2\alpha}\mathrm{tr}(T^{2})}
2λακ2a.s.\displaystyle\leq 2\lambda^{-\alpha}\kappa^{2}\quad a.s.
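The last step uses \|k_{X_{i}}\|_{\mathcal{H}}^{2}=k(X_{i},X_{i})\leq\kappa^{2} and, writing \mu_{j}\geq 0 for the eigenvalues of T, the crude bound

```latex
\mathrm{tr}(T^{2})=\sum_{j}\mu_{j}^{2}
\leq\Big(\sum_{j}\mu_{j}\Big)^{2}
=\mathrm{tr}(T)^{2}\leq\kappa^{4},
\qquad\text{so that}\qquad
\sqrt{\lambda^{-2\alpha}\,\mathrm{tr}(T^{2})}\leq\lambda^{-\alpha}\kappa^{2}.
```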

and

E[(T+λ)α((kXikXi)T)HS2]\displaystyle\mathrm{E}\left[\left\|(T+\lambda)^{-\alpha}\left((k_{X_{i}}\otimes k_{X_{i}}^{*})-T\right)\right\|_{HS}^{2}\right] =E[tr((T+λ)2α(kXikXiT)2)]\displaystyle=\mathrm{E}\left[\mathrm{tr}\left((T+\lambda)^{-2\alpha}(k_{X_{i}}\otimes k_{X_{i}}^{*}-T)^{2}\right)\right]
=E[tr((T+λ)2α(kXikXi)2)]tr((T+λ)2αT2)\displaystyle=\mathrm{E}\left[\mathrm{tr}\left((T+\lambda)^{-2\alpha}(k_{X_{i}}\otimes k_{X_{i}}^{*})^{2}\right)\right]-\mathrm{tr}((T+\lambda)^{-2\alpha}T^{2})
κ2tr((T+λ)2αT).\displaystyle\leq\kappa^{2}\mathrm{tr}\left((T+\lambda)^{-2\alpha}T\right).

The space of linear operators A:\mathcal{H}\rightarrow\mathcal{H} equipped with the Hilbert-Schmidt norm \|A\|_{HS}=\sqrt{\mathrm{tr}(A^{\prime}A)} is a Hilbert space, which can be identified with the tensor product \mathcal{H}\otimes\mathcal{H}. Since this tensor product space is separable whenever \mathcal{H} is separable, the conditions of Lemma 24 are satisfied with \nu=2\lambda^{-\alpha}\kappa^{2} and \sigma=\kappa\sqrt{\mathrm{tr}((T+\lambda)^{-2\alpha}T)}. This completes the proof of the second claim. ∎