Gaussian and Bootstrap Approximations for Suprema of Empirical Processes
Abstract
In this paper we develop non-asymptotic Gaussian approximation results for the sampling distribution of suprema of empirical processes when the indexing function class varies with the sample size and may not be Donsker. Prior approximations of this type required upper bounds on the metric entropy of and uniform lower bounds on the variance of , both of which limited their applicability to high-dimensional inference problems.
In contrast, the results in this paper hold under simpler conditions on boundedness, continuity, and the strong variance of the approximating Gaussian process. The results are broadly applicable and yield a novel procedure for bootstrapping the distribution of empirical process suprema based on the truncated Karhunen-Loève decomposition of the approximating Gaussian process. We demonstrate the flexibility of this new bootstrap procedure by applying it to three fundamental problems in high-dimensional statistics: simultaneous inference on parameter vectors, inference on the spectral norm of covariance matrices, and construction of simultaneous confidence bands for functions in reproducing kernel Hilbert spaces.
Keywords: Gaussian Approximation; Gaussian Comparison; Gaussian Process Bootstrap; High-Dimensional Inference.
1 Introduction
1.1 Approximating the sampling distribution of suprema of empirical processes
This paper is concerned with non-asymptotic bounds on the Kolmogorov distance between the sampling distribution of suprema of empirical processes and the distribution of suprema of Gaussian proxy processes. Consider a simple random sample of independent and identically distributed random variables with common law taking values in the measurable space . Let be a class of measurable functions and define the empirical process
Also, denote by a centered Gaussian process with some positive semi-definite covariance function. Within this minimal setup (and a few additional technical assumptions), the paper addresses the problem of deriving non-asymptotic bounds on
(1)
and, consequently, conditions under which an empirical process indexed by is Gaussian approximable, i.e. as . Evidently, if is a Donsker class then the empirical process is trivially Gaussian approximable by the centered Gaussian process with covariance function . Thus, in this paper we are concerned with conditions that are strictly weaker than those that guarantee a central limit theorem.
Gaussian approximation results of this type have gained significant interest recently, as they prove to be effective in tackling high- and infinite-dimensional inference problems. Applications include simultaneous inference on high-dimensional parameters (Dezeure et al., 2017; Zhang and Cheng, 2017), inference on nonparametric models (Chernozhukov et al., 2014a; Chen et al., 2015, 2016), testing for spurious correlations (Fan et al., 2018), testing for shape restrictions (Chetverikov, 2019), inference on large covariance matrices (Chen, 2018; Han et al., 2018; Lopes et al., 2019), goodness-of-fit tests for high-dimensional linear models (Janková et al., 2020), simultaneous confidence bands in functional data analysis (Lopes et al., 2020; Singh and Vijaykumar, 2023), and error quantification in randomized algorithms (Lopes et al., 2023).
The theoretical groundwork on Gaussian approximation was laid in three seminal papers by Chernozhukov et al. (2013, 2014b, 2015). Since then, subsequent theoretical works have developed numerous refinements and extensions (e.g. Chernozhukov et al., 2016, 2019, 2020; Deng and Zhang, 2020; Fang and Koike, 2021; Kuchibhotla et al., 2021; Cattaneo et al., 2022; Lopes, 2022a, b; Bong et al., 2023). In this paper, we resolve two limitations of Chernozhukov et al.’s (2014b) original results that have not been previously addressed:
• Entropy conditions. The original bounds on the Kolmogorov distance depend on the metric entropy of the function class . These upper bounds are non-trivial only if the metric entropy grows at a much slower rate than the sample size . As a result, most applications only address sparse inference problems involving function classes with either a finite number of functions, discretized functions, or a small VC-index. The new upper bounds in this paper no longer depend on the metric entropy of the function class and therefore open the possibility to tackle non-sparse, high-dimensional inference problems. We call these bounds dimension-/entropy-free.
• Lower bounds on weak variances. The original bounds on require a strictly positive lower bound on the weak variance of the Gaussian proxy process, i.e. . This condition limits the scope of the original results to problems with standardized function classes (studentized statistics) and excludes situations with variance decay, typically observed in non-sparse, high-dimensional problems. The new Gaussian approximation results in this paper only depend on the strong variance and are therefore applicable to a broad range of problems including those with degenerate distributions. We say that these bounds are weak variance-free.
Even though these limitations (and our solution) are quite technical, the resulting new Gaussian and bootstrap approximations have immediate practical consequences. We present three substantial applications to problems in high-dimensional inference in Section 4 and two toy examples that cast light on why and when the original approximation results by Chernozhukov et al. (2014b) fail and the new results succeed in Appendix A.
Notably, in the special case of inference on spectral norms of covariance matrices, entropy-free bounds on already exist. Namely, for with , Lopes et al. (2019, 2020) and Lopes (2022a, b) have devised an approach to bounding that combines a specific variance decay assumption with a truncation argument. For a carefully chosen truncation level, the resulting bound on depends only on the sample size and the parameter characterizing the variance decay. Since their approach is intimately related to the bilinearity of the functions in and the specific variance decay assumption, it does not easily extend to arbitrary function classes. We therefore develop a different strategy in this paper.
Furthermore, for finite function classes , Chernozhukov et al. (2020) and Deng and Zhang (2020) have been able to slightly relax the requirements on the weak variance of the Gaussian proxy process. However, their results do not generalize to arbitrary function classes with . Our strategy for replacing the weak variance with the strong variance in the upper bounds on is therefore conceptually completely different from theirs. For details, we refer to our companion paper on Gaussian anti-concentration inequalities Giessing (2023).
1.2 Contributions and overview of the results
This paper consists of three parts which contribute to probability theory (Section 2), mathematical statistics and bootstrap methodology (Section 3), and high-dimensional inference (Section 4). The Appendices A–F contain additional supporting results and all proofs.
Section 2 contains the main mathematical innovations of this paper. We establish dimension- and weak variance-free Gaussian and bootstrap approximations for maxima of sums of independent and identically distributed high-dimensional random vectors (Section 2.1). Specifically, we derive a Gaussian approximation inequality (Proposition 1), a Gaussian comparison inequality (Proposition 3), and two bootstrap approximation inequalities (Propositions 4 and 5). At the core of these four theoretical results is a new proof of a multivariate Berry-Esseen-type bound which leverages two new technical auxiliary results: an anti-concentration inequality for suprema of separable Gaussian processes (Lemma 6 in Appendix B.2) and a smoothing inequality for partial derivatives of the Ornstein-Uhlenbeck semigroup associated to a multivariate normal measure (Lemma 2 in Appendix B.1). We conclude this section with a comparison to results from the literature (Section 2.2).
Section 3 contains our contributions to mathematical statistics and bootstrap methodology. The results in this section generalize the four basic results of Section 2 from finite dimensional vectors to empirical processes indexed by totally bounded (and hence separable) function classes. As one would expect, dimension- (a.k.a. entropy-) and weak variance-freeness of the finite dimensional results carry over to empirical processes. We establish Gaussian approximation inequalities (Section 3.2), Gaussian comparison inequalities (Section 3.3), and an abstract bootstrap approximation inequality (Section 3.4). The latter result motivates a new procedure for bootstrapping the sampling distribution of suprema of empirical processes. We call this procedure the Gaussian process bootstrap (Algorithm 1) and discuss practical aspects of its implementation via the truncated Karhunen-Loève decomposition of a Gaussian proxy process (Section 3.5). We include a selective comparison with results from the literature (Section 3.6).
In Section 4 we showcase the flexibility of the Gaussian process bootstrap by applying it to three fundamental problems in high-dimensional statistics: simultaneous inference on parameter vectors (Section 4.1), inference on the spectral norm of covariance matrices (Section 4.2), and construction of simultaneous confidence bands for functions in reproducing kernel Hilbert spaces (Section 4.3). For each of these three examples we include a brief comparison with alternative bootstrap methods from the literature to explain in what sense our method improves over (or matches with) existing results.
2 Dimension- and weak variance-free results for the maximum norm of sums of i.i.d. random vectors
2.1 Four basic results
We begin with four propositions on non-asymptotic Gaussian and bootstrap approximations for finite-dimensional random vectors. These propositions are our main mathematical contribution and essential for the statistical results to come. In Section 3 we lift these propositions to general empirical processes which are widely applicable to problems in mathematical statistics. Throughout this section, denotes the maximum norm of a vector .
The first result is a Gaussian approximation inequality. This inequality provides a non-asymptotic bound on the Kolmogorov distance between the laws of the maximum norm of sums of independent and identically distributed random vectors and a Gaussian proxy statistic.
Proposition 1 (Gaussian approximation).
Let be i.i.d. random vectors with mean zero and positive semi-definite covariance matrix . Set and . Then, for , ,
where hides an absolute constant independent of , and the distribution of the ’s.
Remark 1 (Extension to independent, non-identically distributed random vectors).
The proof of Proposition 1 involves an inductive argument inspired by Theorem 3.7.1 in Nourdin and Peccati (2012) who attribute it to Bolthausen (1984). This argument relies crucially on independent and identically distributed data. Generalizing it to independent but non-identically distributed data would require additional assumptions on the variances similar to the uniform asymptotic negligibility condition in the classical Lindeberg-Feller CLT for triangular arrays. For empirical processes such a generalization is less relevant. We therefore leave this to future research.
Remark 2 (Extension to non-centered random vectors).
If the covariance matrix is strictly positive definite, then Proposition 1 also holds for i.i.d. random vectors with non-zero mean and . Nevertheless, in the broader context of empirical processes a strictly positive definite covariance (function) is a very strong assumption. Therefore, in this paper, we do not pursue this refinement. Instead, the interested reader may consult the companion paper Giessing (2023).
Remark 3 (Gaussian approximation versus CLT).
Proposition 1 is strictly weaker than a multivariate CLT because the Kolmogorov distance between the maximum norm of two sequences of random vectors induces a topology on the set of probability measures on which is coarser than the topology of convergence in distribution. Indeed, consider and for and arbitrary random variables. Then, the Kolmogorov distance between and is zero for all , but if and only if and are exchangeable. For another perspective on the same issue, see Section 2.2.
The above Gaussian approximation result differs in several ways from related results in the literature. The two most striking differences are the following: First, the non-asymptotic upper bound in Proposition 1 does not explicitly depend on the dimension , but only on the (truncated) third moments of and and the variance of . Under suitable conditions on the marginal and/or joint distribution of the coordinates of these three quantities grow substantially slower than the dimension . Among other things, this opens up the possibility of applying this bound in the context of high-dimensional random vectors when . Second, Proposition 1 holds without stringent assumptions on the distribution of and applies even to degenerate distributions that do not have a strictly positive definite covariance matrix . In particular, the proposition does not require a lower bound on the variances of the coordinates of or the minimum eigenvalue of . This is fundamental for an effortless extension to general empirical processes. For lower bounds on the variance of the reader may consult Giessing (2023). For a comprehensive comparison of Proposition 1 with previous work, we refer to Section 2.2.
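To illustrate the practical content of Proposition 1, the following Monte Carlo sketch estimates the Kolmogorov distance between the law of the maximum norm of a normalized sum of non-Gaussian random vectors and that of its Gaussian proxy in a regime where the dimension exceeds the sample size. The simulation design (exponential coordinates, fast spectral decay) and all names are illustrative choices of ours, not prescriptions of the theory.

```python
import numpy as np

rng = np.random.default_rng(0)

def kolmogorov_distance(a, b, grid_size=400):
    """Empirical Kolmogorov distance between two samples of a real-valued statistic."""
    grid = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), grid_size)
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

n, d, reps = 100, 500, 2000               # dimension d exceeds the sample size n
eigvals = 1.0 / np.arange(1, d + 1) ** 2  # fast spectral decay, small effective rank

# statistic: || n^{-1/2} sum_i X_i ||_inf for centered, skewed (exponential) data
T_emp = np.empty(reps)
for r in range(reps):
    X = (rng.exponential(1.0, size=(n, d)) - 1.0) * np.sqrt(eigvals)
    T_emp[r] = np.abs(X.sum(axis=0)).max() / np.sqrt(n)

# Gaussian proxy: || Z ||_inf with Z ~ N(0, diag(eigvals))
T_gauss = np.abs(rng.standard_normal((reps, d)) * np.sqrt(eigvals)).max(axis=1)

print("estimated Kolmogorov distance:", kolmogorov_distance(T_emp, T_gauss))
```

Varying the spectral decay lets one probe how the variance of the supremum of the proxy, rather than the raw dimension, governs the quality of the approximation.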
If we are willing to impose a lower bound on the variances of the coordinates of , we can deduce the following useful inequality:
Corollary 2.
Recall the setup of Proposition 1. In addition, suppose that and that has coordinate-wise finite moments, . Define , . Then, for all ,
where hides an absolute constant independent of , and the distribution of the ’s. If the coordinates in are equicorrelated with correlation coefficient , then the above inequality holds with replaced by .
Since for , the right hand side of the inequality in Corollary 2 can be easily upper bounded under a variety of moment assumptions on the marginal distributions of the coordinates of . For a few concrete examples relevant in high-dimensional statistics we refer to Lemmas 1–3 in Giessing and Fan (2023).
The second basic result is a Gaussian comparison inequality. The novelty of this result is (again) that it holds even for degenerate Gaussian laws with singular covariance matrices and does not explicitly depend on the dimension of the random vectors.
Proposition 3 (Gaussian comparison).
Let be Gaussian random vectors with mean zero and positive semi-definite covariance matrices and , respectively. Then,
where hides an absolute constant independent of .
Since the Gaussian distribution is fully characterized by its first two moments, Proposition 1 instantly suggests that it should be possible to approximate the sampling distribution of with the sampling distribution of , where and is a positive semi-definite estimate of . The third basic result formalizes this idea; it is a simple consequence of the triangle inequality combined with Propositions 1 and 3:
Proposition 4 (Bootstrap approximation of the sampling distribution).
Let be i.i.d. random vectors with mean zero and positive semi-definite covariance matrix . Let be any positive semi-definite estimate of . Set , , and . Then, for , ,
where hides an absolute constant independent of , and the distribution of the ’s.
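For orientation, the triangle-inequality step behind Proposition 4 can be written out explicitly. The notation below ($S_n^X$ for the normalized sum, $Z \sim N(0,\Sigma)$ for the Gaussian proxy, $\widehat{Z} \sim N(0,\widehat{\Sigma})$ for the bootstrap proxy, and $\mathbb{P}_{\mid X}$ for the probability conditional on the data) is introduced only for this display.

```latex
\sup_{s \ge 0} \Bigl| \mathbb{P}\bigl(\|S_n^X\|_\infty \le s\bigr)
  - \mathbb{P}_{\mid X}\bigl(\|\widehat Z\|_\infty \le s\bigr) \Bigr|
\;\le\;
\underbrace{\sup_{s \ge 0} \Bigl| \mathbb{P}\bigl(\|S_n^X\|_\infty \le s\bigr)
  - \mathbb{P}\bigl(\|Z\|_\infty \le s\bigr) \Bigr|}_{\text{Proposition 1}}
\;+\;
\underbrace{\sup_{s \ge 0} \Bigl| \mathbb{P}\bigl(\|Z\|_\infty \le s\bigr)
  - \mathbb{P}_{\mid X}\bigl(\|\widehat Z\|_\infty \le s\bigr) \Bigr|}_{\text{Proposition 3}} ,
```

where the second term is controlled by applying Proposition 3 conditionally on the data with the two covariance matrices and their estimate.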
Typically, statistical applications require estimates of the quantiles of the sampling distribution. Since the covariance matrix is unknown, the quantiles of with are infeasible. Hence, for arbitrary, we define the feasible Gaussian bootstrap quantiles as
Since this quantity is random, it is not immediately obvious that it is a valid approximation of the -quantile of the sampling distribution of . However, combining Proposition 4 with standard arguments (e.g. Chernozhukov et al., 2013) we obtain the fourth basic result:
Proposition 5 (Bootstrap approximation of quantiles).
Consider the setup of Proposition 4. Let be a sequence of arbitrary random variables, not necessarily independent of . Then, for , ,
where hides an absolute constant independent of , and the distribution of the ’s.
Remark 4 (On the purpose of the random variables ).
In applications may be a higher-order approximation error such as the remainder of a first-order Taylor approximation. For concrete examples we refer to Corollary 2 and Theorem 10 in Giessing and Fan (2023). The random variables may also be taken identically equal to zero. In this case, the expression in the last line in Proposition 5 vanishes.
Propositions 4 and 5 inherit the dimension- and weak variance-freeness from Propositions 1 and 3. Since these propositions do not require lower bounds on the minimum eigenvalue of the estimate of , we can always use the naive sample covariance matrix with to estimate even if . If additional information about is available (viz. low-rank, bandedness, or approximate sparsity), we can of course use more sophisticated estimators to improve the non-asymptotic bounds. This will prove particularly effective when we lift Propositions 4 and 5 to general empirical processes (see Section 3.4). The reader can find several more examples in Section 4.2 of the companion paper Giessing and Fan (2023).
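As a concrete illustration of Propositions 4 and 5, the following sketch computes a feasible Gaussian bootstrap quantile for the maximum norm of a normalized sum using the naive sample covariance matrix. The function name, the Student-t data, and all tuning choices are illustrative assumptions.

```python
import numpy as np

def gaussian_bootstrap_quantile(X, alpha=0.1, n_draws=5_000, rng=None):
    """Feasible Gaussian bootstrap quantile for || n^{-1/2} sum_i (X_i - mean) ||_inf.

    Only positive semi-definiteness of the plug-in covariance matrix is used;
    no lower bound on its smallest eigenvalue is required, in line with
    Propositions 4 and 5.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    Sigma_hat = np.cov(X, rowvar=False, bias=True)        # naive PSD estimate
    # draw Z ~ N(0, Sigma_hat) via a symmetric eigendecomposition; negative
    # eigenvalues (numerical noise) are clipped at zero
    w, V = np.linalg.eigh(Sigma_hat)
    root = V * np.sqrt(np.clip(w, 0.0, None))
    Z = rng.standard_normal((n_draws, d)) @ root.T
    return np.quantile(np.abs(Z).max(axis=1), 1 - alpha)

# usage: simultaneous critical value for d = 1000 > n = 150 coordinates
rng = np.random.default_rng(1)
X = rng.standard_t(df=8, size=(150, 1000))
print(gaussian_bootstrap_quantile(X, alpha=0.05, rng=rng))
```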
2.2 Relation to previous work
Here, we compare Propositions 1 and 3 with the relevant results in the literature. From a mathematical point of view Propositions 4 and 5 are just an afterthought. For a comprehensive review of the entire literature on Gaussian and bootstrap approximations of maxima of sums of random vectors see Chernozhukov et al. (2023).
• On the dependence on the dimension. As emphasized above, the upper bounds in Propositions 1 and 3 are dimension-free. This is a significant improvement over Theorems 2.2, 2.1, 3.2, 2.1 in Chernozhukov et al. (2013, 2017, 2019, 2020), Theorem 1.1 and Corollary 1.3 in Fang and Koike (2021), and Theorems 2.1 and 2.2 in Lopes (2022a), which all feature logarithmic factors of the dimension. Such bounds generalize poorly to empirical processes since the resulting upper bounds necessarily depend on the -entropy of the function class. This precludes (or, at the very least, substantially complicates) applications to objects as basic as the operator norm of a high-dimensional covariance matrix.
• On the moment assumptions. The above-mentioned results by Chernozhukov et al. (2013, 2017, 2019, 2020), Fang and Koike (2021), and Lopes (2022a) all require strictly positive lower bounds on the variances of the components of the random vector and/ or on the minimum eigenvalue of the covariance matrix . The strictly positive lower bounds are especially awkward if we try to extend their finite dimensional results to general empirical processes: While it is often sensible to impose an upper bound on the variances of the increments of an empirical process, lower bounds on the variances are much harder to justify and in general only achievable via a discretization (or thinning) of the function class. The weak variance-free upper bounds in Propositions 1 and 3 give us greater leeway and increase the scope of our approximation results considerably (see also Appendix A).
• On the sharpness of the upper bounds. The results in Chernozhukov et al. (2020), Fang and Koike (2021), and Lopes (2022a) show that the upper bound in Proposition 1 is sub-optimal and that its dependence on sample size can be improved to . The proof techniques in Chernozhukov et al. (2020) and Fang and Koike (2021) are both based on delicate estimates of Hermite polynomials and thus inherently dimension dependent. Extending their approaches to the coordinate-free Wiener chaos decomposition (which would yield dimension-free results) is a formidable research task. The proof strategy in Lopes (2022a) is very different and requires sub-Gaussian data. The results in the aforementioned papers also show that under additional distributional assumptions the exponent on the upper bound in Proposition 3 can be improved to .
• Proper generalizations of classical CLTs. In a certain regard, Propositions 1 and 3 are strictly weaker than the results in Chernozhukov et al. (2017, 2020), Fang and Koike (2021), and Lopes (2022a). Proposition 1 (and similarly Proposition 3) provides a bound on
where denotes the collection of all hypercubes in centered at the origin. Chernozhukov et al. (2017), Fang and Koike (2021), and Lopes (2022a) provide bounds on the above quantity not for the supremum over but for the supremum over the much larger collection of all hyper-rectangles in . When the dimension is fixed, their results imply convergence in distribution of to as and are therefore stronger than our results (see also Remark 3). In particular, their results can be considered proper generalizations of classical CLTs to high dimensions.
The results in this paper depend on a dimension-free anti-concentration inequality for (or, in other words, for the supremum over ) and a smoothing inequality specific to the map (i.e. Lemmas 2 and 6). Since these inequalities do not apply to the class of all hyper-rectangles in , the arguments in this paper cannot be easily modified to yield dimension-free generalizations of the classical CLTs to high dimensions.
3 Approximation results for suprema of empirical processes
3.1 Empirical process notation and definitions
In the previous section we got away with intuitive standard notation. We now introduce precise empirical process notation for the remainder of the paper: We denote by a sequence of i.i.d. random variables taking values in a measurable space with common distribution , i.e. , , are the coordinate projections of the infinite product probability space with law . If auxiliary variables independent of the ’s are involved, the underlying probability space is assumed to be of the form . We define the empirical measures associated with observations as random measures on given by for all , where is the Dirac measure at .
For a class of measurable functions from onto the real line we define the empirical process indexed by as
Further, we denote by the Gaussian -bridge process with mean zero and the same covariance function as the process , i.e.
(2)
Moreover, we denote by the Gaussian -motion with mean zero and covariance function given by
(3)
For probability measures on we define the -norm, , for a function by , the -semimetric by and the intrinsic standard deviation metric by , . We denote by , , the space of all real-valued measurable functions on with finite -norm. A function is an envelope for the class if it satisfies for all and . For a semimetric space and any we write to denote the -covering numbers of with respect to . For two deterministic sequences and , we write if and if there exist absolute constants such that for all . For we write for and for .
3.2 Gaussian approximation inequalities
We present the first main theoretical result of this paper: a Gaussian approximation inequality for empirical processes indexed by function classes that need not be Donsker. This result generalizes Proposition 1 from finite dimensional index sets to general function classes.
Theorem 1 (Gaussian approximation).
Let be a totally bounded pseudo-metric space. Further, let have envelope function . Suppose that there exist functions such that for ,
(4)
where . Let . Then, for each ,
where hides an absolute constant independent of , and .
Remark 1 (Lower and upper bounds on ).
Since may change with the sample size, so may . In a few special cases it is possible to exactly compute the variance (Giessing and Fan, 2023), but in most cases one can only derive lower and upper bounds. In our companion work Giessing (2023) we derive such bounds under mild conditions on the Gaussian -bridge process. For the reader’s convenience we include the relevant results from that paper in Appendix B.2.
The total boundedness of in the above theorem is a standard assumption which allows us to reduce the proof of Theorem 1 to an application of Proposition 1 combined with a discretization argument. Since is totally bounded whenever (see Lemma 16 in Appendix B.4), this is a rather mild technical assumption. For a detailed discussion of this result and a comparison with the literature we refer to Section 3.6.
Under more specific smoothness assumptions on the Gaussian -bridge process (beyond the abstract control of the moduli of continuity in eq. (4)) Theorem 1 simplifies as follows:
Corollary 2.
Let and have envelope . If the Gaussian -Bridge process has almost surely uniformly -continuous sample paths and , then for each ,
where hides an absolute constant independent of , and .
Remark 2 (Further simplification with lower bound on second moment).
Recall the setup of Corollary 2. If in addition there exists such that for all and , , then
where hides an absolute constant independent of , and the distribution of the ’s. This is the empirical process analogue to Corollary 2. Since the proof of this inequality is identical to that of Corollary 2 and only differs in notation, we omit the details.
Since centered Gaussian processes are either almost surely uniformly continuous w.r.t. their intrinsic standard deviation metric or almost surely discontinuous, the smoothness condition in Corollary 2 is natural. (For the proofs to go through, uniform -continuity in probability would be sufficient.) In Lemmas 17 and 18 in Appendix B.4 we provide simple sufficient and necessary conditions for almost sure uniform -continuity of on . Importantly, is fixed; the uniform -continuity is a point-wise requirement and does not need to hold when taking the limit . This is especially relevant in high-dimensional statistics where Gaussian processes are typically unbounded and have diverging sample paths as , and, more generally, whenever the function classes are non-Donsker. Of course, just as with Proposition 1, the conclusion of Corollary 2 is weaker than weak convergence (see Section 2.2).
3.3 Gaussian comparison inequalities
Our second main result is a comparison inequality which bounds the Kolmogorov distance between a Gaussian -bridge process and a Gaussian -motion both indexed by (possibly) different function classes and . This generalizes Proposition 3 to Gaussian processes.
To state the theorem, we introduce the following notation: Given the pseudo-metric spaces and , , we write to denote the projection from onto defined by for all . In the case that the image is not a singleton for some , we choose any one of the equivalent points in . Similarly, we write for the projection from onto defined via .
Theorem 3 (Gaussian comparison).
Let and be totally bounded pseudo-metric spaces. Further, let and have envelope functions and , respectively. Suppose that there exist functions such that
where and . Let . Then,
where
(5)
where hides an absolute constant independent of and .
Remark 3 (Comparison inequality for Gaussian -bridge processes).
Theorem 3 holds without changes (and identical proof) for Gaussian -bridge processes . This easily follows from the almost sure representation for all , where is independent of for all . We state the comparison inequality for -motions because this is the version that we use in the context of the Gaussian process bootstrap in Section 3.4.
Informally speaking, Theorem 3 states that the distributions of the suprema of any two Gaussian processes are close whenever their covariance functions are not too different. Note that we compare two Gaussian processes with different covariance functions and that differ not only in their measures ( and ) but also their functional form (viz. eq. (2) and (3) in Section 3.1). This turns out to be essential for developing alternatives to the classical Gaussian multiplier bootstrap procedure for suprema of empirical processes proposed by Chernozhukov et al. (2014a) and Chernozhukov et al. (2014b). We discuss this in detail in Section 3.4.
Under additional smoothness conditions on the Gaussian processes and assumptions on the function classes Theorem 3 simplifies. The special cases and are of particular interest in the context of the bootstrap. They are the content of the next two corollaries.
Corollary 4.
Let . If the Gaussian -Bridge process and the Gaussian -motion have almost surely uniformly continuous sample paths w.r.t. their respective standard deviation metrics and , and , then
where hides an absolute constant independent of , and .
Corollary 5.
Let and be a -net of with respect to , . If the Gaussian -Bridge process has almost surely uniformly continuous sample paths w.r.t. metric and , then
where hides an absolute constant independent of , and .
3.4 Gaussian process bootstrap
In this section we develop a general framework for bootstrapping the distribution of suprema of empirical processes. We recover the classical Gaussian multiplier bootstrap as a special case of this more general framework. For concrete applications to statistical problems we refer to Section 4.
The following abstract approximation result generalizes Proposition 4 to empirical processes.
Theorem 6 (Abstract bootstrap approximation).
Let and be totally bounded pseudo-metric spaces. Further, let and have envelope functions and , respectively. Suppose that there exist functions such that for ,
(6)
where and . Let . Then, for each ,
where is given in eq. (5) and hides an absolute constant independent of , and .
Remark 4 (Bootstrap approximation with Gaussian -bridge process).
Theorem 6 is an immediate consequence of Theorems 1 and 3 and the triangle inequality, and thus a mathematical triviality. What matters is its statistical interpretation: It implies that we can approximate the sampling distribution of the supremum of an empirical process with the distribution of the supremum of a Gaussian -motion provided that their covariance functions and do not differ by too much. Importantly, up to the smoothness condition in (6), we are completely free to choose whatever -measure and function class suit us best.
This interpretation of Theorem 6 motivates the following bootstrap procedure for estimating the distribution of the supremum of an empirical process when the -measure is unknown.
Algorithm 1 (Gaussian process bootstrap).
Let be a simple random sample drawn from distribution .
Step 1: Construct a positive semi-definite estimate of the covariance function of the empirical process .
Step 2: Construct a Gaussian process such that for all ,
(7)
Step 3: Approximate the distribution of by drawing Monte Carlo samples from .
By Kolmogorov’s consistency theorem the Gaussian process will always exist. If the estimate is uniformly consistent and if the envelope function and the supremum of the Gaussian -bridge process satisfy certain moment conditions, Theorem 6 readily implies uniform consistency of the Gaussian process bootstrap as . The challenge is to actually construct a (measurable) version of from which we can draw Monte Carlo samples. This is the content of the next section.
Remark 5 (Relation to the Gaussian multiplier bootstrap by Chernozhukov et al. (2014b)).
The Gaussian multiplier bootstrap is a special case of the Gaussian process bootstrap. To see this, note that if we estimate the covariance function nonparametrically via the sample covariance function , , then the corresponding Gaussian -Bridge process can be expanded into a finite sum of non-orthogonal functions
where are i.i.d. standard normal random variables independent of . While this representation is extremely simple (in theory and practice!), unlike the Gaussian process bootstrap it does not allow one to incorporate additional structural information about the covariance function. Empirically, this often results in less accurate bootstrap estimates (Giessing and Fan, 2023).
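For a finite (or discretized) function class this multiplier representation reduces to a few lines of code; the array layout and function name below are our own conventions.

```python
import numpy as np

def multiplier_bootstrap_sup(F, n_draws=2000, rng=None):
    """Gaussian multiplier bootstrap for the supremum of an empirical process
    over a finite (or discretized) function class.

    F[i, j] holds f_j(X_i).  Each draw equals
        sup_j | n^{-1/2} sum_i xi_i (f_j(X_i) - bar f_j) |
    with xi_1, ..., xi_n i.i.d. standard normal multipliers, which corresponds
    to plugging in the nonparametric sample covariance function.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = F.shape[0]
    F_centered = F - F.mean(axis=0, keepdims=True)
    xi = rng.standard_normal((n_draws, n))
    return np.abs(xi @ F_centered).max(axis=1) / np.sqrt(n)
```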
3.5 A practical guide to the Gaussian process bootstrap
To implement the Gaussian process bootstrap we need two things: a uniformly consistent estimate of the covariance function and a systematic way of constructing Gaussian processes for given covariance functions.
Finding consistent estimates of the covariance function is relatively straightforward. For example, a natural estimate is the nonparametric sample covariance function defined as , . Under suitable conditions on the probability measure and the function class this estimate is uniformly consistent in high dimensions (e.g. Koltchinskii and Lounici, 2017). Of course, an important feature of the Gaussian process bootstrap is that it can be combined with any positive semi-definite estimate of the covariance function. In particular, we can use (semi-)parametric estimators to exploit structural constraints induced by and . For concrete examples we refer to Section 4 and also Giessing and Fan (2023).
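For instance, with the empirical measure as plug-in, one standard version of this estimate can be coded directly; the callable interface below is an illustrative choice.

```python
import numpy as np

def sample_cov_fn(X, f, g):
    """Nonparametric sample covariance of two index functions f and g,
    i.e. the empirical analogue of P_n(fg) - (P_n f)(P_n g), evaluated on the
    data array X (one observation per row; f and g must be vectorized)."""
    fX, gX = f(X), g(X)
    return np.mean(fX * gX) - np.mean(fX) * np.mean(gX)
```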
Constructing Gaussian processes for given covariance functions and defined on potentially arbitrary index sets is a more challenging problem. We propose to base the construction on an almost sure version of the classical Karhunen-Loève decomposition.
To develop this decomposition in the framework of this paper, we need new notation and concepts. In the following, denotes a measurable space for some finite measure with support . Typically, is the Borel -algebra on and the measure is chosen for convenience. For example, can be set to the Lebesgue measure when is an interval, or the counting measure when is a finite (discrete) set. The space equipped with the inner product is a Hilbert space. Given a positive semi-definite and continuous kernel we define the linear operator on via
(8)
If , then is a bounded linear operator. If is a compact metric space, then is a compact operator. In this case, has a spectral decomposition which depends on the kernel alone; the measure is exogenous. For further details we refer the reader to Chapters 4 and 7 in Hsing and Eubank (2015).
The following result is well-known (Jain and Kallianpur, 1970); since it is slightly adapted to our setting we have included its proof in the appendix.
Proposition 7 (Karhunen-Loève decomposition of Gaussian processes).
Let be a compact pseudo-metric space and be a Gaussian -motion. Suppose that has a continuous covariance function and . Let be the eigenvalue and eigenfunction pairs of the linear operator corresponding to , and be a sequence of i.i.d. standard normal random variables. Then, with probability one,
where , , is an almost surely bounded and continuous Gaussian process on for all .
This proposition justifies the following constructive approximation of the Gaussian proxy process defined in (7): Let be a generic estimate of the covariance function of the empirical process . Suppose that is such that the associated integral operator defined in (8) admits a spectral decomposition. Denote by its eigenvalue and eigenfunction pairs, and, for concreteness, assume that the ’s are sorted in nonincreasing order. Given a sequence of i.i.d. standard normal random variables and truncation level , define the Gaussian process and associated covariance function
(9)
While there are only a few situations in which the eigenvalues and eigenfunctions of can be found analytically, from a computational perspective this is a standard problem and efficient numerical solvers exist (Ghanem and Spanos, 2003; Berlinet and Thomas-Agnan, 2004). Thus, constructing the process in (9) does not pose any practical challenges. Proposition 7 now guarantees (under appropriate boundedness and smoothness conditions on ) that the approximation error between and can be made arbitrarily small by choosing sufficiently large. In fact, by Mercer’s theorem the error can be quantified in terms of the operator norm of the difference between the covariance functions and (e.g. Hsing and Eubank, 2015, Lemma 4.6.6 and Corollary 8).
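A minimal numerical sketch of this construction, using a Nyström-type discretization of the integral operator in (8) on a finite grid with the uniform measure playing the role of the finite reference measure, reads as follows; the grid, quadrature weights, and truncation level are illustrative choices.

```python
import numpy as np

def truncated_kl_sup_draws(cov_fn, grid, K=20, n_draws=5000, rng=None):
    """Monte Carlo draws of the supremum of the truncated Karhunen-Loeve process
    built from an admissible covariance estimate cov_fn(s, t).

    The finite measure mu is taken to be the uniform measure on `grid`, so the
    integral operator in (8) is approximated by a weighted kernel matrix
    (Nystrom discretization) whose K leading eigenpairs drive the expansion (9).
    """
    rng = np.random.default_rng() if rng is None else rng
    m = len(grid)
    w = np.full(m, 1.0 / m)                               # quadrature weights
    C = np.array([[cov_fn(s, t) for t in grid] for s in grid])
    B = np.sqrt(w)[:, None] * C * np.sqrt(w)[None, :]     # symmetrized operator
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:K]                     # K leading eigenpairs
    lam = np.clip(evals[idx], 0.0, None)
    phi = evecs[:, idx] / np.sqrt(w)[:, None]             # eigenfunctions on the grid
    xi = rng.standard_normal((n_draws, K))
    paths = xi @ (np.sqrt(lam)[:, None] * phi.T)          # truncated KL sample paths
    return np.abs(paths).max(axis=1)
```

For a one-dimensional index set one would, for example, pass grid = np.linspace(0, 1, 200) together with a plug-in covariance estimate such as the sample covariance function above; bootstrap quantiles can then be read off from the returned draws.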
In the above discussion we have implicitly imposed several assumptions on the estimate . For future reference, we summarize these assumptions in a single definition.
Definition 1 (Admissibility of ).
We say that an estimate of the covariance function is admissible if it is continuous, symmetric, positive semi-definite, and its associated integral operator is a bounded linear operator on for some finite measure with support .
Remark 6 (Admissible estimates exist).
Under the assumption that has an envelope function, the nonparametric sample covariance is admissible. Indeed, by definition, is continuous (w.r.t. ), symmetric, positive semi-definite, and, by existence of the envelope function, . Also, the existence of an envelope function implies that is a compact metric space. See also Section 4.
The next result establishes consistency of the Gaussian process bootstrap based on the truncated Karhunen-Loève expansion in (9). It is a simple corollary of Theorem 6. Note that the intrinsic standard deviation metric associated with (the pushforward probability measure induced by) can be expressed in terms of its covariance function as .
Corollary 8 (Consistency of the Gaussian process bootstrap).
Let be an admissible estimate of and its best rank- approximation. Let have envelope . If the Gaussian processes and have almost surely uniformly continuous sample paths w.r.t. their respective standard deviation metrics and , and , then, for each ,
where hides an absolute constant independent of , and .
Moreover, if is compact w.r.t. , then the last term on the right hand side in the above display vanishes as .
In general, the strong variance depends on the dimension of the function class . Hence, the truncation level has to be chosen (inversely) proportional to to ensure that in the upper bound of Corollary 8 the deterministic approximation error is negligible compared to the stochastic estimation errors.
We conclude this section with a consistency result on the bootstrap approximation of quantiles of suprema of empirical processes. This result is the empirical process analogue to Proposition 5. It is relevant in the context of hypothesis testing and construction of confidence regions. For we denote the conditional -quantile of the supremum of by
We have the following result:
Theorem 9 (Quantiles of the Gaussian process bootstrap).
Consider the setup of Corollary 8. Let be a sequence of arbitrary random variables, not necessarily independent of . Then, for each ,
where hides an absolute constant independent of , and .
In statistical applications, the statistic of interest is rarely a simple empirical process. Instead, the empirical process usually arises as the leading term of a (functional) Taylor expansion. The random sequence in the above theorem can be used to capture the higher-order approximation errors of such an expansion.
Remark 7 (Additional practical considerations).
It is often infeasible to draw Monte Carlo samples directly from . In practice, we suggest approximating via a finite -net with respect to . With this additional approximation step and by Corollary 5, the conclusion of Corollary 8 holds with substituted for and the additional quantity in the upper bound on the Kolmogorov distance. This shows that the level of discretization should be chosen proportional to .
3.6 Relation to previous work
The papers by Chernozhukov et al. (2014a, b, 2016) are currently the only other existing works on Gaussian and bootstrap approximations of suprema of empirical processes indexed by (potentially) non-Donsker function classes. In this section, we compare their results to our Theorems 1–9. To keep the comparison concise, we focus on the key aspects that motivated us to write this paper.
It is important to note that the results presented in Chernozhukov et al. (2014a, b, 2016) differ slightly in nature from ours. Instead of establishing bounds on Kolmogorov distances, Chernozhukov and his co-authors derive coupling inequalities. However, through standard arguments (Strassen’s theorem and anti-concentration) these coupling inequalities indirectly yield bounds on Kolmogorov distances. These implied bounds on the Kolmogorov distances are at most as sharp as the ones in the coupling inequalities, up to multiplicative constants. Since we do not care about absolute constants, but only the dependence of the upper bounds on characteristics of the function class and the law , we can meaningfully compare their findings and ours.
• Unbounded function classes. In classical empirical process theory the function class is typically assumed to be uniformly bounded, i.e. for all (e.g. van der Vaart and Wellner, 1996). A key feature of Theorems 1–9 as well as Theorems A.2, 2.1, and 2.1-2.3 in Chernozhukov et al. (2014a, b, 2016) is that they hold for unbounded function classes and only require the envelope function to have finite moments. Relaxing the uniform boundedness of the function class is useful, among other things, for inference on high-dimensional statistical models, functional data analysis, nonparametric regression, and series/ sieve estimation (Chernozhukov et al., 2014a, b; Giessing and Fan, 2023).
• Entropy conditions. The upper bounds provided in Theorems A.2, 2.1, and 2.1-2.3 in Chernozhukov et al. (2014a, b, 2016) depend on a combination of (truncated) second and th () moments of , “local quantities” of order and (disregarding -factors), , and the entropy number , arbitrary. These upper bounds are not only more complex than ours but also weaker in terms of their implications: Since metric entropy with respect to intrinsic standard deviation metric typically scales linearly in the (VC-)dimension of the statistical model (see Appendix A for two (counter-)examples), the upper bounds in Chernozhukov et al. (2014a, b, 2016) are vacuous in high-dimensional situations without sparsity or when the (VC-)dimension exceeds the sample size. In contrast, the upper bounds in our Theorems 1–9 depend only on the expected value and standard deviation of the supremum of the Gaussian -bridge process and the (truncated) third moments of the envelope. Under mild assumptions, these quantities can be upper bounded independently of the (VC-)dimension, thus offering useful bounds even in high-dimensional problems (see Section 4).
• Lower bounds on weak variances. Lemmas 2.3 and 2.4 in Chernozhukov et al. (2014b) and all of the results in Chernozhukov et al. (2014a, 2016) require a strictly positive lower bound on the weak variance of the Gaussian -bridge process, i.e. . This assumption automatically limits the applicability of these lemmas to studentized statistics/ standardized function classes (Chernozhukov et al., 2014a, b) and excludes relevant scenarios with variance decay (Lopes et al., 2020; Lopes, 2022b; Lopes et al., 2023). In contrast, our Theorems 1–9 apply to all function classes for which the strong variance of the Gaussian -bridge process, , does not vanish “too fast”. A slowly vanishing strong variance is a weaker requirement than a strictly positive lower bound on the weak variance (see Giessing (2023) and also Lemmas 6 and 7).
The lower bound on the weak variance is an artifact of Chernozhukov et al.’s (2014a; 2014b; 2016) proof technique which requires the joint distribution of the finite-dimensional marginals of the Gaussian -bridge process to be non-degenerate.
4 Applications
4.1 Confidence ellipsoids for high-dimensional parameter vectors
Confidence regions are fundamental to uncertainty quantification in multivariate statistics. Typically, their asymptotic validity relies on multivariate CLTs. However, in high dimensions, when the number of parameters exceeds the sample size, validity of confidence regions needs to be justified differently. In this section, we show how the Gaussian process bootstrap offers a practical solution to this problem. Existing work on bootstrap confidence regions in high dimensions has focused exclusively on conservative, rectangular confidence regions (e.g. Chernozhukov et al., 2013, 2014b, 2023). For the first time, our results allow the construction of tighter, elliptical confidence regions without sparsity assumptions.
Let be an unknown parameter and an estimator for based on the simple random sample . Then, an asymptotic confidence ellipsoid for can be constructed as
(10)
where denotes the Euclidean norm and solves
In general, it is difficult to determine from the above equation because the sampling distribution of often does not have an explicit form. However, in many cases, admits an asymptotically linear expansion of the form
where are centered i.i.d. random vectors (influence functions) and is a remainder term. Thus, in these cases, we have
Using this formulation, we can now apply Theorem 9 to approximate the quantile and complete the construction of the asymptotic confidence ellipsoid. By Algorithm 1 we need to construct a Gaussian process with index set and whose covariance function approximates the bi-linear map , where . Clearly, the following process does the job:
and and . We denote the -quantile of the supremum of this process by
(11)
To show that these quantiles uniformly approximate the quantiles of the (asymptotic) distribution of , we introduce the following assumptions:
Definition 2 (Sub-Gaussian random vector).
We call a centered random vector sub-Gaussian if for all .
Assumption 1 (Sub-Gaussian influence functions).
The influence functions are i.i.d. sub-Gaussian random vectors with covariance matrix and (i) , (ii) , and (iii) , where is the effective rank of the matrix .
Assumption 2 (Heavy-tailed influence functions).
The influence functions are centered i.i.d. random vectors with covariance matrix such that and for some . Furthermore, (i) , (ii) , (iii) , (iv) , and (v) , where is the effective rank of the matrix .
Under either assumption, we have the following result:
Proposition 1 (Bootstrap confidence ellipsoid).
It is worth noticing that neither Assumption 1 nor 2 requires sparsity of the parameter , the estimator , or the influence functions . Instead, Assumptions 1 and 2 already hold if the covariance matrix has bounded effective rank (see Giessing and Fan, 2023, Section 2.1 and Appendices A.1 and A.2). In this context it is important to note that, unlike Chernozhukov et al. (2014b), we do not require a strictly positive lower bound on , the smallest eigenvalue of the covariance matrix . If such a lower bound were needed, the effective rank would grow linearly in the dimension and Assumptions 1 and 2 would be violated if (see Appendix A). Moreover, Proposition 1 cannot be deduced from Theorem 2.1 in Chernozhukov et al. (2014b) because the upper bound in their coupling inequality would feature the term (and other terms as well), which is a remnant of the entropy number of the -net discretization of the -dimensional Euclidean unit-ball.
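A minimal sketch of the resulting procedure for the Euclidean-norm confidence region in (10) is given below; the influence functions (here those of the sample mean), the plain sample covariance, and all parameter choices are illustrative assumptions.

```python
import numpy as np

def euclidean_ball_radius(psi, alpha=0.1, n_draws=5_000, rng=None):
    """Bootstrap (1 - alpha)-quantile of || n^{-1/2} sum_i psi_i ||_2, rescaled
    to the radius of the confidence region around the estimator.

    psi[i] is the (estimated) influence function of observation i; the Gaussian
    proxy is drawn from N(0, Sigma_hat) with Sigma_hat the sample covariance of
    the psi_i, so no sparsity or minimum-eigenvalue condition is imposed.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = psi.shape
    psi_c = psi - psi.mean(axis=0, keepdims=True)
    w, V = np.linalg.eigh(psi_c.T @ psi_c / n)
    root = V * np.sqrt(np.clip(w, 0.0, None))
    Z = rng.standard_normal((n_draws, d)) @ root.T        # Z ~ N(0, Sigma_hat)
    q = np.quantile(np.linalg.norm(Z, axis=1), 1 - alpha)
    return q / np.sqrt(n)

# usage for the sample mean: psi_i = X_i - theta_hat, region {theta : ||theta_hat - theta||_2 <= r}
rng = np.random.default_rng(2)
X = rng.standard_t(df=6, size=(200, 800))
theta_hat = X.mean(axis=0)
r = euclidean_ball_radius(X - theta_hat, alpha=0.05, rng=rng)
```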
4.2 Inference on the spectral norm of high-dimensional covariance matrices
Spectral statistics of random matrices play an important role in multivariate statistical analysis. The asymptotic distributions of spectral statistics are well established in low dimensions (e.g. Anderson, 1963; Waternaux, 1976; Fujikoshi, 1980) and when the dimension is comparable to the sample size (e.g. Johnstone, 2001; El Karoui, 2007; Péché, 2009; Bao et al., 2015). In the high-dimensional case, when asymptotic arguments do not apply, bootstrap procedures have proved to be effective in approximating distributions of certain maximum-type spectral statistics (e.g. Han et al., 2018; Naumov et al., 2019; Lopes et al., 2019, 2020; Lopes, 2022b). Here, we demonstrate that the Gaussian process bootstrap is a viable alternative to these bootstrap procedures in approximating the distribution of the spectral norm of a high-dimensional sample covariance matrix.
Let be i.i.d. random vectors with law , mean zero, and covariance matrix . Consider the spectral statistic
Since there does not exist a closed form expression of the sampling distribution of when , we apply Algorithm 1 to obtain an approximation based on a Gaussian proxy process. We introduce the following notation: Let and note that , where denotes the Kronecker product, the half-vectorization operator which turns a symmetric matrix into a column vector (of unique entries), and the duplication matrix such that where is the ordinary vectorization operator. Whence, where is the empirical measure of the collection of , , the pushforward of under the map , and . Since each has a (not necessarily unique) representation in terms of , in the following we identify with pairs . The covariance function associated with the empirical process is thus given by
where . Thus, in the light of Algorithm 1 the natural choice for the Gaussian bootstrap process is
(12)
and is the sample analogue of .
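Before stating the formal assumptions, we record a minimal sketch of one concrete implementation. It uses Gaussian multipliers so that, conditionally on the data, the proxy process over the unit sphere has the sample covariance function described above; for symmetric matrices the supremum over the sphere coincides with the operator norm, so no discretization of the sphere is needed. The multiplier form and all names are illustrative and need not coincide with the exact estimator in (12).

```python
import numpy as np

def spectral_norm_bootstrap(X, n_draws=500, rng=None):
    """Gaussian bootstrap draws approximating the law of sqrt(n) * || Sigma_hat - Sigma ||_op.

    Each draw is || n^{-1/2} sum_i xi_i (X_i X_i' - Sigma_hat) ||_op with xi_i
    i.i.d. N(0,1).  Conditionally on the data this is the supremum over the unit
    sphere of a centered Gaussian process whose covariance function is the
    sample analogue of the population one; for symmetric matrices that supremum
    equals the operator (spectral) norm.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    Sigma_hat = X.T @ X / n                       # data assumed centered
    draws = np.empty(n_draws)
    for b in range(n_draws):
        xi = rng.standard_normal(n)
        M = (X * xi[:, None]).T @ X / n - xi.mean() * Sigma_hat
        draws[b] = np.sqrt(n) * np.linalg.norm(M, ord=2)
    return draws

# usage: 95% critical value for sqrt(n) * || Sigma_hat - Sigma ||_op
rng = np.random.default_rng(3)
X = rng.multivariate_normal(np.zeros(50), np.diag(1.0 / np.arange(1, 51) ** 2), size=200)
crit = np.quantile(spectral_norm_bootstrap(X, rng=rng), 0.95)
```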
We make the following assumption:
Assumption 3 (Sub-Gaussian data).
The data are i.i.d. sub-Gaussian random vectors with covariance matrix and .
Remark 1 (On the lower bound on the variances).
The strictly positive lower bound on the variance of the quadratic form is mild. The existence of the lower bound is equivalent to for all . The latter inequality holds if the law of does not concentrate on a lower dimensional subspace of . Since the bounds in Proposition 2 are explicit in , Proposition 2 also applies to scenarios in which as . Similar lower bounds on the variance of appear in Lopes (2022b) and Lopes et al. (2023).
We have the following result:
Proposition 2 (Bootstrap approximation of the distribution of spectral norms of covariance matrices).
Remark 2.
Note that the matrix is symmetric just as the target matrix .
We conclude this section with a comparison of Proposition 2 to existing results in the literature: First, if , then the upper bound in the above proposition is asymptotically negligible. In this case, the bootstrapped distribution of the Gaussian proxy statistic consistently approximates the distribution of . Since this rate depends only on the effective rank, it is dimension-free and cannot be derived through the results in Chernozhukov et al. (2014b). Second, unlike the results in Lopes (2022b) and Lopes et al. (2023), Proposition 2 does not rely on specific assumptions about the decay of the eigenvalues of . And yet, under certain circumstances, the consistency rate provided by Proposition 2 can be faster than theirs. Specifically, the bootstrap procedure described in Lopes (2022b) achieves consistency at a rate of for and . In our context, the parameter determines the rate at which the eigenvalues of decrease, i.e. , . To achieve a rate faster than , must be greater than which requires an extremely fast decay of the eigenvalues. Third, Lopes et al. (2023) conduct extensive numerical experiments and observe that the bootstrap approximation exhibits a sharp phase transition from accurate to inaccurate when switches from greater than to less than . This observation aligns not only with their own theoretical findings but also with the upper bound presented in Proposition 2, since, under their modeling assumptions, the effective rank remains bounded if but diverges if .
4.3 Simultaneous confidence bands for functions in reproducing kernel Hilbert spaces
Reproducing kernel Hilbert spaces (RKHS) are an integral part of statistics, with applications in classical non-parametric statistics (Wahba, 1990), machine learning (Schölkopf and Smola, 2002; Steinwart and Christmann, 2008) and, most recently, (deep) neural nets (Belkin et al., 2018; Jacot et al., 2018; Bohn et al., 2019; Unser, 2019; Chen and Xu, 2020). In this section, we consider constructing simultaneous confidence bands for functions in RKHS by bootstrapping the distribution of a bias-corrected kernel ridge regression estimator. Recently, this problem has been addressed by Singh and Vijaykumar (2023) with a symmetrized multiplier bootstrap. Here, we propose an alternative based on the Gaussian process bootstrap using a truncated Karhunen-Loève decomposition. We point out several commonalities and differences between the two procedures.
In the following, denotes the RKHS of continuous functions from to associated to the symmetric and positive definite kernel . We allow , , and to change with the sample size , but we do not make this dependence explicit. We write for the norm induced by the inner product and for the supremum norm. For , we let be the function . Then, for all and, by the reproducing property, for all and . The kernel induces the so-called kernel metric , . Given () we denote its dual by (). For we define the tensor product by . For operators on we use , , and to denote operator norm, Hilbert-Schmidt norm, and trace, respectively. For further details on RKHSs we refer to Berlinet and Thomas-Agnan (2004).
Let and have joint law . Given a simple random sample our goal is to construct uniform confidence bands for the conditional mean function or, rather, its best approximation in hypothesis space , i.e.
To this end, consider the classical kernel ridge regression estimator
and define its bias-corrected version as
(13)
We propose to construct simultaneous confidence bands for based on via the rectangle
(14)
where approximates the (asymptotic) quantile of the law of . To compute we proceed in two steps: First, we show that can be written as the sum of the supremum of an empirical process and a negligible remainder term. Then, we apply the strategy developed in Section 3.5 to bootstrap the supremum of the empirical process.
where and is a higher-order remainder term. Since is a reproducing kernel, we have for all . Hence, the above expansion implies
and , . By Lemma 21 in Appendix B.5, is negligible with high probability. Moreover, since is the best approximation in square loss, the random elements ’s have mean zero. Thus, , where is the empirical measure of the ’s, the pushforward of under the map , and . Since the functions are just the evaluation functionals of , in the following we identify with its corresponding . The covariance function associated with the empirical process is thus given by
with covariance operator
where we have used that the operators and commute.
We proceed to construct a Gaussian proxy process as outlined in Section 3.5: Let and be the plug-in estimates of the covariance operator and the variance , respectively. Define by . Recall definition (8) of the integral operator . In the present setup, by Fubini’s theorem. Denote by the eigenvalue and eigenfunction pairs of . Further, let be a sequence of i.i.d. standard normal random variables. Then, for and define
(15)
where is the best rank- approximation of .
Given Proposition 7 we postulate that the process is an almost sure version of a Gaussian process on with covariance function (or, equivalently, with covariance operator ). Consequently, Theorem 9 guarantees validity of the Gaussian process bootstrap based on . To make all these claims rigorous, consider the following assumptions:
Assumption 4 (On the kernel).
The kernel is symmetric, positive semi-definite, continuous, and bounded, i.e. .
Remark 3.
The assumptions on the kernel are standard and important (Berlinet and Thomas-Agnan, 2004). The continuity of guarantees that is a random element on whenever is a random variable. It also implies that the RKHS is separable whenever is separable. The boundedness and the reproducing property of imply that for all .
Assumption 5 (On the data).
The data are i.i.d. random elements defined on an abstract product probability space . The ’s are almost surely bounded, i.e. there exists an absolute constant such that almost surely.
Remark 4.
The almost sure boundedness of the ’s is a strong assumption. We introduce this assumption to keep technical arguments at a minimum. Singh and Vijaykumar (2023) impose an equivalent boundedness condition on the pseudo-residuals , .
Assumption 6 (On the population and sample covariance operators).
For all (i) there exists such that and (ii) .
Remark 5.
Condition (i) is the Hilbert space equivalent to the lower bound on the variance in Assumption 3. While Singh and Vijaykumar (2023) do not explicitly impose a lower bound on the covariance operator, such a lower bound is implied by their Assumption 5.2 (see Giessing, 2023, for details). In Appendix 4 we provide a general non-asymptotic complement to Proposition 3 below which applies even if as . Condition (ii) is a classical assumption in learning theory on RKHS (Mendelson, 2002). Together with Condition (i) it implies that and are finite rank and trace class operators, respectively.
Denote the -quantile of the supremum of the Gaussian proxy process in (15) by
(16)
An application of Theorem 9 yields:
Proposition 3 (Bootstrap quantiles).
The finite metric entropy condition on the set ensures that the Gaussian bootstrap process (15) is almost surely bounded and uniformly continuous on (as required by Theorem 9). This condition is not merely technical but also intuitive: Since the RKHS is the completion of and is bounded and continuous, conditions that guarantee the continuity of Gaussian bootstrap processes on (a subset of) should indeed be attributable to properties of . Importantly, the metric entropy condition on does not impose restrictions on the dimension of . Only Assumption 6 (ii) implicitly imposes restrictions on the dimension of .
Since under the conditions of Proposition 3, almost surely for all , it follows that the bootstrap confidence band proposed in (14) is asymptotically valid:
Corollary 4 (Simultaneous bootstrap confidence bands).
A thorough comparison of the Gaussian process bootstrap and Singh and Vijaykumar’s (2023) symmetrized multiplier bootstrap is beyond the scope of this paper. In practice, both methods yield biased confidence bands for , albeit for different reasons: Singh and Vijaykumar’s (2023) bias stems from constructing a confidence band for the pseudo-true regression function without correcting the regularization bias induced by ; ours is due to using an -truncated Karhunen-Loève decomposition based on a finite number of eigenfunctions. In future work we will explore ways to mitigate these biases by judiciously choosing and .
5 Conclusion
In this paper we have developed a new approach to the theory and practice of Gaussian and bootstrap approximations of the sampling distribution of suprema of empirical processes. We have put special emphasis on non-asymptotic approximations that are entropy- and weak variance-free, and have allowed the function class to vary with the sample size and to be non-Donsker. We have shown that such general approximation results are useful, among other things, for inference on high-dimensional statistical models and reproducing kernel Hilbert spaces. However, the theory and methodology in this paper have three limitations that need to be addressed in future work:
• Reliance on independent and identically distributed data. All statistically relevant results in this paper depend on Proposition 1, which heavily relies on the assumption of independent and identically distributed data. Expanding Proposition 1 to accommodate non-identically distributed data would be a first step towards solving simultaneous and large-scale two-sample testing problems and conducting inference in high-dimensional fixed design settings. Currently, the results in this paper are exclusively applicable to one-sample testing and unconditional inference.
• Lack of tight lower bounds on the strong variances of most Gaussian processes. One of the most notable features of the results in this paper is the fact that all upper bounds depend on the inverse of the strong variance of some Gaussian proxy process. Unfortunately, in statistical applications this poses a formidable challenge since, up until now, there exist only a few techniques to derive tight lower bounds on these strong variances (Giessing and Fan, 2023; Giessing, 2023). We either need new tools or we need to develop Gaussian and bootstrap approximations for statistics other than maxima/suprema. The latter will require new anti-concentration inequalities.
• Biased quantile estimates due to bootstrapping a non-pivotal statistic. The Gaussian process bootstrap is based on a non-pivotal statistic, i.e. the sampling distribution of the supremum depends on the unknown population covariance function. In practice, when bootstrapping non-pivotal statistics the estimated quantiles often differ substantially from the true quantiles. Several bias correction schemes have been proposed in the classical setting (Davison et al., 1986; Beran, 1987; Hall and Martin, 1988; Shi, 1992). Since the Gaussian process bootstrap is not a re-sampling procedure in the classical sense, these techniques do not apply. In Giessing and Fan (2023) we therefore develop the spherical bootstrap to improve accuracy and efficiency when bootstrapping -statistics. However, this approach does not generalize to arbitrary empirical processes such as the one in Section 4.3. Thus, there is an urgent need for new bias correction schemes.
Acknowledgement
Alexander Giessing is supported by NSF grant DMS-2310578.
References
- Anderson (2003) T. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley Series in Probability and Statistics. Wiley, 2003. ISBN 9780471360919.
- Anderson (1963) T. W. Anderson. Asymptotic Theory for Principal Component Analysis. The Annals of Mathematical Statistics, 34(1):122 – 148, 1963.
- Bao et al. (2015) Z. Bao, G. Pan, and W. Zhou. Universality for the largest eigenvalue of sample covariance matrices with general population. The Annals of Statistics, 43(1):382–421, 2015.
- Belkin et al. (2018) M. Belkin, S. Ma, and S. Mandal. To understand deep learning we need to understand kernel learning. In J. Dy and A. Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 541–549. PMLR, 2018.
- Bentkus (2003) V. Bentkus. On the dependence of the Berry–Esseen bound on dimension. Journal of Statistical Planning and Inference, 113(2):385 – 402, 2003. ISSN 0378-3758.
- Beran (1987) R. Beran. Prepivoting to reduce level error of confidence sets. Biometrika, 74(3):457–468, 1987.
- Berlinet and Thomas-Agnan (2004) A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, 2004.
- Bhattacharya and Holmes (2010) R. Bhattacharya and S. Holmes. An exposition of Götze’s estimation of the rate of convergence in the multivariate central limit theorem, 2010.
- Bohn et al. (2019) B. Bohn, C. Rieger, and M. Griebel. A representer theorem for deep kernel learning. The Journal of Machine Learning Research, 20(1):2302–2333, 2019.
- Bolthausen (1984) E. Bolthausen. An estimate of the remainder in a combinatorial central limit theorem. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 66(3):379–386, 1984.
- Bong et al. (2023) H. Bong, A. K. Kuchibhotla, and A. Rinaldo. Dual induction CLT for high-dimensional m-dependent data, 2023.
- Cattaneo et al. (2022) M. D. Cattaneo, R. P. Masini, and W. G. Underwood. Yurinskii’s coupling for martingales, 2022.
- Chen and Xu (2020) L. Chen and S. Xu. Deep neural tangent kernel and Laplace kernel have the same RKHS. In International Conference on Learning Representations, 2020.
- Chen (2018) X. Chen. Gaussian and bootstrap approximations for high-dimensional U-statistics and their applications. Ann. Statist., 46(2):642–678, 04 2018.
- Chen et al. (2015) Y.-C. Chen, C. R. Genovese, and L. Wasserman. Asymptotic theory for density ridges. The Annals of Statistics, 43(5):1896–1928, 2015.
- Chen et al. (2016) Y.-C. Chen, C. R. Genovese, R. J. Tibshirani, and L. Wasserman. Nonparametric modal regression. The Annals of Statistics, 44(2):489 – 514, 2016.
- Chernozhukov et al. (2013) V. Chernozhukov, D. Chetverikov, and K. Kato. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 41(6):2786–2819, 12 2013.
- Chernozhukov et al. (2014a) V. Chernozhukov, D. Chetverikov, and K. Kato. Anti-concentration and honest, adaptive confidence bands. The Annals of Statistics, 42(5):1787–1818, 2014a.
- Chernozhukov et al. (2014b) V. Chernozhukov, D. Chetverikov, and K. Kato. Gaussian approximation of suprema of empirical processes. The Annals of Statistics, 42(4):1564–1597, 08 2014b.
- Chernozhukov et al. (2015) V. Chernozhukov, D. Chetverikov, and K. Kato. Comparison and anti-concentration bounds for maxima of gaussian random vectors. Probability Theory and Related Fields, 162(1):47–70, Jun 2015.
- Chernozhukov et al. (2016) V. Chernozhukov, D. Chetverikov, and K. Kato. Empirical and multiplier bootstraps for suprema of empirical processes of increasing complexity, and related Gaussian couplings. Stochastic Processes and their Applications, 126(12):3632–3651, 2016. ISSN 0304-4149. In Memoriam: Evarist Giné.
- Chernozhukov et al. (2017) V. Chernozhukov, D. Chetverikov, and K. Kato. Central limit theorems and bootstrap in high dimensions. The Annals of Probability, 45(4):2309–2352, 07 2017.
- Chernozhukov et al. (2019) V. Chernozhukov, D. Chetverikov, K. Kato, and Y. Koike. Improved central limit theorem and bootstrap approximations in high dimensions. arXiv preprint, arXiv:1912.10529, 12 2019.
- Chernozhukov et al. (2020) V. Chernozhukov, D. Chetverikov, and Y. Koike. Nearly optimal central limit theorem and bootstrap approximations in high dimensions. 2020.
- Chernozhukov et al. (2023) V. Chernozhukov, D. Chetverikov, K. Kato, and Y. Koike. High-dimensional data bootstrap. Annual Review of Statistics and Its Application, 10(1):427–449, 2023.
- Chetverikov (2019) D. Chetverikov. Testing regression monotonicity in econometric models. Econometric Theory, 35(4):729–776, 2019.
- Davison et al. (1986) A. C. Davison, D. V. Hinkley, and E. Schechtman. Efficient bootstrap simulation. Biometrika, 73(3):555–566, 1986.
- Deng and Zhang (2020) H. Deng and C.-H. Zhang. Beyond gaussian approximation: Bootstrap for maxima of sums of independent random vectors. arXiv preprint, arXiv:1705.09528, 2020.
- Dezeure et al. (2017) R. Dezeure, P. Bühlmann, and C.-H. Zhang. High-dimensional simultaneous inference with the bootstrap. Test, 26(4):685–719, 2017.
- Dudley (2002) R. Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2002.
- Dudley (2014) R. Dudley. Uniform Central Limit Theorems. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2014.
- El Karoui (2007) N. El Karoui. Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices. The Annals of Probability, 35(2):663 – 714, 2007.
- Fan et al. (2018) J. Fan, Q.-M. Shao, and W.-X. Zhou. Are discoveries spurious? Distributions of maximum spurious correlations and their applications. Ann. Statist., 46(3):989–1017, 06 2018.
- Fang and Koike (2021) X. Fang and Y. Koike. High-dimensional central limit theorems by Stein’s method. The Annals of Applied Probability, 31(4):1660 – 1686, 2021.
- Folland (1999) G. Folland. Real Analysis: Modern Techniques and Their Applications. Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. Wiley, 1999.
- Fujikoshi (1980) Y. Fujikoshi. Asymptotic expansions for the distributions of the sample roots under nonnormality. Biometrika, 67(1):45–51, 1980.
- Ghanem and Spanos (2003) R. Ghanem and P. Spanos. Stochastic Finite Elements: A Spectral Approach. Dover Publications, 2003.
- Giessing (2023) A. Giessing. Anti-Concentration of Suprema of Gaussian Processes and Gaussian Order Statistics. Working Paper, 2023.
- Giessing and Fan (2023) A. Giessing and J. Fan. A bootstrap hypothesis test for high-dimensional mean vectors. Working Paper, 2023.
- Giessing and Wang (2021) A. Giessing and J. Wang. Debiased inference on heterogeneous quantile treatment effects with regression rank-scores, 2021.
- Giné and Nickl (2016) E. Giné and R. Nickl. Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press, 2016.
- Götze (1991) F. Götze. On the rate of convergence in the multivariate CLT. The Annals of Probability, 19(2):724–739, 1991.
- Hall and Martin (1988) P. Hall and M. A. Martin. On bootstrap resampling and iteration. Biometrika, 75(4):661–671, 1988.
- Han et al. (2018) F. Han, S. Xu, and W.-X. Zhou. On Gaussian comparison inequality and its application to spectral analysis of large random matrices. Bernoulli, 24(3):1787 – 1833, 2018.
- Hardy (2006) M. Hardy. Combinatorics of partial derivatives. Electronic Journal of Combinatorics, 2006.
- Hsing and Eubank (2015) T. Hsing and R. Eubank. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley, 2015.
- Jacot et al. (2018) A. Jacot, F. Gabriel, and C. Hongler. Neural tangent kernel: Convergence and generalization in neural networks. NIPS’18, page 8580–8589, 2018.
- Jain and Kallianpur (1970) N. C. Jain and G. Kallianpur. A Note on Uniform Convergence of Stochastic Processes. The Annals of Mathematical Statistics, 41(4):1360 – 1362, 1970.
- Janková et al. (2020) J. Janková, R. D. Shah, P. Bühlmann, and R. J. Samworth. Goodness-of-fit Testing in High Dimensional Generalized Linear Models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(3):773–795, 05 2020.
- Johnstone (2001) I. M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 29(2):295 – 327, 2001.
- Koltchinskii and Lounici (2017) V. Koltchinskii and K. Lounici. Concentration inequalities and moment bounds for sample covariance operators. Bernoulli, 23(1):110 – 133, 2017.
- Kuchibhotla et al. (2021) A. K. Kuchibhotla, S. Mukherjee, and D. Banerjee. High-dimensional CLT: Improvements, non-uniform extensions and large deviations. Bernoulli, 27(1):192 – 217, 2021.
- Le Cam (1986) L. Le Cam. Asymptotic Methods in Statistical Decision Theory. Springer Series in Statistics. Springer New York, 1986.
- Lopes (2022a) M. E. Lopes. Central limit theorem and bootstrap approximation in high dimensions: Near rates via implicit smoothing. The Annals of Statistics, 50(5):2492 – 2513, 2022a.
- Lopes (2022b) M. E. Lopes. Improved rates of bootstrap approximation for the operator norm: A coordinate-free approach, 2022b.
- Lopes et al. (2019) M. E. Lopes, A. Blandino, and A. Aue. Bootstrapping spectral statistics in high dimensions. Biometrika, 106(4):781–801, 09 2019.
- Lopes et al. (2020) M. E. Lopes, Z. Lin, and H.-G. Müller. Bootstrapping max statistics in high dimensions: Near-parametric rates under weak variance decay and application to functional and multinomial data. Annals of Statistics, 48(2):1214–1229, 04 2020.
- Lopes et al. (2023) M. E. Lopes, N. B. Erichson, and M. W. Mahoney. Bootstrapping the operator norm in high dimensions: Error estimation for covariance matrices and sketching. Bernoulli, 29(1):428 – 450, 2023.
- Mendelson (2002) S. Mendelson. Geometric parameters of kernel machines. In International conference on computational learning theory, pages 29–43. Springer, 2002.
- Naumov et al. (2019) A. Naumov, V. Spokoiny, and V. Ulyanov. Bootstrap confidence sets for spectral projectors of sample covariance. Probability Theory and Related Fields, 174(3):1091–1132, 2019.
- Nourdin and Peccati (2012) I. Nourdin and G. Peccati. Normal approximations with Malliavin calculus: from Stein’s method to universality. Number 192. Cambridge University Press, 2012.
- Péché (2009) S. Péché. Universality results for the largest eigenvalues of some sample covariance matrix ensembles. Probability Theory and Related Fields, 143(3):481–516, 2009.
- Raič (2019) M. Raič. A multivariate Berry–Esseen theorem with explicit constants. Bernoulli, 25(4A):2824–2853, 11 2019.
- Rockafellar (1999) R. T. Rockafellar. Second-order convex analysis. J. Nonlinear Convex Anal, 1(1-16):84, 1999.
- Schölkopf and Smola (2002) B. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Adaptive computation and machine learning. MIT Press, 2002.
- Sedaghat (2003) H. Sedaghat. Nonlinear Difference Equations: Theory with Applications to Social Science Models. Springer, 2003.
- Shi (1992) S. G. Shi. Accurate and efficient double-bootstrap confidence limit method. Computational Statistics and Data Analysis, 13(1):21–32, 1992.
- Singh and Vijaykumar (2023) R. Singh and S. Vijaykumar. Kernel ridge regression inference, 2023.
- Steinwart and Christmann (2008) I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.
- Tanguy (2017) K. Tanguy. Quelques inégalités de superconcentration: théorie et applications. PhD thesis, Université Paul Sabatier-Toulouse III, 2017.
- Unser (2019) M. Unser. A representer theorem for deep neural networks. Journal of Machine Learning Research, 20, 2019.
- van der Vaart and Wellner (1996) A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
- Vershynin (2018) R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018.
- Wahba (1990) G. Wahba. Spline Models for Observational Data. SIAM, 1990.
- Waternaux (1976) C. M. Waternaux. Asymptotic distribution of the sample roots for a nonnormal population. Biometrika, 63(3):639–645, 1976.
- Yurinsky (2006) V. Yurinsky. Sums and Gaussian Vectors. Springer, 2006.
- Zhang and Cheng (2017) X. Zhang and G. Cheng. Simultaneous inference for high-dimensional linear models. Journal of the American Statistical Association, 112(518):757–768, 2017.
Supplementary Materials for “Gaussian and Bootstrap Approximations for Empirical Processes”
Alexander Giessing
Appendix A Two toy examples
In this section we present two examples to illustrate the limitations of existing Gaussian approximation results and the advantages of the new ones. The presentation is deliberately expository; we omit proofs as much as possible. In both examples, we take with , which plays a role in the construction of high-dimensional confidence regions and multiple testing problems (see Section 4.1).
Consider a simple random sample from the law of , where is a fixed vector and a centered random variable with finite third moment. Then, . Whence, the supremum of the empirical process reduces to the average of i.i.d. scalar-valued random variables and the classical univariate Berry-Esseen theorem yields the following bound on the Kolmogorov distance:
(17) |
This non-asymptotic bound is independent of the dimension and the weak variance of the Gaussian proxy process. If we ignore the low-rank structure of the data and instead use the Gaussian approximation results in Chernozhukov et al. (2014b) (Theorem 2.1 combined with Lemma 2.3) we obtain (qualitatively)
(18) |
where and hides a complicated multiplicative factor which, among other things, depends on the sample size , third and higher moments of , and the inverse of the discretization level (see Section 3.6). Since the covariance matrix has rank one, the unit ball in with respect to the intrinsic standard deviation metric is isometrically isomorphic to the interval . Hence, the metric entropy can be upper bounded independently of the dimension ; in particular, we have . However, the low-rank structure of the data also implies that the weak variance of the associated Gaussian proxy process vanishes; indeed, . Thus, the upper bound in (18) is in fact invalid (or trivial if we interpret ) and fails to replicate the univariate Berry-Esseen bound in (17).
In contrast, the new Gaussian approximation inequality (Theorem 1 in Section 3.2; Theorem A.1 in Giessing and Fan (2023)) is agnostic to the covariance structure of the data and yields
(19) |
where hides an absolute constant independent of , and the distribution of the ’s and . The upper bound in this inequality depends only on the third moment of the envelope function and the strong variance of the Gaussian proxy process . If we use that and for , we obtain
(20) |
This inequality is obviously dimension- and weak variance-free. In this sense, it recovers the essential feature of the univariate Berry-Esseen bound in (17) and improves over the bound in (18). The dependence on the sample size and the moments of is still sub-optimal; but refinements in this direction are beyond the scope of this paper. Obviously, in this example, we have chosen a rank one covariance matrix only to be able to compare the Gaussian approximation result with the Berry-Esseen theorem. Any low-rank structure implies a vanishing weak variance and, hence, a breakdown of the results in Chernozhukov et al. (2014b).
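The collapse to a univariate problem can be checked numerically. The sketch below simulates the rank-one example, assuming the function class consists of the (signed) coordinate projections so that the supremum is a maximum of absolute coordinate averages; the loading vector and the centered Exp(1) innovations are purely illustrative choices.

```python
import numpy as np

def ks_distance_rank_one(n=50, p=1000, reps=5000, seed=0):
    """Monte Carlo illustration of the first toy example: for X_i = v * z_i the
    covariance has rank one, the maximal absolute coordinate average collapses
    to a scaled univariate average, and the Gaussian approximation error does
    not depend on the dimension p (illustrative sketch, not a proof)."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(0.5, 1.5, size=p)                  # fixed loading vector (illustrative)
    z = rng.exponential(1.0, size=(reps, n)) - 1.0     # centered, finite third moment, variance 1
    coord_means = np.sqrt(n) * z.mean(axis=1)[:, None] * v[None, :]   # sqrt(n) * mean of X_{ij} over i
    stats = np.abs(coord_means).max(axis=1)            # empirical process supremum, one value per replication
    proxy = np.abs(v).max() * np.abs(rng.standard_normal(reps))       # supremum of the Gaussian proxy
    grid = np.linspace(0.0, stats.max(), 200)
    ks = np.abs((stats[:, None] <= grid).mean(axis=0)
                - (proxy[:, None] <= grid).mean(axis=0)).max()
    return ks   # up to Monte Carlo noise, the value does not change with p, mirroring (20)
```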
Next, suppose that the data are a simple random sample drawn from the law of a random vector with mean zero, element-wise bounded entries almost surely for some , and equi-correlated covariance matrix for some . The constraints on guarantee that has full rank. Therefore, the multivariate Berry-Esseen theorem by Bentkus (2003) (Theorem 1.1) implies
(21) |
This upper bound is not useful in high-dimensional settings because the expected value is polynomial in the dimension , i.e. .
The Gaussian approximation results by Chernozhukov et al. (2014b) yield (again) inequality (18). Since the covariance matrix has full rank, the weak variance is now strictly positive and equal to the smallest eigenvalue of , i.e. . However, in this example, the metric entropy with respect to the intrinsic standard deviation metric poses a problem. Indeed, let and be the eigenvalues of . Then, the unit ball in with respect to can be identified with the weighted Euclidean ball . Let be the standard unit ball in with respect to the Euclidean distance. Then, whenever . Therefore, standard covering number arguments apply. Thus, the metric entropy grows linearly in the dimension , i.e. for . We conclude that the results in Chernozhukov et al. (2014b) are again not useful in high dimensions with .
The new Gaussian approximation inequality implies (again) (19). Since almost surely and it follows that . Moreover, by Theorem A.6 in Giessing and Fan (2023) and since , we have . Therefore, inequality (19) simplifies to
(22) |
This inequality is not only dimension- and weak variance-free but also improves qualitatively over both the results by Chernozhukov et al. (2014b) and Bentkus (2003). Intuitively, the reason why we are able to shed the -factor from the upper bound compared to the results in Bentkus (2003) is that we only take the supremum over all Euclidean balls with center at the origin whereas he takes the supremum over all convex sets in . (Note that to apply his bound in the context of this example, we need to take the supremum over at least all weighted Euclidean balls .) This second example is related to the first one insofar as the covariance matrix has “approximately” rank one. Indeed, as the dimension increases the law of the random vector concentrates in the neighborhood of the one-dimensional subspace spanned by the eigenvector associated with the largest eigenvalue of . This becomes even more obvious if we consider the standardized covariance with eigenvalues and as .
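A short computation makes the “approximately rank one” observation concrete, assuming the equi-correlated matrix has unit diagonal and off-diagonal entries ρ (the reading suggested by the description above): its spectrum consists of one large eigenvalue 1 + (p − 1)ρ and p − 1 eigenvalues equal to 1 − ρ, so the standardized matrix Σ/p has a single non-vanishing eigenvalue as p grows.

```python
import numpy as np

# Equi-correlated covariance (unit diagonal, off-diagonal rho, an assumption
# consistent with the description above): Sigma = (1 - rho) * I + rho * 1 1^T.
p, rho = 500, 0.3
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
eig = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
print(eig[0], 1 + (p - 1) * rho)   # largest eigenvalue matches the closed form
print(eig[1], 1 - rho)             # all remaining eigenvalues equal 1 - rho
print(eig[0] / p)                  # standardized top eigenvalue, close to rho for large p
```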
In both examples, the exact and approximate low-rank structures of the data are crucial in order to go from the abstract bound (19) to the dimension-/entropy-free bounds (20) and (22), respectively. This is not coincidental and is in fact representative of the entire theory that we develop in this paper: While our Gaussian and bootstrap approximation inequalities hold without assumptions on the metric entropy, in concrete examples they often only yield entropy-free upper bounds if the trace of the covariance operator is bounded or grows at a much slower rate than the sample size (e.g. low-rank, bounded effective rank, or variance decay, see Section 4). This is certainly a limitation of our theory, but a low-rank covariance (function) is an empirically well-documented property of many data sets and therefore a common assumption in multivariate and high-dimensional statistics (e.g. Anderson, 2003; Vershynin, 2018).
Appendix B Auxiliary results
B.1 Smoothing inequalities and partial derivatives
Lemma 1.
Let be arbitrary random variables. There exists a map such that
(i) for all and ,
where is a constant depending only on ; and
(ii) for all ,
where for real-valued .
Remark 1.
We can take (see proof).
Lemma 2.
Let be the map from Lemma 1 and define for . Let be the Ornstein-Uhlenbeck semi-group with stationary measure and positive definite covariance matrix .
(i) For arbitrary indices , , and all ,
(ii) for almost every the absolute value of the derivative in (i) can be upper bounded by
where , , and is the absolute constant from Lemma 1.
Remark 2.
While claim (i) looks a lot like a “differentiating under the integral sign” type result, it is more accurate to think of it as a specific smoothing property of the Ornstein-Uhlenbeck semigroup with stationary measure when applied to (compositions of Lipschitz continuous functions with) the map .
Lemma 3.
Let with . For and a map on set
where denotes the th standard unit vector in . For , , , and arbitrary indices ,
Remark 3.
The proof of this result on “partially regularized” functions is similar to the one on “fully regularized” functions (Folland, 1999, Theorem 8.14 (a)). The conditions on the functions can probably be relaxed, but they are sufficiently general to apply to the situations that we encounter.
Lemma 4 (Partial derivatives of compositions of almost everywhere diff’able functions).
Let and and is a null set with respect to the Lebesgue measure on . Then, and, for arbitrary indices , , and all ,
where is the set of all partitions of , denotes the number of “blocks of indices” in partition , and is the number of indices in block .
Remark 4.
This result is a trivial modification of the multivariate version of Faà di Bruno’s formula due to Hardy (2006). The modification is that we only require to be -times differentiable almost everywhere on . The formula is generally false if is only -times differentiable almost everywhere on .
Remark 5.
The first three partial derivatives are of particular interest to us. They are given by (whenever they exist)
Lemma 5 (Partial and total derivatives of -norms).
The map is partially differentiable of any order almost everywhere on with partial derivatives (whenever they exist)
for and . Moreover, is twice totally differentiable almost everywhere on with Jacobian and Hessian matrices (whenever they exist)
Remark 6.
Note that the first partial derivative can be re-written (less compactly) as a piecewise linear function.
B.2 Anti-concentration inequalities and lower bounds on variances
Lemma 6 (Giessing, 2023).
Let be a centered separable Gaussian process indexed by a semi-metric space . Set and assume that a.s. For all ,
The result remains true if is replaced by .
Remark 7.
If the covariance function of is positive definite, then the above inequalities hold even for uncentered and a.s.
Lemma 7 (Giessing, 2023).
Let be a separable Gaussian process indexed by a semi-metric space such that , , and for all . Set and assume that a.s. Then, and there exist absolute constants such that
with the convention that “”. The result remains true if is replaced by .
Lemma 9.
Let be arbitrary random variables. Then,
where for . Moreover, if , then
B.3 Quantile comparison lemmas
Throughout this section, and . For we define the th quantile of and , respectively, by
(23) |
Lemma 10.
For all ,
where and is an absolute constant.
If we use non-Gaussian proxy statistics to test hypotheses (such as Efron’s empirical bootstrap), we replace Lemma 10 by the following result:
Lemma 11.
Let be a statistic; and need not be independent. For arbitrary define . Then, for all ,
where , , , and is an absolute constant.
For we define the (conditional) th quantile of the supremum of by
and the th quantile of the supremum of the Gaussian -bridge process by
The following two lemmas are straightforward generalizations of the preceding lemmas to empirical processes. The proofs of Lemmas 12 and 13 are identical to the ones of Lemmas 10 and 11, respectively. We therefore omit them.
Lemma 12.
For all ,
where and is an absolute constant.
Lemma 13.
Let be an arbitrary stochastic process. For arbitrary define . Then, for all ,
where , , , and is an absolute constant.
B.4 Boundedness and continuity of centered Gaussian processes
This section contains classical results on boundedness and continuity of the sample paths of centered Gaussian processes. We provide a proof for Lemma 18 in Appendix F.4; all other results (with proofs or references to proofs) can be found in Appendix A of van der Vaart and Wellner (1996).
Throughout this section denotes a centered separable Gaussian process indexed by a semi-metric space , , , and the intrinsic standard deviation metric associated with .
Lemma 14 (Equivalence of bounded sample path and finite expectation).
is almost surely bounded on if and only if .
Lemma 15 (Reverse Liapunov inequality).
If is almost surely bounded on , then there exist constants depending on only such that
Lemma 16 (Sudakov’s lower bound).
Let be the -covering number of w.r.t. . Then, there exists an absolute constant such that
Consequently, if , then is totally bounded w.r.t. .
Lemma 17 (Metric entropy condition for bounded and continuous sample paths).
Let be the -covering number of w.r.t. . If , then there exists a version of that is almost surely bounded and has almost surely uniformly -continuous sample paths.
Lemma 18 (Continuous sample paths and modulus of continuity).
If is almost surely bounded on , then the sample paths of on are almost surely uniformly -continuous if and only if
(24) |
B.5 Auxiliary results for applications
In this section we collect several technical results needed for the applications in Section 4.
Lemma 19.
Let be i.i.d. sub-Gaussian random vectors with mean zero and covariance matrix . Define by . Then,
where and hides an absolute constant independent of , , and . (Here, we tacitly identify the ’s with linear maps and denotes the tensor product between these linear maps.)
Remark 9.
This result is useful because the upper bound is dimension-free in the sense that it only depends on the effective rank and the operator norm . However, the dependence on the sample size is sub-optimal.
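For reference, the effective rank in Remark 9 is understood here in the usual sense tr(Σ)/‖Σ‖_op, as in Koltchinskii and Lounici (2017); this is an assumption about notation, since the displayed bound is not reproduced above. The short check below evaluates it for the equi-correlated matrix from Appendix A, where it remains bounded by 1/ρ regardless of the dimension.

```python
import numpy as np

def effective_rank(Sigma):
    """Effective rank r(Sigma) = trace(Sigma) / operator norm of Sigma,
    the standard notion from Koltchinskii and Lounici (2017); assumed to be
    the quantity referred to in Remark 9."""
    eig = np.linalg.eigvalsh(Sigma)
    return eig.sum() / eig.max()

# Example: for the equi-correlated matrix from Appendix A the effective rank is
# p / (1 + (p - 1) * rho), which stays below 1 / rho for every dimension p.
p, rho = 500, 0.3
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
print(effective_rank(Sigma), 1 / rho)
```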
Lemma 20.
The bias-corrected kernel ridge regression estimator defined in (13) satisfies
where is a higher-order remainder term and
where hides an absolute constant.
Lemma 21.
Remark 10.
The quantity also appears in Singh and Vijaykumar (2023). For an interpretation and its relation to the effective rank of the operator we refer to Section H (ibid., pp. 80ff).
Remark 11.
The above upper bound is if (i) , (ii) , and (iii) .
Lemma 22.
Remark 12.
The above upper bound is if .
Lemma 23.
Remark 13.
If is pre-Gaussian, then weaker conditions and generic chaining arguments yield tighter bounds (e.g. Koltchinskii and Lounici, 2017, Theorem 9).
The next lemma is a version of Bernstein’s exponential tail bound for random elements on separable Hilbert spaces.
Lemma 24 (Theorem 3.3.4, Yurinsky, 2006; Lemma G.2, Singh and Vijaykumar, 2023).
Let be i.i.d. centered random elements on a separable Hilbert space with induced norm . Suppose that there exist absolute constants such that for all . Then, for arbitrary,
In particular, for arbitrary, with probability at least ,
where hides an absolute constant independent of , and .
Appendix C Proofs of the results in Section 2
C.1 Proof of Proposition 1
Proof of Proposition 1.
Our proof is inspired by Nourdin and Peccati (2012) (Theorem 3.7.1) who establish a Berry-Esseen type bound for the univariate case using an inductive argument (attributed to Bolthausen, 1984). The multivariate case requires several modifications, some of which we take from Götze (1991), Bhattacharya and Holmes (2010), and Fang and Koike (2021). We also borrow a truncation argument from Chernozhukov et al. (2017), which explains the qualitative similarity between their bound and ours. Original ideas in our proof are mostly those related to the way in which we use our Gaussian anti-concentration inequality (Lemma 6) and exploit the mollifying properties of the Ornstein-Uhlenbeck semi-group operator to by-pass dimension dependent smoothing inequalities (Lemmas 1 and 2).
Our proof strategy has two drawbacks: First, the inductive argument relies substantially on the i.i.d. assumption on the data. Generalizing this argument to independent but non-identically distributed data requires additional assumptions on the variances similar to the uniform asymptotic negligibility condition in the classical Lindeberg-Feller CLT. We leave this generalization to future research. Second, the recent results by Chernozhukov et al. (2020) suggest that our Berry-Esseen type bound is not sharp. Unfortunately, their proof technique (based on delicate estimates of Hermite polynomials) is inherently dimension dependent. Extending their approach to the coordinate-free Wiener chaos decomposition is another interesting research task.
The case of positive definite .
Suppose that is positive definite. Let be independent of and define, for each ,
Further, for each , let be the smallest constant greater than or equal to one such that, for all i.i.d. random variables with , , and ,
where
The factor 12 in front of ensures that so that . Indeed, one easily computes (by the equivalence of moments of suprema of Gaussian processes, e.g. van der Vaart and Wellner, 1996, Proposition A.2.4) and, hence, .
While the upper bound is too loose to conclude the proof, it is nonetheless an important first step towards a tighter bound. For the moment, assume that there exists an absolute constant such that
(25) |
Then, by construction of ,
(26) |
We shall now show that this difference inequality implies that independent of the distribution of the ’s: Define the map and consider the nonlinear first-order difference equation
We easily verify that the fixed point solving satisfies
and that for all and for all . We also notice that is monotone increasing on . Thus,
Hence, by Theorem 2.1.2 in Sedaghat (2003) every trajectory with converges to . In particular, . Returning to the inequality (26) we conclude that there exists such that for all and all . Since for all , it follows that for all and .
To complete the proof of the theorem, it remains to show that eq. (25) holds. Let , be arbitrary and be the map from Lemma 1. Define . By Lemma 6 and Lemma 1 (ii),
(27) |
where denotes the Ornstein-Uhlenbeck semi-group with stationary measure , i.e.
Since is Lipschitz continuous (with constant ) and positive definite, Proposition 4.3.2 in Nourdin and Peccati (2012) implies that
(28) |
where and
Since almost surely for all integrable maps (semi-group property!), we have
We proceed by re-writing eq. (28) in multi-index notation as
where the ’s are independent copies of the ’s. A Taylor expansion around with exact integral remainder term yields
(29) |
where is independent of the ’s and ’s. The first and fourth terms cancel out, and the third term vanishes because and are independent and have mean zero. (Eq. (C.1) is essentially a re-statement of Lemmas 2.9 and 2.4 in Götze (1991) and Raič (2019), respectively.)
Notice that because it is the convolution of a bounded Lipschitz map with the density of (e.g. Nourdin and Peccati, 2012, Proposition 4.2.2). Derivatives on are usually obtained by differentiating the density of (e.g. Götze, 1991, Lemma 2.1; Raič, 2019, Lemma 2.5 and 2.6; Fang and Koike, 2021, Lemma 2.2; and Chernozhukov et al., 2020, Lemmas 6.1, 6.2, 6.3). Here, we proceed differently. Let be the indices corresponding to the multi-indices and . By Lemma 2 (i) we have
where denotes the expectation taken with respect to the law of only. And by Lemma 2 (ii),
where
Let be the value of for corresponding to the indices . An application of Hölder’s inequality yields
(30) |
where in the last line we have used that almost surely because for all and (since is positive definite no pair of entries in and can be perfectly (positively or negatively) correlated!). Taking expectation with respect to the ’s and over the above inequality, we obtain
(31) |
Notice that and , and , where is an independent copy of . Set and bound the probability in line (C.1) by
(32) |
where we have used that under the i.i.d. assumption
By Lemma 6, Lemma 1 (ii), and monotonicity and concavity of the map , , we have, for arbitrary ,
(33) |
Combine eq. (C.1)–(C.1) with (C.1) and integrate over to conclude via eq. (27)–(C.1) and the i.i.d. assumption that there exists an absolute constant such that
(34) |
where we have used Harris’ association inequality to simplify several summands, i.e.
and
Observe that eq. (34) holds for arbitrary . Setting
we deduce from eq. (34), the definition of , and (because !) that, for all ,
We have thus established eq. (25). This concludes the proof of the theorem in the case of a positive definite covariance matrix.
The case of positive semi-definite .
Suppose that is positive semi-definite but not identical to zero.
Take and such that are mutually independent. Let be arbitrary and define , , and with . Clearly, and the ’s are i.i.d. with mean zero and positive definite covariance . Hence, by the first part of the proof there exists an absolute constant independent of , , and the distribution of the ’s (and hence, independent of !) such that for and ,
(35) |
where
and
At this point, it is tempting to take in eq. (35). However, this would yield the desired result only for the case in which the law of is continuous. (Alternatively, we could replace the supremum over by the supremum over , where is the set of continuity points of the law of .) Therefore, we proceed differently. Recall eq. (27), i.e.
where denotes the Ornstein-Uhlenbeck semi-group with stationary measure ,
Let be the Ornstein-Uhlenbeck semi-group with stationary measure and expand the above inequality to obtain the following modified version of eq. (27):
where (a) holds because is -Lipschitz and is -Lipschitz w.r.t. the metric induced by the -norm.
Since is positive definite we can proceed as in the first part of the proof and arrive at the following modified version of eq. (34): There exists an absolute constant such that
(36) |
Set
and combine eq. (35) and eq. (36) to conclude (as in the first part of the proof) that for all ,
(37) |
Since is arbitrary, we can take . To complete the proof, we only need to find the limit of the expression on the right hand side in the above display.
Let be a monotone decreasing null sequence. For all and , a.s. and (otherwise the upper bound in Proposition 1 is trivial!) and (e.g. van der Vaart and Wellner, 1996, Proposition A.2.4). Hence, is uniformly integrable for . Since in addition a.s., the sequence converges in . Arguing similarly, we conclude that the sequence converges in as well. Thus,
and
Hence, by eq. (37) we have shown that for all and . This completes the proof of the proposition. ∎
C.2 Proof of Corollary 2
Proof of Corollary 2.
Define and where . Thus, by Lemma 7,
(38) |
Moreover, for arbitrary,
and, hence, for and ,
(39) |
Combine the upper bound in Proposition 1 with eq. (38) and (39) and simplify the expression to conclude the proof of the first claim. (Obviously, this bound is not tight, but it is aesthetically pleasing.) The second claim about equicorrelated coordinates in follows from the lower bound on the variance of in Proposition 4.1.1 in Tanguy (2017) combined with the upper bound in Proposition 1 and eq. (39). ∎
C.3 Proof of Proposition 3
Proof of Proposition 3.
The main proof idea is standard, e.g. similar arguments have been used in proofs by Fang and Koike (2021) (Theorem 1.1) and Chernozhukov et al. (2020) (Theorem 3.2). While our bound is dimension-free, it is not sharp (e.g. Chernozhukov et al., 2020, Proposition 2.1).
The case of positive definite .
Suppose that is positive semi-definite and is positive definite. To simplify notation, we set
Moreover, for , arbitrary denote by the map from Lemma 1 and define . Then, by Lemma 6 and Lemma 1 (ii),
(40) |
Since is Lipschitz continuous (with constant ) and is positive definite, Proposition 4.3.2 in Nourdin and Peccati (2012) implies that
(41) |
where and
and denotes the Ornstein-Uhlenbeck semi-group with stationary measure , i.e.
Using Stein’s lemma we re-write eq. (41) as
Notice that the above identity holds even if is only positive semi-definite (e.g. Nourdin and Peccati, 2012, Lemma 4.1.3)! By Hölder’s inequality for matrix inner products,
(42) |
To complete the proof, we now bound the second derivative . By Lemma 2 (i), for arbitrary indices ,
and, hence, by Lemma 2 (ii),
where
Since is positive definite, no pair of entries in can be perfectly (positively or negatively) correlated. Therefore, almost surely. Hence,
(43) |
Combine eq. (42)–(43), integrate over , and conclude that
(44) |
To conclude, combine eq. (40) and eq. (44) and optimize over .
The case of positive semi-definite .
Suppose that both, and are positive semi-definite. To avoid trivialities, we assume that is not identical to zero.
Take independent of and, for arbitrary, define . Now, consider eq. (40), i.e.
Expanding the above inequality yields
(45) |
where the last inequality holds because is -Lipschitz continuous w.r.t. the metric induced by the -norm.
C.4 Proofs of Propositions 4 and 5
Proof of Proposition 4.
Proof of Proposition 5.
The proof is an adaptation of the proof of Theorem 3.1 in Chernozhukov et al. (2013) to our setup. To simplify notation we write and ; see also eq. (23). Note that
(47) |
For arbitrary, the first term can be upper bounded by Lemma 10 as
(48) | ||||
(49) |
where the second inequality follows from (several applications of) Lemma 6 and the third from the definition of quantiles and because has no point masses.
Appendix D Proofs of the results in Section 3
D.1 Proofs of Theorem 1 and Corollary 2
Proof of Theorem 1.
Let be arbitrary and define . Let be a -net of and set . Since is totally bounded with respect to , the ’s are finite. Moreover, for each there exists a map such that for all and, hence,
where the first inequality holds by the reverse triangle inequality and the prelinearity of the Gaussian -bridge process (e.g. Dudley, 2014, p. 65, eq. 2.4). (Here and in the following stands for the identity map.) Since the map is obviously linear, the same inequality holds for the empirical process .
By the triangle inequality,
Since is finite, Proposition 1 implies, for all ,
where is an absolute constant and
Therefore, by Strassen’s theorem (e.g. Dudley, 2002, Theorem 11.6.2),
(51) |
where is an absolute constant and
By the hypothesis on the moduli of continuity of the Gaussian -bridge and the empirical process, we can upper bound the right hand side in eq. (51) by
(52) |
Since is non-increasing in , we can assume without loss of generality that there exist such that for all and all , ; otherwise the upper bound is trivial by eq. (D.1). Thus, since , Lemma 9 with and yields, for all ,
where (a) holds because and (b) holds because of and for suprema of Gaussian processes we can reverse Liapunov’s inequality (i.e. Lemma 15, the unknown constant can be absorbed into ). Conclude that for all ,
(53) |
Combine eq. (51)–(53) and obtain for all ,
where
Increase the constant on the right hand side until the bound holds also for all . Since is arbitrary, set such that .
∎
Proof of Corollary 2.
We only need to verify that under the stated assumptions Theorem 1 applies with . Let be arbitrary and define , where and . Note that . First, by Lemma 16, is totally bounded w.r.t. the pseudo-metric . Second, by Lemma 18, as . Thus, for each , there exists some such that as . Third, . Thus, for each , there also exists such that as . This completes the proof.
(Note that the last argument does not imply that the empirical process is asymptotically -equicontinuous: Not only do we use a different norm, but we keep the sample size fixed and only consider the limit . This shows once again that Gaussian approximability is a strictly weaker concept than weak convergence, see also Remark 3 and the discussion in Section 2.2.) ∎
D.2 Proofs of Theorem 3 and Corollaries 4 and 5
Proof of Theorem 3.
The key idea is to construct “entangled -nets” of the function classes and . The precise meaning of this term and our reasoning for this idea will become clear in the course of the proof.
Let be arbitrary and define . Let and be - and -nets of and w.r.t. and , respectively. Since and are totally bounded w.r.t. and , respectively, the ’s and ’s are finite.
Next, let be the projection from onto defined by for all . If this projection is not unique for some , choose any of the equivalent points. Define the projection from onto analogously. Finally, define the “entangled -nets” of and as
Note that the entangled sets and are still - and -nets of and w.r.t. and . Hence, for each , there exist maps and such that for all and for all . Therefore, by the reverse triangle inequality and the linearity of Gaussian -bridge processes and -motions,
where and .
(54) |
By the triangle inequality,
(55) |
Since and (note: two finite dimensional mean zero Gaussian vectors with the same (!) index sets; to achieve this we needed the “entangled -nets”) we have by Proposition 3, for all ,
where is an absolute constant and, for all ,
Therefore, by eq. (54) and eq. (55), Strassen’s theorem (e.g. Dudley, 2002, Theorem 11.6.2; see also proof of Theorem 1), and Markov’s inequality,
(56) |
By the same arguments as in the proof of Theorem 1, there exists such that for all and all we have and, hence,
(57) |
Combine eq. (56) and eq. (57) to conclude that the claim of the theorem holds for all . Since for the upper bound is trivial by eq. (56), the theorem holds in fact for all . Since is arbitrary, set such that . ∎
Proof of Corollary 4.
D.3 Proofs of Theorems 6 and 9, Proposition 7, and Corollary 8
Proof of Theorem 6.
Proof of Proposition 7.
Our proof is modeled after the proof of Theorem 1 in Jain and Kallianpur (1970). However, unlike them we do not argue via the reproducing kernel Hilbert space associated to . Instead, we leverage the fact that under the stated assumptions has a version that is both a mean-square continuous stochastic process and a random element on some Hilbert space.
Since is compact it is totally bounded and, by Lemma 16, separable w.r.t. . Hence, the process has a separable and jointly measurable version (e.g. Giné and Nickl, 2016, Proposition 2.1.12; note that is the intrinsic standard deviation metric of the Gaussian -motion ). Since is compact and is continuous, is a bounded linear operator. Let be the eigenvalue and eigenfunction pairs of and define
(58) |
By continuity of , is a mean-square continuous stochastic process in . Moreover, by joint measurability, is also a random element on the Hilbert space . Hence, Theorems 7.3.5 and 7.4.3 in Hsing and Eubank (2015) apply, and we conclude that the partial sums converge to in as pointwise in .
Now, observe that
(59) |
Thus, the ’s are uncorrelated random variables. Since the ’s are necessarily Gaussian (inner product of a Gaussian random element with a deterministic function !), they are in fact independent Gaussian random variables with mean zero and variance . Therefore, by Lévy’s theorem, converges to almost surely pointwise in . Thus, for ,
(60) |
Since , Lemma 14 implies that is almost surely bounded. Moreover, since is continuous and compact, Lemma 4.6.6 (4) in Hsing and Eubank (2015) implies that is uniformly continuous. Consequently, by construction, is almost surely bounded and uniformly continuous, too. Thus, and can be considered random elements on the Banach space of bounded continuous functions on equipped with the supremum norm.
Recall that the dual space is the space of finite signed Borel measures on equipped with the total variation norm. The dual pairing between and is . We compute
(61) |
where (a) holds by (60) and (b) holds by Lemma 4.6.6 (3) in Hsing and Eubank (2015). Since (D.3) implies convergence of the dual pairings in probability for every , we have, by Itô-Nisio’s theorem,
almost surely. Recall that is a version of . This completes the proof. ∎
Proof of Corollary 8.
By Theorem 6 and arguments as in the proof of Corollary 2 we have, for each ,
where hides an absolute constant independent of , and .
The above approximation error can be further upper bounded by
If is compact w.r.t. , then by Mercer’s theorem (e.g. Hsing and Eubank, 2015, Lemma 4.6.6),
This completes the proof of the corollary. ∎
Appendix E Proofs of the results in Section 4
Proof of Proposition 1.
Since the problem is finite dimensional we do not have to develop the Karhunen-Loève expansion from Section 3.5. The covariance function of the empirical process with has the explicit form . Hence, a natural estimate of is . A version of a centered Gaussian process defined on and with covariance function is where . It is standard to verify that, under the assumptions of the theorem, the metric entropy integrals associated to the Gaussian processes and are finite for every fixed and . Thus, by Lemma 17 there exist versions of and which are almost surely bounded and almost surely uniformly continuous. Hence, the modulus of continuity condition (6) holds for these versions and we can take . It follows by Theorem 6 that, for all ,
Under Assumption 1 or 2 the right hand side in the above inequality vanishes as diverges. For details we refer to Appendix A.1 in Giessing and Fan (2023). This completes the proof. ∎
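As an aside, the finite-dimensional Gaussian proxy used in this proof is straightforward to simulate: draw vectors from a centered normal law with the plug-in covariance Σ̂ and record the maximal absolute coordinate. The sketch below does exactly this, assuming the supremum is taken over the (signed) coordinate projections; the diagonal jitter only stabilizes the Cholesky factorization of a possibly singular sample covariance and is not part of the construction in the proof.

```python
import numpy as np

def gaussian_proxy_sup(X, n_draws=2000, seed=None):
    """Monte Carlo draws of the supremum of the finite-dimensional Gaussian
    proxy with plug-in covariance Sigma_hat (sample covariance of the rows of X),
    reading the supremum as a maximum of absolute coordinates; an illustrative
    sketch, not the paper's exact construction."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    Sigma_hat = np.cov(X, rowvar=False, bias=True)          # plug-in covariance estimate
    L = np.linalg.cholesky(Sigma_hat + 1e-8 * np.eye(p))    # jitter keeps the factorization stable
    G = rng.standard_normal((n_draws, p)) @ L.T              # draws approximately N(0, Sigma_hat)
    return np.abs(G).max(axis=1)                             # supremum over (signed) coordinate projections
```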
Proof of Proposition 2.
Since the problem is finite dimensional we do not have to develop the Karhunen-Loève expansion from Section 3.5. Consider where is the empirical measure of the collection of , , the pushforward of under the map , and . Since each has a (not necessarily unique) representation in terms of , we will identify with pairs when there is no danger of confusion. Clearly, is an envelope function for .
Define the Gaussian processes , where with , and , where with .
Without loss of generality, we can assume that and are finite since otherwise the statement is trivially true. Thus, under the assumptions of the theorem, the metric entropy integrals associated to the Gaussian processes and are finite for every fixed and . Therefore by Lemma 17 there exist versions of and (which we also denote by and ) that are almost surely bounded and almost surely uniformly continuous. The modulus of continuity condition (6) holds for these versions and, in particular, we can take . By Theorem 6,
(62) |
In the remainder of the proof we derive upper bounds on the quantities on the right-hand side in the above display.
Note that
(63) |
where (a) follows from Hölder’s inequality and (b) holds because is sub-Gaussian with mean zero and covariance and therefore
Next, note that for all . Thus, we compute
Also, since for all symmetric matrices , we have
where the last inequality follows from similar arguments as used to derive the upper bound in (63). Thus,
(64) |
Moreover, since for all symmetric matrices and matching vectors , we have, for arbitrary ,
(65) |
Combine eq. (64)–(E) with Lemma 7 to obtain
(66) |
Next, since (again!) for all symmetric matrices and matching vectors , we compute
(67) |
where the last line holds since by Markov’s inequality and Lemma 19,
and by Theorem 4 in Koltchinskii and Lounici (2017)
Lastly, for , by the upper and lower bounds in (63),
(68) |
Proof of Proposition 3.
Let . Further, let be the empirical measure based on the , and the pushforward measure of under the map . Since is the best approximation in square loss, and and the ’s have mean zero. Hence,
where is a remainder term which satisfies . We plan to apply Theorem 9 to the far right hand side in the above display. To this end, we need to (i) find an envelope for the function class , (ii) establish compactness of , (iii) construct certain Gaussian processes (to be defined below) with continuous covariance functions, and (iv) establish almost sure uniform continuity of these processes w.r.t. their intrinsic standard deviation metrics.
Recall that for all and that for all and . Hence, for all and . Thus, is an envelope of the function class .
Next, let and be centered Gaussian random elements on with covariance operators and , respectively, i.e. for all ,
By Cauchy-Schwarz these covariance functions are obviously continuous w.r.t. . Denote the standard deviation metrics associated with above Gaussian random elements by and . Then, for all ,
and, completely analogous,
Since is compact and by assumption, these inequalities imply that both and (by continuity of the inner product) are compact w.r.t. the standard deviation metrics and . Moreover, since , these inequalities also imply that and . Hence, by Lemma 17 there exist versions of and that are almost surely bounded and have almost surely uniformly - and -continuous sample paths. In the following, we keep using and to denote these versions.
Hence, Theorem 9 applies and we have
(69) |
In the remainder of the proof we derive upper bounds on the quantities on the right-hand side in the above display.
First, from the proof of Lemma 20 we know that
Hence, by the proof of Lemma 22
and
Therefore,
(70) |
where and , .
Second, by Cauchy-Schwarz,
(71) |
Third, for ,
(72) |
Fifth, by Lemma 22, with probability at least ,
(75) |
Finally, Lemma 21 combined with (69)–(E), and implies that, with probability at least ,
(76) | ||||
where (a) holds provided that the sample size , regularization parameter , kernel , and operators and are such that
(77) |
where and , .
Simplifying these rates is beyond the scope of this illustrative example. The interested reader may consult Appendix H in Singh and Vijaykumar (2023) and Sections 6 and 7 in Lopes (2022b) for potentially useful results. If the Hilbert space is finite dimensional and is invertible, then these rates are satisfied for and (the exact rates feature some - factor for some ). ∎
Appendix F Proofs of the results in Section B
F.1 Proofs of Lemmas 1, 4, and 5
Proof of Lemma 1.
For and arbitrary, define
(78) |
Since for all and , it follows that (draw a sketch!)
(79) |
where and
where is an absolute constant such that . Since with support , we easily verify that the map is continuously differentiable and its th derivative satisfies
(80) |
where is a constant depending only on . This establishes the first claim of the lemma with . To prove the second claim of the lemma we proceed in two steps: First, we show that
By the chain of inequalities in eq. (79), for arbitrary,
(81) |
Similarly,
(82) |
Now, take the supremum over in eq. (F.1) and (F.1) and switch the roles of and . Next, we show
As before, we compute
(83) |
and
(84) |
To conclude, take the supremum over in (F.1) and (F.1) and switch the roles of and . ∎
Proof of Lemma 4.
The set on which is not differentiable is contained in the null set on which is not differentiable. Thus, is -times differentiable almost everywhere. The derivatives now follow from Hardy (2006). ∎
Proof of Lemma 5.
Since is a piecewise linear function, it has partial derivatives of any order at all its continuity points. Since the set of discontinuity points of forms a null set with respect to the Lebesgue measure on , is differentiable almost everywhere on . (These partial derivatives need not be continuous!) The expressions of these partial derivatives follow from direct calculation.
Even more is true: Since is Lipschitz continuous, Rademacher’s theorem implies that is in fact totally differentiable almost everywhere on . Furthermore, since is convex, Alexandrov’s theorem implies that satisfies a second order quadratic expansion almost everywhere on , i.e. for all ,
According to Rockafellar (1999) the matrix is symmetric, positive semi-definite, and equal to the Jacobian of for all at which exists. Thus, can be identified with the second derivative for almost all . ∎
F.2 Proofs of Lemmas 2 and 3
Proof of Lemma 2.
Throughout the proof we write with as defined in Lemma 1 and . To simplify the notation, we denote partial derivatives w.r.t by , e.g. we write for , for , etc. We use to denote the Lebesgue measure on .
Proof of part (i).
Special case .
Let and the -th standard unit vector in . Recall that , where is -Lipschitz w.r.t. the norm induced by the absolute value and is -Lipschitz w.r.t. the metric induced by the -norm. Thus, for and arbitrary,
Hence, the difference quotient is bounded uniformly in and . Furthermore, by Lemma 5 is differentiable -a.e. Therefore, the conditions of Corollary A.5 in Dudley (2014) are met and we can pass the derivative through both integrals (over and ) to obtain
(85) |
The cases .
“Off-the-shelf” differentiating under the integral sign seems to be only possible in the case . In all other cases, Corollary A.5 and related results in Dudley (2002) do not apply, because the higher-order difference quotients of are not uniformly integrable (considered as a collection of random variables indexed by a null sequence ). Therefore, to prove the claim for we develop a more specific inductive argument which is tailored explicitly to the map and the fact that we integrate w.r.t. a non-degenerate Gaussian measure.
Base case and .
Let , , and be the density of the law of . By a change of variable we can re-write eq. (85) as
(86) |
where
Since is full rank, is a smooth function of -a.e. . Hence, the map exists and is continuous for -a.e. . Furthermore, the map is uniformly bounded and integrable in and for -a.e. . Hence, by Corollary A.4 in Dudley (2014) we can pass a second partial derivative through the integral in (F.2) and compute
(87) |
In the following we show that one can “pull back” the partial derivative from onto . Our argument involves a version of Stein’s lemma (i.e. Chernozhukov et al., 2020, Theorem 11.1) and a regularization of . Several computations also crucially depend on the fact that .
Let be the standard mollifier
where is an absolute constant such that . For , set and define the “partial regularization” of a function on in its th coordinate by
(88)
where is the th standard unit vector in . With this notation, define
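A minimal sketch of the objects just introduced, assuming the usual conventions (the exact normalization and sign convention used here may differ): the standard mollifier is
\[
\varphi(x) \;=\; c \exp\!\Big( -\tfrac{1}{1 - x^2} \Big)\, \mathbf{1}\{ |x| < 1 \}, \qquad \varphi_\varepsilon(x) \;=\; \varepsilon^{-1} \varphi(x / \varepsilon),
\]
with $c > 0$ chosen so that $\int_{\mathbb{R}} \varphi(x)\, dx = 1$, and the partial regularization of a function $f$ in its $k$th coordinate is the one-dimensional convolution
\[
f^{(k)}_\varepsilon(x) \;=\; \int_{\mathbb{R}} f(x - t e_k)\, \varphi_\varepsilon(t)\, dt ,
\]
which smooths $f$ along the direction $e_k$ only and leaves all other coordinates untouched.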
Obviously, the regularization is just with the discontinuity at smoothed out. (We comment on the rationale behind this partial regularization after eq. (94).) In particular, exists for all , and, by Lemmas 4 and 5,
(89)
(90)
Observe that by Leibniz’s integral rule the partial derivative in line (90) takes the form
where
(91)
where , . Thus, by Lemma 1 (i) we easily verify that and are bounded and integrable.
We return to eq. (87). Adding and subtracting we expand its right hand side as
(92)
(93)
Consider the integral in line (92). A change of variable yields
By Stein’s lemma for non-differentiable (but bounded and integrable) functions (i.e. Chernozhukov et al., 2020, Theorem 11.1) the last line in the above display is equal to
where denotes the -th standard unit vector in .
Consider the expression in the previous line. Since exists and is uniformly bounded in , we can push the partial derivative through the expectation (e.g. Folland, 1999, Theorem 2.27 (b); the integrability condition is satisfied because we integrate w.r.t. the non-degenerate law of ). Thus, we have shown that the integral in line (92) equals
(94)
Obviously, we could have derived eq. (94) under any kind of smoothing. However, under the “partial regularization” of in its th coordinate the partial derivative takes on a simple closed-form expression (eq. (89)–(F.2)). The simple form of the expression in eq. (F.2) is particularly important as it strongly suggests that does not “blow up” as . Indeed, if we had “fully” regularized in all its coordinates, the expression in eq. (F.2) would not be as simple and its asymptotic behavior (as ) would be less clear.
We now show that the integral in eq. (94) converges to
(95)
as , where
We record the following useful bounds: First, by Lemma 1 (i), for all ,
(96)
(97)
Second, there exists (depending on and ) such that for all and ,
(98)
(99)
where the limit in the last line follows from Lemma 3.
Since the law of is non-degenerate,
and, hence, the event is a -null set for all and -a.e. (for a more detailed argument, see also the proof of part (ii) of this lemma below). Thus,
Therefore, by eq. (F.2), (96), and (98),
(100)
where the limit follows from Lemma 3. The limit in line (95) now follows by combining eq. (F.2) and (F.2) with eq. (89) and (90).
Lastly, we return to the integral in line (93). By eq. (97) and (98),
(101)
where the limit follows from Lemma 3.
Combine eq. (95) and (F.2) with eq. (87), (92), and (93) to conclude via Lemma 5 that for -a.e. ,
(102)
Notice that the order in which we take the partial derivatives and does not matter.
Base case and .
The strategy is identical to that of the preceding case . The only difference is the regularization of . Recall the notion of “partial regularization” from eq. (88) and define, for ,
The map exists for all , and, by Lemmas 4 and 5,
(103)
(104)
By Leibniz’s integral rule we find that the partial derivative in line (104) equals
(105)
where , . Thus, from Lemma 1 (i) we infer that and are both bounded and integrable.
The same arguments that led to eq. (87), (92), and (93) also yield
(106)
(107)
Repeating the arguments that gave eq. (94) we find that the integral in line (106) equals
(108)
We now study the behavior of the integrals in lines (107) and (108) as . The arguments are similar to those used in the case ; we provide them for completeness only. As before, let . By eq. (97) and (98) and Lemma 3,
(109)
Since the law of is non-degenerate, by eq. (F.2), (96), and (98) and Lemma 3,
(110)
Combine eq. (F.2) and (F.2) to conclude that, as , the integral in (108) converges to
(111)
Lastly, we turn to the integral in line (107). By eq. (97) and (98) and Lemma 3,
(112)
Thus, combining eq. (111) and (F.2) with eq. (106) and (107) and invoking Lemma 5 we conclude that, for -a.e. ,
(113)
Inductive step from to .
Suppose that for arbitrary indices , ,
(114)
Under the induction hypothesis (114) the argument that gave identity (87) also gives
(115)
As in the above case with , we now show that one can “pull back” the partial derivative from onto . Let be an arbitrary index and, for , define
where denotes the “partial regularization” in the th coordinate as defined in eq. (88). The map exists for all , and satisfies, by the chain rule,
(116)
(117)
(118)
Notice that the partial derivatives in eq. (117) and (118) follow the patterns derived in eq. (F.2) and (F.2), respectively.
Next, expand the right hand side of (115) as
(119)
(120)
Recall the argument developed to establish the limit (95) for . The same argument, now combined with (116)–(118), also yields that the integral in (119) converges to
(121)
Similarly, the same argument used to show that the integral in (93) vanishes for also guarantees that the integral in (120) vanishes. Combine eq. (115) and (119)–(F.2) to conclude the inductive step from to for .
Proof of part (ii).
Combine Lemmas 1 (i), 4, and 5 and conclude that, for all ,
where is the absolute constant from Lemma 1. Notice that
and, hence,
Thus, there exists a -null set such that for all ,
(122)
Since is positive definite, is absolutely continuous with respect to . Thus, the -a.e. upper bound (122) continues to hold when evaluated at and integrated over and . We conclude that for all ,
(123)
∎
Proof of Lemma 3.
For any function on and vector , we define the translation operator by . With this notation,
Without loss of generality we can assume that the ’s are bounded by one. Also, since integrates to one, we have
Hence, by the product comparison inequality,
(124)
Next, compute
Since and are both bounded by for all , Proposition 8.5 in Folland (1999) implies that, for all ,
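For reference, Proposition 8.5 in Folland (1999) is the continuity of translation in $L^p$: for $1 \le p < \infty$ and $g \in L^p$, $\| \tau_h g - g \|_{L^p} \to 0$ as $h \to 0$, with $\tau_h$ the translation operator defined above.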
Hence, by the dominated convergence theorem
Conclude that each summand in eq. (124) vanishes as . This completes the proof. ∎
F.3 Proofs of Lemmas 8, 9, 10, and 11
Proof of Lemma 8.
This result has already been established by Le Cam (1986, p. 402, Lemma 2); we provide a full proof for completeness only. We compute,
For the reverse inequality,
and, hence,
Take the supremum over and combine both inequalities. Then switch the roles of and to conclude the proof. ∎
Proof of Lemma 9.
Observe that . Now, by Cauchy-Schwarz,
Similarly, use the estimate in the first line of the above display to obtain
Combine both inequalities to obtain the desired two-sided inequality. Further, if , then by convexity of the map , , and Jensen’s inequality,
∎
Proof of Lemma 10.
The proof is identical to that of Lemma 3.2 in Chernozhukov et al. (2013). Let . By Lemma 3 there exists an absolute constant such that, on the event , for all ,
In particular, for , we obtain
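The generic quantile step used here, stated for orientation with hypothetical random variables $V$ and $W$: if $\mathbb{P}(V \le x) \ge \mathbb{P}(W \le x) - \pi$ for all $x \in \mathbb{R}$, then the (generalized) quantile functions satisfy $q_V(\alpha) \le q_W(\alpha + \pi)$ for all $\alpha \in (0, 1 - \pi)$, since $\mathbb{P}\big(V \le q_W(\alpha + \pi)\big) \ge \alpha + \pi - \pi = \alpha$.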
To conclude the proof of the first statement, apply the definition of quantiles. The second claim follows in the same way. ∎
F.4 Proof of Lemma 18
Proof of Lemma 18.
First, we show necessity: Suppose that the sample paths of on are almost surely uniformly continuous w.r.t. , i.e. for almost every ,
Since is Gaussian, by Lemma 14, for arbitrary,
Hence, the claim follows from the dominated convergence theorem.
Next, we show sufficiency: Given the premise, we can find a null sequence such that
(125)
Define events
By Markov’s inequality and (125),
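The two generic ingredients of this step, recorded for orientation (writing $A_m$ for the events just defined): Markov's inequality, $\mathbb{P}(Z > t) \le \mathbb{E}[Z] / t$ for $t > 0$ and $Z \ge 0$, and the first Borel-Cantelli lemma, by which $\sum_{m \ge 1} \mathbb{P}(A_m) < \infty$ implies $\mathbb{P}(\limsup_{m} A_m) = 0$, i.e. almost surely only finitely many of the events $A_m$ occur.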
Thus, by Borel-Cantelli, , i.e. is almost surely uniformly continuous on . (Note that we have not used the fact that is Gaussian and the intrinsic standard deviation. Hence, sufficiency holds for arbitrary stochastic processes on general metric spaces.) ∎
F.5 Proofs of Lemmas 19, 20, 21, 22, and 23
Proof of Lemma 19.
Throughout the proof denotes the tensor product between linear maps.
Let . Define the map . This map is Lipschitz continuous with Lipschitz constant and . Let be i.i.d. Rademacher random variables independent of . Then, the symmetrization and contraction principles applied to yield
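For orientation, the symmetrization step has the generic form
\[
\mathbb{E}\Big\| \sum_{i=1}^{n} \big( \xi_i - \mathbb{E}\xi_i \big) \Big\| \;\le\; 2\, \mathbb{E}\Big\| \sum_{i=1}^{n} \varepsilon_i\, \xi_i \Big\|
\]
for independent summands $\xi_i$ and independent Rademacher signs $\varepsilon_i$, and the contraction principle then removes the Lipschitz map at the cost of (a multiple of) its Lipschitz constant; the precise form used here involves the map and constants introduced above.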
We upper bound the right-hand side of the above display using the Cauchy-Schwarz, Hoffmann-Jørgensen, and de-symmetrization inequalities by
Since is sub-Gaussian with mean zero and covariance , we compute
Hence, by Lemma 35 in Giessing and Wang (2021) and Lemma 2.2.2 in van der Vaart and Wellner (1996) (see Exercise 2.14.1 (ibid.) for how to handle the non-convexity of the map inducing the -norm)
Moreover, by Theorem 4 in Koltchinskii and Lounici (2017),
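For reference, the bound in Koltchinskii and Lounici (2017) has the general shape
\[
\mathbb{E}\big\| \widehat{\Sigma} - \Sigma \big\|_{op} \;\lesssim\; \| \Sigma \|_{op} \Big( \sqrt{ \tfrac{\mathbf{r}(\Sigma)}{n} } + \tfrac{\mathbf{r}(\Sigma)}{n} \Big), \qquad \mathbf{r}(\Sigma) := \frac{ \operatorname{tr}(\Sigma) }{ \| \Sigma \|_{op} },
\]
for the sample covariance operator $\widehat{\Sigma}$ of $n$ i.i.d. centered Gaussian observations with covariance operator $\Sigma$ and effective rank $\mathbf{r}(\Sigma)$; the statement invoked here may differ in constants and in the moment assumptions.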
Thus, we conclude that
Adjust some absolute constants to complete the proof. ∎
Proof of Lemma 20.
To simplify notation we write for .
We begin with the following fundamental identity: Let be the identity operator and be such that is invertible. Then,
Indeed, we have . Now, rearrange the terms on the far left- and right-hand sides of this identity to conclude. Applied to , we obtain
(126)
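One standard identity of this type, recorded here only for orientation: if $I + \Delta$ is invertible, then
\[
(I + \Delta)^{-1} \;=\; I - \Delta + (I + \Delta)^{-1} \Delta^2 ,
\]
which is verified by multiplying both sides by $I + \Delta$ and rearranging; the precise operators substituted for $\Delta$ in (126) are those introduced above.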
Hence, we compute
(127)
where (a), (b), and (c) follow from identity (126).
We now further upper bound the terms in (127). Retracing the steps from (c) to (b), we obtain
(128)
Note that
(129)
Recall that for all . Let be the unit ball of . We have
(130)
where (a) and (b) hold because and are self-adjoint, (c) follows from the reverse triangle inequality and the definition of the operator norm, and (d) follows from (129).
Proof of Lemma 21.
Observe that . Therefore, by Lemma 23, with probability at least ,
and
and
We combine the above three inequalities and conclude that, with probability at least ,
∎
Proof of Lemma 22.
Note that and commute for any operator . Hence, we compute
Bound on term . Since , almost surely, and Bernstein’s inequality for real-valued random variables implies that, with probability at least ,
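In its standard high-probability form (the constants in the version used here may differ): for i.i.d. real-valued, centered random variables $Z_1, \dots, Z_n$ with $|Z_i| \le M$ almost surely and $\mathbb{E}[Z_i^2] \le \sigma^2$, Bernstein's inequality yields, with probability at least $1 - \delta$,
\[
\Big| \frac{1}{n} \sum_{i=1}^{n} Z_i \Big| \;\le\; \sigma \sqrt{ \frac{ 2 \log(2/\delta) }{ n } } + \frac{ M \log(2/\delta) }{ 3 n } .
\]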
Also,
Hence, with probability at least ,
Bound on term . Compute
Since for , it follows that
Hence, by Lemma 23, with probability at least ,
(134)
Next, recall the approach that led to the bound in (F.5) in the proof of Lemma 20. We iterate this approach to obtain
Thus, by Lemma 23 and since , with probability at least ,
(135)
Combine (134) and (135) and conclude that, with probability at least ,
Since it follows that . Hence, under the rate conditions the upper bounds on terms and simplify as stated in the lemma. ∎
Proof of Lemma 23.
Proof of claim (i). Let and be arbitrary. Since minimizes the expected square loss, we have
Therefore, as in the proof of Lemma G.4 in Singh and Vijaykumar (2023), we compute,
and
Moreover, since is separable and is continuous, is separable as well. Hence, the conditions of Lemma 24 are satisfied with and . This completes the proof of the first claim.
Proof of claim (ii). Let be arbitrary. Then . Moreover,
and
The space of linear operators equipped with the Hilbert-Schmidt norm is a Hilbert space. This space can be identified with the tensor product . Since this tensor product space is separable whenever is separable, the conditions of Lemma 24 are satisfied and . This completes the proof of the second claim. ∎