Tail Bounds for Canonical $U$ -Statistics and $U$ -Processes with Unbounded Kernels

(An initial version of this work was made available earlier. This draft is a revised version with some edits.)

Abhishek Chakrabortty (Department of Statistics, Texas A&M University) and Arun Kumar Kuchibhotla (Department of Statistics & Data Science, Carnegie Mellon University)

Abstract

In this paper, we prove exponential tail bounds for canonical (or degenerate) $U$ -statistics and $U$ -processes under exponential-type tail assumptions on the kernels. Most existing results in the relevant literature either assume bounded kernels or obtain sub-optimal tail behavior under unbounded kernels. We obtain sharp rates and optimal tail behavior under sub-Weibull kernel functions. Some examples from the nonparametric and semiparametric statistics literature are considered.

Keywords and phrases: Degenerate $U$ -statistics and $U$ -processes, Unbounded kernels, Sub-Weibull tails, Exponential tail bounds, Nonparametric/semiparametric statistics.

1 Introduction and Motivation

In this paper, we study moment and tail bounds for second-order degenerate $U$ -statistics and $U$ -processes. Averages, the simplest functions of a collection of random variables, are sums in which each summand depends on only one element of the collection. $U$ -statistics, on the other hand, depend on tuples of elements in the collection. Formally, a second-order $U$ -statistic based on the collection of random variables $Z_{1},\ldots,Z_{n}$ is of the form

{U}_{n}=\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j}),

(1)

for some functions $\{f_{i,j}:\,1\leq i\neq j\leq n\}$ . In this paper, we consider $U$ -statistics based on independent but possibly non-identically distributed random variables $Z_{1},\ldots,Z_{n}$ defined on some measurable space. $U$ -statistics, in general, are ubiquitous in statistical applications including, e.g., goodness-of-fit tests, two-sample tests using kernel-based distances as well as independence testing via permutation tests; see Kim, (2020) for an overview of this literature.

We motivate our interest in $U$ -statistics using a few prototypical examples. Suppose $X_{1},\ldots,X_{n}$ are independent and identically distributed (i.i.d.) realizations of a random vector $X\in\mathbb{R}^{p}$ with Lebesgue density $f$ . Consider the problem of estimating the quadratic functional

\Gamma(f):=\int_{\mathbb{R}^{p}}f^{2}(x)dx=\mathbb{E}\left[f(X)\right].

(2)

A natural estimator for this functional is given by

\widehat{\Gamma}(f):=\frac{1}{n(n-1)h_{n}^{p}}\sum_{1\leq i\neq j\leq n}K\left(\frac{X_{i}-X_{j}}{h_{n}}\right)=\frac{1}{n}\sum_{i=1}^{n}\widehat{f}^{(-i)}(X_{i}),

(3)

where $h_{n}$ represents the bandwidth and $\widehat{f}^{(-i)}(\cdot)$ represents the leave-one-out kernel density estimator:

\widehat{f}^{(-i)}(x)=\frac{1}{(n-1)h_{n}^{p}}\sum_{j=1,j\neq i}^{n}K\left(\frac{X_{j}-x}{h_{n}}\right).

(4)

Here the function $K(\cdot)$ is assumed to be symmetric and to satisfy $\int_{\mathbb{R}^{p}}K(x)dx=1$ . This estimator was introduced by Hall and Marron, (1987) and was studied thoroughly (in terms of adaptivity) for $p=1$ in Giné and Nickl, (2008).
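As a small numerical illustration of (3) and (4), the following sketch computes both forms of $\widehat{\Gamma}(f)$ for $p=1$ and checks that they agree. The Gaussian kernel, the standard normal sample, the bandwidth value and all function names are assumptions made only for this example; for the standard normal density, $\Gamma(f)=1/(2\sqrt{\pi})$ .

```python
import math
import random


def gaussian_kernel(u: float) -> float:
    """Standard normal density: symmetric and integrates to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def loo_density(x: float, sample: list, i: int, h: float) -> float:
    """Leave-one-out kernel density estimate (4) at x, omitting index i (p = 1)."""
    n = len(sample)
    total = sum(gaussian_kernel((sample[j] - x) / h) for j in range(n) if j != i)
    return total / ((n - 1) * h)


def gamma_hat_ustat(sample: list, h: float) -> float:
    """First form in (3): normalized double sum over ordered pairs i != j."""
    n = len(sample)
    total = sum(
        gaussian_kernel((sample[i] - sample[j]) / h)
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1) * h)


def gamma_hat_loo(sample: list, h: float) -> float:
    """Second form in (3): average of the leave-one-out density estimates."""
    n = len(sample)
    return sum(loo_density(sample[i], sample, i, h) for i in range(n)) / n


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]  # illustrative sample
h = 0.3                                         # illustrative bandwidth
est_u = gamma_hat_ustat(xs, h)
est_loo = gamma_hat_loo(xs, h)
target = 1.0 / (2.0 * math.sqrt(math.pi))       # Gamma(f) for the N(0, 1) density
print(est_u, est_loo, target)
```

The two estimates coincide up to floating-point error, and both are close to the target up to smoothing bias and sampling noise.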

Similarly, to estimate integrals involving the conditional expectation function from i.i.d. realizations $(X_{1},Y_{1}),\ldots,(X_{n},Y_{n})$ of $(X,Y)$ , the following $U$ -statistic appears:

{U}_{n}^{\star}:=\frac{1}{n(n-1)h_{n}^{p}}\sum_{1\leq i\neq j\leq n}Y_{i}K\left(\frac{X_{i}-X_{j}}{h_{n}}\right)Y_{j}.

(5)

Aside from these prototypical examples, various other examples of such $U$ -statistics are encountered in the literature on integral approximation involving kernel smoothing estimators (Newey and Ruud,, 2005; Delyon and Portier,, 2016) and the semiparametric inference literature on quadratic and integral-type functionals (Robins et al.,, 2016). In the latter literature, $U$ -statistics of this type – especially in their degenerate form (see below for the definition) – are fundamentally involved in the analysis of so-called doubly robust estimators of certain functionals encountered in missing data or causal inference problems (Robins et al.,, 1994; Bang and Robins,, 2005), as well as in the literature on adaptive estimation of functionals based on so-called higher order influence functions (Robins et al.,, 2008, 2017; Liu et al.,, 2021).

Apart from the nonparametric and semiparametric statistics literature, second order $U$ -statistics also arise in relation to Hanson-Wright-type inequalities. The classical Hanson-Wright inequality concerns tail bounds for the quadratic form $G^{\top}AG$ where $G$ is a standard multivariate normal random vector in $\mathbb{R}^{n}$ and $A\in\mathbb{R}^{n\times n}$ is a positive semi-definite matrix; see Theorem 3.1.9 of Giné and Nickl, (2016). For further applications of Hanson-Wright inequalities, see Rudelson and Vershynin, (2013) and Spokoiny and Zhilova, (2013), as well as the recent work of He et al., (2024) on sparse random vectors. Note that for any random vector $Y\in\mathbb{R}^{n}$ and matrix $A\in\mathbb{R}^{n\times n}$

Y^{\top}AY=\sum_{1\leq i,j\leq n}Y_{i}A(i,j)Y_{j},

(6)

where $A(i,j)$ represents the $i$ -th row, $j$ -th column entry in the matrix $A.$

Motivated by the examples above, we study the properties of the $U$ -statistic $U_{n}$ . Before proceeding further, we briefly discuss degenerate and non-degenerate $U$ -statistics; see Serfling, (1980, Chapter 5) for more details. This discussion shows that, for a precise understanding of the tail behavior of a $U$ -statistic, it suffices to consider degenerate $U$ -statistics. In fact, most asymptotic normality results for $U$ -statistics are shown by proving asymptotic negligibility of the degenerate $U$ -statistic compared to the linear statistic; see, for example, Chen and Kato, (2020). This paper is partly motivated by cases where such asymptotic negligibility may not hold. For example, in the context of estimating $\mu^{2}$ based on i.i.d. observations $X_{1},\ldots,X_{n}$ with mean $\mu$ , the unbiased estimator $\binom{n}{2}^{-1}\sum_{1\leq i<j\leq n}X_{i}X_{j}$ exhibits a phase transition at $\mu=O(n^{-1/2})$ in terms of both the rate and the limiting distribution.

Degenerate or Canonical $U$ -statistics.

For any sequence of functions (called kernels) $f_{i,j}(\cdot,\cdot)$ and independent random variables $Z_{1},\ldots,Z_{n}$ , a $U$ -statistic is given by

T_{n}:=\sum_{1\leq i\neq j\leq n}\,f_{i,j}(Z_{i},Z_{j}).

Note that the diagonal terms ( $i=j$ cases) are ignored in the summation above. If these diagonal terms are included then the resulting statistic is called a $V$ -statistic. The $U$ -statistic $T_{n}$ is called degenerate or canonical if the kernel functions satisfy

\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\big{|}Z_{i}\right]=\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\big{|}Z_{j}\right]=0,\quad\mbox{for all}\quad 1\leq i\neq j\leq n.

(7)

If the kernel functions do not satisfy (7), then the corresponding $U$ -statistic is called non-degenerate. It is not difficult to see that a non-degenerate $U$ -statistic, after centering at its mean, can be written as a sum of independent mean zero random variables and a degenerate $U$ -statistic:

T_{n}-\mathbb{E}[T_{n}]=\sum_{1\leq i\neq j\leq n}f_{i,j}^{D}(Z_{i},Z_{j})+\sum_{j=1}^{n}g_{j}(Z_{j})+\sum_{i=1}^{n}h_{i}(Z_{i})=:\mathcal{U}_{n}(f)+T_{n}^{(1)}+T_{n}^{(2)},

(8)

where

\begin{split}f^{D}_{i,j}(Z_{i},Z_{j})&:=f_{i,j}(Z_{i},Z_{j})-\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\big{|}Z_{j}\right]-\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\big{|}Z_{i}\right]+\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\right],\\ g_{j}(Z_{j})&:=\sum_{i=1,i\neq j}^{n}\Big{\{}\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\big{|}Z_{j}\right]-\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\right]\Big{\}},\\ h_{i}(Z_{i})&:=\sum_{j=1,j\neq i}^{n}\Big{\{}\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\big{|}Z_{i}\right]-\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\right]\Big{\}}.\end{split}

(9)

It is clear from these expressions that the kernels $f_{i,j}^{D}(\cdot,\cdot)$ satisfy (7) and so are degenerate kernels. Since $T_{n}^{(1)}$ and $T_{n}^{(2)}$ in (8) are sums of independent random variables with mean zero, they can be understood easily from classical results such as the central limit theorem (asymptotically) and Bernstein/Hoeffding or more general inequalities (non-asymptotically). For this reason, we focus mostly on the degenerate part of (8) in the rest of the paper and derive non-asymptotic moment as well as tail bounds when the non-degenerate $U$ -statistic is of the form (1). Our main tool is the decoupling inequality proved in de la Peña, (1992). We refer to de la Peña and Giné, (1999, Chapter 3) for more details regarding decoupling in $U$ -statistics.
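The decomposition can be checked numerically in a simple case. The sketch below takes the product kernel $f_{i,j}(x,y)=xy$ with i.i.d. observations of known mean $\mu$ (an illustrative choice, as are the Gaussian sample and the helper name `offdiag_sum`). In this case $f_{i,j}^{D}(x,y)=(x-\mu)(y-\mu)$ and $g_{j}(z)=h_{j}(z)=(n-1)\mu(z-\mu)$ , and the identity is verified for the centered statistic $T_{n}-\mathbb{E}[T_{n}]$ with $\mathbb{E}[T_{n}]=n(n-1)\mu^{2}$ .

```python
import random


def offdiag_sum(left: list, right: list) -> float:
    """Sum of left[i] * right[j] over ordered pairs i != j."""
    total = sum(left) * sum(right)
    diagonal = sum(a * b for a, b in zip(left, right))
    return total - diagonal


rng = random.Random(1)
n, mu = 50, 0.7
z = [rng.gauss(mu, 1.0) for _ in range(n)]  # the mean mu is known by construction

t_n = offdiag_sum(z, z)                      # T_n with kernel f(x, y) = x * y
mean_t_n = n * (n - 1) * mu ** 2             # E[T_n]
centered = [zi - mu for zi in z]
degenerate_part = offdiag_sum(centered, centered)   # sum of f^D over i != j
linear_g = (n - 1) * mu * sum(centered)             # sum_j g_j(Z_j)
linear_h = (n - 1) * mu * sum(centered)             # sum_i h_i(Z_i)

residual = (t_n - mean_t_n) - (degenerate_part + linear_g + linear_h)
print(residual)  # zero up to floating-point error
```

In this example the linear part has standard deviation of order $n^{3/2}|\mu|$ while the degenerate part has standard deviation of order $n$ , which is one way to see why the two parts become comparable when $\mu$ is of order $n^{-1/2}$ .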

After deriving non-asymptotic tail bounds for degenerate $U$ -statistics, we provide the same for the supremum of degenerate $U$ -statistics over a function class. Suppose $\mathcal{F}_{n}$ is a class of sequences of functions (degenerate kernels) of type $f:=\{f_{i,j}^{D}(\cdot,\cdot):\,1\leq i\neq j\leq n\}$ and define

\mathcal{U}_{n}(f):=\sum_{1\leq i\neq j\leq n}\,f_{i,j}^{D}(Z_{i},Z_{j}).

Then $\{\mathcal{U}_{n}(f):\,f\in\mathcal{F}_{n}\}$ can be viewed as a process called the $U$ -process and we provide exponential tail bounds for the supremum:

\mathcal{U}_{n}(\mathcal{F}_{n}):=\sup_{f\in\mathcal{F}_{n}}\,\left|\mathcal{U}_{n}(f)\right|.

An important application would be the study of uniform-in-bandwidth properties of the estimator $\widehat{\Gamma}(f)$ in (3), that is,

\sup_{h_{n}\in[a_{n},b_{n}]}\left|\widehat{\Gamma}(f;h_{n})-\mathbb{E}\left[\widehat{\Gamma}(f;h_{n})\right]\right|,

for some numbers $a_{n},b_{n}\in(0,1)$ . Further applications can be found in de la Peña and Giné, (1999, Section 5.5) and Major, (2013). As a final note, we mention that even though our techniques extend to $U$ -statistics/processes of higher order, we restrict ourselves to second order $U$ -statistics/processes for simplicity and ease of exposition.

1.1 Related Literature

In this section, we review some of the by-now classical exponential tail bounds for degenerate $U$ -statistics and suprema of $U$ -processes. Proposition 2.3 of Arcones and Giné, (1993) proved a Bernstein type inequality for degenerate $U$ -statistics/processes. Specifically, for the degenerate $U$ -statistic

U_{n}:=n^{-1}\sum_{1\leq i\neq j\leq n}\,f(Z_{i},Z_{j}),

with i.i.d. random variables $Z_{1},\ldots,Z_{n}$ , $\sigma^{2}:=\mathbb{E}f^{2}(Z_{i},Z_{j})$ and $\left\lVert f\right\rVert_{\infty}\leq C$ , they show that there exist constants $c_{1},c_{2}>0$ such that for any $t>0$ ,

\mathbb{P}\left(|U_{n}|\geq t\right)\leq c_{1}\exp\left(-\frac{c_{2}t}{\sigma+(Ct^{1/2}n^{-1/2})^{2/3}}\right).

This tail bound has two regimes: exponential and Weibull of order $2/3$ . Because of the appearance of the variance, this tail bound provides the correct rate of convergence. Theorem 3.3 of Giné et al., (2000) improved the tail bound by providing the optimal four regimes of the tail: Gaussian, exponential, Weibull of orders $2/3$ and $1/2$ . Houdré and Reynaud-Bouret, (2003) gave an alternative proof to the result of Giné et al., (2000) using martingale inequalities with explicit constants. In particular, Theorem 3.3 of Giné et al., (2000) shows that for all $t\geq 0$ ,

\mathbb{P}\left(|nU_{n}|\geq t\right)\leq L\exp\left(-\frac{1}{L}\min\left\{\frac{t^{2}}{C^{2}},\frac{t}{D},\frac{t^{2/3}}{B^{2/3}},\frac{t^{1/2}}{A^{1/2}}\right\}\right),

for some constants $A,B,C,D$ and $L$ . The main disadvantage of the results above is the restrictive boundedness assumption. Theorem 3.2 of Giné et al., (2000) actually applies without the boundedness condition but the tail bound thus obtained is sub-optimal. For example, suppose $f(Z_{i},Z_{j})=Y_{i}g(X_{i},X_{j})Y_{j}$ , where $Z_{i}=(X_{i},Y_{i})$ , $\left\lVert g\right\rVert_{\infty}\leq C<\infty$ and the $Y_{i}$ ’s are mean zero (conditionally) sub-Weibull variables of order $\alpha>0$ , that is, $\mathbb{P}(|Y_{i}|\geq t|X_{i})\leq 2\exp(-t^{\alpha})$ . Then Theorem 3.2 of Giné et al., (2000) implies a tail bound of the form:

\mathbb{P}\left(|nU_{n}|\geq t\right)\leq L\exp\left(-\frac{1}{L}\min\left\{\frac{t^{2}}{C^{2}},\frac{t}{D},\frac{t^{\alpha_{1}}}{B^{\alpha_{1}}},\frac{t^{\alpha_{2}}}{A^{\alpha_{2}}}\right\}\right),

where $\alpha_{1}^{-1}=3/2+1/\alpha$ and $\alpha_{2}^{-1}=2+2/\alpha$ , so that $\alpha_{1}=2/3$ and $\alpha_{2}=1/2$ are recovered as $\alpha\to\infty$ . This is sub-optimal in comparison with the results of Kolesko and Latała, (2015, Example 3). On the other hand, the results of Kolesko and Latała, (2015) do not get the correct rate of convergence as can be obtained from the results of Giné et al., (2000). This is because the bound of Kolesko and Latała, (2015) does not depend on the variance. We are not aware of any tail bounds in the literature that imply the correct rate of convergence as well as the optimal tail behavior. We also note here the recent work of Bakhshizadeh, (2023) which appeared after the initial working version (Chakrabortty and Kuchibhotla,, 2018) of this preprint. While they do consider general unbounded kernels, their focus is primarily on exponential bounds and large deviation principles for non-degenerate $U$ -statistics, which differs from ours.

With regard to tail bounds for degenerate $U$ -processes, some of the important works are Adamczak, (2006), Clémençon et al., (2008) and Major, (2013). The latter two papers only consider bounded kernels and the bounds of Adamczak, (2006) are written in terms of functionals that are in general hard to control. The results of Major, (2005) and Major, (2013) apply only to bounded kernels and are written for VC classes $\mathcal{F}_{n}$ but imply the correct rate of convergence. However, the results there do not show the optimal four regimes in the tail behavior. Theorem 11 of Clémençon et al., (2008) is written as a deviation inequality but does not imply the correct rate of convergence. For instance, if $f(X_{i},X_{j})=\varepsilon_{i}\varepsilon_{j}K((X_{i}-X_{j})/h)$ with $\varepsilon_{i}$ being Rademacher random variables independent of $X_{i}\in\mathbb{R}^{p}$ , then the rate of convergence of

T_{n}:=\sup_{h\in\{h_{n}\}}\,\left|\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}K\left(\frac{X_{i}-X_{j}}{h}\right)\right|,

from Theorem 11 of Clémençon et al., (2008) is $n\left\lVert K\right\rVert_{\infty}=O(n)$ (because of $Fn$ in the moment bound) but the correct rate of convergence is $nh_{n}^{p/2}$ (which can be obtained by calculating the variance). As in the case of $U$ -statistics, we are not aware of any tail bound results that can obtain the correct rate of convergence and apply to unbounded kernels. Using truncation, decoupling and the entropy method of Boucheron et al., (2005), we prove a deviation inequality for degenerate $U$ -processes that implies the correct rate of convergence and the optimal tail behavior.
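The variance calculation behind the rate $nh_{n}^{p/2}$ can be illustrated as follows. Conditionally on $X_{1},\ldots,X_{n}$ and for a symmetric kernel, the second moment of $\sum_{i\neq j}\varepsilon_{i}\varepsilon_{j}K((X_{i}-X_{j})/h)$ equals $2\sum_{i\neq j}K^{2}((X_{i}-X_{j})/h)$ ; when the $X_{i}$ have a bounded density and $K$ is square integrable, each summand has expectation of order $h^{p}$ , which gives a standard deviation of order $nh^{p/2}$ . The sketch below verifies the conditional identity exactly by enumerating all sign vectors for a small $n$ . The Gaussian kernel, the uniform design and the names used are assumptions for illustration only.

```python
import itertools
import math
import random


def kernel(u: float) -> float:
    """Illustrative bounded symmetric kernel (Gaussian density)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def chaos(signs: tuple, kmat: list) -> float:
    """Sum over ordered pairs i != j of eps_i * eps_j * K_ij."""
    n = len(signs)
    return sum(
        signs[i] * signs[j] * kmat[i][j]
        for i in range(n)
        for j in range(n)
        if i != j
    )


rng = random.Random(2)
n, h = 8, 0.5
x = [rng.uniform(0.0, 1.0) for _ in range(n)]
kmat = [[kernel((x[i] - x[j]) / h) for j in range(n)] for i in range(n)]

# Exact conditional second moment: average over all 2^n Rademacher sign vectors.
second_moment = sum(
    chaos(s, kmat) ** 2 for s in itertools.product((-1, 1), repeat=n)
) / 2 ** n

# Closed form for a symmetric kernel matrix: 2 * sum_{i != j} K_ij^2.
closed_form = 2.0 * sum(
    kmat[i][j] ** 2 for i in range(n) for j in range(n) if i != j
)
print(second_moment, closed_form)
```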

Organization.

The rest of the article is organized as follows. In Section 2 we prove exponential tail bounds for second order degenerate $U$ -statistics. In Section 3 we prove a deviation bound for degenerate $U$ -processes and also provide maximal inequalities to control the expectation of the maximum. The proofs of all the results are distributed in Appendices A, B and C.

2 Tail Bounds for Degenerate $U$ -Statistics

We prove two tail bounds for degenerate $U$ -statistics. The first is a general result applicable to all kernels that are bounded above by a product kernel and the second result is for more structured kernels that are of importance in non- and semi-parametric estimation. Define a random variable $W$ to be sub-Weibull of order $\alpha>0$ if $\left\lVert W\right\rVert_{\psi_{\alpha}}<\infty,$ where $\psi_{\alpha}(x)=\exp(x^{\alpha})-1$ for $x\geq 0$ and

\left\lVert W\right\rVert_{\psi_{\alpha}}=\inf\left\{C\geq 0:\,\mathbb{E}\left[\psi_{\alpha}\left(|W|/C\right)\right]\leq 1\right\}.

Several properties of sums of independent sub-Weibull random variables are derived in Kuchibhotla and Chakrabortty, (2022). The main focus of this section is to extend these results to degenerate $U$ -statistics.
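To make the definition of $\left\lVert\cdot\right\rVert_{\psi_{\alpha}}$ concrete, the following sketch computes the norm of the empirical distribution of a sample by bisection, using the fact that $C\mapsto\mathbb{E}[\psi_{\alpha}(|W|/C)]$ is decreasing. The function name, the tolerance and the cap on the exponent (to avoid floating-point overflow) are choices made for this illustration. The plug-in value describes only the given sample and should not be read as an estimate of the population norm for heavy-tailed data.

```python
import math


def psi_alpha_norm(sample: list, alpha: float, tol: float = 1e-12) -> float:
    """Smallest C with mean(exp((|w| / C) ** alpha)) - 1 <= 1 over the sample."""
    abs_vals = [abs(w) for w in sample]
    top = max(abs_vals)
    if top == 0.0:
        return 0.0

    def constraint_holds(c: float) -> bool:
        # Empirical version of E[psi_alpha(|W| / c)] <= 1; exponent capped at 700.
        mean_exp = sum(
            math.exp(min((v / c) ** alpha, 700.0)) for v in abs_vals
        ) / len(abs_vals)
        return mean_exp <= 2.0

    hi = top
    while not constraint_holds(hi):  # grow until the constraint is satisfied
        hi *= 2.0
    lo = 0.0                         # the constraint fails in the limit c -> 0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if constraint_holds(mid):
            hi = mid
        else:
            lo = mid
    return hi


# For a variable with |W| = 1 almost surely, the norm is (log 2) ** (-1 / alpha).
print(psi_alpha_norm([1.0] * 10, 2.0), math.log(2.0) ** -0.5)
```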

Consider a degenerate $U$ -statistic

U_{n}^{D}:=\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j}),

where $Z_{1},\ldots,Z_{n}$ are independent random variables and $\{f_{i,j}(\cdot,\cdot):\,1\leq i\neq j\leq n\}$ is a collection of degenerate (or canonical) kernels, i.e.,

\mathbb{E}[f_{i,j}(Z_{i},Z_{j})|Z_{i}]=0=\mathbb{E}[f_{i,j}(Z_{i},Z_{j})|Z_{j}].

We assume the following on the degenerate kernel $f_{i,j}$ :

(A1)

For $1\leq i\neq j\leq n$ , there exist non-negative functions $F_{i}(\cdot)$ and $G_{j}(\cdot)$ such that

|f_{i,j}(Z_{i},Z_{j})|\leq F_{i}(Z_{i})G_{j}(Z_{j})\quad\mbox{and}\quad\|F_{i}(Z_{i})\|_{\psi_{\alpha}}\leq K_{F},\;\|G_{j}(Z_{j})\|_{\psi_{\beta}}\leq K_{G}.

The first part of assumption (A1) implies that the degenerate kernel $f_{i,j}$ can be expressed as $f_{i,j}(Z_{i},Z_{j})=F_{i}(Z_{i})w_{i,j}(Z_{i},Z_{j})G_{j}(Z_{j})$ for some collection of bounded kernels $\{w_{i,j}:1\leq i\neq j\leq n\}$ . No further structure on $w_{i,j}$ ’s is required. In the second result we consider below, we place additional structure on $w_{i,j}$ ’s. The second part of assumption (A1) means that

\mathbb{E}\left[\exp\left(\frac{|F_{i}(Z_{i})|^{\alpha}}{K_{F}^{\alpha}}\right)\right]\leq 2\quad\mbox{and}\quad\mathbb{E}\left[\exp\left(\frac{|G_{j}(Z_{j})|^{\beta}}{K_{G}^{\beta}}\right)\right]\leq 2.

Equivalently, $F_{i}(Z_{i})$ is sub-Weibull $(\alpha)$ and $G_{j}(Z_{j})$ is sub-Weibull $(\beta)$ , in the terminology of Kuchibhotla and Chakrabortty, (2022). To present the result, we define a few quantities. Let $(Z_{1}^{\prime},Z_{2}^{\prime},\ldots,Z_{n}^{\prime})$ be an independent copy of $(Z_{1},\ldots,Z_{n})$ .

\begin{split}\Lambda_{1/2}&:=\left(\mathbb{E}\left[\sum_{1\leq i\neq j\leq n}f_{i,j}^{2}(Z_{i},Z_{j})\right]\right)^{1/2},\\ \Lambda_{1}&=\|(f_{i,j})\|_{L^{2}\to L^{2}},\\ &:=\sup\left\{\mathbb{E}\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j}^{\prime})\gamma_{i}(Z_{i})\delta_{j}(Z_{j}^{\prime}):\,\sum_{i=1}^{n}\mathbb{E}[\gamma_{i}^{2}(Z_{i})]\leq 1,\sum_{j=1}^{n}\mathbb{E}[\delta_{j}^{2}(Z_{j}^{\prime})]\leq 1\right\},\\ \Lambda_{\alpha}&:=\max_{1\leq i\leq n}\bigg{\|}\sum_{1\leq j\leq n,j\neq i}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{i}]\bigg{\|}_{\psi_{\alpha/2}}^{1/2},\\ \Lambda_{\beta}&:=\max_{1\leq j\leq n}\bigg{\|}\sum_{1\leq i\leq n,i\neq j}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{j}^{\prime}]\bigg{\|}_{\psi_{\beta/2}}^{1/2}.\end{split}

(10)
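For intuition about the first two quantities in (10), consider the special case (chosen here only for illustration) $f_{i,j}(Z_{i},Z_{j})=a_{i,j}Z_{i}Z_{j}$ with independent Rademacher variables $Z_{i}$ and a fixed real matrix $(a_{i,j})$ . Then $\Lambda_{1/2}$ is the Frobenius norm of the matrix with its diagonal set to zero, and, by the Cauchy-Schwarz inequality applied to $\mathbb{E}[Z_{i}\gamma_{i}(Z_{i})]$ and $\mathbb{E}[Z_{j}^{\prime}\delta_{j}(Z_{j}^{\prime})]$ , $\Lambda_{1}$ is its operator norm. The sketch below computes both; the helper names and the plain power iteration (which assumes the starting vector is not orthogonal to the top singular direction) are illustrative choices.

```python
import math


def offdiag(a: list) -> list:
    """Copy of the square matrix a with its diagonal set to zero."""
    n = len(a)
    return [[a[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m: list, v: list) -> list:
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def frobenius(m: list) -> float:
    """Lambda_{1/2} in the Rademacher example: square root of the sum of squares."""
    return math.sqrt(sum(x * x for row in m for x in row))


def operator_norm(m: list, iters: int = 500) -> float:
    """Lambda_1 in the Rademacher example: largest singular value of m."""
    mt = [list(col) for col in zip(*m)]          # transpose
    n = len(m[0])
    v = [1.0 + k / n for k in range(n)]          # deterministic starting vector
    sigma = 0.0
    for _ in range(iters):
        w = matvec(mt, matvec(m, v))             # one step of power iteration on m^T m
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
        sigma = math.sqrt(sum(x * x for x in matvec(m, v)))
    return sigma


a = [[5.0, 1.0, 0.0], [1.0, 7.0, 2.0], [0.0, 2.0, 9.0]]
lam_half = frobenius(offdiag(a))
lam_one = operator_norm(offdiag(a))
print(lam_half, lam_one)  # sqrt(10) and sqrt(5) for this matrix
```

Since the operator norm never exceeds the Frobenius norm, $\Lambda_{1}\leq\Lambda_{1/2}$ in this example, which is consistent with the sub-Gaussian term dominating for small $p$ and the sub-exponential term taking over for larger $p$ .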

The quantities $\Lambda_{1/2}$ and $\Lambda_{1}$ also appear in the moment bound for degenerate $U$ -statistics with bounded kernels; see Theorem 3.2 of Giné et al., (2000). Note that $\Lambda_{\alpha}$ can be trivially bounded as

\Lambda_{\alpha}\leq K_{F}\max_{1\leq i\leq n}\sup_{z}\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}\left[\frac{f_{i,j}^{2}(z,Z_{j}^{\prime})}{F_{i}^{2}(z)}\right]\right)^{1/2}.

A similar comment holds for $\Lambda_{\beta}$ as well. We use $\mathfrak{C}$ to denote a universal constant, and $\mathfrak{C}_{\alpha},\mathfrak{C}_{\beta}$ and $\mathfrak{C}_{\alpha,\beta}$ to denote constants depending only on $\alpha$ , $\beta$ and $(\alpha,\beta)$ , respectively. We now present the first main result.

Theorem 1.

Under assumption (A1), for every $p\geq 1$ ,

\begin{split}&\left(\mathbb{E}\left[\left|\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j})\right|^{p}\right]\right)^{1/p}\\ &\leq\mathfrak{C}p^{1/2}\Lambda_{1/2}+\mathfrak{C}p\Lambda_{1}\\ &\quad+\mathfrak{C}_{\beta}p^{1/2+1/\beta^{*}}(\log n)^{1/\beta}\Lambda_{\beta}+\mathfrak{C}_{\alpha}p^{1/2+1/\alpha^{*}}(\log n)^{1/2+1/\alpha}\Lambda_{\alpha}\\ &\quad+\mathfrak{C}_{\alpha,\beta}p^{1/\alpha^{*}+1/\beta^{*}}K_{F}K_{G}(\log n)^{1/\alpha+1/\beta+1/\beta^{*}},\end{split}

(11)

where $\alpha^{*}=\min\{\alpha,1\}$ and $\beta^{*}=\min\{\beta,1\}$ . Consequently, for every $\delta\in[0,1]$ , with probability at least $1-\delta$ ,

\begin{split}\left|\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j})\right|&\leq\mathfrak{C}(\log(1/\delta))^{1/2}\Lambda_{1/2}+\mathfrak{C}\log(1/\delta)\Lambda_{1}\\ &\quad+\mathfrak{C}_{\beta}(\log(1/\delta))^{1/2+1/\beta^{*}}(\log n)^{1/\beta}\Lambda_{\beta}+\mathfrak{C}_{\alpha}(\log(1/\delta))^{1/2+1/\alpha^{*}}(\log n)^{1/2+1/\alpha}\Lambda_{\alpha}\\ &\quad+\mathfrak{C}_{\alpha,\beta}(\log(1/\delta))^{1/\alpha^{*}+1/\beta^{*}}K_{F}K_{G}(\log n)^{1/\alpha+1/\beta+1/\beta^{*}}.\end{split}

Proof.

See Appendix A.1 for a proof. ∎

Theorem 1 reduces to Theorem 3.2 of Giné et al., (2000) by setting $\alpha=\beta=\infty$ ; note that if $\alpha=\beta=\infty$ , then $\alpha^{*}=\beta^{*}=1$ and the log factors in the result become 1. The result is asymmetric in $\alpha,\beta$ only because of the structure of the proof. One can apply the result with the roles of $\alpha,\beta$ switched and take the minimum of the two bounds. We do not present this for brevity. It is interesting to note that the tail exhibits five different behaviors including the commonly expected sub-Gaussian and sub-exponential tails. Because we did not make any assumption on the symmetry of the kernel, $\alpha$ and $\beta$ can be different. Under an assumption of symmetry, $\alpha=\beta$ , the two terms with exponents $1/2+1/\alpha^{*}$ and $1/2+1/\beta^{*}$ coincide, and Theorem 1 yields a tail bound that exhibits only four regimes.

Assuming only (A1), Theorem 1 provides a moment and tail bound for degenerate $U$ -statistics. The appearance of the constants $\Lambda_{\alpha}$ and $\Lambda_{\beta}$ might make this result difficult to apply in some applications. For this reason, we provide our second result assuming a little more structure on the kernel. Suppose we have $n$ independent random variables $Z_{1}=(X_{1},Y_{1}),Z_{2}=(X_{2},Y_{2}),\ldots,Z_{n}=(X_{n},Y_{n})$ on some measurable space and a sequence of functions $\{w_{i,j}(\cdot,\cdot):\,1\leq i\neq j\leq n\}$ . Consider, for functions $\phi(\cdot)$ and $\psi(\cdot)$ , the $U$ -statistic

U_{n}:=\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j}),\quad\mbox{where}\quad f_{i,j}(Z_{i},Z_{j}):=\phi(Z_{i})w_{i,j}(X_{i},X_{j})\psi(Z_{j}).

(12)

The kernels $f_{i,j}(\cdot,\cdot)$ are not required to be degenerate here. We will derive moment and tail bounds for the degenerate version $U_{n}^{D}$ of the $U$ -statistic $U_{n}$ , given by

U_{n}^{D}:=\sum_{1\leq i\neq j\leq n}f_{i,j}^{D}(Z_{i},Z_{j}),

for the kernel $f_{i,j}^{D}(\cdot,\cdot)$ defined in (9). We first prove a basic lemma that reduces the problem of moment bounds on $U_{n}^{D}$ to a symmetrized version of $U_{n}$ ; see Theorem 3.5.3 of de la Peña and Giné, (1999). For any random variable $W$ , set $\left\lVert W\right\rVert_{p}=(\mathbb{E}[|W|^{p}])^{1/p}$ for $p\geq 1$ .

Lemma 1.

For any $p\geq 1$ ,

\left\lVert U_{n}^{D}\right\rVert_{p}\leq C\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})\right\rVert_{p},

for independent Rademacher random variables $(\varepsilon_{i},\varepsilon_{i}^{\prime}:\,1\leq i\leq n)$ . Here $C$ can be taken to be $192$ and $Z_{1}^{\prime}=(X_{1}^{\prime},Y_{1}^{\prime}),\ldots,Z_{n}^{\prime}=(X_{n}^{\prime},Y_{n}^{\prime})$ represents an independent copy of $(Z_{1},\ldots,Z_{n})$ , that is, a collection of $n$ independent random variables, independent of $(Z_{1},\ldots,Z_{n})$ , such that $Z_{i}^{\prime}$ is identically distributed as $Z_{i}$ for $1\leq i\leq n$ .

The proof of this lemma (given in Appendix A.2) is based on the by-now classical decoupling inequalities of de la Peña, (1992) and de la Peña and Giné, (1999, Chapter 3). The result also holds in case of degenerate $U$ -processes and does not require the special structure of the kernels $f_{i,j}(\cdot,\cdot)$ in (12).

To prove moment and tail bounds for degenerate second order $U$ -statistics with unbounded kernels, we use the following assumptions.

(B1)

There exist constants $0<\alpha,\beta,C_{\phi},C_{\psi}<\infty$ such that

\max_{1\leq i\leq n}\,\mathbb{E}\left[\exp\left(\frac{|\phi(Z_{i})|^{\alpha}}{C_{\phi}^{\alpha}}\right)\big{|}X_{i}\right]\leq 2,\quad\mbox{and}\quad\max_{1\leq i\leq n}\,\mathbb{E}\left[\exp\left(\frac{|\psi(Z_{i})|^{\beta}}{C_{\psi}^{\beta}}\right)\big{|}X_{i}\right]\leq 2,

hold almost surely.

(B2)

The functions $\{w_{i,j}(\cdot,\cdot):\,1\leq i\neq j\leq n\}$ are all uniformly bounded, that is,

$\max_{1\leq i\neq j\leq n}\sup_{(x,x^{\prime})\in\mathfrak{X}\times\mathfrak{X}}\,\left|w_{i,j}(x,x^{\prime})\right|\leq B_{w}.$

The main techniques in our proof are truncation and Hoffmann-Jørgensen’s inequality. Assumption (B1) implies that, conditional on the $X_{i}$ ’s, the maximum of $|\phi(Z_{i})|$ over $1\leq i\leq n$ is at most a polynomial in $\log n$ (in rate). Together with Assumption (B2), this allows us to truncate at this rate and study the truncated part using the sharp results of Giné et al., (2000). The unbounded parts, which are of smaller order, are controlled using Hoffmann-Jørgensen’s inequality. The bound $B_{w}$ in Assumption (B2) is allowed to grow with $n$ and all the kernels are also allowed to depend on $n$ . All the results to be presented here are non-asymptotic. For more applications of this technique see Kuchibhotla and Chakrabortty, (2022).

Define

\displaystyle T_{\phi}:=8\mathbb{E}\left[\max_{1\leq i\leq n}\left|\phi(Z_{i})\right|\big{|}X_{1},\ldots,X_{n}\right],\quad T_{\psi}:=8\mathbb{E}\left[\max_{1\leq i\leq n}\left|\psi(Z_{i})\right|\big{|}X_{1},\ldots,X_{n}\right],

and the truncated random variables

\begin{split}\Phi_{i,1}:=\phi(Z_{i})\mathbbm{1}\{|\phi(Z_{i})|\leq T_{\phi}\},\quad&\mbox{and}\quad\Phi_{i,2}:=\phi(Z_{i})\mathbbm{1}\{|\phi(Z_{i})|>T_{\phi}\},\\ \Psi^{\prime}_{j,1}:=\psi(Z_{j}^{\prime})\mathbbm{1}\{|\psi(Z_{j}^{\prime})|\leq T_{\psi}\},\quad&\mbox{and}\quad\Psi^{\prime}_{j,2}:=\psi(Z_{j}^{\prime})\mathbbm{1}\{|\psi(Z_{j}^{\prime})|>T_{\psi}\}.\end{split}

(13)

It is clear that $\phi(Z_{i})=\Phi_{i,1}+\Phi_{i,2}$ and $\psi(Z_{j}^{\prime})=\Psi^{\prime}_{j,1}+\Psi^{\prime}_{j,2}.$ Based on these, note that

\begin{split}\phi(Z_{i})w_{i,j}(X_{i},X_{j}^{\prime})\psi(Z_{j}^{\prime})&=\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}+\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\\ &\qquad+\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}+\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}.\end{split}

(14)
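The truncation split in (13) and the four-term expansion in (14) can be sketched numerically as follows. The simulated values standing in for $\phi(Z_{i})$ , $\psi(Z_{j}^{\prime})$ and $w_{i,j}(X_{i},X_{j}^{\prime})$ , the fixed thresholds `t_phi` and `t_psi` (the text instead uses the conditional expectations $T_{\phi}$ and $T_{\psi}$ ) and the helper names are all assumptions made for this illustration.

```python
import random


def split_at(value: float, threshold: float) -> tuple:
    """Return (bounded part, tail part) of value, as in (13)."""
    if abs(value) <= threshold:
        return value, 0.0
    return 0.0, value


rng = random.Random(3)
n = 20
b_w = 1.0                                                   # bound on |w_{i,j}|
phi_vals = [rng.expovariate(1.0) - 1.0 for _ in range(n)]   # stands in for phi(Z_i)
psi_vals = [rng.gauss(0.0, 1.0) for _ in range(n)]          # stands in for psi(Z_j')
w = [[rng.uniform(-b_w, b_w) for _ in range(n)] for _ in range(n)]
t_phi, t_psi = 1.5, 1.5                                     # hypothetical thresholds

phi_parts = [split_at(v, t_phi) for v in phi_vals]
psi_parts = [split_at(v, t_psi) for v in psi_vals]


def pair_sum(left: list, right: list) -> float:
    """Sum of left[i] * w[i][j] * right[j] over ordered pairs i != j."""
    return sum(
        left[i] * w[i][j] * right[j] for i in range(n) for j in range(n) if i != j
    )


full = pair_sum(phi_vals, psi_vals)
# The four sums follow the order of the terms on the right hand side of (14).
terms = [
    pair_sum([p[a] for p in phi_parts], [q[b] for q in psi_parts])
    for a, b in ((0, 0), (1, 0), (0, 1), (1, 1))
]
largest_bounded = max(
    abs(phi_parts[i][0] * w[i][j] * psi_parts[j][0])
    for i in range(n)
    for j in range(n)
    if i != j
)
print(full, sum(terms), largest_bounded)
```

The four sums add up to the full sum, and every summand of the first (fully truncated) sum is bounded in absolute value by the product of the two thresholds and `b_w`.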

The first term on the right hand side of (14) is bounded in absolute value by $T_{\phi}B_{w}T_{\psi}$ . The second and third terms are non-zero only when $\Phi_{i,2}$ and $\Psi^{\prime}_{j,2}$ , respectively, are non-zero, which happens only with small probability under Assumption (B1). Finally, the fourth term can be non-zero only if both $\Phi_{i,2}$ and $\Psi^{\prime}_{j,2}$ are non-zero, which happens with even smaller probability. These four terms lead to four different degenerate $U$ -statistics that are controlled separately in Appendix A.3 to prove the following result. We need the following notation: for $1\leq i,j\leq n$ ,

\sigma_{i,\phi}^{2}(x)=\mathbb{E}[\phi^{2}(Z_{i})\big{|}X_{i}=x]\quad\mbox{and}\quad\sigma_{j,\psi}^{2}(x)=\mathbb{E}[\psi^{2}(Z_{j})\big{|}X_{j}=x].

Define $\Lambda_{2}:=C_{\phi}C_{\psi}B_{w}(\log n)^{\alpha^{-1}+\beta^{-1}}$ and

\begin{split}\Lambda_{1/2}&:=\left(\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[\sigma_{i,\phi}^{2}(X_{i})w_{i,j}^{2}(X_{i},X_{j})\sigma_{j,\psi}^{2}(X_{j})\right]\right)^{1/2},\\ \Lambda_{1}&:=\sup\left\{\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j})\sigma_{j,\psi}(X_{j})p_{j}(X_{j})\right]:\,\sum_{i=1}^{n}\mathbb{E}\left[q_{i}^{2}(X_{i})\right]\leq 1,\,\sum_{j=1}^{n}\mathbb{E}\left[p_{j}^{2}(X_{j})\right]\leq 1\right\},\\ \Lambda_{3/2}^{(\alpha)}&:=C_{\phi}(\log n)^{1/\alpha}\sup_{x}\max_{1\leq i\leq n}\left(\sum_{j=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(x,X_{j})\sigma^{2}_{j,\psi}(X_{j})\right]\right)^{1/2},\\ \Lambda_{3/2}^{(\beta)}&:=C_{\psi}(\log n)^{1/\beta}\sup_{x}\max_{1\leq j\leq n}\left(\sum_{i=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(X_{i},x)\sigma^{2}_{i,\phi}(X_{i})\right]\right)^{1/2},\\ \Lambda_{\alpha^{*}}&:=(\log n)^{1/2}\Lambda_{3/2}^{(\alpha)}+(\log n)\Lambda_{2},\quad\mbox{and}\\ \Lambda_{\beta^{*}}&:=(\log n)^{1/2}\Lambda_{3/2}^{(\beta)}+(\log n)\Lambda_{2}.\end{split}

The quantities $\Lambda_{1/2},\Lambda_{1},\Lambda_{3/2}^{(\alpha)},\Lambda_{3/2}^{(\beta)},\Lambda_{2}$ also appear in the case of bounded kernels as shown in Theorem 3.2 of Giné et al., (2000).

Theorem 2.

Under Assumptions (B1) and (B2), there exists a constant $K>0$ (depending only on $\alpha,\beta$ ) such that for all $p\geq 1$ ,

\begin{split}\left\lVert U_{n}^{D}\right\rVert_{p}&\leq Kp^{1/2}\Lambda_{1/2}+Kp\Lambda_{1}+Kp^{1/\alpha^{*}}\Lambda_{\alpha^{*}}+Kp^{1/\beta^{*}}\Lambda_{\beta^{*}}\\ &\qquad+Kp^{1/2+1/\alpha^{*}}\Lambda_{3/2}^{(\alpha)}+Kp^{1/2+1/\beta^{*}}\Lambda_{3/2}^{(\beta)}+Kp^{1/\alpha^{*}+1/\beta^{*}}\Lambda_{2}.\end{split}

Here $\alpha^{*}:=\min\{\alpha,1\}$ and $\beta^{*}:=\min\{\beta,1\}$ . By Markov’s inequality, there exists a constant $K^{\prime}>0$ such that for any $t\geq 0$ ,

\mathbb{P}\left(|U_{n}^{D}|\geq K^{\prime}\mathcal{T}_{\alpha,\beta}(t)\right)\leq 2\exp(-t),

(15)

where

\begin{split}\mathcal{T}_{\alpha,\beta}(t)&:=\sqrt{t}\Lambda_{1/2}+t\Lambda_{1}+t^{1/\alpha^{*}}\Lambda_{\alpha^{*}}+t^{1/\beta^{*}}\Lambda_{\beta^{*}}\\ &\qquad+t^{1/2+1/\alpha^{*}}\Lambda_{3/2}^{(\alpha)}+t^{1/2+1/\beta^{*}}\Lambda_{3/2}^{(\beta)}+t^{1/\alpha^{*}+1/\beta^{*}}\Lambda_{2}.\end{split}

Proof.

See Appendix A.3 for a proof. ∎
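To see how the regimes in (15) trade off, the following sketch evaluates $\mathcal{T}_{\alpha,\beta}(t)$ for user-supplied values of the $\Lambda$ quantities, with each power of $t$ mirroring the corresponding power of $p$ in the moment bound. The constant $K^{\prime}$ is not specified by the theorem and is therefore omitted; the function name, its signature and the numerical inputs are hypothetical.

```python
def tail_threshold(
    t: float,
    alpha: float,
    beta: float,
    lam_half: float,
    lam_one: float,
    lam_32_alpha: float,
    lam_32_beta: float,
    lam_two: float,
    log_n: float,
) -> float:
    """Evaluate T_{alpha,beta}(t) from Theorem 2, without the constant K'."""
    a_star = min(alpha, 1.0)
    b_star = min(beta, 1.0)
    # Lambda_{alpha*} and Lambda_{beta*} as defined before Theorem 2.
    lam_a_star = log_n ** 0.5 * lam_32_alpha + log_n * lam_two
    lam_b_star = log_n ** 0.5 * lam_32_beta + log_n * lam_two
    return (
        t ** 0.5 * lam_half
        + t * lam_one
        + t ** (1.0 / a_star) * lam_a_star
        + t ** (1.0 / b_star) * lam_b_star
        + t ** (0.5 + 1.0 / a_star) * lam_32_alpha
        + t ** (0.5 + 1.0 / b_star) * lam_32_beta
        + t ** (1.0 / a_star + 1.0 / b_star) * lam_two
    )


# Illustrative inputs only: every Lambda set to 1 and log n = 1.
for t in (1.0, 4.0):
    print(t, tail_threshold(t, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))
```

With these inputs the Gaussian term $\sqrt{t}\Lambda_{1/2}$ is among the largest at $t=1$ , while the term $t^{1/\alpha^{*}+1/\beta^{*}}\Lambda_{2}$ is the largest single term at $t=4$ ; in applications the relative sizes of the $\Lambda$ quantities determine where each regime takes over.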

Remark 2.1 (Comparison with previous results) As noted in the introduction, an important feature of our result is that the kernel is allowed to be unbounded with proper tail behavior. The tail of the degenerate $U$ -statistic as shown in (15) has seven different regimes, the prominent ones being the Gaussian and exponential parts. These seven regimes collapse to five if $\alpha=\beta$ . In particular, if $\alpha=\beta\leq 1$ , then for $p\geq 1$ ,

	$\displaystyle\left\lVert U_{n}^{D}\right\rVert_{p}$	$\displaystyle\leq K\mathbf{p^{1/2}}\Lambda_{1/2}+K\mathbf{p}\Lambda_{1}+K\mathbf{p^{1/\alpha}}\left[(\log n)^{1/2}\left\{\Lambda_{3/2}^{(\alpha)}+\Lambda_{3/2}^{(\beta)}\right\}+(\log n)\Lambda_{2}\right]$
		$\displaystyle\quad+K\mathbf{p^{1/2+1/\alpha}}\left[\Lambda_{3/2}^{(\alpha)}+\Lambda_{3/2}^{(\beta)}\right]+K\mathbf{p^{1/\alpha+1/\beta}}\Lambda_{2}.$

If $\alpha=\beta=\infty$ , then our assumption (B1) implies boundedness of the kernels. In this case, only four regimes remain and these four regimes coincide with those shown in Theorem 3.2 of Giné et al., (2000). Moreover, in this case of bounded kernels, Theorem 2 essentially coincides with Theorem 3.2 of Giné et al., (2000) except for the additional $\sqrt{\log n}$ and $\log n$ factors. We believe these factors to be artifacts of our proof, and that they could be avoided by closely following the proof of Theorem 1. $\diamond$

3 Tail Bounds for Degenerate $U$ -Processes

In this section, we generalize Theorem 2 to degenerate $U$ -processes. Consider

\mathcal{U}_{n}(\mathcal{W}):=\sup_{w\in\mathcal{W}}\left|\mathcal{U}_{n}(w)\right|,\quad\mbox{where}\quad\mathcal{U}_{n}(w):=\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\phi(Z_{i})w_{i,j}(X_{i},X_{j}^{\prime})\psi(Z_{j}^{\prime})\varepsilon_{j}^{\prime},

for some function class $\mathcal{W}$ with elements of the type $w=(w_{i,j})_{1\leq i\neq j\leq n}$ . If $\mathcal{W}$ is a singleton, then this reduces to the $U$ -statistic studied in Section 2. Here $\varepsilon_{1},\ldots,\varepsilon_{n}$ denote an independent sequence of Rademacher random variables as before. For simplicity, we consider the symmetrized version and by Lemma 1 the results also hold for the original degenerate $U$ -process; see Theorem 3.5.3 of de la Peña and Giné, (1999) for details.

$U$ -processes were introduced in Nolan and Pollard, (1987) to study cross-validation in the context of kernel density estimation. They studied uniform almost sure limit theorems and established the rate of convergence. These results parallel the Glivenko-Cantelli theorems well-known for empirical processes. Functional limit theorems were established in Nolan et al., (1988). Exponential tail bounds that parallel the classical Bernstein’s inequality for non-degenerate and degenerate $U$ -statistics were given in Arcones and Giné, (1993). They also established LLN and CLT type results under various metric entropy conditions. Most of these results require boundedness of the kernel functions. Being asymptotic in nature, some of these results can be extended to the case of unbounded kernels using a truncation argument. Finite sample concentration inequalities for degenerate unbounded $U$ -processes are not readily available.

The only work (we are aware of) that provides general results for $U$ -processes applicable to $\mathcal{U}_{n}(\mathcal{W})$ is Adamczak, (2006). In this work, degenerate $U$ -processes of arbitrary order were considered. However, the moment bounds for $U$ -processes in this work depend further on the moment bounds of some complicated degenerate $U$ -processes of lower order. Furthermore, the tail behavior thus obtained is not sharp for unbounded $U$ -processes.

To avoid measurability issues for $\mathcal{U}_{n}(\mathcal{W})$ , we adopt either of the following conventions. One simple assumption on $\mathcal{W}$ , used in van der Vaart and Wellner, (1996), that implies measurability is separability; in this case we can replace $\mathcal{W}$ by a countable dense subset of itself. Another convention, used in Talagrand, (2014), is to define, for any $\mathcal{W}$ and increasing function $f(\cdot)$ ,

\mathbb{E}\left[f\left(\mathcal{U}_{n}(\mathcal{W})\right)\right]:=\sup\left\{\mathbb{E}\left[f(\mathcal{U}_{n}(\mathcal{F}))\right]:\,\mathcal{F}\subseteq\mathcal{W}\mbox{ a finite subset}\right\}.

Based on either convention, we treat $\mathcal{W}$ as a countable set for the remaining part of this section.

One “simple” way to obtain tail bounds for $\mathcal{U}_{n}(\mathcal{W})$ is via generic chaining as follows: First apply Theorem 2 for $\mathcal{U}_{n}(w)-\mathcal{U}_{n}(w^{\prime})$ for functions $w,w^{\prime}\in\mathcal{W}$ . The tail bound (15) provides a mixed tail in terms of various semi-metrics on $\mathcal{W}$ . Using these and following the proof of classical generic chaining bound (e.g., Theorem 3.5 of Dirksen, (2015)), one can obtain tail bounds for $U$ -processes in terms of $\gamma$ -functionals; see Talagrand, (2014) and Dirksen, (2015) for details. A problem with this approach is the complication in controlling the $\gamma$ -functionals. This approach with Dudley’s chaining (instead of generic chaining) was used for bounded kernel $U$ -processes in Nolan and Pollard, (1987) and Nolan et al., (1988).

In the following, we first provide a deviation inequality for $\mathcal{U}_{n}(\mathcal{W})$ and then prove a maximal inequality to control the expectations appearing in the deviation inequality. For these results, we consider the following generalization of assumption (B2).

(A2^′)

The functions $\{w:\,w\in\mathcal{W}\}$ are all uniformly bounded, that is,

\sup_{w\in\mathcal{W}}\sup_{(x,x^{\prime})\in\mathfrak{X}\times\mathfrak{X}}\,\max_{1\leq i\neq j\leq n}\left|w_{i,j}(x,x^{\prime})\right|\leq B_{\mathcal{W}}.

We will use the notation of $\Phi_{i,1},\Phi_{i,2},\Psi_{i,1}^{\prime},\Psi_{i,2}^{\prime}$ given in (13). For the main result of this section, define

	$\displaystyle\Lambda_{2}(\mathcal{W})$	$\displaystyle:=(\log n)^{\alpha^{-1}+\beta^{-1}}C_{\phi}C_{\psi}B_{\mathcal{W}},$
	$\displaystyle E_{n,1}(\mathcal{W})$	$\displaystyle:=C_{\psi}(\log n)^{1/\beta}\sup_{x\in\mathfrak{X}}\max_{1\leq j\leq n}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},x)\right|\right],$
	$\displaystyle{E}_{n,2}(\mathcal{W})$	$\displaystyle:=C_{\phi}(\log n)^{1/\alpha}\sup_{x\in\mathfrak{X}}\max_{1\leq i\leq n}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left|\sum_{j=1,j\neq i}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}w_{i,j}(x,X_{j}^{\prime})\right|\right],$
	$\displaystyle\mathfrak{W}_{n,1}(\mathcal{W})$	$\displaystyle:=\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sup_{\{p_{j}\}}\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,1}\int p_{j}(x)\sigma_{j,\psi}(x)w_{i,j}(X_{i},x)P_{X_{j}}(dx)\right],$
	$\displaystyle\mathfrak{W}_{n,2}(\mathcal{W})$	$\displaystyle:=\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sup_{\{q_{i}\}}\sum_{1\leq i\neq j\leq n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j}^{\prime})P_{X_{i}}(dx)\right],$
	$\displaystyle\Sigma_{n,1}^{1/2}(\mathcal{W})$	$\displaystyle:=C_{\psi}(\log n)^{1/\beta}\sup_{x\in\mathfrak{X}}\sup_{w\in\mathcal{W}}\,\max_{1\leq j\leq n}\left(\sum_{i=1,i\neq j}^{n}\mathbb{E}[\sigma_{i,\phi}^{2}(X_{i})w^{2}_{i,j}(X_{i},x)]\right)^{1/2},$
	$\displaystyle{\Sigma}_{n,2}^{1/2}(\mathcal{W})$	$\displaystyle:=C_{\phi}(\log n)^{1/\alpha}\sup_{x\in\mathfrak{X}}\sup_{w\in\mathcal{W}}\max_{1\leq i\leq n}\left(\sum_{j=1,j\neq i}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j})w^{2}_{i,j}(x,X_{j})\right]\right)^{1/2},$
	$\displaystyle\left\lVert(\phi w\psi)_{\mathcal{W}}\right\rVert_{2\to 2}$	$\displaystyle:=\sup_{w\in\mathcal{W}}\sup_{\{q_{i}\}}\sup_{\{p_{j}\}}\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j}^{\prime})\sigma_{j,\psi}(X_{j}^{\prime})p_{j}(X_{j}^{\prime})\right].$

Here in the definitions, the supremum over $\{q_{i}\}$ (or $\{p_{j}\}$ ) represents the supremum over all functions $(q_{1},\ldots,q_{n})$ (or $(p_{1},\ldots,p_{n})$ ) satisfying

\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1,\quad\mbox{and}\quad\sum_{j=1}^{n}\int p_{j}^{2}(x)P_{X_{j}}(dx)\leq 1,

where $P_{X_{i}}(\cdot)$ denotes the probability measure of $X_{i}$ . Note that $\left\lVert(\phi w\psi)_{\mathcal{W}}\right\rVert_{2\to 2}$ is similar to $\Lambda_{1}$ defined for Theorem 2.

Theorem 3.

Under assumptions (B1) and (A2^′), there exists a constant $K>0$ (depending only on $\alpha,\beta$ ) such that for all $p\geq 1$

	$\displaystyle\left\lVert\mathcal{U}_{n}(\mathcal{W})\right\rVert_{p}$	$\displaystyle\leq K\mathbb{E}\left[\mathcal{U}_{n}^{(1)}(\mathcal{W})\right]+Kp^{1/2}(\mathfrak{W}_{n,1}(\mathcal{W})+\mathfrak{W}_{n,2}(\mathcal{W}))+Kp\left\lVert(\phi w\psi)_{\mathcal{W}}\right\rVert_{2\to 2}$
		$\displaystyle\quad+Kp^{1/\alpha^{*}}\left[E_{n,2}(\mathcal{W})+\Sigma_{n,2}^{1/2}(\mathcal{W})\sqrt{\log n}+\Lambda_{2}(\mathcal{W})\log n\right]$
		$\displaystyle\quad+Kp^{1/\beta^{*}}\left[E_{n,1}(\mathcal{W})+\Sigma_{n,1}^{1/2}(\mathcal{W})\sqrt{\log n}+\Lambda_{2}(\mathcal{W})\log n\right]$
		$\displaystyle\quad+Kp^{1/2+1/\alpha^{*}}\Sigma_{n,2}^{1/2}(\mathcal{W})+Kp^{1/2+1/\beta^{*}}\Sigma_{n,1}^{1/2}(\mathcal{W})+Kp^{1/\alpha^{*}+1/\beta^{*}}\Lambda_{2}(\mathcal{W}).$

Proof.

See Appendix B.1 for a proof. ∎

If $\mathcal{W}$ is a singleton set, then the above result reduces to Theorem 2. From the moment bound above, a tail bound follows from Markov’s inequality. As in Theorem 2, we get seven different tail regimes, which reduce to five if $\alpha=\beta$ . Unlike the result of Adamczak, (2006), the moment bound in Theorem 3 depends only on expectations, rather than on moment bounds of lower-order $U$ -processes. An additional advantage of Theorem 3 is that all the expectations only involve bounded random variables.
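The passage from a moment bound to a tail bound can be sketched numerically. Assume, as a generic stand-in for the right hand side of Theorem 3, that $\|X\|_{p}\leq\sum_{k}C_{k}p^{a_{k}}$ for all $p\geq 1$ . Markov’s inequality with $p=t\geq 1$ then gives $\mathbb{P}(|X|\geq e\sum_{k}C_{k}t^{a_{k}})\leq e^{-t}$ . The constants `coeffs` below are hypothetical and the function names are ours; only the exponents mimic the regimes of the theorem (here with $\alpha^{*}=\beta^{*}=1/2$ ).

```python
import math


def moment_bound(p, coeffs, exponents):
    """Right-hand side sum_k C_k * p**a_k of a mixed moment bound."""
    return sum(c * p ** a for c, a in zip(coeffs, exponents))


def tail_threshold(t, coeffs, exponents):
    """Level u(t) = e * sum_k C_k * t**a_k, so that P(|X| >= u(t)) <= exp(-t)."""
    if t < 1:
        raise ValueError("the choice p = t requires t >= 1")
    return math.e * moment_bound(t, coeffs, exponents)


# Hypothetical constants; exponents 1/2, 1, 1/a*, 1/2 + 1/a*, 1/a* + 1/b*.
alpha_star = beta_star = 0.5
exponents = [0.5, 1.0, 1 / alpha_star, 0.5 + 1 / alpha_star, 1 / alpha_star + 1 / beta_star]
coeffs = [1.0, 0.5, 0.1, 0.05, 0.01]

t = 4.0
u = tail_threshold(t, coeffs, exponents)
# Markov with p = t: (||X||_t / u)^t <= (bound(t) / u)^t = exp(-t).
markov = (moment_bound(t, coeffs, exponents) / u) ** t
```

Each exponent $a_{k}$ contributes one regime to the resulting mixed tail: for a given $t$ , the largest of the terms $C_{k}t^{a_{k}}$ determines the threshold up to a constant factor.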

3.1 Maximal Inequality for Bounded Degenerate U-Processes

To apply Theorem 3, we need to control the various expectations appearing on the right hand side of the moment bound there. Except for $\mathbb{E}[\mathcal{U}_{n}^{(1)}(\mathcal{W})]$ , all the other quantities can be controlled using maximal inequalities for empirical processes; see van der Vaart and Wellner, (2011) and Lemmas 3.4.2-3.4.3 of van der Vaart and Wellner, (1996). In this section, we provide a maximal inequality for $\mathcal{U}_{n}^{(1)}(\mathcal{W})$ . For independent and identically distributed random variables, Chen and Kato, (2020, Theorem 5.1) provide a maximal inequality for degenerate $U$ -processes of arbitrary order. This result is similar to Theorem 2.1 of van der Vaart and Wellner, (2011) for empirical processes. The proof in Chen and Kato, (2020) does not provide the “correct” bound in the case of possibly non-identically distributed observations, since it uses Hoeffding averaging, which can lead to a sub-optimal rate if the observations are not identically distributed. A modification of the proof leads to the maximal inequality below.

For any $\eta>0$ , function class $\mathcal{F}$ containing functions $f=(f_{i,j})_{1\leq i\neq j\leq n}:\chi\times\chi\to\mathbb{R}$ and a discrete probability measure $Q$ with support $\{z_{1},\ldots,z_{t}\}$ , let $N(\eta,\mathcal{F},\left\lVert\cdot\right\rVert_{2,Q})$ denote the minimum $m$ such that there exist $f^{(1)},f^{(2)},\ldots,f^{(m)}\in\mathcal{F}$ satisfying

\sup_{f\in\mathcal{F}}\inf_{1\leq j\leq m}\left\lVert f-f^{(j)}\right\rVert_{2,Q}\leq\eta,

where for $f\in\mathcal{F}$ ,

\left\lVert f\right\rVert_{2,Q}^{2}:=\frac{\sum_{1\leq i\neq j\leq t}f_{i,j}^{2}(z_{i},z_{j})Q(\{z_{i}\})Q(\{z_{j}\})}{\sum_{1\leq i\neq j\leq t}Q(\{z_{i}\})Q(\{z_{j}\})}.

Note that the right hand side is the expectation of $f_{i,j}^{2}$ with respect to the measure induced by $Q$ on $\{(z_{i},z_{j}):1\leq i\neq j\leq t\}$ . The uniform entropy integral needed for $U$ -processes is given by

J_{2}(\delta,\mathcal{F},\left\lVert\cdot\right\rVert_{2}):=\sup_{Q}\int_{0}^{\delta}\log N(\eta\left\lVert F\right\rVert_{2,Q},\mathcal{F},\left\lVert\cdot\right\rVert_{2,Q})d\eta.

Here $F=(F_{i,j})_{1\leq i\neq j\leq n}$ represents the envelope function for $\mathcal{F}$ satisfying $|f_{i,j}(x,x^{\prime})|\leq F_{i,j}(x,x^{\prime})$ for all $f\in\mathcal{F},x,x^{\prime}\in\chi$ and the supremum is taken over all discrete probability measures $Q$ supported on $\chi\times\chi$ .
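For a concrete feel of the semi-norm $\left\lVert f\right\rVert_{2,Q}$ defined above, the following minimal sketch evaluates it for a discrete $Q$ . The function name and the toy inputs are hypothetical: `f_vals[i][j]` stands for $f_{i,j}(z_{i},z_{j})$ and `q[i]` for $Q(\{z_{i}\})$ .

```python
import math


def seminorm_2Q(f_vals, q):
    """Off-diagonal weighted L2 norm of f under a discrete measure Q.

    Requires at least two support points with positive mass, so that the
    normalizing sum over i != j is non-zero.
    """
    t = len(q)
    num = 0.0
    den = 0.0
    for i in range(t):
        for j in range(t):
            if i != j:
                weight = q[i] * q[j]
                num += f_vals[i][j] ** 2 * weight
                den += weight
    return math.sqrt(num / den)


# Toy example with two support points of equal mass (hypothetical values).
q = [0.5, 0.5]
f_vals = [[0.0, 3.0], [4.0, 0.0]]
norm_f = seminorm_2Q(f_vals, q)

# A kernel that is constant off the diagonal has semi-norm equal to |constant|.
norm_const = seminorm_2Q([[0.0, 2.0, 2.0], [2.0, 0.0, 2.0], [2.0, 2.0, 0.0]], [0.2, 0.3, 0.5])
```

The diagonal entries never enter, which matches the restriction to pairs $i\neq j$ in the definition.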

The following theorem provides a maximal inequality using Theorem 5.1.4 of de la Peña and Giné, (1999). The proof is very similar to that of Theorem 5.1 of Chen and Kato, (2020) which itself was based on the proof of Theorem 2.1 of van der Vaart and Wellner, (2011).

Theorem 4.

Suppose $\mathcal{F}$ represents a class of real-valued functions $f:\chi\times\chi\to\mathbb{R}$ uniformly bounded by $R$ with the envelope function $F$ . Then there exists a universal constant $C>0$ such that

\mathbb{E}\left[\sup_{f\in\mathcal{F}}\left|\frac{\sum_{1\leq i\neq j\leq n}\epsilon_{i}\epsilon_{j}f_{i,j}(X_{i},X_{j})}{\sqrt{n(n-1)}}\right|\right]\leq C\left\lVert F\right\rVert_{2,P}J_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})\left[1+\frac{J_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})b^{2}}{a^{2}}\right],

for any $a\geq A_{n}$ and $b\geq B_{n}$ , where $B_{n}^{2}=R/(n\left\lVert F\right\rVert_{2,P})$ ,

	$\displaystyle\left\lVert F\right\rVert_{2,P}^{2}$	$\displaystyle:=\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}\mathbb{E}[F^{2}_{i,j}(X_{i},X_{j})],$
	$\displaystyle A_{n}^{2}$	$\displaystyle:=\left\lVert F\right\rVert_{2,P}^{-2}\left[\Gamma_{n,1}^{2}(\mathcal{F})+\Gamma_{n,2}^{2}(\mathcal{F})+\Sigma_{n}^{2}(\mathcal{F})\right],$
	$\displaystyle\Gamma_{n,1}^{2}(\mathcal{F})$	$\displaystyle:=\mathbb{E}\left[\sup_{f\in\mathcal{F}}\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\left\{\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})|X_{i}\right]-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right]\right\}\right|\right],$
	$\displaystyle\Gamma_{n,2}^{2}(\mathcal{F})$	$\displaystyle:=\mathbb{E}\left[\sup_{f\in\mathcal{F}}\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\left\{\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})|X_{j}\right]-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right]\right\}\right|\right],$
	$\displaystyle\Sigma_{n}^{2}(\mathcal{F})$	$\displaystyle:=\sup_{f\in\mathcal{F}}\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right].$

Proof.

See Appendix C for a proof. ∎

APPENDIX

Appendix A Proofs of Results in Section 2

A.1 Proof of Theorem 1

By Theorem 3.5.3 of de la Peña and Giné, (1999), it follows that

\|U_{n}^{D}\|_{p}\leq\mathfrak{C}\left\|\mathcal{U}_{n}\right\|_{p},

where

\mathcal{U}_{n}:=\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j}^{\prime}).

For $1\leq i\leq n$ and any $z$ , define

h_{i}(z):=\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\,f_{i,j}(z,Z_{j}^{\prime}).

Observe that

\mathcal{U}_{n}=\sum_{i=1}^{n}h_{i}(Z_{i}).

First, we consider the behavior of $h_{i}(z)$ for a fixed $z$ and then the behavior of $\mathcal{U}_{n}$ . By the degeneracy of the kernel, we have that $h_{i}(z)$ is a sum of independent mean zero random variables for every $i,z$ . Moreover, $\|f_{i,j}(z,Z_{j}^{\prime})\|_{\psi_{\beta}}\leq F_{i}(z)K_{G}$ for all $i,z$ . Hence, Theorem 3.4 of Kuchibhotla and Chakrabortty, (2022) (with $q=1$ and $t=\log(3/\delta)$ ) implies

\mathbb{P}\left(|h_{i}(z)|\geq 7\sqrt{\log(3/\delta)}\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}[f_{i,j}^{2}(z,Z_{j}^{\prime})]\right)^{1/2}+C_{\beta}F_{i}(z)K_{G}(\log(2n))^{1/\beta}(\log(3/\delta))^{1/\beta^{*}}\right)\leq\delta,

where $\beta^{*}=\min\{\beta,1\}$ . Based on this, define

H_{i}(z;\delta):=F_{i}(z)\mathcal{B}_{i}(z,\delta/n),

where

\mathcal{B}_{i}(z,\delta):=7\sqrt{\log(3/\delta)}\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}\left[\frac{f_{i,j}^{2}(z,Z_{j}^{\prime})}{F_{i}^{2}(z)}\right]\right)^{1/2}+C_{\beta}K_{G}(\log(2n))^{1/\beta}(\log(3/\delta))^{1/\beta^{*}}.

Getting back to the behavior of $\mathcal{U}_{n}$ , we first note that by degeneracy and symmetrization,

\|\mathcal{U}_{n}\|_{p}\leq 2\left\|\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i})\right\|_{p}\quad\mbox{for all}\quad p\geq 1.

(16)

Here $\varepsilon_{1},\ldots,\varepsilon_{n}$ are independent Rademacher random variables independent of $Z_{1},\ldots,Z_{n}$ , $Z_{1}^{\prime},\ldots,Z_{n}^{\prime}$ . Hence, it suffices to understand the behavior of

\mathcal{U}_{n}^{\prime}:=\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i}).

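(The introduction of Rademacher variables is only done for notational convenience in applying truncation.) For any $t\geq 0$ and $\delta_{1}\in(0,1)$ , truncating each summand at the level $H_{i}(Z_{i};\delta_{1})$ and applying the union bound together with the tail bound for $h_{i}(z)$ above (with $\delta$ replaced by $\delta_{1}/n$ ) gives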

\begin{split}\mathbb{P}\left(|\mathcal{U}_{n}^{\prime}|\geq t\right)&\leq\mathbb{P}\left(\left|\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right|\geq t\right)\\ &\quad+\mathbb{P}\left(|h_{i}(Z_{i})|>H_{i}(Z_{i};\delta_{1})\quad\mbox{for some}\quad 1\leq i\leq n\right)\\ &\leq\mathbb{P}\left(\left|\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right|\geq t\right)\\ &\quad+\sum_{i=1}^{n}\mathbb{P}(|h_{i}(Z_{i})|>H_{i}(Z_{i};\delta_{1}))\\ &\leq\mathbb{P}\left(\left|\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right|\geq t\right)+\delta_{1}.\end{split}

(17)

Because $\{h_{i}(Z_{i}):1\leq i\leq n\}$ are independent random variables conditional on $\{Z_{j}^{\prime}:1\leq j\leq n\}$ , we get by another application of Theorem 3.4 of Kuchibhotla and Chakrabortty, (2022) (with $q=1$ and $t=\log(3/\delta_{2})$ ) that conditional on $\{Z_{j}^{\prime}:1\leq j\leq n\}$ , with probability at least $1-\delta_{2}$ ,

\begin{split}&\left|\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right|\\ &\quad\leq 7\sqrt{\log(3/\delta_{2})}\left(\sum_{i=1}^{n}\mathbb{E}[h_{i}^{2}(Z_{i})|\{Z_{j}^{\prime}\}]\right)^{1/2}\\ &\qquad+C_{\alpha}(\log(2n))^{1/\alpha}(\log(3/\delta_{2}))^{1/\alpha^{*}}\max_{1\leq i\leq n}\left\|h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right\|_{\psi_{\alpha}|\{Z_{j}^{\prime}\}}.\end{split}

(18)

Observe now that

	$\displaystyle\left\|h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right\|_{\psi_{\alpha}|\{Z_{j}^{\prime}\}}$
	$\displaystyle\leq\|F_{i}(Z_{i})\mathcal{B}_{i}(Z_{i},\delta_{1}/n)\|_{\psi_{\alpha}}$
	$\displaystyle\leq 7\sqrt{\log(3n/\delta_{1})}\left\|\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{i}]\right)^{1/2}\right\|_{\psi_{\alpha}}+C_{\beta}K_{F}K_{G}(\log(2n))^{1/\beta}(\log(3n/\delta_{1}))^{1/\beta^{*}}.$

To bound the first term on the right hand side of (18), we follow the argument in the proof of Theorem 3.2 of Giné et al., (2000). However, in place of inequality (3.8) of Giné et al., (2000), we apply Theorem B.1 of Kuchibhotla and Chakrabortty, (2022). Following the display after inequality (3.11) of Giné et al., (2000), we have

	$\displaystyle\left(\sum_{i=1}^{n}\mathbb{E}[h_{i}^{2}(Z_{i})|\{Z_{j}^{\prime}\}]\right)^{1/2}$
	$\displaystyle=\sup\left\{\sum_{i=1}^{n}\mathbb{E}[h_{i}(Z_{i})\gamma_{i}(Z_{i})|\{Z_{j}^{\prime}\}]:\,\sum_{i=1}^{n}\mathbb{E}[\gamma_{i}^{2}(Z_{i})]\leq 1\right\}$
	$\displaystyle=\sup\left\{\sum_{j=1}^{n}\left(\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}(Z_{i},Z_{j}^{\prime})\gamma_{i}(Z_{i})|Z_{j}^{\prime}]\right):\,\sum_{i=1}^{n}\mathbb{E}[\gamma_{i}^{2}(Z_{i})]\leq 1\right\},$

where the supremum is taken over a countable subset of mean zero vector functions $(\gamma_{1},\ldots,\gamma_{n})$ . Define

W_{j}(\gamma)=\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}(Z_{i},Z_{j}^{\prime})\gamma_{i}(Z_{i})|Z_{j}^{\prime}].

Degeneracy of $\{f_{i,j}\}$ implies that $W_{j}$ ’s are mean zero independent random variables. Hence, by Theorem B.1 of Kuchibhotla and Chakrabortty, (2022), we get

	$\displaystyle\left(\mathbb{E}\left[\sup_{\gamma}\left|\sum_{j=1}^{n}W_{j}\right|^{p}\right]\right)^{1/p}$
	$\displaystyle\leq 2\mathbb{E}\left[\sup_{\gamma}\left|\sum_{j=1}^{n}W_{j}\right|\right]+\sqrt{2p}\left(\sup_{\gamma}\sum_{j=1}^{n}\mathbb{E}[W_{j}^{2}]\right)^{1/2}+C_{\beta}p^{1/\beta^{*}}\left\|\max_{1\leq j\leq n}\sup_{\gamma}W_{j}\right\|_{\psi_{\beta}}.$

Following the argument in Theorem 3.2 of Giné et al., (2000), we get

	$\displaystyle\mathbb{E}\left[\sup_{\gamma}\left|\sum_{j=1}^{n}W_{j}\right|\right]~{}$	$\displaystyle\leq~{}\left(\sum_{1\leq i\neq j\leq n}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})]\right)^{1/2},$
	$\displaystyle\sup_{\gamma}\sum_{j=1}^{n}\mathbb{E}[W_{j}^{2}]~{}$	$\displaystyle\leq~{}\|(f_{i,j})\|_{L^{2}\to L^{2}}^{2},$
	$\displaystyle\max_{1\leq j\leq n}\sup_{\gamma}\,W_{j}~{}$	$\displaystyle\leq~{}\max_{1\leq j\leq n}\left(\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{j}^{\prime}]\right)^{1/2}.$

Therefore, by Markov’s inequality, with probability at least $1-\delta_{3}$ ,

\begin{split}&\left(\sum_{i=1}^{n}\mathbb{E}[h_{i}^{2}(Z_{i})|\{Z_{j}^{\prime}\}]\right)^{1/2}\\ &\leq 2\left(\sum_{1\leq i\neq j\leq n}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})]\right)^{1/2}+\sqrt{2\log(1/\delta_{3})}\|(f_{i,j})\|_{L^{2}\to L^{2}}\\ &\quad+C_{\beta}(\log(1/\delta_{3}))^{1/\beta^{*}}(\log(n))^{1/\beta}\max_{1\leq j\leq n}\left\|\left(\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{j}^{\prime}]\right)^{1/2}\right\|_{\psi_{\beta}}.\end{split}

(19)

Combining inequalities (17), (18), (19), we get that with probability at least $1-\delta_{1}-\delta_{2}-\delta_{3}$ ,

	$\displaystyle|\mathcal{U}_{n}^{\prime}|$	$\displaystyle\leq 14\sqrt{\log(3/\delta_{2})}\left(\sum_{i\neq j}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})]\right)^{1/2}$
		$\displaystyle\quad+7\sqrt{2\log(3/\delta_{2})\log(1/\delta_{3})}\|(f_{i,j})\|_{L^{2}\to L^{2}}$
		$\displaystyle\quad+\mathfrak{C}_{\beta}(\log(3/\delta_{2}))^{1/2}(\log(1/\delta_{3}))^{1/\beta^{*}}(\log n)^{1/\beta}\max_{1\leq j\leq n}\left\|\left(\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{j}^{\prime}]\right)^{1/2}\right\|_{\psi_{\beta}}$
		$\displaystyle\quad+\mathfrak{C}_{\alpha}(\log(3n/\delta_{1}))^{1/2}(\log(3/\delta_{2}))^{1/\alpha^{*}}(\log(2n))^{1/\alpha}\max_{1\leq i\leq n}\left\|\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{i}]\right)^{1/2}\right\|_{\psi_{\alpha}}$
		$\displaystyle\quad+\mathfrak{C}_{\alpha,\beta}K_{F}K_{G}(\log(2n))^{1/\alpha+1/\beta}(\log(3/\delta_{2}))^{1/\alpha^{*}}(\log(3n/\delta_{1}))^{1/\beta^{*}}.$

Taking $\delta_{1}=\delta_{2}=\delta_{3}=\delta/3$ and integrating over $\delta\in[0,1]$ , this inequality yields the following moment bound

	$\displaystyle\|\mathcal{U}_{n}^{\prime}\|_{p}$	$\displaystyle\leq\mathfrak{C}p^{1/2}\left(\sum_{i\neq j}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})]\right)^{1/2}$
		$\displaystyle\quad+\mathfrak{C}p\|(f_{i,j})\|_{L^{2}\to L^{2}}$
		$\displaystyle\quad+\mathfrak{C}_{\beta}p^{1/2+1/\beta^{*}}(\log n)^{1/\beta}\max_{1\leq j\leq n}\left\|\left(\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{j}^{\prime}]\right)^{1/2}\right\|_{\psi_{\beta}}$
		$\displaystyle\quad+\mathfrak{C}_{\alpha}p^{1/2+1/\alpha^{*}}(\log(2n))^{1/2+1/\alpha}\max_{1\leq i\leq n}\left\|\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{i}]\right)^{1/2}\right\|_{\psi_{\alpha}}$
		$\displaystyle\quad+\mathfrak{C}_{\alpha,\beta}p^{1/\alpha^{*}+1/\beta^{*}}K_{F}K_{G}(\log(2n))^{1/\alpha+1/\beta+1/\beta^{*}}.$

This inequality combined with (16) yields the tail bound for $U_{n}^{D}$ .

A.2 Proof of Lemma 1

From Theorem 3.1.1 of de la Peña and Giné, (1999) and following the arguments similar to those in Theorem 3.5.3 of de la Peña and Giné, (1999), we get for all $p\geq 1$

\left\lVert T_{n}^{D}\right\rVert_{p}\leq 48\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}^{D}(Z_{i},Z_{j}^{\prime})\right\rVert_{p},

where $\varepsilon_{i},\varepsilon_{i}^{\prime},1\leq i\leq n$ are Rademacher random variables independent of $(Z_{i},Z_{i}^{\prime}),1\leq i\leq n$ . Note from (9) that

	$\displaystyle\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}^{D}(Z_{i},Z_{j}^{\prime})$	$\displaystyle=\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})-\varepsilon_{i}\varepsilon_{j}^{\prime}\int f_{i,j}(z,Z_{j}^{\prime})P_{i}(dz)$
		$\displaystyle\quad-\varepsilon_{i}\varepsilon_{j}^{\prime}\int f_{i,j}(Z_{i},z)P_{j}(dz)+\varepsilon_{i}\varepsilon_{j}^{\prime}\iint f_{i,j}(z,z^{\prime})P_{i}(dz)P_{j}(dz^{\prime}).$

Here $P_{i}$ represents the probability measure of $Z_{i}$ for $1\leq i\leq n$ . By Jensen’s inequality, it is clear that for $p\geq 1$ ,

	$\displaystyle\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}\int f_{i,j}(z,Z_{j}^{\prime})P_{i}(dz)\right\rVert_{p}$	$\displaystyle\leq\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})\right\rVert_{p},$
	$\displaystyle\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}\int f_{i,j}(Z_{i},z)P_{j}(dz)\right\rVert_{p}$	$\displaystyle\leq\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})\right\rVert_{p},$
	$\displaystyle\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}\iint f_{i,j}(z,z^{\prime})P_{i}(dz)P_{j}(dz^{\prime})\right\rVert_{p}$	$\displaystyle\leq\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})\right\rVert_{p}.$

Therefore, for $p\geq 1$ ,

\left\lVert T_{n}^{D}\right\rVert_{p}\leq 192\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})\right\rVert_{p}.

Throughout the proofs in all the appendices to follow, we use the notation

\mathcal{Z}_{n}^{\prime}:=\{(Z_{1}^{\prime},\varepsilon_{1}^{\prime}),\ldots,(Z_{n}^{\prime},\varepsilon_{n}^{\prime})\}\quad\mbox{and}\quad\mathcal{Z}_{n}:=\{(Z_{1},\varepsilon_{1}),\ldots,(Z_{n},\varepsilon_{n})\}.

Note that this is different from $\mathcal{Z}_{n}^{\prime}$ and $\mathcal{Z}_{n}$ defined in the main text.

A.3 Proof of Theorem 2

Based on the basic decomposition (14), we get

\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\phi(Z_{i})w_{i,j}(X_{i},X_{j}^{\prime})\psi(Z_{j}^{\prime})\varepsilon_{j}^{\prime}=\mathcal{U}_{n}^{(1)}+\mathcal{U}_{n}^{(2)}+\mathcal{U}_{n}^{(3)}+\mathcal{U}_{n}^{(4)},

where

\begin{split}\mathcal{U}_{n}^{(1)}&:=\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime},\\ \mathcal{U}_{n}^{(2)}&:=\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime},\\ \mathcal{U}_{n}^{(3)}&:=\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}\varepsilon_{j}^{\prime},\\ \mathcal{U}_{n}^{(4)}&:=\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}\varepsilon_{j}^{\prime}.\end{split}

(20)
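The decomposition (20) is an algebraic identity, which the following minimal sketch checks numerically. It assumes that the pieces in (13), which is not reproduced in this appendix, split $\phi(Z_{i})$ and $\psi(Z_{j}^{\prime})$ into a part bounded by a truncation level and the remainder; the levels `T_phi` , `T_psi` , the helper names and all numbers are hypothetical. The Rademacher signs are absorbed into the values `a` and `b` .

```python
def u_statistic(a, w, b):
    """Sum over i != j of a[i] * w[i][j] * b[j]."""
    n = len(a)
    return sum(a[i] * w[i][j] * b[j] for i in range(n) for j in range(n) if i != j)


def split_at(values, level):
    """Return (bounded part, tail part) with values = bounded + tail."""
    bounded = [v if abs(v) <= level else 0.0 for v in values]
    tail = [v - s for v, s in zip(values, bounded)]
    return bounded, tail


# Hypothetical signed values eps_i * phi(Z_i) and psi(Z_j') * eps_j'.
a = [0.5, -3.0, 1.5]
b = [2.0, -0.25, 4.0]
w = [[0.0, 1.0, -1.0], [0.5, 0.0, 2.0], [1.0, -0.5, 0.0]]
T_phi, T_psi = 2.0, 3.0  # hypothetical truncation levels

a1, a2 = split_at(a, T_phi)
b1, b2 = split_at(b, T_psi)

# The four pieces, in the order U^(1), U^(2), U^(3), U^(4) of (20).
pieces = [
    u_statistic(a1, w, b1),
    u_statistic(a2, w, b1),
    u_statistic(a1, w, b2),
    u_statistic(a2, w, b2),
]
total = u_statistic(a, w, b)
```

Only the first piece involves bounded factors on both sides, which is why it can be handled by a bounded-kernel result while the remaining pieces need a separate argument.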

It is easy to verify that $\mathcal{U}_{n}^{(k)},1\leq k\leq 4$ are all degenerate $U$ -statistics. From Theorem 3.2 of Giné et al., (2000), we get that there exists a constant $K>0$ such that for all $p\geq 1$ ,

\left\lVert\mathcal{U}_{n}^{(1)}\right\rVert_{p}\leq K\left[\sqrt{p}A+pB+p^{3/2}C+p^{2}D\right],

where

\begin{split}A&:=\left(\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[\Phi_{i,1}^{2}w_{i,j}^{2}(X_{i},X_{j}^{\prime})\left(\Psi_{j,1}^{\prime}\right)^{2}\right]\right)^{1/2},\\ B&:=\sup\left\{\mathbb{E}\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\xi_{i}(\varepsilon_{i},Z_{i})\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\zeta_{j}(\varepsilon_{j}^{\prime},Z_{j}^{\prime})\varepsilon_{j}^{\prime}:\right.\\ &\qquad\qquad\quad\left.\mathbb{E}\sum_{i=1}^{n}\xi_{i}^{2}(\varepsilon_{i},Z_{i})\leq 1,\,\mathbb{E}\sum_{j=1}^{n}\zeta_{j}^{2}(\varepsilon_{j}^{\prime},Z_{j}^{\prime})\leq 1\right\},\\ C^{p}&:=\mathbb{E}\left(\max_{1\leq i\leq n}\mathbb{E}\left[\sum_{j=1}^{n}\Phi_{i,1}^{2}w_{i,j}^{2}(X_{i},X_{j}^{\prime})\left(\Psi^{\prime}_{j,1}\right)^{2}\big{|}X_{i},Y_{i}\right]\right)^{p/2}\\ &\qquad+\mathbb{E}\left(\max_{1\leq j\leq n}\mathbb{E}\left[\sum_{i=1}^{n}\Phi_{i,1}^{2}w_{i,j}^{2}(X_{i},X_{j}^{\prime})\left(\Psi^{\prime}_{j,1}\right)^{2}\big{|}X_{j}^{\prime},Y_{j}^{\prime}\right]\right)^{p/2},\\ D^{p}&:=\mathbb{E}\left[\max_{1\leq i\neq j\leq n}|\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}|^{p}\right].\end{split}

(21)

It is clear that

\displaystyle A^{2}\leq\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[\phi^{2}(Y_{i})w_{i,j}^{2}(X_{i},X_{j})\psi^{2}(Y_{j})\right]=\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[\sigma_{i,\phi}^{2}(X_{i})w_{i,j}^{2}(X_{i},X_{j})\sigma_{j,\psi}^{2}(X_{j})\right].

The quantity $B$ appears as the square root of the wimpy variance of the supremum of an empirical process; see Boucheron et al., (2013, page 314). Lemma 4 of Section A.4 implies that

	$\displaystyle B$	$\displaystyle\leq\sup\left\{\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j})\sigma_{j,\psi}(X_{j})p_{j}(X_{j})\right]:\right.$
		$\displaystyle\qquad\qquad\quad\left.\sum_{i=1}^{n}\mathbb{E}\left[q_{i}^{2}(X_{i})\right]\leq 1,\,\sum_{j=1}^{n}\mathbb{E}\left[p_{j}^{2}(X_{j})\right]\leq 1\right\}.$

For bounding $C$ , note that

	$\displaystyle\mathbb{E}\left[\sum_{j=1}^{n}\Phi_{i,1}^{2}w_{i,j}^{2}(X_{i},X_{j}^{\prime})\left(\Psi^{\prime}_{j,1}\right)^{2}\big{|}X_{i},Y_{i}\right]$	$\displaystyle\leq T_{\phi}^{2}\sup_{x}\sum_{j=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(x,X_{j})\sigma^{2}_{j,\psi}(X_{j})\right],$
	$\displaystyle\mathbb{E}\left[\sum_{i=1}^{n}\Phi_{i,1}^{2}w_{i,j}^{2}(X_{i},X_{j}^{\prime})\left(\Psi^{\prime}_{j,1}\right)^{2}\big{|}X_{j}^{\prime},Y_{j}^{\prime}\right]$	$\displaystyle\leq T_{\psi}^{2}\sup_{x}\sum_{i=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(X_{i},x)\sigma^{2}_{i,\phi}(X_{i})\right].$

Combining these two inequalities implies that

C\leq T_{\phi}\max_{1\leq i\leq n}\sup_{x}\left(\sum_{j=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(x,X_{j})\sigma^{2}_{j,\psi}(X_{j})\right]\right)^{1/2}+T_{\psi}\max_{1\leq j\leq n}\sup_{x}\left(\sum_{i=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(X_{i},x)\sigma^{2}_{i,\phi}(X_{i})\right]\right)^{1/2}.

Finally, it is clear from assumption (B2) that $D\leq T_{\phi}T_{\psi}B_{w}.$ Combining all these with Theorem 3.2 of Giné et al., (2000) and noting

T_{\phi}\leq K_{\alpha}C_{\phi}(\log n)^{1/\alpha}\quad\mbox{and}\quad T_{\psi}\leq K_{\beta}C_{\psi}(\log n)^{1/\beta},

we get that there exists a constant $K>0$ such that for all $p\geq 1$

\left\lVert\mathcal{U}_{n}^{(1)}\right\rVert_{p}\leq K\left[\sqrt{p}\Lambda_{1/2}+p\Lambda_{1}+p^{3/2}\left\{\Lambda_{3/2}^{(\alpha)}+\Lambda_{3/2}^{(\beta)}\right\}+p^{2}\Lambda_{2}\right].

(22)

To bound $\mathcal{U}_{n}^{(2)}$ and $\mathcal{U}_{n}^{(3)}$ in (20), we use Hoffmann-Jørgensen’s inequality (Proposition 6.8 of Ledoux and Talagrand, (1991)). Observe that

\mathcal{U}_{n}^{(2)}:=\sum_{i=1}^{n}\varepsilon_{i}\Phi_{i,2}g_{i}(X_{i};\mathcal{Z}_{n}^{\prime}),\quad\mbox{where}\quad g_{i}(X_{i};\mathcal{Z}_{n}^{\prime}):=\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}.

With $\mathcal{Z}_{n}^{\prime}:=\{(\varepsilon_{1}^{\prime},Z_{1}^{\prime}),\ldots,(\varepsilon_{n}^{\prime},Z_{n}^{\prime})\}$ and $\mathcal{X}_{n}:=\{X_{1},\ldots,X_{n}\}$ , note that

\mathbb{P}\left(\max_{1\leq I\leq n}\left|\sum_{i=1}^{I}\varepsilon_{i}\Phi_{i,2}g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|>0\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}\right)\leq\mathbb{P}\left(\max_{1\leq i\leq n}|\phi(Z_{i})|\geq T_{\phi}\big{|}\mathcal{X}_{n}\right)\leq 1/8,

and so, by Equation (6.8) of Ledoux and Talagrand, (1991), we get

	$\displaystyle\mathbb{E}\left[\mathcal{U}_{n}^{(2)}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}\right]$	$\displaystyle\leq 8\mathbb{E}\left[\max_{1\leq i\leq n}\left|\Phi_{i,2}\left(g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right)\right|\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}\right]$
		$\displaystyle\leq 8\mathbb{E}\left[\max_{1\leq i\leq n}\left|\phi(Z_{i})\right|\big{|}\mathcal{X}_{n}\right]\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|=T_{\phi}\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|.$

From assumption (B1) and Theorem 6.21 of Ledoux and Talagrand, (1991), we thus get for $0<\alpha\leq 1$ ,

\begin{split}\left\lVert\mathcal{U}_{n}^{(2)}\right\rVert_{\psi_{\alpha}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}&\leq K_{\alpha}\mathbb{E}\left[\mathcal{U}_{n}^{(2)}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}\right]+K_{\alpha}\left\lVert\max_{1\leq i\leq n}\left|\Phi_{i,2}g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|\right\rVert_{\psi_{\alpha}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}\\ &\leq K_{\alpha}\left(T_{\phi}+\left\lVert\max_{1\leq i\leq n}|\phi(Y_{i})|\right\rVert_{\psi_{\alpha}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}\right)\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|\\ &\leq K_{\alpha}C_{\phi}(\log n)^{1/\alpha}\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|,\end{split}

(23)

for some constant $K_{\alpha}$ depending only on $\alpha$ (which may differ from line to line). If $\alpha\geq 1$ , then we get

\left\lVert\mathcal{U}_{n}^{(2)}\right\rVert_{\psi_{\alpha^{*}}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}\leq K_{\alpha}C_{\phi}(\log n)^{1/\alpha}\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|.

See the proof of Theorem 3.3 in Kuchibhotla and Chakrabortty, (2022) for a similar argument. Thus, for $p\geq 1$ ,

\mathbb{E}\left[|\mathcal{U}_{n}^{(2)}|^{p}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}\right]\leq K_{\alpha}^{p}C_{\phi}^{p}(\log n)^{p/\alpha}p^{p/\alpha^{*}}\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|^{p}.

Taking expectations over $(\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime})$ , we get for $p\geq 1$ ,

\mathbb{E}\left[|\mathcal{U}_{n}^{(2)}|^{p}\right]\leq K_{\alpha}^{p}C_{\phi}^{p}(\log n)^{p/\alpha}p^{p/\alpha^{*}}\mathbb{E}\left[\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|^{p}\right].

(24)

To control the right hand side above, recall that

g_{i}(x;\mathcal{Z}_{n}^{\prime})=\sum_{j=1,j\neq i}^{n}w_{i,j}(x,X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime},

is a sum of mean zero independent random variables that are bounded by $B_{w}T_{\psi}$ . Also, note that

\mbox{Var}(g_{i}(x;\mathcal{Z}_{n}^{\prime}))=\sum_{j=1,j\neq i}^{n}\mathbb{E}\left[w_{i,j}^{2}(x,X_{j}^{\prime})\psi^{2}(Z_{j}^{\prime})\right]=\sum_{j=1,j\neq i}^{n}\mathbb{E}[w_{i,j}^{2}(x,X_{j}^{\prime})\sigma_{j,\psi}^{2}(X_{j}^{\prime})].

Therefore by Bernstein’s inequality (Lemma 4 of van de Geer and Lederer, (2013)), we get that

\mathbb{P}\left(\max_{1\leq i\leq n}|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})|-\Upsilon_{\psi}\sqrt{6\log(1+n)}-3B_{w}T_{\psi}\log n\geq\Upsilon_{\psi}\sqrt{t}+3B_{w}T_{\psi}t\right)\leq 2e^{-t},

(25)

where

\Upsilon_{\psi}^{2}:=\max_{x}\sum_{j=1,j\neq i}^{n}\mathbb{E}[w_{i,j}^{2}(x,X_{j}^{\prime})\sigma_{j,\psi}^{2}(X_{j}^{\prime})].

So, by Propositions A.3 and A.4 of Kuchibhotla and Chakrabortty, (2022), we get that for $p\geq 1$ ,

\mathbb{E}\left[\max_{1\leq i\leq n}|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})|^{p}\right]\leq C^{p}\left[(\log n)^{p/2}\Upsilon_{\psi}^{p}+(B_{w}T_{\psi})^{p}(\log n)^{p}+p^{p/2}\Upsilon_{\psi}^{p}+p^{p}(B_{w}T_{\psi})^{p}\right].

Hence for $p\geq 1$ ,

\begin{split}\mathbb{E}\left[|\mathcal{U}_{n}^{(2)}|^{p}\right]&\leq K_{\alpha}^{p}C_{\phi}^{p}(\log n)^{p/\alpha}p^{p/\alpha^{*}}\left[(\log n)^{p/2}\Upsilon_{\psi}^{p}+(B_{w}T_{\psi})^{p}(\log n)^{p}\right]\\ &\quad+K_{\alpha}^{p}C_{\phi}^{p}(\log n)^{p/\alpha}p^{p/\alpha^{*}}\left[p^{p/2}\Upsilon_{\psi}^{p}+p^{p}(B_{w}T_{\psi})^{p}\right].\end{split}

(26)

A similar calculation for $\mathcal{U}_{n}^{(3)}$ shows that for $p\geq 1$ ,

\begin{split}\mathbb{E}\left[|\mathcal{U}_{n}^{(3)}|^{p}\right]&\leq K_{\beta}^{p}C_{\psi}^{p}(\log n)^{p/\beta}p^{p/\beta^{*}}\left[(\log n)^{p/2}\Upsilon_{\phi}^{p}+(B_{w}T_{\phi})^{p}(\log n)^{p}\right]\\ &\quad+K_{\beta}^{p}C_{\psi}^{p}(\log n)^{p/\beta}p^{p/\beta^{*}}\left[p^{p/2}\Upsilon_{\phi}^{p}+p^{p}(B_{w}T_{\phi})^{p}\right],\end{split}

(27)

where

\Upsilon_{\phi}^{2}:=\max_{x}\sum_{i=1,i\neq j}^{n}\mathbb{E}[w_{i,j}^{2}(X_{i},x)\sigma_{i,\phi}^{2}(X_{i})].

To control $\mathcal{U}_{n}^{(4)}$ , recall that

\mathcal{U}_{n}^{(4)}=\sum_{i=1}^{n}\varepsilon_{i}\Phi_{i,2}\left(\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,2}^{\prime}\varepsilon_{j}^{\prime}\right).

Following the arguments leading to (23), we have

\left\lVert\mathcal{U}_{n}^{(4)}\right\rVert_{\psi_{\alpha^{*}}|\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}\leq K_{\alpha}C_{\phi}(\log n)^{1/\alpha}\max_{1\leq i\leq n}\left|\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,2}^{\prime}\varepsilon_{j}^{\prime}\right|.

Conditioning on $\mathcal{X}_{n},\mathcal{X}_{n}^{\prime}$ , the right hand side satisfies the hypothesis of (6.8) of Ledoux and Talagrand, (1991) and so by Theorem 6.21 of Ledoux and Talagrand, (1991), we get

\begin{split}\left\lVert\max_{1\leq i\leq n}\left|\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,2}^{\prime}\varepsilon_{j}^{\prime}\right|\right\rVert_{\psi_{\beta^{*}}|\mathcal{X}_{n},\mathcal{X}_{n}^{\prime}}&\leq K_{\beta}\left\lVert\max_{1\leq i\neq j\leq n}\left|w_{i,j}(X_{i},X_{j}^{\prime})\psi(Z_{j}^{\prime})\right|\right\rVert_{\psi_{\beta}|\mathcal{X}_{n},\mathcal{X}_{n}^{\prime}}\\ &\leq K_{\beta}B_{w}(\log n)^{1/\beta}C_{\psi},\end{split}

for some constant $K_{\beta}$ depending only on $\beta$ . Therefore, for $p\geq 1$ ,

\mathbb{E}\left[|\mathcal{U}_{n}^{(4)}|^{p}\right]\leq K^{p}C_{\psi}^{p}C_{\phi}^{p}p^{p(1/\alpha^{*}+1/\beta^{*})}(\log n)^{p(\alpha^{-1}+\beta^{-1})}B_{w}^{p}\leq K^{p}p^{p(1/\alpha^{*}+1/\beta^{*})}\Lambda_{2}^{p},

(28)

for some constant $K>0$ .

Combining bounds (26) and (27), we get that for some constant $K>0$ and for all $p\geq 1$ ,

\begin{split}\left\lVert\mathcal{U}_{n}^{(2)}+\mathcal{U}_{n}^{(3)}\right\rVert_{p}&\leq Kp^{1/\alpha^{*}}(\log n)^{1/2}\Lambda_{3/2}^{(\alpha)}+Kp^{1/\beta^{*}}(\log n)^{1/2}\Lambda_{3/2}^{(\beta)}\\ &\qquad+Kp^{1/2+1/\alpha^{*}}\Lambda_{3/2}^{(\alpha)}+Kp^{1/2+1/\beta^{*}}\Lambda_{3/2}^{(\beta)}\\ &\qquad+K(\log n)\Lambda_{2}[p^{1/\alpha^{*}}+p^{1/\beta^{*}}]+K\Lambda_{2}[p^{1+1/\alpha^{*}}+p^{1+1/\beta^{*}}].\end{split}

Combining this inequality with (22) and (28), we get for all $p\geq 1$

\begin{split}\left\lVert\sum_{\ell=1}^{4}\mathcal{U}_{n}^{(\ell)}\right\rVert_{p}&\leq K\left[p^{1/2}\Lambda_{1/2}+p\Lambda_{1}+p^{3/2}\left\{\Lambda_{3/2}^{(\alpha)}+\Lambda_{3/2}^{(\beta)}\right\}+p^{2}\Lambda_{2}\right]\\ &\quad+Kp^{1/\alpha^{*}}(\log n)^{1/2}\Lambda_{3/2}^{(\alpha)}+Kp^{1/\beta^{*}}(\log n)^{1/2}\Lambda_{3/2}^{(\beta)}\\ &\quad+Kp^{1/2+1/\alpha^{*}}\Lambda_{3/2}^{(\alpha)}+Kp^{1/2+1/\beta^{*}}\Lambda_{3/2}^{(\beta)}\\ &\quad+K(\log n)\Lambda_{2}[p^{1/\alpha^{*}}+p^{1/\beta^{*}}]+K\Lambda_{2}[p^{1+1/\alpha^{*}}+p^{1+1/\beta^{*}}]\\ &\quad+Kp^{1/\alpha^{*}+1/\beta^{*}}\Lambda_{2}.\end{split}

Since $\alpha^{*}\leq 1$ and $\beta^{*}\leq 1$ , we have

\min\{p^{1/2+1/\alpha^{*}},p^{1/2+1/\beta^{*}}\}\geq p^{3/2}\mbox{ and }\min\{p^{1+1/\alpha^{*}},p^{1+1/\beta^{*}},p^{1/\alpha^{*}+1/\beta^{*}}\}\geq p^{2}.

Using these inequalities, the bound above can be simplified as

\begin{split}\left\lVert\sum_{\ell=1}^{4}\mathcal{U}_{n}^{(\ell)}\right\rVert_{p}&\leq Kp^{1/2}\Lambda_{1/2}+Kp\Lambda_{1}\\ &\quad+Kp^{1/\alpha^{*}}\left[(\log n)^{1/2}\Lambda_{3/2}^{(\alpha)}+(\log n)\Lambda_{2}\right]+Kp^{1/\beta^{*}}\left[(\log n)^{1/2}\Lambda_{3/2}^{(\beta)}+(\log n)\Lambda_{2}\right]\\ &\quad+Kp^{1/2+1/\alpha^{*}}\Lambda_{3/2}^{(\alpha)}+Kp^{1/2+1/\beta^{*}}\Lambda_{3/2}^{(\beta)}\\ &\quad+Kp^{1/\alpha^{*}+1/\beta^{*}}\Lambda_{2}.\end{split}

Here the constant $K>0$ depends only on $\alpha,\beta$ . This completes the proof based on Lemma 1.
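The exponent comparison used in the simplification above can be spot-checked numerically. The following is only a minimal sketch on a small grid; the function name `exponent_gaps` and the grid values are illustrative assumptions and not part of the paper.

```python
import itertools

def exponent_gaps(p, alpha_star, beta_star):
    """Return the two differences that the simplification step needs to be >= 0."""
    # min{p^(1/2 + 1/alpha*), p^(1/2 + 1/beta*)} - p^(3/2)
    half = min(p ** (0.5 + 1 / alpha_star), p ** (0.5 + 1 / beta_star)) - p ** 1.5
    # min{p^(1 + 1/alpha*), p^(1 + 1/beta*), p^(1/alpha* + 1/beta*)} - p^2
    full = min(p ** (1 + 1 / alpha_star),
               p ** (1 + 1 / beta_star),
               p ** (1 / alpha_star + 1 / beta_star)) - p ** 2
    return half, full

# Hypothetical grid: p >= 1 and alpha*, beta* in (0, 1].
grid_p = [1.0, 1.5, 2.0, 5.0, 10.0]
grid_star = [0.25, 0.5, 0.75, 1.0]
worst = min(min(exponent_gaps(p, a, b))
            for p, a, b in itertools.product(grid_p, grid_star, grid_star))
print(worst)  # smallest gap over the grid; it should not be negative
```

Equality is attained when $p=1$ or $\alpha^{*}=\beta^{*}=1$, which is why the smallest gap on the grid is zero rather than strictly positive.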

A.4 Auxiliary Lemmas Used in Theorem 2

The two lemmas to follow in this section provide explicit (but not necessarily optimal) constants for Equations (3.1) and (2.6) of Giné et al., (2000). These lemmas can be used in the proof of Theorem 3.2 of Giné et al., (2000) to get explicit constants. In this respect, we note that Theorem 3.4.8 of Giné and Nickl, (2016) (which was first proved in Houdré and Reynaud-Bouret, (2003)) does not imply Theorem 3.2 of Giné et al., (2000) since the result of Giné et al., (2000) applies for unbounded kernels in $U$ -statistics while the result of Giné and Nickl, (2016) applies exclusively for bounded kernel $U$ -statistics.

Lemma 2.

Suppose $Z_{1},\ldots,Z_{n}$ are independent mean zero random variables. Then for $p\geq 1$ ,

\mathbb{E}\left[\left|\sum_{i=1}^{n}Z_{i}\right|^{p}\right]\leq 4^{p}p^{p/2}\left(\sum_{i=1}^{n}\mathbb{E}\left[Z_{i}^{2}\right]\right)^{p/2}+4^{p}p^{p}\mathbb{E}\left[\max_{1\leq i\leq n}|Z_{i}|^{p}\right].

Proof.

By Theorem 7 of Boucheron et al., (2005), we get for $p\geq 2$ ,

\mathbb{E}\left[\left|\sum_{i=1}^{n}Z_{i}\right|^{p}\right]\leq 2^{p+1}\left(\frac{2p}{e-\sqrt{e}}\right)^{p/2}\mathbb{E}\left[\left(\sum_{i=1}^{n}Z_{i}^{2}\right)^{p/2}\right].

By Theorem 8 of Boucheron et al., (2005), we get for $p\geq 2$ ,

\mathbb{E}\left[\left(\sum_{i=1}^{n}Z_{i}^{2}\right)^{p/2}\right]\leq 3^{p/2}\left(\sum_{i=1}^{n}\mathbb{E}\left[Z_{i}^{2}\right]\right)^{p/2}+\left(\frac{3p\kappa}{2}\right)^{p/2}\mathbb{E}\left[\max_{1\leq i\leq n}|Z_{i}|^{p}\right],

for $\kappa=0.5\sqrt{e}/(\sqrt{e}-1)$ . Thus for $p\geq 2$ ,

\mathbb{E}\left[\left|\sum_{i=1}^{n}Z_{i}\right|^{p}\right]\leq 4^{p}p^{p/2}\left(\sum_{i=1}^{n}\mathbb{E}\left[Z_{i}^{2}\right]\right)^{p/2}+4^{p}p^{p}\mathbb{E}\left[\max_{1\leq i\leq n}|Z_{i}|^{p}\right].

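For $1\leq p<2$ , Jensen’s inequality gives $\mathbb{E}[|\sum_{i=1}^{n}Z_{i}|^{p}]\leq(\sum_{i=1}^{n}\mathbb{E}[Z_{i}^{2}])^{p/2}$ , so the inequality holds trivially in this range as well, and the result follows. ∎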
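As a sanity check of the statement of Lemma 2 (not of its proof), one can compare both sides exactly for Rademacher summands, where $\mathbb{E}[Z_{i}^{2}]=1$ and $\max_{i}|Z_{i}|=1$. The sketch below is an illustration under that assumed distribution; the helper names are hypothetical.

```python
from math import comb

def rademacher_abs_moment(n, p):
    """Exact E|eps_1 + ... + eps_n|^p for i.i.d. Rademacher signs."""
    # The sum equals 2k - n when exactly k of the n signs are +1.
    return sum(comb(n, k) * abs(2 * k - n) ** p for k in range(n + 1)) / 2 ** n

def lemma2_rhs(n, p):
    """Right-hand side of Lemma 2 when E[Z_i^2] = 1 and max_i |Z_i| = 1."""
    return 4 ** p * p ** (p / 2) * n ** (p / 2) + 4 ** p * p ** p

# Hypothetical grid of sample sizes and moment orders.
checks = [(n, p, rademacher_abs_moment(n, p), lemma2_rhs(n, p))
          for n in (1, 5, 20, 100) for p in (1, 1.5, 2, 3, 6)]
all_hold = all(lhs <= rhs for _, _, lhs, rhs in checks)
print(all_hold)
```

For this distribution the bound is far from tight, which is consistent with the lemma aiming at explicit rather than optimal constants.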

Lemma 3.

Suppose $\xi_{i},1\leq i\leq n$ are independent random variables. Then for $p\geq 1$ and $\alpha>0$ ,

p^{p\alpha}\sum_{i=1}^{n}\mathbb{E}\left[\left|\xi_{i}\right|^{p}\right]\leq 4(1.5)^{p\alpha}p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2(1.5)^{p\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}.

Proof.

Fix $p\geq 1$ . Define $\delta_{0}\geq 0$ by

\delta_{0}:=\inf\left\{t>0:\,\sum_{i=1}^{n}\mathbb{P}\left(|\xi_{i}|>t\right)\leq 1\right\}.

By (1.4.4) of de la Peña and Giné, (1999), it follows that

\frac{1}{2}\max\left\{\delta_{0}^{p},\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\mathbbm{1}_{\{|\xi_{i}|>\delta_{0}\}}\right]\right\}\leq\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right].

(29)

Observe that

\begin{split}\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\right]&=\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\mathbbm{1}_{\{|\xi_{i}|>\delta_{0}\}}\right]+\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\mathbbm{1}_{\{|\xi_{i}|\leq\delta_{0}\}}\right]\\ &\overset{(a)}{\leq}2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\mathbbm{1}_{\{|\xi_{i}|\leq\delta_{0}\}}\right]\\ &\leq 2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+\delta_{0}^{p-1}\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\mathbbm{1}_{\{|\xi_{i}|\leq\delta_{0}\}}\right]\\ &\overset{(a)}{\leq}2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p-1}\right]\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\mathbbm{1}_{\{|\xi_{i}|\leq\delta_{0}\}}\right]\right)\\ &\leq 2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p-1}\right]\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right).\end{split}

Inequality (a) follows from (29). To prove the result now, we consider two cases:

–

Case 1: If

p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\leq\left(\sum_{i=1}^{n}\mathbb{E}[|\xi_{i}|]\right)^{p},

then

\begin{split}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p-1}\right]\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)&\leq\left(\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\right)^{(p-1)/p}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)\\ &\leq\frac{1}{p^{(p-1)\alpha}}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p-1}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)\\ &=\frac{1}{p^{(p-1)\alpha}}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}.\end{split}

Therefore (in case 1),

p^{p\alpha}\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\right]\leq 2p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2p^{\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}.

(30)

–

Case 2: If

p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\geq\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p},

then

\begin{split}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p-1}\right]\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)&\leq\left(\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\right)^{(p-1)/p}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)\\ &\leq\left(\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\right)^{(p-1)/p}p^{\alpha}\left(\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\right)^{1/p}\\ &=p^{\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right].\end{split}

Therefore (in case 2),

\begin{split}p^{p\alpha}\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\right]&\leq 2p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2p^{\alpha(p+1)}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\\ &\leq 2p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2p^{p\alpha}\left(e^{1/e}\right)^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\\ &\leq(2+2(1.5)^{p\alpha})p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right].\end{split}

(31)

Combining inequalities (30) and (31), we get for $p\geq 1$ and $\alpha>0$ that

\begin{split}p^{p\alpha}\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\right]&\leq(2+2(1.5)^{p\alpha})p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2p^{\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}\\ &\leq 4(1.5)^{p\alpha}p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2p^{\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}\\ &=4(1.5)^{p\alpha}p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2(p^{1/p})^{p\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}\\ &\leq 4(1.5)^{p\alpha}p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2(1.5)^{p\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}.\end{split}

This proves the result. ∎
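The two elementary facts used in the proof, $p^{1/p}\leq e^{1/e}\leq 1.5$ for $p\geq 1$, and the final inequality of Lemma 3 itself can be checked numerically in a simple special case. The sketch below assumes i.i.d. Bernoulli($q$) variables, for which every quantity in the lemma has a closed form; the function name and the grid are illustrative assumptions.

```python
from math import e

def lemma3_sides(n, q, p, alpha):
    """Both sides of Lemma 3 for xi_1, ..., xi_n i.i.d. Bernoulli(q)."""
    # For 0/1 variables |xi|^p = xi, and max_i |xi_i|^p is the indicator
    # that at least one xi_i equals 1.
    sum_pth_moments = n * q
    exp_max = 1 - (1 - q) ** n
    sum_first_moments = n * q
    lhs = p ** (p * alpha) * sum_pth_moments
    rhs = (4 * 1.5 ** (p * alpha) * p ** (p * alpha) * exp_max
           + 2 * 1.5 ** (p * alpha) * sum_first_moments ** p)
    return lhs, rhs

# Hypothetical grid over (n, q, p, alpha).
grid = [(n, q, p, a) for n in (1, 10, 1000) for q in (0.001, 0.1, 0.5, 1.0)
        for p in (1, 2, 4) for a in (0.5, 1.0, 2.0)]
lemma3_holds = all(lhs <= rhs for lhs, rhs in (lemma3_sides(*g) for g in grid))

# p^(1/p) is maximised at p = e, and e^(1/e) is about 1.4447.
constants_ok = all(p ** (1 / p) <= e ** (1 / e) <= 1.5 for p in (1, 2, e, 3, 10, 100))
print(lemma3_holds, constants_ok)
```

The Bernoulli case is convenient because it covers both regimes of the proof: small $nq$ corresponds to Case 2 and large $nq$ to Case 1.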

Lemma 4.

Under the notation of Theorem 2, the quantity $B$ defined in (21) satisfies

\begin{split}B&\leq\sup\left\{\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j})\sigma_{j,\psi}(X_{j})p_{j}(X_{j})\right]:\right.\\ &\qquad\qquad\quad\left.\sum_{i=1}^{n}\mathbb{E}\left[q_{i}^{2}(X_{i})\right]\leq 1,\,\sum_{j=1}^{n}\mathbb{E}\left[p_{j}^{2}(X_{j})\right]\leq 1\right\}.\end{split}

Proof.

Following the proof of Theorem 3.2 of Giné et al., (2000), the quantity $B$ is the square root of the wimpy variance of

S_{n}:=\left(\sum_{i=1}^{n}\mathbb{E}\left[F_{i}^{2}(\varepsilon_{i},Z_{i};\mathcal{Z}_{n}^{\prime})\big{|}\mathcal{Z}_{n}^{\prime}\right]\right)^{1/2},

where $\mathcal{Z}_{n}^{\prime}:=\{(\varepsilon_{1}^{\prime},Z_{1}^{\prime}),\ldots,(\varepsilon_{n}^{\prime},Z_{n}^{\prime})\}$ and

F_{i}(\varepsilon_{i},Z_{i};\mathcal{Z}_{n}^{\prime}):=\varepsilon_{i}\Phi_{i,1}\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime}.

This implies that

S_{n}\leq\left(\sum_{i=1}^{n}\mathbb{E}\left[G_{i}^{2}(X_{i};\mathcal{Z}_{n}^{\prime})\big{|}\mathcal{Z}_{n}^{\prime}\right]\right)^{1/2},

where for $\sigma_{i,\phi}^{2}(x):=\mathbb{E}\left[\phi^{2}(Y_{i})\big{|}X_{i}=x\right]$ ,

G_{i}(X_{i};\mathcal{Z}_{n}^{\prime}):=\sigma_{i,\phi}(X_{i})\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime}.

Note that $\sigma_{i,\phi}(\cdot)$ depends on $i$ since the random variables are allowed to be non-identically distributed. Now observe that

S_{n}=\sup\left\{\sum_{i=1}^{n}\int q_{i}(x)G_{i}(x;\mathcal{Z}_{n}^{\prime})P_{X_{i}}(dx):\,\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1\right\}.

(32)

To prove this, note that for any $\{q_{i}(\cdot):\,1\leq i\leq n\}$ satisfying the (integral) constraint,

	$\displaystyle\sum_{i=1}^{n}$	$\displaystyle\int q_{i}(x)G_{i}(x;\mathcal{Z}_{n}^{\prime})P_{X_{i}}(dx)$
		$\displaystyle\leq\sum_{i=1}^{n}\left(\int q_{i}^{2}(x)P_{X_{i}}(dx)\right)^{1/2}\left(\int G_{i}^{2}(x;\mathcal{Z}_{n}^{\prime})P_{X_{i}}(dx)\right)^{1/2}$
		$\displaystyle\leq\left(\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\right)^{1/2}\left(\sum_{i=1}^{n}\int G_{i}^{2}(x;\mathcal{Z}_{n}^{\prime})P_{X_{i}}(dx)\right)^{1/2}\leq S_{n}.$

To prove the reverse inequality, define for $1\leq i\leq n$ ,

q_{i}(x):=G_{i}(x;\mathcal{Z}_{n}^{\prime})\left(\sum_{k=1}^{n}\int G_{k}^{2}(y;\mathcal{Z}_{n}^{\prime})P_{X_{k}}(dy)\right)^{-1/2}.

It is clear that $\{q_{i}(\cdot):\,1\leq i\leq n\}$ satisfy the integral constraint in (32) and

\sum_{i=1}^{n}\int q_{i}(x)G_{i}(x;\mathcal{Z}_{n}^{\prime})P_{X_{i}}(dx)=S_{n}.

This completes the proof of (32). Rewriting the representation (32), we get

S_{n}=\sup_{\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1}\,\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi^{\prime}_{j,1}\left(\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j}^{\prime})P_{X_{i}}(dx)\right).

This representation shows that $S_{n}$ is indeed the supremum of an empirical process. The wimpy variance of this supremum is given by

	$\displaystyle\sup_{\{q_{i}(\cdot)\}}$	$\displaystyle\mbox{Var}\left(\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi^{\prime}_{j,1}\left(\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j}^{\prime})P_{X_{i}}(dx)\right)\right)$
		$\displaystyle\leq\sup_{\{q_{i}(\cdot)\}}\sum_{j=1}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j}^{\prime})\left(\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j}^{\prime})P_{X_{i}}(dx)\right)^{2}\right]$
		$\displaystyle=\sup_{\{q_{i}(\cdot)\}}\sum_{j=1}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j})\left(\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j})P_{X_{i}}(dx)\right)^{2}\right].$

Now a duality argument implies that

\begin{split}&\sup_{\{q_{i}(\cdot)\}}\left(\sum_{j=1}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j})\left(\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j})P_{X_{i}}(dx)\right)^{2}\right]\right)^{1/2}\\ &\qquad=\sup\left\{\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j})\sigma_{j,\psi}(X_{j})p_{j}(X_{j})\right]:\right.\\ &\qquad\qquad\quad\left.\sum_{i=1}^{n}\mathbb{E}\left[q_{i}^{2}(X_{i})\right]\leq 1,\,\sum_{j=1}^{n}\mathbb{E}\left[p_{j}^{2}(X_{j})\right]\leq 1\right\}.\end{split}

Thus the result follows. ∎
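The variational representation (32) is the usual $L_{2}$ duality: the supremum of a linear functional over the unit ball equals the norm, and it is attained at the normalised element. A finite-dimensional sketch, in which the measures $P_{X_{i}}$ are replaced by a few atoms with hypothetical masses and values, illustrates both directions of the argument.

```python
import random
from math import sqrt

def pairing(q, G, mass):
    """Discrete analogue of sum_i int q_i(x) G_i(x) P_{X_i}(dx)."""
    return sum(qk * gk * mk for qk, gk, mk in zip(q, G, mass))

def l2_norm(G, mass):
    """Discrete analogue of (sum_i int G_i^2(x) P_{X_i}(dx))^{1/2}."""
    return sqrt(pairing(G, G, mass))

# Hypothetical toy data: four atoms with their masses and the values of G on them.
mass = [0.1, 0.2, 0.3, 0.4]
G = [1.5, -2.0, 0.5, 3.0]

S = l2_norm(G, mass)
q_star = [gk / S for gk in G]          # normalised maximiser, feasible with equality
attained = pairing(q_star, G, mass)

# Random feasible directions never beat the normalised maximiser (Cauchy-Schwarz).
random.seed(0)
best_random = float("-inf")
for _ in range(2000):
    q = [random.gauss(0.0, 1.0) for _ in G]
    scale = l2_norm(q, mass)
    q = [qk / scale for qk in q]       # rescale onto the constraint boundary
    best_random = max(best_random, pairing(q, G, mass))

print(S, attained, best_random)
```

The same duality is applied a second time in the last step of the proof, with $p_{j}$ playing the role of $q_{i}$.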

Appendix B Proofs of Results in Section 3

B.1 Proof of Theorem 3

Similar to $\mathcal{U}_{n}^{(\ell)},1\leq\ell\leq 4$ defined in the proof of Theorem 2, we define

\begin{split}\mathcal{U}_{n}^{(1)}(\mathcal{W})&:=\sup_{w\in\mathcal{W}}\left|\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime}\right|,\\ \mathcal{U}_{n}^{(2)}(\mathcal{W})&:=\sup_{w\in\mathcal{W}}\left|\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime}\right|,\\ \mathcal{U}_{n}^{(3)}(\mathcal{W})&:=\sup_{w\in\mathcal{W}}\left|\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}\varepsilon_{j}^{\prime}\right|,\\ \mathcal{U}_{n}^{(4)}(\mathcal{W})&:=\sup_{w\in\mathcal{W}}\left|\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}\varepsilon_{j}^{\prime}\right|.\end{split}

(33)

As in the proof of Theorem 2, we will control each of the terms separately in the following lemmas. All the lemmas below assume (B1) and (A2′).

Lemma 5 (Control of $\mathcal{U}_{n}^{(4)}(\mathcal{W})$ ).

There exists a constant $K>0$ (depending only on $\alpha,\beta$ ) such that for all $p\geq 1$ ,

\left\lVert\mathcal{U}_{n}^{(4)}(\mathcal{W})\right\rVert_{p}\leq K\Lambda_{2}(\mathcal{W})p^{1/\alpha^{*}+1/\beta^{*}}.

Proof.

Since $\left\lVert w_{i,j}\right\rVert_{\infty}\leq B_{\mathcal{W}}$ for all $w\in\mathcal{W}$ , it follows that

\mathcal{U}_{n}^{(4)}(\mathcal{W})\leq B_{\mathcal{W}}\sum_{1\leq i\neq j\leq n}|\Phi_{i,2}\Psi_{j,2}^{\prime}|\leq B_{\mathcal{W}}\left(\sum_{i=1}^{n}|\Phi_{i,2}|\right)\left(\sum_{j=1}^{n}|\Psi_{j,2}^{\prime}|\right).

By definition

\begin{split}\mathbb{P}\left(\max_{1\leq I\leq n}\sum_{i=1}^{I}|\Phi_{i,2}|>0\,\big|\,\mathcal{X}_{n}\right)&\leq\mathbb{P}\left(\max_{1\leq i\leq n}|\phi(Z_{i})|\geq T_{\phi}\,\big|\,\mathcal{X}_{n}\right)\leq 1/8,\\ \mathbb{P}\left(\max_{1\leq I\leq n}\sum_{i=1}^{I}|\Psi_{i,2}^{\prime}|>0\,\big|\,\mathcal{X}_{n}^{\prime}\right)&\leq\mathbb{P}\left(\max_{1\leq i\leq n}|\psi(Z_{i}^{\prime})|\geq T_{\psi}\,\big|\,\mathcal{X}_{n}^{\prime}\right)\leq 1/8.\end{split}

Hence by (6.8) of Ledoux and Talagrand, (1991), we get that

\begin{split}\mathbb{E}\left[\sum_{i=1}^{n}|\Phi_{i,2}|\,\big|\,\mathcal{X}_{n}\right]&\leq C\mathbb{E}\left[\max_{1\leq i\leq n}|\phi(Z_{i})|\,\big|\,\mathcal{X}_{n}\right],\\ \mathbb{E}\left[\sum_{i=1}^{n}|\Psi_{i,2}^{\prime}|\,\big|\,\mathcal{X}_{n}^{\prime}\right]&\leq C\mathbb{E}\left[\max_{1\leq i\leq n}|\psi(Z_{i}^{\prime})|\,\big|\,\mathcal{X}_{n}^{\prime}\right],\end{split}

for some constant $C>0$ . Thus by applying Theorem 6.21 of Ledoux and Talagrand, (1991) to $\sum\{|\Phi_{i,2}|-\mathbb{E}[|\Phi_{i,2}||\mathcal{X}_{n}]\}$ and $\sum\{|\Psi_{i,2}^{\prime}|-\mathbb{E}[|\Psi_{i,2}^{\prime}||\mathcal{X}_{n}^{\prime}]\}$ , we get

\begin{split}\left\lVert\sum_{i=1}^{n}|\Phi_{i,2}|\right\rVert_{\psi_{\alpha^{*}}|\mathcal{X}_{n}}&\leq C\left\lVert\max_{1\leq i\leq n}|\phi(Z_{i})|\right\rVert_{\psi_{\alpha}|\mathcal{X}_{n}}\leq CC_{\phi}(\log n)^{1/\alpha},\\ \left\lVert\sum_{i=1}^{n}|\Psi_{i,2}^{\prime}|\right\rVert_{\psi_{\beta^{*}}|\mathcal{X}_{n}^{\prime}}&\leq C\left\lVert\max_{1\leq i\leq n}|\psi(Z_{i}^{\prime})|\right\rVert_{\psi_{\beta}|\mathcal{X}_{n}^{\prime}}\leq CC_{\psi}(\log n)^{1/\beta}.\end{split}

Therefore, for all $p\geq 1$ ,

\left\lVert\mathcal{U}_{n}^{(4)}(\mathcal{W})\right\rVert_{p}\leq KB_{\mathcal{W}}C_{\phi}C_{\psi}(\log n)^{\alpha^{-1}+\beta^{-1}}p^{1/\alpha^{*}+1/\beta^{*}}=K\Lambda_{2}(\mathcal{W})p^{1/\alpha^{*}+1/\beta^{*}}.

This completes the proof. ∎
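The first display in the proof above rests on the elementary fact that an off-diagonal double sum of nonnegative products is at most the product of the two full sums. A minimal numerical sketch, with randomly generated stand-ins for $\Phi_{i,2}$ and $\Psi_{j,2}^{\prime}$ (the values and the seed are arbitrary assumptions), makes the slack explicit: it is exactly the diagonal contribution.

```python
import random

random.seed(1)
n = 6
# Hypothetical stand-ins for the truncated parts Phi_{i,2} and Psi'_{j,2}.
phi = [random.gauss(0.0, 1.0) for _ in range(n)]
psi = [random.gauss(0.0, 1.0) for _ in range(n)]

# Sum over ordered pairs i != j of |phi_i * psi_j|.
off_diag = sum(abs(phi[i] * psi[j]) for i in range(n) for j in range(n) if i != j)
# Diagonal terms that the product of sums additionally contains.
diagonal = sum(abs(phi[i] * psi[i]) for i in range(n))
# Product of the two absolute sums, the upper bound used in the proof.
product = sum(abs(a) for a in phi) * sum(abs(b) for b in psi)

print(off_diag, product)
```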

The following lemma controls the moments of $\mathcal{U}_{n}^{(2)}(\mathcal{W})$ and $\mathcal{U}_{n}^{(3)}(\mathcal{W})$ .

Lemma 6 (Control of $\mathcal{U}_{n}^{(2)}(\mathcal{W})$ and $\mathcal{U}_{n}^{(3)}(\mathcal{W})$ ).

There exists a constant $K>0$ (depending only on $\alpha,\beta$ ) such that for $p\geq 1$ ,

\begin{split}\left\lVert\mathcal{U}_{n}^{(2)}(\mathcal{W})\right\rVert_{p}&\leq Kp^{1/\alpha^{*}}\left[E_{n,2}(\mathcal{W})+(\log n)^{1/2}\Sigma_{n,2}^{1/2}(\mathcal{W})+(\log n)\Lambda_{2}(\mathcal{W})\right]\\ &\qquad+Kp^{1/2+1/\alpha^{*}}\Sigma_{n,2}^{1/2}(\mathcal{W})+Kp^{1+1/\alpha^{*}}\Lambda_{2}(\mathcal{W}),\\ \left\lVert\mathcal{U}_{n}^{(3)}(\mathcal{W})\right\rVert_{p}&\leq Kp^{1/\beta^{*}}\left[E_{n,1}(\mathcal{W})+(\log n)^{1/2}\Sigma_{n,1}^{1/2}(\mathcal{W})+(\log n)\Lambda_{2}(\mathcal{W})\right]\\ &\qquad+Kp^{1/2+1/\beta^{*}}\Sigma_{n,1}^{1/2}(\mathcal{W})+Kp^{1+1/\beta^{*}}\Lambda_{2}(\mathcal{W}).\end{split}

Proof.

We will only prove the bound for $\mathcal{U}_{n}^{(2)}(\mathcal{W})$ and the proof for $\mathcal{U}_{n}^{(3)}(\mathcal{W})$ follows very similar arguments. Recall that

\mathcal{U}_{n}^{(2)}(\mathcal{W})=\sup_{w\in\mathcal{W}}\left|\sum_{i=1}^{n}\varepsilon_{i}\Phi_{i,2}g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w)\right|,\;\mbox{where}\;g_{i}(x;\mathcal{Z}_{n}^{\prime},w):=\sum_{j=1,j\neq i}^{n}\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}w_{i,j}(x,X_{j}^{\prime}).

Here again (6.8) of Ledoux and Talagrand, (1991) applies and we get

\left\lVert\mathcal{U}_{n}^{(2)}(\mathcal{W})\right\rVert_{\psi_{\alpha^{*}}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}\leq KC_{\phi}(\log n)^{1/\alpha}\max_{1\leq i\leq n}\sup_{w\in\mathcal{W}}\left|\sum_{j=1,j\neq i}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}w_{i,j}(X_{i},X_{j}^{\prime})\right|.

By a similar calculation, we get

\left\lVert\mathcal{U}_{n}^{(3)}(\mathcal{W})\right\rVert_{\psi_{\beta^{*}}\big{|}\mathcal{X}_{n}^{\prime},\mathcal{Z}_{n}}\leq KC_{\psi}(\log n)^{1/\beta}\max_{1\leq j\leq n}\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\right|.

Thus, for $p\geq 1$ ,

\begin{split}\mathbb{E}\left[|\mathcal{U}_{n}^{(2)}(\mathcal{W})|^{p}\right]&\leq K^{p}C_{\phi}^{p}(\log n)^{p/\alpha}p^{p/\alpha^{*}}\mathbb{E}\left[\max_{1\leq i\leq n}\sup_{w\in\mathcal{W}}\left|\sum_{j=1,j\neq i}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}w_{i,j}(X_{i},X_{j}^{\prime})\right|^{p}\right],\\ \mathbb{E}\left[|\mathcal{U}_{n}^{(3)}(\mathcal{W})|^{p}\right]&\leq K^{p}C_{\psi}^{p}(\log n)^{p/\beta}p^{p/\beta^{*}}\mathbb{E}\left[\max_{1\leq j\leq n}\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\right|^{p}\right].\end{split}

(34)

The right hand side quantities involve suprema of bounded empirical processes, for which Talagrand’s inequality applies; see Proposition 3.1 of Giné et al., (2000). Observe that for any $x\in\mathfrak{X}$ ,

\begin{split}\max_{1\leq j\leq n}\sup_{w\in\mathcal{W}}|\Psi_{j,1}^{\prime}w_{i,j}(x,X_{j}^{\prime})|&\leq C_{\psi}(\log n)^{1/\beta}B_{\mathcal{W}},\\ \max_{1\leq i\leq n}\sup_{w\in\mathcal{W}}|\Phi_{i,1}w_{i,j}(X_{i},x)|&\leq C_{\phi}(\log n)^{1/\alpha}B_{\mathcal{W}}.\end{split}

By Proposition 3.1 of Giné et al., (2000), we obtain for any $x\in\mathfrak{X}$ and $p\geq 1$ ,

\displaystyle\mathbb{E}\left[\sup_{w\in\mathcal{W}}\,|g_{i}(x;\mathcal{Z}_{n}^{\prime},w)|^{p}\right]\leq K^{p}\left\{\bar{E}_{n,2}^{p}(\mathcal{W})+p^{p/2}\bar{\Sigma}_{n,2}^{p/2}(\mathcal{W})+p^{p}C_{\psi}^{p}(\log n)^{p/\beta}B_{\mathcal{W}}^{p}\right\},

where $\bar{E}_{n,2}(\mathcal{W})=C_{\phi}^{-1}E_{n,2}(\mathcal{W})/(\log n)^{1/\alpha}$ and $\bar{\Sigma}_{n,2}^{1/2}(\mathcal{W})=C_{\phi}^{-1}\Sigma_{n,2}^{1/2}(\mathcal{W})/(\log n)^{1/\alpha}$ . Therefore, by following the argument that led to (26), we get that

\begin{split}&\mathbb{E}\left[\max_{1\leq i\leq n}\sup_{w\in\mathcal{W}}|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w)|^{p}\right]\\ &\qquad\leq K^{p}\left[\bar{E}_{n,2}^{p}(\mathcal{W})+p^{p/2}\bar{\Sigma}_{n,2}^{p/2}(\mathcal{W})+p^{p}C_{\psi}^{p}(\log n)^{p/\beta}B_{\mathcal{W}}^{p}\right]\\ &\qquad\quad+K^{p}\left[(\log n)^{p/2}\bar{\Sigma}_{n,2}^{p/2}(\mathcal{W})+(\log n)^{p}C_{\psi}^{p}(\log n)^{p/\beta}B_{\mathcal{W}}^{p}\right].\end{split}

(35)

Substituting this in (34), we get

\begin{split}\mathbb{E}\left[|\mathcal{U}_{n}^{(2)}(\mathcal{W})|^{p}\right]&\leq K^{p}p^{p/\alpha^{*}}\left[E_{n,2}^{p}(\mathcal{W})+p^{p/2}\Sigma_{n,2}^{p/2}(\mathcal{W})+p^{p}\Lambda_{2}^{p}(\mathcal{W})\right]\\ &\quad+K^{p}p^{p/\alpha^{*}}\left[(\log n)^{p/2}\Sigma_{n,2}^{p/2}(\mathcal{W})+(\log n)^{p}\Lambda_{2}^{p}(\mathcal{W})\right].\end{split}

By a similar calculation, we get

\begin{split}\mathbb{E}\left[|\mathcal{U}_{n}^{(3)}(\mathcal{W})|^{p}\right]&\leq K^{p}p^{p/\beta^{*}}\left[E_{n,1}^{p}(\mathcal{W})+p^{p/2}\Sigma_{n,1}^{p/2}(\mathcal{W})+p^{p}\Lambda_{2}^{p}(\mathcal{W})\right]\\ &\quad+K^{p}p^{p/\beta^{*}}\left[(\log n)^{p/2}\Sigma_{n,1}^{p/2}(\mathcal{W})+(\log n)^{p}\Lambda_{2}^{p}(\mathcal{W})\right].\end{split}

This completes the proof of the result. ∎

The following lemma controls the moments of $\mathcal{U}_{n}^{(1)}(\mathcal{W})$ . This is a bounded degenerate $U$ -process and is (usually) the dominating term among the four parts.

Lemma 7 (Control of $\mathcal{U}_{n}^{(1)}(\mathcal{W})$ ).

There exists a constant $K>0$ (depending only on $\alpha,\beta$ ) such that for all $p\geq 1$ ,

	$\displaystyle\left\lVert\mathcal{U}_{n}^{(1)}(\mathcal{W})\right\rVert_{p}$	$\displaystyle\leq K\mathbb{E}\left[\mathcal{U}_{n}^{(1)}(\mathcal{W})\right]+Kp^{1/2}\left(\mathfrak{W}_{n,1}(\mathcal{W})+\mathfrak{W}_{n,2}(\mathcal{W})\right)$
		$\displaystyle\quad+Kp\left(\left\lVert(\phi w\psi)_{\mathcal{W}}\right\rVert_{2\to 2}+E_{n,1}(\mathcal{W})+E_{n,2}(\mathcal{W})+\Sigma_{n,2}^{1/2}(\mathcal{W})\sqrt{\log n}+\Lambda_{2}(\mathcal{W})\log n\right)$
		$\displaystyle\quad+Kp^{3/2}\left(\Sigma_{n,1}^{1/2}(\mathcal{W})+\Sigma_{n,2}^{1/2}(\mathcal{W})\right)+Kp^{2}\Lambda_{2}(\mathcal{W}).$

Proof.

Recall that

\mathcal{U}_{n}^{(1)}(\mathcal{W})=\sup_{w\in\mathcal{W}}\left|\sum_{i=1}^{n}\varepsilon_{i}\Phi_{i,1}g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w)\right|,\,\mbox{where}\,g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w):=\sum_{j=1,j\neq i}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}w_{i,j}(X_{i},X_{j}^{\prime}).

Observe that conditional on $\mathcal{Z}_{n}^{\prime}$ , $\mathcal{U}_{n}^{(1)}(\mathcal{W})$ is a bounded empirical process and so Talagrand’s inequality applies. Thus by Proposition 3.1 of Giné et al., (2000), we get for $p\geq 1$

\begin{split}\mathbb{E}\left[|\mathcal{U}_{n}^{(1)}(\mathcal{W})|^{p}\,\big|\,\mathcal{Z}_{n}^{\prime}\right]&\leq K^{p}\left(\mathbb{E}\left[|\mathcal{U}_{n}^{(1)}(\mathcal{W})|\,\big|\,\mathcal{Z}_{n}^{\prime}\right]\right)^{p}\\ &\quad+K^{p}p^{p/2}\sup_{w\in\mathcal{W}}\left(\sum_{i=1}^{n}\mathbb{E}\left[\Phi_{i,1}^{2}g_{i}^{2}(X_{i};\mathcal{Z}_{n}^{\prime},w)\,\big|\,\mathcal{Z}_{n}^{\prime}\right]\right)^{p/2}\\ &\quad+K^{p}p^{p}\mathbb{E}\left[\max_{1\leq i\leq n}|\Phi_{i,1}|^{p}\sup_{w\in\mathcal{W}}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w)\right|^{p}\,\big|\,\mathcal{Z}_{n}^{\prime}\right].\end{split}

Therefore, for $p\geq 1$ ,

\begin{split}\mathbb{E}\left[|\mathcal{U}_{n}^{(1)}(\mathcal{W})|^{p}\right]&\leq K^{p}\mathbb{E}\left(\mathbb{E}\left[|\mathcal{U}_{n}^{(1)}(\mathcal{W})|\,\big|\,\mathcal{Z}_{n}^{\prime}\right]\right)^{p}\\ &\quad+K^{p}p^{p/2}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left(\sum_{i=1}^{n}\mathbb{E}\left[\sigma_{i,\phi}^{2}(X_{i})g_{i}^{2}(X_{i};\mathcal{Z}_{n}^{\prime},w)\,\big|\,\mathcal{Z}_{n}^{\prime}\right]\right)^{p/2}\right]\\ &\quad+K^{p}p^{p}C_{\phi}^{p}(\log n)^{p/\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}\sup_{w\in\mathcal{W}}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w)\right|^{p}\right]\\ &=:K^{p}\left[\mathbf{I}+\mathbf{II}+\mathbf{III}\right].\end{split}

Controlling $\mathbf{III}:$ Using (35) from Lemma 6, we get

	$\displaystyle\mathbf{III}$	$\displaystyle\leq K^{p}p^{p}\left[E_{n,2}^{p}(\mathcal{W})+p^{p/2}\Sigma_{n,2}^{p/2}(\mathcal{W})+p^{p}\Lambda_{2}^{p}(\mathcal{W})\right]$
		$\displaystyle\quad+K^{p}p^{p}\left[(\log n)^{p/2}\Sigma_{n,2}^{p/2}(\mathcal{W})+(\log n)^{p}\Lambda_{2}^{p}(\mathcal{W})\right].$

Controlling $\mathbf{II}:$ To control $\mathbf{II}$ , we use a technique similar to the one used in Lemma 4. For this note by (32) that for any $w(\cdot,\cdot)$

	$\displaystyle\left(\sum_{i=1}^{n}\int\sigma_{i,\phi}^{2}(x)g_{i}^{2}(x;\mathcal{Z}_{n}^{\prime},w)P_{X_{i}}(dx)\right)^{1/2}$
	$\displaystyle\qquad=\sup\left\{\sum_{i=1}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)g_{i}(x;\mathcal{Z}_{n}^{\prime},w)P_{X_{i}}(dx):\,\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1\right\}.$

Therefore,

\mathbf{II}=p^{p/2}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sup_{\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1}\left|\sum_{i=1}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)g_{i}(x;\mathcal{Z}_{n}^{\prime},w)P_{X_{i}}(dx)\right|^{p}\right].

Now observe that

\displaystyle\sum_{i=1}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)g_{i}(x;\mathcal{Z}_{n}^{\prime},w)P_{X_{i}}(dx)=\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w),

where $\{q_{i}\}$ represents the sequence $(q_{1},\ldots,q_{n})$ satisfying $\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1$ and

\ell_{j}(X_{j}^{\prime};\{q_{i}\},w):=\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j}^{\prime})P_{X_{i}}(dx).

(36)

Thus

\mathbf{II}=p^{p/2}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sup_{\{q_{i}\}}\left|\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right|^{p}\right].

The right-hand side is the $p$-th moment of the supremum of a bounded empirical process, so by Proposition 3.1 of Giné et al., (2000), we get

\begin{split}&\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sup_{\{q_{i}\}}\left|\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right|^{p}\right]\\ &\qquad\leq K^{p}\left(\mathbb{E}\left[\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\left|\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right|\right]\right)^{p}\\ &\qquad\quad+K^{p}p^{p/2}\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\left(\mbox{Var}\left(\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right)\right)^{p/2}\\ &\qquad\quad+K^{p}p^{p}\mathbb{E}\left[\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\max_{1\leq j\leq n}|\Psi_{j,1}^{\prime}|^{p}|\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)|^{p}\right].\end{split}

(37)

We will now control each of the three terms appearing in (37). Using the fact $|\Psi_{j,1}^{\prime}|\leq KC_{\psi}(\log n)^{1/\beta}$ , we get

\mathbb{E}\left[\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\max_{1\leq j\leq n}|\Psi_{j,1}^{\prime}|^{p}|\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)|^{p}\right]\leq K^{p}C_{\psi}^{p}(\log n)^{p/\beta}\max_{1\leq j\leq n}\sup_{w\in\mathcal{W}}\sup_{x^{\prime}\in\mathfrak{X}}\sup_{\{q_{i}\}}|\ell_{j}(x^{\prime};\{q_{i}\},w)|^{p}.

By following the duality argument (32), we get

\sup_{\{q_{i}\}}|\ell_{j}(x^{\prime};\{q_{i}\},w)|\leq\left(\sum_{i=1,i\neq j}^{n}\mathbb{E}\left[\sigma_{i,\phi}^{2}(X_{i})w^{2}(X_{i},x^{\prime})\right]\right)^{1/2},

and so,

\begin{split}&\mathbb{E}\left[\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\max_{1\leq j\leq n}|\Psi_{j,1}^{\prime}|^{p}|\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)|^{p}\right]\\ &\qquad\leq K^{p}C_{\psi}^{p}(\log n)^{p/\beta}\sup_{w\in\mathcal{W},x^{\prime}\in\mathfrak{X}}\,\left(\sum_{i=1}^{n}\mathbb{E}\left[\sigma_{i,\phi}^{2}(X_{i})w^{2}(X_{i},x^{\prime})\right]\right)^{p/2}=K^{p}\Sigma_{n,1}^{p/2}(\mathcal{W}).\end{split}

(38)

Also, note that

\mbox{Var}\left(\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right)=\sum_{j=1}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j}^{\prime})\ell_{j}^{2}(X_{j}^{\prime};\{q_{i}\},w)\right].

Hence, again following the duality argument (32), we get

\begin{split}&\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\left(\mbox{Var}\left(\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right)\right)^{p/2}\\ &\qquad\leq\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\sup_{\{p_{j}\}}\left(\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j}^{\prime})\sigma_{j,\psi}(X_{j}^{\prime})p_{j}(X_{j}^{\prime})\right]\right)^{p}.\end{split}

(39)

Here $\{p_{j}\}$ represents a sequence $(p_{1},\ldots,p_{n})$ satisfying $\sum_{j=1}^{n}\int p_{j}^{2}(x)P_{X_{j}}(dx)\leq 1$ .

Substituting (39) and (38) in (37), we get

\begin{split}\mathbf{II}&\leq K^{p}p^{p/2}\left(\mathbb{E}\left[\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\left|\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right|\right]\right)^{p}\\ &\qquad\quad+K^{p}p^{p}\left\lVert(\phi w\psi)_{\mathcal{W}}\right\rVert_{2\to 2}^{p}+K^{p}p^{3p/2}\Sigma_{n,1}^{p/2}(\mathcal{W}).\end{split}

(40)

Controlling $\mathbf{I}:$ We use Lemma 8 (a restatement of Lemma 2 of Adamczak, (2006)) to control $\mathbf{I}.$ In the notation of Lemma 8, take

W_{j}=(\varepsilon_{j}^{\prime},Z_{j}^{\prime}),\,T=(Z_{1},\ldots,Z_{n},\varepsilon_{1},\ldots,\varepsilon_{n}),

and for $w\in\mathcal{W}$ ,

f_{j}^{w}(W_{j},T)=\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}.

This implies

S=\mathbb{E}_{T}\left[\sup_{w\in\mathcal{W}}\left|\sum_{j=1}^{n}\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}\right|\right]=\mathbb{E}\left[\mathcal{U}_{n}^{(1)}(\mathcal{W})\big{|}\mathcal{Z}_{n}^{\prime}\right].

Observe that $\mathbb{E}[S]=\mathbb{E}[\mathcal{U}_{n}^{(1)}(\mathcal{W})]$ . Thus we get for $p\geq 1$

	$\displaystyle\mathbb{E}\left[S^{p}\right]$	$\displaystyle\leq K^{p}\left(\mathbb{E}[S]\right)^{p}+K^{p}p^{p/2}\Upsilon^{p}$		(41)
		$\displaystyle\qquad+K^{p}p^{p}\mathbb{E}\left[\max_{1\leq j\leq n}\left(\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}\right|\,\big{|}\,\mathcal{Z}_{n}^{\prime}\right]\right)^{p}\right],$

where

\Upsilon:=\sup_{q\in\mathcal{Q}}\left(\sum_{j=1}^{n}\mathbb{E}\left[\left(\sum_{w\in\mathcal{W}}\mathbb{E}_{T}\left[f_{j}^{w}(W_{j},T)q_{w}(T)\right]\right)^{2}\right]\right)^{1/2},

with $\mathcal{Q}$ defined in Lemma 8 (its components are indexed here by $w\in\mathcal{W}$ ). We now simplify the last two terms on the right-hand side of (41). First observe that for the third term

	$\displaystyle\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}\right|\,\big{|}\,\mathcal{Z}_{n}^{\prime}\right]$
	$\displaystyle\qquad\leq KC_{\psi}(\log n)^{1/\beta}\sup_{x\in\mathfrak{X}}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},x)\right|\right]=KE_{n,1}(\mathcal{W}).$

To control $\Upsilon$ , observe that

\sum_{w\in\mathcal{W}}\mathbb{E}_{T}\left[f_{j}^{w}(W_{j},T)q_{w}(T)\right]=\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\sum_{w\in\mathcal{W}}\mathbb{E}_{T}\left[q_{w}(T)\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\right].

So, using the definition of $\sigma_{j,\psi}^{2}(\cdot)$ , we get

	$\displaystyle\Upsilon$	$\displaystyle=\sup_{q\in\mathcal{Q}}\left(\sum_{j=1}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j}^{\prime})\left(\sum_{w\in\mathcal{W}}\mathbb{E}_{T}\left[q_{w}(T)\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\right]\right)^{2}\right]\right)^{1/2}$
		$\displaystyle\overset{(a)}{=}\sup_{\{p_{j}\},q\in\mathcal{Q}}\sum_{j=1}^{n}\mathbb{E}\left[p_{j}(X_{j}^{\prime})\sigma_{j,\psi}(X_{j}^{\prime})\sum_{w\in\mathcal{W}}\mathbb{E}_{T}\left[q_{w}(T)\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\right]\right]$
		$\displaystyle\overset{(b)}{=}\sup_{\{p_{j}\}}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sum_{i=1}^{n}\varepsilon_{i}\Phi_{i,1}\left(\sum_{j=1,j\neq i}^{n}\int p_{j}(x)\sigma_{j,\psi}(x)w_{i,j}(X_{i},x)P_{X_{j}}(dx)\right)\right].$

Equality (a) above follows from the duality argument (32) while equality (b) follows from the argument given in Lemma 8.

∎

B.2 Auxiliary Lemmas Used in Theorem 3

The following lemma is a rewording of Lemma 2 of Adamczak, (2006). For this result, define the class of functions

\mathcal{Q}:=\left\{q(\cdot)=(q_{1}(\cdot),q_{2}(\cdot),\ldots):\,\sum_{k=1}^{\infty}|q_{k}(T)|=1\quad\mbox{for all}\quad T\right\}.

The domain of functions in $\mathcal{Q}$ is left out on purpose.

Lemma 8.

Suppose $\mathcal{F}:=\{(f_{1}^{k},\ldots,f_{n}^{k}):\,k\geq 1\}$ represents a countable class of vector functions. Define for independent random variables $T,W_{1},\ldots,W_{n}$ ,

S:=\mathbb{E}_{T}\left[\sup_{k\geq 1}\left|\sum_{j=1}^{n}f_{j}^{k}(W_{j},T)\right|\right],

where $\mathbb{E}_{T}[\cdot]$ represents the expectation only with respect to $T$ . (So, $S$ is a random variable that depends on $W_{1},\ldots,W_{n}$ .) If $\mathbb{E}_{W}[f_{j}^{k}(W_{j},T)]=0$ for a.e. $T$ , then there exists a constant $K>0$ such that for all $p\geq 1$ ,

	$\displaystyle\mathbb{E}\left[S^{p}\right]$	$\displaystyle\leq K^{p}(\mathbb{E}[S])^{p}+K^{p}p^{p/2}\sup_{q\in\mathcal{Q}}\left(\sum_{j=1}^{n}\mathbb{E}\left[\left(\sum_{k=1}^{\infty}\mathbb{E}_{T}[f_{j}^{k}(W_{j},T)q_{k}(T)]\right)^{2}\right]\right)^{p/2}$
		$\displaystyle\qquad+K^{p}p^{p}\mathbb{E}\left[\max_{1\leq j\leq n}\left(\mathbb{E}_{T}\left[\sup_{k\geq 1}|f_{j}^{k}(W_{j},T)|\right]\right)^{p}\right].$

Proof.

Following the proof of Lemma 2 of Adamczak, (2006), we get

S=\sup_{q\in\mathcal{Q}}\,\left|\sum_{k=1}^{\infty}\mathbb{E}_{T}\left[q_{k}(T)\sum_{j=1}^{n}f_{j}^{k}(W_{j},T)\right]\right|.

To see this, define $\widehat{q}(\cdot)=(\widehat{q}_{1}(\cdot),\ldots)\in\mathcal{Q}$ such that

\widehat{q}_{\widehat{k}}(t)=\mbox{sign}\left(\sum_{j=1}^{n}f_{j}^{\widehat{k}}(W_{j},t)\right),\quad\mbox{and}\quad\widehat{q}_{k}(t)=0,\quad\mbox{for $k\neq\widehat{k}$.}

Here $\widehat{k}=\widehat{k}(t)$ is an index satisfying

\left|\sum_{j=1}^{n}f_{j}^{\widehat{k}}(W_{j},t)\right|=\sup_{k\geq 1}\left|\sum_{j=1}^{n}f_{j}^{k}(W_{j},t)\right|.

Therefore,

S=\sup_{q\in\mathcal{Q}}\left|\sum_{j=1}^{n}\left(\sum_{k=1}^{\infty}\mathbb{E}_{T}\left[q_{k}(T)f_{j}^{k}(W_{j},T)\right]\right)\right|=:\sup_{q\in\mathcal{Q}}\left|\sum_{j=1}^{n}g_{q,j}(W_{j})\right|.

The right-hand side above is the supremum of a mean zero empirical process, and so by Proposition 3.1 of Giné et al., (2000), we get

\mathbb{E}\left[S^{p}\right]\leq K^{p}(\mathbb{E}[S])^{p}+K^{p}p^{p/2}\sup_{q\in\mathcal{Q}}\left(\sum_{j=1}^{n}\mathbb{E}\left[g_{q,j}^{2}(W_{j})\right]\right)^{p/2}+K^{p}p^{p}\mathbb{E}\left[\max_{1\leq j\leq n}\sup_{q\in\mathcal{Q}}\left|g_{q,j}(W_{j})\right|^{p}\right].

From the definition of $\mathcal{Q}$ , we get

\sup_{q\in\mathcal{Q}}|g_{q,j}(W_{j})|=\sup_{q\in\mathcal{Q}}\left|\sum_{k=1}^{\infty}\mathbb{E}_{T}\left[q_{k}(T)f_{j}^{k}(W_{j},T)\right]\right|=\mathbb{E}_{T}\left[\sup_{k\geq 1}|f_{j}^{k}(W_{j},T)|\right].

Thus,

\mathbb{E}\left[\max_{1\leq j\leq n}\sup_{q\in\mathcal{Q}}\left|g_{q,j}(W_{j})\right|^{p}\right]=\mathbb{E}\left[\max_{1\leq j\leq n}\left(\mathbb{E}_{T}\left[\sup_{k\geq 1}|f_{j}^{k}(W_{j},T)|\right]\right)^{p}\right].

So, the result follows. ∎
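The key identity in the proof above, $\sup_{q\in\mathcal{Q}}|\sum_{k}\mathbb{E}_{T}[q_{k}(T)g_{k}(T)]|=\mathbb{E}_{T}[\sup_{k}|g_{k}(T)|]$ , can be illustrated on a finite example. The sketch below assumes that $T$ takes finitely many values with weights `probs`, that the class has finitely many members, and that `g[k][t]` stands for $\sum_{j}f_{j}^{k}(W_{j},t)$ with $W_{1},\ldots,W_{n}$ held fixed. These names and the toy data are hypothetical.

```python
import random

def expected_sup(g, probs):
    """E_T[ sup_k |g[k][t]| ] for T on a finite set with weights probs."""
    return sum(p * max(abs(row[t]) for row in g) for t, p in enumerate(probs))

def paired_value(q, g, probs):
    """| sum_k E_T[ q[k][t] * g[k][t] ] |."""
    total = sum(p * sum(q[k][t] * g[k][t] for k in range(len(g)))
                for t, p in enumerate(probs))
    return abs(total)

def maximising_q(g):
    """q_hat puts the sign of g at the index attaining sup_k |g[k][t]|, zero elsewhere."""
    n_k, n_t = len(g), len(g[0])
    q = [[0.0] * n_t for _ in range(n_k)]
    for t in range(n_t):
        k_hat = max(range(n_k), key=lambda k: abs(g[k][t]))
        q[k_hat][t] = 1.0 if g[k_hat][t] >= 0 else -1.0
    return q

def random_q(n_k, n_t, rng):
    """A random element of Q: for each t, sum_k |q[k][t]| = 1."""
    q = [[rng.uniform(-1.0, 1.0) for _ in range(n_t)] for _ in range(n_k)]
    for t in range(n_t):
        s = sum(abs(q[k][t]) for k in range(n_k))
        for k in range(n_k):
            q[k][t] /= s
    return q

rng = random.Random(1)
probs = [0.1, 0.4, 0.3, 0.2]                                   # toy law of T (assumption)
g = [[rng.gauss(0.0, 1.0) for _ in probs] for _ in range(5)]   # toy values of sum_j f_j^k
s_value = expected_sup(g, probs)
attained = paired_value(maximising_q(g), g, probs)
others = [paired_value(random_q(5, 4, rng), g, probs) for _ in range(100)]
```

In this toy setting `attained` equals `s_value` up to rounding, while every entry of `others` is at most `s_value`, which mirrors the role of $\widehat{q}$ in the argument.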

Appendix C Proof of the Maximal Inequality (Theorem 4)

The following moment bound for Rademacher chaos is used in the proof. See Corollary 3.2.6 of de la Peña and Giné, (1999) and the inequalities leading to (4.1.20) on page 167 of de la Peña and Giné, (1999).

Lemma 9.

Let $Z$ be a homogeneous Rademacher chaos of degree 2, that is,

Z:=\sum_{1\leq i\neq j\leq n}\epsilon_{i}\epsilon_{j}a_{i,j},

for some constants $a_{i,j},1\leq i\neq j\leq n$ . Then $\left\lVert Z\right\rVert_{\psi_{1}}\leq 4es_{n}$ , where

s_{n}^{2}:=\sum_{1\leq i\neq j\leq n}a_{i,j}^{2}.
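For small $n$ the statement of Lemma 9 can be explored by exact enumeration of all $2^{n}$ sign vectors. The sketch below assumes the usual Orlicz-norm convention $\left\lVert Z\right\rVert_{\psi_{1}}=\inf\{C>0:\mathbb{E}\exp(|Z|/C)\leq 2\}$ (the convention is not restated in this appendix, so this is an assumption), and the coefficient matrix `a` is an arbitrary hypothetical choice. It checks that $\mathbb{E}\exp(|Z|/(4es_{n}))\leq 2$ and that the second moment of $Z$ is at most $2s_{n}^{2}$.

```python
import itertools
import math

def chaos_values(a):
    """All values of Z = sum_{i != j} eps_i eps_j a[i][j] over the 2^n sign vectors."""
    n = len(a)
    values = []
    for eps in itertools.product((-1, 1), repeat=n):
        z = sum(eps[i] * eps[j] * a[i][j]
                for i in range(n) for j in range(n) if i != j)
        values.append(z)
    return values

def s_n(a):
    """Square root of the sum of squared off-diagonal coefficients."""
    n = len(a)
    return math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))

def psi1_moment(values, c):
    """E[exp(|Z| / c)] under the uniform law on sign vectors."""
    return sum(math.exp(abs(z) / c) for z in values) / len(values)

# Hypothetical coefficients; the diagonal is unused.
a = [[0.0, 1.0, -2.0, 0.5],
     [0.3, 0.0, 1.5, -1.0],
     [2.0, -0.7, 0.0, 0.4],
     [-1.2, 0.8, 0.6, 0.0]]
values = chaos_values(a)
s = s_n(a)
second_moment = sum(z * z for z in values) / len(values)
exp_moment = psi1_moment(values, 4 * math.e * s)
```

Enumeration is only feasible for small $n$; the point is to make the objects in the lemma concrete, not to test the sharpness of the constant $4e$.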

Proof of Theorem 4.

As before, let $\mathcal{X}_{n}:=\{X_{1},X_{2},\ldots,X_{n}\}.$ Also, let

Z_{\epsilon}(f):=\left|\frac{1}{\sqrt{n(n-1)}}\sum_{1\leq i\neq j\leq n}\epsilon_{i}\epsilon_{j}f_{i,j}(X_{i},X_{j})\right|.

By Lemma 9, we get conditional on $\mathcal{X}_{n}$ ,

\left\lVert Z_{\epsilon}(f)\right\rVert_{\psi_{1}|\mathcal{X}_{n}}\leq 4e\left(\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}f^{2}_{i,j}(X_{i},X_{j})\right)^{1/2}\leq 4e\left\lVert f\right\rVert_{2,P_{n}},

where

\left\lVert f\right\rVert_{2,P_{n}}:=\left(\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}f^{2}_{i,j}(X_{i},X_{j})\right)^{1/2},

and define the discrete probability measure $P_{n}$ with support $\{X_{1},\ldots,X_{n}\}$ as

P_{n}(\{X_{i}\}):=\frac{1}{n}\quad\mbox{for}\quad 1\leq i\leq n.

Now, following the proof of Theorem 5.1.4 of de la Peña and Giné, (1999),

\left\lVert\max_{f\in\mathcal{F}}Z_{\epsilon}(f)\right\rVert_{\psi_{1}|\mathcal{X}_{n}}\leq C\int_{0}^{\Delta_{n}}\log N\left(\varepsilon,\mathcal{F},\left\lVert\cdot\right\rVert_{2,P_{n}}\right)d\varepsilon,

where

\Delta_{n}:=\sup_{f\in\mathcal{F}}\left\lVert f\right\rVert_{2,P_{n}}.

Therefore,

\left\lVert\max_{f\in\mathcal{F}}Z_{\epsilon}(f)\right\rVert_{\psi_{1}|\mathcal{X}_{n}}\leq C\left\lVert F\right\rVert_{2,P_{n}}J_{2}\left(\frac{\Delta_{n}}{\left\lVert F\right\rVert_{2,P_{n}}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right).

This implies that

\mathbb{E}\left[\sup_{f\in\mathcal{F}}Z_{\epsilon}(f)\right]\leq C\mathbb{E}\left[\left\lVert F\right\rVert_{2,P_{n}}J_{2}\left(\frac{\Delta_{n}}{\left\lVert F\right\rVert_{2,P_{n}}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right)\right].

(42)

Using concavity of $(x,y)\mapsto\sqrt{y}J_{2}(\sqrt{x/y},\mathcal{F},\left\lVert\cdot\right\rVert_{2})$ as in the proof of Theorem 2.1 of van der Vaart and Wellner, (2011), it follows that

\mathbb{E}\left[\left\lVert F\right\rVert_{2,P_{n}}J_{2}\left(\frac{\Delta_{n}}{\left\lVert F\right\rVert_{2,P_{n}}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right)\right]\leq\left\lVert F\right\rVert_{2,P}J_{2}\left(\frac{\sqrt{\mathbb{E}\left[\Delta_{n}^{2}\right]}}{\left\lVert F\right\rVert_{2,P}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right),

(43)

where

\left\lVert F\right\rVert_{2,P}^{2}:=\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[F^{2}_{i,j}(X_{i},X_{j})\right].

At this point the proof of Theorem 5.1 of Chen and Kato, (2020) uses Hoeffding averaging to bound $\mathbb{E}\left[\Delta_{n}^{2}\right]$ which proves the result for i.i.d. random variables $X_{i}$ . To allow for non-identically distributed random variables $X_{i},1\leq i\leq n$ , we bound $\mathbb{E}[\Delta_{n}^{2}]$ in terms of $J_{2}$ on the right hand side of (43). This is similar to the proof of Theorem 2.1 of van der Vaart and Wellner, (2011). To bound $\mathbb{E}\left[\Delta_{n}^{2}\right]$ , define for $f\in\mathcal{F},$

	$\displaystyle W_{n}^{(1)}(f)$	$\displaystyle:=\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\left\{f^{2}_{i,j}(X_{i},X_{j})-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\,|\,X_{i}\right]\right\}\right.$
		$\displaystyle\qquad\qquad-\left.\vphantom{\sum_{1\leq i\neq j\leq n}}\left\{\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\,|\,X_{j}\right]-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right]\right\}\right|,$
	$\displaystyle W_{n}^{(2)}(f)$	$\displaystyle:=\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\left\{\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\,|\,X_{i}\right]-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right]\right\}\right|,$
	$\displaystyle W_{n}^{(3)}(f)$	$\displaystyle:=\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\left\{\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\,|\,X_{j}\right]-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right]\right\}\right|.$

Using these definitions, we get

\Delta_{n}^{2}\leq\sup_{f\in\mathcal{F}}W_{n}^{(1)}(f)+\sup_{f\in\mathcal{F}}W_{n}^{(2)}(f)+\sup_{f\in\mathcal{F}}W_{n}^{(3)}(f)+\Sigma_{n}^{2}(\mathcal{F}),

(44)

where

\Sigma_{n}^{2}(\mathcal{F}):=\sup_{f\in\mathcal{F}}\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right].
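The bound (44) rests on a Hoeffding-type decomposition of each summand $f^{2}_{i,j}(X_{i},X_{j})$ into a canonical part, two first-order projections, and a constant. The following sketch verifies this decomposition for a single pair $(i,j)$ on finite supports. The laws `p_i`, `p_j` and the kernel `h` (standing for $f^{2}_{i,j}$ ) are hypothetical toy choices, and independence of $X_{i}$ and $X_{j}$ is assumed as in the text.

```python
def hoeffding_parts(h, p_i, p_j):
    """Split h(x, y) into canonical, two first-order, and constant parts.

    h[x][y] is the kernel on finite supports; p_i and p_j are the laws of
    X_i and X_j (independent, not necessarily identical).
    """
    nx, ny = len(p_i), len(p_j)
    h_given_x = [sum(p_j[y] * h[x][y] for y in range(ny)) for x in range(nx)]  # E[h | X_i = x]
    h_given_y = [sum(p_i[x] * h[x][y] for x in range(nx)) for y in range(ny)]  # E[h | X_j = y]
    mean = sum(p_i[x] * h_given_x[x] for x in range(nx))                       # E[h]
    canonical = [[h[x][y] - h_given_x[x] - h_given_y[y] + mean
                  for y in range(ny)] for x in range(nx)]
    return canonical, h_given_x, h_given_y, mean

p_i = [0.2, 0.3, 0.5]   # toy law of X_i (assumption)
p_j = [0.6, 0.1, 0.3]   # toy law of X_j (assumption)
h = [[(x - 2 * y) ** 2 for y in range(3)] for x in range(3)]  # a squared kernel, so h >= 0
canonical, h_x, h_y, mean = hoeffding_parts(h, p_i, p_j)

# Degeneracy: integrating out either argument of the canonical part gives zero.
row_means = [sum(p_j[y] * canonical[x][y] for y in range(3)) for x in range(3)]
col_means = [sum(p_i[x] * canonical[x][y] for x in range(3)) for y in range(3)]

# Reassembly: canonical + (E[h|X_i] - E[h]) + (E[h|X_j] - E[h]) + E[h] recovers h.
rebuilt = [[canonical[x][y] + (h_x[x] - mean) + (h_y[y] - mean) + mean
            for y in range(3)] for x in range(3)]
```

Summing the four pieces over $i\neq j$ , taking absolute values and then suprema over $f\in\mathcal{F}$ gives the four terms on the right-hand side of (44); the canonical piece is the one handled by decoupling and symmetrization.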

By decoupling and symmetrization, we obtain

\mathbb{E}\left[\sup_{f\in\mathcal{F}}W_{n}^{(1)}(f)\right]\leq C\mathbb{E}\left[\sup_{f\in\mathcal{F}}\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\epsilon_{i}\epsilon_{j}f^{2}_{i,j}(X_{i},X_{j})\right|\right].

Set for $f\in\mathcal{F}$ ,

R_{\epsilon}(f):=\frac{1}{\sqrt{n(n-1)}}\sum_{1\leq i\neq j\leq n}\epsilon_{i}\epsilon_{j}f^{2}_{i,j}(X_{i},X_{j}).

Again by Lemma 9 and using $|f_{i,j}(x,x^{\prime})+g_{i,j}(x,x^{\prime})|\leq 2R$ for all $f,g\in\mathcal{F}$ and $x,x^{\prime}\in\mathcal{X}$ , we get

	$\displaystyle\left\lVert R_{\epsilon}(f)-R_{\epsilon}(g)\right\rVert_{\psi_{1}|\mathcal{X}_{n}}$	$\displaystyle\leq 8eR\left(\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}\left(f_{i,j}(X_{i},X_{j})-g_{i,j}(X_{i},X_{j})\right)^{2}\right)^{1/2}$
		$\displaystyle\leq 8eR\left\lVert f-g\right\rVert_{2,P_{n}}.$

Hence by following the first part of the proof, we get

\mathbb{E}\left[\sup_{f\in\mathcal{F}}W_{n}^{(1)}(f)\right]\leq C\frac{R\left\lVert F\right\rVert_{2,P}}{n}J_{2}\left(\frac{\sqrt{\mathbb{E}\left[\Delta_{n}^{2}\right]}}{\left\lVert F\right\rVert_{2,P}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right).

(45)

Substituting this in (44) after taking expectations,

\displaystyle\frac{\left\lVert\Delta_{n}\right\rVert_{2}^{2}}{\left\lVert F\right\rVert_{2,P}^{2}}\leq CB^{2}_{n}J_{2}\left(\frac{\left\lVert\Delta_{n}\right\rVert_{2}}{\left\lVert F\right\rVert_{2,P}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right)+A^{2}_{n},

where

B^{2}_{n}:=\frac{R}{n\left\lVert F\right\rVert_{2,P}}\quad\mbox{and}\quad A^{2}_{n}:=\frac{\mathbb{E}\left[\sup_{f\in\mathcal{F}}W_{n}^{(2)}(f)\right]+\mathbb{E}\left[\sup_{f\in\mathcal{F}}W_{n}^{(3)}(f)\right]+\Sigma_{n}^{2}(\mathcal{F})}{\left\lVert F\right\rVert_{2,P}^{2}}.

It follows that

\frac{\left\lVert\Delta_{n}\right\rVert_{2}^{2}}{\left\lVert F\right\rVert_{2,P}^{2}}\leq Cb^{2}_{n}J_{2}\left(\frac{\left\lVert\Delta_{n}\right\rVert_{2}}{\left\lVert F\right\rVert_{2,P}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right)+a^{2},

for any $a\geq A_{n}$ and $b\geq B_{n}$ . Therefore, by Lemma 2.1 of van der Vaart and Wellner, (2011), it follows that for any $a\geq A_{n}$ and $b\geq B_{n}$ ,

J_{2}\left(\frac{\left\lVert\Delta_{n}\right\rVert_{2}}{\left\lVert F\right\rVert_{2,P}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right)\leq CJ_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})\left[1+\frac{J_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})b^{2}}{a^{2}}\right].
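The way a self-bounding inequality of the form $z^{2}\leq a^{2}+b^{2}J(z)$ translates into a bound on $J(z)$ can be illustrated numerically. The sketch below is not the entropy integral $J_{2}$ of the paper: it uses the hypothetical stand-in $J(z)=\sqrt{z}$ , which is concave and nondecreasing with $J(0)=0$ , finds the largest $z$ compatible with the inequality by bisection, and compares $J(z)$ with $J(a)[1+J(a)b^{2}/a^{2}]$ . The grid of values for $a$ and $b$ is arbitrary.

```python
import math

def largest_solution(a, b, J, tol=1e-12):
    """Largest z >= 0 with z**2 <= a**2 + b**2 * J(z), found by bisection.

    Assumes J is concave, nondecreasing, with J(0) = 0 and J(z) <= z for
    z >= 1, so z -> z**2 - a**2 - b**2 * J(z) changes sign once on (0, inf).
    """
    g = lambda z: z * z - a * a - b * b * J(z)
    lo, hi = a, 1.0 + a + b * b   # g(lo) <= 0 and g(hi) > 0 under the assumptions
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return lo

J = math.sqrt  # hypothetical stand-in for the entropy integral
solutions = []
ratios = []
for a in (0.05, 0.2, 1.0):
    for b in (0.1, 1.0, 3.0):
        z = largest_solution(a, b, J)
        bound = J(a) * (1.0 + J(a) * b * b / (a * a))
        solutions.append((a, b, z))
        ratios.append(J(z) / bound)
```

For this toy $J$ every ratio stays below one, which is consistent with the displayed bound holding with a moderate constant $C$ ; the behaviour for the actual $J_{2}$ depends on the entropy of $\mathcal{F}$ and is not addressed by this sketch.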

Substituting this in (43) and (42), we get

\mathbb{E}\left[\sup_{f\in\mathcal{F}}Z_{\epsilon}(f)\right]\leq C\left\lVert F\right\rVert_{2,P}J_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})\left[1+\frac{J_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})b^{2}}{a^{2}}\right],

for any $a\geq A_{n}$ and $b\geq B_{n}$ . The result is proved.∎

References

Adamczak, (2006) Adamczak, R. (2006). Moment inequalities for $U$ -statistics. Ann. Probab., 34(6):2288–2314.
Arcones and Giné, (1993) Arcones, M. A. and Giné, E. (1993). Limit theorems for $U$ -processes. Ann. Probab., 21(3):1494–1542.
Bakhshizadeh, (2023) Bakhshizadeh, M. (2023). Exponential tail bounds and large deviation principle for heavy-tailed $U$ -statistics. arXiv preprint arXiv:2301.11563.
Bang and Robins, (2005) Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4):962–973.
Bentkus and Götze, (1999) Bentkus, V. and Götze, F. (1999). Optimal bounds in non-Gaussian limit theorems for $U$ -statistics. The Annals of Probability, 27(1):454–521.
Boucheron et al., (2005) Boucheron, S., Bousquet, O., Lugosi, G., and Massart, P. (2005). Moment inequalities for functions of independent random variables. Ann. Probab., 33(2):514–560.
Boucheron et al., (2013) Boucheron, S., Lugosi, G., and Massart, P. (2013). Concentration inequalities. Oxford University Press, Oxford. A nonasymptotic theory of independence, With a foreword by Michel Ledoux.
Chakrabortty and Kuchibhotla, (2018) Chakrabortty, A. and Kuchibhotla, A. K. (2018). Tail bounds for canonical U-statistics and U-processes with unbounded kernels. Technical report, Working paper, Wharton School, University of Pennsylvania.
Chen and Kato, (2020) Chen, X. and Kato, K. (2020). Jackknife multiplier bootstrap: finite sample approximations to the $U$ -process supremum with applications. Probability Theory and Related Fields, 176:1097–1163.
Clémençon et al., (2008) Clémençon, S., Lugosi, G., and Vayatis, N. (2008). Ranking and empirical minimization of $U$ -statistics. Ann. Statist., 36(2):844–874.
de la Peña, (1992) de la Peña, V. H. (1992). Decoupling and Khintchine’s inequalities for $U$ -statistics. Ann. Probab., 20(4):1877–1892.
de la Peña and Giné, (1999) de la Peña, V. H. and Giné, E. (1999). Decoupling. Probability and its Applications (New York). Springer-Verlag, New York. From dependence to independence, Randomly stopped processes. $U$ -statistics and processes. Martingales and beyond.
Delyon and Portier, (2016) Delyon, B. and Portier, F. (2016). Integral approximation by kernel smoothing. Bernoulli, 22(4):2177–2208.
Dirksen, (2015) Dirksen, S. (2015). Tail bounds via generic chaining. Electron. J. Probab., 20:no. 53, 29.
Giné et al., (2000) Giné, E., Latała, R., and Zinn, J. (2000). Exponential and moment inequalities for $U$ -statistics. In High dimensional probability, II (Seattle, WA, 1999), volume 47 of Progr. Probab., pages 13–38. Birkhäuser Boston, Boston, MA.
Giné and Nickl, (2008) Giné, E. and Nickl, R. (2008). A simple adaptive estimator of the integrated square of a density. Bernoulli, 14(1):47–61.
Giné and Nickl, (2016) Giné, E. and Nickl, R. (2016). Mathematical foundations of infinite-dimensional statistical models. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, New York.
Hall and Marron, (1987) Hall, P. and Marron, J. S. (1987). Estimation of integrated squared density derivatives. Statist. Probab. Lett., 6(2):109–115.
He et al., (2024) He, Y., Wang, K., and Zhu, Y. (2024). Sparse Hanson-Wright inequalities with applications. arXiv preprint arXiv:2410.15652.
Houdré and Reynaud-Bouret, (2003) Houdré, C. and Reynaud-Bouret, P. (2003). Exponential inequalities, with constants, for U-statistics of order two. In Stochastic inequalities and applications, volume 56 of Progr. Probab., pages 55–69. Birkhäuser, Basel.
Kim, (2020) Kim, I. (2020). Statistical Theory and Methods for Comparing Distributions. PhD thesis, Carnegie Mellon University.
Kolesko and Latała, (2015) Kolesko, K. and Latała, R. (2015). Moment estimates for chaoses generated by symmetric random variables with logarithmically convex tails. Statist. Probab. Lett., 107:210–214.
Kuchibhotla and Chakrabortty, (2022) Kuchibhotla, A. K. and Chakrabortty, A. (2022). Moving beyond sub-Gaussianity in high-dimensional statistics: Applications in covariance estimation and linear regression. Information and Inference: A Journal of the IMA, 11(4):1389–1456.
Ledoux and Talagrand, (1991) Ledoux, M. and Talagrand, M. (1991). Probability in Banach spaces, volume 23 of Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)]. Springer-Verlag, Berlin. Isoperimetry and processes.
Liu et al., (2021) Liu, L., Mukherjee, R., Robins, J. M., and Tchetgen, E. T. (2021). Adaptive estimation of nonparametric functionals. Journal of Machine Learning Research, 22(99):1–66.
Major, (2005) Major, P. (2005). Tail behaviour of multiple random integrals and $U$ -statistics. Probab. Surv., 2:448–505.
Major, (2013) Major, P. (2013). On the estimation of multiple random integrals and $U$ -statistics, volume 2079 of Lecture Notes in Mathematics. Springer, Heidelberg.
Newey and Ruud, (2005) Newey, W. K. and Ruud, P. A. (2005). Density weighted linear least squares. In Identification and inference for econometric models, pages 554–573. Cambridge Univ. Press, Cambridge.
Nolan and Pollard, (1987) Nolan, D. and Pollard, D. (1987). $U$ -processes: rates of convergence. Ann. Statist., 15(2):780–799.
Nolan et al., (1988) Nolan, D., Pollard, D., et al. (1988). Functional limit theorems for $U$ -processes. The Annals of Probability, 16(3):1291–1298.
Robins et al., (2008) Robins, J., Li, L., Tchetgen, E., van der Vaart, A., et al. (2008). Higher order influence functions and minimax estimation of nonlinear functionals. In Probability and statistics: essays in honor of David A. Freedman, volume 2, pages 335–422. Institute of Mathematical Statistics.
Robins et al., (2017) Robins, J. M., Li, L., Mukherjee, R., Tchetgen, E. T., and van der Vaart, A. (2017). Minimax estimation of a functional on a structured high-dimensional model. The Annals of Statistics, 45(5):1951–1987.
Robins et al., (2016) Robins, J. M., Li, L., Tchetgen, E. T., and van der Vaart, A. (2016). Asymptotic normality of quadratic estimators. Stochastic processes and their applications, 126(12):3733–3759.
Robins et al., (1994) Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American statistical Association, 89(427):846–866.
Rudelson and Vershynin, (2013) Rudelson, M. and Vershynin, R. (2013). Hanson-Wright inequality and sub-Gaussian concentration. Electron. Commun. Probab., 18:no. 82, 9.
Serfling, (1980) Serfling, R. J. (1980). Approximation theorems of mathematical statistics. John Wiley & Sons, Inc., New York. Wiley Series in Probability and Mathematical Statistics.
Spokoiny and Zhilova, (2013) Spokoiny, V. and Zhilova, M. (2013). Sharp deviation bounds for quadratic forms. Math. Methods Statist., 22(2):100–113.
Talagrand, (2014) Talagrand, M. (2014). Upper and lower bounds for stochastic processes, volume 60 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics]. Springer, Heidelberg. Modern methods and classical problems.
van de Geer and Lederer, (2013) van de Geer, S. and Lederer, J. (2013). The Bernstein-Orlicz norm and deviation inequalities. Probab. Theory Related Fields, 157(1-2):225–250.
van der Vaart and Wellner, (2011) van der Vaart, A. and Wellner, J. A. (2011). A local maximal inequality under uniform entropy. Electron. J. Stat., 5:192–203.
van der Vaart and Wellner, (1996) van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York. With applications to statistics.

