
Tail Bounds for Canonical $U$-Statistics and $U$-Processes with Unbounded Kernels

Note: An initial version of this work was available here. This draft is a revised version with some edits.

Abhishek Chakrabortty (Department of Statistics, Texas A&M University. Email: [email protected]) and Arun Kumar Kuchibhotla (Department of Statistics & Data Science, Carnegie Mellon University. Email: [email protected])
Abstract

In this paper, we prove exponential tail bounds for canonical (or degenerate) $U$-statistics and $U$-processes under exponential-type tail assumptions on the kernels. Most existing results in the relevant literature either assume bounded kernels or obtain sub-optimal tail behavior under unbounded kernels. We obtain sharp rates and optimal tail behavior under sub-Weibull kernel functions. Some examples from the nonparametric and semiparametric statistics literature are considered.

Keywords and phrases: Degenerate $U$-statistics and $U$-processes, Unbounded kernels, Sub-Weibull tails, Exponential tail bounds, Nonparametric/semiparametric statistics.

1 Introduction and Motivation

In this paper, we study moment and tail bounds of second-order degenerate $U$-statistics and $U$-processes. Averages, the simplest functions of a collection of random variables, are sums in which each summand depends on only one element of the collection. $U$-statistics, on the other hand, depend on tuples of elements in the collection. Formally, a second-order $U$-statistic based on the collection of random variables $Z_{1},\ldots,Z_{n}$ is of the form

\[
U_{n}=\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j}), \tag{1}
\]

for some functions $\{f_{i,j}:\,1\leq i\neq j\leq n\}$. In this paper, we consider $U$-statistics defined on independent but possibly non-identically distributed random variables $Z_{1},\ldots,Z_{n}$ taking values in some measurable space. $U$-statistics are ubiquitous in statistical applications including, e.g., goodness-of-fit tests, two-sample tests using kernel-based distances, and independence testing via permutation tests; see Kim (2020) for an overview of this literature.

We motivate our interest in $U$-statistics using a few prototypical examples. Suppose $X_{1},\ldots,X_{n}$ are independent and identically distributed (i.i.d.) realizations of a random vector $X\in\mathbb{R}^{p}$ with Lebesgue density $f$. Consider the problem of estimating the quadratic functional

\[
\Gamma(f):=\int_{\mathbb{R}^{p}}f^{2}(x)\,dx=\mathbb{E}\left[f(X)\right]. \tag{2}
\]

A natural estimator for this functional is given by

\[
\widehat{\Gamma}(f):=\frac{1}{n(n-1)h_{n}^{p}}\sum_{1\leq i\neq j\leq n}K\left(\frac{X_{i}-X_{j}}{h_{n}}\right)=\frac{1}{n}\sum_{i=1}^{n}\widehat{f}^{(-i)}(X_{i}), \tag{3}
\]

where $h_{n}$ represents the bandwidth and $\widehat{f}^{(-i)}(\cdot)$ represents the leave-one-out kernel density estimator:

\[
\widehat{f}^{(-i)}(x)=\frac{1}{(n-1)h_{n}^{p}}\sum_{j=1,\,j\neq i}^{n}K\left(\frac{X_{j}-x}{h_{n}}\right). \tag{4}
\]

Here the function $K(\cdot)$ is assumed to be symmetric and to satisfy $\int_{\mathbb{R}^{p}}K(x)\,dx=1$. This estimator was introduced by Hall and Marron (1987) and was studied thoroughly (in terms of adaptivity) for $p=1$ in Giné and Nickl (2008).
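The two expressions for $\widehat{\Gamma}(f)$ in (3) can be compared numerically. The following is a minimal sketch, not the authors' implementation: it assumes a Gaussian kernel $K$, standard normal data with $p=1$, and an arbitrary bandwidth, and all function names are hypothetical. For the standard normal density the target is $\Gamma(f)=1/(2\sqrt{\pi})$.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian density on R^p, evaluated over the last axis of u."""
    p = u.shape[-1]
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2.0 * np.pi) ** (p / 2.0)

def pairwise_kernel_matrix(x, h):
    """Matrix with entries K((X_i - X_j)/h) / h^p and a zero diagonal."""
    n, p = x.shape
    diffs = (x[:, None, :] - x[None, :, :]) / h
    k = gaussian_kernel(diffs) / h**p
    np.fill_diagonal(k, 0.0)  # drop the i = j terms, as in a U-statistic
    return k

def gamma_hat(x, h):
    """U-statistic form of (3): average over ordered pairs i != j."""
    n = x.shape[0]
    return pairwise_kernel_matrix(x, h).sum() / (n * (n - 1))

def loo_kde_at_sample(x, h):
    """Leave-one-out density estimates (4), evaluated at each X_i."""
    n = x.shape[0]
    return pairwise_kernel_matrix(x, h).sum(axis=1) / (n - 1)

# Assumed setup: n = 1000 standard normal draws, bandwidth h = 0.3.
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 1))
h = 0.3
est_u = gamma_hat(x, h)                   # left-hand form in (3)
est_loo = loo_kde_at_sample(x, h).mean()  # right-hand form in (3)
target = 1.0 / (2.0 * np.sqrt(np.pi))     # Gamma(f) for the N(0, 1) density
```

Both forms agree up to floating-point error because they sum the same off-diagonal kernel matrix; the gap between `est_u` and `target` reflects smoothing bias and sampling noise.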

Similarly, to estimate integrals involving the conditional expectation function from i.i.d. realizations $(X_{1},Y_{1}),\ldots,(X_{n},Y_{n})$ of $(X,Y)$, the following $U$-statistic appears:

\[
U_{n}^{\star}:=\frac{1}{n(n-1)h_{n}^{p}}\sum_{1\leq i\neq j\leq n}Y_{i}K\left(\frac{X_{i}-X_{j}}{h_{n}}\right)Y_{j}. \tag{5}
\]

Aside from these prototypical examples, various other examples of such $U$-statistics are encountered in the literature on integral approximation involving kernel smoothing estimators (Newey and Ruud, 2005; Delyon and Portier, 2016) and the semiparametric inference literature on quadratic and integral-type functionals (Robins et al., 2016). In the latter literature, $U$-statistics of this type, especially in their degenerate form (see below for the definition), are fundamentally involved in the analysis of so-called doubly robust estimators of certain functionals encountered in missing data or causal inference problems (Robins et al., 1994; Bang and Robins, 2005), as well as in the literature on adaptive estimation of functionals based on so-called higher order influence functions (Robins et al., 2008, 2017; Liu et al., 2021).

Apart from the nonparametric and semiparametric statistics literature, second order $U$-statistics also arise in relation to Hanson-Wright-type inequalities. The classical Hanson-Wright inequality concerns tail bounds for the quadratic form $G^{\top}AG$, where $G$ is a standard multivariate normal random vector in $\mathbb{R}^{n}$ and $A\in\mathbb{R}^{n\times n}$ is a positive semi-definite matrix; see Theorem 3.1.9 of Giné and Nickl (2016). For further applications of Hanson-Wright inequalities, see Rudelson and Vershynin (2013) and Spokoiny and Zhilova (2013), as well as the recent work of He et al. (2024) on sparse random vectors. Note that for any random vector $Y\in\mathbb{R}^{n}$ and matrix $A\in\mathbb{R}^{n\times n}$,

\[
Y^{\top}AY=\sum_{1\leq i,j\leq n}Y_{i}A(i,j)Y_{j}, \tag{6}
\]

where $A(i,j)$ represents the entry in the $i$-th row and $j$-th column of the matrix $A$.
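To make the link between (6) and second-order $U$-statistics concrete, the sketch below splits a quadratic form into its diagonal part (the $i=j$ terms) and its off-diagonal part, which is a $U$-statistic with kernels $f_{i,j}(y_i,y_j)=y_iA(i,j)y_j$. The function name and the random test matrix are illustrative assumptions only.

```python
import numpy as np

def split_quadratic_form(y, a):
    """Return (diagonal part, off-diagonal part) of the quadratic form y^T A y."""
    terms = np.outer(y, y) * a               # entries Y_i A(i,j) Y_j
    diag_part = np.trace(terms)              # sum over i = j
    off_diag_part = terms.sum() - diag_part  # sum over i != j (U-statistic part)
    return diag_part, off_diag_part

# Assumed example: a random positive semi-definite matrix and a Gaussian vector.
rng = np.random.default_rng(1)
n = 6
b = rng.standard_normal((n, n))
a = b @ b.T
y = rng.standard_normal(n)
diag_part, off_diag_part = split_quadratic_form(y, a)
```

The diagonal part is a sum of independent terms when the coordinates of $Y$ are independent, so only the off-diagonal part calls for $U$-statistic techniques.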

Motivated by the examples above, we study the properties of the $U$-statistic $U_{n}$. Before proceeding further, we briefly discuss degenerate and non-degenerate $U$-statistics; see Serfling (1980, Chapter 5) for more details. This discussion shows that, for a precise understanding of the tail behavior of a $U$-statistic, it suffices to consider degenerate $U$-statistics. In fact, most asymptotic normality results for $U$-statistics are obtained by proving asymptotic negligibility of the degenerate $U$-statistic relative to the linear statistic; see, for example, Chen and Kato (2020). This paper is partly motivated by cases where such asymptotic negligibility may not hold. For example, in the context of estimating $\mu^{2}$ based on i.i.d. observations $X_{1},\ldots,X_{n}$ with mean $\mu$, the unbiased estimator $\binom{n}{2}^{-1}\sum_{1\leq i<j\leq n}X_{i}X_{j}$ exhibits a phase transition at $\mu=O(n^{-1/2})$ in terms of both the rate and the limiting distribution.

Degenerate or Canonical $U$-statistics.

For any sequence of functions (called kernels) $f_{i,j}(\cdot,\cdot)$ and independent random variables $Z_{1},\ldots,Z_{n}$, a $U$-statistic is given by

\[
T_{n}:=\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j}).
\]

Note that the diagonal terms (the cases $i=j$) are excluded from the summation above. If these diagonal terms are included, then the resulting statistic is called a $V$-statistic. The $U$-statistic $T_{n}$ is called degenerate or canonical if the kernel functions satisfy

\[
\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\,\big|\,Z_{i}\right]=\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\,\big|\,Z_{j}\right]=0,\quad\mbox{for all}\quad 1\leq i\neq j\leq n. \tag{7}
\]

If the kernel functions do not satisfy (7), then the corresponding $U$-statistic is called non-degenerate. It is not difficult to see that a non-degenerate $U$-statistic, after centering at its mean, can be written as the sum of a degenerate $U$-statistic and two sums of independent mean zero random variables:

\[
T_{n}-\mathbb{E}[T_{n}]=\sum_{1\leq i\neq j\leq n}f_{i,j}^{D}(Z_{i},Z_{j})+\sum_{j=1}^{n}g_{j}(Z_{j})+\sum_{i=1}^{n}h_{i}(Z_{i})=:\mathcal{U}_{n}(f)+T_{n}^{(1)}+T_{n}^{(2)}, \tag{8}
\]

where

\[
\begin{aligned}
f^{D}_{i,j}(Z_{i},Z_{j}) &:=f_{i,j}(Z_{i},Z_{j})-\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\,\big|\,Z_{j}\right]-\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\,\big|\,Z_{i}\right]+\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\right],\\
g_{j}(Z_{j}) &:=\sum_{i=1,\,i\neq j}^{n}\Big\{\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\,\big|\,Z_{j}\right]-\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\right]\Big\},\\
h_{i}(Z_{i}) &:=\sum_{j=1,\,j\neq i}^{n}\Big\{\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\,\big|\,Z_{i}\right]-\mathbb{E}\left[f_{i,j}(Z_{i},Z_{j})\right]\Big\}.
\end{aligned} \tag{9}
\]

It is clear from these expressions that the kernels $f_{i,j}^{D}(\cdot,\cdot)$ satisfy (7) and so are degenerate kernels. Since $T_{n}^{(1)}$ and $T_{n}^{(2)}$ in (8) are sums of independent random variables with mean zero, they can be understood easily from classical results such as the central limit theorem (asymptotically) and Bernstein/Hoeffding or more general inequalities (non-asymptotically). For this reason, we focus mostly on the degenerate part of (8) in the rest of the paper and derive non-asymptotic moment and tail bounds when the non-degenerate $U$-statistic is of the form (1). Our main tool is the decoupling inequality proved in de la Peña (1992). We refer to de la Peña and Giné (1999, Chapter 3) for more details regarding decoupling in $U$-statistics.
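The decomposition in (8) and (9) can be checked numerically in a simple case. The sketch below assumes the product kernel $f(x,y)=xy$ with i.i.d. data of known mean $\mu$ (the $\mu^{2}$ example above), for which $f^{D}(x,y)=(x-\mu)(y-\mu)$ and $g_{j}(z)=h_{j}(z)=(n-1)\mu(z-\mu)$. It verifies that the centered statistic $T_{n}-\mathbb{E}[T_{n}]$, with $\mathbb{E}[T_{n}]=n(n-1)\mu^{2}$, equals the degenerate part plus the two linear parts; the constant $\mathbb{E}[T_{n}]$ is non-random and plays no role in the tail behavior. The function name is hypothetical.

```python
import numpy as np

def hoeffding_parts_product_kernel(z, mu):
    """Pieces of the decomposition of T_n = sum_{i != j} z_i z_j for known mean mu.

    For f(x, y) = x * y: E[f | Z_i] = mu * Z_i and E[f] = mu**2, hence
    f^D(x, y) = (x - mu) * (y - mu).
    """
    n = z.shape[0]
    c = z - mu
    # Degenerate part: sum_{i != j} c_i c_j = (sum c)^2 - sum c^2.
    degenerate = c.sum() ** 2 - np.sum(c**2)
    # Linear parts: g_j(z_j) = h_j(z_j) = (n - 1) * mu * (z_j - mu), summed over j.
    t1 = (n - 1) * mu * c.sum()
    t2 = (n - 1) * mu * c.sum()
    mean_tn = n * (n - 1) * mu**2  # E[T_n]
    return degenerate, t1, t2, mean_tn

# Assumed example: Gaussian data with mean 0.7.
rng = np.random.default_rng(2)
mu = 0.7
z = mu + rng.standard_normal(50)
t_n = z.sum() ** 2 - np.sum(z**2)  # T_n = sum over i != j of z_i z_j
degenerate, t1, t2, mean_tn = hoeffding_parts_product_kernel(z, mu)
```

When $\mu$ is of order $n^{-1/2}$, the linear parts `t1 + t2` and the degenerate part are of comparable size, which is the regime where the degenerate term cannot be neglected.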

After deriving non-asymptotic tail bounds for degenerate $U$-statistics, we provide the same for the supremum of degenerate $U$-statistics over a function class. Suppose $\mathcal{F}_{n}$ is a class of sequences of functions (degenerate kernels) of the type $f:=\{f_{i,j}^{D}(\cdot,\cdot):\,1\leq i\neq j\leq n\}$ and define

\[
\mathcal{U}_{n}(f):=\sum_{1\leq i\neq j\leq n}f_{i,j}^{D}(Z_{i},Z_{j}).
\]

Then $\{\mathcal{U}_{n}(f):\,f\in\mathcal{F}_{n}\}$ can be viewed as a process, called the $U$-process, and we provide exponential tail bounds for the supremum:

\[
\mathcal{U}_{n}(\mathcal{F}):=\sup_{f\in\mathcal{F}_{n}}\left|\mathcal{U}_{n}(f)\right|.
\]

An important application would be the study of uniform-in-bandwidth properties of the estimator $\widehat{\Gamma}(f)$ in (3), that is,

\[
\sup_{h_{n}\in[a_{n},b_{n}]}\left|\widehat{\Gamma}(f;h_{n})-\mathbb{E}\left[\widehat{\Gamma}(f;h_{n})\right]\right|,
\]

for some numbers $a_{n},b_{n}\in(0,1)$. Further applications can be found in de la Peña and Giné (1999, Section 5.5) and Major (2013). As a final note, we mention that even though our techniques extend to $U$-statistics/processes of higher order, we restrict ourselves to second order $U$-statistics/processes for simplicity and ease of exposition.

1.1 Related Literature

In this section, we review some of the by-now classical exponential tail bounds for degenerate $U$-statistics and suprema of $U$-processes. Arcones and Giné (1993, Proposition 2.3) proved a Bernstein-type inequality for degenerate $U$-statistics/processes. Specifically, for the degenerate $U$-statistic

\[
U_{n}:=n^{-1}\sum_{1\leq i\neq j\leq n}f(Z_{i},Z_{j}),
\]

with i.i.d. random variables $Z_{1},\ldots,Z_{n}$, $\sigma^{2}:=\mathbb{E}f^{2}(Z_{i},Z_{j})$ and $\left\lVert f\right\rVert_{\infty}\leq C$, they show that there exist constants $c_{1},c_{2}>0$ such that for any $t>0$,

\[
\mathbb{P}\left(|U_{n}|\geq t\right)\leq c_{1}\exp\left(-\frac{c_{2}t}{\sigma+(Ct^{1/2}n^{-1/2})^{2/3}}\right).
\]

This tail bound has two regimes: exponential and Weibull of order $2/3$. Because of the appearance of the variance, this tail bound provides the correct rate of convergence. Theorem 3.3 of Giné et al. (2000) improved the tail bound by providing the optimal four regimes of the tail: Gaussian, exponential, and Weibull of orders $2/3$ and $1/2$. Houdré and Reynaud-Bouret (2003) gave an alternative proof of the result of Giné et al. (2000) using martingale inequalities with explicit constants. In particular, Theorem 3.3 of Giné et al. (2000) shows that for all $t\geq 0$,

\[
\mathbb{P}\left(|nU_{n}|\geq t\right)\leq L\exp\left(-\frac{1}{L}\min\left\{\frac{t^{2}}{C^{2}},\frac{t}{D},\frac{t^{2/3}}{B^{2/3}},\frac{t^{1/2}}{A^{1/2}}\right\}\right),
\]

for some constants $A,B,C,D$ and $L$. The main disadvantage of the results above is the restrictive boundedness assumption. Theorem 3.2 of Giné et al. (2000) actually applies without the boundedness condition, but the tail bound thus obtained is sub-optimal. For example, suppose $f(Z_{i},Z_{j})=Y_{i}g(X_{i},X_{j})Y_{j}$, where $Z_{i}=(X_{i},Y_{i})$, $\left\lVert g\right\rVert_{\infty}\leq C<\infty$ and the $Y_{i}$'s are mean zero (conditionally) sub-Weibull variables of order $\alpha>0$, that is, $\mathbb{P}(|Y_{i}|\geq t\,|\,X_{i})\leq 2\exp(-t^{\alpha})$. Then Theorem 3.2 of Giné et al. (2000) implies a tail bound of the form

\[
\mathbb{P}\left(|nU_{n}|\geq t\right)\leq L\exp\left(-\frac{1}{L}\min\left\{\frac{t^{2}}{C^{2}},\frac{t}{D},\frac{t^{\alpha_{1}}}{B^{\alpha_{1}}},\frac{t^{\alpha_{2}}}{A^{\alpha_{2}}}\right\}\right),
\]

where $\alpha_{1}^{-1}=3/2+1/\alpha$ and $\alpha_{2}^{-1}=2+2/\alpha$, so that the bounded case $\alpha=\infty$ recovers the orders $2/3$ and $1/2$. This is sub-optimal in comparison with the results of Kolesko and Latała (2015, Example 3). On the other hand, the results of Kolesko and Latała (2015) do not give the correct rate of convergence that can be obtained from the results of Giné et al. (2000), because the bound of Kolesko and Latała (2015) does not depend on the variance. We are not aware of any tail bounds in the literature that imply both the correct rate of convergence and the optimal tail behavior. We also note here the recent work of Bakhshizadeh (2023), which appeared after the initial working version (Chakrabortty and Kuchibhotla, 2018) of this preprint. While that work does consider general unbounded kernels, its focus is primarily on exponential bounds and large deviation principles for non-degenerate $U$-statistics, which differs from ours.

With regard to tail bounds for degenerate $U$-processes, some of the important works are Adamczak (2006), Clémençon et al. (2008) and Major (2013). The latter two papers only consider bounded kernels, and the bounds of Adamczak (2006) are written in terms of functionals that are in general hard to control. The results of Major (2005) and Major (2013) apply only to bounded kernels and are written for VC classes $\mathcal{F}_{n}$, but imply the correct rate of convergence. However, the results there do not show the optimal four regimes in the tail behavior. Theorem 11 of Clémençon et al. (2008) is written as a deviation inequality but does not imply the correct rate of convergence. For instance, if $f(X_{i},X_{j})=\varepsilon_{i}\varepsilon_{j}K((X_{i}-X_{j})/h)$ with $\varepsilon_{i}$ being Rademacher random variables independent of $X_{i}\in\mathbb{R}^{p}$, then the rate of convergence of

\[
T_{n}:=\sup_{h\in\{h_{n}\}}\left|\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}K\left(\frac{X_{i}-X_{j}}{h}\right)\right|,
\]

obtained from Theorem 11 of Clémençon et al. (2008) is $n\left\lVert K\right\rVert_{\infty}=O(n)$ (because of the term $Fn$ in the moment bound), whereas the correct rate of convergence is $nh_{n}^{p/2}$ (which can be obtained by calculating the variance). As in the case of $U$-statistics, we are not aware of any tail bound results that both obtain the correct rate of convergence and apply to unbounded kernels. Using truncation, decoupling, and the entropy method of Boucheron et al. (2005), we prove a deviation inequality for degenerate $U$-processes that implies the correct rate of convergence and the optimal tail behavior.

Organization.

The rest of the article is organized as follows. In Section 2 we prove exponential tail bounds for second order degenerate $U$-statistics. In Section 3 we prove a deviation bound for degenerate $U$-processes and also provide maximal inequalities to control the expectation of the maximum. The proofs of all the results are distributed across Appendices A, B and C.

2 Tail Bounds for Degenerate $U$-Statistics

We prove two tail bounds for degenerate $U$-statistics. The first is a general result applicable to all kernels that are bounded above by a product kernel, and the second is for more structured kernels that are of importance in non- and semi-parametric estimation. Define a random variable $W$ to be sub-Weibull of order $\alpha>0$ if $\left\lVert W\right\rVert_{\psi_{\alpha}}<\infty$, where $\psi_{\alpha}(x)=\exp(x^{\alpha})-1$ for $x\geq 0$ and

\[
\left\lVert W\right\rVert_{\psi_{\alpha}}=\inf\left\{C\geq 0:\,\mathbb{E}\left[\psi_{\alpha}\left(|W|/C\right)\right]\leq 1\right\}.
\]

Several properties of sums of independent sub-Weibull random variables are derived in Kuchibhotla and Chakrabortty (2022). The main focus of this section is to extend these results to degenerate $U$-statistics.
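The $\psi_{\alpha}$ norm defined above is the smallest $C$ for which $\mathbb{E}[\exp((|W|/C)^{\alpha})]\leq 2$, and this quantity is monotone in $C$, so it can be approximated by bisection. The following sketch replaces the expectation by an empirical mean over a sample; this plug-in version and the function name are illustrative assumptions, not part of the paper.

```python
import numpy as np

def empirical_psi_alpha_norm(w, alpha, iters=200):
    """Smallest C with mean(exp((|w|/C)^alpha)) - 1 <= 1, found by bisection."""
    w = np.abs(np.asarray(w, dtype=float))
    if w.max() == 0.0:
        return 0.0

    def orlicz_mean(c):
        # Empirical version of E[psi_alpha(|W| / c)]; overflow to inf is harmless here.
        with np.errstate(over="ignore"):
            return np.mean(np.exp((w / c) ** alpha)) - 1.0

    # Find a feasible upper end point by doubling.
    hi = w.max()
    while orlicz_mean(hi) > 1.0:
        hi *= 2.0
    # orlicz_mean is non-increasing in c, so bisect between 0 and hi.
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if orlicz_mean(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

# For W identically equal to 1 the norm solves exp(C^{-alpha}) = 2.
ones_norm = empirical_psi_alpha_norm(np.ones(10), alpha=1.0)  # close to 1 / log(2)

# The norm is positively homogeneous: scaling the sample scales the norm.
rng = np.random.default_rng(4)
sample = rng.standard_exponential(1000)
norm_1 = empirical_psi_alpha_norm(sample, alpha=0.5)
norm_3 = empirical_psi_alpha_norm(3.0 * sample, alpha=0.5)
```

An empirical norm of this kind is always finite, so it only gives a rough diagnostic for a heavy-tailed sample; it is not a test of the sub-Weibull property.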

Consider a degenerate $U$-statistic

\[
U_{n}^{D}:=\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j}),
\]

where $Z_{1},\ldots,Z_{n}$ are independent random variables and $\{f_{i,j}(\cdot,\cdot):\,1\leq i\neq j\leq n\}$ is a collection of degenerate (or canonical) kernels, i.e.,

\[
\mathbb{E}[f_{i,j}(Z_{i},Z_{j})\,|\,Z_{i}]=0=\mathbb{E}[f_{i,j}(Z_{i},Z_{j})\,|\,Z_{j}].
\]

We assume the following on the degenerate kernels $f_{i,j}$:

  1. (A1)

    For $1\leq i\neq j\leq n$, there exist non-negative functions $F_{i}(\cdot)$ and $G_{j}(\cdot)$ such that

    \[
    |f_{i,j}(Z_{i},Z_{j})|\leq F_{i}(Z_{i})G_{j}(Z_{j})\quad\mbox{and}\quad\|F_{i}(Z_{i})\|_{\psi_{\alpha}}\leq K_{F},\;\|G_{j}(Z_{j})\|_{\psi_{\beta}}\leq K_{G}.
    \]

The first part of assumption (A1) implies that the degenerate kernel $f_{i,j}$ can be expressed as $f_{i,j}(Z_{i},Z_{j})=F_{i}(Z_{i})w_{i,j}(Z_{i},Z_{j})G_{j}(Z_{j})$ for some collection of bounded kernels $\{w_{i,j}:1\leq i\neq j\leq n\}$. No further structure on the $w_{i,j}$'s is required. In the second result we consider below, we place additional structure on the $w_{i,j}$'s. The second part of assumption (A1) means that

\[
\mathbb{E}\left[\exp\left(\frac{|F_{i}(Z_{i})|^{\alpha}}{K_{F}^{\alpha}}\right)\right]\leq 2\quad\mbox{and}\quad\mathbb{E}\left[\exp\left(\frac{|G_{j}(Z_{j})|^{\beta}}{K_{G}^{\beta}}\right)\right]\leq 2.
\]

Equivalently, $F_{i}(Z_{i})$ is sub-Weibull$(\alpha)$ and $G_{j}(Z_{j})$ is sub-Weibull$(\beta)$, in the terminology of Kuchibhotla and Chakrabortty (2022). To present the result, we define a few quantities. Let $(Z_{1}^{\prime},Z_{2}^{\prime},\ldots,Z_{n}^{\prime})$ be an independent copy of $(Z_{1},\ldots,Z_{n})$, and set

\[
\begin{split}
\Lambda_{1/2}&:=\left(\mathbb{E}\left[\sum_{1\leq i\neq j\leq n}f_{i,j}^{2}(Z_{i},Z_{j})\right]\right)^{1/2},\\
\Lambda_{1}&:=\|(f_{i,j})\|_{L^{2}\to L^{2}}\\
&:=\sup\left\{\mathbb{E}\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j}^{\prime})\gamma_{i}(Z_{i})\delta_{j}(Z_{j}^{\prime}):\,\sum_{i=1}^{n}\mathbb{E}[\gamma_{i}^{2}(Z_{i})]\leq 1,\ \sum_{j=1}^{n}\mathbb{E}[\delta_{j}^{2}(Z_{j}^{\prime})]\leq 1\right\},\\
\Lambda_{\alpha}&:=\max_{1\leq i\leq n}\bigg\|\sum_{1\leq j\leq n,\,j\neq i}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})\,|\,Z_{i}]\bigg\|_{\psi_{\alpha/2}}^{1/2},\\
\Lambda_{\beta}&:=\max_{1\leq j\leq n}\bigg\|\sum_{1\leq i\leq n,\,i\neq j}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})\,|\,Z_{j}^{\prime}]\bigg\|_{\psi_{\beta/2}}^{1/2}.
\end{split} \tag{10}
\]

The quantities $\Lambda_{1/2}$ and $\Lambda_{1}$ also appear in the moment bound for degenerate $U$-statistics with bounded kernels; see Theorem 3.2 of Giné et al. (2000). Note that, since $\|F_{i}^{2}(Z_{i})\|_{\psi_{\alpha/2}}=\|F_{i}(Z_{i})\|_{\psi_{\alpha}}^{2}\leq K_{F}^{2}$, the quantity $\Lambda_{\alpha}$ can be trivially bounded as

\[
\Lambda_{\alpha}\leq K_{F}\max_{1\leq i\leq n}\sup_{z}\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}\left[\frac{f_{i,j}^{2}(z,Z_{j}^{\prime})}{F_{i}^{2}(z)}\right]\right)^{1/2}.
\]

A similar comment holds for $\Lambda_{\beta}$ as well. We use $\mathfrak{C}$, $\mathfrak{C}_{\alpha}$, $\mathfrak{C}_{\beta}$ and $\mathfrak{C}_{\alpha,\beta}$ to denote a universal constant and constants depending only on $\alpha$, $\beta$ and $(\alpha,\beta)$, respectively. We now present the first main result.

Theorem 1.

Under assumption (A1), for every $p\geq 1$,

\[
\begin{split}
&\left(\mathbb{E}\left[\left|\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j})\right|^{p}\right]\right)^{1/p}\\
&\leq\mathfrak{C}p^{1/2}\Lambda_{1/2}+\mathfrak{C}p\Lambda_{1}\\
&\quad+\mathfrak{C}_{\beta}p^{1/2+1/\beta^{*}}(\log n)^{1/\beta}\Lambda_{\beta}+\mathfrak{C}_{\alpha}p^{1/2+1/\alpha^{*}}(\log n)^{1/2+1/\alpha}\Lambda_{\alpha}\\
&\quad+\mathfrak{C}_{\alpha,\beta}p^{1/\alpha^{*}+1/\beta^{*}}K_{F}K_{G}(\log n)^{1/\alpha+1/\beta+1/\beta^{*}},
\end{split} \tag{11}
\]

where $\alpha^{*}=\min\{\alpha,1\}$ and $\beta^{*}=\min\{\beta,1\}$. Consequently, for every $\delta\in[0,1]$, with probability at least $1-\delta$,

\[
\begin{aligned}
\left|\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j})\right|
&\leq\mathfrak{C}(\log(1/\delta))^{1/2}\Lambda_{1/2}+\mathfrak{C}\log(1/\delta)\Lambda_{1}\\
&\quad+\mathfrak{C}_{\beta}(\log(1/\delta))^{1/2+1/\beta^{*}}(\log n)^{1/\beta}\Lambda_{\beta}+\mathfrak{C}_{\alpha}(\log(1/\delta))^{1/2+1/\alpha^{*}}(\log n)^{1/2+1/\alpha}\Lambda_{\alpha}\\
&\quad+\mathfrak{C}_{\alpha,\beta}(\log(1/\delta))^{1/\alpha^{*}+1/\beta^{*}}K_{F}K_{G}(\log n)^{1/\alpha+1/\beta+1/\beta^{*}}.
\end{aligned}
\]
Proof.

See Appendix A.1 for a proof. ∎

Theorem 1 reduces to Theorem 3.2 of Giné et al. (2000) by setting $\alpha=\beta=\infty$; note that if $\alpha=\beta=\infty$, then $\alpha^{*}=\beta^{*}=1$ and the log factors in the result become 1. The result is asymmetric in $\alpha,\beta$ only because of the structure of the proof. One can apply the result with the roles of $\alpha,\beta$ switched and take the minimum of the two bounds. We do not present this for brevity. It is interesting to note that the tail exhibits five different behaviors, including the commonly expected sub-Gaussian and sub-exponential tails. Because we did not make any assumption on the symmetry of the kernel, $\alpha$ and $\beta$ can be different. Under an assumption of symmetry, $\alpha=\beta$, the third and fourth terms of (11) have the same order in $p$, and Theorem 1 yields a tail bound that exhibits only four regimes.

Assuming only (A1), Theorem 1 provides a moment and tail bound for degenerate $U$-statistics. The appearance of the constants $\Lambda_{\alpha}$ and $\Lambda_{\beta}$ might make this result difficult to apply in some applications. For this reason, we provide our second result assuming a little more structure on the kernel. Suppose we have $n$ independent random variables $Z_{1}=(X_{1},Y_{1}),Z_{2}=(X_{2},Y_{2}),\ldots,Z_{n}=(X_{n},Y_{n})$ on some measurable space and a sequence of functions $\{w_{i,j}(\cdot,\cdot):\,1\leq i\neq j\leq n\}$. Consider, for functions $\phi(\cdot)$ and $\psi(\cdot)$, the $U$-statistic

\[
U_{n}:=\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j}),\quad\mbox{where}\quad f_{i,j}(Z_{i},Z_{j}):=\phi(Z_{i})w_{i,j}(X_{i},X_{j})\psi(Z_{j}). \tag{12}
\]

The kernels $f_{i,j}(\cdot,\cdot)$ are not required to be degenerate here. We will derive moment and tail bounds for the degenerate version $U_{n}^{D}$ of this $U$-statistic, given by

\[
U_{n}^{D}:=\sum_{1\leq i\neq j\leq n}f_{i,j}^{D}(Z_{i},Z_{j}),
\]

for the kernel $f_{i,j}^{D}(\cdot,\cdot)$ defined in (9). We first prove a basic lemma that reduces the problem of moment bounds on $U_{n}^{D}$ to a symmetrized version of $U_{n}$; see Theorem 3.5.3 of de la Peña and Giné (1999). For any random variable $W$, set $\left\lVert W\right\rVert_{p}=(\mathbb{E}[|W|^{p}])^{1/p}$ for $p\geq 1$.

Lemma 1.

For any $p\geq 1$,

\[
\left\lVert U_{n}^{D}\right\rVert_{p}\leq C\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})\right\rVert_{p},
\]

for Rademacher random variables $(\varepsilon_{i},\varepsilon_{i}^{\prime}:\,1\leq i\leq n)$. Here $C$ can be taken to be $192$, and $Z_{1}^{\prime}=(X_{1}^{\prime},Y_{1}^{\prime}),\ldots,Z_{n}^{\prime}=(X_{n}^{\prime},Y_{n}^{\prime})$ represents an independent copy of $Z_{1},\ldots,Z_{n}$, that is, a collection of $n$ independent random variables, independent of $Z_{1},\ldots,Z_{n}$, such that $Z_{i}^{\prime}$ is identically distributed as $Z_{i}$ for $1\leq i\leq n$.

The proof of this lemma (given in Appendix A.2) is based on the by-now classical decoupling inequalities of de la Peña (1992) and de la Peña and Giné (1999, Chapter 3). The result also holds in the case of degenerate $U$-processes and does not require the special structure of the kernels $f_{i,j}(\cdot,\cdot)$ in (12).

To prove moment and tail bounds for degenerate second order $U$-statistics with unbounded kernels, we use the following assumptions.

  1. (B1)

    There exist constants $0<\alpha,\beta,C_{\phi},C_{\psi}<\infty$ such that

    \[
    \max_{1\leq i\leq n}\mathbb{E}\left[\exp\left(\frac{|\phi(Z_{i})|^{\alpha}}{C_{\phi}^{\alpha}}\right)\Big|\,X_{i}\right]\leq 2,\quad\mbox{and}\quad\max_{1\leq i\leq n}\mathbb{E}\left[\exp\left(\frac{|\psi(Z_{i})|^{\beta}}{C_{\psi}^{\beta}}\right)\Big|\,X_{i}\right]\leq 2,
    \]

    hold almost surely.

  2. (B2)

    The functions $\{w_{i,j}(\cdot,\cdot):\,1\leq i\neq j\leq n\}$ are all uniformly bounded, that is,

    \[
    \max_{1\leq i\neq j\leq n}\sup_{(x,x^{\prime})\in\mathfrak{X}\times\mathfrak{X}}\left|w_{i,j}(x,x^{\prime})\right|\leq B_{w}.
    \]

The main techniques in our proof are truncation and Hoffmann-Jørgensen's inequality. Assumption (B1) implies that, conditional on the $X_{i}$'s, the maximum of $|\phi(Z_{i})|$ is at most a polynomial in $\log n$ (in rate). This, along with Assumption (B2), allows us to apply truncation at this rate and study the truncated part using the sharp results of Giné et al. (2000). The unbounded parts, which are of smaller order, are controlled using Hoffmann-Jørgensen's inequality. The bound $B_{w}$ in Assumption (B2) is allowed to grow with $n$, and all the kernels are also allowed to be functions of $n$. All the results to be presented here are non-asymptotic. For more applications of this technique, see Kuchibhotla and Chakrabortty (2022).
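The truncation step described above can be illustrated numerically. The sketch below is only an illustration under stated assumptions: `phi` and `psi` are independently generated symmetrized exponential samples standing in for $\phi(Z_{i})$ and $\psi(Z_{j}^{\prime})$, `w` is an arbitrary weight matrix bounded by one, and the truncation level `2 * log(n)` is a hypothetical choice of order $\log n$ rather than the conditional expectation of the maximum used in the proof. Splitting each factor at the truncation level writes the bilinear sum as four pieces, one bounded and three involving rare large values.

```python
import numpy as np

def truncate(values, level):
    """Split values into a bounded part (|v| <= level) and a tail part (|v| > level)."""
    small = np.abs(values) <= level
    return values * small, values * ~small

# Assumed setup: sub-exponential factors and a bounded weight matrix.
rng = np.random.default_rng(3)
n = 200
phi = rng.standard_exponential(n) * rng.choice([-1.0, 1.0], size=n)
psi = rng.standard_exponential(n) * rng.choice([-1.0, 1.0], size=n)
w = np.cos(rng.uniform(size=(n, n)))  # entries bounded by 1 in absolute value
np.fill_diagonal(w, 0.0)              # exclude the i = j terms

level = 2.0 * np.log(n)               # hypothetical truncation level of order log n
phi_small, phi_large = truncate(phi, level)
psi_small, psi_large = truncate(psi, level)

full = phi @ w @ psi
pieces = [a @ w @ b for a in (phi_small, phi_large) for b in (psi_small, psi_large)]
bounded_piece = pieces[0]             # every summand is at most level**2 in absolute value
```

Every summand of `bounded_piece` is at most `level**2` in absolute value, so results for bounded kernels apply to it, while the remaining three pieces vanish unless some factor exceeds the truncation level.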

Define

\[
T_{\phi}:=8\,\mathbb{E}\left[\max_{1\leq i\leq n}\left|\phi(Z_{i})\right|\,\Big|\,X_{1},\ldots,X_{n}\right],\quad T_{\psi}:=8\,\mathbb{E}\left[\max_{1\leq i\leq n}\left|\psi(Z_{i})\right|\,\Big|\,X_{1},\ldots,X_{n}\right],
\]

and the truncated random variables

\[
\begin{split}
\Phi_{i,1}:=\phi(Z_{i})\mathbb{1}\{|\phi(Z_{i})|\leq T_{\phi}\},\quad&\mbox{and}\quad\Phi_{i,2}:=\phi(Z_{i})\mathbb{1}\{|\phi(Z_{i})|>T_{\phi}\},\\
\Psi^{\prime}_{j,1}:=\psi(Z_{j}^{\prime})\mathbb{1}\{|\psi(Z_{j}^{\prime})|\leq T_{\psi}\},\quad&\mbox{and}\quad\Psi^{\prime}_{j,2}:=\psi(Z_{j}^{\prime})\mathbb{1}\{|\psi(Z_{j}^{\prime})|>T_{\psi}\}.
\end{split} \tag{13}
\]

It is clear that $\phi(Z_{i})=\Phi_{i,1}+\Phi_{i,2}$ and $\psi(Z_{j}^{\prime})=\Psi^{\prime}_{j,1}+\Psi^{\prime}_{j,2}$. Based on these, note that

\[
\begin{split}
\phi(Z_{i})w_{i,j}(X_{i},X_{j}^{\prime})\psi(Z_{j}^{\prime})&=\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}+\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\\
&\qquad+\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}+\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}.
\end{split} \tag{14}
\]

The first term on the right hand side is bounded in absolute value by $T_{\phi}B_{w}T_{\psi}$. The second and third terms are non-zero only when $\Phi_{i,2}$ and $\Psi^{\prime}_{j,2}$, respectively, are non-zero, which can happen only with small probability under Assumption (B1). Finally, the fourth term can be non-zero only if both $\Phi_{i,2}$ and $\Psi^{\prime}_{j,2}$ are non-zero, which can happen with even smaller probability. These four terms lead to four different degenerate $U$-statistics that will be controlled separately in Appendix A.3 to prove the following result. We need the following notation: for $1\leq i,j\leq n$,

\[
\sigma_{i,\phi}^{2}(x)=\mathbb{E}[\phi^{2}(Z_{i})\,|\,X_{i}=x]\quad\mbox{and}\quad\sigma_{j,\psi}^{2}(x)=\mathbb{E}[\psi^{2}(Z_{j})\,|\,X_{j}=x].
\]

Define $\Lambda_{2}:=C_{\phi}C_{\psi}B_{w}(\log n)^{\alpha^{-1}+\beta^{-1}}$ and

\[
\begin{aligned}
\Lambda_{1/2} &:=\left(\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[\sigma_{i,\phi}^{2}(X_{i})w_{i,j}^{2}(X_{i},X_{j})\sigma_{j,\psi}^{2}(X_{j})\right]\right)^{1/2},\\
\Lambda_{1} &:=\sup\left\{\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j})\sigma_{j,\psi}(X_{j})p_{j}(X_{j})\right]:\right.\\
&\qquad\qquad\quad\left.\sum_{i=1}^{n}\mathbb{E}\left[q_{i}^{2}(X_{i})\right]\leq 1,\,\sum_{j=1}^{n}\mathbb{E}\left[p_{j}^{2}(X_{j})\right]\leq 1\right\},\\
\Lambda_{3/2}^{(\alpha)} &:=C_{\phi}(\log n)^{1/\alpha}\sup_{x}\max_{1\leq i\leq n}\left(\sum_{j=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(x,X_{j})\sigma^{2}_{j,\psi}(X_{j})\right]\right)^{1/2},\\
\Lambda_{3/2}^{(\beta)} &:=C_{\psi}(\log n)^{1/\beta}\sup_{x}\max_{1\leq j\leq n}\left(\sum_{i=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(X_{i},x)\sigma^{2}_{i,\phi}(X_{i})\right]\right)^{1/2},\\
\Lambda_{\alpha^{*}} &:=(\log n)^{1/2}\Lambda_{3/2}^{(\alpha)}+(\log n)\Lambda_{2},\quad\mbox{and}\\
\Lambda_{\beta^{*}} &:=(\log n)^{1/2}\Lambda_{3/2}^{(\beta)}+(\log n)\Lambda_{2}.
\end{aligned}
\]

The quantities $\Lambda_{1/2},\Lambda_{1},\Lambda_{3/2}^{(\alpha)},\Lambda_{3/2}^{(\beta)},\Lambda_{2}$ also appear in the case of bounded kernels, as shown in Theorem 3.2 of Giné et al. (2000).

Theorem 2.

Under Assumptions (B1) and (B2), there exists constant K>0K>0 (depending only on α,β\alpha,\beta) such that for all p1p\geq 1

UnDp\displaystyle\left\lVert U_{n}^{D}\right\rVert_{p} Kp1/2Λ1/2+KpΛ1+Kp1/αΛα+Kp1/βΛβ\displaystyle\leq Kp^{1/2}\Lambda_{1/2}+Kp\Lambda_{1}+Kp^{1/\alpha^{*}}\Lambda_{\alpha^{*}}+Kp^{1/\beta^{*}}\Lambda_{\beta^{*}}
+Kp1/2+1/αΛ3/2(α)+Kp1/2+1/βΛ3/2(β)+Kp1/α+1/βΛ2.\displaystyle\qquad+Kp^{1/2+1/\alpha^{*}}\Lambda_{3/2}^{(\alpha)}+Kp^{1/2+1/\beta^{*}}\Lambda_{3/2}^{(\beta)}+Kp^{1/\alpha^{*}+1/\beta^{*}}\Lambda_{2}.

Here $\alpha^{*}:=\min\{\alpha,1\}$ and $\beta^{*}:=\min\{\beta,1\}$. By Markov's inequality, there exists a constant $K^{\prime}>0$ such that for any $t\geq 0$,

\[
\mathbb{P}\left(|U_{n}^{D}|\geq K^{\prime}\mathcal{T}_{\alpha,\beta}(t)\right)\leq 2\exp(-t),\tag{15}
\]

where

\[
\begin{aligned}
\mathcal{T}_{\alpha,\beta}(t) &:=\sqrt{t}\Lambda_{1/2}+t\Lambda_{1}+t^{1/\alpha^{*}}\Lambda_{\alpha^{*}}+t^{1/\beta^{*}}\Lambda_{\beta^{*}}\\
&\qquad+t^{1/2+1/\alpha^{*}}\Lambda_{3/2}^{(\alpha)}+t^{1/2+1/\beta^{*}}\Lambda_{3/2}^{(\beta)}+t^{1/\alpha^{*}+1/\beta^{*}}\Lambda_{2}.
\end{aligned}
\]
Proof.

See Appendix A.3 for a proof.
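The threshold $\mathcal{T}_{\alpha,\beta}(t)$ in (15) is a plain sum of seven terms, so it can be evaluated numerically once the $\Lambda$ quantities are known. The following is a minimal sketch under stated assumptions: the function name and argument names are hypothetical, the $\Lambda$ values are supplied by the user as numbers, and the exponents follow the moment bound of Theorem 2 (in particular $t^{1/2+1/\beta^{*}}$ for the $\Lambda_{3/2}^{(\beta)}$ term).

```python
import math


def tail_threshold(t, alpha, beta, lam_half, lam_one,
                   lam_32_alpha, lam_32_beta, lam_two, log_n):
    """Evaluate the seven-term threshold T_{alpha,beta}(t).

    All lam_* arguments are user-supplied numbers standing in for
    Lambda_{1/2}, Lambda_1, Lambda_{3/2}^{(alpha)}, Lambda_{3/2}^{(beta)}
    and Lambda_2 (hypothetical argument names, not from the paper).
    """
    a_star = min(alpha, 1.0)  # alpha^* = min(alpha, 1)
    b_star = min(beta, 1.0)   # beta^*  = min(beta, 1)
    # Lambda_{alpha^*} and Lambda_{beta^*} as defined before Theorem 2.
    lam_a_star = math.sqrt(log_n) * lam_32_alpha + log_n * lam_two
    lam_b_star = math.sqrt(log_n) * lam_32_beta + log_n * lam_two
    return (
        math.sqrt(t) * lam_half
        + t * lam_one
        + t ** (1.0 / a_star) * lam_a_star
        + t ** (1.0 / b_star) * lam_b_star
        + t ** (0.5 + 1.0 / a_star) * lam_32_alpha
        + t ** (0.5 + 1.0 / b_star) * lam_32_beta
        + t ** (1.0 / a_star + 1.0 / b_star) * lam_two
    )


# Toy inputs only: every Lambda set to 1 and log n set to 1.
example = tail_threshold(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
```

With these toy inputs each of $\Lambda_{\alpha^{*}}$ and $\Lambda_{\beta^{*}}$ equals 2, so the seven terms add up to 9. The sketch does not supply the unspecified constant $K^{\prime}$.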

Remark 2.1 (Comparison with previous results) As noted in the introduction, an important feature of our result is that the kernel is allowed to be unbounded, provided it has suitable tail behavior. The tail of the degenerate $U$-statistic shown in (15) has seven different regimes, the prominent ones being the Gaussian and exponential parts. These seven regimes collapse to five if $\alpha=\beta$. In particular, if $\alpha=\beta\leq 1$, then for $p\geq 1$,

\[
\begin{aligned}
\left\lVert U_{n}^{D}\right\rVert_{p} &\leq K\mathbf{p^{1/2}}\Lambda_{1/2}+K\mathbf{p}\Lambda_{1}+K\mathbf{p^{1/\alpha}}\left[(\log n)^{1/2}\left\{\Lambda_{3/2}^{(\alpha)}+\Lambda_{3/2}^{(\beta)}\right\}+(\log n)\Lambda_{2}\right]\\
&\quad+K\mathbf{p^{1/2+1/\alpha}}\left[\Lambda_{3/2}^{(\alpha)}+\Lambda_{3/2}^{(\beta)}\right]+K\mathbf{p^{1/\alpha+1/\beta}}\Lambda_{2}.
\end{aligned}
\]

If $\alpha=\beta=\infty$, then assumption (B1) implies boundedness of the kernels. In this case, only four regimes remain, and Theorem 2 essentially coincides with Theorem 3.2 of Giné et al., (2000), except for the additional $\sqrt{\log n}$ and $\log n$ factors. We believe these factors to be artifacts of our proof; they could be avoided by closely following the proof of Theorem 1. $\diamond$

3 Tail Bounds for Degenerate U-Processes

In this section, we generalize Theorem 2 to degenerate $U$-processes. Consider

\[
\mathcal{U}_{n}(\mathcal{W}):=\sup_{w\in\mathcal{W}}\left|\mathcal{U}_{n}(w)\right|,\quad\mbox{where}\quad\mathcal{U}_{n}(w):=\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\phi(Z_{i})w_{i,j}(X_{i},X_{j}^{\prime})\psi(Z_{j}^{\prime})\varepsilon_{j}^{\prime},
\]

for some function class $\mathcal{W}$ with elements of the type $w=(w_{i,j})_{1\leq i\neq j\leq n}$. If $\mathcal{W}$ is a singleton, then this reduces to the $U$-statistic studied in Section 2. Here $\varepsilon_{1},\ldots,\varepsilon_{n}$ denote an independent sequence of Rademacher random variables as before. For simplicity, we consider the symmetrized version; by Lemma 1, the results also hold for the original degenerate $U$-process (see Theorem 3.5.3 of de la Peña and Giné, (1999) for details).
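For a finite class $\mathcal{W}$, the symmetrized statistic $\mathcal{U}_{n}(w)$ and its supremum can be computed by direct double summation. The sketch below is only an illustration: the function names, the calling convention `w(i, j, x, x_prime)`, and the toy inputs are all assumptions made here, and the values $\phi(Z_{i})$ and $\psi(Z_{j}^{\prime})$ are passed in as precomputed lists.

```python
def symmetrized_u_stat(eps, phi_vals, w, x, x_prime, psi_vals, eps_prime):
    """Sum over i != j of eps_i * phi(Z_i) * w_ij(X_i, X'_j) * psi(Z'_j) * eps'_j.

    phi_vals[i] stands for phi(Z_i) and psi_vals[j] for psi(Z'_j);
    w is a callable w(i, j, x_i, x_prime_j) (hypothetical convention).
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += (eps[i] * phi_vals[i] * w(i, j, x[i], x_prime[j])
                          * psi_vals[j] * eps_prime[j])
    return total


def sup_over_class(ws, eps, phi_vals, x, x_prime, psi_vals, eps_prime):
    """Supremum of |U_n(w)| over a finite list of kernels ws."""
    return max(
        abs(symmetrized_u_stat(eps, phi_vals, w, x, x_prime, psi_vals, eps_prime))
        for w in ws
    )


# Toy inputs with n = 2 (illustrative values only).
eps, eps_prime = [1, -1], [1, 1]
phi_vals, psi_vals = [2.0, 3.0], [1.0, 1.0]
x, x_prime = [0.0, 1.0], [0.0, 1.0]
w_const = lambda i, j, a, b: 1.0
w_diff = lambda i, j, a, b: a - b

u_const = symmetrized_u_stat(eps, phi_vals, w_const, x, x_prime, psi_vals, eps_prime)
u_sup = sup_over_class([w_const, w_diff], eps, phi_vals, x, x_prime, psi_vals, eps_prime)
```

When the list `ws` has a single element, `sup_over_class` returns the absolute value of the statistic for that kernel, matching the singleton case described above.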

$U$-processes were introduced in Nolan and Pollard, (1987) to study cross-validation in the context of kernel density estimation. They studied uniform almost sure limit theorems and established the rate of convergence. These results parallel the Glivenko-Cantelli theorems well-known for empirical processes. Functional limit theorems were established in Nolan et al., (1988). Exponential tail bounds that parallel the classical Bernstein's inequality for non-degenerate and degenerate $U$-statistics were given in Arcones and Giné, (1993). They also established LLN and CLT type results under various metric entropy conditions. Most of these results require boundedness of the kernel functions. Being asymptotic in nature, some of these results can be extended to the case of unbounded kernels using a truncation argument. Finite sample concentration inequalities for degenerate unbounded $U$-processes are not readily available.

The only work (we are aware of) that provides general results for $U$-processes applicable to $\mathcal{U}_{n}(\mathcal{W})$ is Adamczak, (2006). In this work, degenerate $U$-processes of arbitrary order were considered. However, the moment bounds for $U$-processes in this work depend further on the moment bounds of some complicated degenerate $U$-processes of lower order. Furthermore, the tail behavior thus obtained is not sharp for unbounded $U$-processes.

To avoid measurability issues for $\mathcal{U}_{n}(\mathcal{W})$, we use either of the following conventions. One simple assumption on $\mathcal{W}$, used in van der Vaart and Wellner, (1996), that implies measurability is separability; in this case we can replace $\mathcal{W}$ by a countable dense subset of $\mathcal{W}$. Another convention, used in Talagrand, (2014), is to define, for any $\mathcal{W}$ and increasing function $f(\cdot)$,

\[
\mathbb{E}\left[f\left(\mathcal{U}_{n}(\mathcal{W})\right)\right]:=\sup\left\{\mathbb{E}\left[f(\mathcal{U}_{n}(\mathcal{F}))\right]:\,\mathcal{F}\subseteq\mathcal{W}\mbox{ a finite subset}\right\}.
\]

Based on either convention, we treat $\mathcal{W}$ as a countable set for the remainder of this section.

One “simple” way to obtain tail bounds for $\mathcal{U}_{n}(\mathcal{W})$ is via generic chaining, as follows. First, apply Theorem 2 to $\mathcal{U}_{n}(w)-\mathcal{U}_{n}(w^{\prime})$ for functions $w,w^{\prime}\in\mathcal{W}$. The tail bound (15) provides a mixed tail in terms of various semi-metrics on $\mathcal{W}$. Using these and following the proof of the classical generic chaining bound (e.g., Theorem 3.5 of Dirksen, (2015)), one can obtain tail bounds for $U$-processes in terms of $\gamma$-functionals; see Talagrand, (2014) and Dirksen, (2015) for details. A problem with this approach is the difficulty of controlling the $\gamma$-functionals. This approach with Dudley's chaining (instead of generic chaining) was used for bounded kernel $U$-processes in Nolan and Pollard, (1987) and Nolan et al., (1988).

In the following, we first provide a deviation inequality for $\mathcal{U}_{n}(\mathcal{W})$ and then prove a maximal inequality to control the expectations appearing in the deviation inequality. For these results, we consider the following generalization of assumption (B2).

(A2) The functions $\{w:\,w\in\mathcal{W}\}$ are all uniformly bounded, that is,

\[
\sup_{w\in\mathcal{W}}\sup_{(x,x^{\prime})\in\mathfrak{X}\times\mathfrak{X}}\,\max_{1\leq i\neq j\leq n}\left|w_{i,j}(x,x^{\prime})\right|\leq B_{\mathcal{W}}.
\]

We will use the notation $\Phi_{i,1},\Phi_{i,2},\Psi_{i,1}^{\prime},\Psi_{i,2}^{\prime}$ given in (13). For the main result of this section, define

\[
\begin{aligned}
\Lambda_{2}(\mathcal{W}) &:=(\log n)^{\alpha^{-1}+\beta^{-1}}C_{\phi}C_{\psi}B_{\mathcal{W}},\\
E_{n,1}(\mathcal{W}) &:=C_{\psi}(\log n)^{1/\beta}\sup_{x\in\mathfrak{X}}\max_{1\leq j\leq n}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},x)\right|\right],\\
E_{n,2}(\mathcal{W}) &:=C_{\phi}(\log n)^{1/\alpha}\sup_{x\in\mathfrak{X}}\max_{1\leq i\leq n}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left|\sum_{j=1,j\neq i}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}w_{i,j}(x,X_{j}^{\prime})\right|\right],\\
\mathfrak{W}_{n,1}(\mathcal{W}) &:=\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sup_{\{p_{j}\}}\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,1}\int p_{j}(x)\sigma_{j,\psi}(x)w_{i,j}(X_{i},x)P_{X_{j}}(dx)\right],\\
\mathfrak{W}_{n,2}(\mathcal{W}) &:=\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sup_{\{q_{i}\}}\sum_{1\leq i\neq j\leq n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j}^{\prime})P_{X_{i}}(dx)\right],\\
\Sigma_{n,1}^{1/2}(\mathcal{W}) &:=C_{\psi}(\log n)^{1/\beta}\sup_{x\in\mathfrak{X}}\sup_{w\in\mathcal{W}}\,\max_{1\leq j\leq n}\left(\sum_{i=1,i\neq j}^{n}\mathbb{E}\left[\sigma_{i,\phi}^{2}(X_{i})w^{2}_{i,j}(X_{i},x)\right]\right)^{1/2},\\
\Sigma_{n,2}^{1/2}(\mathcal{W}) &:=C_{\phi}(\log n)^{1/\alpha}\sup_{x\in\mathfrak{X}}\sup_{w\in\mathcal{W}}\max_{1\leq i\leq n}\left(\sum_{j=1,j\neq i}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j})w^{2}_{i,j}(x,X_{j})\right]\right)^{1/2},\\
\left\lVert(\phi w\psi)_{\mathcal{W}}\right\rVert_{2\to 2} &:=\sup_{w\in\mathcal{W}}\sup_{\{q_{i}\}}\sup_{\{p_{j}\}}\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j}^{\prime})\sigma_{j,\psi}(X_{j}^{\prime})p_{j}(X_{j}^{\prime})\right].
\end{aligned}
\]

Here in the definitions, the supremum over $\{q_{i}\}$ (or $\{p_{j}\}$) represents the supremum over all functions $(q_{1},\ldots,q_{n})$ (or $(p_{1},\ldots,p_{n})$) satisfying

\[
\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1,\quad\mbox{and}\quad\sum_{j=1}^{n}\int p_{j}^{2}(x)P_{X_{j}}(dx)\leq 1,
\]

where $P_{X_{i}}(\cdot)$ denotes the probability measure of $X_{i}$. Note that $\left\lVert(\phi w\psi)_{\mathcal{W}}\right\rVert_{2\to 2}$ is similar to $\Lambda_{1}$ defined for Theorem 2.

Theorem 3.

Under assumptions (B1) and (A2), there exists a constant $K>0$ (depending only on $\alpha,\beta$) such that for all $p\geq 1$,

\[
\begin{aligned}
\left\lVert\mathcal{U}_{n}(\mathcal{W})\right\rVert_{p} &\leq K\mathbb{E}\left[\mathcal{U}_{n}^{(1)}(\mathcal{W})\right]+Kp^{1/2}(\mathfrak{W}_{n,1}(\mathcal{W})+\mathfrak{W}_{n,2}(\mathcal{W}))+Kp\left\lVert(\phi w\psi)_{\mathcal{W}}\right\rVert_{2\to 2}\\
&\quad+Kp^{1/\alpha^{*}}\left[E_{n,2}(\mathcal{W})+\Sigma_{n,2}^{1/2}(\mathcal{W})\sqrt{\log n}+\Lambda_{2}(\mathcal{W})\log n\right]\\
&\quad+Kp^{1/\beta^{*}}\left[E_{n,1}(\mathcal{W})+\Sigma_{n,1}^{1/2}(\mathcal{W})\sqrt{\log n}+\Lambda_{2}(\mathcal{W})\log n\right]\\
&\quad+Kp^{1/2+1/\alpha^{*}}\Sigma_{n,2}^{1/2}(\mathcal{W})+Kp^{1/2+1/\beta^{*}}\Sigma_{n,1}^{1/2}(\mathcal{W})+Kp^{1/\alpha^{*}+1/\beta^{*}}\Lambda_{2}(\mathcal{W}).
\end{aligned}
\]
Proof.

See Appendix B.1 for a proof. ∎

If $\mathcal{W}$ is a singleton set, then the above result reduces to Theorem 2. From the moment bound above, a tail bound follows from Markov's inequality. As in Theorem 2, we get seven different tail regimes, which reduce to five if $\alpha=\beta$. Unlike the result of Adamczak, (2006), the moment bound in Theorem 3 depends only on certain expectations. An additional advantage of Theorem 3 is that all the expectations involve only bounded random variables.
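The step from a moment bound to a tail bound can be spelled out as a short worked derivation. This is the standard Markov argument, written for a generic random variable $X$ and a generic function $M(p)$ with $\lVert X\rVert_{p}\leq M(p)$; both symbols are introduced here for illustration only and do not appear in the original text.

```latex
% Sketch: moment bound => tail bound.
% X and M(p) are illustrative symbols; assume 0 < \|X\|_p \le M(p) for all p \ge 1.
\begin{align*}
\mathbb{P}\bigl(|X| \geq e\,M(p)\bigr)
  &\leq \mathbb{P}\bigl(|X| \geq e\,\lVert X\rVert_{p}\bigr)
  && \text{since } M(p) \geq \lVert X\rVert_{p},\\
  &\leq \frac{\mathbb{E}|X|^{p}}{e^{p}\,\lVert X\rVert_{p}^{p}}
  && \text{by Markov's inequality applied to } |X|^{p},\\
  &= e^{-p}.
\end{align*}
% Taking p = t for t \ge 1 gives  P(|X| \ge e M(t)) \le e^{-t}.
```

Taking $X=\mathcal{U}_{n}(\mathcal{W})$ and $M(p)$ equal to the right-hand side of Theorem 3 gives a tail bound for $t\geq 1$. Values $t<1$ are usually absorbed by enlarging the constant and allowing a prefactor such as the 2 in (15).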

3.1 Maximal Inequality for Bounded Degenerate U-Processes

To apply Theorem 3, we need to control the various expectations appearing on the right-hand side of the moment bound there. Except for $\mathbb{E}[\mathcal{U}_{n}^{(1)}(\mathcal{W})]$, all the other quantities are related to maximal inequalities for empirical processes. See van der Vaart and Wellner, (2011) and Lemmas 3.4.2-3.4.3 of van der Vaart and Wellner, (1996) for maximal inequalities of empirical processes. In this section, we provide a maximal inequality for $\mathcal{U}_{n}^{(1)}(\mathcal{W})$. For independent and identically distributed random variables, Chen and Kato, (2020, Theorem 5.1) provide a maximal inequality for degenerate $U$-processes of arbitrary order. This result is similar to Theorem 2.1 of van der Vaart and Wellner, (2011) for empirical processes. The same proof as in Chen and Kato, (2020) does not provide the “correct” bound in the case of possibly non-identically distributed observations, since it uses Hoeffding averaging, which can lead to a sub-optimal rate if the observations are not identically distributed. A modification of the proof leads to the maximal inequality below.

For any $\eta>0$, function class $\mathcal{F}$ containing functions $f=(f_{i,j})_{1\leq i\neq j\leq n}:\chi\times\chi\to\mathbb{R}$, and a discrete probability measure $Q$ with support $\{z_{1},\ldots,z_{t}\}$, let $N(\eta,\mathcal{F},\left\lVert\cdot\right\rVert_{2,Q})$ denote the minimum $m$ such that there exist $f^{(1)},f^{(2)},\ldots,f^{(m)}\in\mathcal{F}$ satisfying

\[
\sup_{f\in\mathcal{F}}\inf_{1\leq j\leq m}\left\lVert f-f^{(j)}\right\rVert_{2,Q}\leq\eta,
\]

where for $f\in\mathcal{F}$,

\[
\left\lVert f\right\rVert_{2,Q}^{2}:=\frac{\sum_{1\leq i\neq j\leq t}f_{i,j}^{2}(z_{i},z_{j})Q(\{z_{i}\})Q(\{z_{j}\})}{\sum_{1\leq i\neq j\leq t}Q(\{z_{i}\})Q(\{z_{j}\})}.
\]

Note that the right-hand side is an expectation with respect to the measure induced on $\{(z_{i},z_{j}):1\leq i\neq j\leq t\}$. The uniform entropy integral needed for $U$-processes is given by

\[
J_{2}(\delta,\mathcal{F},\left\lVert\cdot\right\rVert_{2}):=\sup_{Q}\int_{0}^{\delta}\log N(\eta\left\lVert F\right\rVert_{2,Q},\mathcal{F},\left\lVert\cdot\right\rVert_{2,Q})\,d\eta.
\]

Here $F=(F_{i,j})_{1\leq i\neq j\leq n}$ represents the envelope function for $\mathcal{F}$, satisfying $|f_{i,j}(x,x^{\prime})|\leq F_{i,j}(x,x^{\prime})$ for all $f\in\mathcal{F}$ and $x,x^{\prime}\in\chi$, and the supremum is taken over all discrete probability measures $Q$ supported on $\chi\times\chi$.
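The seminorm $\left\lVert f\right\rVert_{2,Q}$ defined above is a weighted root-mean-square of $f_{i,j}(z_{i},z_{j})$ over ordered pairs of distinct support points, so it is straightforward to evaluate for a given discrete $Q$. The sketch below is an illustration under stated assumptions: the function name, the calling convention `f(i, j, z_i, z_j)`, and the three-point uniform measure are all hypothetical choices made here.

```python
import math


def seminorm_2q(f, points, weights):
    """Compute ||f||_{2,Q} for a discrete measure Q.

    points[i] is the support point z_i and weights[i] is Q({z_i}).
    f is a callable f(i, j, z_i, z_j) (hypothetical convention).
    At least two support points are needed, otherwise the
    normalising sum over pairs i != j is empty.
    """
    t = len(points)
    numerator = 0.0
    denominator = 0.0
    for i in range(t):
        for j in range(t):
            if i != j:
                mass = weights[i] * weights[j]
                numerator += f(i, j, points[i], points[j]) ** 2 * mass
                denominator += mass
    return math.sqrt(numerator / denominator)


# Toy example: uniform Q on three points (illustrative values only).
pts = [0.0, 1.0, 2.0]
wts = [1.0 / 3.0] * 3
norm_const = seminorm_2q(lambda i, j, a, b: 2.0, pts, wts)
norm_diff = seminorm_2q(lambda i, j, a, b: a - b, pts, wts)
```

For the constant kernel the seminorm equals the constant, and for the difference kernel the six squared differences average to 2, giving $\sqrt{2}$. Computing covering numbers or the entropy integral itself is a separate and harder task that this sketch does not attempt.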

The following theorem gives a maximal inequality, proved using Theorem 5.1.4 of de la Peña and Giné, (1999). The proof is very similar to that of Theorem 5.1 of Chen and Kato, (2020), which itself was based on the proof of Theorem 2.1 of van der Vaart and Wellner, (2011).

Theorem 4.

Suppose $\mathcal{F}$ represents a class of real-valued functions $f:\chi\times\chi\to\mathbb{R}$ uniformly bounded by $R$ with the envelope function $F$. Then there exists a universal constant $C>0$ such that

\[
\mathbb{E}\left[\sup_{f\in\mathcal{F}}\left|\frac{\sum_{1\leq i\neq j\leq n}\epsilon_{i}\epsilon_{j}f_{i,j}(X_{i},X_{j})}{\sqrt{n(n-1)}}\right|\right]\leq C\left\lVert F\right\rVert_{2,P}J_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})\left[1+\frac{J_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})b^{2}}{a^{2}}\right],
\]

for any $a\geq A_{n}$ and $b\geq B_{n}$, where $B_{n}^{2}=R/(n\left\lVert F\right\rVert_{2,P})$,

\[
\begin{aligned}
\left\lVert F\right\rVert_{2,P}^{2} &:=\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}\mathbb{E}[F^{2}_{i,j}(X_{i},X_{j})],\\
A_{n}^{2} &:=\left\lVert F\right\rVert_{2,P}^{-2}\left[\Gamma_{n,1}^{2}(\mathcal{F})+\Gamma_{n,2}^{2}(\mathcal{F})+\Sigma_{n}^{2}(\mathcal{F})\right],\\
\Gamma_{n,1}^{2}(\mathcal{F}) &:=\mathbb{E}\left[\sup_{f\in\mathcal{F}}\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\left\{\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})|X_{i}\right]-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right]\right\}\right|\right],\\
\Gamma_{n,2}^{2}(\mathcal{F}) &:=\mathbb{E}\left[\sup_{f\in\mathcal{F}}\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\left\{\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})|X_{j}\right]-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right]\right\}\right|\right],\\
\Sigma_{n}^{2}(\mathcal{F}) &:=\sup_{f\in\mathcal{F}}\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right].
\end{aligned}
\]
Proof.

See Appendix C for a proof.

APPENDIX

Appendix A Proofs of Results in Section 2

A.1 Proof of Theorem 1

By Theorem 3.5.3 of de la Peña and Giné, (1999), it follows that

UnDp𝒰np,\|U_{n}^{D}\|_{p}\leq\mathfrak{C}\left\|\mathcal{U}_{n}\right\|_{p},

where

𝒰n:=1ijnfi,j(Zi,Zj).\mathcal{U}_{n}:=\sum_{1\leq i\neq j\leq n}f_{i,j}(Z_{i},Z_{j}^{\prime}).

For 1in1\leq i\leq n and any zz, define

hi(z):=1jn,jifi,j(z,Zj).h_{i}(z):=\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\,f_{i,j}(z,Z_{j}^{\prime}).

Observe that

𝒰n=i=1nhi(Zi).\mathcal{U}_{n}=\sum_{i=1}^{n}h_{i}(Z_{i}).
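The identity $\mathcal{U}_{n}=\sum_{i=1}^{n}h_{i}(Z_{i})$ is just a regrouping of the double sum by its first index, and it can be checked numerically. The sketch below is illustrative only: the product kernel, the Uniform(0,1) samples, and all function names are assumptions made here (the kernel is centred so that it is degenerate for uniform inputs).

```python
import random


def kernel(i, j, z, z_prime):
    # Hypothetical kernel f_{i,j}; centred so it integrates to zero in
    # each argument when the inputs are Uniform(0, 1).
    return (1 + i + j) * (z - 0.5) * (z_prime - 0.5)


def decoupled_u(z, z_prime):
    """Sum of f_{i,j}(Z_i, Z'_j) over all ordered pairs i != j."""
    n = len(z)
    return sum(kernel(i, j, z[i], z_prime[j])
               for i in range(n) for j in range(n) if i != j)


def h(i, z_value, z_prime):
    """h_i(z): sum of f_{i,j}(z, Z'_j) over j != i."""
    return sum(kernel(i, j, z_value, z_prime[j])
               for j in range(len(z_prime)) if j != i)


rng = random.Random(0)  # fixed seed so the check is reproducible
n = 6
z = [rng.random() for _ in range(n)]
z_prime = [rng.random() for _ in range(n)]

lhs = decoupled_u(z, z_prime)
rhs = sum(h(i, z[i], z_prime) for i in range(n))
```

The two quantities agree up to floating-point rounding. Conditional on the primed sample, each `h(i, z[i], z_prime)` depends on a single $Z_{i}$, which is the structure the proof exploits.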

First, we consider the behavior of $h_{i}(z)$ for a fixed $z$ and then the behavior of $\mathcal{U}_{n}$. By the degeneracy of the kernel, $h_{i}(z)$ is a sum of independent mean zero random variables for every $i,z$. Moreover, $\|f_{i,j}(z,Z_{j}^{\prime})\|_{\psi_{\beta}}\leq F_{i}(z)K_{G}$ for all $i,z$. Hence, Theorem 3.4 of Kuchibhotla and Chakrabortty, (2022) (with $q=1$ and $t=\log(3/\delta)$) implies

(|hi(z)|7log(3/δ)(1jn,ji𝔼[fi,j2(z,Zj)])1/2+CβFi(z)KG(log(2n))1/β(log(3/δ))1/β)δ,\mathbb{P}\left(|h_{i}(z)|\geq 7\sqrt{\log(3/\delta)}\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}[f_{i,j}^{2}(z,Z_{j}^{\prime})]\right)^{1/2}+C_{\beta}F_{i}(z)K_{G}(\log(2n))^{1/\beta}(\log(3/\delta))^{1/\beta^{*}}\right)\leq\delta,

where β=min{β,1}\beta^{*}=\min\{\beta,1\}. Based on this, define

Hi(z;δ):=Fi(z)i(z,δ/n),H_{i}(z;\delta):=F_{i}(z)\mathcal{B}_{i}(z,\delta/n),

where

i(z,δ):=7log(3/δ)(1jn,ji𝔼[fi,j2(z,Zj)Fi2(z)])1/2+CβKG(log(2n))1/β(log(3/δ))1/β.\mathcal{B}_{i}(z,\delta):=7\sqrt{\log(3/\delta)}\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}\left[\frac{f_{i,j}^{2}(z,Z_{j}^{\prime})}{F_{i}^{2}(z)}\right]\right)^{1/2}+C_{\beta}K_{G}(\log(2n))^{1/\beta}(\log(3/\delta))^{1/\beta^{*}}.

Getting back to the behavior of 𝒰n\mathcal{U}_{n}, we first note that by degeneracy and symmetrization,

𝒰np2i=1nεihi(Zi)pfor allp1.\|\mathcal{U}_{n}\|_{p}\leq 2\left\|\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i})\right\|_{p}\quad\mbox{for all}\quad p\geq 1. (16)

Here ε1,,εn\varepsilon_{1},\ldots,\varepsilon_{n} are independent Rademacher random variables independent of Z1,,ZnZ_{1},\ldots,Z_{n}, Z1,,ZnZ_{1}^{\prime},\ldots,Z_{n}^{\prime}. Hence, it suffices to understand the behavior of

𝒰n:=i=1nεihi(Zi).\mathcal{U}_{n}^{\prime}:=\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i}).

(The introduction of Rademacher variables is only done for notational convenience in applying truncation.)

(|𝒰n|t)(|i=1nεihi(Zi)𝟏{|hi(Zi)|Hi(Zi;δ1)}|t)+(|hi(Zi)|>Hi(Zi;δ1)for some1in)(|i=1nεihi(Zi)𝟏{|hi(Zi)|Hi(Zi;δ1)}|t)+i=1n(|hi(Zi)|>Hi(Zi;δ1))(|i=1nεihi(Zi)𝟏{|hi(Zi)|Hi(Zi;δ1)}|t)+δ1.\begin{split}\mathbb{P}\left(|\mathcal{U}_{n}^{\prime}|\geq t\right)&\leq\mathbb{P}\left(\left|\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right|\geq t\right)\\ &\quad+\mathbb{P}\left(|h_{i}(Z_{i})|>H_{i}(Z_{i};\delta_{1})\quad\mbox{for some}\quad 1\leq i\leq n\right)\\ &\leq\mathbb{P}\left(\left|\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right|\geq t\right)\\ &\quad+\sum_{i=1}^{n}\mathbb{P}(|h_{i}(Z_{i})|>H_{i}(Z_{i};\delta_{1}))\\ &\leq\mathbb{P}\left(\left|\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right|\geq t\right)+\delta_{1}.\end{split} (17)

Because $\{h_{i}(Z_{i}):1\leq i\leq n\}$ are independent random variables conditional on $\{Z_{j}^{\prime}:1\leq j\leq n\}$, we get by another application of Theorem 3.4 of Kuchibhotla and Chakrabortty, (2022) (with $q=1$ and $t=\log(3/\delta_{2})$) that conditional on $\{Z_{j}^{\prime}:1\leq j\leq n\}$, with probability at least $1-\delta_{2}$,

\[
\begin{split}
&\left|\sum_{i=1}^{n}\varepsilon_{i}h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right|\\
&\quad\leq 7\sqrt{\log(3/\delta_{2})}\left(\sum_{i=1}^{n}\mathbb{E}[h_{i}^{2}(Z_{i})|\{Z_{j}^{\prime}\}]\right)^{1/2}\\
&\qquad+C_{\alpha}(\log(2n))^{1/\alpha}(\log(3/\delta_{2}))^{1/\alpha^{*}}\max_{1\leq i\leq n}\left\|h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right\|_{\psi_{\alpha}|\{Z_{j}^{\prime}\}}.
\end{split}
\tag{18}
\]

Observe now that

hi(Zi)𝟏{|hi(Zi)|Hi(Zi;δ1)}ψα|{Zj}\displaystyle\left\|h_{i}(Z_{i})\mathbf{1}\{|h_{i}(Z_{i})|\leq H_{i}(Z_{i};\delta_{1})\}\right\|_{\psi_{\alpha}|\{Z_{j}^{\prime}\}}
Fi(Zi)i(Zi,δ1/n)ψα\displaystyle\leq\|F_{i}(Z_{i})\mathcal{B}_{i}(Z_{i},\delta_{1}/n)\|_{\psi_{\alpha}}
7log(3n/δ1)(1jn,ji𝔼[fi,j2(Zi,Zj)|Zi])1/2ψα+CβKFKG(log(2n))1/β(log(3n/δ1))1/β.\displaystyle\leq 7\sqrt{\log(3n/\delta_{1})}\left\|\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{i}]\right)^{1/2}\right\|_{\psi_{\alpha}}+C_{\beta}K_{F}K_{G}(\log(2n))^{1/\beta}(\log(3n/\delta_{1}))^{1/\beta^{*}}.

To bound the first term on the right hand side of (18), we follow the argument in the proof of Theorem 3.2 of Giné et al., (2000). However, in place of inequality (3.8) of Giné et al., (2000), we apply Theorem B.1 of Kuchibhotla and Chakrabortty, (2022). Following the display after inequality (3.11) of Giné et al., (2000), we have

(i=1n𝔼[hi2(Zi)|{Zj}])1/2\displaystyle\left(\sum_{i=1}^{n}\mathbb{E}[h_{i}^{2}(Z_{i})|\{Z_{j}^{\prime}\}]\right)^{1/2}
=sup{i=1n𝔼[hi(Zi)γi(Zi)|{Zj}]:i=1n𝔼[γi2(Zi)]1}\displaystyle=\sup\left\{\sum_{i=1}^{n}\mathbb{E}[h_{i}(Z_{i})\gamma_{i}(Z_{i})|\{Z_{j}^{\prime}\}]:\,\sum_{i=1}^{n}\mathbb{E}[\gamma_{i}^{2}(Z_{i})]\leq 1\right\}
=sup{j=1n(1in,ij𝔼[fi,j(Zi,Zj)γi(Zi)|Zj]):i=1n𝔼[γi2(Zi)]1},\displaystyle=\sup\left\{\sum_{j=1}^{n}\left(\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}(Z_{i},Z_{j}^{\prime})\gamma_{i}(Z_{i})|Z_{j}^{\prime}]\right):\,\sum_{i=1}^{n}\mathbb{E}[\gamma_{i}^{2}(Z_{i})]\leq 1\right\},

where the supremum is taken over a countable subset of mean zero vector functions $(\gamma_{1},\ldots,\gamma_{n})$. Define

Wj(γ)=1in,ij𝔼[fi,j(Zi,Zj)γi(Zi)|Zj].W_{j}(\gamma)=\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}(Z_{i},Z_{j}^{\prime})\gamma_{i}(Z_{i})|Z_{j}^{\prime}].

Degeneracy of $\{f_{i,j}\}$ implies that the $W_{j}$'s are mean zero independent random variables. Hence, by Theorem B.1 of Kuchibhotla and Chakrabortty, (2022), we get

\[
\begin{aligned}
&\left(\mathbb{E}\left[\sup_{\gamma}\left|\sum_{j=1}^{n}W_{j}\right|^{p}\right]\right)^{1/p}\\
&\leq 2\mathbb{E}\left[\sup_{\gamma}\left|\sum_{j=1}^{n}W_{j}\right|\right]+\sqrt{2p}\left(\sup_{\gamma}\sum_{j=1}^{n}\mathbb{E}[W_{j}^{2}]\right)^{1/2}+C_{\beta}p^{1/\beta^{*}}\left\|\max_{1\leq j\leq n}\sup_{\gamma}W_{j}\right\|_{\psi_{\beta}}.
\end{aligned}
\]

Following the argument in Theorem 3.2 of Giné et al., (2000), we get

𝔼[supγ|j=1nWj|]\displaystyle\mathbb{E}\left[\sup_{\gamma}\left|\sum_{j=1}^{n}W_{j}\right|\right]~{} (1ijn𝔼[fi,j2(Zi,Zj)])1/2,\displaystyle\leq~{}\left(\sum_{1\leq i\neq j\leq n}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})]\right)^{1/2},
supγj=1n𝔼[Wj2]\displaystyle\sup_{\gamma}\sum_{j=1}^{n}\mathbb{E}[W_{j}^{2}]~{} (fi,j)L2L22,\displaystyle\leq~{}\|(f_{i,j})\|_{L^{2}\to L^{2}}^{2},
max1jnsupγWj\displaystyle\max_{1\leq j\leq n}\sup_{\gamma}\,W_{j}~{} max1jn(1in,ij𝔼[fi,j2(Zi,Zj)|Zj])1/2.\displaystyle\leq~{}\max_{1\leq j\leq n}\left(\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{j}^{\prime}]\right)^{1/2}.

Therefore, by Markov’s inequality, with probability at least 1δ31-\delta_{3},

(i=1n𝔼[hi2(Zi)|{Zj}])1/22(1ijn𝔼[fi,j2(Zi,Zj)])1/2+2log(1/δ3)(fi,j)L2L2+Cβ(log(1/δ3))1/β(log(n))1/βmax1jn(1in,ij𝔼[fi,j2(Zi,Zj)|Zj])1/2ψβ.\begin{split}&\left(\sum_{i=1}^{n}\mathbb{E}[h_{i}^{2}(Z_{i})|\{Z_{j}^{\prime}\}]\right)^{1/2}\\ &\leq 2\left(\sum_{1\leq i\neq j\leq n}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})]\right)^{1/2}+\sqrt{2\log(1/\delta_{3})}\|(f_{i,j})\|_{L^{2}\to L^{2}}\\ &\quad+C_{\beta}(\log(1/\delta_{3}))^{1/\beta^{*}}(\log(n))^{1/\beta}\max_{1\leq j\leq n}\left\|\left(\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{j}^{\prime}]\right)^{1/2}\right\|_{\psi_{\beta}}.\end{split} (19)

Combining inequalities (17), (18), (19), we get that with probability at least $1-\delta_{1}-\delta_{2}-\delta_{3}$,

|𝒰n|\displaystyle|\mathcal{U}_{n}^{\prime}| 14log(3/δ2)(ij𝔼[fi,j2(Zi,Zj)])1/2\displaystyle\leq 14\sqrt{\log(3/\delta_{2})}\left(\sum_{i\neq j}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})]\right)^{1/2}
+72log(3/δ2)log(1/δ3)(fi,j)L2L2\displaystyle\quad+7\sqrt{2\log(3/\delta_{2})\log(1/\delta_{3})}\|(f_{i,j})\|_{L^{2}\to L^{2}}
+β(log(3/δ2))1/2(log(1/δ3))1/β(logn)1/βmax1jn(1in,ij𝔼[fi,j2(Zi,Zj)|Zj])1/2ψβ\displaystyle\quad+\mathfrak{C}_{\beta}(\log(3/\delta_{2}))^{1/2}(\log(1/\delta_{3}))^{1/\beta^{*}}(\log n)^{1/\beta}\max_{1\leq j\leq n}\left\|\left(\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{j}^{\prime}]\right)^{1/2}\right\|_{\psi_{\beta}}
+α(log(3n/δ1))1/2(log(3/δ2))1/α(log(2n))1/αmax1in(1jn,ji𝔼[fi,j2(Zi,Zj)|Zi])1/2ψα\displaystyle\quad+\mathfrak{C}_{\alpha}(\log(3n/\delta_{1}))^{1/2}(\log(3/\delta_{2}))^{1/\alpha^{*}}(\log(2n))^{1/\alpha}\max_{1\leq i\leq n}\left\|\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{i}]\right)^{1/2}\right\|_{\psi_{\alpha}}
+α,βKFKG(log(2n))1/α+1/β(log(3/δ2))1/α(log(3n/δ1))1/β.\displaystyle\quad+\mathfrak{C}_{\alpha,\beta}K_{F}K_{G}(\log(2n))^{1/\alpha+1/\beta}(\log(3/\delta_{2}))^{1/\alpha^{*}}(\log(3n/\delta_{1}))^{1/\beta^{*}}.

Taking $\delta_{1}=\delta_{2}=\delta_{3}=\delta/3$ and integrating over $\delta\in[0,1]$, this inequality yields the following moment bound:

𝒰np\displaystyle\|\mathcal{U}_{n}^{\prime}\|_{p} p1/2(ij𝔼[fi,j2(Zi,Zj)])1/2\displaystyle\leq\mathfrak{C}p^{1/2}\left(\sum_{i\neq j}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})]\right)^{1/2}
+p(fi,j)L2L2\displaystyle\quad+\mathfrak{C}p\|(f_{i,j})\|_{L^{2}\to L^{2}}
+βp1/2+1/β(logn)1/βmax1jn(1in,ij𝔼[fi,j2(Zi,Zj)|Zj])1/2ψβ\displaystyle\quad+\mathfrak{C}_{\beta}p^{1/2+1/\beta^{*}}(\log n)^{1/\beta}\max_{1\leq j\leq n}\left\|\left(\sum_{\begin{subarray}{c}1\leq i\leq n,\\ i\neq j\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{j}^{\prime}]\right)^{1/2}\right\|_{\psi_{\beta}}
+αp1/2+1/α(log(2n))1/2+1/αmax1in(1jn,ji𝔼[fi,j2(Zi,Zj)|Zi])1/2ψα\displaystyle\quad+\mathfrak{C}_{\alpha}p^{1/2+1/\alpha^{*}}(\log(2n))^{1/2+1/\alpha}\max_{1\leq i\leq n}\left\|\left(\sum_{\begin{subarray}{c}1\leq j\leq n,\\ j\neq i\end{subarray}}\mathbb{E}[f_{i,j}^{2}(Z_{i},Z_{j}^{\prime})|Z_{i}]\right)^{1/2}\right\|_{\psi_{\alpha}}
+α,βp1/α+1/βKFKG(log(2n))1/α+1/β+1/β.\displaystyle\quad+\mathfrak{C}_{\alpha,\beta}p^{1/\alpha^{*}+1/\beta^{*}}K_{F}K_{G}(\log(2n))^{1/\alpha+1/\beta+1/\beta^{*}}.

This inequality combined with (16) yields the tail bound for UnDU_{n}^{D}.

A.2 Proof of Lemma 1

From Theorem 3.1.1 of de la Peña and Giné, (1999) and following the arguments similar to those in Theorem 3.5.3 of de la Peña and Giné, (1999), we get for all p1p\geq 1

TnDp481ijnεiεjfi,jD(Zi,Zj)p,\left\lVert T_{n}^{D}\right\rVert_{p}\leq 48\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}^{D}(Z_{i},Z_{j}^{\prime})\right\rVert_{p},

where εi,εi,1in\varepsilon_{i},\varepsilon_{i}^{\prime},1\leq i\leq n are Rademacher random variables independent of (Zi,Zi),1in(Z_{i},Z_{i}^{\prime}),1\leq i\leq n. Note from (9) that

\[
\begin{aligned}
\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}^{D}(Z_{i},Z_{j}^{\prime}) &=\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})-\varepsilon_{i}\varepsilon_{j}^{\prime}\int f_{i,j}(z,Z_{j}^{\prime})P_{i}(dz)\\
&\quad-\varepsilon_{i}\varepsilon_{j}^{\prime}\int f_{i,j}(Z_{i},z)P_{j}(dz)+\varepsilon_{i}\varepsilon_{j}^{\prime}\iint f_{i,j}(z,z^{\prime})P_{i}(dz)P_{j}(dz^{\prime}).
\end{aligned}
\]

Here PiP_{i} represents the probability measure of ZiZ_{i} for 1in1\leq i\leq n. By Jensen’s inequality, it is clear that for p1p\geq 1,

\[
\begin{aligned}
\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}\int f_{i,j}(z,Z_{j}^{\prime})P_{i}(dz)\right\rVert_{p} &\leq\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})\right\rVert_{p},\\
\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}\int f_{i,j}(Z_{i},z)P_{j}(dz)\right\rVert_{p} &\leq\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})\right\rVert_{p},\\
\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}\iint f_{i,j}(z,z^{\prime})P_{i}(dz)P_{j}(dz^{\prime})\right\rVert_{p} &\leq\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})\right\rVert_{p}.
\end{aligned}
\]

Therefore, for p1p\geq 1,

\[
\left\lVert T_{n}^{D}\right\rVert_{p}\leq 192\left\lVert\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\varepsilon_{j}^{\prime}f_{i,j}(Z_{i},Z_{j}^{\prime})\right\rVert_{p}.
\]
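The constant 192 can be traced in one line: it is the factor 48 from the decoupling and symmetrization step times the four terms of the decomposition of $f_{i,j}^{D}$, each bounded in $L_{p}$ by the same quantity. The shorthand $S_{p}$ below is introduced only for this sketch.

```latex
% S_p is shorthand introduced for this sketch only.
\begin{align*}
S_{p} &:= \left\lVert \sum_{1\leq i\neq j\leq n}
          \varepsilon_{i}\varepsilon_{j}^{\prime} f_{i,j}(Z_{i},Z_{j}^{\prime}) \right\rVert_{p},\\
\left\lVert T_{n}^{D}\right\rVert_{p}
  &\leq 48 \left\lVert \sum_{1\leq i\neq j\leq n}
          \varepsilon_{i}\varepsilon_{j}^{\prime} f_{i,j}^{D}(Z_{i},Z_{j}^{\prime}) \right\rVert_{p}
  && \text{(decoupling and symmetrization)}\\
  &\leq 48\,\bigl(S_{p}+S_{p}+S_{p}+S_{p}\bigr)
  && \text{(triangle inequality and the three Jensen bounds)}\\
  &= 192\, S_{p}.
\end{align*}
```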

Throughout the proofs in all the appendices to follow, we use the notation

𝒵n:={(Z1,ε1),,(Zn,εn)}and𝒵n:={(Z1,ε1),,(Zn,εn)}.\mathcal{Z}_{n}^{\prime}:=\{(Z_{1}^{\prime},\varepsilon_{1}^{\prime}),\ldots,(Z_{n}^{\prime},\varepsilon_{n}^{\prime})\}\quad\mbox{and}\quad\mathcal{Z}_{n}:=\{(Z_{1},\varepsilon_{1}),\ldots,(Z_{n},\varepsilon_{n})\}.

Note that this is different from 𝒵n\mathcal{Z}_{n}^{\prime} and 𝒵n\mathcal{Z}_{n} defined in the main text.

A.3 Proof of Theorem 2

Based on the basic decomposition (14), we get

1ijnεiϕ(Zi)wi,j(Xi,Xj)ψ(Zj)εj=𝒰n(1)+𝒰n(2)+𝒰n(3)+𝒰n(4),\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\phi(Z_{i})w_{i,j}(X_{i},X_{j}^{\prime})\psi(Z_{j}^{\prime})\varepsilon_{j}^{\prime}=\mathcal{U}_{n}^{(1)}+\mathcal{U}_{n}^{(2)}+\mathcal{U}_{n}^{(3)}+\mathcal{U}_{n}^{(4)},

where

𝒰n(1):=1ijnεiΦi,1wi,j(Xi,Xj)Ψj,1εj,𝒰n(2):=1ijnεiΦi,2wi,j(Xi,Xj)Ψj,1εj,𝒰n(3):=1ijnεiΦi,1wi,j(Xi,Xj)Ψj,2εj,𝒰n(4):=1ijnεiΦi,2wi,j(Xi,Xj)Ψj,2εj.\begin{split}\mathcal{U}_{n}^{(1)}&:=\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime},\\ \mathcal{U}_{n}^{(2)}&:=\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime},\\ \mathcal{U}_{n}^{(3)}&:=\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}\varepsilon_{j}^{\prime},\\ \mathcal{U}_{n}^{(4)}&:=\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}\varepsilon_{j}^{\prime}.\end{split} (20)

It is easy to verify that 𝒰n(k),1k4\mathcal{U}_{n}^{(k)},1\leq k\leq 4 are all degenerate UU-statistics. From Theorem 3.2 of Giné et al., (2000), we get that there exists a constant K>0K>0 such that for all p1p\geq 1,

𝒰n(1)pK[pA+pB+p3/2C+p2D],\left\lVert\mathcal{U}_{n}^{(1)}\right\rVert_{p}\leq K\left[\sqrt{p}A+pB+p^{3/2}C+p^{2}D\right],

where

\begin{split}
A&:=\left(\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[\Phi_{i,1}^{2}w_{i,j}^{2}(X_{i},X_{j}^{\prime})\left(\Psi_{j,1}^{\prime}\right)^{2}\right]\right)^{1/2},\\
B&:=\sup\left\{\mathbb{E}\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\xi_{i}(\varepsilon_{i},Z_{i})\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\zeta_{j}(\varepsilon_{j}^{\prime},Z_{j}^{\prime})\varepsilon_{j}^{\prime}:\right.\\
&\qquad\qquad\quad\left.\mathbb{E}\sum_{i=1}^{n}\xi_{i}^{2}(\varepsilon_{i},Z_{i})\leq 1,\,\mathbb{E}\sum_{j=1}^{n}\zeta_{j}^{2}(\varepsilon_{j}^{\prime},Z_{j}^{\prime})\leq 1\right\},\\
C^{p}&:=\mathbb{E}\left(\max_{1\leq i\leq n}\mathbb{E}\left[\sum_{j=1}^{n}\Phi_{i,1}^{2}w_{i,j}^{2}(X_{i},X_{j}^{\prime})\left(\Psi^{\prime}_{j,1}\right)^{2}\big{|}X_{i},Y_{i}\right]\right)^{p/2}\\
&\qquad+\mathbb{E}\left(\max_{1\leq j\leq n}\mathbb{E}\left[\sum_{i=1}^{n}\Phi_{i,1}^{2}w_{i,j}^{2}(X_{i},X_{j}^{\prime})\left(\Psi^{\prime}_{j,1}\right)^{2}\big{|}X_{j}^{\prime},Y_{j}^{\prime}\right]\right)^{p/2},\\
D^{p}&:=\mathbb{E}\left[\max_{1\leq i\neq j\leq n}|\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}|^{p}\right].
\end{split} (21)

It is clear that

A21ijn𝔼[ϕ2(Yi)wi,j2(Xi,Xj)ψ2(Yj)]=1ijn𝔼[σi,ϕ2(Xi)wi,j2(Xi,Xj)σj,ψ2(Xj)].\displaystyle A^{2}\leq\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[\phi^{2}(Y_{i})w_{i,j}^{2}(X_{i},X_{j})\psi^{2}(Y_{j})\right]=\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[\sigma_{i,\phi}^{2}(X_{i})w_{i,j}^{2}(X_{i},X_{j})\sigma_{j,\psi}^{2}(X_{j})\right].

The quantity BB appears as the square root of the wimpy variance of the supremum of an empirical process; see Boucheron et al., (2013, page 314). Lemma 4 of Section A.4 implies that

\begin{split}
B&\leq\sup\left\{\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j})\sigma_{j,\psi}(X_{j})p_{j}(X_{j})\right]:\right.\\
&\qquad\qquad\quad\left.\sum_{i=1}^{n}\mathbb{E}\left[q_{i}^{2}(X_{i})\right]\leq 1,\,\sum_{j=1}^{n}\mathbb{E}\left[p_{j}^{2}(X_{j})\right]\leq 1\right\}.
\end{split}

For bounding CC, note that

𝔼[j=1nΦi,12wi,j2(Xi,Xj)(Ψj,1)2|Xi,Yi]\displaystyle\mathbb{E}\left[\sum_{j=1}^{n}\Phi_{i,1}^{2}w_{i,j}^{2}(X_{i},X_{j}^{\prime})\left(\Psi^{\prime}_{j,1}\right)^{2}\big{|}X_{i},Y_{i}\right] Tϕ2supxj=1n𝔼[wi,j2(x,Xj)σ2j,ψ(Xj)],\displaystyle\leq T_{\phi}^{2}\sup_{x}\sum_{j=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(x,X_{j})\sigma^{2}_{j,\psi}(X_{j})\right],
𝔼[i=1nΦi,12wi,j2(Xi,Xj)(Ψj,1)2|Xj,Yj]\displaystyle\mathbb{E}\left[\sum_{i=1}^{n}\Phi_{i,1}^{2}w_{i,j}^{2}(X_{i},X_{j}^{\prime})\left(\Psi^{\prime}_{j,1}\right)^{2}\big{|}X_{j}^{\prime},Y_{j}^{\prime}\right] Tψ2supxi=1n𝔼[wi,j2(Xi,x)σ2i,ϕ(Xi)].\displaystyle\leq T_{\psi}^{2}\sup_{x}\sum_{i=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(X_{i},x)\sigma^{2}_{i,\phi}(X_{i})\right].

Combining these two inequalities implies that

CTϕsupx(j=1n𝔼[wi,j2(x,Xj)σ2j,ψ(Xj)])1/2+Tψsupx(i=1n𝔼[wi,j2(Xi,x)σ2i,ϕ(Xi)])1/2C\leq T_{\phi}\sup_{x}\left(\sum_{j=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(x,X_{j})\sigma^{2}_{j,\psi}(X_{j})\right]\right)^{1/2}+T_{\psi}\sup_{x}\left(\sum_{i=1}^{n}\mathbb{E}\left[w_{i,j}^{2}(X_{i},x)\sigma^{2}_{i,\phi}(X_{i})\right]\right)^{1/2}

Finally, it is clear from assumption (B2) that DTϕTψBw.D\leq T_{\phi}T_{\psi}B_{w}. Combining all these with Theorem 3.2 of Giné et al., (2000) and noting

TϕKαCϕ(logn)1/αandTψKβCψ(logn)1/β,T_{\phi}\leq K_{\alpha}C_{\phi}(\log n)^{1/\alpha}\quad\mbox{and}\quad T_{\psi}\leq K_{\beta}C_{\psi}(\log n)^{1/\beta},

we get that there exists a constant K>0K>0 such that for all p1p\geq 1

𝒰n(1)pK[pΛ1/2+pΛ1+p3/2{Λ3/2(α)+Λ3/2(β)}+p2Λ2].\left\lVert\mathcal{U}_{n}^{(1)}\right\rVert_{p}\leq K\left[\sqrt{p}\Lambda_{1/2}+p\Lambda_{1}+p^{3/2}\left\{\Lambda_{3/2}^{(\alpha)}+\Lambda_{3/2}^{(\beta)}\right\}+p^{2}\Lambda_{2}\right]. (22)

To bound $\mathcal{U}_{n}^{(2)}$ and $\mathcal{U}_{n}^{(3)}$ in (20), we use the Hoffmann-Jørgensen inequality (Proposition 6.8 of Ledoux and Talagrand, (1991)). Observe that

𝒰n(2):=i=1nεiΦi,2gi(Xi;𝒵n),wheregi(Xi;𝒵n):=j=1,jinwi,j(Xi,Xj)Ψj,1εj.\mathcal{U}_{n}^{(2)}:=\sum_{i=1}^{n}\varepsilon_{i}\Phi_{i,2}g_{i}(X_{i};\mathcal{Z}_{n}^{\prime}),\quad\mbox{where}\quad g_{i}(X_{i};\mathcal{Z}_{n}^{\prime}):=\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}.

With 𝒵n:={(ε1,Z1),,(εn,Zn)}\mathcal{Z}_{n}^{\prime}:=\{(\varepsilon_{1}^{\prime},Z_{1}^{\prime}),\ldots,(\varepsilon_{n}^{\prime},Z_{n}^{\prime})\} and 𝒳n:={X1,,Xn}\mathcal{X}_{n}:=\{X_{1},\ldots,X_{n}\}, note that

\displaystyle\mathbb{P} (max1In|i=1IεiΦi,2gi(Xi,𝒵n)|>0|𝒳n,𝒵n)(max1in|ϕ(Zi)|Tϕ|𝒳n)1/8,\displaystyle\left(\max_{1\leq I\leq n}\left|\sum_{i=1}^{I}\varepsilon_{i}\Phi_{i,2}g_{i}(X_{i},\mathcal{Z}_{n}^{\prime})\right|>0\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}\right)\leq\mathbb{P}\left(\max_{1\leq i\leq n}|\phi(Z_{i})|\geq T_{\phi}\big{|}\mathcal{X}_{n}\right)\leq 1/8,

and so, by Equation (6.8) of Ledoux and Talagrand, (1991), we get

𝔼[𝒰n(2)|𝒳n,𝒵n]\displaystyle\mathbb{E}\left[\mathcal{U}_{n}^{(2)}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}\right] 8𝔼[max1in|Φi,2(gi(Xi;𝒵n))||𝒳n,𝒵n]\displaystyle\leq 8\mathbb{E}\left[\max_{1\leq i\leq n}\left|\Phi_{i,2}\left(g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right)\right|\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}\right]
8𝔼[max1in|ϕ(Zi)||𝒳n]max1in|gi(Xi;𝒵n)|=Tϕmax1in|gi(Xi;𝒵n)|.\displaystyle\leq 8\mathbb{E}\left[\max_{1\leq i\leq n}\left|\phi(Z_{i})\right|\big{|}\mathcal{X}_{n}\right]\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|=T_{\phi}\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|.

From assumption (B1) and Theorem 6.21 of Ledoux and Talagrand, (1991), we thus get for 0<α10<\alpha\leq 1,

𝒰n(2)ψα|𝒳n,𝒵nKα𝔼[𝒰n(2)|𝒳n,𝒵n]+Kαmax1in|Φi,2gi(Xi;𝒵n)|ψα|𝒳n,𝒵nKα(Tϕ+max1in|ϕ(Yi)|ψα|𝒳n,𝒵n)max1in|gi(Xi;𝒵n)|KαCϕ(logn)1/αmax1in|gi(Xi;𝒵n)|,\begin{split}\left\lVert\mathcal{U}_{n}^{(2)}\right\rVert_{\psi_{\alpha}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}&\leq K_{\alpha}\mathbb{E}\left[\mathcal{U}_{n}^{(2)}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}\right]+K_{\alpha}\left\lVert\max_{1\leq i\leq n}\left|\Phi_{i,2}g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|\right\rVert_{\psi_{\alpha}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}\\ &\leq K_{\alpha}\left(T_{\phi}+\left\lVert\max_{1\leq i\leq n}|\phi(Y_{i})|\right\rVert_{\psi_{\alpha}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}\right)\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|\\ &\leq K_{\alpha}C_{\phi}(\log n)^{1/\alpha}\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|,\end{split} (23)

for some constant KαK_{\alpha} depending only on α\alpha (and can be different in different lines). If α1\alpha\geq 1, then we get

𝒰n(2)ψα|𝒳n,𝒵nKαCϕ(logn)1/αmax1in|gi(Xi;𝒵n)|.\left\lVert\mathcal{U}_{n}^{(2)}\right\rVert_{\psi_{\alpha^{*}}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}\leq K_{\alpha}C_{\phi}(\log n)^{1/\alpha}\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|.

See the proof of Theorem 3.3 in Kuchibhotla and Chakrabortty, (2022) for a similar argument. Thus,

𝔼[|𝒰n(2)|p|𝒳n,𝒵n]KαpCϕp(logn)p/αpp/αmax1in|gi(Xi;𝒵n)|p.\mathbb{E}\left[|\mathcal{U}_{n}^{(2)}|^{p}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}\right]\leq K_{\alpha}^{p}C_{\phi}^{p}(\log n)^{p/\alpha}p^{p/\alpha^{*}}\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|^{p}.

Thus, for p1p\geq 1,

𝔼[|𝒰n(2)|p]KαpCϕp(logn)p/αpp/α𝔼[max1in|gi(Xi;𝒵n)|p].\mathbb{E}\left[|\mathcal{U}_{n}^{(2)}|^{p}\right]\leq K_{\alpha}^{p}C_{\phi}^{p}(\log n)^{p/\alpha}p^{p/\alpha^{*}}\mathbb{E}\left[\max_{1\leq i\leq n}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})\right|^{p}\right]. (24)

To control the right hand side above, recall that

gi(x;𝒵n)=j=1,jinwi,j(x,Xj)Ψj,1εj,g_{i}(x;\mathcal{Z}_{n}^{\prime})=\sum_{j=1,j\neq i}^{n}w_{i,j}(x,X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime},

is a sum of mean zero independent random variables that are bounded by BwTψB_{w}T_{\psi}. Also, note that

\mbox{Var}(g_{i}(x;\mathcal{Z}_{n}^{\prime}))\leq\sum_{j=1,j\neq i}^{n}\mathbb{E}\left[w_{i,j}^{2}(x,X_{j}^{\prime})\psi^{2}(Z_{j}^{\prime})\right]=\sum_{j=1,j\neq i}^{n}\mathbb{E}[w_{i,j}^{2}(x,X_{j}^{\prime})\sigma_{j,\psi}^{2}(X_{j}^{\prime})].

Therefore by Bernstein’s inequality (Lemma 4 of van de Geer and Lederer, (2013)), we get that

(max1in|gi(Xi;𝒵n)|Υψ6log(1+n)3BwTψlognΥψt+3BwTψt)2et,\mathbb{P}\left(\max_{1\leq i\leq n}|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})|-\Upsilon_{\psi}\sqrt{6\log(1+n)}-3B_{w}T_{\psi}\log n\geq\Upsilon_{\psi}\sqrt{t}+3B_{w}T_{\psi}t\right)\leq 2e^{-t}, (25)

where

Υψ2:=maxxj=1,jin𝔼[wi,j2(x,Xj)σj,ψ2(Xj)].\Upsilon_{\psi}^{2}:=\max_{x}\sum_{j=1,j\neq i}^{n}\mathbb{E}[w_{i,j}^{2}(x,X_{j}^{\prime})\sigma_{j,\psi}^{2}(X_{j}^{\prime})].

So, by Propositions A.3 and A.4 of Kuchibhotla and Chakrabortty, (2022), we get that for p1p\geq 1,

𝔼[max1in|gi(Xi;𝒵n)|p]Cp[(logn)p/2Υψp+(BwTψ)p(logn)p+pp/2Υψp+pp(BwTψ)p].\mathbb{E}\left[\max_{1\leq i\leq n}|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime})|^{p}\right]\leq C^{p}\left[(\log n)^{p/2}\Upsilon_{\psi}^{p}+(B_{w}T_{\psi})^{p}(\log n)^{p}+p^{p/2}\Upsilon_{\psi}^{p}+p^{p}(B_{w}T_{\psi})^{p}\right].

Hence for p1p\geq 1,

𝔼[|𝒰n(2)|p]KαpCϕp(logn)p/αpp/α[(logn)p/2Υψp+(BwTψ)p(logn)p]+KαpCϕp(logn)p/αpp/α[pp/2Υψp+pp(BwTψ)p].\begin{split}\mathbb{E}\left[|\mathcal{U}_{n}^{(2)}|^{p}\right]&\leq K_{\alpha}^{p}C_{\phi}^{p}(\log n)^{p/\alpha}p^{p/\alpha^{*}}\left[(\log n)^{p/2}\Upsilon_{\psi}^{p}+(B_{w}T_{\psi})^{p}(\log n)^{p}\right]\\ &\quad+K_{\alpha}^{p}C_{\phi}^{p}(\log n)^{p/\alpha}p^{p/\alpha^{*}}\left[p^{p/2}\Upsilon_{\psi}^{p}+p^{p}(B_{w}T_{\psi})^{p}\right].\end{split} (26)

A similar calculation for 𝒰n(3)\mathcal{U}_{n}^{(3)} shows that for p1p\geq 1,

𝔼[|𝒰n(3)|p]KβpCψp(logn)p/βpp/β[(logn)p/2Υϕp+(BwTϕ)p(logn)p]+KβpCψp(logn)p/βpp/β[pp/2Υϕp+pp(BwTϕ)p],\begin{split}\mathbb{E}\left[|\mathcal{U}_{n}^{(3)}|^{p}\right]&\leq K_{\beta}^{p}C_{\psi}^{p}(\log n)^{p/\beta}p^{p/\beta^{*}}\left[(\log n)^{p/2}\Upsilon_{\phi}^{p}+(B_{w}T_{\phi})^{p}(\log n)^{p}\right]\\ &\quad+K_{\beta}^{p}C_{\psi}^{p}(\log n)^{p/\beta}p^{p/\beta^{*}}\left[p^{p/2}\Upsilon_{\phi}^{p}+p^{p}(B_{w}T_{\phi})^{p}\right],\end{split} (27)

where

Υϕ2:=maxxi=1,ijn𝔼[wi,j2(Xi,x)σi,ϕ2(Xi)].\Upsilon_{\phi}^{2}:=\max_{x}\sum_{i=1,i\neq j}^{n}\mathbb{E}[w_{i,j}^{2}(X_{i},x)\sigma_{i,\phi}^{2}(X_{i})].

To control 𝒰n(4)\mathcal{U}_{n}^{(4)}, recall that

𝒰n(4)=i=1nεiΦi,2(j=1,jinwi,j(Xi,Xj)Ψj,2εj).\mathcal{U}_{n}^{(4)}=\sum_{i=1}^{n}\varepsilon_{i}\Phi_{i,2}\left(\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,2}^{\prime}\varepsilon_{j}^{\prime}\right).

Following the arguments leading to (23), we have

\left\lVert\mathcal{U}_{n}^{(4)}\right\rVert_{\psi_{\alpha^{*}}|\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}\leq K_{\alpha}C_{\phi}(\log n)^{1/\alpha}\max_{1\leq i\leq n}\left|\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,2}^{\prime}\varepsilon_{j}^{\prime}\right|.

Conditioning on 𝒳n,𝒳n\mathcal{X}_{n},\mathcal{X}_{n}^{\prime}, the right hand side satisfies the hypothesis of (6.8) of Ledoux and Talagrand, (1991) and so by Theorem 6.21 of Ledoux and Talagrand, (1991), we get

\begin{split}
\left\lVert\max_{1\leq i\leq n}\left|\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,2}^{\prime}\varepsilon_{j}^{\prime}\right|\right\rVert_{\psi_{\beta^{*}}|\mathcal{X}_{n},\mathcal{X}_{n}^{\prime}}&\leq K_{\beta}\left\lVert\max_{1\leq i\neq j\leq n}\left|w_{i,j}(X_{i},X_{j}^{\prime})\psi(Z_{j}^{\prime})\right|\right\rVert_{\psi_{\beta}|\mathcal{X}_{n},\mathcal{X}_{n}^{\prime}}\\
&\leq K_{\beta}B_{w}(\log n)^{1/\beta}C_{\psi},
\end{split}

for some constant KβK_{\beta} depending only on β\beta. Therefore, for p1p\geq 1

𝔼[|𝒰n(4)|p]KpCψpCϕppp(1/α+1/β)(logn)p(α1+β1)BwpKppp(1/α+1/β)Λ2p,\mathbb{E}\left[|\mathcal{U}_{n}^{(4)}|^{p}\right]\leq K^{p}C_{\psi}^{p}C_{\phi}^{p}p^{p(1/\alpha^{*}+1/\beta^{*})}(\log n)^{p(\alpha^{-1}+\beta^{-1})}B_{w}^{p}\leq K^{p}p^{p(1/\alpha^{*}+1/\beta^{*})}\Lambda_{2}^{p}, (28)

for some constant K>0K>0.

Combining bounds (26) and (27), we get that for some constant K>0K>0 and for all p1p\geq 1,

𝒰n(2)+𝒰n(3)p\displaystyle\left\lVert\mathcal{U}_{n}^{(2)}+\mathcal{U}_{n}^{(3)}\right\rVert_{p} Kp1/α(logn)1/2Λ3/2(α)+Kp1/β(logn)1/2Λ3/2(β)\displaystyle\leq Kp^{1/\alpha^{*}}(\log n)^{1/2}\Lambda_{3/2}^{(\alpha)}+Kp^{1/\beta^{*}}(\log n)^{1/2}\Lambda_{3/2}^{(\beta)}
+Kp1/2+1/αΛ3/2(α)+Kp1/2+1/βΛ3/2(β)\displaystyle\qquad+Kp^{1/2+1/\alpha^{*}}\Lambda_{3/2}^{(\alpha)}+Kp^{1/2+1/\beta^{*}}\Lambda_{3/2}^{(\beta)}
+K(logn)Λ2[p1/α+p1/β]+KΛ2[p1+1/α+p1+1/β].\displaystyle\qquad+K(\log n)\Lambda_{2}[p^{1/\alpha^{*}}+p^{1/\beta^{*}}]+K\Lambda_{2}[p^{1+1/\alpha^{*}}+p^{1+1/\beta^{*}}].

Combining this inequality with (22) and (28), we get for all p1p\geq 1

=14𝒰n()p\displaystyle\left\lVert\sum_{\ell=1}^{4}\mathcal{U}_{n}^{(\ell)}\right\rVert_{p} K[p1/2Λ1/2+pΛ1+p3/2{Λ3/2(α)+Λ3/2(β)}+p2Λ2]\displaystyle\leq K\left[p^{1/2}\Lambda_{1/2}+p\Lambda_{1}+p^{3/2}\left\{\Lambda_{3/2}^{(\alpha)}+\Lambda_{3/2}^{(\beta)}\right\}+p^{2}\Lambda_{2}\right]
+Kp1/α(logn)1/2Λ3/2(α)+Kp1/β(logn)1/2Λ3/2(β)\displaystyle\quad+Kp^{1/\alpha^{*}}(\log n)^{1/2}\Lambda_{3/2}^{(\alpha)}+Kp^{1/\beta^{*}}(\log n)^{1/2}\Lambda_{3/2}^{(\beta)}
+Kp1/2+1/αΛ3/2(α)+Kp1/2+1/βΛ3/2(β)\displaystyle\quad+Kp^{1/2+1/\alpha^{*}}\Lambda_{3/2}^{(\alpha)}+Kp^{1/2+1/\beta^{*}}\Lambda_{3/2}^{(\beta)}
+K(logn)Λ2[p1/α+p1/β]+KΛ2[p1+1/α+p1+1/β]\displaystyle\quad+K(\log n)\Lambda_{2}[p^{1/\alpha^{*}}+p^{1/\beta^{*}}]+K\Lambda_{2}[p^{1+1/\alpha^{*}}+p^{1+1/\beta^{*}}]
+Kp(1/α+1/β)Λ2.\displaystyle\quad+Kp^{(1/\alpha^{*}+1/\beta^{*})}\Lambda_{2}.

Since α1\alpha^{*}\leq 1 and β1\beta^{*}\leq 1, we have

min{p1/2+1/α,p1/2+1/β}p3/2 and min{p1+1/α,p1+1/β,p1/α+1/β}p2.\min\{p^{1/2+1/\alpha^{*}},p^{1/2+1/\beta^{*}}\}\geq p^{3/2}\mbox{ and }\min\{p^{1+1/\alpha^{*}},p^{1+1/\beta^{*}},p^{1/\alpha^{*}+1/\beta^{*}}\}\geq p^{2}.

Using these inequalities, the bound above can be simplified as

=14𝒰n()p\displaystyle\left\lVert\sum_{\ell=1}^{4}\mathcal{U}_{n}^{(\ell)}\right\rVert_{p} Kp1/2Λ1/2+KpΛ1\displaystyle\leq Kp^{1/2}\Lambda_{1/2}+Kp\Lambda_{1}
+Kp1/α[(logn)1/2Λ3/2(α)+(logn)Λ2]+Kp1/β[(logn)1/2Λ3/2(β)+(logn)Λ2]\displaystyle\quad+Kp^{1/\alpha^{*}}\left[(\log n)^{1/2}\Lambda_{3/2}^{(\alpha)}+(\log n)\Lambda_{2}\right]+Kp^{1/\beta^{*}}\left[(\log n)^{1/2}\Lambda_{3/2}^{(\beta)}+(\log n)\Lambda_{2}\right]
+Kp1/2+1/αΛ3/2(α)+Kp1/2+1/βΛ3/2(β)\displaystyle\quad+Kp^{1/2+1/\alpha^{*}}\Lambda_{3/2}^{(\alpha)}+Kp^{1/2+1/\beta^{*}}\Lambda_{3/2}^{(\beta)}
+Kp1/α+1/βΛ2.\displaystyle\quad+Kp^{1/\alpha^{*}+1/\beta^{*}}\Lambda_{2}.

Here the constant K>0K>0 depends only on α,β\alpha,\beta. This completes the proof based on Lemma 1.

A.4 Auxiliary Lemmas Used in Theorem 2

The two lemmas to follow in this section provide explicit (but not necessarily optimal) constants for Equations (3.1) and (2.6) of Giné et al., (2000). These lemmas can be used in the proof of Theorem 3.2 of Giné et al., (2000) to get explicit constants. In this respect, we note that Theorem 3.4.8 of Giné and Nickl, (2016) (which was first proved in Houdré and Reynaud-Bouret, (2003)) does not imply Theorem 3.2 of Giné et al., (2000) since the result of Giné et al., (2000) applies for unbounded kernels in UU-statistics while the result of Giné and Nickl, (2016) applies exclusively for bounded kernel UU-statistics.

Lemma 2.

Suppose Z1,,ZnZ_{1},\ldots,Z_{n} are independent mean zero random variables. Then for p1p\geq 1,

𝔼[|i=1nZi|p]4ppp/2(i=1n𝔼[Zi2])p/2+4ppp𝔼[max1in|Zi|p].\mathbb{E}\left[\left|\sum_{i=1}^{n}Z_{i}\right|^{p}\right]\leq 4^{p}p^{p/2}\left(\sum_{i=1}^{n}\mathbb{E}\left[Z_{i}^{2}\right]\right)^{p/2}+4^{p}p^{p}\mathbb{E}\left[\max_{1\leq i\leq n}|Z_{i}|^{p}\right].
Proof.

By Theorem 7 of Boucheron et al., (2005), we get for p2p\geq 2,

𝔼[|i=1nZi|p]2p+1(2pee)p/2𝔼[(i=1nZi2)p/2].\mathbb{E}\left[\left|\sum_{i=1}^{n}Z_{i}\right|^{p}\right]\leq 2^{p+1}\left(\frac{2p}{e-\sqrt{e}}\right)^{p/2}\mathbb{E}\left[\left(\sum_{i=1}^{n}Z_{i}^{2}\right)^{p/2}\right].

By Theorem 8 of Boucheron et al., (2005), we get for p2p\geq 2,

𝔼[(i=1nZi2)p/2]3p/2(i=1n𝔼[Zi2])p/2+(3pκ2)p/2𝔼[max1in|Zi|p],\mathbb{E}\left[\left(\sum_{i=1}^{n}Z_{i}^{2}\right)^{p/2}\right]\leq 3^{p/2}\left(\sum_{i=1}^{n}\mathbb{E}\left[Z_{i}^{2}\right]\right)^{p/2}+\left(\frac{3p\kappa}{2}\right)^{p/2}\mathbb{E}\left[\max_{1\leq i\leq n}|Z_{i}|^{p}\right],

for κ=0.5e/(e1)\kappa=0.5\sqrt{e}/(\sqrt{e}-1). Thus for p2p\geq 2,

𝔼[|i=1nZi|p]4ppp/2(i=1n𝔼[Zi2])p/2+4ppp𝔼[max1in|Zi|p].\mathbb{E}\left[\left|\sum_{i=1}^{n}Z_{i}\right|^{p}\right]\leq 4^{p}p^{p/2}\left(\sum_{i=1}^{n}\mathbb{E}\left[Z_{i}^{2}\right]\right)^{p/2}+4^{p}p^{p}\mathbb{E}\left[\max_{1\leq i\leq n}|Z_{i}|^{p}\right].

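For $1\leq p<2$, Jensen's inequality together with independence and the mean zero assumption gives $\mathbb{E}[|\sum_{i=1}^{n}Z_{i}|^{p}]\leq(\sum_{i=1}^{n}\mathbb{E}[Z_{i}^{2}])^{p/2}$, so the inequality holds trivially in this range as well, and the result follows. ∎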
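As a numerical sanity check of Lemma 2 (an illustration, not part of the proof), the following minimal sketch evaluates both sides exactly for i.i.d. Rademacher signs, for which $\mathbb{E}[Z_{i}^{2}]=1$ and $\max_{i}|Z_{i}|=1$. The helper names `rademacher_abs_moment` and `lemma2_rhs` are hypothetical and introduced only for this example.

```python
from itertools import product


def rademacher_abs_moment(n, p):
    """Exact E|eps_1 + ... + eps_n|^p for i.i.d. Rademacher signs (full enumeration)."""
    total = 0.0
    for signs in product((-1, 1), repeat=n):
        total += abs(sum(signs)) ** p
    return total / 2 ** n


def lemma2_rhs(n, p):
    """Right-hand side of Lemma 2 when E[Z_i^2] = 1 and max_i |Z_i| = 1."""
    return 4 ** p * p ** (p / 2) * n ** (p / 2) + 4 ** p * p ** p


# Compare both sides on a small grid of sample sizes and moment orders.
checks = {
    (n, p): (rademacher_abs_moment(n, p), lemma2_rhs(n, p))
    for n in (2, 5, 8)
    for p in (1, 1.5, 2, 3, 6)
}
for (n, p), (lhs, rhs) in checks.items():
    print(f"n={n}, p={p}: lhs={lhs:.4g}, rhs={rhs:.4g}, holds={lhs <= rhs}")
```

The enumeration costs $2^{n}$ evaluations, so this check is only meant for small $n$; it says nothing about the sharpness of the constants.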

Lemma 3.

Suppose $\xi_{i},1\leq i\leq n$, are independent random variables. Then for $p\geq 1$ and $\alpha>0$,

ppαi=1n𝔼[|ξi|p]4(1.5)pαppα𝔼[max1in|ξi|p]+2(1.5)pα(i=1n𝔼[|ξi|])p.p^{p\alpha}\sum_{i=1}^{n}\mathbb{E}\left[\left|\xi_{i}\right|^{p}\right]\leq 4(1.5)^{p\alpha}p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2(1.5)^{p\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}.
Proof.

Fix p1p\geq 1. Define δ00\delta_{0}\geq 0 such that

δ0:=inf{t>0:i=1n(|ξi|>t)1}.\delta_{0}:=\inf\left\{t>0:\,\sum_{i=1}^{n}\mathbb{P}\left(|\xi_{i}|>t\right)\leq 1\right\}.

By (1.4.4) of de la Peña and Giné, (1999), it follows that

12max{δ0p,i=1n𝔼[|ξi|p𝟙{|ξi|>δ0}]}𝔼[max1in|ξi|p].\frac{1}{2}\max\left\{\delta_{0}^{p},\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\mathbbm{1}_{\{|\xi_{i}|>\delta_{0}\}}\right]\right\}\leq\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]. (29)

Observe that

i=1n𝔼[|ξi|p]\displaystyle\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\right] =i=1n𝔼[|ξi|p𝟙{|ξi|>δ0}]+i=1n𝔼[|ξi|p𝟙{|ξi|δ0}]\displaystyle=\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\mathbbm{1}_{\{|\xi_{i}|>\delta_{0}\}}\right]+\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\mathbbm{1}_{\{|\xi_{i}|\leq\delta_{0}\}}\right]
(a)2𝔼[max1in|ξi|p]+i=1n𝔼[|ξi|p𝟙{|ξi|δ0}]\displaystyle\overset{(a)}{\leq}2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\mathbbm{1}_{\{|\xi_{i}|\leq\delta_{0}\}}\right]
2𝔼[max1in|ξi|p]+δ0p1i=1n𝔼[|ξi|𝟙{|ξi|δ0}]\displaystyle\leq 2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+\delta_{0}^{p-1}\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\mathbbm{1}_{\{|\xi_{i}|\leq\delta_{0}\}}\right]
(a)2𝔼[max1in|ξi|p]+2𝔼[max1in|ξi|p1](i=1n𝔼[|ξi|𝟙{|ξi|δ0}])\displaystyle\overset{(a)}{\leq}2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p-1}\right]\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\mathbbm{1}_{\{|\xi_{i}|\leq\delta_{0}\}}\right]\right)
2𝔼[max1in|ξi|p]+2𝔼[max1in|ξi|p1](i=1n𝔼[|ξi|]).\displaystyle\leq 2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p-1}\right]\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right).

Inequality (a) follows from (29). To prove the result now, we consider two cases:

  • Case 1: If

    ppα𝔼[max1in|ξi|p](i=1n𝔼[|ξi|])p,p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\leq\left(\sum_{i=1}^{n}\mathbb{E}[|\xi_{i}|]\right)^{p},

    then

    𝔼[max1in|ξi|p1](i=1n𝔼[|ξi|])\displaystyle\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p-1}\right]\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right) (𝔼[max1in|ξi|p])(p1)/p(i=1n𝔼[|ξi|])\displaystyle\leq\left(\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\right)^{(p-1)/p}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)
    1p(p1)α(i=1n𝔼[|ξi|])p1(i=1n𝔼[|ξi|])\displaystyle\leq\frac{1}{p^{(p-1)\alpha}}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p-1}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)
    1p(p1)α(i=1n𝔼[|ξi|])p.\displaystyle\leq\frac{1}{p^{(p-1)\alpha}}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}.

    Therefore (in case 1),

    ppαi=1n𝔼[|ξi|p]2ppα𝔼[max1in|ξi|p]+2pα(i=1n𝔼[|ξi|])p.p^{p\alpha}\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\right]\leq 2p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2p^{\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}. (30)
  • Case 2: If

    ppα𝔼[max1in|ξi|p](i=1n𝔼[|ξi|])p,p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\geq\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p},

    then

    𝔼[max1in|ξi|p1]\displaystyle\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p-1}\right] (i=1n𝔼[|ξi|])\displaystyle\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)
    (𝔼[max1in|ξi|p])(p1)/p(i=1n𝔼[|ξi|])\displaystyle\leq\left(\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\right)^{(p-1)/p}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)
    (𝔼[max1in|ξi|p])(p1)/ppα(𝔼[max1in|ξi|p])1/p\displaystyle\leq\left(\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\right)^{(p-1)/p}p^{\alpha}\left(\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\right)^{1/p}
    pα𝔼[max1in|ξi|p].\displaystyle\leq p^{\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right].

    Therefore (in case 2),

    \begin{split}
    p^{p\alpha}\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\right]&\leq 2p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2p^{\alpha(p+1)}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\\
    &\leq 2p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2p^{p\alpha}\left(e^{1/e}\right)^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]\\
    &\leq(2+2(1.5)^{p\alpha})p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right].
    \end{split} (31)

Combining inequalities (30) and (31), we get for p1p\geq 1 and α>0\alpha>0 that

\begin{split}
p^{p\alpha}\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|^{p}\right]&\leq(2+2(1.5)^{p\alpha})p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2p^{\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}\\
&\leq 4(1.5)^{p\alpha}p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2p^{\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}\\
&\leq 4(1.5)^{p\alpha}p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2(p^{1/p})^{p\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}\\
&\leq 4(1.5)^{p\alpha}p^{p\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}|\xi_{i}|^{p}\right]+2(1.5)^{p\alpha}\left(\sum_{i=1}^{n}\mathbb{E}\left[|\xi_{i}|\right]\right)^{p}.
\end{split}

This proves the result. ∎
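The proof of Lemma 3 relies on the elementary fact that $p^{1/p}\leq e^{1/e}\leq 1.5$ for $p\geq 1$. The sketch below checks this fact on a grid and then evaluates both sides of the lemma in closed form for i.i.d. Bernoulli($q$) variables, an example chosen here only for illustration: in that case $\mathbb{E}[|\xi_{i}|^{p}]=q$, $\mathbb{E}[\max_{i}|\xi_{i}|^{p}]=1-(1-q)^{n}$ and $\sum_{i}\mathbb{E}[|\xi_{i}|]=nq$. The function name `lemma3_sides` is hypothetical.

```python
import math


def lemma3_sides(n, q, p, alpha):
    """Both sides of Lemma 3 for i.i.d. Bernoulli(q) variables xi_1, ..., xi_n."""
    e_max = 1.0 - (1.0 - q) ** n      # E[max_i |xi_i|^p]
    scale = p ** (p * alpha)          # the factor p^{p alpha}
    lhs = scale * n * q               # p^{p alpha} * sum_i E|xi_i|^p
    rhs = (4 * 1.5 ** (p * alpha) * scale * e_max
           + 2 * 1.5 ** (p * alpha) * (n * q) ** p)
    return lhs, rhs


# Grid check of p^{1/p} <= e^{1/e} <= 1.5 for p in [1, 21).
grid_p = [1 + 0.01 * k for k in range(2000)]
sup_root = max(p ** (1 / p) for p in grid_p)

# Collect any parameter combination where the inequality of Lemma 3 would fail.
violations = [
    (n, q, p, a)
    for n in (1, 3, 10, 100)
    for q in (0.001, 0.1, 0.5, 1.0)
    for p in (1, 2, 4)
    for a in (0.5, 1.0, 2.0)
    if lemma3_sides(n, q, p, a)[0] > lemma3_sides(n, q, p, a)[1]
]
print(sup_root, math.e ** (1 / math.e), violations)
```

An empty `violations` list is consistent with the lemma on this grid; it is of course no substitute for the proof above.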

Lemma 4.

Under the notation of Theorem 2, the quantity BB defined in (21) satisfies

\begin{split}
B&\leq\sup\left\{\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j})\sigma_{j,\psi}(X_{j})p_{j}(X_{j})\right]:\right.\\
&\qquad\qquad\quad\left.\sum_{i=1}^{n}\mathbb{E}\left[q_{i}^{2}(X_{i})\right]\leq 1,\,\sum_{j=1}^{n}\mathbb{E}\left[p_{j}^{2}(X_{j})\right]\leq 1\right\}.
\end{split}
Proof.

Following the proof of Theorem 3.2 of Giné et al., (2000), the quantity BB is the square root of the wimpy variance of

Sn:=(i=1n𝔼[Fi2(εi,Zi;𝒵n)|𝒵n])1/2,S_{n}:=\left(\sum_{i=1}^{n}\mathbb{E}\left[F_{i}^{2}(\varepsilon_{i},Z_{i};\mathcal{Z}_{n}^{\prime})\big{|}\mathcal{Z}_{n}^{\prime}\right]\right)^{1/2},

where 𝒵n:={(ε1,Z1),,(εn,Zn)}\mathcal{Z}_{n}^{\prime}:=\{(\varepsilon_{1}^{\prime},Z_{1}^{\prime}),\ldots,(\varepsilon_{n}^{\prime},Z_{n}^{\prime})\} and

Fi(εi,Zi;𝒵n):=εiΦi,1j=1,jinwi,j(Xi,Xj)Ψj,1εj.F_{i}(\varepsilon_{i},Z_{i};\mathcal{Z}_{n}^{\prime}):=\varepsilon_{i}\Phi_{i,1}\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime}.

This implies that

Sn(i=1n𝔼[Gi2(Xi;𝒵n)|𝒵n])1/2,S_{n}\leq\left(\sum_{i=1}^{n}\mathbb{E}\left[G_{i}^{2}(X_{i};\mathcal{Z}_{n}^{\prime})\big{|}\mathcal{Z}_{n}^{\prime}\right]\right)^{1/2},

where for σi,ϕ2(x):=𝔼[ϕ2(Yi)|Xi=x]\sigma_{i,\phi}^{2}(x):=\mathbb{E}\left[\phi^{2}(Y_{i})\big{|}X_{i}=x\right],

Gi(Xi;𝒵n):=σi,ϕ(Xi)j=1,jinwi,j(Xi,Xj)Ψj,1εj.G_{i}(X_{i};\mathcal{Z}_{n}^{\prime}):=\sigma_{i,\phi}(X_{i})\sum_{j=1,j\neq i}^{n}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime}.

Note that σi,ϕ()\sigma_{i,\phi}(\cdot) depends on ii since the random variables are allowed to be non-identically distributed. Now observe that

Sn=sup{i=1nqi(x)Gi(x;𝒵n)PXi(dx):i=1nqi2(x)PXi(dx)1}.S_{n}=\sup\left\{\sum_{i=1}^{n}\int q_{i}(x)G_{i}(x;\mathcal{Z}_{n}^{\prime})P_{X_{i}}(dx):\,\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1\right\}. (32)

To prove this, note that for any {qi(): 1in}\{q_{i}(\cdot):\,1\leq i\leq n\} satisfying the (integral) constraint,

i=1n\displaystyle\sum_{i=1}^{n} qi(x)Gi(x;𝒵n)PXi(dx)\displaystyle\int q_{i}(x)G_{i}(x;\mathcal{Z}_{n}^{\prime})P_{X_{i}}(dx)
i=1n(qi2(x)PXi(dx))1/2(Gi2(x;𝒵n)PXi(dx))1/2\displaystyle\leq\sum_{i=1}^{n}\left(\int q_{i}^{2}(x)P_{X_{i}}(dx)\right)^{1/2}\left(\int G_{i}^{2}(x;\mathcal{Z}_{n}^{\prime})P_{X_{i}}(dx)\right)^{1/2}
(i=1nqi2(x)PXi(dx))1/2(i=1nGi2(x;𝒵n)PXi(dx))1/2Sn.\displaystyle\leq\left(\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\right)^{1/2}\left(\sum_{i=1}^{n}\int G_{i}^{2}(x;\mathcal{Z}_{n}^{\prime})P_{X_{i}}(dx)\right)^{1/2}\leq S_{n}.

To prove the reverse inequality, define for 1in1\leq i\leq n,

q_{i}(x):=G_{i}(x;\mathcal{Z}_{n}^{\prime})\left(\sum_{i=1}^{n}\int G_{i}^{2}(x;\mathcal{Z}_{n}^{\prime})P_{X_{i}}(dx)\right)^{-1/2}.

It is clear that {qi(): 1in}\{q_{i}(\cdot):\,1\leq i\leq n\} satisfy the integral constraint in (32) and

i=1nqi(x)Gi(x;𝒵n)PXi(dx)=Sn.\sum_{i=1}^{n}\int q_{i}(x)G_{i}(x;\mathcal{Z}_{n}^{\prime})P_{X_{i}}(dx)=S_{n}.

This completes the proof of (32). Rewriting the representation (32), we get

Sn=supi=1nqi2(x)PXi(dx)1j=1nεjΨj,1(i=1,ijnqi(x)σi,ϕ(x)wi,j(x,Xj)PXi(dx)).S_{n}=\sup_{\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1}\,\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi^{\prime}_{j,1}\left(\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j}^{\prime})P_{X_{i}}(dx)\right).

This representation shows that SnS_{n} is indeed the supremum of an empirical process. The wimpy variance of this supremum is given by

sup{qi()}\displaystyle\sup_{\{q_{i}(\cdot)\}} Var(j=1nεjΨj,1(i=1,ijnqi(x)σi,ϕ(x)wi,j(x,Xj)PXi(dx)))\displaystyle\mbox{Var}\left(\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi^{\prime}_{j,1}\left(\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j}^{\prime})P_{X_{i}}(dx)\right)\right)
sup{qi()}j=1n𝔼[σj,ψ2(Xj)(i=1,ijnqi(x)σi,ϕ(x)wi,j(x,Xj)PXi(dx))2]\displaystyle\leq\sup_{\{q_{i}(\cdot)\}}\sum_{j=1}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j}^{\prime})\left(\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j}^{\prime})P_{X_{i}}(dx)\right)^{2}\right]
=sup{qi()}j=1n𝔼[σj,ψ2(Xj)(i=1,ijnqi(x)σi,ϕ(x)wi,j(x,Xj)PXi(dx))2].\displaystyle=\sup_{\{q_{i}(\cdot)\}}\sum_{j=1}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j})\left(\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j})P_{X_{i}}(dx)\right)^{2}\right].

Now a duality argument implies that

\begin{split}
\sup_{\{q_{i}(\cdot)\}}&\left(\sum_{j=1}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j})\left(\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j})P_{X_{i}}(dx)\right)^{2}\right]\right)^{1/2}\\
&=\sup\left\{\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j})\sigma_{j,\psi}(X_{j})p_{j}(X_{j})\right]:\right.\\
&\qquad\quad\left.\sum_{j=1}^{n}\mathbb{E}\left[p_{j}^{2}(X_{j})\right]\leq 1,\sum_{i=1}^{n}\mathbb{E}\left[q_{i}^{2}(X_{i})\right]\leq 1\right\}.
\end{split}

Thus the result follows. ∎
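The variational representation (32) and the duality step in the proof above are instances of the identity $\lVert g\rVert_{2}=\sup\{\langle q,g\rangle:\lVert q\rVert_{2}\leq 1\}$, with the supremum attained at $q=g/\lVert g\rVert_{2}$. A minimal finite-dimensional sketch, assuming a fixed vector `g` that stands in for $(G_{1},\ldots,G_{n})$ (the names `pairing`, `l2_norm` and `candidates` are hypothetical):

```python
import math


def pairing(q, g):
    """Finite-dimensional analogue of sum_i integral q_i G_i dP_{X_i}."""
    return sum(qi * gi for qi, gi in zip(q, g))


def l2_norm(v):
    """Euclidean norm, the analogue of (sum_i integral G_i^2 dP_{X_i})^{1/2}."""
    return math.sqrt(sum(x * x for x in v))


g = [3.0, -4.0, 12.0]               # stand-in for (G_1, ..., G_n); its norm is 13
norm_g = l2_norm(g)
q_star = [x / norm_g for x in g]    # maximiser: g divided by its norm

# A few other feasible directions, each with norm at most 1.
candidates = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.6, -0.8, 0.0], [0.5, 0.5, 0.5]]

best = pairing(q_star, g)
others = [pairing(q, g) for q in candidates]
print(best, others)
```

The normalised choice `q_star` attains the norm, while every other feasible direction gives a value no larger, which is the Cauchy-Schwarz argument used for (32).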

Appendix B Proofs of Results in Section 3

B.1 Proof of Theorem 3

Similar to 𝒰n(),14\mathcal{U}_{n}^{(\ell)},1\leq\ell\leq 4 defined in the proof of Theorem 2, we define

𝒰n(1)(𝒲):=supw𝒲|1ijnεiΦi,1wi,j(Xi,Xj)Ψj,1εj|,𝒰n(2)(𝒲):=supw𝒲|1ijnεiΦi,2wi,j(Xi,Xj)Ψj,1εj|,𝒰n(3)(𝒲):=supw𝒲|1ijnεiΦi,1wi,j(Xi,Xj)Ψj,2εj|,𝒰n(4)(𝒲):=supw𝒲|1ijnεiΦi,2wi,j(Xi,Xj)Ψj,2εj|.\begin{split}\mathcal{U}_{n}^{(1)}(\mathcal{W})&:=\sup_{w\in\mathcal{W}}\left|\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime}\right|,\\ \mathcal{U}_{n}^{(2)}(\mathcal{W})&:=\sup_{w\in\mathcal{W}}\left|\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,1}\varepsilon_{j}^{\prime}\right|,\\ \mathcal{U}_{n}^{(3)}(\mathcal{W})&:=\sup_{w\in\mathcal{W}}\left|\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}\varepsilon_{j}^{\prime}\right|,\\ \mathcal{U}_{n}^{(4)}(\mathcal{W})&:=\sup_{w\in\mathcal{W}}\left|\sum_{1\leq i\neq j\leq n}\varepsilon_{i}\Phi_{i,2}w_{i,j}(X_{i},X_{j}^{\prime})\Psi^{\prime}_{j,2}\varepsilon_{j}^{\prime}\right|.\end{split} (33)

As in the proof of Theorem 2, we will control each of the terms separately in the following lemmas. All the lemmas below assume (B1) and (A2).

Lemma 5 (Control of 𝒰n(4)(𝒲)\mathcal{U}_{n}^{(4)}(\mathcal{W})).

There exists a constant K>0K>0 (depending only on α,β\alpha,\beta) such that for all p1p\geq 1,

𝒰n(4)(𝒲)pKΛ2(𝒲)p1/α+1/β.\left\lVert\mathcal{U}_{n}^{(4)}(\mathcal{W})\right\rVert_{p}\leq K\Lambda_{2}(\mathcal{W})p^{1/\alpha^{*}+1/\beta^{*}}.
Proof.

Since $\left\lVert w_{i,j}\right\rVert_{\infty}\leq B_{\mathcal{W}}$ for all $w\in\mathcal{W}$, it follows that

\[
\mathcal{U}_{n}^{(4)}(\mathcal{W})\leq B_{\mathcal{W}}\sum_{1\leq i\neq j\leq n}|\Phi_{i,2}\Psi_{j,2}^{\prime}|\leq B_{\mathcal{W}}\left(\sum_{i=1}^{n}|\Phi_{i,2}|\right)\left(\sum_{j=1}^{n}|\Psi_{j,2}^{\prime}|\right).
\]

By definition,

\begin{align*}
\mathbb{P}\left(\max_{1\leq I\leq n}\sum_{i=1}^{I}|\Phi_{i,2}|>0\big{|}\mathcal{X}_{n}\right)&\leq\mathbb{P}\left(\max_{1\leq i\leq n}|\phi(Z_{i})|\geq T_{\phi}\big{|}\mathcal{X}_{n}\right)\leq 1/8,\\
\mathbb{P}\left(\max_{1\leq I\leq n}\sum_{i=1}^{I}|\Psi_{i,2}^{\prime}|>0\big{|}\mathcal{X}_{n}^{\prime}\right)&\leq\mathbb{P}\left(\max_{1\leq i\leq n}|\psi(Z_{i}^{\prime})|\geq T_{\psi}\big{|}\mathcal{X}_{n}^{\prime}\right)\leq 1/8.
\end{align*}

Hence, by (6.8) of Ledoux and Talagrand, (1991), we get that

\begin{align*}
\mathbb{E}\left[\sum_{i=1}^{n}|\Phi_{i,2}|\big{|}\mathcal{X}_{n}\right]&\leq C\mathbb{E}\left[\max_{1\leq i\leq n}|\phi(Z_{i})|\big{|}\mathcal{X}_{n}\right],\\
\mathbb{E}\left[\sum_{i=1}^{n}|\Psi_{i,2}^{\prime}|\big{|}\mathcal{X}_{n}^{\prime}\right]&\leq C\mathbb{E}\left[\max_{1\leq i\leq n}|\psi(Z_{i}^{\prime})|\big{|}\mathcal{X}_{n}^{\prime}\right],
\end{align*}

for some constant $C>0$. Thus, by applying Theorem 6.21 of Ledoux and Talagrand, (1991) to $\sum\{\Phi_{i,2}-\mathbb{E}[\Phi_{i,2}|\mathcal{X}_{n}]\}$ and $\sum\{\Psi_{i,2}^{\prime}-\mathbb{E}[\Psi_{i,2}^{\prime}|\mathcal{X}_{n}^{\prime}]\}$, we get

\begin{align*}
\left\lVert\sum_{i=1}^{n}|\Phi_{i,2}|\right\rVert_{\psi_{\alpha^{*}}|\mathcal{X}_{n}}&\leq C\left\lVert\max_{1\leq i\leq n}|\phi(Z_{i})|\right\rVert_{\psi_{\alpha}|\mathcal{X}_{n}}\leq CC_{\phi}(\log n)^{1/\alpha},\\
\left\lVert\sum_{i=1}^{n}|\Psi_{i,2}^{\prime}|\right\rVert_{\psi_{\beta^{*}}|\mathcal{X}_{n}^{\prime}}&\leq C\left\lVert\max_{1\leq i\leq n}|\psi(Z_{i}^{\prime})|\right\rVert_{\psi_{\beta}|\mathcal{X}_{n}^{\prime}}\leq CC_{\psi}(\log n)^{1/\beta}.
\end{align*}

Therefore, for all $p\geq 1$,

\[
\left\lVert\mathcal{U}_{n}^{(4)}(\mathcal{W})\right\rVert_{p}\leq KB_{\mathcal{W}}C_{\phi}C_{\psi}(\log n)^{\alpha^{-1}+\beta^{-1}}p^{1/\alpha^{*}+1/\beta^{*}}=K\Lambda_{2}(\mathcal{W})p^{1/\alpha^{*}+1/\beta^{*}}.
\]

This completes the proof. ∎

The following lemma controls the moments of $\mathcal{U}_{n}^{(2)}(\mathcal{W})$ and $\mathcal{U}_{n}^{(3)}(\mathcal{W})$.

Lemma 6 (Control of $\mathcal{U}_{n}^{(2)}(\mathcal{W})$ and $\mathcal{U}_{n}^{(3)}(\mathcal{W})$).

There exists a constant $K>0$ (depending only on $\alpha,\beta$) such that for all $p\geq 1$,

\begin{align*}
\left\lVert\mathcal{U}_{n}^{(2)}(\mathcal{W})\right\rVert_{p}&\leq Kp^{1/\alpha^{*}}\left[E_{n,2}(\mathcal{W})+(\log n)^{1/2}\Sigma_{n,2}^{1/2}(\mathcal{W})+(\log n)\Lambda_{2}(\mathcal{W})\right]\\
&\qquad+Kp^{1/2+1/\alpha^{*}}\Sigma_{n,2}^{1/2}(\mathcal{W})+Kp^{1+1/\alpha^{*}}\Lambda_{2}(\mathcal{W}),\\
\left\lVert\mathcal{U}_{n}^{(3)}(\mathcal{W})\right\rVert_{p}&\leq Kp^{1/\beta^{*}}\left[E_{n,1}(\mathcal{W})+(\log n)^{1/2}\Sigma_{n,1}^{1/2}(\mathcal{W})+(\log n)\Lambda_{2}(\mathcal{W})\right]\\
&\qquad+Kp^{1/2+1/\beta^{*}}\Sigma_{n,1}^{1/2}(\mathcal{W})+Kp^{1+1/\beta^{*}}\Lambda_{2}(\mathcal{W}).
\end{align*}
Proof.

We give the details for $\mathcal{U}_{n}^{(2)}(\mathcal{W})$; the bound for $\mathcal{U}_{n}^{(3)}(\mathcal{W})$ follows by very similar arguments, and we only record the corresponding displays. Recall that

\[
\mathcal{U}_{n}^{(2)}(\mathcal{W})=\sup_{w\in\mathcal{W}}\left|\sum_{i=1}^{n}\varepsilon_{i}\Phi_{i,2}g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w)\right|,\;\mbox{where}\;g_{i}(x;\mathcal{Z}_{n}^{\prime},w):=\sum_{j=1,j\neq i}^{n}\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}w_{i,j}(x,X_{j}^{\prime}).
\]

Here again (6.8) of Ledoux and Talagrand, (1991) applies and we get

\[
\left\lVert\mathcal{U}_{n}^{(2)}(\mathcal{W})\right\rVert_{\psi_{\alpha^{*}}\big{|}\mathcal{X}_{n},\mathcal{Z}_{n}^{\prime}}\leq KC_{\phi}(\log n)^{1/\alpha}\max_{1\leq i\leq n}\sup_{w\in\mathcal{W}}\left|\sum_{j=1,j\neq i}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}w_{i,j}(X_{i},X_{j}^{\prime})\right|.
\]

By a similar calculation, we get

\[
\left\lVert\mathcal{U}_{n}^{(3)}(\mathcal{W})\right\rVert_{\psi_{\beta^{*}}\big{|}\mathcal{X}_{n}^{\prime},\mathcal{Z}_{n}}\leq KC_{\psi}(\log n)^{1/\beta}\max_{1\leq j\leq n}\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\right|.
\]

Thus, for $p\geq 1$,

\begin{equation}
\begin{split}
\mathbb{E}\left[|\mathcal{U}_{n}^{(2)}(\mathcal{W})|^{p}\right]&\leq K^{p}C_{\phi}^{p}(\log n)^{p/\alpha}p^{p/\alpha^{*}}\mathbb{E}\left[\max_{1\leq i\leq n}\sup_{w\in\mathcal{W}}\left|\sum_{j=1,j\neq i}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}w_{i,j}(X_{i},X_{j}^{\prime})\right|^{p}\right],\\
\mathbb{E}\left[|\mathcal{U}_{n}^{(3)}(\mathcal{W})|^{p}\right]&\leq K^{p}C_{\psi}^{p}(\log n)^{p/\beta}p^{p/\beta^{*}}\mathbb{E}\left[\max_{1\leq j\leq n}\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\right|^{p}\right].
\end{split}
\tag{34}
\end{equation}

The right-hand sides involve suprema of bounded empirical processes, to which Talagrand’s inequality applies; see Proposition 3.1 of Giné et al., (2000). Observe that for any $x\in\mathfrak{X}$,

\begin{align*}
\max_{1\leq j\leq n}\sup_{w\in\mathcal{W}}|\Psi_{j,1}^{\prime}w_{i,j}(x,X_{j}^{\prime})|&\leq C_{\psi}(\log n)^{1/\beta}B_{\mathcal{W}},\\
\max_{1\leq i\leq n}\sup_{w\in\mathcal{W}}|\Phi_{i,1}w_{i,j}(X_{i},x)|&\leq C_{\phi}(\log n)^{1/\alpha}B_{\mathcal{W}}.
\end{align*}

By Proposition 3.1 of Giné et al., (2000), we obtain for any $x\in\mathfrak{X}$ and $p\geq 1$,

\[
\mathbb{E}\left[\sup_{w\in\mathcal{W}}\,|g_{i}(x;\mathcal{Z}_{n}^{\prime},w)|^{p}\right]\leq K^{p}\left\{\bar{E}_{n,2}^{p}(\mathcal{W})+p^{p/2}\bar{\Sigma}_{n,2}^{p/2}(\mathcal{W})+p^{p}C_{\psi}^{p}(\log n)^{p/\beta}B_{\mathcal{W}}^{p}\right\},
\]

where $\bar{E}_{n,2}(\mathcal{W})=C_{\phi}^{-1}E_{n,2}(\mathcal{W})/(\log n)^{1/\alpha}$ and $\bar{\Sigma}_{n,2}^{1/2}(\mathcal{W})=C_{\phi}^{-1}\Sigma_{n,2}^{1/2}(\mathcal{W})/(\log n)^{1/\alpha}$. Therefore, by following the argument that led to (26), we get that

\begin{align*}
&\mathbb{E}\left[\max_{1\leq i\leq n}\sup_{w\in\mathcal{W}}|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w)|^{p}\right]\tag{35}\\
&\qquad\leq K^{p}\left[\bar{E}_{n,2}^{p}(\mathcal{W})+p^{p/2}\bar{\Sigma}_{n,2}^{p/2}(\mathcal{W})+p^{p}C_{\psi}^{p}(\log n)^{p/\beta}B_{\mathcal{W}}^{p}\right]\\
&\qquad\quad+K^{p}\left[(\log n)^{p/2}\bar{\Sigma}_{n,2}^{p/2}(\mathcal{W})+(\log n)^{p}C_{\psi}^{p}(\log n)^{p/\beta}B_{\mathcal{W}}^{p}\right].
\end{align*}

Substituting this in (34), we get

\begin{align*}
\mathbb{E}\left[|\mathcal{U}_{n}^{(2)}(\mathcal{W})|^{p}\right]&\leq K^{p}p^{p/\alpha^{*}}\left[E_{n,2}^{p}(\mathcal{W})+p^{p/2}\Sigma_{n,2}^{p/2}(\mathcal{W})+p^{p}\Lambda_{2}^{p}(\mathcal{W})\right]\\
&\quad+K^{p}p^{p/\alpha^{*}}\left[(\log n)^{p/2}\Sigma_{n,2}^{p/2}(\mathcal{W})+(\log n)^{p}\Lambda_{2}^{p}(\mathcal{W})\right].
\end{align*}

By a similar calculation, we get

\begin{align*}
\mathbb{E}\left[|\mathcal{U}_{n}^{(3)}(\mathcal{W})|^{p}\right]&\leq K^{p}p^{p/\beta^{*}}\left[E_{n,1}^{p}(\mathcal{W})+p^{p/2}\Sigma_{n,1}^{p/2}(\mathcal{W})+p^{p}\Lambda_{2}^{p}(\mathcal{W})\right]\\
&\quad+K^{p}p^{p/\beta^{*}}\left[(\log n)^{p/2}\Sigma_{n,1}^{p/2}(\mathcal{W})+(\log n)^{p}\Lambda_{2}^{p}(\mathcal{W})\right].
\end{align*}

This completes the proof of the result. ∎

The following lemma controls the moments of $\mathcal{U}_{n}^{(1)}(\mathcal{W})$. This is a bounded degenerate $U$-process and is (usually) the dominating term among the four parts.

Lemma 7 (Control of $\mathcal{U}_{n}^{(1)}(\mathcal{W})$).

There exists a constant $K>0$ (depending only on $\alpha,\beta$) such that for all $p\geq 1$,

\begin{align*}
\left\lVert\mathcal{U}_{n}^{(1)}(\mathcal{W})\right\rVert_{p}&\leq K\mathbb{E}\left[\mathcal{U}_{n}^{(1)}(\mathcal{W})\right]+Kp^{1/2}\left(\mathfrak{W}_{n,1}(\mathcal{W})+\mathfrak{W}_{n,2}(\mathcal{W})\right)\\
&\quad+Kp\left(\left\lVert(\phi w\psi)_{\mathcal{W}}\right\rVert_{2\to 2}+E_{n,1}(\mathcal{W})+E_{n,2}(\mathcal{W})+\Sigma_{n,2}^{1/2}(\mathcal{W})\sqrt{\log n}+\Lambda_{2}(\mathcal{W})\log n\right)\\
&\quad+Kp^{3/2}\left(\Sigma_{n,1}^{1/2}(\mathcal{W})+\Sigma_{n,2}^{1/2}(\mathcal{W})\right)+Kp^{2}\Lambda_{2}(\mathcal{W}).
\end{align*}
Proof.

Recall that

\[
\mathcal{U}_{n}^{(1)}(\mathcal{W})=\sup_{w\in\mathcal{W}}\left|\sum_{i=1}^{n}\varepsilon_{i}\Phi_{i,1}g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w)\right|,\,\mbox{where}\,g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w):=\sum_{j=1,j\neq i}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}w_{i,j}(X_{i},X_{j}^{\prime}).
\]

Observe that, conditional on $\mathcal{Z}_{n}^{\prime}$, $\mathcal{U}_{n}^{(1)}(\mathcal{W})$ is the supremum of a bounded empirical process, so Talagrand’s inequality applies. Thus, by Proposition 3.1 of Giné et al., (2000), we get for $p\geq 1$

\begin{align*}
\mathbb{E}\left[|\mathcal{U}_{n}^{(1)}(\mathcal{W})|^{p}\big{|}\mathcal{Z}_{n}^{\prime}\right]&\leq K^{p}\left(\mathbb{E}\left[|\mathcal{U}_{n}^{(1)}(\mathcal{W})|\big{|}\mathcal{Z}_{n}^{\prime}\right]\right)^{p}\\
&\quad+K^{p}p^{p/2}\sup_{w\in\mathcal{W}}\left(\sum_{i=1}^{n}\mathbb{E}\left[\Phi_{i,1}^{2}g_{i}^{2}(X_{i};\mathcal{Z}_{n}^{\prime},w)\big{|}\mathcal{Z}_{n}^{\prime}\right]\right)^{p/2}\\
&\quad+K^{p}p^{p}\mathbb{E}\left[\max_{1\leq i\leq n}|\Phi_{i,1}|^{p}\sup_{w\in\mathcal{W}}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w)\right|^{p}\big{|}\mathcal{Z}_{n}^{\prime}\right].
\end{align*}

Therefore, for $p\geq 1$,

\begin{align*}
\mathbb{E}\left[|\mathcal{U}_{n}^{(1)}(\mathcal{W})|^{p}\right]&\leq K^{p}\mathbb{E}\left(\mathbb{E}\left[|\mathcal{U}_{n}^{(1)}(\mathcal{W})|\big{|}\mathcal{Z}_{n}^{\prime}\right]\right)^{p}\\
&\quad+K^{p}p^{p/2}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left(\sum_{i=1}^{n}\mathbb{E}\left[\sigma_{i,\phi}^{2}(X_{i})g_{i}^{2}(X_{i};\mathcal{Z}_{n}^{\prime},w)\big{|}\mathcal{Z}_{n}^{\prime}\right]\right)^{p/2}\right]\\
&\quad+K^{p}p^{p}C_{\phi}^{p}(\log n)^{p/\alpha}\mathbb{E}\left[\max_{1\leq i\leq n}\sup_{w\in\mathcal{W}}\left|g_{i}(X_{i};\mathcal{Z}_{n}^{\prime},w)\right|^{p}\right]\\
&=:K^{p}\left[\mathbf{I}+\mathbf{II}+\mathbf{III}\right].
\end{align*}

Controlling $\mathbf{III}$: Using (35) from Lemma 6, we get

\begin{align*}
\mathbf{III}&\leq K^{p}p^{p}\left[E_{n,2}^{p}(\mathcal{W})+p^{p/2}\Sigma_{n,2}^{p/2}(\mathcal{W})+p^{p}\Lambda_{2}^{p}(\mathcal{W})\right]\\
&\quad+K^{p}p^{p}\left[(\log n)^{p/2}\Sigma_{n,2}^{p/2}(\mathcal{W})+(\log n)^{p}\Lambda_{2}^{p}(\mathcal{W})\right].
\end{align*}

Controlling $\mathbf{II}$: To control $\mathbf{II}$, we use a technique similar to the one used in Lemma 4. To this end, note that by (32), for any $w(\cdot,\cdot)$,

\begin{align*}
&\left(\sum_{i=1}^{n}\int\sigma_{i,\phi}^{2}(x)g_{i}^{2}(x;\mathcal{Z}_{n}^{\prime},w)P_{X_{i}}(dx)\right)^{1/2}\\
&\qquad=\sup\left\{\sum_{i=1}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)g_{i}(x;\mathcal{Z}_{n}^{\prime},w)P_{X_{i}}(dx):\,\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1\right\}.
\end{align*}
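The identity above is the usual $L_2$ duality: a weighted Euclidean norm equals the supremum of a linear pairing over the weighted unit ball, attained at the normalised vector itself. The following is a minimal numerical sketch of the finite-dimensional case only. The names `weighted_norm`, `pairing` and `random_feasible`, the weights `w` (standing in for the laws $P_{X_i}$) and the vector `v` (standing in for the values of $\sigma_{i,\phi}g_{i}$) are all hypothetical and are not taken from the paper.

```python
import math
import random

def weighted_norm(v, w):
    """sqrt(sum_k w_k v_k^2): the left-hand side of the duality identity."""
    return math.sqrt(sum(wk * vk * vk for vk, wk in zip(v, w)))

def pairing(q, v, w):
    """sum_k w_k q_k v_k: the linear functional inside the supremum."""
    return sum(wk * qk * vk for qk, vk, wk in zip(q, v, w))

def random_feasible(w, rng):
    """Draw q with sum_k w_k q_k^2 = 1 (a point on the weighted unit sphere)."""
    q = [rng.gauss(0.0, 1.0) for _ in w]
    scale = weighted_norm(q, w)
    return [qk / scale for qk in q]

rng = random.Random(0)
w = [0.1, 0.2, 0.3, 0.4]            # hypothetical weights (probabilities)
v = [1.5, -2.0, 0.5, 3.0]           # hypothetical values of sigma * g
norm_v = weighted_norm(v, w)
q_star = [vk / norm_v for vk in v]  # candidate maximiser q = v / ||v||
attained = pairing(q_star, v, w)    # value of the pairing at q_star
# Largest pairing found among randomly drawn feasible q (never above norm_v).
best_random = max(pairing(random_feasible(w, rng), v, w) for _ in range(2000))
```

By the Cauchy-Schwarz inequality no feasible `q` can exceed `norm_v`, so `attained` matches the norm while `best_random` stays below it.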

Therefore,

\[
\mathbf{II}=p^{p/2}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sup_{\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1}\left|\sum_{i=1}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)g_{i}(x;\mathcal{Z}_{n}^{\prime},w)P_{X_{i}}(dx)\right|^{p}\right].
\]

Now observe that

\[
\sum_{i=1}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)g_{i}(x;\mathcal{Z}_{n}^{\prime},w)P_{X_{i}}(dx)=\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w),
\]

where $\{q_{i}\}$ represents a sequence $(q_{1},\ldots,q_{n})$ satisfying $\sum_{i=1}^{n}\int q_{i}^{2}(x)P_{X_{i}}(dx)\leq 1$ and

\begin{equation}
\ell_{j}(X_{j}^{\prime};\{q_{i}\},w):=\sum_{i=1,i\neq j}^{n}\int q_{i}(x)\sigma_{i,\phi}(x)w_{i,j}(x,X_{j}^{\prime})P_{X_{i}}(dx).
\tag{36}
\end{equation}

Thus

\[
\mathbf{II}=p^{p/2}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sup_{\{q_{i}\}}\left|\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right|^{p}\right].
\]

The right-hand side is the $p$-th moment of the supremum of a bounded empirical process, and by Proposition 3.1 of Giné et al., (2000), we get

\begin{equation}
\begin{split}
&\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sup_{\{q_{i}\}}\left|\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right|^{p}\right]\\
&\qquad\leq K^{p}\left(\mathbb{E}\left[\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\left|\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right|\right]\right)^{p}\\
&\qquad\quad+K^{p}p^{p/2}\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\left(\mbox{Var}\left(\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right)\right)^{p/2}\\
&\qquad\quad+K^{p}p^{p}\mathbb{E}\left[\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\max_{1\leq j\leq n}|\Psi_{j,1}^{\prime}|^{p}|\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)|^{p}\right].
\end{split}
\tag{37}
\end{equation}

We will now control each of the three terms appearing in (37). Using the fact that $|\Psi_{j,1}^{\prime}|\leq KC_{\psi}(\log n)^{1/\beta}$, we get

\[
\mathbb{E}\left[\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\max_{1\leq j\leq n}|\Psi_{j,1}^{\prime}|^{p}|\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)|^{p}\right]\leq K^{p}C_{\psi}^{p}(\log n)^{p/\beta}\max_{1\leq j\leq n}\sup_{w\in\mathcal{W}}\sup_{x^{\prime}\in\mathfrak{X}}\sup_{\{q_{i}\}}|\ell_{j}(x^{\prime};\{q_{i}\},w)|^{p}.
\]

By following the duality argument (32), we get

\[
\sup_{\{q_{i}\}}|\ell_{j}(x^{\prime};\{q_{i}\},w)|\leq\left(\sum_{i=1,i\neq j}^{n}\mathbb{E}\left[\sigma_{i,\phi}^{2}(X_{i})w^{2}(X_{i},x^{\prime})\right]\right)^{1/2},
\]

and so,

\begin{equation}
\begin{split}
&\mathbb{E}\left[\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\max_{1\leq j\leq n}|\Psi_{j,1}^{\prime}|^{p}|\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)|^{p}\right]\\
&\qquad\leq K^{p}C_{\psi}^{p}(\log n)^{p/\beta}\sup_{w\in\mathcal{W},x^{\prime}\in\mathfrak{X}}\,\left(\sum_{i=1}^{n}\mathbb{E}\left[\sigma_{i,\phi}^{2}(X_{i})w^{2}(X_{i},x^{\prime})\right]\right)^{p/2}=K^{p}\Sigma_{n,1}^{p/2}(\mathcal{W}).
\end{split}
\tag{38}
\end{equation}

Also, note that

\[
\mbox{Var}\left(\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right)=\sum_{j=1}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j}^{\prime})\ell_{j}^{2}(X_{j}^{\prime};\{q_{i}\},w)\right].
\]

Hence, again following the duality argument (32), we get

\begin{equation}
\begin{split}
&\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\left(\mbox{Var}\left(\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right)\right)^{p/2}\\
&\qquad\leq\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\sup_{\{p_{j}\}}\left(\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[q_{i}(X_{i})\sigma_{i,\phi}(X_{i})w_{i,j}(X_{i},X_{j}^{\prime})\sigma_{j,\psi}(X_{j}^{\prime})p_{j}(X_{j}^{\prime})\right]\right)^{p}.
\end{split}
\tag{39}
\end{equation}

Here $\{p_{j}\}$ represents a sequence $(p_{1},\ldots,p_{n})$ satisfying $\sum_{j=1}^{n}\int p_{j}^{2}(x)P_{X_{j}}(dx)\leq 1$.

Substituting (39) and (38) in (37), we get

\begin{equation}
\begin{split}
\mathbf{II}&\leq K^{p}p^{p/2}\left(\mathbb{E}\left[\sup_{\{q_{i}\}}\sup_{w\in\mathcal{W}}\left|\sum_{j=1}^{n}\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\ell_{j}(X_{j}^{\prime};\{q_{i}\},w)\right|\right]\right)^{p}\\
&\qquad\quad+K^{p}p^{p}\left\lVert(\phi w\psi)_{\mathcal{W}}\right\rVert_{2\to 2}^{p}+K^{p}p^{3p/2}\Sigma_{n,1}^{p/2}(\mathcal{W}).
\end{split}
\tag{40}
\end{equation}

Controlling $\mathbf{I}$: We use Lemma 8 (a restatement of Lemma 2 of Adamczak, (2006)) to control $\mathbf{I}$. In the notation of Lemma 8, take

\[
W_{j}=(\varepsilon_{j}^{\prime},Z_{j}^{\prime}),\,T=(Z_{1},\ldots,Z_{n},\varepsilon_{1},\ldots,\varepsilon_{n}),
\]

and for $w\in\mathcal{W}$,

\[
f_{j}^{w}(W_{j},T)=\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}.
\]

This implies

\[
S=\mathbb{E}_{T}\left[\sup_{w\in\mathcal{W}}\left|\sum_{j=1}^{n}\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}\right|\right]=\mathbb{E}\left[\mathcal{U}_{n}^{(1)}(\mathcal{W})\big{|}\mathcal{Z}_{n}^{\prime}\right].
\]

Observe that $\mathbb{E}[S]=\mathbb{E}[\mathcal{U}_{n}^{(1)}(\mathcal{W})]$. Thus we get for $p\geq 1$

\begin{align*}
\mathbb{E}\left[S^{p}\right]&\leq K^{p}\left(\mathbb{E}[S]\right)^{p}+K^{p}p^{p/2}\Upsilon^{p}\tag{41}\\
&\qquad+K^{p}p^{p}\mathbb{E}\left[\max_{1\leq j\leq n}\left(\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}\right|\big{|}\mathcal{Z}_{n}^{\prime}\right]\right)^{p}\right],
\end{align*}

where

\[
\Upsilon:=\sup_{q\in\mathcal{Q}}\left(\sum_{j=1}^{n}\mathbb{E}\left[\left(\sum_{w\in\mathcal{W}}\mathbb{E}_{T}\left[f_{j}^{w}(W_{j},T)q_{j}(T)\right]\right)^{2}\right]\right)^{1/2},
\]

with $\mathcal{Q}$ defined in Lemma 8. We now simplify the last two terms on the right-hand side of (41). First observe that for the third term

\begin{align*}
&\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\Psi_{j,1}^{\prime}\varepsilon_{j}^{\prime}\right|\big{|}\mathcal{Z}_{n}^{\prime}\right]\\
&\qquad\leq KC_{\psi}(\log n)^{1/\beta}\sup_{x\in\mathfrak{X}}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\left|\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},x)\right|\right]=KE_{n,1}(\mathcal{W}).
\end{align*}

To control $\Upsilon$, observe that

\[
\sum_{w\in\mathcal{W}}\mathbb{E}_{T}\left[f_{j}^{w}(W_{j},T)q_{j}(T)\right]=\varepsilon_{j}^{\prime}\Psi_{j,1}^{\prime}\sum_{w\in\mathcal{W}}\mathbb{E}_{T}\left[q_{j}(T)\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\right].
\]

So, using the definition of $\sigma_{j,\psi}^{2}(\cdot)$, we get

\begin{align*}
\Upsilon&=\sup_{q\in\mathcal{Q}}\left(\sum_{j=1}^{n}\mathbb{E}\left[\sigma_{j,\psi}^{2}(X_{j}^{\prime})\left(\sum_{w\in\mathcal{W}}\mathbb{E}_{T}\left[q_{j}(T)\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\right]\right)^{2}\right]\right)^{1/2}\\
&\overset{(a)}{=}\sup_{\{p_{j}\},q\in\mathcal{Q}}\sum_{j=1}^{n}\mathbb{E}\left[p_{j}(X_{j}^{\prime})\sigma_{j,\psi}(X_{j}^{\prime})\sum_{w\in\mathcal{W}}\mathbb{E}_{T}\left[q_{j}(T)\sum_{i=1,i\neq j}^{n}\varepsilon_{i}\Phi_{i,1}w_{i,j}(X_{i},X_{j}^{\prime})\right]\right]\\
&\overset{(b)}{=}\sup_{\{p_{j}\}}\mathbb{E}\left[\sup_{w\in\mathcal{W}}\sum_{i=1}^{n}\varepsilon_{i}\Phi_{i,1}\left(\sum_{j=1,j\neq i}^{n}\int p_{j}(x)\sigma_{j,\psi}(x)w_{i,j}(X_{i},x)P_{X_{j}}(dx)\right)\right].
\end{align*}

Equality (a) above follows from the duality argument (32), while equality (b) follows from the argument given in the proof of Lemma 8.

B.2 Auxiliary Lemmas Used in Theorem 3

The following lemma is a rewording of Lemma 2 of Adamczak, (2006). For this result, define the class of functions

\[
\mathcal{Q}:=\left\{q(\cdot)=(q_{1}(\cdot),q_{2}(\cdot),\ldots):\,\sum_{k=1}^{\infty}|q_{k}(T)|=1\quad\mbox{for all}\quad T\right\}.
\]

The domain of functions in $\mathcal{Q}$ is left out on purpose.

Lemma 8.

Suppose $\mathcal{F}:=\{(f_{1}^{k},\ldots,f_{n}^{k}):\,k\geq 1\}$ represents a countable class of vector functions. Define, for independent random variables $T,W_{1},\ldots,W_{n}$,

\[
S:=\mathbb{E}_{T}\left[\sup_{k\geq 1}\left|\sum_{j=1}^{n}f_{j}^{k}(W_{j},T)\right|\right],
\]

where $\mathbb{E}_{T}[\cdot]$ represents the expectation only with respect to $T$. (So, $S$ is a random variable that depends on $W_{1},\ldots,W_{n}$.) If $\mathbb{E}_{W}[f_{j}^{k}(W_{j},T)]=0$ for a.e. $T$, then there exists a constant $K>0$ such that for all $p\geq 1$,

\begin{align*}
\mathbb{E}\left[S^{p}\right]&\leq K^{p}(\mathbb{E}[S])^{p}+K^{p}p^{p/2}\sup_{q\in\mathcal{Q}}\left(\sum_{j=1}^{n}\mathbb{E}\left[\left(\sum_{k=1}^{\infty}\mathbb{E}_{T}[f_{j}^{k}(W_{j},T)q_{k}(T)]\right)^{2}\right]\right)^{p/2}\\
&\qquad+K^{p}p^{p}\mathbb{E}\left[\max_{1\leq j\leq n}\left(\mathbb{E}_{T}\left[\sup_{k\geq 1}|f_{j}^{k}(W_{j},T)|\right]\right)^{p}\right].
\end{align*}
Proof.

Following the proof of Lemma 2 of Adamczak, (2006), we get

\[
S=\sup_{q\in\mathcal{Q}}\,\left|\sum_{k=1}^{\infty}\mathbb{E}_{T}\left[q_{k}(T)\sum_{j=1}^{n}f_{j}^{k}(W_{j},T)\right]\right|.
\]

To see this, define $\widehat{q}(\cdot)=(\widehat{q}_{1}(\cdot),\ldots)\in\mathcal{Q}$ such that

\[
\widehat{q}_{\widehat{k}}(T)=\mbox{sign}\left(\sum_{j=1}^{n}f_{j}^{\widehat{k}}(W_{j},T)\right),\quad\mbox{and}\quad\widehat{q}_{k}(T)=0,\quad\mbox{for $k\neq\widehat{k}$.}
\]

Here $\widehat{k}$ (which may depend on $T$) satisfies

\[
\left|\sum_{j=1}^{n}f_{j}^{\widehat{k}}(W_{j},T)\right|=\sup_{k\geq 1}\left|\sum_{j=1}^{n}f_{j}^{k}(W_{j},T)\right|.
\]
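The selector construction above can be checked numerically when $T$ takes finitely many values and the class is finite. The sketch below is only an illustration under those assumptions: `a[k][t]` is a hypothetical stand-in for $\sum_{j}f_{j}^{k}(W_{j},t)$ with $W_{1},\ldots,W_{n}$ held fixed, `prob[t]` is the law of $T$, and all function names are invented for this example.

```python
import random

def expected_sup(a, prob):
    """E_T[max_k |a[k][t]|] when T takes the value t with probability prob[t]."""
    return sum(p * max(abs(row[t]) for row in a) for t, p in enumerate(prob))

def selector(a, n_t):
    """q_hat[k][t] = sign(a[k][t]) at the first maximising k, and 0 elsewhere."""
    q = [[0.0] * n_t for _ in a]
    for t in range(n_t):
        k_hat = max(range(len(a)), key=lambda k: abs(a[k][t]))
        q[k_hat][t] = 1.0 if a[k_hat][t] >= 0 else -1.0
    return q

def paired_value(q, a, prob):
    """sum_k E_T[q_k(T) a_k(T)] for a weight function q."""
    return sum(p * sum(q[k][t] * a[k][t] for k in range(len(a)))
               for t, p in enumerate(prob))

def random_q(n_k, n_t, rng):
    """A random q with sum_k |q[k][t]| = 1 for every t."""
    q = [[rng.uniform(-1.0, 1.0) for _ in range(n_t)] for _ in range(n_k)]
    for t in range(n_t):
        total = sum(abs(q[k][t]) for k in range(n_k))
        for k in range(n_k):
            q[k][t] /= total
    return q

prob = [0.25, 0.25, 0.5]                                   # hypothetical law of T
a = [[1.0, -3.0, 0.5], [-2.0, 1.0, 2.0], [0.5, 2.5, -1.0]]  # rows k, columns t
lhs = expected_sup(a, prob)                        # E_T[sup_k |.|]
rhs = paired_value(selector(a, len(prob)), a, prob)  # value at the selector q_hat
rng = random.Random(1)
worst = max(abs(paired_value(random_q(3, 3, rng), a, prob)) for _ in range(1000))
```

The selector attains the expected supremum, and no other element of the (finite analogue of the) class $\mathcal{Q}$ gives a larger absolute value, which is the content of the displayed identity for $S$.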

Therefore,

\[
S=\sup_{q\in\mathcal{Q}}\left|\sum_{j=1}^{n}\left(\sum_{k=1}^{\infty}\mathbb{E}_{T}\left[q_{k}(T)f_{j}^{k}(W_{j},T)\right]\right)\right|=:\sup_{q\in\mathcal{Q}}\left|\sum_{j=1}^{n}g_{q,j}(W_{j})\right|.
\]

The right-hand side above is the supremum of a mean zero empirical process and so, by Proposition 3.1 of Giné et al., (2000), we get

\[
\mathbb{E}\left[S^{p}\right]\leq K^{p}(\mathbb{E}[S])^{p}+K^{p}p^{p/2}\sup_{q\in\mathcal{Q}}\left(\sum_{j=1}^{n}\mathbb{E}\left[g_{q,j}^{2}(W_{j})\right]\right)^{p/2}+K^{p}p^{p}\mathbb{E}\left[\max_{1\leq j\leq n}\sup_{q\in\mathcal{Q}}\left|g_{q,j}(W_{j})\right|^{p}\right].
\]

From the definition of $\mathcal{Q}$, we get

\[
\sup_{q\in\mathcal{Q}}|g_{q,j}(W_{j})|=\sup_{q\in\mathcal{Q}}\left|\sum_{k=1}^{\infty}\mathbb{E}_{T}\left[q_{k}(T)f_{j}^{k}(W_{j},T)\right]\right|=\mathbb{E}_{T}\left[\sup_{k\geq 1}|f_{j}^{k}(W_{j},T)|\right].
\]

Thus,

\[
\mathbb{E}\left[\max_{1\leq j\leq n}\sup_{q\in\mathcal{Q}}\left|g_{q,j}(W_{j})\right|^{p}\right]=\mathbb{E}\left[\max_{1\leq j\leq n}\left(\mathbb{E}_{T}\left[\sup_{k\geq 1}|f_{j}^{k}(W_{j},T)|\right]\right)^{p}\right].
\]

So, the result follows. ∎

Appendix C Proof of the Maximal Inequality (Theorem 4)

The following moment bound for Rademacher chaos is used in the proof. See Corollary 3.2.6 of de la Peña and Giné, (1999) and the inequalities leading to (4.1.20) on page 167 of de la Peña and Giné, (1999).

Lemma 9.

Let $Z$ be a homogeneous Rademacher chaos of degree 2, that is,

\[
Z:=\sum_{1\leq i\neq j\leq n}\epsilon_{i}\epsilon_{j}a_{i,j},
\]

for some constants $a_{i,j},1\leq i\neq j\leq n$. Then $\left\lVert Z\right\rVert_{\psi_{1}}\leq 4es_{n}$, where

\[
s_{n}^{2}:=\sum_{1\leq i\neq j\leq n}a_{i,j}^{2}.
\]
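For small $n$ the bound of Lemma 9 can be checked exactly by enumerating all $2^{n}$ sign vectors. The sketch below is an illustration, not part of the proof: it assumes the usual Orlicz-norm definition $\left\lVert Z\right\rVert_{\psi_{1}}=\inf\{c>0:\mathbb{E}\exp(|Z|/c)\leq 2\}$, solves for it by bisection, and uses hypothetical helper names (`chaos_values`, `psi1_norm`, `s_n`) and hypothetical coefficient matrices.

```python
import itertools
import math

def chaos_values(a):
    """All values of Z = sum_{i != j} eps_i eps_j a[i][j] over sign vectors eps."""
    n = len(a)
    values = []
    for eps in itertools.product((-1, 1), repeat=n):
        values.append(sum(eps[i] * eps[j] * a[i][j]
                          for i in range(n) for j in range(n) if i != j))
    return values

def psi1_norm(values, rel_tol=1e-12):
    """Smallest c with mean(exp(|z|/c)) <= 2 over equally likely values (bisection)."""
    m = max(abs(z) for z in values)
    if m == 0:
        return 0.0

    def mean_exp(c):
        return sum(math.exp(abs(z) / c) for z in values) / len(values)

    # At hi = m / log 2 every term is at most 2; lo is far too small for small n.
    lo, hi = m / 700.0, m / math.log(2.0)
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if mean_exp(mid) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi

def s_n(a):
    """Square root of the sum of squared off-diagonal coefficients."""
    n = len(a)
    return math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))

a2 = [[0, 1], [1, 0]]                      # Z = 2 eps_1 eps_2, so |Z| = 2 always
a3 = [[0, 1, 2], [1, 0, -1], [2, -1, 0]]   # hypothetical symmetric coefficients
norm2 = psi1_norm(chaos_values(a2))
norm3 = psi1_norm(chaos_values(a3))
```

In the two-variable case $|Z|$ is constant, so the norm is available in closed form as $2/\log 2$, which gives a direct check of the bisection routine before comparing `norm3` with `4 * math.e * s_n(a3)`.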
Proof of Theorem 4.

As before, let $\mathcal{X}_{n}:=\{X_{1},X_{2},\ldots,X_{n}\}$. Also, let

\[
Z_{\epsilon}(f):=\left|\frac{1}{\sqrt{n(n-1)}}\sum_{1\leq i\neq j\leq n}\epsilon_{i}\epsilon_{j}f_{i,j}(X_{i},X_{j})\right|.
\]

By Lemma 9, we get conditional on $\mathcal{X}_{n}$,

\[
\left\lVert Z_{\epsilon}(f)\right\rVert_{\psi_{1}|\mathcal{X}_{n}}\leq 4e\left(\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}f^{2}_{i,j}(X_{i},X_{j})\right)^{1/2}\leq 4e\left\lVert f\right\rVert_{2,P_{n}},
\]

where

\[
\left\lVert f\right\rVert_{2,P_{n}}:=\left(\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}f^{2}_{i,j}(X_{i},X_{j})\right)^{1/2},
\]

and define the discrete probability measure $P_{n}$ with support $\{X_{1},\ldots,X_{n}\}$ as

\[
P_{n}(\{X_{i}\}):=\frac{1}{n}\quad\mbox{for}\quad 1\leq i\leq n.
\]

Now, following the proof of Theorem 5.1.4 of de la Peña and Giné, (1999),

\[
\left\lVert\max_{f\in\mathcal{F}}Z_{\epsilon}(f)\right\rVert_{\psi_{1}|\mathcal{X}_{n}}\leq C\int_{0}^{\Delta_{n}}\log N\left(\varepsilon,\mathcal{F},\left\lVert\cdot\right\rVert_{2,P_{n}}\right)d\varepsilon,
\]

where

\[
\Delta_{n}:=\sup_{f\in\mathcal{F}}\left\lVert f\right\rVert_{2,P_{n}}.
\]

Therefore,

\[
\left\lVert\max_{f\in\mathcal{F}}Z_{\epsilon}(f)\right\rVert_{\psi_{1}|\mathcal{X}_{n}}\leq C\left\lVert F\right\rVert_{2,P_{n}}J_{2}\left(\frac{\Delta_{n}}{\left\lVert F\right\rVert_{2,P_{n}}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right).
\]

This implies that

\begin{equation}
\mathbb{E}\left[\sup_{f\in\mathcal{F}}Z_{\epsilon}(f)\right]\leq C\mathbb{E}\left[\left\lVert F\right\rVert_{2,P_{n}}J_{2}\left(\frac{\Delta_{n}}{\left\lVert F\right\rVert_{2,P_{n}}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right)\right].
\tag{42}
\end{equation}

Using concavity of $(x,y)\mapsto\sqrt{y}J_{2}(\sqrt{x/y},\mathcal{F},\left\lVert\cdot\right\rVert_{2})$ as in the proof of Theorem 2.1 of van der Vaart and Wellner, (2011), it follows that

\begin{equation}
\mathbb{E}\left[\left\lVert F\right\rVert_{2,P_{n}}J_{2}\left(\frac{\Delta_{n}}{\left\lVert F\right\rVert_{2,P_{n}}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right)\right]\leq\left\lVert F\right\rVert_{2,P}J_{2}\left(\frac{\sqrt{\mathbb{E}\left[\Delta_{n}^{2}\right]}}{\left\lVert F\right\rVert_{2,P}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right),
\tag{43}
\end{equation}

where

\[
\left\lVert F\right\rVert_{2,P}^{2}:=\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[F^{2}_{i,j}(X_{i},X_{j})\right].
\]

At this point the proof of Theorem 5.1 of Chen and Kato, (2020) uses Hoeffding averaging to bound $\mathbb{E}\left[\Delta_{n}^{2}\right]$, which proves the result for i.i.d. random variables $X_{i}$. To allow for non-identically distributed random variables $X_{i},1\leq i\leq n$, we bound $\mathbb{E}[\Delta_{n}^{2}]$ in terms of $J_{2}$ on the right-hand side of (43). This is similar to the proof of Theorem 2.1 of van der Vaart and Wellner, (2011). To bound $\mathbb{E}\left[\Delta_{n}^{2}\right]$, define for $f\in\mathcal{F}$,

\begin{align*}
W_{n}^{(1)}(f)&:=\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\left\{f^{2}_{i,j}(X_{i},X_{j})-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})|X_{i}\right]\right.\right.\\
&\qquad\qquad\left.\left.\vphantom{\sum_{1\leq i\neq j\leq n}}-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})|X_{j}\right]+\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right]\right\}\right|,\\
W_{n}^{(2)}(f)&:=\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\left\{\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})|X_{i}\right]-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right]\right\}\right|,\\
W_{n}^{(3)}(f)&:=\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\left\{\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})|X_{j}\right]-\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right]\right\}\right|.
\end{align*}

Using these definitions, we get

\begin{equation}
\Delta_{n}^{2}\leq\sup_{f\in\mathcal{F}}W_{n}^{(1)}(f)+\sup_{f\in\mathcal{F}}W_{n}^{(2)}(f)+\sup_{f\in\mathcal{F}}W_{n}^{(3)}(f)+\Sigma_{n}^{2}(\mathcal{F}),
\tag{44}
\end{equation}

where

\[
\Sigma_{n}^{2}(\mathcal{F}):=\sup_{f\in\mathcal{F}}\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}\mathbb{E}\left[f^{2}_{i,j}(X_{i},X_{j})\right].
\]
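The bound (44) rests on the Hoeffding decomposition of each $h=f^{2}_{i,j}$ into a canonical (degenerate) part, two first-order projections and the mean. The following sketch verifies that decomposition exactly, using rational arithmetic, for a single pair of independent discrete variables. The kernel values `f`, the marginals `px` and `py`, and the function name `hoeffding_parts` are hypothetical choices made only for illustration.

```python
from fractions import Fraction

def hoeffding_parts(h, px, py):
    """Split h(x, y) for independent X ~ px and Y ~ py into its Hoeffding pieces.

    Returns (mean, E[h | X = x], E[h | Y = y], canonical part), where the
    canonical part is h - E[h | X] - E[h | Y] + E[h].
    """
    nx, ny = len(px), len(py)
    mean = sum(px[x] * py[y] * h[x][y] for x in range(nx) for y in range(ny))
    given_x = [sum(py[y] * h[x][y] for y in range(ny)) for x in range(nx)]
    given_y = [sum(px[x] * h[x][y] for x in range(nx)) for y in range(ny)]
    canonical = [[h[x][y] - given_x[x] - given_y[y] + mean for y in range(ny)]
                 for x in range(nx)]
    return mean, given_x, given_y, canonical

px = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # hypothetical law of X
py = [Fraction(1, 4), Fraction(3, 4)]                   # hypothetical law of Y
f = [[1, -2], [0, 3], [2, 1]]                           # hypothetical kernel values
h = [[Fraction(v * v) for v in row] for row in f]       # h = f^2
mean, given_x, given_y, canonical = hoeffding_parts(h, px, py)

# Degeneracy: the canonical part integrates to zero in each argument separately.
row_means = [sum(py[y] * canonical[x][y] for y in range(2)) for x in range(3)]
col_means = [sum(px[x] * canonical[x][y] for x in range(3)) for y in range(2)]
# Reassemble h from the four pieces.
rebuilt = [[canonical[x][y] + (given_x[x] - mean) + (given_y[y] - mean) + mean
            for y in range(2)] for x in range(3)]
```

Summing the four pieces over all pairs $i\neq j$, normalising by $n(n-1)$ and taking absolute values and suprema gives the three $W_{n}^{(\ell)}$ terms together with $\Sigma_{n}^{2}(\mathcal{F})$.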

By decoupling and symmetrization, we obtain

𝔼[supfWn(1)(f)]C𝔼[supf1n(n1)|1ijnϵiϵjf2i,j(Xi,Xj)|].\mathbb{E}\left[\sup_{f\in\mathcal{F}}W_{n}^{(1)}(f)\right]\leq C\mathbb{E}\left[\sup_{f\in\mathcal{F}}\frac{1}{n(n-1)}\left|\sum_{1\leq i\neq j\leq n}\epsilon_{i}\epsilon_{j}f^{2}_{i,j}(X_{i},X_{j})\right|\right].

Set, for $f\in\mathcal{F}$,

$$
R_{\epsilon}(f):=\frac{1}{\sqrt{n(n-1)}}\sum_{1\leq i\neq j\leq n}\epsilon_{i}\epsilon_{j}f^{2}_{i,j}(X_{i},X_{j}).
$$

Again by Lemma 9 and using $|f_{i,j}(x,x^{\prime})+g_{i,j}(x,x^{\prime})|\leq 2R$ for all $f,g\in\mathcal{F}$ and $x,x^{\prime}\in\mathcal{X}$, we get

$$
\begin{aligned}
\left\lVert R_{\epsilon}(f)-R_{\epsilon}(g)\right\rVert_{\psi_{1}\mid\mathcal{X}_{n}} &\leq 8eR\left(\frac{1}{n(n-1)}\sum_{1\leq i\neq j\leq n}\left(f_{i,j}(X_{i},X_{j})-g_{i,j}(X_{i},X_{j})\right)^{2}\right)^{1/2}\\
&\leq 8eR\left\lVert f-g\right\rVert_{2,P_{n}}.
\end{aligned}
$$

Hence by following the first part of the proof, we get

$$
\mathbb{E}\left[\sup_{f\in\mathcal{F}}W_{n}^{(1)}(f)\right]\leq C\,\frac{R\left\lVert F\right\rVert_{2,P}}{n}\,J_{2}\left(\frac{\sqrt{\mathbb{E}\left[\Delta_{n}^{2}\right]}}{\left\lVert F\right\rVert_{2,P}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right). \tag{45}
$$

Substituting this in (44) after taking expectations and dividing by $\left\lVert F\right\rVert_{2,P}^{2}$, we obtain

$$
\frac{\left\lVert\Delta_{n}\right\rVert_{2}^{2}}{\left\lVert F\right\rVert_{2,P}^{2}}\leq CB^{2}_{n}J_{2}\left(\frac{\left\lVert\Delta_{n}\right\rVert_{2}}{\left\lVert F\right\rVert_{2,P}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right)+A^{2}_{n},
$$

where

$$
B^{2}_{n}:=\frac{R}{n\left\lVert F\right\rVert_{2,P}}\quad\text{and}\quad A^{2}_{n}:=\frac{\mathbb{E}\left[\sup_{f\in\mathcal{F}}W_{n}^{(2)}(f)\right]+\mathbb{E}\left[\sup_{f\in\mathcal{F}}W_{n}^{(3)}(f)\right]+\Sigma_{n}^{2}(\mathcal{F})}{\left\lVert F\right\rVert_{2,P}^{2}}.
$$

It follows that

$$
\frac{\left\lVert\Delta_{n}\right\rVert_{2}^{2}}{\left\lVert F\right\rVert_{2,P}^{2}}\leq Cb^{2}J_{2}\left(\frac{\left\lVert\Delta_{n}\right\rVert_{2}}{\left\lVert F\right\rVert_{2,P}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right)+a^{2},
$$

for any $a\geq A_{n}$ and $b\geq B_{n}$. Therefore, by Lemma 2.1 of van der Vaart and Wellner, (2011), it follows that for any such $a$ and $b$,

$$
J_{2}\left(\frac{\left\lVert\Delta_{n}\right\rVert_{2}}{\left\lVert F\right\rVert_{2,P}},\mathcal{F},\left\lVert\cdot\right\rVert_{2}\right)\leq CJ_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})\left[1+\frac{J_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})\,b^{2}}{a^{2}}\right].
$$
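For convenience, here is a sketch of how the lemma is applied, using its statement as we recall it: if $J$ is concave and nondecreasing with $J(0)=0$, and $z^{2}\leq A^{2}+B^{2}J(z^{r})$ for some $r\in(0,2)$ and $A,B>0$, then $J(z)\lesssim J(A)\left[1+J(A^{r})(B/A)^{2}\right]^{1/(2-r)}$. The symbols $z$, $J$, $A$, $B$, $r$ are the lemma's generic notation and are used only in this illustration.

```latex
% Matching the inequality in a and b above to the hypothesis of the lemma (r = 1):
\begin{align*}
z &= \frac{\lVert\Delta_n\rVert_2}{\lVert F\rVert_{2,P}}, \qquad
J(\cdot) = J_2(\cdot,\mathcal{F},\lVert\cdot\rVert_2), \qquad
A = a, \qquad B^2 = C b^2, \qquad r = 1,\\
J(z) &\lesssim J(A)\left[1 + J(A^{r})\,\frac{B^2}{A^2}\right]^{1/(2-r)}
      = J_2(a,\mathcal{F},\lVert\cdot\rVert_2)\left[1 + C\,\frac{J_2(a,\mathcal{F},\lVert\cdot\rVert_2)\,b^2}{a^2}\right].
\end{align*}
```

Since $r=1$, the exponent $1/(2-r)$ equals one. Assuming $C\geq 1$ without loss of generality, the constant inside the bracket can be absorbed into the leading constant, because $1+Cx\leq C(1+x)$ for $x\geq 0$.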

Substituting this in (43) and (42), we get

$$
\mathbb{E}\left[\sup_{f\in\mathcal{F}}Z_{\epsilon}(f)\right]\leq C\left\lVert F\right\rVert_{2,P}J_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})\left[1+\frac{J_{2}(a,\mathcal{F},\left\lVert\cdot\right\rVert_{2})\,b^{2}}{a^{2}}\right],
$$

for any $a\geq A_{n}$ and $b\geq B_{n}$. The result is proved. ∎

References

  • Adamczak, (2006) Adamczak, R. (2006). Moment inequalities for U-statistics. Ann. Probab., 34(6):2288–2314.
  • Arcones and Giné, (1993) Arcones, M. A. and Giné, E. (1993). Limit theorems for U-processes. Ann. Probab., 21(3):1494–1542.
  • Bakhshizadeh, (2023) Bakhshizadeh, M. (2023). Exponential tail bounds and large deviation principle for heavy-tailed U-statistics. arXiv preprint arXiv:2301.11563.
  • Bang and Robins, (2005) Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4):962–973.
  • Bentkus and Götze, (1999) Bentkus, V. and Götze, F. (1999). Optimal bounds in non-Gaussian limit theorems for U-statistics. The Annals of Probability, 27(1):454–521.
  • Boucheron et al., (2005) Boucheron, S., Bousquet, O., Lugosi, G., and Massart, P. (2005). Moment inequalities for functions of independent random variables. Ann. Probab., 33(2):514–560.
  • Boucheron et al., (2013) Boucheron, S., Lugosi, G., and Massart, P. (2013). Concentration inequalities. Oxford University Press, Oxford. A nonasymptotic theory of independence, With a foreword by Michel Ledoux.
  • Chakrabortty and Kuchibhotla, (2018) Chakrabortty, A. and Kuchibhotla, A. K. (2018). Tail bounds for canonical U-statistics and U-processes with unbounded kernels. Technical report, Working paper, Wharton School, University of Pennsylvania.
  • Chen and Kato, (2020) Chen, X. and Kato, K. (2020). Jackknife multiplier bootstrap: finite sample approximations to the U-process supremum with applications. Probability Theory and Related Fields, 176:1097–1163.
  • Clémençon et al., (2008) Clémençon, S., Lugosi, G., and Vayatis, N. (2008). Ranking and empirical minimization of U-statistics. Ann. Statist., 36(2):844–874.
  • de la Peña, (1992) de la Peña, V. H. (1992). Decoupling and Khintchine’s inequalities for U-statistics. Ann. Probab., 20(4):1877–1892.
  • de la Peña and Giné, (1999) de la Peña, V. H. and Giné, E. (1999). Decoupling. Probability and its Applications (New York). Springer-Verlag, New York. From dependence to independence, Randomly stopped processes. U-statistics and processes. Martingales and beyond.
  • Delyon and Portier, (2016) Delyon, B. and Portier, F. (2016). Integral approximation by kernel smoothing. Bernoulli, 22(4):2177–2208.
  • Dirksen, (2015) Dirksen, S. (2015). Tail bounds via generic chaining. Electron. J. Probab., 20:no. 53, 29.
  • Giné et al., (2000) Giné, E., Latała, R., and Zinn, J. (2000). Exponential and moment inequalities for U-statistics. In High dimensional probability, II (Seattle, WA, 1999), volume 47 of Progr. Probab., pages 13–38. Birkhäuser Boston, Boston, MA.
  • Giné and Nickl, (2008) Giné, E. and Nickl, R. (2008). A simple adaptive estimator of the integrated square of a density. Bernoulli, 14(1):47–61.
  • Giné and Nickl, (2016) Giné, E. and Nickl, R. (2016). Mathematical foundations of infinite-dimensional statistical models. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, New York.
  • Hall and Marron, (1987) Hall, P. and Marron, J. S. (1987). Estimation of integrated squared density derivatives. Statist. Probab. Lett., 6(2):109–115.
  • He et al., (2024) He, Y., Wang, K., and Zhu, Y. (2024). Sparse Hanson-Wright inequalities with applications. arXiv preprint arXiv:2410.15652.
  • Houdré and Reynaud-Bouret, (2003) Houdré, C. and Reynaud-Bouret, P. (2003). Exponential inequalities, with constants, for U-statistics of order two. In Stochastic inequalities and applications, volume 56 of Progr. Probab., pages 55–69. Birkhäuser, Basel.
  • Kim, (2020) Kim, I. (2020). Statistical Theory and Methods for Comparing Distributions. PhD thesis, Carnegie Mellon University.
  • Kolesko and Latała, (2015) Kolesko, K. and Latała, R. (2015). Moment estimates for chaoses generated by symmetric random variables with logarithmically convex tails. Statist. Probab. Lett., 107:210–214.
  • Kuchibhotla and Chakrabortty, (2022) Kuchibhotla, A. K. and Chakrabortty, A. (2022). Moving beyond sub-Gaussianity in high-dimensional statistics: Applications in covariance estimation and linear regression. Information and Inference: A Journal of the IMA, 11(4):1389–1456.
  • Ledoux and Talagrand, (1991) Ledoux, M. and Talagrand, M. (1991). Probability in Banach spaces, volume 23 of Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)]. Springer-Verlag, Berlin. Isoperimetry and processes.
  • Liu et al., (2021) Liu, L., Mukherjee, R., Robins, J. M., and Tchetgen, E. T. (2021). Adaptive estimation of nonparametric functionals. Journal of Machine Learning Research, 22(99):1–66.
  • Major, (2005) Major, P. (2005). Tail behaviour of multiple random integrals and U-statistics. Probab. Surv., 2:448–505.
  • Major, (2013) Major, P. (2013). On the estimation of multiple random integrals and U-statistics, volume 2079 of Lecture Notes in Mathematics. Springer, Heidelberg.
  • Newey and Ruud, (2005) Newey, W. K. and Ruud, P. A. (2005). Density weighted linear least squares. In Identification and inference for econometric models, pages 554–573. Cambridge Univ. Press, Cambridge.
  • Nolan and Pollard, (1987) Nolan, D. and Pollard, D. (1987). U-processes: rates of convergence. Ann. Statist., 15(2):780–799.
  • Nolan et al., (1988) Nolan, D., Pollard, D., et al. (1988). Functional limit theorems for U-processes. The Annals of Probability, 16(3):1291–1298.
  • Robins et al., (2008) Robins, J., Li, L., Tchetgen, E., van der Vaart, A., et al. (2008). Higher order influence functions and minimax estimation of nonlinear functionals. In Probability and statistics: essays in honor of David A. Freedman, volume 2, pages 335–422. Institute of Mathematical Statistics.
  • Robins et al., (2017) Robins, J. M., Li, L., Mukherjee, R., Tchetgen, E. T., and van der Vaart, A. (2017). Minimax estimation of a functional on a structured high-dimensional model. The Annals of Statistics, 45(5):1951–1987.
  • Robins et al., (2016) Robins, J. M., Li, L., Tchetgen, E. T., and van der Vaart, A. (2016). Asymptotic normality of quadratic estimators. Stochastic Processes and their Applications, 126(12):3733–3759.
  • Robins et al., (1994) Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866.
  • Rudelson and Vershynin, (2013) Rudelson, M. and Vershynin, R. (2013). Hanson-Wright inequality and sub-Gaussian concentration. Electron. Commun. Probab., 18:no. 82, 9.
  • Serfling, (1980) Serfling, R. J. (1980). Approximation theorems of mathematical statistics. John Wiley & Sons, Inc., New York. Wiley Series in Probability and Mathematical Statistics.
  • Spokoiny and Zhilova, (2013) Spokoiny, V. and Zhilova, M. (2013). Sharp deviation bounds for quadratic forms. Math. Methods Statist., 22(2):100–113.
  • Talagrand, (2014) Talagrand, M. (2014). Upper and lower bounds for stochastic processes, volume 60 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics]. Springer, Heidelberg. Modern methods and classical problems.
  • van de Geer and Lederer, (2013) van de Geer, S. and Lederer, J. (2013). The Bernstein-Orlicz norm and deviation inequalities. Probab. Theory Related Fields, 157(1-2):225–250.
  • van der Vaart and Wellner, (2011) van der Vaart, A. and Wellner, J. A. (2011). A local maximal inequality under uniform entropy. Electron. J. Stat., 5:192–203.
  • van der Vaart and Wellner, (1996) van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York. With applications to statistics.