
A Berry-Esseen theorem for incomplete U-statistics with Bernoulli sampling

Dennis Leung, School of Mathematics and Statistics, University of Melbourne, Australia
Abstract.

There has been a resurgence of interest in the asymptotic normality of incomplete U-statistics that only sum over roughly as many kernel evaluations as there are data samples, owing to their computational efficiency and their usefulness in quantifying the uncertainty of ensemble-based predictions. In this paper, we focus on the normal convergence of one such construction, the incomplete U-statistic with Bernoulli sampling, based on a raw sample of size $n$ and a computational budget $N$. Under minimal moment assumptions on the kernel, we offer accompanying Berry-Esseen bounds of the natural rate $1/\sqrt{\min(N,n)}$ that characterize the accuracy of the normal approximation when $n\asymp N$, i.e. when $n$ and $N$ are of the same order in the sense that $n/N$ is bounded above and below by positive constants. Our key techniques include Stein’s method specialized for the so-called Studentized nonlinear statistics, and an exponential lower tail bound for non-negative kernel U-statistics.

Key words and phrases:
Incomplete U-statistics, Berry-Esseen theorems, exponential tail bounds, self-normalized limit theory, Stein’s method
2000 Mathematics Subject Classification:
62E17

1. Introduction

Suppose $X_{1},\dots,X_{n}\in M$ are independent and identically distributed random variables taking values in a given measurable space $(M,\mathcal{M})$. Hoeffding (1948) first introduced U-statistics, which, for a given degree $m\in\mathbb{N}$, are statistics of the form

$$U_{n}=\frac{1}{{n\choose m}}\sum_{(i_{1},\dots,i_{m})\in I_{n,m}}h(X_{i_{1}},\dots,X_{i_{m}}),\tag{1.1}$$

where $I_{n,m}\equiv\{(i_{1},\dots,i_{m}):1\leq i_{1}<\cdots<i_{m}\leq n\}$ and $h:M^{m}\rightarrow\mathbb{R}$ is a given symmetric function in $m$ arguments, i.e.

$$h(x_{1},\dots,x_{m})=h(x_{\pi_{1}},\dots,x_{\pi_{m}})$$

for any permutation $(\pi_{1},\dots,\pi_{m})$ of the indices $(1,\dots,m)$. The U-statistic $U_{n}$ generalizes the sample mean and encompasses many other examples, such as the sample variance; see Koroljuk and Borovskich (1994) for a comprehensive treatment of the classical theory of U-statistics. To simplify notation, for each $m$-tuple ${\bf i}=(i_{1},\dots,i_{m})\in I_{n,m}$, we will henceforth adopt the shorthands ${\bf X}_{\bf i}\equiv(X_{i_{1}},\dots,X_{i_{m}})$ and $h({\bf X}_{\bf i})\equiv h(X_{i_{1}},\dots,X_{i_{m}})$, and use $\mathcal{X}\equiv(X_{1},\dots,X_{n})$ to represent all the raw data entering $U_{n}$. Throughout this paper, we will assume

$$\mathbb{E}[h(X_{1},\dots,X_{m})]=0\text{ and }\sigma_{h}^{2}\equiv\mathbb{E}[h^{2}(X_{1},\dots,X_{m})]<\infty,\tag{1.2}$$

and that the single-argument projection function $g:M\rightarrow\mathbb{R}$ associated with the kernel $h$, defined by $g(x_{1})\equiv\mathbb{E}[h(x_{1},X_{1},\dots,X_{m-1})]$, is non-degenerate, i.e.

$$\sigma_{g}^{2}\equiv\text{var}[g(X)]>0;\tag{1.3}$$

note that (1.3) necessarily implies $\sigma_{h}^{2}>0$, since $\sigma_{g}^{2}\leq\mathbb{E}[g^{2}(X)]\leq\sigma_{h}^{2}$ by Jensen’s inequality.

$U_{n}$ is complete, in the sense that it is an average of the kernel $h$ evaluated at all ${n\choose m}$ possible subsamples of size $m$ from $\{1,\dots,n\}$. Blom (1976) suggested alternative constructions dubbed “incomplete U-statistics”, which only average over an appropriate subset of $I_{n,m}$, with the intuition that the strong dependence among the summands in (1.1) should allow one to use only a well-chosen subset of them without losing too much statistical efficiency. In this work, we study one particular such construction, known as the incomplete U-statistic with Bernoulli sampling, which has the form

$$U_{n,N}^{\prime}=\frac{1}{\hat{N}}\sum_{{\bf i}\in I_{n,m}}Z_{{\bf i}}h({\bf X}_{\bf i}),\tag{1.4}$$

where the $Z_{\bf i}$’s are ${n\choose m}$ independent and identically distributed Bernoulli random variables with success probability

$$p\equiv N\Big/{n\choose m}\text{ for a positive integer }N<{n\choose m},\tag{1.5}$$

i.e., $P(Z_{\bf i}=1)=p=N{n\choose m}^{-1}$ and $P(Z_{\bf i}=0)=1-p=1-N{n\choose m}^{-1}$, and $\hat{N}$ is defined as

$$\hat{N}\equiv\sum_{{\bf i}\in I_{n,m}}Z_{\bf i}.$$

We call $N$ the computational budget because it is the expected number of kernel evaluations, $\mathbb{E}[\hat{N}]=N$, that $U_{n,N}^{\prime}$ winds up summing over. In the sequel, we will also use $\mathcal{Z}\equiv(Z_{\bf i})_{{\bf i}\in I_{n,m}}$ to represent all these Bernoulli samplers, which are assumed to be independent of the raw data $\mathcal{X}$.
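A minimal Python sketch of (1.4)-(1.5), assuming $n$ is small enough that all ${n\choose m}$ index tuples can be enumerated. The function name `incomplete_u_bernoulli` and the example kernel $h(x,y)=x+y$ are illustrative choices, not taken from the paper. One Bernoulli sampler is drawn per tuple, exactly as in (1.4), and the function returns $U_{n,N}^{\prime}$ together with $\hat{N}$.

```python
import itertools
import math
import random


def incomplete_u_bernoulli(xs, kernel, m, N, rng):
    """Hypothetical helper: one draw of U'_{n,N} from (1.4).

    Each index tuple in I_{n,m} is kept independently with probability
    p = N / C(n, m), as in (1.5); the kept kernel values are averaged over
    N_hat, the random number of kept tuples. Returns (U'_{n,N}, N_hat).
    """
    n = len(xs)
    p = N / math.comb(n, m)
    if not 0 < p < 1:
        raise ValueError("need 0 < N < C(n, m)")
    total, n_hat = 0.0, 0
    for idx in itertools.combinations(range(n), m):  # enumerates I_{n,m}
        if rng.random() < p:                          # Z_i = 1 with probability p
            total += kernel(*(xs[i] for i in idx))
            n_hat += 1
    if n_hat == 0:                                    # (1.4) is undefined when N_hat = 0
        return float("nan"), 0
    return total / n_hat, n_hat


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(20)]
u_prime, n_hat = incomplete_u_bernoulli(xs, lambda a, b: a + b, m=2, N=20, rng=rng)
```

Looping over every tuple costs time of order ${n\choose m}$ even though only about $N$ kernel evaluations are made; the loop mirrors (1.4) literally rather than aiming for efficiency.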

It is an established result (Janson, 1984, Corollary 1) that, if

$$N,n\longrightarrow\infty\text{ such that the ratio }\alpha\equiv n/N\text{ tends to a positive constant,}\tag{1.6}$$

the following weak convergence holds for any degree $m\geq 2$ under (1.2)-(1.3):

$$\frac{\sqrt{n}U_{n,N}^{\prime}}{\sigma}\longrightarrow_{d}\mathcal{N}(0,1),\tag{1.7}$$

where

$$\sigma^{2}\equiv m^{2}\sigma_{g}^{2}+\alpha\sigma_{h}^{2}.$$

Note that the asymptotic regime in (1.6) captures a common computationally efficient use case, where the number of kernel evaluations summed over by $U_{n,N}^{\prime}$ is of the same order as the raw sample size $n$, whereas the complete U-statistic $U_{n}$ involves many more summands, of the order $O(n^{m})$. The asymptotic normality of incomplete U-statistics such as (1.7) has attracted renewed interest in the machine learning community since Mentch and Hooker (2016) pointed out that it provides a computationally feasible framework for quantifying the uncertainty of ensemble predictions; see also Zhou et al. (2021), Peng et al. (2022), DiCiccio and Romano (2022), Xu et al. (2022) for some related recent works. While the convergence result (1.7) itself has been known for a long time, a Berry-Esseen (B-E) bound of a natural rate that characterizes the associated normal approximation accuracy is thus far missing from the literature.

In this paper, we fill this gap. Capitalizing on the recent framework of Stein’s method for the so-called “Studentized nonlinear statistics” (Leung et al., 2024) and an exponential lower tail bound for non-negative U-statistics (Leung and Shao, 2023, Lemma 4.3), we establish B-E bounds for $U_{n,N}^{\prime}$ under minimal moment assumptions on the kernel $h$, which have the rate $1/\sqrt{\min(n,N)}$ when $n$ and $N$ are of the same order; we believe this is the optimal rate one could possibly achieve. In passing, we mention that the statistical community has also been developing B-E bounds for high-dimensional incomplete U-statistics (Song et al., 2019, Chen and Kato, 2019); our focus here is instead to provide a bound of the best rate for the one-dimensional convergence result in (1.7).

1.1. Other notation and convention

$P(\cdot)$ and $\mathbb{E}[\cdot]$ denote generic probability and expectation operators. For any $q\geq 1$, we use $\|Y\|_{q}=(\mathbb{E}|Y|^{q})^{1/q}$ to denote the $L_{q}$-norm of any real-valued random variable $Y$. Given $k\in\{1,\dots,n\}$ and a function $t:M^{k}\rightarrow\mathbb{R}$ in $k$ arguments in $M$, we use $\mathbb{E}[t]$ and $\|t\|_{q}$ as shorthands for $\mathbb{E}[t(X_{1},\dots,X_{k})]$ and $\|t(X_{1},\dots,X_{k})\|_{q}$, respectively. $\phi(\cdot)$ and $\Phi(\cdot)$ are respectively the standard normal density and distribution functions, with $\bar{\Phi}(\cdot)\equiv 1-\Phi(\cdot)$; $\mathcal{N}(\mu,v^{2})$ denotes the univariate normal distribution with mean $\mu$ and variance $v^{2}$. The indicator function is denoted by $I(\cdot)$. Generally speaking, $C$ denotes an unspecified absolute positive constant, where “absolute” means that it is universal for the statements it appears in and does not depend on other quantities; its value may vary across occurrences throughout the paper.

2. The main result

We first review the nature of the normal convergence in (1.7). To facilitate the discussion, we define the U-statistics

$$U_{h^{2}}\equiv{n\choose m}^{-1}\sum_{{\bf i}\in I_{n,m}}h^{2}({\bf X}_{\bf i})\text{ and }U_{|h|^{3}}\equiv{n\choose m}^{-1}\sum_{{\bf i}\in I_{n,m}}|h|^{3}({\bf X}_{\bf i}),\tag{2.1}$$

whose non-negative kernels $h^{2}$ and $|h|^{3}$ are induced by taking powers of the absolute value of the original kernel $h$. One can easily check that, as in the recent works on B-E bounds for incomplete U-statistics (Chen and Kato, 2019, Song et al., 2019, Peng et al., 2022, Sturma et al., 2022), $U_{n,N}^{\prime}$ admits the decomposition

$$\frac{\sqrt{n}U_{n,N}^{\prime}}{\sigma}=\frac{N}{\hat{N}}\bigg(\frac{\sqrt{(1-p)n}B_{n}}{\sigma}+\frac{\sqrt{n}U_{n}}{\sigma}\bigg)\text{ when }\hat{N}\neq 0,\tag{2.2}$$

where

$$B_{n}\equiv\frac{1}{N}\sum_{{\bf i}\in I_{n,m}}\frac{Z_{\bf i}-p}{\sqrt{1-p}}h({\bf X}_{\bf i}).\tag{2.3}$$
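Dividing both sides of (2.2) by $\sqrt{n}/\sigma$ gives the pathwise identity $U_{n,N}^{\prime}=\frac{N}{\hat{N}}(\sqrt{1-p}B_{n}+U_{n})$, which follows from writing $Z_{\bf i}=(Z_{\bf i}-p)+p$ and using $p{n\choose m}=N$. The Python sketch below checks this identity on a single draw; the helper name and the kernel $h(x,y)=x+y+xy$ are illustrative assumptions, and the kernel need not be centered for the algebra to hold.

```python
import itertools
import math
import random


def decomposition_terms(xs, kernel, m, N, rng):
    """Hypothetical helper: return (U'_{n,N}, B_n, U_n, N_hat, p) for one draw of the samplers."""
    n = len(xs)
    n_tuples = math.comb(n, m)
    p = N / n_tuples
    h_vals = [kernel(*(xs[i] for i in idx)) for idx in itertools.combinations(range(n), m)]
    z_vals = [1 if rng.random() < p else 0 for _ in h_vals]          # Bernoulli(p) samplers
    n_hat = sum(z_vals)                                               # assumed nonzero here
    u_prime = sum(z * h for z, h in zip(z_vals, h_vals)) / n_hat      # (1.4)
    u_n = sum(h_vals) / n_tuples                                      # complete U-statistic (1.1)
    b_n = sum((z - p) * h for z, h in zip(z_vals, h_vals)) / (N * math.sqrt(1 - p))  # (2.3)
    return u_prime, b_n, u_n, n_hat, p


def kernel(a, b):
    return a + b + a * b


rng = random.Random(1)
N = 15
xs = [rng.gauss(0.0, 1.0) for _ in range(15)]
u_prime, b_n, u_n, n_hat, p = decomposition_terms(xs, kernel, 2, N, rng)
rhs = N / n_hat * (math.sqrt(1 - p) * b_n + u_n)  # (2.2) divided by sqrt(n) / sigma
print(u_prime, rhs)  # the two numbers agree up to floating-point rounding
```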

Hence, up to multiplicative scaling factors, $U^{\prime}_{n,N}$ can be understood as a sum of the complete U-statistic $U_{n}$ and the term $B_{n}$. Under the assumptions (1.2) and (1.3), it is well known that as $n$ tends to infinity, $\sqrt{n}U_{n}$ converges weakly to $\mathcal{N}(0,m^{2}\sigma_{g}^{2})$ (Koroljuk and Borovskich, 1994, Ch. 4). On the other hand, conditionally on the raw data $\mathcal{X}$, $\sqrt{n}B_{n}$ should be approximately distributed as $\mathcal{N}(0,\alpha U_{h^{2}})$, being a sum of independent mean-zero terms driven solely by the randomness of $\mathcal{Z}$, with conditional variance $\frac{np}{N^{2}}\sum_{{\bf i}\in I_{n,m}}h^{2}({\bf X}_{\bf i})=\alpha U_{h^{2}}$; since

$$U_{h^{2}}\text{ is a consistent estimator for }\sigma_{h}^{2},\tag{2.4}$$

marginally, $\sqrt{n}B_{n}$ is expected to be approximately distributed as $\mathcal{N}(0,\alpha\sigma_{h}^{2})$ and also approximately independent of $U_{n}$. Altogether, in light of $N/\hat{N}\approx 1$, $\frac{\sqrt{n}U_{n,N}^{\prime}}{\sigma}$, being approximately a sum of two independent normal random variables, is expected to be asymptotically normal with mean zero and variance

$$\frac{\alpha(1-p)\sigma_{h}^{2}}{\sigma^{2}}+\frac{m^{2}\sigma_{g}^{2}}{\sigma^{2}}\approx 1$$

under the regime (1.6), because $p\equiv N/{n\choose m}\longrightarrow 0$. This is precisely the weak convergence stated in (1.7).
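The heuristic above, and (1.7) itself, can be probed with a small Monte Carlo experiment. The Python sketch below assumes the illustrative kernel $h(x,y)=x+y$ with standard normal data, for which $m=2$, $g(x)=x$, $\sigma_{g}^{2}=1$ and $\sigma_{h}^{2}=2$, so that $\sigma^{2}=4+2\alpha$. At moderate $n$ and $N$, the empirical variance of the standardized statistic need not equal $1$ exactly, e.g. because of the factor $1-p$ and the fluctuation of $N/\hat{N}$.

```python
import itertools
import math
import random
import statistics


def standardized_draw(n, N, rng):
    """One draw of sqrt(n) U'_{n,N} / sigma for the illustrative kernel h(x, y) = x + y."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    p = N / math.comb(n, 2)
    total, n_hat = 0.0, 0
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            total += xs[i] + xs[j]
            n_hat += 1
    sigma = math.sqrt(4.0 + 2.0 * n / N)  # sigma^2 = m^2 sigma_g^2 + alpha sigma_h^2
    # N_hat = 0 has negligible probability for the sizes used below.
    return math.sqrt(n) * (total / n_hat) / sigma


rng = random.Random(2024)
draws = [standardized_draw(30, 30, rng) for _ in range(1500)]
print(statistics.mean(draws), statistics.variance(draws))  # compare with the N(0, 1) limit in (1.7)
```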

An extension of the preceding discussion also sheds light on what one should expect from a B-E bound that assesses the accuracy of the normal approximation in (1.7). Firstly, the complete U-statistic $U_{n}$ has a well-established B-E bound (Chen and Shao, 2007, Peng et al., 2022) that reads

$$\sup_{z\in\mathbb{R}}\bigg|P\bigg(\frac{\sqrt{n}U_{n}}{m\sigma_{g}}\leq z\bigg)-\Phi(z)\bigg|\leq(1+\sqrt{2})\bigg\{\frac{m}{n}\bigg(\frac{\sigma_{h}^{2}}{m\sigma_{g}^{2}}-1\bigg)\bigg\}^{1/2}+\frac{6.1\mathbb{E}[|g|^{3}]}{\sqrt{n}\sigma_{g}^{3}}.\tag{2.5}$$
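For a concrete feel of (2.5), its right-hand side can be evaluated directly. The Python sketch below (the function name is ours) takes $\sigma_{h}^{2}$, $\sigma_{g}^{2}$ and $\mathbb{E}[|g|^{3}]$ as inputs. The example assumes $h(x,y)=x+y$ with standard normal data, where $\sigma_{h}^{2}=m\sigma_{g}^{2}=2$, so the first term vanishes, and $\mathbb{E}[|g|^{3}]=\mathbb{E}[|X|^{3}]=2\sqrt{2/\pi}$.

```python
import math


def complete_u_be_rhs(n, m, sigma_h2, sigma_g2, abs_g3):
    """Right-hand side of the complete-U-statistic Berry-Esseen bound (2.5)."""
    # sigma_h^2 >= m sigma_g^2 always holds; max() only guards against rounding.
    first = (1 + math.sqrt(2)) * math.sqrt(max(0.0, (m / n) * (sigma_h2 / (m * sigma_g2) - 1)))
    second = 6.1 * abs_g3 / (math.sqrt(n) * sigma_g2 ** 1.5)
    return first + second


abs_g3 = 2 * math.sqrt(2 / math.pi)  # E|X|^3 for X ~ N(0, 1)
for n in (100, 10_000):
    print(n, complete_u_be_rhs(n, 2, 2.0, 1.0, abs_g3))
```

For this particular kernel $U_{n}=\frac{2}{n}\sum_{i}X_{i}$ is an exact sample mean, which is why only the third-moment term survives.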

Moreover, ignoring for the time being the technical difficulty that may arise from division by zero when $U_{h^{2}}=0$, one can write

$$\frac{\sqrt{N}B_{n}}{\sqrt{U_{h^{2}}}}=\sum_{{\bf i}\in I_{n,m}}\frac{(Z_{\bf i}-p)h({\bf X}_{\bf i})}{\sqrt{U_{h^{2}}N(1-p)}};$$

so $\sqrt{N}B_{n}/\sqrt{U_{h^{2}}}$ is a sum of the conditionally independent random variables $(Z_{\bf i}-p)h({\bf X}_{\bf i})/\sqrt{U_{h^{2}}N(1-p)}$ indexed by ${\bf i}\in I_{n,m}$, each having conditional mean

$$\mathbb{E}\bigg[\frac{(Z_{\bf i}-p)h({\bf X}_{\bf i})}{\sqrt{U_{h^{2}}N(1-p)}}\mid\mathcal{X}\bigg]=0,$$

and the conditional second and third absolute moments

$$\begin{aligned}&\mathbb{E}\bigg[\frac{(Z_{\bf i}-p)^{2}h^{2}({\bf X}_{\bf i})}{U_{h^{2}}N(1-p)}\mid\mathcal{X}\bigg]=\frac{h^{2}({\bf X}_{\bf i})}{U_{h^{2}}{n\choose m}}\text{ and }\\&\mathbb{E}\bigg[\frac{|Z_{\bf i}-p|^{3}|h|^{3}({\bf X}_{\bf i})}{U_{h^{2}}^{3/2}N^{3/2}(1-p)^{3/2}}\mid\mathcal{X}\bigg]=\frac{(1-2p+2p^{2})|h|^{3}({\bf X}_{\bf i})}{{n\choose m}U_{h^{2}}^{3/2}\sqrt{N(1-p)}},\end{aligned}$$

where we have used the calculations $\mathbb{E}[(Z_{\bf i}-p)^{2}]=p(1-p)$ and $\mathbb{E}[|Z_{\bf i}-p|^{3}]=p(1-p)^{3}+(1-p)p^{3}=p(1-p)(1-2p+2p^{2})$. Since the second moment calculation above implies that

$$\mathbb{E}[NB_{n}^{2}/U_{h^{2}}\mid\mathcal{X}]=\bigg(U_{h^{2}}{n\choose m}\bigg)^{-1}\sum_{{\bf i}\in I_{n,m}}h^{2}({\bf X}_{\bf i})=1,$$

the classical B-E bound (Berry, 1941, Esseen, 1942, Shevtsova, 2010) for a sum of independent random variables whose variances add up to one implies that

$$\sup_{z\in\mathbb{R}}\bigg|P\bigg(\frac{\sqrt{N}B_{n}}{\sqrt{U_{h^{2}}}}\leq z\mid\mathcal{X}\bigg)-\Phi(z)\bigg|\leq\frac{0.56U_{|h|^{3}}(1-2p+2p^{2})}{U_{h^{2}}^{3/2}\sqrt{N(1-p)}}.\tag{2.6}$$
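The two Bernoulli moment identities behind the conditional moments above can be verified exactly with rational arithmetic, and the right-hand side of (2.6) can be computed from a given collection of kernel values. In the Python sketch below, the function names are ours, and `h_vals` is assumed to list $h({\bf X}_{\bf i})$ over all ${\bf i}\in I_{n,m}$.

```python
import math
from fractions import Fraction


def centered_bernoulli_moments(p):
    """Exact E[(Z - p)^2] and E[|Z - p|^3] for Z ~ Bernoulli(p), enumerating Z in {0, 1}."""
    second = (1 - p) * (0 - p) ** 2 + p * (1 - p) ** 2
    third = (1 - p) * abs(0 - p) ** 3 + p * abs(1 - p) ** 3
    return second, third


def conditional_be_rhs(h_vals, N, p):
    """Right-hand side of (2.6), with U_{h^2} and U_{|h|^3} computed from h_vals."""
    u_h2 = sum(h * h for h in h_vals) / len(h_vals)
    u_h3 = sum(abs(h) ** 3 for h in h_vals) / len(h_vals)
    return 0.56 * u_h3 * (1 - 2 * p + 2 * p * p) / (u_h2 ** 1.5 * math.sqrt(N * (1 - p)))


for p in (Fraction(1, 7), Fraction(3, 10), Fraction(99, 100)):
    second, third = centered_bernoulli_moments(p)
    assert second == p * (1 - p)
    assert third == p * (1 - p) * (1 - 2 * p + 2 * p ** 2)
```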

As discussed, since the limiting distribution of $U_{n,N}^{\prime}$ under the regime (1.6) can be viewed as that of a sum of the two approximately independent and normal random variables $U_{n}$ and $B_{n}$, the two B-E bounds in (2.5) and (2.6) intuitively suggest that a “correct” B-E bound for the marginal distribution function of $U_{n,N}^{\prime}$ should be of the rate $1/\sqrt{\min(N,n)}$ when $n\asymp N$, i.e. when there exists a constant $C>0$ such that

$$\max\Big(\frac{n}{N},\frac{N}{n}\Big)<C,\tag{2.7}$$

as a non-asymptotic analog of the regime (1.6).

This paper materializes the latter intuition into a Berry-Esseen theorem, which can be stated in terms of projection functions of the non-negative kernel $h^{2}$ of the U-statistic $U_{h^{2}}$ in (2.1). For each $r=1,\dots,m$, first define the function $\tilde{\Psi}_{r}:M^{r}\rightarrow\mathbb{R}$ in $r$ arguments by

$$\tilde{\Psi}_{r}(x_{1},\dots,x_{r})\equiv\mathbb{E}[h^{2}(x_{1},\dots,x_{r},X_{r+1},\dots,X_{m})]-\sigma_{h}^{2},$$

which is centered in the sense that $\mathbb{E}[\tilde{\Psi}_{r}]\equiv\mathbb{E}[\tilde{\Psi}_{r}(X_{1},\dots,X_{r})]=0$; specifically for $\tilde{\Psi}_{1}$, we also let $\Psi_{1}:M\rightarrow\mathbb{R}_{\geq 0}$ be its “uncentered version”

$$\Psi_{1}(x_{1})\equiv\mathbb{E}[h^{2}(x_{1},X_{2},\dots,X_{m})],\tag{2.8}$$

which is clearly non-negative. Next, we recursively define the Hoeffding projection kernels $\pi_{r}(h^{2}):M^{r}\rightarrow\mathbb{R}$ of $h^{2}$ as

$$\pi_{1}(h^{2})(x_{1})\equiv\tilde{\Psi}_{1}(x_{1})\text{ for }r=1$$

and

$$\pi_{r}(h^{2})(x_{1},\dots,x_{r})\equiv\tilde{\Psi}_{r}(x_{1},\dots,x_{r})-\sum_{s=1}^{r-1}\sum_{1\leq j_{1}<\cdots<j_{s}\leq r}\pi_{s}(h^{2})(x_{j_{1}},\dots,x_{j_{s}})\quad\text{ for }r=2,\dots,m.$$
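For $m=2$ the recursion reduces to $\pi_{1}(h^{2})=\tilde{\Psi}_{1}$ and $\pi_{2}(h^{2})(x_{1},x_{2})=h^{2}(x_{1},x_{2})-\sigma_{h}^{2}-\pi_{1}(h^{2})(x_{1})-\pi_{1}(h^{2})(x_{2})$, and these kernels satisfy the exact identity $U_{h^{2}}-\sigma_{h}^{2}=\frac{2}{n}\sum_{i}\pi_{1}(h^{2})(X_{i})+{n\choose 2}^{-1}\sum_{i<j}\pi_{2}(h^{2})(X_{i},X_{j})$. The Python sketch below computes the projections by enumeration when $X$ has a finite distribution and checks this identity; the support, probabilities, kernel, sample and function name are all illustrative assumptions.

```python
import itertools
import math


def hoeffding_m2(kernel, support, probs):
    """Return (sigma_h^2, pi_1, pi_2) for the kernel h^2 when m = 2 and X has a finite law."""
    def h2(x, y):
        return kernel(x, y) ** 2

    law = list(zip(support, probs))
    sigma_h2 = sum(px * py * h2(x, y) for x, px in law for y, py in law)

    def pi1(x):  # tilde Psi_1(x) = E[h^2(x, X)] - sigma_h^2
        return sum(py * h2(x, y) for y, py in law) - sigma_h2

    def pi2(x, y):  # tilde Psi_2(x, y) minus both first-order projections
        return h2(x, y) - sigma_h2 - pi1(x) - pi1(y)

    return sigma_h2, pi1, pi2


def kernel(x, y):  # an illustrative symmetric kernel
    return x + y - x * y


support, probs = [-1.0, 0.0, 2.0], [0.2, 0.5, 0.3]
sigma_h2, pi1, pi2 = hoeffding_m2(kernel, support, probs)
xs = [-1.0, 2.0, 0.0, 2.0, -1.0, 0.0, 0.0, 2.0]  # an illustrative sample
n = len(xs)
pairs = list(itertools.combinations(range(n), 2))
lhs = sum(kernel(xs[i], xs[j]) ** 2 for i, j in pairs) / math.comb(n, 2) - sigma_h2
rhs = (2 / n) * sum(pi1(x) for x in xs) + sum(pi2(xs[i], xs[j]) for i, j in pairs) / math.comb(n, 2)
print(lhs, rhs)  # equal up to floating-point rounding
```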

These projection functions give rise to the well-known Hoeffding decomposition (De la Pena and Giné, 1999, p. 137) of the U-statistic $U_{h^{2}}$, which, upon re-centering and rescaling by $\sigma_{h}^{2}$, can be written as

$$\frac{U_{h^{2}}-\sigma_{h}^{2}}{\sigma_{h}^{2}}=\sum_{i=1}^{n}\eta_{i}-m+\sum_{r=2}^{m}{m\choose r}{n\choose r}^{-1}\sum_{1\leq j_{1}<\dots<j_{r}\leq n}\frac{\pi_{r}(h^{2})(X_{j_{1}},\dots,X_{j_{r}})}{\sigma_{h}^{2}},\tag{2.9}$$

where

$$\eta_{i}\equiv\frac{m\Psi_{1}(X_{i})}{n\sigma_{h}^{2}}\text{ for }i=1,\dots,n\tag{2.10}$$

are the leading terms in the decomposition. Note that each $\eta_{i}$ has mean $\mathbb{E}[\eta_{i}]=m/n$, which gives rise to the re-centering of $\sum_{i=1}^{n}\eta_{i}$ by $m$ in (2.9). We can now state our main B-E bound for $U_{n,N}^{\prime}$; recall that $\mathbb{E}[|\pi_{r}(h^{2})|^{3/2}]\equiv\mathbb{E}[|\pi_{r}(h^{2})(X_{1},\dots,X_{r})|^{3/2}]$ and $\mathbb{E}[\Psi_{1}^{3/2}]\equiv\mathbb{E}[(\Psi_{1}(X_{1}))^{3/2}]$, as per our convention in Section 1.1.

Theorem 2.1 (Berry-Esseen theorem for incomplete U-statistics with Bernoulli sampling).

Let $U_{n,N}^{\prime}$ be constructed as in (1.4) with a symmetric kernel $h$ on $M^{m}$, and let $[n/m]$ be the greatest integer less than or equal to $n/m$. Suppose that (1.2)-(1.3), $\mathbb{E}[|h|^{3}]<\infty$ and $2\leq m<n/2$ hold. Then, for an absolute constant $C>0$, one has the B-E bound

$$\begin{aligned}\sup_{z\in\mathbb{R}}\bigg|P\bigg(\frac{\sqrt{n}U_{n,N}^{\prime}}{\sigma}\leq z\bigg)-\Phi(z)\bigg|\leq C\Bigg\{&\bigg(\frac{m}{n}\bigg(\frac{\sigma_{h}^{2}}{m\sigma_{g}^{2}}-1\bigg)\bigg)^{1/2}+\frac{\mathbb{E}[|g|^{3}]}{\sqrt{n}\sigma_{g}^{3}}\\&+\frac{\mathbb{E}[|h|^{3}](1-2p+2p^{2})}{\sigma_{h}^{3}(N(1-p))^{1/2}}+\exp\bigg(\frac{-[n/m]\sigma_{h}^{6}}{24(\mathbb{E}[|h|^{3}])^{2}}\bigg)+\mathcal{R}_{n,m,N,h}\Bigg\},\end{aligned}$$

where $\mathcal{R}_{n,m,N,h}$ is a non-negative term that can be bounded as

$$\begin{aligned}\mathcal{R}_{n,m,N,h}\leq{}&\frac{N}{n^{2}(1-p)}+\frac{\sqrt{m}}{\sqrt{n(1-p)}}+\bigg(1+\frac{\sqrt{mN}}{\sqrt{n(1-p)}}\bigg)\frac{m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{n^{1/2}\sigma_{h}^{3}}\\&+(m-1)^{1/3}\bigg(\sum_{r=2}^{m}{m\choose r}^{3/2}{n\choose r}^{-1/2}\frac{2^{r/2}\mathbb{E}[|\pi_{r}(h^{2})|^{3/2}]}{\sigma_{h}^{3}}\bigg)^{2/3}.\end{aligned}$$

If, in addition, $\mathbb{E}[h^{4}]<\infty$, then we have

$$\mathcal{R}_{n,m,N,h}\leq\frac{N}{n^{2}(1-p)}+\frac{\sqrt{m}}{\sqrt{n(1-p)}}+\frac{\sqrt{m\mathbb{E}[(h^{2}-\sigma_{h}^{2})^{2}]}}{\sigma_{h}^{2}\sqrt{n}}.$$

It is easy to see that each term on the right-hand side of our B-E bound in Theorem 2.1 decays no slower than the rate $1/\sqrt{\min(N,n)}$ as $n,N\longrightarrow\infty$ when $n\asymp N$ as in (2.7); in particular, since the expression

$$\bigg(\sum_{r=2}^{m}{m\choose r}^{3/2}{n\choose r}^{-1/2}\frac{2^{r/2}\mathbb{E}[|\pi_{r}(h^{2})|^{3/2}]}{\sigma_{h}^{3}}\bigg)^{2/3}\tag{2.11}$$

is of order $O(n^{-2/3})$ for fixed $m$ (the $r=2$ summand, with ${n\choose 2}^{-1/2}\asymp n^{-1}$, dominates), the term $\mathcal{R}_{n,m,N,h}$ decays at the rate $n^{-1/2}$ under (2.7), whether under the minimal third moment assumption $\mathbb{E}[|h|^{3}]<\infty$ or the stronger fourth moment assumption $\mathbb{E}[h^{4}]<\infty$. Our result can also be contrasted with Peng et al. (2022, Theorem 5), which, to the best of our knowledge, is the first attempt at proving a B-E bound for the convergence in (1.7). However, Peng et al. (2022) had to assume six finite moments on the kernel $h$, and their bound exhibits a slower rate than $1/\sqrt{\min(N,n)}$ under (2.7). To further appreciate Theorem 2.1, the various parts of our bound can be interpreted as follows:

  • $\big(\frac{m}{n}(\frac{\sigma_{h}^{2}}{m\sigma_{g}^{2}}-1)\big)^{1/2}+\frac{\mathbb{E}[|g|^{3}]}{\sqrt{n}\sigma_{g}^{3}}$ stems from the complete U-statistic B-E bound in (2.5), and accounts for the “part-complete-U-statistic” behavior of $U_{n,N}^{\prime}$ coming from $U_{n}$; recall the decomposition in (2.2).

  • $\frac{\mathbb{E}[|h|^{3}](1-2p+2p^{2})}{\sigma_{h}^{3}(N(1-p))^{1/2}}+\exp\big(\frac{-[n/m]\sigma_{h}^{6}}{24(\mathbb{E}[|h|^{3}])^{2}}\big)$ is analogous to the B-E bound in (2.6) for the conditional normal convergence of $\sqrt{N}B_{n}/\sqrt{U_{h^{2}}}$, and accounts for the “part-independent-sum” behavior of $U_{n,N}^{\prime}$ coming from $B_{n}$ as per the decomposition in (2.2). However, because Theorem 2.1 concerns the convergence of the marginal distribution of $U_{n,N}^{\prime}$, the data-dependent ratio $U_{|h|^{3}}/U_{h^{2}}^{3/2}$ in (2.6) is now replaced by $\mathbb{E}[|h|^{3}]/\sigma_{h}^{3}$, a ratio of population moments that can be roughly thought of as its “average”; note that $\mathbb{E}[U_{|h|^{3}}]=\mathbb{E}[|h|^{3}]$ and $\mathbb{E}[U_{h^{2}}]=\sigma_{h}^{2}$. The extra term $\exp\big(\frac{-[n/m]\sigma_{h}^{6}}{24(\mathbb{E}[|h|^{3}])^{2}}\big)$ reflects the fact that $U_{|h|^{3}}/U_{h^{2}}^{3/2}$ is unlikely to be very large, owing to the exponentially decaying lower tail of $U_{h^{2}}$ as a non-negative U-statistic; see (3.20) below.

    The factor $\frac{1-2p+2p^{2}}{(1-p)^{1/2}}$ should also be understood with some care, because $p$ depends on $N$ and $n$ as per (1.5). Indeed, as $N$ approaches ${n\choose m}$, $p$ approaches $1$, which makes $\frac{1-2p+2p^{2}}{(1-p)^{1/2}}$ blow up and the whole bound vacuous. However, our B-E bound in Theorem 2.1 is really geared towards $n$ and $N$ of the same order as in (2.7), in which case $\lim_{n\rightarrow\infty}\frac{1-2p+2p^{2}}{(1-p)^{1/2}}=1$ since $\lim_{n\rightarrow\infty}p=0$. In fact, it is well known that $U_{n,N}^{\prime}$ has limiting distributions different from the one in (1.7) when the relative growth rates of $n$ and $N$ differ from the prescription in (1.6) (Janson, 1984, Mentch and Hooker, 2016).

  • $\mathcal{R}_{n,m,N,h}$ stems from various “remainder terms” emerging in the proof of Theorem 2.1. For instance, one such remainder term is $D_{2}$ defined in (3.6) below, which corresponds to the error incurred by estimating $\sigma_{h}^{2}$ with $U_{h^{2}}$, as is implicit in (2.4). While the bound on $\mathcal{R}_{n,m,N,h}$ under the minimal third moment assumption $\mathbb{E}[|h|^{3}]<\infty$ appears to have a relatively complicated form, as exhibited by (2.11), it comes from standard moment inequalities for the centered U-statistic $U_{h^{2}}-\sigma_{h}^{2}$; see, for instance, Ferger (1996) and Koroljuk and Borovskich (1994, Chapter 2).
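To see how the factor $\frac{1-2p+2p^{2}}{(1-p)^{1/2}}$ behaves under (2.7), take, as an illustrative choice, $m=2$ and $N=n$, so that $p=n/{n\choose 2}=2/(n-1)$. The Python sketch below tabulates the factor, which approaches $1$ as $n$ grows, and also shows its blow-up as $p\to 1$.

```python
import math


def third_moment_factor(p):
    """The factor (1 - 2p + 2p^2) / (1 - p)^{1/2} appearing in Theorem 2.1, for 0 <= p < 1."""
    return (1 - 2 * p + 2 * p * p) / math.sqrt(1 - p)


for n in (50, 500, 5000):
    p = n / math.comb(n, 2)  # m = 2 and N = n, so p = 2 / (n - 1)
    print(n, round(p, 6), round(third_moment_factor(p), 6))

print(third_moment_factor(0.999))  # p close to 1: the factor is large
```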

The proof of Theorem 2.1 adopts a technique called variable censoring, recently introduced in Leung et al. (2024) as a refinement of the variable truncation technique for proving B-E bounds with Stein’s method. For a given real-valued random variable $Y$, its censored version with respect to two constants $a,b\in\mathbb{R}\cup\{-\infty,\infty\}$, where $a\leq b$, is defined as

$$\bar{Y}\equiv aI(Y<a)+YI(a\leq Y\leq b)+bI(Y>b),$$

i.e. once $Y$ falls outside the range $[a,b]$, its censored version $\bar{Y}$ equals either $a$ or $b$, depending on the side on which $Y$ exits; allowing $a$ and/or $b$ to be infinite means that $Y$ may be censored in only one direction, or not at all. This technique has the following simple but very useful properties for establishing bounds:

Property 2.2 (Properties of variable censoring).

Let $Y$ and $Z$ be any two real-valued random variables. The following facts hold:

  (i)

    Suppose, for some $a,b\in\mathbb{R}\cup\{-\infty,\infty\}$ with $a\leq b$,

    $$\bar{Y}\equiv aI(Y<a)+YI(a\leq Y\leq b)+bI(Y>b)$$

    and

    $$\bar{Z}\equiv aI(Z<a)+ZI(a\leq Z\leq b)+bI(Z>b);$$

    then it must be that $|\bar{Y}-\bar{Z}|\leq|Y-Z|$.

  (ii)

    If $Y$ is a non-negative random variable, then it must also be true that

    $$YI(0\leq Y\leq b)+bI(Y>b)\leq Y\text{ for any }b\in(0,\infty),$$

    i.e., the upper-censored version of $Y$ is never larger than $Y$ itself.
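Numerically, variable censoring is simply clipping. The Python sketch below (the name `censor` is ours) implements $\bar{Y}$ and spot-checks Property 2.2 on random inputs; the censored count $\hat{N}_{c}$ in (2.12) below corresponds to `censor(N_hat, N / 2, 3 * N / 2)`.

```python
import random


def censor(y, a=float("-inf"), b=float("inf")):
    """Censored version of y w.r.t. a <= b: a if y < a, b if y > b, and y otherwise."""
    if a > b:
        raise ValueError("censoring requires a <= b")
    return min(max(y, a), b)


rng = random.Random(7)
for _ in range(10_000):
    y, z = rng.uniform(-5, 5), rng.uniform(-5, 5)
    a = rng.uniform(-3, 1)
    b = a + rng.uniform(0, 4)
    # Property 2.2(i): censoring with a common pair (a, b) never increases distances.
    assert abs(censor(y, a, b) - censor(z, a, b)) <= abs(y - z)
    # Property 2.2(ii): upper-censoring a non-negative value never increases it.
    b_pos = rng.uniform(0.1, 4)
    assert censor(abs(y), b=b_pos) <= abs(y)
```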

We now explain the structure of the proof of Theorem 2.1. Throughout, we will use $\Omega$ to denote the entire underlying probability space. Moreover, we define the following censored version of $\hat{N}$,

$$\hat{N}_{c}\equiv\hat{N}I\bigg(\hat{N}\in\Big[\frac{N}{2},\frac{3N}{2}\Big]\bigg)+\frac{3N}{2}I\bigg(\hat{N}>\frac{3N}{2}\bigg)+\frac{N}{2}I\bigg(\hat{N}<\frac{N}{2}\bigg),\tag{2.12}$$

and the event

$$\mathcal{E}_{\mathcal{Z}}\equiv\bigg\{\hat{N}\in\Big[\frac{N}{2},\frac{3N}{2}\Big]\bigg\}\tag{2.13}$$

that only depends on the Bernoulli samplers in $\mathcal{Z}$; note that $\hat{N}_{c}=\hat{N}$ on $\mathcal{E}_{\mathcal{Z}}$. The first step of the proof is to establish, for any $z\in\mathbb{R}$, the bound

$$\begin{aligned}\bigg|P\bigg(\frac{\sqrt{n}U_{n,N}^{\prime}}{\sigma}\leq z\bigg)-\Phi(z)\bigg|\leq{}&\sup_{z^{\prime}\in\mathbb{R}}\bigg|P\bigg(\frac{\sqrt{n}U_{n}}{m\sigma_{g}}\leq z^{\prime}\bigg)-\Phi(z^{\prime})\bigg|+P(\Omega\backslash\mathcal{E}_{\mathcal{Z}})\\&+\big|P(T\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]\big|+\varepsilon_{z},\end{aligned}\tag{2.14}$$

where

$$T\equiv\frac{N^{3/2}B_{n}}{\hat{N}_{c}\sigma_{h}}+\bigg(\frac{N-\hat{N}_{c}}{\hat{N}_{c}}\bigg)\frac{\sqrt{N}U_{n}}{\sigma_{h}\sqrt{1-p}},\tag{2.15}$$

$$\mathfrak{z}_{\mathcal{X}}\equiv\frac{z\sigma}{\sigma_{h}\sqrt{\alpha(1-p)}}-\frac{\sqrt{N}U_{n}}{\sigma_{h}\sqrt{1-p}},\tag{2.16}$$

and

$$\varepsilon_{z}\equiv\bigg|\mathbb{E}\bigg[\Phi\bigg(\frac{\sigma z}{\sqrt{m^{2}\sigma_{g}^{2}+\alpha(1-p)\sigma_{h}^{2}}}\bigg)\bigg]-\Phi(z)\bigg|.\tag{2.17}$$
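Multiplying both sides of $\sqrt{n}U_{n,N}^{\prime}/\sigma\leq z$ by the positive constant $c\equiv\sigma/(\sigma_{h}\sqrt{\alpha(1-p)})$ shows that, on $\mathcal{E}_{\mathcal{Z}}$ (where $\hat{N}_{c}=\hat{N}$), $T-\mathfrak{z}_{\mathcal{X}}=c\,(\sqrt{n}U_{n,N}^{\prime}/\sigma-z)$, so the events $\{\sqrt{n}U_{n,N}^{\prime}/\sigma\leq z\}$ and $\{T\leq\mathfrak{z}_{\mathcal{X}}\}$ coincide there. The Python sketch below checks this identity on simulated draws, assuming the illustrative kernel $h(x,y)=x+y$ with standard normal data ($m=2$, $\sigma_{g}^{2}=1$, $\sigma_{h}^{2}=2$); the helper name is hypothetical.

```python
import itertools
import math
import random


def event_identity_check(n, N, z, rng):
    """Return (sqrt(n) U'/sigma - z, (T - frak_z) / c) for one draw, or None off E_Z.

    Illustrative setting: h(x, y) = x + y with X ~ N(0, 1), so m = 2,
    sigma_g^2 = 1 and sigma_h^2 = 2.
    """
    m, sigma_g2, sigma_h = 2, 1.0, math.sqrt(2.0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    n_tuples = math.comb(n, m)
    p, alpha = N / n_tuples, n / N
    sigma = math.sqrt(m**2 * sigma_g2 + alpha * sigma_h**2)
    h_vals = [xs[i] + xs[j] for i, j in itertools.combinations(range(n), m)]
    z_vals = [1 if rng.random() < p else 0 for _ in h_vals]
    n_hat = sum(z_vals)
    if not N / 2 <= n_hat <= 3 * N / 2:  # outside the event E_Z in (2.13)
        return None
    n_hat_c = n_hat                       # N_hat_c = N_hat on E_Z, see (2.12)
    u_prime = sum(zi * h for zi, h in zip(z_vals, h_vals)) / n_hat
    u_n = sum(h_vals) / n_tuples
    b_n = sum((zi - p) * h for zi, h in zip(z_vals, h_vals)) / (N * math.sqrt(1 - p))
    scale = sigma_h * math.sqrt(1 - p)
    c = sigma / (sigma_h * math.sqrt(alpha * (1 - p)))
    t_stat = (N**1.5 * b_n / (n_hat_c * sigma_h)
              + (N - n_hat_c) / n_hat_c * math.sqrt(N) * u_n / scale)  # (2.15)
    frak_z = z * c - math.sqrt(N) * u_n / scale                        # (2.16)
    return math.sqrt(n) * u_prime / sigma - z, (t_stat - frak_z) / c


rng = random.Random(3)
for z in (-1.5, 0.0, 0.7, 2.0):
    print(event_identity_check(30, 30, z, rng))  # pairs agree up to rounding (or None off E_Z)
```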

Following other existing works on B-E bounds for incomplete U-statistics, the bound (2.14) can be developed with iterative arguments similar to those pioneered by Chen and Kato (2019); it will be proved at the end of this section. The remaining steps then boil down to bounding each term on the right-hand side of (2.14). The first term can be readily handled by the B-E bound for complete U-statistics in (2.5), whereas the second term can be bounded by Bernstein’s inequality (van der Vaart and Wellner, 1996, Lemma 2.2.9, p. 102) as in

$$P(\Omega\backslash\mathcal{E}_{\mathcal{Z}})=P\Big(\Big|\frac{\hat{N}}{N}-1\Big|>\frac{1}{2}\Big)\leq 2\exp\Big(-\frac{3N}{28}\Big)\leq C\frac{\mathbb{E}[|h|^{3}](1-2p+2p^{2})}{\sigma_{h}^{3}(N(1-p))^{1/2}},\tag{2.18}$$

where the last inequality comes from the fact that

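$$\min_{p\in[0,1)}\frac{1-2p+2p^{2}}{(1-p)^{1/2}}=\sqrt{\frac{56\sqrt{7}-136}{27}}\approx 0.671,\tag{2.19}$$

attained at $p=(5-\sqrt{7})/6$, and $\mathbb{E}[|h|^{3}]\geq\sigma_{h}^{3}$ by Lyapunov’s inequality, so that the right-most side of (2.18) is at least a constant multiple of $1/\sqrt{N}$. The proof of Theorem 2.1 can then be completed by the following two lemmas on the other two terms in (2.14):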

Lemma 2.3 (B-E bound for $T$ with the random threshold $\mathfrak{z}_{\mathcal{X}}$).

Suppose the same assumptions as in Theorem 2.1 hold; in particular, $\mathbb{E}[|h|^{3}]<\infty$. Then

$$\sup_{z\in\mathbb{R}}\big|P(T\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]\big|\leq C\bigg[\frac{\mathbb{E}[|h|^{3}](1-2p+2p^{2})}{\sigma_{h}^{3}(N(1-p))^{1/2}}+\exp\Big(\frac{-[n/m]\sigma_{h}^{6}}{24(\mathbb{E}[|h|^{3}])^{2}}\Big)+\tilde{\mathcal{R}}_{n,m,N,h}\bigg],$$

where $\tilde{\mathcal{R}}_{n,m,N,h}$ is a non-negative term that can be bounded as

$$\begin{aligned}\tilde{\mathcal{R}}_{n,m,N,h}\leq{}&\frac{\sqrt{m}}{\sqrt{n(1-p)}}+\bigg(1+\frac{\sqrt{mN}}{\sqrt{n(1-p)}}\bigg)\frac{m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{n^{1/2}\sigma_{h}^{3}}\\&+(m-1)^{1/3}\bigg(\sum_{r=2}^{m}{m\choose r}^{3/2}{n\choose r}^{-1/2}\frac{2^{r/2}\mathbb{E}[|\pi_{r}(h^{2})|^{3/2}]}{\sigma_{h}^{3}}\bigg)^{2/3}.\end{aligned}$$

If, in addition, $\mathbb{E}[h^{4}]<\infty$, then we have

$$\tilde{\mathcal{R}}_{n,m,N,h}\leq\frac{\sqrt{m}}{\sqrt{n(1-p)}}+\frac{\sqrt{m\mathbb{E}[(h^{2}-\sigma_{h}^{2})^{2}]}}{\sigma_{h}^{2}\sqrt{n}}.$$

Lemma 2.4 (Bound for $\varepsilon_{z}$).

Under the same assumptions as in Theorem 2.1, there exists an absolute constant $C>0$ such that for all $z\in\mathbb{R}$,

$$\varepsilon_{z}\leq\frac{CN}{n^{2}(1-p)}.$$
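Because $\sigma^{2}-\{m^{2}\sigma_{g}^{2}+\alpha(1-p)\sigma_{h}^{2}\}=\alpha p\sigma_{h}^{2}=n\sigma_{h}^{2}/{n\choose m}$, the quantity $\varepsilon_{z}$ in (2.17) only measures a small mismatch of scale between two normal distribution functions. The Python sketch below approximates $\sup_{z}\varepsilon_{z}$ by a grid search over $z$ (so it can only under-estimate the supremum) and compares it with $N/\{n^{2}(1-p)\}$; the values $m=2$, $\sigma_{g}^{2}=1$, $\sigma_{h}^{2}=2$ and $N=n$ are illustrative.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_epsilon(n, N, m, sigma_g2, sigma_h2):
    """Grid approximation of sup_z eps_z from (2.17), over z in [-8, 8] with step 0.01."""
    p, alpha = N / math.comb(n, m), n / N
    sigma = math.sqrt(m**2 * sigma_g2 + alpha * sigma_h2)
    v = math.sqrt(m**2 * sigma_g2 + alpha * (1 - p) * sigma_h2)
    zs = (k / 100 for k in range(-800, 801))
    return max(abs(norm_cdf(sigma * z / v) - norm_cdf(z)) for z in zs)


for n in (50, 100, 200):
    p = n / math.comb(n, 2)  # N = n and m = 2
    print(n, sup_epsilon(n, n, 2, 1.0, 2.0), n / (n**2 * (1 - p)))
```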

Combining (2.5), (2.14) and (2.18) with Lemmas 2.3 and 2.4 immediately yields Theorem 2.1. The critical result is Lemma 2.3, the B-E bound for the term $T$ with the random threshold $\mathfrak{z}_{\mathcal{X}}$, which also depends on the data $\mathcal{X}$ via $U_{n}$, as seen in (2.16). Most of the rest of this paper is devoted to proving Lemma 2.3, while the straightforward proof of Lemma 2.4 is included in Appendix A. We note that in (2.15), $\big(\frac{N-\hat{N}_{c}}{\hat{N}_{c}}\big)\frac{\sqrt{N}U_{n}}{\sigma_{h}\sqrt{1-p}}$ is a lower-order term, while $\frac{N^{3/2}B_{n}}{\hat{N}_{c}\sigma_{h}}$ is the main term that drives the distribution of $T$; hence, it may be tempting to draw a parallel between Lemma 2.3 and the B-E bound (2.6) for the conditional distribution $P(\sqrt{N}B_{n}/\sqrt{U_{h^{2}}}\leq z\mid\mathcal{X})$. Nevertheless, while (2.6) is a direct corollary of the classical result of Berry (1941) and Esseen (1942), the normalizer $\hat{N}_{c}\sigma_{h}$ for $N^{3/2}B_{n}$ is not the latter’s conditional standard deviation $N\sqrt{U_{h^{2}}}$ given $\mathcal{X}$. Moreover, unlike in standard B-E problems, one has to handle a random threshold. These features make proving Lemma 2.3 challenging.

As it turns out, two ingredients that have only recently been formalized in the literature come in handy: the framework of proving B-E bounds for the so-called “Studentized nonlinear statistics” with Stein’s method (Leung et al., 2024), and an exponential lower tail bound for the U-statistic $U_{h^{2}}$ in (2.1). Analyzing the tail behavior of $U_{h^{2}}$ is an important aspect of the argument, as will become apparent in our proof of Lemma 2.3 in Section 3. For another technical comparison with Peng et al. (2022, Theorem 5), we note that the Chebyshev inequality used therein (Peng et al., 2022, p. 285) for controlling the tail of $U_{h^{2}}$ is too crude to produce a bound of the desired rate $1/\sqrt{\min(N,n)}$. In contrast, we exploit the non-negativity of $U_{h^{2}}$ and only control its lower tail, which decays exponentially in $n$ as per the recent result in Leung and Shao (2023, Lemma 4.3).
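Two elementary constants used in (2.18)-(2.19) can be double-checked numerically. The Python sketch below compares a grid minimum of $(1-2p+2p^{2})/(1-p)^{1/2}$ over $[0,1)$ with the closed form obtained by setting the derivative to zero (writing $q=1-p$, the stationarity condition is $6q^{2}-2q-1=0$), and recomputes the Bernstein exponent $\frac{t^{2}/2}{v+t/3}$ with variance proxy $v=N$, summand bound $1$ and deviation $t=N/2$, per unit of $N$. The argument for (2.18) only needs that minimum to be a positive absolute constant.

```python
import math
from fractions import Fraction


def factor(p):
    """(1 - 2p + 2p^2) / (1 - p)^{1/2} for 0 <= p < 1."""
    return (1 - 2 * p + 2 * p * p) / math.sqrt(1 - p)


# Closed form: with q = 1 - p, the derivative vanishes when 6q^2 - 2q - 1 = 0.
p_star = (5 - math.sqrt(7)) / 6
f_star = math.sqrt((56 * math.sqrt(7) - 136) / 27)
grid_min = min(factor(k / 100_000) for k in range(100_000))
print(p_star, f_star, grid_min)

# Bernstein exponent (t^2 / 2) / (v + t / 3) with v = N and t = N / 2, per unit of N.
exponent_per_N = (Fraction(1, 2) ** 2 / 2) / (1 + Fraction(1, 2) / 3)
print(exponent_per_N)  # 3/28
```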

Before ending this section, we establish (2.14) as promised:

Proof of (2.14).

Let $Y_{1}$ and $Y_{2}$ be two independent standard normal random variables that are also independent of the data $\mathcal{X}$. With the decomposition in (2.2), one can first write the inequality

$$\begin{aligned}P\bigg(\frac{\sqrt{n}U_{n,N}^{\prime}}{\sigma}\leq z\bigg)&\leq P\bigg(\frac{N\sqrt{n}B_{n}}{\hat{N}_{c}\sigma}+\frac{(N-\hat{N}_{c})\sqrt{n}U_{n}}{\hat{N}_{c}\sigma\sqrt{1-p}}\leq\frac{z}{\sqrt{1-p}}-\frac{\sqrt{n}U_{n}}{\sigma\sqrt{1-p}}\bigg)+P(\Omega\backslash\mathcal{E}_{\mathcal{Z}})\\&=P(T\leq\mathfrak{z}_{\mathcal{X}})+P(\Omega\backslash\mathcal{E}_{\mathcal{Z}}),\end{aligned}\tag{2.20}$$

where the inequality uses $\hat{N}_{c}=\hat{N}$ on $\mathcal{E}_{\mathcal{Z}}$ and $N/\hat{N}_{c}=1+(N-\hat{N}_{c})/\hat{N}_{c}$, and the equality follows by multiplying both sides of the inequality inside the probability by $\sigma/(\sigma_{h}\sqrt{\alpha})$ and recalling (2.15)-(2.16). The first term on the right-hand side of (2.20) can be further bounded as

$$P(T\leq\mathfrak{z}_{\mathcal{X}})\leq P(Y_{1}\leq\mathfrak{z}_{\mathcal{X}})+\big|P(T\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]\big|.\tag{2.21}$$

Continuing, we can bound the first term on the right hand side of (2.21) as

$$
\begin{aligned}
P(Y_1\leq\mathfrak{z}_{\mathcal{X}})
&=P\bigg(\frac{\sqrt{N}U_n}{\sigma_h\sqrt{1-p}}\leq\frac{z\sigma}{\sigma_h\sqrt{\alpha(1-p)}}-Y_1\bigg)\\
&=\mathbb{E}\bigg[P\bigg(\frac{\sqrt{n}U_n}{m\sigma_g}\leq\frac{z\sigma}{m\sigma_g}-\frac{\sigma_h\sqrt{\alpha(1-p)}}{m\sigma_g}Y_1\,\Big|\,Y_1\bigg)\bigg]\\
&\leq P\bigg(Y_2\leq\frac{z\sigma}{m\sigma_g}-\frac{\sigma_h\sqrt{\alpha(1-p)}}{m\sigma_g}Y_1\bigg)+\sup_{z'\in\mathbb{R}}\bigg|P\bigg(\frac{\sqrt{n}U_n}{m\sigma_g}\leq z'\bigg)-\Phi(z')\bigg|\\
&=\Phi\bigg(\frac{\sigma z}{\sqrt{m^2\sigma_g^2+\alpha(1-p)\sigma_h^2}}\bigg)+\sup_{z'\in\mathbb{R}}\bigg|P\bigg(\frac{\sqrt{n}U_n}{m\sigma_g}\leq z'\bigg)-\Phi(z')\bigg|\\
&\leq\Phi(z)+\sup_{z'\in\mathbb{R}}\bigg|P\bigg(\frac{\sqrt{n}U_n}{m\sigma_g}\leq z'\bigg)-\Phi(z')\bigg|+\varepsilon_z,
\end{aligned}
\tag{2.22}
$$

where $\varepsilon_z$ is as defined in (2.17); the last equality above uses the fact that $m\sigma_gY_2+\sigma_h\sqrt{\alpha(1-p)}\,Y_1$ is a normal random variable with mean 0 and variance

$$
m^2\sigma_g^2+\alpha(1-p)\sigma_h^2.
$$
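The distributional fact behind the last equality, namely $P(Y_2\leq a-bY_1)=\Phi(a/\sqrt{1+b^2})$ for independent standard normals $Y_1,Y_2$, can be sanity-checked numerically. The following minimal Python sketch conditions on $Y_1$ and integrates $\Phi(a-by)$ against the standard normal density with the trapezoidal rule; the numbers standing in for $m\sigma_g$, $\sigma_h\sqrt{\alpha(1-p)}$, $\sigma$ and $z$ are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def prob_y2_below_shifted_y1(a: float, b: float, grid: int = 4001, width: float = 10.0) -> float:
    """Approximate P(Y2 <= a - b*Y1) for independent standard normals Y1, Y2.

    Conditions on Y1 = y and integrates Phi(a - b*y) * phi(y) over [-width, width]
    with the trapezoidal rule; the omitted tails are negligible for width = 10.
    """
    step = 2 * width / (grid - 1)
    total = 0.0
    for k in range(grid):
        y = -width + k * step
        weight = 0.5 if k in (0, grid - 1) else 1.0
        total += weight * STD.cdf(a - b * y) * STD.pdf(y)
    return total * step


# Hypothetical stand-ins for m*sigma_g, sigma_h*sqrt(alpha*(1-p)), sigma and z.
m_sigma_g, s_h, sigma, z = 1.3, 0.7, 1.5, 0.4
lhs = prob_y2_below_shifted_y1(z * sigma / m_sigma_g, s_h / m_sigma_g)
rhs = STD.cdf(sigma * z / math.sqrt(m_sigma_g**2 + s_h**2))
print(f"{lhs:.10f} {rhs:.10f}")
```

Dividing $a=z\sigma/(m\sigma_g)$ by $\sqrt{1+b^2}$ with $b=\sigma_h\sqrt{\alpha(1-p)}/(m\sigma_g)$ recovers exactly the argument $\sigma z/\sqrt{m^2\sigma_g^2+\alpha(1-p)\sigma_h^2}$ appearing in the display.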

Combining (2.20)-(2.22), we have

$$
\begin{aligned}
P\bigg(\frac{\sqrt{n}U_{n,N}'}{\sigma}\leq z\bigg)\leq{}&\Phi(z)+\sup_{z'\in\mathbb{R}}\bigg|P\bigg(\frac{\sqrt{n}U_n}{m\sigma_g}\leq z'\bigg)-\Phi(z')\bigg|+\varepsilon_z+P(\Omega\backslash\mathcal{E}_{\mathcal{Z}})\\
&+\big|P(T\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]\big|.
\end{aligned}
\tag{2.23}
$$

By reverse arguments, in parallel to (2.20)-(2.22), we can get

$$
P\bigg(\frac{\sqrt{n}U_{n,N}'}{\sigma}\leq z\bigg)\geq P(T\leq\mathfrak{z}_{\mathcal{X}})-P(\Omega\backslash\mathcal{E}_{\mathcal{Z}}),
$$

$$
P(T\leq\mathfrak{z}_{\mathcal{X}})\geq P(Y_1\leq\mathfrak{z}_{\mathcal{X}})-\big|P(T\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]\big|
$$

and

$$
P(Y_1\leq\mathfrak{z}_{\mathcal{X}})\geq\Phi(z)-\sup_{z'\in\mathbb{R}}\bigg|P\bigg(\frac{\sqrt{n}U_n}{m\sigma_g}\leq z'\bigg)-\Phi(z')\bigg|-\varepsilon_z,
$$

which can be combined to give

$$
\begin{aligned}
P\bigg(\frac{\sqrt{n}U_{n,N}'}{\sigma}\leq z\bigg)\geq{}&\Phi(z)-\sup_{z'\in\mathbb{R}}\bigg|P\bigg(\frac{\sqrt{n}U_n}{m\sigma_g}\leq z'\bigg)-\Phi(z')\bigg|-\varepsilon_z-P(\Omega\backslash\mathcal{E}_{\mathcal{Z}})\\
&-\big|P(T\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]\big|.
\end{aligned}
\tag{2.24}
$$

Together, (2.23) and (2.24) render (2.14).

3. Proof of the critical Lemma 2.3

Now we start proving Lemma 2.3. Define the event

$$
\mathcal{E}_{1,\mathcal{X}}\equiv\bigg\{U_{h^2}>\frac{\sigma_h^2}{2}\bigg\}
\tag{3.1}
$$

that depends only on the raw data $\mathcal{X}$. For the rest of this paper, if $\mathcal{E}\subset\Omega$ is any event whose occurrence is fully determined by the raw data $\mathcal{X}$, with a slight abuse of notation we write $\mathcal{X}\in\mathcal{E}$ to indicate that the event $\mathcal{E}$ occurs. For instance, $\mathcal{X}\in\mathcal{E}_{1,\mathcal{X}}$ means that the realized value of $\mathcal{X}$ is such that $U_{h^2}>\sigma_h^2/2$. Analogously, $\mathcal{Z}\in\mathcal{E}_{\mathcal{Z}}$ means that the realized values of the Bernoulli samplers are such that $\hat{N}\in[\frac{N}{2},\frac{3N}{2}]$, for the event $\mathcal{E}_{\mathcal{Z}}$ defined in (2.13). Next, we censor various random quantities. Define the lower-censored version of $U_{h^2}$ as

$$
\bar{U}_{h^2}\equiv\frac{\sigma_h^2}{2}I\bigg(U_{h^2}\leq\frac{\sigma_h^2}{2}\bigg)+U_{h^2}I\bigg(U_{h^2}>\frac{\sigma_h^2}{2}\bigg),
\tag{3.2}
$$

from which we define

$$
\xi_{\mathbf{i}}\equiv\frac{(Z_{\mathbf{i}}-p)\,h(\mathbf{X}_{\mathbf{i}})}{(\bar{U}_{h^2}N(1-p))^{1/2}},
\tag{3.3}
$$

whose first three moments conditional on the raw data $\mathcal{X}$ can be computed as

$$
\mathbb{E}[\xi_{\mathbf{i}}\mid\mathcal{X}]=0,\quad\mathbb{E}[\xi_{\mathbf{i}}^2\mid\mathcal{X}]=\frac{h^2(\mathbf{X}_{\mathbf{i}})}{\bar{U}_{h^2}\binom{n}{m}}\quad\text{and}\quad\mathbb{E}[|\xi_{\mathbf{i}}|^3\mid\mathcal{X}]=\frac{(1-2p+2p^2)\,|h(\mathbf{X}_{\mathbf{i}})|^3}{\binom{n}{m}\bar{U}_{h^2}^{3/2}\sqrt{N(1-p)}}.
\tag{3.4}
$$
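The formulas in (3.4) reduce to exact Bernoulli computations, since conditionally on $\mathcal{X}$ the only randomness in $\xi_{\mathbf{i}}$ is $Z_{\mathbf{i}}$. The sketch below verifies them with exact rational arithmetic, assuming (as in the Bernoulli sampling scheme) that $p=N/\binom{n}{m}$; the numbers standing in for $\binom{n}{m}$, $N$, $h(\mathbf{X}_{\mathbf{i}})$ and $\bar{U}_{h^2}$ are hypothetical. Because the third moment involves a square root, its square is compared instead to stay within the rationals.

```python
from fractions import Fraction


def centered_bernoulli_moments(p: Fraction):
    """Exact (E[Z-p], E[(Z-p)^2], E[|Z-p|^3]) for Z ~ Bernoulli(p)."""
    outcomes = [(Fraction(1), p), (Fraction(0), 1 - p)]  # (value of Z, probability)
    first = sum(prob * (z - p) for z, prob in outcomes)
    second = sum(prob * (z - p) ** 2 for z, prob in outcomes)
    third_abs = sum(prob * abs(z - p) ** 3 for z, prob in outcomes)
    return first, second, third_abs


# Hypothetical values: M plays binom(n, m), N the computational budget, p = N / M.
M, N = 120, 30
p = Fraction(N, M)
h_val = Fraction(3, 2)       # stand-in for h(X_i)
u_bar = Fraction(5, 4)       # stand-in for the censored statistic bar U_{h^2}
scale = u_bar * N * (1 - p)  # xi_i = (Z_i - p) * h_val / sqrt(scale)

first, second, third_abs = centered_bernoulli_moments(p)
xi_mean = first * h_val  # the factor 1/sqrt(scale) is irrelevant because this is 0
xi_second = second * h_val**2 / scale
xi_third_sq = (third_abs * abs(h_val) ** 3) ** 2 / scale**3  # square of E[|xi_i|^3 | X]

# Right-hand sides of (3.4); the third one is squared as well.
rhs_second = h_val**2 / (u_bar * M)
rhs_third_sq = ((1 - 2 * p + 2 * p**2) * abs(h_val) ** 3) ** 2 / (M**2 * u_bar**3 * N * (1 - p))
print(xi_mean, xi_second == rhs_second, xi_third_sq == rhs_third_sq)
```

The key identity is $\mathbb{E}|Z-p|^3=p(1-p)^3+(1-p)p^3=p(1-p)(1-2p+2p^2)$, which produces the factor $1-2p+2p^2$ that recurs throughout the bounds below.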

With these, by letting

$$
D_1\equiv\frac{N-\hat{N}_c}{\hat{N}_c}\sum_{\mathbf{i}\in I_{n,m}}\xi_{\mathbf{i}}+\bigg(\frac{N-\hat{N}_c}{\hat{N}_c}\bigg)\frac{\sqrt{N}\,U_n}{\sqrt{\bar{U}_{h^2}(1-p)}},
\tag{3.5}
$$

and

$$
D_2\equiv-\frac{\sigma_h^2}{\bar{U}_{h^2}}\bigg(\frac{U_{h^2}}{\sigma_h^2}-1\bigg),
\tag{3.6}
$$

we will define the random variable

$$
T_{SN}\equiv\frac{\sum_{\mathbf{i}\in I_{n,m}}\xi_{\mathbf{i}}+D_1}{\sqrt{1+D_2}},
\tag{3.7}
$$

where “SN” stands for “Studentized Nonlinear”; the reason for this name will soon become clear. We also remark that the quantities $\xi_{\mathbf{i}}$, $D_1$ and $D_2$, which have $\bar{U}_{h^2}$ and $\hat{N}_c$ as denominators in their definitions, are all well-defined on the entire space $\Omega$: unlike $U_{h^2}$ and $\hat{N}$, the censored quantities $\bar{U}_{h^2}$ and $\hat{N}_c$ can never equal zero by construction, so division by zero is not a concern.

Because

$$
\bar{U}_{h^2}=U_{h^2}\quad\text{on }\mathcal{E}_{1,\mathcal{X}},
\tag{3.8}
$$

in light of how $\bar{U}_{h^2}$ and $\mathcal{E}_{1,\mathcal{X}}$ are defined, tracing the definitions in (3.3), (3.5) and (3.6) shows that

$$
\sum_{\mathbf{i}\in I_{n,m}}\xi_{\mathbf{i}}=\frac{\sqrt{N}B_n}{\sqrt{U_{h^2}}},\quad D_1=\frac{(N-\hat{N}_c)\sqrt{N}B_n}{\hat{N}_c\sqrt{U_{h^2}}}+\bigg(\frac{N-\hat{N}_c}{\hat{N}_c}\bigg)\frac{\sqrt{N}U_n}{\sqrt{U_{h^2}(1-p)}},\quad\text{and}\quad D_2=\frac{\sigma_h^2}{U_{h^2}}-1\quad\text{when }\mathcal{X}\in\mathcal{E}_{1,\mathcal{X}}.
$$

Together with the definition of $T$ in (2.15), the facts in the preceding display imply that

$$
T=T_{SN}\quad\text{when }\mathcal{X}\in\mathcal{E}_{1,\mathcal{X}};
$$

this gets us off on the right foot in developing the B-E bound in Lemma 2.3, as it allows one to write

$$
|P(T\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]|\leq\big|P(T_{SN}\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]\big|+P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})
\tag{3.9}
$$

by truncating away the complementary event $\Omega\backslash\mathcal{E}_{1,\mathcal{X}}$, which, as we will see, has small probability. We have thus reduced the problem to establishing a B-E bound for $T_{SN}$ with respect to the random threshold $\mathfrak{z}_{\mathcal{X}}$, i.e., a bound on the absolute difference

$$
\big|P(T_{SN}\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]\big|,
\tag{3.10}
$$

and we can adopt the paradigm recently formalized in Leung et al. (2024) of using Stein’s method to prove B-E bounds for Studentized nonlinear statistics. For $n$ real-valued deterministic functions $t_1(\cdot),\dots,t_n(\cdot)$ and independent random variables $Z_1,\dots,Z_n\in\mathbb{R}$ such that $\mathbb{E}[t_i(Z_i)]=0$ for all $i$ and $\sum_{i=1}^n\mathbb{E}[t_i^2(Z_i)]=1$, a Studentized nonlinear statistic is a statistic that can be written in the generic form

$$
\frac{\sum_{i=1}^nt_i(Z_i)+\mathcal{D}_1}{\sqrt{1+\mathcal{D}_2}},
\tag{3.11}
$$

where $\mathcal{D}_1=\mathcal{D}_1(Z_1,\dots,Z_n)$ and $\mathcal{D}_2=\mathcal{D}_2(Z_1,\dots,Z_n)$ are (usually small) remainder terms that may depend on the data $Z_1,\dots,Z_n$, with $\mathcal{D}_2>-1$ almost surely. In particular, the generic statistic in (3.11) is “Studentized” by $1+\mathcal{D}_2$, a data-driven estimate of its own variance; we refer readers to Shao et al. (2016, Sec. 4) for some typical examples of such statistics. Indeed,

$$
T_{SN}\text{ can be understood as an instance of (3.11) when conditioning on }\mathcal{X}\text{, for any }\mathcal{X}\in\mathcal{E}_{1,\mathcal{X}}:
\tag{3.12}
$$

From the conditional moment calculations in (3.4), we have

$$
\mathbb{E}[\xi_{\mathbf{i}}\mid\mathcal{X}]=0\quad\text{and}\quad\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[\xi_{\mathbf{i}}^2\mid\mathcal{X}]=\frac{U_{h^2}}{\bar{U}_{h^2}}=1\quad\text{for }\mathcal{X}\in\mathcal{E}_{1,\mathcal{X}},
\tag{3.13}
$$

where the last equality comes from (3.8). Moreover, of the two remainders only $D_1$ depends on $\mathcal{Z}$, and $D_2$ is strictly greater than $-1$ because, by (3.8) again,

$$
1+D_2=1-\frac{\sigma_h^2}{\bar{U}_{h^2}}\bigg(\frac{U_{h^2}}{\sigma_h^2}-1\bigg)=\frac{\sigma_h^2}{\bar{U}_{h^2}}>0\quad\text{for }\mathcal{X}\in\mathcal{E}_{1,\mathcal{X}}.
$$

These verify the claim in (3.12).

Based on the preceding observation, we can adopt the framework of Leung et al. (2024) to develop a preliminary bound on the absolute difference in (3.10). To state the resulting bound, we define the following “leave-one-out” versions of $\hat{N}$, $\hat{N}_c$ and $D_1$ with respect to each $\mathbf{i}\in I_{n,m}$, obtained by leaving out any term involving $Z_{\mathbf{i}}$:

$$
\hat{N}^{(\mathbf{i})}\equiv\sum_{\mathbf{j}\in I_{n,m},\,\mathbf{j}\neq\mathbf{i}}Z_{\mathbf{j}},
$$

$$
\hat{N}_c^{(\mathbf{i})}\equiv\hat{N}^{(\mathbf{i})}I\bigg(\hat{N}^{(\mathbf{i})}\in\Big[\frac{N}{2},\frac{3N}{2}\Big]\bigg)+\frac{3N}{2}I\bigg(\hat{N}^{(\mathbf{i})}>\frac{3N}{2}\bigg)+\frac{N}{2}I\bigg(\hat{N}^{(\mathbf{i})}<\frac{N}{2}\bigg),
\tag{3.14}
$$

and

$$
D_1^{(\mathbf{i})}\equiv\frac{N-\hat{N}_c^{(\mathbf{i})}}{\hat{N}_c^{(\mathbf{i})}}\sum_{\mathbf{j}\in I_{n,m},\,\mathbf{j}\neq\mathbf{i}}\xi_{\mathbf{j}}+\bigg(\frac{N-\hat{N}_c^{(\mathbf{i})}}{\hat{N}_c^{(\mathbf{i})}}\bigg)\frac{\sqrt{N}U_n}{\sqrt{\bar{U}_{h^2}(1-p)}}.
\tag{3.15}
$$

Further, we censor $D_1$ and $D_2$, as well as $\xi_{\mathbf{i}}$ and $D_1^{(\mathbf{i})}$ for each $\mathbf{i}\in I_{n,m}$, as

$$
\bar{D}_k\equiv D_kI\bigg(-\frac{1}{2}\leq D_k\leq\frac{1}{2}\bigg)-\frac{1}{2}I\bigg(D_k<-\frac{1}{2}\bigg)+\frac{1}{2}I\bigg(D_k>\frac{1}{2}\bigg)\quad\text{for }k=1,2,
$$

$$
\bar{\xi}_{\mathbf{i}}\equiv\xi_{\mathbf{i}}I(|\xi_{\mathbf{i}}|\leq1)+I(\xi_{\mathbf{i}}>1)-I(\xi_{\mathbf{i}}<-1),
\tag{3.16}
$$

and

$$
\bar{D}_1^{(\mathbf{i})}\equiv D_1^{(\mathbf{i})}I\bigg(-\frac{1}{2}\leq D_1^{(\mathbf{i})}\leq\frac{1}{2}\bigg)-\frac{1}{2}I\bigg(D_1^{(\mathbf{i})}<-\frac{1}{2}\bigg)+\frac{1}{2}I\bigg(D_1^{(\mathbf{i})}>\frac{1}{2}\bigg).
\tag{3.17}
$$
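All the censoring operations above, (3.2), (3.14), (3.16) and (3.17), are instances of clipping a number to an interval, with (3.2) clipping only from below; for example $\bar{U}_{h^2}=\max(U_{h^2},\sigma_h^2/2)$ and $\bar{D}_k=\min(\max(D_k,-1/2),1/2)$. The Python sketch below makes this explicit by comparing the indicator-sum form used in the text with `min`/`max`; the helper names and sample values are illustrative only.

```python
def censor_indicator_form(x: float, lo: float, hi: float) -> float:
    """Two-sided censoring written as in (3.14)/(3.16)/(3.17):
    x I(lo <= x <= hi) + hi I(x > hi) + lo I(x < lo)."""
    return x * (lo <= x <= hi) + hi * (x > hi) + lo * (x < lo)


def lower_censor_indicator_form(u: float, floor: float) -> float:
    """Lower censoring written as in (3.2): floor I(u <= floor) + u I(u > floor)."""
    return floor * (u <= floor) + u * (u > floor)


def clip(x: float, lo: float, hi: float) -> float:
    """Equivalent compact form: project x onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


# bar D_k: censor to [-1/2, 1/2]; hat N_c^{(i)}: censor to [N/2, 3N/2] with N = 100.
d_values = [-2.0, -0.5, -0.1, 0.3, 0.5, 0.9]
n_hat_values = [30.0, 50.0, 120.0, 150.0, 170.0]
print([clip(d, -0.5, 0.5) for d in d_values])
print([clip(v, 50.0, 150.0) for v in n_hat_values])
# bar U_{h^2} with sigma_h^2 = 2, i.e. lower-censored at sigma_h^2 / 2 = 1.
print([lower_censor_indicator_form(u, 1.0) for u in [0.0, 0.4, 1.0, 3.5]])
```

Writing the censored quantities as projections makes the two facts used repeatedly later immediate: censoring never increases absolute values when the interval contains 0, and the censored denominators stay bounded away from zero.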

The following B-E bound for $T_{SN}$ with respect to the random threshold $\mathfrak{z}_{\mathcal{X}}$ is proven in Appendix B by suitably adapting the arguments from Leung et al. (2024):

Lemma 3.1 (B-E bound for $T_{SN}$ with the random threshold $\mathfrak{z}_{\mathcal{X}}$).

Under the same assumptions as Theorem 2.1, there exists an absolute constant $C>0$ such that, for any $z\in\mathbb{R}$,

$$
|P(T_{SN}\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]|\leq C\bigg(T(D_1)+T(D_2)+P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})+\frac{\mathbb{E}[|h|^3](1-2p+2p^2)}{\sigma_h^3\sqrt{N(1-p)}}\bigg),
$$

for

$$
T(D_1)\equiv P(|D_1|>1/2)+\mathbb{E}\Bigg[\sqrt{\mathbb{E}[\bar{D}_1^2\mid\mathcal{X}]}+\frac{\sum_{\mathbf{i}\in I_{n,m}}|h(\mathbf{X}_{\mathbf{i}})|\sqrt{\mathbb{E}[(\bar{D}_1-\bar{D}_1^{(\mathbf{i})})^2\mid\mathcal{X}]}}{\sigma_h\sqrt{\binom{n}{m}}}\Bigg]
$$

and

$$
T(D_2)\equiv P(|D_2|>1/2)+\mathbb{E}[\bar{D}_2^2]+\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\bar{D}_2f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{i}\in I_{n,m}}\bar{\xi}_{\mathbf{i}}\Big)\Big]\Big|,
$$

where

$$
f_z(w)\equiv\begin{cases}\sqrt{2\pi}\,e^{w^2/2}\Phi(w)\bar{\Phi}(z), & w\leq z,\\ \sqrt{2\pi}\,e^{w^2/2}\Phi(z)\bar{\Phi}(w), & w>z,\end{cases}
\tag{3.18}
$$

is the solution to the Stein equation (Stein, 1972)

$$
I(w\leq z)-\Phi(z)=f_z'(w)-wf_z(w),
\tag{3.19}
$$

and $f_{\mathfrak{z}_{\mathcal{X}}}$ denotes $f_z$ in (3.18) with $z$ replaced by the random quantity $\mathfrak{z}_{\mathcal{X}}$.
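As a numerical sanity check that (3.18) solves (3.19), the sketch below evaluates $f_z$ through the complementary error function, approximates $f_z'$ by central differences away from the kink at $w=z$, and reports the largest residual of the Stein equation over a grid. It assumes $\bar{\Phi}=1-\Phi$; the grid and the step size are arbitrary choices.

```python
import math


def Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def Phi_bar(x: float) -> float:
    """Standard normal upper tail 1 - Phi(x), computed without cancellation."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def stein_f(z: float, w: float) -> float:
    """The solution f_z(w) in (3.18)."""
    c = math.sqrt(2.0 * math.pi) * math.exp(w * w / 2.0)
    return c * Phi(w) * Phi_bar(z) if w <= z else c * Phi(z) * Phi_bar(w)


def stein_residual(z: float, w: float, eps: float = 1e-6) -> float:
    """f_z'(w) - w f_z(w) - (I(w <= z) - Phi(z)), with f_z' by central differences."""
    deriv = (stein_f(z, w + eps) - stein_f(z, w - eps)) / (2.0 * eps)
    indicator = 1.0 if w <= z else 0.0
    return deriv - w * stein_f(z, w) - (indicator - Phi(z))


max_residual = 0.0
for z in (-2.5, -1.0, 0.0, 0.8, 2.5):
    for k in range(101):
        w = -5.0 + 0.1 * k
        if abs(w - z) > 1e-3:  # f_z' jumps at w = z, so skip the kink
            max_residual = max(max_residual, abs(stein_residual(z, w)))
print(f"largest Stein-equation residual: {max_residual:.2e}")
```

Differentiating (3.18) by hand gives the same conclusion: for $w\leq z$, $f_z'(w)=wf_z(w)+\bar{\Phi}(z)$, and for $w>z$, $f_z'(w)=wf_z(w)-\Phi(z)$.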

Evidently, to eventually arrive at Lemma 2.3, we must further bound the quantities $T(D_1)$ and $T(D_2)$ in Lemma 3.1, which respectively contain terms related to the remainders in the numerator and denominator of $T_{SN}$ in (3.7):

Lemma 3.2 (Bounding $T(D_1)$).

Under the same assumptions as in Theorem 2.1, the term $T(D_1)$ defined in Lemma 3.1 satisfies the bound

$$
T(D_1)\leq C\bigg(\frac{1}{\sqrt{N}}+\frac{\sqrt{m}}{\sqrt{n(1-p)}}\bigg).
$$

Lemma 3.3 (Bounding $T(D_2)$).

Suppose the same assumptions as in Theorem 2.1 hold. If $\mathbb{E}[|h|^3]<\infty$, then the term $T(D_2)$ defined in Lemma 3.1 satisfies the bound

$$
\begin{aligned}
T(D_2)\leq{}&C\Bigg[\frac{\mathbb{E}[|h|^3](1-2p+2p^2)}{\sigma_h^3(N(1-p))^{1/2}}+\bigg(1+\frac{\sqrt{mN}}{\sqrt{n(1-p)}}\bigg)\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}\\
&+(m-1)^{1/3}\bigg(\sum_{r=2}^m\binom{m}{r}^{3/2}\binom{n}{r}^{-1/2}\frac{2^{r/2}\mathbb{E}[|\pi_r(h^2)|^{3/2}]}{\sigma_h^3}\bigg)^{2/3}\Bigg]+P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}}).
\end{aligned}
$$

If, in addition, $\mathbb{E}[h^4]<\infty$, then $T(D_2)\leq\frac{8\sqrt{m\mathbb{E}[(h^2-\sigma_h^2)^2]}}{\sigma_h^2\sqrt{n}}$.

The proof of Lemma 3.2 in Appendix C entails computing tight moment estimates that exploit conditioning on the raw data $\mathcal{X}$ wherever necessary, and the proof of Lemma 3.3 in Section 4 involves interesting leave-one-out arguments quite unique to this problem. To complete the proof of Lemma 2.3, we critically require the following exponential lower tail bound for U-statistics with non-negative kernels, recently proved in Leung and Shao (2023, Lemma 4.3):

Lemma 3.4 (Exponential lower tail bound for U-statistics with non-negative kernels).

Suppose $U_\kappa=\binom{n}{m}^{-1}\sum_{\mathbf{i}\in I_{n,m}}\kappa(X_{\mathbf{i}})$ is a U-statistic, where $\kappa:M^m\to\mathbb{R}_{\geq0}$ is a symmetric kernel of degree $m$ taking only non-negative values, with $0<\mathbb{E}[\kappa^\ell]<\infty$ for some $\ell\in(1,2]$. Then for $0<t\leq\mathbb{E}[\kappa]$,

$$
P(U_\kappa\leq t)\leq\exp\left(\frac{-[n/m](\ell-1)(\mathbb{E}[\kappa]-t)^{\ell/(\ell-1)}}{\ell\,(\mathbb{E}[\kappa^\ell])^{1/(\ell-1)}}\right),
$$

where $[n/m]$ denotes the greatest integer not exceeding $n/m$.

Taking $\kappa=h^2$, $\ell=3/2$ and $t=\sigma_h^2/2$ in Lemma 3.4, from the definition of $\mathcal{E}_{1,\mathcal{X}}$ in (3.1) one has

$$
P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})\leq\exp\Big(\frac{-[n/m]\,\sigma_h^6}{24(\mathbb{E}[|h|^3])^2}\Big).
\tag{3.20}
$$
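The constant 24 in (3.20) comes from plugging the choices above into Lemma 3.4, assuming $\mathbb{E}[h^2]=\sigma_h^2$ so that $\mathbb{E}[\kappa]=\sigma_h^2$ and $\mathbb{E}[\kappa^{3/2}]=\mathbb{E}[|h|^3]$. Because $\ell/(\ell-1)=3$ and $1/(\ell-1)=2$ are integers when $\ell=3/2$, the exponent can be evaluated in exact rational arithmetic; the numerical values in this sketch are arbitrary.

```python
from fractions import Fraction


def lemma_3_4_exponent(k: int, ell: Fraction, mean: Fraction, t: Fraction, moment: Fraction) -> Fraction:
    """Exponent of the bound in Lemma 3.4,
        -k (ell - 1) (mean - t)^{ell/(ell-1)} / (ell * moment^{1/(ell-1)}),
    where k stands for [n/m], mean for E[kappa] and moment for E[kappa^ell].
    The result is an exact Fraction only when ell/(ell-1) and 1/(ell-1) are integers."""
    return -k * (ell - 1) * (mean - t) ** (ell / (ell - 1)) / (ell * moment ** (1 / (ell - 1)))


k = 13                      # stand-in for [n/m]
sigma_h2 = Fraction(9, 4)   # stand-in for sigma_h^2 = E[h^2]
abs_h3 = Fraction(7, 2)     # stand-in for E[|h|^3]
exponent = lemma_3_4_exponent(k, Fraction(3, 2), sigma_h2, sigma_h2 / 2, abs_h3)
print(exponent, exponent == -k * sigma_h2**3 / (24 * abs_h3**2))
```

By hand: $(\ell-1)/\ell=1/3$ and $(\sigma_h^2/2)^3=\sigma_h^6/8$, so the exponent is $-[n/m]\sigma_h^6/(24(\mathbb{E}[|h|^3])^2)$.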

Combining (3.9), (3.20), Lemmas 3.1-3.3 and the fact that

$$
\frac{1}{\sqrt{N}}\leq\frac{\mathbb{E}[|h|^3](1-2p+2p^2)}{\sigma_h^3(N(1-p))^{1/2}}
$$

(by virtue of (2.19) and $\mathbb{E}[|h|^3]\geq\sigma_h^3$), the proof of Lemma 2.3 is complete.

4. Bounding $T(D_2)$ in Lemma 3.3

In this section, we will use the following bounds on the solution (3.18) to the Stein equation (3.19), proven in Appendix D.1.

Lemma 4.1 (Bounds for $f_z$ and $f_z'$).

The following bounds hold for $f_z$ in (3.18) and its derivative $f_z'$:

  1. (i)

    For all $w,z\in\mathbb{R}$, $0<f_z(w)\leq0.63$ and $|f_z'(w)|\leq1$.

  2. (ii)

    For $z\geq1$,

    $$
    f_z(w)\leq\begin{cases}1.7e^{-z} & \text{for } w\leq z-1,\\ 1/z & \text{for } z-1<w\leq z,\\ 1/w & \text{for } z<w,\end{cases}
    $$

    and

    $$
    |f_z'(w)|\leq\begin{cases}e^{1/2-z} & \text{for } w\leq z-1,\\ 1 & \text{for } z-1<w\leq z,\\ (1+z^2)^{-1} & \text{for } w>z.\end{cases}
    $$

  3. (iii)

    For $z\leq-1$,

    $$
    f_z(w)\leq\begin{cases}1.7e^{-|z|} & \text{if } w\geq z+1,\\ 1/|z| & \text{if } z\leq w<z+1,\\ 1/|w| & \text{if } w<z,\end{cases}
    $$

    and

    $$
    |f_z'(w)|\leq\begin{cases}e^{1/2-|z|} & \text{if } w\geq z+1,\\ 1 & \text{if } z\leq w<z+1,\\ (1+|z|^2)^{-1} & \text{if } w<z.\end{cases}
    $$

Owing to the bounds for $f_z$ in Lemma 4.1(i)-(iii), it is easy to see that

$$
\sup_{z,w\in\mathbb{R}}|zf_z(w)|\leq1.
\tag{4.1}
$$
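Lemma 4.1(i) and (4.1) can likewise be corroborated on a grid. The sketch below uses the closed-form derivative $f_z'(w)=wf_z(w)+I(w\leq z)-\Phi(z)$, which is just (3.19) rearranged, and again assumes $\bar{\Phi}=1-\Phi$; the grid ranges are arbitrary, and a finite grid can only corroborate, not prove, the stated suprema.

```python
import math


def Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def Phi_bar(x: float) -> float:
    """Standard normal upper tail 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def stein_f(z: float, w: float) -> float:
    """The solution f_z(w) in (3.18)."""
    c = math.sqrt(2.0 * math.pi) * math.exp(w * w / 2.0)
    return c * Phi(w) * Phi_bar(z) if w <= z else c * Phi(z) * Phi_bar(w)


def stein_f_prime(z: float, w: float) -> float:
    """f_z'(w) read off the Stein equation (3.19): w f_z(w) + I(w <= z) - Phi(z)."""
    return w * stein_f(z, w) + (1.0 if w <= z else 0.0) - Phi(z)


zs = [-6.0 + 0.05 * k for k in range(241)]
ws = [-8.0 + 0.05 * k for k in range(321)]
min_f, max_f, max_abs_fprime, max_abs_zf = math.inf, 0.0, 0.0, 0.0
for z in zs:
    for w in ws:
        f = stein_f(z, w)
        min_f = min(min_f, f)
        max_f = max(max_f, f)
        max_abs_fprime = max(max_abs_fprime, abs(stein_f_prime(z, w)))
        max_abs_zf = max(max_abs_zf, abs(z * f))
print(min_f > 0, round(max_f, 4), round(max_abs_fprime, 4), round(max_abs_zf, 4))
```

The largest value of $|zf_z(w)|$ on the grid occurs near $w=z$ with $|z|$ large, where $zf_z(z)=\Phi(z)\,z\bar{\Phi}(z)/\phi(z)$ approaches 1 from below by the Mills-ratio inequality $z\bar{\Phi}(z)<\phi(z)$ for $z>0$.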

Given $\bar{D}_2^2\leq|\bar{D}_2|$, $|\bar{D}_2|\leq|D_2|$ by Property 2.2(ii), and (4.1), together with Markov's inequality $P(|D_2|>1/2)\leq2\mathbb{E}[|D_2|]$, in light of the definition of $T(D_2)$ we have

$$
T(D_2)\leq4\|D_2\|_2\leq\frac{8\|U_{h^2}-\sigma_h^2\|_2}{\sigma_h^2}\leq\frac{8\sqrt{m\mathbb{E}[(h^2-\sigma_h^2)^2]}}{\sigma_h^2\sqrt{n}},
$$

where the second-to-last inequality follows from $\bar{U}_{h^2}\geq\sigma_h^2/2$, and the last inequality uses a classical bound on the variance of a U-statistic (Koroljuk and Borovskich, 1994, Lemma 1.1.4). This proves the bound for $T(D_2)$ in Lemma 3.3 under the stronger finite fourth moment assumption $\mathbb{E}[h^4]<\infty$.
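The classical variance bound used in the last step, $\mathrm{Var}(U_{h^2})\leq\frac{m}{n}\mathbb{E}[(h^2-\sigma_h^2)^2]$, can be illustrated with Hoeffding's (1948) exact formula $\mathrm{Var}(U)=\binom{n}{m}^{-1}\sum_{c=1}^m\binom{m}{c}\binom{n-m}{m-c}\zeta_c$, where $\zeta_c$ is the covariance between two kernel evaluations sharing $c$ arguments and $\zeta_m$ is the kernel variance. Assuming the standard monotonicity $\zeta_c/c\leq\zeta_m/m$, the bound follows because $\sum_c c\binom{m}{c}\binom{n-m}{m-c}/\binom{n}{m}=m^2/n$ is a hypergeometric mean. The sketch checks this in exact arithmetic for hypothetical $\zeta$ sequences.

```python
from fractions import Fraction
from math import comb


def u_stat_variance(n: int, zetas: list) -> Fraction:
    """Hoeffding's exact variance of a degree-m U-statistic from n observations,
    where zetas[c - 1] is zeta_c for c = 1, ..., m (so m = len(zetas) <= n)."""
    m = len(zetas)
    total = sum(comb(m, c) * comb(n - m, m - c) * Fraction(zetas[c - 1]) for c in range(1, m + 1))
    return total / comb(n, m)


def variance_bound(n: int, zetas: list) -> Fraction:
    """The bound m * zeta_m / n, i.e. (m / n) times the kernel variance."""
    return Fraction(len(zetas)) * Fraction(zetas[-1]) / n


n = 10
# Each hypothetical sequence satisfies zeta_c / c <= zeta_m / m.
for zetas in ([1, 3, 6], [2, 4, 6], [Fraction(1, 4), 1]):
    print(zetas, u_stat_variance(n, zetas), variance_bound(n, zetas))
```

The bound is attained exactly when $\zeta_c/c$ is constant in $c$, as in the second sequence.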

The bound for $T(D_2)$ in Lemma 3.3 under the finite third moment assumption $\mathbb{E}[|h|^3]<\infty$ is considerably more involved to prove. To prove it, we first upper-censor the $\eta_i$'s in (2.10) as

$$
\bar{\eta}_i\equiv\eta_iI(\eta_i\leq1)+I(\eta_i>1),
$$

from which we can further define

$$
\Pi_1\equiv\sum_{i=1}^n(\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i]),
\tag{4.2}
$$

$$
\Pi_2\equiv\sum_{r=2}^m\binom{m}{r}\binom{n}{r}^{-1}\sum_{1\leq i_1<\dots<i_r\leq n}\frac{\pi_r(h^2)(X_{i_1},\dots,X_{i_r})}{\sigma_h^2}-\sum_{i=1}^n\mathbb{E}[(\eta_i-1)I(|\eta_i|>1)],
\tag{4.3}
$$

$$
D_3\equiv-\frac{\sigma_h^2(\Pi_1+\Pi_2)}{\bar{U}_{h^2}}
\tag{4.4}
$$

and

$$
\bar{D}_3\equiv D_3I\bigg(-\frac{1}{2}\leq D_3\leq\frac{1}{2}\bigg)-\frac{1}{2}I\bigg(D_3<-\frac{1}{2}\bigg)+\frac{1}{2}I\bigg(D_3>\frac{1}{2}\bigg).
$$

The following moment bounds on $\Pi_1$ and $\Pi_2$ in (4.2) and (4.3), proven in Appendix D.2, will be useful in the sequel.

Lemma 4.2 (Moment estimates on $\Pi_1$ and $\Pi_2$).

If $\mathbb{E}[|h|^3]<\infty$, the following moment bounds hold for $\Pi_1$ and $\Pi_2$ defined in (4.2) and (4.3):

$$
\mathbb{E}[|\Pi_1|^\ell]\leq\begin{cases}C(\ell)\bigg\{\Big(\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}\Big)^{\ell/2}+\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}\bigg\} & \text{for } 2<\ell<\infty,\\ C\,\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3} & \text{for } \frac{3}{2}\leq\ell\leq2,\end{cases}
$$

where $C(\ell)>0$ is a constant depending only on $\ell$, and

$$
\|\Pi_2\|_{3/2}\leq\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}+(m-1)^{1/3}\bigg(\sum_{r=2}^m\binom{m}{r}^{3/2}\binom{n}{r}^{-1/2}\frac{2^{r/2}\mathbb{E}[|\pi_r(h^2)|^{3/2}]}{\sigma_h^3}\bigg)^{2/3}.
$$

Since

$$
m=\sum_{i=1}^n\mathbb{E}[\eta_i]=\sum_{i=1}^n\mathbb{E}[\bar{\eta}_i]+\sum_{i=1}^n\mathbb{E}[(\eta_i-1)I(|\eta_i|>1)],
$$

in light of the Hoeffding decomposition in (2.9), one can see that

$$
D_2=D_3\quad\text{when }\max_{1\leq i\leq n}\eta_i\leq1,
\tag{4.5}
$$

by comparing the definitions of $D_2$ and $D_3$ in (3.6) and (4.4). Now, because the component terms of $T(D_2)$ satisfy the bounds $P(|D_2|>1/2)\leq1$, $|\bar{D}_2^2-\bar{D}_3^2|\leq1/2$ and $|\mathfrak{z}_{\mathcal{X}}(\bar{D}_2-\bar{D}_3)f_{\mathfrak{z}_{\mathcal{X}}}(\sum_{\mathbf{i}\in I_{n,m}}\bar{\xi}_{\mathbf{i}})|\leq1$ by (4.1), it is easy to see from (4.5) that

$$
T(D_2)\leq P(|D_3|>1/2)+\mathbb{E}[\bar{D}_3^2]+\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\bar{D}_3f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{i}\in I_{n,m}}\bar{\xi}_{\mathbf{i}}\Big)\Big]\Big|+\frac{5}{2}P(\Omega\backslash\mathcal{E}_{2,\mathcal{X}}),
\tag{4.6}
$$

where we have defined the event

$$
\mathcal{E}_{2,\mathcal{X}}\equiv\Big\{\max_{1\leq i\leq n}\eta_i\leq1\Big\}.
$$

In Appendices D.3-D.5, we derive the following bounds on each of the first three terms on the right-hand side of (4.6) in terms of moments of $\Pi_1$ and $\Pi_2$:

$$
P(|D_3|>1/2)\leq C(\mathbb{E}[\Pi_1^2]+\|\Pi_2\|_{3/2})
\tag{4.7}
$$

$$
\mathbb{E}[\bar{D}_3^2]\leq C(\mathbb{E}[\Pi_1^2]+\|\Pi_2\|_{3/2})
\tag{4.8}
$$

$$
\begin{aligned}
\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\bar{D}_3f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{i}\in I_{n,m}}\bar{\xi}_{\mathbf{i}}\Big)\Big]\Big|\leq{}&\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_1f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\Big)\Big]\Big|+\frac{2\sqrt{2}\,\mathbb{E}[|h|^3](1-2p+2p^2)}{\sigma_h^3(N(1-p))^{1/2}}\\
&+C\bigg(\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{\sigma_h^3n^{1/2}}+\mathbb{E}[|\Pi_1|^{3/2}]+\mathbb{E}[\Pi_1^2]+\mathbb{E}[|\Pi_1|^3]+\|\Pi_2\|_{3/2}\bigg)+P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}}).
\end{aligned}
\tag{4.9}
$$

Hence, upon applying the moment bounds for $\mathbb{E}[|\Pi_1|^{3/2}]$, $\mathbb{E}[\Pi_1^2]$ and $\mathbb{E}[|\Pi_1|^3]$ in Lemma 4.2 to the right-hand sides of (4.7)-(4.9), one further gets

$$
P(|D_3|>1/2)\leq C\bigg(\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}+\|\Pi_2\|_{3/2}\bigg)
\tag{4.10}
$$

$$
\mathbb{E}[\bar{D}_3^2]\leq C\bigg(\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}+\|\Pi_2\|_{3/2}\bigg)
\tag{4.11}
$$

$$
\begin{aligned}
\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\bar{D}_3f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{i}\in I_{n,m}}\bar{\xi}_{\mathbf{i}}\Big)\Big]\Big|\leq{}&\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_1f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\Big)\Big]\Big|\\
&+C\bigg(\frac{\mathbb{E}[|h|^3](1-2p+2p^2)}{\sigma_h^3(N(1-p))^{1/2}}+\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}+\|\Pi_2\|_{3/2}\bigg)+P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}}),
\end{aligned}
\tag{4.12}
$$

where we have absorbed the term $\big(\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}\big)^{3/2}$ appearing in the bound for $\mathbb{E}[|\Pi_1|^3]$ into $\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}$. This is permissible because

$$
\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\bar{D}_3f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{i}\in I_{n,m}}\bar{\xi}_{\mathbf{i}}\Big)\Big]\Big|\leq1/2
$$

by virtue of (4.1) and $\bar{D}_3$ being censored to the interval $[-1/2,1/2]$. For the bound on $|\mathbb{E}[\mathfrak{z}_{\mathcal{X}}\bar{D}_3f_{\mathfrak{z}_{\mathcal{X}}}(\sum_{\mathbf{i}\in I_{n,m}}\bar{\xi}_{\mathbf{i}})]|$ to be non-vacuous, one can hence assume without loss of generality that $\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}\leq1$, which implies $\big(\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}\big)^{3/2}\leq\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}$ and justifies absorbing the former into the latter. Recalling (4.6), (4.10)-(4.12) give

$$
\begin{aligned}
T(D_2)\leq{}&\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_1f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\Big)\Big]\Big|\\
&+C\bigg(\frac{\mathbb{E}[|h|^3](1-2p+2p^2)}{\sigma_h^3(N(1-p))^{1/2}}+\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}+\|\Pi_2\|_{3/2}\bigg)+P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})+\frac{5}{2}P(\Omega\backslash\mathcal{E}_{2,\mathcal{X}}).
\end{aligned}
\tag{4.13}
$$

For the rest of this section, we will show the estimate

$$
\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_1f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\Big)\Big]\Big|\leq C\bigg(1+\frac{\sqrt{mN}}{\sqrt{n(1-p)}}\bigg)\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}
\tag{4.14}
$$

with leave-one-out arguments that are quite unique to this problem; combining (4.13)-(4.14), the union bound

$$
P(\Omega\backslash\mathcal{E}_{2,\mathcal{X}})\leq\sum_{i=1}^nP(\eta_i>1)\leq\sum_{i=1}^n\mathbb{E}[\eta_i^{3/2}]=\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}
$$

and the bound for $\|\Pi_2\|_{3/2}$ in Lemma 4.2 will then prove Lemma 3.3 under the finite third moment assumption $\mathbb{E}[|h|^3]<\infty$.

For the proof of (4.14) that follows, we use the symbol $\sum_{\mathbf{j}\in I_{n,m},\,i\notin\mathbf{j}}$ to denote a summation over the $\binom{n-1}{m}$ index vectors $\mathbf{j}=(j_1,\dots,j_m)$ in $I_{n,m}$ for which $j_k\neq i$ for all $k=1,\dots,m$. In the same spirit, we use $\sum_{\mathbf{j}\in I_{n,m},\,i\in\mathbf{j}}$ to denote a summation over the $\binom{n-1}{m-1}$ index vectors $\mathbf{j}=(j_1,\dots,j_m)$ in $I_{n,m}$ for which $j_k=i$ for some $k=1,\dots,m$. Let

$$
U_{h^2}^{(i)}\equiv\binom{n}{m}^{-1}\sum_{\mathbf{j}\in I_{n,m},\,i\notin\mathbf{j}}h^2(X_{\mathbf{j}}),
\tag{4.15}
$$

be the “leave-one-out” version of $U_{h^2}$ that eliminates the terms involving $X_i$, and define

$$
\bar{U}_{h^2}^{(i)}\equiv\frac{\sigma_h^2}{2}I\bigg(U_{h^2}^{(i)}\leq\frac{\sigma_h^2}{2}\bigg)+U_{h^2}^{(i)}I\bigg(U_{h^2}^{(i)}>\frac{\sigma_h^2}{2}\bigg)\quad\text{and}\quad{}_i\xi_{\mathbf{j}}=\frac{(Z_{\mathbf{j}}-p)h(\mathbf{X}_{\mathbf{j}})}{\sqrt{\bar{U}_{h^2}^{(i)}N(1-p)}},
\tag{4.16}
$$

analogously to $\bar{U}_{h^2}$ and $\xi_{\mathbf{j}}$. Then write

$$
\begin{aligned}
\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_1f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\Big)\Big]={}&\sum_{i=1}^n\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\cdot(\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i])\cdot\Big(f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\Big)-f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}\Big)\Big)\Big]\\
&+\sum_{i=1}^n\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\cdot(\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i])\cdot f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}\Big)\Big].
\end{aligned}
\tag{4.17}
$$
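The counts $\binom{n-1}{m}$ and $\binom{n-1}{m-1}$ attached to the two restricted summations introduced before (4.15) can be confirmed by brute-force enumeration of $I_{n,m}$. In this Python sketch the helper name and the small values of $n$, $m$ and $i$ are arbitrary.

```python
from itertools import combinations
from math import comb


def split_index_set(n: int, m: int, i: int):
    """Enumerate I_{n,m} (strictly increasing m-tuples from {1, ..., n}) and split it
    according to whether the index i appears in the tuple."""
    index_set = list(combinations(range(1, n + 1), m))
    without_i = [j for j in index_set if i not in j]
    with_i = [j for j in index_set if i in j]
    return index_set, without_i, with_i


n, m, i = 7, 3, 4
index_set, without_i, with_i = split_index_set(n, m, i)
# Enumerated sizes next to binom(n, m), binom(n-1, m) and binom(n-1, m-1).
print(len(index_set), len(without_i), len(with_i), comb(n, m), comb(n - 1, m), comb(n - 1, m - 1))
```

The two counts add up to $\binom{n}{m}$ by Pascal's rule, reflecting that every index vector either contains $i$ or does not.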

We will next establish the bounds

$$
\bigg|\sum_{i=1}^n\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}(\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i])\cdot\Big(f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\Big)-f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}\Big)\Big)\Big]\bigg|\leq\frac{(8+8\sqrt{2})\,m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}
\tag{4.18}
$$

and

$$
\bigg|\sum_{i=1}^n\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}(\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i])f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}\Big)\Big]\bigg|\leq\frac{m^{3/2}\mathbb{E}[\Psi_1^{3/2}]}{n^{1/2}\sigma_h^3}\Big(12\sqrt{2}+4+\frac{2.52\sqrt{mN}}{\sqrt{n(1-p)}}\Big);
\tag{4.19}
$$

combining (4.17)-(4.19) will then finish proving (4.14).

4.1. Bound on the term in (4.18)

We consider the cases $|\mathfrak{z}_{\mathcal{X}}|<2$ and $|\mathfrak{z}_{\mathcal{X}}|\geq2$ separately. Since $\sup_{z,w\in\mathbb{R}}|f_z'(w)|\leq1$ by Lemma 4.1(i),

$$
\begin{aligned}
&\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}(\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i])\cdot\Big(f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\Big)-f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}\Big)\Big)\cdot I(|\mathfrak{z}_{\mathcal{X}}|<2)\bigg]\bigg|\\
&\leq2\,\mathbb{E}\bigg[|\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i]|\cdot\Big|\sum_{\mathbf{j}\in I_{n,m}}(\xi_{\mathbf{j}}-{}_i\xi_{\mathbf{j}})\Big|\cdot I(|\mathfrak{z}_{\mathcal{X}}|<2)\bigg].
\end{aligned}
\tag{4.20}
$$

Since

$$
I\Big(\Big|\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}\Big|>|\mathfrak{z}_{\mathcal{X}}|-1,\ |\mathfrak{z}_{\mathcal{X}}|\geq2\Big)\leq\frac{|\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}|}{|\mathfrak{z}_{\mathcal{X}}|-1}\cdot I(|\mathfrak{z}_{\mathcal{X}}|\geq2),
$$

we can form the bound

$$
\begin{aligned}
&\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}(\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i])\Big(f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\Big)-f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}\Big)\Big)I(|\mathfrak{z}_{\mathcal{X}}|\geq2)\bigg]\bigg|\\
&\leq\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}(\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i])\Big(f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\Big)-f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}\Big)\Big)I\Big(\Big|\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}\Big|\leq|\mathfrak{z}_{\mathcal{X}}|-1,\ |\mathfrak{z}_{\mathcal{X}}|\geq2\Big)\bigg]\bigg|\\
&\quad+\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}(\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i])\cdot\Big(f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\Big)-f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}\Big)\Big)\cdot\frac{|\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}|}{|\mathfrak{z}_{\mathcal{X}}|-1}\cdot I(|\mathfrak{z}_{\mathcal{X}}|\geq2)\bigg]\bigg|\\
&\leq\mathbb{E}\bigg[|\mathfrak{z}_{\mathcal{X}}|e^{1/2-|\mathfrak{z}_{\mathcal{X}}|}\cdot|\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i]|\cdot\Big|\sum_{\mathbf{j}\in I_{n,m}}(\xi_{\mathbf{j}}-{}_i\xi_{\mathbf{j}})\Big|\cdot I(|\mathfrak{z}_{\mathcal{X}}|\geq2)\bigg]\\
&\quad+\mathbb{E}\bigg[\frac{|\mathfrak{z}_{\mathcal{X}}|}{|\mathfrak{z}_{\mathcal{X}}|-1}\cdot|\bar{\eta}_i-\mathbb{E}[\bar{\eta}_i]|\cdot\Big|\sum_{\mathbf{j}\in I_{n,m}}(\xi_{\mathbf{j}}-{}_i\xi_{\mathbf{j}})\Big|\cdot\Big|\sum_{\mathbf{j}\in I_{n,m}}{}_i\xi_{\mathbf{j}}\Big|\cdot I(|\mathfrak{z}_{\mathcal{X}}|\geq2)\bigg]
\end{aligned}
\tag{4.21}
$$

where the first term in (4.21) comes from the following facts:

  1. (i)

    For any $z\in\mathbb{R}$ with $|z|\geq 2$, $|f_{z}'(w)|\leq e^{1/2-|z|}$ if $|w|\leq|z|-1$, by the bounds for $f_{z}'$ in Lemma 4.1(ii) and (iii);

  2. (ii)

    For any $z\in\mathbb{R}$ with $|z|\geq 2$, $|\sum_{{\bf j}\in I_{n,m}}{}_{i}\xi_{\bf j}|\leq|z|-1$ implies $|\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}|\leq|z|-1$ because $\bar{U}^{(i)}_{h^{2}}\leq\bar{U}_{h^{2}}$ by definition, so the whole segment between the two arguments of $f_{z}$ lies in $\{|w|\leq|z|-1\}$, where (i) applies,

and the second term in (4.21) comes from $\sup_{w,z\in\mathbb{R}}|f_{z}'(w)|\leq 1$ by Lemma 4.1(i).
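The three properties of $f_{z}$ invoked here (and again in Section 4.2 and Appendix B) are quoted from Lemma 4.1. The Python sketch below is only a numerical sanity check under an assumption: it takes $f_{z}$ to be the standard bounded solution of the Stein equation $f'(w)-wf(w)=I(w\leq z)-\Phi(z)$, namely $f_{z}(w)=\sqrt{2\pi}e^{w^{2}/2}\Phi(\min(w,z))\{1-\Phi(\max(w,z))\}$, which is consistent with the expansion in (B.6) but is not restated in this section. The helper names are hypothetical, and a finite grid proves nothing; it merely illustrates $0<f_{z}\leq 0.63$, $|f_{z}'|\leq 1$ and $|f_{z}'(w)|\leq e^{1/2-|z|}$ for $|z|\geq 2$, $|w|\leq|z|-1$.

```python
import math

def Phi(x: float) -> float:
    """Standard normal CDF, computed through erfc so both tails stay accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def stein_f(z: float, w: float) -> float:
    """Assumed form of f_z: bounded solution of f'(w) - w f(w) = I(w <= z) - Phi(z)."""
    lo, hi = min(w, z), max(w, z)
    return math.sqrt(2.0 * math.pi) * math.exp(w * w / 2.0) * Phi(lo) * Phi(-hi)

def stein_f_prime(z: float, w: float) -> float:
    """f_z'(w) read off from the Stein equation itself (left limit at w = z)."""
    return w * stein_f(z, w) + (1.0 if w <= z else 0.0) - Phi(z)

def grid(a: float, b: float, k: int) -> list:
    """k evenly spaced points on [a, b]."""
    return [a + (b - a) * i / (k - 1) for i in range(k)]

zs, ws = grid(-6.0, 6.0, 121), grid(-8.0, 8.0, 321)
sup_f = max(stein_f(z, w) for z in zs for w in ws)
inf_f = min(stein_f(z, w) for z in zs for w in ws)
sup_fp = max(abs(stein_f_prime(z, w)) for z in zs for w in ws)

# Ratio |f_z'(w)| / exp(1/2 - |z|) over |z| >= 2 and |w| <= |z| - 1; values <= 1
# are consistent with the non-uniform bound quoted from Lemma 4.1(ii)-(iii).
worst_ratio = 0.0
for a in grid(2.0, 6.0, 41):
    for z in (a, -a):
        for w in grid(-(a - 1.0), a - 1.0, 201):
            worst_ratio = max(worst_ratio, abs(stein_f_prime(z, w)) / math.exp(0.5 - a))

print(f"sup f_z ~ {sup_f:.4f}, inf f_z > 0: {inf_f > 0}")
print(f"sup |f_z'| ~ {sup_fp:.4f}, worst non-uniform ratio ~ {worst_ratio:.4f}")
```

Reading $f_{z}'$ off the Stein equation, rather than differencing $f_{z}$ numerically, avoids errors across the kink at $w=z$.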

As

$$\sup_{|z|\geq 2}|z|e^{\frac{1}{2}-|z|}=2e^{-3/2}\quad\text{and}\quad\sup_{|z|\geq 2}\frac{|z|}{|z|-1}=2,\tag{4.22}$$

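(4.20)-(4.21) imply, after using $2e^{-3/2}\leq 2$ and merging the complementary events $\{|\mathfrak{z}_{\mathcal{X}}|<2\}$ and $\{|\mathfrak{z}_{\mathcal{X}}|\geq 2\}$,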

$$\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}(\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}])\Big(f_{\mathfrak{z}_{\mathcal{X}}}\big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\big)-f_{\mathfrak{z}_{\mathcal{X}}}\big(\sum_{{\bf j}\in I_{n,m}}{}_{i}\xi_{\bf j}\big)\Big)\bigg]\bigg|\leq 2\bigg\{\underbrace{\mathbb{E}\Big[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\Big|\sum_{{\bf j}\in I_{n,m}}(\xi_{\bf j}-{}_{i}\xi_{\bf j})\Big|\Big]}_{\equiv(1)}+\underbrace{\mathbb{E}\Big[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\Big|\sum_{{\bf j}\in I_{n,m}}(\xi_{\bf j}-{}_{i}\xi_{\bf j})\Big|\cdot\Big|\sum_{{\bf j}\in I_{n,m}}{}_{i}\xi_{\bf j}\Big|\Big]}_{\equiv(2)}\bigg\}.\tag{4.23}$$
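The two suprema in (4.22) can be confirmed numerically. Writing $t=|z|$, both $te^{1/2-t}$ and $t/(t-1)$ are decreasing for $t>1$, so on $t\geq 2$ they peak at the endpoint $t=2$. The grid helper below is a hypothetical illustration, not part of the proof.

```python
import math

def sup_on_grid(fn, a: float = 2.0, b: float = 60.0, k: int = 200_001) -> float:
    """Largest value of fn on an evenly spaced grid over [a, b]; t stands for |z|."""
    return max(fn(a + (b - a) * i / (k - 1)) for i in range(k))

s1 = sup_on_grid(lambda t: t * math.exp(0.5 - t))   # first supremum in (4.22)
s2 = sup_on_grid(lambda t: t / (t - 1.0))           # second supremum in (4.22)
print(s1, 2 * math.exp(-1.5))   # the two numbers coincide: the maximum is at t = 2
print(s2)                       # 2.0
```

Because $2e^{-3/2}<2$, both multipliers in (4.21) are at most $2$, which is where the common factor $2$ in (4.23) comes from.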

Now we estimate the term $(1)$ in (4.23) as

$$\begin{aligned}
(1)&=\mathbb{E}\bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\bigg|\frac{\sqrt{\bar{U}_{h^{2}}}-\sqrt{\bar{U}_{h^{2}}^{(i)}}}{\sqrt{\bar{U}_{h^{2}}^{(i)}}}\bigg|\cdot\Big|\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\Big|\bigg]\\
&\leq\frac{\sqrt{2}}{\sigma_{h}}\mathbb{E}\bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\sqrt{\bar{U}_{h^{2}}-\bar{U}_{h^{2}}^{(i)}}\cdot\Big(\underbrace{\mathbb{E}\Big[\Big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\Big)^{2}\,\Big|\,\mathcal{X}\Big]}_{=U_{h^{2}}/\bar{U}_{h^{2}}\leq 1}\Big)^{1/2}\bigg]\\
&\leq\frac{\sqrt{2}}{\sigma_{h}}\mathbb{E}\bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\bigg({n\choose m}^{-1}\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}h^{2}({\bf X}_{\bf j})\bigg)^{1/2}\bigg]\quad\text{by Property 2.2(i) of censoring}\\
&\leq\frac{\sqrt{2}}{\sigma_{h}}\mathbb{E}\bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\bigg({n\choose m}^{-1}\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}\mathbb{E}[h^{2}({\bf X}_{\bf j})\mid X_{i}]\bigg)^{1/2}\bigg]\\
&=\frac{\sqrt{2m}\,\mathbb{E}[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\Psi_{1}^{1/2}(X_{i})]}{\sigma_{h}\sqrt{n}}\leq\frac{\sqrt{2m}\,\mathbb{E}[(\bar{\eta}_{i}+\mathbb{E}[\bar{\eta}_{i}])\Psi_{1}^{1/2}(X_{i})]}{\sigma_{h}\sqrt{n}}\leq\frac{2\sqrt{2}m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{n^{3/2}\sigma_{h}^{3}}.
\end{aligned}\tag{4.24}$$

To estimate the term $(2)$ in (4.23), we first write

$$\begin{aligned}
(2)&=\mathbb{E}\bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\bigg|\frac{\sqrt{\bar{U}_{h^{2}}}-\sqrt{\bar{U}_{h^{2}}^{(i)}}}{\sqrt{\bar{U}_{h^{2}}^{(i)}}}\bigg|\cdot\sqrt{\frac{\bar{U}_{h^{2}}}{\bar{U}_{h^{2}}^{(i)}}}\cdot\underbrace{\mathbb{E}\bigg[\bigg(\sum_{{\bf j}\in I_{n,m}}\frac{(Z_{\bf j}-p)h({\bf X}_{\bf j})}{\sqrt{\bar{U}_{h^{2}}N(1-p)}}\bigg)^{2}\,\bigg|\,\mathcal{X}\bigg]}_{=U_{h^{2}}/\bar{U}_{h^{2}}\leq 1}\bigg]\\
&\leq\frac{\sqrt{2}}{\sigma_{h}}\mathbb{E}\bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\sqrt{(\bar{U}_{h^{2}}-\bar{U}_{h^{2}}^{(i)})\bigg(1+\frac{\bar{U}_{h^{2}}-\bar{U}_{h^{2}}^{(i)}}{\bar{U}_{h^{2}}^{(i)}}\bigg)}\bigg]\quad\text{by }\sqrt{\bar{U}_{h^{2}}}-\sqrt{\bar{U}_{h^{2}}^{(i)}}\leq\sqrt{\bar{U}_{h^{2}}-\bar{U}_{h^{2}}^{(i)}}\\
&\leq\frac{\sqrt{2}}{\sigma_{h}}\mathbb{E}\bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\bigg(\sqrt{\bar{U}_{h^{2}}-\bar{U}_{h^{2}}^{(i)}}+\frac{\bar{U}_{h^{2}}-\bar{U}_{h^{2}}^{(i)}}{\sqrt{\bar{U}_{h^{2}}^{(i)}}}\bigg)\bigg],
\end{aligned}\tag{4.25}$$

where the last inequality uses $\sqrt{1+x}\leq 1+\sqrt{x}$ for $x\geq 0$ (square both sides), a consequence of the concavity of $\sqrt{\cdot}$. Since $U_{h^{2}}-U_{h^{2}}^{(i)}={n\choose m}^{-1}\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}h^{2}({\bf X}_{\bf j})$, continuing from (4.25), we have

$$\begin{aligned}
(2)&\leq\frac{\sqrt{2}}{\sigma_{h}}\mathbb{E}\bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\bigg(\sqrt{\frac{\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}h^{2}({\bf X}_{\bf j})}{{n\choose m}}}+\frac{\sqrt{2}\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}h^{2}({\bf X}_{\bf j})}{{n\choose m}\sigma_{h}}\bigg)\bigg]\quad\text{by Property 2.2(i)}\\
&\leq\frac{\sqrt{2}}{\sigma_{h}}\mathbb{E}\bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\bigg(\sqrt{\frac{\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}\mathbb{E}[h^{2}({\bf X}_{\bf j})\mid X_{i}]}{{n\choose m}}}+\frac{\sqrt{2}\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}\mathbb{E}[h^{2}({\bf X}_{\bf j})\mid X_{i}]}{{n\choose m}\sigma_{h}}\bigg)\bigg]\\
&\leq\frac{\sqrt{2}}{\sigma_{h}}\mathbb{E}\bigg[(\bar{\eta}_{i}+\mathbb{E}[\bar{\eta}_{i}])\cdot\bigg(\sqrt{\frac{m\Psi_{1}(X_{i})}{n}}+\frac{\sqrt{2}m\Psi_{1}(X_{i})}{n\sigma_{h}}\bigg)\bigg]\\
&\leq\frac{\sqrt{2}}{\sigma_{h}}\bigg\{\mathbb{E}\bigg[\bar{\eta}_{i}\bigg(\sqrt{\frac{m\Psi_{1}(X_{i})}{n}}+\frac{\sqrt{2}m\Psi_{1}(X_{i})}{n\sigma_{h}}\bigg)\bigg]+\frac{(1+\sqrt{2})m^{3/2}\mathbb{E}[\Psi_{1}]\mathbb{E}[\Psi_{1}^{1/2}]}{n^{3/2}\sigma_{h}^{2}}\bigg\}\quad\text{by }\mathbb{E}[\bar{\eta}_{i}]\leq\mathbb{E}[\bar{\eta}_{i}^{1/2}]\\
&\leq\frac{\sqrt{2}}{\sigma_{h}}\bigg\{\frac{(1+\sqrt{2})m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{n^{3/2}\sigma_{h}^{2}}+\frac{(1+\sqrt{2})m^{3/2}\mathbb{E}[\Psi_{1}]\mathbb{E}[\Psi_{1}^{1/2}]}{n^{3/2}\sigma_{h}^{2}}\bigg\}\quad\text{by }\bar{\eta}_{i}\leq\bar{\eta}_{i}^{1/2}\\
&\leq\frac{(4+2\sqrt{2})m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{n^{3/2}\sigma_{h}^{3}}\quad\text{by }\mathbb{E}[\Psi_{1}]\mathbb{E}[\Psi_{1}^{1/2}]\leq\mathbb{E}[\Psi_{1}^{3/2}].
\end{aligned}\tag{4.26}$$

Combining (4.23)-(4.26), we have proved the bound in (4.18).
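Three elementary inequalities carry the estimates (4.24)-(4.26): $\sqrt{a}-\sqrt{b}\leq\sqrt{a-b}$ for $a\geq b\geq 0$, $\sqrt{1+x}\leq 1+\sqrt{x}$ for $x\geq 0$, and $\mathbb{E}[\Psi_{1}]\mathbb{E}[\Psi_{1}^{1/2}]\leq\mathbb{E}[\Psi_{1}^{3/2}]$ for non-negative $\Psi_{1}$ (for instance via Hölder's inequality). The sketch below spot-checks them on random inputs; the lognormal samples are purely illustrative stand-ins for $\Psi_{1}(X_{1})$, and a random check is a sanity test, not a proof.

```python
import math
import random

rng = random.Random(0)

# (a) sqrt(a) - sqrt(b) <= sqrt(a - b) for a >= b >= 0 (second line of (4.25)).
# (b) sqrt(1 + x) <= 1 + sqrt(x) for x >= 0 (last line of (4.25)).
for _ in range(100_000):
    b = rng.uniform(0.0, 10.0)
    a = b + rng.uniform(0.0, 10.0)
    assert math.sqrt(a) - math.sqrt(b) <= math.sqrt(a - b) + 1e-12
    x = rng.expovariate(0.1)
    assert math.sqrt(1.0 + x) <= 1.0 + math.sqrt(x) + 1e-12

def mean(values):
    return sum(values) / len(values)

def hoelder_ok(sample) -> bool:
    """E[Psi] E[Psi^{1/2}] <= E[Psi^{3/2}] under the empirical law of `sample`."""
    lhs = mean(sample) * mean([math.sqrt(s) for s in sample])
    rhs = mean([s ** 1.5 for s in sample])
    return lhs <= rhs * (1.0 + 1e-12)

# Heavy-tailed, non-negative stand-ins for Psi_1(X_1) (hypothetical choice).
all_ok = all(hoelder_ok([rng.lognormvariate(0.0, 2.0) for _ in range(500)])
             for _ in range(200))
print("moment inequality held in every trial:", all_ok)
```

Empirical laws are themselves probability measures, so (c) must hold exactly for every sample; the small tolerance only absorbs floating-point rounding.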

4.2. Bound on the term in (4.19)

Let

$$\mathfrak{z}_{\mathcal{X}^{(i)}}\equiv\frac{z\sigma}{\sigma_{h}\sqrt{\alpha(1-p)}}-\frac{\sqrt{N}U_{n}^{(i)}}{\sigma_{h}\sqrt{1-p}},\tag{4.27}$$

where

$$U_{n}^{(i)}\equiv{n\choose m}^{-1}\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}h({\bf X}_{\bf j}).$$

First we write the equality

$$\begin{aligned}
\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}(\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}])f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}{}_{i}\xi_{\bf j}\Big)\Big]&=\underbrace{\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}^{(i)}}(\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}])f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}\Big(\sum_{{\bf j}\in I_{n,m}}{}_{i}\xi_{\bf j}\Big)\Big]}_{\equiv(3)}\\
&\quad+\underbrace{\mathbb{E}\Big[(\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}])\Big(\mathfrak{z}_{\mathcal{X}}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}{}_{i}\xi_{\bf j}\Big)-\mathfrak{z}_{\mathcal{X}^{(i)}}f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}\Big(\sum_{{\bf j}\in I_{n,m}}{}_{i}\xi_{\bf j}\Big)\Big)\Big]}_{\equiv(4)},
\end{aligned}\tag{4.28}$$

and bound $(3)$ and $(4)$ in turn. By the independence between $X_{i}$ and $\{X_{j}\}_{j\neq i}$, we have

$$\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}^{(i)}}(\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}])f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}\Big(\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}\Big)\bigg]=\underbrace{\mathbb{E}\big[\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]\big]}_{=0}\,\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}^{(i)}}f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}\Big(\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}\Big)\bigg],$$

so we can write $(3)$ as

$$\begin{aligned}
(3)&=\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}^{(i)}}(\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}])\bigg(f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}\Big(\sum_{{\bf j}\in I_{n,m}}{}_{i}\xi_{\bf j}\Big)-f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}\Big(\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}\Big)\bigg)\bigg]\quad\text{by the independence between }X_{i}\text{ and }\{X_{j}\}_{j\neq i}\\
&=\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}^{(i)}}(\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}])\int_{0}^{\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}{}_{i}\xi_{\bf j}}\mathbb{E}\bigg[f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}'\Big(\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}+t\Big)\,\bigg|\,\mathcal{X}\bigg]dt\bigg]\\
&\qquad\text{by the independence of the }Z_{\bf j}\text{'s and the fundamental theorem of calculus}.
\end{aligned}$$

From here, since both $\bar{\eta}_{i}$ and $\mathbb{E}[\bar{\eta}_{i}]$ are non-negative, we can immediately write

$$|(3)|\leq\mathbb{E}\Bigg[(\bar{\eta}_{i}+\mathbb{E}[\bar{\eta}_{i}])\int_{0}^{|\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}{}_{i}\xi_{\bf j}|}\mathbb{E}\bigg[\Big|\mathfrak{z}_{\mathcal{X}^{(i)}}f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}'\Big(\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}+t\Big)\Big|\,\bigg|\,\mathcal{X}\bigg]dt\Bigg].\tag{4.29}$$

We now bound the integrand $\mathbb{E}[|\mathfrak{z}_{\mathcal{X}^{(i)}}f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}'(\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}+t)|\mid\mathcal{X}]$ in (4.29). By $\sup_{w,z\in\mathbb{R}}|f_{z}'(w)|\leq 1$ from Lemma 4.1(i),

$$\mathbb{E}\bigg[\Big|\mathfrak{z}_{\mathcal{X}^{(i)}}f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}'\Big(\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}+t\Big)\Big|\cdot I(|\mathfrak{z}_{\mathcal{X}^{(i)}}|\leq 2)\,\bigg|\,\mathcal{X}\bigg]\leq 2.\tag{4.30}$$

Since $\sup_{z,w\in\mathbb{R}}|f_{z}'(w)|\leq 1$ by Lemma 4.1(i), and $|f_{z}'(w)|\leq e^{1/2-|z|}$ for any $|z|>2$ and $|w|\leq|z|-1$ by Lemma 4.1(ii) and (iii), for any $t\geq 0$ we have

$$\begin{aligned}
&\mathbb{E}\bigg[\Big|\mathfrak{z}_{\mathcal{X}^{(i)}}f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}'\big(\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}+t\big)\Big|\cdot I(|\mathfrak{z}_{\mathcal{X}^{(i)}}|>2)\,\bigg|\,\mathcal{X}\bigg]\\
&\leq\mathbb{E}\Big[|\mathfrak{z}_{\mathcal{X}^{(i)}}|e^{1/2-|\mathfrak{z}_{\mathcal{X}^{(i)}}|}\cdot I\Big(\big|\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}+t\big|\leq|\mathfrak{z}_{\mathcal{X}^{(i)}}|-1,\ |\mathfrak{z}_{\mathcal{X}^{(i)}}|>2\Big)\,\Big|\,\mathcal{X}\Big]\\
&\qquad+\mathbb{E}\Big[|\mathfrak{z}_{\mathcal{X}^{(i)}}|\cdot I\Big(\big|\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}+t\big|>|\mathfrak{z}_{\mathcal{X}^{(i)}}|-1,\ |\mathfrak{z}_{\mathcal{X}^{(i)}}|>2\Big)\,\Big|\,\mathcal{X}\Big]\\
&\leq\mathbb{E}\Big[|\mathfrak{z}_{\mathcal{X}^{(i)}}|e^{1/2-|\mathfrak{z}_{\mathcal{X}^{(i)}}|}\cdot I\Big(\big|\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}+t\big|\leq|\mathfrak{z}_{\mathcal{X}^{(i)}}|-1,\ |\mathfrak{z}_{\mathcal{X}^{(i)}}|>2\Big)\,\Big|\,\mathcal{X}\Big]\\
&\qquad+\mathbb{E}\bigg[|\mathfrak{z}_{\mathcal{X}^{(i)}}|\cdot\frac{|\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}|+t}{|\mathfrak{z}_{\mathcal{X}^{(i)}}|-1}\cdot I(|\mathfrak{z}_{\mathcal{X}^{(i)}}|>2)\,\bigg|\,\mathcal{X}\bigg]\\
&\leq 2+2\Big(\mathbb{E}\Big[\Big|\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}\Big|\,\Big|\,\mathcal{X}\Big]+t\Big)\quad\text{by (4.22)}\\
&\leq 2+2\bigg(\Big(\mathbb{E}\Big[\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}^{2}\,\Big|\,\mathcal{X}\Big]\Big)^{1/2}+t\bigg)\quad\text{by }\mathbb{E}\Big[\Big(\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}\Big)^{2}\,\Big|\,\mathcal{X}\Big]=\mathbb{E}\Big[\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}^{2}\,\Big|\,\mathcal{X}\Big]\\
&\leq 2+2(1+t)\quad\text{by }\mathbb{E}\Big[\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}^{2}\,\Big|\,\mathcal{X}\Big]=U_{h^{2}}^{(i)}/\bar{U}_{h^{2}}^{(i)}\leq 1.
\end{aligned}\tag{4.31}$$

Hence, from (4.30)-(4.31), we have

$$\mathbb{E}\bigg[\Big|\mathfrak{z}_{\mathcal{X}^{(i)}}f_{\mathfrak{z}_{\mathcal{X}^{(i)}}}'\big(\sum_{{\bf j}\in I_{n,m},\,i\notin{\bf j}}{}_{i}\xi_{\bf j}+t\big)\Big|\,\bigg|\,\mathcal{X}\bigg]\leq 6+2t;$$

plugging this latter fact into (4.29), we get

$$\begin{aligned}
|(3)|&\leq\mathbb{E}\bigg[(\bar{\eta}_{i}+\mathbb{E}[\bar{\eta}_{i}])\int_{0}^{|\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}{}_{i}\xi_{\bf j}|}(6+2t)\,dt\bigg]\\
&=\mathbb{E}\bigg[(\bar{\eta}_{i}+\mathbb{E}[\bar{\eta}_{i}])\bigg(6\Big|\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}{}_{i}\xi_{\bf j}\Big|+\Big(\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}{}_{i}\xi_{\bf j}\Big)^{2}\bigg)\bigg]\\
&\leq\mathbb{E}\bigg[(\bar{\eta}_{i}+\mathbb{E}[\bar{\eta}_{i}])\bigg(6\Big(\mathbb{E}\Big[\Big(\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}{}_{i}\xi_{\bf j}\Big)^{2}\,\Big|\,X_{i}\Big]\Big)^{1/2}+\mathbb{E}\Big[\Big(\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}{}_{i}\xi_{\bf j}\Big)^{2}\,\Big|\,X_{i}\Big]\bigg)\bigg]\\
&\leq\mathbb{E}\bigg[(\bar{\eta}_{i}+\mathbb{E}[\bar{\eta}_{i}])\bigg(6\bigg(\frac{2m\Psi_{1}(X_{i})}{n\sigma_{h}^{2}}\bigg)^{1/2}+\frac{2m\Psi_{1}(X_{i})}{n\sigma_{h}^{2}}\bigg)\bigg],
\end{aligned}\tag{4.32}$$

where (4.32) comes from evaluating and bounding the conditional expectation as

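$$\begin{aligned}
\mathbb{E}\Big[\Big(\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}{}_{i}\xi_{\bf j}\Big)^{2}\,\Big|\,X_{i}\Big]&=\mathbb{E}\Big[\mathbb{E}\Big[\Big(\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}{}_{i}\xi_{\bf j}\Big)^{2}\,\Big|\,\mathcal{X}\Big]\,\Big|\,X_{i}\Big]=\mathbb{E}\bigg[\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}\frac{h^{2}({\bf X}_{\bf j})}{{n\choose m}\bar{U}^{(i)}_{h^{2}}}\,\bigg|\,X_{i}\bigg]\\
&\leq\frac{2}{\sigma_{h}^{2}}\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}\frac{\mathbb{E}[h^{2}({\bf X}_{\bf j})\mid X_{i}]}{{n\choose m}}=\frac{2m\Psi_{1}(X_{i})}{n\sigma_{h}^{2}},
\end{aligned}$$
where the inequality uses $\bar{U}^{(i)}_{h^{2}}\geq\sigma_{h}^{2}/2$ (the same fact that gives the factor $\sqrt{2}/\sigma_{h}$ in (4.24)); note that $\bar{U}^{(i)}_{h^{2}}$ depends on $\{X_{j}\}_{j\neq i}$ and therefore stays inside the conditional expectation given $X_{i}$.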
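The conditional second moment above rests on the Bernoulli-sampling identity $\mathbb{E}[(\sum_{\bf j}(Z_{\bf j}-p)a_{\bf j})^{2}]=p(1-p)\sum_{\bf j}a_{\bf j}^{2}$ for independent $Z_{\bf j}\sim\mathrm{Bernoulli}(p)$. With $a_{\bf j}=h({\bf X}_{\bf j})/\sqrt{\bar{U}N(1-p)}$ and, as the display implicitly assumes, $p=N/{n\choose m}$, this becomes $\sum_{\bf j}h^{2}({\bf X}_{\bf j})/({n\choose m}\bar{U})$; the same computation gives $U_{h^{2}}/\bar{U}_{h^{2}}\leq 1$ in (4.24)-(4.25). The sketch checks the identity by exact enumeration of all $2^{K}$ sampling outcomes in a hypothetical toy setting ($K$ kernel values `h_vals` and a normalizer `u_bar` chosen at least as large as the mean of $h^{2}$).

```python
import itertools
import math

def second_moment_exact(a, p):
    """E[(sum_j (Z_j - p) a_j)^2] for i.i.d. Z_j ~ Bernoulli(p), enumerating all 2^K outcomes."""
    total = 0.0
    for z in itertools.product((0, 1), repeat=len(a)):
        prob = math.prod(p if zj else 1.0 - p for zj in z)
        s = sum((zj - p) * aj for zj, aj in zip(z, a))
        total += prob * s * s
    return total

# Hypothetical toy setting: K kernel values standing in for h(X_j), j in I_{n,m}.
h_vals = [0.3, -1.2, 2.0, 0.7, -0.4, 1.5, -2.2, 0.9, 0.1, -0.8]
K, N = len(h_vals), 4.0
p = N / K                                   # mimics p = N / C(n, m)
u_h2 = sum(h * h for h in h_vals) / K       # mimics U_{h^2}
u_bar = 1.25 * u_h2                         # any normalizer >= U_{h^2}
a = [h / math.sqrt(u_bar * N * (1.0 - p)) for h in h_vals]   # the xi_j weights

exact = second_moment_exact(a, p)
formula = sum(h * h for h in h_vals) / (K * u_bar)           # = U_{h^2} / u_bar <= 1
print(f"enumeration: {exact:.12f}, closed form: {formula:.12f}")
```

Exact enumeration is used instead of simulation so the comparison is deterministic; it is only practical for small $K$.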

Since $\mathbb{E}[\bar{\eta}_{i}]\leq\mathbb{E}[\bar{\eta}_{i}^{1/2}]$, we can further continue from (4.32) as

$$\begin{aligned}
|(3)|&\leq\mathbb{E}\bigg[\bar{\eta}_{i}\bigg(6\bigg(\frac{2m\Psi_{1}(X_{i})}{n\sigma_{h}^{2}}\bigg)^{1/2}+\frac{2m\Psi_{1}(X_{i})}{n\sigma_{h}^{2}}\bigg)\bigg]+\frac{(6\sqrt{2}+2)m^{3/2}\mathbb{E}[\Psi_{1}]\mathbb{E}[\Psi_{1}^{1/2}]}{n^{3/2}\sigma_{h}^{3}}\\
&\leq\frac{(6\sqrt{2}+2)m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{n^{3/2}\sigma_{h}^{3}}+\frac{(6\sqrt{2}+2)m^{3/2}\mathbb{E}[\Psi_{1}]\mathbb{E}[\Psi_{1}^{1/2}]}{n^{3/2}\sigma_{h}^{3}}\quad\text{by }\bar{\eta}_{i}\leq\bar{\eta}_{i}^{1/2}\\
&\leq\frac{(12\sqrt{2}+4)m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{n^{3/2}\sigma_{h}^{3}},
\end{aligned}\tag{4.33}$$

where the last inequality follows from $\mathbb{E}[\Psi_{1}]\mathbb{E}[\Psi_{1}^{1/2}]\leq\mathbb{E}[\Psi_{1}^{3/2}]$.

We now bound $(4)$ in (4.28). Using $\sup_{z,w\in\mathbb{R}}|f_{z}(w)|\leq 0.63$ from Lemma 4.1(i) and $|\mathfrak{z}_{\mathcal{X}}-\mathfrak{z}_{\mathcal{X}^{(i)}}|=\frac{\sqrt{N}}{\sigma_{h}\sqrt{1-p}}|U_{n}-U_{n}^{(i)}|$, we first write

$$\begin{aligned}
|(4)|&\leq\frac{1.26\sqrt{N}}{\sigma_{h}\sqrt{1-p}}\mathbb{E}\Big[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot|U_{n}-U_{n}^{(i)}|\Big]\\
&=\frac{1.26m\sqrt{N}}{\sigma_{h}n\sqrt{1-p}}\mathbb{E}\Bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\bigg|\frac{\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}h({\bf X}_{\bf j})}{{n-1\choose m-1}}\bigg|\Bigg]\\
&\leq\frac{1.26m\sqrt{N}}{\sigma_{h}n\sqrt{1-p}}\mathbb{E}\Bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\bigg(\frac{\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}h^{2}({\bf X}_{\bf j})}{{n-1\choose m-1}}\bigg)^{1/2}\Bigg]\\
&=\frac{1.26m\sqrt{N}}{\sigma_{h}n\sqrt{1-p}}\mathbb{E}\Bigg[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\mathbb{E}\bigg[\bigg(\frac{\sum_{{\bf j}\in I_{n,m},\,i\in{\bf j}}h^{2}({\bf X}_{\bf j})}{{n-1\choose m-1}}\bigg)^{1/2}\,\bigg|\,X_{i}\bigg]\Bigg]\\
&\leq\frac{1.26m\sqrt{N}}{\sigma_{h}n\sqrt{1-p}}\mathbb{E}\big[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\cdot\Psi_{1}^{1/2}(X_{i})\big]\\
&\leq\frac{1.26m\sqrt{N}}{\sigma_{h}n\sqrt{1-p}}\mathbb{E}\big[(\bar{\eta}_{i}+\mathbb{E}[\bar{\eta}_{i}])\cdot\Psi_{1}^{1/2}(X_{i})\big]\\
&=\frac{1.26m\sqrt{N}}{\sigma_{h}n\sqrt{1-p}}\Big(\mathbb{E}[\bar{\eta}_{i}\Psi_{1}^{1/2}(X_{i})]+\mathbb{E}[\bar{\eta}_{i}]\mathbb{E}[\Psi_{1}^{1/2}]\Big)\\
&\leq\frac{1.26m\sqrt{N}}{\sigma_{h}n\sqrt{1-p}}\bigg(\frac{m\mathbb{E}[\Psi_{1}^{3/2}]}{n\sigma_{h}^{2}}+\frac{m\mathbb{E}[\Psi_{1}]\mathbb{E}[\Psi_{1}^{1/2}]}{n\sigma_{h}^{2}}\bigg)\\
&\leq\frac{2.52m^{2}\sqrt{N}\mathbb{E}[\Psi_{1}^{3/2}]}{\sigma_{h}^{3}n^{2}\sqrt{1-p}}.
\end{aligned}\tag{4.34}$$

Combining (4.28), (4.33) and (4.34), and summing over $i=1,\dots,n$, gives (4.19).
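The last step is pure bookkeeping: multiplying the per-index bounds (4.33) and (4.34) by $n$ and factoring out $m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]/(n^{1/2}\sigma_{h}^{3})$ should reproduce the right-hand side of (4.19). The sketch below confirms the algebra on random parameter values; the function names are hypothetical and the inputs are arbitrary positive numbers, since the identity does not depend on how $p$, $N$ and $n$ are linked.

```python
import math
import random

def summed_per_index_bounds(m, n, N, p, sigma_h, e_psi32):
    """n times the sum of the per-index bounds (4.33) and (4.34)."""
    b433 = (12 * math.sqrt(2) + 4) * m ** 1.5 * e_psi32 / (n ** 1.5 * sigma_h ** 3)
    b434 = 2.52 * m ** 2 * math.sqrt(N) * e_psi32 / (sigma_h ** 3 * n ** 2 * math.sqrt(1 - p))
    return n * (b433 + b434)

def bound_419(m, n, N, p, sigma_h, e_psi32):
    """Right-hand side of (4.19)."""
    lead = m ** 1.5 * e_psi32 / (n ** 0.5 * sigma_h ** 3)
    return lead * (12 * math.sqrt(2) + 4 + 2.52 * math.sqrt(m * N) / math.sqrt(n * (1 - p)))

rng = random.Random(1)
max_rel_gap = 0.0
for _ in range(10_000):
    # Arbitrary positive inputs: the identity is purely algebraic.
    args = (rng.randint(2, 5), rng.randint(20, 10_000), rng.uniform(1.0, 5e4),
            rng.uniform(0.0, 0.9), rng.uniform(0.1, 5.0), rng.uniform(0.1, 50.0))
    lhs, rhs = summed_per_index_bounds(*args), bound_419(*args)
    max_rel_gap = max(max_rel_gap, abs(lhs - rhs) / rhs)
print(f"largest relative discrepancy: {max_rel_gap:.1e}")
```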

Appendix A Proof of Lemma 2.4

Because $\Phi(t)=1-\Phi(-t)$ for all $t\in\mathbb{R}$, we can assume without loss of generality that $z\geq 0$. To simplify notation, we let

$$\tilde{\sigma}^{2}\equiv m^{2}\sigma_{g}^{2}+\alpha(1-p)\sigma_{h}^{2},$$

so from the definition in (2.17),

$$\varepsilon_{z}=\Big|\Phi\Big(\frac{\sigma z}{\tilde{\sigma}}\Big)-\Phi(z)\Big|.$$

Using the assumption that $z\geq 0$, we get

$$\begin{aligned}
\varepsilon_{z}&\leq\phi(z)z\Big|\frac{\sigma}{\tilde{\sigma}}-1\Big|\quad\text{by the mean value theorem around }z\text{, since }\tilde{\sigma}\leq\sigma\text{ and }\phi\text{ is decreasing on }[0,\infty)\\
&\leq\frac{\exp(-0.5)}{\sqrt{2\pi}}\bigg|\frac{\sigma-\tilde{\sigma}}{\tilde{\sigma}}\bigg|\quad\text{by }\sup_{z\geq 0}z\phi(z)=\frac{\exp(-0.5)}{\sqrt{2\pi}}\\
&=\frac{\exp(-0.5)}{\sqrt{2\pi}}\bigg|\frac{\sigma^{2}-\tilde{\sigma}^{2}}{\tilde{\sigma}(\sigma+\tilde{\sigma})}\bigg|\\
&=\frac{\exp(-0.5)}{\sqrt{2\pi}}\bigg(\frac{n{n\choose m}^{-1}\sigma_{h}^{2}}{\tilde{\sigma}(\sigma+\tilde{\sigma})}\bigg)\\
&\leq\frac{\exp(-0.5)}{2\sqrt{2\pi}}\bigg(\frac{n{n\choose m}^{-1}\sigma_{h}^{2}}{\tilde{\sigma}^{2}}\bigg)\quad\text{by }\tilde{\sigma}\leq\sigma\\
&\leq\frac{\exp(-0.5)N}{\sqrt{2\pi}(n-1)n(1-p)},
\end{aligned}$$

where the last inequality uses $\alpha(1-p)\sigma_{h}^{2}\leq\tilde{\sigma}^{2}$ and

$$\frac{n}{{n\choose m}}=\frac{m\cdots 1}{(n-1)\cdots(n-(m-1))}\leq\frac{2}{n-1}\quad\text{as a consequence of }2\leq m<n/2.$$

This essentially proves Lemma 2.4.
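The proof of Lemma 2.4 relies on three elementary facts: $\sup_{z\geq 0}z\phi(z)=e^{-1/2}/\sqrt{2\pi}$ (attained at $z=1$), the mean value bound $\Phi(cz)-\Phi(z)\leq z\phi(z)(c-1)$ for $c=\sigma/\tilde{\sigma}\geq 1$ and $z\geq 0$, and $n/{n\choose m}\leq 2/(n-1)$ for $2\leq m<n/2$. The Python sketch below spot-checks each one, the last in exact rational arithmetic; the grids and the sample values of $c$ are arbitrary illustrative choices, so this is a sanity check rather than a proof.

```python
import math
from fractions import Fraction

def phi(x: float) -> float:
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

# (a) sup_{z >= 0} z phi(z) = exp(-1/2)/sqrt(2 pi), attained at z = 1.
sup_zphi = max(z * phi(z) for z in (i / 10_000 for i in range(100_001)))
target = math.exp(-0.5) / math.sqrt(2.0 * math.pi)

# (b) Mean value step: Phi(c z) - Phi(z) <= z phi(z) (c - 1) for c > 1, z > 0.
#     Upper tails are differenced to avoid cancellation near Phi = 1.
worst_mvt = max((Phi(-z) - Phi(-c * z)) / (z * phi(z) * (c - 1.0))
                for z in (i / 50 for i in range(1, 301))
                for c in (1.01, 1.1, 1.5, 2.0, 5.0))

# (c) n / C(n, m) <= 2 / (n - 1) whenever 2 <= m < n / 2, checked exactly.
ratio_ok = all(Fraction(n, math.comb(n, m)) <= Fraction(2, n - 1)
               for n in range(5, 200) for m in range(2, n) if 2 * m < n)

print(sup_zphi, target, worst_mvt, ratio_ok)
```

Equality in (c) occurs at $m=2$, which is why the constant $2$ cannot be improved in general.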

Appendix B Proof of Lemma 3.1

We need the following version of Bennett’s inequality from Leung et al. (2024).

Lemma B.1 (Bennett’s inequality for a sum of censored random variables).

Let $\xi_{1},\dots,\xi_{n}$ be independent random variables with $\mathbb{E}[\xi_{i}]=0$ for all $i=1,\dots,n$ and $\sum_{i=1}^{n}\mathbb{E}[\xi_{i}^{2}]\leq 1$, and define $\bar{\xi}_{i}=\xi_{i}I(|\xi_{i}|\leq 1)+I(\xi_{i}>1)-I(\xi_{i}<-1)$. Then

$$\mathbb{E}\big[e^{\sum_{i=1}^{n}\bar{\xi}_{i}}\big]\leq\exp\big(4^{-1}(e^{2}+1)\big)\leq 8.15.$$
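Lemma B.1 is quoted from Leung et al. (2024). As a hedged illustration only, the sketch below evaluates the constant $\exp((e^{2}+1)/4)$ and runs a small Monte Carlo experiment for one hypothetical choice of summands, $\xi_{i}=(Y_{i}-1)/\sqrt{n}$ with $Y_{i}\sim\mathrm{Exp}(1)$, which satisfies $\mathbb{E}[\xi_{i}]=0$ and $\sum_{i}\mathbb{E}[\xi_{i}^{2}]=1$ exactly, while large $Y_{i}$ trigger the censoring at $\pm 1$. A single example cannot confirm the lemma; it only shows the bound is comfortably non-binding here.

```python
import math
import random

const = math.exp((math.e ** 2 + 1.0) / 4.0)
print(f"exp((e^2 + 1)/4) = {const:.4f}")   # slightly below 8.15

def censor(x: float) -> float:
    """xi_bar: equals xi on {|xi| <= 1}, +1 on {xi > 1}, -1 on {xi < -1}."""
    return max(-1.0, min(1.0, x))

# Hypothetical summands: xi_i = (Y_i - 1)/sqrt(n), Y_i ~ Exp(1).
rng = random.Random(2024)
n, reps = 20, 20_000
scale = 1.0 / math.sqrt(n)
mc = sum(math.exp(sum(censor((rng.expovariate(1.0) - 1.0) * scale) for _ in range(n)))
         for _ in range(reps)) / reps
print(f"Monte Carlo estimate of E[exp(sum xi_bar)] ~ {mc:.3f}")
```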

Recalling $\xi_{\bf i}$ and $\bar{\xi}_{\bf i}$ in (3.3) and (3.16), we first let

$$W\equiv\sum_{{\bf i}\in I_{n,m}}\xi_{\bf i}\quad\text{and}\quad\bar{W}\equiv\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}.\tag{B.1}$$

The algebraic inequalities

$$1+s/2-s^{2}/2\leq(1+s)^{1/2}\leq 1+s/2\quad\text{for all }s\geq-1,\tag{B.2}$$

lead to the following two sets of event inclusions, depending on the sign of $\mathfrak{z}_{\mathcal{X}}$:

B.0.1. If $\mathfrak{z}_{\mathcal{X}}\geq 0$:

In this case we have

$$\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}-\frac{D_{2}^{2}}{2}\Big)\leq\mathfrak{z}_{\mathcal{X}}(1+D_{2})^{1/2}\leq\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}\Big)$$

via (B.2), which in turn implies

$$\begin{aligned}
\{T_{SN}>\mathfrak{z}_{\mathcal{X}},\ \mathfrak{z}_{\mathcal{X}}\geq 0\}\subset{}&\bigg\{W+D_{1}>\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}\Big),\ \mathfrak{z}_{\mathcal{X}}\geq 0\bigg\}\\
&\cup\bigg\{\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}-\frac{D_{2}^{2}}{2}\Big)<W+D_{1}\leq\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}\Big),\ \mathfrak{z}_{\mathcal{X}}\geq 0\bigg\}
\end{aligned}$$

and

$$\{T_{SN}>\mathfrak{z}_{\mathcal{X}},\ \mathfrak{z}_{\mathcal{X}}\geq 0\}\supset\bigg\{W+D_{1}>\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}\Big),\ \mathfrak{z}_{\mathcal{X}}\geq 0\bigg\}.$$

B.0.2. If $\mathfrak{z}_{\mathcal{X}}<0$:

In this case we have

$$\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}\Big)\leq\mathfrak{z}_{\mathcal{X}}(1+D_{2})^{1/2}\leq\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}-\frac{D_{2}^{2}}{2}\Big)$$

via (B.2), which in turn implies

$$\begin{aligned}
\{T_{SN}\leq\mathfrak{z}_{\mathcal{X}},\ \mathfrak{z}_{\mathcal{X}}<0\}\subset{}&\bigg\{W+D_{1}\leq\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}\Big),\ \mathfrak{z}_{\mathcal{X}}<0\bigg\}\\
&\cup\bigg\{\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}\Big)<W+D_{1}\leq\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}-\frac{D_{2}^{2}}{2}\Big),\ \mathfrak{z}_{\mathcal{X}}<0\bigg\}
\end{aligned}$$

and

$$\{T_{SN}\leq\mathfrak{z}_{\mathcal{X}},\ \mathfrak{z}_{\mathcal{X}}<0\}\supset\bigg\{W+D_{1}\leq\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}\Big),\ \mathfrak{z}_{\mathcal{X}}<0\bigg\}.$$
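The inequalities (B.2) and the event inclusions of Sections B.0.1 and B.0.2 can be spot-checked numerically. The sketch below assumes, as those displays indicate, that $T_{SN}=(W+D_{1})/\sqrt{1+D_{2}}$ with $D_{2}>-1$; the variables `num`, `d2` and `z` are hypothetical stand-ins for $W+D_{1}$, $D_{2}$ and $\mathfrak{z}_{\mathcal{X}}$, drawn at random. It first checks (B.2) on a grid, then counts draws where an "inner" event occurs without the target event, or the target event occurs outside the union on the right-hand side.

```python
import math
import random

# Part 1: (B.2) on a dense grid of s in [-1, 99].
def b2_holds(s: float, tol: float = 1e-12) -> bool:
    """Check 1 + s/2 - s^2/2 <= sqrt(1 + s) <= 1 + s/2 at one point s >= -1."""
    root = math.sqrt(1.0 + s)
    return (1.0 + s / 2.0 - s * s / 2.0 <= root + tol) and (root <= 1.0 + s / 2.0 + tol)

b2_ok = all(b2_holds(-1.0 + i / 1000.0) for i in range(100_001))

# Part 2: random check of the inclusions, under the assumed form of T_SN.
rng = random.Random(7)
violations = 0
for _ in range(100_000):
    z = rng.uniform(-5.0, 5.0)           # stands in for z_X
    num = rng.uniform(-10.0, 10.0)       # stands in for W + D_1
    d2 = rng.uniform(-0.999, 3.0)        # D_2 > -1 keeps the normalizer positive
    t_sn = num / math.sqrt(1.0 + d2)     # assumed form of T_SN
    up = z * (1.0 + d2 / 2.0)
    low = z * (1.0 + d2 / 2.0 - d2 * d2 / 2.0)
    if z >= 0:                           # Section B.0.1
        event = t_sn > z
        inner = num > up
        outer = inner or (low < num <= up)
    else:                                # Section B.0.2
        event = t_sn <= z
        inner = num <= up
        outer = inner or (up < num <= low)
    if (inner and not event) or (event and not outer):
        violations += 1
print("(B.2) holds on the grid:", b2_ok, "| inclusion violations:", violations)
```

The lower bound in (B.2) is attained with equality at $s=-1$, and both bounds at $s=0$, which is why the check uses a tiny tolerance.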

For the rest of this section, we will use the sign function

$$\text{sgn}(z)\equiv I(z>0)-I(z<0)$$

for any $z\in\mathbb{R}$. The conclusions in Sections B.0.1 and B.0.2, along with the identity

$$I\big(W+D_{1}\leq\mathfrak{z}_{\mathcal{X}}(1+D_{2}/2)\big)-\Phi(\mathfrak{z}_{\mathcal{X}})=\bar{\Phi}(\mathfrak{z}_{\mathcal{X}})-I\big(W+D_{1}>\mathfrak{z}_{\mathcal{X}}(1+D_{2}/2)\big),$$

allow us to write

$$\begin{aligned}
&\big|P(T_{SN}\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]\big|\\
&=\bigg|\mathbb{E}\Big[\Big(I(T_{SN}\leq\mathfrak{z}_{\mathcal{X}})-\Phi(\mathfrak{z}_{\mathcal{X}})\Big)I(\mathfrak{z}_{\mathcal{X}}<0)\Big]-\mathbb{E}\Big[\Big(I(T_{SN}>\mathfrak{z}_{\mathcal{X}})-\bar{\Phi}(\mathfrak{z}_{\mathcal{X}})\Big)I(\mathfrak{z}_{\mathcal{X}}\geq 0)\Big]\bigg|\\
&\leq\sum_{k=1}^{2}P(|D_{k}|>1/2)+\frac{2^{3/2}\mathbb{E}[|h|^{3}](1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}}+\big|P(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]\big|\\
&\quad+P\bigg(\text{sgn}(\mathfrak{z}_{\mathcal{X}})\,\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}-\frac{D_{2}^{2}}{2}\Big)\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})(W+D_{1})\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})\,\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}\Big)\bigg),
\end{aligned}\tag{B.3}$$

where we have also defined

$$\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\equiv\mathfrak{z}_{\mathcal{X}}\bar{D}_{2}/2-\bar{D}_{1}\tag{B.4}$$

and employed the fact

$$P\Big(\max_{{\bf i}\in I_{n,m}}|\xi_{\bf i}|>1\Big)\leq\sum_{{\bf i}\in I_{n,m}}\mathbb{E}\big[\mathbb{E}[|\xi_{\bf i}|^{3}\mid\mathcal{X}]\big]\leq\frac{2^{3/2}\mathbb{E}[|h|^{3}](1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}}$$

by the union bound, Markov's inequality, the conditional third moment calculations in (3.4) and $\sigma_{h}^{2}/2\leq\bar{U}_{h^{2}}$. On the other hand, since $f_{z}$ solves the Stein equation in (3.19), one can write

$$P(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]=F+\mathbb{E}\big[\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W})\big],\tag{B.5}$$

where

$$\begin{aligned}
F&\equiv\mathbb{E}\big[I(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\leq\mathfrak{z}_{\mathcal{X}})-\Phi(\mathfrak{z}_{\mathcal{X}})-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W})\big]\\
&=\mathbb{E}\Big[f_{\mathfrak{z}_{\mathcal{X}}}'(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}})-\bar{W}f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}})+\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\Big(f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}})-f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W})\Big)\Big].
\end{aligned}\tag{B.6}$$

Hence, by the definition of $\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}$ and the property $0<f_{z}\leq 0.63$ (Lemma 4.1(i)), (B.5) yields

$$\big|P(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\leq\mathfrak{z}_{\mathcal{X}})-\mathbb{E}[\Phi(\mathfrak{z}_{\mathcal{X}})]\big|\leq|F|+0.63\,\mathbb{E}[|\bar{D}_{1}|]+\bigg|\frac{1}{2}\mathbb{E}\big[\mathfrak{z}_{\mathcal{X}}\bar{D}_{2}f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W})\big]\bigg|.\tag{B.7}$$
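The factor $1-2p+2p^{2}$ in the bound on $P(\max_{\bf i}|\xi_{\bf i}|>1)$ used for (B.3) comes from the third absolute central moment of a Bernoulli variable, $\mathbb{E}|Z-p|^{3}=p(1-p)^{3}+(1-p)p^{3}=p(1-p)(1-2p+2p^{2})$. The sketch verifies this identity exactly with rationals and then, assuming $\xi_{\bf i}=(Z_{\bf i}-p)h({\bf X}_{\bf i})/\sqrt{\bar{U}_{h^{2}}N(1-p)}$ (the form displayed in (4.25)) and $p=N/{n\choose m}$, checks that the summed conditional third moments collapse to $(1-2p+2p^{2})\,\overline{|h|^{3}}/(\bar{U}_{h^{2}}^{3/2}\sqrt{N(1-p)})$; replacing $\bar{U}_{h^{2}}^{-3/2}$ by $2^{3/2}/\sigma_{h}^{3}$ then gives the displayed bound. The kernel values are hypothetical.

```python
import math
from fractions import Fraction

def third_abs_central_moment(p):
    """E|Z - p|^3 for Z ~ Bernoulli(p): atoms 1 - p (prob. p) and -p (prob. 1 - p)."""
    return p * (1 - p) ** 3 + (1 - p) * p ** 3

def identity_holds(p) -> bool:
    return third_abs_central_moment(p) == p * (1 - p) * (1 - 2 * p + 2 * p * p)

identity_ok = all(identity_holds(Fraction(k, 100)) for k in range(1, 100))

def sum_third_moments(h_vals, N, u_bar):
    """sum_j E[|xi_j|^3 | X] with xi_j = (Z_j - p) h_j / sqrt(u_bar N (1 - p)), p = N / K."""
    p = N / len(h_vals)
    m3 = third_abs_central_moment(p)
    return sum(m3 * abs(h) ** 3 / (u_bar * N * (1 - p)) ** 1.5 for h in h_vals)

def closed_form(h_vals, N, u_bar):
    """(1 - 2p + 2p^2) * mean|h|^3 / (u_bar^{3/2} sqrt(N (1 - p)))."""
    p = N / len(h_vals)
    mean_abs3 = sum(abs(h) ** 3 for h in h_vals) / len(h_vals)
    return (1 - 2 * p + 2 * p * p) * mean_abs3 / (u_bar ** 1.5 * math.sqrt(N * (1 - p)))

h_vals = [0.3, -1.2, 2.0, 0.7, -0.4, 1.5, -2.2, 0.9]   # hypothetical kernel values
lhs, rhs = sum_third_moments(h_vals, 3.0, 1.4), closed_form(h_vals, 3.0, 1.4)
print(identity_ok, lhs, rhs)
```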

Hence, to prove Lemma 3.1, it remains to establish (B.8) and (B.9) below:

$$\begin{aligned}
&P\bigg(\text{sgn}(\mathfrak{z}_{\mathcal{X}})\,\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}-\frac{D_{2}^{2}}{2}\Big)\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})(W+D_{1})\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})\,\mathfrak{z}_{\mathcal{X}}\Big(1+\frac{D_{2}}{2}\Big)\bigg)\\
&\leq C\Bigg\{\mathbb{E}[\bar{D}_{2}^{2}]+\frac{\mathbb{E}[|h|^{3}](1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}}+\frac{\sum_{{\bf i}\in I_{n,m}}\mathbb{E}\Big[|h({\bf X}_{\bf i})|\sqrt{\mathbb{E}[(\bar{D}_{1}-\bar{D}_{1}^{({\bf i})})^{2}\mid\mathcal{X}]}\Big]}{\sigma_{h}\sqrt{{n\choose m}}}\Bigg\}\\
&\quad+\sum_{k=1}^{2}P(|D_{k}|>1/2)+P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})
\end{aligned}\tag{B.8}$$

and

$$
\begin{aligned}
|F|\leq C\Bigg\{&\frac{\mathbb{E}[|h|^{3}](1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}}+\mathbb{E}[\bar{D}^{2}_{2}]\\
&+\mathbb{E}\Bigg[\sqrt{\mathbb{E}[\bar{D}_{1}^{2}\mid\mathcal{X}]}+\frac{\sum_{\mathbf{i}\in I_{n,m}}|h(\mathbf{X}_{\mathbf{i}})|\sqrt{\mathbb{E}[(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})^{2}\mid\mathcal{X}]}}{\sigma_{h}\sqrt{\binom{n}{m}}}\Bigg]+P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})\Bigg\}.
\end{aligned}
\tag{B.9}
$$

Combining (B.3), (B.7)-(B.9) and $\mathbb{E}[|\bar{D}_{1}|]\leq\mathbb{E}\big[\sqrt{\mathbb{E}[\bar{D}_{1}^{2}\mid\mathcal{X}]}\big]$ (conditional Jensen's inequality), we establish Lemma 3.1.

In what follows, we prove (B.8) and (B.9) in turn, skipping some details. We can do this because, as discussed in (3.12), $T_{SN}$ is an example of the Studentized nonlinear statistic in (3.11) when conditioned on $\mathcal{X}$, for any

$$
\mathcal{X}\in\mathcal{E}_{1,\mathcal{X}},
\tag{B.10}
$$

so a good part of the results already developed in Leung et al. (2024) can be borrowed directly. For the skipped details, we point the reader to the exact sections of Leung et al. (2024), and we focus on the derivations not covered there. In the framework of Leung et al. (2024), for each $\mathbf{i}\in I_{n,m}$ one also has to define a “leave-one-out” version of the remainder $D_{2}$ in the normalizer of $T_{SN}$, akin to $D_{1}^{(\mathbf{i})}$ and $\bar{D}_{1}^{(\mathbf{i})}$ in (3.15) and (3.17). However, unlike $D_{1}^{(\mathbf{i})}$ and $\bar{D}_{1}^{(\mathbf{i})}$, $D_{2}$ does not depend on the Bernoulli samplers in $\mathcal{Z}$ indexed by $\mathbf{i}\in I_{n,m}$. Hence we can trivially take the “leave-one-out” versions of $D_{2}$ and $\bar{D}_{2}$ to be

$$
D_{2}^{(\mathbf{i})}\equiv D_{2}\quad\text{and}\quad\bar{D}_{2}^{(\mathbf{i})}\equiv\bar{D}_{2}.
\tag{B.11}
$$

We also define the leave-one-out version of $\bar{W}$ in (B.1) as

$$
\bar{W}^{(\mathbf{i})}=\sum_{\mathbf{j}\in I_{n,m},\,\mathbf{j}\neq\mathbf{i}}\bar{\xi}_{\mathbf{j}}.
\tag{B.12}
$$

Moreover, the following four inequalities will supplement the results developed in Leung et al. (2024). The first is

$$
\begin{aligned}
&\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[\xi_{\mathbf{i}}^{2}I(|\xi_{\mathbf{i}}|>1)\mid\mathcal{X}]+\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[|\xi_{\mathbf{i}}|^{3}I(|\xi_{\mathbf{i}}|\leq 1)\mid\mathcal{X}]\\
&\quad\leq\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[|\xi_{\mathbf{i}}|^{3}\mid\mathcal{X}]=\frac{U_{|h|^{3}}(1-2p+2p^{2})}{\bar{U}_{h^{2}}^{3/2}\sqrt{N(1-p)}}\leq\frac{2^{3/2}U_{|h|^{3}}(1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}},
\end{aligned}
\tag{B.13}
$$

which follows from the third moment calculation in (3.4) and $\sigma_{h}^{2}/2\leq\bar{U}_{h^{2}}$. Next is the Cauchy-Schwarz inequality

$$
\begin{aligned}
\mathbb{E}\Big[|\bar{\xi}_{\mathbf{i}}e^{\pm\bar{W}^{(\mathbf{i})}/2}(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})|\mid\mathcal{X}\Big]&\leq\sqrt{\mathbb{E}[e^{\pm\bar{W}^{(\mathbf{i})}}\mid\mathcal{X}]\,\mathbb{E}[\bar{\xi}_{\mathbf{i}}^{2}\mid\mathcal{X}]}\sqrt{\mathbb{E}[(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})^{2}\mid\mathcal{X}]}\\
&\leq C\frac{|h(\mathbf{X}_{\mathbf{i}})|\sqrt{\mathbb{E}[(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})^{2}\mid\mathcal{X}]}}{\sigma_{h}\sqrt{\binom{n}{m}}}\quad\text{for }\mathcal{X}\in\mathcal{E}_{1,\mathcal{X}},
\end{aligned}
\tag{B.14}
$$

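where the first inequality also uses the conditional independence of $\bar{\xi}_{\mathbf{i}}$ and $\bar{W}^{(\mathbf{i})}$ given $\mathcal{X}$. The last inequality uses the second moment calculation in (3.4), $\sigma_{h}^{2}/2\leq\bar{U}_{h^{2}}$ and Lemma B.1 in light of (3.13). The third is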

$$
\mathbb{E}[e^{\pm\bar{W}}\mid\mathcal{X}]\leq 8.15\quad\text{for }\mathcal{X}\in\mathcal{E}_{1,\mathcal{X}},
\tag{B.15}
$$

by virtue of Lemma B.1 in light of (3.13). The last one is

$$
\begin{aligned}
\mathbb{E}\Big[(1+e^{\bar{W}^{(\mathbf{i})}})|\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})}|\mid\mathcal{X}\Big]&\leq\Big(1+\sqrt{\mathbb{E}[e^{2\bar{W}^{(\mathbf{i})}}\mid\mathcal{X}]}\Big)\sqrt{\mathbb{E}[(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})^{2}\mid\mathcal{X}]}\\
&\leq C\sqrt{\mathbb{E}[(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})^{2}\mid\mathcal{X}]}\quad\text{for }\mathcal{X}\in\mathcal{E}_{1,\mathcal{X}},
\end{aligned}
\tag{B.16}
$$

where the first inequality is again the Cauchy-Schwarz inequality, and the second applies Lemma B.1 in light of (3.13).
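As a sanity check on the equality in (B.13), the following minimal Python sketch evaluates $\sum_{\mathbf{i}}\mathbb{E}[|\xi_{\mathbf{i}}|^{3}\mid\mathcal{X}]$ directly and compares it with the closed form. It assumes, as in the computations of Appendix C, that $\xi_{\mathbf{i}}=(Z_{\mathbf{i}}-p)h(\mathbf{X}_{\mathbf{i}})/\sqrt{\bar{U}_{h^{2}}N(1-p)}$ with $Z_{\mathbf{i}}\sim\text{Bernoulli}(p)$ and $N=p\binom{n}{m}$. The kernel values `h_vals`, the stand-in `u_bar` for $\bar{U}_{h^{2}}$, and all function names are hypothetical illustrations.

```python
from math import comb, isclose, sqrt


def centered_bernoulli_abs_moment(p: float, r: int) -> float:
    """E|Z - p|^r for Z ~ Bernoulli(p): |Z - p| = 1 - p w.p. p and p w.p. 1 - p."""
    return p * (1 - p) ** r + (1 - p) * p ** r


def third_moment_sum(h_vals, p: float, u_bar: float) -> float:
    """Sum over i of E[|xi_i|^3 | X] with xi_i = (Z_i - p) h_i / sqrt(u_bar * N * (1 - p))."""
    n_terms = len(h_vals)          # plays the role of C(n, m)
    big_n = p * n_terms            # expected budget N = p * C(n, m)
    scale = sqrt(u_bar * big_n * (1 - p))
    m3 = centered_bernoulli_abs_moment(p, 3)
    return sum(m3 * abs(h) ** 3 for h in h_vals) / scale ** 3


def closed_form(h_vals, p: float, u_bar: float) -> float:
    """U_{|h|^3} (1 - 2p + 2p^2) / (u_bar^{3/2} sqrt(N (1 - p))), the middle term of (B.13)."""
    n_terms = len(h_vals)
    big_n = p * n_terms
    u_abs3 = sum(abs(h) ** 3 for h in h_vals) / n_terms   # U_{|h|^3}
    return u_abs3 * (1 - 2 * p + 2 * p ** 2) / (u_bar ** 1.5 * sqrt(big_n * (1 - p)))


# One kernel value per index in I_{4,2}, since C(4, 2) = 6.
h_vals = [0.3, -1.2, 2.0, 0.7, -0.5, 1.1]
assert len(h_vals) == comb(4, 2)
for p in (0.1, 0.25, 0.5, 0.9):
    assert isclose(third_moment_sum(h_vals, p, 1.3), closed_form(h_vals, p, 1.3), rel_tol=1e-12)
```

The identity rests only on $\mathbb{E}|Z-p|^{3}=p(1-p)\big((1-p)^{2}+p^{2}\big)=p(1-p)(1-2p+2p^{2})$ and $p/N=1/\binom{n}{m}$.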

B.1. Proof of (B.8)

Given how $D_{2}^{(\mathbf{i})}$ and $\bar{D}_{2}^{(\mathbf{i})}$ are defined in (B.11), and since $\text{sgn}(\mathfrak{z}_{\mathcal{X}})\cdot\mathfrak{z}_{\mathcal{X}}\geq 0$, one can follow exactly the same arguments as in Leung et al. (2024, Appendix C.1) to show that, under condition (B.10),

$$
\begin{aligned}
&P\bigg(\text{sgn}(\mathfrak{z}_{\mathcal{X}})\cdot\mathfrak{z}_{\mathcal{X}}\cdot\bigg(1+\frac{D_{2}}{2}-\frac{D_{2}^{2}}{2}\bigg)\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})\cdot(W+D_{1})\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})\cdot\mathfrak{z}_{\mathcal{X}}\cdot\bigg(1+\frac{D_{2}}{2}\bigg)\mid\mathcal{X}\bigg)\\
&\quad\leq\sum_{k=1}^{2}P(|D_{k}|>1/2\mid\mathcal{X})+C\Bigg\{\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[\xi_{\mathbf{i}}^{2}I(|\xi_{\mathbf{i}}|>1)\mid\mathcal{X}]+\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[|\xi_{\mathbf{i}}|^{3}I(|\xi_{\mathbf{i}}|\leq 1)\mid\mathcal{X}]\\
&\qquad+\mathbb{E}\Big[\big(1+e^{\text{sgn}(\mathfrak{z}_{\mathcal{X}})\bar{W}}\big)\bar{D}_{2}^{2}\mid\mathcal{X}\Big]+\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}\big[|\bar{\xi}_{\mathbf{i}}e^{\text{sgn}(\mathfrak{z}_{\mathcal{X}})\bar{W}^{(\mathbf{i})}/2}(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})|\mid\mathcal{X}\big]\Bigg\},
\end{aligned}
$$

which, since $\bar{D}_{2}$ does not depend on $\mathcal{Z}$, can be further simplified by (B.13)-(B.15) to

$$
\begin{aligned}
&P\bigg(\text{sgn}(\mathfrak{z}_{\mathcal{X}})\cdot\mathfrak{z}_{\mathcal{X}}\cdot\bigg(1+\frac{D_{2}}{2}-\frac{D_{2}^{2}}{2}\bigg)\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})\cdot(W+D_{1})\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})\cdot\mathfrak{z}_{\mathcal{X}}\cdot\bigg(1+\frac{D_{2}}{2}\bigg)\mid\mathcal{X}\bigg)\\
&\quad\leq\sum_{k=1}^{2}P\bigg(|D_{k}|>\frac{1}{2}\mid\mathcal{X}\bigg)+C\Bigg\{\bar{D}_{2}^{2}+\frac{U_{|h|^{3}}(1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}}+\frac{\sum_{\mathbf{i}\in I_{n,m}}|h(\mathbf{X}_{\mathbf{i}})|\sqrt{\mathbb{E}[(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})^{2}\mid\mathcal{X}]}}{\sigma_{h}\sqrt{\binom{n}{m}}}\Bigg\}\\
&\qquad\text{for }\mathcal{X}\in\mathcal{E}_{1,\mathcal{X}}.
\end{aligned}
\tag{B.17}
$$

To form the bound on the marginal probability, we write

$$
\begin{aligned}
&P\bigg(\text{sgn}(\mathfrak{z}_{\mathcal{X}})\mathfrak{z}_{\mathcal{X}}\bigg(1+\frac{D_{2}}{2}-\frac{D_{2}^{2}}{2}\bigg)\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})(W+D_{1})\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})\mathfrak{z}_{\mathcal{X}}\bigg(1+\frac{D_{2}}{2}\bigg)\bigg)\leq P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})\\
&\quad+\mathbb{E}\bigg[P\bigg(\text{sgn}(\mathfrak{z}_{\mathcal{X}})\mathfrak{z}_{\mathcal{X}}\bigg(1+\frac{D_{2}}{2}-\frac{D_{2}^{2}}{2}\bigg)\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})(W+D_{1})\leq\text{sgn}(\mathfrak{z}_{\mathcal{X}})\mathfrak{z}_{\mathcal{X}}\bigg(1+\frac{D_{2}}{2}\bigg)\mid\mathcal{X}\bigg)I(\mathcal{E}_{1,\mathcal{X}})\bigg];
\end{aligned}
$$

applying (B.17) to the last display, together with the unbiasedness $\mathbb{E}[U_{|h|^{3}}]=\mathbb{E}[|h|^{3}]$, yields (B.8).

B.2. Proof of (B.9)

Let $\{Z_{\mathbf{i}}^{*}\}_{\mathbf{i}\in I_{n,m}}$ be independent copies of $\{Z_{\mathbf{i}}\}_{\mathbf{i}\in I_{n,m}}$. For each $\mathbf{i}\in I_{n,m}$, define

$$
D_{1,\mathbf{i}^{*}}=D_{1}\Big(\{Z_{\mathbf{j}}\}_{\mathbf{j}\in I_{n,m},\,\mathbf{j}\neq\mathbf{i}},Z_{\mathbf{i}}^{*};\mathcal{X}\Big),
$$

which is constructed in exactly the same way as $D_{1}$ in (3.5), except that $Z_{\mathbf{i}}$ is replaced by $Z_{\mathbf{i}}^{*}$ as an input. We also define its censored version

$$
\bar{D}_{1,\mathbf{i}^{*}}\equiv D_{1,\mathbf{i}^{*}}I\bigg(-\frac{1}{2}\leq D_{1,\mathbf{i}^{*}}\leq\frac{1}{2}\bigg)-\frac{1}{2}I\bigg(D_{1,\mathbf{i}^{*}}<-\frac{1}{2}\bigg)+\frac{1}{2}I\bigg(D_{1,\mathbf{i}^{*}}>\frac{1}{2}\bigg);
$$

note that $D_{1,\mathbf{i}^{*}}$ is not to be confused with the “leave-one-out” $D_{1}^{(\mathbf{i})}$ defined in (3.15). From these, we also define

$$
\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}}\equiv\mathfrak{z}_{\mathcal{X}}\bar{D}_{2}/2-\bar{D}_{1,\mathbf{i}^{*}},
$$

which is similar to $\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}$ in (B.4) but has $\bar{D}_{1}$ replaced by $\bar{D}_{1,\mathbf{i}^{*}}$. Moreover, with the censored $\bar{\xi}_{\mathbf{i}}$ from (3.16), we define the conditional K function

$$
\begin{aligned}
\bar{k}_{\mathbf{i}}(t)&\equiv\mathbb{E}\Big[\bar{\xi}_{\mathbf{i}}\big(I(0\leq t\leq\bar{\xi}_{\mathbf{i}})-I(\bar{\xi}_{\mathbf{i}}\leq t<0)\big)\mid\mathbf{X}_{\mathbf{i}}\Big]\\
&=\mathbb{E}\Big[\bar{\xi}_{\mathbf{i}}\big(I(0\leq t\leq\bar{\xi}_{\mathbf{i}})-I(\bar{\xi}_{\mathbf{i}}\leq t<0)\big)\mid\mathcal{X}\Big],
\end{aligned}
$$

which has the properties

$$
\begin{aligned}
&\int_{-\infty}^{\infty}\bar{k}_{\mathbf{i}}(t)\,dt=\int_{-1}^{1}\bar{k}_{\mathbf{i}}(t)\,dt=\mathbb{E}[\bar{\xi}_{\mathbf{i}}^{2}\mid\mathcal{X}]\quad\text{and}\\
&\int_{-\infty}^{\infty}|t|\bar{k}_{\mathbf{i}}(t)\,dt=\int_{-1}^{1}|t|\bar{k}_{\mathbf{i}}(t)\,dt=\frac{\mathbb{E}[|\bar{\xi}_{\mathbf{i}}|^{3}\mid\mathcal{X}]}{2}.
\end{aligned}
\tag{B.18}
$$
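Both properties in (B.18) can be confirmed exactly for a discrete censored variable, since $\bar{k}_{\mathbf{i}}$ is then piecewise constant. The sketch below assumes a hypothetical law for $\bar{\xi}_{\mathbf{i}}$ on $[-1,1]$ (given $\mathcal{X}$) and integrates $\bar{k}$ and $|t|\bar{k}$ piece by piece. The names `k_bar` and `k_integrals` are illustrative only.

```python
from math import isclose
from typing import List, Sequence, Tuple

# Hypothetical discrete law for xi_bar: (value, probability) pairs with |value| <= 1.
Dist = Sequence[Tuple[float, float]]


def k_bar(t: float, dist: Dist) -> float:
    """K function k(t) = E[xi (I(0 <= t <= xi) - I(xi <= t < 0))]."""
    total = 0.0
    for v, prob in dist:
        ind = (1.0 if 0 <= t <= v else 0.0) - (1.0 if v <= t < 0 else 0.0)
        total += prob * v * ind
    return total


def k_integrals(dist: Dist) -> Tuple[float, float]:
    """Exact integrals of k(t) and |t| k(t) over [-1, 1].

    k is constant between consecutive breakpoints {-1, 0, 1} together with the
    support of xi, and 0 is always a breakpoint, so no piece straddles 0.
    """
    cuts: List[float] = sorted({-1.0, 0.0, 1.0, *(v for v, _ in dist)})
    int_k = int_abs_t_k = 0.0
    for a, b in zip(cuts, cuts[1:]):
        c = k_bar(0.5 * (a + b), dist)                      # value of k on (a, b)
        int_k += c * (b - a)
        int_abs_t_k += c * (b * abs(b) - a * abs(a)) / 2   # c times integral of |t| over [a, b]
    return int_k, int_abs_t_k


dist = [(-0.8, 0.2), (-0.1, 0.3), (0.4, 0.4), (1.0, 0.1)]
int_k, int_abs_t_k = k_integrals(dist)
second = sum(prob * v ** 2 for v, prob in dist)        # E[xi^2]
third = sum(prob * abs(v) ** 3 for v, prob in dist)    # E[|xi|^3]
assert isclose(int_k, second, rel_tol=1e-12)
assert isclose(int_abs_t_k, third / 2, rel_tol=1e-12)
```

Note that the check does not require $\mathbb{E}[\bar{\xi}_{\mathbf{i}}\mid\mathcal{X}]=0$: the properties in (B.18) hold for any law supported in $[-1,1]$. The sketch also shows that $\bar{k}_{\mathbf{i}}\geq 0$.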

Now we begin to establish (B.9). From the definition of $F$ in (B.6), we write

$$
\begin{aligned}
F={}&\underbrace{\mathbb{E}\bigg[\bigg\{f_{\mathfrak{z}_{\mathcal{X}}}^{\prime}(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}})-\bar{W}f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}})+\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\Big(f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}})-f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W})\Big)\bigg\}\cdot I(\mathcal{E}_{1,\mathcal{X}})\bigg]}_{\equiv F_{1}}\\
&+\underbrace{\mathbb{E}\bigg[\Big\{I(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\leq\mathfrak{z}_{\mathcal{X}})-\Phi(\mathfrak{z}_{\mathcal{X}})-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W})\Big\}\cdot I(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})\bigg]}_{\equiv F_{2}}.
\end{aligned}
\tag{B.19}
$$

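Next, we further break down $F_{1}$ by noting two facts. First, by the conditional independence of $\bar{\xi}_{\mathbf{i}}$ and $(\bar{W}^{(\mathbf{i})},\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}})$ given $\mathcal{X}$, and by the fundamental theorem of calculus, one has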

$$
\begin{aligned}
&\mathbb{E}\bigg[\int_{-1}^{1}f_{\mathfrak{z}_{\mathcal{X}}}^{\prime}\Big(\bar{W}^{(\mathbf{i})}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}}+t\Big)\bar{k}_{\mathbf{i}}(t)\,dt\mid\mathcal{X}\bigg]\\
&\quad=\mathbb{E}\bigg[\bar{\xi}_{\mathbf{i}}\bigg\{f_{\mathfrak{z}_{\mathcal{X}}}\Big(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}}\Big)-f_{\mathfrak{z}_{\mathcal{X}}}\Big(\bar{W}^{(\mathbf{i})}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}}\Big)\bigg\}\mid\mathcal{X}\bigg].
\end{aligned}
\tag{B.20}
$$

Second, we have

$$
\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[\xi_{\mathbf{i}}^{2}\mid\mathcal{X}]=1\quad\text{under condition (B.10)}
\tag{B.21}
$$

as per the conditional moment calculations in (3.13). Combining the two observations (B.20) and (B.21) with the fact $\int_{-\infty}^{\infty}\bar{k}_{\mathbf{i}}(t)\,dt=\mathbb{E}[\bar{\xi}_{\mathbf{i}}^{2}\mid\mathcal{X}]$ from (B.18), one can write $F_{1}$ as

$$
F_{1}=\sum_{k=1}^{4}\mathbb{E}[F_{1k}I(\mathcal{E}_{1,\mathcal{X}})],
\tag{B.22}
$$

where

$$
F_{11}\equiv\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}\Bigg[\int_{-1}^{1}\bigg\{f_{\mathfrak{z}_{\mathcal{X}}}^{\prime}\big(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\big)-f_{\mathfrak{z}_{\mathcal{X}}}^{\prime}\big(\bar{W}^{(\mathbf{i})}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}}+t\big)\bigg\}\bar{k}_{\mathbf{i}}(t)\,dt\mid\mathcal{X}\Bigg],
$$
$$
\begin{aligned}
F_{12}\equiv{}&\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}\big[(\xi_{\mathbf{i}}^{2}-1)I(|\xi_{\mathbf{i}}|>1)\mid\mathcal{X}\big]\cdot\mathbb{E}\big[f_{\mathfrak{z}_{\mathcal{X}}}^{\prime}(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}})\mid\mathcal{X}\big]\\
&-\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}\big[\bar{\xi}_{\mathbf{i}}f_{\mathfrak{z}_{\mathcal{X}}}\big(\bar{W}^{(\mathbf{i})}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}}\big)\mid\mathcal{X}\big],
\end{aligned}
$$
$$
F_{13}\equiv-\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}\Big[\bar{\xi}_{\mathbf{i}}\Big\{f_{\mathfrak{z}_{\mathcal{X}}}\big(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\big)-f_{\mathfrak{z}_{\mathcal{X}}}\big(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}}\big)\Big\}\mid\mathcal{X}\Big]
$$

and

$$
F_{14}\equiv\mathbb{E}\Bigg[\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\int_{0}^{-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}}f_{\mathfrak{z}_{\mathcal{X}}}^{\prime}(\bar{W}+t)\,dt\mid\mathcal{X}\Bigg].
$$
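The identity (B.20) behind this decomposition can be checked numerically. By conditional independence, one may freeze the value $w$ of $\bar{W}^{(\mathbf{i})}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}}$ and average over $\bar{\xi}_{\mathbf{i}}$ alone. The sketch below uses the smooth test function $f=\sin$ (or $\exp$) in place of $f_{\mathfrak{z}_{\mathcal{X}}}$ and a hypothetical discrete law for $\bar{\xi}_{\mathbf{i}}$. It compares $\int_{-1}^{1}f'(w+t)\bar{k}(t)\,dt$, computed by Simpson's rule, with $\mathbb{E}[\bar{\xi}(f(w+\bar{\xi})-f(w))]$. All names are illustrative.

```python
import math
from typing import Callable, Sequence, Tuple

Dist = Sequence[Tuple[float, float]]  # hypothetical (value, probability) pairs, |value| <= 1


def k_bar(t: float, dist: Dist) -> float:
    """K function k(t) = E[xi (I(0 <= t <= xi) - I(xi <= t < 0))]."""
    return sum(
        prob * v * ((1.0 if 0 <= t <= v else 0.0) - (1.0 if v <= t < 0 else 0.0))
        for v, prob in dist
    )


def simpson(g: Callable[[float], float], a: float, b: float, n: int = 200) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = g(a) + g(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(a + j * step)
    return total * step / 3


def lhs(w: float, dist: Dist, f_prime: Callable[[float], float]) -> float:
    """Integral over [-1, 1] of f'(w + t) k(t) dt, integrated piece by piece."""
    cuts = sorted({-1.0, 0.0, 1.0, *(v for v, _ in dist)})
    total = 0.0
    for a, b in zip(cuts, cuts[1:]):
        c = k_bar(0.5 * (a + b), dist)  # k is constant on (a, b)
        total += c * simpson(lambda t: f_prime(w + t), a, b)
    return total


def rhs(w: float, dist: Dist, f: Callable[[float], float]) -> float:
    """E[xi (f(w + xi) - f(w))] with xi drawn from dist and w held fixed."""
    return sum(prob * v * (f(w + v) - f(w)) for v, prob in dist)


dist = [(-0.8, 0.2), (-0.1, 0.3), (0.4, 0.4), (1.0, 0.1)]
for w in (-1.3, 0.0, 0.7, 2.5):
    # Left side uses f' = cos, right side uses f = sin.
    assert abs(lhs(w, dist, math.cos) - rhs(w, dist, math.sin)) < 1e-9
```

The agreement follows from writing $f(w+\bar{\xi})-f(w)=\int_{0}^{\bar{\xi}}f'(w+t)\,dt$ and exchanging expectation and integral. This is the fundamental-theorem-of-calculus step quoted before (B.20).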

We now establish bounds for $F_{11}$, $F_{12}$, $F_{13}$ and $F_{14}$.

Given how we define $D_{2}^{(\mathbf{i})}$ and $\bar{D}_{2}^{(\mathbf{i})}$ in (B.11), we can borrow exactly the same arguments as in Leung et al. (2024, Appendix C.2). The terms $F_{11}$, $F_{13}$ and $F_{14}$ are completely analogous to the terms $R_{1}$, $R_{3}$ and $R_{4}$ appearing there, and under (B.10) all the assumptions for the (fairly laborious) proofs there are met. This yields the bounds

$$
\begin{aligned}
|F_{11}|\leq C\Bigg\{&\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[\xi_{\mathbf{i}}^{2}I(|\xi_{\mathbf{i}}|>1)\mid\mathcal{X}]+\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[|\xi_{\mathbf{i}}|^{3}I(|\xi_{\mathbf{i}}|\leq 1)\mid\mathcal{X}]\\
&+\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}\big[\bar{\xi}_{\mathbf{i}}^{2}\mid\mathcal{X}\big]\cdot\mathbb{E}\Big[(1+e^{\bar{W}^{(\mathbf{i})}})|\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})}|\mid\mathcal{X}\Big]+\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}\Big[|\bar{\xi}_{\mathbf{i}}e^{\bar{W}^{(\mathbf{i})}/2}(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})|\mid\mathcal{X}\Big]\Bigg\},
\end{aligned}
$$
$$
|F_{13}|\leq C\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}\Big[|\bar{\xi}_{\mathbf{i}}(1+e^{\bar{W}^{(\mathbf{i})}/2})(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})|\mid\mathcal{X}\Big]
$$

and

$$
|F_{14}|\leq C\Big\{\sqrt{\mathbb{E}[\bar{D}_{1}^{2}\mid\mathcal{X}]}+\bar{D}_{2}^{2}+\mathbb{E}[e^{\bar{W}}\bar{D}_{2}^{2}\mid\mathcal{X}]\Big\},\quad\text{all under (B.10).}
$$

Applying $\bar{U}_{h^{2}}\geq\sigma_{h}^{2}/2$, (B.13)-(B.16) and the second moment calculation in (3.4) to the above bounds for $F_{11}$, $F_{13}$ and $F_{14}$ further gives

$$
|F_{11}|\leq C\Bigg\{\frac{U_{|h|^{3}}(1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}}+\sum_{\mathbf{i}\in I_{n,m}}\frac{|h(\mathbf{X}_{\mathbf{i}})|\sqrt{\mathbb{E}[(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})^{2}\mid\mathcal{X}]}}{\sigma_{h}\sqrt{\binom{n}{m}}}\Bigg\},
\tag{B.23}
$$

$$
|F_{13}|\leq C\sum_{\mathbf{i}\in I_{n,m}}\frac{|h(\mathbf{X}_{\mathbf{i}})|\sqrt{\mathbb{E}[(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})^{2}\mid\mathcal{X}]}}{\sigma_{h}\sqrt{\binom{n}{m}}}
\tag{B.24}
$$

and

$$
|F_{14}|\leq C\Big\{\sqrt{\mathbb{E}[\bar{D}_{1}^{2}\mid\mathcal{X}]}+\bar{D}_{2}^{2}\Big\},\quad\text{all under (B.10)},
\tag{B.25}
$$

where for (B.23) we have also used $\mathbb{E}[\bar{\xi}_{\mathbf{i}}^{2}\mid\mathcal{X}]\leq\sqrt{\mathbb{E}[\bar{\xi}_{\mathbf{i}}^{2}\mid\mathcal{X}]}$, which holds because $|\bar{\xi}_{\mathbf{i}}|\leq 1$.

For $F_{12}$, the bound

$$
|F_{12}|\leq C\frac{U_{|h|^{3}}(1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}}
\tag{B.26}
$$

can be established as follows. First, using $|f_{\mathfrak{z}_{\mathcal{X}}}^{\prime}|\leq 1$ (Lemma 4.1(i)) and (B.13), we have

$$
\begin{aligned}
\bigg|\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[(\xi_{\mathbf{i}}^{2}-1)I(|\xi_{\mathbf{i}}|>1)\mid\mathcal{X}]\cdot\mathbb{E}[f_{\mathfrak{z}_{\mathcal{X}}}^{\prime}(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}})\mid\mathcal{X}]\bigg|&\leq\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}\Big[\xi_{\mathbf{i}}^{2}I(|\xi_{\mathbf{i}}|>1)\mid\mathcal{X}\Big]\\
&\leq\frac{2^{3/2}U_{|h|^{3}}(1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}}.
\end{aligned}
\tag{B.27}
$$

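Second, using $|f_{\mathfrak{z}_{\mathcal{X}}}|\leq 0.63$ from Lemma 4.1(i) and the conditional independence of $\bar{\xi}_{\mathbf{i}}$ and $\bar{W}^{(\mathbf{i})}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}}$ given $\mathcal{X}$, we have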

$$
\begin{aligned}
\Bigg|\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}\Big[\bar{\xi}_{\mathbf{i}}f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W}^{(\mathbf{i})}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}})\mid\mathcal{X}\Big]\Bigg|
&=\Bigg|\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[\bar{\xi}_{\mathbf{i}}\mid\mathcal{X}]\,\mathbb{E}\Big[f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W}^{(\mathbf{i})}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}},\mathbf{i}^{*}})\mid\mathcal{X}\Big]\Bigg|\\
&\leq 0.63\sum_{\mathbf{i}\in I_{n,m}}\Big|\mathbb{E}\big[\bar{\xi}_{\mathbf{i}}\mid\mathcal{X}\big]\Big|\\
&=0.63\sum_{\mathbf{i}\in I_{n,m}}\Big|\mathbb{E}\big[(\xi_{\mathbf{i}}-1)I(\xi_{\mathbf{i}}>1)+(\xi_{\mathbf{i}}+1)I(\xi_{\mathbf{i}}<-1)\mid\mathcal{X}\big]\Big|\quad\text{by }\mathbb{E}[\xi_{\mathbf{i}}\mid\mathcal{X}]=0\text{ in (3.4)}\\
&\leq 0.63\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}\Big[|\xi_{\mathbf{i}}|I(|\xi_{\mathbf{i}}|>1)\mid\mathcal{X}\Big]\\
&\leq 0.63\sum_{\mathbf{i}\in I_{n,m}}\mathbb{E}[|\xi_{\mathbf{i}}|^{3}\mid\mathcal{X}]\\
&=0.63\frac{U_{|h|^{3}}(1-2p+2p^{2})}{\bar{U}_{h^{2}}^{3/2}\sqrt{N(1-p)}}\leq 1.26\frac{\sqrt{2}\,U_{|h|^{3}}(1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}},
\end{aligned}
\tag{B.28}
$$

where (3.4) is used in the last equality. Hence (B.27) and (B.28) together give (B.26). Collecting (B.22)-(B.26) and taking expectations (recall $\mathbb{E}[U_{|h|^{3}}]=\mathbb{E}[|h|^{3}]$), we obtain

$$
\begin{aligned}
|F_{1}|\leq C\Bigg\{&\frac{\mathbb{E}[|h|^{3}](1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}}+\mathbb{E}[\bar{D}^{2}_{2}]\\
&+\mathbb{E}\Bigg[\sqrt{\mathbb{E}[\bar{D}_{1}^{2}\mid\mathcal{X}]}+\frac{\sum_{\mathbf{i}\in I_{n,m}}|h(\mathbf{X}_{\mathbf{i}})|\sqrt{\mathbb{E}[(\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})})^{2}\mid\mathcal{X}]}}{\sigma_{h}\sqrt{\binom{n}{m}}}\Bigg]\Bigg\}.
\end{aligned}
\tag{B.29}
$$

For $F_{2}$ in (B.19), because $|f_{\mathfrak{z}_{\mathcal{X}}}|\leq 0.63$ from Lemma 4.1(i) and $\sup_{z,w\in\mathbb{R}}|zf_{z}(w)|\leq 1$ in (4.1), it is easy to see that

$$
\begin{aligned}
&I(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\leq\mathfrak{z}_{\mathcal{X}})-\Phi(\mathfrak{z}_{\mathcal{X}})-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W})\\
&\quad=I(\bar{W}-\bar{\Delta}_{\mathfrak{z}_{\mathcal{X}}}\leq\mathfrak{z}_{\mathcal{X}})-\Phi(\mathfrak{z}_{\mathcal{X}})+f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W})\bar{D}_{1}-\frac{\mathfrak{z}_{\mathcal{X}}f_{\mathfrak{z}_{\mathcal{X}}}(\bar{W})\bar{D}_{2}}{2}
\end{aligned}
$$

is bounded in absolute value by $3$, which gives

$$
|F_{2}|\leq 3P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}}).
\tag{B.30}
$$

Lastly, combining (B.19), (B.29) and (B.30) we obtain (B.9).

Appendix C Bounding $T(D_{1})$ in Lemma 3.2

To prove Lemma 3.2, we first claim that the following conditional $\ell_{2}$-norm bounds hold:

$$
\sqrt{\mathbb{E}[D_{1}^{2}\mid\mathcal{X}]}\leq\frac{2\sqrt{(1+3N)U_{h^{2}}}}{\sqrt{\bar{U}_{h^{2}}}\,N}+\frac{2\sqrt{2}\,|U_{n}|}{\sigma_{h}\sqrt{1-p}}
\tag{C.1}
$$

and

$$
\sqrt{\mathbb{E}[(D_{1}-D_{1}^{(\mathbf{i})})^{2}\mid\mathcal{X}]}\leq\frac{2|h(\mathbf{X}_{\mathbf{i}})|\sqrt{1+N}}{N\sqrt{\bar{U}_{h^{2}}\binom{n}{m}}}+\frac{4\sqrt{U_{h^{2}}}}{\sqrt{\bar{U}_{h^{2}}N\binom{n}{m}}}+\frac{4|U_{n}|}{\sqrt{\bar{U}_{h^{2}}(1-p)\binom{n}{m}}}.
\tag{C.2}
$$

Since $\bar{D}_{1}^{2}$ amounts to the non-negative $D_{1}^{2}$ upper-censored at $1/4$, Property 2.2(i) and (ii) give $|\bar{D}_{1}-\bar{D}_{1}^{(\mathbf{i})}|\leq|D_{1}-D_{1}^{(\mathbf{i})}|$ and $\bar{D}_{1}^{2}\leq D_{1}^{2}$. These, together with the Markov inequality

$$
P(|D_{1}|>1/2)\leq 2\mathbb{E}[|D_{1}|]\leq 2\mathbb{E}\bigg[\sqrt{\mathbb{E}[D_{1}^{2}\mid\mathcal{X}]}\bigg],
$$

allow us to bound $T(D_{1})$ with (C.1) and (C.2) as

$$
\begin{aligned}
T(D_{1})&\leq\mathbb{E}\Bigg[\frac{6\sqrt{(1+3N)U_{h^{2}}}}{N\sqrt{\bar{U}_{h^{2}}}}+\frac{6\sqrt{2}|U_{n}|}{\sigma_{h}\sqrt{1-p}}+\frac{2U_{h^{2}}\sqrt{1+N}}{\sigma_{h}N\sqrt{\bar{U}_{h^{2}}}}+\frac{4U_{|h|}\sqrt{U_{h^{2}}}}{\sigma_{h}\sqrt{N\bar{U}_{h^{2}}}}+\frac{4U_{|h|}|U_{n}|}{\sigma_{h}\sqrt{\bar{U}_{h^{2}}(1-p)}}\Bigg]\\
&\leq\mathbb{E}\Bigg[\frac{6\sqrt{1+3N}}{N}+\frac{6\sqrt{2}|U_{n}|}{\sigma_{h}\sqrt{1-p}}+\frac{2\sqrt{U_{h^{2}}(1+N)}}{\sigma_{h}N}+\frac{4\sqrt{U_{h^{2}}}}{\sigma_{h}\sqrt{N}}+\frac{4|U_{n}|}{\sigma_{h}\sqrt{1-p}}\Bigg],
\end{aligned}
\tag{C.3}
$$

where we have defined the U-statistic with kernel $|h|$,

$$
U_{|h|}\equiv\binom{n}{m}^{-1}\sum_{\mathbf{j}\in I_{n,m}}|h(\mathbf{X}_{\mathbf{j}})|,
$$

and used the inequality $U_{|h|}\leq\sqrt{U_{h^{2}}}\leq\sqrt{\bar{U}_{h^{2}}}$ in (C.3). Since

$$
\mathbb{E}[|U_{n}|]\leq\sqrt{\mathbb{E}[U_{n}^{2}]}\leq\sqrt{\frac{m}{n}}\,\sigma_{h}
$$

(Koroljuk and Borovskich, 1994, Lemma 1.1.4) and $\mathbb{E}[\sqrt{U_{h^{2}}}]\leq\sqrt{\mathbb{E}[U_{h^{2}}]}=\sigma_{h}$, continuing from (C.3) we further get

$$
T(D_{1})\leq\frac{6\sqrt{1+3N}}{N}+\frac{6\sqrt{2m}}{\sqrt{n(1-p)}}+\frac{2\sqrt{1+N}}{N}+\frac{4}{\sqrt{N}}+\frac{4\sqrt{m}}{\sqrt{n(1-p)}},
\tag{C.4}
$$

which implies Lemma 3.2. It remains to prove (C.1) and (C.2). In what follows, we will use the fact that

$$
|\hat{N}_{c}-N|\leq|\hat{N}-N|;
\tag{C.5}
$$

this holds because $\hat{N}_{c}=\hat{N}$ whenever $\hat{N}\in[\frac{N}{2},\frac{3N}{2}]$. If instead $\hat{N}\notin[\frac{N}{2},\frac{3N}{2}]$, then $|\hat{N}_{c}-N|=\frac{N}{2}$ and $|\hat{N}-N|\geq\frac{N}{2}$.
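Here is a minimal check of (C.5). It assumes, as the argument above indicates, that $\hat{N}_{c}$ is $\hat{N}$ clipped to $[N/2,3N/2]$; the exact definition (2.12) is not reproduced here. The helper names are hypothetical.

```python
def censor(x: float, lo: float, hi: float) -> float:
    """Clip x to [lo, hi]; assumed form of the censored budget N_hat_c."""
    return min(max(x, lo), hi)


def satisfies_c5(big_n: float, n_hat: float) -> bool:
    """Check |N_hat_c - N| <= |N_hat - N| with N_hat_c = censor(N_hat, N/2, 3N/2)."""
    n_hat_c = censor(n_hat, big_n / 2, 3 * big_n / 2)
    return abs(n_hat_c - big_n) <= abs(n_hat - big_n)


# N_hat is a count of selected kernel evaluations, so it ranges over non-negative integers.
for big_n in (4.0, 7.5, 20.0):
    assert all(satisfies_c5(big_n, float(k)) for k in range(0, 100))
```

Clipping also gives $\hat{N}_{c}\geq N/2$, which is what turns $1/\hat{N}_{c}$ into $2/N$ in (C.9) below.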

C.1. Proof of (C.1)

First, by the definition of $D_{1}$ in (3.5) and the fact in (C.5),

$$
|D_{1}|\leq\frac{2\big|\big(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\big)\big(\sum_{\mathbf{k}\in I_{n,m}}(Z_{\mathbf{k}}-p)\big)\big|}{N}+\frac{2\sqrt{2}\,\big|U_{n}\cdot(\hat{N}-N)\big|}{\sigma_{h}\sqrt{N(1-p)}}.
\tag{C.6}
$$

In addition,

$$
\mathbb{E}[|\hat{N}-N|\mid\mathcal{X}]\leq\|\hat{N}-N\|_{2}=\sqrt{\sum_{\mathbf{i}\in I_{n,m}}p(1-p)}=\sqrt{N(1-p)}\leq\sqrt{N}.
\tag{C.7}
$$

Moreover, by the independence among $\{Z_{\mathbf{j}}\}_{\mathbf{j}\in I_{n,m}}$ and the fact that $\mathbb{E}[\xi_{\mathbf{j}}\mid\mathcal{X}]=0$ from (3.4),

$$
\begin{aligned}
&\mathbb{E}\bigg[\bigg(\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}\bigg)^{2}\bigg(\sum_{\mathbf{k}\in I_{n,m}}(Z_{\mathbf{k}}-p)\bigg)^{2}\mid\mathcal{X}\bigg]\\
&=\mathbb{E}\bigg[\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}^{2}(Z_{\mathbf{j}}-p)^{2}+\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}^{2}\sum_{\mathbf{k}\neq\mathbf{j}}(Z_{\mathbf{k}}-p)^{2}+2\sum_{\mathbf{j}\in I_{n,m}}\xi_{\mathbf{j}}(Z_{\mathbf{j}}-p)\bigg(\sum_{\mathbf{k}\neq\mathbf{j}}\xi_{\mathbf{k}}(Z_{\mathbf{k}}-p)\bigg)\mid\mathcal{X}\bigg]\\
&=\sum_{\mathbf{j}\in I_{n,m}}\frac{p(1-p)(1+3p^{2}-3p)h^{2}(\mathbf{X}_{\mathbf{j}})}{\bar{U}_{h^{2}}N(1-p)}+\bigg(\binom{n}{m}-1\bigg)\frac{p^{2}(1-p)^{2}\sum_{\mathbf{j}\in I_{n,m}}h^{2}(\mathbf{X}_{\mathbf{j}})}{\bar{U}_{h^{2}}N(1-p)}\\
&\qquad+\frac{2p^{2}(1-p)^{2}}{\bar{U}_{h^{2}}N(1-p)}\sum_{\mathbf{j}\in I_{n,m}}h(\mathbf{X}_{\mathbf{j}})\bigg(\sum_{\mathbf{k}\neq\mathbf{j}}h(\mathbf{X}_{\mathbf{k}})\bigg)\\
&=\frac{U_{h^{2}}}{\bar{U}_{h^{2}}}\bigg(1+3p^{2}-3p+\bigg(\binom{n}{m}-1\bigg)p(1-p)\bigg)+\frac{2N(1-p)}{\bar{U}_{h^{2}}\binom{n}{m}^{2}}\sum_{\mathbf{j}\in I_{n,m}}h(\mathbf{X}_{\mathbf{j}})\bigg(\sum_{\mathbf{k}\neq\mathbf{j}}h(\mathbf{X}_{\mathbf{k}})\bigg)\\
&\leq\frac{U_{h^{2}}}{\bar{U}_{h^{2}}}\Big(1+3p^{2}-3p+N(1-p)\Big)+\frac{2N(1-p)U_{|h|}^{2}}{\bar{U}_{h^{2}}}\\
&\leq\frac{U_{h^{2}}(1+3p^{2}-3p+3N(1-p))}{\bar{U}_{h^{2}}}\leq\frac{(1+3N)U_{h^{2}}}{\bar{U}_{h^{2}}},
\end{aligned}
$$

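where the first inequality uses $(\binom{n}{m}-1)p\leq N$ and $\sum_{\mathbf{j}}h(\mathbf{X}_{\mathbf{j}})\sum_{\mathbf{k}\neq\mathbf{j}}h(\mathbf{X}_{\mathbf{k}})\leq\binom{n}{m}^{2}U_{|h|}^{2}$. The second uses Jensen's inequality $U_{|h|}^{2}\leq U_{h^{2}}$, and the last uses $3p^{2}-3p\leq 0$. Combining the resulting bound with (C.6), (C.7) and Minkowski's inequality, we get (C.1).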
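The fourth-moment expansion above can be verified by brute force for a few kernel values. The sketch enumerates all $2^{K}$ configurations of the Bernoulli samplers, with $K$ standing in for $\binom{n}{m}$. As in the display, it assumes $\xi_{\mathbf{j}}=(Z_{\mathbf{j}}-p)h(\mathbf{X}_{\mathbf{j}})/\sqrt{\bar{U}_{h^{2}}N(1-p)}$ and $N=p\binom{n}{m}$. The kernel values, the stand-in `u_bar` for $\bar{U}_{h^{2}}$, and the function names are illustrative.

```python
from itertools import product
from math import isclose, sqrt


def exact_moment(h, p, u_bar):
    """E[(sum_j xi_j)^2 (sum_k (Z_k - p))^2 | X] by enumerating all Z in {0,1}^K."""
    n_terms = len(h)            # plays the role of C(n, m)
    big_n = p * n_terms         # N = p * C(n, m)
    scale = sqrt(u_bar * big_n * (1 - p))
    total = 0.0
    for z in product((0, 1), repeat=n_terms):
        ones = sum(z)
        prob = p ** ones * (1 - p) ** (n_terms - ones)
        s_xi = sum((zj - p) * hj for zj, hj in zip(z, h)) / scale
        s_z = ones - p * n_terms  # sum_k (Z_k - p)
        total += prob * s_xi ** 2 * s_z ** 2
    return total


def closed_form(h, p, u_bar):
    """The last equality in the display, before the first inequality."""
    n_terms = len(h)
    big_n = p * n_terms
    u_h2 = sum(x * x for x in h) / n_terms
    cross = sum(h[j] * h[k] for j in range(n_terms) for k in range(n_terms) if k != j)
    return (u_h2 / u_bar) * (1 + 3 * p ** 2 - 3 * p + (n_terms - 1) * p * (1 - p)) + (
        2 * big_n * (1 - p) * cross / (u_bar * n_terms ** 2)
    )


def final_bound(h, p, u_bar):
    """(1 + 3N) U_{h^2} / u_bar, the final bound in the display."""
    n_terms = len(h)
    u_h2 = sum(x * x for x in h) / n_terms
    return (1 + 3 * p * n_terms) * u_h2 / u_bar


h = [0.3, -1.2, 2.0, 0.7, -0.5]
for p in (0.2, 0.5, 0.8):
    assert isclose(exact_moment(h, p, 1.7), closed_form(h, p, 1.7), rel_tol=1e-10)
    assert exact_moment(h, p, 1.7) <= final_bound(h, p, 1.7)
```

Enumeration is exponential in $K$, so this is only a sanity check on small cases, not a way to evaluate the quantity in practice.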

C.2. Proof of (C.2)

From the definitions of $D_{1}$ and $D_{1}^{(\mathbf{i})}$ in (3.5) and (3.15), we can write

$$
\begin{aligned}
D_{1}-D_{1}^{(\mathbf{i})}&=\xi_{\mathbf{i}}\frac{N-\hat{N}_{c}}{\hat{N}_{c}}+\Bigg(\sum_{\mathbf{j}\neq\mathbf{i}}\xi_{\mathbf{j}}+\frac{\sqrt{N}U_{n}}{\sqrt{\bar{U}_{h^{2}}(1-p)}}\Bigg)\Bigg(\frac{N-\hat{N}_{c}}{\hat{N}_{c}}-\frac{N-\hat{N}_{c}^{(\mathbf{i})}}{\hat{N}_{c}^{(\mathbf{i})}}\Bigg)\\
&=\xi_{\mathbf{i}}\frac{N-\hat{N}_{c}}{\hat{N}_{c}}+\Bigg(\sum_{\mathbf{j}\neq\mathbf{i}}\xi_{\mathbf{j}}+\frac{\sqrt{N}U_{n}}{\sqrt{\bar{U}_{h^{2}}(1-p)}}\Bigg)\Bigg(\frac{N(\hat{N}_{c}^{(\mathbf{i})}-\hat{N}_{c})}{\hat{N}_{c}\hat{N}_{c}^{(\mathbf{i})}}\Bigg).
\end{aligned}
$$
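The algebraic step between the two lines above can be confirmed in exact rational arithmetic. The same is true of the bound $N/(\hat{N}_{c}\hat{N}_{c}^{(\mathbf{i})})\leq 4/N$ that is used later for (C.12) and (C.13). In this sketch, $a$ and $b$ stand in for $\hat{N}_{c}$ and $\hat{N}_{c}^{(\mathbf{i})}$ and range over a grid in $[N/2,3N/2]$, the range assumed for the censored budgets. The function names are hypothetical.

```python
from fractions import Fraction


def difference_of_ratios(big_n, a, b):
    """(N - a)/a - (N - b)/b, the second factor in the first line of the display."""
    return (big_n - a) / a - (big_n - b) / b


def simplified(big_n, a, b):
    """N (b - a) / (a b), the second factor in the second line of the display."""
    return big_n * (b - a) / (a * b)


big_n = Fraction(10)
# Exact grid of candidate censored budgets in [N/2, 3N/2] with spacing 1/4.
grid = [big_n / 2 + Fraction(k, 4) for k in range(0, 4 * int(big_n) + 1)]
for a in grid:
    for b in grid:
        assert difference_of_ratios(big_n, a, b) == simplified(big_n, a, b)
        # a, b >= N/2 implies N / (a b) <= 4 / N.
        assert big_n / (a * b) <= 4 / big_n
```

`Fraction` is used instead of floats so that the equality check is exact rather than up to rounding.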

Hence, by the triangle inequality for the conditional $\ell_{2}$-norm, we have

$$
\begin{aligned}
\sqrt{\mathbb{E}[(D_{1}-D_{1}^{(\mathbf{i})})^{2}\mid\mathcal{X}]}\leq{}&\sqrt{\mathbb{E}\bigg[\bigg(\xi_{\mathbf{i}}\frac{N-\hat{N}_{c}}{\hat{N}_{c}}\bigg)^{2}\mid\mathcal{X}\bigg]}+\sqrt{\mathbb{E}\bigg[\Big(\sum_{\mathbf{j}\neq\mathbf{i}}\xi_{\mathbf{j}}\Big)^{2}\bigg(\frac{N(\hat{N}_{c}^{(\mathbf{i})}-\hat{N}_{c})}{\hat{N}_{c}\hat{N}_{c}^{(\mathbf{i})}}\bigg)^{2}\mid\mathcal{X}\bigg]}\\
&+\sqrt{\mathbb{E}\bigg[\frac{NU_{n}^{2}}{\bar{U}_{h^{2}}(1-p)}\bigg(\frac{N(\hat{N}_{c}^{(\mathbf{i})}-\hat{N}_{c})}{\hat{N}_{c}\hat{N}_{c}^{(\mathbf{i})}}\bigg)^{2}\mid\mathcal{X}\bigg]}.
\end{aligned}
\tag{C.8}
$$

We now bound the conditional second moments on the right of (C.8) in order. Since

$$
\bigg|\xi_{\mathbf{i}}\frac{N-\hat{N}_{c}}{\hat{N}_{c}}\bigg|\leq\frac{|(Z_{\mathbf{i}}-p)(\hat{N}-N)h(\mathbf{X}_{\mathbf{i}})|}{\hat{N}_{c}\sqrt{\bar{U}_{h^{2}}N(1-p)}}\leq\frac{2|(Z_{\mathbf{i}}-p)(\hat{N}-N)h(\mathbf{X}_{\mathbf{i}})|}{N^{3/2}\sqrt{\bar{U}_{h^{2}}(1-p)}}
\tag{C.9}
$$

by (C.5) and $\hat{N}_{c}\geq N/2$, we next compute the second moment of $(Z_{\mathbf{i}}-p)(\hat{N}-N)$. Upon expansion, using $Z_{\mathbf{i}}^{2}=Z_{\mathbf{i}}$,

$$
\begin{aligned}
(Z_{\mathbf{i}}-p)^{2}(\hat{N}-N)^{2}&=(Z_{\mathbf{i}}-p)^{2}\Big(Z_{\mathbf{i}}+\sum_{\mathbf{j}\neq\mathbf{i}}Z_{\mathbf{j}}-N\Big)^{2}\\
&=(Z_{\mathbf{i}}-p)^{2}\bigg(Z_{\mathbf{i}}^{2}+2Z_{\mathbf{i}}\bigg(\sum_{\mathbf{j}\neq\mathbf{i}}Z_{\mathbf{j}}-N\bigg)+\bigg(\sum_{\mathbf{j}\neq\mathbf{i}}Z_{\mathbf{j}}-N\bigg)^{2}\bigg)\\
&=(Z_{\mathbf{i}}-p)^{2}\bigg\{Z_{\mathbf{i}}\bigg(1+2\bigg(\sum_{\mathbf{j}\neq\mathbf{i}}Z_{\mathbf{j}}-N\bigg)\bigg)+\bigg(\sum_{\mathbf{j}\neq\mathbf{i}}Z_{\mathbf{j}}-N\bigg)^{2}\bigg\}\\
&=Z_{\mathbf{i}}(1-p)^{2}\bigg(1+2\bigg(\sum_{\mathbf{j}\neq\mathbf{i}}Z_{\mathbf{j}}-N\bigg)\bigg)+(Z_{\mathbf{i}}-p)^{2}\bigg(\underbrace{\sum_{\mathbf{j}\neq\mathbf{i}}Z_{\mathbf{j}}-N}_{=\sum_{\mathbf{j}\neq\mathbf{i}}Z_{\mathbf{j}}-\mathbb{E}[\sum_{\mathbf{j}\neq\mathbf{i}}Z_{\mathbf{j}}]-p}\bigg)^{2}.
\end{aligned}
$$

Taking expectations on both sides, we get

$$
\begin{aligned}
\mathbb{E}[(Z_{\mathbf{i}}-p)^{2}(N-\hat{N})^{2}]&=p(1-p)^{2}\big(1+2(N-p-N)\big)+p(1-p)\bigg(\bigg(\binom{n}{m}-1\bigg)p(1-p)+p^{2}\bigg)\\
&=p(1-p)\Big((1-p)(1-2p)+N(1-p)-p(1-p)+p^{2}\Big)\\
&=p(1-p)\big(1+4p^{2}-4p+N(1-p)\big)\\
&=p(1-p)\big((2p-1)^{2}+N(1-p)\big).
\end{aligned}
\tag{C.10}
$$
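Both (C.10) and the variance identity $\|\hat{N}-N\|_{2}^{2}=N(1-p)$ from (C.7) can be checked by exhaustive enumeration over the samplers. The sketch assumes $\hat{N}=\sum_{\mathbf{j}}Z_{\mathbf{j}}$ with i.i.d. $Z_{\mathbf{j}}\sim\text{Bernoulli}(p)$ and $N=p\binom{n}{m}$, with `n_terms` playing the role of $\binom{n}{m}$. The function name is hypothetical.

```python
from itertools import product
from math import isclose


def exact_moments(n_terms, p, i=0):
    """Return (E[(Z_i - p)^2 (N_hat - N)^2], E[(N_hat - N)^2]) by full enumeration."""
    big_n = p * n_terms  # N = p * C(n, m)
    mixed = var = 0.0
    for z in product((0, 1), repeat=n_terms):
        ones = sum(z)  # N_hat = number of selected kernel evaluations
        prob = p ** ones * (1 - p) ** (n_terms - ones)
        mixed += prob * (z[i] - p) ** 2 * (ones - big_n) ** 2
        var += prob * (ones - big_n) ** 2
    return mixed, var


for n_terms in (1, 3, 6):
    for p in (0.1, 0.5, 0.75):
        big_n = p * n_terms
        mixed, var = exact_moments(n_terms, p)
        assert isclose(mixed, p * (1 - p) * ((2 * p - 1) ** 2 + big_n * (1 - p)), rel_tol=1e-10)  # (C.10)
        assert isclose(var, big_n * (1 - p), rel_tol=1e-10)  # variance used in (C.7)
```

For `n_terms = 1`, (C.10) reduces to $\mathbb{E}|Z-p|^{4}=p(1-p)(1-3p+3p^{2})$, the same fourth moment that appears in the proof of (C.1).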

Hence, from (C.9) and (C.10), we get

$$
\begin{aligned}
\mathbb{E}\bigg[\bigg(\xi_{\mathbf{i}}\frac{N-\hat{N}_{c}}{\hat{N}_{c}}\bigg)^{2}\mid\mathcal{X}\bigg]&\leq\frac{4\mathbb{E}[(Z_{\mathbf{i}}-p)^{2}(\hat{N}-N)^{2}h^{2}(\mathbf{X}_{\mathbf{i}})\mid\mathcal{X}]}{N^{3}\bar{U}_{h^{2}}(1-p)}\\
&\leq\frac{4h^{2}(\mathbf{X}_{\mathbf{i}})\big((2p-1)^{2}+N(1-p)\big)}{N^{2}\binom{n}{m}\bar{U}_{h^{2}}}\\
&\leq\frac{4h^{2}(\mathbf{X}_{\mathbf{i}})(1+N)}{N^{2}\binom{n}{m}\bar{U}_{h^{2}}}.
\end{aligned}
\tag{C.11}
$$

Now we bound $\mathbb{E}\Big[\big(\sum_{\mathbf{j}\neq\mathbf{i}}\xi_{\mathbf{j}}\big)^{2}\Big(\frac{N(\hat{N}_{c}^{(\mathbf{i})}-\hat{N}_{c})}{\hat{N}_{c}\hat{N}_{c}^{(\mathbf{i})}}\Big)^{2}\mid\mathcal{X}\Big]$. By the definitions of $\hat{N}_{c}$ and $\hat{N}_{c}^{(\mathbf{i})}$ in (2.12) and (3.14), as well as Property 2.2(i),

$$
\Big(\sum_{\mathbf{j}\neq\mathbf{i}}\xi_{\mathbf{j}}\Big)^{2}\bigg(\frac{N(\hat{N}_{c}^{(\mathbf{i})}-\hat{N}_{c})}{\hat{N}_{c}\hat{N}_{c}^{(\mathbf{i})}}\bigg)^{2}\leq\Big(\sum_{\mathbf{j}\neq\mathbf{i}}\xi_{\mathbf{j}}\Big)^{2}\bigg(\frac{4(\hat{N}^{(\mathbf{i})}-\hat{N})}{N}\bigg)^{2}=\Big(\sum_{\mathbf{j}\neq\mathbf{i}}\frac{(Z_{\mathbf{j}}-p)h(\mathbf{X}_{\mathbf{j}})}{\sqrt{\bar{U}_{h^{2}}N(1-p)}}\Big)^{2}\bigg(\frac{4Z_{\mathbf{i}}}{N}\bigg)^{2},
$$

so taking conditional expectations on both sides gives

$$
\begin{aligned}
\mathbb{E}\bigg[\Big(\sum_{\mathbf{j}\neq\mathbf{i}}\xi_{\mathbf{j}}\Big)^{2}\bigg(\frac{N(\hat{N}_{c}^{(\mathbf{i})}-\hat{N}_{c})}{\hat{N}_{c}\hat{N}_{c}^{(\mathbf{i})}}\bigg)^{2}\mid\mathcal{X}\bigg]&\leq\frac{16p}{N^{2}}\sum_{\mathbf{j}\neq\mathbf{i}}\frac{p(1-p)h^{2}(\mathbf{X}_{\mathbf{j}})}{\bar{U}_{h^{2}}N(1-p)}\\
&=\frac{16p}{N^{2}\bar{U}_{h^{2}}}\sum_{\mathbf{j}\neq\mathbf{i}}\frac{h^{2}(\mathbf{X}_{\mathbf{j}})}{\binom{n}{m}}\leq\frac{16U_{h^{2}}}{N\binom{n}{m}\bar{U}_{h^{2}}}.
\end{aligned}
\tag{C.12}
$$

Lastly, for $\mathbb{E}\Big[\frac{NU_{n}^{2}}{\bar{U}_{h^{2}}(1-p)}\Big(\frac{N(\hat{N}_{c}^{(\mathbf{i})}-\hat{N}_{c})}{\hat{N}_{c}\hat{N}_{c}^{(\mathbf{i})}}\Big)^{2}\mid\mathcal{X}\Big]$, we have

$$
\begin{aligned}
\mathbb{E}\Bigg[\frac{NU_{n}^{2}}{\bar{U}_{h^{2}}(1-p)}\Bigg(\frac{N(\hat{N}_{c}^{(\mathbf{i})}-\hat{N}_{c})}{\hat{N}_{c}\hat{N}_{c}^{(\mathbf{i})}}\Bigg)^{2}\mid\mathcal{X}\Bigg]&\leq\frac{16U_{n}^{2}\mathbb{E}[(\hat{N}_{c}^{(\mathbf{i})}-\hat{N}_{c})^{2}]}{\bar{U}_{h^{2}}(1-p)N}\\
&\leq\frac{16U_{n}^{2}\mathbb{E}[Z_{\mathbf{i}}^{2}]}{\bar{U}_{h^{2}}(1-p)N}=\frac{16U_{n}^{2}}{\bar{U}_{h^{2}}(1-p)\binom{n}{m}},
\end{aligned}
\tag{C.13}
$$

where the last inequality comes from Property 2.2(i)(i).

Putting (C.11)-(C.13) back into (C.8), we obtain (C.2).

Appendix D Supplementary calculations for Section 4

This section proves Lemmas 4.1-4.2 and (4.7)-(4.9) in Section 4.

D.1. Proof of Lemma 4.1

Leung et al. (2024, Appendix A) have already established Lemma 4.1(i) and (ii). To prove Lemma 4.1(iii), let $\tilde{z}=-z$ and $\tilde{w}=-w$. By the definition in (3.18), it is easy to see that

\[
f_{z}(w)=f_{\tilde{z}}(\tilde{w});
\]

since $\tilde{z}\geq 1$, we can apply Lemma 4.1(ii) to $f_{\tilde{z}}(\tilde{w})$ and obtain Lemma 4.1(iii).
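To make the reflection identity $f_{z}(w)=f_{-z}(-w)$ concrete, the sketch below assumes that (3.18) is the classical bounded solution of the Stein equation $f'(w)-wf(w)=I(w\leq z)-\Phi(z)$. This is an assumption: the exact form of (3.18) is not reproduced in this section. The code evaluates that classical solution and checks the identity numerically on a small grid with $z\leq -1$. The function names `Phi` and `stein_solution` are illustrative only.

```python
import math


def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stein_solution(z, w):
    """Classical bounded solution of f'(w) - w f(w) = I(w <= z) - Phi(z).

    Assumed here to play the role of (3.18); an illustration, not the paper's code.
    """
    scale = math.sqrt(2.0 * math.pi) * math.exp(w * w / 2.0)
    if w <= z:
        return scale * Phi(w) * (1.0 - Phi(z))
    return scale * Phi(z) * (1.0 - Phi(w))


# Reflection check: f_z(w) should equal f_{-z}(-w), here for z <= -1.
grid_z = [-1.0, -1.7, -2.5]
grid_w = [-3.0, -1.2, 0.0, 0.4, 1.9, 3.0]
reflection_holds = all(
    math.isclose(stein_solution(z, w), stein_solution(-z, -w), rel_tol=1e-9)
    for z in grid_z
    for w in grid_w
)
print(reflection_holds)
```

For this classical solution the identity follows from $\Phi(-x)=1-\Phi(x)$, which swaps the two branches of the piecewise formula.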

D.2. Proof of Lemma 4.2

We first prove the bound for $\Pi_{1}$. The constant $C(\ell)$ below depends only on $\ell$, but its value may change from one occurrence to the next. For $\ell>2$, Rosenthal's inequality (Rosenthal, 1970, Theorem 3) gives

\begin{align*}
\mathbb{E}\Big[\Big|\sum_{i=1}^{n}(\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}])\Big|^{\ell}\Big]
&\leq C(\ell)\Bigg\{\Big(\sum_{i=1}^{n}\mathbb{E}[(\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}])^{2}]\Big)^{\ell/2}+\sum_{i=1}^{n}\mathbb{E}[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|^{\ell}]\Bigg\}\\
&\leq C(\ell)\Bigg\{\Big(\sum_{i=1}^{n}\mathbb{E}[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|^{3/2}]\Big)^{\ell/2}+\sum_{i=1}^{n}\mathbb{E}[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|^{3/2}]\Bigg\}&&\text{since }|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|\leq 1\\
&\leq C(\ell)\Bigg\{\Big(\sum_{i=1}^{n}\mathbb{E}[\bar{\eta}_{i}^{3/2}]\Big)^{\ell/2}+\sum_{i=1}^{n}\mathbb{E}[\bar{\eta}_{i}^{3/2}]\Bigg\}&&\text{since }\bar{\eta}_{i}\geq 0\text{ and }\mathbb{E}[\bar{\eta}_{i}^{3/2}]\geq(\mathbb{E}[\bar{\eta}_{i}])^{3/2}\\
&\leq C(\ell)\bigg\{\bigg(\frac{m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{n^{1/2}\sigma_{h}^{3}}\bigg)^{\ell/2}+\frac{m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{n^{1/2}\sigma_{h}^{3}}\bigg\}.
\end{align*}
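The third inequality above can be unpacked as follows. This is a short worked step; the resulting factor $2$ and its power $2^{\ell/2}$ are absorbed into $C(\ell)$. For $a,b\geq 0$ we have $|a-b|\leq\max(a,b)$, hence $|a-b|^{3/2}\leq a^{3/2}+b^{3/2}$. Taking $a=\bar{\eta}_{i}$ and $b=\mathbb{E}[\bar{\eta}_{i}]$, and then applying Jensen's inequality, gives

\[
\mathbb{E}\big[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|^{3/2}\big]\leq\mathbb{E}[\bar{\eta}_{i}^{3/2}]+(\mathbb{E}[\bar{\eta}_{i}])^{3/2}\leq 2\,\mathbb{E}[\bar{\eta}_{i}^{3/2}].
\]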

Moreover, by Chatterji (1969, Lemma 1), for $\ell\in[\frac{3}{2},2]$ we also have

\begin{align*}
\mathbb{E}\bigg[\Big|\sum_{i=1}^{n}(\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}])\Big|^{\ell}\bigg] &\leq 2\sum_{i=1}^{n}\mathbb{E}[|\bar{\eta}_{i}-\mathbb{E}[\bar{\eta}_{i}]|^{\ell}]\\
&\leq 2\sum_{i=1}^{n}\mathbb{E}[(\bar{\eta}_{i}+\mathbb{E}[\bar{\eta}_{i}])^{\ell}]&&\text{since }\bar{\eta}_{i}\geq 0\\
&\leq C\frac{m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{n^{1/2}\sigma_{h}^{3}}&&\text{since }\bar{\eta}_{i}^{\ell}\leq\bar{\eta}_{i}^{3/2}.
\end{align*}

Hence we have proved the bound for $\Pi_{1}$.

Now we prove the bound for $\Pi_{2}$. First,

\[
\sum_{i=1}^{n}\mathbb{E}[(\eta_{i}-1)I(|\eta_{i}|>1)]\leq\sum_{i=1}^{n}\mathbb{E}[\eta_{i}I(\eta_{i}>1)]\leq\sum_{i=1}^{n}\mathbb{E}[\eta_{i}^{3/2}]=\frac{m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{n^{1/2}\sigma_{h}^{3}}.
\]

Next, by a standard moment inequality for U-statistics (Koroljuk and Borovskich, 1994, Theorem 2.1.3), we have

\begin{align*}
&\mathbb{E}\Bigg[\bigg|\sum_{r=2}^{m}{m\choose r}{n\choose r}^{-1}\sum_{1\leq i_{1}<\dots<i_{r}\leq n}\frac{\pi_{r}(h^{2})(X_{i_{1}},\dots,X_{i_{r}})}{\sigma_{h}^{2}}\bigg|^{3/2}\Bigg]\\
&\qquad\leq(m-1)^{1/2}\sum_{r=2}^{m}{m\choose r}^{3/2}{n\choose r}^{-1/2}\frac{2^{r/2}\mathbb{E}[|\pi_{r}(h^{2})|^{3/2}]}{\sigma_{h}^{3}}.
\end{align*}

Collecting these inequalities gives the bound for $\|\Pi_{2}\|_{3/2}$ in Lemma 4.2.
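For completeness, the factor $(m-1)^{1/2}$ in the U-statistic moment bound above comes from Hölder's inequality over the $m-1$ summands $r=2,\dots,m$. For real numbers $a_{2},\dots,a_{m}$, applying Hölder with exponents $3/2$ and $3$ gives

\[
\Big|\sum_{r=2}^{m}a_{r}\Big|\leq\Big(\sum_{r=2}^{m}|a_{r}|^{3/2}\Big)^{2/3}(m-1)^{1/3},\qquad\text{so}\qquad\Big|\sum_{r=2}^{m}a_{r}\Big|^{3/2}\leq(m-1)^{1/2}\sum_{r=2}^{m}|a_{r}|^{3/2}.
\]

Now take $a_{r}={m\choose r}{n\choose r}^{-1}\sum_{1\leq i_{1}<\dots<i_{r}\leq n}\pi_{r}(h^{2})(X_{i_{1}},\dots,X_{i_{r}})/\sigma_{h}^{2}$ and then expectations. As we read the argument, the remaining per-$r$ factor ${m\choose r}^{3/2}{n\choose r}^{-1/2}2^{r/2}\mathbb{E}[|\pi_{r}(h^{2})|^{3/2}]/\sigma_{h}^{3}$ is supplied by the cited moment inequality for each degenerate component.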

D.3. Proof of (4.7)

Since $\bar{U}_{h^{2}}>\sigma_{h}^{2}/2$, we have

\begin{align*}
P(|D_{3}|>1/2) &\leq P(|\Pi_{1}+\Pi_{2}|>1/4)\\
&\leq P(|\Pi_{1}|>1/8)+P(|\Pi_{2}|>1/8)\\
&\leq 64\mathbb{E}[\Pi_{1}^{2}]+8\|\Pi_{2}\|_{3/2}.
\end{align*}
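The three steps above can be spelled out as follows. Assume, as (D.3) indicates, that $D_{3}=-\sigma_{h}^{2}(\Pi_{1}+\Pi_{2})/\bar{U}_{h^{2}}$. Then $\bar{U}_{h^{2}}>\sigma_{h}^{2}/2$ gives $|D_{3}|<2|\Pi_{1}+\Pi_{2}|$, so $\{|D_{3}|>1/2\}\subseteq\{|\Pi_{1}+\Pi_{2}|>1/4\}$. The second step is a union bound. The last step applies Markov's inequality with the second and first moments, respectively, followed by Lyapunov's inequality:

\[
P(|\Pi_{1}|>1/8)\leq 8^{2}\,\mathbb{E}[\Pi_{1}^{2}]=64\,\mathbb{E}[\Pi_{1}^{2}],\qquad P(|\Pi_{2}|>1/8)\leq 8\,\mathbb{E}[|\Pi_{2}|]\leq 8\|\Pi_{2}\|_{3/2}.
\]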

D.4. Proof of (4.8)

Recognizing that $|\bar{D}_{3}|$ is the non-negative random variable $|D_{3}|$ upper-censored at $1/2$, one can use Property 2.2(ii) and $\bar{U}_{h^{2}}>\sigma_{h}^{2}/2$ to get

\[
|\bar{D}_{3}|\leq|D_{3}|\leq 2(|\Pi_{1}|+|\Pi_{2}|). \tag{D.1}
\]

Since $|\bar{D}_{3}|$ is capped at $1/2$, (D.1) also implies

\[
|\bar{D}_{3}|\leq 2(|\bar{\Pi}_{1}|+|\bar{\Pi}_{2}|), \tag{D.2}
\]

where $\bar{\Pi}_{1}$ and $\bar{\Pi}_{2}$ denote $\Pi_{1}$ and $\Pi_{2}$ censored to the interval $[-1,1]$:

\[
\bar{\Pi}_{k}\equiv\Pi_{k}I(|\Pi_{k}|\leq 1)+I(\Pi_{k}>1)-I(\Pi_{k}<-1)\quad\text{for }k=1,2.
\]

Indeed, if $|\Pi_{1}|\leq 1$ and $|\Pi_{2}|\leq 1$, then (D.2) coincides with (D.1). Otherwise, the right-hand side of (D.2) is at least $2>1/2\geq|\bar{D}_{3}|$.

Hence, (D.2) gives (4.8) by Property 2.2(ii) and $|\bar{\Pi}_{2}|^{2}\leq|\bar{\Pi}_{2}|$.
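The passage from (D.1) to (D.2) is a purely deterministic fact about censoring. The Python sketch below checks it on randomly drawn triples $(d,\pi_{1},\pi_{2})$ that satisfy the hypothesis $|d|\leq 2(|\pi_{1}|+|\pi_{2}|)$ of (D.1). Here $d$ plays the role of $D_{3}$, and censoring to $[-c,c]$ is implemented with `min`/`max`. This matches the definition of $\bar{\Pi}_{k}$, including at the boundary points $\pm 1$. The sampling ranges and the function names are illustrative choices, not taken from the paper.

```python
import random


def censor(x, c):
    """Two-sided censoring of x to the interval [-c, c]."""
    return min(max(x, -c), c)


def d2_holds(d, pi1, pi2):
    """Check (D.2) for one triple satisfying |d| <= 2(|pi1| + |pi2|), as in (D.1)."""
    lhs = abs(censor(d, 0.5))  # |bar D_3|: |D_3| capped at 1/2
    rhs = 2.0 * (abs(censor(pi1, 1.0)) + abs(censor(pi2, 1.0)))
    return lhs <= rhs


rng = random.Random(2024)
all_ok = True
for _ in range(20_000):
    pi1, pi2 = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
    bound = 2.0 * (abs(pi1) + abs(pi2))
    d = rng.uniform(-1.0, 1.0) * bound  # any value permitted by (D.1)
    all_ok = all_ok and d2_holds(d, pi1, pi2)
print(all_ok)
```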

D.5. Proof of (4.9)

By the definition of $D_{3}$ in (4.4), one can first write

\begin{align*}
\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\bar{D}_{3}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)\Big]
&=-\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\frac{\sigma_{h}^{2}\Pi_{1}}{\bar{U}_{h^{2}}}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)\Big]-\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\frac{\sigma_{h}^{2}\Pi_{2}}{\bar{U}_{h^{2}}}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)\Big]\\
&\quad-\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}(D_{3}-1/2)f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)I(D_{3}>1/2)\Big]-\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}(D_{3}+1/2)f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)I(D_{3}<-1/2)\Big]. \tag{D.3}
\end{align*}
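Decomposition (D.3) rests on the pointwise identity $\bar{D}_{3}=D_{3}-(D_{3}-1/2)I(D_{3}>1/2)-(D_{3}+1/2)I(D_{3}<-1/2)$. In other words, $\bar{D}_{3}$ is $D_{3}$ censored to $[-1/2,1/2]$. The minimal Python sketch below compares both sides on a few values, including the boundary points $\pm 1/2$. The helper names are hypothetical.

```python
import math


def censor(d, c=0.5):
    """Two-sided censoring of d to [-c, c]."""
    return min(max(d, -c), c)


def decomposition(d):
    """D - (D - 1/2) I(D > 1/2) - (D + 1/2) I(D < -1/2), the identity behind (D.3)."""
    return d - (d - 0.5) * (d > 0.5) - (d + 0.5) * (d < -0.5)


for d in [-3.2, -0.5, 0.3, 0.51, 7.0]:
    print(d, censor(d), decomposition(d))
```

Multiplying the identity by $\mathfrak{z}_{\mathcal{X}}f_{\mathfrak{z}_{\mathcal{X}}}(\cdot)$ and taking expectations, with $D_{3}$ expanded via (4.4), yields (D.3).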

Given $\sup_{z,w\in\mathbb{R}}|zf_{z}(w)|\leq 1$ in (4.1), it is easy to see from (4.4) that

\begin{align*}
\bigg|\mathbb{E}\Big[\frac{\mathfrak{z}_{\mathcal{X}}}{2}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)I(D_{3}>1/2)\Big]\bigg| &\leq\frac{1}{2}\mathbb{E}\bigg[I\bigg(|\Pi_{1}+\Pi_{2}|>\frac{1}{4}\bigg)\bigg]\\
&\leq\frac{1}{2}\mathbb{E}\bigg[I\bigg(|\Pi_{1}|>\frac{1}{8}\bigg)+I\bigg(|\Pi_{2}|>\frac{1}{8}\bigg)\bigg]\\
&\leq 32\mathbb{E}[\Pi_{1}^{2}]+4\|\Pi_{2}\|_{3/2}. \tag{D.4}
\end{align*}

Similarly, we also have

\[
\bigg|\mathbb{E}\Big[\frac{\mathfrak{z}_{\mathcal{X}}}{2}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)I(D_{3}<-1/2)\Big]\bigg|\leq 32\mathbb{E}[\Pi_{1}^{2}]+4\|\Pi_{2}\|_{3/2}. \tag{D.5}
\]

Next, using (4.1) and (4.4) again,

\begin{align*}
&\bigg|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}D_{3}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)I(|D_{3}|>1/2)\Big]\bigg|\\
&\qquad\leq\mathbb{E}\bigg[|D_{3}|I\bigg(|D_{3}|>\frac{1}{2}\bigg)\bigg]\\
&\qquad\leq 2\mathbb{E}\bigg[|\Pi_{1}+\Pi_{2}|I\bigg(|\Pi_{1}+\Pi_{2}|>\frac{1}{4}\bigg)\bigg]\\
&\qquad\leq 8\mathbb{E}\big[|\Pi_{1}|\cdot|\Pi_{1}+\Pi_{2}|\big]+2\|\Pi_{2}\|_{3/2}\\
&\qquad\leq 8(\mathbb{E}[\Pi_{1}^{2}]+\|\Pi_{1}\|_{3}\|\Pi_{2}\|_{3/2})+2\|\Pi_{2}\|_{3/2}. \tag{D.6}
\end{align*}
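The third and fourth inequalities in (D.6) rest on a pointwise bound. Write $x=\Pi_{1}+\Pi_{2}$ and use $I(|x|>1/4)\leq 4|x|$:

\[
|x|\,I(|x|>1/4)\leq|\Pi_{1}|\,I(|x|>1/4)+|\Pi_{2}|\leq 4|\Pi_{1}|\,|x|+|\Pi_{2}|.
\]

Next, Hölder's inequality with exponents $3$ and $3/2$ gives $\mathbb{E}[|\Pi_{1}|\,|x|]\leq\mathbb{E}[\Pi_{1}^{2}]+\mathbb{E}[|\Pi_{1}||\Pi_{2}|]\leq\mathbb{E}[\Pi_{1}^{2}]+\|\Pi_{1}\|_{3}\|\Pi_{2}\|_{3/2}$. Finally, $\mathbb{E}[|\Pi_{2}|]\leq\|\Pi_{2}\|_{3/2}$.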

Lastly, using (4.1) again,

\[
\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\frac{\sigma_{h}^{2}\Pi_{2}}{\bar{U}_{h^{2}}}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)\Big]\Big|\leq 2\mathbb{E}[|\Pi_{2}|]\leq 2\|\Pi_{2}\|_{3/2}. \tag{D.7}
\]

Combining (D.3)-(D.7) gives

\begin{align*}
\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\bar{D}_{3}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)\Big]\Big|
&\leq\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}\frac{\sigma_{h}^{2}\Pi_{1}}{\bar{U}_{h^{2}}}f_{\mathfrak{z}_{\mathcal{X}}}\bigg(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\bigg)\bigg]\bigg|\\
&\quad+C\Big(\mathbb{E}[\Pi_{1}^{2}]+\|\Pi_{1}\|_{3}\|\Pi_{2}\|_{3/2}+\|\Pi_{2}\|_{3/2}\Big). \tag{D.8}
\end{align*}

We further bound the term $\big|\mathbb{E}\big[\mathfrak{z}_{\mathcal{X}}\frac{\sigma_{h}^{2}\Pi_{1}}{\bar{U}_{h^{2}}}f_{\mathfrak{z}_{\mathcal{X}}}\big(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\big)\big]\big|$ in (D.8). First write

\begin{align*}
&\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}\frac{\sigma_{h}^{2}\Pi_{1}}{\bar{U}_{h^{2}}}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\Big)\bigg]\\
&\qquad=\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}\cdot\Pi_{1}\cdot\frac{\sigma_{h}^{2}}{\bar{U}_{h^{2}}}\cdot\Big(1-\frac{\bar{U}_{h^{2}}}{\sigma_{h}^{2}}\Big)\cdot f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\Big)\bigg]+\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\Big)\Big]. \tag{D.9}
\end{align*}

Using (4.1), we have

\begin{align*}
&\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}\cdot\Pi_{1}\cdot\frac{\sigma_{h}^{2}}{\bar{U}_{h^{2}}}\cdot\Big(1-\frac{\bar{U}_{h^{2}}}{\sigma_{h}^{2}}\Big)\cdot f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\Big)\bigg]\bigg|\\
&\quad\leq\mathbb{E}\Big[\Big|\Pi_{1}\cdot\frac{\sigma_{h}^{2}}{\bar{U}_{h^{2}}}\cdot\Big(1-\frac{\bar{U}_{h^{2}}}{\sigma_{h}^{2}}\Big)\Big|\Big]\\
&\quad\leq 2\mathbb{E}\Big[\Big|\Pi_{1}\Big(1-\frac{\bar{U}_{h^{2}}}{\sigma_{h}^{2}}\Big)\Big|\Big]\\
&\quad=2\mathbb{E}\Big[\Big|\Pi_{1}\Big(1-\frac{U_{h^{2}}}{\sigma_{h}^{2}}\Big)\Big|\cdot I(\mathcal{E}_{1,\mathcal{X}})\Big]+2\mathbb{E}\Big[|\Pi_{1}|\cdot I\big(\Omega\backslash\mathcal{E}_{1,\mathcal{X}}\big)\Big]\\
&\quad\leq 2\|\Pi_{1}\|_{3}\bigg(\Big\|\sum_{i=1}^{n}\frac{m(\Psi_{1}(X_{i})-\sigma_{h}^{2})}{n\sigma_{h}^{2}}\Big\|_{3/2}+\|\Pi_{2}\|_{3/2}\bigg)+2\|\Pi_{1}\|_{3}P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})^{2/3}\qquad\text{by (2.9)}\\
&\quad\leq 2\|\Pi_{1}\|_{3}\bigg(\frac{2^{5/3}m\|\Psi_{1}\|_{3/2}}{\sigma_{h}^{2}n^{1/3}}+\|\Pi_{2}\|_{3/2}\bigg)+2\|\Pi_{1}\|_{3}P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})^{2/3}, \tag{D.10}
\end{align*}

where the last inequality uses Chatterji's inequality (Chatterji, 1969, Lemma 1):

\begin{align*}
\mathbb{E}\Big[\Big|\sum_{i=1}^{n}\frac{m(\Psi_{1}(X_{i})-\sigma_{h}^{2})}{n\sigma_{h}^{2}}\Big|^{3/2}\Big]&\leq\frac{2m^{3/2}\mathbb{E}[|\Psi_{1}(X_{1})-\sigma_{h}^{2}|^{3/2}]}{\sigma_{h}^{3}\sqrt{n}}\leq\frac{2m^{3/2}\mathbb{E}[|\Psi_{1}(X_{1})+\sigma_{h}^{2}|^{3/2}]}{\sigma_{h}^{3}\sqrt{n}}\\
&\leq\frac{(2m)^{3/2}(\mathbb{E}[\Psi_{1}^{3/2}]+\sigma_{h}^{3})}{\sigma_{h}^{3}\sqrt{n}}\leq\frac{2(2m)^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{\sigma_{h}^{3}\sqrt{n}}.
\end{align*}
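The constant $2^{5/3}$ in (D.10) follows from the last display by taking the $2/3$ power of both sides:

\[
\Big\|\sum_{i=1}^{n}\frac{m(\Psi_{1}(X_{i})-\sigma_{h}^{2})}{n\sigma_{h}^{2}}\Big\|_{3/2}\leq\bigg(\frac{2\cdot 2^{3/2}m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{\sigma_{h}^{3}\sqrt{n}}\bigg)^{2/3}=\frac{2^{2/3}\cdot 2\,m\,(\mathbb{E}[\Psi_{1}^{3/2}])^{2/3}}{\sigma_{h}^{2}n^{1/3}}=\frac{2^{5/3}m\|\Psi_{1}\|_{3/2}}{\sigma_{h}^{2}n^{1/3}}.
\]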

Since

\begin{gather*}
\|\Pi_{1}\|_{3}\frac{m\|\Psi_{1}\|_{3/2}}{\sigma_{h}^{2}n^{1/3}}\leq\mathbb{E}[|\Pi_{1}|^{3}]+\frac{m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{\sigma_{h}^{3}n^{1/2}},\\
\|\Pi_{1}\|_{3}\|\Pi_{2}\|_{3/2}\leq\mathbb{E}[|\Pi_{1}|^{3}]+\mathbb{E}[|\Pi_{2}|^{3/2}]\quad\text{and}\\
\|\Pi_{1}\|_{3}P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})^{2/3}\leq\mathbb{E}[|\Pi_{1}|^{3}]+P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}}),
\end{gather*}

(D.8)-(D.10) together imply

\begin{align*}
\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\bar{D}_{3}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)\Big]\Big|&\leq\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\Big)\Big]\Big|\\
&\quad+C\bigg(\frac{m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{\sigma_{h}^{3}n^{1/2}}+\mathbb{E}[\Pi_{1}^{2}]+\mathbb{E}[|\Pi_{1}|^{3}]+\mathbb{E}[|\Pi_{2}|^{3/2}]+\|\Pi_{2}\|_{3/2}\bigg)+P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}}),
\end{align*}

which can be slightly simplified as

\begin{align*}
\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\bar{D}_{3}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\Big)\Big]\Big|&\leq\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\Big)\Big]\Big|\\
&\quad+C\bigg(\frac{m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]}{\sigma_{h}^{3}n^{1/2}}+\mathbb{E}[\Pi_{1}^{2}]+\mathbb{E}[|\Pi_{1}|^{3}]+\|\Pi_{2}\|_{3/2}\bigg)+P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}}) \tag{D.11}
\end{align*}

by absorbing $\mathbb{E}[|\Pi_{2}|^{3/2}]$ into $\|\Pi_{2}\|_{3/2}$. This is justified because $|\bar{D}_{3}|\leq 1/2$ and (4.1) give $\big|\mathbb{E}\big[\mathfrak{z}_{\mathcal{X}}\bar{D}_{3}f_{\mathfrak{z}_{\mathcal{X}}}\big(\sum_{{\bf i}\in I_{n,m}}\bar{\xi}_{\bf i}\big)\big]\big|\leq 1/2$. One can therefore assume without loss of generality that $\|\Pi_{2}\|_{3/2}\leq 1$, since otherwise the bound holds trivially for $C\geq 1/2$. In that case, $\mathbb{E}[|\Pi_{2}|^{3/2}]=\|\Pi_{2}\|_{3/2}^{3/2}\leq\|\Pi_{2}\|_{3/2}$.
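The three product bounds used to pass from (D.8)-(D.10) to the subsequent display all follow from Young's inequality with the conjugate pair $(3,3/2)$. For $a,b\geq 0$,

\[
ab\leq\frac{a^{3}}{3}+\frac{2b^{3/2}}{3}\leq a^{3}+b^{3/2}.
\]

Take $a=\|\Pi_{1}\|_{3}$, so that $a^{3}=\mathbb{E}[|\Pi_{1}|^{3}]$. Taking, in turn, $b=m\|\Psi_{1}\|_{3/2}/(\sigma_{h}^{2}n^{1/3})$, $b=\|\Pi_{2}\|_{3/2}$ and $b=P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})^{2/3}$ then gives the three inequalities. In these cases $b^{3/2}$ equals $m^{3/2}\mathbb{E}[\Psi_{1}^{3/2}]/(\sigma_{h}^{3}n^{1/2})$, $\mathbb{E}[|\Pi_{2}|^{3/2}]$ and $P(\Omega\backslash\mathcal{E}_{1,\mathcal{X}})$, respectively. The same inequality with the exponents swapped, $ab\leq a^{3/2}+b^{3}$, yields the last step of (D.16).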

Lastly, we bound the term $\big|\mathbb{E}\big[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\big(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\big)\big]\big|$ on the right-hand side of (D.11). Recall the definition of $\xi_{\bf j}$ in (3.3), and define the event

\[
\mathscr{E}_{\mathcal{X},\mathcal{Z}}=\Big\{\max_{{\bf j}\in I_{n,m}}|\xi_{\bf j}|\leq 1\Big\},
\]

which depends on both $\mathcal{X}$ and $\mathcal{Z}$; it will be useful to note that

\[
P\big(\Omega\backslash\mathscr{E}_{\mathcal{X},\mathcal{Z}}\mid\mathcal{X}\big)\leq\sum_{{\bf j}\in I_{n,m}}\mathbb{E}[|\xi_{\bf j}|^{3}\mid\mathcal{X}]=\frac{U_{|h|^{3}}(1-2p+2p^{2})}{\bar{U}_{h^{2}}^{3/2}\sqrt{N(1-p)}}\leq\frac{2^{3/2}U_{|h|^{3}}(1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}}, \tag{D.12}
\]

by the computation of $\mathbb{E}[|\xi_{\bf j}|^{3}\mid\mathcal{X}]$ in (3.4). The first inequality is a union bound combined with Markov's inequality. We can then bound the term as

\begin{align*}
&\bigg|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\Big)\Big]\bigg|\\
&\quad\leq\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\Big)I(\mathscr{E}_{\mathcal{X},\mathcal{Z}})\bigg]\bigg|+\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\Big)I(\Omega\backslash\mathscr{E}_{\mathcal{X},\mathcal{Z}})\bigg]\bigg|\\
&\quad\leq\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\Big)I(\mathscr{E}_{\mathcal{X},\mathcal{Z}})\bigg]\bigg|+\mathbb{E}\big[\big|\Pi_{1}I(\Omega\backslash\mathscr{E}_{\mathcal{X},\mathcal{Z}})\big|\big]\qquad\text{by (4.1)}\\
&\quad\leq\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\Big)I(\mathscr{E}_{\mathcal{X},\mathcal{Z}})\bigg]\bigg|+\frac{\sqrt{2}\|\Pi_{1}\|_{3/2}\|h\|_{3}(1-2p+2p^{2})^{1/3}}{\sigma_{h}(N(1-p))^{1/6}}, \tag{D.13}
\end{align*}

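where the first inequality combines the triangle inequality with $\bar{\xi}_{\bf j}=\xi_{\bf j}$ on $\mathscr{E}_{\mathcal{X},\mathcal{Z}}$. The last inequality comes from applying Hölder's inequality and (D.12) to the right-hand side of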

\[
\mathbb{E}\big[\big|\Pi_{1}I(\Omega\backslash\mathscr{E}_{\mathcal{X},\mathcal{Z}})\big|\big]=\mathbb{E}\big[|\Pi_{1}|P(\Omega\backslash\mathscr{E}_{\mathcal{X},\mathcal{Z}}\mid\mathcal{X})\big]\leq\mathbb{E}\Big[|\Pi_{1}|\big(P(\Omega\backslash\mathscr{E}_{\mathcal{X},\mathcal{Z}}\mid\mathcal{X})\big)^{1/3}\Big].
\]
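Conditionally on $\mathcal{X}$, each $\xi_{\bf j}$ in (D.12) is a two-point random variable, so both sides of (D.12) can be computed exactly. The Python sketch below does this on a toy example. It assumes $\xi_{\bf j}=(Z_{\bf j}-p)h({\bf X}_{\bf j})/\sqrt{\bar{U}_{h^{2}}N(1-p)}$ with independent $Z_{\bf j}\sim$ Bernoulli($p$), and it uses $p=N/{n\choose m}$. Under these assumptions, the code checks three things: the closed form $U_{|h|^{3}}(1-2p+2p^{2})/(\bar{U}_{h^{2}}^{3/2}\sqrt{N(1-p)})$, the identity $\mathbb{E}|Z-p|^{3}=p(1-p)(1-2p+2p^{2})$ behind it, and the union-plus-Markov bound $P(\Omega\backslash\mathscr{E}_{\mathcal{X},\mathcal{Z}}\mid\mathcal{X})\leq\sum_{\bf j}\mathbb{E}[|\xi_{\bf j}|^{3}\mid\mathcal{X}]$. The kernel values, the sampling rate and the value used for $\bar{U}_{h^{2}}$ are hypothetical inputs.

```python
import math


def xi_values(h_vals, p, ubar):
    """Two possible values of xi_j = (Z_j - p) h_j / sqrt(ubar * N * (1 - p)),
    attained when Z_j = 1 (probability p) and Z_j = 0 (probability 1 - p)."""
    n_tuples = len(h_vals)          # plays the role of C(n, m)
    budget = p * n_tuples           # N, via the assumed relation p = N / C(n, m)
    scale = math.sqrt(ubar * budget * (1.0 - p))
    return [((1.0 - p) * h / scale, -p * h / scale) for h in h_vals]


def third_moment_sum(h_vals, p, ubar):
    """sum_j E[|xi_j|^3 | X], computed directly from the two-point law of Z_j."""
    return sum(p * abs(a) ** 3 + (1.0 - p) * abs(b) ** 3
               for a, b in xi_values(h_vals, p, ubar))


def third_moment_closed_form(h_vals, p, ubar):
    """U_{|h|^3} (1 - 2p + 2p^2) / (ubar^{3/2} sqrt(N (1 - p))), as in (D.12)."""
    n_tuples = len(h_vals)
    budget = p * n_tuples
    u_abs3 = sum(abs(h) ** 3 for h in h_vals) / n_tuples
    return u_abs3 * (1.0 - 2.0 * p + 2.0 * p * p) / (ubar ** 1.5 * math.sqrt(budget * (1.0 - p)))


def prob_max_exceeds_one(h_vals, p, ubar):
    """Exact P(max_j |xi_j| > 1 | X), using independence of the Z_j."""
    prob_all_within = 1.0
    for a, b in xi_values(h_vals, p, ubar):
        prob_all_within *= p * (abs(a) <= 1.0) + (1.0 - p) * (abs(b) <= 1.0)
    return 1.0 - prob_all_within


h_vals = [0.5, -1.0, 1.5, 2.0, -0.3, 0.8]   # hypothetical kernel values h(X_j)
p, ubar = 0.3, 1.0                           # hypothetical sampling rate and bar U_{h^2}
print(prob_max_exceeds_one(h_vals, p, ubar), third_moment_sum(h_vals, p, ubar))
```

In this toy example, only the value $Z_{\bf j}=1$ for the kernel value $2.0$ pushes some $|\xi_{\bf j}|$ above $1$. The exceedance probability is therefore $p=0.3$, well below the third-moment sum.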

Further, the term $\mathbb{E}\big[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\big)I(\mathscr{E}_{\mathcal{X},\mathcal{Z}})\big]$ in (D.13) can be written as

\begin{align*}
\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\Big)I(\mathscr{E}_{\mathcal{X},\mathcal{Z}})\Big]&=\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\Big)\Big]\\
&\quad-\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\Big)I(\Omega\backslash\mathscr{E}_{\mathcal{X},\mathcal{Z}})\Big]; \tag{D.14}
\end{align*}

on the other hand, (4.1) and Hölder's inequality imply

\begin{align*}
\bigg|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\Big)I(\Omega\backslash\mathscr{E}_{\mathcal{X},\mathcal{Z}})\Big]\bigg|
&\leq\|\Pi_{1}\|_{3/2}\|I(\Omega\backslash\mathscr{E}_{\mathcal{X},\mathcal{Z}})\|_{3}\\
&=\|\Pi_{1}\|_{3/2}\Big(\mathbb{E}[P(\Omega\backslash\mathscr{E}_{\mathcal{X},\mathcal{Z}}\mid\mathcal{X})]\Big)^{1/3}\\
&\leq\|\Pi_{1}\|_{3/2}\Bigg(\frac{2^{3/2}\mathbb{E}[|h|^{3}](1-2p+2p^{2})}{\sigma_{h}^{3}\sqrt{N(1-p)}}\Bigg)^{1/3}&&\text{by (D.12)}\\
&\leq\frac{\sqrt{2}\|\Pi_{1}\|_{3/2}\|h\|_{3}(1-2p+2p^{2})^{1/3}}{\sigma_{h}(N(1-p))^{1/6}}. \tag{D.15}
\end{align*}

Putting (D.13)-(D.15) together gives

\begin{align*}
\Big|\mathbb{E}\Big[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\bar{\xi}_{\bf j}\Big)\Big]\Big|
&\leq\frac{2\sqrt{2}\|\Pi_{1}\|_{3/2}\|h\|_{3}(1-2p+2p^{2})^{1/3}}{\sigma_{h}(N(1-p))^{1/6}}+\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\Big)\bigg]\bigg|\\
&\leq 2\sqrt{2}\bigg(\mathbb{E}[|\Pi_{1}|^{3/2}]+\frac{\mathbb{E}[|h|^{3}](1-2p+2p^{2})}{\sigma_{h}^{3}(N(1-p))^{1/2}}\bigg)+\bigg|\mathbb{E}\bigg[\mathfrak{z}_{\mathcal{X}}\Pi_{1}f_{\mathfrak{z}_{\mathcal{X}}}\Big(\sum_{{\bf j}\in I_{n,m}}\xi_{\bf j}\Big)\bigg]\bigg|. \tag{D.16}
\end{align*}

Combining (D.11) and (D.16) gives (4.9).

References

  • Berry (1941) Andrew C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates. Trans. Amer. Math. Soc., 49:122–136, 1941.
  • Blom (1976) Gunnar Blom. Some properties of incomplete U-statistics. Biometrika, 63(3):573–580, 1976.
  • Chatterji (1969) Srishti D. Chatterji. An $L^{p}$-convergence theorem. The Annals of Mathematical Statistics, 40(3):1068–1070, 1969.
  • Chen and Shao (2007) Louis H. Y. Chen and Qi-Man Shao. Normal approximation for nonlinear statistics using a concentration inequality approach. Bernoulli, 13(2):581–599, 2007.
  • Chen and Kato (2019) Xiaohui Chen and Kengo Kato. Randomized incomplete U-statistics in high dimensions. The Annals of Statistics, 47(6):3127–3156, 2019.
  • De la Pena and Giné (1999) Victor De la Pena and Evarist Giné. Decoupling: from dependence to independence. Springer Science & Business Media, 1999.
  • DiCiccio and Romano (2022) Cyrus DiCiccio and Joseph Romano. CLT for U-statistics with growing dimension. Statist. Sinica, 32(1):323–344, 2022.
  • Esseen (1942) Carl-Gustav Esseen. On the Liapunoff limit of error in the theory of probability. Ark. Mat. Astron. Fys., A28(9):1–19, 1942.
  • Ferger (1996) Dietmar Ferger. Moment inequalities for U-statistics with degeneracy of higher order. Sankhyā Ser. A, 58(1):142–148, 1996.
  • Hoeffding (1948) Wassily Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Statistics, 19:293–325, 1948.
  • Janson (1984) Svante Janson. The asymptotic distributions of incomplete U-statistics. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 66(4):495–505, 1984.
  • Koroljuk and Borovskich (1994) V. S. Koroljuk and Yu. V. Borovskich. Theory of U-statistics, volume 273 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1994. Translated from the 1989 Russian original by P. V. Malyshev and D. V. Malyshev and revised by the authors.
  • Leung and Shao (2023) Dennis Leung and Qi-Man Shao. Nonuniform Berry-Esseen bounds for Studentized U-statistics. Bernoulli, accepted, 2023.
  • Leung et al. (2024) Dennis Leung, Qi-Man Shao, and Liqian Zhang. Another look at Stein's method for Studentized nonlinear statistics with an application to U-statistics. Journal of Theoretical Probability, accepted, 2024.
  • Mentch and Hooker (2016) Lucas Mentch and Giles Hooker. Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. The Journal of Machine Learning Research, 17(1):841–881, 2016.
  • Peng et al. (2022) Wei Peng, Tim Coleman, and Lucas Mentch. Rates of convergence for random forests via generalized U-statistics. Electronic Journal of Statistics, 16(1):232–292, 2022.
  • Rosenthal (1970) Haskell P. Rosenthal. On the subspaces of $L^{p}$ $(p>2)$ spanned by sequences of independent random variables. Israel J. Math., 8:273–303, 1970.
  • Shao et al. (2016) Qi-Man Shao, Kan Zhang, and Wen-Xin Zhou. Stein's method for nonlinear statistics: a brief survey and recent progress. J. Statist. Plann. Inference, 168:68–89, 2016. doi: 10.1016/j.jspi.2015.06.008.
  • Shevtsova (2010) Irina G. Shevtsova. An improvement of convergence rate estimates in the Lyapunov theorem. In Doklady Mathematics, volume 82, pages 862–864. Springer, 2010.
  • Song et al. (2019) Yanglei Song, Xiaohui Chen, and Kengo Kato. Approximating high-dimensional infinite-order U-statistics: statistical and computational guarantees. Electron. J. Stat., 13(2):4794–4848, 2019. doi: 10.1214/19-EJS1643.
  • Stein (1972) Charles Stein. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the sixth Berkeley symposium on mathematical statistics and probability, volume 2: Probability theory, volume 6, pages 583–603. University of California Press, 1972.
  • Sturma et al. (2022) Nils Sturma, Mathias Drton, and Dennis Leung. Testing many and possibly singular polynomial constraints. arXiv preprint arXiv:2208.11756, 2022.
  • van der Vaart and Wellner (1996) Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes: with applications to statistics. Springer Series in Statistics. Springer-Verlag, New York, 1996.
  • Xu et al. (2022) Tianning Xu, Ruoqing Zhu, and Xiaofeng Shao. On variance estimation of random forests with infinite-order U-statistics. arXiv preprint arXiv:2202.09008, 2022.
  • Zhou et al. (2021) Zhengze Zhou, Lucas Mentch, and Giles Hooker. V-statistics and variance estimation. J. Mach. Learn. Res., 22: Paper No. 287, 48 pp., 2021.