
Exponential tail bounds and Large Deviation Principle for Heavy-Tailed U-Statistics

(This work was partially supported by the National Science Foundation under Grant 1811614.)

Milad Bakhshizadeh, Stanford University
Abstract

We study the deviation of U-statistics when the samples have a heavy-tailed distribution, so the kernel of the U-statistic does not have bounded exponential moments at any positive point. We obtain an exponential upper bound for the tail of the U-statistics which clearly exhibits two regions of tail decay: the first is a Gaussian decay, and the second behaves like the tail of the kernel. For several common U-statistics, we also show the upper bound has the right rate of decay as well as sharp constants by obtaining rough logarithmic limits, which in turn can be used to develop an LDP for U-statistics. In contrast to the usual LDP results in the literature, the processes we consider in this work have LDP speed slower than their sample size $n$.

1 Introduction

Suppose $X_1, X_2, \ldots, X_n \overset{iid}{\sim} \mu$, where $\mu$ is a probability measure supported on $\mathbb{R}$. We consider the following U-statistic of order $m$ with a kernel function $h\colon \mathbb{R}^m \to \mathbb{R}$ which is symmetric in its arguments:

$$U_n \triangleq \frac{1}{\binom{n}{m}} \sum_{1 \leq i_1 < \ldots < i_m \leq n} h(X_{i_1}, \ldots, X_{i_m}).$$
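For concreteness, the definition above can be computed directly for small samples. The sketch below uses the illustrative kernel $h(x,y) = (x-y)^2/2$ (an assumption for the example, not a kernel prescribed by the paper), whose order-2 U-statistic equals the unbiased sample variance.

```python
# Minimal sketch of U_n: average the kernel h over all size-m subsets of the
# sample. The kernel h(x, y) = (x - y)**2 / 2 is an illustrative choice whose
# order-2 U-statistic equals the unbiased sample variance.
from itertools import combinations
from math import comb

def u_statistic(xs, h, m):
    """Order-m U-statistic of the sample xs with symmetric kernel h."""
    return sum(h(*c) for c in combinations(xs, m)) / comb(len(xs), m)

xs = [1.0, 2.0, 4.0, 7.0]
print(u_statistic(xs, lambda x, y: (x - y) ** 2 / 2, 2))  # 7.0, the sample variance
```

Direct enumeration costs $\binom{n}{m}$ kernel evaluations, so this is only practical for small $n$; it is meant to make the definition concrete, not to be efficient.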

We study the decay of $\mathbb{P}\left(|U_n - \mathbb{E}[U_n]| > t\right)$ as a function of $t$, $n$, and $m$, in a setup which is rarely addressed in the literature and is characterized by two key assumptions: a heavy-tailed distribution for $h(X_1, \ldots, X_m)$, and large values of $t$. We explain the role of each assumption in more detail below.

When $h(X_1, \ldots, X_m) \triangleq h$ has a heavy-tailed distribution, its Moment Generating Function (MGF) is not bounded at any positive point. This breaks many concentration results and violates Cramér's condition, which is required for the common results that determine large deviation behavior. While the deviation of U-statistics has been studied extensively in the light-tailed regime [1, 9, 8, 15, 16, 23, 22], the heavy-tailed case is relatively under-explored. In this paper, we focus on the heavy-tailed regime.

Moreover, when the kernel $h$ is assumed to have a heavy tail, for fixed values of $n$ and $m$, $\mathbb{P}\left(|U_n - \mathbb{E}[U_n]| > t\right)$ exhibits two different behaviors: 1) for small values of $t$ it decays like a Gaussian tail, and 2) for $t$ larger than the zone of normal convergence, i.e. in the large deviation zone, it decays more slowly than a normal distribution. In contrast to the first region, which has been studied extensively in the literature (see [16, 15] and the references therein), there is little documented information about the tail behavior in the second region. Several setups have been developed to study the behavior of the tail. We mention some of them below to situate the results of the current paper.

  1. Finite sample concentration inequalities that give upper bounds on $\mathbb{P}\left(|U_n - \mathbb{E}[U_n]| > t\right)$ for all values of $n$.

  2. Asymptotic distribution studies that find a suitable scaling $a_n$ and limiting distribution $D$ for which $a_n |U_n - \mathbb{E}[U_n]| \xrightarrow{d} D$. The case $a_n = c\sqrt{n}$ and $D = \mathcal{N}(0,1)$ is the well-known CLT that holds for non-degenerate U-statistics.

  3. Berry–Esseen type inequalities that seek a uniform upper bound on $\left|\mathbb{P}\left(a_n |U_n - \mathbb{E}[U_n]| > t\right) - \phi(t)\right|$ for all $t \in \mathcal{C}$, where $\mathcal{C} \subseteq \mathbb{R}$.

  4. Large deviation studies that look for a convergence speed $b_n$ and rate function $f(t)$ for which $\lim\limits_{n\to\infty} \frac{-\log \mathbb{P}\left(|U_n - \mathbb{E}[U_n]| > t\right)}{b_n} = f(t)$, with $f(t) > 0$ for large $t$.

In this work, we shed some light on the deviation of U-statistics in setups 1 and 4 listed above: the concentration inequality and the large deviation setups. In particular, we develop a general concentration inequality that holds for all U-statistics with finite second moments. Moreover, we characterize a class of kernels for which that concentration inequality is sharp enough to yield the large deviation limit. The byproduct of such a sharpness analysis is the exact convergence speed $b_n$ and a closed form for the rate function in terms of the tail of the kernel $h$. For the heavy-tailed kernels considered in this work, we usually obtain $b_n \ll n$, e.g. $b_n = n^{\alpha}$ with $\alpha < 1$. This distinguishes our results from several previous works that determine tail behavior in the asymptotic Gaussian regime, i.e. $b_n = n$.

1.1 Related works

Large deviations of U-statistics have been studied in several previous works under a variety of assumptions. Hoeffding [12] formally introduced U-statistics and showed that with the right scaling they converge to a normal distribution. Follow-up studies offered bounds for the deviation of U-statistics from their Gaussian limits (see [16, 15] and the references therein).

Giné et al. [11] offer upper bounds for the moments of U-statistics. There is a relatively straightforward connection between moment inequalities and exponential bounds for the tail (see [21]). Utilizing this connection, [11] obtains exponential tail bounds. However, these bounds do not show Gaussian decay when the deviation amount $t$ is within the Gaussian zone. Therefore, the authors also offer improvements of their inequalities in the case of completely degenerate kernels of order $m = 2$.

Recently, Chakrabortty and Kuchibhotla [5] considered degenerate kernels of order 2 and samples from heavy-tailed subWeibull distributions and obtained exponential tail bounds for such U-statistics. However, the specific form they use for the U-statistics seems restrictive. They define

$$U_n = \sum_{i \neq j} \phi(Y_i)\, w_{i,j}(X_i, X_j)\, \psi(Y_j),$$

and impose a uniform boundedness assumption on $w_{i,j}$. Hence, the kernel can be unbounded only through the product $\phi(Y_i)\psi(Y_j)$. It is not clear to us whether the results of [5] can recover exponential bounds for general non-degenerate kernels like the ones named in Lemma 2.19 of the current work.

One property of U-statistics of order $m \geq 2$ that makes them more challenging than the case $m = 1$, i.e. iid sums, is the presence of dependent terms in the sum. There are two main ways to handle such dependencies: decoupling and martingale inequalities. The two approaches lead down relatively different paths, and both have been explored to develop tail bounds for U-statistics.

de la Pena [7] provides a decoupling tool that helps one get rid of the dependencies between summands in a U-statistic. This tool has been utilized by several later works to obtain exponential concentration bounds, e.g. [17, 4, 5]. While the decoupling technique has proved very powerful in handling dependencies, the upper bound it offers carries an extra factor of 8, which usually makes the resulting exponential bound off by a constant factor in its exponent.

Another approach to obtaining exponential inequalities for U-statistics is to leverage inequalities for (sub/super) martingales such as those in [10]. Houdré and Reynaud-Bouret [13] and some of their references pursue this direction. Using de la Pena's decoupling [7] and Talagrand's inequality [20] is common in this approach. Nevertheless, the statements quickly become algebraically complicated, and several constants given by suprema or expectations of partial sums get involved. As a result, many bounds obtained via martingale inequalities are restricted to order-2 or degenerate kernels. Moreover, the constants are usually not sharp, for a reason similar to the one discussed for the decoupling technique above.

In addition to tail bounds, several works have been devoted to showing that U-statistics satisfy a Large Deviation Principle (LDP). It is common for works in this direction to assume the kernel has finite exponential moments; hence they exclude the distributions we focus on in the current manuscript.

Dasgupta [6] tried to show that the large deviation behavior of U-statistics is determined by their conditional expectation given one summand. At first glance this sounds like a reasonable claim, since the asymptotic normality of U-statistics is tightly related to such conditional expectations [12]. However, this claim was later disproved [3, 22].

Nikitin and Ponikarov [18] study large deviations of U-statistics with bounded non-degenerate kernels; they also claim that no large deviation results were known by the time their work was published. Arcones [1] develops a decomposition technique for the case of a two-variable kernel with finite MGF that enables one to prove an LDP. Nevertheless, the extension of this technique to higher-order kernels is not straightforward. Eichelsbacher and Löwe [9] consider both U-statistics and V-statistics, which include terms with repeated samples in the sum, given by a kernel function with bounded MGF. In this case, they show that satisfying an LDP is equivalent for a U-statistic and the corresponding V-statistic, with the same rate function. Lemma 3.1 from [9], which allows one to bound the MGF of a U-statistic, is one of the most important tools we use in the current work. Eichelsbacher [8] provides sufficient conditions under which a U-statistic satisfies an LDP under a weak Cramér condition. It utilizes a representation of a bivariate kernel as an infinite sum of separable functions in order to reduce the large deviation problem for U-statistics to the one for sums of iid variables. Then, it applies the contraction theorem to prove an LDP for the U-statistics.

The current work focuses on heavy-tailed distributions, which are mostly excluded from the large deviation studies above. The exponential tail bound given in Section 2.1 covers all kernels with finite variances, regardless of their order, boundedness, or degeneracy. The upper bound we offer is simple enough to reveal both the Gaussian and the heavy-tail decay regions immediately. All parameters in the statement of the upper bound have known limits as $n \to \infty$. Moreover, we offer ready-to-use tools to handle them for fixed sample size $n$ in Section 2.1.1. In addition, to the best of our knowledge, the rough logarithmic limits, with speeds slower than $n$, obtained in Section 2.3 have not been documented in earlier works.

1.2 Some Applications

U-statistics appear in several statistical analyses, including point estimation, e.g. of the population variance [12], hypothesis testing [19], and statistical mechanics [14]. For such a self-averaging function, it is quite natural to ask how fast a U-statistic concentrates around its mean.

In particular, for the asymptotic analysis of hypothesis testing, the speed and the rate function for the large deviation of such statistics are important. For instance, obtaining the values of Bahadur efficiency and Bahadur exact slope requires having LDP for the test function [19]. Moreover, the large deviation principle and exponential concentrations for U-statistics play substantial roles in the study of macroscopic limits of interacting particle systems and in general statistical mechanics [17]. Our result enables one to extend such theories when samples are drawn from a heavy-tailed distribution.

2 Main results

The main result of this paper is two-fold. First, in Section 2.1 we develop a general upper bound for the tail probability of U-statistics whose kernels have bounded variances. Second, in Section 2.3, we show that for several kernel functions this upper bound is tight enough to determine the large deviation behavior of the U-statistic. If a centered kernel $h\colon \mathbb{R}^m \to \mathbb{R}$ satisfies the criteria discussed in Section 2.3 and $U_n$ denotes the U-statistic corresponding to $h$ with $n$ samples, we show in Lemma 2.14 that $\mathbb{P}(U_n > t)$ and $\mathbb{P}\left(h > \frac{n}{m}t\right)$ have the same asymptotic behavior in the logarithmic sense, i.e.

$$\lim_{n\to\infty} \frac{\log \mathbb{P}(U_n > t)}{\log \mathbb{P}\left(h > \frac{n}{m}t\right)} = 1.$$

Therefore, both the convergence speed and the rate function for the large deviation of $U_n$ are determined by the tail of the kernel $h$.

In addition, the lower bound developed in Section 2.2 reveals the event that is responsible for large deviations of $U_n$: one of $X_1, \ldots, X_n$ gets large enough to make all the summands to which it contributes exceed $\frac{n}{m}t$. There are $\binom{n-1}{m-1}$ such terms in the expansion of $U_n$.
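The counting behind this heuristic can be checked directly: each fixed sample appears in $\binom{n-1}{m-1}$ of the $\binom{n}{m}$ summands, and $m\binom{n}{m} = n\binom{n-1}{m-1}$, so $\binom{n-1}{m-1}$ terms each exceeding $\frac{n}{m}t$ push the average above $t$ (provided the remaining terms are nonnegative). A small sketch, with illustrative values of $n$, $m$, $t$:

```python
# Sanity check of the counting in the lower-bound heuristic: X_i appears in
# comb(n-1, m-1) of the comb(n, m) summands, and m*comb(n, m) = n*comb(n-1, m-1),
# so comb(n-1, m-1) terms each exceeding (n/m)*t push the average above t.
from math import comb

for n in range(2, 40):
    for m in range(1, n + 1):
        assert m * comb(n, m) == n * comb(n - 1, m - 1)

n, m, t = 12, 3, 2.5
contribution = comb(n - 1, m - 1) * (n / m) * t / comb(n, m)
print(contribution)  # equals t exactly
```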

2.1 Finite sample upper bound

Without loss of generality, we will operate under the assumption that the kernel is centered.

Assumption 1.

Suppose $h$ is centered, i.e.

$$\mathbb{E}\, h(X_1, \ldots, X_m) = 0.$$

We need to define the rate functions $I, J$ as below.

Definition 2.1.

Let $I, J\colon \mathbb{R}^{\geq 0} \to \mathbb{R}^{\geq 0}$ be two non-decreasing functions that upper bound the right tails of $h$ and $|X_i|$ exponentially. In other words:

$$\mathbb{P}\left(h(X_1, \ldots, X_m) > t\right) \leq \exp(-I(t)), \quad \forall t \geq 0, \qquad (2.1)$$

$$\mathbb{P}\left(|X_i| > t\right) \leq \exp(-J(t)), \quad \forall t \geq 0. \qquad (2.2)$$

Note that one can simply take $I(t) = -\log \mathbb{P}\left(h(X_1, \ldots, X_m) > t\right)$ and $J(t) = -\log \mathbb{P}\left(|X_i| > t\right)$. The whole point of Definition 2.1 is to allow one to work with possibly simpler upper bounds.
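As a concrete example of Definition 2.1 (the specific tail below is an assumption for illustration, not taken from the paper): if $\mathbb{P}(|X| > t) = e^{-\sqrt{t}}$, a Weibull-type tail, then $J(t) = \sqrt{t}$ satisfies (2.2) with equality, and any smaller non-decreasing function is a looser but still valid choice.

```python
# Sketch of Definition 2.1 for an assumed Weibull-type tail
# P(|X| > t) = exp(-sqrt(t)); then J(t) = sqrt(t) satisfies (2.2) exactly.
import math

def tail(t):
    """Assumed tail probability P(|X| > t)."""
    return math.exp(-math.sqrt(t))

def J(t):
    """Exact choice J(t) = -log P(|X| > t)."""
    return math.sqrt(t)

for t in [0.0, 0.5, 1.0, 10.0, 100.0]:
    assert tail(t) <= math.exp(-J(t)) + 1e-15  # inequality (2.2)
print(J(100.0))  # 10.0
```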

The Lemma below is a simplified, modified version of Lemma 3.1 from [9].

Lemma 2.2.

Let $k \triangleq \lfloor \frac{n}{m} \rfloor$. Then, for any $L \geq 0$ and $\lambda \geq 0$, the following holds:

$$\mathbb{E}\left[\exp\left(\lambda \frac{1}{\binom{n}{m}} \sum_{1 \leq i_1 < \ldots < i_m \leq n} h_L(X_{i_1}, \ldots, X_{i_m})\right)\right] \leq \mathbb{E}\left[\exp\left(\frac{\lambda}{k} h_L(X_1, \ldots, X_m)\right)\right]^k,$$

where $h_L(X_{i_1}, \ldots, X_{i_m}) \triangleq h(X_{i_1}, \ldots, X_{i_m})\, \mathbf{1}(h(X_{i_1}, \ldots, X_{i_m}) \leq L)$.

Proof.

Define $B(X_1, \ldots, X_n) = \frac{1}{k}\left(h_L(X_1, \ldots, X_m) + h_L(X_{m+1}, \ldots, X_{2m}) + \ldots + h_L(X_{km-m+1}, \ldots, X_{km})\right)$; then we have

$$\frac{1}{\binom{n}{m}} \sum_{i_1 < \ldots < i_m} h_L(X_{i_1}, \ldots, X_{i_m}) = \frac{1}{n!} \sum_{\sigma \in S_n} B(X_{\sigma(1)}, \ldots, X_{\sigma(n)}).$$

Since $h_L \leq L$ is bounded, its moment generating function is finite at every positive point; hence we obtain

$$\begin{aligned}
\mathbb{E}\left[\exp\left(\lambda \frac{1}{\binom{n}{m}} \sum_{1 \leq i_1 < \ldots < i_m \leq n} h_L(X_{i_1}, \ldots, X_{i_m})\right)\right] &= \mathbb{E}\left[\exp\left(\frac{\lambda}{n!} \sum_{\sigma} B(X_{\sigma(1)}, \ldots, X_{\sigma(n)})\right)\right] \\
&\overset{*}{\leq} \frac{1}{n!} \sum_{\sigma} \mathbb{E}\left[\exp\left(\lambda B(X_{\sigma(1)}, \ldots, X_{\sigma(n)})\right)\right] \\
&= \mathbb{E}\left[\exp\left(\lambda B(X_1, \ldots, X_n)\right)\right] \\
&= \mathbb{E}\left[\exp\left(\frac{\lambda}{k} h_L(X_1, \ldots, X_m)\right)\right]^k.
\end{aligned}$$

To obtain the inequality marked by $*$, we used the fact that $\exp\left(\frac{1}{n!}\sum \lambda B\circ\sigma\right) \leq \frac{1}{n!}\sum \exp\left(\lambda B\circ\sigma\right)$, by convexity of the exponential function.

\square
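Since both sides of the bound in Lemma 2.2 are plain expectations, the inequality can be verified by exact enumeration on a tiny discrete example. The Bernoulli law and the truncated product kernel below are illustrative assumptions, chosen only to keep the enumeration small:

```python
# Exact check of Lemma 2.2 on a toy example: X in {0, 1} with P(X=1) = p,
# kernel h_L(x, y) = min(x*y, L), n = 4, m = 2, so k = n // m = 2.
# Both sides are computed by exhaustive enumeration, no sampling.
from itertools import combinations, product
from math import comb, exp

p, lam, L = 0.3, 1.7, 0.8
n, m = 4, 2
k = n // m

def hL(x, y):
    return min(x * y, L)

def pr(xs):  # probability of an outcome tuple
    out = 1.0
    for x in xs:
        out *= p if x == 1 else 1 - p
    return out

lhs = sum(
    pr(xs) * exp(lam * sum(hL(*c) for c in combinations(xs, m)) / comb(n, m))
    for xs in product([0, 1], repeat=n)
)
rhs = sum(pr(xy) * exp(lam / k * hL(*xy)) for xy in product([0, 1], repeat=m)) ** k
print(lhs <= rhs + 1e-12)  # True, as Lemma 2.2 guarantees
```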

For the sake of simplicity, we drop the arguments of $h(X_1, \ldots, X_m)$ and $h_L(X_1, \ldots, X_m)$ and simply write $h, h_L$.

Lemma 2.3 (Lemma 1 of [2]).

If $\mathbb{E}[h] = 0$, then for any $\eta, L \geq 0$ we have

$$\mathbb{E}\left[\exp(\eta h_L)\right] \leq \exp\left(\frac{v(L,\eta)}{2}\eta^2\right),$$

where $v(L,\eta) \triangleq \mathbb{E}\left[h_L^2\, \mathbf{1}(h \leq 0) + h_L^2 \exp(\eta h_L)\, \mathbf{1}(h > 0)\right]$.

Theorem 2.4.

Under Assumption 1, for any $0 < \beta \leq 1$ and $t \geq 0$,

$$\mathbb{P}(U_n > t) \leq \exp\left(-\frac{kt^2}{2 v\left(kt, \beta \frac{I(kt)}{kt}\right)}\right) + \exp\left(-\beta I(kt) \max\left(\tfrac{1}{2}, c(t,\beta,k)\right)\right) + \binom{n}{m} \exp\left(-I(kt)\right), \qquad (2.3)$$

where $v(\cdot,\cdot)$ is the same as in Lemma 2.3, $c(t,\beta,k) \triangleq 1 - \frac{\beta}{2t} \frac{I(kt)}{kt} v\left(kt, \beta \frac{I(kt)}{kt}\right)$, and $k = \lfloor \frac{n}{m} \rfloor$.

Proof.

Denote the U-statistic with kernel $h$ by $U_n(h)$. Then

$$\begin{aligned}
\mathbb{P}\left(U_n(h) > t\right) &\leq \mathbb{P}\left(U_n(h_L) > t\right) + \mathbb{P}\left(\exists\, i_1, \ldots, i_m\colon\; h(X_{i_1}, \ldots, X_{i_m}) > L\right) \\
&\leq \exp(-\lambda t)\, \mathbb{E}\left[\exp\left(\lambda U_n(h_L)\right)\right] + \binom{n}{m} \exp(-I(L)) \\
&\leq \exp(-\lambda t) \left(\mathbb{E}\left[\exp\left(\tfrac{\lambda}{k} h_L\right)\right]\right)^k + \binom{n}{m} \exp(-I(L)) && \text{(Lemma 2.2)} \\
&\leq \exp(-\lambda t) \left(\exp\left(\frac{v(L, \frac{\lambda}{k})}{2}\frac{\lambda^2}{k^2}\right)\right)^k + \binom{n}{m} \exp(-I(L)) && \text{(Lemma 2.3)} \\
&= \exp\left(-\lambda t + \frac{v(L, \frac{\lambda}{k})}{2k} \lambda^2\right) + \binom{n}{m} \exp(-I(L)).
\end{aligned}$$

Choose $L = kt$. To conclude the proof, we only need to show that there is always a choice of $\lambda$ which makes

$$\exp\left(-\lambda t + \frac{v(kt, \frac{\lambda}{k})}{2k} \lambda^2\right) \leq \exp\left(-\frac{kt^2}{2 v\left(kt, \beta \frac{I(kt)}{kt}\right)}\right) + \exp\left(-\beta I(kt) \max\left(\tfrac{1}{2}, c(t,\beta,k)\right)\right).$$

We consider two cases:

  1. If $\frac{t}{v\left(kt, \beta \frac{I(kt)}{kt}\right)} \leq \frac{\beta I(kt)}{kt}$, choose $\lambda = \frac{kt}{v\left(kt, \beta \frac{I(kt)}{kt}\right)}$, so that $\frac{\lambda}{k} = \frac{t}{v\left(kt, \beta \frac{I(kt)}{kt}\right)} \leq \frac{\beta I(kt)}{kt}$.

  2. If $\frac{t}{v\left(kt, \beta \frac{I(kt)}{kt}\right)} > \frac{\beta I(kt)}{kt}$, choose $\lambda = \frac{\beta I(kt)}{t}$.

Then, in case 1, since $\frac{\lambda}{k} \leq \frac{\beta I(L)}{L}$, we have $v(L, \frac{\lambda}{k}) \leq v\left(L, \frac{\beta I(kt)}{kt}\right)$ (note that $v(L,\cdot)$ is increasing in its second argument). Hence,

$$-\lambda t + \frac{v(kt, \frac{\lambda}{k})}{2k} \lambda^2 \leq -\lambda t + \frac{v\left(kt, \frac{\beta I(kt)}{kt}\right)}{2k} \lambda^2 = -\frac{kt^2}{2 v\left(kt, \frac{\beta I(kt)}{kt}\right)}.$$

In the second case, one just needs to substitute $\lambda$ to obtain

$$\begin{aligned}
-\lambda t + \frac{v(kt, \frac{\lambda}{k})}{2k} \lambda^2 &= -\beta I(kt) + \frac{v\left(kt, \frac{\beta I(kt)}{kt}\right) \beta^2 I(kt)^2}{2kt^2} \\
&= -\beta I(kt) \left(1 - \frac{v\left(kt, \frac{\beta I(kt)}{kt}\right) \beta I(kt)}{2kt^2}\right) \\
&= -\beta\, c(t,\beta,k)\, I(kt) \\
&= -\beta \max\left(\tfrac{1}{2}, c(t,\beta,k)\right) I(kt).
\end{aligned}$$

Note that since in this case $\frac{t}{v\left(kt, \beta \frac{I(kt)}{kt}\right)} > \frac{\beta I(kt)}{kt}$, we have $c(t,\beta,k) > \frac{1}{2}$, so $\max\left(\frac{1}{2}, c(t,\beta,k)\right) = c(t,\beta,k)$. The $\max$ operator controls this term when we are in the first case, so the upper bound does not blow up.

\square
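To get a feel for the three terms in (2.3), one can evaluate them numerically. The sketch below assumes $I(t) = c\,t^{1/\alpha}$ and, following Section 2.1.1, replaces $v(\cdot,\cdot)$ by the constant $\mathrm{Var}(h)$, which is only an asymptotic stand-in; all numeric inputs are illustrative assumptions, not values from the paper.

```python
# Evaluate the three terms of the bound (2.3) under illustrative assumptions:
# I(t) = c * t**(1/alpha) and v(., .) ~ Var(h) (its large-n limit per Remark 2.6).
from math import comb, exp

def bound_terms(n, m, t, c=1.0, alpha=2.0, beta=0.9, var_h=1.0):
    k = n // m
    I_kt = c * (k * t) ** (1 / alpha)
    v = var_h                                   # stand-in for v(kt, beta*I(kt)/kt)
    c_tbk = 1 - (beta / (2 * t)) * (I_kt / (k * t)) * v
    gauss = exp(-k * t * t / (2 * v))           # Gaussian term
    mid = exp(-beta * I_kt * max(0.5, c_tbk))   # intermediate term
    union = comb(n, m) * exp(-I_kt)             # union-bound / kernel-tail term
    return gauss, mid, union

gauss, mid, union = bound_terms(n=2000, m=2, t=1.0)
print(gauss + mid + union)   # the kernel-tail term dominates at this t
```

At these illustrative values the third term, $\binom{n}{m}e^{-I(kt)}$, dominates, consistent with the large deviation region described in the next remark.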

Remark 2.5 (Two regions of deviations).

Inequality (2.3) reveals two different decay rates for the tail of $U_n$. For small $t$, the first term, i.e. $\exp\left(-\frac{kt^2}{2v}\right)$, is dominant, so we observe Gaussian-like deviation. This behavior has already been studied via the CLT for U-statistics [12]. For larger $t$, the last two terms on the right-hand side of (2.3) become dominant. We call this region the large deviation region. Asymptotically, the sum of the last two terms decays like $\binom{n}{m} \exp(-I(kt))$ for both subWeibull and polynomial-tail kernels (see Section III of [2] for a detailed discussion).

Inequality (2.3) exhibits large deviation behavior whenever

$$\frac{kt^2}{v(kt,\beta)} \gg I(kt). \qquad (2.4)$$

For instance, when $I(kt) = \sqrt[\alpha]{kt}$ with $\alpha \geq 1$, we have large deviation behavior for $t \gg k^{-\frac{\alpha-1}{2\alpha-1}} = \lfloor \frac{n}{m} \rfloor^{-\frac{\alpha-1}{2\alpha-1}}$. This means the region of Gaussian deviation shrinks to $0$ as $n \to \infty$ when $\alpha > 1$.
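The crossover scale in the remark above is easy to tabulate; a small sketch assuming $I(kt) = (kt)^{1/\alpha}$:

```python
# The Gaussian/large-deviation crossover scale from Remark 2.5:
# for I(kt) = (kt)**(1/alpha), the large deviation region starts around
# t ~ k**(-(alpha - 1) / (2*alpha - 1)), which shrinks to 0 when alpha > 1.
def crossover_scale(k, alpha):
    return k ** (-(alpha - 1) / (2 * alpha - 1))

for k in [10, 100, 1000, 10000]:
    print(k, round(crossover_scale(k, alpha=2.0), 4))
# For alpha = 1 (exponential-type tail) the exponent is 0, so the scale stays 1.
```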

2.1.1 Parameters of inequality (2.3)

Theorem 2.4 bounds the tail of $U_n$ in terms of $k = \lfloor \frac{n}{m} \rfloor$ and the tail of the kernel $h$. The only terms of (2.3) that might seem unfamiliar are $c(t,\beta,k)$ and $v\left(kt, \beta \frac{I(kt)}{kt}\right)$. What typically happens in the asymptotic setting $n \to \infty$ is $c(t,\beta,k) \to 1$ and $v\left(kt, \beta \frac{I(kt)}{kt}\right) \to \mathrm{Var}(h)$. Moreover, $\beta$ can be chosen arbitrarily close to $1$. Hence, for large $n$, one can think of the upper bound (2.3) as $\exp\left(-\frac{kt^2}{2\mathrm{Var}(h)}\right) + \left(1 + \binom{n}{m}\right)\exp(-I(kt))$. For logarithmic $I(t)$, which corresponds to polynomial-tail kernels, there are more restrictions on the constant $\beta$. Nevertheless, in the large deviation regime, the dominant term of (2.3) is still $\binom{n}{m}\exp(-I(kt))$. This Section contains several statements that make the above claims precise.

Remark 2.6.

Lemma 4 of [2] states that $v\left(kt, \beta \frac{I(kt)}{kt}\right) \xrightarrow{k\to\infty} \mathrm{Var}(h)$ in either of the following setups:

  1. $I(t) = c\sqrt[\alpha]{t}$, $\alpha > 1$, $\beta < 1$;

  2. $I(t) = \gamma \log(t)$, $\gamma > 2$, $\beta < 1 - \frac{2}{\gamma}$.

Hence, for large values of $k$, one should be able to upper bound the first term of (2.3) with $\exp\left(-\frac{kt^2}{2C\,\mathrm{Var}(h)}\right)$, where $C$ can be taken arbitrarily close to $1$.

In the above, Case 1 includes all subWeibull variables, and Case 2 includes variables with polynomial tails and finite variances.

Remark 2.7.

When $kt^2 \gg I(kt)$ and $v\left(kt, \beta \frac{I(kt)}{kt}\right)$ is bounded, we have

$$c(t,\beta,k) \xrightarrow{k\to\infty} 1. \qquad (2.5)$$

This includes both cases of Remark 2.6, with $t \gg k^{-\frac{\alpha-1}{2\alpha-1}}$ and $t \gg \sqrt{\frac{\log k}{k}}$, respectively.

To verify (2.5), it suffices to note that $c(t,\beta,k) = 1 - \frac{\beta}{2t} \frac{I(kt)}{kt} v\left(kt, \beta \frac{I(kt)}{kt}\right)$, that $\frac{I(kt)}{kt} \xrightarrow{k\to\infty} 0$, and that all other terms in the definition of $c(t,\beta,k)$ remain bounded.

While Remark 2.6 provides asymptotic bounds for $v\left(kt, \beta \frac{I(kt)}{kt}\right)$, one might need bounds for the finite sample case to utilize Theorem 2.4. The next Lemma and Remark provide such bounds.

Lemma 2.8.

If $I(t) \geq c\sqrt[\alpha]{t}$ for some $\alpha \geq 1$, $\mathrm{Var}(h) < \infty$, and $\beta < 1$ is fixed, then there is a fixed number $v < \infty$ such that for any $L > 1$ and $\eta \leq \beta \frac{I(L)}{L}$ we have $v(L,\eta) \leq v$.

Proof.

Since $\eta \leq \beta \frac{I(L)}{L}$, we have $v(L,\eta) \leq v\left(L, \beta \frac{I(L)}{L}\right)$. Moreover, by Corollary 2 of [2] we obtain

$$\begin{aligned}
v\left(L, \beta \frac{I(L)}{L}\right) &\leq \mathbb{E}\left[h^2\, \mathbf{1}(h \leq 0)\right] + \frac{\Gamma(2\alpha+1)}{((1-\beta)c)^{2\alpha}} + L^{\frac{1}{\alpha}-1} \frac{\beta c\, \Gamma(3\alpha+1)}{3((1-\beta)c)^{3\alpha}} \\
&\leq \mathbb{E}\left[h^2\right] + \frac{\Gamma(2\alpha+1)}{((1-\beta)c)^{2\alpha}} + \frac{\beta c\, \Gamma(3\alpha+1)}{3((1-\beta)c)^{3\alpha}}. \qquad (2.6)
\end{aligned}$$

The right-hand side of (2.6) does not depend on $L$; hence it remains constant as $L \to \infty$.

Note that there is a slight change of variables between the function $v(L,\eta)$ defined here and the one defined in [2], which of course has been taken into account in quoting Corollary 2 from [2].

\square

Remark 2.9.

If $I(t) \geq \gamma \log t$ with $\gamma > 2$, which includes kernels with polynomial tails and finite variances, Corollary 3 of [2] yields

$$v\left(L, \beta \frac{I(L)}{L}\right) \leq C L^{2-(1-\beta)\gamma} \log L, \qquad (2.7)$$

for some $C$ independent of $L$.

Hence, for $\beta < 1 - \frac{1}{\gamma}$, while $v\left(L, \beta \frac{I(L)}{L}\right)$ can grow as $L \to \infty$, the last two terms of (2.3) are still the dominant terms of the right-hand side. As discussed in [2], $\beta < 1 - \frac{1}{\gamma}$ is sufficient for obtaining sharp upper bounds on the deviation probability of a sum of iid variables with polynomial tails, i.e. U-statistics of order $m = 1$.

2.2 Lower bound

Let $J(t)$ be the function defined in Definition 2.1, and let $A_n = \left[-J^{-1}(\log 2n), J^{-1}(\log 2n)\right]^{n-1}$, where $J^{-1}$ denotes the generalized inverse of $J$ and $[\cdot,\cdot]$ is the closed interval with the given endpoints. Define:

Definition 2.10.
$$\varphi_n(X) \triangleq \inf_{(X_1, \ldots, X_{n-1}) \in A_n} h(X_1, X_2, \ldots, X_{m-1}, X). \qquad (2.8)$$

Note that we force $X_1, \ldots, X_{n-1}$ to lie in $A_n$, but only the first $m-1$ of them appear as arguments of the infimum.

Then we have:

Lemma 2.11.

Assume the kernel $h$ has finite variance. Then

$$\mathbb{P}(U_n > t) \geq C\, \mathbb{P}\left(\varphi_n(X_n) \geq \frac{nt}{m}\right), \qquad (2.9)$$

where $C > 0$ is an absolute constant independent of $n$.

Proof.
$$\begin{aligned}
\mathbb{P}(U_n > t) &\geq \mathbb{P}\left(U_{n-1}(X_1, \ldots, X_{n-1}) \geq 0,\; \sum_{i_1 < \ldots < i_{m-1} < n} h(X_{i_1}, X_{i_2}, \ldots, X_{i_{m-1}}, X_n) > \binom{n}{m} t\right) \\
&\geq \mathbb{P}\left(U_{n-1}(X_1, \ldots, X_{n-1}) \geq 0,\; (X_1, \ldots, X_{n-1}) \in A_n,\; \binom{n-1}{m-1} \varphi_n(X_n) > \binom{n}{m} t\right) \\
&\geq \mathbb{P}\left(U_{n-1}(X_1, \ldots, X_{n-1}) \geq 0,\; |X_i| \leq J^{-1}(\log 2n)\;\; \forall i \leq n-1\right) \mathbb{P}\left(\varphi_n(X_n) > \frac{nt}{m}\right).
\end{aligned}$$

Note that

$$\begin{aligned}
\mathbb{P}\left(|X_i| \leq J^{-1}(\log 2n)\;\; \forall i \leq n-1\right) &= \left(1 - \mathbb{P}\left(|X_i| > J^{-1}(\log 2n)\right)\right)^{n-1} \\
&\geq \left(1 - \exp\left(-J(J^{-1}(\log 2n))\right)\right)^{n-1} \\
&\geq \left(1 - \frac{1}{2n}\right)^{n-1} \xrightarrow{n\to\infty} \frac{1}{\sqrt{\mathrm{e}}}.
\end{aligned}$$

Moreover, $\mathbb{P}(U_{n-1} \geq 0) \xrightarrow{n\to\infty} \frac{1}{2}$ by the CLT for U-statistics [12]. Hence, for large enough $n$ we obtain:

$$\mathbb{P}\left(U_{n-1}(X_1, \ldots, X_{n-1}) \geq 0,\; |X_i| \leq J^{-1}(\log 2n)\;\; \forall i \leq n-1\right) \geq 0.9\left(\frac{1}{2} + \frac{1}{\sqrt{\mathrm{e}}} - 1\right) > 0.$$

Choosing $C < 0.9\left(\frac{1}{\sqrt{\mathrm{e}}} - \frac{1}{2}\right)$, and small enough to cover the finitely many cases before the above asymptotics take effect, concludes the proof.

\square

Remark 2.12.

$A_n = \left[-J^{-1}(\log 2n), J^{-1}(\log 2n)\right]^{n-1}$ in Lemma 2.11 can be replaced with any sequence of events $A_n \subset \mathbb{R}^{n-1}$ for which $\liminf\limits_{n\to\infty} \mathbb{P}(A_n) > \frac{1}{2}$. Also, one can work with $\mathbb{P}\left(U_{n-1} \geq -\varepsilon,\; \varphi_n(X_n) > \frac{nt}{m} + \frac{\varepsilon}{\binom{n}{m}}\right)$ to relax this condition to $\liminf\limits_{n\to\infty} \mathbb{P}(A_n) > 0$.

2.3 Large Deviation Principle

In this Section, we show the upper bound (2.3) is asymptotically tight in certain cases. The idea is to show that the rate function for the large deviation of a U-statistic is asymptotically equivalent to the right-hand side of (2.3). A trivial requirement for such sharpness is that the functions $I(t), J(t)$ defined in Definition 2.1 be asymptotically tight themselves. This is formalized in the next assumption.

Assumption 2.

Suppose

$$\lim_{t\to\infty} \frac{-\log \mathbb{P}\left(h(X_1, \ldots, X_m) > t\right)}{I(t)} = \lim_{t\to\infty} \frac{-\log \mathbb{P}\left(|X_i| > t\right)}{J(t)} = 1.$$

Hereafter, we focus on subWeibull distributions. The tail of such a distribution is bounded by that of some Weibull distribution; hence all of its moments are finite. Nonetheless, its exponential moments are not finite. Assumption 3 encodes the class of heavy-tailed subWeibull random variables.

Assumption 3.

Assume there are $\alpha > 1$ and $c > 0$ such that $I(t) \geq c\sqrt[\alpha]{t}$ for all $t > 0$.

Although the LDP is derived here under Assumptions 3 and 6, we think one can obtain similar results for distributions with polynomial tails, i.e. logarithmic $I(t), J(t)$, and finite second moments, following the footsteps of this Section and Bakhshizadeh et al. [2]. However, for the sake of brevity, we do not include polynomial tails in the following Sections.

Moreover, we need a lower bound on the deviation amount $t$ to make sure it is in the large deviation regime. The assumption below provides such a bound.

Assumption 4.

Suppose $\frac{kt^2}{I(kt)} \to \infty$, i.e. $kt^2 \gg I(kt)$, as $n \to \infty$.

Remark 2.13.

For $I(t) = c\sqrt[\alpha]{t}$ with $\alpha > 1$, Assumption 4 holds whenever $t \gg n^{-\frac{\alpha-1}{2\alpha-1}}$. This includes constant $t$ as well as sequences $t_n$ converging to $0$, as long as they decay more slowly than $\left(\frac{1}{n}\right)^{\frac{\alpha-1}{2\alpha-1}}$.

Lemma 2.14.

Suppose Assumptions 1, 2, 3, and 4 hold. For φn()\varphi_{n}(\cdot) defined in (2.8) and I()I(\cdot) in Theorem 2.4 if one has

limnlog(φn(X)nmt)I(kt)=1,\lim_{n\to\infty}\frac{-\log\mathbb{P}\left({\varphi_{n}(X)\geq\frac{n}{m}t}\right)}{I(kt)}=1, (2.10)

where k=nmk=\lfloor\frac{n}{m}\rfloor, then

limnlog(Un>t)I(kt)=1.\lim_{n\to\infty}\frac{-\log\mathbb{P}\left({U_{n}>t}\right)}{I(kt)}=1.

In other words, I(kt)I(kt) is the large deviation rate function for UnU_{n}.

We postpone the proof of the above lemma to Section 4.1.

Condition (2.10) essentially says that large values of $h(X_{1},\ldots,X_{m})$ are driven by a large value of only one coordinate. It can be verified for many commonly used kernels applied to heavy-tailed variables $X_{i}$. Assuming the tail function of $\lvert X_{i}\rvert$ is shifted sub-additive, a usual property of heavy tails, we can prove (2.10) for several kernels.

Assumption 5.

Suppose a constant shift makes $J(t)$, defined in Definition 2.1, sub-additive on the non-negative real numbers, i.e.

J(t1+t2)J(t1)+J(t2)+b,t1,t20,J(t_{1}+t_{2})\leq J(t_{1})+J(t_{2})+b,\quad\forall t_{1},t_{2}\geq 0, (2.11)

where bb\in\mathbb{R} is an absolute constant.

Remark 2.15.

Assumption 5 is, in effect, a heavy-tailed requirement on the random variable $X$. While it does not directly state that $J$ is a sublinear function, which is equivalent to the formal definition of a heavy-tailed distribution, it restricts the growth of $J$ to be at most linear. Lemma 2.16 and Remarks 2.17 and 2.18 show that most well-known heavy-tailed distributions admit a shifted sub-additive tail function $J$.

Lemma 2.16.

If $J\colon\mathbb{R}\to\mathbb{R}$ is a concave function on the non-negative real numbers, then $J$ satisfies (2.11) with $b=-J(0)$.

Proof.

By concavity of JJ we have

J(t1)=J(t1t1+t2(t1+t2)+t2t1+t20)t1t1+t2J(t1+t2)+t2t1+t2J(0).J(t_{1})=J\left(\frac{t_{1}}{t_{1}+t_{2}}(t_{1}+t_{2})+\frac{t_{2}}{t_{1}+t_{2}}0\right)\geq\frac{t_{1}}{t_{1}+t_{2}}J(t_{1}+t_{2})+\frac{t_{2}}{t_{1}+t_{2}}J(0).

Similarly, we obtain J(t2)t2t1+t2J(t1+t2)+t1t1+t2J(0)J(t_{2})\geq\frac{t_{2}}{t_{1}+t_{2}}J(t_{1}+t_{2})+\frac{t_{1}}{t_{1}+t_{2}}J(0). Summing up the above two inequalities shows

J(t1+t2)J(t1)+J(t2)J(0).J(t_{1}+t_{2})\leq J(t_{1})+J(t_{2})-J(0).

\square
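Lemma 2.16 can be sanity-checked numerically; the sketch below verifies the shifted sub-additivity (2.11) on a grid for the concave tail function $J(t)=\sqrt{t}$, for which $b=-J(0)=0$:

```python
import math, itertools

# Numeric check of Lemma 2.16 with the concave tail function
# J(t) = sqrt(t) (so J(0) = 0): shifted sub-additivity
#   J(t1 + t2) <= J(t1) + J(t2) - J(0)
# should hold at every pair of non-negative grid points.
J = math.sqrt
grid = [0.0, 0.1, 0.5, 1.0, 3.0, 10.0, 100.0]
violations = [
    (t1, t2)
    for t1, t2 in itertools.product(grid, repeat=2)
    if J(t1 + t2) > J(t1) + J(t2) - J(0) + 1e-12
]
```

An empty `violations` list confirms the inequality on the grid, as concavity guarantees.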

Remark 2.17.

For the below distributions, one can find a function J(t)J(t) as defined in Definition 2.1 which is both asymptotically tight and shifted sub-additive, i.e. satisfies Assumptions 2, 5.

  1. Exponential

  2. $\mathinner{\!\left\lvert\mathcal{N}(0,1)\right\rvert}^{\alpha},\quad\alpha\geq 2$

  3. Log-Normal

  4. Weibull distribution with shape parameter $s\leq 1$

  5. Log-Logistic

  6. Pareto

We postpone the proof of Remark 2.17 to Section 4.2.
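As an illustration of Assumption 5 for one of the listed distributions, the snippet below checks (2.11) on a grid for the Log-logistic tail function $J(t)=\log(1+(t/\alpha)^{\beta})$, using the shift $b=\beta\log 2$ obtained in the proof of case 5 of Remark 2.17 (the parameter values are arbitrary choices for illustration):

```python
import math, itertools

# Grid check of Assumption 5 for the Log-logistic tail function
# J(t) = log(1 + (t/alpha)^beta); the proof of Remark 2.17 shows
# it satisfies (2.11) with b = beta * log(2).
alpha, beta = 1.0, 2.5
b = beta * math.log(2)
J = lambda t: math.log(1 + (t / alpha) ** beta)
grid = [0.0, 0.2, 1.0, 5.0, 25.0, 200.0]
ok = all(
    J(t1 + t2) <= J(t1) + J(t2) + b + 1e-12
    for t1, t2 in itertools.product(grid, repeat=2)
)
```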

Remark 2.18.

In general, with simple modifications, we expect the tail function $J(t)$ of a heavy-tailed distribution to satisfy Assumption 5. This includes distributions not named in Remark 2.17. Note that $J(t)=-\log\mathbb{P}\left({\mathinner{\!\left\lvert X\right\rvert}>t}\right)$ is an increasing function that grows to infinity and, by the heavy-tailed assumption, is sub-linear. Hence, it is expected that $J(t)$ becomes concave after some point $t\geq T$. Using the same technique as in the proof of case 3 (Log-Normal) of Remark 2.17, one can define a function $J_{2}(t)$ which equals $J(t)$ on $[T,\infty)$ and extends linearly to $[0,T]$ so that it lies below $J(t)$ and remains concave on all non-negative real numbers. Lemma 2.16 then shows $J_{2}(t)$ satisfies (2.11).

Assumption 6.

Suppose J(t)logtJ(t)\gg\log t as tt\to\infty, i.e. limtlogtJ(t)=0\lim\limits_{t\to\infty}\frac{\log t}{J(t)}=0.

Lemma 2.19.

Under Assumptions 2, 5, and 6, condition (2.10) holds for any bounded $t$ in the following cases:

  1. $h(X,Y)=\mathinner{\!\left\lvert X-Y\right\rvert}-\mathbb{E}\mathinner{\left[\mathinner{\!\left\lvert X-Y\right\rvert}\right]}$

  2. $h(X,Y)=(X-Y)^{2}-\mathbb{E}\mathinner{\left[(X-Y)^{2}\right]}$

  3. $h(X_{1},\ldots,X_{m})=\max(\mathinner{\!\left\lvert X_{1}\right\rvert},\ldots,\mathinner{\!\left\lvert X_{m}\right\rvert})-\mathbb{E}\mathinner{\left[\max(\mathinner{\!\left\lvert X_{1}\right\rvert},\ldots,\mathinner{\!\left\lvert X_{m}\right\rvert})\right]}$

  4. $h(X,Y)=\frac{1}{2}(X^{2}+Y^{2})-\max{(X,Y)}-\mathbb{E}\mathinner{\left[X^{2}\right]}+\mathbb{E}\mathinner{\left[\max{(X,Y)}\right]}$

Hence, under the extra Assumptions 3 and 4, Lemma 2.14 yields

limnlog(Un>t)I(kt)=1,\lim_{n\to\infty}\frac{-\log\mathbb{P}\left({U_{n}>t}\right)}{I(kt)}=1,

for U-statistics constructed with the above kernels.

The proof of the above lemma is postponed to Section 4.3.
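The single-coordinate phenomenon behind (2.10) can already be seen in a simulation. For case 1 with $X,Y\sim\text{Exp}(1)$, the difference $X-Y$ is Laplace$(0,1)$, so $\mathbb{P}(\lvert X-Y\rvert>u)=e^{-u}$ matches the one-coordinate tail $J(u)=u$ exactly; the Monte Carlo sketch below (illustrative only) checks this:

```python
import math, random

# Single-coordinate phenomenon behind (2.10), case 1, with
# X, Y ~ Exp(1): X - Y is Laplace(0, 1), so P(|X - Y| > u) = exp(-u),
# i.e. the kernel tail matches the one-coordinate tail J(u) = u.
random.seed(1)
n = 500_000
pairs = [(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(n)]

def tail_ratio(u):
    """-log P_hat(|X - Y| > u) divided by J(u) = u."""
    p_hat = sum(abs(x - y) > u for x, y in pairs) / n
    return -math.log(p_hat) / u

ratios = [tail_ratio(u) for u in (3.0, 5.0, 7.0)]
```

The estimated ratios sit close to 1, in line with the limit in (2.10).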

Remark 2.20.

The last kernel in Lemma 2.19 is related to ω2\omega^{2}-statistics for the goodness of fit [23].

Remark 2.21.

Similar to case 3 of Lemma 2.19, if one takes $J(t)$ to be the tail function of $X$ instead of $\mathinner{\!\left\lvert X\right\rvert}$, i.e. $\mathbb{P}\left({X>t}\right)\lesssim\exp\left({-J(t)}\right)$, she can show condition (2.10) also holds for the kernel below:

h(X1,,Xm)=max(X1,,Xm)𝔼[max(X1,,Xm)].h(X_{1},...,X_{m})=\max(X_{1},...,X_{m})-\mathbb{E}\mathinner{\left[\max(X_{1},...,X_{m})\right]}.

2.4 Discussion on the necessity of condition (2.10)

Lemma 2.14 shows that the upper bound given in Theorem 2.4 is asymptotically sharp if (2.10) holds. Lemma 2.19 lists some common U-statistic kernels for which (2.10) holds. One might ask whether (2.10) is necessary for asymptotic sharpness and the LDP. Below, we study an example for which the bound given by Theorem 2.4 is not sharp. This shows kernels need to satisfy certain conditions in order to match the asymptotics of the right-hand side of (2.3). We do not determine the necessary conditions for such kernels in this work; this is an interesting question for future studies.

Consider $m=2$ and $h(x,y)=xy$. Also, assume $X$ has a symmetric distribution around the origin with $J(t)=ct^{\alpha}$, $\alpha<1$ (e.g. a symmetrized Weibull variable, or $X=\operatorname{sign}(Z)\mathinner{\!\left\lvert Z\right\rvert}^{\frac{2}{\alpha}}$ with $Z\sim\mathcal{N}(0,1)$). In this case,

(XY>u)\displaystyle\mathbb{P}\left({XY>u}\right) (|X|>u or |Y|>u)2exp(J(u))=2exp(cuα2),\displaystyle\leq\mathbb{P}\left({\mathinner{\!\left\lvert X\right\rvert}>\sqrt{u}\text{ or }\mathinner{\!\left\lvert Y\right\rvert}>\sqrt{u}}\right)\simeq 2\exp\left({-J(\sqrt{u})}\right)=2\exp\left({-cu^{\frac{\alpha}{2}}}\right),
(XY>u)\displaystyle\mathbb{P}\left({XY>u}\right) (X>u)(Y>u)exp(2J(u))=exp(2cuα2),\displaystyle\geq\mathbb{P}\left({X>\sqrt{u}}\right)\mathbb{P}\left({Y>\sqrt{u}}\right)\simeq\exp\left({-2J(\sqrt{u})}\right)=\exp\left({-2cu^{\frac{\alpha}{2}}}\right),

for large enough uu. Hence, for the tail function I(u)I(u) we will have

C1uα2I(u)C2uα2,C1,C2>0.C_{1}u^{\frac{\alpha}{2}}\leq I(u)\leq C_{2}u^{\frac{\alpha}{2}},\quad C_{1},C_{2}>0. (2.12)

If one directly applies Theorem 2.4, she obtains

(Un>t)exp(kt22v)+exp(βI(kt)max(12,c(t,β,k)))+(n2)exp(I(kt)),\mathbb{P}\left({U_{n}>t}\right)\leq\exp\left({-\frac{kt^{2}}{2v}}\right)+\exp\left({-\beta I(kt)\max(\frac{1}{2},c(t,\beta,k))}\right)+{n\choose 2}\exp\left({-I(kt)}\right), (2.13)

where the constant $v$ is given by Lemma 2.8, $\beta<1$ is arbitrary for large $n$, and $c(t,\beta,k)\xrightarrow{n\to\infty}1$. The right-hand side of (2.13) is at least $\exp\left({-I(kt)}\right)>\exp\left({-\frac{C_{2}}{2^{\frac{\alpha}{2}}}{n}^{\frac{\alpha}{2}}}\right)=\exp\left({-C_{2}^{\prime}n^{\frac{\alpha}{2}}}\right)$. This bound is loose, since we will show $\mathbb{P}\left({U_{n}>t}\right)$ decays like $\exp\left({-C_{3}n^{\alpha}}\right)$, and $n^{\alpha}\gg n^{\frac{\alpha}{2}}$ as $n\to\infty$.

Observe that

Un=1n(n1)(i=1nXi)21n(n1)i=1nXi2=nn1Sn21n(n1)Tn,U_{n}=\frac{1}{n(n-1)}\left(\sum_{i=1}^{n}X_{i}\right)^{2}-\frac{1}{n(n-1)}\sum_{i=1}^{n}X_{i}^{2}=\frac{n}{n-1}S_{n}^{2}-\frac{1}{n(n-1)}T_{n},

where Sn=1nXi,Tn=Xi2S_{n}=\frac{1}{n}\sum X_{i},\;T_{n}=\sum X_{i}^{2}.
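The identity above is elementary but easy to get wrong by a factor; the short check below (illustrative only) compares the direct U-statistic for $h(x,y)=xy$ with the closed form:

```python
import itertools, random

# Check of the algebraic identity for the product kernel h(x, y) = x*y:
#   U_n = n/(n-1) * S_n^2 - T_n / (n*(n-1)),
# where S_n is the sample mean and T_n the sum of squares.
random.seed(2)
n = 12
xs = [random.gauss(0, 1) for _ in range(n)]

u_direct = sum(xs[i] * xs[j] for i, j in itertools.combinations(range(n), 2))
u_direct /= n * (n - 1) / 2          # divide by C(n, 2)

s_n = sum(xs) / n
t_n = sum(x * x for x in xs)
u_identity = n / (n - 1) * s_n ** 2 - t_n / (n * (n - 1))
```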

Note that $S_{n}$ is a U-statistic of order $1$ with kernel $h_{2}(x)=x$, and $U_{n}\leq\frac{n}{n-1}S_{n}^{2}\leq 2S_{n}^{2}$. Also, for the kernel $h_{2}$ the tail function is $I_{2}(t)=J(t)-\log 2$. Hence, by Theorem 2.4 we obtain

(Un>t)\displaystyle\mathbb{P}\left({U_{n}>t}\right) (Sn>t2)\displaystyle\leq\mathbb{P}\left({S_{n}>\sqrt{\frac{t}{2}}}\right)
exp(nt4v)+C3exp(βmax(12,c(t,β,k))J(nt/2))+C3nexp(J(nt/2)).\displaystyle\leq\exp\left({-\frac{nt}{4v}}\right)+C_{3}\exp\left({-\beta\max(\frac{1}{2},c(t,\beta,k))J(n\sqrt{t/2})}\right)+C_{3}n\exp\left({-J(n\sqrt{t/2})}\right). (2.14)

As discussed in Section 2.3, the right hand side of (2.14) decays like exp(J(nt/2))=exp(C3nαtα2)\exp\left({-J(n\sqrt{t/2})}\right)=\exp\left({-C_{3}n^{\alpha}t^{\frac{\alpha}{2}}}\right), which is much smaller than (2.13) when nn is large.

Indeed, we can show (2.14) has the right order of decay. Consider the event $E=\left\{X_{n-1},X_{n}>\sqrt{n(n-1)t}\text{ and }U_{n-2}\geq 0\right\}$, for which

(Un>t)(E)exp(C4nαtα2).\mathbb{P}\left({U_{n}>t}\right)\geq\mathbb{P}\left({E}\right)\simeq\exp\left({-C_{4}n^{\alpha}t^{\frac{\alpha}{2}}}\right).

Therefore, one can show nαn^{\alpha} is the correct speed for the large deviation decay of UnU_{n}. In other words, there are constants C4,C5>0C_{4},C_{5}>0 such that for large enough nn

exp(C4(nt)α)(Un>t)exp(C5(nt)α).\exp\left({-C_{4}(n\sqrt{t})^{\alpha}}\right)\leq\mathbb{P}\left({U_{n}>t}\right)\leq\exp\left({-C_{5}(n\sqrt{t})^{\alpha}}\right).
Remark 2.22.

Note that $h(x,y)=xy$ is a degenerate unbounded kernel. Both Nikitin and Ponikarov [18] and Chakrabortty and Kuchibhotla [5] note that sharp exponential bounds and large deviation limits for such kernels had not been addressed in works preceding theirs. As discussed in the current section, while the product kernel does not satisfy (2.10), a slight modification in the usage of Theorem 2.4 still yields an exponential bound which is sharp up to a constant. This shows the strength of Theorem 2.4 even beyond the scenarios in which sharpness has been established through Lemma 2.14.

3 Future works

While we have documented some important facts about the concentration of U-statistics with heavy-tailed samples, some questions remain unanswered. Addressing some of them seems to require only extra effort along the lines of reasoning used in this manuscript; we exclude such questions for the sake of brevity, which we believe helps convey the main message of the current work better. Other questions appear more challenging and may require different techniques. In this section, we point out both types of questions and leave them for future studies.

Note that Lemmas 2.14 and 2.19 show the upper bound (2.3) is sharp for certain U-statistics when $t$ is larger than the region of Gaussian decay. However, the first term of (2.3), the Gaussian term, does not have a sharp constant in general. The remark below makes this precise.

Remark 3.1.

Let $h_{1}(X_{1})=\mathbb{E}\mathinner{\left[h(X_{1},\ldots,X_{m})\,\middle|\,X_{1}\right]}$. The asymptotic variance of $U_{n}$ is $\frac{m^{2}}{n}\operatorname{Var}(h_{1})$ [12], $v(kt,\beta\frac{I(kt)}{kt})\to\operatorname{Var}(h)$, and $\operatorname{Var}(h)\geq m\operatorname{Var}(h_{1})$. In fact, $\operatorname{Var}(h)-m\operatorname{Var}(h_{1})=\operatorname{Var}(h-\sum\limits_{i\leq m}h_{1}(X_{i}))$. This means

limnkt22v(kt,βI(kt)kt)mt22n Var (Un)<1.\lim\limits_{n\to\infty}\frac{\frac{kt^{2}}{2v(kt,\beta\frac{I(kt)}{kt})}}{\frac{mt^{2}}{2n\text{ Var }(U_{n})}}<1.

It would be interesting to improve (2.3) to have sharp constants in both the Gaussian and heavy-tailed regions of deviation. A promising direction is to use the Hoeffding decomposition [12] and apply Theorem 2.4 to the projections of $U_{n}$ individually.

Another possible improvement is to extend the results of Section 2.3 beyond kernels with subWeibull tails, i.e. when Assumption 3 does not hold. This has already been done for sums of iid samples, i.e. U-statistics of order $m=1$, in Bakhshizadeh et al. [2]. Moreover, Theorem 2.4 offers non-trivial bounds as long as $\operatorname{Var}(X_{i})<\infty$. Assumption 3 is used only to remove all logarithmic terms as $n\to\infty$. Taking such terms into account needs more effort, but does not change the spirit of our reasoning. Let $I(t)=\gamma\log t$ have logarithmic growth. In light of (2.3), $f(t)=\frac{I(kt)-\log{n\choose m}}{\log n}\simeq\frac{I(kt)-m\log n}{\log n}$ seems a reasonable rate function for the LDP of $U_{n}$ with speed $b_{n}=\log n$.

Moreover, condition (2.10) is only a sufficient condition for the sharpness of inequality (2.3). It would be interesting to determine necessary conditions for the sharpness of Theorem 2.4 in the sense of rough logarithmic limits. In addition, developing sharp upper bounds when such conditions do not hold is a good direction for extending the results of the current paper.

Finally, perhaps the most important message of this paper is that one can extend concentration results developed for subGaussian variables to distributions with heavier tails simply by the truncation technique and precise tuning of the truncation level. While this technique is applied to U-statistics here, the question is to which other self-averaging processes one can apply the same idea and obtain sharp concentration. We hope this work motivates future studies that obtain upper bounds with sharp constants for a larger class of self-averaging processes.

Acknowledgment

The author is thankful for inspiring discussions he has had with Dr. Nabarun Deb during the development of this work.

4 Proofs

This section includes the proofs of the statements in the previous sections. First, let us recall the following lemma from Bakhshizadeh et al. [2], which turns out to be useful in the calculation of logarithmic limits.

Lemma 4.1 (Lemma 5 of [2]).

Let an,bna_{n},b_{n} and cnc_{n} be sequences of positive numbers such that

limnlogancn=a,limnlogbncn=b,limncn=.\lim_{n\to\infty}\frac{\log a_{n}}{c_{n}}=a,\;\lim_{n\to\infty}\frac{\log b_{n}}{c_{n}}=b,\;\lim_{n\to\infty}c_{n}=\infty.

Then

limnlog(an+bn)cn=max{a,b}.\lim_{n\to\infty}\frac{\log(a_{n}+b_{n})}{c_{n}}=\max\mathinner{\left\{a,b\right\}}.
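Lemma 4.1 simply says the larger exponential scale dominates the sum. A concrete numerical instance (illustrative sketch) with $a_{n}=e^{2c_{n}}$, $b_{n}=e^{3c_{n}}$, $c_{n}=n$, where the limit is $\max\{2,3\}=3$:

```python
import math

# Numeric illustration of Lemma 4.1: with a_n = exp(2*c_n),
# b_n = exp(3*c_n), c_n = n, the limit of log(a_n + b_n)/c_n
# is max{2, 3} = 3.
def log_sum_ratio(n: int) -> float:
    c_n = float(n)
    # log(a_n + b_n) computed stably as 3*c_n + log(1 + exp(-c_n))
    return (3 * c_n + math.log1p(math.exp(-c_n))) / c_n

vals = [log_sum_ratio(n) for n in (5, 20, 80)]
```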

4.1 Proof of Lemma 2.14

Proof.

By Lemma 2.8, for any β<1\beta<1, we have v(kt,βI(kt)kt)<vv(kt,\beta\frac{I(kt)}{kt})<v, where vv is a constant independent of kk. Therefore, we obtain the following

limkkt22v(kt,βI(kt)kt)I(kt)limkkt22vI(kt)=,\lim_{k\to\infty}\frac{\frac{-kt^{2}}{2v(kt,\beta\frac{I(kt)}{kt})}}{I(kt)}\leq\lim_{k\to\infty}\frac{\frac{-kt^{2}}{2v}}{I(kt)}=-\infty,

because ktI(kt)\frac{kt}{I(kt)}\to\infty as ktkt\to\infty. Moreover, by Remark 2.7, c(t,β,k)1c(t,\beta,k)\to 1, hence,

limkβI(kt)max(12,c(t,β,k))I(kt)=β,limklog(nm)I(kt)I(kt)=1.\lim_{k\to\infty}\frac{-\beta I(kt)\max(\frac{1}{2},c(t,\beta,k))}{I(kt)}=-\beta,\quad\lim_{k\to\infty}\frac{\log{n\choose m}-I(kt)}{I(kt)}=-1.

Given inequality (2.3) and the above equations, Lemma 4.1 yields $\lim\limits_{n\to\infty}\frac{\log\mathbb{P}\left({U_{n}>t}\right)}{I(kt)}\leq-\beta$ for all $\beta<1$. Multiplying by $-1$ and taking the supremum over $\beta<1$ implies

limklog(Un>t)I(kt)1.\lim_{k\to\infty}\frac{-\log\mathbb{P}\left({U_{n}>t}\right)}{I(kt)}\geq 1. (4.1)

On the other hand, Lemma 2.11 yields $\lim\limits_{n\to\infty}\frac{-\log\mathbb{P}\left({U_{n}>t}\right)}{-\log\mathbb{P}\left({\varphi_{n}(X)>\frac{nt}{m}}\right)}\leq 1$, so by (2.10) we obtain

\lim_{n\to\infty}\frac{-\log\mathbb{P}\left({U_{n}>t}\right)}{I(kt)}\leq 1,

which together with (4.1) completes the proof.

\square

4.2 Proof of Remark 2.17

Proof.
  1.

    If $X\sim\text{Exp}(\lambda)$ we have $\mathbb{P}\left({X>t}\right)=\exp\left({-\lambda t}\right)$, so $J(t)=-\log\mathbb{P}\left({X>t}\right)=\lambda t$ is a linear function which satisfies (2.11) with equality and $b=0$.

  2.

    Let X=|Z|α,Z𝒩(0,1)X=\mathinner{\!\left\lvert Z\right\rvert}^{\alpha},\;Z\sim\mathcal{N}(0,1). Observe that

    (X>t)=(|Z|>tα)2exp(12t2α)\mathbb{P}\left({X>t}\right)=\mathbb{P}\left({\mathinner{\!\left\lvert Z\right\rvert}>\sqrt[\alpha]{t}}\right)\leq 2\exp\left({-\frac{1}{2}t^{\frac{2}{\alpha}}}\right)

    Hence, setting $J(t)=\frac{1}{2}t^{\frac{2}{\alpha}}-\log 2$ gives us a tail upper bound as in (2.2). Utilizing the LDP rate function of the Normal distribution shows $J(t)$ is asymptotically tight as well. Also, for $\alpha\geq 2$, $J(t)$ is a concave function, hence by Lemma 2.16 it satisfies Assumption 5.

  3.

    Let X=exp(Z),Z𝒩(0,1)X=\exp\left({Z}\right),\;Z\sim\mathcal{N}(0,1). Note that

    (X>t)=(Z>logt)exp(J(t)),\mathbb{P}\left({X>t}\right)=\mathbb{P}\left({Z>\log t}\right)\leq\exp\left({-J(t)}\right),

    where J(t)={00t112log2tt>1J(t)=\begin{cases}0&0\leq t\leq 1\\ {\frac{1}{2}\log^{2}t}&t>1\end{cases}. Similar to the previous case, asymptotic tightness of JJ comes from rate function of the Normal distribution. Instead of showing (2.11) for J(t)J(t), we replace it with a concave asymptotically tight lower bound J2(t)J_{2}(t). This useful technique can be applied to several other cases as well. Let

    J2(t)={1e(te)+120te12log2tt>e.J_{2}(t)=\begin{cases}\frac{1}{\rm e}(t-{\rm e})+\frac{1}{2}&0\leq t\leq{\rm e}\\ \frac{1}{2}\log^{2}t&t>{\rm e}\end{cases}.

    Then, J(t)J2(t)t0J(t)\geq J_{2}(t)\;\forall t\geq 0, i.e. (X>t)exp(J2(t))\mathbb{P}\left({X>t}\right)\leq\exp\left({-J_{2}(t)}\right), and J2J_{2} is a concave function on 0\mathbb{R}^{\geq 0}, hence by Lemma 2.16 satisfies (2.11).

    To verify the above claims, one only needs to note that $J(t)$ is convex on $[0,{\rm e}]$ and concave on $[{\rm e},\infty)$. On $[0,{\rm e}]$, $J_{2}(t)$ is simply the linear approximation of $J(t)$ at $t={\rm e}$, which makes $J_{2}$ concave everywhere.

    Figure 1: $J(t)$ and $J_{2}(t)$
  4.

    If $X\sim\text{Weibull}(\lambda,s)$, then $\mathbb{P}\left({\mathinner{\!\left\lvert X\right\rvert}>t}\right)=\exp\left({-(\frac{t}{\lambda})^{s}}\right)$. Hence, one can take the trivial tail bound $J(t)=-\log\mathbb{P}\left({X>t}\right)=\left(\frac{t}{\lambda}\right)^{s}$. This function satisfies $J(0)=0$ and is concave for $s\leq 1$. Hence, Lemma 2.16 yields $J(t_{1}+t_{2})\leq J(t_{1})+J(t_{2})$.

  5.

    Let $X\sim\text{Log-logistic}(\alpha,\beta)$ with $\alpha,\beta>0$. Then

    J(t)=-\log\mathbb{P}\left({\mathinner{\!\left\lvert X\right\rvert}>t}\right)=-\log\left(1-\frac{1}{1+(t/\alpha)^{-\beta}}\right)=\log(1+(t/\alpha)^{\beta}).

    Note that

    2^{\beta}(1+x^{\beta})(1+y^{\beta})\geq 2^{\beta}+(2\max\{x,y\})^{\beta}\geq 1+(x+y)^{\beta},\quad\forall x,y\geq 0;

    therefore,

    2^{\beta}(1+(t_{1}/\alpha)^{\beta})(1+(t_{2}/\alpha)^{\beta})\geq 1+((t_{1}+t_{2})/\alpha)^{\beta}.

    Applying $\log$ to the above inequality yields $J(t_{1})+J(t_{2})+\beta\log 2\geq J(t_{1}+t_{2})$.

  6.

    Let XPareto(xm,α)X\sim\text{Pareto}(x_{m},\alpha). Then,

    J(t)=log(X>t)=αlogtxm,t>xm,J(t)=-\log\mathbb{P}\left({X>t}\right)=\alpha\log\frac{t}{x_{m}},\quad t>x_{m},

    is a concave function on the support of XX, i.e. [xm,)[x_{m},\infty). Similar to the Log-Normal case above, one can linearly extend J(t)J(t) to a concave function on [0,)[0,\infty) and utilize Lemma 2.16 to verify J(t)J(t) is shifted sub-additive.

\square
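The tangent-line construction used for the Log-Normal case (and implicitly for Pareto) can be verified numerically. The sketch below checks that $J_{2}\leq J$ and that $J_{2}$ is midpoint-concave on a grid:

```python
import math

# Numeric check of the Log-Normal construction in the proof of
# Remark 2.17: J2 is the tangent-line extension of J(t) = 0.5*log(t)^2
# at t = e; J2 <= J everywhere and J2 is (midpoint) concave on [0, inf).
def J(t):
    return 0.0 if t <= 1 else 0.5 * math.log(t) ** 2

def J2(t):
    return (t - math.e) / math.e + 0.5 if t <= math.e else 0.5 * math.log(t) ** 2

grid = [0.0, 0.5, 1.0, 2.0, math.e, 4.0, 10.0, 100.0]
dominated = all(J2(t) <= J(t) + 1e-12 for t in grid)
midpoint_concave = all(
    J2((s + t) / 2) >= (J2(s) + J2(t)) / 2 - 1e-9
    for s in grid for t in grid
)
```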

4.3 Proof of Lemma 2.19

Proof.

1. The strategy for showing (2.10) is to prove $\lim\frac{-\log\mathbb{P}\left({\varphi_{n}(X)>\frac{n}{m}t}\right)}{J(kt)}=\lim\frac{I(kt)}{J(kt)}=1$ as $n\to\infty$. Let us write $c\triangleq\mathbb{E}\mathinner{\left[\mathinner{\!\left\lvert X-Y\right\rvert}\right]}$. Note that

\varphi_{n}(X)=\inf_{\mathinner{\!\left\lvert y\right\rvert}\leq J^{-1}(\log{2n})}\mathinner{\!\left\lvert X-y\right\rvert}-c=\begin{cases}-c&\mbox{if}\ \mathinner{\!\left\lvert X\right\rvert}\leq J^{-1}(\log{2n}),\\ \mathinner{\!\left\lvert X\right\rvert}-J^{-1}(\log{2n})-c&\mbox{if}\ \mathinner{\!\left\lvert X\right\rvert}>J^{-1}(\log{2n}).\end{cases}

With the above display in mind, observe that

(φn(X)>nmt)\displaystyle\mathbb{P}(\varphi_{n}(X)>\frac{n}{m}t) =(|X|>nmt+J1(log2n)+c).\displaystyle=\mathbb{P}\left({\mathinner{\!\left\lvert X\right\rvert}>\frac{n}{m}t+J^{-1}(\log 2n)+c}\right).

Note that, by Assumption 2 we have:

limnlog(|X|>nmt+J1(log2n)+c)J(nmt+J1(log2n)+c)=1.\lim\limits_{n\to\infty}\frac{-\log\mathbb{P}(\mathinner{\!\left\lvert X\right\rvert}>\frac{n}{m}t+J^{-1}(\log{2n})+c)}{J(\frac{n}{m}t+J^{-1}(\log 2n)+c)}=1.

Also, note that for large nn, J(kt)J(nmt+J1(log2n)+c)J(kt+t+J1(log4km)+c)J(kt)\leq J(\frac{n}{m}t+J^{-1}(\log 2n)+c)\leq J(kt+t+J^{-1}(\log 4km)+c). Using (4.16) from Lemma 4.2 with u=4km,c1=t4m,c2=c+tu=4km,\;c_{1}=\frac{t}{4m},\;c_{2}=c+t we obtain limnJ(kt+t+J1(log4km)+c)J(kt)=1\lim\limits_{n\to\infty}\frac{J(kt+t+J^{-1}(\log 4km)+c)}{J(kt)}=1, therefore

limnlog(φn(X)>nmt)J(kt)=limnJ(nmt+J1(log2n)+c)J(kt)=1.\displaystyle\lim\limits_{n\to\infty}\frac{-\log{\mathbb{P}(\varphi_{n}(X)>\frac{n}{m}t)}}{J(kt)}=\lim\limits_{n\to\infty}\frac{J(\frac{n}{m}t+J^{-1}(\log 2n)+c)}{J(kt)}=1. (4.2)

Next, we try to approximate the term I(kt)I(kt) from (2.10). Towards this direction, we begin by observing that

(|X1X2|c+kt)(X2c)(X1c+c+kt),\mathbb{P}(|X_{1}-X_{2}|\geq c+kt)\geq\mathbb{P}(X_{2}\leq c^{\prime})\mathbb{P}(X_{1}\geq c^{\prime}+c+kt),

for some cc^{\prime}\in\mathbb{R} such that (X2c)>0\mathbb{P}\left({X_{2}\leq c^{\prime}}\right)>0. Consequently,

lim supnlog(h(X1,X2)kt)J(kt+c+c)1,\limsup\limits_{n\to\infty}\frac{-\log{\mathbb{P}(h(X_{1},X_{2})\geq kt)}}{J(kt+c^{\prime}+c)}\leq 1,

Hence,

lim supnI(kt)J(kt+c+c)=lim supnI(kt)J(kt)1.\limsup\limits_{n\to\infty}\frac{I(kt)}{J(kt+c^{\prime}+c)}=\limsup\limits_{n\to\infty}\frac{I(kt)}{J(kt)}\leq 1. (4.3)

We used Lemma 4.2 to drop c+cc^{\prime}+c.

For the other direction, we need to establish an upper bound for (h(X1,X2)>kt)\mathbb{P}\left({h(X_{1},X_{2})>kt}\right). Let u>0u>0, then

(|X1X2|u)\displaystyle\mathbb{P}(|X_{1}-X_{2}|\geq u) (|X1|>u)+(|X2|>u)+(|X1X2|u,|X1|,|X2|u)\displaystyle\leq\mathbb{P}\left({\mathinner{\!\left\lvert X_{1}\right\rvert}>u}\right)+\mathbb{P}\left({\mathinner{\!\left\lvert X_{2}\right\rvert}>u}\right)+\mathbb{P}\left({\mathinner{\!\left\lvert X_{1}-X_{2}\right\rvert}\geq u,\;\mathinner{\!\left\lvert X_{1}\right\rvert},\mathinner{\!\left\lvert X_{2}\right\rvert}\leq u}\right)
2exp(J(u))+2(X1>X2+u,|X1|,|X2|u)\displaystyle\leq 2\exp\left({-J(u)}\right)+2\mathbb{P}\left({X_{1}>X_{2}+u,\;\mathinner{\!\left\lvert X_{1}\right\rvert},\mathinner{\!\left\lvert X_{2}\right\rvert}\leq u}\right)
2exp(J(u))+2i=0u(iuuX2i1uu)(X1(1iu)u)\displaystyle\leq 2\exp\left({-J(u)}\right)+2\sum_{i=0}^{\lceil u\rceil}\mathbb{P}\left({-\frac{i}{\lceil u\rceil}u\leq X_{2}\leq-\frac{i-1}{\lceil u\rceil}u}\right)\mathbb{P}\left({X_{1}\geq(1-\frac{i}{{\lceil u\rceil}})u}\right)
2exp(J(u))+2i=0uexp(J(i1uu)J((1iu)u))\displaystyle\leq 2\exp\left({-J(u)}\right)+2\sum_{i=0}^{\lceil u\rceil}\exp\left({-J\left({\frac{i-1}{\lceil u\rceil}u}\right)-J\left({(1-\frac{i}{{\lceil u\rceil}})u}\right)}\right)
2exp(J(u))+2i=0uexp(J(uuu)+b)\displaystyle\overset{*}{\leq}2\exp\left({-J(u)}\right)+2\sum_{i=0}^{\lceil u\rceil}\exp\left({-J\left({u-\frac{u}{\lceil u\rceil}}\right)+b}\right)
2exp(J(u))+2(u+2)exp(b)exp(J(u1))\displaystyle\leq 2\exp\left({-J(u)}\right)+2(u+2)\exp\left({b}\right)\exp\left({-J(u-1)}\right)
3ebuexp(J(u1)),for u2eb+4.\displaystyle\leq 3{\rm e}^{b}u\exp\left({-J(u-1)}\right),\quad\text{for }u\geq 2{\rm e}^{-b}+4.

To obtain inequality marked by *, we used Assumption 5. Taking log-\log of above inequality we get

lim infulog(|X1X2|>u)log3ub+J(u1)1.\liminf_{u\to\infty}\frac{-\log\mathbb{P}\left({\mathinner{\!\left\lvert X_{1}-X_{2}\right\rvert}>u}\right)}{-\log 3u-b+J(u-1)}\geq 1. (4.4)

Set u=kt+cu=kt+c. Since, by Lemma 4.2, limklog3(kt+c)b+J(kt+c1)J(kt)=1\lim\limits_{k\to\infty}\frac{-\log 3(kt+c)-b+J(kt+c-1)}{J(kt)}=1, we obtain:

lim infuI(kt)J(kt)1.\liminf_{u\to\infty}\frac{I(kt)}{J(kt)}\geq 1. (4.5)

Equations (4.2), (4.3), and (4.5) yield (2.10).

2. Once again, we set c𝔼(XY)2c\triangleq\mathbb{E}(X-Y)^{2} and note that, in this case,

φn(X)\displaystyle\varphi_{n}(X) =inf|y|J1(log2n)(Xy)2c\displaystyle=\inf_{\mathinner{\!\left\lvert y\right\rvert}\leq J^{-1}(\log{2n})}(X-y)^{2}-c
={cif|X|J1(log2n),(|X|J1(log2n))2cif|X|>J1(log2n).\displaystyle=\begin{cases}-c&\mbox{if}\ \mathinner{\!\left\lvert X\right\rvert}\leq J^{-1}(\log{2n}),\\ (\mathinner{\!\left\lvert X\right\rvert}-J^{-1}(\log{2n}))^{2}-c&\mbox{if}\ \mathinner{\!\left\lvert X\right\rvert}>J^{-1}(\log{2n}).\end{cases}

With the above display in mind, observe that

(φn(X)>nmt)\displaystyle\mathbb{P}(\varphi_{n}(X)>\frac{n}{m}t) =(|X|>J1(log2n)+c+nmt).\displaystyle=\mathbb{P}(\mathinner{\!\left\lvert X\right\rvert}>J^{-1}(\log{2n})+\sqrt{c+\frac{n}{m}t}).

Note that for large enough nn

ktJ1(log2n)+c+nmtJ1(log2n)+|c|+t+kt.\sqrt{kt}\leq J^{-1}(\log{2n})+\sqrt{c+\frac{n}{m}t}\leq J^{-1}(\log{2n})+\sqrt{\mathinner{\!\left\lvert c\right\rvert}}+\sqrt{t}+\sqrt{kt}.

Hence, by Assumptions 5, 6 we obtain

limnJ(J1(log2n)+c+nmt)J(kt)=1.\lim\limits_{n\to\infty}\frac{J(J^{-1}(\log{2n})+\sqrt{c+\frac{n}{m}t})}{J(\sqrt{kt})}=1.

(See the proof of Lemma 4.2 for details.)

We therefore have

limnlog(φn(X)>nmt)J(kt)=1.\displaystyle\lim\limits_{n\to\infty}\frac{-\log{\mathbb{P}(\varphi_{n}(X)>\frac{n}{m}t)}}{J(\sqrt{kt})}=1. (4.6)

Next, we try to approximate the term I(kt)I(kt) from (2.10). Note that

((X1X2)2c+kt)=(|X1X2|c+kt)(|X2|c)(|X1|c+c+kt),\mathbb{P}((X_{1}-X_{2})^{2}\geq c+kt)=\mathbb{P}(\mathinner{\!\left\lvert X_{1}-X_{2}\right\rvert}\geq\sqrt{c+kt})\geq\mathbb{P}(\mathinner{\!\left\lvert X_{2}\right\rvert}\leq c^{\prime})\mathbb{P}(\mathinner{\!\left\lvert X_{1}\right\rvert}\geq c^{\prime}+\sqrt{c+kt}), (4.7)

for a constant cc^{\prime} such that (|X2|c)>0\mathbb{P}\left({\mathinner{\!\left\lvert X_{2}\right\rvert}\leq c^{\prime}}\right)>0. Consequently,

lim supnlog((X1X2)2c+kt)J(c+c+kt)1,\limsup\limits_{n\to\infty}\frac{-\log{\mathbb{P}((X_{1}-X_{2})^{2}\geq c+kt)}}{J(c^{\prime}+\sqrt{c+kt})}\leq 1,

which in turn yields

lim supnI(kt)J(kt)1.\limsup\limits_{n\to\infty}\frac{I(kt)}{J(\sqrt{kt})}\leq 1. (4.8)

We utilized Lemma 4.2 to drop $c,c^{\prime}$ from the denominators in the limits.

For the other direction, we need to establish an upper bound for the left hand side of (4.7). As proved in (4.4), with u=c+ktu=\sqrt{c+kt}, we have

lim infklog(|X1X2|>c+kt)log3c+ktb+J(c+kt1)1.\liminf_{k\to\infty}\frac{-\log\mathbb{P}\left({\mathinner{\!\left\lvert X_{1}-X_{2}\right\rvert}>\sqrt{c+kt}}\right)}{-\log 3\sqrt{c+kt}-b+J(\sqrt{c+kt}-1)}\geq 1. (4.9)

Since logc+ktJ(kt)0,J(c+kt1)J(kt)1\frac{\log\sqrt{c+kt}}{J(\sqrt{kt})}\to 0,\;\frac{J(\sqrt{c+kt}-1)}{J(\sqrt{kt})}\to 1 as kk\to\infty, from (4.9) we obtain

lim infklog(h(X1,X2)kt)J(kt)1.\liminf_{k\to\infty}\frac{-\log\mathbb{P}\left({h(X_{1},X_{2})\geq kt}\right)}{J(\sqrt{kt})}\geq 1. (4.10)

This completes the proof.

3. We write c=𝔼max(|X1|,,|Xm|)c=\mathbb{E}\max(\mathinner{\!\left\lvert X_{1}\right\rvert},...,\mathinner{\!\left\lvert X_{m}\right\rvert}). Note that

φn(X)=|X|c.\displaystyle\varphi_{n}(X)=\mathinner{\!\left\lvert X\right\rvert}-c.

Hence, (φn(X)nmt)=(|X|nmt+c)\mathbb{P}\left({\varphi_{n}(X)\geq\frac{n}{m}t}\right)=\mathbb{P}\left({\mathinner{\!\left\lvert X\right\rvert}\geq\frac{n}{m}t+c}\right), and similar to the above cases we can show:

limnlog(φn(X)nmt)J(kt)=limnI(kt)J(kt)=limnlog(φn(X)nmt)I(kt)=1.\lim_{n\to\infty}\frac{-\log\mathbb{P}\left({\varphi_{n}(X)\geq\frac{n}{m}t}\right)}{J(kt)}=\lim_{n\to\infty}\frac{I(kt)}{J(kt)}=\lim_{n\to\infty}\frac{-\log\mathbb{P}\left({\varphi_{n}(X)\geq\frac{n}{m}t}\right)}{I(kt)}=1. (4.11)

4. Assume $n$ is large enough that $1\leq J^{-1}(\log 2n)$. Writing $c=\mathbb{E}\mathinner{\left[X^{2}\right]}-\mathbb{E}\mathinner{\left[\max\mathinner{\left\{X,Y\right\}}\right]}$, we have

φn(X)={12X2Xc,ifX>12,12X212c,ifX12.\varphi_{n}(X)=\begin{cases}\frac{1}{2}X^{2}-X-c,&\mbox{if}\ X>\frac{1}{2},\\ \frac{1}{2}X^{2}-\frac{1}{2}-c,&\mbox{if}\ X\leq\frac{1}{2}.\end{cases}

Therefore,

(φn(X)>nmt)\displaystyle\mathbb{P}\left({\varphi_{n}(X)>\frac{n}{m}t}\right) =(X22X>nmt+c,X>0)+(X2nmt+2c+1)\displaystyle=\mathbb{P}\left({\frac{X^{2}}{2}-X>\frac{n}{m}t+c,X>0}\right)+\mathbb{P}\left({X\leq-\sqrt{\frac{2n}{m}t+2c+1}}\right)
=(X>1+2nmt+2c+1)+(X2nmt+2c+1).\displaystyle=\mathbb{P}\left({X>1+\sqrt{\frac{2n}{m}t+2c+1}}\right)+\mathbb{P}\left({X\leq-\sqrt{\frac{2n}{m}t+2c+1}}\right).

Hence,

(|X|>2nmt+2c+1+1)(φn(X)>nmt)(|X|>2nmt+2c+1).\mathbb{P}\left({\mathinner{\!\left\lvert X\right\rvert}>\sqrt{\frac{2n}{m}t+2c+1}+1}\right)\leq\mathbb{P}\left({\varphi_{n}(X)>\frac{n}{m}t}\right)\leq\mathbb{P}\left({\mathinner{\!\left\lvert X\right\rvert}>\sqrt{\frac{2n}{m}t+2c+1}}\right).

Similar to the previous cases above, by Lemma 4.2, we can then show

limnlog(φn(X)>nmt)J(2nmt+2c+1+1)=limnlog(φn(X)>nmt)J(2nmt+2c+1)=limnlog(φn(X)>nmt)J(2kt)=1.\lim_{n\to\infty}\frac{-\log\mathbb{P}\left({\varphi_{n}(X)>\frac{n}{m}t}\right)}{J\left(\sqrt{\frac{2n}{m}t+2c+1}+1\right)}=\lim_{n\to\infty}\frac{-\log\mathbb{P}\left({\varphi_{n}(X)>\frac{n}{m}t}\right)}{J\left(\sqrt{\frac{2n}{m}t+2c+1}\right)}=\lim_{n\to\infty}\frac{-\log\mathbb{P}\left({\varphi_{n}(X)>\frac{n}{m}t}\right)}{J(\sqrt{2kt})}=1. (4.12)

Moreover, since h(X,Y)min(12X212,12X2X)ch(X,Y)\geq\min(\frac{1}{2}X^{2}-\frac{1}{2},\frac{1}{2}X^{2}-X)-c, we obtain

(h(X,Y)>kt)\displaystyle\mathbb{P}\left({h(X,Y)>kt}\right) (min(12X212,12X2X)>kt+c)\displaystyle\geq\mathbb{P}\left({\min(\frac{1}{2}X^{2}-\frac{1}{2},\frac{1}{2}X^{2}-X)>kt+c}\right)
(|X|>2kt+2c+1+1).\displaystyle\geq\mathbb{P}\left({\mathinner{\!\left\lvert X\right\rvert}>\sqrt{2kt+2c+1}+1}\right).

Hence,

lim supnlog(h(X,Y)>kt)J(2kt+2c+1+1)=lim supnI(kt)J(2kt)1.\limsup_{n\to\infty}\frac{-\log\mathbb{P}\left({h(X,Y)>kt}\right)}{J(\sqrt{2kt+2c+1}+1)}=\limsup_{n\to\infty}\frac{I(kt)}{J(\sqrt{2kt})}\leq 1. (4.13)

Again, for dropping extra constants from the denominator we used Lemma 4.2.

Furthermore, h(X,Y)12X2+|X|+12Y2+|Y|ch(X,Y)\leq\frac{1}{2}X^{2}+\mathinner{\!\left\lvert X\right\rvert}+\frac{1}{2}Y^{2}+\mathinner{\!\left\lvert Y\right\rvert}-c, and 12X2+|X|>u|X|>1+2u+1,u0\frac{1}{2}X^{2}+\mathinner{\!\left\lvert X\right\rvert}>u\iff\mathinner{\!\left\lvert X\right\rvert}>-1+\sqrt{2u+1},\;\forall u\geq 0. Hence,

(h(X,Y)>kt)\displaystyle\mathbb{P}\left({h(X,Y)>kt}\right) (12X2+|X|+12Y2+|Y|>kt+c)\displaystyle\leq\mathbb{P}\left({\frac{1}{2}X^{2}+\mathinner{\!\left\lvert X\right\rvert}+\frac{1}{2}Y^{2}+\mathinner{\!\left\lvert Y\right\rvert}>kt+c}\right)
(12X2+|X|>kt+c)+(12Y2+|Y|>kt+c)\displaystyle\leq\mathbb{P}\left({\frac{1}{2}X^{2}+\mathinner{\!\left\lvert X\right\rvert}>kt+c}\right)+\mathbb{P}\left({\frac{1}{2}Y^{2}+\mathinner{\!\left\lvert Y\right\rvert}>kt+c}\right)
+i=1kt+c(i1kt+c(kt+c)12X2+|X|ikt+c(kt+c))×\displaystyle\quad+\sum_{i=1}^{\lceil kt+c\rceil}\mathbb{P}\left({\frac{i-1}{{\lceil kt+c\rceil}}(kt+c)\leq\frac{1}{2}X^{2}+\mathinner{\!\left\lvert X\right\rvert}\leq\frac{i}{{\lceil kt+c\rceil}}(kt+c)}\right)\times
(12Y2+|Y|>(1ikt+c)(kt+c))\displaystyle\hskip 56.9055pt\mathbb{P}\left({\frac{1}{2}Y^{2}+\mathinner{\!\left\lvert Y\right\rvert}>(1-\frac{i}{{\lceil kt+c\rceil}})(kt+c)}\right)
2(|X|>1+2kt+2c+1)\displaystyle\leq 2\mathbb{P}\left({\mathinner{\!\left\lvert X\right\rvert}>-1+\sqrt{2kt+2c+1}}\right)
+i=1kt+c(|X|1+2(i1)kt+c(kt+c)+1)×\displaystyle\quad+\sum_{i=1}^{\lceil kt+c\rceil}\mathbb{P}\left({\mathinner{\!\left\lvert X\right\rvert}\geq-1+\sqrt{\frac{2(i-1)}{{\lceil kt+c\rceil}}(kt+c)+1}}\right)\times
(|Y|>1+2(1ikt+c)(kt+c)+1)\displaystyle\hskip 56.9055pt\mathbb{P}\left({\mathinner{\!\left\lvert Y\right\rvert}>-1+\sqrt{2(1-\frac{i}{{\lceil kt+c\rceil}})(kt+c)+1}}\right)
2exp(J(1+2kt+2c+1))\displaystyle\leq 2\exp\left({-J(-1+\sqrt{2kt+2c+1})}\right)
+i=1kt+cexp(J(1+2(i1)kt+c(kt+c)+1)J(1+2(1ikt+c)(kt+c)+1))\displaystyle\quad+\sum_{i=1}^{\lceil kt+c\rceil}\exp\left({-J(-1+\sqrt{\frac{2(i-1)}{{\lceil kt+c\rceil}}(kt+c)+1})-J(-1+\sqrt{2(1-\frac{i}{{\lceil kt+c\rceil}})(kt+c)+1})}\right)
2exp(J(1+2kt+2c+1))+i=1kt+cebexp(J(2+2(11kt+c)(kt+c)+2))\displaystyle\overset{*}{\leq}2\exp\left({-J(-1+\sqrt{2kt+2c+1})}\right)+\sum_{i=1}^{\lceil kt+c\rceil}{\rm e}^{b}\exp\left({-J(-2+\sqrt{2(1-\frac{1}{\lceil kt+c\rceil})(kt+c)+2})}\right)
=2exp(J(1+2kt+2c+1))+kt+cebexp(J(2+2(11kt+c)(kt+c)+2)).\displaystyle=2\exp\left({-J(-1+\sqrt{2kt+2c+1})}\right)+{\lceil kt+c\rceil}{\rm e}^{b}\exp\left({-J(-2+\sqrt{2(1-\frac{1}{\lceil kt+c\rceil})(kt+c)+2})}\right).

To obtain the inequality marked by *, we used Assumption 5 and the fact that \sqrt{x+y}\leq\sqrt{x}+\sqrt{y} for non-negative x,y.
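The slicing step above is a generic union-bound device: the event \{A+B>s\} is covered by \{A>s\}, \{B>s\}, and the events where A falls in the i-th slice of [0,s] while B exceeds the remaining mass. A small Monte Carlo sketch (ours; an exponential X is used purely for illustration, while the paper's assumptions allow heavier tails) confirms that the resulting bound dominates the true probability:

```python
import math
import random

def slice_bound(a_samples, b_samples, s, n_slices):
    """Empirical version of the slicing bound on P(A + B > s):
    P(A > s) + P(B > s) + sum_i P(A in i-th slice of [0, s]) * P(B > s - top of slice)."""
    p = lambda pred, xs: sum(1 for x in xs if pred(x)) / len(xs)
    bound = p(lambda a: a > s, a_samples) + p(lambda b: b > s, b_samples)
    for i in range(1, n_slices + 1):
        lo, hi = (i - 1) * s / n_slices, i * s / n_slices
        bound += p(lambda a: lo <= a <= hi, a_samples) * \
                 p(lambda b: b > s - hi, b_samples)
    return bound

rng = random.Random(1)
# X has an exponential tail here, purely for illustration; A, B mimic X^2/2 + |X|
A = [0.5 * x * x + x for x in (rng.expovariate(1.0) for _ in range(100_000))]
B = [0.5 * y * y + y for y in (rng.expovariate(1.0) for _ in range(100_000))]
s = 8.0
true_p = sum(1 for a, b in zip(A, B) if a + b > s) / len(A)
assert slice_bound(A, B, s, n_slices=math.ceil(s)) >= true_p
```

The bound is loose by design (each slice term replaces B > s - A with the weaker B > s - hi), which is exactly the slack that makes the logarithmic asymptotics go through.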

Hence, by Lemma 4.3 we obtain

lim infnI(kt)J(2kt)1,\liminf_{n\to\infty}\frac{I(kt)}{J(\sqrt{2kt})}\geq 1, (4.14)

which concludes the proof. \square

Lemma 4.2.

If the tail function J(t), defined in (2.2), satisfies Assumptions 5 and 6, then for any c_{1}>0 and c_{2},c_{3}\in\mathbb{R} we have

limuJ(u+c2)J(u)=1\lim_{u\to\infty}\frac{J(u+c_{2})}{J(u)}=1 (4.15)
limuJ(c1u+J1(logu)+c2)J(c1u)=1\lim_{u\to\infty}\frac{J(c_{1}u+J^{-1}(\log u)+c_{2})}{J(c_{1}u)}=1 (4.16)
\lim_{u\to\infty}\frac{J(c_{2}+\sqrt{u+c_{3}})}{J(\sqrt{u})}=1 (4.17)
Proof.

To prove (4.15), one only needs to note that, by Assumption 5, we have

J(u)J(|c2|)bJ(u|c2|)J(u+c2)J(u+|c2|)J(u)+J(|c2|)+b,J(u)-J(\mathinner{\!\left\lvert c_{2}\right\rvert})-b\leq J(u-\mathinner{\!\left\lvert c_{2}\right\rvert})\leq J(u+c_{2})\leq J(u+\mathinner{\!\left\lvert c_{2}\right\rvert})\leq J(u)+J(\mathinner{\!\left\lvert c_{2}\right\rvert})+b, (4.18)

and that J is a non-decreasing function with J(u)\xrightarrow{u\to\infty}\infty.

To show (4.16), observe that J^{-1}(\log u)+c_{2}>0 for large enough u. By Assumption 5 we obtain

J(c1u)J(c1u+J1(logu)+c2)\displaystyle J(c_{1}u)\leq J(c_{1}u+J^{-1}(\log u)+c_{2}) J(c1u)+logu+J(|c2|)+2b.\displaystyle\leq J(c_{1}u)+\log u+J(\mathinner{\!\left\lvert c_{2}\right\rvert})+2b.

Note that \lim_{u\to\infty}\frac{\log u}{J(u)}=\lim_{u\to\infty}\frac{C}{J(u)}=0 for any constant C, so dividing the above inequalities by J(c_{1}u) and taking u\to\infty yields (4.16).

For the third part observe that when uu is large enough we have

u|c3||c2|c2+u+c3u+|c3|+|c2|.\sqrt{u}-\sqrt{\mathinner{\!\left\lvert c_{3}\right\rvert}}-\mathinner{\!\left\lvert c_{2}\right\rvert}\leq c_{2}+\sqrt{u+c_{3}}\leq\sqrt{u}+\sqrt{\mathinner{\!\left\lvert c_{3}\right\rvert}}+\mathinner{\!\left\lvert c_{2}\right\rvert}.

Then, since J is non-decreasing and satisfies Assumption 5, we obtain

J(u)J(|c3|+|c2|)bJ(c2+u+c3)J(u)+J(|c3|+|c2|)+b.J(\sqrt{u})-J(\sqrt{\mathinner{\!\left\lvert c_{3}\right\rvert}}+\mathinner{\!\left\lvert c_{2}\right\rvert})-b\leq J(c_{2}+\sqrt{u+c_{3}})\leq J(\sqrt{u})+J(\sqrt{\mathinner{\!\left\lvert c_{3}\right\rvert}}+\mathinner{\!\left\lvert c_{2}\right\rvert})+b.

Once again, dividing by J(u)J(\sqrt{u}) and taking uu\to\infty yields (4.17). \square
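For intuition, the three limits of Lemma 4.2 can be observed numerically for one concrete tail. In the sketch below, J(t)=t^{\alpha} (a Weibull-type tail) is only an assumed example playing the role required by Assumptions 5 and 6, and the constants c_{1},c_{2},c_{3} are arbitrary choices of ours:

```python
import math

# J(t) = t**alpha is an assumed example of a tail function (Weibull-type)
# satisfying the role of Assumptions 5 and 6; the lemma covers a broader class.
alpha = 1.5
J = lambda t: t ** alpha
J_inv = lambda y: y ** (1 / alpha)

u = 1e8
c1, c2, c3 = 2.0, 3.0, 5.0

r415 = J(u + c2) / J(u)                                   # ratio in (4.15)
r416 = J(c1 * u + J_inv(math.log(u)) + c2) / J(c1 * u)    # ratio in (4.16)
r417 = J(c2 + math.sqrt(u + c3)) / J(math.sqrt(u))        # ratio in (4.17)

# all three ratios are already very close to 1 at u = 1e8
assert abs(r415 - 1) < 1e-6
assert abs(r416 - 1) < 1e-6
assert abs(r417 - 1) < 1e-3
```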

Lemma 4.3.

Under Assumptions 5, 6 we have

limklog(2exp(J(1+2kt+2c+1))+kt+cebexp(J(2+2(11kt+c)(kt+c)+2)))J(2kt)=1\lim_{k\to\infty}\frac{-\log\left(2\exp\left({-J(-1+\sqrt{2kt+2c+1})}\right)+{\lceil kt+c\rceil}{\rm e}^{b}\exp\left({-J(-2+\sqrt{2(1-\frac{1}{\lceil kt+c\rceil})(kt+c)+2})}\right)\right)}{J(\sqrt{2kt})}=1
Proof.

Given Lemma 4.1, it suffices to show that

limklog(2exp(J(1+2kt+2c+1)))J(2kt)=1\displaystyle\lim_{k\to\infty}\frac{-\log\left(2\exp\left({-J(-1+\sqrt{2kt+2c+1})}\right)\right)}{J(\sqrt{2kt})}=1 (4.19)
limkblog(kt+cexp(J(2+2(11kt+c)(kt+c)+2)))J(2kt)=1.\displaystyle\lim_{k\to\infty}\frac{-b-\log\left({\lceil kt+c\rceil}\exp\left({-J(-2+\sqrt{2(1-\frac{1}{\lceil kt+c\rceil})(kt+c)+2})}\right)\right)}{J(\sqrt{2kt})}=1. (4.20)

Note that

limklog(2exp(J(1+2kt+2c+1)))J(2kt)\displaystyle\lim_{k\to\infty}\frac{-\log\left(2\exp\left({-J(-1+\sqrt{2kt+2c+1})}\right)\right)}{J(\sqrt{2kt})} =limkJ(1+2kt+2c+1)J(2kt)=1,\displaystyle=\lim_{k\to\infty}\frac{J(-1+\sqrt{2kt+2c+1})}{J(\sqrt{2kt})}=1,

by Lemma 4.2.

Moreover,

limkblog(kt+cexp(J(2+2(11kt+c)(kt+c)+2)))J(2kt)\displaystyle\lim_{k\to\infty}\frac{-b-\log\left({\lceil kt+c\rceil}\exp\left({-J(-2+\sqrt{2(1-\frac{1}{\lceil kt+c\rceil})(kt+c)+2})}\right)\right)}{J(\sqrt{2kt})}
=limkblogkt+cJ(2kt)+J(2+2(11kt+c)(kt+c)+2)J(2kt).\displaystyle\;=\lim_{k\to\infty}\frac{-b-\log\lceil kt+c\rceil}{J(\sqrt{2kt})}+\frac{J(-2+\sqrt{2(1-\frac{1}{\lceil kt+c\rceil})(kt+c)+2})}{J(\sqrt{2kt})}.

By Assumption 6 we have limkblogkt+cJ(2kt)=limk2logkt+cJ(2kt)=0\lim\limits_{k\to\infty}\frac{-b-\log\lceil kt+c\rceil}{J(\sqrt{2kt})}=\lim\limits_{k\to\infty}\frac{-2\log\sqrt{\lceil kt+c\rceil}}{J(\sqrt{2kt})}=0. We also have

2\left(1-\frac{1}{\lceil kt+c\rceil}\right)(kt+c)+2=2kt+2c+2-\frac{2(kt+c)}{\lceil kt+c\rceil},

where the last term lies in (0,2] for all large k. Hence, there are fixed c_{2},c_{3}\in\mathbb{R} (e.g., c_{2}=2c and c_{3}=2c+2) and K such that

-2+\sqrt{2kt+c_{2}}\leq-2+\sqrt{2\left(1-\frac{1}{\lceil kt+c\rceil}\right)(kt+c)+2}\leq-2+\sqrt{2kt+c_{3}},\quad\forall k>K.

Since J(t) is non-decreasing and, by Lemma 4.2, \lim_{k\to\infty}\frac{J(-2+\sqrt{2kt+c_{i}})}{J(\sqrt{2kt})}=1 for i=2,3, we obtain

limkJ(2+2(11kt+c)(kt+c)+2)J(2kt)=1.\lim_{k\to\infty}\frac{J(-2+\sqrt{2(1-\frac{1}{\lceil kt+c\rceil})(kt+c)+2})}{J(\sqrt{2kt})}=1. (4.21)

\square
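The bracketing used in the proof is elementary and can be verified numerically; the sketch below (with arbitrary t, c of our choosing, and the candidate constants c_{2}=2c, c_{3}=2c+2) checks it over a wide range of k:

```python
import math

def inner(k, t, c):
    # the quantity under the square root in the proof of Lemma 4.3
    m = math.ceil(k * t + c)
    return 2 * (1 - 1 / m) * (k * t + c) + 2

# c2 = 2c and c3 = 2c + 2 bracket the expression for all large k;
# the small epsilon only absorbs floating-point rounding
t, c, eps = 1.7, 0.9, 1e-9
for k in range(10, 10_000):
    val = inner(k, t, c)
    assert 2 * k * t + 2 * c - eps <= val <= 2 * k * t + 2 * c + 2 + eps
```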

References

  • Arcones [1992] Miguel A. Arcones. Large deviations for U-statistics. Journal of Multivariate Analysis, 42(2):299–301, 1992.
  • Bakhshizadeh et al. [2020] Milad Bakhshizadeh, Arian Maleki, and Victor H. de la Pena. Sharp concentration results for heavy-tailed distributions, 2020. URL https://arxiv.org/abs/2003.13819.
  • Baringhaus and Rank [2002] L. Baringhaus and R. Rank. On large deviations of U-statistics and their projections. Sankhyā: The Indian Journal of Statistics, Series A, pages 167–170, 2002.
  • Bretagnolle [1999] Jean Bretagnolle. A new large deviation inequality for U-statistics of order 2. ESAIM: Probability and Statistics, 3:151–162, 1999.
  • Chakrabortty and Kuchibhotla [2018] Abhishek Chakrabortty and Arun Kumar Kuchibhotla. Tail bounds for canonical U-statistics and U-processes with unbounded kernels. 2018.
  • Dasgupta [1984] Ratan Dasgupta. On large deviation probabilities of U-statistics in the non-i.i.d. case. Sankhyā: The Indian Journal of Statistics, Series A, pages 110–116, 1984.
  • de la Pena [1992] Victor H. de la Pena. Decoupling and Khintchine's inequalities for U-statistics. The Annals of Probability, pages 1877–1892, 1992.
  • Eichelsbacher [2005] Peter Eichelsbacher. Refined large deviations for von Mises statistics. Theory of Probability & Its Applications, 49(1):139–148, 2005.
  • Eichelsbacher and Löwe [1995] Peter Eichelsbacher and Matthias Löwe. A large deviation principle for m-variate von Mises statistics and U-statistics. Journal of Theoretical Probability, 8(4):807–824, 1995.
  • Fan et al. [2012] Xiequan Fan, Ion Grama, and Quansheng Liu. Large deviation exponential inequalities for supermartingales. Electronic Communications in Probability, 17:1–8, 2012.
  • Giné et al. [2000] Evarist Giné, Rafał Latała, and Joel Zinn. Exponential and moment inequalities for U-statistics. In High Dimensional Probability II, pages 13–38. Springer, 2000.
  • Hoeffding [1948] W. Hoeffding. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 19:293–325, 1948.
  • Houdré and Reynaud-Bouret [2003] Christian Houdré and Patricia Reynaud-Bouret. Exponential inequalities, with constants, for U-statistics of order two. In Stochastic Inequalities and Applications, pages 55–69. Springer, 2003.
  • Kiessling and Wang [2012] Michael K.-H. Kiessling and Yu Wang. Onsager's ensemble for point vortices with random circulations on the sphere. Journal of Statistical Physics, 148(5):896–932, 2012.
  • Korolyuk and Borovskich [2013] Vladimir S. Korolyuk and Yu. V. Borovskich. Theory of U-Statistics, volume 273. Springer Science & Business Media, 2013.
  • Lee [2019] A. J. Lee. U-Statistics: Theory and Practice. CRC Press, 2019. ISBN 9781351405850. URL https://books.google.com/books?id=YC2NDwAAQBAJ.
  • Liu and Wu [2020] Wei Liu and Liming Wu. Large deviations for empirical measures of mean-field Gibbs measures. Stochastic Processes and their Applications, 130(2):503–520, 2020.
  • Nikitin and Ponikarov [2001] Ya. Yu. Nikitin and E. V. Ponikarov. Rough asymptotics of probabilities of Chernoff type large deviations for von Mises functionals and U-statistics. Translations of the American Mathematical Society, Series 2, 203:107–146, 2001.
  • Nikitin [1995] Yakov Nikitin. Asymptotic Efficiency of Nonparametric Tests. Cambridge University Press, 1995.
  • Talagrand [1996] Michel Talagrand. New concentration inequalities in product spaces. Inventiones Mathematicae, 126(3):505–563, 1996.
  • Vershynin [2018] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47. Cambridge University Press, 2018.
  • Wang [1994] Wenyang Wang. Large Deviations of U-empirical Probability Measures and Statistical Functionals. The Johns Hopkins University, 1994.
  • Nikitin and Peaucelle [2004] Yakov Y. Nikitin and Irina Peaucelle. Efficiency and local optimality of nonparametric tests based on U- and V-statistics. Metron, 62(2):185–200, 2004.