Exponential tail bounds and Large Deviation Principle for Heavy-Tailed U-Statistics
Footnote 1: This work was partially supported by the National Science Foundation under Grant 1811614.
Abstract
We study deviations of U-statistics when the samples have a heavy-tailed distribution, so that the kernel of the U-statistic does not have bounded exponential moments at any positive point. We obtain an exponential upper bound for the tail of the U-statistic which clearly delineates two regions of tail decay: in the first, the decay is Gaussian; in the second, the tail behaves like that of the kernel. For several common U-statistics, we also show the upper bound has the right rate of decay as well as sharp constants, by obtaining rough logarithmic limits which in turn can be used to develop an LDP for U-statistics. In contrast to the usual LDP results in the literature, the processes we consider in this work have an LDP speed slower than their sample size $n$.
1 Introduction
Suppose $X_1, \dots, X_n \overset{\text{iid}}{\sim} \mu$, where $\mu$ is a probability measure supported on $\mathbb{R}$. We consider the following U-statistic of order $m$ with a kernel function $h$ which is symmetric in its arguments:
$$U_n = \binom{n}{m}^{-1} \sum_{1 \le i_1 < i_2 < \dots < i_m \le n} h\left(X_{i_1}, \dots, X_{i_m}\right).$$
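For concreteness, the following minimal Python sketch computes $U_n$ by brute-force enumeration of all $\binom{n}{m}$ index tuples. The sample-variance kernel $h(x, y) = (x - y)^2 / 2$ used here is only an illustrative choice of ours, not one mandated by the text.

```python
from itertools import combinations
from math import comb
import random

def u_statistic(xs, h, m):
    """Average of the symmetric kernel h over all m-subsets of the sample."""
    n = len(xs)
    total = sum(h(*(xs[i] for i in idx)) for idx in combinations(range(n), m))
    return total / comb(n, m)

# Illustrative order-2 kernel h(x, y) = (x - y)^2 / 2, whose U-statistic is the
# usual unbiased sample variance.
h = lambda x, y: 0.5 * (x - y) ** 2

random.seed(0)
sample = [random.paretovariate(3.0) for _ in range(30)]  # heavy-tailed sample
print(u_statistic(sample, h, m=2))
```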
We study the decay of $\mathbb{P}(U_n \ge t)$ as a function of $n$ and $t$, in a setup which is rarely addressed in the literature and is characterized by a couple of key assumptions: a heavy-tailed distribution for the samples, and large values of the deviation $t$. We explain the role of each assumption in more detail below.
When $h(X_1, \dots, X_m)$ has a heavy-tailed distribution, its Moment Generating Function (MGF) is not bounded at any positive point. This breaks many concentration results and violates Cramér's condition, which is required for the common results that determine large deviation behavior. While the deviation of U-statistics has been studied extensively in the light-tailed regime [1, 9, 8, 15, 16, 23, 22], the heavy-tailed case is relatively under-explored. In this paper, we focus on the heavy-tailed regime.
Moreover, when the kernel is assumed to have a heavy tail, for a fixed value of $n$, $\mathbb{P}(U_n \ge t)$ has two different behaviors: 1) for small values of $t$ it decays like a Gaussian tail, and 2) for $t$ larger than the zone of normal convergence, i.e. in the large deviation zone, it has a decay slower than that of a normal distribution. Unlike the first region, which has been studied extensively in the literature (see [16, 15] and the references therein), there is little documented information about the tail behavior in the second region. Several setups have been developed to study the behavior of the tail. We mention some of them below to indicate where the results of the current paper belong.
1. Finite sample concentration inequalities, which give upper bounds on $\mathbb{P}(U_n \ge t)$ for all values of $n$.
2. Asymptotic distribution studies, which find a suitable scaling $a_n$ and limiting distribution $F$ for which $a_n(U_n - \mathbb{E}U_n) \xrightarrow{d} F$. The choice $a_n = \sqrt{n}$ with a Gaussian limit $F$ is the well-known CLT that holds for non-degenerate U-statistics (recalled right after this list).
3. Berry–Esseen type inequalities, which seek a uniform upper bound on the distance between the distribution of the scaled U-statistic and its limit, valid for all $n$.
4. Large deviation studies, which look for a convergence speed $s_n$ and rate function $J$ for which $\frac{1}{s_n} \log \mathbb{P}(U_n \ge t) \to -J(t)$ for large $n$.
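For reference, the classical limit behind setup 2, due to Hoeffding [12], reads as follows (stated here in our notation, with $\theta = \mathbb{E}[h]$):

$$\sqrt{n}\left(U_n - \theta\right) \xrightarrow{d} N\left(0,\; m^2 \sigma_1^2\right), \qquad \sigma_1^2 = \operatorname{Var}\left(\mathbb{E}\left[h(X_1, X_2, \dots, X_m) \mid X_1\right]\right),$$

valid in the non-degenerate case $\sigma_1^2 > 0$.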
In this work, we try to shed some light on the undocumented behavior of the deviations of U-statistics in setups 1 and 4 listed above: the concentration inequality and the large deviation setups. In particular, we develop a general concentration inequality that holds for all U-statistics with finite second moments. Moreover, we characterize a class of kernels for which that concentration inequality is sharp enough to yield the large deviation limit. The byproduct of such sharpness analysis is obtaining the exact convergence speed and a closed form for the rate function in terms of the tail of the kernel $h$. For the heavy-tailed kernels considered in this work, we usually obtain speeds $s_n$ slower than $n$. This distinguishes our results from several previous works that tried to determine the tail behavior in the asymptotic Gaussian regime.
1.1 Related works
Large deviations of U-statistics have been studied in several previous works under a variety of assumptions. Hoeffding [12] formally introduced U-statistics and showed that, with the right scaling, they converge to a normal distribution. Follow-up studies offered bounds for the deviation of U-statistics from their Gaussian limits (see [16, 15] and the references therein).
Giné et al. [11] offer upper bounds for the moments of U-statistics. There is a relatively straightforward connection between moment inequalities and exponential bounds for the tail (see [21]). Utilizing such a connection, [11] obtains exponential tail bounds. However, these bounds do not show Gaussian decay when the deviation amount is within the Gaussian zone. Therefore, the authors also offer improvements of their inequalities in the case of completely degenerate kernels.
Recently, Chakrabortty and Kuchibhotla [5] considered degenerate kernels of order 2 with samples from heavy-tailed subWeibull distributions and obtained exponential tail bounds for such U-statistics. However, the specific form they used for the U-statistics seems restrictive. They consider kernels built from products of functions of the individual coordinates and impose a uniform boundedness assumption on the remaining components. Hence, the kernel can be unbounded only through the product part. It is not clear to us whether the results of [5] can recover exponential bounds for general non-degenerate kernels like the ones named in Lemma 2.19 of the current work.
One property of U-statistics of order $m \ge 2$ that makes them more challenging than the case $m = 1$, i.e. sums of iid variables, is that the sum contains dependent terms. There are two main ways to handle such dependencies: decoupling and martingale inequalities. The two approaches lead to rather different paths, and both have been explored to develop tail bounds for U-statistics.
de la Pena [7] provides a decoupling tool that helps one get rid of the dependencies of the summands in the U-statistic. This tool has been utilized by several later works to obtain exponential concentration bounds, e.g. [17, 4, 5]. While the decoupling technique has proved very powerful in handling dependencies, the upper bound it offers carries an extra factor which usually makes the resulting exponential bound off by a constant factor in its exponent.
Another approach to obtaining exponential inequalities for U-statistics is to leverage inequalities for (sub/super) martingales, such as those in [10]. Houdré and Reynaud-Bouret [13] and some of their references pursue this direction. Using the decoupling of de la Pena [7] and the inequality of Talagrand [20] is common in this approach. Nevertheless, the statements quickly become algebraically complicated, and several constants, given by norms or expectations of partial sums, get involved. As a result, many bounds obtained by martingale inequalities are restricted to order 2 or degenerate kernels. Moreover, the constants are usually not sharp, for a reason similar to the one discussed for the decoupling technique above.
In addition to tail bounds, several works have been devoted to showing that U-statistics satisfy a Large Deviation Principle (LDP). It is common for works in this direction to assume the kernel has finite exponential moments; hence they exclude the distributions we focus on in the current manuscript.
Dasgupta [6] tried to show that the large deviation behavior of U-statistics is determined by their conditional expectation given one summand. At first glance this sounds like a reasonable claim, since the asymptotic normality of U-statistics is tightly related to such conditional expectations [12]. However, this claim was later disproved [3, 22].
Nikitin and Ponikarov [18] study large deviations of U-statistics with bounded non-degenerate kernels. They also claim that no large deviation results were known at the time their work was published. Arcones [1] develops a decomposition technique for the case of a two-variable kernel with finite MGF that enables one to prove an LDP. Nevertheless, the extension of this technique to higher order kernels is not straightforward. Eichelsbacher and Löwe [9] consider both U-statistics and V-statistics, which include terms with repeated samples in the sum, given by a kernel function with bounded MGF. In this case, they show that satisfying an LDP for U-statistics and for the corresponding V-statistics is equivalent, with the same rate function. Lemma 3.1 from [9], which allows one to bound the MGF of a U-statistic, is one of the most important tools we use in the current work. Eichelsbacher [8] provides sufficient conditions under which a U-statistic satisfies an LDP under a weak Cramér condition. It utilizes a representation of a bivariate kernel as an infinite sum of separable functions in order to reduce the large deviation problem for U-statistics to the one for sums of iid variables. Then, it applies the contraction principle to prove the LDP for the U-statistics.
The current work focuses on heavy-tailed distributions, which are mostly excluded from the large deviation studies above. The exponential tail bound given in Section 2.1 covers all kernels with finite variances, regardless of their order, boundedness, or degeneracy. The upper bound we offer is simple enough to reveal both the Gaussian and the heavy-tailed decay regions immediately. All parameters in the statement of the upper bound have known limits as $n \to \infty$. Moreover, we offer ready-to-use tools to handle them for fixed sample sizes in Section 2.1.1. In addition, to the best of our knowledge, the rough logarithmic limits, with speeds slower than $n$, obtained in Section 2.3 have not been documented in earlier works.
1.2 Some Applications
U-statistics appear in several statistical analyses, including point estimation, e.g. of the population variance [12], hypothesis testing [19], and statistical mechanics [14]. As a self-averaging function, it is quite natural to ask how fast a U-statistic concentrates around its mean.
In particular, for the asymptotic analysis of hypothesis testing, the speed and the rate function for the large deviations of such statistics are important. For instance, obtaining the values of the Bahadur efficiency and the Bahadur exact slope requires an LDP for the test statistic [19]. Moreover, the large deviation principle and exponential concentration for U-statistics play substantial roles in the study of macroscopic limits of interacting particle systems and in statistical mechanics in general [17]. Our results enable one to extend such theories when samples are drawn from a heavy-tailed distribution.
2 Main results
The main result of this paper is two-fold. First, in Section 2.1 we develop a general upper bound for the tail probability of U-statistics whose kernels have bounded variances. Second, in Section 2.3, we show that for several kernel functions this upper bound is tight enough to determine the large deviation behavior of the U-statistic. If a centered kernel $h$ satisfies the criteria discussed in Section 2.3 and $U_n$ denotes the U-statistic corresponding to $h$ with $n$ samples, we show in Lemma 2.14 that the tail of $U_n$ and the upper bound of Theorem 2.4 have the same asymptotic behavior in the logarithmic sense.
Therefore, both the convergence speed and the rate function for the large deviations of $U_n$ are determined by the tail of the kernel $h$.
In addition, the lower bound developed in Section 2.2 reveals the event that is responsible for the large deviations of $U_n$: one of the $X_i$'s gets large enough to make all the summands to which it contributes exceed the deviation level. There are $\binom{n-1}{m-1}$ such terms in the expansion of $U_n$.
2.1 Finite sample upper bound
Without loss of generality, we will operate under the assumption that the kernel is centered.
Assumption 1.
Suppose $h$ is centered, i.e. $\mathbb{E}\left[h(X_1, \dots, X_m)\right] = 0$.
We need tail rate functions defined as below.
Definition 2.1.
Let $I$ and $\bar{I}$ be two non-decreasing functions that upper bound the right tails of $h(X_1, \dots, X_m)$ and $|h(X_1, \dots, X_m)|$ exponentially. In other words:
$$\mathbb{P}\left(h(X_1, \dots, X_m) \ge t\right) \le e^{-I(t)}, \quad t \ge 0, \tag{2.1}$$
$$\mathbb{P}\left(|h(X_1, \dots, X_m)| \ge t\right) \le e^{-\bar{I}(t)}, \quad t \ge 0. \tag{2.2}$$
Note that one can simply take $I(t) = -\log \mathbb{P}\left(h(X_1, \dots, X_m) \ge t\right)$. The whole point of Definition 2.1 is to allow one to work with possibly simpler upper bounds.
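As a concrete instance (the Weibull exponent $1/2$ here is our own illustrative choice, not one fixed by the text): if the kernel satisfies $\mathbb{P}\left(h(X_1, \dots, X_m) \ge t\right) \le C e^{-c\sqrt{t}}$ for constants $C, c > 0$, then

$$I(t) = \left(c\sqrt{t} - \log C\right)_+$$

is a valid, non-decreasing choice in Definition 2.1, and one may further replace it by the simpler $\frac{c}{2}\sqrt{t}$ for $t$ large, at the price of looser constants.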
The Lemma below is a simplified and modified version of Lemma 3.1 from [9].
Lemma 2.2.
Let $\bar{h} = h \wedge M$ be the kernel truncated at a level $M > 0$, and let $\bar{U}_n$ denote the corresponding U-statistic. Then, given any $\lambda > 0$ and $n \ge m$, the following holds:
$$\mathbb{E}\left[e^{\lambda \bar{U}_n}\right] \le \left(\mathbb{E}\left[e^{\frac{\lambda}{k} \bar{h}(X_1, \dots, X_m)}\right]\right)^{k},$$
where $k = \lfloor n/m \rfloor$.
Proof.
Define, for each permutation $\sigma$ of $\{1, \dots, n\}$, the block average
$$W(\sigma) = \frac{1}{k} \sum_{j=1}^{k} \bar{h}\left(X_{\sigma((j-1)m+1)}, \dots, X_{\sigma(jm)}\right),$$
so that $\bar{U}_n = \frac{1}{n!} \sum_{\sigma} W(\sigma)$. Since $\bar{h}$ is bounded above, its moment generating function is finite at any positive point, hence we obtain
$$\mathbb{E}\left[e^{\lambda \bar{U}_n}\right] \overset{(a)}{\le} \frac{1}{n!} \sum_{\sigma} \mathbb{E}\left[e^{\lambda W(\sigma)}\right] = \left(\mathbb{E}\left[e^{\frac{\lambda}{k} \bar{h}}\right]\right)^{k}.$$
To obtain the inequality marked by $(a)$, we used the fact $e^{\lambda \bar{U}_n} \le \frac{1}{n!} \sum_{\sigma} e^{\lambda W(\sigma)}$, which holds by convexity of the exponential function. ∎
For the sake of simplicity, we drop the arguments of $h$ and simply write $h$ for $h(X_1, \dots, X_m)$.
Lemma 2.3 (Lemma 1 of [2]).
If $\mathbb{E}[X] = 0$ and $\mathbb{E}[X^2] = \sigma^2 < \infty$, then for any $\lambda, M > 0$ we have
$$\mathbb{E}\left[e^{\lambda (X \wedge M)}\right] \le \exp\left(\lambda^2 \sigma^2 \, \phi(\lambda, M)\right),$$
where $\phi(\lambda, M) = \frac{e^{\lambda M} - 1 - \lambda M}{(\lambda M)^2}$.
Proof.
Let $\bar{U}_n$ denote the U-statistic with the truncated kernel $\bar{h} = h \wedge M$. On the event that no summand exceeds the truncation level we have $U_n = \bar{U}_n$, hence
$$\mathbb{P}(U_n \ge t) \le \mathbb{P}(\bar{U}_n \ge t) + \binom{n}{m}\,\mathbb{P}(h > M).$$
Applying the Chernoff bound to $\bar{U}_n$ and then controlling its moment generating function, we get
$$\mathbb{P}(\bar{U}_n \ge t) \le e^{-\lambda t}\left(\mathbb{E}\left[e^{\frac{\lambda}{k}\bar{h}}\right]\right)^{k} \qquad \text{(Lemma 2.2)}$$
$$\le \exp\left(-\lambda t + \frac{\lambda^2 \sigma^2}{k}\,\phi\left(\frac{\lambda}{k},\, M\right)\right). \qquad \text{(Lemma 2.3)}$$
Choose $M$ as a function of $n$ and $t$. To conclude the proof, we only need to show that there are always choices for $\lambda$ which make the exponent above match the right hand side of (2.3).
We consider two cases:
1. if $t$ lies in the Gaussian region, choose $\lambda$ to optimize the quadratic part of the exponent;
2. if $t$ lies beyond the Gaussian region, choose $\lambda$ so that the truncation term takes over.
Then, in case 1, the optimizing $\lambda$ is admissible, and we obtain the first term of (2.3) (note that $\phi$ is increasing in its second argument). In the second case, one just needs to substitute the chosen $\lambda$ to obtain the remaining terms of (2.3). Note that since in this case $t$ is large, the Gaussian term alone would be loose; the minimum operator controls this term when we are in the first case, so the upper bound does not blow up. ∎
Remark 2.5 (Two regions of deviations).
Inequality (2.3) reveals two different decay rates for the tail of $U_n$. For small $t$'s, the first term, i.e. the Gaussian term, is dominant, hence we observe Gaussian-like deviations. This behavior is already captured by the CLT for U-statistics [12]. For larger $t$'s, the last couple of terms on the right hand side of (2.3) become dominant. We call this region the large deviation region. Asymptotically, the sum of the last two terms decays like the kernel-tail term $e^{-I(\cdot)}$, for both subWeibull and polynomial-tail kernels (see Section III of [2] for a detailed discussion).
Inequality (2.3) displays the large deviation behavior whenever the kernel-tail terms dominate the Gaussian term, a condition recorded in (2.4). For instance, for heavier kernel tails, (2.4) holds for smaller values of $t$, meaning the region of Gaussian deviation shrinks as the tail of the kernel becomes heavier.
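The two regimes are easy to see numerically. The following self-contained simulation is our own illustration: the Pareto(3) sample distribution and the additive kernel $h(x, y) = x + y - 2\mu$ (for which $U_n = 2(\bar{X}_n - \mu)$) are arbitrary choices, not ones from the text. It estimates the tail of $U_n$ by Monte Carlo and compares it with the Gaussian approximation, which is accurate for small $t$ and far too optimistic for large $t$.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)
n, reps, alpha = 50, 100_000, 3.0
mu = alpha / (alpha - 1)                           # mean of Pareto(alpha) on [1, inf)
sigma2 = alpha / ((alpha - 1) ** 2 * (alpha - 2))  # its variance

X = 1.0 + rng.pareto(alpha, size=(reps, n))        # heavy-tailed samples
U = 2.0 * (X.mean(axis=1) - mu)                    # U-statistic of h(x, y) = x + y - 2*mu

print("  t   empirical   Gaussian approx")
for t in [0.1, 0.2, 0.4, 0.8, 1.6]:
    emp = (U >= t).mean()
    # P(N(0, 4*sigma2/n) >= t) = Q(t*sqrt(n)/(2*sigma))
    gauss = 0.5 * erfc(t * sqrt(n) / (2.0 * sqrt(2.0 * sigma2)))
    print(f"{t:4.1f}  {emp:.2e}    {gauss:.2e}")
```

For small $t$ the two columns agree; for large $t$ the empirical tail is orders of magnitude heavier than the Gaussian prediction, which is exactly the second region of Remark 2.5.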
2.1.1 Parameters of inequality (2.3)
Theorem 2.4 bounds the tail of $U_n$ in terms of $n$, $t$, and the tail of the kernel $h$. The only terms of (2.3) that might seem unfamiliar are its auxiliary parameters. What typically happens in the asymptotic setting is that these parameters converge to explicit constants; moreover, the leading constant can be chosen arbitrarily close to its optimal value. Hence, for large $n$, one can think of the upper bound (2.3) as a Gaussian term plus a kernel-tail term. For logarithmic $I$, which corresponds to polynomial-tail kernels, there are more restrictions on the constants. Nevertheless, in the large deviation regime, the dominant term of (2.3) is still the kernel-tail term. This Section contains several statements that make the above claims precise.
Remark 2.6.
Lemma 4 of [2] states that the variance proxy in the first term of (2.3) converges to the true variance $\sigma^2$ in either of the following setups:
1. $I(t) \ge c\, t^{\alpha}$ for some $c > 0$ and $\alpha \in (0, 1]$;
2. $I(t) = \beta \log t$ for some $\beta > 2$.
Hence, for large values of $n$, one should be able to upper bound the first term of (2.3) with a Gaussian tail term whose constant cannot equal the optimal one, but can get arbitrarily close to it.
In the above, Case 1 includes all subWeibull variables, and Case 2 includes variables with polynomial tails and finite variances.
Remark 2.7.
When $n \to \infty$, $t$ is fixed, and the kernel variance is bounded, we have
(2.5) convergence of all parameters of (2.3) to their limiting values.
This includes both cases of Remark 2.6, with power and logarithmic tail functions respectively.
To verify (2.5) it suffices to note that the truncation level grows to infinity while all other terms in the definition of the parameters remain bounded.
While Remark 2.6 provides asymptotic bounds for the parameters of (2.3), one might need bounds for the finite sample case to utilize Theorem 2.4. The next Lemma and Remark provide such bounds.
Lemma 2.8.
If $I(t) \ge c\, t^{\alpha}$ for some $c > 0$ and $\alpha \in (0, 1]$, and the kernel variance $\sigma^2$ is fixed, then there is a fixed number $C$ such that for any $n$ and any admissible choice of the parameters, the constants of (2.3) are bounded by $C$.
Proof.
Since $I(t) \ge c\, t^{\alpha}$, the truncation level can be controlled explicitly. Moreover, by Corollary 2 of [2] we obtain
(2.6) |
The right hand side of (2.6) does not depend on $n$; hence it remains bounded as $n \to \infty$.
Note that there is a slight change of variables between the function $I$ defined here and the one defined in [2], which of course has been taken into account in quoting Corollary 2 from [2].
Remark 2.9.
If $I(t) = \beta \log t$ with $\beta > 2$, which includes kernels with polynomial tails and finite variances, Corollary 3 of [2] yields the analogue of (2.6),
(2.7)
for some constant independent of $n$.
2.2 Lower bound
Let $I$ be the function defined in Definition 2.1, and let $I^{-1}$ denote its generalized inverse, with values restricted to the closed interval of the given limits. Define:
Definition 2.10.
(2.8) |
Note that we force the defined quantity to lie in the prescribed interval, but only use its first part in the argument of $I$.
Then we have:
Lemma 2.11.
Assume the kernel $h$ has a finite variance. Then
(2.9) |
where $c$ is an absolute constant independent of $n$.
Proof.
Choosing the constants small enough to cover the finitely many values of $n$ before the above asymptotics take effect concludes the proof. ∎
Remark 2.12.
The event in Lemma 2.11 can be replaced with any sequence of events whose probabilities have the same logarithmic order. Also, one can work with a modified tail function to relax this condition further.
2.3 Large Deviation Principle
In this Section, we show that the upper bound (2.3) is asymptotically tight in certain cases. The idea is to show that the rate function for the large deviations of a U-statistic is asymptotically equivalent to the exponent on the right hand side of (2.3). A trivial requirement for such sharpness is that the functions defined in Definition 2.1 be asymptotically tight. This is formalized in the next assumption.
Assumption 2.
Suppose $I$ is asymptotically tight, i.e.
$$\lim_{t \to \infty} \frac{-\log \mathbb{P}\left(h(X_1, \dots, X_m) \ge t\right)}{I(t)} = 1.$$
Hereafter, we focus on subWeibull distributions. The tail of such a distribution is bounded by the tail of some Weibull distribution; hence all of its moments are finite. Nonetheless, its exponential moments are not finite. Assumption 3 encodes the class of heavy-tailed subWeibull random variables.
Assumption 3.
Assume there are $\alpha \in (0, 1)$ and $c > 0$ such that $I(t) \ge c\, t^{\alpha}$.
Although the LDP is derived under Assumptions 3 and 6 here, we think one can obtain similar results for distributions with polynomial tails, i.e. logarithmic $I$, and finite second moments, following the footsteps of this Section and Bakhshizadeh et al. [2]. However, for the sake of brevity we do not include polynomial tails in the following Sections.
Moreover, we need a lower bound on the deviation amount to make sure it is in the large deviation regime. The assumption below provides such a bound.
Assumption 4.
Suppose $t_n$ stays outside the zone of normal convergence, i.e. the Gaussian term of (2.3) is asymptotically negligible relative to the kernel-tail term, as $n \to \infty$.
Remark 2.13.
For subWeibull kernels, Assumption 4 holds whenever $t_n$ does not decay too fast. This includes constant sequences $t_n$ as well as sequences converging to $0$, as long as they decay slowly enough.
Lemma 2.14.
Suppose Assumptions 1, 2, 3, and 4 hold. For $t_n$ defined in (2.8) and the upper bound of Theorem 2.4, if one has
(2.10) |
then $\mathbb{P}(U_n \ge t_n)$ and the right hand side of (2.3) have the same rough logarithmic limit.
In other words, the tail function of the kernel yields the large deviation rate function for $U_n$.
We postpone the proof of the above Lemma to Section 4.1.
Condition (2.10) essentially says that large values of $h$ are driven by a large value of only one coordinate. It can be proved for many commonly used kernels applied to heavy-tailed variables $X_i$. Assuming the tail of the kernel is a shifted sub-additive function, a usual property of heavy tails, we can prove (2.10) for several kernels.
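This is the classical "one big jump" heuristic for subexponential distributions. As a hedged illustration, outside the formal development: if $X_1, X_2 \ge 0$ are iid with a subexponential distribution (which covers Weibull tails with exponent $\alpha < 1$, as in Assumption 3), then

$$\mathbb{P}\left(X_1 + X_2 > t\right) \sim 2\,\mathbb{P}\left(X_1 > t\right), \qquad t \to \infty,$$

so conditionally on the rare event $\{X_1 + X_2 > t\}$, with probability tending to one, exactly one coordinate is of order $t$.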
Assumption 5.
Suppose a constant shift of $I$, defined in Definition 2.1, makes it sub-additive on the non-negative real numbers, i.e.
$$I(x + y) \le I(x) + I(y) + c \quad \text{for all } x, y \ge 0, \tag{2.11}$$
where $c$ is an absolute constant.
Remark 2.15.
Assumption 5 in effect imposes a heavy-tailed distribution requirement on the random variable $h(X_1, \dots, X_m)$. While it does not directly state that $I$ is a sublinear function, which would be equivalent to the formal definition of a heavy-tailed distribution, it controls the growth of $I$ to be equal to or slower than linear. Lemma 2.16 and Remarks 2.17 and 2.18 show that the most well-known heavy-tailed distributions can have a shifted sub-additive tail function $I$.
Lemma 2.16.
If $I$ is a concave function on the non-negative real numbers, then $I$ satisfies (2.11) with $c = -I(0)$.
Proof.
By concavity of $I$ we have
$$I(x) = I\left(\frac{x}{x+y}(x+y) + \frac{y}{x+y} \cdot 0\right) \ge \frac{x}{x+y}\, I(x+y) + \frac{y}{x+y}\, I(0).$$
Similarly, we obtain $I(y) \ge \frac{y}{x+y} I(x+y) + \frac{x}{x+y} I(0)$. Summing up the above two inequalities shows
$$I(x) + I(y) \ge I(x+y) + I(0). \qquad \blacksquare$$
Remark 2.17.
The tail functions of many standard heavy-tailed families, treated case by case in Section 4.2, satisfy Assumption 5, possibly after the concave modification described below.
Remark 2.18.
In general, with simple modifications, we expect the tail function $I$ of heavy-tailed distributions to satisfy Assumption 5. This includes distributions that are not named in Remark 2.17. Note that $I$ is an increasing function that grows to infinity, and by the heavy-tailed assumption it is supposed to be sub-linear. Hence, it is expected that $I$ becomes a concave function after some point $x_0$. Using the same technique we utilized in the proof of case 3, the Log-Normal distribution, of Remark 2.17, one can define a function $\tilde{I}$ which is equal to $I$ on $(x_0, \infty)$ and extends linearly to $[0, x_0]$ such that it is less than $I$ and remains concave on the whole non-negative real line. At this point, Lemma 2.16 shows $\tilde{I}$ satisfies (2.11). A numerical sanity check of this recipe follows.
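As an illustration (our own numerical check, not part of the formal development), one can verify the shifted sub-additivity (2.11) for the exact Log-Normal tail function $I(x) = -\log \mathbb{P}(X > x)$ on a grid; the shift constant printed at the end stays bounded, in line with Remark 2.18.

```python
import numpy as np
from scipy.stats import norm

def I(x):
    """Exact tail rate of a standard Log-Normal: -log P(X > x), with log X ~ N(0, 1)."""
    return -norm.logsf(np.log(x))

# Check (2.11) on a grid: the smallest admissible shift c is the largest value
# of I(x + y) - I(x) - I(y), and it should stay bounded.
xs = np.linspace(0.1, 50.0, 400)
Xg, Yg = np.meshgrid(xs, xs)
gap = I(Xg + Yg) - I(Xg) - I(Yg)
print("smallest shift c that works on this grid:", gap.max())
```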
Assumption 6.
Suppose $\frac{I(t)}{t} \to 0$ as $t \to \infty$, i.e. $I$ is sub-linear.
Lemma 2.19.
Under Assumptions 5 and 6, condition (2.10) holds for several commonly used kernels.
The proof of the above Lemma is postponed to Section 4.3.
2.4 Discussion on the necessity of condition (2.10)
Lemma 2.14 shows that the upper bound given in Theorem 2.4 is asymptotically sharp if (2.10) holds. Lemma 2.19 lists some common kernels of U-statistics for which (2.10) holds. One might ask whether (2.10) is necessary to obtain asymptotic sharpness and the LDP. Below, we study an example for which the bound given by Theorem 2.4 is not sharp. This shows that kernels need to satisfy certain conditions for their U-statistics to match the asymptotics of the right hand side of (2.3). Determining the necessary conditions on such kernels is not done in this work, but it is an interesting question to be addressed in future studies.
Consider $m = 2$ and the product kernel $h(x_1, x_2) = x_1 x_2$. Also, assume $X$ has a symmetric distribution around the origin with a subWeibull tail of exponent $\alpha \in (0, 1)$, e.g. a symmetrized Weibull distribution. In this case,
$$\mathbb{P}\left(X_1 X_2 > t\right) = e^{-c\, t^{\alpha/2}\,(1 + o(1))}$$
for large enough $t$. Hence, for the tail function we will have
$$I_h(t) = c\, t^{\alpha/2}\,(1 + o(1)). \tag{2.12}$$
If one directly applies Theorem 2.4, one obtains a bound whose dominant term is of the form
$$e^{-c_1 (n t)^{\alpha/2}}, \tag{2.13}$$
where the constant $c_1$ is given by Lemma 2.8. The right hand side of (2.13) is at least $e^{-c_1 (nt)^{\alpha/2}}$. This bound is loose, since we will show $\mathbb{P}(U_n \ge t)$ decays like $e^{-c_2\, n^{\alpha} t^{\alpha/2}}$, and $(nt)^{\alpha/2} / (n^{\alpha} t^{\alpha/2}) \to 0$ as $n \to \infty$.
Observe that
$$U_n = \frac{1}{n(n-1)}\left(\left(\sum_{i=1}^{n} X_i\right)^2 - \sum_{i=1}^{n} X_i^2\right) \le \frac{1}{n(n-1)}\left(\sum_{i=1}^{n} X_i\right)^2,$$
where the subtracted sum removes the diagonal terms.
Note that $\frac{1}{n}\sum_{i=1}^{n} X_i$ is an order $1$ U-statistic with kernel $\tilde{h}(x) = x$, and $U_n \ge t$ forces $\left|\frac{1}{n}\sum_{i=1}^{n} X_i\right| \ge \sqrt{(1 - 1/n)\, t}$. Also, for the kernel $\tilde{h}$, the tail function is that of $|X|$. Hence, by Theorem 2.4, applied with $m = 1$ at a deviation level of order $n\sqrt{t}$, we obtain an upper bound whose dominant term is
$$e^{-c\, n^{\alpha} t^{\alpha/2}}. \tag{2.14}$$
As discussed in Section 2.3, the right hand side of (2.14) decays like $e^{-c n^{\alpha} t^{\alpha/2}}$, which is much smaller than (2.13) when $n$ is large.
Indeed, we can show (2.14) has the right order of decay, by considering the event on which one of the $X_i$'s is of order $n\sqrt{t}$; on this event, the squared sum alone pushes $U_n$ above $t$. Therefore, one can show $n^{\alpha} t^{\alpha/2}$ is the correct speed for the large deviation decay of $U_n$. In other words, there are constants $c_1, c_2 > 0$ such that for large enough $n$
$$e^{-c_2\, n^{\alpha} t^{\alpha/2}} \le \mathbb{P}\left(U_n \ge t\right) \le e^{-c_1\, n^{\alpha} t^{\alpha/2}}.$$
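The algebraic identity used above is easy to check numerically; the following sketch (an illustration of ours, not code from the paper) compares the brute-force pairwise-product U-statistic with its square-of-sums form.

```python
import random

def u_product_direct(xs):
    """Order-2 U-statistic with kernel h(x, y) = x*y, by enumerating all pairs."""
    n = len(xs)
    s = sum(xs[i] * xs[j] for i in range(n) for j in range(i + 1, n))
    return s / (n * (n - 1) / 2)

def u_product_fast(xs):
    """Same statistic via U_n = ((sum x)^2 - sum x^2) / (n (n - 1))."""
    n = len(xs)
    return (sum(xs) ** 2 - sum(x * x for x in xs)) / (n * (n - 1))

random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
print(u_product_direct(xs), u_product_fast(xs))  # the two values agree
```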
Remark 2.22.
Note that $h(x_1, x_2) = x_1 x_2$ is a degenerate unbounded kernel. Both Nikitin and Ponikarov [18] and Chakrabortty and Kuchibhotla [5] claim that sharp exponential bounds or large deviation limits for such kernels had not been addressed in works preceding them. As discussed in the current Section, while the product kernel does not satisfy (2.10), a slight modification in the usage of Theorem 2.4 can still yield an exponential bound which is sharp up to a constant. This shows the strength of Theorem 2.4 even beyond the scenarios in which sharpness is established through Lemma 2.14.
3 Future works
While we have documented some important facts about the concentration of U-statistics with heavy-tailed samples, some questions remain unanswered. Addressing some of them seems to require only extra effort along the same lines of reasoning used in this manuscript; we exclude such questions for the sake of brevity, which we believe helps to convey the main message of the current work better. Other questions sound more challenging and may require different techniques. In this Section, we point out both types of questions and leave them for future studies.
Note that Lemmas 2.14 and 2.19 show that the upper bound (2.3) is sharp for certain U-statistics when $t$ is larger than the region of Gaussian decay. However, the first term of (2.3), the Gaussian term, does not have a sharp constant in general. The Remark below makes this precise.
Remark 3.1.
Let $\sigma_1^2 = \operatorname{Var}\left(\mathbb{E}\left[h(X_1, \dots, X_m) \mid X_1\right]\right)$. The asymptotic variance of $U_n$ is $\frac{m^2 \sigma_1^2}{n}$ [12], while the Gaussian term of (2.3) is governed by the full variance $\sigma^2 = \operatorname{Var}(h)$. In fact, $m \sigma_1^2 \le \sigma^2$ in general. This means the Gaussian term of (2.3) does not carry a sharp constant.
It would be interesting if one could improve (2.3) to have sharp constants in both the Gaussian and the heavy-tailed regions of deviation. A direction that sounds promising is to use the Hoeffding decomposition [12] and apply Theorem 2.4 to the projections of $h$ individually.
Another possible improvement is to extend the results of Section 2.3 beyond kernels with subWeibull tails, i.e. when Assumption 3 does not hold. This has already been done for sums of iid samples, i.e. U-statistics of order $m = 1$, in Bakhshizadeh et al. [2]. Moreover, Theorem 2.4 offers non-trivial bounds as long as the kernel has a finite variance. Assumption 3 is used to remove all logarithmic terms from the exponent. Taking such terms into account needs more effort, but does not change the spirit of our reasoning. For $I$ with logarithmic growth, in the light of (2.3), the tail function of the kernel still seems a reasonable rate function for an LDP of $U_n$, with a correspondingly slower speed.
Moreover, condition (2.10) is only a sufficient condition for the sharpness of inequality (2.3). It is interesting to determine necessary conditions for sharpness of Theorem 2.4 in the sense of rough logarithmic limits. In addition, developing sharp upper bounds when such conditions do not hold can be a good direction to extend results of the current paper.
Finally, perhaps the most important message of this paper is that one can extend concentration results developed for subGaussian variables to distributions with heavier tails simply by the truncation technique and a precise tuning of the truncation level. While this technique is applied to U-statistics here, the question is to what other self-averaging processes one can apply the same recipe and obtain sharp concentration; a sketch of the recipe in its simplest form follows. We hope this work motivates future studies to obtain upper bounds with sharp constants for a larger class of self-averaging processes.
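To make the recipe concrete in the simplest case $m = 1$ (sums of iid centered variables), the following sketch is our own illustration, under an assumed Pareto-type tail and a Bennett-type control of the truncated MGF. It evaluates the generic two-term bound: a Chernoff bound for the truncated sum plus a union bound for the event that some sample exceeds the truncation level.

```python
import numpy as np

def two_term_bound(t, n, var, log_tail, M, lambdas):
    """Upper bound on P( (X_1 + ... + X_n)/n >= t ) for centered iid X_i, via
    truncation at level M:
      P(S_n/n >= t) <= min_l exp(n*(-l*t + l^2*var*phi(l*M))) + n*P(X > M),
    where phi(x) = (e^x - 1 - x)/x^2 is the Bennett-type MGF control of X ^ M."""
    lam = np.asarray(lambdas)
    with np.errstate(over="ignore"):  # huge lambdas overflow harmlessly to inf
        phi = (np.exp(lam * M) - 1.0 - lam * M) / (lam * M) ** 2
        chernoff = np.exp(n * (-lam * t + lam ** 2 * var * phi)).min()
    return chernoff + n * np.exp(log_tail(M))

# Assumed example: a roughly centered Pareto(3)-type tail, log P(X > x) ~ -3 log(1 + x).
log_tail = lambda x: -3.0 * np.log1p(x)
var, n, t = 0.75, 1000, 0.5
lambdas = np.geomspace(1e-4, 10.0, 400)
# Tuning the truncation level: M of order n*t captures the one-big-jump event.
for M in [10.0, 100.0, n * t / 2.0, n * t]:
    print(f"M = {M:7.1f}  bound = {two_term_bound(t, n, var, log_tail, M, lambdas):.3e}")
```

Printing the bound for several $M$ shows the trade-off the paper exploits: a truncation level of the order of the total deviation $nt$ balances the two terms.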
Acknowledgment
The author is thankful for inspiring discussions he has had with Dr. Nabarun Deb during the development of this work.
4 Proofs
This Section includes the proofs of the statements in the previous Sections. First, let us recall the following Lemma from Bakhshizadeh et al. [2], which turns out to be useful in the calculation of logarithmic limits.
Lemma 4.1 (Lemma 5 of [2]).
Let $(a_n)$ and $(b_n)$ be sequences of positive numbers, and let $s_n \to \infty$ be a positive sequence such that
$$\lim_{n \to \infty} \frac{\log a_n}{s_n} = A, \qquad \lim_{n \to \infty} \frac{\log b_n}{s_n} = B.$$
Then
$$\lim_{n \to \infty} \frac{\log (a_n + b_n)}{s_n} = \max(A, B).$$
4.1 Proof of Lemma 2.14
4.2 Proof of Remark 2.17
Proof.
1. If the tail is exactly exponential, $I$ is a linear function, which satisfies (2.11) with equality and $c = 0$.
2. The second case follows from the references cited in Remark 2.17.
3. Let $X$ be Log-Normal. Note that
$$\mathbb{P}(X > t) = \mathbb{P}(\log X > \log t) = Q(\log t),$$
where $Q$ denotes the Gaussian tail function. Similar to the previous case, the asymptotic tightness of $I$ comes from the rate function of the Normal distribution. Instead of showing (2.11) for $I$, we replace it with a concave, asymptotically tight lower bound $\tilde{I}$. This useful technique can be applied to several other cases as well. Let
$$\tilde{I}(x) = \begin{cases} I(x_0) + I'(x_0)\,(x - x_0), & 0 \le x \le x_0, \\ I(x), & x > x_0. \end{cases}$$
Then $\tilde{I} \le I$, i.e. $\mathbb{P}(X > t) \le e^{-\tilde{I}(t)}$, and $\tilde{I}$ is a concave function on $[0, \infty)$; hence, by Lemma 2.16, it satisfies (2.11).
To verify the above claims, one only needs to note that $I$ is convex on $[0, x_0]$ and concave on $[x_0, \infty)$. On $[0, x_0]$, $\tilde{I}$ is simply the linear approximation of $I$ at $x_0$, which makes it concave everywhere.
Figure 1: $I$ and its concave modification $\tilde{I}$.
4. In the fourth case, one can take a trivial tail bound: the resulting function vanishes at the origin and is concave on its domain, hence Lemma 2.16 yields (2.11).
5. In the fifth case, comparing the tail directly with one of the previous cases and applying $-\log$ to the resulting inequality yields the required bound on $I$.
6. In the last case, $I$ is a concave function on the support of $X$. Similar to the Log-Normal case above, one can linearly extend it to a concave function on $[0, \infty)$ and utilize Lemma 2.16 to verify that $I$ is shifted sub-additive. ∎
4.3 Proof of Lemma 2.19
Proof.
1. The strategy to show (2.10) is to show that the corresponding logarithmic ratio tends to $1$ as $n \to \infty$. We introduce the shorthand used below. Note that
(4.2) |
Next, we try to approximate the term from (2.10). Towards this direction, we begin by observing that
for some such that . Consequently,
Hence,
(4.3) |
We used Lemma 4.2 to drop the lower-order term.
For the other direction, we need to establish an upper bound. With the same shorthand as above, we have
To obtain the inequality marked above, we used Assumption 5. Taking logarithms of the above inequality, we get
(4.4) |
Set . Since, by Lemma 4.2, , we obtain:
(4.5) |
2. Once again, we use the same shorthand and note that, in this case,
With the above display in mind, observe that
Note that for large enough
We therefore have
(4.6) |
Next, we try to approximate the term from (2.10). Note that
(4.7) |
for a constant such that . Consequently,
which in turn yields
(4.8) |
We utilized Lemma 4.2 to drop the extra term from the denominator in the limits.
For the other direction, we need to establish an upper bound for the left hand side of (4.7). As proved in (4.4), with , we have
(4.9) |
Since the correction term vanishes as $n \to \infty$, from (4.9) we obtain
(4.10) |
This completes the proof.
3. We write . Note that
Hence, , and similar to the above cases we can show:
(4.11) |
4. Assume $t$ is large enough so that the relevant tail formulas apply. With the shorthand introduced above, we have
Therefore,
Hence,
Similar to the previous cases above, by Lemma 4.2, we can then show
(4.12) |
Moreover, since , we obtain
Furthermore, , and . Hence,
To obtain the inequality marked above, we used Assumption 5 and the fact that $\sqrt{x + y} \le \sqrt{x} + \sqrt{y}$ for non-negative $x, y$.
Lemma 4.2.
The following limit identities, used repeatedly above, hold:
(4.15)
(4.16)
(4.17)
References
- Arcones [1992] Miguel A. Arcones. Large deviations for U-statistics. Journal of Multivariate Analysis, 42(2):299–301, 1992.
- Bakhshizadeh et al. [2020] Milad Bakhshizadeh, Arian Maleki, and Victor H. de la Pena. Sharp concentration results for heavy-tailed distributions, 2020. URL https://arxiv.org/abs/2003.13819.
- Baringhaus and Rank [2002] L. Baringhaus and R. Rank. On large deviations of U-statistics and their projections. Sankhyā: The Indian Journal of Statistics, Series A, pages 167–170, 2002.
- Bretagnolle [1999] Jean Bretagnolle. A new large deviation inequality for U-statistics of order 2. ESAIM: Probability and Statistics, 3:151–162, 1999.
- Chakrabortty and Kuchibhotla [2018] Abhishek Chakrabortty and Arun Kumar Kuchibhotla. Tail bounds for canonical U-statistics and U-processes with unbounded kernels. 2018.
- Dasgupta [1984] Ratan Dasgupta. On large deviation probabilities of U-statistics in the non iid case. Sankhyā: The Indian Journal of Statistics, Series A, pages 110–116, 1984.
- de la Pena [1992] Victor H. de la Pena. Decoupling and Khintchine's inequalities for U-statistics. The Annals of Probability, pages 1877–1892, 1992.
- Eichelsbacher [2005] Peter Eichelsbacher. Refined large deviations for von Mises statistics. Theory of Probability & Its Applications, 49(1):139–148, 2005.
- Eichelsbacher and Löwe [1995] Peter Eichelsbacher and Matthias Löwe. A large deviation principle for m-variate von Mises statistics and U-statistics. Journal of Theoretical Probability, 8(4):807–824, 1995.
- Fan et al. [2012] Xiequan Fan, Ion Grama, and Quansheng Liu. Large deviation exponential inequalities for supermartingales. Electronic Communications in Probability, 17:1–8, 2012.
- Giné et al. [2000] Evarist Giné, Rafał Latała, and Joel Zinn. Exponential and moment inequalities for U-statistics. In High Dimensional Probability II, pages 13–38. Springer, 2000.
- Hoeffding [1948] W. Hoeffding. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 19:293–325, 1948.
- Houdré and Reynaud-Bouret [2003] Christian Houdré and Patricia Reynaud-Bouret. Exponential inequalities, with constants, for U-statistics of order two. In Stochastic Inequalities and Applications, pages 55–69. Springer, 2003.
- Kiessling and Wang [2012] Michael K-H Kiessling and Yu Wang. Onsager’s ensemble for point vortices with random circulations on the sphere. Journal of Statistical Physics, 148(5):896–932, 2012.
- Korolyuk and Borovskich [2013] Vladimir S Korolyuk and Yu V Borovskich. Theory of U-statistics, volume 273. Springer Science & Business Media, 2013.
- Lee [2019] A.J. Lee. U-Statistics: Theory and Practice. CRC Press, 2019. ISBN 9781351405850. URL https://books.google.com/books?id=YC2NDwAAQBAJ.
- Liu and Wu [2020] Wei Liu and Liming Wu. Large deviations for empirical measures of mean-field Gibbs measures. Stochastic Processes and their Applications, 130(2):503–520, 2020.
- Nikitin and Ponikarov [2001] Ya. Yu. Nikitin and E. V. Ponikarov. Rough asymptotics of probabilities of Chernoff type large deviations for von Mises functionals and U-statistics. Translations of the American Mathematical Society, Series 2, 203:107–146, 2001.
- Nikitin [1995] Yakov Nikitin. Asymptotic efficiency of nonparametric tests. Cambridge University Press, 1995.
- Talagrand [1996] Michel Talagrand. New concentration inequalities in product spaces. Inventiones mathematicae, 126(3):505–563, 1996.
- Vershynin [2018] Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.
- Wang [1994] Wenyang Wang. Large deviations of U-empirical probability measures and statistical functionals. The Johns Hopkins University, 1994.
- Nikitin and Peaucelle [2004] Yakov Y. Nikitin and Irina Peaucelle. Efficiency and local optimality of nonparametric tests based on U- and V-statistics. Metron, 62(2):185–200, 2004.