On Uniform Confidence Intervals for the Tail Index and the Extreme Quantile
Abstract
This paper presents two results concerning uniform confidence intervals for the tail index and the extreme quantile.
First, we show that it is impossible to construct a length-optimal confidence interval satisfying the correct uniform coverage over a local non-parametric family of tail distributions.
Second, in light of the impossibility result, we construct honest confidence intervals that are uniformly valid by incorporating the worst-case bias in the local non-parametric family.
The proposed method is applied to simulated data and to a real data set from the National Vital Statistics published by the National Center for Health Statistics.
Keywords: honest confidence interval, extreme quantile, impossibility, tail index, uniform inference
1 Introduction
Suppose that one is interested in constructing a confidence interval (CI) for the true tail index of a distribution. To define this parameter, assume that the distribution function (d.f.), denoted by $F$, has a well-defined density function $f$ and satisfies the well-known von Mises condition:
$$\frac{x f(x)}{1-F(x)} \to \frac{1}{\gamma} \qquad (1)$$
as $x\to\infty$, where $\gamma>0$ is called the tail index. Pointwise CIs for $\gamma$ have been proposed in a number of papers, including Cheng and Peng (2001), Lu and Peng (2002), Peng and Qi (2006), and Haeusler and Segers (2007). More recently, Carpentier and Kim (2014b) develop an adaptive CI for $\gamma$ that is uniformly valid over a parametric family of tail distributions indexed by the second-order parameter.
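As a quick numerical illustration of condition (1), the sketch below evaluates the ratio $x f(x)/\{1-F(x)\}$ for the standard Cauchy distribution (Student-t with one degree of freedom), whose tail index is $\gamma = 1$. The choice of distribution and the function name are ours, for illustration only.

```python
import math


def cauchy_von_mises_ratio(x: float) -> float:
    """Return x * f(x) / (1 - F(x)) for the standard Cauchy distribution (x > 0)."""
    density = 1.0 / (math.pi * (1.0 + x * x))
    # For x > 0, 1 - F(x) = 1/2 - arctan(x)/pi = arctan(1/x)/pi; the latter avoids cancellation.
    survival = math.atan(1.0 / x) / math.pi
    return x * density / survival


for x in (1.0, 10.0, 100.0, 1000.0):
    # The ratio should approach 1 / gamma = 1 as x grows.
    print(f"x = {x:>6.0f}: ratio = {cauchy_von_mises_ratio(x):.6f}")
```

A short expansion gives a ratio of about $1 - 2/(3x^2)$ for large $x$, so the convergence in (1) is fast for this particular distribution.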
Since the seminal paper by Hill (1975), the literature has investigated numerous estimators for $\gamma$ as well as their asymptotic properties. See the recent reviews by Gomes and Guillou (2015) and Fedotenkov (2020). Drees (2001) obtains the worst-case optimal convergence rate, i.e., the min-max bound, in estimation over a local non-parametric family of tail distributions. Carpentier and Kim (2014a) construct an adaptive and minimax optimal estimator over the parametric family of second-order Pareto-type distributions. Motivated by these results, a natural question is whether a length-optimal CI for $\gamma$ can achieve uniformly correct coverage probabilities over the non-parametric family of tail distributions. To the best of our knowledge, the existing literature has not answered this important question, although such problems have been investigated in other important contexts in statistics, e.g., for CIs of non-parametric densities (e.g., Low, 1997; Hoffmann and Nickl, 2011; Bull and Nickl, 2013; Carpentier, 2013), non-parametric regressions (e.g., Li, 1989; Genovese and Wasserman, 2008), and high-dimensional regression models (e.g., van de Geer et al., 2014; Wu et al., 2021), to name but a few.
In this paper, we first show that it is in fact impossible to construct a length-optimal CI for the true tail index satisfying the uniformly correct coverage over the local non-parametric family considered by Drees (2001). Specifically, any CI enjoying the uniform coverage property can be no shorter than the worst-case bias over the non-parametric family up to a constant. This negative result is analogous to those of Low (1997) and Genovese and Wasserman (2008) presented in the contexts of non-parametric densities/regressions, but is novel in the context of the tail index.
Second, we derive the asymptotic distribution of Hill’s estimator uniformly over the local non-parametric family of tail distributions. Given the above impossibility result, it is imperative to account for a possible range of bias over the non-parametric family. Hence, we construct an honest CI for the tail index that is locally uniformly valid by incorporating the worst-case bias over the local non-parametric family as well as the asymptotic randomness due to sampling. We also demonstrate that this honest CI for the tail index extends to extreme quantiles.
Simulation studies support our theoretical predictions. While the naïve length-optimal CI, which does not account for a possible range of bias over the local non-parametric family, suffers from severe under-coverage, our proposed CIs achieve correct coverage. We apply the proposed method to the National Vital Statistics from the National Center for Health Statistics and construct CIs for quantiles of extremely low infant birth weights across a variety of mothers’ demographic characteristics and maternal behaviors. We find that, even after accounting for a possible range of bias over the local non-parametric family, having no prenatal visit during pregnancy remains a strong risk factor for low infant birth weight.
Organization: Section 2 introduces the setup, notations, and definitions. Section 3 establishes the impossibility result. Section 4 proposes a uniformly valid CI. Section 5 presents an application to extreme quantiles. Sections 6 and 7 illustrate simulation studies and real data analyses, respectively. Section 8 summarizes the paper. Mathematical proofs are collected in the Appendix.
2 Setup, Notations, and Definitions
2.1 Non-parametric Families in the Tail
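Any distribution function satisfying the von Mises condition (1) can be equivalently characterized in the right tail in terms of its inverse by
$$F^{-1}(1-t) = C\,t^{-\gamma}\exp\left(\int_t^1 \frac{\eta(s)}{s}\,ds\right)$$
for some $\gamma>0$ and $C>0$, where the function $\eta$ satisfies $\eta(t)\to 0$ as $t\downarrow 0$. The standard Pareto distribution falls in this family as a trivial special case in which $\eta$ is the zero function and $C=1$, so that $F^{-1}(1-t)=t^{-\gamma}$ for all $t\in(0,1]$.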
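To make the quantile representation $F^{-1}(1-t) = C\,t^{-\gamma}\exp\{\int_t^1 \eta(s)/s\,ds\}$ concrete, the sketch below evaluates it by numerical integration for a hypothetical deviation function $\eta(s) = c\,s^{\rho}$ (our choice, for illustration) with $C = 1$, and compares the result with the closed form $t^{-\gamma}\exp\{c(1-t^{\rho})/\rho\}$, which follows from $\int_t^1 c\,s^{\rho-1}\,ds = c(1-t^{\rho})/\rho$. Function names are illustrative.

```python
import numpy as np


def quantile_numeric(t, gamma, eta, n_nodes=20_001):
    """F^{-1}(1 - t) = t^{-gamma} exp(int_t^1 eta(s)/s ds) with C = 1, via composite Simpson's rule."""
    s = np.linspace(t, 1.0, n_nodes)      # n_nodes is odd, so the number of subintervals is even
    g = eta(s) / s
    h = (1.0 - t) / (n_nodes - 1)
    integral = h / 3.0 * (g[0] + g[-1] + 4.0 * g[1:-1:2].sum() + 2.0 * g[2:-1:2].sum())
    return t ** (-gamma) * np.exp(integral)


def quantile_closed_form(t, gamma, c, rho):
    """Closed form for the hypothetical choice eta(s) = c * s**rho."""
    return t ** (-gamma) * np.exp(c * (1.0 - t ** rho) / rho)


gamma, c, rho = 0.5, 0.5, 0.5
for t in (0.5, 0.1, 0.05):
    print(t, quantile_numeric(t, gamma, lambda s: c * s ** rho), quantile_closed_form(t, gamma, c, rho))
```

Setting $c = 0$ recovers the standard Pareto quantile $t^{-\gamma}$.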
To maintain a non-parametric setup in statistical inference about the true tail index, we follow Drees (2001) and consider the following family of d.f.’s:
where and for some constants .
Two remarks are in order about this family . First, let denote the Pareto distribution function with true tail index , i.e., . This is the center of localization by the family . The factor represents a deviation from this center. If we set , then the model essentially becomes parametric, since the deviation is fully determined by and hence by the constants, and . In this parametric setup, Hall and Welsh (1985) establish the optimal uniform rates of convergence over a family of d.f.’s with densities of the form , where . In contrast, we consider a non-parametric family like , in which the function serves merely as an upper bound for deviations from the center. We will focus on the classic estimator by Hill (1975). Since Hill’s estimator is scale invariant, we hereafter assume without loss of generality and for succinct writing.
Second, by the fact that , we can rewrite the quantile characterization of an element as
(2) |
To construct a uniformly valid inference, we now again follow Drees (2001) and consider a sequence of families of data-generating processes with tail index converging to at a rate for some as . Specifically, we consider a sequence satisfying
(3) |
This sequence essentially entails the optimal choice of the tuning parameter (de Haan and Ferreira, 2006, p.77), which we introduce later. Now, we obtain a drifting sequence of local families consisting of -indexed elements whose quantile functions satisfy
(4) | ||||
where the second equality is due to (2). The corresponding tail index is given by , and characterizes the local deviation from the standard Pareto distribution.
Given the above reparametrization, we translate the local non-parametric family of d.f.’s into that of . Setting the upper bound to , we consider the family
which contains all square integrable functions that are uniformly bounded and satisfy the bound . The family induces a local counterpart of the non-parametric family , namely
indexed by as a function of the sample size . Given the local reparametrization introduced above, now represents a deviation function and the associated tail index is .
2.2 Hill’s Estimator
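Let $X_1,\dots,X_n$ be a random sample from $F$, and let $X_{n-i+1:n}$ denote the $i$-th largest order statistic in this sample. With this notation, Hill’s estimator based on the $k$ largest observations is defined by
$$\hat\gamma(k) = \frac{1}{k}\sum_{i=1}^{k}\log X_{n-i+1:n} - \log X_{n-k:n}.$$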
In practice, a researcher often implements this estimator for an interval of values of the tuning parameter to demonstrate ad hoc robustness. This common practice can be formally accommodated by allowing for a sequence of intervals, i.e., for , similarly to Drees, Resnick, and de Haan (2000).
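A minimal NumPy sketch of Hill’s estimator, evaluated over several values of the tuning parameter $k$ as in the common practice just described (the data behind a so-called Hill plot). The function names and the simulated standard Pareto sample are illustrative assumptions.

```python
import numpy as np


def hill_estimator(x, k):
    """Hill's estimator based on the k largest observations of the positive sample x."""
    x_desc = np.sort(np.asarray(x, dtype=float))[::-1]   # x_desc[i - 1] is the i-th largest
    if not 1 <= k < x_desc.size:
        raise ValueError("k must satisfy 1 <= k < n")
    log_top = np.log(x_desc[:k])                         # log X_{n-i+1:n}, i = 1, ..., k
    return log_top.mean() - np.log(x_desc[k])            # minus log X_{n-k:n}


def hill_path(x, k_values):
    """Hill estimates over a collection of k values."""
    return np.array([hill_estimator(x, k) for k in k_values])


rng = np.random.default_rng(0)
gamma = 0.5
sample = (1.0 - rng.uniform(size=2000)) ** (-gamma)      # standard Pareto with tail index gamma
print(hill_path(sample, k_values=[50, 100, 200, 400]))
```

For an exact Pareto sample the estimator is unbiased for every $k$, so the path fluctuates around $\gamma$; departures from the Pareto shape typically show up as a drift in the path as $k$ grows.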
Define the functional by
for any measurable function . If we substitute the quantile function of the standard Pareto distribution, that is, , then we have
identifying the true for any . Define the tail empirical quantile function
for , where denotes the smallest integer larger than its argument. With this auxiliary notation, as implied by Example 3.1 in Drees (1998a), Hill’s estimator can be equivalently rewritten as .
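The functional representation can be checked numerically. The sketch below assumes, following our reading of Drees (1998a), that the Hill functional is $T(z) = \int_0^1 \log\{z(t)/z(1)\}\,dt$ and that the tail empirical quantile function is $Q_n(t) = X_{n-\lfloor kt\rfloor:n}$, i.e., the $(\lfloor kt\rfloor+1)$-th largest observation; neither formula is reproduced from this paper. A midpoint rule on a grid whose size is a multiple of $k$ integrates this step function exactly, and the same routine confirms $T(t^{-\gamma}) = \gamma$ for the Pareto quantile function.

```python
import numpy as np


def hill_functional(z, n_grid=200_000):
    """T(z) = int_0^1 log(z(t) / z(1)) dt, approximated by the midpoint rule on n_grid cells."""
    t = (np.arange(n_grid) + 0.5) / n_grid
    z_at_one = z(np.array([1.0]))[0]
    return np.mean(np.log(z(t) / z_at_one))


def tail_empirical_quantile(x, k):
    """Q_n(t) = X_{n - floor(k t):n}, the (floor(k t) + 1)-th largest observation (assumed form)."""
    x_desc = np.sort(np.asarray(x, dtype=float))[::-1]
    return lambda t: x_desc[np.floor(k * np.asarray(t, dtype=float)).astype(int)]


def hill_estimator(x, k):
    """Direct formula for Hill's estimator, for comparison."""
    x_desc = np.sort(np.asarray(x, dtype=float))[::-1]
    return np.mean(np.log(x_desc[:k])) - np.log(x_desc[k])


rng = np.random.default_rng(1)
x = (1.0 - rng.uniform(size=1000)) ** (-0.5)
k = 100
print(hill_functional(tail_empirical_quantile(x, k)), hill_estimator(x, k))
print(hill_functional(lambda t: t ** (-0.5)))   # close to gamma = 0.5
```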
3 Asymptotic Impossibility
This section presents the first of the two main theoretical results of this paper. In light of the min-max result for estimation (cf. Drees, 2001), a researcher may naturally seek a length-optimal confidence interval satisfying uniform coverage over a non-parametric family, such as the one introduced in Section 2.1. This section shows that this objective cannot be attained.
Specifically, we aim to establish that the length of any confidence interval that has coverage of uniformly for all is no shorter than the supremum seminorm of over up to a constant multiple. This implies that we cannot find a length-optimal confidence interval which satisfies the uniform coverage without accounting for the worst-case bias over the non-parametric family . In spirit, such an impossibility result parallels the one for non-parametric density estimation established by Low (1997) and the one for non-parametric regression estimation established by Genovese and Wasserman (2008), among others.
Define the modulus of continuity of by
(5) |
This is the worst-case bias in absolute value. Let and denote the joint distributions of i.i.d. draws from and , respectively. Let denote the expectation with respect to the product measure . The following theorem establishes the impossibility result: the expected length of a uniformly valid confidence interval is no shorter than a constant multiple of the modulus of continuity of .
Theorem 1 (Impossibility)
For , suppose that a confidence interval for has coverage probabilities of at least uniformly for all . Then, there exist and depending on and only such that and
(6) |
for all .
As already mentioned above, this result is analogous to those established by Low (1997), Cai and Low (2004), Genovese and Wasserman (2008), and Armstrong and Kolesár (2018), to name but a few, in other important contexts of statistics, but it is novel in the context of the tail index. To understand the lower bound in (6), we now derive a concrete expression for the element in the definition (5) of the modulus of continuity . Note that
The first term in the last line characterizes the bias due to the deviation from the standard Pareto distribution, and the second term characterizes the asymptotic randomness. As a consequence, to obtain a feasible and uniformly valid confidence interval, we will set an upper bound for the first term and adjust the critical value based on the second term. We obtain such a uniform confidence interval in the following section.
4 Uniform Confidence Interval
Given that , it has been established that the optimal rate of convergence of Hill’s estimator is . See, for example, Remark 3.2.7 in de Haan and Ferreira (2006). Such an optimal rate entails non-negligible asymptotic bias, as characterized in Theorem 2 below. To achieve this rate, we let , where means . Then, the restriction (3) implies that and as . We formally summarize these conditions below.
Condition 1
As , .
As the second of the two main theoretical results of this paper, the following theorem derives the asymptotic distribution of uniformly for all and by exploiting the features of the functional defined in Section 2.2. Let denote the distribution of i.i.d. draws from the distribution .
Theorem 2 (Uniform Asymptotic Distribution)
A proof can be found in the Appendix. The convergence of Hill’s estimator as a function of has been established in the literature (e.g., Resnick and Stărică, 1997; Drees et al., 2000). In comparison, Theorem 2 here contributes to the literature by showing the asymptotic distribution uniformly over both and the local non-parametric function class. The terms and characterize the asymptotic randomness and bias, respectively. It follows that
(9) |
To conduct statistical inference based on (9), we need to compute the bound
of the bias. To this end, note that for all . Therefore, for any ,
(10) |
This bound is tight and achieved when, for example, .
With this tight bias bound taken into account in a similar spirit to Armstrong and Kolesár (2020), a locally uniformly valid confidence interval for the tail index is given by
(11) |
for , where is short for ,
is the upper bound of the bias, and denotes a suitable quantile of , whose values can be found in Table 1.
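For intuition on how a bias bound can enter an interval, the sketch below implements the fixed-length construction of Armstrong and Kolesár (2020) under a normal approximation: with standard error $se$ and a bound $b$ on the absolute bias, the interval $\hat\gamma \pm \mathrm{cv}_{1-\alpha}(b/se)\,se$ uses as critical value $\mathrm{cv}_{1-\alpha}(t)$, the $1-\alpha$ quantile of $|N(t,1)|$. It is not claimed to reproduce (11), which relies on the quantiles in Table 1; the plug-in standard error $\hat\gamma/\sqrt{k}$ (from Hill’s asymptotic variance $\gamma^2/k$) and all function names are our assumptions.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def folded_normal_cv(t: float, alpha: float = 0.05) -> float:
    """(1 - alpha) quantile of |Z + t| with Z ~ N(0, 1), computed by bisection."""
    t = abs(t)

    def coverage(c: float) -> float:
        return normal_cdf(c - t) - normal_cdf(-c - t)   # P(|Z + t| <= c)

    lo, hi = 0.0, t + 10.0          # coverage(hi) is essentially 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if coverage(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bias_aware_interval(gamma_hat: float, k: int, bias_bound: float, alpha: float = 0.05):
    """gamma_hat +/- cv(bias_bound / se) * se with the plug-in standard error se = gamma_hat / sqrt(k)."""
    se = gamma_hat / math.sqrt(k)
    half_width = folded_normal_cv(bias_bound / se, alpha) * se
    return gamma_hat - half_width, gamma_hat + half_width


print(bias_aware_interval(0.5, k=100, bias_bound=0.0))    # reduces to the usual normal interval
print(bias_aware_interval(0.5, k=100, bias_bound=0.05))   # wider, to allow for bias up to 0.05
```

With a zero bias bound the critical value is $z_{1-\alpha/2}$; as the bound grows relative to the standard error, $\mathrm{cv}_{1-\alpha}(t)$ behaves like $t + z_{1-\alpha}$, so the interval widens by roughly the bias bound.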
| | 0.10 | 0.05 | 0.01 | | 0.10 | 0.05 | 0.01 |
|---|---|---|---|---|---|---|---|
| 1 | 1.64 | 1.96 | 2.56 | 1/4 | 2.41 | 2.71 | 3.27 |
| 10/11 | 1.87 | 2.19 | 2.76 | 1/5 | 2.46 | 2.74 | 3.34 |
| 5/6 | 1.95 | 2.27 | 2.86 | 1/10 | 2.58 | 2.85 | 3.44 |
| 2/3 | 2.09 | 2.42 | 3.01 | 1/20 | 2.67 | 2.92 | 3.51 |
| 1/2 | 2.22 | 2.54 | 3.12 | 1/50 | 2.75 | 3.01 | 3.57 |
| 1/3 | 2.33 | 2.66 | 3.23 | 1/100 | 2.80 | 3.08 | 3.61 |
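The first row of Table 1 (1.64, 1.96, 2.56) is close to the two-sided standard normal critical values, which is consistent with the table reporting quantiles of a supremum over a range that collapses to a single point when the first-column value equals 1. Under the assumption, ours rather than a formula reproduced from this paper, that the relevant statistic is $\sup_{r\in[r_0,1]}|W(r)|/\sqrt{r}$ for a standard Brownian motion $W$ and a lower bound $r_0$ given by the first column (a limit motivated by the functional convergence of the Hill process in Drees et al., 2000), the Monte Carlo sketch below approximates such quantiles. The discrete grid slightly understates the supremum, so the output is only a rough check against Table 1.

```python
import numpy as np


def sup_critical_value(r0, alpha=0.05, n_grid=200, n_sim=20_000, seed=0):
    """Monte Carlo (1 - alpha) quantile of sup_{r in [r0, 1]} |W(r)| / sqrt(r) on a finite grid."""
    rng = np.random.default_rng(seed)
    r = np.linspace(r0, 1.0, n_grid)                  # r0 = 1 collapses the grid to the single point 1
    dt = np.diff(r, prepend=0.0)                      # first increment starts from W(0) = 0
    w = np.cumsum(rng.standard_normal((n_sim, n_grid)) * np.sqrt(dt), axis=1)
    stat = np.max(np.abs(w) / np.sqrt(r), axis=1)     # supremum over the grid, per simulated path
    return np.quantile(stat, 1.0 - alpha)


for r0 in (1.0, 1 / 2, 1 / 10):
    print(r0, round(float(sup_critical_value(r0)), 2))
```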
As a final remark, we discuss the choice of the higher-order parameters and . Both depend on the underlying distribution and are hence unknown. While this feature appears to be a disadvantage of our method, the impossibility result in Theorem 1 implies that it cannot be avoided, regardless of how the interval is constructed. The existing literature has proposed several estimators of and , whose consistency requires additional assumptions on the underlying function or, equivalently, further restrictions on the class . See Carpentier and Kim (2014a), Cheng and Peng (2001), and Haeusler and Segers (2007) for some data-driven methods of choosing the higher-order parameters. The corresponding confidence intervals are no longer uniformly valid, but they remain pointwise valid.
As an alternative, we propose a rule-of-thumb choice of and and proceed with the proposed interval. In particular, if the underlying distribution is Student-t with degrees of freedom, we know that . Furthermore, for the bias upper bound , we set , so that is at most of the true tail index to be estimated. In practice, we replace with the estimator . This rule-of-thumb choice is reminiscent of Silverman’s rule-of-thumb bandwidth choice in kernel density estimation, where the reference distribution is Gaussian. We examine the performance of this rule by simulations in Section 6.
5 Extreme Quantiles
In this section, we apply Theorem 2 to uniform inference about extreme quantiles. To characterize the extremeness, we focus on the sequence of quantiles $F^{-1}(1-p_n)$, where $p_n\to 0$ as $n\to\infty$. Consider the extreme quantile estimator by Weissman (1978):
$$\hat q_n = X_{n-k:n}\left(\frac{k}{np_n}\right)^{\hat\gamma},$$
where $\hat\gamma$ is Hill’s estimator. Recall that the true quantile under the local drifting sequence is
as in (4), where satisfies as in Condition 1. We now aim to asymptotically approximate the distribution of
To this end, we state the following condition on the relation among , and (e.g., de Haan and Ferreira, 2006, Theorem 4.3.1).
Condition 2
and as .
Theorem 3 (Extreme Quantiles)
Theorem 3 implies that the suitably normalized extreme quantile estimator has the same asymptotic distribution as Hill’s estimator of the tail index. Therefore, a uniformly valid confidence interval can be constructed similarly. In particular, a robust confidence interval for with nominal uniform coverage probability that accounts for the bias is constructed as
(12) |
where again is short for , denotes a suitable quantile that can be found in Table 1, and .
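The Weissman estimator used in this section can be sketched as follows, assuming the standard form $\hat q = X_{n-k:n}\{k/(np)\}^{\hat\gamma}$ with Hill’s estimator $\hat\gamma$ based on the $k$ largest observations. The function names and the simulated Pareto sample are illustrative.

```python
import numpy as np


def hill_from_sorted(x_desc, k):
    """Hill's estimator from a sample already sorted in decreasing order."""
    return np.mean(np.log(x_desc[:k])) - np.log(x_desc[k])


def weissman_quantile(x, k, p):
    """Weissman (1978) estimate of the (1 - p) quantile: X_{n-k:n} * (k / (n p)) ** gamma_hat."""
    x_desc = np.sort(np.asarray(x, dtype=float))[::-1]
    n = x_desc.size
    gamma_hat = hill_from_sorted(x_desc, k)
    return x_desc[k] * (k / (n * p)) ** gamma_hat


rng = np.random.default_rng(7)
gamma, p = 0.5, 0.001
x = (1.0 - rng.uniform(size=10_000)) ** (-gamma)        # standard Pareto sample
print(weissman_quantile(x, k=200, p=p), p ** (-gamma))  # estimate vs. true quantile
```

The extrapolation factor $\{k/(np)\}^{\hat\gamma}$ carries any error in $\hat\gamma$ into the quantile estimate, amplified by $\log\{k/(np)\}$, which is why the quantile interval inherits the bias treatment of the tail index interval.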
6 Simulation Studies
In this section, we use simulated data to evaluate the finite-sample performance of our proposed confidence intervals in comparison with the naïve length-optimal confidence interval. Sections 6.1 and 6.2 focus on inference about the tail index and extreme quantiles, respectively.
6.1 Tail Index
The following simulation design is employed for our analysis. We generate independent standard uniform random variables and construct the observations , where . We set , so that
(13) |
where the constants, and , characterize the scale and the shape, respectively, of the deviation from the Pareto distribution. For ease of comparison, we set , which corresponds to the Student-t distribution with degrees of freedom. Then, we set for normalization, where we vary across sets of simulations.
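Because the exact form of (13) is not reproduced here, the sketch below assumes, for illustration only, the deviation $\eta(s) = c\,s^{\rho}$ with $C = 1$, which gives $F^{-1}(1-t) = t^{-\gamma}\exp\{c(1-t^{\rho})/\rho\}$, and draws observations by the inverse-transform method. The default shape $\rho = 2\gamma$ is meant to mimic the second-order behavior of a Student-t distribution with $1/\gamma$ degrees of freedom; the parameter names are hypothetical.

```python
import numpy as np


def tail_quantile(t, gamma, c, rho):
    """Hypothetical second-order Pareto quantile F^{-1}(1 - t) = t^{-gamma} exp(c (1 - t^rho) / rho)."""
    return t ** (-gamma) * np.exp(c * (1.0 - t ** rho) / rho)


def simulate_sample(n, gamma, c, rho, rng):
    """Inverse-transform draws X_i = F^{-1}(1 - U_i), with U_i uniform on (0, 1]."""
    u = 1.0 - rng.uniform(size=n)        # values in (0, 1], avoiding t = 0
    return tail_quantile(u, gamma, c, rho)


rng = np.random.default_rng(2024)
gamma = 0.5
x = simulate_sample(100_000, gamma=gamma, c=0.0, rho=2 * gamma, rng=rng)
# With c = 0 the draws are exactly standard Pareto, so log X is exponential with mean gamma.
print(np.mean(np.log(x)))
```

Increasing $c$ pushes the quantiles above the Pareto benchmark by the factor $\exp\{c(1-t^{\rho})/\rho\}$, which vanishes at $t = 1$ and approaches $\exp(c/\rho)$ deep in the tail.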
To construct the optimal confidence interval, we need a choice of . We use the data-driven algorithm proposed by Guillou and Hall (2001), which we briefly summarize in Appendix B for the convenience of readers. This choice of is of the optimal rate as established in their Theorem 2. Specifically, we select according to (33) in Appendix B with the restriction that .
We implement three confidence intervals for comparison. The first is the naïve length-optimal confidence interval that does not account for a possible bias over the local non-parametric family, that is,
(14) |
where is selected according to the procedure described above. Our impossibility result predicts that it fails to achieve correct coverage uniformly over the non-parametric class encompassing the simulation design presented above.
The second one is , i.e., our proposed confidence interval given in (11) with , where is selected according to the procedure described above. The bias upper bound is chosen following the rule-of-thumb choice described in Section 4.
The third is based on snooping. Specifically, given selected according to the procedure described above, now consider the range containing integers denoted by . The snooping interval is constructed by
(15) |
where is defined in (11) and the lower bound of snooping is set to . For the bias upper bound , we set and replace with the estimator for each .
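A small Monte Carlo sketch of how the coverage of the naïve interval can be estimated. It assumes the naïve interval takes the familiar form $\hat\gamma(k) \pm z_{0.975}\,\hat\gamma(k)/\sqrt{k}$ based on Hill’s asymptotic variance $\gamma^2/k$, uses a fixed $k$ instead of the Guillou and Hall (2001) choice, and draws from the hypothetical design $F^{-1}(1-t) = t^{-\gamma}\exp\{c(1-t^{\rho})/\rho\}$. It illustrates the mechanics only and does not reproduce Table 2.

```python
import numpy as np

Z_975 = 1.959963984540054  # 0.975 quantile of N(0, 1)


def hill_estimator(x, k):
    """Hill's estimator from the k largest observations."""
    x_desc = np.sort(x)[::-1]
    return np.mean(np.log(x_desc[:k])) - np.log(x_desc[k])


def naive_coverage(n, k, gamma, c, rho, n_rep=2000, seed=0):
    """Monte Carlo coverage of gamma_hat +/- z * gamma_hat / sqrt(k) under a hypothetical design."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        t = 1.0 - rng.uniform(size=n)                              # values in (0, 1]
        x = t ** (-gamma) * np.exp(c * (1.0 - t ** rho) / rho)     # X = F^{-1}(1 - t)
        g = hill_estimator(x, k)
        half_width = Z_975 * g / np.sqrt(k)
        hits += int(g - half_width <= gamma <= g + half_width)
    return hits / n_rep


cov_pareto = naive_coverage(n=1000, k=100, gamma=0.5, c=0.0, rho=1.0)  # exact Pareto
cov_dev = naive_coverage(n=1000, k=100, gamma=0.5, c=1.0, rho=1.0)     # deviation from Pareto
print(cov_pareto, cov_dev)
```

Under exact Pareto sampling the naïve interval is close to nominal, while the deviation $c > 0$ introduces a bias of order $(k/n)^{\rho}$ that the naïve interval ignores, lowering its coverage.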
Tail Index: Coverage Probabilities

| | n = 250 | | | n = 500 | | | n = 1000 | | |
|---|---|---|---|---|---|---|---|---|---|
| (1, 0) | 0.92 | 0.99 | 0.98 | 0.92 | 0.99 | 0.99 | 0.91 | 0.99 | 0.99 |
| (1, 0.5) | 0.88 | 0.98 | 0.97 | 0.81 | 0.99 | 0.98 | 0.73 | 0.99 | 0.99 |
| (1, 1) | 0.77 | 0.98 | 0.97 | 0.64 | 0.98 | 0.98 | 0.65 | 0.99 | 0.99 |
| (0.5, 0) | 0.91 | 0.99 | 0.98 | 0.91 | 0.99 | 0.99 | 0.91 | 0.99 | 0.99 |
| (0.5, 0.5) | 0.70 | 0.98 | 0.98 | 0.55 | 0.98 | 0.99 | 0.55 | 0.98 | 0.98 |
| (0.5, 1) | 0.51 | 0.87 | 0.94 | 0.59 | 0.86 | 0.92 | 0.61 | 0.93 | 0.95 |

Tail Index: Average Lengths

| | n = 250 | | | n = 500 | | | n = 1000 | | |
|---|---|---|---|---|---|---|---|---|---|
| (1, 0) | 0.29 | 0.49 | 0.53 | 0.20 | 0.40 | 0.43 | 0.14 | 0.34 | 0.36 |
| (1, 0.5) | 0.30 | 0.50 | 0.54 | 0.21 | 0.42 | 0.43 | 0.16 | 0.36 | 0.37 |
| (1, 1) | 0.31 | 0.52 | 0.54 | 0.23 | 0.44 | 0.45 | 0.18 | 0.38 | 0.39 |
| (0.5, 0) | 0.14 | 0.24 | 0.26 | 0.10 | 0.20 | 0.22 | 0.07 | 0.17 | 0.18 |
| (0.5, 0.5) | 0.16 | 0.27 | 0.28 | 0.12 | 0.22 | 0.23 | 0.09 | 0.20 | 0.20 |
| (0.5, 1) | 0.18 | 0.29 | 0.31 | 0.15 | 0.25 | 0.26 | 0.12 | 0.22 | 0.23 |
From Theorem 2, we expect that both and will deliver asymptotically correct (uniform) coverage, whereas will not. Table 2 presents the coverage probabilities (top panel) and the average lengths (bottom panel) of the 95% confidence intervals based on 5000 Monte Carlo iterations. The key findings can be summarized as follows. First, both the intervals and have the correct coverage probability for most of the distributions, consistent with our theory. When and , the deviation of the tail distribution away from the Pareto distribution is the most severe. In this case, suffers from some undercoverage when is small (e.g., and ), but achieves more satisfactory coverage as becomes large (e.g., ). Second, the coverage by is inadequate throughout, even when the deviation from the Pareto distribution is relatively small. Furthermore, the undercoverage by tends to worsen as the sample size increases. Finally, the lengths of are slightly larger than those of when is 250, but they become almost identical when gets larger. From these findings, we prefer and to . As the sample size becomes larger, and become equally preferable, while consistently underperforms.
6.2 Extreme Quantiles
We now turn to extreme quantiles. The data generating process continues to be the same as in the previous subsection. The object of interest is the 99% quantile so that . Similarly to the previous subsection, we again compare three confidence intervals.
The first one is the naïve confidence interval without accounting for a possible bias in the non-parametric family, that is,
(16) |
The second one is our proposed confidence interval, that is, in (12) with . The third one is our proposed confidence interval with snooping interval, that is,
(17) |
where are constructed in the same way as in (15), is defined in (12), and we set .
Extreme Quantile: Coverage Probabilities

| | n = 250 | | | n = 500 | | | n = 1000 | | |
|---|---|---|---|---|---|---|---|---|---|
| (1, 0) | 0.90 | 0.96 | 0.94 | 0.92 | 0.98 | 0.97 | 0.93 | 0.99 | 0.99 |
| (1, 0.5) | 0.92 | 0.96 | 0.93 | 0.93 | 0.98 | 0.97 | 0.89 | 0.99 | 0.98 |
| (1, 1) | 0.92 | 0.95 | 0.93 | 0.89 | 0.97 | 0.95 | 0.83 | 0.99 | 0.97 |
| (0.5, 0) | 0.92 | 0.98 | 0.97 | 0.93 | 0.99 | 0.98 | 0.93 | 0.99 | 0.99 |
| (0.5, 0.5) | 0.91 | 0.97 | 0.95 | 0.86 | 0.98 | 0.96 | 0.78 | 0.99 | 0.98 |
| (0.5, 1) | 0.86 | 0.96 | 0.94 | 0.81 | 0.97 | 0.96 | 0.85 | 0.98 | 0.97 |

Extreme Quantile: Average Lengths

| | n = 250 | | | n = 500 | | | n = 1000 | | |
|---|---|---|---|---|---|---|---|---|---|
| (1, 0) | 136 | 232 | 240 | 91 | 183 | 189 | 62 | 152 | 156 |
| (1, 0.5) | 170 | 291 | 277 | 110 | 225 | 213 | 76 | 185 | 174 |
| (1, 1) | 199 | 342 | 302 | 132 | 263 | 235 | 90 | 209 | 189 |
| (0.5, 0) | 6.3 | 10.9 | 11.6 | 4.4 | 8.9 | 9.3 | 3.0 | 7.5 | 7.7 |
| (0.5, 0.5) | 8.3 | 14.4 | 14.3 | 5.8 | 11.5 | 11.3 | 4.2 | 9.4 | 9.1 |
| (0.5, 1) | 10.9 | 18.3 | 17.6 | 7.5 | 13.7 | 13.3 | 5.3 | 10.6 | 10.4 |
Table 3 presents the coverage probabilities (top panel) and the average lengths (bottom panel) of the 95% confidence intervals based on 1000 Monte Carlo iterations. The findings are similar to those in Table 2. In particular, both the intervals and lead to correct coverage probabilities for all distributions, while suffers from undercoverage in general. Regarding the lengths, and are both longer than , as needed to attain correct coverage. Furthermore, when is 0.5, has approximately the same lengths as . When , the lengths of are shorter than those of , especially when the model deviates substantially from the Pareto distribution. This is because the true quantile is substantially larger, so that the effects of adapting the critical value become more significant. Based on these observations, we prefer and to .
7 Real Data Analysis
This section illustrates an application of the proposed method to an analysis of extremely low infant birth weights, whose relations with mothers’ demographic characteristics and maternal behaviors are important research questions. We use detailed natality data (Vital Statistics) published by the National Center for Health Statistics, which have been used in prior studies including Abrevaya (2001) and many others. Our sample consists of repeated cross sections from 1989 to 2002. Using the data from each of these years, we construct 95% confidence intervals for the tail index in the left tail and for the first percentile, following the same computational procedure as in Section 6. Details of our implementation with this data set are as follows.
We follow previous studies (e.g., Abrevaya, 2001) in choosing the variables for mothers’ demographic characteristics and maternal behaviors. The variable of interest is the infant birth weight measured in kilograms. For the purpose of comparison, we set a benchmark subsample in which the infant is a boy and the mother is younger than the median age in the full sample, is white and married, has an education level below a high school degree, had her first prenatal visit in the first trimester (natal1), and did not smoke during the pregnancy. In addition to this benchmark subsample (benchmark), we also consider seven alternative subsamples, each corresponding to exactly one of the following scenarios: the mother has at least a high school diploma (high school); the infant is a girl (girl); the mother is unmarried (unmarried); the mother is black (black); the mother did not have any prenatal visit during pregnancy (no pre-visit); the mother smokes ten cigarettes per day on average (smoke); and the mother’s age is above the median age in the full sample (older).
For each of these subsamples, we construct the 95% confidence intervals , , and for the tail index in the left tail in the same way as in Section 6.1, and the 95% confidence intervals , , and for the first percentile as in Section 6.2. Since we are interested in the left tail (extremely low birth weights), we consider only birth weights, denoted , that are less than some cutoff value , and take as the input in our computational procedure for inference. We choose based on the prior findings in Abrevaya (2001). Namely, Abrevaya (2001) finds that the relationship between the infant birth weight and the mother’s demographics changes substantially at the 90th percentile of birth weight in the full sample, which is approximately 4 kilograms. Once , , and are constructed for the 99th percentile of this transformed variable, we in turn multiply by and add back to recover the interval for the original first percentile, which enables inference about extremely low birth weights.
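The left-tail transformation can be sketched as follows, assuming the input passed to the procedure is $Z = c - Y$ for the weights $Y$ below the cutoff $c$ (4 kilograms here). Mapping back multiplies by $-1$ and adds $c$, which reverses the order of the interval endpoints. The function names and the example weights are hypothetical.

```python
import numpy as np


def to_right_tail(birth_weight, cutoff):
    """Keep weights below the cutoff and reflect them, Z = cutoff - Y, so low weights form a right tail."""
    y = np.asarray(birth_weight, dtype=float)
    return cutoff - y[y < cutoff]


def back_to_original_scale(lower_z, upper_z, cutoff):
    """Map an interval [lower_z, upper_z] on the Z scale back to Y = cutoff - Z; the endpoints swap."""
    return cutoff - upper_z, cutoff - lower_z


# Made-up weights in kilograms, purely to show the mechanics.
weights = np.array([3.4, 1.1, 4.2, 2.7, 0.9])
print(to_right_tail(weights, cutoff=4.0))
print(back_to_original_scale(2.3, 2.7, cutoff=4.0))
```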
Tail Index

| Subsample | | | |
|---|---|---|---|
| benchmark | [0.27, 0.29] | [0.24, 0.31] | [0.25, 0.31] |
| high school | [0.29, 0.30] | [0.26, 0.33] | [0.27, 0.30] |
| girl | [0.25, 0.27] | [0.23, 0.29] | [0.23, 0.28] |
| unmarried | [0.29, 0.31] | [0.26, 0.34] | [0.27, 0.32] |
| black | [0.24, 0.29] | [0.21, 0.32] | [0.21, 0.33] |
| no pre-visit | [0.34, 0.41] | [0.31, 0.45] | [0.31, 0.46] |
| smoke | [0.22, 0.27] | [0.20, 0.30] | [0.19, 0.29] |
| older | [0.29, 0.31] | [0.26, 0.34] | [0.26, 0.32] |

First Percentile in Kilograms

| Subsample | | | |
|---|---|---|---|
| benchmark | [1.46, 1.56] | [1.30, 1.72] | [1.35, 1.68] |
| high school | [1.48, 1.52] | [1.31, 1.69] | [1.43, 1.68] |
| girl | [1.50, 1.59] | [1.35, 1.74] | [1.43, 1.72] |
| unmarried | [1.20, 1.30] | [0.99, 1.51] | [1.10, 1.48] |
| black | [0.99, 1.39] | [0.80, 1.58] | [0.79, 1.58] |
| no pre-visit | [0.00, 0.63] | [0.00, 1.08] | [0.00, 1.09] |
| smoke | [1.34, 1.59] | [1.21, 1.72] | [1.27, 1.71] |
| older | [1.30, 1.45] | [1.13, 1.62] | [1.24, 1.62] |
Table 4 presents the results for the 2002 sample. The results for the other years are similar and hence omitted to save space. The key empirical findings can be summarized as follows. First, and are similar in length for the tail index, while tends to be slightly longer than for the first percentile. Second, both of them are substantially longer than the naïve intervals, and , suggesting that ignoring the bias could lead to misleadingly short intervals. Third, compared with the benchmark subsample, mothers who do not have any prenatal visit during pregnancy bear a substantially higher risk of having extremely low infant birth weights. This observation remains true even after accounting for a possible bias.
8 Summary
In this paper, we present two theoretical results concerning uniform confidence intervals for the tail index and extreme quantiles. First, we show that it is impossible to construct a length-optimal confidence interval satisfying the correct uniform coverage over the local non-parametric family of tail distributions. Second, in light of the impossibility result, we construct an honest confidence interval that is uniformly valid by accounting for the worst-case bias over the local non-parametric class. Simulation studies support our theoretical results. While the naïve length-optimal confidence interval suffers from severe under-coverage, our proposed confidence intervals achieve correct coverage. Applying the proposed method to the National Vital Statistics from the National Center for Health Statistics, we find that, even after accounting for the worst-case bias bound, having no prenatal visit during pregnancy remains a strong risk factor for low infant birth weight. This finding demonstrates that, despite the impossibility result, it is possible to conduct robust yet informative statistical inference about the tail index and extreme quantiles.
References
- Abrevaya (2001) Abrevaya, J. (2001): “The effects of demographics and maternal behavior on the distribution of birth outcomes,” Empirical Economics, 26, 247–257.
- Armstrong and Kolesár (2018) Armstrong, T. B. and M. Kolesár (2018): “Optimal inference in a class of regression models,” Econometrica, 86, 655–683.
- Armstrong and Kolesár (2020) ——— (2020): “Simple and honest confidence intervals in nonparametric regression,” Quantitative Economics, 11, 1–39.
- Bull and Nickl (2013) Bull, A. D. and R. Nickl (2013): “Adaptive confidence sets in $L^2$,” Probability Theory and Related Fields, 156, 889–919.
- Cai and Low (2004) Cai, T. T. and M. G. Low (2004): “An adaptation theory for nonparametric confidence intervals,” Annals of Statistics, 32, 1805–1840.
- Carpentier (2013) Carpentier, A. (2013): “Honest and adaptive confidence sets in $L_p$,” Electronic Journal of Statistics, 7, 2875–2923.
- Carpentier and Kim (2014a) Carpentier, A. and A. K. H. Kim (2014a): “Adaptive and minimax optimal estimation of the tail coefficient,” Statistica Sinica, 25, 1133–1144.
- Carpentier and Kim (2014b) ——— (2014b): “Adaptive confidence intervals for the tail coefficient in a wide second order class of Pareto models,” Electronic Journal of Statistics, 8, 2066–2110.
- Cheng and Peng (2001) Cheng, S. and L. Peng (2001): “Confidence intervals for the tail index,” Bernoulli, 7, 751–760.
- Danielsson et al. (2001) Danielsson, J., L. de Haan, L. Peng, and C. G. de Vries (2001): “Using a bootstrap method to choose the sample fraction in tail index estimation,” Journal of Multivariate Analysis, 76, 226–248.
- Danielsson et al. (2016) Danielsson, J., L. M. Ergun, L. de Haan, and C. G. de Vries (2016): “Tail index estimation: Quantile driven threshold selection,” Working Paper.
- de Haan and Ferreira (2006) de Haan, L. and A. Ferreira (2006): Extreme Value Theory: An Introduction, Springer.
- Drees (1998a) Drees, H. (1998a): “A general class of estimators of the extreme value index,” Journal of Statistical Planning and Inference, 66, 95–112.
- Drees (1998b) ——— (1998b): “On smooth statistical tail functionals,” Scandinavian Journal of Statistics, 25, 187–210.
- Drees (2001) ——— (2001): “Minimax risk bounds in extreme value theory,” Annals of Statistics, 29, 266–294.
- Drees and Kaufmann (1998) Drees, H. and E. Kaufmann (1998): “Selecting the optimal sample fraction in univariate extreme value estimation,” Stochastic Processes and their Applications, 75, 149–172.
- Drees et al. (2000) Drees, H., S. I. Resnick, and L. de Haan (2000): “How to make a Hill plot,” Annals of Statistics, 28, 254–274.
- Fedotenkov (2020) Fedotenkov, I. (2020): “A review of more than one hundred Pareto-tail index estimators,” Statistica, 80, 245–299.
- Geluk and Peng (2000) Geluk, J. L. and L. Peng (2000): “An adaptive optimal estimate of the tail index for MA (l) time series,” Statistics & Probability Letters, 46, 217–227.
- Genovese and Wasserman (2008) Genovese, C. and L. Wasserman (2008): “Adaptive confidence bands,” Annals of Statistics, 36, 875–905.
- Gomes and Guillou (2015) Gomes, M. I. and A. Guillou (2015): “Extreme value theory and statistics of univariate extremes: A review,” International Statistical Review, 83, 263–292.
- Guillou and Hall (2001) Guillou, A. and P. Hall (2001): “A diagnostic for selecting the threshold in extreme value analysis,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63, 293–305.
- Haeusler and Segers (2007) Haeusler, E. and J. Segers (2007): “Assessing confidence intervals for the tail index by Edgeworth expansions for the Hill estimator,” Bernoulli, 13, 175–194.
- Hall and Welsh (1985) Hall, P. and A. H. Welsh (1985): “Adaptive estimates of parameters of regular variation,” Annals of Statistics, 13, 331–341.
- Hill (1975) Hill, B. M. (1975): “A simple general approach to inference about the tail of a distribution,” Annals of Statistics, 3, 1163–1174.
- Hoffmann and Nickl (2011) Hoffmann, M. and R. Nickl (2011): “On adaptive inference and confidence bands,” Annals of Statistics, 39, 2383–2409.
- Li (1989) Li, K.-C. (1989): “Honest confidence regions for nonparametric regression,” Annals of Statistics, 17, 1001–1008.
- Low (1997) Low, M. G. (1997): “On nonparametric confidence intervals,” Annals of Statistics, 25, 2547–2554.
- Lu and Peng (2002) Lu, J.-C. and L. Peng (2002): “Likelihood based confidence intervals for the tail index,” Extremes, 5, 337–352.
- Peng and Qi (2006) Peng, L. and Y. Qi (2006): “A new calibration method of constructing empirical likelihood-based confidence intervals for the tail index,” Australian & New Zealand Journal of Statistics, 48, 59–66.
- Reiss (1989) Reiss, R.-D. (1989): Approximate distribution of order statistics, Springer, New York.
- Resnick (2007) Resnick, S. I. (2007): Heavy-tail phenomena: probabilistic and statistical modeling, Springer Science & Business Media.
- Resnick and Stărică (1997) Resnick, S. I. and C. Stărică (1997): “Smoothing the Hill estimator,” Advances in Applied Probability, 29, 271–293.
- van de Geer et al. (2014) van de Geer, S., P. Bühlmann, Y. Ritov, and R. Dezeure (2014): “On asymptotically optimal confidence regions and tests for high-dimensional models,” Annals of Statistics, 42, 1166–1202.
- Weissman (1978) Weissman, I. (1978): “Estimation of parameters and large quantiles based on the k largest observations,” Journal of the American Statistical Association, 73, 812–815.
- Wu et al. (2021) Wu, Y., L. Wang, and H. Fu (2021): “Model-assisted uniformly honest inference for optimal treatment regimes in high dimension,” Journal of the American Statistical Association, forthcoming.
Supplementary Appendix
On Uniform Confidence Intervals for the Tail Index and the Extreme Quantile
Appendix A Proofs
A.1 Proof of Theorem 1
We need the following two auxiliary lemmas to prove Theorem 1. Throughout, suppose that the distributions and are absolutely continuous with their density functions denoted by and , respectively.
Lemma 1
We have
Proof of Lemma 1.
By the definitions of and , we can write
Hence, it follows that
The change of variables yields
where equality (i) follows from as , equality (ii) follows from as , equality (iii) follows from the assumptions that is uniformly bounded and square integrable for all , and equality (iv) follows from the fact that with (under Condition 1) and . ∎
Lemma 2
The Hellinger distance between and satisfies
(18)
Proof of Lemma 2.
First, Proposition 2.1 in Drees (2001) yields
(19)
where
Note by Drees (2001, p.286) that satisfies
(20)
(21)
Therefore, expanding the square in (19), we see that (18) follows once we establish
This follows from
where inequality (i) follows from the Cauchy-Schwarz inequality and equality (ii) follows from Lemma 1 and (21). ∎
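As a concrete instance of the Hellinger distance appearing in Lemma 2, consider the simplest parametric pair: two exact Pareto laws with $1-F(x)=x^{-1/\gamma}$ on $[1,\infty)$. Writing $\alpha=1/\gamma$, the affinity $\rho=\int\sqrt{f_0f_1}$ has the closed form $2\sqrt{\alpha_0\alpha_1}/(\alpha_0+\alpha_1)$. The Python sketch below is our own illustration, not part of the proof, and the helper names are hypothetical. It checks the closed form against a midpoint-rule quadrature after the substitution $x=e^s$. It uses the convention $H^2(P,Q)=\int(\sqrt p-\sqrt q)^2=2(1-\rho)$; some texts include a factor $1/2$, so constants may differ from those in (18).

```python
import numpy as np

def pareto_affinity(gamma0: float, gamma1: float) -> float:
    """Closed-form Hellinger affinity between exact Pareto laws 1 - F(x) = x^(-1/gamma), x >= 1."""
    a0, a1 = 1.0 / gamma0, 1.0 / gamma1
    return float(2.0 * np.sqrt(a0 * a1) / (a0 + a1))

def pareto_affinity_numeric(gamma0: float, gamma1: float,
                            s_max: float = 200.0, m: int = 200_000) -> float:
    """Midpoint-rule value of int_1^inf sqrt(f0(x) f1(x)) dx after substituting x = exp(s)."""
    a0, a1 = 1.0 / gamma0, 1.0 / gamma1
    ds = s_max / m
    s = (np.arange(m) + 0.5) * ds
    # log f(e^s) = log(a) - (a + 1) s and dx = e^s ds; stay on the log scale to avoid overflow.
    log_integrand = 0.5 * (np.log(a0) - (a0 + 1.0) * s + np.log(a1) - (a1 + 1.0) * s) + s
    return float(np.sum(np.exp(log_integrand)) * ds)

def hellinger_sq(gamma0: float, gamma1: float) -> float:
    """H^2 = int (sqrt(f0) - sqrt(f1))^2 = 2 (1 - affinity)."""
    return 2.0 * (1.0 - pareto_affinity(gamma0, gamma1))

print(pareto_affinity(0.5, 0.6), pareto_affinity_numeric(0.5, 0.6), hellinger_sq(0.5, 0.6))
```

The log-scale integrand keeps the quadrature stable even when the densities themselves underflow far in the tail.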
Proof of Theorem 1.
We first use Lemma 2 to translate the Hellinger distance between and into the -distance between and . Combining this with Equation (17) in Low (1997) yields
Next, let be arbitrary. By the definition of , there exists an and such that
Let for shorthand. Also, let and denote the corresponding density and probability measure, respectively. Since has coverage probability of at least uniformly over , it follows that, for any ,
and
Then, we have
Arguing as in equations (12)–(15) of Low (1997), we obtain
Inequality (6) now follows because was arbitrary. ∎
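The proof works with $n$ i.i.d. draws. The key mechanism is that two nearby tail distributions whose $n$-fold products remain hard to tell apart force any uniformly valid interval to cover both. A convenient fact behind arguments of this type is that the Hellinger affinity tensorizes: $\rho(P^n,Q^n)=\rho(P,Q)^n$. The sketch below illustrates this only in the exact Pareto submodel, which is our own simplifying choice; the helper names are hypothetical. With $\gamma_1=\gamma_0+h/\sqrt n$, the product affinity stays near $\exp\{-h^2/(8\gamma_0^2)\}$ instead of vanishing, so the two hypotheses remain statistically indistinguishable at that scale. The non-parametric family of Theorem 1 is richer, so the relevant separation there is not the parametric $1/\sqrt n$ scale shown here.

```python
import math

def pareto_affinity(gamma0: float, gamma1: float) -> float:
    """Hellinger affinity int sqrt(f0 f1) for exact Pareto laws 1 - F(x) = x^(-1/gamma), x >= 1."""
    a0, a1 = 1.0 / gamma0, 1.0 / gamma1
    return 2.0 * math.sqrt(a0 * a1) / (a0 + a1)

def product_affinity(gamma0: float, gamma1: float, n: int) -> float:
    """Affinity between n-fold products; tensorization gives rho(P^n, Q^n) = rho(P, Q)^n."""
    return math.exp(n * math.log(pareto_affinity(gamma0, gamma1)))  # log scale avoids underflow

def local_limit(gamma0: float, h: float) -> float:
    """Limit of product_affinity(gamma0, gamma0 + h / sqrt(n), n) as n grows."""
    return math.exp(-h ** 2 / (8.0 * gamma0 ** 2))

gamma0, h = 1.0, 1.0
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    local = product_affinity(gamma0, gamma0 + h / math.sqrt(n), n)  # shrinking alternative
    fixed = product_affinity(gamma0, gamma0 + 0.1, n)               # fixed alternative
    print(n, round(local, 4), fixed, round(local_limit(gamma0, h), 4))
```

For a fixed alternative the product affinity collapses to zero, so the two laws become perfectly distinguishable. Only alternatives shrinking at the right rate keep the affinity bounded away from zero.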
A.2 Proof of Theorem 2
Proof.
First, we approximate the empirical tail quantile function by the partial sum process of standard exponential random variables. Let for be i.i.d. standard exponential random variables and , and , . Let denote the joint distribution of i.i.d. draws from the distribution . By Reiss (1989, Theorem 5.4.3; see also Drees, 2001, eq. (5.12)), the variational distance between the distribution of under and the distribution of vanishes uniformly as , that is,
(22)
which implies that
(23)
uniformly for all and . Recall from Section 2.2 that characterizes Hill’s estimator. Hence, it suffices to approximate . To this end, we employ a strong approximation of with . Specifically, using Drees (2001, eq. (5.13)), we obtain
(24)
for all , where denotes the standard Wiener process.
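The reduction to partial sums of standard exponentials has an exact finite-sample counterpart for the exact Pareto law: the classical Rényi representation. If $S_j=E_1+\dots+E_j$ with $E_l$ i.i.d. standard exponential, the uniform order statistics are jointly distributed as $S_1/S_{n+1}<\dots<S_n/S_{n+1}$. Hence the top order statistics of an exact Pareto sample with tail index $\gamma$ can be written as $X_{n-i+1,n}=(S_i/S_{n+1})^{-\gamma}$, and Hill's estimator depends only on $S_1,\dots,S_{k+1}$. This mirrors, in an exact special case, the partial-sum approximation used in this step. The sketch below works under these exact-Pareto assumptions, and its helper names are hypothetical. It compares Hill's estimator computed from simulated data with the same estimator computed from exponential partial sums; in both cases its mean is $\gamma$ and its variance is $\gamma^2/k$.

```python
import numpy as np

def hill(x: np.ndarray, k: int) -> float:
    """Hill's estimator from the k largest observations of x."""
    xs = np.sort(x)[::-1]                                  # xs[j] = X_{n-j,n}
    return float(np.mean(np.log(xs[:k])) - np.log(xs[k]))

def hill_via_partial_sums(e: np.ndarray, k: int, gamma: float) -> float:
    """Hill's estimator for an exact Pareto sample written as X_{n-i+1,n} = (S_i / S_{n+1})^(-gamma).

    S_{n+1} cancels, so only the partial sums S_1, ..., S_{k+1} of the exponentials e are needed.
    """
    s = np.cumsum(e[:k + 1])                               # s[j] = S_{j+1}
    return float(gamma * np.mean(np.log(s[k] / s[:k])))

rng = np.random.default_rng(0)
gamma, n, k, reps = 0.5, 2_000, 200, 2_000
direct = [hill((1.0 - rng.uniform(size=n)) ** (-gamma), k) for _ in range(reps)]
renyi = [hill_via_partial_sums(rng.exponential(size=n + 1), k, gamma) for _ in range(reps)]
print(np.mean(direct), np.mean(renyi))   # both near gamma
print(np.var(direct), np.var(renyi))     # both near gamma ** 2 / k
```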
In the second step, we exploit two properties of the functional . Since is scale invariant, i.e., for any constant , we have
Moreover, is Hadamard differentiable at (cf. Drees, 1998b, Condition 3) in the sense that
(25)
uniformly for all functions with and . To derive an explicit expression for , we write
where
Following the derivation in Drees (1998a, p.104), we obtain
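Both properties used in this step can be checked numerically once the functional is written out. The appendix does not restate its definition here. The sketch below therefore assumes the usual way of writing Hill's estimator as a functional of the tail quantile function, $T(z)=\int_0^1\log\{z(t)/z(1)\}\,dt$; plugging in the empirical tail quantile function $z(t)=X_{n-\lfloor kt\rfloor,n}$ returns Hill's estimator exactly. Under that assumption, $T(cz)=T(z)$ for every $c>0$, $T(z_\gamma)=\gamma$ at $z_\gamma(t)=t^{-\gamma}$, and the derivative at $z_\gamma$ in direction $y$ is $T'(y)=\int_0^1 t^{\gamma}y(t)\,dt-y(1)$. The code, whose helper names are hypothetical, compares this formula with a central finite difference. It is a sanity check of the assumed form, not a substitute for the Hadamard differentiability argument cited above.

```python
import numpy as np

# Quadrature on (0, 1] through t = exp(-u): int_0^1 g(t) dt = int_0^inf g(exp(-u)) exp(-u) du,
# truncated at U_MAX and evaluated with the midpoint rule on M cells.
U_MAX, M = 50.0, 200_000
_U = (np.arange(M) + 0.5) * (U_MAX / M)
_T = np.exp(-_U)
_W = _T * (U_MAX / M)

def hill_functional(z) -> float:
    """Assumed form T(z) = int_0^1 log(z(t) / z(1)) dt for a positive function z on (0, 1]."""
    return float(np.sum(np.log(z(_T) / z(1.0)) * _W))

def hill_derivative(y, gamma: float) -> float:
    """Derivative of T at z_gamma(t) = t^(-gamma) in direction y: int_0^1 t^gamma y(t) dt - y(1)."""
    return float(np.sum(_T ** gamma * y(_T) * _W) - y(1.0))

gamma, eps = 0.5, 1e-4
z = lambda t: t ** (-gamma)    # tail quantile function of an exact Pareto law
y = lambda t: np.sqrt(t)       # a bounded direction of perturbation
finite_diff = (hill_functional(lambda t: z(t) + eps * y(t))
               - hill_functional(lambda t: z(t) - eps * y(t))) / (2 * eps)
print(hill_functional(z), hill_functional(lambda t: 3.0 * z(t)))  # both close to gamma
print(finite_diff, hill_derivative(y, gamma))                     # should agree closely
```

Scale invariance also shows up in the derivative: perturbing in the direction $y=z_\gamma$ itself gives $T'(z_\gamma)=0$.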
In the final step, we substitute , , and
By (24), (25), and the functional delta method, we have
(26)
where . Using the definition of , we obtain
(27)
and
(28)
where we used the equality . Thus, in view of and combining (26)–(28), we find
where is uniform for all and . We conclude from (23) that
This implies that
where is uniform for all and . This completes the proof. ∎
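Theorem 2 approximates Hill's estimator, uniformly over the family, by a centred Gaussian term plus a bias term. A small Monte Carlo sketch shows why the bias term matters for inference; the design is our own, not the paper's simulation study, and the helper names are hypothetical. It considers the naive Wald interval $\hat\gamma\pm1.96\,\hat\gamma/\sqrt k$, which uses the standard asymptotic variance $\gamma^2/k$ of Hill's estimator and ignores bias. The interval is evaluated under two laws with the same tail index $\gamma=1/2$. The first is the exact Pareto law $1-F(x)=x^{-2}$, for which Hill's estimator is unbiased. The second is the shifted Pareto law $1-F(x)=(1+x)^{-2}$, which has a non-negligible second-order term at $k/n=0.2$.

```python
import numpy as np

def hill(x: np.ndarray, k: int) -> float:
    """Hill's estimator from the k largest observations of x."""
    xs = np.sort(x)[::-1]
    return float(np.mean(np.log(xs[:k])) - np.log(xs[k]))

def exact_pareto(rng: np.random.Generator, n: int) -> np.ndarray:
    return (1.0 - rng.uniform(size=n)) ** (-0.5)          # 1 - F(x) = x^(-2), gamma = 1/2

def shifted_pareto(rng: np.random.Generator, n: int) -> np.ndarray:
    return (1.0 - rng.uniform(size=n)) ** (-0.5) - 1.0    # 1 - F(x) = (1 + x)^(-2), gamma = 1/2

def wald_coverage(sampler, gamma: float, n: int, k: int, reps: int, seed: int = 0) -> float:
    """Share of naive intervals gamma_hat +/- 1.96 * gamma_hat / sqrt(k) that contain gamma."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        g = hill(sampler(rng, n), k)
        half = 1.959964 * g / np.sqrt(k)
        hits += int(g - half <= gamma <= g + half)
    return hits / reps

cov_pareto = wald_coverage(exact_pareto, 0.5, n=2_000, k=400, reps=500)
cov_shifted = wald_coverage(shifted_pareto, 0.5, n=2_000, k=400, reps=500)
print(cov_pareto, cov_shifted)
```

With $k=400$ out of $n=2000$, the first coverage should be close to the nominal 95%, while the second falls far below it. Incorporating the worst-case bias into the interval is meant to prevent exactly this failure.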
A.3 Proof of Theorem 3
Proof.
We use the notation . Condition 2 guarantees that . Using the tail quantile process (24) with , we obtain
We aim to establish
(29)
(30)
(31)
(32)
where the and terms are all uniform over both and .
First, Condition 1 yields that . Thus, (29) follows from
where equality (i) follows from (4), equality (ii) follows from a change of variables, inequality (iii) follows from for , and equality (iv) follows from and for any sequence .
Next, consider . Note that uniformly for all and by Theorem 2 and Condition 2. Then, using for any generic sequence , we have that
uniformly for all and . This yields (30).
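The extreme quantile estimator covered by Theorem 3 is not restated in this appendix. The minimal sketch below assumes it is the Weissman (1978) extrapolation built on Hill's estimator, $\hat x_p=X_{n-k,n}\,(k/(np))^{\hat\gamma}$, the standard Hill-based estimator of the quantile exceeded with small probability $p$; the helper name is hypothetical. For the exact Pareto law with $\gamma=1/2$, the target $x_p=p^{-\gamma}$ equals 100 at $p=10^{-4}$. This level lies at the edge of a sample of size $10^4$, so the estimate relies on extrapolation through $\hat\gamma$.

```python
import numpy as np

def weissman_quantile(x: np.ndarray, k: int, p: float) -> float:
    """Hypothetical helper: Weissman-type estimate of the quantile exceeded with probability p."""
    xs = np.sort(x)[::-1]                                  # xs[j] = X_{n-j,n}
    gamma_hat = np.mean(np.log(xs[:k])) - np.log(xs[k])    # Hill's estimator from the k largest
    return float(xs[k] * (k / (len(x) * p)) ** gamma_hat)  # extrapolate from X_{n-k,n}

rng = np.random.default_rng(2)
gamma, n, k, p = 0.5, 10_000, 500, 1e-4
true_q = p ** (-gamma)                                      # 1 - F(x) = x^(-1/gamma) gives x_p = p^(-gamma)
estimates = [weissman_quantile((1.0 - rng.uniform(size=n)) ** (-gamma), k, p) for _ in range(200)]
print(true_q, np.median(estimates))
```

The factor $(k/(np))^{\hat\gamma}$ amplifies the sampling error in $\hat\gamma$ by $\log\{k/(np)\}$ on the log scale. This is why bias in $\hat\gamma$ carries over to the quantile.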
Appendix B Choice of
In this appendix, we present, for completeness and for the convenience of readers, a data-driven rule for choosing that follows Guillou and Hall (2001). For other methods of choosing the tail threshold, see, for example, Drees and Kaufmann (1998), Geluk and Peng (2000), Danielsson et al. (2001), and Danielsson et al. (2016); Resnick (2007, Chapter 4.4) provides a review.
For notational simplicity, we use the shorthand notation in this section. Define for . If is exactly Pareto distributed with exponent , then are i.i.d. exponentially distributed and . Given this observation, we construct antisymmetric weights such that and . Then, the statistic
has zero mean and unit variance provided that is exactly Pareto and .
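The construction can be checked by simulation. The sketch below assumes the scaled log-spacings $U_i=i\log(X_{n-i+1,n}/X_{n-i,n})$, $i=1,\dots,k$, the usual starting point of this construction. For an exact Pareto law these are i.i.d. exponential with mean $\gamma$, and their average is exactly Hill's estimator. For the antisymmetric weights, the sketch uses the hypothetical linear choice $w_i\propto(k+1)/2-i$, scaled so that $\sum_iw_i=0$ and $\sum_iw_i^2=1$, and it standardizes $\sum_iw_iU_i$ by Hill's estimator. Under exact Pareto, the resulting statistic has mean zero and variance $k/(k+1)$, which is essentially one. Helper names are our own.

```python
import numpy as np

def scaled_spacings(x: np.ndarray, k: int) -> np.ndarray:
    """U_i = i * log(X_{n-i+1,n} / X_{n-i,n}), i = 1, ..., k (requires k < len(x))."""
    xs = np.sort(x)[::-1]                      # xs[i - 1] = X_{n-i+1,n}
    i = np.arange(1, k + 1)
    return i * np.log(xs[:k] / xs[1:k + 1])

def antisymmetric_weights(k: int) -> np.ndarray:
    """Hypothetical weights proportional to (k + 1)/2 - i: w_{k+1-i} = -w_i, sum 0, sum of squares 1."""
    c = (k + 1) / 2.0 - np.arange(1, k + 1)
    return c / np.sqrt(np.sum(c ** 2))

def t_statistic(x: np.ndarray, k: int) -> float:
    """Weighted spacings standardized by their mean, which equals Hill's estimator."""
    u = scaled_spacings(x, k)
    return float(np.dot(antisymmetric_weights(k), u) / np.mean(u))

rng = np.random.default_rng(1)
gamma, n, k = 0.5, 5_000, 100
x = (1.0 - rng.uniform(size=n)) ** (-gamma)    # exact Pareto, 1 - F(x) = x^(-1/gamma)
xs = np.sort(x)[::-1]
hill = np.mean(np.log(xs[:k])) - np.log(xs[k])
print(np.mean(scaled_spacings(x, k)), hill)    # equal up to rounding (telescoping sum)
draws = [t_statistic((1.0 - rng.uniform(size=n)) ** (-gamma), k) for _ in range(2_000)]
print(np.mean(draws), np.var(draws))           # near 0 and k / (k + 1)
```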
To measure departures from this approximation, we define the following criterion based on a moving average of :
where equals the integer part of . Intuitively, the larger is, the larger the bias in the Pareto tail approximation, and hence the more exceeds one. To obtain an implementable rule, we follow Guillou and Hall (2001) in using and choose the smallest that satisfies for all and some pre-specified constant . Again following Guillou and Hall (2001), we set . For ease of reference, we state the rule explicitly:
(33)
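The full selection rule can be sketched as follows, with several assumptions made explicit. The sketch assumes the scaled spacings $U_i=i\log(X_{n-i+1,n}/X_{n-i,n})$, linear antisymmetric weights $w_i\propto(k+1)/2-i$, and standardization by Hill's estimator. It reads the rule as choosing the smallest number $k$ of upper order statistics such that $Q(s)>c$ for every admissible $s\ge k$, with $Q(k)$ the root mean square of $T$ over the window $k-\lfloor k/2\rfloor,\dots,k+\lfloor k/2\rfloor$. The constant $c=1.25$ in the usage line is illustrative only; (33) fixes the value actually used. Function names are hypothetical.

```python
from typing import Optional

import numpy as np

def t_path(x: np.ndarray, k_max: int) -> np.ndarray:
    """T(k) for k = 2, ..., k_max from a single sort; entries 0 and 1 are NaN."""
    xs = np.sort(x)[::-1]
    u = np.arange(1, k_max + 1) * np.log(xs[:k_max] / xs[1:k_max + 1])  # scaled spacings U_i
    t = np.full(k_max + 1, np.nan)
    for k in range(2, k_max + 1):
        c = (k + 1) / 2.0 - np.arange(1, k + 1)                  # antisymmetric, sums to zero
        t[k] = np.dot(c, u[:k]) / np.sqrt(np.sum(c ** 2)) / np.mean(u[:k])
    return t

def q_criterion(t: np.ndarray, k: int) -> float:
    """Root mean square of T(k + j) over |j| <= floor(k / 2)."""
    m = k // 2
    return float(np.sqrt(np.mean(t[k - m:k + m + 1] ** 2)))

def select_k(x: np.ndarray, c: float, k_max: Optional[int] = None) -> Optional[int]:
    """Smallest k with Q(s) > c for every admissible s >= k; None if Q ends at or below c."""
    k_max = len(x) - 1 if k_max is None else k_max
    ks = [k for k in range(3, k_max + 1) if k + k // 2 <= k_max]  # windows stay inside [2, k_max]
    t = t_path(x, k_max)
    above = [q_criterion(t, k) > c for k in ks]
    if not above or not above[-1]:
        return None
    i = len(above) - 1
    while i > 0 and above[i - 1]:
        i -= 1
    return ks[i]

rng = np.random.default_rng(3)
x = (1.0 - rng.uniform(size=2_000)) ** (-0.5) - 1.0   # shifted Pareto: 1 - F(x) = (1 + x)^(-2)
print(select_k(x, c=1.25, k_max=1_000))
```

Computing all $T(k)$ from one sorted sample keeps the cost at one sort plus $O(k_{\max}^2)$ arithmetic. The window restriction ensures that every $Q(k)$ averages over the same number $2\lfloor k/2\rfloor+1$ of terms.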