Quantile index regression
Abstract
Estimating the structures at high or low quantiles has become an important subject and has attracted increasing attention across numerous fields. However, due to data sparsity at the tails, it is usually challenging to obtain reliable estimates, especially for high-dimensional data. This paper suggests a flexible parametric structure for the tails, which enables us to conduct the estimation at quantile levels with rich observations and then extrapolate the fitted structures to the far tails. The proposed model depends on some quantile indices and is hence called the quantile index regression. Moreover, the composite quantile regression method is employed to obtain non-crossing quantile estimators, and this paper further establishes their theoretical properties, including asymptotic normality for the case with low-dimensional covariates and non-asymptotic error bounds for the case with high-dimensional covariates. Simulation studies and an empirical example are presented to illustrate the usefulness of the new model.
Keywords: Asymptotic normality; High-dimensional analysis; Non-asymptotic property; Partially parametric model; Quantile regression.
1 Introduction
Quantile regression, proposed by Koenker and Bassett, (1978), has been widely used across various fields such as biological science, ecology, economics, finance, and machine learning; see, e.g., Cade and Noon, (2003), Yu et al., (2003), Meinshausen and Ridgeway, (2006), Linton and Xiao, (2017) and Koenker, (2017). More references on quantile regression can be found in the books of Koenker, (2005) and Davino et al., (2014). Quantile regression has also been studied for high-dimensional data; see, e.g., Belloni and Chernozhukov, (2011), Wang et al., 2012b and Zheng et al., (2015). On the other hand, due to practical needs, estimating the structures at high or low quantiles has become an increasingly popular subject, with examples including the risk of large losses for investments in finance (Kuester et al., 2006; Zheng et al., 2018), high tropical cyclone intensity and extreme waves in climatology (Elsner et al., 2008; Jagger and Elsner, 2008; Lobeto et al., 2021), and low infant birth weights in medicine (Abrevaya, 2001; Chernozhukov et al., 2020). It is hence natural to make inference at these extreme quantiles for high-dimensional data, although this remains an open problem.
There are two types of approaches in the literature to model the structures at tails. The first one is based on the conditional distribution function (CDF) of the response for a given set of covariates, and it is usually assumed to have a semiparametric structure at the tails; see, e.g., Pareto-type structures in Beirlant and Goegebeur, (2004) and Wang and Tsai, (2009). However, this method cannot provide conditional quantiles in explicit forms. Later, Noufaily and Jones, (2013) considered a fully parametric form, the generalized gamma distribution, for the CDF and then inverted the fitted distribution into conditional quantiles. However, as indicated in Racine and Li, (2017), indirect inverse-CDF-based estimators may not be efficient in tail regions when the data have unbounded support.
The second approach is extremal quantile regression, which combines quantile regression with extreme value theory to estimate the conditional quantile at a very high or low level of , which satisfies with being the sample size; see Chernozhukov, (2005). Specifically, this is a two-stage approach: (i) performing the estimation at intermediate quantiles with ; and (ii) extrapolating the fitted quantile structures to extreme quantiles by assuming the extreme value index that is associated with the tails of conditional distributions; see Wang et al., 2012a and Wang and Li, (2013) for details. The key to this method is to make use of the feasible estimation at intermediate levels, since there are relatively more observations around these levels. However, intermediate quantiles are also at the far tails, and the corresponding data points may still not be rich enough for the case with many covariates.
In order to handle the case with high-dimensional covariates, along the lines of extremal quantile regression, this paper suggests conducting estimation at quantile levels with much richer observations, say some fixed levels around , and then extrapolating the estimated results to extreme quantiles by fully or partially assuming a form of the conditional distribution or quantile function on . Note that there exist many quantile functions with explicit forms, such as the generalized lambda and Burr XII distributions (Gilchrist, 2000). In particular, the generalized lambda distribution can provide a very accurate approximation to some Pareto-type and extreme value distributions, as well as some commonly used distributions such as the Gaussian distribution (Vasicek, 1976; Gilchrist, 2000). These flexible parametric forms can be assumed for the quantile function on , and the drawback of inverting a distribution function can hence be avoided.
Specifically, for a predetermined interval , the quantile function of the response is assumed to have an explicit form of for each level , up to unknown parameters or indices . By further letting be a function of covariates , we can then define the conditional quantile function as follows:
(1.1) |
where is the distribution of conditional on , and is a -dimensional parametric function. Note that can be referred to as indices, and model (1.1) can then be called the quantile index regression (QIR) for simplicity. In practice, to handle high quantiles, we may take with a fixed value of and then conduct a composite quantile regression (CQR) estimation for model (1.1) at levels within but with richer observations. Subsequently, the fitted QIR model can be used to predict extreme quantiles. More importantly, since the estimation is conducted at fixed quantile levels, there is no difficulty in handling the case with high-dimensional covariates. In addition, compared with the aforementioned two types of approaches in the literature, the proposed method can not only estimate quantile regression functions effectively, but also forecast extreme quantiles directly. Finally, the QIR model naturally yields non-crossing quantile regression estimators since its quantile function is nondecreasing with respect to .
The proposed model is introduced in detail in Section 2, and the three main contributions can be summarized below:
(a) When conducting the CQR estimation, we encounter the first challenge of model identification, and this problem has been carefully studied for the flexible Tukey lambda distribution in Section 2.2.
(b) Section 2.2 also derives the asymptotic normality of CQR estimators for the case with low-dimensional covariates. This is a challenging task since the corresponding objective function is non-convex and non-differentiable, and we overcome the difficulty by adopting the bracketing method in Pollard, (1985).
(c) Section 2.3 establishes non-asymptotic properties of a regularized high-dimensional estimation. This is also nontrivial due to the problem noted in (b).
The rest of this paper is organized as follows. Section 3 discusses some implementation issues in searching for these estimators. Numerical studies, including simulation experiments and a real data analysis, are given in Sections 4 and 5, and Section 6 provides a short conclusion and discussion. All technical details are relegated to the Appendix.
For the sake of convenience, this paper denotes vectors and matrices by boldface letters, e.g., and , and denotes scalars by regular letters, e.g., and . In addition, for any two real-valued sequences and , denote (or ) if there exists a constant such that (or ) for all , and denote if and . For a generic vector and matrix , let , and represent the Euclidean norm, -norm and Frobenius norm, respectively.
2 Quantile index regression
2.1 Quantile index regression model
Consider a response and a -dimensional vector of covariates . We then rewrite the quantile function of conditional on at (1.1) with an explicit form of below,
(2.1) |
where is an interval or the union of multiple disjoint intervals, , , , , the link functions s are monotonic for , and the intercept term can be included by letting . We call model (2.1) the quantile index regression (QIR) for simplicity, and two examples of are first introduced below to illustrate the new model.
Example 1.
Consider the location shift model, , for all , where is a fixed level, is the location index and is the standard normal quantile function. Under the identity link function, , we can construct a linear quantile regression model at . Then, after estimation, we can make a prediction at any level of .
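To make the extrapolation idea in Example 1 concrete, the sketch below fits the location index by minimizing the check loss at a single data-rich level and then predicts an extreme quantile with the same fitted coefficients. It is a minimal illustration with simulated data and hypothetical coefficient values, not the estimation procedure developed later in the paper.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta_true = np.array([1.0, 2.0, -1.0])          # hypothetical coefficients
y = X @ beta_true + rng.standard_normal(n)      # location shift model with standard normal errors

tau0 = 0.5                                      # data-rich level used for fitting
check = lambda u, tau: np.mean(u * (tau - (u < 0)))
beta_hat = minimize(lambda b: check(y - X @ b, tau0),
                    x0=np.zeros(p), method="Nelder-Mead").x

# Extrapolate: predicted 0.995-quantile at a new covariate value x0
x0 = np.array([1.0, 0.3, -0.2])
q_995 = x0 @ beta_hat + norm.ppf(0.995)
print(q_995)
```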
Example 2.
Consider the Tukey lambda distribution (Vasicek,, 1976) that is defined by its quantile function,
where , , and are the location, scale and tail indices, respectively. Due to its flexibility, the Tukey lambda distribution can provide an accurate approximation to many commonly used distributions, such as the normal, logistic, Weibull, uniform, and Cauchy distributions; see Gilchrist, (2000). Model (2.1) is therefore expected to perform well when combined with this distribution.
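As a point of reference, the snippet below evaluates the standard three-parameter Tukey lambda quantile function, assuming the common form Q(τ) = μ + σ(τ^λ − (1 − τ)^λ)/λ with the logistic-type limit log{τ/(1 − τ)} as λ → 0; the exact parameterization adopted in the paper may differ in minor details.

```python
import numpy as np

def tukey_lambda_quantile(tau, mu, sigma, lam):
    """Three-parameter Tukey lambda quantile function: location mu, scale sigma, tail index lam."""
    tau = np.asarray(tau, dtype=float)
    if lam == 0.0:
        core = np.log(tau / (1.0 - tau))              # limiting logistic-type case
    else:
        core = (tau**lam - (1.0 - tau)**lam) / lam
    return mu + sigma * core

# Smaller (more negative) lam gives heavier tails; lam near 0.14 is close to a normal shape.
taus = np.array([0.5, 0.9, 0.99, 0.995])
print(tukey_lambda_quantile(taus, mu=0.0, sigma=1.0, lam=0.14))
print(tukey_lambda_quantile(taus, mu=0.0, sigma=1.0, lam=-1.0))   # roughly Cauchy-like tails
```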
In the literature, there exist many quantile functions with explicit forms, such as the generalized lambda and Burr XII distributions (Gilchrist, 2000; Fournier et al., 2007). For example, the generalized lambda distribution has the form of
where the indices and control the right and left tails, respectively. Note that it reduces to the Tukey lambda distribution when . This indicates that the generalized lambda distribution can be considered if we focus on quantiles over the full range, i.e. . On the other hand, the Tukey lambda distribution may be a better choice if our interest is in quantiles on one side only, i.e. or .
2.2 Low-dimensional composite quantile regression estimation
Denote the observed data by , which are independent and identically distributed (i.i.d.) samples of the random vector , where is the response, contains covariates, and is the number of observations.
Let be fixed quantile levels, where for all . To achieve higher efficiency, we consider the composite quantile regression (CQR) estimator below.
(2.2) |
where ρ_τ(u) = u{τ − I(u < 0)} is the quantile check function; see Zou and Yuan, (2008) and Kai et al., (2010). Note that has the non-crossing property with respect to since, for each , is a non-decreasing quantile function.
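For concreteness, a minimal sketch of the composite objective at (2.2) is given below; the model-implied conditional quantile is left as a user-supplied function, and whether one averages or sums over levels and observations does not change the minimizer.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0).astype(float))

def cqr_loss(theta, y, X, cond_quantile, taus):
    """Composite quantile check loss over levels tau_1, ..., tau_K.

    cond_quantile(tau, X, theta) is a stand-in for the model-implied
    conditional tau-quantile Q(tau, g(X, theta)) in model (2.1).
    """
    total = 0.0
    for tau in taus:
        residual = y - cond_quantile(tau, X, theta)
        total += np.mean(check_loss(residual, tau))
    return total / len(taus)
```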
To study the asymptotic properties of , we first consider , where is the population loss function, and it may not be unique, i.e. the CQR estimation at (2.2) may suffer from an identification problem. For the sake of illustration, let us consider the case without covariates , i.e. we estimate with a sequence of quantile levels and . To this end, it is required that two different values of cannot yield the same quantile function across all levels. In other words, if there exists that yields for all quantiles, then and are not identifiable. In sum, to guarantee that is the unique minimizer of the population loss, we make the following assumption on the quantile function .
Assumption 1.
For any two index vectors , there exists at least one such that .
Intuitively, for any quantile function , one can always increase the number of quantile levels to make Assumption 1 hold. However, this may also depend on the structure of the quantile function and the number of indices. For the sake of illustration, we state the sufficient and necessary condition under which the Tukey lambda distribution satisfies Assumption 1.
Lemma 1.
Assumption 1, together with an additional assumption on covariates , allows us to show that is the unique minimizer of ; see the following theorem, which is critical to establish the asymptotic properties of .
Theorem 1.
Suppose that is finite and positive definite. If Assumption 1 holds, then is the unique minimizer of .
To demonstrate the consistency of given below, we assume that the parameter space is compact, and the true parameter vector is an interior point of .
Theorem 2.
Suppose that the moment condition holds. If the conditions of Theorem 1 hold, then in probability as .
The moment condition assumed in the above theorem allows us to adopt the uniform consistency theorem of Andrews, (1987) in our technical proofs. To show the asymptotic distribution of , we introduce two additional assumptions given below.
Assumption 2.
For all ,
Assumption 3.
The conditional density is bounded and continuous uniformly for all .
Assumption 2 is used to prove Lemma A.2 in the Appendix; see also Zhu and Ling, (2011). Assumption 3 is commonly used in the literature on quantile regression (Koenker, 2005; Belloni et al., 2019a), and it can be relaxed at the cost of more complicated and lengthy technical details (Kato et al., 2012; Chernozhukov et al., 2015; Galvao and Kato, 2016).
Denote
and
Theorem 3.
Note that the objective function is non-convex and non-differentiable, and this makes it challenging to establish the asymptotic normality of . We overcome the difficulty by making use of the bracketing method in Pollard, (1985). Moreover, to estimate the asymptotic variance in Theorem 3, we first apply the nonparametric method in Hendricks and Koenker, (1991) to estimate the quantities of with , and then the matrices and can be approximated by sample averaging.
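One common way to implement the Hendricks and Koenker, (1991) idea is a difference quotient of fitted conditional quantiles; the sketch below is a hedged illustration of that construction, with the bandwidth h and the plug-in quantile estimates treated as user-supplied choices rather than the paper's exact recipe.

```python
import numpy as np

def conditional_density_at_quantile(qhat, tau, h):
    """Difference-quotient estimate of the conditional density evaluated at the
    tau-th conditional quantile: roughly 2h / (qhat(tau + h) - qhat(tau - h)).

    qhat(t) should return the fitted conditional quantile at level t for the
    covariate value of interest, e.g. from the fitted QIR model.
    """
    spacing = qhat(tau + h) - qhat(tau - h)
    return 2.0 * h / np.maximum(spacing, 1e-10)   # guard against near-zero spacing
```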
In addition, based on the estimator , one can use to further predict the conditional quantile at any level , and the corresponding theoretical justification can be established by directly applying the delta method (van der Vaart, 1998, Chapter 3).
Corollary 1.
Suppose that the conditions of Theorem 3 are satisfied. Then, for any ,
in distribution as , where .
2.3 High-dimensional regularized estimation
This subsection considers the case with high-dimensional covariates, i.e., , and the true parameter vector is assumed to be -sparse, i.e. the number of nonzero elements in is no more than . A regularized CQR estimation can then be introduced,
(2.3) |
where is given in Theorem 4, is a penalty function, and it depends on a tuning (regularization) parameter with .
Consider the loss function defined in (2.2), which is usually nonconvex with respect to . As a result, will be nonconvex even though the check loss is convex, and there is no additional harm in using nonconvex penalty functions. Specifically, we consider the component-wise penalization,
where is possibly nonconvex and satisfies the following assumption.
Assumption 4.
The univariate function satisfies the following conditions: (i) it is symmetric around zero with ; (ii) it is nondecreasing on the nonnegative real line; (iii) the function is nonincreasing with respect to ; (iv) it is differentiable for all and subdifferentiable at , with and being a constant; (v) there exists such that is convex.
The above is the -amenable assumption given in Loh and Wainwright, (2015) and Loh, (2017), and it requires the penalty function to be not too far from convexity. Note that popular penalty functions, including SCAD (Fan and Li, 2001) and MCP (Zhang, 2010), satisfy the above properties.
In the literature on nonconvex penalized quantile regression, Jiang et al., (2012) studied nonlinear quantile regressions with the SCAD regularizer from the asymptotic viewpoint, but it can only handle the case with . Wang et al., 2012b and Sherwood et al., (2016) considered the case where grows exponentially with , and their proving techniques heavily rely on the condition that the loss function can be represented as a difference of two convex functions. However, does not meet this requirement since the quantile function can be nonconvex.
On the other hand, non-asymptotic properties have recently attracted considerable attention in the theory of high-dimensional analysis; see, e.g., Belloni and Chernozhukov, (2011); Sivakumar and Banerjee, (2017); Pan and Zhou, (2021). This subsection attempts to study them for our proposed quantile estimators, which is a nontrivial task since existing results focus only on linear quantile regression. Loh and Wainwright, (2015) and Loh, (2017) studied the non-asymptotic properties of M-estimators with both nonconvex losses and regularizers, but they required the loss function to be twice differentiable. The technical proofs in the Appendix follow the framework in Loh and Wainwright, (2015) and Loh, (2017), and some new techniques are developed to tackle the nondifferentiability of the quantile check function.
Let with s being link functions and , and we can then denote . Moreover, by letting for and , we can further denote .
Assumption 5.
Quantile function is differentiable with respect to , and there exist two positive constants and such that and .
The differentiability assumption on quantile functions allows us to use the Lipschitz property and the multivariate contraction theorem. The boundedness of covariates ensures that the bounded differences inequality can be applied, and it can be relaxed with more complicated and lengthy technical details (Wang and He, 2021).
Denote by the Euclidean ball centered at with radius , and let be the smallest eigenvalue of matrix
Assumption 6.
There exists a fixed such that , and assume that .
The above assumption guarantees that the population loss is strongly convex around the true parameter vector . Specifically, let be the first-order Taylor expansion. Then, by Assumption 6, we have that for all such that ; see Lemma A.5 in the Appendix for details. We next obtain the non-asymptotic estimation bound of .
Theorem 4.
In practice, we can choose , and it then holds that , which has the standard rate of error bounds; see, e.g., Loh, (2017). Moreover, for the predicted conditional quantile of at any level , it can be readily verified that has the same convergence rate as . Finally, the above theorem requires the minimization at (2.3) to be conducted in , which is unknown but fixed. This enables us to solve the problem by using random initialization in the optimization algorithms.
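As a point of reference, the "standard rate" referred to here is usually stated, under the common tuning choice in this literature (Loh, 2017), as

λ ≍ √(log p / n)  implies  ‖θ̂ − θ*‖₂ = O_p( √(s log p / n) ),

which is a reconstruction of the standard statement rather than a verbatim restatement of Theorem 4.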
3 Implementation issues
3.1 Optimizing algorithms
This subsection provides algorithms to search for the CQR estimator at (2.2) and regularized estimator at (2.3).
For the CQR estimation without penalty at (2.2), we employ the commonly used gradient descent algorithm to search for estimators, and the th update is given by
where is from the th iteration, and is the step size. Note that the quantile check loss is nondifferentiable at zero, and in the above refers to a subgradient (Moon et al., 2021) instead. In practice, a step size that is too small will cause the algorithm to converge slowly, while one that is too large may cause it to diverge. We choose the step size by the backtracking line search (BLS) method, which is simple and effective; see Bertsekas, (2016). Specifically, the algorithm starts with a large step size and, at the th update, it is reduced by repeatedly multiplying it by a fraction until , where is another hyper-parameter. The simulation experiments in Section 4 work well with the setting of .
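The sketch below illustrates this scheme; the sufficient-decrease test and its constants (an Armijo-type condition with shrinkage factor 0.5 and slope constant 1e-4) are illustrative defaults rather than the settings used in Section 4, and loss and grad stand for the composite check loss at (2.2) and a subgradient of it.

```python
import numpy as np

def gradient_descent_bls(loss, grad, theta0, eta0=1.0, shrink=0.5, c=1e-4,
                         max_iter=500, tol=1e-8):
    """(Sub)gradient descent with backtracking line search (BLS)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        eta = eta0
        # Shrink the step size until an Armijo-type sufficient decrease holds.
        while loss(theta - eta * g) > loss(theta) - c * eta * np.dot(g, g):
            eta *= shrink
            if eta < 1e-12:
                break
        theta_new = theta - eta * g
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```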
For the regularized estimation at (2.3), we adopt the composite gradient descent algorithm (Loh and Wainwright, 2015), which is designed for nonconvex problems and fits our objective functions well. Consider the SCAD penalty, which satisfies Assumption 4 with and . We can then rewrite the optimization problem at (2.3) as
where, from Assumption 4, is convex. As a result, similar to the composite gradient descent algorithm in Loh and Wainwright, (2015), the th update can be calculated by
which has a closed-form solution of
with and , where the step size is chosen by the BLS method.
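A minimal sketch of one such update is given below, assuming the conventional SCAD shape parameter a = 3.7; grad_smooth stands for a (sub)gradient of the check loss plus the concave remainder of the SCAD penalty (its total minus the l1 part), and the side constraint used in the theory (restricting the iterates to a ball around the initialization) is omitted for brevity.

```python
import numpy as np

A_SCAD = 3.7  # conventional SCAD shape parameter; the paper's exact choice is not shown here

def scad_penalty(theta, lam, a=A_SCAD):
    """SCAD penalty applied component-wise and summed."""
    t = np.abs(theta)
    piece = np.where(
        t <= lam,
        lam * t,
        np.where(t <= a * lam,
                 (2.0 * a * lam * t - t**2 - lam**2) / (2.0 * (a - 1.0)),
                 lam**2 * (a + 1.0) / 2.0),
    )
    return piece.sum()

def soft_threshold(z, thresh):
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def composite_gradient_step(theta, grad_smooth, eta, lam):
    """One composite gradient update: a gradient step on the smooth part,
    followed by soft-thresholding for the remaining l1 term."""
    return soft_threshold(theta - eta * grad_smooth(theta), eta * lam)
```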
3.2 Hyper-parameter selection
There are two types of hyper-parameters in the penalized estimation at (2.3): the tuning parameter and quantile levels of with . We can employ validation methods to select the tuning parameter such that the composite quantile check loss is minimized.
The selection of the ’s is another important task since it will affect the efficiency of the resulting estimators. Suppose that we are interested in some high quantiles of with , and then the QIR model can be assumed on the interval , which contains all ’s. We may further choose a suitable interval of such that the ’s can be equally spaced on it, i.e. for , where can be set to . As a result, the selection of the ’s is equivalent to that of .
We may choose such that it is close to the ’s, while a reliable estimation can still be afforded at this level. The selection of is a trade-off between estimation efficiency and model misspecification; see Wang et al., 2012a and Wang and Tsai, (2009). On the one hand, to improve estimation efficiency, we may choose close to 0.5 since, for most real data, the richest observations appear in the middle. On the other hand, we then have to assume the parametric structure over the whole interval , i.e., more restrictions are imposed on the real example. The criterion of prediction errors (PEs) is hence introduced,
where we will choose with the minimum value of PEs; see also Wang et al., 2012a .
In practice, the cross-validation method can be used to select and simultaneously. Specifically, the composite quantile check loss and the PEs are both evaluated on validation sets. For each candidate interval of , the tuning parameter is selected according to the composite quantile check loss, and the corresponding value of the PE is also recorded. We then choose the value of that corresponds to the minimum value of the PEs among all candidate intervals.
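A schematic of this selection procedure is sketched below; fit, check_loss_valid and pe_valid are hypothetical stand-ins for the estimation routine, the validation composite check loss and the validation PE criterion, respectively.

```python
def select_tau_L_and_lambda(fit, check_loss_valid, pe_valid,
                            tauL_grid, lambda_grid, train, valid):
    """For each candidate tau_L, pick lambda by the validation composite check
    loss, record the corresponding PE, and return the (tau_L, lambda) pair
    whose PE is smallest over all candidate intervals."""
    best = None
    for tauL in tauL_grid:
        lam_star = min(lambda_grid,
                       key=lambda lam: check_loss_valid(fit(train, tauL, lam), valid))
        pe = pe_valid(fit(train, tauL, lam_star), valid)
        if best is None or pe < best[0]:
            best = (pe, tauL, lam_star)
    return best[1], best[2]
```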
4 Simulation Studies
4.1 Composite quantile regression estimation
This subsection conducts simulation experiments to evaluate the finite-sample performance of the low-dimensional composite quantile regression (CQR) estimation at (2.2).
The Tukey Lambda distribution in Example 2 is used to generate the sample,
(4.1) |
where the are independent and follow , , and is an i.i.d. sequence of 2-dimensional standard normal vectors. The true parameter vector is , and we set the location parameters , the scale parameters and the tail parameters . For the tail index , before generating the data, we first scale each covariate into the range of so that a relatively stable sample can be generated. In addition, , and are the inverses of the link functions. We choose the identity link for the location index and softplus-related links for the scale and tail indices, i.e., and , where is a smoothed version of and hence the name. Note that and . We consider three sample sizes of , 1000 and 2000, and there are 500 replications for each sample size.
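To make the data-generating mechanism concrete, the sketch below simulates from a model of the form (4.1) by plugging uniform random levels into the conditional Tukey lambda quantile function. The coefficient values and the exact softplus-based links are hypothetical placeholders, since the true settings are not reproduced here, and the additional rescaling of covariates used for the tail index in the paper is omitted.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def tukey_lambda_quantile(tau, mu, sigma, lam):
    return mu + sigma * (tau**lam - (1.0 - tau)**lam) / lam   # lam != 0 throughout

rng = np.random.default_rng(0)
n = 1000
X = rng.standard_normal((n, 2))               # 2-dimensional standard normal covariates
Z = np.column_stack([np.ones(n), X])          # add an intercept column

# Hypothetical index coefficients (placeholders, not the paper's true values).
beta_mu = np.array([1.0, 0.5, -0.5])
beta_sigma = np.array([0.2, 0.3, 0.3])
beta_lam = np.array([0.1, 0.5, 0.5])

mu = Z @ beta_mu                              # identity link for the location index
sigma = softplus(Z @ beta_sigma)              # softplus link keeps the scale positive
lam = -softplus(Z @ beta_lam)                 # one possible link producing negative tail indices

U = rng.uniform(size=n)                       # u_i ~ Uniform(0, 1)
y = tukey_lambda_quantile(U, mu, sigma, lam)  # responses from a model of the form (4.1)
```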
The algorithm for CQR estimation in Section 3 is applied with and the ’s being equally spaced over . We consider three quantile ranges of , and , and the estimation efficiency is first evaluated. Figure 1 gives the boxplots of the three fitted location parameters . It can be seen that both the bias and the standard deviation decrease as the sample size increases. Moreover, when decreases, quantile levels with richer observations are used for the estimation and, as expected, both the bias and the standard deviation decrease. Boxplots for the fitted scale and tail parameters show a similar pattern and hence are omitted to save space.
We next evaluate the prediction performance of at two quantile levels of interest, and 0.995. Consider two values of the covariates, and , whose corresponding tail indices are and , respectively. Note that the Tukey lambda distribution can provide a good approximation to the Cauchy and normal distributions when the tail indices are and , respectively, and it becomes more heavy-tailed as the tail index decreases (Freimer et al., 1988). The prediction error in terms of squared loss (PES), , is calculated for each replication, and the corresponding sample mean is the commonly used mean square error. Table 1 presents both the sample mean and the standard deviation of the PESs across 500 replications, as in Wang et al., 2012a. A clear trend of improvement can be observed as the sample size becomes larger, and the prediction is more accurate at the 0.991 quantile level in almost all cases.
4.2 High-dimensional regularized estimation
This subsection conducts simulation experiments to evaluate the finite-sample performance of the high-dimensional regularized estimation at (2.3).
For the data generating process at (4.1), we consider three dimensions of , and , and the true parameter vectors are extended from those in Section 4.1 by adding zeros, i.e. , and , which are vectors of length with non-zero entries. As a result, all true parameters make a vector of length with non-zero entries. The sample size is chosen such that with , 10, 20, 30, 40 and 50, where refers to the largest integer smaller than or equal to . All other settings are the same as in the previous subsection.
The algorithm for regularized estimation in Section 3 is used to search for the estimators, and we generate an independent validation set of size to select the tuning parameter by minimizing the composite quantile check loss; see also Wang et al., 2012b. Figure 2 gives the estimation errors of . It can be seen that is roughly proportional to the quantity of , and this confirms the convergence rate in Theorem 4. Moreover, the estimation errors approach zero as the sample size increases, and we can then conclude the consistency of . Finally, when increases, quantile levels with fewer observations are used in the estimation procedure, and hence larger estimation errors can be observed.
We next evaluate the prediction performance at quantile levels and , with the covariates taking the values and , similar to those in the previous subsection. Table 2 gives the mean square errors of the predicted conditional quantiles , as well as the sample standard deviations of the prediction errors in squared loss, with . It can be seen that a larger sample size leads to much smaller mean square errors. Moreover, when is larger, the prediction also becomes worse, which may be due to lower estimation efficiency. Finally, similar to the experiments in the previous subsection, the prediction at is more accurate in almost all cases. The results for the cases with and 150 are similar and hence omitted.
Finally, we consider the following criteria to evaluate the performance of variable selection: the average number of selected active coefficients (size), the percentage of replications where the active and inactive coefficients are both correctly selected simultaneously (), the percentage of active coefficients correctly selected (), the percentage of inactive coefficients correctly selected (), the false positive rate (FP), and the false negative rate (FN). Table 3 reports the selection results with and , 30 and 50. When is larger, both and decrease, indicating increased selection accuracy. In addition, the performance improves as the sample size gets larger. The results for and 150 are similar and hence omitted.
5 Application to Childhood Malnutrition
Childhood malnutrition is well known to be one of the most urgent problems in developing countries. The Demographic and Health Surveys (DHS) program has conducted nationally representative surveys on child and maternal health, family planning and child survival, among other topics, and this has resulted in many datasets for research purposes. The dataset for India was first analyzed by Koenker, (2011), and can be downloaded from http://www.econ.uiuc.edu/~roger/research/bandaids/bandaids.html. It has also been studied by many researchers (Fenske et al., 2011; Belloni et al., 2019b) for the childhood malnutrition problem in India, and quantile regression with low- or high-dimensional covariates was conducted at the levels of and . The proposed model enables us to consider much lower quantiles, corresponding to more severe childhood malnutrition.
The child’s height is chosen as the indicator of malnutrition, as in Belloni et al., 2019b. Specifically, the response is set to , and we then consider high quantiles to study the childhood malnutrition problem so that it is consistent with the previous sections. The other variables include seven continuous and 13 categorical ones, and they contain both biological and socioeconomic factors that are possibly related to childhood malnutrition. Examples of biological factors include the child’s age, gender, duration of breastfeeding in months, and the mother’s age and body-mass index (BMI), while socioeconomic factors include the mother’s employment status, religion, residence, and the availability of electricity. All seven continuous variables are standardized to have mean zero and variance one, and two-way interactions between all variables are also included. Moreover, we concentrate on the samples from poor families. As a result, there are covariates in total after removing variables with all elements being zero, and the sample size is . Denote the full model size by , where the components correspond to the sizes of the location, scale and tail parts, respectively. Furthermore, as in the simulation experiments, covariates are further rescaled to the range for the tail index.
We aim at two high quantiles, and , and the algorithm for high-dimensional regularized estimation in Section 3 is first applied to select the interval of . Specifically, the value of is fixed to , and that of is selected among with . The value of is set to 10, and the ’s with are equally spaced over . For each , the whole sample is randomly split into five parts of equal size, except that one part is short of two observations, and 5-fold cross-validation is used to select the tuning parameter . To stabilize the process, we conduct the random splitting five times and choose the value of minimizing the composite check loss over all five splittings. The averaged value of the PEs is also calculated over all five splittings, and the corresponding plot is presented in Figure 3. As a result, we choose since it corresponds to the minimum value of the PEs.
We next apply the QIR model to the whole dataset with , and the tuning parameter is scaled by since the sample size changes from to . The fitted model is of size , and we can predict the conditional quantile structure at any level . For example, consider the variable of child’s age, and we are interested in children aged 20, 30 and 40 months. The duration of breastfeeding is set to be the same as the child’s age, since the age is always larger than the duration of breastfeeding, and we set the values of all other variables in to be the same as those of the th observation, whose response value equals the sample median. Figure 4 plots the predicted quantile curves for the three different ages. It can be seen that younger children may have extremely low heights, and we may conclude that younger children are more easily affected by malnutrition.
Figure 4 also draws quantile curves for the mother’s education, the child’s gender and the mother’s unemployment status, and the values of the variables at the th observation are again used for the non-focal covariates in the prediction. For the child’s gender, baby boys are usually taller than baby girls, as observed in Koenker, (2011), while the difference vanishes at much higher quantiles. In addition, the quantile curves for the mother’s education are almost parallel, while those for the mother’s unemployment status cross. More importantly, all these new insights are at very high quantiles, and this confirms the necessity of the proposed model.
Finally, we compare the proposed QIR model with two commonly used ones in the literature: (i.) linear quantile regression at the level of with penalty in Belloni et al., 2019b , and (ii.) extremal quantile regression in (Wang et al., 2012a, ) adapted to high-dimensional data. The prediction performance at and is considered for the comparison, and we fix . For Method (ii.), we consider quantile levels, equally spaced over , and the linear quantile regression with penalty is conducted at each level. As in Wang et al., 2012a , we can estimate the extreme value index, and hence the fitted structures can be extrapolated to the level of . Note that there is no theoretical justification for Method (ii.) in the literature. As in simulation experiments, the tuning parameter in the above three methods is selected by minimizing the composite check loss in the testing set. We randomly split the data 100 times, and one value of PE can be obtained from each splitting. Figure 3 gives the boxplots of PEs from our model and two competing methods, and the advantages of our model can be observed at both target levels of and 0.995.
6 Conclusions and discussions
This paper proposes a reliable method for inference at extreme quantiles with both low- and high-dimensional covariates. The main idea is to first conduct a composite quantile regression at fixed quantile levels, and then extrapolate the estimated results to extreme quantiles by assuming a parametric structure at the tails. The Tukey lambda structure can be used due to its flexibility and the explicit form of its quantile function, and the success of the proposed methodology has been demonstrated by extensive numerical studies.
This paper can be extended in the following two directions. On the one hand, in the proposed model, a parametric structure is assumed over the interval . Although the criterion of PEs is suggested in Section 3 to balance estimation efficiency and model misspecification, it would be interesting to provide a statistical tool for assessing goodness-of-fit. Dong et al., (2019) introduced a goodness-of-fit test for parametric quantile regression at a fixed quantile level, and it can be adapted to our problem by extending the test statistic from a fixed level to the interval . We leave this for future research. On the other hand, the idea in this paper is general and can be applied to many other scenarios. For example, for conditionally heteroscedastic time series models, it is usually difficult to conduct the quantile estimation at both the median and extreme quantiles. The difficulty at extreme quantiles is due to sparse data at the tails, while that at the median is due to the tiny values of the fitted parameters (Zhu et al., 2018; Zhu and Li, 2021). Our idea can certainly be used to alleviate this problem to some extent.
Appendix: Technical details
A.1 Proofs for low-dimensional CQR Estimation
This subsection gives the technical proofs of Lemma 1 and Theorems 1-3 in Section 2.2. Two auxiliary lemmas are also presented at the end of this subsection: Lemma A.1 is used in the proof of Lemma 1, and Lemma A.2 in that of Theorem 3. The proofs of these two auxiliary lemmas are given in a supplementary file.
Proof of Lemma 1.
The Tukey lambda distribution has the form of
where , , . We prove Lemma 1 for . Consider four arbitrary quantile levels , and two arbitrary index vectors and such that
(A.1) |
We show that in the following.
The first step is to prove using the proof by contradiction. Suppose that and, without loss of generality, we assume that . Denote for . From (A.1), we have
(A.2) |
and
(A.3) |
Let us fix , , and . As a result, , , and are all fixed values. Denote
(A.4) |
and
where is the derivative function of , and if and only if . Note that equations (A.2) and (A.3) correspond to and , respectively. Moreover, it can be verified that and , i.e. the equation has at least four different solutions. As a result, the equation or has at least three different solutions. However, Lemma A.1 implies that the equation has at most two different solutions. Due to this contradiction, we have , and it can be readily verified that . We hence accomplish the proof of Lemma 1(i). The result of Lemma 1(ii) can be proved similarly. ∎
Proof of Theorem 1.
We first prove the uniqueness of . Denote . From model (2.1), is the minimizer not only of , but also of for all . Suppose that is another minimizer of , and then it is also the minimizer of for all .
Note that, for any scalars u and v,
ρ_τ(u − v) − ρ_τ(u) = −v ψ_τ(u) + ∫_0^v {I(u ≤ s) − I(u ≤ 0)} ds, (A.5)
where ψ_τ(u) = τ − I(u < 0); see Knight, (1998). Let and . It holds that, for each ,
which implies that with probability one. This, together with the conditions in Lemma 1 and the monotonic link functions, leads to the fact that for all . We then have since is positive definite. This accomplishes the proof of the uniqueness of . ∎
Proof of Theorem 2.
Note that and . From Knight’s identity at (A.5),
which, together with the condition of and Corollary 2 of Andrews, (1987), implies that
(A.6) |
Note that is a continuous function with respect to and, from Theorem 1, is the unique minimizer of . As a result, for each ,
(A.7) |
where and is its complement set, and hence
(A.8) |
Note that
which together with (A.6) and (A.8), implies that
as . This accomplishes the proof of consistency.
∎
Proof of Theorem 3.
For simplicity, we denote by in the whole proof of this theorem. Let and, from Knight’s identity at (A.5),
(A.9) |
Note that, by Taylor expansion, , where is a vector between and . Let ,
We then decompose into
(A.10) | ||||
(A.11) | ||||
(A.12) |
where
and
Lemma A.1.
Consider the function of defined in the proof of Lemma 1 with . It holds that, (1) for , is strictly decreasing, (2) for , is strictly increasing.
A.2 Proofs for high-dimensional regularized estimation
This subsection first conducts the deterministic analysis in Lemma A.3, and then the stochastic analysis in Lemmas A.4 and A.5. The proof of Theorem 4 follows by combining the deterministic analysis in Lemma A.3 with the stochastic analysis in Lemmas A.4 and A.5.
We first treat the observed data, , as deterministic. Consider the loss function , and its first-order Taylor-series error
where , is the inner product, , , and is a subgradient of .
Definition 1.
Loss function satisfies the local restricted strong convexity (LRSC) condition if
where , and and are and norms, respectively.
The above LRSC condition has a larger tolerance term compared with that in Loh and Wainwright, (2015), which has the form of . A similar tolerance term can also be found in Fan et al., (2019) for high-dimensional generalized trace regression. We are now ready to establish an upper bound for the estimation errors when the penalty is appropriately selected.
Lemma A.3.
Suppose that the regularizer satisfies Assumption 4, and loss function satisfies the LRSC condition with and . If the tuning parameter satisfies that
(A.18) |
then the minimizer over the set of satisfies the error bounds
(A.19) |
Proof..
Denote , and it holds that . Note that this lemma naturally holds if . As a result, we only need to consider the case with , and it can be verified that
(A.20) |
Note that , and then . This, together with the LRSC condition and Hölder's inequality, implies that
(A.21) | ||||
(A.22) | ||||
(A.23) | ||||
(A.24) |
where the last inequality follows from the subadditivity of , while the penultimate inequality is by Assumption 4; see also Lemma 4 in Loh and Wainwright, (2015). As a result,
(A.25) |
Moreover, from Lemma 5 in Loh and Wainwright, (2015), it holds that
(A.26) |
where is the index set of the largest elements of in magnitude. Combining (A.25) and (A.26), we have
(A.27) |
As a result,
(A.28) |
It is also implied by (A.26) that , which leads to
(A.29) |
This accomplishes the proof of this lemma. ∎
We next conduct the stochastic analysis to verify that the “good” event and LRSC condition hold with high probability in Lemmas A.4 and A.5, respectively.
Lemma A.4.
Proof..
It can be verified that, conditional on , is sub-Gaussian with the parameter of 0.5, and hence, for any ,
As a result, for each , and ,
(A.32) | ||||
(A.33) | ||||
(A.34) |
which implies that
By letting with , we accomplish the proof with and .
∎
Lemma A.5.
Proof..
We first show the strong convexity of . Let and, by Taylor expansion,
where , and is between and . Note that and hence, by Knight’s identity at (A.5) and Assumption 6, it can be verified that
(A.35) |
uniformly for .
For , denote and . Note that and . Let , and we next prove that, uniformly for ,
(A.36) |
with probability at least for any , where . As in Theorem 9.34 in Wainwright, (2019), we use the peeling argument, which is a common strategy in empirical process theory.
Tail bound for a fixed radius: Define a set for a fixed radius , and a random variable . We next show that, for any ,
(A.37) |
with probability at least .
For , denote . Note that random variable has a form of , and it is guaranteed by Assumption 5 that
i.e., if we replace by , while keeping the others fixed, then changes by at most . As a result, by the bounded differences inequality, for any ,
(A.38) |
with probability at least .
In addition, it is implied by Assumption 5 that, for all ,
which leads to
(A.39) |
where and are Rademacher and standard Gaussian random variables, respectively, the first inequality is due to the symmetrization theorem (Loh and Wainwright,, 2015, Lemma 12), the second one is by the multivariate contraction theorem (van de Geer,, 2016, Theorem 16.3), the third one is due to the fact , and the last one is by Lemma 16 of Loh and Wainwright, (2015) given the sample size of for some . The upper bound at (A.37) holds by combining (A.38) and (A.39).
Extension to uniform radii via peeling: Define a sequence of sets with . It can be verified that . As a result,
where . By applying (A.37), it holds that
i.e. (A.36) holds.
Finally, from (A.35), (A.36) and Lemma A.4,
uniformly for with probability at least for any . This accomplishes the proof.
∎
References
- Abrevaya, (2001) Abrevaya, J. (2001). The effect of demographics and maternal behavior on the distribution of birth outcomes. Empirical Economics, 26:247–259.
- Andrews, (1987) Andrews, D. W. K. (1987). Consistency in nonlinear econometric models: A generic uniform law of large numbers. Econometrica, 55:1465–1471.
- Beirlant and Goegebeur, (2004) Beirlant, J. and Goegebeur, Y. (2004). Local polynomial maximum likelihood estimation for Pareto-type distributions. Journal of Multivariate Analysis, 89:97–118.
- Belloni and Chernozhukov, (2011) Belloni, A. and Chernozhukov, V. (2011). -penalized quantile regression in high-dimensional sparse models. The Annals of Statistics, 39:82–130.
- (5) Belloni, A., Chernozhukov, V., Chetverikov, D., and Fernandez-Val, I. (2019a). Conditional quantile processes based on series or many regressors. Journal of Econometrics, 213:4–29.
- (6) Belloni, A., Chernozhukov, V., and Kato, K. (2019b). Valid post-selection inference in high-dimensional approximately sparse quantile regression models. Journal of the American Statistical Association, 114:749–758.
- Bertsekas, (2016) Bertsekas, D. P. (2016). Nonlinear Programming. Athena Scientific.
- Cade and Noon, (2003) Cade, B. S. and Noon, B. R. (2003). A gentle introduction to quantile regression for ecologists. Frontiers in Ecology and the Environment, 1:412–420.
- Chernozhukov, (2005) Chernozhukov, V. (2005). Extremal quantile regression. The Annals of Statistics, 33:806–839.
- Chernozhukov et al., (2015) Chernozhukov, V., Fernández-Val, I., and Kowalski, A. E. (2015). Quantile regression with censoring and endogeneity. Journal of Econometrics, 186:201–221.
- Chernozhukov et al., (2020) Chernozhukov, V., Fernández-Val, I., and Melly, B. (2020). Fast algorithms for the quantile regression process. Empirical Economics.
- Davino et al., (2014) Davino, C., Furno, M., and Vistocco, D. (2014). Quantile Regression: Theory and Applications. Wiley, New York.
- Dong et al., (2019) Dong, C., Li, G., and Feng, X. (2019). Lack-of-fit tests for quantile regression models. Journal of the Royal Statistical Society, Series B, 81:629–648.
- Elsner et al., (2008) Elsner, J. B., Kossin, J. P., and Jagger, T. H. (2008). The increasing intensity of the strongest tropical cyclones. Nature, 455:92–95.
- Fan et al., (2019) Fan, J., Gong, W., and Zhu, Z. (2019). Generalized high-dimensional trace regression via nuclear norm regularization. Journal of Econometrics, 212:177–202.
- Fan and Li, (2001) Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96:1348–1360.
- Fenske et al., (2011) Fenske, N., Kneib, T., and Hothorn, T. (2011). Identifying risk factors for severe childhood malnutrition by boosting additive quantile regression. Journal of the American Statistical Association, 106:494–510.
- Fournier et al., (2007) Fournier, B., Rupin, N., Bigerelle, M., Najjar, D., Iost, A., and Wilcox, R. (2007). Estimating the parameters of a generalized lambda distribution. Computational Statistics & Data Analysis, 51:2813–2835.
- Freimer et al., (1988) Freimer, M., Kollia, G., Mudholkar, G. S., and Lin, C. T. (1988). A study of the generalized Tukey lambda family. Communications in Statistics-Theory and Methods, 17:3547–3567.
- Galvao and Kato, (2016) Galvao, A. F. and Kato, K. (2016). Smoothed quantile regression for panel data. Journal of Econometrics, 193:92–112.
- Gilchrist, (2000) Gilchrist, W. G. (2000). Statistical modelling with quantile functions. Chapman & Hall/CRC, London.
- Hendricks and Koenker, (1991) Hendricks, W. and Koenker, R. (1991). Hierarchical spline models for conditional quantiles and the demand for electricity. Journal of the American Statistical Association, 87:58–68.
- Jagger and Elsner, (2008) Jagger, T. H. and Elsner, J. B. (2008). Modeling tropical cyclone intensity with quantile regression. International Journal of Climatology, 29:1351–1361.
- Jiang et al., (2012) Jiang, X., Jiang, J., and Song, X. (2012). Oracle model selection for nonlinear models based on weighted composite quantile regression. Statistica Sinica, 22:1479–1506.
- Kai et al., (2010) Kai, B., Li, R., and Zou, H. (2010). Local composite quantile regression smoothing: An efficient and safe alternative to local polynomial regression. Journal of the Royal Statistical Society, Series B, 72:49–69.
- Kato et al., (2012) Kato, K., Galvao, A. F., and Montes-Rojas, G. V. (2012). Asymptotics for panel quantile regression models with individual effects. Journal of Econometrics, 170:76–91.
- Knight, (1998) Knight, K. (1998). Limiting distributions for L1 regression estimators under general conditions. The Annals of Statistics, 26:755–770.
- Koenker, (2005) Koenker, R. (2005). Quantile regression. Cambridge University Press, Cambridge.
- Koenker, (2011) Koenker, R. (2011). Additive models for quantile regression: Model selection and confidence bandaids. Brazilian Journal of Probability and Statistics, 25:239–262.
- Koenker, (2017) Koenker, R. (2017). Quantile regression: 40 years on. Annual Review of Economics, 9:155–176.
- Koenker and Bassett, (1978) Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica, 46:33–50.
- Kuester et al., (2006) Kuester, K., Mittnik, S., and Paolella, M. S. (2006). Value-at-Risk prediction: A comparison of alternative strategies. Journal of Financial Econometrics, 4:53–89.
- Linton and Xiao, (2017) Linton, O. and Xiao, Z. (2017). Quantile regression applications in finance. In Handbook of Quantile Regression, pages 381–407. Chapman and Hall/CRC.
- Lobeto et al., (2021) Lobeto, H., Menendez, M., and Losada, I. J. (2021). Future behavior of wind wave extremes due to climate change. Scientific reports, 11:1–12.
- Loh, (2017) Loh, P. (2017). Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. The Annals of Statistics, 45:866–896.
- Loh and Wainwright, (2015) Loh, P. and Wainwright, M. J. (2015). Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. Journal of Machine Learning Research, 16:559–616.
- Meinshausen and Ridgeway, (2006) Meinshausen, N. and Ridgeway, G. (2006). Quantile regression forests. Journal of Machine Learning Research, 7:983–999.
- Moon et al., (2021) Moon, S. J., Jeon, J.-J., Lee, J. S. H., and Kim, Y. (2021). Learning multiple quantiles with neural networks. Journal of Computational and Graphical Statistics, In press.
- Noufaily and Jones, (2013) Noufaily, A. and Jones, M. C. (2013). Parametric quantile regression based on the generalized gamma distribution. Journal of the Royal Statistical Society, Series C, 62:723–740.
- Pan and Zhou, (2021) Pan, X. and Zhou, W.-X. (2021). Multiplier bootstrap for quantile regression: Non-asymptotic theory under random design. Information and Inference, 3:813–861.
- Pollard, (1985) Pollard, D. (1985). New ways to prove central limit theorems. Econometric Theory, 1:295–313.
- Racine and Li, (2017) Racine, J. S. and Li, K. (2017). Nonparametric conditional quantile estimation: a locally weighted quantile kernel approach. Journal of Econometrics, 201:72–94.
- Sherwood et al., (2016) Sherwood, B., Wang, L., et al. (2016). Partially linear additive quantile regression in ultra-high dimension. The Annals of Statistics, 44:288–317.
- Sivakumar and Banerjee, (2017) Sivakumar, V. and Banerjee, A. (2017). High-dimensional structured quantile regression. In International Conference on Machine Learning, pages 3220–3229.
- van de Geer, (2016) van de Geer, S. A. (2016). Estimation and testing under sparsity. Springer.
- van der Vaart, (1998) van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press, Cambridge.
- Vasicek, (1976) Vasicek, O. (1976). A test for normality based on sample entropy. Journal of the Royal Statistical Society, Series B, 38:54–59.
- Wainwright, (2019) Wainwright, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press.
- Wang and Tsai, (2009) Wang, H. and Tsai, C.-L. (2009). Tail index regression. Journal of the American Statistical Association, 104:1233–1240.
- Wang and Li, (2013) Wang, H. J. and Li, D. (2013). Estimation of extreme conditional quantiles through power transformation. Journal of the American Statistical Association, 108:1062–1074.
- (51) Wang, H. J., Li, D., and He, X. (2012a). Estimation of high conditional quantiles for heavy-tailed distributions. Journal of the American Statistical Association, 107:1453–1464.
- Wang and He, (2021) Wang, L. and He, X. (2021). Analysis of global and local optima of regularized quantile regression in high dimension: a subgradient approach. Technical report, Miami Herbert Business School, University of Miami.
- (53) Wang, L., Wu, Y., and Li, R. (2012b). Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association, 107:214–222.
- Yu et al., (2003) Yu, K., Lu, Z., and Stander, J. (2003). Quantile regression: applications and current research areas. Journal of the Royal Statistical Society: Series D, 52:331–350.
- Zhang, (2010) Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38:894–942.
- Zheng et al., (2015) Zheng, Q., Peng, L., and He, X. (2015). Globally adaptive quantile regression with ultra-high dimensional data. The Annals of Statistics, 43:2225–2258.
- Zheng et al., (2018) Zheng, Y., Zhu, Q., Li, G., and Xiao, Z. (2018). Hybrid quantile regression estimation for time series models with conditional heteroscedasticity. Journal of the Royal Statistical Society, Series B, 80:975–993.
- Zhu and Ling, (2011) Zhu, K. and Ling, S. (2011). Global self-weighted and local quasi-maximum exponential likelihood estimators for ARMA–GARCH/IGARCH models. The Annals of Statistics, 39:2131–2163.
- Zhu and Li, (2021) Zhu, Q. and Li, G. (2021). Quantile double autoregression. Econometric Theory, In press.
- Zhu et al., (2018) Zhu, Q., Zheng, Y., and Li, G. (2018). Linear double autoregression. Journal of Econometrics, 207:162–174.
- Zou and Yuan, (2008) Zou, H. and Yuan, M. (2008). Composite quantile regression and the oracle model selection theory. The Annals of Statistics, 36:1108–1126.
Table 1: Sample means (standard deviations) of prediction errors in squared loss over 500 replications at quantile levels 0.991 and 0.995. The first two result columns correspond to the first covariate value and the last two to the second; "True" gives the true conditional quantiles, and within each sample size the three rows correspond to the three quantile ranges considered in Section 4.1.

| n    | 0.991       | 0.995       | 0.991        | 0.995         |
| True | 10.34       | 11.83       | 15.13        | 18.84         |
| 500  | 1.32 (2.17) | 2.35 (4.11) | 5.41 (11.41) | 12.13 (28.19) |
|      | 1.42 (2.20) | 2.55 (4.07) | 5.42 (10.95) | 12.12 (26.73) |
|      | 2.00 (3.92) | 3.64 (7.56) | 6.18 (12.86) | 14.10 (32.78) |
| 1000 | 0.77 (1.68) | 1.39 (3.29) | 2.67 (5.28)  | 5.93 (12.61)  |
|      | 0.80 (1.39) | 1.44 (2.64) | 2.62 (4.27)  | 5.75 (9.75)   |
|      | 1.31 (2.53) | 2.44 (5.07) | 3.22 (5.08)  | 7.23 (11.78)  |
| 2000 | 0.32 (0.47) | 0.57 (0.85) | 1.03 (1.56)  | 2.25 (3.49)   |
|      | 0.36 (0.49) | 0.64 (0.90) | 1.05 (1.47)  | 2.31 (3.24)   |
|      | 0.70 (1.34) | 1.30 (2.44) | 1.34 (1.75)  | 3.05 (4.06)   |
Table 2: Mean square errors (with sample standard deviations of the prediction errors in squared loss in parentheses) of the predicted conditional quantiles at levels 0.991 and 0.995 for the regularized estimation, with c = 10, 30 and 50; the column layout follows Table 1.

| c    | 0.991       | 0.995        | 0.991        | 0.995         |
| True | 10.34       | 11.83        | 15.13        | 18.84         |
| 10   | 1.82 (2.61) | 3.23 (4.82)  | 6.75 (9.47)  | 14.83 (21.98) |
|      | 2.05 (6.00) | 3.78 (13.10) | 6.11 (8.22)  | 13.32 (18.56) |
|      | 2.92 (7.19) | 5.44 (15.60) | 7.24 (10.36) | 15.91 (25.05) |
| 30   | 0.65 (1.59) | 1.15 (3.12)  | 1.97 (3.08)  | 4.26 (6.90)   |
|      | 0.65 (1.49) | 1.18 (2.94)  | 1.96 (2.85)  | 4.26 (6.31)   |
|      | 0.92 (2.15) | 1.66 (4.08)  | 2.22 (2.95)  | 4.86 (6.40)   |
| 50   | 0.33 (0.49) | 0.58 (0.84)  | 1.17 (1.77)  | 2.52 (3.88)   |
|      | 0.39 (0.57) | 0.69 (1.01)  | 1.28 (1.93)  | 2.78 (4.26)   |
|      | 0.54 (0.86) | 0.99 (1.58)  | 1.55 (2.39)  | 3.51 (5.57)   |
Table 3: Variable selection results with c = 10, 30 and 50: average number of selected active coefficients (size), percentage of replications where the active and inactive coefficients are both correctly selected, percentage of active coefficients correctly selected, percentage of inactive coefficients correctly selected, false positive rate (FP) and false negative rate (FN); sample standard deviations are in parentheses.

| c  | size        | all correct (%) | active correct (%) | inactive correct (%) | FP          | FN          |
| 10 | 9.04 (0.99) | 91.6            | 96                 | 95.6                 | 0.06 (0.68) | 0.47 (2.34) |
| 30 | 9.00 (0.00) | 100             | 100                | 100                  | 0.00 (0.00) | 0.00 (0.00) |
| 50 | 9.00 (0.00) | 100             | 100                | 100                  | 0.00 (0.00) | 0.00 (0.00) |
| 10 | 8.91 (1.25) | 79              | 82.6               | 95.4                 | 0.07 (0.84) | 2.02 (4.52) |
| 30 | 9.00 (0.08) | 99.4            | 99.6               | 99.8                 | 0.00 (0.03) | 0.04 (0.70) |
| 50 | 9.00 (0.00) | 100             | 100                | 100                  | 0.00 (0.00) | 0.00 (0.00) |
| 10 | 8.56 (0.99) | 48.4            | 54.6               | 90.4                 | 0.10 (0.41) | 6.51 (8.43) |
| 30 | 8.88 (0.38) | 87.8            | 88.4               | 99.2                 | 0.01 (0.06) | 1.42 (4.10) |
| 50 | 8.96 (0.23) | 96.4            | 96.6               | 99.8                 | 0.00 (0.03) | 0.44 (2.50) |