
High-Dimensional Extreme Quantile Regression

Yiwei Tang
Department of Statistics and Data Science, Fudan University
and
Huixia Judy Wang
Department of Statistics, The George Washington University
and
Deyuan Li
Department of Statistics and Data Science, Fudan University
Abstract

The estimation of conditional quantiles at extreme tails is of great interest in numerous applications. Various methods that integrate regression analysis with an extrapolation strategy derived from extreme value theory have been proposed to estimate extreme conditional quantiles in scenarios with a fixed number of covariates. However, these methods prove ineffective in high-dimensional settings, where the number of covariates increases with the sample size. In this article, we develop new estimation methods tailored for extreme conditional quantiles with high-dimensional covariates. We establish the asymptotic properties of the proposed estimators and demonstrate their superior performance through simulation studies, particularly in scenarios of growing dimension and high dimension where existing methods may fail. Furthermore, the analysis of auto insurance data validates the efficacy of our methods in estimating extreme conditional insurance claims and selecting important variables.


Keywords: Extrapolation; Extreme value; High-dimensional data; Regression analysis.

1 Introduction

Quantile regression has become a widely recognized and useful alternative to classical least-squares regression for analyzing heterogeneous data. Since its introduction by Koenker and Bassett (1978), quantile regression has gradually been extended to a wide variety of data analytic settings; for a comprehensive review, see Koenker (2005); Koenker et al. (2017).

While traditional quantile regression allows exploration of a wide range of conditional quantiles for $\tau\in[\tau_{l},\tau_{u}]$ with $\tau_{l},\tau_{u}\in(0,1)$, there is often interest in the extreme tails where $\tau$ is close to 0 or 1. Without loss of generality, we focus the discussion on $\tau$ close to 1. The inherent challenge in estimating tail quantiles lies in the fact that the number of observations in the tails, that is, above the $\tau$th quantile, is often small. In various contexts, methods have been developed to study extreme quantile regression by leveraging extreme value theory. Chernozhukov (2005), Wang, Li and He (2012), and Wang and Li (2013) studied extreme conditional quantiles under the linear quantile regression framework, and Li and Wang (2019) focused on the linear quantile autoregressive model. Daouia et al. (2013) and Gardes and Stupfler (2019) considered nonparametric smoothing methods. Wang and Tsai (2009) and Youngman (2019) applied generalized Pareto models to analyze the exceedances over a high threshold. Velthoen et al. (2023) and Gnecco et al. (2024) explored tree-based methods such as gradient boosting and random forests.

With advances in data collection in fields such as genomics, economics, finance, and imaging, the dimension of covariates $p$ is becoming larger and can grow with the sample size $n$. Existing methodology and theory in quantile regression for high-dimensional covariates have primarily focused on a central quantile level or compact quantile sets in $(0,1)$. In situations where $p$ grows with $n$, various studies have investigated asymptotic behaviors of quantile regression estimators, e.g., Welsh (1989), He and Shao (2000), Belloni et al. (2019), Pan and Zhou (2021), and He et al. (2023). Assuming sparsity in the regression coefficients, researchers have explored the $\ell_{1}$-penalized quantile regression estimator and its generalizations (Belloni and Chernozhukov, 2011; Fan et al., 2014), concave penalties (Wang, Wu and Li, 2012), and smoothed quantile regression with concave penalties (Tan et al., 2022). Additionally, the debiased estimator of high-dimensional quantile regression has been studied for inference at central quantiles (Zhao et al., 2014; Bradic and Kolar, 2017).

In this article, we address the challenge of estimating extreme conditional quantiles of $Y\in\mathbb{R}$ given a set of predictors $\mathbf{X}\in\mathbb{R}^{p}$ in a high-dimensional context. The challenge lies in the scarcity of effective samples at the tail and the sparsity of samples in high-dimensional space. As noted by Gnecco et al. (2024), a lack of samples exceeding the conditional $\tau$th quantile leads to empirical estimators with large bias and variance. The high dimensionality of the predictor space introduces additional bias, as noise covariates can obscure true signals.

Current research on extreme quantiles primarily focuses on estimation in fixed-dimensional settings and is either not applicable or less competitive in high-dimensional settings. Although tree-based methods (Velthoen et al., 2023; Gnecco et al., 2024) are effective when the covariate dimension is relatively large but fixed, scenarios in which the covariate dimension grows with, and is comparable to, the sample size remain inadequately explored. Recently, Sasaki et al. (2024) proposed a high-dimensional tail index regression model, assuming that the dimension grows with $n$ and is comparable to $n(1-\tau)$. However, this approach imposes constraints such as a Pareto tail and specific link functions between the extreme value index and covariates.

In this article, we propose novel estimators for extreme quantiles in high dimensions by integrating concepts from extreme value theory with regularized estimators. Building upon a linear conditional quantile model, we employ regularized quantile regression to estimate intermediate conditional quantiles and extrapolate them to extreme quantiles. Here, “intermediate quantile” refers to a quantile level $\tau_{n}$ that approaches one at a moderate rate such that $n(1-\tau_{n})\rightarrow\infty$, whereas “extreme quantile” denotes a quantile level that approaches one more rapidly.

The theoretical analysis of extreme quantiles in high dimensions is challenging. The limiting distribution of a quantile estimator can be intractable in high dimensions, even at central quantile levels. Existing work on central quantiles within $(0,1)$ often requires a common assumption, namely $f_{Y}(Q_{Y}(\tau|\mathbf{X})|\mathbf{X})>C>0$, where $f_{Y}(\cdot|\mathbf{X})$ is the conditional density of $Y$ given $\mathbf{X}$, and $Q_{Y}(\tau|\mathbf{X})$ is the $\tau$th conditional quantile of $Y$; see, for instance, Belloni and Chernozhukov (2011) and Wang, Wu and Li (2012). However, this assumption is violated in the tails as $\tau$ approaches 1. It is even more challenging to achieve a uniform result for the regularized estimator over the entire tail, which is crucial for constructing an effective extreme value index estimator. This paper addresses the gap and establishes the uniform error rate of the quantile regression estimator in high dimensions at intermediate quantiles. This rate includes terms analogous to those in high-dimensional quantile regression at a central quantile level, augmented by an inflationary component attributed to escalating quantile levels or diminishing effective sample sizes. In addition, we propose a refined Hill approach for estimating the extreme value index, which is based on a fixed number of intermediate high-dimensional quantile estimates. This framework enables us to analyze the uniform behavior of regularized quantile regression in the tail region and to assess the rate at which $f_{Y}(Q_{Y}(\tau|\mathbf{X})|\mathbf{X})$ tends to zero. The error rates of the proposed refined Hill and extreme quantile estimators offer theoretical insights into threshold selection in high-dimensional contexts.

The rest of the article is organized as follows. In Section 2, we present the proposed estimators for the extreme value index and extreme conditional quantiles, and derive theoretical results for the proposed estimators using techniques from both extreme value theory and high-dimensional statistics theory. We assess the finite sample performance of the proposed methods through a simulation study in Section 3 and the analysis of auto insurance data in Section 4. Technical details and additional information for Sections 3 and 4 are provided in the online Supplementary Material.

2 Extreme Quantile Estimation in High Dimensions

Suppose we observe a random sample $\{(\mathbf{X}_{i},Y_{i}),i=1,\ldots,n\}$ of the random vector $(\mathbf{X},Y)$, where $Y_{i}$ is the univariate response variable and $\mathbf{X}_{i}=(X_{i1},\ldots,X_{ip})^{T}$ is the $p$-dimensional centered design vector. This article considers the high-dimensional case: $p:=p(n)\rightarrow\infty$ as $n\rightarrow\infty$. Let $Q_{Y}(\tau|\mathbf{X})=\inf\{y:F_{Y}(y|\mathbf{X})\geq\tau\}$ denote the $\tau$th conditional quantile of $Y$ given $\mathbf{X}$, where $F_{Y}(\cdot|\mathbf{X})$ is the cumulative distribution function (CDF) of $Y$ given $\mathbf{X}$. Denote by $\mathcal{X}\subset\mathbb{R}^{p}$ the support of $\mathbf{X}$.

Throughout the article, we assume that $F_{Y}(\cdot|\mathbf{X})$ is in the maximum domain of attraction of an extreme value distribution $G_{\gamma}(\cdot)$ with extreme value index (EVI) $\gamma>0$, denoted by $F_{Y}(\cdot|\mathbf{X})\in D(G_{\gamma})$. That is, for a given random sample $Y_{1},\ldots,Y_{n}$ from $F_{Y}(\cdot|\mathbf{X})$, there exist constants $a_{n}>0$ and $b_{n}\in\mathbb{R}$ such that

$$P\left(\left.\frac{\max_{1\leq i\leq n}Y_{i}-b_{n}}{a_{n}}\leq y\,\right|\mathbf{X}\right)\rightarrow G_{\gamma}(y)=\exp\{-(1+\gamma y)^{-1/\gamma}\},$$

as $n\rightarrow\infty$, for $1+\gamma y\geq 0$. In this paper, we assume $\gamma>0$, which means that $Y|\mathbf{X}$ has a heavy-tailed distribution, as commonly seen in many applications such as stock market returns, insurance claims, earthquake magnitudes, and river flows during floods.
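As a numerical check (our illustration, not one from the paper), the following sketch verifies the heavy-tailed domain-of-attraction statement for an exact Pareto distribution: with $F(y)=1-y^{-1/\gamma}$, $y\geq 1$, the normalized maximum $M_{n}/n^{\gamma}$ converges to the Fréchet law $\exp(-y^{-1/\gamma})$, which is the $\gamma>0$ case of $G_{\gamma}$ up to the location and scale constants $a_{n},b_{n}$. The choices $\gamma=0.5$, $n=1000$, and the evaluation point $y=2$ are arbitrary.

```python
import numpy as np

# Numerical check (our illustration): for the Pareto distribution
# F(y) = 1 - y^{-1/gamma}, y >= 1, the normalized maximum M_n / n^gamma
# converges in distribution to the Frechet law exp(-y^{-1/gamma}).
rng = np.random.default_rng(0)
gamma = 0.5           # assumed extreme value index
n, reps = 1000, 5000  # sample size and Monte Carlo replications

u = rng.uniform(size=(reps, n))
maxima = ((1.0 - u) ** (-gamma)).max(axis=1)  # inverse-CDF Pareto draws

y = 2.0
empirical = np.mean(maxima / n ** gamma <= y)  # P(M_n / n^gamma <= y)
limit = np.exp(-y ** (-1.0 / gamma))           # Frechet CDF at y
```

With these settings `empirical` and `limit` agree to within Monte Carlo error, since $P(M_{n}\leq n^{\gamma}y)=(1-y^{-1/\gamma}/n)^{n}\rightarrow\exp(-y^{-1/\gamma})$.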

The main objective of this paper is to estimate the conditional quantile $Q_{Y}(\tau_{n}|\mathbf{X})$ when $\tau_{n}\rightarrow 1$ and $p\rightarrow\infty$ as $n\rightarrow\infty$. We address two distinct regimes for the quantile level $\tau_{n}$ as it approaches 1: intermediate order quantile levels such that $n(1-\tau_{n})\rightarrow\infty$, and extreme order quantile levels with $n(1-\tau_{n})\rightarrow C$, where $C\geq 0$ is some constant.

We focus on the following tail linear quantile regression model:

$$Q_{Y}(\tau|\mathbf{X})=\beta_{0}(\tau)+X_{1}\beta_{1}(\tau)+\cdots+X_{p}\beta_{p}(\tau)=:\mathbf{Z}^{T}\boldsymbol{\beta}(\tau),\quad\text{for all }\tau\in[\tau_{ln},1),\qquad(1)$$

where $\mathbf{Z}=(1,\mathbf{X}^{T})^{T}$, $\tau_{ln}\rightarrow 1$ as $n\rightarrow\infty$, and the quantile slope coefficients $\boldsymbol{\beta}(\tau)$ may vary across $\tau\in[\tau_{ln},1)$. The linear quantile model in (1) is assumed only at the upper tail. Similar tail model specifications have been considered in Wang, Li and He (2012), Li and Wang (2019), and Xu et al. (2022). In high-dimensional settings, we extend this assumption and additionally impose a sparsity condition to ensure model identifiability. We assume there are $s$ tail-relevant variables, that is, $s:=\#\{j=1,2,\ldots,p:\beta_{j}(\tau)\neq 0\text{ for some }\tau\in[\tau_{ln},1)\}$, and $s=o(n)$. A tail-relevant variable may influence the tail quantiles of $Y|\mathbf{X}$ even if it does not affect the central quantiles. Without loss of generality, we assume that the first $s$ slope coefficients are nonzero, i.e., $\boldsymbol{\beta}(\tau)=(\beta_{0}(\tau),\beta_{1}(\tau),\ldots,\beta_{s}(\tau),\mathbf{0}_{p-s}^{T})^{T}$.

We propose a three-step procedure to estimate the extreme conditional quantile. In the first step, we obtain $\ell_{1}$-penalized quantile estimators at a sequence of intermediate quantile levels. For heavy-tailed distributions, the Hill estimator (Hill, 1975) is the most commonly used method to estimate the EVI. However, being based on the average of log excesses, the Hill estimator requires analysis of the tail quantile process (de Haan and Ferreira, 2006), which poses both computational and theoretical challenges in high dimensions. To address these challenges, we develop a refined Hill estimator in the second step, which relies on quantile estimates at a fixed number of intermediate quantile levels that approach one at the same rate. In the third step, we develop an extrapolation estimator for the extreme conditional quantile $Q_{Y}(\tau_{n}|\mathbf{X})$ by leveraging extreme value theory and the results from the first two steps.

Before presenting the proposed estimators and their theoretical properties, we first introduce the notation used throughout the paper. For a sequence of random variables $E_{1},E_{2},\ldots,E_{n}$, we denote the order statistics by $E_{(1)}\leq E_{(2)}\leq\cdots\leq E_{(n)}$. Given a random sample $\mathbf{Z}_{1},\ldots,\mathbf{Z}_{n}$, let $\mathbb{G}_{n}(f)=\mathbb{G}_{n}\{f(\mathbf{Z}_{i})\}:=n^{-1/2}\sum_{i=1}^{n}[f(\mathbf{Z}_{i})-\mathrm{E}\{f(\mathbf{Z}_{i})\}]$ and $\mathbb{E}_{n}f=\mathbb{E}_{n}f(\mathbf{Z}_{i}):=n^{-1}\sum_{i=1}^{n}f(\mathbf{Z}_{i})$. We denote the $\ell_{2}$, $\ell_{1}$, $\ell_{\infty}$, and $\ell_{0}$ norms by $\|\cdot\|$, $\|\cdot\|_{1}$, $\|\cdot\|_{\infty}$, and $\|\cdot\|_{0}$, respectively. Let $\|\boldsymbol{\beta}\|_{1,n}=\sum_{j=1}^{p}\widehat{\sigma}_{j}|\beta_{j}|$ denote the weighted $\ell_{1}$-norm with $\widehat{\sigma}_{j}^{2}:=\mathbb{E}_{n}(X_{ij}^{2})$. Given a vector $\boldsymbol{\delta}\in\mathbb{R}^{p}$ and a set of indices $T\subset\{1,\ldots,p\}$, we denote by $\boldsymbol{\delta}_{T}$ the vector with $\delta_{Tj}=\delta_{j}$ if $j\in T$ and $\delta_{Tj}=0$ if $j\notin T$. We use $a\lesssim b$ to denote $a=O(b)$, meaning $a\leq cb$ for some constant $c>0$ that does not depend on $n$, and $a\asymp b$ to denote $a=O(b)$ and $b=O(a)$. We use $a\lesssim_{P}b$ to denote $a=O_{P}(b)$. Additionally, $a\vee b=\max\{a,b\}$ and $a\wedge b=\min\{a,b\}$. For notational simplicity, let $F_{i}(\cdot)=F_{Y}(\cdot|\mathbf{X}_{i})$ be the conditional distribution function of $Y$ given $\mathbf{X}_{i}$, and denote $f_{i}(\cdot)=f_{Y}(\cdot|\mathbf{X}_{i})$. Finally, let $\mathbb{X}=(\mathbf{X}_{1},\ldots,\mathbf{X}_{n})^{T}$, $\mathbb{Z}=(\mathbf{Z}_{1},\ldots,\mathbf{Z}_{n})^{T}$, and $\mathbf{Y}=(Y_{1},Y_{2},\ldots,Y_{n})^{T}$.

2.1 $\ell_{1}$-penalized Intermediate Quantile Estimator

Define $\mathcal{T}_{n}:=\{c(1-\tau_{0n}):c\in[c_{1},c_{2}]\}$, where $\tau_{0n}$ is an intermediate quantile level such that $\tau_{0n}\rightarrow 1$ and $n(1-\tau_{0n})\rightarrow\infty$, and $0<c_{1}<c_{2}<\infty$ are constants. For any $\tau$ such that $1-\tau\in\mathcal{T}_{n}$, we define the $\ell_{1}$-penalized quantile estimator of $\boldsymbol{\beta}(\tau)$ as

$$\widehat{\boldsymbol{\beta}}(\tau)=\mathop{\rm argmin}\limits_{(\beta_{0},\boldsymbol{\beta}^{T})^{T}\in\mathbb{R}^{p+1}}\frac{1}{n}\sum_{i=1}^{n}\rho_{\tau}(Y_{i}-\beta_{0}-\mathbf{X}_{i}^{T}\boldsymbol{\beta})+\frac{\lambda\sqrt{\tau(1-\tau)}}{n}\|\boldsymbol{\beta}\|_{1,n},\qquad(2)$$

where λ>0\lambda>0 is the penalization parameter.

In high-dimensional settings, the choice of $\lambda$ is crucial for achieving estimation consistency. As suggested by Bickel et al. (2009), $\lambda$ should exceed a suitably rescaled (sub)gradient of the sample criterion function evaluated at the true parameter value. While the asymptotic scale of $\lambda$ has been derived for central quantiles (Belloni and Chernozhukov, 2011; Zheng et al., 2013), this scale differs for intermediate quantiles as $\tau$ approaches one. Specifically, a subgradient of the penalized quantile regression objective function at $\boldsymbol{\beta}(\tau)$ is given by $\mathbb{E}_{n}[\mathbf{Z}_{i}\{\tau-\mathbbm{1}(Y_{i}\leq\mathbf{Z}_{i}^{T}\boldsymbol{\beta}(\tau))\}]+\lambda\sqrt{\tau(1-\tau)}\,n^{-1}(0,\mathbf{g}^{T})^{T}$, where $\mathbf{Z}=(1,\mathbf{X}^{T})^{T}$ and $\mathbf{g}=(g_{1},\ldots,g_{p})^{T}$ is a subgradient of $\|\boldsymbol{\beta}\|_{1,n}$ with $|g_{j}|=\widehat{\sigma}_{j}$ for $j=1,2,\ldots,p$. The infinity norm of the first term is of the same order for both central and intermediate quantiles; however, the second term varies in order due to the factor $\sqrt{\tau(1-\tau)}$. Lemma 1 in the Supplementary Material provides an asymptotic uniform lower bound on $\lambda$ over $\{\tau:1-\tau\in\mathcal{T}_{n}\}$, and a practical choice of $\lambda$ is discussed in Section 2.4.
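The practical rule of Section 2.4 is not reproduced in this excerpt. As a hedged sketch, the following adapts the pivotal simulation device of Belloni and Chernozhukov (2011), cited above for central quantiles, to an intermediate level $\tau$: at the true coefficients, the score $\mathbb{E}_{n}[X_{ij}\{\tau-\mathbbm{1}(U_{i}\leq\tau)\}]$ depends only on $U_{i}\sim\mathrm{Uniform}(0,1)$, so its weighted sup-norm can be simulated exactly. The constant `c` and the simulation quantile `q` are illustrative choices, not the paper's rule.

```python
import numpy as np

# Hedged sketch: pivotal, simulation-based penalty level. The score at
# the true coefficients is distribution-free in U_i ~ Uniform(0, 1),
# so its weighted sup-norm can be simulated exactly.
def pivotal_lambda(X, tau, sims=300, c=1.1, q=0.9, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sigma = np.sqrt((X ** 2).mean(axis=0))   # sigma_hat_j from the text
    sup_norms = np.empty(sims)
    for b in range(sims):
        u = rng.uniform(size=n)
        score = X.T @ (tau - (u <= tau))     # p-vector of score sums
        sup_norms[b] = np.max(np.abs(score) / sigma)
    # divide by sqrt(tau(1-tau)) so lambda plugs into the penalty in (2)
    return c * np.quantile(sup_norms, q) / np.sqrt(tau * (1 - tau))

rng = np.random.default_rng(2)
lam = pivotal_lambda(rng.normal(size=(200, 30)), tau=0.95)
```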

One major challenge in establishing the theoretical properties of $\widehat{\boldsymbol{\beta}}(\tau)$ arises from the violation of the condition $f_{Y}(Q_{Y}(\tau|\mathbf{X})|\mathbf{X})>C>0$ at intermediate quantiles. This violation complicates the establishment of a quadratic lower bound for the Taylor expansion of the expected quantile loss function. Consequently, the general framework of Negahban et al. (2012) cannot be applied to obtain the convergence rate through the restricted strong convexity property of the empirical loss function. However, leveraging Condition C5 and extreme value theory, we characterize the rates at which the conditional quantile and density converge, determining how $f_{Y}(Q_{Y}(\tau|\mathbf{X})|\mathbf{X})$ approaches zero as $\tau$ goes to one. This allows us to establish an asymptotic quadratic lower bound for the expected intermediate quantile loss after appropriate standardization (see Lemma 4 in the Supplementary Material).

To establish the theoretical properties of $\widehat{\boldsymbol{\beta}}(\tau)$ over $\{\tau:1-\tau\in\mathcal{T}_{n}\}$, we impose Conditions C1-C6. For brevity, the conditions are detailed in the Supplementary Material, but we highlight one key condition, C5, below.

Condition C5. There exists an auxiliary line $\mathbf{Z}\rightarrow\mathbf{Z}^{T}\boldsymbol{\theta}_{r}$ with $0<r<1$ and a bounded vector $\boldsymbol{\theta}_{r}$ such that, for $U=Y-\mathbf{Z}^{T}\boldsymbol{\theta}_{r}$ and some heavy-tailed distribution function $F_{0}(\cdot)$ with extreme value index $\gamma>0$ and density $f_{0}(\cdot)$, the following hold for some positive continuous functions $K(\cdot),K_{1}(\cdot),K_{2}(\cdot)$ on $\mathcal{Z}=(1,\mathcal{X})$.

  (i) There exist positive sequences $d_{n},d_{1n}$ such that $K(\mathbf{Z})\asymp d_{n}$ and $K_{1}(\mathbf{Z})\asymp d_{1n}$ hold for all $\mathbf{Z}\in\mathcal{Z}$.

  (ii) For any sequence $t_{n}\rightarrow\infty$, if $d_{n}\{1-F_{0}(t_{n})\}\rightarrow 0$ and $d_{n}f_{0}(t_{n})\rightarrow 0$ as $n\rightarrow\infty$, then

  $$\left|\frac{1-F_{U}(t_{n}|\mathbf{Z})}{K(\mathbf{Z})\{1-F_{0}(t_{n})\}}-1\right|=\{1-F_{0}(t_{n})\}^{\delta}K_{1}(\mathbf{Z})\{1+o(1)\},\quad\text{and}$$
  $$\left|\frac{f_{U}(t_{n}|\mathbf{Z})}{K(\mathbf{Z})f_{0}(t_{n})}-1\right|=\{f_{0}(t_{n})\}^{\delta}K_{2}(\mathbf{Z})\{1+o(1)\},$$

  hold uniformly for $\mathbf{Z}\in\mathcal{Z}$, where $\delta>0$ is a constant.

  (iii) $U_{0}(t):=F_{0}^{-1}(1-1/t)$ satisfies the second-order condition
  $$A_{1}(t)^{-1}\{U_{0}(tz)/U_{0}(t)-z^{\gamma}\}\rightarrow z^{\gamma}(z^{\rho}-1)/\rho,\quad\text{as }t\rightarrow\infty,$$
  with $\gamma>0$, $\rho<0$, and $A_{1}(t)=\gamma dt^{\rho}$ with $d\neq 0$. Additionally, $f_{0}(t)$ is regularly varying at infinity with index $-1/\gamma-1$.

Remark 1.

Condition C5 presents a novel framework tailored to high-dimensional settings, where $K(\mathbf{Z})$, $K_{1}(\mathbf{Z})$, and $K_{2}(\mathbf{Z})$ may be unbounded and affect the tail behavior of $Y|\mathbf{X}$. Unlike fixed-dimensional cases (Chernozhukov, 2005; Wang, Li and He, 2012), this condition addresses the combined complexities of high dimensionality and tail behavior. The conditions $d_{n}\{1-F_{0}(t_{n})\}\rightarrow 0$ and $d_{n}f_{0}(t_{n})\rightarrow 0$ as $n\rightarrow\infty$ ensure that the tail effect of $1-F_{0}(t_{n})$ outweighs the diverging scale effect of $K(\mathbf{Z})$ in high dimensions. This requirement allows us to apply extreme value theory to infer the tail behavior of $Y$, and it is mild, as it is automatically satisfied when the number of important variables $s$ is finite (see Example 1). The second-order condition on $U_{0}$ implies that $U_{0}(tz)/U_{0}(t)\rightarrow z^{\gamma}$, which can be interpreted as a first-order condition (see Condition C6 for the definition). The first-order condition is necessary and sufficient for $F_{0}$ to belong to the maximum domain of attraction $D(G_{\gamma})$ of heavy-tailed distributions (de Haan and Ferreira, 2006). Most commonly used families of continuous distributions satisfy the second-order condition. Although both the extreme value index estimator and the extreme quantile estimator are based on the first-order condition, the second-order condition is essential for deriving their theoretical results. Additionally, Lemma 3 in the Supplementary Material demonstrates that under Condition C5, a second-order condition on $F_{Y}(t|\mathbf{X})$ still holds, similar to the fixed-dimensional case.
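As a concrete illustration (ours, not an example from the paper), consider a Hall-class baseline distribution
$$1-F_{0}(t)=t^{-1/\gamma}\left\{1+\beta t^{\rho/\gamma}+o\left(t^{\rho/\gamma}\right)\right\},\qquad\rho<0,\ \beta\neq 0.$$
Inverting $1-F_{0}(U_{0}(t))=1/t$ gives $U_{0}(t)=t^{\gamma}\{1+\gamma\beta t^{\rho}+o(t^{\rho})\}$, so that
$$\frac{U_{0}(tz)}{U_{0}(t)}-z^{\gamma}=\gamma\beta t^{\rho}\,z^{\gamma}(z^{\rho}-1)\{1+o(1)\},$$
which matches the second-order condition in C5(iii) with $A_{1}(t)=\gamma dt^{\rho}$ and $d=\beta\rho$.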

Define $s_{r}:=\|\boldsymbol{\theta}_{r}\|_{0}$. The following Theorem 1 establishes a uniform bound on $\widehat{\boldsymbol{\beta}}(\tau)-\boldsymbol{\beta}(\tau)$ for $1-\tau\in\mathcal{T}_{n}$.

Theorem 1.

Assume Conditions C1-C6 hold, along with $1-\tau_{0n}>\sqrt{s\log(p)/n}$, $d_{n}^{-1}d_{1n}^{1/\delta}(1-\tau_{0n})\rightarrow 0$, $s_{r}d_{n}^{-\gamma}(1-\tau_{0n})^{\gamma}\rightarrow 0$, and $\lambda\asymp\sqrt{n\log p}/\sqrt{1-\tau_{0n}}$. Then, for any $\epsilon\in(0,1)$ and some positive constant $C$, there exists a positive integer $N$ such that, for all $n>N$,

$$\sup_{1-\tau\in\mathcal{T}_{n}}\|\widehat{\boldsymbol{\beta}}(\tau)-\boldsymbol{\beta}(\tau)\|\leq Cd_{n}^{\gamma}(1-\tau_{0n})^{-\gamma-1}\sqrt{\frac{s\log(p\vee n)}{n}},$$

with probability at least $1-4\epsilon-5p^{-4}$.

Theorem 1 establishes bounds similar in spirit to those found in Chernozhukov (2005), but with an additional term, $d_{n}^{\gamma}(1-\tau_{0n})^{-\gamma-1}$, which inflates the upper bound. This term reflects the decay of the conditional density in the tails. Specifically, as shown in Remark S.2 in the Supplementary Material, under certain conditions, as $n\rightarrow\infty$, $f_{Y}(Q_{Y}(\tau|\mathbf{X})|\mathbf{X})\sim d_{n}^{-\gamma}(1-\tau_{0n})^{\gamma+1}$, which describes the rate at which the conditional density approaches zero at intermediate quantiles. Theorem 1 imposes both upper and lower bound conditions on $1-\tau_{0n}$. The condition $1-\tau_{0n}>\sqrt{s\log(p)/n}$ prevents $\tau_{0n}$ from approaching one too quickly, aligning with the requirement for restricted strong convexity of the objective function (see Lemma 4 in the Supplementary Material). Restricted strong convexity refers to maintaining strong convexity within a restricted set (Negahban et al., 2012). In the high-dimensional context, this restricted set is defined such that the error vector lies within it with high probability, given a suitably chosen penalty and certain conditions. Intuitively, restricted strong convexity holds only if the prediction error of the penalized estimator, $\mathbf{Z}^{T}(\widehat{\boldsymbol{\beta}}(\tau)-\boldsymbol{\beta}(\tau))$, grows more slowly than the quantile of $Y|\mathbf{X}$. The conditions $s_{r}d_{n}^{-\gamma}(1-\tau_{0n})^{\gamma}\rightarrow 0$ and $d_{n}^{-1}d_{1n}^{1/\delta}(1-\tau_{0n})\rightarrow 0$ ensure that the tail effect dominates the high-dimensional effect. In Section 2.3, we provide an example illustrating the feasibility of these conditions.

2.2 Refined Hill Estimator

Let $k=\lfloor n(1-\tau_{0n})\rfloor$. For simplicity, we omit the rounding notation, as it does not affect the theoretical discussion that follows. Define quantile levels $\tau_{0n}=\tau_{1}<\tau_{2}<\cdots<\tau_{J}\in(0,1)$, where $\tau_{j}=1-l_{j}k/n$, $l_{j}=s^{j-1}$, $s\in(0,1)$. For each $j=1,2,\ldots,J$, we estimate $\boldsymbol{\beta}(\tau_{j})$ by the $\ell_{1}$-penalized quantile regression estimator in (2). Suppose we are interested in estimating the extreme conditional quantile of $Y$ given $\mathbf{X}=\mathbf{x}\in\mathcal{X}$. Let $\mathbf{z}=(1,\mathbf{x}^{T})^{T}$; we then define the refined Hill estimator as

$$\hat{\gamma}=\left(\sum_{j=1}^{J}\phi\left(l_{j}\right)\log\left(1/l_{j}\right)\right)^{-1}\sum_{j=1}^{J}\phi\left(l_{j}\right)\log\left(\frac{\mathbf{z}^{T}\widehat{\boldsymbol{\beta}}(\tau_{j})}{\mathbf{z}^{T}\widehat{\boldsymbol{\beta}}(\tau_{1})}\right),$$

where $\phi(\cdot)$ is a positive measurable function.
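A minimal sketch of the refined Hill estimator with unit weights $\phi\equiv 1$ (the function name and the toy Pareto quantiles are our assumptions). For an exact Pareto tail $Q(\tau)=(1-\tau)^{-\gamma}$, we have $\log Q(\tau_{j})-\log Q(\tau_{1})=\gamma\log(1/l_{j})$, so the estimator recovers $\gamma$ exactly:

```python
import numpy as np

# Sketch of the refined Hill estimator with weights phi(l_j); plugging
# in exact Pareto quantiles recovers gamma exactly.
def refined_hill(q_hat, l, phi=None):
    """q_hat[j] estimates z^T beta(tau_j) at levels tau_j = 1 - l[j] k/n."""
    w = np.ones_like(l) if phi is None else phi(l)
    return np.sum(w * np.log(q_hat / q_hat[0])) / np.sum(w * np.log(1.0 / l))

gamma, s, J = 0.4, 0.7, 5
l = s ** np.arange(J)                 # l_j = s^{j-1}, j = 1, ..., J
k_over_n = 0.05                       # intermediate tail fraction k/n
tau = 1.0 - l * k_over_n              # tau_1 = tau_{0n} < ... < tau_J
q_exact = (1.0 - tau) ** (-gamma)     # exact Pareto conditional quantiles
gamma_hat = refined_hill(q_exact, l)  # equals gamma in this toy case
```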

The refined Hill estimator is a weighted variant of the Hill-type estimators found in the literature, but with the following key differences. First, $\hat{\gamma}$ utilizes a fixed number of log excesses at intermediate quantile levels of the same rate, whereas the Hill estimator typically relies on quantiles approaching one at varying rates, as in Wang, Li and He (2012). Second, $\hat{\gamma}$ allows for flexible weighting through $\phi$, while the Hill estimator commonly applies equal weights to upper quantiles (Hill, 1975; Wang, Li and He, 2012; Daouia et al., 2023). We discuss the choice of the weight function $\phi$ and the tuning parameters $J$ and $s$ in Section 2.4. Similar weighting schemes have been explored by other researchers to generalize the Hill and Pickands estimators (Drees, 1995; Daouia et al., 2013; He et al., 2022) and to estimate the Weibull tail coefficient (Gardes and Girard, 2016; He et al., 2020). These methods were developed in fixed-dimensional settings, where the theoretical guarantees of the refined estimators could be established using the tractable asymptotic normality of intermediate quantile estimates. While this strategy is not feasible in high-dimensional settings, we can leverage the uniform convergence results from Theorem 1 to establish the convergence rate of the refined Hill estimator. Recall from Condition C5(i) that there exists a positive sequence $d_{1n}$ such that $K_{1}(\mathbf{Z})\asymp d_{1n}$ holds for all $\mathbf{Z}\in\mathcal{Z}$.

Theorem 2.

Assume that $\mathbf{x}$ belongs to a compact set in $\mathcal{X}\subset\mathbb{R}^{p}$. Assume the conditions of Theorem 1, and that $s_{r}=o\left(\left(d_{n}^{\gamma+\rho}(n/k)^{\gamma+\rho}\right)\vee\left(d_{n}^{\gamma-\delta}d_{1n}(n/k)^{\gamma-\delta}\right)\right)$ and $d_{n}^{\rho}(n/k)^{-\rho}\vee d_{1n}d_{n}^{-\delta}(n/k)^{\delta}=o\left((ns/k)\sqrt{\log(p\vee n)/n}\right)$. Then we have

$$\hat{\gamma}-\gamma=O_{p}\left(\frac{ns}{k}\sqrt{\frac{\log(p\vee n)}{n}}\right).$$
Remark 2.

Theorem 2 shows that if $(ns/k)\sqrt{\log(p\vee n)/n}\rightarrow 0$, then $\hat{\gamma}\overset{p}{\rightarrow}\gamma$. In Theorem 2, the conditions on $s_{r}$ are imposed to account for the error in the second-order term in the tail expansion of $F_{Y}(\cdot|\mathbf{X})$. These conditions can be simplified for specific models, as discussed in Section 2.3. Note that the convergence rate of $\hat{\gamma}$ is $(ns/k)\sqrt{\log(p\vee n)/n}$, which is slower than the typical $k^{-1/2}$ rate achieved in the fixed-dimensional case (Wang and Tsai, 2009; Daouia et al., 2013). The difference arises from the slower convergence of the intermediate quantile estimators in high-dimensional settings. Furthermore, to ensure consistency, $k$ must satisfy $k>n^{1/2}$, which is stricter than the requirement in the fixed-dimensional case (Wang, Li and He, 2012), where $k>n^{\eta}$ with $\eta$ being a small positive constant. Finally, while the error bound of the $\ell_{1}$-penalized intermediate quantile estimator depends on $\gamma$, the convergence rate of $\hat{\gamma}$ remains invariant across $\gamma$.

2.3 Extreme Conditional Quantile Estimator

Let $\tau_{n}\rightarrow 1$ be an extreme quantile level such that $1-\tau_{n}=o(1-\tau_{0n})$. Suppose we are interested in estimating the $\tau_{n}$th conditional quantile of $Y$ given $\mathbf{X}=\mathbf{x}$. Direct estimation using the $\ell_{1}$-penalized quantile estimator in (2) is often inaccurate or unstable due to limited data in the extreme tail. However, we can leverage extreme value theory and extrapolate from intermediate to extreme quantiles by using the relationship between quantiles whose levels approach one at different rates.

Define $U_{Y}(t|\mathbf{X})=\inf\{y:F_{Y}(y|\mathbf{X})\geq 1-1/t\}=F^{-1}_{Y}(1-1/t|\mathbf{X})$ for $t\geq 1$, the $(1-1/t)$th quantile of $F_{Y}(\cdot|\mathbf{X})$. According to Corollary 1.2.10 in de Haan and Ferreira (2006), for a heavy-tailed distribution $F_{Y}(\cdot|\mathbf{X}=\mathbf{x})\in D(G_{\gamma})$, we have

$$\frac{U_{Y}(tz|\mathbf{x})}{U_{Y}(t|\mathbf{x})}\rightarrow z^{\gamma},\quad\text{as }t\rightarrow\infty.$$

Motivated by this, for a given $\mathbf{x}$, we estimate $Q_{Y}(\tau_{n}|\mathbf{x})$ by

$$\widehat{Q}_{Y}(\tau_{n}|\mathbf{x})=\left(\frac{1-\tau_{0n}}{1-\tau_{n}}\right)^{\hat{\gamma}}\mathbf{z}^{T}\widehat{\boldsymbol{\beta}}(\tau_{0n}),$$

where $\mathbf{z}=(1,\mathbf{x}^{T})^{T}$.
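A minimal sketch of the extrapolation step (the Pareto-type quantile and all constants are our illustrative assumptions): the intermediate quantile estimate $\mathbf{z}^{T}\widehat{\boldsymbol{\beta}}(\tau_{0n})$ is scaled by $\{(1-\tau_{0n})/(1-\tau_{n})\}^{\hat{\gamma}}$. For an exact Pareto conditional quantile $Q(\tau|\mathbf{x})=c(\mathbf{x})(1-\tau)^{-\gamma}$ and $\hat{\gamma}=\gamma$, the extrapolation is exact.

```python
import numpy as np

# Sketch of the extrapolation estimator above; with an exact Pareto
# conditional quantile and gamma_hat = gamma, the result is exact.
def extrapolate(q_intermediate, tau0, tau_n, gamma_hat):
    return ((1.0 - tau0) / (1.0 - tau_n)) ** gamma_hat * q_intermediate

gamma, c_x = 0.4, 2.0
tau0, tau_n = 0.95, 0.999
q0 = c_x * (1.0 - tau0) ** (-gamma)   # stand-in for z^T beta_hat(tau0)
q_extreme = extrapolate(q0, tau0, tau_n, gamma_hat=gamma)
q_true = c_x * (1.0 - tau_n) ** (-gamma)
```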

Theorem 3.

Assume the conditions of Theorem 2, and let $\tau_{n}\rightarrow 1$ be a quantile level such that $n(1-\tau_{n})=o(k)$. Then we have

$$\frac{\widehat{Q}_{Y}\left(\tau_{n}|\mathbf{x}\right)}{{Q}_{Y}\left(\tau_{n}|\mathbf{x}\right)}-1=O_{p}\left(\log\left(\frac{k}{n(1-\tau_{n})}\right)\frac{ns}{k}\sqrt{\frac{\log(p\vee n)}{n}}\right).$$

Additionally, if $\log[k/\{n(1-\tau_{n})\}](ns/k)\sqrt{\log(p\vee n)/n}\rightarrow 0$, then $\widehat{Q}_{Y}(\tau_{n}|\mathbf{x})/Q_{Y}(\tau_{n}|\mathbf{x})\overset{p}{\rightarrow}1$.

According to our estimation procedure, the error in estimating extreme quantiles primarily arises from the EVI estimator and the $\ell_{1}$-penalized intermediate quantile estimator. Specifically, the error rate of the EVI estimator and the relative error rate of the intermediate quantile estimator are both $(ns/k)\sqrt{\log(p\vee n)/n}$. Since the EVI estimator contributes to the error through the exponent, the final error rate for the extreme quantile estimation is $\log[k/\{n(1-\tau_{n})\}](ns/k)\sqrt{\log(p\vee n)/n}$. In the fixed-dimensional case, the rate of the relative error of the extreme quantile estimator is $\log[k/\{n(1-\tau_{n})\}]/\sqrt{k}$. Comparing this with our results, we observe that the inflated term arises from the EVI estimator, which has an error rate of $1/\sqrt{k}$ in the fixed-dimensional setting.

Remark 3.

In this paper, we assume that F_{Y}(\cdot|\mathbf{X})\in D(G_{\gamma}), which represents a first-order condition on F_{Y}(\cdot|\mathbf{X}). To derive the properties of \widehat{Q}_{Y}(\tau_{n}|\mathbf{x}), a second-order condition is needed. Condition C5 assumes the existence of an auxiliary line so that U=Y-\mathbf{Z}^{T}\boldsymbol{\theta}_{r} satisfies the second-order condition. Under this condition, we can show that F_{Y}(\cdot|\mathbf{X}) also satisfies the second-order condition even for high-dimensional \mathbf{X} (see Lemma 3), and it has the same EVI across \mathbf{X}\in\mathcal{X}. This implies that the EVI can technically be estimated based on the intermediate conditional quantiles at any \mathbf{x}\in\mathcal{X}. Theorem 2 establishes the convergence rate for \hat{\gamma} obtained at \mathbf{X}=\mathbf{x} belonging to a compact set, which we now denote as \hat{\gamma}(\mathbf{x}) to emphasize its dependence on \mathbf{x}. Careful selection of \mathbf{x} is important, particularly in high-dimensional cases where data points may be sparse due to the curse of dimensionality. In practice, we recommend estimating the EVI at \bar{\mathbf{x}}=n^{-1}\sum_{i=1}^{n}\mathbf{X}_{i}, as this leads to more stable numerical performance and a faster convergence rate than \hat{\gamma}(\mathbf{x}), whose convergence rate is presented in Remark 2. Specifically, \hat{\gamma}(\bar{\mathbf{x}})-\gamma=O_{p}\left((n/k)\sqrt{s\log(p\vee n)/n}\right); see the proof of Theorem 2 in the Supplementary Material for more details. A pooled estimator that averages \{\hat{\gamma}(\mathbf{X}_{i}),i=1,\ldots,n\} could also improve estimation efficiency; however, this approach is computationally intensive and less practical in high-dimensional settings.

The regularity conditions outlined in our paper encompass a broad range of conventional regression settings. To illustrate these conditions, we consider the location-scale shift model as a representative example.

Example 1. Consider the location-scale shift model:

Y=\alpha+\mathbf{X}^{T}\boldsymbol{\beta}+(1+\mathbf{X}^{T}\widetilde{\boldsymbol{\sigma}})\varepsilon,

where 1+\mathbf{X}^{T}\widetilde{\boldsymbol{\sigma}}>0 for \mathbf{X}\in\mathcal{X}, and \varepsilon\sim F_{0}(\cdot) with F_{0} satisfying the second-order condition for some \gamma>0 and \rho<0. Obviously, Q_{Y}(\tau|\mathbf{X})=(\alpha+F_{0}^{-1}(\tau))+\mathbf{X}^{T}(\boldsymbol{\beta}+\widetilde{\boldsymbol{\sigma}}F_{0}^{-1}(\tau)).
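This quantile identity can be checked numerically. The sketch below uses made-up values for \alpha, \boldsymbol{\beta}, and \widetilde{\boldsymbol{\sigma}}, takes F_{0}=t(5), and replaces F_{0}^{-1}(\tau) with an empirical quantile of the simulated errors; since Y is an increasing affine transform of \varepsilon at a fixed \mathbf{x}, the empirical quantile of Y reproduces the formula exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 10_000, 0.9
alpha = 1.0
beta = np.array([2.0, -1.0])
sigma_t = np.array([0.5, 0.0])        # sparse scale effect: one nonzero entry
x = np.array([0.4, 0.8])              # fixed covariate value, 1 + x'sigma = 1.2 > 0

eps = rng.standard_t(5, size=n)       # F_0 = t(5), so gamma = 1/5
y = alpha + x @ beta + (1.0 + x @ sigma_t) * eps

# Q_Y(tau | x) = (alpha + F0^{-1}(tau)) + x'(beta + sigma * F0^{-1}(tau))
q_eps = np.quantile(eps, tau)         # empirical stand-in for F0^{-1}(tau)
q_y = np.quantile(y, tau)
q_formula = (alpha + q_eps) + x @ (beta + sigma_t * q_eps)
# q_y and q_formula coincide, illustrating the displayed quantile identity
```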

Suppose that the scale effect is sparse with finitely many nonzero components, i.e., \|\widetilde{\boldsymbol{\sigma}}\|_{0}<\infty. Under this model, Condition C5 holds with K(\mathbf{Z})=(1+\mathbf{X}^{T}\widetilde{\boldsymbol{\sigma}})^{1/\gamma} and K_{1}(\mathbf{Z})=C\{(1+\mathbf{X}^{T}\widetilde{\boldsymbol{\sigma}})^{-\rho/\gamma}-1\}, where C is some constant and \delta=-\rho. Details of the derivation can be found in the Supplementary Material. Let \|\boldsymbol{\beta}\|_{0}=n^{b}, where 0\leq b<1. Then d_{n}\sim C, d_{1n}\sim C, and s_{r}=n^{b}. Thus the conditions of Theorem 2 are k\rightarrow\infty, k/n\rightarrow 0, \sqrt{s\log p}\sqrt{n}\lesssim k, and n^{b}(k/n)^{\gamma}\rightarrow 0 for any \gamma>0, which implies \sqrt{s\log p}\sqrt{n}\lesssim k<\min\{ns_{r}^{-1/\gamma},n\}. The error rate is \sup_{1-\tau\in\mathcal{T}_{n}}\|\widehat{\boldsymbol{\beta}}(\tau)-\boldsymbol{\beta}(\tau)\|=O_{p}\left((n/k)^{\gamma+1}\sqrt{s\log(p\vee n)/n}\right). Next, we consider two special cases to simplify the conditions on k and discuss the feasibility.

  • (I).

The non-zero components of \boldsymbol{\beta} and \boldsymbol{\beta}(\tau) are at the same positions, i.e., s=s_{r}=n^{b}, where 0<b<1.

Theorem 1 requires the condition \sqrt{s\log p}\sqrt{n}\lesssim k<ns^{-1/\gamma}. The additional conditions on k in Theorems 2 and 3 are d_{n}^{\rho}(k/n)^{-\rho}\vee d_{1n}d_{n}^{-\delta}(k/n)^{\delta}=o((ns/k)\sqrt{\log(p\vee n)/n}) and s_{r}=o\left((d_{n}^{\gamma+\rho}(n/k)^{\gamma+\rho})\vee(d_{n}^{\gamma-\delta}d_{1n}(n/k)^{\gamma-\delta})\right), which can be simplified to k\lesssim(s^{2}\log p)^{1/2(1-\rho)}n^{1-1/2(1-\rho)} and k\lesssim ns^{-1/(\gamma+\rho)}, respectively. To obtain consistency of \hat{\gamma}, the condition on k is s\sqrt{\log p}\sqrt{n}\lesssim k. In summary, the condition on k is s\sqrt{\log p}\sqrt{n}\lesssim k\lesssim\left\{(s^{2}\log p)^{1/2(1-\rho)}n^{1-1/2(1-\rho)}\right\}\wedge ns^{-1/(\gamma+\rho)}. To ensure the existence of an appropriate k, the additional sparsity conditions are s^{1-1/(2\rho)}\log(p)/n\rightarrow 0 and s^{1+2/(\gamma+\rho)}\log p/n\rightarrow 0.

  • (II).

Suppose that \|\boldsymbol{\beta}\|_{0} is finite, that is, b=0.

In this case, the condition on k becomes less stringent. Specifically, the condition in Theorem 1 simplifies to \sqrt{\log p}\sqrt{n}\lesssim k<n. Furthermore, the conditions on k in Theorems 2 and 3 can be expressed as \sqrt{\log p}\sqrt{n}\lesssim k\lesssim(\log p)^{1/2(1-\rho)}n^{1-1/2(1-\rho)}. These conditions are feasible for any \rho<0.

2.4 Computational Issues

The proposed method relies on several tuning parameters/functions, including the penalization parameter \lambda, the integer k associated with the intermediate quantile level \tau_{0n}=1-k/n, the weight function \phi(\cdot), and the number of intermediate quantile levels J in the refined Hill estimator. In this section, we discuss practical methods for selecting these quantities.

Selection of \lambda. In high-dimensional settings, the optimal choice of \lambda depends on many factors, including p, the design matrix, and the error distribution. Bickel et al. (2009) suggested that \lambda should exceed the supremum norm of a suitably rescaled (sub)gradient of the sample criterion function, and Belloni and Chernozhukov (2011) proposed a similar approach for selecting \lambda at central quantiles. However, applying this method directly to intermediate quantiles often results in over-shrinkage. Our theoretical study suggests that at intermediate quantiles, \lambda should scale as C\sqrt{n\log p}/\sqrt{1-\tau_{0n}}, but accurately estimating the constant C is challenging, as noted in Homrighausen and McDonald (2017) and Chetverikov et al. (2021). In our implementation, we conduct 10-fold cross-validation to select \lambda for each \tau_{j},j=1,2,\ldots,J. This involves splitting the data into ten folds and using each fold to evaluate the estimator fitted on the remaining folds. For each candidate \lambda, we calculate the average quantile loss at \tau_{j} across all folds and select the \lambda that minimizes this loss. While time-intensive, our numerical studies show that cross-validation performs well across the various settings considered. Developing an efficient and data-driven selection method tailored to intermediate quantiles is an interesting direction for future research.
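The cross-validation loop described above can be sketched as follows. The quantile (check) loss is standard; the solver `fit` is left abstract because the \ell_{1}-penalized quantile regression solver itself is not reproduced here, so any routine with the assumed signature can be plugged in:

```python
import numpy as np

def quantile_loss(y, pred, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}), averaged over observations."""
    u = y - pred
    return np.mean(u * (tau - (u < 0)))

def cv_select_lambda(X, y, tau, lambdas, fit, n_folds=10, seed=0):
    """10-fold cross-validation over a grid of lambda values.
    `fit(X, y, tau, lam)` is any penalized quantile-regression solver
    returning coefficients (intercept first); it is a placeholder for
    the l1-penalized intermediate quantile estimator."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    avg_loss = np.zeros(len(lambdas))
    for i, lam in enumerate(lambdas):
        for f in range(n_folds):
            tr, te = folds != f, folds == f
            coef = fit(X[tr], y[tr], tau, lam)
            pred = coef[0] + X[te] @ coef[1:]
            avg_loss[i] += quantile_loss(y[te], pred, tau) / n_folds
    return lambdas[int(np.argmin(avg_loss))]
```

The selector simply returns the \lambda with the smallest average out-of-fold check loss at the given quantile level.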

Selection of k. The intermediate quantile level, or equivalently k, plays an important role in balancing bias and variance. A small k increases variance, while a large k leads to higher bias. A common approach is to select k at the first stable point in the plot of the EVI estimator versus k; see Neves et al. (2015) for a heuristic algorithm. However, this method is computationally intensive in high-dimensional cases, as the estimator needs to be calculated over a sequence of k values, and the EVI may exhibit local variations across \mathbf{x}. To reduce the computational cost, we propose a rule of thumb for selecting k. Based on Theorem 2 and Example 1, k should satisfy \sqrt{s\log p}\sqrt{n}\lesssim k<\min\{ns_{r}^{-1/\gamma},n\}. Motivated by this, we set k=\lfloor c_{0}n^{0.5+\delta_{1}}(\log p)^{0.5+\delta_{2}}\rfloor, where c_{0}>0 is a constant, and \delta_{1} and \delta_{2} are small positive constants. The sensitivity analysis in the Supplementary Material shows that our proposed method remains stable for c_{0}\in[0.5,2.3], \delta_{1}\in[0.005,0.016], and \delta_{2}\in[0.01,0.08], across various scenarios and sample sizes. Throughout our numerical studies, we use c_{0}=0.8, \delta_{1}=0.01, and \delta_{2}=0.05.
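The rule of thumb is a one-line computation; a sketch with the recommended defaults (the function name is ours):

```python
import math

def rule_of_thumb_k(n, p, c0=0.8, d1=0.01, d2=0.05):
    """Rule of thumb k = floor(c0 * n^(0.5 + d1) * (log p)^(0.5 + d2)),
    with the recommended defaults c0 = 0.8, delta_1 = 0.01, delta_2 = 0.05.
    The intermediate quantile level is then tau_0n = 1 - k / n."""
    return math.floor(c0 * n ** (0.5 + d1) * math.log(p) ** (0.5 + d2))

# e.g. n = 1000 and p = 32 (roughly the p = ceil(n^0.5) simulation setting)
k = rule_of_thumb_k(1000, 32)
```

Note that k grows slightly faster than \sqrt{n\log p}, matching the lower bound \sqrt{s\log p}\sqrt{n}\lesssim k in the sparse regime while keeping k/n\rightarrow 0.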

Choice of weights. The refined Hill estimator assigns different weights to J intermediate quantiles. In our implementation, we use \phi(\ell_{j})=\ell_{j}^{a} and \ell_{j}=s^{j-1}, where a,s\in(0,1), as recommended by He et al. (2022) for extreme quantile analysis in an autoregressive model. The authors showed that, in their setup, the asymptotic variance of the refined Hill estimator is proportional to a term depending only on (J,s,a); minimizing this term yields the optimal choice of (J,s,a). While the limiting distribution is intractable without debiasing in high-dimensional settings, we find that the same parameters continue to perform well, resulting in more efficient estimation than methods based on the unweighted Hill estimator.
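To illustrate the weighting idea only (the refined Hill estimator in the paper operates on intermediate conditional quantile estimates, and its exact form is not reproduced in this section), one can combine log-ratios of marginal sample quantiles at the levels \tau_{j}=1-\ell_{j}k/n with weights \phi(\ell_{j})=\ell_{j}^{a}:

```python
import numpy as np

def weighted_hill(y, k, J=5, s=0.8, a=0.5):
    """A sketch of a weighted Hill-type EVI estimator: a weighted
    least-squares fit through the origin of log(q_j / q_0) on log(1 / l_j),
    where q_j is the sample quantile at level 1 - l_j * k / n,
    l_j = s**(j-1), and the weights are phi(l_j) = l_j**a.  For a
    heavy-tailed sample, log(q_j / q_0) ~ gamma * log(1 / l_j)."""
    n = len(y)
    ell = s ** np.arange(J)                  # l_1 = 1, l_2 = s, ..., l_J
    phi = ell ** a
    q0 = np.quantile(y, 1.0 - k / n)         # baseline intermediate quantile
    qj = np.quantile(y, 1.0 - ell * k / n)   # higher intermediate quantiles
    xj = np.log(1.0 / ell)
    return np.sum(phi * xj * np.log(qj / q0)) / np.sum(phi * xj ** 2)

# Sanity check on exact Pareto data with gamma = 0.5 (tail index 2):
rng = np.random.default_rng(0)
y = (1.0 - rng.random(200_000)) ** -0.5
gamma_hat = weighted_hill(y, k=2000)
```

The geometric spacing \ell_{j}=s^{j-1} concentrates the levels near the baseline \tau_{0n}, and the weights \ell_{j}^{a} downweight the more extreme, noisier quantiles.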

3 Simulation Study

We conduct a simulation study to investigate the performance of the proposed method for estimating extreme quantiles in high-dimensional settings. The data are generated from

Y_{i}=X_{i1}+X_{i2}+(1+wX_{i1})\varepsilon_{i},\qquad i=1,2,\ldots,n,

where X_{ij}\sim U(0,1) for j=1,2,\ldots,p, the \varepsilon_{i} are independent and identically distributed Student's t random variables, and w is a constant controlling the degree of heteroscedasticity. Under this model, the \tauth conditional quantile of Y is Q_{Y}(\tau|\mathbf{X}_{i})=\alpha(\tau)+\mathbf{X}_{i}^{T}\boldsymbol{\beta}(\tau), where \mathbf{X}_{i}=(X_{i1},X_{i2},\ldots,X_{ip})^{T}, \alpha(\tau)=Q_{\varepsilon}(\tau), \boldsymbol{\beta}(\tau)=(1+wQ_{\varepsilon}(\tau),1,0,\ldots,0)^{T}, and Q_{\varepsilon}(\tau) is the \tauth quantile of \varepsilon_{i}. We consider six different cases. In Cases 1, 3, and 5, \varepsilon_{i}\sim t(5), and in Cases 2, 4, and 6, \varepsilon_{i}\sim t(2). Cases 1-2 have homoscedastic errors with w=0, while w=0.5 in Cases 3-4 and w=0.9 in Cases 5-6. Note that, for t(v), the EVI is \gamma=1/v. We consider two settings for the sample size n and dimension p: (1) p=p(n) grows with n but remains smaller than n, with p=\lceil n^{0.5}\rceil and n=1000,5000; (2) p is larger than or equal to n, with (n,p)=(1000,1000) and (1000,1200). For each scenario, the simulation is repeated 500 times.
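The data-generating process and the true conditional quantile can be coded directly; the following is a sketch with our own function names, taking the \tauth error quantile Q_{\varepsilon}(\tau) as an input:

```python
import numpy as np

def generate_case(n, p, w, df, seed=0):
    """Simulate Y_i = X_i1 + X_i2 + (1 + w * X_i1) * eps_i with
    X_ij ~ U(0, 1) and eps_i ~ t(df); e.g. Case 3 has (w, df) = (0.5, 5)."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, p))
    eps = rng.standard_t(df, size=n)
    y = X[:, 0] + X[:, 1] + (1.0 + w * X[:, 0]) * eps
    return X, y

def true_quantile(X, q_eps, w):
    """Q_Y(tau | X) = q_eps + X^T beta(tau), with q_eps = Q_eps(tau) and
    beta(tau) = (1 + w * q_eps, 1, 0, ..., 0)^T."""
    return q_eps + X[:, 0] * (1.0 + w * q_eps) + X[:, 1]
```

Only the first two covariates carry signal, so the true coefficient vector is sparse with s independent of p.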

For each simulation, we estimate Q_{Y}(\tau_{n}|\mathbf{x}) at \tau_{n}=0.995,0.999 using three methods: the extreme quantile regression method (EQR), without assuming common slopes, of Wang, Li and He (2012); the high-dimensional quantile regression (HQR) of Belloni and Chernozhukov (2011); and the proposed high-dimensional extreme quantile regression (HEQR). For EQR, the number of upper-order statistics k is set to [4.5n^{1/3}], as recommended in Wang, Li and He (2012).

For each simulation, we calculate the integrated squared error (ISE), defined as

\mathrm{ISE}=\frac{1}{L}\sum_{l=1}^{L}\left\{\frac{\widehat{Q}_{Y}(\tau_{n}|\mathbf{X}_{l}^{*})}{Q_{Y}(\tau_{n}|\mathbf{X}_{l}^{*})}-1\right\}^{2},

where \mathbf{X}_{1}^{*},\ldots,\mathbf{X}_{L}^{*} are evaluation points of the covariates, and we define the mean integrated squared error (MISE) as the average ISE across 500 simulations. In our simulation, we set L=100 and let \mathbf{X}_{l}^{*} be random replicates of \mathbf{X}.
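The ISE computation itself is a one-liner, and the MISE then averages it over the replicates (a sketch with our own function name):

```python
import numpy as np

def integrated_squared_error(q_hat, q_true):
    """ISE = L^{-1} * sum_l (q_hat_l / q_true_l - 1)^2 over the L
    evaluation points X_1^*, ..., X_L^*."""
    ratio = np.asarray(q_hat, dtype=float) / np.asarray(q_true, dtype=float)
    return np.mean((ratio - 1.0) ** 2)

# MISE: average the ISE across simulation replicates, e.g.
#   mise = np.mean([integrated_squared_error(qh, qt) for qh, qt in replicates])
```

Because the error is measured on the ratio scale, the metric is comparable across quantile levels of very different magnitudes.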

Table 1: The mean integrated squared error (MISE) and standard error (in parentheses) of different estimators for the extreme conditional quantile at \tau_{n}=0.995,0.999 for \gamma=0.2 and p=\lceil n^{0.5}\rceil. All values are in percentages.
w n \tau_{n} HEQR EQR HQR
0 1000 0.995 2.9(0.05) 3.6(0.05) 5.7(0.13)
0.999 4.7(0.09) 5.2(0.09) 12.5(0.12)
5000 0.995 1.6(0.02) 2.7(0.02) 3.7(0.04)
0.999 2.4(0.03) 4.2(0.05) 6.1(0.06)
0.5 1000 0.995 3.3(0.05) 4.4(0.06) 7.5(0.21)
0.999 4.8(0.10) 5.8(0.10) 14.6(0.14)
5000 0.995 2.0(0.02) 3.4(0.03) 5.0(0.06)
0.999 2.4(0.03) 4.8(0.06) 6.9(0.08)
0.9 1000 0.995 3.7(0.05) 5.2(0.07) 9.0(0.29)
0.999 5.0(0.09) 6.5(0.12) 15.8(0.17)
5000 0.995 2.3(0.02) 4.0(0.04) 6.0(0.08)
0.999 2.6(0.03) 5.1(0.06) 7.8(0.12)
  • HEQR: the proposed estimator; EQR: the method in Wang, Li and He (2012); HQR: the high-dimensional quantile regression estimator in Belloni and Chernozhukov (2011).

Table 2: The mean integrated squared error (MISE) and standard error (in parentheses) of different estimators for the extreme conditional quantile at \tau_{n}=0.995,0.999 for \gamma=0.5 and p=\lceil n^{0.5}\rceil. All values are in percentages.
w n \tau_{n} HEQR EQR HQR
0 1000 0.995 7.4(0.26) 10.7(0.56) 24.6(3.33)
0.999 15.0(0.68) 19.9(2.11) 69.7(16.05)
5000 0.995 5.3(0.11) 7.7(0.12) 12.7(0.28)
0.999 7.2(0.23) 10.5(0.26) 28.5(2.78)
0.5 1000 0.995 8.5(0.31) 13.5(0.70) 32.3(3.70)
0.999 15.4(0.84) 23.8(2.63) 96.8(28.76)
5000 0.995 6.3(0.14) 9.7(0.18) 17.1(0.42)
0.999 8.1(0.33) 11.9(0.39) 36.5(3.93)
0.9 1000 0.995 9.6(0.33) 16.1(0.80) 40.5(4.25)
0.999 16.0(0.86) 27.3(2.86) 119.9(38.54)
5000 0.995 7.3(0.17) 11.5(0.23) 21.5(0.58)
0.999 9.4(0.39) 13.5(0.49) 46.9(5.11)
  • HEQR: the proposed estimator; EQR: the method in Wang, Li and He (2012); HQR: the high-dimensional quantile regression estimator in Belloni and Chernozhukov (2011).

Table 3: The mean integrated squared error (MISE) and standard error (in parentheses) of different estimators for the extreme conditional quantile at \tau_{n}=0.995,0.999 for the setting p\geq n, with (n,p)=(1000,1000) and (1000,1200). All values are in percentages.
w \gamma p \tau_{n} HEQR HQR
0 0.2 1000 0.995 0.23(0.03) 0.19(0.02)
0.999 0.23(0.03) 0.43(0.03)
1200 0.995 0.20(0.02) 0.16(0.02)
0.999 0.22(0.03) 0.30(0.03)
0.5 0.2 1000 0.995 0.26(0.03) 0.23(0.02)
0.999 0.26(0.03) 0.50(0.03)
1200 0.995 0.23(0.03) 0.20(0.02)
0.999 0.24(0.03) 0.35(0.03)
0 0.5 1000 0.995 0.32(0.04) 0.44(0.33)
0.999 0.41(0.06) 0.63(0.13)
1200 0.995 0.41(0.05) 0.36(0.23)
0.999 0.51(0.05) 0.55(0.14)
0.5 0.5 1000 0.995 0.34(0.06) 0.52(0.71)
0.999 0.39(0.07) 0.67(0.17)
1200 0.995 0.34(0.05) 0.41(0.41)
0.999 0.44(0.08) 0.58(0.20)
  • HEQR: the proposed estimator; HQR: the high-dimensional quantile regression estimator in Belloni and Chernozhukov (2011).

We begin by examining the p=\lceil n^{0.5}\rceil setting. Tables 1 and 2 summarize the MISE of the three estimators for the extreme conditional quantile at \tau_{n}=0.995 and 0.999. The results show that for all three methods, both the MISE and its standard error increase as \tau_{n} approaches 1 and as the distribution becomes more heavy-tailed or heteroscedastic. In all scenarios, the proposed method HEQR outperforms the others in terms of both MISE and standard error. Both EQR and HEQR rely on extrapolation using extreme value theory. In this setting, since p increases with n but remains smaller than n, the EQR method is still applicable, and both EQR and HEQR perform better than HQR. The advantages of HEQR over EQR come from its penalized estimation of \boldsymbol{\beta}(\tau) and its use of the refined Hill estimator, compared to EQR's unweighted Hill estimator.

We next examine the setting where p\geq n. Table 3 summarizes the results for the two methods HEQR and HQR, with (n,p)=(1000,1000) and (1000,1200), as EQR is not applicable in these cases. While HQR performs reasonably well for small \gamma=0.2 at \tau_{n}=0.995, it is significantly less efficient than HEQR in all other scenarios, in terms of both MISE and its standard error. Overall, the results suggest that HEQR demonstrates superior accuracy and stability, particularly for estimating extremely high quantiles of heavy-tailed distributions.

We perform a sensitivity analysis to assess the stability of the proposed method with respect to the choices of c_{0}, \delta_{1}, and \delta_{2} in the rule of thumb k=\lfloor c_{0}n^{0.5+\delta_{1}}(\log p)^{0.5+\delta_{2}}\rfloor. For brevity, we present results only for Case 1; the other cases exhibit similar behavior, and the results are provided in the Supplementary file. Figure 1 shows the MISE of EQR, HQR, and HEQR for the extreme conditional quantiles at \tau_{n}=0.995 against c_{0}\in[0.4,3], with (\delta_{1},\delta_{2}) fixed at (0.01,0.05), for n=1000 and 5000. The two horizontal lines represent the MISE of HQR and EQR, respectively, and the shaded area denotes the 95% pointwise confidence band of the MISE. For c_{0}\in[0.5,2.3], the MISE of HEQR is consistently smaller than that of HQR and EQR, indicating that c_{0} in this range is a suitable choice. Figures 2 and 3 plot the MISE against \delta_{1}\in[0.005,0.03] and \delta_{2}\in[0.02,0.08], respectively, with the other two constants fixed at their recommended values. The horizontal line represents the MISE associated with the recommended values c_{0}=0.8, \delta_{1}=0.01, and \delta_{2}=0.05, while the shaded area corresponds to the 95% pointwise confidence band of the MISE. The results indicate that the method is insensitive to variations in \delta_{1}\in[0.005,0.016] and \delta_{2}\in[0.01,0.08].

(a) n=1000
(b) n=5000
Figure 1: The MISE of estimators for the extreme conditional quantile at \tau_{n}=0.995 against c_{0} in Case 1 with n=1000 (a) and n=5000 (b).
(a) \tau_{n}=0.995
(b) \tau_{n}=0.999
Figure 2: The MISE of the HEQR estimator for the extreme conditional quantile at \tau_{n}=0.995 (a) and 0.999 (b) against \delta_{1} in Case 1.
(a) \tau_{n}=0.995
(b) \tau_{n}=0.999
Figure 3: The MISE of the HEQR estimator for the extreme conditional quantile at \tau_{n}=0.995 (a) and 0.999 (b) against \delta_{2} in Case 1.

4 Analysis of Auto Insurance Claims

In this section, we analyze an auto insurance claims dataset to investigate the effects of various factors on the higher quantiles of claim amounts. The data is available on Kaggle at https://www.kaggle.com/datasets/xiaomengsun/car-insurance-claim-data. Insurance claims data are typically heavy-tailed and heterogeneous, reflecting the diverse nature of risks they embody and the array of factors that can influence claim sizes and frequencies. These factors not only complicate the analysis but also provide a rich vein of insights. We consider the following quantile regression model:

Q_{Y_{i}}(\tau|\mathbf{X})=\beta_{0}(\tau)+X_{i1}\beta_{1}(\tau)+X_{i2}\beta_{2}(\tau)+\cdots+X_{ip}\beta_{p}(\tau),\quad i=1,\ldots,n=8423,

where Y represents the auto claim amount (in USD), and X_{i1},X_{i2},\ldots,X_{ip} are p=43 covariates included after preprocessing. To account for nonlinearity, we include quadratic terms for “Age” and “Income,” as well as quadratic and cubic terms for “Vehicle Value,” in the model, with both “Income” and “Vehicle Value” on the log-transformed scale. A detailed description of the variables can be found in Section S.4 of the Supplementary Materials. We focus on high quantiles (\tau=0.991,0.995,0.999) to study the impact of various factors on large claims.

We conduct a cross-validation study to compare the performance of different methods for predicting the extremely high conditional quantiles of auto claims. We randomly divide the data set into a training set (20% of the data, n_{1}=1684 observations) and a testing set (the remaining 80%, n_{2}=6739 observations). We apply HEQR, EQR, and HQR to analyze the training set and predict the extreme quantiles of auto claims conditional on the covariates in the testing set. For HEQR, we set k=\lfloor 0.8n_{1}^{0.51}(\log p)^{0.55}\rfloor and estimate \gamma at a central data point, with continuous covariates set to their mean values and binary variables set to their modes. Since \mathbbm{1}\{Y<Q_{Y}(\tau|\mathbf{X})\} has mean \tau and variance \tau(1-\tau), this motivates us to consider the following prediction error (PE) metric:

\mathrm{PE}=\{n_{2}\tau(1-\tau)\}^{-1/2}\sum_{j\in\mathcal{I}}[\tau-\mathbbm{1}\{Y_{j}<\widehat{Q}_{\tau}(Y|\mathbf{X}_{j})\}],

where \mathcal{I} is the index set of the testing set. The cross-validation process is repeated 500 times. Table 4 summarizes the median absolute PE for the different methods at \tau_{n}=0.991, 0.995, and 0.999, with values in parentheses representing the corresponding median absolute deviation. The results indicate that HEQR yields significantly smaller prediction errors than the other two methods at all three quantile levels, demonstrating higher efficiency.
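A sketch of the PE metric (the function name is ours): when the predicted quantile is correctly calibrated, each summand has mean zero and variance \tau(1-\tau), so PE behaves like a standardized sum and values near zero indicate good calibration.

```python
import numpy as np

def prediction_error(y_test, q_pred, tau):
    """PE = {n2 * tau * (1 - tau)}^{-1/2} * sum_j [tau - 1{y_j < q_pred_j}].
    Under correct calibration the summands have mean 0 and variance
    tau * (1 - tau), so PE is approximately standard normal."""
    y_test = np.asarray(y_test, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    n2 = len(y_test)
    return np.sum(tau - (y_test < q_pred)) / np.sqrt(n2 * tau * (1.0 - tau))
```

Unlike a quantile-loss comparison, this metric directly measures whether the empirical exceedance frequency on the test set matches the nominal level \tau.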

Table 4: The median absolute value (with median absolute deviation in parentheses) of prediction errors for different methods at \tau_{n}=0.991, 0.995, and 0.999, based on cross-validation of the auto claims data.
\tau_{n}=0.991 \tau_{n}=0.995 \tau_{n}=0.999
HEQR 1.84(1.13) 1.83(1.19) 1.28(0.63)
EQR 7.82(0.00) 5.82(0.00) 2.60(0.00)
HQR 8.95(2.06) 13.96(3.02) 32.86(6.74)
  • HEQR: the proposed estimator; EQR: the method in Wang, Li and He (2012); HQR: the high-dimensional quantile regression estimator in Belloni and Chernozhukov (2011).

Next, we analyze the full dataset to estimate extreme quantiles and identify important variables using HEQR. Out of the 43 covariates, our method identified 34 as having nonzero effects at the intermediate quantile level \tau_{0n}=1-k/n with k=\lfloor 0.8n^{0.51}(\log p)^{0.55}\rfloor and n=8423. Notably, one of the most significant variables identified by the HEQR model is whether an individual resides in a rural district. This finding aligns with prior research by Clemente et al. (2023), which links rural residence to lower auto insurance claims due to reduced claim frequency and severity. Additionally, our results show that an increase in vehicle age corresponds to a decrease in claims, consistent with Clemente et al. (2023), who also found that vehicle age negatively impacts claim frequency. Figure 4 shows the \ell_{1}-penalized quantile coefficient estimates \widehat{\beta}_{j}(\tau) for selected covariates at upper quantiles. The results indicate that being a commercial vehicle positively affects auto claims, with the effect increasing at higher quantiles. In contrast, “Rural Population” and “Time in Force” (the duration the insurance policy has been in effect, abbreviated as TIF) have negative effects, which become stronger at higher quantiles. The variable “Bachelor's Degree” shows little impact on auto claims. These findings highlight the heterogeneity of covariate effects, particularly in the upper tail of the claim distribution.

Figure 4: The estimated quantile coefficients of selected covariates for the auto claims data versus the quantile level at the upper tail.

We also include the full-data analysis results from HQR for comparison. Figure 5 presents the extreme quantile estimates of auto claims at \tau_{n}=0.991 and 0.999 as a function of X_{20}, the log-transformed Vehicle Value, for two groups of individuals with different TIF from both HQR and HEQR, with other continuous covariates set to their mean values and categorical variables set to their modes. The two groups correspond to individuals with TIF of 1 year and 7 years, representing the 0.25 and 0.75 quantiles of TIF, respectively. Results from HEQR suggest that, for both groups, higher vehicle values lead to larger claims, likely due to higher repair costs for more expensive vehicles, which aligns with common expectations and prior findings (Clemente et al., 2023). In contrast, HQR shows little variation in high quantiles of claims across vehicle values. Both methods indicate that individuals with a shorter TIF tend to have higher claims at \tau_{n}=0.991. This finding aligns with our common understanding, as shorter TIFs may reflect less experienced or stable policyholders, potentially leading to higher-risk profiles and more frequent or severe claims. HEQR shows a consistent effect of TIF across high quantiles, while HQR becomes unstable at \tau_{n}=0.999, displaying almost no difference between the two groups.

(a) \tau_{n}=0.991
(b) \tau_{n}=0.999
Figure 5: The estimated extreme conditional quantiles of auto claims (in $) from HEQR and HQR against the log-transformed Vehicle Value (X_{20}) for two groups, time in force equal to 1 (TIF=1) and 7 (TIF=7), at \tau_{n}=0.991 and 0.999, with other continuous covariates set to their mean values and categorical variables set to their modes.

SUPPLEMENTARY MATERIAL

Supplementary Material for High-dimensional Extreme Quantile Regression:

The supplementary material includes all the technical conditions, proofs of theorems, additional simulation results, and further details on the analysis of the auto insurance claims data. (.pdf file)

Auto Insurance Claims Data Set:

Data set used in the illustration of the proposed methods in Section 4. (.csv file)

References

  • Belloni and Chernozhukov (2011) Belloni, A. and Chernozhukov, V. (2011), ‘\ell_{1}-penalized quantile regression in high-dimensional sparse models’, Annals of Statistics 39(1), 82–130.
  • Belloni et al. (2019) Belloni, A., Chernozhukov, V., Chetverikov, D. and Fernández-Val, I. (2019), ‘Conditional quantile processes based on series or many regressors’, Journal of Econometrics 213(1), 4–29.
  • Bickel et al. (2009) Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009), ‘Simultaneous analysis of lasso and dantzig selector’, Annals of Statistics 37(4), 1705–1732.
  • Bradic and Kolar (2017) Bradic, J. and Kolar, M. (2017), ‘Uniform inference for high-dimensional quantile regression: linear functionals and regression rank scores’, arXiv:1702.06209.
  • Chernozhukov (2005) Chernozhukov, V. (2005), ‘Extremal quantile regression’, Annals of Statistics 33(2), 806–839.
  • Chetverikov et al. (2021) Chetverikov, D., Liao, Z. and Chernozhukov, V. (2021), ‘On cross-validated lasso in high dimensions’, Annals of Statistics 49(3), 1300–1317.
  • Clemente et al. (2023) Clemente, C., Guerreiro, G. R. and Bravo, J. M. (2023), ‘Modelling motor insurance claim frequency and severity using gradient boosting’, Risks 11(9), 1–20.
  • Daouia et al. (2013) Daouia, A., Gardes, L. and Girard, S. (2013), ‘On kernel smoothing for extremal quantile regression’, Bernoulli 19(5B), 2557–2589.
  • Daouia et al. (2023) Daouia, A., Stupfler, G. and Usseglio-Carleve, A. (2023), ‘Inference for extremal regression with dependent heavy-tailed data’, Annals of Statistics 51(5), 2040–2066.
  • de Haan and Ferreira (2006) de Haan, L. and Ferreira, A. (2006), Extreme Value Theory: An Introduction, Springer Science & Business Media.
  • Drees (1995) Drees, H. (1995), ‘Refined pickands estimators of the extreme value index’, Annals of Statistics 23(6), 2059–2080.
  • Fan et al. (2014) Fan, J., Fan, Y. and Barut, E. (2014), ‘Adaptive robust variable selection’, Annals of Statistics 42(1), 324–351.
  • Gardes and Girard (2016) Gardes, L. and Girard, S. (2016), ‘On the estimation of the functional weibull tail-coefficient’, Journal of Multivariate Analysis 146, 29–45.
  • Gardes and Stupfler (2019) Gardes, L. and Stupfler, G. (2019), ‘An integrated functional weissman estimator for conditional extreme quantiles’, REVSTAT-Statistical Journal 17(1), 109–144.
  • Gnecco et al. (2024) Gnecco, N., Terefe, E. M. and Engelke, S. (2024), ‘Extremal random forests’, Journal of the American Statistical Association, DOI: https://doi.org/10.1080/01621459.2023.2300522.
  • He et al. (2020) He, F., Wang, H. J. and Tong, T. (2020), ‘Extremal linear quantile regression with weibull-type tails’, Statistica Sinica 30(3), 1357–1377.
  • He et al. (2022) He, F., Wang, H. J. and Zhou, Y. (2022), ‘Extremal quantile autoregression for heavy-tailed time series’, Computational Statistics & Data Analysis 176, 107563.
  • He et al. (2023) He, X., Pan, X., Tan, K. M. and Zhou, W.-X. (2023), ‘Smoothed quantile regression with large-scale inference’, Journal of Econometrics 232(2), 367–388.
  • He and Shao (2000) He, X. and Shao, Q.-M. (2000), ‘On parameters of increasing dimensions’, Journal of Multivariate Analysis 73(1), 120–135.
  • Hill (1975) Hill, B. M. (1975), ‘A simple general approach to inference about the tail of a distribution’, Annals of Statistics 3(5), 1163–1174.
  • Homrighausen and McDonald (2017) Homrighausen, D. and McDonald, D. J. (2017), ‘Risk consistency of cross-validation with lasso-type procedures’, Statistica Sinica 49(3), 1017–1036.
  • Koenker (2005) Koenker, R. (2005), Quantile Regression, Cambridge University Press.
  • Koenker and Bassett (1978) Koenker, R. and Bassett, J. G. (1978), ‘Regression quantiles’, Econometrica 46(1), 33–50.
  • Koenker et al. (2017) Koenker, R., Chernozhukov, V., He, X. and Peng, L. (2017), Handbook of quantile regression, CRC press.
  • Li and Wang (2019) Li, D. and Wang, H. J. (2019), ‘Extreme quantile estimation for autoregressive models’, Journal of Business & Economic Statistics 37(4), 661–670.
  • Negahban et al. (2012) Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012), ‘A unified framework for high-dimensional analysis of m-estimators with decomposable regularizers’, Statistical Science 27(4), 538–557.
  • Neves et al. (2015) Neves, M. M., Gomes, M. I., Figueiredo, F. and Prata Gomes, D. (2015), ‘Modeling extreme events: sample fraction adaptive choice in parameter estimation’, Journal of Statistical Theory and Practice 9(1), 184–199.
  • Pan and Zhou (2021) Pan, X. and Zhou, W.-X. (2021), ‘Multiplier bootstrap for quantile regression: non-asymptotic theory under random design’, Information and Inference 10(3), 813–861.
  • Sasaki et al. (2024) Sasaki, Y., Tao, J. and Wang, Y. (2024), ‘High-dimensional tail index regression: with an application to text analyses of viral posts in social media’, arXiv:2403.01318.
  • Tan et al. (2022) Tan, K. M., Wang, L. and Zhou, W.-X. (2022), ‘High-dimensional quantile regression: Convolution smoothing and concave regularization’, Journal of the Royal Statistical Society Series B: Statistical Methodology 84(1), 205–233.
  • Velthoen et al. (2023) Velthoen, J., Dombry, C., Cai, J.-J. and Engelke, S. (2023), ‘Gradient boosting for extreme quantile regression’, Extremes 26(4), 639–667.
  • Wang and Li (2013) Wang, H. J. and Li, D. (2013), ‘Estimation of extreme conditional quantiles through power transformation’, Journal of the American Statistical Association 108(503), 1062–1074.
  • Wang, Li and He (2012) Wang, H. J., Li, D. and He, X. (2012), ‘Estimation of high conditional quantiles for heavy-tailed distributions’, Journal of the American Statistical Association 107(500), 1453–1464.
  • Wang and Tsai (2009) Wang, H. and Tsai, C.-L. (2009), ‘Tail index regression’, Journal of the American Statistical Association 104(487), 1233–1240.
  • Wang, Wu and Li (2012) Wang, L., Wu, Y. and Li, R. (2012), ‘Quantile regression for analyzing heterogeneity in ultra-high dimension’, Journal of the American Statistical Association 107(497), 214–222.
  • Welsh (1989) Welsh, A. (1989), ‘On m-processes and m-estimation’, Annals of Statistics 17(1), 337–361.
  • Xu et al. (2022) Xu, W., Hou, Y. and Li, D. (2022), ‘Prediction of extremal expectile based on regression models with heteroscedastic extremes’, Journal of Business & Economic Statistics 40(2), 522–536.
  • Youngman (2019) Youngman, B. D. (2019), ‘Generalized additive models for exceedances of high thresholds with an application to return level estimation for us wind gusts’, Journal of the American Statistical Association 114(528), 1865–1879.
  • Zhao et al. (2014) Zhao, T., Kolar, M. and Liu, H. (2014), ‘A general framework for robust testing and confidence regions in high-dimensional quantile regression’, arXiv:1412.8724.
  • Zheng et al. (2013) Zheng, Q., Gallagher, C. and Kulasekera, K. (2013), ‘Adaptive penalized quantile regression for high dimensional data’, Journal of Statistical Planning and Inference 143(6), 1029–1038.