High-Dimensional Extreme Quantile Regression

Yiwei Tang
Department of Statistics and Data Science, Fudan University
and
Huixia Judy Wang
Department of Statistics, The George Washington University
and
Deyuan Li
Department of Statistics and Data Science, Fudan University

Abstract

The estimation of conditional quantiles at extreme tails is of great interest in numerous applications. Various methods that integrate regression analysis with an extrapolation strategy derived from extreme value theory have been proposed to estimate extreme conditional quantiles in scenarios with a fixed number of covariates. However, these methods prove ineffective in high-dimensional settings, where the number of covariates increases with the sample size. In this article, we develop new estimation methods tailored for extreme conditional quantiles with high-dimensional covariates. We establish the asymptotic properties of the proposed estimators and demonstrate their superior performance through simulation studies, particularly in scenarios of growing dimension and high dimension where existing methods may fail. Furthermore, the analysis of auto insurance data validates the efficacy of our methods in estimating extreme conditional insurance claims and selecting important variables.

Keywords: Extrapolation; Extreme value; High-dimensional data; Regression analysis.

1 Introduction

Quantile regression has become a widely recognized and useful alternative to classical least-squares regression for analyzing heterogeneous data. Since its introduction by Koenker and Bassett (1978), quantile regression has gradually been extended to a wide variety of data analytic settings; for a comprehensive review, see Koenker (2005); Koenker et al. (2017).

While traditional quantile regression allows exploration of a wide range of conditional quantiles for $\tau\in[\tau_{l},\tau_{u}]$ with $\tau_{l},\tau_{u}\in(0,1)$ , there is often interest in the extreme tails where $\tau$ is close to 0 or 1. Without loss of generality, we focus the discussion on $\tau$ close to 1. The inherent challenge in estimating tail quantiles lies in the fact that the number of observations in the tails, that is, above the $\tau$ th quantile, is often small. In various contexts, methods have been developed to study extreme quantile regression by leveraging extreme value theory. Chernozhukov (2005), Wang, Li and He (2012), and Wang and Li (2013) studied extreme conditional quantiles under the linear quantile regression framework, and Li and Wang (2019) focused on the linear quantile autoregressive model. Daouia et al. (2013) and Gardes and Stupfler (2019) considered nonparametric smoothing methods. Wang and Tsai (2009) and Youngman (2019) applied generalized Pareto models to analyze the exceedance at a high threshold. Velthoen et al. (2023) and Gnecco et al. (2024) explored tree-based methods such as gradient boosting and random forest.

With advancements in data collection techniques such as genomics, economics, finance, and imaging studies, the dimension of covariates $p$ is becoming larger and can grow with the sample size $n$ . Existing methodology and theory in quantile regression for high-dimensional covariates have primarily focused on a central quantile level or compact quantile sets in $(0,1)$ . In situations where $p$ grows with $n$ , various studies have investigated asymptotic behaviors of quantile regression estimators, e.g., Welsh (1989), He and Shao (2000), Belloni et al. (2019), Pan and Zhou (2021), and He et al. (2023). Assuming the sparsity in regression coefficients, researchers have explored the $\ell_{1}$ -penalized quantile regression estimator and its generalizations (Belloni and Chernozhukov, 2011; Fan et al., 2014), concave penalties (Wang, Wu and Li, 2012), and smoothed quantile regression with concave penalties (Tan et al., 2022). Additionally, the debiased estimator of high-dimensional quantile regression has been studied for inference at central quantiles (Zhao et al., 2014; Bradic and Kolar, 2017).

In this article, we address the challenge of estimating extreme conditional quantiles of $Y\in\mathbb{R}$ given a set of predictors $\mathbf{X}\in\mathbb{R}^{p}$ in a high-dimensional context. The challenge lies in the scarcity of efficient samples at the tail and the sparsity of samples in high-dimensional space. As noted by Gnecco et al. (2024), a lack of samples exceeding the corresponding conditional $\tau$ th quantile leads to empirical estimators with large bias and variance. The high dimensionality of the predictor space introduces additional bias, as noise covariates can obscure true signals.

Current research on extreme quantiles primarily focuses on estimation within fixed-dimensional settings and is either not applicable or less competitive in high-dimensional settings. Although tree-based methods (Velthoen et al., 2023; Gnecco et al., 2024) demonstrate effectiveness in relatively large and fixed covariate dimensions, addressing scenarios where the covariate dimension grows with the sample size, comparable to the sample size itself, remains inadequately explored. Recently, Sasaki et al. (2024) proposed a high-dimensional tail index regression model, assuming that the dimension grows with $n$ and is comparable to $n(1-\tau)$ . However, this approach imposes constraints such as a Pareto tail and specific link functions between the extreme value index and covariates.

In this article, we propose novel estimators for extreme quantiles in high dimensions by integrating concepts from extreme value theory with regularized estimators. Building upon a linear conditional quantile model, we employ regularized quantile regression to estimate intermediate conditional quantiles and extrapolate them to extreme quantiles. Here, “intermediate quantile” refers to a quantile level $\tau_{n}$ that approaches one at a moderate rate such that $n(1-\tau_{n})\rightarrow\infty$ , whereas “extreme quantile” denotes a quantile level $\tau_{n}$ that approaches one more rapidly.

The theoretical analysis of extreme quantiles in high dimensions is challenging. The limiting distribution of a quantile estimator could be intractable in high dimensions, even at central quantile levels. Existing work on central quantiles within $(0,1)$ often requires a common assumption, namely $f_{Y}(Q_{Y}(\tau|\mathbf{X})|\mathbf{X})>C>0$ , where $f_{Y}(\cdot|\mathbf{X})$ is the conditional density of $Y$ given $\mathbf{X}$ , and $Q_{Y}(\tau|\mathbf{X})$ is the $\tau$ th conditional quantile of $Y$ ; see for instance Belloni and Chernozhukov (2011), and Wang, Wu and Li (2012). However, this assumption is violated in the tails as $\tau$ approaches $1$ . It is even more challenging to achieve a uniform result of the regularized estimator over the entire tail, which is crucial for constructing an effective extreme value index estimator. This paper addresses the gap and establishes the uniform error rate for quantile regression estimator in high dimensions and at intermediate quantiles. This rate includes terms analogous to those in high-dimensional quantile regression at a central quantile level, augmented by an inflationary component attributed to escalating quantile levels or diminishing effective sample sizes. In addition, we propose a refined Hill approach for estimating the extreme value index, which is based on a fixed number of intermediate high-dimensional quantile estimates. This framework enables us to analyze the uniform behavior of regularized quantile regression in the tail region and assess the rate at which $f_{Y}(Q_{Y}(\tau|\mathbf{X})|\mathbf{X})$ tends to zero. The error rates of the proposed refined Hill and extreme quantile estimators offer theoretical insights into threshold selection in high-dimensional contexts.

The rest of the article is organized as follows. In Section 2, we present the proposed estimators for the extreme value index and extreme conditional quantiles, and derive theoretical results for the proposed estimators using techniques from both extreme value theory and high-dimensional statistics theory. We assess the finite sample performance of the proposed methods through a simulation study in Section 3 and the analysis of auto insurance data in Section 4. Technical details and additional information for Sections 3 and 4 are provided in the online Supplementary Material.

2 Extreme Quantile Estimation in High Dimensions

Suppose we observe a random sample $\{(\mathbf{X}_{i},Y_{i}),i=1,\ldots,n\}$ of the random vector $(\mathbf{X},Y)$ , where $Y_{i}$ is the univariate response variable and $\mathbf{X}_{i}=(X_{i1},\ldots,X_{ip})^{T}$ is the $p$ -dimensional centralized design vector. This article considers the high dimensional case: $p:=p(n)\rightarrow\infty$ as $n\rightarrow\infty$ . Let $Q_{Y}(\tau|\mathbf{X})=\inf\{y:F_{Y}(Y|\mathbf{X})\geq\tau\}$ denote the $\tau$ th conditional quantile of $Y$ given $\mathbf{X}$ , where $F_{Y}(\cdot|\mathbf{X})$ is the cumulative distribution function (CDF) of $Y$ given $\mathbf{X}$ . Denote $\mathcal{X}\subset\mathbb{R}^{p}$ as the support of $\mathbf{X}$ .

Throughout the article, we assume that $F_{Y}(\cdot|\mathbf{X})$ is in the maximum domain of attraction of an extreme value distribution $G_{\gamma}(\cdot)$ with the extreme value index (EVI) $\gamma>0$ , denoted by $F_{Y}(\cdot|\mathbf{X})\in D(G_{\gamma})$ . That is, for a given random sample $Y_{1},\ldots,Y_{n}$ from $F_{Y}(\cdot|\mathbf{X})$ , there exist constants $a_{n}>0$ and $b_{n}\in\mathbb{R}$ such that

P\left(\left.\frac{\max_{1\leq i\leq n}Y_{i}-b_{n}}{a_{n}}\leq y\right|\mathbf{X}\right)\rightarrow G_{\gamma}(y)=\exp\{-(1+\gamma y)^{-1/\gamma}\},

as $n\rightarrow\infty$ , for $1+\gamma y\geq 0$ . In this paper, we assume $\gamma>0$ , which means that $Y|\mathbf{X}$ has heavy-tailed distributions as commonly seen in many applications, such as stock market returns, insurance claims, earthquake magnitudes, river flows during floods.

The main objective of this paper is to estimate the conditional quantile $Q_{Y}(\tau_{n}|\mathbf{X})$ when $\tau_{n}\rightarrow 1$ and $p\rightarrow\infty$ as $n\rightarrow\infty$ . We address two distinct regimes for the quantile level $\tau_{n}$ as it approaches 1: intermediate order quantile levels such that $n(1-\tau_{n})\rightarrow\infty$ , and extreme order quantile levels with $n(1-\tau_{n})\rightarrow C$ , where $C\geq 0$ is some constant.

We focus on the following tail linear quantile regression model:

Q_{Y}(\tau|\mathbf{X})=\beta_{0}(\tau)+X_{1}\beta_{1}(\tau)+\cdots+X_{p}\beta_{p}(\tau)=:\mathbf{Z}^{T}\boldsymbol{\beta}(\tau),\quad\text{for all}\ \tau\in[\tau_{ln},1),

(1)

where $\mathbf{Z}=(1,\mathbf{X}^{T})^{T}$ , $\tau_{ln}\rightarrow 1$ as $n\rightarrow\infty$ , and the quantile slope coefficients $\boldsymbol{\beta}(\tau)$ may vary across $\tau\in$ $[\tau_{ln},1)$ . The linear quantile model in (1) is specifically assumed at the upper tail. Similar tail model specifications have also been considered in Wang, Li and He (2012), Li and Wang (2019), Xu et al. (2022). In high-dimensional settings, we extend this assumption and additionally impose a sparsity condition to ensure model identifiability. We assume there are $s$ tail-relevant variables, that is, $s:=\#\{j=1,2,\ldots,p:\beta_{j}\left(\tau\right)\neq 0,$ $\exists\tau\in[\tau_{ln},1)\}$ , and $s=o(n)$ . A tail-relevant variable may influence the tail quantiles of $Y|\mathbf{X}$ , even if it does not affect the central quantiles. Without loss of generality, we assume that the first $s$ slope coefficients are nonzero, i.e., $\boldsymbol{\beta}(\tau)=(\beta_{0}(\tau),\beta_{1}(\tau),\ldots,\beta_{s}(\tau),\mathbf{0}_{p-s})^{T}$ .

We propose a three-step procedure to estimate the extreme conditional quantile. In the first step, we obtain $\ell_{1}$ -penalized quantile estimators at a sequence of intermediate quantile levels. For heavy-tailed distributions, the Hill estimator (Hill, 1975) is the most commonly used method to estimate the EVI. However, based on the average of log excesses, the Hill estimator requires analysis of the tail quantile process (de Haan and Ferreira, 2006), which poses both computational and theoretical challenges in high dimensions. To address these challenges, we develop a refined Hill estimator in the second step, which relies on quantile estimates at a fixed number of intermediate quantile levels that approach to one at the same rate. In the third step, we develop an extrapolation estimator for the extreme conditional quantile $Q_{Y}(\tau_{n}|\mathbf{X})$ by leveraging extreme value theory and results from the first two steps.

Before presenting the proposed estimators and their theoretical properties, we first introduce the notations used throughout the paper. For a sequence of random variables $E_{1},E_{2},\ldots,E_{n}$ , we denote its order statistics as $E_{(1)}\leq E_{(2)}\leq\ldots\leq E_{(n)}$ . Given a random sample $\mathbf{Z}_{1},\ldots,\mathbf{Z}_{n}$ , let $\mathbb{G}_{n}(f)=\mathbb{G}_{n}\{f(\mathbf{Z}_{i})\}:=n^{-1/2}\sum_{i=1}^{n}[f(\mathbf{Z}_{i})-\mathrm{E}\{f(\mathbf{Z}_{i})\}]$ and $\mathbb{E}_{n}f=\mathbb{E}_{n}f\left(\mathbf{Z}_{i}\right):=n^{-1}\sum_{i=1}^{n}f(\mathbf{Z}_{i})$ . We denote the $\ell_{2}$ , $\ell_{1}$ , $\ell_{\infty}$ and $\ell_{0}$ norms by $\|\cdot\|$ , $\|\cdot\|_{1}$ , $\|\cdot\|_{\infty}$ and $\|\cdot\|_{0}$ , respectively. Let $\|\boldsymbol{\beta}\|_{1,n}=\sum_{j=1}^{p}\widehat{\sigma}_{j}|\beta_{j}|$ denote the weighted $\ell_{1}$ -norm with $\widehat{\sigma}_{j}^{2}:=\mathbb{E}_{n}(X_{ij}^{2})$ . Given a vector $\boldsymbol{\delta}\in\mathbb{R}^{p}$ , and a set of indices $T\subset\{1,\ldots,p\}$ , we denote by $\boldsymbol{\delta}_{T}$ the vector where $\delta_{Tj}=\delta_{j}$ if $j\in T,\delta_{Tj}=0$ if $j\notin T$ . We use $a\lesssim b$ to denote $a=O(b)$ , meaning $a\leq cb$ for some constant $c>0$ that does not depend on $n$ ; and $a\asymp b$ to denote $a=O(b)$ and $b=O(a)$ . We use $a\lesssim_{P}b$ to denote $a=O_{P}(b)$ . Additionally, we use $a\vee b=\max\{a,b\}$ and $a\wedge b=\min\{a,b\}$ . For notational simplicity, let $F_{i}(\cdot)=F_{Y}(\cdot|\mathbf{X}_{i})$ be the conditional distribution function of $Y$ given $\mathbf{X}_{i}$ , and denote $f_{i}(\cdot)=f_{Y}(\cdot|\mathbf{X}_{i})$ . Finally, let $\mathbb{X}=(\mathbf{X}_{1},\ldots,\mathbf{X}_{n})^{\top},\mathbb{Z}=(\mathbf{Z}_{1},\ldots,\mathbf{Z}_{n})^{\top}$ , and $\mathbf{Y}=(Y_{1},Y_{2},\ldots,Y_{n})^{T}$ .

2.1 $\ell_{1}$ -penalized Intermediate Quantile Estimator

Define $\mathcal{T}_{n}:=\{c(1-\tau_{0n}):c\in[c_{1},c_{2}]\},$ where $\tau_{0n}$ is an intermediate quantile level such that $\tau_{0n}\rightarrow 1$ and $n(1-\tau_{0n})\rightarrow\infty$ , and $0<c_{1}<c_{2}<\infty$ are constants. For any $\tau$ such that $1-\tau\in\mathcal{T}_{n}$ , we define the $\ell_{1}$ -penalized quantile estimator of $\boldsymbol{\beta}(\tau)$ as

\widehat{\boldsymbol{\beta}}(\tau)=\mathop{\rm argmin}\limits_{(\beta_{0},\boldsymbol{\beta}^{T})^{T}\in\mathbb{R}^{p+1}}\frac{1}{n}\sum_{i=1}^{n}\rho_{\tau}(Y_{i}-\beta_{0}-\mathbf{X}_{i}^{T}\boldsymbol{\beta})+\frac{\lambda\sqrt{\tau(1-\tau)}}{n}\|\boldsymbol{\beta}\|_{1,n},

(2)

where $\lambda>0$ is the penalization parameter.

For high dimensional settings, the choice of $\lambda$ is crucial for achieving estimation consistency. As suggested by Bickel et al. (2009), $\lambda$ should exceed a suitably rescaled (sub)gradient of the sample criterion function evaluated at the true parameter value. While the asymptotic scale of $\lambda$ has been derived for central quantiles (Belloni and Chernozhukov, 2011; Zheng et al., 2013), this scale differs for intermediate quantiles as $\tau$ approaches one. Specifically, a subgradient of the penalized quantile regression objective function at $\boldsymbol{\beta}(\tau)$ is given by $\mathbb{E}_{n}[\mathbf{Z}_{i}\{\tau-\mathbbm{1}(Y_{i}\leq\mathbf{Z}_{i}^{T}\boldsymbol{\beta}(\tau))\}]+{\lambda\sqrt{\tau(1-\tau)}}n^{-1}(0,\mathbf{g}^{T})^{T}$ , where $\mathbf{Z}=(1,\mathbf{X}^{T})^{T}$ and $\mathbf{g}=(g_{1},\ldots,g_{p})^{T}$ is a subgradient of $\|\boldsymbol{\beta}\|_{1,n}$ with $|g_{j}|=\widehat{\sigma}_{j}$ for $j=1,2,\ldots,p$ . The infinity norm of the first term is of the same order for both central and intermediate quantiles; however, the second term varies in order due to the factor $\sqrt{\tau(1-\tau)}$ . Lemma 1 in the Supplementary Material provides an asymptotic uniform lower bound of $\lambda$ over $\tau\in\mathcal{T}_{n}$ , and a practical choice for $\lambda$ is discussed in Section 2.4.

One major challenge in establishing the theoretical properties of $\widehat{\boldsymbol{\beta}}(\tau)$ arise from the violation of the condition $f_{Y}(Q_{Y}(\tau|\mathbf{X})|\mathbf{X})>C>0$ at intermediate quantiles. This violation complicates the establishment of a quadratic lower bound for the Taylor expansion of the expected quantile loss function. Consequently, the general framework of Negahban et al. (2012) cannot be applied to obtain the convergence rate through the restricted strong convexity property of the empirical loss function. However, leveraging condition C5 and extreme value theory, we assess the rate at which the conditional quantile and density converge, determining how $f_{Y}(Q_{Y}(\tau|\mathbf{X})|\mathbf{X})$ approaches zero as $\tau$ goes to one. This allows us to establish an asymptotic quadratic lower bound for the expected intermediate quantile loss after appropriate standardization (see Lemma 4 in the Supplementary Material).

To establish the theoretical properties of $\widehat{\boldsymbol{\beta}}(\tau)$ over $\tau\in\mathcal{T}_{n}$ , we impose Conditions C1-C6. For brevity, the conditions are detailed in Supplementary Material, but we highlight one key condition, C5, below.

Condition C5. There exists an auxiliary line $\mathbf{Z}\rightarrow\mathbf{Z}^{T}\boldsymbol{\theta}_{r}$ with $0<r<1$ and a bounded vector $\boldsymbol{\theta}_{r}$ such that for $U=Y-\mathbf{Z}^{T}\boldsymbol{\theta}_{r}$ and some heavy-tailed distribution function $F_{0}(\cdot)$ with extreme value index $\gamma>0$ and density $f_{0}(\cdot)$ , the following hold for some positive continuous functions $K(\cdot),{K}_{1}(\cdot),{K}_{2}(\cdot)$ on $\mathcal{Z}=(1,\mathcal{X})$ .

(i)

There exists positive sequences $d_{n},d_{1n}$ such that $K(\mathbf{Z})\asymp d_{n},K_{1}(\mathbf{Z})\asymp d_{1n}$ hold for all $\mathbf{Z}\in\mathcal{Z}$ .

(ii)

For any sequence $t_{n}\rightarrow\infty$ , if $d_{n}\{1-F_{0}(t_{n})\}\rightarrow 0$ and $d_{n}f_{0}(t_{n})\rightarrow 0$ as $n\rightarrow\infty$ , then

	$\displaystyle\left\|\frac{1-F_{U}(t_{n}\|\mathbf{Z})}{K(\mathbf{Z})\{1-F_{0}(t_{n})\}}-1\right\|$	$\displaystyle=\{1-F_{0}(t_{n})\}^{\delta}{K}_{1}(\mathbf{Z})\{1+o(1)\},~{}\text{and}$
	$\displaystyle\left\|\frac{f_{U}(t_{n}\|\mathbf{Z})}{K(\mathbf{Z})f_{0}(t_{n})}-1\right\|$	$\displaystyle=\{f_{0}(t_{n})\}^{\delta}{K}_{2}(\mathbf{Z})\{1+o(1)\},$

holds uniformly for $\mathbf{Z}\in\mathcal{Z}$ , where $\delta>0$ is a constant.

(iii)

$U_{0}(t):=F_{0}^{-1}(1-1/t)$ satisfies the second-order condition $A_{1}(t)^{-1}\{U_{0}(tz)/U_{0}(t)-z^{\gamma}\}\rightarrow z^{\gamma}(z^{\rho}-1)/\rho,\text{ as }t\rightarrow\infty$ with $\gamma>0,\rho<0$ , and $A_{1}(t)=\gamma dt^{\rho}$ with $d\neq 0$ . Additionally, $f_{0}(t)$ is regularly varying at infinity with index $-1/\gamma-1$ .

Remark 1.

Condition C5 presents a novel framework tailored for high-dimensional settings, where $K(\mathbf{Z})$ , $K_{1}(\mathbf{Z})$ , and $K_{2}(\mathbf{Z})$ may be unbounded and affect the tail behavior of $Y|\mathbf{X}$ . Unlike fixed-dimensional cases (Wang, Li and He, 2012; Chernozhukov, 2005), this condition addresses the combined complexities of high dimensionality and tail behavior. The conditions $d_{n}\{1-F_{0}(t_{n})\}\rightarrow 0$ and $d_{n}f_{0}(t_{n})\rightarrow 0$ as $n\rightarrow\infty$ ensure that the tail effect of $1-F_{0}(t_{n})$ outweighs the diverging scale effect of $K(\mathbf{Z})$ in high dimensions. This requirement allows us to apply extreme value theory to infer the tail behavior of $Y$ , and is mild, as it is automatically satisfied when the number of important variables $s$ is finite (see Example 1). The second-order condition of $U_{0}$ implies that ${U_{0}(tz)}/{U_{0}(t)}\rightarrow z^{\gamma}$ , which can be interpreted as a first-order condition (see Condition C6 for the definition). The first-order condition is a necessary and sufficient condition for $F_{0}$ to belong to the maximum domain of attraction $D(G_{\gamma})$ for heavy-tailed distributions (de Haan and Ferreira, 2006). Most commonly used families of continuous distributions satisfies the second-order condition. Although both the extreme value index estimator and the extreme quantile estimator are based on the first-order condition, the second-order condition is essential for deriving their theoretical results. Additionally, Lemma 3 in the Supplementary Material demonstrates that under Condition C5, a second-order condition of $F_{Y}(t|\mathbf{X})$ still holds, similar to the fixed-dimensional case.

Define $s_{r}:=\|\boldsymbol{\theta}_{r}\|_{0}$ . The following Theorem 1 establishes a uniform bound of $\widehat{\boldsymbol{\beta}}(\tau)-\boldsymbol{\beta}(\tau)$ for $\tau\in\mathcal{T}_{n}$ .

Theorem 1.

Assume Conditions C1-C6 hold, along with $1-\tau_{0n}>\sqrt{s\log(p)/n}$ , $d_{n}^{-1}d_{1n}^{1/\delta}(1-\tau_{0n})\rightarrow 0$ , $s_{r}d_{n}^{-\gamma}(1-\tau_{0n})^{\gamma}\rightarrow 0$ , and $\lambda\asymp\sqrt{n\log p}/\sqrt{1-\tau_{0n}}$ . Then, for any $\epsilon\in(0,1)$ and some positive constant $C$ , there exist a positive integer $N$ such that, for all $n>N$ ,

\sup_{1-\tau\in\mathcal{T}_{n}}\|\widehat{\boldsymbol{\beta}}(\tau)-\boldsymbol{\beta}(\tau)\|\leq Cd_{n}^{\gamma}(1-\tau_{0n})^{-\gamma-1}\sqrt{\frac{s\log(p\vee n)}{n}},

with probability at least $1-4\epsilon-5p^{-4}$ .

Theorem 1 establishes bounds similar in spirit to those found in Chernozhukov (2005) but with an additional term, $d_{n}^{\gamma}\left(1-\tau_{0n}\right)^{-\gamma-1}$ , which inflates the upper bound. This term reflects the decay of the conditional density in the tails. Specifically, as shown in Remark S.2 in the Supplementary Material, under certain conditions, as $n\rightarrow\infty$ , $f_{Y}(Q_{Y}(\tau|\mathbf{X})|\mathbf{X})\sim d_{n}^{-\gamma}(1-\tau_{0n})^{\gamma+1},$ which describes the rate at which the conditional density approaches zero at intermediate quantiles. Theorem 1 poses both upper and lower bounds conditions on $1-\tau_{0n}$ . The condition $1-\tau_{0n}>\sqrt{s\log(p)/n}$ prevents $\tau_{0n}$ from approaching one too quickly, aligning with the requirement for restricted strong convexity of the objective function (see Lemma 4 in the Supplementary Material). Restricted strong convexity refers to maintaining strong convexity within a restricted set (Negahban et al., 2012). In the high-dimensional context, this restricted set is defined such that the error vector lies within it with high probability, given a suitably chosen penalty and certain conditions. Intuitively, restricted strong convexity holds only if the prediction error of the penalized estimator, $\mathbf{Z}^{T}(\widehat{\boldsymbol{\beta}}(\tau)-\boldsymbol{\beta}(\tau))$ , grows slower than the quantile of $Y|\mathbf{X}$ . The conditions $s_{r}d_{n}^{-\gamma}(1-\tau_{0n})^{\gamma}\rightarrow 0$ and $d_{n}^{-1}d_{1n}^{1/\delta}(1-\tau_{0n})\rightarrow 0$ ensure that the tail effect dominates the high-dimensional effect. In Section 2.3, we provide an example illustrating the feasibility of these conditions.

2.2 Refined Hill Estimator

Let $k=\lfloor n(1-\tau_{0n})\rfloor$ . For simplicity, we omit the rounding notation, as it does not impact the theoretical discussion that follows. Define quantile levels $\tau_{0n}=\tau_{1}<\tau_{2}<\cdots<\tau_{J}\in(0,1)$ , where $\tau_{j}=1-l_{j}k/n,l_{j}=s^{j-1},s\in(0,1)$ . For each $j=1,2,\ldots,J$ , we estimate $\boldsymbol{\beta}(\tau_{j})$ by the $\ell_{1}$ -penalized quantile regression estimator in (2). Suppose that we are interested in estimating the extreme conditional quantile of $Y$ given $\mathbf{X}=\mathbf{x}\in\mathcal{X}$ . Let $\mathbf{z}=(1,\mathbf{x}^{T})^{T}$ , and we can define the refined Hill estimator as

\hat{\gamma}=\left(\sum_{j=1}^{J}\phi\left(l_{j}\right)\log\left(1/l_{j}\right)\right)^{-1}\sum_{j=1}^{J}\phi\left(l_{j}\right)\log\left(\frac{\mathbf{z}^{T}\widehat{\boldsymbol{\beta}}(\tau_{j})}{\mathbf{z}^{T}\widehat{\boldsymbol{\beta}}(\tau_{1})}\right).

where $\phi(\cdot)$ is some positive measurable function.

The refined Hill estimator is a weighted variant of the Hill-type estimators found in the literature but with the following key differences. First, $\hat{\gamma}$ utilizes a fixed number of log excesses at intermediate quantile levels with the same rate, whereas the Hill estimator typically relies on quantiles approaching one at varying rates, as seen in (Wang, Li and He, 2012). Second, $\hat{\gamma}$ allows for flexible weighting through $\phi$ , while the Hill estimator commonly applies equal weights to upper quantiles (Hill, 1975; Wang, Li and He, 2012; Daouia et al., 2023). We will discuss the choice of the weight function $\phi$ , and the tuning parameters $J$ and $s$ in Section 2.4. Similar weighting schemes have been explored by other researchers to generalize Hill and Pickands estimators (Drees, 1995; Daouia et al., 2013; He et al., 2022) and for estimating the Weibull tail coefficient (Gardes and Girard, 2016; He et al., 2020). These methods were developed in fixed-dimensional settings where the theoretical guarantees of the refined estimators could be established using the tractable asymptotic normality of intermediate quantile estimates. While this strategy is not feasible in high-dimensional settings, we can leverage the uniform convergence results from Theorem 1 to establish the convergence rate of the refined Hill estimator. Suppose there exist a positive sequence $d_{1n}$ such that, $K_{1}(\mathbf{Z})\asymp d_{1n}$ hold for all $\mathbf{Z}\in\mathcal{Z}$ .

Theorem 2.

Assume that $\mathbf{x}$ belongs to a compact set in $\mathcal{X}\subset\mathbb{R}^{p}$ . Assume the conditions of Theorem 1, and that $s_{r}=o\left(\left(d_{n}^{\gamma+\rho}(n/k)^{\gamma+\rho}\right)\vee\left(d_{n}^{\gamma-\delta}d_{1n}(n/k)^{\gamma-\delta}\right)\right)$ , $d_{n}^{\rho}(n/k)^{-\rho}\vee d_{1n}d_{n}^{-\delta}(n/k)^{\delta}=o\left((ns/k)\sqrt{{\log(p\vee n)}/n}\right)$ . Then we have

\hat{\gamma}-\gamma=O_{p}\left(\frac{ns}{k}\sqrt{\frac{\log(p\vee n)}{n}}\right).

Remark 2.

Theorem 2 suggests that if $(ns/k)\sqrt{{\log(p\vee n)}/n}\rightarrow 0$ , then $\hat{\gamma}\overset{p}{\rightarrow}\gamma$ . In Theorem 2, the conditions on $s_{r}$ are posed to account for the error in the second-order term in the tail expansion of $F_{Y}(\cdot|\mathbf{X})$ . These conditions can be simplified for specific models, as discussed in Section 2.3. Note that the convergence rate of $\hat{\gamma}$ is $(ns/k)\sqrt{\log(p\vee n)/n}$ , which is slower than the typical ${k}^{-1/2}$ rate achieved in the fixed-dimensional case (Wang and Tsai, 2009; Daouia et al., 2013). The difference arises from the slower convergence of the intermediate quantile estimators in high-dimensional settings. Furthermore, to ensure consistency, $k$ must satisfy $k>n^{1/2}$ , which is stricter than the requirement in the fixed dimensional case (Wang, Li and He, 2012), where $k>n^{\eta}$ with $\eta$ being a small positive constant. Finally, while the error bound of the $\ell_{1}$ -penalized intermediate quantile estimator depends on $\gamma$ , the convergence rate of $\hat{\gamma}$ remains invariant across $\gamma$ .

2.3 Extreme Conditional Quantile Estimator

Let $\tau_{n}\rightarrow 1$ be an extreme quantile level such that $1-\tau_{n}=o(1-\tau_{0n})$ . Suppose that we are interested in estimating the $\tau_{n}$ th conditional quantile of $Y$ given $\mathbf{X}=\mathbf{x}$ . Direct estimation using the $\ell_{1}$ -penalized quantile estimator in (2) is often inaccurate or unstable due to limited data in the extreme tail. However, we can leverage extreme value theory and extrapolate from intermediate to extreme quantiles by using the relationship between quantiles with levels approaching one at different rates.

Define $U_{Y}(t|\mathbf{X})=\inf\{y:F_{Y}(y|\mathbf{X})\geq 1-1/t\}=F^{-1}_{Y}(1-1/t|\mathbf{X})$ for $t\geq 1$ , the ( $1-1/t$ )th quantile of $F_{Y}(\cdot|\mathbf{X})$ . According to Corollary 1.2.10 in de Haan and Ferreira (2006), for a heavy-tailed distribution $F_{Y}(\cdot|\mathbf{X}=\mathbf{x})\in D(G_{\gamma})$ , we have

\frac{U_{Y}(tz|\mathbf{x})}{U_{Y}(t|\mathbf{x})}\rightarrow z^{\gamma},\quad\text{as }t\rightarrow\infty.

Motivated by this, for a given $\mathbf{x}$ , we estimate $Q_{Y}(\tau_{n}|\mathbf{x})$ by

\widehat{Q}_{Y}(\tau_{n}|\mathbf{x})=\left(\frac{1-\tau_{0n}}{1-\tau_{n}}\right)^{\hat{\gamma}}\mathbf{z}^{T}\widehat{\boldsymbol{\beta}}(\tau_{0n}),

where $\mathbf{z}=(1,\mathbf{x}^{T})^{T}$ .

Theorem 3.

Assume the conditions of Theorem 2. Let $\tau_{n}\rightarrow 1$ be a quantile level such that $n(1-\tau_{n})=o(k)$ , then we have

\frac{\widehat{Q}_{Y}\left(\tau_{n}|\mathbf{x}\right)}{{Q}_{Y}\left(\tau_{n}|\mathbf{x}\right)}-1=O_{p}\left(\log\left(\frac{k}{n(1-\tau_{n})}\right)\frac{ns}{k}\sqrt{\frac{\log(p\vee n)}{n}}\right).

Additionally, if $\log[k/\{n(1-\tau_{n})\}]$ $(ns/k)\sqrt{{\log(p\vee n)}/{n}}\rightarrow 0$ , ${\widehat{Q}_{Y}\left(\tau_{n}|\mathbf{x}\right)}/{{Q}_{Y}\left(\tau_{n}|\mathbf{x}\right)}\overset{p}{\rightarrow}1$ .

According to our estimation procedure, the error in estimating extreme quantiles primarily arises from the EVI estimator and the $\ell_{1}$ -penalized intermediate quantile estimator. Specifically, the error rate of the EVI estimator and the relative error rate of the intermediate quantile estimator are both given by $(ns/k)\sqrt{{\log(p\vee n)}/{n}}$ . Since the EVI estimator contributes to the error rate at the exponent, the final error rate for the extreme quantile estimation is $\log[{k}/\{n(1-\tau_{n})\}(ns/k)\sqrt{{\log(p\vee n)}/{n}}$ . In the fixed dimension case, the rate of the relative error of extreme quantile estimator is $\log[k/\{n(1-\tau_{n})\}]/\sqrt{k}$ . Comparing this with our results, we observe that the inflated term arises from the EVI estimator, which has an error rate of ${1}/{\sqrt{k}}$ in the fixed-dimensional setting.

Remark 3.

In this paper, we assume that $F_{Y}(\cdot|\mathbf{X})\in D(G_{\gamma})$ , which represents a first-order condition on $F_{Y}(\cdot|\mathbf{X})$ . To derive the properties of $\widehat{Q}_{Y}(\tau_{n}|\mathbf{x})$ , a second-order condition is needed. Condition C5 assumes the existence of an auxiliary line so that $U=Y-\mathbf{Z}^{T}\boldsymbol{\theta}_{r}$ satisfies the second-order condition. Under this condition, we can show that $F_{Y}(\cdot|\mathbf{X})$ also satisfies the second-order condition even for high-dimensional $\mathbf{X}$ (see Lemma 3), and it has the same EVI across $\mathbf{X}\in\mathcal{X}$ . This implies that the EVI can be technically estimated based on the intermediate conditional quantiles at any $\mathbf{x}\in\mathcal{X}$ . Theorem 2 establishes the convergence rate for $\hat{\gamma}$ obtained at $\mathbf{X}=\mathbf{x}$ belonging to a compact set, which we now denote as $\hat{\gamma}(\mathbf{x})$ to emphasize its dependence on $\mathbf{x}$ . Careful selection of $\mathbf{x}$ is important, particularly in high-dimensional cases where data points may be sparse due to the curse of dimensionality. In practice, we recommend estimating $\hat{\gamma}(\mathbf{x})$ at $\bar{\mathbf{x}}=n^{-1}\sum_{i=1}^{n}\mathbf{X}_{i}$ , as this leads to more stable numerical performance and a higher convergence rate than $\hat{\gamma}(\mathbf{x})$ , whose convergence rate is presented in Remark 2. Specifically, $\hat{\gamma}(\bar{\mathbf{x}})-\gamma=O_{p}\left(({n}/k)\sqrt{{s\log(p\vee n)}/{n}}\right)$ ; see the proof of Theorem 2 in the Supplementary Material for more details. A pooled estimator that averages $\{\hat{\gamma}(\mathbf{X}_{i}),i=1,\ldots,n\}$ could also improve estimation efficiency; however, this approach is computationally intensive and less practical in high-dimensional settings.

The regularity conditions outlined in our paper encompass a broad range of conventional regression settings. To illustrate these conditions, we consider the location-scale shift model as a representative example.

Example 1. Consider the location-scale shift model:

Y=\alpha+\mathbf{X}^{T}\boldsymbol{\beta}+(1+\mathbf{X}^{T}\widetilde{\boldsymbol{\sigma}})\varepsilon,

where $1+\mathbf{X}^{T}\widetilde{\boldsymbol{\sigma}}>0$ for $\mathbf{X}\in\mathcal{X}$ , and $\varepsilon\sim F_{0}(\cdot)$ with $F_{0}$ satisfying the second order condition for some $\gamma>0$ and $\rho<0$ . Obviously, $Q_{Y}(\tau|\mathbf{X})=(\alpha+F_{0}^{-1}(\tau))+\mathbf{X}^{T}(\boldsymbol{\beta}+\tilde{\boldsymbol{\sigma}}F_{0}^{-1}(\tau))$ .

Suppose that the scale effect variable is finite, i.e., $\|\tilde{\sigma}\|_{0}<\infty$ . Under this model, Condition C5 holds with $K\left(\mathbf{Z}\right)=\left(1+\mathbf{X}^{T}\widetilde{\boldsymbol{\sigma}}\right)^{{1}/{\gamma}}$ , ${K}_{1}\left(\mathbf{Z}\right)=C\left\{\left(1+\mathbf{X}^{T}\widetilde{\boldsymbol{\sigma}}\right)^{-{\rho}/{\gamma}}-1\right\}$ , where $C$ is some constant and $\delta=-\rho$ . Details of the derivation can be found in the Supplementary Material. Let $\|\boldsymbol{\beta}\|_{0}=n^{b}$ , where $0\leq b<1$ . Then $d_{n}\sim C$ , $d_{1n}\sim C$ , and $s_{r}=n^{b}$ . Thus the conditions of Theorem 2 are $k\rightarrow\infty$ , $k/n\rightarrow 0$ , $\sqrt{s\log p}\sqrt{n}\lesssim k$ , $n^{b}(k/n)^{\gamma}\rightarrow 0$ , for any $\gamma>0$ , which implies $\sqrt{s\log p}\sqrt{n}\lesssim k<\min\{ns_{r}^{-1/\gamma},n\}.$ The error rate is $\sup_{1-\tau\in\mathcal{T}_{n}}\|\widehat{\boldsymbol{\beta}}(\tau)-\boldsymbol{\beta}(\tau)\|=O_{p}\left((n/k)^{\gamma+1}\sqrt{s\log(p\vee n)/n}\right)$ . Next, we consider two special cases to simplify the conditions on $k$ and discuss the feasibility.

(I).

The non-zero components of $\boldsymbol{\beta}$ and $\boldsymbol{\beta}(\tau)$ are at the same positions, i.e., $s=s_{r}=n^{b}$ , where $0<b<1$ .

Theorem 1 requires the condition $\sqrt{s\log p}\sqrt{n}\lesssim k<ns^{-{1}/{\gamma}}$ . The addtional conditions on $k$ in Theorems 2 and 3 are $d_{n}^{\rho}(k/n)^{-\rho}\vee d_{1n}d_{n}^{-\delta}(k/n)^{\delta}=o((ns/k)\sqrt{\log(p\vee n)/n})$ and $s_{r}=o\left((d_{n}^{\gamma+\rho}(n/k)^{\gamma+\rho})\vee(d_{n}^{\gamma-\delta}d_{1n}(n/k)^{\gamma-\delta})\right)$ , which can be simplified to $k\lesssim(s^{2}\log p)^{1/2(1-\rho)}n^{1-1/2(1-\rho)}$ and $k\lesssim ns^{-{1}/{(\gamma+\rho)}}$ , respectively. To obtain consistency of $\hat{\gamma}$ , the condition on $k$ is $s\sqrt{\log p}\sqrt{n}\lesssim k$ . In summary, the condition on $k$ is $s\sqrt{\log p}\sqrt{n}\lesssim k\lesssim\left\{(s^{2}\log p)^{1/2(1-\rho)}n^{1-1/2(1-\rho)}\right\}\wedge ns^{-1/(\gamma+\rho)}.$ To ensure the existence of an appropriate $k$ , the additional sparsity conditions are $s^{1-1/(2\rho)}\log(p)/n\rightarrow 0$ and $s^{1+{2}/{(\gamma+\rho)}}\log p/n\rightarrow 0$ .
(II).

Suppose that $\|\boldsymbol{\beta}\|_{0}$ is finite, that is, $b=0$ .

In this case, the condition on $k$ becomes less stringent. Specifically, the condition in Theorem 1 simplifies to $\sqrt{\log p}\sqrt{n}\lesssim k<n$ . Furthermore, the conditions on $k$ in Theorems 2 and 3 can be expressed as $\sqrt{\log p}\sqrt{n}\lesssim k\lesssim(\log p)^{1/2(1-\rho)}n^{1-1/2(1-\rho)}$ . These conditions are feasible for any $\rho<0$ .

2.4 Computational Issues

The proposed method relies on several tuning parameters/functions, including the penalization parameter $\lambda$ , the integer $k$ associated with the intermediate quantile level $\tau_{0n}=1-k/n$ , the weight function $\phi(\cdot)$ and the number of intermediate quantile levels $J$ in the refined Hill estimator. In this section, we will discuss practical methods for selecting these quantities.

Selection of $\lambda$ . In high-dimension settings, the optimal choice of $\lambda$ depends on many factors, including $p$ , the design matrix and error distribution. Bickel et al. (2009) suggested that $\lambda$ should exceed the supremum norm of a suitably rescaled (sub)gradient of the sample criterion function, Belloni and Chernozhukov (2011) proposed a similar approach for selecting $\lambda$ at central quantiles. However, applying this method directly to intermediate quantiles often results in over-shrinkage. Our theoretical study suggests that at intermediate quantiles, $\lambda$ should scale as $C\sqrt{n\log p}/\sqrt{1-\tau_{0n}}$ , but accurately estimating the constant $C$ is challenging, as noted in Homrighausen and McDonald (2017); Chetverikov et al. (2021). In our implementation, we conduct 10-fold cross-validations to select $\lambda$ for each $\tau_{j},j=1,2,\ldots,J$ . This involves splitting the data into ten folds and using each fold to evaluate the estimator based on the remaining folds. For each $\lambda$ , we calculate the average quantile loss at $\tau_{j}$ across all folds and select the $\lambda$ that minimizes this loss. While time-intensive, our numerical studies show that cross-validation performs well across various settings considered. Developing an efficient and data-driven selection method tailored to intermediate quantiles is an interesting direction for future research.

Selection of $k$ . The intermediate quantile level, or equivalently $k$ , plays an important role of balancing between bias and variance. A small $k$ increases variance, while a large $k$ leads to higher bias. A common approach is to select $k$ at the first stable point in the plot of the EVI estimator versus $k$ ; see Neves et al. (2015) for a heuristic algorithm. However, this method is computationally intensive in high-dimensional cases, as the estimator needs to be calculated over a sequence of $k$ values, and the EVI may exhibit local variations across $\mathbf{x}$ . To reduce the computational cost, we propose a rule of thumb for selecting $k$ . Based on Theorem 2 and Example 1, $k$ should satisfy $\sqrt{s\log p}\sqrt{n}\lesssim k<\min\{ns_{r}^{-1/\gamma},n\}$ . Motivated by this, we set $k=\lfloor c_{0}n^{0.5+\delta_{1}}(\log p)^{0.5+\delta_{2}}\rfloor$ , where $c_{0}>0$ is a constant, and $\delta_{1}$ and $\delta_{2}$ are small positive constants. The sensitivity analysis in the Supplementary Material shows that our proposed method remains stable for $c_{0}\in[0.5,2.3]$ , $\delta_{1}\in[0.005,0.016]$ , and $\delta_{2}\in[0.01,0.08]$ , across various scenarios and sample sizes. Throughout our numerical studies, we use $c_{0}=0.8$ , $\delta_{1}=0.01$ and $\delta_{2}=0.05$ .

Choice of weights. The refined Hill estimator assigns different weights to $J$ intermediate quantiles. In our implementation, we use $\phi(\ell_{j})=\ell_{j}^{a}$ and $\ell_{j}=s^{j-1}$ , where $a,s\in(0,1)$ , as recommended by He et al. (2022) for extreme analysis in an autoregressive model. The authors showed that, in their setup, the asymptotic variance of the refined Hill estimator is proportional to a term depending only on $(J,s,a)$ . Minimizing this term yields the optimal choice $(J,s,a)$ . While the limiting distribution is intractable without debiasing in high-dimensional settings, we find that the same parameters continue to perform well, resulting in more efficient estimation than methods based on the unweighted Hill estimator.

3 Simulation Study

We conduct a simulation study to investigate the performance of the proposed method for estimating extreme quantiles in high-dimensional settings. The data are generated from

Y_{i}=X_{i1}+X_{i2}+(1+wX_{i1})\varepsilon_{i},\qquad i=1,2,\ldots,n,

where $X_{ij}\sim U(0,1)$ for $j=1,2,\ldots,p$ and $\varepsilon_{i}$ are independent and identically distributed student t random variables, and $w$ is a constant controlling the degree of heteroscedasticity. Under this model, the $\tau$ th conditional quantile of $Y$ is $Q_{Y}(\tau|\mathbf{X}_{i})=\alpha(\tau)+\mathbf{X}_{i}^{T}\boldsymbol{\beta}(\tau),$ where $\mathbf{X}_{i}=\left(X_{i1},X_{i2},\ldots,X_{ip}\right)^{T}$ , $\alpha(\tau)=Q_{\varepsilon}(\tau)$ , $\boldsymbol{\beta}(\tau)=\left(1+wQ_{\varepsilon}(\tau),1,0\ldots,0\right)^{T}$ , and $Q_{\varepsilon}(\tau)$ is the $\tau$ th quantile of $\varepsilon_{i}$ . We consider six different cases. In Cases 1, 3, and 5, $\varepsilon_{i}\sim t(5)$ and in Cases 2, 4 and 6, $\varepsilon_{i}\sim t(2)$ . Cases 1-2 have homoscedastic errors with $w=0$ , while $w=0.5$ in Cases 3-4, and $w=0.9$ in Cases 5-6. Note that, for $t(v)$ , the EVI is $\gamma=1/v$ . We consider two settings for the sample size $n$ and dimension $p$ : (1) $p=p(n)$ grows with $n$ but $p$ is smaller than $n$ , with $p=\lceil n^{0.5}\rceil$ and $n=1000,5000$ ; (2) $p$ is larger than or equal to $n$ , with $(n,p)=(1000,1000)$ and $(1000,1200)$ . For each scenario, the simulation is repeated $500$ times.

For each simulation, we estimate $Q_{Y}(\tau_{n}|\mathbf{x})$ at $\tau_{n}=0.995,0.999$ using three methods: the extreme quantile regression method (EQR) without assuming common slope in Wang, Li and He (2012); the high-dimensional quantile regression (HQR) of Belloni and Chernozhukov (2011), and the proposed high-dimensional extreme quantile regression (HEQR). For EQR, the number of upper-order statistics $k$ is set to $[4.5n^{1/3}]$ , as recommended in Wang, Li and He (2012).

For each simulation, we calculate integrated squared error (ISE), defined as

\mathrm{ISE}=\frac{1}{L}\sum_{l=1}^{L}\left\{\frac{\widehat{Q}_{Y}(\tau_{n}|\mathbf{X}_{l}^{*})}{Q_{Y}(\tau_{n}|\mathbf{X}_{l}^{*})}-1\right\}^{2},

where $\mathbf{X}_{1}^{*},\ldots,\mathbf{X}_{L}^{*}$ are evaluation points of the covariates, and we define the mean integrated squared error (MISE) as the average ISE across $500$ simulations. In our simulation, we set $L=100$ , and let $\mathbf{X}_{l}^{*}$ be random replicates of $\mathbf{X}$ .

Table 1: The mean integrated squared error (MISE) and standard error (in parentheses) of different estimators for the extreme conditional quantile at

\tau_{n}=0.995,0.999

for

\gamma=0.2

and

p=\lceil n^{0.5}\rceil

. All values are in percentages.

$w$	$n$	$\tau_{n}$	HEQR	EQR	HQR
0	1000	0.995	2.9(0.05)	3.6(0.05)	5.7(0.13)
		0.999	4.7(0.09)	5.2(0.09)	12.5(0.12)
	5000	0.995	1.6(0.02)	2.7(0.02)	3.7(0.04)
		0.999	2.4(0.03)	4.2(0.05)	6.1(0.06)
0.5	1000	0.995	3.3(0.05)	4.4(0.06)	7.5(0.21)
		0.999	4.8(0.10)	5.8(0.10)	14.6(0.14)
	5000	0.995	2.0(0.02)	3.4(0.03)	5.0(0.06)
		0.999	2.4(0.03)	4.8(0.06)	6.9(0.08)
0.9	1000	0.995	3.7(0.05)	5.2(0.07)	9.0(0.29)
		0.999	5.0(0.09)	6.5(0.12)	15.8(0.17)
	5000	0.995	2.3(0.02)	4.0(0.04)	6.0(0.08)
		0.999	2.6(0.03)	5.1(0.06)	7.8(0.12)

•

HEQR: the proposed estimator; EQR: the method in Wang, Li and He (2012); HQR: the high-dimensional qunatile regression estimator in Belloni and Chernozhukov (2011).

Table 2: The mean integrated squared error (MISE) and standard error (in parentheses) of different estimators for the extreme conditional quantile at

\tau_{n}=0.995,0.999

for

\gamma=0.5

and

p=\lceil n^{0.5}\rceil

. All values are in percentages.

$w$	$n$	$\tau_{n}$	HEQR	EQR	HQR
0	1000	0.995	7.4(0.26)	10.7(0.56)	24.6(3.33)
		0.999	15.0(0.68)	19.9(2.11)	69.7(16.05)
	5000	0.995	5.3(0.11)	7.7(0.12)	12.7(0.28)
		0.999	7.2(0.23)	10.5(0.26)	28.5(2.78)
0.5	1000	0.995	8.5(0.31)	13.5(0.70)	32.3(3.70)
		0.999	15.4(0.84)	23.8(2.63)	96.8(28.76)
	5000	0.995	6.3(0.14)	9.7(0.18)	17.1(0.42)
		0.999	8.1(0.33)	11.9(0.39)	36.5(3.93)
0.9	1000	0.995	9.6(0.33)	16.1(0.80)	40.5(4.25)
		0.999	16.0(0.86)	27.3(2.86)	119.9(38.54)
	5000	0.995	7.3(0.17)	11.5(0.23)	21.5(0.58)
		0.999	9.4(0.39)	13.5(0.49)	46.9(5.11)

•

HEQR: the proposed estimator; EQR: the method in Wang, Li and He (2012); HQR: the high-dimensional qunatile regression estimator in Belloni and Chernozhukov (2011).

Table 3: The mean integrated squared error (MISE) and standard error (in parentheses) of different estimators for the extreme conditional quantile at

\tau_{n}=0.995,0.999

for the setting

p\geq n

, with

(n,p)=(1000,1000)

and

(1000,1200)

. All values are in percentages.

$w$	$\gamma$	$p$	$\tau_{n}$	HEQR	HQR
0	0.2	1000	0.995	0.23(0.03)	0.19(0.02)
			0.999	0.23(0.03)	0.43(0.03)
		1200	0.995	0.20(0.02)	0.16(0.02)
			0.999	0.22(0.03)	0.30(0.03)
0.5	0.2	1000	0.995	0.26(0.03)	0.23(0.02)
			0.999	0.26(0.03)	0.50(0.03)
		1200	0.995	0.23(0.03)	0.20(0.02)
			0.999	0.24(0.03)	0.35(0.03)
0	0.5	1000	0.995	0.32(0.04)	0.44(0.33)
			0.999	0.41(0.06)	0.63(0.13)
		1200	0.995	0.41(0.05)	0.36(0.23)
			0.999	0.51(0.05)	0.55(0.14)
0.5	0.5	1000	0.995	0.34(0.06)	0.52(0.71)
			0.999	0.39(0.07)	0.67(0.17)
		1200	0.995	0.34(0.05)	0.41(0.41)
			0.999	0.44(0.08)	0.58(0.20)

•

HEQR: the proposed estimator; HQR: the high-dimensional qunatile regression estimator in Belloni and Chernozhukov (2011).

We begin by examining the $p=\lceil n^{0.5}\rceil$ setting. Tables 1 and 2 summarize the MISE of three estimators for the extreme conditional quantile at $\tau_{n}=0.995,$ and $0.999$ . Results show that for all three methods, both MISE and its standard error increase as $\tau_{n}$ approaches 1, and as the distribution becomes more heavy-tailed or heteroscedastic. In all scenarios, the proposed method HEQR outperforms the others in terms of both MISE and standard error. Both EQR and HEQR rely on extrapolation using extreme value theory. In this setting, since $p$ increases with $n$ but remains smaller than $n$ , ERQ method is still applicable, and both EQR and HEQR perform better than HQR. The advantages of HEQR over EQR come from its penalized estimation of $\boldsymbol{\beta}(\tau)$ and its use of the refined Hill estimator, compared to EQR’s unweighted Hill estimator.

We next examine the setting where $p\geq n$ . Table 3 summarizes the results for two methods HEQR and HQR, with $(n,p)=(1000,1000)$ and $(1000,1200)$ , as EQR is not applicable in these cases. While HQR performs reasonably well for small $\gamma=0.2$ at $\tau_{n}=0.995$ , it is significantly less efficient than HEQR for all other scenarios, regarding both MISE and its standard error. Overall, the results suggest that HEQR demonstrates superior accuracy and stability, particularly for estimating extremely high quantiles in heavy-tailed distributions.

We perform a sensitivity analysis to assess the stability of the proposed method with respect to the choices of $c_{0}$ , $\delta_{1}$ , and $\delta_{2}$ in the rule of thumb $k=\lfloor c_{0}n^{0.5+\delta_{1}}(\log p)^{0.5+\delta_{2}}\rfloor$ . For brevity, we present results only for Case 1. The other cases exhibit similar behavior, and the results are provided in the Supplementary file. Figure 1 shows the MISE of EQR, HQR and HEQR for the extreme conditional quantiles at $\tau_{n}=0.995$ against $c_{0}\in[0.4,3]$ , with $(\delta_{1},\delta_{2})$ fixed at $(0.01,0.05)$ for $n=1000$ and $5000$ . The shaded area denotes the 95% pointwise confidence band of the MISE. The two horizontal lines represent the MISE of HQR and EQR, respectively, while the shaded area corresponds to the 95% pointwise confidence band of the MISE. For $c_{0}\in[0.5,2.3]$ , the MISE of HEQR is consistently smaller than that of HQR and EQR, indicating that $c_{0}$ in this range is a suitable choice. Figures 2 and 3 plot the MISE against $\delta_{1}\in[0.005,0.03]$ and $\delta_{2}\in[0.02,0.08]$ , respectively, with the other two constants fixed at their recommended values. The horizontal line represents the MISE associated with the recommended values $c_{0}=0.8$ , $\delta_{1}=0.01$ , and $\delta_{2}=0.05$ , while the shaded area corresponds to the 95% pointwise confidence band of the MISE. The results indicate that the method is insensitive to variations in $\delta_{1}\in[0.005,0.016]$ and $\delta_{2}\in[0.01,0.08]$ .

4 Analysis of Auto Insurance Claims

In this section, we analyze an auto insurance claims dataset to investigate the effects of various factors on the higher quantiles of claim amounts. The data is available on Kaggle at https://www.kaggle.com/datasets/xiaomengsun/car-insurance-claim-data. Insurance claims data are typically heavy-tailed and heterogeneous, reflecting the diverse nature of risks they embody and the array of factors that can influence claim sizes and frequencies. These factors not only complicate the analysis but also provide a rich vein of insights. We consider the following quantile regression model:

Q_{Y_{i}}(\tau|\mathbf{X})=\beta_{0}(\tau)+X_{i1}\beta_{1}(\tau)+X_{i2}\beta_{2}(\tau)+\cdots+X_{ip}\beta_{p}(\tau),\quad i=1,\ldots,n=8423,

where $Y$ represents the auto claim amount (in USD), and $X_{i1},X_{i2},\ldots,X_{ip}$ are $p=43$ covariates included after preprocessing. To account for nonlinearity, we include quadratic terms for “Age” and “Income,” as well as quadratic and cubic terms for “Vehicle Value” in the model, with both “Income” and “Vehicle Value” on the log-transformed scale. A detailed description of the variables can be found in Section S.4 of the Supplementary Materials. We focus on high quantiles ( $\tau=0.991,0.995,0.999$ ) to study the impact of various factors on large claims.

We conduct a cross-validation study to compare the performance of different methods for predicting the extremely high conditional quantiles of auto claims. We randomly divide the data set into a training set (20% of the data set, $n_{1}=1684$ observations) and a testing set (the remaining 80% of the data set, $n_{2}=6739$ observations). We apply HEQR, EQR, and HQR to analyze the training set and predict the extreme quantiles of auto claims conditional on the covariates in the testing set. For HEQR, we set $k=\lfloor 0.8n_{1}^{0.51}(\log p)^{0.55}\rfloor$ and estimate $\gamma$ at a central data point, with continuous covariates set to their mean values and binary variables set to their modes. Since $\mathbbm{1}\{{Y}<Q_{Y}({\tau}|\mathbf{X})\}$ has a mean of $\tau$ and a variance of $\tau(1-\tau)$ , this motivates us to consider the following prediction error (PE) metric:

\mathrm{PE}=\{n_{2}\tau(1-\tau)\}^{-1/2}\sum_{j\in\mathcal{I}}[\tau-\mathbbm{1}\{{Y}_{j}<\widehat{Q}_{\tau}({Y}|\mathbf{X}_{j})\}],

where $\mathcal{I}$ is the index set of the testing set. The cross-validation process is repeated 500 times. Table 4 summarizes the median absolute PE for different methods at $\tau_{n}=0.991$ , $0.995$ , and $0.999$ , with values in parentheses representing the corresponding median absolute deviation. The results indicate that HEQR yields significantly smaller prediction errors than the other two methods across all three quantile levels, demonstrating higher efficiency.

Table 4: The median absolute value (with median absolute deviation in parentheses) of prediction errors for different methods at

\tau_{n}=0.991

0.995

, and

0.999

, based on cross-validation of the auto claims data.

	$\tau_{n}=0.991$	$\tau_{n}=0.995$	$\tau_{n}=0.999$
HEQR	1.84(1.13)	1.83(1.19)	1.28(0.63)
EQR	7.82(0.00)	5.82(0.00)	2.60(0.00)
HQR	8.95(2.06)	13.96(3.02)	32.86(6.74)

•

HEQR: the proposed estimator; EQR: the method in Wang, Li and He (2012); HQR: the high-dimensional qunatile regression estimator in Belloni and Chernozhukov (2011).

Next, we analyze the full dataset to estimate extreme quantiles and identify important variables using HEQR. Out of the 43 covariates, our method identified 34 to have nonzero effects at the intermediate high quantile level at $\tau_{0n}=1-k/n$ with $k=\lfloor 0.8n^{0.51}(\log p)^{0.55}\rfloor$ and $n=8423$ . Notably, one of the most significant variables identified by the HEQR model is whether an individual resides in a rural district. This finding aligns with prior research by Clemente et al. (2023), which links rural residences to lower auto insurance claims due to reduced claim frequency and severity. Additionally, our results show that an increase in vehicle age corresponds to a decrease in claims, consistent with Clemente et al. (2023), who also found vehicle age negatively impacts claim frequency. Figure 4 shows the $\ell_{1}$ -penalized quantile coefficient estimates $\widehat{\boldsymbol{\beta}}_{j}(\tau)$ for selected covariates at upper quantiles. The results indicate that being a commercial vehicle positively affects auto claims, with the effect increasing at higher quantiles. In contrast, “Rural Population” and “Time in Force” (the duration the insurance policy has been in effect, abbreviated as TIF) have negative effects, which become stronger at higher quantiles. The variable “Bachelor’s Degree” shows little impact on auto claims. These findings highlight the heterogeneity of covariate effects, particularly in the upper tail of the claim distribution.

We also include the full-data analysis results from HQR for comparison. Figure 5 presents the extreme quantile estimates of auto claims at $\tau_{n}=0.991$ and 0.999 as a function of $X_{20}$ , log-transformed Vehicle Value, for two groups of individuals with different TIF from both HQR and HEQR, with other continuous covariates set to their mean values and categorical variables to their modes. The two groups correspond to individuals with TIF of 1 year and 7 years, representing the 0.25 and 0.75 quantiles, respectively. Results from HEQR suggest that, for both groups, higher vehicle values lead to larger claims, likely due to higher repair costs for more expensive vehicles, which aligns with common expectations and prior findings (Clemente et al., 2023). In contrast, HQR shows little variation in high quantiles of claims across vehicle values. Both methods indicate that individuals with a shorter TIF tend to have higher claims at $\tau_{n}=0.991$ . This finding aligns with our common understanding, as shorter TIFs may reflect less experienced or stable policyholders, potentially leading to higher-risk profiles and more frequent or severe claims. HEQR consistently shows the effect of TIF across high quantiles, while HQR becomes unstable at $\tau_{n}=0.999$ , displaying almost no difference between the two groups.

SUPPLEMENTARY MATERIAL

Supplementary Material for High-dimensional Extreme Quantile Regression:: The supplementary material includes all the technical conditions, proofs of theorems, additional simulation results, and further details on the analysis of the auto insurance claims data. (.pdf file)
Auto Insurance Claims Data Set:: Data set used in the illustration of the proposed methods in Section 4. (.csv file)

References

(1)
Belloni and Chernozhukov (2011) Belloni, A. and Chernozhukov, V. (2011), ‘ $\ell 1$ -penalized quantile regression in high-dimensional sparse models’, Annals of Statistics 39(1), 82–130.
Belloni et al. (2019) Belloni, A., Chernozhukov, V., Chetverikov, D. and Fernández-Val, I. (2019), ‘Conditional quantile processes based on series or many regressors’, Journal of Econometrics 213(1), 4–29.
Bickel et al. (2009) Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009), ‘Simultaneous analysis of lasso and dantzig selector’, Annals of Statistics 37(4), 1705–1732.
Bradic and Kolar (2017) Bradic, J. and Kolar, M. (2017), ‘Uniform inference for high-dimensional quantile regression: linear functionals and regression rank scores’, arXiv:1702.06209.
Chernozhukov (2005) Chernozhukov, V. (2005), ‘Extramal quantile regression’, Annals of Statistics 33(2), 806–839.
Chetverikov et al. (2021) Chetverikov, D., Liao, Z. and Chernozhukov, V. (2021), ‘On cross-validated lasso in high dimensions’, Annals of Statistics 49(3), 1300–1317.
Clemente et al. (2023) Clemente, C., Guerreiro, G. R. and Bravo, J. M. (2023), ‘Modelling motor insurance claim frequency and severity using gradient boosting’, Risks 11(9), 1–20.
Daouia et al. (2013) Daouia, A., Gardes, L. and Girard, S. (2013), ‘On kernel smoothing for extremal quantile regression’, Bernoulli 19(5B), 2557–2589.
Daouia et al. (2023) Daouia, A., Stupfler, G. and Usseglio-Carleve, A. (2023), ‘Inference for extremal regression with dependent heavy-tailed data’, Annals of Statistics 51(5), 2040–2066.
de Haan and Ferreira (2006) de Haan, L. and Ferreira, A. (2006), Extreme Value Theory: An Introduction, Springer Science & Business Media.
Drees (1995) Drees, H. (1995), ‘Refined pickands estimators of the extreme value index’, Annals of Statistics 23(6), 2059–2080.
Fan et al. (2014) Fan, J., Fan, Y. and Barut, E. (2014), ‘Adaptive robust variable selection’, Annals of Statistics 42(1), 324–351.
Gardes and Girard (2016) Gardes, L. and Girard, S. (2016), ‘On the estimation of the functional weibull tail-coefficient’, Journal of Multivariate Analysis 146, 29–45.
Gardes and Stupfler (2019) Gardes, L. and Stupfler, G. (2019), ‘An integrated functional weissman estimator for conditional extreme quantiles’, REVSTAT-Statistical Journal 17(1), 109–144.
Gnecco et al. (2024) Gnecco, N., Terefe, E. M. and Engelke, S. (2024), ‘Extremal random forests’, DOI: https://doi.org/10.1080/01621459.2023.2300522.
He et al. (2020) He, F., Wang, H. J. and Tong, T. (2020), ‘Extremal linear quantile regression with weibull-type tails’, Statistica Sinica 30(3), 1357–1377.
He et al. (2022) He, F., Wang, H. J. and Zhou, Y. (2022), ‘Extremal quantile autoregression for heavy-tailed time series’, Computational Statistics & Data Analysis 176, 107563.
He et al. (2023) He, X., Pan, X., Tan, K. M. and Zhou, W.-X. (2023), ‘Smoothed quantile regression with large-scale inference’, Journal of Econometrics 232(2), 367–388.
He and Shao (2000) He, X. and Shao, Q.-M. (2000), ‘On parameters of increasing dimensions’, Journal of Multivariate Analysis 73(1), 120–135.
Hill (1975) Hill, B. M. (1975), ‘A simple general approach to inference about the tail of a distribution’, Annals of Statistics 3(5), 1163–1174.
Homrighausen and McDonald (2017) Homrighausen, D. and McDonald, D. J. (2017), ‘Risk consistency of cross-validation with lasso-type procedures’, Statistica Sinica 49(3), 1017–1036.
Koenker (2005) Koenker, R. (2005), Quantile Regression, Cambridge University Press.
Koenker and Bassett (1978) Koenker, R. and Bassett, J. G. (1978), ‘Regression quantiles’, Econometrica 46(1), 33–50.
Koenker et al. (2017) Koenker, R., Chernozhukov, V., He, X. and Peng, L. (2017), Handbook of quantile regression, CRC press.
Li and Wang (2019) Li, D. and Wang, H. J. (2019), ‘Extreme quantile estimation for autoregressive models’, Journal of Business & Economic Statistics 37(4), 661–670.
Negahban et al. (2012) Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012), ‘A unified framework for high-dimensional analysis of m-estimators with decomposable regularizers’, Statistical Science 27(4), 538–557.
Neves et al. (2015) Neves, M. M., Gomes, M. I., Figueiredo, F. and Prata Gomes, D. (2015), ‘Modeling extreme events: sample fraction adaptive choice in parameter estimation’, Journal of Statistical Theory and Practice 9(1), 184–199.
Pan and Zhou (2021) Pan, X. and Zhou, W.-X. (2021), ‘Multiplier bootstrap for quantile regression: non-asymptotic theory under random design’, Information and Inference 10(3), 813–861.
Sasaki et al. (2024) Sasaki, Y., Tao, J. and Wang, Y. (2024), ‘High-dimensional tail index regression: with an application to text analyses of viral posts in social media’, arXiv:2403.01318.
Tan et al. (2022) Tan, K. M., Wang, L. and Zhou, W.-X. (2022), ‘High-dimensional quantile regression: Convolution smoothing and concave regularization’, Journal of the Royal Statistical Society Series B: Statistical Methodology 84(1), 205–233.
Velthoen et al. (2023) Velthoen, J., Dombry, C., Cai, J.-J. and Engelke, S. (2023), ‘Gradient boosting for extreme quantile regression’, Extremes 26(4), 639–667.
Wang and Li (2013) Wang, H. J. and Li, D. (2013), ‘Estimation of extreme conditional quantiles through power transformation’, Journal of the American Statistical Association 108(503), 1062–1074.
Wang, Li and He (2012) Wang, H. J., Li, D. and He, X. (2012), ‘Estimation of high conditional quantiles for heavy-tailed distributions’, Journal of the American Statistical Association 107(500), 1453–1464.
Wang and Tsai (2009) Wang, H. and Tsai, C.-L. (2009), ‘Tail index regression’, Journal of the American Statistical Association 104(487), 1233–1240.
Wang, Wu and Li (2012) Wang, L., Wu, Y. and Li, R. (2012), ‘Quantile regression for analyzing heterogeneity in ultra-high dimension’, Journal of the American Statistical Association 107(497), 214–222.
Welsh (1989) Welsh, A. (1989), ‘On m-processes and m-estimation’, Annals of Statistics 17(1), 337–361.
Xu et al. (2022) Xu, W., Hou, Y. and Li, D. (2022), ‘Prediction of extremal expectile based on regression models with heteroscedastic extremes’, Journal of Business & Economic Statistics 40(2), 522–536.
Youngman (2019) Youngman, B. D. (2019), ‘Generalized additive models for exceedances of high thresholds with an application to return level estimation for us wind gusts’, Journal of the American Statistical Association 114(528), 1865–1879.
Zhao et al. (2014) Zhao, T., Kolar, M. and Liu, H. (2014), ‘A general framework for robust testing and confidence regions in high-dimensional quantile regression’, arXiv:1412.8724.
Zheng et al. (2013) Zheng, Q., Gallagher, C. and Kulasekera, K. (2013), ‘Adaptive penalized quantile regression for high dimensional data’, Journal of Statistical Planning and Inference 143(6), 1029–1038.

High-Dimensional Extreme Quantile Regression

Abstract

1 Introduction

2 Extreme Quantile Estimation in High Dimensions

2.1 ℓ1\ell_{1}-penalized Intermediate Quantile Estimator

Remark 1.

Theorem 1.

2.2 Refined Hill Estimator

Theorem 2.

Remark 2.

2.3 Extreme Conditional Quantile Estimator

Theorem 3.

Remark 3.

2.4 Computational Issues

3 Simulation Study

4 Analysis of Auto Insurance Claims

References

2.1 $\ell_{1}$ -penalized Intermediate Quantile Estimator