
Robust High-dimensional Tuning-Free Multiple Testing

Supported by NSF grants DMS-2210833, DMS-2053832, DMS-2052926 and ONR grant N00014-22-1-2340.

Jianqing Fan  Zhipeng Lou  Mengxin Yu
Abstract

A stylized feature of high-dimensional data is that many variables have heavy tails, and robust statistical inference is critical for valid large-scale statistical inference. Yet, existing developments such as Winsorization, Huberization and the median of means require bounded second moments and involve variable-dependent tuning parameters, which hamper their fidelity in applications to large-scale problems. To remove these constraints, this paper revisits the celebrated Hodges-Lehmann (HL) estimator for estimating location parameters in both the one- and two-sample problems, from a non-asymptotic perspective. Our study develops a Berry-Esseen inequality and a Cramér-type moderate deviation for the HL estimator based on a newly developed non-asymptotic Bahadur representation, and builds data-driven confidence intervals via a weighted bootstrap approach. These results allow us to extend the HL estimator to large-scale studies and propose tuning-free and moment-free high-dimensional inference procedures for testing the global null and for large-scale multiple testing with false discovery proportion control. It is convincingly shown that the resulting tuning-free and moment-free methods control the false discovery proportion at a prescribed level. The simulation studies lend further support to the developed theory.

1 Introduction

Large-scale, high-dimensional data with rich structures have been widely collected in almost all scientific disciplines and the humanities, thanks to advances in modern technology. Massive developments have been made in statistics over the past two decades on extracting valuable information from these high-dimensional data; see Bühlmann and Van De Geer (2011); Hastie et al. (2009, 2015); Wainwright (2019); Fan et al. (2020a); Chen et al. (2021) for a detailed account and the references therein.

Despite its convenience for theoretical analysis, the sub-Gaussian tail condition is not realistic in many applications that involve high-dimensional variables. For instance, it is well known that heavy-tailed distributions are a stylized feature of financial returns and macroeconomic variables (Cont, 2001; Stock and Watson, 2002; Fan and Yao, 2017; Fan et al., 2021). Therefore, tools designed for sub-Gaussian data can lead to erroneous scientific conclusions. Asking thousands of gene expressions to all have sub-Gaussian tails is a mathematical dream, not a reality that data scientists face. For example, comparing gene expression profiles between various cell sub-populations, especially after treatments and therapies, is an essential statistical task (Nagalakshmi et al., 2008; Shendure and Ji, 2008; Wang et al., 2009; Li et al., 2012; Li and Tibshirani, 2013; Gupta et al., 2014; Finotello and Di Camillo, 2015). However, it is unrealistic to hope that thousands of gene expressions all have sub-Gaussian distributions: outliers in non-sub-Gaussian distributions can have a significant impact on nonrobust procedures and lead to many false positives and negatives (Gupta et al., 2014; Wang et al., 2015). The situation also arises for inference using functional magnetic resonance imaging (fMRI) data, since the data do not conform to the assumed Gaussian distribution (Eklund et al., 2016). These practical challenges demand efficient and reliable robust inference methods.

Recently, robust statistical methods have gained popularity as a means of handling outliers and heavy-tailed noise. Many prior works have taken significant strides toward effective statistical estimation under heavy-tailed distributions. For instance, to deal with heavy-tailed noise contamination, Huber regression was proposed (Huber, 1973), and subsequent publications along these lines include Yohai and Maronna (1979); Mammen (1989); He and Shao (1996, 2000), where the asymptotic properties of the Huber estimator have been thoroughly investigated. From a non-asymptotic perspective, the Huber-type estimator was recently revisited by Sun et al. (2020), in which the authors propose an adaptive Huber regression method and establish its non-asymptotic deviation bounds by only requiring a finite (1+\delta)-th moment of the noise for some \delta>0. Moreover, using a similar idea of making the corresponding M-estimator insensitive to extreme values, Catoni (2012) developed a novel approach through minimizing a robust empirical loss. It is demonstrated that the estimator has exponential concentration around the true mean and enjoys the same statistical rate as the sample average for sub-Gaussian distributions when the population only has a bounded second moment. Brownlees et al. (2015) further investigate empirical risk minimization based on the robust estimator proposed in Catoni (2012). Additionally, the so-called median-of-means strategy, which can be traced back to Nemirovsky and Yudin (1983), is another successful method for handling heavy-tailed distributions. By only requiring a bounded second moment, it achieves a sub-Gaussian type of concentration around the population mean parameter. Minsker (2015) and Hsu and Sabato (2016) further generalize this idea to multivariate cases. Moreover, there also exists a series of works that address the issues caused by heavy-tailed noise using quantile-based robust estimation; see Arcones (1995); Koenker and Hallock (2001); Belloni and Chernozhukov (2011); Fan et al. (2014); Zheng et al. (2015) for more details. Furthermore, Fan et al. (2021); Yang et al. (2017); Fan et al. (2022c) recently proposed, under heavy-tailed contamination, a novel principle of simply truncating or shrinking the response variables appropriately to achieve sub-Gaussian rates, requiring only a bounded second moment of the measurements. The aforementioned methodologies can also be applied to a wide range of problems, such as matrix sensing, matrix completion, robust PCA, factor analysis, and neural networks. For interested readers, we refer to Minsker (2015); Hsu and Sabato (2016); Fan et al. (2017); Loh (2017); Minsker (2018); Goldstein et al. (2018); Wang et al. (2020); Fan et al. (2022b); Wang and Fan (2022); Fan et al. (2022a) for more details.

While many effective solutions have been developed to address heavy-tailedness, these solutions still have some potential shortcomings. Specifically, the developments call for bounded second moments and are primarily based on data shrinkage, Huber-type losses, or the median of means (Huber, 1973; Nemirovsky and Yudin, 1983; Catoni, 2012; Fan et al., 2021). More critically, Huberization, Winsorization and sample splitting introduce additional tuning parameters, and these tuning parameters should be variable-dependent, which makes large-scale applications difficult and damages the fidelity of empirical results. Although quantile estimators (Koenker and Hallock, 2001) such as the median and the Hodges-Lehmann (HL) estimator (Hodges and Lehmann, 1963) can eliminate the restrictions on moment conditions and tuning parameter selection, the empirical median is often less efficient and requires stronger distributional assumptions. In addition, the existing literature on the HL estimator focuses mainly on low-dimensional asymptotic analysis and cannot be applied to large-scale inferences.

In this paper, we revisit the celebrated HL estimator (Hodges and Lehmann, 1963) and conduct non-asymptotic and large-scale theoretical studies for both one-sample and two-sample problems. For one-sample location estimation in the univariate case, we let X_{1},\ldots,X_{n}\in\mathbb{R} be independent and identically distributed (i.i.d.) random variables with

X_{i}=\theta+\xi_{i},\enspace i=1,\ldots,n, \qquad (1.1)

where \theta represents the location parameter of interest and \xi_{1},\ldots,\xi_{n} are i.i.d. random variables drawn from some unknown distribution. In this scenario, the HL estimator of \theta is defined by

\widehat{\theta}=\mathrm{median}\left\{\frac{X_{i}+X_{j}}{2}:1\leq i<j\leq n\right\}. \qquad (1.2)

By assuming the pseudomedian of \xi_{1} to be zero, we derive a non-asymptotic Bahadur representation of \widehat{\theta}. To the best of our knowledge, this is the first study of its kind on the non-asymptotic expansion of the HL estimator. From there, we also establish the Berry-Esseen bound and moderate deviation for \widehat{\theta} over the widest possible range. Furthermore, as there are multiple unknowns in the asymptotic distribution of \widehat{\theta}, including the density function of \xi_{1} and the unknown location parameter \theta, we then propose a weighted bootstrap approach to construct confidence intervals for \theta based on data. These results and methods are essential for large-scale inference.

In addition to the study of the one-sample problem, two-sample location shift problems arise frequently in many scientific studies, including choosing genes that are expressed differently in normal and injured spinal cords, determining the effects of treatment between treated and control groups, finding change points, etc. To this end, we let Y_{1},\ldots,Y_{m}\in\mathbb{R} be another independent sample of i.i.d. random variables satisfying

Y_{j}=\theta^{\circ}+\varepsilon_{j},\enspace j=1,\ldots,m. \qquad (1.3)

The primary goal is to conduct statistical inference for \Theta=\theta-\theta^{\circ}. In the sequel, following Hodges and Lehmann (1963), the two-sample HL estimator for \Theta is given by

\widehat{\Theta}=\mathrm{median}\{X_{i}-Y_{j}:i=1,\ldots,n;\,j=1,\ldots,m\}. \qquad (1.4)

Instead of assuming the noise variables are generated from the same distribution (Hodges and Lehmann, 1963), we only require \mathrm{median}(\xi_{1}-\varepsilon_{1})=0, which is more general and allows the random noises to have different distributions. In a similar vein, we establish the non-asymptotic expansion of \widehat{\Theta}, investigate its asymptotic distributions, and construct confidence intervals via bootstrap techniques. Again, the techniques and results developed can be applied to large-scale multiple testing problems.

There is a rich literature on large-scale multiple testing problems for location parameters (Benjamini and Hochberg, 1995; Storey, 2002, 2003; Genovese and Wasserman, 2004; Ferreira and Zwinderman, 2006; Chi, 2007; Blanchard and Roquain, 2009). However, most of these works assume sub-Gaussian noise distributions. Moving away from sub-Gaussian assumptions, Fan et al. (2019) propose estimating the mean vector via minimizing a Huber-type loss and performing false discovery proportion (FDP) control. However, leveraging Huber-type estimators necessitates moment conditions and introduces tuning parameters, making the method hard to apply in large-scale inference, as the tuning parameters should ideally be variable-dependent. Additionally, while the HL estimator enjoys tuning-free and moment-free qualities in the univariate setting, its behavior in high dimensions is largely unknown. To this end, prompted by the lack of tuning-free large-scale multiple testing procedures for heavy-tailed distributions, we further extend the HL estimator to high-dimensional regimes.

Specifically, we assume \bm{X}_{i}=\bm{\theta}+\bm{\xi}_{i}, i\in[n], and \bm{Y}_{j}=\bm{\theta}^{\circ}+\bm{\varepsilon}_{j}, j\in[m], where \bm{\theta}=(\theta_{1},\ldots,\theta_{p})^{\top} and \bm{\theta}^{\circ}=(\theta_{1}^{\circ},\ldots,\theta_{p}^{\circ})^{\top} are p-dimensional vectors of unknown parameters, and \bm{\xi}_{i}, i\in[n], and \bm{\varepsilon}_{j}, j\in[m], are the random noise vectors. Let \bm{\Theta}=\bm{\theta}-\bm{\theta}^{\circ}. For both one- and two-sample problems, we propose a carefully constructed Gaussian multiplier bootstrap to test the global null hypotheses

H_{0}:\theta_{\ell}\,\,\textrm{or}\,\,\Theta_{\ell}=0\mbox{ for all }\ell\in[p]\enspace\mathrm{versus}\enspace H_{1}:\theta_{\ell}\,\,\textrm{or}\,\,\Theta_{\ell}\neq 0\mbox{ for some }\ell\in[p], \qquad (1.5)

by extending the HL estimator to high-dimensional regimes. When the null hypothesis above is rejected, we then perform multiple testing, allowing weakly dependent measurements, and efficiently control the FDP. Compared with the existing literature (Liu and Shao, 2014; Fan et al., 2019), our procedures do not involve any tuning parameters or moment conditions for testing the global null and for large-scale multiple testing. These theoretical findings are further supported by extensive numerical studies.

The main contributions of the paper can be summarized as follows:

  • The existing studies on the HL estimator mainly focus on its asymptotic behavior, which is too weak for high-dimensional applications. In practice, however, it is crucial to understand the HL estimator’s finite-sample performance, especially in high-dimensional and large-scale experiments. For this purpose, we first derive the non-asymptotic expansions of the HL estimators for both one-sample and two-sample problems.

  • With the non-asymptotic expansions of the HL estimators, for both one- and two-sample problems, we derive Berry-Esseen-type bounds and Cramér-type moderate deviations valid over the widest possible range. To deal with unknown components in the distribution, we further develop the weighted bootstrap to build data-driven confidence intervals. In addition, we also furnish a non-asymptotic analysis of the bootstrap estimator.

  • Existing work on large-scale testing with heavy-tailed errors typically involves additional tuning parameters and moment conditions. To address these issues, we generalize the HL estimator to large-scale studies and propose tuning-free and moment-free high-dimensional testing procedures. Additionally, we develop bootstrap methods for calculating critical values in large-scale applications. We show that the resulting false discovery proportion is well controlled.

1.1 Roadmap

In §2, we first set up the model and introduce the basic settings. We then derive the one-sample and two-sample non-asymptotic expansions of the HL estimator, its Berry-Esseen bounds and moderate deviations. In addition, as the asymptotic distribution of the estimator involves unknown quantities, in §3 we conduct a weighted bootstrap to construct valid data-driven confidence intervals. Moreover, §4 is devoted to extending the HL estimator to large-scale multiple testing problems. §5 contains comprehensive numerical studies to verify the theoretical results. Finally, we conclude the paper with some discussions in §6. All proofs are deferred to the appendix.

1.2 Notation

For any integer m, we use [m] to denote the set [m]=\{1,2,\ldots,m\}. For any function h:\mathbb{R}\to\mathbb{R}, we denote \|h\|_{\infty}=\sup_{z\in\mathbb{R}}|h(z)|. Throughout this paper, we use C,C_{1},C_{2},\ldots to denote universal positive constants whose values may vary at different places. We use \mathbb{I}\{\cdot\} to denote the indicator function. For any set A, we use |A| to denote its cardinality. For two positive sequences \{a_{n}\}_{n\geq 1} and \{b_{n}\}_{n\geq 1}, we write a_{n}=\mathcal{O}(b_{n}) or a_{n}\lesssim b_{n} if there exists a positive constant C such that a_{n}\leq C\cdot b_{n}, and we write a_{n}=o(b_{n}) if a_{n}/b_{n}\rightarrow 0. In addition, we define the pseudomedian of a distribution F to be the median of the distribution of (Z_{1}+Z_{2})/2, where Z_{1} and Z_{2} are independent, each with the same distribution F. Moreover, for any distribution F and constant c, we let c\cdot F represent the distribution of the random variable c\cdot X, where X is a random variable drawn from F.

2 Estimation and Inference

This section is devoted to studying the non-asymptotic expansions of the Hodges-Lehmann estimator and conducting statistical estimation and inference for population location shift parameters. For both one-sample and two-sample problems, the theoretical properties, which are needed for large-scale inferences, are presented in the following sections.

2.1 One-sample Problem

Let X_{i}=\theta+\xi_{i}, i\in[n], be i.i.d. real-valued random variables, where \theta\in\mathbb{R} is the unknown location parameter of interest and \xi_{1},\ldots,\xi_{n} are i.i.d. random variables drawn from some unknown distribution. Throughout this section, it is assumed that \xi_{1} has a pseudomedian (Høyland, 1965) of zero. As a consequence, letting U(t)=\mathbb{P}\{(X_{1}+X_{2})/2\leq t\}, it holds that \theta=\inf\{t\in\mathbb{R}:U(t)\geq 1/2\}. The HL estimator (Hodges and Lehmann, 1963) of \theta is given by the median of all Walsh averages of the observations X_{1},\ldots,X_{n}, namely,

\widehat{\theta}=\mathrm{median}\{(X_{i}+X_{j})/2:i\neq j\in[n]\}. \qquad (2.1)

Equivalently, if we define the U-process U_{n}(t)=\{n(n-1)\}^{-1}\sum_{i\neq j\in[n]}\mathbb{I}\{(X_{i}+X_{j})/2\leq t\}, the HL estimator \widehat{\theta} in (2.1) can also be expressed as the sample median of the process U_{n}(t), namely,

\widehat{\theta}=\inf\{t\in\mathbb{R}:U_{n}(t)\geq 1/2\}. \qquad (2.2)
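Computationally, (2.1) is a single median over all Walsh averages. The following is a minimal numpy sketch (the function name hl_one_sample is ours, not from the paper); it uses pairs i<j, which give the same median as i\neq j since each unordered pair is merely counted twice there.

```python
import numpy as np

def hl_one_sample(x):
    """One-sample Hodges-Lehmann estimate: median of all Walsh averages.

    Pairs with i < j give the same median as i != j in (2.1), since
    each unordered pair is simply counted twice there.
    """
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)  # all index pairs i < j
    return np.median(0.5 * (x[i] + x[j]))
```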

Let F(t)=\mathbb{P}(\xi_{1}\leq t) denote the cumulative distribution function of \xi_{1} and f(t)=F^{\prime}(t) be its density function. We then present the non-asymptotic Bahadur representation of \widehat{\theta} in the following theorem.

Theorem 2.1.

Assume that there exist positive constants c_{0} and \kappa_{0} such that \inf_{|\delta|\leq c_{0}}U^{\prime}(\theta+\delta)\geq\kappa_{0}. Then for any z>0, we have

\mathbb{P}(|\widehat{\theta}-\theta|>z)\leq 2\exp\{-n\kappa_{0}^{2}(z\wedge c_{0})^{2}\}. \qquad (2.3)

Furthermore, assume that \sup_{z\in\mathbb{R}}|f(z)|<\infty and there exist positive constants c_{1} and \kappa_{1} such that \sup_{|\delta|\leq c_{1}}|U^{\prime\prime}(\theta+\delta)|\leq\kappa_{1}. Then for any z>0 such that z=o(n), we have

\mathbb{P}\bigg\{\bigg|\widehat{\theta}-\theta-\frac{2}{nU^{\prime}(\theta)}\sum_{i=1}^{n}\bigg\{\frac{1}{2}-F(-\xi_{i})\bigg\}\bigg|>\frac{C_{1}(z\vee 1)}{n}\bigg\}\leq C_{2}\exp(-z), \qquad (2.4)

where C_{1},C_{2} are positive constants depending only on c_{0},\kappa_{0},c_{1},\kappa_{1} and \|f\|_{\infty}.

We note that existing works mainly study the asymptotic distribution of quantiles of U-statistics rather than non-asymptotic properties (Arcones, 1996). Asymptotic theory, however, is frequently less effective for theoretical studies in high-dimensional statistics (Wainwright, 2019). To fill this gap, in Theorem 2.1 we present both the non-asymptotic deviation bound and the linear approximation of the HL estimator \widehat{\theta}. It is worth mentioning that the HL estimator \widehat{\theta} has sub-Gaussian tails without any moment constraints imposed on the noise \xi_{1}, whereas the Huber-type or winsorized estimator requires the existence of the second moment. Moreover, in contrast to Huber regression (Zhou et al., 2018; Sun et al., 2020) or truncation (Fan et al., 2021), which both require additional tuning parameters, HL-type estimation is tuning-free and thus more scalable.

Moreover, when the distribution of \xi_{1} is symmetric around zero, \theta reduces to the median of the distribution of X. In this scenario, the sample median \widehat{\theta}_{\mathrm{med}}=\mathrm{median}\{X_{1},\ldots,X_{n}\} serves as a plausible alternative robust estimator for \theta. Under similar regularity conditions on the density function f(t), the classical Bahadur representation for \widehat{\theta}_{\mathrm{med}} reveals that

\mathbb{P}\bigg\{\bigg|\widehat{\theta}_{\mathrm{med}}-\theta-\frac{1}{nf(0)}\sum_{i=1}^{n}\bigg(\frac{1}{2}-\mathbb{I}\{\xi_{i}\leq 0\}\bigg)\bigg|>\frac{C\log n}{n^{3/4}}\bigg\}\leq Cn^{-c} \qquad (2.5)

for any constant c>0. Compared with (2.4) (taking z=\mathcal{O}(\log n)), the linear approximation of the HL estimator is much more accurate than that of the quantile estimator (Arcones, 1996).

2.1.1 Asymptotic Distribution

In addition to estimation, statistical inference is also essential in real-world applications. To this end, with the developed non-asymptotic expansion at hand, we next present the asymptotic distribution of the HL estimator \widehat{\theta} in this section.

Let \Phi(\cdot) denote the cumulative distribution function of a standard normal random variable. The following theorem establishes a Berry-Esseen bound for \widehat{\theta}.

Theorem 2.2.

Under the conditions of Theorem 2.1, we have

\sup_{z\in\mathbb{R}}\bigg|\mathbb{P}\bigg\{\frac{\sqrt{n}(\widehat{\theta}-\theta)}{\sigma_{\theta}}\leq z\bigg\}-\Phi(z)\bigg|\leq\frac{C\log n}{\sqrt{n}},

where C<\infty is a positive constant independent of n and

\sigma_{\theta}^{2}=\frac{4\operatorname{Var}\{F(-\xi_{1})\}}{\{U^{\prime}(\theta)\}^{2}}. \qquad (2.6)

Theorem 2.2 establishes the asymptotic normality of \widehat{\theta}. When the distribution of \xi_{1} is symmetric around zero, the asymptotic variance above reduces to \sigma_{\theta}^{2}=1/[3\{U^{\prime}(\theta)\}^{2}]. Consequently, in view of (2.5) and (2.6), the asymptotic relative efficiency (ARE) between the HL estimator \widehat{\theta} and the sample median \widehat{\theta}_{\mathrm{med}} is 3\{U^{\prime}(\theta)\}^{2}/[4\{f(0)\}^{2}] (Hodges and Lehmann, 1963). A concrete example is given in Table 1, where we summarize the ARE between \widehat{\theta} and \widehat{\theta}_{\mathrm{med}} for \xi_{1}\sim t_{\nu}.

\nu    1     2     4     8     16    \infty
ARE    0.75  1.04  1.25  1.37  1.43  1.50
Table 1: Asymptotic relative efficiency between the HL estimator and the sample median

In particular, when \nu\geq 2, the HL estimator has a strictly smaller asymptotic variance than the sample median. The above example illustrates the advantage of the HL estimator over the median-based quantile method.
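As a quick sanity check of the \nu=\infty (Gaussian) entry, for \xi_{1}\sim N(0,\sigma^{2}) we have (\xi_{1}+\xi_{2})/2\sim N(0,\sigma^{2}/2), so U^{\prime}(\theta)=1/(\sigma\sqrt{\pi}) and f(0)=1/(\sigma\sqrt{2\pi}), giving

\mathrm{ARE}=\frac{3\{U^{\prime}(\theta)\}^{2}}{4\{f(0)\}^{2}}=\frac{3/(\pi\sigma^{2})}{4/(2\pi\sigma^{2})}=\frac{3}{2},

which matches the value 1.50 in Table 1.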

Based on the non-asymptotic linear expansion in (2.4), we further derive the Cramér-type moderate deviation to quantify the relative error of the normal approximation for \widehat{\theta} in the following theorem, which has important applications to large-scale inference (Fan et al., 2007; Liu and Shao, 2014; Xia et al., 2018; Zhou et al., 2018).

Theorem 2.3.

Let \{\delta_{n}\}_{n\geq 1} be a sequence of positive numbers satisfying \sqrt{n}\delta_{n}\to\infty. Then, under the conditions of Theorem 2.1, we have

\bigg|\frac{\mathbb{P}\{\sqrt{n}(\widehat{\theta}-\theta)/\sigma_{\theta}>z\}}{1-\Phi(z)}-1\bigg|\leq C\bigg\{\frac{1+z^{3}}{\sqrt{n}}+(1+z)\delta_{n}+2(z\vee 1)\sqrt{2\pi}\exp\bigg(\frac{z^{2}}{2}-\sqrt{n}\delta_{n}\bigg)\bigg\}, \qquad (2.7)

uniformly for 0<z\leq o(\delta_{n}^{-1}\wedge n^{1/4}\sqrt{\delta_{n}}), where C<\infty is a positive constant independent of z and n. In particular, when \delta_{n}\asymp n^{-1/6}, we have

\bigg|\frac{\mathbb{P}\{\sqrt{n}(\widehat{\theta}-\theta)/\sigma_{\theta}>z\}}{1-\Phi(z)}-1\bigg|\to 0,

uniformly for 0<z\leq o(n^{1/6}).

It is worth mentioning that taking \delta_{n}\asymp n^{-1/6} yields the widest possible range 0<z\leq o(n^{1/6}) for the relative error in (2.7) to vanish, which is also optimal for Cramér-type moderate deviation results (Petrov, 1975; Fan et al., 2007; Liu and Shao, 2014; Zhou et al., 2018; Fan et al., 2019; Chen and Zhou, 2020; Fang et al., 2020). Next, we proceed to estimate the location shift parameter between two distributions via the HL estimator.

2.2 Two-sample Problem

Two-sample location shift estimation and inference arise in a variety of applications, such as testing gene differences, quantifying treatment effects, and detecting change points. Accordingly, this section examines the two-sample estimation and inference of the population location shift parameter.

Let Y_{j}=\theta^{\circ}+\varepsilon_{j}, j\in[m], be another sample of i.i.d. real-valued random variables independent of \{X_{1},\ldots,X_{n}\}, and we aim at constructing confidence intervals for \Theta=\theta-\theta^{\circ}. Throughout this section, it is assumed that

\Theta=\inf\{t\in\mathbb{R}:\mathcal{U}(t)\geq 1/2\},\qquad\mathcal{U}(t)=\mathbb{P}(X_{1}-Y_{1}\leq t). \qquad (2.8)

The existing literature on HL estimators mainly deals with the case where \xi_{1} and \varepsilon_{1} are identically distributed (Hodges and Lehmann, 1963; Lehmann, 1963; Bauer, 1972; Rosenkranz, 2010). In contrast, it should be noted that the assumption imposed in (2.8) is satisfied as long as \mathrm{median}(\varepsilon_{1}-\xi_{1})=0, which is much more general than assuming identical distributions. In the sequel, following Hodges and Lehmann (1963), the two-sample HL estimator for \Theta is given by

\widehat{\Theta}=\mathrm{median}\{X_{i}-Y_{j}:i\in[n],j\in[m]\}. \qquad (2.9)
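Computationally, (2.9) is again a single median; here is a minimal numpy sketch in the same vein as before (the function name hl_two_sample is ours):

```python
import numpy as np

def hl_two_sample(x, y):
    """Two-sample Hodges-Lehmann estimate (2.9): the median of all
    n x m pairwise differences X_i - Y_j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.median(x[:, None] - y[None, :])  # broadcast to the n x m grid
```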

Before proceeding, we present the following assumption on the relative sample sizes of the involved random samples.

Assumption 2.1.

There exists a positive constant \bar{\eta}<1 such that \bar{\eta}\leq(n/m)\leq 1/\bar{\eta}.

Assumption 2.1 is a natural condition which ensures the sample sizes are comparable. Such a requirement is commonly imposed for two-sample estimation and inference (Bai and Saranadasa, 1996; Chen and Qin, 2010; Li and Chen, 2012; Chang et al., 2017; Zhang et al., 2020). In what follows, we write N=nm/(n+m) for simplicity. The sub-Gaussian-type deviation inequality and the non-asymptotic Bahadur representation of the two-sample HL estimator \widehat{\Theta} are established in the subsequent theorem.

Theorem 2.4.

Assume that there exist positive constants \bar{c}_{0} and \bar{\kappa}_{0} such that \inf_{|\delta|\leq\bar{c}_{0}}\mathcal{U}^{\prime}(\Theta+\delta)\geq\bar{\kappa}_{0}>0. Then for any z>0, we have

\mathbb{P}(|\widehat{\Theta}-\Theta|>z)\leq 4\exp\{-2(n\wedge m)\bar{\kappa}_{0}^{2}(z\wedge\bar{c}_{0})^{2}\}.

Furthermore, assume that \sup_{t\in\mathbb{R}}|\mathcal{U}^{\prime}(t)|<\infty and there exist positive constants \bar{c}_{1} and \bar{\kappa}_{1} such that \sup_{|\delta|\leq\bar{c}_{1}}|\mathcal{U}^{\prime\prime}(\Theta+\delta)|\leq\bar{\kappa}_{1}. Then, under Assumption 2.1, for any 0<z=o(N), we have

\mathbb{P}\bigg\{\bigg|\widehat{\Theta}-\Theta-\frac{1}{\mathcal{U}^{\prime}(\Theta)}\bigg\{\frac{1}{n}\sum_{i=1}^{n}G(\xi_{i})-\frac{1}{m}\sum_{j=1}^{m}F(\varepsilon_{j})\bigg\}\bigg|>\frac{C_{1}(z\vee 1)}{N}\bigg\}\leq C_{2}\exp(-z),

where G(t)=\mathbb{P}(\varepsilon_{1}\leq t) stands for the cumulative distribution function of \varepsilon_{1} and C_{1},C_{2}<\infty are positive constants depending only on \bar{c}_{0},\bar{\kappa}_{0},\bar{c}_{1},\bar{\kappa}_{1},\bar{\eta} and \|\mathcal{U}^{\prime}\|_{\infty}.

Theorem 2.4 presents the non-asymptotic approximation of the HL estimator \widehat{\Theta}, where the approximation also enjoys sub-Gaussian tails without imposing any constraints on the moments of \xi_{1} and \varepsilon_{1}. Equipped with this, we establish the Berry-Esseen bound and Cramér-type moderate deviation of \widehat{\Theta}, respectively, in the following theorem. Before proceeding, we define the asymptotic variance of \sqrt{N}(\widehat{\Theta}-\Theta) to be

\widetilde{\sigma}_{\Theta}^{2}=\frac{1}{\{\mathcal{U}^{\prime}(\Theta)\}^{2}}\bigg(\frac{n}{n+m}\operatorname{Var}\{F(\varepsilon_{1})\}+\frac{m}{n+m}\operatorname{Var}\{G(\xi_{1})\}\bigg).
Theorem 2.5.

Under the conditions of Theorem 2.4, we have

\sup_{z\in\mathbb{R}}\bigg|\mathbb{P}\bigg\{\frac{\sqrt{N}(\widehat{\Theta}-\Theta)}{\widetilde{\sigma}_{\Theta}}\leq z\bigg\}-\Phi(z)\bigg|\leq\frac{C_{1}\log N}{\sqrt{N}},

where C_{1}<\infty is a positive constant independent of N. Moreover, let \{\delta_{N}\}_{N\geq 1} be a sequence of positive constants satisfying \sqrt{N}\delta_{N}\to\infty. Then we further have

\bigg|\frac{\mathbb{P}\{\sqrt{N}(\widehat{\Theta}-\Theta)/\widetilde{\sigma}_{\Theta}\geq z\}}{1-\Phi(z)}-1\bigg|\leq C_{2}\bigg\{\frac{1+z^{3}}{\sqrt{N}}+(1+z)\delta_{N}+2(z\vee 1)\sqrt{2\pi}\exp\bigg(\frac{z^{2}}{2}-\sqrt{N}\delta_{N}\bigg)\bigg\},

uniformly for 0<z\leq o(\delta_{N}^{-1}\wedge N^{1/4}\sqrt{\delta_{N}}), where C_{2}<\infty is a positive constant independent of z and N. In particular, when \delta_{N}\asymp N^{-1/6}, we have

\bigg|\frac{\mathbb{P}\{\sqrt{N}(\widehat{\Theta}-\Theta)/\widetilde{\sigma}_{\Theta}\geq z\}}{1-\Phi(z)}-1\bigg|\to 0,

uniformly for 0<z\leq o(N^{1/6}).

One observes that the asymptotic distributions of \widehat{\theta} and \widehat{\Theta} involve many unknown quantities such as the density functions and the population parameters \theta and \Theta. In the following section, we utilize the bootstrap method to construct confidence intervals for the parameters of interest.

3 Bootstrap Calibration

In this section, we propose a weighted bootstrap method to construct confidence intervals for \theta and \Theta, rather than directly estimating the unknown quantities in the asymptotic variances by brute force. The reason is that the direct estimation approach necessitates the additional selection of tuning parameters and imposes moment conditions. Additionally, the bootstrap calibration performs admirably with finite samples, particularly when the sample size is modest. Therefore, in the sections that follow, we outline the bootstrap procedures for both the one- and two-sample problems.

3.1 Bootstrap for One-sample Problem

Recall that the one-sample HL estimator is given by \widehat{\theta}=\arg\min_{\nu\in\mathbb{R}}\sum_{i\neq j\in[n]}|X_{i}+X_{j}-2\nu|. Throughout this paper, we focus on the weighted bootstrap procedure in which the bootstrap estimate of \widehat{\theta} is defined by minimizing the randomly perturbed objective function. More specifically, let \omega_{1},\ldots,\omega_{n}\in\mathbb{R} be i.i.d. non-negative random variables with \mathbb{E}(\omega_{1})=1 and \operatorname{Var}(\omega_{1})=1. Then the weighted bootstrap estimate of \widehat{\theta} is given by

\widehat{\theta}^{\star}=\underset{\nu\in\mathbb{R}}{\arg\min}\sum_{i\neq j\in[n]}\omega_{i}\omega_{j}|X_{i}+X_{j}-2\nu|.

A natural candidate for the bootstrap weights above is \omega_{i} sampled from a 2\cdotBernoulli(0.5) distribution (the factor of 2 guarantees the normalization conditions above). In this case, the bootstrap estimator \widehat{\theta}^{\star} has the simple closed-form expression

\widehat{\theta}^{\star}=\mathrm{median}\{(X_{i}+X_{j})/2:i\neq j\in\mathcal{S}\}, \qquad (3.1)

which is the same as the sub-sampled HL estimator computed on the dataset \{X_{i}:i\in\mathcal{S}\}, where \mathcal{S}=\{i\in[n]:\omega_{i}\neq 0\}; we concentrate on this type of bootstrap calibration procedure in what follows.
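In code, one bootstrap draw of (3.1) amounts to keeping each observation independently with probability 1/2 and recomputing the HL estimate on the survivors; a minimal sketch, reusing the hypothetical hl_one_sample from §2.1:

```python
import numpy as np

def hl_bootstrap_once(x, rng):
    """One weighted-bootstrap draw of the HL estimator, as in (3.1):
    w_i ~ 2 * Bernoulli(0.5), so theta-star is the HL estimate on the
    random subsample S = {i : w_i != 0}."""
    x = np.asarray(x, dtype=float)
    w = 2 * rng.binomial(1, 0.5, size=len(x))
    return hl_one_sample(x[w != 0])
```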

Let B_{n}=\sum_{i\neq j\in[n]}\omega_{i}\omega_{j} denote the total number of Walsh averages in (3.1) and denote V_{ij}=[\mathbb{I}\{\xi_{i}+\xi_{j}\leq 0\}-U_{n}(\theta)]/U^{\prime}(\theta) for each i\neq j\in[n]. In the subsequent theorem, we establish the non-asymptotic Bahadur representation of the bootstrap estimator \widehat{\theta}^{\star} and the approximated distribution of the bootstrap samples.

Theorem 3.1.

Under the conditions of Theorem 2.1, for any \omega,z>0 such that (\omega\vee z)=o(n), with probability at least 1-C_{1}\exp(-\omega), we have

\mathbb{P}^{\star}\bigg\{|\widehat{\theta}^{\star}-\widehat{\theta}|>C_{2}\bigg(\frac{\omega+\log n}{n}+\sqrt{z/n}\bigg)\bigg\}\leq C_{3}\exp(-z), \qquad (3.2)

\mathbb{P}^{\star}\bigg\{\bigg|\widehat{\theta}^{\star}-\widehat{\theta}-\frac{1}{B_{n}}\sum_{i\neq j\in[n]}(\omega_{i}\omega_{j}-1)V_{ij}\bigg|>\frac{C_{4}(z+\omega+\log n)}{n}\bigg\}\leq C_{5}\exp(-z), \qquad (3.3)

where \mathbb{P}^{\star}(\cdot)=\mathbb{P}(\cdot|X_{1},\ldots,X_{n}) stands for the conditional probability and C_{1}–C_{5} are positive constants depending only on c_{0},\kappa_{0},c_{1},\kappa_{1} and \|f\|_{\infty}.

The non-asymptotic linear expansion in (3.3) enables us to derive the asymptotic normality of the bootstrap estimator \widehat{\theta}^{\star}. Combined with the Berry-Esseen bound in Theorem 2.2, we further establish a non-asymptotic upper bound on the Kolmogorov distance between the distribution functions of \widehat{\theta}-\theta and \widehat{\theta}^{\star}-\widehat{\theta}. More specifically, with probability at least 1-C\exp(-\omega), we have

\sup_{z\in\mathbb{R}}\left|\mathbb{P}^{\star}\left(|\widehat{\theta}^{\star}-\widehat{\theta}|\leq z\right)-\mathbb{P}\left(|\widehat{\theta}-\theta|\leq z\right)\right|\leq\frac{C^{\prime}(\omega+\log n)}{\sqrt{n}}, \qquad (3.4)

where C and C^{\prime} are positive constants independent of n. Consequently, we are equipped to construct a confidence interval for \theta in a data-driven way. For any significance level \alpha\in(0,1), let

q_{1-\alpha}^{\star}=\inf\left\{z\in\mathbb{R}:\mathbb{P}^{\star}\left(|\widehat{\theta}^{\star}-\widehat{\theta}|\leq z\right)\geq 1-\alpha\right\}.

Then the (1-\alpha)\times 100\% confidence interval for \theta is given by \mathbb{CI}(\theta,1-\alpha)=[\widehat{\theta}-q_{1-\alpha}^{\star},\widehat{\theta}+q_{1-\alpha}^{\star}].
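Putting the pieces together, a sketch of the resulting data-driven interval, with q_{1-\alpha}^{\star} replaced by its empirical counterpart over B bootstrap draws (B and the helper names are our choices, not the paper's):

```python
import numpy as np

def hl_confint(x, alpha=0.05, B=300, rng=None):
    """(1 - alpha) confidence interval for theta: q_star is the empirical
    (1 - alpha) quantile of |theta-star - theta-hat| over B draws."""
    rng = rng if rng is not None else np.random.default_rng()
    theta_hat = hl_one_sample(x)
    boot = np.array([hl_bootstrap_once(x, rng) for _ in range(B)])
    q_star = np.quantile(np.abs(boot - theta_hat), 1 - alpha)
    return theta_hat - q_star, theta_hat + q_star
```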

3.2 Bootstrap for Two-sample Problem

This section is devoted to constructing confidence intervals for \Theta in the two-sample problem. Let \omega_{n+1},\ldots,\omega_{n+m}\in\mathbb{R} be i.i.d. 2\cdotBernoulli(0.5) random variables independent of \{\omega_{1},\ldots,\omega_{n}\}. Following (3.1), the bootstrap estimator for \widehat{\Theta} is defined by

\widehat{\Theta}^{\star}=\textrm{median}\{X_{i}-Y_{j}:i\in\mathcal{S}^{X},j\in\mathcal{S}^{Y}\}, \qquad (3.5)

where \mathcal{S}^{X}=\{i\in[n]:\omega_{i}\neq 0\} and \mathcal{S}^{Y}=\{j\in[m]:\omega_{j+n}\neq 0\}. It is worth noting that \widehat{\Theta}^{\star} is equivalent to the sub-sampled HL estimator based on the two datasets \{X_{i}:i\in\mathcal{S}^{X}\} and \{Y_{j}:j\in\mathcal{S}^{Y}\}. With these necessary tools at hand, the non-asymptotic Bahadur representation of the bootstrap estimator \widehat{\Theta}^{\star} and the approximated distribution of the bootstrap samples are developed in the following theorem.
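Before stating it, here is the analogous one-draw sketch for (3.5), reusing the hypothetical hl_two_sample from §2.2:

```python
import numpy as np

def hl_two_sample_bootstrap_once(x, y, rng):
    """One bootstrap draw of (3.5): independent 2 * Bernoulli(0.5) weights
    select the subsamples S^X and S^Y, then the two-sample HL estimate
    is recomputed on the survivors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sx = x[rng.binomial(1, 0.5, size=len(x)) == 1]
    sy = y[rng.binomial(1, 0.5, size=len(y)) == 1]
    return hl_two_sample(sx, sy)
```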

Theorem 3.2.

Under the conditions of Theorem 2.4, for any \omega,z>0 such that (\omega\vee z)=o(N), with probability at least 1-C_{1}\exp(-\omega), we have

\mathbb{P}^{\star}\bigg\{|\widehat{\Theta}^{\star}-\widehat{\Theta}|>C_{2}\bigg(\frac{\omega+\log N}{N}+\sqrt{z/N}\bigg)\bigg\}\leq C_{3}\exp(-z),

\mathbb{P}^{\star}\bigg\{\bigg|\widehat{\Theta}^{\star}-\widehat{\Theta}-\frac{1}{\mathcal{B}_{n}}\sum_{i=1}^{n}\sum_{j=1}^{m}(\omega_{i}\omega_{j+n}-1)\mathcal{V}_{ij}\bigg|>\frac{C_{4}(z+\omega+\log N)}{N}\bigg\}\leq C_{5}\exp(-z),

where \mathbb{P}^{\star}(\cdot)=\mathbb{P}(\cdot|X_{1},\ldots,X_{n},Y_{1},\ldots,Y_{m}) stands for the conditional probability, \mathcal{B}_{n}=\sum_{i=1}^{n}\sum_{j=1}^{m}\omega_{i}\omega_{j+n} is the total number of pairwise differences in (3.5) and \mathcal{V}_{ij}=[\mathbb{I}\{\xi_{i}\leq\varepsilon_{j}\}-(nm)^{-1}\sum_{i=1}^{n}\sum_{j=1}^{m}\mathbb{I}\{\xi_{i}\leq\varepsilon_{j}\}]/\mathcal{U}^{\prime}(\Theta) for each i\in[n] and j\in[m]. In addition, C_{1}–C_{5} are positive constants depending only on \bar{c}_{0},\bar{\kappa}_{0},\bar{c}_{1},\bar{\kappa}_{1},\bar{\eta} and \|\mathcal{U}^{\prime}\|_{\infty}.

We then obtain the Berry-Esseen bound and build confidence intervals for \Theta based on the non-asymptotic expansion of the two-sample bootstrap estimator \widehat{\Theta}^{\star}, using arguments similar to those in §3.1.

4 Large-scale Multiple Testing

With the advancement of technology, large-scale, high-dimensional data have been extensively collected over the past two decades in a variety of fields such as medicine, biology, genetics, earth science, and finance. In large-scale regimes, heavy-tailed noises are inevitable and it is crucial to develop robust statistical inference procedures. Yet, existing research that infers location shifts via Huber-type estimates calls for variable-dependent tuning parameters and moment restrictions (Fan et al., 2019; Sun et al., 2020). This type of technique is hard to apply efficiently and faithfully to large-scale inference due to the choice of tuning values. Moment constraints also exclude a large number of heavy-tailed distributions. To remedy these issues, this section focuses on extending HL estimation to high dimensions and developing tuning-free and moment-free high-dimensional multiple testing procedures.

4.1 Large-Scale Testing for One-sample Problem

In this section, we investigate high-dimensional multiple testing using the HL estimator for one-sample data. Let \bm{X}_{i}=\bm{\theta}+\bm{\xi}_{i}, i\in[n], be i.i.d. p-dimensional random vectors, where \bm{\theta}=(\theta_{1},\ldots,\theta_{p})^{\top} is a p-dimensional vector of unknown parameters and \bm{\xi}_{1},\ldots,\bm{\xi}_{n}\in\mathbb{R}^{p} are i.i.d. random vectors. With the building blocks presented in the previous sections, we first construct simultaneous confidence intervals for \bm{\theta} using Gaussian approximation and bootstrap calibrations.

4.1.1 Gaussian Approximation

The primary goal of this section is to construct simultaneous confidence intervals for \bm{\theta}. To this end, we develop a Gaussian approximation for the maximum deviation \max_{\ell\in[p]}|\widehat{\theta}_{\ell}-\theta_{\ell}|, following the recently developed high-dimensional distributional theory (Chernozhukov et al., 2017; Chernozhuokov et al., 2022). More specifically, let \bm{Z}=(Z_{1},\ldots,Z_{p})^{\top} be a p-dimensional centered Gaussian random vector with

\operatorname{Cov}(Z_{k},Z_{\ell})=\frac{4\operatorname{Cov}\{F_{k}(-\xi_{1k}),F_{\ell}(-\xi_{1\ell})\}}{U_{k}^{\prime}(\theta_{k})U_{\ell}^{\prime}(\theta_{\ell})},\enspace k,\ell\in[p], \qquad (4.1)

where U_{\ell}(t)=\mathbb{P}(X_{1\ell}+X_{2\ell}\leq 2t) and U_{\ell}^{\prime}(t) stands for its derivative, and F_{\ell}(t)=\mathbb{P}(\xi_{1\ell}\leq t). We present the high-dimensional Gaussian approximation in the following theorem.

Theorem 4.1.

Assume that there exist positive constants c_{0},\kappa_{0},c_{1} and \kappa_{1} such that

\min_{\ell\in[p]}\inf_{|\delta|\leq c_{0}}U_{\ell}^{\prime}(\theta_{\ell}+\delta)\geq\kappa_{0}\enspace\mathrm{and}\enspace\max_{\ell\in[p]}\sup_{|\delta|\leq c_{1}}|U_{\ell}^{\prime\prime}(\theta_{\ell}+\delta)|\leq\kappa_{1}.

Then, we have

\sup_{z>0}\left|\mathbb{P}\left(\max_{\ell\in[p]}\sqrt{n}|\widehat{\theta}_{\ell}-\theta_{\ell}|\leq z\right)-\mathbb{P}\left(\max_{\ell\in[p]}|Z_{\ell}|\leq z\right)\right|\leq\frac{C\log^{5/4}(pn)}{n^{1/4}}. \qquad (4.2)

We consider testing the global null hypotheses

H_{0}:\theta_{\ell}=0\mbox{ for all }\ell\in[p]\enspace\mathrm{versus}\enspace H_{1}:\theta_{\ell}\neq 0\mbox{ for some }\ell\in[p]. \qquad (4.3)

Based on the marginal HL estimators \{\widehat{\theta}_{\ell}\}_{\ell\in[p]}, one shall reject the null hypothesis H_{0} in (4.3) when \max_{\ell\in[p]}|\widehat{\theta}_{\ell}| exceeds a certain threshold that depends on the distribution of \max_{\ell\in[p]}|Z_{\ell}|. However, in light of (4.1), this distribution depends on the unknown distribution functions F_{\ell}. Therefore, to approximate the distribution of \max_{\ell\in[p]}|Z_{\ell}|, we again propose to use the bootstrap procedure. Specifically, in the one-sample regime, recall that \mathcal{S}=\{i\in[n]:\omega_{i}\neq 0\}. For each \ell\in[p], define the bootstrap estimate of \theta_{\ell} as \widehat{\theta}_{\ell}^{\star}=\mathrm{median}\{(X_{i\ell}+X_{j\ell})/2:i\neq j\in\mathcal{S}\}. It is worth mentioning that these bootstrap estimators \{\widehat{\theta}_{\ell}^{\star}\}_{\ell\in[p]} can be efficiently computed in practice. For \alpha\in(0,1), let Q_{1-\alpha}^{\star}=\inf\{z\in\mathbb{R}:\mathbb{P}^{\star}(\max_{\ell\in[p]}|\widehat{\theta}_{\ell}^{\star}-\widehat{\theta}_{\ell}|\leq z)\geq 1-\alpha\} denote the (1-\alpha)th quantile of the bootstrap statistic \max_{\ell\in[p]}|\widehat{\theta}_{\ell}^{\star}-\widehat{\theta}_{\ell}|. With the help of the bootstrap, we manage to estimate the quantiles of the approximated distribution efficiently, and the corresponding results are presented in the following theorem.
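Before stating the result, a sketch of the test in code: within each draw, one weight vector is shared across all p coordinates, matching the single subsample \mathcal{S} above (the function name global_test_hl is ours, reusing the hypothetical hl_one_sample):

```python
import numpy as np

def global_test_hl(X, alpha=0.05, B=300, rng=None):
    """Global test (4.3): reject H0 iff max_l |theta_l-hat| exceeds the
    bootstrap quantile Q-star of max_l |theta_l-star - theta_l-hat|."""
    rng = rng if rng is not None else np.random.default_rng()
    n, p = X.shape
    theta_hat = np.array([hl_one_sample(X[:, l]) for l in range(p)])
    max_stats = np.empty(B)
    for b in range(B):
        keep = rng.binomial(1, 0.5, size=n) == 1  # one S shared by all l
        boot = np.array([hl_one_sample(X[keep, l]) for l in range(p)])
        max_stats[b] = np.max(np.abs(boot - theta_hat))
    return np.max(np.abs(theta_hat)) > np.quantile(max_stats, 1 - alpha)
```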

Theorem 4.2.

Under the conditions of Theorem 4.1, we have

\left|\mathbb{P}\left(\max_{\ell\in[p]}|\widehat{\theta}_{\ell}-\theta_{\ell}|>Q_{1-\alpha}^{\star}\right)-\alpha\right|\leq\frac{C\log^{5/4}(pn)}{n^{1/4}}.

Theorem 4.2 reveals that the proposed bootstrap procedure can efficiently estimate the quantiles of the approximated distribution. This allows for the direct construction of simultaneous data-driven confidence intervals for \bm{\theta}. In addition, when the null hypothesis in (4.3) is rejected, it is essential to conduct multiple testing to identify significant individuals and control the false discovery proportion (FDP). We address this problem in the following section.

4.1.2 Multiple Testing

The goal of this section is to conduct multiple testing to identify statistically significant individuals with controlled false discovery proportions. Specifically, we consider simultaneously testing the hypotheses

H_{0,\ell}:\theta_{\ell}=0\enspace\mathrm{versus}\enspace H_{1,\ell}:\theta_{\ell}\neq 0,\enspace\mathrm{for}\enspace\ell\in[p].

Let \mathcal{H}_{0}=\{\ell\in[p]:\theta_{\ell}=0\} denote the set of true null hypotheses with cardinality |\mathcal{H}_{0}|. For each \ell\in[p], let P_{\ell} denote the p-value for testing the individual hypothesis H_{0,\ell}. For any prescribed threshold t\in(0,1), we shall reject the null hypothesis H_{0,\ell} whenever P_{\ell}<t. Then the false discovery proportion is defined by

\mathrm{FDP}(t)=V(t)/\max\{R(t),1\}, \qquad (4.4)

where V(t)=\sum_{\ell\in\mathcal{H}_{0}}\mathbb{I}\{P_{\ell}\leq t\} denotes the number of false discoveries and R(t)=\sum_{\ell=1}^{p}\mathbb{I}\{P_{\ell}\leq t\} is the number of total discoveries. Note that the denominator is observable but V(t) is not. When |\mathcal{H}_{0}| tends to infinity, V(t)\approx t|\mathcal{H}_{0}|\leq tp. This can be used to give an upper bound on \mathrm{FDP}(t). In many applications, |\mathcal{H}_{0}|\approx p, and Storey (2002) gives an estimator for |\mathcal{H}_{0}| and incorporates it into the \mathrm{FDP}(t) estimator.

It is worth noting that the p-values \{P_{\ell}\}_{\ell\in[p]} are computed by constructing inferential test statistics, with pivotal limiting distributions, based on the normal distribution calibration (Fan et al., 2007). As illustrated above, to construct a test statistic with a pivotal asymptotic distribution for each hypothesis, the asymptotic variance of the HL estimator always depends on unknown components, and the traditional quantile-based approach is not scalable in the ultra-high-dimensional scenario. To remedy this issue, we leverage the bootstrap. Specifically, let \{\omega_{i\ell}:i\in[n],\ell\in[p]\} be i.i.d. non-negative random variables generated in the same way as those in §3.1 and denote \mathcal{S}_{\ell}=\{i\in[n]:\omega_{i\ell}\neq 0\} for each \ell\in[p]. Similar to (3.1), the bootstrap estimate of \widehat{\theta}_{\ell} is defined by

\widehat{\theta}_{\ell}^{\star}=\mathrm{median}\{(X_{i\ell}+X_{j\ell})/2:i\neq j\in\mathcal{S}_{\ell}\}.

Consequently, our p-values are derived as P_{\ell}=\mathbb{P}(|\widehat{\theta}_{\ell}^{\star}-\widehat{\theta}_{\ell}|>|\widehat{\theta}_{\ell}|\,|\,\bm{X}_{1},\ldots,\bm{X}_{n}), and we let P_{(1)}\leq P_{(2)}\leq\ldots\leq P_{(p)} denote the ordered p-values. In order to choose t properly to control the FDP, we adopt the distribution-free procedure proposed by Benjamini and Hochberg (1995). Specifically, for any significance level \alpha\in(0,1), the data-dependent threshold is t_{BH}=P_{(\ell_{BH})}, where \ell_{BH} is given by

\ell_{BH}=\max\{\ell\in[p]:P_{(\ell)}\leq\alpha\ell/p\}.
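In practice, each P_{\ell} is estimated by Monte Carlo over bootstrap draws; a minimal sketch (note the weights are drawn independently for each coordinate here, unlike the shared weights of the global test, and the helper names are ours):

```python
import numpy as np

def hl_pvalue(x_col, B=300, rng=None):
    """Bootstrap p-value for H_{0,l}: theta_l = 0, i.e. the conditional
    probability that |theta_l-star - theta_l-hat| exceeds |theta_l-hat|."""
    rng = rng if rng is not None else np.random.default_rng()
    th = hl_one_sample(x_col)
    boot = np.array([hl_bootstrap_once(x_col, rng) for _ in range(B)])
    return np.mean(np.abs(boot - th) > np.abs(th))
```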

Recall that an estimate of \mathrm{FDP}(t) is \widehat{\mathrm{FDP}}(t)=pt/R(t), following the discussion after (4.4). Then a natural choice of threshold is

\widehat{t}=\sup\{t:\widehat{\mathrm{FDP}}(t)\leq\alpha\}=\sup\{t:t\leq\alpha R(t)/p\}=t_{BH}.

This provides a simple explanation for the choice of t_{BH}.
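The threshold itself is a one-line sorting computation; a sketch:

```python
import numpy as np

def bh_threshold(pvals, alpha=0.05):
    """Benjamini-Hochberg cutoff: t_BH = P_(l_BH), where l_BH is the
    largest l with P_(l) <= alpha * l / p; returns 0.0 if no rejection."""
    p = len(pvals)
    srt = np.sort(pvals)
    ok = np.nonzero(srt <= alpha * np.arange(1, p + 1) / p)[0]
    return float(srt[ok[-1]]) if ok.size else 0.0
```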

In what follows, we assume that |\mathcal{H}_{0}|/p\to\pi_{0}\in(0,1]. For each k,\ell\in[p], we define the correlation measure \rho_{k\ell}=\mathrm{Corr}\{F_{k}(-\xi_{1k}),F_{\ell}(-\xi_{1\ell})\}, and impose the following assumption quantifying the dependence between measurements.

Assumption 4.1.

There exists a positive constant 0<\rho<1 such that \max_{k\neq\ell}|\rho_{k\ell}|\leq\rho and

\max_{\ell\in[p]}\sum_{k=1}^{p}\mathbb{I}\left\{|\rho_{k\ell}|>\frac{1}{(\log p)^{2+\kappa}}\right\}=\mathcal{O}(p^{\phi}),

for some \kappa>0 and \phi\in(0,(1-\rho)/(1+\rho)).

Assumption 4.1 requires that, for every variable, the number of other variables whose correlations with it exceed a certain threshold does not grow too fast. It is worth noting that this is a commonly imposed condition in large-scale multiple testing problems (Liu and Shao, 2014). Based on this assumption, we summarize the theoretical results in the following Theorem 4.3.

Theorem 4.3.

Assume that \log p=o(n^{1/5}) and

\varpi_{p}:=\left|\left\{\ell\in[p]:|\theta_{\ell}/\sigma_{\ell}|\geq\lambda_{0}\sqrt{(2\log p)/n}\right\}\right|\to\infty, \qquad (4.5)

for some \lambda_{0}>2, where \sigma_{\ell}^{2}=4\mathrm{Var}\{F_{\ell}(-\xi_{1\ell})\}/\{U_{\ell}^{\prime}(\theta_{\ell})\}^{2} for each \ell\in[p]. Then, under Assumption 4.1 and the conditions of Theorem 4.1, we have

\bigg|\frac{\mathrm{FDP}(t_{BH})}{|\mathcal{H}_{0}|/p}-\alpha\bigg|\overset{\mathbb{P}}{\rightarrow}0. \qquad (4.6)

Theorem 4.3 provides theoretical guarantees for the consistency of the FDP control procedure. To further support the derived results, we make the following remark.

Remark 4.1.

Condition (4.5) is nearly optimal for controlling the false discovery proportion. More specifically, as shown in Proposition 2.1 of Liu and Shao (2014), if the number of alternative hypotheses is fixed instead of tending to infinity, the B-H approach fails to control the FDP at any level 0<\beta<1 even if the true p-values for \mathcal{H}_{0,\ell} are known. ∎

4.2 Large-Scale Two-Sample Tests

In this section, we study large-scale two-sample testing. Specifically, let \bm{Y}_{j}=\bm{\theta}^{\circ}+\bm{\varepsilon}_{j}, j\in[m], be another sample of i.i.d. p-dimensional random vectors independent of \{\bm{X}_{1},\ldots,\bm{X}_{n}\}, where \bm{\theta}^{\circ}=(\theta_{1}^{\circ},\ldots,\theta_{p}^{\circ})^{\top}\in\mathbb{R}^{p}. Let \bm{\Theta}=\bm{\theta}-\bm{\theta}^{\circ}=(\Theta_{1},\ldots,\Theta_{p})^{\top}\in\mathbb{R}^{p} be the location shift parameter. Following (2.9), the HL estimator for \Theta_{\ell} is given by \widehat{\Theta}_{\ell}=\mathrm{median}\{X_{i\ell}-Y_{j\ell}:i\in[n],j\in[m]\}. A detailed global test for whether \theta_{\ell}=\theta_{\ell}^{\circ} for all \ell\in[p] is given in §E.4.

We next simultaneously test the hypotheses

H_{0,\ell}:\theta_{\ell}=\theta_{\ell}^{\circ}\enspace\mathrm{versus}\enspace H_{1,\ell}:\theta_{\ell}\neq\theta_{\ell}^{\circ},\enspace\mathrm{for}\enspace\ell\in[p],

where \mathcal{H}_{0}^{\diamond}=\{\ell\in[p]:\theta_{\ell}=\theta_{\ell}^{\circ}\} denotes the set of true null hypotheses and |\mathcal{H}_{0}^{\diamond}|=\sum_{\ell=1}^{p}\mathbb{I}\{\theta_{\ell}=\theta_{\ell}^{\circ}\} is its cardinality. Throughout this section, we assume that |\mathcal{H}_{0}^{\diamond}|/p\to\pi_{0}\in(0,1]. For each \ell\in[p], let P_{\ell}^{\diamond} denote the p-value for testing whether \theta_{\ell}=\theta_{\ell}^{\circ}. For any prescribed threshold t\in(0,1), we shall reject the null hypothesis \theta_{\ell}=\theta_{\ell}^{\circ} whenever P_{\ell}^{\diamond}<t. The primary goal is to control the following false discovery proportion \mathrm{FDP}^{\diamond}(t),

\mathrm{FDP}^{\diamond}(t)=V^{\diamond}(t)/\max\{R^{\diamond}(t),1\},

by selecting a proper t, where V^{\diamond}(t)=\sum_{\ell\in\mathcal{H}_{0}^{\diamond}}\mathbb{I}\{P_{\ell}^{\diamond}\leq t\} and R^{\diamond}(t)=\sum_{\ell=1}^{p}\mathbb{I}\{P_{\ell}^{\diamond}\leq t\}.

To approximate the unknown asymptotic distributions involved, we let \{\omega_{i\ell}:i\in[n+m],\ell\in[p]\} be i.i.d. non-negative random variables generated in the same way as those in §3.2 and denote \mathcal{S}_{\ell}^{X}=\{i\in[n]:\omega_{i\ell}\neq 0\} and \mathcal{S}_{\ell}^{Y}=\{j\in[(n+1):(m+n)]:\omega_{j\ell}\neq 0\} for each \ell\in[p]. Following (3.5), the bootstrap estimate for \widehat{\Theta}_{\ell} is defined by

\widehat{\Theta}_{\ell}^{\star}=\mathrm{median}\{X_{i\ell}-Y_{j\ell}:i\in\mathcal{S}_{\ell}^{X},j\in\mathcal{S}_{\ell}^{Y}\}.

Then, for each \ell\in[p], the p-value is given by P_{\ell}^{\diamond}=\mathbb{P}(|\widehat{\Theta}_{\ell}^{\star}-\widehat{\Theta}_{\ell}|>|\widehat{\Theta}_{\ell}|\,|\,\bm{X}_{1},\ldots,\bm{X}_{n},\bm{Y}_{1},\ldots,\bm{Y}_{m}), and P_{(1)}^{\diamond}\leq P_{(2)}^{\diamond}\leq\ldots\leq P_{(p)}^{\diamond} are the ordered p-values. Following Benjamini and Hochberg (1995), the data-dependent threshold is t_{BH}^{\diamond}=P_{(\ell_{BH}^{\diamond})}^{\diamond}, where \ell_{BH}^{\diamond}=\max\{\ell\in[p]:P_{(\ell)}^{\diamond}\leq\alpha\ell/p\}.

With these necessary tools at hand, we present the required assumptions and the main theorem below. Specifically, Assumption 4.2 provides the formal condition that quantifies the level of dependence, while Theorem 4.4 presents the theoretical guarantees for two-sample multiple testing.

Assumption 4.2.

There exists a positive constant 0<\rho^{\diamond}<1 such that \max_{k\neq\ell}|\rho_{k\ell}^{\diamond}|\leq\rho^{\diamond}, where \rho_{k\ell}^{\diamond}=\mathrm{Corr}(\bar{G}_{nk}-\bar{F}_{mk},\bar{G}_{n\ell}-\bar{F}_{m\ell}), with \bar{G}_{n\ell}=n^{-1}\sum_{i=1}^{n}G_{\ell}(\xi_{i\ell}) and \bar{F}_{m\ell}=m^{-1}\sum_{j=1}^{m}F_{\ell}(\varepsilon_{j\ell}) for each \ell\in[p]. Moreover, for some \kappa^{\diamond}>0 and 0<\phi_{\diamond}<(1-\rho^{\diamond})/(1+\rho^{\diamond}), we have

\max_{\ell\in[p]}\sum_{k=1}^{p}\mathbb{I}\bigg\{|\mathrm{Corr}\{G_{k}(\xi_{1k}),G_{\ell}(\xi_{1\ell})\}|\vee|\mathrm{Corr}\{F_{k}(\varepsilon_{1k}),F_{\ell}(\varepsilon_{1\ell})\}|>\frac{1}{(\log p)^{2+\kappa^{\diamond}}}\bigg\}=\mathcal{O}(p^{\phi_{\diamond}}).
Theorem 4.4.

Assume that \log p=o(N^{1/5}) and

\varpi_{p}^{\diamond}:=\left|\left\{\ell\in[p]:|(\theta_{\ell}-\theta_{\ell}^{\circ})/\widetilde{\sigma}_{\ell}|\geq\lambda_{0}\sqrt{(2\log p)/N}\right\}\right|\to\infty, \qquad (4.7)

for some \lambda_{0}>2, where \widetilde{\sigma}_{\ell}^{2}=N[\mathrm{Var}\{F_{\ell}(\varepsilon_{1\ell})\}/m+\mathrm{Var}\{G_{\ell}(\xi_{1\ell})\}/n]/\{\mathcal{U}_{\ell}^{\prime}(\Theta_{\ell})\}^{2} for each \ell\in[p]. Then, under Assumption 4.2 and the conditions of Theorem E.1, we have

\bigg|\frac{\mathrm{FDP}^{\diamond}(t_{BH}^{\diamond})}{|\mathcal{H}_{0}^{\diamond}|/p}-\alpha\bigg|\overset{\mathbb{P}}{\rightarrow}0. \qquad (4.8)

5 Numerical Studies

In this section, we use simulation experiments to verify the theoretical findings of the paper. Specifically, we validate the results on Gaussian approximation and FDP control via experiments in §5.1 and §5.2, respectively.

5.1 Numerical Studies for Global Tests

We let n=m=300, p=400 and generate the variables \{\mathbf{X}_{i}\}_{i=1}^{n} and \{\mathbf{Y}_{j}\}_{j=1}^{m} following the settings in §4.1 and §4.2, respectively, where the random variables \xi_{i}\in\mathbb{R}^{p},\varepsilon_{j}\in\mathbb{R}^{p} (i\in[n],j\in[m]) follow the distributions described below; a small data-generation sketch for Case 1 is given after the list. Cases 1-3 are devoted primarily to one-sample tests, whereas Cases 4-6 are for two-sample tests. Moreover, the notation \textrm{diff}(F) denotes the distribution generated by the difference between two independent random variables drawn from distribution F.

  • Case 1: Scaled t3t_{3} distribution with ξi,k0.3t3,(i,k)[n]×[p].\xi_{i,k}\sim 0.3\cdot t_{3},(i,k)\in[n]\times[p].

  • Case 2: Mixture of Pareto distribution with shape parameter 22 and standard Gaussian distribution, namely, ξi,kdiff(0.2Pareto(2)+0.8N(0,1)),(i,k)[n]×[p].\xi_{i,k}\sim\textrm{diff}(0.2\cdot\textrm{Pareto}(2)+0.8\cdot N(0,1)),(i,k)\in[n]\times[p].

  • Case 3: Mixture of Gaussian distributions, namely, ξip0.2N(0,10Σ)+0.8N(0,Σ),i[n],\xi_{i}\in\mathbb{R}^{p}\sim 0.2\cdot N(0,10\Sigma)+0.8\cdot N(0,\Sigma),i\in[n], where Σc,d=0.7|cd|,(c,d)[p]×[p].\Sigma_{c,d}=0.7^{|c-d|},(c,d)\in[p]\times[p].

  • Case 4: Gaussian distributions with different covariances, namely, $\xi_{i}\sim N(0,1.5\Sigma)$ and $\varepsilon_{j}\sim N(0,\Sigma)$, $(i,j)\in[n]\times[m]$, where the covariance matrix $\Sigma$ is the same as that in Case 3.

  • Case 5: Mixture of Gaussian distributions with $\xi_{i}\in\mathbb{R}^{p}\sim 0.2\cdot N(0,10\Sigma)+0.8\cdot N(0,\Sigma)$, $i\in[n]$, where $\Sigma$ is the same as that in Case 3, together with the mixture of the Pareto distribution with shape parameter 2 and the standard Gaussian distribution, namely, $\varepsilon_{j,k}\sim\textrm{diff}(0.2\cdot\textrm{Pareto}(2)+0.8\cdot N(0,1))$, $(j,k)\in[m]\times[p]$.

  • Case 6: Scaled $t_{3}$ distribution, where $\xi_{i,k}\sim 0.3\cdot t_{3}$, $(i,k)\in[n]\times[p]$ and $\varepsilon_{j,k}\sim 0.3\cdot t_{3}$, $(j,k)\in[m]\times[p]$.
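For concreteness, the sketch below (illustrative Python under our own naming, not code from the paper) generates the noise for Cases 1 and 2. Note that numpy's Pareto sampler returns Lomax (Pareto II) draws, so we add 1 to obtain the classical Pareto distribution with shape 2 and support $[1,\infty)$.

```python
import numpy as np

rng = np.random.default_rng(2023)
n, m, p = 300, 300, 400

# Case 1: xi_{i,k} ~ 0.3 * t_3
xi_case1 = 0.3 * rng.standard_t(df=3, size=(n, p))

def diff_mixture(size, rng):
    """diff(0.2*Pareto(2) + 0.8*N(0,1)): difference of two independent
    draws from the mixture, which symmetrizes the heavy right tail."""
    def draw_once():
        is_pareto = rng.random(size) < 0.2
        pareto = rng.pareto(2.0, size) + 1.0  # classical Pareto(2)
        return np.where(is_pareto, pareto, rng.standard_normal(size))
    return draw_once() - draw_once()

# Case 2: xi_{i,k} ~ diff(0.2*Pareto(2) + 0.8*N(0,1))
xi_case2 = diff_mixture((n, p), rng)
```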

As a first step, we validate the Gaussian approximation for the HL estimator by conducting the tests in (4.3) and (E.3) at level $\alpha=0.05$. We set the first 50 entries of $\bm{\theta}\in\mathbb{R}^{p}$ (or $\bm{\Theta}=\bm{\theta}-\bm{\theta}_{0}\in\mathbb{R}^{p}$) given in §4.1 and §4.2 to $\mu$ and the remaining entries to 0, where $\mu$ increases from 0 to 0.25. When $\mu=0$ the null hypothesis holds; otherwise the alternative holds. The size or power of the tests in (4.3) and (E.3) is computed by averaging the outcomes of 500 replications of the methods in §4.1.1 and §E.4, respectively. In addition, within every replication, we draw 300 weighted-bootstrap samples to compute the critical value of the test.
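To illustrate the test statistic and its bootstrap calibration, the sketch below is our own minimal Python for the one-sample case; the Uniform(0,2) weight law is just one admissible mean-one choice supported on $[0,2]$ and may differ from the weights used in the paper. It computes a coordinate's HL estimate and one weighted-bootstrap draw as the weighted median of the pairwise averages.

```python
import numpy as np

def hl_estimate(x):
    """One-sample Hodges-Lehmann estimate: median of pairwise averages."""
    i, j = np.triu_indices(len(x), k=1)
    return np.median((x[i] + x[j]) / 2.0)

def hl_weighted_bootstrap(x, rng):
    """One weighted-bootstrap draw: the weighted median of pairwise
    averages, with pair weights w_i * w_j for i.i.d. mean-one w_i."""
    w = rng.uniform(0.0, 2.0, len(x))
    i, j = np.triu_indices(len(x), k=1)
    avgs, wts = (x[i] + x[j]) / 2.0, w[i] * w[j]
    order = np.argsort(avgs)
    cdf = np.cumsum(wts[order]) / wts.sum()
    return avgs[order][np.searchsorted(cdf, 0.5)]  # inf{t: U_n*(t) >= 1/2}
```

The bootstrap critical value is then the $(1-\alpha)$ quantile, over 300 such draws applied coordinate-wise, of $\max_{\ell}\sqrt{n}\,|\widehat{\theta}_{\ell}^{\star}-\widehat{\theta}_{\ell}|$.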

In addition, under the same experimental settings, we also compare the performance of the HL estimator with the sample-mean estimator ($\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bm{X}_{i}$ in the one-sample case and $\sqrt{nm/(n+m)}(\frac{1}{n}\sum_{i=1}^{n}\bm{X}_{i}-\frac{1}{m}\sum_{j=1}^{m}\bm{Y}_{j})$ in the two-sample case). Here, the critical values for the tests (4.3) and (E.3) based on the sample mean are computed from 300 bootstrap samples drawn from the joint Gaussian distribution $N(\mathbf{0},\widehat{\Sigma})$, where $\widehat{\Sigma}$ is the sample covariance matrix ($\widehat{\Sigma}=\frac{1}{n-1}\sum_{i=1}^{n}(\bm{X}_{i}-\bar{\bm{X}})(\bm{X}_{i}-\bar{\bm{X}})^{\top}$ in the one-sample test and the pooled estimate $\widehat{\Sigma}=\frac{1}{n+m-2}\sum_{i=1}^{n}(\bm{X}_{i}-\bar{\bm{X}})(\bm{X}_{i}-\bar{\bm{X}})^{\top}+\frac{1}{n+m-2}\sum_{j=1}^{m}(\bm{Y}_{j}-\bar{\bm{Y}})(\bm{Y}_{j}-\bar{\bm{Y}})^{\top}$ in the two-sample test). The results are summarized in Table 2.
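A minimal sketch of the Gaussian-bootstrap calibration for the sample-mean test described above (illustrative Python, one-sample case; the function name is ours):

```python
import numpy as np

def mean_test_critval(X, n_boot=300, alpha=0.05, rng=None):
    """Critical value for the sample-mean global test: the (1-alpha)
    quantile of max_l |Z_l| with Z ~ N(0, Sigma_hat), where Sigma_hat
    is the sample covariance matrix of the rows of X."""
    rng = rng or np.random.default_rng()
    sigma_hat = np.cov(X, rowvar=False)  # (n-1)-denominator covariance
    Z = rng.multivariate_normal(np.zeros(X.shape[1]), sigma_hat, size=n_boot)
    return np.quantile(np.abs(Z).max(axis=1), 1 - alpha)
```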

Estimator μ\mu One-sample Two-sample
Case 1 Case 2 Case 3 Case 4 Case 5 Case 6
HL μ=0\mu=0 0.054 0.048 0.052 0.040 0.046 0.045
μ=0.05\mu=0.05 0.880 0.062 0.060 0.080 0.084 0.282
μ=0.10\mu=0.10 1.000 0.480 0.464 0.164 0.140 1.000
μ=0.15\mu=0.15 1.000 0.982 0.942 0.322 0.320 1.000
μ=0.20\mu=0.20 1.000 1.000 1.000 0.728 0.688 1.000
μ=0.25\mu=0.25 1.000 1.000 1.000 0.944 1.000 1.000
Mean μ=0\mu=0 0.004 0.000 0.044 0.052 0.000 0.000
μ=0.05\mu=0.05 0.042 0.000 0.046 0.122 0.000 0.000
μ=0.10\mu=0.10 0.540 0.000 0.478 0.200 0.000 0.166
μ=0.15\mu=0.15 0.818 0.002 0.940 0.384 0.002 0.718
μ=0.20\mu=0.20 1.000 0.006 1.000 0.742 0.008 0.858
μ=0.25\mu=0.25 1.000 0.010 1.000 0.968 0.020 0.900
Table 2: Sizes and powers for testing the global null using the Gaussian approximation via the HL estimator and the sample-mean estimator, respectively.

We conclude from Table 2 that, for the HL estimator, the sizes of the test are close to 0.05 in all scenarios when the null hypothesis is true ($\mu=0$), which validates the Gaussian approximation. When the alternative holds, the power of the HL-based tests rises quickly to 1 as $\mu$ increases, demonstrating the effectiveness of the HL test statistic. For the sample-mean estimator, by contrast, the size of the test is approximately 0 in most scenarios under the null, and under the alternative its power falls far short of the HL estimator's. This further confirms the efficiency and robustness of the HL estimator.

5.2 Numerical Studies for FDP Control

We validate the theoretical findings on FDP control under multiple regimes and compare the performance of the HL estimator with the Student's $t$-statistics of Liu and Shao (2014). Specifically, we retain most of the settings in §5.1; the only difference is that the first 50 entries of $\bm{\theta}$ and $\bm{\Theta}$ are set to $\mu$, where $\mu\in\{0.5,0.3\}$, and the rest to 0. The target false discovery proportion $\alpha$ varies uniformly from 0.05 to 0.25, and the empirical FDP is computed via the procedures in §4.1.2 and §4.2, respectively. For both statistics, besides the FDP, we also compute the true positive proportion of the test, $\textrm{TPP}(t_{BH})=\sum_{\ell\in\mathcal{H}_{1}}\mathbb{I}\{P_{\ell}\leq t_{BH}\}/|\mathcal{H}_{1}|$ (a computational sketch follows). The numerical outcomes are summarized in Tables 3 and 4 (corresponding to $\mu=0.5$ and $\mu=0.3$, respectively).
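Per replication, the empirical FDP and TPP reported in the tables can be computed as in the short sketch below (illustrative Python; `is_null` marks the coordinates whose true signal is zero).

```python
import numpy as np

def fdp_tpp(pvals, t_bh, is_null):
    """Empirical FDP and TPP at the data-driven threshold t_BH."""
    rejected = pvals <= t_bh
    fdp = (rejected & is_null).sum() / max(rejected.sum(), 1)
    tpp = (rejected & ~is_null).sum() / max((~is_null).sum(), 1)
    return fdp, tpp
```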

Estimator α\alpha 0.050.05 0.100.10 0.150.15 0.200.20 0.250.25
HL Case 1 0.068 (1.000) 0.110 (1.000) 0.152 (1.000) 0.212 (1.000) 0.256 (1.000)
Case 2 0.062 (1.000) 0.116 (1.000) 0.154 (1.000) 0.198 (1.000) 0.254 (1.000)
Case 3 0.052 (1.000) 0.102 (1.000) 0.150 (1.000) 0.188 (1.000) 0.248 (1.000)
Case 4 0.058 (1.000) 0.122 (1.000) 0.170 (1.000) 0.238 (1.000) 0.273 (1.000)
Case 5 0.064 (1.000) 0.103 (1.000) 0.142 (1.000) 0.187 (1.000) 0.234 (1.000)
Case 6 0.062 (1.000) 0.092 (1.000) 0.151 (1.000) 0.193 (1.000) 0.232 (1.000)
Student’s tt Case 1 0.071 (1.000) 0.109 (1.000) 0.168 (1.000) 0.228 (1.000) 0.276 (1.000)
Case 2 0.033 (0.976) 0.075 (0.976) 0.118 (0.990) 0.161 (0.996) 0.217 (1.000)
Case 3 0.042 (1.000) 0.087 (1.000) 0.131 (1.000) 0.206 (1.000) 0.255 (1.000)
Case 4 0.063 (1.000) 0.127 (1.000) 0.181 (1.000) 0.246 (1.000) 0.299 (1.000)
Case 5 0.026 (0.962) 0.064 (0.962) 0.110 (0.988) 0.159 (0.994) 0.195 (1.000)
Case 6 0.048 (1.000) 0.106 (1.000) 0.168 (1.000) 0.196 (1.000) 0.272 (1.000)
Table 3: Empirical FDP and TPP versus the nominal level $\alpha$ of the test via the HL estimator and Student's $t$-statistics when $\mu=0.5$. The numbers outside and inside the brackets are the averaged empirical false discovery proportions (FDP) and averaged true positive proportions (TPP), respectively, over 50 replications of the experiments in §4.1.2 and §4.2. Within every replication, we draw 300 bootstrap samples per coordinate to compute the empirical $p$-values for the HL estimator using the methods in §4.1.2 and §4.2. The $p$-values for the Student's $t$-statistics are computed via quantiles of the standard Gaussian distribution.
Estimator α\alpha 0.050.05 0.100.10 0.150.15 0.200.20 0.250.25
HL Case 1 0.048 (1.000) 0.112 (1.000) 0.154 (1.000) 0.202 (1.000) 0.256 (1.000)
Case 2 0.074 (1.000) 0.133 (1.000) 0.178 (1.000) 0.222 (1.000) 0.260 (1.000)
Case 3 0.054 (1.000) 0.104 (1.000) 0.153 (1.000) 0.210 (1.000) 0.254 (1.000)
Case 4 0.006 (0.984) 0.048 (0.998) 0.108 (1.000) 0.176 (1.000) 0.232 (1.000)
Case 5 0.000 (0.992) 0.024 (0.992) 0.092 (0.992) 0.166 (0.992) 0.200 (0.992)
Case 6 0.055 (1.000) 0.095 (1.000) 0.138 (1.000) 0.184 (1.000) 0.254 (1.000)
Student’s tt Case 1 0.044 (1.000) 0.098 (1.000) 0.166 (1.000) 0.204 (1.000) 0.263 (1.000)
Case 2 0.011 (0.910) 0.056 (0.910) 0.106 (0.930) 0.161 (0.968) 0.205 (0.984)
Case 3 0.052 (1.000) 0.106 (1.000) 0.150 (1.000) 0.198 (1.000) 0.259 (1.000)
Case 4 0.004 (1.000) 0.046 (1.000) 0.110 (1.000) 0.158 (1.000) 0.210 (1.000)
Case 5 0.000 (0.870) 0.000 (0.952) 0.056 (0.984) 0.100 (0.990) 0.170 (1.000)
Case 6 0.046 (1.000) 0.106 (1.000) 0.140 (1.000) 0.180 (1.000) 0.262 (1.000)
Table 4: Empirical FDP and TPP versus the nominal level $\alpha$ of the test via the HL estimator and Student's $t$-statistics when $\mu=0.3$. The remaining caption details are the same as in Table 3.

Compared with the Student's $t$-statistics, the empirical FDPs of the HL estimator are closer to the nominal levels in most cases, and the HL-based outcomes achieve larger true positive proportions. This validates the theory of FDP control and confirms the benefit of the HL estimator in the presence of heavy-tailed errors. We further compare the HL estimator with the Student's $t$-statistics in both one- and two-sample tests when the noise follows the $t_{1}$ distribution, where the HL estimator again performs much better; interested readers are referred to §A.1 for details.

6 Conclusion

In large-scale data analysis, conventional methods can be ineffective because outliers and heavy-tailed variables easily corrupt the data. To handle heavy-tailed contamination, existing techniques such as the Huberized mean, truncation, and the median of means require additional tuning parameters and moment restrictions; consequently, they do not scale to high-dimensional applications with fidelity. Using the celebrated Hodges-Lehmann estimator, both the moment constraints and the tuning parameters can be removed. However, its non-asymptotic and large-scale properties had not been investigated. This paper fills this important gap by contributing a finite-sample analysis of the HL estimator, generalizing it to large-scale studies, and proposing tuning-free and moment-free high-dimensional testing methods.

Several future directions merit further study. First, we permit mild dependence among the measurements while controlling the FDP in both the one- and two-sample regimes. In reality, however, high-dimensional data can exhibit strong dependence, such as data collected in economics, finance, genomics, and meteorology. To handle such strong dependence, factor-adjusted multiple testing via Huber-type estimation has been proposed (Fan et al., 2019), but the Huber-type loss brings extra tuning parameters and moment constraints. Factor adjustments could therefore be incorporated into the large-scale HL estimation and testing procedures to develop tuning-free procedures adapted to the strong-dependence scenario. Second, in terms of computation, the HL estimator requires the median of $\mathcal{O}(n^{2})$ pairs. In the one-sample estimation regime, a sub-sampling idea reduces this cost: one may compute the median of $\lfloor n/2\rfloor$ non-overlapping pair averages:

θτ=median(Xτ(2i1)+Xτ(2i)2,i[n/2])\displaystyle\theta^{\tau}=\textrm{median}\left(\frac{X_{\tau(2i-1)}+X_{\tau(2i)}}{2},i\in[n/2]\right)

with a random permutation $\tau$ on $[n]$. Taking 5 or more permutations and averaging the resulting estimates yields a numerically accurate approximation to the Hodges-Lehmann estimator at $\mathcal{O}(n)$ cost per permutation (Fan et al., 2020b). It would therefore also be interesting to derive a non-asymptotic analysis for this sub-sampling-based estimator, as sketched below.
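A minimal sketch of this sub-sampled estimator (illustrative Python): each pass pairs the data under a fresh random permutation, takes the median of the $\lfloor n/2\rfloor$ pair averages, and the passes are averaged.

```python
import numpy as np

def hl_subsampled(x, n_perm=5, seed=0):
    """O(n)-per-pass approximation to the one-sample HL estimate
    (Fan et al., 2020b): median of non-overlapping pair averages
    under a random permutation, averaged over n_perm permutations."""
    rng = np.random.default_rng(seed)
    n2 = 2 * (len(x) // 2)  # drop one observation when n is odd
    estimates = []
    for _ in range(n_perm):
        tau = rng.permutation(len(x))[:n2]
        pair_avgs = x[tau].reshape(-1, 2).mean(axis=1)
        estimates.append(np.median(pair_avgs))
    return float(np.mean(estimates))
```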

References

  • Arcones (1995) Arcones, M. A. (1995). The asymptotic accuracy of the bootstrap of U-quantiles. Ann. Statist. 1802–1822.
  • Arcones (1996) Arcones, M. A. (1996). The Bahadur-Kiefer representation for U-quantiles. Ann. Statist., 24 1400–1422.
  • Bai and Saranadasa (1996) Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statist. Sinica, 6 311–329.
  • Bauer (1972) Bauer, D. F. (1972). Constructing confidence sets using rank statistics. J. Amer. Statist. Assoc., 67 687–690.
  • Belloni and Chernozhukov (2011) Belloni, A. and Chernozhukov, V. (2011). 1\ell_{1}-penalized quantile regression in high-dimensional sparse models. Ann. Statist., 39 82–130.
  • Benjamini and Hochberg (1995) Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B, 57 289–300.
  • Bentkus and Dzindzalieta (2015) Bentkus, V. K. and Dzindzalieta, D. (2015). A tight Gaussian bound for weighted sums of Rademacher random variables. Bernoulli, 21 1231–1237.
  • Blanchard and Roquain (2009) Blanchard, G. and Roquain, É. (2009). Adaptive false discovery rate control under independence and dependence. Journal of Machine Learning Research, 10.
  • Brownlees et al. (2015) Brownlees, C., Joly, E. and Lugosi, G. (2015). Empirical risk minimization for heavy-tailed losses. The Annals of Statistics, 43 2507–2536.
  • Bühlmann and Van De Geer (2011) Bühlmann, P. and Van De Geer, S. (2011). Statistics for high-dimensional data: methods, theory and applications. Springer Science & Business Media.
  • Catoni (2012) Catoni, O. (2012). Challenging the empirical mean and empirical variance: a deviation study. In Annales de l’IHP Probabilités et statistiques, vol. 48.
  • Chang et al. (2017) Chang, J., Zheng, C., Zhou, W.-X. and Zhou, W. (2017). Simulation-based hypothesis testing of high dimensional means under covariance heterogeneity. Biometrics, 73 1300–1310.
  • Chen and Qin (2010) Chen, S. X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist., 38 808–835.
  • Chen and Zhou (2020) Chen, X. and Zhou, W.-X. (2020). Robust inference via multiplier bootstrap. Ann. Statist., 48 1665–1691.
  • Chen et al. (2021) Chen, Y., Chi, Y., Fan, J., Ma, C. et al. (2021). Spectral methods for data science: A statistical perspective. Foundations and Trends® in Machine Learning, 14 566–806.
  • Chernozhukov et al. (2017) Chernozhukov, V., Chetverikov, D. and Kato, K. (2017). Central limit theorems and bootstrap in high dimensions. Ann. Probab., 45 2309–2352.
  • Chernozhuokov et al. (2022) Chernozhuokov, V., Chetverikov, D., Kato, K. and Koike, Y. (2022). Improved central limit theorem and bootstrap approximations in high dimensions. Ann. Statist., 50 2562–2586.
  • Chi (2007) Chi, Z. (2007). On the performance of FDR control: constraints and a partial solution. The Annals of Statistics, 35 1409–1431.
  • Cont (2001) Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1 223.
  • Eklund et al. (2016) Eklund, A., Nichols, T. E. and Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences, 113 7900–7905.
  • Fan et al. (2014) Fan, J., Fan, Y. and Barut, E. (2014). Adaptive robust variable selection. Ann. Statist., 42 324–351.
  • Fan et al. (2022a) Fan, J., Gu, Y. and Zhou, W.-X. (2022a). How do noise tails impact on deep relu networks? arXiv preprint arXiv:2203.10418.
  • Fan et al. (2007) Fan, J., Hall, P. and Yao, Q. (2007). To how many simultaneous hypothesis tests can normal, Student’s tt or bootstrap calibration be applied? J. Amer. Statist. Assoc., 102 1282–1288.
  • Fan et al. (2019) Fan, J., Ke, Y., Sun, Q. and Zhou, W.-X. (2019). FarmTest: factor-adjusted robust multiple testing with approximate false discovery control. J. Amer. Statist. Assoc., 114 1880–1893.
  • Fan et al. (2017) Fan, J., Li, Q. and Wang, Y. (2017). Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. J. R. Stat. Soc. Ser. B. Stat. Methodol., 79 247–265.
  • Fan et al. (2020a) Fan, J., Li, R., Zhang, C.-H. and Zou, H. (2020a). Statistical foundations of data science. Chapman and Hall/CRC.
  • Fan et al. (2022b) Fan, J., Lou, Z. and Yu, M. (2022b). Are latent factor regression and sparse regression adequate? arXiv preprint arXiv:2203.01219.
  • Fan et al. (2020b) Fan, J., Ma, C. and Wang, K. (2020b). Comment on “a tuning-free robust and efficient approach to high-dimensional regression”. Journal of the American Statistical Association, 115 1720–1725.
  • Fan et al. (2021) Fan, J., Wang, W. and Zhu, Z. (2021). A shrinkage principle for heavy-tailed data: high-dimensional robust low-rank matrix recovery. Ann. Statist., 49 1239–1266.
  • Fan et al. (2022c) Fan, J., Yang, Z. and Yu, M. (2022c). Understanding implicit regularization in over-parameterized single index model. Journal of the American Statistical Association 1–14.
  • Fan and Yao (2017) Fan, J. and Yao, Q. (2017). The elements of financial econometrics. Cambridge University Press.
  • Fang et al. (2020) Fang, X., Luo, L. and Shao, Q.-M. (2020). A refined Cramér-type moderate deviation for sums of local statistics. Bernoulli, 26 2319–2352.
  • Ferreira and Zwinderman (2006) Ferreira, J. A. and Zwinderman, A. H. (2006). On the Benjamini-Hochberg method. Ann. Statist., 34 1827–1849.
  • Finotello and Di Camillo (2015) Finotello, F. and Di Camillo, B. (2015). Measuring differential gene expression with RNA-seq: challenges and strategies for data analysis. Briefings in Functional Genomics, 14 130–142.
  • Genovese and Wasserman (2004) Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. The annals of statistics, 32 1035–1061.
  • Giné et al. (2000) Giné, E., Latała, R. and Zinn, J. (2000). Exponential and moment inequalities for U-statistics. In High Dimensional Probability II. Springer, 13–38.
  • Goldstein et al. (2018) Goldstein, L., Minsker, S. and Wei, X. (2018). Structured signal recovery from non-linear and heavy-tailed measurements. IEEE Transactions on Information Theory, 64 5513–5530.
  • Gupta et al. (2014) Gupta, S., Ellis, S. E., Ashar, F. N., Moes, A., Bader, J. S., Zhan, J., West, A. B. and Arking, D. E. (2014). Transcriptome analysis reveals dysregulation of innate immune response genes and neuronal activity-dependent genes in autism. Nature communications, 5 1–8.
  • Hastie et al. (2009) Hastie, T., Tibshirani, R., Friedman, J. H. and Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction, vol. 2. Springer.
  • Hastie et al. (2015) Hastie, T., Tibshirani, R. and Wainwright, M. (2015). Statistical learning with sparsity. Monographs on statistics and applied probability, 143 143.
  • He and Shao (1996) He, X. and Shao, Q.-M. (1996). A general Bahadur representation of M-estimators and its application to linear regression with nonstochastic designs. The Annals of Statistics, 24 2608–2630.
  • He and Shao (2000) He, X. and Shao, Q.-M. (2000). On parameters of increasing dimensions. Journal of multivariate analysis, 73 120–135.
  • Hodges and Lehmann (1963) Hodges, J. L., Jr. and Lehmann, E. L. (1963). Estimates of location based on rank tests. Ann. Math. Statist., 34 598–611.
  • Høyland (1965) Høyland, A. (1965). Robustness of the Hodges-Lehmann estimates for shift. Ann. Math. Statist., 36 174–197.
  • Hsu and Sabato (2016) Hsu, D. and Sabato, S. (2016). Loss minimization and parameter estimation with heavy tails. The Journal of Machine Learning Research, 17 543–582.
  • Huber (1973) Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics 799–821.
  • Koenker and Hallock (2001) Koenker, R. and Hallock, K. F. (2001). Quantile regression. Journal of economic perspectives, 15 143–156.
  • Lehmann (1963) Lehmann, E. L. (1963). Nonparametric confidence intervals for a shift parameter. Ann. Math. Statist., 34 1507–1512.
  • Li and Chen (2012) Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. Ann. Statist., 40 908–940.
  • Li and Tibshirani (2013) Li, J. and Tibshirani, R. (2013). Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Stat. Methods Med. Res., 22 519–536.
  • Li et al. (2012) Li, J., Witten, D. M., Johnstone, I. M. and Tibshirani, R. (2012). Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics, 13 523–538.
  • Liu and Shao (2014) Liu, W. and Shao, Q.-M. (2014). Phase transition and regularized bootstrap in large-scale tt-tests with false discovery rate control. Ann. Statist., 42 2003–2025.
  • Loh (2017) Loh, P.-L. (2017). Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. The Annals of Statistics, 45 866–896.
  • Mammen (1989) Mammen, E. (1989). Asymptotics with increasing dimension for robust regression with applications to the bootstrap. The Annals of Statistics 382–400.
  • Minsker (2015) Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli, 21 2308–2335.
  • Minsker (2018) Minsker, S. (2018). Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. The Annals of Statistics, 46 2871–2903.
  • Nagalakshmi et al. (2008) Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M. and Snyder, M. (2008). The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 320 1344–1349.
  • Nemirovsky and Yudin (1983) Nemirovsky, A. S. and Yudin, D. B. a. (1983). Problem complexity and method efficiency in optimization. Wiley-Interscience Series in Discrete Mathematics, John Wiley & Sons, Inc., New York.
  • Petrov (1975) Petrov, V. V. (1975). Sums of independent random variables. Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 82, Springer-Verlag, New York-Heidelberg. Translated from the Russian by A. A. Brown.
  • Rosenkranz (2010) Rosenkranz, G. K. (2010). A note on the Hodges-Lehmann estimator. Pharmaceutical Statistics, 9 162–167.
  • Shendure and Ji (2008) Shendure, J. and Ji, H. (2008). Next-generation DNA sequencing. Nature Biotechnology, 26 1135–1145.
  • Stock and Watson (2002) Stock, J. H. and Watson, M. W. (2002). Macroeconomic forecasting using diffusion indexes. Journal of Business & Economic Statistics, 20 147–162.
  • Storey (2002) Storey, J. D. (2002). A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B Stat. Methodol., 64 479–498.
  • Storey (2003) Storey, J. D. (2003). The positive false discovery rate: a Bayesian interpretation and the qq-value. Ann. Statist., 31 2013–2035.
  • Storey et al. (2004) Storey, J. D., Taylor, J. E. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol., 66 187–205.
  • Sun et al. (2020) Sun, Q., Zhou, W.-X. and Fan, J. (2020). Adaptive huber regression. Journal of the American Statistical Association, 115 254–265.
  • Wainwright (2019) Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint, vol. 48. Cambridge University Press.
  • Wang and Fan (2022) Wang, B. and Fan, J. (2022). Robust matrix completion with heavy-tailed noise. arXiv preprint arXiv:2206.04276.
  • Wang et al. (2020) Wang, L., Peng, B., Bradic, J., Li, R. and Wu, Y. (2020). A tuning-free robust and efficient approach to high-dimensional regression. Journal of the American Statistical Association, 115 1700–1714.
  • Wang et al. (2015) Wang, L., Peng, B. and Li, R. (2015). A high-dimensional nonparametric multivariate test for mean vector. Journal of the American Statistical Association, 110 1658–1669.
  • Wang et al. (2009) Wang, Z., Gerstein, M. and Snyder, M. (2009). RNA-seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics, 10 57–63.
  • Xia et al. (2018) Xia, Y., Cai, T. T. and Li, H. (2018). Joint testing and false discovery rate control in high-dimensional multivariate regression. Biometrika, 105 249–269.
  • Yang et al. (2017) Yang, Z., Balasubramanian, K. and Liu, H. (2017). High-dimensional non-gaussian single index models via thresholded score function estimation. In International conference on machine learning. PMLR.
  • Yohai and Maronna (1979) Yohai, V. J. and Maronna, R. A. (1979). Asymptotic behavior of M-estimators for the linear model. The Annals of Statistics 258–268.
  • Zhang et al. (2020) Zhang, J.-T., Guo, J., Zhou, B. and Cheng, M.-Y. (2020). A simple two-sample test in high dimensions based on L2L^{2}-norm. J. Amer. Statist. Assoc., 115 1011–1027.
  • Zheng et al. (2015) Zheng, Q., Peng, L. and He, X. (2015). Globally adaptive quantile regression with ultra-high dimensional data. Annals of statistics, 43 2225.
  • Zhou et al. (2018) Zhou, W.-X., Bose, K., Fan, J. and Liu, H. (2018). A new perspective on robust MM-estimation: finite sample theory and applications to dependence-adjusted multiple testing. Ann. Statist., 46 1904–1931.

Appendix A Appendix

In the following sections, we present additional simulation results as well as the proofs of all the theoretical results in the main paper.

A.1 Additional Simulation

In this section, we conduct additional simulations for FDP control using the HL estimator and Student's $t$-statistics, respectively. We keep all settings as in §5.2, except that the noises are generated from Cases 7 and 8 below, which change the distributions in Cases 1 and 6 from $t_{3}$ to $t_{1}$. The numerical outcomes are summarized in Tables 5 and 6.

  • Case 7: Scaled t1t_{1} distribution with ξi,k0.3t1,(i,k)[n]×[p].\xi_{i,k}\sim 0.3\cdot t_{1},(i,k)\in[n]\times[p].

  • Case 8: Scaled $t_{1}$ distribution, where $\xi_{i,k}\sim 0.3\cdot t_{1}$, $(i,k)\in[n]\times[p]$ and $\varepsilon_{j,k}\sim 0.3\cdot t_{1}$, $(j,k)\in[m]\times[p]$.

Estimator α\alpha 0.050.05 0.100.10 0.150.15 0.200.20 0.250.25
HL Case 7 0.036 (1.000) 0.082 (1.000) 0.132 (1.000) 0.180 (1.000) 0.222 (1.000)
Case 8 0.069 (1.000) 0.105 (1.000) 0.168 (1.000) 0.225 (1.000) 0.272 (1.000)
Student’s tt Case 7 0.000 (0.282) 0.000 (0.284) 0.000 (0.320) 0.000 (0.320) 0.000 (0.320)
Case 8 0.000 (0.274) 0.000 (0.274) 0.000 (0.274) 0.000 (0.322) 0.000 (0.322)
Table 5: Empirical FDP and TPP versus the nominal level $\alpha$ of the test via the HL estimator and Student's $t$-statistics when $\mu=0.5$. The remaining caption details are the same as in Table 3.
Estimator α\alpha 0.050.05 0.100.10 0.150.15 0.200.20 0.250.25
HL Case 7 0.036 (1.000) 0.082 (1.000) 0.122 (1.000) 0.179 (1.000) 0.224 (1.000)
Case 8 0.058 (1.000) 0.105 (1.000) 0.157 (1.000) 0.214 (1.000) 0.252 (1.000)
Student’s tt Case 7 0.000 (0.282) 0.000 (0.282) 0.000 (0.282) 0.000 (0.284) 0.000 (0.284)
Case 8 0.000 (0.134) 0.000 (0.134) 0.000 (0.134) 0.000 (0.134) 0.000 (0.134)
Table 6: Empirical FDP and TPP versus the nominal level $\alpha$ of the test via the HL estimator and Student's $t$-statistics when $\mu=0.3$. The remaining caption details are the same as in Table 3.

We conclude from Tables 5 and 6 that the HL estimator markedly outperforms the Student's $t$-statistics in terms of both FDP and TPP when the noise follows the $t_{1}$ distribution: the $t$-based procedure recovers only a small fraction of the true signals, whereas the HL-based procedure retains full power with FDP near the nominal level. This further confirms the robustness of the HL estimator.

Appendix B Lemmas

Lemma B.1.

Let Sn=i=1nYiS_{n}=\sum_{i=1}^{n}Y_{i}, where Y1,,YnY_{1},\ldots,Y_{n}\in\mathbb{R} are independent random variables such that aiYibia_{i}\leq Y_{i}\leq b_{i} for each i[n]i\in[n]. Then, for any z>0z>0, we have

(|Sn𝔼Sn|>z)2exp{2z2i=1n(biai)2}.\displaystyle\mathbb{P}(|S_{n}-\mathbb{E}S_{n}|>z)\leq 2\exp\left\{-\frac{2z^{2}}{\sum_{i=1}^{n}(b_{i}-a_{i})^{2}}\right\}.
Lemma B.2.

Let $Y_{1},\ldots,Y_{n}\in\mathbb{R}$ be i.i.d. random variables and $H_{n}=\{n(n-1)\}^{-1}\sum_{i\neq j\in[n]}h(Y_{i},Y_{j})$, where $h(x,y)$ is a symmetric function with $a\leq h(x,y)\leq b$ for some $a\leq b\in\mathbb{R}$. Then, for any $z>0$, we have

(Hn𝔼Hn>z)exp{nz2(ba)2}.\displaystyle\mathbb{P}(H_{n}-\mathbb{E}H_{n}>z)\leq\exp\left\{-\frac{nz^{2}}{(b-a)^{2}}\right\}.

Let V1,,VmV_{1},\ldots,V_{m}\in\mathbb{R} be another sample of i.i.d. random variables independent of {Y1,,Yn}\{Y_{1},\ldots,Y_{n}\} and let n,m=(nm)1i=1nj=1mh¯(Yi,Vj)\mathcal{H}_{n,m}=(nm)^{-1}\sum_{i=1}^{n}\sum_{j=1}^{m}\bar{h}(Y_{i},V_{j}), where h¯(x,y)\bar{h}(x,y) is bounded such that ah¯(x,y)ba\leq\bar{h}(x,y)\leq b for some aba\leq b\in\mathbb{R}. Then, for any z>0z>0, we have

(n,m𝔼n,m>z)exp{2(nm)z2(ba)2}.\displaystyle\mathbb{P}(\mathcal{H}_{n,m}-\mathbb{E}\mathcal{H}_{n,m}>z)\leq\exp\left\{-\frac{2(n\wedge m)z^{2}}{(b-a)^{2}}\right\}.

Appendix C Proof of Theoretical Results in §2

C.1 Proof of Theorem 2.1

For simplicity of notation, we write Wn(t)=Un(θ+t)U(θ+t)W_{n}(t)=U_{n}(\theta+t)-U(\theta+t) for tt\in\mathbb{R}. Define R(Xi,Xj,t)=𝕀{θ<(Xi+Xj)/2θ+t}R(X_{i},X_{j},t)=\mathbb{I}\{\theta<(X_{i}+X_{j})/2\leq\theta+t\} and

R¯(Xi,Xj,t)=R(Xi,Xj,t)𝔼{R(Xi,Xj,t)|Xi}𝔼{R(Xi,Xj,t)|Xj}+𝔼{R(Xi,Xj,t)}.\displaystyle\bar{R}(X_{i},X_{j},t)=R(X_{i},X_{j},t)-\mathbb{E}\{R(X_{i},X_{j},t)|X_{i}\}-\mathbb{E}\{R(X_{i},X_{j},t)|X_{j}\}+\mathbb{E}\{R(X_{i},X_{j},t)\}.

With this notation, we have Wn(t)Wn(0)=R¯n(t)+R¯n(t)W_{n}(t)-W_{n}(0)=\bar{R}_{n}^{\diamond}(t)+\bar{R}_{n}^{\circ}(t), where

R¯n(t)=2ni=1n[𝔼{R(Xi,Xj,t)|Xi}𝔼{R(Xi,Xj,t)}]andR¯n(t)=1n(n1)ij[n]R¯(Xi,Xj,t).\displaystyle\bar{R}_{n}^{\diamond}(t)=\frac{2}{n}\sum_{i=1}^{n}[\mathbb{E}\{R(X_{i},X_{j},t)|X_{i}\}-\mathbb{E}\{R(X_{i},X_{j},t)\}]\enspace\mathrm{and}\enspace\bar{R}_{n}^{\circ}(t)=\frac{1}{n(n-1)}\sum_{i\neq j\in[n]}\bar{R}(X_{i},X_{j},t).
Lemma C.1.

For any 0<t1/(2f)0<t\leq 1/(2\|f\|_{\infty}), there exists a universal positive constant CC such that

(|R¯n(t)|>z)Cexp[1Cmin{n2z2tf,(n3z2tf)1/3,nztf,nz1/2}].\displaystyle\mathbb{P}(|\bar{R}_{n}^{\circ}(t)|>z)\leq C\exp\bigg{[}-\frac{1}{C}\min\bigg{\{}\frac{n^{2}z^{2}}{t\|f\|_{\infty}},\bigg{(}\frac{n^{3}z^{2}}{t\|f\|_{\infty}}\bigg{)}^{1/3},\frac{nz}{\sqrt{t\|f\|_{\infty}}},nz^{1/2}\bigg{\}}\bigg{]}. (C.1)
Proof of Lemma C.1.

Observe that R¯(Xi,Xj,t)\bar{R}(X_{i},X_{j},t) is a bounded canonical kernel of XiX_{i} and XjX_{j} for any tt\in\mathbb{R}. Hence it is straightforward to derive (C.1) by Theorem 3.3 in Giné et al. (2000). ∎

Lemma C.2.

For any 0<Λ1/(2f)0<\Lambda\leq 1/(2\|f\|_{\infty}), with probability at least 1Cexp(z)1-C\exp(-z), we have

sup|t|Λ|Wn(t)Wn(0)|C1{(K+zn)2+Λf(z+C2n)1/2+K+zn(Λf)1/2},\displaystyle\sup_{|t|\leq\Lambda}|W_{n}(t)-W_{n}(0)|\leq C_{1}\bigg{\{}\bigg{(}\frac{K+z}{n}\bigg{)}^{2}+\Lambda\|f\|_{\infty}\bigg{(}\frac{z+C_{2}}{n}\bigg{)}^{1/2}+\frac{K+z}{n}(\Lambda\|f\|_{\infty})^{1/2}\bigg{\}}, (C.2)

where K=min{k:2kn}K=\min\{k\in\mathbb{N}:2^{k}\geq n\}.

Proof of Lemma C.2.

For any h[2K]h\in[2^{K}] and t(Λ(h1)2K,Λh2K]t\in(\Lambda(h-1)2^{-K},\Lambda h2^{-K}], we have

Wn(t)Wn(0)\displaystyle W_{n}(t)-W_{n}(0) Un(θ+Λh2K)U(θ+Λ(h1)2K)Wn(0)\displaystyle\leq U_{n}\left(\theta+\frac{\Lambda h}{2^{K}}\right)-U\left(\theta+\frac{\Lambda(h-1)}{2^{K}}\right)-W_{n}(0)
Wn(Λh2K)Wn(0)+U(θ+Λh2K)U(θ+Λ(h1)2K)\displaystyle\leq W_{n}\left(\frac{\Lambda h}{2^{K}}\right)-W_{n}(0)+U\left(\theta+\frac{\Lambda h}{2^{K}}\right)-U\bigg{(}\theta+\frac{\Lambda(h-1)}{2^{K}}\bigg{)}
Wn(Λh2K)Wn(0)+ΛU2K.\displaystyle\leq W_{n}\left(\frac{\Lambda h}{2^{K}}\right)-W_{n}(0)+\frac{\Lambda\|U^{\prime}\|_{\infty}}{2^{K}}.

Similarly, we have

Wn(t)Wn(0)Wn(Λ(h1)2K)Wn(0)ΛU2K.\displaystyle W_{n}(t)-W_{n}(0)\geq W_{n}\bigg{(}\frac{\Lambda(h-1)}{2^{K}}\bigg{)}-W_{n}(0)-\frac{\Lambda\|U^{\prime}\|_{\infty}}{2^{K}}.

Consequently, we obtain

sup0<tΛ|Wn(t)Wn(0)|\displaystyle\sup_{0<t\leq\Lambda}|W_{n}(t)-W_{n}(0)| maxh[2K]|Wn(Λh2K)Wn(0)|+ΛU2K\displaystyle\leq\max_{h\in[2^{K}]}\left|W_{n}\left(\frac{\Lambda h}{2^{K}}\right)-W_{n}(0)\right|+\frac{\Lambda\|U^{\prime}\|_{\infty}}{2^{K}}
maxh[2K]|R¯n(Λh2K)|ΓK+maxh[2K]|R¯n(Λh2K)|ΓK+ΛU2K.\displaystyle\leq\underbrace{\max_{h\in[2^{K}]}\left|\bar{R}_{n}^{\circ}\left(\frac{\Lambda h}{2^{K}}\right)\right|}_{\Gamma_{K}^{\circ}}+\underbrace{\max_{h\in[2^{K}]}\left|\bar{R}_{n}^{\diamond}\left(\frac{\Lambda h}{2^{K}}\right)\right|}_{\Gamma_{K}^{\diamond}}+\frac{\Lambda\|U^{\prime}\|_{\infty}}{2^{K}}.

For each h[2K]h\in[2^{K}], by (C.1), with probability at least 1Cexp(z)1-C\exp(-z),

ΓKC{(K+zn)2+K+znΛU},\displaystyle\Gamma_{K}^{\circ}\leq C\bigg{\{}\bigg{(}\frac{K+z}{n}\bigg{)}^{2}+\frac{K+z}{n}\sqrt{\Lambda\|U^{\prime}\|_{\infty}}\bigg{\}},

where C>0C>0 is a universal constant.

ΓKk=1Kmaxh[2k]|R¯n(Λh2k)R¯n(Λ(h1)2k)|+maxh[2]|R¯n(Λh2)|=:ΓK,1+ΓK,2.\displaystyle\Gamma_{K}^{\diamond}\leq\sum_{k=1}^{K}\max_{h\in[2^{k}]}\bigg{|}\bar{R}_{n}^{\diamond}\bigg{(}\frac{\Lambda h}{2^{k}}\bigg{)}-\bar{R}_{n}^{\diamond}\bigg{(}\frac{\Lambda(h-1)}{2^{k}}\bigg{)}\bigg{|}+\max_{h\in[2]}\bigg{|}\bar{R}_{n}^{\diamond}\bigg{(}\frac{\Lambda h}{2}\bigg{)}\bigg{|}=:\Gamma_{K,1}^{\diamond}+\Gamma_{K,2}^{\diamond}.

For each k[K]k\in[K], by Lemma B.1, with probability at least 1exp(z)1-\exp(-z), we have

maxh[2k]|R¯n(Λh2k)R¯n(Λ(h1)2k)|CΛf2kk+zn.\displaystyle\max_{h\in[2^{k}]}\bigg{|}\bar{R}_{n}^{\diamond}\left(\frac{\Lambda h}{2^{k}}\right)-\bar{R}_{n}^{\diamond}\bigg{(}\frac{\Lambda(h-1)}{2^{k}}\bigg{)}\bigg{|}\leq C\Lambda\|f\|_{\infty}2^{-k}\sqrt{\frac{k+z}{n}}.

Hence with the same probability, we have

ΓK,1k=1K(CΛf2k2k+zn)C1Λfz+C2n.\displaystyle\Gamma_{K,1}^{\diamond}\leq\sum_{k=1}^{K}\bigg{(}C\Lambda\|f\|_{\infty}2^{-k}\sqrt{\frac{2k+z}{n}}\bigg{)}\leq C_{1}\Lambda\|f\|_{\infty}\sqrt{\frac{z+C_{2}}{n}}.

Similarly, it follows that

(ΓK,2>C3Λf(2z)/n)C4exp(z).\displaystyle\mathbb{P}\left(\Gamma_{K,2}^{\diamond}>C_{3}\Lambda\|f\|_{\infty}\sqrt{(2z)/n}\right)\leq C_{4}\exp(-z).

Putting all these pieces together, we obtain (C.2). ∎

Proof of Theorem 2.1.

For any z0z\geq 0, since U(t)U(t) is a cumulative distribution function, we have

12>Un(θ\displaystyle\frac{1}{2}>U_{n}(\theta +z)=U(θ+z){U(θ+z)Un(θ+z)}\displaystyle+z)=U(\theta+z)-\{U(\theta+z)-U_{n}(\theta+z)\}
U(θ)+0zc0U(θ+ν)𝑑ν{U(θ+z)Un(θ+z)}\displaystyle\geq U(\theta)+\int_{0}^{z\wedge c_{0}}U^{\prime}(\theta+\nu)d\nu-\{U(\theta+z)-U_{n}(\theta+z)\}
12+κ0(zc0){U(θ+z)Un(θ+z)}.\displaystyle\geq\frac{1}{2}+\kappa_{0}(z\wedge c_{0})-\{U(\theta+z)-U_{n}(\theta+z)\}.

By the definition of θ^\widehat{\theta} in (C.4), it follows that (θ^>θ+z){1/2>Un(θ+z)}\mathbb{P}(\widehat{\theta}>\theta+z)\leq\mathbb{P}\{1/2>U_{n}(\theta+z)\}. Therefore

(θ^>θ+z){U(θ+z)Un(θ+z)>κ0(zc0)}exp{nκ02(zc0)2},\displaystyle\mathbb{P}(\widehat{\theta}>\theta+z)\leq\mathbb{P}\{U(\theta+z)-U_{n}(\theta+z)>\kappa_{0}(z\wedge c_{0})\}\leq\exp\{-n\kappa_{0}^{2}(z\wedge c_{0})^{2}\},

where the last inequality follows from the Hoeffding inequality in Lemma B.2.

Recall that Wn(t)=Un(θ+t)U(θ+t)W_{n}(t)=U_{n}(\theta+t)-U(\theta+t) for tt\in\mathbb{R}. Taking Λ=z/(nκ02)\Lambda=\sqrt{z/(n\kappa_{0}^{2})} yields (|θ^θ|>Λ)2exp(z)\mathbb{P}(|\widehat{\theta}-\theta|>\Lambda)\leq 2\exp(-z). By Lemma C.2, with probability at least 1Cexp(z)1-C\exp(-z), we have

sup|δ|Λ|Wn(δ)Wn(0)|C1f(z+c)nκ0.\displaystyle\sup_{|\delta|\leq\Lambda}|W_{n}(\delta)-W_{n}(0)|\leq\frac{C_{1}\|f\|_{\infty}(z+c)}{n\kappa_{0}}.

We then obtain

|θ^θUn(θ^)Un(θ)U(θ)||Wn(θ^θ)Wn(0)|κ0+κ1|θ^θ|2κ0C1f(z+c)nκ02+κ1znκ03.\displaystyle\bigg{|}\widehat{\theta}-\theta-\frac{U_{n}(\widehat{\theta})-U_{n}(\theta)}{U^{\prime}(\theta)}\bigg{|}\leq\frac{|W_{n}(\widehat{\theta}-\theta)-W_{n}(0)|}{\kappa_{0}}+\frac{\kappa_{1}|\widehat{\theta}-\theta|^{2}}{\kappa_{0}}\leq\frac{C_{1}\|f\|_{\infty}(z+c)}{n\kappa_{0}^{2}}+\frac{\kappa_{1}z}{n\kappa_{0}^{3}}.

Finally, denote

Δθ:\displaystyle\Delta_{\theta}: =1/2Un(θ)U(θ)2nU(θ)i=1n{12F(ξi)}\displaystyle=\frac{1/2-U_{n}(\theta)}{U^{\prime}(\theta)}-\frac{2}{nU^{\prime}(\theta)}\sum_{i=1}^{n}\left\{\frac{1}{2}-F(-\xi_{i})\right\}
=1n(n1)U(θ)ij[n](F(ξi)+F(ξj)𝕀{ξi+ξj0}12).\displaystyle=\frac{1}{n(n-1)U^{\prime}(\theta)}\sum_{i\neq j\in[n]}\left(F(-\xi_{i})+F(-\xi_{j})-\mathbb{I}\{\xi_{i}+\xi_{j}\leq 0\}-\frac{1}{2}\right).

Observe that |F(ξi)+F(ξj)𝕀{ξi+ξj0}1/2|2|F(-\xi_{i})+F(-\xi_{j})-\mathbb{I}\{\xi_{i}+\xi_{j}\leq 0\}-1/2|\leq 2 uniformly for ij[n]i\neq j\in[n]. Then (2.4) follows from Theorem 3.3 in Giné et al. (2000). ∎

C.2 Proof of Theorem 2.2

Proof of Theorem 2.2.

For simplicity of notation, denote

Tn=n(θ^θ)σθandTn=1nVar{F(ξ1)}i=1n{12F(ξi)}.\displaystyle T_{n}=\frac{\sqrt{n}(\widehat{\theta}-\theta)}{\sigma_{\theta}}\enspace\mathrm{and}\enspace T_{n}^{\sharp}=\frac{1}{\sqrt{n\operatorname{{\rm Var}}\{F(-\xi_{1})\}}}\sum_{i=1}^{n}\left\{\frac{1}{2}-F(-\xi_{i})\right\}. (C.3)

Note that maxi[n]|1/2F(ξi)|1/2\max_{i\in[n]}|1/2-F(-\xi_{i})|\leq 1/2. Hence supz|(Tnz)Φ(z)|Cn1/2\sup_{z\in\mathbb{R}}|\mathbb{P}(T_{n}^{\sharp}\leq z)-\Phi(z)|\leq Cn^{-1/2}. Let 𝒞<\mathcal{C}<\infty be a sufficiently large positive constant. Then it follows from (2.4) that

(|TnTn|>𝒞lognn)C1n.\displaystyle\mathbb{P}\left(|T_{n}-T_{n}^{\sharp}|>\frac{\mathcal{C}\log n}{\sqrt{n}}\right)\leq\frac{C_{1}}{\sqrt{n}}.

Consequently, we obtain

supz|(Tnz)Φ(z)|supz|(Tnz)Φ(z)|+supz|(Tnz)(Tnz)|𝒞(logn)/n.\displaystyle\sup_{z\in\mathbb{R}}|\mathbb{P}(T_{n}\leq z)-\Phi(z)|\leq\sup_{z\in\mathbb{R}}|\mathbb{P}(T_{n}^{\sharp}\leq z)-\Phi(z)|+\sup_{z\in\mathbb{R}}|\mathbb{P}(T_{n}\leq z)-\mathbb{P}(T_{n}^{\sharp}\leq z)|\leq\mathcal{C}(\log n)/\sqrt{n}.

C.3 Proof of Theorem 2.3

Proof of Theorem 2.3.

Recall the definitions of TnT_{n} and TnT_{n}^{\sharp} in (C.3). Since maxi[n]|1/2F(ξi)|1/2\max_{i\in[n]}|1/2-F(-\xi_{i})|\leq 1/2, it follows that

|(Tn>z)1Φ(z)1|C(1+z3)nfor0zC0n1/6,\displaystyle\left|\frac{\mathbb{P}(T_{n}^{\sharp}>z)}{1-\Phi(z)}-1\right|\leq\frac{C(1+z^{3})}{\sqrt{n}}\enspace\mathrm{for}\enspace 0\leq z\leq C_{0}n^{1/6},

where C0>0C_{0}>0 is any fixed constant and CC is a positive constant depending only on C0C_{0}. Denote Φ¯()=1Φ()\bar{\Phi}(\cdot)=1-\Phi(\cdot). By (2.4), we have

(Tn>z)\displaystyle\mathbb{P}(T_{n}>z) (Tn>zδn)+C1exp(nδn)\displaystyle\leq\mathbb{P}(T_{n}^{\sharp}>z-\delta_{n})+C_{1}\exp(-\sqrt{n}\delta_{n})
{1+C(1+z3)n}Φ¯(zδn)+C1exp(nδn)\displaystyle\leq\left\{1+\frac{C(1+z^{3})}{\sqrt{n}}\right\}\bar{\Phi}(z-\delta_{n})+C_{1}\exp(-\sqrt{n}\delta_{n})
{1+C(1+z3)n}{1+(1+z)δnexp(zδn)}Φ¯(z)+C1exp(nδn).\displaystyle\leq\left\{1+\frac{C(1+z^{3})}{\sqrt{n}}\right\}\{1+(1+z)\delta_{n}\exp(z\delta_{n})\}\bar{\Phi}(z)+C_{1}\exp(-\sqrt{n}\delta_{n}).

By Mill’s inequality, for any z>0z>0,

z2πexp(z22)1Φ¯(z)2(z1)2πexp(z22).\displaystyle z\sqrt{2\pi}\exp\left(\frac{z^{2}}{2}\right)\leq\frac{1}{\bar{\Phi}(z)}\leq 2(z\vee 1)\sqrt{2\pi}\exp\left(\frac{z^{2}}{2}\right).

Consequently, it follows that

(Tn>z)1Φ(z)1C(1+z3)n+C2(1+z)δn+2C1(z1)2πexp(z22nδn).\displaystyle\frac{\mathbb{P}(T_{n}>z)}{1-\Phi(z)}-1\leq\frac{C(1+z^{3})}{\sqrt{n}}+C_{2}(1+z)\delta_{n}+2C_{1}(z\vee 1)\sqrt{2\pi}\exp\left(\frac{z^{2}}{2}-\sqrt{n}\delta_{n}\right).

Similarly, we have

1(Tn>z)1Φ(z)C(1+z3)n+C2(1+z)δn+C1z2πexp(z22nδn).\displaystyle 1-\frac{\mathbb{P}(T_{n}>z)}{1-\Phi(z)}\leq\frac{C(1+z^{3})}{\sqrt{n}}+C_{2}(1+z)\delta_{n}+C_{1}z\sqrt{2\pi}\exp\left(\frac{z^{2}}{2}-\sqrt{n}\delta_{n}\right).

Putting all these pieces together, we obtain (2.7). ∎

C.4 Proof of Theorem 2.4

Define the two-sample U-process 𝒰n,m(t)=(nm)1i=1nj=1m𝕀{XiYjt}\mathcal{U}_{n,m}(t)=(nm)^{-1}\sum_{i=1}^{n}\sum_{j=1}^{m}\mathbb{I}\{X_{i}-Y_{j}\leq t\}. Then the HL estimator Θ^\widehat{\Theta} in (2.9) can be equivalently expressed as the sample median of the U-process 𝒰n,m(t)\mathcal{U}_{n,m}(t), namely,

Θ^=inf{t:𝒰n,m(t)1/2}.\displaystyle\widehat{\Theta}=\inf\{t\in\mathbb{R}:\mathcal{U}_{n,m}(t)\geq 1/2\}. (C.4)

The remaining steps for proving Theorem 2.4 follow arguments similar to those in the proofs of Lemma C.2 and Theorem 2.1.
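In code, the equivalence in (C.4) says that $\widehat{\Theta}$ is simply the median of all $nm$ pairwise differences; a one-line sketch (illustrative Python):

```python
import numpy as np

def hl_two_sample(x, y):
    """Two-sample Hodges-Lehmann shift estimate: the median of
    X_i - Y_j over all pairs, i.e. the sample median of the
    two-sample U-process in (C.4)."""
    return float(np.median(x[:, None] - y[None, :]))
```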

C.5 Proof of Theorem 2.5

For simplicity of notation, we write 𝒯N=N(Θ^Θ)/Θ~\mathcal{T}_{N}=\sqrt{N}(\widehat{\Theta}-\Theta)/\widetilde{\Theta} and

𝒯N=G¯nF¯mVar(G¯nF¯m),\displaystyle\mathcal{T}_{N}^{\sharp}=\frac{\bar{G}_{n}-\bar{F}_{m}}{\sqrt{\mathrm{Var}(\bar{G}_{n}-\bar{F}_{m})}},

where $\bar{G}_{n}=n^{-1}\sum_{i=1}^{n}G(\xi_{i})$ and $\bar{F}_{m}=m^{-1}\sum_{j=1}^{m}F(\varepsilon_{j})$. Then, similar to the proof of Theorem 2.2, it follows that

supz|(𝒯Nz)Φ(z)|supz|(𝒯Nz)(𝒯Nz)|+supz|(𝒯Nz)Φ(z)|logNN.\displaystyle\sup_{z\in\mathbb{R}}|\mathbb{P}(\mathcal{T}_{N}\leq z)-\Phi(z)|\leq\sup_{z\in\mathbb{R}}|\mathbb{P}(\mathcal{T}_{N}\leq z)-\mathbb{P}(\mathcal{T}_{N}^{\sharp}\leq z)|+\sup_{z\in\mathbb{R}}|\mathbb{P}(\mathcal{T}_{N}^{\sharp}\leq z)-\Phi(z)|\lesssim\frac{\log N}{\sqrt{N}}.

The proof of the Cramér-type moderate deviation for $\widehat{\Theta}$ follows proof procedures similar to those of Theorem 2.3, and we therefore omit the details.

Appendix D Proof of Results in §3

Define

σ^θ2=4n{U(θ)}2i=1n(1n1ji𝕀{ξi+ξj0}Un(θ))2.\displaystyle\widehat{\sigma}_{\theta}^{2}=\frac{4}{n\{U^{\prime}(\theta)\}^{2}}\sum_{i=1}^{n}\bigg{(}\frac{1}{n-1}\sum_{j\neq i}\mathbb{I}\{\xi_{i}+\xi_{j}\leq 0\}-U_{n}(\theta)\bigg{)}^{2}.
Lemma D.1.

Under the conditions of Theorem 2.1, we have

(|σ^θσθ1|>z)exp(CUn)+exp(CUnz)+exp(CUnz2),\displaystyle\mathbb{P}\bigg{(}\bigg{|}\frac{\widehat{\sigma}_{\theta}}{\sigma_{\theta}}-1\bigg{|}>z\bigg{)}\lesssim\exp(-C_{U}n)+\exp(-C_{U}nz)+\exp(-C_{U}nz^{2}), (D.1)

where CUC_{U} is a positive constant depending only on UU.

Proof of Lemma D.1.

For each i[n]i\in[n], denote

Wi=1n1ji𝕀{ξi+ξj0}F(ξi).\displaystyle W_{i}=\frac{1}{n-1}\sum_{j\neq i}\mathbb{I}\{\xi_{i}+\xi_{j}\leq 0\}-F(-\xi_{i}).

Recall that Un(θ)={n(n1)}1ij[n]𝕀{ξi+ξj0}U_{n}(\theta)=\{n(n-1)\}^{-1}\sum_{i\neq j\in[n]}\mathbb{I}\{\xi_{i}+\xi_{j}\leq 0\}. Hence

σ^θ2\displaystyle\widehat{\sigma}_{\theta}^{2}- 4n{U(θ)}2i=1n{F(ξi)12}2σ~θ2=4n{U(θ)}2i=1nWi24|Un(θ)1/2|2{U(θ)}2\displaystyle\underbrace{\frac{4}{n\{U^{\prime}(\theta)\}^{2}}\sum_{i=1}^{n}\left\{F(-\xi_{i})-\frac{1}{2}\right\}^{2}}_{\widetilde{\sigma}_{\theta}^{2}}=\frac{4}{n\{U^{\prime}(\theta)\}^{2}}\sum_{i=1}^{n}W_{i}^{2}-\frac{4|U_{n}(\theta)-1/2|^{2}}{\{U^{\prime}(\theta)\}^{2}}
+8n{U(θ)}2i=1nWi{F(ξi)12}=:Δσ,1Δσ,2+Δσ,3.\displaystyle+\frac{8}{n\{U^{\prime}(\theta)\}^{2}}\sum_{i=1}^{n}W_{i}\left\{F(-\xi_{i})-\frac{1}{2}\right\}=:\Delta_{\sigma,1}-\Delta_{\sigma,2}+\Delta_{\sigma,3}.

Since 𝔼(σ~θ2)=σθ2\mathbb{E}(\widetilde{\sigma}_{\theta}^{2})=\sigma_{\theta}^{2} and maxi[n]|F(ξi)1/2|1/2\max_{i\in[n]}|F(-\xi_{i})-1/2|\leq 1/2, it follows from Lemma B.1 that

(|σ~θ2σθ2|>z)2exp(CUnz2),\displaystyle\mathbb{P}(|\widetilde{\sigma}_{\theta}^{2}-\sigma_{\theta}^{2}|>z)\leq 2\exp(-C_{U}nz^{2}),

where CUC_{U} is a positive constant depending only on the density function UU^{\prime}. Note that {𝕀{ξi+ξj0}F(ξi)}ji\{\mathbb{I}\{\xi_{i}+\xi_{j}\leq 0\}-F(-\xi_{i})\}_{j\neq i} is a sequence of independent centered random variables conditional on ξi\xi_{i} for each i[n]i\in[n]. Hence, by Hoeffding’s inequality, we have (maxi[n]Wi2>z)2nexp(Cnz)\mathbb{P}(\max_{i\in[n]}W_{i}^{2}>z)\leq 2n\exp(-Cnz). Therefore

(Δσ,1>z)(4{U(θ)}2maxi[n]Wi2>z)2nexp(CUnz).\displaystyle\mathbb{P}(\Delta_{\sigma,1}>z)\leq\mathbb{P}\bigg{(}\frac{4}{\{U^{\prime}(\theta)\}^{2}}\max_{i\in[n]}W_{i}^{2}>z\bigg{)}\leq 2n\exp(-C_{U}nz).

Consequently, by the Cauchy-Schwarz inequality, we have |Δσ,3|24σ~θ2Δσ,1|\Delta_{\sigma,3}|^{2}\leq 4\widetilde{\sigma}_{\theta}^{2}\Delta_{\sigma,1} and

\displaystyle\mathbb{P}(|\Delta_{\sigma,3}|>z)\leq 2\exp(-C_{U}n)+2n\exp(-C_{U}nz^{2}).

By Lemma B.2, we have $\mathbb{P}(|U_{n}(\theta)-1/2|>z)\leq 2\exp(-nz^{2})$ and hence $\mathbb{P}(\Delta_{\sigma,2}>z)\leq 2\exp(-C_{U}nz)$. Putting all these pieces together, we obtain (D.1). ∎

D.1 Proof of Theorem 3.1

Define the bootstrap U-process

Un(t)=1Bnij[n]ωiωj𝕀{Xi+Xj2t},whereBn=ij[n]ωiωj.\displaystyle U_{n}^{\star}(t)=\frac{1}{B_{n}}\sum_{i\neq j\in[n]}\omega_{i}\omega_{j}\mathbb{I}\left\{\frac{X_{i}+X_{j}}{2}\leq t\right\},\enspace\mathrm{where}\enspace B_{n}=\sum_{i\neq j\in[n]}\omega_{i}\omega_{j}.

Define Wn(t)=Un(θ+t)U(θ+t)W_{n}^{\star}(t)=U_{n}^{\star}(\theta+t)-U(\theta+t) and Wn(t)=Un(θ+t)Un(θ+t)W_{n}^{\diamond}(t)=U_{n}^{\star}(\theta+t)-U_{n}(\theta+t) for tt\in\mathbb{R}. We first introduce some notation. Let K=min{k:2kn}K=\min\left\{k\in\mathbb{N}:2^{k}\geq n\right\}. For each ij[n]i\neq j\in[n] and h[2K]h\in[2^{K}], denote

(Xi,Xj,Λh2K)\displaystyle\mathcal{R}\left(X_{i},X_{j},\frac{\Lambda h}{2^{K}}\right) =R(Xi,Xj,Λh2K)Un(θ+Λh2K)+Un(θ)\displaystyle=R\left(X_{i},X_{j},\frac{\Lambda h}{2^{K}}\right)-U_{n}\left(\theta+\frac{\Lambda h}{2^{K}}\right)+U_{n}(\theta)
=R(Xi,Xj,Λh2K)1n(n1)ij[n]R(Xi,Xj,Λh2K).\displaystyle=R\left(X_{i},X_{j},\frac{\Lambda h}{2^{K}}\right)-\frac{1}{n(n-1)}\sum_{i\neq j\in[n]}R\left(X_{i},X_{j},\frac{\Lambda h}{2^{K}}\right).
Lemma D.2.

Let 0<Λ1/(2f)0<\Lambda\leq 1/(2\|f\|_{\infty}). Under K={ΓnΓ}\mathcal{E}_{K}=\{\Gamma_{n}\leq\Gamma_{\diamond}\}, for any zCnz\leq Cn, we have

\displaystyle\mathbb{P}^{\star} {sup|t|Λ|Wn(t)Wn(0)|>sup|t|Λ|Wn(t)Wn(0)|+ΛU2K+C1n(z+K)+C2Γ(z+K)2n2}\displaystyle\bigg{\{}\sup_{|t|\leq\Lambda}|W_{n}^{\star}(t)-W_{n}^{\star}(0)|>\sup_{|t|\leq\Lambda}|W_{n}(t)-W_{n}(0)|+\frac{\Lambda\|U^{\prime}\|_{\infty}}{2^{K}}+\frac{C_{1}n(z+K)+C_{2}\sqrt{\Gamma_{\diamond}(z+K)}}{2n^{2}}\bigg{\}} (D.2)
Cexp(z)+exp(n16).\displaystyle\leq C\exp(-z)+\exp\left(-\frac{n}{16}\right). (D.3)
Proof of Lemma D.2.

For any h[2K]h\in[2^{K}] and t(Λ(h1)2K,Λh2K]t\in(\Lambda(h-1)2^{-K},\Lambda h2^{-K}], we have

Wn(t)Wn(0)\displaystyle W_{n}^{\star}(t)-W_{n}^{\star}(0) =Un(θ+t)U(θ+t){Un(θ)U(θ)}\displaystyle=U_{n}^{\star}(\theta+t)-U(\theta+t)-\{U_{n}^{\star}(\theta)-U(\theta)\}
Un(θ+Λh2K)U(θ+Λ(h1)2K){Un(θ)U(θ)}\displaystyle\leq U_{n}^{\star}\left(\theta+\frac{\Lambda h}{2^{K}}\right)-U\left(\theta+\frac{\Lambda(h-1)}{2^{K}}\right)-\{U_{n}^{\star}(\theta)-U(\theta)\}
Wn(Λh2K)Wn(0)+ΛU2K\displaystyle\leq W_{n}^{\star}\left(\frac{\Lambda h}{2^{K}}\right)-W_{n}^{\star}(0)+\frac{\Lambda\|U^{\prime}\|_{\infty}}{2^{K}}
Wn(Λh2K)Wn(0)+sup|t|Λ|Wn(t)Wn(0)|+ΛU2K\displaystyle\leq W_{n}^{\diamond}\left(\frac{\Lambda h}{2^{K}}\right)-W_{n}^{\diamond}(0)+\sup_{|t|\leq\Lambda}|W_{n}(t)-W_{n}(0)|+\frac{\Lambda\|U^{\prime}\|_{\infty}}{2^{K}}

Similarly, we have

Wn(t)Wn(0)Wn(Λ(h1)2K)Wn(0)sup|t|Λ|Wn(t)Wn(0)|ΛU2K.\displaystyle W_{n}^{\star}(t)-W_{n}^{\star}(0)\geq W_{n}^{\diamond}\left(\frac{\Lambda(h-1)}{2^{K}}\right)-W_{n}^{\diamond}(0)-\sup_{|t|\leq\Lambda}|W_{n}(t)-W_{n}(0)|-\frac{\Lambda\|U^{\prime}\|_{\infty}}{2^{K}}.

Consequently, we have

\displaystyle\sup_{0<t\leq\Lambda}|W_{n}^{\star}(t)-W_{n}^{\star}(0)|\leq\max_{h\in[2^{K}]}\bigg|W_{n}^{\diamond}\bigg(\frac{\Lambda h}{2^{K}}\bigg)-W_{n}^{\diamond}(0)\bigg|+\sup_{|t|\leq\Lambda}|W_{n}(t)-W_{n}(0)|+\frac{\Lambda\|U^{\prime}\|_{\infty}}{2^{K}}.

Hence it suffices to upper bound maxh[2K]|Wn(Λh/2K)Wn(0)|\max_{h\in[2^{K}]}|W_{n}^{\diamond}(\Lambda h/2^{K})-W_{n}^{\diamond}(0)|. For each h[2K]h\in[2^{K}], we have

Wn(Λh2K)Wn(0)\displaystyle W_{n}^{\diamond}\bigg{(}\frac{\Lambda h}{2^{K}}\bigg{)}-W_{n}^{\diamond}(0) =1Nij[n](Xi,Xj,Λh2K)(ωiωj1)\displaystyle=\frac{1}{N}\sum_{i\neq j\in[n]}\mathcal{R}\left(X_{i},X_{j},\frac{\Lambda h}{2^{K}}\right)(\omega_{i}\omega_{j}-1)
=1Nij[n](Xi,Xj,Λh2K)(ωi1)(ωj1)\displaystyle=\frac{1}{N}\sum_{i\neq j\in[n]}\mathcal{R}\left(X_{i},X_{j},\frac{\Lambda h}{2^{K}}\right)(\omega_{i}-1)(\omega_{j}-1)
+2Nj=1n{ij(Xi,Xj,Λh2K)}(ωj1)\displaystyle+\frac{2}{N}\sum_{j=1}^{n}\bigg{\{}\sum_{i\neq j}\mathcal{R}\left(X_{i},X_{j},\frac{\Lambda h}{2^{K}}\right)\bigg{\}}(\omega_{j}-1)
=:ΔW,h+ΔW,h.\displaystyle=:\Delta_{W,h}^{\natural}+\Delta_{W,h}^{\sharp}.

Recall that Bn=ij[n]ωiωjB_{n}=\sum_{i\neq j\in[n]}\omega_{i}\omega_{j}, where ω1,,ωn\omega_{1},\ldots,\omega_{n} are i.i.d. random variables such that 0ωi20\leq\omega_{i}\leq 2 and 𝔼(ωi)=1\mathbb{E}(\omega_{i})=1 for each i[n]i\in[n]. Hence, it follows from Lemma B.2 that (Bn>2n(n1))exp(n/16)\mathbb{P}(B_{n}>2n(n-1))\leq\exp(-n/16). We first upper bound maxh[2K]|ΔW,h|\max_{h\in[2^{K}]}|\Delta_{W,h}^{\natural}|. Observe that ΔW,h\Delta_{W,h}^{\natural} is a degenerate second order U-statistic with

maxij[n]maxh[2K]|(Xi,Xj,Λh2K)|1.\displaystyle\max_{i\neq j\in[n]}\max_{h\in[2^{K}]}\bigg{|}\mathcal{R}\bigg{(}X_{i},X_{j},\frac{\Lambda h}{2^{K}}\bigg{)}\bigg{|}\leq 1.

Hence, by Theorem 3.3 in Giné et al. (2000),

(maxh[2K]|ΔW,h|>z2n2)C2Kexp[1Cmin{z2n2,zn,(zn)2/3,z}]+exp(n16).\displaystyle\mathbb{P}^{\star}\bigg{(}\max_{h\in[2^{K}]}|\Delta_{W,h}^{\natural}|>\frac{z}{2n^{2}}\bigg{)}\leq C2^{K}\exp\bigg{[}-\frac{1}{C}\min\bigg{\{}\frac{z^{2}}{n^{2}},\frac{z}{n},\bigg{(}\frac{z}{\sqrt{n}}\bigg{)}^{2/3},\sqrt{z}\bigg{\}}\bigg{]}+\exp\left(-\frac{n}{16}\right).

We now upper bound maxh[2K]|ΔW,h|\max_{h\in[2^{K}]}|\Delta_{W,h}^{\sharp}|. For simplicity of notation, denote Rij,h=R(Xi,Xj,Λh/2K)R_{ij,h}=R(X_{i},X_{j},\Lambda h/2^{K}) and R¯n,h={n(n1)}1ij[n]Rij,h\bar{R}_{n,h}=\{n(n-1)\}^{-1}\sum_{i\neq j\in[n]}R_{ij,h}. Define

Γn=maxh[2K]j=1n|ij(Xi,Xj,Λh2K)|2.\displaystyle\Gamma_{n}=\max_{h\in[2^{K}]}\sum_{j=1}^{n}\bigg{|}\sum_{i\neq j}\mathcal{R}\bigg{(}X_{i},X_{j},\frac{\Lambda h}{2^{K}}\bigg{)}\bigg{|}^{2}. (D.4)

Hence, under K\mathcal{E}_{K}, by Theorem 1.1 in Bentkus and Dzindzalieta (2015), it follows that

{maxh[2K]|ΔW,h|>C1Γ(z+K)2n2}C2exp(z)+exp(n16).\displaystyle\mathbb{P}^{\star}\bigg{\{}\max_{h\in[2^{K}]}|\Delta_{W,h}^{\sharp}|>\frac{C_{1}\sqrt{\Gamma_{\diamond}(z+K)}}{2n^{2}}\bigg{\}}\leq C_{2}\exp(-z)+\exp\left(-\frac{n}{16}\right).

Putting all these pieces together, we obtain (D.2). ∎

Lemma D.3.

Let Γn\Gamma_{n} be defined in (D.4). Assume that Λ1/(2f)\Lambda\leq 1/(2\|f\|_{\infty}) and lognnΛf\log n\leq n\Lambda\|f\|_{\infty}. Then we have

(ΓnCn3Λ2f2)1C1exp(nΛf).\displaystyle\mathbb{P}(\Gamma_{n}\leq Cn^{3}\Lambda^{2}\|f\|_{\infty}^{2})\geq 1-C_{1}\exp(-n\Lambda\|f\|_{\infty}). (D.5)
Proof of Lemma D.3.

By the triangle inequality,

Γn\displaystyle\Gamma_{n} maxh[2K]j=1n|ij{Rij,h𝔼(Rij,h|Xj)}|2+maxh[2K]n2j=1n|𝔼(Rij,h|Xj)𝔼(Rij,h)|2\displaystyle\lesssim\max_{h\in[2^{K}]}\sum_{j=1}^{n}\bigg{|}\sum_{i\neq j}\{R_{ij,h}-\mathbb{E}(R_{ij,h}|X_{j})\}\bigg{|}^{2}+\max_{h\in[2^{K}]}n^{2}\sum_{j=1}^{n}|\mathbb{E}(R_{ij,h}|X_{j})-\mathbb{E}(R_{ij,h})|^{2}
+maxh[2K]n3|R¯n,h𝔼(R¯n,h)|2=:Γn,1+Γn,2+Γn,3.\displaystyle+\max_{h\in[2^{K}]}n^{3}|\bar{R}_{n,h}-\mathbb{E}(\bar{R}_{n,h})|^{2}=:\Gamma_{n,1}+\Gamma_{n,2}+\Gamma_{n,3}.

For each j[n]j\in[n], conditional on XjX_{j}, {Rij,h𝔼(Rij,h|Xj)}ij\{R_{ij,h}-\mathbb{E}(R_{ij,h}|X_{j})\}_{i\neq j} are independent centered random variables with

maxh[2K]maxj[n]|Rij,h𝔼(Rij,h|Xj)|1andmaxh[2K]maxj[n]𝔼{|Rij,h𝔼(Rij,h|Xj)|2|Xj}Λf.\displaystyle\max_{h\in[2^{K}]}\max_{j\in[n]}|R_{ij,h}-\mathbb{E}(R_{ij,h}|X_{j})|\leq 1\enspace\mathrm{and}\enspace\max_{h\in[2^{K}]}\max_{j\in[n]}\mathbb{E}\{|R_{ij,h}-\mathbb{E}(R_{ij,h}|X_{j})|^{2}|X_{j}\}\leq\Lambda\|f\|_{\infty}.

Then, by the Bernstein inequality, with probability at least 1Cexp(z)1-C\exp(-z), we have

Γn,1nmaxh[2K]maxj[n]|ij{Rij,h𝔼(Rij,h|Xj)}|2C1{n(z+logn)2+n2Λf(z+logn)}.\displaystyle\Gamma_{n,1}\leq n\max_{h\in[2^{K}]}\max_{j\in[n]}\bigg{|}\sum_{i\neq j}\{R_{ij,h}-\mathbb{E}(R_{ij,h}|X_{j})\}\bigg{|}^{2}\leq C_{1}\{n(z+\log n)^{2}+n^{2}\Lambda\|f\|_{\infty}(z+\log n)\}.

We now bound Γn,2\Gamma_{n,2}. Observe that {|𝔼(Rij,h|Xj)𝔼(Rij,h)|2}j[n]\{|\mathbb{E}(R_{ij,h}|X_{j})-\mathbb{E}(R_{ij,h})|^{2}\}_{j\in[n]} are independent random variables with

maxh[2K]maxj[n]|𝔼(Rij,h|Xj)𝔼(Rij,h)|2Λ2f2.\displaystyle\max_{h\in[2^{K}]}\max_{j\in[n]}|\mathbb{E}(R_{ij,h}|X_{j})-\mathbb{E}(R_{ij,h})|^{2}\leq\Lambda^{2}\|f\|_{\infty}^{2}.

Hence, it follows from Lemma B.1 that

[Γn,2C{n3Λ2f2+n5/2Λ2f2(z+logn)}]1exp(z).\displaystyle\mathbb{P}\left[\Gamma_{n,2}\leq C\left\{n^{3}\Lambda^{2}\|f\|_{\infty}^{2}+n^{5/2}\Lambda^{2}\|f\|_{\infty}^{2}\sqrt{(z+\log n)}\right\}\right]\geq 1-\exp(-z).

Recall that $\bar{R}_{n,h}=\{n(n-1)\}^{-1}\sum_{i\neq j\in[n]}R(X_{i},X_{j},\Lambda h/2^{K})=U_{n}(\theta+\Lambda h/2^{K})-U_{n}(\theta)$. Hence $\bar{R}_{n,h}-\mathbb{E}(\bar{R}_{n,h})=W_{n}(\Lambda h/2^{K})-W_{n}(0)$ for each $h\in[2^{K}]$. Therefore, by Lemma C.2, with probability at least $1-C\exp(-z)$, we have

Γn,3Cn3{(K+zn)2+Λfz+cn+K+znΛf}2.\displaystyle\Gamma_{n,3}\leq Cn^{3}\bigg{\{}\bigg{(}\frac{K+z}{n}\bigg{)}^{2}+\Lambda\|f\|_{\infty}\sqrt{\frac{z+c}{n}}+\frac{K+z}{n}\sqrt{\Lambda\|f\|_{\infty}}\bigg{\}}^{2}.

Putting all these pieces together, we obtain (D.5). ∎

Proof of Theorem 3.1.

By Theorem 2.1, for any ω>0\omega>0 such that ωnκ02c02\omega\leq n\kappa_{0}^{2}c_{0}^{2}, we have

(|θ^θ|>ω/(nκ02))2exp(ω).\displaystyle\mathbb{P}\left(|\widehat{\theta}-\theta|>\sqrt{\omega/(n\kappa_{0}^{2})}\right)\leq 2\exp(-\omega).

Combined with Lemma C.2, for any z>0z>0 such that z1/(4f)z\leq 1/(4\|f\|_{\infty}), with probability at least 1C1exp(ω)1-C_{1}\exp(-\omega), we have

|Un(θ^\displaystyle|U_{n}(\widehat{\theta} +z)Un(θ^){U(θ^+z)U(θ^)}|\displaystyle+z)-U_{n}(\widehat{\theta})-\{U(\widehat{\theta}+z)-U(\widehat{\theta})\}|
|Un(θ^+z)Un(θ){U(θ^+z)U(θ)}|\displaystyle\leq|U_{n}(\widehat{\theta}+z)-U_{n}(\theta)-\{U(\widehat{\theta}+z)-U(\theta)\}|
+|Un(θ^)Un(θ){U(θ^)U(θ)}|\displaystyle+|U_{n}(\widehat{\theta})-U_{n}(\theta)-\{U(\widehat{\theta})-U(\theta)\}|
C2(fzω+Cn+K+ωnfz+K+ωnκ0f).\displaystyle\leq C_{2}\bigg{(}\|f\|_{\infty}z\sqrt{\frac{\omega+C}{n}}+\frac{K+\omega}{n}\sqrt{\|f\|_{\infty}z}+\frac{K+\omega}{n\kappa_{0}}\|f\|_{\infty}\bigg{)}.

Similar to the proof of Theorem 2.1, for any z0z\geq 0 such that θ^>θ^+z\widehat{\theta}^{\star}>\widehat{\theta}+z, we have

12\displaystyle\frac{1}{2} >Un(θ^+z)Un(θ^+z)|Un(θ^+z)Un(θ^+z)|\displaystyle>U_{n}^{\star}(\widehat{\theta}+z)\geq U_{n}(\widehat{\theta}+z)-|U_{n}^{\star}(\widehat{\theta}+z)-U_{n}(\widehat{\theta}+z)|
Un(θ^)+{U(θ^+z)U(θ^)}|Un(θ^+z)Un(θ^+z)|\displaystyle\geq U_{n}(\widehat{\theta})+\left\{U(\widehat{\theta}+z)-U(\widehat{\theta})\right\}-|U_{n}^{\star}(\widehat{\theta}+z)-U_{n}(\widehat{\theta}+z)|
|Un(θ^+z)Un(θ^){U(θ^+z)U(θ^)}|\displaystyle-|U_{n}(\widehat{\theta}+z)-U_{n}(\widehat{\theta})-\{U(\widehat{\theta}+z)-U(\widehat{\theta})\}|
12+κ0z2|Un(θ^+z)Un(θ^+z)|C2(K+ωnfz+K+ωnκ0f)\displaystyle\geq\frac{1}{2}+\frac{\kappa_{0}z}{2}-|U_{n}^{\star}(\widehat{\theta}+z)-U_{n}(\widehat{\theta}+z)|-C_{2}\left(\frac{K+\omega}{n}\sqrt{\|f\|_{\infty}z}+\frac{K+\omega}{n\kappa_{0}}\|f\|_{\infty}\right)
12+κ0z2|Un(θ^+z)Un(θ^+z)|2C2(K+ω)nκ0f.\displaystyle\geq\frac{1}{2}+\frac{\kappa_{0}z}{2}-|U_{n}^{\star}(\widehat{\theta}+z)-U_{n}(\widehat{\theta}+z)|-\frac{2C_{2}(K+\omega)}{n\kappa_{0}}\|f\|_{\infty}.

Then it follows from Lemma B.1 that

{|θ^θ^|>2κ0zn+4C2f(K+ω)nκ02}2exp(C3z)+exp(n16).\displaystyle\mathbb{P}^{\star}\left\{|\widehat{\theta}^{\star}-\widehat{\theta}|>\frac{2}{\kappa_{0}}\sqrt{\frac{z}{n}}+\frac{4C_{2}\|f\|_{\infty}(K+\omega)}{n\kappa_{0}^{2}}\right\}\leq 2\exp(-C_{3}z)+\exp\left(-\frac{n}{16}\right).

By Lemma D.2 and Lemma D.3, with probability at least 1Cexp(ω)1-C\exp(-\omega), we have

{|Un(θ^)U(θ^){Un(θ)U(θ)}|>𝒞(K+z+ω)n}Cexp(z)+exp(n16),\displaystyle\mathbb{P}^{\star}\bigg{\{}|U_{n}^{\star}(\widehat{\theta}^{\star})-U(\widehat{\theta}^{\star})-\{U_{n}^{\star}(\theta)-U(\theta)\}|>\frac{\mathcal{C}(K+z+\omega)}{n}\bigg{\}}\leq C\exp(-z)+\exp\left(-\frac{n}{16}\right),

where 𝒞\mathcal{C} is a positive constant depending only on κ0,f\kappa_{0},\|f\|_{\infty} and c0c_{0}. Therefore, combined with the fact that sup|δ|c1|U′′(θ+δ)|κ1\sup_{|\delta|\leq c_{1}}|U^{\prime\prime}(\theta+\delta)|\leq\kappa_{1}, we obtain

|θ^θ1BnU(θ)ij[n]ωiωj(𝕀{ξi+ξj0}12)|𝒞(K+z+ω)n.\displaystyle\bigg{|}\widehat{\theta}^{\star}-\theta-\frac{1}{B_{n}U^{\prime}(\theta)}\sum_{i\neq j\in[n]}\omega_{i}\omega_{j}\left(\mathbb{I}\{\xi_{i}+\xi_{j}\leq 0\}-\frac{1}{2}\right)\bigg{|}\leq\frac{\mathcal{C}(K+z+\omega)}{n}.

Combining these with Theorem 2.1, we obtain (3.3). ∎

Proof of (3.4).

For each i[n]i\in[n], denote

𝒟i=2U(θ)(1n1ji𝕀{ξi+ξj0}Un(θ)).\displaystyle\mathcal{D}_{i}=\frac{2}{U^{\prime}(\theta)}\bigg{(}\frac{1}{n-1}\sum_{j\neq i}\mathbb{I}\{\xi_{i}+\xi_{j}\leq 0\}-U_{n}(\theta)\bigg{)}.

Then it follows from (3.3) that, with probability at least $1-C\exp(-\omega)$, we have

{|θ^θ^1ni=1n𝒟i(ωi1)|>C(ω+logn)n}Cnc.\displaystyle\mathbb{P}^{\star}\bigg{\{}\bigg{|}\widehat{\theta}^{\star}-\widehat{\theta}-\frac{1}{n}\sum_{i=1}^{n}\mathcal{D}_{i}(\omega_{i}-1)\bigg{|}>\frac{C(\omega+\log n)}{n}\bigg{\}}\leq Cn^{-c}. (D.6)

By Lemma D.1, we have $\mathbb{P}(\widehat{\sigma}_{\theta}\geq\sigma_{\theta}/2)\geq 1-C\exp(-n)$. Consequently, by the Berry-Esseen theorem, with probability at least $1-C\exp(-n)$, we have

supz|(1nσ^θi=1n𝒟i(ωi1)z)Φ(z)|Cn.\displaystyle\sup_{z\in\mathbb{R}}\bigg{|}\mathbb{P}^{\star}\bigg{(}\frac{1}{\sqrt{n}\widehat{\sigma}_{\theta}}\sum_{i=1}^{n}\mathcal{D}_{i}(\omega_{i}-1)\leq z\bigg{)}-\Phi(z)\bigg{|}\leq\frac{C}{\sqrt{n}}.

Combining this with (D.6) yields (3.4). ∎

D.2 Proof of Theorem 3.2

Based on Theorem 2.4, the proof of Theorem 3.2 can be derived by following the proof of Theorem 3.1. Thus we omit the details.

Appendix E Proofs of Results in §4

E.1 Proof of Theorem 4.1

Proof of Theorem 4.1.

For each $i\in[n]$ and $\ell\in[p]$, denote $D_{i\ell}=\{1-2F_{\ell}(-\xi_{i\ell})\}/U_{\ell}^{\prime}(\theta_{\ell})$. Define $\bar{\bm{D}}_{n}=(\bar{D}_{n1},\ldots,\bar{D}_{np})^{\top}\in\mathbb{R}^{p}$, where $\bar{D}_{n\ell}=n^{-1}\sum_{i=1}^{n}D_{i\ell}$ for each $\ell\in[p]$. Note that $|D_{i\ell}|\leq 1/\kappa_{0}$ uniformly over $i\in[n]$ and $\ell\in[p]$. Hence, it follows from Theorem 2.1 in Chernozhuokov et al. (2022) that

\[
\sup_{z\in\mathbb{R}}|\mathbb{P}(\sqrt{n}\|\bar{\bm{D}}_{n}\|_{\infty}\leq z)-\mathbb{P}(\|\bm{Z}\|_{\infty}\leq z)|\leq\frac{C\log^{5/4}(pn)}{n^{1/4}}.
\]

Let $\delta^{\sharp}=C\log(pn)/\sqrt{n}$. By Theorem 2.1, it follows that

\begin{align*}
\sup_{z\in\mathbb{R}}\,&|\mathbb{P}(\sqrt{n}\|\widehat{\bm{\theta}}-\bm{\theta}\|_{\infty}\leq z)-\mathbb{P}(\sqrt{n}\|\bar{\bm{D}}_{n}\|_{\infty}\leq z)|\\
&\leq\mathbb{P}(\sqrt{n}\|\widehat{\bm{\theta}}-\bm{\theta}-\bar{\bm{D}}_{n}\|_{\infty}>\delta^{\sharp})+\sup_{z\in\mathbb{R}}\mathbb{P}(z<\sqrt{n}\|\bar{\bm{D}}_{n}\|_{\infty}\leq z+\delta^{\sharp})\\
&\lesssim n^{-c}+\delta^{\sharp}\sqrt{\log p}+\frac{\log^{5/4}(pn)}{n^{1/4}}\leq\frac{C\log^{5/4}(pn)}{n^{1/4}}.
\end{align*}

Putting all these pieces together, we obtain (4.2). ∎

E.2 Proof of Theorem 4.2

Proof of Theorem 4.2.

For each $\ell\in[p]$ and $i\in[n]$, denote

\[
\mathcal{D}_{i\ell}=\frac{2}{\sigma_{\ell}U_{\ell}^{\prime}(\theta_{\ell})}\bigg(\frac{1}{n-1}\sum_{j\neq i}\mathbb{I}\{\xi_{i\ell}+\xi_{j\ell}\leq 0\}-U_{n\ell}(\theta_{\ell})\bigg).
\]

Let $\bar{\bm{\mathcal{D}}}_{n}=(\bar{\mathcal{D}}_{n1},\ldots,\bar{\mathcal{D}}_{np})^{\top}$, where $\bar{\mathcal{D}}_{n\ell}=n^{-1}\sum_{i=1}^{n}\mathcal{D}_{i\ell}$ for each $\ell\in[p]$. By the triangle inequality,

\begin{align*}
\max_{\ell\in[p]}\sum_{i=1}^{n}(D_{i\ell}-\mathcal{D}_{i\ell})^{2}&\lesssim\max_{\ell\in[p]}\sum_{i=1}^{n}\bigg(\frac{1}{n-1}\sum_{j\neq i}\mathbb{I}\{\xi_{i\ell}+\xi_{j\ell}\leq 0\}-F_{\ell}(-\xi_{i\ell})\bigg)^{2}+n\max_{\ell\in[p]}\bigg|U_{n\ell}(\theta_{\ell})-\frac{1}{2}\bigg|^{2}\\
&\leq n\max_{\ell\in[p]}\max_{i\in[n]}\bigg|\frac{1}{n-1}\sum_{j\neq i}\mathbb{I}\{\xi_{i\ell}+\xi_{j\ell}\leq 0\}-F_{\ell}(-\xi_{i\ell})\bigg|^{2}+n\max_{\ell\in[p]}\bigg|U_{n\ell}(\theta_{\ell})-\frac{1}{2}\bigg|^{2}\\
&=:\Delta_{D}^{\diamond}+\Delta_{D}^{\circ}.
\end{align*}

By Lemma B.1, it follows that $\mathbb{P}\{\Delta_{D}^{\diamond}>C\log(np)\}\lesssim(np)^{-c}$. Recall that

\[
U_{n\ell}(\theta_{\ell})=\frac{1}{n(n-1)}\sum_{i\neq j\in[n]}\mathbb{I}\{\xi_{i\ell}+\xi_{j\ell}\leq 0\},\quad\ell\in[p].
\]

Hence, it follows from Lemma B.2 that $\mathbb{P}\{\Delta_{D}^{\circ}>C\log(np)\}\lesssim(np)^{-c}$. Consequently, for any $\delta>0$, by Theorem 1.1 in Bentkus and Dzindzalieta (2015), with probability at least $1-C(np)^{-c}$, we have

\[
\mathbb{P}^{\star}\bigg(\max_{\ell\in[p]}\bigg|\sum_{i=1}^{n}(D_{i\ell}-\mathcal{D}_{i\ell})(\omega_{i}-1)\bigg|>\delta\bigg)\lesssim p\exp\left(-\frac{C\delta^{2}}{\log(np)}\right).
\]

Combined with Lemma, with probability at least $1-C(np)^{-c}$, we have

\begin{align*}
\sup_{z\in\mathbb{R}}\,&\bigg|\mathbb{P}^{\star}\bigg(\max_{\ell\in[p]}\bigg|\sum_{i=1}^{n}D_{i\ell}(\omega_{i}-1)\bigg|\leq z\bigg)-\mathbb{P}^{\star}\bigg(\max_{\ell\in[p]}\bigg|\sum_{i=1}^{n}\mathcal{D}_{i\ell}(\omega_{i}-1)\bigg|\leq z\bigg)\bigg|\\
&\lesssim p\exp\{-C\log(np)\}+\frac{\log(np)\sqrt{\log p}}{\sqrt{n}}\lesssim\frac{\log(np)\sqrt{\log p}}{\sqrt{n}}.
\end{align*}
∎

E.3 Proof of Theorem 4.3

Proof of Theorem 4.3.

By the arguments in the proof of Theorem 2.3, it follows from Theorem 3.1 that, with probability at least $1-Cn^{-c}$, we have

\[
\frac{\mathbb{P}^{\star}\{\sqrt{n}(\widehat{\theta}_{\ell}^{\star}-\widehat{\theta}_{\ell})>z\}}{2\{1-\Phi(z/\sigma_{\ell})\}}=1+o(1), \tag{E.1}
\]

uniformly for $0<z\leq o(n^{1/6})$ and $\ell\in[p]$. By Lemma 1 in Storey et al. (2004), it follows that

\[
t_{S}=\sup\left\{t\in[0,1]:t\leq\frac{\alpha\max(\sum_{\ell=1}^{p}\mathbb{I}\{P_{\ell}\leq t\},1)}{p}\right\}.
\]

By the definition of $t_{S}$, we have

\[
t_{S}=\frac{\alpha\max(\sum_{\ell=1}^{p}\mathbb{I}\{P_{\ell}\leq t_{S}\},1)}{p}. \tag{E.2}
\]
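As a computational aside, (E.2) identifies $t_{S}$ with the familiar step-up threshold: $t_{S}=\alpha\max(k^{\star},1)/p$, where $k^{\star}$ is the largest $k$ such that $P_{(k)}\leq\alpha k/p$ for the ordered p-values $P_{(1)}\leq\cdots\leq P_{(p)}$. A minimal Python sketch (the function name and interface are ours, for illustration only):

```python
import numpy as np

def step_up_threshold(pvals, alpha):
    """Solve (E.2): t_S = alpha * max(k_star, 1) / p, where k_star is the
    largest k with the k-th smallest p-value at most alpha * k / p."""
    p = len(pvals)
    p_sorted = np.sort(pvals)
    k = np.arange(1, p + 1)
    passed = np.nonzero(p_sorted <= alpha * k / p)[0]
    k_star = passed[-1] + 1 if passed.size else 0
    return alpha * max(k_star, 1) / p

# Hypotheses with P_l <= t_S are rejected; note that t_S >= alpha / p always.
```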

Observe that $t_{S}\geq\alpha/p$. Hence, it follows from (E.1) that $\mathbb{P}\{t_{S}>\max_{\ell\in[p]}\mathcal{G}_{\ell}^{\star}(\sigma_{\ell}(2\log p)^{1/2})\}\to 1$. Combining this with (E.2) yields

\begin{align*}
t_{S}\geq\frac{\alpha}{p}\sum_{\ell=1}^{p}\mathbb{I}\{P_{\ell}\leq t_{S}\}&\geq\frac{\alpha}{p}\sum_{\ell=1}^{p}\mathbb{I}\left\{\mathcal{G}_{\ell}^{\star}(\sqrt{n}|\widehat{\theta}_{\ell}|)\leq\mathcal{G}_{\ell}^{\star}\left(\sigma_{\ell}\sqrt{2\log p}\right)\right\}\\
&\geq\frac{\alpha}{p}\sum_{\ell=1}^{p}\mathbb{I}\left\{\sqrt{n}|\widehat{\theta}_{\ell}|\geq\sigma_{\ell}\sqrt{2\log p}\right\}\\
&\geq\frac{\alpha}{p}\sum_{\ell=1}^{p}\mathbb{I}\bigg\{\frac{|\theta_{\ell}|}{\sigma_{\ell}}\geq\sqrt{2\log p}+\max_{k\in[p]}\frac{\sqrt{n}|\widehat{\theta}_{k}-\theta_{k}|}{\sigma_{k}}\bigg\}.
\end{align*}

For some $\lambda>0$, define

\[
\mathcal{E}_{\lambda}=\bigg\{\sqrt{n}\max_{\ell\in[p]}\frac{|\widehat{\theta}_{\ell}-\theta_{\ell}|}{\sigma_{\ell}}\leq(1+\lambda)\sqrt{2\log p}\bigg\}.
\]

Under $\mathcal{E}_{\lambda}$, it is straightforward to verify that

\[
t_{S}\geq\frac{\alpha}{p}\sum_{\ell=1}^{p}\mathbb{I}\left\{\frac{|\theta_{\ell}|}{\sigma_{\ell}}\geq(2+\lambda)\sqrt{2\log p}\right\}.
\]

By Theorem 2.3, it follows that $\mathbb{P}(\mathcal{E}_{\lambda}^{c})\leq Cp\exp\{-(1+\lambda)^{2}\log p\}\leq Cp^{-\lambda^{2}-2\lambda}$. Then, following the proof of Theorem 3.3 in Zhou et al. (2018), for any sequence $1\leq b_{p}\to\infty$, it is straightforward to derive that

\[
\sup_{b_{p}/p\leq t\leq 1}\bigg|\frac{\sum_{\ell\in\mathcal{H}_{0}}\mathbb{I}\{P_{\ell}\leq t\}}{|\mathcal{H}_{0}|t}-1\bigg|\overset{\mathbb{P}}{\longrightarrow}0.
\]

Putting all these pieces together, we obtain (4.6). ∎

E.4 Large-scale Two-sample Simultaneous Testing

We consider the following high-dimensional two-sample global test:

\[
H_{0}:\theta_{\ell}=\theta_{\ell}^{\circ}\ \text{for all}\ \ell\in[p]\quad\text{versus}\quad H_{1}:\theta_{\ell}\neq\theta_{\ell}^{\circ}\ \text{for some}\ \ell\in[p]. \tag{E.3}
\]

Given the HL estimators $\widehat{\Theta}_{\ell}=\mathrm{median}\{X_{i\ell}-Y_{j\ell}:i\in[n],j\in[m]\}$, $\ell\in[p]$, for $\Theta_{\ell}=\theta_{\ell}-\theta_{\ell}^{\circ}$, we shall reject the null hypothesis $H_{0}$ in (E.3) whenever $\max_{\ell\in[p]}|\widehat{\Theta}_{\ell}|$ exceeds some threshold. To this end, we develop a Gaussian approximation for $\max_{\ell\in[p]}|\widehat{\Theta}_{\ell}-\Theta_{\ell}|$. More specifically, let $\bm{\mathcal{Z}}=(\mathcal{Z}_{1},\ldots,\mathcal{Z}_{p})^{\top}$ be a $p$-dimensional centered Gaussian random vector with

\[
\operatorname{Cov}(\mathcal{Z}_{k},\mathcal{Z}_{\ell})=\frac{N\operatorname{Cov}(\bar{G}_{nk}-\bar{F}_{mk},\bar{G}_{n\ell}-\bar{F}_{m\ell})}{\mathcal{U}_{k}^{\prime}(\Theta_{k})\mathcal{U}_{\ell}^{\prime}(\Theta_{\ell})},\quad k,\ell\in[p].
\]
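For illustration, the coordinatewise two-sample HL estimates and the resulting statistic $\max_{\ell\in[p]}|\widehat{\Theta}_{\ell}|$ can be computed as in the following Python sketch (a direct vectorized implementation; since it materializes all $nmp$ pairwise differences, a loop over coordinates may be preferable for large samples):

```python
import numpy as np

def hl_two_sample(X, Y):
    """Coordinatewise two-sample Hodges-Lehmann estimates:
    Theta_hat_l = median{X_il - Y_jl : i in [n], j in [m]},
    where X has shape (n, p) and Y has shape (m, p)."""
    diffs = X[:, None, :] - Y[None, :, :]            # shape (n, m, p)
    return np.median(diffs.reshape(-1, X.shape[1]), axis=0)

# Global test statistic for (E.3): max_l |Theta_hat_l|.
```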

In the following theorem, we establish a non-asymptotic upper bound on the Kolmogorov distance between the distribution functions of $\max_{\ell\in[p]}|\widehat{\Theta}_{\ell}-\Theta_{\ell}|$ and its Gaussian analogue $\max_{\ell\in[p]}|\mathcal{Z}_{\ell}|$.

Theorem E.1.

Let $\mathcal{U}_{\ell}(t)=\mathbb{P}(X_{1\ell}-Y_{1\ell}\leq t)$ for each $\ell\in[p]$. Assume that there exist positive constants $\bar{c}_{0},\bar{\kappa}_{0},\bar{c}_{1}$ and $\bar{\kappa}_{1}$ such that

\[
\min_{\ell\in[p]}\inf_{|\Lambda|\leq\bar{c}_{0}}\mathcal{U}_{\ell}^{\prime}(\Theta_{\ell}+\Lambda)\geq\bar{\kappa}_{0}>0\quad\text{and}\quad\max_{\ell\in[p]}\sup_{|\Lambda|\leq\bar{c}_{1}}|\mathcal{U}_{\ell}^{\prime\prime}(\Theta_{\ell}+\Lambda)|\leq\bar{\kappa}_{1}.
\]

Then, under Assumption 2.1, we have

\[
\sup_{z\in\mathbb{R}}\left|\mathbb{P}\left(\max_{\ell\in[p]}\sqrt{N}|\widehat{\Theta}_{\ell}-\Theta_{\ell}|\leq z\right)-\mathbb{P}\left(\max_{\ell\in[p]}|\mathcal{Z}_{\ell}|\leq z\right)\right|\leq\frac{C\log^{5/4}(Np)}{N^{1/4}}.
\]
Proof of Theorem E.1.

Theorem E.1 can be proved by the same arguments as Theorem 4.1; the only substantive change is replacing the one-sample estimators with their two-sample counterparts. We therefore omit the details. ∎

Motivated by Theorem E.1, an asymptotic $\alpha$-level test for (E.3) is given by $\mathbb{I}\{\max_{\ell\in[p]}|\widehat{\Theta}_{\ell}|>Q_{1-\alpha}^{\diamond}\}$, where $Q_{1-\alpha}^{\diamond}$ stands for the $(1-\alpha)$th quantile of the bootstrap statistic $\max_{\ell\in[p]}|\widehat{\Theta}_{\ell}^{\star}-\widehat{\Theta}_{\ell}|$, namely,

\[
Q_{1-\alpha}^{\diamond}=\inf\left\{z\in\mathbb{R}:\mathbb{P}^{\star}\left(\max_{\ell\in[p]}|\widehat{\Theta}_{\ell}^{\star}-\widehat{\Theta}_{\ell}|\leq z\right)\geq 1-\alpha\right\},\quad\alpha\in(0,1).
\]
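A Monte Carlo sketch of $Q_{1-\alpha}^{\diamond}$ is given below. The weighting scheme (independent $\mathrm{Exp}(1)$ weights for each sample) and the computation of the bootstrap HL estimate as a weighted median of the pairwise differences with weights $\omega_{i}\nu_{j}$ are our assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

def weighted_median(values, weights):
    """Smallest value whose cumulative normalized weight reaches 1/2."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order]) / weights.sum()
    return values[order][np.searchsorted(cum, 0.5)]

def bootstrap_quantile(X, Y, Theta_hat, alpha, B=500, rng=None):
    """Monte Carlo approximation of the (1 - alpha)th quantile of the
    bootstrap statistic max_l |Theta_hat*_l - Theta_hat_l|."""
    rng = rng if rng is not None else np.random.default_rng()
    n, p = X.shape
    m = Y.shape[0]
    diffs = (X[:, None, :] - Y[None, :, :]).reshape(n * m, p)  # X_il - Y_jl
    stats = np.empty(B)
    for b in range(B):
        w = rng.exponential(size=n)                # assumed weights, X-sample
        v = rng.exponential(size=m)                # assumed weights, Y-sample
        pair_w = np.repeat(w, m) * np.tile(v, n)   # weight of the pair (i, j)
        boot = np.array([weighted_median(diffs[:, l], pair_w) for l in range(p)])
        stats[b] = np.max(np.abs(boot - Theta_hat))
    return np.quantile(stats, 1 - alpha)

# Reject H_0 in (E.3) when max_l |Theta_hat_l| exceeds the returned quantile.
```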

The validity of the proposed test is justified via the following theorem.

Theorem E.2.

Under the conditions of Theorem E.1, we have

\[
\left|\mathbb{P}\left(\max_{\ell\in[p]}|\widehat{\Theta}_{\ell}-\Theta_{\ell}|>Q_{1-\alpha}^{\diamond}\right)-\alpha\right|\leq\frac{C\log^{5/4}(pN)}{N^{1/4}}.
\]
Proof of Theorem E.2.

The proof of Theorem E.2 follows that of Theorem 4.2, with the one-sample estimator replaced by its two-sample version. We therefore omit the details. ∎

E.5 Proof of Theorem 4.4

The proof of Theorem 4.4 follows that of Theorem 4.3, with the one-sample estimator replaced by its two-sample version. We therefore omit the details.