
A New Procedure for Controlling False Discovery Rate in Large-Scale $t$-tests

Changliang Zou$^a$, Haojie Ren$^b$, Xu Guo$^c$ and Runze Li$^b$
$^a$School of Statistics and Data Science, Nankai University,
Tianjin, P.R. China
$^b$Department of Statistics, The Pennsylvania State University,
University Park, PA 16802-2111, USA
$^c$School of Statistics, Beijing Normal University,
Beijing, P.R. China
Abstract

This paper is concerned with false discovery rate (FDR) control in large-scale multiple testing problems. We first propose a new data-driven testing procedure for controlling the FDR in large-scale $t$-tests for the one-sample mean problem. The proposed procedure achieves exact FDR control in finite-sample settings when the populations are symmetric, regardless of the number of tests or the sample sizes. Compared with the existing bootstrap method for FDR control, the proposed procedure is computationally efficient. We show that the proposed method controls the FDR asymptotically for asymmetric populations even when the test statistics are not independent. We further show that the proposed procedure with a simple correction is as accurate as the bootstrap method to the second order, and can be much more effective than the existing normal calibration. We extend the proposed procedure to the two-sample mean problem. Empirical results show that the proposed procedures have better FDR control than existing ones when the proportion of true alternative hypotheses is not too low, while maintaining reasonably good detection ability.
Keywords: Data splitting; Large-deviation probability; Multiple comparisons; Product of two normal variables; Skewness; Symmetry

1 Introduction

Many multiple testing problems are closely related to the one-sample mean problem. Let $(X_{i1},\ldots,X_{ip})^{\top}$, $1\leq i\leq n_{t}$, be independent and identically distributed (i.i.d.) samples from $X=(X_{1},\ldots,X_{p})^{\top}$ with mean $\mu=(\mu_{1},\ldots,\mu_{p})^{\top}$. Of interest is to test $H_{0}:\mu=0$ versus $H_{1}:\mu\neq 0$. This leads us to consider a multiple testing problem on the mean values

\mathbb{H}_{0j}:\mu_{j}=0\ \ \mbox{vs.}\ \ \mathbb{H}_{1j}:\mu_{j}\neq 0,\ 1\leq j\leq p.

A standard procedure for false discovery rate (FDR) control is to apply the Benjamini and Hochberg (BH) method (Benjamini and Hochberg, 1995) to the $t$ statistics with the standard normal or Student's $t$ calibration. That is, let $\bar{X}_{j}=n_{t}^{-1}\sum_{i=1}^{n_{t}}X_{ij}$ and $s_{j}^{2}=(n_{t}-1)^{-1}\sum_{i=1}^{n_{t}}(X_{ij}-\bar{X}_{j})^{2}$, and define $T_{j}=\sqrt{n_{t}}\bar{X}_{j}/s_{j}$; the procedure then rejects a hypothesis whenever $|T_{j}|\geq T$ with the data-dependent threshold

T=\inf\left\{t>0:\frac{2p\{1-\Phi(t)\}}{\#\{j:|T_{j}|\geq t\}}\leq\alpha\right\}, (1.1)

for a desired FDR level $\alpha$ (Storey et al., 2004), where $\Phi(\cdot)$ is the cumulative distribution function of $N(0,1)$. It has been revealed that the accuracy of this control procedure depends heavily on the skewness of the $X_{j}$'s and the diverging rate of $p$ relative to $n$, since $\Phi(t)$ is only an approximation to the distribution of $T_{j}$.
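For concreteness, the threshold search in (1.1) can be sketched in a few lines (this is our own illustration, not code from the paper; the function name and the toy data are our choices):

```python
import numpy as np
from math import erfc, sqrt, inf

def bh_normal_threshold(T, alpha):
    """Search for the threshold in (1.1): the smallest t at which
    2p{1 - Phi(t)} / #{j : |T_j| >= t} drops to alpha or below."""
    abs_T = np.sort(np.abs(np.asarray(T, dtype=float)))
    p = len(abs_T)
    for t in abs_T:  # the estimate changes only at observed |T_j| values
        tail = 0.5 * erfc(t / sqrt(2.0))  # 1 - Phi(t) for the N(0,1) cdf
        if 2 * p * tail / (abs_T >= t).sum() <= alpha:
            return t
    return inf

# toy data: 950 null t-statistics and 50 strong signals (sizes are arbitrary)
rng = np.random.default_rng(1)
T = np.concatenate([rng.standard_normal(950), rng.normal(6.0, 1.0, 50)])
thr = bh_normal_threshold(T, alpha=0.1)
rejected = np.abs(T) >= thr
```

Scanning the sorted $|T_j|$ values suffices because the estimated FDP in (1.1) changes only at those points.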

Many studies have investigated the performance of large-scale $t$-tests. Efron (2004) observed that the choice of null distribution substantially affects simultaneous inference procedures in a microarray analysis. Delaigle et al. (2011) conducted a careful study of moderate and large deviations of the $t$ statistic, which is indispensable to understanding its robustness and drawbacks for analyzing high-dimensional data. Under a condition of non-sparse signals, Cao and Kosorok (2011) proved the robustness of Student's $t$ test statistics and the $N(0,1)$ calibration in the control of FDR. Liu and Shao (2014) gave a systematic analysis of the asymptotic conditions under which the large-scale $t$-testing procedure is able to control the FDR.

The bootstrap is known as a useful way to improve the accuracy of the null distribution approximation and has been demonstrated to be particularly effective for highly multiple testing problems; see Delaigle et al. (2011) and the references therein. In general, the bootstrap is capable of correcting for skewness and therefore leads to second-order accuracy. Accordingly, a faster increasing rate of $p$ can be allowed (Liu and Shao, 2014) and better FDR control achieved when the data populations are skewed. However, multiple testing problems with tens of thousands or even millions of hypotheses are now commonplace, and practitioners may be reluctant to use a bootstrap method in such situations; a rapid testing procedure is therefore highly desirable.

In this paper, we propose a new data-driven selection procedure controlling the FDR. The method entails constructing $p$ new test statistics with a marginal symmetry property, using the empirical distribution of the negative statistics to approximate that of the positive ones, and searching for the threshold with a formula similar to (1.1). The proposed procedure is computationally efficient since it only uses a one-time split of the data and the product of two $t$ statistics obtained from the two splits. We study theoretical properties of the proposed procedure. We show that (a) the proposed method achieves exact FDR control even in finite-sample settings when the $X_{j}$'s are symmetric and independent; and (b) the proposed method achieves asymptotic FDR control under mild conditions when the populations are asymmetric and dependent. We further propose a simple refinement of the proposed procedure and study its asymptotic properties; the theory implies that the refined procedure is as accurate as the bootstrap method to the second order in certain situations. We also investigate an extension of the proposed procedure to the two-sample mean problem. Simulation comparisons imply that the proposed method has better FDR control than existing methods, maintains reasonably good power, and achieves a significant reduction in computing time and storage.

The rest of this paper is organized as follows. In Section 2, we present the new procedure and establish its FDR control property. Some extensions are given in Section 3. Numerical studies are conducted in Section 4. Section 5 concludes the paper, and theoretical proofs are delineated in the Appendix. Some technical details and additional numerical results are provided in the Supplementary Material.

Notations. $A_{n}\approx B_{n}$ stands for $A_{n}/B_{n}\mathop{\rightarrow}\limits^{p}1$ as $n\to\infty$; "$\gtrsim$" and "$\lesssim$" are defined similarly. We denote by $\mathcal{I}_{0}$ and $\mathcal{I}_{1}$ the true null set and the alternative set, and write $p_{0}=|\mathcal{I}_{0}|$ and $p_{1}=|\mathcal{I}_{1}|$, respectively.

2 A New FDR Control Procedure and its Theoretical Properties

We first propose a new FDR control procedure for the one-sample mean problem, and then establish the theoretical properties of the proposed procedure.

2.1 A new FDR control procedure

Without loss of generality, assume that the sample size is an even integer $n_{t}=2n$. We randomly split the data into two disjoint groups $\mathcal{D}_{1}$ and $\mathcal{D}_{2}$ of equal size $n$. The $t$ test statistics for the $j$th variable on $\mathcal{D}_{1}$ and $\mathcal{D}_{2}$ are denoted by $T_{1j}$ and $T_{2j}$, respectively. Define

W_{j}=T_{1j}T_{2j}.

Clearly, $W_{j}$ is likely to be large for most of the signals regardless of the sign of $\mu_{j}$, $j\in\mathcal{I}_{1}$, and small for most of the null variables. Observing that $W_{j}$ is, at least asymptotically, symmetric with mean zero for $j\in\mathcal{I}_{0}$ due to the central limit theorem and the independence between $\mathcal{D}_{1}$ and $\mathcal{D}_{2}$, we can choose a threshold $L>0$ by setting

L=\inf\left\{t>0:\frac{\#\{j:W_{j}\leq-t\}}{\#\{j:W_{j}\geq t\}\vee 1}\leq\alpha\right\}, (2.1)

and reject $\mathbb{H}_{0j}$ if $W_{j}\geq L$, where $\alpha$ is the target FDR level. If the set in (2.1) is empty, we simply set $L=+\infty$. The fraction in (2.1) is an estimate of the false discovery proportion (FDP) since the set $\{j:W_{j}\leq-t,j\in\mathcal{I}_{1}\}$ is often very small (if the signal is not too weak) and thus $\#\{j:W_{j}\leq-t\}$ is a good approximation to $\#\{j:W_{j}\leq-t,j\in\mathcal{I}_{0}\}$.
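The construction above can be sketched as follows (a minimal illustration of (2.1) on simulated Gaussian data; the function name, seed handling, and toy sizes are our own choices, not the authors' code):

```python
import numpy as np

def ress(X, alpha=0.1, seed=0):
    """Basic RESS: split the n_t x p data matrix into two halves, form
    W_j = T_1j * T_2j, and find the threshold L of (2.1) by scanning |W_j|."""
    rng = np.random.default_rng(seed)
    nt, p = X.shape
    idx = rng.permutation(nt)
    D1, D2 = X[idx[: nt // 2]], X[idx[nt // 2 :]]

    def t_stat(D):
        return np.sqrt(D.shape[0]) * D.mean(axis=0) / D.std(axis=0, ddof=1)

    W = t_stat(D1) * t_stat(D2)
    L = np.inf  # empty set in (2.1) -> L = +infinity
    for t in np.sort(np.abs(W)):  # candidate thresholds: observed |W_j|
        if t > 0 and (W <= -t).sum() / max((W >= t).sum(), 1) <= alpha:
            L = t
            break
    return W, L, np.flatnonzero(W >= L)

# toy run: p = 1000 variables, n_t = 100, signals in the first 50 columns
rng = np.random.default_rng(7)
X = rng.standard_normal((100, 1000))
X[:, :50] += 1.0
W, L, selected = ress(X, alpha=0.1)
```

Note that no quantity of the underlying populations enters the threshold search: the negative $W_j$'s themselves serve as the control group.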

Figure 1: (a) Scatter plot of the $W_{j}$'s, with red triangles and black dots denoting true signals and nulls, respectively; (b) the corresponding estimate of the FDP curve (against $t$) along with the true FDP. In this case, $p=1000$, $n_{t}=100$ and $|\mathcal{I}_{1}|=50$. We consider a multivariate chi-squared distribution with two degrees of freedom and an autoregressive correlation structure ($0.5^{|i-j|}$). The signal-to-noise ratio is set to 2.

As described above, we construct the test statistic $W_{j}$ with a marginal symmetry property by using sample splitting. Thus we refer to this procedure as Reflection via Sample Splitting (RESS). The RESS procedure is data-dependent and does not involve any unknown quantity related to the populations; this is an important and desirable feature of the RESS. Figure 1 gives a visual representation of the RESS procedure. Specifically, Figure 1(a) depicts the scatter plot of the $W_{j}$'s with red triangles denoting true signals. Observe that the true signals lie primarily above the horizontal axis, indicating $W_{j}>0$, while the null $W_{j}$'s (black dots) are roughly symmetrically distributed around it. Figure 1(b) depicts the corresponding estimate of the FDP (i.e., the fraction in (2.1)) along with the true FDP over $t$. The approximation in this case is very good, as only three true alternatives (i.e., three red triangles) lie below the horizontal line in Figure 1(a).

The knockoff framework was introduced by Barber and Candès (2015) in the high-dimensional linear regression setting. The knockoff selection procedure operates by constructing "knockoff copies" of each of the $p$ covariates (features) using certain knowledge of the covariates or responses; the copies are then used as a control group to ensure that the model selection algorithm does not choose too many irrelevant covariates. The signs of the test statistics in the knockoff need to satisfy (at least roughly) joint exchangeability so that the knockoff can yield accurate FDR control in finite-sample settings; refer to Candes et al. (2018) for more discussion. The proposed threshold $L$ in (2.1) shares a similar spirit with the knockoff, but the two are distinguished in that the RESS procedure does not require any prior information about the distribution of $X=(X_{1},\ldots,X_{p})^{\top}$. This is especially important since it is difficult to estimate the distribution of $X$ when $p$ is very large. We employ the sample-splitting strategy to achieve a marginal symmetry property, and it turns out that the FDR can be controlled reasonably well due to the marginal symmetry of the $W_{j}$'s. Theoretical findings on FDR control under certain dependence structures, such as positive regression dependence on a subset or weak dependence at a marginal level (Benjamini and Yekutieli, 2001; Storey et al., 2004), shed light on the validity of the RESS procedure. A detailed analysis will be given in Section 2.2.

Figure 2: Power comparison of the tests based on $W_{j}$ (solid line), $T_{j}$ (dashed line) and $T^{2}_{1j}$ (dash-dotted line) when $n_{t}=50$ or $100$. The type I error is fixed at 0.05, and the error distributions considered include $N(0,1)$, the Student's $t$ distribution with five degrees of freedom $t(5)$, and the exponential distribution.

At first glance, the test statistic $W_{j}$ may seem to incur much information loss due to the sample splitting. In fact, benefiting from the joint use of two independent $t$ statistics, the relative power loss of $W_{j}$ with respect to $T_{j}$ is quite mild. By the inequality $\Pr^{2}(T_{1j}>\sqrt{t})+\Pr^{2}(T_{1j}<-\sqrt{t})\leq\Pr(T_{1j}T_{2j}>t)\leq 2\Pr(T_{1j}>\sqrt{t})+2\Pr(T_{1j}<-\sqrt{t})$, the power ratio of the tests based on $W_{j}$ and $T_{j}$ can be easily bounded as

\frac{\left\{1-\Phi(z_{\alpha/4}-\sqrt{n}\mu_{j}/\sigma_{j})\right\}^{2}+\left\{1-\Phi(z_{\alpha/4}+\sqrt{n}\mu_{j}/\sigma_{j})\right\}^{2}}{2-\Phi(z_{\alpha/2}-\sqrt{2n}\mu_{j}/\sigma_{j})-\Phi(z_{\alpha/2}+\sqrt{2n}\mu_{j}/\sigma_{j})}\lesssim\frac{\Pr_{\mathbb{H}_{1j}}(W_{j}>W_{\alpha})}{\Pr_{\mathbb{H}_{1j}}(|T_{j}|>z_{\alpha/2})}\lesssim\frac{2\left\{2-\Phi(z_{\sqrt{\alpha/2}}-\sqrt{n}\mu_{j}/\sigma_{j})-\Phi(z_{\sqrt{\alpha/2}}+\sqrt{n}\mu_{j}/\sigma_{j})\right\}}{2-\Phi(z_{\alpha/2}-\sqrt{2n}\mu_{j}/\sigma_{j})-\Phi(z_{\alpha/2}+\sqrt{2n}\mu_{j}/\sigma_{j})},

where $\sigma_{j}$ is the standard deviation of $X_{j}$, and $W_{\alpha}$ and $z_{\alpha}$ are the upper $\alpha$ quantiles of the distributions of $W_{j}$ and $N(0,1)$, respectively. Further, when $\sqrt{n}\mu_{j}\to\infty$, both test statistics $W_{j}$ and $T_{j}$ have asymptotic power 1. For better understanding, the power curves (size-corrected) of the two tests are presented in Figure 2 for some commonly used settings. We can see that although $W_{j}$ is always inferior to $T_{j}$, as expected, the disadvantage is not very pronounced and tends to diminish as $n$ increases. This power sacrifice of the RESS in turn brings much better error rate control, as we shall show in the next subsection. On the other hand, compared with the test statistic $T_{1j}^{2}$ based only on group $\mathcal{D}_{1}$, the proposed test statistic $W_{j}=T_{1j}T_{2j}$ has smaller variance because $\mathrm{Var}(W_{j})\approx 1<2\approx\mathrm{Var}(T^{2}_{1j})$. Since the null distribution of $W_{j}$ is symmetric, the upper quantiles of its distribution are much smaller than those of $T^{2}_{1j}$. As a result, $W_{j}$ is more powerful than $T_{1j}^{2}$.
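The variance comparison is easy to check numerically. The following Monte Carlo snippet is our own illustration (not from the paper), with $n$ and the number of replications chosen arbitrarily:

```python
import numpy as np

# Monte Carlo check of the variance claim: under the null, W = T1 * T2 has
# variance close to 1, while T1^2 has variance close to 2 (illustration only).
rng = np.random.default_rng(0)
n, reps = 50, 100_000  # arbitrary choices for the illustration

def t_stat(D):
    return np.sqrt(D.shape[0]) * D.mean(axis=0) / D.std(axis=0, ddof=1)

T1 = t_stat(rng.standard_normal((n, reps)))
T2 = t_stat(rng.standard_normal((n, reps)))
W = T1 * T2
print(W.mean(), W.var(), (T1 ** 2).var())  # roughly 0, 1.1 and 2.3 at n = 50
```

The small excesses over 1 and 2 reflect the $t(n-1)$ tails at finite $n$; both variances approach their normal-limit values as $n$ grows.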

Remark 1  It is noteworthy that the joint use of $W_{j}$ and the threshold $L$ distinguishes our RESS procedure from the methods of Wasserman and Roeder (2009) and Meinshausen et al. (2009), which used a sample-splitting scheme to conduct variable selection with error rate control. They used the first split to reduce the number of variables to a manageable size and then applied FDR control methods to the remaining variables using the data from the second split; normal calibration is usually required to obtain $p$-values. In contrast, the RESS procedure uses the data from both splits to compute the statistics and employs the empirical distribution in place of asymptotic distributions.

2.2 Theoretical results

We first investigate the FDR control of the proposed RESS procedure when $X_{1},\ldots,X_{p}$ are independent of each other, and then extend the results to the dependent case under stronger conditions. A simple yet effective refined procedure with better accuracy in FDR control will be developed after the convergence rate of the FDP of the RESS is investigated.

2.2.1 Independent case

A preliminary result of this paper is that the proposed procedure controls a quantity nearly equal to the FDR when the populations are symmetric.

Proposition 1

Assume $X_{1},\ldots,X_{p}$ are symmetrically distributed and independent of each other. For any $\alpha\in(0,1)$ and $n_{t}\geq 4$, the RESS method satisfies

\mathbb{E}\left[\frac{\#\{j:W_{j}\geq L,j\in\mathcal{I}_{0}\}}{\#\{j:W_{j}\geq L\}+\alpha^{-1}}\right]\leq\alpha.

The quantity bounded by this proposition is very close to the FDR in settings where $\alpha^{-1}$ is dominated by $\#\{j:W_{j}\geq L\}$. Following Barber and Candès (2015), if exact FDR control is preferred, we may adjust the threshold to

L_{+}=\inf\left\{t>0:\frac{1+\#\{j:W_{j}\leq-t\}}{\#\{j:W_{j}\geq t\}\vee 1}\leq\alpha\right\},

with which we can show that

\mbox{FDR}_{W}(L_{+})=\mathbb{E}\left[\frac{\#\{j:W_{j}\geq L_{+},j\in\mathcal{I}_{0}\}}{\#\{j:W_{j}\geq L_{+}\}}\right]\leq\alpha.
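In code, $L_{+}$ differs from $L$ only by the added 1 in the numerator of the FDP estimate; a small sketch (the helper name is ours):

```python
import numpy as np

def ress_plus_threshold(W, alpha):
    """Knockoff+-style threshold L_+: the extra '+1' in the numerator of
    the estimated FDP yields exact (non-asymptotic) FDR control."""
    for t in np.sort(np.abs(W[W != 0])):  # candidate thresholds
        if (1 + (W <= -t).sum()) / max((W >= t).sum(), 1) <= alpha:
            return t
    return np.inf
```

The "+1" means that at level $\alpha$ no rejection is possible unless at least $1/\alpha$ statistics exceed the candidate threshold, which is the price of exact control.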

In what follows, since we mainly focus on asymptotic FDR control, the results with $L$ and $L_{+}$ are essentially the same. From the proof of this proposition, we can see that the inequality is due to the fact that

\#\{j:W_{j}\leq-t,j\in\mathcal{I}_{0}\}\leq\#\{j:W_{j}\leq-t\},

which is usually tight because most strong signals yield a positive $W_{j}$, or at least not too small a value of $W_{j}$. This implies that the FDR of the RESS is very likely to be fairly close to the nominal level unless a large proportion of the $\mu_{j}$'s for $j\in\mathcal{I}_{1}$ are very weak.

Proposition 1 is a direct corollary of the following result, in which the $X_{j}$'s are allowed to be asymmetric.

Proposition 2

Assume $X_{1},\ldots,X_{p}$ are independent of each other and $n_{t}\geq 4$. For any $\alpha\in(0,1)$, the RESS method satisfies

\mathbb{E}\left[\frac{\#\{j:W_{j}\geq L,j\in\mathcal{I}_{0}\}}{\#\{j:W_{j}\geq L\}+\alpha^{-1}}\right]\leq\min_{\epsilon\geq 0}\left\{\alpha(1+4\epsilon)+\Pr\left(\max_{j\in\mathcal{I}_{0}}\Delta_{j}>\epsilon\right)\right\},

where $\Delta_{j}=\left|\Pr(W_{j}>0\mid|W_{j}|)-1/2\right|$.

We can interpret $\Delta_{j}$ as measuring the extent to which symmetry is violated for a specific variable $j$. This result concurs with our intuition that controlling the $\Delta_{j}$'s is sufficient to ensure FDR control for the RESS method. In the most ideal case, where $X_{1},\ldots,X_{p}$ are symmetrically distributed, $\Delta_{j}=0$ for all $j\in\mathcal{I}_{0}$, and we automatically obtain the FDR-control result in Proposition 1 by taking $\epsilon=0$. Under asymmetric scenarios, the $\Delta_{j}$'s can still be expected to be small due to the convergence of $T_{1j}$ and $T_{2j}$ to normality if $n$ is not too small. In the next theorems, we will show that under mild conditions $\max_{j\in\mathcal{I}_{0}}\Delta_{j}\to 0$ in probability, yielding a meaningful result on FDR control in more realistic settings. The proof of this proposition follows similarly to Theorem 2 in Barber et al. (2019), which shows that the model-X knockoff (Candes et al., 2018) selection procedure incurs an inflation of the FDR proportional to the errors in estimating the distribution of each feature conditional on the remaining features.

For our asymptotic analysis, we need the following assumptions. Throughout this paper, we assume $p_{1}\leq\gamma p$ for some $\gamma<1$, which includes the sparse setting $p_{1}=o(p)$.

Assumption 1

(Moments) (i) For some constant $K_{1}>0$, $\max_{1\leq j\leq p}\mathbb{E}(X_{j}-\mu_{j})^{4}/\sigma_{j}^{4}<K_{1}$; (ii) for some constants $C>0$ and $K_{2}>0$, $\max_{1\leq j\leq p}\mathbb{E}[\exp\{C(X_{j}-\mu_{j})^{2}/\sigma_{j}^{2}\}]<K_{2}$.

Assumption 2

(Signals) $\beta_{p}\equiv{\rm Card}\left\{j:|\mu_{j}|/\sigma_{j}\geq 3\sqrt{\log p/n}\right\}\to\infty$.

Remark 2  The moment condition in Assumption 1-(i) is required for a large-deviation result for the Student's $t$ statistics on which our proof heavily hinges. Assumption 1-(ii), which requires exponentially light tails and implies that all moments of $X$ are finite, is stronger than Assumption 1-(i); it is needed only when we use the RESS method with correction (see Section 2.2.2). In fact, for familywise error control with bootstrap calibration, similar conditions are imposed to achieve better accuracy (Fan et al., 2007). Assumption 2 implies that $p_{1}\to\infty$. If the number of true alternatives is fixed as $p\to\infty$, Liu and Shao (2014) have shown that even with the true $p$-values, the BH method is unable to control the FDP with high probability; we use this condition to rule out such cases. $\Box$

Theorem 1

Assume $X_{1},\ldots,X_{p}$ are independent of each other. Suppose Assumptions 1-(i) and 2 and $p=o\{\exp(n^{1/3})\}$ hold. For any $\alpha\in(0,1)$, the FDP of the RESS method satisfies

{\rm FDP}_{W}(L)\equiv\frac{\#\{j:W_{j}\geq L,j\in\mathcal{I}_{0}\}}{\#\{j:W_{j}\geq L\}\vee 1}\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{3}\max_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}+\beta_{p}^{-1/2}\right\}, (2.2)

and $\mathop{\lim\sup}_{(n,p)\to\infty}{\rm FDR}\leq\alpha$, where $\kappa_{j}=\mathbb{E}\{(X_{j}-\mu_{j})^{3}\}/\sigma_{j}^{3}$ and $\xi_{n,p}=\max(\log p,\log n)$.

The proof of this theorem relies on a large-deviation result for $t$-statistics (Delaigle et al., 2011), but our new statistic $W_{j}$ means that the technical details of our theory are not straightforward and cannot be obtained from existing works. Under a finite fourth moment condition, Theorem 1 reveals that the FDR of our RESS method can be controlled if $\mathcal{I}_{1}$ satisfies the technical condition on the alternative set, Assumption 2. Roughly speaking, the theorem ensures that $\max_{j\in\mathcal{I}_{0}}\Delta_{j}$ in Proposition 2, which can be bounded by $\max_{j\in\mathcal{I}_{0}}\sup_{0\leq t\leq 2\log p}|f_{j}(t)/f_{j}(-t)-1|$, is small, where $f_{j}(\cdot)$ denotes the probability density function of $W_{j}$. Note that the inequality in (2.2) is mainly due to

\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq L\right)}{\sum_{j}\mathbb{I}\left(W_{j}\leq-L\right)}\approx\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-L\right)}{\sum_{j}\mathbb{I}\left(W_{j}\leq-L\right)}\leq 1.

Hence, the FDR control is often quite tight because

\sum_{j\in\mathcal{I}_{1}}\mathbb{I}\left(W_{j}\leq-L\right)\Big/\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-L\right)\mathop{\rightarrow}\limits^{p}0

in many situations, such as when $(p_{1}-|\mathcal{M}|)/p_{0}\to 0$, where $\mathcal{M}=\left\{i:|\mu_{i}|/\sigma_{i}\geq\sqrt{(2+c)\log p/n}\right\}$ for any small $c>0$. A proof is given in the Supplementary Material.

It is interesting to further unpack the convergence rate given in this theorem. Liu and Shao (2014) have shown that the convergence rate of the bootstrap calibration is

{\rm FDP}_{\rm Bootstrap}\lesssim\alpha+O_{p}\left\{\sqrt{\log p/n}+n^{-1}(\log p)^{2}\right\}.

Comparison with (2.2) indicates that our "raw" RESS method is inferior to the bootstrap calibration, especially when $p$ is very large (so that the term of order $n^{-1}$ has a non-negligible effect). Fortunately, the term $n^{-1}(\log p)^{3}$ can be eliminated by a simple correction, as discussed below.

2.2.2 A refined procedure

By examining the FDP of the proposed RESS method more carefully, we can show that for any $0\leq t\leq 2\log p$,

\widehat{\rm FDP}_{W}(t)\equiv\frac{\#\{j:W_{j}\leq-t\}}{\#\{j:W_{j}\geq t\}\vee 1}\approx{\rm FDP}_{W}(t)\left(1-\frac{2t^{3}\bar{\kappa}}{9n}\right)+O_{p}\left(\sqrt{\xi_{n,p}/n}+\beta_{p}^{-1/2}\right), (2.3)

where $\bar{\kappa}=p_{0}^{-1}\sum_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}$. That is, we are able to express the term of order $n^{-1}t^{3}$ in a more accurate way, which benefits from using the empirical distribution $p^{-1}\sum_{j}\mathbb{I}(W_{j}\leq-t)$ to approximate $p_{0}^{-1}\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq t)$ and "surprisingly" eliminates the terms of order $n^{-1/2}(\log p)^{3/2}$. Clearly, $\widehat{\rm FDP}_{W}(t)$ is an underestimate of the true FDP and in turn yields an inflation of the FDR.

This motivates us to consider the test statistic $\tilde{W}_{j}=n\bar{X}_{1j}\bar{X}_{2j}/s_{1j}^{2}=T_{1j}\tilde{T}_{2j}$, where $\tilde{T}_{2j}=\sqrt{n}\bar{X}_{2j}/s_{1j}$. As shown in the Appendix, the FDP of using $\tilde{W}_{j}$ satisfies

\widehat{\rm FDP}_{\tilde{W}}(t)\approx{\rm FDP}_{\tilde{W}}(t)\left(1+\frac{5t^{3}\bar{\kappa}}{18n}\right)+O_{p}\left(\sqrt{\xi_{n,p}/n}+\beta_{p}^{-1/2}\right). (2.4)

The difference in the asymptotic expansions of $\widehat{\rm FDP}_{W}(t)$ and $\widehat{\rm FDP}_{\tilde{W}}(t)$ is due to the different large-deviation probabilities of $W_{j}$ and $\tilde{W}_{j}$. This difference immediately suggests a "refined" threshold,

L_{\rm refined}=\inf\left\{t>0:\frac{\#\{j:W_{j}\leq-t\}}{\#\{j:W_{j}\geq t\}\vee 1}\left\{1-\frac{4}{9}\theta(t)\right\}\leq\alpha\right\}, (2.5)

where

\theta(t)=\frac{\left(\#\{j:W_{j}\leq-t\}-\#\{j:W_{j}\geq t\}\right)-\left(\#\{j:\tilde{W}_{j}\leq-t\}-\#\{j:\tilde{W}_{j}\geq t\}\right)}{\#\{j:W_{j}\leq-t\}\vee 1}.

We show that $\theta(t)\approx-\frac{t^{3}\bar{\kappa}}{2n}$ under certain conditions, and consequently using $L_{\rm refined}$ is capable of eliminating the effect of the term $\frac{2t^{3}\bar{\kappa}}{9n}$ in (2.3).

The next theorem demonstrates that the refined procedure has a better convergence rate in certain circumstances. Basically, we restrict our attention to the sparse case, that is, $p_{1}/p_{0}\to 0$, such as $p_{1}=p^{\eta}$ for $0<\eta<1$. This is because the term of order $t^{3}/n$ only matters when $t$ is large. From the proof of Theorem 1 we see that $L\lesssim 2\log\{p/(\beta_{p}\alpha)\}$. In other words, only if $p_{1}$ or $\beta_{p}$ is small is the tail approximation of $\widehat{\rm FDP}_{W}(t)$ to ${\rm FDP}_{W}(t)$ important.

Theorem 2

Assume $X_{1},\ldots,X_{p}$ are independent of each other. Suppose Assumptions 1-(ii) and 2, $p=o\{\exp(n^{1/3})\}$, $p_{1}=p^{\eta}$ and $(p_{1}/\beta_{p}-1)\log^{2}(p/\beta_{p})=O(1)$ hold. For any $\alpha\in(0,1)$, the FDP of the refined RESS method satisfies ${\rm FDP}_{\rm refined}\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{2}+\beta_{p}^{-1/2}\right\}$.

In this theorem, the condition $(p_{1}/\beta_{p}-1)\log^{2}(p/\beta_{p})=O(1)$ implies that the number of signals we can identify dominates the number of very weak signals. The RESS method with the refined threshold $L_{\rm refined}$ has the same convergence rate as the bootstrap calibration. Simultaneous testing of many hypotheses allows us to construct a "data-driven" correction for skewness without resampling; thus, in some sense, large-scale $t$-tests in ultrahigh dimensions may be viewed not as a "curse" but as a "blessing" for our problem.

We summarize the refined RESS procedure as follows.

Reflection via Sample Splitting Procedure (RESS)

  • Step 1: Randomly split the data into two parts of equal size. Compute $\bar{X}_{kj}$ and $s_{kj}^{2}$ for $k=1,2$ and $j=1,\ldots,p$;

  • Step 2: Obtain $W_{j}$ and $\tilde{W}_{j}$ for $j=1,\ldots,p$. Compute $\theta(t)$ for $t\in\{|W_{j}|\}_{j=1}^{p}$;

  • Step 3: Find the threshold $L_{\rm refined}$ by (2.5) and reject $\mathbb{H}_{0j}$ if $W_{j}\geq L_{\rm refined}$.

The total computational complexity is of order $O(np+p\log p)$, and the procedure can be easily implemented even without a high-level programming language. The R and Excel codes are available upon request.
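For illustration, Steps 1-3 can be sketched as follows (our own minimal implementation, not the authors' R/Excel code; candidate thresholds are taken over the observed $|W_j|$ values as in Step 2):

```python
import numpy as np

def ress_refined(X, alpha=0.1, seed=0):
    """Refined RESS, Steps 1-3: sample splitting, W_j and W_j-tilde,
    then the skewness-corrected threshold search of (2.5)."""
    rng = np.random.default_rng(seed)
    nt, p = X.shape
    idx = rng.permutation(nt)
    D1, D2 = X[idx[: nt // 2]], X[idx[nt // 2 :]]   # Step 1
    n = nt // 2
    m1, s1 = D1.mean(axis=0), D1.std(axis=0, ddof=1)
    m2, s2 = D2.mean(axis=0), D2.std(axis=0, ddof=1)
    W = (np.sqrt(n) * m1 / s1) * (np.sqrt(n) * m2 / s2)  # W_j = T_1j T_2j
    Wt = n * m1 * m2 / s1 ** 2                           # W_j-tilde
    L = np.inf
    for t in np.sort(np.abs(W[W != 0])):                 # Step 2 grid {|W_j|}
        theta = (((W <= -t).sum() - (W >= t).sum())
                 - ((Wt <= -t).sum() - (Wt >= t).sum())) / max((W <= -t).sum(), 1)
        fdp_hat = (W <= -t).sum() / max((W >= t).sum(), 1) * (1 - 4 * theta / 9)
        if fdp_hat <= alpha:                             # Step 3: (2.5)
            L = t
            break
    return np.flatnonzero(W >= L)

# toy run: 40 signals among p = 800 variables, n_t = 100
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 800))
X[:, :40] += 1.0
selected = ress_refined(X, alpha=0.1)
```

With symmetric data $\theta(t)$ hovers near zero and the procedure behaves like the basic RESS; the correction only bites for skewed populations.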

We want to make some remarks on the use of $\tilde{W}_{j}$. As can be seen from (2.4), $\widehat{\mathrm{FDP}}_{\tilde{W}}(t)$ is an overestimate of the true FDP and therefore yields a slightly more conservative procedure. In practice, if computation is the major concern, using the RESS procedure with $\tilde{W}_{j}$ could be a safe choice.

2.2.3 Dependent case

We now establish theoretical properties for the case of dependent $X$. The first result is a direct extension of Proposition 2.

Proposition 3

Assume that $n_{t}\geq 4$. For any $\alpha\in(0,1)$, the RESS method satisfies

\mbox{FDR}\leq\min_{\epsilon\geq 0}\left\{\alpha(1+4\epsilon)+\Pr\left(\max_{j\in\mathcal{I}_{0}}\Delta_{j}^{\prime}>\epsilon\right)\right\},

where $\Delta_{j}^{\prime}=\left|\Pr(W_{j}>0\mid|W_{j}|,{\bf W}_{-j})-1/2\right|$ and ${\bf W}_{-j}=(W_{1},\ldots,W_{p})^{\top}\setminus W_{j}$.

Again, $\Delta_{j}^{\prime}$ quantifies the effect on the FDR of both the asymmetry of $X_{j}$ and the dependence between $W_{j}$ and ${\bf W}_{-j}$.

To achieve asymptotic FDR control, the following condition on the dependence structure is imposed.

Assumption 3

(Correlation) For each $X_{j}$, assume that the number of variables $X_{k}$ that are dependent on $X_{j}$ is no more than $r_{p}=o(\beta_{p})$.

This assumption implies that $X_{j}$ is independent of the other $p-r_{p}$ variables. It is certainly not the weakest possible condition, but is adopted here to simplify the proof.

Let $\zeta_{p}=(r_{p}/\beta_{p})^{1/2}$. The following theorem parallels Theorems 1-2.

Theorem 3

Suppose Assumptions 2 and 3 and $p=o\{\exp(n^{1/3})\}$ hold.

  • (i)

    If Assumption 1-(i) holds, then for any $\alpha\in(0,1)$, the FDP of the RESS method satisfies

    {\rm FDP}\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{3}\max_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}+\zeta_{p}\right\},

    and $\mathop{\lim\sup}_{n\to\infty}{\rm FDR}\leq\alpha$.

  • (ii)

    If Assumption 1-(ii), $p_{1}=o(p)$ and $(p_{1}/\beta_{p}-1)\log^{2}(p/\beta_{p})=O(1)$ hold, then for any $\alpha\in(0,1)$, the FDP of the refined RESS method satisfies ${\rm FDP}_{\rm refined}\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{2}+\zeta_{p}\right\}$.

This theorem implies that the RESS method remains asymptotically valid under weak dependence. Comparing Theorem 3 with Theorems 1-2, the main difference lies in the convergence rates $\beta_{p}^{-1/2}$ and $\zeta_{p}$; the latter is asymptotically larger. This is to be expected because the approximation of the empirical distribution to the population one has a slower rate for dependent summands. When $X_{j}$ is independent of all the other variables $X_{k}$, or is dependent on only a finite number of them, $r_{p}=O(1)$; in this situation, Theorem 3 reduces to Theorems 1-2.

Remark 3  In our discussion, we restrict $p=o\{\exp(n^{1/3})\}$, which facilitates our technical derivation. In Liu and Shao (2014), a faster rate of $p$ is allowed, $p=o\{\exp(n^{1/2})\}$. However, we also note that the bootstrap method proposed by Liu and Shao (2014) jointly uses the empirical distribution of $T_{j}^{*}$, $j=1,\ldots,p$, where $T_{j}^{*}$ is the bootstrap statistic for the $j$th variable. This implies that its computational complexity is of order $O(p^{2}B+pnB)$ and it requires $O(pB)$ storage, where $B$ is the number of bootstrap replications. For the commonly used bootstrap, say approximating the distribution of $T_{j}$ individually by resampling (Fan et al., 2007), $p=o\{\exp(n^{1/2})\}$ is achieved if the number of bootstrap replications is of order $p^{2}$. Though our theoretical results only allow $p=o\{\exp(n^{1/3})\}$, we conjecture that similar results also hold when $p=o\{\exp(n^{1/2})\}$ if more stringent conditions were imposed. Encouragingly, our extensive simulation results show that the refined procedure works at least as well as the bootstrap method in terms of FDR control, even when $p$ is super-large. $\Box$

3 Extensions

In this section, we discuss two generalizations of our RESS procedure.

3.1 One-sided alternatives

In certain situations, we are interested in one-sided alternatives, say, without loss of generality, μj<0\mu_{j}<0 for all j1j\in\mathcal{I}_{1}. We may modify the RESS by ruling out the cells with T1j>0T_{1j}>0 and T2j>0T_{2j}>0 from the set {j:WjL}\{j:W_{j}\geq L\} to improve the power. To be more specific, the threshold in (2.1) is modified to

L~=inf{t>0:#{j:Wjt}#{j:Wjt,T1j>0,T2j>0}#{j:Wjt,T1j<0,T2j<0}1α},\tilde{L}=\inf\left\{t>0:\frac{\#\{j:W_{j}\leq-t\}-\#\{j:W_{j}\geq t,T_{1j}>0,T_{2j}>0\}}{\#\{j:W_{j}\geq t,T_{1j}<0,T_{2j}<0\}\vee 1}\leq\alpha\right\},

and we reject 0j:μj0\mathbb{H}_{0j}:\mu_{j}\geq 0 when WjL~,T1j<0,T2j<0W_{j}\geq\tilde{L},T_{1j}<0,T_{2j}<0. We have the following result.
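For concreteness, the modified threshold L~\tilde{L} can be computed by scanning the order statistics of |Wj||W_{j}|; below is a minimal sketch in Python (the function name `ress_threshold_one_sided` and the simulated data are ours, used only for illustration):

```python
import numpy as np

def ress_threshold_one_sided(W, T1, T2, alpha):
    """Scan t over the order statistics of |W_j| and return the smallest t with
    [#{W_j <= -t} - #{W_j >= t, T1_j > 0, T2_j > 0}]
      / max(#{W_j >= t, T1_j < 0, T2_j < 0}, 1) <= alpha."""
    W, T1, T2 = (np.asarray(a, dtype=float) for a in (W, T1, T2))
    for t in np.sort(np.abs(W[W != 0])):
        num = np.sum(W <= -t) - np.sum((W >= t) & (T1 > 0) & (T2 > 0))
        den = max(np.sum((W >= t) & (T1 < 0) & (T2 < 0)), 1)
        if num / den <= alpha:
            return t
    return np.inf  # no feasible threshold: reject nothing

# toy example: 100 one-sided signals (mu_j < 0) among 2000 hypotheses
rng = np.random.default_rng(0)
T1, T2 = rng.normal(size=2000), rng.normal(size=2000)
T1[:100] -= 5.0
T2[:100] -= 5.0
W = T1 * T2
L_tilde = ress_threshold_one_sided(W, T1, T2, alpha=0.2)
rejected = (W >= L_tilde) & (T1 < 0) & (T2 < 0)
```

The scan needs only the counts at the order statistics of |Wj||W_{j}|, since the ratio changes only at those values.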

Corollary 1

Consider the one-sided hypotheses that μj<0\mu_{j}<0 for all j1j\in\mathcal{I}_{1}. If the conditions in Theorem 3-(i) hold, then for any α(0,1)\alpha\in(0,1), the FDP of the RESS method with the threshold L~\tilde{L} satisfies

FDP(L~)α+Op{ξn,p/n+n1(logp)3maxj0κj2+ζp},\displaystyle{\rm FDP}(\tilde{L})\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{3}\max_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}+\zeta_{p}\right\},

and limsupnFDR(L~)α\mathop{\lim\sup}_{n\to\infty}{\rm FDR}(\tilde{L})\leq\alpha.

By using the results in Delaigle et al. (2011), the convergence rate of the normal calibration satisfies

FDPΦ\displaystyle{\rm FDP}_{\Phi} α+Op{n1/2(logp)3/2maxj0κj2}.\displaystyle\lesssim\alpha+O_{p}\left\{n^{-1/2}(\log p)^{3/2}\max_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}\right\}.

Comparing this with Corollary 1, we see that the RESS strategy removes the first-order skewness term of the standard normal approximation. This important property is due to the fact that WjW_{j} is more symmetric than the tt statistic; see the proof of Theorem 1 for details. The refined RESS procedure, which also enjoys second-order accuracy, can be defined similarly to LrefinedL_{\rm refined}, but we do not discuss it in detail.

3.2 Two-sample problem

We next extend the RESS procedure to the two-sample problem. Assume there is another random sample 𝒢=(Zi1,,Zip)i=12m\mathcal{G}=(Z_{i1},\cdots,Z_{ip})_{i=1}^{2m} drawn from (Z1,,Zp)(Z_{1},\cdots,Z_{p}). The population mean vectors of (X1,,Xp)(X_{1},\cdots,X_{p}) and (Z1,,Zp)(Z_{1},\cdots,Z_{p}) are (μ1X,,μpX)(\mu^{X}_{1},\cdots,\mu^{X}_{p}) and (μ1Z,,μpZ)(\mu^{Z}_{1},\cdots,\mu^{Z}_{p}), respectively. We aim to carry out pp two-sample tests, that is, 0j:μjX=μjZ\mathbb{H}_{0j}:\mu^{X}_{j}=\mu^{Z}_{j} versus 1j:μjXμjZ\mathbb{H}_{1j}:\mu^{X}_{j}\neq\mu^{Z}_{j}, for j=1,,pj=1,\ldots,p. The RESS procedure can be readily generalized to this two-sample problem as follows.

First, similar to the sample splitting into 𝒟1{\mathcal{D}}_{1} and 𝒟2{\mathcal{D}}_{2}, the data 𝒢\mathcal{G} are split randomly into two disjoint groups 𝒢1\mathcal{G}_{1} and 𝒢2\mathcal{G}_{2} of equal size mm. Based on 𝒟k{\mathcal{D}}_{k} and 𝒢k\mathcal{G}_{k}, the two-sample tt-test statistic TkjT_{kj} is defined as follows:

Tkj=X¯kjZ¯kjsXkj2/n+sZkj2/mfor j=1,,pand k=1,2.T_{kj}=\frac{\bar{X}_{kj}-\bar{Z}_{kj}}{\sqrt{s^{2}_{X_{kj}}/n+s^{2}_{Z_{kj}}/m}}~{}~{}\text{for }j=1,\ldots,p~{}~{}{\text{and }}k=1,2.

Here X¯kj\bar{X}_{kj} and Z¯kj\bar{Z}_{kj} are the sample means of XjX_{j} and ZjZ_{j}, while sXkj2s^{2}_{X_{kj}} and sZkj2s^{2}_{Z_{kj}} are the corresponding sample variances. Finally, define Wj=T1jT2jW_{j}=T_{1j}T_{2j}. The threshold LL can then be selected as in (2.1), and 0j\mathbb{H}_{0j} is rejected when WjLW_{j}\geq L.
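A minimal sketch of this two-sample extension (the function names are ours; the threshold scan mirrors the definition in (2.1)):

```python
import numpy as np

def two_sample_W(D1, D2, G1, G2):
    """W_j = T_1j * T_2j, where T_kj is the two-sample t statistic built
    from the k-th halves D_k (n x p) and G_k (m x p)."""
    def tstat(X, Z):
        n, m = X.shape[0], Z.shape[0]
        return (X.mean(axis=0) - Z.mean(axis=0)) / np.sqrt(
            X.var(axis=0, ddof=1) / n + Z.var(axis=0, ddof=1) / m)
    return tstat(D1, G1) * tstat(D2, G2)

def ress_threshold(W, alpha):
    """L = inf{t > 0 : #{W_j <= -t} / max(#{W_j >= t}, 1) <= alpha}."""
    for t in np.sort(np.abs(W[W != 0])):
        if np.sum(W <= -t) / max(np.sum(W >= t), 1) <= alpha:
            return t
    return np.inf

# toy example: 25 differentially expressed features among p = 500
rng = np.random.default_rng(1)
n, m, p = 50, 60, 500
X = rng.normal(size=(2 * n, p))
Z = rng.normal(size=(2 * m, p))
X[:, :25] += 1.0                      # mu_X differs from mu_Z on 25 features
W = two_sample_W(X[:n], X[n:], Z[:m], Z[m:])
L = ress_threshold(W, alpha=0.2)
rejected = W >= L
```

The splitting here is deterministic (first half versus second half) purely for brevity; in practice the rows should be split at random, as described above.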

To establish the FDR control result, Assumptions 2 and 3 are modified as follows.

Assumption 4

(Signals) For a large CC, βpCard{j:|μjXμjZ|Clogp/n}.\beta_{p}\equiv{\rm Card}\left\{j:|\mu^{X}_{j}-\mu^{Z}_{j}|\geq C\sqrt{\log p/n}\right\}\to\infty.

Assumption 5

(Correlation) For each XjX_{j}, assume that the number of variables XkX_{k} that are dependent on XjX_{j} is no more than rp=o(βp)r_{p}=o(\beta_{p}). The same assumption is imposed on (Z1,,Zp)(Z_{1},\cdots,Z_{p}).

By Theorem 2.4 in Chang et al. (2016) and the proofs for Theorem 3, we have the following result.

Corollary 2

Suppose that Assumption 1-(i), Assumptions 4 and 5, and p=o(exp(n1/3))p=o(\exp(n^{1/3})) hold. Then for any α(0,1)\alpha\in(0,1), the FDP of the RESS method for the two-sample problem satisfies

FDP(L)α+Op{ξn,p/n+n1(logp)3+ζp},\displaystyle{\rm FDP}(L)\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{3}+\zeta_{p}\right\},

and limsupnFDRα\mathop{\lim\sup}_{n\to\infty}{\rm FDR}\leq\alpha.

4 Numerical results

4.1 Simulation comparison

We evaluate the performance of our proposed RESS procedure on simulated data sets and compare its FDR and true positive rate (TPR) with those of existing techniques. All the results are obtained from 200 simulation replications.

4.1.1 Model and benchmarks

We set the model as Xij=μj+εij,1int,1jpX_{ij}=\mu_{j}+\varepsilon_{ij},~{}~{}1\leq i\leq n_{t},~{}~{}1\leq j\leq p, where the alternative signals are μj=δjlogp/nt\mu_{j}=\delta_{j}\sqrt{\log p/n_{t}} with δjUnif(1.5,1)\delta_{j}\sim\text{Unif}(-1.5,-1) or δjUnif(1,1.5)\delta_{j}\sim\text{Unif}(1,1.5) for j1j\in\mathcal{I}_{1}. The random errors are generated by the autoregressive model εij=ρεi,j1+ϵij\varepsilon_{ij}=\rho\varepsilon_{i,j-1}+\epsilon_{ij}, where ρ[0,1)\rho\in[0,1) and the ϵij\epsilon_{ij}’s are i.i.d. from three centered distributions: Student’s tt with five degrees of freedom t(5)t(5), exponential with rate one (exp(1)1\text{exp}(1)-1) and a mixed distribution consisting of ϵijN(0,1)\epsilon_{ij}\sim N(0,1) for 1j[p/3]1\leq j\leq[p/3], ϵijt(5)\epsilon_{ij}\sim t(5) for [p/3]+1j[2p/3][p/3]+1\leq j\leq[2p/3] and ϵijexp(1)1\epsilon_{ij}\sim\text{exp}(1)-1 for [2p/3]+1jp[2p/3]+1\leq j\leq p. When ρ=0\rho=0, the random errors are independent. We take the number of alternatives to be p1=[ϑp]p_{1}=[\vartheta p] with ϑ[0.01,0.15]\vartheta\in[0.01,0.15].
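For reproducibility, the data-generating mechanism of this subsection can be sketched as follows, shown here for the centered exp(1) error case (the function name `gen_data` is ours):

```python
import numpy as np

def gen_data(nt, p, vartheta, rho, rng):
    """X_ij = mu_j + eps_ij with AR(1) errors eps_ij = rho * eps_{i,j-1} + e_ij,
    where e_ij are i.i.d. centered exp(1) and the first p1 = [vartheta * p]
    coordinates carry signals mu_j = delta_j * sqrt(log p / nt)."""
    p1 = int(vartheta * p)
    sign = rng.choice([-1.0, 1.0], size=p1)      # Unif(-1.5,-1) or Unif(1,1.5)
    mu = np.zeros(p)
    mu[:p1] = sign * rng.uniform(1.0, 1.5, size=p1) * np.sqrt(np.log(p) / nt)
    e = rng.exponential(1.0, size=(nt, p)) - 1.0  # centered exp(1) innovations
    eps = np.empty((nt, p))
    eps[:, 0] = e[:, 0]
    for j in range(1, p):                         # AR(1) across coordinates
        eps[:, j] = rho * eps[:, j - 1] + e[:, j]
    return mu + eps, mu

rng = np.random.default_rng(2)
X, mu = gen_data(nt=100, p=1000, vartheta=0.05, rho=0.5, rng=rng)
```

The t(5)t(5) and mixed error cases are obtained by swapping the innovation distribution in the obvious way.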

The following three benchmarks are considered for comparison. The first is the BH procedure with pp-values obtained from the standard normal approximation (termed BH for simplicity). The other two are bootstrap-based approaches. Let {𝑿k1,𝑿knt}\{{\boldsymbol{X}}^{*}_{k1},\ldots{\boldsymbol{X}}^{*}_{kn_{t}}\}, 1kB1\leq k\leq B denote bootstrap resamples drawn by sampling randomly with replacement, and let TkjT^{*}_{kj} be the Student’s tt test statistics constructed from {X1kjX¯j,,XntkjX¯j}\{X^{*}_{1kj}-\bar{X}_{j},\ldots,X^{*}_{n_{t}kj}-\bar{X}_{j}\}. One bootstrap method estimates the pp-values from the bootstrap distribution of each variable individually, pj,BI=1/Bk=1B𝕀{|Tkj||Tj|}p_{j,BI}=1/B\sum_{k=1}^{B}\mathbb{I}\{|T^{*}_{kj}|\geq|T_{j}|\} (Fan et al., 2007), and the other estimates the pp-values from the bootstrap distributions of all pp variables pooled together, pj,BA=1/Bpk=1Bi=1p𝕀{|Tki||Tj|}p_{j,BA}=1/{Bp}\sum_{k=1}^{B}\sum_{i=1}^{p}\mathbb{I}\{|T^{*}_{ki}|\geq|T_{j}|\} (Liu and Shao, 2014). We refer to these as “I-bootstrap” and “A-bootstrap”, respectively. The A-bootstrap jointly estimates the distribution of the TjT_{j}’s and thus can be expected to perform better. However, the computational complexity of the I-bootstrap is much lower; it is of order O(npB)O(npB), while that of the A-bootstrap grows quadratically in pp. We take the number of bootstrap samples as B=200B=200 in this section.
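The two bootstrap calibrations can be sketched as follows, under our reading of the cited procedures (the function name `bootstrap_pvalues` is ours; resampling rows of the centered data jointly is an implementation choice):

```python
import numpy as np

def bootstrap_pvalues(X, B, rng):
    """I-bootstrap and A-bootstrap p-values for one-sample t statistics.
    X is the nt x p data matrix; rows of the centered data are resampled
    jointly, with replacement."""
    nt, p = X.shape
    def tstats(Y):
        return Y.mean(axis=0) / (Y.std(axis=0, ddof=1) / np.sqrt(nt))
    T = tstats(X)
    Xc = X - X.mean(axis=0)                  # center at the sample means
    Tb = np.empty((B, p))
    for k in range(B):
        idx = rng.integers(0, nt, size=nt)   # bootstrap row indices
        Tb[k] = tstats(Xc[idx])
    absT, absTb = np.abs(T), np.abs(Tb)
    p_I = (absTb >= absT).mean(axis=0)                 # per-variable null
    p_A = (absTb.reshape(-1, 1) >= absT).mean(axis=0)  # pooled null
    return p_I, p_A

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 100))
X[:, 0] += 2.0                               # one strong signal variable
p_I, p_A = bootstrap_pvalues(X, B=100, rng=rng)
```

The pooled comparison for `p_A` materializes a (Bp)×p(Bp)\times p array, which is exactly the quadratic-in-pp cost of the A-bootstrap discussed above.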

4.1.2 Results

We compare the performance of our proposed RESS method (termed as “RESS0\text{RESS}_{0}”) and the refined RESS method (“RESS”) in a range of settings, with the BH procedure, I-bootstrap and A-bootstrap, and examine the effects of skewness, signal magnitude and correlation between variables.

Table 1: Comparison results of FDR, TPR and the average computing time under p=5000,ρ=0p=5000,\rho=0 and ϑ=0.05\vartheta=0.05. Values in parentheses are standard deviations.
Errors nt=50n_{t}=50 nt=100n_{t}=100
Method FDR(%) TPR(%) Time(s) FDR(%) TPR(%) Time(s)
RESS 17.8(4.6)17.8_{\tiny{(4.6)}} 51.6(5.5)51.6_{\tiny{(5.5)}} 1.75 18.5(5.0)18.5_{\tiny{(5.0)}} 51.6(5.8)51.6_{\tiny{(5.8)}} 1.84
RESS0\text{RESS}_{0} 19.7(4.4)19.7_{\tiny{(4.4)}} 53.8(5.4)53.8_{\tiny{(5.4)}} 1.66 19.9(4.9)19.9_{\tiny{(4.9)}} 53.1(5.6)53.1_{\tiny{(5.6)}} 1.74
t(5)t(5) BH 16.6(3.9)16.6_{\tiny{(3.9)}} 52.1(4.4)52.1_{\tiny{(4.4)}} 1.60 17.7(3.2)17.7_{\tiny{(3.2)}} 52.9(4.5)52.9_{\tiny{(4.5)}} 1.60
A-bootstrap 12.8(3.4)12.8_{\tiny{(3.4)}} 47.6(4.8)47.6_{\tiny{(4.8)}} 404 16.3(3.1)16.3_{\tiny{(3.1)}} 51.4(4.5)51.4_{\tiny{(4.5)}} 400
I-bootstrap 22.3(2.6)22.3_{\tiny{(2.6)}} 53.2(3.3)53.2_{\tiny{(3.3)}} 314 23.6(3.0)23.6_{\tiny{(3.0)}} 54.9(3.3)54.9_{\tiny{(3.3)}} 311
RESS 19.4(5.0)19.4_{\tiny{(5.0)}} 63.1(10.1)63.1_{\tiny{(10.1)}} 1.76 20.8(4.9)20.8_{\tiny{(4.9)}} 75.2(5.8)75.2_{\tiny{(5.8)}} 1.85
RESS0\text{RESS}_{0} 32.2(3.8)32.2_{\tiny{(3.8)}} 81.5(3.6)81.5_{\tiny{(3.6)}} 1.66 27.3(3.9)27.3_{\tiny{(3.9)}} 81.4(2.8)81.4_{\tiny{(2.8)}} 1.76
exp(1) BH 37.2(2.7)37.2_{\tiny{(2.7)}} 85.5(2.9)85.5_{\tiny{(2.9)}} 1.70 30.2(2.7)30.2_{\tiny{(2.7)}} 84.4(2.5)84.4_{\tiny{(2.5)}} 1.63
A-bootstrap 18.8(3.5)18.8_{\tiny{(3.5)}} 62.8(5.1)62.8_{\tiny{(5.1)}} 424 19.7(2.8)19.7_{\tiny{(2.8)}} 76.0(3.4)76.0_{\tiny{(3.4)}} 403
I-bootstrap 32.2(3.5)32.2_{\tiny{(3.5)}} 70.2(4.6)70.2_{\tiny{(4.6)}} 332 28.2(2.6)28.2_{\tiny{(2.6)}} 77.9(2.6)77.9_{\tiny{(2.6)}} 317
RESS 20.7(3.9)20.7_{\tiny{(3.9)}} 67.7(5.3)67.7_{\tiny{(5.3)}} 1.85 20.6(4.0)20.6_{\tiny{(4.0)}} 70.0(4.4)70.0_{\tiny{(4.4)}} 2.17
RESS0\text{RESS}_{0} 25.1(3.9)25.1_{\tiny{(3.9)}} 72.2(4.0)72.2_{\tiny{(4.0)}} 1.75 23.1(3.7)23.1_{\tiny{(3.7)}} 72.1(3.8)72.1_{\tiny{(3.8)}} 2.06
Mixed BH 26.2(2.9)26.2_{\tiny{(2.9)}} 74.0(3.1)74.0_{\tiny{(3.1)}} 1.65 23.7(3.1)23.7_{\tiny{(3.1)}} 74.3(3.2)74.3_{\tiny{(3.2)}} 1.64
A-bootstrap 17.5(3.0)17.5_{\tiny{(3.0)}} 65.1(4.4)65.1_{\tiny{(4.4)}} 405 19.0(2.9)19.0_{\tiny{(2.9)}} 70.2(3.5)70.2_{\tiny{(3.5)}} 409
I-bootstrap 25.7(4.4)25.7_{\tiny{(4.4)}} 67.3(5.6)67.3_{\tiny{(5.6)}} 314 26.9(4.1)26.9_{\tiny{(4.1)}} 72.8(4.6)72.8_{\tiny{(4.6)}} 319

Firstly, we set p=5000p=5000 and ϑ=0.05\vartheta=0.05. The full sample size ntn_{t} is taken to be 5050 or 100100, and the target FDR level α\alpha is set to 0.20.2. Table 1 displays the estimated FDR, TPR and average computation time of each method under three different error settings with ρ=0\rho=0. For the symmetric distribution t(5)t(5), the RESS0\text{RESS}_{0} delivers quite accurate control, but that is not the case for the skewed errors. It performs better than the BH in terms of FDR control, but has a slight disadvantage relative to the I-bootstrap. This is consistent with our theoretical analysis in Section 2.2.1. Indeed, by Theorem 1 and Theorem 3-(i), when pp is very large, the skewness still has a non-negligible effect on the FDR control of the RESS0\text{RESS}_{0} method. In contrast, we observe that the FDR levels of RESS are close to the nominal level under all the scenarios; it is clearly more effective than the I-bootstrap, RESS0\text{RESS}_{0} and BH, and the difference is quite remarkable in some cases. This is not surprising, as RESS is a data-driven method whose second-order accuracy is justified in Section 2.2.2. The power of RESS is also quite high compared to the other methods, revealing that its detection ability is not much compromised by data splitting.

Figure 3: Relative times and FDR curves against the number of tests pp when errors are distributed from t(5)\text{t}(5) under nt=100n_{t}=100, ρ=0\rho=0 and ϑ=0.05\vartheta=0.05. The grey dashed lines indicate the target FDR level.

The I-bootstrap slightly improves the accuracy of FDR control over the normal calibration under the exp(1)\text{exp}(1) and mixed cases, but its FDRs are still considerably larger than the nominal level. Note that the I-bootstrap calibration may need an extremely large number of replications BB, of order p2p^{2}, to achieve FDR control (Liu and Shao, 2014), and therefore does not perform well with the commonly used B=200B=200. The A-bootstrap method offers performance comparable to our RESS method, though it tends to be a little conservative under the t(5)t(5) cases. Also, the A-bootstrap generally has smaller variation than RESS due to the use of the bootstrap to estimate an overall empirical null distribution. Certainly, this gain comes partly from its computation-intensive nature; in most cases, it requires more than 200 times the computational time of RESS. In fact, as mentioned earlier, the computational complexities of the A-bootstrap and RESS are of order p2B+pntBp^{2}B+pn_{t}B and plogp+pntp\log p+pn_{t}, respectively, and hence their relative computational time can increase quickly as pp increases. Figure 3 depicts the time of the A-bootstrap relative to RESS and the FDR values under the t(5)t(5) cases.

Figure 4: Box-plots of FDP and true positive proportion (TPP) when errors are distributed from exp(1)\exp(1) under nt=100n_{t}=100, p=5000p=5000, and ρ=0\rho=0. The red dashed lines indicate the target FDR level.
Figure 5: FDR curves against the values of λ\lambda when errors are distributed from the Gamma distribution Γ(λ/2,2)\Gamma(\lambda/2,2) under nt=100n_{t}=100, p=5000p=5000, ρ=0\rho=0 and ϑ=0.05\vartheta=0.05. The red dashed lines indicate the target FDR level.

Next, we examine the effects of the skewness and the proportion of alternatives. As we have shown that the refined RESS usually performs better than the RESS0{\rm RESS}_{0}, in what follows we focus only on the refined RESS. With respect to skewness, we evaluate the performance of the various methods by varying the shape parameter λ\lambda of the Gamma distribution Γ(λ/2,2)\Gamma(\lambda/2,2). Figure 5 depicts the estimated FDR curves against the values of λ\lambda when p=5000p=5000, nt=100n_{t}=100 and ϑ=0.05\vartheta=0.05 for α=0.1\alpha=0.1 and α=0.2\alpha=0.2. It can be seen that the refined RESS successfully controls the FDR within an acceptable range of the target level regardless of the magnitude of the skewness. Figure 4 shows the box-plots of FDP and powers under ϑ=0.03,0.05,0.1\vartheta=0.03,0.05,0.1 and 0.150.15 when the errors are generated from exp(1)\text{exp}(1). The refined RESS has stable FDP close to the nominal level α=0.2\alpha=0.2, even when ϑ\vartheta is small. Similar results with other error distributions and signal magnitudes can be found in the Supplementary Material.

Figure 6: FDR curves against the correlation when errors are distributed from the exp(1)\text{exp}(1) and mixed distributions under nt=100n_{t}=100, p=5000p=5000 and ϑ=0.05\vartheta=0.05. The red dashed lines indicate the target FDR level.

Finally, we investigate the effect of the correlation level ρ\rho. Figure 6 shows the FDR curves of the four methods against the values of ρ\rho when the errors are generated from the exp(1)\text{exp}(1) and mixed distributions. Again, it can be seen that the refined RESS and A-bootstrap achieve reasonably good FDR control in most situations, even when ρ\rho is as large as 0.8. This concurs with our asymptotic justification that the refined RESS method remains effective provided that X{X} satisfies a certain weak dependence structure.

4.2 A real-data example

We next illustrate the proposed refined RESS procedure with an empirical analysis of the acute lymphoblastic leukemia (ALL) data, which consist of 12,256 gene probe sets for 128 adult patients enrolled in the Italian GIMEMA multi-center clinical trial 0496. It is known that malignant cells in B-lineage ALL often have genetic abnormalities, which have a significant impact on the clinical course of the disease. Specifically, the molecular heterogeneity of B-lineage ALL is well established as BCR/ABL, ALL1/AF4, E2A/PBX1 and NEG, and the gene expression profiles of the groups BCR/ABL and NEG are more similar to each other than to the others. In our analysis, we consider a sub-dataset of 79 B-lineage units, 37 with the BCR/ABL mutation and 42 NEG, and use the traditional two-sample tt-test to examine which probe sets are differentially expressed. The dataset was previously studied by Chiaretti et al. (2005) and Bourgon et al. (2010) and is available at http://www.bioconductor.org.

Figure 7: Density histogram of the skewness of the p=12,256p=12,256 genes for BCR/ABL and NEG, respectively.
Figure 8: Scatter plot of the WjW_{j} for our raw RESS, with the red triangles and blue dashed line denoting the selected differentially expressed probe sets and the threshold under α=0.05\alpha=0.05.

Here, we apply the extension of our proposed refined RESS to the two-sample case and compare it with the BH procedure, A-bootstrap and I-bootstrap over a wide range of significance levels. Both bootstrap methods use B=200B=200 resamples. Table 2 summarizes the number of probe sets differentially expressed between BCR/ABL and NEG. The BH procedure with the normal calibration tends to reject surprisingly more genes than RESS and A-bootstrap at various significance levels. In fact, the normal approximation appears to be violated for many of the genes, as some skewness values deviate substantially from zero in Figure 7. As noted earlier, the I-bootstrap needs an extremely large number of replications, of order p2p^{2}, to improve its accuracy and thus leads to a large number of rejections with B=200B=200. Figure 8 presents the scatter plot of RESS’s statistics WjW_{j}. We can observe that all selected probe sets (red triangles) have large values of WjW_{j}, while the unselected ones (black dots) are roughly symmetric about the horizontal axis. With the data-driven threshold, the number of differentially expressed probe sets based on our RESS appears more reasonable. The results of the A-bootstrap coincide with those of RESS, as expected.

Table 2: Numbers of differentially expressed probe sets under various significance levels.
α\alpha RESS BH I-bootstrap A-bootstrap
0.05 157 163 326 141
0.1 221 238 326 222
0.15 280 334 476 310
0.2 333 414 476 397

5 Concluding Remarks

In this paper, we have proposed a multiple testing procedure, RESS, that controls the FDR in the large-scale tt-test setting and offers high power to discover true signals. We give theoretical results showing that the proposed method maintains FDR control under mild conditions. Empirically, the refined RESS demonstrates excellent FDR control and reasonable power in comparison with other methods such as the bootstrap and normal calibrations. The ideas of the RESS procedure may be extended to controlling other error rates, such as the per-family error rate.

Appendix: Proofs

Appendix A: Lemmas

Before we present the proofs of the theorems, we first state several lemmas whose proofs can be found in the Supplementary Material. A few well-known theorems to be repeatedly used are also presented in the Supplementary Material. The first lemma characterizes the closeness between Pr(Wjt)\Pr(W_{j}\geq t) and Pr(Wjt){\Pr(W_{j}\leq-t)}, which plays an important role in the proof. For simplicity, we suppress the dependence of TkjT_{kj} on jj which should not cause any confusion. Let Φ~(x)=1Φ(x)\tilde{\Phi}(x)=1-\Phi(x), G(t)=p01j0Pr(Wjt)G(t)=p_{0}^{-1}\sum_{j\in\mathcal{I}_{0}}\Pr(W_{j}\geq t), G(t)=p01j0Pr(Wjt)G_{-}(t)=p_{0}^{-1}\sum_{j\in\mathcal{I}_{0}}\Pr(W_{j}\leq-t) and G1(y)=inf{t0:G(t)y}G^{-1}(y)=\inf\{t\geq 0:G(t)\leq y\} for 0y10\leq y\leq 1.

Lemma A.1

Suppose Assumption 1-(i) holds. For any 0tClogp0\leq t\leq C\log p with C>0C>0,

Pr(T1T2t)Pr(T1T2t)1=O(antn1/2)+2t3κ29n,\frac{\Pr(T_{1}T_{2}\geq t)}{\Pr(T_{1}T_{2}\leq-t)}-1=O(a_{nt}n^{-1/2})+\frac{2t^{3}\kappa^{2}}{9n},

where ant=(2t+logn)1/2a_{nt}=({2t}+{\log n})^{1/2}.

The second lemma characterizes the closeness between Pr(W~jt)\Pr(\tilde{W}_{j}\geq t) and Pr(W~jt){\Pr(\tilde{W}_{j}\leq-t)}.

Lemma A.2

Suppose Assumption 1-(ii) holds. For sufficiently large tt satisfying tlogpt\lesssim\log p,

Pr(W~t)Pr(W~t)1=O(antn1/2+t1/2)5t3κ218n.\frac{\Pr(\tilde{W}\geq t)}{\Pr(\tilde{W}\leq-t)}-1=O(a_{nt}n^{-1/2}+t^{-1/2})-\frac{5t^{3}\kappa^{2}}{18n}.

The next lemma establishes the uniform convergence of {p0G(t)}1j0𝕀(Wjt)1\{{p_{0}G(t)}\}^{-1}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq t)}-1.

Lemma A.3

Suppose Assumptions 1-(i) and 3 hold. Then, for any bpb_{p}\to\infty and bp=o(p)b_{p}=o(p),

sup0tG1(αbp/p)|j0𝕀(Wjt)p0G(t)1|=Op(ζp),\displaystyle\sup_{0\leq t\leq G^{-1}(\alpha b_{p}/p)}\left|\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq t)}{p_{0}G(t)}-1\right|=O_{p}(\zeta_{p}), (A.1)
sup0tG1(αbp/p)|j0𝕀(Wjt)p0G(t)1|=Op(ζp).\displaystyle\sup_{0\leq t\leq G_{-}^{-1}(\alpha b_{p}/p)}\left|\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}{p_{0}G_{-}(t)}-1\right|=O_{p}(\zeta_{p}). (A.2)

The last one is the counterpart of Lemma A.3 for W~j\tilde{W}_{j}.

Lemma A.4

Suppose Assumptions 1-(ii) and 3 hold. Then, for any bpb_{p}\to\infty and bp=o(p)b_{p}=o(p), (A.1) and (A.2) hold if we replace Wj{W}_{j} and G(t)G(t) with W~j\tilde{W}_{j} and G~(t)\tilde{G}(t), where G~(t)=p01j0Pr(W~jt)\tilde{G}(t)=p_{0}^{-1}\sum_{j\in\mathcal{I}_{0}}\Pr(\tilde{W}_{j}\geq t).

Appendix B: Proof of Theorems

In fact, we show that under a weaker condition, i.e., Assumption 3, the results in Theorems 1-2 hold similarly, and accordingly Theorem 3 is also proved.

Proof of Theorem 1. By definition, our test is equivalent to rejecting 0j\mathbb{H}_{0j} if WjLW_{j}\geq{L}, where

L=inf{t0:j=1p𝕀(Wjt)αmax(j=1p𝕀(Wjt),1)}.{L}=\inf\left\{t\geq 0:\sum_{j=1}^{p}\mathbb{I}(W_{j}\leq-t)\leq\alpha\max\left(\sum_{j=1}^{p}\mathbb{I}(W_{j}\geq t),1\right)\right\}.

Let 𝒜\mathcal{A} be a subset of {1,2,,p}\{1,2,\ldots,p\} satisfying 𝒜{j:|μj|/σj3logp/n}\mathcal{A\subset}\left\{j:|\mu_{j}|/\sigma_{j}\geq 3\sqrt{\log p/n}\right\} and bp|𝒜|=min(nξn,p,βp)b_{p}\equiv|\mathcal{A}|=\min(\sqrt{n\xi_{n,p}},\beta_{p}). By Assumption 1-(i) and Markov’s inequality, for any ϵ>0\epsilon>0, Pr(maxj𝒜|skj2/σj21|ϵ)=O(ξn,p/n),k=1,2\Pr(\max_{j\in\mathcal{A}}|s_{kj}^{2}/\sigma^{2}_{j}-1|\geq\epsilon)=O(\sqrt{\xi_{n,p}/n}),k=1,2. By Assumption 2 and Lemma S.1, there exists some c>2c>2 such that

Pr(j=1p𝕀(Wjclogp)bp/2)\displaystyle\Pr\left(\sum_{j=1}^{p}\mathbb{I}(W_{j}\geq c\log p)\geq b_{p}/2\right)
Pr(j𝒜{𝕀(T1jT2jclogp)Pr(T1jT2jclogp)}bp/2j𝒜Pr(T1jT2jclogp))\displaystyle\geq\Pr\left(\sum_{j\in\mathcal{A}}\left\{\mathbb{I}(T_{1j}T_{2j}\geq c\log p)-\Pr(T_{1j}T_{2j}\geq c\log p)\right\}\geq b_{p}/2-\sum_{j\in\mathcal{A}}\Pr(T_{1j}T_{2j}\geq c\log p)\right)
1j𝒜Pr(T1jT2jclogp){j𝒜Pr(T1jT2jclogp)bp/2}2\displaystyle\geq 1-\frac{\sum_{j\in\mathcal{A}}\Pr(T_{1j}T_{2j}\geq c\log p)}{\left\{\sum_{j\in\mathcal{A}}\Pr(T_{1j}T_{2j}\geq c\log p)-b_{p}/2\right\}^{2}}
1cbp1{1+o(1)}=1O(ξn,p/n+bp1),\displaystyle\geq 1-cb_{p}^{-1}\left\{1+o(1)\right\}=1-O(\sqrt{\xi_{n,p}/n}+b_{p}^{-1}), (A.3)

where we use the fact that Prj𝒜(T1j>clogp)Prj0(T1jclogp)(1p1){1+o(1)}\Pr_{j\in\mathcal{A}}(T_{1j}>\sqrt{c\log p})\geq\Pr_{j\in\mathcal{I}_{0}}(T_{1j}\geq-\sqrt{c\log p})\geq(1-p^{-1})\left\{1+o(1)\right\}.

Define ηp=2log{p/(bpα)}\eta_{p}=\sqrt{2\log\{p/(b_{p}\alpha)\}}. We note that G(t)αbp/pG(t)\geq\alpha b_{p}/p implies that tηp2t\leq\eta_{p}^{2} when pp is large. This is because

G(ηp2)2Pr(|T1|>ηp)22/πexp(ηp2/2)/ηpαbp/pG(t),\displaystyle G(\eta^{2}_{p})\leq 2\Pr(|T_{1}|>{\eta_{p}})\leq 2\sqrt{2/\pi}\exp(-\eta^{2}_{p}/2)/\eta_{p}\leq\alpha b_{p}/p\leq G(t),

if pp is sufficiently large. This, together with Lemma A.3 and Eq. (A.3), implies that LηpL\leq\eta_{p} with probability at least 1O(ξn,p/n+bp1)1-O(\sqrt{\xi_{n,p}/n}+b_{p}^{-1}).
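For completeness, the second and third inequalities in the display above follow from the Gaussian tail (Mills ratio) bound together with the choice of ηp\eta_{p}; a short verification, using the asymptotic standard normality of T1T_{1} under the null:

```latex
% Mills ratio: for x > 0,  1 - \Phi(x) \le \phi(x)/x .
2\Pr(|T_{1}|>\eta_{p})
  \le 4\{1-\Phi(\eta_{p})\}\{1+o(1)\}
  \le \frac{4\phi(\eta_{p})}{\eta_{p}}\{1+o(1)\}
  = 2\sqrt{2/\pi}\,\frac{e^{-\eta_{p}^{2}/2}}{\eta_{p}}\{1+o(1)\},
\qquad
e^{-\eta_{p}^{2}/2}
  = \exp\bigl[-\log\{p/(b_{p}\alpha)\}\bigr]
  = \frac{\alpha b_{p}}{p},
```

so the right-hand side is at most αbp/p\alpha b_{p}/p once ηp2(2/π)1/21.60\eta_{p}\geq 2(2/\pi)^{1/2}\approx 1.60, which holds for all sufficiently large pp.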

Therefore, by Lemma A.1, we get

j0𝕀(WjL)j0𝕀(WjL)1=Op(ξn,p/n)+2L3maxj0κj29n{1+O(anL/n+ζp)}.\displaystyle\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq{L})}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-{L})}-1=O_{p}(\sqrt{\xi_{n,p}/n})+\frac{2{L}^{3}\max_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}}{9n}\left\{1+O(a_{n{L}}/\sqrt{n}+\zeta_{p})\right\}. (A.4)

Write

FDP\displaystyle{\rm FDP} =j0𝕀(WjL)1j𝕀(WjL)=j𝕀(WjL)1j𝕀(WjL)×j0𝕀(WjL)j𝕀(WjL)\displaystyle=\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq{L}\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq{L})}=\frac{\sum_{j}\mathbb{I}\left(W_{j}\leq-{L}\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq{L})}\times\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq{L}\right)}{\sum_{j}\mathbb{I}\left(W_{j}\leq-{L}\right)}
α×R(L).\displaystyle\leq\alpha\times R({L}).

Note that R(L)j0𝕀(WjL)/j0𝕀(WjL)R({L})\leq{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq{L}\right)}/{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-{L}\right)}, and thus limsupnFDPα\mathop{\lim\sup}_{n\to\infty}{\rm FDP}\leq\alpha in probability by (A.4). Then, for any ϵ>0\epsilon>0,

FDR(1+ϵ)α+Pr(FDP(1+ϵ)α),\mathrm{FDR}\leq(1+\epsilon)\alpha+\Pr\left(\mathrm{FDP}\geq(1+\epsilon)\alpha\right),

from which the second part of this theorem is proved. \Box

Proof of Theorem 2. We first establish a lower bound for LL so that the condition in Lemma A.2 is valid. Notice that

j𝕀(Wjt)\displaystyle\sum_{j}\mathbb{I}(W_{j}\leq-t) j0𝕀(Wjt)p0G(t),\displaystyle\geq\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)\approx p_{0}G_{-}(t),
αj𝕀(Wjt)\displaystyle\alpha\sum_{j}\mathbb{I}(W_{j}\geq t) α{p1+j0𝕀(Wjt)}α{p1+p0G(t)}.\displaystyle\leq\alpha\left\{p_{1}+\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq t)\right\}\approx\alpha\left\{p_{1}+p_{0}G(t)\right\}.

Hence, we can conclude that LG1(αp1/{p0(1α)})L\gtrsim G^{-1}(\alpha p_{1}/\{p_{0}(1-\alpha)\}).

For log{p0(1α)/(αp1)}tlog{p/(bpα)}\log\{p_{0}(1-\alpha)/(\alpha p_{1})\}\lesssim t\lesssim\log\{p/(b_{p}\alpha)\},

θ(t)=\displaystyle\theta(t)= {j𝕀(Wjt)j𝕀(W~jt)}{j𝕀(Wjt)j𝕀(W~jt)}j𝕀(Wjt)\displaystyle\frac{\left\{\sum_{j}\mathbb{I}(W_{j}\leq-t)-\sum_{j}\mathbb{I}(\tilde{W}_{j}\leq-t)\right\}-\left\{\sum_{j}\mathbb{I}(W_{j}\geq t)-\sum_{j}\mathbb{I}(\tilde{W}_{j}\geq t)\right\}}{\sum_{j}\mathbb{I}(W_{j}\leq-t)}
=\displaystyle= j0𝕀(Wjt)1j𝕀(Wjt)×[j0{𝕀(Wjt)𝕀(Wjt)}j0𝕀(Wjt)+j0{𝕀(W~jt)𝕀(W~jt)}j0𝕀(Wjt)\displaystyle\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}{{1\vee\sum_{j}\mathbb{I}(W_{j}\leq-t)}}\times\Bigg{[}\frac{\sum_{j\in\mathcal{I}_{0}}\left\{\mathbb{I}(W_{j}\leq-t)-\mathbb{I}(W_{j}\geq t)\right\}}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}+\frac{\sum_{j\in\mathcal{I}_{0}}\left\{\mathbb{I}(\tilde{W}_{j}\geq t)-\mathbb{I}(\tilde{W}_{j}\leq-t)\right\}}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}
+j1{𝕀(W~jt)𝕀(Wjt)}j0𝕀(Wjt)+j1{𝕀(Wjt)𝕀(W~jt)}j0𝕀(Wjt)]\displaystyle+\frac{\sum_{j\in\mathcal{I}_{1}}\left\{\mathbb{I}(\tilde{W}_{j}\geq t)-\mathbb{I}(W_{j}\geq t)\right\}}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}+\frac{\sum_{j\in\mathcal{I}_{1}}\left\{\mathbb{I}(W_{j}\leq-t)-\mathbb{I}(\tilde{W}_{j}\leq-t)\right\}}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}\Bigg{]}
\displaystyle\equiv j0𝕀(Wjt)1j𝕀(Wjt)(D1+D2+D3+D4)\displaystyle\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}{{1\vee\sum_{j}\mathbb{I}(W_{j}\leq-t)}}(D_{1}+D_{2}+D_{3}+D_{4})

By Lemmas A.1-A.4, we get

D1\displaystyle D_{1} =Op(ξn,p/n+ζp)2t3κ¯9n,\displaystyle=O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right)-\frac{2t^{3}\bar{\kappa}}{9n},
D2\displaystyle D_{2} =Op(ξn,p/n+ζp+(logp)1)5t3κ¯18n.\displaystyle=O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}+(\log p)^{-1}\right)-\frac{5t^{3}\bar{\kappa}}{18n}.

Next, we deal with D3D_{3}; the term D4D_{4} can be bounded in a similar way. Let 𝒞p\mathcal{C}_{p} denote the event {maxj1|skj2/σj21|<δn,k=1,2}\left\{\max_{j\in\mathcal{I}_{1}}|s_{kj}^{2}/\sigma^{2}_{j}-1|<\delta_{n},k=1,2\right\}, where δn=cn1logp\delta_{n}=c\sqrt{n^{-1}\log p} for some sufficiently large c>0c>0. By Assumption 1-(ii), we have

Pr{maxj1|skj2/σj21|δn}=O(ξn,p/n),\displaystyle\Pr\left\{\max_{j\in\mathcal{I}_{1}}|s_{kj}^{2}/\sigma^{2}_{j}-1|\geq\delta_{n}\right\}=O(\sqrt{\xi_{n,p}/n}),

and thus Pr(𝒞pc)=O(ξn,p/n){\Pr(\mathcal{C}_{p}^{c})}=O(\sqrt{\xi_{n,p}/n}) holds.

Conditional on the event 𝒞p\mathcal{C}_{p}, we have

|D3|\displaystyle|D_{3}| =|j1𝕀(Wjt)j1𝕀(W~jt)|j0𝕀(Wjt)\displaystyle=\frac{\left|\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(W_{j}\geq t)-\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(\tilde{W}_{j}\geq t)\right|}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}
j1|𝕀(Wjt)𝕀(W~jt)|j0𝕀(Wjt)\displaystyle\leq\frac{\sum_{j\in\mathcal{I}_{1}}\left|\mathbb{I}(W_{j}\geq t)-\mathbb{I}(\tilde{W}_{j}\geq t)\right|}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}
=j1𝕀(tWjt(1+3δn))j0𝕀(Wjt)+j1𝕀(t(13δn)Wjt)j0𝕀(Wjt)\displaystyle=\frac{\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(t\leq W_{j}\leq t(1+3\delta_{n}))}{{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}}+\frac{\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(t(1-3\delta_{n})\leq W_{j}\leq t)}{{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}}
D31+D32.\displaystyle\equiv D_{31}+D_{32}.

We only need to deal with D31D_{31}; D32D_{32} follows similarly. By Markov’s inequality, we get

Pr(D31>ϵ)\displaystyle\Pr(D_{31}>\epsilon)\leq p01j1Pr(tWjt(1+3δn))ϵG(t)(1+ζp)\displaystyle p_{0}^{-1}\frac{\sum_{j\in\mathcal{I}_{1}}\Pr(t\leq W_{j}\leq t(1+3\delta_{n}))}{\epsilon G(t)(1+\zeta_{p})}
=\displaystyle= p01j1𝒮pPr(tWjt(1+3δn))+j𝒮pPr(tWjt(1+3δn))ϵG(t)(1+ζp)\displaystyle p_{0}^{-1}\frac{\sum_{j\in\mathcal{I}_{1}\setminus\mathcal{S}_{p}}\Pr(t\leq W_{j}\leq t(1+3\delta_{n}))+\sum_{j\in\mathcal{S}_{p}}\Pr(t\leq W_{j}\leq t(1+3\delta_{n}))}{\epsilon G(t)(1+\zeta_{p})}
\displaystyle\leq p01j1𝒮pPr(tWjt(1+3δn))+Cβpp1ϵG(t)(1+ζp)\displaystyle p_{0}^{-1}\frac{\sum_{j\in\mathcal{I}_{1}\setminus\mathcal{S}_{p}}\Pr(t\leq W_{j}\leq t(1+3\delta_{n}))+C\beta_{p}p^{-1}}{\epsilon G(t)(1+\zeta_{p})}
\displaystyle\leq p01j1𝒮pPr(tWjt(1+3δn))+Cβpp1ϵt1exp(t)(1+ζp),\displaystyle p_{0}^{-1}\frac{\sum_{j\in\mathcal{I}_{1}\setminus\mathcal{S}_{p}}\Pr(t\leq W_{j}\leq t(1+3\delta_{n}))+C\beta_{p}p^{-1}}{\epsilon t^{-1}\exp(-t)(1+\zeta_{p})},
\displaystyle\leq C(p1βp)t2exp(t)δn+βpp1texp(t)p0ϵ(1+ζp)\displaystyle C\frac{(p_{1}-\beta_{p})t^{2}\exp(t)\delta_{n}+\beta_{p}p^{-1}t\exp(t)}{p_{0}\epsilon(1+\zeta_{p})}
\displaystyle\leq C(p1/βp1)log2(p/βp)δnϵ(1+ζp)+O(p1logp)=O(δn),\displaystyle C\frac{(p_{1}/\beta_{p}-1)\log^{2}(p/\beta_{p})\delta_{n}}{\epsilon(1+\zeta_{p})}+O(p^{-1}\log p)=O(\delta_{n}),

where 𝒮p={j:|μj|/σj3logp/n}\mathcal{S}_{p}=\left\{j:|\mu_{j}|/\sigma_{j}\geq 3\sqrt{\log p/n}\right\}, and CC is some positive constant. The first inequality follows from Markov’s inequality and Lemma A.3, the fourth inequality comes from the fact that tlog{p/(βpα)}t\lesssim\log\{p/(\beta_{p}\alpha)\}, and the last inequality uses the condition (p1/βp1)log2(p/βp)=O(1)(p_{1}/\beta_{p}-1)\log^{2}(p/\beta_{p})=O(1).

Finally, collecting all the terms of Dk,k=1,,4D_{k},k=1,\ldots,4, we conclude that for any G1(αp1/{p0(1α)})tlog{p/(βpα)}G^{-1}(\alpha p_{1}/\{p_{0}(1-\alpha)\})\lesssim t\lesssim\log\{p/(\beta_{p}\alpha)\},

θ(t)=j0𝕀(Wjt)1j𝕀(Wjt)[t3κ¯2n+Op(ξn,p/n+ζp)].\theta(t)=\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}{{1\vee\sum_{j}\mathbb{I}(W_{j}\leq-t)}}\left[-\frac{t^{3}\bar{\kappa}}{2n}+O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right)\right].

By using similar arguments to those given in the Supplementary Material, we can show that j0𝕀(Wjt)1j𝕀(Wjt)=1+Op(pη1)\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}{{1\vee\sum_{j}\mathbb{I}(W_{j}\leq-t)}}=1+O_{p}(p^{\eta-1}). Thus, we conclude that θ(t)=t3κ¯2n{1+Op(pη1)}+Op(ξn,p/n+ζp)\theta(t)=-\frac{t^{3}\bar{\kappa}}{2n}\left\{1+O_{p}(p^{\eta-1})\right\}+O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right).

Accordingly, we obtain

\begin{align*}
\mathrm{FDP}_{W}(L)&=\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq L)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq L)}\\
&=\frac{\sum_{j}\mathbb{I}\left(W_{j}\leq-L\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq L)}\times\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq L)}{\sum_{j}\mathbb{I}\left(W_{j}\leq-L\right)}\\
&=\frac{\sum_{j}\mathbb{I}\left(W_{j}\leq-L\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq L)}\times\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq L)}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-L\right)}\times\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-L)}{\sum_{j}\mathbb{I}\left(W_{j}\leq-L\right)}\\
&=\frac{\sum_{j}\mathbb{I}\left(W_{j}\leq-L\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq L)}\left\{1+\frac{2L^{3}\bar{\kappa}}{9n}+O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right)\right\}\left\{1+O_{p}(p^{\eta-1})\right\}\\
&=\frac{\sum_{j}\mathbb{I}\left(W_{j}\leq-L\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq L)}\left\{1-\frac{4}{9}\theta(L)\right\}\left\{1+O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right)\right\}\\
&\leq\alpha+O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right).
\end{align*}

The proof is completed. \Box

References

  • Barber, R. F. and Candès, E. J. (2015), “Controlling the false discovery rate via knockoffs,” The Annals of Statistics, 43, 2055–2085.
  • Barber, R. F., Candès, E. J., and Samworth, R. J. (2019), “Robust inference with knockoffs,” The Annals of Statistics, to appear.
  • Benjamini, Y. and Hochberg, Y. (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 57, 289–300.
  • Benjamini, Y. and Yekutieli, D. (2001), “The control of the false discovery rate in multiple testing under dependency,” The Annals of Statistics, 29, 1165–1188.
  • Bourgon, R., Gentleman, R., and Huber, W. (2010), “Independent filtering increases detection power for high-throughput experiments,” Proceedings of the National Academy of Sciences, 107, 9546–9551.
  • Candes, E., Fan, Y., Janson, L., and Lv, J. (2018), “Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80, 551–577.
  • Cao, H. and Kosorok, M. R. (2011), “Simultaneous critical values for $t$-tests in very high dimensions,” Bernoulli, 17, 347–394.
  • Chang, J., Shao, Q.-M., and Zhou, W.-X. (2016), “Cramér-type moderate deviations for Studentized two-sample $U$-statistics with applications,” The Annals of Statistics, 44, 1931–1956.
  • Chiaretti, S., Li, X., Gentleman, R., Vitale, A., Wang, K. S., Mandelli, F., Foa, R., and Ritz, J. (2005), “Gene expression profiles of B-lineage adult acute lymphocytic leukemia reveal genetic patterns that identify lineage derivation and distinct mechanisms of transformation,” Clinical Cancer Research, 11, 7209–7219.
  • Delaigle, A., Hall, P., and Jin, J. (2011), “Robustness and accuracy of methods for high dimensional data analysis based on Student’s $t$-statistic,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 283–301.
  • Efron, B. (2004), “Large-scale simultaneous hypothesis testing: the choice of a null hypothesis,” Journal of the American Statistical Association, 99, 96–104.
  • Fan, J., Hall, P., and Yao, Q. (2007), “To how many simultaneous hypothesis tests can normal, Student’s $t$ or bootstrap calibration be applied?” Journal of the American Statistical Association, 102, 1282–1288.
  • Liu, W. and Shao, Q.-M. (2014), “Phase transition and regularized bootstrap in large-scale $t$-tests with false discovery rate control,” The Annals of Statistics, 42, 2003–2025.
  • Meinshausen, N., Meier, L., and Bühlmann, P. (2009), “P-values for high-dimensional regression,” Journal of the American Statistical Association, 104, 1671–1681.
  • Petrov, V. V. (2012), Sums of Independent Random Variables, vol. 82, Springer Science & Business Media.
  • Storey, J. D., Taylor, J. E., and Siegmund, D. (2004), “Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 187–205.
  • Wang, Q. (2005), “Limit theorems for self-normalized large deviation,” Electronic Journal of Probability, 10, 1260–1285.
  • Wasserman, L. and Roeder, K. (2009), “High dimensional variable selection,” The Annals of Statistics, 37, 2178–2201.

Supplementary Material for “A New Procedure for Controlling False Discovery Rate in Large-Scale $t$-tests”

This supplementary material contains the proofs of some technical lemmas and corollaries, and additional simulation results.

Additional Lemmas

The first lemma is the large deviation result for the Student’s $t$-statistic $T$; see also Wang (2005).

Lemma S.1

[Delaigle et al. (2011)] Let $B>1$ denote a constant. Then,

\[
\frac{\Pr(T>x)}{1-\Phi(x)}=\exp\left\{-\frac{x^{3}\kappa}{3\sqrt{n}}\right\}\left[1+\theta(n,x)\left\{(1+|x|)n^{-1/2}+(1+|x|)^{4}n^{-1}\right\}\right]
\]

as $n\to\infty$, where the function $\theta$ is bounded in absolute value by a finite, positive constant $C_{1}(B)$ (depending only on $B$), uniformly in all distributions of $X$ for which $\mathbb{E}|X|^{4}\leq B$, $\mathbb{E}(X^{2})=1$ and $\mathbb{E}(X)=0$, and uniformly in $x$ satisfying $0\leq x\leq Bn^{1/4}$.

The second lemma is a standard large deviation result for the mean; see Theorem VIII-4 in Petrov (2012).

Lemma S.2 (Large deviation for the mean)

Suppose that $X_{1},\ldots,X_{n}$ are i.i.d. random variables with mean zero and variance $\sigma^{2}$, satisfying Assumption 1-(ii). Then for any $c>0$, uniformly in $0\leq x\leq cn^{1/6}$,

\[
\frac{\Pr(\sqrt{n}\bar{X}/\sigma>x)}{1-\Phi(x)}=\exp\left\{\frac{x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+o(1)\right\}.
\]
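As a quick sanity check (not part of the original material), the skewness correction in Lemma S.2 can be illustrated by Monte Carlo with centered exponential data, for which $\sigma=1$ and $\kappa=2$. The sample size, cutoff $x$, and replication count below are our own illustrative choices; the correction factor $\exp\{x^{3}\kappa/(6\sqrt{n})\}$ predicts a tail probability well above the normal calibration $1-\Phi(x)$, up to the $1+o(1)$ term.

```python
import math
import random

random.seed(2023)

# Illustrative parameters (our choice): centered exp(1) variables have
# mean 0, sigma = 1, and skewness kappa = E(X^3)/sigma^3 = 2.
n, x, reps = 50, 2.0, 50_000
kappa = 2.0

hits = 0
for _ in range(reps):
    s = sum(random.expovariate(1.0) - 1.0 for _ in range(n))
    if s / math.sqrt(n) > x:          # event sqrt(n) * Xbar / sigma > x
        hits += 1

tail_mc = hits / reps
normal_tail = 0.5 * math.erfc(x / math.sqrt(2.0))          # 1 - Phi(x)
predicted = math.exp(kappa * x ** 3 / (6.0 * math.sqrt(n)))  # Lemma S.2 factor

ratio = tail_mc / normal_tail
# ratio lies well above 1 (the normal calibration) and near `predicted`,
# up to the O(x/sqrt(n)) correction at this moderate n.
print(ratio, predicted)
```

At these settings the empirical ratio sits around 1.3, versus the predicted factor of about 1.46 and the value 1 that a normal calibration would give, which is the phenomenon the skewness correction captures.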

The third lemma is a large deviation result for $T_{j}/(s_{j}/\sigma_{j})$.

Lemma S.3

Suppose Assumption 1-(ii) holds. Then for $x\to\infty$ and $x=o(n^{1/6})$,

\[
\frac{\Pr(T/(s/\sigma)>x)}{1-\Phi(x)}=\exp\left\{-\frac{5x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O\left(x^{-2}+x/\sqrt{n}\right)\right\}.
\]

Proof. Without loss of generality, we assume that $\sigma^{2}=1$. First of all, we deal with $\Pr(\sqrt{n}\bar{X}/m_{2}>x)$, where $m_{2}=n^{-1}\sum_{i=1}^{n}X_{i}^{2}$. Observe that

\[
\Pr(\sqrt{n}\bar{X}/m_{2}>x)=\Pr\left(n^{-1/2}\sum_{i=1}^{n}Y_{i}\geq x\right),
\]

where $Y_{i}=X_{i}-c_{n,x}(X_{i}^{2}-1)$ and $c_{n,x}=n^{-1/2}x$. A simple calculation yields

\begin{align*}
\mathrm{Var}(Y_{i})&=(1-2c_{n,x}\kappa)+O(c^{2}_{n,x}),\\
E(Y_{i}^{3})&=\kappa-3c_{n,x}\mathbb{E}X_{i}^{4}+3c_{n,x}+O(c^{2}_{n,x}).
\end{align*}

By Lemma S.2, we have

\begin{align*}
\frac{\Pr\left(\frac{n^{-1/2}\sum_{i=1}^{n}Y_{i}}{\sqrt{\mathrm{Var}(Y)}}\geq\frac{x}{\sqrt{\mathrm{Var}(Y)}}\right)}{\tilde{\Phi}(x/\sqrt{\mathrm{Var}(Y)})}&=\exp\left\{\frac{x^{3}\kappa_{Y}}{6\sqrt{n}\,\mathrm{Var}^{3/2}(Y)}\right\}\left\{1+O(x/\sqrt{n})\right\}\\
&=\exp\left\{\frac{x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O(x/\sqrt{n})\right\},
\end{align*}

where $\kappa_{Y}=E(Y_{i}^{3})/\mathrm{Var}^{3/2}(Y_{i})$.

By using the fact that for $x\to\infty$ and $x=o(a_{n}^{-1/2})$,

\[
\tilde{\Phi}(x(1+a_{n}))=\tilde{\Phi}(x)\exp(-x^{2}a_{n})\left\{1+O(x^{-2})\right\},
\]

we have

\[
\tilde{\Phi}(x/\sqrt{\mathrm{Var}(Y)})=\tilde{\Phi}(x)\exp\left\{-x^{2}c_{n,x}\kappa\right\}\left\{1+O(x^{-2})\right\}.
\]

Thus,

\begin{align*}
\Pr(\sqrt{n}\bar{X}/m_{2}>x)&=\tilde{\Phi}(x)\exp\left\{-x^{2}c_{n,x}\kappa+\frac{x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O\left(x^{-2}+x/\sqrt{n}\right)\right\}\\
&=\tilde{\Phi}(x)\exp\left\{-\frac{5x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O\left(x^{-2}+x/\sqrt{n}\right)\right\}.
\end{align*}

Finally, we show that $\Pr(T/s>x)$ and $\Pr(\sqrt{n}\bar{X}/m_{2}>x)$ are close enough. Note that

\begin{align*}
\Pr(T/s>x)&=\Pr(\sqrt{n}\bar{X}/(m_{2}-\bar{X}^{2})>x)\\
&=\Pr(x\bar{X}^{2}+\sqrt{n}\bar{X}-m_{2}x>0)\\
&=\Pr\left(\frac{\sqrt{n}\bar{X}}{m_{2}}>\frac{-n+\sqrt{n^{2}+4nx^{2}m_{2}}}{2xm_{2}}\right)+\Pr\left(\frac{\sqrt{n}\bar{X}}{m_{2}}<\frac{-n-\sqrt{n^{2}+4nx^{2}m_{2}}}{2xm_{2}}\right)\\
&=\Pr\left(\frac{\sqrt{n}\bar{X}}{m_{2}}>x\left\{1+O(x^{2}n^{-1})\right\}\right)+o(x/\sqrt{n})\tilde{\Phi}(x)\\
&=\tilde{\Phi}(x\left\{1+O(x^{2}n^{-1})\right\})\exp\left\{-\frac{5x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O\left(x^{-2}+x/\sqrt{n}\right)\right\}\\
&=\tilde{\Phi}(x)\exp\left\{-\frac{5x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O\left(x^{-2}+x/\sqrt{n}\right)\right\}.
\end{align*}

\Box
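The normal-tail expansion $\tilde{\Phi}(x(1+a_{n}))=\tilde{\Phi}(x)\exp(-x^{2}a_{n})\{1+O(x^{-2})\}$ used in the proof above is easy to verify numerically. The values of $x$ and $a$ below are arbitrary illustrative choices satisfying $x=o(a^{-1/2})$; this check is ours, not part of the paper.

```python
import math

def phi_bar(x):
    """Gaussian upper-tail probability 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

x, a = 3.0, 0.01          # illustrative; need x small relative to a^{-1/2}
exact = phi_bar(x * (1.0 + a))
approx = phi_bar(x) * math.exp(-(x ** 2) * a)

# relative error of the expansion; of the order of the O(x^{-2}) factor
rel_err = abs(exact / approx - 1.0)
print(rel_err)
```

The relative error is below one percent here, consistent with the stated $O(x^{-2})$ remainder.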

Proof of Lemmas and Propositions

Proof of Lemma A.1. Recalling $p=\exp\{o(n^{1/3})\}$, we have $t=o(n^{1/3})$ and $a_{nt}=o(n^{1/6})$. Let ${\mathcal{A}}_{p}=\{v:t/a_{nt}<|v|<a_{nt}\}$. Then,

\begin{align*}
&\frac{\Pr(T_{1}T_{2}>t)}{\Pr(T_{1}T_{2}<-t)}-1\\
&=\frac{\Pr(T_{1}T_{2}>t,{\mathcal{A}}_{p})-\Pr(T_{1}T_{2}<-t,{\mathcal{A}}_{p})}{\Pr(T_{1}T_{2}<-t)}+\frac{\Pr(T_{1}T_{2}>t,{\mathcal{A}}^{c}_{p})-\Pr(T_{1}T_{2}<-t,{\mathcal{A}}^{c}_{p})}{\Pr(T_{1}T_{2}<-t)}\\
&\equiv C_{1}+C_{2}.
\end{align*}

First, for the term $C_{2}$,

\[
C_{2}=\left[\frac{\Pr(T_{1}T_{2}>t,{\mathcal{A}}^{c}_{p})}{\Pr(Z_{1}Z_{2}>t)}-\frac{\Pr(T_{1}T_{2}<-t,{\mathcal{A}}^{c}_{p})}{\Pr(Z_{1}Z_{2}<-t)}\right]\frac{\Pr(Z_{1}Z_{2}<-t)}{\Pr(T_{1}T_{2}<-t)},
\]

where $Z_{1}$ and $Z_{2}$ are two independent $N(0,1)$ variables. From the proof given later, it can easily be seen that $\Pr(Z_{1}Z_{2}<-t)/\Pr(T_{1}T_{2}<-t)\rightarrow 1$ uniformly in $0\leq t\leq C\log p$. Thus, in what follows we mainly focus on the rate of $\Pr(T_{1}T_{2}>t,{\mathcal{A}}^{c}_{p})/\Pr(Z_{1}Z_{2}>t)$. The other term, $\Pr(T_{1}T_{2}<-t,{\mathcal{A}}^{c}_{p})$, can be handled similarly.

Note that $\Pr(T_{1}T_{2}>t,{\mathcal{A}}^{c}_{p})\leq 2\Pr(|T_{1}|>a_{nt})$ and $\Pr(Z_{1}Z_{2}>t)\geq\left\{\Pr(Z_{1}>\sqrt{t})\right\}^{2}$. By the inequality

\[
\frac{x}{x^{2}+1}\phi(x)<\tilde{\Phi}(x)<\phi(x)/x,\quad\mbox{for all }x>0,
\]

and the large deviation formula for the $t$-statistic (Lemma S.1), we obtain that

\begin{align*}
\frac{\Pr(T_{1}T_{2}>t,{\mathcal{A}}^{c}_{p})}{\Pr(Z_{1}Z_{2}>t)}&\leq\frac{2\Pr(|T_{1}|>a_{nt})}{\left\{\Pr(Z_{1}>\sqrt{t})\right\}^{2}}\\
&\leq c\exp\left\{-\frac{1}{2}(a_{nt}^{2}-t-t)\right\}\frac{t}{a_{nt}}\\
&=O\left(\sqrt{t/n}\right),
\end{align*}

and thus we claim that $C_{2}=O(\sqrt{t/n})$.
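The two-sided Gaussian tail bound invoked above is the classical Mills-ratio inequality; it can be checked numerically at a few arbitrarily chosen points (this check is ours, for illustration only).

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def phi_bar(x):
    """Gaussian upper-tail probability 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# x/(x^2+1) * phi(x) < 1 - Phi(x) < phi(x)/x for every x > 0
checks = all(
    x / (x * x + 1.0) * phi(x) < phi_bar(x) < phi(x) / x
    for x in (0.5, 1.0, 2.0, 4.0, 6.0)
)
print(checks)
```

Both bounds tighten as $x$ grows, which is why the ratio argument above loses nothing beyond the stated polynomial factors.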

Next, we deal with the main term $C_{1}$. Denote ${\mathcal{A}}_{p}^{+}=\{v:t/a_{nt}<v<a_{nt}\}$. Observe that

\begin{align*}
&\Pr(T_{1}>t/T_{2},{\mathcal{A}}_{p}^{+})\\
&=-\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\exp\left\{-\frac{t^{3}\kappa}{3v^{3}\sqrt{n}}\right\}\left\{1+O({t}/{(v\sqrt{n})})\right\}d\,\tilde{\Phi}(v)\exp\left\{-\frac{v^{3}\kappa}{3\sqrt{n}}\right\}\left\{1+O(v/\sqrt{n})\right\}\\
&=\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{-\frac{\{(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}dv\left\{1+O(a_{nt}/\sqrt{n})\right\}\\
&\quad+\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\tilde{\Phi}(v)\exp\left\{-\frac{\{(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}\frac{v^{2}\kappa}{\sqrt{n}}dv\left\{1+O(a_{nt}/\sqrt{n})\right\}\\
&\equiv R_{1}+R_{2},
\end{align*}

where we use Lemma S.1 again. Note that

\[
R_{2}\leq\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{-\frac{\{(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}\frac{v\kappa}{\sqrt{n}}dv\left\{1+O(a_{nt}/\sqrt{n})\right\},
\]

and accordingly $R_{2}=R_{1}\,O(a_{nt}/\sqrt{n})$. Hence,

\[
\Pr(T_{1}T_{2}>t,{\mathcal{A}}_{p}^{+})=\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{-\frac{\{(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}dv\left\{1+O(a_{nt}/\sqrt{n})\right\}.
\]

Similarly,

\begin{align*}
\Pr(T_{1}T_{2}>t,{\mathcal{A}}_{p}^{-})&=\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{\frac{\{(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}dv\left\{1+O(a_{nt}/\sqrt{n})\right\},\\
\Pr(T_{1}T_{2}<-t,{\mathcal{A}}_{p}^{+})&=\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{\frac{\{(t/v)^{3}-v^{3}\}\kappa}{3\sqrt{n}}\right\}dv\left\{1+O(a_{nt}/\sqrt{n})\right\},\\
\Pr(T_{1}T_{2}<-t,{\mathcal{A}}_{p}^{-})&=\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{\frac{\{-(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}dv\left\{1+O(a_{nt}/\sqrt{n})\right\}.
\end{align*}

A Taylor expansion yields

\[
\Pr(T_{1}T_{2}>t)-\Pr(T_{1}T_{2}<-t)=\frac{4t^{3}\kappa^{2}}{9n}\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)dv\left\{1+O(a_{nt}/\sqrt{n})\right\}.
\]

Consequently, we obtain

\[
C_{1}=\frac{2t^{3}\kappa^{2}}{9n}+O(a_{nt}/\sqrt{n}),
\]

from which we obtain the assertion. \Box

Proof of Lemma A.2. The proof is similar to that of Lemma A.1, but uses the large deviation result for $T/(s/\sigma)$ obtained in Lemma S.3 and that for the mean. \Box

Proof of Lemma A.3. We prove (A.1); the proof of (A.2) follows similarly. Here $W_{j}=T_{1j}T_{2j}$. Clearly, $G(t)$ is a decreasing and continuous function. Let $z_{0}<z_{1}<\cdots<z_{d_{p}}\leq 1$ and $t_{i}=G^{-1}(z_{i})$, where $z_{0}=b_{p}/p$, $z_{i}=b_{p}/p+c_{p}\exp(i^{\delta})/p$, and $d_{p}=[\{\log((p-b_{p})/c_{p})\}^{1/\delta}]$ with $c_{p}/b_{p}\to 0$ and $0<\delta<1$. Note that $G(t_{i})/G(t_{i+1})=1+o(1)$ uniformly in $i$. Thus, it is enough to obtain the convergence rate of

\[
D_{p}=\sup_{0\leq i\leq d_{p}}\left|\frac{\sum_{j\in\mathcal{I}_{0}}\left\{\mathbb{I}(W_{j}>t_{i})-\Pr(W_{j}>t_{i})\right\}}{p_{0}G(t_{i})}\right|.
\]

Define $\mathcal{S}_{j}=\{k\in\mathcal{I}_{0}:X_{k}\ \mbox{is dependent with}\ X_{j}\}$ and further

\[
D(t)=\mathbb{E}\left[\left\{\sum_{j\in\mathcal{I}_{0}}\left(\mathbb{I}(W_{j}>t)-\Pr(W_{j}>t)\right)\right\}^{2}\right].
\]

Note that

\begin{align*}
D(t)&=\sum_{j\in\mathcal{I}_{0}}\sum_{k\in\mathcal{S}_{j}}\mathbb{E}\left[\left\{\mathbb{I}(W_{j}>t)-\Pr(W_{j}>t)\right\}\left\{\mathbb{I}(W_{k}>t)-\Pr(W_{k}>t)\right\}\right]\\
&\leq r_{p}p_{0}G(t).
\end{align*}

We then obtain

\begin{align*}
\Pr(D_{p}\geq\epsilon)&\leq\sum_{i=0}^{d_{p}}\Pr\left(\left|\frac{\sum_{j\in\mathcal{I}_{0}}\{\mathbb{I}(W_{j}>t_{i})-\Pr(W_{j}>t_{i})\}}{p_{0}G(t_{i})}\right|\geq\epsilon\right)\\
&\leq\frac{1}{\epsilon^{2}}\sum_{i=0}^{d_{p}}\frac{1}{p^{2}_{0}G^{2}(t_{i})}D(t_{i})\\
&\leq\frac{r_{p}}{\epsilon^{2}}\sum_{i=0}^{d_{p}}\frac{1}{p_{0}G(t_{i})}.
\end{align*}

Moreover, observe that

\begin{align*}
\sum_{i=0}^{d_{p}}\frac{1}{p_{0}G(t_{i})}&=\frac{p}{p_{0}}\left(\frac{1}{b_{p}}+\sum_{i=1}^{d_{p}}\frac{1}{b_{p}+c_{p}e^{i^{\delta}}}\right)\\
&\leq c\left(\frac{1}{b_{p}}+c_{p}^{-1}\sum_{i=1}^{d_{p}}\frac{1}{1+e^{i^{\delta}}}\right)\leq c/c_{p}\{1+o(1)\}.
\end{align*}

Because $c_{p}$ can be made arbitrarily large as long as $c_{p}/b_{p}\to 0$, we have $D_{p}=O_{p}(\sqrt{r_{p}/b_{p}})$. \Box

Proof of Lemma A.4. This lemma can be proved by the same arguments as in the proof of Lemma A.3; the details are omitted. \Box

Proof of Proposition 2

We prove this proposition for $L_{+}$. The result for $L_{-}$ can be obtained similarly. Fix $\epsilon>0$ and, for any threshold $t>0$, define

\[
R_{\epsilon}(t)=\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq t,\Delta_{j}\leq\epsilon\right)}{1+\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-t\right)}.
\]

Consider the event $\mathcal{A}=\{\Delta\equiv\max_{j\in\mathcal{I}_{0}}\Delta_{j}\leq\epsilon\}$. Furthermore, for a threshold rule $L=T({\bf W})$ mapping the statistics ${\bf W}$ to a threshold $L\geq 0$, for each index $j=1,\ldots,p$ we define

\[
L_{j}=T\left(W_{1},\ldots,W_{j-1},|W_{j}|,W_{j+1},\ldots,W_{p}\right)\geq 0,
\]

i.e., the threshold that we would obtain if $\mathrm{sgn}(W_{j})$ were set to $+1$.

Then, for the RESS method with the threshold $L_{+}$, we can write

\begin{align*}
\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq L_{+},\Delta_{j}\leq\epsilon\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq L_{+})}&=\frac{1+\sum_{j}\mathbb{I}\left(W_{j}\leq-L_{+}\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq L_{+})}\times\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq L_{+},\Delta_{j}\leq\epsilon\right)}{1+\sum_{j}\mathbb{I}\left(W_{j}\leq-L_{+}\right)}\\
&\leq\alpha\times R_{\epsilon}(L_{+}).
\end{align*}

It is crucial to get an upper bound for $\mathbb{E}\{R_{\epsilon}(L_{+})\}$. We have

\begin{align*}
\mathbb{E}\{R_{\epsilon}(L_{+})\}&=\mathbb{E}\left[\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq L_{+},\Delta_{j}\leq\epsilon\right)}{1+\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-L_{+}\right)}\right]\\
&=\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\geq L_{j},\Delta_{j}\leq\epsilon\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}\\
&=\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left[\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\geq L_{j},\Delta_{j}\leq\epsilon\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\,\Big|\,|W_{j}|\right\}\right]\\
&=\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\Pr\left(W_{j}>0\mid|W_{j}|\right)\mathbb{I}\left(|W_{j}|\geq L_{j},\Delta_{j}\leq\epsilon\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\},\tag{S.5}
\end{align*}

where the last step holds because, after conditioning on $|W_{j}|$, the only unknown is the sign of $W_{j}$. By the definition of $\Delta_{j}$, we have $\Pr\left(W_{j}>0\mid|W_{j}|\right)\leq 1/2+\Delta_{j}$.

Hence,

\begin{align*}
\mathbb{E}\{R_{\epsilon}(L_{+})\}&\leq\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{(\frac{1}{2}+\Delta_{j})\mathbb{I}\left(|W_{j}|\geq L_{j},\Delta_{j}\leq\epsilon\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}\\
&\leq\left(\frac{1}{2}+\epsilon\right)\left[\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\geq L_{j},\Delta_{j}\leq\epsilon\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}+\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\leq-L_{j}\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}\right]\\
&=\left(\frac{1}{2}+\epsilon\right)\left[\mathbb{E}\{R_{\epsilon}(L_{+})\}+\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\leq-L_{j}\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}\right].
\end{align*}

Finally, the sum in the last expression can be simplified as follows: if $W_{j}>-L_{j}$ for all null $j$, then the sum equals zero; otherwise,

\[
\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\leq-L_{j}\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}=\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\leq-L_{j}\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{k}\right)}\right\}=1,
\]

where the first step comes from the following fact: for any $j,k$, if $W_{j}\leq-\min(L_{j},L_{k})$ and $W_{k}\leq-\min(L_{j},L_{k})$, then $L_{j}=L_{k}$. See Barber et al. (2019) for a proof.

Accordingly, we have

\[
\mathbb{E}\{R_{\epsilon}(L_{+})\}\leq\frac{1/2+\epsilon}{1/2-\epsilon}\leq 1+4\epsilon.
\]

Consequently, the assertion of this proposition holds. \Box
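The threshold $L_{+}$ analyzed in Proposition 2 has the knockoff$+$ form of Barber and Candès (2015): the smallest cutoff at which the sign-symmetry-based FDP estimate $(1+\#\{j:W_{j}\leq-t\})/\max(1,\#\{j:W_{j}\geq t\})$ drops below $\alpha$. The sketch below is an illustrative implementation of this generic rule; the function name and the toy statistics are our own, not taken from the paper.

```python
def ress_threshold(W, alpha):
    """Knockoff-style data-driven threshold (illustrative sketch):
    return the smallest t among the |W_j| at which the estimated FDP,
    (1 + #{W_j <= -t}) / max(1, #{W_j >= t}), falls below alpha."""
    candidates = sorted(abs(w) for w in W if w != 0)
    for t in candidates:
        neg = sum(1 for w in W if w <= -t)   # sign-flipped null proxies
        pos = sum(1 for w in W if w >= t)    # tentative discoveries
        if (1 + neg) / max(1, pos) <= alpha:
            return t
    return float("inf")                      # infeasible: reject nothing

# Toy statistics: large positive scores mimic signals; small scores,
# roughly symmetric about zero, mimic nulls.
W = [5.2, 4.8, 4.1, 3.9, 3.5, 0.4, -0.3, 0.2, -0.5, -0.1]
L = ress_threshold(W, alpha=0.25)
rejected = [w for w in W if w >= L]
print(L, rejected)   # -> 3.5 [5.2, 4.8, 4.1, 3.9, 3.5]
```

The "+1" in the numerator is exactly the correction carried through the supermartingale-type argument above; dropping it would break the finite-sample bound on $\mathbb{E}\{R_{\epsilon}(L_{+})\}$.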

Proof of Proposition 3. The proof follows similarly to that of Proposition 2, but uses $\Pr\left(W_{j}>0\mid|W_{j}|,{\bf W}_{-j}\right)$ in place of $\Pr\left(W_{j}>0\mid|W_{j}|\right)$ in (S.5). \Box

Discussion on $R(t)$. Recall the definition $\mathcal{M}=\left\{j:|\mu_{j}|/\sigma_{j}\geq\sqrt{(2+c)\log p/n}\right\}$ for any $c>0$. We shall show that

\[
\sup_{0\leq t\leq 2\log p}\left|\frac{\sum_{j\in\mathcal{I}_{1}}\mathbb{I}\left(W_{j}\leq-t\right)}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-t\right)}\right|\mathop{\rightarrow}\limits^{p}0.
\]

Observe that for $0\leq t\leq 2\log p$,

\begin{align*}
\frac{\sum_{j\in\mathcal{M}}\Pr\left(W_{j}\leq-t\right)}{\sum_{j\in\mathcal{I}_{0}}\Pr\left(W_{j}\leq-t\right)}&\leq\frac{2\sum_{j\in\mathcal{M}}\Pr(T_{1j}\leq-\sqrt{t})}{p_{0}\exp(-t)/t\left\{1+o(1)\right\}}\\
&\leq\frac{2|\mathcal{M}|\max_{j\in\mathcal{I}_{0}}\Pr(T_{1j}\leq-\sqrt{t}-\sqrt{(1+c)\log p})}{p_{0}\exp(-t)/t\left\{1+o(1)\right\}}=o(1).
\end{align*}

On the other hand, $\sum_{j\in\mathcal{I}_{1}\backslash\mathcal{M}}\Pr\left(W_{j}\leq-t\right)\leq(p_{1}-|\mathcal{M}|)\max_{j\in\mathcal{I}_{0}}\Pr\left(W_{j}\leq-t\right)$, and thus the assertion holds because $(p_{1}-|\mathcal{M}|)/p_{0}\to 0$.

Table S1: Comparison of FDR and TPR when the signals are $\mu_{j}=\delta_{j}\sqrt{\log p/n_{t}}$ with $\delta_{j}\sim{\rm Unif}(-1.5,-1)$, under $p=5000$, $\rho=0$, $\vartheta=0.05$ and the target FDR level $\alpha=0.2$. Entries are percentages, with standard deviations in parentheses.

                          $n_{t}=50$               $n_{t}=100$
Distribution  Method    FDR(%)      TPR(%)       FDR(%)      TPR(%)
$t(5)$        RESS      20.1 (6.0)  64.0 (6.0)   21.0 (6.2)  65.2 (6.4)
              BH        17.3 (2.8)  63.6 (3.6)   18.2 (2.7)  65.4 (3.5)
exp(1)        RESS      32.9 (4.0)  74.1 (3.5)   27.5 (3.4)  77.4 (3.3)
              BH        49.1 (2.1)  85.6 (2.4)   41.1 (2.9)  86.7 (2.5)
Mixed         RESS      26.0 (4.5)  74.3 (4.3)   23.5 (4.6)  76.2 (4.0)
              BH        31.6 (2.6)  80.3 (2.6)   27.6 (3.1)  80.7 (2.5)

Additional simulation results

Table S1 reports a brief comparison between the RESS and BH procedures when $\mu_{j}<0$ for all $j\in\mathcal{I}_{1}$. We see that the RESS procedure improves the accuracy of FDR control over the BH procedure, while maintaining high power in most cases.

Figure S1 is the counterpart of Figure 4 when the errors follow the mixed distribution.

Figure S1: Box-plots of the false discovery proportion (FDP) and true positive proportion (TPP) when the errors follow the mixed distribution, under $n_{t}=100$, $p=5000$, and $\rho=0$; the red dashed lines indicate the target FDR level.