A New Procedure for Controlling False Discovery Rate in Large-Scale $t$ -tests

Changliang Zou^a, Haojie Ren^b, Xu Guo^c and Runze Li^b
^aSchool of Statistics and Data Science, Nankai University
Tianjin, P.R. China
^bDepartment of Statistics, The Pennsylvania State University,
University Park, PA 16802-2111, USA
^cSchool of Statistics, Beijing Normal University
Beijing, P.R. China

Abstract

This paper is concerned with false discovery rate (FDR) control in large-scale multiple testing problems. We first propose a new data-driven testing procedure for controlling the FDR in large-scale $t$ -tests for one-sample mean problem. The proposed procedure achieves exact FDR control in finite sample settings when the populations are symmetric no matter the number of tests or sample sizes. Comparing with the existing bootstrap method for FDR control, the proposed procedure is computationally efficient. We show that the proposed method can control the FDR asymptotically for asymmetric populations even when the test statistics are not independent. We further show that the proposed procedure with a simple correction is as accurate as the bootstrap method to the second-order degree, and could be much more effective than the existing normal calibration. We extend the proposed procedure to two-sample mean problem. Empirical results show that the proposed procedures have better FDR control than existing ones when the proportion of true alternative hypotheses is not too low, while maintaining reasonably good detection ability.
Keywords: Data splitting; Large-deviation probability; Multiple comparisons; Product of two normal variables; Skewness; Symmetry

1 Introduction

Many multiple testing problems are closely related to one-sample mean problem. Let $(X_{i1},$ $\ldots,X_{ip})^{\top}$ , $1\leq i\leq n_{t}$ be independent and identically distributed (i.i.d.) samples from $X=(X_{1},\ldots,X_{p})^{\top}$ with mean $\mu=(\mu_{1},\ldots,\mu_{p})^{\top}$ . Of interest is to test $H_{0}:\mu=0$ versus $H_{1}:\mu\neq 0$ . This leads to consider a multiple testing problem on the mean values

\mathbb{H}_{0j}:\mu_{j}=0\ \ \mbox{v.s.}\ \ \mathbb{H}_{1j}:\mu_{j}\neq 0,\ 1\leq j\leq p.

A standard procedure for false discovery rate (FDR) control is to apply the Benjamini and Hochberg (BH) method (Benjamini and Hochberg, 1995) to the $t$ statistics with the standard normal or Student’s $t$ calibration. That is, let $\bar{X}_{j}=n_{t}^{-1}\sum_{i=1}^{n_{t}}X_{ij}$ and $s_{j}^{2}=(n_{t}-1)^{-1}\sum_{i=1}^{n_{t}}(X_{ij}-\bar{X}_{j})^{2}$ , and define $T_{j}=\sqrt{n_{t}}\bar{X}_{j}/s_{j}$ , then the procedure rejects a hypothesis whenever $|T_{j}|\geq T$ with a data-dependent threshold

\displaystyle T=\inf\left\{t>0:\frac{2p\{1-\Phi(t)\}}{\#\{j:|T_{j}|\geq t\}}\leq\alpha\right\},

(1.1)

for a desired FDR level $\alpha$ (Storey et al., 2004), where $\Phi(\cdot)$ is the cumulative distribution function of $N(0,1)$ . It has been revealed that the accuracy of this control procedure heavily depends on the skewness of $X_{j}$ ’s and the diverging rate of $p$ relatively to $n$ , since $\Phi(t)$ is only an approximation to the distribution of $T_{j}$ .

Many studies have investigated the performance of large-scale $t$ tests. Efron (2004) observed that the null distribution choices substantially affect the simultaneous inference procedure in a microarray analysis. Delaigle et al. (2011) conducted a careful study of moderate and large deviations of the $t$ statistic which is indispensable to understanding its robustness and drawback for analyzing high dimensional data. Under a condition of non-sparse signals, Cao and Kosorok (2011) proved the robustness of Student’s $t$ test statistics and $N(0,1)$ calibration in the control of FDR. Liu and Shao (2014) gave a systematic analysis on the asymptotic conditions with which the large-scale $t$ testing procedure is able to have FDR control.

Bootstrap method is known as a useful way to improve the accuracy of an exact null distribution approximation and has been demonstrated to be particularly effective for highly multiple testing problems. See Delaigle et al. (2011) and the references therein. In general, the bootstrap is capable of correcting for skewness, and therefore leads to second-order accuracy. Accordingly, a faster increasing rate of $p$ could be allowed (Liu and Shao, 2014) and better FDR control would be achieved when the data populations are skewed. However, multiple testing problems with tens of thousands or even millions of hypotheses are now commonplace, and practitioners may be reluctant to use a bootstrap method in such situations, and therefore a rapid testing procedure is highly desirable.

In this paper, we propose a new data-driven selection procedure controlling the FDR. The method entails constructing $p$ new test statistics with marginal symmetry property, using the empirical distribution of the negative statistics to approximate that of the positive ones, and searching for the threshold with a formula similar to (1.1). The proposed procedure is computationally efficient since it only uses a one-time split of the data and calculation of the product of two $t$ statistics obtained from two splits. We study theoretical properties of the proposed procedure. We show that (a) the proposed method achieves exact FDR control even in finite sample settings when $X_{j}$ ’s are symmetric and independent; and (b) the proposed method achieves asymptotical FDR control under mild conditions when the populations are asymmetric and dependent. We further propose a simple refinement of the proposed procedure, and study the asymptotical property of the refined one. The theoretical property of the proposed refined one implies that it is as accurate as the bootstrap method to the second-order degree in certain situations. We also investigate extension of the proposed procedure to two-sample mean problem. Simulation comparisons imply that the proposed method has better FDR control than existing methods, maintains reasonably good power and has a significant reduction in computing time and storage.

The rest of this paper is organized as follows. In Section 2, we present the new procedure and establish its FDR control property. Some extensions are given in Section 3. Numerical studies are conducted in Section 4. Section 5 concludes the paper, and theoretical proofs are delineated in the Appendix. Some technical details and additional numerical results are provided in the Supplementary Material.

Notations. $A_{n}\approx B_{n}$ stands for that $A_{n}/B_{n}\mathop{\rightarrow}\limits^{p}1$ as $n\to\infty$ . The “ $\gtrsim$ ” and “ $\lesssim$ ” are similarly defined. We denote by $\mathcal{I}_{0}$ and $\mathcal{I}_{1}$ the true null set and alternative set, $p_{0}=|\mathcal{I}_{0}|$ and $p_{1}=|\mathcal{I}_{1}|$ , respectively.

2 A New FDR Control Procedure and its Theoretical Properties

We first propose a new FDR control procedure for the one-sample mean problem, and then establish the theoretical properties of the proposed procedure.

2.1 A new FDR control procedure

Without loss of generality, assume that the sample size is an even integer $n_{t}=2n$ . We randomly split the data into two disjoint groups ${\mathcal{D}}_{1}$ and ${\mathcal{D}}_{2}$ of equal size $n$ . The $t$ test statistics for the $j$ th variable on ${\mathcal{D}}_{1}$ and ${\mathcal{D}}_{2}$ are denoted as $T_{1j}$ and $T_{2j}$ , respectively. Define

W_{j}=T_{1j}T_{2j}.

Clearly, $W_{j}$ is likely to be large for most of the signals regardless of the sign of $\mu_{j},j\in\mathcal{I}_{1}$ , and small for most of the null variables. Observing that $W_{j}$ is, at least asymptotically, symmetric with mean zero for $j\in\mathcal{I}_{0}$ due to the central limit theorem and the independence between ${\mathcal{D}}_{1}$ and ${\mathcal{D}}_{2}$ , we can choose a threshold $L>0$ by setting

\displaystyle L=\inf\left\{t>0:\frac{\#\{j:W_{j}\leq-t\}}{\#\{j:W_{j}\geq t\}\vee 1}\leq\alpha\right\},

(2.1)

and reject the $\mathbb{H}_{0j}$ if $W_{j}\geq L$ , where $\alpha$ is the target FDR level. If the set is empty, we simply set $L=+\infty$ . The fraction in (2.1) is an estimate of the false discovery proportion (FDP) since the set $\{j:W_{j}\leq-t,j\in\mathcal{I}_{1}\}$ is often very small (if the signal is not too weak) and thus $\#\{j:W_{j}\leq-t\}$ is a good approximation to $\#\{j:W_{j}\leq-t,j\in\mathcal{I}_{0}\}$ .

Refer to caption — Figure 1: (a): Scatter plot of the $W_{j}$ ’s with red triangles and black dots denoting true signals and nulls respectively; (b): The corresponding estimate of FDP curve (against $t$ ) along with the true FDP. In this case, $p=1000$ , $n_{t}=100$ and $|\mathcal{I}_{1}|=50$ . We consider a multivariate chi-squared distribution with two degrees of freedoms and an autoregressive structure $(0.5^{|i-j|})$ . We set the signal-to-noise ratio as 2.

As described above, we construct the test statistic $W_{j}$ with marginal symmetry property by using sample splitting. Thus we refer this procedure to as Reflection via Sample Splitting (RESS). The RESS procedure is data-dependent and does not involve any unknown quantity related to the populations. This is an important and desirable feature of the RESS. Figure 1 depicts a visual representation of the RESS procedure. Specifically, Figure 1(a) depicts the scatter plot of the $W_{j}$ ’s with red triangles denoting true signals. Observe that the true signals are primarily above the x-axis, indicating $W_{j}>0$ , while the null $W_{j}$ ’s (black dots) are roughly symmetrically distributed across the horizontal lines. Figure 1(b) depicts the corresponding estimate of FDP (i.e., the fraction in (2.1)) along with the true FDP over $t$ . The approximation in this case is very good as only three true alternatives (i.e., three red triangles) lie below the horizontal line in Figure 1(a).

Knockoff framework was introduced by Barber and Candès (2015) in the high-dimensional linear regression setting. The knockoff selection procedure operates by constructing “knockoff copies” of each of the $p$ covariates (features) with certain knowledge of the covariates or responses, which are then used as a control group to ensure that the model selection algorithm is not choosing too many irrelevant covariates. The signs of test statistics in the knockoff need to satisfy (or roughly) joint exchangeability so that the corresponding knockoff can yield accurate FDR control in finite sample setting. Refer to Candes et al. (2018) for more discussions. The proposed threshold $L$ in (2.1) shares a similar spirit to the knockoff, but they are distinguished in that the RESS procedure does not require any prior information about the distribution of $X=(X_{1},\ldots,X_{p})^{\top}$ . This is especially important since it is difficult to estimate the distribution of $X$ when $p$ is very large. We employ the sample-splitting strategy to achieve a marginal symmetry property. It turns out that the FDR can be controlled reasonably well due to the marginal symmetry of $W_{j}$ ’s. The theoretical findings on the FDR control under certain dependence structures such as positive regression dependence on subset or weak dependence at a marginal level (Benjamini and Yekutieli, 2001; Storey et al., 2004) shed light on the validity of the RESS procedure. Detailed analysis will be given in Section 2.2.

At a first glance, the test statistic $W_{j}$ may result in much information loss due to the sample-splitting. In fact, benefiting from the joint use of two independent $t$ statistics, the relative power loss of $W_{j}$ with respect to $T_{j}$ is quite mild. By the inequality $\Pr^{2}(T_{1j}>\sqrt{t})+\Pr^{2}(T_{1j}<-\sqrt{t})\leq\Pr(T_{1j}T_{2j}>t)\leq 2\Pr(T_{1j}>\sqrt{t})+2\Pr(T_{1j}<-\sqrt{t})$ , the power ratio of the tests based on $W_{j}$ and $T_{j}$ can be easily bounded as

	$\displaystyle\frac{\left\{1-\Phi(z_{\alpha/4}-\sqrt{n}\mu_{j}/\sigma_{j})\right\}^{2}+\left\{1-\Phi(z_{\alpha/4}+\sqrt{n}\mu_{j}/\sigma_{j})\right\}^{2}}{2-\Phi(z_{\alpha/2}-\sqrt{2n}\mu_{j}/\sigma_{j})-\Phi(z_{\alpha/2}+\sqrt{2n}\mu_{j}/\sigma_{j})}\lesssim\frac{\Pr_{\mathbb{H}_{1j}}(W_{j}>W_{\alpha})}{\Pr_{\mathbb{H}_{1j}}(\|T_{j}\|>z_{\alpha/2})}$
	$\displaystyle\lesssim\frac{2\left\{2-\Phi(z_{\sqrt{\alpha/2}}-\sqrt{n}\mu_{j}/\sigma_{j})-\Phi(z_{\sqrt{\alpha/2}}+\sqrt{n}\mu_{j}/\sigma_{j})\right\}}{2-\Phi(z_{\alpha/2}-\sqrt{2n}\mu_{j}/\sigma_{j})-\Phi(z_{\alpha/2}+\sqrt{2n}\mu_{j}/\sigma_{j})},$

where $\sigma_{j}$ is the standard deviation of $X_{j}$ , and $W_{\alpha}$ and $z_{\alpha}$ are the upper $\alpha$ quantiles of the distributions of $W_{j}$ and $N(0,1)$ , respectively. Further when $\sqrt{n}\mu_{j}\to\infty$ , both of the test statistics $W_{j}$ and $T_{j}$ have asymptotic power 1. For better understanding, the power curves (with size corrected) of the two tests are presented in Figure 2 for some commonly used settings. We can see that though $W_{j}$ is always inferior to $T_{j}$ as we can expect, the disadvantage is not very significant and also tends to be less pronounced when $n$ increases. This power sacrifice of the RESS in turn brings us much better error rate control as we shall show in the next subsection. On the other hand, compared with the test statistic $T_{1j}^{2}$ based on only group ${\mathcal{D}}_{1}$ , the proposed test statistic $W_{j}=T_{1j}T_{2j}$ has smaller variance because $\mathrm{Var}(W_{j})\approx 1<2\approx\mathrm{Var}(T^{2}_{1j})$ . Since the null distribution of $W_{j}$ is symmetric, the upper quantiles of the distributions of $W_{j}$ would be much smaller than those of $T^{2}_{1j}$ . As a result, $W_{j}$ is more powerful than $T_{1j}^{2}$ .

Remark 1 It is noteworthy that the joint use of $W_{j}$ and the threshold $L$ distinguishes our RESS procedure from the methods given by Wasserman and Roeder (2009) and Meinshausen et al. (2009) which used the sample-splitting scheme to conduct variable selection with error rate control. They used the first split to reduce the number of variables to a manageable size and then applied FDR control methods on the remaining variables using the data from the second split. The normal calibration is usually required to obtain $p$ -values. In contrast, in the RESS procedure, the data from both two splits are used to compute the statistics and the empirical distribution is in place of asymptotic distributions.

2.2 Theoretical results

We firstly investigate the FDR control of the proposed RESS procedure when $X_{1},\ldots,X_{p}$ are independent each other, and then extend the results to the dependent case under stronger conditions. A simply yet effective refined procedure with better accuracy in FDR control will be further developed after the convergence rate of the FDP of the RESS is investigated.

2.2.1 Independent case

A preliminary result of this paper is that the proposed procedure controls a quantity nearly equal to the FDR when the populations are symmetric.

Proposition 1

Assume $X_{1},\ldots,X_{p}$ are symmetrically distributed and independent each other. For any $\alpha\in(0,1)$ and $n_{t}\geq 4$ , the RESS method satisfies

\mathbb{E}\left[\frac{\#\{j:W_{j}\geq L,j\in\mathcal{I}_{0}\}}{\#\{j:W_{j}\geq L\}+\alpha^{-1}}\right]\leq\alpha.

The term bounded by this proposition is very close to the FDR in settings where $\alpha^{-1}$ is dominated by $\#\{j:W_{j}\geq L\}$ . Following Barber and Candès (2015), if it is preferable to control the FDR exactly, we may adjust the threshold by

L_{+}=\inf\left\{t>0:\frac{1+\#\{j:W_{j}\leq-t\}}{\#\{j:W_{j}\geq t\}\vee 1}\leq\alpha\right\},

with which we can show that

\mbox{FDR}_{W}(L_{+})=\mathbb{E}\left[\frac{\#\{j:W_{j}\geq L_{+},j\in\mathcal{I}_{0}\}}{\#\{j:W_{j}\geq L_{+}\}}\right]\leq\alpha.

In what follows, as we mainly focus on the asymptotic FDR control, the results with $L$ and $L_{+}$ are generally the same. From the proof of this proposition, we can see that the inequality is due to the fact that

\#\{j:W_{j}\leq-t,j\in\mathcal{I}_{0}\}\leq\#\{j:W_{j}\leq-t\}

which would usually be tight because most strong signals yield a positive $W_{j}$ or at least a not too small value of $W_{j}$ . This implies that it is very likely that the FDR of the RESS will be fairly close to the nominal one unless a large proportion of $\mu_{j}$ ’s for $j\in\mathcal{I}_{1}$ is very weak.

Proposition 1 is a direct corollary of the following result in which $X_{j}$ ’s are allowed to be asymmetric.

Proposition 2

Assume $X_{1},\ldots,X_{p}$ are independent each other and $n_{t}\geq 4$ . For any $\alpha\in(0,1)$ , the RESS method satisfies

\mathbb{E}\left[\frac{\#\{j:W_{j}\geq L,j\in\mathcal{I}_{0}\}}{\#\{j:W_{j}\geq L\}+\alpha^{-1}}\right]\leq\min_{\epsilon\geq 0}\left\{\alpha(1+4\epsilon)+\Pr\left(\max_{j\in\mathcal{I}_{0}}\Delta_{j}>\epsilon\right)\right\},

where $\Delta_{j}=\left|\Pr(W_{j}>0\mid|W_{j}|)-1/2\right|$ .

We can interpret $\Delta_{j}$ as measuring the extent to which the symmetry is violated for a specific variable $j$ . This result concurs with our intuition that controlling the $\Delta_{j}$ ’s is sufficient to ensure control of the FDR for the RESS method. In the most ideal case where $X_{1},\ldots,X_{p}$ are symmetrically distributed, $\Delta_{j}=0$ for all $j\in\mathcal{I}_{0}$ , and we automatically obtain the FDR-control result in Proposition 1 since we can take $\epsilon=0$ . Under asymmetric scenarios, the $\Delta_{j}$ can still be expected to be small due to the convergence of $T_{1j}$ and $T_{2j}$ to the normal if $n$ is not too small. In the next theorems, we will show that under mild conditions $\max_{j\in\mathcal{I}_{0}}\Delta_{j}\to 0$ in probability, yielding a meaningful result on FDR control in more realistic settings. The proof of this proposition follows similarly to Theorem 2 in Barber et al. (2019) which shows that the Model-X knockoff (Candes et al., 2018) selection procedure incurs an inflation of the FDR that is proportional to the errors in estimating the distribution of each feature conditional on the remaining features.

For our asymptotic analysis, we need the following assumptions. Throughout this paper, we assume $p_{1}\leq\gamma p$ for some $\gamma<1$ , which includes the sparse setting $p_{1}=o(p)$ .

Assumption 1

(Moments) (i) For some constant $K_{1}>0$ , $\max_{1\leq j\leq p}\mathbb{E}(X_{j}-\mu_{j})^{4}/\sigma_{j}^{4}<K_{1}$ ; (ii) For some constants $C>0$ and $K_{2}>0$ , $\max_{1\leq j\leq p}\mathbb{E}[\exp\{C(X_{j}-\mu_{j})^{2}/\sigma_{j}^{2}\}]<K_{2}$ .

Assumption 2

(Signals) $\beta_{p}\equiv{\rm Card}\left\{j:|\mu_{j}|/\sigma_{j}\geq 3\sqrt{\log p/n}\right\}\to\infty.$

Remark 2 The moment condition in Assumption 1-(i) is required in a large deviation result for the Student’s $t$ statistics on which our proof heavily hinges. Assumption 1-(ii), which requires exponentially light tails and implies that all moments of $X$ are finite, is stronger than Assumption 1-(i). This will be only needed when we want to use the RESS method with correction (see Section 2.2.2). In fact, for the familywise error control with bootstrap calibration, similar conditions are also imposed to achieve better accuracy (Fan et al., 2007). The implication of Assumption 2 is that $p_{1}\to\infty$ . If the number of true alternatives is fixed as $p\to\infty$ , Liu and Shao (2014) have shown that even with the true $p$ -values, the BH method is unable to control FDP with a high probability. Thus, we use this condition to rule out such cases. $\Box$

Theorem 1

Assume $X_{1},\ldots,X_{p}$ are independent each other. Suppose Assumptions 1-(i), 2 and $p=o\{\exp(n^{1/3})\}$ hold. For any $\alpha\in(0,1)$ , the FDP of the RESS method satisfies

	$\displaystyle{\rm FDP}_{W}(L)$	$\displaystyle\equiv\frac{\#\{j:W_{j}\geq L,j\in\mathcal{I}_{0}\}}{\#\{j:W_{j}\geq L\}\vee 1}$
		$\displaystyle\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{3}\max_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}+\beta_{p}^{-1/2}\right\},$		(2.2)

and $\mathop{\lim\sup}_{(n,p)\to\infty}{\rm FDR}\leq\alpha$ , where $\kappa_{j}=\mathbb{E}\{(X_{j}-\mu_{j})^{3}\}/\sigma_{j}^{3}$ and $\xi_{n,p}=\max(\log p,\log n)$ .

The proof of this theorem relies on a nice large-deviation result for $t$ -statistics (Delaigle et al., 2011) but our new statistic $W_{j}$ makes that the technical details of our theory are not straightforward and cannot be obtained from existing works. Under a finite fourth moment condition, Theorem 1 reveals that the FDR of our RESS method can be controlled if $\mathcal{I}_{1}$ satisfies the technical condition on the alternative set, Assumption 2. Roughly speaking, the theorem ensures that the $\max_{j\in\mathcal{I}_{0}}\Delta_{j}$ in Proposition 2, which can be bounded by $\max_{j\in\mathcal{I}_{0}}\sup_{0\leq t\leq{2\log p}}|f_{j}(t)/f_{j}(-t)-1|$ , is small, where $f_{j}(\cdot)$ denotes the probability density function of $W_{j}$ . Note that the inequality in (2.2) is mainly due to

\displaystyle\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq L\right)}{\sum_{j}\mathbb{I}\left(W_{j}\leq-L\right)}\approx\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-L\right)}{\sum_{j}\mathbb{I}\left(W_{j}\leq-L\right)}\leq 1.

Hence, the FDR control is often quite tight because

\sum_{j\in\mathcal{I}_{1}}\mathbb{I}\left(W_{j}\leq-L\right)/\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-L\right)\mathop{\rightarrow}\limits^{p}0

in many situations, such like $(p_{1}-|\mathcal{M}|)/p_{0}\to 0$ , where $\mathcal{M}=\left\{i:|\mu_{i}|/\sigma_{i}\geq\sqrt{(2+c)\log p/n}\right\}$ for any small $c>0$ . See a proof given in the Supplementary Material.

It is interesting to further unpack the convergence rate given in this theorem. Liu and Shao (2014) has shown that the convergence rate of the bootstrap calibration is

\displaystyle{\rm FDP}_{\rm Bootstrap}

\displaystyle\lesssim\alpha+O_{p}\left\{\sqrt{\log p/n}+n^{-1}(\log p)^{2}\right\}.

The ${\rm FDR}_{W}$ indicates that our “raw” RESS method is inferior to the bootstrap calibration, especially when $p$ is very large (so that the term of order $n^{-1}$ has non-ignorable effect). Actually, the term $n^{-1}(\log p)^{3}$ can be eliminated by a simple correction as discussed below.

2.2.2 A refined procedure

By more carefully examining the FDP of the proposed RESS method, we can show that for any $0\leq t\leq 2\log p$ , we have

	$\displaystyle\widehat{\rm FDP}_{W}(t)$	$\displaystyle\equiv\frac{\#\{j:W_{j}\leq-L\}}{\#\{j:W_{j}\geq L\}\vee 1}$
		$\displaystyle\approx{\rm FDP}_{W}(t)\left(1-\frac{2t^{3}\bar{\kappa}}{9n}\right)+O_{p}\left(\sqrt{\xi_{n,p}/n}+\beta_{p}^{-1/2}\right),$		(2.3)

where $\bar{\kappa}=p_{0}^{-1}\sum_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}$ . Say, we are able to express the term of order $n^{-1}t^{3}$ to a more accurate way, which benefits from utilizing the empirical distribution $p^{-1}\sum_{j}\mathbb{I}(W_{j}\leq-t)$ to approximate $p_{0}^{-1}\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq t)$ and “surprisingly” eliminate the terms of order $n^{-1/2}(\log p)^{3/2}$ . Clearly, the $\widehat{\rm FDP}_{W}(t)$ is an underestimate of the true FDP, and in turn yields an inflation of the FDR.

This motives us to consider the test statistic $\tilde{W}_{j}=n\bar{X}_{1j}\bar{X}_{2j}/s_{1j}^{2}=T_{1j}\tilde{T}_{2j}$ , where $\tilde{T}_{2j}=\sqrt{n}\bar{X}_{2j}/s_{1j}$ . As shown in the Appendix, the FDP of using $\tilde{W}_{j}$ satisfies

\displaystyle\widehat{\rm FDP}_{\tilde{W}}(t)\approx{\rm FDP}_{\tilde{W}}(t)\left(1+\frac{5t^{3}\bar{\kappa}}{18n}\right)+O_{p}\left(\sqrt{\xi_{n,p}/n}+\beta_{p}^{-1/2}\right).

(2.4)

The difference in the asymptotic expansions of $\widehat{\rm FDP}_{W}(t)$ and $\widehat{\rm FDP}_{\tilde{W}}(t)$ is due to the different large-deviation probabilities of the $W_{j}$ and $\tilde{W}_{j}$ . This difference immediately suggests a “refined” threshold as,

\displaystyle L_{\rm refined}=\inf\left\{t>0:\frac{\#\{j:W_{j}\leq-t\}}{\#\{j:W_{j}\geq t\}\vee 1}\left\{1-\frac{4}{9}\theta(t)\right\}\leq\alpha\right\},

(2.5)

where

\theta(t)=\frac{\left(\#\{j:W_{j}\leq-t\}-\#\{j:W_{j}\geq t\}\right)-\left(\#\{j:\tilde{W}_{j}\leq-t\}-\#\{j:\tilde{W}_{j}\geq t\}\right)}{\#\{j:W_{j}\leq-t\}\vee 1}.

We show that $\theta(t)\approx-\frac{t^{3}\bar{\kappa}}{2n}$ under certain conditions, and consequently using $L_{\rm refined}$ is capable of eliminating the effect of the term $\frac{2t^{3}\bar{\kappa}}{9n}$ in (2.2.2).

The next theorem demonstrates that the refined procedure has better convergence rate in certain circumstance. Basically, we restrict our attention to the sparse case, say $p_{1}/p_{0}\to 0$ , such like $p_{1}=p^{\eta}$ for $0<\eta<1$ . This is because the term of order $t^{3}/n$ only matters when $t$ is large. From the proof of Theorem 1 we see that $L\lesssim 2\log\{p/(\beta_{p}\alpha)\}$ . In other words, only if $p_{1}$ or $\beta_{p}$ is small, the tail approximation of $\widehat{\rm FDP}_{{W}}(t)$ to ${\rm FDP}_{{W}}(t)$ would be important.

Theorem 2

Assume $X_{1},\ldots,X_{p}$ are independent each other. Suppose Assumptions 1-(ii), 2, $p=o\{\exp(n^{1/3})\}$ , $p_{1}=p^{\eta}$ and $(p_{1}/\beta_{p}-1)\log^{2}(p/\beta_{p})=O(1)$ hold. For any $\alpha\in(0,1)$ , the FDP of the refined RESS method satisfies ${\rm FDP}_{\rm refined}\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{2}+\beta_{p}^{-1/2}\right\}$ .

In this theorem, the condition $(p_{1}/\beta_{p}-1)\log^{2}(p/\beta_{p})=O(1)$ implies that the number of the signals we can identify dominates the number of those very weak signals. The RESS method with the refined threshold $L_{\rm refined}$ has the same convergence rate as the bootstrap calibration. Simultaneous testing of many hypotheses allows us to construct a “data-driven” correction of skewness without resampling. Thus, in some sense, large-scale $t$ tests with ultrahigh-dimension may not be considered as a “curse” but a “blessing” in our problem.

We summarize the refined RESS procedure as follows.

Reflection via Sampling Splitting Procedure (RESS)

•

Step 1: Randomly split the data into two parts with equal size. Compute $\bar{X}_{kj}$ and $s_{kj}^{2}$ for $k=1,2$ and $j=1,\ldots,p$ ;
•

Step 2: Obtain $W_{j}$ and $\tilde{W}_{j}$ for $j=1,\ldots,p$ . Compute $\theta(t)$ for $t\in\{|W_{j}|\}_{j=1}^{p}$ ;
•

Step 3: Find the threshold $L_{\rm refined}$ by (2.5) and reject $\mathbb{H}_{0j}$ if $W_{j}\geq L_{\rm refined}$ .

The total computation complexity is of order $O(np+p\log p)$ and the procedure can be easily implemented even without high-level programming language. The R and Excel codes are available upon request.

We want to make some remarks on the use of $\tilde{W}_{j}$ . As can be seen from (2.4), $\widehat{\mathrm{FDP}}_{\tilde{W}}(t)$ is an overestimate of the true FDP, and therefore yields a slightly more conservative procedure. In practice, if the computation is our major concern, using the RESS procedure with $\tilde{W}_{j}$ could be a safe choice.

2.2.3 Dependent case

We establish theoretical properties of the dependent $X$ case. The first result is a direct extension of Proposition 2.

Proposition 3

Assume that $n_{t}\geq 4$ . For any $\alpha\in(0,1)$ , the RESS method satisfies

\mbox{\rm FDR}\leq\min_{\epsilon\geq 0}\left\{\alpha(1+4\epsilon)+\Pr\left(\max_{j\in\mathcal{I}_{0}}\Delta_{j}^{\prime}>\epsilon\right)\right\},

where $\Delta_{j}^{\prime}=\left|\Pr(W_{j}>0\mid|W_{j}|,{\bf W}_{-j})-1/2\right|$ and ${\bf W}_{-j}=(W_{1},\ldots,W_{p})^{\top}\setminus W_{j}$ .

Again, $\Delta_{j}^{\prime}$ quantifies the effect of both the asymmetry of $X_{j}$ and the dependence between $W_{j}$ and ${\bf W}_{-j}$ on the FDR.

To achieve asymptotical FDR control, the following condition on the dependence structure is imposed.

Assumption 3

(Correlation) For each $X_{j}$ , assume that the number of variables $X_{k}$ that are dependent with $X_{j}$ is no more than $r_{p}=o(\beta_{p})$ .

This assumption implies that $X_{j}$ is independent with the other $p-r_{p}$ variables. This is certainly not the weakest condition, but is adopted here to simplify the proof.

Let $\zeta_{p}=(r_{p}/\beta_{p})^{1/2}$ . The following theorem is parallel with Theorems 1-2.

Theorem 3

Suppose Assumptions 2, 3 and $p=o\{\exp(n^{1/3})\}$ hold.

(i)

If Assumption 1-(i) hold, then for any $\alpha\in(0,1)$ , the FDP of the RESS method satisfies

\displaystyle{\rm FDP}\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{3}\max_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}+\zeta_{p}\right\},

and $\mathop{\lim\sup}_{n\to\infty}{\rm FDR}\leq\alpha$ .

(ii)

If Assumption 1-(ii), $p_{1}=o(p)$ and $(p_{1}/\beta_{p}-1)\log^{2}(p/\beta_{p})=O(1)$ hold, then for any $\alpha\in(0,1)$ , the FDP of the refined RESS method satisfies ${\rm FDP}_{\rm refined}\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{2}+\zeta_{p}\right\}$ .

This theorem implies that the RESS method remains valid asymptotically for weak dependence. Comparing Theorem 3 with Theorems 1-2, the main difference lies on the convergence rates of $\beta_{p}^{-1/2}$ and $\zeta_{p}$ ; the latter one is asymptotically larger. This can be understood because the approximation of the empirical distribution to the population one is expected to have slower rate for dependent summation. When $X_{j}$ is independent with all the other variables $X_{k}$ or only dependent with finite number of $X_{k}$ , then $r_{p}=O(1)$ . In this situation, Theorem 3 reduces to Theorems 1-2.

Remark 3 In our discussion, we restrict $p=o\{\exp(n^{1/3})\}$ which facilitates our technical derivation. In Liu and Shao (2014), a faster rate of $p$ is allowed, $p=o\{\exp(n^{1/2})\}$ . However, we also notice that the bootstrap method proposed by Liu and Shao (2014) jointly uses the empirical distribution of $T_{j}^{*},j=1,\ldots p$ , where $T_{j}^{*}$ is the bootstrap statistic for the $j$ th variable. This implies that its computation complexity is of order $O(p^{2}B+pnB)$ and it also requires $O(pB)$ storage, where $B$ is the number of bootstrap replications. For the commonly used bootstrap, say to approximate the distribution of $T_{j}$ individually by resampling (Fan et al., 2007), $p=o\{\exp(n^{1/2})\}$ is achieved if the replication of bootstrap is of order $p^{2}$ . Though our theoretical results only allow $p=o\{\exp(n^{1/3})\}$ , we conjecture that similar results also hold when $p=o\{\exp(n^{1/2})\}$ if more stringent conditions were imposed. Encouragingly, our extensive simulation results show that the refined procedure could work at least as good as the bootstrap method in terms of FDR control, even when $p$ is super-large. $\Box$

3 Extensions

In this section, we discuss two generalizations of our RESS procedure.

3.1 One-sided alternatives

In certain situations, we are interested in the case with one-sided alternatives, say without loss of generality, $\mu_{j}<0$ for all $j\in\mathcal{I}_{1}$ . We may modify the RESS by ruling out the cells with $T_{1j}>0$ and $T_{2j}>0$ from the set $\{j:W_{j}\geq L\}$ to improve the power. To be more specific, the threshold in (2.1) is modified by

\tilde{L}=\inf\left\{t>0:\frac{\#\{j:W_{j}\leq-t\}-\#\{j:W_{j}\geq t,T_{1j}>0,T_{2j}>0\}}{\#\{j:W_{j}\geq t,T_{1j}<0,T_{2j}<0\}\vee 1}\leq\alpha\right\},

and we reject $\mathbb{H}_{0j}:\mu_{j}\geq 0$ when $W_{j}\geq\tilde{L},T_{1j}<0,T_{2j}<0$ . We have the following result.

Corollary 1

Consider the one-sided hypotheses that $\mu_{j}<0$ for all $j\in\mathcal{I}_{1}$ . If the conditions in Theorem 3-(i) hold, then for any $\alpha\in(0,1)$ , the FDP of the RESS method with the threshold $\tilde{L}$ satisfies

\displaystyle{\rm FDP}(\tilde{L})\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{3}\max_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}+\zeta_{p}^{-1}\right\},

and $\mathop{\lim\sup}_{n\to\infty}{\rm FDR}(\tilde{L})\leq\alpha$ .

By using the results in Delaigle et al. (2011), the convergence rate of the normal calibration is

\displaystyle{\rm FDP}_{\Phi}

\displaystyle\lesssim\alpha+O_{p}\left\{n^{-1/2}(\log p)^{3/2}\max_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}\right\}.

Comparing this with Corollary 1, we see that the RESS strategy has removed the skewness term that describes first-order inaccuracies of the standard normal approximation. This important property is due to the fact that $W_{j}$ is more symmetric than the $t$ statistic. See the proof of Theorem 1 for details. The refined RESS procedure, which also enjoys the second-order accuracy, can be defined similarly to $L_{\rm refined}$ but we do not discuss it in detail.

3.2 Two-sample problem

We next extend the RESS procedure to two-sample problem. Assume there is another random sample $\mathcal{G}=(Z_{i1},\cdots,Z_{ip})_{i=1}^{2m}$ distributed from $(Z_{1},\cdots,Z_{p})$ . The population mean vectors of $(X_{1},\cdots,X_{p})$ and $(Z_{1},\cdots,Z_{p})$ are $(\mu^{X}_{1},\cdots,\mu^{X}_{p})$ and $(\mu^{Z}_{1},\cdots,\mu^{Z}_{p})$ , respectively. We aim to carry out $p$ two-sample tests, that is, $\mathbb{H}_{0j}:\mu^{X}_{j}=\mu^{Z}_{j}$ versus $\mathbb{H}_{1j}:\mu^{X}_{j}\neq\mu^{Z}_{j}$ , for $j=1,\ldots,p$ . The RESS procedure can be readily generalized to this two-sample problem as follows.

Firstly, similar to the sample splitting of ${\mathcal{D}}_{1}$ and ${\mathcal{D}}_{2}$ , the data $\mathcal{G}$ are also splitted randomly into two disjoint groups $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ with equal size $m$ . Based on ${\mathcal{D}}_{k}$ and $\mathcal{G}_{k}$ , two-sample $t$ -test $T_{kj}$ is defined as follows:

T_{kj}=\frac{\bar{X}_{kj}-\bar{Z}_{kj}}{\sqrt{s^{2}_{X_{kj}}/n+s^{2}_{Z_{kj}}/m}}~{}~{}\text{for }j=1,\ldots,p~{}~{}{\text{and }}k=1,2.

Here $\bar{X}_{kj}$ and $\bar{Z}_{kj}$ are the sample means of $X_{j}$ and $Z_{j}$ , while $s^{2}_{X_{kj}}$ and $s^{2}_{Z_{kj}}$ are the sample variances of $X_{j}$ and $Z_{j}$ . Finally, define $W_{j}=T_{1j}T_{2j}$ . The threshold $L$ can be then selected similarly as in (2.1), and $\mathbb{H}_{0j}$ will be rejected when $W_{j}\geq L$ .

To establish the FDR control result, the Assumptions 2 and 3 are modified as follows.

Assumption 4

(Signals) For a large $C$ , $\beta_{p}\equiv{\rm Card}\left\{j:|\mu^{X}_{j}-\mu^{Z}_{j}|\geq C\sqrt{\log p/n}\right\}\to\infty.$

Assumption 5

(Correlation) For each $X_{j}$ , assume that the number of variables $X_{k}$ that are dependent with $X_{j}$ is no more than $r_{p}=o(\beta_{p})$ . The same assumption is imposed on $(Z_{1},\cdots,Z_{p})$ .

By Theorem 2.4 in Chang et al. (2016) and the proofs for Theorem 3, we have the following result.

Corollary 2

Suppose Assumption 1-(i), Assumptions 4 and 5 and $p=o(\exp(n^{1/3}))$ hold, then for any $\alpha\in(0,1)$ , the FDP of the RESS method for the two-sample problem satisfies

\displaystyle{\rm FDP}(L)\leq\alpha+O_{p}\left\{\sqrt{\xi_{n,p}/n}+n^{-1}(\log p)^{3}+\zeta_{p}^{-1}\right\},

and $\mathop{\lim\sup}_{n\to\infty}{\rm FDR}\leq\alpha$ .

4 Numerical results

4.1 Simulation comparison

We evaluate the performance of our proposed RESS procedure on simulated data sets and compare the FDR and true positive rate (TPR) with other existing techniques. All the results are obtained from 200 replication simulations.

4.1.1 Model and benchmarks

We set the model as $X_{ij}=\mu_{j}+\varepsilon_{ij},~{}~{}1\leq i\leq n_{t},~{}~{}1\leq j\leq p$ , where the alternative signal using $\mu_{j}=\delta_{j}\sqrt{\log p/n_{t}}$ with $\delta_{j}\sim\text{Unif}(-1.5,-1)$ or $\delta_{j}\sim\text{Unif}(1,1.5)$ for $j\in\mathcal{I}_{1}$ . The random error is generated by the autoregression model $\varepsilon_{ij}=\rho\varepsilon_{i,j-1}+\epsilon_{ij}$ , where $\rho\in[0,1)$ and $\epsilon_{ij}$ ’s are i.i.d from three centered distributions: Student’s $t$ with five degrees of freedom $t(5)$ , exponential with rate one ( $\text{exp}(1)-1$ ) and a mixed distribution which consists of $\epsilon_{ij}\sim N(0,1)$ for $1\leq j\leq[p/3]$ , $\epsilon_{ij}\sim t(5)$ for $[p/3]+1\leq j\leq[2p/3]$ and $\epsilon_{ij}\sim\text{exp}(1)-1$ for $[2p/3]+1\leq j\leq p$ . When $\rho=0$ , the random errors are independent. We consider the number of alternatives as $p_{1}=[\vartheta p]$ with $\vartheta\in[0.01,0.15]$ .

The following three benchmarks are considered for comparison. The first one is the BH procedure with the $p$ -values obtained from the standard normal approximation (termed as BH for simplicity). The other two are bootstrap-based approaches. Assume $\{{\boldsymbol{X}}^{*}_{k1},\ldots{\boldsymbol{X}}^{*}_{kn_{t}}\}$ , $1\leq k\leq B$ denotes bootstrap resamples drawn by sampling randomly and $T^{*}_{kj}$ are Student’s t test statistics constructed from $\{X^{*}_{1kj}-\bar{X}_{j},\ldots,X^{*}_{n_{t}kj}-\bar{X}_{j}\}$ . One bootstrap method is to estimate the $p$ -values according to the bootstrap distribution individually by $p_{j,BI}=1/B\sum_{k=1}^{B}\mathbb{I}\{|T^{*}_{kj}|\geq|T_{j}|\}$ (Fan et al., 2007) and another one is to estimate the $p$ -values with the average $p$ bootstrap distribution together by $p_{j,BA}=1/{Bp}\sum_{k=1}^{B}\sum_{i=1}^{p}\mathbb{I}\{|T^{*}_{ki}|\geq|T_{j}|\}$ (Liu and Shao, 2014). We call these two as “I-bootstrap” and “A-bootstrap”, respectively. The A-bootstrap jointly estimates the distribution of $T_{j}$ ’s and thus can be expected to have better performance. However, the computational complexity of I-bootstrap is much lower; it is of order $O(npB)$ while that of A-bootstrap increases in a quadratic rate of $p$ . We take the number of bootstrap samples as $B=200$ in this section.

4.1.2 Results

We compare the performance of our proposed RESS method (termed as “ $\text{RESS}_{0}$ ”) and the refined RESS method (“RESS”) in a range of settings, with the BH procedure, I-bootstrap and A-bootstrap, and examine the effects of skewness, signal magnitude and correlation between variables.

Table 1: Comparison results of FDR, TPR and the average computing time under

p=5000,\rho=0

and

\vartheta=0.05

. Values in parentheses are standard deviations.

Errors		$n_{t}=50$			$n_{t}=100$
	Method	FDR(%)	TPR(%)	Time(s)	FDR(%)	TPR(%)	Time(s)
	RESS	$17.8_{\tiny{(4.6)}}$	$51.6_{\tiny{(5.5)}}$	1.75	$18.5_{\tiny{(5.0)}}$	$51.6_{\tiny{(5.8)}}$	1.84
	$\text{RESS}_{0}$	$19.7_{\tiny{(4.4)}}$	$53.8_{\tiny{(5.4)}}$	1.66	$19.9_{\tiny{(4.9)}}$	$53.1_{\tiny{(5.6)}}$	1.74
$t(5)$	BH	$16.6_{\tiny{(3.9)}}$	$52.1_{\tiny{(4.4)}}$	1.60	$17.7_{\tiny{(3.2)}}$	$52.9_{\tiny{(4.5)}}$	1.60
	A-bootstrap	$12.8_{\tiny{(3.4)}}$	$47.6_{\tiny{(4.8)}}$	404	$16.3_{\tiny{(3.1)}}$	$51.4_{\tiny{(4.5)}}$	400
	I-bootstrap	$22.3_{\tiny{(2.6)}}$	$53.2_{\tiny{(3.3)}}$	314	$23.6_{\tiny{(3.0)}}$	$54.9_{\tiny{(3.3)}}$	311
	RESS	$19.4_{\tiny{(5.0)}}$	$63.1_{\tiny{(10.1)}}$	1.76	$20.8_{\tiny{(4.9)}}$	$75.2_{\tiny{(5.8)}}$	1.85
	$\text{RESS}_{0}$	$32.2_{\tiny{(3.8)}}$	$81.5_{\tiny{(3.6)}}$	1.66	$27.3_{\tiny{(3.9)}}$	$81.4_{\tiny{(2.8)}}$	1.76
exp(1)	BH	$37.2_{\tiny{(2.7)}}$	$85.5_{\tiny{(2.9)}}$	1.70	$30.2_{\tiny{(2.7)}}$	$84.4_{\tiny{(2.5)}}$	1.63
	A-bootstrap	$18.8_{\tiny{(3.5)}}$	$62.8_{\tiny{(5.1)}}$	424	$19.7_{\tiny{(2.8)}}$	$76.0_{\tiny{(3.4)}}$	403
	I-bootstrap	$32.2_{\tiny{(3.5)}}$	$70.2_{\tiny{(4.6)}}$	332	$28.2_{\tiny{(2.6)}}$	$77.9_{\tiny{(2.6)}}$	317
	RESS	$20.7_{\tiny{(3.9)}}$	$67.7_{\tiny{(5.3)}}$	1.85	$20.6_{\tiny{(4.0)}}$	$70.0_{\tiny{(4.4)}}$	2.17
	$\text{RESS}_{0}$	$25.1_{\tiny{(3.9)}}$	$72.2_{\tiny{(4.0)}}$	1.75	$23.1_{\tiny{(3.7)}}$	$72.1_{\tiny{(3.8)}}$	2.06
Mixed	BH	$26.2_{\tiny{(2.9)}}$	$74.0_{\tiny{(3.1)}}$	1.65	$23.7_{\tiny{(3.1)}}$	$74.3_{\tiny{(3.2)}}$	1.64
	A-bootstrap	$17.5_{\tiny{(3.0)}}$	$65.1_{\tiny{(4.4)}}$	405	$19.0_{\tiny{(2.9)}}$	$70.2_{\tiny{(3.5)}}$	409
	I-bootstrap	$25.7_{\tiny{(4.4)}}$	$67.3_{\tiny{(5.6)}}$	314	$26.9_{\tiny{(4.1)}}$	$72.8_{\tiny{(4.6)}}$	319

Firstly, we set $p=5000$ and $\vartheta=0.05$ . The full sample size $n_{t}$ is taken to be $50$ or $100$ , and the target FDR level $\alpha$ is set as $0.2$ . Table 1 displays the estimated FDR, TPR and average computation time obtained by each method under three different error settings with $\rho=0$ . For the symmetric distribution $t(5)$ , the $\text{RESS}_{0}$ is able to deliver a quite accurate control but that is not the case for the other skewed errors. It performs better than the BH in terms of FDR control, but has a slightly disadvantage over the I-bootstrap. This is consistent with our theoretical analysis in Section 2.2.1. Actually, from Theorem 1 and Theorem 3-(i), when $p$ is very large, the skewness still has non-ignorable effect on the FDR control of the $\text{RESS}_{0}$ method. In contrast, we observe that the FDR levels of RESS are close to the nominal level under all the scenarios; it is clearly more effective than the I-bootstrap, $\text{RESS}_{0}$ and BH, and the difference is quite remarkable in some cases. Certainly, this is not surprising as it is a data-driven method which has second-order accuracy justified in Section 2.2.2. The power of RESS is also quite high compared to the other methods, revealing that its detection ability is not largely compromised by data-splitting.

The I-bootstrap improves slightly the accuracy of FDR control under the $\text{exp}(1)$ and mixed cases over the normal calibration, but its FDRs are still considerably larger than the nominal level. Note that the I-bootstrap calibration may need an extremely large replications $B$ , i.e. $p^{2}$ , to achieve FDR control (Liu and Shao, 2014), and therefore does not perform well with commonly used $B=200$ . The A-bootstrap method offers comparable performances to our RESS method, though it tends to be a little conservative under the $t(5)$ cases. Also, the A-bootstrap generally has smaller variations than RESS due to the use of bootstrap for estimating an overall empirical null distribution. Certainly, this gain comes partly from its computation-intensive feature; in most cases, it requires more than 200 times computational time than the RESS. In fact, as mentioned earlier, the computational complexities of A-bootstrap and RESS are of order $p^{2}B+pn_{t}B$ and $p\log p+pn_{t}$ , respectively, and hence their relative computational time could increase fast as $p$ increases. Figure 3 depicts the relative time of A-bootstrap to RESS and the FDR vaules under the $t(5)$ cases.

Next, we examine the effect of the skewness and the proportion of alternatives. As we have shown that the refined RESS performs usually better than the ${\rm RESS}_{0}$ , in what follows we focus only on the refined RESS. With respect to skewness, we evaluate the performances of various methods by varying shape parameter $\lambda$ of Gamma distribution $\Gamma(\lambda/2,2)$ , Figure 5 depicts the estimated FDR curves against the values of $\lambda$ when $p=5000$ , $n_{t}=100$ and $\vartheta=0.05$ for $\alpha=0.1$ and $\alpha=0.2$ . It can be seen that the refined RESS successfully controls FDR in an acceptable range of the target level no matter the magnitude of the skewness. Figure 4 shows the boxplots of FDP and powers under $\vartheta=0.03,0.05,0.1$ and $0.15$ when the errors are generated from $\text{exp}(1)$ . The refined RESS has stable FDP close to the nominal level $\alpha=0.2$ , even when $\vartheta$ is small. Similar results with other error distributions and signal magnitudes can be found in the Supplementary Material.

Finally, we turn to investigate the effect of the correlation level $\rho$ . Figure 6 shows the FDR curves of four methods against the values of $\rho$ when the errors are generated from the $\text{exp}(1)$ and mixed distributions. Again, it can be seen that the refined RESS and A-bootstrap result in a reasonably good FDR control in most situations, even when $\rho$ is as large as 0.8. This concurs with our asymptotic justification that the refined RESS method is still effective provided that ${X}$ satisfies certain weak dependence structure.

4.2 A real-data example

We next illustrate the proposed refined RESS procedure by an empirical analysis of the acute lymphoblastic leukemia (ALL) data, which consists of 12,256 gene probe sets for 128 adult patients enrolled in the Italian GIMEMA multi center clinical trial 0496. It is known that malignant cells in B-lineage ALL often have genetic abnormalities, which have a significant impact on the clinical course of the disease. Specifically, the molecular heterogeneity of the B-lineage ALL is well established as BCR/ABL, ALL1/AF4, E2A/PBX1 and NEG and the gene expression profiles of groups BCR/ABL and NEG are more similar to each other than to the others. In our analysis, we consider a sub-dataset of 79 B-lineage units with 37 BCR/ABL mutation and 42 NEG and use the traditional two-sample $t$ -test to examine which probe sets are differentially expressed. The dataset was previously studied by Chiaretti et al. (2005) and Bourgon et al. (2010) and is available at http://www.bioconductor.org.

Here, we consider the extension of our proposed refined RESS in two-sample case and compare it with BH procedure, A-bootstrap and I-bootrstrap over a wide range of significant levels. Both of the bootstrap sampling are repeated 200 times. Table 2 summarizes the number of probe sets differentially expressed between BCR/ABL and NEG. The BH procedure with the normal calibration tends to reject surprising more genes than RESS and A-bootstrap for various significance levels. In fact, the normality approximation seems to be violated for many of the genes as some skewness values largely deviate from zero in Figure 7. As noted earlier, the I-bootstrap needs an extremely large replications, i.e. $p^{2}$ to improve the accuracy and thus leads to the large number of rejections with $B=200$ . Figure 8 presents the scatterplot of RESS’s statistics $W_{j}$ . We can observe that all selected probe sets (red triangles) have large values of $W_{j}$ , while the unselected ones (black dots) are roughly symmetric across the horizontal lines. With the data-driven threshold, the number of differentially expressed probe sets based on our RESS appears more reasonable. The results of A-bootstrap coincide with that of RESS as expected.

Table 2: Numbers of differentially expressed probe sets under various significant levels.

$\alpha$	RESS	BH	I-bootstrap	A-bootstrap
0.05	157	163	326	141
0.1	221	238	326	222
0.15	280	334	476	310
0.2	333	414	476	397

5 Concluding Remarks

In this paper, we have proposed a multiple testing procedure, RESS, that controls FDR in the large-scale $t$ setting and offers high power to discover true signals. We give theoretical results showing that the proposed method maintains FDR control under mild conditions. The empirical performance of the refined RESS demonstrates excellent FDR control and reasonable power in comparison to other methods such as the bootstrap or normal calibrations. The ideas of the RESS procedure may be extended for controlling other rates such as per family error rate.

Appendix: Proofs

Appendix A: Lemmas

Before we present the proofs of the theorems, we first state several lemmas whose proofs can be found in the Supplementary Material. A few well-known theorems to be repeatedly used are also presented in the Supplementary Material. The first lemma characterizes the closeness between $\Pr(W_{j}\geq t)$ and ${\Pr(W_{j}\leq-t)}$ , which plays an important role in the proof. For simplicity, we suppress the dependence of $T_{kj}$ on $j$ which should not cause any confusion. Let $\tilde{\Phi}(x)=1-\Phi(x)$ , $G(t)=p_{0}^{-1}\sum_{j\in\mathcal{I}_{0}}\Pr(W_{j}\geq t)$ , $G_{-}(t)=p_{0}^{-1}\sum_{j\in\mathcal{I}_{0}}\Pr(W_{j}\leq-t)$ and $G^{-1}(y)=\inf\{t\geq 0:G(t)\leq y\}$ for $0\leq y\leq 1$ .

Lemma A.1

Suppose Assumption 1-(i) hold. For any $0\leq t\leq C\log p$ with $C>0$ ,

\frac{\Pr(T_{1}T_{2}\geq t)}{\Pr(T_{1}T_{2}\leq-t)}-1=O(a_{nt}n^{-1/2})+\frac{2t^{3}\kappa^{2}}{9n},

where $a_{nt}=({2t}+{\log n})^{1/2}$ .

The second lemma characterizes the closeness between $\Pr(\tilde{W}_{j}\geq t)$ and ${\Pr(\tilde{W}_{j}\leq-t)}$ .

Lemma A.2

Suppose Assumption 1-(ii) hold. For sufficiently large $t$ satisfying $t\lesssim\log p$ ,

\frac{\Pr(\tilde{W}\geq t)}{\Pr(\tilde{W}\leq-t)}-1=O(a_{nt}n^{-1/2}+t^{-1/2})-\frac{5t^{3}\kappa^{2}}{18n}.

The next lemma establishes the uniform convergence of $\{{p_{0}G(t)}\}^{-1}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq t)}-1$ .

Lemma A.3

Suppose Assumptions 1-(i) and 3 hold. Then, for any $b_{p}\to\infty$ and $b_{p}=o(p)$ ,

	$\displaystyle\sup_{0\leq t\leq G^{-1}(\alpha b_{p}/p)}\left\|\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq t)}{p_{0}G(t)}-1\right\|=O_{p}(\zeta_{p}),$		(A.1)
	$\displaystyle\sup_{0\leq t\leq G^{-1}_{-}(t)(\alpha b_{p}/p)}\left\|\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}{p_{0}G_{-}(t)}-1\right\|=O_{p}(\zeta_{p}).$		(A.2)

The last one is the counterpart of Lemma A.3 for $\tilde{W}_{j}$ .

Lemma A.4

Suppose Assumptions 1-(ii) and 3 hold. Then, for any $b_{p}\to\infty$ and $b_{p}=o(p)$ , $(\ref{mco1})$ and (A.2) holds if we replace ${W}_{j}$ and $G(t)$ with $\tilde{W}_{j}$ and $\tilde{G}(t)$ , where $\tilde{G}(t)=p_{0}^{-1}\sum_{j\in\mathcal{I}_{0}}\Pr(\tilde{W}_{j}\geq t)$ .

Appendix B: Proof of Theorems

In fact, we show that under a weaker condition, i.e., Assumption 3, the results in Theorems 1-2 hold similarly, and accordingly Theorem 3 is also proved.

Proof of Theorem 1. By definition, our test is equivalent to reject $\mathbb{H}_{0j}$ if $W_{j}\geq{L}$ , where

{L}=\inf\left\{t\geq 0:\sum_{j=1}^{p}\mathbb{I}(W_{j}\leq-t)\leq\alpha\max\left(\sum_{j=1}^{p}\mathbb{I}(W_{j}\geq t),1\right)\right\}.

Let $\mathcal{A}$ be a subset of $\{1,2,\ldots,p\}$ satisfying $\mathcal{A\subset}\left\{j:|\mu_{j}|/\sigma_{j}\geq 3\sqrt{\log p/n}\right\}$ and $b_{p}\equiv|\mathcal{A}|=\min(\sqrt{n\xi_{n,p}},\beta_{p})$ . By Assumption 1-(i) and Markov’s inequality, for any $\epsilon>0$ , $\Pr(\max_{j\in\mathcal{A}}|s_{kj}^{2}/\sigma^{2}_{j}-1|\geq\epsilon)=O(\sqrt{\xi_{n,p}/n}),k=1,2$ . By Assumption 2 and Lemma S.1, there exists some $c>2$ ,

	$\displaystyle\Pr\left(\sum_{j=1}^{p}\mathbb{I}(W_{j}\geq c\log p)\geq b_{p}/2\right)$
	$\displaystyle\geq\Pr\left(\sum_{j\in\mathcal{A}}\left\{\mathbb{I}(T_{1j}T_{2j}\geq c\log p)-\Pr(T_{1j}T_{2j}\geq c\log p)\right\}\geq b_{p}/2-\sum_{j\in\mathcal{A}}\Pr(T_{1j}T_{2j}\geq c\log p)\right)$
	$\displaystyle\geq 1-\frac{\sum_{j\in\mathcal{A}}\Pr(T_{1j}T_{2j}\geq c\log p)}{\left\{\sum_{j\in\mathcal{A}}\Pr(T_{1j}T_{2j}\geq c\log p)-b_{p}/2\right\}^{2}}$
	$\displaystyle\geq 1-cb_{p}^{-1}\left\{1+o(1)\right\}=1+O(\sqrt{\xi_{n,p}/n}+b_{p}^{-1}),$		(A.3)

where we use the fact that $\Pr_{j\in\mathcal{A}}(T_{1j}>\sqrt{c\log p})\geq\Pr_{j\in\mathcal{I}_{0}}(T_{1j}\geq-\sqrt{c\log p})\geq(1-p^{-1})\left\{1+o(1)\right\}$ .

Define $\eta_{p}=\sqrt{2\log\{p/(b_{p}\alpha)\}}$ . We note that $G(t)\geq\alpha b_{p}/p$ implies that $t\leq\eta_{p}^{2}$ when $p$ is large. This is because

\displaystyle G(\eta^{2}_{p})\leq 2\Pr(|T_{1}|>{\eta_{p}})\leq 2\sqrt{2/\pi}\exp(-\eta^{2}_{p}/2)/\eta_{p}\leq\alpha b_{p}/p\leq G(t),

if $p$ is sufficiently large. This, together with Lemma A.3 and Eq. (A.3) implies that $L\leq\eta_{p}$ with probability at least $O(\sqrt{\xi_{n,p}/n}+b_{p}^{-1})$ .

Therefore, by Lemma A.1, we get

\displaystyle\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq{L})}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-{L})}-1=O_{p}(\sqrt{\xi_{n,p}/n})+\frac{2{L}^{3}\max_{j\in\mathcal{I}_{0}}\kappa_{j}^{2}}{9n}\left\{1+O(a_{n{L}}/\sqrt{n}+\zeta_{p})\right\}.

(A.4)

Write

	$\displaystyle{\rm FDP}$	$\displaystyle=\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq{L}\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq{L})}=\frac{\sum_{j}\mathbb{I}\left(W_{j}\leq-{L}\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq{L})}\times\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq{L}\right)}{\sum_{j}\mathbb{I}\left(W_{j}\leq-{L}\right)}$
		$\displaystyle\leq\alpha\times R({L}).$

Note that $R({L})\leq{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq{L}\right)}/{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-{L}\right)}$ , and thus $\mathop{\lim\sup}_{n\to\infty}{\rm FDP}\leq\alpha$ in probability by (A.4). Then, for any $\epsilon>0$ ,

\mathrm{FDR}\leq(1+\epsilon)\alpha R({L})+\Pr\left(\mathrm{FDP}\geq(1+\epsilon)\alpha R({L})\right),

from which the second part of this theorem is proved. $\Box$

Proof of Theorem 2. We firstly establish a lower bound for $L$ so that the condition in Lemma A.2 is valid. Notice

	$\displaystyle\sum_{j}\mathbb{I}(W_{j}\leq-t)$	$\displaystyle\geq\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)\approx p_{0}G_{-}(t),$
	$\displaystyle\alpha\sum_{j}\mathbb{I}(W_{j}\geq t)$	$\displaystyle\leq\alpha\left\{p_{1}+\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq t)\right\}\approx\alpha\left\{p_{1}+p_{0}G(t)\right\}.$

Hence, we can conclude that $L\gtrsim G^{-1}(\alpha p_{1}/\{p_{0}(1-\alpha)\})$ .

For $\log\{p_{0}(1-\alpha)/(\alpha p_{1})\}\lesssim t\lesssim\log\{p/(b_{p}\alpha)\}$ ,

	$\displaystyle\theta(t)=$	$\displaystyle\frac{\left\{\sum_{j}\mathbb{I}(W_{j}\leq-t)-\sum_{j}\mathbb{I}(\tilde{W}_{j}\leq-t)\right\}-\left\{\sum_{j}\mathbb{I}(W_{j}\geq t)-\sum_{j}\mathbb{I}(\tilde{W}_{j}\geq t)\right\}}{\sum_{j}\mathbb{I}(W_{j}\leq-t)}$
	$\displaystyle=$	$\displaystyle\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}{{1\vee\sum_{j}\mathbb{I}(W_{j}\leq-t)}}\times\Bigg{[}\frac{\sum_{j\in\mathcal{I}_{0}}\left\{\mathbb{I}(W_{j}\leq-t)-\mathbb{I}(W_{j}\geq t)\right\}}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}+\frac{\sum_{j\in\mathcal{I}_{0}}\left\{\mathbb{I}(\tilde{W}_{j}\geq t)-\mathbb{I}(\tilde{W}_{j}\leq-t)\right\}}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}$
		$\displaystyle+\frac{\sum_{j\in\mathcal{I}_{1}}\left\{\mathbb{I}(\tilde{W}_{j}\geq t)-\mathbb{I}(W_{j}\geq t)\right\}}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}+\frac{\sum_{j\in\mathcal{I}_{1}}\left\{\mathbb{I}(W_{j}\leq-t)-\mathbb{I}(\tilde{W}_{j}\leq-t)\right\}}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}\Bigg{]}$
	$\displaystyle\equiv$	$\displaystyle\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}{{1\vee\sum_{j}\mathbb{I}(W_{j}\leq-t)}}(D_{1}+D_{2}+D_{3}+D_{4})$

By Lemmas A.1-A.4, we get

	$\displaystyle D_{1}$	$\displaystyle=O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right)-\frac{2t^{3}\bar{\kappa}}{9n},$
	$\displaystyle D_{2}$	$\displaystyle=O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}+(\log p)^{-1}\right)-\frac{5t^{3}\bar{\kappa}}{18n}.$

Next, we deal with $D_{3}$ , and the $D_{4}$ can be bounded in a similar way to $D_{3}$ . Let $\mathcal{C}_{p}$ denote the event that $\left\{\max_{j\in\mathcal{I}_{1}}|s_{kj}^{2}/\sigma^{2}_{j}-1|<\delta_{n},k=1,2\right\}$ and $\delta_{n}=c\sqrt{n^{-1}\log p}$ for some sufficiently large $c>0$ . By Assumption 1-(ii), we have

\displaystyle\Pr\left\{\max_{j\in\mathcal{I}_{1}}|s_{kj}^{2}/\sigma^{2}_{j}-1|\geq\delta_{n}\right\}=O(\sqrt{\xi_{n,p}/n}),

and thus ${\Pr(\mathcal{C}_{p}^{c})}=O(\sqrt{\xi_{n,p}/n})$ holds.

Conditional on the event $\mathcal{C}_{p}$ , we have

	$\displaystyle\|D_{3}\|$	$\displaystyle=\frac{\left\|\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(W_{j}\geq t)-\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(\tilde{W}_{j}\geq t)\right\|}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}$
		$\displaystyle\leq\frac{\sum_{j\in\mathcal{I}_{1}}\left\|\mathbb{I}(W_{j}\geq t)-\mathbb{I}(\tilde{W}_{j}\geq t)\right\|}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}$
		$\displaystyle=\frac{\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(t\leq W_{j}\leq t(1+3\delta_{n}))}{{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}}+\frac{\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(t(1-3\delta_{n})\leq W_{j}\leq t)}{{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}}$
		$\displaystyle\equiv D_{31}+D_{32}.$

We only need to deal with $D_{31}$ and $D_{32}$ follows similarly. By the Markov’s inequality, we get

	$\displaystyle\Pr(D_{31}>\epsilon)\leq$	$\displaystyle p_{0}^{-1}\frac{\sum_{j\in\mathcal{I}_{1}}\Pr(t\leq W_{j}\leq t(1+3\delta_{n}))}{\epsilon G(t)(1+\zeta_{p})}$
	$\displaystyle=$	$\displaystyle p_{0}^{-1}\frac{\sum_{j\in\mathcal{I}_{1}\setminus\mathcal{S}_{p}}\Pr(t\leq W_{j}\leq t(1+3\delta_{n}))+\sum_{j\in\mathcal{S}_{p}}\Pr(t\leq W_{j}\leq t(1+3\delta_{n}))}{\epsilon G(t)(1+\zeta_{p})}$
	$\displaystyle\leq$	$\displaystyle p_{0}^{-1}\frac{\sum_{j\in\mathcal{I}_{1}\setminus\mathcal{S}_{p}}\Pr(t\leq W_{j}\leq t(1+3\delta_{n}))+C\beta_{p}p^{-1}}{\epsilon G(t)(1+\zeta_{p})}$
	$\displaystyle\leq$	$\displaystyle p_{0}^{-1}\frac{\sum_{j\in\mathcal{I}_{1}\setminus\mathcal{S}_{p}}\Pr(t\leq W_{j}\leq t(1+3\delta_{n}))+C\beta_{p}p^{-1}}{\epsilon t^{-1}\exp(-t)(1+\zeta_{p})},$
	$\displaystyle\leq$	$\displaystyle C\frac{(p_{1}-\beta_{p})t^{2}\exp(t)\delta_{n}+\beta_{p}p^{-1}t\exp(t)}{p_{0}\epsilon(1+\zeta_{p})}$
	$\displaystyle\leq$	$\displaystyle C\frac{(p_{1}/\beta_{p}-1)\log^{2}(p/\beta_{p})\delta_{n}}{\epsilon(1+\zeta_{p})}+O(p^{-1}\log p)=O(\delta_{n}),$

where $\mathcal{S}_{p}=\left\{j:|\mu_{j}|/\sigma_{j}\geq 3\sqrt{\log p/n}\right\}$ , and $C$ is some positive constant. The second equality is due to Lemma A.3, the fourth inequality comes from the fact that $t\lesssim\log\{p/(\beta_{p}\alpha)\}$ and the last inequality uses the condition $(p_{1}/\beta_{p}-1)\log^{2}(p/\beta_{p})=O(1)$ .

Finally, collecting all the terms of $D_{k},k=1,\ldots,4$ , we conclude that for any $G^{-1}(\alpha p_{1}/\{p_{0}(1-\alpha)\})\lesssim t\lesssim\log\{p/(\beta_{p}\alpha)\}$ ,

\theta(t)=\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}{{1\vee\sum_{j}\mathbb{I}(W_{j}\leq-t)}}\left[-\frac{t^{3}\bar{\kappa}}{2n}+O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right)\right].

By using similar arguments given in the Supplemental Material we can show that $\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}{{1\vee\sum_{j}\mathbb{I}(W_{j}\leq-t)}}=1+O_{p}(p^{\eta-1})$ . Thus, we conclude that $\theta(t)=-\frac{t^{3}\bar{\kappa}}{2n}\left\{1+O_{p}(p^{\eta-1})\right\}+O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right)$ .

Accordingly, we obtain

		$\displaystyle\mathrm{FDP}_{W}({L})=\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq{L})}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq{L})}$
		$\displaystyle=\frac{\sum_{j}\mathbb{I}\left(W_{j}\leq-{L}\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq{L})}\times\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq{L})}{\sum_{j}\mathbb{I}\left(W_{j}\leq-{L}\right)}$
		$\displaystyle=\frac{\sum_{j}\mathbb{I}\left(W_{j}\leq-{L}\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq{L})}\times\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\geq{L})}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-{L}\right)}\times\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-{L})}{\sum_{j}\mathbb{I}\left(W_{j}\leq-{L}\right)}$
		$\displaystyle=\frac{\sum_{j}\mathbb{I}\left(W_{j}\leq-{L}\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq{L})}\left\{1+\frac{2t^{3}\bar{\kappa}}{9n}+O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right)\right\}\left\{1+O_{p}(p^{\eta-1})\right\}$
		$\displaystyle=\frac{\sum_{j}\mathbb{I}\left(W_{j}\leq-{L}\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq{L})}\left\{1-\frac{4}{9}\theta({L})\right\}\left\{1+O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right)\right\}$
		$\displaystyle\leq\alpha+O_{p}\left(\sqrt{\xi_{n,p}/n}+\zeta_{p}\right).$

The proof is completed. $\Box$

References

Barber and Candès (2015) Barber, R. F. and Candès, E. J. (2015), “Controlling the false discovery rate via knockoffs,” The Annals of Statistics, 43, 2055–2085.
Barber et al. (2019) Barber, R. F., Candès, E. J., and Samworth, R. J. (2019), “Robust inference with knockoffs,” The Annals of Statistics, to appear.
Benjamini and Hochberg (1995) Benjamini, Y. and Hochberg, Y. (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 57, 289–300.
Benjamini and Yekutieli (2001) Benjamini, Y. and Yekutieli, D. (2001), “The control of the false discovery rate in multiple testing under dependency,” The Annals of Statistics, 29, 1165–1188.
Bourgon et al. (2010) Bourgon, R., Gentleman, R., and Huber, W. (2010), “Independent filtering increases detection power for high-throughput experiments,” Proceedings of the National Academy of Sciences, 107, 9546–9551.
Candes et al. (2018) Candes, E., Fan, Y., Janson, L., and Lv, J. (2018), “Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80, 551–577.
Cao and Kosorok (2011) Cao, H. and Kosorok, M. R. (2011), “Simultaneous critical values for t-tests in very high dimensions,” Bernoulli: official journal of the Bernoulli Society for Mathematical Statistics and Probability, 17, 347–394.
Chang et al. (2016) Chang, J., Shao, Q.-M., and Zhou, W.-X. (2016), “Cramér-type moderate deviations for Studentized two-sample $U$ -statistics with applications,” The Annals of Statistics, 44, 1931–1956.
Chiaretti et al. (2005) Chiaretti, S., Li, X., Gentleman, R., Vitale, A., Wang, K. S., Mandelli, F., Foa, R., and Ritz, J. (2005), “Gene expression profiles of B-lineage adult acute lymphocytic leukemia reveal genetic patterns that identify lineage derivation and distinct mechanisms of transformation,” Clinical cancer research, 11, 7209–7219.
Delaigle et al. (2011) Delaigle, A., Hall, P., and Jin, J. (2011), “Robustness and accuracy of methods for high dimensional data analysis based on Student’s t-statistic,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 283–301.
Efron (2004) Efron, B. (2004), “Large-scale simultaneous hypothesis testing: the choice of a null hypothesis,” Journal of the American Statistical Association, 99, 96–104.
Fan et al. (2007) Fan, J., Hall, P., and Yao, Q. (2007), “To how many simultaneous hypothesis tests can normal, student’s t or bootstrap calibration be applied?” Journal of the American Statistical Association, 102, 1282–1288.
Liu and Shao (2014) Liu, W. and Shao, Q.-M. (2014), “Phase transition and regularized bootstrap in large-scale $t$ -tests with false discovery rate control,” The Annals of Statistics, 42, 2003–2025.
Meinshausen et al. (2009) Meinshausen, N., Meier, L., and Bühlmann, P. (2009), “P-values for high-dimensional regression,” Journal of the American Statistical Association, 104, 1671–1681.
Petrov (2012) Petrov, V. V. (2012), Sums of independent random variables, vol. 82, Springer Science & Business Media.
Storey et al. (2004) Storey, J. D., Taylor, J. E., and Siegmund, D. (2004), “Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 187–205.
Wang (2005) Wang, Q. (2005), “Limit theorems for self-normalized large deviation,” Electronic Journal of Probability, 10, 1260–1285.
Wasserman and Roeder (2009) Wasserman, L. and Roeder, K. (2009), “High dimensional variable selection,” The Annals of Statistics, 37, 2178–2201.

Supplementary Material for “A New Procedure for Controlling False Discovery Rate in Large-Scale $t$ -tests”

This supplementary material contains the proofs of some technical lemmas and corollaries, and additional simulation results.

Additional Lemmas

The first one is the large deviation result for the Student’s $t$ statistic $T$ . See also Wang (2005).

Lemma S.1

[Delaigle et al. (2011)] Let $B>1$ denote a constant. Then,

\frac{\Pr(T>x)}{1-\Phi(x)}=\exp\left\{-\frac{x^{3}\kappa}{3\sqrt{n}}\right\}\left[1+\theta(n,x)\left\{(1+|x|)n^{-1/2}+(1+|x|)^{4}n^{-1}\right\}\right]

as $n\to\infty$ , where the function $\theta$ is bounded in absolute value by a finite, positive constant $C_{1}(B)$ (depending only on $B$ ), uniformly in all distributions of $X$ for which $\mathbb{E}|X|^{4}\leq B$ , $\mathbb{E}(X^{2})=1$ and $\mathbb{E}(X)=0$ , and uniformly in $x$ satisfying $0\leq x\leq Bn^{1/4}$ .

The second one is a standard large deviation result for the mean; See Theorem VIII-4 in Petrov (2012).

Lemma S.2 (Large deviation for the mean)

Suppose that $X_{1},\ldots,X_{n}$ are i.i.d random variables with mean zero and variance $\sigma^{2}$ , satisfying Assumption 1-(ii). Then for any $0\leq x\leq cn^{1/6}$ and $c>0$ ,

\frac{\Pr(\sqrt{n}\bar{X}/\sigma>x)}{1-\Phi(x)}=\exp\left\{\frac{x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+o(1)\right\}.

The third lemma is a large deviation result for $T_{j}/(s_{j}/\sigma_{j})$ .

Lemma S.3

Suppose Assumptions 1-(ii) hold. Then for $x\to\infty$ and $x=o(n^{1/6})$ ,

\frac{\Pr(T/(s/\sigma)>x)}{1-\Phi(x)}=\exp\left\{-\frac{5x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O\left(x^{-2}+x/\sqrt{n}\right)\right\}.

Proof. Without loss of generality, we assume that $\sigma^{2}=1$ . First of all, we deal with $\Pr(\sqrt{n}\bar{X}/m_{2}>x)$ , where $m_{2}=n^{-1}\sum_{i=1}^{n}X_{i}^{2}$ . Observe

\displaystyle\Pr(\sqrt{n}\bar{X}/m_{2}>x)=\Pr\left(n^{-1/2}\sum_{i=1}^{n}Y_{i}\geq x\right),

where $Y_{i}=X_{i}-c_{n,x}(X_{i}^{2}-1)$ and $c_{n,x}=n^{-1/2}x$ . Simple calculation yields

	$\displaystyle\mathrm{Var}(Y_{i})$	$\displaystyle=(1-2c_{n,x}\kappa)+O(c^{2}_{n}),$
	$\displaystyle E(Y_{i}^{3})$	$\displaystyle=\kappa-3c_{n,x}\mathbb{E}X_{i}^{4}+3c_{n,x}+O(c^{2}_{n}).$

By Lemma S.2, we have

	$\displaystyle\frac{\Pr\left(\frac{n^{-1/2}\sum_{i=1}^{n}Y_{i}}{\sqrt{\mathrm{Var}(Y)}}\geq\frac{x}{\sqrt{\mathrm{Var}(Y)}}\right)}{\tilde{\Phi}(x/\sqrt{\mathrm{Var}(Y)})}$	$\displaystyle=\exp\left\{\frac{x^{3}\kappa_{Y}}{6\sqrt{n}\mathrm{Var}^{3/2}{Y}}\right\}\left\{1+O(x/\sqrt{n})\right\}$
		$\displaystyle=\exp\left\{\frac{x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O(x/\sqrt{n})\right\},$

where $\kappa_{Y}=E(Y_{i}^{3})/\mathrm{Var}^{3/2}(Y_{i})$ .

By using the fact that for $x\to\infty$ and $x=o(a_{n}^{-1/2})$ ,

\tilde{\Phi}(x(1+a_{n}))=\tilde{\Phi}(x)\exp(-x^{2}{a_{n}})\left\{1+O(x^{-2})\right\},

we have

\displaystyle{\tilde{\Phi}(x/\sqrt{\mathrm{Var}(Y)})}={\tilde{\Phi}(x)}\exp\left\{-x^{2}c_{n,x}\kappa\right\}\left\{1+O(x^{-2})\right\}.

Thus,

	$\displaystyle\Pr(\sqrt{n}\bar{X}/m_{2}>x)$	$\displaystyle={\tilde{\Phi}(x)}\exp\left\{-x^{2}c_{n,x}\kappa+\frac{x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O\left(x^{-2}+x/\sqrt{n}\right)\right\}$
		$\displaystyle={\tilde{\Phi}(x)}\exp\left\{-\frac{5x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O\left(x^{-2}+x/\sqrt{n}\right)\right\}.$

Finally, we show that ${\Pr(T/s>x)}$ and $\Pr(\sqrt{n}\bar{X}/m_{2}>x)$ are close enough. Note that

	$\displaystyle\Pr(T/s>x)$	$\displaystyle=\Pr(\sqrt{n}\bar{X}/(m_{2}-\bar{X}^{2})>x)$
		$\displaystyle=\Pr(x\bar{X}^{2}+\sqrt{n}\bar{X}-m_{2}x>0)$
		$\displaystyle=\Pr\left(\frac{\sqrt{n}\bar{X}}{m_{2}}>\frac{-{n}+\sqrt{n^{2}+4nx^{2}m_{2}}}{2xm_{2}}\right)+\Pr\left(\frac{\sqrt{n}\bar{X}}{m_{2}}<\frac{-{n}-\sqrt{n^{2}+4nx^{2}m_{2}}}{2xm_{2}}\right)$
		$\displaystyle=\Pr\left(\frac{\sqrt{n}\bar{X}}{m_{2}}>x\left\{1+O(x^{2}n^{-1})\right\}\right)+o(x/\sqrt{n})\tilde{\Phi}(x)$
		$\displaystyle=\tilde{\Phi}(x\left\{1+O(x^{2}n^{-1})\right\})\exp\left\{-\frac{5x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O\left(x^{-2}+x/\sqrt{n}\right)\right\}$
		$\displaystyle=\tilde{\Phi}(x)\exp\left\{-\frac{5x^{3}\kappa}{6\sqrt{n}}\right\}\left\{1+O\left(x^{-2}+x/\sqrt{n}\right)\right\}.$

$\Box$

Proof of Lemmas and Propositions

Proof of Lemma A.1. Recalling $p=\exp\{o(n^{1/3})\}$ , we have $t=o(n^{1/3})$ and $a_{nt}=o(n^{1/6})$ . Let ${\mathcal{A}}_{p}=\{v:t/a_{nt}<|v|<a_{nt}\}$ . Then,

	$\displaystyle\frac{\Pr(T_{1}T_{2}>t)}{\Pr(T_{1}T_{2}<-t)}-1$
	$\displaystyle=\frac{\Pr(T_{1}T_{2}>t,{\mathcal{A}}_{p})-\Pr(T_{1}T_{2}<-t,{\mathcal{A}}_{p})}{\Pr(T_{1}T_{2}<-t)}+\frac{\Pr(T_{1}T_{2}>t,{\mathcal{A}}^{c}_{p})-\Pr(T_{1}T_{2}<-t,{\mathcal{A}}^{c}_{p})}{\Pr(T_{1}T_{2}<-t)}$
	$\displaystyle\equiv C_{1}+C_{2}.$

Firstly, for the term $C_{2}$ ,

\displaystyle C_{2}

\displaystyle=

\displaystyle\left[\frac{\Pr(T_{1}T_{2}>t,{\mathcal{A}}^{c}_{p})}{P(Z_{1}Z_{2}>t)}-\frac{\Pr(T_{1}T_{2}<-t,{\mathcal{A}}^{c}_{p})}{\Pr(Z_{1}Z_{2}<-t)}\right]\frac{\Pr(Z_{1}Z_{2}<-t)}{\Pr(T_{1}T_{2}<-t)},

where $Z_{1}$ and $Z_{2}$ are two independent $N(0,1)$ variables. From the proof given later, it can be easily see that ${\Pr(Z_{1}Z_{2}<-t)}/{\Pr(T_{1}T_{2}<-t)}\rightarrow 1$ uniformly in $0\leq t\leq C\log p$ . Thus, in what follows we mainly focus on the rate of ${\Pr(T_{1}T_{2}>t,{\mathcal{A}}^{c}_{p})}/{\Pr(Z_{1}Z_{2}>t)}$ . The other term $\Pr(T_{1}T_{2}<-t,{\mathcal{A}}^{c}_{p})$ can be handled similarly.

Note that $\Pr(T_{1}T_{2}>t,{\mathcal{A}}^{c}_{p})\leq 2\Pr(|T_{1}|>a_{nt})$ , $\Pr(Z_{1}Z_{2}>t)\geq\left\{\Pr(Z_{1}>\sqrt{t})\right\}^{2}$ . By the inequality

\frac{x}{x^{2}+1}\phi(x)<\tilde{\Phi}(x)<\phi(x)/x,\ \ \mbox{for all}\ x

and the large deviation formula for the $t$ -statistic (Lemma S.1), we obtain that

	$\displaystyle\frac{\Pr(T_{1}T_{2}>t,{\mathcal{A}}^{c}_{p})}{\Pr(Z_{1}Z_{2}>t)}$	$\displaystyle\leq\frac{2\Pr(\|T_{1}\|>a_{nt})}{\left\{\Pr(Z_{1}>\sqrt{t})\right\}^{2}}$
		$\displaystyle\leq c\exp\left\{-\frac{1}{2}(a_{nt}^{2}-t-t)\right\}\frac{t}{a_{nt}}$
		$\displaystyle=O\left(\sqrt{t/n}\right),$

thus we claim that $C_{2}=O(\sqrt{t/n})$ .

Next, we deal with the main term $C_{1}$ . Denote ${\mathcal{A}}_{p}^{+}=\{v:t/a_{nt}<v<a_{nt}\}$ . Observe

		$\displaystyle\Pr(T_{1}>t/T_{2},{\mathcal{A}}_{p}^{+})$
	$\displaystyle=$	$\displaystyle-\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\exp\left\{-\frac{t^{3}\kappa}{3v^{3}\sqrt{n}}\right\}\left\{1+O({t}/{(v\sqrt{n})})\right\}d\tilde{\Phi}(v)\exp\left\{-\frac{v^{3}\kappa}{3\sqrt{n}}\right\}\left\{1+O(v/\sqrt{n})\right\}$
	$\displaystyle=$	$\displaystyle\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{-\frac{\{(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}dv\left\{1+O(a_{nt}/\sqrt{n})\right\}$
		$\displaystyle+\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\tilde{\Phi}(v)\exp\left\{-\frac{\{(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}\frac{v^{2}\kappa}{\sqrt{n}}dv\left\{1+O(a_{nt}/\sqrt{n})\right\}$
	$\displaystyle\equiv$	$\displaystyle R_{1}+R_{2},$

where we use Lemma S.1 again. Note that

\displaystyle R_{2}

\displaystyle\leq\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{-\frac{\{(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}\frac{v\kappa}{\sqrt{n}}dv\left\{1+O(a_{nt}/\sqrt{n})\right\},

and accordingly $R_{2}=R_{1}O(a_{nt}/\sqrt{n})$ . Hence,

\Pr(T_{1}T_{2}>t,{\mathcal{A}}_{p}^{+})=\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{-\frac{\{(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}dv\left\{1+O(a_{nt}/\sqrt{n})\right\}.

Similarly,

	$\displaystyle\Pr(T_{1}T_{2}>t,{\mathcal{A}}_{p}^{-})$	$\displaystyle=\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{\frac{\{(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}dv\left\{1+O(a_{nt}/\sqrt{n})\right\},$
	$\displaystyle\Pr(T_{1}T_{2}<-t,{\mathcal{A}}_{p}^{+})$	$\displaystyle=\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{\frac{\{(t/v)^{3}-v^{3}\}\kappa}{3\sqrt{n}}\right\}dv\left\{1+O(a_{nt}/\sqrt{n})\right\},$
	$\displaystyle\Pr(T_{1}T_{2}<-t,{\mathcal{A}}_{p}^{-})$	$\displaystyle=\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)\exp\left\{\frac{\{-(t/v)^{3}+v^{3}\}\kappa}{3\sqrt{n}}\right\}dv\left\{1+O(a_{nt}/\sqrt{n})\right\}.$

The Taylor’s expansion yields that

\displaystyle\Pr(T_{1}T_{2}>t)-\Pr(T_{1}T_{2}<-t)=\frac{4t^{3}\kappa^{2}}{9n}\int_{{\mathcal{A}}^{+}_{p}}\tilde{\Phi}({t}/{v})\phi(v)dv\left\{1+O(a_{nt}/\sqrt{n})\right\}.

Consequently, we easily get that

\displaystyle C_{1}=\frac{2t^{3}\kappa^{2}}{9n}+O(a_{nt}/\sqrt{n}),

from which we obtain the assertion. $\Box$

Proof of Lemma A.2. The proof is similar to Lemma A.1 but using the large deviation result for $T/(s/\sigma)$ obtained in Lemma S.3 and for the mean. $\Box$

Proof of Lemma A.3. We prove (A.1) and the proof of (A.2) follows similarly. Here $W_{j}=T_{1j}T_{2j}$ . Clearly, $G(t)$ is a deceasing and continuous function. Let $z_{0}<z_{1}<\cdots<z_{d_{p}}\leq 1$ and $t_{i}=G^{-1}(z_{i})$ , where $z_{0}=b_{p}/p,z_{i}=b_{p}/p+c_{p}\exp(i^{\delta})/p,d_{p}=[\{\log((p-b_{p})/c_{p})\}^{1/\delta}]$ with $c_{p}/b_{p}\to 0$ and $0<\delta<1$ . Note that $G(t_{i})/G(t_{i+1})=1+o(1)$ uniformly in $i$ . Thus, it is enough to obtain the convergence rate of

D_{p}=\sup_{0\leq i\leq d_{p}}\left|\frac{\sum_{j\in\mathcal{I}_{0}}\left\{\mathbb{I}(W_{j}>t_{i})-\Pr(W_{j}>t_{i})\right\}}{p_{0}G(t_{i})}\right|.

Define $\mathcal{S}_{j}=\{k\in\mathcal{I}_{0}:\,\,X_{k}\,\,\mbox{is dependent with}\,\,X_{j}\}$ and further

D(t)=\mathbb{E}\left[\sum_{j\in\mathcal{I}_{0}}\left\{\mathbb{I}(W_{j}>t)-\Pr(W_{j}>t)\right\}^{2}\right].

It is noted that

	$\displaystyle D(t)$	$\displaystyle=\sum_{j\in\mathcal{I}_{0}}\sum_{k\in\mathcal{S}_{j}}\mathbb{E}\left[\left\{\mathbb{I}(W_{j}>t)-\Pr(W_{j}>t)\right\}\left\{\mathbb{I}(W_{k}>t)-\Pr(W_{k}>t)\right\}\right]$
		$\displaystyle\leq r_{p}p_{0}G(t).$

We then obtain

	$\displaystyle\Pr(D_{p}\geq\epsilon)$	$\displaystyle\leq\sum_{i=0}^{d_{p}}\Pr\left(\left\|\frac{\sum_{j\in\mathcal{I}_{0}}[\mathbb{I}(W_{j}>t_{i})-\Pr(W_{j}>t_{i})]}{p_{0}G(t_{i})}\right\|\geq\epsilon\right)$
		$\displaystyle\leq\frac{1}{\epsilon^{2}}\sum_{i=0}^{d_{p}}\frac{1}{p^{2}_{0}G^{2}(t_{i})}D(t_{i})$
		$\displaystyle\leq\frac{r_{p}}{\epsilon^{2}}\sum_{i=0}^{d_{p}}\frac{1}{p_{0}G(t_{i})}.$

Moreover, observe that

			$\displaystyle\sum_{i=0}^{d_{p}}\frac{1}{p_{0}G(t_{i})}=\frac{p}{p_{0}}\left(\frac{1}{b_{p}}+\sum_{i=1}^{d_{p}}\frac{1}{b_{p}+c_{p}e^{i^{\delta}}}\right)$
		$\displaystyle\leq$	$\displaystyle c\left(\frac{1}{b_{p}}+c_{p}^{-1}\sum_{i=1}^{d_{p}}\frac{1}{1+e^{i^{\delta}}}\right)\leq c/c_{p}\{1+o(1)\}.$

Because $c_{p}$ can be made arbitrarily large as long as $c_{p}/b_{p}\to 0$ , we have $D_{p}=O_{p}(\sqrt{r_{p}/b_{p}})$ .

Proof of Lemma A.4. By using the same arguments given in the proof of Lemma A.3, this lemma can be proved easily and thus omitted. $\Box$

Proof of Proposition 2

We prove this proposition for $L_{+}$ . The result for $L$ can be obtained similarly. Fix $\epsilon>0$ and for any threshold $t>0$ , define

R_{\epsilon}(t)=\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq t,\Delta_{j}\leq\epsilon\right)}{1+\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-t\right)}.

Consider the event that $\mathcal{A}=\{\Delta\equiv\max_{j\in\mathcal{I}_{0}}\Delta_{j}\leq\epsilon\}$ . Furthermore, for a threshold rule $L=T({\bf W})$ mapping statistics ${\bf W}$ to a threshold $L\geq 0$ , for each index $j=1,\ldots,p$ , we define

L_{j}=T\left(W_{1},\ldots,W_{j-1},|W_{j}|,W_{j+1},\ldots,W_{p}\right)\geq 0

i.e. the threshold that we would obtain if $\mathrm{sgn}(W_{j})$ were set to 1.

Then for the RESS method with the threshold $L_{+}$ , we can write

	$\displaystyle\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq L_{+},\Delta_{j}\leq\epsilon\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq L_{+})}$	$\displaystyle=\frac{1+\sum_{j}\mathbb{I}\left(W_{j}\leq-L_{+}\right)}{1\vee\sum_{j}\mathbb{I}(W_{j}\geq L_{+})}\times\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq L_{+},\Delta_{j}\leq\epsilon\right)}{1+\sum_{j}\mathbb{I}\left(W_{j}\leq-L_{+}\right)}$
		$\displaystyle\leq\alpha\times R_{\epsilon}(L_{+}).$

It is crucial to get an upper bound for $\mathbb{E}\{R_{\epsilon}(L_{+})\}$ . We have

$\displaystyle\mathbb{E}\{R_{\epsilon}(L)\}$	$\displaystyle=\mathbb{E}\left[\frac{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\geq L,\Delta_{j}\leq\epsilon\right)}{1+\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-L_{+}\right)}\right]$
	$\displaystyle=\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\geq L_{j},\Delta_{j}\leq\epsilon\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}$
	$\displaystyle=\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left[\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\geq L_{j},\Delta_{j}\leq\epsilon\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\mid\|W_{j}\|\right\}\right]$
	$\displaystyle=\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\Pr\left(W_{j}>0\mid\|W_{j}\|\right)\mathbb{I}\left(\|W_{j}\|\geq L_{j},\Delta_{j}\leq\epsilon\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\},$	(S.5)

where the last step holds since the only unknown is the sign of $W_{j}$ after conditioning on $|W_{j}|$ . By definition of $\Delta_{j}$ , we have $\Pr\left(W_{j}>0\mid|W_{j}|\right)\leq 1/2+\Delta_{j}$ .

Hence,

	$\displaystyle\mathbb{E}$	$\displaystyle\{R_{\epsilon}(L_{+})\}$
		$\displaystyle\leq\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{(\frac{1}{2}+\Delta_{j})\mathbb{I}\left(\|W_{j}\|\geq L_{j},\Delta_{j}\leq\epsilon\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}$
		$\displaystyle\leq(\frac{1}{2}+\epsilon)\left[\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\geq L_{j},\Delta_{j}\leq\epsilon\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}+\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\leq-L_{j}\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}\right]$
		$\displaystyle=(\frac{1}{2}+\epsilon)\left[\mathbb{E}\{R_{\epsilon}(L_{+})\}+\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\leq-L_{j}\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}\right].$

Finally, the sum in the last expression can be simplified as: if for all null $j$ , $W_{j}>-L_{j}$ , then the sum is equal to zero, while otherwise,

\displaystyle\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\leq-L_{j}\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{j}\right)}\right\}=\sum_{j\in\mathcal{I}_{0}}\mathbb{E}\left\{\frac{\mathbb{I}\left(W_{j}\leq-L_{j}\right)}{1+\sum_{k\in\mathcal{I}_{0},k\neq j}\mathbb{I}\left(W_{k}\leq-L_{k}\right)}\right\}=1,

where the first step comes from the fact: for any $j,k$ , if $W_{j}\leq-\min(L_{j},L_{k})$ and $W_{k}\leq-\min(L_{j},L_{k})$ , then $L_{j}=L_{k}$ . See Barber et al. (2019) for a proof.

Accordingly, we have

\mathbb{E}\{R_{\epsilon}(L_{+})\}\leq\frac{1/2+\epsilon}{1/2-\epsilon}\leq 1+4\epsilon.

Consequently, the assertion of this proposition holds. $\Box$

Proof of Proposition 3. The proof follows similarly to that of Proposition 2 but uses $\Pr\left(W_{j}>0\mid|W_{j}|,{\bf W}_{-j}\right)$ to replace $\Pr\left(W_{j}>0\mid|W_{j}|\right)$ in (S.5). $\Box$

Discussion on $R(t)$ . Recall the definition of $\mathcal{M}=\left\{j:|\mu_{j}|/\sigma_{j}\geq\sqrt{(2+c)\log p/n}\right\}$ for any $c>0$ . We shall show that

\sup_{0\leq t\leq 2\log p}\left|\frac{\sum_{j\in\mathcal{I}_{1}}\mathbb{I}\left(W_{j}\leq-t\right)}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}\left(W_{j}\leq-t\right)}\right|\mathop{\rightarrow}\limits^{p}0.

Observe that for $0\leq t\leq 2\log p$ ,

	$\displaystyle\frac{\sum_{j\in\mathcal{M}}\Pr\left(W_{j}\leq-t\right)}{\sum_{j\in\mathcal{I}_{0}}\Pr\left(W_{j}\leq-t\right)}$
	$\displaystyle\leq\frac{2\sum_{j\in\mathcal{M}}\Pr(T_{1j}\leq-\sqrt{t})}{p_{0}\exp(-t)/t\left\{1+o(1)\right\}}$
	$\displaystyle\leq\frac{2\|\mathcal{M}\|\max_{j\in\mathcal{I}_{0}}\Pr(T_{1j}\leq-\sqrt{t}-\sqrt{(1+c)\log p})}{p_{0}\exp(-t)/t\left\{1+o(1)\right\}}=o(1).$

On the other hand, $\sum_{j\in\mathcal{I}_{1}\backslash\mathcal{M}}\Pr\left(W_{j}\leq-t\right)\leq(p_{1}-|\mathcal{M}|)\Pr_{j\in\mathcal{I}_{0}}\left(W_{j}\leq-t\right)$ and thus the assertion holds due to $(p_{1}-|\mathcal{M}|)/p_{0}\to 0$ .

Table S1: Comparison results of FDR and TPR when the signals are

\mu_{j}=\delta_{j}\sqrt{\log p/n_{t}}

with

\delta_{j}\sim{\rm Unif}(-1.5,-1)

under

p=5000

\rho=0

\vartheta=0.05

and the target FDR

\alpha=0.2

		$n_{t}=50$		$n_{t}=100$
	Method	FDR(%)	TPR(%)	FDR(%)	TPR(%)
$t(5)$	RESS	$20.1_{\tiny{(6.0)}}$	$64.0_{\tiny{(6.0)}}$	$21.0_{\tiny{(6.2)}}$	$65.2_{\tiny{(6.4)}}$
	BH	$17.3_{\tiny{(2.8)}}$	$63.6_{\tiny{(3.6)}}$	$18.2_{\tiny{(2.7)}}$	$65.4_{\tiny{(3.5)}}$
exp(1)	RESS	$32.9_{\tiny{(4.0)}}$	$74.1_{\tiny{(3.5)}}$	$27.5_{\tiny{(3.4)}}$	$77.4_{\tiny{(3.3)}}$
	BH	$49.1_{\tiny{(2.1)}}$	$85.6_{\tiny{(2.4)}}$	$41.1_{\tiny{(2.9)}}$	$86.7_{\tiny{(2.5)}}$
Mixed	RESS	$26.0_{\tiny{(4.5)}}$	$74.3_{\tiny{(4.3)}}$	$23.5_{\tiny{(4.6)}}$	$76.2_{\tiny{(4.0)}}$
	BH	$31.6_{\tiny{(2.6)}}$	$80.3_{\tiny{(2.6)}}$	$27.6_{\tiny{(3.1)}}$	$80.7_{\tiny{(2.5)}}$

Additional simulation results

Table S1 reports a brief comparison between this RESS and BH procedures when $\mu_{j}<0$ for all $j\in\mathcal{I}_{1}$ . We see that the RESS significantly improves the accuracy of FDR control over the BH to certain degree, while maintains high power in most cases.

Figure S1 is a counterpart of Figure 4 when the errors are distributed from the mixed distribution.

	$\displaystyle\|D_{3}\|$	$\displaystyle=\frac{\left\|\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(W_{j}\geq t)-\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(\tilde{W}_{j}\geq t)\right\|}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}$
		$\displaystyle\leq\frac{\sum_{j\in\mathcal{I}_{1}}\left\|\mathbb{I}(W_{j}\geq t)-\mathbb{I}(\tilde{W}_{j}\geq t)\right\|}{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}$
		$\displaystyle=\frac{\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(t\leq W_{j}\leq t(1+3\delta_{n}))}{{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}}+\frac{\sum_{j\in\mathcal{I}_{1}}\mathbb{I}(t(1-3\delta_{n})\leq W_{j}\leq t)}{{\sum_{j\in\mathcal{I}_{0}}\mathbb{I}(W_{j}\leq-t)}}$
		$\displaystyle\equiv D_{31}+D_{32}.$

A New Procedure for Controlling False Discovery Rate in Large-Scale tt-tests