A New Procedure for Controlling False Discovery Rate in Large-Scale $t$-tests
Abstract
This paper is concerned with false discovery rate (FDR) control in large-scale multiple testing problems. We first propose a new data-driven testing procedure for controlling the FDR in large-scale $t$-tests for the one-sample mean problem. The proposed procedure achieves exact FDR control in finite-sample settings when the populations are symmetric, no matter the number of tests or the sample sizes. Compared with the existing bootstrap method for FDR control, the proposed procedure is computationally efficient. We show that the proposed method controls the FDR asymptotically for asymmetric populations, even when the test statistics are not independent. We further show that the proposed procedure with a simple correction is second-order accurate, matching the bootstrap method, and can be much more accurate than the existing normal calibration. We extend the proposed procedure to the two-sample mean problem. Empirical results show that the proposed procedures control the FDR better than existing ones when the proportion of true alternative hypotheses is not too low, while maintaining reasonably good detection ability.
Keywords: Data splitting; Large-deviation probability; Multiple comparisons; Product of two normal variables; Skewness; Symmetry
1 Introduction
Many multiple testing problems are closely related to the one-sample mean problem. Let $X_{i1},\dots,X_{in}$, $i=1,\dots,p$, be independent and identically distributed (i.i.d.) samples from $X_i$ with mean $\mu_i$. Of interest is to test $H_{0i}: \mu_i = 0$ versus $H_{1i}: \mu_i \neq 0$. This leads us to consider a multiple testing problem on the $p$ mean values.
A standard procedure for false discovery rate (FDR) control is to apply the Benjamini and Hochberg (BH) method (Benjamini and Hochberg, 1995) to the $t$-statistics $T_1,\dots,T_p$ with the standard normal or Student's $t$ calibration. That is, letting $G(t) = 2 - 2\Phi(t)$, the procedure rejects a hypothesis whenever $|T_i| \ge \hat{t}$ with the data-dependent threshold

$$\hat{t} = \inf\bigg\{t \ge 0:\ \frac{p\,G(t)}{\max\{\#\{i: |T_i| \ge t\},\,1\}} \le \alpha\bigg\} \qquad (1.1)$$

for a desired FDR level $\alpha$ (Storey et al., 2004), where $\Phi$ is the cumulative distribution function of $N(0,1)$. It has been revealed that the accuracy of this control procedure depends heavily on the skewness of the populations and on the rate at which $p$ diverges relative to $n$, since $G$ is only an approximation to the distribution of $|T_i|$.
Many studies have investigated the performance of large-scale $t$-tests. Efron (2004) observed that the choice of null distribution substantially affects the simultaneous inference procedure in a microarray analysis. Delaigle et al. (2011) conducted a careful study of moderate and large deviations of the $t$-statistic, which is indispensable to understanding its robustness and drawbacks for analyzing high-dimensional data. Under a condition of non-sparse signals, Cao and Kosorok (2011) proved the robustness of Student's $t$ test statistics and calibration in the control of FDR. Liu and Shao (2014) gave a systematic analysis of the asymptotic conditions under which the large-scale testing procedure is able to control the FDR.
The bootstrap is known as a useful way to improve the accuracy of the null distribution approximation and has been demonstrated to be particularly effective for highly multiple testing problems; see Delaigle et al. (2011) and the references therein. In general, the bootstrap is capable of correcting for skewness, and therefore leads to second-order accuracy. Accordingly, a faster rate of growth of $p$ could be allowed (Liu and Shao, 2014) and better FDR control would be achieved when the data populations are skewed. However, multiple testing problems with tens of thousands or even millions of hypotheses are now commonplace, and practitioners may be reluctant to use a bootstrap method in such situations; a rapid testing procedure is therefore highly desirable.
In this paper, we propose a new data-driven selection procedure controlling the FDR. The method entails constructing new test statistics with a marginal symmetry property, using the empirical distribution of the negative statistics to approximate that of the positive ones, and searching for the threshold with a formula similar to (1.1). The proposed procedure is computationally efficient since it only uses a one-time split of the data and the product of the two statistics obtained from the two splits. We study the theoretical properties of the proposed procedure. We show that (a) the proposed method achieves exact FDR control even in finite-sample settings when the populations are symmetric and independent; and (b) the proposed method achieves asymptotic FDR control under mild conditions when the populations are asymmetric and dependent. We further propose a simple refinement of the proposed procedure and study its asymptotic properties, which imply that the refined procedure is second-order accurate, matching the bootstrap method in certain situations. We also investigate the extension of the proposed procedure to the two-sample mean problem. Simulation comparisons imply that the proposed method controls the FDR better than existing methods, maintains reasonably good power, and achieves a significant reduction in computing time and storage.
The rest of this paper is organized as follows. In Section 2, we present the new procedure and establish its FDR control property. Some extensions are given in Section 3. Numerical studies are conducted in Section 4. Section 5 concludes the paper, and theoretical proofs are delineated in the Appendix. Some technical details and additional numerical results are provided in the Supplementary Material.
Notation. $a_p \sim b_p$ stands for $a_p/b_p \to 1$ as $p \to \infty$; "$\lesssim$" and "$\gtrsim$" are defined similarly. We denote by $\mathcal{H}_0$ and $\mathcal{H}_1$ the true null set and alternative set, with $p_0 = |\mathcal{H}_0|$ and $p_1 = |\mathcal{H}_1|$, respectively.
2 A New FDR Control Procedure and its Theoretical Properties
We first propose a new FDR control procedure for the one-sample mean problem, and then establish the theoretical properties of the proposed procedure.
2.1 A new FDR control procedure
Without loss of generality, assume that the sample size is an even integer $n = 2m$. We randomly split the data into two disjoint groups $\mathcal{D}_1$ and $\mathcal{D}_2$ of equal size $m$. The $t$-statistics for the $i$th variable computed on $\mathcal{D}_1$ and $\mathcal{D}_2$ are denoted as $T_{1i}$ and $T_{2i}$, respectively. Define

$$W_i = T_{1i} T_{2i}, \qquad i = 1,\dots,p.$$

Clearly, $W_i$ is likely to be large for most of the signals regardless of the sign of $\mu_i$, and small for most of the null variables. Observing that $W_i$ is, at least asymptotically, symmetric with mean zero for $i \in \mathcal{H}_0$ due to the central limit theorem and the independence between $T_{1i}$ and $T_{2i}$, we can choose a threshold by setting

$$L = \min\bigg\{t > 0:\ \frac{\#\{i: W_i \le -t\}}{\max\{\#\{i: W_i \ge t\},\,1\}} \le \alpha\bigg\} \qquad (2.1)$$

and reject $H_{0i}$ if $W_i \ge L$, where $\alpha$ is the target FDR level. If the set in (2.1) is empty, we simply set $L = +\infty$. The fraction in (2.1) is an estimate of the false discovery proportion (FDP) since the set $\{i \in \mathcal{H}_1: W_i \le -t\}$ is often very small (if the signals are not too weak), and thus $\#\{i: W_i \le -t\}$ is a good approximation to $\#\{i \in \mathcal{H}_0: W_i \ge t\}$.
[Figure 1: (a) scatter plot of the statistics $W_i$, with red triangles denoting true signals; (b) estimated and true FDP.]
As described above, we construct the test statistics $W_i$ with a marginal symmetry property by using sample splitting. Thus we refer to this procedure as Reflection via Sample Splitting (RESS). The RESS procedure is data-dependent and does not involve any unknown quantity related to the populations, which is an important and desirable feature. Figure 1 gives a visual representation of the RESS procedure. Specifically, Figure 1(a) depicts the scatter plot of the $W_i$'s with red triangles denoting true signals. Observe that the true signals lie primarily above the x-axis, indicating $W_i > 0$, while the null $W_i$'s (black dots) are distributed roughly symmetrically about the horizontal axis. Figure 1(b) depicts the corresponding estimate of the FDP (i.e., the fraction in (2.1)) along with the true FDP over a range of thresholds. The approximation in this case is very good, as only three true alternatives (i.e., three red triangles) lie below the horizontal axis in Figure 1(a).
The knockoff framework was introduced by Barber and Candès (2015) in the high-dimensional linear regression setting. The knockoff selection procedure operates by constructing "knockoff copies" of each of the covariates (features) with certain knowledge of the covariates or responses, which are then used as a control group to ensure that the model selection algorithm does not choose too many irrelevant covariates. The signs of the test statistics in the knockoff procedure need to satisfy joint exchangeability (at least approximately) so that the knockoff can yield accurate FDR control in finite samples. Refer to Candes et al. (2018) for more discussion. The proposed threshold in (2.1) is similar in spirit to the knockoff, but the two are distinguished in that the RESS procedure does not require any prior information about the distribution of the data; this is especially important since that distribution is difficult to estimate when $p$ is very large. We employ the sample-splitting strategy to achieve a marginal symmetry property, and it turns out that the FDR can be controlled reasonably well due to the marginal symmetry of the $W_i$'s. The theoretical findings on FDR control under certain dependence structures, such as positive regression dependence on a subset or weak dependence at a marginal level (Benjamini and Yekutieli, 2001; Storey et al., 2004), shed light on the validity of the RESS procedure. A detailed analysis will be given in Section 2.2.
[Figure 2: size-corrected power curves of the tests based on the full-sample $t$-statistic and on the split statistic.]
At first glance, the test statistic $W_i$ may seem to incur a substantial loss of information due to the sample splitting. In fact, benefiting from the joint use of two independent statistics, the relative power loss of $W_i$ with respect to the full-sample $t$-statistic is quite mild. By an elementary inequality, the power ratio of the tests based on the two statistics can be easily bounded in terms of the standard deviation of the population and the upper quantiles of the two null distributions. Further, when the signal is strong enough, both test statistics have asymptotic power 1. For better understanding, the power curves (with size corrected) of the two tests are presented in Figure 2 for some commonly used settings. We can see that though $W_i$ is always inferior to the full-sample statistic, as one would expect, the disadvantage is not very significant and also tends to be less pronounced as the sample size increases. This power sacrifice of the RESS in turn buys much better error rate control, as we shall show in the next subsection. On the other hand, compared with the test statistic based on only one group of the split data, the proposed test statistic has smaller variance. Since the null distribution of $W_i$ is symmetric, the upper quantiles of its distribution are much smaller than those of the single-group statistic. As a result, $W_i$ is more powerful than the single-group statistic.
Remark 1 It is noteworthy that the joint use of the statistics $W_i$ and the threshold $L$ distinguishes our RESS procedure from the methods of Wasserman and Roeder (2009) and Meinshausen et al. (2009), which use a sample-splitting scheme to conduct variable selection with error rate control. They use the first split to reduce the number of variables to a manageable size and then apply FDR control methods to the remaining variables using the data from the second split; normal calibration is usually required to obtain the $p$-values. In contrast, in the RESS procedure, the data from both splits are used to compute the statistics, and the empirical distribution is used in place of asymptotic distributions.
2.2 Theoretical results
We first investigate the FDR control of the proposed RESS procedure when the statistics are independent of each other, and then extend the results to the dependent case under stronger conditions. A simple yet effective refinement with better accuracy in FDR control is further developed after the convergence rate of the FDP of the RESS is investigated.
2.2.1 Independent case
A preliminary result of this paper is that the proposed procedure controls a quantity nearly equal to the FDR when the populations are symmetric.
Proposition 1
Assume that the null statistics $W_i$, $i \in \mathcal{H}_0$, are symmetrically distributed about zero and independent of each other. For any $\alpha$ and any $n$ and $p$, the RESS method satisfies
The quantity bounded by this proposition is very close to the FDR in settings where $1/\alpha$ is dominated by the number of rejections. Following Barber and Candès (2015), if it is preferable to control the FDR exactly, we may adjust the threshold to

$$L_+ = \min\bigg\{t > 0:\ \frac{1 + \#\{i: W_i \le -t\}}{\max\{\#\{i: W_i \ge t\},\,1\}} \le \alpha\bigg\},$$

with which we can show that the FDR is controlled at level $\alpha$. In what follows, as we mainly focus on asymptotic FDR control, the results with $L$ and $L_+$ are generally the same. From the proof of this proposition, we can see that the inequality is due to the fact that the count of negative $W_i$'s may include some alternatives; this count would usually be negligible because most strong signals yield a positive, or at least not too small, value of $W_i$. This implies that it is very likely that the FDR of the RESS will be fairly close to the nominal level unless a large proportion of the signals is very weak.
Proposition 1 is a direct corollary of the following result, in which the statistics are allowed to be asymmetric.
Proposition 2
Assume that the statistics $W_1,\dots,W_p$ are independent of each other. For any $\alpha$, the RESS method satisfies
where .
This quantity can be interpreted as measuring the extent to which symmetry is violated for a specific variable $i$. The result concurs with our intuition that controlling these asymmetry terms is sufficient to ensure control of the FDR for the RESS method. In the most ideal case, where the null statistics are symmetrically distributed, the terms vanish for all $i$, and we automatically obtain the FDR-control result in Proposition 1. Under asymmetric scenarios, the terms can still be expected to be small, owing to the convergence of $T_{1i}$ and $T_{2i}$ to normality when $n$ is not too small. In the next theorems, we show that under mild conditions these terms vanish in probability, yielding a meaningful result on FDR control in more realistic settings. The proof of this proposition follows similarly to Theorem 2 in Barber et al. (2019), which shows that the model-X knockoff (Candes et al., 2018) selection procedure incurs an inflation of the FDR that is proportional to the errors in estimating the distribution of each feature conditional on the remaining features.
For our asymptotic analysis, we need the following assumptions. Throughout this paper, we assume $p_1 \le p^{\theta}$ for some $\theta \in (0,1)$, which includes the sparse setting $p_1 = o(p)$.
Assumption 1
(Moments) (i) For some constant , ; (ii) For some constants and , .
Assumption 2
(Signals)
Remark 2 The moment condition in Assumption 1-(i) is required for a large-deviation result for Student's $t$ statistics, on which our proof heavily hinges. Assumption 1-(ii), which requires exponentially light tails and implies that all moments are finite, is stronger than Assumption 1-(i); it will only be needed when we use the RESS method with correction (see Section 2.2.2). In fact, for familywise error control with bootstrap calibration, similar conditions are also imposed to achieve better accuracy (Fan et al., 2007). The implication of Assumption 2 is that the number of detectable signals diverges. If the number of true alternatives is fixed as $p \to \infty$, Liu and Shao (2014) have shown that even with the true $p$-values, the BH method is unable to control the FDP with high probability. Thus, we use this condition to rule out such cases.
Theorem 1
The proof of this theorem relies on a nice large-deviation result for $t$-statistics (Delaigle et al., 2011), but our new statistic means that the technical details of our theory are not straightforward and cannot be obtained from existing works. Under a finite fourth-moment condition, Theorem 1 reveals that the FDR of our RESS method can be controlled provided the signals satisfy the technical condition on the alternative set, Assumption 2. Roughly speaking, the theorem ensures that the asymmetry term in Proposition 2, which can be bounded in terms of the probability density function of the null statistics, is small. Note that the inequality in (2.2) is mainly due to
Hence, the FDR control is often quite tight because
in many situations, such as , where for any small . See the proof given in the Supplementary Material.
It is interesting to further unpack the convergence rate given in this theorem. Liu and Shao (2014) have shown that the convergence rate of the bootstrap calibration is

This indicates that our "raw" RESS method is inferior to the bootstrap calibration, especially when $p$ is very large (so that the lower-order term has a non-ignorable effect). Actually, that term can be eliminated by a simple correction, as discussed below.
2.2.2 A refined procedure
By examining the FDP of the proposed RESS method more carefully, we can show that for any $\alpha$, we have

(2.3)

That is, we are able to express the leading error term in a more accurate way, which benefits from utilizing the empirical distribution as the approximation and "surprisingly" eliminates the higher-order terms. Clearly, the resulting estimate is an underestimate of the true FDP, which in turn yields an inflation of the FDR.
This motivates us to consider a corrected test statistic. As shown in the Appendix, the FDP based on the corrected statistic satisfies

(2.4)

The difference between the asymptotic expansions (2.3) and (2.4) is due to the different large-deviation probabilities of the two statistics. This difference immediately suggests a "refined" threshold,

(2.5)

where the correction to the estimated FDP is computed from the data. We show that, under certain conditions, this correction behaves as intended, and consequently using the refined threshold is capable of eliminating the effect of the lower-order term in (2.3).
The next theorem demonstrates that the refined procedure has a better convergence rate in certain circumstances. Basically, we restrict our attention to the sparse case. This is because the lower-order term only matters when $p$ is large. From the proof of Theorem 1, we see that the tail approximation of the empirical distribution to the null distribution would be important only if $\alpha$ or the proportion of signals is small.
Theorem 2
In this theorem, the condition implies that the number of signals we can identify dominates the number of very weak signals. The RESS method with the refined threshold has the same convergence rate as the bootstrap calibration. Simultaneous testing of many hypotheses allows us to construct a "data-driven" correction for skewness without resampling. Thus, in some sense, ultrahigh-dimensional large-scale testing may be considered not a "curse" but a "blessing" in our problem.
We summarize the refined RESS procedure as follows.
Reflection via Sample Splitting Procedure (RESS)

• Step 1: Randomly split the data into two parts of equal size. Compute the $t$-statistics $T_{1i}$ and $T_{2i}$, $i = 1,\dots,p$, on the two parts;

• Step 2: Obtain the correction quantities and compute the statistics $W_i$ for $i = 1,\dots,p$;

• Step 3: Find the threshold by (2.5) and reject $H_{0i}$ if $W_i$ exceeds it.
The total computational cost amounts to computing $2p$ $t$-statistics plus a single threshold search, and the procedure can be easily implemented even without a high-level programming language. The R and Excel codes are available upon request.
We want to make some remarks on the use of the corrected statistic. As can be seen from (2.4), its FDP estimate is an overestimate of the true FDP, and therefore yields a slightly more conservative procedure. In practice, if computation is our major concern, using the RESS procedure with this statistic could be a safe choice.
2.2.3 Dependent case
We now establish theoretical properties for the dependent case. The first result is a direct extension of Proposition 2.
Proposition 3
Assume that . For any , the RESS method satisfies
where and .
Again, this quantity captures the effect on the FDR of both the asymmetry of the null statistics and the dependence among them.
To achieve asymptotic FDR control, the following condition on the dependence structure is imposed.
Assumption 3
(Correlation) For each $i$, assume that the number of variables that are dependent on $X_i$ is no more than a prescribed bound.

This assumption implies that each variable is independent of all but a limited number of the other variables. This is certainly not the weakest condition possible, but is adopted here to simplify the proof.
Theorem 3
This theorem implies that the RESS method remains valid asymptotically under weak dependence. Comparing Theorem 3 with Theorems 1-2, the main difference lies in the convergence rates; the rate in the dependent case is asymptotically larger. This can be understood because the approximation of the empirical distribution to the population one is expected to converge more slowly for dependent summands. When each variable is independent of all the other variables, or dependent on only a finite number of them, Theorem 3 reduces to Theorems 1-2.
Remark 3 In our discussion, we restrict the growth rate of $p$ relative to $n$, which facilitates our technical derivation. In Liu and Shao (2014), a faster rate of $p$ is allowed. However, we also note that the bootstrap method proposed by Liu and Shao (2014) jointly uses the empirical distribution of the bootstrap statistics over all variables. This implies that its computational complexity and storage requirement grow with the product of the number of variables and the number of bootstrap replications. For the commonly used bootstrap, say approximating the distribution of each statistic individually by resampling (Fan et al., 2007), good accuracy is achieved only if the number of bootstrap replications is of a large order. Though our theoretical results only allow a restricted rate, we conjecture that similar results also hold for faster rates if more stringent conditions were imposed. Encouragingly, our extensive simulation results show that the refined procedure can work at least as well as the bootstrap method in terms of FDR control, even when $p$ is super-large.
3 Extensions
In this section, we discuss two generalizations of our RESS procedure.
3.1 One-sided alternatives
In certain situations, we are interested in one-sided alternatives, say, without loss of generality, $\mu_i > 0$ for all $i \in \mathcal{H}_1$. We may modify the RESS by excluding certain sign configurations of $(T_{1i}, T_{2i})$ from the counts in (2.1) to improve the power. To be more specific, the threshold in (2.1) is modified as

and we reject $H_{0i}$ when $W_i$ exceeds the modified threshold. We have the following result.
Corollary 1
Consider the one-sided hypotheses with $\mu_i > 0$ for all $i \in \mathcal{H}_1$. If the conditions in Theorem 3-(i) hold, then for any $\alpha$, the FDP of the RESS method with the modified threshold satisfies
and .
By using the results in Delaigle et al. (2011), the convergence rate of the normal calibration is
Comparing this with Corollary 1, we see that the RESS strategy has removed the skewness term that describes first-order inaccuracies of the standard normal approximation. This important property is due to the fact that $W_i$ is more symmetric than the $t$-statistic; see the proof of Theorem 1 for details. The refined RESS procedure, which also enjoys the second-order accuracy, can be defined similarly, but we do not discuss it in detail.
3.2 Two-sample problem
We next extend the RESS procedure to the two-sample problem. Assume there is another i.i.d. random sample, independent of the first. The population mean vectors of the two samples are $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$, respectively. We aim to carry out two-sample tests, that is, $H_{0i}: \mu_{1i} = \mu_{2i}$ versus $H_{1i}: \mu_{1i} \neq \mu_{2i}$, for $i = 1,\dots,p$. The RESS procedure can be readily generalized to this two-sample problem as follows.

Firstly, similar to the splitting of the first sample, the second sample is also split randomly into two disjoint groups of equal size. Pairing the groups of one sample with those of the other, the two-sample $t$-statistics are defined as follows:

Here the sample means and sample variances are computed within each group of each sample. Finally, define $W_i$ as the product of the two-sample $t$-statistics from the two paired splits. The threshold can then be selected as in (2.1), and $H_{0i}$ is rejected when $W_i \ge L$.
To establish the FDR control result, Assumptions 2 and 3 are modified as follows.
Assumption 4
(Signals) For a large ,
Assumption 5
(Correlation) For each $i$, assume that the number of variables that are dependent on $X_i$ is no more than a prescribed bound. The same assumption is imposed on the second sample.
By Theorem 2.4 in Chang et al. (2016) and the proofs for Theorem 3, we have the following result.
4 Numerical results
4.1 Simulation comparison
We evaluate the performance of our proposed RESS procedure on simulated data sets and compare its FDR and true positive rate (TPR) with those of other existing techniques. All results are obtained from 200 simulation replications.
4.1.1 Model and benchmarks
We set the model as $X_{ij} = \mu_i + \varepsilon_{ij}$, where the alternative signals $\mu_i$, $i \in \mathcal{H}_1$, take one of two specified magnitudes. The random errors are generated by the autoregression model $\varepsilon_{ij} = \rho\,\varepsilon_{(i-1)j} + e_{ij}$, where the innovations $e_{ij}$ are i.i.d. from three centered distributions: Student's $t$ with five degrees of freedom ($t(5)$), exponential with rate one (exp(1)), and a mixed distribution combining three components. When $\rho = 0$, the random errors are independent. We consider several values for the number of alternatives $p_1$.
The following three benchmarks are considered for comparison. The first is the BH procedure with the $p$-values obtained from the standard normal approximation (termed BH for simplicity). The other two are bootstrap-based approaches. Suppose $B$ bootstrap resamples are drawn by sampling randomly with replacement, and Student's $t$ test statistics are constructed from each resample. One bootstrap method estimates the $p$-values from each variable's own bootstrap distribution individually (Fan et al., 2007), and the other estimates the $p$-values from the bootstrap distribution averaged over all variables (Liu and Shao, 2014). We call these two methods "I-bootstrap" and "A-bootstrap", respectively. The A-bootstrap jointly estimates the null distribution of the statistics and thus can be expected to have better performance. However, the computational complexity of I-bootstrap is much lower, while that of A-bootstrap increases at a quadratic rate in $p$. The number of bootstrap samples is fixed throughout this section.
4.1.2 Results
We compare the performance of our proposed raw RESS method (labeled "RESS (raw)" in the tables) and the refined RESS method ("RESS") in a range of settings, against the BH procedure, I-bootstrap and A-bootstrap, and examine the effects of skewness, signal magnitude and correlation between variables.
| Errors | Method | FDR(%) | TPR(%) | Time(s) | FDR(%) | TPR(%) | Time(s) |
|---|---|---|---|---|---|---|---|
| t(5) | RESS | | | 1.75 | | | 1.84 |
| | RESS (raw) | | | 1.66 | | | 1.74 |
| | BH | | | 1.60 | | | 1.60 |
| | A-bootstrap | | | 404 | | | 400 |
| | I-bootstrap | | | 314 | | | 311 |
| exp(1) | RESS | | | 1.76 | | | 1.85 |
| | RESS (raw) | | | 1.66 | | | 1.76 |
| | BH | | | 1.70 | | | 1.63 |
| | A-bootstrap | | | 424 | | | 403 |
| | I-bootstrap | | | 332 | | | 317 |
| Mixed | RESS | | | 1.85 | | | 2.17 |
| | RESS (raw) | | | 1.75 | | | 2.06 |
| | BH | | | 1.65 | | | 1.64 |
| | A-bootstrap | | | 405 | | | 409 |
| | I-bootstrap | | | 314 | | | 319 |
Firstly, we consider a baseline setting: the full sample size takes one of two values, and the target FDR level is fixed. Table 1 displays the estimated FDR, TPR and average computation time obtained by each method under the three error settings. For the symmetric distribution $t(5)$, the raw RESS is able to deliver quite accurate control, but that is not the case for the other, skewed errors. It performs better than the BH in terms of FDR control, but has a slight disadvantage relative to the I-bootstrap. This is consistent with our theoretical analysis in Section 2.2.1: indeed, by Theorem 1 and Theorem 3-(i), when $p$ is very large the skewness still has a non-ignorable effect on the FDR control of the raw method. In contrast, we observe that the FDR levels of RESS are close to the nominal level under all scenarios; it is clearly more effective than the I-bootstrap and BH, and the difference is quite remarkable in some cases. Certainly, this is not surprising, as it is a data-driven method with the second-order accuracy justified in Section 2.2.2. The power of RESS is also quite high compared to the other methods, revealing that its detection ability is not greatly compromised by data splitting.
[Figure 3: relative computation time of A-bootstrap to RESS, and the corresponding FDR values, as $p$ grows.]
The I-bootstrap slightly improves the accuracy of FDR control over the normal calibration under the skewed error cases, but its FDRs are still considerably larger than the nominal level. Note that the I-bootstrap calibration may need an extremely large number of replications to achieve FDR control (Liu and Shao, 2014), and therefore does not perform well with a commonly used number of resamples. The A-bootstrap method offers performance comparable to our RESS method, though it tends to be a little conservative in some cases. Also, the A-bootstrap generally has smaller variation than RESS due to the use of the bootstrap for estimating an overall empirical null distribution. Certainly, this gain comes partly from its computation-intensive nature; in most cases, it requires more than 200 times the computational time of the RESS. In fact, as mentioned earlier, the computational complexity of A-bootstrap grows much faster than that of RESS, and hence their relative computational time can increase quickly as $p$ increases. Figure 3 depicts the relative time of A-bootstrap to RESS and the FDR values.
[Figure 4: boxplots of the FDP and power of the competing methods.]

[Figure 5: estimated FDR curves as the shape parameter of the Gamma error distribution varies.]
Next, we examine the effects of the skewness and the proportion of alternatives. As we have shown that the refined RESS usually performs better than the raw version, in what follows we focus only on the refined RESS. With respect to skewness, we evaluate the performance of the various methods by varying the shape parameter of a Gamma error distribution. Figure 5 depicts the estimated FDR curves against the shape parameter. It can be seen that the refined RESS successfully keeps the FDR within an acceptable range of the target level regardless of the magnitude of the skewness. Figure 4 shows the boxplots of the FDP and power when the errors are Gamma distributed. The refined RESS has a stable FDP close to the nominal level, even when the proportion of alternatives is small. Similar results with other error distributions and signal magnitudes can be found in the Supplementary Material.
[Figure 6: FDR curves of the four methods against the correlation level.]
Finally, we investigate the effect of the correlation level $\rho$. Figure 6 shows the FDR curves of the four methods against the values of $\rho$ when the errors are generated from the skewed error distributions. Again, it can be seen that the refined RESS and A-bootstrap deliver reasonably good FDR control in most situations, even when $\rho$ is as large as 0.8. This concurs with our asymptotic justification that the refined RESS method remains effective provided the errors satisfy a certain weak dependence structure.
4.2 A real-data example
We next illustrate the proposed refined RESS procedure with an empirical analysis of the acute lymphoblastic leukemia (ALL) data, which consist of 12,256 gene probe sets for 128 adult patients enrolled in the Italian GIMEMA multi-center clinical trial 0496. It is known that malignant cells in B-lineage ALL often have genetic abnormalities, which have a significant impact on the clinical course of the disease. Specifically, the molecular heterogeneity of B-lineage ALL is well established, with subtypes BCR/ABL, ALL1/AF4, E2A/PBX1 and NEG, and the gene expression profiles of the BCR/ABL and NEG groups are more similar to each other than to the others. In our analysis, we consider a sub-dataset of 79 B-lineage samples, 37 with the BCR/ABL mutation and 42 NEG, and use the traditional two-sample $t$-test to examine which probe sets are differentially expressed. The dataset was previously studied by Chiaretti et al. (2005) and Bourgon et al. (2010) and is available at http://www.bioconductor.org.
[Figure 7: skewness estimates for the gene expression levels.]

[Figure 8: scatter plot of the RESS statistics; selected probe sets marked as red triangles.]
Here, we consider the two-sample extension of our proposed refined RESS and compare it with the BH procedure, A-bootstrap and I-bootstrap over a wide range of significance levels. Both bootstrap procedures use 200 resamples. Table 2 summarizes the number of probe sets found differentially expressed between BCR/ABL and NEG. The BH procedure with the normal calibration tends to reject surprisingly more genes than RESS and A-bootstrap at the various significance levels. In fact, the normality approximation appears to be violated for many of the genes, as some skewness values deviate substantially from zero in Figure 7. As noted earlier, the I-bootstrap needs an extremely large number of replications to be accurate, and it thus leads to a large number of rejections here. Figure 8 presents the scatter plot of the RESS statistics. We can observe that all selected probe sets (red triangles) have large values of the statistics, while the unselected ones (black dots) are distributed roughly symmetrically about the horizontal axis. With the data-driven threshold, the number of differentially expressed probe sets selected by our RESS appears more reasonable. The results of A-bootstrap coincide with those of RESS, as expected.
| FDR level | RESS | BH | I-bootstrap | A-bootstrap |
|---|---|---|---|---|
| 0.05 | 157 | 163 | 326 | 141 |
| 0.1 | 221 | 238 | 326 | 222 |
| 0.15 | 280 | 334 | 476 | 310 |
| 0.2 | 333 | 414 | 476 | 397 |
5 Concluding Remarks
In this paper, we have proposed a multiple testing procedure, RESS, that controls the FDR in the large-scale setting and offers high power to discover true signals. We give theoretical results showing that the proposed method maintains FDR control under mild conditions. The empirical performance of the refined RESS demonstrates excellent FDR control and reasonable power in comparison with other methods such as the bootstrap and normal calibrations. The ideas of the RESS procedure may be extended to controlling other error rates, such as the per-family error rate.
Appendix: Proofs
Appendix A: Lemmas
Before we present the proofs of the theorems, we first state several lemmas whose proofs can be found in the Supplementary Material. A few well-known results to be used repeatedly are also presented there. The first lemma characterizes the closeness between the two relevant tail probabilities, which plays an important role in the proofs. For simplicity, we suppress the dependence of the statistics on the sample size, which should not cause any confusion. Let , , and for .
Lemma A.1
The second lemma characterizes the closeness between and .
Lemma A.2
Suppose Assumption 1-(ii) holds. For sufficiently large satisfying ,
The next lemma establishes the uniform convergence of .
The last one is the counterpart of Lemma A.3 for .
Appendix B: Proof of Theorems
In fact, we show that under a weaker condition, i.e., Assumption 3, the results in Theorems 1-2 hold similarly, and accordingly Theorem 3 is also proved.
Proof of Theorem 1. By definition, our test is equivalent to rejecting $H_{0i}$ if , where
Let be a subset of satisfying and . By Assumption 1-(i) and Markov’s inequality, for any , . By Assumption 2 and Lemma S.1, there exists some ,
(A.3) |
where we use the fact that .
Define . We note that implies that when is large. This is because
if is sufficiently large. This, together with Lemma A.3 and Eq. (A.3), implies that with probability at least .
Therefore, by Lemma A.1, we get
(A.4) |
Write
Note that , and thus in probability by (A.4). Then, for any ,
from which the second part of this theorem is proved.
Proof of Theorem 2. We first establish a lower bound for so that the condition in Lemma A.2 is valid. Notice
Hence, we can conclude that .
For ,
Next, we deal with one term; the other can be bounded in a similar way. Let denote the event that and for some sufficiently large . By Assumption 1-(ii), we have
and thus holds.
Conditional on the event , we have
We only need to deal with one of them; the other follows similarly. By Markov's inequality, we get
where , and is some positive constant. The second equality is due to Lemma A.3, the fourth inequality comes from the fact that and the last inequality uses the condition .
Finally, collecting all the terms of , we conclude that for any ,
By using similar arguments to those given in the Supplementary Material, we can show that . Thus, we conclude that .
Accordingly, we obtain
The proof is completed.
References
- Barber, R. F. and Candès, E. J. (2015), "Controlling the false discovery rate via knockoffs," The Annals of Statistics, 43, 2055–2085.
- Barber, R. F., Candès, E. J., and Samworth, R. J. (2019), "Robust inference with knockoffs," The Annals of Statistics, to appear.
- Benjamini, Y. and Hochberg, Y. (1995), "Controlling the false discovery rate: a practical and powerful approach to multiple testing," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 57, 289–300.
- Benjamini, Y. and Yekutieli, D. (2001), "The control of the false discovery rate in multiple testing under dependency," The Annals of Statistics, 29, 1165–1188.
- Bourgon, R., Gentleman, R., and Huber, W. (2010), "Independent filtering increases detection power for high-throughput experiments," Proceedings of the National Academy of Sciences, 107, 9546–9551.
- Candes, E., Fan, Y., Janson, L., and Lv, J. (2018), "Panning for gold: 'model-X' knockoffs for high dimensional controlled variable selection," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80, 551–577.
- Cao, H. and Kosorok, M. R. (2011), "Simultaneous critical values for t-tests in very high dimensions," Bernoulli, 17, 347–394.
- Chang, J., Shao, Q.-M., and Zhou, W.-X. (2016), "Cramér-type moderate deviations for Studentized two-sample U-statistics with applications," The Annals of Statistics, 44, 1931–1956.
- Chiaretti, S., Li, X., Gentleman, R., Vitale, A., Wang, K. S., Mandelli, F., Foa, R., and Ritz, J. (2005), "Gene expression profiles of B-lineage adult acute lymphocytic leukemia reveal genetic patterns that identify lineage derivation and distinct mechanisms of transformation," Clinical Cancer Research, 11, 7209–7219.
- Delaigle, A., Hall, P., and Jin, J. (2011), "Robustness and accuracy of methods for high dimensional data analysis based on Student's t-statistic," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 283–301.
- Efron, B. (2004), "Large-scale simultaneous hypothesis testing: the choice of a null hypothesis," Journal of the American Statistical Association, 99, 96–104.
- Fan, J., Hall, P., and Yao, Q. (2007), "To how many simultaneous hypothesis tests can normal, Student's t or bootstrap calibration be applied?" Journal of the American Statistical Association, 102, 1282–1288.
- Liu, W. and Shao, Q.-M. (2014), "Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control," The Annals of Statistics, 42, 2003–2025.
- Meinshausen, N., Meier, L., and Bühlmann, P. (2009), "P-values for high-dimensional regression," Journal of the American Statistical Association, 104, 1671–1681.
- Petrov, V. V. (2012), Sums of Independent Random Variables, vol. 82, Springer Science & Business Media.
- Storey, J. D., Taylor, J. E., and Siegmund, D. (2004), "Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach," Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 187–205.
- Wang, Q. (2005), "Limit theorems for self-normalized large deviation," Electronic Journal of Probability, 10, 1260–1285.
- Wasserman, L. and Roeder, K. (2009), "High dimensional variable selection," The Annals of Statistics, 37, 2178–2201.
Supplementary Material for "A New Procedure for Controlling False Discovery Rate in Large-Scale $t$-tests"
This supplementary material contains the proofs of some technical lemmas and corollaries, and additional simulation results.
Additional Lemmas
The first one is the large deviation result for the Student's $t$ statistic; see also Wang (2005).
Lemma S.1
[Delaigle et al. (2011)] Let denote a constant. Then,
as , where the function is bounded in absolute value by a finite, positive constant (depending only on ), uniformly in all distributions of for which , and , and uniformly in satisfying .
The second one is a standard large deviation result for the mean; see Theorem VIII-4 in Petrov (2012).
Lemma S.2 (Large deviation for the mean)
Suppose that are i.i.d. random variables with mean zero and variance , satisfying Assumption 1-(ii). Then for any and ,
The third lemma is a large deviation result for .
Lemma S.3
Suppose Assumption 1-(ii) holds. Then for and ,
Proof. Without loss of generality, we assume that . First of all, we deal with , where . Observe
where and . Simple calculation yields
By using the fact that for and ,
we have
Thus,
Finally, we show that and are close enough. Note that
Proofs of Lemmas and Propositions
Proof of Lemma A.1. Recalling , we have and . Let . Then,
Firstly, for the term ,
where and are two independent variables. From the proof given later, it can be easily seen that uniformly in . Thus, in what follows we mainly focus on the rate of . The other term can be handled similarly.
Note that , . By the inequality
and the large deviation formula for the $t$-statistic (Lemma S.1), we obtain that
thus we claim that .
Next, we deal with the main term . Denote . Observe
where we use Lemma S.1 again. Note that
and accordingly . Hence,
Similarly,
Taylor's expansion yields that
Consequently, we easily get that
from which we obtain the assertion.
Proof of Lemma A.2. The proof is similar to that of Lemma A.1, but uses the large deviation result obtained in Lemma S.3 together with that for the mean.
Proof of Lemma A.3. We prove (A.1); the proof of (A.2) follows similarly. Here . Clearly, is a decreasing and continuous function. Let and , where with and . Note that uniformly in . Thus, it is enough to obtain the convergence rate of
Define and further
It is noted that
We then obtain
Moreover, observe that
Because can be made arbitrarily large as long as , we have .
Proof of Lemma A.4. By using the same arguments as in the proof of Lemma A.3, this lemma can be proved easily; the details are thus omitted.
Proof of Proposition 2
We prove this proposition for . The result for can be obtained similarly. Fix and for any threshold , define
Consider the event that . Furthermore, for a threshold rule mapping statistics to a threshold , for each index , we define
i.e. the threshold that we would obtain if were set to 1.
Then for the RESS method with the threshold , we can write
It is crucial to get an upper bound for . We have
(S.5) |
where the last step holds since the only unknown is the sign of after conditioning on . By definition of , we have .
Hence,
Finally, the sum in the last expression can be simplified as follows: if for all null , , then the sum is equal to zero; otherwise,

where the first step comes from the following fact: for any , if and , then . See Barber et al. (2019) for a proof.
Accordingly, we have
Consequently, the assertion of this proposition holds.
Proof of Proposition 3. The proof follows similarly to that of Proposition 2 but uses to replace in (S.5).
Discussion on . Recall the definition of for any . We shall show that
Observe that for ,
On the other hand, and thus the assertion holds due to .
| Errors | Method | FDR(%) | TPR(%) | FDR(%) | TPR(%) |
|---|---|---|---|---|---|
| t(5) | RESS | | | | |
| | BH | | | | |
| exp(1) | RESS | | | | |
| | BH | | | | |
| Mixed | RESS | | | | |
| | BH | | | | |
Additional simulation results
Table S1 reports a brief comparison between the RESS and BH procedures in the one-sided setting. We see that the RESS significantly improves the accuracy of FDR control over the BH procedure, while maintaining high power in most cases.