One-at-a-time knockoffs: controlled false discovery rate with higher power
Abstract
We propose one-at-a-time knockoffs (OATK), a new methodology for detecting important explanatory variables in linear regression models while controlling the false discovery rate (FDR). For each explanatory variable, OATK generates a knockoff design matrix that preserves the Gram matrix by replacing, one at a time, only the single corresponding column of the original design matrix. OATK is a substantial relaxation and simplification of the knockoff filter of Barber and Candès (2015) (BC), which simultaneously generates all columns of the knockoff design matrix to satisfy a much larger set of constraints. To test each variable's importance, statistics are then constructed by comparing the original and knockoff coefficients. Under a mild correlation assumption on the original design matrix, OATK asymptotically controls the FDR at any desired level. Moreover, OATK consistently achieves (often substantially) higher power than BC and other approaches across a variety of simulation examples and a real genetics dataset. Generating knockoffs one at a time also has substantial computational advantages and facilitates additional enhancements, such as conditional calibration or derandomization, to further improve power and consistency of FDR control. OATK can be viewed as a generalization of the conditional randomization test (CRT) to fixed-design linear regression problems, and can generate fine-grained p-values for each hypothesis.
Keywords: variable selection, knockoff filter, selective inference, false discovery rate, multiple hypothesis testing
1 Introduction
Linear and ridge regression are universal tools for data analysis. In applications in which there exist many explanatory variables in the data that do not affect the response variable, users often seek to discover the sparse subset of significant variables and discard the variables having no effect. Consider the fixed-design linear regression model
(1)  y = Xβ + ε,

where y ∈ ℝⁿ is the response vector, X ∈ ℝ^{n×p} is the known and deterministic design matrix of explanatory variables with Gram matrix Σ = XᵀX, ε ∼ N(0, σ²Iₙ) is Gaussian noise, and β ∈ ℝᵖ is the vector of unknown regression coefficients. Given this model, we are interested in testing the null hypotheses
H_j : β_j = 0, for each j = 1, …, p. The goal is to identify S = {j : β_j ≠ 0}, i.e., the index set of non-null variables. Specifically, we aim to leverage the data to produce an estimate Ŝ of S, while controlling the false discovery rate FDR = E[FDP], where

(2)  FDP = |Ŝ ∩ H₀| / max(|Ŝ|, 1)

is the false discovery proportion and H₀ = {j : β_j = 0} denotes the set of null variables. Given a pre-specified target FDR level q ∈ (0, 1), our primary goal is to control the FDR of Ŝ by q in a manner that yields a high TDR = E[TDP], where

(3)  TDP = |Ŝ ∩ S| / max(|S|, 1)

is the true discovery proportion. In both (2) and (3), the expectation is taken with respect to ε with X treated as fixed.
A classical approach to this variable selection problem is to first construct a p-value for each hypothesis and then obtain a selection set based on these p-values (Benjamini and Hochberg, 1995; Benjamini and Yekutieli, 2001; Storey, 2002; Storey et al., 2004). In some cases, it might be difficult to obtain valid p-values, especially when one wishes to incorporate additional information (e.g., sparsity, smoothness) in the model-fitting step.
Recently, there have been two new (suites of) p-value-free variable selection algorithms: the knockoff filter (Barber and Candès, 2015; Candès et al., 2018) (BC) and the Gaussian mirror (Xing et al., 2023a; Dai et al., 2023a; Dai et al., 2023b) (GM). The knockoff filter constructs a knockoff matrix X̃ such that
X̃ᵀX̃ = XᵀX = Σ and XᵀX̃ = Σ − diag{s}, where s is some suitably chosen p-dimensional non-negative vector such that 2Σ − diag{s} is positive semi-definite. The knockoff filter regresses y (possibly using regularization, e.g., Lasso (Tibshirani, 1996) or ridge (Hoerl and Kennard, 1970)) onto the augmented design matrix [X, X̃] to yield the augmented vector of estimated coefficients (β̂₁, …, β̂_{2p}) for both the original explanatory variables and their knockoffs. The procedure then constructs certain antisymmetric statistics W_j such that W_j compares β̂_j and β̂_{j+p} in a manner that results in a positive W_j if x_j appears more significant than its knockoff x̃_j and a negative W_j if the opposite is true. A variable is selected (i.e., j ∈ Ŝ) if W_j is large and positive. For example, one choice that was further explored in Candès et al. (2018) is W_j = |β̂_j| − |β̂_{j+p}|. Another choice that applies only to Lasso regression with regularization parameter λ, which was the primary focus of BC, is to define W_j = max(Z_j, Z̃_j) · sign(Z_j − Z̃_j), where Z_j (Z̃_j) denotes the largest λ on the Lasso path for which β̂_j (β̂_{j+p}) was nonzero. A positive W_j in this case means that, as λ was increased, x_j remained in the model longer than x̃_j, and vice-versa for negative W_j. The knockoff filter then selects variables via Ŝ = {j : W_j ≥ τ}, where the threshold τ is selected empirically via

(4)  τ = min{ t ∈ 𝒲 : #{j : W_j ≤ −t} / max(#{j : W_j ≥ t}, 1) ≤ q },

where 𝒲 = {|W_j| : j = 1, …, p}.
Above, the numerator is an estimate of the number of false discoveries for threshold t: the knockoffs are constructed without regard for y and truly have no effect on the response, and so W_j is symmetrically distributed about zero for the null features. Barber and Candès (2015) showed that this knockoff procedure controls the FDR at the target level q.
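The data-driven threshold in (4) can be computed by scanning the candidate values {|W_j|}. A sketch, assuming a vector of knockoff statistics W; the optional `offset` argument implements the "+1" numerator of the knockoff+ variant:

```python
import numpy as np

def knockoff_threshold(W, q, offset=0):
    """Smallest t among {|W_j| : W_j != 0} such that
    (offset + #{W_j <= -t}) / max(#{W_j >= t}, 1) <= q, as in eq. (4)."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (offset + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf  # no feasible threshold: select nothing
```

The selection set is then {j : W_j ≥ τ}; with `offset=1` the estimate of the false discovery count is inflated by one.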
Although the knockoff filter provably controls the FDR, it has been observed to lose power in certain problems (Gimenez and Zou, 2019; Liu and Rigollet, 2019; Li and Fithian, 2021). The loss of power may be due to several reasons. Perhaps most significantly, the statistics of the knockoff filter are computed for the augmented problem of regressing y onto [X, X̃], which is substantially different from the original problem of regressing y onto X. Namely, the augmented regression involves twice as many explanatory variables as the original regression, which typically increases the variance of the estimated regression coefficients substantially and reduces the ability to discern between null and non-null variables.
In contrast, the GM approach of Xing et al. (2023a) considers each explanatory variable one at a time. For each j, it generates a pair of “mirror variables” x_j + c_j z_j and x_j − c_j z_j for some appropriately scaled Gaussian random vector z_j, regresses y onto (x_j + c_j z_j, x_j − c_j z_j, X_{−j}), where X_{−j} denotes X with the j-th column removed, and then computes a statistic by contrasting the coefficients for x_j + c_j z_j and x_j − c_j z_j in this augmented regression. As noted in Xing et al. (2023a), this is equivalent to regressing y onto (x_j, z_j, X_{−j}) and contrasting the coefficients of x_j and z_j. Compared with knockoffs, this approach has weaker theoretical FDR control guarantees (the FDR is controlled asymptotically when the dependence between features is mild), but empirically it sometimes exhibits higher power while maintaining reasonable FDR control.
We propose a new approach, termed one-at-a-time knockoffs (OATK), that combines desirable aspects of the GM and BC approaches but is couched more firmly in the knockoff framework. One contribution of this work is to draw a closer connection to the knockoff framework than was drawn in Xing et al. (2023a), in a manner that provides insight into how to achieve more powerful detection of signals and alleviate computational issues, while preserving FDR control. The specific contributions of this work include:
-
(1)
OATK constructs a knockoff copy for each variable one at a time, where the knockoffs must satisfy only a subset of the BC knockoff conditions, allowing substantially greater flexibility in generating knockoffs. The subsequent knockoff regressions are of exactly the same dimension as the original regression (only a single column is replaced by its knockoff), avoiding the reduction in power associated with the BC regression onto 2p variables. Our feature importance statistics constructed from the knockoff and the original coefficients are still symmetric around zero for the nulls.
-
(2)
OATK is far more computationally efficient to implement than the BC procedure, and, using well-known matrix algebra identities, the feature importance statistics can be computed without actually regressing onto the knockoff design matrix. The low computational expense and one-at-a-time nature of the knockoffs facilitate adding performance enhancements such as derandomization or a "multi-bit" variant that constructs multiple knockoff copies for each feature, leading to a fine-grained marginally valid p-value that can be viewed as a conditional randomization test p-value (Candès et al., 2018) in the fixed-design setting. Such p-values (one-bit or multi-bit) are then passed to the SeqStep filter (Barber and Candès, 2015) to generate the selection set.
-
(3)
We prove asymptotic FDR control of OATK under mild assumptions and develop non-asymptotic bounds on the FDR, characterizing conditions for approximate FDR control. We also propose conditionally calibrated OATK that achieves exact FDR control, leveraging techniques in Fithian and Lei (2022).
-
(4)
Through extensive numerical experiments and real-data examples, we demonstrate that our approach is substantially more powerful than the approaches of either BC or GM, while still controlling the FDR.
Additional notation and assumptions used throughout the paper.
Throughout, we assume X is full rank, each column x_j (j = 1, …, p) is scaled to have unit norm, and we denote X̃ as the knockoff matrix. The matrix X_{−j} denotes the original design matrix with column j removed. For any matrix A, let A_j and a_i denote the j-th column and i-th row, respectively, and let A_{ij} denote the element in the i-th row and j-th column. The set of null features is denoted by H₀ = {j : β_j = 0}. We denote p₀ = |H₀| and p₁ = p − p₀ as the number of null and non-null variables, respectively.
2 One-at-a-time knockoff procedure
Our approach generates knockoffs one at a time, with the knockoff x̃_j of each column x_j satisfying the following two conditions:

(5)  X^{(j)ᵀ} X^{(j)} = Σ  and  x_jᵀ x̃_j = 1 − D_{jj},

where X^{(j)} denotes X with its j-th column replaced by x̃_j, and D is a diagonal matrix with non-negative entries such that 2Σ − D is positive semi-definite. This is substantially less restrictive than the BC approach, which additionally requires X̃ᵀX̃ = Σ. Throughout, the regressions can be either ordinary least squares (OLS) or regularized ridge regression, the former being a special case of the latter.
In Section 2.1, we describe the main ideas behind our OATK regression. In Section 2.2, we convert the knockoff requirements (5) into equivalent but computationally simpler conditions and suggest a simple and attractive choice for . In Section 2.3, we derive an explicit connection between our knockoff regression coefficients and the original regression coefficients that is both revealing and that allows a fast procedure for computing the coefficients without having to explicitly generate knockoffs or conduct the knockoff regressions. In Section 2.4, we discuss what properties the test statistics must possess to control FDR. In Section 2.5, we describe other enhancements that OATK can incorporate to further improve FDR control and power.
2.1 OATK regression structure
Consider the ridge regression parameter estimates
(6)  β̂ = (Σ + λI)⁻¹ Xᵀ y,

where Σ = XᵀX and λ ≥ 0 is the ridge regularization parameter. The same value for λ used in this ridge regression is used in all the knockoff regressions described shortly. See Appendix A for a computationally efficient procedure to select λ empirically. We define our knockoff ridge regression approach as follows. For each j = 1, …, p, consider

(7)  β̂^{(j)} = (X^{(j)ᵀ} X^{(j)} + λI)⁻¹ X^{(j)ᵀ} y = (Σ + λI)⁻¹ X^{(j)ᵀ} y,

where X^{(j)} = [x₁, …, x_{j−1}, x̃_j, x_{j+1}, …, x_p] is the design matrix that we use in the OATK knockoff regression for the j-th variable (i.e., we replace only the j-th column by its knockoff), and the second equality follows from the Gram-preserving conditions (5). The only element of β̂^{(j)} that we will need is its j-th element β̃_j, obtained from (7) via β̃_j = e_jᵀ (Σ + λI)⁻¹ X^{(j)ᵀ} y; for λ = 0, the diagonal element [Σ⁻¹]_{jj} can be interpreted as the reciprocal of the regression sum of squared errors (SSE) for regressing x_j onto the other columns of X. Thus, we can efficiently obtain our vector of knockoff coefficients β̃ = (β̃₁, …, β̃_p)ᵀ via

(8)  β̃_j = e_jᵀ (Σ + λI)⁻¹ X^{(j)ᵀ} y,  j = 1, …, p,

where e_j denotes the j-th standard basis vector. Note that, unlike the augmented 2p-variable regression of the BC approach, the estimates in (6) and (8) come from p-dimensional regressions of exactly the same size as the original problem.
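The defining computation of this section (refit the same ridge regression with a single column swapped for its knockoff) can be sketched directly, without the matrix-identity speedups of Section 2.3. In this sketch the knockoff columns are assumed to be precomputed, and `lam` plays the role of the ridge parameter:

```python
import numpy as np

def ridge_coef(X, y, lam):
    """Ridge estimate (X^T X + lam I)^{-1} X^T y, as in eq. (6)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def oatk_coefficients(X, X_tilde, y, lam):
    """Original ridge coefficients and, for each j, the j-th coefficient
    of the refit with column j replaced by its knockoff (eq. (7))."""
    p = X.shape[1]
    beta_hat = ridge_coef(X, y, lam)
    beta_knock = np.empty(p)
    for j in range(p):
        Xj = X.copy()
        Xj[:, j] = X_tilde[:, j]      # one-at-a-time column replacement
        beta_knock[j] = ridge_coef(Xj, y, lam)[j]
    return beta_hat, beta_knock
```

This direct loop costs one p-dimensional solve per variable; the closed-form shortcut of Section 2.3 avoids refitting entirely.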
The following proposition, whose proof is in Appendix B.1, establishes the joint distribution of the original and knockoff coefficients (β̂, β̃) with X treated as fixed.
Proposition 1.
From Proposition 1, if , for any , the marginal distribution of is
(14) |
which implies that the distribution of is symmetric, as formalized below.
Corollary 2.1.
Under the same conditions as Proposition 1, when variable j is null, the marginal distribution of the corresponding statistic is symmetric about zero.
Several remarks are in order.
Remark 1.
For the special case that (OLS), the means and covariances reduce to
(15) | |||
(16) | |||
(17) | |||
(18) |
In this case, a particularly appealing choice of is , which is always allowable (see Section 2.2). This results in regardless of , which should yield better separation between the statistics and when . A potentially desirable side effect of this choice for is that it also results in
Remark 2.
Unlike the BC knockoffs, our knockoffs are not required to satisfy the full BC condition X̃ᵀX̃ = Σ, other than the diagonal conditions x̃_jᵀx̃_j = 1. The removal of this condition only affects the correlations among the knockoffs and allows for a more flexible, and potentially better, choice of knockoff. The price is that our procedure no longer enjoys provable finite-sample FDR control like the BC procedure. But as we show later, it still delivers asymptotic FDR control and finite-sample FDR bounds under mild and verifiable conditions, and it demonstrates good FDR control empirically.
2.2 Knockoff generation and choice of
The one-at-a-time knockoff conditions defined in (5) are equivalent to requiring
(19) |
To provide insight into the role of the knockoffs and how to generate them efficiently, consider the singular value decomposition , where and have orthonormal columns, and is diagonal. Without loss of generality, represent as
(20) |
where is the unit-norm vector orthogonal to the columns of and such that the columns of are an orthonormal basis for the column space of , is the sum of squares error (SSE) for regressing onto , and satisfies . Our assumption that is full rank implies for . To generate , we must find the coefficients and and then generate appropriately. The following proposition characterizes the exact form of .
Proposition 2.
For each , the -th knockoff satisfying (5) must be of the form
(21) |
where we can select any and then generate randomly as any vector orthogonal to the columns of and with norm .
The proof of Proposition 2 is relegated to Appendix B.2. Regarding the choice of , Equation (14) implies that it has no effect on the variances of , although it does affect their correlations (and their correlations with ). The appealing choice suggested in Remark 1 is always allowable, and we use this in all of our examples. This choice also results in , in which case (21) becomes
(22) |
where is any vector orthogonal to the columns of and with norm . The OATK (22) is orthogonal to , the component of orthogonal to the column space of . Consequently, the choice also results in the correlation between and being no more than what ensures that has the required correlation with in (19).
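The stochastic ingredient of the construction is a random vector orthogonal to the column space of X with a prescribed norm. It can be generated by projecting a Gaussian draw onto the orthogonal complement of col(X) and rescaling; a sketch assuming n > p:

```python
import numpy as np

def random_orthogonal_vector(X, norm=1.0, rng=None):
    """Draw a Gaussian vector, project out the column space of X,
    and rescale to the required norm (the random ingredient of the
    knockoff construction)."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(X.shape[0])
    Q, _ = np.linalg.qr(X)          # orthonormal basis for col(X)
    z = z - Q @ (Q.T @ z)           # project onto the orthogonal complement
    return norm * z / np.linalg.norm(z)
```

Each fresh draw yields a different valid knockoff, which is exactly the randomness that the derandomization scheme of Section 2.5.3 averages over.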
2.3 Fast computation of OAT knockoff coefficients
Combining (21) and (8), the -th knockoff coefficient for general satisfying becomes
(23) | ||||
(24) | ||||
(25) |
where denotes the coefficient for in the OLS regression of onto . The last equality follows because, by standard partial correlation arguments, is equivalent to the coefficient for the regression of onto , which is
(26) |
For the specific choice , (25) becomes
(27) |
From (25) or the special case (27), the knockoff coefficients can be computed efficiently without requiring generation of the full knockoffs or additional regressions that include the knockoffs. One can simply conduct both ridge and OLS regression of onto and then compute after generating . An efficient implementation of this procedure and a fast approach for implementing ridge regression for various and then selecting the best via leave-one-out cross-validation (LOOCV) is detailed in Appendix A. In addition to having interpretive value, expression (27) also enables a very fast version of derandomized knockoffs, which we discuss in Section 2.5.3.
2.4 Knockoff statistics
After constructing and , the goal is to construct knockoff statistics such that large positive values of give evidence that for each variable . Specifically, we focus on
(28) |
and our OATK approach selects variables via Ŝ = {j : W_j ≥ τ}, where the data-driven threshold τ is selected (analogous to the BC threshold selection) as

(29)  τ = min{ t ∈ 𝒲 : (a + #{j : W_j ≤ −t}) / max(#{j : W_j ≥ t}, 1) ≤ q },

where 𝒲 = {|W_j| : j = 1, …, p} is the set of candidate thresholds, q is the desired FDR level, and a is the offset parameter, usually fixed as 0 or 1.
The intuition of (28) is that a variable is selected only if and . This second inequality reduces false discoveries when there exists strong multicollinearity, which potentially yields large variance of and therefore large magnitudes of the knockoff coefficient even when the variable is null. We use (28) in all our numerical examples due to its empirically accurate FDR control and high power.
In general, any choice of can be used with (29) in our framework as long as they are marginally exchangeable, as defined below.
Definition 2.2.
The statistics are marginally exchangeable if swapping and only switches the sign of for all .
2.5 Extensions of one-at-a-time knockoffs
We introduce three extensions of OATK. The first considers multiple exchangeable knockoff copies, which allows for producing fine-grained p-values. The second adjusts the rejection set of OATK to achieve finite-sample FDR control with the conditional calibration technique. The third extension concerns the derandomization of the OATK procedure.
2.5.1 Multiple OATK
For each , OATK constructs a knockoff copy that preserves the correlation structure of the original feature. We now consider generating multiple knockoff copies, with a joint correlation condition. Specifically, let denote the number of knockoff copies such that . For any , let denote the -th knockoff copy for and denote the assembly of the knockoff copies (this is to be distinguished from ). We impose the following condition on : for each ,
(30) | ||||
(31) |
where is some constant that will be determined shortly. Letting be an -dimensional vector of ’s, we generate via
(32) |
where is an orthogonal matrix whose column space is orthogonal to that of (this is possible since ), and satisfies
(33) |
When , the right-hand side of (33) is positive semi-definite and such exists. This is formalized in the following lemma.
Lemma 2.3.
The proof of Lemma 2.3 can be found in Appendix B.3. Compared with (19), Condition (iv) in (30) additionally specifies the correlation between the knockoff copies, and the conditions are reduced to those of OATK when .
Next, define for each j and m the design matrix X^{(j,m)}, i.e., X with the j-th column replaced by the m-th knockoff copy. Let β̂_j denote the coefficient of x_j from regressing y onto X, and β̃_j^{(m)} the coefficient of the m-th knockoff copy from regressing y onto X^{(j,m)}. Denoting T_j^{(0)} = |β̂_j| and T_j^{(m)} = |β̃_j^{(m)}| for m = 1, …, M, we define the OATK p-value as

(34)  p_j = (1 + #{m ∈ {1, …, M} : T_j^{(m)} ≥ T_j^{(0)}}) / (M + 1).
The selection procedure is based on these p-values and the SeqStep+ filter (Barber and Candès, 2015), proceeding in two steps: (1) the features are ordered according to their p-values such that p_{π(1)} ≤ ⋯ ≤ p_{π(p)}, where π is a permutation of {1, …, p}; (2) the rejection set is determined by a data-dependent stopping rule along this ordering, where
for some constant . This selection rule reduces to the basic OATK when .
The following lemma proves that is super-uniform under . By construction, is supported on , and when is large, the p-value is sufficiently fine-grained to demonstrate the strength of evidence against the null hypothesis.
Lemma 2.4.
Suppose that almost surely there are no ties among the coefficient magnitudes. Then, the p-value defined in (34) is uniformly distributed on {1/(M+1), 2/(M+1), …, 1} for any null j.
The proof of Lemma 2.4 can be found in Appendix B.4. Multiple OATK can be viewed as a generalization of the conditional randomization test p-value (Candès et al., 2018) to the fixed-design setting, where a (fine-grained) p-value is obtained by permutation on a synthetically created group. As in the single-knockoff case, the FDR can be approximately controlled when the dependence between the p-values is relatively small. For simplicity, we focus in Section 3 on proving FDR control for the basic OATK, which is equivalent to the multiple OATK with M = 1.
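The rank-based structure behind the multiple-knockoff p-value can be sketched as follows: the original coefficient magnitude is ranked among itself and its M knockoff counterparts, yielding a value on the grid {1/(M+1), …, 1}. The exact tie-breaking convention of (34) may differ from this sketch:

```python
def multibit_pvalue(beta_hat_j, beta_knock_js):
    """Rank-based p-value: position of |beta_hat_j| among itself and its
    M knockoff-copy coefficients, on the grid {1/(M+1), ..., 1}."""
    M = len(beta_knock_js)
    rank = 1 + sum(abs(b) >= abs(beta_hat_j) for b in beta_knock_js)
    return rank / (M + 1)
```

Under the null, each of the M + 1 magnitudes is equally likely to be the largest, which is why the p-value is uniform on the grid when there are no ties.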
2.5.2 Conditional calibration for exact FDR control
As alluded to before, our relaxation of the knockoff conditions comes at the price of exact finite-sample FDR control. Although we will show in Section 3 that OATK exhibits approximate finite-sample FDR control, as well as asymptotic FDR control, it may still be desirable to achieve exact finite-sample control in some cases, especially when type-I errors are costly.
To see why the OATK conditions do not lead to exact FDR control, recall that for BC knockoffs,
where the first inequality holds by (29) with , and the last inequality holds if , where is with its -th item removed, for all null features , which is not the case for our method.
A remedy to the FDR violation (both in theory and in practice) is via conditional calibration (Fithian and Lei, 2022; Luo et al., 2022), which is efficiently implementable with OATK. To conditionally calibrate OATK, suppose that for each hypothesis we have a statistic that is sufficient for the corresponding null model, such that, if the hypothesis is null, we can sample from the associated conditional distribution. In this case, we could use Monte Carlo to empirically evaluate the quantity
(35) |
where is a parameter not necessarily equal to and is a constant. The recommended choice is to take to yield better power (Fithian and Lei,, 2022).
To calibrate OATK, we first run the base OATK procedure to obtain and . Then, for each , compute and let
(36) | |||
For a fixed , can be computed using Monte Carlo. For each , we sample , run OATK to obtain (as a function of ), evaluate the term inside the expectation in (35), and average across the replicates. To compute , it suffices to evaluate at because implies and implies .
To obtain the rejection set, we apply the eBH procedure (Wang and Ramdas, 2022) to the e-values, which sorts them and computes
It returns the corresponding rejection set. The following proposition proves that the quantities defined in (36) are compound e-values (Ignatiadis et al., 2024) and that applying the eBH procedure to them guarantees FDR control. Its proof can be found in Appendix B.5.
Proposition 3.
The quantities defined in (36) are compound e-values, and applying the eBH procedure to them controls the FDR at the desired level.
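For reference, the base eBH procedure applied to e-values e₁, …, e_p at level q rejects the hypotheses with the k̂ largest e-values, where k̂ = max{k : e_(k) ≥ p/(qk)} and e_(k) is the k-th largest e-value. A sketch of this standard rule (without any boosting refinements):

```python
import numpy as np

def ebh(e, q):
    """e-BH: reject the k_hat largest e-values, where
    k_hat = max{k : e_(k) >= p / (q k)}."""
    e = np.asarray(e, dtype=float)
    p = len(e)
    order = np.argsort(-e)                    # indices, largest e first
    k_vals = np.arange(1, p + 1)
    ok = e[order] >= p / (q * k_vals)         # feasibility of each k
    if not ok.any():
        return np.array([], dtype=int)
    k_hat = k_vals[ok].max()
    return np.sort(order[:k_hat])
```
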
Remark 3.
If, when defining the e-values, we replace the calibrated quantity defined above with a simpler uncalibrated choice, then applying the eBH procedure is equivalent to applying the BC filter to the knockoff statistics (Ren and Barber, 2024).
The remaining question is the choice of the sufficient statistic. In our setting, a suitable sufficient statistic is available in closed form, and its full distribution can be found in Luo et al. (2022, Section E.2).
2.5.3 Derandomized one-at-a-time knockoffs
A drawback of OATK, as of BC and GM, is that the knockoff variables are stochastic, in the sense that the random vector in Proposition 2 is not unique, so that repeating the algorithm yields different knockoffs and possibly different rejection sets. As with BC, this is alleviated by the derandomization procedures proposed by Ren et al. (2023) and Ren and Barber (2024). For OATK, we adopt the derandomization scheme in Ren et al. (2023): we fix a rejection threshold and repeat the basic OATK procedure multiple times, each time using a different random generation of the vectors. The rejection set is then the set of variables that were rejected in more than a specified fraction of the trials.
-
1.
For each replicate , generate knockoffs with randomly generated vectors (e.g., as Gaussian vectors that are subsequently made orthogonal to and then rescaled to have the required norm) and construct the rejection set as in the basic OATK procedure.
-
2.
For each variable , compute the rejection frequency
(37) -
3.
Return the final rejection set as .
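The aggregation in steps 1–3 reduces to a frequency threshold on the per-replicate rejection sets. In this sketch the per-replicate sets are assumed given, and the cutoff fraction (denoted `eta` here, as the paper's symbol is not shown) defaults to 1/2:

```python
import numpy as np

def derandomized_select(rejection_sets, p, M, eta=0.5):
    """Keep variable j if it was rejected in more than a fraction eta
    of the M single-run OATK replicates (the frequency of eq. (37))."""
    freq = np.zeros(p)
    for S in rejection_sets:
        freq[list(S)] += 1.0 / M   # accumulate per-variable frequency
    return np.flatnonzero(freq > eta)
```
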
OATK allows derandomization to be incorporated in a more straightforward and computationally efficient manner that further improves performance. This is in part due to the removal of the BC knockoff requirement that X̃ᵀX̃ = Σ, which provides improved flexibility and computational efficiency when generating knockoffs, in addition to allowing us to generate knockoffs one at a time. Specifically, each knockoff variable has a deterministic component and a stochastic component, as given by Proposition 2. For each derandomization replicate, the first two terms on the right-hand side of (21) remain the same, and only the random orthogonal term varies. As a result, when computing the knockoff coefficients via (27), the deterministic part is computed only once. Then, fresh copies of the random orthogonal vector are generated to compute the stochastic part, and subsequently the knockoff coefficients, for all derandomization replicates.
3 Theoretical guarantees
The insight behind the threshold selection of τ in (29) is essentially the same as in the BC procedure. Consider the FDP of OATK:

(38)  FDP = #{j ∈ H₀ : W_j ≥ τ} / max(#{j : W_j ≥ τ}, 1)
      ≈(a) #{j ∈ H₀ : W_j ≤ −τ} / max(#{j : W_j ≥ τ}, 1)
      ≤ #{j : W_j ≤ −τ} / max(#{j : W_j ≥ τ}, 1)
      ≤(b) q,
where step (a) is approximately true by the symmetry of the null statistics (Corollary 2.1) under mild correlation across features, and step (b) is ensured by the threshold selection (29). The approximation in step (a) carries an asymptotically vanishing error term due to the finite sample. Note that the first inequality in (38) should be close to an equality when the non-null variables are sparse, especially if their coefficients are so large that W_j ≥ τ is much more likely than W_j ≤ −τ for the non-null variables. We more rigorously derive the FDR control guarantees of OATK in the remainder of this section, with asymptotic and non-asymptotic guarantees covered in Sections 3.1 and 3.2, respectively.
3.1 Asymptotic FDR bounds
The following theorem shows that when the ’s are reasonably weakly dependent on each other asymptotically, the OAT knockoff FDR is controlled as , , and all go to infinity in a certain manner.
Theorem 3.1.
Assume and as . Define
Assume that , , and are all continuous in and that there exist constants , and such that
(39) |
Then, for any target FDR level , if there exists such that , then
The proof is given in Appendix B.7. The following remark gives an example where the condition in (39) holds.
Remark 4.
Consider , for and obtained from OLS regression (original and knockoff) as described in Section 2.1 with . Then, we have that , implying that . As a result, for any , . To proceed, define the maximum correlation between two random variables and as
where and . Next, for any ,
where the first inequality follows from the fact that and ; the second inequality is due to the definition of . By Equation (8) in Bryc and Dembo, (2005), we have
As for the first term above,
As for the second term, we have that
Combining the two terms, we can see that Condition (39) is satisfied when
for some .
3.2 Non-asymptotic FDR bounds
Like the BC knockoffs, the key to the FDR control of OATK is the symmetry of the test statistics conditional on the other features. While the BC knockoff construction enforces such conditional symmetry, our relaxed construction does not exactly, so the FDR control of the BC procedure does not directly apply. The result in this section characterizes the FDR inflation in terms of the violation of conditional symmetry.
To start, we define for any the “symmetry index”
The following theorem provides an upper bound on the FDR of OATK as a function of the ’s.
Theorem 3.2.
If in the definition of in Equation (29), then for any , the FDR of OATK can be bounded as
The proof of Theorem 3.2 largely follows the leave-one-out technique used in Barber et al. (2020) and is provided in Appendix B.6.
Remark 5.
If the knockoffs satisfy the strict knockoff condition in Barber and Candès (2015), then the symmetry indices vanish for the nulls, and the FDR is strictly controlled by the corresponding bound in Theorem 3.2. When the strict knockoff condition is violated only mildly, there exists a small constant for which the symmetry index is also small, and therefore the FDR is approximately controlled.
4 Numerical experiments
In this section we show numerically that, relative to existing FDR-controlling algorithms, OATK exhibits higher power with reasonable finite-sample FDR control. We implement OATK using ridge regression, utilizing the computational speedups described in Section 2.3. For each simulation example, we fix and uniformly sample from such that . We set for all experiments. The non-null coefficients are set to , where the sign is sampled uniformly at random and the signal amplitude is varied across experiments. The remaining entries of are set to zero. The desired false discovery rate is fixed as . In each simulation example, we assume a model for and generate a different dataset for each replicate by sampling from the model, normalizing each column to have unit norm, and generating according to the linear model of Equation (1) with unit variance for the noise. In all experiments, the regularization parameter is selected using leave-one-out cross-validation, as described in Appendix A, and the offset parameter in Equation (29) is set to . For each example, we conducted 100 replicates. Section 4.1 assumes to be multivariate normal with different covariance structures, and Section 4.2 considers sampled from a discrete-time Markov chain. Section 4.3 considers real genetic data by fixing as the set of genetic mutations of the HIV genome (HIV Drug Resistance Database (Rhee et al., 2003)). Using this real design matrix (normalized to have columns with unit norm), we construct a semi-synthetic simulation example by sampling and as described above, similar to the semi-synthetic genetics study of Xing et al. (2023a). We also conduct a genome-wide association study by taking the response to be real experimentally collected susceptibility data of the HIV virus to fosamprenavir, a common treatment drug. Section 4.4 examines further enhancements of OATK when coupled with multi-bit p-values and conditional calibration.
In each simulation example, we include both randomized and derandomized versions of OATK. The existing algorithms with which we compare OATK are the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995) (BH), BC, and GM. The BC and GM procedures were implemented with Lasso regression, as we observed higher power compared to ridge regression. Additionally, for the Markov chain benchmark in Section 4.2, we compare OATK against the modified knockoff algorithm of Sesia et al. (2018) (KnockoffDMC), which generates knockoff variables distributed as a discrete-time Markov chain. Since we only consider low-dimensional datasets where n > p, we use the formulation of GM that requires no pre-screening of variables, detailed in Appendix C. The R package for OATK is publicly available at https://github.com/charlie-guan/oatk.
4.1 Simulation with Gaussian
In this example, each row of is generated independently from a multivariate normal distribution, , with three different structures of . Namely, we consider
-
1.
Power decay: for all .
-
2.
Constant positive correlation: for all
-
3.
Constant negative correlation: , where for all .
In all following simulations, we set for all covariance structures. FDR and power results under different are included in Appendix C.2. Changing does not change OATK’s uniformly higher power compared to other algorithms.
Small and study.
We set and . Fig. 1 plots the FDR and power (the average FDP and TDP, respectively) across the 100 replicates for each example. OATK exhibits significantly higher power than the other algorithms across all three covariance structures and all signal amplitudes. OATK exhibits slight FDR inflation with the power decay and constant positive covariances, but its FDR never exceeded , except in . For the constant negative covariance, OATK exhibited somewhat higher FDR inflation at around but far higher power than other algorithms. Derandomization, which is efficiently implementable with OATK due to its substantially lower computational expense, lowers the FDR inflation somewhat with no effect on power. GM typically has somewhat lower power and lower FDR than BH. In the constant positive covariance at lower amplitudes, GM had higher FDR inflation than all other methods. BC exhibits consistent FDR control and higher power than BH and GM for the power decay and constant positive correlations, but its power drops to almost zero for the constant negative correlation. It should also be noted that when we varied , similar conclusions hold regarding the generally superior power and reasonable FDR control of OATK, as we demonstrate in Appendix C.2.

Larger and study.
We next set and . Fig. 2 shows OATK again has the best power among all procedures. While it still has relatively mild FDR inflation in some cases, the differences in power between OATK and the other methods are even higher than in the smaller and study. Derandomization again lowers the FDR inflation, while slightly boosting power.

Regarding replicate-to-replicate variability, Fig. 3 and 4 show the distribution of FDP and TDP across all replicates at a fixed signal amplitude of in the small and study and the large and study, respectively, from which it can be seen that derandomization lowers the variances of both FDP and TDP in OATK. One drawback of OATK is that it only satisfies asymptotic FDR control, while other methods such as BH and BC satisfy finite-sample FDR control for any . As a result, OATK exhibits some FDR inflation in Figs. 1 and 2, while the two competing procedures do not. GM, which satisfies only asymptotic FDR control, also shows FDR inflation in some cases, especially in the constant positive covariance example. However, in light of the replicate-to-replicate variability seen in Figs. 3 and 4 for all methods, small inflation of the FDR may be viewed as relatively innocuous, since it is small relative to the FDP variability. In particular, users apply the approach to a single data set, and the actual FDP for that data set will often differ substantially from the FDR due to the FDP variability. From Figs. 3 and 4, all methods yield quite high FDPs (e.g., above ) on at least some replicates, and the upper FDP quartile for even the BC and BH methods (which guarantee ) exceeds for many examples. In the constant negative example, BH and BC suffer from more extreme outliers where the FDP is high, and they sometimes yield higher variability of TDP compared to OATK. Moreover, the derandomized OATK reduces the variances of both the FDP and TDP to levels that are lower than the other methods, so that it actually yields less extreme worst-case FDPs and TDPs than the other methods.


4.2 Simulation with Markov chain design
Our OATK approach applies to a general fixed-design matrix X, as long as it is full rank. Next we consider the setting of Sesia et al., (2018), who developed a knockoff filter specialized to design matrices generated from a Markov chain. Specifically, each row of X is generated independently of the other rows as a scalar discrete Markov chain over a three-state space, as follows. We first sample the Markov chain hyperparameters, with the same hyperparameters applied to each row. We sample the initial state uniformly at random from the state space. Then, at each subsequent step, we sample the next state using the transition probability
In this model, the chain is more likely to remain in the same state than to jump to a different state, but this sojourn probability differs across covariates due to the different hyperparameters. If the chain does jump, it chooses between the other two possible states uniformly at random.
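A minimal sketch of this design generator, assuming three states and sojourn probabilities drawn uniformly from an illustrative range (the paper's exact hyperparameter distribution is not reproduced here, and the function name is ours):

```python
import numpy as np

def markov_design(n, p, seed=0):
    """Generate an n x p design whose rows are independent three-state
    Markov chains: column j keeps the previous state with probability
    q[j] and otherwise jumps uniformly to one of the other two states."""
    rng = np.random.default_rng(seed)
    q = rng.uniform(0.5, 0.9, size=p)        # per-covariate sojourn probabilities
    X = np.empty((n, p), dtype=int)
    X[:, 0] = rng.integers(0, 3, size=n)     # initial state uniform on {0, 1, 2}
    for j in range(1, p):
        stay = rng.random(n) < q[j]
        shift = rng.integers(1, 3, size=n)   # +1 or +2 mod 3 picks uniformly
        X[:, j] = np.where(stay, X[:, j - 1], (X[:, j - 1] + shift) % 3)
    return X                                 # between the other two states
```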
Fig. 5 shows that OATK has the best power in both the smaller and larger n and p settings, although in the latter its power is comparable to KnockoffDMC. In the smaller setting, there is some FDR inflation, but derandomization lowers it closer to the target level, and the inflation is comparable to that of the other methods, especially KnockoffDMC. In the larger setting, FDR control is essentially at the target level. Moreover, OATK does not suffer the significant FDR inflation at lower signal amplitudes that GM does. Relative to KnockoffDMC, OATK performs similarly in power in the larger n and p setting and better in the smaller setting, despite the fact that KnockoffDMC was developed specifically for, and only applies to, the Markov chain situation and requires estimating the transition probability matrix prior to variable selection. Additionally, OATK with derandomization yields the smallest replicate-to-replicate variability in TDP (Fig. 6). KnockoffDMC suffered from poor TDP on several replicates; OATK avoided these extreme low-power outliers, except for one replicate, and that outlier was eliminated by derandomization. Therefore, even when the power is comparable, OATK is a more reliable and consistent variable selection procedure, especially in cases where reproducibility is essential.


4.3 Genome-wide association study with HIV drug resistance data
Genome-wide association studies (GWAS) examine a genome-wide collection of genetic variants from individuals sampled from a target population to detect whether any variants are associated with a phenotype, e.g., assessing which mutations increase resistance to therapeutics for patients with human immunodeficiency virus (HIV).
4.3.1 Semi-synthetic example with real covariates and synthetic response
We consider real genetic covariates from the HIV Drug Resistance Database of Rhee et al., (2003), available at https://hivdb.stanford.edu. Their protease inhibitor (PI) database collected 2395 genetic isolates. After removing duplicate data, rows with missing values, and rare mutations (i.e., occurring in fewer than 10 data points), we obtain a binary design matrix whose entries indicate the absence (0) or presence (1) of a given genetic mutation in a patient's genome. When constructing the coefficient vector, a positive non-null coefficient means the presence of its corresponding mutation increases the response phenotype (e.g., increased drug resistance), and a negative coefficient indicates a lowered response. The zero coefficients correspond to mutations that have no effect on the response. The results are shown in Fig. 7. OATK again yields the best power but with some FDR inflation, which derandomization reduces to levels comparable with BC. All of the methods except BH show some FDR inflation.
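The cleaning steps above can be sketched as follows; the helper name and the assumption that the data arrive as a 0/1 pandas DataFrame of mutations are ours:

```python
import pandas as pd

def clean_genotypes(df, min_count=10):
    """Apply the cleaning steps described above to a 0/1 mutation matrix:
    drop duplicate isolates, drop rows with missing values, and drop
    rare mutations occurring in fewer than `min_count` samples."""
    df = df.drop_duplicates()
    df = df.dropna()
    counts = (df == 1).sum(axis=0)            # occurrences of each mutation
    return df.loc[:, counts >= min_count]
```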

4.3.2 Identification of key genetic mutations that affect resistance to antiretroviral therapy
In this example, we use completely real data: the same covariates as in Section 4.3.1, with the response taken to be the logarithmic fold resistance to fosamprenavir, a common antiretroviral therapeutic for HIV patients. We do not run any simulations, as both the covariates and the response are data collected from real in vitro laboratory tests as part of the HIV Drug Resistance Database. The experimental methodology used to collect the data is detailed in Rhee et al., (2003). The goal of this study is to compare variable selection procedures for identifying the relevant mutations that affect a patient's drug resistance to fosamprenavir, and to corroborate any identified mutations against additional clinical studies.
The data was cleaned as described in Section 4.3.1, although the resulting numbers of samples and genetic mutations differ slightly from Section 4.3.1 because of the addition of the real fold-resistance data for fosamprenavir. Table 1 lists the total number of key mutations identified by each procedure, as well as the ones that were uniquely identified (i.e., not identified by any of the other procedures). OATK selected 77 key mutations, the most among the four benchmarked algorithms; it also had the highest number of mutations not identified by the other three algorithms. In particular, OATK uniquely identified the mutations 36L, 71V, 73A, and 73C. In attempting to verify whether these uniquely identified mutations were true or false positives, we found that several clinical studies (Lastere et al., 2004; Pellegrin et al., 2007; Masquelier et al., 2008) noted them to affect virological response in HIV patients treated with fosamprenavir. We emphasize that the results in Table 1 used only in vitro laboratory tests and no clinical data. In this sense, OATK uniquely identified multiple mutations that were corroborated to have clinical impact. In contrast, the uniquely identified mutations of the other procedures were not clinically corroborated, to the best of our knowledge.
Procedure | # Identified | Uniquely Identified Mutations
---|---|---
BH | 60 | None
BC | 64 | 12S, 17E, 55R, 66V
GM | 71 | 23I, 37S, 57G, 68E
OATK | 77 | 12P, 13V, 36L, 67E, 71V, 73A, 73C
4.4 Effect of multi-bit and conditional calibration on OATK
Fig. 8 compares the performance of OATK with and without the multi-bit p-value enhancement for various generation sizes, in the same Gaussian setting of Section 4.1 presented in Fig. 1. Relative to basic OATK, multi-bit OATK lowers the FDR inflation but reduces power, especially for the constant negative covariance example, for which the FDR inflation was higher. Even with this power reduction, however, multi-bit OATK still had power as high as or higher than (and sometimes substantially higher than) the other methods across all three examples (compare Figs. 1 and 8). For the largest generation size, multi-bit OATK conservatively controlled the FDR below the specified level across all examples and signal amplitudes, and it yielded the FDR closest to the desired level among the generation sizes considered.

Fig. 9 shows the effects of conditional calibration on OATK for a Gaussian example with varying dimension, with the sample size scaled accordingly and the signal amplitude held fixed. Rather than computing the calibration quantity for every variable using Monte Carlo simulations, we carry out the calibration step only on promising variables, following an approach similar to Luo et al., (2022) to improve computational speed. The implementation details are in Appendix C.3. Fig. 9 shows that conditionally calibrated OATK corrects the FDR inflation across all dimensions and all covariance cases without having to run the calibration for every variable, confirming that conditional calibration achieves finite-sample FDR control. However, it also lowers power compared to basic OATK.

5 Conclusions
This paper introduces one-at-a-time knockoffs, a more powerful and computationally efficient algorithm for controlling the false discovery rate while discovering true signals in OLS and ridge regression problems with more samples than variables. It substantially relaxes the conditions knockoffs must satisfy in the original BC knockoff filter by removing any requirements on the correlation between knockoffs for different variables. This relaxation enables knockoffs to be generated one at a time in a computationally efficient manner and allows knockoff regressions that involve exactly the same number of variables as the original regression. In contrast, the BC knockoff approach requires knockoffs to be generated simultaneously to satisfy additional cross-variable requirements and conducts a knockoff regression with twice the number of variables as the original regression, which unnecessarily reduces power.
We also developed a fast version of OATK that avoids having to explicitly generate the knockoff variables or compute their coefficients in full regressions. This allows additional performance-boosting enhancements like derandomization (to reduce variance), multi-bit p-values (to boost power or quantify uncertainty), and conditional calibration (to achieve finite-sample false discovery rate control) to be implemented efficiently. We have proven that OATK controls the FDR at any desired level asymptotically and provided a finite-sample bound on the FDR.
One limitation of our current OATK approach is that it is restricted to the case n > p (the original BC approach is restricted even further, to n ≥ 2p). One version of the GM approach, which also tests each variable one at a time, allows p > n via a pre-screening step that uses Lasso regression to select a subset of fewer than n variables, to which the GM procedure is then applied. We implemented similar pre-screening with our OATK approach and observed from the numerical results that it (i) works well when p > n and (ii) also boosts power when p < n while still controlling the FDR. We also found the pre-screening to substantially improve the power of the GM approach when p > n. However, its theoretical justification is unclear, and the theoretical results on FDR control in Xing et al., 2023a for the GM approach with p > n do not apply to the actual numerical implementation in their companion code (Xing et al., 2023b) nor to the version of OATK with pre-screening that we observed to work well empirically. We are currently developing an extension of OATK that applies to either p < n or p > n and that has stronger theoretical justification.
References
- Barber and Candès, (2015) Barber, R. F. and Candès, E. J. (2015). Controlling the false discovery rate via knockoffs. The Annals of Statistics, 43(5):2055 – 2085.
- Barber et al., (2020) Barber, R. F., Candès, E. J., and Samworth, R. J. (2020). Robust inference with knockoffs. The Annals of Statistics, 48(3):1409–1431.
- Benjamini and Hochberg, (1995) Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300.
- Benjamini and Yekutieli, (2001) Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29(4):1165–1188.
- Bryc and Dembo, (2005) Bryc, W. and Dembo, A. (2005). On the maximum correlation coefficient. Theory of Probability & Its Applications, 49(1):132–138.
- Candès et al., (2018) Candès, E., Fan, Y., Janson, L., and Lv, J. (2018). Panning for gold: ‘model-X’ knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(3):551–577.
- Dai et al., (2023a) Dai, C., Lin, B., Xing, X., and Liu, J. S. (2023a). False discovery rate control via data splitting. Journal of the American Statistical Association, 118(544):2503–2520.
- Dai et al., (2023b) Dai, C., Lin, B., Xing, X., and Liu, J. S. (2023b). A scale-free approach for false discovery rate control in generalized linear models. Journal of the American Statistical Association, 118(543):1551–1565.
- Fithian and Lei, (2022) Fithian, W. and Lei, L. (2022). Conditional calibration for false discovery rate control under dependence. The Annals of Statistics, 50(6):3091–3118.
- Gimenez and Zou, (2019) Gimenez, J. R. and Zou, J. (2019). Improving the stability of the knockoff procedure: Multiple simultaneous knockoffs and entropy maximization. In The 22nd international conference on artificial intelligence and statistics, pages 2184–2192. PMLR.
- Hoerl and Kennard, (1970) Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67.
- Ignatiadis et al., (2024) Ignatiadis, N., Wang, R., and Ramdas, A. (2024). Compound e-values and empirical Bayes. arXiv preprint arXiv:2409.19812.
- Lastere et al., (2004) Lastere, S., Dalban, C., Collin, G., Descamps, D., Girard, P.-M., Clavel, F., Costagliola, D., and Brun-Vezinet, F. (2004). Impact of insertions in the hiv-1 p6 ptapp region on the virological response to amprenavir. Antiviral therapy, 9(2):221–227.
- Li and Fithian, (2021) Li, X. and Fithian, W. (2021). Whiteout: when do fixed-x knockoffs fail? arXiv preprint arXiv:2107.06388.
- Liu and Rigollet, (2019) Liu, J. and Rigollet, P. (2019). Power analysis of knockoff filters for correlated designs. Advances in Neural Information Processing Systems, 32.
- Luo et al., (2022) Luo, Y., Fithian, W., and Lei, L. (2022). Improving knockoffs with conditional calibration. arXiv preprint arXiv:2208.09542.
- Masquelier et al., (2008) Masquelier, B., Assoumou, K. L., Descamps, D., Bocket, L., Cottalorda, J., Ruffault, A., Marcelin, A. G., Morand-Joubert, L., Tamalet, C., Charpentier, C., Peytavin, G., Antoun, Z., Brun-Vézinet, F., and Costagliola, D., on behalf of the ANRS Resistance Study Group (2008). Clinically validated mutation scores for HIV-1 resistance to fosamprenavir/ritonavir. Journal of Antimicrobial Chemotherapy, 61(6):1362–1368.
- Pellegrin et al., (2007) Pellegrin, I., Breilh, D., Coureau, G., Boucher, S., Neau, D., Merel, P., Lacoste, D., Fleury, H., Saux, M.-C., Pellegrin, J.-L., Lazaro, E., Dabis, F., and Thiébaut, R. (2007). Interpretation of genotype and pharmacokinetics for resistance to fosamprenavir-ritonavir-based regimens in antiretroviral-experienced patients. Antimicrobial Agents and Chemotherapy, 51(4):1473–1480.
- Ren and Barber, (2024) Ren, Z. and Barber, R. F. (2024). Derandomised knockoffs: leveraging e-values for false discovery rate control. Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(1):122–154.
- Ren et al., (2023) Ren, Z., Wei, Y., and Candès, E. (2023). Derandomizing knockoffs. Journal of the American Statistical Association, 118(542):948–958.
- Rhee et al., (2003) Rhee, S.-Y., Gonzales, M. J., Kantor, R., Betts, B. J., Ravela, J., and Shafer, R. W. (2003). Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Research, 31(1):298–303.
- Sesia et al., (2018) Sesia, M., Sabatti, C., and Candès, E. J. (2018). Gene hunting with hidden Markov model knockoffs. Biometrika, 106(1):1–18.
- Storey, (2002) Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society Series B: Statistical Methodology, 64(3):479–498.
- Storey et al., (2004) Storey, J. D., Taylor, J. E., and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society Series B: Statistical Methodology, 66(1):187–205.
- Tibshirani, (1996) Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1):267–288.
- Wang and Ramdas, (2022) Wang, R. and Ramdas, A. (2022). False Discovery Rate Control with E-values. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(3):822–852.
- Xing et al., (2023a) Xing, X., Zhao, Z., and Liu, J. S. (2023a). Controlling false discovery rate using Gaussian mirrors. Journal of the American Statistical Association, 118(541):222–241.
- Xing et al., (2023b) Xing, X., Zhao, Z., and Liu, J. S. (2023b). Gaussian mirror for high-dimensional controlled variable selection. https://github.com/BioAlgs/GM.
Appendix A Fast Implementation of One-at-a-time Knockoffs using Ridge Regression
We describe a fast approach for implementing ridge regression for various penalty values $\lambda$ and then selecting the best $\lambda$ via leave-one-out cross-validation (LOOCV). Combined with Section 2.3, this makes OATK far faster than the BC or GM approaches.
Consider the singular value decomposition $X = U D V^\top$, where $U \in \mathbb{R}^{n \times p}$ and $V \in \mathbb{R}^{p \times p}$ have orthonormal columns and $D = \mathrm{diag}(d_1, \ldots, d_p)$ contains the singular values. The OLS and ridge coefficient estimates become

$\hat{\beta} = (X^\top X)^{-1} X^\top y = V D^{-1} U^\top y$ (40)
$\hat{\beta}_\lambda = (X^\top X + \lambda I)^{-1} X^\top y$ (41)
$\phantom{\hat{\beta}_\lambda} = V (D^2 + \lambda I)^{-1} D\, U^\top y$ (42)
The vector of fitted response values using $\hat{\beta}_\lambda$ is

$\hat{y}_\lambda = X \hat{\beta}_\lambda = U D^2 (D^2 + \lambda I)^{-1} U^\top y = H_\lambda y$ (43)
and the corresponding LOOCV error sum of squares is

$\mathrm{SS}_{\mathrm{LOOCV}}(\lambda) = \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_{\lambda,i}}{1 - (H_\lambda)_{ii}} \right)^2, \qquad (H_\lambda)_{ii} = \sum_{j=1}^{p} U_{ij}^2 \frac{d_j^2}{d_j^2 + \lambda}$ (44)
Equations (40)-(44) provide a fast method to fit ridge regression models with many different $\lambda$ values from a single SVD and to select the $\lambda$ that minimizes the LOOCV error sum of squares.
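A minimal sketch of this LOOCV shortcut, assuming n > p and a thin SVD (the function name and penalty grid are ours):

```python
import numpy as np

def ridge_loocv(X, y, lambdas):
    """Fit ridge regression for every penalty in `lambdas` from a single
    thin SVD of X, returning (best LOOCV error sum of squares, best
    lambda, coefficients), following Equations (40)-(44)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    best = None
    for lam in lambdas:
        shrink = d / (d ** 2 + lam)                      # (41)-(42)
        beta = Vt.T @ (shrink * Uty)
        yhat = U @ ((d ** 2 / (d ** 2 + lam)) * Uty)     # (43)
        lev = np.einsum("ij,j,ij->i", U, d ** 2 / (d ** 2 + lam), U)
        ss = np.sum(((y - yhat) / (1.0 - lev)) ** 2)     # (44)
        if best is None or ss < best[0]:
            best = (ss, lam, beta)
    return best
```

The leverage values (the diagonal of the ridge hat matrix) stay strictly below one whenever $\lambda > 0$, so the division in (44) is safe.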
Appendix B Technical Proofs
B.1 Proof of Proposition 1
By standard results for ridge regression, we have and
(49)–(50)
For the OATK coefficients,
(51)–(52)
where
(53)–(60)
so that the marginal variances of and are given by
(61) |
We also have
(62)–(63)
B.2 Proof of Proposition 2
Recall that we can decompose the OATK as
(64) |
With the above representation, Condition (i) in (19) becomes
(65) |
so that
(66) |
Notice that if and only if lies in the column space of , in which case the term in (64) disappears, and we can represent , where by (66). In this case, Condition (iii) in (19) implies , and the only knockoff that satisfies Condition (i) and Condition (iii) is . This is the primary reason we must assume the design matrix is full rank.
B.3 Proof of Lemma 2.3
Define . It can be checked that has two eigenvalues: and , where the former corresponds to a unique eigenvector and the latter has multiplicity . It is then straightforward that is positive semi-definite when .
B.4 Proof of Lemma 2.4
Given a permutation on and any , we first show that
(79) |
Recall that for some function . By construction, and , where we write . It then suffices to prove that
For any , we can explicitly write
where the second-to-last step is due to the construction of , and the last step is because under the null. We then have
(80) |
where and
(81) |
where is an -dimensional vector with all entries being one. It can be seen that the above distribution is invariant to the permutation of . We have thus shown (79).
Then for any ,
(82)–(85)
We have therefore completed the proof.
B.5 Proof of Proposition 3
If , then
Now suppose instead that . By definition, there exists an increasing sequence such that as and . Then
where step (i) is by the monotone convergence theorem.
In both cases, we obtain
verifying that the ’s are compound e-values. The FDR control of the eBH procedure follows directly from Wang and Ramdas, (2022).
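For reference, the eBH procedure of Wang and Ramdas, (2022) invoked here can be sketched as follows: with p hypotheses and target level q, sort the e-values in decreasing order and reject the k̂ largest, where k̂ = max{k : e_[k] ≥ p/(qk)} (the function name is ours):

```python
import numpy as np

def ebh(e_values, q=0.1):
    """e-BH: reject the hypotheses with the k-hat largest e-values,
    where k-hat = max{k : e_[k] >= p / (q * k)}."""
    e = np.asarray(e_values, dtype=float)
    p = e.size
    order = np.argsort(-e)                  # indices by decreasing e-value
    ks = np.arange(1, p + 1)
    ok = e[order] >= p / (q * ks)
    if not ok.any():
        return np.array([], dtype=int)
    khat = ks[ok].max()
    return np.sort(order[:khat])            # indices of rejected hypotheses
```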
B.6 Proof of Theorem 3.2
Roughly speaking, the FDR can be well controlled when for all with high probability. To formalize this idea, for some , we decompose the FDR as
FDR
where the inequality follows because the FDP is upper bounded by one. By the choice of , we further bound the first term above as
Since the threshold is determined by , we can write for some mapping . For each , we further define . On the event , and therefore , deterministically. As a result,
Recall that, by definition,
Then by the law of total expectation,
(86) |
where the last step is because the relevant quantities are deterministic given the conditioning variables. We then have
where the last step follows from Barber et al., (2020, Lemma 6). Summing over and combining everything above, we arrive at
B.7 Proof of Theorem 3.1
We start by defining the following empirical quantities:
as well as their population counterparts:
By symmetry, we have that . We also let
which correspond to the true FDP, its estimate, and their asymptotic counterparts.
To connect the empirical quantities with their oracle counterparts, we leverage the following lemma, adapted from Xing et al., 2023a. Note that the condition and proof in Xing et al., 2023a apply only to one of the uniform convergence statements; we modify their proof accordingly and provide the details in Section B.8.
Lemma B.1 (Lemma 8 of Xing et al., 2023a).
Under the same assumptions as Theorem 3.1, and supposing that the limiting function is continuous, then as ,
By assumption, for any sufficiently small , there exists a threshold such that
Also, since as , there exists sufficiently large such that for any , .
Using Lemma B.1, we have for the fixed , , and . By the continuous mapping theorem,
Therefore, for any , there exists large enough such that for any ,
On the event , there is (recall the definition of ). By monotonicity, when . Meanwhile, since , we can find large enough such that when , with probability at least ,
Next, on the event ,
Similarly, on the same event, with , we have
Again by Lemma B.1, there exists such that when , with probability at least ,
(87)–(88)
Let denote the intersection of and the event in Equation (87). Taking the union bound, we have that . Then, for ,
Above, step (a) is because and step (b) is because on the event ,
We then proceed to upper-bound the FDR. When ,
FDR
where the last inequality follows from the definition of . Since and are arbitrary, we conclude the proof.
B.8 Proof of Lemma B.1
For any , we let . Since is continuous in , we can find such that , . Then,
where the last step follows from the choice of ’s. Invoking Chebyshev’s inequality, we have
which goes to zero as since . The proof for the convergence of is exactly the same.
We now focus on . Again by the continuity of , we can find such that , . Then,
where the last step follows from the choice of ’s. Invoking Chebyshev’s inequality, we have
which goes to zero as since . The proof for the convergence of is exactly the same.
Appendix C Additional simulation details
C.1 Implementation details of the Gaussian mirror
Our implementation of the GM procedure in our setting matches the algorithm that was theoretically proven by Xing et al., 2023a (Theorem 4) to exhibit asymptotic FDR control. Specifically, for each variable, GM constructs the Gaussian mirror pair, where the marginal variance is defined as
GM then regresses the response onto the mirror-augmented design, using OLS or regularized regression, and computes a test statistic from the resulting pair of mirror coefficients. Finally, it constructs the rejection threshold and returns the selection set in the same manner as BC and OATK using Equation (29). As noted in Xing et al., 2023a, the Gaussian mirror can be viewed as a type of knockoff, since the procedure is equivalent to regressing onto the design augmented with the mirror variable and then taking the test statistic to be the difference in the magnitude of the two corresponding regression coefficients. This formulation achieves asymptotic FDR control when using OLS under a mild correlation assumption on the design matrix. We use Lasso regression in our implementation, as we found it yields higher power than the OLS and ridge versions while largely maintaining FDR control in our numerical examples.
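Under our reading of the Gaussian-mirror construction, one mirror statistic can be sketched as follows. The scale `c` (a ratio of residual norms after projecting out the other columns) stands in for the elided marginal-variance definition above and is our assumption, as are the mirror statistic's exact form and the helper name `gm_statistic`:

```python
import numpy as np

def gm_statistic(X, y, j, seed=0):
    """Mirror statistic for variable j: build x_j + c*z and x_j - c*z,
    regress y on the augmented design by OLS, and compare the two
    mirror coefficients (signal pushes the statistic positive)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    z = rng.standard_normal(n)
    Xmj = np.delete(X, j, axis=1)
    Q, _ = np.linalg.qr(Xmj)
    resid = lambda v: v - Q @ (Q.T @ v)        # project out the other columns
    c = np.linalg.norm(resid(X[:, j])) / np.linalg.norm(resid(z))
    A = np.column_stack([Xmj, X[:, j] + c * z, X[:, j] - c * z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    bp, bm = beta[-2], beta[-1]
    return abs(bp + bm) - abs(bp - bm)         # mirror statistic
```

Under the null, the two mirror coefficients are exchangeable, so the statistic is symmetric around zero; a strong signal makes it positive.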
This implementation differs from what was used in their numerical studies and coded in their accompanying software (Xing et al., 2023b), primarily due to a pre-screening step that is necessary when p > n. They initially pre-screen for promising variables by conducting Lasso regression on the original data, retaining the set of indices with nonzero coefficients in the pre-screening regression along with their coefficients, and keeping only the corresponding columns of the design matrix. Then, they construct Gaussian mirror variables for all variables, but they compute the test statistic only with respect to the retained columns, such that
(89) |
where the design used in the mirror regression consists of the retained columns, with the corresponding column removed when the variable itself survived the pre-screening. The software package of GM (Xing et al., 2023b) uses this pre-screened formulation for both the p < n and p > n cases. We found the Lasso pre-screening step to boost the power and FDR control in both the GM and OATK procedures, although OATK with pre-screening was still more powerful than GM with pre-screening.
In our numerical implementations in Section 4, we implement both GM and OATK without the pre-screening step for several reasons. First, pre-screening was primarily intended to handle the p > n case, whereas our focus is on p < n, for which pre-screening is not needed. Second, omitting it allows the more innate aspects of the two methods to be compared without conflating the effects of pre-screening. Third, the pre-screened coded implementation in Xing et al., (2023b) lacks a firm theoretical basis. Regarding the last point, while the coded implementation follows to some extent the high-dimensional (p > n) GM algorithm for which Xing et al., 2023a (Algorithm 2, Theorem 5) proved asymptotic FDR control, there are fundamental differences that render the theoretical results inapplicable. First, Algorithm 2 of Xing et al., 2023a generates mirror variables only for the variables that survive the pre-screening step, and the screened-out variables are completely ignored in the subsequent GM regressions and considered null variables by default. In contrast, the implemented algorithm generates mirror variables for all variables, including the screened-out ones, which may be included in the regression depending on the screening outcome. We found empirically that ignoring all screened-out variables resulted in much poorer FDR control than when knockoffs were generated for them, as was done in the coded implementation. Moreover, the test statistic in Algorithm 2 of Xing et al., 2023a is computed as
(90) |
where the projection is onto the column space of the retained columns, which differs substantially from (89) used in the coded implementation.
Additionally, the construction of the mirror scale differs between the two algorithms. The coded implementation constructs it in the same way as the low-dimensional algorithm without the pre-screening step. In contrast, Algorithm 2 defines
(91) |
where the construction involves the standard deviation of the noise term of the linear model (1) (which must be estimated), the CDF of the standard normal distribution, the CDF of a suitably truncated random variable, and parameters calculated from the pre-screening quantities. Defining the mirror scale as in (91) is necessary to prove asymptotic FDR control because it yields symmetry of the test statistic for null variables by correcting for the post-selection bias caused by the Lasso regression of the pre-screening step, whereas the version implemented in the code does not.
C.2 Additional numerical results for Gaussian
This section examines additional numerical results for the Gaussian study of Section 4.1. Fig. 10 shows the effect of the covariance parameter on the FDR and power in the smaller example with fixed signal amplitude. Increasing the covariance parameter, which increases the correlation among covariates in all three covariance structures, reduces power across all variable selection procedures. However, OATK shows the slowest decay in power, and increasing the correlation does not change OATK's uniformly higher power compared to the other algorithms. The FDR control of OATK is consistently reasonable in the power decay and constant positive examples, never exceeding 0.13. In the constant negative case, OATK exhibits its worst FDR inflation at the largest correlation, but we view such inflation as relatively innocuous, given OATK's superior power and the replicate-to-replicate variability discussed in Section 4.1.

C.3 Fast implementation of conditionally calibrated OATK
We run the conditional calibration procedure detailed in Section 2.5 only on the union of the rejection set of BH at a chosen FDR level, the rejection set of the basic OATK, and the two sets
(92)–(93)
Here, the p-values are those corresponding to the standard two-sided t-statistics from OLS, with their order statistics used to define the thresholds above. In other words, we run the calibration procedure only on variables with a relatively small p-value, a relatively large OATK test statistic, or membership in the rejection set of OATK.