Exact Bias Correction for Linear Adjustment of Randomized Controlled Trials
Abstract
In an influential critique of empirical practice, Freedman [Fre08a, Fre08b] showed that the linear regression estimator was biased for the analysis of randomized controlled trials under the randomization model. Under Freedman’s assumptions, we derive exact closed-form bias corrections for the linear regression estimator with and without treatment-by-covariate interactions. We show that the limiting distribution of the bias-corrected estimator is identical to that of the uncorrected estimator, implying that the asymptotic gains from adjustment can be attained without introducing any risk of bias. Taken together with results from Lin [Lin13a], our results show that Freedman’s theoretical arguments against the use of regression adjustment can be completely resolved with minor modifications to practice.
1 Introduction
Randomized Controlled Trials (RCTs) are popular in empirical economics [AP08, DGK07, Gle17, LR11]. When estimating average treatment effects, adjustment for pretreatment covariates with linear regression is a commonly recommended practice because it can reduce the variability of estimates. However, adjusting for covariates remains somewhat controversial, in large part because of an influential critique from David Freedman [Fre08a, Fre08b].
Freedman argued that randomization does not justify the use of linear regression for completely randomized experiments. Freedman’s theoretical arguments relied on three results proven under the randomization-based [SNDS90, IR15] inferential paradigm:
1. asymptotically, the linear regression estimator can be inefficient relative to the unadjusted (difference-in-means) estimator if the design is imbalanced;

2. the classical standard error for linear regression is inconsistent;

3. the regression estimator has a finite-sample bias term.
Freedman’s arguments were influential among scholars across multiple disciplines (e.g., [Dun12], [Rec21]). Freedman’s third argument garnered particular attention among social scientists. Notably, [DC18]’s critique of randomization in empirical economics argued that the bias introduced by regression undermines the gold standard argument for RCTs.
Scholars have worked to address Freedman’s critiques and to understand the extent to which they matter for empirical practice in economics. Using Freedman’s own framework, [Lin13a] showed that arguments 1 and 2 are resolved by small modifications to practice. Freedman’s efficiency result may be addressed through a simple modification of the regression specification, namely including treatment-by-covariate interactions [Bli73, Oax73]; it can then be shown that the adjusted estimator is never less asymptotically efficient than the unadjusted estimator. Regarding argument 2, Lin proved that robust standard errors ([Whi80, Hub67, Eic67]; see also [SA12]) are asymptotically conservative in Freedman’s setting, guaranteeing the validity of large-sample inference. On argument 3, [Lin13a] (see also [Lin13b]) notes that the leading term of the bias is in fact estimable and can be shown to be small in a real-world empirical example. The small-sample bias of the regression estimator, however, had not yet been fully resolved.
Since [Lin13a], there have been notable papers that have proposed unbiased regression-type estimators for experimental data. [MSY13] demonstrate that if the regression model is fully saturated (see also [AI17] and [Imb10]), then the associated effect estimate is unbiased conditional on the event that treatment is not collinear with any covariate stratum. This approach cannot generally be used without coarsening continuous covariates. [AM13] proposes the use of auxiliary data, demonstrating that the suitable use of hold-out samples ensures the finite-sample unbiasedness of the associated regression estimator, but the paper does not consider efficiency properties. More recently, [WGB18] extended [AM13] to propose an innovative but computationally expensive split-sample approach for completely randomized experiments.
The primary contribution of this paper is to resolve Freedman’s third theoretical argument by proposing finite-sample-exact, closed-form bias corrections without adding any new assumptions.
Our idea builds on [Lin13a]’s proposal to estimate the leading term of the bias, but further develops a novel finite-sample exact bias correction encompassing all higher-order terms [Fre08a]. We derive these bias corrections for both the noninteracted and interacted linear regression estimators. We prove that the estimators have the same limiting distributions as the non-bias-adjusted estimators, implying that they could replace existing estimators in instances where bias is a prevailing concern (e.g., trials that may be aggregated in meta-analysis). We further provide a numerical illustration demonstrating these properties.
Finally, we remind readers that the practice of debiasing estimators is not uncontroversial. [TE93] (a reference we thank Winston Lin for suggesting) warned that bias correction can be dangerous in practice due to its high variability: as will be shown in the simulations, when performance is measured by the root mean squared error (RMSE), there is no clear dominance among the estimators; in some cases the RMSEs of the debiased estimators are strictly smaller than those of other estimators, and in other cases larger. In real-world decision making, people may express different preferences over different statistical properties (e.g., unbiasedness versus low mean squared error; see [WGB18] for an anecdotal example of a policy-maker favoring unbiasedness). Our results imply that in large samples the additional variation caused by the bias correction is negligible, but for small samples, in some cases, we find it important to account for the sampling variability of the additional terms. To address this problem, we propose a simple modification to the standard error estimation procedure. This modification, based on recomputing OLS residuals using the debiased estimators, is shown to work well on our simulated datasets. We make recommendations for practice in the Simulation section.
The organization of the paper is as follows: Section 2 includes the model setup and assumptions; Section 3 considers a characterization of bias terms of the OLS estimators; Section 4 proposes the bias corrections; Section 5 includes simulation results with both simulated datasets and a real world dataset. In the appendix one can find the proofs for theorems in Section 3 and Section 4, and more simulation results.
2 Setting, Assumptions and Notations
We follow the setting of [Fre08a] and [Lin13a], which assumes a Neyman [SNDS90] model with covariates.
There are $n$ subjects indexed by $i = 1, \dots, n$. For each subject we observe an outcome $Y_i$ and a column vector of covariates $z_i$. The dimension of the covariates, $d$, does not change with the sample size.
Each subject has two potential outcomes, $a_i$ and $b_i$ (cf. the stable-unit-treatment-value assumption [Rub90]). We observe $a_i$ if subject $i$ is chosen for treatment arm A (the treated group) and $b_i$ if $i$ is chosen for arm B (the control group). Let $T_i$ be the dummy variable for treatment arm A. Thus the observed outcome for subject $i$ is $Y_i = T_i a_i + (1 - T_i) b_i$.
The experiment is assumed to be completely randomized: $n_A$ out of $n$ subjects are randomly assigned to arm A and the remaining $n_B = n - n_A$ subjects to arm B. Random assignment is the only source of randomness in the model. We do not assume a superpopulation: the $n$ subjects are the population of interest.
We introduce some notation. Let $n$ be the population size, and let $n_A$ and $n_B$ be the numbers of subjects in treatment arms A and B, respectively. Let $A$ denote the set of individuals chosen for arm A, and similarly $B$ for arm B. For a possibly vector-valued variable $x$, let $\bar{x}$, $\bar{x}^A$, and $\bar{x}^B$ denote its population average, arm-A sample average, and arm-B sample average, respectively. The average treatment effect (ATE) can be written in this notation as
$$\mathrm{ATE} = \bar{a} - \bar{b},$$
and the difference-in-means estimator as
$$\widehat{\mathrm{ATE}} = \bar{Y}^A - \bar{Y}^B = \bar{a}^A - \bar{b}^B.$$
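As a minimal sketch of the estimator just defined (the function and variable names here are ours, not the paper’s), the difference-in-means estimate under complete randomization can be computed as:

```python
import numpy as np

def difference_in_means(y, t):
    """Difference-in-means ATE estimate: arm-A mean minus arm-B mean.

    y : observed outcomes, shape (n,)
    t : assignment indicator (1 = arm A, 0 = arm B), shape (n,)
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=int)
    return y[t == 1].mean() - y[t == 0].mean()

# Complete randomization: exactly n_A of n units are assigned to arm A.
rng = np.random.default_rng(0)
n, n_A = 24, 8
t = np.zeros(n, dtype=int)
t[rng.choice(n, size=n_A, replace=False)] = 1
y = rng.normal(size=n)  # placeholder outcomes for illustration
est = difference_in_means(y, t)
```

Random assignment of the fixed units is the only source of randomness, matching the randomization model above.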
We make the following assumptions throughout the paper, which are standard in the literature (cf. [Fre08b, Fre08a, Lin13a]).
Assumption 1 (Bounded fourth moments).
For all $n$, the fourth moments of the potential outcomes and of each covariate are bounded above by $L$, where $L$ is a finite constant.
Assumption 2 (Convergence of first and second moments).
For ,
where the limit is a positive definite matrix with finite entries. Moreover, the covariate second-moment matrix converges to an invertible matrix.
Assumption 3 (Group Sizes).
Let $p_A = n_A/n$ and $p_B = n_B/n$ denote the inclusion probabilities for treatment arm A and arm B, respectively. We assume $0 < p_A < 1$ and $0 < p_B < 1$ for all $n$, and that both converge to limits strictly between 0 and 1.
Assumption 4 (Centering).
The covariates are demeaned: $\bar{z} = 0$.
All four assumptions are employed regularly in the literature. They are used to derive consistency and asymptotic normality of the estimators. Assumption 3 requires that each arm receive a nontrivial fraction of subjects along the asymptotic sequence of models. Assumption 4 is without loss of generality: in practice, researchers can simply demean each covariate and apply our method.
We remind the reader of the definitions of our two OLS regression-adjusted ATE estimators. The first estimator comes from a noninteracted OLS regression in which one regresses the observed outcome $Y_i$ on the treatment indicator $T_i$ and the demeaned pretreatment covariates $z_i$. The coefficient estimate on $T_i$ is the noninteracted OLS regression-adjusted ATE estimator. The second estimator comes from a fully interacted OLS regression in which one regresses $Y_i$ on $T_i$, the demeaned covariates $z_i$, and the interactions of the treatment indicator with the demeaned covariates. The coefficient estimate on $T_i$ is the interacted OLS regression-adjusted ATE estimator.
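The two specifications can be sketched as follows. This is an illustrative implementation under our reading of the text (the name `regression_adjusted_ate` is hypothetical); covariates are demeaned over the full sample, so the coefficient on the treatment dummy is the ATE estimate in both the noninteracted and interacted cases:

```python
import numpy as np

def regression_adjusted_ate(y, t, Z, interacted=False):
    """ATE estimate as the OLS coefficient on the treatment dummy.

    The design contains an intercept, the treatment indicator, and
    full-sample demeaned covariates; with interacted=True it also
    includes treatment-by-covariate interactions (Lin's specification).
    """
    y = np.asarray(y, float)
    t = np.asarray(t, float)
    Z = np.asarray(Z, float)
    Zc = Z - Z.mean(axis=0)  # demean covariates over the full sample
    cols = [np.ones_like(y), t, *Zc.T]
    if interacted:
        cols += [t * z for z in Zc.T]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]  # coefficient on the treatment indicator
```

Because the interactions use full-sample demeaned covariates, the treatment coefficient in the interacted fit is directly interpretable as the ATE estimate.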
Finally, we prepare some notation for the sections below. Let and be the centered potential outcomes, namely, and . Let , , and . With this notation the regression coefficients estimators of the pretreatment covariates in the noninteracted case can be written as , and the population coefficients . Denote the (rescaled) leverage of th data point as
.
Further define the corresponding arm-specific quantities. The regression coefficient estimators for the pretreatment covariates in the interacted case, and their population counterparts, can then be written analogously.
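For concreteness, the standard (unscaled) leverages $h_i = z_i'(Z'Z)^{-1} z_i$ can be computed as below; the paper applies a rescaling whose constant is defined in the text, and that constant is omitted here:

```python
import numpy as np

def leverages(Z):
    """Unscaled OLS leverages h_i = z_i' (Z'Z)^{-1} z_i for design matrix Z.

    The paper works with a rescaled version of these quantities; only
    the standard leverages are computed in this sketch.
    """
    Z = np.asarray(Z, float)
    G = np.linalg.inv(Z.T @ Z)
    # quadratic form z_i' G z_i for each row i
    return np.einsum('ij,jk,ik->i', Z, G, Z)
```

A useful sanity check is that the leverages sum to the column rank of the design matrix.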
3 Bias Characterization
As shown in [Lin13a], the OLS regression-adjusted ATE estimator can be written as:
for the noninteracted case and
for the interacted case, where , and are the OLS coefficients in front of the covariates.
A characterization of the bias terms is provided in this section. Note that both the noninteracted and interacted estimators can be written as sums of the difference-in-means estimator and a regression adjustment using group means and OLS coefficients. The bias comes from the regression adjustment terms, in particular from estimating the regression coefficients on the covariates. We first characterize the bias terms of the coefficient estimators, starting with the noninteracted case. From here on we assume for simplicity that all design matrices are invertible. In the case of noninvertible design matrices, our debiasing procedure still works after choosing a particular generalized inverse and computing the ATE estimators according to the formulae above.
Theorem 3.1.
The OLS coefficient vector for covariates can be written as:
with
From the coefficient decomposition one can directly characterize the bias term of the ATE estimator. Note that the bias terms are of order $1/n$.
Corollary 3.1.
The bias of the estimator is:
Moreover,
Following the same steps as we did for the noninteracted estimator, we are able to derive analogous results for the interacted estimator.
Theorem 3.2.
The OLS coefficient vectors for covariates can be written as
with
Corollary 3.2.
The bias of the estimator is:
Moreover, and .
Note that this result implies that the bias terms of the interacted ATE estimator are also of order $1/n$.
4 Bias Corrections for Regression Components
Having established the decomposition, we now derive estimators of each bias term for use as bias corrections. We show that these bias estimates are (i) exactly unbiased and (ii) have asymptotically negligible estimation error. It follows that using the bias correction with an adjusted estimator yields a finite-sample unbiased estimator with the limiting distribution of the adjusted estimator. We remind the reader of $h_i$, the (rescaled) leverage of the $i$th data point, as defined in Section 2.
We again begin with the noninteracted case.
Theorem 4.1.
An unbiased estimator for the bias in the noninteracted case is:
and are two constants depending on , and . Their exact formulas are given in the appendix. Moreover
Corollary 4.1.
The following estimator is unbiased for estimating the ATE:
The results are derived analogously in the interacted case.
Theorem 4.2.
An unbiased estimator for the bias in the interacted case is:
and are two constants depending on , and . Their exact formulas are given in the appendix. Moreover .
Corollary 4.2.
The following estimator is unbiased for estimating the ATE:
Remark 1.
Note that both adjustments in Theorem 4.1 and Theorem 4.2 are of order . Thus and . The debiased estimators have the same limiting distributions as the original estimators.
Remark 2.
We briefly remark on why it is possible to design unbiased adjusted estimators in closed form. Examining the expressions in Section 3, although all bias terms are nonlinear, only a few of them involve an infinite-order Taylor expansion, and those terms can be expressed purely in terms of observable data. All remaining terms are, in expectation, functions of moments that can be unbiasedly estimated.
5 Simulations
In this section we apply our estimators to several datasets.
We briefly comment on variance estimation and confidence interval construction. We showed in Section 4 that our debiased estimators have the same asymptotic distributions as the OLS estimators. This implies that in large samples we can recenter the OLS confidence intervals at our debiased estimates and expect the same coverage probabilities. For small samples, however, we find it important to account for the sampling variability of the additional terms. Indeed, in one of the simulations below, a naive recentering procedure leads to severe undercoverage. To address this problem, we propose a simple procedure shown to work well on our simulated datasets. (An alternative is to directly estimate the variances of the additional terms, but this may be cumbersome.) In this procedure, one first runs the OLS regression and computes the debiased estimator. One then replaces the OLS treatment coefficient with the debiased coefficient estimate and recomputes the OLS residuals, keeping all other coefficients the same. Finally, one computes the variances and constructs confidence intervals for the debiased estimator using the same formula as for the OLS estimators. In the simulations below, these procedures are denoted by BC, which stands for bias-corrected. In Appendix B, one can find a more detailed comparison of this new procedure with standard ones. In practice, we recommend that researchers use our debiased estimators with this procedure, i.e., the BC-HC2 heteroskedasticity-robust variance estimator with a Satterthwaite adjustment for inference (for a discussion of the Satterthwaite adjustment, see [Sat46], [BM02], [Lin13a], and [Imb10]).
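The residual-recomputation step of the BC procedure can be sketched as follows, under stated assumptions: the debiased coefficient `tau_debiased` is computed separately (via Theorem 4.1 or 4.2), and `hc2_variance` below is a generic HC2 sandwich formula, not the clubSandwich implementation used in the paper:

```python
import numpy as np

def bc_residuals(y, X, beta_hat, treat_col, tau_debiased):
    """Recompute OLS residuals after swapping in the debiased treatment
    coefficient, keeping all other OLS coefficients unchanged."""
    beta_bc = np.array(beta_hat, float).copy()
    beta_bc[treat_col] = tau_debiased
    return np.asarray(y, float) - np.asarray(X, float) @ beta_bc

def hc2_variance(X, resid, coef_col):
    """HC2 variance estimate for one coefficient: sandwich estimator with
    residuals inflated by 1/sqrt(1 - h_i), h_i the hat-matrix diagonal."""
    X = np.asarray(X, float)
    bread = np.linalg.inv(X.T @ X)
    H = X @ bread @ X.T
    u = np.asarray(resid, float) / np.sqrt(1.0 - np.diag(H))
    meat = X.T * (u ** 2) @ X  # sum_i u_i^2 x_i x_i'
    V = bread @ meat @ bread
    return V[coef_col, coef_col]
```

In the BC workflow one would pass the recomputed residuals from `bc_residuals` into `hc2_variance` to obtain the variance used for the recentered interval.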
5.1 Simulated Datasets
In this section we compare the performance of our debiased estimators with that of standard estimators using simulated datasets. We show the results of two simulation schemes here; Appendix B reports results for two more simulation schemes, along with graphical summaries of the data generating processes. In each scheme, we first generate two-dimensional covariates that are the quantiles of prespecified distributions (for example, in Scheme 1, with a sample of 24 units, the covariates are quantiles of a Beta(0.5, 0.5) distribution and a Triangle(0, 1) distribution). We then compute the studentized leverage ratios and use them to impute potential outcomes. We consider three ways to impute potential outcomes; in all cases the average treatment effect equals 0. The experiment is completely randomized with 24 units and an inclusion probability of 1/3 for the treatment arm. Table 5.1 includes the simulation details. Note that these schemes are designed specifically so that the finite-sample bias is relatively large.
Tables 2 and 3 report the simulation results for the two schemes. Our debiased estimators are exactly unbiased, as expected. In terms of the root mean squared error (RMSE) the picture is less clear: there are cases where the debiased estimators dominate the others (DGP 1.1 and DGP 1.3) and cases where the unadjusted estimator is best (DGP 1.2 and DGP 2.2); this is an artifact of the DGPs (recall the variance formula for the difference-in-means estimator, e.g., from [IR15]). Note that DGP 1.3 and DGP 2.3 are constant-effect models, in which the noninteracted OLS estimators are first-order unbiased; in DGP 1.3, however, we still observe a small, higher-order bias.
In terms of confidence interval coverage, observe for DGP 2.1, 2.2 and 2.3 that the original recentering intervals exhibit significant undercoverage. The procedure based on recomputing the OLS residuals with Satterthwaite adjustments, however, works reasonably well; only in one case (DGP 2.3, Non-Int.) is the coverage not very satisfactory. As shown in Tables 6 and 8 in the appendix, the BC procedures do not significantly add to the median confidence interval length, although they tend to add to the average confidence interval length. The Satterthwaite adjustment can also add to the median (and mean) confidence interval length: it typically increases the interval length by at most 10 to 20 percent, compared with the Student-t adjustment.
#treated | ATE | |||||
Scheme 1, N=24 | ||||||
DGP1.1 | Beta(0.5,0.5) | Tri(0,1) | 0 | 2 | 8 | 0 |
DGP1.2 | - | |||||
DGP1.3 | ||||||
Scheme 2, N=24 | ||||||
DGP2.1 | Beta(2,5) | Norm(0,1) | 0 | 2 | 8 | 0 |
DGP2.2 | - | |||||
DGP2.3 |
Note: DGPs for the simulations. Beta($\alpha$, $\beta$) is the beta distribution with shape parameters $\alpha$ and $\beta$. Tri(0,1) is the symmetric triangular distribution on the unit interval. Norm(0,1) is the standard normal distribution. $r_i$ is the studentized leverage ratio for the $i$th unit, computed by standardizing the leverages across units. Note that DGP1.1, DGP1.2, DGP2.1 and DGP2.2 are variable-effects models, while DGP1.3 and DGP2.3 are constant-effects models.
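The covariate construction for Scheme 1 can be sketched in closed form, assuming (our reading of the text) an $i/(N+1)$ quantile grid; Beta(0.5, 0.5) is the arcsine law, so its quantile function has the explicit form $\sin^2(\pi u / 2)$, and the symmetric Triangle(0, 1) quantile function is piecewise around $u = 1/2$:

```python
import numpy as np

# Quantile grid (assumed form; the paper's exact grid is in Table 5.1).
N = 24
u = np.arange(1, N + 1) / (N + 1)

# Beta(0.5, 0.5) = arcsine law on (0, 1): quantile(u) = sin(pi*u/2)^2.
z1 = np.sin(np.pi * u / 2) ** 2
# Symmetric Triangle(0, 1) quantile function.
z2 = np.where(u <= 0.5, np.sqrt(u / 2), 1 - np.sqrt((1 - u) / 2))

# Demeaned two-dimensional covariates.
Z = np.column_stack([z1 - z1.mean(), z2 - z2.mean()])

# Studentized leverage ratios (up to the paper's exact scaling).
h = np.einsum('ij,jk,ik->i', Z, np.linalg.inv(Z.T @ Z), Z)
r = (h - h.mean()) / h.std()
```

The potential outcomes are then imputed as functions of these ratios; the exact functional forms are those listed in the table.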
ATE Estimators | |||||
Unadjusted | OLS | Debiased | |||
Non-Int. | Interacted | Non-Int. | Interacted | ||
DGP1.1, N =24 | |||||
Bias | -0.000 | -0.044 | -0.171 | -0.000 | -0.000 |
SD | 0.577 | 0.569 | 0.734 | 0.558 | 0.570 |
RMSE | 0.577 | 0.571 | 0.754 | 0.558 | 0.570 |
CI Coverage (HC2, Student-t) | 0.961 | 0.957 | 0.919 | 0.960 | 0.953 |
CI Coverage (HC2, Satterthwaite) | 0.965 | 0.964 | 0.949 | 0.966 | 0.970 |
CI Coverage (BC-HC2, Student-t) | 0.961 | 0.957 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.967 | 0.973 | |||
DGP1.2, N =24 | |||||
Bias | -0.000 | -0.046 | -0.097 | -0.000 | -0.000 |
SD | 0.144 | 0.220 | 0.275 | 0.205 | 0.182 |
RMSE | 0.144 | 0.225 | 0.292 | 0.205 | 0.182 |
CI Coverage (HC2, Student-t) | 1.000 | 0.999 | 0.982 | 1.000 | 0.999 |
CI Coverage (HC2, Satterthwaite) | 1.000 | 1.000 | 0.994 | 1.000 | 1.000 |
CI Coverage (BC-HC2, Student-t) | 1.000 | 1.000 | |||
CI Coverage (BC-HC2, Satterthwaite) | 1.000 | 1.000 | |||
DGP1.3, N =24 | |||||
Bias | -0.000 | 0.002 | -0.074 | 0.000 | 0.000 |
SD | 0.433 | 0.417 | 0.483 | 0.400 | 0.408 |
RMSE | 0.433 | 0.417 | 0.489 | 0.400 | 0.408 |
CI Coverage (HC2, Student-t) | 0.940 | 0.938 | 0.916 | 0.946 | 0.948 |
CI Coverage (HC2, Satterthwaite) | 0.947 | 0.949 | 0.950 | 0.956 | 0.970 |
CI Coverage (BC-HC2, Student-t) | 0.947 | 0.950 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.956 | 0.971 |
Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. The unit of CI coverage is 100 percentage points. CI Coverage (HC2, Student-t) is calculated using the original OLS residuals. CI Coverage (BC-HC2, Student-t) and CI Coverage (BC-HC2, Satterthwaite) are calculated using the recomputed OLS residuals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is .
ATE Estimators | |||||
Unadjusted | OLS | Debiased | |||
Non-Int. | Interacted | Non-Int. | Interacted | ||
DGP2.1, N =24 | |||||
Bias | 0.000 | -0.237 | 0.028 | 0.000 | 0.000 |
SD | 0.577 | 0.344 | 0.283 | 0.459 | 0.439 |
RMSE | 0.577 | 0.418 | 0.284 | 0.459 | 0.439 |
CI Coverage (HC2, Student-t) | 0.910 | 0.913 | 0.757 | 0.923 | 0.470 |
CI Coverage (HC2, Satterthwaite) | 0.915 | 0.920 | 0.837 | 0.928 | 0.548 |
CI Coverage (BC-HC2, Student-t) | 0.923 | 0.876 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.928 | 0.930 | |||
DGP2.2, N =24 | |||||
Bias | 0.000 | -0.237 | 0.015 | 0.000 | 0.000 |
SD | 0.144 | 0.326 | 0.132 | 0.314 | 0.225 |
RMSE | 0.144 | 0.403 | 0.133 | 0.314 | 0.225 |
CI Coverage (HC2, Student-t) | 1.000 | 0.930 | 0.930 | 0.967 | 0.614 |
CI Coverage (HC2, Satterthwaite) | 1.000 | 0.935 | 0.991 | 0.969 | 0.724 |
CI Coverage (BC-HC2, Student-t) | 0.965 | 0.967 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.968 | 0.996 | |||
DGP2.3, N =24 | |||||
Bias | 0.000 | 0.000 | 0.013 | 0.000 | 0.000 |
SD | 0.433 | 0.097 | 0.163 | 0.195 | 0.239 |
RMSE | 0.433 | 0.097 | 0.164 | 0.195 | 0.239 |
CI Coverage (HC2, Student-t) | 0.930 | 0.970 | 0.850 | 0.654 | 0.570 |
CI Coverage (HC2, Satterthwaite) | 0.942 | 0.983 | 0.947 | 0.683 | 0.678 |
CI Coverage (BC-HC2, Student-t) | 0.809 | 0.896 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.850 | 0.944 |
Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. The unit of CI coverage is 100 percentage points. CI Coverage (HC2, Student-t) is calculated using the original OLS residuals. CI Coverage (BC-HC2, Student-t) and CI Coverage (BC-HC2, Satterthwaite) are calculated using the recomputed OLS residuals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is .
5.2 Real Dataset
In this section we compare the performance of the debiased estimators with that of standard OLS estimators on a real-world dataset, precisely following [Lin13a]’s simulation setting. We generate our simulations from the experimental example of [ALO09] by simulating random assignments under the maintained hypothesis of no treatment effect. Because this setting assumes no effects, bias is expected to be negligible: [Fre08a] notes that the leading term of the bias is greatest when treatment effects are heterogeneous. The simulation is thus not primarily meant to investigate bias, but rather the precision and coverage consequences of using our bias corrections in a real-world setting. (We thank Winston Lin for sharing the replication files.)
[ALO09] sought to measure the effects of support services and financial incentives on college students’ academic achievement. The experiment randomly assigns eligible first-year undergraduate students into four groups. One treatment group was offered both support services and financial incentives. A second group was offered only support services and a third group only financial incentives. The control group was eligible only for standard university support services. As in [Lin13a], we only use the data for men in the services-and-incentives () and service-only () groups. The simulation datasets are generated assuming the treatment has no effect on any students. We replicate the experiments times, and each time randomly assign students to the services-and-incentives group and to the service-only group. The regression estimators estimate the treatment effects adjusting for high-school GPAs. The standard errors of the OLS estimators are estimated using the standard sandwich formulas.
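The re-randomization exercise just described can be sketched as follows (the function name and defaults are ours): under the sharp null of no effect, the observed outcomes are held fixed and only the assignment vector is redrawn in each replication:

```python
import numpy as np

def null_randomization_draws(y, n_treat, reps=1000, seed=0):
    """Re-randomization simulation under the sharp null of no treatment
    effect: outcomes are fixed, assignments are redrawn completely at
    random each replication, and the difference in means is recorded."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)
    n = len(y)
    ests = np.empty(reps)
    for rep in range(reps):
        t = np.zeros(n, dtype=int)
        t[rng.choice(n, size=n_treat, replace=False)] = 1
        ests[rep] = y[t == 1].mean() - y[t == 0].mean()
    return ests
```

In the paper’s exercise the recorded statistic is each regression-adjusted (and debiased) estimate rather than the raw difference in means, but the resampling scheme is the same; under the null the draws are centered at zero.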
Table 1 reports the simulation results. The first and second rows of the table show the means and standard deviations of the five estimators. All estimators are approximately unbiased after rounding, and the variances of the debiased estimators are no larger than those of the standard estimators. The third row shows that all confidence intervals cover the true value of the ATE with approximately 95 percent probability. The fourth row reports the average length of the confidence intervals; on average, the intervals of the regression-adjusted estimators are slightly narrower than those of the unadjusted estimator. (The widths of the confidence intervals for the debiased estimators are mechanically identical to those for the standard estimators, as they are constructed using the same standard error estimators.)
ATE Estimators | |||||
Unadjusted | OLS | Debiased | |||
Non-Int. | Interacted | Non-Int. | Interacted | ||
Bias | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
SD | 0.159 | 0.150 | 0.147 | 0.147 | 0.147 |
RMSE | 0.159 | 0.150 | 0.147 | 0.147 | 0.147 |
CI Coverage (HC2, Student-t) | 0.949 | 0.949 | 0.949 | 0.949 | 0.949 |
CI Coverage (HC2, Satterthwaite) | 0.949 | 0.949 | 0.949 | 0.950 | 0.950 |
CI Coverage (BC-HC2, Student-t) | 0.949 | 0.949 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.950 | 0.950 |
Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. The unit of CI coverage is 100 percentage points. CI Coverage (HC2, Student-t) is calculated using the original OLS residuals. CI Coverage (BC-HC2, Student-t) and CI Coverage (BC-HC2, Satterthwaite) are calculated using the recomputed OLS residuals. Both Student-t and Satterthwaite adjustments are calculated using the R-package clubSandwich[Pus21].
Together, these results demonstrate that, in a real-world setting, our bias corrections can be effectively introduced without appreciably compromising the precision or coverage properties of regression adjusted estimators.
Appendix A Proofs
A.1 Constants
We define the following five constants:
A.2 Auxiliary Lemmas
Let and be three possibly identical random variables such that .
Lemma A.1.
Proof.
We only prove the first equality. The second one can be proved analogously. First notice two useful equalities:
where the third equality uses the second-moment estimate for a completely randomized experiment and the fifth equality uses the fact that .
where the fourth and fifth equalities use . Finally,
where for the last equality we apply the previous two equalities. ∎
Lemma A.2.
(1) |
Proof.
Consider times the third moment estimator :
In expectation this equals:
∎
A.3 Theorem 3.1
Proof.
By the Frisch–Waugh–Lovell theorem, the OLS estimate of the coefficient can be written , where
and
Now, consider the order of the terms , and . For , , and are by assumption. Meanwhile, and are each by moment conditions on , and . Therefore
For , note that which is . is also . Meanwhile, converges to a constant vector so that it is . Therefore
For , , , and are , and , and are . Therefore
∎
A.4 Corollary 3.1
Proof.
where in the third equality we used the unbiasedness of the difference-in-means estimator and the unbiasedness of and as estimators of the sample mean . ∎
A.5 Theorem 3.2
Proof.
The decomposition is algebraic. Now , , and by the assumptions.
Similarly . These estimates give the orders in the theorem. ∎
A.6 Corollary 3.2
Proof.
Same as Corollary 3.1 after noting that and ∎
A.7 Theorem 4.1
Let
Note both of them are of order .
Proof.
We first propose an estimator for . Using the assumption , we have
Then we have
where the second equality follows from Proposition 1 of [Fre08b]. An estimator of this bias is:
It is clear that this adjustment is of order .
It should be obvious that is directly estimable and of order .
We now propose an estimator for :
By Lemma A1 and A2, an unbiased estimator for this quantity is:
It is clear that this adjustment is of order .
∎
A.8 Theorem 4.2
We have:
Note that and are of order
Proof.
We prove the result for bias in the control arm. Proof for the treated arm is analogous. First notice
The first term is directly estimable and is of order . Analogous to the noninteractive case, the second term can be estimated by:
It should be clear that this term is of order .
The third term can be estimated by:
It is clear this term is of order ∎
Appendix B More Simulation Results
B.1 Details on Simulation Schemes
#treated | ATE | |||||
Scheme 1, N=24 | ||||||
DGP1.1 | Beta(0.5,0.5) | Tri(0,1) | 0 | 2 | 8 | 0 |
DGP1.2 | - | |||||
DGP1.3 | ||||||
Scheme 2, N=24 | ||||||
DGP2.1 | Beta(2,5) | Norm(0,1) | 0 | 2 | 8 | 0 |
DGP2.2 | - | |||||
DGP2.3 | ||||||
Scheme 3, N=24 | ||||||
DGP3.1 | Uniform(0,1) | Uniform(0,1), Squared, Reversed | 0 | 2 | 8 | 0 |
DGP3.2 | - | |||||
DGP3.3 | ||||||
Scheme 4, N=24 | ||||||
DGP4.1 | Uniform(0,1) | Uniform(0,1), Squared | 0 | 2 | 8 | 0 |
DGP4.2 | - |||||
DGP4.3
B.2 Scheme 1
ATE Estimators | ||||||
HC2 | BC-HC2 | |||||
Z | Student-t | Satterthwaite | Z | Student-t | Satterthwaite | |
DGP1.1, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 95.30 | 96.00 | 96.60 | 95.40 | 96.10 | 96.70 |
Interacted (Debiased) | 94.60 | 95.30 | 97.00 | 95.00 | 95.70 | 97.30 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 2.68 | 2.83 | 3.00 | 2.70 | 2.85 | 3.02 |
Interacted (Debiased) | 3.20 | 3.38 | 4.13 | 3.26 | 3.44 | 4.28 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.71 | 2.86 | 3.01 | 2.71 | 2.86 | 3.01 |
Interacted (Debiased) | 3.02 | 3.19 | 3.67 | 3.02 | 3.19 | 3.67 |
DGP1.2, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 99.90 | 100.00 | 100.00 | 99.90 | 100.00 | 100.00 |
Interacted (Debiased) | 99.90 | 99.90 | 100.00 | 100.00 | 100.00 | 100.00 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 1.84 | 1.95 | 2.06 | 1.85 | 1.95 | 2.06 |
Interacted (Debiased) | 1.87 | 1.98 | 2.42 | 1.90 | 2.00 | 2.48 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 1.83 | 1.93 | 2.03 | 1.83 | 1.93 | 2.03 |
Interacted (Debiased) | 1.77 | 1.87 | 2.16 | 1.77 | 1.87 | 2.16 |
DGP1.3, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 93.50 | 94.60 | 95.60 | 93.60 | 94.70 | 95.60 |
Interacted (Debiased) | 93.80 | 94.80 | 97.00 | 94.10 | 95.00 | 97.10 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 1.63 | 1.72 | 1.82 | 1.63 | 1.72 | 1.82 |
Interacted (Debiased) | 1.87 | 1.98 | 2.42 | 1.91 | 2.01 | 2.49 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 1.62 | 1.71 | 1.80 | 1.62 | 1.71 | 1.80 |
Interacted (Debiased) | 1.77 | 1.87 | 2.16 | 1.77 | 1.87 | 2.16 |
Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R-package clubSandwich[Pus21].
ATE Estimators | ||||||
HC3 | BC-HC3 | |||||
Z | Student-t | Satterthwaite | Z | Student-t | Satterthwaite | |
DGP1.1, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 96.70 | 97.20 | 97.70 | 96.70 | 97.20 | 97.80 |
Interacted (Debiased) | 97.40 | 97.70 | 98.80 | 97.60 | 97.90 | 99.00 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 3.00 | 3.16 | 3.37 | 3.01 | 3.18 | 3.39 |
Interacted (Debiased) | 4.58 | 4.83 | 7.60 | 4.60 | 4.85 | 7.83 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 3.02 | 3.18 | 3.38 | 3.03 | 3.20 | 3.40 |
Interacted (Debiased) | 3.94 | 4.16 | 5.18 | 3.90 | 4.11 | 5.12 |
DGP1.2, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Interacted (Debiased) | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 2.06 | 2.17 | 2.32 | 2.06 | 2.17 | 2.32 |
Interacted (Debiased) | 2.54 | 2.68 | 4.16 | 2.54 | 2.68 | 4.25 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.04 | 2.15 | 2.28 | 2.04 | 2.15 | 2.28 |
Interacted (Debiased) | 2.21 | 2.33 | 2.91 | 2.19 | 2.31 | 2.88 |
DGP1.3, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 95.30 | 96.10 | 97.00 | 95.30 | 96.20 | 97.00 |
Interacted (Debiased) | 96.80 | 97.40 | 98.90 | 97.00 | 97.50 | 99.00 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 1.78 | 1.88 | 2.01 | 1.79 | 1.89 | 2.01 |
Interacted (Debiased) | 2.54 | 2.68 | 4.16 | 2.57 | 2.71 | 4.33 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 1.77 | 1.87 | 1.98 | 1.77 | 1.87 | 1.99 |
Interacted (Debiased) | 2.21 | 2.33 | 2.91 | 2.20 | 2.32 | 2.89 |
B.3 Scheme 2
ATE Estimators | ||||||
HC2 | BC-HC2 | |||||
Z | Student-t | Satterthwaite | Z | Student-t | Satterthwaite | |
DGP2.1, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 91.40 | 92.30 | 92.80 | 91.40 | 92.30 | 92.80 |
Interacted (Debiased) | 44.70 | 47.00 | 54.80 | 85.40 | 87.60 | 93.20 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 1.93 | 2.04 | 2.16 | 2.04 | 2.15 | 2.28 |
Interacted (Debiased) | 0.60 | 0.63 | 0.89 | 2.48 | 2.61 | 4.50 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 1.95 | 2.06 | 2.17 | 1.95 | 2.06 | 2.17 |
Interacted (Debiased) | 0.51 | 0.54 | 0.67 | 0.51 | 0.54 | 0.67 |
DGP2.2, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 96.30 | 96.70 | 96.90 | 96.10 | 96.50 | 96.80 |
Interacted (Debiased) | 58.90 | 61.40 | 72.40 | 95.30 | 96.70 | 99.60 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 1.91 | 2.02 | 2.14 | 1.96 | 2.07 | 2.19 |
Interacted (Debiased) | 0.40 | 0.42 | 0.59 | 1.33 | 1.40 | 2.36 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 1.94 | 2.05 | 2.16 | 1.94 | 2.05 | 2.16 |
Interacted (Debiased) | 0.38 | 0.40 | 0.48 | 0.38 | 0.40 | 0.48 |
DGP2.3, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 62.50 | 65.40 | 68.30 | 76.50 | 80.90 | 85.00 |
Interacted (Debiased) | 54.20 | 57.00 | 67.80 | 87.40 | 89.60 | 94.40 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 0.37 | 0.39 | 0.41 | 0.45 | 0.48 | 0.51 |
Interacted (Debiased) | 0.40 | 0.42 | 0.59 | 1.33 | 1.41 | 2.38 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 0.35 | 0.37 | 0.40 | 0.35 | 0.37 | 0.40 |
Interacted (Debiased) | 0.38 | 0.40 | 0.48 | 0.38 | 0.40 | 0.48 |
Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].
ATE Estimators | ||||||
HC3 | BC-HC3 | |||||
Z | Student-t | Satterthwaite | Z | Student-t | Satterthwaite | |
DGP2.1, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 93.20 | 94.10 | 94.50 | 93.20 | 94.00 | 94.50 |
Interacted (Debiased) | 74.20 | 76.00 | 87.60 | 96.40 | 97.00 | 98.80 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 2.37 | 2.50 | 2.67 | 2.44 | 2.57 | 2.76 |
Interacted (Debiased) | 1.60 | 1.69 | 5.29 | 12.18 | 12.85 | 57.49 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.37 | 2.50 | 2.66 | 2.44 | 2.57 | 2.73 |
Interacted (Debiased) | 1.07 | 1.13 | 1.85 | 2.11 | 2.23 | 3.94 |
DGP2.2, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 97.00 | 97.30 | 97.40 | 96.80 | 97.10 | 97.30 |
Interacted (Debiased) | 85.30 | 86.70 | 96.10 | 99.90 | 99.90 | 100.00 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 2.34 | 2.47 | 2.64 | 2.35 | 2.48 | 2.66 |
Interacted (Debiased) | 0.93 | 0.98 | 2.88 | 6.26 | 6.60 | 29.16 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.33 | 2.46 | 2.62 | 2.37 | 2.50 | 2.65 |
Interacted (Debiased) | 0.67 | 0.71 | 1.16 | 1.18 | 1.25 | 2.21 |
DGP2.3, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 71.60 | 74.60 | 78.10 | 86.90 | 90.40 | 93.30 |
Interacted (Debiased) | 83.30 | 84.70 | 94.50 | 98.20 | 98.80 | 99.60 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 0.45 | 0.47 | 0.50 | 0.54 | 0.57 | 0.61 |
Interacted (Debiased) | 0.93 | 0.98 | 2.88 | 6.28 | 6.63 | 29.31 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 0.42 | 0.45 | 0.48 | 0.50 | 0.52 | 0.55 |
Interacted (Debiased) | 0.67 | 0.71 | 1.16 | 1.18 | 1.25 | 2.24 |
B.4 Scheme 3
ATE Estimators | |||||
Unadjusted | OLS | Debiased | |||
Non-Int. | Interacted | Non-Int. | Interacted | ||
DGP3.1, N =24 | |||||
Bias | 0.000 | -0.144 | -0.004 | 0.000 | 0.000 |
SD | 0.577 | 0.362 | 0.281 | 0.433 | 0.387 |
RMSE | 0.577 | 0.390 | 0.281 | 0.433 | 0.387 |
CI Coverage (HC2, Student-t) | 0.943 | 0.949 | 0.843 | 0.959 | 0.712 |
CI Coverage (HC2, Satterthwaite) | 0.948 | 0.959 | 0.917 | 0.965 | 0.810 |
CI Coverage (BC-HC2, Student-t) | 0.960 | 0.876 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.966 | 0.936 | |||
DGP3.2, N =24 | |||||
Bias | 0.000 | 0.144 | 0.003 | 0.000 | 0.000 |
SD | 0.144 | 0.342 | 0.129 | 0.316 | 0.198 |
RMSE | 0.144 | 0.371 | 0.129 | 0.316 | 0.198 |
CI Coverage (HC2, Student-t) | 1.000 | 0.963 | 0.927 | 0.983 | 0.804 |
CI Coverage (HC2, Satterthwaite) | 1.000 | 0.967 | 0.980 | 0.985 | 0.906 |
CI Coverage (BC-HC2, Student-t) | 0.981 | 0.959 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.983 | 0.992 | |||
DGP3.3, N =24 | |||||
Bias | 0.000 | 0.000 | -0.001 | 0.000 | 0.000 |
SD | 0.433 | 0.109 | 0.164 | 0.168 | 0.210 |
RMSE | 0.433 | 0.109 | 0.164 | 0.168 | 0.210 |
CI Coverage (HC2, Student-t) | 0.941 | 0.942 | 0.857 | 0.817 | 0.750 |
CI Coverage (HC2, Satterthwaite) | 0.948 | 0.954 | 0.936 | 0.841 | 0.859 |
CI Coverage (BC-HC2, Student-t) | 0.870 | 0.874 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.896 | 0.939 |
Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. CI coverage is reported as a proportion (1.00 = 100 percent). CI Coverage (HC2, Student-t) is calculated using the original OLS residuals. CI Coverage (BC-HC2, Student-t) and CI Coverage (BC-HC2, Satterthwaite) are calculated using the recomputed OLS residuals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].
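For reference, the Bias and SD rows in summary tables like the one above, together with the root-mean-squared error (which satisfies RMSE² = bias² + SD² when the SD is computed without a degrees-of-freedom correction), can be reproduced from the simulated estimates. A minimal Python sketch, with hypothetical inputs `est` (the vector of simulated ATE estimates) and `truth` (the true ATE):

```python
import numpy as np

def summarize_estimates(est, truth):
    """Bias, SD, and RMSE of a vector of simulated ATE estimates."""
    est = np.asarray(est, dtype=float)
    bias = est.mean() - truth
    sd = est.std()  # population SD (ddof=0)
    rmse = np.sqrt(np.mean((est - truth) ** 2))  # equals sqrt(bias^2 + sd^2)
    return bias, sd, rmse
```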
ATE Estimators | ||||||
HC2 | BC-HC2 | |||||
Z | Student-t | Satterthwaite | Z | Student-t | Satterthwaite | |
DGP3.1, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 95.000 | 95.900 | 96.500 | 95.200 | 96.000 | 96.600 |
Interacted (Debiased) | 69.100 | 71.200 | 81.000 | 85.700 | 87.600 | 93.600 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 2.130 | 2.248 | 2.385 | 2.181 | 2.302 | 2.445 |
Interacted (Debiased) | 0.793 | 0.837 | 1.095 | 1.502 | 1.585 | 2.389 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.160 | 2.280 | 2.406 | 2.205 | 2.327 | 2.449 |
Interacted (Debiased) | 0.778 | 0.821 | 1.006 | 0.978 | 1.033 | 1.225 |
DGP3.2, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 97.900 | 98.300 | 98.500 | 97.600 | 98.100 | 98.300 |
Interacted (Debiased) | 78.400 | 80.400 | 90.600 | 94.900 | 95.900 | 99.200 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 2.110 | 2.227 | 2.362 | 2.125 | 2.242 | 2.381 |
Interacted (Debiased) | 0.487 | 0.514 | 0.672 | 0.852 | 0.899 | 1.338 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.129 | 2.247 | 2.368 | 2.150 | 2.269 | 2.389 |
Interacted (Debiased) | 0.478 | 0.505 | 0.613 | 0.575 | 0.607 | 0.721 |
DGP3.3, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 79.200 | 81.700 | 84.100 | 84.400 | 87.000 | 89.600 |
Interacted (Debiased) | 72.600 | 75.000 | 85.900 | 85.400 | 87.400 | 93.900 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 0.422 | 0.446 | 0.473 | 0.457 | 0.483 | 0.513 |
Interacted (Debiased) | 0.487 | 0.514 | 0.672 | 0.816 | 0.862 | 1.277 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 0.417 | 0.440 | 0.465 | 0.435 | 0.459 | 0.483 |
Interacted (Debiased) | 0.478 | 0.505 | 0.613 | 0.554 | 0.584 | 0.695 |
Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is .
ATE Estimators | ||||||
HC3 | BC-HC3 | |||||
Z | Student-t | Satterthwaite | Z | Student-t | Satterthwaite | |
DGP3.1, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 96.700 | 97.200 | 97.500 | 96.800 | 97.300 | 97.600 |
Interacted (Debiased) | 87.900 | 89.000 | 95.100 | 95.800 | 96.600 | 99.100 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 2.514 | 2.653 | 2.842 | 2.557 | 2.699 | 2.895 |
Interacted (Debiased) | 1.793 | 1.893 | 4.927 | 4.118 | 4.347 | 14.929 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.512 | 2.651 | 2.816 | 2.551 | 2.692 | 2.854 |
Interacted (Debiased) | 1.385 | 1.462 | 2.136 | 1.606 | 1.695 | 2.568 |
DGP3.2, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 98.500 | 98.700 | 98.900 | 98.300 | 98.500 | 98.700 |
Interacted (Debiased) | 93.000 | 93.800 | 98.600 | 99.000 | 99.300 | 100.000 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 2.490 | 2.628 | 2.815 | 2.491 | 2.629 | 2.818 |
Interacted (Debiased) | 0.989 | 1.044 | 2.624 | 2.280 | 2.406 | 8.289 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.487 | 2.625 | 2.782 | 2.498 | 2.636 | 2.793 |
Interacted (Debiased) | 0.787 | 0.831 | 1.219 | 0.906 | 0.956 | 1.447 |
DGP3.3, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 85.000 | 87.000 | 89.300 | 90.100 | 92.100 | 94.200 |
Interacted (Debiased) | 90.000 | 91.300 | 97.200 | 95.600 | 96.500 | 99.100 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 0.489 | 0.516 | 0.552 | 0.526 | 0.555 | 0.596 |
Interacted (Debiased) | 0.989 | 1.044 | 2.624 | 2.133 | 2.251 | 7.610 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 0.479 | 0.506 | 0.538 | 0.499 | 0.526 | 0.558 |
Interacted (Debiased) | 0.787 | 0.831 | 1.219 | 0.883 | 0.932 | 1.417 |
Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. HC3 refers to the approximate jackknife robust standard error estimator [MW85]. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is .
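The HC3 weighting mentioned in the note differs from HC2 only in the leverage adjustment: each squared residual is divided by (1 - h_ii)² rather than (1 - h_ii), which approximates the jackknife and penalizes high-leverage observations more heavily. A minimal Python/NumPy sketch (our own illustration, not the clubSandwich code; the name `hc3_se` is ours):

```python
import numpy as np

def hc3_se(X, y):
    """HC3 ("approximate jackknife") robust standard errors for OLS."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_ii
    meat = X.T @ (X * (resid**2 / (1.0 - h) ** 2)[:, None])
    cov = XtX_inv @ meat @ XtX_inv
    return np.sqrt(np.diag(cov))
```

Because 1/(1 - h)² ≥ 1/(1 - h) for h in [0, 1), HC3 standard errors weakly dominate HC2, consistent with the wider intervals in the HC3 tables.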
B.5 Scheme 4
ATE Estimators | |||||
Unadjusted | OLS | Debiased | |||
Non-Int. | Interacted | Non-Int. | Interacted | ||
DGP4.1, N =24 | |||||
Bias | 0.000 | -0.060 | -0.042 | 0.000 | 0.000 |
SD | 0.577 | 0.438 | 0.692 | 0.524 | 0.570 |
RMSE | 0.577 | 0.442 | 0.693 | 0.524 | 0.570 |
CI Coverage (HC2, Student-t) | 0.862 | 0.849 | 0.801 | 0.832 | 0.803 |
CI Coverage (HC2, Satterthwaite) | 0.872 | 0.858 | 0.856 | 0.843 | 0.860 |
CI Coverage (BC-HC2, Student-t) | 0.836 | 0.841 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.847 | 0.890 | |||
DGP4.2, N =24 | |||||
Bias | -0.000 | 0.061 | 0.028 | 0.000 | 0.000 |
SD | 0.144 | 0.325 | 0.317 | 0.303 | 0.256 |
RMSE | 0.144 | 0.331 | 0.318 | 0.303 | 0.256 |
CI Coverage (HC2, Student-t) | 1.000 | 0.933 | 0.907 | 0.954 | 0.930 |
CI Coverage (HC2, Satterthwaite) | 1.000 | 0.940 | 0.974 | 0.960 | 0.991 |
CI Coverage (BC-HC2, Student-t) | 0.954 | 0.952 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.961 | 0.994 | |||
DGP4.3, N =24 | |||||
Bias | -0.000 | 0.001 | -0.014 | 0.000 | 0.000 |
SD | 0.433 | 0.272 | 0.405 | 0.329 | 0.355 |
RMSE | 0.433 | 0.272 | 0.405 | 0.329 | 0.355 |
CI Coverage (HC2, Student-t) | 0.952 | 0.948 | 0.828 | 0.895 | 0.851 |
CI Coverage (HC2, Satterthwaite) | 0.961 | 0.960 | 0.921 | 0.916 | 0.926 |
CI Coverage (BC-HC2, Student-t) | 0.907 | 0.868 | |||
CI Coverage (BC-HC2, Satterthwaite) | 0.927 | 0.929 |
Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. CI coverage is reported as a proportion (1.00 = 100 percent). CI Coverage (HC2, Student-t) is calculated using the original OLS residuals. CI Coverage (BC-HC2, Student-t) and CI Coverage (BC-HC2, Satterthwaite) are calculated using the recomputed OLS residuals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].
ATE Estimators | ||||||
HC2 | BC-HC2 | |||||
Z | Student-t | Satterthwaite | Z | Student-t | Satterthwaite | |
DGP4.1, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 82.100 | 83.200 | 84.300 | 82.400 | 83.600 | 84.700 |
Interacted (Debiased) | 79.000 | 80.300 | 86.000 | 82.800 | 84.100 | 89.000 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 2.101 | 2.217 | 2.355 | 2.135 | 2.254 | 2.395 |
Interacted (Debiased) | 1.898 | 2.004 | 2.625 | 2.412 | 2.546 | 3.616 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.166 | 2.286 | 2.419 | 2.196 | 2.318 | 2.449 |
Interacted (Debiased) | 1.822 | 1.923 | 2.293 | 1.952 | 2.060 | 2.458 |
DGP4.2, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 94.600 | 95.400 | 96.000 | 94.600 | 95.400 | 96.100 |
Interacted (Debiased) | 92.000 | 93.000 | 99.100 | 94.200 | 95.200 | 99.400 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 1.983 | 2.093 | 2.220 | 1.990 | 2.101 | 2.229 |
Interacted (Debiased) | 1.194 | 1.260 | 1.655 | 1.424 | 1.503 | 2.103 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.041 | 2.154 | 2.273 | 2.051 | 2.164 | 2.283 |
Interacted (Debiased) | 1.155 | 1.219 | 1.460 | 1.193 | 1.260 | 1.497 |
DGP4.3, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 87.500 | 89.500 | 91.600 | 88.700 | 90.700 | 92.700 |
Interacted (Debiased) | 83.100 | 85.100 | 92.600 | 85.000 | 86.800 | 92.900 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 1.051 | 1.109 | 1.176 | 1.068 | 1.127 | 1.197 |
Interacted (Debiased) | 1.194 | 1.260 | 1.655 | 1.443 | 1.523 | 2.132 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 1.028 | 1.085 | 1.148 | 1.050 | 1.108 | 1.170 |
Interacted (Debiased) | 1.155 | 1.219 | 1.460 | 1.204 | 1.271 | 1.505 |
Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].
ATE Estimators | ||||||
HC3 | BC-HC3 | |||||
Z | Student-t | Satterthwaite | Z | Student-t | Satterthwaite | |
DGP4.1, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 85.400 | 86.500 | 87.600 | 85.600 | 86.700 | 87.900 |
Interacted (Debiased) | 89.800 | 90.500 | 95.100 | 92.200 | 92.800 | 96.600 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 2.533 | 2.673 | 2.866 | 2.565 | 2.707 | 2.905 |
Interacted (Debiased) | 4.225 | 4.459 | 11.551 | 5.853 | 6.178 | 18.780 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.561 | 2.703 | 2.880 | 2.595 | 2.739 | 2.913 |
Interacted (Debiased) | 3.143 | 3.317 | 4.784 | 3.276 | 3.458 | 5.019 |
DGP4.2, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 96.300 | 97.000 | 97.600 | 96.500 | 97.100 | 97.800 |
Interacted (Debiased) | 96.900 | 97.400 | 100.000 | 98.400 | 98.800 | 100.000 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 2.382 | 2.514 | 2.693 | 2.388 | 2.521 | 2.701 |
Interacted (Debiased) | 2.363 | 2.494 | 6.239 | 3.118 | 3.291 | 9.625 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 2.425 | 2.559 | 2.715 | 2.432 | 2.567 | 2.725 |
Interacted (Debiased) | 1.831 | 1.932 | 2.824 | 1.872 | 1.976 | 2.895 |
DGP4.3, N =24 | ||||||
Coverage, Percentage | ||||||
Non-interacted (Debiased) | 92.100 | 93.600 | 95.400 | 93.100 | 94.500 | 96.100 |
Interacted (Debiased) | 93.500 | 94.500 | 98.300 | 94.200 | 95.100 | 98.500 |
CI Width, Average | ||||||
Non-interacted (Debiased) | 1.213 | 1.281 | 1.370 | 1.228 | 1.296 | 1.388 |
Interacted (Debiased) | 2.363 | 2.494 | 6.239 | 3.258 | 3.438 | 10.277 |
CI Width, Median | ||||||
Non-interacted (Debiased) | 1.174 | 1.239 | 1.320 | 1.198 | 1.264 | 1.343 |
Interacted (Debiased) | 1.831 | 1.932 | 2.824 | 1.891 | 1.996 | 2.930 |
References
- [AI17] Susan Athey and Guido W. Imbens. The econometrics of randomized experiments. In Handbook of Economic Field Experiments, volume 1, pages 73–140. Elsevier, 2017.
- [ALO09] Joshua Angrist, Daniel Lang, and Philip Oreopoulos. Incentives and services for college achievement: Evidence from a randomized trial. American Economic Journal: Applied Economics, 1(1):136–163, 2009.
- [AM13] Peter M. Aronow and Joel A. Middleton. A class of unbiased estimators of the average treatment effect in randomized experiments. Journal of Causal Inference, 1(1):135–154, 2013.
- [AP08] Joshua D. Angrist and Jörn-Steffen Pischke. Mostly Harmless Econometrics. Princeton University Press, 2008.
- [Bli73] Alan S. Blinder. Wage discrimination: Reduced form and structural estimates. Journal of Human Resources, pages 436–455, 1973.
- [BM02] Robert M. Bell and Daniel F. McCaffrey. Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28(2):169–182, 2002.
- [DC18] Angus Deaton and Nancy Cartwright. Understanding and misunderstanding randomized controlled trials. Social Science & Medicine, 210:2–21, 2018.
- [DGK07] Esther Duflo, Rachel Glennerster, and Michael Kremer. Using randomization in development economics research: A toolkit. Handbook of Development Economics, 4:3895–3962, 2007.
- [Dun12] Thad Dunning. Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge University Press, 2012.
- [Eic67] Friedhelm Eicker. Limit theorems for regressions with unequal and dependent errors. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pages 59–82. University of California Press, 1967.
- [Fre08a] David A. Freedman. On regression adjustments to experimental data. Advances in Applied Mathematics, 40(2):180–193, 2008.
- [Fre08b] David A. Freedman. On regression adjustments in experiments with several treatments. The Annals of Applied Statistics, 2(1):176–196, 2008.
- [Gle17] Rachel Glennerster. The practicalities of running randomized evaluations: Partnerships, measurement, ethics, and transparency. In Handbook of Economic Field Experiments, volume 1, pages 175–243. Elsevier, 2017.
- [Hub67] Peter J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions. In L. M. Le Cam and J. Neyman, editors, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, CA, 1967.
- [Imb10] Guido W. Imbens. Better LATE than nothing: Some comments on Deaton (2009) and Heckman and Urzua (2009). Journal of Economic Literature, 48(2):399–423, 2010.
- [IR15] Guido W. Imbens and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press, 2015.
- [Lin13a] Winston Lin. Agnostic notes on regression adjustments to experimental data: Reexamining Freedman's critique. The Annals of Applied Statistics, 7(1):295–318, 2013.
- [Lin13b] Winston Lin. Essays on Causal Inference in Randomized Experiments. PhD thesis, University of California, Berkeley, 2013.
- [LR11] John A. List and Imran Rasul. Field experiments in labor economics. In Handbook of Labor Economics, volume 4, pages 103–228. Elsevier, 2011.
- [MSY13] Luke W. Miratrix, Jasjeet S. Sekhon, and Bin Yu. Adjusting treatment effect estimates by post-stratification in randomized experiments. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(2):369–396, 2013.
- [MW85] James G. MacKinnon and Halbert White. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29(3):305–325, 1985.
- [Oax73] Ronald Oaxaca. Male-female wage differentials in urban labor markets. International Economic Review, pages 693–709, 1973.
- [Pus21] James Pustejovsky. clubSandwich: Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections, 2021. R package version 0.5.3.
- [Rec21] Ben Recht. Effect size is significantly more important than statistical significance, 2021.
- [Rub90] Donald B. Rubin. Formal modes of statistical inference for causal effects. Journal of Statistical Planning and Inference, 25(3):279–292, 1990.
- [SA12] Cyrus Samii and Peter M. Aronow. On equivalencies between design-based and regression-based variance estimators for randomized experiments. Statistics & Probability Letters, 82(2):365–370, 2012.
- [Sat46] Franklin E. Satterthwaite. An approximate distribution of estimates of variance components. Biometrics Bulletin, 2(6):110–114, 1946.
- [SNDS90] Jerzy Splawa-Neyman, Dorota M. Dabrowska, and T. P. Speed. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, pages 465–472, 1923 [1990].
- [TE93] Robert J. Tibshirani and Bradley Efron. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, 57:1–436, 1993.
- [WGB18] Edward Wu and Johann A. Gagnon-Bartsch. The LOOP estimator: Adjusting for covariates in randomized experiments. Evaluation Review, 42(4):458–488, 2018.
- [Whi80] Halbert White. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, pages 817–838, 1980.