
Exact Bias Correction for Linear Adjustment of Randomized Controlled Trials

Haoge Chang, Joel A. Middleton, and P. M. Aronow*

* We thank Donald Andrews, Winston Lin, Cyrus Samii and Jasjeet Sekhon for helpful comments and discussions.
Abstract

In an influential critique of empirical practice, Freedman [Fre08a, Fre08b] showed that the linear regression estimator was biased for the analysis of randomized controlled trials under the randomization model. Under Freedman’s assumptions, we derive exact closed-form bias corrections for the linear regression estimator with and without treatment-by-covariate interactions. We show that the limiting distribution of the bias-corrected estimator is identical to that of the uncorrected estimator, implying that the asymptotic gains from adjustment can be attained without introducing any risk of bias. Taken together with results from Lin [Lin13a], our results show that Freedman’s theoretical arguments against the use of regression adjustment can be completely resolved with minor modifications to practice.

1 Introduction

Randomized Controlled Trials (RCTs) are popular in empirical economics [AP08, DGK07, Gle17, LR11]. When estimating average treatment effects, adjustment for pretreatment covariates with linear regression is a commonly recommended practice because it can reduce the variability of estimates. However, adjusting for covariates remains somewhat controversial, in large part because of an influential critique from David Freedman [Fre08a, Fre08b].

Freedman argued that randomization does not justify the use of linear regression for completely randomized experiments. Freedman’s theoretical arguments relied on three results proven under the randomization-based [SNDS90, IR15] inferential paradigm:

  1. asymptotically, the linear regression estimator can be inefficient relative to the unadjusted (difference-in-means) estimator if the design is imbalanced;

  2. the classical standard error for linear regression is inconsistent;

  3. the regression estimator has an $O_p(n^{-1})$ bias term.

Freedman’s arguments were influential among scholars across multiple disciplines (e.g., [Dun12], [Rec21]). Freedman’s third argument garnered particular attention among social scientists. Notably, [DC18]’s critique of randomization in empirical economics argued that the bias introduced by regression undermines the gold-standard argument for RCTs.

Scholars have worked to address Freedman’s critiques and to understand the extent to which they matter for empirical practice in economics. Using Freedman’s own framework, [Lin13a] showed that arguments 1 and 2 can be resolved by small modifications to practice. Freedman’s efficiency result is addressed through a simple modification to the regression specification, namely including treatment-by-covariate interactions [Bli73, Oax73]; it can then be shown that the adjusted estimator is never less asymptotically efficient than the unadjusted estimator. Regarding argument 2, Lin proved that robust standard errors ([Whi80, Hub67, Eic67]; see also [SA12]) are asymptotically conservative in Freedman’s setting, guaranteeing the validity of large-sample inference. On argument 3, [Lin13a] (see also [Lin13b]) notes that the leading term of the bias is in fact estimable and can be shown to be small in a real-world empirical example. However, the small-sample bias of the regression estimator was not yet fully resolved.

Since [Lin13a], several notable papers have proposed unbiased regression-type estimators for experimental data. [MSY13] demonstrate that if the regression model is fully saturated (see also [AI17] and [Imb10]), then the associated effect estimate is unbiased conditional on the event that treatment is not collinear with any covariate stratum; this approach cannot generally be used without coarsening continuous covariates. [AM13] propose the use of auxiliary data, demonstrating that the suitable use of hold-out samples ensures the finite-sample unbiasedness of the associated regression estimator, but they do not consider efficiency properties. More recently, [WGB18] extended [AM13] to propose an innovative but computationally expensive split-sample approach for completely randomized experiments.

The primary contribution of this paper is to resolve Freedman’s third theoretical argument by proposing finite-sample-exact, closed-form bias corrections without adding any new assumptions. Our idea builds on [Lin13a]’s proposal to estimate the leading term of the bias, but further develops a novel finite-sample exact bias correction encompassing all higher-order terms [Fre08a]. We derive these bias corrections for both the noninteracted and interacted linear regression estimators. We prove that the estimators have the same limiting distributions as the non-bias-adjusted estimators, implying that they could replace existing estimators in instances where bias is a prevailing concern (e.g., trials that may be aggregated in meta-analysis). We further provide a numerical illustration demonstrating these properties.
Finally, we remind readers that the practice of debiasing estimators is not uncontroversial. [TE93] warned that bias correction can be dangerous in practice due to its high variability. (We thank Winston Lin for suggesting this reference.) Indeed, as the simulations will show, when performance is measured by the root mean squared error (RMSE) there is no clear dominance among the estimators: in some cases the RMSEs of the debiased estimators are strictly smaller than those of the other estimators, and in other cases larger. In real-world decision making, people may weigh different statistical properties differently (e.g., unbiasedness versus low mean squared error); see [WGB18] for an anecdotal example of a policy-maker favoring unbiasedness. Our results imply that in large samples the additional variation caused by the bias correction is negligible, but in small samples we find it important, in some cases, to account for the sampling variability of the additional terms. To address this problem, we propose a simple modification to the standard error estimation procedure, based on recomputing the OLS residuals using the debiased estimators; it is shown to work well on our simulated datasets. We make recommendations for practice in the Simulation section.
The organization of the paper is as follows: Section 2 presents the model setup and assumptions; Section 3 characterizes the bias terms of the OLS estimators; Section 4 proposes the bias corrections; Section 5 presents simulation results on both simulated datasets and a real-world dataset. The appendix contains proofs of the theorems in Sections 3 and 4 and additional simulation results.

2 Setting, Assumptions and Notation

We follow the setting of [Fre08a] and [Lin13a], which assume a Neyman [SNDS90] model with covariates. There are $n$ subjects indexed by $i=1,\dots,n$. For each subject we observe an outcome $Y_i$ and a column vector of covariates $\mathbf{z}_i=(z_{i1},z_{i2},\dots,z_{iK})'\in\mathbb{R}^K$. The dimension of the covariates, $K$, does not change with the sample size.
Each subject has two potential outcomes $a_i$ and $b_i$ (cf. the stable-unit-treatment-value assumption [Rub90]). We observe $Y_i=a_i$ if $i$ is chosen for treatment arm $A$ (treated group) and $Y_i=b_i$ if $i$ is chosen for arm $B$ (control group). Let $T_i$ be the dummy variable for treatment arm $A$. Thus the observed outcome for $i$ is $Y_i=a_iT_i+b_i(1-T_i)$.
The experiment is assumed to be completely randomized: $n_A$ out of $n$ subjects are randomly assigned to arm $A$ and the remaining $n_B=n-n_A$ subjects to arm $B$. Random assignment is the only source of randomness in the model. We do not assume a superpopulation: the $n$ subjects are the population of interest.
We introduce some notation. Let $n$ be the population size, and let $n_A$ and $n_B$ be the numbers of subjects in treatment arms $A$ and $B$, respectively. Let $[A]=\{i\mid T_i=1\}$ denote the set of individuals chosen for arm $A$ and similarly $[B]=\{i\mid T_i=0\}$. For a possibly vector-valued variable $x$, let $\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i$, $\bar{x}_A=\frac{1}{n_A}\sum_{i\in[A]}x_i$ and $\bar{x}_B=\frac{1}{n_B}\sum_{i\in[B]}x_i$ denote the population average, group-$A$ average, and group-$B$ average, respectively. In this notation the average treatment effect (ATE) is

$$\bar{a}-\bar{b}$$

and the difference-in-means estimator is

$$\bar{a}_A-\bar{b}_B.$$

Similarly, we write $\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i\mathbf{z}_i'=\overline{\mathbf{z}\mathbf{z}'}$ for $\mathbf{z}_i\in\mathbb{R}^K$ and $\frac{1}{n}\sum_{i=1}^n a_i\mathbf{z}_i=\overline{a\mathbf{z}}$ for $a_i\in\mathbb{R}$ and $\mathbf{z}_i\in\mathbb{R}^K$.
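To fix ideas, the following minimal Python sketch illustrates the setting and the notation (the toy potential outcomes are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# A small completely randomized experiment: n units, n_A assigned to arm A.
n, n_A = 24, 8
a = rng.normal(1.0, 1.0, size=n)    # potential outcomes under treatment (arm A)
b = rng.normal(0.0, 1.0, size=n)    # potential outcomes under control (arm B)

# Complete randomization: a uniformly random subset of size n_A is treated.
T = np.zeros(n, dtype=bool)
T[rng.choice(n, size=n_A, replace=False)] = True

Y = np.where(T, a, b)               # observed outcome Y_i = a_i T_i + b_i (1 - T_i)
ate = a.mean() - b.mean()           # population ATE: abar - bbar
dim = Y[T].mean() - Y[~T].mean()    # difference-in-means: abar_A - bbar_B
print(ate, dim)
```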
We make the following assumptions throughout the paper, which are standard in the literature (cf. [Fre08b, Fre08a, Lin13a]).

Assumption 1 (Bounded fourth moments).

For all $n=1,2,\dots$ and $x_i\in\{a_i,b_i,z_{i1},\dots,z_{iK}\}$,

$$\frac{1}{n}\sum_{i=1}^n x_i^4<L<\infty$$

where $L$ is a finite constant.

Assumption 2 (Convergence of first and second moments).

For $x_i=[a_i,b_i,\mathbf{z}_i']'\in\mathbb{R}^{2+K}$,

$$\frac{1}{n}\sum_{i=1}^n x_ix_i'\to\mathbf{M}$$

where $\mathbf{M}$ is a positive definite matrix with finite entries. Moreover, $\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i\mathbf{z}_i'$ converges to an invertible matrix.

Assumption 3 (Group Sizes).

Let $p_{A,n}=\frac{n_A}{n}$ and $p_{B,n}=\frac{n-n_A}{n}$ denote the inclusion probabilities for treatment arm $A$ and arm $B$, respectively. We assume $p_{A,n}>0$ and $p_{B,n}>0$ for all $n$, and

$$p_{A,n}\to p_A>0,\text{ as }n\to\infty,$$
$$p_{B,n}\to 1-p_A>0,\text{ as }n\to\infty.$$
Assumption 4 (Centering).

$$\bar{\mathbf{z}}=0$$

All of these assumptions are employed regularly in the literature; they are used to derive consistency and asymptotic normality of the estimators. Assumption 3 requires that each arm receive a nontrivial fraction of subjects along the asymptotic sequence of models. Assumption 4 is without loss of generality: in practice, researchers can simply demean each covariate and apply our method.
We now recall the definitions of our two OLS regression-adjusted ATE estimators. The first comes from a noninteracted OLS regression in which one regresses the observed outcome $Y$ on the treatment indicator $T$ and the demeaned pretreatment covariates $\mathbf{z}$; the coefficient estimate on $T$ is the noninteracted OLS regression-adjusted ATE estimator. The second comes from a fully interacted OLS regression in which one regresses $Y$ on $T$, the demeaned pretreatment covariates, and the interactions of the treatment indicator with the demeaned covariates; the coefficient estimate on $T$ is the interacted OLS regression-adjusted ATE estimator.
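Concretely, both estimators are coefficients from least-squares fits; a minimal sketch on toy data of our own:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_A = 24, 8
z = rng.normal(size=(n, 2))
z = z - z.mean(axis=0)              # demean the covariates (Assumption 4)
T = np.zeros(n)
T[rng.choice(n, size=n_A, replace=False)] = 1.0
Y = 1.0 + 0.5 * T + z @ np.array([1.0, -1.0]) + rng.normal(scale=0.1, size=n)

def ols(X, y):
    # least-squares coefficient vector
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Noninteracted: regress Y on (1, T, z); the coefficient on T (index 1)
# is the noninteracted regression-adjusted ATE estimator.
ate_ni = ols(np.column_stack([np.ones(n), T, z]), Y)[1]

# Interacted: add T * z interactions; with z demeaned, the coefficient on T
# is the interacted regression-adjusted ATE estimator.
ate_i = ols(np.column_stack([np.ones(n), T, z, T[:, None] * z]), Y)[1]
print(ate_ni, ate_i)
```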
Finally, we prepare some notation for the sections below. Let $a^*_i$ and $b^*_i$ be the centered potential outcomes, namely $a^*_i=a_i-\bar{a}$ and $b^*_i=b_i-\bar{b}$. Let $\widehat{D}=\overline{\mathbf{z}\mathbf{z}'}-p_A\bar{\mathbf{z}}_A\bar{\mathbf{z}}_A'-p_B\bar{\mathbf{z}}_B\bar{\mathbf{z}}_B'$, $\widehat{N}=p_A(\overline{a\mathbf{z}}_A-\bar{a}_A\bar{\mathbf{z}}_A)+p_B(\overline{b\mathbf{z}}_B-\bar{b}_B\bar{\mathbf{z}}_B)$, $D=\overline{\mathbf{z}\mathbf{z}'}$ and $N=p_A\overline{a\mathbf{z}}+p_B\overline{b\mathbf{z}}$. With this notation, the coefficient estimator on the pretreatment covariates in the noninteracted case can be written as $\widehat{Q}=\widehat{D}^{-1}\widehat{N}$, with population counterpart $Q=D^{-1}N$. Denote the (rescaled) leverage of the $i$th data point by $h_i=\mathbf{z}_i'D^{-1}\mathbf{z}_i$.
Further define $\widehat{D}_A=\overline{\mathbf{z}\mathbf{z}'}_A-\bar{\mathbf{z}}_A\bar{\mathbf{z}}_A'$, $\widehat{N}_A=\overline{a\mathbf{z}}_A-\bar{a}_A\bar{\mathbf{z}}_A$, $\widehat{D}_B=\overline{\mathbf{z}\mathbf{z}'}_B-\bar{\mathbf{z}}_B\bar{\mathbf{z}}_B'$, $\widehat{N}_B=\overline{b\mathbf{z}}_B-\bar{b}_B\bar{\mathbf{z}}_B$, $N_A=\overline{a\mathbf{z}}$ and $N_B=\overline{b\mathbf{z}}$. The coefficient estimators on the pretreatment covariates in the interacted case can then be written as $\widehat{Q}_A=\widehat{D}_A^{-1}\widehat{N}_A$ and $\widehat{Q}_B=\widehat{D}_B^{-1}\widehat{N}_B$, with population counterparts $Q_A=D^{-1}N_A$ and $Q_B=D^{-1}N_B$.
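These sample quantities can be assembled directly from their definitions. In the sketch below (the function name is ours), the Frisch–Waugh–Lovell theorem implies that `Q_hat` coincides with the covariate coefficients of the noninteracted fit above, and `Q_A`, `Q_B` with those of the separate within-arm fits:

```python
import numpy as np

# Helpers for the notation above; a sketch assuming demeaned covariates z
# and a 0/1 assignment vector T, with Y the observed outcomes.
def regression_pieces(Y, T, z):
    n = len(Y)
    A, B = T == 1, T == 0
    pA, pB = A.mean(), B.mean()
    zbarA, zbarB = z[A].mean(0), z[B].mean(0)
    D_hat = z.T @ z / n - pA * np.outer(zbarA, zbarA) - pB * np.outer(zbarB, zbarB)
    N_hat = (pA * ((z[A] * Y[A][:, None]).mean(0) - Y[A].mean() * zbarA)
             + pB * ((z[B] * Y[B][:, None]).mean(0) - Y[B].mean() * zbarB))
    Q_hat = np.linalg.solve(D_hat, N_hat)     # noninteracted covariate coefficients
    D_A = z[A].T @ z[A] / A.sum() - np.outer(zbarA, zbarA)
    N_A = (z[A] * Y[A][:, None]).mean(0) - Y[A].mean() * zbarA
    D_B = z[B].T @ z[B] / B.sum() - np.outer(zbarB, zbarB)
    N_B = (z[B] * Y[B][:, None]).mean(0) - Y[B].mean() * zbarB
    Q_A = np.linalg.solve(D_A, N_A)           # arm-A covariate coefficients
    Q_B = np.linalg.solve(D_B, N_B)           # arm-B covariate coefficients
    return Q_hat, Q_A, Q_B
```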

3 Bias Characterization

As shown in [Lin13a], the OLS regression-adjusted ATE estimator can be written as

$$\widehat{ATE}_{NI}=\bar{a}_A-\bar{b}_B-\{(\bar{\mathbf{z}}_A-\bar{\mathbf{z}})'\widehat{Q}-(\bar{\mathbf{z}}_B-\bar{\mathbf{z}})'\widehat{Q}\}$$

in the noninteracted case and

$$\widehat{ATE}_I=\bar{a}_A-\bar{b}_B-\{(\bar{\mathbf{z}}_A-\bar{\mathbf{z}})'\widehat{Q}_A-(\bar{\mathbf{z}}_B-\bar{\mathbf{z}})'\widehat{Q}_B\}$$

in the interacted case, where $\widehat{Q}$, $\widehat{Q}_A$ and $\widehat{Q}_B$ are the OLS coefficients on the covariates.

This section provides a characterization of the bias terms. Note that both the noninteracted and interacted estimators can be written as the sum of the difference-in-means estimator and a regression adjustment built from group means and OLS coefficients. The bias comes from the regression adjustment terms, in particular from estimating the regression coefficients on the covariates. We therefore characterize the bias terms of the coefficient estimators first, starting with the noninteracted case. From here on we assume for simplicity that all design matrices (i.e., $\widehat{D}$ and $D$) are invertible. With noninvertible design matrices, our debiasing procedure still works after choosing a particular generalized inverse and computing the ATE estimators according to the formulae above.

Theorem 3.1.

The OLS coefficient vector for covariates can be written as:

$$\widehat{Q}=Q+\nu_1+\nu_2+\nu_3$$

with

$$\nu_1=D^{-1}\left(p_A\left(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}}\right)+p_B\left(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}}\right)\right)=O_p(n^{-\frac{1}{2}}),$$
$$\nu_2=\left(\widehat{D}^{-1}-D^{-1}\right)\widehat{N}=O_p(n^{-1}),$$
$$\nu_3=-D^{-1}\left(p_A\bar{a^*}_A\bar{\mathbf{z}}_A+p_B\bar{b^*}_B\bar{\mathbf{z}}_B\right)=O_p(n^{-1}).$$

From the coefficient decomposition one can directly characterize the bias term of the ATE estimator. Note that the bias terms are of order $O_p(n^{-1})$.
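The decomposition in Theorem 3.1 is an exact algebraic identity and can be checked numerically; a sketch with toy data of our own:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_A = 24, 8
z = rng.normal(size=(n, 2)); z -= z.mean(0)
a = z @ np.array([1.0, 0.5]) + rng.normal(size=n)   # toy potential outcomes
b = z @ np.array([0.5, 1.0]) + rng.normal(size=n)
T = np.zeros(n, bool); T[rng.choice(n, size=n_A, replace=False)] = True
A, B = T, ~T
pA, pB = n_A / n, 1 - n_A / n
a_s, b_s = a - a.mean(), b - b.mean()               # centered potential outcomes

D = z.T @ z / n
N = pA * (z * a[:, None]).mean(0) + pB * (z * b[:, None]).mean(0)
Q = np.linalg.solve(D, N)

zbarA, zbarB = z[A].mean(0), z[B].mean(0)
D_hat = D - pA * np.outer(zbarA, zbarA) - pB * np.outer(zbarB, zbarB)
N_hat = (pA * ((z[A] * a[A][:, None]).mean(0) - a[A].mean() * zbarA)
         + pB * ((z[B] * b[B][:, None]).mean(0) - b[B].mean() * zbarB))
Q_hat = np.linalg.solve(D_hat, N_hat)

Dinv = np.linalg.inv(D)
v1 = Dinv @ (pA * ((z[A] * a_s[A][:, None]).mean(0) - (z * a_s[:, None]).mean(0))
             + pB * ((z[B] * b_s[B][:, None]).mean(0) - (z * b_s[:, None]).mean(0)))
v2 = (np.linalg.inv(D_hat) - Dinv) @ N_hat
v3 = -Dinv @ (pA * a_s[A].mean() * zbarA + pB * b_s[B].mean() * zbarB)
print(np.allclose(Q_hat, Q + v1 + v2 + v3))         # True: exact identity
```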

Corollary 3.1.

The bias of the $\widehat{ATE}_{NI}$ estimator is:

$$\mathbf{E}[\widehat{ATE}_{NI}-ATE]=\mathbf{E}\left[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'(\nu_1+\nu_2+\nu_3)\right].$$

Moreover, $(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'(\nu_1+\nu_2+\nu_3)=O_p(n^{-\frac{1}{2}})O_p(n^{-\frac{1}{2}})=O_p(n^{-1})$.
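Because the design is completely randomized, the finite-sample bias can be computed exactly by averaging the estimator over all $\binom{n}{n_A}$ assignments; a small toy illustration (the data-generating choices are ours):

```python
import numpy as np
from itertools import combinations

# Exact bias of ATE_hat_NI on a tiny toy design, by enumerating assignments.
rng = np.random.default_rng(3)
n, n_A = 8, 4
z = rng.normal(size=(n, 1)); z -= z.mean(0)
a = z[:, 0] ** 2 + rng.normal(size=n)   # nonlinear in z, so adjustment is biased
b = np.zeros(n)
ate = a.mean() - b.mean()

def ate_ni(T):
    A, B = T, ~T
    pA = n_A / n
    zbarA, zbarB = z[A].mean(0), z[B].mean(0)
    D_hat = z.T @ z / n - pA * np.outer(zbarA, zbarA) - (1 - pA) * np.outer(zbarB, zbarB)
    N_hat = (pA * ((z[A] * a[A][:, None]).mean(0) - a[A].mean() * zbarA)
             + (1 - pA) * ((z[B] * b[B][:, None]).mean(0) - b[B].mean() * zbarB))
    Q_hat = np.linalg.solve(D_hat, N_hat)
    return a[A].mean() - b[B].mean() - (zbarA - zbarB) @ Q_hat  # zbar = 0

est = []
for idx in combinations(range(n), n_A):
    T = np.zeros(n, bool); T[list(idx)] = True
    est.append(ate_ni(T))
print(np.mean(est) - ate)               # the exact finite-sample bias
```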

Following the same steps as we did for the noninteracted estimator, we are able to derive analogous results for the interacted estimator.

Theorem 3.2.

The OLS coefficient vectors for covariates can be written as

$$\widehat{Q}_A=Q_A+\nu_{1A}+\nu_{2A},$$
$$\widehat{Q}_B=Q_B+\nu_{1B}+\nu_{2B},$$

with

$$\nu_{1A}=(\widehat{D}_A^{-1}-D^{-1})\widehat{N}_A=O_p(n^{-\frac{1}{2}}),$$
$$\nu_{2A}=D^{-1}(\widehat{N}_A-N_A)=O_p(n^{-\frac{1}{2}}),$$
$$\nu_{1B}=(\widehat{D}_B^{-1}-D^{-1})\widehat{N}_B=O_p(n^{-\frac{1}{2}}),$$
$$\nu_{2B}=D^{-1}(\widehat{N}_B-N_B)=O_p(n^{-\frac{1}{2}}).$$
Corollary 3.2.

The bias of the $\widehat{ATE}_I$ estimator is:

$$E[\widehat{ATE}_I-ATE]=E[\bar{\mathbf{z}}_B'(\nu_{1B}+\nu_{2B})]-E[\bar{\mathbf{z}}_A'(\nu_{1A}+\nu_{2A})].$$

Moreover, $\bar{\mathbf{z}}_B'(\nu_{1B}+\nu_{2B})=O_p(n^{-\frac{1}{2}})O_p(n^{-\frac{1}{2}})=O_p(n^{-1})$ and $\bar{\mathbf{z}}_A'(\nu_{1A}+\nu_{2A})=O_p(n^{-\frac{1}{2}})O_p(n^{-\frac{1}{2}})=O_p(n^{-1})$.

Note that this result implies that the bias terms of the interacted ATE estimator are also of order $O_p(n^{-1})$.

4 Bias Corrections for Regression Components

Having established the decomposition, we now derive estimators of each bias term for use as bias corrections. We show that these bias estimates are (i) exactly unbiased and (ii) have estimation error of order $O_p(n^{-1})$. It follows that using this bias correction with an adjusted estimator yields a finite-sample unbiased estimator with the limiting distribution of the adjusted estimator. Recall from Section 2 that $h_i=\mathbf{z}_i'D^{-1}\mathbf{z}_i$ is the (rescaled) leverage of the $i$th data point.

We again begin with the noninteracted case.

Theorem 4.1.

An unbiased estimator for the bias in the noninteracted case is:

$$\widehat{Bias}_{NI}=\frac{1}{n}\frac{n_B}{n_B-1}\left(\overline{hb}_B-\bar{h}_B\bar{b}_B\right)-\frac{1}{n}\frac{n_A}{n_A-1}\left(\overline{ha}_A-\bar{h}_A\bar{a}_A\right)+(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\left(\widehat{D}^{-1}-D^{-1}\right)\widehat{N}$$
$$+\frac{C_{A,NI}}{n_A}\sum_{i\in[A]}(\mathbf{z}_i-\bar{\mathbf{z}}_A)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_A)(a_i-\bar{a}_A)-\frac{C_{B,NI}}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B).$$

Here $C_{A,NI}$ and $C_{B,NI}$ are two constants depending on $n$, $n_A$ and $n_B$; their exact formulas are given in the appendix. Moreover, $\widehat{Bias}_{NI}=O_p(n^{-1})$.
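For reference, the following sketch transcribes $\widehat{Bias}_{NI}$ directly, with the constants $C_{A,NI}$ and $C_{B,NI}$ taken from Appendix A.7 (it assumes demeaned covariates `z`, a 0/1 assignment vector `T`, and at least three units per arm; function names are ours):

```python
import numpy as np

# Constants from Appendix A.7 (a transcription, not a re-derivation).
def c_constants_ni(n, nA):
    nB = n - nA
    N_AAA = (n / nA**3) * (nA / n - 3 * nA * (nA - 1) / (n * (n - 1))
                           + 2 * nA * (nA - 1) * (nA - 2) / (n * (n - 1) * (n - 2)))
    N_AAB = (n / (nA**2 * nB)) * (-nA * nB / (n * (n - 1))
                                  + 2 * nA * (nA - 1) * nB / (n * (n - 1) * (n - 2)))
    N_adjA = n * (n - 1) * (n - 2) / ((nA - 1) * (nA - 2) * nA) * nA**3 / n**3
    N_adjB = n * (n - 1) * (n - 2) / ((nB - 1) * (nB - 2) * nB) * nB**3 / n**3
    return nA / nB * N_AAA * N_adjA, nA / nB * N_AAB * N_adjB

# Direct transcription of the Theorem 4.1 bias estimator.
def bias_ni(Y, T, z):
    n, nA = len(Y), int(T.sum()); nB = n - nA
    A, B = T == 1, T == 0
    pA, pB = nA / n, nB / n
    D = z.T @ z / n; Dinv = np.linalg.inv(D)
    h = np.einsum('ij,jk,ik->i', z, Dinv, z)              # leverages h_i
    zbarA, zbarB = z[A].mean(0), z[B].mean(0)
    D_hat = D - pA * np.outer(zbarA, zbarA) - pB * np.outer(zbarB, zbarB)
    N_hat = (pA * ((z[A] * Y[A][:, None]).mean(0) - Y[A].mean() * zbarA)
             + pB * ((z[B] * Y[B][:, None]).mean(0) - Y[B].mean() * zbarB))
    cA, cB = c_constants_ni(n, nA)
    qA = np.einsum('ij,jk,ik->i', z[A] - zbarA, Dinv, z[A] - zbarA)
    qB = np.einsum('ij,jk,ik->i', z[B] - zbarB, Dinv, z[B] - zbarB)
    return ((1 / n) * nB / (nB - 1) * ((h[B] * Y[B]).mean() - h[B].mean() * Y[B].mean())
            - (1 / n) * nA / (nA - 1) * ((h[A] * Y[A]).mean() - h[A].mean() * Y[A].mean())
            + (zbarB - zbarA) @ (np.linalg.inv(D_hat) - Dinv) @ N_hat
            + cA / nA * (qA * (Y[A] - Y[A].mean())).sum()
            - cB / nB * (qB * (Y[B] - Y[B].mean())).sum())
```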

Corollary 4.1.

The following estimator is unbiased for estimating the ATE:

$$\widehat{ATE}_{NI,Debias}=\widehat{ATE}_{NI}-\widehat{Bias}_{NI}$$

The results are derived analogously in the interacted case.

Theorem 4.2.

An unbiased estimator for the bias in the interacted case is:

$$\widehat{Bias}_I=\frac{1}{n}\frac{n_A}{n_B-1}\left(\overline{hb}_B-\bar{h}_B\bar{b}_B\right)+\bar{\mathbf{z}}_B'(\widehat{D}_B^{-1}-D^{-1})\widehat{N}_B-\frac{C_{B,I}}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B)$$
$$-\frac{1}{n}\frac{n_B}{n_A-1}\left(\overline{ha}_A-\bar{h}_A\bar{a}_A\right)-\bar{\mathbf{z}}_A'(\widehat{D}_A^{-1}-D^{-1})\widehat{N}_A+\frac{C_{A,I}}{n_A}\sum_{i\in[A]}(\mathbf{z}_i-\bar{\mathbf{z}}_A)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_A)(a_i-\bar{a}_A).$$

Here $C_{A,I}$ and $C_{B,I}$ are two constants depending on $n$, $n_A$ and $n_B$; their exact formulas are given in the appendix. Moreover, $\widehat{Bias}_I=O_p(n^{-1})$.

Corollary 4.2.

The following estimator is unbiased for estimating the ATE:

$$\widehat{ATE}_{I,Debias}=\widehat{ATE}_I-\widehat{Bias}_I$$
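A matching transcription of $\widehat{Bias}_I$, with $C_{A,I}$ and $C_{B,I}$ as displayed in Appendix A.8 (again a sketch, under the same conventions as above):

```python
import numpy as np

# Shared form of C_{A,I} (m = n_A) and C_{B,I} (m = n_B), per Appendix A.8.
def c_constant_i(n, m):
    N_mmm = (n / m**3) * (m / n - 3 * m * (m - 1) / (n * (n - 1))
                          + 2 * m * (m - 1) * (m - 2) / (n * (n - 1) * (n - 2)))
    N_adj = n * (n - 1) * (n - 2) / ((m - 1) * (m - 2) * m) * m**3 / n**3
    return N_mmm * N_adj

# Direct transcription of the Theorem 4.2 bias estimator.
def bias_i(Y, T, z):
    n, nA = len(Y), int(T.sum()); nB = n - nA
    A, B = T == 1, T == 0
    D = z.T @ z / n; Dinv = np.linalg.inv(D)
    h = np.einsum('ij,jk,ik->i', z, Dinv, z)              # leverages h_i
    zbarA, zbarB = z[A].mean(0), z[B].mean(0)
    D_A = z[A].T @ z[A] / nA - np.outer(zbarA, zbarA)
    N_A = (z[A] * Y[A][:, None]).mean(0) - Y[A].mean() * zbarA
    D_B = z[B].T @ z[B] / nB - np.outer(zbarB, zbarB)
    N_B = (z[B] * Y[B][:, None]).mean(0) - Y[B].mean() * zbarB
    cA, cB = c_constant_i(n, nA), c_constant_i(n, nB)
    qA = np.einsum('ij,jk,ik->i', z[A] - zbarA, Dinv, z[A] - zbarA)
    qB = np.einsum('ij,jk,ik->i', z[B] - zbarB, Dinv, z[B] - zbarB)
    return ((1 / n) * nA / (nB - 1) * ((h[B] * Y[B]).mean() - h[B].mean() * Y[B].mean())
            + zbarB @ (np.linalg.inv(D_B) - Dinv) @ N_B
            - cB / nB * (qB * (Y[B] - Y[B].mean())).sum()
            - (1 / n) * nB / (nA - 1) * ((h[A] * Y[A]).mean() - h[A].mean() * Y[A].mean())
            - zbarA @ (np.linalg.inv(D_A) - Dinv) @ N_A
            + cA / nA * (qA * (Y[A] - Y[A].mean())).sum())
```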
Remark 1.

Note that the adjustments in Theorem 4.1 and Theorem 4.2 are both of order $O_p(n^{-1})$. Thus $\sqrt{n}(\widehat{ATE}_{NI,Debias}-\widehat{ATE}_{NI})=o_p(1)$ and $\sqrt{n}(\widehat{ATE}_{I,Debias}-\widehat{ATE}_I)=o_p(1)$: the debiased estimators have the same limiting distributions as the original estimators.

Remark 2.

We briefly remark on why it is possible to construct unbiased adjusted estimators in closed form. Examining the expressions in Section 3: although all bias terms are nonlinear, only $\nu_2$, $\nu_{1A}$ and $\nu_{1B}$ have infinite-order Taylor expansions, and these terms can be expressed purely in terms of the observable data. All other terms are, in expectation, functions of moments that can be unbiasedly estimated.

5 Simulations

In this section we apply our estimators to several datasets.

We briefly comment on variance estimation and confidence interval construction. We showed in Section 4 that our debiased estimators have the same asymptotic distributions as the OLS estimators. This implies that in large samples we can recenter the OLS confidence intervals at our debiased estimates and expect the same coverage probabilities. In small samples, however, we find it important to account for the sampling variability of the additional terms: indeed, in one of the simulations below, a naive recentering procedure leads to severe undercoverage. To address this problem, we propose a simple procedure shown to work well on our simulated datasets. (Another way is to directly estimate the variances of the additional terms, but this may be cumbersome.) In this procedure, one first runs the OLS regression and computes the debiased estimator. One then replaces the OLS treatment coefficient with the debiased coefficient estimate and recomputes the OLS residuals, keeping all other coefficients the same. Finally, one computes variances and constructs confidence intervals for the debiased estimator using the same formulas as for the OLS estimators. In the simulations below, this procedure is denoted by BC, which stands for bias-corrected. Appendix B contains a more detailed comparison of this new procedure with standard ones. In practice, we recommend that researchers use the debiased estimators with this procedure and the BC-HC2 heteroskedasticity-robust variance estimator with a Satterthwaite adjustment for inference (for discussions of the Satterthwaite adjustment, see [Sat46], [BM02], [Lin13a] and [Imb10]).
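A sketch of the BC recomputation for the noninteracted regression, using a plain HC2 sandwich for the treatment coefficient (the Satterthwaite degrees-of-freedom adjustment, which we obtain from clubSandwich in the simulations, is omitted here; `ate_debiased` would be $\widehat{ATE}_{NI}$ minus the Theorem 4.1 correction):

```python
import numpy as np

# BC variance sketch: recompute residuals with the treatment coefficient
# replaced by the debiased estimate, then apply the usual HC2 sandwich.
def bc_hc2_se(Y, T, z, ate_debiased):
    n = len(Y)
    X = np.column_stack([np.ones(n), T, z])
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    beta_bc = beta.copy()
    beta_bc[1] = ate_debiased                # swap in the debiased ATE estimate
    e = Y - X @ beta_bc                      # recomputed residuals
    XtX_inv = np.linalg.inv(X.T @ X)
    lev = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # hat-matrix diagonal
    w = e / np.sqrt(1.0 - lev)               # HC2 rescaling of residuals
    meat = (X * w[:, None]).T @ (X * w[:, None])
    V = XtX_inv @ meat @ XtX_inv
    return np.sqrt(V[1, 1])                  # SE of the treatment coefficient
```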

5.1 Simulated Datasets

In this section we compare the performance of our debiased estimators with that of standard estimators using simulated datasets. We show the results of two simulation schemes here; Appendix B contains results for two additional schemes, as well as graphical summaries of the data generating processes. In each scheme, we first generate two-dimensional covariates that are the quantiles of prespecified distributions. (For example, in Scheme 1, with a sample of $N$ units, the covariates of the $i$th unit are the $\frac{i}{N+1}$ quantiles of a Beta(0.5,0.5) distribution and a Triangle(0,1) distribution.) We then compute the studentized leverage ratios and use them to impute potential outcomes, in three different ways. In all cases the average treatment effect equals 0. The experiment is a completely randomized experiment with 24 units and an inclusion probability of $\frac{1}{3}$ for the treatment arm. The table below gives the simulation details. Note that these schemes are designed specifically so that the finite-sample bias is relatively large.
Tables 1 and 2 report the simulation results for the two schemes. Our debiased estimators are exactly unbiased, as expected. In terms of root mean squared error (RMSE) the picture is less clear: there are cases where the debiased estimators dominate the others (DGP 1.1 and DGP 1.3), and cases where the unadjusted estimator is best (DGP 1.2 and DGP 2.2). (The latter is an artifact of the DGPs; recall the variance formula for the difference-in-means estimator, e.g., from [IR15].) Note that DGP 1.3 and DGP 2.3 are constant-effects models, in which the noninteracted OLS estimators are first-order unbiased; in DGP 1.3 we nonetheless observe a small, higher-order bias.
In terms of confidence interval coverage, observe in DGPs 2.1, 2.2 and 2.3 that the original recentered intervals exhibit significant undercoverage. However, the procedure based on recomputing the OLS residuals, combined with Satterthwaite adjustments, works reasonably well; only in one case (DGP 2.3, Non-Int) is the coverage unsatisfactory. As shown in Tables 6 and 8 in the appendix, the BC procedures do not add significantly to the median confidence interval length, although they tend to add to the average length. The Satterthwaite adjustment can also add to the median (and mean) confidence interval length: it typically increases the length by at most 10 to 20 percent relative to the Student-t adjustment.

DGPs for Simulation Schemes 1 and 2

           X1(i)            X2(i)       Y0(i)   Y1(i)   #treated   ATE
Scheme 1, N = 24
  DGP1.1   Beta(0.5,0.5)    Tri(0,1)    0       2h_i    8          0
  DGP1.2   Beta(0.5,0.5)    Tri(0,1)    -h_i    h_i     8          0
  DGP1.3   Beta(0.5,0.5)    Tri(0,1)    h_i     h_i     8          0
Scheme 2, N = 24
  DGP2.1   Beta(2,5)        Norm(0,1)   0       2h_i    8          0
  DGP2.2   Beta(2,5)        Norm(0,1)   -h_i    h_i     8          0
  DGP2.3   Beta(2,5)        Norm(0,1)   h_i     h_i     8          0

  • Note: DGPs for the simulations. Beta($\alpha$,$\beta$) is the beta distribution with shape parameters $\alpha$ and $\beta$. Tri(0,1) is the symmetric triangular distribution on the unit interval. Norm(0,1) is the standard normal distribution. $h_i$ is the studentized leverage ratio of the $i$th unit, computed as $h_i=\frac{v_i-\bar{v}}{\sigma_v}$, where $v_i=x_i'(\sum_{i=1}^n x_ix_i')^{-1}x_i$, $\bar{v}=\frac{1}{n}\sum_{i=1}^n v_i$ and $\sigma_v^2=\frac{1}{n-1}\sum_{i=1}^n(v_i-\bar{v})^2$. DGP1.1, DGP1.2, DGP2.1 and DGP2.2 are variable-effects models; DGP1.3 and DGP2.3 are constant-effects models.
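For concreteness, the covariates and studentized leverage ratios of Scheme 1 can be constructed as follows (our reading of the note above; scipy is assumed for the quantile functions):

```python
import numpy as np
from scipy import stats

# Scheme 1 covariates: i/(N+1) quantiles of Beta(0.5, 0.5) and Triangle(0, 1).
N = 24
q = np.arange(1, N + 1) / (N + 1)
x1 = stats.beta(0.5, 0.5).ppf(q)
x2 = stats.triang(c=0.5, loc=0, scale=1).ppf(q)   # symmetric triangle on [0, 1]
x = np.column_stack([x1, x2])

# Studentized leverage ratios h_i, per the note above.
v = np.einsum('ij,jk,ik->i', x, np.linalg.inv(x.T @ x), x)
h = (v - v.mean()) / v.std(ddof=1)

# DGP 1.1 potential outcomes: Y0 = 0 and Y1 = 2 h_i, so the true ATE is 0.
Y0, Y1 = np.zeros(N), 2 * h
```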

Table 1: Simulation Results for Scheme 1

                                         ATE Estimators
                                         Unadjusted   OLS          OLS          Debiased     Debiased
                                                      Non-Int.     Interacted   Non-Int.     Interacted
DGP1.1, N = 24
  Bias                                   -0.000       -0.044       -0.171       -0.000       -0.000
  SD                                      0.577        0.569        0.734        0.558        0.570
  RMSE                                    0.577        0.571        0.754        0.558        0.570
  CI Coverage (HC2, Student-t)            0.961        0.957        0.919        0.960        0.953
  CI Coverage (HC2, Satterthwaite)        0.965        0.964        0.949        0.966        0.970
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.961        0.957
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.967        0.973
DGP1.2, N = 24
  Bias                                   -0.000       -0.046       -0.097       -0.000       -0.000
  SD                                      0.144        0.220        0.275        0.205        0.182
  RMSE                                    0.144        0.225        0.292        0.205        0.182
  CI Coverage (HC2, Student-t)            1.000        0.999        0.982        1.000        0.999
  CI Coverage (HC2, Satterthwaite)        1.000        1.000        0.994        1.000        1.000
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          1.000        1.000
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          1.000        1.000
DGP1.3, N = 24
  Bias                                   -0.000        0.002       -0.074        0.000        0.000
  SD                                      0.433        0.417        0.483        0.400        0.408
  RMSE                                    0.433        0.417        0.489        0.400        0.408
  CI Coverage (HC2, Student-t)            0.940        0.938        0.916        0.946        0.948
  CI Coverage (HC2, Satterthwaite)        0.947        0.949        0.950        0.956        0.970
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.947        0.950
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.956        0.971

  • Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals, in units of ×100 percentage points. The CI Coverage (HC2, ...) rows are calculated using the original OLS residuals; the CI Coverage (BC-HC2, ...) rows are calculated using the recomputed OLS residuals and are defined only for the debiased estimators. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is $\binom{24}{8}$.

Table 2: Simulation Results for Scheme 2

                                         ATE Estimators
                                         Unadjusted   OLS          OLS          Debiased     Debiased
                                                      Non-Int.     Interacted   Non-Int.     Interacted
DGP2.1, N = 24
  Bias                                    0.000       -0.237        0.028        0.000        0.000
  SD                                      0.577        0.344        0.283        0.459        0.439
  RMSE                                    0.577        0.418        0.284        0.459        0.439
  CI Coverage (HC2, Student-t)            0.910        0.913        0.757        0.923        0.470
  CI Coverage (HC2, Satterthwaite)        0.915        0.920        0.837        0.928        0.548
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.923        0.876
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.928        0.930
DGP2.2, N = 24
  Bias                                    0.000       -0.237        0.015        0.000        0.000
  SD                                      0.144        0.326        0.132        0.314        0.225
  RMSE                                    0.144        0.403        0.133        0.314        0.225
  CI Coverage (HC2, Student-t)            1.00         0.93         0.93         0.967        0.614
  CI Coverage (HC2, Satterthwaite)        1.00         0.935        0.991        0.969        0.724
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.965        0.967
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.968        0.996
DGP2.3, N = 24
  Bias                                    0.000        0.000        0.013        0.000        0.000
  SD                                      0.433        0.097        0.163        0.195        0.239
  RMSE                                    0.433        0.097        0.164        0.195        0.239
  CI Coverage (HC2, Student-t)            0.93         0.97         0.85         0.654        0.570
  CI Coverage (HC2, Satterthwaite)        0.942        0.983        0.947        0.683        0.678
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.809        0.896
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.850        0.944

  • Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals, in units of ×100 percentage points. The CI Coverage (HC2, ...) rows are calculated using the original OLS residuals; the CI Coverage (BC-HC2, ...) rows are calculated using the recomputed OLS residuals and are defined only for the debiased estimators. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is $\binom{24}{8}$.

5.2 Real Dataset

In this section we compare the performance of the debiased estimators with that of standard OLS estimators on a real-world dataset. We follow [Lin13a]’s simulation setting precisely, generating our simulation from the experimental example of [ALO09] by drawing random assignments under the maintained hypothesis of no treatment effect. Because this setting assumes no effects, bias is expected to be negligible: [Fre08a] notes that the leading term of the bias is greatest when treatment effects are heterogeneous. The simulation is thus not primarily meant to investigate bias, but rather the precision and coverage consequences of using our bias corrections in a real-world setting. (We thank Winston Lin for sharing the replication files.)
[ALO09] sought to measure the effects of support services and financial incentives on college students’ academic achievement. The experiment randomly assigned eligible first-year undergraduate students to four groups: one treatment group was offered both support services and financial incentives, a second group only support services, and a third group only financial incentives; the control group was eligible only for standard university support services. As in [Lin13a], we use only the data for men in the services-and-incentives ($N=58$) and services-only ($N=99$) groups. The simulation datasets are generated assuming the treatment has no effect on any student. We replicate the experiment $10^7$ times, each time randomly assigning 58 students to the services-and-incentives group and 99 to the services-only group. The regression estimators estimate the treatment effects adjusting for high-school GPA. The standard errors of the OLS estimators are estimated using the standard sandwich formulas.
Table 3 reports the results of the $10^7$ replications. The first two rows of the table show the biases and standard deviations of the five estimators. All estimators are approximately unbiased after rounding, and the variances of the debiased estimators are no larger than those of the standard estimators. The coverage rows show that all confidence intervals cover the true value of the ATE with approximately 95 percent probability. On average, the intervals of the regression-adjusted estimators are slightly narrower than those of the unadjusted estimator. (The widths of the confidence intervals for the debiased estimators are mechanically identical to those for the standard estimators when constructed from the same SE estimators.)

Table 3: Simulation of the Angrist, Lang, Oreopoulos (2009) experiment with zero treatment effects ($10^7$ replications).

                                         ATE Estimators
                                         Unadjusted   OLS          OLS          Debiased     Debiased
                                                      Non-Int.     Interacted   Non-Int.     Interacted
  Bias                                    0.000        0.000        0.000        0.000        0.000
  SD                                      0.159        0.150        0.147        0.147        0.147
  RMSE                                    0.159        0.150        0.147        0.147        0.147
  CI Coverage (HC2, Student-t)            0.949        0.949        0.949        0.949        0.949
  CI Coverage (HC2, Satterthwaite)        0.949        0.949        0.949        0.950        0.950
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.949        0.949
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.950        0.950

  • Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals, in units of ×100 percentage points. The CI Coverage (HC2, ...) rows are calculated using the original OLS residuals; the CI Coverage (BC-HC2, ...) rows are calculated using the recomputed OLS residuals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

Together, these results demonstrate that, in a real-world setting, our bias corrections can be introduced without appreciably compromising the precision or coverage properties of regression-adjusted estimators.

Appendix A Proofs

A.1 Constants

We define the following five constants:

$$N_{AAA}=\frac{n}{n_A^3}\left(\frac{n_A}{n}-\frac{3n_A(n_A-1)}{n(n-1)}+\frac{2n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\right)$$
$$N_{BBB}=\frac{n}{n_B^3}\left(\frac{n_B}{n}-\frac{3n_B(n_B-1)}{n(n-1)}+\frac{2n_B(n_B-1)(n_B-2)}{n(n-1)(n-2)}\right)$$
$$N_{AAB}=\frac{n}{n_A^2n_B}\left(-\frac{n_An_B}{n(n-1)}+\frac{2n_A(n_A-1)n_B}{n(n-1)(n-2)}\right)$$
$$N_{Adj,A}=\frac{n(n-1)(n-2)}{(n_A-1)(n_A-2)n_A}\frac{n_A^3}{n^3}$$
$$N_{Adj,B}=\frac{n(n-1)(n-2)}{(n_B-1)(n_B-2)n_B}\frac{n_B^3}{n^3}$$

A.2 Auxiliary Lemmas

Let $x_i$, $y_i$ and $z_i$ be three possibly identical variables such that $\bar{x}=\bar{y}=\bar{z}=0$.

Lemma A.1.
$$E[\bar{x}_A\bar{y}_A\bar{z}_A]=N_{AAA}\frac{1}{n}\sum_{i=1}^n x_iy_iz_i$$
$$E[\bar{x}_A\bar{y}_A\bar{z}_B]=N_{AAB}\frac{1}{n}\sum_{i=1}^n x_iy_iz_i$$
Proof.

We only prove the first equality; the second can be proved analogously. First notice two useful equalities:

$$E\left[\sum_{i=1}^n T_ix_iy_i\sum_{j\neq i}T_jz_j\right]=E\left[\sum_{i=1}^n\sum_{j\neq i}T_iT_jx_iy_iz_j\right]=\sum_{i=1}^n\sum_{j\neq i}E[T_iT_j]x_iy_iz_j$$
$$=\frac{n_A(n_A-1)}{n(n-1)}\sum_{i=1}^n\sum_{j\neq i}x_iy_iz_j=\frac{n_A(n_A-1)}{n(n-1)}\left(\sum_{i=1}^n\sum_{j=1}^n x_iy_iz_j-\sum_{i=1}^n x_iy_iz_i\right)=-\frac{n_A(n_A-1)}{n(n-1)}\sum_{i=1}^n x_iy_iz_i,$$

where the third equality uses the second-moment formula for a completely randomized experiment and the fifth equality uses the fact that $\sum_{i=1}^n z_i=n\bar{z}=0$. Similarly,

$$E\left[\sum_{i=1}^n T_ix_i\sum_{j\neq i}T_jy_j\sum_{s\notin\{i,j\}}T_sz_s\right]=\sum_{i=1}^n\sum_{j\neq i}\sum_{s\notin\{i,j\}}E[T_iT_jT_s]x_iy_jz_s$$
$$=\frac{n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\sum_{i=1}^n\sum_{j\neq i}\sum_{s\notin\{i,j\}}x_iy_jz_s$$
$$=\frac{n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\left(\sum_{i=1}^n\sum_{j\neq i}\sum_{s=1}^n x_iy_jz_s-\sum_{i=1}^n\sum_{j\neq i}x_iy_j(z_i+z_j)\right)$$
$$=\frac{n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\left(-\sum_{i=1}^n\sum_{j=1}^n x_iy_j(z_i+z_j)+2\sum_{i=1}^n x_iy_iz_i\right)$$
$$=\frac{2n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\sum_{i=1}^n x_iy_iz_i,$$

where the fourth and fifth equalities use $\sum_{i=1}^n x_i=\sum_{i=1}^n y_i=\sum_{i=1}^n z_i=0$. Finally,

$$E[\bar{x}_A\bar{y}_A\bar{z}_A]=\frac{1}{n_A^3}E\left[\sum_{i=1}^n T_ix_i\sum_{i=1}^n T_iy_i\sum_{i=1}^n T_iz_i\right]$$
$$=\frac{1}{n_A^3}\left(E\left[\sum_i T_ix_iy_iz_i\right]+E\left[\sum_{i=1}^n\sum_{j\neq i}T_iT_j(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)\right]+E\left[\sum_{i=1}^n T_ix_i\sum_{j\neq i}T_jy_j\sum_{s\notin\{i,j\}}T_sz_s\right]\right)$$
$$=\frac{1}{n_A^3}\left(\frac{n_A}{n}-\frac{3n_A(n_A-1)}{n(n-1)}+\frac{2n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\right)\sum_{i=1}^n x_iy_iz_i,$$

where for the last equality we apply the previous two equalities. ∎
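Both identities can be checked exactly on a small toy population by enumerating all assignments:

```python
import numpy as np
from itertools import combinations

# Exact check of Lemma A.1 by enumeration (toy centered data, small n).
rng = np.random.default_rng(4)
n, nA = 8, 4
nB = n - nA
x, y, z = (rng.normal(size=n) for _ in range(3))
x, y, z = x - x.mean(), y - y.mean(), z - z.mean()   # centered, as assumed

N_AAA = (n / nA**3) * (nA / n - 3 * nA * (nA - 1) / (n * (n - 1))
                       + 2 * nA * (nA - 1) * (nA - 2) / (n * (n - 1) * (n - 2)))
N_AAB = (n / (nA**2 * nB)) * (-nA * nB / (n * (n - 1))
                              + 2 * nA * (nA - 1) * nB / (n * (n - 1) * (n - 2)))

lhs_AAA, lhs_AAB = [], []
for idx in combinations(range(n), nA):
    A = np.zeros(n, bool); A[list(idx)] = True
    lhs_AAA.append(x[A].mean() * y[A].mean() * z[A].mean())
    lhs_AAB.append(x[A].mean() * y[A].mean() * z[~A].mean())
m3 = (x * y * z).mean()
print(np.isclose(np.mean(lhs_AAA), N_AAA * m3))      # True
print(np.isclose(np.mean(lhs_AAB), N_AAB * m3))      # True
```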

Lemma A.2.
$$N_{Adj,A}\,E\left[\frac{1}{n_A}\sum_{i\in[A]}(x_i-\bar{x}_A)(y_i-\bar{y}_A)(z_i-\bar{z}_A)\right]=\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})(z_i-\bar{z})\qquad(1)$$
Proof.
$$n^3\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})(z_i-\bar{z})=\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{s=1}^n(x_i-x_j)(y_i-y_k)(z_i-z_s)$$
$$=n^3\sum_{i=1}^n x_iy_iz_i-n^2\sum_{i=1}^n\sum_{j=1}^n(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+2n\sum_i\sum_j\sum_s x_iy_jz_s$$
$$=(n^3-3n^2+2n)\sum_{i=1}^n x_iy_iz_i-(n^2-2n)\sum_{i\neq j}(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+2n\sum_{i\neq j\neq s}x_iy_jz_s$$

Consider $n_A^4$ times the third moment estimator $\frac{1}{n_A}\sum_{i\in[A]}(x_i-\bar{x}_A)(y_i-\bar{y}_A)(z_i-\bar{z}_A)$. By the same expansion, this equals

$$(n_A^3-3n_A^2+2n_A)\sum_{i=1}^n T_ix_iy_iz_i-(n_A^2-2n_A)\sum_{i\neq j}T_iT_j(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+2n_A\sum_{i\neq j\neq s}T_iT_jT_sx_iy_jz_s$$
$$=n_A(n_A-1)(n_A-2)\sum_{i=1}^n T_ix_iy_iz_i-n_A(n_A-2)\sum_{i\neq j}T_iT_j(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+2n_A\sum_{i\neq j\neq s}T_iT_jT_sx_iy_jz_s.$$

In expectation this equals

$$\frac{n_A(n_A-1)(n_A-2)n_A}{n}\sum_{i=1}^n x_iy_iz_i-\frac{n_A(n_A-1)(n_A-2)n_A}{n(n-1)}\sum_{i\neq j}(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+\frac{2n_A(n_A-1)(n_A-2)n_A}{n(n-1)(n-2)}\sum_{i\neq j\neq s}x_iy_jz_s$$
$$=\frac{n_A(n_A-1)(n_A-2)n_A}{n(n-1)(n-2)n}\left((n^3-3n^2+2n)\sum_{i=1}^n x_iy_iz_i-(n^2-2n)\sum_{i\neq j}(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+2n\sum_{i\neq j\neq s}x_iy_jz_s\right)$$
$$=\frac{n_A(n_A-1)(n_A-2)n_A}{n(n-1)(n-2)n}\times n^3\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})(z_i-\bar{z}).$$

Dividing both sides by $n_A^4$ and comparing with the definition of $N_{Adj,A}$ yields (1). ∎
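Lemma A.2 can be checked the same way:

```python
import numpy as np
from itertools import combinations

# Exact check of Lemma A.2 by enumerating assignments (toy data, small n).
rng = np.random.default_rng(5)
n, nA = 8, 4
x, y, z = (rng.normal(size=n) for _ in range(3))

N_adjA = n * (n - 1) * (n - 2) / ((nA - 1) * (nA - 2) * nA) * nA**3 / n**3

vals = []
for idx in combinations(range(n), nA):
    s = np.array(idx)
    xc, yc, zc = x[s] - x[s].mean(), y[s] - y[s].mean(), z[s] - z[s].mean()
    vals.append((xc * yc * zc).mean())
pop = ((x - x.mean()) * (y - y.mean()) * (z - z.mean())).mean()
print(np.isclose(N_adjA * np.mean(vals), pop))       # True
```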

A.3 Theorem 3.1

Proof.

By the Frisch–Waugh–Lovell theorem, the OLS estimate of the coefficient vector can be written $\widehat{Q}=\widehat{D}^{-1}\widehat{N}$, where

$$\widehat{N}=p_A(\overline{a\mathbf{z}}_A-\bar{a}_A\bar{\mathbf{z}}_A)+p_B(\overline{b\mathbf{z}}_B-\bar{b}_B\bar{\mathbf{z}}_B)$$
$$=p_A(\overline{a^*\mathbf{z}}_A-\bar{a^*}_A\bar{\mathbf{z}}_A)+p_B(\overline{b^*\mathbf{z}}_B-\bar{b^*}_B\bar{\mathbf{z}}_B)$$
$$=N+p_A(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}})+p_B(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}})-p_A\bar{a^*}_A\bar{\mathbf{z}}_A-p_B\bar{b^*}_B\bar{\mathbf{z}}_B$$

and

$$\widehat{D}=\overline{\mathbf{z}\mathbf{z}'}-p_A\bar{\mathbf{z}}_A\bar{\mathbf{z}}_A'-p_B\bar{\mathbf{z}}_B\bar{\mathbf{z}}_B'.$$

Therefore

$$\widehat{Q}=D^{-1}\widehat{N}+\left(\widehat{D}^{-1}-D^{-1}\right)\widehat{N}$$
$$=D^{-1}\left(N+p_A(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}})+p_B(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}})-p_A\bar{a^*}_A\bar{\mathbf{z}}_A-p_B\bar{b^*}_B\bar{\mathbf{z}}_B\right)+\left(\widehat{D}^{-1}-D^{-1}\right)\widehat{N}$$
$$=Q+D^{-1}\left(p_A(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}})+p_B(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}})\right)-D^{-1}\left(p_A\bar{a^*}_A\bar{\mathbf{z}}_A+p_B\bar{b^*}_B\bar{\mathbf{z}}_B\right)+\left(\widehat{D}^{-1}-D^{-1}\right)\widehat{N}$$
$$=Q+\nu_1+\nu_3+\nu_2.$$

Now consider the orders of the terms $\nu_1$, $\nu_2$ and $\nu_3$. For $\nu_1$: $D^{-1}$, $p_A$ and $p_B$ are $O(1)$ by assumption, while $(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}})$ and $(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}})$ are each $O_p(n^{-\frac{1}{2}})$ by the moment conditions on $a$, $b$ and $\mathbf{z}$. Therefore $\nu_1=O_p(n^{-\frac{1}{2}})$.
For $\nu_2$, note that $\widehat{D}-D=-p_A\bar{\mathbf{z}}_A\bar{\mathbf{z}}_A'-p_B\bar{\mathbf{z}}_B\bar{\mathbf{z}}_B'$, which is $O_p(n^{-1})$. Hence $\widehat{D}^{-1}-D^{-1}=\widehat{D}^{-1}(D-\widehat{D})D^{-1}=O_p(1)O_p(n^{-1})O(1)=O_p(n^{-1})$. Meanwhile, $\widehat{N}$ converges to a constant vector, so it is $O_p(1)$. Therefore $\nu_2=O_p(n^{-1})$.
For $\nu_3$: $\bar{\mathbf{z}}_A$, $\bar{\mathbf{z}}_B$, $\bar{a^*}_A$ and $\bar{b^*}_B$ are $O_p(n^{-\frac{1}{2}})$, while $D^{-1}$, $p_A$ and $p_B$ are $O(1)$. Therefore $\nu_3=O_p(n^{-1})$. ∎

A.4 Corollary 3.1

Proof.
$$\mathbf{E}[\widehat{ATE}_{NI}-ATE]=\mathbf{E}\left[\bar{a}_A-\bar{b}_B-(\bar{\mathbf{z}}_A-\bar{\mathbf{z}}_B)'\widehat{Q}\right]-ATE$$
$$=\left(\mathbf{E}\left[\bar{a}_A-\bar{b}_B\right]-ATE\right)-\mathbf{E}\left[(\bar{\mathbf{z}}_A-\bar{\mathbf{z}}_B)'\right]Q-\mathbf{E}\left[(\bar{\mathbf{z}}_A-\bar{\mathbf{z}}_B)'(\nu_1+\nu_2+\nu_3)\right]$$
$$=(ATE-ATE)-0\times Q-\mathbf{E}\left[(\bar{\mathbf{z}}_A-\bar{\mathbf{z}}_B)'(\nu_1+\nu_2+\nu_3)\right]$$
$$=\mathbf{E}\left[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'(\nu_1+\nu_2+\nu_3)\right],$$

where in the third equality we use the unbiasedness of the difference-in-means estimator and the unbiasedness of $\bar{\mathbf{z}}_A$ and $\bar{\mathbf{z}}_B$ as estimators of the population mean $\bar{\mathbf{z}}$. ∎

A.5 Theorem 3.2

Proof.

The decomposition is algebraic. Now $\widehat{N}_A-N_A=O_p(n^{-\frac{1}{2}})$, $\widehat{N}_B-N_B=O_p(n^{-\frac{1}{2}})$, $\widehat{N}_A=O_p(1)$ and $\widehat{N}_B=O_p(1)$ by the assumptions. Moreover,

$$\widehat{D}_A^{-1}-D^{-1}=\widehat{D}_A^{-1}(D-\widehat{D}_A)D^{-1}=O_p(1)O_p(n^{-\frac{1}{2}})O(1)=O_p(n^{-\frac{1}{2}}),$$

and similarly $\widehat{D}_B^{-1}-D^{-1}=O_p(n^{-\frac{1}{2}})$. These estimates give the orders in the theorem. ∎

A.6 Corollary 3.2

Proof.

Same as Corollary 3.1, after noting that $\bar{\mathbf{z}}_A=O_p(n^{-\frac{1}{2}})$ and $\bar{\mathbf{z}}_B=O_p(n^{-\frac{1}{2}})$. ∎

A.7 Theorem 4.1

Let

$$C_{A,NI}=\frac{n_A}{n_B}N_{AAA}N_{Adj,A}=\frac{n_A}{n_B}\frac{n}{n_A^3}\left(\frac{n_A}{n}-\frac{3n_A(n_A-1)}{n(n-1)}+\frac{2n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\right)\frac{n(n-1)(n-2)}{(n_A-1)(n_A-2)n_A}\frac{n_A^3}{n^3}$$
$$C_{B,NI}=\frac{n_A}{n_B}N_{AAB}N_{Adj,B}=\frac{n_A}{n_B}\frac{n}{n_A^2n_B}\left(-\frac{n_An_B}{n(n-1)}+\frac{2n_A(n_A-1)n_B}{n(n-1)(n-2)}\right)\frac{n(n-1)(n-2)}{(n_B-1)(n_B-2)n_B}\frac{n_B^3}{n^3}$$

Note both of them are of order $O(n^{-1})$.

Proof.

We first propose an estimator for $\mathbf{E}[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\nu_1]$. Using the assumption $\bar{\mathbf{z}}=0$, we have

$$(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)=\frac{1}{p_A}\bar{\mathbf{z}}_B=-\frac{1}{p_B}\bar{\mathbf{z}}_A.$$

Then we have

$$\mathbf{E}[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\nu_1]=\frac{1}{p_A}\mathbf{E}\left[\bar{\mathbf{z}}_B'D^{-1}\left(p_A(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}})+p_B(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}})\right)\right]$$
$$=\frac{1}{n-1}\left(\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i'D^{-1}\mathbf{z}_ib_i^*-\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i'D^{-1}\mathbf{z}_ia_i^*\right)$$
$$=\frac{1}{n-1}(\overline{hb}-\bar{h}\bar{b})-\frac{1}{n-1}(\overline{ha}-\bar{h}\bar{a}),$$

where the second equality follows from Proposition 1 of [Fre08b]. An unbiased estimator of this quantity is:

$$\frac{1}{n-1}\left[\frac{n_B(n-1)}{(n_B-1)n}\left(\overline{hb}_B-\bar{h}_B\bar{b}_B\right)-\frac{n_A(n-1)}{(n_A-1)n}\left(\overline{ha}_A-\bar{h}_A\bar{a}_A\right)\right]$$

It is clear that this adjustment is of order $O_p(n^{-1})$.
The term $(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\nu_2$ is a function of the observed data alone, so it is directly estimable, and it is of order $O_p(n^{-1})$.
We now propose an estimator for $\mathbf{E}[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\nu_3]$:

$$\mathbf{E}[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\nu_3]=\frac{1}{p_B}E[\bar{\mathbf{z}}_A'D^{-1}p_A\bar{a^*}_A\bar{\mathbf{z}}_A]+\frac{1}{p_B}E[\bar{\mathbf{z}}_A'D^{-1}p_B\bar{b^*}_B\bar{\mathbf{z}}_B]$$
$$=\frac{1}{p_B}E[\bar{\mathbf{z}}_A'D^{-1}p_A\bar{a^*}_A\bar{\mathbf{z}}_A]-\frac{1}{p_B}E[\bar{\mathbf{z}}_A'D^{-1}p_A\bar{b^*}_B\bar{\mathbf{z}}_A]$$
$$=\frac{p_A}{p_B}N_{AAA}\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i'D^{-1}\mathbf{z}_ia_i^*-\frac{p_A}{p_B}N_{AAB}\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i'D^{-1}\mathbf{z}_ib_i^*$$

By Lemmas A.1 and A.2, an unbiased estimator for this quantity is:

$$\frac{n_A}{n_B}N_{AAA}N_{Adj,A}\frac{1}{n_A}\sum_{i\in[A]}(\mathbf{z}_i-\bar{\mathbf{z}}_A)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_A)(a_i-\bar{a}_A)-\frac{n_A}{n_B}N_{AAB}N_{Adj,B}\frac{1}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B)$$
$$=\frac{C_{A,NI}}{n_A}\sum_{i\in[A]}(\mathbf{z}_i-\bar{\mathbf{z}}_A)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_A)(a_i-\bar{a}_A)-\frac{C_{B,NI}}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B)$$

It is clear that this adjustment is of order $O_p(n^{-1})$. ∎

A.8 Theorem 4.2

We have:

$$C_{A,I}=N_{AAA}N_{Adj,A}=\frac{n}{n_A^3}\left(\frac{n_A}{n}-\frac{3n_A(n_A-1)}{n(n-1)}+\frac{2n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\right)\frac{n(n-1)(n-2)}{(n_A-1)(n_A-2)n_A}\frac{n_A^3}{n^3}$$
$$C_{B,I}=N_{BBB}N_{Adj,B}=\frac{n}{n_B^3}\left(\frac{n_B}{n}-\frac{3n_B(n_B-1)}{n(n-1)}+\frac{2n_B(n_B-1)(n_B-2)}{n(n-1)(n-2)}\right)\frac{n(n-1)(n-2)}{(n_B-1)(n_B-2)n_B}\frac{n_B^3}{n^3}$$

Note that $C_{A,I}$ and $C_{B,I}$ are of order $O(n^{-1})$.

Proof.

We prove the result for the bias in the control arm; the proof for the treated arm is analogous. First notice that

$$E[\bar{\mathbf{z}}_B'(\nu_{1B}+\nu_{2B})]=E[\bar{\mathbf{z}}_B'(\widehat{D}_B^{-1}-D^{-1})\widehat{N}_B]+E\left[\bar{\mathbf{z}}_B'D^{-1}\left(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}}\right)\right]-E[\bar{\mathbf{z}}_B'D^{-1}\bar{\mathbf{z}}_B\bar{b^*}_B].$$

The first term is directly estimable and is of order $O_p(n^{-1})$. Analogously to the noninteracted case, the second term can be estimated by:

$$\frac{1}{n-1}\frac{n_A}{n_B}\frac{n_B(n-1)}{(n_B-1)n}\left(\overline{hb}_B-\bar{h}_B\bar{b}_B\right)=\frac{1}{n}\frac{n_A}{n_B-1}\left(\overline{hb}_B-\bar{h}_B\bar{b}_B\right)$$

It should be clear that this term is of order $O_p(n^{-1})$.
The third term can be estimated by:

$$N_{BBB}N_{Adj,B}\frac{1}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B)=\frac{C_{B,I}}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B)$$

It is clear that this term is of order $O_p(n^{-1})$. ∎

Appendix B More Simulation Results

B.1 Details on Simulation Schemes

           X1(i)           X2(i)                             Y0(i)   Y1(i)   #treated   ATE
Scheme 1, N = 24
  DGP1.1   Beta(0.5,0.5)   Tri(0,1)                          0       2h_i    8          0
  DGP1.2   Beta(0.5,0.5)   Tri(0,1)                          -h_i    h_i     8          0
  DGP1.3   Beta(0.5,0.5)   Tri(0,1)                          h_i     h_i     8          0
Scheme 2, N = 24
  DGP2.1   Beta(2,5)       Norm(0,1)                         0       2h_i    8          0
  DGP2.2   Beta(2,5)       Norm(0,1)                         -h_i    h_i     8          0
  DGP2.3   Beta(2,5)       Norm(0,1)                         h_i     h_i     8          0
Scheme 3, N = 24
  DGP3.1   Uniform(0,1)    Uniform(0,1), Squared, Reversed   0       2h_i    8          0
  DGP3.2   Uniform(0,1)    Uniform(0,1), Squared, Reversed   -h_i    h_i     8          0
  DGP3.3   Uniform(0,1)    Uniform(0,1), Squared, Reversed   h_i     h_i     8          0
Scheme 4, N = 24
  DGP4.1   Uniform(0,1)    Uniform(0,1), Squared             0       2h_i    8          0
  DGP4.2   Uniform(0,1)    Uniform(0,1), Squared             -h_i    h_i     8          0
  DGP4.3   Uniform(0,1)    Uniform(0,1), Squared             h_i     h_i     8          0

Table 4: DGPs for Simulations. Beta($\alpha$,$\beta$) is the beta distribution with shape parameters $\alpha$ and $\beta$. Tri(0,1) is the symmetric triangular distribution on the unit interval. Norm(0,1) is the standard normal distribution. Uniform(0,1) is the uniform distribution on the unit interval. For "Uniform(0,1), Squared", we square the quantiles of the uniform distribution. For "Uniform(0,1), Squared, Reversed", we additionally reverse the order of the quantiles so that the 1st unit corresponds to the highest quantile. $h_i$ is the studentized leverage ratio of the $i$th unit, computed as $h_i=\frac{v_i-\bar{v}}{\sigma_v}$, where $v_i=x_i'(\sum_{i=1}^n x_ix_i')^{-1}x_i$, $\bar{v}=\frac{1}{n}\sum_{i=1}^n v_i$ and $\sigma_v^2=\frac{1}{n-1}\sum_{i=1}^n(v_i-\bar{v})^2$.
Figure 1: Histograms for the DGPs. For each scheme, we plot the distributions of $X_1$, $X_2$ and the studentized leverages $h_i$. See the note under Table 4 for the definition of $h_i$.

B.2 Scheme 1

Table 5: Confidence Interval Statistics for Scheme 1 using HC2

                               HC2                                  BC-HC2
                               Z       Student-t  Satterthwaite    Z       Student-t  Satterthwaite
DGP1.1, N = 24
 Coverage, Percentage
  Non-interacted (Debiased)    95.30   96.00      96.60            95.40   96.10      96.70
  Interacted (Debiased)        94.60   95.30      97.00            95.00   95.70      97.30
 CI Width, Average
  Non-interacted (Debiased)    2.68    2.83       3.00             2.70    2.85       3.02
  Interacted (Debiased)        3.20    3.38       4.13             3.26    3.44       4.28
 CI Width, Median
  Non-interacted (Debiased)    2.71    2.86       3.01             2.71    2.86       3.01
  Interacted (Debiased)        3.02    3.19       3.67             3.02    3.19       3.67
DGP1.2, N = 24
 Coverage, Percentage
  Non-interacted (Debiased)    99.90   100.00     100.00           99.90   100.00     100.00
  Interacted (Debiased)        99.90   99.90      100.00           100.00  100.00     100.00
 CI Width, Average
  Non-interacted (Debiased)    1.84    1.95       2.06             1.85    1.95       2.06
  Interacted (Debiased)        1.87    1.98       2.42             1.90    2.00       2.48
 CI Width, Median
  Non-interacted (Debiased)    1.83    1.93       2.03             1.83    1.93       2.03
  Interacted (Debiased)        1.77    1.87       2.16             1.77    1.87       2.16
DGP1.3, N = 24
 Coverage, Percentage
  Non-interacted (Debiased)    93.50   94.60      95.60            93.60   94.70      95.60
  Interacted (Debiased)        93.80   94.80      97.00            94.10   95.00      97.10
 CI Width, Average
  Non-interacted (Debiased)    1.63    1.72       1.82             1.63    1.72       1.82
  Interacted (Debiased)        1.87    1.98       2.42             1.91    2.01       2.49
 CI Width, Median
  Non-interacted (Debiased)    1.62    1.71       1.80             1.62    1.71       1.80
  Interacted (Debiased)        1.77    1.87       2.16             1.77    1.87       2.16

  • Note: Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

Table 6: Confidence Interval Statistics for Scheme 1 using HC3
ATE Estimators
HC3 BC-HC3
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP1.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 96.70 97.20 97.70 96.70 97.20 97.80
Interacted (Debiased) 97.40 97.70 98.80 97.60 97.90 99.00
CI Width, Average
Non-interacted (Debiased) 3.00 3.16 3.37 3.01 3.18 3.39
Interacted (Debiased) 4.58 4.83 7.60 4.60 4.85 7.83
CI Width, Median
Non-interacted (Debiased) 3.02 3.18 3.38 3.03 3.20 3.40
Interacted (Debiased) 3.94 4.16 5.18 3.90 4.11 5.12
DGP1.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 100.00 100.00 100.00 100.00 100.00 100.00
Interacted (Debiased) 100.00 100.00 100.00 100.00 100.00 100.00
CI Width, Average
Non-interacted (Debiased) 2.06 2.17 2.32 2.06 2.17 2.32
Interacted (Debiased) 2.54 2.68 4.16 2.54 2.68 4.25
CI Width, Median
Non-interacted (Debiased) 2.04 2.15 2.28 2.04 2.15 2.28
I.Nobias 2.21 2.33 2.91 2.19 2.31 2.88
DGP1.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 95.30 96.10 97.00 95.30 96.20 97.00
Interacted (Debiased) 96.80 97.40 98.90 97.00 97.50 99.00
CI Width, Average
Non-interacted (Debiased) 1.78 1.88 2.01 1.79 1.89 2.01
Interacted (Debiased) 2.54 2.68 4.16 2.57 2.71 4.33
CI Width, Median
Non-interacted (Debiased) 1.77 1.87 1.98 1.77 1.87 1.99
Interacted (Debiased) 2.21 2.33 2.91 2.20 2.32 2.89
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. HC3 refers to the approximate jackknife robust standard error estimator [MW85]. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].
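HC3, the approximate jackknife of [MW85], differs from HC2 only in squaring the leverage correction applied to each squared residual; a sketch under the same illustrative conventions as the HC2 snippet above:

import numpy as np

def hc3_vcov(X, y):
    """Approximate-jackknife (HC3) covariance: as HC2, but the residual
    inflation factor 1/(1 - p_ii) is squared."""
    XtX_inv = np.linalg.inv(X.T @ X)
    e = y - X @ (XtX_inv @ X.T @ y)               # OLS residuals
    p = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages p_ii
    meat = (X * (e**2 / (1.0 - p)**2)[:, None]).T @ X
    return XtX_inv @ meat @ XtX_inv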

B.3 Scheme 2

Table 7: Confidence Interval Statistics for Scheme 2 using HC2
ATE Estimators
HC2 BC-HC2
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP2.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 91.40 92.30 92.80 91.40 92.30 92.80
Interacted (Debiased) 44.70 47.00 54.80 85.40 87.60 93.20
CI Width, Average
Non-interacted (Debiased) 1.93 2.04 2.16 2.04 2.15 2.28
Interacted (Debiased) 0.60 0.63 0.89 2.48 2.61 4.50
CI Width, Median
Non-interacted (Debiased) 1.95 2.06 2.17 1.95 2.06 2.17
Interacted (Debiased) 0.51 0.54 0.67 0.51 0.54 0.67
DGP2.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 96.30 96.70 96.90 96.10 96.50 96.80
Interacted (Debiased) 58.90 61.40 72.40 95.30 96.70 99.60
CI Width, Average
Non-interacted (Debiased) 1.91 2.02 2.14 1.96 2.07 2.19
Interacted (Debiased) 0.40 0.42 0.59 1.33 1.40 2.36
CI Width, Median
Non-interacted (Debiased) 1.94 2.05 2.16 1.94 2.05 2.16
Interacted (Debiased) 0.38 0.40 0.48 0.38 0.40 0.48
DGP2.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 62.50 65.40 68.30 76.50 80.90 85.00
Interacted (Debiased) 54.20 57.00 67.80 87.40 89.60 94.40
CI Width, Average
Non-interacted (Debiased) 0.37 0.39 0.41 0.45 0.48 0.51
Interacted (Debiased) 0.40 0.42 0.59 1.33 1.41 2.38
CI Width, Median
Non-interacted (Debiased) 0.35 0.37 0.40 0.35 0.37 0.40
Interacted (Debiased) 0.38 0.40 0.48 0.38 0.40 0.48
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].
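For intuition about the Satterthwaite columns: in a simple two-sample mean comparison the adjustment is the classical Welch-Satterthwaite approximation [Sat46] sketched below; clubSandwich implements the Bell and McCaffrey [BM02] generalization of this idea for HC2-type estimators, so the sketch conveys the idea rather than reimplementing the package:

import numpy as np

def welch_satterthwaite_df(s1, n1, s0, n0):
    """Classical Satterthwaite degrees of freedom for a two-sample mean
    comparison with group SDs s1, s0 and group sizes n1, n0."""
    a, b = s1**2 / n1, s0**2 / n0
    return (a + b)**2 / (a**2 / (n1 - 1) + b**2 / (n0 - 1))

# E.g., 8 treated and 16 control units, as in the N = 24 designs above.
df = welch_satterthwaite_df(1.0, 8, 0.5, 16)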

Table 8: Confidence Interval Statistics for Scheme 2 using HC3
ATE Estimators
HC3 BC-HC3
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP2.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 93.20 94.10 94.50 93.20 94.00 94.50
Interacted (Debiased) 74.20 76.00 87.60 96.40 97.00 98.80
CI Width, Average
Non-interacted (Debiased) 2.37 2.50 2.67 2.44 2.57 2.76
Interacted (Debiased) 1.60 1.69 5.29 12.18 12.85 57.49
CI Width, Median
Non-interacted (Debiased) 2.37 2.50 2.66 2.44 2.57 2.73
Interacted (Debiased) 1.07 1.13 1.85 2.11 2.23 3.94
DGP2.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 97.00 97.30 97.40 96.80 97.10 97.30
Interacted (Debiased) 85.30 86.70 96.10 99.90 99.90 100.00
CI Width, Average
Non-interacted (Debiased) 2.34 2.47 2.64 2.35 2.48 2.66
Interacted (Debiased) 0.93 0.98 2.88 6.26 6.60 29.16
CI Width, Median
Non-interacted (Debiased) 2.33 2.46 2.62 2.37 2.50 2.65
Interacted (Debiased) 0.67 0.71 1.16 1.18 1.25 2.21
DGP2.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 71.60 74.60 78.10 86.90 90.40 93.30
Interacted (Debiased) 83.30 84.70 94.50 98.20 98.80 99.60
CI Width, Average
Non-interacted (Debiased) 0.45 0.47 0.50 0.54 0.57 0.61
Interacted (Debiased) 0.93 0.98 2.88 6.28 6.63 29.31
CI Width, Median
Non-interacted (Debiased) 0.42 0.45 0.48 0.50 0.52 0.55
Interacted (Debiased) 0.67 0.71 1.16 1.18 1.25 2.24
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. HC3 refers to the approximate jackknife robust standard error estimator [MW85]. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

B.4 Scheme 3

Table 9: Simulation Results for Scheme 3
ATE Estimators
Unadjusted OLS Debiased
Non-Int. Interacted Non-Int. Interacted
DGP3.1, N =24
Bias 0.000 -0.144 -0.004 0.000 0.000
SD 0.577 0.362 0.281 0.433 0.387
MSE 0.577 0.390 0.281 0.433 0.387
CI Coverage (HC2, Student-t) 0.943 0.949 0.843 0.959 0.712
CI Coverage (HC2, Satterthwaite) 0.948 0.959 0.917 0.965 0.810
CI Coverage (BC-HC2, Student-t) 0.960 0.876
CI Coverage (BC-HC2, Satterthwaite) 0.966 0.936
DGP3.2, N =24
Bias 0.000 0.144 0.003 0.000 0.000
SD 0.144 0.342 0.129 0.316 0.198
MSE 0.144 0.371 0.129 0.316 0.198
CI Coverage (HC2, Student-t) 1.000 0.963 0.927 0.983 0.804
CI Coverage (HC2, Satterthwaite) 1.000 0.967 0.980 0.985 0.906
CI Coverage (BC-HC2, Student-t) 0.981 0.959
CI Coverage (BC-HC2, Satterthwaite) 0.983 0.992
DGP3.3, N =24
Bias 0.000 0.000 -0.001 0.000 0.000
SD 0.433 0.109 0.164 0.168 0.210
MSE 0.433 0.109 0.164 0.168 0.210
CI Coverage (HC2, Student-t) 0.941 0.942 0.857 0.817 0.750
CI Coverage (HC2, Satterthwaite) 0.948 0.954 0.936 0.841 0.859
CI Coverage (BC-HC2, Student-t) 0.870 0.874
CI Coverage (BC-HC2, Satterthwaite) 0.896 0.939
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals; coverage is reported as a proportion (multiply by 100 for percentage points). CI Coverage (HC2, Student-t) is calculated using the original OLS residuals. CI Coverage (BC-HC2, Student-t) and CI Coverage (BC-HC2, Satterthwaite) are calculated using the recomputed OLS residuals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].
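One reading of "recomputed OLS residuals" in the note is that the residuals are re-formed after the bias-corrected treatment coefficient is substituted into the fitted model, and those residuals are then passed to the HC2/HC3 formulas. The sketch below encodes that reading only; the treatment-column index and the input tau_bc are illustrative assumptions, not the authors' code:

import numpy as np

def recomputed_residuals(X, y, tau_bc, treat_col=1):
    """Residuals after swapping the OLS treatment coefficient for a
    bias-corrected value tau_bc (illustrative reading of the note; the
    paper's correction itself is defined in the main text)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS fit
    beta_bc = beta.copy()
    beta_bc[treat_col] = tau_bc                   # debiased ATE coefficient
    return y - X @ beta_bc                        # feed to BC-HC2 / BC-HC3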

Table 10: Confidence Interval Statistics for Scheme 3 using HC2
ATE Estimators
HC2 BC-HC2
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP3.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 95.000 95.900 96.500 95.200 96.000 96.600
Interacted (Debiased) 69.100 71.200 81.000 85.700 87.600 93.600
CI Width, Average
Non-interacted (Debiased) 2.130 2.248 2.385 2.181 2.302 2.445
Interacted (Debiased) 0.793 0.837 1.095 1.502 1.585 2.389
CI Width, Median
Non-interacted (Debiased) 2.160 2.280 2.406 2.205 2.327 2.449
Interacted (Debiased) 0.778 0.821 1.006 0.978 1.033 1.225
DGP3.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 97.900 98.300 98.500 97.600 98.100 98.300
Interacted (Debiased) 78.400 80.400 90.600 94.900 95.900 99.200
CI Width, Average
Non-interacted (Debiased) 2.110 2.227 2.362 2.125 2.242 2.381
Interacted (Debiased) 0.487 0.514 0.672 0.852 0.899 1.338
CI Width, Median
Non-interacted (Debiased) 2.129 2.247 2.368 2.150 2.269 2.389
Interacted (Debiased) 0.478 0.505 0.613 0.575 0.607 0.721
DGP3.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 79.200 81.700 84.100 84.400 87.000 89.600
Interacted (Debiased) 72.600 75.000 85.900 85.400 87.400 93.900
CI Width, Average
Non-interacted (Debiased) 0.422 0.446 0.473 0.457 0.483 0.513
Interacted (Debiased) 0.487 0.514 0.672 0.816 0.862 1.277
CI Width, Median
Non-interacted (Debiased) 0.417 0.440 0.465 0.435 0.459 0.483
Interacted (Debiased) 0.478 0.505 0.613 0.554 0.584 0.695
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is $\binom{24}{8}$.
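Because the design assigns 8 of 24 units to treatment, the $\binom{24}{8}$ simulation count presumably corresponds to enumerating the randomization distribution exactly. A sketch of such an enumeration for the difference-in-means estimator (the potential outcomes below are illustrative):

import numpy as np
from itertools import combinations

# Illustrative potential outcomes for n = 24 units with a constant effect.
rng = np.random.default_rng(0)
y0 = rng.uniform(size=24)
y1 = y0 + 0.5

estimates = []
for treated in combinations(range(24), 8):        # all C(24, 8) = 735,471 assignments
    t = np.zeros(24, dtype=bool)
    t[list(treated)] = True
    estimates.append(y1[t].mean() - y0[~t].mean())

estimates = np.array(estimates)
bias = estimates.mean() - 0.5                     # exact bias over the design
sd = estimates.std(ddof=0)                        # exact SD over the design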

Table 11: Confidence Interval Statistics for Scheme 3 using HC3
ATE Estimators
HC3 BC-HC3
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP3.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 96.700 97.200 97.500 96.800 97.300 97.600
Interacted (Debiased) 87.900 89.000 95.100 95.800 96.600 99.100
CI Width, Average
Non-interacted (Debiased) 2.514 2.653 2.842 2.557 2.699 2.895
Interacted (Debiased) 1.793 1.893 4.927 4.118 4.347 14.929
CI Width, Median
Non-interacted (Debiased) 2.512 2.651 2.816 2.551 2.692 2.854
Interacted (Debiased) 1.385 1.462 2.136 1.606 1.695 2.568
DGP3.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 98.500 98.700 98.900 98.300 98.500 98.700
Interacted (Debiased) 93.000 93.800 98.600 99.000 99.300 100.000
CI Width, Average
Non-interacted (Debiased) 2.490 2.628 2.815 2.491 2.629 2.818
Interacted (Debiased) 0.989 1.044 2.624 2.280 2.406 8.289
CI Width, Median
Non-interacted (Debiased) 2.487 2.625 2.782 2.498 2.636 2.793
Interacted (Debiased) 0.787 0.831 1.219 0.906 0.956 1.447
DGP3.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 85.000 87.000 89.300 90.100 92.100 94.200
Interacted (Debiased) 90.000 91.300 97.200 95.600 96.500 99.100
CI Width, Average
Non-interacted (Debiased) 0.489 0.516 0.552 0.526 0.555 0.596
Interacted (Debiased) 0.989 1.044 2.624 2.133 2.251 7.610
CI Width, Median
Non-interacted (Debiased) 0.479 0.506 0.538 0.499 0.526 0.558
Interacted (Debiased) 0.787 0.831 1.219 0.883 0.932 1.417
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. HC3 refers to the approximate jackknife robust standard error estimator [MW85]. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is $\binom{24}{8}$.

B.5 Scheme 4

Table 12: Simulation Results for Scheme 4
ATE Estimators
Unadjusted OLS Debiased
Non-Int. Interacted Non-Int. Interacted
DGP4.1, N =24
Bias 0.000 -0.060 -0.042 0.000 0.000
SD 0.577 0.438 0.692 0.524 0.570
MSE 0.577 0.442 0.693 0.524 0.570
CI Coverage (HC2, Student-t) 0.862 0.849 0.801 0.832 0.803
CI Coverage (HC2, Satterthwaite) 0.872 0.858 0.856 0.843 0.860
CI Coverage (BC-HC2, Student-t) 0.836 0.841
CI Coverage (BC-HC2, Satterthwaite) 0.847 0.890
DGP4.2, N =24
Bias -0.000 0.061 0.028 0.000 0.000
SD 0.144 0.325 0.317 0.303 0.256
MSE 0.144 0.331 0.318 0.303 0.256
CI Coverage (HC2, Student-t) 1.000 0.933 0.907 0.954 0.930
CI Coverage (HC2, Satterthwaite) 1.000 0.940 0.974 0.960 0.991
CI Coverage (BC-HC2, Student-t) 0.954 0.952
CI Coverage (BC-HC2, Satterthwaite) 0.961 0.994
DGP4.3, N =24
Bias -0.000 0.001 -0.014 0.000 0.000
SD 0.433 0.272 0.405 0.329 0.355
MSE 0.433 0.272 0.405 0.329 0.355
CI Coverage (HC2, Student-t) 0.952 0.948 0.828 0.895 0.851
CI Coverage (HC2, Satterthwaite) 0.961 0.960 0.921 0.916 0.926
CI Coverage (BC-HC2, Student-t) 0.907 0.868
CI Coverage (BC-HC2, Satterthwaite) 0.927 0.929
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals; coverage is reported as a proportion (multiply by 100 for percentage points). CI Coverage (HC2, Student-t) is calculated using the original OLS residuals. CI Coverage (BC-HC2, Student-t) and CI Coverage (BC-HC2, Satterthwaite) are calculated using the recomputed OLS residuals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

Table 13: Confidence Interval Statistics for Scheme 4 using HC2
ATE Estimators
HC2 BC-HC2
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP4.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 82.100 83.200 84.300 82.400 83.600 84.700
Interacted (Debiased) 79.000 80.300 86.000 82.800 84.100 89.000
CI Width, Average
Non-interacted (Debiased) 2.101 2.217 2.355 2.135 2.254 2.395
Interacted (Debiased) 1.898 2.004 2.625 2.412 2.546 3.616
CI Width, Median
Non-interacted (Debiased) 2.166 2.286 2.419 2.196 2.318 2.449
Interacted (Debiased) 1.822 1.923 2.293 1.952 2.060 2.458
DGP4.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 94.600 95.400 96.000 94.600 95.400 96.100
Interacted (Debiased) 92.000 93.000 99.100 94.200 95.200 99.400
CI Width, Average
Non-interacted (Debiased) 1.983 2.093 2.220 1.990 2.101 2.229
Interacted (Debiased) 1.194 1.260 1.655 1.424 1.503 2.103
CI Width, Median
Non-interacted (Debiased) 2.041 2.154 2.273 2.051 2.164 2.283
Interacted (Debiased) 1.155 1.219 1.460 1.193 1.260 1.497
DGP4.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 87.500 89.500 91.600 88.700 90.700 92.700
Interacted (Debiased) 83.100 85.100 92.600 85.000 86.800 92.900
CI Width, Average
Non-interacted (Debiased) 1.051 1.109 1.176 1.068 1.127 1.197
Interacted (Debiased) 1.194 1.260 1.655 1.443 1.523 2.132
CI Width, Median
Non-interacted (Debiased) 1.028 1.085 1.148 1.050 1.108 1.170
Interacted (Debiased) 1.155 1.219 1.460 1.204 1.271 1.505
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

Table 14: Confidence Interval Statistics for Scheme 4 using HC3
ATE Estimators
HC3 BC-HC3
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP4.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 85.400 86.500 87.600 85.600 86.700 87.900
Interacted (Debiased) 89.800 90.500 95.100 92.200 92.800 96.600
CI Width, Average
Non-interacted (Debiased) 2.533 2.673 2.866 2.565 2.707 2.905
Interacted (Debiased) 4.225 4.459 11.551 5.853 6.178 18.780
CI Width, Median
Non-interacted (Debiased) 2.561 2.703 2.880 2.595 2.739 2.913
Interacted (Debiased) 3.143 3.317 4.784 3.276 3.458 5.019
DGP4.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 96.300 97.000 97.600 96.500 97.100 97.800
Interacted (Debiased) 96.900 97.400 100.000 98.400 98.800 100.000
CI Width, Average
Non-interacted (Debiased) 2.382 2.514 2.693 2.388 2.521 2.701
Interacted (Debiased) 2.363 2.494 6.239 3.118 3.291 9.625
CI Width, Median
Non-interacted (Debiased) 2.425 2.559 2.715 2.432 2.567 2.725
Interacted (Debiased) 1.831 1.932 2.824 1.872 1.976 2.895
DGP4.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 92.100 93.600 95.400 93.100 94.500 96.100
Interacted (Debiased) 93.500 94.500 98.300 94.200 95.100 98.500
CI Width, Average
Non-interacted (Debiased) 1.213 1.281 1.370 1.228 1.296 1.388
Interacted (Debiased) 2.363 2.494 6.239 3.258 3.438 10.277
CI Width, Median
Non-interacted (Debiased) 1.174 1.239 1.320 1.198 1.264 1.343
Interacted (Debiased) 1.831 1.932 2.824 1.891 1.996 2.930
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. HC3 refers to the approximate jackknife robust standard error estimator [MW85]. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

References

• [AI17] Susan Athey and Guido W. Imbens. The econometrics of randomized experiments. In Handbook of Economic Field Experiments, volume 1, pages 73–140. Elsevier, 2017.
• [ALO09] Joshua Angrist, Daniel Lang, and Philip Oreopoulos. Incentives and services for college achievement: Evidence from a randomized trial. American Economic Journal: Applied Economics, 1(1):136–163, 2009.
• [AM13] Peter M. Aronow and Joel A. Middleton. A class of unbiased estimators of the average treatment effect in randomized experiments. Journal of Causal Inference, 1(1):135–154, 2013.
• [AP08] Joshua D. Angrist and Jörn-Steffen Pischke. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press, 2008.
• [Bli73] Alan S. Blinder. Wage discrimination: Reduced form and structural estimates. Journal of Human Resources, 8(4):436–455, 1973.
• [BM02] Robert M. Bell and Daniel F. McCaffrey. Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28(2):169–182, 2002.
• [DC18] Angus Deaton and Nancy Cartwright. Understanding and misunderstanding randomized controlled trials. Social Science & Medicine, 210:2–21, 2018.
• [DGK07] Esther Duflo, Rachel Glennerster, and Michael Kremer. Using randomization in development economics research: A toolkit. Handbook of Development Economics, 4:3895–3962, 2007.
• [Dun12] Thad Dunning. Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge University Press, 2012.
• [Eic67] Friedhelm Eicker. Limit theorems for regressions with unequal and dependent errors. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pages 59–82. University of California Press, 1967.
• [Fre08a] David A. Freedman. On regression adjustments to experimental data. Advances in Applied Mathematics, 40(2):180–193, 2008.
• [Fre08b] David A. Freedman. On regression adjustments in experiments with several treatments. The Annals of Applied Statistics, 2(1):176–196, 2008.
• [Gle17] Rachel Glennerster. The practicalities of running randomized evaluations: Partnerships, measurement, ethics, and transparency. In Handbook of Economic Field Experiments, volume 1, pages 175–243. Elsevier, 2017.
• [Hub67] Peter J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions. In L. M. Le Cam and J. Neyman, editors, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 1967. University of California Press.
• [Imb10] Guido W. Imbens. Better LATE than nothing: Some comments on Deaton (2009) and Heckman and Urzua (2009). Journal of Economic Literature, 48(2):399–423, 2010.
• [IR15] Guido W. Imbens and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, 2015.
• [Lin13a] Winston Lin. Agnostic notes on regression adjustments to experimental data: Reexamining Freedman's critique. The Annals of Applied Statistics, 7(1):295–318, 2013.
• [Lin13b] Winston Lin. Essays on Causal Inference in Randomized Experiments. PhD thesis, University of California, Berkeley, 2013.
• [LR11] John A. List and Imran Rasul. Field experiments in labor economics. In Handbook of Labor Economics, volume 4, pages 103–228. Elsevier, 2011.
• [MSY13] Luke W. Miratrix, Jasjeet S. Sekhon, and Bin Yu. Adjusting treatment effect estimates by post-stratification in randomized experiments. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(2):369–396, 2013.
• [MW85] James G. MacKinnon and Halbert White. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29(3):305–325, 1985.
• [Oax73] Ronald Oaxaca. Male-female wage differentials in urban labor markets. International Economic Review, 14(3):693–709, 1973.
• [Pus21] James Pustejovsky. clubSandwich: Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections, 2021. R package version 0.5.3.
• [Rec21] Ben Recht. Effect size is significantly more important than statistical significance. Blog post, 2021.
• [Rub90] Donald B. Rubin. Formal modes of statistical inference for causal effects. Journal of Statistical Planning and Inference, 25(3):279–292, 1990.
• [SA12] Cyrus Samii and Peter M. Aronow. On equivalencies between design-based and regression-based variance estimators for randomized experiments. Statistics & Probability Letters, 82(2):365–370, 2012.
• [Sat46] Franklin E. Satterthwaite. An approximate distribution of estimates of variance components. Biometrics Bulletin, 2(6):110–114, 1946.
• [SNDS90] Jerzy Splawa-Neyman, Dorota M. Dabrowska, and T. P. Speed. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4):465–472, 1923 [1990].
• [TE93] Robert J. Tibshirani and Bradley Efron. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, 57:1–436, 1993.
• [WGB18] Edward Wu and Johann A. Gagnon-Bartsch. The LOOP estimator: Adjusting for covariates in randomized experiments. Evaluation Review, 42(4):458–488, 2018.
• [Whi80] Halbert White. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4):817–838, 1980.