
Exact Bias Correction for Linear Adjustment of Randomized Controlled Trials

Haoge Chang, Joel A. Middleton, and P. M. Aronow*

* We thank Donald Andrews, Winston Lin, Cyrus Samii and Jasjeet Sekhon for helpful comments and discussions.
Abstract

In an influential critique of empirical practice, Freedman [Fre08a, Fre08b] showed that the linear regression estimator was biased for the analysis of randomized controlled trials under the randomization model. Under Freedman’s assumptions, we derive exact closed-form bias corrections for the linear regression estimator with and without treatment-by-covariate interactions. We show that the limiting distribution of the bias-corrected estimator is identical to that of the uncorrected estimator, implying that the asymptotic gains from adjustment can be attained without introducing any risk of bias. Taken together with results from Lin [Lin13a], our results show that Freedman’s theoretical arguments against the use of regression adjustment can be completely resolved with minor modifications to practice.

1 Introduction

Randomized Controlled Trials (RCTs) are popular in empirical economics [AP08, DGK07, Gle17, LR11]. When estimating average treatment effects, adjustment for pretreatment covariates with linear regression is a commonly recommended practice because it can reduce the variability of estimates. However, adjusting for covariates remains somewhat controversial, in large part because of an influential critique from David Freedman [Fre08a, Fre08b].

Freedman argued that randomization does not justify the use of linear regression for completely randomized experiments. Freedman’s theoretical arguments relied on three results proven under the randomization-based [SNDS90, IR15] inferential paradigm:

  1. asymptotically, the linear regression estimator can be inefficient relative to the unadjusted (difference-in-means) estimator if the design is imbalanced;

  2. the classical standard error for linear regression is inconsistent;

  3. the regression estimator has an $O_p(n^{-1})$ bias term.

Freedman’s arguments were influential among scholars across multiple disciplines (e.g., [Dun12], [Rec21]). Freedman’s third argument garnered particular attention among social scientists. Notably, [DC18]’s critique of randomization in empirical economics argued that the bias introduced by regression undermines the gold-standard argument for RCTs.

Scholars have worked to address Freedman’s critiques and to understand the extent to which they matter for empirical practice in economics. Using Freedman’s own framework, [Lin13a] showed that arguments 1 and 2 can be resolved by small modifications to practice. Freedman’s efficiency result is addressed through a simple modification to the regression specification, namely including treatment-by-covariate interactions [Bli73, Oax73]; it can then be shown that the adjusted estimator is never less asymptotically efficient than the unadjusted estimator. Regarding argument 2, Lin proved that robust standard errors ([Whi80, Hub67, Eic67]; see also [SA12]) are asymptotically conservative in Freedman’s setting, guaranteeing the validity of large-sample inference. On argument 3, [Lin13a] (see also [Lin13b]) notes that the leading term of the bias is in fact estimable and can be shown to be small in a real-world empirical example. However, the small-sample bias of the regression estimator was not yet fully resolved.

Since [Lin13a], several notable papers have proposed unbiased regression-type estimators for experimental data. [MSY13] demonstrate that if the regression model is fully saturated (see also [AI17] and [Imb10]), then the associated effect estimate is unbiased conditional on the event that treatment is not collinear with any covariate stratum; this approach cannot generally be used without coarsening continuous covariates. [AM13] propose the use of auxiliary data, demonstrating that the suitable use of hold-out samples ensures the finite-sample unbiasedness of the associated regression estimator, but they do not consider efficiency properties. More recently, [WGB18] extended [AM13] to propose an innovative but computationally expensive split-sample approach for completely randomized experiments.

The primary contribution of this paper is to resolve Freedman’s third theoretical argument by proposing finite-sample-exact, closed-form bias corrections without adding any new assumptions. Our idea builds on [Lin13a]’s proposal to estimate the leading term of the bias, but further develops a novel finite-sample exact bias correction encompassing all higher-order terms [Fre08a]. We derive these bias corrections for both the noninteracted and interacted linear regression estimators. We prove that the estimators have the same limiting distributions as the non-bias-adjusted estimators, implying that they could replace existing estimators in instances where bias is a prevailing concern (e.g., trials that may be aggregated in meta-analysis). We further provide a numerical illustration demonstrating these properties.
Finally, we remind readers that the practice of debiasing estimators is not uncontroversial. [TE93] warned that bias correction can be dangerous in practice due to its high variability. (We thank Winston Lin for suggesting this reference.) Indeed, as the simulations will show, when performance is measured by the root mean squared error (RMSE) there is no clear dominance among the estimators: in some cases the RMSEs of the debiased estimators are strictly smaller than those of the other estimators, and in other cases larger. In real-world decision making, people may weigh different statistical properties differently (e.g., unbiasedness versus low mean squared error); see [WGB18] for an anecdotal example of a policy-maker favoring unbiasedness. Our results imply that in large samples the additional variation caused by the bias correction is negligible, but in small samples we find it important, in some cases, to account for the sampling variability of the additional terms. To address this problem, we propose a simple modification to the standard error estimation procedure, based on recomputing the OLS residuals using the debiased estimators; it is shown to work well on our simulated datasets. We make recommendations for practice in the Simulation section.
The organization of the paper is as follows: Section 2 presents the model setup and assumptions; Section 3 characterizes the bias terms of the OLS estimators; Section 4 proposes the bias corrections; Section 5 presents simulation results on both simulated datasets and a real-world dataset. The appendix contains proofs of the theorems in Sections 3 and 4 and additional simulation results.

2 Setting, Assumptions and Notation

We follow the setting of [Fre08a] and [Lin13a], which assume a Neyman [SNDS90] model with covariates. There are $n$ subjects indexed by $i=1,\dots,n$. For each subject we observe an outcome $Y_i$ and a column vector of covariates $\mathbf{z}_i=(z_{i1},z_{i2},\dots,z_{iK})'\in\mathbb{R}^K$. The dimension of the covariates, $K$, does not change with the sample size.
Each subject has two potential outcomes $a_i$ and $b_i$ (cf. the stable-unit-treatment-value assumption [Rub90]). We observe $Y_i=a_i$ if $i$ is chosen for treatment arm $A$ (treated group) and $Y_i=b_i$ if $i$ is chosen for arm $B$ (control group). Let $T_i$ be the dummy variable for treatment arm $A$. Thus the observed outcome for $i$ is $Y_i=a_iT_i+b_i(1-T_i)$.
The experiment is assumed to be completely randomized: $n_A$ out of $n$ subjects are randomly assigned to arm $A$ and the remaining $n_B=n-n_A$ subjects to arm $B$. Random assignment is the only source of randomness in the model. We do not assume a superpopulation: the $n$ subjects are the population of interest.
We introduce some notation. Let $n$ be the population size, and let $n_A$ and $n_B$ be the numbers of subjects in treatment arms $A$ and $B$, respectively. Let $[A]=\{i\mid T_i=1\}$ denote the set of individuals chosen for arm $A$ and similarly $[B]=\{i\mid T_i=0\}$. For a possibly vector-valued variable $x$, let $\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i$, $\bar{x}_A=\frac{1}{n_A}\sum_{i\in[A]}x_i$ and $\bar{x}_B=\frac{1}{n_B}\sum_{i\in[B]}x_i$ denote the population average, group-$A$ average, and group-$B$ average, respectively. In this notation the average treatment effect (ATE) is

$$\bar{a}-\bar{b}$$

and the difference-in-means estimator is

$$\bar{a}_A-\bar{b}_B.$$

Similarly, we write $\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i\mathbf{z}_i'=\overline{\mathbf{z}\mathbf{z}'}$ for $\mathbf{z}_i\in\mathbb{R}^K$ and $\frac{1}{n}\sum_{i=1}^n a_i\mathbf{z}_i=\overline{a\mathbf{z}}$ for $a_i\in\mathbb{R}$ and $\mathbf{z}_i\in\mathbb{R}^K$.
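To fix ideas, the following minimal Python sketch illustrates the setting and the notation (the toy potential outcomes are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# A small completely randomized experiment: n units, n_A assigned to arm A.
n, n_A = 24, 8
a = rng.normal(1.0, 1.0, size=n)    # potential outcomes under treatment (arm A)
b = rng.normal(0.0, 1.0, size=n)    # potential outcomes under control (arm B)

# Complete randomization: a uniformly random subset of size n_A is treated.
T = np.zeros(n, dtype=bool)
T[rng.choice(n, size=n_A, replace=False)] = True

Y = np.where(T, a, b)               # observed outcome Y_i = a_i T_i + b_i (1 - T_i)
ate = a.mean() - b.mean()           # population ATE: abar - bbar
dim = Y[T].mean() - Y[~T].mean()    # difference-in-means: abar_A - bbar_B
print(ate, dim)
```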
We make the following assumptions throughout the paper, which are standard in the literature (cf. [Fre08b, Fre08a, Lin13a]).

Assumption 1 (Bounded fourth moments).

For all $n=1,2,\dots$ and $x_i\in\{a_i,b_i,z_{i1},\dots,z_{iK}\}$,

$$\frac{1}{n}\sum_{i=1}^n x_i^4<L<\infty$$

where $L$ is a finite constant.

Assumption 2 (Convergence of first and second moments).

For $x_i=[a_i,b_i,\mathbf{z}_i']'\in\mathbb{R}^{2+K}$,

$$\frac{1}{n}\sum_{i=1}^n x_ix_i'\to\mathbf{M}$$

where $\mathbf{M}$ is a positive definite matrix with finite entries. Moreover, $\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i\mathbf{z}_i'$ converges to an invertible matrix.

Assumption 3 (Group Sizes).

Let $p_{A,n}=\frac{n_A}{n}$ and $p_{B,n}=\frac{n-n_A}{n}$ denote the inclusion probabilities for treatment arm $A$ and arm $B$, respectively. We assume $p_{A,n}>0$ and $p_{B,n}>0$ for all $n$, and

$$p_{A,n}\to p_A>0,\text{ as }n\to\infty,$$
$$p_{B,n}\to 1-p_A>0,\text{ as }n\to\infty.$$
Assumption 4 (Centering).

$$\bar{\mathbf{z}}=0$$

All of these assumptions are employed regularly in the literature; they are used to derive consistency and asymptotic normality of the estimators. Assumption 3 requires that each arm receive a nontrivial fraction of subjects along the asymptotic sequence of models. Assumption 4 is without loss of generality: in practice, researchers can simply demean each covariate and apply our method.
We now recall the definitions of our two OLS regression-adjusted ATE estimators. The first comes from a noninteracted OLS regression in which one regresses the observed outcome $Y$ on the treatment indicator $T$ and the demeaned pretreatment covariates $\mathbf{z}$; the coefficient estimate on $T$ is the noninteracted OLS regression-adjusted ATE estimator. The second comes from a fully interacted OLS regression in which one regresses $Y$ on $T$, the demeaned pretreatment covariates, and the interactions of the treatment indicator with the demeaned covariates; the coefficient estimate on $T$ is the interacted OLS regression-adjusted ATE estimator.
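Concretely, both estimators are coefficients from least-squares fits; a minimal sketch on toy data of our own:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_A = 24, 8
z = rng.normal(size=(n, 2))
z = z - z.mean(axis=0)              # demean the covariates (Assumption 4)
T = np.zeros(n)
T[rng.choice(n, size=n_A, replace=False)] = 1.0
Y = 1.0 + 0.5 * T + z @ np.array([1.0, -1.0]) + rng.normal(scale=0.1, size=n)

def ols(X, y):
    # least-squares coefficient vector
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Noninteracted: regress Y on (1, T, z); the coefficient on T (index 1)
# is the noninteracted regression-adjusted ATE estimator.
ate_ni = ols(np.column_stack([np.ones(n), T, z]), Y)[1]

# Interacted: add T * z interactions; with z demeaned, the coefficient on T
# is the interacted regression-adjusted ATE estimator.
ate_i = ols(np.column_stack([np.ones(n), T, z, T[:, None] * z]), Y)[1]
print(ate_ni, ate_i)
```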
Finally, we prepare some notation for the sections below. Let $a^*_i$ and $b^*_i$ be the centered potential outcomes, namely $a^*_i=a_i-\bar{a}$ and $b^*_i=b_i-\bar{b}$. Let $\widehat{D}=\overline{\mathbf{z}\mathbf{z}'}-p_A\bar{\mathbf{z}}_A\bar{\mathbf{z}}_A'-p_B\bar{\mathbf{z}}_B\bar{\mathbf{z}}_B'$, $\widehat{N}=p_A(\overline{a\mathbf{z}}_A-\bar{a}_A\bar{\mathbf{z}}_A)+p_B(\overline{b\mathbf{z}}_B-\bar{b}_B\bar{\mathbf{z}}_B)$, $D=\overline{\mathbf{z}\mathbf{z}'}$ and $N=p_A\overline{a\mathbf{z}}+p_B\overline{b\mathbf{z}}$. With this notation, the coefficient estimator on the pretreatment covariates in the noninteracted case can be written as $\widehat{Q}=\widehat{D}^{-1}\widehat{N}$, with population counterpart $Q=D^{-1}N$. Denote the (rescaled) leverage of the $i$th data point by $h_i=\mathbf{z}_i'D^{-1}\mathbf{z}_i$.
Further define $\widehat{D}_A=\overline{\mathbf{z}\mathbf{z}'}_A-\bar{\mathbf{z}}_A\bar{\mathbf{z}}_A'$, $\widehat{N}_A=\overline{a\mathbf{z}}_A-\bar{a}_A\bar{\mathbf{z}}_A$, $\widehat{D}_B=\overline{\mathbf{z}\mathbf{z}'}_B-\bar{\mathbf{z}}_B\bar{\mathbf{z}}_B'$, $\widehat{N}_B=\overline{b\mathbf{z}}_B-\bar{b}_B\bar{\mathbf{z}}_B$, $N_A=\overline{a\mathbf{z}}$ and $N_B=\overline{b\mathbf{z}}$. The coefficient estimators on the pretreatment covariates in the interacted case can then be written as $\widehat{Q}_A=\widehat{D}_A^{-1}\widehat{N}_A$ and $\widehat{Q}_B=\widehat{D}_B^{-1}\widehat{N}_B$, with population counterparts $Q_A=D^{-1}N_A$ and $Q_B=D^{-1}N_B$.
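These sample quantities can be assembled directly from their definitions. In the sketch below (the function name is ours), the Frisch–Waugh–Lovell theorem implies that `Q_hat` coincides with the covariate coefficients of the noninteracted fit above, and `Q_A`, `Q_B` with those of the separate within-arm fits:

```python
import numpy as np

# Helpers for the notation above; a sketch assuming demeaned covariates z
# and a 0/1 assignment vector T, with Y the observed outcomes.
def regression_pieces(Y, T, z):
    n = len(Y)
    A, B = T == 1, T == 0
    pA, pB = A.mean(), B.mean()
    zbarA, zbarB = z[A].mean(0), z[B].mean(0)
    D_hat = z.T @ z / n - pA * np.outer(zbarA, zbarA) - pB * np.outer(zbarB, zbarB)
    N_hat = (pA * ((z[A] * Y[A][:, None]).mean(0) - Y[A].mean() * zbarA)
             + pB * ((z[B] * Y[B][:, None]).mean(0) - Y[B].mean() * zbarB))
    Q_hat = np.linalg.solve(D_hat, N_hat)     # noninteracted covariate coefficients
    D_A = z[A].T @ z[A] / A.sum() - np.outer(zbarA, zbarA)
    N_A = (z[A] * Y[A][:, None]).mean(0) - Y[A].mean() * zbarA
    D_B = z[B].T @ z[B] / B.sum() - np.outer(zbarB, zbarB)
    N_B = (z[B] * Y[B][:, None]).mean(0) - Y[B].mean() * zbarB
    Q_A = np.linalg.solve(D_A, N_A)           # arm-A covariate coefficients
    Q_B = np.linalg.solve(D_B, N_B)           # arm-B covariate coefficients
    return Q_hat, Q_A, Q_B
```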

3 Bias Characterization

As shown in [Lin13a], the OLS regression-adjusted ATE estimator can be written as

$$\widehat{ATE}_{NI}=\bar{a}_A-\bar{b}_B-\{(\bar{\mathbf{z}}_A-\bar{\mathbf{z}})'\widehat{Q}-(\bar{\mathbf{z}}_B-\bar{\mathbf{z}})'\widehat{Q}\}$$

in the noninteracted case and

$$\widehat{ATE}_I=\bar{a}_A-\bar{b}_B-\{(\bar{\mathbf{z}}_A-\bar{\mathbf{z}})'\widehat{Q}_A-(\bar{\mathbf{z}}_B-\bar{\mathbf{z}})'\widehat{Q}_B\}$$

in the interacted case, where $\widehat{Q}$, $\widehat{Q}_A$ and $\widehat{Q}_B$ are the OLS coefficients on the covariates.

This section provides a characterization of the bias terms. Note that both the noninteracted and interacted estimators can be written as the sum of the difference-in-means estimator and a regression adjustment built from group means and OLS coefficients. The bias comes from the regression adjustment terms, in particular from estimating the regression coefficients on the covariates. We therefore characterize the bias terms of the coefficient estimators first, starting with the noninteracted case. From here on we assume for simplicity that all design matrices (i.e., $\widehat{D}$ and $D$) are invertible. With noninvertible design matrices, our debiasing procedure still works after choosing a particular generalized inverse and computing the ATE estimators according to the formulae above.

Theorem 3.1.

The OLS coefficient vector for covariates can be written as:

$$\widehat{Q}=Q+\nu_1+\nu_2+\nu_3$$

with

$$\nu_1=D^{-1}\left(p_A\left(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}}\right)+p_B\left(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}}\right)\right)=O_p(n^{-\frac{1}{2}}),$$
$$\nu_2=\left(\widehat{D}^{-1}-D^{-1}\right)\widehat{N}=O_p(n^{-1}),$$
$$\nu_3=-D^{-1}\left(p_A\bar{a^*}_A\bar{\mathbf{z}}_A+p_B\bar{b^*}_B\bar{\mathbf{z}}_B\right)=O_p(n^{-1}).$$

From the coefficient decomposition one can directly characterize the bias term of the ATE estimator. Note that the bias terms are of order $O_p(n^{-1})$.
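The decomposition in Theorem 3.1 is an exact algebraic identity and can be checked numerically; a sketch with toy data of our own:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_A = 24, 8
z = rng.normal(size=(n, 2)); z -= z.mean(0)
a = z @ np.array([1.0, 0.5]) + rng.normal(size=n)   # toy potential outcomes
b = z @ np.array([0.5, 1.0]) + rng.normal(size=n)
T = np.zeros(n, bool); T[rng.choice(n, size=n_A, replace=False)] = True
A, B = T, ~T
pA, pB = n_A / n, 1 - n_A / n
a_s, b_s = a - a.mean(), b - b.mean()               # centered potential outcomes

D = z.T @ z / n
N = pA * (z * a[:, None]).mean(0) + pB * (z * b[:, None]).mean(0)
Q = np.linalg.solve(D, N)

zbarA, zbarB = z[A].mean(0), z[B].mean(0)
D_hat = D - pA * np.outer(zbarA, zbarA) - pB * np.outer(zbarB, zbarB)
N_hat = (pA * ((z[A] * a[A][:, None]).mean(0) - a[A].mean() * zbarA)
         + pB * ((z[B] * b[B][:, None]).mean(0) - b[B].mean() * zbarB))
Q_hat = np.linalg.solve(D_hat, N_hat)

Dinv = np.linalg.inv(D)
v1 = Dinv @ (pA * ((z[A] * a_s[A][:, None]).mean(0) - (z * a_s[:, None]).mean(0))
             + pB * ((z[B] * b_s[B][:, None]).mean(0) - (z * b_s[:, None]).mean(0)))
v2 = (np.linalg.inv(D_hat) - Dinv) @ N_hat
v3 = -Dinv @ (pA * a_s[A].mean() * zbarA + pB * b_s[B].mean() * zbarB)
print(np.allclose(Q_hat, Q + v1 + v2 + v3))         # True: exact identity
```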

Corollary 3.1.

The bias of the $\widehat{ATE}_{NI}$ estimator is:

$$\mathbf{E}[\widehat{ATE}_{NI}-ATE]=\mathbf{E}\left[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'(\nu_1+\nu_2+\nu_3)\right].$$

Moreover, $(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'(\nu_1+\nu_2+\nu_3)=O_p(n^{-\frac{1}{2}})O_p(n^{-\frac{1}{2}})=O_p(n^{-1})$.
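Because the design is completely randomized, the finite-sample bias can be computed exactly by averaging the estimator over all $\binom{n}{n_A}$ assignments; a small toy illustration (the data-generating choices are ours):

```python
import numpy as np
from itertools import combinations

# Exact bias of ATE_hat_NI on a tiny toy design, by enumerating assignments.
rng = np.random.default_rng(3)
n, n_A = 8, 4
z = rng.normal(size=(n, 1)); z -= z.mean(0)
a = z[:, 0] ** 2 + rng.normal(size=n)   # nonlinear in z, so adjustment is biased
b = np.zeros(n)
ate = a.mean() - b.mean()

def ate_ni(T):
    A, B = T, ~T
    pA = n_A / n
    zbarA, zbarB = z[A].mean(0), z[B].mean(0)
    D_hat = z.T @ z / n - pA * np.outer(zbarA, zbarA) - (1 - pA) * np.outer(zbarB, zbarB)
    N_hat = (pA * ((z[A] * a[A][:, None]).mean(0) - a[A].mean() * zbarA)
             + (1 - pA) * ((z[B] * b[B][:, None]).mean(0) - b[B].mean() * zbarB))
    Q_hat = np.linalg.solve(D_hat, N_hat)
    return a[A].mean() - b[B].mean() - (zbarA - zbarB) @ Q_hat  # zbar = 0

est = []
for idx in combinations(range(n), n_A):
    T = np.zeros(n, bool); T[list(idx)] = True
    est.append(ate_ni(T))
print(np.mean(est) - ate)               # the exact finite-sample bias
```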

Following the same steps as we did for the noninteracted estimator, we are able to derive analogous results for the interacted estimator.

Theorem 3.2.

The OLS coefficient vectors for covariates can be written as

$$\widehat{Q}_A=Q_A+\nu_{1A}+\nu_{2A},$$
$$\widehat{Q}_B=Q_B+\nu_{1B}+\nu_{2B},$$

with

$$\nu_{1A}=(\widehat{D}_A^{-1}-D^{-1})\widehat{N}_A=O_p(n^{-\frac{1}{2}}),$$
$$\nu_{2A}=D^{-1}(\widehat{N}_A-N_A)=O_p(n^{-\frac{1}{2}}),$$
$$\nu_{1B}=(\widehat{D}_B^{-1}-D^{-1})\widehat{N}_B=O_p(n^{-\frac{1}{2}}),$$
$$\nu_{2B}=D^{-1}(\widehat{N}_B-N_B)=O_p(n^{-\frac{1}{2}}).$$
Corollary 3.2.

The bias of the $\widehat{ATE}_I$ estimator is:

$$E[\widehat{ATE}_I-ATE]=E[\bar{\mathbf{z}}_B'(\nu_{1B}+\nu_{2B})]-E[\bar{\mathbf{z}}_A'(\nu_{1A}+\nu_{2A})].$$

Moreover, $\bar{\mathbf{z}}_B'(\nu_{1B}+\nu_{2B})=O_p(n^{-\frac{1}{2}})O_p(n^{-\frac{1}{2}})=O_p(n^{-1})$ and $\bar{\mathbf{z}}_A'(\nu_{1A}+\nu_{2A})=O_p(n^{-\frac{1}{2}})O_p(n^{-\frac{1}{2}})=O_p(n^{-1})$.

Note that this result implies that the bias terms of the interacted ATE estimator are also of order $O_p(n^{-1})$.

4 Bias Corrections for Regression Components

Having established the decomposition, we now derive estimators of each bias term for use as bias corrections. We show that these bias estimates are (i) exactly unbiased and (ii) have estimation error of order $O_p(n^{-1})$. It follows that using this bias correction with an adjusted estimator yields a finite-sample unbiased estimator with the limiting distribution of the adjusted estimator. Recall from Section 2 that $h_i=\mathbf{z}_i'D^{-1}\mathbf{z}_i$ is the (rescaled) leverage of the $i$th data point.

We again begin with the noninteracted case.

Theorem 4.1.

An unbiased estimator for the bias in the noninteracted case is:

$$\widehat{Bias}_{NI}=\frac{1}{n}\frac{n_B}{n_B-1}\left(\overline{hb}_B-\bar{h}_B\bar{b}_B\right)-\frac{1}{n}\frac{n_A}{n_A-1}\left(\overline{ha}_A-\bar{h}_A\bar{a}_A\right)+(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\left(\widehat{D}^{-1}-D^{-1}\right)\widehat{N}$$
$$+\frac{C_{A,NI}}{n_A}\sum_{i\in[A]}(\mathbf{z}_i-\bar{\mathbf{z}}_A)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_A)(a_i-\bar{a}_A)-\frac{C_{B,NI}}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B).$$

Here $C_{A,NI}$ and $C_{B,NI}$ are two constants depending on $n$, $n_A$ and $n_B$; their exact formulas are given in the appendix. Moreover, $\widehat{Bias}_{NI}=O_p(n^{-1})$.
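For reference, the following sketch transcribes $\widehat{Bias}_{NI}$ directly, with the constants $C_{A,NI}$ and $C_{B,NI}$ taken from Appendix A.7 (it assumes demeaned covariates `z`, a 0/1 assignment vector `T`, and at least three units per arm; function names are ours):

```python
import numpy as np

# Constants from Appendix A.7 (a transcription, not a re-derivation).
def c_constants_ni(n, nA):
    nB = n - nA
    N_AAA = (n / nA**3) * (nA / n - 3 * nA * (nA - 1) / (n * (n - 1))
                           + 2 * nA * (nA - 1) * (nA - 2) / (n * (n - 1) * (n - 2)))
    N_AAB = (n / (nA**2 * nB)) * (-nA * nB / (n * (n - 1))
                                  + 2 * nA * (nA - 1) * nB / (n * (n - 1) * (n - 2)))
    N_adjA = n * (n - 1) * (n - 2) / ((nA - 1) * (nA - 2) * nA) * nA**3 / n**3
    N_adjB = n * (n - 1) * (n - 2) / ((nB - 1) * (nB - 2) * nB) * nB**3 / n**3
    return nA / nB * N_AAA * N_adjA, nA / nB * N_AAB * N_adjB

# Direct transcription of the Theorem 4.1 bias estimator.
def bias_ni(Y, T, z):
    n, nA = len(Y), int(T.sum()); nB = n - nA
    A, B = T == 1, T == 0
    pA, pB = nA / n, nB / n
    D = z.T @ z / n; Dinv = np.linalg.inv(D)
    h = np.einsum('ij,jk,ik->i', z, Dinv, z)              # leverages h_i
    zbarA, zbarB = z[A].mean(0), z[B].mean(0)
    D_hat = D - pA * np.outer(zbarA, zbarA) - pB * np.outer(zbarB, zbarB)
    N_hat = (pA * ((z[A] * Y[A][:, None]).mean(0) - Y[A].mean() * zbarA)
             + pB * ((z[B] * Y[B][:, None]).mean(0) - Y[B].mean() * zbarB))
    cA, cB = c_constants_ni(n, nA)
    qA = np.einsum('ij,jk,ik->i', z[A] - zbarA, Dinv, z[A] - zbarA)
    qB = np.einsum('ij,jk,ik->i', z[B] - zbarB, Dinv, z[B] - zbarB)
    return ((1 / n) * nB / (nB - 1) * ((h[B] * Y[B]).mean() - h[B].mean() * Y[B].mean())
            - (1 / n) * nA / (nA - 1) * ((h[A] * Y[A]).mean() - h[A].mean() * Y[A].mean())
            + (zbarB - zbarA) @ (np.linalg.inv(D_hat) - Dinv) @ N_hat
            + cA / nA * (qA * (Y[A] - Y[A].mean())).sum()
            - cB / nB * (qB * (Y[B] - Y[B].mean())).sum())
```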

Corollary 4.1.

The following estimator is unbiased for estimating the ATE:

$$\widehat{ATE}_{NI,Debias}=\widehat{ATE}_{NI}-\widehat{Bias}_{NI}$$

The results are derived analogously in the interacted case.

Theorem 4.2.

An unbiased estimator for the bias in the interacted case is:

$$\widehat{Bias}_I=\frac{1}{n}\frac{n_A}{n_B-1}\left(\overline{hb}_B-\bar{h}_B\bar{b}_B\right)+\bar{\mathbf{z}}_B'(\widehat{D}_B^{-1}-D^{-1})\widehat{N}_B-\frac{C_{B,I}}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B)$$
$$-\frac{1}{n}\frac{n_B}{n_A-1}\left(\overline{ha}_A-\bar{h}_A\bar{a}_A\right)-\bar{\mathbf{z}}_A'(\widehat{D}_A^{-1}-D^{-1})\widehat{N}_A+\frac{C_{A,I}}{n_A}\sum_{i\in[A]}(\mathbf{z}_i-\bar{\mathbf{z}}_A)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_A)(a_i-\bar{a}_A).$$

Here $C_{A,I}$ and $C_{B,I}$ are two constants depending on $n$, $n_A$ and $n_B$; their exact formulas are given in the appendix. Moreover, $\widehat{Bias}_I=O_p(n^{-1})$.

Corollary 4.2.

The following estimator is unbiased for estimating the ATE:

$$\widehat{ATE}_{I,Debias}=\widehat{ATE}_I-\widehat{Bias}_I$$
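A matching transcription of $\widehat{Bias}_I$, with $C_{A,I}$ and $C_{B,I}$ as displayed in Appendix A.8 (again a sketch, under the same conventions as above):

```python
import numpy as np

# Shared form of C_{A,I} (m = n_A) and C_{B,I} (m = n_B), per Appendix A.8.
def c_constant_i(n, m):
    N_mmm = (n / m**3) * (m / n - 3 * m * (m - 1) / (n * (n - 1))
                          + 2 * m * (m - 1) * (m - 2) / (n * (n - 1) * (n - 2)))
    N_adj = n * (n - 1) * (n - 2) / ((m - 1) * (m - 2) * m) * m**3 / n**3
    return N_mmm * N_adj

# Direct transcription of the Theorem 4.2 bias estimator.
def bias_i(Y, T, z):
    n, nA = len(Y), int(T.sum()); nB = n - nA
    A, B = T == 1, T == 0
    D = z.T @ z / n; Dinv = np.linalg.inv(D)
    h = np.einsum('ij,jk,ik->i', z, Dinv, z)              # leverages h_i
    zbarA, zbarB = z[A].mean(0), z[B].mean(0)
    D_A = z[A].T @ z[A] / nA - np.outer(zbarA, zbarA)
    N_A = (z[A] * Y[A][:, None]).mean(0) - Y[A].mean() * zbarA
    D_B = z[B].T @ z[B] / nB - np.outer(zbarB, zbarB)
    N_B = (z[B] * Y[B][:, None]).mean(0) - Y[B].mean() * zbarB
    cA, cB = c_constant_i(n, nA), c_constant_i(n, nB)
    qA = np.einsum('ij,jk,ik->i', z[A] - zbarA, Dinv, z[A] - zbarA)
    qB = np.einsum('ij,jk,ik->i', z[B] - zbarB, Dinv, z[B] - zbarB)
    return ((1 / n) * nA / (nB - 1) * ((h[B] * Y[B]).mean() - h[B].mean() * Y[B].mean())
            + zbarB @ (np.linalg.inv(D_B) - Dinv) @ N_B
            - cB / nB * (qB * (Y[B] - Y[B].mean())).sum()
            - (1 / n) * nB / (nA - 1) * ((h[A] * Y[A]).mean() - h[A].mean() * Y[A].mean())
            - zbarA @ (np.linalg.inv(D_A) - Dinv) @ N_A
            + cA / nA * (qA * (Y[A] - Y[A].mean())).sum())
```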
Remark 1.

Note that the adjustments in Theorem 4.1 and Theorem 4.2 are both of order $O_p(n^{-1})$. Thus $\sqrt{n}(\widehat{ATE}_{NI,Debias}-\widehat{ATE}_{NI})=o_p(1)$ and $\sqrt{n}(\widehat{ATE}_{I,Debias}-\widehat{ATE}_I)=o_p(1)$: the debiased estimators have the same limiting distributions as the original estimators.

Remark 2.

We briefly remark on why it is possible to construct unbiased adjusted estimators in closed form. Examining the expressions in Section 3: although all bias terms are nonlinear, only $\nu_2$, $\nu_{1A}$ and $\nu_{1B}$ have infinite-order Taylor expansions, and these terms can be expressed purely in terms of the observable data. All other terms are, in expectation, functions of moments that can be unbiasedly estimated.

5 Simulations

In this section we apply our estimators to several datasets.

We briefly comment on variance estimation and confidence interval construction. We showed in Section 4 that our debiased estimators have the same asymptotic distributions as the OLS estimators. This implies that in large samples we can recenter the OLS confidence intervals at our debiased estimates and expect the same coverage probabilities. In small samples, however, we find it important to account for the sampling variability of the additional terms: indeed, in one of the simulations below, a naive recentering procedure leads to severe undercoverage. To address this problem, we propose a simple procedure shown to work well on our simulated datasets. (Another way is to directly estimate the variances of the additional terms, but this may be cumbersome.) In this procedure, one first runs the OLS regression and computes the debiased estimator. One then replaces the OLS treatment coefficient with the debiased coefficient estimate and recomputes the OLS residuals, keeping all other coefficients the same. Finally, one computes variances and constructs confidence intervals for the debiased estimator using the same formulas as for the OLS estimators. In the simulations below, this procedure is denoted by BC, which stands for bias-corrected. Appendix B contains a more detailed comparison of this new procedure with standard ones. In practice, we recommend that researchers use the debiased estimators with this procedure and the BC-HC2 heteroskedasticity-robust variance estimator with a Satterthwaite adjustment for inference (for discussions of the Satterthwaite adjustment, see [Sat46], [BM02], [Lin13a] and [Imb10]).
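A sketch of the BC recomputation for the noninteracted regression, using a plain HC2 sandwich for the treatment coefficient (the Satterthwaite degrees-of-freedom adjustment, which we obtain from clubSandwich in the simulations, is omitted here; `ate_debiased` would be $\widehat{ATE}_{NI}$ minus the Theorem 4.1 correction):

```python
import numpy as np

# BC variance sketch: recompute residuals with the treatment coefficient
# replaced by the debiased estimate, then apply the usual HC2 sandwich.
def bc_hc2_se(Y, T, z, ate_debiased):
    n = len(Y)
    X = np.column_stack([np.ones(n), T, z])
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    beta_bc = beta.copy()
    beta_bc[1] = ate_debiased                # swap in the debiased ATE estimate
    e = Y - X @ beta_bc                      # recomputed residuals
    XtX_inv = np.linalg.inv(X.T @ X)
    lev = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # hat-matrix diagonal
    w = e / np.sqrt(1.0 - lev)               # HC2 rescaling of residuals
    meat = (X * w[:, None]).T @ (X * w[:, None])
    V = XtX_inv @ meat @ XtX_inv
    return np.sqrt(V[1, 1])                  # SE of the treatment coefficient
```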

5.1 Simulated Datasets

In this section we compare the performance of our debiased estimators with that of standard estimators using simulated datasets. We show the results of two simulation schemes here; Appendix B contains results for two additional schemes, as well as graphical summaries of the data generating processes. In each scheme, we first generate two-dimensional covariates that are the quantiles of prespecified distributions. (For example, in Scheme 1, with a sample of $N$ units, the covariates of the $i$th unit are the $\frac{i}{N+1}$ quantiles of a Beta(0.5,0.5) distribution and a Triangle(0,1) distribution.) We then compute the studentized leverage ratios and use them to impute potential outcomes, in three different ways. In all cases the average treatment effect equals 0. The experiment is a completely randomized experiment with 24 units and an inclusion probability of $\frac{1}{3}$ for the treatment arm. The table below gives the simulation details. Note that these schemes are designed specifically so that the finite-sample bias is relatively large.
Tables 1 and 2 report the simulation results for the two schemes. Our debiased estimators are exactly unbiased, as expected. In terms of root mean squared error (RMSE) the picture is less clear: there are cases where the debiased estimators dominate the others (DGP 1.1 and DGP 1.3), and cases where the unadjusted estimator is best (DGP 1.2 and DGP 2.2). (The latter is an artifact of the DGPs; recall the variance formula for the difference-in-means estimator, e.g., from [IR15].) Note that DGP 1.3 and DGP 2.3 are constant-effects models, in which the noninteracted OLS estimators are first-order unbiased; in DGP 1.3 we nonetheless observe a small, higher-order bias.
In terms of confidence interval coverage, observe in DGPs 2.1, 2.2 and 2.3 that the original recentered intervals exhibit significant undercoverage. However, the procedure based on recomputing the OLS residuals, combined with Satterthwaite adjustments, works reasonably well; only in one case (DGP 2.3, Non-Int) is the coverage unsatisfactory. As shown in Tables 6 and 8 in the appendix, the BC procedures do not add significantly to the median confidence interval length, although they tend to add to the average length. The Satterthwaite adjustment can also add to the median (and mean) confidence interval length: it typically increases the length by at most 10 to 20 percent relative to the Student-t adjustment.

DGPs for Simulation Schemes 1 and 2

           X1(i)            X2(i)       Y0(i)   Y1(i)   #treated   ATE
Scheme 1, N = 24
  DGP1.1   Beta(0.5,0.5)    Tri(0,1)    0       2h_i    8          0
  DGP1.2   Beta(0.5,0.5)    Tri(0,1)    -h_i    h_i     8          0
  DGP1.3   Beta(0.5,0.5)    Tri(0,1)    h_i     h_i     8          0
Scheme 2, N = 24
  DGP2.1   Beta(2,5)        Norm(0,1)   0       2h_i    8          0
  DGP2.2   Beta(2,5)        Norm(0,1)   -h_i    h_i     8          0
  DGP2.3   Beta(2,5)        Norm(0,1)   h_i     h_i     8          0

  • Note: DGPs for the simulations. Beta($\alpha$,$\beta$) is the beta distribution with shape parameters $\alpha$ and $\beta$. Tri(0,1) is the symmetric triangular distribution on the unit interval. Norm(0,1) is the standard normal distribution. $h_i$ is the studentized leverage ratio of the $i$th unit, computed as $h_i=\frac{v_i-\bar{v}}{\sigma_v}$, where $v_i=x_i'(\sum_{i=1}^n x_ix_i')^{-1}x_i$, $\bar{v}=\frac{1}{n}\sum_{i=1}^n v_i$ and $\sigma_v^2=\frac{1}{n-1}\sum_{i=1}^n(v_i-\bar{v})^2$. DGP1.1, DGP1.2, DGP2.1 and DGP2.2 are variable-effects models; DGP1.3 and DGP2.3 are constant-effects models.
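For concreteness, the covariates and studentized leverage ratios of Scheme 1 can be constructed as follows (our reading of the note above; scipy is assumed for the quantile functions):

```python
import numpy as np
from scipy import stats

# Scheme 1 covariates: i/(N+1) quantiles of Beta(0.5, 0.5) and Triangle(0, 1).
N = 24
q = np.arange(1, N + 1) / (N + 1)
x1 = stats.beta(0.5, 0.5).ppf(q)
x2 = stats.triang(c=0.5, loc=0, scale=1).ppf(q)   # symmetric triangle on [0, 1]
x = np.column_stack([x1, x2])

# Studentized leverage ratios h_i, per the note above.
v = np.einsum('ij,jk,ik->i', x, np.linalg.inv(x.T @ x), x)
h = (v - v.mean()) / v.std(ddof=1)

# DGP 1.1 potential outcomes: Y0 = 0 and Y1 = 2 h_i, so the true ATE is 0.
Y0, Y1 = np.zeros(N), 2 * h
```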

Table 1: Simulation Results for Scheme 1

                                         ATE Estimators
                                         Unadjusted   OLS          OLS          Debiased     Debiased
                                                      Non-Int.     Interacted   Non-Int.     Interacted
DGP1.1, N = 24
  Bias                                   -0.000       -0.044       -0.171       -0.000       -0.000
  SD                                      0.577        0.569        0.734        0.558        0.570
  RMSE                                    0.577        0.571        0.754        0.558        0.570
  CI Coverage (HC2, Student-t)            0.961        0.957        0.919        0.960        0.953
  CI Coverage (HC2, Satterthwaite)        0.965        0.964        0.949        0.966        0.970
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.961        0.957
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.967        0.973
DGP1.2, N = 24
  Bias                                   -0.000       -0.046       -0.097       -0.000       -0.000
  SD                                      0.144        0.220        0.275        0.205        0.182
  RMSE                                    0.144        0.225        0.292        0.205        0.182
  CI Coverage (HC2, Student-t)            1.000        0.999        0.982        1.000        0.999
  CI Coverage (HC2, Satterthwaite)        1.000        1.000        0.994        1.000        1.000
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          1.000        1.000
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          1.000        1.000
DGP1.3, N = 24
  Bias                                   -0.000        0.002       -0.074        0.000        0.000
  SD                                      0.433        0.417        0.483        0.400        0.408
  RMSE                                    0.433        0.417        0.489        0.400        0.408
  CI Coverage (HC2, Student-t)            0.940        0.938        0.916        0.946        0.948
  CI Coverage (HC2, Satterthwaite)        0.947        0.949        0.950        0.956        0.970
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.947        0.950
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.956        0.971

  • Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals, in units of ×100 percentage points. The CI Coverage (HC2, ...) rows are calculated using the original OLS residuals; the CI Coverage (BC-HC2, ...) rows are calculated using the recomputed OLS residuals and are defined only for the debiased estimators. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is $\binom{24}{8}$.

Table 2: Simulation Results for Scheme 2

                                         ATE Estimators
                                         Unadjusted   OLS          OLS          Debiased     Debiased
                                                      Non-Int.     Interacted   Non-Int.     Interacted
DGP2.1, N = 24
  Bias                                    0.000       -0.237        0.028        0.000        0.000
  SD                                      0.577        0.344        0.283        0.459        0.439
  RMSE                                    0.577        0.418        0.284        0.459        0.439
  CI Coverage (HC2, Student-t)            0.910        0.913        0.757        0.923        0.470
  CI Coverage (HC2, Satterthwaite)        0.915        0.920        0.837        0.928        0.548
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.923        0.876
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.928        0.930
DGP2.2, N = 24
  Bias                                    0.000       -0.237        0.015        0.000        0.000
  SD                                      0.144        0.326        0.132        0.314        0.225
  RMSE                                    0.144        0.403        0.133        0.314        0.225
  CI Coverage (HC2, Student-t)            1.00         0.93         0.93         0.967        0.614
  CI Coverage (HC2, Satterthwaite)        1.00         0.935        0.991        0.969        0.724
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.965        0.967
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.968        0.996
DGP2.3, N = 24
  Bias                                    0.000        0.000        0.013        0.000        0.000
  SD                                      0.433        0.097        0.163        0.195        0.239
  RMSE                                    0.433        0.097        0.164        0.195        0.239
  CI Coverage (HC2, Student-t)            0.93         0.97         0.85         0.654        0.570
  CI Coverage (HC2, Satterthwaite)        0.942        0.983        0.947        0.683        0.678
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.809        0.896
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.850        0.944

  • Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals, in units of ×100 percentage points. The CI Coverage (HC2, ...) rows are calculated using the original OLS residuals; the CI Coverage (BC-HC2, ...) rows are calculated using the recomputed OLS residuals and are defined only for the debiased estimators. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is $\binom{24}{8}$.

5.2 Real Dataset

In this section we compare the performance of the debiased estimators with that of standard OLS estimators on a real-world dataset. We follow [Lin13a]’s simulation setting precisely, generating our simulation from the experimental example of [ALO09] by drawing random assignments under the maintained hypothesis of no treatment effect. Because this setting assumes no effects, bias is expected to be negligible: [Fre08a] notes that the leading term of the bias is greatest when treatment effects are heterogeneous. The simulation is thus not primarily meant to investigate bias, but rather the precision and coverage consequences of using our bias corrections in a real-world setting. (We thank Winston Lin for sharing the replication files.)
[ALO09] sought to measure the effects of support services and financial incentives on college students’ academic achievement. The experiment randomly assigned eligible first-year undergraduate students to four groups: one treatment group was offered both support services and financial incentives, a second group only support services, and a third group only financial incentives; the control group was eligible only for standard university support services. As in [Lin13a], we use only the data for men in the services-and-incentives ($N=58$) and services-only ($N=99$) groups. The simulation datasets are generated assuming the treatment has no effect on any student. We replicate the experiment $10^7$ times, each time randomly assigning 58 students to the services-and-incentives group and 99 to the services-only group. The regression estimators estimate the treatment effects adjusting for high-school GPA. The standard errors of the OLS estimators are estimated using the standard sandwich formulas.
Table 3 reports the results of the $10^7$ replications. The first two rows of the table show the biases and standard deviations of the five estimators. All estimators are approximately unbiased after rounding, and the variances of the debiased estimators are no larger than those of the standard estimators. The coverage rows show that all confidence intervals cover the true value of the ATE with approximately 95 percent probability. On average, the intervals of the regression-adjusted estimators are slightly narrower than those of the unadjusted estimator. (The widths of the confidence intervals for the debiased estimators are mechanically identical to those for the standard estimators when constructed from the same SE estimators.)

Table 3: Simulation of the Angrist, Lang, Oreopoulos (2009) experiment with zero treatment effects ($10^7$ replications).

                                         ATE Estimators
                                         Unadjusted   OLS          OLS          Debiased     Debiased
                                                      Non-Int.     Interacted   Non-Int.     Interacted
  Bias                                    0.000        0.000        0.000        0.000        0.000
  SD                                      0.159        0.150        0.147        0.147        0.147
  RMSE                                    0.159        0.150        0.147        0.147        0.147
  CI Coverage (HC2, Student-t)            0.949        0.949        0.949        0.949        0.949
  CI Coverage (HC2, Satterthwaite)        0.949        0.949        0.949        0.950        0.950
  CI Coverage (BC-HC2, Student-t)         n/a          n/a          n/a          0.949        0.949
  CI Coverage (BC-HC2, Satterthwaite)     n/a          n/a          n/a          0.950        0.950

  • Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals, in units of ×100 percentage points. The CI Coverage (HC2, ...) rows are calculated using the original OLS residuals; the CI Coverage (BC-HC2, ...) rows are calculated using the recomputed OLS residuals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

Together, these results demonstrate that, in a real-world setting, our bias corrections can be introduced without appreciably compromising the precision or coverage properties of regression-adjusted estimators.

Appendix A Proofs

A.1 Constants

We define the following five constants:

$$N_{AAA}=\frac{n}{n_A^3}\left(\frac{n_A}{n}-\frac{3n_A(n_A-1)}{n(n-1)}+\frac{2n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\right)$$
$$N_{BBB}=\frac{n}{n_B^3}\left(\frac{n_B}{n}-\frac{3n_B(n_B-1)}{n(n-1)}+\frac{2n_B(n_B-1)(n_B-2)}{n(n-1)(n-2)}\right)$$
$$N_{AAB}=\frac{n}{n_A^2n_B}\left(-\frac{n_An_B}{n(n-1)}+\frac{2n_A(n_A-1)n_B}{n(n-1)(n-2)}\right)$$
$$N_{Adj,A}=\frac{n(n-1)(n-2)}{(n_A-1)(n_A-2)n_A}\frac{n_A^3}{n^3}$$
$$N_{Adj,B}=\frac{n(n-1)(n-2)}{(n_B-1)(n_B-2)n_B}\frac{n_B^3}{n^3}$$

A.2 Auxiliary Lemmas

Let $x_i$, $y_i$ and $z_i$ be three possibly identical variables such that $\bar{x}=\bar{y}=\bar{z}=0$.

Lemma A.1.
$$E[\bar{x}_A\bar{y}_A\bar{z}_A]=N_{AAA}\frac{1}{n}\sum_{i=1}^n x_iy_iz_i$$
$$E[\bar{x}_A\bar{y}_A\bar{z}_B]=N_{AAB}\frac{1}{n}\sum_{i=1}^n x_iy_iz_i$$
Proof.

We only prove the first equality; the second can be proved analogously. First notice two useful equalities:

$$E\left[\sum_{i=1}^n T_ix_iy_i\sum_{j\neq i}T_jz_j\right]=E\left[\sum_{i=1}^n\sum_{j\neq i}T_iT_jx_iy_iz_j\right]=\sum_{i=1}^n\sum_{j\neq i}E[T_iT_j]x_iy_iz_j$$
$$=\frac{n_A(n_A-1)}{n(n-1)}\sum_{i=1}^n\sum_{j\neq i}x_iy_iz_j=\frac{n_A(n_A-1)}{n(n-1)}\left(\sum_{i=1}^n\sum_{j=1}^n x_iy_iz_j-\sum_{i=1}^n x_iy_iz_i\right)=-\frac{n_A(n_A-1)}{n(n-1)}\sum_{i=1}^n x_iy_iz_i,$$

where the third equality uses the second-moment formula for a completely randomized experiment and the fifth equality uses the fact that $\sum_{i=1}^n z_i=n\bar{z}=0$. Similarly,

$$E\left[\sum_{i=1}^n T_ix_i\sum_{j\neq i}T_jy_j\sum_{s\notin\{i,j\}}T_sz_s\right]=\sum_{i=1}^n\sum_{j\neq i}\sum_{s\notin\{i,j\}}E[T_iT_jT_s]x_iy_jz_s$$
$$=\frac{n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\sum_{i=1}^n\sum_{j\neq i}\sum_{s\notin\{i,j\}}x_iy_jz_s$$
$$=\frac{n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\left(\sum_{i=1}^n\sum_{j\neq i}\sum_{s=1}^n x_iy_jz_s-\sum_{i=1}^n\sum_{j\neq i}x_iy_j(z_i+z_j)\right)$$
$$=\frac{n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\left(-\sum_{i=1}^n\sum_{j=1}^n x_iy_j(z_i+z_j)+2\sum_{i=1}^n x_iy_iz_i\right)$$
$$=\frac{2n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\sum_{i=1}^n x_iy_iz_i,$$

where the fourth and fifth equalities use $\sum_{i=1}^n x_i=\sum_{i=1}^n y_i=\sum_{i=1}^n z_i=0$. Finally,

$$E[\bar{x}_A\bar{y}_A\bar{z}_A]=\frac{1}{n_A^3}E\left[\sum_{i=1}^n T_ix_i\sum_{i=1}^n T_iy_i\sum_{i=1}^n T_iz_i\right]$$
$$=\frac{1}{n_A^3}\left(E\left[\sum_i T_ix_iy_iz_i\right]+E\left[\sum_{i=1}^n\sum_{j\neq i}T_iT_j(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)\right]+E\left[\sum_{i=1}^n T_ix_i\sum_{j\neq i}T_jy_j\sum_{s\notin\{i,j\}}T_sz_s\right]\right)$$
$$=\frac{1}{n_A^3}\left(\frac{n_A}{n}-\frac{3n_A(n_A-1)}{n(n-1)}+\frac{2n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\right)\sum_{i=1}^n x_iy_iz_i,$$

where for the last equality we apply the previous two equalities. ∎
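Both identities can be checked exactly on a small toy population by enumerating all assignments:

```python
import numpy as np
from itertools import combinations

# Exact check of Lemma A.1 by enumeration (toy centered data, small n).
rng = np.random.default_rng(4)
n, nA = 8, 4
nB = n - nA
x, y, z = (rng.normal(size=n) for _ in range(3))
x, y, z = x - x.mean(), y - y.mean(), z - z.mean()   # centered, as assumed

N_AAA = (n / nA**3) * (nA / n - 3 * nA * (nA - 1) / (n * (n - 1))
                       + 2 * nA * (nA - 1) * (nA - 2) / (n * (n - 1) * (n - 2)))
N_AAB = (n / (nA**2 * nB)) * (-nA * nB / (n * (n - 1))
                              + 2 * nA * (nA - 1) * nB / (n * (n - 1) * (n - 2)))

lhs_AAA, lhs_AAB = [], []
for idx in combinations(range(n), nA):
    A = np.zeros(n, bool); A[list(idx)] = True
    lhs_AAA.append(x[A].mean() * y[A].mean() * z[A].mean())
    lhs_AAB.append(x[A].mean() * y[A].mean() * z[~A].mean())
m3 = (x * y * z).mean()
print(np.isclose(np.mean(lhs_AAA), N_AAA * m3))      # True
print(np.isclose(np.mean(lhs_AAB), N_AAB * m3))      # True
```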

Lemma A.2.
$$N_{Adj,A}\,E\left[\frac{1}{n_A}\sum_{i\in[A]}(x_i-\bar{x}_A)(y_i-\bar{y}_A)(z_i-\bar{z}_A)\right]=\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})(z_i-\bar{z})\qquad(1)$$
Proof.
$$n^3\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})(z_i-\bar{z})=\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{s=1}^n(x_i-x_j)(y_i-y_k)(z_i-z_s)$$
$$=n^3\sum_{i=1}^n x_iy_iz_i-n^2\sum_{i=1}^n\sum_{j=1}^n(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+2n\sum_i\sum_j\sum_s x_iy_jz_s$$
$$=(n^3-3n^2+2n)\sum_{i=1}^n x_iy_iz_i-(n^2-2n)\sum_{i\neq j}(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+2n\sum_{i\neq j\neq s}x_iy_jz_s$$

Consider $n_A^4$ times the third moment estimator $\frac{1}{n_A}\sum_{i\in[A]}(x_i-\bar{x}_A)(y_i-\bar{y}_A)(z_i-\bar{z}_A)$. By the same expansion, this equals

$$(n_A^3-3n_A^2+2n_A)\sum_{i=1}^n T_ix_iy_iz_i-(n_A^2-2n_A)\sum_{i\neq j}T_iT_j(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+2n_A\sum_{i\neq j\neq s}T_iT_jT_sx_iy_jz_s$$
$$=n_A(n_A-1)(n_A-2)\sum_{i=1}^n T_ix_iy_iz_i-n_A(n_A-2)\sum_{i\neq j}T_iT_j(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+2n_A\sum_{i\neq j\neq s}T_iT_jT_sx_iy_jz_s.$$

In expectation this equals

$$\frac{n_A(n_A-1)(n_A-2)n_A}{n}\sum_{i=1}^n x_iy_iz_i-\frac{n_A(n_A-1)(n_A-2)n_A}{n(n-1)}\sum_{i\neq j}(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+\frac{2n_A(n_A-1)(n_A-2)n_A}{n(n-1)(n-2)}\sum_{i\neq j\neq s}x_iy_jz_s$$
$$=\frac{n_A(n_A-1)(n_A-2)n_A}{n(n-1)(n-2)n}\left((n^3-3n^2+2n)\sum_{i=1}^n x_iy_iz_i-(n^2-2n)\sum_{i\neq j}(x_iy_iz_j+x_iy_jz_i+x_jy_iz_i)+2n\sum_{i\neq j\neq s}x_iy_jz_s\right)$$
$$=\frac{n_A(n_A-1)(n_A-2)n_A}{n(n-1)(n-2)n}\times n^3\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})(z_i-\bar{z}).$$

Dividing both sides by $n_A^4$ and comparing with the definition of $N_{Adj,A}$ yields (1). ∎
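Lemma A.2 can be checked the same way:

```python
import numpy as np
from itertools import combinations

# Exact check of Lemma A.2 by enumerating assignments (toy data, small n).
rng = np.random.default_rng(5)
n, nA = 8, 4
x, y, z = (rng.normal(size=n) for _ in range(3))

N_adjA = n * (n - 1) * (n - 2) / ((nA - 1) * (nA - 2) * nA) * nA**3 / n**3

vals = []
for idx in combinations(range(n), nA):
    s = np.array(idx)
    xc, yc, zc = x[s] - x[s].mean(), y[s] - y[s].mean(), z[s] - z[s].mean()
    vals.append((xc * yc * zc).mean())
pop = ((x - x.mean()) * (y - y.mean()) * (z - z.mean())).mean()
print(np.isclose(N_adjA * np.mean(vals), pop))       # True
```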

A.3 Theorem 3.1

Proof.

By the Frisch–Waugh–Lovell theorem, the OLS estimate of the coefficient vector can be written $\widehat{Q}=\widehat{D}^{-1}\widehat{N}$, where

$$\widehat{N}=p_A(\overline{a\mathbf{z}}_A-\bar{a}_A\bar{\mathbf{z}}_A)+p_B(\overline{b\mathbf{z}}_B-\bar{b}_B\bar{\mathbf{z}}_B)$$
$$=p_A(\overline{a^*\mathbf{z}}_A-\bar{a^*}_A\bar{\mathbf{z}}_A)+p_B(\overline{b^*\mathbf{z}}_B-\bar{b^*}_B\bar{\mathbf{z}}_B)$$
$$=N+p_A(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}})+p_B(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}})-p_A\bar{a^*}_A\bar{\mathbf{z}}_A-p_B\bar{b^*}_B\bar{\mathbf{z}}_B$$

and

$$\widehat{D}=\overline{\mathbf{z}\mathbf{z}'}-p_A\bar{\mathbf{z}}_A\bar{\mathbf{z}}_A'-p_B\bar{\mathbf{z}}_B\bar{\mathbf{z}}_B'.$$

Therefore

$$\widehat{Q}=D^{-1}\widehat{N}+\left(\widehat{D}^{-1}-D^{-1}\right)\widehat{N}$$
$$=D^{-1}\left(N+p_A(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}})+p_B(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}})-p_A\bar{a^*}_A\bar{\mathbf{z}}_A-p_B\bar{b^*}_B\bar{\mathbf{z}}_B\right)+\left(\widehat{D}^{-1}-D^{-1}\right)\widehat{N}$$
$$=Q+D^{-1}\left(p_A(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}})+p_B(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}})\right)-D^{-1}\left(p_A\bar{a^*}_A\bar{\mathbf{z}}_A+p_B\bar{b^*}_B\bar{\mathbf{z}}_B\right)+\left(\widehat{D}^{-1}-D^{-1}\right)\widehat{N}$$
$$=Q+\nu_1+\nu_3+\nu_2.$$

Now consider the orders of the terms $\nu_1$, $\nu_2$ and $\nu_3$. For $\nu_1$: $D^{-1}$, $p_A$ and $p_B$ are $O(1)$ by assumption, while $(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}})$ and $(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}})$ are each $O_p(n^{-\frac{1}{2}})$ by the moment conditions on $a$, $b$ and $\mathbf{z}$. Therefore $\nu_1=O_p(n^{-\frac{1}{2}})$.
For $\nu_2$, note that $\widehat{D}-D=-p_A\bar{\mathbf{z}}_A\bar{\mathbf{z}}_A'-p_B\bar{\mathbf{z}}_B\bar{\mathbf{z}}_B'$, which is $O_p(n^{-1})$. Hence $\widehat{D}^{-1}-D^{-1}=\widehat{D}^{-1}(D-\widehat{D})D^{-1}=O_p(1)O_p(n^{-1})O(1)=O_p(n^{-1})$. Meanwhile, $\widehat{N}$ converges to a constant vector, so it is $O_p(1)$. Therefore $\nu_2=O_p(n^{-1})$.
For $\nu_3$: $\bar{\mathbf{z}}_A$, $\bar{\mathbf{z}}_B$, $\bar{a^*}_A$ and $\bar{b^*}_B$ are $O_p(n^{-\frac{1}{2}})$, while $D^{-1}$, $p_A$ and $p_B$ are $O(1)$. Therefore $\nu_3=O_p(n^{-1})$. ∎

A.4 Corollary 3.1

Proof.
$$\mathbf{E}[\widehat{ATE}_{NI}-ATE]=\mathbf{E}\left[\bar{a}_A-\bar{b}_B-(\bar{\mathbf{z}}_A-\bar{\mathbf{z}}_B)'\widehat{Q}\right]-ATE$$
$$=\left(\mathbf{E}\left[\bar{a}_A-\bar{b}_B\right]-ATE\right)-\mathbf{E}\left[(\bar{\mathbf{z}}_A-\bar{\mathbf{z}}_B)'\right]Q-\mathbf{E}\left[(\bar{\mathbf{z}}_A-\bar{\mathbf{z}}_B)'(\nu_1+\nu_2+\nu_3)\right]$$
$$=(ATE-ATE)-0\times Q-\mathbf{E}\left[(\bar{\mathbf{z}}_A-\bar{\mathbf{z}}_B)'(\nu_1+\nu_2+\nu_3)\right]$$
$$=\mathbf{E}\left[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'(\nu_1+\nu_2+\nu_3)\right],$$

where in the third equality we use the unbiasedness of the difference-in-means estimator and the unbiasedness of $\bar{\mathbf{z}}_A$ and $\bar{\mathbf{z}}_B$ as estimators of the population mean $\bar{\mathbf{z}}$. ∎

A.5 Theorem 3.2

Proof.

The decomposition is algebraic. Now $\widehat{N}_A-N_A=O_p(n^{-\frac{1}{2}})$, $\widehat{N}_B-N_B=O_p(n^{-\frac{1}{2}})$, $\widehat{N}_A=O_p(1)$ and $\widehat{N}_B=O_p(1)$ by the assumptions. Moreover,

$$\widehat{D}_A^{-1}-D^{-1}=\widehat{D}_A^{-1}(D-\widehat{D}_A)D^{-1}=O_p(1)O_p(n^{-\frac{1}{2}})O(1)=O_p(n^{-\frac{1}{2}}),$$

and similarly $\widehat{D}_B^{-1}-D^{-1}=O_p(n^{-\frac{1}{2}})$. These estimates give the orders in the theorem. ∎

A.6 Corollary 3.2

Proof.

Same as Corollary 3.1, after noting that $\bar{\mathbf{z}}_A=O_p(n^{-\frac{1}{2}})$ and $\bar{\mathbf{z}}_B=O_p(n^{-\frac{1}{2}})$. ∎

A.7 Theorem 4.1

Let

$$C_{A,NI}=\frac{n_A}{n_B}N_{AAA}N_{Adj,A}=\frac{n_A}{n_B}\frac{n}{n_A^3}\left(\frac{n_A}{n}-\frac{3n_A(n_A-1)}{n(n-1)}+\frac{2n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\right)\frac{n(n-1)(n-2)}{(n_A-1)(n_A-2)n_A}\frac{n_A^3}{n^3}$$
$$C_{B,NI}=\frac{n_A}{n_B}N_{AAB}N_{Adj,B}=\frac{n_A}{n_B}\frac{n}{n_A^2n_B}\left(-\frac{n_An_B}{n(n-1)}+\frac{2n_A(n_A-1)n_B}{n(n-1)(n-2)}\right)\frac{n(n-1)(n-2)}{(n_B-1)(n_B-2)n_B}\frac{n_B^3}{n^3}$$

Note both of them are of order $O(n^{-1})$.

Proof.

We first propose an estimator for $\mathbf{E}[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\nu_1]$. Using the assumption $\bar{\mathbf{z}}=0$, we have

$$(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)=\frac{1}{p_A}\bar{\mathbf{z}}_B=-\frac{1}{p_B}\bar{\mathbf{z}}_A.$$

Then we have

$$\mathbf{E}[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\nu_1]=\frac{1}{p_A}\mathbf{E}\left[\bar{\mathbf{z}}_B'D^{-1}\left(p_A(\overline{a^*\mathbf{z}}_A-\overline{a^*\mathbf{z}})+p_B(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}})\right)\right]$$
$$=\frac{1}{n-1}\left(\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i'D^{-1}\mathbf{z}_ib_i^*-\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i'D^{-1}\mathbf{z}_ia_i^*\right)$$
$$=\frac{1}{n-1}(\overline{hb}-\bar{h}\bar{b})-\frac{1}{n-1}(\overline{ha}-\bar{h}\bar{a}),$$

where the second equality follows from Proposition 1 of [Fre08b]. An unbiased estimator of this quantity is:

$$\frac{1}{n-1}\left[\frac{n_B(n-1)}{(n_B-1)n}\left(\overline{hb}_B-\bar{h}_B\bar{b}_B\right)-\frac{n_A(n-1)}{(n_A-1)n}\left(\overline{ha}_A-\bar{h}_A\bar{a}_A\right)\right]$$

It is clear that this adjustment is of order $O_p(n^{-1})$.
The term $(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\nu_2$ is a function of the observed data alone, so it is directly estimable, and it is of order $O_p(n^{-1})$.
We now propose an estimator for $\mathbf{E}[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\nu_3]$:

$$\mathbf{E}[(\bar{\mathbf{z}}_B-\bar{\mathbf{z}}_A)'\nu_3]=\frac{1}{p_B}E[\bar{\mathbf{z}}_A'D^{-1}p_A\bar{a^*}_A\bar{\mathbf{z}}_A]+\frac{1}{p_B}E[\bar{\mathbf{z}}_A'D^{-1}p_B\bar{b^*}_B\bar{\mathbf{z}}_B]$$
$$=\frac{1}{p_B}E[\bar{\mathbf{z}}_A'D^{-1}p_A\bar{a^*}_A\bar{\mathbf{z}}_A]-\frac{1}{p_B}E[\bar{\mathbf{z}}_A'D^{-1}p_A\bar{b^*}_B\bar{\mathbf{z}}_A]$$
$$=\frac{p_A}{p_B}N_{AAA}\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i'D^{-1}\mathbf{z}_ia_i^*-\frac{p_A}{p_B}N_{AAB}\frac{1}{n}\sum_{i=1}^n\mathbf{z}_i'D^{-1}\mathbf{z}_ib_i^*$$

By Lemmas A.1 and A.2, an unbiased estimator for this quantity is:

$$\frac{n_A}{n_B}N_{AAA}N_{Adj,A}\frac{1}{n_A}\sum_{i\in[A]}(\mathbf{z}_i-\bar{\mathbf{z}}_A)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_A)(a_i-\bar{a}_A)-\frac{n_A}{n_B}N_{AAB}N_{Adj,B}\frac{1}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B)$$
$$=\frac{C_{A,NI}}{n_A}\sum_{i\in[A]}(\mathbf{z}_i-\bar{\mathbf{z}}_A)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_A)(a_i-\bar{a}_A)-\frac{C_{B,NI}}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B)$$

It is clear that this adjustment is of order $O_p(n^{-1})$. ∎

A.8 Theorem 4.2

We have:

$$C_{A,I}=N_{AAA}N_{Adj,A}=\frac{n}{n_A^3}\left(\frac{n_A}{n}-\frac{3n_A(n_A-1)}{n(n-1)}+\frac{2n_A(n_A-1)(n_A-2)}{n(n-1)(n-2)}\right)\frac{n(n-1)(n-2)}{(n_A-1)(n_A-2)n_A}\frac{n_A^3}{n^3}$$
$$C_{B,I}=N_{BBB}N_{Adj,B}=\frac{n}{n_B^3}\left(\frac{n_B}{n}-\frac{3n_B(n_B-1)}{n(n-1)}+\frac{2n_B(n_B-1)(n_B-2)}{n(n-1)(n-2)}\right)\frac{n(n-1)(n-2)}{(n_B-1)(n_B-2)n_B}\frac{n_B^3}{n^3}$$

Note that $C_{A,I}$ and $C_{B,I}$ are of order $O(n^{-1})$.

Proof.

We prove the result for the bias in the control arm; the proof for the treated arm is analogous. First notice that

$$E[\bar{\mathbf{z}}_B'(\nu_{1B}+\nu_{2B})]=E[\bar{\mathbf{z}}_B'(\widehat{D}_B^{-1}-D^{-1})\widehat{N}_B]+E\left[\bar{\mathbf{z}}_B'D^{-1}\left(\overline{b^*\mathbf{z}}_B-\overline{b^*\mathbf{z}}\right)\right]-E[\bar{\mathbf{z}}_B'D^{-1}\bar{\mathbf{z}}_B\bar{b^*}_B].$$

The first term is directly estimable and is of order $O_p(n^{-1})$. Analogously to the noninteracted case, the second term can be estimated by:

$$\frac{1}{n-1}\frac{n_A}{n_B}\frac{n_B(n-1)}{(n_B-1)n}\left(\overline{hb}_B-\bar{h}_B\bar{b}_B\right)=\frac{1}{n}\frac{n_A}{n_B-1}\left(\overline{hb}_B-\bar{h}_B\bar{b}_B\right)$$

It should be clear that this term is of order $O_p(n^{-1})$.
The third term can be estimated by:

$$N_{BBB}N_{Adj,B}\frac{1}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B)=\frac{C_{B,I}}{n_B}\sum_{i\in[B]}(\mathbf{z}_i-\bar{\mathbf{z}}_B)'D^{-1}(\mathbf{z}_i-\bar{\mathbf{z}}_B)(b_i-\bar{b}_B)$$

It is clear that this term is of order $O_p(n^{-1})$. ∎

Appendix B More Simulation Results

B.1 Details on Simulation Schemes

           X1(i)           X2(i)                             Y0(i)   Y1(i)   #treated   ATE
Scheme 1, N = 24
  DGP1.1   Beta(0.5,0.5)   Tri(0,1)                          0       2h_i    8          0
  DGP1.2   Beta(0.5,0.5)   Tri(0,1)                          -h_i    h_i     8          0
  DGP1.3   Beta(0.5,0.5)   Tri(0,1)                          h_i     h_i     8          0
Scheme 2, N = 24
  DGP2.1   Beta(2,5)       Norm(0,1)                         0       2h_i    8          0
  DGP2.2   Beta(2,5)       Norm(0,1)                         -h_i    h_i     8          0
  DGP2.3   Beta(2,5)       Norm(0,1)                         h_i     h_i     8          0
Scheme 3, N = 24
  DGP3.1   Uniform(0,1)    Uniform(0,1), Squared, Reversed   0       2h_i    8          0
  DGP3.2   Uniform(0,1)    Uniform(0,1), Squared, Reversed   -h_i    h_i     8          0
  DGP3.3   Uniform(0,1)    Uniform(0,1), Squared, Reversed   h_i     h_i     8          0
Scheme 4, N = 24
  DGP4.1   Uniform(0,1)    Uniform(0,1), Squared             0       2h_i    8          0
  DGP4.2   Uniform(0,1)    Uniform(0,1), Squared             -h_i    h_i     8          0
  DGP4.3   Uniform(0,1)    Uniform(0,1), Squared             h_i     h_i     8          0

Table 4: DGPs for Simulations. Beta($\alpha$,$\beta$) is the beta distribution with shape parameters $\alpha$ and $\beta$. Tri(0,1) is the symmetric triangular distribution on the unit interval. Norm(0,1) is the standard normal distribution. Uniform(0,1) is the uniform distribution on the unit interval. For "Uniform(0,1), Squared", we square the quantiles of the uniform distribution. For "Uniform(0,1), Squared, Reversed", we additionally reverse the order of the quantiles so that the 1st unit corresponds to the highest quantile. $h_i$ is the studentized leverage ratio of the $i$th unit, computed as $h_i=\frac{v_i-\bar{v}}{\sigma_v}$, where $v_i=x_i'(\sum_{i=1}^n x_ix_i')^{-1}x_i$, $\bar{v}=\frac{1}{n}\sum_{i=1}^n v_i$ and $\sigma_v^2=\frac{1}{n-1}\sum_{i=1}^n(v_i-\bar{v})^2$.
Figure 1: Histograms for the DGPs. For each scheme, we plot the distributions of $X_1$, $X_2$ and the studentized leverages $h_i$. See the note under Table 4 for the definition of $h_i$.

B.2 Scheme 1

Table 5: Confidence Interval Statistics for Scheme 1 using HC2

                               HC2                                  BC-HC2
                               Z       Student-t  Satterthwaite    Z       Student-t  Satterthwaite
DGP1.1, N = 24
 Coverage, Percentage
  Non-interacted (Debiased)    95.30   96.00      96.60            95.40   96.10      96.70
  Interacted (Debiased)        94.60   95.30      97.00            95.00   95.70      97.30
 CI Width, Average
  Non-interacted (Debiased)    2.68    2.83       3.00             2.70    2.85       3.02
  Interacted (Debiased)        3.20    3.38       4.13             3.26    3.44       4.28
 CI Width, Median
  Non-interacted (Debiased)    2.71    2.86       3.01             2.71    2.86       3.01
  Interacted (Debiased)        3.02    3.19       3.67             3.02    3.19       3.67
DGP1.2, N = 24
 Coverage, Percentage
  Non-interacted (Debiased)    99.90   100.00     100.00           99.90   100.00     100.00
  Interacted (Debiased)        99.90   99.90      100.00           100.00  100.00     100.00
 CI Width, Average
  Non-interacted (Debiased)    1.84    1.95       2.06             1.85    1.95       2.06
  Interacted (Debiased)        1.87    1.98       2.42             1.90    2.00       2.48
 CI Width, Median
  Non-interacted (Debiased)    1.83    1.93       2.03             1.83    1.93       2.03
  Interacted (Debiased)        1.77    1.87       2.16             1.77    1.87       2.16
DGP1.3, N = 24
 Coverage, Percentage
  Non-interacted (Debiased)    93.50   94.60      95.60            93.60   94.70      95.60
  Interacted (Debiased)        93.80   94.80      97.00            94.10   95.00      97.10
 CI Width, Average
  Non-interacted (Debiased)    1.63    1.72       1.82             1.63    1.72       1.82
  Interacted (Debiased)        1.87    1.98       2.42             1.91    2.01       2.49
 CI Width, Median
  Non-interacted (Debiased)    1.62    1.71       1.80             1.62    1.71       1.80
  Interacted (Debiased)        1.77    1.87       2.16             1.77    1.87       2.16

  • Note: Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

Table 6: Confidence Interval Statistics for Scheme 1 using HC3
ATE Estimators
HC3 BC-HC3
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP1.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 96.70 97.20 97.70 96.70 97.20 97.80
Interacted (Debiased) 97.40 97.70 98.80 97.60 97.90 99.00
CI Width, Average
Non-interacted (Debiased) 3.00 3.16 3.37 3.01 3.18 3.39
Interacted (Debiased) 4.58 4.83 7.60 4.60 4.85 7.83
CI Width, Median
Non-interacted (Debiased) 3.02 3.18 3.38 3.03 3.20 3.40
Interacted (Debiased) 3.94 4.16 5.18 3.90 4.11 5.12
DGP1.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 100.00 100.00 100.00 100.00 100.00 100.00
Interacted (Debiased) 100.00 100.00 100.00 100.00 100.00 100.00
CI Width, Average
Non-interacted (Debiased) 2.06 2.17 2.32 2.06 2.17 2.32
Interacted (Debiased) 2.54 2.68 4.16 2.54 2.68 4.25
CI Width, Median
Non-interacted (Debiased) 2.04 2.15 2.28 2.04 2.15 2.28
I.Nobias 2.21 2.33 2.91 2.19 2.31 2.88
DGP1.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 95.30 96.10 97.00 95.30 96.20 97.00
Interacted (Debiased) 96.80 97.40 98.90 97.00 97.50 99.00
CI Width, Average
Non-interacted (Debiased) 1.78 1.88 2.01 1.79 1.89 2.01
Interacted (Debiased) 2.54 2.68 4.16 2.57 2.71 4.33
CI Width, Median
Non-interacted (Debiased) 1.77 1.87 1.98 1.77 1.87 1.99
Interacted (Debiased) 2.21 2.33 2.91 2.20 2.32 2.89
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. HC3 refers to the approximate jackknife robust standard error estimator [MW85]. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].
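HC3, the approximate jackknife of [MW85], differs from HC2 only in squaring the leverage correction applied to each squared residual; a sketch under the same illustrative conventions as the HC2 snippet above:

import numpy as np

def hc3_vcov(X, y):
    """Approximate-jackknife (HC3) covariance: as HC2, but the residual
    inflation factor 1/(1 - p_ii) is squared."""
    XtX_inv = np.linalg.inv(X.T @ X)
    e = y - X @ (XtX_inv @ X.T @ y)               # OLS residuals
    p = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages p_ii
    meat = (X * (e**2 / (1.0 - p)**2)[:, None]).T @ X
    return XtX_inv @ meat @ XtX_inv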

B.3 Scheme 2

Table 7: Confidence Interval Statistics for Scheme 2 using HC2
ATE Estimators
HC2 BC-HC2
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP2.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 91.40 92.30 92.80 91.40 92.30 92.80
Interacted (Debiased) 44.70 47.00 54.80 85.40 87.60 93.20
CI Width, Average
Non-interacted (Debiased) 1.93 2.04 2.16 2.04 2.15 2.28
Interacted (Debiased) 0.60 0.63 0.89 2.48 2.61 4.50
CI Width, Median
Non-interacted (Debiased) 1.95 2.06 2.17 1.95 2.06 2.17
Interacted (Debiased) 0.51 0.54 0.67 0.51 0.54 0.67
DGP2.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 96.30 96.70 96.90 96.10 96.50 96.80
Interacted (Debiased) 58.90 61.40 72.40 95.30 96.70 99.60
CI Width, Average
Non-interacted (Debiased) 1.91 2.02 2.14 1.96 2.07 2.19
Interacted (Debiased) 0.40 0.42 0.59 1.33 1.40 2.36
CI Width, Median
Non-interacted (Debiased) 1.94 2.05 2.16 1.94 2.05 2.16
Interacted (Debiased) 0.38 0.40 0.48 0.38 0.40 0.48
DGP2.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 62.50 65.40 68.30 76.50 80.90 85.00
Interacted (Debiased) 54.20 57.00 67.80 87.40 89.60 94.40
CI Width, Average
Non-interacted (Debiased) 0.37 0.39 0.41 0.45 0.48 0.51
Interacted (Debiased) 0.40 0.42 0.59 1.33 1.41 2.38
CI Width, Median
Non-interacted (Debiased) 0.35 0.37 0.40 0.35 0.37 0.40
Interacted (Debiased) 0.38 0.40 0.48 0.38 0.40 0.48
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].
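For intuition about the Satterthwaite columns: in a simple two-sample mean comparison the adjustment is the classical Welch-Satterthwaite approximation [Sat46] sketched below; clubSandwich implements the Bell and McCaffrey [BM02] generalization of this idea for HC2-type estimators, so the sketch conveys the idea rather than reimplementing the package:

import numpy as np

def welch_satterthwaite_df(s1, n1, s0, n0):
    """Classical Satterthwaite degrees of freedom for a two-sample mean
    comparison with group SDs s1, s0 and group sizes n1, n0."""
    a, b = s1**2 / n1, s0**2 / n0
    return (a + b)**2 / (a**2 / (n1 - 1) + b**2 / (n0 - 1))

# E.g., 8 treated and 16 control units, as in the N = 24 designs above.
df = welch_satterthwaite_df(1.0, 8, 0.5, 16)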

Table 8: Confidence Interval Statistics for Scheme 2 using HC3
ATE Estimators
HC3 BC-HC3
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP2.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 93.20 94.10 94.50 93.20 94.00 94.50
Interacted (Debiased) 74.20 76.00 87.60 96.40 97.00 98.80
CI Width, Average
Non-interacted (Debiased) 2.37 2.50 2.67 2.44 2.57 2.76
Interacted (Debiased) 1.60 1.69 5.29 12.18 12.85 57.49
CI Width, Median
Non-interacted (Debiased) 2.37 2.50 2.66 2.44 2.57 2.73
Interacted (Debiased) 1.07 1.13 1.85 2.11 2.23 3.94
DGP2.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 97.00 97.30 97.40 96.80 97.10 97.30
Interacted (Debiased) 85.30 86.70 96.10 99.90 99.90 100.00
CI Width, Average
Non-interacted (Debiased) 2.34 2.47 2.64 2.35 2.48 2.66
Interacted (Debiased) 0.93 0.98 2.88 6.26 6.60 29.16
CI Width, Median
Non-interacted (Debiased) 2.33 2.46 2.62 2.37 2.50 2.65
Interacted (Debiased) 0.67 0.71 1.16 1.18 1.25 2.21
DGP2.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 71.60 74.60 78.10 86.90 90.40 93.30
Interacted (Debiased) 83.30 84.70 94.50 98.20 98.80 99.60
CI Width, Average
Non-interacted (Debiased) 0.45 0.47 0.50 0.54 0.57 0.61
Interacted (Debiased) 0.93 0.98 2.88 6.28 6.63 29.31
CI Width, Median
Non-interacted (Debiased) 0.42 0.45 0.48 0.50 0.52 0.55
Interacted (Debiased) 0.67 0.71 1.16 1.18 1.25 2.24
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. HC3 refers to the approximate jackknife robust standard error estimator [MW85]. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

B.4 Scheme 3

Table 9: Simulation Results for Scheme 3
ATE Estimators
Unadjusted OLS Debiased
Non-Int. Interacted Non-Int. Interacted
DGP3.1, N =24
Bias 0.000 -0.144 -0.004 0.000 0.000
SD 0.577 0.362 0.281 0.433 0.387
MSE 0.577 0.390 0.281 0.433 0.387
CI Coverage (HC2, Student-t) 0.943 0.949 0.843 0.959 0.712
CI Coverage (HC2, Satterthwaite) 0.948 0.959 0.917 0.965 0.810
CI Coverage (BC-HC2, Student-t) 0.960 0.876
CI Coverage (BC-HC2, Satterthwaite) 0.966 0.936
DGP3.2, N =24
Bias 0.000 0.144 0.003 0.000 0.000
SD 0.144 0.342 0.129 0.316 0.198
MSE 0.144 0.371 0.129 0.316 0.198
CI Coverage (HC2, Student-t) 1.000 0.963 0.927 0.983 0.804
CI Coverage (HC2, Satterthwaite) 1.000 0.967 0.980 0.985 0.906
CI Coverage (BC-HC2, Student-t) 0.981 0.959
CI Coverage (BC-HC2, Satterthwaite) 0.983 0.992
DGP3.3, N =24
Bias 0.000 0.000 -0.001 0.000 0.000
SD 0.433 0.109 0.164 0.168 0.210
MSE 0.433 0.109 0.164 0.168 0.210
CI Coverage (HC2, Student-t) 0.941 0.942 0.857 0.817 0.750
CI Coverage (HC2, Satterthwaite) 0.948 0.954 0.936 0.841 0.859
CI Coverage (BC-HC2, Student-t) 0.870 0.874
CI Coverage (BC-HC2, Satterthwaite) 0.896 0.939
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals; coverage is reported as a proportion (multiply by 100 for percentage points). CI Coverage (HC2, Student-t) is calculated using the original OLS residuals. CI Coverage (BC-HC2, Student-t) and CI Coverage (BC-HC2, Satterthwaite) are calculated using the recomputed OLS residuals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].
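One reading of "recomputed OLS residuals" in the note is that the residuals are re-formed after the bias-corrected treatment coefficient is substituted into the fitted model, and those residuals are then passed to the HC2/HC3 formulas. The sketch below encodes that reading only; the treatment-column index and the input tau_bc are illustrative assumptions, not the authors' code:

import numpy as np

def recomputed_residuals(X, y, tau_bc, treat_col=1):
    """Residuals after swapping the OLS treatment coefficient for a
    bias-corrected value tau_bc (illustrative reading of the note; the
    paper's correction itself is defined in the main text)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS fit
    beta_bc = beta.copy()
    beta_bc[treat_col] = tau_bc                   # debiased ATE coefficient
    return y - X @ beta_bc                        # feed to BC-HC2 / BC-HC3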

Table 10: Confidence Interval Statistics for Scheme 3 using HC2
ATE Estimators
HC2 BC-HC2
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP3.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 95.000 95.900 96.500 95.200 96.000 96.600
Interacted (Debiased) 69.100 71.200 81.000 85.700 87.600 93.600
CI Width, Average
Non-interacted (Debiased) 2.130 2.248 2.385 2.181 2.302 2.445
Interacted (Debiased) 0.793 0.837 1.095 1.502 1.585 2.389
CI Width, Median
Non-interacted (Debiased) 2.160 2.280 2.406 2.205 2.327 2.449
Interacted (Debiased) 0.778 0.821 1.006 0.978 1.033 1.225
DGP3.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 97.900 98.300 98.500 97.600 98.100 98.300
Interacted (Debiased) 78.400 80.400 90.600 94.900 95.900 99.200
CI Width, Average
Non-interacted (Debiased) 2.110 2.227 2.362 2.125 2.242 2.381
Interacted (Debiased) 0.487 0.514 0.672 0.852 0.899 1.338
CI Width, Median
Non-interacted (Debiased) 2.129 2.247 2.368 2.150 2.269 2.389
Interacted (Debiased) 0.478 0.505 0.613 0.575 0.607 0.721
DGP3.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 79.200 81.700 84.100 84.400 87.000 89.600
Interacted (Debiased) 72.600 75.000 85.900 85.400 87.400 93.900
CI Width, Average
Non-interacted (Debiased) 0.422 0.446 0.473 0.457 0.483 0.513
Interacted (Debiased) 0.487 0.514 0.672 0.816 0.862 1.277
CI Width, Median
Non-interacted (Debiased) 0.417 0.440 0.465 0.435 0.459 0.483
Interacted (Debiased) 0.478 0.505 0.613 0.554 0.584 0.695
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is $\binom{24}{8}$.
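Because the design assigns 8 of 24 units to treatment, the $\binom{24}{8}$ simulation count presumably corresponds to enumerating the randomization distribution exactly. A sketch of such an enumeration for the difference-in-means estimator (the potential outcomes below are illustrative):

import numpy as np
from itertools import combinations

# Illustrative potential outcomes for n = 24 units with a constant effect.
rng = np.random.default_rng(0)
y0 = rng.uniform(size=24)
y1 = y0 + 0.5

estimates = []
for treated in combinations(range(24), 8):        # all C(24, 8) = 735,471 assignments
    t = np.zeros(24, dtype=bool)
    t[list(treated)] = True
    estimates.append(y1[t].mean() - y0[~t].mean())

estimates = np.array(estimates)
bias = estimates.mean() - 0.5                     # exact bias over the design
sd = estimates.std(ddof=0)                        # exact SD over the design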

Table 11: Confidence Interval Statistics for Scheme 3 using HC3
ATE Estimators
HC3 BC-HC3
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP3.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 96.700 97.200 97.500 96.800 97.300 97.600
Interacted (Debiased) 87.900 89.000 95.100 95.800 96.600 99.100
CI Width, Average
Non-interacted (Debiased) 2.514 2.653 2.842 2.557 2.699 2.895
Interacted (Debiased) 1.793 1.893 4.927 4.118 4.347 14.929
CI Width, Median
Non-interacted (Debiased) 2.512 2.651 2.816 2.551 2.692 2.854
Interacted (Debiased) 1.385 1.462 2.136 1.606 1.695 2.568
DGP3.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 98.500 98.700 98.900 98.300 98.500 98.700
Interacted (Debiased) 93.000 93.800 98.600 99.000 99.300 100.000
CI Width, Average
Non-interacted (Debiased) 2.490 2.628 2.815 2.491 2.629 2.818
Interacted (Debiased) 0.989 1.044 2.624 2.280 2.406 8.289
CI Width, Median
Non-interacted (Debiased) 2.487 2.625 2.782 2.498 2.636 2.793
Interacted (Debiased) 0.787 0.831 1.219 0.906 0.956 1.447
DGP3.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 85.000 87.000 89.300 90.100 92.100 94.200
Interacted (Debiased) 90.000 91.300 97.200 95.600 96.500 99.100
CI Width, Average
Non-interacted (Debiased) 0.489 0.516 0.552 0.526 0.555 0.596
Interacted (Debiased) 0.989 1.044 2.624 2.133 2.251 7.610
CI Width, Median
Non-interacted (Debiased) 0.479 0.506 0.538 0.499 0.526 0.558
Interacted (Debiased) 0.787 0.831 1.219 0.883 0.932 1.417
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. HC3 refers to the approximate jackknife robust standard error estimator [MW85]. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21]. The number of simulations is $\binom{24}{8}$.

B.5 Scheme 4

Table 12: Simulation Results for Scheme 4
ATE Estimators
Unadjusted OLS Debiased
Non-Int. Interacted Non-Int. Interacted
DGP4.1, N =24
Bias 0.000 -0.060 -0.042 0.000 0.000
SD 0.577 0.438 0.692 0.524 0.570
MSE 0.577 0.442 0.693 0.524 0.570
CI Coverage (HC2, Student-t) 0.862 0.849 0.801 0.832 0.803
CI Coverage (HC2, Satterthwaite) 0.872 0.858 0.856 0.843 0.860
CI Coverage (BC-HC2, Student-t) 0.836 0.841
CI Coverage (BC-HC2, Satterthwaite) 0.847 0.890
DGP4.2, N =24
Bias -0.000 0.061 0.028 0.000 0.000
SD 0.144 0.325 0.317 0.303 0.256
MSE 0.144 0.331 0.318 0.303 0.256
CI Coverage (HC2, Student-t) 1.000 0.933 0.907 0.954 0.930
CI Coverage (HC2, Satterthwaite) 1.000 0.940 0.974 0.960 0.991
CI Coverage (BC-HC2, Student-t) 0.954 0.952
CI Coverage (BC-HC2, Satterthwaite) 0.961 0.994
DGP4.3, N =24
Bias -0.000 0.001 -0.014 0.000 0.000
SD 0.433 0.272 0.405 0.329 0.355
MSE 0.433 0.272 0.405 0.329 0.355
CI Coverage (HC2, Student-t) 0.952 0.948 0.828 0.895 0.851
CI Coverage (HC2, Satterthwaite) 0.961 0.960 0.921 0.916 0.926
CI Coverage (BC-HC2, Student-t) 0.907 0.868
CI Coverage (BC-HC2, Satterthwaite) 0.927 0.929
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals; coverage is reported as a proportion (multiply by 100 for percentage points). CI Coverage (HC2, Student-t) is calculated using the original OLS residuals. CI Coverage (BC-HC2, Student-t) and CI Coverage (BC-HC2, Satterthwaite) are calculated using the recomputed OLS residuals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

Table 13: Confidence Interval Statistics for Scheme 4 using HC2
ATE Estimators
HC2 BC-HC2
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP4.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 82.100 83.200 84.300 82.400 83.600 84.700
Interacted (Debiased) 79.000 80.300 86.000 82.800 84.100 89.000
CI Width, Average
Non-interacted (Debiased) 2.101 2.217 2.355 2.135 2.254 2.395
Interacted (Debiased) 1.898 2.004 2.625 2.412 2.546 3.616
CI Width, Median
Non-interacted (Debiased) 2.166 2.286 2.419 2.196 2.318 2.449
Interacted (Debiased) 1.822 1.923 2.293 1.952 2.060 2.458
DGP4.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 94.600 95.400 96.000 94.600 95.400 96.100
Interacted (Debiased) 92.000 93.000 99.100 94.200 95.200 99.400
CI Width, Average
Non-interacted (Debiased) 1.983 2.093 2.220 1.990 2.101 2.229
Interacted (Debiased) 1.194 1.260 1.655 1.424 1.503 2.103
CI Width, Median
Non-interacted (Debiased) 2.041 2.154 2.273 2.051 2.164 2.283
Interacted (Debiased) 1.155 1.219 1.460 1.193 1.260 1.497
DGP4.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 87.500 89.500 91.600 88.700 90.700 92.700
Interacted (Debiased) 83.100 85.100 92.600 85.000 86.800 92.900
CI Width, Average
Non-interacted (Debiased) 1.051 1.109 1.176 1.068 1.127 1.197
Interacted (Debiased) 1.194 1.260 1.655 1.443 1.523 2.132
CI Width, Median
Non-interacted (Debiased) 1.028 1.085 1.148 1.050 1.108 1.170
Interacted (Debiased) 1.155 1.219 1.460 1.204 1.271 1.505
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

Table 14: Confidence Interval Statistics for Scheme 4 using HC3
ATE Estimators
HC3 BC-HC3
Z Student-t Satterthwaite Z Student-t Satterthwaite
DGP4.1, N =24
Coverage, Percentage
Non-interacted (Debiased) 85.400 86.500 87.600 85.600 86.700 87.900
Interacted (Debiased) 89.800 90.500 95.100 92.200 92.800 96.600
CI Width, Average
Non-interacted (Debiased) 2.533 2.673 2.866 2.565 2.707 2.905
Interacted (Debiased) 4.225 4.459 11.551 5.853 6.178 18.780
CI Width, Median
Non-interacted (Debiased) 2.561 2.703 2.880 2.595 2.739 2.913
Interacted (Debiased) 3.143 3.317 4.784 3.276 3.458 5.019
DGP4.2, N =24
Coverage, Percentage
Non-interacted (Debiased) 96.300 97.000 97.600 96.500 97.100 97.800
Interacted (Debiased) 96.900 97.400 100.000 98.400 98.800 100.000
CI Width, Average
Non-interacted (Debiased) 2.382 2.514 2.693 2.388 2.521 2.701
Interacted (Debiased) 2.363 2.494 6.239 3.118 3.291 9.625
CI Width, Median
Non-interacted (Debiased) 2.425 2.559 2.715 2.432 2.567 2.725
Interacted (Debiased) 1.831 1.932 2.824 1.872 1.976 2.895
DGP4.3, N =24
Coverage, Percentage
Non-interacted (Debiased) 92.100 93.600 95.400 93.100 94.500 96.100
Interacted (Debiased) 93.500 94.500 98.300 94.200 95.100 98.500
CI Width, Average
Non-interacted (Debiased) 1.213 1.281 1.370 1.228 1.296 1.388
Interacted (Debiased) 2.363 2.494 6.239 3.258 3.438 10.277
CI Width, Median
Non-interacted (Debiased) 1.174 1.239 1.320 1.198 1.264 1.343
Interacted (Debiased) 1.831 1.932 2.824 1.891 1.996 2.930
• Note: CI Coverage measures the empirical coverage rates of nominal 95 percent confidence intervals. HC3 refers to the approximate jackknife robust standard error estimator [MW85]. Both Student-t and Satterthwaite adjustments are calculated using the R package clubSandwich [Pus21].

References

• [AI17] Susan Athey and Guido W. Imbens. The econometrics of randomized experiments. In Handbook of Economic Field Experiments, volume 1, pages 73–140. Elsevier, 2017.
• [ALO09] Joshua Angrist, Daniel Lang, and Philip Oreopoulos. Incentives and services for college achievement: Evidence from a randomized trial. American Economic Journal: Applied Economics, 1(1):136–163, 2009.
• [AM13] Peter M. Aronow and Joel A. Middleton. A class of unbiased estimators of the average treatment effect in randomized experiments. Journal of Causal Inference, 1(1):135–154, 2013.
• [AP08] Joshua D. Angrist and Jörn-Steffen Pischke. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press, 2008.
• [Bli73] Alan S. Blinder. Wage discrimination: Reduced form and structural estimates. Journal of Human Resources, 8(4):436–455, 1973.
• [BM02] Robert M. Bell and Daniel F. McCaffrey. Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28(2):169–182, 2002.
• [DC18] Angus Deaton and Nancy Cartwright. Understanding and misunderstanding randomized controlled trials. Social Science & Medicine, 210:2–21, 2018.
• [DGK07] Esther Duflo, Rachel Glennerster, and Michael Kremer. Using randomization in development economics research: A toolkit. Handbook of Development Economics, 4:3895–3962, 2007.
• [Dun12] Thad Dunning. Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge University Press, 2012.
• [Eic67] Friedhelm Eicker. Limit theorems for regressions with unequal and dependent errors. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pages 59–82. University of California Press, 1967.
• [Fre08a] David A. Freedman. On regression adjustments to experimental data. Advances in Applied Mathematics, 40(2):180–193, 2008.
• [Fre08b] David A. Freedman. On regression adjustments in experiments with several treatments. The Annals of Applied Statistics, 2(1):176–196, 2008.
• [Gle17] Rachel Glennerster. The practicalities of running randomized evaluations: Partnerships, measurement, ethics, and transparency. In Handbook of Economic Field Experiments, volume 1, pages 175–243. Elsevier, 2017.
• [Hub67] Peter J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions. In L. M. Le Cam and J. Neyman, editors, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 1967. University of California Press.
• [Imb10] Guido W. Imbens. Better LATE than nothing: Some comments on Deaton (2009) and Heckman and Urzua (2009). Journal of Economic Literature, 48(2):399–423, 2010.
• [IR15] Guido W. Imbens and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, 2015.
• [Lin13a] Winston Lin. Agnostic notes on regression adjustments to experimental data: Reexamining Freedman's critique. The Annals of Applied Statistics, 7(1):295–318, 2013.
• [Lin13b] Winston Lin. Essays on Causal Inference in Randomized Experiments. PhD thesis, University of California, Berkeley, 2013.
• [LR11] John A. List and Imran Rasul. Field experiments in labor economics. In Handbook of Labor Economics, volume 4, pages 103–228. Elsevier, 2011.
• [MSY13] Luke W. Miratrix, Jasjeet S. Sekhon, and Bin Yu. Adjusting treatment effect estimates by post-stratification in randomized experiments. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(2):369–396, 2013.
• [MW85] James G. MacKinnon and Halbert White. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29(3):305–325, 1985.
• [Oax73] Ronald Oaxaca. Male-female wage differentials in urban labor markets. International Economic Review, 14(3):693–709, 1973.
• [Pus21] James Pustejovsky. clubSandwich: Cluster-Robust (Sandwich) Variance Estimators with Small-Sample Corrections, 2021. R package version 0.5.3.
• [Rec21] Ben Recht. Effect size is significantly more important than statistical significance. Blog post, 2021.
• [Rub90] Donald B. Rubin. Formal modes of statistical inference for causal effects. Journal of Statistical Planning and Inference, 25(3):279–292, 1990.
• [SA12] Cyrus Samii and Peter M. Aronow. On equivalencies between design-based and regression-based variance estimators for randomized experiments. Statistics & Probability Letters, 82(2):365–370, 2012.
• [Sat46] Franklin E. Satterthwaite. An approximate distribution of estimates of variance components. Biometrics Bulletin, 2(6):110–114, 1946.
• [SNDS90] Jerzy Splawa-Neyman, Dorota M. Dabrowska, and T. P. Speed. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4):465–472, 1923 [1990].
• [TE93] Robert J. Tibshirani and Bradley Efron. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability, 57:1–436, 1993.
• [WGB18] Edward Wu and Johann A. Gagnon-Bartsch. The LOOP estimator: Adjusting for covariates in randomized experiments. Evaluation Review, 42(4):458–488, 2018.
• [Whi80] Halbert White. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4):817–838, 1980.