Individual Causal Inference
Using Panel Data With Multiple Outcomes

Wei Tian
UNSW
(August 24, 2021)
Abstract

Policy evaluation in empirical microeconomics has focused on estimating the average treatment effect and, more recently, heterogeneous treatment effects, often relying on the unconfoundedness assumption. We propose a method based on the interactive fixed effects model to estimate treatment effects at the individual level, which allows both the treatment assignment and the potential outcomes to be correlated with the unobserved individual characteristics. This method is suitable for panel datasets where multiple related outcomes are observed for a large number of individuals over a small number of time periods. Monte Carlo simulations show that our method outperforms related methods. To illustrate our method, we provide an example of estimating the effect of health insurance coverage on individual usage of hospital emergency departments using the Oregon Health Insurance Experiment data. We find heterogeneous treatment effects in the sample. Comparisons between different groups show that the individuals who would have fewer emergency-department visits if covered by health insurance were younger and not in very poor physical condition. However, their access to primary care was limited because they were in much more disadvantaged financial positions, which made them resort to the emergency department as their usual place for medical care. Health insurance coverage might have decreased emergency-department use among this group by increasing access to primary care and possibly leading to improved health. In contrast, the individuals who would have more emergency-department visits if covered by health insurance were more likely to be older and in poor health. So even with access to primary care, they still used emergency departments more often for severe conditions, although sometimes for primary-care-treatable and non-emergent conditions as well. Health insurance coverage might have increased their emergency-department use by reducing the out-of-pocket cost of the visits.

1 Introduction

The main focus of the policy evaluation literature has been the average treatment effect and, more recently, the heterogeneous treatment effects or conditional average treatment effects, which are the average treatment effects for heterogeneous subgroups defined by the observed covariates (for reviews of these methods, see Athey and Imbens, 2017; Abadie and Cattaneo, 2018). Ubiquitous in these studies is the unconfoundedness assumption, or the strong ignorability assumption, which requires all the covariates correlated with both the potential outcomes and the treatment assignment to be observed (Rosenbaum and Rubin, 1983). [Footnote 1: This is also known as selection on observables or the conditional independence assumption.] Under this assumption, the potential outcomes and the treatment status are independent conditional on the observed covariates, and the difference between the mean outcomes of the treated and the untreated groups with the same values of the observed covariates is an unbiased estimator of the average treatment effect for the units in the groups. The unconfoundedness assumption is satisfied in randomised controlled experiments, but may not be plausible otherwise even with a rich set of covariates, since researchers' access to certain essential individual characteristics remains limited due to privacy or ethical concerns, despite the explosive growth of data availability in the big data era.

One popular method to circumvent the unconfoundedness assumption is difference-in-differences (DID), which assumes that the effect of the unobserved confounder on the untreated potential outcome is constant over time, so that the average outcomes of the treated and untreated units would follow parallel trends in the absence of the treatment. [Footnote 2: Alternative methods that do not rely on the unconfoundedness assumption include the instrumental variables method and the regression discontinuity design, which estimate the average treatment effect for specific subpopulations (the compliers or those with values of the running variable near the cutoff).] This is also a strong assumption, and in many cases is not supported by data. The interactive fixed effects model relaxes the “parallel trends” assumption and allows the unobserved confounders to have time-varying effects on the outcomes, by modelling them using an interactive fixed effects term, which incorporates the additive unit and time fixed effects model or difference-in-differences as a special case (Bai, 2009).

Several methods have been developed based on the interactive fixed effects model to estimate the treatment effect on a single or several treated units, where the units are observed over an extended period of time before the treatment (Abadie et al., 2010; Hsiao et al., 2012; Xu, 2017). These methods exploit the cross-sectional correlations attributed to the unobserved common factors to predict the counterfactual outcomes for the treated units, and are mainly used in macroeconomic settings with a large number of pretreatment periods, which is crucial for the results to be credible. For example, Abadie et al. (2015) point out that “the applicability of the method requires a sizable number of preintervention periods” and that “we do not recommend using this method when the pretreatment fit is poor or the number of pretreatment periods is small”, while Xu (2017) states that users should be cautious when there are fewer than 10 pretreatment periods. As a consequence, despite the potential to estimate individual treatment effects without imposing the unconfoundedness assumption, these methods have not seen much use in empirical microeconomics, since individuals are rarely tracked for enough periods to justify the use of these methods.

The main contribution of this paper is that we propose a method for estimating the individual treatment effects in applied microeconomic settings, characterised by multiple related outcomes being observed for a large number of individuals over a small number of time periods. The method is based on the interactive fixed effects model, which assumes that an outcome of interest can be well approximated by a linear combination of a small number of observed and unobserved individual characteristics. Analogous to Hsiao et al. (2012), who predict the posttreatment outcomes using pretreatment outcomes in lieu of the unobserved time factors, we use a subsample of the pretreatment outcomes to replace the unobserved individual characteristics in the models, and use the remaining pretreatment outcomes as instrumental variables. Although our method does not require a large number of pretreatment periods, the number of pretreatment outcomes needs to be at least as large as the number of unobserved individual characteristics, which may still be difficult to satisfy in microeconomic datasets if we use only a single outcome, especially if the treatment assignment took place in the early stages of the survey or if the study subjects are children or youths. Utilising multiple related outcomes allows our method to be applicable in cases where there is only a single period before the treatment. Under the assumption that these outcomes depend on roughly the same set of observed covariates and unobserved individual characteristics with time-varying and outcome-specific coefficients shared by all individuals, our method exploits the correlations across related outcomes and over time, which are induced by the unobserved individual characteristics, to predict the counterfactual outcomes and estimate the treatment effects for each individual in the posttreatment periods.

Our method has several advantages. First, with the assumption on the model specification, it relaxes the arguably much stronger unconfoundedness assumption, and allows the treatment assignment to be correlated with the unobserved individual characteristics. Second, it enables the estimation of treatment effects at the individual level, which may be helpful for designing more individualised policies to maximise social welfare, as well as for other fields such as precision medicine and individualised marketing. It also has the potential to be combined with more flexible machine learning methods to work with big datasets and more general nonlinear functional forms. Third, it is intuitive. In real life, we may never know a person through and through, and a viable approach to predicting the outcome of a person is using his or her related outcomes in the past, assuming that the outcomes are affected by the underlying individual characteristics and that these characteristics are stable over time, at least within the study period. For example, past academic performance is an important consideration when admitting a student into college, as it is believed that a student who excelled in the past is likely to continue to perform well. To the extent that we may never observe all the confounders, this is perhaps the only way to predict potential outcomes and estimate treatment effects at the individual level in social sciences without going deeper to the levels of neuroscience or biology. Fourth, our method has wide applicability, as it is common to have multiple related outcomes collected in microeconomic data. For example, we may observe several health-related outcomes such as health facility usage, health-related cost, general health, etc.

The rest of the study is organised as follows. Section 2 presents the theoretical framework. Section 3 examines the small sample performance of our method using Monte Carlo simulation, and compares it with related methods. Section 4 provides an empirical example of estimating the effect of health insurance coverage on individual usage of hospital emergency departments using the Oregon Health Insurance Experiment data. Section 5 concludes and discusses potential directions for future research. The proofs are collected in the appendix.

2 Theory

2.1 Set Up

Suppose that we observe $K$ outcomes in domain $\mathcal{K}=\{1,2,\dots,K\}$ for $N$ individuals or units over $T\geq 2$ time periods, where a domain refers to a collection of related outcomes that depend on the same set of observed covariates and unobserved characteristics. For example, health-related outcomes may be affected by observed covariates such as age, education, occupation and income, as well as unobserved individual characteristics such as genetic inheritance, health habits and risk preferences. Assume that the $N_{1}$ individuals in the treated group $\mathcal{T}$ receive the treatment at period $T_{0}+1\leq T$ and remain treated afterwards, while the $N_{0}=N-N_{1}$ individuals in the control group $\mathcal{C}$ remain untreated throughout the $T$ periods. Denoting the binary treatment status for individual $i$ at time $t$ as $D_{it}$, we have $D_{it}=1$ for $i\in\mathcal{T}$ and $t>T_{0}$, and $D_{it}=0$ otherwise.

Following the “Rubin Causal Model” (Rubin, 1974), the treatment effect on outcome $k\in\mathcal{K}$ for individual $i$ at time $t$ is given by the difference between the treated and untreated potential outcomes

$$\tau_{it,k}=Y_{it,k}^{1}-Y_{it,k}^{0}, \tag{1}$$

where $Y_{it,k}^{1}$ is the treated potential outcome, the outcome that we would observe for individual $i$ at time $t$ if $D_{it}=1$, and $Y_{it,k}^{0}$ is the untreated potential outcome, the outcome that we would observe if $D_{it}=0$. Instead of assuming the unconfoundedness condition, we characterise the two potential outcomes for individual $i$ at time $t$ and $k\in\mathcal{K}$ using the interactive fixed effects models:

$$Y_{it,k}^{1}=\boldsymbol{X}_{it}^{\prime}\boldsymbol{\beta}_{t,k}^{1}+\boldsymbol{\mu}_{i}^{\prime}\boldsymbol{\lambda}_{t,k}^{1}+\varepsilon_{it,k}^{1}, \tag{2}$$

$$Y_{it,k}^{0}=\boldsymbol{X}_{it}^{\prime}\boldsymbol{\beta}_{t,k}^{0}+\boldsymbol{\mu}_{i}^{\prime}\boldsymbol{\lambda}_{t,k}^{0}+\varepsilon_{it,k}^{0}, \tag{3}$$

where $\boldsymbol{X}_{it}$ is the $r\times 1$ vector of observed covariates unaffected by the treatment, $\boldsymbol{\mu}_{i}$ is the $f\times 1$ vector of unobserved individual characteristics, $\boldsymbol{\beta}_{t,k}^{1}$ and $\boldsymbol{\lambda}_{t,k}^{1}$ are the $r\times 1$ and $f\times 1$ vectors of coefficients of $\boldsymbol{X}_{it}$ and $\boldsymbol{\mu}_{i}$ respectively for the treated potential outcome, $\boldsymbol{\beta}_{t,k}^{0}$ and $\boldsymbol{\lambda}_{t,k}^{0}$ are the coefficients for the untreated potential outcome, and $\varepsilon_{it,k}^{1}$ and $\varepsilon_{it,k}^{0}$ are the idiosyncratic shocks.
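To make the structure of models (1)-(3) concrete, the following is a minimal simulation sketch for a single period $t$ and outcome $k$. The dimensions, coefficient values, error scales and the logistic selection rule are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: N individuals, r covariates, f unobserved characteristics.
N, r, f = 500, 2, 3

X = rng.normal(size=(N, r))    # observed covariates X_it
mu = rng.normal(size=(N, f))   # unobserved individual characteristics mu_i

# Coefficients for one period t and one outcome k (arbitrary values).
beta0, beta1 = rng.normal(size=r), rng.normal(size=r)
lam0, lam1 = rng.normal(size=f), rng.normal(size=f)

eps0 = rng.normal(scale=0.5, size=N)   # idiosyncratic shocks
eps1 = rng.normal(scale=0.5, size=N)

Y1 = X @ beta1 + mu @ lam1 + eps1      # treated potential outcome, equation (2)
Y0 = X @ beta0 + mu @ lam0 + eps0      # untreated potential outcome, equation (3)
tau = Y1 - Y0                          # individual treatment effect, equation (1)

# Selection on unobservables (assumed rule): treatment probability depends on mu.
p = 1.0 / (1.0 + np.exp(-mu[:, 0]))
D = rng.binomial(1, p)
Y_obs = np.where(D == 1, Y1, Y0)       # only one potential outcome is observed
```

In this sketch the treatment indicator `D` is correlated with `mu`, so a comparison of treated and untreated means that conditions only on `X` would generally be confounded.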

Remark 1.

Our models for the potential outcomes are quite general, and incorporate the models in Abadie et al. (2010), Hsiao et al. (2012) and Xu (2017), as well as the additive fixed effects model for difference-in-differences as special cases. [Footnote 3: Specifically, if we assume $\boldsymbol{\beta}_{t,k}^{0}=\boldsymbol{\beta}_{k}^{0}$, model (3) reduces to the model in Bai (2009) and Xu (2017); if we assume $\boldsymbol{X}_{it}=\boldsymbol{X}_{i}$ and the first element of $\boldsymbol{\mu}_{i}$ is 1, model (3) reduces to the model in Abadie et al. (2010); if we assume $\boldsymbol{X}_{it}=\boldsymbol{X}_{i}$ are unobserved and the first element of $\boldsymbol{\lambda}_{t,k}^{0}$ is 1, then model (3) reduces to the model in Hsiao et al. (2012); if we assume $\boldsymbol{\mu}_{i}=[1\enspace a_{i}]^{\prime}$ and $\boldsymbol{\lambda}_{t,k}^{0}=[b_{t}\enspace 1]^{\prime}$, then model (3) reduces to the additive fixed effects model for difference-in-differences.] Note that the related outcomes need not depend on exactly the same set of observed covariates and unobserved individual characteristics. The vectors of outcome-specific and time-varying coefficients may contain zeros, so that outcome $k$ may be affected by some of the observed covariates and unobserved individual characteristics in some periods, but not necessarily by all of them in all periods, as long as there is enough variation in the coefficients over time or across the outcomes. The potential outcomes are also allowed to depend on predictors not included in $\boldsymbol{X}_{it}$ or $\boldsymbol{\mu}_{i}$, as long as they are not correlated with the included predictors and the treatment status so that they can be treated as part of the idiosyncratic shock.
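As a worked check of the last special case mentioned in Remark 1, the interactive term collapses to additive effects when $\boldsymbol{\mu}_{i}=[1\enspace a_{i}]^{\prime}$ and $\boldsymbol{\lambda}_{t,k}^{0}=[b_{t}\enspace 1]^{\prime}$. Reading $a_{i}$ as a unit effect and $b_{t}$ as a time effect is our interpretation of the notation:

```latex
\boldsymbol{\mu}_{i}^{\prime}\boldsymbol{\lambda}_{t,k}^{0}
  = \begin{bmatrix} 1 & a_{i} \end{bmatrix}
    \begin{bmatrix} b_{t} \\ 1 \end{bmatrix}
  = b_{t} + a_{i},
\qquad\text{so that}\qquad
Y_{it,k}^{0}
  = \boldsymbol{X}_{it}^{\prime}\boldsymbol{\beta}_{t,k}^{0}
    + a_{i} + b_{t} + \varepsilon_{it,k}^{0}.
```

Under this restriction the unobserved characteristic shifts the untreated outcome by the same amount in every period, which is the parallel-trends structure that difference-in-differences relies on.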

The regularity conditions on the observed covariates and the unobserved individual characteristics are stated in Assumption 1, and the assumptions on the idiosyncratic shocks are given in Assumption 2.

Assumption 1.

  1) $\boldsymbol{X}_{it}$, $\boldsymbol{\mu}_{i}$ are independent for all $i$, and are identically distributed for all $i\in\mathcal{T}$ and all $i\in\mathcal{C}$ respectively;

  2) There exists $M\in[0,\infty)$ such that $\mathbb{E}\|\boldsymbol{X}_{it}\|^{4}<M$ and $\mathbb{E}\|\boldsymbol{\mu}_{i}\|^{4}<M$.

Assumption 2.

For $d\in\{0,1\}$, we have

  1) $\mathbb{E}\left(\varepsilon_{it,k}^{d}\mid\boldsymbol{X}_{js},\boldsymbol{\mu}_{j},D_{js}\right)=0$ for all $i$, $j$, $t$, $s$ and $k$;

  2) $\varepsilon_{it,k}^{d}$ are independent across $i$ and $t$;

  3) $\mathbb{E}\left(\varepsilon_{it,k}^{d}\varepsilon_{it,l}^{d}\right)=\sigma_{t,kl}^{d}$ for all $i$, $t$, $k$, $l$;

  4) There exists $M\in[0,\infty)$ such that $\mathbb{E}|\varepsilon_{it,k}^{d}|^{4}<M$ for all $i$, $t$, $k$.

Remark 2.

The distributions of the observed covariates and the unobserved individual characteristics are allowed to differ for the treated and untreated individuals, i.e., selection on unobservables is allowed, which is a great advantage over the policy evaluation methods that rely on the unconfoundedness condition. The idiosyncratic shocks are assumed to have zero mean conditional on the observed covariates, unobserved individual characteristics and the treatment status. They are also assumed to be independent across individuals and time periods, as the unobserved interactive fixed effects that account for the cross-sectional and time-serial correlations have been separated out. [Footnote 4: The idiosyncratic shocks may be allowed to be correlated both over time and across outcomes, as long as they can be modelled parametrically and removed using a quasi-differencing approach. This is left for future research.] Furthermore, they are assumed to be homoskedastic across individuals for our inference method to be valid. The last part in Assumption 2 is a regularity condition which, together with the conditions in Assumption 1, ensures that the weak law of large numbers and the central limit theorem hold.

Given the models for $Y_{it,k}^{1}$ and $Y_{it,k}^{0}$ in (2) and (3), the individual treatment effect is identified by the observed covariates and the unobserved individual characteristics, i.e., two persons with the same values for these underlying predictors have identical individual treatment effects. Denote the set of observed covariates and unobserved individual characteristics as $\boldsymbol{H}_{it}=\left[\boldsymbol{X}_{it}^{\prime}\enspace\boldsymbol{\mu}_{i}^{\prime}\right]^{\prime}$; then the individual treatment effect for individual $i$ with $\boldsymbol{H}_{it}=\boldsymbol{h}_{it}$ is given by

$$\bar{\tau}_{it,k}\equiv\mathbb{E}\left(Y_{it,k}^{1}-Y_{it,k}^{0}\mid\boldsymbol{H}_{it}=\boldsymbol{h}_{it}\right), \tag{4}$$

which may appear similar to the conditional average treatment effect, but differs by conditioning not only on the observed covariates, but also on the unobserved individual characteristics. [Footnote 5: As we assume the parametric models for the potential outcomes in (2) and (3) for all individuals, there is no need to impose additional assumptions on the propensity distribution for the individual treatment effect to be identified on its full support.]

Our goal is to estimate the individual treatment effects $\bar{\tau}_{it,k}$, $i=1,\dots,N$. Once we have estimated the individual treatment effects, the estimates of the average treatment effects for heterogeneous subgroups defined by some observed covariates, also known as the conditional average treatment effects, and the estimate of the average treatment effect for the sample or the population are also readily available as the averages of the estimated individual treatment effects in the corresponding groups.
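The aggregation step described above amounts to taking group means of the estimated individual effects. A minimal sketch follows, where the values in `tau_hat` and the subgroup labels in `group` are made up purely for illustration:

```python
import numpy as np

# Hypothetical estimated individual treatment effects and an observed subgroup label.
tau_hat = np.array([0.5, -1.0, 2.0, 0.0, 1.5, -0.5])
group = np.array(["young", "old", "young", "old", "young", "old"])

# Average treatment effect: mean over all individuals.
ate_hat = tau_hat.mean()

# Conditional average treatment effects: mean within each observed subgroup.
cate_hat = {str(g): tau_hat[group == g].mean() for g in np.unique(group)}
```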

As $\boldsymbol{\mu}_{i}$ is not observed, a direct application of least squares estimation to estimate the models in (2) and (3) would suffer from omitted variables bias. Since we have multiple outcomes that depend on the same set of underlying predictors, and we observe the untreated potential outcomes for all individuals prior to the treatment, we can use these pretreatment outcomes to replace $\boldsymbol{\mu}_{i}$ in the models. [Footnote 6: This is analogous to the first step of the approach in Hsiao et al. (2012), who predict the posttreatment outcomes using pretreatment outcomes in lieu of the unobserved time factors in a small $N$, big $T$ environment.] Stacking the $K$ outcomes observed in $t\leq T_{0}$, we have

$$\boldsymbol{Y}_{it}^{0}=\boldsymbol{\beta}_{t}^{0}\boldsymbol{X}_{it}+\boldsymbol{\lambda}_{t}^{0}\boldsymbol{\mu}_{i}+\boldsymbol{\varepsilon}_{it}^{0}, \tag{5}$$

where $\boldsymbol{Y}_{it}^{0}$ and $\boldsymbol{\varepsilon}_{it}^{0}$ are $K\times 1$, $\boldsymbol{\beta}_{t}^{0}$ is $K\times r$, and $\boldsymbol{\lambda}_{t}^{0}$ is $K\times f$. Let $\mathcal{P}\subseteq\left\{1,\cdots,T_{0}\right\}$ be a set of $P$ pretreatment periods. We can further stack the outcomes over these periods to get

$$\boldsymbol{Y}_{i}^{\mathcal{P}}=\boldsymbol{\delta}_{i}^{\mathcal{P}}+\boldsymbol{\lambda}^{\mathcal{P}}\boldsymbol{\mu}_{i}+\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}, \tag{6}$$

where $\boldsymbol{\delta}_{i}^{\mathcal{P}}=\left[\cdots\enspace\left(\boldsymbol{\beta}_{s}^{0}\boldsymbol{X}_{is}\right)^{\prime}\enspace\cdots\right]^{\prime}$ with $s\in\mathcal{P}$ is $KP\times 1$, $\boldsymbol{\lambda}^{\mathcal{P}}$ is $KP\times f$, and $\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}$ is $KP\times 1$.

To be able to recover $\boldsymbol{\mu}_{i}$ from the covariates and outcomes observed in $\mathcal{P}$, we need the following full rank condition, which ensures that there is enough variation in the effects of the unobserved individual characteristics over time or across different outcomes.

Assumption 3.

${\boldsymbol{\lambda}^{\mathcal{P}}}^{\prime}\boldsymbol{\lambda}^{\mathcal{P}}$ has rank $f$.

Remark 3.

Although we do not require the number of pretreatment outcomes to be large, Assumption 3 implies that $KP$ needs to be at least as large as $f$. As $T_{0}$ (and thus $P$) is usually small in empirical microeconomics, this assumption is made more plausible by having $K>1$, i.e., using multiple related outcomes.
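The role of multiple outcomes in the rank condition can be seen numerically. The sketch below uses randomly drawn loadings with hypothetical dimensions ($K=3$, $P=1$, $f=2$); it only illustrates why $KP\geq f$ is necessary and is not a test one could run on real data, where the loadings are unobserved.

```python
import numpy as np

rng = np.random.default_rng(1)
K, P, f = 3, 1, 2   # hypothetical: 3 outcomes, 1 pretreatment period, 2 factors

lam_P = rng.normal(size=(K * P, f))      # stacked loadings, KP x f
gram = lam_P.T @ lam_P                   # f x f matrix in Assumption 3
rank_multi = np.linalg.matrix_rank(gram)

# With a single outcome and a single period, KP = 1 < f and the condition fails.
lam_single = lam_P[:1, :]
rank_single = np.linalg.matrix_rank(lam_single.T @ lam_single)
```

Here `rank_multi` equals $f$ while `rank_single` does not, even though both use the same single pretreatment period.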

Remark 4.

The number of factors $f$ is generally unknown. To determine $f$, one may use the method in Bai and Ng (2002) when both $N$ and $T$ are large. One may also adopt a cross-validation procedure to choose the $f$ that minimises the out-of-sample mean squared prediction error, as in Xu (2017). Although we do not estimate the interactive fixed effects term directly, we may choose the number of pretreatment outcomes that best accommodates $f$ using cross-validation as well, which will be discussed in more detail later.

Under Assumption 3, we can pre-multiply both sides of equation (6) by $({\boldsymbol{\lambda}^{\mathcal{P}}}^{\prime}\boldsymbol{\lambda}^{\mathcal{P}})^{-1}{\boldsymbol{\lambda}^{\mathcal{P}}}^{\prime}$ to obtain

$$\boldsymbol{\mu}_{i}=({\boldsymbol{\lambda}^{\mathcal{P}}}^{\prime}\boldsymbol{\lambda}^{\mathcal{P}})^{-1}{\boldsymbol{\lambda}^{\mathcal{P}}}^{\prime}(\boldsymbol{Y}_{i}^{\mathcal{P}}-\boldsymbol{\delta}_{i}^{\mathcal{P}}-\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}). \tag{7}$$
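Equation (7) is an algebraic identity rather than a feasible estimator, since the shocks and loadings are not observed. A minimal numerical sketch with made-up dimensions and values shows that the left inverse recovers $\boldsymbol{\mu}_{i}$ exactly when the shocks are subtracted, and only up to a projected error otherwise:

```python
import numpy as np

rng = np.random.default_rng(2)
K, P, f, r = 4, 1, 2, 2           # hypothetical dimensions with KP >= f
KP = K * P

beta_P = rng.normal(size=(KP, r))  # stacked beta_s^0 for s in P (here P = 1)
lam_P = rng.normal(size=(KP, f))   # stacked loadings lambda^P
X_i = rng.normal(size=r)
mu_i = rng.normal(size=f)
eps_P = rng.normal(scale=0.1, size=KP)

delta_P = beta_P @ X_i
Y_P = delta_P + lam_P @ mu_i + eps_P                 # equation (6)

# Left inverse (lambda' lambda)^{-1} lambda', computed without forming an explicit inverse.
proj = np.linalg.solve(lam_P.T @ lam_P, lam_P.T)

mu_exact = proj @ (Y_P - delta_P - eps_P)            # equation (7)
mu_noisy = proj @ (Y_P - delta_P)                    # what remains if eps is ignored
```

The gap `mu_noisy - mu_i` equals the projected shock `proj @ eps_P`, which is the source of the correlation between regressors and errors discussed in the estimation section.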

Substituting (7) into $Y_{it,k}^{0}=\boldsymbol{X}_{it}^{\prime}\boldsymbol{\beta}_{t,k}^{0}+\boldsymbol{\mu}_{i}^{\prime}\boldsymbol{\lambda}_{t,k}^{0}+\varepsilon_{it,k}^{0}$, $t>T_{0}$, and with a slight abuse of notation by omitting the superscript $\mathcal{P}$ on the new coefficients and error term, we have

$$Y_{it,k}^{0}=\boldsymbol{X}_{it}^{\prime}\boldsymbol{\beta}_{t,k}^{0}\underbrace{-\cdots-\boldsymbol{X}_{is}^{\prime}\boldsymbol{\alpha}_{st,k}^{0}-\cdots}_{P\ \text{terms}}+{\boldsymbol{Y}_{i}^{\mathcal{P}}}^{\prime}\boldsymbol{\gamma}_{t,k}^{0}+e_{it,k}^{0}, \tag{8}$$

where

$$\boldsymbol{\alpha}_{st,k}^{0}={\boldsymbol{\beta}_{s}^{0}}^{\prime}\boldsymbol{\lambda}^{0}_{s}({\boldsymbol{\lambda}^{\mathcal{P}}}^{\prime}\boldsymbol{\lambda}^{\mathcal{P}})^{-1}\boldsymbol{\lambda}^{0}_{t,k},\enspace s\in\mathcal{P},$$

$$\boldsymbol{\gamma}_{t,k}^{0}=\boldsymbol{\lambda}^{\mathcal{P}}({\boldsymbol{\lambda}^{\mathcal{P}}}^{\prime}\boldsymbol{\lambda}^{\mathcal{P}})^{-1}\boldsymbol{\lambda}^{0}_{t,k},$$

$$e_{it,k}^{0}=\varepsilon_{it,k}^{0}-{\boldsymbol{\gamma}_{t,k}^{0}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}.$$
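The substitution that leads to equation (8) can be verified numerically. The sketch below draws arbitrary parameters for a hypothetical case with $K=2$, $P=2$, $f=3$ and $r=2$, builds the coefficients defined above, and checks that the reduced form reproduces the untreated potential outcome from model (3). All numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
K, P, f, r = 2, 2, 3, 2                 # hypothetical dimensions with KP >= f

beta_s = rng.normal(size=(P, K, r))     # beta_s^0 for each s in P, each K x r
lam_s = rng.normal(size=(P, K, f))      # lambda_s^0 for each s in P, each K x f
lam_P = lam_s.reshape(P * K, f)         # stacked loadings lambda^P, KP x f

X_s = rng.normal(size=(P, r))           # covariates in the pretreatment periods
X_t = rng.normal(size=r)                # covariates in the posttreatment period t
mu = rng.normal(size=f)
eps_P = rng.normal(size=P * K)
beta_tk, lam_tk, eps_tk = rng.normal(size=r), rng.normal(size=f), rng.normal()

# Stacked pretreatment outcomes, equation (6).
delta_P = np.concatenate([beta_s[s] @ X_s[s] for s in range(P)])
Y_P = delta_P + lam_P @ mu + eps_P

# Untreated potential outcome from the structural model (3).
Y0 = X_t @ beta_tk + mu @ lam_tk + eps_tk

# Reduced-form coefficients and error term defined below equation (8).
G = np.linalg.inv(lam_P.T @ lam_P)
gamma = lam_P @ G @ lam_tk
alpha = [beta_s[s].T @ lam_s[s] @ G @ lam_tk for s in range(P)]
e = eps_tk - gamma @ eps_P

# Right-hand side of equation (8).
Y0_reduced = X_t @ beta_tk - sum(X_s[s] @ alpha[s] for s in range(P)) + Y_P @ gamma + e
```

The check works because ${\boldsymbol{\lambda}^{\mathcal{P}}}^{\prime}\boldsymbol{\gamma}_{t,k}^{0}=\boldsymbol{\lambda}_{t,k}^{0}$, so the term in $\boldsymbol{Y}_{i}^{\mathcal{P}}$ carries $\boldsymbol{\mu}_{i}^{\prime}\boldsymbol{\lambda}_{t,k}^{0}$ plus pieces that the $\boldsymbol{\alpha}$ terms and $e_{it,k}^{0}$ cancel.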

Let $Z=r(P+1)+KP$. If we denote the $Z\times 1$ vector of observables $[\boldsymbol{X}_{it}^{\prime}\cdots\boldsymbol{X}_{is}^{\prime}\cdots{\boldsymbol{Y}_{i}^{\mathcal{P}}}^{\prime}]^{\prime}$ as $\boldsymbol{Z}_{it}$, and the $Z\times 1$ vector of coefficients $[{\boldsymbol{\beta}_{t,k}^{0}}^{\prime}\cdots{\boldsymbol{\alpha}_{st,k}^{0}}^{\prime}\cdots{\boldsymbol{\gamma}_{t,k}^{0}}^{\prime}]^{\prime}$ as $\boldsymbol{\theta}_{t,k}^{0}$, then equation (8) can be abbreviated as

$$Y_{it,k}^{0}=\boldsymbol{Z}_{it}^{\prime}\boldsymbol{\theta}_{t,k}^{0}+e_{it,k}^{0}. \tag{9}$$

Similarly, substituting (7) into $Y_{it,k}^{1}=\boldsymbol{X}_{it}^{\prime}\boldsymbol{\beta}_{t,k}^{1}+\boldsymbol{\mu}_{i}^{\prime}\boldsymbol{\lambda}_{t,k}^{1}+\varepsilon_{it,k}^{1}$, $t>T_{0}$, we have

$$Y_{it,k}^{1}=\boldsymbol{Z}_{it}^{\prime}\boldsymbol{\theta}_{t,k}^{1}+e_{it,k}^{1}, \tag{10}$$

where

$$\boldsymbol{\theta}_{t,k}^{1}=[{\boldsymbol{\beta}_{t,k}^{1}}^{\prime}\cdots{\boldsymbol{\alpha}_{st,k}^{1}}^{\prime}\cdots{\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}]^{\prime},$$

$$\boldsymbol{\alpha}_{st,k}^{1}={\boldsymbol{\beta}_{s}^{0}}^{\prime}\boldsymbol{\lambda}^{0}_{s}({\boldsymbol{\lambda}^{\mathcal{P}}}^{\prime}\boldsymbol{\lambda}^{\mathcal{P}})^{-1}\boldsymbol{\lambda}_{t,k}^{1},\enspace s\in\mathcal{P},$$

$$\boldsymbol{\gamma}_{t,k}^{1}=\boldsymbol{\lambda}^{\mathcal{P}}({\boldsymbol{\lambda}^{\mathcal{P}}}^{\prime}\boldsymbol{\lambda}^{\mathcal{P}})^{-1}\boldsymbol{\lambda}_{t,k}^{1},$$

$$e_{it,k}^{1}=\varepsilon_{it,k}^{1}-{\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}.$$

2.2 Estimation

Under Assumption 2, we have $\mathbb{E}(e_{it,k}^{1}\mid\boldsymbol{H}_{it})=0$ and $\mathbb{E}(e_{it,k}^{0}\mid\boldsymbol{H}_{it})=0$. This suggests using $\widehat{\tau}_{it,k}=\boldsymbol{Z}_{it}^{\prime}(\widehat{\boldsymbol{\theta}}_{t,k}^{1}-\widehat{\boldsymbol{\theta}}_{t,k}^{0})$, where $\widehat{\boldsymbol{\theta}}_{t,k}^{1}$ and $\widehat{\boldsymbol{\theta}}_{t,k}^{0}$ are some estimators of ${\boldsymbol{\theta}}_{t,k}^{1}$ and ${\boldsymbol{\theta}}_{t,k}^{0}$, to estimate $\bar{\tau}_{it,k}$. Note, however, that the error terms $e_{it,k}^{1}$ and $e_{it,k}^{0}$ are correlated with the regressors, since $\boldsymbol{Z}_{it}$ contains $\boldsymbol{Y}_{i}^{\mathcal{P}}$, which is correlated with $\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}$. This renders the OLS estimators biased and inconsistent, which can be seen as a classical errors-in-variables (measurement error) problem. [Footnote 7: This is noted in Ferman and Pinto (2019) as well, who also suggested using pre-treatment outcomes as instrumental variables to deal with the problem. Our method is also related to the quasi-differencing approach in Holtz-Eakin et al. (1988) and the GMM approach in Ahn et al. (2013). While these studies focus on estimating the coefficients on the observed covariates, our focus is on estimating the individual treatment effects.] We thus use the remaining outcomes as instrumental variables for $\boldsymbol{Y}_{i}^{\mathcal{P}}$ to consistently estimate ${\boldsymbol{\theta}}_{t,k}^{1}$ and ${\boldsymbol{\theta}}_{t,k}^{0}$ in each period, which would then allow us to obtain asymptotically unbiased estimates for the individual treatment effects. [Footnote 8: We may construct the vectors of regressors and instruments differently under alternative assumptions on the dependence structure of the idiosyncratic shocks. For example, if the idiosyncratic shocks are correlated across time but are independent across outcomes, then we can split different outcomes into regressors and instruments. This would be similar to using the characteristics of similar products (Berry et al., 1995) or trading countries (see the Trade-weighted World Income instrument in Acemoglu et al., 2008) as instrumental variables. Incorporating more complex structures of the idiosyncratic shocks in the model is left for future research.] Since the outcomes depend on about the same set of observed and unobserved individual characteristics, the remaining outcomes are strongly correlated with the outcomes included in $\boldsymbol{Y}_{i}^{\mathcal{P}}$. Additionally, given that the idiosyncratic shocks are independent across time, the remaining outcomes are not correlated with $e_{it,k}^{1}$ or $e_{it,k}^{0}$. Thus, both the relevance and exogeneity conditions are satisfied, and the remaining outcomes can serve as valid instrumental variables.
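The attenuation of OLS and its correction by an instrument can be illustrated with a deliberately stripped-down simulation: one unobserved characteristic, no covariates, one pretreatment outcome as regressor and another as instrument. The loading of 2 and the unit variances are assumptions chosen so that the OLS probability limit is easy to compute by hand (it is $2\cdot 1/(1+1)=1$).

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000

mu = rng.normal(size=N)                     # unobserved characteristic (f = 1)
y_reg = mu + rng.normal(size=N)             # pretreatment outcome used as regressor
y_iv = mu + rng.normal(size=N)              # another pretreatment outcome used as instrument
y_post = 2.0 * mu + rng.normal(size=N)      # posttreatment outcome with loading 2

# OLS of y_post on y_reg (all variables have mean zero, so no intercept is needed).
ols_slope = (y_reg @ y_post) / (y_reg @ y_reg)

# Simple IV estimator using y_iv as the instrument for y_reg.
iv_slope = (y_iv @ y_post) / (y_iv @ y_reg)
```

In this design `ols_slope` settles near 1 while `iv_slope` settles near the true loading of 2, because the shock in `y_iv` is independent of the shock in `y_reg`.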

Let $\boldsymbol{R}_{it}=[\boldsymbol{X}_{it}^{\prime}\cdots\boldsymbol{X}_{is}^{\prime}\cdots{\boldsymbol{Y}_{i}^{-\mathcal{P}}}^{\prime}]^{\prime}$ be the $R\times 1$ vector of instruments, where the $(KT-KP-1)\times 1$ vector $\boldsymbol{Y}_{i}^{-\mathcal{P}}$ comprises the remaining pretreatment outcomes as well as the posttreatment outcomes other than $Y_{it,k}$. [Footnote 9: In the special case of $T_{1}=1$ and $T_{0}=1$, we can include $K-1$ pretreatment outcomes as regressors, and use the posttreatment outcomes other than $Y_{it,k}$ as instruments so that $R\geq Z$.] Stacking $\boldsymbol{Z}_{it}$, $\boldsymbol{R}_{it}$ and $Y_{it,k}^{0}$ respectively over the $N_{0}$ untreated individuals, we obtain the $N_{0}\times Z$ matrix of regressors $\boldsymbol{Z}_{t}^{0}$, the $N_{0}\times R$ matrix of instruments $\boldsymbol{R}_{t}^{0}$ and the $N_{0}\times 1$ vector of outcomes $\boldsymbol{Y}_{t,k}^{0}$ for the untreated individuals. We can obtain $\boldsymbol{Z}_{t}^{1}$, $\boldsymbol{R}_{t}^{1}$ and $\boldsymbol{Y}_{t,k}^{1}$ similarly for the $N_{1}$ treated individuals. The GMM estimator for the individual treatment effect $\bar{\tau}_{it,k}$ can then be constructed as

$$\widehat{\tau}_{it,k}=\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{1}-\widehat{\boldsymbol{\theta}}_{t,k}^{0}\right), \tag{11}$$

where

$$\widehat{\boldsymbol{\theta}}_{t,k}^{1}=\left({\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}{\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{Z}_{t}^{1}\right)^{-1}{\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}{\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{Y}_{t,k}^{1}, \tag{12}$$

$$\widehat{\boldsymbol{\theta}}_{t,k}^{0}=\left({\boldsymbol{Z}_{t}^{0}}^{\prime}\boldsymbol{R}_{t}^{0}\boldsymbol{W}^{0}{\boldsymbol{R}_{t}^{0}}^{\prime}\boldsymbol{Z}_{t}^{0}\right)^{-1}{\boldsymbol{Z}_{t}^{0}}^{\prime}\boldsymbol{R}_{t}^{0}\boldsymbol{W}^{0}{\boldsymbol{R}_{t}^{0}}^{\prime}\boldsymbol{Y}_{t,k}^{0}, \tag{13}$$

with $\boldsymbol{W}^{1}$ and $\boldsymbol{W}^{0}$ being some $R\times R$ positive definite matrices.
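Equations (11)-(13) translate directly into a few lines of linear algebra. The sketch below is one possible implementation under simplifying assumptions that are ours, not the paper's: a single unobserved characteristic, no observed covariates, the 2SLS weight $(\boldsymbol{R}^{\prime}\boldsymbol{R})^{-1}$ as the default $\boldsymbol{W}$, and a made-up selection rule. The function name `gmm_theta` is hypothetical.

```python
import numpy as np

def gmm_theta(Z, R, Y, W=None):
    """Linear GMM estimator (Z'R W R'Z)^{-1} Z'R W R'Y, cf. equations (12)-(13)."""
    if W is None:
        # Assumed default: the 2SLS weight (R'R)^{-1}; any positive definite W is allowed.
        W = np.linalg.inv(R.T @ R)
    ZR = Z.T @ R
    return np.linalg.solve(ZR @ W @ ZR.T, ZR @ W @ (R.T @ Y))

rng = np.random.default_rng(5)
N = 20_000
mu = 1.0 + rng.normal(size=N)                      # one unobserved characteristic
D = (mu + rng.normal(size=N) > 1.0).astype(int)    # selection on the unobservable

def noisy_outcome(scale=0.5):
    """A pretreatment outcome with unit loading on mu and an independent shock."""
    return mu + rng.normal(scale=scale, size=N)

Z = noisy_outcome()[:, None]                              # regressor: one pretreatment outcome
R = np.column_stack([noisy_outcome(), noisy_outcome()])   # instruments: two other outcomes
Y0 = 1.0 * mu + rng.normal(scale=0.5, size=N)             # untreated potential outcome
Y1 = 3.0 * mu + rng.normal(scale=0.5, size=N)             # treated potential outcome
Y = np.where(D == 1, Y1, Y0)                              # observed outcome

theta1 = gmm_theta(Z[D == 1], R[D == 1], Y[D == 1])   # fitted on treated individuals
theta0 = gmm_theta(Z[D == 0], R[D == 0], Y[D == 0])   # fitted on untreated individuals
tau_hat = Z @ (theta1 - theta0)                       # equation (11) for every individual
```

In this design the true individual effect is $2\mu_{i}$ with mean 2, and the estimated coefficients land close to the loadings 3 and 1 despite treatment being selected on $\mu_{i}$.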

Remark 5.

Using the residuals $\widehat{\boldsymbol{e}}_{t,k}^{1}=\boldsymbol{Y}_{t,k}^{1}-\boldsymbol{Z}_{t}^{1}\widehat{\boldsymbol{\theta}}_{t,k}^{1}$ and $\widehat{\boldsymbol{e}}_{t,k}^{0}=\boldsymbol{Y}_{t,k}^{0}-\boldsymbol{Z}_{t}^{0}\widehat{\boldsymbol{\theta}}_{t,k}^{0}$, we can further construct the two-step efficient GMM estimator by replacing $\boldsymbol{W}^{1}$ and $\boldsymbol{W}^{0}$ in equations (12) and (13) with $N_{1}\left({\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{U}_{t}^{1}\boldsymbol{R}_{t}^{1}\right)^{-1}$ and $N_{0}\left({\boldsymbol{R}_{t}^{0}}^{\prime}\boldsymbol{U}_{t}^{0}\boldsymbol{R}_{t}^{0}\right)^{-1}$, where $\boldsymbol{U}_{t}^{1}$ and $\boldsymbol{U}_{t}^{0}$ are diagonal matrices with the squared elements of $\widehat{\boldsymbol{e}}_{t,k}^{1}$ and $\widehat{\boldsymbol{e}}_{t,k}^{0}$ on the diagonals.
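The two-step procedure in Remark 5 can be sketched as follows for one group of individuals. The first-step weight $(\boldsymbol{R}^{\prime}\boldsymbol{R}/n)^{-1}$, the function names and the simulated design are assumptions made for illustration; the second-step weight follows the formula in the remark.

```python
import numpy as np

def linear_gmm(Z, R, Y, W):
    """(Z'R W R'Z)^{-1} Z'R W R'Y for a given weighting matrix W."""
    ZR = Z.T @ R
    return np.linalg.solve(ZR @ W @ ZR.T, ZR @ W @ (R.T @ Y))

def two_step_gmm(Z, R, Y):
    """Two-step efficient GMM: first-step fit, then reweight using squared residuals."""
    n = len(Y)
    W1 = np.linalg.inv(R.T @ R / n)            # assumed first-step weight (2SLS)
    theta_first = linear_gmm(Z, R, Y, W1)
    resid = Y - Z @ theta_first
    # R' U R with U = diag(resid^2), computed without forming the n x n matrix U.
    S = R.T @ (R * resid[:, None] ** 2)
    W2 = n * np.linalg.inv(S)                  # n (R' U R)^{-1}, as in Remark 5
    return linear_gmm(Z, R, Y, W2), W2

rng = np.random.default_rng(6)
n = 5_000
mu = rng.normal(size=n)
Z = (mu + rng.normal(scale=0.5, size=n))[:, None]                             # one regressor
R = np.column_stack([mu + rng.normal(scale=0.5, size=n) for _ in range(3)])   # three instruments
Y = 2.0 * mu + rng.normal(scale=0.5, size=n)

theta_hat, W2 = two_step_gmm(Z, R, Y)
```

Multiplying the residuals by the rows of `R` avoids building the diagonal matrix $\boldsymbol{U}$ explicitly, which matters when the group is large.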

Remark 6.

One may also construct the estimators for the individual treatment effects using authentic predicted outcomes obtained from a leave-one-out procedure, where $\boldsymbol{\theta}_{t,k}^{1}$ and $\boldsymbol{\theta}_{t,k}^{0}$ are estimated for each individual using the sample that excludes that individual. This procedure may be computationally expensive, however, as there are no simple linear expressions for the leave-one-out coefficient estimates and residuals like those available in linear regression (Hansen, 2021).

The following result shows that the bias of the GMM estimator for the individual treatment effect in (11) goes away as both the number of treated individuals and the number of untreated individuals become larger.

Proposition 1.

Under Assumptions 1-3, $\mathbb{E}\left(\widehat{\tau}_{it,k}-\tau_{it,k}\mid\boldsymbol{H}_{it}=\boldsymbol{h}_{it}\right)\rightarrow 0$ as $N_{1},N_{0}\rightarrow\infty$.

Once we have the estimates for the individual treatment effects, the average treatment effect $\tau_{t,k}=\mathbb{E}\left(\tau_{it,k}\right)$ can be conveniently estimated using the average of the estimated individual treatment effects $\widehat{\tau}_{t,k}=\frac{1}{N}\sum_{i=1}^{N}\widehat{\tau}_{it,k}$, which can be shown to be consistent.

Proposition 2.

Under Assumptions 1-3, $\widehat{\tau}_{t,k}-\tau_{t,k}\overset{p}{\rightarrow}0$ as $N_{0},N_{1}\rightarrow\infty$, and $\widehat{\tau}_{t,k}-\tau_{t,k}=O_{p}\left(N_{1}^{-1/2}\right)+O_{p}\left(N_{0}^{-1/2}\right)$.

2.3 Model Selection

To satisfy Assumption 3, we need the number of pretreatment outcomes that we include as regressors in the model to be at least as large as $f$. Including more pretreatment outcomes may increase the variance of the estimator by increasing the variances of $\widehat{\boldsymbol{\theta}}_{t,k}^{1}$ and $\widehat{\boldsymbol{\theta}}_{t,k}^{0}$, but may also reduce the variance of the estimator when the sample is large and the variances of $\widehat{\boldsymbol{\theta}}_{t,k}^{1}$ and $\widehat{\boldsymbol{\theta}}_{t,k}^{0}$ are small, since

\[
\left(\boldsymbol{\gamma}_{t,k}^{1}-\boldsymbol{\gamma}_{t,k}^{0}\right)^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}=\frac{1}{KP}\sum_{q\in\mathcal{K}}\sum_{s\in\mathcal{P}}\left(\boldsymbol{\lambda}^{1}_{t,k}-\boldsymbol{\lambda}^{0}_{t,k}\right)^{\prime}\left(\frac{1}{KP}\sum_{l\in\mathcal{K}}\sum_{n\in\mathcal{P}}\boldsymbol{\lambda}_{n,l}^{0}{\boldsymbol{\lambda}_{n,l}^{0}}^{\prime}\right)^{-1}\boldsymbol{\lambda}_{s,q}^{0}\varepsilon_{is,q}^{0} \tag{14}
\]

in the prediction error converges in probability to 0 as $KP$ grows.\footnote{Consistency of the individual treatment effect estimator may also be shown by allowing both $N$ and $KP$ to grow, with restrictions on the relative growth rate, e.g., $\frac{KP}{\min\left(\sqrt{N_{1}},\sqrt{N_{0}}\right)}\rightarrow 0$. We do not pursue this path in this study, as the number of pretreatment outcomes in empirical microeconomics that we focus on is usually not large.}

To select the number of pretreatment outcomes to include in the model, we follow a model selection procedure similar to that in Hsiao et al. (2012), where for each usable number of pretreatment outcomes, we construct many different models by including a random subset of the pretreatment outcomes as regressors and the remaining outcomes as instruments. We then estimate the models using GMM and obtain the leave-one-out prediction errors for all or a subsample of the individuals. The best set of pretreatment outcomes is chosen as the one that minimises the mean squared leave-one-out prediction error.\footnote{An alternative way to select the best set of pretreatment outcomes is to use information criteria such as GMM-BIC and GMM-AIC (Andrews, 1999). To avoid the potential problem of post-selection inference, we may also randomly split the sample into two parts, where we select the best model on one part, and conduct inference on the other.}
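The selection procedure can be sketched as a brute-force leave-one-out search over random regressor/instrument splits. The sketch below is a simplification under stated assumptions: it ignores the observed covariates, uses an identity weighting matrix, refits the model once per left-out individual, and all names (`gmm_fit`, `loo_mse`, `select_split`) and the one-factor toy data are hypothetical.

```python
import numpy as np

def gmm_fit(Z, R, y):
    """Linear GMM with identity weight: (Z'R R'Z)^{-1} Z'R R'y."""
    ZR = Z.T @ R
    return np.linalg.solve(ZR @ ZR.T, ZR @ (R.T @ y))

def loo_mse(Z, R, y):
    """Mean squared leave-one-out prediction error, refitting for each individual."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        theta = gmm_fit(Z[keep], R[keep], y[keep])
        errs[i] = y[i] - Z[i] @ theta
    return float(np.mean(errs ** 2))

def select_split(pre, y, sizes, n_draws, rng):
    """Try random splits of the pretreatment outcomes; keep the lowest LOO error."""
    m = pre.shape[1]
    best = None
    for p in sizes:
        for _ in range(n_draws):
            cols = rng.permutation(m)
            reg, inst = np.sort(cols[:p]), np.sort(cols[p:])
            if len(inst) < p:      # need at least as many instruments as regressors
                continue
            mse = loo_mse(pre[:, reg], pre[:, inst], y)
            if best is None or mse < best[0]:
                best = (mse, p, tuple(reg), tuple(inst))
    return best

# Hypothetical one-factor toy data: 4 pretreatment outcomes, 1 posttreatment outcome.
rng = np.random.default_rng(2)
n, m = 80, 4
mu = rng.normal(size=n)
pre = np.outer(mu, rng.normal(1.0, 1.0, size=m)) + rng.normal(size=(n, m))
y = 1.5 * mu + rng.normal(size=n)
best = select_split(pre, y, sizes=(1, 2), n_draws=5, rng=rng)
```

The returned tuple holds the leave-one-out error, the chosen number of regressors, and the column indices used as regressors and as instruments.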

In addition to the models using only a subset of the pretreatment outcomes, we also consider averaging different models that use the same number of pretreatment outcomes. Since the estimators constructed using only a subset of the pretreatment outcomes are asymptotically unbiased, as long as the number of pretreatment outcomes is larger than $f$, this property is passed on to the averaged estimator. The averaged estimator may also be more efficient as it uses more information in the sample and reduces uncertainty caused by a small number of sample splits.\footnote{We stick with simple averaging in this paper. A more flexible averaging scheme, e.g., with larger weights on the models with smaller out-of-sample prediction errors, would be an interesting direction for future research.} The leave-one-out prediction errors are also averaged over the models, and the best number of pretreatment outcomes to be used for the averaged estimator is similarly determined by minimising the mean squared leave-one-out prediction error.

2.4 Related methods

2.4.1 Linear conditional mean

An alternative approach to estimating the treatment effects is to follow Hsiao et al. (2012) and assume that

\[
\mathbb{E}\left(\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}\mid\boldsymbol{Z}_{it}\right)=\boldsymbol{C}^{\prime}\boldsymbol{Z}_{it}, \tag{15}
\]

where $\boldsymbol{C}=\mathbb{E}\left(\boldsymbol{Z}_{it}\boldsymbol{Z}_{it}^{\prime}\right)^{-1}\mathbb{E}\left(\boldsymbol{Z}_{it}{\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}}^{\prime}\right)$ is $Z\times KT_{0}$.\footnote{This assumption holds in special cases, e.g., when the unobserved predictors and the idiosyncratic shocks all follow the normal distribution (Li and Bell, 2017). In more general cases, this assumption may be considered to hold approximately.} We can then separate the error term into a part correlated with the regressors and a part that has zero conditional mean, and rewrite the untreated potential outcome $Y_{it,k}^{0}$ as

\begin{align*}
Y_{it,k}^{0} &=\mathbb{E}\left(Y_{it,k}^{0}\mid\boldsymbol{Z}_{it}\right)+u_{it,k}^{0}\\
&=\left({\boldsymbol{\theta}_{t,k}^{0}}^{\prime}-{\boldsymbol{\gamma}_{t,k}^{0}}^{\prime}\boldsymbol{C}^{\prime}\right)\boldsymbol{Z}_{it}+u_{it,k}^{0}\\
&=\boldsymbol{Z}_{it}^{\prime}\boldsymbol{\theta}_{t,k}^{*0}+u_{it,k}^{0}, \tag{16}
\end{align*}

where $\boldsymbol{\theta}_{t,k}^{*0}=\boldsymbol{\theta}_{t,k}^{0}-\boldsymbol{C}\boldsymbol{\gamma}_{t,k}^{0}$, and $u_{it,k}^{0}=\varepsilon_{it,k}^{0}-{\boldsymbol{\gamma}_{t,k}^{0}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}+{\boldsymbol{\gamma}_{t,k}^{0}}^{\prime}\boldsymbol{C}^{\prime}\boldsymbol{Z}_{it}$. Similarly, the treated potential outcome $Y_{it,k}^{1}$ can be rewritten as

\[
Y_{it,k}^{1}=\boldsymbol{Z}_{it}^{\prime}\boldsymbol{\theta}_{t,k}^{*1}+u_{it,k}^{1}, \tag{17}
\]

where $\boldsymbol{\theta}_{t,k}^{*1}=\boldsymbol{\theta}_{t,k}^{1}-\boldsymbol{C}\boldsymbol{\gamma}_{t,k}^{1}$, and $u_{it,k}^{1}=\varepsilon_{it,k}^{1}-{\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}+{\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}\boldsymbol{C}^{\prime}\boldsymbol{Z}_{it}$.

Since $\mathbb{E}(u_{it,k}^{1}\mid\boldsymbol{Z}_{it})=\mathbb{E}[e_{it,k}^{1}-\mathbb{E}(e_{it,k}^{1}\mid\boldsymbol{Z}_{it})\mid\boldsymbol{Z}_{it}]=0$ and $\mathbb{E}(u_{it,k}^{0}\mid\boldsymbol{Z}_{it})=0$, it is straightforward to show that the least squares estimators $\widehat{\boldsymbol{\theta}}_{t,k}^{*1}=({\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{Z}_{t}^{1})^{-1}{\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{Y}_{t,k}^{1}$ and $\widehat{\boldsymbol{\theta}}_{t,k}^{*0}=({\boldsymbol{Z}_{t}^{0}}^{\prime}\boldsymbol{Z}_{t}^{0})^{-1}{\boldsymbol{Z}_{t}^{0}}^{\prime}\boldsymbol{Y}_{t,k}^{0}$ are unbiased estimators of $\boldsymbol{\theta}_{t,k}^{*1}$ and $\boldsymbol{\theta}_{t,k}^{*0}$, respectively.\footnote{The linear conditional mean assumption also implies that the unconfoundedness assumption is satisfied, as $\mathbb{E}(Y_{it,k}^{0}\mid\boldsymbol{Z}_{it},D_{it}=1)=\mathbb{E}(Y_{it,k}^{0}\mid\boldsymbol{Z}_{it},D_{it}=0)$ and $\mathbb{E}(Y_{it,k}^{1}\mid\boldsymbol{Z}_{it},D_{it}=1)=\mathbb{E}(Y_{it,k}^{1}\mid\boldsymbol{Z}_{it},D_{it}=0)$.}

We can then construct an estimator as

\[
\tilde{\tau}_{it,k}=\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{*1}-\widehat{\boldsymbol{\theta}}_{t,k}^{*0}\right), \tag{18}
\]

which is an unbiased estimator for the average treatment effect for individuals with the same values of $\boldsymbol{Z}_{it}$, or the conditional average treatment effect. It follows that the average of the conditional average treatment effect estimators $\tilde{\tau}_{t,k}=\frac{1}{N}\sum_{i=1}^{N}\tilde{\tau}_{it,k}$ is an unbiased estimator for the average treatment effect $\tau_{t,k}$. In addition, it can also be shown that $\tilde{\tau}_{t,k}$ is a consistent estimator without imposing the linear conditional mean assumption (Li and Bell, 2017).
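For comparison with the GMM estimator, the least squares construction in (18) could be sketched as follows. The helper name `ols_theta` and the noiseless toy samples are assumptions for illustration, not part of the original study.

```python
import numpy as np

def ols_theta(Z, Y):
    """Least squares coefficients (Z'Z)^{-1} Z'Y, computed via lstsq for stability."""
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return theta

# Hypothetical treated and untreated samples with noiseless outcomes.
rng = np.random.default_rng(3)
Z1 = rng.normal(size=(60, 2))     # regressors for treated individuals
Z0 = rng.normal(size=(90, 2))     # regressors for untreated individuals
theta1_star = np.array([2.0, 1.0])
theta0_star = np.array([1.0, 1.0])
Y1 = Z1 @ theta1_star
Y0 = Z0 @ theta0_star

Z_all = np.vstack([Z1, Z0])
tau_tilde = Z_all @ (ols_theta(Z1, Y1) - ols_theta(Z0, Y0))   # equation (18)
ate_tilde = tau_tilde.mean()                                  # averaged estimator
```

Because the toy coefficients differ only in the first component, each estimated effect equals the first regressor of that individual, which makes the conditional-average interpretation easy to see.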

Proposition 3.

Under Assumptions 1-3,

  (i) if $\mathbb{E}\left(\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}\mid\boldsymbol{Z}_{it}\right)=\boldsymbol{C}^{\prime}\boldsymbol{Z}_{it}$, then $\mathbb{E}\left(\tilde{\tau}_{it,k}-\tau_{it,k}\mid\boldsymbol{Z}_{it}=\boldsymbol{z}_{it}\right)=0$ and $\mathbb{E}\left(\tilde{\tau}_{t,k}-\tau_{t,k}\right)=0$;

  (ii) $\tilde{\tau}_{t,k}-\tau_{t,k}=O_{p}\left(N_{1}^{-1/2}\right)+O_{p}\left(N_{0}^{-1/2}\right)$.

Remark 7.

Note that $\mathbb{E}\left(\tau_{it,k}\mid\boldsymbol{Z}_{it}=\boldsymbol{z}_{it}\right)$ is the average treatment effect for individuals with $\boldsymbol{Z}_{it}=\boldsymbol{z}_{it}$, or the conditional average treatment effect, whereas the individual treatment effect is $\mathbb{E}\left(\tau_{it,k}\mid\boldsymbol{H}_{it}=\boldsymbol{h}_{it}\right)$ as given in (4). The two are generally not the same since $\boldsymbol{C}^{\prime}\boldsymbol{Z}_{it}\neq 0$.

2.4.2 Interactive fixed effects model

Instead of replacing the unobserved confounders with the observed pretreatment outcomes, Bai (2009) models the unobserved fixed effects directly by iterating between estimating the coefficients on the observed covariates and estimating the unobserved factors and factor loadings using principal component analysis, given some initial values. This approach allows more general structures in the error terms, but requires both $N$ and $T$ to be large, and is also more restrictive on the model specification: the observed covariates need to be time-varying, while the coefficients are assumed constant over time. Xu (2017) adapts this method to the potential outcomes framework to estimate the average treatment effects on the treated, assuming that the untreated potential outcomes for both the treated and untreated units follow the interactive fixed effects model, and proposes a cross-validation procedure to choose the number of unobserved factors and a parametric bootstrap procedure for inference.

This approach has the desired feature of being less computationally expensive compared with repeated pretreatment set splitting and averaging, and is potentially more efficient compared with using only the best set of pretreatment outcomes and discarding the remaining information when all outcomes are related. However, its potential to be adapted to our settings is limited by the restrictions discussed above. In particular, we may assume the coefficients to be constant over time, but it would be unrealistic to assume that they are the same across different outcomes, if we were to use multiple related outcomes.

Another closely related study is Athey et al. (2021), which generalises the results from the matrix completion literature in computer science to impute the missing elements of the untreated potential outcome matrix for the treated units in the posttreatment periods, where the matrix is assumed to have a low rank structure, similar to that of the interactive fixed effects model. The bias of the estimator is shown to have an upper bound that goes to 0 as both $N$ and $T$ grow. This method allows staggered adoption of the treatment, i.e., the treated units receive the treatment at different time periods.

2.4.3 Synthetic control method

Abadie et al. (2010) estimate the treatment effect on a treated unit by predicting its untreated potential outcome using a synthetic control constructed as a weighted average of the control units. The synthetic control method applies to cases where the pretreatment characteristics of the treated unit can be closely approximated by the synthetic control constructed using a small number of control units over an extended period of time before the treatment, which may not generally hold. In terms of implementation, the objective function for the synthetic control method is similar to that of the linear regression approach in Hsiao et al. (2012). However, the weights on the control units in the synthetic control method are restricted to be nonnegative to avoid extrapolation. This reduces the risk of overfitting, but may also limit its applicability by making it difficult to find a set of weights that satisfy the restrictions.

2.5 Inference

To measure the conditional variance of the individual treatment effect estimator, $\text{Var}\left(\widehat{\tau}_{it,k}\mid\boldsymbol{H},\boldsymbol{D}\right)$, where $\boldsymbol{H}$ is the matrix of observed covariates and unobserved individual characteristics and $\boldsymbol{D}$ is the matrix of the treatment status for all individuals and all time periods in the sample, we follow Xu (2017) and employ a parametric bootstrap procedure.

First, we apply our method to all outcomes in all periods to obtain $\widehat{Y}_{it,k}^{1}$ and $\widehat{e}_{it,k}^{1}$ for the treated individuals in the posttreatment periods, and $\widehat{Y}_{it,k}^{0}$ and $\widehat{e}_{it,k}^{0}$ for the untreated individuals in the posttreatment periods and for all individuals in the pretreatment periods. Note that the residuals $\widehat{e}_{it,k}^{1}$ and $\widehat{e}_{it,k}^{0}$ are estimates for $\varepsilon_{it,k}^{1}-{\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}$ and $\varepsilon_{it,k}^{0}-{\boldsymbol{\gamma}_{t,k}^{0}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}$, respectively, rather than the idiosyncratic shocks in the original model, $\varepsilon_{it,k}^{1}$ and $\varepsilon_{it,k}^{0}$. Thus, the variance of the individual treatment effect estimator tends to be overestimated using the parametric bootstrap by resampling these residuals, especially when the number of pretreatment outcomes is small.\footnote{See the discussion on equation (14).} Correcting for this bias would be a necessary step for future research.

These fitted values of the outcomes can be stacked into a $TK\times 1$ vector $\widehat{\boldsymbol{Y}}_{i}$ for each individual, where $\widehat{\boldsymbol{Y}}_{i}$ for $i\in\mathcal{T}$ contains $\widehat{Y}_{it,k}^{1}$ in the posttreatment periods and $\widehat{Y}_{it,k}^{0}$ in the pretreatment periods, and $\widehat{\boldsymbol{Y}}_{i}$ for $i\in\mathcal{C}$ contains $\widehat{Y}_{it,k}^{0}$ in all periods. The $TK\times 1$ vector of residuals $\widehat{\boldsymbol{e}}_{i}$ can be obtained similarly.

We then start bootstrapping for $B$ rounds:

  1. In round $b\in\{1,\dots,B\}$, generate a bootstrapped sample as

    \[
    \boldsymbol{Y}_{i}^{(b)}=\widehat{\boldsymbol{Y}}_{i}+\widehat{\boldsymbol{e}}_{i}^{(b)},\enspace\text{for all}\enspace i,
    \]

    where $\widehat{\boldsymbol{e}}_{i}^{(b)}$ is randomly drawn from $\{\widehat{\boldsymbol{e}}_{i}\}_{i\in\mathcal{T}}$ for $i\in\mathcal{T}$, and from $\{\widehat{\boldsymbol{e}}_{i}\}_{i\in\mathcal{C}}$ for $i\in\mathcal{C}$.\footnote{Since the entire series of residuals over the $T$ periods and $K$ outcomes are resampled, correlation and heteroskedasticity across time and outcomes are preserved (Xu, 2017).}

  2. Construct $\widehat{\tau}_{it,k}^{(b)}$ for each $i$ using the above bootstrapped sample.
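The resampling in step 1 can be sketched as below, assuming the fitted values and residuals are stored as arrays with one row per individual and $TK$ columns. The function name `bootstrap_sample`, the array layout, and the toy inputs are hypothetical; drawing with replacement is also an assumption.

```python
import numpy as np

def bootstrap_sample(Y_hat, e_hat, treated, rng):
    """Add whole residual vectors, resampled within treatment group, to fitted values."""
    Y_b = np.empty_like(Y_hat)
    for group in (treated, ~treated):
        idx = np.flatnonzero(group)
        draw = rng.choice(idx, size=idx.size, replace=True)   # resample individuals
        Y_b[idx] = Y_hat[idx] + e_hat[draw]                   # keep each TK-vector intact
    return Y_b

# Toy inputs: row i of the residual matrix is filled with the value i,
# so the origin of every resampled residual vector is easy to trace.
rng = np.random.default_rng(4)
n, TK = 6, 4
Y_hat = np.zeros((n, TK))
e_hat = np.tile(np.arange(n, dtype=float)[:, None], (1, TK))
treated = np.array([True, True, False, False, False, False])
Y_b = bootstrap_sample(Y_hat, e_hat, treated, rng)
```

Resampling complete rows, rather than individual entries, is what preserves the dependence of the residuals across periods and outcomes.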

The variance for the individual treatment effect estimator is computed using the bootstrap estimates as

\[
\text{Var}\left(\widehat{\tau}_{it,k}\mid\boldsymbol{H},\boldsymbol{D}\right)=\frac{1}{B}\sum_{b=1}^{B}\left(\widehat{\tau}_{it,k}^{(b)}-\frac{1}{B}\sum_{a=1}^{B}\widehat{\tau}_{it,k}^{(a)}\right)^{2},\enspace i=1,\dots,N,
\]

and the $100(1-\alpha)\%$ confidence intervals for $\bar{\tau}_{it,k}$, $i=1,\dots,N$, can be constructed as

\[
\left[\widehat{\tau}_{it,k}^{\left[\frac{\alpha}{2}B\right]}\,,\;\widehat{\tau}_{it,k}^{\left[(1-\frac{\alpha}{2})B\right]}\right],
\]

where the superscript denotes the index of the bootstrap estimates in ascending order. Alternatively, we can use a normal approximation and construct the confidence intervals as

\[
\left[\widehat{\tau}_{it,k}+\Phi^{-1}\left(\frac{\alpha}{2}\right)\widehat{\sigma}_{it,k}\,,\;\widehat{\tau}_{it,k}+\Phi^{-1}\left(1-\frac{\alpha}{2}\right)\widehat{\sigma}_{it,k}\right],
\]

where $\Phi\left(\cdot\right)$ is the cumulative distribution function for the standard normal distribution, and $\widehat{\sigma}_{it,k}=\sqrt{\text{Var}\left(\widehat{\tau}_{it,k}\mid\boldsymbol{H},\boldsymbol{D}\right)}$.
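Given the $B$ bootstrap estimates for one individual, the variance and both confidence intervals could be computed as in the following sketch. The variance is taken as the mean squared deviation of the bootstrap estimates from their mean. The function name `bootstrap_summary` is hypothetical, and reading the bracket $[\cdot]$ as rounding to the nearest integer is an assumption.

```python
import numpy as np
from statistics import NormalDist

def bootstrap_summary(tau_hat, tau_boot, alpha=0.05):
    """Bootstrap variance, percentile interval and normal-approximation interval."""
    tau_boot = np.sort(np.asarray(tau_boot, dtype=float))   # ascending order
    B = tau_boot.size
    var = float(np.mean((tau_boot - tau_boot.mean()) ** 2)) # 1/B normalisation
    lo_idx = max(int(round(alpha / 2 * B)), 1)              # 1-based order statistics
    hi_idx = min(int(round((1 - alpha / 2) * B)), B)
    percentile_ci = (tau_boot[lo_idx - 1], tau_boot[hi_idx - 1])
    z = NormalDist().inv_cdf(1 - alpha / 2)                 # equals -Phi^{-1}(alpha/2)
    sd = var ** 0.5
    normal_ci = (tau_hat - z * sd, tau_hat + z * sd)
    return var, percentile_ci, normal_ci

# Toy input: the "bootstrap estimates" are simply the integers 1..1000.
var, pct_ci, norm_ci = bootstrap_summary(500.5, np.arange(1, 1001)[::-1])
```

The same function applies to the average treatment effect by passing the bootstrap averages instead of the individual bootstrap estimates.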

The variance for the average treatment effect estimator $\widehat{\tau}_{t,k}=\frac{1}{N}\sum_{i=1}^{N}\widehat{\tau}_{it,k}$ and the confidence interval for the average treatment effect $\tau_{t,k}$ can be obtained in a similar manner using the bootstrap estimates $\widehat{\tau}_{t,k}^{(b)}=\frac{1}{N}\sum_{i=1}^{N}\widehat{\tau}_{it,k}^{(b)}$, $b=1,\dots,B$.

3 Monte Carlo Simulations

In this section, we conduct Monte Carlo simulations to assess the performance of our estimator in small samples, and compare it with related methods in relevant settings. The number of posttreatment periods $T_{1}$ is fixed at 1, and the number of related outcomes $K$ is fixed at 5 in all settings.

The untreated potential outcomes are generated from

\[
Y_{it,k}^{0}=\boldsymbol{X}_{it}^{\prime}\boldsymbol{\beta}_{t,k}^{0}+\boldsymbol{\mu}_{i}^{\prime}\boldsymbol{\lambda}_{t,k}^{0}+\varepsilon_{it,k}^{0},\enspace k\in\mathcal{K}, \tag{19}
\]

where $\boldsymbol{X}_{it}$ contains 2 observed covariates, and $\boldsymbol{\mu}_{i}$ contains 2 unobserved individual characteristics as well as the constant 1. The 2 observed covariates are i.i.d. $N(0,1)$ in period 1, and then follow an AR(1) process, $\boldsymbol{X}_{it}=0.9\boldsymbol{X}_{i,t-1}+\xi_{it}$, where $\xi_{it}$ are i.i.d. $N(0,\sqrt{1-0.9^{2}})$, so that the observed covariates are correlated across time and the variances stay 1. The 2 unobserved individual characteristics are also i.i.d. $N(0,1)$. The coefficients $\boldsymbol{\beta}_{t,k}^{0}$ and $\boldsymbol{\lambda}_{t,k}^{0}$ are i.i.d. $N(\omega_{k},1)$ with $\omega_{k}\sim N(1,1)$, for $k\in\mathcal{K}$, so that the means of the coefficients differ across outcomes, and the idiosyncratic shocks $\varepsilon_{it,k}^{0}$ are i.i.d. $N(0,1)$.
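The covariate process can be checked numerically with a short sketch. It assumes that the second argument of the innovation distribution is a standard deviation, which is the reading under which the variances stay at 1; the function name `simulate_covariates` and the sample size are hypothetical.

```python
import numpy as np

def simulate_covariates(n, T, rho=0.9, rng=None):
    """Two AR(1) covariates per individual with unit variance in every period."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.empty((n, T, 2))
    X[:, 0, :] = rng.normal(size=(n, 2))          # period 1: i.i.d. N(0, 1)
    innov_sd = np.sqrt(1 - rho ** 2)              # assumed to be a standard deviation
    for t in range(1, T):
        X[:, t, :] = rho * X[:, t - 1, :] + rng.normal(scale=innov_sd, size=(n, 2))
    return X

X = simulate_covariates(n=200_000, T=3, rng=np.random.default_rng(5))
variances = X.var(axis=0)                                  # one entry per period and covariate
corr_12 = np.corrcoef(X[:, 0, 0], X[:, 1, 0])[0, 1]        # correlation between periods 1 and 2
```

With innovation variance $1-0.9^{2}$, the recursion gives $0.9^{2}\cdot 1+(1-0.9^{2})=1$ in every period, and the correlation between adjacent periods is 0.9.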

The individual treatment effect in the posttreatment period $\bar{\tau}_{iT_{0}+1,k}$ is a deterministic function of $\boldsymbol{X}_{it}$ and $\boldsymbol{\mu}_{i}$ with the coefficients being i.i.d. $N(0.5,0.5)$, for $k\in\mathcal{K}$. The observed outcomes $Y_{it,k}$, $k\in\mathcal{K}$, are equal to $Y_{it,k}^{0}-\varepsilon_{it,k}^{0}+\bar{\tau}_{iT_{0}+1,k}+\varepsilon_{it,k}^{1}$, where $\varepsilon_{it,k}^{1}$ are i.i.d. $N(0,1)$, for the treated individuals in the posttreatment period, and $Y_{it,k}^{0}$ otherwise. $\boldsymbol{X}_{it}$ and $\boldsymbol{\mu}_{i}$ as well as their coefficients for the untreated potential outcomes and the treatment effects are drawn 5 times, and for each set of $\{\boldsymbol{X}_{it},\boldsymbol{\mu}_{i}\}$ and their coefficients drawn, $\varepsilon_{it,k}^{0}$ and $\varepsilon_{it,k}^{1}$ are drawn 1000 times, which allows us to compute the bias and variance of the estimator conditional on the observed covariates and the unobserved individual characteristics.

To measure the performance of the estimators, we compute the biases and standard deviations for the estimates of the individual treatment effects and the average treatment effect for outcome $K$ in the posttreatment period. Specifically, the bias of the individual treatment effect estimator $\widehat{\tau}_{iT_{0}+1,K}$ is measured by $\frac{1}{N}\sum_{i=1}^{N}\frac{1}{5}\sum_{d=1}^{5}\left|\mathbb{E}\left(\widehat{\tau}_{iT_{0}+1,K}^{(d,s)}\right)-\bar{\tau}_{iT_{0}+1,K}^{(d)}\right|$, where the superscript $d$ denotes the $d$th draw of $\{\boldsymbol{X}_{it},\boldsymbol{\mu}_{i}\}$ and $s$ denotes the $s$th draw of $\varepsilon_{it,k}^{0}$ and $\varepsilon_{it,k}^{1}$, and the standard deviation is constructed as $\frac{1}{N}\sum_{i=1}^{N}\frac{1}{5}\sum_{d=1}^{5}\sqrt{\mathbb{E}\left(\widehat{\tau}_{iT_{0}+1,K}^{(d,s)}-\mathbb{E}\widehat{\tau}_{iT_{0}+1,K}^{(d,s)}\right)^{2}}$. Similarly, the bias of the average treatment effect estimator $\widehat{\tau}_{T_{0}+1,K}$ is measured by $\frac{1}{5}\sum_{d=1}^{5}\left|\mathbb{E}\left(\widehat{\tau}_{T_{0}+1,K}^{(d,s)}\right)-\bar{\tau}_{T_{0}+1,K}^{(d)}\right|$, and the standard deviation is constructed as $\frac{1}{5}\sum_{d=1}^{5}\sqrt{\mathbb{E}\left(\widehat{\tau}_{T_{0}+1,K}^{(d,s)}-\mathbb{E}\widehat{\tau}_{T_{0}+1,K}^{(d,s)}\right)^{2}}$.\footnote{The performance of the estimators can also be measured using RMSE, which is computed as $\frac{1}{5}\sum_{d=1}^{5}\sqrt{\mathbb{E}\left(\widehat{\tau}_{iT_{0}+1,K}^{(d,s)}-\mathbb{E}{\bar{\tau}}_{iT_{0}+1,K}^{(d,s)}\right)^{2}}$ for $\widehat{\tau}_{iT_{0}+1,K}$ and $\frac{1}{5}\sum_{d=1}^{5}\sqrt{\mathbb{E}\left(\widehat{\tau}_{T_{0}+1,K}^{(d,s)}-\mathbb{E}{\bar{\tau}}_{T_{0}+1,K}^{(d,s)}\right)^{2}}$ for $\widehat{\tau}_{T_{0}+1,K}$. Since the biases of our estimators are small, these measures are quite similar to SD and are thus omitted from reporting.}
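The bias and SD measures for the individual treatment effects could be computed from stored simulation output as in the sketch below. The array layout (draws of covariates, draws of shocks, individuals), the function name `bias_and_sd`, and the toy numbers are assumptions; expectations over $s$ are replaced by sample means.

```python
import numpy as np

def bias_and_sd(estimates, truth):
    """estimates: array of shape (D, S, N); truth: array of shape (D, N).

    Bias: |mean over s of the estimates - truth|, averaged over d and i.
    SD:   standard deviation over s, averaged over d and i.
    """
    mean_over_s = estimates.mean(axis=1)
    bias = np.abs(mean_over_s - truth).mean()
    sd = estimates.std(axis=1).mean()
    return float(bias), float(sd)

# Toy check: two shock draws per design, placed symmetrically around truth + 0.5.
truth = np.zeros((5, 3))
estimates = np.stack([truth + 0.5 + 1.0, truth + 0.5 - 1.0], axis=1)   # shape (5, 2, 3)
bias, sd = bias_and_sd(estimates, truth)
```

For the average treatment effect, the same function can be applied after averaging the estimates and the true effects over individuals and keeping a trailing axis of length 1.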

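\begin{table}[!htbp]
\caption{Simulation Results on Model Selection}
\begin{tabular}{rrrrrrrrrrrrr}
\hline
 & & & \multicolumn{5}{c}{Best Set} & \multicolumn{5}{c}{Model Averaging} \\
 & & & & \multicolumn{2}{c}{ITE} & \multicolumn{2}{c}{ATE} & & \multicolumn{2}{c}{ITE} & \multicolumn{2}{c}{ATE} \\
$N_{1}$ & $N_{0}$ & $T_{0}$ & $P$ & Bias & SD & Bias & SD & $P$ & Bias & SD & Bias & SD \\
\hline
50 & 50 & 1 & 2.2 & 0.151 & 1.225 & 0.082 & 0.384 & 2.3 & 0.231 & 1.163 & 0.120 & 0.336 \\
100 & 100 & 1 & 2.2 & 0.076 & 0.764 & 0.032 & 0.212 & 2.4 & 0.065 & 0.836 & 0.024 & 0.205 \\
200 & 200 & 1 & 2.1 & 0.038 & 0.476 & 0.004 & 0.127 & 2.6 & 0.046 & 0.712 & 0.011 & 0.133 \\
50 & 50 & 2 & 2.6 & 0.062 & 0.875 & 0.014 & 0.253 & 2.9 & 0.150 & 0.758 & 0.040 & 0.232 \\
100 & 100 & 2 & 2.7 & 0.035 & 0.685 & 0.003 & 0.165 & 2.9 & 0.073 & 0.563 & 0.014 & 0.159 \\
200 & 200 & 2 & 3.2 & 0.038 & 0.729 & 0.003 & 0.137 & 4.0 & 0.031 & 0.702 & 0.003 & 0.131 \\
\hline
\end{tabular}
\end{table}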

  • Note: This table compares the estimator using only the best set of pretreatment outcomes and the estimator constructed from model averaging, in terms of the optimal number of pretreatment outcomes selected by LOO cross-validation, as well as the bias and SD for the ITE and ATE estimates, with varying sample size and number of pretreatment periods, based on 5000 simulations for each setting.

Table 3 compares the GMM estimator constructed using only the best set of pretreatment outcomes with that constructed by averaging estimators from different models with the same number of pretreatment outcomes. We see that the best number of pretreatment outcomes, $P$, is slightly larger than the number of unobserved individual characteristics ($f=2$) for both estimators, and increases when the sample size is larger and when there are more pretreatment outcomes available, which is in line with our discussion in section 2.3. The estimator constructed by model averaging also tends to select a slightly larger $P$ than the estimator using only the best set of pretreatment outcomes.

In almost all settings, the estimator using only the best set of pretreatment outcomes tends to have a smaller bias, whereas the estimator constructed from model averaging tends to have a smaller variance, except for estimating the individual treatment effects when the number of pretreatment outcomes is very small. The bias and SD also become smaller for both estimators when the sample size as well as the number of pretreatment outcomes grow.

In the following simulations, we fix $P$ at 2 when $T_{0}=1$, and 3 when $T_{0}=2$, and construct the GMM estimator using only the best set of pretreatment outcomes, with the best set of pretreatment outcomes selected at the first simulation and used for the remaining simulations for each setting.\footnote{This is mainly to save computing time and does not fundamentally change the conclusions.}

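\begin{table}[!htbp]
\caption{Simulation Results for the GMM estimator}
\begin{tabular}{rrrrrrrrr}
\hline
 & & & \multicolumn{3}{c}{ITE} & \multicolumn{3}{c}{ATE} \\
$N_{1}$ & $N_{0}$ & $T_{0}$ & Bias & SD & Coverage & Bias & SD & Coverage \\
\hline
\multicolumn{9}{l}{Panel A: $\varepsilon_{it,k}^{0}$ uncorrelated across $t$ and $k$} \\
50 & 50 & 1 & 0.096 & 1.422 & 0.997 & 0.040 & 0.443 & 0.995 \\
100 & 100 & 1 & 0.043 & 0.856 & 0.992 & 0.005 & 0.226 & 0.984 \\
50 & 50 & 2 & 0.043 & 1.163 & 0.973 & 0.011 & 0.288 & 0.959 \\
100 & 100 & 2 & 0.025 & 0.900 & 0.953 & 0.005 & 0.181 & 0.956 \\
\multicolumn{9}{l}{Panel B: $\varepsilon_{it,k}^{0}$ correlated across $t$ and $k$} \\
50 & 50 & 1 & 0.134 & 1.419 & 0.996 & 0.045 & 0.431 & 0.993 \\
100 & 100 & 1 & 0.065 & 0.851 & 0.991 & 0.018 & 0.230 & 0.982 \\
50 & 50 & 2 & 0.063 & 1.162 & 0.976 & 0.008 & 0.294 & 0.961 \\
100 & 100 & 2 & 0.037 & 0.906 & 0.964 & 0.009 & 0.183 & 0.967 \\
\hline
\end{tabular}
\end{table}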

  • Note: This table compares the bias and SD of the GMM estimator, as well as the coverage probability of the 95% confidence interval, with varying sample size and number of pretreatment periods, based on 5000 simulations for each setting.

Table 3 reports the bias and SD of the GMM estimator, as well as the coverage probability of the 95% confidence interval, for estimating the individual treatment effects and the average treatment effect. Panel A shows that the bias and SD for the estimators are small even with a small sample size and a small number of pretreatment outcomes. However, the 95% confidence intervals tend to over-cover, with coverage probabilities above the nominal level, especially when the number of pretreatment outcomes is small. This distortion is alleviated as more pretreatment outcomes become available.

Since the validity of the GMM estimator relies on the assumption that $\varepsilon_{it,k}$ are uncorrelated across time or outcomes, we examine the performance of the estimator when this assumption is violated in Panel B, where the idiosyncratic shocks follow an AR(1) process over time with the autoregression coefficient being 0.1, and are correlated across outcomes by sharing a common component for different outcomes in the same period. This slightly increases the biases and SDs of the estimators, but the performance of the estimators is still quite good, especially in comparison with related methods as shown in the following tables.

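\begin{table}[!htbp]
\caption{Simulation}
\begin{tabular}{rrrrrrrrrrr}
\hline
 & & & \multicolumn{4}{c}{OLS} & \multicolumn{4}{c}{GMM} \\
 & & & \multicolumn{2}{c}{ITE} & \multicolumn{2}{c}{ATE} & \multicolumn{2}{c}{ITE} & \multicolumn{2}{c}{ATE} \\
$N_{1}$ & $N_{0}$ & $T_{0}$ & Bias & SD & Bias & SD & Bias & SD & Bias & SD \\
\hline
\multicolumn{11}{l}{Panel A: Linear conditional mean} \\
100 & 100 & 1 & 0.099 & 0.622 & 0.041 & 0.186 & 0.113 & 1.297 & 0.086 & 0.257 \\
200 & 200 & 1 & 0.059 & 0.415 & 0.015 & 0.119 & 0.011 & 0.430 & 0.003 & 0.129 \\
100 & 100 & 2 & 0.046 & 0.677 & 0.012 & 0.158 & 0.026 & 0.901 & 0.003 & 0.179 \\
200 & 200 & 2 & 0.064 & 0.547 & 0.012 & 0.121 & 0.028 & 0.937 & 0.003 & 0.142 \\
\multicolumn{11}{l}{Panel B: Nonlinear conditional mean} \\
100 & 100 & 1 & 0.097 & 0.687 & 0.057 & 0.199 & 0.025 & 0.806 & 0.011 & 0.244 \\
200 & 200 & 1 & 0.140 & 0.456 & 0.040 & 0.122 & 0.016 & 0.552 & 0.003 & 0.149 \\
100 & 100 & 2 & 0.077 & 0.734 & 0.015 & 0.173 & 0.115 & 1.118 & 0.007 & 0.213 \\
200 & 200 & 2 & 0.093 & 0.551 & 0.004 & 0.116 & 0.025 & 0.910 & 0.003 & 0.142 \\
\hline
\end{tabular}
\end{table}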

  • Note: This table compares the bias and SD for the OLS estimator and the GMM estimator, with varying sample size and number of pretreatment periods, based on 5000 simulations for each setting.

Table 3 compares our method with the OLS approach in Hsiao et al. (2012). In panel A, both the unobserved individual characteristics and the idiosyncratic shocks are normally distributed so that the linear conditional mean assumption is satisfied. The results show that the GMM estimator has a smaller bias than the OLS estimator in most settings when estimating both the individual treatment effects and the average treatment effect, although the variance of the GMM estimator is also larger.

In panel B, the unobserved individual characteristics are drawn from the uniform distribution, and the linear conditional mean assumption is no longer satisfied (Li and Bell, 2017). We see that the results are virtually unchanged for the GMM estimator, while the OLS estimator performs slightly worse, with larger biases and SDs, which is more pronounced in estimating the individual treatment effects. The results indicate that the linear conditional mean assumption is not a very strong one. Indeed, the distribution of the sum of several random variables becomes more bell-shaped, like the normal distribution, under fairly general conditions, as a result of the central limit theorem. The simulation results are very similar when the unobserved individual characteristics are drawn from a mix of other distributions.

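\begin{table}[!htbp]
\caption{Simulation}
\begin{tabular}{rrrrrrrrrrr}
\hline
 & & & \multicolumn{4}{c}{IFE} & \multicolumn{4}{c}{GMM} \\
 & & & \multicolumn{2}{c}{ITT} & \multicolumn{2}{c}{ATT} & \multicolumn{2}{c}{ITT} & \multicolumn{2}{c}{ATT} \\
$N_{1}$ & $N_{0}$ & $T_{0}$ & Bias & SD & Bias & SD & Bias & SD & Bias & SD \\
\hline
\multicolumn{11}{l}{Panel A: $\boldsymbol{X}_{it}$ constant across $t$} \\
5 & 100 & 1 & 1.203 & 1.357 & 0.657 & 0.610 & 0.047 & 1.499 & 0.016 & 0.687 \\
5 & 200 & 1 & 1.264 & 1.229 & 0.492 & 0.548 & 0.089 & 1.624 & 0.023 & 0.730 \\
5 & 100 & 2 & 0.922 & 1.140 & 0.263 & 0.518 & 0.034 & 1.265 & 0.015 & 0.585 \\
5 & 200 & 2 & 0.982 & 1.147 & 0.378 & 0.520 & 0.040 & 1.387 & 0.018 & 0.634 \\
\multicolumn{11}{l}{Panel B: $\boldsymbol{\beta}_{t,k}^{0}$ constant across $t$} \\
5 & 100 & 1 & 1.289 & 1.220 & 0.773 & 0.579 & 0.029 & 1.349 & 0.016 & 0.616 \\
5 & 200 & 1 & 1.681 & 1.370 & 0.836 & 0.620 & 0.034 & 1.563 & 0.020 & 0.713 \\
5 & 100 & 2 & 0.930 & 1.070 & 0.440 & 0.486 & 0.026 & 1.261 & 0.017 & 0.577 \\
5 & 200 & 2 & 1.417 & 1.083 & 1.015 & 0.489 & 0.031 & 1.250 & 0.011 & 0.564 \\
\hline
\end{tabular}
\end{table}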

  • Note: This table compares the bias and SD for the IFE estimator and the GMM estimator, with varying sample size and number of pretreatment periods, based on 5000 simulations for each setting.

Table 3 compares our method with the method of estimating the interactive fixed effects model directly, which was first developed in Bai (2009) and then adapted into the potential outcomes framework by Xu (2017) to allow heterogeneous treatment effects. We fix the number of treated individuals at 5, and compare the performance of the two methods in estimating the individual treatment effect on the treated and the average treatment effect on the treated.

We consider two scenarios that are relevant in the context of empirical microeconomics. In panel A, the observed covariates are constant over time. This is plausible for covariates such as gender, race or education level, which are likely to be stable over time. Since the IFE method requires the observed covariates to be time-varying, the covariates that are constant over time are dropped from the estimation and become part of the unobserved individual characteristics, which makes the model equivalent to a pure factor model with 4 unobserved factors. As we have 5 related outcomes, this model should still be estimable by the IFE method. However, we see that the IFE method performs poorly when there are only a small number of pretreatment outcomes to recover the unobserved individual characteristics. The bias and SD of the IFE estimator become smaller as more pretreatment outcomes are available, but are still quite large compared with our method.

To accommodate the restrictive model specification for the IFE method, we allow the covariates to be time-varying while keeping the coefficients constant over time in panel B, although the coefficients are allowed to vary across outcomes since it is unlikely that the coefficients for different outcomes would be the same in practice. We see that the IFE estimator has poor performance since the model is still misspecified in their method, whereas the results for our method are virtually unchanged.

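Simulation

| $N_{1}$ | $N_{0}$ | $T_{0}$ | SCM ITT Bias | SCM ITT SD | SCM ATT Bias | SCM ATT SD | GMM ITT Bias | GMM ITT SD | GMM ATT Bias | GMM ATT SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Panel A: distributions of $\boldsymbol{\mu}_{i}$ same for treated and control | | | | | | | | | | |
| 5 | 100 | 1 | 0.376 | 1.247 | 0.245 | 0.577 | 0.029 | 1.349 | 0.016 | 0.617 |
| 5 | 200 | 1 | 0.883 | 1.376 | 0.698 | 0.628 | 0.035 | 1.563 | 0.020 | 0.713 |
| 5 | 100 | 2 | 0.930 | 1.191 | 0.526 | 0.547 | 0.023 | 1.243 | 0.014 | 0.565 |
| 5 | 200 | 2 | 0.469 | 1.186 | 0.126 | 0.531 | 0.031 | 1.250 | 0.012 | 0.564 |
| Panel B: distributions of $\boldsymbol{\mu}_{i}$ different for treated and control | | | | | | | | | | |
| 5 | 100 | 1 | 0.763 | 1.253 | 0.634 | 0.605 | 0.036 | 1.368 | 0.022 | 0.658 |
| 5 | 200 | 1 | 1.412 | 1.413 | 1.269 | 0.656 | 0.037 | 1.573 | 0.024 | 0.735 |
| 5 | 100 | 2 | 0.982 | 1.204 | 0.513 | 0.558 | 0.027 | 1.249 | 0.020 | 0.578 |
| 5 | 200 | 2 | 0.781 | 1.203 | 0.613 | 0.551 | 0.032 | 1.256 | 0.014 | 0.577 |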

  • Note: This table compares the bias and SD for the SCM estimator and the GMM estimator, with varying sample size and number of pretreatment periods, based on 5000 simulations for each setting.

Table 3 compares our method with the synthetic control method (Abadie et al., 2010). In panel A, the unobserved individual characteristics for both the treated individuals and the untreated individuals are drawn from $N(0,1)$, while in panel B, the unobserved individual characteristics for the treated individuals are drawn from $N(1,1)$. Since the synthetic control method requires the treated units to be in the convex hull of the control units, by restricting the weights assigned to the control units to be nonnegative, it may perform poorly when the support of the unobserved individual characteristics differs between the treated and untreated individuals. Our method, by contrast, should be unaffected by the degree of overlap in the distributions of the unobserved individual characteristics for the two treatment groups. The simulation results show that the synthetic control estimator indeed performs worse in panel B. Perhaps somewhat surprisingly, its performance is also poor compared with our method in panel A. This is because the coefficients are outcome-specific, so that the levels of the outcomes are also likely to vary across outcomes, which makes it more difficult to obtain a good pretreatment fit under the nonnegativity restriction. In comparison, our method performs well in both panels.
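The convex hull restriction can be illustrated with a deliberately simplified one-dimensional sketch. It is not the synthetic control estimator itself: it only uses the fact that weights which are nonnegative and sum to one can reproduce values between the smallest and largest control value, and nothing outside that range. The function name and the numbers are hypothetical.

```python
def best_convex_fit(target, donors):
    """Closest value to `target` reachable by a convex combination of `donors`.

    In one dimension, nonnegative weights that sum to one can only produce
    values between min(donors) and max(donors), so the best fit is a clamp.
    """
    lo, hi = min(donors), max(donors)
    return min(max(target, lo), hi)


# Hypothetical unobserved characteristics of five control units.
controls = [-1.2, -0.4, 0.1, 0.6, 1.1]

inside = 0.3    # treated unit within the range of the controls
outside = 2.5   # treated unit beyond the range of the controls

gap_inside = abs(inside - best_convex_fit(inside, controls))
gap_outside = abs(outside - best_convex_fit(outside, controls))

print(f"fit gap, treated unit inside the hull:  {gap_inside:.2f}")
print(f"fit gap, treated unit outside the hull: {gap_outside:.2f}")
```

A treated unit inside the range is matched exactly, while one outside it leaves a gap that no admissible weights can close, which is the situation panel B makes more likely.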

Overall, the simulation results show that our method performs well in terms of bias and SD when estimating the individual treatment effects and the average treatment effect under various settings, and that it outperforms related methods. The shortcoming of our method is that the confidence intervals tend to be too wide, especially when the number of pretreatment outcomes is small.
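As a reading aid for the Bias and SD columns, the following minimal sketch computes both quantities for a generic estimator across simulation replications. It assumes bias is the mean estimation error and SD is the sample standard deviation of the estimates; the exact definitions used for the tables (for example, how errors are aggregated across treated individuals) are not restated in this section, and the data-generating numbers below are hypothetical.

```python
import random
import statistics


def bias_and_sd(estimates, truth):
    """Return (mean error, sample standard deviation) of the estimates."""
    errors = [est - truth for est in estimates]
    return statistics.fmean(errors), statistics.stdev(estimates)


random.seed(0)
true_effect = 1.0

# Hypothetical estimator: true effect plus a small systematic error and noise,
# replicated 5000 times to mirror the number of simulations per setting.
draws = [true_effect + 0.05 + random.gauss(0.0, 0.5) for _ in range(5000)]

bias, sd = bias_and_sd(draws, true_effect)
print(f"bias = {bias:.3f}, SD = {sd:.3f}")
```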

4 Empirical Application

We illustrate our method by estimating the effect of health insurance coverage on the individual usage of hospital emergency departments.

Although only a small proportion of the population uses emergency departments, this usage imposes great financial pressure on the health care system. In addition, it is not clear ex ante what the direction of the effect should be. For example, Taubman et al. (2014) argue that health insurance coverage could either increase emergency-department use by reducing its cost for patients, or decrease emergency-department use by encouraging primary care use or improving health.

The findings on emergency-department use have been mixed. Using survey data collected from the participants of the Oregon Health Insurance Experiment (OHIE) about a year after they were notified of the selection results, Finkelstein et al. (2012) find no discernible impact of health insurance coverage on emergency-department use.[19] In contrast, using visit-level data for all emergency-department visits to twelve hospitals in the Portland area, probabilistically matched to the OHIE study population on the basis of name, date of birth, and gender, Taubman et al. (2014) find that health insurance coverage significantly increases emergency-department use by 0.41 visits per person, from an average of 1.02 visits per person in the control group in the first 15 months of the experiment. They also examine whether the effect differs across heterogeneous groups, and find statistically significant increases in emergency-department use across most subgroups in terms of the number of pre-experiment emergency-department visits, hospital admission (inpatient or outpatient visits), timing (on-hours or off-hours visits), the type of visits (emergent and not preventable, emergent and preventable, primary care treatable, and non-emergent), as well as gender, age, and health condition.

[19] The Oregon Health Insurance Experiment (OHIE) was initiated in 2008, targeting low-income adults in Oregon who had been without health insurance for at least 6 months. Among the 89,824 individuals who signed up, 35,169 individuals were randomly selected by the lottery and were eligible to apply for the Oregon Health Plan (OHP) Standard program, which provided relatively comprehensive medical benefits with no consumer cost sharing, and the monthly premium was only between $0 and $20 depending on income. As a randomised controlled experiment, the OHIE offers an opportunity for researchers to study the effect of health insurance coverage on various health outcomes without confounding factors.

In this application, we wish to estimate the effect of health insurance coverage on emergency-department use for each individual in the sample. This would potentially help us better understand whether and how health insurance coverage affects emergency-department use, compared with using only the average treatment effect for the whole sample or for some preassigned subgroups (conditional average treatment effects).

Our data combine the hospital emergency-department visit-level data and the survey data. There are two time periods, one before the randomisation and one after.[20] To estimate the individual treatment effects, we include 3 observed covariates (gender, birth year, and household income as a percentage of the federal poverty line) and 10 related outcomes, including different types of emergency-department visits and medical charges. We also consider a rich list of variables on which we compare individuals with different estimated treatment effects. There are 2154 individuals with complete information on these variables.[21]

[20] The pre-randomisation period in the hospital visit-level data was from January 2007 to March 2008, and the post-randomisation period was from March 2008 to September 2009. The two surveys were collected shortly after the randomisation and about a year after randomisation, respectively, each covering a 6-month period before the survey.

[21] Note that our sample size is significantly smaller than that of other studies using the OHIE data, due to the inclusion of the extensive list of variables. For example, the sample size in Finkelstein et al. (2012) is 74,922, and the sample size in Taubman et al. (2014) is 24,646. Our sample may therefore not be representative of the OHIE sample, and the results in different studies may not be directly comparable.

Sample Selection

| | Selected (1) | Not-selected (2) | Difference (3) | Insured (4) | Not-insured (5) | Difference (6) |
| --- | --- | --- | --- | --- | --- | --- |
| Female | 0.59 | 0.60 | -0.01 | 0.63 | 0.58 | 0.05* |
| Birth year | 1966.24 | 1966.44 | -0.19 | 1967.03 | 1966.09 | 0.95 |
| Household income as percent of federal poverty line | 79.77 | 75.67 | 4.10 | 53.73 | 86.57 | -32.84*** |
| # ED visits | 0.32 | 0.43 | -0.11** | 0.48 | 0.33 | 0.15** |
| # outpatient ED visits | 0.27 | 0.35 | -0.09** | 0.40 | 0.28 | 0.13** |
| # weekday daytime ED visits | 0.18 | 0.24 | -0.06** | 0.27 | 0.19 | 0.08** |
| # emergent non-preventable ED visits | 0.07 | 0.09 | -0.02 | 0.11 | 0.07 | 0.04** |
| # emergent preventable ED visits | 0.03 | 0.03 | 0.00 | 0.03 | 0.02 | 0.01 |
| # primary care treatable ED visits | 0.10 | 0.15 | -0.05*** | 0.15 | 0.12 | 0.04 |
| Total charges | 859.71 | 1276.90 | -417.19* | 1379.58 | 947.54 | 432.04 |
| Total ED charges | 345.48 | 504.64 | -159.16** | 494.95 | 396.86 | 98.09 |
| # ED visits to a high uninsured volume hospital | 0.17 | 0.22 | -0.05 | 0.25 | 0.17 | 0.08** |
| # ED visits (survey) | 0.24 | 0.30 | -0.06* | 0.38 | 0.23 | 0.15*** |
| N | 1103 | 1051 | | 577 | 1577 | |

  • 1) This table compares the mean values of the covariates and related outcomes in the pretreatment period for individuals selected/not-selected by the lottery, and individuals insured/not-insured.

  • 2) Significance levels of the two-sample t-test: * 10%, ** 5%, *** 1%.

The first 3 columns in Table 4 present the mean values of the covariates and outcomes in the pretreatment period for individuals selected by the lottery and for individuals not selected by the lottery, as well as the difference between the two groups. Since a considerable number of observations with incomplete information are dropped, being selected by the lottery is negatively correlated with different types of emergency-department visits in the pretreatment period in our sample, which suggests that the lottery assignment is not likely to be a valid instrument for health insurance coverage. Table 4 also compares the mean pretreatment characteristics of individuals covered by health insurance and those not covered, which shows that individuals who were covered were poorer and used emergency departments more frequently in the pretreatment period than individuals who were not covered.
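The Difference columns report a difference in group means with a two-sample t-test. The sketch below shows one common form of that comparison, under the assumption of unequal variances (Welch's statistic); the text does not say which variant was used, and the visit counts here are made-up illustrative values, not OHIE data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Difference in means and Welch's t-statistic for two samples."""
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(sample_b)
    std_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    diff = mean_a - mean_b
    return diff, diff / std_error


# Hypothetical pretreatment ED visit counts for two small groups.
selected = [0, 1, 0, 2, 1]
not_selected = [1, 2, 1, 3, 2]

diff, t_stat = welch_t(selected, not_selected)
print(f"difference in means = {diff:.2f}, t = {t_stat:.2f}")
```

The resulting statistic would then be compared with the relevant critical value to assign the significance stars used in the tables.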

Figure 1: Distribution of the estimated individual treatment effects

Figure 1 shows the distribution of the estimated individual treatment effects using our method.[22] The mean of the estimated individual treatment effects, i.e., the estimated average treatment effect, is 0.33, which is significant at the 1% level. 114 individuals have treatment effects that are significant at the 10% level, of which 23 are negative and 91 are positive.[23] We then compare the characteristics of the individuals based on their estimated treatment effects, which are presented in Table 4-4. Column (1) shows the mean characteristics of individuals whose treatment effects are not significant at the 10% level, column (2) shows the mean characteristics of individuals whose treatment effects are significantly negative, column (3) shows the differences between column (2) and column (1), column (4) shows the mean characteristics of individuals whose treatment effects are significantly positive, and column (5) shows the differences between column (4) and column (1).

[22] As mentioned earlier, since only individuals with complete information on the variables are selected into our sample, the distribution may not be representative of the OHIE participants. The distribution of the estimated individual treatment effects may be more spread out than the distribution of the true effects due to noise, or less spread out since the estimates are based on parametric models for the potential outcomes, which may be oversimplified compared with the true models.

[23] Note that if we were to adjust for multiple testing, e.g., using the Benjamini–Hochberg procedure to control the false discovery rate (FDR) at the 10% level, we would be left with only one individual whose treatment effect is significant. However, the small number of individuals with significant treatment effects may also be attributed to the overestimation of the variance of the individual treatment effect estimator.
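The Benjamini–Hochberg adjustment mentioned in the footnote to this paragraph can be sketched as follows. This is a generic implementation of the step-up procedure on hypothetical p-values, not a reproduction of the paper's calculation; it shows how several effects that pass an unadjusted 10% threshold can shrink to a single rejection once the false discovery rate is controlled.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank r (1-based, p-values sorted ascending)
    with p_(r) <= r * fdr / m, then reject the r smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five individual treatment effects.
p_vals = [0.001, 0.09, 0.06, 0.20, 0.75]

unadjusted = [p <= 0.10 for p in p_vals]
flags = benjamini_hochberg(p_vals, fdr=0.10)

print("significant without adjustment:", sum(unadjusted))
print("significant after BH at 10% FDR:", sum(flags))
```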

Compared with individuals who would not be significantly affected by the treatment, individuals who would significantly decrease or increase their emergency-department visits if covered by health insurance both had more emergency-department visits and more medical charges in the pretreatment period. However, these two groups were also distinct in some characteristics.
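The three groups compared here (no significant effect, significantly fewer visits, significantly more visits) can be formed by a simple rule on each estimate and its standard error. The sketch below assumes a two-sided test with a standard normal critical value at the 10% level; the exact test used in the application is not restated in this section, and the function name and inputs are hypothetical.

```python
from statistics import NormalDist


def classify(effect, std_error, level=0.10):
    """Label an individual as 'same', 'fewer' or 'more' ED visits."""
    z_crit = NormalDist().inv_cdf(1 - level / 2)  # about 1.645 for level 0.10
    if std_error <= 0 or abs(effect / std_error) < z_crit:
        return "same"
    return "fewer" if effect < 0 else "more"


# Hypothetical (estimated effect, standard error) pairs.
estimates = [(0.2, 0.5), (-1.5, 0.6), (2.0, 0.9)]
groups = [classify(effect, se) for effect, se in estimates]
print(groups)
```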

The individuals who would have fewer emergency-department visits if covered by health insurance were on average 7 years younger than individuals in the control group and 10 years younger than those in the positive group, more likely to be female and less educated, and, in particular, much poorer than individuals in the other groups. They were more likely to be diagnosed with depression but not with other conditions. Importantly, they were less likely to have any primary care visits, and more likely to use the emergency department as their usual place for medical care. In terms of emergency-department use in the pretreatment period, they had fewer visits resulting in hospitalisation, more outpatient visits, more preventable and non-emergent visits, more visits to hospitals with a low fraction of uninsured patients, fewer visits for chronic conditions, and more visits for injury. Although their medical charges were not as high as those for individuals in the positive group, they owed more money for medical expenses.

In comparison, individuals who would have more emergency-department visits if covered by health insurance were more likely to be older and male, with household income just above the federal poverty line, which means that they were not as poor as the individuals in the other groups. They were in worse health, more likely to be diagnosed with diabetes and high blood pressure, and were taking more prescription medications. They also had more emergency-department visits of all types in the pretreatment period, including visits resulting in hospitalisation and visits for more severe conditions such as chronic conditions, chest pain and psychological conditions, and they incurred more medical charges.

Overall, these comparisons suggest that the individuals who would have fewer emergency-department visits if covered by health insurance were younger and not in very bad physical condition. However, their access to primary care was limited because they were in a much more disadvantaged position financially, which made them resort to using the emergency department as their usual place for medical care. In contrast, the individuals who would have more emergency-department visits if covered by health insurance were more likely to be older and in poor health. So even with access to primary care, they still used emergency departments more often for severe conditions, although sometimes for primary care treatable and non-emergent conditions as well.

All in all, it seems that both mechanisms discussed by Taubman et al. (2014) play a role. For people who used the emergency department for medical care because they did not have access to primary care services, health insurance coverage decreases emergency-department use because it increases access to primary care and may also lead to improved health. For people who had access to primary care and still used the emergency department due to worse physical condition, health insurance coverage increases emergency-department use because it reduces the out-of-pocket cost of the visits.

This application shows the potential value of estimating individual treatment effects for policy evaluation. Our findings would not have been possible by only estimating conditional average treatment effects, as we would not be able to distinguish individuals with positive or negative treatment effects in the first place.

Comparison of Characteristics

| | Same (1) | Fewer (2) | Difference (3) | More (4) | Difference (5) |
| --- | --- | --- | --- | --- | --- |
| Birth year | 1966.42 | 1973.90 | 7.48* | 1964.09 | -2.33** |
| Female | 0.61 | 1.00 | 0.39*** | 0.37 | -0.24*** |
| Education | 2.52 | 1.90 | -0.62** | 2.30 | -0.22** |
| English | 0.89 | 0.90 | 0.01 | 0.95 | 0.07*** |
| Race: | | | | | |
| White | 0.75 | 0.50 | -0.25 | 0.79 | 0.04 |
| Hispanic | 0.10 | 0.10 | 0.00 | 0.12 | 0.02 |
| Black | 0.06 | 0.30 | 0.24 | 0.06 | 0.00 |
| Asian | 0.10 | 0.10 | 0.00 | 0.06 | -0.05* |
| American Indian or Alaska Native | 0.04 | 0.10 | 0.06 | 0.07 | 0.03 |
| Native Hawaiian or Pacific Islander | 0.01 | 0.00 | -0.01*** | 0.01 | 0.00 |
| Other races | 0.07 | 0.10 | 0.03 | 0.06 | -0.01 |
| Employed | 0.53 | 0.60 | 0.07 | 0.48 | -0.05 |
| Average hours worked per week | 2.25 | 2.40 | 0.15 | 2.20 | -0.05 |
| Household income as percent of federal poverty line | 76.12 | 28.02 | -48.09*** | 114.98 | 38.86*** |
| Household Size (adults and children) | 2.99 | 3.50 | 0.51 | 2.61 | -0.39** |
| Number of family members under 19 living in house | 0.88 | 1.00 | 0.12 | 0.56 | -0.33*** |
| Overall health | 3.01 | 2.40 | -0.61 | 2.84 | -0.17 |
| Health change | -0.09 | -0.10 | -0.01 | -0.08 | 0.01 |
| # days physical health not good | 6.97 | 7.60 | 0.63 | 9.49 | 2.52** |
| # days mental health not good | 8.43 | 13.90 | 5.47 | 10.30 | 1.87 |
| # days poor health impaired regular activities | 6.09 | 7.90 | 1.81 | 8.36 | 2.27** |
| Diabetes | 0.10 | 0.20 | 0.10 | 0.20 | 0.10** |
| Asthma | 0.13 | 0.20 | 0.07 | 0.13 | 0.01 |
| High blood pressure | 0.24 | 0.20 | -0.04 | 0.38 | 0.15*** |
| Depression | 0.37 | 0.70 | 0.33* | 0.47 | 0.10** |
| Any primary care visits | 0.55 | 0.20 | -0.35** | 0.57 | 0.02 |
| # primary care visits | 1.64 | 1.80 | 0.16 | 1.86 | 0.22 |
| Any hospital visits | 0.04 | 0.10 | 0.06 | 0.12 | 0.07** |
| # hospital visits | 0.05 | 0.10 | 0.05 | 0.16 | 0.11** |
| Usual place for medical care: | | | | | |
| Private clinic | 0.18 | 0.00 | -0.18*** | 0.28 | 0.10** |
| Public clinic | 0.17 | 0.10 | -0.07 | 0.18 | 0.01 |
| Hospital-based clinic | 0.07 | 0.00 | -0.07*** | 0.08 | 0.01 |
| Hospital ER | 0.03 | 0.50 | 0.47** | 0.06 | 0.03 |
| Urgent care clinic | 0.03 | 0.10 | 0.07 | 0.02 | -0.01 |
| Other places | 0.06 | 0.00 | -0.06*** | 0.07 | 0.00 |
| Don’t have usual place | 0.46 | 0.30 | -0.16 | 0.32 | -0.14*** |

  • 1) This table shows the mean characteristics for individuals whose treatment effects are not statistically significant at 10% level (column 1), whose treatment effects are significantly negative (column 2), and whose treatment effects are significantly positive (column 4). Column (3) contains the differences between column (2) and column (1), and column (5) contains the differences between column (4) and column (1).

  • 2) Significance levels of the two-sample t-test: * 10%, ** 5%, *** 1%.

Comparison of Characteristics

| | Same (1) | Fewer (2) | Difference (3) | More (4) | Difference (5) |
| --- | --- | --- | --- | --- | --- |
| Needed medical care | 0.68 | 1.00 | 0.32*** | 0.76 | 0.08* |
| Got all needed medical care | 0.58 | 0.40 | -0.18 | 0.56 | -0.02 |
| Reason went without care: | | | | | |
| Cost too much | 0.33 | 0.20 | -0.13 | 0.31 | -0.02 |
| No insurance | 0.36 | 0.50 | 0.14 | 0.35 | -0.01 |
| Doc wouldn’t take insurance | 0.01 | 0.00 | -0.01*** | 0.00 | -0.01*** |
| Owed money to provider | 0.05 | 0.00 | -0.05*** | 0.10 | 0.05* |
| Couldn’t get an appointment | 0.03 | 0.00 | -0.03*** | 0.01 | -0.02* |
| Office wasn’t open | 0.01 | 0.00 | -0.01*** | 0.00 | -0.01*** |
| Didn’t have a doctor | 0.11 | 0.20 | 0.09 | 0.10 | -0.01 |
| Other reasons | 0.03 | 0.00 | -0.03*** | 0.04 | 0.01 |
| Don’t know | 0.00 | 0.00 | 0.00*** | 0.00 | 0.00*** |
| Needed prescription medications | 0.61 | 0.80 | 0.19 | 0.77 | 0.16*** |
| Got all needed prescriptions | 0.74 | 0.70 | -0.04 | 0.64 | -0.09* |
| Currently taking any prescription medications | 0.44 | 0.30 | -0.14 | 0.68 | 0.25*** |
| # prescription medications taking | 1.37 | 1.30 | -0.07 | 2.56 | 1.18*** |
| Reason went without prescription medication: | | | | | |
| Cost too much | 0.21 | 0.20 | -0.01 | 0.27 | 0.06 |
| No insurance | 0.20 | 0.10 | -0.10 | 0.20 | 0.00 |
| Didn’t have doctor | 0.08 | 0.10 | 0.02 | 0.10 | 0.01 |
| Couldn’t get prescription | 0.08 | 0.10 | 0.02 | 0.07 | -0.01 |
| Couldn’t get to pharmacy | 0.01 | 0.10 | 0.09 | 0.00 | -0.01*** |
| Other reasons | 0.02 | 0.00 | -0.02*** | 0.04 | 0.02 |
| Don’t know | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Needed dental care | 0.70 | 1.00 | 0.30*** | 0.70 | 0.00 |
| Got all needed dental care | 0.41 | 0.30 | -0.11 | 0.37 | -0.05 |
| Any ER visits | 0.14 | 0.90 | 0.76*** | 0.26 | 0.12*** |
| # of ER visits | 0.25 | 2.40 | 2.15*** | 0.45 | 0.20** |
| Used emergency room for non-emergency care | 0.02 | 0.10 | 0.08 | 0.06 | 0.03 |
| Reason went to ER: | | | | | |
| Needed emergency care | 0.05 | 0.80 | 0.75*** | 0.06 | 0.01 |
| Clinics closed | 0.01 | 0.30 | 0.29* | 0.02 | 0.00 |
| Couldn’t get doctor’s appointment | 0.02 | 0.20 | 0.18 | 0.02 | 0.00 |
| Didn’t have personal doctor | 0.02 | 0.30 | 0.28 | 0.03 | 0.01 |
| Couldn’t afford copay to see a doctor | 0.01 | 0.20 | 0.19 | 0.02 | 0.00 |
| Didn’t know where else to go | 0.02 | 0.20 | 0.18 | 0.04 | 0.02 |
| Other reason | 0.01 | 0.10 | 0.09 | 0.02 | 0.01 |
| Needed prescription drug | 0.01 | 0.10 | 0.09 | 0.00 | -0.01*** |
| Don’t know | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Any out of pocket costs for medical care | 0.65 | 0.80 | 0.15 | 0.71 | 0.06 |
| Total out of pocket costs for medical care | 5195.07 | 1257.00 | -3938.07 | 1136.44 | -4058.62 |
| Borrowed money/skipped bills to pay health care bills | 0.34 | 0.50 | 0.16 | 0.47 | 0.13** |
| Currently owe money for medical expenses | 0.46 | 0.80 | 0.34** | 0.71 | 0.25*** |
| Total amount currently owed for medical expenses | 1559.40 | 7354.00 | 5794.60** | 5694.17 | 4134.77*** |

  • 1) This table shows the mean characteristics for individuals whose treatment effects are not statistically significant at 10% level (column 1), whose treatment effects are significantly negative (column 2), and whose treatment effects are significantly positive (column 4). Column (3) contains the differences between column (2) and column (1), and column (5) contains the differences between column (4) and column (1).

  • 2) Significance levels of the two-sample t-test: * 10%, ** 5%, *** 1%.

Comparison of Characteristics

| | Same (1) | Fewer (2) | Difference (3) | More (4) | Difference (5) |
| --- | --- | --- | --- | --- | --- |
| Any ED visits | 0.16 | 0.90 | 0.74*** | 0.84 | 0.68*** |
| # ED visits | 0.28 | 3.10 | 2.82*** | 1.94 | 1.67*** |
| Any ED visits resulting in hospitalization | 0.03 | 0.00 | -0.03*** | 0.37 | 0.34*** |
| # ED visits resulting in hospitalization | 0.03 | 0.00 | -0.03*** | 0.62 | 0.59*** |
| Any outpatient ED visits | 0.15 | 0.90 | 0.75*** | 0.67 | 0.52*** |
| # outpatient ED visits | 0.25 | 3.10 | 2.85*** | 1.32 | 1.07*** |
| Any weekday daytime ED visits | 0.10 | 0.60 | 0.50** | 0.62 | 0.52*** |
| # weekday daytime ED visits | 0.15 | 1.50 | 1.35** | 1.14 | 0.99*** |
| Any off-time ED visits | 0.09 | 0.90 | 0.81*** | 0.51 | 0.42*** |
| # off-time ED visits | 0.12 | 1.60 | 1.48*** | 0.83 | 0.71*** |
| # emergent non-preventable ED visits | 0.05 | 0.17 | 0.12 | 0.66 | 0.61*** |
| # emergent preventable ED visits | 0.02 | 0.14 | 0.12** | 0.15 | 0.12*** |
| # primary care treatable ED visits | 0.10 | 1.37 | 1.27*** | 0.46 | 0.36*** |
| # non-emergent ED visits | 0.05 | 1.22 | 1.16*** | 0.35 | 0.29*** |
| # unclassified ED visits | 0.05 | 0.20 | 0.15 | 0.36 | 0.30*** |
| Any ambulatory case sensitive ED visits | 0.01 | 0.00 | -0.01*** | 0.15 | 0.14*** |
| # ambulatory case sensitive ED visits | 0.02 | 0.00 | -0.02*** | 0.15 | 0.13*** |
| Any ED visits to a high uninsured volume hospital | 0.08 | 0.20 | 0.12 | 0.76 | 0.68*** |
| # ED visits to a high uninsured volume hospital | 0.12 | 0.30 | 0.18 | 1.61 | 1.49*** |
| Any ED visits to a low uninsured volume hospital | 0.09 | 0.90 | 0.81*** | 0.19 | 0.10** |
| # ED visits to a low uninsured volume hospital | 0.15 | 2.80 | 2.65*** | 0.33 | 0.17** |
| Any ED visits for chronic conditions | 0.03 | 0.30 | 0.27 | 0.36 | 0.32*** |
| # ED visits for chronic conditions | 0.05 | 0.50 | 0.45 | 0.62 | 0.57*** |
| Any ED visits for injury | 0.06 | 0.40 | 0.34* | 0.32 | 0.26*** |
| # ED visits for injury | 0.07 | 0.40 | 0.33* | 0.43 | 0.37*** |
| Any ED visits for skin conditions | 0.01 | 0.10 | 0.09 | 0.02 | 0.01 |
| # ED visits for skin conditions | 0.02 | 0.10 | 0.08 | 0.03 | 0.01 |
| Any ED visits for abdominal pain | 0.01 | 0.10 | 0.09 | 0.06 | 0.05** |
| # ED visits for abdominal pain | 0.01 | 0.10 | 0.09 | 0.10 | 0.09** |
| Any ED visits for back pain | 0.01 | 0.30 | 0.29* | 0.04 | 0.03 |
| # ED visits for back pain | 0.01 | 0.40 | 0.39 | 0.06 | 0.04 |
| Any ED visits for chest pain | 0.01 | 0.00 | -0.01*** | 0.05 | 0.04* |
| # ED visits for chest pain | 0.01 | 0.00 | -0.01*** | 0.05 | 0.04* |
| Any ED visits for headache | 0.01 | 0.00 | -0.01*** | 0.01 | 0.00 |
| # ED visits for headache | 0.01 | 0.00 | -0.01*** | 0.01 | 0.00 |
| Any ED visits for mood disorders | 0.00 | 0.00 | 0.00** | 0.09 | 0.08*** |
| # ED visits for mood disorders | 0.00 | 0.00 | 0.00** | 0.15 | 0.15** |
| Any ED visits for psych conditions/substance abuse | 0.01 | 0.00 | -0.01*** | 0.17 | 0.17*** |
| # ED visits for psych conditions/substance abuse | 0.01 | 0.00 | -0.01*** | 0.36 | 0.34*** |
| Total ED charges | 274.98 | 1818.44 | 1543.46** | 3195.22 | 2920.24*** |
| Total charges | 639.70 | 2223.85 | 1584.16** | 9260.22 | 8620.52*** |

  • 1) This table shows the mean characteristics for individuals whose treatment effects are not statistically significant at 10% level (column 1), whose treatment effects are significantly negative (column 2), and whose treatment effects are significantly positive (column 4). Column (3) contains the differences between column (2) and column (1), and column (5) contains the differences between column (4) and column (1).

  • 2) Significance levels of the two-sample t-test: * 10%, ** 5%, *** 1%.

5 Conclusion

In this paper, we propose a method for estimating the individual treatment effects using panel data, where multiple related outcomes are observed for a large number of individuals over a small number of pretreatment periods. The method is based on the interactive fixed effects model, and allows both the treatment assignment and the potential outcomes to be correlated with the unobserved individual characteristics. Monte Carlo simulations show that our method outperforms related methods. We also provide an example of estimating the effect of health insurance coverage on individual usage of hospital emergency departments using the Oregon Health Insurance Experiment data.

There are several directions for future research. First, our method requires the idiosyncratic shocks in the pretreatment outcomes to be uncorrelated either over time or across outcomes. It would be a valuable addition to allow (or detect and adjust for) more general dependence structures in the idiosyncratic shocks. Second, since the residuals of the rearranged models are not estimates of the idiosyncratic shocks, the variance of our estimator may be overestimated, especially when the number of pretreatment outcomes is small. A necessary step for future research is to correct for this bias. Third, the repeated pretreatment set splitting and averaging approach in our method is computationally expensive. It would be interesting to find better ways to select related outcomes or to use a more flexible averaging scheme. Fourth, the linear model specification may be restrictive. There is potential to extend our method, perhaps in combination with more flexible machine learning methods, to work with more general nonlinear outcomes.

Appendix A Proofs

Proof of Proposition 1.
\begin{align*}
\widehat{\tau}_{it,k}-\tau_{it,k} &= \widehat{Y}_{it,k}^{1}-\widehat{Y}_{it,k}^{0}-\left(Y_{it,k}^{1}-Y_{it,k}^{0}\right)\\
&= \boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{1}-\boldsymbol{\theta}_{t,k}^{1}\right)-\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{0}-\boldsymbol{\theta}_{t,k}^{0}\right)-e_{it,k}^{1}+e_{it,k}^{0}.
\end{align*}

Given the assumptions and our models in (2) and (3), we have that $Y_{it,k}$, $t\leq T_{0}$, $k\in\mathcal{K}$ are i.i.d. for all $i\in\mathcal{T}$ and all $i\in\mathcal{C}$, and $\mathbb{E}|Y_{it,k}|^{2}<\infty$, so that $\boldsymbol{Z}_{it}$ and $\boldsymbol{R}_{it}$ are i.i.d. for all $i\in\mathcal{T}$ and all $i\in\mathcal{C}$, $\mathbb{E}\|\boldsymbol{Z}_{it}\|^{2}<\infty$, and $\mathbb{E}\|\boldsymbol{R}_{it}\|^{2}<\infty$. In addition, $\mathbb{E}\left(\boldsymbol{Z}_{it}\boldsymbol{R}_{it}^{\prime}\right)$ has full rank due to the observed covariates and the unobserved individual characteristics, so that $\mathbb{E}\left(\boldsymbol{Z}_{it}\boldsymbol{R}_{it}^{\prime}\right)\boldsymbol{W}^{1}\mathbb{E}\left(\boldsymbol{R}_{it}\boldsymbol{Z}_{it}^{\prime}\right)$ is invertible. Therefore, the weak law of large numbers and the continuous mapping theorem apply, and

\begin{align*}
\widehat{\boldsymbol{\theta}}_{t,k}^{1}-\boldsymbol{\theta}_{t,k}^{1}
&=\left({\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}{\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{Z}_{t}^{1}\right)^{-1}{\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}{\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{e}_{t,k}^{1}\\
&=\left(\frac{1}{N_{1}}{\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}\frac{1}{N_{1}}{\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{Z}_{t}^{1}\right)^{-1}\frac{1}{N_{1}}{\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}\frac{1}{N_{1}}{\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{e}_{t,k}^{1}\\
&\overset{p}{\rightarrow}\left[\mathbb{E}\left(\boldsymbol{Z}_{it}\boldsymbol{R}_{it}^{\prime}\right)\boldsymbol{W}^{1}\mathbb{E}\left(\boldsymbol{R}_{it}\boldsymbol{Z}_{it}^{\prime}\right)\right]^{-1}\mathbb{E}\left(\boldsymbol{Z}_{it}\boldsymbol{R}_{it}^{\prime}\right)\boldsymbol{W}^{1}\mathbb{E}\left(\boldsymbol{R}_{it}e_{it,k}^{1}\right)\\
&=0,
\end{align*}

as $N_{1}\rightarrow\infty$. Similarly, it can be shown that $\widehat{\boldsymbol{\theta}}_{t,k}^{0}-\boldsymbol{\theta}_{t,k}^{0}\overset{p}{\rightarrow}0$ as $N_{0}\rightarrow\infty$.
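For readers who want the intermediate step, the first equality in the expansion of $\widehat{\boldsymbol{\theta}}_{t,k}^{1}-\boldsymbol{\theta}_{t,k}^{1}$ above can be recovered as follows. This is a sketch under two assumptions inferred from the form of that expression rather than quoted from the estimator's definition: that the estimator has the standard linear GMM form, and that the stacked treated outcomes, written here as $\boldsymbol{Y}_{t,k}^{1}$ (a symbol introduced for illustration), satisfy $\boldsymbol{Y}_{t,k}^{1}=\boldsymbol{Z}_{t}^{1}\boldsymbol{\theta}_{t,k}^{1}+\boldsymbol{e}_{t,k}^{1}$.

```latex
\begin{align*}
\widehat{\boldsymbol{\theta}}_{t,k}^{1}
&=\left({\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}{\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{Z}_{t}^{1}\right)^{-1}
  {\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}{\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{Y}_{t,k}^{1}\\
&=\left({\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}{\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{Z}_{t}^{1}\right)^{-1}
  {\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}{\boldsymbol{R}_{t}^{1}}^{\prime}
  \left(\boldsymbol{Z}_{t}^{1}\boldsymbol{\theta}_{t,k}^{1}+\boldsymbol{e}_{t,k}^{1}\right)\\
&=\boldsymbol{\theta}_{t,k}^{1}
  +\left({\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}{\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{Z}_{t}^{1}\right)^{-1}
  {\boldsymbol{Z}_{t}^{1}}^{\prime}\boldsymbol{R}_{t}^{1}\boldsymbol{W}^{1}{\boldsymbol{R}_{t}^{1}}^{\prime}\boldsymbol{e}_{t,k}^{1}.
\end{align*}
```

Subtracting $\boldsymbol{\theta}_{t,k}^{1}$ from both sides gives the starting expression, and the limit is zero because $\mathbb{E}\left(\boldsymbol{R}_{it}e_{it,k}^{1}\right)=0$, as used in the last step of the display.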

Since $\boldsymbol{Z}_{it}=O_{p}(1)$, we have $\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{1}-\boldsymbol{\theta}_{t,k}^{1}\right)=o_{p}(1)$ and $\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{0}-\boldsymbol{\theta}_{t,k}^{0}\right)=o_{p}(1)$. We also have $\mathbb{E}\left(e_{it,k}^{1}\mid\boldsymbol{H}_{it}=\boldsymbol{h}_{it}\right)=0$ and $\mathbb{E}\left(e_{it,k}^{0}\mid\boldsymbol{H}_{it}=\boldsymbol{h}_{it}\right)=0$ under Assumption 2.

Under the assumptions and by the Cauchy-Schwarz inequality, there exists $M^{*}\in[0,\infty)$ such that $\mathbb{E}\left|\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{1}-\boldsymbol{\theta}_{t,k}^{1}\right)\right|\leq\left(\mathbb{E}\left\|\boldsymbol{Z}_{it}\right\|^{2}\right)^{1/2}\left(\mathbb{E}\left\|\widehat{\boldsymbol{\theta}}_{t,k}^{1}-\boldsymbol{\theta}_{t,k}^{1}\right\|^{2}\right)^{1/2}<M^{*}$, $\mathbb{E}\left|\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{0}-\boldsymbol{\theta}_{t,k}^{0}\right)\right|<M^{*}$, and $\mathbb{E}\left|e_{it,k}^{0}-e_{it,k}^{1}\right|<M^{*}$. By the triangle inequality, $\mathbb{E}\left|\widehat{\tau}_{it,k}-\tau_{it,k}\right|\leq\mathbb{E}\left|\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{1}-\boldsymbol{\theta}_{t,k}^{1}\right)\right|+\mathbb{E}\left|\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{0}-\boldsymbol{\theta}_{t,k}^{0}\right)\right|+\mathbb{E}\left|e_{it,k}^{0}-e_{it,k}^{1}\right|<3M^{*}$, which implies that $\widehat{\tau}_{it,k}-\tau_{it,k}$ is uniformly integrable. Then by Lebesgue’s Dominated Convergence Theorem, convergence in probability implies convergence in mean, i.e.,

\begin{align*}
&\lim_{N_{1},N_{0}\rightarrow\infty}\mathbb{E}\left(\widehat{\tau}_{it,k}-\tau_{it,k}\mid\boldsymbol{H}_{it}=\boldsymbol{h}_{it}\right)\\
&\quad=\mathbb{E}\left[\underset{N_{1},N_{0}\rightarrow\infty}{\text{plim}}\left(\widehat{\tau}_{it,k}-\tau_{it,k}\right)\mid\boldsymbol{H}_{it}=\boldsymbol{h}_{it}\right]\\
&\quad=\mathbb{E}\left(e_{it,k}^{0}-e_{it,k}^{1}\mid\boldsymbol{H}_{it}=\boldsymbol{h}_{it}\right)\\
&\quad=0.
\end{align*}

Proof of Proposition 2.

Under Assumptions 1-3, the central limit theorem applies, and we have

\begin{align*}
&\frac{1}{N}\sum_{i=1}^{N}\left(\widehat{\tau}_{it,k}-\tau_{it,k}\right)\\
&\quad=\frac{1}{N}\sum_{i=1}^{N}\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{1}-\boldsymbol{\theta}_{t,k}^{1}\right)-\frac{1}{N}\sum_{i=1}^{N}\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{0}-\boldsymbol{\theta}_{t,k}^{0}\right)-\frac{1}{N}\sum_{i=1}^{N}\left(e_{it,k}^{1}-e_{it,k}^{0}\right)\\
&\quad=O_{p}\left(N_{1}^{-1/2}\right)+O_{p}\left(N_{0}^{-1/2}\right)+O_{p}\left(\left(N_{1}+N_{0}\right)^{-1/2}\right)\\
&\quad=O_{p}\left(N_{1}^{-1/2}\right)+O_{p}\left(N_{0}^{-1/2}\right).
\end{align*}

Since $\frac{1}{N}\sum_{i=1}^{N}\tau_{it,k}-\mathbb{E}\left(\tau_{it,k}\right)=O_{p}\left(\left(N_{1}+N_{0}\right)^{-1/2}\right)$, we have that $\frac{1}{N}\sum_{i=1}^{N}\widehat{\tau}_{it,k}-\tau_{t,k}=\frac{1}{N}\sum_{i=1}^{N}\widehat{\tau}_{it,k}-\frac{1}{N}\sum_{i=1}^{N}\tau_{it,k}+\frac{1}{N}\sum_{i=1}^{N}\tau_{it,k}-\tau_{t,k}=O_{p}\left(N_{1}^{-1/2}\right)+O_{p}\left(N_{0}^{-1/2}\right)$.

Proof of Proposition 3.

(i)

\begin{align*}
\tilde{\tau}_{it,k}-\tau_{it,k}= \boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{*1}-\boldsymbol{\theta}_{t,k}^{*1}\right)-\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{*0}-\boldsymbol{\theta}_{t,k}^{*0}\right)-\left(u_{it,k}^{1}-u_{it,k}^{0}\right).
\end{align*}

Since $\mathbb{E}\left(u_{it,k}^{1}\mid\boldsymbol{Z}_{it}\right)=0$ and under Assumption 2, we have $\mathbb{E}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{*1}-\boldsymbol{\theta}_{t,k}^{*1}\mid\boldsymbol{Z}_{t}\right)=0$, and $\mathbb{E}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{*1}-\boldsymbol{\theta}_{t,k}^{*1}\mid\boldsymbol{Z}_{it}\right)=\mathbb{E}_{-i}\left[\mathbb{E}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{*1}-\boldsymbol{\theta}_{t,k}^{*1}\mid\boldsymbol{Z}_{t}\right)\right]=0$, where $\mathbb{E}_{-i}(\cdot)$ denotes the expectation taken with respect to $\boldsymbol{Z}_{jt},\ j\neq i$. Similarly, we have $\mathbb{E}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{*0}-\boldsymbol{\theta}_{t,k}^{*0}\mid\boldsymbol{Z}_{it}\right)=0$.

Thus, $\mathbb{E}\left(\tilde{\tau}_{it,k}-\tau_{it,k}\mid\boldsymbol{Z}_{it}=\boldsymbol{z}_{it}\right)=0$. It follows that $\mathbb{E}\left(\tilde{\tau}_{t,k}-\tau_{t,k}\right)=0$ using the law of iterated expectations.
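As an informal check of part (i), the following Python sketch holds the regressors fixed, redraws the errors many times, and averages $\tilde{\tau}_{it,k}-\tau_{it,k}$ for each unit. Everything in it is an illustrative assumption rather than the paper's design: the function name `mean_error_given_z`, the two-column regressor matrix, the coefficient values, the standard normal errors, and the fixed split into treated and control halves. The estimate is taken to be the difference of the two fitted values, as the decomposition above suggests.

```python
import numpy as np


def mean_error_given_z(n=200, reps=2000, seed=1):
    """Average (tau_tilde - tau) per unit over error draws, holding Z fixed."""
    rng = np.random.default_rng(seed)
    # Hypothetical regressors: a constant and one covariate, drawn once.
    z = np.column_stack([np.ones(n), rng.normal(size=n)])
    # Assumed fixed split: first half treated, second half control.
    treated = np.arange(n) < n // 2
    theta1 = np.array([1.0, 2.0])   # assumed treated-state coefficients
    theta0 = np.array([0.5, -1.0])  # assumed control-state coefficients

    errors = np.empty((reps, n))
    for r in range(reps):
        # Errors are mean-independent of Z by construction.
        u1 = rng.normal(size=n)
        u0 = rng.normal(size=n)
        y1 = z @ theta1 + u1
        y0 = z @ theta0 + u0
        # Least squares on the treated and control subsamples separately.
        th1_hat, *_ = np.linalg.lstsq(z[treated], y1[treated], rcond=None)
        th0_hat, *_ = np.linalg.lstsq(z[~treated], y0[~treated], rcond=None)
        tau_tilde = z @ (th1_hat - th0_hat)
        tau = y1 - y0
        errors[r] = tau_tilde - tau
    return errors.mean(axis=0)


if __name__ == "__main__":
    avg = mean_error_given_z()
    print("largest absolute per-unit average error:", np.abs(avg).max())
```

Under these assumptions the per-unit averages should sit close to zero, with the remaining deviation being Monte Carlo noise that shrinks as `reps` grows.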

(ii)

\begin{align*}
\frac{1}{N}\sum_{i=1}^{N}\left(\tilde{\tau}_{it,k}-\tau_{it,k}\right)
&= \frac{1}{N}\sum_{i=1}^{N}\left[\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{*1}-\boldsymbol{\theta}_{t,k}^{1}\right)+{\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}-\varepsilon_{it,k}^{1}\right]-\frac{1}{N}\sum_{i=1}^{N}\left[\boldsymbol{Z}_{it}^{\prime}\left(\widehat{\boldsymbol{\theta}}_{t,k}^{*0}-\boldsymbol{\theta}_{t,k}^{0}\right)+{\boldsymbol{\gamma}_{t,k}^{0}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}-\varepsilon_{it,k}^{0}\right] \\
&= \frac{1}{N}\sum_{i=1}^{N}\left[\boldsymbol{Z}_{it}^{\prime}\left(\sum_{j\in\mathcal{T}}\boldsymbol{Z}_{jt}\boldsymbol{Z}_{jt}^{\prime}\right)^{-1}\sum_{j\in\mathcal{T}}\boldsymbol{Z}_{jt}\left(\varepsilon_{jt,k}^{1}-{\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}\boldsymbol{\varepsilon}_{j}^{\mathcal{P}}\right)+{\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}-\varepsilon_{it,k}^{1}\right] \\
&\quad -\frac{1}{N}\sum_{i=1}^{N}\left[\boldsymbol{Z}_{it}^{\prime}\left(\sum_{j\in\mathcal{C}}\boldsymbol{Z}_{jt}\boldsymbol{Z}_{jt}^{\prime}\right)^{-1}\sum_{j\in\mathcal{C}}\boldsymbol{Z}_{jt}\left(\varepsilon_{jt,k}^{0}-{\boldsymbol{\gamma}_{t,k}^{0}}^{\prime}\boldsymbol{\varepsilon}_{j}^{\mathcal{P}}\right)+{\boldsymbol{\gamma}_{t,k}^{0}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}-\varepsilon_{it,k}^{0}\right].
\end{align*}

The following two statements hold:

\begin{align*}
\frac{1}{N}\sum_{i=1}^{N}\boldsymbol{Z}_{it}^{\prime}\left(\sum_{j\in\mathcal{T}}\boldsymbol{Z}_{jt}\boldsymbol{Z}_{jt}^{\prime}\right)^{-1}\sum_{j\in\mathcal{T}}\boldsymbol{Z}_{jt}\varepsilon_{jt,k}^{1} &= O_{p}\left(N_{1}^{-1/2}\right), \\
\frac{1}{N}\sum_{i=1}^{N}\boldsymbol{Z}_{it}^{\prime}\left(\sum_{j\in\mathcal{C}}\boldsymbol{Z}_{jt}\boldsymbol{Z}_{jt}^{\prime}\right)^{-1}\sum_{j\in\mathcal{C}}\boldsymbol{Z}_{jt}\varepsilon_{jt,k}^{0} &= O_{p}\left(N_{0}^{-1/2}\right).
\end{align*}

Following similar arguments as in Li and Bell, (2017), we denote $\tilde{\Delta}_{i}^{1}={\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}-\varepsilon_{it,k}^{1}-\boldsymbol{Z}_{it}^{\prime}\left(\sum_{j\in\mathcal{T}}\boldsymbol{Z}_{jt}\boldsymbol{Z}_{jt}^{\prime}\right)^{-1}\sum_{j\in\mathcal{T}}\boldsymbol{Z}_{jt}{\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}\boldsymbol{\varepsilon}_{j}^{\mathcal{P}}$, and $\Delta_{i}^{1}={\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}-\varepsilon_{it,k}^{1}-\boldsymbol{Z}_{it}^{\prime}\mathbb{E}\left(\boldsymbol{Z}_{it}\boldsymbol{Z}_{it}^{\prime}\right)^{-1}\mathbb{E}\left(\boldsymbol{Z}_{it}{\boldsymbol{\gamma}_{t,k}^{1}}^{\prime}\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}\right)$. We have that $\mathbb{E}\left(\boldsymbol{Z}_{it}\Delta_{i}^{1}\right)=0$. Since $\boldsymbol{Z}_{it}$ contains the constant 1, it follows that $\mathbb{E}\left(\Delta_{i}^{1}\right)=0$. Thus $\frac{1}{N}\sum_{i=1}^{N}\tilde{\Delta}_{i}^{1}\overset{p}{\rightarrow}\mathbb{E}\left(\Delta_{i}^{1}\right)=0$ as $N_{1}\rightarrow\infty$, and $\frac{1}{N}\sum_{i=1}^{N}\tilde{\Delta}_{i}^{1}=O_{p}\left(N_{1}^{-1/2}\right)$. Similarly, we have $\frac{1}{N}\sum_{i=1}^{N}\tilde{\Delta}_{i}^{0}=O_{p}\left(N_{0}^{-1/2}\right)$.

Thus, $\frac{1}{N}\sum_{i=1}^{N}\tilde{\tau}_{it,k}-\frac{1}{N}\sum_{i=1}^{N}\tau_{it,k}=O_{p}\left(N_{1}^{-1/2}\right)+O_{p}\left(N_{0}^{-1/2}\right)$, and $\frac{1}{N}\sum_{i=1}^{N}\tilde{\tau}_{it,k}-\tau_{t,k}=O_{p}\left(N_{1}^{-1/2}\right)+O_{p}\left(N_{0}^{-1/2}\right)$.
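The step in part (ii) that the average of $\tilde{\Delta}_{i}^{1}$ vanishes even though $\boldsymbol{Z}_{it}$ is correlated with $\boldsymbol{\varepsilon}_{i}^{\mathcal{P}}$ can be illustrated with a small simulation. The sketch below is a toy design and not the paper's: the function name `mean_delta_tilde`, the scalar loading `gamma`, the single covariate built as a latent factor plus the pre-treatment error, and purely random treatment assignment are all assumptions made for illustration. The projection coefficient is estimated on the treated subsample only, while the average is taken over all units, mirroring the definition of $\tilde{\Delta}_{i}^{1}$.

```python
import numpy as np


def mean_delta_tilde(n, gamma=0.5, seed=0):
    """Sample mean over all n units of a toy version of Delta-tilde."""
    rng = np.random.default_rng(seed)
    f = rng.normal(size=n)       # hypothetical latent factor
    eps_p = rng.normal(size=n)   # pre-treatment error (scalar here)
    eps = rng.normal(size=n)     # post-treatment error
    # Z contains a constant and a covariate correlated with eps_p.
    z = np.column_stack([np.ones(n), f + eps_p])
    # Simplifying assumption: treatment is assigned at random.
    treated = rng.random(n) < 0.5

    target = gamma * eps_p
    # Projection of gamma * eps_p on Z, using treated units only.
    coef, *_ = np.linalg.lstsq(z[treated], target[treated], rcond=None)
    delta_tilde = target - eps - z @ coef
    return float(delta_tilde.mean())


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, abs(mean_delta_tilde(n)))
```

In this design the printed magnitudes should tend to shrink as `n` grows, roughly in line with the $N_{1}^{-1/2}$ rate, although any single draw can deviate from that pattern.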

References

  • Abadie and Cattaneo, (2018) Abadie, A. and Cattaneo, M. D. (2018). Econometric methods for program evaluation. Annual Review of Economics, 10:465–503.
  • Abadie et al., (2010) Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association, 105(490):493–505.
  • Abadie et al., (2015) Abadie, A., Diamond, A., and Hainmueller, J. (2015). Comparative politics and the synthetic control method. American Journal of Political Science, 59(2):495–510.
  • Acemoglu et al., (2008) Acemoglu, D., Johnson, S., Robinson, J. A., and Yared, P. (2008). Income and democracy. American Economic Review, 98(3):808–42.
  • Ahn et al., (2013) Ahn, S. C., Lee, Y. H., and Schmidt, P. (2013). Panel data models with multiple time-varying individual effects. Journal of Econometrics, 174(1):1–14.
  • Andrews, (1999) Andrews, D. W. (1999). Consistent moment selection procedures for generalized method of moments estimation. Econometrica, 67(3):543–563.
  • Athey et al., (2021) Athey, S., Bayati, M., Doudchenko, N., Imbens, G., and Khosravi, K. (2021). Matrix completion methods for causal panel data models. Journal of the American Statistical Association, pages 1–15.
  • Athey and Imbens, (2017) Athey, S. and Imbens, G. W. (2017). The state of applied econometrics: Causality and policy evaluation. Journal of Economic Perspectives, 31(2):3–32.
  • Bai, (2009) Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica, 77(4):1229–1279.
  • Bai and Ng, (2002) Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.
  • Berry et al., (1995) Berry, S., Levinsohn, J., and Pakes, A. (1995). Automobile prices in market equilibrium. Econometrica: Journal of the Econometric Society, pages 841–890.
  • Ferman and Pinto, (2019) Ferman, B. and Pinto, C. (2019). Synthetic controls with imperfect pre-treatment fit. arXiv preprint arXiv:1911.08521.
  • Finkelstein et al., (2012) Finkelstein, A., Taubman, S., Wright, B., Bernstein, M., Gruber, J., Newhouse, J. P., Allen, H., Baicker, K., and Oregon Health Study Group (2012). The Oregon Health Insurance Experiment: evidence from the first year. The Quarterly Journal of Economics, 127(3):1057–1106.
  • Hansen, (2021) Hansen, B. E. (2021). Econometrics. Manuscript. https://www.ssc.wisc.edu/~bhansen/econometrics/.
  • Holtz-Eakin et al., (1988) Holtz-Eakin, D., Newey, W., and Rosen, H. S. (1988). Estimating vector autoregressions with panel data. Econometrica: Journal of the econometric society, pages 1371–1395.
  • Hsiao et al., (2012) Hsiao, C., Steve Ching, H., and Ki Wan, S. (2012). A panel data approach for program evaluation: measuring the benefits of political and economic integration of Hong Kong with mainland China. Journal of Applied Econometrics, 27(5):705–740.
  • Li and Bell, (2017) Li, K. T. and Bell, D. R. (2017). Estimation of average treatment effects with panel data: Asymptotic theory and implementation. Journal of Econometrics, 197(1):65–75.
  • Rosenbaum and Rubin, (1983) Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55.
  • Rubin, (1974) Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688.
  • Taubman et al., (2014) Taubman, S. L., Allen, H. L., Wright, B. J., Baicker, K., and Finkelstein, A. N. (2014). Medicaid increases emergency-department use: evidence from Oregon’s Health Insurance Experiment. Science, 343(6168):263–268.
  • Xu, (2017) Xu, Y. (2017). Generalized synthetic control method: Causal inference with interactive fixed effects models. Political Analysis, 25(1):57–76.