
Estimating interaction effects with panel data

Chris Muris, Department of Economics, McMaster University, muerisc@mcmaster.ca
Konstantin M. Wacker, Department of Global Economics and Management, University of Groningen, k.m.wacker@rug.nl
Abstract.

A common task in empirical economics is to estimate interaction effects that measure how the effect of one variable $X$ on another variable $Y$ depends on a third variable $H$. This paper considers the estimation of interaction effects in linear panel models with a fixed number of time periods. There are at least two ways to estimate interaction effects in this setting, both common in applied work. Our theoretical results show that these two approaches are distinct, and only coincide under strong conditions on unobserved effect heterogeneity. Our empirical results show that the difference between the two approaches is large, leading to conflicting conclusions about the sign of the interaction effect. Taken together, our findings may guide the choice between the two approaches in empirical work.

Key words and phrases:
panel data, interaction effects, correlated random coefficients
This paper benefited from discussions at presentations at the Universities of Bristol, Mainz, Vienna, the World Bank, the European Commission’s Joint Research Centre, and the Southern Economic Association meetings. We would like to thank the respective participants for their feedback and input, especially Irene Botosaru, Peter Egger, Michaela Kesina, Adam Lavecchia, Krishna Pendakur, James Powell, Nathanael Vellekoop, Frank Windmeijer, and Jonathan Zhang. We are grateful to Mathias Thoenig for providing the data used by Couttenier, Petrencu, Rohner, and Thoenig (2019) and for helping to prepare it for our empirical application, to the Swiss BfS for sharing regional GDP p.c. data, and to Eline Koopman for excellent research assistance.

1. Introduction

In empirical work in economics, we often want to estimate how the effect of one variable $X$ on another variable $Y$ depends on a third variable $H$. The standard approach to answering such a question with cross-sectional data is to estimate interaction effects based on a linear model $Y=\alpha+X\beta+XH\kappa+U$.¹ In this case, $\kappa$ is called the interaction term coefficient, and it measures how the effect of $X$ on $Y$ varies with $H$.

¹ Alternative approaches for modeling effect heterogeneity include those based on random coefficients (cf. Lewbel and Pendakur, 2017), and those based on tools from machine learning (see Athey and Imbens, 2015; Bordt, Farbmacher, and Kogel, 2020; Chernozhukov, Hansen, Liao, and Zhu, 2019; Wager and Athey, 2018).

With panel data, there are at least two ways to estimate interaction effects in linear models. The most common approach, which we will call the interaction term estimator (ITE), is based on a regression of $Y_{it}$ on $X_{it}$ and $X_{it}H_{i}$ (and additive fixed effects).² The OLS estimate of the coefficient on $X_{it}H_{i}$ is the ITE for $\kappa$. A second approach, which we will call the correlated interaction term estimator (CITE), consists of two steps. First, regress $Y_{it}$ on $X_{it}$ separately for each panel unit $i$, obtaining $b_{i}$. In a second step, project $b_{i}$ onto $H_{i}$ in a cross-sectional regression. The result of the second step is the CITE for $\kappa$. CITE has been used in applied work but appears to be less popular than ITE.³

² See, for example, Burnside and Dollar (2000); Shambaugh (2004); List and Sturm (2006); Amiti and Konings (2007); Spilimbergo (2009); Epifani and Gancia (2009); Duflo, Dupas, and Kremer (2011); Bloom, Sadun, and Van Reenen (2012); Berman, Martin, and Mayer (2012); Bloom, Draca, and Van Reenen (2016); Storeygard (2016); Alsan and Goldin (2019); Herrera, Ordoñez, and Trebesch (2020); Manacorda and Tesei (2020).

³ Existing papers that use this approach include Couttenier, Petrencu, Rohner, and Thoenig (2019) and MaCurdy (1981).

One goal of this paper is to show that these two approaches, ITE and CITE, are distinct, and will only recover the same object under strong conditions on unobserved effect heterogeneity. We derive conditions under which CITE is consistent for the interaction effect. These conditions are not sufficient to guarantee consistency of ITE. In two empirical applications, we show that ITE and CITE lead to conflicting conclusions about the sign of interaction effects. The difference in interaction effect estimates can be large. Since CITE requires weaker assumptions on unobserved effect heterogeneity for consistency, we recommend it for typical applications in economics where there may be substantial, correlated unobserved effect heterogeneity.⁴ In most existing work, the choice between ITE and CITE is neither made explicit nor motivated. We believe that we are the first to provide a rigorous analysis of these two estimators.

⁴ In a sense that will become clear later, this preference for CITE over ITE corresponds to the preference for fixed effects over (correlated) random effects approaches for linear panel models with additive unobserved heterogeneity.

Fundamental to our results is that, in applications where interaction effects matter, there is likely additional unobserved effect heterogeneity in the effect of $X$ on $Y$. To fix ideas, consider the following outcome equation, which is a special case of our general framework:⁵

(1) $$Y_{it}=\alpha_{i}+X_{it}\beta_{i}+\kappa X_{it}H_{i}+\varepsilon_{it}.$$

In (1), $\beta_{i}$ captures unobserved effect heterogeneity in addition to the observed effect heterogeneity due to the interaction term $\kappa H_{i}$.

⁵ Our general framework in Section 2 additionally has vector-valued $X_{it}$, additional controls, time-varying interaction variables, and additional sources of unobserved effect heterogeneity.

That the consistency of ITE relies on restrictive exogeneity conditions is evident from equation (1): the linear model underlying a regression of $Y$ on $X$ and $XH$ has the composite error term $\varepsilon_{it}+(\beta_{i}-\beta)X_{it}$. This error term is correlated with $X_{it}$ and $H_{i}$, unless the unobserved effect heterogeneity is unrelated to $X_{it}$ and $H_{i}$.

CITE, on the other hand, allows the unobserved effect heterogeneity $\beta_{i}$ to be arbitrarily correlated with $X_{it}$, by treating the $\beta_{i}$ as parameters. We show that it is consistent for the interaction effect under conditions that do not guarantee the consistency of ITE. Importantly, this approach does not require a large number of time periods. Our consistency result is obtained under fixed $T$.

If $H_{i}$ is endogenous, neither ITE nor CITE recovers a causal effect of $H_{i}$ on the effect of $X$ on $Y$. The best one can hope for is to recover the correlation between $\beta_{i}$ and $H_{i}$. CITE recovers this correlation, but ITE does not.

To demonstrate the consistency of CITE, we build on existing work from the literature on correlated random coefficient models for linear panel models (see Chamberlain, 1992; Arellano and Bonhomme, 2012; Graham and Powell, 2012; Laage, 2020; Sasaki and Ura, 2021). The object of interest in this literature is (some feature of) the distribution of $\beta_{i}$ in $Y_{it}=X_{it}\beta_{i}+U_{it}$. Techniques from this literature can also be used in the context of difference-in-differences estimation (cf. Verdier, 2020; de Chaisemartin and D’Haultfoeuille, 2022). Our work differs from the papers in this literature because we are interested in the estimation of interaction effects, while the distribution of the random coefficients is a nuisance parameter.

Previous work on interaction effects has made mention of using CITE.⁶ However, we are not aware of existing work that demonstrates that ITE and CITE are distinct, and that analyzes their asymptotic properties.

⁶ Balli and Sørensen (2013) observe that unobserved effect heterogeneity can lead to inconsistency in the ITE, and suggest that “if the time-series dimension of the data is large, one may directly allow for country-varying slopes”. We show that such slopes should generally be preferred regardless of the time-series dimension. Giesselmann and Schmidt-Catran (2020) discuss CITE, but prefer their double-demeaned estimator, citing computational concerns, a concern that interaction effects involving time-invariant variables cannot be estimated, and observing that the minimum number of time periods is higher for CITE than for their ITE.

Organization

Section 2 introduces our framework for the estimation of interaction effects with panel data. Section 3 formally defines the two estimators. Sections 4 and 5 present our theoretical results. Section 6 contains the results for two empirical applications. Appendix A contains proofs of the main results. Appendix B has additional data and details for the empirical applications.

2. Setup

We are interested in the estimation of interaction effects in a static linear panel model. Typically, the outcome equation for this purpose is specified as

(2) $$Y_{it}=X_{it}H_{i}\kappa+X_{it}G_{it}\phi+Z_{it}\gamma+\widetilde{U}_{it},$$

which allows the effect of the explanatory variable $X_{it}$ on the dependent variable $Y_{it}$ to depend on observable, time-invariant interaction variables $H_{i}$ and observable, time-varying interaction variables $G_{it}$. The parameters of interest are $\kappa$, the interaction term coefficient (ITC) on $H_{i}$, and $\phi$, the ITC on $G_{it}$.

Once we admit that the effect of $X$ on $Y$ may depend on the observable $G$ and $H$, we may wonder whether there are additional, unobserved sources of effect heterogeneity. Our framework explicitly introduces unobserved heterogeneity in the effect of $X$ on $Y$ via the following three equations for cross-section unit $i=1,\cdots,n$ at time $t=1,\cdots,T$:

(3) $$Y_{it}=X_{it}\beta_{it}+Z_{it}\gamma+U_{it},$$
(4) $$\beta_{itk}=\delta_{ik}+G_{it}\phi_{k}+V_{itk},\quad k=1,\ldots,K_{x},$$
(5) $$\delta_{i1}=H_{i}\kappa+\epsilon_{i}.$$

The outcome equation (3) describes how $Y_{it}\in\mathbb{R}$ responds to a change in $X_{it}\in\mathbb{R}^{K_{x}}$, allowing for control variables $Z_{it}\in\mathbb{R}^{K_{z}}$. The effect of $X_{itk}$ on $Y_{it}$, denoted $\beta_{itk}$, can vary across $i$ and $t$.

The coefficient equation (4) describes the heterogeneity in the effect of $X_{itk}$ on $Y_{it}$. It decomposes the heterogeneous effect $\beta_{itk}$ into:

  (i) a part that depends on $G_{it}\in\mathbb{R}^{K_{g}}$;

  (ii) a time-invariant, unit-specific $\delta_{ik}\in\mathbb{R}$; and

  (iii) an idiosyncratic effect heterogeneity $V_{itk}$.

The heterogeneity equation (5) relates the unit-specific effect heterogeneity in the effect of the first regressor, $\delta_{i1}$, to $H_{i}\in\mathbb{R}^{K_{h}}$, so that $\epsilon_{i}$ captures the unobserved part of $\delta_{i1}$. We do not model the time-invariant coefficients on the other regressors; see Remark 1.

In our framework, the effect of $X$ on $Y$ may vary even when holding constant $G$ and $H$, due to the presence of unobserved effect heterogeneity $\epsilon_{i}$ and $V_{itk}$. If we shut down unobserved effect heterogeneity, i.e.

$$V_{itk}=\epsilon_{i}=0\text{ for all }i,t,k,$$

and if all observables are scalar, then we obtain the standard outcome equation (2). Thus, our framework is a natural generalization of the standard way of thinking about interaction effects in linear models that takes seriously the role of unobserved effect heterogeneity.

In a model with scalar $X,G,H,Z$, and with unobserved effect heterogeneity, the reduced form of (3)-(5) is

(6) $$Y_{it}=X_{it}H_{i}\kappa+X_{it}G_{it}\phi+Z_{it}\gamma+\left(U_{it}+X_{it}V_{it}+X_{it}\epsilon_{i}\right).$$

It is clear from (6) that, under our setup, the error term $\widetilde{U}_{it}$ in (2) contains the unobserved effect heterogeneity terms. If those terms are correlated with the observables, a regression of $Y$ on $XG$ and $XH$ is inconsistent due to endogeneity.
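The substitution that leads to the reduced form can be written out step by step. The following is a sketch for the scalar case that uses only equations (3)-(5), writing $\delta_{i}$, $\phi$, and $V_{it}$ for the scalar versions of $\delta_{i1}$, $\phi_{1}$, and $V_{it1}$:

```latex
\begin{align*}
Y_{it} &= X_{it}\beta_{it} + Z_{it}\gamma + U_{it} \\
       &= X_{it}\left(\delta_{i} + G_{it}\phi + V_{it}\right) + Z_{it}\gamma + U_{it} \\
       &= X_{it}\left(H_{i}\kappa + \epsilon_{i} + G_{it}\phi + V_{it}\right) + Z_{it}\gamma + U_{it} \\
       &= X_{it}H_{i}\kappa + X_{it}G_{it}\phi + Z_{it}\gamma
          + \left(U_{it} + X_{it}V_{it} + X_{it}\epsilon_{i}\right).
\end{align*}
```

The last line groups every term that a researcher does not observe into a single composite error, which makes visible that both $V_{it}$ and $\epsilon_{i}$ enter multiplied by $X_{it}$.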

Remark 1.

Our specification allows for additive fixed effects in the outcome equation by setting $X_{it2}=1$. Then $\delta_{i2}$ is the conventional additive fixed effect.

Remark 2.

The index $t$ need not refer to time, but can refer to students $t$ for a given classroom $i$, counties $t$ within a given state $i$, employees $t$ within a given firm $i$, etc.

3. Two estimators

There are at least two estimators for the ITCs $\phi$ and $\kappa$ introduced in equations (3)-(5). We call them the interaction term estimator (ITE) and the correlated interaction term estimator (CITE). The ITE, defined formally in Section 3.2, is a regression of $Y$ on $(X,Z)$ augmented with interactions of $X$ with $G$ and $H$. The CITE, defined formally in Section 3.1, is a two-step estimator. The first step is a regression of $Y$ on $(X,Z)$ augmented with interactions of $X$ with $G$ and dummy variables for each $i$. The second step is a regression of the estimated coefficients on the dummy variable interaction terms on $H$.

Both approaches can accommodate additive fixed effects in the outcome equation by setting $X_{it2}$ equal to a constant.

As a particular example of our framework and the two estimators, consider the special case with scalar $X$ and $H$, no $Z$ and $G$, and $V_{it1}=0$. Then our setup simplifies to

(7) $$Y_{it}=X_{it}\delta_{i1}+U_{it},$$
(8) $$\delta_{i1}=H_{i}\kappa+\epsilon_{i},$$

with reduced form

(9) $$Y_{it}=\left(X_{it}H_{i}\right)\kappa+\left(U_{it}+X_{it}\epsilon_{i}\right).$$

Then ITE is the regression of $Y$ on $X\times H$ suggested by (9). In contrast, CITE is based on (7) and (8): first, regress $Y$ on $X$ for each $i$ to obtain $\widehat{\delta}_{i1}$. Then regress $\widehat{\delta}_{i1}$ on $H_{i}$ to obtain $\widehat{\kappa}$.
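To make the mechanics of this special case concrete, the two estimators can be sketched in a few lines of NumPy. The function names `ite_simple` and `cite_simple`, the array layout (one row per unit, one column per period), and the toy data are illustrative assumptions, not part of the paper. The data are generated without noise ($U_{it}=0$, $\epsilon_{i}=0$), so both estimators should return $\kappa$ up to floating-point error; the example only shows how each estimate is computed and says nothing about their behavior under heterogeneity.

```python
import numpy as np

def ite_simple(Y, X, H):
    """Pooled OLS of Y_it on X_it * H_i without intercept, as suggested by (9)."""
    XH = X * H[:, None]                       # n x T array of interaction terms
    return np.sum(XH * Y) / np.sum(XH ** 2)

def cite_simple(Y, X, H):
    """Two-step estimator based on (7) and (8)."""
    # Step 1: unit-by-unit OLS slope of Y_it on X_it (no intercept).
    delta_hat = np.sum(X * Y, axis=1) / np.sum(X ** 2, axis=1)
    # Step 2: cross-sectional OLS of the estimated slopes on H_i (no intercept).
    return np.sum(H * delta_hat) / np.sum(H ** 2)

# Hypothetical noise-free toy data: delta_i1 = H_i * kappa exactly and U_it = 0.
rng = np.random.default_rng(0)
n, T, kappa = 50, 4, 1.5
H = rng.normal(size=n)
X = rng.normal(size=(n, T))
Y = X * (H * kappa)[:, None]

ite_value = ite_simple(Y, X, H)
cite_value = cite_simple(Y, X, H)
print(ite_value, cite_value)
```

The two functions differ only in how observations are weighted: ITE pools all $(i,t)$ pairs, while CITE first collapses each unit to a single slope.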

In the remainder of this section, we formally define the two estimators. From now on, we will assume access to a random sample consistent with the framework in Section 2.

Assumption 1 (Random sampling).

For each $i=1,\cdots,n$, the observed data is

$$W_{i}=\left(Y_{i1},\cdots,Y_{iT},X_{i1},\cdots,X_{iT},G_{i1},\cdots,G_{iT},Z_{i1},\cdots,Z_{iT},H_{i}\right),$$

generated by equations (3)–(5). $\{W_{i}\}_{i=1}^{n}$ is an i.i.d. random sequence.

3.1. Correlated interaction term estimator (CITE)

Substituting (4) into (3) yields the reduced form

(10) $$\begin{aligned}
Y_{it} &=\sum_{k}X_{itk}\beta_{itk}+Z_{it}\gamma+U_{it}\\
&=\sum_{k}X_{itk}\left(\delta_{ik}+G_{it}\phi_{k}+V_{itk}\right)+Z_{it}\gamma+U_{it}\\
&\equiv X_{it}\delta_{i}+\Psi_{it}\theta+\zeta_{it},
\end{aligned}$$

where we have introduced the following objects (dimensions in square brackets):

$$\begin{aligned}
\delta_{i} &=\left(\delta_{i1},\cdots,\delta_{iK_{x}}\right) && \left[K_{x}\times 1\right],\\
\Psi_{it} &=\left(X_{it}\otimes G_{it},Z_{it}\right) && \left[1\times\left(K_{x}K_{g}+K_{z}\right)\right],\\
\theta &=\left(\phi_{1},\cdots,\phi_{K_{x}},\gamma\right) && \left[\left(K_{x}K_{g}+K_{z}\right)\times 1\right],\\
\zeta_{it} &=U_{it}+\sum_{k}V_{itk}X_{itk} && \left[1\times 1\right].
\end{aligned}$$

For a given $i$, collecting this relationship across $t=1,\cdots,T$ yields

(11) $$Y_{i}=X_{i}\delta_{i}+\Psi_{i}\theta+\zeta_{i},$$

where $Y_{i}$, $X_{i}$, $\Psi_{i}$, and $\zeta_{i}$ are the counterparts of the objects in (10), with $T$ rows.

For the CITE to be well-defined, we assume sufficient variation in $X_{i}$.

Assumption 2.

There exists an $h>0$ such that

$$\inf_{i}\det\left(X_{i}^{\prime}X_{i}\right)\geq h.$$

Assumption 2 avoids the identification issues in Graham and Powell (2012) and Arellano and Bonhomme (2012). If it does not hold for all $i$, our analysis should be thought of as conditioning on a subpopulation for which it does hold, cf. Arellano and Bonhomme (2012). Assumption 2 allows us to define the residual maker matrix

$$M_{i}=I-X_{i}\left(X_{i}^{\prime}X_{i}\right)^{-1}X_{i}^{\prime},$$

and write

(12) $$\begin{aligned}
M_{i}Y_{i} &=M_{i}X_{i}\delta_{i}+M_{i}\Psi_{i}\theta+M_{i}\zeta_{i}\\
&=M_{i}\Psi_{i}\theta+M_{i}\zeta_{i},
\end{aligned}$$

using that $M_{i}X_{i}=0$. We also require sufficient variation in $\Psi_{i}$ after projecting out $X_{i}$.

Assumption 3.

The matrix $E\left(\Psi_{i}^{\prime}M_{i}\Psi_{i}\right)$ is invertible.

Assumptions 1, 2, and 3 guarantee that CITE for $\theta$ is well-defined for large enough $n$.

Definition 1.

CITE for $\theta$ is given by

$$\widehat{\theta}_{n}\equiv\left(\sum_{i=1}^{n}\Psi_{i}^{\prime}M_{i}\Psi_{i}\right)^{-1}\sum_{i=1}^{n}\Psi_{i}^{\prime}M_{i}Y_{i}.$$

CITE for $(\phi,\gamma)$ can be extracted from $\widehat{\theta}_{n}$. For $\kappa$, we need variation in $H_{i}$.

Assumption 4.

The matrix $E\left(H_{i}^{\prime}H_{i}\right)$ is invertible.

This is a standard no-multicollinearity assumption on the regressors $H_{i}$. For each $i$, let $\widehat{\delta}_{i1}$ be the first element of

$$\widehat{\delta}_{i}=\left(X_{i}^{\prime}X_{i}\right)^{-1}X_{i}^{\prime}\left(Y_{i}-\Psi_{i}\widehat{\theta}_{n}\right).$$

Definition 2.

Then the CITE for $\kappa$ is

$$\widehat{\kappa}_{n}\equiv\left(\sum_{i=1}^{n}H_{i}^{\prime}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{\prime}\widehat{\delta}_{i1}.$$
Remark 3.

In the special case that $K_{x}=1$ and $X_{i1}=1$, CITE is the linear fixed effects estimator based on

$$Y_{it}=\delta_{i1}+Z_{it}\gamma+U_{it}.$$

In contrast, ITE is a correlated random effects estimator; see Remark 4.
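Definitions 1 and 2 translate fairly directly into matrix code. The following NumPy sketch is one possible implementation under assumed array shapes (`Y` is `(n, T)`, `X` is `(n, T, Kx)`, `Psi` is `(n, T, Kpsi)`, and `H` is `(n, Kh)` with rows $H_{i}$); the function name `cite` and the noise-free test data are hypothetical and chosen only so that the output can be checked by hand.

```python
import numpy as np

def cite(Y, X, Psi, H):
    """Return (theta_hat, kappa_hat) following Definitions 1 and 2."""
    n, T, _ = X.shape
    Kpsi = Psi.shape[2]
    A = np.zeros((Kpsi, Kpsi))
    b = np.zeros(Kpsi)
    for i in range(n):
        Xi = X[i]
        # Residual maker M_i = I - X_i (X_i'X_i)^{-1} X_i'
        Mi = np.eye(T) - Xi @ np.linalg.solve(Xi.T @ Xi, Xi.T)
        A += Psi[i].T @ Mi @ Psi[i]
        b += Psi[i].T @ Mi @ Y[i]
    theta_hat = np.linalg.solve(A, b)

    # delta_hat_i = (X_i'X_i)^{-1} X_i'(Y_i - Psi_i theta_hat); keep its first element.
    delta1 = np.empty(n)
    for i in range(n):
        Xi = X[i]
        resid = Y[i] - Psi[i] @ theta_hat
        delta1[i] = np.linalg.solve(Xi.T @ Xi, Xi.T @ resid)[0]

    # Cross-sectional regression of delta_hat_i1 on H_i.
    kappa_hat = np.linalg.solve(H.T @ H, H.T @ delta1)
    return theta_hat, kappa_hat

# Hypothetical noise-free data with an additive fixed effect (X_it2 = 1).
rng = np.random.default_rng(1)
n, T = 200, 5
H = rng.normal(size=(n, 1))
X = np.stack([rng.normal(size=(n, T)), np.ones((n, T))], axis=2)
Z = rng.normal(size=(n, T, 1))
delta = np.column_stack([1.5 * H[:, 0], rng.normal(size=n)])
Y = np.einsum("itk,ik->it", X, delta) + 0.7 * Z[:, :, 0]

theta_hat, kappa_hat = cite(Y, X, Z, H)
print(theta_hat, kappa_hat)
```

In this sketch there are no $G_{it}$, so $\Psi_{it}$ reduces to $Z_{it}$ and $\theta$ to $\gamma$. The explicit loop over units favors readability over speed; a production implementation would likely vectorize it.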

3.2. Interaction term estimator (ITE)

Define $X_{i,-1}$ to be $X_{i}$ with the first column removed, and define $\delta_{i,-1}$ analogously. Then rewrite (11) as

(13) $$\begin{aligned}
Y_{i} &=X_{i}\delta_{i}+\Psi_{i}\theta+\zeta_{i}\\
&=X_{i,-1}\delta_{i,-1}+X_{i1}\delta_{i1}+\Psi_{i}\theta+\zeta_{i}\\
&=X_{i,-1}\delta_{i,-1}+X_{i1}(H_{i}\kappa+\epsilon_{i})+\Psi_{i}\theta+\zeta_{i}\\
&=X_{i,-1}\delta_{i,-1}+\underbrace{X_{i1}H_{i}\kappa+\Psi_{i}\theta}_{\widetilde{\Psi}_{i}\widetilde{\theta}}+\underbrace{\zeta_{i}+X_{i1}\epsilon_{i}}_{\widetilde{\zeta}_{i}}.
\end{aligned}$$

The ITE requires sufficient variation in $X_{i,-1}$.

Assumption 5.

There exists an $h>0$ such that

$$\inf_{i}\det\left(X_{i,-1}^{\prime}X_{i,-1}\right)\geq h.$$

This assumption is similar to Assumption 2, but is slightly weaker because it excludes the first column of $X_{i}$. Under Assumption 5, we can define

$$M_{i,-1}=I-X_{i,-1}\left(X_{i,-1}^{\prime}X_{i,-1}\right)^{-1}X_{i,-1}^{\prime}$$

and premultiply (13) by $M_{i,-1}$ to obtain

(14) $$\begin{aligned}
M_{i,-1}Y_{i} &=M_{i,-1}X_{i,-1}\delta_{i,-1}+M_{i,-1}\widetilde{\Psi}_{i}\widetilde{\theta}+M_{i,-1}\widetilde{\zeta}_{i}\\
&=M_{i,-1}\widetilde{\Psi}_{i}\widetilde{\theta}+M_{i,-1}\widetilde{\zeta}_{i}.
\end{aligned}$$

Assumption 6.

The matrix $E\left(\widetilde{\Psi}_{i}^{\prime}M_{i,-1}\widetilde{\Psi}_{i}\right)$ is invertible.

Assumption 6 guarantees sufficient variation in $\widetilde{\Psi}_{i}$ after projecting out $X_{i,-1}$. Given Assumptions 5 and 6, the ITE is well-defined.

Definition 3.

The ITE for $\widetilde{\theta}=(\kappa,\phi,\gamma)=(\kappa,\theta)$ is

$$\widehat{\widetilde{\theta}}_{n}\equiv\left(\sum_{i=1}^{n}\widetilde{\Psi}_{i}^{\prime}M_{i,-1}\widetilde{\Psi}_{i}\right)^{-1}\sum_{i=1}^{n}\widetilde{\Psi}_{i}^{\prime}M_{i,-1}Y_{i}.$$

ITE estimates for $\kappa$, $\phi$ and $\gamma$ can be extracted from $\widehat{\widetilde{\theta}}_{n}$.
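Definition 3 can be sketched in the same style. The function name `ite`, the array shapes (`Y` is `(n, T)`, `X` is `(n, T, Kx)`, `Psi` is `(n, T, Kpsi)`, `H` is `(n, Kh)`), and the noise-free data below are assumptions made for illustration. With $\epsilon_{i}=0$ and no error terms, the estimator should return $(\kappa,\gamma)$ exactly, which gives a simple correctness check on the algebra.

```python
import numpy as np

def ite(Y, X, Psi, H):
    """Return the ITE for theta_tilde = (kappa, theta) following Definition 3."""
    n, T, _ = X.shape
    # Psi_tilde_i = (X_i1 * H_i, Psi_i): first regressor interacted with H_i.
    XH = X[:, :, [0]] * H[:, None, :]              # shape (n, T, Kh)
    Psi_t = np.concatenate([XH, Psi], axis=2)
    K = Psi_t.shape[2]
    A = np.zeros((K, K))
    b = np.zeros(K)
    for i in range(n):
        Xm = X[i][:, 1:]                           # X_{i,-1}: drop the first column
        if Xm.shape[1] > 0:
            Mi = np.eye(T) - Xm @ np.linalg.solve(Xm.T @ Xm, Xm.T)
        else:
            Mi = np.eye(T)                         # nothing to project out
        A += Psi_t[i].T @ Mi @ Psi_t[i]
        b += Psi_t[i].T @ Mi @ Y[i]
    return np.linalg.solve(A, b)

# Hypothetical noise-free data: X_it2 = 1 gives an additive fixed effect delta_i2.
rng = np.random.default_rng(1)
n, T = 200, 5
H = rng.normal(size=(n, 1))
X = np.stack([rng.normal(size=(n, T)), np.ones((n, T))], axis=2)
Z = rng.normal(size=(n, T, 1))
fixed_effect = rng.normal(size=n)
Y = 1.5 * X[:, :, 0] * H + fixed_effect[:, None] + 0.7 * Z[:, :, 0]

theta_tilde_hat = ite(Y, X, Z, H)
print(theta_tilde_hat)
```

The only heterogeneity that this estimator removes is the part attached to $X_{i,-1}$ (here the additive fixed effect); any unit-specific slope on the first regressor that is not explained by $H_{i}$ stays in the error term.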

Remark 4.

In the special case that $K_{x}=1$ and $X_{i1}=1$, the ITE is a correlated random effects estimator (cf. Mundlak (1978) and Wooldridge (2010), 10.7.3). It approximates the unobserved heterogeneity by $\delta_{i1}=H_{i}\kappa+\epsilon_{i}$, so that

$$Y_{it}=\delta_{i1}+Z_{it}\gamma+U_{it}=H_{i}\kappa+Z_{it}\gamma+(U_{it}+\epsilon_{i}).$$

Consistency of the ITE would require assumptions about orthogonality of $\epsilon$ and $(H,Z)$.

Remark 5.

In the special case that $K_{x}=2$ and $X_{it2}=1$, we have a linear fixed effects model with interaction terms. With a scalar $H$ and no $G$,

$$Y_{it}=\delta_{i2}+X_{it}H_{i}\kappa+\left(U_{it}+X_{it}V_{it1}+X_{it}\epsilon_{i}\right).$$

In this case, the transformation $M_{i,-1}$ is the within transformation that produces

$$\widetilde{Y}_{it}=Y_{it}-\frac{1}{T}\sum_{s=1}^{T}Y_{is},\;\;\widetilde{X}_{it}=X_{it}-\frac{1}{T}\sum_{s=1}^{T}X_{is}$$

and the ITE is obtained from linear regression of $\widetilde{Y}_{it}$ on $H_{i}\times\widetilde{X}_{it}$.
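The claim that $M_{i,-1}$ reduces to the within transformation when $X_{i,-1}$ is a column of ones is easy to verify numerically. The short check below uses an arbitrary illustrative vector `y`; the variable names are not from the paper.

```python
import numpy as np

T = 4
ones = np.ones((T, 1))                       # X_{i,-1} when X_it2 = 1
# M_{i,-1} = I - X_{i,-1} (X_{i,-1}' X_{i,-1})^{-1} X_{i,-1}'
M = np.eye(T) - ones @ np.linalg.solve(ones.T @ ones, ones.T)

y = np.array([2.0, 5.0, 1.0, 4.0])           # illustrative outcome vector for one unit
projected = M @ y
demeaned = y - y.mean()
print(projected, demeaned)
```

Both printed vectors are the deviations of `y` from its unit-level mean, which is why the ITE in this case coincides with the familiar fixed effects regression on demeaned data.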

4. Main results

We now show that a set of exogeneity conditions that is sufficient for the consistency of CITE (Section 4.2) is not sufficient for the consistency of ITE (Section 4.3).

These results are derived without exogeneity restrictions involving $\epsilon_{i}$. This allows the heterogeneity equation to be misspecified. We discuss this modeling choice and its consequences in Section 4.1. In Section 5, we provide results under correct specification.

4.1. Misspecification

Throughout this section, we will not make exogeneity assumptions that involve $\epsilon_{i}$.⁷ This leaves room for misspecification of the heterogeneity equation, in the sense that we may have

(15) $$E\left[\left.\epsilon_{i}\right|H_{i}\right]\neq 0.$$

Such misspecification arises if relevant variables are omitted from the heterogeneity equation. To see this, let there be a vector $H_{i}^{*}$ such that $\delta_{i1}=H_{i}^{*}\kappa^{*}+\epsilon^{*}_{i}$ and $E\left[\left.\epsilon^{*}_{i}\right|H_{i}^{*}\right]=0$. If the researcher only uses (or has access to) a partial list $H_{i}\subsetneq H_{i}^{*}$ and uses the alternative specification

$$\delta_{i1}=H_{i}\kappa+\epsilon_{i},$$

then generally $E\left[\left.\epsilon_{i}\right|H_{i}\right]\neq 0$. Misspecification also arises if $H_{i}$ is measured with error, if there is functional form misspecification, etc.

Our results below show that CITE is robust against such misspecification, but that ITE is not.

⁷ See Section 5 for an analysis under correct specification.

Under misspecification of the heterogeneity equation, $\kappa$ is too ambitious a target even if $\delta_{i1}$ were known. Consider instead the infeasible regression of $\delta_{i1}$ on $H_{i}$, which tends to the population projection coefficient of $\delta_{i1}$ on $H_{i}$,

(16) $$\widetilde{\kappa}\equiv\left(E\left[H_{i}^{\prime}H_{i}\right]\right)^{-1}E\left[H_{i}^{\prime}\delta_{i1}\right]$$
(17) $$\phantom{\widetilde{\kappa}}=\kappa+\left(E\left[H_{i}^{\prime}H_{i}\right]\right)^{-1}E\left[H_{i}^{\prime}\epsilon_{i}\right].$$

This parameter $\widetilde{\kappa}$ is of interest in many applications. It answers the question: “How does the effect of $X$ on $Y$ vary with $H$?”. It does not answer the causal question: “How does the effect of $X$ on $Y$ change given an exogenous change in $H$?”. By restricting attention to $\widetilde{\kappa}$ instead of $\kappa$, we can allow for misspecification while still obtaining an interesting parameter. This is what CITE delivers. ITE does not deliver $\widetilde{\kappa}$.

4.2. Consistency of CITE

Our analysis proceeds under the following exogeneity assumption, where $V_{i}$ collects all the coefficient equation error terms $V_{itk}$.

Assumption 7 (Strict exogeneity).

The error terms in equations (3) and (4) satisfy

$$\begin{aligned}
0 &=E\left(\left.U_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right)\\
&=E\left(\left.V_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right).
\end{aligned}$$

This assumption is similar to the strict exogeneity assumption that is standard in the literature on correlated random coefficient panel models.⁸ It is restrictive, but not much stronger than what is necessary for accommodating additive fixed effects in a linear model with fixed $T$. Our version of strict exogeneity requires $H$, along with $(X,Z,G)$, to be orthogonal to the error terms in the outcome and coefficient equations. This is not necessary for estimation of $\phi$ and $\gamma$, which would only require orthogonality with respect to $(X,Z,G)$, not $H$.

⁸ See Chamberlain (1992); Arellano and Bonhomme (2012); Graham and Powell (2012). A notable exception is Laage (2020).

We first state the result for $\theta=(\phi,\gamma)$.

Theorem 1 (Consistency of CITE for $\theta$).

If Assumptions 1, 2, 3, and 7 hold, and if

$$E\left\|M_{i}Y_{i}\right\|^{2}<\infty,\quad E\left\|M_{i}\Psi_{i}\right\|^{2}<\infty,$$

then as $n\to\infty$, $\widehat{\theta}_{n}\stackrel{p}{\to}\theta$.

Proof.

See Appendix A.∎

The second result is for $\widetilde{\kappa}$. In what follows, $e_{1}$ is the column vector of length $K_{x}$ with the first element equal to 1 and all other elements equal to zero, so that $e_{1}^{\prime}\widehat{\delta}_{i}=\widehat{\delta}_{i1}$.

Theorem 2 (Consistency for $\widetilde{\kappa}$).

Under the conditions of Theorem 1, and if Assumption 4 holds, and if (i) $E\left\|H_{i}\right\|^{2}<\infty$, (ii) $E\left\|\delta_{i1}\right\|^{2}<\infty$, (iii) $E\left\|H_{i}e_{1}^{\prime}(X_{i}^{\prime}X_{i})^{-1}X_{i}^{\prime}\Psi_{i}\right\|<\infty$, and (iv) $E\left\|H_{i}e_{1}^{\prime}(X_{i}^{\prime}X_{i})^{-1}X_{i}^{\prime}\zeta_{i}\right\|<\infty$, then, as $n\to\infty$, $\widehat{\kappa}\stackrel{p}{\to}\widetilde{\kappa}$.

Proof.

See Appendix A.∎

Remark 6.

Theorem 2 adapts Corollary 1 of Arellano and Bonhomme (2012) to the present context. Note that our Assumption 2 implies that we can drop their conditioning on $\mathbb{S}$. Then, set their $F_{i}=H_{i}$, their $y_{i}=Y_{i}$, their $\gamma_{i}=\delta_{i1}$, their $Z_{i}=\Psi_{i}$, and their $\delta=\theta$.

4.3. Inconsistency of the ITE

Without assumptions on $\epsilon_{i}$, the ITE is inconsistent. To show this, we consider the special case with $X\in\mathbb{R}$, $H\in\mathbb{R}^{2}$, and without $G,Z$. We consider a setting with omitted variables: the ITE in this section uses the first element of $H$ but not the second. Other forms of endogeneity lead to similar conclusions. For example, letting $H_{i2}=H_{i1}^{2}$ in what follows is functional form misspecification. The case of measurement error in $H_{i1}$ is similar.

Partition $H_{i}=(H_{i1},H_{i2})$ and rewrite the model equations for the special case under consideration:

$$\begin{aligned}
Y_{it} &=X_{it1}\beta_{it1}+U_{it},\\
\beta_{it1} &=\delta_{i1}+V_{it1},\\
\delta_{i1} &=H_{i}\kappa+\epsilon_{i}\\
&=H_{i1}\kappa_{1}+H_{i2}\kappa_{2}+\epsilon_{i}.
\end{aligned}$$

Assume strict exogeneity for the full set of covariates $H_{i}$,

$$\begin{aligned}
0 &=E\left(\left.U_{i}\right|X_{i},Z_{i},G_{i},H_{i1},H_{i2}\right),\\
0 &=E\left(\left.V_{i}\right|X_{i},Z_{i},G_{i},H_{i1},H_{i2}\right),
\end{aligned}$$

and additionally assume that the heterogeneity equation is correctly specified if both $H_{i1}$ and $H_{i2}$ are included, i.e.

$$0=E\left(\left.\epsilon_{i}\right|X_{i},Z_{i},G_{i},H_{i1},H_{i2}\right).$$

These assumptions are stronger than necessary for consistency of CITE (using $H_{i1}$ only) for $\widetilde{\kappa}_{1}$,⁹ the projection coefficient of $\delta_{i1}$ on $H_{i1}$, which for this special case equals:

$$\widetilde{\kappa}_{1}=\kappa_{1}+\left(E\left[H_{i1}^{2}\right]\right)^{-1}E\left[H_{i1}H_{i2}\right]\kappa_{2}.$$

⁹ This follows from Theorem 2.

However, they are not sufficient for consistency for $\widetilde{\kappa}_{1}$ of the ITE that uses only $H_{i1}$,

$$\begin{aligned}
\widecheck{\kappa}_{1} &=\left(\sum_{i}\sum_{t}X_{it1}^{2}H_{i1}^{2}\right)^{-1}\left(\sum_{i}\sum_{t}X_{it1}H_{i1}Y_{it}\right)\\
&=\kappa_{1}+\left(\sum_{i}\sum_{t}X_{it1}^{2}H_{i1}^{2}\right)^{-1}\left(\sum_{i}\sum_{t}X_{it1}H_{i1}\left(X_{it1}(H_{i2}\kappa_{2}+\epsilon_{i}+V_{it1})+U_{it}\right)\right).
\end{aligned}$$

To see that it is not consistent under the maintained assumptions, assume that the relevant laws of large numbers apply, so that

$$\begin{aligned}
\widecheck{\kappa}_{1} &\stackrel{p}{\to}\kappa_{1}+\left(\sum_{t}E\left[X_{it1}^{2}H_{i1}^{2}\right]\right)^{-1}\left(\sum_{t}E\left[X_{it1}H_{i1}\left(X_{it1}(H_{i2}\kappa_{2}+\epsilon_{i}+V_{it1})+U_{it}\right)\right]\right)\\
&=\kappa_{1}+\left(\sum_{t}E\left[X_{it1}^{2}H_{i1}^{2}\right]\right)^{-1}\left(\sum_{t}E\left[X_{it1}^{2}H_{i1}H_{i2}\kappa_{2}\right]\right)\\
&\neq\kappa_{1}+\left(E\left[H_{i1}^{2}\right]\right)^{-1}E\left[H_{i1}H_{i2}\right]\kappa_{2}\\
&=\widetilde{\kappa}_{1}.
\end{aligned}$$

This shows that CITE with $H_{i1}$ converges to $\widetilde{\kappa}_{1}$, whereas ITE does not. From the expression for the probability limit of $\widecheck{\kappa}_{1}$, it is clear that the inconsistency can be made arbitrarily large.
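A small simulation can illustrate the two probability limits. The design below is purely hypothetical and is not calibrated to any application: $H_{i1}\sim\text{Uniform}(1,2)$, $H_{i2}=H_{i1}^{2}$ (functional form misspecification), $X_{it1}=H_{i1}e_{it}$ with standard normal $e_{it}$, $\epsilon_{i}=V_{it1}=0$, and $(\kappa_{1},\kappa_{2})=(-1.65,1)$. Because $X_{it1}^{2}$ depends on $H_{i1}$, the ITE weights the omitted term differently from the projection of $\delta_{i1}$ on $H_{i1}$. The design also relaxes the uniform bound in Assumption 2 for simplicity.

```python
import numpy as np

rng = np.random.default_rng(42)
n, T = 200_000, 6
kappa1, kappa2 = -1.65, 1.0

H1 = rng.uniform(1.0, 2.0, size=n)
H2 = H1 ** 2                                   # omitted regressor
X = H1[:, None] * rng.normal(size=(n, T))      # X_it1 depends on H_i1
delta = H1 * kappa1 + H2 * kappa2              # epsilon_i = 0 and V_it1 = 0
Y = X * delta[:, None] + 0.1 * rng.normal(size=(n, T))

# ITE that uses H_i1 only: pooled OLS of Y_it on X_it1 * H_i1.
XH = X * H1[:, None]
ite_est = np.sum(XH * Y) / np.sum(XH ** 2)

# CITE that uses H_i1 only: unit-level slopes, then projection on H_i1.
delta_hat = np.sum(X * Y, axis=1) / np.sum(X ** 2, axis=1)
cite_est = np.sum(H1 * delta_hat) / np.sum(H1 ** 2)

def moment(k):
    """E[H1^k] for H1 ~ Uniform(1, 2)."""
    return (2 ** (k + 1) - 1) / (k + 1)

# Limits implied by the displayed expressions under this design.
kappa_tilde1 = kappa1 + moment(3) / moment(2) * kappa2
ite_plim = kappa1 + moment(5) / moment(4) * kappa2
print(cite_est, kappa_tilde1, ite_est, ite_plim)
```

Under these assumed parameter values the projection coefficient $\widetilde{\kappa}_{1}$ is slightly negative while the limit of the ITE is slightly positive, so the two estimators disagree on the sign even though no unobserved heterogeneity is present beyond the omitted $H_{i2}$.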

5. Results under correct specification

The results in the previous section were derived under misspecification. We now add the assumption of correct specification to the model.

Assumption 8.

The heterogeneity equation is correctly specified,

(18) $$E\left(\left.\epsilon_{i}\right|H_{i}\right)=0.$$

This assumption is likely too strong for most empirical applications in economics; see Section 4.1. Together, Assumptions 7 and 8 require that all variables are exogenous in the outcome and coefficient equations, and that $H_{i}$ is exogenous in the heterogeneity equation. Assumption 8 does not require $\epsilon_{i}$ to be orthogonal to the other regressors $\left(X,G,Z\right)$.

If the heterogeneity equation is correctly specified, then CITE is consistent for the causal parameter $\kappa$ (Theorem 3) but ITE is not (Section 5.2). We also discuss stronger conditions that restore consistency of ITE (Section 5.3).

5.1. Consistency of CITE

Theorem 1 applies without modification: CITE is consistent for $(\phi,\gamma)$. Under correct specification, we additionally have that CITE is consistent for $\kappa$, instead of for the projection coefficient $\widetilde{\kappa}$.

Theorem 3 (Consistency of CITE for $\kappa$).

If the conditions for Theorem 2 hold, and if Assumption 8 holds, then, as $n\to\infty$, $\widehat{\kappa}\stackrel{p}{\to}\kappa$.

Proof.

Under Assumption 8, $E\left[H_{i}^{\prime}\epsilon_{i}\right]=0$, so (17) gives $\widetilde{\kappa}=\kappa$. Other than that, the proof is identical to that of Theorem 2. ∎

5.2. Inconsistency of ITE

Assumption 8, in conjunction with Assumption 7, is likely too strong for most applications in economics. Even so, it is not sufficient for consistency of ITE for $\kappa$. Consider the special case in which $G$, $X$, and $H$ are scalar and there is no $Z$. Then the reduced form simplifies to:

$$Y_{it}=\widetilde{\Psi}_{it}\widetilde{\theta}+\left(U_{it}+X_{it}\epsilon_{i}+X_{it}V_{it}\right),$$

where

$$\widetilde{\Psi}_{it}=(X_{it}G_{it},X_{it}H_{i}),$$

and $\widetilde{\theta}=(\phi,\kappa)$. The ITE simplifies to

$$\widehat{\widetilde{\theta}}=\left(\sum_{i}\sum_{t}\widetilde{\Psi}_{it}^{\prime}\widetilde{\Psi}_{it}\right)^{-1}\sum_{i}\sum_{t}\widetilde{\Psi}_{it}^{\prime}Y_{it},$$

so that, if an appropriate law of large numbers holds,

$$
\begin{aligned}
\widehat{\widetilde{\theta}}-\widetilde{\theta} &\stackrel{p}{\to}\left(\sum_{t}E\left[\widetilde{\Psi}_{it}^{\prime}\widetilde{\Psi}_{it}\right]\right)^{-1}\sum_{t}E\left[\widetilde{\Psi}_{it}^{\prime}\left(U_{it}+X_{it}V_{it}+X_{it}\epsilon_{i}\right)\right]\\
&=\left(\sum_{t}E\left[\widetilde{\Psi}_{it}^{\prime}\widetilde{\Psi}_{it}\right]\right)^{-1}\sum_{t}E\left[\widetilde{\Psi}_{it}^{\prime}X_{it}\epsilon_{i}\right].
\end{aligned}
$$

The equality follows from Assumptions 7 and 8, which eliminate the first two components of the composite error term. However, the remaining component, $X_{it}\epsilon_{i}$, is in general not orthogonal to $\widetilde{\Psi}_{it}=(X_{it}G_{it},X_{it}H_{i})$, because Assumption 8 restricts $\epsilon_{i}$ only conditional on $H_{i}$.

5.3. A sufficient condition for consistency of ITE

The strongest exogeneity assumption we have imposed so far is

$$
\begin{aligned}
0 &=E\left(\left.U_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right),\\
0 &=E\left(\left.V_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right),\\
0 &=E\left(\left.\epsilon_{i}\right|H_{i}\right).
\end{aligned}
$$

It does not restrict the distribution of $\epsilon_{i}$ conditional on the time-varying observables. It is easy to show that the ITE is consistent if we impose the following stronger correlated random effects assumption:

$$
\begin{aligned}
0 &=E\left(\left.U_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right)\\
&=E\left(\left.V_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right)\\
&=E\left(\left.\epsilon_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right).
\end{aligned}
$$

This requires $\epsilon_{i}$ to be orthogonal to $(X,G,Z)$ in addition to $H$.
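The contrast between Sections 5.2 and 5.3 can be illustrated with a small Monte Carlo sketch. It is not the authors' code: the data-generating process, the function names `simulate`, `ite`, and `cite`, and the parameter `c` are all hypothetical, and the within-unit projection $M_{i}=I-X_{i}(X_{i}^{\prime}X_{i})^{-1}X_{i}^{\prime}$ is an assumed form inferred from the reduced form used in Appendix A. With scalar $X$, $G$, $H$, the design keeps $E(\epsilon_{i}|H_{i})=0$ throughout; setting `c=0` additionally makes $\epsilon_{i}$ independent of $X_{i}$.

```python
import numpy as np


def simulate(n, T=6, phi=1.0, kappa=1.0, c=1.0, seed=0):
    """Draw a panel from a scalar version of the model (hypothetical design).

    c controls how strongly eps_i enters X_it: c = 0 corresponds to the
    stronger assumption of Section 5.3, c != 0 keeps only E(eps_i | H_i) = 0.
    """
    rng = np.random.default_rng(seed)
    H = 1.0 + rng.standard_normal(n)          # time-invariant interaction variable
    eps = rng.standard_normal(n)              # heterogeneity error, independent of H
    X = 1.0 + c * eps[:, None] + rng.standard_normal((n, T))
    G = rng.standard_normal((n, T))           # time-varying interaction variable
    U = rng.standard_normal((n, T))
    V = rng.standard_normal((n, T))
    delta = kappa * H + eps                   # heterogeneity equation
    beta = phi * G + delta[:, None] + V       # coefficient equation
    Y = X * beta + U                          # outcome equation
    return Y, X, G, H


def ite(Y, X, G, H):
    """Pooled OLS of Y_it on (X_it * G_it, X_it * H_i); returns (phi_hat, kappa_hat)."""
    Psi = np.column_stack([(X * G).ravel(), (X * H[:, None]).ravel()])
    coef, *_ = np.linalg.lstsq(Psi, Y.ravel(), rcond=None)
    return coef


def cite(Y, X, G, H):
    """Two-step estimator: within-unit step for phi, then projection of delta_hat on H."""
    Psi = X * G
    xx = (X ** 2).sum(axis=1, keepdims=True)  # X_i'X_i for each unit

    def within(A):
        # Apply M_i = I - X_i (X_i'X_i)^{-1} X_i' row by row (assumed form of M_i).
        return A - X * ((X * A).sum(axis=1, keepdims=True) / xx)

    MPsi, MY = within(Psi), within(Y)
    phi_hat = (MPsi * MY).sum() / (MPsi ** 2).sum()
    delta_hat = (X * (Y - phi_hat * Psi)).sum(axis=1) / xx[:, 0]
    kappa_hat = (H * delta_hat).sum() / (H ** 2).sum()
    return phi_hat, kappa_hat


for c in (1.0, 0.0):
    Y, X, G, H = simulate(n=20000, c=c)
    print(f"c={c}: ITE kappa={ite(Y, X, G, H)[1]:.3f}, "
          f"CITE kappa={cite(Y, X, G, H)[1]:.3f}")
```

In this particular design the true $\kappa$ equals 1. With `c=1.0` the ITE estimate should settle near $1+1/3$ (the bias term $E[X_{it}^{2}H_{i}\epsilon_{i}]/E[X_{it}^{2}H_{i}^{2}]$ equals $2/6$ here), while CITE stays close to 1; with `c=0.0` both estimators should be close to 1.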

6. Empirical applications

We use two empirical applications to show that ITE and CITE can lead to meaningfully different conclusions about interaction effects.

6.1. Stock and Watson, 2015, Chapter 10

In their textbook example, Stock and Watson study the relationship between a U.S. state’s traffic fatality rate ($Y_{it}$) and an alcohol tax ($X_{it}$) using data from 48 U.S. states over a period of 7 years. We extend this example to explore whether this relationship depends on time-varying interaction variables $G$ (the state’s unemployment rate and the state’s minimum punishment for drunk driving). We also consider time-invariant interaction variables $H$, namely the period-1 values of the proportion of a state’s population that is Mormon or Southern Baptist.

Table 1 reports the ITCs estimated with ITE (column 1) and CITE (column 2). CITE suggests that the effect of alcohol taxes on traffic fatalities depends negatively on the unemployment rate. In contrast, the ITE does not find evidence for the presence of an interaction effect. For the presence of minimum punishment, the ITE estimates a positive interaction effect. CITE also estimates a positive effect, but it is not statistically significant at conventional levels.

The estimates for time-invariant interaction variables are also not in agreement. Using the ITE, we find a statistically significant relationship between the alcohol tax effect and the proportion of Southern Baptists. When repeating the analysis with CITE, we do not find evidence for such a relationship.

In conclusion, the two estimators for interaction effects yield contrasting conclusions for three of the four interaction variables. For two variables, the point estimate changes sign.

| | ITE | CITE |
| --- | --- | --- |
| unemployment rate | 0.003 | -0.045*** |
| | (0.015) | (0.018) |
| minimum punishment | 0.260*** | 0.139 |
| | (0.119) | (0.125) |
| proportion mormon | 0.001 | 0.111 |
| | (0.008) | (0.100) |
| proportion southern baptist | -0.041*** | 0.065 |
| | (0.019) | (0.101) |
| $nT$ | 336 | 336 |

Table 1. Point estimates and standard errors (in parentheses) for interaction effects in Stock and Watson’s “beer tax” example. The first column has results for ITE, the second column has results for CITE. Time-varying interaction variables are the unemployment rate and the existence of minimum punishment. Time-invariant interaction variables are the proportion of a state’s population that is Mormon or Southern Baptist. *** p<0.01, ** p<0.05, * p<0.1.

6.2. Couttenier, Petrencu, Rohner, and Thoenig (2019)

Empirical research increasingly uses higher-dimensional panel data and interactions among those dimensions (e.g., Berman, Martin, and Mayer, 2012; Bloom, Draca, and Van Reenen, 2016; Manacorda and Tesei, 2020). To show the applicability of CITE in such a setting and contrast it with the performance of ITE, we revisit the results in Couttenier, Petrencu, Rohner, and Thoenig (2019), CPRT hereafter, who use micro-level panel data from Switzerland that are aggregated to a time ($t$), age cohort ($a$), migrant nationality ($n$), and Swiss regional ($i$) dimension (see Appendix B for details).

CPRT are interested in the relationship

$$CP_{n,a,t,i}=\beta_{i}\,kid_{n,a,t}+Z_{n,a,t}\gamma_{a}+FE_{i,t}+FE_{n,t}+FE_{a}+\varepsilon_{n,a,t,i}, \tag{19}$$

to estimate the effect $\beta_{i}$ of migrants’ past exposure to conflict (such as civil war) as a $kid$ in their origin country on current crime propensity $CP$ in Switzerland. They refer to $\beta_{i}$ as the crime premium. Interaction effects in this setting can therefore be interpreted as the impact of interaction variables on that crime premium.

CPRT are interested in whether policies in host regions are related to the crime premium. In particular, they are interested in the role of the time-invariant interaction variable $openjobacc_{i}$, which equals 1 for regions where asylum seekers can start working in all sectors of activity three months after arrival. To that end, they use CITE, i.e. they project their region-specific coefficient estimates $\hat{\beta}_{i}$ from (21) onto $openjobacc_{i}$:

$$\hat{\beta}_{i}=a+b\ openjobacc_{i}+\varepsilon_{i}. \tag{20}$$

We reexamine their CITE results by including additional, time-varying interaction variables and compare them to the ITE results. The additional interaction variables that we include are two demographic variables: a region’s share of middle-aged persons ($pop\_middle$) and a region’s share of urban population ($urbanpop$). Appendix B has more details on the data and specifications.

Table 2 reports our interaction effect estimates. First, we show in Appendix B that the mean effect of conflict exposure on crime propensity according to CITE is 0.46. This is very close to CPRT’s results, but not identical, because our specification includes additional time-varying interaction variables.

Second, CITE reveals that the crime premium is lower if urbanization increases (column 1, $urbanpop$), possibly reflecting better psycho-social support and integration opportunities. Note that the negative interaction coefficient $-0.027$ is not only statistically significant but also economically large: it suggests that at an urbanization rate of 92% (the 75th percentile), the crime premium is essentially 0 (equal to 0.023), while it is estimated to be 0.79 at an urbanization rate of 64% (the 25th percentile), a magnitude that is close to the higher-end baseline result in CPRT.

Third, CITE suggests that regions with $openjobacc_{i}=1$, i.e. regions with a generous labour market policy for immigrants, have much lower crime premiums.

In contrast, ITE does not find any statistically significant interaction terms. The point estimates for $openjobacc$ and $urbanpop$ are about half those of CITE.[10]

[10] If the ITE for urban population of -0.011 is taken at face value, it implies a crime premium of 0.31 and 0.62 at the 75th and 25th percentile of $urbanpop$, respectively, and hence a much lower implied difference for this interaction than the reported CITE results.
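The implied differences quoted above can be checked with a few lines of arithmetic. The sketch below uses only the $urbanpop$ coefficients reported in Table 2 and the two percentiles mentioned in the text; the helper name `implied_gap` is hypothetical. It computes the change in the crime premium when moving from the 25th to the 75th percentile of urbanization.

```python
# urbanpop interaction coefficients as reported in Table 2
CITE_COEF = -0.0274
ITE_COEF = -0.0112

# Urbanization rates (in percent) at the 25th and 75th percentile, as quoted in the text
P25, P75 = 64.0, 92.0


def implied_gap(coef, low=P25, high=P75):
    """Change in the crime premium when urbanpop moves from `low` to `high`."""
    return coef * (high - low)


print(f"CITE gap: {implied_gap(CITE_COEF):.3f}")
print(f"ITE gap:  {implied_gap(ITE_COEF):.3f}")
```

The CITE gap of roughly $-0.77$ matches the difference between the quoted premiums of 0.023 and 0.79, and the ITE gap of roughly $-0.31$ matches the difference between 0.31 and 0.62.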

In conclusion, CITE can be used in complex panel data settings that are increasingly common in the applied literature. Furthermore, CITE reveals two interesting interaction effects related to the crime premium that ITE does not find.

| | (1) CITE | (2) ITE |
| --- | --- | --- |
| $pop\_middle$ | 0.805 | -0.171 |
| | (0.758) | (0.216) |
| $urbanpop$ | -0.0274*** | -0.0112 |
| | (0.00919) | (0.0110) |
| $openjobacc$ | -1.146* | -0.490 |
| | (0.622) | (0.447) |
| $nT$ | 48272 | 48272 |

Table 2. Point estimates and standard errors (in parentheses) for CPRT. The first column has results for CITE, the second column has results for ITE. Time-varying interaction variables are the share of a region’s population that is middle-aged, $pop\_middle$, and the share of a region’s population that is urban, $urbanpop$. The time-invariant interaction variable $openjobacc_{i}$ is a dummy variable indicating whether asylum seekers can start working in all sectors of activity three months after arrival. *** p<0.01, ** p<0.05, * p<0.1.

References

  • Alsan and Goldin (2019) Alsan, M., and C. Goldin (2019): “Watersheds in Child Mortality: The Role of Effective Water and Sewerage Infrastructure, 1880–1920,” Journal of Political Economy, 127(2), 586–638.
  • Amiti and Konings (2007) Amiti, M., and J. Konings (2007): “Trade Liberalization, Intermediate Inputs, and Productivity: Evidence from Indonesia,” American Economic Review, 97(5), 1611–1638.
  • Arellano and Bonhomme (2012) Arellano, M., and S. Bonhomme (2012): “Identifying Distributional Characteristics in Random Coefficients Panel Data Models,” The Review of Economic Studies, 79(3), 987–1020.
  • Athey and Imbens (2015) Athey, S., and G. W. Imbens (2015): “Machine Learning Methods for Estimating Heterogeneous Causal Effects,” Stat.
  • Balli and Sørensen (2013) Balli, H. O., and B. E. Sørensen (2013): “Interaction Effects in Econometrics,” Empirical Economics, 45(1), 583–603.
  • Berman, Martin, and Mayer (2012) Berman, N., P. Martin, and T. Mayer (2012): “How Do Different Exporters React to Exchange Rate Changes?,” Quarterly Journal of Economics, 127(1), 437–492.
  • Bloom, Draca, and Van Reenen (2016) Bloom, N., M. Draca, and J. Van Reenen (2016): “Trade Induced Technical Change? The Impact of Chinese Imports on Innovation, IT and Productivity,” Review of Economic Studies, 83(1), 87–117.
  • Bloom, Sadun, and Van Reenen (2012) Bloom, N., R. Sadun, and J. Van Reenen (2012): “Americans Do IT Better: US Multinationals and the Productivity Miracle,” American Economic Review, 102(1), 167–201.
  • Bordt, Farbmacher, and Kogel (2020) Bordt, S., H. Farbmacher, and H. Kogel (2020): “Estimating Grouped Patterns of Heterogeneity in Repeated Public Goods Experiments,” .
  • Burnside and Dollar (2000) Burnside, C., and D. Dollar (2000): “Aid, Policies, and Growth,” American Economic Review, 90(4), 23.
  • Chamberlain (1992) Chamberlain, G. (1992): “Efficiency Bounds for Semiparametric Regression,” Econometrica, 60(3), 567–596.
  • Chernozhukov, Hansen, Liao, and Zhu (2019) Chernozhukov, V., C. Hansen, Y. Liao, and Y. Zhu (2019): “Inference for Heterogeneous Effects Using Low-Rank Estimation of Factor Slopes,” .
  • Couttenier, Petrencu, Rohner, and Thoenig (2019) Couttenier, M., V. Petrencu, D. Rohner, and M. Thoenig (2019): “The Violent Legacy of Conflict: Evidence on Asylum Seekers, Crime, and Public Policy in Switzerland,” American Economic Review, 109(12), 4378–4425.
  • de Chaisemartin and D’Haultfoeuille (2022) de Chaisemartin, C., and X. D’Haultfoeuille (2022): “Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey,” Discussion paper.
  • Duflo, Dupas, and Kremer (2011) Duflo, E., P. Dupas, and M. Kremer (2011): “Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya,” American Economic Review, 101(5), 1739–1774.
  • Epifani and Gancia (2009) Epifani, P., and G. Gancia (2009): “Openness, Government Size and the Terms of Trade,” The Review of Economic Studies, 76(2), 629–668.
  • Giesselmann and Schmidt-Catran (2020) Giesselmann, M., and A. W. Schmidt-Catran (2020): “Interactions in Fixed Effects Regression Models,” Sociological Methods & Research, p. 004912412091493.
  • Graham and Powell (2012) Graham, B. S., and J. L. Powell (2012): “Identification and Estimation of Average Partial Effects in “Irregular” Correlated Random Coefficient Panel Data Models,” Econometrica, 80(5), 2105–2152.
  • Herrera, Ordoñez, and Trebesch (2020) Herrera, H., G. Ordoñez, and C. Trebesch (2020): “Political Booms, Financial Crises,” Journal of Political Economy, 128(2), 507–543.
  • Laage (2020) Laage, L. (2020): “A Correlated Random Coefficient Panel Model with Time-Varying Endogeneity,” .
  • Lewbel and Pendakur (2017) Lewbel, A., and K. Pendakur (2017): “Unobserved Preference Heterogeneity in Demand Using Generalized Random Coefficients,” Journal of Political Economy, 125(4), 1100–1148.
  • List and Sturm (2006) List, J. A., and D. M. Sturm (2006): “How Elections Matter: Theory and Evidence from Environmental Policy,” Quarterly Journal of Economics, 121(4), 1249–1281.
  • MaCurdy (1981) MaCurdy, T. E. (1981): “An Empirical Model of Labor Supply in a Life-Cycle Setting,” Journal of Political Economy, 89(6), 1059–1085.
  • Manacorda and Tesei (2020) Manacorda, M., and A. Tesei (2020): “Liberation Technology: Mobile Phones and Political Mobilization in Africa,” Econometrica, 88(2), 533–567.
  • Mundlak (1978) Mundlak, Y. (1978): “On the Pooling of Time Series and Cross Section Data,” Econometrica, 46(1), 69–85.
  • Sasaki and Ura (2021) Sasaki, Y., and T. Ura (2021): “Slow Movers in Panel Data,” .
  • Shambaugh (2004) Shambaugh, J. C. (2004): “The Effect of Fixed Exchange Rates on Monetary Policy,” Quarterly Journal of Economics, 119(1), 301–352.
  • Spilimbergo (2009) Spilimbergo, A. (2009): “Democracy and Foreign Education,” American Economic Review, 99(1), 528–543.
  • Stock and Watson (2015) Stock, J. H., and M. W. Watson (2015): Introduction to Econometrics, The Pearson Series in Economics. Pearson, Boston, updated third edition, global edition.
  • Storeygard (2016) Storeygard, A. (2016): “Farther on down the Road: Transport Costs, Trade and Urban Growth in Sub-Saharan Africa,” Review of Economic Studies, 83(3), 1263–1295.
  • Verdier (2020) Verdier, V. (2020): “Average Treatment Effects for Stayers with Correlated Random Coefficient Models of Panel Data,” Journal of Applied Econometrics, 35(7), 917–939.
  • Wager and Athey (2018) Wager, S., and S. Athey (2018): “Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests,” Journal of the American Statistical Association, 113(523), 1228–1242.
  • Wooldridge (2010) Wooldridge, J. M. (2010): Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, Mass., 2nd edn.

Appendix A Proofs

Proof of Theorem 1.

Recall from Definition 1 that CITE is the coefficient on the explanatory variables $M_{i}\Psi_{i}$ in a linear regression with dependent variable $M_{i}Y_{i}$, i.e.

$$\widehat{\theta}_{n}=\left(\sum_{i=1}^{n}\Psi_{i}^{\prime}M_{i}\Psi_{i}\right)^{-1}\sum_{i=1}^{n}\Psi_{i}^{\prime}M_{i}Y_{i}.$$

The reduced form for $M_{i}Y_{i}$, see (12), is

$$M_{i}Y_{i}=M_{i}\Psi_{i}\theta+M_{i}\zeta_{i}.$$

Therefore,

$$
\begin{aligned}
\widehat{\theta}_{n}-\theta &=\left(\sum_{i=1}^{n}\Psi_{i}^{\prime}M_{i}\Psi_{i}\right)^{-1}\sum_{i=1}^{n}\Psi_{i}^{\prime}M_{i}Y_{i}-\theta\\
&=\left(\frac{\sum_{i=1}^{n}\Psi_{i}^{\prime}M_{i}\Psi_{i}}{n}\right)^{-1}\frac{\sum_{i=1}^{n}\Psi_{i}^{\prime}M_{i}\zeta_{i}}{n}.
\end{aligned}
$$

By the boundedness assumptions in the statement of the theorem, the weak law of large numbers (WLLN) for random vectors ensures that both terms converge to their expectations.

We will now show that $E\left(\Psi_{i}^{\prime}M_{i}\zeta_{i}\right)=0$, which completes the proof. Recall that the elements of $\zeta_{i}$ are

$$\zeta_{it}=U_{it}+\sum_{k}V_{itk}X_{itk}$$

and that, because of Assumption 7, $E(\left.\zeta_{it}\right|X_{i},G_{i},Z_{i})=0$ so that, by the law of iterated expectations (LIE), $E\left(\Psi_{i}^{\prime}M_{i}\zeta_{i}\right)=0$, since $M_{i}$ and $\Psi_{i}$ are transformations of $(X_{i},G_{i},Z_{i})$ only. ∎

Proof of Theorem 2.

Recall that $\widehat{\delta}_{i1}$ is the first element of

$$\widehat{\delta}_{i}=\left(X_{i}^{\prime}X_{i}\right)^{-1}X_{i}^{\prime}\left(Y_{i}-\Psi_{i}\widehat{\theta}_{n}\right),$$

and that

$$Y_{i}-\Psi_{i}\widehat{\theta}_{n}=\Psi_{i}(\theta-\widehat{\theta}_{n})+X_{i}\delta_{i}+\zeta_{i},$$

so that the CITE for $\kappa$ in Definition 2 is

$$
\begin{aligned}
\widehat{\kappa}_{n} &=\left(\sum_{i=1}^{n}H_{i}^{\prime}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{\prime}\widehat{\delta}_{i1}\\
&=\left(\sum_{i=1}^{n}H_{i}^{\prime}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{\prime}e_{1}^{\prime}\left(X_{i}^{\prime}X_{i}\right)^{-1}X_{i}^{\prime}\left(Y_{i}-\Psi_{i}\widehat{\theta}_{n}\right)\\
&=\left(\sum_{i=1}^{n}H_{i}^{\prime}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{\prime}e_{1}^{\prime}\left(X_{i}^{\prime}X_{i}\right)^{-1}X_{i}^{\prime}\Psi_{i}(\theta-\widehat{\theta}_{n})\\
&\phantom{=}+\left(\sum_{i=1}^{n}H_{i}^{\prime}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{\prime}e_{1}^{\prime}\left(X_{i}^{\prime}X_{i}\right)^{-1}X_{i}^{\prime}X_{i}\delta_{i}\\
&\phantom{=}+\left(\sum_{i=1}^{n}H_{i}^{\prime}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{\prime}e_{1}^{\prime}\left(X_{i}^{\prime}X_{i}\right)^{-1}X_{i}^{\prime}\zeta_{i}\\
&\equiv A_{1n}+A_{2n}+A_{3n}.
\end{aligned}
$$

For the second term,

$$
\begin{aligned}
A_{2n} &=\left(\sum_{i=1}^{n}H_{i}^{\prime}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{\prime}\delta_{i1}\\
&\stackrel{p}{\to}(E(H_{i}^{\prime}H_{i}))^{-1}E(H_{i}^{\prime}\delta_{i1})\\
&=\widetilde{\kappa},
\end{aligned}
$$

where the first line simplifies the expression for $A_{2n}$ from the previous display; the convergence on the second line follows from the WLLN, which applies because of Assumption 1 and conditions (i) and (ii) in the statement of the result; and the final equality is (17).

We will now show that $A_{1n}$ and $A_{3n}$ converge in probability to zero, which completes the proof. First,

$$
\begin{aligned}
A_{1n} &=\left(\sum_{i=1}^{n}H_{i}^{\prime}H_{i}\right)^{-1}\left(\sum_{i=1}^{n}H_{i}^{\prime}e_{1}^{\prime}\left(X_{i}^{\prime}X_{i}\right)^{-1}X_{i}^{\prime}\Psi_{i}\right)(\theta-\widehat{\theta}_{n})\\
&=o_{p}(1),
\end{aligned}
$$

because a WLLN applies to the first and second factors by conditions (i) and (iii) in the statement of the result, so that they are $O_{p}(1)$ (note that the inverse of $E(H_{i}^{\prime}H_{i})$ exists because of Assumption 4), and the final factor is $o_{p}(1)$ by Theorem 1. Second,

$$
\begin{aligned}
A_{3n} &=\left(\sum_{i=1}^{n}H_{i}^{\prime}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{\prime}e_{1}^{\prime}\left(X_{i}^{\prime}X_{i}\right)^{-1}X_{i}^{\prime}\zeta_{i}\\
&\stackrel{p}{\to}\left(E(H_{i}^{\prime}H_{i})\right)^{-1}E\left(H_{i}^{\prime}e_{1}^{\prime}\left(X_{i}^{\prime}X_{i}\right)^{-1}X_{i}^{\prime}\zeta_{i}\right),
\end{aligned}
$$

because WLLNs apply in light of conditions (i) and (iv) in the statement of the result. That the second expectation is zero follows from the last step of the proof of Theorem 1. ∎

Appendix B Data and additional details for section 6

To investigate whether past exposure to conflict in origin countries (such as civil conflict or mass killing) makes migrants more violence-prone in their host country, Couttenier, Petrencu, Rohner, and Thoenig (2019, CPRT hereafter) use data aggregated to the age cohort ($a$) and immigrant nationality ($n$) level and observed for each of the years ($t$) between 2009 and 2016.[11] In their baseline result, they find that cohorts exposed to civil conflict or mass killing in their origin country during childhood are 35 percent more prone to commit a violent crime in Switzerland than the average cohort. They then move on to explore heterogeneity in public policies across 26 Swiss regions, so-called ‘cantons’ ($c$), to investigate how host country institutions modulate the impact of past exposure to conflict on current crime propensity ($CP$).[12] Therefore, in a first step, they run the regression:

$$CP_{n,a,t,c}=\beta_{c}\,kid_{n,a,t}+Z_{n,a,t}\gamma_{a}+FE_{c,t}+FE_{n,t}+FE_{a}+\varepsilon_{n,a,t,c} \tag{21}$$

for their sub-sample of male asylum seekers, where $kid_{n,a,t}$ is a binary measure of early-age exposure to violence, $Z$ contains binary control variables, and the $FE$s are fixed effects for canton × year, nationality × year, and age. Note that $\beta_{c}$ is a canton-specific parameter capturing the violent ‘crime premium’ of conflict exposure over the crime propensity of the average cohort and that its identification relies on within-canton, within-nationality, between-cohort variation.

[11] We follow the notation of CPRT in this Appendix to facilitate comparison with their paper. The notation in section 6 is slightly different to facilitate consistency with the setup in our paper.

[12] $CP$ is measured as the share (scaled in percentage points) of individuals in the cohort who perpetrate at least one violent crime in a given year. The main analysis we rely on limits the sample to the years 2011-2016 and to 25 cantons (by dropping Appenzell-Innerrhoden).

In a second step, this canton-specific coefficient $\hat{\beta}_{c}$ is projected on a set of canton-specific policy and control variables, which resembles the heterogeneity equation in the context of our CITE framework:

$$\hat{\beta}_{c}=\alpha\times Policy_{c}+X^{\prime}_{c}\beta+\varepsilon_{c}. \tag{22}$$

To take into account the precision with which the canton-specific effects $\hat{\beta}_{c}$ are estimated in the first step, CPRT estimate this second equation by GLS using the inverse of the standard errors estimated in the first stage as weights.[13] Concerning those policy variables, CPRT mostly focus on ‘openjobacc’, which is a binary variable equal to 1 for cantons where asylum seekers can start working in all sectors of activity three months after arrival.

[13] CPRT refer to Bertrand and Schoar (2003 QJE) and Bandiera, Prat, and Valletti (2009 AER), who have previously taken this approach.
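A weighted second stage of this kind can be sketched as follows. This is an illustration only, not CPRT's implementation: the function name `weighted_second_stage` is hypothetical, the numbers are synthetic, and the block assumes that "inverse of the standard errors as weights" means each squared residual is weighted by $1/se_{c}$ (other conventions, such as $1/se_{c}^{2}$, are equally plausible readings).

```python
import numpy as np


def weighted_second_stage(beta_hat, se, policy):
    """Weighted least squares of beta_hat on a constant and a policy dummy.

    Assumption: the weight on canton c's squared residual is 1 / se_c.
    Returns (constant, policy coefficient).
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    policy = np.asarray(policy, dtype=float)
    w = 1.0 / np.asarray(se, dtype=float)
    Xmat = np.column_stack([np.ones_like(policy), policy])
    sw = np.sqrt(w)                      # scale rows by sqrt(weight), then run OLS
    coef, *_ = np.linalg.lstsq(Xmat * sw[:, None], beta_hat * sw, rcond=None)
    return coef


# Synthetic first-stage output for 25 hypothetical cantons (not the CPRT data)
rng = np.random.default_rng(0)
policy = (rng.random(25) < 0.64).astype(float)
se = rng.uniform(0.3, 1.5, size=25)
beta_hat = 0.9 - 0.6 * policy + se * rng.standard_normal(25)

const_wls, slope_wls = weighted_second_stage(beta_hat, se, policy)
const_ols, slope_ols = weighted_second_stage(beta_hat, np.ones(25), policy)
print(f"WLS: constant={const_wls:.3f}, policy={slope_wls:.3f}")
print(f"OLS: constant={const_ols:.3f}, policy={slope_ols:.3f}")
```

Passing equal standard errors reproduces OLS, which is the unweighted projection used in the CITE framework; the weighted version simply gives imprecisely estimated cantons less influence.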

B.1. Replication of CPRT’s second stage results

We replicate those results in the first column of table 3, which is identical to column 1 of table 9 in CPRT. To facilitate comparison with our CITE framework, column 2 of table 3 reports the result based on OLS estimation, which leads to nearly identical results (although with somewhat higher standard errors).[14]

[14] Note that the coefficient $\times$ mean(‘openjobacc’) + constant, $-0.624\times 0.64+0.905=0.506$, provides an estimate of the mean of the canton-specific parameters $\hat{\beta}_{c}$. This value of $0.506$ is identical (up to the third decimal) to the estimate from regressing the $\hat{\beta}_{c}$ on a constant, and almost identical to the estimate we obtain for a homogeneous relationship in equation (21): 0.498 (SE: 0.470).
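The calculation in footnote 14 can be reproduced from the OLS column of table 3, as a minimal sketch. The second half of the block uses synthetic data (not the CPRT data) with a hypothetical dummy whose mean is 0.64 to illustrate why the calculation recovers the mean of $\hat{\beta}_{c}$: in an OLS regression with a constant, the fitted values average to the sample mean of the dependent variable.

```python
import numpy as np

# Values from table 3, column 2 (OLS), and the mean of openjobacc from footnote 14
CONSTANT = 0.905
OPENJOBACC_COEF = -0.624
MEAN_OPENJOBACC = 0.64

mean_beta = CONSTANT + OPENJOBACC_COEF * MEAN_OPENJOBACC
print(round(mean_beta, 3))  # 0.506

# Generic check on synthetic data: constant + slope * mean(d) equals mean(y)
rng = np.random.default_rng(0)
d = np.array([1.0] * 16 + [0.0] * 9)            # hypothetical dummy with mean 0.64
y = 0.9 - 0.6 * d + rng.normal(scale=0.5, size=d.size)
Xmat = np.column_stack([np.ones_like(d), d])
(a_hat, b_hat), *_ = np.linalg.lstsq(Xmat, y, rcond=None)
print(np.isclose(a_hat + b_hat * d.mean(), y.mean()))
```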

Table 3. Second stage results in CPRT (2019)

| VARIABLES | (1) $\hat{\beta}_{c}$ | (2) $\hat{\beta}_{c}$ |
| --- | --- | --- |
| openjobacc | -0.640* | -0.624 |
| | (0.366) | (0.439) |
| constant | 0.796** | 0.905** |
| | (0.303) | (0.352) |
| Observations | 25 | 25 |
| R-squared | 0.246 | 0.253 |
| Estimation | GLS | OLS |

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.

B.2. Addition of time-varying GG variables

To explore the behavior of CITE vs. ITE in the framework developed in our paper, we add time-varying canton-specific data on the share of middle-aged (20-64 years) and urban population.[15] Those data come from annual ‘Kantonsporträts’ provided by the Swiss ‘Federal Statistical Office’ and capture the $G$ variables in the framework of our paper. Table 4 provides descriptive statistics of the sample data used in our application.

[15] We initially included the share of young (0-19 years) and old (65 years and above) population but since both of them led to nearly identical parameter estimates (around -1 for CITE and around 0.25 for ITE), we decided to opt for a simpler model with the share of middle-aged population (which is the residual of young and old).

Table 4. Descriptive statistics

| Variable | Obs | Mean | SD | Min | Max |
| --- | --- | --- | --- | --- | --- |
| CP | 48,272 | 3.92 | 17.28 | 0.00 | 100.00 |
| x1_kid012 | 48,272 | 0.62 | 0.49 | 0.00 | 1.00 |
| h1_openjobacc | 48,272 | 0.62 | 0.48 | 0.00 | 1.00 |
| g4_pop_middle | 48,272 | 61.87 | 1.20 | 58.36 | 63.90 |
| g6_urbanpop | 48,272 | 75.77 | 19.64 | 0.00 | 100.00 |

B.3. Calculation of mean effect of conflict exposure

The mean ‘crime premium’ of conflict exposure can be calculated from table 2 by multiplying all interaction term coefficients with the sample means of the respective interaction variables and adding those terms up. For example, for the CITE estimate, this gives $-1.146\times 0.64-46.5$ (from column 2) $+0.805\times 61.867-0.0274\times 75.774$ (from column 1) $=0.462$.[16] This is very similar to the ‘crime premium’ result of 0.498 one obtains from CPRT when not including any interaction terms (see footnote 14).

[16] Note that for CITE, the constant of the second-step regression needs to be taken into account, while for ITE the coefficient for x1_kid needs to be taken into account.
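In symbols, the CITE calculation described above can be sketched as follows. The notation is illustrative rather than the paper's: $\hat{a}$ and $\hat{b}$ denote the constant and the $openjobacc$ coefficient of the second-step regression as in (20), $\hat{\phi}_{1}$ and $\hat{\phi}_{2}$ are hypothetical labels for the coefficients on the two time-varying interaction variables, and bars denote sample means.

```latex
\widehat{\bar{\beta}}_{\mathrm{CITE}}
  = \hat{a}
  + \hat{b}\,\overline{openjobacc}
  + \hat{\phi}_{1}\,\overline{pop\_middle}
  + \hat{\phi}_{2}\,\overline{urbanpop}
```

For ITE, the same sum applies with the coefficient on the uninteracted conflict-exposure variable taking the place of $\hat{a}$.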