Estimating interaction effects with panel data

Chris Muris Department of Economics, McMaster University muerisc@mcmaster.ca and Konstantin M. Wacker Department of Global Economics and Management, University of Groningen k.m.wacker@rug.nl

Abstract.

A common task in empirical economics is to estimate interaction effects that measure how the effect of one variable $X$ on another variable $Y$ depends on a third variable $H$ . This paper considers the estimation of interaction effects in linear panel models with a fixed number of time periods. There are at least two ways to estimate interaction effects in this setting, both common in applied work. Our theoretical results show that these two approaches are distinct, and only coincide under strong conditions on unobserved effect heterogeneity. Our empirical results show that the difference between the two approaches is large, leading to conflicting conclusions about the sign of the interaction effect. Taken together, our findings may guide the choice between the two approaches in empirical work.

Key words and phrases:

panel data, interaction effects, correlated random coefficients

This paper benefited from discussions at presentations at the Universities of Bristol, Mainz, Vienna, the World Bank, the European Commission’s Joint Research Centre, and the Southern Economic Association meetings. We would like to thank the respective participants for their feedback and input, especially Irene Botosaru, Peter Egger, Michaela Kesina, Adam Lavecchia, Krishna Pendakur, James Powell, Nathanael Vellekoop, Frank Windmeijer, and Jonathan Zhang. We are grateful to Mathias Thoenig for providing the data used by Couttenier, Petrencu, Rohner, and Thoenig (2019) and for helping to prepare it for our empirical application, to the Swiss BfS for sharing regional GDP p.c. data, and to Eline Koopman for excellent research assistance.

1. Introduction

In empirical work in economics, we often want to estimate how the effect of one variable $X$ on another variable $Y$ depends on a third variable $H$ . The standard approach to answering such a question with cross-sectional data is to estimate interaction effects based on a linear model $Y=\alpha+X\beta+XH\kappa+U$ . [Footnote 1: Alternative approaches for modeling effect heterogeneity include those based on random coefficients (cf. Lewbel and Pendakur, 2017), and those based on tools from machine learning (see Athey and Imbens, 2015; Bordt, Farbmacher, and Kogel, 2020; Chernozhukov, Hansen, Liao, and Zhu, 2019; Wager and Athey, 2018).] In this case, $\kappa$ is called the interaction term coefficient, and it measures how the effect of $X$ on $Y$ varies with $H$ .

With panel data, there are at least two ways to estimate interaction effects in linear models. The most common approach, which we will call the interaction term estimator (ITE), is based on a regression of $Y_{it}$ on $X_{it}$ and $X_{it}H_{i}$ (and additive fixed effects). [Footnote 2: See, for example, Burnside and Dollar (2000); Shambaugh (2004); List and Sturm (2006); Amiti and Konings (2007); Spilimbergo (2009); Epifani and Gancia (2009); Duflo, Dupas, and Kremer (2011); Bloom, Sadun, and Van Reenen (2012); Berman, Martin, and Mayer (2012); Bloom, Draca, and Van Reenen (2016); Storeygard (2016); Alsan and Goldin (2019); Herrera, Ordoñez, and Trebesch (2020); Manacorda and Tesei (2020).] The OLS estimate of the coefficient on $X_{it}H_{i}$ is the ITE for $\kappa$ . A second approach, which we will call the correlated interaction term estimator (CITE), consists of two steps. First, regress $Y_{it}$ on $X_{it}$ separately for each panel unit $i$ , obtaining $b_{i}$ . In a second step, project $b_{i}$ onto $H_{i}$ in a cross-sectional regression. The result of the second step is the CITE for $\kappa$ . CITE has been used in applied work but appears to be less popular than ITE. [Footnote 3: Existing papers that use this approach include Couttenier, Petrencu, Rohner, and Thoenig (2019) and MaCurdy (1981).]

One goal of this paper is to show that these two approaches, ITE and CITE, are distinct, and will only recover the same object under strong conditions on unobserved effect heterogeneity. We derive conditions under which CITE is consistent for the interaction effect. These conditions are not sufficient to guarantee consistency of ITE. In two empirical applications, we show that ITE and CITE lead to conflicting conclusions about the sign of interaction effects. The difference in interaction effect estimates can be large. Based on our results, since CITE requires weaker assumptions on unobserved effect heterogeneity for consistency, we recommend it for typical applications in economics where there may be substantial, correlated unobserved effect heterogeneity. [Footnote 4: In a sense that will become clear later, this preference for CITE over ITE corresponds to the preference for fixed effects over (correlated) random effects approaches for linear panel models with additive unobserved heterogeneity.] In most existing work, the choice between ITE and CITE is not made explicit and is not motivated. We believe that we are the first to provide a rigorous analysis of these two estimators.

Fundamental to our results is that, in applications where interaction effects matter, there is likely additional unobserved effect heterogeneity in the effect of $X$ on $Y$ . To fix ideas, consider the following outcome equation, which is a special case of our general framework: [Footnote 5: Our general framework in Section 2 additionally has vector-valued $X_{it}$ , additional controls, time-varying interaction variables, and additional sources of unobserved effect heterogeneity.]

(1)

Y_{it}=\alpha_{i}+X_{it}\beta_{i}+\kappa X_{it}H_{i}+\varepsilon_{it}.

In (1), $\beta_{i}$ captures unobserved effect heterogeneity in addition to the observed effect heterogeneity due to the interaction term $\kappa H_{i}$ .

That the consistency of ITE relies on restrictive exogeneity conditions is evident from equation (1): the linear model underlying a regression of $Y$ on $X$ and $XH$ has the composite error term $\varepsilon_{it}+(\beta_{i}-\beta)X_{it}$ , where $\beta$ denotes the common coefficient on $X_{it}$ in that regression. This error term is correlated with $X_{it}$ and $H_{i}$ , unless the unobserved effect heterogeneity is unrelated to $X_{it}$ and $H_{i}$ .

CITE, on the other hand, allows the unobserved effect heterogeneity $\beta_{i}$ to be arbitrarily correlated with $X_{it}$ , by treating the $\beta_{i}$ as parameters. We show that it is consistent for the interaction effect under conditions that do not guarantee the consistency of ITE. Importantly, this approach does not require a large number of time periods. Our consistency result is obtained under fixed- $T$ .

If $H_{i}$ is endogenous, neither ITE nor CITE recovers a causal effect of $H_{i}$ on the effect of $X$ on $Y$ . The best one can hope for is to recover the correlation between $\beta_{i}$ and $H_{i}$ . CITE recovers this correlation, but ITE does not.

To demonstrate the consistency of CITE, we build on existing work from the literature on correlated random coefficient models for linear panel models (see Chamberlain, 1992; Arellano and Bonhomme, 2012; Graham and Powell, 2012; Laage, 2020; Sasaki and Ura, 2021). The object of interest in this literature is (some feature of) the distribution of $\beta_{i}$ in $Y_{it}=X_{it}\beta_{i}+U_{it}$ . Techniques from this literature can also be used in the context of difference-in-differences estimation (cf. Verdier, 2020; de Chaisemartin and D’Haultfoeuille, 2022). Our work differs from the papers in this literature because we are interested in the estimation of interaction effects, while the distribution of the random coefficients is a nuisance parameter.

Previous work on interaction effects has made mention of using CITE. [Footnote 6: Balli and Sørensen (2013) observe that unobserved effect heterogeneity can lead to inconsistency in the ITE, and suggest that “if the time-series dimension of the data is large, one may directly allow for country-varying slopes”. We show that such slopes should generally be preferred regardless of the time-series dimension. Giesselmann and Schmidt-Catran (2020) discuss CITE, but prefer their double-demeaned estimator, citing computational concerns, a concern that interaction effects involving time-invariant variables cannot be estimated, and the observation that the minimum number of time periods is higher for CITE than for their ITE.] However, we are not aware of existing work that demonstrates that ITE and CITE are distinct, and that analyzes their asymptotic properties.

Organization

Section 2 introduces our framework for the estimation of interaction effects with panel data. Section 3 formally defines the two estimators. Sections 4 and 5 present our theoretical results. Section 6 contains the results for two empirical applications. Appendix A contains proofs of the main results. Appendix B has additional data and details for the empirical applications.

2. Setup

We are interested in the estimation of interaction effects in a static linear panel model. Typically, the outcome equation for this purpose is specified as

(2)

Y_{it}=X_{it}H_{i}\kappa+X_{it}G_{it}\phi+Z_{it}\gamma+\widetilde{U}_{it},

which allows the effect of the explanatory variable $X_{it}$ on the dependent variable $Y_{it}$ to depend on observable, time-invariant interaction variables $H_{i}$ and observable, time-varying interaction variables $G_{it}$ . The parameters of interest are $\kappa$ , the interaction term coefficient (ITC) on $H_{i}$ , and $\phi$ , the ITC on $G_{it}$ .

Once we admit that the effect of $X$ on $Y$ may depend on the observable $G$ and $H$ , we may wonder whether there are additional, unobserved, sources of effect heterogeneity. Our framework explicitly introduces unobserved heterogeneity in the effect of $X$ on $Y$ via the following three equations for cross-section unit $i=1,\cdots,n$ at time $t=1,\cdots,T$ :

(3)	$\displaystyle Y_{it}$	$\displaystyle=X_{it}\beta_{it}+Z_{it}\gamma+U_{it},$
(4)	$\displaystyle\beta_{itk}$	$\displaystyle=\delta_{ik}+G_{it}\phi_{k}+V_{itk},\,k=1,\ldots,K_{x},$
(5)	$\displaystyle\delta_{i1}$	$\displaystyle=H_{i}\kappa+\epsilon_{i}.$

The outcome equation (3) describes how $Y_{it}\in\mathbb{R}$ responds to a change in $X_{it}\in\mathbb{R}^{K_{x}}$ , allowing for control variables $Z_{it}\in\mathbb{R}^{K_{z}}$ . The effect of $X_{itk}$ on $Y_{it}$ , denoted $\beta_{itk}$ , can vary across $i$ and $t$ .

The coefficient equation (4) describes the heterogeneity in the effect of $X_{itk}$ on $Y_{it}$ . It decomposes the heterogeneous effect $\beta_{itk}$ into:

(i) a part that depends on $G_{it}\in\mathbb{R}^{K_{g}}$ ;
(ii) a time-invariant, unit-specific $\delta_{ik}\in\mathbb{R}$ ; and
(iii) an idiosyncratic effect heterogeneity $V_{itk}$ .

The heterogeneity equation (5) relates the unit-specific effect heterogeneity in the effect of the first regressor, $\delta_{i1}$ , to $H_{i}\in\mathbb{R}^{K_{h}}$ , so that $\epsilon_{i}$ captures the unobserved part of $\delta_{i1}$ . We do not model the time-invariant coefficients on the other regressors; see Remark 1.

In our framework, the effect of $X$ on $Y$ may vary even when holding constant $G$ and $H$ , due to the presence of unobserved effect heterogeneity $\epsilon_{i}$ and $V_{itk}$ . If we shut down unobserved effect heterogeneity, i.e.

V_{itk}=\epsilon_{i}=0\text{ for all }i,t,k,

and if all observables are scalar, then we obtain the standard outcome equation (2). Thus, our framework is a natural generalization of the standard way of thinking about interaction effects in linear models that takes seriously the role of unobserved effect heterogeneity.

In a model with scalar $X,G,H,Z$ , and with unobserved effect heterogeneity, the reduced form of (3)-(5) is

(6)

Y_{it}=X_{it}H_{i}\kappa+X_{it}G_{it}\phi+Z_{it}\gamma+\left(U_{it}+X_{it}V_{it}+X_{it}\epsilon_{i}\right).

It is clear from (6) that, under our setup, the error term $\widetilde{U}_{it}$ in (2) contains the unobserved effect heterogeneity terms. If those terms are correlated with the observables, a regression of $Y$ on $XG$ and $XH$ is inconsistent due to endogeneity.
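The step from (3)-(5) to the reduced form can be written out explicitly. The following is a sketch for the scalar case only, with the index $k$ dropped so that $V_{it}$ stands for $V_{it1}$ and $\delta_{i1}$ is the only unit-specific coefficient; it substitutes (4) into (3), then (5) into the result, and collects the unobserved terms.

```latex
\begin{align*}
Y_{it} &= X_{it}\beta_{it}+Z_{it}\gamma+U_{it} \\
       &= X_{it}\left(\delta_{i1}+G_{it}\phi+V_{it}\right)+Z_{it}\gamma+U_{it} \\
       &= X_{it}\left(H_{i}\kappa+\epsilon_{i}+G_{it}\phi+V_{it}\right)+Z_{it}\gamma+U_{it} \\
       &= X_{it}H_{i}\kappa+X_{it}G_{it}\phi+Z_{it}\gamma
          +\left(U_{it}+X_{it}V_{it}+X_{it}\epsilon_{i}\right).
\end{align*}
```

Both heterogeneity terms enter the composite error multiplied by $X_{it}$ , which is why they are correlated with the interaction regressors whenever $\epsilon_{i}$ or $V_{it}$ is related to the observables.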

Remark 1.

Our specification allows for additive fixed effects in the outcome equation by setting $X_{it2}=1$ . Then $\delta_{i2}$ is the conventional additive fixed effect.

Remark 2.

The index $t$ need not refer to time, but can refer to students $t$ for a given classroom $i$ , counties $t$ within a given state $i$ , employees $t$ within a given firm $i$ , etc.

3. Two estimators

There are at least two estimators for the ITCs $\phi$ and $\kappa$ introduced in equations (3)-(5). We call them the interaction term estimator (ITE) and the correlated interaction term estimator (CITE). The ITE, defined formally in Section 3.2, is a regression of $Y$ on $(X,Z)$ augmented with interactions of $X$ with $G$ and $H$ . The CITE, defined formally in Section 3.1, is a two-step estimator. The first step is a regression of $Y$ on $(X,Z)$ augmented with interactions of $X$ with $G$ and dummy variables for each $i$ . The second step is a regression of the estimated coefficients on the dummy variable interaction terms on $H$ .

Both approaches can accommodate additive fixed effects in the outcome equation by setting $X_{it2}$ equal to a constant.

As a particular example of our framework and the two estimators, consider the special case with scalar $X$ and $H$ , no $Z$ and $G$ , and $V_{it1}=0$ . Then our setup simplifies to

(7)		$\displaystyle Y_{it}$	$\displaystyle=X_{it}\delta_{i1}+U_{it},$
(8)		$\displaystyle\delta_{i1}$	$\displaystyle=H_{i}\kappa+\epsilon_{i},$

with reduced form

(9)

Y_{it}=\left(X_{it}H_{i}\right)\kappa+\left(U_{it}+X_{it}\epsilon_{i}\right).

Then ITE is the regression of $Y$ on $X\times H$ suggested by (9). In contrast, CITE is based on (7) and (8): first, regress $Y$ on $X$ for each $i$ to obtain $\widehat{\delta}_{i1}$ . Then regress $\widehat{\delta}_{i1}$ on $H_{i}$ to obtain $\widehat{\kappa}$ .

In the remainder of this section, we formally define the two estimators. From now on, we will assume access to a random sample consistent with the framework in Section 2.

Assumption 1 (Random sampling).

For each $i=1,\cdots,n$ , the observed data is

W_{i}=\left(Y_{i1},\cdots,Y_{iT},X_{i1},\cdots,X_{iT},G_{i1},\cdots,G_{iT},Z_{i1},\cdots,Z_{iT},H_{i}\right),

generated by equations (3)–(5). $\{W_{i}\}_{i=1}^{n}$ is an i.i.d. random sequence.

3.1. Correlated interaction term estimator (CITE)

Substituting (4) into (3) yields the reduced form

	$\displaystyle Y_{it}$	$\displaystyle=\sum_{k}X_{itk}\beta_{itk}+Z_{it}\gamma+U_{it}$
		$\displaystyle=\sum_{k}X_{itk}\left(\delta_{ik}+G_{it}\phi_{k}+V_{itk}\right)+Z_{it}\gamma+U_{it}$
(10)			$\displaystyle\equiv X_{it}\delta_{i}+\Psi_{it}\theta+\zeta_{it},$

where we have introduced the following objects (dimensions in square brackets):

$\displaystyle\delta_{i}$	$\displaystyle=\left(\delta_{i1},\cdots,\delta_{iK_{x}}\right)$	$\displaystyle\left[K_{x}\times 1\right],$
$\displaystyle\Psi_{it}$	$\displaystyle=\left(X_{it}\otimes G_{it},Z_{it}\right)$	$\displaystyle\left[1\times\left(K_{x}K_{g}+K_{z}\right)\right],$
$\displaystyle\theta$	$\displaystyle=\left(\phi_{1},\cdots,\phi_{K_{x}},\gamma\right)$	$\displaystyle\left[\left(K_{x}K_{g}+K_{z}\right)\times 1\right],$
$\displaystyle\zeta_{it}$	$\displaystyle=U_{it}+\sum_{k}V_{itk}X_{itk}$	$\displaystyle\left[1\times 1\right].$

For a given $i$ , collecting this relationship across $t=1,\cdots,T$ yields

(11)

Y_{i}=X_{i}\delta_{i}+\Psi_{i}\theta+\zeta_{i},

where $Y_{i}$ , $X_{i}$ , $\Psi_{i}$ , and $\zeta_{i}$ are the counterparts of the objects in (10), with $T$ rows.

For the CITE to be well-defined, we assume sufficient variation in $X_{i}$ .

Assumption 2.

There exists an $h>0$ such that

\inf_{i}\det\left(X_{i}^{{}^{\prime}}X_{i}\right)\geq h.

Assumption 2 avoids the identification issues in Graham and Powell (2012) and Arellano and Bonhomme (2012). If it does not hold for all $i$ , our analysis should be thought of as conditioning on a subpopulation for which it does hold, cf. Arellano and Bonhomme (2012). Assumption 2 allows us to define the residual maker matrix

M_{i}=I-X_{i}\left(X_{i}^{{}^{\prime}}X_{i}\right)^{-1}X_{i}^{{}^{\prime}},

and write

	$\displaystyle M_{i}Y_{i}$	$\displaystyle=M_{i}X_{i}\delta_{i}+M_{i}\Psi_{i}\theta+M_{i}\zeta_{i}$
(12)			$\displaystyle=M_{i}\Psi_{i}\theta+M_{i}\zeta_{i},$

using that $M_{i}X_{i}=0$ . We also require sufficient variation in $\Psi_{i}$ after projecting out $X_{i}$ .

Assumption 3.

The matrix $E\left(\Psi_{i}^{{}^{\prime}}M_{i}\Psi_{i}\right)$ is invertible.

Assumptions 1, 2 and 3 guarantee that CITE for $\theta$ is well-defined for large enough $n$ .

Definition 1.

CITE for $\theta$ is given by

\widehat{\theta}_{n}\equiv\left(\sum_{i=1}^{n}\Psi_{i}^{{}^{\prime}}M_{i}\Psi_{i}\right)^{-1}\sum_{i=1}^{n}\Psi_{i}^{{}^{\prime}}M_{i}Y_{i}.

CITE for $(\phi,\gamma)$ can be extracted from $\widehat{\theta}_{n}$ . For $\kappa$ , we need variation in $H_{i}$ .

Assumption 4.

The matrix $E\left(H_{i}^{{}^{\prime}}H_{i}\right)$ is invertible.

This is a standard no multicollinearity assumption on the regressors $H_{i}$ . For each $i$ , let $\widehat{\delta}_{i1}$ be the first element of

\widehat{\delta}_{i}=\left(X_{i}^{{}^{\prime}}X_{i}\right)^{-1}X_{i}^{{}^{\prime}}\left(Y_{i}-\Psi_{i}\widehat{\theta}_{n}\right).

Definition 2.

Then the CITE for $\kappa$ is

\widehat{\kappa}_{n}\equiv\left(\sum_{i=1}^{n}H_{i}^{{}^{\prime}}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{{}^{\prime}}\widehat{\delta}_{i1}.
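Definitions 1 and 2 can be illustrated for a simple special case. The sketch below assumes $K_{x}=2$ with $X_{it2}=1$ (unit fixed effects), no $G$ or $Z$ (so there is no $\theta$ to estimate in the first step), and $H_{i}=(1,h_{i})$ with a scalar $h_{i}$ ; the function names and the toy panel are hypothetical, and the code returns only the coefficient on $h_{i}$ .

```python
def unit_slope(y_i, x_i):
    """OLS slope of y on x with an intercept, for one panel unit (needs T >= 2)."""
    T = len(x_i)
    x_bar = sum(x_i) / T
    y_bar = sum(y_i) / T
    sxx = sum((xt - x_bar) ** 2 for xt in x_i)
    if sxx == 0:
        # Without within-unit variation in x, the unit slope is not defined
        # (this is the situation Assumption 2 rules out).
        raise ValueError("no within-unit variation in x")
    sxy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x_i, y_i))
    return sxy / sxx


def cite_with_fixed_effects(y, x, h):
    """Regress the unit slopes on (1, h_i); return the coefficient on h_i."""
    d = [unit_slope(yi, xi) for yi, xi in zip(y, x)]
    n = len(h)
    h_bar = sum(h) / n
    d_bar = sum(d) / n
    num = sum((hi - h_bar) * (di - d_bar) for hi, di in zip(h, d))
    den = sum((hi - h_bar) ** 2 for hi in h)
    return num / den


# Hypothetical noise-free panel: slope_i = 0.5 + 2 * h_i, arbitrary fixed effects.
h = [0.0, 1.0, 2.0]
alpha = [10.0, -3.0, 7.0]
x = [[0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [2.0, 2.0, 5.0]]
y = [[a + (0.5 + 2.0 * hi) * xt for xt in xi] for a, hi, xi in zip(alpha, h, x)]

kappa_h = cite_with_fixed_effects(y, x, h)  # recovers the coefficient 2 on h_i
print(kappa_h)
```

The fixed effects `alpha` never need to be estimated explicitly: the unit-level intercept absorbs them in the first step, which mirrors the role of $M_{i}$ in the general definition.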

Remark 3.

In the special case that $K_{x}=1$ and $X_{i1}=1$ , CITE is the linear fixed effects estimator based on

Y_{it}=\delta_{i1}+Z_{it}\gamma+U_{it}.

In contrast, ITE is a correlated random effects estimator, see Remark 4.

3.2. Interaction term estimator (ITE)

Define $X_{i,-1}$ to be $X_{i}$ with the first column removed, and define $\delta_{i,-1}$ analogously. Then rewrite (11) as

	$\displaystyle Y_{i}$	$\displaystyle=X_{i}\delta_{i}+\Psi_{i}\theta+\zeta_{i}$
		$\displaystyle=X_{i,-1}\delta_{i,-1}+X_{i1}\delta_{i1}+\Psi_{i}\theta+\zeta_{i}$
		$\displaystyle=X_{i,-1}\delta_{i,-1}+X_{i1}(H_{i}\kappa+\epsilon_{i})+\Psi_{i}\theta+\zeta_{i}$
(13)			$\displaystyle=X_{i,-1}\delta_{i,-1}+\underbrace{X_{i1}H_{i}\kappa+\Psi_{i}\theta}_{\widetilde{\Psi}_{i}\widetilde{\theta}}+\underbrace{\zeta_{i}+X_{i1}\epsilon_{i}}_{\widetilde{\zeta}_{i}}.$

The ITE requires sufficient variation in $X_{i,-1}$ .

Assumption 5.

There exists an $h>0$ such that

\inf_{i}\det\left(X_{i,-1}^{{}^{\prime}}X_{i,-1}\right)\geq h.

This assumption is similar to Assumption 2, but is slightly weaker because it excludes the first column of $X_{i}$ . Under Assumption 5, we can define

M_{i,-1}=I-X_{i,-1}\left(X_{i,-1}^{{}^{\prime}}X_{i,-1}\right)^{-1}X_{i,-1}^{{}^{\prime}}

and premultiply (13) by $M_{i,-1}$ to obtain

	$\displaystyle M_{i,-1}Y_{i}$	$\displaystyle=M_{i,-1}X_{i,-1}\delta_{i,-1}+M_{i,-1}\widetilde{\Psi}_{i}\widetilde{\theta}+M_{i,-1}\widetilde{\zeta}_{i}$
(14)			$\displaystyle=M_{i,-1}\widetilde{\Psi}_{i}\widetilde{\theta}+M_{i,-1}\widetilde{\zeta}_{i}.$

Assumption 6.

The matrix $E\left(\widetilde{\Psi}_{i}^{{}^{\prime}}M_{i,-1}\widetilde{\Psi}_{i}\right)$ is invertible.

Assumption 6 guarantees sufficient variation in $\widetilde{\Psi}_{i}$ after projecting out $X_{i,-1}$ . Given Assumptions 5 and 6, the ITE is well-defined.

Definition 3.

The ITE for $\widetilde{\theta}=(\kappa,\phi,\gamma)=(\kappa,\theta)$ is

\widehat{\widetilde{\theta}}_{n}\equiv\left(\sum_{i=1}^{n}\widetilde{\Psi}_{i}^{{}^{\prime}}M_{i,-1}\widetilde{\Psi}_{i}\right)^{-1}\sum_{i=1}^{n}\widetilde{\Psi}_{i}^{{}^{\prime}}M_{i,-1}Y_{i}.

ITE estimates for $\kappa$ , $\phi$ and $\gamma$ can be extracted from $\widehat{\widetilde{\theta}}_{n}$ .

Remark 4.

In the special case that $K_{x}=1$ and $X_{i1}=1$ , the ITE is a correlated random effects estimator (cf. Mundlak (1978) and Wooldridge (2010), 10.7.3). It approximates the unobserved heterogeneity by $\delta_{i1}=H_{i}\kappa+\epsilon_{i}$ , so that

Y_{it}=\delta_{i1}+Z_{it}\gamma+U_{it}=H_{i}\kappa+Z_{it}\gamma+(U_{it}+\epsilon_{i}).

Consistency of the ITE would require assumptions about orthogonality of $\epsilon$ and $(H,Z)$ .

Remark 5.

In the special case that $K_{x}=2$ and $X_{it2}=1$ , we have a linear fixed effects model with interaction terms. With a scalar $H$ and no $G$ ,

Y_{it}=\delta_{i2}+X_{it}H_{i}\kappa+\left(U_{it}+X_{it}V_{it1}+X_{it}\epsilon_{i}\right).

In this case, the transformation $M_{i,-1}$ is the within transformation that produces

\widetilde{Y}_{it}=Y_{it}-\frac{1}{T}\sum_{s=1}^{T}Y_{is},\;\;\widetilde{X}_{it}=X_{it}-\frac{1}{T}\sum_{s=1}^{T}X_{is}

and the ITE is obtained from linear regression of $\widetilde{Y}_{it}$ on $H_{i}\times\widetilde{X}_{it}$ .
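The within-transformed regression in Remark 5 can be sketched as follows. This is an illustration under the remark's assumptions (scalar $X$ and $H$ , no $G$ or $Z$ , additive fixed effects), with hypothetical function names and a toy noise-free panel; in the second data set $\epsilon_{i}$ is chosen so that $\sum_{i}H_{i}\epsilon_{i}=0$ , which leaves the no-intercept projection of $\delta_{i1}$ on $H_{i}$ equal to $\kappa=2$ .

```python
def ite_within(y, x, h):
    """ITE of Remark 5: demean Y and X within each unit, then regress Y~ on H*X~."""
    num = 0.0
    den = 0.0
    for yi, xi, hi in zip(y, x, h):
        T = len(xi)
        y_bar = sum(yi) / T
        x_bar = sum(xi) / T
        for yt, xt in zip(yi, xi):
            r = hi * (xt - x_bar)  # interaction regressor after the within transformation
            num += r * (yt - y_bar)
            den += r * r
    return num / den


def simulate(kappa, h, eps, alpha, x):
    """Noise-free outcomes Y_it = alpha_i + X_it * (H_i * kappa + eps_i)."""
    return [
        [a + xt * (hi * kappa + e) for xt in xi]
        for a, hi, e, xi in zip(alpha, h, eps, x)
    ]


# Hypothetical panel with n = 2 units and T = 2 periods.
h = [1.0, 2.0]
alpha = [5.0, -1.0]
x = [[0.0, 1.0], [0.0, 2.0]]

# No unobserved effect heterogeneity: the ITE recovers kappa = 2.
ite_homogeneous = ite_within(simulate(2.0, h, [0.0, 0.0], alpha, x), x, h)
# With eps = (1, -0.5): the ITE equals 15.5 / 8.5, which is below 2.
ite_heterogeneous = ite_within(simulate(2.0, h, [1.0, -0.5], alpha, x), x, h)
print(ite_homogeneous, ite_heterogeneous)
```

The within transformation removes the additive fixed effects `alpha`, but the term $X_{it}\epsilon_{i}$ survives demeaning, so the second estimate moves away from 2 even though there is no outcome noise.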

4. Main results

We now show that a set of exogeneity conditions that is sufficient for the consistency of CITE (Section 4.2) is not sufficient for the consistency of ITE (Section 4.3).

These results are derived without exogeneity restrictions involving $\epsilon_{i}$ . This allows the heterogeneity equation to be misspecified. We discuss this modeling choice and its consequences in Section 4.1. In Section 5, we provide results under correct specification.

4.1. Misspecification

Throughout this section, we will not make exogeneity assumptions that involve $\epsilon_{i}$ . [Footnote 7: See Section 5 for an analysis under correct specification.] This leaves room for misspecification of the heterogeneity equation, in the sense that we may have

(15)

E\left[\left.\epsilon_{i}\right|H_{i}\right]\neq 0.

Such misspecification arises if relevant variables are omitted from the heterogeneity equation. To see this, let there be a vector $H_{i}^{*}$ such that $\delta_{i1}=H_{i}^{*}\kappa^{*}+\epsilon^{*}_{i}$ and $E\left[\left.\epsilon^{*}_{i}\right|H_{i}^{*}\right]=0$ . If the researcher only uses (or has access to) a partial list $H_{i}\subsetneq H_{i}^{*}$ and uses the alternative specification

\delta_{i1}=H_{i}\kappa+\epsilon_{i},

then generally $E\left[\left.\epsilon_{i}\right|H_{i}\right]\neq 0$ . Misspecification also arises if $H_{i}$ is measured with error, if there is functional form misspecification, etc.

Our results below show that CITE is robust against such misspecification, but that ITE is not.

Under misspecification of the heterogeneity equation, $\kappa$ is too ambitious a target even if $\delta_{i1}$ were known. Consider instead the infeasible regression of $\delta_{i1}$ on $H_{i}$ , which tends to the population projection coefficient of $\delta_{i1}$ on $H_{i}$ ,

(16)		$\displaystyle\widetilde{\kappa}$	$\displaystyle\equiv\left(E\left[H_{i}^{{}^{\prime}}H_{i}\right]\right)^{-1}E\left[H_{i}^{{}^{\prime}}\delta_{i1}\right]$
(17)			$\displaystyle=\kappa+\left(E\left[H_{i}^{{}^{\prime}}H_{i}\right]\right)^{-1}E\left[H_{i}^{{}^{\prime}}\epsilon_{i}\right]$

This parameter $\widetilde{\kappa}$ is of interest in many applications. It answers the question: “How does the effect of $X$ on $Y$ vary with $H$ ?”. It does not answer the causal question: “How does the effect of $X$ on $Y$ change given an exogenous change in $H$ ?”. By restricting attention to $\widetilde{\kappa}$ instead of $\kappa$ , we can allow for misspecification while still obtaining an interesting parameter. This is what CITE delivers. ITE does not deliver $\widetilde{\kappa}$ .

4.2. Consistency of CITE

Our analysis proceeds under the following exogeneity assumption, where $V_{i}$ collects all the coefficient equation error terms $V_{itk}$ .

Assumption 7 (Strict exogeneity).

The error terms in equations (3) and (4) satisfy

	$\displaystyle 0$	$\displaystyle=E\left(\left.U_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right)$
		$\displaystyle=E\left(\left.V_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right).$

This assumption is similar to the strict exogeneity assumption that is standard in the literature on correlated random coefficient panel models. [Footnote 8: See Chamberlain (1992); Arellano and Bonhomme (2012); Graham and Powell (2012). A notable exception is Laage (2020).] It is restrictive, but not much stronger than what is necessary for accommodating additive fixed effects in a linear model with fixed $T$ . Our version of strict exogeneity requires $H$ – along with $(X,Z,G)$ – to be orthogonal to the error terms in the outcome and coefficient equations. This is not necessary for estimation of $\phi$ and $\gamma$ , which would only require orthogonality with respect to $(X,Z,G)$ , not $H$ .

We first state the result for $\theta=(\phi,\gamma)$ .

Theorem 1 (Consistency of CITE for $\theta$ ).

If Assumptions 1, 2, 3, and 7 hold, and if

E\left\|M_{i}Y_{i}\right\|^{2}<\infty,~{}E\left\|M_{i}\Psi_{i}\right\|^{2}<\infty

then as $n\to\infty$ , $\widehat{\theta}_{n}\stackrel{{\scriptstyle p}}{{\to}}\theta$ .

Proof.

See Appendix A.∎

The second result is for $\widetilde{\kappa}$ . In what follows, $e_{1}$ is the column vector of length $K_{x}$ with the first element equal to 1 and all other elements equal to zero, so that $e_{1}^{\prime}$ selects the first element of $\widehat{\delta}_{i}$ .

Theorem 2 (Consistency for $\widetilde{\kappa}$ ).

Under the conditions of Theorem 1, and if Assumption 4 holds, and if (i) $E\left\|H_{i}\right\|^{2}<\infty$ , (ii) $E\left\|\delta_{i1}\right\|^{2}<\infty$ , (iii) $E\left\|H_{i}e_{1}^{\prime}(X_{i}^{\prime}X_{i})^{-1}X_{i}^{\prime}\Psi_{i}\right\|<\infty$ , and (iv) $E\left\|H_{i}e_{1}^{\prime}(X_{i}^{\prime}X_{i})^{-1}X_{i}^{\prime}\zeta_{i}\right\|<\infty$ , then, as $n\to\infty$ , $\widehat{\kappa}\stackrel{{\scriptstyle p}}{{\to}}\widetilde{\kappa}$ .

Proof.

See Appendix A.∎

Remark 6.

Theorem 2 adapts Corollary 1 of Arellano and Bonhomme (2012) to the present context. Note that our Assumption 2 implies that we can drop their conditioning on $\mathbb{S}$ . Then, set their $F_{i}=H_{i}$ , their $y_{i}=Y_{i}$ , their $\gamma_{i}=\delta_{i1}$ , their $Z_{i}=\Psi_{i}$ , and their $\delta=\theta$ .

4.3. Inconsistency of the ITE

Without assumptions on $\epsilon_{i}$ , the ITE is inconsistent. To show this, we consider the special case with $X\in\mathbb{R}$ , $H\in\mathbb{R}^{2}$ , and without $G,Z$ . We consider a setting with omitted variables: the ITE in this section uses the first element of $H$ but not the second. Other forms of endogeneity lead to similar conclusions. For example, letting $H_{i2}=H_{i1}^{2}$ in what follows is functional form misspecification. The case of measurement error in $H_{i1}$ is similar.

Partition $H_{i}=(H_{i1},H_{i2})$ and rewrite the model equations for the special case under consideration:

	$\displaystyle Y_{it}$	$\displaystyle=X_{it1}\beta_{it1}+U_{it}$
	$\displaystyle\beta_{it1}$	$\displaystyle=\delta_{i1}+V_{it1}$
	$\displaystyle\delta_{i1}$	$\displaystyle=H_{i}\kappa+\epsilon_{i},$
		$\displaystyle=H_{i1}\kappa_{1}+H_{i2}\kappa_{2}+\epsilon_{i}.$

Assume strict exogeneity for the full set of covariates $H_{i}$

	$\displaystyle 0$	$\displaystyle=E\left(\left.U_{i}\right|X_{i},Z_{i},G_{i},H_{i1},H_{i2}\right),$
	$\displaystyle 0$	$\displaystyle=E\left(\left.V_{i}\right|X_{i},Z_{i},G_{i},H_{i1},H_{i2}\right)$

and additionally assume that the heterogeneity equation is correctly specified if both $H_{i1}$ and $H_{i2}$ are included, i.e.

0=E\left(\left.\epsilon_{i}\right|X_{i},Z_{i},G_{i},H_{i1},H_{i2}\right).

These assumptions are stronger than necessary for consistency of CITE (using $H_{i1}$ only) for $\widetilde{\kappa}_{1}$ [Footnote 9: This follows from Theorem 2.], the projection coefficient of $\delta_{i1}$ on $H_{i1}$ , which for this special case equals:

\widetilde{\kappa}_{1}=\kappa_{1}+\left(E\left[H_{i1}^{2}\right]\right)^{-1}E\left[H_{i1}H_{i2}\right]\kappa_{2}.

However, they are not sufficient for consistency for $\widetilde{\kappa}_{1}$ of the ITE that uses only $H_{i1}$ ,

	$\displaystyle\widecheck{\kappa}_{1}$	$\displaystyle=\left(\sum_{i}\sum_{t}X_{it1}^{2}H_{i1}^{2}\right)^{-1}\left(\sum_{i}\sum_{t}X_{it1}H_{i1}Y_{it}\right)$
		$\displaystyle=\kappa_{1}+\left(\sum_{i}\sum_{t}X_{it1}^{2}H_{i1}^{2}\right)^{-1}\left(\sum_{i}\sum_{t}X_{it1}H_{i1}(X_{it1}(H_{i2}\kappa_{2}+\epsilon_{i}+V_{it1})+U_{it})\right)$

To see that it is not consistent under the maintained assumptions, assume that the relevant laws of large numbers apply, so that

	$\displaystyle\widecheck{\kappa}_{1}$	$\displaystyle\stackrel{{\scriptstyle p}}{{\to}}\kappa_{1}+\left(\sum_{t}E\left[X_{it1}^{2}H_{i1}^{2}\right]\right)^{-1}\left(\sum_{t}E\left[X_{it1}H_{i1}(X_{it1}(H_{i2}\kappa_{2}+\epsilon_{i}+V_{it1})+U_{it})\right]\right)$
		$\displaystyle=\kappa_{1}+\left(\sum_{t}E\left[X_{it1}^{2}H_{i1}^{2}\right]\right)^{-1}\left(\sum_{t}E\left[X_{it1}^{2}H_{i1}H_{i2}\kappa_{2}\right]\right)$
		$\displaystyle\neq\kappa_{1}+\left(E\left[H_{i1}^{2}\right]\right)^{-1}E\left[H_{i1}H_{i2}\right]\kappa_{2}$
		$\displaystyle=\widetilde{\kappa}_{1}.$

This shows that CITE with $H_{i1}$ converges to $\widetilde{\kappa}_{1}$ , whereas ITE does not. From the expression of the probability limit of $\widecheck{\kappa}_{1}$ , it is clear that the inconsistency can be made arbitrarily large.
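The gap between the two limits can be checked numerically. The sketch below evaluates both population expressions for a hypothetical two-type population; it assumes a single time period, equally likely types, and $\kappa_{1}=\kappa_{2}=1$ , none of which comes from the paper. Only the moments of $(X_{it1},H_{i1},H_{i2})$ matter for the two formulas, so no outcomes are simulated.

```python
# Hypothetical finite population: each tuple is (x, h1, h2), equally likely.
# Assumptions of this sketch only: one time period, kappa1 = kappa2 = 1.
TYPES = [(1.0, 1.0, 0.0), (3.0, 1.0, 1.0)]
KAPPA1, KAPPA2 = 1.0, 1.0


def mean(values):
    """Average over an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def projection_coefficient(types, kappa1, kappa2):
    """kappa1 + E[H1^2]^{-1} E[H1 H2] kappa2: the limit of CITE using H1 only."""
    e_h1h2 = mean(h1 * h2 for _, h1, h2 in types)
    e_h1sq = mean(h1 * h1 for _, h1, _ in types)
    return kappa1 + e_h1h2 / e_h1sq * kappa2


def ite_limit(types, kappa1, kappa2):
    """kappa1 + E[X^2 H1^2]^{-1} E[X^2 H1 H2] kappa2: the limit of ITE using H1 only."""
    e_num = mean(x * x * h1 * h2 for x, h1, h2 in types)
    e_den = mean(x * x * h1 * h1 for x, h1, _ in types)
    return kappa1 + e_num / e_den * kappa2


kappa_tilde_1 = projection_coefficient(TYPES, KAPPA1, KAPPA2)  # 1 + 0.5 / 1 = 1.5
ite_plim = ite_limit(TYPES, KAPPA1, KAPPA2)                    # 1 + 4.5 / 5, about 1.9
print(kappa_tilde_1, ite_plim)
```

The two limits coincide in this sketch only when $X_{it1}^{2}$ is unrelated to $H_{i1}H_{i2}$ , for example when both types have the same value of $x$ .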

5. Results under correct specification

The results in the previous section were derived under misspecification. We now add the assumption of correct specification to the model.

Assumption 8.

The heterogeneity equation is correctly specified,

(18)

E\left(\left.\epsilon_{i}\right|H_{i}\right)=0.

This assumption is likely too strong for most empirical applications in economics; see Section 4.1. Together, Assumptions 7 and 8 require that all variables are exogenous in the outcome and coefficient equations, and that $H_{i}$ is exogenous in the heterogeneity equation. Assumption 8 does not require $\epsilon_{i}$ to be orthogonal to the other regressors $\left(X,G,Z\right)$ .

If the heterogeneity equation is correctly specified, then CITE is consistent for the causal parameter $\kappa$ (Theorem 3) but ITE is not (Section 5.2). We also discuss stronger conditions that restore consistency of ITE (Section 5.3).

5.1. Consistency of CITE

Theorem 1 applies without modification: CITE is consistent for $(\phi,\gamma)$ . Under correct specification, we additionally have that CITE is consistent for $\kappa$ , instead of for the projection coefficient $\widetilde{\kappa}$ .

Theorem 3 (Consistency of CITE for $\kappa$ ).

If the conditions for Theorem 2 hold, and if Assumption 8 holds, then, as $n\to\infty$ , $\widehat{\kappa}\stackrel{{\scriptstyle p}}{{\to}}\kappa$ .

Proof.

Under Assumption 8, $E\left[H_{i}^{\prime}\epsilon_{i}\right]=0$ , so (17) gives $\widetilde{\kappa}=\kappa$ . Other than that, the proof is identical to that of Theorem 2. ∎

5.2. Inconsistency of ITE

Assumption 8, in conjunction with Assumption 7, is likely too strong for most applications in economics. Even so, it is not sufficient for consistency of ITE for $\kappa$ . Consider the special case in which $G,X,H$ are scalar and there is no $Z$ . Then the reduced form simplifies to:

Y_{it}=\widetilde{\Psi}_{it}\widetilde{\theta}+\left(U_{it}+X_{it}\epsilon_{i}+X_{it}V_{it}\right),

where

\widetilde{\Psi}_{it}=(X_{it}G_{it},X_{it}H_{i}),

and $\widetilde{\theta}=(\phi,\kappa)$ . The ITE simplifies to

\widehat{\widetilde{\theta}}=\left(\sum_{i}\sum_{t}\widetilde{\Psi}_{it}^{\prime}\widetilde{\Psi}_{it}\right)^{-1}\sum_{i}\sum_{t}\widetilde{\Psi}_{it}^{\prime}Y_{it},

so that, if an appropriate law of large numbers holds,

	$\displaystyle\widehat{\widetilde{\theta}}-\widetilde{\theta}$	$\displaystyle\stackrel{{\scriptstyle p}}{{\to}}\left(\sum_{t}E\left[\widetilde{\Psi}_{it}^{\prime}\widetilde{\Psi}_{it}\right]\right)^{-1}\sum_{t}E\left[\widetilde{\Psi}_{it}^{\prime}\left(U_{it}+X_{it}V_{it}+X_{it}\epsilon_{i}\right)\right]$
		$\displaystyle=\left(\sum_{t}E\left[\widetilde{\Psi}_{it}^{\prime}\widetilde{\Psi}_{it}\right]\right)^{-1}\sum_{t}E\left[\widetilde{\Psi}_{it}^{\prime}X_{it}\epsilon_{i}\right].$

The equality follows from Assumptions 7 and 8, which eliminate the first two components of the composite error term. However, the remaining component, $X_{it}\epsilon_{i}$ , is not orthogonal to $\widetilde{\Psi}_{it}=(X_{it}G_{it},X_{it}H_{i})$ .
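To see why the remaining term does not vanish, the limit can be specialized one step further. The following sketch assumes, beyond the setting in the text, that there is no $G$ , so that $\widetilde{\Psi}_{it}=X_{it}H_{i}$ and $\widetilde{\theta}=\kappa$ are scalar; this simplification is made here for illustration only. The second line uses the law of iterated expectations.

```latex
\begin{align*}
\widehat{\kappa}-\kappa
  &\stackrel{p}{\to}
  \left(\sum_{t}E\left[X_{it}^{2}H_{i}^{2}\right]\right)^{-1}
  \sum_{t}E\left[X_{it}^{2}H_{i}\epsilon_{i}\right] \\
  &=\left(\sum_{t}E\left[X_{it}^{2}H_{i}^{2}\right]\right)^{-1}
  \sum_{t}E\left[H_{i}\,E\left(\left.X_{it}^{2}\epsilon_{i}\right|H_{i}\right)\right].
\end{align*}
```

Assumption 8 restricts $E\left(\left.\epsilon_{i}\right|H_{i}\right)$ but not $E\left(\left.X_{it}^{2}\epsilon_{i}\right|H_{i}\right)$ , so the last term need not be zero, for example when units with larger $\epsilon_{i}$ also tend to have larger $X_{it}^{2}$ .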

5.3. A sufficient condition for consistency of ITE

The strongest exogeneity assumption we have imposed so far is

	$\displaystyle 0$	$\displaystyle=E\left(\left.U_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right),$
	$\displaystyle 0$	$\displaystyle=E\left(\left.V_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right),$
	$\displaystyle 0$	$\displaystyle=E\left(\left.\epsilon_{i}\right|H_{i}\right).$

It does not restrict the distribution of $\epsilon_{i}$ conditional on the time-varying observables. It is easy to show that the ITE is consistent if we impose the following stronger correlated random effects assumption:

	$\displaystyle 0$	$\displaystyle=E\left(\left.U_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right)$
		$\displaystyle=E\left(\left.V_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right)$
		$\displaystyle=E\left(\left.\epsilon_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right).$

This requires $\epsilon_{i}$ to be mean independent of $(X,G,Z)$ in addition to $H$.
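The argument is a one-line application of the law of iterated expectations. In the scalar special case of Section 5.2, where $\widetilde{\Psi}_{it}$ and $X_{it}$ are functions of $(X_{i},G_{i},H_{i})$ only, the term that remains in the limit of the ITE vanishes under the stronger assumption:

E\left[\widetilde{\Psi}_{it}^{\prime}X_{it}\epsilon_{i}\right]=E\left[\widetilde{\Psi}_{it}^{\prime}X_{it}\,E\left(\left.\epsilon_{i}\right|X_{i},Z_{i},G_{i},H_{i}\right)\right]=0.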

6. Empirical applications

We use two empirical applications to show that ITE and CITE can lead to meaningfully different conclusions about interaction effects.

6.1. Stock and Watson, 2015, Chapter 10

In their textbook example, Stock and Watson study the relationship between a U.S. state’s traffic fatality rate ( $Y_{it}$ ) and an alcohol tax ( $X_{it}$ ) using data from 48 U.S. states over a period of 7 years. We extend this example to explore whether this relationship depends on two time-varying interaction variables $G$: the state’s unemployment rate and the state’s minimum punishment for drunk driving. We also consider time-invariant interaction variables $H$, namely the period-1 values of the proportion of a state’s population that is Mormon or Southern Baptist.

Table 1 reports the ITCs estimated with ITE (column 1) and CITE (column 2). CITE suggests that the effect of alcohol taxes on traffic fatalities depends negatively on the unemployment rate. In contrast, the ITE does not find evidence for the presence of an interaction effect. For the presence of minimum punishment, the ITE estimates a positive interaction effect. CITE also estimates a positive effect, but it is not statistically significant at conventional levels.

The estimates for time-invariant interaction variables are also not in agreement. Using the ITE, we find a statistically significant relationship between the alcohol tax effect and the proportion of Southern Baptists. When repeating the analysis with CITE, we do not find evidence for such a relationship.

In conclusion, the two estimators for interaction effects yield contrasting conclusions for three of the four interaction variables. For two variables, the point estimate changes sign.

	ITE	CITE
unemployment rate	0.003	-0.045***
	(0.015)	(0.018)
minimum punishment	0.260***	0.139
	(0.119)	(0.125)
proportion mormon	0.001	0.111
	(0.008)	(0.100)
proportion southern baptist	-0.041***	0.065
	(0.019)	(0.101)
$nT$	336	336

Table 1. Point estimates and standard errors (in parentheses) for interaction effects in Stock and Watson’s “beer tax” example. First column has results for ITE, second column has results for CITE. Time-varying interaction variables are the unemployment rate and the existence of minimum punishment. Time-invariant interaction variables are the proportion of a state’s population that is Mormon/Southern Baptist. *** $p<0.01$, ** $p<0.05$, * $p<0.1$.

6.2. Couttenier, Petrencu, Rohner, and Thoenig (2019)

Empirical research increasingly uses higher-dimensional panel data and interactions among those dimensions (e.g., Berman, Martin, and Mayer, 2012; Bloom, Draca, and Van Reenen, 2016; Manacorda and Tesei, 2020). To show the applicability of CITE in such a setting and contrast it with the performance of ITE, we revisit the results in Couttenier, Petrencu, Rohner, and Thoenig (2019), CPRT hereafter, who use micro-level panel data from Switzerland that are aggregated to a time ( $t$ ), age cohort ( $a$ ), migrant nationality ( $n$ ), and Swiss regional ( $i$ ) dimension (see Appendix B for details).

CPRT are interested in the relationship

(19)

CP_{n,a,t,i}=\beta_{i}kid_{n,a,t}+Z_{n,a,t}\gamma_{a}+FE_{i,t}+FE_{n,t}+FE_{a}+\varepsilon_{n,a,t,i},

to estimate the effect $\beta_{i}$ of migrants’ past exposure to conflict (such as civil war) as a $kid$ in their origin country on current crime propensity $CP$ in Switzerland. They refer to $\beta_{i}$ as the crime premium. Interaction effects in this setting can therefore be interpreted as the impact of interaction variables on that crime premium.

CPRT are interested in whether policies in host regions are related to the crime premium. In particular, they are interested in the role of the time-invariant interaction variable $openjobacc_{i}$ , which equals 1 for regions where asylum seekers can start working in all sectors of activity three months after arrival. To that end, they use CITE, i.e. they project their region-specific coefficient estimates $\hat{\beta}_{i}$ from (21) onto $openjobacc_{i}$ :

(20)

\hat{\beta}_{i}=a+b\ openjobacc_{i}+\varepsilon_{i}.

We reexamine their CITE results by including additional, time-varying interaction variables and compare them to the ITE results. The additional interaction variables that we include are two demographic variables: a region’s share of middle-aged persons ( $pop\_middle$ ) and a region’s share of urban population ( $urbanpop$ ). Appendix B has more details on the data and specifications.

Table 2 reports our interaction effect estimates. First, we show in Appendix B that the mean effect of conflict exposure on crime propensity according to CITE is 0.46. This is very close to CPRT’s results, but not identical, owing to the inclusion of additional time-varying interaction variables in our specification.

Second, CITE reveals that the crime premium is lower if urbanization increases (column 1, $urbanpop$ ), possibly reflecting better psycho-social support and integration opportunities. Note that the negative interaction coefficient $-0.027$ is not only statistically significant but also economically large: it suggests that at an urbanization rate of 92% (the 75th percentile), the crime premium is essentially 0 (equal to 0.023), while it is estimated to be 0.79 at an urbanization rate of 64% (the 25th percentile) – a magnitude that is close to the higher-end baseline result in CPRT.

Third, CITE suggests that regions with $openjobacc_{i}=1$, i.e. regions with a generous labour market policy for immigrants, have much lower crime premiums.

In contrast, ITE does not find any statistically significant interaction terms. The point estimates for $openjobacc$ and $urbanpop$ are about half those of CITE. [Footnote 10: If the ITE for urban population of -0.011 is taken at face value, it implies a crime premium of 0.31 and 0.62 at the 75th and 25th percentile of $urbanpop$, respectively, and hence a much smaller implied difference for this interaction than the reported CITE results.]
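As a check on these magnitudes, the implied change in the crime premium between the 25th and 75th percentiles of $urbanpop$ is the interaction coefficient from Table 2 multiplied by the 28-point difference in urbanization rates (64% to 92%):

-0.0274\times(92-64)\approx-0.77\quad\text{(CITE)},\qquad-0.0112\times(92-64)\approx-0.31\quad\text{(ITE)}.

These products match the gaps between the reported premiums: 0.79 versus 0.023 for CITE, and 0.62 versus 0.31 for ITE.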

In conclusion, CITE can be used in complex panel data settings that are increasingly common in the applied literature. Furthermore, CITE reveals two interesting interaction effects related to the crime premium that ITE does not find.

	(1)	(2)
	CITE	ITE
$pop\_middle$	0.805	-0.171
	(0.758)	(0.216)
$urbanpop$	-0.0274***	-0.0112
	(0.00919)	(0.0110)
$openjobacc$	-1.146*	-0.490
	(0.622)	(0.447)
$nT$	48272	48272

Table 2. Point estimates and standard errors (in parentheses) for CPRT. First column has results for CITE, second column has results for ITE. Time-varying interaction variables are the share of a region’s population that is middle-aged, $pop\_middle$, and the share of a region’s population that is urban, $urbanpop$. The time-invariant interaction variable $openjobacc_{i}$ is a dummy variable indicating whether asylum seekers can start working in all sectors of activity three months after arrival. *** $p<0.01$, ** $p<0.05$, * $p<0.1$.

References

Alsan and Goldin (2019) Alsan, M., and C. Goldin (2019): “Watersheds in Child Mortality: The Role of Effective Water and Sewerage Infrastructure, 1880–1920,” Journal of Political Economy, 127(2), 586–638.
Amiti and Konings (2007) Amiti, M., and J. Konings (2007): “Trade Liberalization, Intermediate Inputs, and Productivity: Evidence from Indonesia,” American Economic Review, 97(5), 1611–1638.
Arellano and Bonhomme (2012) Arellano, M., and S. Bonhomme (2012): “Identifying Distributional Characteristics in Random Coefficients Panel Data Models,” The Review of Economic Studies, 79(3), 987–1020.
Athey and Imbens (2015) Athey, S., and G. W. Imbens (2015): “Machine Learning Methods for Estimating Heterogeneous Causal Effects,” Stat.
Balli and Sørensen (2013) Balli, H. O., and B. E. Sørensen (2013): “Interaction Effects in Econometrics,” Empirical Economics, 45(1), 583–603.
Berman, Martin, and Mayer (2012) Berman, N., P. Martin, and T. Mayer (2012): “How Do Different Exporters React to Exchange Rate Changes?,” Quarterly Journal of Economics, 127(1), 437–492.
Bloom, Draca, and Van Reenen (2016) Bloom, N., M. Draca, and J. Van Reenen (2016): “Trade Induced Technical Change? The Impact of Chinese Imports on Innovation, IT and Productivity,” Review of Economic Studies, 83(1), 87–117.
Bloom, Sadun, and Van Reenen (2012) Bloom, N., R. Sadun, and J. Van Reenen (2012): “Americans Do IT Better: US Multinationals and the Productivity Miracle,” American Economic Review, 102(1), 167–201.
Bordt, Farbmacher, and Kogel (2020) Bordt, S., H. Farbmacher, and H. Kogel (2020): “Estimating Grouped Patterns of Heterogeneity in Repeated Public Goods Experiments,” .
Burnside and Dollar (2000) Burnside, C., and D. Dollar (2000): “Aid, Policies, and Growth,” American Economic Review, 90(4), 847–868.
Chamberlain (1992) Chamberlain, G. (1992): “Efficiency Bounds for Semiparametric Regression,” Econometrica, 60(3), 567–596.
Chernozhukov, Hansen, Liao, and Zhu (2019) Chernozhukov, V., C. Hansen, Y. Liao, and Y. Zhu (2019): “Inference for Heterogeneous Effects Using Low-Rank Estimation of Factor Slopes,” .
Couttenier, Petrencu, Rohner, and Thoenig (2019) Couttenier, M., V. Petrencu, D. Rohner, and M. Thoenig (2019): “The Violent Legacy of Conflict: Evidence on Asylum Seekers, Crime, and Public Policy in Switzerland,” American Economic Review, 109(12), 4378–4425.
de Chaisemartin and D’Haultfoeuille (2022) de Chaisemartin, C., and X. D’Haultfoeuille (2022): “Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey,” Discussion paper.
Duflo, Dupas, and Kremer (2011) Duflo, E., P. Dupas, and M. Kremer (2011): “Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya,” American Economic Review, 101(5), 1739–1774.
Epifani and Gancia (2009) Epifani, P., and G. Gancia (2009): “Openness, Government Size and the Terms of Trade,” The Review of Economic Studies, 76(2), 629–668.
Giesselmann and Schmidt-Catran (2020) Giesselmann, M., and A. W. Schmidt-Catran (2020): “Interactions in Fixed Effects Regression Models,” Sociological Methods & Research, p. 004912412091493.
Graham and Powell (2012) Graham, B. S., and J. L. Powell (2012): “Identification and Estimation of Average Partial Effects in ‘Irregular’ Correlated Random Coefficient Panel Data Models,” Econometrica, 80(5), 2105–2152.
Herrera, Ordoñez, and Trebesch (2020) Herrera, H., G. Ordoñez, and C. Trebesch (2020): “Political Booms, Financial Crises,” Journal of Political Economy, 128(2), 507–543.
Laage (2020) Laage, L. (2020): “A Correlated Random Coefficient Panel Model with Time-Varying Endogeneity,” .
Lewbel and Pendakur (2017) Lewbel, A., and K. Pendakur (2017): “Unobserved Preference Heterogeneity in Demand Using Generalized Random Coefficients,” Journal of Political Economy, 125(4), 1100–1148.
List and Sturm (2006) List, J. A., and D. M. Sturm (2006): “How Elections Matter: Theory and Evidence from Environmental Policy,” Quarterly Journal of Economics, 121(4), 1249–1281.
MaCurdy (1981) MaCurdy, T. E. (1981): “An Empirical Model of Labor Supply in a Life-Cycle Setting,” Journal of Political Economy, 89(6), 1059–1085.
Manacorda and Tesei (2020) Manacorda, M., and A. Tesei (2020): “Liberation Technology: Mobile Phones and Political Mobilization in Africa,” Econometrica, 88(2), 533–567.
Mundlak (1978) Mundlak, Y. (1978): “On the Pooling of Time Series and Cross Section Data,” Econometrica, 46(1), 69–85.
Sasaki and Ura (2021) Sasaki, Y., and T. Ura (2021): “Slow Movers in Panel Data,” .
Shambaugh (2004) Shambaugh, J. C. (2004): “The Effect of Fixed Exchange Rates on Monetary Policy,” Quarterly Journal of Economics, 119(1), 301–352.
Spilimbergo (2009) Spilimbergo, A. (2009): “Democracy and Foreign Education,” American Economic Review, 99(1), 528–543.
Stock and Watson (2015) Stock, J. H., and M. W. Watson (2015): Introduction to Econometrics, The Pearson Series in Economics. Pearson, Boston, updated third edition, global edition.
Storeygard (2016) Storeygard, A. (2016): “Farther on down the Road: Transport Costs, Trade and Urban Growth in Sub-Saharan Africa,” Review of Economic Studies, 83(3), 1263–1295.
Verdier (2020) Verdier, V. (2020): “Average Treatment Effects for Stayers with Correlated Random Coefficient Models of Panel Data,” Journal of Applied Econometrics, 35(7), 917–939.
Wager and Athey (2018) Wager, S., and S. Athey (2018): “Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests,” Journal of the American Statistical Association, 113(523), 1228–1242.
Wooldridge (2010) Wooldridge, J. M. (2010): Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, Mass., 2nd edn.

Appendix A Proofs

Proof of Theorem 1.

Recall from Definition 1 that CITE is the coefficient on the explanatory variables $M_{i}\Psi_{i}$ in a linear regression with dependent variable $M_{i}Y_{i}$ , i.e.

\widehat{\theta}_{n}=\left(\sum_{i=1}^{n}\Psi_{i}^{{}^{\prime}}M_{i}\Psi_{i}\right)^{-1}\sum_{i=1}^{n}\Psi_{i}^{{}^{\prime}}M_{i}Y_{i}.

The reduced form for $M_{i}Y_{i}$ , see (12), is

M_{i}Y_{i}=M_{i}\Psi_{i}\theta+M_{i}\zeta_{i}.

Therefore,

	$\displaystyle\widehat{\theta}_{n}-\theta$	$\displaystyle=\left(\sum_{i=1}^{n}\Psi_{i}^{{}^{\prime}}M_{i}\Psi_{i}\right)^{-1}\sum_{i=1}^{n}\Psi_{i}^{{}^{\prime}}M_{i}Y_{i}-\theta$
		$\displaystyle=\left(\frac{\sum_{i=1}^{n}\Psi_{i}^{{}^{\prime}}M_{i}\Psi_{i}}{n}\right)^{-1}\frac{\sum_{i=1}^{n}\Psi_{i}^{{}^{\prime}}M_{i}\zeta_{i}}{n}.$

By the boundedness assumptions in the statement of the theorem, the weak law of large numbers (WLLN) for random vectors ensures that both terms converge to their expectations.

We will now show that $E\left(\Psi_{i}^{\prime}M_{i}\zeta_{i}\right)=0,$ which completes the proof. Recall that the elements of $\zeta_{i}$ are

\zeta_{it}=U_{it}+\sum_{k}V_{itk}X_{itk}

and that, because of Assumption 7, $E(\left.\zeta_{it}\right|X_{i},G_{i},Z_{i})=0$ so that, by the law of iterated expectations (LIE), $E\left(\Psi_{i}^{\prime}M_{i}\zeta_{i}\right)=0$ , since $M_{i}$ and $\Psi_{i}$ are transformations of $(X_{i},G_{i},Z_{i})$ only. ∎

Proof of Theorem 2.

Recall that $\widehat{\delta}_{i1}$ is the first element of

\widehat{\delta}_{i}=\left(X_{i}^{{}^{\prime}}X_{i}\right)^{-1}X_{i}^{{}^{\prime}}\left(Y_{i}-\Psi_{i}\widehat{\theta}_{n}\right),

and that

Y_{i}-\Psi_{i}\widehat{\theta}_{n}=\Psi_{i}(\theta-\widehat{\theta}_{n})+X_{i}\delta_{i}+\zeta_{i}

so that the CITE for $\kappa$ in Definition 2 is

	$\displaystyle\widehat{\kappa}_{n}$	$\displaystyle=\left(\sum_{i=1}^{n}H_{i}^{{}^{\prime}}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{{}^{\prime}}\widehat{\delta}_{i1}$
		$\displaystyle=\left(\sum_{i=1}^{n}H_{i}^{{}^{\prime}}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{{}^{\prime}}e_{1}^{\prime}\left(X_{i}^{{}^{\prime}}X_{i}\right)^{-1}X_{i}^{{}^{\prime}}\left(Y_{i}-\Psi_{i}\widehat{\theta}_{n}\right)$
		$\displaystyle=\left(\sum_{i=1}^{n}H_{i}^{{}^{\prime}}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{{}^{\prime}}e_{1}^{\prime}\left(X_{i}^{{}^{\prime}}X_{i}\right)^{-1}X_{i}^{{}^{\prime}}\Psi_{i}(\theta-\widehat{\theta}_{n})$
		$\displaystyle\phantom{=}+\left(\sum_{i=1}^{n}H_{i}^{{}^{\prime}}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{{}^{\prime}}e_{1}^{\prime}\left(X_{i}^{{}^{\prime}}X_{i}\right)^{-1}X_{i}^{{}^{\prime}}X_{i}\delta_{i}$
		$\displaystyle\phantom{=}+\left(\sum_{i=1}^{n}H_{i}^{{}^{\prime}}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{{}^{\prime}}e_{1}^{\prime}\left(X_{i}^{{}^{\prime}}X_{i}\right)^{-1}X_{i}^{{}^{\prime}}\zeta_{i}$
		$\displaystyle\equiv A_{1n}+A_{2n}+A_{3n}$

For the second term,

	$\displaystyle A_{2n}$	$\displaystyle=\left(\sum_{i=1}^{n}H_{i}^{{}^{\prime}}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{{}^{\prime}}\delta_{i1}$
		$\displaystyle\stackrel{{\scriptstyle p}}{{\to}}(E(H_{i}^{\prime}H_{i}))^{-1}E(H_{i}^{\prime}\delta_{i1})$
		$\displaystyle=\widetilde{\kappa}$

where the first line simplifies the expression for $A_{2n}$ from the previous display; the convergence on the second line follows from the WLLN, which applies because of Assumption 1 and conditions (i) and (ii) in the statement of the result; and the final equality is (17).

We will now show that $A_{1n}$ and $A_{3n}$ converge in probability to zero, which completes the proof. First,

	$\displaystyle A_{1n}$	$\displaystyle=\left(\sum_{i=1}^{n}H_{i}^{{}^{\prime}}H_{i}\right)^{-1}\left(\sum_{i=1}^{n}H_{i}^{{}^{\prime}}e_{1}^{\prime}\left(X_{i}^{{}^{\prime}}X_{i}\right)^{-1}X_{i}^{{}^{\prime}}\Psi_{i}\right)(\theta-\widehat{\theta}_{n})$
		$\displaystyle=o_{p}(1),$

because a WLLN applies to the first and second terms under conditions (i) and (iii) in the statement of the result, so that they are $O_{p}(1)$ (note that the inverse of $E(H_{i}^{\prime}H_{i})$ exists because of Assumption 4), and the final term is $o_{p}(1)$ by Theorem 1. Second,

	$\displaystyle A_{3n}$	$\displaystyle=\left(\sum_{i=1}^{n}H_{i}^{{}^{\prime}}H_{i}\right)^{-1}\sum_{i=1}^{n}H_{i}^{{}^{\prime}}e_{1}^{\prime}\left(X_{i}^{{}^{\prime}}X_{i}\right)^{-1}X_{i}^{{}^{\prime}}\zeta_{i}$
		$\displaystyle\stackrel{{\scriptstyle p}}{{\to}}\left(E(H_{i}^{\prime}H_{i})\right)^{-1}E\left(H_{i}^{{}^{\prime}}e_{1}^{\prime}\left(X_{i}^{{}^{\prime}}X_{i}\right)^{-1}X_{i}^{{}^{\prime}}\zeta_{i}\right)$

because WLLNs apply in light of conditions (i) and (iv) in the statement of the result. That the second expectation is zero follows from the last step of the proof of Theorem 1. ∎
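The decomposition $\widehat{\kappa}_{n}=A_{1n}+A_{2n}+A_{3n}$ used in this proof is an exact algebraic identity, which can be checked numerically. The sketch below does so on simulated data in a scalar setting. Everything about the design is an assumption made for illustration: $X_{i}$ is a single column without a constant, $M_{i}$ is taken to be the residual-maker $I-X_{i}(X_{i}^{\prime}X_{i})^{-1}X_{i}^{\prime}$ (the text above only requires that $M_{i}$ annihilates $X_{i}$), and the parameter values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, phi, kappa = 5_000, 5, 1.0, 0.5       # illustrative values, not from the paper

eps = rng.normal(size=n)
H = rng.normal(loc=1.0, size=n)
X = 1.0 + eps[:, None] + rng.normal(size=(n, T))
G = rng.normal(size=(n, T))
zeta = rng.normal(size=(n, T)) + X * 0.5 * rng.normal(size=(n, T))  # U + X*V
delta = kappa * H + eps                      # unit-specific coefficient on X
Psi = X * G
Y = Psi * phi + X * delta[:, None] + zeta

xx = (X * X).sum(axis=1)                     # X_i'X_i for every unit

def annihilate(A):
    """Apply M_i = I - X_i (X_i'X_i)^{-1} X_i' to each row (assumed form of M_i)."""
    return A - X * ((X * A).sum(axis=1) / xx)[:, None]

# First step: regress M_i Y_i on M_i Psi_i, pooled over units.
MPsi, MY = annihilate(Psi), annihilate(Y)
phi_hat = (MPsi * MY).sum() / (MPsi * MPsi).sum()

# Unit-level coefficients, then the second-step projection on H_i.
delta_hat = (X * (Y - Psi * phi_hat)).sum(axis=1) / xx
hh = (H * H).sum()
kappa_hat = (H * delta_hat).sum() / hh

# The three terms of the decomposition in the proof.
A1 = (H * (X * Psi).sum(axis=1) / xx).sum() / hh * (phi - phi_hat)
A2 = (H * delta).sum() / hh
A3 = (H * (X * zeta).sum(axis=1) / xx).sum() / hh

print("kappa_hat:", kappa_hat)
print("A1 + A2 + A3:", A1 + A2 + A3)
```

The two printed numbers agree up to floating-point error. In this assumed design $\epsilon_{i}$ is independent of $H_{i}$, so $\widetilde{\kappa}=\kappa$ and the estimate should also be close to $0.5$ even though $X_{it}$ is correlated with $\epsilon_{i}$.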

Appendix B Data and additional details for section 6

To investigate whether past exposure to conflict in origin countries (such as civil conflict or mass killing) makes migrants more violence-prone in their host country, Couttenier, Petrencu, Rohner, and Thoenig (2019, CPRT hereafter) use data aggregated to the age cohort ( $a$ ) and immigrant nationality ( $n$ ) level and observed for each of the years ( $t$ ) between 2009 and 2016. [Footnote 11: We follow the notation of CPRT in this Appendix to facilitate comparison with their paper. The notation in section 6 is slightly different to facilitate consistency with the setup in our paper.] In their baseline result, they find that cohorts exposed to civil conflict or mass killing in their origin country during childhood are 35 percent more prone to commit a violent crime in Switzerland than the average cohort. They then move on to explore heterogeneity in public policies across 26 Swiss regions, so-called ‘cantons’ ( $c$ ), to investigate how host country institutions modulate the impact of past exposure to conflict on current crime propensity ( $CP$ ). [Footnote 12: $CP$ is measured as the share (scaled in percentage points) of individuals in the cohort who perpetrate at least one violent crime in a given year. The main analysis we rely on limits the sample to the years 2011-2016 and to 25 cantons (by dropping Appenzell-Innerrhoden).] Therefore, in a first step, they run the regression:

(21)

CP_{n,a,t,c}=\beta_{c}kid_{n,a,t}+Z_{n,a,t}\gamma_{a}+FE_{c,t}+FE_{n,t}+FE_{a}+\varepsilon_{n,a,t,c}

for their sub-sample of male asylum seekers, where $kid_{n,a,t}$ is a binary measure of early-age exposure to violence, $Z$ contains binary control variables, and the $FE$ s are fixed effects for canton $\times$ years, nationality $\times$ years, and age. Note that $\beta_{c}$ is a canton-specific parameter capturing the violent ‘crime premium’ of conflict exposure over crime propensity of the average cohort and that its identification relies on within-canton, within-nationality, between-cohort variation.

In a second step, this canton-specific coefficient $\hat{\beta}_{c}$ is projected on a set of canton-specific policy and control variables, which resembles the heterogeneity equation in the context of our CITE framework:

(22)

\hat{\beta}_{c}=\alpha\times Policy_{c}+X^{\prime}_{c}\beta+\varepsilon_{c}.

To take into account the precision with which the canton-specific effects $\hat{\beta}_{c}$ are estimated in the first step, CPRT estimate this second equation by GLS using the inverse of the standard errors estimated in the first stage as weights. [Footnote 13: CPRT refer to Bertrand and Schoar (2003 QJE) and Bandiera, Prat, and Valletti (2009 AER), who have previously taken this approach.] Concerning those policy variables, CPRT mostly focus on ‘openjobacc’, which is a binary variable equal to 1 for cantons where asylum seekers can start working in all sectors of activity three months after arrival.
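The weighted second step described above can be sketched as a weighted least-squares regression of the first-step estimates on a constant and the policy dummy. This is only a sketch under stated assumptions: the function name `second_step_wls` is hypothetical, the inputs are synthetic rather than CPRT’s estimates, and the weights $1/se_{c}$ are assumed to multiply the squared residuals. CPRT’s exact weighting convention may differ.

```python
import numpy as np

def second_step_wls(beta_hat, se, policy):
    """Regress beta_hat on a constant and policy with weights 1/se.

    Assumption: the weights multiply the squared residuals (analytic weights).
    """
    w = 1.0 / np.asarray(se, dtype=float)
    D = np.column_stack([np.ones(len(policy)), policy])
    sw = np.sqrt(w)
    # Weighted least squares is OLS on rows rescaled by sqrt(weight).
    coef, *_ = np.linalg.lstsq(D * sw[:, None], np.asarray(beta_hat) * sw, rcond=None)
    return coef  # [constant, slope on policy]

# Hypothetical inputs for 25 regions; these are NOT CPRT's estimates.
rng = np.random.default_rng(0)
policy = np.array([0, 1] * 12 + [1], dtype=float)
se = rng.uniform(0.2, 1.0, size=25)
beta_hat = 0.9 - 0.6 * policy + se * rng.normal(size=25)

print("constant, slope:", second_step_wls(beta_hat, se, policy))
```

Setting all standard errors equal reduces the procedure to OLS, which corresponds to the unweighted second step used in our CITE framework.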

B.1. Replication of CPRT’s second stage results

We replicate those results in the first column of table 3, which is identical to column 1 of table 9 in CPRT. To facilitate comparison with our CITE framework, column 2 of table 3 reports the result based on OLS estimation, which leads to nearly identical results (although with somewhat higher standard errors). [Footnote 14: Note that the coefficient $\times$ mean(‘openjobacc’) + constant, $-0.624\times 0.64+0.905=0.506$, provides an estimate of the mean of the canton-specific parameters $\hat{\beta}_{c}$. This value of $0.506$ is identical (up to the third decimal) to the estimate from regressing the $\hat{\beta}_{c}$ on a constant, and almost identical to the estimate we obtain for a homogeneous relationship in equation (21): 0.498 (SE: 0.470).]

Table 3. Second stage results in CPRT (2019)

Standard errors in parentheses
	(1)	(2)
VARIABLES	$\hat{\beta}_{c}$	$\hat{\beta}_{c}$
openjobacc	-0.640*	-0.624
	(0.366)	(0.439)
constant	0.796**	0.905**
	(0.303)	(0.352)
Observations	25	25
R-squared	0.246	0.253
Estimation	GLS	OLS
*** p $<$ 0.01, ** p $<$ 0.05, * p $<$ 0.1

B.2. Addition of time-varying $G$ variables

To explore the behavior of CITE vs. ITE in the framework developed in our paper, we add time-varying canton-specific data on the share of middle-aged (20-64 years) and urban population. [Footnote 15: We initially included the share of young (0-19 years) and old (65 years and above) population but since both of them led to nearly identical parameter estimates (around -1 for CITE and around 0.25 for ITE), we decided to opt for a simpler model with the share of middle-aged population (which is the residual of young and old).] Those data come from annual ‘Kantonsporträts’ provided by the Swiss ‘Federal Statistical Office’ and capture the $G$ variables in the framework of our paper. Table 4 provides descriptive statistics of the sample data used in our application.

Table 4. Descriptive statistics

Variable	Obs	Mean	SD	Min	Max
CP	48,272	3.92	17.28	0.00	100.00
x1_kid012	48,272	0.62	0.49	0.00	1.00
h1_openjobacc	48,272	0.62	0.48	0.00	1.00
g4_pop_middle	48,272	61.87	1.20	58.36	63.90
g6_urbanpop	48,272	75.77	19.64	0.00	100.00

B.3. Calculation of mean effect of conflict exposure

The mean ‘crime premium’ of conflict exposure can be calculated from table 2 by multiplying all interaction term coefficients by the sample means of the respective interaction variables and adding those terms up. For example, for the CITE estimate, this gives $-1.146\times 0.64-46.5$ (from column 2) $+0.805\times 61.867-0.0274\times 75.774$ (from column 1) $=0.462$. [Footnote 16: Note that for CITE, the constant of the second-step regression needs to be taken into account, while for ITE the coefficient for x1_kid needs to be taken into account.] This is very similar to the ‘crime premium’ result of 0.498 one obtains from CPRT when not including any interaction terms (see footnote 14).

