Estimating interaction effects with panel data
Abstract.
A common task in empirical economics is to estimate interaction effects that measure how the effect of one variable on another variable depends on a third variable. This paper considers the estimation of interaction effects in linear panel models with a fixed number of time periods. There are at least two ways to estimate interaction effects in this setting, both common in applied work. Our theoretical results show that these two approaches are distinct, and only coincide under strong conditions on unobserved effect heterogeneity. Our empirical results show that the difference between the two approaches is large, leading to conflicting conclusions about the sign of the interaction effect. Taken together, our findings may guide the choice between the two approaches in empirical work.
Key words and phrases:
panel data, interaction effects, correlated random coefficients
1. Introduction
In empirical work in economics, we often want to estimate how the effect of one variable on another variable depends on a third variable. The standard approach to answering such a question with cross-sectional data is to estimate interaction effects based on a linear model that includes the product of the explanatory variable and the third variable. [Footnote 1: Alternative approaches for modeling effect heterogeneity include those based on random coefficients (cf. Lewbel and Pendakur, 2017), and those based on tools from machine learning (see Athey and Imbens, 2015; Bordt, Farbmacher, and Kogel, 2020; Chernozhukov, Hansen, Liao, and Zhu, 2019; Wager and Athey, 2018).] In this case, the coefficient on that product is called the interaction term coefficient, and it measures how the effect of the explanatory variable on the outcome varies with the third variable.
With panel data, there are at least two ways to estimate interaction effects in linear models. The most common approach, which we will call the interaction term estimator (ITE), is based on a regression of the outcome on the explanatory variable and its product with the interaction variable (and additive fixed effects). [Footnote 2: See, for example, Burnside and Dollar (2000); Shambaugh (2004); List and Sturm (2006); Amiti and Konings (2007); Spilimbergo (2009); Epifani and Gancia (2009); Duflo, Dupas, and Kremer (2011); Bloom, Sadun, and Van Reenen (2012); Berman, Martin, and Mayer (2012); Bloom, Draca, and Van Reenen (2016); Storeygard (2016); Alsan and Goldin (2019); Herrera, Ordoñez, and Trebesch (2020); Manacorda and Tesei (2020).] The OLS estimate of the coefficient on the product term is the ITE for the interaction term coefficient. A second approach, which we will call the correlated interaction term estimator (CITE), consists of two steps. First, regress the outcome on the explanatory variable separately for each panel unit, obtaining a unit-specific coefficient estimate. In a second step, project these unit-specific estimates onto the interaction variable in a cross-sectional regression. The result of the second step is the CITE for the interaction term coefficient. CITE has been used in applied work but appears to be less popular than ITE. [Footnote 3: Existing papers that use this approach include Couttenier, Petrencu, Rohner, and Thoenig (2019) and MaCurdy (1981).]
One goal of this paper is to show that these two approaches, ITE and CITE, are distinct, and will only recover the same object under strong conditions on unobserved effect heterogeneity. We derive conditions under which CITE is consistent for the interaction effect. These conditions are not sufficient to guarantee consistency of ITE. In two empirical applications, we show that ITE and CITE lead to conflicting conclusions about the sign of interaction effects. The difference in interaction effect estimates can be large. Since CITE requires weaker assumptions on unobserved effect heterogeneity for consistency, we recommend it for typical applications in economics where there may be substantial, correlated unobserved effect heterogeneity. [Footnote 4: In a sense that will become clear later, this preference for CITE over ITE corresponds to the preference for fixed effects over (correlated) random effects approaches for linear panel models with additive unobserved heterogeneity.] In existing work, the choice between ITE and CITE is typically not made explicit and is not motivated. We believe that we are the first to provide a rigorous analysis of these two estimators.
Fundamental to our results is that, in applications where interaction effects matter, there is likely additional unobserved effect heterogeneity in the effect of the explanatory variable on the outcome. To fix ideas, consider the following outcome equation, which is a special case of our general framework: [Footnote 5: Our general framework in Section 2 additionally has vector-valued variables, additional controls, time-varying interaction variables, and additional sources of unobserved effect heterogeneity.]
(1) |
In (1), captures unobserved effect heterogeneity in addition to the observed effect heterogeneity due to the interaction term .
That the consistency of ITE relies on restrictive exogeneity conditions is evident from equation (1): the linear model underlying a regression of the outcome on the explanatory variable and the interaction term has a composite error term that contains the unobserved effect heterogeneity. This error term is correlated with the regressors, unless the unobserved effect heterogeneity is unrelated to the explanatory variable and the interaction variable.
CITE, on the other hand, allows the unobserved effect heterogeneity to be arbitrarily correlated with the explanatory variable, by treating the unit-specific coefficients as parameters. We show that it is consistent for the interaction effect under conditions that do not guarantee the consistency of ITE. Importantly, this approach does not require a large number of time periods. Our consistency result is obtained with a fixed number of time periods.
If the interaction variable is endogenous, neither ITE nor CITE recovers a causal effect of the interaction variable on the effect of the explanatory variable on the outcome. The best one can hope for is to recover the correlation between the interaction variable and the unit-specific effect. CITE recovers this correlation, but ITE does not.
To demonstrate the consistency of CITE, we build on existing work from the literature on correlated random coefficient models for linear panel models (see Chamberlain, 1992; Arellano and Bonhomme, 2012; Graham and Powell, 2012; Laage, 2020; Sasaki and Ura, 2021). The object of interest in this literature is (some feature of) the distribution of the random coefficients. Techniques from this literature can also be used in the context of difference-in-differences estimation (cf. Verdier, 2020; de Chaisemartin and D’Haultfoeuille, 2022). Our work differs from the papers in this literature because we are interested in the estimation of interaction effects, while the distribution of the random coefficients is a nuisance parameter.
Previous work on interaction effects has made mention of using CITE. [Footnote 6: Balli and Sørensen (2013) observe that unobserved effect heterogeneity can lead to inconsistency in the ITE, and suggest that “if the time-series dimension of the data is large, one may directly allow for country-varying slopes”. We show that such slopes should generally be preferred regardless of the time-series dimension. Giesselmann and Schmidt-Catran (2020) discuss CITE, but prefer their double-demeaned estimator, citing computational concerns, a concern that interaction effects involving time-invariant variables cannot be estimated, and observing that the minimum number of time periods is higher for CITE than for their ITE.] However, we are not aware of existing work that demonstrates that ITE and CITE are distinct, and that analyzes their asymptotic properties.
Organization
Section 2 introduces our framework for the estimation of interaction effects with panel data. Section 3 formally defines the two estimators. Sections 4 and 5 present our theoretical results. Section 6 contains the results for two empirical applications. Appendix A contains proofs of the main results. Appendix B has additional data and details for the empirical applications.
2. Setup
We are interested in the estimation of interaction effects in a static linear panel model. Typically, the outcome equation for this purpose is specified as
(2) |
which allows the effect of the explanatory variable on the dependent variable to depend on observable, time-invariant interaction variables and observable, time-varying interaction variables . The parameters of interest are , the interaction term coefficient (ITC) on , and , the ITC on .
Once we admit that the effect of on may depend on the observable and , we may wonder whether there are additional, unobserved, sources of effect heterogeneity. Our framework explicitly introduces unobserved heterogeneity in the effect of on via the following three equations for cross-section unit at time :
(3) | ||||
(4) | ||||
(5) |
The outcome equation (3) describes how responds to a change in , allowing for control variables . The effect of on , denoted , can vary across and .
The coefficient equation (4) describes the heterogeneity in the effect of the explanatory variable on the outcome. It decomposes the heterogeneous effect into:
(i) a part that depends on the observable time-varying interaction variables;
(ii) a time-invariant, unit-specific component; and
(iii) an idiosyncratic effect heterogeneity.
The heterogeneity equation (5) relates the unit-specific effect heterogeneity in the effect of the first regressor, , to , so that captures the unobserved part of . We do not model the time-invariant coefficients on the other regressors, see Remark 1.
In our framework, the effect of on may vary even when holding constant and , due to the presence of unobserved effect heterogeneity and . If we shut down unobserved effect heterogeneity, i.e.
and if all observables are scalar, then we obtain the standard outcome equation (2). Thus, our framework is a natural generalization of the standard way of thinking about interaction effects in linear models that takes seriously the role of unobserved effect heterogeneity.
In a model with scalar , and with unobserved effect heterogeneity, the reduced form of (3)-(5) is
(6) |
It is clear from (6) that, under our setup, the error term in (2) contains the unobserved effect heterogeneity terms. If those terms are correlated with the observables, a regression of on and is inconsistent due to endogeneity.
Remark 1.
Our specification allows for additive fixed effects in the outcome equation by setting . Then is the conventional additive fixed effect.
Remark 2.
The time index need not refer to time, but can refer to students within a given classroom, counties within a given state, employees within a given firm, etc., with the classroom, state, or firm playing the role of the cross-section unit.
3. Two estimators
There are at least two estimators for the ITCs introduced in equations (3)-(5). We call them the interaction term estimator (ITE) and the correlated interaction term estimator (CITE). The ITE, defined formally in Section 3.2, is a regression of the outcome on the regressors, augmented with interactions of the explanatory variable with the time-invariant and time-varying interaction variables. The CITE, defined formally in Section 3.1, is a two-step estimator. The first step is a regression of the outcome on the regressors, augmented with interactions of the explanatory variable with the time-varying interaction variables and with dummy variables for each cross-section unit. The second step is a regression of the estimated coefficients on the dummy variable interaction terms on the time-invariant interaction variables.
Both approaches can accommodate additive fixed effects in the outcome equation by setting equal to a constant.
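The first step of CITE can be implemented either as one pooled regression with unit dummies and dummy interactions, or as separate regressions for each unit. The following Python sketch checks that the two routes give the same unit-specific slopes. It assumes a scalar explanatory variable, additive fixed effects, and no time-varying interaction variables; the data and all variable names are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_periods = 4, 6

# Hypothetical panel: unit i has slope i + 1 plus noise.
x = rng.normal(size=(n_units, n_periods))
true_slopes = np.arange(1, n_units + 1, dtype=float)
y = true_slopes[:, None] * x + rng.normal(size=(n_units, n_periods))

# (a) Pooled regression on unit dummies and unit-dummy interactions with x.
# Row i * n_periods + t of `dummies` has a 1 in column i.
dummies = np.kron(np.eye(n_units), np.ones((n_periods, 1)))
design = np.hstack([dummies, dummies * x.reshape(-1, 1)])
coef, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
pooled_slopes = coef[n_units:]  # coefficients on the dummy interactions

# (b) Separate regressions of y on a constant and x, one per unit.
unit_slopes = np.array(
    [np.polyfit(x[i], y[i], deg=1)[0] for i in range(n_units)]
)

if __name__ == "__main__":
    print("pooled  :", pooled_slopes)
    print("per unit:", unit_slopes)
```

The equivalence holds because the pooled design matrix is block diagonal across units; it would no longer hold exactly once regressors with coefficients common to all units are added.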
As a particular example of our framework and the two estimators, consider the special case with scalar and , no and , and . Then our setup simplifies to
(7) | ||||
(8) |
with reduced form
(9) |
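Then ITE is the regression suggested by (9), of the outcome on the explanatory variable and its interaction with the time-invariant interaction variable. In contrast, CITE is based on (7) and (8): first, regress the outcome on the explanatory variable separately for each unit to obtain unit-specific coefficient estimates. Then regress these estimates on the time-invariant interaction variable to obtain the CITE.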
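To make the two procedures concrete, the following Python sketch computes both estimators on simulated data. The data-generating process is our own illustrative assumption and is not taken from the empirical applications: the unit-specific slope is linear in a scalar interaction variable `h` plus unobserved heterogeneity `eta`, `eta` is independent of `h`, but the spread of the regressor is larger for units where `h * eta` is positive. The function names `simulate`, `ite`, and `cite` are hypothetical.

```python
import numpy as np

def simulate(n=20000, t=8, delta=1.0, seed=0):
    """Simulate a short panel with correlated slope heterogeneity (assumed DGP)."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=n)        # time-invariant interaction variable
    eta = rng.normal(size=n)      # unobserved slope heterogeneity, independent of h
    # Assumption: the regressor is more dispersed when h * eta > 0.
    scale = np.where(h * eta > 0, 2.0, 0.5)
    x = scale[:, None] * rng.normal(size=(n, t))
    alpha = rng.normal(size=n)    # additive fixed effect
    slope = 0.5 + delta * h + eta
    y = alpha[:, None] + slope[:, None] * x + rng.normal(size=(n, t))
    return y, x, h

def within(a):
    """Subtract unit-specific time averages (rows are units)."""
    return a - a.mean(axis=1, keepdims=True)

def ite(y, x, h):
    """Fixed effects regression of y on x and x*h; returns the coefficient on x*h."""
    regressors = np.column_stack(
        [within(x).ravel(), within(x * h[:, None]).ravel()]
    )
    coef, *_ = np.linalg.lstsq(regressors, within(y).ravel(), rcond=None)
    return coef[1]

def cite(y, x, h):
    """Two steps: unit-by-unit slopes, then a cross-sectional regression on h."""
    xd, yd = within(x), within(y)
    slopes = (xd * yd).sum(axis=1) / (xd ** 2).sum(axis=1)
    design = np.column_stack([np.ones_like(h), h])
    coef, *_ = np.linalg.lstsq(design, slopes, rcond=None)
    return coef[1]

if __name__ == "__main__":
    y, x, h = simulate()
    print("ITE :", ite(y, x, h))
    print("CITE:", cite(y, x, h))
```

In this design the CITE estimate should settle near the true value of 1, while the ITE estimate should be noticeably larger, because the slope heterogeneity is related to the spread of the regressor. Other designs will give different gaps, including none at all when the heterogeneity is unrelated to the regressor.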
In the remainder of this section, we formally define the two estimators. From now on, we will assume access to a random sample consistent with the framework in Section 2.
Assumption 1 (Random sampling).
3.1. Correlated interaction term estimator (CITE)
Substituting (4) into (3) obtains the reduced form
(10) |
where we have introduced the following objects (dimensions in square brackets):
For a given , collecting this relationship across obtains
(11) |
where , , , and are the counterparts of the objects in (10), with rows.
For the CITE to be well-defined, we assume sufficient variation in .
Assumption 2.
There exists a such that
Assumption 2 avoids the identification issues in Graham and Powell (2012) and Arellano and Bonhomme (2012). If it does not hold for all , our analysis should be thought of as conditioning on a subpopulation for which it does hold, cf. Arellano and Bonhomme (2012). Assumption 2 allows us to define the residual maker matrix
and write
(12) |
using that . We also require sufficient variation in after projecting out .
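The role of the residual maker matrix can be checked numerically. The sketch below assumes the standard construction, the identity minus the projection onto the columns of a full-column-rank control matrix (called `w` here, a hypothetical name), and verifies that it annihilates the controls, is idempotent, and reduces to demeaning when the only control is a constant.

```python
import numpy as np

def residual_maker(w):
    """Return I - W (W'W)^{-1} W' for a (T x k) matrix W with full column rank."""
    w = np.asarray(w, dtype=float)
    n_rows = w.shape[0]
    return np.eye(n_rows) - w @ np.linalg.solve(w.T @ w, w.T)

rng = np.random.default_rng(0)

# Hypothetical controls for one unit: a constant and one time-varying control.
w = np.column_stack([np.ones(6), rng.normal(size=6)])
m = residual_maker(w)

# With a constant as the only control, the transformation is demeaning.
m_const = residual_maker(np.ones((6, 1)))
v = rng.normal(size=6)

if __name__ == "__main__":
    print("max |M W|          :", np.abs(m @ w).max())
    print("max |M M - M|      :", np.abs(m @ m - m).max())
    print("demeaning residual :", np.abs(m_const @ v - (v - v.mean())).max())
```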
Assumption 3.
The matrix is invertible.
Definition 1.
CITE for is given by
CITE for can be extracted from . For , we need variation in .
Assumption 4.
The matrix is invertible.
This is a standard no multicollinearity assumption on the regressors . For each , let be the first element of
Definition 2.
Then the CITE for is
Remark 3.
In the special case that and , CITE is the linear fixed effects estimator based on
In contrast, ITE is a correlated random effects estimator, see Remark 4.
3.2. Interaction term estimator (ITE)
Define to be with the first column removed, and define analogously. Then rewrite (11) as
(13) |
The ITE requires sufficient variation in .
Assumption 5.
There exists a such that
This assumption is similar to Assumption 2, but is slightly weaker because it excludes the first column of . Under Assumption 5, we can define
and premultiply (13) by to obtain
(14) |
Assumption 6.
The matrix is invertible.
Assumption 6 guarantees sufficient variation in after projecting out . Given Assumptions 5 and 6, the ITE is well-defined.
Definition 3.
The ITE for is
ITE estimates for , and can be extracted from .
Remark 4.
Remark 5.
In the special case that and , we have a linear fixed effects model with interaction terms. With a scalar and no ,
In this case, the transformation is the within transformation that produces
and the ITE is obtained from linear regression of on .
4. Main results
We now show that a set of exogeneity conditions that is sufficient for the consistency of CITE (Section 4.2) is not sufficient for the consistency of ITE (Section 4.3).
These results are derived without exogeneity restrictions involving . This allows the heterogeneity equation to be misspecified. We discuss this modeling choice and its consequences in Section 4.1. In Section 5, we provide results under correct specification.
4.1. Misspecification
Throughout this section, we will not make exogeneity assumptions that involve the error term of the heterogeneity equation. [Footnote 7: See Section 5 for an analysis under correct specification.] This leaves room for misspecification of the heterogeneity equation, in the sense that we may have
(15) |
Such misspecification arises if relevant variables are omitted from the heterogeneity equation. To see this, let there be a vector such that and . If the researcher only uses (or has access to) a partial list and uses the alternative specification
then generally . Misspecification also arises if is measured with error, if there is functional form misspecification, etc.
Our results below show that CITE is robust against such misspecification, but that ITE is not.
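Under misspecification of the heterogeneity equation, the causal interaction term coefficient is too ambitious a target even if the unit-specific coefficients were known. Consider instead the infeasible regression of the unit-specific coefficients on the time-invariant interaction variables, which tends to the population projection coefficient,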
(16) | ||||
(17) |
This parameter is of interest in many applications. It answers the question: “How does the effect of the explanatory variable on the outcome vary with the interaction variable?”. It does not answer the causal question: “How does the effect of the explanatory variable on the outcome change given an exogenous change in the interaction variable?”. By restricting attention to the projection coefficient instead of the causal parameter, we can allow for misspecification while still obtaining an interesting parameter. This is what CITE delivers. ITE does not deliver the projection coefficient.
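The difference between the projection coefficient and the causal coefficient can be illustrated with a short simulation of the omitted-variable case. The sketch below is an assumption-laden illustration, not part of the formal results: the unit-specific effect `gamma` is treated as observed (the infeasible regression), it depends on two interaction variables `f1` and `f2` with hypothetical coefficients 1 and 2, and the researcher only uses `f1`.

```python
import numpy as np

def projection_slope(target, regressor):
    """Slope from a cross-sectional regression of target on a constant and regressor."""
    design = np.column_stack([np.ones_like(regressor), regressor])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1]

rng = np.random.default_rng(1)
n = 200_000

f1 = rng.normal(size=n)                 # interaction variable used by the researcher
f2 = 0.5 * f1 + rng.normal(size=n)      # omitted interaction variable, correlated with f1
gamma = 1.0 * f1 + 2.0 * f2 + rng.normal(size=n)   # unit-specific effect

slope = projection_slope(gamma, f1)

# Omitted-variable formula: causal coefficient plus the omitted coefficient
# times Cov(f1, f2) / Var(f1), which equals 0.5 in this design.
implied = 1.0 + 2.0 * 0.5

if __name__ == "__main__":
    print("projection slope:", slope)
    print("implied value   :", implied)
```

The regression recovers the projection coefficient (2 in this design) rather than the causal coefficient on `f1` (1 in this design), which is the sense in which the projection coefficient describes how the effect varies with the interaction variable without a causal interpretation.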
4.2. Consistency of CITE
Our analysis proceeds under the following exogeneity assumption, where collects all the coefficient equation error terms .
This assumption is similar to the strict exogeneity assumption that is standard in the literature on correlated random coefficient panel models. [Footnote 8: See Chamberlain (1992); Arellano and Bonhomme (2012); Graham and Powell (2012). A notable exception is Laage (2020).] It is restrictive, but not much stronger than what is necessary for accommodating additive fixed effects in a linear model with a fixed number of time periods. Our version of strict exogeneity requires – along with – to be orthogonal to the error terms in the outcome and coefficient equations. This is not necessary for estimation of and , which would only require orthogonality with respect to , not .
We first state the result for .
Proof.
See Appendix A.∎
The second result is for . In what follows, is the column vector of length , with the first element equal to 1, and all other elements equal to zero.
Theorem 2 (Consistency for ).
Proof.
See Appendix A.∎
4.3. Inconsistency of the ITE
Without assumptions on , the ITE is inconsistent. To show this, we consider the special case with , , and without . We consider a setting with omitted variables: the ITE in this section uses the first element of but not the second. Other forms of endogeneity lead to similar conclusions. For example, letting in what follows is functional form misspecification. The case of measurement error in is similar.
Partition and rewrite the model equations for the special case under consideration:
Assume strict exogeneity for the full set of covariates
and additionally assume that the heterogeneity equation is correctly specified if both and are included, i.e.
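These assumptions are stronger than necessary for consistency of CITE (using only the first interaction variable) for the projection coefficient of the unit-specific effect on the first interaction variable [Footnote 9: This follows from Theorem 2.], which for this special case equals:
However, they are not sufficient for consistency, for this projection coefficient, of the ITE that uses only the first interaction variable,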
To see that it is not consistent under the maintained assumptions, assume that the relevant laws of large numbers apply, so that
This shows that CITE with converges to , whereas ITE does not. From the expression of the probability limit of , it is clear that the inconsistency can be made arbitrarily large.
5. Results under correct specification
The results in the previous section were derived under misspecification. We now add the assumption of correct specification to the model.
Assumption 8.
The heterogeneity equation is correctly specified,
(18) |
This assumption is likely too strong for most empirical applications in economics, see Section 4.1. Together, Assumptions 7 and 8 require that all variables are exogenous in the outcome and coefficient equations, and that the time-invariant interaction variables are exogenous in the heterogeneity equation. Assumption 8 does not require the error term of the heterogeneity equation to be orthogonal to the other regressors.
If the heterogeneity equation is correctly specified, then CITE is consistent for the causal parameter (Theorem 3) but ITE is not (Section 5.2). We also discuss stronger conditions that restore consistency of ITE (Section 5.3).
5.1. Consistency of CITE
Theorem 1 applies without modification: CITE is consistent for . Under correct specification, we additionally have that CITE is consistent for , instead of for the projection coefficient .
Theorem 3 (Consistency of CITE for the causal parameter).
5.2. Inconsistency of ITE
Assumption 8, in conjunction with Assumption 7, is likely too strong for most applications in economics. But it is not sufficient for consistency of ITE for . Consider the special case that are scalar, and that there are no . Then the reduced form simplifies to:
where
and . The ITE simplifies to
so that, if an appropriate law of large numbers holds,
The equality follows from Assumptions 7 and 8, which eliminate the first two components of the composite error terms. However, the remaining component, , is not orthogonal to .
5.3. A sufficient condition for consistency of ITE
The strongest exogeneity assumption we have imposed so far is
It does not restrict the distribution of conditional on the time-varying observables. It is easy to show that the ITE is consistent if we impose the following stronger correlated random effects assumption:
This requires to be orthogonal to in addition to .
6. Empirical applications
We use two empirical applications to show that ITE and CITE can lead to meaningfully different conclusions about interaction effects.
6.1. Stock and Watson, 2015, Chapter 10
In their textbook example, Stock and Watson study the relationship between a U.S. state’s traffic fatality rate and an alcohol tax using data from 48 U.S. states over a period of 7 years. We extend this example to explore whether this relationship depends on time-varying interaction variables (the state’s unemployment rate and the state’s minimum punishment for drunk driving). We also consider time-invariant interaction variables, namely the period-1 values of the proportion of a state’s population that is Mormon or Southern Baptist.
Table 1 reports the ITCs estimated with ITE (column 1) and CITE (column 2). CITE suggests that the effect of alcohol taxes on traffic fatalities depends negatively on the unemployment rate. In contrast, the ITE does not find evidence for the presence of an interaction effect. For the presence of minimum punishment, the ITE estimates a positive interaction effect. CITE also estimates a positive effect, but it is not statistically significant at conventional levels.
The estimates for time-invariant interaction variables are also not in agreement. Using the ITE, we find a statistically significant relationship between the alcohol tax effect and the proportion of Southern Baptists. When repeating the analysis with CITE, we do not find evidence for such a relationship.
In conclusion, the two estimators for interaction effects yield contrasting conclusions for three of the four interaction variables. For two variables, the point estimate changes sign.
Table 1
| | ITE | CITE |
| unemployment rate | 0.003 | -0.045*** |
| | (0.015) | (0.018) |
| minimum punishment | 0.260*** | 0.139 |
| | (0.119) | (0.125) |
| proportion mormon | 0.001 | 0.111 |
| | (0.008) | (0.100) |
| proportion southern baptist | -0.041*** | 0.065 |
| | (0.019) | (0.101) |
| Observations | 336 | 336 |
6.2. Couttenier, Petrencu, Rohner, and Thoenig (2019)
Empirical research increasingly uses higher-dimensional panel data and interactions among those dimensions (e.g., Berman, Martin, and Mayer, 2012; Bloom, Draca, and Van Reenen, 2016; Manacorda and Tesei, 2020). To show the applicability of CITE in such a setting and contrast it with the performance of ITE, we revisit the results in Couttenier, Petrencu, Rohner, and Thoenig (2019), CPRT hereafter, who use micro-level panel data from Switzerland that are aggregated to a time, age cohort, migrant nationality, and Swiss regional dimension (see Appendix B for details).
CPRT are interested in the relationship
(19) |
to estimate the effect of migrants’ past exposure to conflict (such as civil war) during childhood in their origin country on current crime propensity in Switzerland. They refer to this effect as the crime premium. Interaction effects in this setting can therefore be interpreted as the impact of interaction variables on that crime premium.
CPRT are interested in whether policies in host regions are related to the crime premium. In particular, they are interested in the role of a time-invariant interaction variable, an indicator that equals 1 for regions where asylum seekers can start working in all sectors of activity three months after arrival. To that end, they use CITE, i.e. they project their region-specific coefficient estimates from (21) onto this indicator:
(20) |
We reexamine their CITE results by including additional, time-varying interaction variables and compare them to the ITE results. The additional interaction variables that we include are two demographic variables: a region’s share of middle-aged persons and a region’s share of urban population. Appendix B has more details on the data and specifications.
Table 2 reports our interaction effect estimates. First, we show in Appendix B that the mean effect of conflict exposure on crime propensity according to CITE is 0.46. This is very close to CPRT’s results, but not identical, owing to the inclusion of additional time-varying interaction variables in our specification.
Second, CITE reveals that the crime premium is lower if urbanization increases (column 1), possibly reflecting better psycho-social support and integration opportunities. Note that the negative interaction coefficient is not only statistically significant but also economically large: it suggests that at an urbanization rate of 92% (the 75th percentile), the crime premium is essentially 0 (equal to 0.023), while it is estimated to be 0.79 at an urbanization rate of 64% (the 25th percentile), a magnitude that is close to the higher-end baseline result in CPRT.
Third, CITE suggests that regions with a generous labour market policy for immigrants, i.e. regions where the policy indicator equals 1, have much lower crime premiums.
In contrast, ITE does not find any statistically significant interaction terms. The point estimates for the share of urban population and the labour market policy indicator are about half those of CITE. [Footnote 10: If the ITE for urban population of -0.011 is taken at face value, it implies a crime premium of 0.62 and 0.31 at the 25th and 75th percentile of the urbanization rate, respectively, and hence a much lower implied difference for this interaction than the reported CITE results.]
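The magnitudes quoted above follow from simple arithmetic, which the following Python sketch reproduces. It uses only the reported coefficients on the urban population share and the reported 25th and 75th percentiles, and it assumes that the urbanization rate enters the regression in percentage points; the constant and function names are hypothetical.

```python
# Reported interaction coefficients on the share of urban population.
CITE_COEF = -0.0274
ITE_COEF = -0.0112

# Reported urbanization rates at the 25th and 75th percentile (percentage points).
URBAN_P25 = 64.0
URBAN_P75 = 92.0

def implied_change(coef, low=URBAN_P25, high=URBAN_P75):
    """Change in the crime premium when urbanization moves from `low` to `high`."""
    return coef * (high - low)

cite_change = implied_change(CITE_COEF)
ite_change = implied_change(ITE_COEF)

if __name__ == "__main__":
    print("CITE implied change:", round(cite_change, 3))
    print("ITE implied change :", round(ite_change, 3))
```

The CITE change matches the gap between the reported premiums of 0.79 and 0.023, and the ITE change matches the gap between 0.62 and 0.31, up to rounding of the reported figures.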
In conclusion, CITE can be used in complex panel data settings that are increasingly common in the applied literature. Furthermore, CITE reveals two interesting interaction effects related to the crime premium that ITE does not find.
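Table 2
| | (1) | (2) |
| | CITE | ITE |
| share of middle-aged persons | 0.805 | -0.171 |
| | (0.758) | (0.216) |
| share of urban population | -0.0274*** | -0.0112 |
| | (0.00919) | (0.0110) |
| labour market policy indicator | -1.146* | -0.490 |
| | (0.622) | (0.447) |
| Observations | 48272 | 48272 |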
References
- Alsan and Goldin (2019) Alsan, M., and C. Goldin (2019): “Watersheds in Child Mortality: The Role of Effective Water and Sewerage Infrastructure, 1880–1920,” Journal of Political Economy, 127(2), 586–638.
- Amiti and Konings (2007) Amiti, M., and J. Konings (2007): “Trade Liberalization, Intermediate Inputs, and Productivity: Evidence from Indonesia,” American Economic Review, 97(5), 1611–1638.
- Arellano and Bonhomme (2012) Arellano, M., and S. Bonhomme (2012): “Identifying Distributional Characteristics in Random Coefficients Panel Data Models,” The Review of Economic Studies, 79(3), 987–1020.
- Athey and Imbens (2015) Athey, S., and G. W. Imbens (2015): “Machine Learning Methods for Estimating Heterogeneous Causal Effects,” Stat.
- Balli and Sørensen (2013) Balli, H. O., and B. E. Sørensen (2013): “Interaction Effects in Econometrics,” Empirical Economics, 45(1), 583–603.
- Berman, Martin, and Mayer (2012) Berman, N., P. Martin, and T. Mayer (2012): “How Do Different Exporters React to Exchange Rate Changes?,” Quarterly Journal of Economics, 127(1), 437–492.
- Bloom, Draca, and Van Reenen (2016) Bloom, N., M. Draca, and J. Van Reenen (2016): “Trade Induced Technical Change? The Impact of Chinese Imports on Innovation, IT and Productivity,” Review of Economic Studies, 83(1), 87–117.
- Bloom, Sadun, and Van Reenen (2012) Bloom, N., R. Sadun, and J. Van Reenen (2012): “Americans Do IT Better: US Multinationals and the Productivity Miracle,” American Economic Review, 102(1), 167–201.
- Bordt, Farbmacher, and Kogel (2020) Bordt, S., H. Farbmacher, and H. Kogel (2020): “Estimating Grouped Patterns of Heterogeneity in Repeated Public Goods Experiments,” .
- Burnside and Dollar (2000) Burnside, C., and D. Dollar (2000): “Aid, Policies, and Growth,” American Economic Review, 90(4), 23.
- Chamberlain (1992) Chamberlain, G. (1992): “Efficiency Bounds for Semiparametric Regression,” Econometrica, 60(3), 567–596.
- Chernozhukov, Hansen, Liao, and Zhu (2019) Chernozhukov, V., C. Hansen, Y. Liao, and Y. Zhu (2019): “Inference for Heterogeneous Effects Using Low-Rank Estimation of Factor Slopes,” .
- Couttenier, Petrencu, Rohner, and Thoenig (2019) Couttenier, M., V. Petrencu, D. Rohner, and M. Thoenig (2019): “The Violent Legacy of Conflict: Evidence on Asylum Seekers, Crime, and Public Policy in Switzerland,” American Economic Review, 109(12), 4378–4425.
- de Chaisemartin and D’Haultfoeuille (2022) de Chaisemartin, C., and X. D’Haultfoeuille (2022): “Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey,” Discussion paper.
- Duflo, Dupas, and Kremer (2011) Duflo, E., P. Dupas, and M. Kremer (2011): “Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya,” American Economic Review, 101(5), 1739–1774.
- Epifani and Gancia (2009) Epifani, P., and G. Gancia (2009): “Openness, Government Size and the Terms of Trade,” The Review of Economic Studies, 76(2), 629–668.
- Giesselmann and Schmidt-Catran (2020) Giesselmann, M., and A. W. Schmidt-Catran (2020): “Interactions in Fixed Effects Regression Models,” Sociological Methods & Research, p. 004912412091493.
- Graham and Powell (2012) Graham, B. S., and J. L. Powell (2012): “Identification and Estimation of Average Partial Effects in ‘Irregular’ Correlated Random Coefficient Panel Data Models,” Econometrica, 80(5), 2105–2152.
- Herrera, Ordoñez, and Trebesch (2020) Herrera, H., G. Ordoñez, and C. Trebesch (2020): “Political Booms, Financial Crises,” Journal of Political Economy, 128(2), 507–543.
- Laage (2020) Laage, L. (2020): “A Correlated Random Coefficient Panel Model with Time-Varying Endogeneity,” .
- Lewbel and Pendakur (2017) Lewbel, A., and K. Pendakur (2017): “Unobserved Preference Heterogeneity in Demand Using Generalized Random Coefficients,” Journal of Political Economy, 125(4), 1100–1148.
- List and Sturm (2006) List, J. A., and D. M. Sturm (2006): “How Elections Matter: Theory and Evidence from Environmental Policy,” Quarterly Journal of Economics, 121(4), 1249–1281.
- MaCurdy (1981) MaCurdy, T. E. (1981): “An Empirical Model of Labor Supply in a Life-Cycle Setting,” Journal of Political Economy, 89(6), 1059–1085.
- Manacorda and Tesei (2020) Manacorda, M., and A. Tesei (2020): “Liberation Technology: Mobile Phones and Political Mobilization in Africa,” Econometrica, 88(2), 533–567.
- Mundlak (1978) Mundlak, Y. (1978): “On the Pooling of Time Series and Cross Section Data,” Econometrica, 46(1), 69–85.
- Sasaki and Ura (2021) Sasaki, Y., and T. Ura (2021): “Slow Movers in Panel Data,” .
- Shambaugh (2004) Shambaugh, J. C. (2004): “The Effect of Fixed Exchange Rates on Monetary Policy,” Quarterly Journal of Economics, 119(1), 301–352.
- Spilimbergo (2009) Spilimbergo, A. (2009): “Democracy and Foreign Education,” American Economic Review, 99(1), 528–543.
- Stock and Watson (2015) Stock, J. H., and M. W. Watson (2015): Introduction to Econometrics, The Pearson Series in Economics. Pearson, Boston, updated third edition, global edition.
- Storeygard (2016) Storeygard, A. (2016): “Farther on down the Road: Transport Costs, Trade and Urban Growth in Sub-Saharan Africa,” Review of Economic Studies, 83(3), 1263–1295.
- Verdier (2020) Verdier, V. (2020): “Average Treatment Effects for Stayers with Correlated Random Coefficient Models of Panel Data,” Journal of Applied Econometrics, 35(7), 917–939.
- Wager and Athey (2018) Wager, S., and S. Athey (2018): “Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests,” Journal of the American Statistical Association, 113(523), 1228–1242.
- Wooldridge (2010) Wooldridge, J. M. (2010): Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, Mass., 2nd edn.
Appendix A Proofs
Proof of Theorem 1.
Recall from Definition 1 that CITE is the coefficient on the explanatory variables in a linear regression with dependent variable , i.e.
The reduced form for , see (12), is
Therefore,
By the boundedness assumptions in the statement of the theorem, the weak law of large numbers (WLLN) for random vectors ensures that both terms converge to their expectations.
We will now show that which completes the proof. Recall that the elements of are
and that, because of Assumption 7, so that, by the law of iterated expectations (LIE), , since and are transformations of only. ∎
Proof of Theorem 2.
Recall that is the first element of
and that
so that the CITE for in definition 2 is
For the second term,
where the first line simplifies the expression for from the previous display; the convergence on the second line follows from the WLLN, which applies because of Assumption 1 and conditions (i) and (ii) in the statement of the result; and the final equality is (17).
We will now show that and converge in probability to zero, which completes the proof. First,
because a WLLN applies to the first and second term because of conditions (i) and (iii) in the statement of the result, so that they are (note that the inverse exists for because of Assumption 4), and the final term is from Theorem 1. Second,
because WLLNs apply in light of conditions (i) and (iv) in the statement of the result. That the second expectation is zero follows from the last step of the proof of Theorem 1. ∎
Appendix B Data and additional details for section 6
To investigate whether past exposure to conflict in origin countries (such as civil conflict or mass killing) makes migrants more violence-prone in their host country, Couttenier, Petrencu, Rohner, and Thoenig (2019, CPRT hereafter) use data aggregated to the age cohort () and immigrant nationality () level and observed for each of the years () between 2009 and 2016. [Footnote 11: We follow the notation of CPRT in this Appendix to facilitate comparison with their paper. The notation in section 6 is slightly different to facilitate consistency with the setup in our paper.] In their baseline result, they find that cohorts exposed to civil conflict or mass killing in their origin country during childhood are 35 percent more prone to commit a violent crime in Switzerland than the average cohort. They then move on to explore heterogeneity in public policies across 26 Swiss regions, so-called ‘cantons’ (), to investigate how host country institutions modulate the impact of past exposure to conflict on current crime propensity (). [Footnote 12: Crime propensity is measured as the share (scaled in percentage points) of individuals in the cohort who perpetrate at least one violent crime in a given year. The main analysis we rely on limits the sample to the years 2011-2016 and to 25 cantons (by dropping Appenzell-Innerrhoden).] Therefore, in a first step, they run the regression:
(21)
for their sub-sample of male asylum seekers, where is a binary measure of early-age exposure to violence, contains binary control variables, and the s are fixed effects for canton×years, nationality×years, and age. Note that is a canton-specific parameter capturing the violent ‘crime premium’ of conflict exposure over crime propensity of the average cohort and that its identification relies on within-canton, within-nationality, between-cohort variation.
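A regression of the kind described here, with a canton-specific slope on the exposure dummy, a vector of controls, and the three sets of fixed effects, could be written as in the sketch below. Every symbol in it is an illustrative assumption introduced for this sketch (canton c, nationality n, age cohort a, year t, CP for crime propensity, kid for the exposure dummy, X for the controls, FE for the fixed effects); it is not a reproduction of CPRT's notation or of equation (21).

```latex
% Illustrative sketch only: all symbol names and index sets are assumptions,
% not CPRT's notation and not a reconstruction of equation (21).
\begin{equation*}
  CP_{c,n,a,t} = \beta_{c}\,\mathit{kid}_{n,a} + \alpha' X_{n,a,t}
    + FE_{c,t} + FE_{n,t} + FE_{a} + \varepsilon_{c,n,a,t}
\end{equation*}
```

In this form, the canton-specific slope is the ‘crime premium’ that the second step takes as its dependent variable.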
In a second step, this canton-specific coefficient is projected on a set of canton-specific policy and control variables, which resembles the heterogeneity equation in the context of our CITE framework:
(22)
To take into account the precision with which the canton-specific effects are estimated in the first step, CPRT estimate this second equation by GLS using the inverse of the standard errors estimated in the first stage as weights. [Footnote 13: CPRT refer to Bertrand and Schoar (2003, QJE) and Bandiera, Prat, and Valletti (2009, AER), who have previously taken this approach.] Among those policy variables, CPRT focus mostly on ‘openjobacc’, a binary variable equal to 1 for cantons where asylum seekers can start working in all sectors of activity three months after arrival.
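The two-step logic can be sketched on simulated data as follows. This is a simplified illustration, not CPRT's code: it omits controls and fixed effects, runs the first step as a separate regression per unit, and assumes that the weights 1/SE enter the second step as weights on squared residuals. All names (`ols`, `two_step`, the simulated parameters) are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and conventional (homoskedastic) standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, np.sqrt(np.diag(sigma2 * xtx_inv))


def two_step(y, x, h, unit):
    """Step 1: slope of y on x within each unit.
    Step 2: weighted projection of the slopes on a constant and h.
    h holds one value per unit, ordered like np.unique(unit)."""
    slopes, ses = [], []
    for u in np.unique(unit):
        m = unit == u
        X = np.column_stack([np.ones(m.sum()), x[m]])
        b, se = ols(X, y[m])
        slopes.append(b[1])
        ses.append(se[1])
    slopes, ses = np.array(slopes), np.array(ses)

    H = np.column_stack([np.ones(len(slopes)), h])
    w = 1.0 / ses                 # assumed form of the precision weights
    sw = np.sqrt(w)               # scaling rows by sqrt(w) minimises sum of w * resid^2
    gamma = np.linalg.lstsq(H * sw[:, None], slopes * sw, rcond=None)[0]
    return slopes, gamma


# Simulated example: 25 units whose slope depends on a binary unit-level variable.
rng = np.random.default_rng(0)
n_units, n_obs = 25, 400
h = (np.arange(n_units) % 2).astype(float)
beta_true = 0.9 - 0.6 * h
unit = np.repeat(np.arange(n_units), n_obs)
x = rng.integers(0, 2, size=unit.size).astype(float)
y = 1.0 + beta_true[unit] * x + rng.normal(size=unit.size)

slopes, gamma = two_step(y, x, h, unit)
print("second-step constant and slope:", gamma)
```

Setting `w` to a vector of ones gives the unweighted OLS version of the second step.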
B.1. Replication of CPRT’s second stage results
We replicate those results in the first column of table 3, which is identical to column 1 of table 9 in CPRT. To facilitate comparison with our CITE framework, column 2 of table 3 reports the result based on OLS estimation, which leads to nearly identical results (although with somewhat higher standard errors). [Footnote 14: Note that the coefficient × mean(‘openjobacc’) + constant, , provides an estimate of the mean of the canton-specific parameters . This estimate is identical (up to the third decimal) to the one obtained by regressing the estimated canton-specific parameters on a constant, and almost identical to the estimate we obtain for a homogeneous relationship in equation (21): 0.498 (SE: 0.470).]
VARIABLES | (1) | (2) |
---|---|---|
openjobacc | -0.640* | -0.624 |
 | (0.366) | (0.439) |
constant | 0.796** | 0.905** |
 | (0.303) | (0.352) |
Observations | 25 | 25 |
R-squared | 0.246 | 0.253 |
Estimation | GLS | OLS |

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1
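The identity mentioned in footnote 14 is mechanical for unweighted OLS with an intercept: the fitted values average to the sample mean of the dependent variable, so slope × mean(regressor) + constant coincides with the estimate from a regression on a constant alone. A minimal numerical check on simulated data follows. All numbers are hypothetical and are not CPRT's estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical unit-level data: a binary policy dummy and noisy first-step estimates.
h = rng.integers(0, 2, size=25).astype(float)
beta_hat = 0.9 - 0.6 * h + rng.normal(scale=0.5, size=25)

# OLS of the first-step estimates on a constant and the dummy.
H = np.column_stack([np.ones_like(h), h])
const, slope = np.linalg.lstsq(H, beta_hat, rcond=None)[0]

# Mean implied by the second-step coefficients.
implied_mean = slope * h.mean() + const

# OLS on a constant alone returns the sample mean of the dependent variable.
mean_from_constant_only = beta_hat.mean()

print(implied_mean, mean_from_constant_only)
```

With precision weights the same identity holds for the weighted means rather than the simple means, so the two numbers need not agree exactly in the GLS column.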
B.2. Addition of time-varying variables
To explore the behavior of CITE vs. ITE in the framework developed in our paper, we add time-varying canton-specific data on the share of middle-aged (20-64 years) and urban population. [Footnote 15: We initially included the share of young (0-19 years) and old (65 years and above) population, but since both led to nearly identical parameter estimates (around -1 for CITE and around 0.25 for ITE), we opted for a simpler model with the share of middle-aged population (which is the residual of young and old).] Those data come from the annual ‘Kantonsporträts’ provided by the Swiss ‘Federal Statistical Office’ and capture the variables in the framework of our paper. Table 4 provides descriptive statistics of the sample data used in our application.
Variable | Obs | Mean | SD | Min | Max |
---|---|---|---|---|---|
CP | 48,272 | 3.92 | 17.28 | 0.00 | 100.00 |
x1_kid012 | 48,272 | 0.62 | 0.49 | 0.00 | 1.00 |
h1_openjobacc | 48,272 | 0.62 | 0.48 | 0.00 | 1.00 |
g4_pop_middle | 48,272 | 61.87 | 1.20 | 58.36 | 63.90 |
g6_urbanpop | 48,272 | 75.77 | 19.64 | 0.00 | 100.00 |
B.3. Calculation of mean effect of conflict exposure
The mean ‘crime premium’ of conflict exposure can be calculated from table 2 by multiplying each interaction term coefficient by the sample mean of the respective interaction variable and summing those terms. For example, for the CITE estimate, this gives (from column 2) (from column 1) . [Footnote 16: Note that for CITE, the constant of the second-step regression needs to be taken into account, while for ITE the coefficient for x1_kid needs to be taken into account.] This is very similar to the ‘crime premium’ result of 0.498 one obtains from CPRT when not including any interaction terms (see footnote 14).
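The calculation described above is a base term plus a sum of coefficient-times-mean products, which can be sketched as below. The function name `mean_effect` is hypothetical, and the coefficients in the usage example are placeholders chosen for illustration, not the estimates in table 2. Only the sample means are taken from table 4.

```python
def mean_effect(base, interaction_coefs, sample_means):
    """Return base + sum over k of coef_k * mean_k.

    base is the second-step constant for CITE, or the coefficient on the
    exposure variable for ITE (see footnote 16).
    """
    missing = set(interaction_coefs) - set(sample_means)
    if missing:
        raise KeyError(f"no sample mean for: {sorted(missing)}")
    return base + sum(c * sample_means[k] for k, c in interaction_coefs.items())


# Sample means as reported in table 4.
means = {"h1_openjobacc": 0.62, "g4_pop_middle": 61.87, "g6_urbanpop": 75.77}

# PLACEHOLDER coefficients for illustration only; these are not the table 2 estimates.
placeholder_coefs = {"h1_openjobacc": -0.5, "g4_pop_middle": 0.01, "g6_urbanpop": 0.002}

print(mean_effect(0.1, placeholder_coefs, means))
```

Keeping the base term as a separate argument makes the difference between the CITE and ITE calculations explicit.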