Identification of Peer Effects using Panel Data

Marisa Miraldo, Carol Propper, Christiern Rose

Miraldo and Propper: Imperial College Business School; Rose: University of Queensland. Corresponding author: Christiern Rose, School of Economics, University of Queensland, 4072, Australia, [email protected]. This project is part of the Health Foundation's Efficiency Research programme. The Health Foundation is an independent charity working to improve the quality of healthcare in the UK. We are thankful for their financial support. Propper acknowledges financial support from the ERC. Hospital Episode Statistics datasets were provided by the Health and Social Care Information Centre (HSCIC, now National Health Service Digital). The Hospital Episode Statistics are copyright ©2000/01-2013/14, re-used with the permission of The Health & Social Care Information Centre. All rights reserved.

1.   Introduction

This paper provides new identification results for panel data models of peer effects in which outcomes depend on peers' heterogeneity that is unobservable to the researcher. Our framework also allows for correlated effects, modelled as unobserved group heterogeneity that may be correlated with individual heterogeneity in an unrestricted manner. We extend existing identification results so that they apply to a general network structure (e.g., a social network) and allow for correlated effects, and we apply our results to study innovation take-up among physicians.

Identification depends on a conditional mean restriction requiring exogenous mobility of individuals between groups over time, as first formalised in Abowd et al. (1999). That is, though our model of correlated effects allows, for example, high-outcome individuals to be systematically located in high-outcome groups, their mobility between groups ought not to be determined by transitory outcome shocks. Not all patterns of mobility suffice for identification, and we provide identifying and non-identifying examples. We also extend our identification results to allow for endogenous peer effects, through which outcomes depend directly on peers' outcomes. With endogenous effects, identification can also be attained using the conditional mean restriction; however, for certain network structures, additional conditional variance restrictions are necessary. We conduct a Monte Carlo experiment with many individuals and few time periods and demonstrate that the NLS estimator first proposed by Arcidiacono et al. (2012) works well in practice. Increasing the number of time periods, the rate of mobility and the richness of the network (e.g., social network data) improves the performance of the estimator.

Our empirical work considers innovation take-up in cancer treatment. The innovation we consider is keyhole surgery for colorectal cancer, and our data are from the English National Health Service. Colorectal cancer is the third most common cancer worldwide (Arnold et al., 2017). In England, it accounts for 10% of cancer deaths and is the most expensive cancer to treat (Laudicella et al., 2016). Keyhole surgery for colorectal cancer is an important innovation. It has been shown to lead to better patient outcomes than the alternative open procedure, particularly in the short term. Moreover, in the National Health Service, keyhole surgery is less costly, primarily due to shorter post-surgery hospital stays by patients (Lacy et al., 2002; COSTSG, 2004; Laudicella et al., 2016). Despite this, its take-up was slow, increasing from 1% of eligible surgeries when it was first introduced in 2000 to 49% by 2014.

We use matched patient-surgeon-hospital-year data from 2000 to 2014 to estimate peer effects in take-up of keyhole surgery, measured by the fraction of colorectal cancer surgeries performed by keyhole. We find positive and statistically significant peer effects. Our results suggest that a standard deviation increase in the average latent take-up of other surgeons in the same hospital leads to a 5 percentage point increase in take-up. We decompose this effect by additionally estimating the effect of peer experience, which we find accounts for some, but not all, of the peer effect.

1.1.   Related Literature

Research has largely focussed on settings in which peer effects operate through observable characteristics. That is, in addition to correlated effects, outcomes depend on peers’ observables (Manski, 1993; Moffitt et al., 2001; Lee, 2007; Bramoullé et al., 2009; Calvó-Armengol et al., 2009; Davezies et al., 2009; De Giorgi et al., 2010; Goldsmith-Pinkham and Imbens, 2013; Blume et al., 2015; De Paula, 2017; Cohen-Cole et al., 2018; Bramoullé et al., 2019). These papers consider a sample comprising a cross-section of groups, for which identification requires within-group variation in peers. Our panel data allow us to relax these requirements. First, the researcher need not observe individual characteristics, and if observed, need not know which ones are exogenous nor which ones are appropriate to include. Second, variation induced by mobility of individuals between groups over time means that there is no need for within-group variation in peers. This implies that identification is possible under the linear-in-means network structure, in which peer effects operate through the group average, and which precludes identification when only a cross-section of groups is available (Manski, 1993; Bramoullé et al., 2009).

Another strand of the literature considers peer effects operating through individual unobservables using a cross-section of groups (Graham, 2008; Rose, 2017). Relative to these papers, we allow for unobserved group level heterogeneity to be arbitrarily correlated with individual heterogeneity, and, though our results can be applied to any network, we show that panel data can be used to identify peer effects for the linear-in-means network, which are not identifiable with only a cross-section of networks when there are correlated effects (Rose, 2017).

The most closely related work considers panel data models of peer effects. Key contributions are Mas and Moretti (2009), Arcidiacono et al. (2012) and Cornelissen et al. (2017), which, like us, study models of peer effects operating through the unobserved heterogeneity of peers. We build on their work by providing identification conditions which are straightforward to verify in practice, and which can be applied to a general network structure (e.g., social networks) in the presence of correlated effects. We also extend our results to the canonical model of endogenous peer effects, in which outcomes are simultaneously determined. [Footnote 1: Arcidiacono et al. (2012) consider a variant of endogenous effects through which outcomes depend on the expected (as opposed to realized) outcomes of others, and study identification in an example with 2 individuals. This model does not allow simultaneity in outcomes.]

Beyond the peer effects literature our work can be viewed as extending the worker-firm fixed effects framework for wage decomposition used in labor economics (e.g., Abowd et al. (1999)) to allow for within-firm interactions of workers. That is, in addition to worker and firm heterogeneity, wages may depend on the composition of other workers in the firm as well as their wages. Such spillovers would be expected to operate in firms in which workers work in teams (Mas and Moretti, 2009; Cornelissen et al., 2017).

Our empirical work contributes to the literature on innovation take-up, particularly in the healthcare context. Related papers are Agha and Molitor (2018) and Barrenho et al. (2020). Agha and Molitor (2018) study how the take-up of new cancer drugs depends on the presence of local opinion leaders. To do this, they compare diffusion patterns across regions, separating correlated regional demand for new technology from information spillovers. They find that take-up is fastest in the region in which the lead author on the clinical trial practices. However, their work does not directly study peer effects in innovation. Barrenho et al. (2020) study the take-up of keyhole surgery, and estimate peer effects through surgeon observables. Our empirical contribution is to additionally allow peer effects to operate through surgeons' latent propensity to innovate, which we show matters above and beyond the effect of peer experience.

We proceed as follows. In Section 2, we present our baseline model. In Section 3 we provide identification results, apply them to two examples and discuss estimation. In Sections 4 and 5 we conduct a Monte Carlo experiment and apply our method to surgeons’ take-up of keyhole surgery for colorectal cancer. In Section 6 we conclude. Proofs and extensions are in the Appendix.

1.2.   Notation

If $M$ is a strictly positive integer, we denote $[M]=\{1,2,\ldots,M\}$. If $\mathbf{A}$ and $\mathbf{B}$ are $M\times P$ and $M\times Q$ matrices, we denote the $M\times(P+Q)$ matrix obtained by concatenating $\mathbf{A}$ and $\mathbf{B}$ by $[\mathbf{A},\mathbf{B}]$. If element $(i,j)$ of $\mathbf{A}$ is $\mathbf{A}_{ij}$, we write $\mathbf{A}=(\mathbf{A}_{ij})_{i\in[M],j\in[P]}$, and if $\mathbf{a}$ is a vector, $\mathbf{a}_{i}$ denotes entry $i$. We use $\mathbf{I}_{M}$ for the $M$-dimensional identity matrix, $\iota_{M}$ for the $M\times 1$ vector of ones and $\mathbf{0}$ to denote a matrix of zeros. If its dimensions are ambiguous we write $\mathbf{0}_{M,P}$ to denote an $M\times P$ matrix of zeros. We use $\mathbf{1}(\cdot)$ to denote the indicator function.

2.   Model

We consider a pattern of mobility of $N$ workers between $M$ groups over $T$ periods. Our interest lies in the typical setting in which $T$ is small and $N$ can be large. A mobility pattern is characterised by the $N\times M$ matrices of group membership indicators $\mathbf{C}_{1},\mathbf{C}_{2},\ldots,\mathbf{C}_{T}$ and the $N\times N$ interaction matrices $\mathbf{G}_{1},\mathbf{G}_{2},\ldots,\mathbf{G}_{T}$, the elements of which encode the peer effect exerted by one individual on another. The groups are such that in each period, every individual is in exactly one group and every group has at least one individual in at least one period. We use $g(i,t)\in[M]$ to denote the group of individual $i\in[N]$ in year $t\in[T]$ and $N_{g(i,t)}$ to denote its size.

The $N\times 1$ vector of continuous outcomes in period $t\in[T]$ is $\mathbf{y}_{t}$, which is determined by

$$\mathbf{y}_{t}=\left(\mathbf{I}_{N}+\rho\mathbf{G}_{t}\right)\alpha+\mathbf{C}_{t}\gamma+\epsilon_{t} \tag{2.1}$$

where $\alpha$ is an $N\times 1$ vector of time-invariant individual unobserved heterogeneity, $\gamma$ an $M\times 1$ vector of unobserved group heterogeneity, and $\epsilon_{t}$ an $N\times 1$ disturbance. In Section 3.1, we consider an extension in which $\gamma$ is permitted to vary by period. The parameter of interest is $\rho$, which captures the peer effect.

Unless otherwise stated, $\mathbf{G}_{t}$ is unrestricted and can be interpreted as the adjacency matrix of a weighted, directed network linking $N$ individuals. In particular, denoting entry $(i,j)$ of $\mathbf{G}_{t}$ by $\mathbf{G}_{ijt}$, it need not be the case that $\mathbf{G}_{ijt}=0$ when $g(i,t)\neq g(j,t)$, nor when $i=j$, nor when $\mathbf{G}_{jit}=0$. From this point onwards, we refer to $\mathbf{G}_{t}$ as the network. A typical example of $\mathbf{G}_{t}$ is the linear-in-means network,

$$\overline{\mathbf{G}}_{t}=\left(\mathbf{1}(g(i,t)=g(j,t))\,N_{g(i,t)}^{-1}\right)_{(i,j)\in[N]^{2}}. \tag{2.2}$$

This implies that the peer effect operates through the group average of $\alpha_{i}$, and is a natural choice when only group membership indicators are available. If more detailed data on the network structure are available (e.g., social network data), this information can be incorporated into $\mathbf{G}_{t}$. Clearly, $\mathbf{G}_{t}$ can evolve over time as individuals move between groups.
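A minimal sketch of (2.2) in Python with NumPy, assuming group labels are coded as integers (a hypothetical encoding; the function name `linear_in_means` is ours, not the authors'). Each row divides the same-group indicator by the size of the row individual's group, so every row sums to one.

```python
import numpy as np


def linear_in_means(groups):
    """Linear-in-means network (2.2) for a single period.

    groups[i] is an integer label for g(i, t) (hypothetical coding).
    Entry (i, j) is 1 / N_{g(i,t)} if i and j share a group, and 0 otherwise.
    """
    g = np.asarray(groups)
    same = (g[:, None] == g[None, :]).astype(float)  # 1(g(i,t) = g(j,t))
    sizes = same.sum(axis=1, keepdims=True)           # N_{g(i,t)} for each row i
    return same / sizes


# Three individuals: the first two share a group, the third is alone.
G_t = linear_in_means([0, 0, 1])
```

Note that the diagonal is nonzero: under (2.2) each individual is included in her own group average. Excluding her gives the linear-in-others'-means variant used in the application.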

In our empirical application, $\mathbf{y}_{t}$ is surgeons' take-up of keyhole surgery for colorectal cancer, measured by the fraction of eligible surgeries performed by keyhole in year $t$; $\mathbf{C}_{t}$ comprises indicators for the hospital in which a surgeon practices; and we take $\mathbf{G}_{t}$ to be linear-in-means, linear-in-others'-means (in which the focal surgeon is excluded from the group average, see (3.14)) and a persistent version of linear-in-others'-means, in which links persist when a surgeon moves to a new hospital and are weighted by the number of years worked at the same hospital. The latter captures cumulative peer exposure, which may be better suited to innovation take-up. The vector $\alpha$ captures surgeons' latent propensity to take up keyhole surgery. It can account for heterogeneity in education/training, ability (e.g., dexterity) and taste for innovation. The vector $\gamma$ captures hospital level heterogeneity including resources (e.g., equipment) and patient composition.

Stacking (2.1) by period yields $\mathbf{y}=\left(\mathbf{J}+\rho\mathbf{G}\right)\alpha+\mathbf{C}\gamma+\epsilon$, where $\mathbf{y}$ and $\epsilon$ are $NT\times 1$, $\mathbf{C}=(\mathbf{C}_{1}^{\prime},\mathbf{C}_{2}^{\prime},\ldots,\mathbf{C}_{T}^{\prime})^{\prime}$, $\mathbf{J}=(\mathbf{I}_{N},\mathbf{I}_{N},\ldots,\mathbf{I}_{N})^{\prime}$ and $\mathbf{G}=(\mathbf{G}_{1}^{\prime},\mathbf{G}_{2}^{\prime},\ldots,\mathbf{G}_{T}^{\prime})^{\prime}$. Since $\sum_{i=1}^{N}\mathbf{J}_{ki}=\sum_{f=1}^{M}\mathbf{C}_{kf}=1$ for all $k\in[NT]$, the columns of $\mathbf{J}$ and $\mathbf{C}$ are collinear, and we use the normalization $\gamma_{M}=0$ to obtain

$$\mathbf{y}=\left(\mathbf{J}+\rho\mathbf{G}\right)\alpha+\mathbf{D}\gamma+\epsilon \tag{2.3}$$

where $\mathbf{D}$ comprises the first $M-1$ columns of $\mathbf{C}$ and from this point forwards $\gamma=(\gamma_{1},\gamma_{2},\ldots,\gamma_{M-1})^{\prime}$. This is without consequence for identification of $\rho$.
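The stacking that produces (2.3) can be sketched as follows for the linear-in-means network. This is an illustration under stated assumptions, not the authors' code: groups are coded $0,\ldots,M-1$ (hypothetical coding) with label $M-1$ playing the role of the omitted group $M$, and the helper names `linear_in_means` and `stack_design` are ours.

```python
import numpy as np


def linear_in_means(groups):
    """Linear-in-means network (2.2) for one period; groups[i] codes g(i, t)."""
    g = np.asarray(groups)
    same = (g[:, None] == g[None, :]).astype(float)
    return same / same.sum(axis=1, keepdims=True)


def stack_design(groups_by_period, M):
    """Stack per-period matrices by period, as in (2.3).

    groups_by_period[t][i] is g(i, t), coded 0, ..., M-1 (hypothetical coding).
    Returns J (NT x N), G (NT x N) and D (NT x (M-1)); D drops the column of
    the last group, matching the normalization gamma_M = 0.
    """
    groups = np.asarray(groups_by_period)
    T, N = groups.shape
    J = np.vstack([np.eye(N)] * T)
    G = np.vstack([linear_in_means(groups[t]) for t in range(T)])
    C = np.vstack([np.eye(M)[groups[t]] for t in range(T)])  # one-hot group rows
    D = C[:, : M - 1]
    return J, G, D


# Pattern of Example 1: period 1 groups (1, 2), period 2 groups (1, 1), coded from 0.
J, G, D = stack_design([[0, 1], [0, 0]], M=2)
```

With this pattern, `np.hstack([J, G, D])` reproduces the matrix $[\mathbf{J},\mathbf{G},\mathbf{D}]$ displayed in (3.6).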

Our identification results depend on variation in the network (both over individuals and over time) and mobility of individuals between groups over time. It is well known that mobility serves to separate the individual and correlated effects (e.g., Abowd et al. (1999)). As we show below, mobility also serves to separate individual and correlated effects from the peer effect. This is because, in the typical setting in which there are no between-group links [Footnote 2: i.e., if $g(i,t)\neq g(j,t)$ then $\mathbf{G}_{ijt}=0$, though our results do not require this.], if an individual moves from one group to another she ceases to interact with others in her previous group and begins to interact with others in her new group.

3.   Identification

Our identification results treat the joint distribution of $\mathbf{y},\mathbf{G},\mathbf{D}$ as observable. This is consistent with the researcher accessing a sample from this distribution. For small $T$, our approach is identical in spirit to that used throughout the peer effects literature (i.e., with $T=1$), in which the researcher is assumed to observe a cross-section of groups (see Bramoullé et al. (2019) for a review).

A more challenging but sometimes more realistic alternative is that the researcher observes a sample of individuals from a single group, so that, depending on the network structure, all individuals are potentially linked to one another, at least indirectly. It is more challenging because additional structure is required to deal with the dependence between individuals, though this is primarily a concern for inference rather than identification. Goldsmith-Pinkham and Imbens (2013) describe how asymptotic analysis could be implemented based on a random variable measuring ‘distance’ between individuals. Their argument is as follows. If distant individuals have low probability of link formation (e.g., due to homophily), and distance has large support, it may be possible to construct blocks of individuals such that each pair of blocks could be treated as close to independent. This is in the same spirit as certain types of asymptotic analysis for time-series, and allows the researcher to view a sample of individuals from a single group similarly to a sample of many groups.

Goldsmith-Pinkham and Imbens (2013) use the above arguments to justify conducting identification analysis as if the joint distribution of $\mathbf{y},\mathbf{G},\mathbf{X}$ were observable, where $\mathbf{X}$ is a matrix of exogenous characteristics through which peer effects may operate. [Footnote 3: Their model does not include correlated effects, hence the absence of $\mathbf{D}$.] Their arguments can be directly applied in our context to justify analysis of a sample from the joint distribution of $\mathbf{y},\mathbf{G},\mathbf{D}$ if $T$ is small. The only caveat is that $\mathbf{y},\mathbf{G},\mathbf{D}$ must include all observed time periods for the individuals and groups, so that temporal dependence is not an issue in (hypothetically) constructing blocks of observations. In our application, such blocks could be thought of as corresponding to regions of England because mobility is primarily between hospitals in the same region (Goldacre et al., 2013; Barrenho et al., 2020). In other applications such as wage decomposition in labor economics, the relevant partition could be by industry and/or by region.

We study identification of ρ\rho based on the conditional mean restriction

$$\mathbb{E}[\mathbf{y}|\mathbf{G},\mathbf{D}]=\left(\mathbf{J}+\rho\mathbf{G}\right)\mathbb{E}[\alpha|\mathbf{G},\mathbf{D}]+\mathbf{D}\,\mathbb{E}[\gamma|\mathbf{G},\mathbf{D}], \tag{3.1}$$

which implies exogeneity of the network, and of the mobility of individuals between groups over time, with respect to the outcome shock $\epsilon$. The former is typical in the peer effects literature and the latter is typical in the wage decomposition literature. We restrict neither $\mathbb{E}[\alpha|\mathbf{G},\mathbf{D}]$ nor $\mathbb{E}[\gamma|\mathbf{G},\mathbf{D}]$, which allows for a limited form of network endogeneity with respect to unobserved individual and group heterogeneity.

We say that $\rho$ is identified when it can be uniquely recovered from the right-hand side of (3.1). Our results are thus asymptotic in nature (see Manski (1995)), and hence characterize whether peer effects can be disentangled from individual and group heterogeneity if there is no limit to the number of mobility patterns observed. To simplify the exposition, following Bramoullé et al. (2009) and Abowd et al. (1999), the remainder of the paper presents the case in which $\mathbf{G}$ and $\mathbf{D}$ are treated as fixed. To allow for the random case, we simply replace unconditional expectations with expectations conditional on $\mathbf{G},\mathbf{D}$, and the identification results below hold if there exists a realization in the support of $\mathbf{G},\mathbf{D}$ which satisfies the relevant condition. In the same way, it is straightforward to allow $N$, $M$ and $T$ to vary by mobility pattern. Identical arguments are used throughout the peer effects literature (see, e.g., Bramoullé et al. (2009)). Returning to the fixed case, (3.1) becomes

$$\mathbb{E}[\mathbf{y}]=\left(\mathbf{J}+\rho\mathbf{G}\right)\mu^{\alpha}+\mathbf{D}\mu^{\gamma}, \tag{3.2}$$

where $\mu^{\alpha}=\mathbb{E}[\alpha]$ and $\mu^{\gamma}=\mathbb{E}[\gamma]$. Since (3.2) is non-linear in the parameter $\rho$ and the (unknown) $\mu^{\alpha}$, establishing identification is non-trivial, depending both on the properties of the $NT\times(2N+M-1)$ matrix $[\mathbf{J},\mathbf{G},\mathbf{D}]$ and on the value of $\mu^{\alpha}$. For example, it is clear that $\rho$ is not identified when $\mu^{\alpha}=\mathbf{0}$.

Mobility of individuals between groups over time is necessary for identification. In the absence of mobility, $[\mathbf{J},\mathbf{G},\mathbf{D}]$ has rank at most $N$, so (3.2) yields $N$ equations in $N+M$ unknowns. For the same reason, we also require $T\geq 2$. This is because the peer effect operates through time-invariant individual unobserved heterogeneity, rather than through individual observables. However, mobility alone is not sufficient for identification, as made clear in Example 2 below.

We now present our first identification result, which makes use of the within-group annihilator for the correlated effects, given by $\mathbf{W}=\mathbf{I}_{NT}-\mathbf{D}(\mathbf{D}^{\prime}\mathbf{D})^{-1}\mathbf{D}^{\prime}$, and a decomposition of vectors $\mathbf{v}=(\mathbf{v}_{1}^{\prime},\mathbf{v}_{2}^{\prime})^{\prime}$ which lie in the null-space of $[\mathbf{WJ},\mathbf{WG}]$ such that $\mathbf{v}_{1}$ and $\mathbf{v}_{2}$ are both $N\times 1$.

Proposition 1.

$\rho$ is identified if there does not exist a vector $\mathbf{v}=(\mathbf{v}_{1}^{\prime},\mathbf{v}_{2}^{\prime})^{\prime}$ in the null-space of $[\mathbf{WJ},\mathbf{WG}]$ and scalars $\delta_{1}$ and $\delta_{2}\neq 0$ verifying $\delta_{1}\mathbf{v}_{1}+\delta_{2}\mathbf{v}_{2}=\mu^{\alpha}$. Otherwise $\rho$ is not identified.
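The objects appearing in Proposition 1 can be computed numerically for a given mobility pattern. The sketch below is illustrative only: it forms $\mathbf{W}$ densely, obtains an orthonormal null-space basis of $[\mathbf{WJ},\mathbf{WG}]$ from the SVD, and splits each basis vector into $\mathbf{v}_1$ and $\mathbf{v}_2$. The helper names `annihilator` and `null_space` are ours, and the input matrices are those of Example 1 below.

```python
import numpy as np


def annihilator(D):
    """W = I - D (D'D)^{-1} D', which removes the correlated (group) effects."""
    D = np.asarray(D, dtype=float)
    return np.eye(D.shape[0]) - D @ np.linalg.solve(D.T @ D, D.T)


def null_space(A):
    """Orthonormal basis (columns) for the null-space of A, via the SVD."""
    A = np.asarray(A, dtype=float)
    rank = np.linalg.matrix_rank(A)
    _, _, Vt = np.linalg.svd(A)  # full_matrices=True, so Vt is square
    return Vt[rank:].T


# [J, G, D] for Example 1 below, as displayed in (3.6).
J = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
G = np.array([[1, 0], [0, 1], [0.5, 0.5], [0.5, 0.5]])
D = np.array([[1], [0], [1], [1]], dtype=float)

W = annihilator(D)
V = null_space(np.hstack([W @ J, W @ G]))  # each column is v = (v1', v2')'
N = J.shape[1]
v1, v2 = V[:N], V[N:]                      # the decomposition used in Proposition 1
```

Here the null-space is one-dimensional and spanned by a vector with $\mathbf{v}_1=c\iota_N$ and $\mathbf{v}_2=-\mathbf{v}_1$. Checking the condition of Proposition 1 itself also requires $\mu^{\alpha}$, which is unknown, which is why the rank conditions of the corollaries are more convenient in practice.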

Notice that full column rank of $[\mathbf{J},\mathbf{G},\mathbf{D}]$ is not necessary because $\mathbf{v}=\mathbf{0}$ need not be the only vector in the null-space of $[\mathbf{WJ},\mathbf{WG}]$. This is because $[\mathbf{J},\mathbf{G},\mathbf{D}]$ has $2N+M-1$ columns but there are only $N+M$ unknowns. Requiring $[\mathbf{J},\mathbf{G},\mathbf{D}]$ to have full column rank is too strong because it rules out $T=2$. [Footnote 4: This is because full column rank requires $NT\geq 2N+M-1$, hence $T\geq 2+(M-1)/N$. $T=2$ is not immediately ruled out when $M=1$, but $\rho$ is not identifiable in this case because there can be no mobility if there is only 1 group.] A common empirical setting is when the rows of $\mathbf{G}$ sum to one (e.g., linear-in-means), such that peer effects operate through a weighted average. If there are no other collinearities among the columns of $[\mathbf{J},\mathbf{G},\mathbf{D}]$ then one can apply the following.

Corollary 1.1.

If $\mathbf{G}\iota_{N}=\iota_{NT}$, $\rho$ is identified if ${\rm rank}([\mathbf{J},\mathbf{G},\mathbf{D}])=2N+M-2$ and there exists $(i,j)\in[N]^{2}$ such that $\mu^{\alpha}_{i}\neq\mu^{\alpha}_{j}$.

Corollary 1.1 can be shown using the decomposition $[\mathbf{WJ},\mathbf{WG}]=\mathbf{S}\mathbf{R}$, where $\mathbf{S}$ is the $NT\times(2N-1)$ matrix of full column rank formed by concatenating $\mathbf{WJ}$ and the first $N-1$ columns of $\mathbf{WG}$, and

$$\mathbf{R}=(\mathbf{S}^{\prime}\mathbf{S})^{-1}\mathbf{S}^{\prime}[\mathbf{WJ},\mathbf{WG}]=\begin{pmatrix}\mathbf{I}_{N}&\mathbf{0}_{N,N-1}&\iota_{N}\\ \mathbf{0}_{N-1,N}&\mathbf{I}_{N-1}&-\iota_{N-1}\end{pmatrix}. \tag{3.3}$$

From the structure of $\mathbf{R}$, it is immediate that vectors $\mathbf{v}$ in the null-space of $[\mathbf{WJ},\mathbf{WG}]$ are of the form $\mathbf{v}_{1}=c\iota_{N}$, $\mathbf{v}_{2}=-\mathbf{v}_{1}$ for $c\in\mathbb{R}$. Applying Proposition 1 yields identification of $\rho$ if there exists $(i,j)\in[N]^{2}$ such that $\mu^{\alpha}_{i}\neq\mu^{\alpha}_{j}$. If this condition is violated then $\mu^{\alpha}=a\iota_{N}$ for some $a\in\mathbb{R}$, and $(\mathbf{J}+\rho\mathbf{G})\mu^{\alpha}=(1+\rho)a\iota_{NT}$, so only $(1+\rho)\mu^{\alpha}$ is identifiable. Intuitively, we require $\mu^{\alpha}_{i}\neq\mu^{\alpha}_{j}$ because if individuals are homogeneous then no amount of mobility can lead to changes in the average of $\alpha_{i}$ over peers, hence outcomes do not vary in response to changes in peer composition over time.
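For completeness, the step from (3.3) to the form of the null-space can be written out as below. This only expands the argument in the text, using the full column rank of $\mathbf{S}$ stated above.

```latex
% S has full column rank, so [WJ, WG] v = S R v = 0 if and only if R v = 0.
% Write v_2 = (v_{2,1}, \ldots, v_{2,N})'. By (3.3), R v = 0 reads
\begin{aligned}
\mathbf{v}_{1} + \mathbf{v}_{2,N}\,\iota_{N} &= \mathbf{0}_{N,1},\\
(\mathbf{v}_{2,1},\ldots,\mathbf{v}_{2,N-1})^{\prime} - \mathbf{v}_{2,N}\,\iota_{N-1} &= \mathbf{0}_{N-1,1},
\end{aligned}
% so, with c = -v_{2,N}, we get v_1 = c \iota_N and v_2 = -c \iota_N = -v_1. Hence
\delta_{1}\mathbf{v}_{1}+\delta_{2}\mathbf{v}_{2}=(\delta_{1}-\delta_{2})\,c\,\iota_{N},
% which equals \mu^{\alpha} for some c, \delta_1 and \delta_2 \neq 0
% if and only if all entries of \mu^{\alpha} are equal.
```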

The identification conditions above depend on $\mu^{\alpha}$, which is not observed. We now ask how 'large' the set of values of $\mu^{\alpha}$ is for which $\rho$ is identified. This is the notion of generic identification (see Lewbel (2019)). If $\rho$ is generically identified, then it is identified for all values of $\mu^{\alpha}$ with the exception of a few pathological cases, which are 'unlikely' to arise in practice.

Corollary 1.2.

If ${\rm rank}([\mathbf{WJ},\mathbf{WG}])\geq N+1$, the set of $\mu^{\alpha}$ for which $\rho$ is not identified is a measure zero subset of $\mathbb{R}^{N}$.

Corollary 1.2 means that $\rho$ is generically identified if ${\rm rank}([\mathbf{WJ},\mathbf{WG}])\geq N+1$. This is because $\mathbf{v}$ lies in a subspace of $\mathbb{R}^{2N}$ of dimension at most $N-1$, hence $\delta_{1}\mathbf{v}_{1}+\delta_{2}\mathbf{v}_{2}$ lies in a subspace of $\mathbb{R}^{N}$ of dimension at most $N-1$.

As with the well-known rank condition in linear models (e.g., no perfect multicollinearity among regressors in linear regression), the researcher can check whether the rank requirements of Corollaries 1.1 and 1.2 hold in the observed data. To do this, one interprets $N$ as the total number of observed individuals, $M$ as the total number of observed groups and $T$ as the total number of observed time periods, and constructs the observed values of $\mathbf{J},\mathbf{G}$ and $\mathbf{D}$ accordingly. [Footnote 5: In our identification analysis, these quantities are the number of individuals, groups and time periods in a single mobility pattern (i.e., a draw from the joint distribution of $\mathbf{y},\mathbf{G},\mathbf{D}$). The observed data comprise the realizations of many such patterns.] If the rank requirement of either Corollary holds then the researcher can be confident of identification.
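A minimal sketch of this check in Python with NumPy, assuming the observed $\mathbf{J}$, $\mathbf{G}$ and $\mathbf{D}$ have already been stacked by period as in (2.3). The function name `check_rank_conditions` and the small mobility pattern in the usage lines are hypothetical. The condition on $\mu^{\alpha}$ in Corollary 1.1 cannot be verified from data and is not checked.

```python
import numpy as np


def check_rank_conditions(J, G, D):
    """Rank checks of Corollaries 1.1 and 1.2 on observed J, G and D.

    N is the number of columns of J and M - 1 the number of columns of D.
    The condition on mu^alpha in Corollary 1.1 cannot be checked from data.
    """
    J, G, D = (np.asarray(A, dtype=float) for A in (J, G, D))
    N, M = J.shape[1], D.shape[1] + 1
    W = np.eye(J.shape[0]) - D @ np.linalg.solve(D.T @ D, D.T)
    rank_JGD = int(np.linalg.matrix_rank(np.hstack([J, G, D])))
    rank_WJWG = int(np.linalg.matrix_rank(np.hstack([W @ J, W @ G])))
    rows_sum_to_one = bool(np.allclose(G.sum(axis=1), 1.0))  # G iota_N = iota_NT
    return {
        "rank_JGD": rank_JGD,
        "rank_WJWG": rank_WJWG,
        "corollary_1_1": rows_sum_to_one and rank_JGD == 2 * N + M - 2,
        "corollary_1_2": rank_WJWG >= N + 1,
    }


# Hypothetical pattern: N = 3, M = 2, T = 2. Individuals 1 and 2 start in group 1
# and individual 3 in group 2; individual 2 moves to group 2 in period 2.
J = np.vstack([np.eye(3)] * 2)
G = np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0],   # period 1
              [1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]])  # period 2
D = np.array([[1], [1], [0], [1], [0], [0]], dtype=float)         # group-1 indicator
report = check_rank_conditions(J, G, D)
```

Forming the dense $NT\times NT$ matrix $\mathbf{W}$ is only practical for small examples. With many observed individuals one could instead work with sparse matrices, or take the residuals from regressing each column of $[\mathbf{J},\mathbf{G}]$ on $\mathbf{D}$, which equal $[\mathbf{WJ},\mathbf{WG}]$. The numerical rank also depends on the tolerance used by `numpy.linalg.matrix_rank`.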

Proposition 1 can be viewed as a panel data analogue of the identification results of Bramoullé et al. (2009), which imply that, for $T=1$ and exogenous observed characteristic(s) $\mathbf{X}$ of dimension $N\times K$, the peer effect $\dot{\rho}$ in the model

$$\mathbb{E}[\mathbf{y}|\mathbf{X}]=\mathbf{X}\dot{\beta}+\mathbf{GX}\dot{\rho}+\mathbf{D}\dot{\mu}^{\gamma} \tag{3.4}$$

is identified if and only if $\mathbf{WJ}$ and $\mathbf{WG}$ are linearly independent (i.e., if there does not exist nonzero $\lambda\in\mathbb{R}^{2}$ such that $\lambda_{1}\mathbf{WJ}+\lambda_{2}\mathbf{WG}=\mathbf{0}$). [Footnote 6: Since $\alpha$ does not appear in (3.4), there is no need to impose $\gamma_{M}=0$, which is no longer a normalization. In this case $\mathbf{D}$ ought to be replaced by $\mathbf{C}$ and $\mathbf{W}$ by the annihilator for $\mathbf{C}$ when referring to the model in (3.4). We continue to use the notation $\mathbf{D}$ and $\mathbf{W}$ to facilitate the comparison with our results.] This is a weaker rank requirement than that in Proposition 1, and can be satisfied when $T=1$. This is because the peer effect is assumed to operate through the observed $\mathbf{X}$ rather than through the unobserved $\alpha$. In practice the researcher may not know what to include in $\mathbf{X}$, and/or $\mathbf{X}$ may be of large dimension and/or endogenous. Proposition 1 shows that identification can be attained with $T\geq 2$ without requiring any knowledge of $\mathbf{X}$. [Footnote 7: Of course this requires that $\mathbf{X}$ be time-invariant. However, common choices of the components of $\mathbf{X}$ such as gender and education are also time-invariant.] An additional advantage of panel data is that they facilitate identification for some network structures for which peer effects are not otherwise identifiable. For example, if $T=1$ and the network is linear-in-means, then $\mathbf{WG}=\mathbf{0}$ and (3.4) does not identify the peer effect. In contrast, as we show in Example 1 below, peer effects are identifiable when $T=2$ provided that there is mobility of individuals between groups over time.
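A small numerical check of the claim that $\mathbf{WG}=\mathbf{0}$ when $T=1$ and the network is linear-in-means. Here $\mathbf{W}$ is the annihilator for the full set of group indicators $\mathbf{C}$, since no normalization is needed in (3.4); the group sizes are hypothetical. The reason is that each column of (2.2) is a group indicator divided by that group's size, so it lies in the column space of $\mathbf{C}$.

```python
import numpy as np

# Hypothetical cross-section (T = 1): five individuals in two groups of sizes 2 and 3.
groups = np.array([0, 0, 1, 1, 1])
same = (groups[:, None] == groups[None, :]).astype(float)
G = same / same.sum(axis=1, keepdims=True)          # linear-in-means network (2.2)
C = np.eye(2)[groups]                               # all group indicators (no normalization)
W = np.eye(5) - C @ np.linalg.solve(C.T @ C, C.T)   # annihilator for C
WG = W @ G                                          # numerically a matrix of zeros
```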

It is straightforward to extend our model to include exogenous characteristics, yielding

$$\mathbb{E}[\mathbf{y}|\mathbf{X}]=\mathbf{X}\beta+\mathbf{F}\mathbf{X}\rho_{1}+\left(\mathbf{J}+\rho_{2}\mathbf{G}\right)\mu^{\alpha}(\mathbf{X})+\mathbf{D}\mu^{\gamma}(\mathbf{X}) \tag{3.5}$$

where $\mathbf{X}$ is $NT\times K$, $\mathbf{F}$ is an $NT\times NT$ block diagonal matrix with blocks $\mathbf{G}_{1},\mathbf{G}_{2},\ldots,\mathbf{G}_{T}$, $\mu^{\alpha}(\mathbf{X})=\mathbb{E}[\alpha|\mathbf{X}]$ and $\mu^{\gamma}(\mathbf{X})=\mathbb{E}[\gamma|\mathbf{X}]$. Provided that the entries of $\mathbf{X}$ vary over time, the parameters $\rho_{1}$ and $\rho_{2}$ are separately identifiable under similar conditions to Proposition 1. For brevity, we do not pursue this formally, though we do estimate such a specification in our empirical application.

In the Appendix we provide analogous identification results which additionally allow for endogenous peer effects, through which outcomes are simultaneously determined. We also provide conditional variance restrictions in the spirit of Graham (2008) and Rose (2017), which can be used when the conditional mean restriction does not suffice for identification.

We now consider two examples of our baseline identification results with $N=M=T=2$.

Example 1: An identifying mobility pattern. Consider the following mobility pattern in which individuals one and two are respectively in groups 1 and 2 in the first period. In the second period, individual one remains in group 1 and individual two moves from group 2 to group 1. Under the linear-in-means network, this mobility pattern yields

$$[\mathbf{J},\mathbf{G},\mathbf{D}]=\begin{pmatrix}1&0&1&0&1\\ 0&1&0&1&0\\ 1&0&1/2&1/2&1\\ 0&1&1/2&1/2&1\end{pmatrix},\quad[\mathbf{WJ},\mathbf{WG}]=\begin{pmatrix}1/3&-1/3&1/3&-1/3\\ 0&1&0&1\\ 1/3&-1/3&-1/6&1/6\\ -2/3&2/3&-1/6&1/6\end{pmatrix}, \tag{3.6}$$

which respectively have rank $4$ and $3$. Since $[\mathbf{J},\mathbf{G},\mathbf{D}]$ has rank $2N+M-2$, by Corollary 1.1, $\rho$ is identified if $\mu^{\alpha}_{1}\neq\mu^{\alpha}_{2}$. Since ${\rm rank}([\mathbf{WJ},\mathbf{WG}])=N+1$, Corollary 1.2 states that $\rho$ is generically identified. This is because the subset of $\mathbb{R}^{2}$ such that $\mu^{\alpha}_{1}=\mu^{\alpha}_{2}$ has measure zero.
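The ranks quoted above, and the entries of $[\mathbf{WJ},\mathbf{WG}]$ displayed in (3.6), can be reproduced numerically. The sketch below simply types in $[\mathbf{J},\mathbf{G},\mathbf{D}]$ from (3.6) and relies on NumPy's default rank tolerance.

```python
import numpy as np

# [J, G, D] from (3.6); rows are (i, t) = (1, 1), (2, 1), (1, 2), (2, 2).
JGD = np.array([
    [1, 0, 1.0, 0.0, 1],
    [0, 1, 0.0, 1.0, 0],
    [1, 0, 0.5, 0.5, 1],
    [0, 1, 0.5, 0.5, 1],
])
J, G, D = JGD[:, :2], JGD[:, 2:4], JGD[:, 4:]
W = np.eye(4) - D @ np.linalg.solve(D.T @ D, D.T)   # annihilator for D
WJWG = np.hstack([W @ J, W @ G])

N, M = 2, 2
rank_JGD = np.linalg.matrix_rank(JGD)     # Corollary 1.1 needs 2N + M - 2 = 4
rank_WJWG = np.linalg.matrix_rank(WJWG)   # Corollary 1.2 needs at least N + 1 = 3
```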

For the intuition, consider the underlying system of equations for the outcomes $y_{it}$ of individual $i$ in period $t$,

$$\begin{array}{ll}\mathbb{E}[y_{11}]=\mu^{\alpha}_{1}+\rho\mu^{\alpha}_{1}+\mu^{\gamma}_{1} & \mathbb{E}[y_{21}]=\mu^{\alpha}_{2}+\rho\mu^{\alpha}_{2}\\ \mathbb{E}[y_{12}]=\mu^{\alpha}_{1}+\rho(\mu^{\alpha}_{1}+\mu^{\alpha}_{2})/2+\mu^{\gamma}_{1} & \mathbb{E}[y_{22}]=\mu^{\alpha}_{2}+\rho(\mu^{\alpha}_{1}+\mu^{\alpha}_{2})/2+\mu^{\gamma}_{1}\end{array} \tag{3.9}$$

When individual two moves groups, individual one obtains a new peer, hence the peer effect on individual one changes from $\rho\mu^{\alpha}_{1}$ in the first period to $\rho(\mu^{\alpha}_{1}+\mu^{\alpha}_{2})/2$ in the second period, whilst the individual and correlated effects are unchanged. The change due to the peer effect is given by the expected change in the outcome of individual one between the first and second period, $\mathbb{E}[y_{11}]-\mathbb{E}[y_{12}]=\rho(\mu^{\alpha}_{1}-\mu^{\alpha}_{2})/2$. To identify $\rho$ we now need to identify $\mu^{\alpha}_{1}-\mu^{\alpha}_{2}$. We can use the second period, in which both individuals are in the same group, hence have the same correlated and peer effects. This means that any difference in their expected outcomes is due to differences in their individual effects, so $\mathbb{E}[y_{12}]-\mathbb{E}[y_{22}]=\mu^{\alpha}_{1}-\mu^{\alpha}_{2}$. If $\mu^{\alpha}_{1}\neq\mu^{\alpha}_{2}$ we obtain

ρ=2(𝔼[y11]𝔼[y12])/(𝔼[y12]𝔼[y22]).\displaystyle\rho=2(\mathbb{E}[y_{11}]-\mathbb{E}[y_{12}])/(\mathbb{E}[y_{12}]-\mathbb{E}[y_{22}]). (3.10)

If we were to observe this mobility pattern repeatedly, we could estimate the expectations using sample means, yielding an estimator of $\rho$. Of course, in practice we do not repeatedly observe the same mobility pattern, but a variety of patterns, the information from which we combine through an estimator based on the conditional mean restriction (3.1). Nevertheless, for the purposes of identification we require only that there exists a single identifying mobility pattern realized with non-zero probability. Finally, notice that the above arguments are unchanged when the normalization $\gamma_{M}=0$ is not used, in which case one has $\mathbb{E}[y_{21}]=\mu^{\alpha}_{2}+\rho\mu^{\alpha}_{2}+\mu^{\gamma}_{2}$ in (3.9).
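A minimal Python sketch of this sample-mean argument, under illustrative assumptions: the values of $\mu^{\alpha}$, $\mu^{\gamma}_{1}$, $\rho$ and the noise scale are arbitrary, $\gamma_{2}=0$ as in (3.9), and each replication draws fresh $\alpha_i$ and $\epsilon_{it}$ around their means. The function names are hypothetical, and the sketch only illustrates equation (3.10); it is not the estimator used elsewhere in the paper.

```python
import numpy as np

def simulate_example1(R, mu_a=(1.0, 2.0), mu_g=0.7, rho=0.5, sd=1.0, seed=0):
    """R independent replications of the Example 1 mobility pattern (illustrative values).

    Period 1: individual 1 alone in group 1, individual 2 alone in group 2 (gamma_2 = 0).
    Period 2: individual 2 joins group 1, so both share the linear-in-means peer term.
    Returns an (R, 4) array with columns y11, y21, y12, y22.
    """
    rng = np.random.default_rng(seed)
    a1 = mu_a[0] + sd * rng.standard_normal(R)  # alpha_1, mean mu^alpha_1
    a2 = mu_a[1] + sd * rng.standard_normal(R)  # alpha_2, mean mu^alpha_2
    e = sd * rng.standard_normal((R, 4))        # transitory shocks epsilon_it
    y11 = a1 + rho * a1 + mu_g + e[:, 0]
    y21 = a2 + rho * a2 + e[:, 1]
    y12 = a1 + rho * (a1 + a2) / 2 + mu_g + e[:, 2]
    y22 = a2 + rho * (a1 + a2) / 2 + mu_g + e[:, 3]
    return np.column_stack([y11, y21, y12, y22])

def rho_from_means(y):
    """Equation (3.10) with expectations replaced by sample means."""
    m11, _, m12, m22 = y.mean(axis=0)
    return 2.0 * (m11 - m12) / (m12 - m22)

print(rho_from_means(simulate_example1(100_000)))  # close to the assumed rho = 0.5
```

Because the sample means converge to the expectations in (3.9) as the number of replications grows, the ratio converges to the assumed $\rho$; setting `sd=0.0` recovers it up to floating-point rounding.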

Example 2: A non-identifying mobility pattern. Now modify Example 1 such that both individuals move in the second period. This means that the individuals are in different groups in the first period and swap groups in the second period, implying that $\mathbf{G}=\mathbf{J}$. The null-space of $[\mathbf{WJ},\mathbf{WG}]$ comprises $\mathbf{v}=(\mathbf{v}_{1}^{\prime},\mathbf{v}_{2}^{\prime})^{\prime}=(\mathbf{u}^{\prime},-\mathbf{u}^{\prime})^{\prime}$ for any $\mathbf{u}\in\mathbb{R}^{2}$. For any value of $\mu^{\alpha}\in\mathbb{R}^{2}$, there clearly exist $\mathbf{u}\in\mathbb{R}^{2}$ and scalars $\delta_{1}$ and $\delta_{2}\neq 0$ such that $(\delta_{1}-\delta_{2})\mathbf{u}=\mu^{\alpha}$, so by Proposition 1, $\rho$ is not identified. Note also that the identification condition in Corollary 1.1 is violated since $[\mathbf{J},\mathbf{G},\mathbf{D}]$ has rank $3<2N+M-2=4$, and the generic identification condition in Corollary 1.2 is violated since $[\mathbf{WJ},\mathbf{WG}]$ has rank $2<N+1=3$.

For the intuition, consider again the underlying system of equations

$$\begin{array}{ll}\mathbb{E}[y_{11}]=\mu^{\alpha}_{1}+\rho\mu^{\alpha}_{1}+\mu^{\gamma}_{1} & \mathbb{E}[y_{21}]=\mu^{\alpha}_{2}+\rho\mu^{\alpha}_{2}\\ \mathbb{E}[y_{12}]=\mu^{\alpha}_{1}+\rho\mu^{\alpha}_{1} & \mathbb{E}[y_{22}]=\mu^{\alpha}_{2}+\rho\mu^{\alpha}_{2}+\mu^{\gamma}_{1}\end{array}\tag{3.13}$$

The correlated effect $\mu^{\gamma}_{1}$ is identified by individual one moving groups ($\mu^{\gamma}_{1}=\mathbb{E}[y_{11}]-\mathbb{E}[y_{12}]$), but the remaining equations only suffice to identify $(1+\rho)\mu^{\alpha}$. The reason is that there is no variation in the peers of either individual, because the individuals are never in the same group in the same period. Here, mobility of individuals between groups over time is insufficient to identify $\rho$ because it does not induce changes in peer groups. This contrasts with the canonical wage decomposition model imposing $\rho=0$, for which group-swapping would be sufficient to identify $\mu^{\alpha}_{1},\mu^{\alpha}_{2},\mu^{\gamma}_{1}$.[Footnote 8: Identification of $\mu^{\alpha}$ and $\mu^{\gamma}$ is only up to the normalization $\gamma_{M}=0$.] Since this mobility pattern does not identify $\rho$, no matter how many times we observe it in our data, it cannot be used to construct an estimator of $\rho$.
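The rank conditions quoted in Examples 1 and 2 can be checked numerically. The sketch below assumes a particular construction that is consistent with, but not stated in, the text: observations are stacked by period (row $tN+i$), $\mathbf{J}$ stacks $T$ copies of $\mathbf{I}_N$, $\mathbf{G}$ stacks the linear-in-means $\mathbf{G}_t$ (self included), $\mathbf{D}$ holds the group dummies minus the last group (the normalization $\gamma_M=0$), and $\mathbf{W}=\mathbf{I}-\mathbf{D}\mathbf{D}^{+}$ is the annihilator of $\mathbf{D}$. Function names are hypothetical.

```python
import numpy as np

def lim_network(groups):
    """Linear-in-means G_t for one period: G_t[i, j] = 1(g(i,t) = g(j,t)) / N_g, self included."""
    g = np.asarray(groups)
    same = (g[:, None] == g[None, :]).astype(float)
    return same / same.sum(axis=1, keepdims=True)

def design(assign, M):
    """J, G, D for an N x T array of 0-based group labels; rows stacked by period (row t*N + i)."""
    N, T = assign.shape
    J = np.vstack([np.eye(N)] * T)
    G = np.vstack([lim_network(assign[:, t]) for t in range(T)])
    C = np.eye(M)[assign.T.ravel()]  # NT x M group dummies
    D = C[:, : M - 1]  # drop the last group: normalisation gamma_M = 0
    return J, G, D

def ranks(assign, M):
    """Return (rank[J, G, D], rank[WJ, WG]) with W = I - D D^+ the annihilator of D."""
    J, G, D = design(assign, M)
    W = np.eye(D.shape[0]) - D @ np.linalg.pinv(D)
    r_full = np.linalg.matrix_rank(np.hstack([J, G, D]))
    r_within = np.linalg.matrix_rank(np.hstack([W @ J, W @ G]))
    return int(r_full), int(r_within)

example1 = np.array([[0, 0], [1, 0]])  # individual 2 joins individual 1's group in period 2
example2 = np.array([[0, 1], [1, 0]])  # the two individuals swap groups
N, M = 2, 2
print(ranks(example1, M), (2 * N + M - 2, N + 1))  # (4, 3) (4, 3)
print(ranks(example2, M), (2 * N + M - 2, N + 1))  # (3, 2) (4, 3)
```

Both rank targets are met in Example 1 and both fail in Example 2, matching the discussion above.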

3.1.   Time-varying Correlated Effects

If correlated effects are time-varying, then $\gamma_{g(i,t)}$ is replaced by $\gamma_{g(i,t)t}$, in which case $\mathbf{C}$ is an $NT\times MT$ block diagonal matrix with blocks $\mathbf{C}_{1},\mathbf{C}_{2},\ldots,\mathbf{C}_{T}$, $\mathbf{D}$ comprises the first $MT-1$ columns of $\mathbf{C}$ and $\gamma$ is $(MT-1)\times 1$. All Propositions then apply as stated, provided that $\mathbf{W}$ is modified accordingly. The parameters are not identifiable under the linear-in-means network because the peer effect then varies only at the group-period level, so it cannot be separated from the correlated effect.
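Under the linear-in-means network, every column of $\mathbf{G}$ is a combination of group-period dummies, and the dropped dummy equals $\mathbf{J}\mathbf{1}$ minus the retained dummies, so with time-varying correlated effects $\mathbf{WG}$ adds nothing beyond $\mathbf{WJ}$. The sketch below checks this numerically on a small random panel. The sizes, the seed and the construction of $\mathbf{W}$ as the annihilator of $\mathbf{D}$ are assumptions, and the check concerns only the rank condition of Corollary 1.2, which is consistent with, but does not by itself prove, the non-identification claim above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, T = 12, 3, 4  # small illustrative panel (assumed sizes)
assign = rng.integers(0, M, size=(N, T))  # assign[i, t] = g(i, t)

J = np.vstack([np.eye(N)] * T)   # NT x N, observations stacked by period
G = np.zeros((N * T, N))
C_tv = np.zeros((N * T, M * T))  # time-varying: one dummy per group-period
C_ti = np.zeros((N * T, M))      # time-invariant: one dummy per group
for t in range(T):
    g = assign[:, t]
    same = (g[:, None] == g[None, :]).astype(float)
    rows = slice(t * N, (t + 1) * N)
    G[rows] = same / same.sum(axis=1, keepdims=True)  # linear-in-means, self included
    C_tv[rows, t * M:(t + 1) * M] = np.eye(M)[g]
    C_ti[rows] = np.eye(M)[g]

def within_rank(D):
    """rank[WJ, WG] where W = I - D D^+ annihilates the columns of D."""
    W = np.eye(D.shape[0]) - D @ np.linalg.pinv(D)
    return int(np.linalg.matrix_rank(np.hstack([W @ J, W @ G])))

r_tv = within_rank(C_tv[:, :-1])  # drop one dummy as the normalisation
r_ti = within_rank(C_ti[:, :-1])
print("time-varying:", r_tv, "time-invariant:", r_ti, "target N + 1:", N + 1)
```

In the time-varying case the rank can never exceed $N$, so the condition of Corollary 1.2 fails; the time-invariant case with the same assignment typically gives a larger rank.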

3.2.   Estimation

Arcidiacono et al. (2012) propose NLS estimation of (2.3), treating $\alpha,\rho,\gamma$ as parameters to be estimated (i.e., a fixed effects approach). The authors study its properties under the linear-in-others’-means network,

$$\widetilde{\mathbf{G}}_{t}=\left(\mathbf{1}(g(i,t)=g(j,t),\, i\neq j)\,(N_{g(i,t)}-1)^{-1}\right)_{(i,j)\in[N]^{2}},\tag{3.14}$$

and without correlated effects. Assuming that $\mathbb{E}[\epsilon_{it}\epsilon_{js}]=0$ for all $i\neq j, t\neq s$, $\mathbb{E}[\epsilon_{it}^{2}|g(i,t)]=\mathbb{E}[\epsilon_{jt}^{2}|g(j,t)]$ for all $i,j,t$ such that $g(i,t)=g(j,t)$, $\mathbb{E}[\epsilon_{it}\alpha_{j}]=0$ for all $i,j,t$ and $\rho<\min_{i,t}N_{g(i,t)}$, the authors show that the estimator of $\rho$ is consistent and asymptotically normal as the number of individuals grows, provided that there are at least two time periods. The authors also discuss a variant of correlated effects suited to their application to peer effects in education, which may vary over time but is restricted to be the same across multiple groups, and argue that they expect similar behavior of the estimator in this case. The proposed NLS estimator is based on the conditional mean restriction (3.1); hence, subject to identification, it could equally be applied to other network structures and specifications of the correlated effect. We do not formally establish its properties because our focus is on identification and our empirical application. However, we do explore its performance in our Monte-Carlo experiment.
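The fixed-effects NLS idea can be sketched by concentrating out $(\alpha,\gamma)$. For the model $\mathbf{y}=(\mathbf{J}+\rho\mathbf{G})\alpha+\mathbf{D}\gamma+\epsilon$, i.e. (A.5) with $\psi=0$, the fit is linear in $(\alpha,\gamma)$ once $\rho$ is fixed, so the NLS objective can be profiled by least squares and minimized over $\rho$. This grid-search profile is our illustration, not the algorithm of Arcidiacono et al. (2012); the simulated design (sizes, mobility rate, seed, $\gamma$ drawn independently of $\alpha$) and all function names are hypothetical.

```python
import numpy as np

def liom_network(groups):
    """Linear-in-others'-means G_t as in (3.14); an individual alone in her group gets a zero row."""
    g = np.asarray(groups)
    A = (g[:, None] == g[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)
    deg = A.sum(axis=1, keepdims=True)
    return np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)

def simulate(N=120, M=24, T=6, p=0.3, rho=0.5, sigma=0.0, seed=1):
    """Hypothetical panel: y = (J + rho G) alpha + D gamma + eps, observations stacked by period."""
    rng = np.random.default_rng(seed)
    assign = np.empty((N, T), dtype=int)
    assign[:, 0] = rng.permutation(np.arange(N) % M)
    for t in range(1, T):
        move = rng.random(N) < p
        assign[:, t] = np.where(move, rng.integers(0, M, size=N), assign[:, t - 1])
    J = np.vstack([np.eye(N)] * T)
    G = np.vstack([liom_network(assign[:, t]) for t in range(T)])
    C = np.zeros((N * T, M))
    C[np.arange(N * T), assign.T.ravel()] = 1.0  # row t*N + i holds the dummy for g(i, t)
    D = C[:, : M - 1]  # normalisation gamma_M = 0
    alpha = 1.0 + rng.standard_normal(N)
    gamma = rng.standard_normal(M - 1)
    y = (J + rho * G) @ alpha + D @ gamma + sigma * rng.standard_normal(N * T)
    return y, J, G, D

def profile_ssr(rho, y, J, G, D):
    """Sum of squared residuals from regressing y on [J + rho G, D] for a fixed rho."""
    X = np.hstack([J + rho * G, D])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def nls_rho(y, J, G, D, grid=np.linspace(-0.9, 0.9, 181)):
    """Minimise the concentrated sum of squared residuals over a grid of rho values."""
    ssr = [profile_ssr(r, y, J, G, D) for r in grid]
    return float(grid[int(np.argmin(ssr))])

y, J, G, D = simulate(sigma=np.sqrt(0.5), seed=2)
print(nls_rho(y, J, G, D))
```

With `sigma=0.0` the true $\rho$ is recovered exactly on the grid, which is a useful sanity check; in practice one would refine the grid minimum with a scalar optimizer.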

4.   Monte Carlo Experiment

The design is tailored to our empirical application, with 700 individuals, 140 groups, 15 time periods, mobility rate $p=0.03$ (the probability that an individual moves groups from one period to the next; see the summary statistics in Table 2) and $\rho=0.5$. We also consider designs with 2 time periods and mobility rate $p=0.1$.

The data generating process is as follows. In the first period, all individuals are randomly assigned to groups of size five. In each subsequent period, each individual moves group with probability $p$, in which case she draws a new group with uniform probability over all groups. The expected group size is 5 for all groups in all periods. The network structures we consider are linear-in-means; linear-in-others’-means; a variant of linear-in-others’-means in which links persist when individuals move groups and are weighted by the number of periods spent in the same group; and a social network, in which there is within-group variation in peers.
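A minimal sketch of the group-assignment step of this design. We read "moves with probability $p$" as a redraw over all $M$ groups that may return the individual's current group, so realized group changes occur at rate $p(M-1)/M$, slightly below $p$; this reading and the function name are assumptions. Averaged across groups, the size is mechanically $N/M=5$ in every period; by symmetry each group's expected size is also 5, although realized sizes spread out over time.

```python
import numpy as np

def simulate_groups(N=700, M=140, T=15, p=0.03, seed=0):
    """Group labels g(i, t) following the design above (returns an N x T integer array)."""
    assert N % M == 0, "first-period groups must all have size N / M"
    rng = np.random.default_rng(seed)
    g = np.empty((N, T), dtype=int)
    g[:, 0] = rng.permutation(np.repeat(np.arange(M), N // M))  # groups of size five
    for t in range(1, T):
        moves = rng.random(N) < p  # each individual moves with probability p
        # a mover draws uniformly over all M groups, which may include her current group
        g[:, t] = np.where(moves, rng.integers(0, M, size=N), g[:, t - 1])
    return g

g = simulate_groups()
sizes = np.stack([np.bincount(g[:, t], minlength=140) for t in range(g.shape[1])])
print("group sizes in the last period: min", sizes[-1].min(), "max", sizes[-1].max())
# Realised group changes occur at rate p * (M - 1) / M, slightly below p.
print("share changing group:", (g[:, 1:] != g[:, :-1]).mean())
```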

For the persistent network, we let $\mathbf{A}_{s}$ be the $N\times N$ binary adjacency matrix in period $s$, with element $(i,j)$ equal to 1 if $g(i,s)=g(j,s)$ and $i\neq j$. We then define $\mathbf{G}_{t}$ by taking $\sum_{s=1}^{t}\mathbf{A}_{s}$ and rescaling its rows to sum to 1. This means that $\mathbf{G}\alpha$ captures both contemporaneous and cumulative exposure to others.
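The persistent network is a cumulative sum of adjacency matrices followed by row normalization. A minimal sketch, assuming that rows of individuals who have never shared a group are left as zeros (the text does not say how such rows are treated); the function name is hypothetical.

```python
import numpy as np

def persistent_liom(assign):
    """Persistent linear-in-others'-means networks [G_1, ..., G_T].

    assign: (N, T) array with assign[i, t] = g(i, t).
    A_s[i, j] = 1 if i != j share a group in period s; G_t row-normalises sum_{s<=t} A_s.
    Rows of individuals who have never had a group-mate stay zero (our assumption).
    """
    N, T = assign.shape
    cum = np.zeros((N, N))
    G = []
    for s in range(T):
        g = assign[:, s]
        A = (g[:, None] == g[None, :]).astype(float)
        np.fill_diagonal(A, 0.0)
        cum += A  # number of periods i and j have spent in the same group up to s
        rowsum = cum.sum(axis=1, keepdims=True)
        G.append(np.divide(cum, rowsum, out=np.zeros_like(cum), where=rowsum > 0))
    return G

# Three individuals over two periods: individual 0 is alone in period 2 but keeps peer 1.
G = persistent_liom(np.array([[0, 0], [0, 1], [1, 1]]))
print(G[1])
```

In this toy example, individual 1 has worked with both others once, so her second-period weights are one half each.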

The social network is constructed as follows. In the first period, each individual draws two links uniformly over other individuals in the same group. In each subsequent period, links persist whilst individuals remain in the same group. If an individual loses one or more links due to mobility, replacement links are drawn uniformly over the other individuals in the group with whom there is not already a link. Links need not be reciprocal.[Footnote 9: If an individual is in a group of size 1, they have no link.] If there exists a link between $i$ and $j$ in period $t$, then $\mathbf{G}_{ijt}$ is equal to the inverse of the number of links that $i$ has in period $t$; otherwise $\mathbf{G}_{ijt}=0$. If there are three or fewer individuals in the group, each individual is linked to all other individuals.
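A sketch of this link-formation rule, under our reading of it: links persist while both individuals share a group, and anyone with fewer than two links to current group-mates draws replacements uniformly among group-mates not already linked. Under this reading small groups end up fully linked and an individual alone in her group has no links, as the text requires, but an individual who is below two links may also gain a link when a newcomer arrives; that detail, the seed and the function name are assumptions.

```python
import numpy as np

def social_network(assign, k=2, seed=0):
    """Directed within-group social networks, one N x N matrix G_t per period."""
    rng = np.random.default_rng(seed)
    N, T = assign.shape
    links = [set() for _ in range(N)]  # links[i] = set of individuals i links to
    G = []
    for t in range(T):
        g = assign[:, t]
        Gt = np.zeros((N, N))
        for i in range(N):
            links[i] = {j for j in links[i] if g[j] == g[i]}  # drop links lost to mobility
            pool = [j for j in np.flatnonzero(g == g[i]) if j != i and j not in links[i]]
            need = min(k - len(links[i]), len(pool))
            if need > 0:  # draw replacements uniformly, without replacement
                links[i].update(rng.choice(pool, size=need, replace=False).tolist())
            for j in links[i]:
                Gt[i, j] = 1.0 / len(links[i])  # equal weight on each of i's links
        G.append(Gt)
    return G

assign = np.random.default_rng(3).integers(0, 6, size=(30, 4))  # hypothetical panel
G = social_network(assign)
print((G[-1] > 0).sum(axis=1))  # number of links per individual in the last period
```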

We take $\alpha_{i}=1+W_{i}+\eta_{i}$, where $W_{i}\in[0,1]$ is the number of moves made by individual $i$ divided by $T-1$ and $\eta_{i}\sim\mathcal{N}(0,1)$. We set $\epsilon_{it}\sim\mathcal{N}(0,1/2)$ and consider cases in which $\gamma=\mathbf{0}$ (which is also imposed on the estimator) and in which $\gamma_{m}$ is the mean of $\alpha_{i}$ over all members of group $m$ and all time periods. This design means that high-$\alpha_{i}$ individuals are more mobile and tend to be located in high-$\gamma_{m}$ groups. We choose the variances of $\alpha_{i}$ and $\epsilon_{it}$ as above so that, on average, a one standard deviation increase in the peer effect on individual $i$ in period $t$ (i.e., in $\sum_{j=1}^{N}\mathbf{G}_{ijt}\alpha_{j}$) leads to a 0.15-0.3 (depending on the network structure) standard deviation increase in $y_{it}$. This matches the effect sizes we find in our empirical application. We simulate 500 datasets for each experiment. Every dataset satisfies the generic identification condition in Corollary 1.2.
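A sketch of the outcome-generating step, given a group assignment and a list of networks. Assumptions: $\mathcal{N}(0,1/2)$ is read as variance $1/2$; a "move" is counted as an observed change of group between consecutive periods; and $\gamma_m$ is the observation-weighted mean of $\alpha_i$ over all $(i,t)$ with $g(i,t)=m$. The function name is hypothetical.

```python
import numpy as np

def simulate_outcomes(assign, G, rho=0.5, correlated=True, seed=0):
    """y_it = alpha_i + rho * sum_j G_ijt alpha_j + gamma_{g(i,t)} + eps_it, returned as an (N, T) array.

    assign: (N, T) group labels 0..M-1; G: list of T (N x N) network matrices.
    """
    rng = np.random.default_rng(seed)
    N, T = assign.shape
    M = assign.max() + 1
    moves = (assign[:, 1:] != assign[:, :-1]).sum(axis=1)
    alpha = 1.0 + moves / (T - 1) + rng.standard_normal(N)  # 1 + W_i + eta_i
    if correlated:  # gamma_m = mean of alpha_i over all (i, t) observed in group m
        counts = np.bincount(assign.ravel(), minlength=M)
        sums = np.bincount(assign.ravel(), weights=np.repeat(alpha, T), minlength=M)
        gamma = np.divide(sums, counts, out=np.zeros(M), where=counts > 0)
    else:
        gamma = np.zeros(M)
    eps = rng.normal(0.0, np.sqrt(0.5), size=(N, T))  # variance 1/2
    peer = np.column_stack([G[t] @ alpha for t in range(T)])
    y = alpha[:, None] + rho * peer + gamma[assign] + eps
    return y, alpha, gamma

assign = np.array([[0, 1], [0, 0]])  # hypothetical: individual 0 moves, individual 1 stays
y, alpha, gamma = simulate_outcomes(assign, [np.eye(2), np.eye(2)])
print(y.shape, gamma)
```

Because $W_i$ enters $\alpha_i$ and $\gamma_m$ averages $\alpha_i$ within groups, mobile individuals have higher $\alpha_i$ and groups with high-$\alpha$ members have high $\gamma_m$, which is the correlation the design is meant to create.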

Table 1: Monte Carlo Results
Column (1) (2) (3) (4) (5) (6) (7) (8)
Unobserved $\alpha$
LIM 0.501 0.5 0.523 0.522 0.502 0.503 0.496 0.515
(0.04) (0.051) (0.297) (0.456) (0.026) (0.032) (0.161) (0.447)
LIOM 0.5 0.5 0.505 0.454 0.501 0.502 0.494 0.437
(0.029) (0.035) (0.216) (0.355) (0.019) (0.022) (0.121) (0.386)
PLIOM 0.501 0.507 0.505 0.394 0.503 0.503 0.489 0.455
(0.068) (0.085) (0.414) (0.641) (0.046) (0.052) (0.231) (0.361)
SOC 0.501 0.5 0.5 0.462 0.5 0.5 0.503 0.496
(0.019) (0.023) (0.185) (0.361) (0.013) (0.014) (0.084) (0.248)
Observed $\alpha$
LIM 0.5 0.5 0.498 0.493 0.501 0.502 0.499 0.491
(0.009) (0.037) (0.027) (0.175) (0.009) (0.024) (0.026) (0.12)
LIOM 0.5 0.499 0.499 0.495 0.5 0.502 0.499 0.493
(0.008) (0.028) (0.022) (0.139) (0.008) (0.018) (0.021) (0.096)
PLIOM 0.5 0.501 0.499 0.497 0.501 0.501 0.499 0.49
(0.008) (0.03) (0.022) (0.165) (0.008) (0.024) (0.021) (0.115)
SOC 0.5 0.499 0.5 0.498 0.5 0.5 0.5 0.498
(0.007) (0.013) (0.019) (0.039) (0.007) (0.012) (0.017) (0.039)
Correlated Effects No Yes No Yes No Yes No Yes
Periods 15 15 2 2 15 15 2 2
Mobility rate ($p$) 0.03 0.03 0.03 0.03 0.1 0.1 0.1 0.1
Individuals 700 700 700 700 700 700 700 700
Groups 140 140 140 140 140 140 140 140

Notes: We report the mean and, in parentheses, the standard deviation of the estimates of $\rho$ over 500 datasets, with true value 0.5. ‘LIM’ is the linear-in-means network, ‘LIOM’ is linear-in-others’-means, ‘PLIOM’ is the persistent LIOM and ‘SOC’ is the social network. ‘Unobserved $\alpha$’ is the NLS estimator in Section 3.2. ‘Observed $\alpha$’ is the OLS estimator taking $\mathbf{X}=\mathbf{J}\alpha$ in (3.4). If there are no correlated effects, the estimator imposes $\gamma=\mathbf{0}$. Correlated effects are time-invariant.

4.1.   Results

Table 1 reports the results. The top panel reports the NLS estimator of $\rho$ described in Section 3.2. To provide a benchmark for comparison, the bottom panel reports the infeasible OLS estimator that would be used if $\alpha$ were observable (i.e., for the model in (3.4) taking $\mathbf{X}=\mathbf{J}\alpha$). Columns 1-2 of Table 1 show designs with 15 periods and mobility rate 0.03, which match our empirical application. The NLS estimator of $\rho$ is centered on the true value for all networks, with and without correlated effects. The variance of the NLS estimator is of a similar order of magnitude to that of the infeasible OLS estimator, though, as expected, it is larger. Columns 3-4 reduce the number of periods to 2, which increases the variance of the estimator. The final 4 columns increase the rate of mobility to 0.1, which reduces the variance of the estimator. Comparing the rows of Table 1, we find that peer effects are most precisely estimated for the social network, likely because it exhibits the most within-group variation in peers.

5.   Application

We study surgeons’ take-up of keyhole surgery for colorectal cancer in the English National Health Service (henceforth, NHS). As discussed in the introduction, colorectal cancer is prevalent, costly to treat, and accounts for a high proportion of cancer deaths worldwide. An important innovation in its treatment is keyhole surgery, which reduces costs and improves patient outcomes relative to the older open procedure. Despite this, take-up of keyhole surgery in England was slow, increasing from 1% of eligible surgeries in 2000, the year it was introduced in England, to 49% by 2014 (see Table 2). Our goal is to study the extent to which its diffusion was driven by peer effects, hence by mobility of surgeons between hospitals over time.

The NHS is an ideal setting for our empirical work for several reasons. It treats almost all cancer patients in England.[Footnote 10: There is a small private sector in England that mainly provides care for planned procedures for which there are long waiting lists. Private sector provision for (any) cancers during the period we examine was very limited and primarily focused on treatment of overseas patients.] It is a public system in which surgeons are salaried employees of one hospital at any point in time (hence their remuneration does not depend on the treatment provided). All hospitals operate under the same financial rules set by central government and have the technology necessary for keyhole surgery. Finally, a two-week waiting time guarantee for cancer referral implies that allocation of patients to surgeons is close to random, based on surgeon availability in a local hospital (Barrenho et al., 2020).

5.1.   Data

Our data are from Barrenho et al. (2020), comprising a panel of NHS surgeons and their take-up of keyhole surgery from its introduction in 2000 through to 2014. The data are obtained by merging Hospital Episode Statistics, which provides treatment information for all patients in the English NHS, with NHS Workforce Statistics and the General Medical Council register. This provides matched patient-surgeon-hospital-year data, which are collapsed into a surgeon-hospital-year panel. To be included in the estimation sample, surgeons must be observed at least twice and must perform more than 5 colorectal cancer surgeries,[Footnote 11: This is to avoid including surgeons who do not routinely perform the procedure.] and hospital-year pairs must have at least two surgeons. The resulting unbalanced panel comprises 11,923 observations of 1,363 surgeons over 15 years across 194 hospitals.

The dependent variable is surgeon take-up of keyhole surgery for colorectal cancer, measured by the fraction of eligible colorectal cancer surgeries performed by keyhole in the focal year.[Footnote 12: Keyhole surgery is suitable for some, but not all, patients. The sample of patients considered is restricted to those for whom the surgeon has a choice between keyhole and the alternative open surgery. A detailed description of patient eligibility can be found in Barrenho et al. (2020).] The surgeons we observe are senior physicians, known as Consultants. They are entirely autonomous and, for the patients we consider, have discretion to choose either keyhole or open surgery. We also observe surgeon demographics (gender and age, though we omit gender from our models because it is time-invariant), and compute an experience measure, which is the cumulative number of eligible colorectal cancer surgeries (both keyhole and open) performed by the beginning of the focal year. To account for increasing average take-up over time, we also include year fixed effects in all specifications.

We apply our model for surgeon $i$ in hospital $g(i,t)$ in year $t$,[Footnote 13: We set a surgeon’s hospital to be that at which they practiced for the majority of days of the year.] and consider the linear-in-means and linear-in-others’-means networks, as well as the persistent linear-in-others’-means network used in our Monte-Carlo experiment. Recall that in the persistent network, a surgeon is a peer if they have ever worked concurrently in the same hospital, and peers are weighted by the number of years worked together, with weights summing to one. This network captures both contemporaneous and cumulative exposure to others, hence measures a ‘stock’ of peer influence, rather than the ‘flow’ measured by the other networks.

Table 2 summarises the data. In a typical hospital in a typical year there are 5 surgeons performing colorectal cancer surgery; the largest has 16 surgeons. A typical surgeon is in their early 40s and has performed around 140 colorectal cancer surgeries by the beginning of a typical year. We only observe surgeons beginning in 2000, hence experience is equal to zero in this year for all surgeons. The minimum value of experience can be zero in any year due to entry of newly qualified surgeons.[Footnote 14: For this reason, in addition to taste and ability, $\alpha$ may be thought of as including experience in colorectal cancer surgery prior to 2000.]

Table 2: Summary Statistics
Year 2000 2003 2006 2009 2012 2014 All
Surgeons observed 600 761 782 834 891 854 1363
Take-up Mean 0.008 0.022 0.104 0.293 0.421 0.485 0.218
S.D. 0.031 0.072 0.173 0.273 0.28 0.291 0.28
Min 0 0 0 0 0 0 0
Max 0.256 1 1 1 1 1 1
Age <40 Mean 0.179 0.19 0.201 0.225 0.207 0.252 0.233
Age 40-44 Mean 0.283 0.246 0.253 0.233 0.284 0.284 0.26
Age 45-49 Mean 0.227 0.172 0.239 0.194 0.219 0.219 0.212
Age 50-54 Mean 0.196 0.153 0.165 0.131 0.166 0.16 0.165
Experience (00’s) Mean 0 0.689 1.279 1.771 2.104 2.486 1.418
S.D. 0 0.548 1.028 1.444 1.838 2.013 1.51
Min 0 0 0 0 0 0.01 0
Max 0 2.65 5.01 6.65 8.62 9.37 9.37
Moved Mean - 0.035 0.028 0.038 0.035 0.023 0.029
Peers changed Mean - 0.582 0.602 0.64 0.738 0.542 0.601
Hospitals observed 151 151 149 148 143 139 194
No. of surgeons Mean 3.974 5.04 5.248 5.635 6.231 6.144 5.36
S.D. 1.689 2.343 2.339 2.419 2.533 2.541 2.439
Min 2 2 2 2 2 2 2
Max 10 14 12 13 16 14 16

Notes: ‘Take-up’ is the fraction of colorectal cancer surgeries performed by keyhole. ‘Age’ rows are indicators for the specified range. ‘Experience’ is the cumulative number of colorectal cancer surgeries, in hundreds, at the beginning of the year. ‘Moved’ is a binary indicator for a surgeon being located in a different hospital in year $t$ than in year $t-1$. ‘Peers changed’ is a binary indicator for a surgeon’s peers being different in year $t$ than in year $t-1$. ‘No. of surgeons’ is the number of surgeons located in a hospital.

5.2.   Identification

Our identification results are based on a sample from the joint distribution of $\mathbf{y},\mathbf{G},\mathbf{D}$. As argued in Section 3, we also expect them to apply if the sample can be partitioned into blocks based on individuals’ ‘distance’ to one another, such that the dependence between each pair of blocks is limited. Mobility of surgeons between hospitals is largely within the local region (Goldacre et al., 2013; Barrenho et al., 2020), hence such blocks could (hypothetically) be constructed by partitioning regions. For this reason, we expect that our theory provides a reasonable approximation.

Identification is based on exogenous mobility of surgeons between hospitals over time. We expect this assumption to hold because all hospitals have the required technology for keyhole surgery, the NHS is a public system with similar working conditions and remuneration nationwide, and colorectal cancer forms only a small part of surgeons’ workloads. The leading reason for mobility is to relocate closer to the pre-medical school family home (Goldacre et al., 2013). Barrenho et al. (2020) provide empirical evidence of exogenous mobility, showing that past take-up does not predict mobility and that, conditional on moving, there is no association between take-up of the surgeon and take-up of the hospital moved to. These arguments support the conditional mean restriction in (3.1). Since $\mathbb{E}[\alpha|\mathbf{G},\mathbf{D}]$ and $\mathbb{E}[\gamma|\mathbf{G},\mathbf{D}]$ are unrestricted, we allow for dependence between a surgeon’s latent propensity to take up keyhole surgery and the nature of their network. Moreover, we do not rule out high take-up surgeons being systematically located in high take-up hospitals, though we do rule out their mobility between hospitals being driven by transitory take-up shocks.

We now discuss the extent of identifying variation in the data. Though only 3% of surgeons move hospital in a typical year (see Table 2), our panel is relatively long, comprising 15 years in total. Moreover, each move changes the peer groups of all those in the hospital left and all those in the hospital joined. The median number of surgeons in a hospital-year pair is 5, hence a surgeon moving from one median-sized hospital to another changes the peer groups of 10 surgeons: the four who remain in the hospital left, the five in the hospital joined, and the mover. For this reason, around 60% of surgeons experience a change in peers in a typical year. These changes in peer groups can generate large fluctuations in average peer take-up because peer groups are small. Our Monte-Carlo experiment, whose design is tailored to these data, also demonstrates that the observed mobility rate is sufficient to estimate $\rho$ accurately.
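A stylized simulation of why a low move rate still changes many peer sets. It assumes a hypothetical balanced panel of 700 surgeons in 140 hospitals of five, a 3% annual move rate, and movers choosing uniformly among the other hospitals; the actual data are unbalanced and hospital sizes vary, so the printed shares are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, T, p = 700, 140, 15, 0.03  # hypothetical balanced panel loosely matching Table 2
assign = np.empty((N, T), dtype=int)
assign[:, 0] = rng.permutation(np.repeat(np.arange(M), N // M))
for t in range(1, T):
    move = rng.random(N) < p
    # a mover draws uniformly among the other M - 1 hospitals, so every move is a real change
    new = (assign[:, t - 1] + rng.integers(1, M, size=N)) % M
    assign[:, t] = np.where(move, new, assign[:, t - 1])

def peer_sets(g):
    """Set of co-located surgeons (excluding oneself) for each surgeon in one year."""
    return [frozenset(np.flatnonzero(g == g[i]).tolist()) - {i} for i in range(len(g))]

moved, changed = [], []
for t in range(1, T):
    before, after = peer_sets(assign[:, t - 1]), peer_sets(assign[:, t])
    moved.extend(assign[:, t] != assign[:, t - 1])
    changed.extend(b != a for b, a in zip(before, after))
print("share moved:", np.mean(moved))
print("share with changed peers:", np.mean(changed))
```

Each move changes the peer sets of roughly ten surgeons, so the share with changed peers is several times the share who move.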

All estimated models satisfy the rank condition for generic identification of $\rho$. In the baseline model without surgeon demographics, the rank of $[\mathbf{WJ},\mathbf{WG}]$ is 2361 for the linear-in-means network, 2678 for the linear-in-others’-means network and 2683 for the persistent network. Since there are $N=1363$ surgeons, each rank is at least $N+1=1364$, so the rank requirement of Corollary 1.2 is satisfied.[Footnote 15: For these calculations, year dummies are included to construct the annihilator $\mathbf{W}$, hence the partialling out is with respect to both hospital and year dummies. For models with surgeon demographics, these are also included to construct $\mathbf{W}$.]

5.3.   Results

Table 3: Peer Effects in Keyhole Surgery for Colorectal Cancer
Linear-in-means Linear-in-others’-means Persistent LIOM
$\mathbf{G}\alpha$ 0.427 0.531 0.422 0.564 0.58 0.507 0.993 0.995 0.99
(0.06) (0.061) (0.072) (0.036) (0.036) (0.045) (0.008) (0.008) (0.013)
Age <40 -0.004 0.001 0.022
(0.022) (0.023) (0.021)
Age 40-44 0.03 0.035 0.047
(0.017) (0.018) (0.017)
Age 45-49 0.044 0.042 0.049
(0.012) (0.013) (0.012)
Age 50-54 0.03 0.029 0.032
(0.008) (0.008) (0.008)
Exper. 0.043 0.047 0.045
(0.003) (0.003) (0.003)
$\mathbf{F}$ Age <40 -0.003 -0.012 0.019
(0.023) (0.019) (0.027)
$\mathbf{F}$ Age 40-44 0.01 0.007 0.022
(0.019) (0.016) (0.02)
$\mathbf{F}$ Age 45-49 -0.014 -0.008 0.005
(0.02) (0.017) (0.017)
$\mathbf{F}$ Age 50-54 0.002 0.003 0.013
(0.017) (0.013) (0.011)
$\mathbf{F}$ Exper. 0.025 0.035 0.008
(0.008) (0.006) (0.005)
$\rho\left(\sum_{j=1}^{N}\mathbf{G}_{ijt}^{2}\right)^{1/2}$ 0.18 0.122 0.103 0.269 0.193 0.161 0.443 0.345 0.354
$\sigma_{\alpha}$ 0.168 0.253 0.294 0.181 0.262 0.315 0.207 0.297 0.306
Surgeon FE Yes Yes Yes Yes Yes Yes Yes Yes Yes
Hospital FE No Yes Yes No Yes Yes No Yes Yes
Year FE Yes Yes Yes Yes Yes Yes Yes Yes Yes
Observations 11923 11887 11887 11923 11887 11887 11923 11887 11887

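Notes: To allow for heteroskedasticity, standard errors are computed using a wild bootstrap, as proposed by Arcidiacono et al. (2012). $\sigma^{2}_{\alpha}$ is computed using the estimated $\alpha$, as proposed by Arcidiacono et al. (2012). Rows for own and peer ($\mathbf{F}$) age and experience report the third specification for each network, which adds these covariates.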

Table 3 reports the results. The first row reports estimates of $\rho$. We find positive peer effects in all specifications. The peer effect is precisely estimated and statistically distinguishable from zero in all specifications. This is in line with our Monte-Carlo results and with the results of Arcidiacono et al. (2012), though their application, to peer effects in education, differs from ours.

The vector $\alpha$ combines both surgeons’ ability and taste for innovation, but we cannot decompose the two with the available data. The size of the peer effect is interpretable only relative to the variance of $\alpha$. To provide an interpretable benchmark, we treat $\mathbf{G}$ as fixed and $\alpha_{i}$ as i.i.d., with variance $\sigma^{2}_{\alpha}$ estimated using the estimated $\alpha$, as proposed by Arcidiacono et al. (2012). For surgeon $i$ in year $t$, the effect on take-up of a one standard deviation increase in average peer latent propensity to innovate is $\rho\sigma_{\alpha}\left(\sum_{j=1}^{N}\mathbf{G}_{ijt}^{2}\right)^{1/2}$. For the linear-in-means network this is $\rho\sigma_{\alpha}N_{g(i,t)}^{-1/2}$, and for linear-in-others’-means we replace $N_{g(i,t)}$ by $N_{g(i,t)}-1$. Since the effect of a one standard deviation increase in own latent propensity to innovate is $\sigma_{\alpha}$, the magnitude of the peer effect relative to the own effect is $\rho\left(\sum_{j=1}^{N}\mathbf{G}_{ijt}^{2}\right)^{1/2}$. Table 3 reports its average over all observations. With hospital fixed effects, it is around 0.1 for linear-in-means, 0.15-0.2 for linear-in-others’-means and 0.35 for the persistent network. Our estimates imply that a one standard deviation increase in average peer latent propensity to innovate increases take-up by 3 percentage points for linear-in-means, 5 percentage points for linear-in-others’-means and 10 percentage points for the persistent network. Our finding that the persistent network corresponds to the largest effect size suggests that both contemporaneous and cumulative peer exposure play a role.
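A small numerical check of the benchmark formula and its closed forms. The hospital sizes and the value of $\rho$ below are hypothetical, not estimates from Table 3; the last line only repeats the text's conversion of a reported average ratio into percentage points by multiplying it by the reported $\sigma_\alpha$.

```python
import numpy as np

def relative_peer_effect(G_t, rho):
    """rho * (sum_j G_ijt^2)^(1/2) for each surgeon i (row) of one year's network G_t.

    With G fixed and alpha_j i.i.d. with variance sigma_alpha^2, sum_j G_ijt alpha_j has
    standard deviation sigma_alpha * (sum_j G_ijt^2)^(1/2); dividing the peer effect by
    the own effect sigma_alpha leaves this ratio.
    """
    return rho * np.sqrt((G_t ** 2).sum(axis=1))

groups = np.array([0] * 5 + [1] * 4)  # hypothetical year: hospitals with 5 and 4 surgeons
same = (groups[:, None] == groups[None, :]).astype(float)
G_lim = same / same.sum(axis=1, keepdims=True)       # linear-in-means
others = same - np.eye(len(groups))
G_liom = others / others.sum(axis=1, keepdims=True)  # linear-in-others'-means

rho = 0.5  # hypothetical value
print(relative_peer_effect(G_lim, rho))   # rho / sqrt(5) for rows 0-4, rho / sqrt(4) for rows 5-8
print(relative_peer_effect(G_liom, rho))  # rho / sqrt(4) for rows 0-4, rho / sqrt(3) for rows 5-8

# Percentage-point effect = average ratio * sigma_alpha; e.g. the linear-in-means
# specification with hospital fixed effects in Table 3 (0.122 and 0.253) gives about 0.03.
print(0.122 * 0.253)
```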

For each network, we also estimate a specification that includes the age and experience of a surgeon and their peers (i.e., the model in (3.5)). We find an inverted-U profile in age, with take-up estimated to peak between 45 and 49. The coefficient on experience is positive, suggesting that an additional hundred career colorectal cancer surgeries leads to an increase in take-up of around 4-5 percentage points. The effect of peer age is close to zero, whilst peer experience, like own experience, increases take-up, although its estimated coefficient is smaller and, for the persistent network, imprecisely estimated. The estimate of $\rho$ is smaller than in models in which peer effects operate through $\alpha$ alone, though it remains positive and statistically significant at conventional levels. This suggests that some, but not all, of the peer effect operates through experience.

6.   Conclusion

This paper provides and implements new identification results for panel data models with peer effects. Our results suggest that peer effects can typically be separated from correlated effects provided either that there is sufficient mobility in the data or that the network data are sufficiently detailed. We find positive peer effects in surgeons’ take-up of keyhole surgery for colorectal cancer, which operate through peer experience and peers’ latent propensity to take up the innovation. Our results imply that exposure to others with high take-up and experience may be useful to increase innovation take-up. Based on this, policymakers might consider programmes that expose low take-up surgeons to high take-up and/or experienced surgeons, for example a targeted secondment programme.

References

  • Abowd et al. (1999) Abowd, J. M., F. Kramarz and D. N. Margolis, “High wage workers and high wage firms,” Econometrica 67 (1999), 251–333.
  • Agha and Molitor (2018) Agha, L. and D. Molitor, “The local influence of pioneer investigators on technology adoption: evidence from new cancer drugs,” Review of Economics and Statistics 100 (2018), 29–44.
  • Arcidiacono et al. (2012) Arcidiacono, P., G. Foster, N. Goodpaster and J. Kinsler, “Estimating spillovers using panel data, with an application to the classroom,” Quantitative Economics 3 (2012), 421–470.
  • Arnold et al. (2017) Arnold, M., M. S. Sierra, M. Laversanne, I. Soerjomataram, A. Jemal and F. Bray, “Global patterns and trends in colorectal cancer incidence and mortality,” Gut 66 (2017), 683–691.
  • Barrenho et al. (2020) Barrenho, E., E. Gautier, M. Miraldo, C. Propper and C. Rose, “Innovation Diffusion and Physician Networks: Keyhole Surgery for Cancer in the English NHS,” CEPR Discussion Paper 15515 (2020).
  • Blume et al. (2015) Blume, L. E., W. A. Brock, S. N. Durlauf and R. Jayaraman, “Linear social interactions models,” Journal of Political Economy 123 (2015), 444–496.
  • Bramoullé et al. (2009) Bramoullé, Y., H. Djebbari and B. Fortin, “Identification of peer effects through social networks,” Journal of Econometrics 150 (2009), 41–55.
  • Bramoullé et al. (2019) ———, “Peer effects in networks: A survey,” (2019).
  • Calvó-Armengol et al. (2009) Calvó-Armengol, A., E. Patacchini and Y. Zenou, “Peer effects and social networks in education,” The Review of Economic Studies 76 (2009), 1239–1267.
  • Cohen-Cole et al. (2018) Cohen-Cole, E., X. Liu and Y. Zenou, “Multivariate choices and identification of social interactions,” Journal of Applied Econometrics 33 (2018), 165–178.
  • Cornelissen et al. (2017) Cornelissen, T., C. Dustmann and U. Schönberg, “Peer effects in the workplace,” American Economic Review 107 (2017), 425–56.
  • COSTSG (2004) COSTSG, “A comparison of laparoscopically assisted and open colectomy for colon cancer,” New England Journal of Medicine 350 (2004), 2050–2059.
  • Davezies et al. (2009) Davezies, L., X. d’Haultfoeuille and D. Fougère, “Identification of peer effects using group size variation,” The Econometrics Journal 12 (2009), 397–413.
  • De Giorgi et al. (2010) De Giorgi, G., M. Pellizzari and S. Redaelli, “Identification of social interactions through partially overlapping peer groups,” American Economic Journal: Applied Economics 2 (2010), 241–75.
  • De Paula (2017) De Paula, A., “Econometrics of network models,” in Advances in Economics and Econometrics: Theory and Applications, Eleventh World Congress (Cambridge University Press, Cambridge, 2017), 268–323.
  • Goldacre et al. (2013) Goldacre, M., J. Davidson, J. Maisonneuve and T. Lambert, “Geographical movement of doctors from education to training and eventual career post: UK cohort studies,” Journal of the Royal Society of Medicine 106 (2013), 96–104.
  • Goldsmith-Pinkham and Imbens (2013) Goldsmith-Pinkham, P. and G. W. Imbens, “Social networks and the identification of peer effects,” Journal of Business & Economic Statistics 31 (2013), 253–264.
  • Graham (2008) Graham, B. S., “Identifying social interactions through conditional variance restrictions,” Econometrica 76 (2008), 643–660.
  • Lacy et al. (2002) Lacy, A. M., J. C. García-Valdecasas, S. Delgado, A. Castells, P. Taurá, J. M. Piqué and J. Visa, “Laparoscopy-assisted colectomy versus open colectomy for treatment of non-metastatic colon cancer: a randomised trial,” The Lancet 359 (2002), 2224–2229.
  • Laudicella et al. (2016) Laudicella, M., B. Walsh, E. Burns and P. C. Smith, “Cost of care for cancer patients in England: evidence from population-based patient-level data,” British Journal of Cancer 114 (2016), 1286–1292.
  • Lee (2007) Lee, L.-F., “Identification and estimation of econometric models with group interactions, contextual factors and fixed effects,” Journal of Econometrics 140 (2007), 333–374.
  • Lewbel (2019) Lewbel, A., “The identification zoo: Meanings of identification in econometrics,” Journal of Economic Literature 57 (2019), 835–903.
  • Manski (1993) Manski, C. F., “Identification of endogenous social effects: The reflection problem,” The Review of Economic Studies 60 (1993), 531–542.
  • Manski (1995) ———, Identification problems in the social sciences (Harvard University Press, 1995).
  • Mas and Moretti (2009) Mas, A. and E. Moretti, “Peers at work,” American Economic Review 99 (2009), 112–45.
  • Moffitt et al. (2001) Moffitt, R. A. et al., “Policy interventions, low-level equilibria, and social interactions,” Social Dynamics 4 (2001), 6–17.
  • Rose (2017) Rose, C. D., “Identification of peer effects through social networks using variance restrictions,” The Econometrics Journal 20 (2017), S47–S60.

Appendix

Proof of Proposition 1

Denote $\theta=(\mu^{\alpha},\rho,\mu^{\gamma})$. Equation (3.2) yields $\mathbb{E}[\mathbf{Wy}]=\mathbf{WJ}\mu^{\alpha}+\mathbf{WG}\rho\mu^{\alpha}$. Suppose that there is $\overline{\theta}$ satisfying the conditional moment restriction in (3.2). Then we have

[𝐖𝐉,𝐖𝐆](μαμ¯αρμαρ¯μ¯α)=𝟎\displaystyle[\mathbf{WJ},\mathbf{WG}]\begin{pmatrix}\mu^{\alpha}-\overline{\mu}^{\alpha}\\ \rho\mu^{\alpha}-\overline{\rho}\overline{\mu}^{\alpha}\end{pmatrix}=\mathbf{0} (A.1)

Let 𝐯=(𝐯1,𝐯2)\mathbf{v}=(\mathbf{v}_{1}^{\prime},\mathbf{v}_{2}^{\prime}) be a vector in the null-space of [𝐖𝐉,𝐖𝐆][\mathbf{WJ},\mathbf{WG}]. Then for some 𝐯\mathbf{v} we have

𝐯1=μαμ¯α,𝐯2=ρμαρ¯μ¯α\displaystyle\mathbf{v}_{1}=\mu^{\alpha}-\overline{\mu}^{\alpha},\quad\mathbf{v}_{2}=\rho\mu^{\alpha}-\overline{\rho}\overline{\mu}^{\alpha} (A.2)

which implies ρ¯𝐯1𝐯2=(ρ¯ρ)μα\overline{\rho}\mathbf{v}_{1}-\mathbf{v}_{2}=(\overline{\rho}-\rho)\mu^{\alpha}. If ρ¯ρ\overline{\rho}\neq\rho then this is equivalently expressed as

ρ¯ρ¯ρ𝐯11ρ¯ρ𝐯2=μα\displaystyle\frac{\overline{\rho}}{\overline{\rho}-\rho}\mathbf{v}_{1}-\frac{1}{\overline{\rho}-\rho}\mathbf{v}_{2}=\mu^{\alpha} (A.3)

If there does not exist 𝐯\mathbf{v} in the null-space of [𝐖𝐉,𝐖𝐆][\mathbf{WJ},\mathbf{WG}] and scalars δ1\delta_{1} and δ20\delta_{2}\neq 0 verifying δ1𝐯1+δ2𝐯2=μα\delta_{1}\mathbf{v}_{1}+\delta_{2}\mathbf{v}_{2}=\mu^{\alpha} then, by contradiction we have ρ¯=ρ\overline{\rho}=\rho, so ρ\rho is identified. Otherwise we can clearly have ρ¯ρ\overline{\rho}\neq\rho, so ρ\rho is not identified.

Endogenous peer effects

We now extend our identification results to models with endogenous peer effects. We consider the canonical model, in which outcomes are simultaneously determined (see Bramoullé et al. (2019) for a review of endogenous peer effects). The outcome equation is

$$y_{it}=\alpha_{i}+\psi\sum_{j=1}^{N}\mathbf{G}_{ijt}y_{jt}+\rho\sum_{j=1}^{N}\mathbf{G}_{ijt}\alpha_{j}+\gamma_{g(i,t)}+\epsilon_{it},\tag{A.4}$$

or in stacked form,

$$\mathbf{y}=\psi\mathbf{F}\mathbf{y}+(\mathbf{J}+\rho\mathbf{G})\alpha+\mathbf{D}\gamma+\epsilon,\tag{A.5}$$

where $\psi$ is a scalar parameter capturing the endogenous peer effect and $\mathbf{F}$ is an $NT\times NT$ block-diagonal matrix with blocks $\mathbf{G}_{1},\mathbf{G}_{2},\ldots,\mathbf{G}_{T}$.

Throughout this section we suppose that the rows of $\mathbf{G}$ sum to one ($\mathbf{G}\iota_{N}=\iota_{NT}$), that there are no between-group interactions ($\mathbf{G}_{ijt}=0$ if $g(i,t)\neq g(j,t)$), and that $|\psi|<1$. These standard assumptions, maintained in almost all papers on the identification of peer effects, ensure that the reduced form exists and that the reduced-form correlated effect is proportional to the structural correlated effect. Solving for the reduced form and taking expectations yields

$$\mathbb{E}[\mathbf{y}]=(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}(\mathbf{J}+\rho\mathbf{G})\mu^{\alpha}+\mathbf{D}(1-\psi)^{-1}\mu^{\gamma}.\tag{A.6}$$
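The proportionality of the correlated effect in (A.6) rests on $\mathbf{G}_t\mathbf{D}_t=\mathbf{D}_t$: with rows summing to one and no between-group links, averaging a group dummy over one's peers returns one's own dummy. The following is a minimal sketch that checks this in exact rational arithmetic for a single period. The five-person network, its weights and the value of $\psi$ are hypothetical, and we assume $\mathbf{D}_t$ is the $N\times M$ matrix of group dummies; the stacked case follows block by block.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical single period: individuals 0-2 are in group 0, individuals 3-4 in group 1.
groups = [0, 0, 0, 1, 1]
half = Fraction(1, 2)
# Row-normalised interaction matrix with within-group links only (not linear-in-means).
G_t = [[0, half, half, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 0, 1, 0]]
# Group dummies D_t (N x M).
D_t = [[1 if g == m else 0 for m in range(2)] for g in groups]

GD = matmul(G_t, D_t)
assert GD == D_t  # peers' group dummies average back to one's own dummy

# Hence (I - psi G_t) maps D_t / (1 - psi) back to D_t,
# i.e. (I - psi G_t)^{-1} D_t = D_t / (1 - psi).
psi = Fraction(1, 3)
scaled = [[Fraction(d) / (1 - psi) for d in row] for row in D_t]
G_scaled = matmul(G_t, scaled)
recovered = [[scaled[i][j] - psi * G_scaled[i][j] for j in range(2)] for i in range(5)]
assert recovered == D_t
```

Any row-normalised, within-group $\mathbf{G}_t$ passes the same check; a single link across groups would break `GD == D_t`.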

We now modify Proposition 1 to allow for endogenous effects, making use of $\mathbf{H}=((\mathbf{G}_{1}^{2})^{\prime},(\mathbf{G}_{2}^{2})^{\prime},\ldots,(\mathbf{G}_{T}^{2})^{\prime})^{\prime}$ and vectors $\mathbf{v}=(\mathbf{v}_{1}^{\prime},\mathbf{v}_{2}^{\prime},\mathbf{v}_{3}^{\prime})^{\prime}$ in the null-space of $[\mathbf{WJ},\mathbf{WG},\mathbf{WH}]$, where $\mathbf{v}_{1},\mathbf{v}_{2},\mathbf{v}_{3}$ are all $N\times 1$.

Proposition 2.

$\psi$ and $\rho$ are identified if $\psi+\rho\neq 0$ and there do not exist a vector $\mathbf{v}=(\mathbf{v}_{1}^{\prime},\mathbf{v}_{2}^{\prime},\mathbf{v}_{3}^{\prime})^{\prime}$ in the null-space of $[\mathbf{WJ},\mathbf{WG},\mathbf{WH}]$ and scalars $\delta_{1},\delta_{2},\delta_{3},\delta_{4}$ satisfying $\delta_{1}\mathbf{v}_{1}-\mathbf{v}_{2}=(\delta_{1}-\delta_{2})\mu^{\alpha}$, $\delta_{3}\mathbf{v}_{1}+\mathbf{v}_{3}=(\delta_{3}-\delta_{4})\mu^{\alpha}$, and either $\delta_{1}\neq\delta_{2}$ or $\delta_{3}\neq\delta_{4}$. Otherwise, $\psi$ and $\rho$ are not identified.

The identification conditions are similar in form to those of Proposition 1. As in Corollary 1.1, generic identification of $\rho$ and $\psi$ is attained whenever $\mathbf{v}$ lies in a subspace of $\mathbb{R}^{N}$ of dimension strictly less than $N$. A sufficient condition that can be checked in practice is that $[\mathbf{WJ},\mathbf{WG},\mathbf{WH}]$ has rank at least $2N+1$. The additional requirement $\psi+\rho\neq 0$ is needed because $\psi+\rho=0$ implies $\mathbb{E}[\mathbf{y}]=\mathbf{J}\mu^{\alpha}+\mathbf{D}(1-\psi)^{-1}\mu^{\gamma}$ (with $\rho=-\psi$ and $\mathbf{FJ}=\mathbf{G}$, we have $(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}(\mathbf{J}-\psi\mathbf{G})=(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}(\mathbf{I}_{NT}-\psi\mathbf{F})\mathbf{J}=\mathbf{J}$), so that $\psi$ and $\rho$ are not identifiable: the endogenous effect exactly offsets the contextual effect, yielding no net peer effect.
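The rank condition can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: we take $\mathbf{J}$ to stack $T$ copies of $\mathbf{I}_N$, $\mathbf{G}$ and $\mathbf{H}$ to stack $\mathbf{G}_t$ and $\mathbf{G}_t^2$, and $\mathbf{W}$ to be the within-group annihilator $\mathbf{I}_{NT}-\mathbf{D}(\mathbf{D}^{\prime}\mathbf{D})^{-1}\mathbf{D}^{\prime}$, which is our reading of the notation of Section 3 (not reproduced here). All helper names (`prop2_rank`, `lim_matrix`, `within_annihilator`) are hypothetical. Ranks are computed exactly over the rationals, so no floating-point tolerance has to be chosen.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def rank(rows):
    """Exact rank of a matrix given as a list of rows (Gaussian elimination over Q)."""
    M = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(rk, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(len(M)):
            if i != rk and M[i][c] != 0:
                f = M[i][c] / M[rk][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

def lim_matrix(g):
    """Linear-in-means G_t: weight 1/n_g on every member of one's own group (self included)."""
    size = {m: g.count(m) for m in set(g)}
    return [[Fraction(1, size[g[i]]) if g[i] == g[j] else Fraction(0) for j in range(len(g))]
            for i in range(len(g))]

def within_annihilator(groups):
    """Assumed W = I - D(D'D)^{-1}D'; rows ordered (t, i), group labels shared across periods."""
    labels = [m for g in groups for m in g]
    count = {m: labels.count(m) for m in set(labels)}
    n = len(labels)
    return [[Fraction(int(r == s)) - (Fraction(1, count[labels[r]]) if labels[r] == labels[s] else 0)
             for s in range(n)] for r in range(n)]

def prop2_rank(groups, G_blocks):
    """Rank of [WJ, WG, WH] given groups[t][i] and interaction matrices G_blocks[t]."""
    T, N = len(groups), len(groups[0])
    I_N = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]
    J = [row for _ in range(T) for row in I_N]
    G = [row for B in G_blocks for row in B]
    H = [row for B in G_blocks for row in matmul(B, B)]
    W = within_annihilator(groups)
    WJ, WG, WH = matmul(W, J), matmul(W, G), matmul(W, H)
    return rank([a + b + c for a, b, c in zip(WJ, WG, WH)])

def sufficient_rank_condition(groups, G_blocks):
    """The sufficient condition from the text: rank [WJ, WG, WH] >= 2N + 1."""
    return prop2_rank(groups, G_blocks) >= 2 * len(groups[0]) + 1

# Hypothetical panel: two groups of two, linear-in-means, with mobility in period 2.
panel = [[0, 0, 1, 1], [0, 1, 0, 1]]
lim = [lim_matrix(g) for g in panel]
print(prop2_rank(panel, lim), sufficient_rank_condition(panel, lim))
```

Any row-normalised, within-group $\mathbf{G}_t$ can be passed in place of `lim_matrix`. Under linear-in-means the condition always fails, since $\mathbf{H}=\mathbf{G}$ duplicates columns and caps the rank at $2N$; with a single period, $\mathbf{WG}=\mathbf{0}$ and the rank collapses to $N-M$.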

As with Proposition 1, Proposition 2 can be related to the work of Bramoullé et al. (2009). Their results imply that, for $T=1$, the peer effects $\dot{\psi}$ and $\dot{\rho}$ in the model

$$\mathbb{E}[\mathbf{y}|\mathbf{X}]=(\mathbf{I}_{N}-\dot{\psi}\mathbf{F})^{-1}(\mathbf{X}\dot{\beta}+\mathbf{GX}\dot{\rho})+\mathbf{D}(1-\dot{\psi})^{-1}\dot{\mu}^{\gamma}\tag{A.7}$$

are identified if and only if $\mathbf{WJ}$, $\mathbf{WG}$ and $\mathbf{WH}$ are linearly independent (i.e., if there does not exist a nonzero $\lambda\in\mathbb{R}^{3}$ such that $\lambda_{1}\mathbf{WJ}+\lambda_{2}\mathbf{WG}+\lambda_{3}\mathbf{WG}^{2}=\mathbf{0}$) and $\dot{\beta}\dot{\psi}+\dot{\rho}\neq 0$. Analogously to the discussion of Proposition 1, this linear independence requirement is weaker than the condition in Proposition 2 and can be satisfied with $T=1$ because $\mathbf{X}$ is assumed to be observable and exogenous. These requirements can be relaxed when $T\geq 2$. The condition $\dot{\beta}\dot{\psi}+\dot{\rho}\neq 0$ is analogous to $\psi+\rho\neq 0$ in Proposition 2, and also rules out the case of no net peer effect.

Under the linear-in-means network ($\mathbf{G}=\overline{\mathbf{G}}$), $\rho$ and $\psi$ are not separately identifiable without additional restrictions. This is because $\overline{\mathbf{G}}_{t}^{2}=\overline{\mathbf{G}}_{t}$, so that $\mathbf{H}=\mathbf{G}$, and we have

$$\mathbb{E}[\mathbf{y}]=\left(\mathbf{J}+\frac{\psi+\rho}{1-\psi}\overline{\mathbf{G}}\right)\mu^{\alpha}+\mathbf{D}(1-\psi)^{-1}\mu^{\gamma}\tag{A.8}$$

so that only the reduced-form parameter $(\psi+\rho)(1-\psi)^{-1}$ is identifiable. Since (A.8) takes the same form as (3.2), the results of Section 3 can be applied to establish identification of $(\psi+\rho)(1-\psi)^{-1}$.
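The collapse to a single reduced-form coefficient can be verified exactly. This minimal sketch (hypothetical group sizes, Python's standard `fractions` module) checks that $(\mathbf{I}-\psi\overline{\mathbf{G}})(\mathbf{I}+c\overline{\mathbf{G}})=\mathbf{I}+\rho\overline{\mathbf{G}}$ with $c=(\psi+\rho)/(1-\psi)$, and then exhibits a pure contextual effect and a pure endogenous effect that give the same $c$. Because $\mu^{\gamma}$ is unrestricted, the differing factor $(1-\psi)^{-1}$ on $\mathbf{D}\mu^{\gamma}$ can be absorbed, so both pairs imply the same $\mathbb{E}[\mathbf{y}]$.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def lim_matrix(g):
    """Linear-in-means G-bar_t: weight 1/n_g on every member of one's group, self included."""
    size = {m: g.count(m) for m in set(g)}
    return [[Fraction(1, size[g[i]]) if g[i] == g[j] else Fraction(0) for j in range(len(g))]
            for i in range(len(g))]

def combo(a, A, b, B):
    """Entrywise a*A + b*B."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

Gbar = lim_matrix([0, 0, 0, 1, 1])      # hypothetical single period, groups of 3 and 2
I = [[Fraction(int(i == j)) for j in range(5)] for i in range(5)]
assert matmul(Gbar, Gbar) == Gbar       # idempotent, so H = G

def reduced_form_coefficient(psi, rho):
    """Return c = (psi + rho)/(1 - psi) after checking (I - psi*Gbar)(I + c*Gbar) = I + rho*Gbar,
    i.e. (I - psi*Gbar)^{-1}(I + rho*Gbar) = I + c*Gbar."""
    c = (psi + rho) / (1 - psi)
    lhs = matmul(combo(1, I, -psi, Gbar), combo(1, I, c, Gbar))
    assert lhs == combo(1, I, rho, Gbar)
    return c

c_a = reduced_form_coefficient(Fraction(0), Fraction(1, 2))   # contextual effect only
c_b = reduced_form_coefficient(Fraction(1, 3), Fraction(0))   # endogenous effect only
print(c_a, c_b)  # 1/2 1/2
```

The second call, with $\rho=0$, is also a check of $(\mathbf{I}-\psi\overline{\mathbf{G}})^{-1}=\mathbf{I}+\frac{\psi}{1-\psi}\overline{\mathbf{G}}$, the identity used in the proof of Proposition 3.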

To separately identify $\rho$ and $\psi$, one can use additional conditional variance restrictions. To this end, we consider the variance

$$\begin{aligned}\mathbb{V}[\mathbf{Wy}]={}&\mathbf{W}(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}(\mathbf{J}+\rho\mathbf{G})\mathbb{V}[\alpha](\mathbf{J}+\rho\mathbf{G})^{\prime}(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})^{-1}\mathbf{W}\\ &+\mathbf{W}(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}\mathbb{V}[\epsilon](\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})^{-1}\mathbf{W}\end{aligned}$$

and the restrictions

$$\mathbb{COV}[\epsilon_{it},\epsilon_{js}]=\begin{cases}\sigma^{2}(g(i,t))&i=j,\ t=s\\ 0&\text{otherwise}\end{cases},\quad\mathbb{COV}[\alpha_{i},\alpha_{j}]=\begin{cases}\sigma^{2}_{\alpha}&i=j\\ 0&\text{otherwise}\end{cases}.\tag{A.9}$$

As with the conditional mean restriction in (3.1) considered thus far, when $\mathbf{G}$ and $\mathbf{D}$ are treated as random, the variance and the restrictions above are conditional on $\mathbf{G},\mathbf{D}$. We continue to present the fixed case for ease of exposition.

Our approach is similar in spirit to Graham (2008) and Rose (2017), which consider a cross-section of networks. The restriction on the variance of $\epsilon$ requires that the transitory shocks experienced by individuals in the same group be uncorrelated and have equal variance. Uncorrelatedness implies that outcomes are correlated only through the correlated effect and the peer effects, whereas equality of variance implies that no individual in a group is subject to systematically larger shocks than the others. Importantly, $\mathbb{COV}[\alpha_{i},\gamma_{g(j,t)}]$ is unrestricted for all $i,j$ and $t$, which allows individuals to sort into groups based on their time-invariant heterogeneity.

We first present the identification result for the linear-in-means network structure. In the next section we derive an identification result that applies to general network structures under weaker conditional variance restrictions: in general, we require only that $\mathbb{COV}[\epsilon_{it},\epsilon_{js}]=0$ if $g(i,t)\neq g(j,s)$. The stronger restriction in (A.9) is necessary for the linear-in-means network, which is well known to be the most challenging case (Manski, 1993; Bramoullé et al., 2009).

Proposition 3.

If $\mathbf{G}=\overline{\mathbf{G}}$, $\psi$ and $\rho$ are identified if $\mathbf{W}$, $\mathbf{WJJ}^{\prime}\mathbf{W}$, $\mathbf{W}(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})\mathbf{W}$, $\mathbf{WGG}^{\prime}\mathbf{W}$ and $\mathbf{WFW}$ are linearly independent.

In contrast to Proposition 2, there is no requirement that $\psi+\rho\neq 0$. This is because the endogenous effect operates through the outcomes, and hence propagates variation in both $\alpha$ and $\epsilon$, whilst the contextual effect operates through $\alpha$ alone. Since they operate through different channels, the peer effects can be separated under restrictions on the within-group variance of $\epsilon$ such as those in (A.9). The identification condition in Proposition 3 fails if $T=1$ because then $\mathbf{F}=\mathbf{G}=\mathbf{G}^{\prime}=\mathbf{GG}^{\prime}$ and $\mathbf{J}=\mathbf{I}_{N}$, so that $\mathbf{W}(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})\mathbf{W}=2\mathbf{WGG}^{\prime}\mathbf{W}=2\mathbf{WFW}$. This is in agreement with Rose (2017), which shows that the conditional variance restrictions in (A.9) are insufficient for identification with a cross-section of networks.¹⁶

¹⁶ Graham (2008) shows that contextual effects can be identified with a cross-section of networks under additional restrictions, including on $\mathbb{COV}[\alpha_{i},\gamma_{g(j,t)}]$.
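The failure of the Proposition 3 condition with $T=1$ can be reproduced numerically. The following is a minimal sketch under the same assumptions as before: $\mathbf{J}$ stacks copies of $\mathbf{I}_N$, $\mathbf{W}$ is taken to be $\mathbf{I}_{NT}-\mathbf{D}(\mathbf{D}^{\prime}\mathbf{D})^{-1}\mathbf{D}^{\prime}$, and the group assignments and helper names are hypothetical. It builds the five matrices of Proposition 3, vectorises them, and reports the dimension of their span (5 would mean linear independence). With one period, $\mathbf{W}\overline{\mathbf{G}}=\mathbf{0}$ under this reading of $\mathbf{W}$, so every matrix except $\mathbf{W}$ and $\mathbf{WJJ}^{\prime}\mathbf{W}=\mathbf{W}$ vanishes.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def rank(rows):
    """Exact rank of a list of row vectors (Gaussian elimination over Q)."""
    M = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(rk, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(len(M)):
            if i != rk and M[i][c] != 0:
                f = M[i][c] / M[rk][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

def lim_matrix(g):
    size = {m: g.count(m) for m in set(g)}
    return [[Fraction(1, size[g[i]]) if g[i] == g[j] else Fraction(0) for j in range(len(g))]
            for i in range(len(g))]

def within_annihilator(groups):
    """Assumed W = I - D(D'D)^{-1}D' with rows ordered (t, i)."""
    labels = [m for g in groups for m in g]
    count = {m: labels.count(m) for m in set(labels)}
    n = len(labels)
    return [[Fraction(int(r == s)) - (Fraction(1, count[labels[r]]) if labels[r] == labels[s] else 0)
             for s in range(n)] for r in range(n)]

def prop3_matrices(groups):
    """W, WJJ'W, W(JG'+GJ')W, WGG'W and WFW for a linear-in-means panel groups[t][i]."""
    T, N = len(groups), len(groups[0])
    Gt = [lim_matrix(g) for g in groups]
    I_N = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]
    J = [row for _ in range(T) for row in I_N]                       # NT x N
    G = [row for B in Gt for row in B]                               # NT x N
    F = [[Gt[t][i][j] if s == t else Fraction(0) for s in range(T) for j in range(N)]
         for t in range(T) for i in range(N)]                        # block diagonal, NT x NT
    W = within_annihilator(groups)
    WJ, WG = matmul(W, J), matmul(W, G)
    JG = matmul(J, transpose(G))
    return [W,
            matmul(WJ, transpose(WJ)),                      # W J J' W (W is symmetric)
            matmul(matmul(W, add(JG, transpose(JG))), W),   # W (JG' + GJ') W
            matmul(WG, transpose(WG)),                      # W G G' W
            matmul(matmul(W, F), W)]                        # W F W

def span_dimension(mats):
    """Dimension of the span of the vectorised matrices."""
    return rank([[x for row in A for x in row] for A in mats])

print(span_dimension(prop3_matrices([[0, 0, 1, 1]])))  # 1
```

Passing a multi-period assignment such as `[[0, 0, 1, 1], [0, 1, 0, 1]]` to `prop3_matrices` shows how mobility across periods changes the span.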

Because Example 1 is so simple, the identification condition in Proposition 3 fails there. The condition typically holds in more realistic examples, however: identification is restored, for instance, if a third individual is added who is in group 2 in both periods.

Conditional Variance Restrictions for a General Network Structure

In the general network case, identification can be attained without restricting the within-group variance of $\epsilon$, in contrast to the linear-in-means network (see the end of the proof of Proposition 3 for an explanation of why restrictions on the within-group variance are necessary for the linear-in-means model). One can use

$$\mathbb{COV}[\epsilon_{it},\epsilon_{js}]=0\quad\text{if } g(i,t)\neq g(j,s),\qquad\mathbb{COV}[\alpha_{i},\alpha_{j}]=\begin{cases}\sigma^{2}_{\alpha}&i=j\\ 0&\text{otherwise}\end{cases}\tag{A.10}$$

instead of (A.9), which leaves the within-group variance of $\epsilon$ unrestricted. We make use of the following definition from Rose (2017), reprinted here for convenience.

  • Definition

    Consider $L$ matrices $\mathbf{A}_{1},\ldots,\mathbf{A}_{L}$ of the same dimension. The matrix $\mathbf{A}_{l}$ ($l\in[L]$) is maximally linearly independent of $\mathbf{A}_{1},\ldots,\mathbf{A}_{L}$ if $\lambda_{l}=0$ for all $\lambda\in\mathbb{R}^{L}$ such that $\sum_{k=1}^{L}\lambda_{k}\mathbf{A}_{k}=\mathbf{0}$.

Note that linear independence of $\mathbf{A}_{1},\ldots,\mathbf{A}_{L}$ is equivalent to each of the $L$ matrices being maximally linearly independent. In what follows we require only that a strict subset be maximally linearly independent, which is weaker than linear independence. Identification hinges on the covariance terms for outcomes of observations in different groups, which are non-zero provided that there is mobility between groups. To extract these covariance terms, we use $\mathcal{E}_{m}\subseteq[NT]$ to denote the indices of the rows of $\mathbf{y}$ corresponding to group $m\in[M]$ and define $\mathbf{E}_{m}$ as the matrix constructed from rows $\mathcal{E}_{m}$ of $\mathbf{I}_{NT}$. Pre-multiplying any conformable matrix by $\mathbf{E}_{m}$ extracts rows $\mathcal{E}_{m}$, and post-multiplying by $\mathbf{E}_{m}^{\prime}$ extracts columns $\mathcal{E}_{m}$. We also use $\mathbf{E}_{-m}$ to extract the rows in the complement of $\mathcal{E}_{m}$.
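Maximal linear independence reduces to a rank comparison: $\mathbf{A}_l$ is maximally linearly independent exactly when $\mathrm{vec}(\mathbf{A}_l)$ is not in the span of the other $\mathrm{vec}(\mathbf{A}_k)$, i.e. when dropping it lowers the rank by one. The following is a minimal sketch of that test in exact arithmetic; the function name `is_maximally_li` and the four example matrices are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Exact rank of a list of row vectors (Gaussian elimination over Q)."""
    M = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(rk, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(len(M)):
            if i != rk and M[i][c] != 0:
                f = M[i][c] / M[rk][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

def is_maximally_li(mats, l):
    """True if every lambda with sum_k lambda_k * mats[k] = 0 has lambda_l = 0,
    i.e. vec(mats[l]) is not in the span of the other vec(mats[k])."""
    vecs = [[x for row in A for x in row] for A in mats]
    others = [v for k, v in enumerate(vecs) if k != l]
    return rank(vecs) == rank(others) + 1

A1 = [[1, 0], [0, 0]]
A2 = [[0, 1], [0, 0]]
A3 = [[1, 1], [0, 0]]   # A3 = A1 + A2, so the four matrices are linearly dependent
A4 = [[0, 0], [0, 1]]
mats = [A1, A2, A3, A4]
print([is_maximally_li(mats, l) for l in range(4)])  # [False, False, False, True]
```

This is the "strict subset" situation described above: $\mathbf{A}_4$ is maximally linearly independent even though the collection as a whole is not linearly independent.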

Proposition 4.

$\psi,\rho,\sigma^{2}_{\alpha}$ are identified if $\psi+\rho\neq 0$ and there exists $m\in[M]$ such that $\mathbf{E}_{m}\mathbf{WJJ}^{\prime}\mathbf{W}\mathbf{E}_{-m}^{\prime}$, $\mathbf{E}_{m}\mathbf{W}(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})\mathbf{W}\mathbf{E}_{-m}^{\prime}$ and $\mathbf{E}_{m}\mathbf{W}(\mathbf{JH}^{\prime}+\mathbf{HJ}^{\prime})\mathbf{W}\mathbf{E}_{-m}^{\prime}$ are maximally linearly independent of $\mathbf{E}_{m}\mathbf{WJJ}^{\prime}\mathbf{W}\mathbf{E}_{-m}^{\prime}$, $\mathbf{E}_{m}\mathbf{W}(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})\mathbf{W}\mathbf{E}_{-m}^{\prime}$, $\mathbf{E}_{m}\mathbf{W}(\mathbf{JH}^{\prime}+\mathbf{HJ}^{\prime})\mathbf{W}\mathbf{E}_{-m}^{\prime}$, $\mathbf{E}_{m}\mathbf{W}(\mathbf{GH}^{\prime}+\mathbf{HG}^{\prime})\mathbf{W}\mathbf{E}_{-m}^{\prime}$, $\mathbf{E}_{m}\mathbf{WGG}^{\prime}\mathbf{W}\mathbf{E}_{-m}^{\prime}$ and $\mathbf{E}_{m}\mathbf{WHH}^{\prime}\mathbf{W}\mathbf{E}_{-m}^{\prime}$.

As in Proposition 2, $\psi+\rho\neq 0$ is required so that the endogenous and contextual effects do not exactly offset one another. This contrasts with Proposition 3, which concerns the linear-in-means network and does not require $\psi+\rho\neq 0$. The reason is that Proposition 3 additionally restricts the within-group variance of $\epsilon$, because restrictions on the between-group variance alone are insufficient for identification in that case. Since the endogenous effect operates through both $\epsilon$ and $\alpha$, whereas the contextual effect operates only through $\alpha$, under restrictions on the within-group variance of $\epsilon$ there are no values of $\psi$ and $\rho$ for which the two peer effects exactly offset one another in the reduced-form variance.

Proof of Proposition 2

Denote $\theta=(\psi,\rho,\mu^{\alpha},\mu^{\gamma})$. Under the conditional moment restriction we have

$$\mathbb{E}[\mathbf{y}_{t}]=(\mathbf{I}_{N}-\psi\mathbf{G}_{t})^{-1}(\mathbf{I}_{N}\mu^{\alpha}+\mathbf{G}_{t}\rho\mu^{\alpha}+\mathbf{D}_{t}\mu^{\gamma})\tag{A.11}$$

for $t\in[T]$. The inverse of $(\mathbf{I}_{N}-\psi\mathbf{G}_{t})$ exists because $\mathbf{G}_{t}\iota_{N}=\iota_{N}$ and $|\psi|<1$. Since $\mathbf{G}_{t}\iota_{N}=\iota_{N}$ and $\mathbf{G}_{ijt}=0$ if $g(i,t)\neq g(j,t)$, we have $\mathbf{G}_{t}\mathbf{D}_{t}=\mathbf{D}_{t}$ and hence $(\mathbf{I}_{N}-\psi\mathbf{G}_{t})^{-1}\mathbf{D}_{t}\mu^{\gamma}=\mathbf{D}_{t}(1-\psi)^{-1}\mu^{\gamma}$, so that

$$\mathbb{E}[\mathbf{y}_{t}]=(\mathbf{I}_{N}-\psi\mathbf{G}_{t})^{-1}(\mathbf{I}_{N}+\rho\mathbf{G}_{t})\mu^{\alpha}+\mathbf{D}_{t}(1-\psi)^{-1}\mu^{\gamma}\tag{A.12}$$

Suppose that there is $\overline{\theta}$ satisfying the conditional moment restriction. Then we have

$$(\mathbf{I}_{N}-\psi\mathbf{G}_{t})^{-1}(\mathbf{I}_{N}+\rho\mathbf{G}_{t})\mu^{\alpha}+\mathbf{D}_{t}(1-\psi)^{-1}\mu^{\gamma}=(\mathbf{I}_{N}-\overline{\psi}\mathbf{G}_{t})^{-1}(\mathbf{I}_{N}+\overline{\rho}\mathbf{G}_{t})\overline{\mu}^{\alpha}+\mathbf{D}_{t}(1-\overline{\psi})^{-1}\overline{\mu}^{\gamma}\tag{A.13}$$

Pre-multiplying both sides by $(\mathbf{I}_{N}-\psi\mathbf{G}_{t})(\mathbf{I}_{N}-\overline{\psi}\mathbf{G}_{t})$, using $\mathbf{G}_{t}\mathbf{D}_{t}=\mathbf{D}_{t}$, and rearranging yields

$$\begin{aligned}&\mathbf{I}_{N}(\mu^{\alpha}-\overline{\mu}^{\alpha})+\mathbf{G}_{t}\left((\rho-\overline{\psi})\mu^{\alpha}-(\overline{\rho}-\psi)\overline{\mu}^{\alpha}\right)+\mathbf{G}_{t}^{2}\left(-\overline{\psi}\rho\mu^{\alpha}+\psi\overline{\rho}\,\overline{\mu}^{\alpha}\right)\\&\quad+\mathbf{D}_{t}\left((1-\overline{\psi})\mu^{\gamma}-(1-\psi)\overline{\mu}^{\gamma}\right)=\mathbf{0}\end{aligned}\tag{A.14}$$

Stacking for $t\in[T]$ yields

$$\begin{aligned}&\mathbf{J}(\mu^{\alpha}-\overline{\mu}^{\alpha})+\mathbf{G}\left((\rho-\overline{\psi})\mu^{\alpha}-(\overline{\rho}-\psi)\overline{\mu}^{\alpha}\right)+\mathbf{H}\left(-\overline{\psi}\rho\mu^{\alpha}+\psi\overline{\rho}\,\overline{\mu}^{\alpha}\right)\\&\quad+\mathbf{D}\left((1-\overline{\psi})\mu^{\gamma}-(1-\psi)\overline{\mu}^{\gamma}\right)=\mathbf{0}\end{aligned}\tag{A.15}$$

and applying $\mathbf{W}$ on the left yields

$$[\mathbf{WJ},\mathbf{WG},\mathbf{WH}]\begin{pmatrix}\mu^{\alpha}-\overline{\mu}^{\alpha}\\ (\rho-\overline{\psi})\mu^{\alpha}-(\overline{\rho}-\psi)\overline{\mu}^{\alpha}\\ -\overline{\psi}\rho\mu^{\alpha}+\psi\overline{\rho}\,\overline{\mu}^{\alpha}\end{pmatrix}=\mathbf{0}\tag{A.16}$$

Let $\mathbf{v}=(\mathbf{v}_{1}^{\prime},\mathbf{v}_{2}^{\prime},\mathbf{v}_{3}^{\prime})^{\prime}$ be a vector in the null-space of $[\mathbf{WJ},\mathbf{WG},\mathbf{WH}]$. Then (A.16) holds if and only if, for some such $\mathbf{v}$,

$$\mathbf{v}_{1}=\mu^{\alpha}-\overline{\mu}^{\alpha},\quad\mathbf{v}_{2}=(\rho-\overline{\psi})\mu^{\alpha}-(\overline{\rho}-\psi)\overline{\mu}^{\alpha},\quad\mathbf{v}_{3}=-\overline{\psi}\rho\mu^{\alpha}+\psi\overline{\rho}\,\overline{\mu}^{\alpha}\tag{A.17}$$

Eliminating $\overline{\mu}^{\alpha}=\mu^{\alpha}-\mathbf{v}_{1}$ yields

$$(\overline{\rho}-\psi)\mathbf{v}_{1}-\mathbf{v}_{2}=(\overline{\rho}-\psi+\overline{\psi}-\rho)\mu^{\alpha},\quad\psi\overline{\rho}\mathbf{v}_{1}+\mathbf{v}_{3}=(\psi\overline{\rho}-\overline{\psi}\rho)\mu^{\alpha}\tag{A.18}$$

These are the conditions of Proposition 2 with $\delta_{1}=\overline{\rho}-\psi$, $\delta_{2}=\rho-\overline{\psi}$, $\delta_{3}=\psi\overline{\rho}$ and $\delta_{4}=\overline{\psi}\rho$. Hence, if there do not exist $\mathbf{v}$ in the null-space of $[\mathbf{WJ},\mathbf{WG},\mathbf{WH}]$ and scalars $\delta_{1},\delta_{2},\delta_{3},\delta_{4}$ satisfying $\delta_{1}\mathbf{v}_{1}-\mathbf{v}_{2}=(\delta_{1}-\delta_{2})\mu^{\alpha}$, $\delta_{3}\mathbf{v}_{1}+\mathbf{v}_{3}=(\delta_{3}-\delta_{4})\mu^{\alpha}$, and either $\delta_{1}\neq\delta_{2}$ or $\delta_{3}\neq\delta_{4}$, then we have

$$\overline{\rho}-\psi+\overline{\psi}-\rho=0,\quad\psi\overline{\rho}-\overline{\psi}\rho=0\tag{A.19}$$

Solving the first equation for $\overline{\psi}$ and substituting into the second yields $(\overline{\rho}-\rho)(\psi+\rho)=0$, hence $\overline{\rho}=\rho$ provided that $\psi+\rho\neq 0$. Substituting $\overline{\rho}=\rho$ into the first equation yields $\overline{\psi}=\psi$.
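The last step can be checked in exact arithmetic. This minimal sketch (standard-library `fractions`; the parameter grid is arbitrary) confirms the factorisation $(\overline{\rho}-\rho)(\psi+\rho)$ and illustrates why $\psi+\rho\neq 0$ is needed: when $\psi+\rho=0$, every pair $(\overline{\psi},\overline{\rho})=(-\overline{\rho},\overline{\rho})$ solves (A.19).

```python
from fractions import Fraction
from itertools import product

def a19(psi, rho, psi_bar, rho_bar):
    """Left-hand sides of the two equations in (A.19)."""
    return (rho_bar - psi + psi_bar - rho, psi * rho_bar - psi_bar * rho)

grid = [Fraction(k, 4) for k in range(-3, 4)]

# Solving the first equation for psi_bar and substituting into the second
# leaves (rho_bar - rho)(psi + rho), as stated in the text.
for psi, rho, rho_bar in product(grid, repeat=3):
    psi_bar = psi + rho - rho_bar
    first, second = a19(psi, rho, psi_bar, rho_bar)
    assert first == 0 and second == (rho_bar - rho) * (psi + rho)

# With psi + rho = 0 the factorised equation holds for every rho_bar,
# so (A.19) has a continuum of solutions (psi_bar, rho_bar) = (-rho_bar, rho_bar).
psi, rho = Fraction(1, 2), Fraction(-1, 2)
observationally_equivalent = [(-rb, rb) for rb in grid if a19(psi, rho, -rb, rb) == (0, 0)]
print(len(observationally_equivalent))  # 7
```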

Proof of Proposition 3

Denote $\theta=(\psi,\rho,\sigma^{2}_{\alpha})^{\prime}$. We have the reduced form

$$\mathbf{y}=(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}(\mathbf{J}+\rho\mathbf{G})\alpha+\mathbf{D}(1-\psi)^{-1}\gamma+(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}\epsilon\tag{A.20}$$

Since $\mathbf{G}=\overline{\mathbf{G}}$, we have $\mathbf{G}_{t}=\mathbf{G}_{t}^{2}$ for $t\in[T]$ and $\mathbf{F}=\mathbf{F}^{2}$, hence $(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}=\mathbf{I}_{NT}+\frac{\psi}{1-\psi}\mathbf{F}$. Using $\mathbf{FJ}=\mathbf{G}$, $\mathbf{FG}=\mathbf{G}$ and $\mathbf{WD}=\mathbf{0}$, we obtain

$$\mathbf{W}\mathbf{y}=\mathbf{W}\left(\mathbf{J}+\frac{\psi+\rho}{1-\psi}\mathbf{G}\right)\alpha+\mathbf{W}\left(\mathbf{I}_{NT}+\frac{\psi}{1-\psi}\mathbf{F}\right)\epsilon\tag{A.21}$$

with variance

\begin{align}
\mathbb{V}[\mathbf{Wy}]&=\sigma^{2}_{\alpha}\mathbf{W}\left(\mathbf{J}+\frac{\psi+\rho}{1-\psi}\mathbf{G}\right)\left(\mathbf{J}+\frac{\psi+\rho}{1-\psi}\mathbf{G}\right)^{\prime}\mathbf{W}\tag{A.22}\\
&\quad+\mathbf{W}\left(\mathbf{I}_{NT}+\frac{\psi}{1-\psi}\mathbf{F}\right)\Sigma\left(\mathbf{I}_{NT}+\frac{\psi}{1-\psi}\mathbf{F}\right)\mathbf{W}\tag{A.23}\\
&=\sigma^{2}_{\alpha}\mathbf{WJJ}^{\prime}\mathbf{W}+\frac{\sigma^{2}_{\alpha}(\psi+\rho)}{1-\psi}\mathbf{W}(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})\mathbf{W}+\frac{\sigma^{2}_{\alpha}(\psi+\rho)^{2}}{(1-\psi)^{2}}\mathbf{W}\mathbf{GG}^{\prime}\mathbf{W}\tag{A.24}\\
&\quad+\mathbf{W}\Sigma\mathbf{W}+\frac{\psi(2-\psi)}{(1-\psi)^{2}}\mathbf{W}\mathbf{F}\Sigma\mathbf{F}\mathbf{W}\tag{A.25}
\end{align}
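The coefficient $\psi(2-\psi)(1-\psi)^{-2}$ in (A.25) comes from a step that is left implicit. Write $c=\psi/(1-\psi)$. Under $\mathbf{G}=\overline{\mathbf{G}}$, $\mathbf{F}$ is symmetric and idempotent; under (A.9), $\Sigma$ is diagonal with entries that are constant within a group, and $\mathbf{F}$ links only observations in the same group and period, so $\mathbf{F}\Sigma=\Sigma\mathbf{F}$ and $\mathbf{F}\Sigma\mathbf{F}=\mathbf{F}^{2}\Sigma=\mathbf{F}\Sigma$. Then

$$\begin{aligned}(\mathbf{I}_{NT}+c\mathbf{F})\Sigma(\mathbf{I}_{NT}+c\mathbf{F})&=\Sigma+c(\mathbf{F}\Sigma+\Sigma\mathbf{F})+c^{2}\mathbf{F}\Sigma\mathbf{F}=\Sigma+(2c+c^{2})\mathbf{F}\Sigma\mathbf{F},\\ 2c+c^{2}&=\frac{\psi}{1-\psi}\cdot\frac{2-\psi}{1-\psi}=\frac{\psi(2-\psi)}{(1-\psi)^{2}}.\end{aligned}$$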

where $\Sigma=\mathbb{V}[\epsilon]$. Consider group $m\in[M]$ and define $\mathcal{E}_{m}\subseteq[NT]$ to be the indices of the rows of $\mathbf{y}$ which correspond to group $m$. Then we can define $\mathbf{E}_{m}$ as the matrix constructed from rows $\mathcal{E}_{m}$ of $\mathbf{I}_{NT}$. Pre-multiplying any conformable matrix by $\mathbf{E}_{m}$ extracts rows $\mathcal{E}_{m}$, and post-multiplying by $\mathbf{E}_{m}^{\prime}$ extracts columns $\mathcal{E}_{m}$. Consider first the within-group variance for group $m$, given by

\begin{align}
&\sigma^{2}_{\alpha}\mathbf{E}_{m}\mathbf{WJJ}^{\prime}\mathbf{W}\mathbf{E}_{m}^{\prime}+\frac{\sigma^{2}_{\alpha}(\psi+\rho)}{1-\psi}\mathbf{E}_{m}\mathbf{W}(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})\mathbf{W}\mathbf{E}_{m}^{\prime}\tag{A.26}\\
&+\frac{\sigma^{2}_{\alpha}(\psi+\rho)^{2}}{(1-\psi)^{2}}\mathbf{E}_{m}\mathbf{W}\mathbf{GG}^{\prime}\mathbf{W}\mathbf{E}_{m}^{\prime}+\sigma^{2}(m)\mathbf{E}_{m}\mathbf{W}\mathbf{E}_{m}^{\prime}+\frac{\sigma^{2}(m)\psi(2-\psi)}{(1-\psi)^{2}}\mathbf{E}_{m}\mathbf{W}\mathbf{F}\mathbf{W}\mathbf{E}_{m}^{\prime}\tag{A.27}
\end{align}

and the between-group covariance, given by

$$\sigma^{2}_{\alpha}\mathbf{E}_{m}\mathbf{WJJ}^{\prime}\mathbf{W}\mathbf{E}_{-m}^{\prime}+\frac{\sigma^{2}_{\alpha}(\psi+\rho)}{1-\psi}\mathbf{E}_{m}\mathbf{W}(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})\mathbf{W}\mathbf{E}_{-m}^{\prime}+\frac{\sigma^{2}_{\alpha}(\psi+\rho)^{2}}{(1-\psi)^{2}}\mathbf{E}_{m}\mathbf{W}\mathbf{GG}^{\prime}\mathbf{W}\mathbf{E}_{-m}^{\prime}\tag{A.28}$$

where $\mathbf{E}_{-m}$ extracts the rows $\mathcal{E}_{m}^{c}$, the complement of $\mathcal{E}_{m}$. Notice first that the between-group covariance alone is insufficient to identify $\psi$ and $\rho$: based only on $\mathbb{COV}[\epsilon_{it},\epsilon_{jt}]=0$ for all $(i,j,t)\in[N]^{2}\times[T]$ with $g(i,t)\neq g(j,t)$, only $(\psi+\rho)(1-\psi)^{-1}$ is identifiable. However, under the additional restrictions on the within-group variance in (A.9), we obtain two additional terms, which also allow identification of $\psi(2-\psi)(1-\psi)^{-2}$. Now suppose that there is $\overline{\theta}$ satisfying the conditional variance restrictions in (A.9). Then, if $\mathbf{W}$, $\mathbf{WJJ}^{\prime}\mathbf{W}$, $\mathbf{W}(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})\mathbf{W}$, $\mathbf{WGG}^{\prime}\mathbf{W}$ and $\mathbf{WFW}$ are linearly independent, there exists $m$ such that $\overline{\sigma}^{2}(m)=\sigma^{2}(m)$, $\overline{\sigma}^{2}_{\alpha}=\sigma^{2}_{\alpha}$ and

$$\frac{\psi+\rho}{1-\psi}=\frac{\overline{\psi}+\overline{\rho}}{1-\overline{\psi}},\quad\frac{\psi(2-\psi)}{(1-\psi)^{2}}=\frac{\overline{\psi}(2-\overline{\psi})}{(1-\overline{\psi})^{2}}\tag{A.29}$$

These two equations have two solutions: $\overline{\psi}=\psi,\ \overline{\rho}=\rho$ and $\overline{\psi}=2-\psi,\ \overline{\rho}=-2-\rho$. (The second equation gives $\overline{\psi}\in\{\psi,2-\psi\}$; for $\overline{\psi}=2-\psi$ we have $1-\overline{\psi}=-(1-\psi)$, so the first equation gives $\overline{\psi}+\overline{\rho}=-(\psi+\rho)$.) Since $|\psi|<1$, we have $|2-\psi|>1$, so the second solution is infeasible. Hence we obtain $\overline{\theta}=\theta$.
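A quick exact check that the spurious root of (A.29) is genuinely observationally equivalent but excluded by $|\psi|<1$. This is a minimal sketch with an arbitrary grid of admissible $(\psi,\rho)$; the second root is $\overline{\psi}=2-\psi$, and solving the first equation of (A.29) then gives $\overline{\rho}=-2-\rho$.

```python
from fractions import Fraction

def a29(psi, rho):
    """The two reduced-form quantities equated in (A.29)."""
    return ((psi + rho) / (1 - psi), psi * (2 - psi) / (1 - psi) ** 2)

checked = 0
for psi in [Fraction(k, 5) for k in range(-4, 5)]:        # |psi| < 1
    for rho in [Fraction(k, 3) for k in range(-3, 4)]:
        psi_bar, rho_bar = 2 - psi, -2 - rho               # the second root
        assert a29(psi_bar, rho_bar) == a29(psi, rho)       # observationally equivalent...
        assert abs(psi_bar) > 1                             # ...but outside |psi| < 1
        checked += 1
print(checked)  # 63
```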

Proof of Proposition 4

Denote $\theta=(\psi,\rho,\sigma^{2}_{\alpha})^{\prime}$. Under the variance restriction (A.10) we have

\begin{align}
\mathbb{V}[\mathbf{y}]={}&\sigma^{2}_{\alpha}(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}(\mathbf{J}+\rho\mathbf{G})(\mathbf{J}+\rho\mathbf{G})^{\prime}(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})^{-1}+\frac{1}{(1-\psi)^{2}}\mathbf{D}\Sigma_{\gamma}\mathbf{D}^{\prime}\tag{A.30}\\
&+(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}\Sigma(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})^{-1}+\frac{1}{1-\psi}(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}(\mathbf{J}+\rho\mathbf{G})\Sigma_{\alpha\gamma}\mathbf{D}^{\prime}\tag{A.31}\\
&+\frac{1}{1-\psi}\mathbf{D}\Sigma_{\alpha\gamma}^{\prime}(\mathbf{J}+\rho\mathbf{G})^{\prime}(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})^{-1}+\frac{1}{1-\psi}(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}\Sigma_{\epsilon\gamma}\mathbf{D}^{\prime}\tag{A.32}\\
&+\frac{1}{1-\psi}\mathbf{D}\Sigma_{\epsilon\gamma}^{\prime}(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})^{-1}\tag{A.33}
\end{align}

where $\Sigma$ is the variance of $\epsilon$, $\Sigma_{\gamma}$ the variance of $\gamma$, $\Sigma_{\alpha\gamma}$ collects the covariances between $\alpha$ and $\gamma$, and $\Sigma_{\epsilon\gamma}$ those between $\epsilon$ and $\gamma$. Suppose that there is $\overline{\theta}$ satisfying the variance restriction in (A.10). Then we have

\begin{align}
&\sigma^{2}_{\alpha}(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}(\mathbf{J}+\rho\mathbf{G})(\mathbf{J}+\rho\mathbf{G})^{\prime}(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})^{-1}+\frac{1}{(1-\psi)^{2}}\mathbf{D}\Sigma_{\gamma}\mathbf{D}^{\prime}\tag{A.34}\\
&+(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}\Sigma(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})^{-1}+\frac{1}{1-\psi}(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}(\mathbf{J}+\rho\mathbf{G})\Sigma_{\alpha\gamma}\mathbf{D}^{\prime}\tag{A.35}\\
&+\frac{1}{1-\psi}\mathbf{D}\Sigma_{\alpha\gamma}^{\prime}(\mathbf{J}+\rho\mathbf{G})^{\prime}(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})^{-1}+\frac{1}{1-\psi}(\mathbf{I}_{NT}-\psi\mathbf{F})^{-1}\Sigma_{\epsilon\gamma}\mathbf{D}^{\prime}\tag{A.36}\\
&+\frac{1}{1-\psi}\mathbf{D}\Sigma_{\epsilon\gamma}^{\prime}(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})^{-1}=\tag{A.37}\\
&\overline{\sigma}^{2}_{\alpha}(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F})^{-1}(\mathbf{J}+\overline{\rho}\mathbf{G})(\mathbf{J}+\overline{\rho}\mathbf{G})^{\prime}(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F}^{\prime})^{-1}+\frac{1}{(1-\overline{\psi})^{2}}\mathbf{D}\overline{\Sigma}_{\gamma}\mathbf{D}^{\prime}\tag{A.38}\\
&+(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F})^{-1}\overline{\Sigma}(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F}^{\prime})^{-1}+\frac{1}{1-\overline{\psi}}(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F})^{-1}(\mathbf{J}+\overline{\rho}\mathbf{G})\overline{\Sigma}_{\alpha\gamma}\mathbf{D}^{\prime}\tag{A.39}\\
&+\frac{1}{1-\overline{\psi}}\mathbf{D}\overline{\Sigma}_{\alpha\gamma}^{\prime}(\mathbf{J}+\overline{\rho}\mathbf{G})^{\prime}(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F}^{\prime})^{-1}+\frac{1}{1-\overline{\psi}}(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F})^{-1}\overline{\Sigma}_{\epsilon\gamma}\mathbf{D}^{\prime}\tag{A.40}\\
&+\frac{1}{1-\overline{\psi}}\mathbf{D}\overline{\Sigma}_{\epsilon\gamma}^{\prime}(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F}^{\prime})^{-1}\tag{A.41}
\end{align}

Pre-multiplying both sides by $\mathbf{W}(\mathbf{I}_{NT}-\psi\mathbf{F})(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F})$ and post-multiplying by $(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F}^{\prime})\mathbf{W}$ (all terms involving $\mathbf{D}$ vanish because $\mathbf{FD}=\mathbf{D}$ and $\mathbf{WD}=\mathbf{0}$) yields

\begin{align}
&\sigma^{2}_{\alpha}\mathbf{W}(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F})(\mathbf{J}+\rho\mathbf{G})(\mathbf{J}+\rho\mathbf{G})^{\prime}(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F}^{\prime})\mathbf{W}+\mathbf{W}(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F})\Sigma(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F}^{\prime})\mathbf{W}=\nonumber\\
&\overline{\sigma}^{2}_{\alpha}\mathbf{W}(\mathbf{I}_{NT}-\psi\mathbf{F})(\mathbf{J}+\overline{\rho}\mathbf{G})(\mathbf{J}+\overline{\rho}\mathbf{G})^{\prime}(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})\mathbf{W}+\mathbf{W}(\mathbf{I}_{NT}-\psi\mathbf{F})\overline{\Sigma}(\mathbf{I}_{NT}-\psi\mathbf{F}^{\prime})\mathbf{W}\tag{A.42}
\end{align}

Rearranging yields

\begin{align}
&(\sigma^{2}_{\alpha}-\overline{\sigma}^{2}_{\alpha})\mathbf{WJJ}^{\prime}\mathbf{W}+(\sigma^{2}_{\alpha}(\rho-\overline{\psi})-\overline{\sigma}^{2}_{\alpha}(\overline{\rho}-\psi))\mathbf{W}(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})\mathbf{W}\tag{A.43}\\
&+(\overline{\sigma}^{2}_{\alpha}\psi\overline{\rho}-\sigma^{2}_{\alpha}\overline{\psi}\rho)\mathbf{W}(\mathbf{JH}^{\prime}+\mathbf{HJ}^{\prime})\mathbf{W}+(\overline{\sigma}^{2}_{\alpha}(\overline{\rho}-\psi)\psi\overline{\rho}-\sigma^{2}_{\alpha}(\rho-\overline{\psi})\overline{\psi}\rho)\mathbf{W}(\mathbf{GH}^{\prime}+\mathbf{HG}^{\prime})\mathbf{W}\tag{A.44}\\
&+(\sigma^{2}_{\alpha}(\rho-\overline{\psi})^{2}-\overline{\sigma}^{2}_{\alpha}(\overline{\rho}-\psi)^{2})\mathbf{WGG}^{\prime}\mathbf{W}+(\sigma^{2}_{\alpha}\overline{\psi}^{2}\rho^{2}-\overline{\sigma}^{2}_{\alpha}\psi^{2}\overline{\rho}^{2})\mathbf{WHH}^{\prime}\mathbf{W}+\mathbf{W}(\Sigma-\overline{\Sigma})\mathbf{W}\tag{A.45}\\
&+\mathbf{W}(\mathbf{F}(\psi\overline{\Sigma}-\overline{\psi}\Sigma)+(\psi\overline{\Sigma}-\overline{\psi}\Sigma)\mathbf{F}^{\prime})\mathbf{W}+\mathbf{WF}(\overline{\psi}^{2}\Sigma-\psi^{2}\overline{\Sigma})\mathbf{F}^{\prime}\mathbf{W}=\mathbf{0}\tag{A.46}
\end{align}
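The coefficients in (A.43) to (A.45) follow from expanding the first term on each side of (A.42). Because $\mathbf{F}$ is block diagonal with blocks $\mathbf{G}_t$, while $\mathbf{J}$, $\mathbf{G}$ and $\mathbf{H}$ stack $\mathbf{I}_N$, $\mathbf{G}_t$ and $\mathbf{G}_t^{2}$, we have $\mathbf{FJ}=\mathbf{G}$ and $\mathbf{FG}=\mathbf{H}$. Writing $a=\rho-\overline{\psi}$ and $b=-\overline{\psi}\rho$,

$$\begin{aligned}(\mathbf{I}_{NT}-\overline{\psi}\mathbf{F})(\mathbf{J}+\rho\mathbf{G})&=\mathbf{J}+a\mathbf{G}+b\mathbf{H},\\ (\mathbf{J}+a\mathbf{G}+b\mathbf{H})(\mathbf{J}+a\mathbf{G}+b\mathbf{H})^{\prime}&=\mathbf{JJ}^{\prime}+a(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})+b(\mathbf{JH}^{\prime}+\mathbf{HJ}^{\prime})+ab(\mathbf{GH}^{\prime}+\mathbf{HG}^{\prime})+a^{2}\mathbf{GG}^{\prime}+b^{2}\mathbf{HH}^{\prime}.\end{aligned}$$

The right-hand side of (A.42) expands in the same way with $\overline{a}=\overline{\rho}-\psi$ and $\overline{b}=-\psi\overline{\rho}$; multiplying by $\sigma^{2}_{\alpha}$ and $\overline{\sigma}^{2}_{\alpha}$ respectively and subtracting gives, for example, the coefficient $\sigma^{2}_{\alpha}ab-\overline{\sigma}^{2}_{\alpha}\overline{a}\,\overline{b}=\overline{\sigma}^{2}_{\alpha}(\overline{\rho}-\psi)\psi\overline{\rho}-\sigma^{2}_{\alpha}(\rho-\overline{\psi})\overline{\psi}\rho$ on $\mathbf{W}(\mathbf{GH}^{\prime}+\mathbf{HG}^{\prime})\mathbf{W}$.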

Since the within-group variance of $\epsilon$ is not restricted by (A.10), identification hinges on the between-group covariance of the outcomes. Consider the covariances of the outcomes of group $m$ with those in all other groups, for which we have the equations

\begin{align}
&(\sigma^{2}_{\alpha}-\overline{\sigma}^{2}_{\alpha})\mathbf{E}_{m}\mathbf{WJJ}^{\prime}\mathbf{W}\mathbf{E}_{-m}^{\prime}+(\sigma^{2}_{\alpha}(\rho-\overline{\psi})-\overline{\sigma}^{2}_{\alpha}(\overline{\rho}-\psi))\mathbf{E}_{m}\mathbf{W}(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})\mathbf{W}\mathbf{E}_{-m}^{\prime}\nonumber\\
&+(\overline{\sigma}^{2}_{\alpha}\psi\overline{\rho}-\sigma^{2}_{\alpha}\overline{\psi}\rho)\mathbf{E}_{m}\mathbf{W}(\mathbf{JH}^{\prime}+\mathbf{HJ}^{\prime})\mathbf{W}\mathbf{E}_{-m}^{\prime}\tag{A.47}\\
&+(\overline{\sigma}^{2}_{\alpha}(\overline{\rho}-\psi)\psi\overline{\rho}-\sigma^{2}_{\alpha}(\rho-\overline{\psi})\overline{\psi}\rho)\mathbf{E}_{m}\mathbf{W}(\mathbf{GH}^{\prime}+\mathbf{HG}^{\prime})\mathbf{W}\mathbf{E}_{-m}^{\prime}\tag{A.48}\\
&+(\sigma^{2}_{\alpha}(\rho-\overline{\psi})^{2}-\overline{\sigma}^{2}_{\alpha}(\overline{\rho}-\psi)^{2})\mathbf{E}_{m}\mathbf{WGG}^{\prime}\mathbf{W}\mathbf{E}_{-m}^{\prime}\tag{A.49}\\
&+(\sigma^{2}_{\alpha}\overline{\psi}^{2}\rho^{2}-\overline{\sigma}^{2}_{\alpha}\psi^{2}\overline{\rho}^{2})\mathbf{E}_{m}\mathbf{WHH}^{\prime}\mathbf{W}\mathbf{E}_{-m}^{\prime}=\mathbf{0}\tag{A.50}
\end{align}

where $\mathbf{E}_{m}$ and $\mathbf{E}_{-m}$ are defined in the proof of Proposition 3. If there exists $m\in[M]$ such that $\mathbf{E}_{m}\mathbf{WJJ}^{\prime}\mathbf{W}\mathbf{E}_{-m}^{\prime}$, $\mathbf{E}_{m}\mathbf{W}(\mathbf{JG}^{\prime}+\mathbf{GJ}^{\prime})\mathbf{W}\mathbf{E}_{-m}^{\prime}$ and $\mathbf{E}_{m}\mathbf{W}(\mathbf{JH}^{\prime}+\mathbf{HJ}^{\prime})\mathbf{W}\mathbf{E}_{-m}^{\prime}$ are maximally linearly independent of the matrices in (A.47) to (A.50), then their coefficients must vanish, so that $\sigma^{2}_{\alpha}=\overline{\sigma}^{2}_{\alpha}$ and

$$\overline{\rho}-\psi+\overline{\psi}-\rho=0,\quad\psi\overline{\rho}-\overline{\psi}\rho=0\tag{A.51}$$

Solving the first equation for $\overline{\psi}$ and substituting into the second yields $(\overline{\rho}-\rho)(\psi+\rho)=0$, hence $\overline{\rho}=\rho$ provided that $\psi+\rho\neq 0$. Substituting into the first equation yields $\overline{\psi}=\psi$.