Peer Effects with Miss-specified Peer Groups

Christiern Rose (UQ), Lizi Yu (UQ)

1.   Introduction

We address two challenges faced by researchers studying peer effects, each of which can lead to miss-specification of peer groups. The first challenge is that standard methods require that the researcher has access to group data: a sample of groups which includes the outcome and characteristics of all members (see Bramoullé et al. (2020) for a recent review). Such data are often proprietary or restricted-access (Breza et al., 2020), and widely available individual level survey data cannot be used. Without group data, empirical practice is either to drop individuals with missing peer data, leading to sample selection and loss of information, or to use only non-missing peers, leading to measurement error. Measurement error also arises if the researcher is unaware that peers are missing. We refer to this as the missing data problem, and propose a solution which corrects for measurement error and makes full use of the available information.

The second challenge is that the researcher has to choose the relevant peer group, often from a set of candidate group structures. For example, studies based on the Dartmouth room-mate data, in which college freshmen were randomly allocated to dorm rooms, have noted that it is not clear whether peer effects operate at the room or the floor level (Sacerdote, 2001; Glaeser et al., 2003; Angrist, 2014), and the relevant group may be different for different outcomes, such as academic attainment and fraternity membership (Sacerdote, 2001). Empirical practice is to conduct a robustness test by re-estimating the peer effects for each candidate group. This implicitly assumes that the relevant group is the same for all individuals (i.e., it is deterministic), but there is no reason for this to be the case. In the Dartmouth context, some dorms may simply be more sociable than others, and the researcher is unlikely to know which ones. Moreover, the researcher does not know which (if any) of the estimates are valid, which can be problematic when there are large differences in the estimates. We refer to this as the group uncertainty problem, and propose a solution based on random peer group structure.

We first show that missing data and group uncertainty are examples of a wider class of miss-specification in which the members of the specified peer group are a subset of the members of the (true) group. This is clear for missing data if the specified group comprises the observed group members. For group uncertainty, it is applicable in the typical empirical setting in which the candidate groups are nested and the specified group is the smallest group. For example, in the Dartmouth context rooms are nested within floors and groups can be specified based on rooms.

Under subset-miss-specification, we show that peer effects can be identified using assumptions which allow the distribution of the group size to be inferred. With missing data, we show that widely used assumptions on the missingness mechanism allow the group size to be inferred by restricting the distribution of the group size conditional on the specified group size (i.e., on the number of observed individuals from that group). For group uncertainty, we suppose that there are two nested candidate groups (e.g., dorm rooms and floors), from which the relevant group is exogenously determined at random. [Footnote 1: The extension to three or more candidate groups is straightforward.] This also restricts the distribution of the group size conditional on the sizes of the two candidate groups.

We apply our approach to settings with both small and medium-large group sizes. Our Monte-Carlo experiment is designed to mimic the Dartmouth room-mate data used in Sacerdote (2001), Glaeser et al. (2003) and Angrist (2014), in which students were assigned to dorm rooms of size 2, 3 or 4. Our results demonstrate that the biases arising from ignoring missing data and group uncertainty can be large, but are corrected by the GMM and NLS estimators we propose.

Our empirical application studies how peer ability impacts lawyers’ decision to exit the local legal market using unique employer-employee matched data comprising all lawyers practicing in Shanghai, China. Peer groups are specified based on the number of junior lawyers practicing in the same firm, with a mean of 11.2 and a maximum of 482. We find that the likelihood of a lawyer’s quit-to-exit increases in the proportion of high-ability peers, which is consistent with the invidious comparison model (Hoxby and Weingarth, 2005; Antecol et al., 2016). We apply our results on missing data to show that similar estimates and qualitative conclusions could have been reached had we only had access to an individual level sample of lawyers, rather than the group data we use. Finally, we apply our results on group uncertainty to study whether peer effects operate at the firm or firm-cohort levels, and find evidence of considerable heterogeneity in the relevant peer group across firms.

1.1.   Related literature

The literature on sampled networks, mismeasured networks and missing data has recently attracted much interest. Lewbel et al. (2019) and Hardy et al. (2019) consider the implications of measurement error in the network through which peer effects operate. Lewbel et al. (2019) also show that peer effects can be identified and consistently estimated when there is no information on the network beyond group identifiers. Boucher and Houndetoungan (2020) study peer effects when the network is unknown but its distribution is known or can be consistently estimated.

Chandrasekhar and Lewis (2011) study the implications of using sampled network data in applied work, provide analytical corrections in some examples of interest and develop a more general graphical reconstruction approach. In the context of peer effects, the analytical correction can be applied if the researcher has data on the network, outcomes and exogenous characteristics for a subsample of individuals and knows the identities, outcomes and exogenous characteristics of all of their peers. This means that data are only missing for peers-of-peers. To apply graphical reconstruction, for every individual, the researcher must observe the outcomes, exogenous characteristics and variables which can be used to predict the propensity of individuals to form links in a network formation model.

Breza et al. (2020) use a parametric network formation model to show that network structure can be identified and used to construct network statistics (e.g., centrality measures) if the researcher does not collect network data for all individuals, but instead collects relational data for a sub-sample of individuals and basic characteristics of all individuals. [Footnote 2: Relational data are collected by asking questions of the form ‘Consider all the individuals in your group with whom you do activity X. How many of these have trait Y?’ (Breza et al., 2020).] Sojourner (2013), Wang and Lee (2013) and Liu et al. (2017) consider the case in which the network and all individuals are observed without error but there are missing data in the outcomes and/or exogenous characteristics.

All of the above papers suppose that the researcher observes at least some data on every individual. In contrast, we allow for some individuals to be entirely missing and do not require knowledge of whether such individuals even exist. These gains come from exploiting the structure of the group interactions model we consider, which, though widely used in practice (e.g., Moffitt et al. (2001); Sacerdote (2001); Glaeser et al. (2003); Angrist (2014); Boucher et al. (2014); Cornelissen et al. (2017)) and well understood theoretically (e.g., Lee (2007); Bramoullé et al. (2009); Davezies et al. (2009); Bramoullé et al. (2020)), is less general than some. [Footnote 3: We discuss this issue further in Section 8.]

Our work is also related to the literature on multiple peer effects. Goldsmith-Pinkham and Imbens (2013), Arduini et al. (2020) and Reza et al. (2021) study a setting in which individuals are exposed to multiple peer effects, each operating through a different network. In contrast, we suppose that individuals are exposed to peer effects operating through one network, but there is uncertainty as to which is the relevant network. Our model is thus not a model of multiple peer effects, but one of peer effects with uncertain peer groups. The approaches are not observationally equivalent. With multiple peer effects, the endogenous peer effects propagate through a network formed by taking the union of the links over multiple networks. In contrast, with uncertain peer groups, the endogenous effects propagate through only one network, the identity of which is unknown.

Our empirical application contributes to the literature on peer effects in competitive environments, particularly with respect to the impact of peers’ ability on lower- and medium-achieving individuals. One strand of this literature focuses on sports competitions and tournaments and mostly finds a negative impact. Brown (2011), Smith (2013) and Emerson and Hill (2018) find a negative peer ability effect, in contrast to Guryan et al. (2009) who find no effect. Yamane and Hayashi (2015) find positive effects when subjects are ahead and negative effects when they are behind. Another strand of this literature focuses on educational outcomes and reports mixed findings. Some studies find significant peer effects (for example, Hoxby (2000) and Sacerdote (2001) find a positive effect and Antecol et al. (2016) find a negative effect), and others find small to no effects (for example, Foster (2006) and Angrist and Lang (2004)). Brady et al. (2017) find negative effects at a broader level but positive effects at a smaller level, suggesting that peer effects operate differently for different groups. The overall perception is that effects are not identical across different ability groups. Low-achieving groups are adversely affected by peer ability and may benefit from tracking compared to ability-mixing groups (Duflo et al., 2011; Carrell et al., 2013; Booij et al., 2017). Our work builds on these papers by examining career decisions in the highly competitive and incentivized labor market for junior lawyers. Our results are more in line with the literature finding that peer ability hurts subjects’ persistence in education and in their chosen career (Thiemann, 2022; Howell, 2021).

2.   Model

We consider a group interactions model in which the data comprise an i.i.d. sample of $G_{0}$ groups. The asymptotic is $G_{0}\to\infty$, which is analogous to the large-$N$ fixed-$T$ asymptotic used for microeconometric panel data (Bramoullé et al., 2009). Individual $i\in\{1,2,...,N_{0}\}$ is in exactly one group denoted $g_{0}(i)\in\{1,2,...,G_{0}\}$. The size of $i$’s group is $n_{g_{0}(i)}\geq 2$. When it is not required, we omit the dependence on $i$ and simply refer to $g_{0}$ and $n_{g_{0}}$. The continuous outcome $y_{i}$ of individual $i$ is determined by

\[
y_{i}=\alpha_{g_{0}}+\left(\frac{1}{n_{g_{0}}-1}\sum_{\substack{j:g_{0}(j)=g_{0}(i)\\ j\neq i}}y_{j}\right)\beta+\left(\frac{1}{n_{g_{0}}-1}\sum_{\substack{j:g_{0}(j)=g_{0}(i)\\ j\neq i}}x_{j}\right)\delta+x_{i}\gamma+\epsilon_{i}, \tag{2.1}
\]

where $\alpha_{g_{0}}$ is group unobserved heterogeneity, $x_{i}$ is an exogenous characteristic and $\epsilon_{i}$ is an error term satisfying

\[
\mathbb{E}[\epsilon_{i}|\alpha_{g_{0}},n_{g_{0}},\mathbf{x}_{g_{0}}]=0\quad\forall i\in\{1,2,...,N_{0}\}, \tag{2.2}
\]

where $\mathbf{x}_{g_{0}}=(x_{j})_{j:g_{0}(j)=g_{0}(i)}$ is $n_{g_{0}}\times 1$. [Footnote 4: Since we only consider the within-transformation of the fixed effects model below, (2.2) can be relaxed to $\mathbb{E}[\epsilon_{i}|n_{g_{0}},\mathbf{x}_{g_{0}}]=0$. We maintain the form of (2.2) to maintain comparability with the literature.] The moment condition supposes that group size and the characteristics of the group members are exogenous with respect to $\epsilon_{i}$. However, both can be arbitrarily dependent on $\alpha_{g_{0}}$, which allows for selection of individuals into groups. The model includes correlated effects (captured by $\alpha_{g_{0}}$), endogenous peer effects ($\beta$) and contextual peer effects ($\delta$).

For simplicity of exposition, following Bramoullé et al. (2009) we present results for the case where there is a single characteristic $x_{i}$. All results continue to apply in the more general case in which $x_{i}$, $\gamma$ and $\delta$ are vectors. As explained in Section 1.1, this model is well understood from a theoretical perspective and is pervasive in applied work. Moreover, it can be microfounded based on a game in which utility depends on one’s own action and the average action of the others in the group (see Blume et al. (2015); Bramoullé et al. (2020)).

To ensure that the reduced form of (2.1) exists, as is standard in the literature, we suppose that $|\beta|<1$ (i.e., that the endogenous peer effect is not explosive), yielding

\begin{align}
y_{i} &=\frac{\alpha_{g_{0}}}{1-\beta}+\left(\sum_{\substack{j:g_{0}(j)=g_{0}(i)\\ j\neq i}}x_{j}\right)\pi_{1}(n_{g_{0}})+x_{i}\pi_{2}(n_{g_{0}})+u_{i}, \tag{2.3}\\
\pi_{1}(n) &\triangleq\frac{(\delta+\beta\gamma)(n-1)^{-1}}{1-\beta(\beta+n-2)(n-1)^{-1}},\quad\pi_{2}(n)\triangleq\frac{\gamma+\beta(\delta-\gamma(n-2))(n-1)^{-1}}{1-\beta(\beta+n-2)(n-1)^{-1}}, \notag
\end{align}

where $u_{i}$ is the reduced form error. Applying the within-group transformation,

\[
\widetilde{y}_{i}\triangleq y_{i}-\frac{1}{n_{g_{0}}}\sum_{j:g_{0}(j)=g_{0}(i)}y_{j},
\]

to the reduced form yields,

\[
\widetilde{y}_{i}=\widetilde{x}_{i}\pi(n_{g_{0}})+\widetilde{u}_{i},\quad\pi(n)\triangleq\frac{\gamma-\delta(n-1)^{-1}}{1+\beta(n-1)^{-1}}. \tag{2.4}
\]

Though other transformations can be used, we focus on the within-group transformation since it delivers the least information loss (see Section 3.3 of Bramoullé et al. (2009)).
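As a numerical sanity check of (2.4), the following sketch solves the structural equation (2.1) for a single group and compares the within-group-demeaned outcome with $\widetilde{x}_{i}\pi(n_{g_{0}})$. The parameter values, the helper names `pi_within` and `solve_group`, and the use of NumPy are illustrative assumptions rather than part of the paper; the structural error is set to zero so that the equality should hold up to floating-point error.

```python
import numpy as np

def pi_within(n, beta, gamma, delta):
    """Reduced-form coefficient pi(n) from equation (2.4)."""
    return (gamma - delta / (n - 1)) / (1 + beta / (n - 1))

def solve_group(x, alpha, beta, gamma, delta, eps):
    """Solve the structural equation (2.1) for one group of size n = len(x)."""
    n = len(x)
    # W averages over the other n - 1 members of the group.
    W = (np.ones((n, n)) - np.eye(n)) / (n - 1)
    rhs = alpha + delta * (W @ x) + gamma * x + eps
    return np.linalg.solve(np.eye(n) - beta * W, rhs)

rng = np.random.default_rng(0)
n, alpha, beta, gamma, delta = 5, 1.5, 0.4, 1.0, 0.7  # hypothetical values
x = rng.normal(size=n)
y = solve_group(x, alpha, beta, gamma, delta, eps=np.zeros(n))

y_tilde = y - y.mean()  # within-group transformation of the outcome
x_tilde = x - x.mean()  # within-group transformation of the characteristic
max_gap = np.max(np.abs(y_tilde - x_tilde * pi_within(n, beta, gamma, delta)))
print(max_gap)  # numerically zero when eps = 0
```

The group fixed effect and the group sums drop out under demeaning, which is why only $\pi(n)$ survives in the transformed equation.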

3.   (Possibly) Miss-specified Groups

The researcher specifies the model in Section 2 with group $g(i)\in\{1,...,G\}$ for individual $i$. The size of group $g(i)$ is $n_{g(i)}$. As with $g_{0}(i)$, we omit the dependence of $g(i)$ on $i$ unless it is required. The specified groups may cover only a subset $\mathcal{S}\subseteq\{1,...,N_{0}\}$ of individuals of cardinality $N\leq N_{0}$. This could arise, for example, if the available data comprise a random sample of the $N_{0}$ individuals. However, we maintain that every individual in $\mathcal{S}$ is in exactly one specified group. We also make use of the within-specified-group transformation,

\[
\overline{y}_{i}\triangleq y_{i}-\frac{1}{n_{g}}\sum_{j:g(j)=g(i)}y_{j},
\]

which would be applied to eliminate specified group fixed effects. To build intuition on the implications of miss-specification for identification, we now consider two examples and derive their implications for the reduced form, which will later be used for our identification analysis.

Example 1 - Superset miss-specification. Suppose that two groups, each of size $n_{0}/2$, are combined to form a specified group of size $n_{0}$. Solving for the reduced form and applying the within-specified-group transformation yields, for individual $i$ in group 1,

\begin{align}
\overline{y}_{i} &=\overline{x}_{i}\pi(n_{0})+(a+b+c+\overline{u}_{i}) \tag{3.1}\\
&=\overline{x}_{i}\pi(n_{0})+\overline{v}_{i}, \tag{3.2}
\end{align}

where,

\begin{gather*}
a=\frac{\alpha_{1}-\alpha_{2}}{2(1-\beta)},\quad b=\left(\frac{\gamma+\delta}{1-\beta}\right)\frac{1}{n_{0}}\left(\sum_{j:g_{0}(j)=1}x_{j}-\sum_{j:g_{0}(j)=2}x_{j}\right),\\
c=-\frac{\overline{x}_{i}n_{0}(\gamma\beta+\delta)}{2(n_{0}-1+\beta)(n_{0}/2-1+\beta)}.
\end{gather*}

There are three sources of miss-specification, each corresponding to a term in equation (3.1). First, the researcher miss-specifies the correlated effect, yielding the term $a$. Second, by specifying only one group the researcher erroneously includes the exogenous characteristics and outcomes of those in group 2 in the structural equation determining the outcomes for group 1. This yields the term $b$. Finally, the researcher miss-specifies the group size to be $n_{0}$ instead of $n_{0}/2$. This leads the reduced form parameter to be incorrectly specified as $\pi(n_{0})$ rather than $\pi(n_{0}/2)$, yielding the term $c$. Due to these terms, the error $\overline{v}_{i}$ depends on $\overline{x}_{i}$ and $n_{0}$.

Example 2 - Subset miss-specification. Now switch the roles of the groups and the specified groups in Example 1, such that the researcher divides one group into two specified groups, each of size $n_{0}/2$. Then we obtain, for individual $i$ in group 1,

\begin{align}
\overline{y}_{i} &=\overline{x}_{i}\pi(n_{0}/2)+(c+\overline{u}_{i}) \tag{3.3}\\
&=\overline{x}_{i}\pi(n_{0}/2)+\overline{v}_{i}, \tag{3.4}
\end{align}

where this time,

\[
c=\frac{\overline{x}_{i}n_{0}(\gamma\beta+\delta)}{2(n_{0}/2-1+\beta)(n_{0}-1+\beta)}.
\]

Terms such as $a$ and $b$ in Example 1 do not arise. This is because all members of specified group 1 have the same correlated effect as those in specified group 2, and the sum of the outcomes and exogenous characteristics over all members of specified group 2 (respectively, specified group 1) is common to all members of specified group 1 (respectively, specified group 2). The within-specified-group transformation thus eliminates both sources of miss-specification. Miss-specification only manifests through the group size (i.e., through $c$).
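The claim in Example 2 can be checked numerically. The sketch below, under assumed parameter values and with zero structural errors, generates outcomes for one true group of size $n_{0}$ from (2.1), splits it into two specified groups of size $n_{0}/2$, and verifies that the within-specified-group outcome equals $\overline{x}_{i}\pi(n_{0}/2)+c$, which coincides with $\overline{x}_{i}\pi(n_{0})$. All names and values are hypothetical and chosen only for illustration.

```python
import numpy as np

def pi_within(n, beta, gamma, delta):
    """Reduced-form coefficient pi(n) from equation (2.4)."""
    return (gamma - delta / (n - 1)) / (1 + beta / (n - 1))

rng = np.random.default_rng(1)
n0, alpha, beta, gamma, delta = 8, 0.3, 0.5, 1.0, 0.6  # hypothetical values
x = rng.normal(size=n0)

# Solve (2.1) for the true group of size n0 with zero structural errors.
W = (np.ones((n0, n0)) - np.eye(n0)) / (n0 - 1)
y = np.linalg.solve(np.eye(n0) - beta * W, alpha + delta * (W @ x) + gamma * x)

# The researcher splits the true group into two specified groups of size n0/2
# and demeans within the first specified group.
half = n0 // 2
y_bar = y[:half] - y[:half].mean()
x_bar = x[:half] - x[:half].mean()

# Term c from Example 2.
c = x_bar * n0 * (gamma * beta + delta) / (
    2 * (n0 / 2 - 1 + beta) * (n0 - 1 + beta))

# Gap relative to the true coefficient pi(n0).
gap_true = np.max(np.abs(y_bar - x_bar * pi_within(n0, beta, gamma, delta)))
# Gap relative to the specified coefficient pi(n0/2) plus the correction c.
gap_spec = np.max(np.abs(y_bar - (x_bar * pi_within(n0 / 2, beta, gamma, delta) + c)))
print(gap_true, gap_spec)  # both numerically zero
```

No analogue of the terms $a$ and $b$ is needed to match the outcomes, in line with the argument that only the group size is miss-specified in the subset case.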

In Example 1 the specified group members are a superset of the group members, and there are three sources of miss-specification. In Example 2 they are a subset, and there is one source of miss-specification. [Footnote 5: Though we do not present such an example, the case in which we have neither a subset nor a superset is similar to the superset case.] Examples 1-2 suggest that subset miss-specification is more readily addressed because one only needs to be concerned with specification of the group size (i.e., the term $c$). This is important, because there is no clear solution to terms such as $a$ and $b$ without making strong restrictions on the distributions of $\mathbf{x}_{g_{0}}$ and $\alpha_{g_{0}}$. As we show below, the intuition that subset miss-specification is more easily addressed is not specific to Examples 1-2. We also argue below that subset miss-specification is more relevant in practice. We thus proceed under the following assumption,

Assumption 1 (Subset miss-specification).

For all $(i,j)\in\mathcal{S}^{2}$, $g(i)=g(j)\Rightarrow g_{0}(i)=g_{0}(j)$,

and the remainder of the paper focuses on miss-specification arising due to incorrect specification of the group size.
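Assumption 1 is a statement about two partitions of the sampled individuals, so it can be checked mechanically when both are available (for instance in a simulation). The following is a minimal sketch; the function name `is_subset_specification` and the toy room and floor labels are hypothetical.

```python
from collections import defaultdict

def is_subset_specification(specified, true):
    """Return True if g(i) = g(j) implies g0(i) = g0(j) for all sampled i, j.

    `specified` maps each sampled individual to its specified group g(i);
    `true` maps each individual to its true group g0(i).
    """
    true_groups_seen = defaultdict(set)
    for i, g in specified.items():
        true_groups_seen[g].add(true[i])
    # Each specified group must sit inside exactly one true group.
    return all(len(groups) == 1 for groups in true_groups_seen.values())

# Hypothetical example: rooms nested within floors.
floor = {"ann": "F1", "bob": "F1", "cat": "F1", "dan": "F2", "eve": "F2"}
room = {"ann": "R1", "bob": "R1", "cat": "R2", "dan": "R3", "eve": "R3"}

# True groups are floors, specified groups are rooms: Assumption 1 holds.
print(is_subset_specification(room, floor))  # True
# True groups are rooms, specified groups are floors: a superset, so it fails.
print(is_subset_specification(floor, room))  # False
```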

Assumption 1 covers the case of missing data by allowing the researcher to observe $N<N_{0}$ individuals with outcomes, characteristics and group identifiers. For example, in the context of education, the researcher might access a sample of students’ test-scores, characteristics and classroom/school identifiers (e.g., Davezies et al. (2009)) or have access to a large educational survey such as the student level PISA survey, which contains school and grade identifiers but includes only a subset of students. [Footnote 6: Other examples include risky behaviours and neighbourhood effects. For the former, survey data on smoking, drinking and illicit drug use among students can be incomplete (Lundborg, 2006). For the latter, the neighbourhood average outcome may be measured using survey data which includes geographic identifiers (e.g., census tract). For example, Bertrand et al. (2000) use the 5% public use microsample of the 1990 Census to study welfare take-up. Glaeser et al. (2003) use the same data for wages.]

Assumption 1 also covers the case in which there is group uncertainty but the candidate groups are nested. It is satisfied provided that the researcher uses the smallest group structure to specify $g$. For example, Glaeser et al. (2003) and Angrist (2014) consider peer groups based on dorm rooms and floors, so Assumption 1 holds if the specified groups are rooms. [Footnote 7: Similarly, peer effects in education may operate at the classroom, grade or school levels. Neighbourhood effects may operate at the two, three or four digit postcode level.] In our empirical application, we postulate that peer effects among lawyers may operate either at the firm or firm-cohort levels, in which case Assumption 1 holds if firm-cohorts are specified.

4.   Identification

We now study identification under Assumption 1. Our identification analysis follows Bramoullé et al. (2009), hence we say that the structural parameters are (point) identified if and only if they can be uniquely recovered from the reduced form parameters presented below (i.e., there is an injective relationship). Our results are thus asymptotic in nature (see Manski (1995)), and characterize whether endogenous, contextual and correlated effects can be disentangled if there is no limit to the number of groups $G_{0}$.

Under Assumption 1, the reduced form for individual $i\in\mathcal{S}$ is

\[
\overline{y}_{i}=\overline{x}_{i}\pi(n_{g_{0}})+\overline{u}_{i} \tag{4.1}
\]

and taking conditional expectations yields

\[
\mathbb{E}[\overline{y}_{i}|n_{g},\mathbf{x}_{g}]=\overline{x}_{i}\varphi(n_{g},\mathbf{x}_{g})+\mathbb{E}[\overline{u}_{i}|n_{g},\mathbf{x}_{g}], \tag{4.2}
\]

where $\varphi(n,\mathbf{x})\triangleq\mathbb{E}[\pi(n_{g_{0}})|n_{g}=n,\mathbf{x}_{g}=\mathbf{x}]$. To ease the notational burden we do not make explicit that the expectations are also conditional on $i\in\mathcal{S}$. Below, we consider only cases in which this omission is innocuous and omit the condition $i\in\mathcal{S}$ for the remainder of the paper.

Equation (4.2) shows that identification prospects depend on whether there is within-specified-group variation in the exogenous characteristic, on whether $\mathbb{E}[\overline{u}_{i}|n_{g},\mathbf{x}_{g}]=0$, and if both of these conditions are satisfied, on whether the structural parameters can be uniquely recovered from the reduced form parameters $\varphi(n,\mathbf{x})$, which are identified for all $(\mathbf{x},n\geq 2)$ in the support of $(\mathbf{x}_{g},n_{g})$. [Footnote 8: We require $n\geq 2$ because $\varphi(1,\mathbf{x})$ is not identifiable since $n_{g}=1$ implies $\overline{y}_{i}=\overline{x}_{i}=0$.]

In general, even under Assumption 1, the moment condition in (2.2) does not imply $\mathbb{E}[\overline{u}_{i}|n_{g},\mathbf{x}_{g}]=0$. Suppose however that we can make a restriction such that it is satisfied, and also that there is within-specified-group variation in the exogenous covariate. Then the structural parameters are point identified if they can be uniquely recovered from the reduced form parameters $\varphi(n,\mathbf{x})$ and the equations

\[
\varphi(n,\mathbf{x})=\sum_{m=2}^{\infty}\mathbb{P}[n_{g_{0}}=m|n_{g}=n,\mathbf{x}_{g}=\mathbf{x}]\pi(m) \tag{4.3}
\]

for all $(\mathbf{x},n\geq 2)$ in the support of $(\mathbf{x}_{g},n_{g})$. The distribution $n_{g_{0}}|n_{g},\mathbf{x}_{g}$ is not observed, hence the structural parameters are not point identified unless it can be restricted. [Footnote 9: Partial identification is possible because the probabilities $\mathbb{P}[n_{g_{0}}=m|n_{g},\mathbf{x}_{g}]$ for $m=2,3,...$ are non-negative and sum to 1. We do not pursue partial identification, since, as shown below, typical empirical settings faced by researchers can lead to point identification.]
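To make the mixture structure of (4.3) concrete, the sketch below evaluates $\varphi$ for a given conditional distribution of the true group size. The function names and numerical values are hypothetical, and the conditional probabilities are simply supplied by hand here, whereas in the paper they must be restricted by further assumptions.

```python
def pi_within(n, beta, gamma, delta):
    """Reduced-form coefficient pi(n) from equation (2.4)."""
    return (gamma - delta / (n - 1)) / (1 + beta / (n - 1))

def phi(size_probs, beta, gamma, delta):
    """Mixture in (4.3): sum over m of P[n_g0 = m | n_g, x_g] * pi(m).

    `size_probs` maps each candidate true group size m >= 2 to its
    conditional probability; the probabilities must sum to one.
    """
    total = sum(size_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("conditional probabilities must sum to one")
    return sum(p * pi_within(m, beta, gamma, delta) for m, p in size_probs.items())

beta, gamma, delta = 0.4, 1.0, 0.7  # hypothetical values

# Correct specification: all mass on the specified size, so phi equals pi(3).
phi_known = phi({3: 1.0}, beta, gamma, delta)

# Uncertain size: the true group has size 3 or 6 with equal probability.
phi_mixed = phi({3: 0.5, 6: 0.5}, beta, gamma, delta)
print(phi_known, phi_mixed)
```

When the conditional distribution is degenerate at the specified size, the mixture collapses to the standard reduced form with known groups.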

We now consider three assumptions, each of which guarantees $\mathbb{E}[\overline{u}_{i}|n_{g},\mathbf{x}_{g}]=0$ and restricts the distribution of $n_{g_{0}}|n_{g},\mathbf{x}_{g}$. Each assumption includes the standard model of peer effects with known peer groups as a special case, respectively when $n_{g}=n_{g_{0}}$ (Assumption 2), when $\rho=1$ (Assumption 3) and when $\psi\in\{0,1\}$ (Assumption 4).

To rule out pathological cases and ease the notational burden, we present our results for the case in which there is within-specified-group variation in the exogenous characteristic and the support of the distribution of $n_{g_{0}}|\mathbf{x}_{g_{0}}$ does not depend on $\mathbf{x}_{g_{0}}$. The results in Propositions 1 and 2 continue to apply if the latter is relaxed, provided that there exists an element in the support of the exogenous characteristic such that the support condition holds for the conditional distribution of $n_{g_{0}}$.

4.1.   Missing data

We first consider identification when there are missing data. Our analysis makes use of the indicator $s_{i}\in\{0,1\}$ for the condition $i\in\mathcal{S}$.

Assumption 2 (Known group size).

For each individual $i\in\{1,2,...,N_{0}\}$, the researcher observes $(y_{i},x_{i},g_{0}(i),n_{g_{0}(i)})$ when $s_{i}=1$ and $\mathbb{E}[\epsilon_{i}|\alpha_{g_{0}},n_{g_{0}},\mathbf{x}_{g_{0}},\mathbf{s}_{g_{0}}]=0$. The researcher groups individuals by $g_{0}$.

Assumption 2 covers the simplest case in which the group size is known but individuals are sampled from the group. In practice, it means that the researcher has access to a sample of individuals with outcomes, characteristics, group membership indicators and group size. The distribution of $s_{i}$ (i.e., the inclusion probability) can be heterogeneous across individuals, but ought not to depend on the structural error (i.e., there should be no sample selection on unobservables). Identification under Assumption 2 was first considered by Davezies et al. (2009). We include it for comparison with our approach, which allows the group size to be unknown.

Assumption 3 (Unknown group size).

For each individual $i\in\{1,2,...,N_{0}\}$, the researcher observes $(y_{i},x_{i},g_{0}(i))$ when $s_{i}=1$, $\mathbb{E}[s_{i}|\mathbf{x}_{g_{0}},n_{g_{0}}]=\rho\in(0,1]$, $\mathbb{COV}[s_{i},s_{j}|\mathbf{x}_{g_{0}},n_{g_{0}}]=0$ for $j\in g_{0}(i)$, $j\neq i$, and $\mathbb{E}[\epsilon_{i}|\alpha_{g_{0}},n_{g_{0}},\mathbf{x}_{g_{0}},\mathbf{s}_{g_{0}}]=0$. The researcher groups individuals by $g_{0}$.

Assumption 3 relaxes the requirement that the researcher knows the group size. In practice, it allows for the case in which the researcher observes an individual level sample of outcomes, exogenous characteristics and group membership indicators. In such a sample, the researcher knows the number of individuals sampled from each group ($n_{g}$) but does not know the group size ($n_{g_{0}}$), nor whether there are any missing individuals in each group. Relative to Assumption 2, we additionally assume that the sample of observed individuals is unbiased and uncorrelated. These are mild assumptions likely to be satisfied in well designed individual level samples under common sampling schemes (Arnab, 2017).

Assumptions 2 and 3 both maintain that the sampling of individuals does not induce sample selection, whilst Assumption 3 additionally maintains that the sampling does not depend on exogenous covariates nor on the group size. Viewed from the missing data perspective, Assumption 3 is essentially a type of Missing Completely at Random assumption, which is widely used (often implicitly) in empirical work. Together, $\mathbb{E}[s_{i}|\mathbf{x}_{g_{0}},n_{g_{0}}]=\rho\in(0,1]$ and $\mathbb{COV}[s_{i},s_{j}|\mathbf{x}_{g_{0}},n_{g_{0}}]=0$ imply

\[
n_{g}|n_{g_{0}},\mathbf{x}_{g}\sim\text{Binomial}(n_{g_{0}},\rho), \tag{4.4}
\]

which leads to identification of the distribution $n_{g_{0}}|n_{g},\mathbf{x}_{g}$ from the observed distribution $n_{g}|n_{g}\geq 1,\mathbf{x}_{g}$. [Footnote 10: Conditioning on $n_{g}\geq 1$ is important, because for some groups the researcher may not observe any individuals. Hence it is the distribution $n_{g}|n_{g}\geq 1,\mathbf{x}_{g}$ which is observed rather than $n_{g}|\mathbf{x}_{g}$.]
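To illustrate how (4.4) links the observed and true group sizes, the following worked expressions apply the law of total probability and Bayes' rule. This is only a sketch: the shorthand $p_{m}$ for the probability that the true group size equals $m$ is introduced here for illustration, and the conditioning on $\mathbf{x}_{g}$ is suppressed for readability.

```latex
% Observed (zero-truncated) distribution of the number of sampled members, n >= 1:
\[
\mathbb{P}[n_{g}=n \mid n_{g}\geq 1]
=\frac{\sum_{m\geq\max(n,2)}p_{m}\binom{m}{n}\rho^{n}(1-\rho)^{m-n}}
{1-\sum_{m\geq 2}p_{m}(1-\rho)^{m}} .
\]
% Distribution of the true group size given the number of sampled members:
\[
\mathbb{P}[n_{g_{0}}=m \mid n_{g}=n]
=\frac{p_{m}\binom{m}{n}\rho^{n}(1-\rho)^{m-n}}
{\sum_{k\geq\max(n,2)}p_{k}\binom{k}{n}\rho^{n}(1-\rho)^{k-n}},
\qquad m\geq\max(n,2).
\]
```

The first expression is what the data reveal; the second supplies the weights that enter (4.3) once $\rho$ and the $p_{m}$ are pinned down.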

We now briefly discuss possible generalizations of Assumption 3, and the extent to which they facilitate identification. First, one might consider $\rho=\rho_{g_{0}}$ to allow for group heterogeneity. Such an assumption does not provide information to identify $n_{g_{0}}|n_{g},\mathbf{x}_{g}$ because $\mathbb{P}[n_{g_{0}}=m|\mathbf{x}_{g}]$ and $\rho_{g_{0}}$ only appear multiplicatively in the expression for $\mathbb{P}[n_{g}=n|n_{g}\geq 1,\mathbf{x}_{g}]$ for all $m$ and $n$. For the same reason, using $\rho=\rho(n_{g_{0}})$ does not provide identifying information.

Alternatively one might consider $\rho=\rho(z_{i})$ for some exogenous observable $z_{i}$ (e.g., $z_{i}=x_{i}$). Viewed from the missing data perspective, this corresponds to a Missing At Random assumption. In this case $n_{g}|n_{g_{0}},\mathbf{x}_{g_{0}},\mathbf{z}_{g_{0}}$ follows a Poisson Binomial distribution, which could, in principle, be used for identification. However, if $\mathbf{z}_{g_{0}}$ is observed then $n_{g_{0}}$ is known and identification is attained instead under the weaker Assumption 2. In contrast, the distribution $n_{g}|n_{g_{0}},\mathbf{x}_{g},\mathbf{z}_{g}$ does not take a known form, hence is of limited use for identification in the typical setting in which we have no information on the missing individuals (i.e., there are some individuals for whom $z_{i}$ is unobserved).

We do not pursue the above generalisations formally because the focus of our analysis is not on sample selection at the individual level, but instead on replacing a group level sample with an individual level sample. Hence we consider Assumption 3 as the benchmark case in which the researcher has access to a well designed sample of individuals rather than a well designed sample of groups. We now present our identification result.

Proposition 1.

$(\gamma,\beta,\delta)$ are point identified if $\gamma\beta+\delta\neq 0$ and any one of the following holds:

  1. Assumption 2 holds and the support of the distribution of $n_{g_{0}}$ has at least three elements (Davezies et al., 2009).

  2. Assumption 3 holds and the support of the distribution of $n_{g_{0}}$ is bounded and has at least three elements.

Proposition 1 shows that the structural parameters are identifiable if $\gamma\beta+\delta\neq 0$. This is a well known necessary identification condition in models with correctly specified groups and both endogenous and contextual effects (see for example Bramoullé et al. (2009) or Rose (2017)), which is generically satisfied over the parameter space. If it is violated then the endogenous and contextual effects exactly offset one another such that the reduced form effect is $\pi(n)=\gamma$, which does not vary with $n$, and hence no amount of variation in group sizes can identify $\beta$ and $\delta$. Both results in Proposition 1 also require that the support of the distribution of $n_{g_{0}}$ has at least three elements. This is also a necessary condition for identification when the groups are correctly specified and there are endogenous, contextual and correlated effects (Lee, 2007; Davezies et al., 2009; Bramoullé et al., 2009), hence it is also necessary if the groups are (possibly) miss-specified.
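The role of the two conditions can be illustrated for the known-group-size case. Rearranging (2.4) gives $\gamma(n-1)-\beta\pi(n)-\delta=\pi(n)(n-1)$, which is linear in $(\gamma,\beta,\delta)$, so three distinct group sizes yield three equations. The sketch below is an illustration under assumed parameter values, not the paper's estimator; it recovers the parameters from $\pi(n)$ at sizes 2, 3 and 4 and shows that $\pi(n)$ is flat when $\gamma\beta+\delta=0$.

```python
import numpy as np

def pi_within(n, beta, gamma, delta):
    """Reduced-form coefficient pi(n) from equation (2.4)."""
    return (gamma - delta / (n - 1)) / (1 + beta / (n - 1))

def recover_structural(sizes, pis):
    """Recover (gamma, beta, delta) from pi(n) at three distinct group sizes.

    Uses gamma*(n-1) - beta*pi(n) - delta = pi(n)*(n-1), which is linear
    in the unknowns (gamma, beta, delta).
    """
    sizes = np.asarray(sizes, dtype=float)
    pis = np.asarray(pis, dtype=float)
    A = np.column_stack([sizes - 1, -pis, -np.ones_like(sizes)])
    b = pis * (sizes - 1)
    return np.linalg.solve(A, b)

true = dict(beta=0.4, gamma=1.0, delta=0.7)  # hypothetical values
sizes = [2, 3, 4]
pis = [pi_within(n, **true) for n in sizes]
gamma_hat, beta_hat, delta_hat = recover_structural(sizes, pis)
print(gamma_hat, beta_hat, delta_hat)

# When gamma*beta + delta = 0, pi(n) equals gamma at every size, so the
# second and third columns of A coincide and the system is singular.
flat = [pi_within(n, beta=0.4, gamma=1.0, delta=-0.4) for n in sizes]
print(flat)
```

With only two group sizes the system would have two equations in three unknowns, which is one way to see why at least three support points are needed.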

Result 1 was first established by Davezies et al. (2009) for missing data with known group sizes whereas result 2 applies when the group sizes are unknown. Point identification of $\rho$ comes in part from information on the lower bound of the support of $n_{g_{0}}$. The result uses that $n_{g_{0}}\geq 2$, which is maintained throughout the theoretical and empirical literature (e.g., Lee (2007); Bramoullé et al. (2009); Davezies et al. (2009); Bramoullé et al. (2020); Moffitt et al. (2001); Sacerdote (2001); Glaeser et al. (2003); Angrist (2014); Boucher et al. (2014); Cornelissen et al. (2017)). Indeed, groups of size one cannot provide information on peer effects and are not consistent with our model. However, if individuals are missing, then we may observe groups with only one member even though all groups have at least two members (i.e., $\mathbb{P}[n_{g}=1|\mathbf{x}_{g},n_{g}\geq 1]>0$ if $\rho<1$). Hence the distribution of $n_{g}$ provides information on $\rho$, which combined with (4.4), leads to identification of $\rho$ and the distribution $n_{g_{0}}|n_{g},\mathbf{x}_{g}$. Equation (4.3) can then be used to identify the peer effects.

In some applications a stronger support restriction may be used if it is known that $n_{g_{0}}\geq\underline{n}$ for some known $\underline{n}>2$. For example, this is likely to be satisfied in studies of peer effects in education, where classrooms might reasonably be assumed to contain at least a handful of students. Though not required for identification, this can improve the precision of the GMM estimator we propose below. In other applications $n_{g_{0}}=1$ may be feasible, in which case one can never rule out $\rho=1$ and $n_{g_{0}}=n_{g}$ so that peer effects are not identifiable. For this reason, a restriction on the lower bound of the support of $n_{g_{0}}$ is necessary for identification of the peer effects.

Result 2 also uses the mild assumption that the distribution of $n_{g_{0}}$ is bounded, which is needed to identify $\rho$ and the distribution of $n_{g_{0}}|\mathbf{x}_{g}$ from the distribution of $n_{g}|n_{g}\geq 1,\mathbf{x}_{g}$ as a solution of a non-linear system comprising a finite number of equations. The upper bound need not be known since it is identifiable from the distribution of $n_{g}|n_{g}\geq 1$. This assumption is also used, for example, by Lee (2007), Davezies et al. (2009), Lewbel (2019) and Boucher and Houndetoungan (2020).

4.2.   Group uncertainty

We now consider subset miss-specification resulting from group uncertainty.

Assumption 4 (Group uncertainty).

For each individual $i\in\{1,2,...,N_{0}\}$ the researcher observes $(y_{i},x_{i},g_{1}(i),g_{2}(i),n_{g_{1}(i)},n_{g_{2}(i)})$, $2\leq n_{g_{1}}\leq n_{g_{2}}$ and $g_{1}(i)=g_{1}(j)\Rightarrow g_{2}(i)=g_{2}(j)$ for all $(i,j)\in\{1,2,...,N_{0}\}^{2}$. The researcher groups individuals by $g_{1}$, $\mathbb{E}[\epsilon_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g_{2}}]=0$, $\mathbb{P}[n_{g_{0}}=n_{g_{1}}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g_{2}}]=\psi\in[0,1]$ and $\mathbb{P}[n_{g_{0}}=n_{g_{2}}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g_{2}}]=1-\psi$.

Assumption 4 covers the case in which there is group uncertainty but the candidate groups are naturally nested. Though an extension to three or more candidate groups is straightforward, for simplicity of exposition we do not pursue it formally. We also require a minor strengthening of the moment condition, replacing (2.2) with $\mathbb{E}[\epsilon_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g_{2}}]=0$, and hence $n_{g}$ with $n_{g_{1}},n_{g_{2}}$ in equations (4.2) and (4.3). The stronger moment condition requires that both potential group sizes and the exogenous characteristic of all individuals in the two potential groups be strictly exogenous with respect to the structural error.

Under Assumption 4, the reduced form is

$$\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g_{2}}]=\overline{x}_{i}\varphi^{\prime}(n_{1},n_{2},\mathbf{x}) \tag{4.5}$$

where $\varphi^{\prime}(n_{1},n_{2},\mathbf{x})\triangleq\mathbb{E}[\pi(n_{g_{0}})|n_{g_{1}}=n_{1},n_{g_{2}}=n_{2},\mathbf{x}_{g_{2}}=\mathbf{x}]=\psi\pi(n_{1})+(1-\psi)\pi(n_{2})$. Hence identification hinges on the properties of the joint distribution of $n_{g_{1}}$ and $n_{g_{2}}$, specifically on the bi-partite graph of the support of $(n_{g_{1}},n_{g_{2}})$, which we denote by $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$. Figure 1 depicts an example.

Our approach can be generalized to allow $\psi=\psi(\mathbf{z}_{g_{2}})$ (e.g., $\mathbf{z}_{g_{2}}=\mathbf{x}_{g_{2}}$), provided that the conditional expectations and probabilities in Assumption 4 continue to hold conditional also on $\mathbf{z}_{g_{2}}$. We discuss the corresponding modification of Proposition 2 at the end of this section. We now present our identification result.

Proposition 2.

$(\gamma,\beta,\delta)$ are point identified if $\gamma\beta+\delta\neq 0$, Assumption 4 holds, the support of the distribution of $n_{g_{1}}$ has at least three elements, the support of the distribution of $n_{g_{2}}$ has at least three elements and either there is a connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ with at least three vertices corresponding to the support of $n_{g_{1}}$ or there is a connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ with at least three vertices corresponding to the support of $n_{g_{2}}$.

Figure 1: A connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ with five vertices

Notes: The vertices in the left hand column correspond to $n_{g_{1}}$, which has support $\{2,3,4\}$. The vertices in the right hand column correspond to $n_{g_{2}}$, which has support $\{5,6\}$. An edge between $n_{1}$ and $n_{2}$ implies that $(n_{1},n_{2})$ lies in the support of $(n_{g_{1}},n_{g_{2}})$. This example satisfies the identification condition in Proposition 2 because it is connected and has three vertices in the left hand column.

The conditions $\gamma\beta+\delta\neq 0$ and that the supports of the distributions of $n_{g_{1}}$ and $n_{g_{2}}$ each have at least three elements are necessary even when there is no miss-specification (i.e., when it is known that either $\psi=0$ or $\psi=1$) (Lee, 2007; Davezies et al., 2009; Bramoullé et al., 2009). In addition to these, we require a mild condition on the joint support of the candidate group sizes $(n_{g_{1}},n_{g_{2}})$, expressed through the bi-partite graph $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$. When this condition is satisfied, using standard arguments for bi-partite fixed effects models (e.g., Abowd et al. (1999)), we are able to identify $\psi\pi(n_{1})+(1-\psi)\pi(n_{2})$ for all $(n_{1},n_{2})$ in a subset of the support of $(n_{g_{1}},n_{g_{2}})$. This subset is given by all pairs which appear in the same connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$. Importantly, there is no requirement that the supports of $n_{g_{1}}$ and $n_{g_{2}}$ overlap. Figure 1 provides such an example.
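The graph condition can be checked mechanically from the observed pairs of candidate group sizes. The following is a minimal sketch, not the authors' procedure: the function names are hypothetical, and the edge lists are illustrative choices that merely respect the supports $\{2,3,4\}$ and $\{5,6\}$ described for Figure 1. It checks only the connected component requirement, not the separate support conditions of Proposition 2.

```python
from collections import defaultdict

def components(pairs):
    """Connected components of the bipartite graph whose edges are the observed
    (n1, n2) pairs. Vertices are tagged by side so equal sizes stay distinct."""
    adj = defaultdict(set)
    for n1, n2 in pairs:
        adj[("g1", n1)].add(("g2", n2))
        adj[("g2", n2)].add(("g1", n1))
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def graph_condition_holds(pairs):
    """True if some component has >= 3 vertices on the g1 side or >= 3 on the g2 side."""
    for comp in components(pairs):
        left = sum(1 for side, _ in comp if side == "g1")
        right = len(comp) - left
        if left >= 3 or right >= 3:
            return True
    return False

# Hypothetical edge set with supports {2,3,4} and {5,6}: one connected component.
figure1_like = [(2, 5), (3, 5), (3, 6), (4, 6)]
# Hypothetical disconnected support: two components with two vertices each.
disconnected = [(2, 5), (3, 6)]

print(graph_condition_holds(figure1_like), graph_condition_holds(disconnected))
```

Tagging vertices by side keeps the two columns separate even if the supports of the two group sizes were to overlap.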

The intuition for identification of $\psi\pi(n_{1})+(1-\psi)\pi(n_{2})$ comes from a regression of $\overline{y}_{i}$ on the interaction of $\overline{x}_{i}$ with dummies for each value of $n_{g_{1}}$ and the interaction of $\overline{x}_{i}$ with dummies for each value of $n_{g_{2}}$. Together, both sets of dummies sum to two for every individual in the sample (because each individual is in exactly two candidate groups, a small one and a larger one), hence one dummy must be omitted. For this reason, we identify the pairwise sums of $\psi\pi(n_{1})$ and $(1-\psi)\pi(n_{2})$ for all pairs in the same connected component. If we observe a connected component containing at least three vertices corresponding to $n_{g_{1}}$ we can contrast $\psi\pi(n_{g_{1}})$ over three different values of $n_{g_{1}}$ holding $n_{g_{2}}$ constant. This yields three equations in the four unknowns $\psi,\gamma,\delta,\beta$. We obtain an additional equation by repeating the exercise for a different value of $n_{g_{2}}$, which leads to identification.

Under the generalization of Assumption 4 to $\psi=\psi(\mathbf{z}_{g_{2}})$, Proposition 2 holds provided that the support conditions on $n_{g_{1}}$ and $n_{g_{2}}$ hold conditionally on $\mathbf{z}_{g_{2}}=\mathbf{z}$ for some element $\mathbf{z}$ of the support of $\mathbf{z}_{g_{2}}$. Similarly, one replaces $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ by $\mathcal{G}_{n_{g_{1}},n_{g_{2}}|\mathbf{z}_{g_{2}}=\mathbf{z}}$. This rules out pathological cases such as $z_{i}=n_{g_{2}(i)}$.

5.   Estimation

For missing data with known group sizes, we estimate $\gamma,\delta,\beta$ by non-linear least squares based on

$$\mathbb{E}[\overline{y}_{i}|n_{g_{0}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{0}}). \tag{5.1}$$

We focus here on the non-linear least squares estimator since it is more easily adapted to the case of unknown group sizes. We present an alternative instrumental variables estimator in the appendix. If the group sizes are unknown, we use

$$\mathbb{E}[\overline{y}_{i}|n_{g},\mathbf{x}_{g}]=\overline{x}_{i}\sum_{m=2}^{\infty}\mathbf{p}_{m}(n_{g};\rho,\mathbf{q})\pi(m), \tag{5.2}$$

where $\rho\in(0,1]$, $0\leq\mathbf{q}_{m}\leq 1$ for $m=1,2,...$, $\mathbf{q}_{1}=0$, $\sum_{m=1}^{\infty}\mathbf{q}_{m}=1$, and

$$\mathbf{p}_{m}(n_{g};\rho,\mathbf{q})\triangleq\frac{\mathbf{q}_{m}{m\choose n_{g}}\rho^{n_{g}}(1-\rho)^{m-n_{g}}}{\sum_{n=2}^{\infty}\mathbf{q}_{n}{n\choose n_{g}}\rho^{n_{g}}(1-\rho)^{n-n_{g}}}. \tag{5.3}$$
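To make (5.3) concrete, the sketch below evaluates the conditional distribution of the group size given the number of observed members. It is illustrative only: the function name is hypothetical, and the values of $\mathbf{q}$ are those implied by the Binomial(2, 0.25) design used later in the Monte-Carlo section, with $\rho=0.5$ chosen from the grid considered there.

```python
from math import comb

def posterior_group_size(n_obs, rho, q):
    """p_m(n_obs; rho, q) from (5.3); q maps a group size m >= 2 to its mass q_m."""
    weights = {
        m: qm * comb(m, n_obs) * rho**n_obs * (1 - rho) ** (m - n_obs)
        for m, qm in q.items()
        if m >= max(n_obs, 2)  # a group cannot be smaller than its observed part
    }
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

# Masses implied by n_g0 - 2 ~ Binomial(2, 0.25).
q = {2: 0.5625, 3: 0.375, 4: 0.0625}

post = posterior_group_size(n_obs=2, rho=0.5, q=q)
print(post)
```

With two observed members and $\rho=0.5$, doubles and triples are equally likely under these values, which illustrates why treating the observed size as the group size leads to measurement error.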

For the parameters $\mathbf{q}$ and $\rho$ we can use maximum likelihood based on

$$\mathbb{P}[n_{g}|n_{g}\geq 1]=\frac{\sum_{m=2}^{\infty}\mathbf{q}_{m}{m\choose n_{g}}\rho^{n_{g}}(1-\rho)^{m-n_{g}}}{\sum_{n=1}^{\infty}\sum_{m=2}^{\infty}\mathbf{q}_{m}{m\choose n}\rho^{n}(1-\rho)^{m-n}}. \tag{5.4}$$

In practice we do not sequentially estimate the parameters by maximum likelihood and then by non-linear least squares. Instead, we estimate the parameters jointly by GMM based on the non-linear least squares moment conditions and equality to zero of the expectation of the score of the log-likelihood function based on (5.4). This framework allows the researcher to use GMM standard errors (as opposed to having to adjust for a two-stage estimator or use a bootstrap) and to include additional information when it is available. For example, if the researcher knows which observed groups have missing members (i.e., they observe an indicator for $n_{g}=n_{g_{0}}$), they can also use the moment $\mathbb{E}[\mathbf{1}(n_{g}=n_{g_{0}})|n_{g}]=\rho^{n_{g}}$. Similarly, one can also make a parametric assumption on the distribution of $n_{g_{0}}$ (i.e., use $\mathbf{q}=\mathbf{q}(\boldsymbol{\lambda})$ for a parameter $\boldsymbol{\lambda}$).

Identically to the empirical likelihood estimator of the probability mass function, the score equations yield $\widehat{\mathbf{q}}_{m}=0$ for all $m>\widehat{\overline{n}}$, where $\widehat{\overline{n}}$ is the sample maximum of $n_{g}$, hence the concentrated model is

$$\mathbb{E}[\overline{y}_{i}|n_{g},\mathbf{x}_{g}]=\overline{x}_{i}\sum_{m=2}^{\widehat{\overline{n}}}\mathbf{p}_{m}(n_{g};\rho,\mathbf{q})\pi(m), \tag{5.5}$$
$$\mathbb{P}[n_{g}|n_{g}\geq 1]=\frac{\sum_{m=2}^{\widehat{\overline{n}}}\mathbf{q}_{m}{m\choose n_{g}}\rho^{n_{g}}(1-\rho)^{m-n_{g}}}{\sum_{n=1}^{\widehat{\overline{n}}}\sum_{m=2}^{\widehat{\overline{n}}}\mathbf{q}_{m}{m\choose n}\rho^{n}(1-\rho)^{m-n}}. \tag{5.6}$$
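The truncated distribution in (5.6) is straightforward to evaluate once the support is finite. The sketch below is an illustration under stated assumptions rather than the estimator itself: the function names are hypothetical, and $\mathbf{q}$ again takes the values implied by the Binomial(2, 0.25) design of the Monte-Carlo section. It shows that singleton groups are observed with positive probability when $\rho<1$ even though every group has at least two members.

```python
from math import comb, log

def observed_size_pmf(rho, q):
    """P[n_g = n | n_g >= 1] from (5.6) for n = 1, ..., max(q)."""
    n_max = max(q)
    raw = {
        n: sum(
            qm * comb(m, n) * rho**n * (1 - rho) ** (m - n)
            for m, qm in q.items()
            if m >= n
        )
        for n in range(1, n_max + 1)
    }
    total = sum(raw.values())  # equals 1 - P[n_g = 0]
    return {n: v / total for n, v in raw.items()}

def log_likelihood(sizes, rho, q):
    """Log-likelihood of observed group sizes; sizes must have positive mass."""
    pmf = observed_size_pmf(rho, q)
    return sum(log(pmf[n]) for n in sizes)

q = {2: 0.5625, 3: 0.375, 4: 0.0625}
pmf_half = observed_size_pmf(0.5, q)
pmf_full = observed_size_pmf(1.0, q)
print(pmf_half[1], pmf_full[1])
```

The share of observed singletons is what carries information on $\rho$: it is zero at $\rho=1$ and grows as $\rho$ falls.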

For group uncertainty, we apply non-linear least squares based on

$$\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g}]=\overline{x}_{i}[\psi\pi(n_{g_{1}})+(1-\psi)\pi(n_{g_{2}})]. \tag{5.7}$$

6.   Monte-Carlo

We tailor the design to the well known data on roommates at Dartmouth College studied by Sacerdote (2001), Glaeser et al. (2003) and Angrist (2014). The original data comprise 1589 freshmen at Dartmouth College who were randomly assigned to dorm-rooms. Fifty three percent of dorm-rooms were doubles, 44 percent were triples and the remaining 3 percent were quads. The first part of our Monte-Carlo experiment supposes that only a random sample of these students is available. The second part considers the issue of the definition of the relevant peer group, which could be the room or the floor (Sacerdote, 2001; Glaeser et al., 2003; Angrist, 2014).

6.1.   Missing data

We consider a design satisfying Assumption 3. The original data comprise a complete sample of freshmen (i.e., $\rho=1$). Our design varies $\rho\in\{0.3,0.5,0.7,0.9,1\}$. For $\rho=1$ we set $s_{i}=1$ for all $i\in\{1,...,N_{0}\}$ and for $\rho<1$ we set $p_{i}=\rho+q_{i}$, where $q_{i}\overset{i.i.d.}{\sim}\text{Uniform}(-0.1,0.1)$ and $s_{i}=1$ with probability $p_{i}$. We set $G_{0}$ to be the nearest integer to $M/(\rho\mathbb{E}[n_{g_{0}}])$, where $M$ is discussed below. For each dataset, we draw $G_{0}$ dorm-rooms from the distribution of dorm-room sizes, which we take to be $n_{g_{0}}-2\sim{\rm Binomial}(2,0.25)$ so as to match the sample mean and support of the Dartmouth data. We use a parametric distribution so as to evaluate the performance of the GMM estimator both with and without a parametric restriction on the group size distribution.

Since we draw a fixed number of dorm rooms, each with a random size, the number of students $N_{0}$ varies from one dataset to the next. We then draw an independent random sample of $N$ students, which includes student $i$ with probability $p_{i}$. Hence $N$ also varies from one dataset to the next. Allowing $G_{0}$ to depend on $\rho$ as above implies that the size of the observed sample is $N\approx M$ no matter the value of $\rho$. Setting $M=1600$, our design matches the sample size of the Dartmouth data. We thus consider the performance of our method had the Dartmouth data been obtained from a random sample of freshmen, rather than all freshmen, holding the number of observations constant as we vary $\rho$. We also consider $M=8000$ to study the performance of the method under different sample sizes.
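The sampling scheme above can be sketched in a few lines to confirm that the observed sample size stays close to $M$ as $\rho$ varies. This is a simplified illustration of the described design, not the authors' simulation code; the function name and the seed are arbitrary, and outcomes and covariates are not generated.

```python
import random

def simulate_sample_size(rho, M, rng):
    """Draw one dataset's observed sample size N under the missing-data design."""
    mean_size = 2 + 2 * 0.25            # E[n_g0] when n_g0 - 2 ~ Binomial(2, 0.25)
    G0 = round(M / (rho * mean_size))   # number of rooms drawn
    N = 0
    for _ in range(G0):
        size = 2 + sum(rng.random() < 0.25 for _ in range(2))
        for _ in range(size):
            # rho = 1 keeps every student; otherwise p_i = rho + Uniform(-0.1, 0.1)
            p_i = 1.0 if rho == 1 else rho + rng.uniform(-0.1, 0.1)
            N += rng.random() < p_i
    return N

rng = random.Random(0)
sizes = {rho: simulate_sample_size(rho, 1600, rng) for rho in (0.3, 0.5, 0.7, 0.9, 1)}
print(sizes)
```

Note that Python's `round` resolves exact ties to the nearest even integer, which may differ from other conventions for "nearest integer" in rare cases.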

Sacerdote (2001) exploits random assignment of freshmen to dorms in the Dartmouth data to identify peer effects in educational attainment, measured by freshman GPA. It is argued that endogenous effects are difficult to identify due to the reflection problem (see Manski (1993)). Sacerdote (2001) focuses instead on credibly identifying the reduced form effect of room-mate high school attainment on freshman GPA (see column 5 of Table 3 in Sacerdote (2001)). The estimated peer effect is positive and statistically significant, though smaller in magnitude than own high-school attainment. This reduced form regression has R2=0.19R^{2}=0.19.

To match this setting as well as possible, in our design we set the own effect to be $\gamma=1$ and the endogenous effect to be $\beta=0$, hence the reduced form effect of room-mate’s high-school attainment is equal to the contextual effect, which we set to be $\delta=0.5$. We set $x_{i}\overset{i.i.d.}{\sim}\mathcal{N}(0,1)$, $\epsilon_{i}\overset{i.i.d.}{\sim}\mathcal{N}(0,\sigma^{2})$ and $\alpha_{g_{0}}\overset{i.i.d.}{\sim}\mathcal{N}(1,\sigma^{2})$ and choose $\sigma^{2}=2(\gamma^{2}+\delta^{2}/\mathbb{E}[n_{g_{0}}-1])$. For the expected dorm-room size, this choice of $\sigma^{2}$ corresponds to fraction 0.8 of $\mathbb{V}[y_{i}]$ being due to $\alpha_{g_{0}(i)}+\epsilon_{i}$ and 0.2 being due to $x_{i}\gamma+\delta(n_{g_{0}}-1)^{-1}\sum_{j:g_{0}(j)=g_{0}(i),j\neq i}x_{j}$, hence a population $R^{2}$ of 0.2 in a reduced form regression without fixed effects, as estimated by Sacerdote (2001).
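The variance shares follow from simple arithmetic, which the short check below reproduces. It evaluates the peer term at the expected number of room-mates, as in the text, and uses the independence of $x_{i}$, $\alpha_{g_{0}}$ and $\epsilon_{i}$; the variable names are illustrative.

```python
gamma, delta = 1.0, 0.5

expected_peers = (2 + 2 * 0.25) - 1                  # E[n_g0 - 1] = 1.5
signal_var = gamma**2 + delta**2 / expected_peers    # variance of the covariate part
sigma2 = 2 * signal_var                              # design choice for sigma^2
noise_var = 2 * sigma2                               # alpha_g0 and epsilon_i each contribute sigma^2
r_squared = signal_var / (signal_var + noise_var)

print(round(sigma2, 4), round(r_squared, 4))
```

The covariate part has variance $\gamma^{2}+\delta^{2}/1.5$ and the two error components together contribute four times that amount, giving the 0.2 and 0.8 shares.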

6.1.1 Results

Table 1 presents the results. We compare four models: one in which the observed groups are treated as if they are the groups (‘MS’, estimated by NLS), one in which the group size is known (‘K’, estimated by NLS), one in which the group size is unknown (‘U’, estimated by GMM) and a modification of U in which the parametric assumption $n_{g_{0}}-2\sim{\rm Binomial}(2,\omega)$ is additionally imposed (‘U-P’, estimated by GMM). Model MS is miss-specified when $\rho<1$. The other models are correctly specified but use information and assumptions in different ways. Model (K) is the full information benchmark with which we compare models (U) and (U-P). Models (U) and (U-P) take the upper bound on the support of the group sizes to be known as $\overline{n}=4$, which, in the context of the Dartmouth data, implies that the researcher knows that there are no rooms of more than four students.

Beginning with the contextual effects only specification ($\beta=0$ is imposed) and $\rho=1$, we see that the results are very similar (identical to three decimal places) for all four models. As $\rho$ decreases, the distributions of the NLS estimator of $\delta$ and $\gamma$ of model (MS) shift closer to zero. For $\rho\leq 0.5$, the bias becomes sizeable. In contrast, the estimators for the other models remain approximately unbiased for all values of $\rho$, though their root mean squared error (RMSE) increases as $\rho$ decreases. This is at least partly because, as $\rho$ decreases, the number of groups with only one observed member increases, and these groups provide no information on the parameters due to the within-specified-group transformation (i.e., because $\overline{y}_{i}=\overline{x}_{i}=0$ when $n_{g}=1$).

The NLS estimator for model (K) performs at least as well as the GMM estimator for model (U) in terms of RMSE. This is unsurprising since it requires known group sizes in order to be implemented. For $\rho=0.9$ the difference in RMSE is small (0.086 vs 0.099 for $\delta$ when $M=8000$), though the gap widens as $\rho$ decreases. This is likely because, as $\rho$ decreases, there is less variation in the specified group size $n_{g}$, and hence, for a given specified group size, more uncertainty as to the group size $n_{g_{0}}$. The additional parametric assumption on the group size distribution used in model (U-P) makes little difference when $\rho\geq 0.7$, but for smaller values the difference in RMSE between the GMM estimators of models (U) and (U-P) becomes non-negligible.

Moving on to the specification with contextual and endogenous effects, it is clear that, though identified due to there being at least three group sizes, there is insufficient group size variation in this design to reliably separate the two, as suggested by Sacerdote (2001). This is likely because only around 6% of rooms are quads in this design. In particular, the RMSE on $\beta$ is large and all estimators of $\beta$ are biased upwards, whereas estimators of $\delta$ tend to be biased downwards. Nevertheless, the qualitative conclusions regarding the relative performance of the estimators of the four models are the same as for the specifications with contextual effects only.

6.2.   Group uncertainty

Sacerdote (2001), Glaeser et al. (2003) and Angrist (2014) all discuss the appropriate definition of the peer group, which may either be the room, the floor or the entire dorm. We focus here on the distinction between the room and the floor, and consider a design in which Assumption 4 holds. To match the Dartmouth data, we first draw the rooms as described above. We then draw $f_{1}\in\{1,2,...,5\}$ uniformly, and take rooms $\{1,2,...,f_{1}\}$ to comprise the first floor. We then draw another integer $f_{2}\in\{1,2,...,5\}$ and take rooms $\{f_{1}+1,...,f_{1}+f_{2}\}$ to comprise the second floor. We proceed in this way until every room has been allocated to a floor. In the Dartmouth data, the sample mean number of students per floor is close to 8. This data generating process yields an expected number of students per floor of 7.5. All other aspects of the data generating process remain unchanged and we vary $\psi\in\{0.2,0.4,0.6,0.8\}$.
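The floor allocation can be sketched as follows to confirm the expected floor size of 7.5 (three rooms on average, each with 2.5 students on average). This is an illustrative reconstruction of the described procedure with hypothetical names and an arbitrary seed; the final floor may contain fewer rooms than drawn because the rooms run out.

```python
import random

def draw_floors(room_sizes, rng):
    """Partition consecutive rooms into floors of 1 to 5 rooms; return students per floor."""
    floors, i = [], 0
    while i < len(room_sizes):
        f = rng.randint(1, 5)                 # number of rooms on this floor, uniform on 1..5
        floors.append(sum(room_sizes[i:i + f]))
        i += f
    return floors

rng = random.Random(0)
# Room sizes with n_g0 - 2 ~ Binomial(2, 0.25), as in the missing-data design.
rooms = [2 + sum(rng.random() < 0.25 for _ in range(2)) for _ in range(20000)]
floors = draw_floors(rooms, rng)
mean_floor = sum(floors) / len(floors)
print(round(mean_floor, 2))
```

Because rooms are nested within floors by construction, the requirement in Assumption 4 that the two candidate groupings be nested is satisfied.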

6.2.1 Results

Table 2 presents the results. We compare four models: one in which the specified group is the room (‘R’), one in which the specified group is the floor (‘F’), one in which the researcher knows whether the group is the room or the floor (‘K’) and one in which the group is unknown (‘U’). All are estimated by NLS. Models (R) and (F) are miss-specified because we have $0<\psi<1$ in all designs. The other models are correctly specified but use information and assumptions in different ways. Model (K) is the full information benchmark with which we compare model (U).

For brevity, the discussion focuses on the contextual effects only specifications. Specifying the group to be the room (model (R)) leads to downwards bias of $\delta$. The bias and RMSE grow as $\psi$ decreases. This is because the proportion of incorrectly specified groups grows as $\psi$ decreases. For the same reason the RMSE from specifying the group to be the floor (model (F)) increases as $\psi$ increases.

The results suggest that the estimator of model (F) is unbiased even when $\psi>0$. This is coincidental. It just so happens that in this design the three sources of bias (the terms $a,b$ and $c$ discussed in Example 1 in Section 3) offset one another. To demonstrate this, Table 3 modifies the design by instead setting $\alpha_{g_{0}(i)}\overset{i.i.d.}{\sim}\mathcal{N}(n_{g_{0}}^{-1}\sum_{j:g_{0}(j)=g_{0}(i)}x_{j},\sigma^{2})$. This has no impact on the estimators of models (R), (K) and (U) because all are based on room fixed effects. (Footnote 12: For these estimators the results are numerically identical to the results reported in Table 2 because identical seeds were used to draw the data for each replication.) However, it introduces substantial bias for the estimator of model (F), which uses floor fixed effects.

Comparing the results for models (K) and (U), as expected the RMSE of the latter is larger since it does not require knowledge of the groups. Nevertheless, depending on the sample size and value of $\psi$, the NLS estimator of model (U) has small bias and sufficiently small RMSE so as to be distinguishable from zero. Considering now the specifications with endogenous and contextual effects, though weakly identified, the main qualitative conclusions regarding the relative performance of the estimators of the four models are the same as for the specifications with contextual effects only. Notice also that once endogenous effects are introduced the estimator of model (F) often has larger bias and RMSE than that of model (U). This is particularly true for $\delta$.

7.   Application

In this section, we use unique employer-employee matched data to study how lawyers’ quit-to-exit responses depend on their own ability and that of their peers. The dataset is collected from the Shanghai Bar Association and includes a full history of law firm composition and lawyers’ career changes from 2009-2016. Exploiting lawyers’ annual registration records, we observe when they cancel legal practice in Shanghai, which we refer to as their quit-to-exit. On quitting, lawyers may change occupations, practice law in other regions, or return to education. Since legal practice is among the highest-income occupations and Shanghai is one of China’s highest-paying cities, we argue that quit-to-exit in this context is likely a result of lesser persistence in a highly competitive, incentivized environment, similar to exits in the education and labor literature (Thiemann, 2022; Wasserman, 2018).

Our analysis focuses on lawyers’ quit-to-exit in 2016 and how it is affected by their own ability and that of their peers. Throughout our analysis, we focus on associate lawyers, excluding partners and directors. This is because associates and partners/directors are unlikely to perceive one another as direct competitors. The behavior, ability and characteristics of partners and directors in the firm are accounted for by firm fixed effects. From this point onwards, we refer to associate lawyers simply as lawyers.

To measure lawyers’ ability, we merge the registration records with Shanghai court judgements. In this dataset, lawyers’ performance can be more reliably measured for civil litigation, so we focus on law firms and lawyers whose main business is in civil law. Our analysis is thus restricted to a sample of law firms in which most lawyers complete at least one civil case annually and lawyers who do not declare criminal law as one of their three specialized fields. (Footnote 13: The performance in criminal law is not well measured because: 1) We only observe a small share of criminal judgements and many are restricted for confidentiality reasons; 2) For criminal cases, the case fees are identical within the same case category (e.g. theft or homicide), so we cannot use case fees to measure case size as we do for civil cases. Nevertheless, our results are robust to using the number of civil and criminal cases to create the ability measure and running the analysis for all (civil and criminal) lawyers, and are robust to including all law firms in the analysis (see Table 7).) To measure performance in civil litigation, we use lawyers’ fee-weighted caseload in the previous three years (2013-2015). Though we observe case outcomes (win or lose), we do not use this to measure ability because it is likely that high-ability lawyers take on more difficult cases.

To compute our ability measure, we perform the following steps: 1) We extract the case fee (measured in thousand yuan) for each civil case, which is the amount paid to file the case to the court and is calculated based on the disputed amount of the case; 2) We divide the case fee by the number of lawyers representing the case, according to the lawyers listed in the court judgement; 3) We repeat the previous two steps for each civil case and sum all fee-weighted cases undertaken from 2013-2015; and 4) We compute the annual average caseload by dividing by the number of years of practice between 2013 and 2015. (Footnote 14: For lawyers who started practice after 2013, the annual caseload in 2013 (and 2014 if started after 2014) would be zero. To include these lawyers and make them comparable, we use the annual average caseload, rather than the three-year sum, by excluding the year(s) before the lawyer’s legal practice date.)
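The four steps above can be sketched as follows. The data layout, function name and example records are hypothetical and serve only to illustrate the calculation; they are not taken from the Shanghai data.

```python
from collections import defaultdict

def annual_fee_weighted_caseload(cases, years_practised):
    """cases: iterable of (fee in thousand yuan, list of lawyer ids on the judgement).
    years_practised: lawyer id -> years of practice within 2013-2015 (1 to 3)."""
    totals = defaultdict(float)
    for fee, lawyers in cases:
        share = fee / len(lawyers)       # step 2: split the fee across listed lawyers
        for lawyer in lawyers:
            totals[lawyer] += share      # step 3: sum fee-weighted cases per lawyer
    # step 4: annualise by years in practice; lawyers without cases get zero
    return {l: totals[l] / years_practised[l] for l in years_practised}

# Hypothetical records: (case fee, lawyers listed on the court judgement).
cases = [(30.0, ["a", "b"]), (12.0, ["a"]), (9.0, ["a", "b", "c"])]
years = {"a": 3, "b": 3, "c": 1, "d": 2}

ability = annual_fee_weighted_caseload(cases, years)
print(ability)
```

Dividing by years in practice rather than by three keeps lawyers who started after 2013 comparable with those observed for the full window.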

The fee-weighted annual average caseload is a proxy for ability because it reflects past performance and income generated, hence can be used to predict one’s career prospects. Generally, lawyers in China charge by case, rather than by work hours (Liu, 2006). For civil litigation, lawyers’ fees are typically charged based on the disputed amount (Michelson, 2006), thus high-ability lawyers are more likely to work on high fee cases, regardless of whether cases are assigned by partners or obtained by the lawyer themselves. Nevertheless, we are aware that non-litigation business is not measured (e.g., Initial Public Offerings and Mergers/Acquisitions). To this end, we replicate our analysis in small- to medium-size firms because major non-litigation business is concentrated in large firms.

The effect of interest is that of peer ability on quit-to-exit. To measure peer ability we use both the peer average fee-weighted caseload and the proportion of high-ability peers, where high-ability is defined as having a fee-weighted caseload in the top quartile (i.e., $>24.44$ in our data). We believe that the latter better captures the competitive environment through which lawyers compete for promotion to a small number of senior positions (team leaders, partners and directors).

We consider two group structures through which peer effects may operate. Group F supposes that peer effects operate among all lawyers working at the same law firm. Alternatively, we might expect stronger peer effects among similar-age lawyers, who are likely to be at a similar career stage. To account for this possibility, we split lawyers into two cohorts with a cut-off age of 35 (inclusive) for the younger cohort. Group F-C supposes that peer effects only operate through lawyers in the same cohort at the same law firm.

Our sample includes 8,448 lawyers working at 755 law firms during 2016. Table 4 summarizes the data. As shown in the left panel, 3.6% of lawyers quit-to-exit, 45% are female, 38% have a graduate degree (Masters or PhD), the average age is around 36, average experience is around 8 years, and the average fee-weighted annual caseload is 20 thousand yuan. In the right panel of Table 4, we present summary statistics for the 732 small- to medium-size firms, which are largely similar to the sample of all firms.

We estimate (2.1) taking the dependent variable to be an indicator for quit-to-exit in 2016. Even though the dependent variable is binary, the linear model is well defined from both an econometric and economic perspective, and has desirable computational and statistical properties, especially when there are group fixed effects (see Boucher et al. (2020)).

Individual covariates comprise an indicator for being female, an indicator for having a graduate degree, age, years of experience, quadratic terms in age and experience, and fee-weighted caseload. We consider contextual effects only (i.e., $\beta=0$ is imposed) because we do not expect strategic interactions in quit-to-exit behavior. To account for within-firm dependence, we cluster standard errors by firm. All specifications include either firm or firm-cohort fixed effects.

7.1.   Baseline results

Our baseline results are reported in Table 5. Models (1)-(2) respectively use group structures F and F-C. We find a positive yet statistically insignificant peer effect in fee-weighted caseload. In group F, the magnitude of the peer effect is around one third of the effect of a lawyer’s own fee-weighted caseload, which is negative and statistically significant at the 0.01 level. For group F-C, the magnitude of the peer effect is similar to that of own fee-weighted caseload. Models (3)-(4) consider instead the proportion of high-ability peers in groups F and F-C respectively, for which we find larger peer effects. The coefficient of 0.0544 in model (3) is statistically significant at the 0.05 level and indicates that a one standard deviation (17.2 pp) increase in the proportion of high-ability peers leads to a 26.0 percent (0.9 pp) increase in lawyers’ quit-to-exit probability. We find a similar effect in model (4). In both models we find a notable gender gap, in that female lawyers are around 35.8-41.1 percent (1.3-1.5 pp) more likely to quit than male lawyers. This is consistent with previous findings that women are more likely to discontinue in competitive environments (Hunt, 2016; Wasserman, 2018). We also find evidence that quits are negatively but concavely correlated with own age and negatively correlated with own fee-weighted caseload, which is consistent with Jovanovic (1979).
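The effect sizes quoted for model (3) can be reproduced with a short calculation. The sketch assumes that the percentage change is expressed relative to the sample quit-to-exit rate of 3.6% reported in Table 4; the function name is illustrative.

```python
def effect_of_one_sd(coef, sd, baseline_rate):
    """Return (percentage-point change, percent change relative to the baseline rate)."""
    change = coef * sd
    return 100 * change, 100 * change / baseline_rate

# Coefficient and standard deviation from model (3); baseline is the sample quit rate.
pp, pct = effect_of_one_sd(coef=0.0544, sd=0.172, baseline_rate=0.036)
print(round(pp, 1), round(pct, 1))
```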

Next, in models (5)-(6) we estimate the impact of peer ability separately for those above and below 35. We find that the peer effect is stronger in the younger cohort. The coefficient of 0.0767 in model (5) is statistically significant at the 0.01 level and indicates that a one standard deviation (19.3 pp) increase in the proportion of high-ability peers leads to a 35.9 percent (1.48 pp) increase in lawyers’ quit-to-exit likelihood. In contrast, we find a smaller effect among those older than 35, which is not statistically distinguishable from zero. We also find a larger gender gap among those below 35. Overall, we find stronger quit-to-exit effects among those under 35, which is consistent with Jovanovic (1979).

In Table 6, we restrict our analysis to small and medium-sized firms to alleviate the concern that our ability measure cannot capture non-litigation performance. We limit the sample to firms that have 50 or fewer lawyers (i.e., 732 out of 755 firms). Our results are qualitatively identical to the baseline specification. Another concern relates to lawyers close to retirement age, who may intentionally reduce their caseload before retiring. To address it, we replicate our preferred models (models (3)-(4) in Table 5), excluding lawyers older than 55, the age at which most employees in China are eligible to claim their pension. As shown in models (1)-(2) of Table 7, we still find similar adverse peer ability effects, indicating that retiring lawyers are not driving the estimated effects. Next, in models (3)-(4), we conduct another robustness check by including all law firms in Shanghai. The similar effects we find in these two columns suggest that caseload broadly serves as a reliable ability proxy in the legal profession context. Overall, the similar effects found in alternative samples in models (1)-(4) and in Table 6 imply that the estimated peer ability effects are not specific to a particular sample in our context. Lastly, in models (5)-(6), we consider an alternative measure using the number (rather than the proportion) of high-ability peers in groups F and F-C. The results are also in line with our baseline findings.

In Table 8, we conduct a heterogeneity analysis. In models (1)-(2), we examine the impact of high-ability peers on lawyers whose fee-weighted caseload is below median. Consistent with Antecol et al. (2016) and Booij et al. (2017), these low- to medium-ability individuals drive the negative peer ability effects. In models (3)-(4), we study whether women are more sensitive to peer ability effects. Despite the significant gender gap in persistence, we find no evidence suggesting that women are more sensitive to peer ability effects. In models (5)-(6), we check whether the magnitude of peer ability effects varies by firm size by interacting the proportion of high-ability peers with an indicator for firms below the median size. The coefficient on the interaction term suggests that lawyers in small firms are not additively more prone to peer ability effects.

7.2.   Missing data

We now ask whether we could have obtained similar findings had only a random sample of lawyers with firm identifiers been available. To do this, we draw 1000 random sub-samples of lawyers using the sampling process from our Monte-Carlo experiment (see Section 6.1). We focus on the baseline specification for group F (model (3) of Table 5) and consider 50% and 70% subsamples (i.e., $\rho=0.5$ and $\rho=0.7$). The minimum value of $n_{g_{0}}$ in our sample is 2, which we use as a lower bound for the estimator with unknown group sizes. We make a parametric restriction on the distribution of $n_{g_{0}}-2$, supposing that it follows a negative binomial distribution. Figure 2 depicts the empirical distribution for the original sample and a fitted negative binomial distribution, which provides a good approximation.
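A minimal sketch of the sub-sampling step, assuming that each lawyer is retained independently with probability $\rho$ (this yields Binomial sampled group sizes; the exact process of Section 6.1 is not reproduced here). The function name `subsample_group_sizes` and the toy firm sizes are hypothetical.

```python
import random

def subsample_group_sizes(true_sizes, rho, rng):
    """Keep each member independently with probability rho.

    Returns (observed, true) size pairs for groups with at least one
    sampled member; groups with no sampled member drop out entirely."""
    pairs = []
    for n0 in true_sizes:
        n = sum(rng.random() < rho for _ in range(n0))
        if n >= 1:
            pairs.append((n, n0))
    return pairs

rng = random.Random(0)
true_sizes = [2, 3, 5, 8, 13, 40]   # toy firm sizes, all >= 2 (illustration only)
draws = [subsample_group_sizes(true_sizes, 0.5, rng) for _ in range(1000)]

# Share of observed groups whose sampled size understates the true size
understated = (sum(n < n0 for d in draws for n, n0 in d)
               / sum(len(d) for d in draws))
print(understated)
```

Treating the observed size as if it were the true size corresponds to the miss-specified case; the pairs returned here make the gap between the two explicit.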

The results are reported in Figures 3 ($\rho=0.5$) and 4 ($\rho=0.7$). As in our Monte-Carlo experiment, we consider the estimator based on miss-specified peer group sizes (i.e., incorrectly supposing that $n_{g}=n_{g_{0}}$), known sizes and unknown sizes. Looking at the left hand columns we see that miss-specification has little consequence for estimates of the effect of a lawyer’s own characteristics. However, looking at the right hand columns we see that miss-specification biases estimates of peer effects towards zero. Comparing Figure 3 with Figure 4, we see that the bias in the peer effects is larger for $\rho=0.5$ than for $\rho=0.7$. Unsurprisingly, the variability of the estimators is also larger for $\rho=0.5$ than for $\rho=0.7$. These findings are qualitatively identical to those of our Monte-Carlo experiment, but use real data and larger group sizes.

7.3.   Group uncertainty

Model (7) of Table 5 allows for uncertainty as to whether F or F-C is the relevant peer group. The point estimate of $\psi$ (i.e., the probability of F-C) is 0.562, which suggests considerable heterogeneity, though it is not precisely estimated. Despite this, we find similar peer effects to the models based on groups F and F-C. We obtain similar results in the sample of small-medium firms (see Table 6).

8.   Conclusion

Our identification results and empirical work demonstrate that it is possible to conduct empirical analysis of peer effects despite missing data and group uncertainty. Regarding missing data, we require only that the researcher has access to a sample of individuals with outcomes, exogenous characteristics and group identifiers. We propose a method which does not require information on group size, nor on individuals who are not sampled, nor on whether such individuals even exist. In principle, and subject to the limitations discussed below, this opens up the possibility of peer effects studies based on widely available individual level survey data (provided that they contain group identifiers). We also show that peer effects can be identified under group uncertainty. Future work may extend our results to incorporate both missing individuals and group uncertainty simultaneously.

A limitation of our work is that our results are specific to the group interactions model we consider. Under more general network structures (e.g., social networks), it is not the case that the specified group fixed effects reduce the problem of miss-specification to the problem of inferring the group size. Future work may seek to bridge this divide.

References

  • Abowd et al. (1999) Abowd, J. M., F. Kramarz and D. N. Margolis, “High wage workers and high wage firms,” Econometrica 67 (1999), 251–333.
  • Angrist (2014) Angrist, J. D., “The perils of peer effects,” Labour Economics 30 (2014), 98–108.
  • Angrist and Lang (2004) Angrist, J. D. and K. Lang, “Does school integration generate peer effects? Evidence from Boston’s Metco Program,” American Economic Review 94 (2004), 1613–1634.
  • Antecol et al. (2016) Antecol, H., O. Eren and S. Ozbeklik, “Peer effects in disadvantaged primary schools: Evidence from a randomized experiment,” Journal of Human Resources 51 (2016), 95–132.
  • Arduini et al. (2020) Arduini, T., E. Patacchini and E. Rainone, “Identification and estimation of network models with heterogeneous interactions,” in The Econometrics of Networks (Emerald Publishing Limited, 2020).
  • Arnab (2017) Arnab, R., Survey sampling theory and applications (Academic Press, 2017).
  • Bertrand et al. (2000) Bertrand, M., E. F. Luttmer and S. Mullainathan, “Network effects and welfare cultures,” The Quarterly Journal of Economics 115 (2000), 1019–1055.
  • Blume et al. (2015) Blume, L. E., W. A. Brock, S. N. Durlauf and R. Jayaraman, “Linear social interactions models,” Journal of Political Economy 123 (2015), 444–496.
  • Booij et al. (2017) Booij, A. S., E. Leuven and H. Oosterbeek, “Ability peer effects in university: Evidence from a randomized experiment,” The Review of Economic Studies 84 (2017), 547–578.
  • Boucher et al. (2014) Boucher, V., Y. Bramoullé, H. Djebbari and B. Fortin, “Do peers affect student achievement? Evidence from Canada using group size variation,” Journal of Applied Econometrics 29 (2014), 91–109.
  • Boucher et al. (2020) Boucher, V., Y. Bramoullé et al., Binary Outcomes and Linear Interactions (Aix-Marseille School of Economics, 2020).
  • Boucher and Houndetoungan (2020) Boucher, V. and A. Houndetoungan, “Estimating peer effects using partial network data,” Working paper (2020).
  • Brady et al. (2017) Brady, R. R., M. A. Insler and A. S. Rahman, “Bad company: Understanding negative peer effects in college achievement,” European Economic Review 98 (2017), 144–168.
  • Bramoullé et al. (2009) Bramoullé, Y., H. Djebbari and B. Fortin, “Identification of peer effects through social networks,” Journal of Econometrics 150 (2009), 41–55.
  • Bramoullé et al. (2020) ———, “Peer effects in networks: A survey,” Annual Review of Economics 12 (2020), 603–629.
  • Breza et al. (2020) Breza, E., A. G. Chandrasekhar, T. H. McCormick and M. Pan, “Using aggregated relational data to feasibly identify network structure without network data,” American Economic Review 110 (2020), 2454–84.
  • Brown (2011) Brown, J., “Quitters never win: The (adverse) incentive effects of competing with superstars,” Journal of Political Economy 119 (2011), 982–1013.
  • Carrell et al. (2013) Carrell, S. E., B. I. Sacerdote and J. E. West, “From natural variation to optimal policy? The importance of endogenous peer group formation,” Econometrica 81 (2013), 855–882.
  • Chandrasekhar and Lewis (2011) Chandrasekhar, A. and R. Lewis, “Econometrics of sampled networks,” Unpublished manuscript, MIT (2011).
  • Cornelissen et al. (2017) Cornelissen, T., C. Dustmann and U. Schönberg, “Peer effects in the workplace,” American Economic Review 107 (2017), 425–56.
  • Davezies et al. (2009) Davezies, L., X. d’Haultfoeuille and D. Fougère, “Identification of peer effects using group size variation,” The Econometrics Journal 12 (2009), 397–413.
  • Duflo et al. (2011) Duflo, E., P. Dupas and M. Kremer, “Peer effects, teacher incentives, and the impact of tracking: Evidence from a randomized evaluation in Kenya,” American Economic Review 101 (2011), 1739–74.
  • Emerson and Hill (2018) Emerson, J. and B. Hill, “Peer effects in marathon racing: The role of pace setters,” Labour Economics 52 (2018), 74–82.
  • Foster (2006) Foster, G., “It’s not your peers, and it’s not your friends: Some progress toward understanding the educational peer effect mechanism,” Journal of Public Economics 90 (2006), 1455–1475.
  • Glaeser et al. (2003) Glaeser, E. L., B. I. Sacerdote and J. A. Scheinkman, “The social multiplier,” Journal of the European Economic Association 1 (2003), 345–353.
  • Goldsmith-Pinkham and Imbens (2013) Goldsmith-Pinkham, P. and G. W. Imbens, “Social networks and the identification of peer effects,” Journal of Business & Economic Statistics 31 (2013), 253–264.
  • Guryan et al. (2009) Guryan, J., K. Kroft and M. J. Notowidigdo, “Peer effects in the workplace: Evidence from random groupings in professional golf tournaments,” American Economic Journal: Applied Economics 1 (2009), 34–68.
  • Hardy et al. (2019) Hardy, M., R. M. Heath, W. Lee and T. H. McCormick, “Estimating spillovers using imprecisely measured networks,” arXiv preprint arXiv:1904.00136 (2019).
  • Howell (2021) Howell, S. T., “Learning from feedback: Evidence from new ventures,” Review of Finance 25 (2021), 595–627.
  • Hoxby (2000) Hoxby, C., “Peer effects in the classroom: Learning from gender and race variation,” Technical Report, National Bureau of Economic Research, 2000.
  • Hoxby and Weingarth (2005) Hoxby, C. M. and G. Weingarth, “Taking race out of the equation: School reassignment and the structure of peer effects,” Technical Report, Citeseer, 2005.
  • Hunt (2016) Hunt, J., “Why do women leave science and engineering?,” ILR Review 69 (2016), 199–226.
  • Jovanovic (1979) Jovanovic, B., “Job matching and the theory of turnover,” Journal of Political Economy 87 (1979), 972–990.
  • Lee (2007) Lee, L.-F., “Identification and estimation of econometric models with group interactions, contextual factors and fixed effects,” Journal of Econometrics 140 (2007), 333–374.
  • Lewbel (2019) Lewbel, A., “The identification zoo: Meanings of identification in econometrics,” Journal of Economic Literature 57 (2019), 835–903.
  • Lewbel et al. (2019) Lewbel, A., X. Qu, X. Tang et al., “Social networks with misclassified or unobserved links,” Unpublished manuscript (2019).
  • Liu (2006) Liu, S., “Client influence and the contingency of professionalism: the work of elite corporate lawyers in China,” Law & Society Review 40 (2006), 751–782.
  • Liu et al. (2017) Liu, X., E. Patacchini and E. Rainone, “Peer effects in bedtime decisions among adolescents: a social network model with sampled data,” The Econometrics Journal 20 (2017), S103–S125.
  • Lundborg (2006) Lundborg, P., “Having the wrong friends? Peer effects in adolescent substance use,” Journal of Health Economics 25 (2006), 214–233.
  • Manski (1993) Manski, C. F., “Identification of endogenous social effects: The reflection problem,” The Review of Economic Studies 60 (1993), 531–542.
  • Manski (1995) ———, Identification problems in the social sciences (Harvard University Press, 1995).
  • Michelson (2006) Michelson, E., “The practice of law as an obstacle to justice: Chinese lawyers at work,” Law & Society Review 40 (2006), 1–38.
  • Moffitt et al. (2001) Moffitt, R. A. et al., “Policy interventions, low-level equilibria, and social interactions,” Social dynamics 4 (2001), 6–17.
  • Reza et al. (2021) Reza, S., P. Manchanda and J.-K. Chong, “Identification and Estimation of Endogenous Peer Effects Using Partial Network Data from Multiple Reference Groups,” Management Science (2021).
  • Rose (2017) Rose, C. D., “Identification of peer effects through social networks using variance restrictions,” The Econometrics Journal 20 (2017), S47–S60.
  • Sacerdote (2001) Sacerdote, B., “Peer effects with random assignment: Results for Dartmouth roommates,” The Quarterly Journal of Economics 116 (2001), 681–704.
  • Smith (2013) Smith, J., “Peers, pressure, and performance at the national spelling bee,” Journal of Human Resources 48 (2013), 265–285.
  • Sojourner (2013) Sojourner, A., “Identification of peer effects with missing peer data: Evidence from Project STAR,” The Economic Journal 123 (2013), 574–605.
  • Thiemann (2022) Thiemann, P., “The persistent effects of short-term peer groups on performance: Evidence from a natural experiment in higher education,” Management Science 68 (2022), 1131–1148.
  • Wang and Lee (2013) Wang, W. and L.-F. Lee, “Estimation of spatial autoregressive models with randomly missing data in the dependent variable,” The Econometrics Journal 16 (2013), 73–102.
  • Wasserman (2018) Wasserman, M., “Gender differences in politician persistence,” The Review of Economics and Statistics (2018), 1–46.
  • Yamane and Hayashi (2015) Yamane, S. and R. Hayashi, “Peer effects among swimmers,” The Scandinavian Journal of Economics 117 (2015), 1230–1255.

Appendix

Proof of Proposition 1.

The first result was established by Davezies et al. (2009).

To establish the second result, first note that Assumption 3 implies $\mathbb{E}[\overline{u}_{i}|n_{g},\mathbf{x}_{g}]=0$ for all $i\in\mathcal{S}$. Let us now fix $\mathbf{x}_{g}=\mathbf{x}$ for some $\mathbf{x}$ in its support. We begin by identifying $\rho$ and the distribution of $n_{g_{0}}|\mathbf{x}_{g}=\mathbf{x}$ using the observable distribution of $n_{g}|n_{g}\geq 1,\mathbf{x}_{g}=\mathbf{x}$ and the fact that Assumption 3 implies

\[
n_{g}|n_{g_{0}},\mathbf{x}_{g}=\mathbf{x}\sim\text{Binomial}(n_{g_{0}},\rho).
\]

By the boundedness assumption, there exists $\overline{n}<\infty$ such that $\mathbb{P}[n_{g_{0}}=\overline{n}]>0$ and $\mathbb{P}[n_{g_{0}}\leq\overline{n}]=1$. Note that, because $\rho>0$, $\overline{n}$ is identifiable as the largest integer $n$ such that $\mathbb{P}[n_{g}=n|n_{g}\geq 1]>0$. This implies that there exists an identifiable $\overline{n}(\mathbf{x})<\infty$ such that $\mathbb{P}[n_{g_{0}}=\overline{n}(\mathbf{x})|\mathbf{x}_{g}=\mathbf{x}]>0$ and $\mathbb{P}[n_{g_{0}}\leq\overline{n}(\mathbf{x})|\mathbf{x}_{g}=\mathbf{x}]=1$.

Let $\mathbf{p}_{n}(\mathbf{x})=\mathbb{P}[n_{g}=n|n_{g}\geq 1,\mathbf{x}_{g}=\mathbf{x}]$ and $\mathbf{q}_{m}(\mathbf{x})=\mathbb{P}[n_{g_{0}}=m|\mathbf{x}_{g}=\mathbf{x}]$. Under Assumption 3, $\mathbf{p}(\mathbf{x})=(\mathbf{p}_{n}(\mathbf{x}))_{n=1,...,\overline{n}(\mathbf{x})}$ and $\mathbf{q}(\mathbf{x})=(\mathbf{q}_{m}(\mathbf{x}))_{m=1,...,\overline{n}(\mathbf{x})}$ verify

\[
\mathbf{p}(\mathbf{x})/\mathbf{p}_{\overline{n}(\mathbf{x})}(\mathbf{x})=\mathbf{A}(\mathbf{x},\rho)\mathbf{s}(\mathbf{x})\rho^{-\overline{n}(\mathbf{x})} \tag{8.1}
\]

where $\mathbf{A}(\mathbf{x},\rho)=(\mathbf{A}_{ij}(\mathbf{x},\rho))_{i=1,...,\overline{n}(\mathbf{x}),j=1,...,\overline{n}(\mathbf{x})}$ is an upper triangular Binomial matrix with entries

\[
\mathbf{A}_{ij}(\mathbf{x},\rho)=\begin{cases}{j\choose i}\rho^{i}(1-\rho)^{j-i}&i\leq j\\ 0&\text{otherwise}\end{cases}
\]

and $\mathbf{s}(\mathbf{x})=\mathbf{q}(\mathbf{x})/\mathbf{q}_{\overline{n}(\mathbf{x})}(\mathbf{x})$. By construction, $\mathbf{s}_{\overline{n}(\mathbf{x})}(\mathbf{x})=1$ and $\sum_{m=1}^{\overline{n}(\mathbf{x})}\mathbf{s}_{m}(\mathbf{x})=1/\mathbf{q}_{\overline{n}(\mathbf{x})}(\mathbf{x})$. We also know $\mathbf{q}_{1}(\mathbf{x})=0$ because $n_{g_{0}}\geq 2$, hence we know $\mathbf{s}_{1}(\mathbf{x})=0$. The remaining entries of $\mathbf{s}(\mathbf{x})$ are non-negative but unknown.
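Equation (8.1) can be checked numerically for a toy distribution of true group sizes. The sketch below is illustrative only: the values of $\rho$, $\overline{n}$ and $\mathbf{q}$ are hypothetical, and the helper names `binomial_matrix` and `matvec` are not from the paper.

```python
from math import comb

def binomial_matrix(n_bar, rho):
    """Upper triangular matrix with A[i][j] = C(j,i) rho^i (1-rho)^(j-i)
    for i <= j, where i, j = 1..n_bar (stored at list offset -1)."""
    return [[comb(j, i) * rho**i * (1 - rho)**(j - i) if i <= j else 0.0
             for j in range(1, n_bar + 1)]
            for i in range(1, n_bar + 1)]

def matvec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

rho, n_bar = 0.6, 4                   # hypothetical sampling rate and largest size
q = [0.0, 0.5, 0.2, 0.3]              # P[n_g0 = m] for m = 1..4, with q_1 = 0
A = binomial_matrix(n_bar, rho)
Aq = matvec(A, q)                     # P[n_g = n] for n = 1..4
p = [v / sum(Aq) for v in Aq]         # condition on n_g >= 1
s = [v / q[-1] for v in q]            # s = q / q_nbar

lhs = [v / p[-1] for v in p]                      # p / p_nbar
rhs = [v * rho**(-n_bar) for v in matvec(A, s)]   # A s rho^(-nbar)
print(max(abs(a - b) for a, b in zip(lhs, rhs)))
```

The printed discrepancy should be zero up to floating point error, since both sides equal $(\mathbf{A}\mathbf{q})_n/(\rho^{\overline{n}}\mathbf{q}_{\overline{n}})$.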

Now suppose that there exists $\widetilde{\rho}$ and $\widetilde{\mathbf{s}}(\mathbf{x})$ which also verify (8.1). Then we have

\begin{align}
\mathbf{s}(\mathbf{x}) &=\left(\frac{\rho}{\widetilde{\rho}}\right)^{\overline{n}(\mathbf{x})}\mathbf{A}(\mathbf{x},\rho)^{-1}\mathbf{A}(\mathbf{x},\widetilde{\rho})\widetilde{\mathbf{s}}(\mathbf{x}) \tag{8.2}\\
&=\mathbf{B}(\mathbf{x},\rho,\widetilde{\rho})\widetilde{\mathbf{s}}(\mathbf{x}) \tag{8.3}
\end{align}

where $\mathbf{A}(\mathbf{x},\rho)^{-1}$ exists since $|\mathbf{A}(\mathbf{x},\rho)|=\rho^{T(\overline{n}(\mathbf{x}))}$ where $T(\overline{n}(\mathbf{x}))$ is the $\overline{n}(\mathbf{x})^{th}$ triangular number and by assumption $\rho>0$. The matrix $\mathbf{B}(\mathbf{x},\rho,\widetilde{\rho})=(\mathbf{B}_{ij}(\mathbf{x},\rho,\widetilde{\rho}))_{i=1,...,\overline{n}(\mathbf{x}),j=1,...,\overline{n}(\mathbf{x})}$ is an upper triangular matrix with entries

\[
\mathbf{B}_{ij}(\mathbf{x},\rho,\widetilde{\rho})=\begin{cases}\left(\frac{\rho}{\widetilde{\rho}}\right)^{\overline{n}(\mathbf{x})}{j\choose i}\widetilde{\rho}^{i}\rho^{-j}(\rho-\widetilde{\rho})^{j-i}&i\leq j\\ 0&\text{otherwise}\end{cases}.
\]

The final equation in the system in (8.3) is redundant because we already established that $\mathbf{s}_{\overline{n}(\mathbf{x})}=\widetilde{\mathbf{s}}_{\overline{n}(\mathbf{x})}=1$ by construction. Rearranging the penultimate equation yields

\[
\rho=\widetilde{\rho}\left(\frac{\mathbf{s}_{\overline{n}(\mathbf{x})-1}(\mathbf{x})+\overline{n}(\mathbf{x})}{\widetilde{\mathbf{s}}_{\overline{n}(\mathbf{x})-1}(\mathbf{x})+\overline{n}(\mathbf{x})}\right) \tag{8.4}
\]

Denoting the term in parentheses by $\Delta(\mathbf{x})$, injecting (8.4) into the definition of $\mathbf{B}(\mathbf{x},\rho,\widetilde{\rho})$ yields $\mathbf{B}(\mathbf{x},\rho,\widetilde{\rho})=\mathbf{C}(\Delta(\mathbf{x}))$ with entries

\[
\mathbf{C}_{ij}(\Delta(\mathbf{x}))=\begin{cases}{j\choose i}\Delta(\mathbf{x})^{\overline{n}(\mathbf{x})-j}(\Delta(\mathbf{x})-1)^{j-i}&i\leq j\\ 0&\text{otherwise}\end{cases},
\]

hence we can re-write the system in (8.3) as

\[
\mathbf{s}(\mathbf{x})=\mathbf{C}(\Delta(\mathbf{x}))\widetilde{\mathbf{s}}(\mathbf{x}). \tag{8.5}
\]
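The step from $\mathbf{B}$ to $\mathbf{C}$ can be spelled out as follows, writing $\Delta=\rho/\widetilde{\rho}$ as implied by (8.4) and suppressing the dependence on $\mathbf{x}$. For $i\leq j$, substituting $\rho-\widetilde{\rho}=\widetilde{\rho}(\Delta-1)$ gives

```latex
\begin{align*}
\mathbf{B}_{ij}
&= \Delta^{\overline{n}} {j\choose i}\, \widetilde{\rho}^{\,i}\rho^{-j}(\rho-\widetilde{\rho})^{j-i} \\
&= \Delta^{\overline{n}} {j\choose i}\, \widetilde{\rho}^{\,i}\rho^{-j}\,\widetilde{\rho}^{\,j-i}(\Delta-1)^{j-i} \\
&= \Delta^{\overline{n}} {j\choose i} \left(\frac{\widetilde{\rho}}{\rho}\right)^{j}(\Delta-1)^{j-i} \\
&= {j\choose i}\, \Delta^{\overline{n}-j}(\Delta-1)^{j-i} = \mathbf{C}_{ij}(\Delta).
\end{align*}
```

In particular, the matrix depends on $(\rho,\widetilde{\rho})$ only through their ratio, which is what allows the remaining argument to be carried out in terms of $\Delta$ alone.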

From its definition, it is clear that $\Delta(\mathbf{x})>0$ and when $\Delta(\mathbf{x})=1$ we have $\rho=\widetilde{\rho}$ (due to (8.4)), $\mathbf{s}(\mathbf{x})=\widetilde{\mathbf{s}}(\mathbf{x})$ (due to (8.3)) and hence $\mathbf{q}(\mathbf{x})=\widetilde{\mathbf{q}}(\mathbf{x})$ (because $\sum_{m=1}^{\overline{n}(\mathbf{x})}\mathbf{s}_{m}(\mathbf{x})=1/\mathbf{q}_{\overline{n}(\mathbf{x})}(\mathbf{x})$). We now show that $\Delta(\mathbf{x})=1$ by ruling out $\Delta(\mathbf{x})>1$ and $\Delta(\mathbf{x})<1$ by contradiction.

Suppose first that $\Delta(\mathbf{x})>1$. Then, since $\mathbf{s}_{1}(\mathbf{x})=0$, the first equation in (8.5) is

\[
0=\sum_{j=1}^{\overline{n}(\mathbf{x})}j\Delta(\mathbf{x})^{\overline{n}(\mathbf{x})-j}(\Delta(\mathbf{x})-1)^{j-1}\widetilde{\mathbf{s}}_{j}(\mathbf{x})
\]

The right hand side is strictly positive because by construction at least one entry of $\widetilde{\mathbf{s}}(\mathbf{x})$ must be strictly positive. Hence we cannot have $\Delta(\mathbf{x})>1$.

Suppose instead that $\Delta(\mathbf{x})<1$. Using the first $\overline{n}(\mathbf{x})-2$ equations in (8.5) to solve for the $\overline{n}(\mathbf{x})-2$ unknowns $\widetilde{\mathbf{s}}_{2}(\mathbf{x}),\widetilde{\mathbf{s}}_{3}(\mathbf{x}),...,\widetilde{\mathbf{s}}_{\overline{n}(\mathbf{x})-1}(\mathbf{x})$, we obtain

\[
\begin{pmatrix}\widetilde{\mathbf{s}}_{2}(\mathbf{x})\\ \widetilde{\mathbf{s}}_{3}(\mathbf{x})\\ \vdots\\ \widetilde{\mathbf{s}}_{\overline{n}(\mathbf{x})-1}(\mathbf{x})\end{pmatrix}=\mathbf{D}(\Delta(\mathbf{x}))^{-1}\left[\begin{pmatrix}0\\ \mathbf{s}_{2}(\mathbf{x})\\ \vdots\\ \mathbf{s}_{\overline{n}(\mathbf{x})-2}(\mathbf{x})\end{pmatrix}-\mathbf{d}(\Delta(\mathbf{x}))\right], \tag{8.6}
\]

where $\mathbf{D}(\Delta(\mathbf{x}))$ is the sub-matrix of $\mathbf{C}(\Delta(\mathbf{x}))$ formed from rows $1,2,...,\overline{n}(\mathbf{x})-2$ and columns $2,3,...,\overline{n}(\mathbf{x})-1$ and $\mathbf{d}(\Delta(\mathbf{x}))$ is the sub-matrix of $\mathbf{C}(\Delta(\mathbf{x}))$ formed from rows $1,2,...,\overline{n}(\mathbf{x})-2$ and column $\overline{n}(\mathbf{x})$. Injecting (8.6) into the right hand side of the penultimate equation of (8.5) yields

\[
\mathbf{s}_{\overline{n}(\mathbf{x})-1}(\mathbf{x})=\begin{cases}-\frac{\overline{n}(\mathbf{x})(1-\Delta(\mathbf{x}))}{\overline{n}(\mathbf{x})-1}&\overline{n}(\mathbf{x})=2\\ -\frac{\sum_{j=2}^{\overline{n}(\mathbf{x})-2}j(1-\Delta(\mathbf{x}))^{j-2}\mathbf{s}_{j}(\mathbf{x})}{(\overline{n}(\mathbf{x})-1)(1-\Delta(\mathbf{x}))^{\overline{n}(\mathbf{x})-3}}-\frac{\overline{n}(\mathbf{x})(1-\Delta(\mathbf{x}))}{\overline{n}(\mathbf{x})-1}&\overline{n}(\mathbf{x})\geq 3\end{cases}
\]

Since $\Delta(\mathbf{x})\in(0,1)$ and the entries of $\mathbf{s}(\mathbf{x})$ are non-negative, the right hand side is strictly negative whilst the left hand side is non-negative. Hence we cannot have $\Delta(\mathbf{x})<1$. So $\Delta(\mathbf{x})=1$ and $\rho$ and $\mathbf{q}(\mathbf{x})$ are identified.

Provided that there is variation in the elements of $\mathbf{x}$, the conditional moment

\[
\mathbb{E}[\overline{y}_{i}|n_{g}=n,\mathbf{x}_{g}=\mathbf{x}]=\overline{x}_{i}\varphi(n,\mathbf{x}) \tag{8.7}
\]

identifies $\varphi(n,\mathbf{x})$ for all such $(\mathbf{x},n\geq 2)$ in the support of $(\mathbf{x}_{g},n_{g})$ (it is not identified when $n=1$ nor when $\mathbf{x}=c\iota_{n}$ for some constant $c$ because in these cases $\overline{y}_{i}=\overline{x}_{i}=0$). The rest of the proof restricts attention to the case in which there is variation in the elements of $\mathbf{x}$.

Define $\mathbf{r}_{n}(\mathbf{x})=\mathbb{P}[n_{g}=n|\mathbf{x}_{g}=\mathbf{x}]$ and $\mathbf{r}(\mathbf{x})=(\mathbf{r}_{n}(\mathbf{x}))_{n=1,...,\overline{n}(\mathbf{x})}$. Since $\mathbf{q}(\mathbf{x})$ and $\rho$ are identified, $\mathbf{r}(\mathbf{x})$ is also identified using

\[
\mathbf{r}(\mathbf{x})=(\iota_{\overline{n}(\mathbf{x})}^{\prime}\mathbf{A}(\mathbf{x},\rho)\mathbf{q}(\mathbf{x}))\mathbf{p}(\mathbf{x}).
\]

Now denote $\mathcal{Q}(\mathbf{x})=\{m\in\{2,...,\overline{n}(\mathbf{x})\}:\mathbf{q}_{m}(\mathbf{x})>0\}$, the largest element of which is $\overline{n}(\mathbf{x})$. Moreover, since $\rho>0$, the support of $\mathbf{r}(\mathbf{x})$ is $\mathcal{R}(\mathbf{x})=\{1,...,\overline{n}(\mathbf{x})\}$ if $\rho\in(0,1)$ and $\mathcal{R}(\mathbf{x})=\mathcal{Q}(\mathbf{x})$ if $\rho=1$. Let $\widehat{\boldsymbol{\pi}}(\mathbf{x})=(\pi(m))_{m\in\mathcal{Q}(\mathbf{x})}$, $\widehat{\mathbf{q}}(\mathbf{x})=(\mathbf{q}_{m}(\mathbf{x}))_{m\in\mathcal{Q}(\mathbf{x})}$, $\widehat{\mathbf{r}}(\mathbf{x})=(\mathbf{r}_{n}(\mathbf{x}))_{n\in\mathcal{R}(\mathbf{x})\backslash 1}$ and $\widehat{\mathbf{A}}(\mathbf{x},\rho)$ be the sub-matrix of $\mathbf{A}(\mathbf{x},\rho)$ formed from rows $\mathcal{R}(\mathbf{x})\backslash 1$ and columns $\mathcal{Q}(\mathbf{x})$. Then we have the equations

\[
(\varphi(n,\mathbf{x}))_{n\in\mathcal{R}(\mathbf{x})\backslash 1}={\rm diag}(\widehat{\mathbf{r}}(\mathbf{x}))^{-1}\widehat{\mathbf{A}}(\mathbf{x},\rho){\rm diag}(\widehat{\mathbf{q}}(\mathbf{x}))\widehat{\boldsymbol{\pi}}(\mathbf{x}), \tag{8.8}
\]

for which the left hand side is identified. Since $\mathbf{A}(\mathbf{x},\rho)$ has full rank, $\widehat{\mathbf{A}}(\mathbf{x},\rho)$ has rank $|\mathcal{Q}(\mathbf{x})|$. This is because, when $\rho=1$, we have $\mathcal{R}(\mathbf{x})=\mathcal{Q}(\mathbf{x})$ and $\widehat{\mathbf{A}}(\mathbf{x},\rho)=\mathbf{I}_{|\mathcal{Q}(\mathbf{x})|}$. When $\rho\in(0,1)$, $\mathcal{R}(\mathbf{x})=\{1,...,\overline{n}(\mathbf{x})\}$, hence $\widehat{\mathbf{A}}(\mathbf{x},\rho)$ comprises rows $\{2,...,\overline{n}(\mathbf{x})\}$ and a subset of columns $\{2,...,\overline{n}(\mathbf{x})\}$ of $\mathbf{A}(\mathbf{x},\rho)$. The sub-matrix of $\mathbf{A}(\mathbf{x},\rho)$ with rows $\{2,...,\overline{n}(\mathbf{x})\}$ and columns $\{2,...,\overline{n}(\mathbf{x})\}$ has determinant $\rho^{T(\overline{n}(\mathbf{x})-1)}>0$, which implies that $\widehat{\mathbf{A}}(\mathbf{x},\rho)$ has rank $|\mathcal{Q}(\mathbf{x})|$. Moreover, ${\rm diag}(\widehat{\mathbf{q}}(\mathbf{x}))$ is invertible by construction, hence (8.8) identifies $\widehat{\boldsymbol{\pi}}(\mathbf{x})$ for all $\mathbf{x}$ in the support of $\mathbf{x}_{g}$.

By assumption there are at least three different group sizes in the support of $n_{g_{0}}$, so $|\cup_{\mathbf{x}}\mathcal{Q}(\mathbf{x})|\geq 3$. This implies that $\pi(n_{1}),\pi(n_{2}),\pi(n_{3})$ are identified for at least three different group sizes. We conclude by applying the well known result established by Lee (2007), Davezies et al. (2009) and Bramoullé et al. (2009) that $(\gamma,\delta,\beta)$ are identified when there are at least three different group sizes and $\gamma\beta+\delta\neq 0$.

Proof of Proposition 2.

Fix $\mathbf{x}_{g_{2}}=\mathbf{x}$ where $\mathbf{x}$ is some point in its support. Provided that there is variation in the elements of $\mathbf{x}$, the conditional moment

\[
\mathbb{E}[\overline{y}_{i}|n_{g_{1}}=n_{1},n_{g_{2}}=n_{2},\mathbf{x}_{g_{2}}=\mathbf{x}]=\overline{x}_{i}\varphi^{\prime}(n_{1},n_{2},\mathbf{x}) \tag{8.9}
\]

identifies $\varphi^{\prime}(n_{1},n_{2},\mathbf{x})$ for all such $(\mathbf{x},n_{1}\geq 2,n_{2})$ in the support of $(\mathbf{x}_{g_{2}},n_{g_{1}},n_{g_{2}})$ (it is not identified when $n_{1}=1$ nor when $\mathbf{x}=c\iota_{n_{2}}$ for some constant $c$ because in these cases $\overline{y}_{i}=\overline{x}_{i}=0$). We obtain

\[
\varphi^{\prime}(n_{1},n_{2},\mathbf{x})=\psi\pi(n_{1})+(1-\psi)\pi(n_{2}) \tag{8.10}
\]

We now consider the cases $\psi=0$, $\psi=1$ and $\psi\in(0,1)$ separately. If $\psi=0$ then $(\gamma,\delta,\beta)$ are identified because $\gamma\beta+\delta\neq 0$ and $n_{g_{2}}$ has at least three points in the support of its distribution. If $\psi=1$ then $(\gamma,\delta,\beta)$ are identified because $\gamma\beta+\delta\neq 0$ and $n_{g_{1}}$ has at least three points in the support of its distribution. The rest of the proof now considers $\psi\in(0,1)$.

By standard arguments for two-way bi-partite fixed effects models (e.g., Abowd et al. (1999)), equation (8.10) identifies $\mu(n_{1})=\psi\pi(n_{1})+c$ and $\mu(n_{2})=(1-\psi)\pi(n_{2})-c$ for every pair $(n_{1},n_{2})$ corresponding to two vertices in a connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$, where $c$ is an unknown constant. Let the vertices in the connected component respectively correspond to $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$, where $\mathcal{N}_{1}$ is a subset of the support of $n_{g_{1}}$ and $\mathcal{N}_{2}$ is a subset of the support of $n_{g_{2}}$.

By assumption either $|\mathcal{N}_{1}|\geq 3$ or $|\mathcal{N}_{2}|\geq 3$. Without loss of generality, we present the case of $|\mathcal{N}_{1}|\geq 3$. If the converse holds, the argument is symmetric, switching the roles of the group sizes in the remainder of the proof. Since the component is connected, we have $2\leq n_{1,1}<n_{1,2}<n_{1,3}$ and $2\leq n_{2}$ such that $\{n_{1,1},n_{1,2},n_{1,3}\}\subseteq\mathcal{N}_{1}$ and $\{n_{2}\}\subseteq\mathcal{N}_{2}$. Moreover, $\mu(n)$ is identified for $n\in\{n_{1,1},n_{1,2},n_{1,3},n_{2}\}$. For $i\in\{1,2,3\}$, denote

\[
\Delta_{i}=\mu(n_{1,i})+\mu(n_{2})=\pi(n_{2})+\frac{\psi(\gamma\beta+\delta)(n_{1,i}-n_{2})}{(n_{1,i}-1+\beta)(n_{2}-1+\beta)} \tag{8.11}
\]

Then after some algebra, for $k\in\{1,2,3\}$ we have

\[
\Delta_{i}-\Delta_{k}=\frac{\psi(\gamma\beta+\delta)(n_{1,i}-n_{1,k})}{(n_{1,i}-1+\beta)(n_{1,k}-1+\beta)} \tag{8.12}
\]

Now since $\psi\neq 0$ and $\delta+\beta\gamma\neq 0$,

\[
\Delta\triangleq\frac{\Delta_{1}-\Delta_{2}}{\Delta_{1}-\Delta_{3}}=\frac{(n_{1,1}-n_{1,2})(n_{1,3}-1+\beta)}{(n_{1,1}-n_{1,3})(n_{1,2}-1+\beta)} \tag{8.13}
\]

Since $\Delta$ is identified, $\beta$ is identified by (8.13) if $\Delta(n_{1,1}-n_{1,3})\neq n_{1,1}-n_{1,2}$. Suppose that $\Delta(n_{1,1}-n_{1,3})=n_{1,1}-n_{1,2}$. Then (8.13) implies $n_{1,2}=n_{1,3}$, but here we have $n_{1,2}<n_{1,3}$, a contradiction. So $\beta$ is identified. Since $\beta$ is identified, $\psi(\gamma\beta+\delta)$ is identified from (8.12), hence $\gamma-\delta/(n_{2}-1)$ is identified using (8.11). Now, since the support of $n_{g_{2}}$ has at least three elements, we also identify $\Delta^{\prime}=\mu(n_{1})+\mu(n_{2}^{\prime})$ for some $n_{1}$ in the support of $n_{g_{1}}$ and some $n_{2}^{\prime}\neq n_{2}$ in the support of $n_{g_{2}}$. Note that $n_{1}$ and $n_{2}$ may be in a different connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$. Since $\beta$ and $\psi(\gamma\beta+\delta)$ are identified, $\Delta^{\prime}$ identifies $\gamma-\delta/(n_{2}^{\prime}-1)$. Since $\gamma-\delta/(n_{2}-1)$ and $\gamma-\delta/(n_{2}^{\prime}-1)$ are identified and $n_{2}\neq n_{2}^{\prime}$, we identify $\gamma$ and $\delta$.
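The constructive steps of this argument can be traced numerically. The sketch below is an illustration under stated assumptions, not the paper's estimator: it takes $\pi(n)=((n-1)\gamma-\delta)/(n-1+\beta)$, a form chosen because it is consistent with (8.11), and all parameter values, group sizes and function names are hypothetical.

```python
def pi(n, gamma, delta, beta):
    """Assumed reduced-form slope for group size n, consistent with (8.11)."""
    return ((n - 1) * gamma - delta) / (n - 1 + beta)

# Hypothetical true parameters and group sizes (illustration only)
gamma, delta, beta, psi = 1.0, 0.5, 0.3, 0.4
n1 = [3, 5, 9]        # three sizes of the first group structure
n2, n2p = 4, 7        # two sizes of the second group structure

def Delta(a, b):
    """Identified sum mu(a) + mu(b) = psi*pi(a) + (1 - psi)*pi(b)."""
    return psi * pi(a, gamma, delta, beta) + (1 - psi) * pi(b, gamma, delta, beta)

D = [Delta(a, n2) for a in n1]

# Step 1: solve the ratio in (8.13) for beta
R = (D[0] - D[1]) / (D[0] - D[2])
num = (n1[0] - n1[1]) * (n1[2] - 1) - R * (n1[0] - n1[2]) * (n1[1] - 1)
beta_hat = num / (R * (n1[0] - n1[2]) - (n1[0] - n1[1]))

# Step 2: psi*(gamma*beta + delta) from (8.12)
K_hat = ((D[0] - D[1]) * (n1[0] - 1 + beta_hat) * (n1[1] - 1 + beta_hat)
         / (n1[0] - n1[1]))

def gamma_minus_delta_over(m, a, D_am):
    """Back out gamma - delta/(m-1) from Delta = mu(a) + mu(m) via (8.11)."""
    pi_m = D_am - K_hat * (a - m) / ((a - 1 + beta_hat) * (m - 1 + beta_hat))
    return pi_m * (m - 1 + beta_hat) / (m - 1)

# Step 3: two sizes of the second structure pin down gamma and delta
u = gamma_minus_delta_over(n2, n1[0], D[0])
v = gamma_minus_delta_over(n2p, n1[0], Delta(n1[0], n2p))
delta_hat = (u - v) / (1 / (n2p - 1) - 1 / (n2 - 1))
gamma_hat = u + delta_hat / (n2 - 1)
print(beta_hat, gamma_hat, delta_hat)
```

The denominator in Step 1 is exactly the quantity $\Delta(n_{1,1}-n_{1,3})-(n_{1,1}-n_{1,2})$ that the proof shows to be non-zero when $n_{1,2}<n_{1,3}$.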

IV estimation for missing data with known group sizes

In stacked form, the structural equation is

$$\overline{\mathbf{y}}_{g}=\beta\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{y}}_{g}+\delta\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{x}}_{g}+\gamma\overline{\mathbf{x}}_{g}+\overline{\boldsymbol{\epsilon}}_{g}, \tag{8.14}$$

where $\overline{\mathbf{y}}_{g}=\mathbf{W}_{g}\mathbf{y}_{g}$, $\mathbf{W}_{g}$ is the $n_{g}\times n_{g}$ within-specified-group transformation with diagonal elements $1-n_{g}^{-1}$ and off-diagonal elements $-n_{g}^{-1}$, and $\mathbf{G}_{g}(n_{g_{0}})$ is $n_{g}\times n_{g}$ with diagonal elements equal to zero and off-diagonal elements equal to $(n_{g_{0}}-1)^{-1}$. We can also rewrite the reduced form (equation (4.1)) in stacked form as

$$\overline{\mathbf{y}}_{g}=(\mathbf{I}_{n_{g}}-\beta\mathbf{G}_{g}(n_{g_{0}}))^{-1}(\gamma\mathbf{I}_{n_{g}}+\delta\mathbf{G}_{g}(n_{g_{0}}))\overline{\mathbf{x}}_{g}+(\mathbf{I}_{n_{g}}-\beta\mathbf{G}_{g}(n_{g_{0}}))^{-1}\overline{\boldsymbol{\epsilon}}_{g}, \tag{8.15}$$

where $\mathbf{I}_{n}$ is the identity of dimension $n$, hence we have

$$\mathbb{E}[\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{y}}_{g}\,|\,n_{g},n_{g_{0}},\mathbf{x}_{g}]=\gamma\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{x}}_{g}+(\gamma\beta+\delta)\left(\sum_{j=2}^{\infty}\beta^{j-2}\mathbf{G}_{g}(n_{g_{0}})^{j}\right)\overline{\mathbf{x}}_{g}. \tag{8.16}$$

Using (8.16), it is clear that $\mathbf{G}_{g}(n_{g_{0}})^{2}\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})^{3}\overline{\mathbf{x}}_{g},\ldots$ are instruments for $\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{y}}_{g}$, hence we can estimate (8.14) by linear GMM using $\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})^{2}\overline{\mathbf{x}}_{g},\ldots$ as instruments for $\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{y}}_{g}$ (Bramoullé et al., 2009).
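To make the instrument construction concrete, the following Python sketch builds $\mathbf{W}_{g}$ and $\mathbf{G}_{g}(n_{g_{0}})$ as defined above for a single specified group and forms the vectors $\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})^{2}\overline{\mathbf{x}}_{g},\ldots$. It is only a sketch: the covariate values, the group sizes and the helper names are hypothetical, and it stops short of the GMM step itself.

```python
def within_transform(n_g):
    # W_g: diagonal elements 1 - 1/n_g, off-diagonal elements -1/n_g.
    return [[(1.0 if i == j else 0.0) - 1.0 / n_g for j in range(n_g)]
            for i in range(n_g)]


def peer_matrix(n_g, n_g0):
    # G_g(n_g0): zero diagonal, off-diagonal elements 1/(n_g0 - 1).
    w = 1.0 / (n_g0 - 1)
    return [[0.0 if i == j else w for j in range(n_g)] for i in range(n_g)]


def matvec(a, v):
    # Plain matrix-vector product for lists of lists.
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def instruments(x, n_g0, max_power=3):
    """Return [x_bar, G x_bar, G^2 x_bar, ..., G^max_power x_bar]."""
    n_g = len(x)
    x_bar = matvec(within_transform(n_g), x)
    g = peer_matrix(n_g, n_g0)
    cols = [x_bar]
    for _ in range(max_power):
        cols.append(matvec(g, cols[-1]))
    return cols


x = [0.2, -1.0, 1.5]  # hypothetical covariate for n_g = 3 observed members
n_g0 = 5              # hypothetical true group size
z = instruments(x, n_g0)
```

Because the elements of $\overline{\mathbf{x}}_{g}$ sum to zero within the specified group, in this sketch $\mathbf{G}_{g}(n_{g_{0}})^{j}\overline{\mathbf{x}}_{g}=(-(n_{g_{0}}-1)^{-1})^{j}\,\overline{\mathbf{x}}_{g}$, so each instrument is a rescaling of $\overline{\mathbf{x}}_{g}$ by a function of $n_{g_{0}}$. The higher powers therefore add information only when $n_{g_{0}}$ varies across groups.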


Table 1: Monte Carlo Results: Missing data

Models: Miss-specified (M), Known (K), Unknown (U), Unknown-P (U-P).

| $\rho$ | Parameter | M: Mean | M: RMSE | K: Mean | K: RMSE | U: Mean | U: RMSE | U-P: Mean | U-P: RMSE |
|---|---|---|---|---|---|---|---|---|---|
| Contextual effect only ($\beta=0$ imposed), $N\approx M=8000$ | | | | | | | | | |
| 1 | $\delta$ | 0.502 | 0.084 | 0.502 | 0.084 | 0.502 | 0.084 | 0.502 | 0.084 |
| | $\gamma$ | 1.002 | 0.06 | 1.002 | 0.06 | 1.002 | 0.06 | 1.002 | 0.06 |
| 0.9 | $\delta$ | 0.426 | 0.112 | 0.501 | 0.086 | 0.499 | 0.099 | 0.499 | 0.099 |
| | $\gamma$ | 0.971 | 0.07 | 1.002 | 0.061 | 1.001 | 0.07 | 1.001 | 0.07 |
| 0.7 | $\delta$ | 0.328 | 0.198 | 0.504 | 0.097 | 0.502 | 0.153 | 0.5 | 0.151 |
| | $\gamma$ | 0.931 | 0.106 | 1.002 | 0.067 | 1.001 | 0.101 | 1 | 0.1 |
| 0.5 | $\delta$ | 0.264 | 0.266 | 0.498 | 0.112 | 0.518 | 0.257 | 0.504 | 0.238 |
| | $\gamma$ | 0.904 | 0.144 | 0.998 | 0.077 | 1.007 | 0.158 | 1 | 0.151 |
| 0.3 | $\delta$ | 0.225 | 0.332 | 0.505 | 0.146 | 0.896 | 1.602 | 0.54 | 0.468 |
| | $\gamma$ | 0.892 | 0.204 | 1.002 | 0.096 | 1.262 | 2.825 | 1.017 | 0.279 |
| Contextual effect only ($\beta=0$ imposed), $N\approx M=1600$ | | | | | | | | | |
| 1 | $\delta$ | 0.498 | 0.184 | 0.498 | 0.184 | 0.498 | 0.184 | 0.498 | 0.184 |
| | $\gamma$ | 0.999 | 0.132 | 0.999 | 0.132 | 0.999 | 0.132 | 0.999 | 0.132 |
| 0.9 | $\delta$ | 0.425 | 0.201 | 0.498 | 0.188 | 0.5 | 0.22 | 0.499 | 0.219 |
| | $\gamma$ | 0.973 | 0.142 | 1.003 | 0.134 | 1.003 | 0.153 | 1.003 | 0.152 |
| 0.7 | $\delta$ | 0.325 | 0.277 | 0.502 | 0.217 | 0.511 | 0.347 | 0.501 | 0.332 |
| | $\gamma$ | 0.926 | 0.189 | 1 | 0.148 | 1.002 | 0.224 | 0.997 | 0.219 |
| 0.5 | $\delta$ | 0.261 | 0.353 | 0.495 | 0.253 | 0.693 | 1.303 | 0.52 | 0.538 |
| | $\gamma$ | 0.903 | 0.251 | 0.998 | 0.175 | 1.083 | 0.649 | 1.006 | 0.336 |
| Contextual and endogenous effects, $N\approx M=8000$ | | | | | | | | | |
| 1 | $\beta$ | 0.095 | 0.772 | 0.095 | 0.772 | 0.095 | 0.772 | 0.095 | 0.772 |
| | $\delta$ | 0.471 | 0.245 | 0.471 | 0.245 | 0.471 | 0.245 | 0.471 | 0.245 |
| | $\gamma$ | 1.02 | 0.182 | 1.02 | 0.182 | 1.02 | 0.182 | 1.02 | 0.182 |
| 0.9 | $\beta$ | 0.092 | 0.822 | 0.076 | 0.788 | 0.04 | 0.81 | 0.04 | 0.81 |
| | $\delta$ | 0.394 | 0.325 | 0.479 | 0.246 | 0.485 | 0.263 | 0.485 | 0.263 |
| | $\gamma$ | 0.989 | 0.174 | 1.018 | 0.187 | 1.008 | 0.193 | 1.008 | 0.193 |
| 0.7 | $\beta$ | 0.133 | 0.899 | 0.069 | 0.795 | 0.048 | 0.875 | 0.052 | 0.876 |
| | $\delta$ | 0.269 | 0.48 | 0.483 | 0.253 | 0.481 | 0.322 | 0.478 | 0.322 |
| | $\gamma$ | 0.952 | 0.171 | 1.017 | 0.194 | 1.008 | 0.218 | 1.008 | 0.217 |
| 0.5 | $\beta$ | 0.122 | 0.928 | 0.052 | 0.825 | 0.067 | 0.914 | 0.07 | 0.919 |
| | $\delta$ | 0.205 | 0.577 | 0.484 | 0.272 | 0.485 | 0.413 | 0.475 | 0.404 |
| | $\gamma$ | 0.923 | 0.188 | 1.01 | 0.202 | 1.015 | 0.269 | 1.012 | 0.262 |
| 0.3 | $\beta$ | 0.077 | 0.952 | 0.059 | 0.851 | 0.066 | 0.91 | 0.098 | 0.926 |
| | $\delta$ | 0.19 | 0.647 | 0.489 | 0.301 | 0.734 | 1.463 | 0.5 | 0.651 |
| | $\gamma$ | 0.907 | 0.246 | 1.015 | 0.213 | 1.21 | 2.77 | 1.032 | 0.404 |
| Contextual and endogenous effects, $N\approx M=1600$ | | | | | | | | | |
| 1 | $\beta$ | 0.066 | 0.888 | 0.066 | 0.888 | 0.066 | 0.888 | 0.066 | 0.888 |
| | $\delta$ | 0.481 | 0.343 | 0.481 | 0.343 | 0.481 | 0.343 | 0.481 | 0.343 |
| | $\gamma$ | 1.015 | 0.245 | 1.015 | 0.245 | 1.015 | 0.245 | 1.015 | 0.245 |
| 0.9 | $\beta$ | 0.129 | 0.922 | 0.095 | 0.915 | 0.103 | 0.915 | 0.104 | 0.915 |
| | $\delta$ | 0.387 | 0.425 | 0.48 | 0.35 | 0.477 | 0.394 | 0.476 | 0.394 |
| | $\gamma$ | 1.004 | 0.246 | 1.029 | 0.26 | 1.031 | 0.276 | 1.031 | 0.275 |
| 0.7 | $\beta$ | 0.035 | 0.95 | 0.034 | 0.906 | 0.006 | 0.943 | 0.009 | 0.943 |
| | $\delta$ | 0.32 | 0.527 | 0.495 | 0.377 | 0.515 | 0.499 | 0.504 | 0.487 |
| | $\gamma$ | 0.939 | 0.255 | 1.009 | 0.263 | 1.008 | 0.335 | 1.004 | 0.326 |
| 0.5 | $\beta$ | 0.031 | 0.965 | 0.03 | 0.915 | 0.022 | 0.95 | 0.023 | 0.949 |
| | $\delta$ | 0.249 | 0.643 | 0.487 | 0.424 | 0.599 | 1.263 | 0.488 | 0.707 |
| | $\gamma$ | 0.911 | 0.305 | 1.007 | 0.284 | 1.051 | 0.658 | 1 | 0.432 |

Notes: Based on 1000 replications and reported to 3 decimal places. ‘Miss-specified’ is the miss-specified model, which uses the NLS estimator obtained by treating the specified groups as if they were the true groups (i.e., based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g})$). ‘Known’ is the model in which the group size is known, which uses the NLS estimator under Assumption 2 (i.e., based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{0}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{0}})$). ‘Unknown’ is the model in which the group size is unknown, which uses the GMM estimator under Assumption 3. ‘Unknown-P’ is a modification of Unknown which additionally uses the parametric restriction $n_{g_{0}}-2\sim\mathrm{Binomial}(2,\omega)$ to construct the GMM estimator. Models Unknown and Unknown-P are based on the upper bound on the support of $n_{g_{0}}$ being known to be $\overline{n}=4$. Results are presented for $(\beta,\delta,\gamma)$ only. Their true values are $(0,0.5,1)$.

Table 2: Monte Carlo Results: Uncertain groups

Models: Rooms (R), Floors (F), Known (K), Uncertain (U).

| $\psi$ | Parameter | R: Mean | R: RMSE | F: Mean | F: RMSE | K: Mean | K: RMSE | U: Mean | U: RMSE |
|---|---|---|---|---|---|---|---|---|---|
| Contextual effect only ($\beta=0$ imposed), $N=8000$ | | | | | | | | | |
| 0.8 | $\delta$ | 0.41 | 0.123 | 0.501 | 0.135 | 0.496 | 0.066 | 0.516 | 0.131 |
| | $\gamma$ | 0.992 | 0.062 | 1.001 | 0.034 | 0.999 | 0.04 | 1.002 | 0.062 |
| 0.6 | $\delta$ | 0.321 | 0.198 | 0.498 | 0.135 | 0.5 | 0.061 | 0.504 | 0.142 |
| | $\gamma$ | 0.983 | 0.064 | 1 | 0.032 | 1 | 0.033 | 1.001 | 0.063 |
| 0.4 | $\delta$ | 0.232 | 0.281 | 0.497 | 0.133 | 0.5 | 0.064 | 0.501 | 0.145 |
| | $\gamma$ | 0.973 | 0.067 | 1 | 0.031 | 1 | 0.028 | 1.001 | 0.063 |
| 0.2 | $\delta$ | 0.144 | 0.366 | 0.499 | 0.129 | 0.497 | 0.08 | 0.503 | 0.145 |
| | $\gamma$ | 0.964 | 0.071 | 1 | 0.028 | 1 | 0.026 | 1.004 | 0.058 |
| Contextual effect only ($\beta=0$ imposed), $N=1600$ | | | | | | | | | |
| 0.8 | $\delta$ | 0.402 | 0.208 | 0.494 | 0.333 | 0.495 | 0.14 | 0.564 | 0.29 |
| | $\gamma$ | 0.983 | 0.132 | 0.997 | 0.077 | 0.997 | 0.088 | 1 | 0.135 |
| 0.6 | $\delta$ | 0.313 | 0.263 | 0.503 | 0.327 | 0.493 | 0.137 | 0.538 | 0.31 |
| | $\gamma$ | 0.975 | 0.134 | 1.001 | 0.075 | 0.996 | 0.073 | 0.999 | 0.135 |
| 0.4 | $\delta$ | 0.226 | 0.331 | 0.505 | 0.319 | 0.492 | 0.146 | 0.516 | 0.332 |
| | $\gamma$ | 0.966 | 0.136 | 1 | 0.071 | 0.997 | 0.063 | 0.999 | 0.134 |
| 0.2 | $\delta$ | 0.134 | 0.409 | 0.507 | 0.308 | 0.499 | 0.174 | 0.503 | 0.357 |
| | $\gamma$ | 0.955 | 0.139 | 1.001 | 0.065 | 0.999 | 0.058 | 1.006 | 0.13 |
| Contextual and endogenous effects, $N=8000$ | | | | | | | | | |
| 0.8 | $\beta$ | 0.041 | 0.814 | 0.127 | 0.683 | 0.054 | 0.46 | 0.055 | 0.799 |
| | $\delta$ | 0.392 | 0.36 | 0.429 | 0.384 | 0.466 | 0.214 | 0.492 | 0.302 |
| | $\gamma$ | 0.998 | 0.165 | 1.007 | 0.051 | 1.001 | 0.064 | 1.01 | 0.167 |
| 0.6 | $\beta$ | 0.039 | 0.848 | 0.112 | 0.663 | 0.07 | 0.461 | 0.099 | 0.8 |
| | $\delta$ | 0.301 | 0.495 | 0.431 | 0.385 | 0.462 | 0.227 | 0.458 | 0.354 |
| | $\gamma$ | 0.989 | 0.142 | 1.004 | 0.049 | 1.004 | 0.049 | 1.013 | 0.143 |
| 0.4 | $\beta$ | 0.042 | 0.873 | 0.115 | 0.679 | 0.072 | 0.491 | 0.106 | 0.769 |
| | $\delta$ | 0.205 | 0.64 | 0.427 | 0.394 | 0.458 | 0.256 | 0.446 | 0.379 |
| | $\gamma$ | 0.978 | 0.119 | 1.004 | 0.048 | 1.003 | 0.043 | 1.01 | 0.117 |
| 0.2 | $\beta$ | 0.032 | 0.881 | 0.119 | 0.653 | 0.106 | 0.561 | 0.123 | 0.724 |
| | $\delta$ | 0.123 | 0.772 | 0.422 | 0.391 | 0.436 | 0.303 | 0.433 | 0.393 |
| | $\gamma$ | 0.969 | 0.099 | 1.004 | 0.044 | 1.004 | 0.042 | 1.011 | 0.093 |
| Contextual and endogenous effects, $N=1600$ | | | | | | | | | |
| 0.8 | $\beta$ | 0.038 | 0.913 | 0.067 | 0.841 | 0.097 | 0.709 | 0.019 | 0.896 |
| | $\delta$ | 0.393 | 0.449 | 0.465 | 0.61 | 0.444 | 0.343 | 0.559 | 0.428 |
| | $\gamma$ | 0.995 | 0.228 | 1.001 | 0.098 | 1.002 | 0.123 | 1.005 | 0.239 |
| 0.6 | $\beta$ | 0.024 | 0.927 | 0.088 | 0.833 | 0.065 | 0.697 | 0.051 | 0.9 |
| | $\delta$ | 0.311 | 0.568 | 0.458 | 0.604 | 0.455 | 0.358 | 0.522 | 0.485 |
| | $\gamma$ | 0.986 | 0.209 | 1.006 | 0.096 | 0.998 | 0.096 | 1.009 | 0.22 |
| 0.4 | $\beta$ | 0.019 | 0.94 | 0.096 | 0.833 | 0.107 | 0.716 | 0.045 | 0.888 |
| | $\delta$ | 0.223 | 0.7 | 0.457 | 0.579 | 0.435 | 0.386 | 0.493 | 0.55 |
| | $\gamma$ | 0.974 | 0.192 | 1.006 | 0.091 | 1.001 | 0.081 | 1.005 | 0.198 |
| 0.2 | $\beta$ | -0.004 | 0.94 | 0.091 | 0.825 | 0.078 | 0.754 | 0.06 | 0.863 |
| | $\delta$ | 0.147 | 0.826 | 0.456 | 0.57 | 0.451 | 0.429 | 0.467 | 0.606 |
| | $\gamma$ | 0.962 | 0.177 | 1.004 | 0.085 | 1.001 | 0.075 | 1.011 | 0.176 |

Notes: Based on 1000 replications and reported to 3 decimal places. ‘Rooms’ is the model in which the room is treated as if it were the group, with NLS estimator based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{1}})$. ‘Floors’ is the model in which the floor is treated as if it were the group, with NLS estimator based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{2}})$. ‘Known’ is the model in which it is known whether the group is the room or the floor (i.e., based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{0}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{0}})$). ‘Uncertain’ is the model in which it is unknown whether the group is the room or the floor, with NLS estimator based on $\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g}]=\overline{x}_{i}[\psi\pi(n_{g_{1}})+(1-\psi)\pi(n_{g_{2}})]$. Results are presented for $(\beta,\delta,\gamma)$ only. Their true values are $(0,0.5,1)$.
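The moments in these notes can be compared numerically with a short Python sketch. It assumes the form $\pi(n)=((n-1)\gamma-\delta)/(n-1+\beta)$, which is not restated in these notes, and uses a hypothetical room size of 2 and floor size of 4 together with the true values $(\beta,\delta,\gamma)=(0,0.5,1)$; the function names are illustrative only.

```python
def pi(n, beta, gamma, delta):
    # ASSUMPTION: reduced-form coefficient on x_bar for a group of size n;
    # this form is not restated in the table notes.
    return ((n - 1) * gamma - delta) / (n - 1 + beta)


def uncertain_coef(n_room, n_floor, psi, beta, gamma, delta):
    # Coefficient in the 'Uncertain' moment:
    # psi * pi(n_{g_1}) + (1 - psi) * pi(n_{g_2}).
    return (psi * pi(n_room, beta, gamma, delta)
            + (1 - psi) * pi(n_floor, beta, gamma, delta))


TRUE = dict(beta=0.0, gamma=1.0, delta=0.5)  # true values used in the tables
N_ROOM, N_FLOOR = 2, 4                       # hypothetical group sizes

coef_rooms = pi(N_ROOM, **TRUE)              # 'Rooms' moment coefficient
coef_floors = pi(N_FLOOR, **TRUE)            # 'Floors' moment coefficient
coef_uncertain = uncertain_coef(N_ROOM, N_FLOOR, psi=0.8, **TRUE)
```

Under these assumptions the ‘Uncertain’ coefficient lies between the ‘Rooms’ and ‘Floors’ coefficients and collapses to one of them only when $\psi$ equals 1 or 0.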

Table 3: Monte Carlo Results: Uncertain groups (FE)

Models: Rooms (R), Floors (F), Known (K), Uncertain (U).

| $\psi$ | Parameter | R: Mean | R: RMSE | F: Mean | F: RMSE | K: Mean | K: RMSE | U: Mean | U: RMSE |
|---|---|---|---|---|---|---|---|---|---|
| Contextual effect only ($\beta=0$ imposed), $N=8000$ | | | | | | | | | |
| 0.8 | $\delta$ | 0.41 | 0.123 | 0.847 | 0.375 | 0.496 | 0.066 | 0.516 | 0.131 |
| | $\gamma$ | 0.992 | 0.062 | 1.3 | 0.303 | 0.999 | 0.04 | 1.002 | 0.062 |
| 0.6 | $\delta$ | 0.321 | 0.198 | 0.756 | 0.292 | 0.5 | 0.061 | 0.504 | 0.142 |
| | $\gamma$ | 0.983 | 0.064 | 1.224 | 0.227 | 1 | 0.033 | 1.001 | 0.063 |
| 0.4 | $\delta$ | 0.232 | 0.281 | 0.668 | 0.218 | 0.5 | 0.064 | 0.501 | 0.145 |
| | $\gamma$ | 0.973 | 0.067 | 1.149 | 0.153 | 1 | 0.028 | 1.001 | 0.063 |
| 0.2 | $\delta$ | 0.144 | 0.366 | 0.584 | 0.156 | 0.497 | 0.08 | 0.503 | 0.145 |
| | $\gamma$ | 0.964 | 0.071 | 1.075 | 0.081 | 1 | 0.026 | 1.004 | 0.058 |
| Contextual effect only ($\beta=0$ imposed), $N=1600$ | | | | | | | | | |
| 0.8 | $\delta$ | 0.402 | 0.208 | 0.841 | 0.488 | 0.495 | 0.14 | 0.564 | 0.29 |
| | $\gamma$ | 0.983 | 0.132 | 1.297 | 0.309 | 0.997 | 0.088 | 1 | 0.135 |
| 0.6 | $\delta$ | 0.313 | 0.263 | 0.762 | 0.43 | 0.493 | 0.137 | 0.538 | 0.31 |
| | $\gamma$ | 0.975 | 0.134 | 1.225 | 0.24 | 0.996 | 0.073 | 0.999 | 0.135 |
| 0.4 | $\delta$ | 0.226 | 0.331 | 0.679 | 0.375 | 0.492 | 0.146 | 0.516 | 0.332 |
| | $\gamma$ | 0.966 | 0.136 | 1.151 | 0.169 | 0.997 | 0.063 | 0.999 | 0.134 |
| 0.2 | $\delta$ | 0.134 | 0.409 | 0.596 | 0.328 | 0.499 | 0.174 | 0.503 | 0.357 |
| | $\gamma$ | 0.955 | 0.139 | 1.076 | 0.102 | 0.999 | 0.058 | 1.006 | 0.13 |
| Contextual and endogenous effects, $N=8000$ | | | | | | | | | |
| 0.8 | $\beta$ | 0.041 | 0.814 | 0.224 | 0.558 | 0.054 | 0.46 | 0.055 | 0.799 |
| | $\delta$ | 0.392 | 0.36 | 0.725 | 0.375 | 0.466 | 0.214 | 0.492 | 0.302 |
| | $\gamma$ | 0.998 | 0.165 | 1.318 | 0.324 | 1.001 | 0.064 | 1.01 | 0.167 |
| 0.6 | $\beta$ | 0.039 | 0.848 | 0.198 | 0.565 | 0.07 | 0.461 | 0.099 | 0.8 |
| | $\delta$ | 0.301 | 0.495 | 0.645 | 0.347 | 0.462 | 0.227 | 0.458 | 0.354 |
| | $\gamma$ | 0.989 | 0.142 | 1.237 | 0.244 | 1.004 | 0.049 | 1.013 | 0.143 |
| 0.4 | $\beta$ | 0.042 | 0.873 | 0.17 | 0.602 | 0.072 | 0.491 | 0.106 | 0.769 |
| | $\delta$ | 0.205 | 0.64 | 0.571 | 0.347 | 0.458 | 0.256 | 0.446 | 0.379 |
| | $\gamma$ | 0.978 | 0.119 | 1.159 | 0.168 | 1.003 | 0.043 | 1.01 | 0.117 |
| 0.2 | $\beta$ | 0.032 | 0.881 | 0.152 | 0.612 | 0.106 | 0.561 | 0.123 | 0.724 |
| | $\delta$ | 0.123 | 0.772 | 0.492 | 0.36 | 0.436 | 0.303 | 0.433 | 0.393 |
| | $\gamma$ | 0.969 | 0.099 | 1.082 | 0.094 | 1.004 | 0.042 | 1.011 | 0.093 |
| Contextual and endogenous effects, $N=1600$ | | | | | | | | | |
| 0.8 | $\beta$ | 0.038 | 0.913 | 0.138 | 0.747 | 0.097 | 0.709 | 0.019 | 0.896 |
| | $\delta$ | 0.393 | 0.449 | 0.785 | 0.62 | 0.444 | 0.343 | 0.559 | 0.428 |
| | $\gamma$ | 0.995 | 0.228 | 1.31 | 0.331 | 1.002 | 0.123 | 1.005 | 0.239 |
| 0.6 | $\beta$ | 0.024 | 0.927 | 0.129 | 0.772 | 0.065 | 0.697 | 0.051 | 0.9 |
| | $\delta$ | 0.311 | 0.568 | 0.715 | 0.582 | 0.455 | 0.358 | 0.522 | 0.485 |
| | $\gamma$ | 0.986 | 0.209 | 1.237 | 0.263 | 0.998 | 0.096 | 1.009 | 0.22 |
| 0.4 | $\beta$ | 0.019 | 0.94 | 0.138 | 0.789 | 0.107 | 0.716 | 0.045 | 0.888 |
| | $\delta$ | 0.223 | 0.7 | 0.617 | 0.565 | 0.435 | 0.386 | 0.493 | 0.55 |
| | $\gamma$ | 0.974 | 0.192 | 1.161 | 0.191 | 1.001 | 0.081 | 1.005 | 0.198 |
| 0.2 | $\beta$ | -0.004 | 0.94 | 0.136 | 0.804 | 0.078 | 0.754 | 0.06 | 0.863 |
| | $\delta$ | 0.147 | 0.826 | 0.521 | 0.563 | 0.451 | 0.429 | 0.467 | 0.606 |
| | $\gamma$ | 0.962 | 0.177 | 1.083 | 0.123 | 1.001 | 0.075 | 1.011 | 0.176 |

Notes: Based on 1000 replications and reported to 3 decimal places. ‘Rooms’ is the model in which the room is treated as if it were the group, with NLS estimator based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{1}})$. ‘Floors’ is the model in which the floor is treated as if it were the group, with NLS estimator based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{2}})$. ‘Known’ is the model in which it is known whether the group is the room or the floor (i.e., based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{0}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{0}})$). ‘Uncertain’ is the model in which it is unknown whether the group is the room or the floor, with NLS estimator based on $\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g}]=\overline{x}_{i}[\psi\pi(n_{g_{1}})+(1-\psi)\pi(n_{g_{2}})]$. Results are presented for $(\beta,\delta,\gamma)$ only. Their true values are $(0,0.5,1)$.

Table 4: Summary statistics

All firms: $N_{0}=8448$, $G_{0}=755$. Firms with $n_{g_{0}}\leq 50$: $N_{0}=5599$, $G_{0}=732$.

| Variable | All firms: Mean | All firms: Median | All firms: SD | $n_{g_{0}}\leq 50$: Mean | $n_{g_{0}}\leq 50$: Median | $n_{g_{0}}\leq 50$: SD |
|---|---|---|---|---|---|---|
| Individual covariates: | | | | | | |
| Quit-to-exit (0/1) | 0.036 | 0 | 0.187 | 0.035 | 0 | 0.185 |
| Female (0/1) | 0.450 | 0 | 0.498 | 0.454 | 0 | 0.498 |
| Grad. degree (0/1) | 0.376 | 0 | 0.484 | 0.322 | 0 | 0.467 |
| Age (years) | 36.493 | 34 | 10.145 | 36.713 | 34 | 10.812 |
| Experience (years) | 7.799 | 6 | 6.927 | 7.712 | 5 | 7.183 |
| Own fee-weighted caseload (‘000 yuan) | 20.130 | 3.050 | 39.325 | 20.758 | 3.798 | 38.478 |
| Group F: | | | | | | |
| Peer fee-weighted caseload | 20.130 | 17.888 | 16.101 | 20.758 | 16.812 | 19.210 |
| Prop. high-ability | 0.250 | 0.226 | 0.172 | 0.255 | 0.231 | 0.204 |
| Prop. female | 0.450 | 0.444 | 0.190 | 0.454 | 0.462 | 0.230 |
| Mean age | 36.493 | 36.493 | 5.063 | 36.713 | 35.800 | 6.073 |
| Mean experience | 7.799 | 7.532 | 3.374 | 7.712 | 7.000 | 4.004 |
| Prop. grad. degree | 0.376 | 0.402 | 0.210 | 0.322 | 0.324 | 0.229 |
| Group F-C ($age\leq 35$): | | | | | | |
| Peer fee-weighted caseload | 19.532 | 14.562 | 18.519 | 20.221 | 14.634 | 21.684 |
| Prop. high-ability | 0.244 | 0.217 | 0.193 | 0.251 | 0.200 | 0.228 |
| Prop. female | 0.537 | 0.538 | 0.218 | 0.531 | 0.500 | 0.260 |
| Peer age | 29.736 | 29.667 | 1.573 | 29.665 | 29.600 | 1.839 |
| Peer experience | 3.998 | 4.000 | 1.185 | 3.884 | 3.750 | 1.395 |
| Prop. grad. degree | 0.374 | 0.385 | 0.228 | 0.321 | 0.333 | 0.251 |
| Group F-C ($age>35$): | | | | | | |
| Peer fee-weighted caseload | 20.514 | 18.430 | 18.558 | 20.957 | 15.902 | 22.556 |
| Prop. high-ability peers | 0.253 | 0.243 | 0.220 | 0.253 | 0.200 | 0.268 |
| Prop. female | 0.338 | 0.303 | 0.228 | 0.352 | 0.333 | 0.283 |
| Peer age | 45.190 | 43.108 | 5.804 | 46.301 | 44.818 | 6.985 |
| Peer experience | 12.763 | 12.500 | 4.526 | 13.002 | 12.429 | 5.49 |
| Prop. grad. degree | 0.388 | 0.438 | 0.260 | 0.330 | 0.286 | 0.301 |
Notes: Quit-to-exit is the dependent variable of interest, which takes a value of 1 if a lawyer quits the job and exits the local legal practice in 2016, and 0 otherwise. Fee-weighted caseload measures lawyers’ ability in civil litigation in the prior three years (2013-2015), weighted by the litigation size. We consider two peer groups, F and F-C. Group F means all associate lawyers in the same law firm form the relevant group. Group F-C means lawyers in the same law firm and of the same cohort (young or experienced) form the relevant group, where the young (experienced) cohort is defined as aged 35 or below (above 35). Group variables are calculated using a “leave-own-out” method, where high-ability is defined as being in the top quartile of fee-weighted caseload ($>24.44$).

Table 5: Peer effects in lawyers’ quit-to-exit
Dep. var: Quit-to-exit (0/1)
(1) (2) (3) (4) (5) (6) (7)
All All All All $age\leq 35$ $age>35$ All
Group F:
    Peer fee-weighted caseload 0.0001
(0.00029)
    Prop. high-ability 0.0544**
(0.02201)
    Prop. female -0.0136 -0.0115
(0.02893) (0.02840)
    Peer age 0.0042** 0.0046**
(0.00180) (0.00181)
    Peer experience -0.0052* -0.0056*
(0.00295) (0.00295)
    Prop. grad. degree -0.0457* -0.0423
(0.02715) (0.02704)
Group F-C:
    Peer fee-weighted caseload 0.0003
(0.00028)
    Prop. high-ability 0.0451*** 0.0767*** 0.0211
(0.01747) (0.02878) (0.02276)
    Prop. female -0.0163 -0.0142 -0.0250 -0.0090
(0.02166) (0.02178) (0.03430) (0.02576)
    Peer age 0.0008 0.0010 0.0008 0.0007
(0.00198) (0.00196) (0.00524) (0.00213)
    Peer experience -0.0001 -0.0004 -0.0016 -0.0002
(0.00179) (0.00178) (0.00582) (0.00197)
    Prop. grad. degree -0.0365* -0.0347* -0.0376 -0.0279
(0.01895) (0.01875) (0.02865) (0.02301)
Group U:
    Prop. high-ability 0.0581**
(0.0285)
    Prop. female -0.0130
(0.0285)
    Peer age 0.0021
(0.0024)
    Peer experience -0.0015
(0.0025)
    Prop. grad. degree -0.0432*
(0.0242)
$\psi$ 0.5618
(0.6090)
Fee-weighted caseload -0.0003*** -0.0003*** -0.0003*** -0.0003*** -0.0003*** -0.0002*** -0.0003***
(0.00005) (0.00006) (0.00005) (0.00005) (0.00007) (0.00008) (0.0001)
Female 0.0146*** 0.0128** 0.0148*** 0.0129** 0.0159* 0.0086 0.0134**
(0.00555) (0.00602) (0.00552) (0.00602) (0.00860) (0.00700) (0.0061)
Age -0.0035** -0.0044** -0.0035** -0.0043** 0.0086 -0.0001 -0.0043**
(0.00143) (0.00215) (0.00143) (0.00215) (0.01618) (0.00300) (0.0021)
Experience 0.0016 0.0020 0.0016 0.0020 0.0063 0.0008 0.0019
(0.00143) (0.00141) (0.00142) (0.00140) (0.00419) (0.00150) (0.0014)
Grad. degree -0.0050 -0.0066 -0.0048 -0.0065 -0.0133* 0.0042 -0.0065
(0.00552) (0.00558) (0.00551) (0.00556) (0.00689) (0.00893) (0.0055)
$N_{0}$ 8448 8137 8448 8137 4622 3515 8137
$G_{0}$ 755 1359 755 1359 691 669 1359
FE F F-C F F-C F-C F-C F-C
***: significant at the 0.01 level, **:0.05 level, *: 0.1 level.
Notes: Each column represents a regression; quadratic terms in age and experience are included in the regression. All reported regressors are defined as in Table 4 and Section 7. Standard errors in parentheses are clustered by law firm. Peer group F means all lawyers in the same firm are peers. F-C means lawyers in the same firm of the same cohort (young or experienced) are peers. ‘U’ means there is group uncertainty between F-C and F, with probability $\psi$ of being the former.

Table 6: Peer effects in lawyers’ quit-to-exit: Sub-sample of firms with $n_{g_{0}}\leq 50$
Dep. var: Quit-to-exit (0/1)
(1) (2) (3) (4) (5) (6) (7)
All All All All $age\leq 35$ $age>35$ All
Group F:
    Peer fee-weighted caseload 0.0001
(0.00031)
    Prop. high-ability 0.0573**
(0.02318)
    Prop. female -0.0150 -0.0132
(0.03082) (0.03001)
    Peer age 0.0046** 0.0050**
(0.00193) (0.00193)
    Peer experience -0.0059* -0.0062*
(0.00330) (0.00331)
    Prop. grad. degree -0.0249 -0.0217
(0.03011) (0.03014)
Group F-C:
    Peer fee-weighted caseload 0.0004
(0.00033)
    Prop. high-ability 0.0496*** 0.0760** 0.0288
(0.01835) (0.03012) (0.02333)
    Prop. female -0.0165 -0.0149 -0.0091 -0.0374
(0.02279) (0.02288) (0.03562) (0.03034)
    Peer age 0.0007 0.0009 -0.0023 0.0009
(0.00210) (0.00203) (0.00495) (0.00230)
    Peer experience 0.0001 -0.0001 0.0034 -0.0006
(0.00202) (0.00199) (0.00570) (0.00215)
    Prop. grad. degree -0.0201 -0.0188 -0.0106 -0.0181
(0.02235) (0.02236) (0.03298) (0.02634)
Group U:
    Prop. high-ability 0.0643*
(0.0379)
    Prop. female -0.0084
(0.0308)
    Peer age -0.0005
(0.0025)
    Peer experience 0.0020
(0.0033)
    Prop. grad. degree -0.0505
(0.0372)
$\psi$ 0.6732
(0.8046)
Fee-weighted caseload -0.0003*** -0.0002** -0.0002*** -0.0002*** -0.0002** -0.0001 -0.0002***
(0.00008) (0.00010) (0.00007) (0.00007) (0.00011) (0.00012) (0.0001)
Female 0.0137** 0.0120* 0.0138** 0.0120* 0.0219** -0.0075 0.0112*
(0.00691) (0.00726) (0.00683) (0.00725) (0.00976) (0.01162) (0.0066)
Age -0.0024 -0.0032 -0.0024 -0.0031 0.0159 0.0043 -0.0023
(0.00177) (0.00219) (0.00177) (0.00219) (0.01952) (0.00314) (0.0021)
Experience 0.0005 0.0010 0.0005 0.0010 0.0033 -0.0016 0.0010
(0.00143) (0.00148) (0.00142) (0.00146) (0.00560) (0.00175) (0.0014)
Grad. degree 0.0027 0.0011 0.0029 0.0011 -0.0004 0.0096 -0.0013
(0.00821) (0.00926) (0.00822) (0.00928) (0.01109) (0.01343) (0.0087)
$N_{0}$ 5599 5288 5599 5288 3083 2205 5288
$G_{0}$ 732 1313 732 1313 668 645 1313
FE F F-C F F-C F-C F-C F-C
***: significant at the 0.01 level, **:0.05 level, *: 0.1 level.
Notes: Each column represents a regression; quadratic terms in age and experience are included in the regression. All reported regressors are defined as in Table 4 and Section 7. Standard errors in parentheses are clustered by law firm. Peer group F means all lawyers in the same firm are peers. F-C means lawyers in the same firm of the same cohort (young or experienced) are peers. ‘U’ means there is group uncertainty between F-C and F, with probability $\psi$ of being the former.

Table 7: Peer effects in lawyers’ quit-to-exit: Robustness
Dep. var: Quit-to-exit (0/1)
(1) (2) (3) (4) (5) (6)
(1)-(2): $<55$ subjects only; (3)-(4): All law firms in Shanghai; (5)-(6): Use number of high-ability peers
Group F:
   Prop. high-ability 0.0573** 0.0733***
(0.02474) (0.02238)
   Number of high-ability 0.0202***
(0.00503)
Group F-C:
   Prop. high-ability 0.0509*** 0.0421**
(0.01946) (0.01656)
   Number of high-ability 0.0200***
(0.00545)
$N_{0}$ 7909 7605 13613 13205 8448 8137
$G_{0}$ 639 1243 1061 1919 755 1359
FE F F-C F F-C F F-C
***: significant at the 0.01 level, **:0.05 level, *: 0.1 level.
Notes: Each column represents a regression. Odd-numbered models are estimated based on model (3) of Table 5 (group F), and even-numbered models are estimated based on model (4) of Table 5 (group F-C). Models (1)-(2) focus on subjects below age 55. Models (3)-(4) use all law firms in Shanghai. Models (5)-(6) use the number, rather than the proportion, of high-ability peers.

Table 8: Peer effects in lawyers’ quit-to-exit: Heterogeneity
Dep. var: Quit-to-exit (0/1)
(1) (2) (3) (4) (5) (6)
(1)-(2): Interacts w/ low- to med-ability subjects; (3)-(4): Interacts w/ female; (5)-(6): Interacts w/ small firms
Group F:
   Prop. high-ability 0.0128 0.0693*** 0.0642
(0.02440) (0.02656) (0.04751)
   Prop. high-ability $\times$ (Low- to med-ability) 0.0698***
(0.01693)
   Prop. high-ability $\times$ Female -0.0310
(0.02863)
   Prop. high-ability $\times$ Small firms -0.0109
(0.05194)
Group F-C:
   Prop. high-ability 0.0064 0.0539*** 0.0431*
(0.02008) (0.02071) (0.02235)
   Prop. high-ability $\times$ (Low- to med-ability) 0.0740***
(0.01824)
   Prop. high-ability $\times$ Female -0.0118
(0.02707)
   Prop. high-ability $\times$ Small firms 0.0110
(0.03312)
$N_{0}$ 7909 7605 8448 8137 8448 8137
$G_{0}$ 639 1243 755 1359 755 1359
FE F F-C F F-C F F-C
***: significant at the 0.01 level, **:0.05 level, *: 0.1 level.
Notes: Each column represents a regression. Odd-numbered models are estimated based on model (3) of Table 5, and even-numbered models are estimated based on model (4) of Table 5. Models (1)-(2) add an interaction term of prop. high-ability and an indicator identifying low-to-medium ability lawyers (caseload below median). Models (3)-(4) add an interaction term of prop. high-ability and an indicator for female. Models (5)-(6) add an interaction term of prop. high-ability and an indicator for small firms (firm size below median).

Figure 2: Peer effects in lawyers’ quit-to-exit: Distribution of number of associate lawyers ($n_{g_{0}}$)

Notes: The minimum of $n_{g_{0}}$ in our sample is 2. The negative binomial distribution is thus fitted using $n_{g_{0}}-2$.

Figure 3: Peer effects in lawyers’ quit-to-exit: Random 50% subsamples

Notes: We report the distribution (fitted by kernel density estimation) of the following estimators based on 1000 random subsamples of proportion $\rho=0.5$ of lawyers. ‘Miss-specified’ refers to the NLS estimator obtained by treating the observed firms as if they were the true firms (i.e., based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g})$). ‘Known’ is the NLS estimator under Assumption 2 (i.e., based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{0}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{0}})$). ‘Unknown-P’ is the GMM estimator under Assumption 3 imposing the parametric restriction $n_{g_{0}}-2\sim\mathrm{NegativeBinomial}(m,p)$, where both parameters are estimated. The distributions are rescaled by the standard error obtained from full sample estimation. Vertical lines correspond to the rescaled full sample estimate (i.e., the t-statistic for the test of statistical significance). See model (3) of Table 5 for full sample estimates.

Figure 4: Peer effects in lawyers’ quit-to-exit: Random 70% subsamples

Notes: We report the distribution (fitted by kernel density estimation) of the following estimators based on 1000 random subsamples of proportion $\rho=0.7$ of lawyers. ‘Miss-specified’ refers to the NLS estimator obtained by treating the observed firms as if they were the true firms (i.e., based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g})$). ‘Known’ is the NLS estimator under Assumption 2 (i.e., based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{0}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{0}})$). ‘Unknown-P’ is the GMM estimator under Assumption 3 imposing the parametric restriction $n_{g_{0}}-2\sim\mathrm{NegativeBinomial}(m,p)$, where both parameters are estimated. The distributions are rescaled by the standard error obtained from full sample estimation. Vertical lines correspond to the rescaled full sample estimate (i.e., the t-statistic for the test of statistical significance). See model (3) of Table 5 for full sample estimates.