\marginsize

3cm3cm3cm3cm

Peer Effects with Miss-specified Peer Groups

Christiern Rose (UQ), Lizi Yu (UQ)

1. Introduction

We address two challenges faced by researchers studying peer effects, each of which can lead to miss-specification of peer groups. The first challenge is that standard methods require that the researcher has access to group data: a sample of groups which includes the outcome and characteristics of all members (see Bramoullé et al. (2020) for a recent review). Such data are often proprietary or restricted-access (Breza et al., 2020), and widely available individual level survey data cannot be used. Without group data, empirical practice is either to drop individuals with missing peer data, leading to sample selection and loss of information, or to use only non-missing peers, leading to measurement error. Measurement error also arises if the researcher is unaware that peers are missing. We refer to this as the missing data problem, and propose a solution which corrects for measurement error and makes full use of the available information.

The second challenge is that the researcher has to choose the relevant peer group, often from a set of candidate group structures. For example, studies based on the Dartmouth room-mate data, in which college freshman were randomly allocated to dorm rooms, have noted that it isn’t clear whether peer effects operate at the room or the floor levels (Sacerdote, 2001; Glaeser et al., 2003; Angrist, 2014), and the relevant group may be different for different outcomes, such as academic attainment and fraternity membership (Sacerdote, 2001). Empirical practice is to conduct a robustness test by re-estimating the peer effects for each candidate group. This implicitly assumes that the relevant group is the same for all individuals (i.e., it is deterministic), but there is no reason for this to be the case. In the Dartmouth context, some dorms may simply be more sociable than others, and the researcher is unlikely to know which ones. Moreover, the researcher does not know which (if any) of the estimates are valid, which can be problematic when there are large differences in the estimates. We refer to this as the group uncertainty problem, and propose a solution based on random peer group structure.

We first show that missing data and group uncertainty are examples of a wider class of miss-specification in which the members of the specified peer group are a subset of the members of the (true) group. This is clear for missing data if the specified group comprises the observed group members. For group uncertainty, it is applicable in the typical empirical setting in which the candidate groups are nested and the specified group is the smallest group. For example, in the Dartmouth context rooms are nested within floors and groups can be specified based on rooms.

Under subset-miss-specification, we show that peer effects can be identified using assumptions which allow the distribution of the group size to be inferred. With missing data, we show that widely used assumptions on the missingness mechanism allow the group size to be inferred by restricting the distribution of the group size conditional on the specified group size (i.e., on the number of observed individuals from that group). For group uncertainty, we suppose that there are two nested candidate groups (e.g., dorm rooms and floors), from which the relevant group is exogenously determined at random.¹¹1The extension to three or more candidate groups is straightforward. This also restricts the distribution of the group size conditional on the sizes of the two candidate groups.

We apply our approach to settings with both small and medium-large group sizes. Our Monte-Carlo experiment is designed to mimic the Dartmouth room-mate data used in Sacerdote (2001); Glaeser et al. (2003) and Angrist (2014), in which students were assigned to dorm rooms of size 2, 3 or 4. Our results demonstrate that the biases arising from ignoring missing data and group uncertainty can be large, but are corrected by the GMM and NLS estimators we propose.

Our empirical application studies how peer ability impacts lawyers’ decision to exit the local legal market using unique employer-employee matched data comprising all lawyers practicing in Shanghai, China. Peer groups are specified based on the number of junior lawyers practicing in the same firm, with a mean of 11.2 and a maximum of 482. We find that the likelihood of a lawyer’s quit-to-exit increases in the proportion of high-ability peers, which is consistent with the invidious comparison model (Hoxby and Weingarth, 2005; Antecol et al., 2016). We apply our results on missing data to show that similar estimates and qualitative conclusions could have been reached had we only had access to an individual level sample of lawyers, rather than the group data we use. Finally, we apply our results on group uncertainty to study whether peer effects operate at the firm or firm-cohort levels, and find evidence of considerable heterogeneity in the relevant peer group across firms.

1.1. Related literature

The literature on sampled networks, mismeasured networks and missing data has recently attracted much interest. Lewbel et al. (2019) and Hardy et al. (2019) consider the implications of measurement error in the network through which peer effects operate. Lewbel et al. (2019) also show that peer effects can be identified and consistently estimated when there is no information on the network beyond group identifiers. Boucher and Houndetoungan (2020) study peer effects when the network is unknown but that its distribution is known or can be consistently estimated.

Chandrasekhar and Lewis (2011) studies the implications of using sampled network data in applied work, provide analytical corrections in some examples of interest and develop a more general graphical reconstruction approach. In the context of peer effects, the analytical correction can be applied if the researcher has data on the network, outcomes and exogenous characteristics for a subsample of individuals and knows the identities, outcomes and exogenous characteristics of all of their peers. This means that data are only missing for peers-of-peers. To apply graphical reconstruction, for every individual, the researcher must observe the outcomes, exogenous characteristics and variables which can be used to predict the propensity of individuals to form links in a network formation model.

Breza et al. (2020) use a parametric network formation model to show that network structure can be identified and used to construct network statistics (e.g., centrality measures) if the researcher does not collect network data for all individuals, but instead collects relational data for a sub-sample of individuals and basic characteristics of all individuals.²²2Relational data are collected by asking questions of the form ‘Consider all the individuals in your group with whom you do activity X. How many of these have trait Y?’ (Breza et al., 2020) Sojourner (2013), Wang and Lee (2013) and Liu et al. (2017) consider the case in which the network and all individuals are observed without error but there are missing data in the outcomes and/or exogenous characteristics.

All of the above papers suppose that the researcher observes at least some data on every individual. In contrast, we allow for some individuals to be entirely missing and do not require knowledge on whether there even exist such individuals. These gains come from exploiting the structure of the group interactions model we consider, which, though widely used in practice (e.g., Moffitt et al. (2001); Sacerdote (2001); Glaeser et al. (2003); Angrist (2014); Boucher et al. (2014); Cornelissen et al. (2017)) and well understood theoretically (e.g., Lee (2007); Bramoullé et al. (2009); Davezies et al. (2009); Bramoullé et al. (2020)), is less general than some.³³3We discuss this issue further in Section 8.

Our work is also related to the literature on multiple peer effects. Goldsmith-Pinkham and Imbens (2013), Arduini et al. (2020) and Reza et al. (2021) study a setting in which individuals are exposed to multiple peer effects, each operating through a different network. In contrast, we suppose that individuals are exposed to peer effects operating through one network, but there is uncertainty as to which is the relevant network. Our model is thus not a model of multiple peer effects, but one of peer effects with uncertain peer groups. The approaches are not observationally equivalent. With multiple peer effects, the endogenous peer effects propogate through a network formed by taking the union of the links over multiple networks. In contrast, with uncertain peer groups, the endogenous effects propogate through only one network, the identity of which is unknown.

Our empirical application contributes to the literature on peer effects in competitive environments, particularly with respect to peers’ ability impact on lower- and medium-achieving individuals. One strand of this literature focuses on sports competitions and tournaments and mostly find a negative impact. Brown (2011), Smith (2013) and Emerson and Hill (2018) find a negative peer ability effect, in contrast to Guryan et al. (2009) who find no effect. Yamane and Hayashi (2015) find positive effects when subjects are ahead and negative effects when they are behind. Another strand of this literature focuses on educational outcomes and reports mixed findings. Some studies find significant peer effects (for example, Hoxby (2000) and Sacerdote (2001) find a positive effect and Antecol et al. (2016) find negative effect), and others find small to no effects (for example, Foster (2006) and Angrist and Lang (2004)). Brady et al. (2017) find negative effects at broader level but positive effects at smaller level, suggesting that peer effects operate differently for different groups. The overall perception is that effects are not identical across different ability groups. Low-achieving groups are adversely affected by peer ability and may benefit from tracking compared to ability-mixing groups (duflo2011pee; Carrell et al., 2013; Booij et al., 2017). Our work builds on these papers by examining career decisions in the highly competitive and incentivized labor market for junior lawyers. Our results are more in line with the literature finding peer ability hurts subjects’ persistence in education and in their chosen career Thiemann (2022); Howell (2021).

2. Model

We consider a group interactions model in which the data comprise an i.i.d. sample of $G_{0}$ groups. The asymptotic is $G_{0}\to\infty$ , which is analagous to the large- $N$ fixed- $T$ asymptotic used for microeconometric panel data (Bramoullé et al., 2009). Individual $i\in\{1,2,...,N_{0}\}$ is in exactly one group denoted $g_{0}(i)\in\{1,2,...,G_{0}\}$ . The size of $i$ ’s group is $n_{g_{0}(i)}\geq 2$ . When it is not required, we omit the dependence on $i$ and simply refer to $g_{0}$ and $n_{g_{0}}$ . The continuous outcome $y_{i}$ of individual $i$ is determined by

\displaystyle y_{i}=\alpha_{g_{0}}+\left(\frac{1}{n_{g_{0}}-1}\sum_{\begin{subarray}{c}j:g_{0}(j)=g_{0}(i)\\ j\neq i\end{subarray}}y_{j}\right)\beta+\left(\frac{1}{n_{g_{0}}-1}\sum_{\begin{subarray}{c}j:g_{0}(j)=g_{0}(i)\\ j\neq i\end{subarray}}x_{j}\right)\delta+x_{i}\gamma+\epsilon_{i},

(2.1)

where $\alpha_{g_{0}}$ is group unobserved heterogeneity, $x_{i}$ is an exogenous characteristic and $\epsilon_{i}$ is an error term satisfying

\displaystyle\mathbb{E}[\epsilon_{i}|\alpha_{g_{0}},n_{g_{0}},\mathbf{x}_{g_{0}}]=0\quad\forall i\in\{1,2,...,N_{0}\},

(2.2)

where $\mathbf{x}_{g_{0}}=(x_{j})_{j:g_{0}(j)=g_{0}(i)}$ is $n_{g_{0}}\times 1$ .⁴⁴4Since we only consider the within-transformation of the fixed effects model below, (2.2) can be relaxed to $\mathbb{E}[\epsilon_{i}|n_{g_{0}},\mathbf{x}_{g_{0}}]=0$ . We maintain the form of (2.2) to maintain comparability with the literature. The moment condition supposes that group size and the characteristics of the group members are exogenous with respect to $\epsilon_{i}$ . However, both can be arbitrarily dependent on $\alpha_{g_{0}}$ , which allows for selection of individuals into groups. The model includes correlated effects (captured by $\alpha_{g_{0}}$ ), endogenous peer effects ( $\beta$ ) and contextual peer effects ( $\delta$ ).

For simplicity of exposition, following Bramoullé et al. (2009) we present results for the case where there is a single characteristic $x_{i}$ . All results continue to apply in the more general case in which $x_{i}$ , $\gamma$ and $\delta$ are vectors. As explained in Section 1.1, this model is well understood from a theoretical perspective and is pervasive in applied work. Moreover, it can be microfounded based on a game in which utility depends on one’s own action and the average action of the others in the group (see Blume et al. (2015); Bramoullé et al. (2020)).

To ensure that the reduced form of (2.1) exists, as is standard in the literature, we suppose that $|\beta|<1$ (i.e., that the endogenous peer effect is not explosive), yielding

	$\displaystyle y_{i}$	$\displaystyle=\frac{\alpha_{g_{0}}}{1-\beta}+\left(\sum_{\begin{subarray}{c}j:g_{0}(j)=g_{0}(i)\\ j\neq i\end{subarray}}x_{j}\right)\pi_{1}(n_{g_{0}})+x_{i}\pi_{2}(n_{g_{0}})+u_{i},$		(2.3)
	$\displaystyle\pi_{1}(n)$	$\displaystyle\triangleq\left(\frac{(\delta+\beta\gamma)(n-1)^{-1}}{1-\beta(\beta+n-2)(n-1)^{-1}}\right),\quad\pi_{2}(n)\triangleq\left(\frac{\gamma+\beta(\delta-\gamma(n-2))(n-1)^{-1}}{1-\beta(\beta+n-2)(n-1)^{-1}}\right),$

where $u_{i}$ is the reduced form error. Applying the within-group transformation,

\widetilde{y}_{i}\triangleq y_{i}-\frac{1}{n_{g_{0}}}\sum_{j:g_{0}(j)=g_{0}(i)}y_{j},

to the reduced form yields,

\displaystyle\widetilde{y}_{i}=\widetilde{x}_{i}\pi(n_{g_{0}})+\widetilde{u}_{i},\quad\pi(n)\triangleq\left(\frac{\gamma-\delta(n-1)^{-1}}{1+\beta(n-1)^{-1}}\right).

(2.4)

Though other transformations can be used, we focus on the within-group transformation since it delivers the least information loss (see Section 3.3 of Bramoullé et al. (2009)).

3. (Possibly) Miss-specified Groups

The researcher specifies the model in Section 2 with group $g(i)\in\{1,...,G\}$ for individual $i$ . The size of group $g(i)$ is $n_{g(i)}$ . As with $g_{0}(i)$ , we omit the dependence of $g(i)$ on $i$ unless it is required. The specified groups may cover only a subset $\mathcal{S}\subseteq\{1,...,N_{0}\}$ of individuals of cardinality $N\leq N_{0}$ . This could arise, for example, if the available data comprise a random sample of the $N_{0}$ individuals. However, we maintain that every individual in $\mathcal{S}$ is in exactly one specified group. We also make use of the within-specified-group transformation,

\overline{y}_{i}\triangleq y_{i}-\frac{1}{n_{g}}\sum_{j:g(j)=g(i)}y_{j},

which would be applied to eliminate specified group fixed effects. To build intuition on the implications of miss-specification for identification, we now consider two examples and derive their implications for the reduced form, which will later be used for our identification analysis.

Example 1 - Superset miss-specification. Suppose that two groups, each of size $n_{0}/2$ , are combined to form a specified group of size $n_{0}$ . Solving for the reduced form and applying the within-specified-group transformation yields, for individual $i$ in group 1

	$\displaystyle\overline{y}_{i}$	$\displaystyle=\overline{x}_{i}\pi(n_{0})+(a+b+c+\overline{u}_{i})$		(3.1)
		$\displaystyle=\overline{x}_{i}\pi(n_{0})+\overline{v}_{i},$		(3.2)

where,

	$\displaystyle a=\frac{\alpha_{1}-\alpha_{2}}{2(1-\beta)},\quad b=\left(\frac{\gamma+\delta}{1-\beta}\right)\frac{1}{n_{0}}\left(\sum_{j:g_{0}(j)=1}x_{j}-\sum_{j:g_{0}(j)=2}x_{j}\right),$
	$\displaystyle c=-\frac{\overline{x}_{i}n_{0}(\gamma\beta+\delta)}{2(n_{0}-1+\beta)(n_{0}/2-1+\beta)}.$

There are three sources of miss-specification, each corresponding to a term in equation (3.1). First, the researcher miss-specifies the correlated effect, yielding the term $a$ . Second, by specifying only one group the researcher erroneously includes the exogenous characteristics and outcomes of those in group 2 in the structural equation determining the outcomes for group 1. This yields the term $b$ . Finally, the researcher miss-specifies the group size to be $n_{0}$ instead of $n_{0}/2$ . This leads the reduced form parameter to be incorrectly specified as $\pi(n_{0})$ rather than $\pi(n_{0}/2)$ , yielding the term $c$ . Due to these terms, the error $\overline{v}_{i}$ depends on $\overline{x}_{i}$ and $n_{0}$ .

Example 2 - Subset miss-specification. Now switch the roles of the groups and the specified groups in Example 1, such that the researcher divides one group into two specified groups, each of size $n_{0}/2$ . Then we obtain, for individual $i$ in group 1,

	$\displaystyle\overline{y}_{i}$	$\displaystyle=\overline{x}_{i}\pi(n_{0}/2)+(c+\overline{u}_{i})$		(3.3)
		$\displaystyle=\overline{x}_{i}\pi(n_{0}/2)+\overline{v}_{i},$		(3.4)

where this time,

\displaystyle c=\frac{\overline{x}_{i}n_{0}(\gamma\beta+\delta)}{2(n_{n_{0}}/2-1+\beta)(n_{0}-1+\beta)}.

Terms such as $a$ and $b$ in Example 1 do not arise. This is because all members of specified group 1 have the same correlated effect as those in specified group 2, and, the sum of the outcomes and exogenous characteristics over all members of specified group 2 (respectively, specified group 1) is common to all members of specified group 1 (respectively, specified group 2). The within-specified-group transformation thus eliminates both sources of miss-specification. Miss-specification only manifests through the group size (i.e., through $c$ ).

In Example 1 the specified group members are a superset of the group members, and there are three sources of miss-specification. In Example 2 they are a subset, and there is one source of miss-specification.⁵⁵5Though we do not present such an example, the case in which we have neither a subset nor a superset is similar to the superset case. Examples 1-2 suggest that subset miss-specification is more readily addressed because one only needs to be concerned with specification of the group size (i.e., the term $c$ ). This is important, because there is no clear solution to terms such as $a$ and $b$ without making strong restrictions on the distributions of $\mathbf{x}_{g_{0}}$ and $\alpha_{g_{0}}$ . As we show below, the intuition that subset miss-specification is more easily addressed is not specific to Examples 1-2. We also argue below that subset miss-specification is more relevant in practice. We thus proceed under the following assumption,

Assumption 1 (Subset miss-specification).

For all $(i,j)\in\mathcal{S}^{2}$ , $g(i)=g(j)\Rightarrow g_{0}(i)=g_{0}(j)$ ,

and the remainder of the paper focuses on miss-specification arising due to incorrect specification of the group size.

Assumption 1 covers the case of missing data by allowing the researcher to observe $N<N_{0}$ individuals with outcomes, characteristics and group identifiers. For example, in the context of education, the researcher might access a sample of students’ test-scores, characteristics and classroom/school identifiers (e.g., Davezies et al. (2009)) or have access to a large educational survey such as the student level PISA survey, which contains school and grade identifiers but includes only a subset of students.⁶⁶6Other examples include risky behaviours and neighbourhood effects. For the former, survey data on smoking, drinking and illicit drug use among students can be incomplete (Lundborg, 2006). For the latter, the neighbourhood average outcome may be measured using survey data which includes geographic identifiers (e.g., census tract). For example, Bertrand et al. (2000) use the 5% public use microsample of the 1990 Census to study welfare take-up. Glaeser et al. (2003) use the same data for wages.

Assumption 1 also covers the case in which there is group uncertainty but the candidate groups are nested. It is satisfied provided that the researcher uses the smallest group structure to specify $g$ . For example, Glaeser et al. (2003) and Angrist (2014) consider peer groups based on dorm rooms and floors,⁷⁷7Similarly, peer effects in education may operate at the classroom, grade or school levels. Neighbourhood effects may operate at at the two, three or four digit postcode level. so Assumption 1 holds if the specified groups are rooms. In our empirical application, we postulate that peer effects among lawyers may operate either at the firm or firm-cohort levels, in which case Assumption 1 holds if firm-cohorts are specified.

4. Identification

We now study identification under Assumption 1. Our identification analysis follows Bramoullé et al. (2009), hence we say that the structural parameters are (point) identified if and only if they can be uniquely recovered from the reduced form parameters presented below (i.e., there is an injective relationship). Our results are thus asymptotic in nature (see Manski (1995)), and characterize whether endogenous, contextual and correlated effects can be distentangled if there is no limit to the number of groups $G_{0}$ .

Under Assumption 1, the reduced form for individual $i\in\mathcal{S}$ is

\displaystyle\overline{y}_{i}

\displaystyle=\overline{x}_{i}\pi(n_{g_{0}})+\overline{u}_{i}

(4.1)

and taking conditional expectations yields

\displaystyle\mathbb{E}[\overline{y}_{i}|n_{g},\mathbf{x}_{g}]

\displaystyle=\overline{x}_{i}\varphi(n_{g},\mathbf{x}_{g})+\mathbb{E}[\overline{u}_{i}|n_{g},\mathbf{x}_{g}],

(4.2)

where $\varphi(n,\mathbf{x})\triangleq\mathbb{E}[\pi(n_{g_{0}})|n_{g}=n,\mathbf{x}_{g}=\mathbf{x}]$ . To ease the notational burden we do not make explicit that the expectations are also conditional on $i\in\mathcal{S}$ . Below, we consider only cases in which this omission is innocuous and omit the condition $i\in\mathcal{S}$ for the remainder of the paper.

Equation (4.2) shows that identification prospects depend on whether there is within-specified-group variation in the exogenous characteristic, on whether $\mathbb{E}[\overline{u}_{i}|n_{g},\mathbf{x}_{g}]=0$ , and if both of these conditions are satisfied, on whether the structural parameters can be uniquely recovered from the reduced form parameters $\varphi(n,\mathbf{x})$ , which are identified for all $(\mathbf{x},n\geq 2)$ in the support of $(\mathbf{x}_{g},n_{g})$ .⁸⁸8We require $n\geq 2$ because $\varphi(1,\mathbf{x})$ is not identifiable since $n_{g}=1$ implies $\overline{y}_{i}=\overline{x}_{i}=0$ .

In general, even under Assumption 1, the moment condition in (2.2) does not imply $\mathbb{E}[\overline{u}_{i}|n_{g},\mathbf{x}_{g}]=0$ . Suppose however that we can make a restriction such that it is satisfied, and also that there is within-specified-group variation in the exogenous covariate. Then the structural parameters are point identified if they can be uniquely recovered from the reduced form parameters $\varphi(n,\mathbf{x})$ and the equations

\displaystyle\varphi(n,\mathbf{x})=\sum_{m=2}^{\infty}\mathbb{P}[n_{g_{0}}=m|n_{g}=n,\mathbf{x}_{g}=\mathbf{x}]\pi(m)

(4.3)

for all $(\mathbf{x},n\geq 2)$ in the support of $(\mathbf{x}_{g},n_{g})$ . The distribution $n_{g_{0}}|n_{g},\mathbf{x}_{g}$ is not observed, hence the structural parameters are not point identified unless it can be restricted.⁹⁹9Partial identification is possible because the probabilities $\mathbb{P}[n_{g_{0}}=m|n_{g},\mathbf{x}_{g}]$ for $m=2,3....$ are non-negative and sum to 1. We do not pursue partial identification, since, as shown below, typical empirical settings faced by researchers can lead to point identification.

We now consider three assumptions, each of which guarantees $\mathbb{E}[\overline{u}_{i}|n_{g},\mathbf{x}_{g}]=0$ and restricts the distribution of $n_{g_{0}}|n_{g},\mathbf{x}_{g}$ . Each assumption includes the standard model of peer effects with known peer groups as a special case, respectively when $n_{g}=n_{g_{0}}$ (Assumption 2), when $\rho=1$ (Assumption 3) and when $\psi\in\{0,1\}$ (Assumption 4).

To rule out pathological cases and ease the notational burden, we present our our results for the case in which there is within-specified-group variation in the exogenous characteristic, and that the support of the distribution of $n_{g_{0}}|\mathbf{x}_{g_{0}}$ does not depend on $\mathbf{x}_{g_{0}}$ . The results in Propositions 1 and 2 continue to apply if the latter is relaxed, provided that there exists an element in the support of the exogenous characteristic such that the support condition holds for the conditional distribution of $n_{g_{0}}$ .

4.1. Missing data

We first consider identification when there are missing data. Our analysis makes uses of the indicator $s_{i}\in\{0,1\}$ for the condition $i\in\mathcal{S}$ .

Assumption 2 (Known group size).

For each individual $i\in\{1,2,...,N_{0}\}$ , the researcher observes $(y_{i},x_{i},g_{0}(i),n_{g_{0}(i)})$ when $s_{i}=1$ and $\mathbb{E}[\epsilon_{i}|\alpha_{g_{0}},n_{g_{0}},\mathbf{x}_{g_{0}},\mathbf{s}_{g_{0}}]=0$ . The researcher groups individuals by $g_{0}$ .

Assumption 2 covers the simplest case in which the group size is known but individuals are sampled from the group. In practice, it means that the researcher has access to a sample of individuals with outcomes, characteristics, group membership indicators and group size. The distribution of $s_{i}$ (i.e., the inclusion probability) can be heterogeneous across individuals, but ought not to depend on the structural error (i.e., there should be no sample selection on unobservables). Identification under Assumption 2 was first considered by Davezies et al. (2009). We include it for comparison with our approach, which allows the group size to be unknown.

Assumption 3 (Unknown group size).

For each individual $i\in\{1,2,...,N_{0}\}$ , the researcher observes $(y_{i},x_{i},g_{0}(i))$ when $s_{i}=1$ , $\mathbb{E}[s_{i}|\mathbf{x}_{g_{0}},n_{g_{0}}]=\rho\in(0,1]$ , $\mathbb{COV}[s_{i},s_{j}|\mathbf{x}_{g_{0}},n_{g_{0}}]=0$ for $j\in g_{0}(i),j\neq i$ , and $\mathbb{E}[\epsilon_{i}|\alpha_{g_{0}},n_{g_{0}},\mathbf{x}_{g_{0}},\mathbf{s}_{g_{0}}]=0$ . The researcher groups individuals by $g_{0}$ .

Assumption 3 relaxes the requirement that the researcher knows the group size. In practice, it allows for the case in which the researcher observes an individual level sample of outcomes, exogenous characteristics and group membership indicators. In such a sample, the researcher knows the number of individuals sampled from each group ( $n_{g}$ ) but does not know the group size ( $n_{g_{0}}$ ), nor whether there are any missing individuals in each group. Relative to Assumption 2, we additionally assume that the sample of observed individuals is unbiased and uncorrelated. These are mild assumptions likely to be satisfied in well designed individual level samples under common sampling schemes (Arnab, 2017).

Assumptions 2 and 3 both maintain that the sampling of individuals does not induce sample selection, whilst Assumption 3 additionally maintains that the sampling does not depend on exogenous covariates nor on the group size. Viewed from the missing data perspective, Assumption 3 is essentially a type of Missing Completely at Random assumption, which is widely used (often implicitly) in empirical work. Together, $\mathbb{E}[s_{i}|\mathbf{x}_{g_{0}},n_{g_{0}}]=\rho\in(0,1]$ and $\mathbb{COV}[s_{i},s_{j}|\mathbf{x}_{g_{0}},n_{g_{0}}]=0$ imply

\displaystyle n_{g}|n_{g_{0}},\mathbf{x}_{g}\sim\text{Binomial}(n_{g_{0}},\rho),

(4.4)

which leads to identification of the distribution $n_{g_{0}}|n_{g},\mathbf{x}_{g}$ from the observed distribution $n_{g}|n_{g}\geq 1,\mathbf{x}_{g}$ .¹⁰¹⁰10Conditioning on $n_{g}\geq 1$ is important, because for some groups the researcher may not observe any individuals. Hence it is the distribution $n_{g}|n_{g}\geq 1,\mathbf{x}_{g}$ which is observed rather than $n_{g}|\mathbf{x}_{g}$ .

We now briefly discuss possible generalizations of Assumption 3, and the extent to which they facilitate identification. First, one might consider $\rho=\rho_{g_{0}}$ to allow for group heterogeneity. Such an assumption does not provide information to identify $n_{g_{0}}|n_{g},\mathbf{x}_{g}$ because $\mathbb{P}[n_{g_{0}}=m|\mathbf{x}_{g}]$ and $\rho_{g_{0}}$ only appear multiplicatively in the expression for $\mathbb{P}[n_{g}=n|n_{g}\geq 1,\mathbf{x}_{g}]$ for all $m$ and $n$ . For the same reason, using $\rho=\rho(n_{g_{0}})$ does not provide identifying information.

Alternatively one might consider $\rho=\rho(z_{i})$ for some exogenous observable $z_{i}$ (e.g., $z_{i}=x_{i}$ ). Viewed from the missing data perspective, this corresponds to a Missing At Random assumption. In this case $n_{g}|n_{g_{0}},\mathbf{x}_{g_{0}},\mathbf{z}_{g_{0}}$ follows a Poisson Binomial distribution, which could, in principle, be used for identification. However, if $\mathbf{z}_{g_{0}}$ is observed then $n_{g_{0}}$ is known and identification is attained instead under the weaker Assumption 2. In contrast, the distribution $n_{g}|n_{g_{0}},\mathbf{x}_{g},\mathbf{z}_{g}$ does not take a known form, hence is of limited use for the identification in the typical setting in which we have no information on the missing individuals (i.e., there are some individuals for whom $z_{i}$ is unobserved).

We do not pursue the above generalisations formally because the focus of our analysis is not on sample selection at the individual level, but instead on replacing a group level sample with an individual level sample. Hence we consider Assumption 3 as the benchmark case in which the researcher has access to a well designed sample of individuals rather than a well designed sample of groups. We now present our identification result.

Proposition 1.

$(\gamma,\beta,\delta)$ are point identified if $\gamma\beta+\delta\neq 0$ and any one of the following holds

1.

Assumption 2 holds and the support of the distribution of $n_{g_{0}}$ has at least three elements (Davezies et al., 2009).
2.

Assumption 3 holds and the support of the distribution of $n_{g_{0}}$ is bounded and has at least three elements.

Proposition 1 shows that the structural parameters are identifiable if $\gamma\beta+\delta\neq 0$ . This is a well known necessary identification condition in models with correctly specified groups and both endogenous and contextual effects (see for example Bramoullé et al. (2009) or Rose (2017)), which is generically satisfied over the parameter space. If it is violated then the endogenous and contextual effects exactly offset one another such that the reduced form effect is $\pi(n)=\gamma$ , which does not vary with $n$ , and hence no amount of variation in group sizes can identify $\beta$ and $\delta$ . Both results in Proposition 1 also require that the support of the distribution of $n_{g_{0}}$ has at least three elements. This is also necessary condition for identification when the groups are correctly specified and there are endogenous, contextual and correlated effects (Lee, 2007; Davezies et al., 2009; Bramoullé et al., 2009), hence it is also necessary if the groups are (possibly) miss-specified.

Result 1 was first established by Davezies et al. (2009) for missing data with known group sizes whereas result 2 applies when the group sizes are unknown. Point identification of $\rho$ comes in part from information on the lower bound of the support of $n_{g_{0}}$ . The result uses that $n_{g_{0}}\geq 2$ , which is maintained throughout the theoretical and empirical literature (e.g., Lee (2007); Bramoullé et al. (2009); Davezies et al. (2009); Bramoullé et al. (2020); Moffitt et al. (2001); Sacerdote (2001); Glaeser et al. (2003); Angrist (2014); Boucher et al. (2014); Cornelissen et al. (2017)). Indeed, groups of size one cannot provide information on peer effects and are not consistent with our model. However, if individuals are missing, then we may observe groups with only one member even though all groups have at least two members (i.e., $\mathbb{P}[n_{g}=1|\mathbf{x}_{g},n_{g}\geq 1]>0$ if $\rho<1$ ). Hence the distribution of $n_{g}$ provides information on $\rho$ , which combined with (4.4), leads to identification of $\rho$ and the distribution $n_{g_{0}}|n_{g},\mathbf{x}_{g}$ . Equation (4.3) can then be used to identify the peer effects.

In some applications a stronger support restriction may be used if it is known that $n_{g_{0}}\geq\underline{n}$ for some known $\underline{n}>2$ . For example, this is likely to be satisfied in studies of peer effects in education, where classrooms might reasonably be assumed to contain at least a handful of students. Though not required for identification, this can improve the precision of the GMM estimator we propose below. In other applications $n_{g_{0}}=1$ may be feasible, in which case one can never rule out $\rho=1$ and $n_{g_{0}}=n_{g}$ so that peer effects are not identifiable. For this reason, a restriction on the lower bound of the support of $n_{g_{0}}$ is necessary for identification of the peer effects.

Result 2 also uses the mild assumption that the distribution of $n_{g_{0}}$ is bounded, which is needed to identify $\rho$ and the distribution of $n_{g_{0}}|\mathbf{x}_{g}$ from the distribution of $n_{g}|n_{g}\geq 1,\mathbf{x}_{g}$ as a solution of a non-linear system comprising a finite number of equations. The upper bound need not be known since it is identifiable from the distribution of $n_{g}|n_{g}\geq 1$ . This assumption is also used, for example, by Lee (2007), Davezies et al. (2009), Lewbel (2019) and Boucher and Houndetoungan (2020).

4.2. Group uncertainty

We now consider subset miss-specification resulting from group uncertainty.

Assumption 4 (Group uncertainty).

For each individual $i\in\{1,2,...,N_{0}\}$ the researcher observes $(y_{i},x_{i},g_{1}(i),g_{2}(i),n_{g_{1}(i)},n_{g_{2}(i)})$ , $2\leq n_{g_{1}}\leq n_{g_{2}}$ and $g_{1}(i)=g_{1}(j)\Rightarrow g_{2}(i)=g_{2}(j)$ for all $(i,j)\in\{1,2,...,N_{0}\}^{2}$ . The researcher groups individuals by $g_{1}$ , $\mathbb{E}[\epsilon_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g_{2}}]=0$ and $\mathbb{P}[n_{g_{0}}=n_{g_{1}}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g_{2}}]=\psi\in[0,1]$ and $\mathbb{P}[n_{g_{0}}=n_{g_{2}}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g_{2}}]=1-\psi$ .

Assumption 4 covers the case in which there is group uncertainty but the candidate groups are naturally nested. Though an extension to three or more is straightforward, for simplicity of exposition we do not pursue it formally. We also require a minor strengthening of the moment condition, replacing (2.2) with $\mathbb{E}[\epsilon_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g_{2}}]=0$ , and hence $n_{g}$ with $n_{g_{1}},n_{g_{2}}$ in equations (4.2) and (4.3). The stronger moment condition requires that both potential group sizes and the exogenous characteristic of all individuals in the two potential groups be strictly exogenous with respect to the structural error.

Under Assumption 4, the reduced form is

\displaystyle\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g_{2}}]=\overline{x}_{i}\varphi^{\prime}(n_{1},n_{2},\mathbf{x})

(4.5)

where $\varphi^{\prime}(n_{1},n_{2},\mathbf{x})\triangleq\mathbb{E}[\pi(n_{g_{0}})|n_{g_{1}}=n_{1},n_{g_{2}}=n_{2},\mathbf{x}_{g_{2}}=\mathbf{x}]=\psi\pi(n_{1})+(1-\psi)\pi(n_{2})$ , hence identification hinges on the properties of the joint distribution of $n_{g_{1}}$ and $n_{g_{2}}$ , specifically, through the bi-partite graph of the support of $(n_{g_{1}},n_{g_{2}})$ , which we denote by $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ . Figure 1 depicts an example.

Our approach can be generalized to allow $\psi=\psi(\mathbf{z}_{g_{2}})$ (e.g., $\mathbf{z}_{g_{2}}=\mathbf{x}_{g_{2}}$ ), provided that the conditional expectations and probabilities in Assumption 4 continue to hold conditional also on $\mathbf{z}_{g_{2}}$ . We discuss the corresponding modification of Proposition 2 at the end of this section. We now present our identification result.

Proposition 2.

$(\gamma,\beta,\delta)$ are point identified if $\gamma\beta+\delta\neq 0$ , Assumption 4 holds, the support of the distribution of $n_{g_{1}}$ has at least three elements, the support of the distribution of $n_{g_{2}}$ has at least three elements and either there is a connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ with at least three vertices corresponding to the support of $n_{g_{1}}$ or there is a connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ with at least three vertices corresponding to the support of $n_{g_{2}}$ .

Refer to caption — Figure 1: A connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ with five vertices

The conditions $\gamma\beta+\delta\neq 0$ and that the supports of the distributions of $n_{g_{1}}$ and $n_{g_{2}}$ each have least three elements are necessary even when there is no miss-specification (i.e, when it is known that either $\psi=0$ or $\psi=1$ ) (Lee, 2007; Davezies et al., 2009; Bramoullé et al., 2009). In addition to these, we require a mild condition on the joint support of the candidate group sizes $(n_{g_{1}},n_{g_{2}})$ , expressed through the bi-partite graph $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ . When this condition is satisfied, using standard arguments for bi-partite fixed effects models (e.g., Abowd et al. (1999)), we are able to identify $\psi\pi(n_{1})+(1-\psi)\pi(n_{2})$ for all $(n_{1},n_{2})$ in a subset of the support of $(n_{g_{1}},n_{g_{2}})$ . This subset is given by all pairs which appear in the same connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ . Importantly, there is no requirement that the supports of $n_{g_{1}}$ and $n_{g_{2}}$ overlap. Figure 1 provides such an example.

The intuition for identification of $\psi\pi(n_{1})+(1-\psi)\pi(n_{2})$ comes from a regression of $\overline{y}_{i}$ on the interaction of $\overline{x}_{i}$ with dummies for each value of $n_{g_{1}}$ and the interaction of $\overline{x}_{i}$ with dummies for each value of $n_{g_{2}}$ . Together, both sets of dummies sum to two for every individual in the sample (because each individual is in exactly two candidate groups, a small one and a larger one), hence one dummy must be omitted. For this reason, we identify the pairwise sums of $\psi\pi(n_{1})$ and $(1-\psi)\pi(n_{2})$ for all pairs in the same connected component. If we observe a connected component containing at least three vertices corresponding to $n_{g_{1}}$ we can contrast $\psi\pi(n_{g_{1}})$ over three different values of $n_{g_{1}}$ holding $n_{g_{2}}$ constant. This yields three equations in the four unknowns $\psi,\gamma,\delta,\beta$ . We obtain an additional equation by repeating the exercise for a different value of $n_{g_{2}}$ , which leads to identification.

Under the generalization of Assumption 4 to $\psi=\psi(\mathbf{z}_{g_{2}})$ , Proposition 2 holds provided that the support conditions on $n_{g_{1}}$ and $n_{g_{2}}$ hold conditionally on $\mathbf{z}_{g_{2}}=\mathbf{z}$ for some element $\mathbf{z}$ of the support of $\mathbf{z}_{g_{2}}$ . Similarly, one replaces $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ by $\mathcal{G}_{n_{g_{1}},n_{g_{2}}|\mathbf{z}_{g_{2}}=\mathbf{z}}$ . This rules out pathological cases such as $z_{i}=n_{g_{2}(i)}$ .

5. Estimation

For missing data with known group sizes, we estimate $\gamma,\delta,\beta$ by non-linear least squares based on

\displaystyle\mathbb{E}[\overline{y}_{i}|n_{g_{0}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{0}}).

(5.1)

We focus here on the non-linear least squares estimator since it is more easily adapted to the case of unknown group sizes. We present an alternative instrumental variables estimator in the appendix. If the group sizes are unknown, we use

\displaystyle\mathbb{E}[\overline{y}_{i}|n_{g},\mathbf{x}_{g}]=\overline{x}_{i}\sum_{m=2}^{\infty}\mathbf{p}_{m}(n_{g};\rho,\mathbf{q})\pi(m),

(5.2)

where $\rho\in(0,1]$ , $0\leq\mathbf{q}_{m}\leq 1$ for $m=1,2,...$ , $\mathbf{q}_{1}=0$ , $\sum_{m=1}^{\infty}\mathbf{q}_{m}=1$ , and

\displaystyle\mathbf{p}_{m}(n_{g};\rho,\mathbf{q})

\displaystyle\triangleq\frac{\mathbf{q}_{m}{m\choose n_{g}}\rho^{n_{g}}(1-\rho)^{m-n_{g}}}{\sum_{n=2}^{\infty}\mathbf{q}_{n}{n\choose n_{g}}\rho^{n_{g}}(1-\rho)^{n-n_{g}}}.

(5.3)

For the parameters $\mathbf{q}$ and $\rho$ we can use maximum likelihood based on

\displaystyle\mathbb{P}[n_{g}|n_{g}\geq 1]=\frac{\sum_{m=2}^{\infty}\mathbf{q}_{m}{m\choose n_{g}}\rho^{n_{g}}(1-\rho)^{m-n_{g}}}{\sum_{n=1}^{\infty}\sum_{m=2}^{\infty}\mathbf{q}_{m}{m\choose n}\rho^{n}(1-\rho)^{m-n}}.

(5.4)

In practice we do not sequentially estimate the parameters by maximum likelihood and then by non-linear least squares. Instead, we estimate the parameters jointly by GMM based on the non-linear least squares moment conditions and equality to zero of the expectation of the score of the log-likelihood function based on (5.4). This framework allows the researcher to use GMM standard errors (as opposed to having to adjust for a two-stage estimator or use a boostrap) and to include additional information when it is available. For example, if the researcher knows which observed groups have missing members (i.e., they observe an indicator for $n_{g}=n_{g_{0}}$ ), they can also use the moment $\mathbb{E}[\mathbf{1}(n_{g}=n_{g_{0}})|n_{g}]=\rho^{n_{g}}$ . Similarly, one can also make a parametric assumption on the distribution of $n_{g_{0}}$ (i.e., use $\mathbf{q}=\mathbf{q}(\boldsymbol{\lambda})$ for a parameter $\boldsymbol{\lambda}$ ).

Identically to the empirical likelihood estimator of the probability mass function, the score equations yield $\widehat{\mathbf{q}}_{m}=0$ for all $m>\widehat{\overline{n}}$ , where $\widehat{\overline{n}}$ is the sample maximum of $n_{g}$ , hence the concentrated model is

	$\displaystyle\mathbb{E}[\overline{y}_{i}\|n_{g},\mathbf{x}_{g}]$	$\displaystyle=\overline{x}_{i}\sum_{m=2}^{\widehat{\overline{n}}}\mathbf{p}_{m}(n_{g};\rho,\mathbf{q})\pi(m),$		(5.5)
	$\displaystyle\mathbb{P}[n_{g}\|n_{g}\geq 1]$	$\displaystyle=\frac{\sum_{m=2}^{\widehat{\overline{n}}}\mathbf{q}_{m}{m\choose n_{g}}\rho^{n_{g}}(1-\rho)^{m-n_{g}}}{\sum_{n=1}^{\widehat{\overline{n}}}\sum_{m=2}^{\widehat{\overline{n}}}\mathbf{q}_{m}{m\choose n}\rho^{n}(1-\rho)^{m-n}}.$		(5.6)

For group uncertainty, we apply non-linear least squares based on

\displaystyle\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g}]=\overline{x}_{i}[\psi\pi(n_{g_{1}})+(1-\psi)\pi(n_{g_{2}})].

(5.7)

6. Monte-Carlo

We tailor the design to the well known data on roommates at Dartmouth college studied by Sacerdote (2001), Glaeser et al. (2003) and Angrist (2014). The original data comprise 1589 freshman at Dartmouth college who were randomly assigned to dorm-rooms. Fifty three percent of dorm-rooms were doubles, 44 percent were triples and the remaining 3 percent were quads. The first part of our Monte-Carlo experiment supposes that only a random sample of these students is available. The second part considers the issue of the definition of the relevant peer group, which could be the room or the floor (Sacerdote, 2001; Glaeser et al., 2003; Angrist, 2014).

6.1. Missing data

We consider a design satisfying Assumption 3. The original data comprise a complete sample of freshman (i.e., $\rho=1$ ). Our design varies $\rho\in\{0.3,0.5,0.7,0.9,1\}$ . For $\rho=1$ we set $s_{i}=1$ for all $i\in\{1,...,N_{0}\}$ and for $\rho<1$ we set $p_{i}=\rho+q_{i}$ , where $q_{i}\overset{i.i.d.}{\sim}\text{Uniform}(-0.1,0.1)$ and $s_{i}=1$ with probability $p_{i}$ . We set $G_{0}$ to be the nearest integer to $M/(\rho\mathbb{E}[n_{g_{0}}])$ , where $M$ is discussed below. For each dataset, we draw $G_{0}$ dorm-rooms from the distribution of dorm-room sizes, which we take to be $n_{g_{0}}-2\sim{\rm Binomial}(2,0.25)$ so as to match the sample mean and support of the Dartmouth data. We use a parametric distribution so as to evaluate the performance of the GMM estimator both with and without a parametric restriction on the group size distribution.

Since we draw a fixed number of dorm rooms, each with a random size, the number of students $N_{0}$ varies from one dataset to the next. We then draw an independent random sample of $N$ students, which includes student $i$ with probability $p_{i}$ . Hence $N$ also varies form one dataset to the next. Allowing $G_{0}$ to depend on $\rho$ as above implies that the size of the observed sample is $N\approx M$ no matter the value of $\rho$ . Setting $M=1600$ , our design matches the sample size of the Dartmouth data. We thus consider the performance of our method had the Dartmouth data been obtained from a random sample of freshman, rather than all freshman, holding the number of observations constant as we vary $\rho$ . We also consider $M=8000$ to study the performance of the method under different sample sizes.

Sacerdote (2001) exploits random assignment of freshmen to dorms in the Dartmouth data to identify peer effects in educational attainment, measured by freshman GPA. It is argued that endogenous effects are difficult to identify due to the reflection problem (see Manski (1993)). Sacerdote (2001) focuses instead on credibly identifying the reduced form effect of room-mate high school attainment on freshman GPA (see column 5 of Table 3 in Sacerdote (2001)). The estimated peer effect is positive and statistically significant, though smaller in magnitude than own high-school attainment. This reduced form regression has $R^{2}=0.19$ .

To match this setting as well as possible, in our design we set the own effect to be $\gamma=1$ and the endogenous effect to be $\beta=0$ , hence the reduced form effect of room-mate’s high-school attainment is equal to the contextual effect, which we set to be $\delta=0.5$ . We set $x_{i}\overset{i.i.d.}{\sim}\mathcal{N}(0,1)$ , $\epsilon_{i}\overset{i.i.d.}{\sim}\mathcal{N}(0,\sigma^{2})$ and $\alpha_{g_{0}}\overset{i.i.d.}{\sim}\mathcal{N}(1,\sigma^{2})$ and choose $\sigma^{2}=2(\gamma^{2}+\delta^{2}/\mathbb{E}[n_{g_{0}}-1])$ . For the expected dorm-room size, this choice of $\sigma^{2}$ corresponds to fraction 0.8 of $\mathbb{V}[y_{i}]$ being due to $\alpha_{g_{0}(i)}+\epsilon_{i}$ and 0.2 being due to $x_{i}\gamma+(1-n_{g_{0}})^{-1}\sum_{j:g_{0}(j)=g_{0}(i),j\neq i}x_{j}$ , hence a population $R^{2}$ of 0.2 in a reduced form regression without fixed effects, as estimated by Sacerdote (2001).

6.1.1 Results

Table 1 presents the results. We compare four models; one in which the observed groups are treated as if they are the groups (‘MS’, estimated by NLS), one in which the group size is known (‘K’, estimated by NLS), one in which the group size is unknown (‘U’, estimated by GMM) and a modification of U in which the parametric assumption $n_{g_{0}}-2\sim{\rm Binomial}(2,\omega)$ is additionally imposed (‘U-P’, estimated by GMM). Model MS is miss-specified when $\rho<1$ . The other models are correctly specified but use information and assumptions in different ways. Model (K) is the full information benchmark with which we compare models (U) and (U-P). Models (U) and (U-P) take the upper bound on the support of the group sizes to be known as $\overline{n}=4$ , which, in the context of the Dartmouth data implies that the researcher knows that there are no rooms of more than four students.

Beginning with the contextual effects only specification ( $\beta=0$ is imposed) and $\rho=1$ , we see that the results are very similar (identical to three decimal places) for all four models. As $\rho$ decreases, the distributions of the NLS estimator of $\delta$ and $\gamma$ of model (MS) shift closer to zero. For $\rho\leq 0.5$ , the bias becomes sizeable. In contrast, the estimators for the other models remain approximately unbiased for all values of $\rho$ , though their root mean squared error (RMSE) increases as $\rho$ decreases. This is at least partly because, as $\rho$ decreases the number of groups with only one observed member increases, and these groups provide no information on the parameters due to the within-specified-group transformation (i.e., because $\overline{y}_{i}=\overline{x}_{i}=0$ when $n_{g}=1$ ).

The NLS estimator for model (K) performs at least as well as the GMM estimator for model (U) in terms of RMSE. This is unsurprising since it requires known group sizes in order to be implemented. For $\rho=0.9$ the difference in RMSE is small (0.086 vs 0.099 for $\delta$ when $M=8000$ ), though the gap widens as $\rho$ decreases. This is likely because, as $\rho$ decreases, there is less variation in the specified group size $n_{g}$ , and hence, for a given specified group size, more uncertainty as to the group size $n_{g_{0}}$ . The additional parametric assumption on the group size distribution used in model (U-P) makes little difference when $\rho\geq 0.7$ , but for smaller values the difference in RMSE between the GMM estimators of models (U) and (U-P) becomes non-neglible.

Moving on to the specification with contextual and endogenous effects, it is clear that, though identified due to there being at least three group sizes, there is insufficient group size variation in this design to reliably separate the two, as suggested by (Sacerdote, 2001). This is likely because only around 6% of rooms were quads. In particular, the RMSE on $\beta$ is large and all estimators of $\beta$ are biased upwards, whereas estimators of $\delta$ tend to be biased downwards. Nevertheless, the qualiative conclusions regarding the relative performance of the estimators of the four models are the same as for the specifications with contextual effects only.

6.2. Group uncertainty

Sacerdote (2001), Glaeser et al. (2003) and Angrist (2014) all discuss the appropriate definition of the peer group, which may either be the room, the floor or the entire dorm. We focus here on the distinction between the room and the floor, and consider a design in which Assumption 4 holds. To match the Dartmouth data, we first draw the rooms as described above. We then draw $f_{1}\in\{1,2,...,5\}$ uniformly, and take rooms $\{1,2,...,f_{1}\}$ to comprise the first floor. We then draw another integer $f_{2}\in\{1,2,...,5\}$ and take rooms $\{f_{1}+1,...,f_{1}+f_{2}\}$ to comprise the second floor. We proceed in this way until every room has been allocated to a floor. In the Dartmouth data, the sample mean number of students per floor is close to 8. This data generating process yields an expected number of students per floor of 7.5. All other aspects of the data generating process remain unchanged and we vary $\psi\in\{0.2,0.4,0.6,0.8\}$ .

6.2.1 Results

Table 2 presents the results. We compare four models; one in which the specified group is the room (‘R’), one in which the specified group is the floor (‘F’), one in which the researcher knows whether the group is the room or the floor (‘K’) and one in which the group is unknown (‘U’). All are estimated by NLS. Models (R) and (F) are miss-specified because have $0<\psi<1$ in all designs. The other models are correctly specified but use information and assumptions in different ways. Model (K) is the full information benchmark with which we compare model (U).

For brevity, the discussion focuses on the contextual effects only specifications. Specifying the group to be the room (model (R)) leads to downwards bias of $\delta$ . The bias and RMSE grow as $\psi$ decreases. This is because the proportion of incorrectly specified groups grows as $\psi$ decreases. For the same reason the RMSE from specifying the group to be the floor (model (F)) increases as $\psi$ increases.

The results suggest that the estimator of model (F) is unbiased even when $\psi>0$ . This is coincidental. It just so happens that in this design the three sources of bias (the terms $a,b$ and $c$ discussed in Example 1 in Section 3) offset one another. To demonstrate this, Table 3 modifies the design by instead setting $\alpha_{g_{0}(i)}\overset{i.i.d.}{\sim}\mathcal{N}(n_{g_{0}}^{-}1\sum_{j:g_{0}(j)=g_{0}(i)}x_{j},\sigma^{2})$ . This has no impact on the estimators of models (R), (K) and (U) because all are based on room fixed effects.¹²¹²12For these estimators the results are numerically identical to results reported in Table 2 because identical seeds were used to draw the data for each replication. However, it introduces substantial bias for the estimator of model (F), which uses floor fixed effects.

Comparing the results for models (K) and (U), as expected the RMSE of the latter is larger since it does not require knowledge of the groups. Nevertheless, depending on the sample size and value of $\psi$ , the NLS estimator of model (U) has small bias and sufficiently small RMSE so as to be distinguishable from zero. Considering now the specifications with endogenous and contextual effects, though weakly identified, the main qualiative conclusions regarding the relative performance of the estimators of the four models are the same as for the specifications with contextual effects only. Notice also that once endogenous effects are introduced the estimator of model (F) often has larger bias and RMSE than that of model (U). This is particularly true for $\delta$ .

7. Application

In this section, we use unique employer-employee matched data to study how lawyers’ quit-to-exit responses depend on their own ability and that of their peers. The dataset is collected from the Shanghai Bar Association and includes a full history of law firm composition and lawyers’ career changes from 2009-2016. Exploiting lawyers’ annual registration records, we observe when they cancel legal practice in Shanghai, which we refer to as their quit-to-exit. On quitting, lawyers may change occupations, practice law in other regions, or return to education. Since legal practice is among the highest-income occupations and Shanghai is one of China’s highest-paying cities, we argue that quit-to-exit in this context is likely a result of lesser persistence in a highly competitive, incentivized environment, similar to exits in the education and labor literature (Thiemann, 2022; Wasserman, 2018).

Our analysis focuses on lawyers’ quit-to-exit in 2016 and how it is affected by their own ability and that of their peers. Throughout our analysis, we focus on associate lawyers, excluding partners and directors. This is because associates and partners/directors are unlikely to perceive one another as direct competitors. The behavior, ability and characteristics of partners and directors in the firm is accounted for by firm fixed effects. From this point onwards, we refer to associate lawyers simply as lawyers.

To measure lawyers’ ability, we merge the registration records with Shanghai court judgements. In this dataset, lawyers’ performance can be more reliably measured for civil litigation, so we focus on law firms and lawyers whose main business is in civil law. Our analysis is thus restricted to a sample of law firms in which most lawyers complete at least one civil case annually and lawyers who do not declare criminal law as one of the three specialized fields.¹³¹³13The performance in criminal law is not well measured because: 1) We only observe a small share of criminal judgements and many are restricted for confidentiality reason; 2) For criminal cases, the case fees are identical within the same case category (e.g. theft or homicide), so we cannot use case fees to measure case size as what we do for civil cases. Nevertheless, our results are robust to use number of civil and criminal cases to create the ability measure and run the analysis for all (civil and criminal) lawyers, and are robust to including all law firms in the analysis (see Table 7). To measure performance in civil litigation, we use lawyers’ fee-weighted caseload in the previous three years (2013-2015). Though we observe case outcomes (win or lose), we do not use this to measure ability because it is likely that high-ability lawyers take on more difficult cases.

To compute our ability measure, we perform the following steps: 1) We extract the case fee (measured in thousand yuan) for each civil case, which is the amount paid to file the case to the court and is calculated based on the disputed amount of the case; 2) We divide the case fee by the number of lawyers representing the case, according to the lawyers being listed in the court judgement; and 3) We repeat the previous two steps for each civil case and sum all fee-weighted cases undertaken from 2013-2015, and 4) We compute the annual average caseload by dividing by the number of years of practice between 2013 and 2015.¹⁴¹⁴14For lawyers who started practice after 2013, the annual caseload in 2013 (and 2014 if started after 2014) would be zero. To include these lawyers and make them comparable, we use the annual average caseload, rather than the three-year sum, by excluding the year(s) before the lawyer’s legal practice date.

The fee-weighted annual average caseload is a proxy for ability because it reflects past performance and income generated, hence can be used to predict one’s career prospects. Generally, lawyers in China charge by case, rather than by work hours (Liu, 2006). For civil litigation, lawyers’ fees are typically charged based on the disputed amount (Michelson, 2006), thus high-ability lawyers are more likely to work on high fee cases, regardless of whether cases are assigned by partners or obtained by the lawyer themselves. Nevertheless, we are aware that non-litigation business is not measured (e.g., Initial Public Offerings and Mergers/Acquisitions). To this end, we replicate our analysis in small- to medium-size firms because major non-litigation business is concentrated in large firms.

The effect of interest is that of peer ability on quit-to-exit. To measure peer ability we use both the peer average fee-weighted caseload and the proportion of high-ability peers, where high-ability is defined as having a fee-weighted caseload in the top quartile (i.e., $>24.44$ in our data). We believe that the latter better captures the competitive environment through which lawyers compete for promotion to a small number of senior positions (team leaders, partners and directors).

We consider two group structures through which peer effects may operate. Group F supposes that peer effects operate among all lawyers working at the same law firm. Alternatively, we might expect stronger peer effects among similar-age lawyers, who are likely to be at a similar career stage. To account for this possibility, we split lawyers into two cohorts with a cut-off age of 35 (inclusive) for the younger cohort. Group F-C supposes that peer effects only operate through lawyers in the same cohort at the same law firm.

Our sample includes 8,448 lawyers working at 755 law firms during 2016. Table 4 summarizes the data. As shown in the left panel, 3.6% of lawyers quit-to-exit, 45% are female, 38% have a graduate degree (Masters or PhD), the average age is around 36, average experience is around 8 years, and the average fee-weighted annual caseload is 20 thousand yuan. In the right panel of Table 4, we present summary statistics for the 732 small- to medium-size firms, which are largely similar to the sample of all firms.

We estimate (2.1) taking the dependent variable to be an indicator for quit-to-exit in 2016. Even though the dependent variable is binary, the linear model is well defined from both an econometric and economic perspective, and has desirable computational and statistical properties, especially when there are group fixed effects (see Boucher et al. (2020)).

Individual covariates comprise an indicator for being female, an indicator for having a graduate degree, age, years of experience, quadratic terms in age and experience, and fee-weighted caseload. We consider contextual effects only (i.e., $\beta=0$ is imposed) because we do not expect strategic interactions in quit-to-exit behavior. To account for within-firm dependence, we cluster standard errors by firm. All soecifications include either firm or firm-cohort fixed effects.

7.1. Baseline results

Our baseline results are reported in Table 5. Models (1)-(2) respectively use group structures F and F-C. We find a positive yet statistically insignificant peer effect in fee-weighted caseload. In group F, the magnitude of the peer effect is around one third of the effect of a lawyer’s own fee-weighted caseload, which is negative and statistically significant at the 0.01 level. For group F-C, the magnitude of the peer effect is similar to that of own fee-weighted caseload. Models (3)-(4) consider instead the proportion of high-ability peers in groups F and F-C respectively, for which we find larger peer effects. The coefficient of 0.0544 in model (3) is statistically significant at the 0.05 level and indicates that a one standard deviation (17.2 pp) increase in the proportion of high-ability peers leads to 26.0 percent (0.9 pp) increase in lawyers’ quit-to-exit probability. We find a similar effect in model (4). In both models we find a notable gender gap, in that female lawyers are around 35.8-41.1 percent (1.3-1.5 pp) more likely to quit than male lawyers. This is consistent with previous findings that women are more likely to discontinue in competitive environments (Hunt, 2016; Wasserman, 2018). We also find evidence that quits are negatively but concavely correlated with own age and negatively correlated with own fee-weighted caseload, which is consistent with Jovanovic (1979).

Next, in models (5)-(6) we estimate the impact of peer ability separately for those above and below 35. We find that the peer effect is stronger in the younger cohort. The coefficient of 0.0767 in model (5) is statistically significant at the 0.01 level and indicates that a one standard deviation (19.3 pp) increase in the proportion of high-ability peers leads to 35.9 percent (1.48 pp) increase in lawyers’ quit-to-exit likelihood. In contrast, we find a smaller effect among those older than 35, which is not statistically distinguishable from zero. We also find a larger gender gap among those below 35. Overall, we find stronger quit-to-exit effects among those under 35, which is consistent with Jovanovic (1979).

In Table 6, we restrict our analysis to small- to medium- size firms to alleviate the concern that our ability measure cannot capture non-litigation performance. We limit the sample of firms to those have 50 or fewer lawyers (i.e. 732 out of 755 firms). Our results are qualitatively identical the baseline specification. Another concern is on lawyers close to the retirement age, who may intentionally reduce caseload before the retirement. To that end, we replicate our preferred models (models (3)-(4) in Table 5) and we exclude lawyers beyond age 55, the age at which most employees in China are eligible to claim their pension. As shown in models (1)-(2) of Table 7, we still find similar adverse peer ability effects, indicating that retiring lawyers are not driving the estimated effects. Next, in models (3)-(4), we conduct another robustness check by including all law firms in Shanghai. The similar effects we find in these two columns suggest that caseload broadly serves as a reliable ability proxy in the legal profession context. Overall, the similar effects found in alternative samples in models (1)-(4) and in Table 6 imply that the estimated peer ability effects generally exist in our context. Lastly, in models (5)-(6), we consider an alternative definition using the number of high-ability peers in the specific groups F and F-C. The results are also in line with our baseline findings.

In Table 8, we conduct a heterogeneity analysis. In models (1)-(2), we examine the impact of high-ability peers on lawyers whose fee-weighted caseload is below median. Consistent with Antecol et al. (2016) and Booij et al. (2017), these low- to medium-ability individuals drive the negative peer ability effects. In models (3)-(4), we study whether women are more sensitive to peer ability effects. Despite the significant gender gap of persistence, we find no evidence suggesting that women are more sensitive to peer ability effects. In models (5)-(6), we check whether the magnitude of peer ability effects vary by firm size by interacting the proportion of high-ability peers with an indicator for firms below the median size. The coefficient on the interaction term suggests that lawyers in small firms are not additively more prone to peer ability effects.

7.2. Missing data

We now ask whether we could have obtained similar findings had only a random sample of lawyers with firm identifiers been available. To do this, we draw random 1000 random sub-samples of lawyers using the sampling process from our Monte-Carlo experiment (see Section 6.1). We focus on the baseline specification for group F (model (3) of Table 5), consider 50% and 70% subsamples (i.e., $\rho=0.5$ and $\rho=0.7$ ). The minimum value of $n_{g_{0}}$ in our sample is 2, which we use as a lower bound for the estimator with unknown group sizes. We make a parametric restriction on the distribution of $n_{g_{0}}-2$ , supposing that it follows a negative binomial distribution. Figure 2 depicts the empirical distribution for the original sample and a fitted negative binomial distribution, which provides a good approximation.

The results are reported in Figures 3 ( $\rho=0.5$ ) and 4 ( $\rho=0.7$ ). As in our Monte-Carlo experiment, we consider the estimator based on miss-specified peer group sizes (i.e., incorrectly supposing that $n_{g}=n_{g_{0}}$ ), known sizes and unknown sizes. Looking at the left hand columns we see that miss-specification has little consequence for estimates of the effect of a lawyer’s own characteristics. However, looking at the right hand columns we see that miss-specification biases estimates of peer effects towards zero. Comparing Figure 3 with Figure 4, we see that the bias in in the peer effects is larger for $\rho=0.5$ than for $\rho=0.7$ . Unsurprisingly, the variability of the estimators is also larger for $\rho=0.5$ than for $\rho=0.7$ . These findings are qualitatively identical to those of our Monte-Carlo experiment, but use real data and larger group sizes.

7.3. Group uncertainty

Model (7) of Table 5 allows for uncertainty as to whether F or F-C is the relevant peer group. The point estimate of $\psi$ (i.e., the probability of F-C) is 0.562, which suggests considerable heterogeneity, though it is not precisely estimated. Despite this, we find similar peer effects to the models based on groups F and F-C. We obtain similar results in the sample of small-medium firms (see Table 6).

8. Conclusion

Our identification results and empirical work demonstrate that it is possible to conduct empirical analysis of peer effects despite missing data and group uncertainty. Regarding missing data, we require only that the researcher has access to a sample of individuals with outcomes, exogenous characteristics and group identifiers. We propose a method which does not require information on group size, nor on individuals which are not sampled, nor on whether such individuals even exist. In principle, and subject to the limitations discussed below, this opens up the possibility of peer effects studies based on widely available individual level survey data (provided that it contains group identifiers). We also show that peer effects can be identified under group uncertainty. Future work may extend our results to incorporate both missing individuals and group uncertainty simultaneously.

A limitation of our work is that our results are specific to the group interactions model we consider. Under more general network structures (e.g., social networks), it is not the case that the specified group fixed effects reduce the problem of miss-specification to the problem of inferring the group size. Future work may seek to bridge this divide.

References

Abowd et al. (1999) Abowd, J. M., F. Kramarz and D. N. Margolis, “High wage workers and high wage firms,” Econometrica 67 (1999), 251–333.
Angrist (2014) Angrist, J. D., “The perils of peer effects,” Labour Economics 30 (2014), 98–108.
Angrist and Lang (2004) Angrist, J. D. and K. Lang, “Does school integration generate peer effects? Evidence from Boston’s Metco Program,” American Economic Review 94 (2004), 1613–1634.
Antecol et al. (2016) Antecol, H., O. Eren and S. Ozbeklik, “Peer effects in disadvantaged primary schools evidence from a randomized experiment,” Journal of Human Resources 51 (2016), 95–132.
Arduini et al. (2020) Arduini, T., E. Patacchini and E. Rainone, “Identification and estimation of network models with heterogeneous interactions,” in The Econometrics of Networks (Emerald Publishing Limited, 2020).
Arnab (2017) Arnab, R., Survey sampling theory and applications (Academic Press, 2017).
Bertrand et al. (2000) Bertrand, M., E. F. Luttmer and S. Mullainathan, “Network effects and welfare cultures,” The Quarterly Journal of Economics 115 (2000), 1019–1055.
Blume et al. (2015) Blume, L. E., W. A. Brock, S. N. Durlauf and R. Jayaraman, “Linear social interactions models,” Journal of Political Economy 123 (2015), 444–496.
Booij et al. (2017) Booij, A. S., E. Leuven and H. Oosterbeek, “Ability peer effects in university: Evidence from a randomized experiment,” The review of economic studies 84 (2017), 547–578.
Boucher et al. (2014) Boucher, V., Y. Bramoullé, H. Djebbari and B. Fortin, “Do peers affect student achievement? Evidence from Canada using group size variation,” Journal of applied econometrics 29 (2014), 91–109.
Boucher et al. (2020) Boucher, V., Y. Bramoullé et al., Binary Outcomes and Linear Interactions (Aix-Marseille School of Economics, 2020).
Boucher and Houndetoungan (2020) Boucher, V. and A. Houndetoungan, “Estimating peer effects using partial network data,” Working paper (2020).
Brady et al. (2017) Brady, R. R., M. A. Insler and A. S. Rahman, “Bad company: Understanding negative peer effects in college achievement,” European Economic Review 98 (2017), 144–168.
Bramoullé et al. (2009) Bramoullé, Y., H. Djebbari and B. Fortin, “Identification of peer effects through social networks,” Journal of econometrics 150 (2009), 41–55.
Bramoullé et al. (2020) ———, “Peer effects in networks: A survey,” Annual Review of Economics 12 (2020), 603–629.
Breza et al. (2020) Breza, E., A. G. Chandrasekhar, T. H. McCormick and M. Pan, “Using aggregated relational data to feasibly identify network structure without network data,” American Economic Review 110 (2020), 2454–84.
Brown (2011) Brown, J., “Quitters never win: The (adverse) incentive effects of competing with superstars,” Journal of Political Economy 119 (2011), 982–1013.
Carrell et al. (2013) Carrell, S. E., B. I. Sacerdote and J. E. West, “From natural variation to optimal policy? The importance of endogenous peer group formation,” Econometrica 81 (2013), 855–882.
Chandrasekhar and Lewis (2011) Chandrasekhar, A. and R. Lewis, “Econometrics of sampled networks,” Unpublished manuscript, MIT.[422] (2011).
Cornelissen et al. (2017) Cornelissen, T., C. Dustmann and U. Schönberg, “Peer effects in the workplace,” American Economic Review 107 (2017), 425–56.
Davezies et al. (2009) Davezies, L., X. d’Haultfoeuille and D. Fougère, “Identification of peer effects using group size variation,” The Econometrics Journal 12 (2009), 397–413.
Duflo et al. (2011) Duflo, E., P. Dupas and M. Kremer, “Peer effects, teacher incentives, and the impact of tracking: Evidence from a randomized evaluation in Kenya,” American economic review 101 (2011), 1739–74.
Emerson and Hill (2018) Emerson, J. and B. Hill, “Peer effects in marathon racing: The role of pace setters,” Labour Economics 52 (2018), 74–82.
Foster (2006) Foster, G., “It’s not your peers, and it’s not your friends: Some progress toward understanding the educational peer effect mechanism,” Journal of public Economics 90 (2006), 1455–1475.
Glaeser et al. (2003) Glaeser, E. L., B. I. Sacerdote and J. A. Scheinkman, “The social multiplier,” Journal of the European Economic Association 1 (2003), 345–353.
Goldsmith-Pinkham and Imbens (2013) Goldsmith-Pinkham, P. and G. W. Imbens, “Social networks and the identification of peer effects,” Journal of Business & Economic Statistics 31 (2013), 253–264.
Guryan et al. (2009) Guryan, J., K. Kroft and M. J. Notowidigdo, “Peer effects in the workplace: Evidence from random groupings in professional golf tournaments,” American Economic Journal: Applied Economics 1 (2009), 34–68.
Hardy et al. (2019) Hardy, M., R. M. Heath, W. Lee and T. H. McCormick, “Estimating spillovers using imprecisely measured networks,” arXiv preprint arXiv:1904.00136 (2019).
Howell (2021) Howell, S. T., “Learning from feedback: Evidence from new ventures,” Review of Finance 25 (2021), 595–627.
Hoxby (2000) Hoxby, C., “Peer effects in the classroom: Learning from gender and race variation,” Technical Report, National Bureau of Economic Research, 2000.
Hoxby and Weingarth (2005) Hoxby, C. M. and G. Weingarth, “Taking race out of the equation: School reassignment and the structure of peer effects,” Technical Report, Citeseer, 2005.
Hunt (2016) Hunt, J., “Why do women leave science and engineering?,” ILR Review 69 (2016), 199–226.
Jovanovic (1979) Jovanovic, B., “Job matching and the theory of turnover,” Journal of political economy 87 (1979), 972–990.
Lee (2007) Lee, L.-F., “Identification and estimation of econometric models with group interactions, contextual factors and fixed effects,” Journal of Econometrics 140 (2007), 333–374.
Lewbel (2019) Lewbel, A., “The identification zoo: Meanings of identification in econometrics,” Journal of Economic Literature 57 (2019), 835–903.
Lewbel et al. (2019) Lewbel, A., X. Qu, X. Tang et al., “Social networks with misclassified or unobserved links,” Unpublished manuscript (2019).
Liu (2006) Liu, S., “Client influence and the contingency of professionalism: the work of elite corporate lawyers in China,” Law & Society Review 40 (2006), 751–782.
Liu et al. (2017) Liu, X., E. Patacchini and E. Rainone, “Peer effects in bedtime decisions among adolescents: a social network model with sampled data,” The econometrics journal 20 (2017), S103–S125.
Lundborg (2006) Lundborg, P., “Having the wrong friends? Peer effects in adolescent substance use,” Journal of health economics 25 (2006), 214–233.
Manski (1993) Manski, C. F., “Identification of endogenous social effects: The reflection problem,” The Review of Economic Studies 60 (1993), 531–542.
Manski (1995) ———, Identification problems in the social sciences (Harvard University Press, 1995).
Michelson (2006) Michelson, E., “The practice of law as an obstacle to justice: Chinese lawyers at work,” Law & Society Review 40 (2006), 1–38.
Moffitt et al. (2001) Moffitt, R. A. et al., “Policy interventions, low-level equilibria, and social interactions,” Social dynamics 4 (2001), 6–17.
Reza et al. (2021) Reza, S., P. Manchanda and J.-K. Chong, “Identification and Estimation of Endogenous Peer Effects Using Partial Network Data from Multiple Reference Groups,” Management Science (2021).
Rose (2017) Rose, C. D., “Identification of peer effects through social networks using variance restrictions,” The Econometrics Journal 20 (2017), S47–S60.
Sacerdote (2001) Sacerdote, B., “Peer effects with random assignment: Results for Dartmouth roommates,” The Quarterly journal of economics 116 (2001), 681–704.
Smith (2013) Smith, J., “Peers, pressure, and performance at the national spelling bee,” Journal of Human resources 48 (2013), 265–285.
Sojourner (2013) Sojourner, A., “Identification of peer effects with missing peer data: Evidence from Project STAR,” The Economic Journal 123 (2013), 574–605.
Thiemann (2022) Thiemann, P., “The persistent effects of short-term peer groups on performance: Evidence from a natural experiment in higher education,” Management Science 68 (2022), 1131–1148.
Wang and Lee (2013) Wang, W. and L.-F. Lee, “Estimation of spatial autoregressive models with randomly missing data in the dependent variable,” The Econometrics Journal 16 (2013), 73–102.
Wasserman (2018) Wasserman, M., “Gender differences in politician persistence,” The Review of Economics and Statistics (2018), 1–46.
Yamane and Hayashi (2015) Yamane, S. and R. Hayashi, “Peer effects among swimmers,” The Scandinavian Journal of Economics 117 (2015), 1230–1255.

Appendix

Proof.

of Proposition 1

The first result was established by Davezies et al. (2009).

To establish the second result, first note that Assumption 3 implies $\mathbb{E}[\overline{u}_{i}|n_{g},\mathbf{x}_{g}]=0$ for all $i\in\mathcal{S}$ . Let us now fix $\mathbf{x}_{g}=\mathbf{x}$ for some $\mathbf{x}$ in its support. We begin by identifying $\rho$ and the distribution of $n_{g_{0}}|\mathbf{x}_{g}=\mathbf{x}$ using the observable distribution of $n_{g}|n_{g}\geq 1,\mathbf{x}_{g}=\mathbf{x}$ and the fact that Assumption 3 implies

n_{g}|n_{g_{0}},\mathbf{x}_{g}=\mathbf{x}\sim\text{Binomial}(n_{g_{0}},\rho)

By the boundedness assumption, there exists $\overline{n}<\infty:\mathbb{P}[n_{g_{0}}=\overline{n}]>0,\mathbb{P}[n_{g_{0}}\leq\overline{n}]=1$ . Note that, because $\rho>0$ , $\overline{n}$ is identifiable as the integer $n$ such that $\mathbb{P}[n_{g}=n|n_{g}\geq 1]>0$ . This implies that there exists identifiable $\overline{n}(\mathbf{x})<\infty:\mathbb{P}[n_{g_{0}}=\overline{n}(\mathbf{x})|\mathbf{x}_{g}=\mathbf{x}]>0,\mathbb{P}[n_{g_{0}}(\mathbf{x})\leq\overline{n}(\mathbf{x})|\mathbf{x}_{g}=\mathbf{x}]=1$ .

Let $\mathbf{p}_{n}(\mathbf{x})=\mathbb{P}[n_{g}=n|n_{g}\geq 1,\mathbf{x}_{g}=\mathbf{x}]$ and $\mathbf{q}_{m}(\mathbf{x})=\mathbb{P}[n_{g_{0}}=m|\mathbf{x}_{g}=\mathbf{x}]$ . Under Assumption 3, $\mathbf{p}(\mathbf{x})=(\mathbf{p}_{n}(\mathbf{x}))_{n=1,...,\overline{n}(\mathbf{x})}$ and $\mathbf{q}(\mathbf{x})=(\mathbf{q}_{m}(\mathbf{x}))_{m=1,...,\overline{n}(\mathbf{x})}$ verify

\displaystyle\mathbf{p}(\mathbf{x})/\mathbf{p}_{\overline{n}(\mathbf{x})}(\mathbf{x})=\mathbf{A}(\mathbf{x},\rho)\mathbf{s}(\mathbf{x})\rho^{-\overline{n}(\mathbf{x})}

(8.1)

where $\mathbf{A}(\mathbf{x},\rho)=(\mathbf{A}_{ij}(\mathbf{x},\rho))_{i=1,...,\overline{n}(\mathbf{x}),j=1,...,\overline{n}(\mathbf{x})}$ is an upper triangular Binomial matrix with entries,

\mathbf{A}_{ij}(\mathbf{x},\rho)=\begin{cases}{j\choose i}\rho^{i}(1-\rho)^{j-i}&i\leq j\\ 0&\text{otherwise}\end{cases}

and $\mathbf{s}(\mathbf{x})=\mathbf{q}(\mathbf{x})/\mathbf{q}_{\overline{n}(\mathbf{x})}(\mathbf{x})$ . By construction, $\mathbf{s}_{\overline{n}(\mathbf{x})}(\mathbf{x})=1$ and $\sum_{m=1}^{\overline{n}(\mathbf{x})}\mathbf{s}_{m}(\mathbf{x})=1/\mathbf{q}_{\overline{n}(\mathbf{x})}(\mathbf{x})$ . We also know $\mathbf{q}_{1}(\mathbf{x})=0$ because $n_{g_{0}}\geq 2$ , hence we know $\mathbf{s}_{1}(\mathbf{x})=0$ . The remaining entries of $\mathbf{s}(\mathbf{x})$ are non-negative but unknown.

Now suppose that there exists $\widetilde{\rho}$ and $\widetilde{\mathbf{s}}(\mathbf{x})$ which also verify (8.1). Then we have

	$\displaystyle\mathbf{s}(\mathbf{x})$	$\displaystyle=\left(\frac{\rho}{\widetilde{\rho}}\right)^{\overline{n}(\mathbf{x})}\mathbf{A}(\mathbf{x},\rho)^{-1}\mathbf{A}(\mathbf{x},\widetilde{\rho})\widetilde{\mathbf{s}}(\mathbf{x})$		(8.2)
		$\displaystyle=\mathbf{B}(\mathbf{x},\rho,\widetilde{\rho})\widetilde{\mathbf{s}}(\mathbf{x})$		(8.3)

where $\mathbf{A}(\mathbf{x},\rho)^{-1}$ exists since $|\mathbf{A}(\mathbf{x},\rho)|=\rho^{T(\overline{n}(\mathbf{x}))}$ where $T(\overline{n}(\mathbf{x}))$ is the $\overline{n}(\mathbf{x})^{th}$ triangular number and by assumption $\rho>0$ . The matrix $\mathbf{B}(\mathbf{x},\rho,\widetilde{\rho})=(\mathbf{B}_{ij}(\mathbf{x},\rho,\widetilde{\rho}))_{i=1,...,\overline{n}(\mathbf{x}),j=1,...,\overline{n}(\mathbf{x})}$ is an upper triangular matrix with entries,

\mathbf{B}_{ij}(\mathbf{x},\rho,\widetilde{\rho})=\begin{cases}\left(\frac{\rho}{\widetilde{\rho}}\right)^{\overline{n}(\mathbf{x})}{j\choose i}\widetilde{\rho}^{i}\rho^{-j}(\rho-\widetilde{\rho})^{j-i}&i\leq j\\ 0&\text{otherwise}\end{cases}.

The final equation in the system in (8.3) is redundant because we already established that $\mathbf{s}_{\overline{n}(\mathbf{x})}=\widetilde{\mathbf{s}}_{\overline{n}(\mathbf{x})}=1$ by construction. Rearranging the penultimate equation yields

\displaystyle\rho=\widetilde{\rho}\left(\frac{\mathbf{s}_{\overline{n}(\mathbf{x})-1}(\mathbf{x})+\overline{n}(\mathbf{x})}{\widetilde{\mathbf{s}}_{\overline{n}(\mathbf{x})-1}(\mathbf{x})+\overline{n}(\mathbf{x})}\right)

(8.4)

Denoting the term in parentheses by $\Delta(\mathbf{x})$ , injecting (8.4) into the definition of $\mathbf{B}(\mathbf{x},\rho,\widetilde{\rho})$ yields $\mathbf{B}(\mathbf{x},\rho,\widetilde{\rho})=\mathbf{C}(\Delta(\mathbf{x}))$ with entries

\mathbf{C}_{ij}(\Delta(\mathbf{x}))=\begin{cases}{j\choose i}\Delta(\mathbf{x})^{\overline{n}(\mathbf{x})-j}(\Delta(\mathbf{x})-1)^{j-i}&i\leq j\\ 0&\text{otherwise}\end{cases},

hence we can re-write the system in (8.3) as

\displaystyle\mathbf{s}(\mathbf{x})

\displaystyle=\mathbf{C}(\Delta(\mathbf{x}))\widetilde{\mathbf{s}}(\mathbf{x}).

(8.5)

From its definition, it is clear that $\Delta(\mathbf{x})>0$ and when $\Delta(\mathbf{x})=1$ we have $\rho=\widetilde{\rho}$ (due to (8.4)), $\mathbf{s}(\mathbf{x})=\widetilde{\mathbf{s}}(\mathbf{x})$ (due to (8.3)) and hence $\mathbf{q}(\mathbf{x})=\widetilde{\mathbf{q}}(\mathbf{x})$ (because $\sum_{m=1}^{\overline{n}(\mathbf{x})}\mathbf{s}_{m}(\mathbf{x})=1/\mathbf{q}_{\overline{n}(\mathbf{x})}(\mathbf{x})$ ). We now show that $\Delta(\mathbf{x})=1$ by ruling out $\Delta(\mathbf{x})>1$ and $\Delta(\mathbf{x})<1$ by contradiction.

Suppose first that $\Delta(\mathbf{x})>1$ . Then, since $\mathbf{s}_{1}(\mathbf{x})=0$ , the first equation in (8.5) is

0=\sum_{j=1}^{\overline{n}(\mathbf{x})}j\Delta(\mathbf{x})^{\overline{n}(\mathbf{x})-j}(\Delta(\mathbf{x})-1)^{j-1}\widetilde{\mathbf{s}}_{j}(\mathbf{x})

The right hand side is strictly positive because by construction at least one entry of $\widetilde{\mathbf{s}}(\mathbf{x})$ must be strictly positive. Hence we cannot have $\Delta(\mathbf{x})>1$ .

Suppose instead that $\Delta(\mathbf{x})<1$ . Using the first $\overline{n}(\mathbf{x})-2$ equations in (8.5) to solve for the $\overline{n}(\mathbf{x})-2$ unknowns $\widetilde{\mathbf{s}}_{2}(\mathbf{x}),\widetilde{\mathbf{s}}_{3}(\mathbf{x}),...,\widetilde{\mathbf{s}}_{\overline{n}(\mathbf{x})-1}(\mathbf{x})$ , we obtain

\displaystyle\begin{pmatrix}\widetilde{\mathbf{s}}_{2}(\mathbf{x})\\ \widetilde{\mathbf{s}}_{3}(\mathbf{x})\\ \vdots\\ \widetilde{\mathbf{s}}_{\overline{n}(\mathbf{x})-1}(\mathbf{x})\end{pmatrix}=\mathbf{D}(\Delta(\mathbf{x}))^{-1}\left[\begin{pmatrix}0\\ \mathbf{s}_{2}(\mathbf{x})\\ \vdots\\ \mathbf{s}_{\overline{n}(\mathbf{x})-2}(\mathbf{x})\end{pmatrix}-\mathbf{d}(\Delta(\mathbf{x}))\right],

(8.6)

where $\mathbf{D}(\Delta(\mathbf{x}))$ is the sub-matrix of $\mathbf{C}(\Delta(\mathbf{x}))$ formed from rows $1,2,...,\overline{n}(\mathbf{x})-2$ and columns $2,3,...,\overline{n}(\mathbf{x})-1$ and $\mathbf{d}(\Delta(\mathbf{x}))$ is the sub-matrix of $\mathbf{C}(\Delta(\mathbf{x}))$ formed from rows $1,2,...,\overline{n}(\mathbf{x})-2$ and column $\overline{n}(\mathbf{x})$ . Injecting (8.6) into the right hand side of the penultimate equation of (8.5) yields

\mathbf{s}_{\overline{n}(\mathbf{x})-1}(\mathbf{x})=\begin{cases}-\frac{\overline{n}(\mathbf{x})(1-\Delta(\mathbf{x}))}{\overline{n}(\mathbf{x})-1}&\overline{n}(\mathbf{x})=2\\ -\frac{\sum_{j=2}^{\overline{n}(\mathbf{x})-2}j(1-\Delta(\mathbf{x}))^{j-2}\mathbf{s}_{j}(\mathbf{x})}{(\overline{n}(\mathbf{x})-1)(1-\Delta(\mathbf{x}))^{\overline{n}(\mathbf{x})-3}}-\frac{\overline{n}(\mathbf{x})(1-\Delta(\mathbf{x}))}{\overline{n}(\mathbf{x})-1}&\overline{n}(\mathbf{x})\geq 3\end{cases}

Since $\Delta(\mathbf{x})\in(0,1)$ and the entries of $\mathbf{s}(\mathbf{x})$ are non-negative, the right hand side is strictly negative whilst the left hand side is non-negative. Hence we cannot have $\Delta(\mathbf{x})<1$ . So $\Delta(\mathbf{x})=1$ and $\rho$ and $\mathbf{q}(\mathbf{x})$ are identified.

Provided that there is variation in the elements of $\mathbf{x}$ , the conditional moment

\displaystyle\mathbb{E}[\overline{y}_{i}|n_{g}=n,\mathbf{x}_{g}=\mathbf{x}]

\displaystyle=\overline{x}_{i}\varphi(n,\mathbf{x})

(8.7)

identifies $\varphi(n,\mathbf{x})$ for all such $(\mathbf{x},n\geq 2)$ in the support of $(\mathbf{x}_{g},n_{g})$ (it is not identified when $n=1$ nor when $\mathbf{x}=c\iota_{n}$ for some constant $c$ because in these cases $\overline{y}_{i}=\overline{x}_{i}=0$ ). The rest of the proof restricts attention to the case in which there is variation in the elements of $\mathbf{x}$ .

Define $\mathbf{r}_{n}(\mathbf{x})=\mathbb{P}[n_{g}=n|\mathbf{x}_{g}=\mathbf{x}]$ and $\mathbf{r}(\mathbf{x})=(\mathbf{r}_{n}(\mathbf{x}))_{n=1,...,\overline{n}(\mathbf{x})}$ . Since $\mathbf{q}(\mathbf{x})$ and $\rho$ are identified, $\mathbf{r}(\mathbf{x})$ is also identified using

\mathbf{r}(\mathbf{x})=(\iota_{\overline{n}(\mathbf{x})}^{\prime}\mathbf{A}(\mathbf{x},\rho)\mathbf{q}(\mathbf{x}))\mathbf{p}(\mathbf{x}).

Now denote $\mathcal{Q}(\mathbf{x})=\{m\in\{2,...,\overline{n}(\mathbf{x})\}:\mathbf{q}_{m}(\mathbf{x})>0\}$ , the largest element of which is $\overline{n}(\mathbf{x})$ . Moreover, since $\rho>0$ , the support of $\mathbf{r}(\mathbf{x})$ is $\mathcal{R}(\mathbf{x})=\{1,...,\overline{n}(\mathbf{x})\}$ if $\rho\in(0,1)$ and $\mathcal{R}(\mathbf{x})=\mathcal{Q}(\mathbf{x})$ if $\rho=1$ . Let $\widehat{\boldsymbol{\pi}}(\mathbf{x})=(\pi(m))_{m\in\mathcal{Q}(\mathbf{x})}$ , $\widehat{\mathbf{q}}(\mathbf{x})=(\mathbf{q}_{m}(\mathbf{x}))_{m\in\mathcal{Q}(\mathbf{x})}$ , $\widehat{\mathbf{r}}(\mathbf{x})=(\mathbf{r}_{n}(\mathbf{x}))_{n\in\mathcal{R}(\mathbf{x})\backslash 1}$ and $\widehat{\mathbf{A}}(\mathbf{x},\rho)$ be the sub-matrix of $\mathbf{A}(\mathbf{x},\rho)$ formed from rows $\mathcal{R}(\mathbf{x})\backslash 1$ and columns $\mathcal{Q}(\mathbf{x})$ . Then we have the equations

\displaystyle(\varphi(n,\mathbf{x}))_{n\in\mathcal{R}(\mathbf{x})\backslash 1}={\rm diag}(\widehat{\mathbf{r}}(\mathbf{x}))^{-1}\widehat{\mathbf{A}}(\mathbf{x},\rho){\rm diag}(\widehat{\mathbf{q}}(\mathbf{x}))\widehat{\boldsymbol{\pi}}(\mathbf{x}),

(8.8)

for which the left hand side is identified. Since $\mathbf{A}(\mathbf{x},\rho)$ has full rank, $\widehat{\mathbf{A}}(\mathbf{x},\rho)$ has rank $|\mathcal{Q}(\mathbf{x})|$ . This is because, when $\rho=1$ , we have $\mathcal{R}(\mathbf{x})=\mathcal{Q}(\mathbf{x})$ and $\widehat{\mathbf{A}}(\mathbf{x},\rho)=\mathbf{I}_{|\mathcal{Q}(\mathbf{x})|}$ . When $\rho\in(0,1)$ , $\mathcal{R}=\{1,...,\overline{n}(\mathbf{x})\}$ , hence $\widehat{\mathbf{A}}(\mathbf{x},\rho)$ comprises rows $\{2,...,\overline{n}(\mathbf{x})\}$ and a subset of columns $\{2,...,\overline{n}(\mathbf{x})\}$ of $\mathbf{A}(\mathbf{x},\rho)$ . The sub-matrix of $\mathbf{A}(\mathbf{x},\rho)$ with rows $\{2,...,\overline{n}(\mathbf{x})\}$ and columns $\{2,...,\overline{n}(\mathbf{x})\}$ has determinant $\rho^{T(\overline{n}(\mathbf{x})-1)}>0$ , which implies that $\widehat{\mathbf{A}}(\mathbf{x},\rho)$ has rank $|\mathcal{Q}(\mathbf{x})|$ . Moreover, ${\rm diag}(\widehat{\mathbf{q}}(\mathbf{x}))$ is invertible by construction, hence (8.8) identifies $\widehat{\boldsymbol{\pi}}(\mathbf{x})$ for all $\mathbf{x}$ in the support of $\mathbf{x}_{g}$ .

By assumption there are at least three different group sizes in the support of $n_{g_{0}}$ , so $|\cup_{\mathbf{x}}\mathcal{Q}(\mathbf{x})|\geq 3$ . This implies that $\pi(n_{1}),\pi(n_{2}),\pi(n_{3})$ are identified for at least three different group sizes. We conclude by applying the well known result established by Lee (2007), Davezies et al. (2009) and Bramoullé et al. (2009) that $(\gamma,\delta,\beta)$ are identified when there are at least three different group sizes and $\gamma\beta+\delta\neq 0$ .

Proof.

of Proposition 2

Fix $\mathbf{x}_{g_{2}}=\mathbf{x}$ where $\mathbf{x}$ is some point in its support. Provided that there is variation in the elements of $\mathbf{x}$ , the conditional moment

\displaystyle\mathbb{E}[\overline{y}_{i}|n_{g_{1}}=n_{1},n_{g_{2}}=n_{2},\mathbf{x}_{g_{2}}=\mathbf{x}]

\displaystyle=\overline{x}_{i}\varphi^{\prime}(n_{1},n_{2},\mathbf{x})

(8.9)

identifies $\varphi^{\prime}(n_{1},n_{2},\mathbf{x})$ for all such $(\mathbf{x},n_{1}\geq 2,n_{2})$ in the support of $(\mathbf{x}_{g_{2}},n_{g_{1}},n_{g_{2}})$ (it is not identified when $n_{1}=1$ nor when $\mathbf{x}=c\iota_{n_{2}}$ for some constant $c$ because in these cases $\overline{y}_{i}=\overline{x}_{i}=0$ ). We obtain

\displaystyle\varphi^{\prime}(n_{1},n_{2},\mathbf{x})=\psi\pi(n_{1})+(1-\psi)\pi(n_{2})

(8.10)

We now consider the cases $\psi=0$ , $\psi=1$ and $\psi\in(0,1)$ separately. If $\psi=0$ then $(\gamma,\delta,\beta)$ are identified because $\gamma\beta+\delta\neq 0$ and $n_{g_{2}}$ has at least three points in the support of its distribution. If $\psi=1$ then $(\gamma,\delta,\beta)$ are identified because $\gamma\beta+\delta\neq 0$ and $n_{g_{1}}$ has at least three points in the support of its distribution. The rest of the proof now considers $\psi\in(0,1)$ .

By standard arguments for two-way bi-partite fixed effects models (e.g., Abowd et al. (1999)), equation (8.10) identifies $\mu(n_{1})=\psi\pi(n_{1})+c$ and $\mu(n_{2})=(1-\psi)\pi(n_{2})-c$ for every pair $(n_{1},n_{2})$ corresponding to two vertices in a connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ , where $c$ is an unknown constant. Let the vertices in the connected component respectively correspond to $\mathcal{N}_{1}$ and $\mathcal{N}_{2}$ , where $\mathcal{N}_{1}$ is a subset of the support of $n_{g_{1}}$ and $\mathcal{N}_{2}$ is a subset of the support of $n_{g_{2}}$ .

By assumption either $|\mathcal{N}_{1}|\geq 3$ or $|\mathcal{N}_{2}|\geq 3$ . Without loss of generality, we present the case of $|\mathcal{N}_{1}|\geq 3$ . If the converse holds, the argument is symmetric, switching the roles of the group sizes in the remainder of the proof. Since the component is connected, we have $2\leq n_{1,1}<n_{1,2}<n_{1,3}$ and $2\leq n_{2}$ such that $\{n_{1,1},n_{1,2},n_{1,3}\}\subseteq\mathcal{N}_{1}$ and $\{n_{2}\}\subseteq\mathcal{N}_{2}$ . Moreover, $\mu(n)$ is identified for $n\in\{n_{1,1},n_{1,2},n_{1,3},n_{2}\}$ . For $i\in\{1,2,3\}$ , denote

\displaystyle\Delta_{i}=\mu(n_{1,i})+\mu(n_{2})=\pi(n_{2})+\frac{\psi(\gamma\beta+\delta)(n_{1,i}-n_{2})}{(n_{1,i}-1+\beta)(n_{2}-1+\beta)}

(8.11)

Then after some algebra, for $k\in\{1,2,3\}$ we have

\displaystyle\Delta_{i}-\Delta_{k}=\frac{\psi(\gamma\beta+\delta)(n_{1,i}-n_{1,k})}{(n_{1,i}-1+\beta)(n_{1,k}-1+\beta)}

(8.12)

Now since $\psi\neq 0$ and $\delta+\beta\gamma\neq 0$ ,

\displaystyle\Delta\triangleq\frac{\Delta_{1}-\Delta_{2}}{\Delta_{1}-\Delta_{3}}=\frac{(n_{1,1}-n_{1,2})(n_{1,3}-1+\beta)}{(n_{1,1}-n_{1,3})(n_{1,2}-1+\beta)}

(8.13)

Since $\Delta$ is identified, $\beta$ is identified by (8.13) if $\Delta(n_{1,1}-n_{1,3})\neq n_{1,1}-n_{1,2}$ . Suppose that $\Delta(n_{1,1}-n_{1,3})=n_{1,1}-n_{1,2}$ . Then (8.13) implies $n_{1,2}=n_{1,3}$ , but here we have $n_{1,2}<n_{1,3}$ , a contradiction. So $\beta$ is identified. Since $\beta$ is identified, $\psi(\gamma\beta+\delta)$ is identified from (8.12), hence $\gamma-\delta/(n_{2}-1)$ is identified using (8.11). Now, since the support of $n_{g_{2}}$ has at least three elements, we also identify $\Delta^{\prime}=\mu(n_{1})+\mu(n_{2}^{\prime})$ for some $n_{1}$ in the support of $n_{g_{1}}$ and some $n_{2}^{\prime}\neq n_{2}$ in the support of $n_{g_{2}}$ . Note that $n_{1}$ and $n_{2}$ may be in a different connected component of $\mathcal{G}_{n_{g_{1}},n_{g_{2}}}$ . Since $\beta$ and $\psi(\gamma\beta+\delta)$ are identified, $\Delta^{\prime}$ identifies $\gamma-\delta/(n_{2}^{\prime}-1)$ . Since, $\gamma-\delta/(n_{2}-1)$ and $\gamma-\delta/(n_{2}^{\prime}-1)$ are identified and $n_{2}\neq n_{2}^{\prime}$ , we identify $\gamma$ and $\delta$ .

IV estimation for missing data with known group sizes

In stacked form, the structural equation is

\displaystyle\overline{\mathbf{y}}_{g}=\beta\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{y}}_{g}+\delta\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{x}}_{g}+\gamma\overline{\mathbf{x}}_{g}+\overline{\boldsymbol{\epsilon}}_{g},

(8.14)

where $\overline{\mathbf{y}}_{g}=\mathbf{W}_{g}\mathbf{y}_{g}$ , $\mathbf{W}_{g}$ is the $n_{g}\times n_{g}$ within-specified-group transformation with diagonal elements $1-n_{g}^{-1}$ and off-diagonal elements $-n_{g}^{-1}$ and $\mathbf{G}_{g}(n_{g_{0}})$ is $n_{g}\times n_{g}$ with diagonal elements equal to zero and off-diagonal elements equal to $(n_{g_{0}}-1)^{-1}$ . We can also rewrite the reduced form (equation (4.1)) in stacked form as

\displaystyle\overline{\mathbf{y}}_{g}=(\mathbf{I}_{n_{g}}-\beta\mathbf{G}_{g}(n_{g_{0}}))^{-1}(\gamma\mathbf{I}_{n_{g}}+\delta\mathbf{G}_{g}(n_{g_{0}}))\overline{\mathbf{x}}_{g}+(\mathbf{I}_{n_{g}}-\beta\mathbf{G}_{g}(n_{g_{0}}))^{-1}\overline{\boldsymbol{\epsilon}}_{g},

(8.15)

where $\mathbf{I}_{n}$ is the identity of dimension $n$ , hence we have

\displaystyle\mathbb{E}[\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{y}}_{g}|n_{g},n_{g_{0}},\mathbf{x}_{g}]

\displaystyle=\gamma\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{x}}_{g}+(\gamma\beta+\delta)\left(\sum_{j=2}^{\infty}\beta^{j-2}\mathbf{G}_{g}(n_{g_{0}})^{j}\right)\overline{\mathbf{x}}_{g}.

(8.16)

Using (8.16), it is clear that $\mathbf{G}_{g}(n_{g_{0}})^{2}\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})^{3}\overline{\mathbf{x}}_{g},...$ are instruments for $\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{y}}_{g}$ , hence we can estimate (8.14) by linear GMM using $\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})^{2}\overline{\mathbf{x}}_{g},...$ as instruments for $\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{x}}_{g},\mathbf{G}_{g}(n_{g_{0}})\overline{\mathbf{y}}_{g}$ (Bramoullé et al., 2009).

\marginsize

3cm3cm0.5cm0.5cm

Table 1: Monte Carlo Results: Missing data

Contextual effect only ( $\beta=0$ imposed), $N\approx M=8000$
Model		Miss-specified (M)		Known (K)		Unknown (U)		Unknown-P (U-P)
$\rho$		Mean	RMSE	Mean	RMSE	Mean	RMSE	Mean	RMSE
1	$\delta$	0.502	0.084	0.502	0.084	0.502	0.084	0.502	0.084
1	$\gamma$	1.002	0.06	1.002	0.06	1.002	0.06	1.002	0.06
0.9	$\delta$	0.426	0.112	0.501	0.086	0.499	0.099	0.499	0.099
0.9	$\gamma$	0.971	0.07	1.002	0.061	1.001	0.07	1.001	0.07
0.7	$\delta$	0.328	0.198	0.504	0.097	0.502	0.153	0.5	0.151
0.7	$\gamma$	0.931	0.106	1.002	0.067	1.001	0.101	1	0.1
0.5	$\delta$	0.264	0.266	0.498	0.112	0.518	0.257	0.504	0.238
0.5	$\gamma$	0.904	0.144	0.998	0.077	1.007	0.158	1	0.151
0.3	$\delta$	0.225	0.332	0.505	0.146	0.896	1.602	0.54	0.468
0.3	$\gamma$	0.892	0.204	1.002	0.096	1.262	2.825	1.017	0.279
Contextual effect only ( $\beta=0$ imposed), $N\approx M=1600$
1	$\delta$	0.498	0.184	0.498	0.184	0.498	0.184	0.498	0.184
1	$\gamma$	0.999	0.132	0.999	0.132	0.999	0.132	0.999	0.132
0.9	$\delta$	0.425	0.201	0.498	0.188	0.5	0.22	0.499	0.219
0.9	$\gamma$	0.973	0.142	1.003	0.134	1.003	0.153	1.003	0.152
0.7	$\delta$	0.325	0.277	0.502	0.217	0.511	0.347	0.501	0.332
0.7	$\gamma$	0.926	0.189	1	0.148	1.002	0.224	0.997	0.219
0.5	$\delta$	0.261	0.353	0.495	0.253	0.693	1.303	0.52	0.538
0.5	$\gamma$	0.903	0.251	0.998	0.175	1.083	0.649	1.006	0.336
Contextual and endogenous effects, $N\approx M=8000$
1	$\beta$	0.095	0.772	0.095	0.772	0.095	0.772	0.095	0.772
	$\delta$	0.471	0.245	0.471	0.245	0.471	0.245	0.471	0.245
	$\gamma$	1.02	0.182	1.02	0.182	1.02	0.182	1.02	0.182
0.9	$\beta$	0.092	0.822	0.076	0.788	0.04	0.81	0.04	0.81
	$\delta$	0.394	0.325	0.479	0.246	0.485	0.263	0.485	0.263
	$\gamma$	0.989	0.174	1.018	0.187	1.008	0.193	1.008	0.193
0.7	$\beta$	0.133	0.899	0.069	0.795	0.048	0.875	0.052	0.876
	$\delta$	0.269	0.48	0.483	0.253	0.481	0.322	0.478	0.322
	$\gamma$	0.952	0.171	1.017	0.194	1.008	0.218	1.008	0.217
0.5	$\beta$	0.122	0.928	0.052	0.825	0.067	0.914	0.07	0.919
	$\delta$	0.205	0.577	0.484	0.272	0.485	0.413	0.475	0.404
	$\gamma$	0.923	0.188	1.01	0.202	1.015	0.269	1.012	0.262
0.3	$\beta$	0.077	0.952	0.059	0.851	0.066	0.91	0.098	0.926
	$\delta$	0.19	0.647	0.489	0.301	0.734	1.463	0.5	0.651
	$\gamma$	0.907	0.246	1.015	0.213	1.21	2.77	1.032	0.404
Contextual and endogenous effects, $N\approx M=1600$
1	$\beta$	0.066	0.888	0.066	0.888	0.066	0.888	0.066	0.888
	$\delta$	0.481	0.343	0.481	0.343	0.481	0.343	0.481	0.343
	$\gamma$	1.015	0.245	1.015	0.245	1.015	0.245	1.015	0.245
0.9	$\beta$	0.129	0.922	0.095	0.915	0.103	0.915	0.104	0.915
	$\delta$	0.387	0.425	0.48	0.35	0.477	0.394	0.476	0.394
	$\gamma$	1.004	0.246	1.029	0.26	1.031	0.276	1.031	0.275
0.7	$\beta$	0.035	0.95	0.034	0.906	0.006	0.943	0.009	0.943
	$\delta$	0.32	0.527	0.495	0.377	0.515	0.499	0.504	0.487
	$\gamma$	0.939	0.255	1.009	0.263	1.008	0.335	1.004	0.326
0.5	$\beta$	0.031	0.965	0.03	0.915	0.022	0.95	0.023	0.949
	$\delta$	0.249	0.643	0.487	0.424	0.599	1.263	0.488	0.707
	$\gamma$	0.911	0.305	1.007	0.284	1.051	0.658	1	0.432

Notes: Based on 1000 replications and reported to 3 decimal places. ‘Miss-specified’ is the miss-specified model which uses the NLS estimator obtained by treating the specified groups as if they were the groups (i.e., based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g})$ ). ‘Known’ is the model in which the group size is known which uses the NLS estimator under Assumption 2 (i.e., based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{0}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{0}})$ ). ‘Unknown’ is the model in which the group size is unknown which uses the GMM estimator under Assumption 3. ‘Unonwn-P’ is a modification of U which additionally uses the parametric restriction $n_{g_{0}}-2\sim{\rm Binomial}(2,\omega)$ to construct the GMM estimator. Models Unknown and Unknown-P are based on the upper bound on the support of $n_{g_{0}}$ being known to be $\overline{n}=4$ . Results are presented for $(\beta,\delta,\gamma)$ only. Their true values are $(0,0.5,1)$ .

Table 2: Monte Carlo Results: Uncertain groups

Contextual effect only ( $\beta=0$ imposed), $N=8000$
Model		Rooms (R)		Floors (F)		Known (K)		Uncertain (U)
$\psi$		Mean	RMSE	Mean	RMSE	Mean	RMSE	Mean	RMSE
0.8	$\delta$	0.41	0.123	0.501	0.135	0.496	0.066	0.516	0.131
0.8	$\gamma$	0.992	0.062	1.001	0.034	0.999	0.04	1.002	0.062
0.6	$\delta$	0.321	0.198	0.498	0.135	0.5	0.061	0.504	0.142
0.6	$\gamma$	0.983	0.064	1	0.032	1	0.033	1.001	0.063
0.4	$\delta$	0.232	0.281	0.497	0.133	0.5	0.064	0.501	0.145
0.4	$\gamma$	0.973	0.067	1	0.031	1	0.028	1.001	0.063
0.2	$\delta$	0.144	0.366	0.499	0.129	0.497	0.08	0.503	0.145
0.2	$\gamma$	0.964	0.071	1	0.028	1	0.026	1.004	0.058
Contextual effect only ( $\beta=0$ imposed), $N=1600$
0.8	$\delta$	0.402	0.208	0.494	0.333	0.495	0.14	0.564	0.29
0.8	$\gamma$	0.983	0.132	0.997	0.077	0.997	0.088	1	0.135
0.6	$\delta$	0.313	0.263	0.503	0.327	0.493	0.137	0.538	0.31
0.6	$\gamma$	0.975	0.134	1.001	0.075	0.996	0.073	0.999	0.135
0.4	$\delta$	0.226	0.331	0.505	0.319	0.492	0.146	0.516	0.332
0.4	$\gamma$	0.966	0.136	1	0.071	0.997	0.063	0.999	0.134
0.2	$\delta$	0.134	0.409	0.507	0.308	0.499	0.174	0.503	0.357
0.2	$\gamma$	0.955	0.139	1.001	0.065	0.999	0.058	1.006	0.13
Contextual and endogenous effects, $N=8000$
0.8	$\beta$	0.041	0.814	0.127	0.683	0.054	0.46	0.055	0.799
	$\delta$	0.392	0.36	0.429	0.384	0.466	0.214	0.492	0.302
	$\gamma$	0.998	0.165	1.007	0.051	1.001	0.064	1.01	0.167
0.6	$\beta$	0.039	0.848	0.112	0.663	0.07	0.461	0.099	0.8
	$\delta$	0.301	0.495	0.431	0.385	0.462	0.227	0.458	0.354
	$\gamma$	0.989	0.142	1.004	0.049	1.004	0.049	1.013	0.143
0.4	$\beta$	0.042	0.873	0.115	0.679	0.072	0.491	0.106	0.769
	$\delta$	0.205	0.64	0.427	0.394	0.458	0.256	0.446	0.379
	$\gamma$	0.978	0.119	1.004	0.048	1.003	0.043	1.01	0.117
0.2	$\beta$	0.032	0.881	0.119	0.653	0.106	0.561	0.123	0.724
	$\delta$	0.123	0.772	0.422	0.391	0.436	0.303	0.433	0.393
	$\gamma$	0.969	0.099	1.004	0.044	1.004	0.042	1.011	0.093
Contextual and endogenous effects, $N=1600$
0.8	$\beta$	0.038	0.913	0.067	0.841	0.097	0.709	0.019	0.896
	$\delta$	0.393	0.449	0.465	0.61	0.444	0.343	0.559	0.428
	$\gamma$	0.995	0.228	1.001	0.098	1.002	0.123	1.005	0.239
0.6	$\beta$	0.024	0.927	0.088	0.833	0.065	0.697	0.051	0.9
	$\delta$	0.311	0.568	0.458	0.604	0.455	0.358	0.522	0.485
	$\gamma$	0.986	0.209	1.006	0.096	0.998	0.096	1.009	0.22
0.4	$\beta$	0.019	0.94	0.096	0.833	0.107	0.716	0.045	0.888
	$\delta$	0.223	0.7	0.457	0.579	0.435	0.386	0.493	0.55
	$\gamma$	0.974	0.192	1.006	0.091	1.001	0.081	1.005	0.198
0.2	$\beta$	-0.004	0.94	0.091	0.825	0.078	0.754	0.06	0.863
	$\delta$	0.147	0.826	0.456	0.57	0.451	0.429	0.467	0.606
	$\gamma$	0.962	0.177	1.004	0.085	1.001	0.075	1.011	0.176

Notes: Based on 1000 replications and reported to 3 decimal places. ‘Rooms’ is the model in which the room is treated as if it were the group with NLS estimator based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{1}})$ . ‘Floors’ is the model in which the floor is treated as if it were the group with NLS estimator based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{2}})$ . ‘Known’ is the model in which is it known whether the group is the room or the floor (i.e., based on the moment $\mathbb{E}[\overline{y}_{i}|n_{g_{0}},\mathbf{x}_{g}]=\overline{x}_{i}\pi(n_{g_{0}})$ ). ‘Uncertain’ is the model in which it is unknown whether the group is the room or the floor with NLS estimator based on $\mathbb{E}[\overline{y}_{i}|n_{g_{1}},n_{g_{2}},\mathbf{x}_{g}]=\overline{x}_{i}[\psi\pi(n_{g_{1}})+(1-\psi)\pi(n_{g_{2}})]$ . Results are presented for $(\beta,\delta,\gamma)$ only. Their true values are $(0,0.5,1)$ .

Table 3: Monte Carlo Results: Uncertain groups (FE)

Contextual effect only ( $\beta=0$ imposed), $N=8000$
Model		Rooms (R)		Floors (F)		Known (K)		Uncertain (U)
$\psi$		Mean	RMSE	Mean	RMSE	Mean	RMSE	Mean	RMSE
0.8	$\delta$	0.41	0.123	0.847	0.375	0.496	0.066	0.516	0.131
0.8	$\gamma$	0.992	0.062	1.3	0.303	0.999	0.04	1.002	0.062
0.6	$\delta$	0.321	0.198	0.756	0.292	0.5	0.061	0.504	0.142
0.6	$\gamma$	0.983	0.064	1.224	0.227	1	0.033	1.001	0.063
0.4	$\delta$	0.232	0.281	0.668	0.218	0.5	0.064	0.501	0.145
0.4	$\gamma$	0.973	0.067	1.149	0.153	1	0.028	1.001	0.063
0.2	$\delta$	0.144	0.366	0.584	0.156	0.497	0.08	0.503	0.145
0.2	$\gamma$	0.964	0.071	1.075	0.081	1	0.026	1.004	0.058
Contextual effect only ( $\beta=0$ imposed), $N=1600$
0.8	$\delta$	0.402	0.208	0.841	0.488	0.495	0.14	0.564	0.29
0.8	$\gamma$	0.983	0.132	1.297	0.309	0.997	0.088	1	0.135
0.6	$\delta$	0.313	0.263	0.762	0.43	0.493	0.137	0.538	0.31
0.6	$\gamma$	0.975	0.134	1.225	0.24	0.996	0.073	0.999	0.135
0.4	$\delta$	0.226	0.331	0.679	0.375	0.492	0.146	0.516	0.332
0.4	$\gamma$	0.966	0.136	1.151	0.169	0.997	0.063	0.999	0.134
0.2	$\delta$	0.134	0.409	0.596	0.328	0.499	0.174	0.503	0.357
0.2	$\gamma$	0.955	0.139	1.076	0.102	0.999	0.058	1.006	0.13
Contextual and endogenous effects, $N=8000$
0.8	$\beta$	0.041	0.814	0.224	0.558	0.054	0.46	0.055	0.799
	$\delta$	0.392	0.36	0.725	0.375	0.466	0.214	0.492	0.302
	$\gamma$	0.998	0.165	1.318	0.324	1.001	0.064	1.01	0.167
0.6	$\beta$	0.039	0.848	0.198	0.565	0.07	0.461	0.099	0.8
	$\delta$	0.301	0.495	0.645	0.347	0.462	0.227	0.458	0.354
	$\gamma$	0.989	0.142	1.237	0.244	1.004	0.049	1.013	0.143
0.4	$\beta$	0.042	0.873	0.17	0.602	0.072	0.491	0.106	0.769
	$\delta$	0.205	0.64	0.571	0.347	0.458	0.256	0.446	0.379
	$\gamma$	0.978	0.119	1.159	0.168	1.003	0.043	1.01	0.117
0.2	$\beta$	0.032	0.881	0.152	0.612	0.106	0.561	0.123	0.724
	$\delta$	0.123	0.772	0.492	0.36	0.436	0.303	0.433	0.393
	$\gamma$	0.969	0.099	1.082	0.094	1.004	0.042	1.011	0.093
Contextual and endogenous effects, $N=1600$
0.8	$\beta$	0.038	0.913	0.138	0.747	0.097	0.709	0.019	0.896
	$\delta$	0.393	0.449	0.785	0.62	0.444	0.343	0.559	0.428
	$\gamma$	0.995	0.228	1.31	0.331	1.002	0.123	1.005	0.239
0.6	$\beta$	0.024	0.927	0.129	0.772	0.065	0.697	0.051	0.9
	$\delta$	0.311	0.568	0.715	0.582	0.455	0.358	0.522	0.485
	$\gamma$	0.986	0.209	1.237	0.263	0.998	0.096	1.009	0.22
0.4	$\beta$	0.019	0.94	0.138	0.789	0.107	0.716	0.045	0.888
	$\delta$	0.223	0.7	0.617	0.565	0.435	0.386	0.493	0.55
	$\gamma$	0.974	0.192	1.161	0.191	1.001	0.081	1.005	0.198
0.2	$\beta$	-0.004	0.94	0.136	0.804	0.078	0.754	0.06	0.863
	$\delta$	0.147	0.826	0.521	0.563	0.451	0.429	0.467	0.606
	$\gamma$	0.962	0.177	1.083	0.123	1.001	0.075	1.011	0.176

Table 4: Summary statistics

	All firms			Firms with $n_{g_{0}}\leq 50$
	$N_{0}=8448,G_{0}=755$			$N_{0}=5599,G_{0}=732$
	Mean	Median	SD	Mean	Median	SD
Individual covariates:
Quit-to-exit (0/1)	0.036	0	0.187	0.035	0	0.185
Female (0/1)	0.450	0	0.498	0.454	0	0.498
Grad. degree (0/1)	0.376	0	0.484	0.322	0	0.467
Age (years)	36.493	34	10.145	36.713	34	10.812
Experience (years)	7.799	6	6.927	7.712	5	7.183
Own fee-weighted caseload (‘000 yuan)	20.130	3.050	39.325	20.758	3.798	38.478
Group F:
Peer fee-weighted caseload	20.130	17.888	16.101	20.758	16.812	19.210
Prop. high-ability	0.250	0.226	0.172	0.255	0.231	0.204
Prop. female	0.450	0.444	0.190	0.454	0.462	0.230
Mean age	36.493	36.493	5.063	36.713	35.800	6.073
Mean experience	7.799	7.532	3.374	7.712	7.000	4.004
Prop. grad. degree	0.376	0.402	0.210	0.322	0.324	0.229
Group F-C ( $age\leq 35$ ):
Peer fee-weighted caseload	19.532	14.562	18.519	20.221	14.634	21.684
Prop. high-ability	0.244	0.217	0.193	0.251	0.200	0.228
Prop. female	0.537	0.538	0.218	0.531	0.500	0.260
Peer experience	29.736	29.667	1.573	29.665	29.600	1.839
Peer experience	3.998	4.000	1.185	3.884	3.750	1.395
Prop. grad. degree	0.374	0.385	0.228	0.321	0.333	0.251
Group F-C ( $age>35$ ):
Peer fee-weighted caseload	20.514	18.430	18.558	20.957	15.902	22.556
Prop. high-ability peers	0.253	0.243	0.220	0.253	0.200	0.268
Prop. female	0.338	0.303	0.228	0.352	0.333	0.283
Peer age	45.190	43.108	5.804	46.301	44.818	6.985
Peer experience	12.763	12.500	4.526	13.002	12.429	5.49
Prop. grad. degree	0.388	0.438	0.260	0.330	0.286	0.301
Notes: Quit-to-exit is the dependent variable of interest, which takes a value of 1 if a lawyer quits the job and exits the local legal practice in 2016, 0 otherwise. Fee-weighted caseload measures lawyers’ ability in civil litigation in the prior three years (2013-2015), weighted by the litigation size. We consider two peer groups, F and F-C. Group F means all associate lawyers in the same law firm form the relevant group. Group F-C means lawyers in the same law firm of the same cohort –young or experienced– form the relevant group, where young (experienced) cohort is defined as below (above) 35. Group variables are calculated using a “leave-own-out" method, where high-ability is defined as being in the top quartile of fee-weighted caseload ( $>24.44$ ).

Table 5: Peer effects in lawyers’ quit-to-exit

*: significant at the 0.01 level, :0.05 level, *: 0.1 level. Notes: Each column represents a regression; quadratic terms in age and experience are included in the regression. All reported regressors are defined as in Table 4 and Section 7. Standard errors in parentheses are clustered by law firm. Peer group F means all lawyers in the same firm are peers. F-C means lawyers in the same firm of the same cohort (young or experienced) are peers. ‘U’ means there is group uncertainty between F-C and F, with probability $\psi$ of being the former.
Dep. var: Quit-to-exit (0/1)
	(1)	(2)	(3)	(4)	(5)	(6)	(7)
	All	All	All	All	$age\leq 35$	$age>35$	All
Group F:
Peer fee-weighted caseload	0.0001
	(0.00029)
Prop. high-ability			0.0544**
			(0.02201)
Prop. female	-0.0136		-0.0115
	(0.02893)		(0.02840)
Peer age	0.0042**		0.0046**
	(0.00180)		(0.00181)
Peer experience	-0.0052*		-0.0056*
	(0.00295)		(0.00295)
Prop. grad. degree	-0.0457*		-0.0423
	(0.02715)		(0.02704)
Group F-C:
Peer fee-weighted caseload		0.0003
		(0.00028)
Prop. high-ability				0.0451***	0.0767***	0.0211
				(0.01747)	(0.02878)	(0.02276)
Prop. female		-0.0163		-0.0142	-0.0250	-0.0090
		(0.02166)		(0.02178)	(0.03430)	(0.02576)
Peer age		0.0008		0.0010	0.0008	0.0007
		(0.00198)		(0.00196)	(0.00524)	(0.00213)
Peer experience		-0.0001		-0.0004	-0.0016	-0.0002
		(0.00179)		(0.00178)	(0.00582)	(0.00197)
Prop. grad. degree		-0.0365*		-0.0347*	-0.0376	-0.0279
		(0.01895)		(0.01875)	(0.02865)	(0.02301)
Group U:
Prop. high-ability							0.0581**
							(0.0285)
Prop. female							-0.0130
							(0.0285)
Peer age							0.0021
							(0.0024)
Peer experience							-0.0015
							(0.0025)
Prop. grad. degree							-0.0432*
							(0.0242)
$\psi$							0.5618
							(0.6090)
Fee-weighted caseload	-0.0003***	-0.0003***	-0.0003***	-0.0003***	-0.0003***	-0.0002***	-0.0003***
	(0.00005)	(0.00006)	(0.00005)	(0.00005)	(0.00007)	(0.00008)	(0.0001)
Female	0.0146***	0.0128**	0.0148***	0.0129**	0.0159*	0.0086	0.0134**
	(0.00555)	(0.00602)	(0.00552)	(0.00602)	(0.00860)	(0.00700)	(0.0061)
Age	-0.0035**	-0.0044**	-0.0035**	-0.0043**	0.0086	-0.0001	-0.0043**
	(0.00143)	(0.00215)	(0.00143)	(0.00215)	(0.01618)	(0.00300)	(0.0021)
Experience	0.0016	0.0020	0.0016	0.0020	0.0063	0.0008	0.0019
	(0.00143)	(0.00141)	(0.00142)	(0.00140)	(0.00419)	(0.00150)	(0.0014)
Grad. degree	-0.0050	-0.0066	-0.0048	-0.0065	-0.0133*	0.0042	-0.0065
	(0.00552)	(0.00558)	(0.00551)	(0.00556)	(0.00689)	(0.00893)	(0.0055)
$N_{0}$	8448	8137	8448	8137	4622	3515	8137
$G_{0}$	755	1359	755	1359	691	669	1359
FE	F	F-C	F	F-C	F-C	F-C	F-C

Table 6: Peer effects in lawyers’ quit-to-exit: Sub-sample of firms with

n_{g_{0}}\leq 50

*: significant at the 0.01 level, :0.05 level, *: 0.1 level. Notes: Each column represents a regression; quadratic terms in age and experience are included in the regression. All reported regressors are defined as in Table 4 and Section 7. Standard errors in parentheses are clustered by law firm. Peer group F means all lawyers in the same firm are peers. F-C means lawyers in the same firm of the same cohort (young or experienced) are peers. ‘U’ means there is group uncertainty between F-C and F, with probability $\psi$ of being the former.
Dep. var: Quit-to-exit (0/1)
	(1)	(2)	(3)	(4)	(5)	(6)	(7)
	All	All	All	All	$age\leq 35$	$age>35$	All
Group F:
Peer fee-weighted caseload	0.0001
	(0.00031)
Prop. high-ability			0.0573**
			(0.02318)
Prop. female	-0.0150		-0.0132
	(0.03082)		(0.03001)
Peer age	0.0046**		0.0050**
	(0.00193)		(0.00193)
Peer experience	-0.0059*		-0.0062*
	(0.00330)		(0.00331)
Prop. grad. degree	-0.0249		-0.0217
	(0.03011)		(0.03014)
Group F-C:
Peer fee-weighted caseload		0.0004
		(0.00033)
Prop. high-ability				0.0496***	0.0760**	0.0288
				(0.01835)	(0.03012)	(0.02333)
Prop. female		-0.0165		-0.0149	-0.0091	-0.0374
		(0.02279)		(0.02288)	(0.03562)	(0.03034)
Peer age		0.0007		0.0009	-0.0023	0.0009
		(0.00210)		(0.00203)	(0.00495)	(0.00230)
Peer experience		0.0001		-0.0001	0.0034	-0.0006
		(0.00202)		(0.00199)	(0.00570)	(0.00215)
Prop. grad. degree		-0.0201		-0.0188	-0.0106	-0.0181
		(0.02235)		(0.02236)	(0.03298)	(0.02634)
Group U:
Prop. high-ability							0.0643*
							(0.0379)
Prop. female							-0.0084
							( 0.0308)
Peer age							-0.0005
							(0.0025)
Peer experience							0.0020
							(0.0033)
Prop. grad. degree							-0.0505
							(0.0372)
$\psi$							0.6732
							(0.8046)
Fee-weighted caseload	-0.0003***	-0.0002**	-0.0002***	-0.0002***	-0.0002**	-0.0001	-0.0002***
	(0.00008)	(0.00010)	(0.00007)	(0.00007)	(0.00011)	(0.00012)	(0.0001)
Female	0.0137**	0.0120*	0.0138**	0.0120*	0.0219**	-0.0075	0.0112*
	(0.00691)	(0.00726)	(0.00683)	(0.00725)	(0.00976)	(0.01162)	(0.0066)
Age	-0.0024	-0.0032	-0.0024	-0.0031	0.0159	0.0043	-0.0023
	(0.00177)	(0.00219)	(0.00177)	(0.00219)	(0.01952)	(0.00314)	(0.0021)
Experience	0.0005	0.0010	0.0005	0.0010	0.0033	-0.0016	0.0010
	(0.00143)	(0.00148)	(0.00142)	(0.00146)	(0.00560)	(0.00175)	(0.0014)
Grad. degree	0.0027	0.0011	0.0029	0.0011	-0.0004	0.0096	-0.0013
	(0.00821)	(0.00926)	(0.00822)	(0.00928)	(0.01109)	(0.01343)	( 0.0087)
$N_{0}$	5599	5288	5599	5288	3083	2205	5288
$G_{0}$	732	1313	732	1313	668	645	1313
FE	F	F-C	F	F-C	F-C	F-C	F-C

Table 7: Peer effects in lawyers’ quit-to-exit: Robustness

Dep. var: Quit-to-exit (0/1)
	(1)	(2)	(3)	(4)	(5)	(6)
	$<55$ subjects only		All law firms in		Use number of
			Shanghai		high-ability peers
Group F:
Prop. high-ability	0.0573**		0.0733***
	(0.02474)		(0.02238)
Number of high-ability					0.0202***
					(0.00503)
Group F-C:
Prop. high-ability		0.0509***		0.0421**
		(0.01946)		(0.01656)
Number of high-ability						0.0200***
						(0.00545)
$N_{0}$	7909	7605	13613	13205	8448	8137
$G_{0}$	639	1243	1061	1919	755	1359
FE	F	F-C	F	F-C	F	F-C
*: significant at the 0.01 level, :0.05 level, *: 0.1 level. Notes: Each column represents a regression. Odd-numbered models are estimated based on model (3) of Table 5 (group F), and even-numbered models are estimated based on model (4) of Table 5 (group F-C). Models (1)-(2) focus on subjects below age 55. Models (3)-(4) add an interaction term of prop. high-ability and an indicator identifying low-to-medium ability lawyers (caseload below median). Models (5)-(6) add an interaction term of prop. high-ability and an indicator for female.

Table 8: Peer effects in lawyers’ quit-to-exit: Heterogeneity

*: significant at the 0.01 level, :0.05 level, *: 0.1 level. Notes: Each column represents a regression. Odd-numbered models are estimated based on model (3) of Table 5, and even-numbered models are estimated based on model (4) of Table 5. Models (1)-(2) add an interaction term of prop. high-ability and an indicator identifying low-to-medium ability lawyers (caseload below median). Models (3)-(4) add an interaction term of prop. high-ability and an indicator for female. Models (5)-(6) add an interaction term of prop. high-ability and an indicator for small firms (firm size below median).
Dep. var: Quit-to-exit (0/1)
	(1)	(2)	(3)	(4)	(5)	(6)
	Interacts w/ low-		Interacts w/ female		Interacts w/ small
	to med-ability subjects				firms
Group F:
Prop. high-ability	0.0128		0.0693***		0.0642
	(0.02440)		(0.02656)		(0.04751)
Prop. high-ability $\times$	0.0698***
(Low- to med-ability)	(0.01693)
Prop. high-ability $\times$			-0.0310
Female			(0.02863)
Prop. high-ability $\times$ Female					-0.0109
Small firms					(0.05194)
Group F-C:
Prop. high-ability		0.0064		0.0539***		0.0431*
		(0.02008)		(0.02071)		(0.02235)
Prop. high-ability $\times$		0.0740***
(Low- to med-ability)		(0.01824)
Prop. high-ability $\times$				-0.0118
Female				(0.02707)
Prop. high-ability $\times$						0.0110
Small firms						(0.03312)
$N_{0}$	7909	7605	8448	8137	8448	8137
$G_{0}$	639	1243	755	1359	755	1359
FE	F	F-C	F	F-C	F	F-C