Identification and Estimation of a Semiparametric Logit Model using Network Data

Brice Romuald Gueyap Kounga¹

¹ Department of Economics, University of Western Ontario. E-mail: [email protected].

Abstract

This paper studies the identification and estimation of a semiparametric binary network model in which the unobserved social characteristic is endogenous, that is, the unobserved individual characteristic influences both the binary outcome of interest and how links are formed within the network. The exact functional form of the latent social characteristic is not known. The proposed estimators are based on matching pairs of agents whose network formation distributions are the same. We establish the consistency of the estimators and derive their asymptotic distribution. Their finite sample properties are assessed in a Monte Carlo simulation. We conclude this study with an empirical application.

1 Introduction

The way we interact and communicate with people in our social groups (e.g., friends, family, colleagues) has a substantial impact on how we behave. We often adapt our behaviors based on social cues and feedback. Factors such as personal opinions, consumer preferences, decisions to invest, and even the inclination towards illicit economic activities are considerably influenced by social networks of friends and acquaintances. For example, the choice of college major or the decision to switch majors might be influenced by the attitudes and expectations of a student’s friends, neighborhood, and family (Jackson et al., 2017; Sacerdote, 2001; Pu et al., 2021; Feld and Zölitz, 2022). These social influences can play a significant role in an individual’s choice or decision. It is therefore important to account for them in economic models: if they are ignored, they can distort or confound the estimated effect of the variable being studied. In the previous example, if a researcher studies the effect of a new orientation program on students’ choice of college major without considering social influences, the results might be misleading. The true effect of the orientation program might be masked or exaggerated by the omitted influence of peer attitudes (AbdulRaheem et al., 2017; Rusli et al., 2021).

In practice, these social influences are not observed by the researcher, making it hard to include them as covariates in the model. In the example above, even if the researcher controls for observable individual characteristics such as gender, age, race, and parents’ education, the model is likely to omit factors that influence both students’ choice of friends and their choice of college major (e.g., family expectations, effort, motivation, psychological disorders, or unreported substance use). In the literature, a common approach to addressing this problem is to collect network data under the assumption that the latent social influence is revealed by linking behavior in the network (Blume et al., 2011; Graham, 2015; Auerbach, 2022). That is, when a researcher observes a link between two agents, these agents often have similar social characteristics. Therefore, network data can potentially be used to account for unobserved social influences.

In the setting of a binary response model where a correlation arises between regressors and errors due to an omitted vector of unobserved social characteristics, this paper seeks to address two primary research questions. First, how can binary response models with endogenous networks be effectively identified? And second, how can the observed data on network links be utilized to mitigate the effects of such endogeneity? The endogeneity in this framework is due to unobservable individual characteristics that influence both link formation in the network and the binary outcome of interest. To the best of my knowledge, this is the first paper that addresses these questions specifically for binary models. Given the endogenous peer group formation, we show that we can identify the peer effects by controlling the unobserved individual heterogeneity of the network formation model. To this end, this paper specifies a joint semiparametric binary regression and nonparametric model of network formation where the unobserved social characteristics determine both the social influence in the regression and links in the network. The binary regression model is assumed to have an unknown function of the unobserved index of social characteristics as a covariate, and the probability that two agents link is an unknown function of their unobserved social characteristics.

The remainder of the paper is organized as follows. Section 2 reviews the literature. Section 3 formally presents the model, including the identification and estimation of the parameters of interest. Section 4 presents the results of Monte Carlo simulations. Section 5 concludes.

2 Literature Review

Many decisions made by agents are influenced by the behaviors and characteristics of other agents. Valuable insights about risky behaviors or emerging technologies are often acquired from friends, as evidenced by Manski (2004), Christakis and Fowler (2008), and Banerjee et al. (2013). There is a vast literature analyzing social peer effects. Betts and Morell (1999) find that the characteristics of the high school peer group affect undergraduate grade point averages. Case and Katz (1991) find large peer effects on youth criminal behavior and drug use. A rich literature on neighborhood effects, including Jencks et al. (1990), Rosenbaum (1991), and Katz et al. (2001), shows that neighborhood peers can have profound effects on both adults and children.

Despite the extensive research available on the subject, the identification of social interactions remains problematic because of two well-known issues: endogeneity, due either to peers’ self-selection or to common group effects, and reflection, a particular case of simultaneity (Manski, 1993; Moffitt et al., 2001; Soetevent, 2006). Some authors explore models with endogenous networks (Goldsmith-Pinkham and Imbens, 2013; Hsieh and Lee, 2016; Arduini et al., 2015). However, these models impose parametric restrictions on the network formation model in order to identify and estimate the parameters of interest. Consequently, the reliability of their estimators is tied to the validity of these restrictions, which might not adequately reflect the diverse linking patterns seen in many real-world networks.

Auerbach (2022) and Johnsson and Moon (2021) provide identification conditions that do not require parametric restrictions on the network model. Auerbach (2022) shows that one can control for network endogeneity by pairwise differencing the observations of two agents whose network formation distributions are the same, and proposes a semiparametric estimator based on matching pairs of agents with similar columns of the squared adjacency matrix. This approach has provided direct inspiration for this research.

In spite of the abundant studies on the topic, these authors have not addressed the identification and estimation of semiparametric binary models using network data. This paper aims to bridge this gap in the literature.

3 Model

Consider a finite set of agents $I=\{1,2,\cdots,n\}$ , where each agent is identified by an observed vector of explanatory variables $X_{i}\in\mathbb{R}^{k}(k>0)$ , with a binary outcome $y_{i}\in\{0,1\}$ , and an unobserved index of social characteristics $\omega_{i}\in[0,1]$ . These variables are related by the following model.

y_{i}=\mathbbm{1}\big{\{}X_{i}\beta+\lambda(\omega_{i})-\varepsilon_{i}\geq 0\big{\}}

(1)

where $\varepsilon_{i}$ is an independent and identically distributed (i.i.d.) idiosyncratic error with cumulative distribution function (cdf) $F$ , and $\beta\in\mathbb{R}^{k}$ is an unknown slope parameter. The social influence function $\lambda:[0,1]\to\mathbb{R}$ is an unknown measurable function with $\lambda(\omega_{i})$ being the realized social influence for agent $i$ . The social influence term $\lambda(\omega_{i})$ is the direct effect of interacting with a particular collection of communities.

In addition, the researcher observes an $n\times n$ binary adjacency matrix $D$ . The element $D_{ij}=1$ if there is a direct link between agents $i$ and $j$ , and $D_{ij}=0$ otherwise. By convention, no agent is linked to itself, so $D_{ii}=0$ . We assume that all links are undirected, so that the adjacency matrix $D$ is symmetric, that is, $D_{ij}=D_{ji}$ . The existence of a link between agents $i$ and $j$ is determined by

D_{ij}=\mathbbm{1}\big{\{}f(\omega_{i},\omega_{j})\geq\eta_{ij}\big{\}}\mathbbm{1}(i\neq j)

(2)

where $f$ is a symmetric measurable function and $\left\{\eta_{ij}\right\}^{n}_{i,j=1}$ is a symmetric matrix of unobserved scalar disturbances whose entries above the diagonal are mutually independent and independent of $\{X_{i},\omega_{i},\varepsilon_{i}\}_{i=1}^{n}$ .

The unobserved individual index of social characteristics $\omega_{i}$ can be interpreted as the social capital that increases the likelihood of forming a link (e.g., socioeconomic status, social ability, family expectations, trust). The quantity $f(\omega_{i},\omega_{j})-\eta_{ij}$ is interpreted as the utility agents $i$ and $j$ receive from forming a link. This implies that $D_{ij}$ and $D_{st}$ are independent conditionally on $\omega_{i},\omega_{j},\omega_{s}$ and $\omega_{t}$ , i.e., the utility agents receive from forming a link does not depend on the existence of other links in the sample.

The following examples illustrate applications of the model to the literature.

Example 1 (Program Participation).

Banerjee et al. (2013) model household participation in a microfinance program in which information about the program diffuses over a social network. They measure participation using lending and trust. Their model could be generalized by adding $\omega_{i}$ as an index of individual trustworthiness, family expectations, and integrity in financial matters.

Example 2 (Peer Effects).

A model of this type can describe youth smoking behavior, where smokers may be more likely to form friendships with each other. Let $y_{i}=1$ if a student smokes and 0 if not, $X_{i}$ be a vector of student $i$ covariates (age, grade, gender, etc.), and $D_{ij}=1$ if students $i$ and $j$ are friends and 0 otherwise. The number of direct neighbors of student $i$ is $s_{1i}=\sum_{j\neq i}D_{ij}$ . An extension of the Menzel (2015) peer effects model sets $\lambda(\omega_{i})=\delta\frac{1}{s_{1i}}\sum_{j\neq i}D_{ij}y_{j}$ , which corresponds to the social influence term in our model.

Example 3 (Co-authorship).

Ductor et al. (2014) study how knowledge about the social network of a researcher, as embodied in his or her coauthor relations, helps in predicting his or her future productivity. In this setting, $\omega_{i}$ can be interpreted as some unobserved productivity trait that induces the researcher to have more coauthors and also to be more productive at writing papers.

3.1 Identification

The parameters of interest are $\beta$ and $\lambda(\omega_{i})$ ; the functions $\lambda$ and $f$ are treated as nuisance functions. The parameter $\beta$ is identified if $\lambda(\omega_{i})$ depends on $\omega_{i}$ only through the link function $f_{\omega_{i}}(\cdot)\equiv f(\omega_{i},\cdot):[0,1]\to[0,1]$ , which gives the conditional probability that an agent with social characteristic $\omega_{i}$ links with agents of every other social characteristic in $[0,1]$ . To compare two agents’ network types, we define the network distance between agents with social characteristics $\omega_{i}$ and $\omega_{j}$ as the root integrated squared difference in their link functions:

\rho_{ij}=||f_{\omega_{i}}-f_{\omega_{j}}||_{2}=\left[\int\left(f(\omega_{i},t)-f(\omega_{j},t)\right)^{2}dt\right]^{1/2}

(3)

That is, if the network distance between agents $i$ and $j$ equals zero ( $\rho_{ij}=0$ ), then no feature of the network distinguishes $\omega_{i}$ from $\omega_{j}$ . In this case, agents $i$ and $j$ are equally likely to form connections within any specific network configuration. Consequently, they have identical distributions of degrees, eigenvector centralities, and average peer characteristics, as well as of any other individual-specific statistic of the network representation $D$ .

Assumption 1.

(i) $(X_{i},\omega_{i},\varepsilon_{i})$ are i.i.d. for all $i=1,\cdots,n$ ; (ii) The random array $\{\eta_{ij}\}_{i,j=1}^{n}$ is symmetric and independent of $(X_{i},\omega_{i},\varepsilon_{i})$ with i.i.d. entries above the diagonal; (iii) $\omega_{i}$ and $\eta_{ij}$ have standard uniform marginals; (iv) $\varepsilon_{i}$ follows a logistic distribution; (v) The binary outcomes $\{y_{i}\}_{i=1}^{n}$ and the binary adjacency matrix $D$ are given respectively by equations (1) and (2); (vi) $\lambda$ and $f$ are Lebesgue-measurable with $f$ being symmetric in its arguments.

Assumption 1(i) implies that the observables $X_{i}$ and the unobservable individual characteristics $(\omega_{i},\varepsilon_{i})$ are randomly drawn. This is a standard assumption in the network literature. Assumption 1(ii) states that the link formation error $\eta_{ij}$ is independent of all other observables and unobservables in the model. This means that the dyad-specific unobservable shock $\eta_{ij}$ from the link formation process does not influence the binary outcomes $\{y_{i}\}_{i=1}^{n}$ . The endogeneity in this model takes the form of a dependence between $X_{i}$ and the unobserved error $\lambda(\omega_{i})-\varepsilon_{i}$ through $\omega_{i}$ . In Assumption 1(iii), $\omega_{i}$ and $\eta_{ij}$ are assumed to have standard uniform marginals without loss of generality, because their distributions cannot be separately identified from $f$ . Under this assumption, $f(\omega_{i},\omega_{j})$ is the probability that agents $i$ and $j$ are directly linked, $\mathbb{P}(D_{ij}=1|\omega_{i},\omega_{j})=f(\omega_{i},\omega_{j})$ , which implicitly assumes that $f:[0,1]^{2}\to[0,1]$ .

Assumption 2.

$\lambda$ satisfies $\mathbb{E}\left[(\lambda(\omega_{i})-\lambda(\omega_{j}))^{2}|\rho_{ij}=0\right]=0$ .

Assumption 2 states that agents with identical network types have identical social influences. In other words, if two agents interact with the same groups of people or have the same linking patterns, they are subject to the same social influence on opinions, behaviors, or attitudes. Under Assumption 2, $\rho_{ij}=0$ implies that $f_{\omega_{i}}(\cdot)=f_{\omega_{j}}(\cdot)$ and $\lambda(\omega_{i})=\lambda(\omega_{j})$ , but it does not imply that $\omega_{i}=\omega_{j}$ .
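To illustrate why $\rho_{ij}=0$ does not force $\omega_{i}=\omega_{j}$ , the sketch below approximates the network distance in (3) numerically for the stochastic blockmodel $f_{1}$ used in Section 4. The function names `f1` and `network_distance`, the midpoint-rule integration, and the grid size `m` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def f1(x, t):
    """Stochastic blockmodel link function f_1 from Section 4 (three blocks of width 1/3)."""
    x, t = np.broadcast_arrays(np.asarray(x, dtype=float), np.asarray(t, dtype=float))
    low = (x <= 1/3) & (t > 1/3)
    mid = (x > 1/3) & (x <= 2/3) & (t <= 2/3)
    high = (x > 2/3) & ((t > 2/3) | (t <= 1/3))
    return np.where(low | mid | high, 1/3, 0.0)

def network_distance(f, w_i, w_j, m=3000):
    """Midpoint-rule approximation of rho_ij = ||f_{w_i} - f_{w_j}||_2 on [0, 1]."""
    t = (np.arange(m) + 0.5) / m          # midpoints of m equal cells
    return np.sqrt(np.mean((f(w_i, t) - f(w_j, t)) ** 2))

# Two different characteristics in the same block share one link function.
print(network_distance(f1, 0.1, 0.2))
# Characteristics in different blocks are separated by a positive distance.
print(network_distance(f1, 0.1, 0.5))
```

Under this blockmodel, any two agents whose characteristics fall in the same block have distance zero even though their $\omega$ values differ, which is exactly the case Assumption 2 is designed to handle.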

Theorem 1.

Under Assumptions 1 and 2, $\beta$ and $\lambda(\omega_{i})$ are uniquely identified, with

\beta=\arg\min_{b\in\mathbb{R}^{k}}-\mathbb{E}\left[y_{i}\log F[(X_{i}-X_{j})b]+y_{j}\log F[(X_{j}-X_{i})b]\bigg{|}\rho_{ij}=0,y_{i}+y_{j}=1\right]

(4)

and

\lambda(\omega_{i})=\mathbb{E}\left[F^{-1}\big{(}\mathbb{P}(y_{i}=1|X_{i},f_{\omega_{i}})\big{)}-X_{i}{\beta}\big{|}f_{\omega_{i}}\right]

(5)

The proof of Theorem 1 is provided in the Appendix. The conditional expectation in (5) is defined as follows: for any arbitrary random matrix $\Pi_{i}$ indexed at the agent level, we have

\mathbb{E}\left[\Pi_{i}\left|f_{\omega_{i}}\right.\right]\equiv\mathbb{E}\left[\Pi_{i}\left|\omega_{i}\in\{u\in[0,1]:||f_{\omega_{i}}-f_{u}||_{2}=0\}\right.\right]

3.2 Estimation

This section discusses the estimation of the parameters of interest from Section 3.1. If the individual social characteristics $\{\omega_{i}\}_{i=1}^{n}$ were observed, we could use existing tools to estimate $\beta$ and $\lambda(\omega_{i})$ . The link formation model in (2) would not provide useful information in this case: the researcher would simply use pairs of observations for which $\omega_{i}$ is close to $\omega_{j}$ (Honoré and Powell, 1997; Aradillas-Lopez et al., 2007). However, since the individual social characteristics $\{\omega_{i}\}_{i=1}^{n}$ are not observed and the conditional expectations in Theorem 1 are also unknown, these estimators are not feasible.

In order to make estimation feasible, we exploit the theorem of Lovász (2012), which demonstrates that pairs of individuals with identical link functions have identical codegree functions. Following Auerbach (2022), the codegree function is defined by $p(\omega_{i},\omega_{j})=\int f_{\omega_{i}}(s)f_{\omega_{j}}(s)ds$ , which is the probability that agents $i$ and $j$ are both linked to a common third agent. Agent $i$ ’s codegree type is defined as $p_{\omega_{i}}(\cdot)\equiv p(\omega_{i},\cdot):[0,1]\to[0,1]$ . The pseudometric codegree distance between agents $i$ and $j$ is defined by:

\displaystyle\delta_{ij}=\|p_{\omega_{i}}-p_{\omega_{j}}\|_{2}=\left[\int\left(\int f(t,s)\left(f(\omega_{i},s)-f(\omega_{j},s)\right)ds\right)^{2}dt\right]^{1/2}

which can be consistently estimated² by the root average squared difference in the $i$ th and $j$ th columns of the squared adjacency matrix

\displaystyle\hat{\delta}_{ij}=\left[\frac{1}{n}\sum_{t=1}^{n}\left(\frac{1}{n}\sum_{s=1}^{n}D_{ts}\left(D_{is}-D_{js}\right)\right)^{2}\right]^{1/2}

² Auerbach (2022) shows in Lemma B1 that $\hat{\delta}_{ij}$ converges uniformly to $\delta_{ij}$ over the $\binom{n}{2}$ distinct pairs of agents as $n\to\infty$ .

The following Lemma shows the relationship between the codegree distance and the network distance.

Lemma 1.

If the link function $f$ satisfies assumption 1, then, for all $i,j\in\{1,\cdots,n\}$ , the distances $\rho_{ij}$ and $\delta_{ij}$ are equivalent.

The proof of this Lemma is provided in the Appendix. A direct consequence of Lemma 1 is that if the link function $f$ is continuous and not constant almost everywhere, then $\rho_{ij}=0\iff\delta_{ij}=0$ . Using this equivalence and the fact that $\delta_{ij}$ can be consistently estimated by $\hat{\delta}_{ij}$ , we can replace the conditioning event $\rho_{ij}=0$ with pairs for which $\hat{\delta}_{ij}$ is close to zero in order to consistently estimate the parameters of interest.
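The estimated codegree distance $\hat{\delta}_{ij}$ only requires the adjacency matrix. A minimal sketch is given below; the function name `codegree_distance` and the Gram-matrix shortcut used to avoid a triple loop are illustrative choices, not taken from the paper.

```python
import numpy as np

def codegree_distance(D):
    """Return the n x n matrix of estimated codegree distances delta_hat_ij.

    delta_hat_ij^2 = (1/n) * sum_t ( (1/n) * sum_s D_ts (D_is - D_js) )^2,
    i.e. the mean squared difference between columns i and j of D @ D / n.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    C = D @ D / n                      # C[t, i] = (1/n) sum_s D_ts D_is (D symmetric)
    sq = (C ** 2).sum(axis=0)          # squared norm of each column of C
    gram = C.T @ C                     # inner products between columns
    d2 = (sq[:, None] + sq[None, :] - 2.0 * gram) / n
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative rounding errors

# Path graph 0 - 1 - 2: agents 0 and 2 have identical columns in D @ D.
D = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(codegree_distance(D))
```

In the three-agent path graph, the two end agents are indistinguishable from their codegrees, so their estimated distance is zero, while each end agent is at a positive distance from the middle agent.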

Assumption 3.

The following statements about the kernel function $K$ and the bandwidth $h$ hold:

(i) $K$ is non-negative, bounded, differentiable with bounded derivative $K^{\prime}$ , and $\int K(u)du=1,\ \ \int\left|K(u)\right|du<\infty\ \ \mbox{ and }\ \ \int|K(u)||u|du<\infty$ ;

(ii) $h>0,\ \ h=o(1),\ \ h^{-1}=O(\sqrt{n})$ , and $n\mathbb{E}\left[K\left(\frac{\hat{\delta}^{2}_{ij}}{h}\right)\right]\to\infty$ .

Assumption 3(i) is the standard restriction of the kernel function $K$ . The first three restrictions in Assumption 3(ii) on the bandwidth sequence $h$ are also standard. The last one guarantees that as the sample size $n$ increases, the number of matches used to estimate $\beta$ also increases.

Under assumptions 1-3, $\beta$ is consistently estimated by

\hat{\beta}=\arg\min_{b\in\mathbb{R}^{k}}-\sum_{y_{i}\neq y_{j}}K\left(\frac{\hat{\delta}^{2}_{ij}}{h}\right)\Big{\{}y_{i}\ln F\left[(X_{i}-X_{j})^{\prime}b\right]+y_{j}\ln F\left[(X_{j}-X_{i})^{\prime}b\right]\Big{\}}

(6)

and $\lambda(\omega_{i})$ is consistently estimated by

\hat{\lambda}(\omega_{i})=\left[\sum_{j=1}^{n}K\left(\frac{\hat{\delta}^{2}_{ij}}{h}\right)\right]^{-1}\left[\sum_{j=1}^{n}\left(F^{-1}\left(\widehat{\mathbb{P}}\left(y_{j}=1\left|X_{j},f_{\omega_{j}}\right.\right)\right)-X_{j}^{\prime}\hat{\beta}\right)K\left(\frac{\hat{\delta}^{2}_{ij}}{h}\right)\right]

(7)

where

\widehat{\mathbb{P}}\left(y_{i}=1\left|X_{i},f_{\omega_{i}}\right.\right)=\left[\sum_{j=1}^{n}K\left(\frac{\hat{\delta}^{2}_{ij}}{h}\right)K\left(\frac{X_{j}-X_{i}}{h}\right)\right]^{-1}\left[\sum_{j=1}^{n}y_{j}K\left(\frac{\hat{\delta}^{2}_{ij}}{h}\right)K\left(\frac{X_{j}-X_{i}}{h}\right)\right]

is a consistent estimator for $\mathbb{P}\left(y_{i}=1\left|X_{i},f_{\omega_{i}}\right.\right)=\mathbb{E}\left[y_{i}\left|X_{i},f_{\omega_{i}}\right.\right]$ ; $K$ is a kernel function and $h$ is a bandwidth parameter depending on the sample size. The term $K\left(\frac{\hat{\delta}^{2}_{ij}}{h}\right)$ gives more weight to pairs of observations $(i,j)$ with similar codegree functions. The consistency and asymptotic normality of these estimators are developed in the next section.
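The two-step construction of $\hat{\lambda}(\omega_{i})$ can be sketched as follows for a scalar covariate. This sketch reads (7) as a kernel-weighted average over agents $j$ of $F^{-1}(\widehat{\mathbb{P}}_{j})-X_{j}\hat{\beta}$ ; that reading, the function names, and the clipping constant `eps` (which keeps the logit transform finite when a fitted probability hits 0 or 1) are assumptions made for illustration.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) 1(u^2 < 1)."""
    u = np.asarray(u, dtype=float)
    return 0.75 * (1.0 - u ** 2) * (u ** 2 < 1.0)

def lambda_hat(x, y, delta_hat, beta_hat, h, eps=1e-3):
    """Kernel estimate of lambda(omega_i) for every agent i (scalar covariate x).

    x, y       : arrays of length n
    delta_hat  : n x n matrix of estimated codegree distances
    beta_hat   : scalar first-step estimate of beta
    h          : bandwidth (the same h is used for both kernels, as in the text)
    eps        : clipping level for fitted probabilities (an added safeguard)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Kd = epanechnikov(np.asarray(delta_hat, dtype=float) ** 2 / h)
    Kx = epanechnikov((x[None, :] - x[:, None]) / h)   # Kx[i, j] = K((x_j - x_i) / h)
    w = Kd * Kx
    # Nadaraya-Watson estimate of P(y = 1 | X, f_omega); the j = i term keeps
    # the denominator strictly positive.
    p_hat = np.clip((w @ y) / w.sum(axis=1), eps, 1.0 - eps)
    logit = np.log(p_hat / (1.0 - p_hat))              # F^{-1} for the logistic cdf
    resid = logit - x * beta_hat
    return (Kd @ resid) / Kd.sum(axis=1)
```

The vector `delta_hat` would typically come from the squared adjacency matrix and `beta_hat` from the first-step estimator in (6); both are taken as given here.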

3.3 Consistency and Asymptotic Normality

Define

m(v_{i},v_{j},b)=-\mathbbm{1}(y_{i}\neq y_{j})\left\{y_{i}\ln F\left[(X_{i}-X_{j})^{\prime}b\right]+y_{j}\ln F\left[(X_{j}-X_{i})^{\prime}b\right]\right\}

and the objective function

\Omega_{n}(\hat{\delta},b)=\binom{n}{2}^{-1}\frac{1}{h}\sum_{i<j}K\left(\frac{\hat{\delta}^{2}_{ij}}{h}\right)m(v_{i},v_{j},b)

(8)

where $v_{i}=(y_{i},X_{i})$ .

The estimator $\hat{\beta}$ defined in equation (6) minimizes the objective function specified in equation (8). In the following two subsections, we provide conditions under which this estimator is consistent and asymptotically normal.
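The minimization behind $\hat{\beta}$ is a kernel-weighted conditional logit over discordant pairs ( $y_{i}\neq y_{j}$ ). A minimal sketch is given below. The damped Newton iteration, the function name `pairwise_logit`, and the small ridge term added to the Hessian are implementation assumptions; the paper does not specify an optimizer. The weight matrix `W` is meant to hold $K(\hat{\delta}^{2}_{ij}/h)$ .

```python
import numpy as np

def pairwise_logit(X, y, W, n_iter=50, tol=1e-10):
    """Minimize -sum_{i<j, y_i != y_j} W_ij * log F(z_ij' b) over b,
    where z_ij = X_i - X_j if y_i = 1 and z_ij = X_j - X_i if y_j = 1."""
    y = np.asarray(y)
    W = np.asarray(W, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    i, j = np.triu_indices(len(y), k=1)
    keep = (y[i] != y[j]) & (W[i, j] > 0)       # discordant pairs with positive weight
    i, j = i[keep], j[keep]
    sign = np.where(y[i] == 1, 1.0, -1.0)
    Z = sign[:, None] * (X[i] - X[j])
    w = W[i, j]

    def objective(b):
        return np.sum(w * np.logaddexp(0.0, -Z @ b))   # -log F(u) = log(1 + e^{-u})

    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 0.5 * (1.0 + np.tanh(0.5 * (Z @ b)))       # logistic cdf F(Z b)
        grad = -Z.T @ (w * (1.0 - p))
        hess = (Z * (w * p * (1.0 - p))[:, None]).T @ Z
        step = np.linalg.solve(hess + 1e-10 * np.eye(len(b)), grad)
        t, f0 = 1.0, objective(b)
        while objective(b - t * step) > f0 and t > 1e-8:   # backtracking
            t *= 0.5
        b = b - t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return b
```

The normalizing factors $\binom{n}{2}^{-1}$ and $1/h$ in (8) do not affect the minimizer, so they are omitted. Because the objective is convex in $b$ , Newton steps with backtracking are a natural choice, but any convex optimizer could be substituted.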

3.3.1 Consistency

To prove that our estimator $\hat{\beta}$ is consistent, we rely on the following assumptions and on theorems found in Newey and McFadden (1994).

Let us define the following function

l(x,y,b)=\mathbb{E}\left[m(v_{i},v_{j},b)\left|v_{i}=x,\lambda(\omega_{j})=y\right.\right].

Note that

l(x,\lambda(\omega_{i}),b)=\mathbb{E}\left[m(v_{i},v_{j},b)\left|v_{i}=x,\delta_{ij}=0\right.\right].

Assumption 4.

The following conditions hold:

1. $\mathbb{E}\left[m(v_{i},v_{j},b)^{2}\right]<\infty$ ;

2. The function $l(\cdot)$ defined above exists and is a continuous function of each of its arguments;

3. For all $b$ , $|l(x,y,b)|\leq t(x,b)$ with $\mathbb{E}\left[t(v_{i},b)\right]<\infty$ .

Theorem 2.

Suppose that Assumptions 1-4 hold and that the parameter space for $b$ is compact and includes the true value of $\beta$ . Then the estimator $\hat{\beta}$ of $\beta$ defined in (6) and the estimator $\hat{\lambda}(\omega_{i})$ of $\lambda(\omega_{i})$ defined in (7) are consistent.

The proof of Theorem 2 is provided in the Appendix.

3.3.2 Asymptotic Normality

Assumption 5.

The following conditions hold:

• $\mathbb{E}\left(||X_{i}||^{2}\right)<\infty$ ;

• Conditional on $\delta_{ij}=0$ , $(X_{i}-X_{j})$ has full rank.

Theorem 3.

Under Assumptions 1-5, the estimator $\hat{\beta}$ of $\beta$ defined in (6) is asymptotically normal with

\sqrt{n}\left(\hat{\beta}-\beta\right)\longrightarrow_{d}\mathcal{N}\left(0,4\Sigma^{-1}V\Sigma^{-1}\right)

with

V=Var(t(y_{i},x_{i},w_{i}))

where

t(y_{i},x_{i},w_{i})=\mathbb{E}\left[\mathbbm{1}(y_{i}\neq y_{j})\left\{y_{i}-F\left((X_{i}-X_{j})^{\prime}\beta\right)\right\}(X_{i}-X_{j})\left|y_{i},x_{i},\delta_{ij}=0\right.\right]

and

\begin{aligned}
\Sigma&=\mathbb{E}\left[\Delta_{\beta}(t(y_{i},x_{i},w_{i}))\right]\\
&=\mathbb{E}\left[\mathbb{E}\left[\mathbbm{1}(y_{i}\neq y_{j})F\left((X_{i}-X_{j})^{\prime}\beta\right)F\left((X_{j}-X_{i})^{\prime}\beta\right)(X_{i}-X_{j})(X_{i}-X_{j})^{\prime}\left|y_{i},x_{i},\delta_{ij}=0\right.\right]\right]
\end{aligned}

4 Simulation

We present in this section the results of the simulation using the estimators developed in the previous section. We generate simulated data using the model described in section 3 with $\beta=1$ and $\lambda(w)=1.5w^{2}+\log(w)$ . The estimators will be assessed using three different linking functions: the stochastic blockmodel $f_{1}$ , the beta model $f_{2}$ , and the homophily model $f_{3}$ defined below:

f_{1}(x,y)=\begin{cases}1/3&\mbox{ if }x\leq 1/3\mbox{ and }y>1/3\\ 1/3&\mbox{ if }1/3<x\leq 2/3\mbox{ and }y\leq 2/3\\ 1/3&\mbox{ if }x>2/3\mbox{ and }(y>2/3\mbox{ or }y\leq 1/3)\\ 0&\mbox{ otherwise }\end{cases}

f_{2}(x,y)=\frac{\exp(x+y)}{1+\exp(x+y)}\ \ \ \mbox{ and }\ \ \ f_{3}(x,y)=1-(x-y)^{2}

These link functions are used to generate the adjacency matrix $D$ according to (2). We run 500 simulations for each sample size $n=$ 50, 100, 200, and 300. In each of the 500 simulations, we draw a random sample of $n$ observations $\{\xi_{i}\}_{i=1}^{n}$ from a standard univariate normal distribution, $\{\varepsilon_{i}\}_{i=1}^{n}$ from a standard logistic distribution, and $\{\omega_{i}\}_{i=1}^{n}$ and the lower diagonal entries of the symmetric matrix $\{\eta_{ij}\}_{i,j=1}^{n}$ from a standard uniform distribution. We generate dependence between $x_{i}$ and $\omega_{i}$ by setting $x_{i}=\xi_{i}+\sqrt{\omega_{i}}$ . We use the Epanechnikov kernel $K(x)=\frac{3}{4}(1-x^{2})\mathbbm{1}(x^{2}<1)$ and the bandwidth sequence $h=n^{-1/9}/10$ .
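One draw from this design can be sketched as follows. The function names (`f2`, `f3`, `simulate`, `bandwidth`) and the use of NumPy's `default_rng` are illustrative assumptions; the distributions, the outcome equation, and the link rule follow the description above and equations (1) and (2).

```python
import numpy as np

def f2(x, y):
    """Beta model link function."""
    return np.exp(x + y) / (1.0 + np.exp(x + y))

def f3(x, y):
    """Homophily model link function."""
    return 1.0 - (x - y) ** 2

def bandwidth(n):
    """Bandwidth sequence h = n^{-1/9} / 10 used in the simulations."""
    return n ** (-1.0 / 9.0) / 10.0

def simulate(n, link, beta=1.0, seed=None):
    """Draw (x, y, D, omega) from the simulation design of Section 4."""
    rng = np.random.default_rng(seed)
    omega = rng.uniform(size=n)                  # unobserved social characteristic
    xi = rng.normal(size=n)
    eps = rng.logistic(size=n)
    x = xi + np.sqrt(omega)                      # covariate correlated with omega
    lam = 1.5 * omega ** 2 + np.log(omega)       # lambda(w) = 1.5 w^2 + log(w)
    y = (x * beta + lam - eps >= 0).astype(int)  # outcome equation (1)
    eta = np.triu(rng.uniform(size=(n, n)), k=1)
    eta = eta + eta.T                            # symmetric link shocks
    D = (link(omega[:, None], omega[None, :]) >= eta).astype(int)
    np.fill_diagonal(D, 0)                       # no self-links, as in (2)
    return x, y, D, omega

x, y, D, omega = simulate(100, f3, seed=0)
print(D.sum() // 2, "links;", y.mean(), "share of y = 1")
```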

The performance of our proposed estimators will be evaluated in comparison to the estimators of the following models:

• Naive Logit: $y_{i}=\mathbbm{1}\left\{\alpha+\beta_{1}x_{i}-\varepsilon_{i}\geq 0\right\}$

• Infeasible Logit: $y_{i}=\mathbbm{1}\left\{\alpha+\beta_{2}x_{i}+\theta\lambda(\omega_{i})-\varepsilon_{i}\geq 0\right\}$

• Logit with controls: $y_{i}=\mathbbm{1}\left\{\alpha+\beta_{3}x_{i}+\mu\sum_{j=1}^{d}D_{ij}x_{j}+\gamma\sum_{j=1}^{d}D_{ij}y_{j}-\varepsilon_{i}\geq 0\right\}$

For each model, link function, and sample size, we report the average bias over the 500 simulations.

Table 1 shows the results of the 500 simulations for the three link functions.

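Table 1: Simulation results

Link function	$n$	Proposed estimator $\|\hat{\beta}-1\|$	Naive Logit $\|\hat{\beta}_{1}-1\|$	Infeasible Logit $\|\hat{\beta}_{2}-1\|$	Logit with controls $\|\hat{\beta}_{3}-1\|$
Stochastic blockmodel ( $f_{1}$ )	50	0.1224	0.2039	0.1090	0.1472
Stochastic blockmodel ( $f_{1}$ )	100	0.1017	0.2023	0.0707	0.1743
Stochastic blockmodel ( $f_{1}$ )	200	0.0751	0.2346	0.0183	0.2181
Stochastic blockmodel ( $f_{1}$ )	300	0.0212	0.2466	0.0087	0.2292
Beta model ( $f_{2}$ )	50	0.1241	-	-	0.1453
Beta model ( $f_{2}$ )	100	0.1065	-	-	0.1689
Beta model ( $f_{2}$ )	200	0.0825	-	-	0.2112
Beta model ( $f_{2}$ )	300	0.0177	-	-	0.2292
Homophily model ( $f_{3}$ )	50	0.1386	-	-	0.1378
Homophily model ( $f_{3}$ )	100	0.1199	-	-	0.1358
Homophily model ( $f_{3}$ )	200	0.0990	-	-	0.1742
Homophily model ( $f_{3}$ )	300	0.0313	-	-	0.1998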

Table 1 reports the average bias across 500 simulations for the proposed estimator and the three benchmark models (Naive Logit, Infeasible Logit, Logit with controls) at different sample sizes under the stochastic blockmodel, the beta model, and the homophily model. The average bias of the proposed estimator falls as the sample size grows under all three link functions, from between 0.12 and 0.14 at $n=50$ to roughly 0.02 to 0.03 at $n=300$ . By contrast, the biases of the Naive Logit ( $\beta_{1}$ ) and of the Logit with controls ( $\beta_{3}$ ) do not shrink with the sample size: they stay between roughly 0.14 and 0.25 and are larger at $n=300$ than at $n=50$ , as expected when the social influence term is omitted. The Infeasible Logit estimator has the smallest bias because it treats $\lambda(\omega_{i})$ as observed. The bias of the proposed estimator shrinks regardless of the link function used, which supports the claim that the link function does not need to be known for the estimator to perform well.

5 Conclusion

This paper studies the identification and estimation of a semiparametric binary response model in which one covariate is an unknown function of an unobserved individual characteristic that influences both link formation in the network and the binary outcome. Estimation is based on matching pairs of agents with similar columns of the squared adjacency matrix. The proposed estimators are shown to be consistent, and the estimator of $\beta$ is asymptotically normal.

Appendix

Proof of Theorem 1.

We have,

\mathbb{P}(y_{i}=1|X_{i},\omega_{i})=F\left(X_{i}\beta+\lambda(\omega_{i})\right)

(9)

Let $\Delta_{ij}=\left\{X_{i},X_{j},||f_{\omega_{i}}-f_{\omega_{j}}||_{2}=0\right\}\equiv\left\{X_{i},X_{j},\lambda(\omega_{i})=\lambda(\omega_{j})\right\}$ by assumption 2.
The probability of $y_{i}=1$ conditional on $y_{i}+y_{j}=1$ is given by:

\begin{aligned}
\mathbb{P}(y_{i}=1|y_{i}+y_{j}=1,\Delta_{ij})&=\frac{\mathbb{P}(y_{i}=1,y_{j}=0|\Delta_{ij})}{\mathbb{P}(y_{i}=1,y_{j}=0|\Delta_{ij})+\mathbb{P}(y_{i}=0,y_{j}=1|\Delta_{ij})}\\
&=\frac{\mathbb{P}(y_{i}=1|\Delta_{ij})\mathbb{P}(y_{j}=0|\Delta_{ij})}{\mathbb{P}(y_{i}=1|\Delta_{ij})\mathbb{P}(y_{j}=0|\Delta_{ij})+\mathbb{P}(y_{i}=0|\Delta_{ij})\mathbb{P}(y_{j}=1|\Delta_{ij})}\\
&=\frac{F\left(X_{i}\beta+\lambda(\omega_{i})\right)[1-F\left(X_{j}\beta+\lambda(\omega_{i})\right)]}{F\left(X_{i}\beta+\lambda(\omega_{i})\right)[1-F\left(X_{j}\beta+\lambda(\omega_{i})\right)]+F\left(X_{j}\beta+\lambda(\omega_{i})\right)[1-F\left(X_{i}\beta+\lambda(\omega_{i})\right)]}
\end{aligned}

Since $F(x)=\frac{\exp(x)}{1+\exp(x)}$ , then $1-F(x)=\frac{1}{1+\exp(x)}$ .
Thus,

\begin{aligned}
\mathbb{P}(y_{i}=1|y_{i}+y_{j}=1,\Delta_{ij})&=\frac{\exp\left(X_{i}\beta+\lambda(\omega_{i})\right)}{\exp\left(X_{i}\beta+\lambda(\omega_{i})\right)+\exp\left(X_{j}\beta+\lambda(\omega_{i})\right)}\\
&=\frac{\exp\left((X_{i}-X_{j})\beta+\lambda(\omega_{i})-\lambda(\omega_{i})\right)}{1+\exp\left((X_{i}-X_{j})\beta+\lambda(\omega_{i})-\lambda(\omega_{i})\right)}\\
&=F\left((X_{i}-X_{j})\beta\right)
\end{aligned}

Hence,

\mathbb{P}(y_{i}=1|y_{i}+y_{j}=1,\Delta_{ij})=F\left((X_{i}-X_{j})\beta\right)
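The cancellation of the common social influence term can be checked numerically. The sketch below is only an illustration of the algebra above for scalar covariates; the function names and the particular numbers are arbitrary assumptions.

```python
import numpy as np

def F(u):
    """Logistic cdf."""
    return 1.0 / (1.0 + np.exp(-u))

def conditional_prob(x_i, x_j, beta, lam):
    """P(y_i = 1 | y_i + y_j = 1) when both agents share the same lambda."""
    p_i = F(x_i * beta + lam)
    p_j = F(x_j * beta + lam)
    return p_i * (1.0 - p_j) / (p_i * (1.0 - p_j) + (1.0 - p_i) * p_j)

# The conditional probability does not depend on the shared value of lambda.
for lam in (-2.0, 0.0, 0.7, 3.0):
    print(lam, conditional_prob(0.3, -1.2, 1.0, lam), F((0.3 - (-1.2)) * 1.0))
```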

Define $\Omega(b)=-\mathbb{E}\left[y_{i}\log F[(X_{i}-X_{j})b]+y_{j}\log F[(X_{j}-X_{i})b]\bigg{|}\rho_{ij}=0,y_{i}+y_{j}=1\right]$ .
We want to show that $\beta$ is the unique minimizer of $\Omega(b)$ . Under Assumptions 1 and 2, and by Jensen’s inequality, we have

\begin{aligned}
\Omega(\beta)-\Omega(b)&=\mathbb{E}\Bigg{[}\log\left\{\left(\frac{F[(X_{i}-X_{j})b]}{F[(X_{i}-X_{j})\beta]}\right)^{y_{i}}\left(\frac{F[(X_{j}-X_{i})b]}{F[(X_{j}-X_{i})\beta]}\right)^{y_{j}}\right\}\bigg{|}\rho_{ij}=0,y_{i}+y_{j}=1\Bigg{]}\\
&\leq\log\mathbb{E}\Bigg{[}\left(\frac{F[(X_{i}-X_{j})b]}{F[(X_{i}-X_{j})\beta]}\right)^{y_{i}}\left(\frac{F[(X_{j}-X_{i})b]}{F[(X_{j}-X_{i})\beta]}\right)^{y_{j}}\bigg{|}\rho_{ij}=0,y_{i}+y_{j}=1\Bigg{]}\\
&=\log\mathbb{E}\big{[}F((X_{i}-X_{j})b)+F((X_{j}-X_{i})b)\big{|}\rho_{ij}=0,y_{i}+y_{j}=1\big{]}\\
&=\log(1)=0
\end{aligned}

i.e. $\Omega(\beta)\leq\Omega(b)$ , for all $b$ .
Hence,

\beta=\arg\min_{b\in\mathbb{R}^{k}}-\mathbb{E}\left[y_{i}\log F[(X_{i}-X_{j})b]+y_{j}\log F[(X_{j}-X_{i})b]\bigg{|}\rho_{ij}=0,y_{i}+y_{j}=1\right].

Using (9), we can find $\lambda(\omega_{i})$ by inversion

\lambda(\omega_{i})=\mathbb{E}\left[F^{-1}\big{(}\mathbb{P}(y_{i}=1|X_{i},f_{\omega_{i}})\big{)}-X_{i}{\beta}\big{|}f_{\omega_{i}}\right]

The uniqueness of $\lambda(\omega_{i})$ for all $i$ follows from the uniqueness of $\beta.$ $\blacksquare$

Proof of Lemma 1.

We seek constants $a,b>0$ such that, for all $i,j\in\{1,\cdots,n\}$ , we have

a\delta_{ij}\leq\rho_{ij}\leq b\delta_{ij}

Firstly,

\begin{aligned}
\delta_{ij}^{2}&=\|p_{\omega_{i}}-p_{\omega_{j}}\|_{2}^{2}=\int\left(\int f(t,s)\left(f(\omega_{i},s)-f(\omega_{j},s)\right)ds\right)^{2}dt\\
&\leq\int\int\left[f(t,s)\left(f(\omega_{i},s)-f(\omega_{j},s)\right)\right]^{2}dsdt\\
&\leq\int\left(f(\omega_{i},s)-f(\omega_{j},s)\right)^{2}ds=\|f_{\omega_{i}}-f_{\omega_{j}}\|_{2}^{2}=\rho_{ij}^{2}
\end{aligned}

where the first inequality is obtained using Jensen’s inequality and the second inequality is obtained using the fact that $\sup_{x,y\in[0,1]}f(x,y)\leq 1$ .
Hence $\delta_{ij}\leq\rho_{ij}$ , so we can take $a=1.$
Secondly, let $i,j\in\{1,\cdots,n\}$ , $\epsilon>0$ and $t=\arg\max_{w\in\{\omega_{i},\omega_{j}\}}\left|\int f(w,s)\left(f(\omega_{i},s)-f(\omega_{j},s)\right)ds\right|$

\begin{aligned}
\mathbbm{1}\left(\rho_{ij}>\epsilon\right)&=\mathbbm{1}\left(\|f_{\omega_{i}}-f_{\omega_{j}}\|_{2}>\epsilon\right)=\mathbbm{1}\left(\int\left(f(\omega_{i},s)-f(\omega_{j},s)\right)^{2}ds>\epsilon^{2}\right)\\
&=\mathbbm{1}\left(\int f(\omega_{i},s)\left(f(\omega_{i},s)-f(\omega_{j},s)\right)ds-\int f(\omega_{j},s)\left(f(\omega_{i},s)-f(\omega_{j},s)\right)ds>\epsilon^{2}\right)\\
&\leq\mathbbm{1}\left(2\left|\int f(t,s)\left(f(\omega_{i},s)-f(\omega_{j},s)\right)ds\right|>\epsilon^{2}\right)\\
&=\mathbbm{1}\left(\left(\int f(t,s)\left(f(\omega_{i},s)-f(\omega_{j},s)\right)ds\right)^{2}>\frac{\epsilon^{4}}{4}\right)\\
&\leq\mathbbm{1}\left(\int\left(\int f(t,s)\left(f(\omega_{i},s)-f(\omega_{j},s)\right)ds\right)^{2}dt>\frac{\epsilon^{4}}{4}\right)\\
&=\mathbbm{1}\left(\|p_{\omega_{i}}-p_{\omega_{j}}\|_{2}>\frac{\epsilon^{2}}{2}\right)=\mathbbm{1}\left(\frac{2}{\epsilon}\delta_{ij}>\epsilon\right)
\end{aligned}

where the first inequality comes from the triangle inequality.
Hence, for any $\epsilon>0$, there exists a constant $b=\frac{2}{\epsilon}$ such that $\rho_{ij}\leq b\delta_{ij}$.
Hence, $\delta_{ij}\leq\rho_{ij}\leq b\delta_{ij}$ for any $i,j\in\{1,\cdots,n\}$ and $\epsilon>0.$ $\blacksquare$
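To illustrate why the equivalence between $\rho_{ij}$ and $\delta_{ij}$ is useful in practice, the sketch below computes a sample analogue of $\delta_{ij}^{2}$ directly from an adjacency matrix, in the spirit of the codegree distance of Auerbach (2022). The exact normalization, the inclusion of the terms $s,t\in\{i,j\}$, the two-type block design, and the function names (`codegree_distance_sq`, `simulate_block_network`) are assumptions made here for illustration and may differ from the estimator used in the paper.

```python
import random


def codegree_distance_sq(D, i, j):
    """Sample analogue of delta_ij^2: average over t of the squared
    average over s of D[t][s] * (D[i][s] - D[j][s])."""
    n = len(D)
    total = 0.0
    for t in range(n):
        inner = sum(D[t][s] * (D[i][s] - D[j][s]) for s in range(n))
        total += (inner / n) ** 2
    return total / n


def simulate_block_network(types, p_in, p_out, rng):
    """Undirected network in which the link probability depends only on
    whether the two agents share a type (a hypothetical two-block design)."""
    n = len(types)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if types[i] == types[j] else p_out
            if rng.random() < p:
                D[i][j] = D[j][i] = 1
    return D


rng = random.Random(1)
types = [k % 2 for k in range(80)]       # agents 0, 2, 4, ... share a type
D = simulate_block_network(types, p_in=0.8, p_out=0.2, rng=rng)
d_same = codegree_distance_sq(D, 0, 2)   # same link function, rho_ij = 0
d_diff = codegree_distance_sq(D, 0, 1)   # different link functions, rho_ij > 0
print(d_same < d_diff)
```

Agents with identical link functions have a distance close to zero, while agents with different link functions are clearly separated, which is what allows matching on $\delta_{ij}$ in place of the unobserved $\rho_{ij}$.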

Proof of Theorem 2.

The limiting objective function will be defined as follows

\Omega(\delta,b)\equiv\mathbb{E}\left[l(v_{i},\lambda(\omega_{i}),b)\right]=\mathbb{E}\left[m(v_{i},v_{j},b)\left|\delta_{ij}=0\right.\right]

Under Assumption 4(2), the limiting objective function is well defined.

We have,

	$\displaystyle\mathbb{E}[\Omega_{n}(\delta,b)]$	$\displaystyle=\mathbb{E}\left[\frac{1}{h}K\left(\frac{\delta^{2}_{ij}}{h}\right)m(v_{i},v_{j},b)\right]$
		$\displaystyle=\mathbb{E}\left[\mathbb{E}\left[\frac{1}{h}K\left(\frac{\lambda(\omega_{i})-\lambda(\omega_{j})}{h}\right)l(v_{i},\lambda(\omega_{j}),b)\left|v_{i},\omega_{i}\right.\right]\right]$
		$\displaystyle=\int\int K(u)l(v_{i},\lambda(\omega_{i})-hu,b)dudF(v_{i},\omega_{i})$
		$\displaystyle\longrightarrow\Omega(\delta,b)$

The first expectation of the right-hand side on the first equality exists because of Assumptions 4(1) and 3(i). The last result holds by dominated convergence.

Under Assumptions 3 and 4, we have

\mathbb{E}\left[\left\{\frac{1}{h}K\left(\frac{\delta^{2}_{ij}}{h}\right)m(v_{i},v_{j},b)\right\}^{2}\right]=O(n)

and by Lemma A.3 in Ahn and Powell, (1993) we have

\Omega_{n}(\delta,b)=\mathbb{E}[\Omega_{n}(\delta,b)]+o_{p}(1)

Hence,

\Omega_{n}(\delta,b)\longrightarrow\Omega(\delta,b)

Pointwise convergence of $\Omega_{n}(\hat{\delta},b)$ to $\Omega(\delta,b)$ is established using Assumptions 1-4 as follows:

	$\displaystyle\left|\Omega_{n}(\hat{\delta},b)-\Omega_{n}(\delta,b)\right|$	$\displaystyle=\left|\binom{n}{2}^{-1}\frac{1}{h}\sum_{i<j}\left[K\left(\frac{\hat{\delta}^{2}_{ij}}{h}\right)-K\left(\frac{\delta_{ij}^{2}}{h}\right)\right]m(v_{i},v_{j},b)\right|$
		$\displaystyle=\left|\binom{n}{2}^{-1}\frac{1}{h}\sum_{i<j}K^{\prime}(a^{*}_{ij})\left[\frac{\hat{\delta}^{2}_{ij}-\delta_{ij}^{2}}{h}\right]m(v_{i},v_{j},b)\right|$
		$\displaystyle\leq\binom{n}{2}^{-1}\frac{1}{h^{2}}\sum_{i<j}\left|K^{\prime}(a^{*}_{ij})\right|\left|\hat{\delta}^{2}_{ij}-\delta_{ij}^{2}\right|\left|m(v_{i},v_{j},b)\right|$
		$\displaystyle\leq\left|\hat{\delta}^{2}_{ij}-\delta^{2}_{ij}\right|\frac{C}{h^{2}}\binom{n}{2}^{-1}\sum_{i<j}\left|m(v_{i},v_{j},b)\right|$
		$\displaystyle=O_{p}\left(\frac{1}{h^{2}\sqrt{n}}\right)=o_{p}(1)$

where $a_{ij}^{*}$ lies on the line segment joining $\hat{\delta}_{ij}$ and $\delta_{ij}$ .
Hence, combining this result with the previous one gives

\Omega_{n}(\hat{\delta},b)\longrightarrow_{p}\Omega(\delta,b)\ \ \mbox{ for all }\ \ b.
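For concreteness, the feasible objective $\Omega_{n}(\hat{\delta},b)$ analysed above can be evaluated as in the following minimal sketch. It assumes that $m(v_{i},v_{j},b)$ is the negative conditional log-likelihood of a discordant pair and uses a biweight kernel, chosen here only because it has a bounded derivative; the kernel choice, the toy inputs, and the function names (`biweight_kernel`, `pair_loss`, `kernel_objective`) are assumptions for illustration.

```python
import math


def biweight_kernel(u):
    """K(u) = 15/16 * (1 - u^2)^2 on [-1, 1]; smooth with bounded derivative."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def log_logistic(u):
    """log F(u) for the logistic CDF, computed without overflow."""
    return -math.log1p(math.exp(-u)) if u >= 0 else u - math.log1p(math.exp(u))


def pair_loss(xi, xj, yi, yj, b):
    """Assumed m(v_i, v_j, b): negative conditional log-likelihood of a
    discordant pair, and zero when y_i + y_j != 1."""
    if yi + yj != 1:
        return 0.0
    dxb = sum((a - c) * bk for a, c, bk in zip(xi, xj, b))
    return -(yi * log_logistic(dxb) + yj * log_logistic(-dxb))


def kernel_objective(X, y, dist_sq, h, b, kernel=biweight_kernel):
    """Omega_n(delta_hat, b): average over pairs i < j of
    (1/h) * K(delta_hat_ij^2 / h) * m(v_i, v_j, b)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = kernel(dist_sq[i][j] / h) / h
            if w > 0.0:
                total += w * pair_loss(X[i], X[j], y[i], y[j], b)
    return total / (n * (n - 1) / 2)


# Toy inputs: agents 0 and 1 are perfectly matched, agent 2 is far from both.
X = [[1.0], [0.0], [2.0]]
y = [1, 0, 1]
dist_sq = [[0.0, 0.0, 5.0], [0.0, 0.0, 5.0], [5.0, 5.0, 0.0]]
value_at_zero = kernel_objective(X, y, dist_sq, h=1.0, b=[0.0])
value_at_one = kernel_objective(X, y, dist_sq, h=1.0, b=[1.0])
print(value_at_zero > value_at_one)
```

Only the matched pair receives positive kernel weight, so the objective reduces to a weighted conditional logit loss on that pair; minimizing `kernel_objective` over `b` with any convex optimizer then gives the estimate.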

Let us prove the uniform convergence in probability of $\Omega_{n}(\hat{\delta},b)$ to $\Omega(\hat{\delta},b)$ . We have

\displaystyle\left|\Omega_{n}(\hat{\delta},b_{1})-\Omega_{n}(\hat{\delta},b_{2})\right|\leq\left|\Omega_{n}({\delta},b_{1})-\Omega_{n}({\delta},b_{2})\right|+\left|[\Omega_{n}(\hat{\delta},b_{1})-\Omega_{n}(\hat{\delta},b_{2})]-[\Omega_{n}({\delta},b_{1})-\Omega_{n}({\delta},b_{2})]\right|

The first term of the right-hand side gives

	$\displaystyle\left|\Omega_{n}({\delta},b_{1})-\Omega_{n}({\delta},b_{2})\right|$	$\displaystyle=\left|\binom{n}{2}^{-1}\frac{1}{h}\sum_{i<j}K\left(\frac{\delta_{ij}^{2}}{h}\right)\left[m(v_{i},v_{j},b_{1})-m(v_{i},v_{j},b_{2})\right]\right|$
		$\displaystyle\leq\binom{n}{2}^{-1}\frac{1}{h}\sum_{i<j}K\left(\frac{\delta_{ij}^{2}}{h}\right)M_{ij}\left|b_{1}-b_{2}\right|^{\alpha}$
		$\displaystyle=O_{p}(1)\left|b_{1}-b_{2}\right|^{\alpha}$

where the inequality holds because $m(v_{i},v_{j},b)$ is convex in $b$.

The second term of the right-hand side gives

	$\displaystyle|(\Omega_{n}(\hat{\delta},b_{1})-\Omega_{n}(\hat{\delta},b_{2}))$	$\displaystyle-(\Omega_{n}({\delta},b_{1})-\Omega_{n}({\delta},b_{2}))|$
		$\displaystyle=\left|\binom{n}{2}^{-1}\frac{1}{h}\sum_{i<j}\left[K\left(\frac{\hat{\delta}^{2}_{ij}}{h}\right)-K\left(\frac{\delta_{ij}^{2}}{h}\right)\right]\left[m(v_{i},v_{j},b_{1})-m(v_{i},v_{j},b_{2})\right]\right|$
		$\displaystyle\leq\binom{n}{2}^{-1}\frac{1}{h^{2}}\sum_{i<j}\left|K^{\prime}(a^{*}_{ij})\right|\left|\hat{\delta}^{2}_{ij}-\delta_{ij}^{2}\right|\left|m(v_{i},v_{j},b_{1})-m(v_{i},v_{j},b_{2})\right|$
		$\displaystyle\leq\left|\hat{\delta}^{2}_{ij}-\delta^{2}_{ij}\right|\frac{C}{h^{2}}\binom{n}{2}^{-1}\sum_{i<j}M_{ij}\left|b_{1}-b_{2}\right|^{\alpha}$
		$\displaystyle=O_{p}(1)\left|b_{1}-b_{2}\right|^{\alpha}$

where the last inequality holds because $m(v_{i},v_{j},b)$ is convex in $b$.

Both results imply that

\left|\Omega_{n}(\hat{\delta},b_{1})-\Omega_{n}(\hat{\delta},b_{2})\right|\leq O_{p}(1)\left|b_{1}-b_{2}\right|^{\alpha}

Hence, under Lemma 2.9 in Newey and McFadden, (1994), we have

\sup_{b}\left|\Omega_{n}(\hat{\delta},b)-\Omega(\hat{\delta},b)\right|\longrightarrow_{p}0

It follows from Theorem 2.1 of Newey and McFadden, (1994) that $\hat{\beta}\longrightarrow_{p}\beta$ .

The consistency of $\hat{\lambda}(\omega_{i})$, for all $i$, follows from the facts that $\hat{\beta}\longrightarrow_{p}\beta$, $\hat{\delta}_{ij}\longrightarrow_{p}{\delta}_{ij}$ and $\widehat{\mathbb{P}}\left(y_{i}=1\left|X_{i},f_{\omega_{i}}\right.\right)\longrightarrow_{p}{\mathbb{P}}\left(y_{i}=1\left|X_{i},f_{\omega_{i}}\right.\right)$. $\blacksquare$

References

AbdulRaheem et al., (2017) AbdulRaheem, Y., Yusuf, H. T., and Odutayo, A. O. (2017). Effect of peer tutoring on students’ academic performance in economics in Ilorin South, Nigeria. Journal of Peer Learning, 10(1):95–102.
Ahn and Powell, (1993) Ahn, H. and Powell, J. L. (1993). Semiparametric estimation of censored selection models with a nonparametric selection mechanism. Journal of Econometrics, 58(1-2):3–29.
Aradillas-Lopez et al., (2007) Aradillas-Lopez, A., Honoré, B. E., and Powell, J. L. (2007). Pairwise difference estimation with nonparametric control variables. International Economic Review, 48(4):1119–1158.
Arduini et al., (2015) Arduini, T., Patacchini, E., and Rainone, E. (2015). Parametric and semiparametric IV estimation of network models with selectivity. Technical report, Einaudi Institute for Economics and Finance (EIEF).
Auerbach, (2022) Auerbach, E. (2022). Identification and estimation of a partially linear regression model using network data. Econometrica, 90(1):347–365.
Banerjee et al., (2013) Banerjee, A., Chandrasekhar, A. G., Duflo, E., and Jackson, M. O. (2013). The diffusion of microfinance. Science, 341(6144):1236498.
Betts and Morell, (1999) Betts, J. R. and Morell, D. (1999). The determinants of undergraduate grade point average: The relative importance of family background, high school resources, and peer group effects. Journal of Human Resources, pages 268–293.
Blume et al., (2011) Blume, L. E., Brock, W. A., Durlauf, S. N., and Ioannides, Y. M. (2011). Chapter 18 - identification of social interactions. volume 1 of Handbook of Social Economics, pages 853–964. North-Holland.
Case and Katz, (1991) Case, A. and Katz, L. F. (1991). The company you keep: The effects of family and neighborhood on disadvantaged youths.
Christakis and Fowler, (2008) Christakis, N. A. and Fowler, J. H. (2008). The collective dynamics of smoking in a large social network. New England Journal of Medicine, 358(21):2249–2258.
Ductor et al., (2014) Ductor, L., Fafchamps, M., Goyal, S., and Van der Leij, M. J. (2014). Social networks and research output. Review of Economics and Statistics, 96(5):936–948.
Feld and Zölitz, (2022) Feld, J. and Zölitz, U. (2022). The effect of higher-achieving peers on major choices and labor market outcomes. Journal of Economic Behavior & Organization, 196:200–219.
Goldsmith-Pinkham and Imbens, (2013) Goldsmith-Pinkham, P. and Imbens, G. W. (2013). Social networks and the identification of peer effects. Journal of Business & Economic Statistics, 31(3):253–264.
Graham, (2015) Graham, B. S. (2015). Methods of identification in social networks. Annu. Rev. Econ., 7(1):465–485.
Honoré and Powell, (1997) Honoré, B. E. and Powell, J. (1997). Pairwise difference estimators for nonlinear models. na.
Hsieh and Lee, (2016) Hsieh, C.-S. and Lee, L. F. (2016). A social interactions model with endogenous friendship formation and selectivity. Journal of Applied Econometrics, 31(2):301–319.
Jackson et al., (2017) Jackson, M. O., Rogers, B. W., and Zenou, Y. (2017). The economic consequences of social-network structure. Journal of Economic Literature, 55(1):49–95.
Jencks et al., (1990) Jencks, C., Mayer, S. E., et al. (1990). The social consequences of growing up in a poor neighborhood. Inner-city poverty in the United States, 111:186.
Johnsson and Moon, (2021) Johnsson, I. and Moon, H. R. (2021). Estimation of peer effects in endogenous social networks: Control function approach. Review of Economics and Statistics, 103(2):328–345.
Katz et al., (2001) Katz, L. F., Kling, J. R., and Liebman, J. B. (2001). Moving to opportunity in Boston: Early results of a randomized mobility experiment. The Quarterly Journal of Economics, 116(2):607–654.
Lovász, (2012) Lovász, L. (2012). Large networks and graph limits, volume 60. American Mathematical Soc.
Manski, (1993) Manski, C. F. (1993). Identification of endogenous social effects: The reflection problem. The Review of Economic Studies, 60(3):531–542.
Manski, (2004) Manski, C. F. (2004). Social learning from private experiences: the dynamics of the selection problem. The Review of Economic Studies, 71(2):443–458.
Menzel, (2015) Menzel, K. (2015). Strategic network formation with many agents. Technical report, Working papers, NYU.
Moffitt et al., (2001) Moffitt, R. A. et al. (2001). Policy interventions, low-level equilibria, and social interactions. Social dynamics, 4(45-82):6–17.
Newey and McFadden, (1994) Newey, W. K. and McFadden, D. (1994). Large sample estimation and hypothesis testing. Handbook of econometrics, 4:2111–2245.
Pu et al., (2021) Pu, S., Yan, Y., and Zhang, L. (2021). Do peers affect undergraduates’ decisions to switch majors? Educational Researcher, 50(8):516–526.
Rosenbaum, (1991) Rosenbaum, J. E. (1991). Black pioneers—do their moves to the suburbs increase economic opportunity for mothers and children? Housing Policy Debate, 2(4):1179–1213.
Rusli et al., (2021) Rusli, M., Degeng, N. S., Setyosari, P., and Sulton (2021). Peer teaching: Students teaching students to increase academic performance. Teaching Theology & Religion, 24(1):17–27.
Sacerdote, (2001) Sacerdote, B. (2001). Peer effects with random assignment: Results for Dartmouth roommates. The Quarterly Journal of Economics, 116(2):681–704.
Soetevent, (2006) Soetevent, A. R. (2006). Empirics of the identification of social interactions; an evaluation of the approaches and their results. Journal of Economic Surveys, 20(2):193–228.
