

A sparse $p_0$ model with covariates for directed networks

Qiuping Wang
Central China Normal University
Department of Statistics, Central China Normal University, Wuhan, 430079, China. Email: [email protected].
Abstract

We are concerned here with unrestricted maximum likelihood estimation in a sparse $p_0$ model with covariates for directed networks. The model has a density parameter $\nu$, a $2n$-dimensional node parameter $\boldsymbol{\eta}$ and a fixed-dimensional regression coefficient $\boldsymbol{\gamma}$ for covariates. Previous studies focus on restricted likelihood inference. As the number of nodes $n$ goes to infinity, we derive the $\ell_\infty$-error between the maximum likelihood estimator (MLE) $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$ and its true value $(\boldsymbol{\eta},\boldsymbol{\gamma})$: it is $O_p((\log n/n)^{1/2})$ for $\widehat{\boldsymbol{\eta}}$ and $O_p(\log n/n)$ for $\widehat{\boldsymbol{\gamma}}$, up to an additional factor. This explains the asymptotic bias phenomenon in the asymptotic normality of $\widehat{\boldsymbol{\gamma}}$ observed in Yan et al. (2019). Further, we derive the asymptotic normality of the MLE. Numerical studies and a data analysis demonstrate our theoretical findings.

Key words: Asymptotic normality; Consistency; Degree heterogeneity; Homophily; Maximum likelihood estimator.

1 Introduction

Degree heterogeneity, which characterizes the variation in node degrees, is a very common phenomenon in real-world networks. In directed networks, each node has an in-degree and an out-degree. Another distinct phenomenon is homophily: links tend to form between nodes sharing the same attributes, such as age and sex. As depicted in Yan et al. (2019), the directed friendship network between 71 lawyers studied in Lazega (2001) displays a strong homophily effect. Yan et al. (2019) incorporated covariates into the $p_0$ model of Yan et al. (2016a) to address homophily. However, network sparsity, an equally important feature of realistic networks, was not considered in their paper.

Here, we further incorporate a sparsity parameter into Yan et al.'s model. Consider a directed graph $\mathcal{G}_n$ on $n\geq 2$ nodes labeled $1,\ldots,n$. Let $a_{ij}\in\{0,1\}$ be an indicator of whether there is a directed edge from node $i$ to node $j$. That is, if there is a directed edge from $i$ to $j$, then $a_{ij}=1$; otherwise, $a_{ij}=0$. Our model postulates that the $a_{ij}$'s follow independent Bernoulli distributions such that a directed link exists from node $i$ to node $j$ with probability

P(a_{ij}=1)=\frac{\exp(\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma})}{1+\exp(\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma})}. (1)

In this model, the parameter $\nu$ measures network sparsity. For a node $i$, the incomingness parameter $\beta_i$ characterizes how attractive the node is, and the outgoingness parameter $\alpha_i$ illustrates the extent to which the node is attracted to others [Holland and Leinhardt (1981)]. $\boldsymbol{\gamma}$ is the regression coefficient of the covariates $Z_{ij}$. A larger $Z_{ij}^{\top}\boldsymbol{\gamma}$ implies a higher likelihood for nodes $i$ and $j$ to be connected. We call the above model the "covariate-$p_0$-model" hereafter.

The covariate $Z_{ij}$ is either a link-dependent vector or a function of node-specific covariates. If $X_i$ denotes a vector of node-level attributes, then these attributes can be used to construct a $p$-dimensional vector $Z_{ij}=g(X_i,X_j)$, where $g(\cdot,\cdot)$ is a function of its arguments. For instance, if we let $g(X_i,X_j)$ equal $\|X_i-X_j\|_1$, then it measures the dissimilarity between the features of nodes $i$ and $j$.
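To make the data-generating process concrete, the following is a minimal simulation sketch of model (1), building edge covariates via the $\ell_1$-distance construction above. The function name `simulate_network` and all parameter values are our own illustration, not from the paper.

```python
import numpy as np

def simulate_network(alpha, beta, gamma, X, nu=0.0, seed=0):
    """Draw an adjacency matrix from model (1), with edge covariates
    Z_ij = |X_i - X_j| (one possible choice of g)."""
    rng = np.random.default_rng(seed)
    n = alpha.shape[0]
    Z = np.abs(X[:, None, :] - X[None, :, :])        # (n, n, p) covariates
    logits = nu + alpha[:, None] + beta[None, :] + Z @ gamma
    prob = 1.0 / (1.0 + np.exp(-logits))
    A = (rng.random((n, n)) < prob).astype(int)
    np.fill_diagonal(A, 0)                           # no self-loops
    return A, Z

rng = np.random.default_rng(42)
n, p = 50, 2
alpha, beta = rng.normal(size=n) / 2, rng.normal(size=n) / 2
X, gamma = rng.normal(size=(n, p)), np.array([0.5, -0.5])
A, Z = simulate_network(alpha, beta, gamma, X, nu=-1.0)
print(A.sum(axis=1).sum() == A.sum(axis=0).sum())    # True
```

The final check illustrates the bi-degree identity noted in Section 2: the sum of out-degrees always equals the sum of in-degrees.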

Exploring asymptotic theory in the covariate-$p_0$-model is challenging [e.g. Fienberg (2012); Yan et al. (2019)]. Restricted maximum likelihood inference has been used to derive the consistency and asymptotic normality of the restricted maximum likelihood estimator in Yan et al. (2019). However, in practice one often solves the unrestricted maximum likelihood problem, since it is difficult to determine the range of the unknown parameters, especially when the dimension of the parameter space grows. Therefore, developing unrestricted likelihood inference is both more interesting and more practically relevant.

Motivated by techniques for the analysis of unrestricted likelihood inference in the sparse $\beta$-model with covariates for undirected graphs in Yan (2021), we generalize that approach to directed graphs here. As the number of nodes $n$ goes to infinity, we derive the $\ell_\infty$-error between the maximum likelihood estimator (MLE) $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$ and its true value $(\boldsymbol{\eta},\boldsymbol{\gamma})$ by using a two-stage Newton process that first finds the error bound between $\widehat{\boldsymbol{\eta}}_{\gamma}$ and $\boldsymbol{\eta}$ with a fixed $\boldsymbol{\gamma}$, and then derives the error bound between $\widehat{\boldsymbol{\gamma}}$ and $\boldsymbol{\gamma}$. The errors are $O_p((\log n/n)^{1/2})$ for $\widehat{\boldsymbol{\eta}}$ and $O_p(\log n/n)$ for $\widehat{\boldsymbol{\gamma}}$, up to an additional factor depending on the parameters. Further, we derive the asymptotic normality of the MLE. The asymptotic distribution of $\widehat{\boldsymbol{\gamma}}$ has a bias term while that of $\widehat{\boldsymbol{\eta}}$ does not, which corroborates the findings in Yan et al. (2019). This is a consequence of the different convergence rates of $\widehat{\boldsymbol{\gamma}}$ and $\widehat{\boldsymbol{\eta}}$, which explains the bias phenomenon in Yan et al. (2019). Extensive simulations are carried out to demonstrate our theoretical findings. A real data example is also provided for illustration.

1.1 Related work

There is a large body of work concerned with degree heterogeneity. The $\beta$-model, so named by Chatterjee et al. (2011), is one of the most popular models for degree heterogeneity. Since the number of parameters increases with the number of nodes, asymptotic theory is nonstandard. Exploring the properties of the $\beta$-model and its generalizations has received wide attention in recent years (Chatterjee et al., 2011; Perry and Wolfe, 2012; Hillar and Wibisono, 2013; Yan and Xu, 2013; Rinaldo and Fienberg, 2013). In particular, Chatterjee et al. (2011) proved the uniform consistency of the maximum likelihood estimator (MLE) and Yan and Xu (2013) derived the asymptotic normality of the MLE. Yan et al. (2016b) further established a unified theoretical framework under which the consistency and asymptotic normality of the moment estimator hold in a class of node-parameter network models. In the directed case, Yan et al. (2016a) proved the consistency and asymptotic normality of the MLE in the $p_0$ model, which is a special case of the $p_1$ model of Holland and Leinhardt (1981).

Graham (2017) incorporated covariates into the $\beta$-model to model homophily in undirected networks. Dzemski (2019) and Yan et al. (2019) proposed directed graph models for edges with covariates, using a bivariate normal distribution and a logistic distribution, respectively. In these papers, the asymptotic normality of the restricted maximum likelihood estimator of the covariate parameter was derived under the assumption that the estimators of all parameters lie in one compact set. Yan et al. (2019) further derived the asymptotic normality of the restricted MLE of the node parameter. However, the convergence rate of the covariate parameter estimator is not investigated in their works. Yan (2021) studied unrestricted maximum likelihood estimation in a sparse $\beta$-model with covariates for undirected graphs and proved the uniform consistency and asymptotic normality of the MLE, where the consistency is proved via the convergence rate of the Newton iterative sequence in a two-stage Newton method. We demonstrate that this method can be further extended here. We also note that extending the result to directed graphs is not trivial, because the Fisher information matrix here is very different from that in the sparse $\beta$-model with covariates and the dimension of the parameter space is much larger.

Finally, we note that the model can be recast as a logistic regression model. There is a considerable amount of literature on asymptotic theory in generalized linear models with a growing number of parameters. Haberman (1977) established the asymptotic theory of the MLE in exponential response models in a setting that allows the number of parameters to go to infinity. Assuming a sequence of independent and identically distributed samples $\{X_i\}_{i=1}^n$ from a canonical exponential family $c(\theta)\exp(\theta^{\top}x)$ with increasing dimension $p_n$, Portnoy (1988) showed that $\|\widehat{\theta}-\theta^{*}\|=O_p(\sqrt{p_n/n})$ if $p_n/n=o(1)$, and that $\widehat{\theta}$ is asymptotically normal if $p_n^2/n=o(1)$, under some additional conditions. For clustered binary data $\{(Y_i,X_i)\}_{i=1}^n$ with a $p_n$-dimensional covariate $X_i$ and regression parameter $\theta$, Wang (2011) derived the asymptotic theory of generalized estimating equations estimators when $p_n^2/n=o(1)$. Liang and Du (2012) established the asymptotic normality of the maximum likelihood estimator when the number of covariates $p$ goes to infinity with the sample size $n$ at rate $p=o(n)$, but the error bound between $\widehat{\theta}$ and $\theta$ is not investigated. We note that asymptotic theory in these papers requires the condition $c_1 N\leq\lambda_{\min}\leq\lambda_{\max}\leq c_2 N$ for two positive constants $c_1$ and $c_2$, where $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum eigenvalues of $S_N=\sum_{i=1}^{N}x_i x_i^{\top}$ and $x_i$ is a $p_N$-dimensional covariate vector. In our model, the first $n$ diagonal entries of $S_N$ are of order $n$ while the last $p$ diagonal entries of $S_N$ are of order $n^2$. Because of these different orders of the diagonal elements of $S_N$, the ratio $\lambda_{\max}/\lambda_{\min}$ is of order $O(n^2)$, far from the constant order in Liang and Du (2012). It is also worth noting that the consistency and asymptotic normality of the MLE have been derived in exponential response models [Portnoy (1988)]. The asymptotic normality of $\widehat{\theta}$ does not contain a bias term in any of these papers, while the asymptotic distribution of $\widehat{\gamma}$ in our model does contain a bias term. This is referred to as the incidental parameter bias problem in the econometric literature [e.g., Graham (2017)].

For the remainder of the paper, we proceed as follows. In Section 2, we give the details of the model considered in this paper. In Section 3, we establish asymptotic results. Numerical studies are presented in Section 4. We provide further discussion and future work in Section 5. All proofs are relegated to the appendix.

2 Maximum Likelihood Estimation

Let $\boldsymbol{\alpha}=(\alpha_1,\ldots,\alpha_n)^{\top}$ and $\boldsymbol{\beta}=(\beta_1,\ldots,\beta_n)^{\top}$. If one transforms $(\nu,\boldsymbol{\alpha},\boldsymbol{\beta})$ to $(\nu+2c_2,\boldsymbol{\alpha}-c_1-c_2,\boldsymbol{\beta}+c_1-c_2)$ for arbitrary constants $c_1$ and $c_2$, the probability in (1) does not change. Thus, we need restrictions on the parameters to ensure model identifiability. For practical analysis we set the restriction $\alpha_n=0$ and $\beta_n=0$, which keeps the density parameter $\nu$. For the theoretical studies in the next section, we set $\nu=0$ and $\beta_n=0$ as in Yan et al. (2019) for convenience. Other restrictions are also possible, e.g., $\sum_i\alpha_i=0$ and $\sum_j\beta_j=0$, or $\nu=0$ and $\sum_i\alpha_i=0$.
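The non-identifiability can be checked numerically; the sketch below (our own illustration, with arbitrary parameter values) verifies that the transformed parameters give the same edge probability.

```python
import numpy as np

def edge_prob(nu, ai, bj, z_gamma):
    """Edge probability of model (1) given the linear-predictor pieces."""
    return 1.0 / (1.0 + np.exp(-(nu + ai + bj + z_gamma)))

nu, ai, bj, zg = 0.3, -0.7, 1.2, 0.4
c1, c2 = 0.9, -0.5
p0 = edge_prob(nu, ai, bj, zg)
# transformed parameters: (nu + 2*c2, alpha_i - c1 - c2, beta_j + c1 - c2)
p1 = edge_prob(nu + 2 * c2, ai - c1 - c2, bj + c1 - c2, zg)
print(np.isclose(p0, p1))  # True: the shifts cancel in the linear predictor
```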

Denote by $A=(a_{ij})_{n\times n}$ the adjacency matrix of $\mathcal{G}_n$. We assume that there are no self-loops, i.e., $a_{ii}=0$. We use $d_i=\sum_{j\neq i}a_{ij}$ to denote the out-degree of node $i$ and set $\mathbf{d}=(d_1,\ldots,d_n)^{\top}$. Similarly, $b_j=\sum_{i\neq j}a_{ij}$ denotes the in-degree of node $j$ and $\mathbf{b}=(b_1,\ldots,b_n)^{\top}$. The pairs $\{(b_1,d_1),\ldots,(b_n,d_n)\}$ form the so-called bi-degree sequence. Since an out-edge from node $i$ pointing to $j$ is an in-edge of $j$ coming from $i$, it is immediate that the sum of out-degrees equals the sum of in-degrees.

Since the random variables $a_{ij}$, $i\neq j$, are mutually independent given the covariates, the log-likelihood function is

\ell(\nu,\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{\gamma})=\sum_{i\neq j}a_{ij}(\nu+Z_{ij}^{\top}\boldsymbol{\gamma})+\boldsymbol{\alpha}^{\top}\mathbf{d}+\boldsymbol{\beta}^{\top}\mathbf{b}-\sum_{i\neq j}\log\big(1+\exp(\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma})\big). (2)
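A direct implementation of (2) can help fix notation; `log_likelihood` is our illustrative name, and `np.logaddexp(0, x)` evaluates $\log(1+e^x)$ stably.

```python
import numpy as np

def log_likelihood(A, Z, nu, alpha, beta, gamma):
    """Log-likelihood (2): sum over i != j of a_ij * pi_ij - log(1 + exp(pi_ij)),
    where pi_ij = nu + alpha_i + beta_j + Z_ij' gamma."""
    n = A.shape[0]
    pi = nu + alpha[:, None] + beta[None, :] + Z @ gamma
    off = ~np.eye(n, dtype=bool)                 # mask for i != j
    return np.sum(A[off] * pi[off] - np.logaddexp(0.0, pi[off]))
```

Note that $\sum_{i\neq j}a_{ij}\pi_{ij}$ expands exactly into the first three terms of (2), since $\sum_j a_{ij}=d_i$ and $\sum_i a_{ij}=b_j$.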

As discussed before, $\boldsymbol{\alpha}$ is a parameter vector tied to the out-degree sequence, $\boldsymbol{\beta}$ is a parameter vector tied to the in-degree sequence, and $\boldsymbol{\gamma}=(\gamma_1,\ldots,\gamma_p)^{\top}$ is a parameter vector tied to the node covariate information. It follows that the score equations are

\begin{array}[]{rcl}\sum\limits_{i,j=1;i\neq j}^{n}a_{ij}&=&\sum\limits_{i,j=1;i\neq j}^{n}\frac{e^{\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma}}}{1+e^{\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma}}},\\ \sum\limits_{i,j=1;i\neq j}^{n}a_{ij}Z_{ij}&=&\sum\limits_{i,j=1;i\neq j}^{n}\frac{Z_{ij}e^{\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma}}}{1+e^{\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma}}},\\ d_{i}&=&\sum\limits_{k=1,k\neq i}^{n}\frac{e^{\nu+\alpha_{i}+\beta_{k}+Z_{ik}^{\top}\boldsymbol{\gamma}}}{1+e^{\nu+\alpha_{i}+\beta_{k}+Z_{ik}^{\top}\boldsymbol{\gamma}}},~{}~{}~{}i=1,\ldots,n,\\ b_{j}&=&\sum\limits_{k=1,k\neq j}^{n}\frac{e^{\nu+\alpha_{k}+\beta_{j}+Z_{kj}^{\top}\boldsymbol{\gamma}}}{1+e^{\nu+\alpha_{k}+\beta_{j}+Z_{kj}^{\top}\boldsymbol{\gamma}}},~{}~{}j=1,\ldots,n.\end{array} (3)

Denote the MLE of $\ell(\nu,\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{\gamma})$ by $(\widehat{\nu},\widehat{\boldsymbol{\alpha}},\widehat{\boldsymbol{\beta}},\widehat{\boldsymbol{\gamma}})$. Note that $\widehat{\alpha}_n=\alpha_n=0$ and $\widehat{\beta}_n=\beta_n=0$ under the identifiability restriction. If $(\widehat{\nu},\widehat{\boldsymbol{\alpha}},\widehat{\boldsymbol{\beta}},\widehat{\boldsymbol{\gamma}})$ exists, then it is unique and satisfies the above equations.

As discussed in Yan et al. (2019), when the number of nodes $n$ is small, we can simply use the R function "glm" to solve (3). For relatively large $n$, this is no longer feasible, as storing the design matrix needed for $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ is too memory demanding. In this case, we recommend a two-step iterative algorithm that alternates between solving the third and fourth equations in (3) via the fixed point method in Yan et al. (2016a), and solving the first two equations in (3) via an existing algorithm for generalized linear models.
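A minimal sketch of such an alternating scheme, under the identification $\nu=0$, $\beta_n=0$: a heuristic fixed-point update for $(\boldsymbol{\alpha},\boldsymbol{\beta})$ alternates with a Newton step for $\boldsymbol{\gamma}$ based on the score equations (3). This is our simplified stand-in, not the exact algorithm of Yan et al. (2016a).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30, 30)))

def fit_two_step(A, Z, n_outer=20, n_inner=500, tol=1e-10):
    """Heuristic alternating estimation sketch for the covariate-p0 model."""
    n, p = A.shape[0], Z.shape[2]
    d, b = A.sum(axis=1), A.sum(axis=0)              # out- and in-degrees
    alpha, beta, gamma = np.zeros(n), np.zeros(n), np.zeros(p)
    off = ~np.eye(n, dtype=bool)
    for _ in range(n_outer):
        for _ in range(n_inner):                     # fixed point for (alpha, beta)
            mu = sigmoid(alpha[:, None] + beta[None, :] + Z @ gamma)
            np.fill_diagonal(mu, 0.0)
            a_new = alpha + np.log(d + 1e-12) - np.log(mu.sum(axis=1) + 1e-12)
            b_new = beta + np.log(b + 1e-12) - np.log(mu.sum(axis=0) + 1e-12)
            b_new -= b_new[-1]                       # enforce beta_n = 0
            step = max(np.abs(a_new - alpha).max(), np.abs(b_new - beta).max())
            alpha, beta = a_new, b_new
            if step < tol:
                break
        mu = sigmoid(alpha[:, None] + beta[None, :] + Z @ gamma)
        np.fill_diagonal(mu, 0.0)
        w = mu * (1.0 - mu)
        score = np.einsum('ijk,ij->k', Z, (A - mu) * off)   # Newton step for gamma
        info = np.einsum('ijk,ij,ijl->kl', Z, w * off, Z)
        gamma = gamma + np.linalg.solve(info, score)
    return alpha, beta, gamma
```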

3 Theoretical Properties

In this section, we set $\nu=0$ and $\beta_n=0$ as the model identifiability condition. The asymptotic properties of $\widehat{\nu}$ under the restriction $\alpha_n=\beta_n=0$ are the same as those of $\widehat{\alpha}_n$ under the restriction $\nu=\beta_n=0$.

We first introduce some notation. Let $\mathbb{R}=(-\infty,\infty)$ be the real domain. For a subset $C\subset\mathbb{R}^n$, let $C^0$ and $\overline{C}$ denote the interior and closure of $C$, respectively. For a vector $\mathbf{x}=(x_1,\ldots,x_n)^{\top}\in\mathbb{R}^n$, let $\|\mathbf{x}\|$ denote a general vector norm, with the special cases $\|\mathbf{x}\|_{\infty}=\max_{1\leq i\leq n}|x_i|$ and $\|\mathbf{x}\|_1=\sum_i|x_i|$ for the $\ell_\infty$- and $\ell_1$-norms of $\mathbf{x}$, respectively. When $n$ is fixed, all vector norms are equivalent. Let $B(\mathbf{x},\epsilon)=\{\mathbf{y}:\|\mathbf{x}-\mathbf{y}\|_{\infty}\leq\epsilon\}$ be an $\epsilon$-neighborhood of $\mathbf{x}$. For an $n\times n$ matrix $J=(J_{i,j})$, let $\|J\|_{\infty}$ denote the matrix norm induced by the $\ell_\infty$-norm on vectors in $\mathbb{R}^n$, i.e.,

\|J\|_{\infty}=\max_{x\neq 0}\frac{\|J\mathbf{x}\|_{\infty}}{\|\mathbf{x}\|_{\infty}}=\max_{1\leq i\leq n}\sum_{j=1}^{n}|J_{i,j}|,

and let $\|J\|$ be a general matrix norm. Define the matrix maximum norm $\|J\|_{\max}=\max_{i,j}|J_{ij}|$. The notation $\sum_i$ denotes summation over all $i=1,\ldots,n$, and $\sum_{i\neq j}$ is a shorthand for $\sum_{i=1}^{n}\sum_{j=1,j\neq i}^{n}$.

For convenience of our theoretical analysis, define $\boldsymbol{\eta}=(\alpha_1,\ldots,\alpha_n,\beta_1,\ldots,\beta_n)^{\top}$. We use the superscript "$*$" to denote the true parameter under which the data are generated. When there is no ambiguity, we omit the superscript "$*$". Further, define

\pi_{ij}:=\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma},~{}~{}\mu(x):=\frac{e^{x}}{1+e^{x}}.

Write $\mu^{\prime}$, $\mu^{\prime\prime}$ and $\mu^{\prime\prime\prime}$ for the first, second and third derivatives of $\mu(x)$ with respect to $x$, respectively. Direct calculations give

\mu^{\prime}(x)=\frac{e^{x}}{(1+e^{x})^{2}},~{}~{}\mu^{\prime\prime}(x)=\frac{e^{x}(1-e^{x})}{(1+e^{x})^{3}},~{}~{}\mu^{\prime\prime\prime}(x)=\frac{e^{x}(1-4e^{x}+e^{2x})}{(1+e^{x})^{4}}.

It is not difficult to verify

|\mu^{\prime}(x)|\leq\frac{1}{4},~{}~{}|\mu^{\prime\prime}(x)|\leq\frac{1}{4},~{}~{}|\mu^{\prime\prime\prime}(x)|\leq\frac{1}{4}. (4)
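These bounds can be checked numerically on a grid; the snippet below is our quick sanity check, not part of the proof.

```python
import numpy as np

x = np.linspace(-15, 15, 150001)
ex = np.exp(x)
mu1 = ex / (1 + ex) ** 2                             # mu'(x)
mu2 = ex * (1 - ex) / (1 + ex) ** 3                  # mu''(x)
mu3 = ex * (1 - 4 * ex + ex ** 2) / (1 + ex) ** 4    # mu'''(x)
print(mu1.max() <= 0.25, np.abs(mu2).max() <= 0.25, np.abs(mu3).max() <= 0.25)
# True True True; mu'(x) attains 1/4 exactly at x = 0
```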

Let $\epsilon_{n1}$ ($=o(1)$) and $\epsilon_{n2}$ ($\leq(\log n/n)^{1/2}/(pz_{\max})$) be two small positive numbers, where $z_{\max}:=\sup_{i,j}\|Z_{ij}\|_{\infty}$. Define

b_{n}:=\sup_{\boldsymbol{\eta}\in B(\boldsymbol{\eta}^{*},\epsilon_{n1}),\boldsymbol{\gamma}\in B(\boldsymbol{\gamma}^{*},\epsilon_{n2})}\max_{i\neq j}\frac{(1+e^{\pi_{ij}})^{2}}{e^{\pi_{ij}}}=O(e^{2\|\boldsymbol{\eta}^{*}\|_{\infty}+z_{\max}\|\boldsymbol{\gamma}^{*}\|_{\infty}}). (5)

It is easy to see that $b_n\geq 4$. When $\boldsymbol{\eta}\in B(\boldsymbol{\eta}^{*},\epsilon_{n1})$ and $\boldsymbol{\gamma}\in B(\boldsymbol{\gamma}^{*},\epsilon_{n2})$, we have $\mu^{\prime}(\pi_{ij})\geq 1/b_n$.

When it causes no confusion, we will simply write $\mu_{ij}$ instead of $\mu_{ij}(\boldsymbol{\eta},\boldsymbol{\gamma})$ for short, where

\mu_{ij}(\boldsymbol{\eta},\boldsymbol{\gamma})=\frac{\exp(\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma})}{1+\exp(\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma})}=\mu(\pi_{ij}).

Write the partial derivative of a function vector $F(\boldsymbol{\eta},\boldsymbol{\gamma})$ with respect to $\boldsymbol{\eta}$, evaluated at $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$, as

\frac{\partial F(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})}{\partial\boldsymbol{\eta}^{\top}}=\frac{\partial F(\boldsymbol{\eta},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\bigg{|}_{\boldsymbol{\eta}=\widehat{\boldsymbol{\eta}},\boldsymbol{\gamma}=\widehat{\boldsymbol{\gamma}}}.

Throughout the remainder of this paper, we make the following assumption.

Assumption 1.

Assume that $p$, the dimension of $Z_{ij}$, is fixed and that the support of $Z_{ij}$ is $\mathbb{Z}^p$, where $\mathbb{Z}$ is a compact subset of $\mathbb{R}$.

The above assumption is also made in Graham (2017) and Yan et al. (2019). In many real applications, the attributes of nodes have a fixed dimension and $Z_{ij}$ is bounded. For example, if the $Z_{ij}$'s are indicator variables such as sex, then the assumption holds. If $Z_{ij}$ is not bounded, we can apply the transformation $\tilde{Z}_{ij}=e^{Z_{ij}}/(1+e^{Z_{ij}})$.

3.1 Consistency

In order to establish the existence and consistency of $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$, we define a system of functions

F_{i}(\boldsymbol{\eta},\boldsymbol{\gamma})=\sum_{k=1;k\neq i}^{n}\frac{e^{\alpha_{i}+\beta_{k}+Z_{ik}^{\top}\boldsymbol{\gamma}}}{1+e^{\alpha_{i}+\beta_{k}+Z_{ik}^{\top}\boldsymbol{\gamma}}}-d_{i},~{}~{}~{}i=1,\ldots,n,
F_{n+j}(\boldsymbol{\eta},\boldsymbol{\gamma})=\sum_{k=1;k\neq j}^{n}\frac{e^{\alpha_{k}+\beta_{j}+Z_{kj}^{\top}\boldsymbol{\gamma}}}{1+e^{\alpha_{k}+\beta_{j}+Z_{kj}^{\top}\boldsymbol{\gamma}}}-b_{j},~{}~{}~{}j=1,\ldots,n-1, (6)
F(\boldsymbol{\eta},\boldsymbol{\gamma})=(F_{1}(\boldsymbol{\eta},\boldsymbol{\gamma}),\ldots,F_{2n-1}(\boldsymbol{\eta},\boldsymbol{\gamma}))^{\top},

which are based on the score equations for $\widehat{\boldsymbol{\eta}}$. Define $F_{\gamma,i}(\boldsymbol{\eta})$ to be the value of $F_i(\boldsymbol{\eta},\boldsymbol{\gamma})$ with $\boldsymbol{\gamma}$ fixed, and let $F_{\gamma}(\boldsymbol{\eta})$ be the corresponding function vector. Let $\widehat{\boldsymbol{\eta}}_{\gamma}$ be a solution to $F_{\gamma}(\boldsymbol{\eta})=0$ if it exists. Correspondingly, we define two functions for exploring the asymptotic behavior of the estimator of $\boldsymbol{\gamma}$:

Q(\boldsymbol{\eta},\boldsymbol{\gamma})=\sum_{i,j=1;i\neq j}^{n}Z_{ij}(\mu_{ij}(\boldsymbol{\eta},\boldsymbol{\gamma})-a_{ij}), (7)
Q_{c}(\boldsymbol{\gamma})=\sum_{i,j=1;i\neq j}^{n}Z_{ij}(\mu_{ij}(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})-a_{ij}). (8)

$Q_c(\boldsymbol{\gamma})$ can be viewed as the concentrated or profile score function obtained from $Q(\boldsymbol{\eta},\boldsymbol{\gamma})$ by profiling out the degree parameter $\boldsymbol{\eta}$. It is clear that

F(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})=0,~{}~{}F_{\gamma}(\widehat{\boldsymbol{\eta}}_{\gamma})=0,~{}~{}Q(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})=0,~{}~{}Q_{c}(\widehat{\boldsymbol{\gamma}})=0.

We first present the error bound between $\widehat{\boldsymbol{\eta}}_{\gamma}$, for $\boldsymbol{\gamma}\in B(\boldsymbol{\gamma}^{*},\epsilon_{n2})$, and $\boldsymbol{\eta}^{*}$. We construct the Newton iterative sequence $\{\boldsymbol{\eta}^{(k+1)}\}_{k=0}^{\infty}$ with initial value $\boldsymbol{\eta}^{*}$, where $\boldsymbol{\eta}^{(k+1)}=\boldsymbol{\eta}^{(k)}-[F_{\gamma}^{\prime}(\boldsymbol{\eta}^{(k)})]^{-1}F_{\gamma}(\boldsymbol{\eta}^{(k)})$ and $F_{\gamma}^{\prime}(\boldsymbol{\eta})=\partial F_{\gamma}(\boldsymbol{\eta})/\partial\boldsymbol{\eta}^{\top}$. If the iteration converges, then the solution lies in the neighborhood of $\boldsymbol{\eta}^{*}$. This is done by establishing a geometrically fast convergence rate of the iterative sequence. Details are given in the supplementary material. The error bound is stated below.
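The Newton step with $\boldsymbol{\gamma}$ fixed can be sketched directly, using the block structure of $F_{\gamma}^{\prime}(\boldsymbol{\eta})$ described in Section 3.2. Here `newton_eta` is our illustrative implementation under the identification $\nu=0$, $\beta_n=0$, started from zero rather than from $\boldsymbol{\eta}^{*}$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30, 30)))

def newton_eta(A, Z, gamma, max_iter=50, tol=1e-12):
    """Newton iteration solving F_gamma(eta) = 0 with gamma fixed,
    where eta = (alpha_1..alpha_n, beta_1..beta_{n-1}) and beta_n = 0."""
    n = A.shape[0]
    d, b = A.sum(axis=1), A.sum(axis=0)
    eta = np.zeros(2 * n - 1)
    for _ in range(max_iter):
        alpha, beta = eta[:n], np.append(eta[n:], 0.0)
        mu = sigmoid(alpha[:, None] + beta[None, :] + Z @ gamma)
        np.fill_diagonal(mu, 0.0)
        F = np.concatenate([mu.sum(axis=1) - d, (mu.sum(axis=0) - b)[:-1]])
        w = mu * (1.0 - mu)                      # u_ij = mu'(pi_ij), zero diagonal
        V = np.zeros((2 * n - 1, 2 * n - 1))     # Fisher matrix F'_gamma(eta)
        V[:n, :n] = np.diag(w.sum(axis=1))
        V[n:, n:] = np.diag(w.sum(axis=0)[:-1])
        V[:n, n:] = w[:, :-1]
        V[n:, :n] = w[:, :-1].T
        step = np.linalg.solve(V, F)
        eta = eta - step
        if np.abs(step).max() < tol:
            break
    return eta
```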

Lemma 1.

If $\boldsymbol{\gamma}\in B(\boldsymbol{\gamma}^{*},\epsilon_{n2})$ and $b_n=o((n/\log n)^{1/12})$, then as $n$ goes to infinity, with probability at least $1-O(1/n)$, $\widehat{\boldsymbol{\eta}}_{\gamma}$ exists and satisfies

\|\widehat{\boldsymbol{\eta}}_{\gamma}-\boldsymbol{\eta}^{*}\|_{\infty}=O_{p}\left(b_{n}^{3}\sqrt{\frac{\log n}{n}}\right)=o_{p}(1).

Further, if $\widehat{\boldsymbol{\eta}}_{\gamma}$ exists, it is unique.

By the chain rule, we have

0=\frac{\partial F_{\gamma}(\widehat{\boldsymbol{\eta}}_{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}=\frac{\partial F(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\frac{\partial\widehat{\boldsymbol{\eta}}_{\gamma}}{\partial\boldsymbol{\gamma}^{\top}}+\frac{\partial F(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}, (9)
\frac{\partial Q_{c}({\boldsymbol{\gamma}})}{\partial{\boldsymbol{\gamma}}^{\top}}=\frac{\partial Q(\widehat{\boldsymbol{\eta}}_{\gamma},{\boldsymbol{\gamma}})}{\partial\boldsymbol{\eta}^{\top}}\frac{\partial\widehat{\boldsymbol{\eta}}_{\gamma}}{\partial{\boldsymbol{\gamma}}^{\top}}+\frac{\partial Q(\widehat{\boldsymbol{\eta}}_{\gamma},{\boldsymbol{\gamma}})}{\partial{\boldsymbol{\gamma}}^{\top}}. (10)

By solving for $\partial\widehat{\boldsymbol{\eta}}_{\gamma}/\partial\boldsymbol{\gamma}^{\top}$ in (9) and substituting it into (10), we get the Jacobian matrix $Q_{c}^{\prime}(\boldsymbol{\gamma})$ ($=\partial Q_{c}(\boldsymbol{\gamma})/\partial\boldsymbol{\gamma}^{\top}$):

\frac{\partial Q_{c}(\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}=\frac{\partial Q(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}-\frac{\partial Q(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\left[\frac{\partial F(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\right]^{-1}\frac{\partial F(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}. (11)
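This Jacobian is a Schur complement of the $\boldsymbol{\eta}$-block of the full Fisher information matrix. A computational sketch, with weights $w_{ij}=\mu^{\prime}(\pi_{ij})$ supplied by the caller (`profile_information` is our illustrative name, not from the paper):

```python
import numpy as np

def profile_information(w, Z):
    """Schur-complement form of (11): Q_gamma - Q_eta * F_eta^{-1} * F_gamma.
    w: (n, n) weights mu'(pi_ij) with zero diagonal; Z: (n, n, p)."""
    n = w.shape[0]
    V = np.zeros((2 * n - 1, 2 * n - 1))         # eta-block (nu = 0, beta_n = 0)
    V[:n, :n] = np.diag(w.sum(axis=1))
    V[n:, n:] = np.diag(w.sum(axis=0)[:-1])
    V[:n, n:] = w[:, :-1]
    V[n:, :n] = w[:, :-1].T
    Qa = np.einsum('ijk,ij->ik', Z, w)           # cross terms with alpha_i
    Qb = np.einsum('ijk,ij->jk', Z, w)[:-1]      # cross terms with beta_j, j < n
    C = np.vstack([Qa, Qb])                      # (2n-1, p) cross block
    Qg = np.einsum('ijk,ij,ijl->kl', Z, w, Z)    # p x p gamma-block
    return Qg - C.T @ np.linalg.solve(V, C)

rng = np.random.default_rng(1)
n, p = 15, 2
w = rng.uniform(0.1, 0.2, size=(n, n))
np.fill_diagonal(w, 0.0)
Z = rng.normal(size=(n, n, p))
H = profile_information(w, Z)
print(H.shape)   # (2, 2)
```

Because the full Fisher matrix is positive semidefinite and its $\boldsymbol{\eta}$-block is positive definite, this Schur complement is symmetric and positive semidefinite.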

The asymptotic behavior of $\widehat{\boldsymbol{\gamma}}$ crucially depends on the Jacobian matrix $Q_{c}^{\prime}(\boldsymbol{\gamma})$. Since $\widehat{\boldsymbol{\eta}}_{\gamma}$ does not have a closed form, conditions imposed directly on $Q_{c}^{\prime}(\boldsymbol{\gamma})$ are not easily checked. To derive feasible conditions, we define

H(\boldsymbol{\eta},\boldsymbol{\gamma})=\frac{\partial Q(\boldsymbol{\eta},\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}-\frac{\partial Q(\boldsymbol{\eta},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\left[\frac{\partial F(\boldsymbol{\eta},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\right]^{-1}\frac{\partial F(\boldsymbol{\eta},\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}, (12)

which is a general form of $\partial Q_{c}(\boldsymbol{\gamma})/\partial\boldsymbol{\gamma}^{\top}$. Because $H(\boldsymbol{\eta},\boldsymbol{\gamma})$ is the Fisher information matrix of the profile log-likelihood $\ell_{c}(\boldsymbol{\gamma})$, we assume that it is positive definite; otherwise, the covariate-$p_0$-model would be ill-conditioned. When $\boldsymbol{\eta}\in B(\boldsymbol{\eta}^{*},\epsilon_{n1})$, we have the equation

\frac{1}{n^{2}}H(\boldsymbol{\eta},\boldsymbol{\gamma}^{*})=\frac{1}{n^{2}}H(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})+o(1), (13)

whose proof is omitted. Note that the dimension of $H(\boldsymbol{\eta},\boldsymbol{\gamma})$ is fixed and each of its entries is a sum of $n(n-1)$ terms. We assume that there exists a number $\kappa_n$ such that

\sup_{\boldsymbol{\eta}\in B(\boldsymbol{\eta}^{*},\epsilon_{n1})}\|H^{-1}(\boldsymbol{\eta},\boldsymbol{\gamma}^{*})\|_{\infty}\leq\frac{\kappa_{n}}{n^{2}}. (14)

If $n^{-2}H(\boldsymbol{\eta},\boldsymbol{\gamma}^{*})$ converges to a constant matrix, then $\kappa_n$ is bounded. Because $H(\boldsymbol{\eta},\boldsymbol{\gamma}^{*})$ is positive definite, one can take

\kappa_{n}=\sqrt{p}\times\sup_{\boldsymbol{\eta}\in B(\boldsymbol{\eta}^{*},\epsilon_{n1})}1/\lambda_{\min}(\boldsymbol{\eta}),

where $\lambda_{\min}(\boldsymbol{\eta})$ is the smallest eigenvalue of $n^{-2}H(\boldsymbol{\eta},\boldsymbol{\gamma}^{*})$. Now we formally state the consistency result.

Theorem 1.

If $\kappa_{n}^{2}b_{n}^{18}=o(n/\log n)$, then the MLE $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$ exists with probability at least $1-O(1/n)$, and is consistent in the sense that

\|\widehat{\boldsymbol{\gamma}}-\boldsymbol{\gamma}^{*}\|_{\infty}=O_{p}\left(\frac{\kappa_{n}b_{n}^{9}\log n}{n}\right)=o_{p}(1),
\|\widehat{\boldsymbol{\eta}}-\boldsymbol{\eta}^{*}\|_{\infty}=O_{p}\left(b_{n}^{3}\sqrt{\frac{\log n}{n}}\right)=o_{p}(1).

Further, if the MLE $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$ exists, it is unique.

From the above theorem, we see that $\widehat{\boldsymbol{\gamma}}$ has convergence rate $1/n$ and $\widehat{\boldsymbol{\eta}}$ has convergence rate $1/n^{1/2}$, up to a logarithmic factor, provided that $b_n$ and $\kappa_n$ are bounded above by a constant. If $\|\boldsymbol{\gamma}^{*}\|_{\infty}$ and $\|\boldsymbol{\eta}^{*}\|_{\infty}$ are constants, then $b_n$ and $\kappa_n$ are constants as well, so that the condition in Theorem 1 holds.

As an application of the $\ell_\infty$-error bound, we use it to recover the nonzero signals among the node parameters. We need a separation measure $\Delta_n$ to distinguish nonzero signals from zero signals among the $\beta_i$'s. Define $\Xi=\{i:\beta_{i}^{*}>\Delta_n\}$ as the set of signals. Because

\hat{\beta}_{i}\geq\beta_{i}^{*}-|\hat{\beta}_{i}-\beta_{i}^{*}|\geq\Delta_{n}-O_{p}\left(b_{n}^{3}\sqrt{\frac{\log n}{n}}\right),~{}i\in\Xi,

we immediately have the following corollary.

Corollary 2.

If $\Delta_n\gg b_n^{3}(\log n/n)^{1/2}$ and $\kappa_n^{2}b_n^{18}=o(n/\log n)$, then with probability at least $1-O(n^{-1})$, the set $\Xi$ can be exactly recovered by its MLE $\widehat{\Xi}=\{i:\hat{\beta}_i\geq\Delta_n\}$.
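The recovery rule of the corollary is a simple thresholding of the MLE. A toy illustration with hypothetical estimates (the noise vector stands in for the estimation error, which the corollary requires to be much smaller than $\Delta_n$):

```python
import numpy as np

def recover_signals(beta_hat, delta_n):
    """Estimate Xi = {i : beta_i^* > delta_n} by thresholding the MLE."""
    return {i for i, bh in enumerate(beta_hat) if bh >= delta_n}

beta_star = np.array([0.0, 0.0, 1.5, 2.0, 0.0])
noise = np.array([0.05, -0.03, 0.04, -0.02, 0.01])   # stand-in for estimation error
beta_hat = beta_star + noise
print(recover_signals(beta_hat, delta_n=0.5))  # {2, 3}
```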

3.2 Asymptotic normality of 𝜼^\widehat{\boldsymbol{\eta}} and 𝜸^\widehat{\boldsymbol{\gamma}}

The asymptotic distribution of $\widehat{\boldsymbol{\eta}}$ depends crucially on the inverse of the Fisher information matrix $F_{\gamma}^{\prime}(\boldsymbol{\eta})$, which does not have a closed form. In order to characterize this matrix, we introduce a general class of matrices that encompasses the Fisher matrix. Given two positive numbers $m$ and $M$ with $M\geq m>0$, we say that a $(2n-1)\times(2n-1)$ matrix $V=(v_{i,j})$ belongs to the class $\mathcal{L}_{n}(m,M)$ if the following holds:

\begin{array}[]{l}0\leq v_{i,i}-\sum_{j=1,j\neq i}^{2n-1}v_{i,j}\leq M,~{}~{}i=1,\ldots,2n-1,\\ v_{i,j}=0,~{}~{}i,j=1,\ldots,n,~{}i\neq j,\\ v_{i,j}=0,~{}~{}i,j=n+1,\ldots,2n-1,~{}i\neq j,\\ m\leq v_{i,j}=v_{j,i}\leq M,~{}~{}i=1,\ldots,n,~{}j=n+1,\ldots,2n-1,~{}j\neq n+i.\\ \end{array} (15)

Clearly, if Vn(m,M)V\in\mathcal{L}_{n}(m,M), then VV is a (2n1)×(2n1)(2n-1)\times(2n-1) diagonally dominant, symmetric nonnegative matrix, and hence positive definite. The definition of n(m,M)\mathcal{L}_{n}(m,M) here is wider than that in Yan et al. (2016a), where for some rows the diagonal element is required to equal the sum of the off-diagonal elements in that row. One can easily show that the Fisher information matrix for the vector parameter 𝜼\boldsymbol{\eta} belongs to n(m,M)\mathcal{L}_{n}(m,M). With some abuse of notation, we use VV to denote this Fisher information matrix.

To describe the exact form of elements of VV, vijv_{ij} for i,j=1,,2n1i,j=1,\ldots,2n-1, iji\neq j, we define

uij=μ(πij),uii=0,ui=j=1nuij,uj=i=1nuij.u_{ij}=\mu^{\prime}(\pi_{ij}),~{}~{}u_{ii}=0,~{}~{}u_{i\cdot}=\sum_{j=1}^{n}u_{ij},~{}~{}u_{\cdot j}=\sum_{i=1}^{n}u_{ij}.

Note that uiju_{ij} is the variance of aija_{ij}. Then the elements of VV are

vij={ui,i=j=1,,n,ui,jn,i=1,,n,j=n+1,,2n1,ji+n,uin,j,i=n+1,,2n1,j=1,,n,jin,u(jn),i=j=n+1,,2n1,0,otherwise.v_{ij}=\begin{cases}u_{i\cdot},&i=j=1,\ldots,n,\\ u_{i,j-n},&i=1,\ldots,n,~{}j=n+1,\ldots,2n-1,~{}j\neq i+n,\\ u_{i-n,j},&i=n+1,\ldots,2n-1,~{}j=1,\ldots,n,~{}j\neq i-n,\\ u_{\cdot(j-n)},&i=j=n+1,\ldots,2n-1,\\ 0,&\mbox{otherwise}.\end{cases}

Yan et al. (2016a) proposed to approximate the inverse V1V^{-1} by the matrix S=(si,j)S=(s_{i,j}), which is defined as

si,j={δi,jui+1un,i,j=1,,n,1un,i=1,,n,j=n+1,,2n1,1un,i=n+1,,2n1,j=1,,n,δi,ju(jn)+1un,i,j=n+1,,2n1,s_{i,j}=\left\{\begin{array}[]{ll}\frac{\delta_{i,j}}{u_{i\cdot}}+\frac{1}{u_{\cdot n}},&i,j=1,\ldots,n,\\ -\frac{1}{u_{\cdot n}},&i=1,\ldots,n,~{}~{}j=n+1,\ldots,2n-1,\\ -\frac{1}{u_{\cdot n}},&i=n+1,\ldots,2n-1,~{}~{}j=1,\ldots,n,\\ \frac{\delta_{i,j}}{u_{\cdot(j-n)}}+\frac{1}{u_{\cdot n}},&i,j=n+1,\ldots,2n-1,\end{array}\right. (16)

where δi,j=1\delta_{i,j}=1 when i=ji=j and δi,j=0\delta_{i,j}=0 when iji\neq j.
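As a quick numerical check, one can assemble VV from the uiju_{ij} and compare its exact inverse with the approximation SS of (16). The sketch below assumes a logistic edge variance uij=μ(αi+βj)u_{ij}=\mu^{\prime}(\alpha_{i}+\beta_{j}) with hypothetical bounded parameters; even for moderate nn the maximum entrywise gap between V1V^{-1} and SS is small:

```python
import numpy as np

def mu_prime(x):                       # variance of a Bernoulli(mu(x)) edge, mu logistic
    p = 1.0 / (1.0 + np.exp(-x))
    return p * (1.0 - p)

rng = np.random.default_rng(1)
n = 30
alpha = rng.uniform(-0.5, 0.5, n)      # hypothetical bounded node parameters
beta = rng.uniform(-0.5, 0.5, n)

u = mu_prime(alpha[:, None] + beta[None, :])   # u_ij = mu'(pi_ij)
np.fill_diagonal(u, 0.0)                       # u_ii = 0 (no self-loops)

# Fisher information V for eta = (alpha_1..alpha_n, beta_1..beta_{n-1})
V = np.zeros((2 * n - 1, 2 * n - 1))
V[np.arange(n), np.arange(n)] = u.sum(axis=1)                                  # u_{i.}
V[np.arange(n, 2 * n - 1), np.arange(n, 2 * n - 1)] = u.sum(axis=0)[: n - 1]   # u_{.j}
V[:n, n:] = u[:, : n - 1]                                                      # v_{i,n+j} = u_{ij}
V[n:, :n] = u[:, : n - 1].T

# approximate inverse S from (16): diagonal part plus a +/- 1/u_{.n} correction
S = np.full_like(V, 1.0 / u.sum(axis=0)[n - 1])
S[:n, n:] *= -1.0
S[n:, :n] *= -1.0
S[np.arange(2 * n - 1), np.arange(2 * n - 1)] += 1.0 / np.diag(V)

gap = np.max(np.abs(np.linalg.inv(V) - S))
print(gap)
```

The gap shrinks at the rate suggested by the theory as nn grows, which is what makes SS a usable surrogate for V1V^{-1} in the asymptotic analysis.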

We derive the asymptotic normality of 𝜼^\widehat{\boldsymbol{\boldsymbol{\eta}}} by approximately representing 𝜼^\boldsymbol{\widehat{\boldsymbol{\eta}}} as a function of 𝐝\mathbf{d} and 𝐛\mathbf{b} with an explicit expression. This is done by applying a second-order Taylor expansion to F(𝜼^,𝜸^)F(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}}) and showing that the various remainder terms are asymptotically negligible. Details are given in the supplementary material.

Theorem 3.

If κnbn12=o(n1/2/logn)\kappa_{n}b_{n}^{12}=o(n^{1/2}/\log n), then for any fixed k1k\geq 1, as nn\to\infty, the vector consisting of the first kk elements of (𝛈^𝛈)(\widehat{\boldsymbol{\boldsymbol{\eta}}}-\boldsymbol{\boldsymbol{\eta}}^{*}) is asymptotically multivariate normal with mean 𝟎\mathbf{0} and covariance matrix given by the upper left k×kk\times k block of SS defined in (16).

Remark 1.

By Theorem 3, for any fixed ii, as nn\rightarrow\infty, the asymptotic variance of 𝛈^i\hat{\boldsymbol{\eta}}_{i} is 1/vi,i1/v_{i,i}, whose square root is of magnitude between O(n1/2)O(n^{-1/2}) and O(n1/2e𝛈)O(n^{-1/2}e^{\|\boldsymbol{\boldsymbol{\eta}}^{*}\|_{\infty}}).

Remark 2.

Under the restriction αn=βn=0\alpha_{n}=\beta_{n}=0, the scaled MLE (u1+un)1/2(ν^ν)(u_{1\cdot}+u_{\cdot n})^{-1/2}(\widehat{\nu}-\nu) of the density parameter ν\nu converges in distribution to the standard normal distribution.

By arguments similar to those used to prove the asymptotic normality of the restricted MLE for 𝜸{\boldsymbol{\gamma}} in Yan et al. (2019), the unrestricted MLE enjoys the same asymptotic normality. We only present the result here and omit the proof. Let

Vηγ=F(𝜼,𝜸)𝜸,Vγγ=Q(𝜼,𝜸)𝜸.V_{\eta\gamma}=\frac{\partial F(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})}{\partial\boldsymbol{\gamma}^{\top}},~{}~{}V_{\gamma\gamma}=\frac{\partial Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})}{\partial\boldsymbol{\gamma}^{\top}}.

Following Amemiya (1985) (p. 126), the Hessian matrix of the concentrated log-likelihood function c(𝜸)\ell^{c}(\boldsymbol{\gamma}^{*}) is VγγVηγV1VηγV_{\gamma\gamma}-V_{\eta\gamma}^{\top}V^{-1}V_{\eta\gamma}. To state the form of the limit distribution of 𝜸^\hat{\boldsymbol{\gamma}}, define

In(𝜸)=1(n1)n(VγγVηγV1Vηγ).I_{n}(\boldsymbol{\gamma}^{*})=\frac{1}{(n-1)n}(V_{\gamma\gamma}-V_{\eta\gamma}^{\top}V^{-1}V_{\eta\gamma}). (17)

Assume that the limit of In(𝜸)I_{n}(\boldsymbol{\gamma}^{*}) exists as nn goes to infinity, denoted by I(𝜸)I_{*}(\boldsymbol{\gamma}^{*}). Then, we have the following theorem.

Theorem 4.

If κnbn12=o(n1/2/(logn)3/2)\kappa_{n}b_{n}^{12}=o(n^{1/2}/(\log n)^{3/2}), then as nn goes to infinity, the pp-dimensional vector N1/2(𝛄^𝛄)N^{1/2}(\widehat{\boldsymbol{\gamma}}-\boldsymbol{\gamma}^{*}) is asymptotically multivariate normal with mean I1(𝛄)BI_{*}^{-1}(\boldsymbol{\gamma}^{*})B_{*} and covariance matrix I1(𝛄)I_{*}^{-1}(\boldsymbol{\gamma}^{*}), where N=n(n1)N=n(n-1) and BB_{*} is the bias term:

B=limn12N[i=0nj=0,jinμ′′(πij)Zijj=0,jinμ(πij)+j=0ni=0,ijnμ′′(πij)Ziji=0,ijnμ(πij)].B_{*}=\lim_{n\to\infty}\frac{1}{2\sqrt{N}}\left[\sum_{i=0}^{n}\frac{\sum_{j=0,j\neq i}^{n}\mu^{\prime\prime}(\pi_{ij}^{*})Z_{ij}}{\sum_{j=0,j\neq i}^{n}\mu^{\prime}(\pi_{ij}^{*})}+\sum_{j=0}^{n}\frac{\sum_{i=0,i\neq j}^{n}\mu^{\prime\prime}(\pi_{ij}^{*})Z_{ij}}{\sum_{i=0,i\neq j}^{n}\mu^{\prime}(\pi_{ij}^{*})}\right].

The limiting distribution of 𝜸^\boldsymbol{\widehat{\gamma}} involves a bias term

μ=I1(𝜸)Bn(n1).\mu_{*}=\frac{I_{*}^{-1}(\boldsymbol{\gamma}^{*})B_{*}}{\sqrt{n(n-1)}}.

If all the parameters 𝜸\boldsymbol{\gamma} and 𝜼\boldsymbol{\eta} are bounded, then B=O(1)B_{*}=O(1) and (I)i,j=O(1)(I_{*})_{i,j}=O(1) according to their expressions, so that μ=O(n1/2)\mu_{*}=O(n^{-1/2}). Since the MLE 𝜸^\boldsymbol{\widehat{\gamma}} is not centered at the true parameter value, the confidence intervals and the p-values of hypothesis tests constructed from 𝜸^\boldsymbol{\widehat{\gamma}} cannot achieve the nominal level without bias correction, even under the null 𝜸=0\boldsymbol{\gamma}^{*}=0. This is the incidental parameter problem in the econometric literature [Neyman and Scott (1948)]; the bias arises from the presence of the additional node parameters. As discussed in Yan et al. (2019), we can use the analytical bias-correction formula 𝜸^bc=𝜸^I^1B^/n(n1)\boldsymbol{\widehat{\gamma}}_{bc}=\boldsymbol{\widehat{\gamma}}-\hat{I}^{-1}\hat{B}/\sqrt{n(n-1)}, where I^\hat{I} and B^\hat{B} are the estimates of II_{*} and BB_{*} obtained by replacing 𝜸\boldsymbol{\gamma} and 𝜼\boldsymbol{\eta} in their expressions with their MLEs 𝜸^\boldsymbol{\widehat{\gamma}} and 𝜼^\boldsymbol{\widehat{\eta}}, respectively.
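The correction step itself is a one-line computation once the plug-in estimates are available. A minimal sketch follows; the numerical values of γ^\widehat{\gamma}, I^\hat{I} and B^\hat{B} below are hypothetical, since computing the plug-in estimates from data is the nontrivial part:

```python
import numpy as np

def bias_corrected_gamma(gamma_hat, I_hat, B_hat, n):
    """Analytical correction gamma_bc = gamma_hat - I_hat^{-1} B_hat / sqrt(n(n-1))."""
    return gamma_hat - np.linalg.solve(I_hat, B_hat) / np.sqrt(n * (n - 1))

# hypothetical plug-in values, only to exercise the formula
gamma_hat = np.array([1.02, 1.48])
I_hat = 0.5 * np.eye(2)
B_hat = np.array([0.3, -0.2])

gamma_bc = bias_corrected_gamma(gamma_hat, I_hat, B_hat, n=100)
print(gamma_bc)
```

The shift is of order n1n^{-1} here (the bias term μ\mu_{*} scaled by n(n1)\sqrt{n(n-1)}), which is why the uncorrected intervals undercover only mildly for small covariate effects but noticeably as the node parameters grow.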

4 Numerical Studies

In this section, we evaluate the asymptotic results of the MLEs for model (2) through simulation studies and a real data example.

4.1 Simulation studies

Similar to Yan et al. (2019), the parameter values take a linear form. Specifically, we set αi=iclogn/n\alpha_{i}^{*}=ic\log n/n for i=0,,ni=0,\ldots,n and let βi=αi\beta_{i}^{*}=\alpha_{i}^{*}, i=0,,ni=0,\ldots,n, for simplicity; note that there are n+1n+1 nodes in the simulations. We consider four values of cc: c{0,0.2,0.4,0.6}c\in\{0,0.2,0.4,0.6\}. By allowing the true values of 𝜶\boldsymbol{\alpha} and 𝜷\boldsymbol{\beta} to grow with nn, we intend to assess the asymptotic properties under different asymptotic regimes. Similar to Yan et al. (2019), we set p=2p=2 here. The first element of the 22-dimensional node-specific covariate XiX_{i} is independently generated from a Beta(2,2)Beta(2,2) distribution and the second element follows a discrete distribution taking values 11 and 1-1 with probabilities 0.30.3 and 0.70.7, respectively. The pairwise covariates are formed as Zij=(|Xi1Xj1|,Xi2Xj2)Z_{ij}=(|X_{i1}-X_{j1}|,X_{i2}*X_{j2})^{\top}. The parameter 𝜸\boldsymbol{\gamma}^{*} is set to (1,1)(1,1)^{\top}. The density parameter is ν=logn/4\nu=-\log n/4.
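The data-generating design above can be sketched in a short script. We assume the logistic form of model (2), πij=ν+αi+βj+Zij𝜸\pi_{ij}=\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma} with edge probability μ(πij)=eπij/(1+eπij)\mu(\pi_{ij})=e^{\pi_{ij}}/(1+e^{\pi_{ij}}); all function and variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(2023)
n = 100                                   # nodes are labelled 0..n, so n+1 in total
c = 0.2

# linear-form node parameters of the simulation design
alpha = c * np.log(n) * np.arange(n + 1) / n
beta = alpha.copy()
gamma = np.array([1.0, 1.0])
nu = -np.log(n) / 4                       # density (sparsity) parameter

# node covariates: Beta(2,2) and a +/-1 label with P(1) = 0.3
x1 = rng.beta(2, 2, n + 1)
x2 = rng.choice([1.0, -1.0], size=n + 1, p=[0.3, 0.7])

# pairwise covariates Z_ij = (|x1_i - x1_j|, x2_i * x2_j)
Z = np.stack([np.abs(x1[:, None] - x1[None, :]), x2[:, None] * x2[None, :]], axis=-1)

# edge probabilities mu(pi_ij) with pi_ij = nu + alpha_i + beta_j + Z_ij' gamma
pi = nu + alpha[:, None] + beta[None, :] + Z @ gamma
p = 1.0 / (1.0 + np.exp(-pi))
A = (rng.random((n + 1, n + 1)) < p).astype(int)
np.fill_diagonal(A, 0)                    # no self-loops

d, b = A.sum(axis=1), A.sum(axis=0)       # out- and in-degrees
print(A.shape, d.mean())
```

Feeding the resulting degree sequences and covariates into the likelihood equations then produces one replication of the Monte Carlo experiment.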

Note that by Theorem 3, ξ^i,j=[α^iα^j(αiαj)]/(1/v^i,i+1/v^j,j)1/2\hat{\xi}_{i,j}=[\hat{\alpha}_{i}-\hat{\alpha}_{j}-(\alpha_{i}^{*}-\alpha_{j}^{*})]/(1/\hat{v}_{i,i}+1/\hat{v}_{j,j})^{1/2}, ζ^i,j=(α^i+β^jαiβj)/(1/v^i,i+1/v^n+j,n+j)1/2\hat{\zeta}_{i,j}=(\hat{\alpha}_{i}+\hat{\beta}_{j}-\alpha_{i}^{*}-\beta_{j}^{*})/(1/\hat{v}_{i,i}+1/\hat{v}_{n+j,n+j})^{1/2}, and η^i,j=[β^iβ^j(βiβj)]/(1/v^n+i,n+i+1/v^n+j,n+j)1/2\hat{\eta}_{i,j}=[\hat{\beta}_{i}-\hat{\beta}_{j}-(\beta_{i}^{*}-\beta_{j}^{*})]/(1/\hat{v}_{n+i,n+i}+1/\hat{v}_{n+j,n+j})^{1/2} are all asymptotically distributed as standard normal random variables, where v^i,i\hat{v}_{i,i} is the estimate of vi,iv_{i,i} obtained by replacing (𝜸,𝜼)(\boldsymbol{\gamma}^{*},\boldsymbol{\eta}^{*}) with (𝜸^,𝜼^)(\boldsymbol{\widehat{\gamma}},\boldsymbol{\widehat{\eta}}). We record the coverage probability of the 95% confidence interval, the length of the confidence interval, and the frequency that the MLE does not exist. The results for ξ^i,j\hat{\xi}_{i,j}, ζ^i,j\hat{\zeta}_{i,j} and η^i,j\hat{\eta}_{i,j} are similar, so only the results for ξ^i,j\hat{\xi}_{i,j} are reported. Each simulation is repeated 5,0005,000 times.

Table 1 reports the coverage probability of the 95% confidence interval for αiαj\alpha_{i}-\alpha_{j}, the length of the confidence interval, as well as the frequency that the MLE did not exist. As we can see, the length of the confidence interval increases as cc increases and decreases as nn increases, which qualitatively agrees with the theory. The coverage frequencies are all close to the nominal level when c=0c=0, 0.20.2 or 0.40.4. When c=0.6c=0.6, the MLE failed to exist with positive frequency.

Table 1: The reported values are the coverage frequency (×100%\times 100\%) for αiαj\alpha_{i}-\alpha_{j} for a pair (i,j)(i,j) / the length of the confidence interval / the frequency (×100%\times 100\%) that the MLE did not exist. The pair (0,0)(0,0) indicates those values for the density parameter ν\nu.
n (i,j)(i,j) c=0c=0 c=0.2c=0.2 c=0.4c=0.4 c=0.6c=0.6
100 (1,2) 94.9/1.28/0 95.02/1.26/0 94.58/1.29/0 94.76/1.35/30.88
(50,51) 95.4/1.28/0 94.34/1.27/0 94.98/1.39/0 95.14/1.68/30.88
(99,100) 94.7/1.28/0 94.54/1.31/0 94.58/1.68/0 97.57/2.55/30.88
(1,100) 95.12/1.28/0 94.5/1.29/0 94.8/1.5/0 96.53/1.99/30.88
(1,50) 94.64/1.28/0 94.52/1.26/0 94.6/1.34/0 94.39/1.52/30.88
(0,0) 94.26/1.29/0 94.66/1.27/0 94.2/1.29/0 94.36/1.36/30.88
200 (1,2) 95.12/0.92/0 95.3/0.89/0 94.34/0.91/0 94.73/0.96/7.74
(100,101) 94.86/0.92/0 94.92/0.89/0 94.98/1/0 94.58/1.25/7.74
(199,200) 94.54/0.93/0 94.74/0.92/0 94.88/1.26/0 96.4/2.1/7.74
(1,200) 94.84/0.92/0 94.94/0.91/0 95.52/1.11/0 96.05/1.61/7.74
(1,100) 95.04/0.92/0 95.66/0.89/0 95.26/0.96/0 94.39/1.12/7.74
(0,0) 94.62/0.92/0 94.9/0.89/0 94.78/0.91/0 94.17/0.96/7.74

Table 2 reports the simulation results for the MLE 𝜸^\boldsymbol{\widehat{\gamma}} and the bias-corrected estimate 𝜸^bc(=𝜸^I^1B^/n(n1))\boldsymbol{\widehat{\gamma}}_{bc}(=\boldsymbol{\widehat{\gamma}}-\hat{I}^{-1}\hat{B}/\sqrt{n(n-1)}) at the nominal level 95%95\%. The coverage frequencies for the uncorrected estimate γ^2\widehat{\gamma}_{2} are visibly below the nominal level, while those for the bias-corrected estimate are close to the nominal level when c0.2c\leq 0.2, a dramatic improvement over the uncorrected estimate. As expected, the standard error increases with cc and decreases with nn.

Table 2: The reported values are the coverage frequency (×100%\times 100\%) for γi\gamma_{i} with the bias-corrected estimate (uncorrected estimate in parentheses) / the length of the confidence interval / the frequency (×100%\times 100\%) that the MLE did not exist (𝜸=(1,1.5)\boldsymbol{\gamma}^{*}=(1,1.5)^{\top}).
n c γ1\gamma_{1} γ2\gamma_{2}
100 0 93.72(93.72)/0.58/0 92.72(86.62)/0.11/0
0.2 93.12(92.88)/0.57/0 93.4(83.58)/0.1/0
0.4 91.26(92.96)/0.62/0 73.04(86.06)/0.12/0
0.6 85.24(92.62)/0.73/30.88 43.66(87.44)/0.15/30.88
200 0 92.9(93.94)/0.29/0 92.38(88.04)/0.06/0
0.2 94.14(93.76)/0.28/0 93.32(84.22)/0.05/0
0.4 90.5(93.00)/0.31/0 72.36(86.76)/0.06/0
0.6 83.29(93.84)/0.38/7.74 40.93(88.64)/0.08/7.74

4.2 A data example

Yan et al. (2019) analyzed the friendship network among the 71 attorneys in Lazega's lawyer data set (Lazega, 2001) without the density parameter ν\nu; the data can be downloaded from https://www.stats.ox.ac.uk/~snijders/siena/Lazega_lawyers_data.htm. We revisit this data set here and use the covariate-p0p_{0}-model with the density parameter for the analysis. The covariates collected at the node level include X1X_{1}=status (1=partner; 2=associate), X2X_{2}=gender (1=man; 2=woman), X3X_{3}=location (1=Boston; 2=Hartford; 3=Providence), X4X_{4}=years (with the firm), X5X_{5}=age, X6X_{6}=practice (1=litigation; 2=corporate) and X7X_{7}=law school (1: Harvard, Yale; 2: UConn; 3: other). We denote the covariates of individual ii by (Xi1,,Xi7)(X_{i1},\ldots,X_{i7})^{\top}. The network graphs with the node attributes status and practice were drawn in Yan et al. (2019). Here we complement them with graphs using the node attributes gender and location in Figure 1. From this figure, we can see that there are more links between lawyers of the same gender and office location, which exhibits homophily in gender and location.

Refer to caption
Refer to caption
Figure 1: Visualization of Lazega's friendship network among lawyers. Two isolated nodes exist. The vertex sizes are proportional to nodal in-degrees in (a) and out-degrees in (b). The positions of the vertices are the same in (a) and (b). For nodes with degree less than 5, we set their sizes to that of a node with degree 5. In (a), the colors indicate gender (red for male and blue for female); in (b), the colors represent location (black for Boston, green for Hartford, red for Providence).

Let 1{}1_{\{\cdot\}} be an indicator function. As in Yan et al. (2019), for the categorical variables we define the similarity between two individuals ii and jj by the indicator Zij,k=1{Xik=Xjk}Z_{ij,k}=1_{\{X_{ik}=X_{jk}\}}, k=1,2,3,6,7k=1,2,3,6,7. For the numerical variables, we define the similarity by the absolute distance Zij,k=|XikXjk|Z_{ij,k}=|X_{ik}-X_{jk}|, k=4,5k=4,5. Individuals are labelled from 1 to 71. Before the analysis, we removed those individuals whose in-degrees or out-degrees are zero and performed the analysis on the remaining 63 vertices. Since the change of model identification conditions has no influence on the analysis of the covariates, the variables status, gender, office location and practice have a significant influence on link formation, while the influence of years, age, and law school is not significant, as shown in Table 4 in Yan et al. (2019).
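The construction of the pairwise covariates ZijZ_{ij} can be sketched as below, with synthetic stand-in attributes in place of the actual Lazega covariates (which are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 63  # vertices remaining after dropping zero-degree nodes

# synthetic stand-ins for the seven nodal attributes; the real values come
# from the Lazega lawyer data set
X = np.column_stack([
    rng.integers(1, 3, n),     # status
    rng.integers(1, 3, n),     # gender
    rng.integers(1, 4, n),     # location
    rng.integers(1, 33, n),    # years with the firm
    rng.integers(26, 66, n),   # age
    rng.integers(1, 3, n),     # practice
    rng.integers(1, 4, n),     # law school
])

categorical = [0, 1, 2, 5, 6]  # columns using the match indicator 1{X_ik = X_jk}
numerical = [3, 4]             # columns using the absolute distance |X_ik - X_jk|

Z = np.empty((n, n, 7))
for k in categorical:
    Z[:, :, k] = (X[:, k][:, None] == X[:, k][None, :]).astype(float)
for k in numerical:
    Z[:, :, k] = np.abs(X[:, k][:, None] - X[:, k][None, :])
print(Z.shape)
```

Each slice Z[:, :, k] is then a symmetric pairwise-similarity (or distance) matrix, and the vector (Zij,1,,Zij,7)(Z_{ij,1},\ldots,Z_{ij,7}) enters the model as the dyadic covariate.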

The estimate ν^\widehat{\nu} of the density parameter ν\nu is -7.83 with standard error 1.13, which leads to a pp-value less than 10410^{-4} under the null ν=0\nu=0. This indicates a significantly sparse network. The estimates of αi\alpha_{i} and βi\beta_{i} with their estimated standard errors are given in Table 3, in which α^71=β^71=0\widehat{\alpha}_{71}=\widehat{\beta}_{71}=0 serve as the reference. The estimates of the out- and in-degree parameters vary widely: from the minimum -8.65 to the maximum -2.21 for the α^i\widehat{\alpha}_{i}s and from -1.55 to 2.79 for the β^i\widehat{\beta}_{i}s. The minimum, 1/4 quantile, median, 3/4 quantile, and maximum values of the out-degrees dd are 1, 5, 8, 12, and 25; those of the in-degrees bb are 2, 5, 8, 13, and 22.

Table 3: The estimates of αi\alpha_{i} and βi\beta_{i} and their standard errors for Lazega's friendship data set, with α^71=β^71=0\widehat{\alpha}_{71}=\widehat{\beta}_{71}=0.
Node did_{i} α^i\hat{\alpha}_{i} σ^i\hat{\sigma}_{i} bib_{i} β^i\hat{\beta}_{i} σ^i\hat{\sigma}_{i} Node did_{i} α^i\hat{\alpha}_{i} σ^i\hat{\sigma}_{i} bib_{i} β^i\hat{\beta}_{i} σ^i\hat{\sigma}_{i}
1 4 -6.21 0.79 5 0.53 0.77 34 6 -5.54 0.67 11 1.18 0.61
2 4 -6.01 0.82 9 1.91 0.7 35 9 -4.25 0.67 10 1.55 0.68
4 14 -3.46 0.65 14 2.79 0.63 36 9 -5.4 0.63 11 0.77 0.61
5 3 -5.01 0.8 5 1.43 0.73 38 8 -5.21 0.64 13 1.42 0.6
7 1 -6.59 1.17 2 -0.04 0.91 39 8 -5.47 0.64 13 1.14 0.61
8 1 -8.32 1.16 7 0.56 0.71 40 10 -5.29 0.62 8 0.21 0.64
9 6 -5.98 0.73 14 2.1 0.63 41 12 -5.04 0.61 17 1.42 0.59
10 14 -4.17 0.65 4 -0.45 0.85 42 14 -4.55 0.59 9 0.54 0.63
11 5 -6.49 0.74 14 1.7 0.63 43 15 -4.4 0.6 13 1.21 0.6
12 22 -2.95 0.61 8 0.86 0.69 45 6 -5.8 0.66 4 -0.63 0.74
13 14 -4.35 0.64 19 2.56 0.6 46 3 -5.61 0.81 5 0.53 0.74
14 6 -4.27 0.7 6 1.21 0.73 48 7 -5.4 0.65 4 -0.39 0.74
15 3 -4.89 0.8 2 0.39 0.93 49 4 -6.7 0.73 6 -0.42 0.68
16 8 -5.66 0.68 10 0.94 0.65 50 8 -4.34 0.67 8 1.15 0.68
17 23 -2.85 0.61 18 2.5 0.61 51 6 -4.67 0.7 7 1.11 0.7
18 8 -4.62 0.66 5 0.33 0.75 52 11 -5.1 0.61 14 1.12 0.6
19 4 -6.85 0.76 4 -0.77 0.79 54 7 -5.78 0.66 11 0.68 0.62
20 12 -5.01 0.65 7 0.2 0.69 56 7 -5.91 0.65 10 0.39 0.63
21 8 -5.73 0.67 15 1.47 0.61 57 9 -5.42 0.63 12 0.87 0.61
22 8 -5.67 0.65 6 -0.1 0.68 58 13 -3.6 0.62 12 1.83 0.64
23 1 -8.65 1.16 7 -0.01 0.68 59 5 -5.04 0.74 4 0.12 0.8
24 23 -3.59 0.59 17 1.68 0.59 60 4 -6.2 0.74 8 0.47 0.65
25 11 -3.95 0.63 10 1.6 0.67 61 3 -6.57 0.79 3 -0.88 0.8
26 9 -5.45 0.64 22 2.24 0.58 62 4 -6.32 0.73 5 -0.38 0.71
27 13 -4.54 0.61 17 2.02 0.59 64 19 -3.71 0.58 14 1.55 0.6
28 11 -3.91 0.64 9 1.32 0.69 65 22 -3.68 0.58 8 0.32 0.64
29 10 -4.81 0.62 10 1.09 0.62 66 15 -4.56 0.59 3 -0.97 0.8
30 6 -5.26 0.71 5 -0.1 0.78 67 4 -6.5 0.73 3 -1.04 0.79
31 25 -2.21 0.59 14 2.21 0.64 68 6 -5.81 0.68 5 -0.32 0.72
32 4 -5.86 0.79 7 0.54 0.74 69 5 -6.13 0.7 4 -0.64 0.74
33 12 -4.03 0.64 2 -1.55 1.01 70 7 -5.5 0.65 5 -0.25 0.71

5 Discussion

In this paper, we have derived the \ell_{\infty}-error between the MLE and the true parameter values and established the asymptotic normality of the MLE in the covariate-p0p_{0}-model when the number of vertices goes to infinity. Note that the conditions imposed on bnb_{n} and κn\kappa_{n} in Theorems 1–4 may not be the best possible. In particular, the conditions guaranteeing the asymptotic normality seem stronger than those guaranteeing the consistency. It would be interesting to investigate whether these bounds can be improved.

As discussed in Yan et al. (2019), there is an implicit taste for the reciprocity parameter in the p1p_{1}-model [Holland and Leinhardt (1981)], although we do not include this parameter. If similarity terms are included, then there is a tendency toward reciprocity among nodes sharing similar node features, which alleviates the lack of a reciprocity term to some extent. To measure the reciprocity of dyads, it is natural to incorporate the term ρi<jaijaji\rho\sum_{i<j}a_{ij}a_{ji} of the p1p_{1} model into the covariate-p0p_{0}-model. The empirical results in Yan and Leng (2015) suggest that central limit theorems hold for the MLE in the p1p_{1} model without covariates. Nevertheless, although only one new parameter is added, the problem of investigating the asymptotic theory of the MLEs becomes more challenging. In particular, the Fisher information matrix for the parameter vector (ρ,α1,,αn,β1,,βn1)(\rho,\alpha_{1},\ldots,\alpha_{n},\beta_{1},\ldots,\beta_{n-1}) is not diagonally dominant and thus does not belong to the class n(m,M)\mathcal{L}_{n}(m,M). In order to generalize the method used here, a new matrix that approximates the inverse of this Fisher information matrix with high accuracy is needed. It is beyond the scope of the present paper to investigate the corresponding asymptotic theory.

6 Appendix: Proofs for theorems

We only give the proof of Theorem 1 here; the proof of Theorem 3 is given in the supplementary material.

Let F(x):nnF(x):\mathbb{R}^{n}\to\mathbb{R}^{n} be a vector-valued function of xnx\in\mathbb{R}^{n}. We say that the Jacobian matrix F(x)F^{\prime}(x) is Lipschitz continuous on a convex set DnD\subset\mathbb{R}^{n} with Lipschitz coefficient λ>0\lambda>0 if for any x,yDx,y\in D and any vector vnv\in\mathbb{R}^{n} the inequality

[F(x)]v[F(y)]vλxyv\|[F^{\prime}(x)]v-[F^{\prime}(y)]v\|_{\infty}\leq\lambda\|x-y\|_{\infty}\|v\|_{\infty}

holds. We will use the Newton iterative sequence to establish the existence and consistency of the MLE. Gragg and Tapia (1974) gave the optimal error bound for Newton's method under the Kantorovich conditions [Kantorovich (1948)].

Lemma 2 (Gragg and Tapia (1974)).

Let DD be an open convex set of n\mathbb{R}^{n} and F:DnF:D\to\mathbb{R}^{n} a differentiable function whose Jacobian F(x)F^{\prime}(x) is Lipschitz continuous on DD with Lipschitz coefficient λ\lambda. Assume that x0Dx_{0}\in D is such that [F(x0)]1[F^{\prime}(x_{0})]^{-1} exists,

[F(x0)]1,[F(x0)]1F(x0)δ,ρ=2λδ1,\displaystyle\|[F^{\prime}(x_{0})]^{-1}\|_{\infty}\leq\aleph,~{}~{}\|[F^{\prime}(x_{0})]^{-1}F(x_{0})\|_{\infty}\leq\delta,~{}~{}\rho=2\aleph\lambda\delta\leq 1,
B(x0,t)D,t=2ρ(11ρ)δ=2δ1+1ρ2δ.\displaystyle B(x_{0},t^{*})\subset D,~{}~{}t^{*}=\frac{2}{\rho}(1-\sqrt{1-\rho})\delta=\frac{2\delta}{1+\sqrt{1-\rho}}\leq 2\delta.

Then: (1) The Newton iterations xk+1=xk[F(xk)]1F(xk)x_{k+1}=x_{k}-[F^{\prime}(x_{k})]^{-1}F(x_{k}) exist and xkB(x0,t)Dx_{k}\in B(x_{0},t^{*})\subset D for k0k\geq 0. (2) x=limxkx^{*}=\lim x_{k} exists, xB(x0,t)¯Dx^{*}\in\overline{B(x_{0},t^{*})}\subset D and F(x)=0F(x^{*})=0.

6.1 Proof of Theorem 1

To show Theorem 1, we need three lemmas below.

Lemma 3.

Let D=B(𝛄,ϵn2)(p)D=B(\boldsymbol{\gamma}^{*},\epsilon_{n2})(\subset\mathbb{R}^{p}). If F(𝛈,𝛄)=O((nlogn)1/2)\|F(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}=O((n\log n)^{1/2}), then Qc(𝛄)Q_{c}(\boldsymbol{\gamma}) is Lipschitz continuous on DD with the Lipschitz coefficient O(bn9n2)O(b_{n}^{9}n^{2}).

Lemma 4.

With probability at least 1O(1/n)1-O(1/n), we have

F(𝜼,𝜸)(nlogn)1/2,Q(𝜼,𝜸)zmaxn(logn)1/2,\|F(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}\leq(n\log n)^{1/2},~{}~{}\|Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}\leq z_{\max}n(\log n)^{1/2}, (18)

where zmax:=maxi,jZijz_{\max}:=\max_{i,j}\|Z_{ij}\|_{\infty}.

Lemma 5.

The difference between Q(𝛈^γ,𝛄)Q(\widehat{\boldsymbol{\eta}}_{\gamma}^{*},\boldsymbol{\gamma}^{*}) and Q(𝛈,𝛄)Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*}) is

Q(𝜼^𝜸,𝜸)Q(𝜼,𝜸)=Op(bn9nlogn).\|Q(\widehat{\boldsymbol{\eta}}_{\boldsymbol{\gamma}}^{*},\boldsymbol{\gamma}^{*})-Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}=O_{p}(b_{n}^{9}n\log n).

Now we are ready to prove Theorem 1.

Proof of Theorem 1.

We construct the Newton iterative sequence to show the consistency. It suffices to verify the Newton–Kantorovich conditions in Lemma 2. We set 𝜸\boldsymbol{\gamma}^{*} as the initial point 𝜸(0)\boldsymbol{\gamma}^{(0)} and 𝜸(k+1)=𝜸(k)[Qc(𝜸(k))]1Qc(𝜸(k))\boldsymbol{\gamma}^{(k+1)}=\boldsymbol{\gamma}^{(k)}-[Q_{c}^{\prime}(\boldsymbol{\gamma}^{(k)})]^{-1}Q_{c}(\boldsymbol{\gamma}^{(k)}).

By Lemma 1, 𝜼^γ\widehat{\boldsymbol{\eta}}_{\gamma^{*}} exists with probability approaching one and satisfies

𝜼^γ𝜼=Op(bn3lognn).\|\widehat{\boldsymbol{\eta}}_{\gamma^{*}}-\boldsymbol{\eta}^{*}\|_{\infty}=O_{p}\left(b_{n}^{3}\sqrt{\frac{\log n}{n}}\right).

Therefore, Qc(𝜸(0))Q_{c}(\boldsymbol{\gamma}^{(0)}) and Qc(𝜸(0))Q_{c}^{\prime}(\boldsymbol{\gamma}^{(0)}) are well defined.

Recall the definition of Qc(𝜸)Q_{c}(\boldsymbol{\gamma}) and Q(𝜼,𝜸)Q(\boldsymbol{\eta},\boldsymbol{\gamma}) in (7) and (8). By Lemmas 4 and 5, we have

Qc(𝜸)\displaystyle\|Q_{c}(\boldsymbol{\gamma}^{*})\|_{\infty} \displaystyle\leq Q(𝜼,𝜸)+Q(𝜼^𝜸,𝜸)Q(𝜼,𝜸)\displaystyle\|Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}+\|Q(\widehat{\boldsymbol{\eta}}_{\boldsymbol{\gamma}^{*}},\boldsymbol{\gamma}^{*})-Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}
=\displaystyle= Op(bn9nlogn).\displaystyle O_{p}\left(b_{n}^{9}n\log n\right).

By Lemma 3, we can take λ=O(bn9n2)\lambda=O(b_{n}^{9}n^{2}). By (14), we have

=[Qc(𝜸)]1=O(κnn2).\aleph=\|[Q_{c}^{\prime}(\boldsymbol{\gamma}^{*})]^{-1}\|_{\infty}=O(\kappa_{n}n^{-2}).

Thus,

δ=[Qc(𝜸)]1Qc(𝜸)=Op(κnbn9lognn).\delta=\|[Q_{c}^{\prime}(\boldsymbol{\gamma}^{*})]^{-1}Q_{c}(\boldsymbol{\gamma}^{*})\|_{\infty}=O_{p}\left(\frac{\kappa_{n}b_{n}^{9}\log n}{n}\right).

As a result, if κn2bn18=o(n/logn)\kappa_{n}^{2}b_{n}^{18}=o(n/\log n), then

ρ=2λδ=O(κn2bn18lognn)=o(1).\rho=2\aleph\lambda\delta=O(\frac{\kappa_{n}^{2}b_{n}^{18}\log n}{n})=o(1).

By Lemma 2, with probability 1O(n1)1-O(n^{-1}), the limiting point of the sequence {𝜸(k)}k=1\{\boldsymbol{\gamma}^{(k)}\}_{k=1}^{\infty} exists; denote it by 𝜸^\widehat{\boldsymbol{\gamma}}. It satisfies

𝜸^𝜸=O(δ).\|\widehat{\boldsymbol{\gamma}}-\boldsymbol{\gamma}^{*}\|_{\infty}=O(\delta).

By Lemma 1, 𝜼^𝜸^\widehat{\boldsymbol{\eta}}_{\widehat{\boldsymbol{\gamma}}} exists and (𝜼^𝜸^,𝜸^)(\widehat{\boldsymbol{\eta}}_{\widehat{\boldsymbol{\gamma}}},\widehat{\boldsymbol{\gamma}}) is the MLE. This completes the proof. ∎

References

  • Amemiya (1985) Amemiya, T. (1985). Advanced Econometrics. Cambridge, MA. Harvard University Press.
  • Chatterjee et al. (2011) Chatterjee, S., Diaconis, P., and Sly, A. (2011). Random graphs with a given degree sequence. Annals of Applied Probability, 21:1400–1435.
  • Dzemski (2019) Dzemski, A. (2019). An empirical model of dyadic link formation in a network with unobserved heterogeneity. The Review of Economics and Statistics, 101(5):763–776.
  • Fienberg (2012) Fienberg, S. E. (2012). A brief history of statistical models for network analysis and open challenges. Journal of Computational and Graphical Statistics, 21(4):825–839.
  • Gragg and Tapia (1974) Gragg, W. B. and Tapia, R. A. (1974). Optimal error bounds for the Newton–Kantorovich theorem. SIAM Journal on Numerical Analysis, 11(1):10–13.
  • Graham (2017) Graham, B. S. (2017). An econometric model of link formation with degree heterogeneity. Econometrica, 85:1033–1063.
  • Haberman (1977) Haberman, S. J. (1977). Maximum likelihood estimates in exponential response models. The Annals of Statistics, 5:815–841.
  • Hillar and Wibisono (2013) Hillar, C. and Wibisono, A. (2013). Maximum entropy distributions on graphs. arXiv e-prints, page arXiv:1301.3321.
  • Holland and Leinhardt (1981) Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76(373):33–50.
  • Kantorovich (1948) Kantorovich, L. V. (1948). Functional analysis and applied mathematics. Uspekhi Mat Nauk, pages 89–185.
  • Lazega (2001) Lazega, E. (2001). The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership. Oxford University Press.
  • Liang and Du (2012) Liang, H. and Du, P. (2012). Maximum likelihood estimation in logistic regression models with a diverging number of covariates. Electron. J. Statist., 6:1838–1846.
  • Neyman and Scott (1948) Neyman, J. and Scott, E. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16:1–32.
  • Perry and Wolfe (2012) Perry, P. O. and Wolfe, P. J. (2012). Null models for network data. Available at http://arxiv.org/abs/1201.5871.
  • Portnoy (1988) Portnoy, S. (1988). Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Ann. Statist., 16(1):356–366.
  • Rinaldo et al. (2013) Rinaldo, A., Petrović, S., and Fienberg, S. E. (2013). Maximum likelihood estimation in the β\beta-model. The Annals of Statistics, 41:1085–1110.
  • Wang (2011) Wang, L. (2011). GEE analysis of clustered binary data with diverging number of covariates. Ann. Statist., 39(1):389–417.
  • Yan (2021) Yan, T. (2021). Maximum likelihood estimation in a sparse β\beta-model with covariates. Manuscript.
  • Yan et al. (2019) Yan, T., Jiang, B., Fienberg, S. E., and Leng, C. (2019). Statistical inference in a directed network model with covariates. Journal of the American Statistical Association, 114(526):857–868.
  • Yan and Leng (2015) Yan, T. and Leng, C. (2015). A simulation study of the p1p_{1} model. Statistics and Its Interface, 8:255–266.
  • Yan et al. (2016a) Yan, T., Leng, C., and Zhu, J. (2016a). Asymptotics in directed exponential random graph models with an increasing bi-degree sequence. Ann. Statist., 44(1):31–57.
  • Yan et al. (2016b) Yan, T., Qin, H., and Wang, H. (2016b). Asymptotics in undirected random graph models parameterized by the strengths of vertices. Statistica Sinica, 26(1):273–293.
  • Yan and Xu (2013) Yan, T. and Xu, J. (2013). A central limit theorem in the β\beta-model for undirected random graphs with a diverging number of vertices. Biometrika, 100:519–524.