

A sparse $p_0$ model with covariates for directed networks

Qiuping Wang
Central China Normal University
Department of Statistics, Central China Normal University, Wuhan, 430079, China. Email: [email protected].
Abstract

We are concerned here with unrestricted maximum likelihood estimation in a sparse $p_0$ model with covariates for directed networks. The model has a density parameter $\nu$, a $2n$-dimensional node parameter $\boldsymbol{\eta}$ and a fixed-dimensional regression coefficient $\boldsymbol{\gamma}$ for covariates. Previous studies focus on restricted likelihood inference. As the number of nodes $n$ goes to infinity, we derive the $\ell_\infty$-error between the maximum likelihood estimator (MLE) $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$ and its true value $(\boldsymbol{\eta},\boldsymbol{\gamma})$: it is $O_p((\log n/n)^{1/2})$ for $\widehat{\boldsymbol{\eta}}$ and $O_p(\log n/n)$ for $\widehat{\boldsymbol{\gamma}}$, up to an additional factor. This explains the asymptotic bias phenomenon in the asymptotic normality of $\widehat{\boldsymbol{\gamma}}$ observed in Yan et al. (2019). Further, we derive the asymptotic normality of the MLE. Numerical studies and a data analysis demonstrate our theoretical findings.

Key words: Asymptotic normality; Consistency; Degree heterogeneity; Homophily; Maximum likelihood estimator.

1 Introduction

Degree heterogeneity, which characterizes the variation in node degrees, is a very common phenomenon in real-world networks. In directed networks, each node has an in-degree and an out-degree. Another distinct phenomenon is homophily: links tend to form between nodes sharing the same attributes, such as age and sex. As depicted in Yan et al. (2019), the directed friendship network between 71 lawyers studied in Lazega (2001) displays a strong homophily effect. Yan et al. (2019) incorporated covariates into the $p_0$ model of Yan et al. (2016a) to address homophily. However, network sparsity, an equally important feature of realistic networks, was not considered in their paper.

Here, we further incorporate a sparsity parameter into Yan et al.'s model. Consider a directed graph $\mathcal{G}_n$ on $n\geq 2$ nodes labeled $1,\ldots,n$. Let $a_{ij}\in\{0,1\}$ be an indicator of whether there is a directed edge from node $i$ to node $j$. That is, if there is a directed edge from $i$ to $j$, then $a_{ij}=1$; otherwise, $a_{ij}=0$. Our model postulates that the $a_{ij}$'s follow independent Bernoulli distributions such that a directed link exists from node $i$ to node $j$ with probability

P(a_{ij}=1)=\frac{\exp(\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma})}{1+\exp(\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma})}. (1)

In this model, the parameter $\nu$ measures network sparsity. For a node $i$, the incomingness parameter $\beta_i$ characterizes how attractive the node is, and the outgoingness parameter $\alpha_i$ illustrates the extent to which the node is attracted to others [Holland and Leinhardt (1981)]. $\boldsymbol{\gamma}$ is the regression coefficient of the covariates $Z_{ij}$. A larger $Z_{ij}^{\top}\boldsymbol{\gamma}$ implies a higher likelihood for nodes $i$ and $j$ to be connected. We call the above model the "covariate-$p_0$-model" hereafter.

The covariate $Z_{ij}$ is either a link-dependent vector or a function of node-specific covariates. If $X_i$ denotes a vector of node-level attributes, then these attributes can be used to construct a $p$-dimensional vector $Z_{ij}=g(X_i,X_j)$, where $g(\cdot,\cdot)$ is a function of its arguments. For instance, if we let $g(X_i,X_j)$ equal $\|X_i-X_j\|_1$, then it measures the dissimilarity between the features of nodes $i$ and $j$.
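To make the data-generating process concrete, the following is a minimal simulation sketch of model (1), building edge covariates via the $\ell_1$-distance construction above. The function name `simulate_network` and all parameter values are our own illustration, not from the paper.

```python
import numpy as np

def simulate_network(alpha, beta, gamma, X, nu=0.0, seed=0):
    """Draw an adjacency matrix from model (1), with edge covariates
    Z_ij = |X_i - X_j| (one possible choice of g)."""
    rng = np.random.default_rng(seed)
    n = alpha.shape[0]
    Z = np.abs(X[:, None, :] - X[None, :, :])        # (n, n, p) covariates
    logits = nu + alpha[:, None] + beta[None, :] + Z @ gamma
    prob = 1.0 / (1.0 + np.exp(-logits))
    A = (rng.random((n, n)) < prob).astype(int)
    np.fill_diagonal(A, 0)                           # no self-loops
    return A, Z

rng = np.random.default_rng(42)
n, p = 50, 2
alpha, beta = rng.normal(size=n) / 2, rng.normal(size=n) / 2
X, gamma = rng.normal(size=(n, p)), np.array([0.5, -0.5])
A, Z = simulate_network(alpha, beta, gamma, X, nu=-1.0)
print(A.sum(axis=1).sum() == A.sum(axis=0).sum())    # True
```

The final check illustrates the bi-degree identity noted in Section 2: the sum of out-degrees always equals the sum of in-degrees.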

Exploring asymptotic theory in the covariate-$p_0$-model is challenging [e.g. Fienberg (2012); Yan et al. (2019)]. Restricted maximum likelihood inference has been used to derive the consistency and asymptotic normality of the restricted maximum likelihood estimator in Yan et al. (2019). However, in practice one often solves the unrestricted maximum likelihood problem, since it is difficult to determine the range of the unknown parameters, especially when the dimension of the parameter space grows. Therefore, developing unrestricted likelihood inference is both more interesting and more practically relevant.

Motivated by techniques for the analysis of unrestricted likelihood inference in the sparse $\beta$-model with covariates for undirected graphs in Yan (2021), we generalize that approach to directed graphs here. As the number of nodes $n$ goes to infinity, we derive the $\ell_\infty$-error between the maximum likelihood estimator (MLE) $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$ and its true value $(\boldsymbol{\eta},\boldsymbol{\gamma})$ by using a two-stage Newton process that first finds the error bound between $\widehat{\boldsymbol{\eta}}_{\gamma}$ and $\boldsymbol{\eta}$ with a fixed $\boldsymbol{\gamma}$, and then derives the error bound between $\widehat{\boldsymbol{\gamma}}$ and $\boldsymbol{\gamma}$. The errors are $O_p((\log n/n)^{1/2})$ for $\widehat{\boldsymbol{\eta}}$ and $O_p(\log n/n)$ for $\widehat{\boldsymbol{\gamma}}$, up to an additional factor depending on the parameters. Further, we derive the asymptotic normality of the MLE. The asymptotic distribution of $\widehat{\boldsymbol{\gamma}}$ has a bias term while that of $\widehat{\boldsymbol{\eta}}$ does not, which corroborates the findings in Yan et al. (2019). This is a consequence of the different convergence rates of $\widehat{\boldsymbol{\gamma}}$ and $\widehat{\boldsymbol{\eta}}$, which explains the bias phenomenon in Yan et al. (2019). Extensive simulations are carried out to demonstrate our theoretical findings. A real data example is also provided for illustration.

1.1 Related work

There is a large body of work concerned with degree heterogeneity. The $\beta$-model, so named by Chatterjee et al. (2011), is one of the most popular models for degree heterogeneity. Since the number of parameters increases with the number of nodes, asymptotic theory is nonstandard. Exploring the properties of the $\beta$-model and its generalizations has received wide attention in recent years (Chatterjee et al., 2011; Perry and Wolfe, 2012; Hillar and Wibisono, 2013; Yan and Xu, 2013; Rinaldo and Fienberg, 2013). In particular, Chatterjee et al. (2011) proved the uniform consistency of the maximum likelihood estimator (MLE) and Yan and Xu (2013) derived the asymptotic normality of the MLE. Yan et al. (2016b) further established a unified theoretical framework under which the consistency and asymptotic normality of the moment estimator hold in a class of node-parameter network models. In the directed case, Yan et al. (2016a) proved the consistency and asymptotic normality of the MLE in the $p_0$ model, which is a special case of the $p_1$ model of Holland and Leinhardt (1981).

Graham (2017) incorporated covariates into the $\beta$-model to model homophily in undirected networks. Dzemski (2019) and Yan et al. (2019) proposed directed graph models for edges with covariates, using a bivariate normal distribution and a logistic distribution, respectively. In these papers, the asymptotic normality of the restricted maximum likelihood estimator of the covariate parameter was derived under the assumption that the estimators of all parameters lie in one compact set. Yan et al. (2019) further derived the asymptotic normality of the restricted MLE of the node parameter. However, the convergence rate of the covariate parameter estimator is not investigated in their works. Yan (2021) studied unrestricted maximum likelihood estimation in a sparse $\beta$-model with covariates for undirected graphs and proved the uniform consistency and asymptotic normality of the MLE, where the consistency is proved via the convergence rate of the Newton iterative sequence in a two-stage Newton method. We demonstrate that this method can be further extended here. We also note that extending the result to directed graphs is not trivial, because the Fisher information matrix here is very different from that in the sparse $\beta$-model with covariates and the dimension of the parameter space is much larger.

Finally, we note that the model can be recast as a logistic regression model. There is a considerable amount of literature on asymptotic theory in generalized linear models with a growing number of parameters. Haberman (1977) established the asymptotic theory of the MLE in exponential response models in a setting that allows the number of parameters to go to infinity. Assuming a sequence of independent and identically distributed samples $\{X_i\}_{i=1}^n$ from a canonical exponential family $c(\theta)\exp(\theta^{\top}x)$ with increasing dimension $p_n$, Portnoy (1988) showed that $\|\widehat{\theta}-\theta^{*}\|=O_p(\sqrt{p_n/n})$ if $p_n/n=o(1)$, and that $\widehat{\theta}$ is asymptotically normal if $p_n^2/n=o(1)$, under some additional conditions. For clustered binary data $\{(Y_i,X_i)\}_{i=1}^n$ with a $p_n$-dimensional covariate $X_i$ and regression parameter $\theta$, Wang (2011) derived the asymptotic theory of generalized estimating equations estimators when $p_n^2/n=o(1)$. Liang and Du (2012) established the asymptotic normality of the maximum likelihood estimator when the number of covariates $p$ goes to infinity with the sample size $n$ at rate $p=o(n)$, but the error bound between $\widehat{\theta}$ and $\theta$ is not investigated. We note that asymptotic theory in these papers requires the condition $c_1 N\leq\lambda_{\min}\leq\lambda_{\max}\leq c_2 N$ for two positive constants $c_1$ and $c_2$, where $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum eigenvalues of $S_N=\sum_{i=1}^{N}x_i x_i^{\top}$ and $x_i$ is a $p_N$-dimensional covariate vector. In our model, the first $n$ diagonal entries of $S_N$ are of order $n$ while the last $p$ diagonal entries of $S_N$ are of order $n^2$. Because of these different orders of the diagonal elements of $S_N$, the ratio $\lambda_{\max}/\lambda_{\min}$ is of order $O(n^2)$, far from the constant order in Liang and Du (2012). It is also worth noting that the consistency and asymptotic normality of the MLE have been derived in exponential response models [Portnoy (1988)]. The asymptotic normality of $\widehat{\theta}$ does not contain a bias term in any of these papers, while the asymptotic distribution of $\widehat{\gamma}$ in our model does contain a bias term. This is referred to as the incidental parameter bias problem in the econometric literature [e.g., Graham (2017)].

For the remainder of the paper, we proceed as follows. In Section 2, we give the details of the model considered in this paper. In Section 3, we establish asymptotic results. Numerical studies are presented in Section 4. We provide further discussion and future work in Section 5. All proofs are relegated to the appendix.

2 Maximum Likelihood Estimation

Let $\boldsymbol{\alpha}=(\alpha_1,\ldots,\alpha_n)^{\top}$ and $\boldsymbol{\beta}=(\beta_1,\ldots,\beta_n)^{\top}$. If one transforms $(\nu,\boldsymbol{\alpha},\boldsymbol{\beta})$ to $(\nu+2c_2,\boldsymbol{\alpha}-c_1-c_2,\boldsymbol{\beta}+c_1-c_2)$ for arbitrary constants $c_1$ and $c_2$, the probability in (1) does not change. Thus, we need restrictions on the parameters to ensure model identifiability. For practical analysis we set the restriction $\alpha_n=0$ and $\beta_n=0$, which keeps the density parameter $\nu$. For the theoretical studies in the next section, we set $\nu=0$ and $\beta_n=0$ as in Yan et al. (2019) for convenience. Other restrictions are also possible, e.g., $\sum_i\alpha_i=0$ and $\sum_j\beta_j=0$, or $\nu=0$ and $\sum_i\alpha_i=0$.
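The non-identifiability can be checked numerically; the sketch below (our own illustration, with arbitrary parameter values) verifies that the transformed parameters give the same edge probability.

```python
import numpy as np

def edge_prob(nu, ai, bj, z_gamma):
    """Edge probability of model (1) given the linear-predictor pieces."""
    return 1.0 / (1.0 + np.exp(-(nu + ai + bj + z_gamma)))

nu, ai, bj, zg = 0.3, -0.7, 1.2, 0.4
c1, c2 = 0.9, -0.5
p0 = edge_prob(nu, ai, bj, zg)
# transformed parameters: (nu + 2*c2, alpha_i - c1 - c2, beta_j + c1 - c2)
p1 = edge_prob(nu + 2 * c2, ai - c1 - c2, bj + c1 - c2, zg)
print(np.isclose(p0, p1))  # True: the shifts cancel in the linear predictor
```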

Denote by $A=(a_{ij})_{n\times n}$ the adjacency matrix of $\mathcal{G}_n$. We assume that there are no self-loops, i.e., $a_{ii}=0$. We use $d_i=\sum_{j\neq i}a_{ij}$ to denote the out-degree of node $i$ and set $\mathbf{d}=(d_1,\ldots,d_n)^{\top}$. Similarly, $b_j=\sum_{i\neq j}a_{ij}$ denotes the in-degree of node $j$ and $\mathbf{b}=(b_1,\ldots,b_n)^{\top}$. The pairs $\{(b_1,d_1),\ldots,(b_n,d_n)\}$ form the so-called bi-degree sequence. Since an out-edge from node $i$ pointing to $j$ is an in-edge of $j$ coming from $i$, it is immediate that the sum of out-degrees equals the sum of in-degrees.

Since the random variables $a_{ij}$, $i\neq j$, are mutually independent given the covariates, the log-likelihood function is

\ell(\nu,\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{\gamma})=\sum_{i\neq j}a_{ij}(\nu+Z_{ij}^{\top}\boldsymbol{\gamma})+\boldsymbol{\alpha}^{\top}\mathbf{d}+\boldsymbol{\beta}^{\top}\mathbf{b}-\sum_{i\neq j}\log\big(1+\exp(\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma})\big). (2)
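A direct implementation of (2) can help fix notation; `log_likelihood` is our illustrative name, and `np.logaddexp(0, x)` evaluates $\log(1+e^x)$ stably.

```python
import numpy as np

def log_likelihood(A, Z, nu, alpha, beta, gamma):
    """Log-likelihood (2): sum over i != j of a_ij * pi_ij - log(1 + exp(pi_ij)),
    where pi_ij = nu + alpha_i + beta_j + Z_ij' gamma."""
    n = A.shape[0]
    pi = nu + alpha[:, None] + beta[None, :] + Z @ gamma
    off = ~np.eye(n, dtype=bool)                 # mask for i != j
    return np.sum(A[off] * pi[off] - np.logaddexp(0.0, pi[off]))
```

Note that $\sum_{i\neq j}a_{ij}\pi_{ij}$ expands exactly into the first three terms of (2), since $\sum_j a_{ij}=d_i$ and $\sum_i a_{ij}=b_j$.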

As discussed before, $\boldsymbol{\alpha}$ is a parameter vector tied to the out-degree sequence, $\boldsymbol{\beta}$ is a parameter vector tied to the in-degree sequence, and $\boldsymbol{\gamma}=(\gamma_1,\ldots,\gamma_p)^{\top}$ is a parameter vector tied to the node covariate information. It follows that the score equations are

\begin{array}[]{rcl}\sum\limits_{i,j=1;i\neq j}^{n}a_{ij}&=&\sum\limits_{i,j=1;i\neq j}^{n}\frac{e^{\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma}}}{1+e^{\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma}}},\\ \sum\limits_{i,j=1;i\neq j}^{n}a_{ij}Z_{ij}&=&\sum\limits_{i,j=1;i\neq j}^{n}\frac{Z_{ij}e^{\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma}}}{1+e^{\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma}}},\\ d_{i}&=&\sum\limits_{k=1,k\neq i}^{n}\frac{e^{\nu+\alpha_{i}+\beta_{k}+Z_{ik}^{\top}\boldsymbol{\gamma}}}{1+e^{\nu+\alpha_{i}+\beta_{k}+Z_{ik}^{\top}\boldsymbol{\gamma}}},~{}~{}~{}i=1,\ldots,n,\\ b_{j}&=&\sum\limits_{k=1,k\neq j}^{n}\frac{e^{\nu+\alpha_{k}+\beta_{j}+Z_{kj}^{\top}\boldsymbol{\gamma}}}{1+e^{\nu+\alpha_{k}+\beta_{j}+Z_{kj}^{\top}\boldsymbol{\gamma}}},~{}~{}j=1,\ldots,n.\end{array} (3)

Denote the MLE of $\ell(\nu,\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{\gamma})$ by $(\widehat{\nu},\widehat{\boldsymbol{\alpha}},\widehat{\boldsymbol{\beta}},\widehat{\boldsymbol{\gamma}})$. Note that $\widehat{\alpha}_n=\alpha_n=0$ and $\widehat{\beta}_n=\beta_n=0$ under the identifiability restriction. If $(\widehat{\nu},\widehat{\boldsymbol{\alpha}},\widehat{\boldsymbol{\beta}},\widehat{\boldsymbol{\gamma}})$ exists, then it is unique and satisfies the above equations.

As discussed in Yan et al. (2019), when the number of nodes $n$ is small, we can simply use the R function "glm" to solve (3). For relatively large $n$, this is no longer feasible, as storing the design matrix needed for $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ is too memory demanding. In this case, we recommend a two-step iterative algorithm that alternates between solving the third and fourth equations in (3) via the fixed point method in Yan et al. (2016a), and solving the first two equations in (3) via an existing algorithm for generalized linear models.
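A minimal sketch of such an alternating scheme, under the identification $\nu=0$, $\beta_n=0$: a heuristic fixed-point update for $(\boldsymbol{\alpha},\boldsymbol{\beta})$ alternates with a Newton step for $\boldsymbol{\gamma}$ based on the score equations (3). This is our simplified stand-in, not the exact algorithm of Yan et al. (2016a).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30, 30)))

def fit_two_step(A, Z, n_outer=20, n_inner=500, tol=1e-10):
    """Heuristic alternating estimation sketch for the covariate-p0 model."""
    n, p = A.shape[0], Z.shape[2]
    d, b = A.sum(axis=1), A.sum(axis=0)              # out- and in-degrees
    alpha, beta, gamma = np.zeros(n), np.zeros(n), np.zeros(p)
    off = ~np.eye(n, dtype=bool)
    for _ in range(n_outer):
        for _ in range(n_inner):                     # fixed point for (alpha, beta)
            mu = sigmoid(alpha[:, None] + beta[None, :] + Z @ gamma)
            np.fill_diagonal(mu, 0.0)
            a_new = alpha + np.log(d + 1e-12) - np.log(mu.sum(axis=1) + 1e-12)
            b_new = beta + np.log(b + 1e-12) - np.log(mu.sum(axis=0) + 1e-12)
            b_new -= b_new[-1]                       # enforce beta_n = 0
            step = max(np.abs(a_new - alpha).max(), np.abs(b_new - beta).max())
            alpha, beta = a_new, b_new
            if step < tol:
                break
        mu = sigmoid(alpha[:, None] + beta[None, :] + Z @ gamma)
        np.fill_diagonal(mu, 0.0)
        w = mu * (1.0 - mu)
        score = np.einsum('ijk,ij->k', Z, (A - mu) * off)   # Newton step for gamma
        info = np.einsum('ijk,ij,ijl->kl', Z, w * off, Z)
        gamma = gamma + np.linalg.solve(info, score)
    return alpha, beta, gamma
```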

3 Theoretical Properties

In this section, we set $\nu=0$ and $\beta_n=0$ as the model identifiability condition. The asymptotic properties of $\widehat{\nu}$ under the restriction $\alpha_n=\beta_n=0$ are the same as those of $\widehat{\alpha}_n$ under the restriction $\nu=\beta_n=0$.

We first introduce some notation. Let $\mathbb{R}=(-\infty,\infty)$ be the real domain. For a subset $C\subset\mathbb{R}^n$, let $C^0$ and $\overline{C}$ denote the interior and closure of $C$, respectively. For a vector $\mathbf{x}=(x_1,\ldots,x_n)^{\top}\in\mathbb{R}^n$, let $\|\mathbf{x}\|$ denote a general vector norm, with the special cases $\|\mathbf{x}\|_{\infty}=\max_{1\leq i\leq n}|x_i|$ and $\|\mathbf{x}\|_1=\sum_i|x_i|$ for the $\ell_\infty$- and $\ell_1$-norms of $\mathbf{x}$, respectively. When $n$ is fixed, all vector norms are equivalent. Let $B(\mathbf{x},\epsilon)=\{\mathbf{y}:\|\mathbf{x}-\mathbf{y}\|_{\infty}\leq\epsilon\}$ be an $\epsilon$-neighborhood of $\mathbf{x}$. For an $n\times n$ matrix $J=(J_{i,j})$, let $\|J\|_{\infty}$ denote the matrix norm induced by the $\ell_\infty$-norm on vectors in $\mathbb{R}^n$, i.e.,

\|J\|_{\infty}=\max_{x\neq 0}\frac{\|J\mathbf{x}\|_{\infty}}{\|\mathbf{x}\|_{\infty}}=\max_{1\leq i\leq n}\sum_{j=1}^{n}|J_{i,j}|,

and let $\|J\|$ be a general matrix norm. Define the matrix maximum norm $\|J\|_{\max}=\max_{i,j}|J_{ij}|$. The notation $\sum_i$ denotes summation over all $i=1,\ldots,n$, and $\sum_{i\neq j}$ is a shorthand for $\sum_{i=1}^{n}\sum_{j=1,j\neq i}^{n}$.

For convenience of our theoretical analysis, define $\boldsymbol{\eta}=(\alpha_1,\ldots,\alpha_n,\beta_1,\ldots,\beta_n)^{\top}$. We use the superscript "$*$" to denote the true parameter under which the data are generated. When there is no ambiguity, we omit the superscript "$*$". Further, define

\pi_{ij}:=\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma},~{}~{}\mu(x):=\frac{e^{x}}{1+e^{x}}.

Write $\mu^{\prime}$, $\mu^{\prime\prime}$ and $\mu^{\prime\prime\prime}$ for the first, second and third derivatives of $\mu(x)$ with respect to $x$, respectively. Direct calculations give

\mu^{\prime}(x)=\frac{e^{x}}{(1+e^{x})^{2}},~{}~{}\mu^{\prime\prime}(x)=\frac{e^{x}(1-e^{x})}{(1+e^{x})^{3}},~{}~{}\mu^{\prime\prime\prime}(x)=\frac{e^{x}(1-4e^{x}+e^{2x})}{(1+e^{x})^{4}}.

It is not difficult to verify

|\mu^{\prime}(x)|\leq\frac{1}{4},~{}~{}|\mu^{\prime\prime}(x)|\leq\frac{1}{4},~{}~{}|\mu^{\prime\prime\prime}(x)|\leq\frac{1}{4}. (4)
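These bounds can be checked numerically on a grid; the snippet below is our quick sanity check, not part of the proof.

```python
import numpy as np

x = np.linspace(-15, 15, 150001)
ex = np.exp(x)
mu1 = ex / (1 + ex) ** 2                             # mu'(x)
mu2 = ex * (1 - ex) / (1 + ex) ** 3                  # mu''(x)
mu3 = ex * (1 - 4 * ex + ex ** 2) / (1 + ex) ** 4    # mu'''(x)
print(mu1.max() <= 0.25, np.abs(mu2).max() <= 0.25, np.abs(mu3).max() <= 0.25)
# True True True; mu'(x) attains 1/4 exactly at x = 0
```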

Let $\epsilon_{n1}$ ($=o(1)$) and $\epsilon_{n2}$ ($\leq(\log n/n)^{1/2}/(pz_{\max})$) be two small positive numbers, where $z_{\max}:=\sup_{i,j}\|Z_{ij}\|_{\infty}$. Define

b_{n}:=\sup_{\boldsymbol{\eta}\in B(\boldsymbol{\eta}^{*},\epsilon_{n1}),\boldsymbol{\gamma}\in B(\boldsymbol{\gamma}^{*},\epsilon_{n2})}\max_{i\neq j}\frac{(1+e^{\pi_{ij}})^{2}}{e^{\pi_{ij}}}=O(e^{2\|\boldsymbol{\eta}^{*}\|_{\infty}+z_{\max}\|\boldsymbol{\gamma}^{*}\|_{\infty}}). (5)

It is easy to see that $b_n\geq 4$. When $\boldsymbol{\eta}\in B(\boldsymbol{\eta}^{*},\epsilon_{n1})$ and $\boldsymbol{\gamma}\in B(\boldsymbol{\gamma}^{*},\epsilon_{n2})$, we have $\mu^{\prime}(\pi_{ij})\geq 1/b_n$.

When it causes no confusion, we will simply write $\mu_{ij}$ instead of $\mu_{ij}(\boldsymbol{\eta},\boldsymbol{\gamma})$ for short, where

\mu_{ij}(\boldsymbol{\eta},\boldsymbol{\gamma})=\frac{\exp(\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma})}{1+\exp(\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma})}=\mu(\pi_{ij}).

Write the partial derivative of a function vector $F(\boldsymbol{\eta},\boldsymbol{\gamma})$ with respect to $\boldsymbol{\eta}$, evaluated at $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$, as

\frac{\partial F(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})}{\partial\boldsymbol{\eta}^{\top}}=\frac{\partial F(\boldsymbol{\eta},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\bigg{|}_{\boldsymbol{\eta}=\widehat{\boldsymbol{\eta}},\boldsymbol{\gamma}=\widehat{\boldsymbol{\gamma}}}.

Throughout the remainder of this paper, we make the following assumption.

Assumption 1.

Assume that $p$, the dimension of $Z_{ij}$, is fixed and that the support of $Z_{ij}$ is $\mathbb{Z}^p$, where $\mathbb{Z}$ is a compact subset of $\mathbb{R}$.

The above assumption is also made in Graham (2017) and Yan et al. (2019). In many real applications, the attributes of nodes have a fixed dimension and $Z_{ij}$ is bounded. For example, if the $Z_{ij}$'s are indicator variables such as sex, then the assumption holds. If $Z_{ij}$ is not bounded, we can apply the transformation $\tilde{Z}_{ij}=e^{Z_{ij}}/(1+e^{Z_{ij}})$.

3.1 Consistency

In order to establish the existence and consistency of $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$, we define a system of functions

F_{i}(\boldsymbol{\eta},\boldsymbol{\gamma})=\sum_{k=1;k\neq i}^{n}\frac{e^{\alpha_{i}+\beta_{k}+Z_{ik}^{\top}\boldsymbol{\gamma}}}{1+e^{\alpha_{i}+\beta_{k}+Z_{ik}^{\top}\boldsymbol{\gamma}}}-d_{i},~{}~{}~{}i=1,\ldots,n,
F_{n+j}(\boldsymbol{\eta},\boldsymbol{\gamma})=\sum_{k=1;k\neq j}^{n}\frac{e^{\alpha_{k}+\beta_{j}+Z_{kj}^{\top}\boldsymbol{\gamma}}}{1+e^{\alpha_{k}+\beta_{j}+Z_{kj}^{\top}\boldsymbol{\gamma}}}-b_{j},~{}~{}~{}j=1,\ldots,n-1, (6)
F(\boldsymbol{\eta},\boldsymbol{\gamma})=(F_{1}(\boldsymbol{\eta},\boldsymbol{\gamma}),\ldots,F_{2n-1}(\boldsymbol{\eta},\boldsymbol{\gamma}))^{\top},

which are based on the score equations for $\widehat{\boldsymbol{\eta}}$. Define $F_{\gamma,i}(\boldsymbol{\eta})$ to be the value of $F_i(\boldsymbol{\eta},\boldsymbol{\gamma})$ with $\boldsymbol{\gamma}$ fixed, and let $F_{\gamma}(\boldsymbol{\eta})$ be the corresponding function vector. Let $\widehat{\boldsymbol{\eta}}_{\gamma}$ be a solution to $F_{\gamma}(\boldsymbol{\eta})=0$ if it exists. Correspondingly, we define two functions for exploring the asymptotic behavior of the estimator of $\boldsymbol{\gamma}$:

Q(\boldsymbol{\eta},\boldsymbol{\gamma})=\sum_{i,j=1;i\neq j}^{n}Z_{ij}(\mu_{ij}(\boldsymbol{\eta},\boldsymbol{\gamma})-a_{ij}), (7)
Q_{c}(\boldsymbol{\gamma})=\sum_{i,j=1;i\neq j}^{n}Z_{ij}(\mu_{ij}(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})-a_{ij}). (8)

$Q_c(\boldsymbol{\gamma})$ can be viewed as the concentrated or profile score function obtained from $Q(\boldsymbol{\eta},\boldsymbol{\gamma})$ by profiling out the degree parameter $\boldsymbol{\eta}$. It is clear that

F(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})=0,~{}~{}F_{\gamma}(\widehat{\boldsymbol{\eta}}_{\gamma})=0,~{}~{}Q(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})=0,~{}~{}Q_{c}(\widehat{\boldsymbol{\gamma}})=0.

We first present the error bound between $\widehat{\boldsymbol{\eta}}_{\gamma}$, for $\boldsymbol{\gamma}\in B(\boldsymbol{\gamma}^{*},\epsilon_{n2})$, and $\boldsymbol{\eta}^{*}$. We construct the Newton iterative sequence $\{\boldsymbol{\eta}^{(k+1)}\}_{k=0}^{\infty}$ with initial value $\boldsymbol{\eta}^{*}$, where $\boldsymbol{\eta}^{(k+1)}=\boldsymbol{\eta}^{(k)}-[F_{\gamma}^{\prime}(\boldsymbol{\eta}^{(k)})]^{-1}F_{\gamma}(\boldsymbol{\eta}^{(k)})$ and $F_{\gamma}^{\prime}(\boldsymbol{\eta})=\partial F_{\gamma}(\boldsymbol{\eta})/\partial\boldsymbol{\eta}^{\top}$. If the iteration converges, then the solution lies in the neighborhood of $\boldsymbol{\eta}^{*}$. This is done by establishing a geometrically fast convergence rate of the iterative sequence. Details are given in the supplementary material. The error bound is stated below.
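The Newton step with $\boldsymbol{\gamma}$ fixed can be sketched directly, using the block structure of $F_{\gamma}^{\prime}(\boldsymbol{\eta})$ described in Section 3.2. Here `newton_eta` is our illustrative implementation under the identification $\nu=0$, $\beta_n=0$, started from zero rather than from $\boldsymbol{\eta}^{*}$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30, 30)))

def newton_eta(A, Z, gamma, max_iter=50, tol=1e-12):
    """Newton iteration solving F_gamma(eta) = 0 with gamma fixed,
    where eta = (alpha_1..alpha_n, beta_1..beta_{n-1}) and beta_n = 0."""
    n = A.shape[0]
    d, b = A.sum(axis=1), A.sum(axis=0)
    eta = np.zeros(2 * n - 1)
    for _ in range(max_iter):
        alpha, beta = eta[:n], np.append(eta[n:], 0.0)
        mu = sigmoid(alpha[:, None] + beta[None, :] + Z @ gamma)
        np.fill_diagonal(mu, 0.0)
        F = np.concatenate([mu.sum(axis=1) - d, (mu.sum(axis=0) - b)[:-1]])
        w = mu * (1.0 - mu)                      # u_ij = mu'(pi_ij), zero diagonal
        V = np.zeros((2 * n - 1, 2 * n - 1))     # Fisher matrix F'_gamma(eta)
        V[:n, :n] = np.diag(w.sum(axis=1))
        V[n:, n:] = np.diag(w.sum(axis=0)[:-1])
        V[:n, n:] = w[:, :-1]
        V[n:, :n] = w[:, :-1].T
        step = np.linalg.solve(V, F)
        eta = eta - step
        if np.abs(step).max() < tol:
            break
    return eta
```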

Lemma 1.

If $\boldsymbol{\gamma}\in B(\boldsymbol{\gamma}^{*},\epsilon_{n2})$ and $b_n=o((n/\log n)^{1/12})$, then as $n$ goes to infinity, with probability at least $1-O(1/n)$, $\widehat{\boldsymbol{\eta}}_{\gamma}$ exists and satisfies

\|\widehat{\boldsymbol{\eta}}_{\gamma}-\boldsymbol{\eta}^{*}\|_{\infty}=O_{p}\left(b_{n}^{3}\sqrt{\frac{\log n}{n}}\right)=o_{p}(1).

Further, if $\widehat{\boldsymbol{\eta}}_{\gamma}$ exists, it is unique.

By the chain rule, we have

0=\frac{\partial F_{\gamma}(\widehat{\boldsymbol{\eta}}_{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}=\frac{\partial F(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\frac{\partial\widehat{\boldsymbol{\eta}}_{\gamma}}{\partial\boldsymbol{\gamma}^{\top}}+\frac{\partial F(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}, (9)
\frac{\partial Q_{c}({\boldsymbol{\gamma}})}{\partial{\boldsymbol{\gamma}}^{\top}}=\frac{\partial Q(\widehat{\boldsymbol{\eta}}_{\gamma},{\boldsymbol{\gamma}})}{\partial\boldsymbol{\eta}^{\top}}\frac{\partial\widehat{\boldsymbol{\eta}}_{\gamma}}{\partial{\boldsymbol{\gamma}}^{\top}}+\frac{\partial Q(\widehat{\boldsymbol{\eta}}_{\gamma},{\boldsymbol{\gamma}})}{\partial{\boldsymbol{\gamma}}^{\top}}. (10)

By solving for $\partial\widehat{\boldsymbol{\eta}}_{\gamma}/\partial\boldsymbol{\gamma}^{\top}$ in (9) and substituting it into (10), we get the Jacobian matrix $Q_{c}^{\prime}(\boldsymbol{\gamma})$ ($=\partial Q_{c}(\boldsymbol{\gamma})/\partial\boldsymbol{\gamma}^{\top}$):

\frac{\partial Q_{c}(\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}=\frac{\partial Q(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}-\frac{\partial Q(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\left[\frac{\partial F(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\right]^{-1}\frac{\partial F(\widehat{\boldsymbol{\eta}}_{\gamma},\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}. (11)
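This Jacobian is a Schur complement of the $\boldsymbol{\eta}$-block of the full Fisher information matrix. A computational sketch, with weights $w_{ij}=\mu^{\prime}(\pi_{ij})$ supplied by the caller (`profile_information` is our illustrative name, not from the paper):

```python
import numpy as np

def profile_information(w, Z):
    """Schur-complement form of (11): Q_gamma - Q_eta * F_eta^{-1} * F_gamma.
    w: (n, n) weights mu'(pi_ij) with zero diagonal; Z: (n, n, p)."""
    n = w.shape[0]
    V = np.zeros((2 * n - 1, 2 * n - 1))         # eta-block (nu = 0, beta_n = 0)
    V[:n, :n] = np.diag(w.sum(axis=1))
    V[n:, n:] = np.diag(w.sum(axis=0)[:-1])
    V[:n, n:] = w[:, :-1]
    V[n:, :n] = w[:, :-1].T
    Qa = np.einsum('ijk,ij->ik', Z, w)           # cross terms with alpha_i
    Qb = np.einsum('ijk,ij->jk', Z, w)[:-1]      # cross terms with beta_j, j < n
    C = np.vstack([Qa, Qb])                      # (2n-1, p) cross block
    Qg = np.einsum('ijk,ij,ijl->kl', Z, w, Z)    # p x p gamma-block
    return Qg - C.T @ np.linalg.solve(V, C)

rng = np.random.default_rng(1)
n, p = 15, 2
w = rng.uniform(0.1, 0.2, size=(n, n))
np.fill_diagonal(w, 0.0)
Z = rng.normal(size=(n, n, p))
H = profile_information(w, Z)
print(H.shape)   # (2, 2)
```

Because the full Fisher matrix is positive semidefinite and its $\boldsymbol{\eta}$-block is positive definite, this Schur complement is symmetric and positive semidefinite.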

The asymptotic behavior of $\widehat{\boldsymbol{\gamma}}$ crucially depends on the Jacobian matrix $Q_{c}^{\prime}(\boldsymbol{\gamma})$. Since $\widehat{\boldsymbol{\eta}}_{\gamma}$ does not have a closed form, conditions imposed directly on $Q_{c}^{\prime}(\boldsymbol{\gamma})$ are not easily checked. To derive feasible conditions, we define

H(\boldsymbol{\eta},\boldsymbol{\gamma})=\frac{\partial Q(\boldsymbol{\eta},\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}-\frac{\partial Q(\boldsymbol{\eta},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\left[\frac{\partial F(\boldsymbol{\eta},\boldsymbol{\gamma})}{\partial\boldsymbol{\eta}^{\top}}\right]^{-1}\frac{\partial F(\boldsymbol{\eta},\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}^{\top}}, (12)

which is a general form of $\partial Q_{c}(\boldsymbol{\gamma})/\partial\boldsymbol{\gamma}^{\top}$. Because $H(\boldsymbol{\eta},\boldsymbol{\gamma})$ is the Fisher information matrix of the profile log-likelihood $\ell_{c}(\boldsymbol{\gamma})$, we assume that it is positive definite; otherwise, the covariate-$p_0$-model would be ill-conditioned. When $\boldsymbol{\eta}\in B(\boldsymbol{\eta}^{*},\epsilon_{n1})$, we have the equation

\frac{1}{n^{2}}H(\boldsymbol{\eta},\boldsymbol{\gamma}^{*})=\frac{1}{n^{2}}H(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})+o(1), (13)

whose proof is omitted. Note that the dimension of $H(\boldsymbol{\eta},\boldsymbol{\gamma})$ is fixed and each of its entries is a sum of $n(n-1)$ terms. We assume that there exists a number $\kappa_n$ such that

\sup_{\boldsymbol{\eta}\in B(\boldsymbol{\eta}^{*},\epsilon_{n1})}\|H^{-1}(\boldsymbol{\eta},\boldsymbol{\gamma}^{*})\|_{\infty}\leq\frac{\kappa_{n}}{n^{2}}. (14)

If $n^{-2}H(\boldsymbol{\eta},\boldsymbol{\gamma}^{*})$ converges to a constant matrix, then $\kappa_n$ is bounded. Because $H(\boldsymbol{\eta},\boldsymbol{\gamma}^{*})$ is positive definite, one can take

\kappa_{n}=\sqrt{p}\times\sup_{\boldsymbol{\eta}\in B(\boldsymbol{\eta}^{*},\epsilon_{n1})}1/\lambda_{\min}(\boldsymbol{\eta}),

where $\lambda_{\min}(\boldsymbol{\eta})$ is the smallest eigenvalue of $n^{-2}H(\boldsymbol{\eta},\boldsymbol{\gamma}^{*})$. Now we formally state the consistency result.

Theorem 1.

If $\kappa_{n}^{2}b_{n}^{18}=o(n/\log n)$, then the MLE $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$ exists with probability at least $1-O(1/n)$, and is consistent in the sense that

\|\widehat{\boldsymbol{\gamma}}-\boldsymbol{\gamma}^{*}\|_{\infty}=O_{p}\left(\frac{\kappa_{n}b_{n}^{9}\log n}{n}\right)=o_{p}(1),
\|\widehat{\boldsymbol{\eta}}-\boldsymbol{\eta}^{*}\|_{\infty}=O_{p}\left(b_{n}^{3}\sqrt{\frac{\log n}{n}}\right)=o_{p}(1).

Further, if the MLE $(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}})$ exists, it is unique.

From the above theorem, we see that $\widehat{\boldsymbol{\gamma}}$ has convergence rate $1/n$ and $\widehat{\boldsymbol{\eta}}$ has convergence rate $1/n^{1/2}$, up to a logarithmic factor, provided that $b_n$ and $\kappa_n$ are bounded above by a constant. If $\|\boldsymbol{\gamma}^{*}\|_{\infty}$ and $\|\boldsymbol{\eta}^{*}\|_{\infty}$ are constants, then $b_n$ and $\kappa_n$ are constants as well, so that the condition in Theorem 1 holds.

As an application of the $\ell_\infty$-error bound, we use it to recover the nonzero signals among the node parameters. We need a separation measure $\Delta_n$ to distinguish nonzero signals from zero signals among the $\beta_i$'s. Define $\Xi=\{i:\beta_{i}^{*}>\Delta_n\}$ as the set of signals. Because

\hat{\beta}_{i}\geq\beta_{i}^{*}-|\hat{\beta}_{i}-\beta_{i}^{*}|\geq\Delta_{n}-O_{p}\left(b_{n}^{3}\sqrt{\frac{\log n}{n}}\right),~{}i\in\Xi,

we immediately have the following corollary.

Corollary 2.

If $\Delta_n\gg b_n^{3}(\log n/n)^{1/2}$ and $\kappa_n^{2}b_n^{18}=o(n/\log n)$, then with probability at least $1-O(n^{-1})$, the set $\Xi$ can be exactly recovered by its MLE $\widehat{\Xi}=\{i:\hat{\beta}_i\geq\Delta_n\}$.
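The recovery rule of the corollary is a simple thresholding of the MLE. A toy illustration with hypothetical estimates (the noise vector stands in for the estimation error, which the corollary requires to be much smaller than $\Delta_n$):

```python
import numpy as np

def recover_signals(beta_hat, delta_n):
    """Estimate Xi = {i : beta_i^* > delta_n} by thresholding the MLE."""
    return {i for i, bh in enumerate(beta_hat) if bh >= delta_n}

beta_star = np.array([0.0, 0.0, 1.5, 2.0, 0.0])
noise = np.array([0.05, -0.03, 0.04, -0.02, 0.01])   # stand-in for estimation error
beta_hat = beta_star + noise
print(recover_signals(beta_hat, delta_n=0.5))  # {2, 3}
```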

3.2 Asymptotic normality of 𝜼^\widehat{\boldsymbol{\eta}} and 𝜸^\widehat{\boldsymbol{\gamma}}

The asymptotic distribution of $\widehat{\boldsymbol{\eta}}$ depends crucially on the inverse of the Fisher information matrix $F_{\gamma}^{\prime}(\boldsymbol{\eta})$, which does not have a closed form. In order to characterize this matrix, we introduce a general class of matrices that encompasses the Fisher matrix. Given two positive numbers $m$ and $M$ with $M\geq m>0$, we say that a $(2n-1)\times(2n-1)$ matrix $V=(v_{i,j})$ belongs to the class $\mathcal{L}_{n}(m,M)$ if the following holds:

\begin{array}[]{l}0\leq v_{i,i}-\sum_{j=1,j\neq i}^{2n-1}v_{i,j}\leq M,~{}~{}i=1,\ldots,2n-1,\\ v_{i,j}=0,~{}~{}i,j=1,\ldots,n,~{}i\neq j,\\ v_{i,j}=0,~{}~{}i,j=n+1,\ldots,2n-1,~{}i\neq j,\\ m\leq v_{i,j}=v_{j,i}\leq M,~{}~{}i=1,\ldots,n,~{}j=n+1,\ldots,2n-1,~{}j\neq n+i.\\ \end{array} (15)

Clearly, if Vn(m,M)V\in\mathcal{L}_{n}(m,M), then VV is a (2n1)×(2n1)(2n-1)\times(2n-1) diagonally dominant, symmetric nonnegative matrix, and hence positive definite. The definition of n(m,M)\mathcal{L}_{n}(m,M) here is wider than that in Yan et al. (2016a), where for some rows the diagonal element is required to equal the sum of the off-diagonal elements in that row. One can easily show that the Fisher information matrix for the vector parameter 𝜼\boldsymbol{\eta} belongs to n(m,M)\mathcal{L}_{n}(m,M). With some abuse of notation, we use VV to denote this Fisher information matrix.

To describe the exact form of elements of VV, vijv_{ij} for i,j=1,,2n1i,j=1,\ldots,2n-1, iji\neq j, we define

uij=μ(πij),uii=0,ui=j=1nuij,uj=i=1nuij.u_{ij}=\mu^{\prime}(\pi_{ij}),~{}~{}u_{ii}=0,~{}~{}u_{i\cdot}=\sum_{j=1}^{n}u_{ij},~{}~{}u_{\cdot j}=\sum_{i=1}^{n}u_{ij}.

Note that uiju_{ij} is the variance of aija_{ij}. Then the elements of VV are

vij={ui,i=j=1,,n,ui,jn,i=1,,n,j=n+1,,2n1,ji+n,uin,j,i=n+1,,2n1,j=1,,n,jin,u(jn),i=j=n+1,,2n1,0,otherwise.v_{ij}=\begin{cases}u_{i\cdot},&i=j=1,\ldots,n,\\ u_{i,j-n},&i=1,\ldots,n,~{}j=n+1,\ldots,2n-1,~{}j\neq i+n,\\ u_{i-n,j},&i=n+1,\ldots,2n-1,~{}j=1,\ldots,n,~{}j\neq i-n,\\ u_{\cdot(j-n)},&i=j=n+1,\ldots,2n-1,\\ 0,&\mbox{otherwise}.\end{cases}

Yan et al. (2016a) proposed to approximate the inverse V1V^{-1} by the matrix S=(si,j)S=(s_{i,j}), which is defined as

si,j={δi,jui+1un,i,j=1,,n,1un,i=1,,n,j=n+1,,2n1,1un,i=n+1,,2n1,j=1,,n,δi,ju(jn)+1un,i,j=n+1,,2n1,s_{i,j}=\left\{\begin{array}[]{ll}\frac{\delta_{i,j}}{u_{i\cdot}}+\frac{1}{u_{\cdot n}},&i,j=1,\ldots,n,\\ -\frac{1}{u_{\cdot n}},&i=1,\ldots,n,~{}~{}j=n+1,\ldots,2n-1,\\ -\frac{1}{u_{\cdot n}},&i=n+1,\ldots,2n-1,~{}~{}j=1,\ldots,n,\\ \frac{\delta_{i,j}}{u_{\cdot(j-n)}}+\frac{1}{u_{\cdot n}},&i,j=n+1,\ldots,2n-1,\end{array}\right. (16)

where δi,j=1\delta_{i,j}=1 when i=ji=j and δi,j=0\delta_{i,j}=0 when iji\neq j.
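As a quick numerical check, one can assemble VV from the uiju_{ij} and compare its exact inverse with the approximation SS of (16). The sketch below assumes a logistic edge variance uij=μ(αi+βj)u_{ij}=\mu^{\prime}(\alpha_{i}+\beta_{j}) with hypothetical bounded parameters; even for moderate nn the maximum entrywise gap between V1V^{-1} and SS is small:

```python
import numpy as np

def mu_prime(x):                       # variance of a Bernoulli(mu(x)) edge, mu logistic
    p = 1.0 / (1.0 + np.exp(-x))
    return p * (1.0 - p)

rng = np.random.default_rng(1)
n = 30
alpha = rng.uniform(-0.5, 0.5, n)      # hypothetical bounded node parameters
beta = rng.uniform(-0.5, 0.5, n)

u = mu_prime(alpha[:, None] + beta[None, :])   # u_ij = mu'(pi_ij)
np.fill_diagonal(u, 0.0)                       # u_ii = 0 (no self-loops)

# Fisher information V for eta = (alpha_1..alpha_n, beta_1..beta_{n-1})
V = np.zeros((2 * n - 1, 2 * n - 1))
V[np.arange(n), np.arange(n)] = u.sum(axis=1)                                  # u_{i.}
V[np.arange(n, 2 * n - 1), np.arange(n, 2 * n - 1)] = u.sum(axis=0)[: n - 1]   # u_{.j}
V[:n, n:] = u[:, : n - 1]                                                      # v_{i,n+j} = u_{ij}
V[n:, :n] = u[:, : n - 1].T

# approximate inverse S from (16): diagonal part plus a +/- 1/u_{.n} correction
S = np.full_like(V, 1.0 / u.sum(axis=0)[n - 1])
S[:n, n:] *= -1.0
S[n:, :n] *= -1.0
S[np.arange(2 * n - 1), np.arange(2 * n - 1)] += 1.0 / np.diag(V)

gap = np.max(np.abs(np.linalg.inv(V) - S))
print(gap)
```

The gap shrinks at the rate suggested by the theory as nn grows, which is what makes SS a usable surrogate for V1V^{-1} in the asymptotic analysis.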

We derive the asymptotic normality of 𝜼^\widehat{\boldsymbol{\boldsymbol{\eta}}} by approximately representing 𝜼^\boldsymbol{\widehat{\boldsymbol{\eta}}} as a function of 𝐝\mathbf{d} and 𝐛\mathbf{b} with an explicit expression. This is done by applying a second-order Taylor expansion to F(𝜼^,𝜸^)F(\widehat{\boldsymbol{\eta}},\widehat{\boldsymbol{\gamma}}) and showing that the various remainder terms are asymptotically negligible. Details are given in the supplementary material.

Theorem 3.

If κnbn12=o(n1/2/logn)\kappa_{n}b_{n}^{12}=o(n^{1/2}/\log n), then for any fixed k1k\geq 1, as nn\to\infty, the vector consisting of the first kk elements of (𝛈^𝛈)(\widehat{\boldsymbol{\boldsymbol{\eta}}}-\boldsymbol{\boldsymbol{\eta}}^{*}) is asymptotically multivariate normal with mean 𝟎\mathbf{0} and covariance matrix given by the upper left k×kk\times k block of SS defined in (16).

Remark 1.

By Theorem 3, for any fixed ii, as nn\rightarrow\infty, the asymptotic variance of 𝛈^i\hat{\boldsymbol{\eta}}_{i} is 1/vi,i1/v_{i,i}, whose square root is of magnitude between O(n1/2)O(n^{-1/2}) and O(n1/2e𝛈)O(n^{-1/2}e^{\|\boldsymbol{\boldsymbol{\eta}}^{*}\|_{\infty}}).

Remark 2.

Under the restriction αn=βn=0\alpha_{n}=\beta_{n}=0, the scaled MLE (u1+un)1/2(ν^ν)(u_{1\cdot}+u_{\cdot n})^{-1/2}(\widehat{\nu}-\nu) of the density parameter ν\nu converges in distribution to the standard normal distribution.

By arguments similar to those used to prove the asymptotic normality of the restricted MLE for 𝜸{\boldsymbol{\gamma}} in Yan et al. (2019), the unrestricted MLE enjoys the same asymptotic normality. We only present the result here and omit the proof. Let

Vηγ=F(𝜼,𝜸)𝜸,Vγγ=Q(𝜼,𝜸)𝜸.V_{\eta\gamma}=\frac{\partial F(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})}{\partial\boldsymbol{\gamma}^{\top}},~{}~{}V_{\gamma\gamma}=\frac{\partial Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})}{\partial\boldsymbol{\gamma}^{\top}}.

Following Amemiya (1985) (p. 126), the Hessian matrix of the concentrated log-likelihood function c(𝜸)\ell^{c}(\boldsymbol{\gamma}^{*}) is VγγVηγV1VηγV_{\gamma\gamma}-V_{\eta\gamma}^{\top}V^{-1}V_{\eta\gamma}. To state the form of the limit distribution of 𝜸^\hat{\boldsymbol{\gamma}}, define

In(𝜸)=1(n1)n(VγγVηγV1Vηγ).I_{n}(\boldsymbol{\gamma}^{*})=\frac{1}{(n-1)n}(V_{\gamma\gamma}-V_{\eta\gamma}^{\top}V^{-1}V_{\eta\gamma}). (17)

Assume that the limit of In(𝜸)I_{n}(\boldsymbol{\gamma}^{*}) exists as nn goes to infinity, denoted by I(𝜸)I_{*}(\boldsymbol{\gamma}^{*}). Then, we have the following theorem.

Theorem 4.

If κnbn12=o(n1/2/(logn)3/2)\kappa_{n}b_{n}^{12}=o(n^{1/2}/(\log n)^{3/2}), then as nn goes to infinity, the pp-dimensional vector N1/2(𝛄^𝛄)N^{1/2}(\widehat{\boldsymbol{\gamma}}-\boldsymbol{\gamma}^{*}) is asymptotically multivariate normal with mean I1(𝛄)BI_{*}^{-1}(\boldsymbol{\gamma}^{*})B_{*} and covariance matrix I1(𝛄)I_{*}^{-1}(\boldsymbol{\gamma}^{*}), where N=n(n1)N=n(n-1) and BB_{*} is the bias term:

B=limn12N[i=0nj=0,jinμ′′(πij)Zijj=0,jinμ(πij)+j=0ni=0,ijnμ′′(πij)Ziji=0,ijnμ(πij)].B_{*}=\lim_{n\to\infty}\frac{1}{2\sqrt{N}}\left[\sum_{i=0}^{n}\frac{\sum_{j=0,j\neq i}^{n}\mu^{\prime\prime}(\pi_{ij}^{*})Z_{ij}}{\sum_{j=0,j\neq i}^{n}\mu^{\prime}(\pi_{ij}^{*})}+\sum_{j=0}^{n}\frac{\sum_{i=0,i\neq j}^{n}\mu^{\prime\prime}(\pi_{ij}^{*})Z_{ij}}{\sum_{i=0,i\neq j}^{n}\mu^{\prime}(\pi_{ij}^{*})}\right].

The limiting distribution of 𝜸^\boldsymbol{\widehat{\gamma}} involves a bias term

μ=I1(𝜸)Bn(n1).\mu_{*}=\frac{I_{*}^{-1}(\boldsymbol{\gamma}^{*})B_{*}}{\sqrt{n(n-1)}}.

If all the parameters 𝜸\boldsymbol{\gamma} and 𝜼\boldsymbol{\eta} are bounded, then B=O(1)B_{*}=O(1) and (I)i,j=O(1)(I_{*})_{i,j}=O(1) according to their expressions, so that μ=O(n1/2)\mu_{*}=O(n^{-1/2}). Since the MLE 𝜸^\boldsymbol{\widehat{\gamma}} is not centered at the true parameter value, the confidence intervals and the p-values of hypothesis tests constructed from 𝜸^\boldsymbol{\widehat{\gamma}} cannot achieve the nominal level without bias correction, even under the null 𝜸=0\boldsymbol{\gamma}^{*}=0. This is the incidental parameter problem in the econometric literature [Neyman and Scott (1948)]; the bias arises from the presence of the additional node parameters. As discussed in Yan et al. (2019), we can use the analytical bias-correction formula 𝜸^bc=𝜸^I^1B^/n(n1)\boldsymbol{\widehat{\gamma}}_{bc}=\boldsymbol{\widehat{\gamma}}-\hat{I}^{-1}\hat{B}/\sqrt{n(n-1)}, where I^\hat{I} and B^\hat{B} are the estimates of II_{*} and BB_{*} obtained by replacing 𝜸\boldsymbol{\gamma} and 𝜼\boldsymbol{\eta} in their expressions with their MLEs 𝜸^\boldsymbol{\widehat{\gamma}} and 𝜼^\boldsymbol{\widehat{\eta}}, respectively.
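The correction step itself is a one-line computation once the plug-in estimates are available. A minimal sketch follows; the numerical values of γ^\widehat{\gamma}, I^\hat{I} and B^\hat{B} below are hypothetical, since computing the plug-in estimates from data is the nontrivial part:

```python
import numpy as np

def bias_corrected_gamma(gamma_hat, I_hat, B_hat, n):
    """Analytical correction gamma_bc = gamma_hat - I_hat^{-1} B_hat / sqrt(n(n-1))."""
    return gamma_hat - np.linalg.solve(I_hat, B_hat) / np.sqrt(n * (n - 1))

# hypothetical plug-in values, only to exercise the formula
gamma_hat = np.array([1.02, 1.48])
I_hat = 0.5 * np.eye(2)
B_hat = np.array([0.3, -0.2])

gamma_bc = bias_corrected_gamma(gamma_hat, I_hat, B_hat, n=100)
print(gamma_bc)
```

The shift is of order n1n^{-1} here (the bias term μ\mu_{*} scaled by n(n1)\sqrt{n(n-1)}), which is why the uncorrected intervals undercover only mildly for small covariate effects but noticeably as the node parameters grow.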

4 Numerical Studies

In this section, we evaluate the asymptotic results of the MLEs for model (2) through simulation studies and a real data example.

4.1 Simulation studies

Similar to Yan et al. (2019), the parameter values take a linear form. Specifically, we set αi=iclogn/n\alpha_{i}^{*}=ic\log n/n for i=0,,ni=0,\ldots,n and let βi=αi\beta_{i}^{*}=\alpha_{i}^{*}, i=0,,ni=0,\ldots,n, for simplicity; note that there are n+1n+1 nodes in the simulations. We consider four values of cc: c{0,0.2,0.4,0.6}c\in\{0,0.2,0.4,0.6\}. By allowing the true values of 𝜶\boldsymbol{\alpha} and 𝜷\boldsymbol{\beta} to grow with nn, we intend to assess the asymptotic properties under different asymptotic regimes. Similar to Yan et al. (2019), we set p=2p=2 here. The first element of the 22-dimensional node-specific covariate XiX_{i} is independently generated from a Beta(2,2)Beta(2,2) distribution and the second element follows a discrete distribution taking values 11 and 1-1 with probabilities 0.30.3 and 0.70.7, respectively. The pairwise covariates are formed as Zij=(|Xi1Xj1|,Xi2Xj2)Z_{ij}=(|X_{i1}-X_{j1}|,X_{i2}*X_{j2})^{\top}. The parameter 𝜸\boldsymbol{\gamma}^{*} is set to (1,1)(1,1)^{\top}. The density parameter is ν=logn/4\nu=-\log n/4.
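The data-generating design above can be sketched in a short script. We assume the logistic form of model (2), πij=ν+αi+βj+Zij𝜸\pi_{ij}=\nu+\alpha_{i}+\beta_{j}+Z_{ij}^{\top}\boldsymbol{\gamma} with edge probability μ(πij)=eπij/(1+eπij)\mu(\pi_{ij})=e^{\pi_{ij}}/(1+e^{\pi_{ij}}); all function and variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(2023)
n = 100                                   # nodes are labelled 0..n, so n+1 in total
c = 0.2

# linear-form node parameters of the simulation design
alpha = c * np.log(n) * np.arange(n + 1) / n
beta = alpha.copy()
gamma = np.array([1.0, 1.0])
nu = -np.log(n) / 4                       # density (sparsity) parameter

# node covariates: Beta(2,2) and a +/-1 label with P(1) = 0.3
x1 = rng.beta(2, 2, n + 1)
x2 = rng.choice([1.0, -1.0], size=n + 1, p=[0.3, 0.7])

# pairwise covariates Z_ij = (|x1_i - x1_j|, x2_i * x2_j)
Z = np.stack([np.abs(x1[:, None] - x1[None, :]), x2[:, None] * x2[None, :]], axis=-1)

# edge probabilities mu(pi_ij) with pi_ij = nu + alpha_i + beta_j + Z_ij' gamma
pi = nu + alpha[:, None] + beta[None, :] + Z @ gamma
p = 1.0 / (1.0 + np.exp(-pi))
A = (rng.random((n + 1, n + 1)) < p).astype(int)
np.fill_diagonal(A, 0)                    # no self-loops

d, b = A.sum(axis=1), A.sum(axis=0)       # out- and in-degrees
print(A.shape, d.mean())
```

Feeding the resulting degree sequences and covariates into the likelihood equations then produces one replication of the Monte Carlo experiment.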

Note that by Theorem 3, ξ^i,j=[α^iα^j(αiαj)]/(1/v^i,i+1/v^j,j)1/2\hat{\xi}_{i,j}=[\hat{\alpha}_{i}-\hat{\alpha}_{j}-(\alpha_{i}^{*}-\alpha_{j}^{*})]/(1/\hat{v}_{i,i}+1/\hat{v}_{j,j})^{1/2}, ζ^i,j=(α^i+β^jαiβj)/(1/v^i,i+1/v^n+j,n+j)1/2\hat{\zeta}_{i,j}=(\hat{\alpha}_{i}+\hat{\beta}_{j}-\alpha_{i}^{*}-\beta_{j}^{*})/(1/\hat{v}_{i,i}+1/\hat{v}_{n+j,n+j})^{1/2}, and η^i,j=[β^iβ^j(βiβj)]/(1/v^n+i,n+i+1/v^n+j,n+j)1/2\hat{\eta}_{i,j}=[\hat{\beta}_{i}-\hat{\beta}_{j}-(\beta_{i}^{*}-\beta_{j}^{*})]/(1/\hat{v}_{n+i,n+i}+1/\hat{v}_{n+j,n+j})^{1/2} are all asymptotically distributed as standard normal random variables, where v^i,i\hat{v}_{i,i} is the estimate of vi,iv_{i,i} obtained by replacing (𝜸,𝜼)(\boldsymbol{\gamma}^{*},\boldsymbol{\eta}^{*}) with (𝜸^,𝜼^)(\boldsymbol{\widehat{\gamma}},\boldsymbol{\widehat{\eta}}). We record the coverage probability of the 95% confidence interval, the length of the confidence interval, and the frequency that the MLE does not exist. The results for ξ^i,j\hat{\xi}_{i,j}, ζ^i,j\hat{\zeta}_{i,j} and η^i,j\hat{\eta}_{i,j} are similar, so only the results for ξ^i,j\hat{\xi}_{i,j} are reported. Each simulation is repeated 5,0005,000 times.

Table 1 reports the coverage probability of the 95% confidence interval for αiαj\alpha_{i}-\alpha_{j}, the length of the confidence interval, as well as the frequency that the MLE did not exist. As we can see, the length of the confidence interval increases as cc increases and decreases as nn increases, which qualitatively agrees with the theory. The coverage frequencies are all close to the nominal level when c=0c=0, 0.20.2 or 0.40.4. When c=0.6c=0.6, the MLE failed to exist with positive frequency.

Table 1: The reported values are the coverage frequency (×100%\times 100\%) for αiαj\alpha_{i}-\alpha_{j} for a pair (i,j)(i,j) / the length of the confidence interval / the frequency (×100%\times 100\%) that the MLE did not exist. The pair (0,0)(0,0) indicates those values for the density parameter ν\nu.
n (i,j)(i,j) c=0c=0 c=0.2c=0.2 c=0.4c=0.4 c=0.6c=0.6
100 (1,2) 94.9/1.28/0 95.02/1.26/0 94.58/1.29/0 94.76/1.35/30.88
(50,51) 95.4/1.28/0 94.34/1.27/0 94.98/1.39/0 95.14/1.68/30.88
(99,100) 94.7/1.28/0 94.54/1.31/0 94.58/1.68/0 97.57/2.55/30.88
(1,100) 95.12/1.28/0 94.5/1.29/0 94.8/1.5/0 96.53/1.99/30.88
(1,50) 94.64/1.28/0 94.52/1.26/0 94.6/1.34/0 94.39/1.52/30.88
(0,0) 94.26/1.29/0 94.66/1.27/0 94.2/1.29/0 94.36/1.36/30.88
200 (1,2) 95.12/0.92/0 95.3/0.89/0 94.34/0.91/0 94.73/0.96/7.74
(100,101) 94.86/0.92/0 94.92/0.89/0 94.98/1/0 94.58/1.25/7.74
(199,200) 94.54/0.93/0 94.74/0.92/0 94.88/1.26/0 96.4/2.1/7.74
(1,200) 94.84/0.92/0 94.94/0.91/0 95.52/1.11/0 96.05/1.61/7.74
(1,100) 95.04/0.92/0 95.66/0.89/0 95.26/0.96/0 94.39/1.12/7.74
(0,0) 94.62/0.92/0 94.9/0.89/0 94.78/0.91/0 94.17/0.96/7.74

Table 2 reports the simulation results for the MLE 𝜸^\boldsymbol{\widehat{\gamma}} and the bias-corrected estimate 𝜸^bc(=𝜸^I^1B^/n(n1))\boldsymbol{\widehat{\gamma}}_{bc}(=\boldsymbol{\widehat{\gamma}}-\hat{I}^{-1}\hat{B}/\sqrt{n(n-1)}) at the nominal level 95%95\%. The coverage frequencies for the uncorrected estimate γ^2\widehat{\gamma}_{2} are visibly below the nominal level, while those for the bias-corrected estimate are close to the nominal level when c0.2c\leq 0.2, a dramatic improvement over the uncorrected estimate. As expected, the standard error increases with cc and decreases with nn.

Table 2: The reported values are the coverage frequency (×100%\times 100\%) for γi\gamma_{i} with the bias-corrected estimate (uncorrected estimate in parentheses) / the length of the confidence interval / the frequency (×100%\times 100\%) that the MLE did not exist (𝜸=(1,1.5)\boldsymbol{\gamma}^{*}=(1,1.5)^{\top}).
n c γ1\gamma_{1} γ2\gamma_{2}
100 0 93.72(93.72)/0.58/0 92.72(86.62)/0.11/0
0.2 93.12(92.88)/0.57/0 93.4(83.58)/0.1/0
0.4 91.26(92.96)/0.62/0 73.04(86.06)/0.12/0
0.6 85.24(92.62)/0.73/30.88 43.66(87.44)/0.15/30.88
200 0 92.9(93.94)/0.29/0 92.38(88.04)/0.06/0
0.2 94.14(93.76)/0.28/0 93.32(84.22)/0.05/0
0.4 90.5(93.00)/0.31/0 72.36(86.76)/0.06/0
0.6 83.29(93.84)/0.38/7.74 40.93(88.64)/0.08/7.74

4.2 A data example

Yan et al. (2019) analyzed the friendship network among the 71 attorneys in Lazega's lawyer data set (Lazega, 2001) without the density parameter ν\nu; the data can be downloaded from https://www.stats.ox.ac.uk/~snijders/siena/Lazega_lawyers_data.htm. We revisit this data set here and use the covariate-p0p_{0}-model with the density parameter for the analysis. The covariates collected at the node level include X1X_{1}=status (1=partner; 2=associate), X2X_{2}=gender (1=man; 2=woman), X3X_{3}=location (1=Boston; 2=Hartford; 3=Providence), X4X_{4}=years (with the firm), X5X_{5}=age, X6X_{6}=practice (1=litigation; 2=corporate) and X7X_{7}=law school (1: Harvard, Yale; 2: UConn; 3: other). We denote the covariates of individual ii by (Xi1,,Xi7)(X_{i1},\ldots,X_{i7})^{\top}. The network graphs with the node attributes status and practice were drawn in Yan et al. (2019). Here we complement them with graphs using the node attributes gender and location in Figure 1. From this figure, we can see that there are more links between lawyers of the same gender and office location, which exhibits homophily in gender and location.

Refer to caption
Refer to caption
Figure 1: Visualization of Lazega's friendship network among lawyers. Two isolated nodes exist. The vertex sizes are proportional to nodal in-degrees in (a) and out-degrees in (b). The positions of the vertices are the same in (a) and (b). For nodes with degree less than 5, we set their sizes to that of a node with degree 5. In (a), the colors indicate gender (red for male and blue for female); in (b), the colors represent location (black for Boston, green for Hartford, red for Providence).

Let 1{}1_{\{\cdot\}} be an indicator function. As in Yan et al. (2019), for the categorical variables we define the similarity between two individuals ii and jj by the indicator Zij,k=1{Xik=Xjk}Z_{ij,k}=1_{\{X_{ik}=X_{jk}\}}, k=1,2,3,6,7k=1,2,3,6,7. For the numerical variables, we define the similarity by the absolute distance Zij,k=|XikXjk|Z_{ij,k}=|X_{ik}-X_{jk}|, k=4,5k=4,5. Individuals are labelled from 1 to 71. Before the analysis, we removed those individuals whose in-degrees or out-degrees are zero and performed the analysis on the remaining 63 vertices. Since the change of model identification conditions has no influence on the analysis of the covariates, the variables status, gender, office location and practice have a significant influence on link formation, while the influence of years, age, and law school is not significant, as shown in Table 4 in Yan et al. (2019).
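The construction of the pairwise covariates ZijZ_{ij} can be sketched as below, with synthetic stand-in attributes in place of the actual Lazega covariates (which are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 63  # vertices remaining after dropping zero-degree nodes

# synthetic stand-ins for the seven nodal attributes; the real values come
# from the Lazega lawyer data set
X = np.column_stack([
    rng.integers(1, 3, n),     # status
    rng.integers(1, 3, n),     # gender
    rng.integers(1, 4, n),     # location
    rng.integers(1, 33, n),    # years with the firm
    rng.integers(26, 66, n),   # age
    rng.integers(1, 3, n),     # practice
    rng.integers(1, 4, n),     # law school
])

categorical = [0, 1, 2, 5, 6]  # columns using the match indicator 1{X_ik = X_jk}
numerical = [3, 4]             # columns using the absolute distance |X_ik - X_jk|

Z = np.empty((n, n, 7))
for k in categorical:
    Z[:, :, k] = (X[:, k][:, None] == X[:, k][None, :]).astype(float)
for k in numerical:
    Z[:, :, k] = np.abs(X[:, k][:, None] - X[:, k][None, :])
print(Z.shape)
```

Each slice Z[:, :, k] is then a symmetric pairwise-similarity (or distance) matrix, and the vector (Zij,1,,Zij,7)(Z_{ij,1},\ldots,Z_{ij,7}) enters the model as the dyadic covariate.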

The estimate ν^\widehat{\nu} of the density parameter ν\nu is -7.83 with standard error 1.13, which leads to a pp-value less than 10410^{-4} under the null ν=0\nu=0. This indicates a significantly sparse network. The estimates of αi\alpha_{i} and βi\beta_{i} with their estimated standard errors are given in Table 3, in which α^71=β^71=0\widehat{\alpha}_{71}=\widehat{\beta}_{71}=0 serve as the reference. The estimates of the out- and in-degree parameters vary widely: from the minimum -8.65 to the maximum -2.21 for the α^i\widehat{\alpha}_{i}s and from -1.55 to 2.79 for the β^i\widehat{\beta}_{i}s. The minimum, 1/4 quantile, median, 3/4 quantile, and maximum values of the out-degrees dd are 1, 5, 8, 12, and 25; those of the in-degrees bb are 2, 5, 8, 13, and 22.

Table 3: The estimates of αi\alpha_{i} and βi\beta_{i} and their standard errors for Lazega's friendship data set, with α^71=β^71=0\widehat{\alpha}_{71}=\widehat{\beta}_{71}=0.
Node did_{i} α^i\hat{\alpha}_{i} σ^i\hat{\sigma}_{i} bib_{i} β^i\hat{\beta}_{i} σ^i\hat{\sigma}_{i} Node did_{i} α^i\hat{\alpha}_{i} σ^i\hat{\sigma}_{i} bib_{i} β^i\hat{\beta}_{i} σ^i\hat{\sigma}_{i}
1 4 -6.21 0.79 5 0.53 0.77 34 6 -5.54 0.67 11 1.18 0.61
2 4 -6.01 0.82 9 1.91 0.7 35 9 -4.25 0.67 10 1.55 0.68
4 14 -3.46 0.65 14 2.79 0.63 36 9 -5.4 0.63 11 0.77 0.61
5 3 -5.01 0.8 5 1.43 0.73 38 8 -5.21 0.64 13 1.42 0.6
7 1 -6.59 1.17 2 -0.04 0.91 39 8 -5.47 0.64 13 1.14 0.61
8 1 -8.32 1.16 7 0.56 0.71 40 10 -5.29 0.62 8 0.21 0.64
9 6 -5.98 0.73 14 2.1 0.63 41 12 -5.04 0.61 17 1.42 0.59
10 14 -4.17 0.65 4 -0.45 0.85 42 14 -4.55 0.59 9 0.54 0.63
11 5 -6.49 0.74 14 1.7 0.63 43 15 -4.4 0.6 13 1.21 0.6
12 22 -2.95 0.61 8 0.86 0.69 45 6 -5.8 0.66 4 -0.63 0.74
13 14 -4.35 0.64 19 2.56 0.6 46 3 -5.61 0.81 5 0.53 0.74
14 6 -4.27 0.7 6 1.21 0.73 48 7 -5.4 0.65 4 -0.39 0.74
15 3 -4.89 0.8 2 0.39 0.93 49 4 -6.7 0.73 6 -0.42 0.68
16 8 -5.66 0.68 10 0.94 0.65 50 8 -4.34 0.67 8 1.15 0.68
17 23 -2.85 0.61 18 2.5 0.61 51 6 -4.67 0.7 7 1.11 0.7
18 8 -4.62 0.66 5 0.33 0.75 52 11 -5.1 0.61 14 1.12 0.6
19 4 -6.85 0.76 4 -0.77 0.79 54 7 -5.78 0.66 11 0.68 0.62
20 12 -5.01 0.65 7 0.2 0.69 56 7 -5.91 0.65 10 0.39 0.63
21 8 -5.73 0.67 15 1.47 0.61 57 9 -5.42 0.63 12 0.87 0.61
22 8 -5.67 0.65 6 -0.1 0.68 58 13 -3.6 0.62 12 1.83 0.64
23 1 -8.65 1.16 7 -0.01 0.68 59 5 -5.04 0.74 4 0.12 0.8
24 23 -3.59 0.59 17 1.68 0.59 60 4 -6.2 0.74 8 0.47 0.65
25 11 -3.95 0.63 10 1.6 0.67 61 3 -6.57 0.79 3 -0.88 0.8
26 9 -5.45 0.64 22 2.24 0.58 62 4 -6.32 0.73 5 -0.38 0.71
27 13 -4.54 0.61 17 2.02 0.59 64 19 -3.71 0.58 14 1.55 0.6
28 11 -3.91 0.64 9 1.32 0.69 65 22 -3.68 0.58 8 0.32 0.64
29 10 -4.81 0.62 10 1.09 0.62 66 15 -4.56 0.59 3 -0.97 0.8
30 6 -5.26 0.71 5 -0.1 0.78 67 4 -6.5 0.73 3 -1.04 0.79
31 25 -2.21 0.59 14 2.21 0.64 68 6 -5.81 0.68 5 -0.32 0.72
32 4 -5.86 0.79 7 0.54 0.74 69 5 -6.13 0.7 4 -0.64 0.74
33 12 -4.03 0.64 2 -1.55 1.01 70 7 -5.5 0.65 5 -0.25 0.71

5 Discussion

In this paper, we have derived the \ell_{\infty}-error between the MLE and the true parameter values and established the asymptotic normality of the MLE in the covariate-p0p_{0}-model when the number of vertices goes to infinity. Note that the conditions imposed on bnb_{n} and κn\kappa_{n} in Theorems 1–4 may not be the best possible. In particular, the conditions guaranteeing the asymptotic normality seem stronger than those guaranteeing the consistency. It would be interesting to investigate whether these bounds can be improved.

As discussed in Yan et al. (2019), there is an implicit taste for the reciprocity parameter in the p1p_{1}-model [Holland and Leinhardt (1981)], although we do not include this parameter. If similarity terms are included, then there is a tendency toward reciprocity among nodes sharing similar node features, which alleviates the lack of a reciprocity term to some extent. To measure the reciprocity of dyads, it is natural to incorporate the term ρi<jaijaji\rho\sum_{i<j}a_{ij}a_{ji} of the p1p_{1} model into the covariate-p0p_{0}-model. The empirical results in Yan and Leng (2015) suggest that central limit theorems hold for the MLE in the p1p_{1} model without covariates. Nevertheless, although only one new parameter is added, the problem of investigating the asymptotic theory of the MLEs becomes more challenging. In particular, the Fisher information matrix for the parameter vector (ρ,α1,,αn,β1,,βn1)(\rho,\alpha_{1},\ldots,\alpha_{n},\beta_{1},\ldots,\beta_{n-1}) is not diagonally dominant and thus does not belong to the class n(m,M)\mathcal{L}_{n}(m,M). In order to generalize the method used here, a new matrix that approximates the inverse of this Fisher information matrix with high accuracy is needed. It is beyond the scope of the present paper to investigate the corresponding asymptotic theory.

6 Appendix: Proofs for theorems

We only give the proof of Theorem 1 here; the proof of Theorem 3 is given in the supplementary material.

Let F(x):nnF(x):\mathbb{R}^{n}\to\mathbb{R}^{n} be a vector-valued function of xnx\in\mathbb{R}^{n}. We say that the Jacobian matrix F(x)F^{\prime}(x) is Lipschitz continuous on a convex set DnD\subset\mathbb{R}^{n} with Lipschitz coefficient λ>0\lambda>0 if for any x,yDx,y\in D and any vector vnv\in\mathbb{R}^{n} the inequality

[F(x)]v[F(y)]vλxyv\|[F^{\prime}(x)]v-[F^{\prime}(y)]v\|_{\infty}\leq\lambda\|x-y\|_{\infty}\|v\|_{\infty}

holds. We will use the Newton iterative sequence to establish the existence and consistency of the MLE. Gragg and Tapia (1974) gave the optimal error bound for Newton's method under the Kantorovich conditions [Kantorovich (1948)].

Lemma 2 (Gragg and Tapia (1974)).

Let DD be an open convex set of n\mathbb{R}^{n} and F:DnF:D\to\mathbb{R}^{n} a differentiable function whose Jacobian F(x)F^{\prime}(x) is Lipschitz continuous on DD with Lipschitz coefficient λ\lambda. Assume that x0Dx_{0}\in D is such that [F(x0)]1[F^{\prime}(x_{0})]^{-1} exists,

[F(x0)]1,[F(x0)]1F(x0)δ,ρ=2λδ1,\displaystyle\|[F^{\prime}(x_{0})]^{-1}\|_{\infty}\leq\aleph,~{}~{}\|[F^{\prime}(x_{0})]^{-1}F(x_{0})\|_{\infty}\leq\delta,~{}~{}\rho=2\aleph\lambda\delta\leq 1,
B(x0,t)D,t=2ρ(11ρ)δ=2δ1+1ρ2δ.\displaystyle B(x_{0},t^{*})\subset D,~{}~{}t^{*}=\frac{2}{\rho}(1-\sqrt{1-\rho})\delta=\frac{2\delta}{1+\sqrt{1-\rho}}\leq 2\delta.

Then: (1) The Newton iterations xk+1=xk[F(xk)]1F(xk)x_{k+1}=x_{k}-[F^{\prime}(x_{k})]^{-1}F(x_{k}) exist and xkB(x0,t)Dx_{k}\in B(x_{0},t^{*})\subset D for k0k\geq 0. (2) x=limxkx^{*}=\lim x_{k} exists, xB(x0,t)¯Dx^{*}\in\overline{B(x_{0},t^{*})}\subset D and F(x)=0F(x^{*})=0.

6.1 Proof of Theorem 1

To show Theorem 1, we need three lemmas below.

Lemma 3.

Let D=B(𝛄,ϵn2)(p)D=B(\boldsymbol{\gamma}^{*},\epsilon_{n2})(\subset\mathbb{R}^{p}). If F(𝛈,𝛄)=O((nlogn)1/2)\|F(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}=O((n\log n)^{1/2}), then Qc(𝛄)Q_{c}(\boldsymbol{\gamma}) is Lipschitz continuous on DD with the Lipschitz coefficient O(bn9n2)O(b_{n}^{9}n^{2}).

Lemma 4.

With probability at least 1O(1/n)1-O(1/n), we have

F(𝜼,𝜸)(nlogn)1/2,Q(𝜼,𝜸)zmaxn(logn)1/2,\|F(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}\leq(n\log n)^{1/2},~{}~{}\|Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}\leq z_{\max}n(\log n)^{1/2}, (18)

where zmax:=maxi,jZijz_{\max}:=\max_{i,j}\|Z_{ij}\|_{\infty}.

Lemma 5.

The difference between Q(𝛈^γ,𝛄)Q(\widehat{\boldsymbol{\eta}}_{\gamma}^{*},\boldsymbol{\gamma}^{*}) and Q(𝛈,𝛄)Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*}) is

Q(𝜼^𝜸,𝜸)Q(𝜼,𝜸)=Op(bn9nlogn).\|Q(\widehat{\boldsymbol{\eta}}_{\boldsymbol{\gamma}}^{*},\boldsymbol{\gamma}^{*})-Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}=O_{p}(b_{n}^{9}n\log n).

Now we are ready to prove Theorem 1.

Proof of Theorem 1.

We construct the Newton iterative sequence to show the consistency. It suffices to verify the Newton–Kantorovich conditions in Lemma 2. We set 𝜸\boldsymbol{\gamma}^{*} as the initial point 𝜸(0)\boldsymbol{\gamma}^{(0)} and 𝜸(k+1)=𝜸(k)[Qc(𝜸(k))]1Qc(𝜸(k))\boldsymbol{\gamma}^{(k+1)}=\boldsymbol{\gamma}^{(k)}-[Q_{c}^{\prime}(\boldsymbol{\gamma}^{(k)})]^{-1}Q_{c}(\boldsymbol{\gamma}^{(k)}).

By Lemma 1, 𝜼^γ\widehat{\boldsymbol{\eta}}_{\gamma^{*}} exists with probability approaching one and satisfies

𝜼^γ𝜼=Op(bn3lognn).\|\widehat{\boldsymbol{\eta}}_{\gamma^{*}}-\boldsymbol{\eta}^{*}\|_{\infty}=O_{p}\left(b_{n}^{3}\sqrt{\frac{\log n}{n}}\right).

Therefore, Qc(𝜸(0))Q_{c}(\boldsymbol{\gamma}^{(0)}) and Qc(𝜸(0))Q_{c}^{\prime}(\boldsymbol{\gamma}^{(0)}) are well defined.

Recall the definition of Qc(𝜸)Q_{c}(\boldsymbol{\gamma}) and Q(𝜼,𝜸)Q(\boldsymbol{\eta},\boldsymbol{\gamma}) in (7) and (8). By Lemmas 4 and 5, we have

Qc(𝜸)\displaystyle\|Q_{c}(\boldsymbol{\gamma}^{*})\|_{\infty} \displaystyle\leq Q(𝜼,𝜸)+Q(𝜼^𝜸,𝜸)Q(𝜼,𝜸)\displaystyle\|Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}+\|Q(\widehat{\boldsymbol{\eta}}_{\boldsymbol{\gamma}^{*}},\boldsymbol{\gamma}^{*})-Q(\boldsymbol{\eta}^{*},\boldsymbol{\gamma}^{*})\|_{\infty}
=\displaystyle= Op(bn9nlogn).\displaystyle O_{p}\left(b_{n}^{9}n\log n\right).

By Lemma 3, we can take λ=O(bn9n2)\lambda=O(b_{n}^{9}n^{2}). By (14), we have

=[Qc(𝜸)]1=O(κnn2).\aleph=\|[Q_{c}^{\prime}(\boldsymbol{\gamma}^{*})]^{-1}\|_{\infty}=O(\kappa_{n}n^{-2}).

Thus,

δ=[Qc(𝜸)]1Qc(𝜸)=Op(κnbn9lognn).\delta=\|[Q_{c}^{\prime}(\boldsymbol{\gamma}^{*})]^{-1}Q_{c}(\boldsymbol{\gamma}^{*})\|_{\infty}=O_{p}\left(\frac{\kappa_{n}b_{n}^{9}\log n}{n}\right).

As a result, if κn2bn18=o(n/logn)\kappa_{n}^{2}b_{n}^{18}=o(n/\log n), then

ρ=2λδ=O(κn2bn18lognn)=o(1).\rho=2\aleph\lambda\delta=O(\frac{\kappa_{n}^{2}b_{n}^{18}\log n}{n})=o(1).

By Lemma 2, with probability 1O(n1)1-O(n^{-1}), the limiting point of the sequence {𝜸(k)}k=1\{\boldsymbol{\gamma}^{(k)}\}_{k=1}^{\infty} exists; denote it by 𝜸^\widehat{\boldsymbol{\gamma}}. It satisfies

𝜸^𝜸=O(δ).\|\widehat{\boldsymbol{\gamma}}-\boldsymbol{\gamma}^{*}\|_{\infty}=O(\delta).

By Lemma 1, 𝜼^𝜸^\widehat{\boldsymbol{\eta}}_{\widehat{\boldsymbol{\gamma}}} exists and (𝜼^𝜸^,𝜸^)(\widehat{\boldsymbol{\eta}}_{\widehat{\boldsymbol{\gamma}}},\widehat{\boldsymbol{\gamma}}) is the MLE. This completes the proof. ∎

References

  • Amemiya (1985) Amemiya, T. (1985). Advanced Econometrics. Cambridge, MA. Harvard University Press.
  • Chatterjee et al. (2011) Chatterjee, S., Diaconis, P., and Sly, A. (2011). Random graphs with a given degree sequence. Annals of Applied Probability, 21:1400–1435.
  • Dzemski (2019) Dzemski, A. (2019). An empirical model of dyadic link formation in a network with unobserved heterogeneity. The Review of Economics and Statistics, 101(5):763–776.
  • Fienberg (2012) Fienberg, S. E. (2012). A brief history of statistical models for network analysis and open challenges. Journal of Computational and Graphical Statistics, 21(4):825–839.
  • Gragg and Tapia (1974) Gragg, W. B. and Tapia, R. A. (1974). Optimal error bounds for the Newton–Kantorovich theorem. SIAM Journal on Numerical Analysis, 11(1):10–13.
  • Graham (2017) Graham, B. S. (2017). An econometric model of link formation with degree heterogeneity. Econometrica, 85:1033–1063.
  • Haberman (1977) Haberman, S. J. (1977). Maximum likelihood estimates in exponential response models. The Annals of Statistics, 5:815–841.
  • Hillar and Wibisono (2013) Hillar, C. and Wibisono, A. (2013). Maximum entropy distributions on graphs. arXiv e-prints, page arXiv:1301.3321.
  • Holland and Leinhardt (1981) Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76(373):33–50.
  • Kantorovich (1948) Kantorovich, L. V. (1948). Functional analysis and applied mathematics. Uspekhi Mat Nauk, pages 89–185.
  • Lazega (2001) Lazega, E. (2001). The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership. Oxford University Press.
  • Liang and Du (2012) Liang, H. and Du, P. (2012). Maximum likelihood estimation in logistic regression models with a diverging number of covariates. Electron. J. Statist., 6:1838–1846.
  • Neyman and Scott (1948) Neyman, J. and Scott, E. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16:1–32.
  • Perry and Wolfe (2012) Perry, P. O. and Wolfe, P. J. (2012). Null models for network data. Available at http://arxiv.org/abs/1201.5871.
  • Portnoy (1988) Portnoy, S. (1988). Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Ann. Statist., 16(1):356–366.
  • Rinaldo et al. (2013) Rinaldo, A., Petrović, S., and Fienberg, S. E. (2013). Maximum likelihood estimation in the β\beta-model. The Annals of Statistics, 41:1085–1110.
  • Wang (2011) Wang, L. (2011). GEE analysis of clustered binary data with diverging number of covariates. Ann. Statist., 39(1):389–417.
  • Yan (2021) Yan, T. (2021). Maximum likelihood estimation in a sparse β\beta-model with covariates. Manuscript.
  • Yan et al. (2019) Yan, T., Jiang, B., Fienberg, S. E., and Leng, C. (2019). Statistical inference in a directed network model with covariates. Journal of the American Statistical Association, 114(526):857–868.
  • Yan and Leng (2015) Yan, T. and Leng, C. (2015). A simulation study of the p1p_{1} model. Statistics and Its Interface, 8:255–266.
  • Yan et al. (2016a) Yan, T., Leng, C., and Zhu, J. (2016a). Asymptotics in directed exponential random graph models with an increasing bi-degree sequence. Ann. Statist., 44(1):31–57.
  • Yan et al. (2016b) Yan, T., Qin, H., and Wang, H. (2016b). Asymptotics in undirected random graph models parameterized by the strengths of vertices. Statistica Sinica, 26(1):273–293.
  • Yan and Xu (2013) Yan, T. and Xu, J. (2013). A central limit theorem in the β\beta-model for undirected random graphs with a diverging number of vertices. Biometrika, 100:519–524.