
Differentially private analysis of networks with covariates via a generalized $\beta$-model

Ting Yan Department of Statistics, Central China Normal University, Wuhan, 430079, China. Emails: [email protected]
Abstract

How to achieve the tradeoff between privacy and utility is one of the fundamental problems in private data analysis. In this paper, we give a rigorous differential privacy analysis of networks in the presence of covariates via a generalized $\beta$-model, which has an $n$-dimensional degree parameter $\beta$ and a $p$-dimensional homophily parameter $\gamma$. Under $(k_n,\epsilon_n)$-edge differential privacy, we use the popular Laplace mechanism to release the network statistics. The method of moments is used to estimate the unknown model parameters. We establish conditions guaranteeing consistency of the differentially private estimators $\widehat{\beta}$ and $\widehat{\gamma}$ as the number of nodes $n$ goes to infinity, which reveal an interesting tradeoff between the privacy parameter and the model parameters. The consistency is shown by applying a two-stage Newton's method to obtain an upper bound of the error between $(\widehat{\beta},\widehat{\gamma})$ and its true value $(\beta,\gamma)$ in the $\ell_\infty$ distance, with convergence rates of rough order $1/n^{1/2}$ for $\widehat{\beta}$ and $1/n$ for $\widehat{\gamma}$, respectively. Further, we derive the asymptotic normality of $\widehat{\beta}$ and $\widehat{\gamma}$, whose asymptotic variances are the same as those of the non-private estimators under some conditions. Our paper sheds light on how to explore asymptotic theory under differential privacy in a principled manner; these principled methods should be applicable to a class of network models with covariates beyond the generalized $\beta$-model. Numerical studies and a real data analysis demonstrate our theoretical findings.


Keywords: Asymptotic normality; Consistency; Covariate; Differential privacy; Network data

1 Introduction

Social network data may contain sensitive information about relationships between individuals (e.g., friendship, email exchange, sexual interaction) and even about the individuals themselves (e.g., respondents in sexual partner networks). Undoubtedly, directly releasing such data to the public for research purposes would expose individuals' privacy. Even if individuals are anonymized by removing identifiers before the data are made public, the data remain vulnerable to de-anonymization attacks [e.g., Narayanan and Shmatikov, (2009)]. A randomized data releasing mechanism that injects random noise into the original data (i.e., input perturbation) or into the aggregate statistics returned to queries (i.e., output perturbation) provides an alternative way to protect data privacy. To rigorously restrict privacy leakage, Dwork et al., (2006) developed a privacy notion, differential privacy, which requires that the output of a query does not change too much if any single individual's record is added to or removed from the database. Since then, it has been widely accepted as a privacy standard for releasing sensitive data.

Many differentially private algorithms have been proposed to release network data or their aggregate statistics, especially in the computer science and machine learning literature [e.g., Day et al., (2016); Macwan and Patel, (2018); Nguyen et al., (2016); Wang et al., (2022)]. On the other hand, denoising approaches have been developed to improve the estimation of network statistics [e.g., Hay et al., (2009); Karwa and Slavković, (2016); Yan, (2021)]. However, differentially private inference in network models is still in its infancy. This is partly because network data are nonstandard and asymptotic analysis is usually based on only one observed network. The increasing dimension of parameters and the presence of noise pose additional challenges as well [Fienberg, (2012)]. Recently, Karwa and Slavković, (2016) derived consistency and asymptotic normality of the differentially private estimator of the parameter constructed from the denoised degree sequence in the $\beta$-model, which is an exponential random graph model with the degree sequence as its sufficient statistic [Chatterjee et al., (2011)]. Yan, (2021) derived asymptotic properties of the differentially private estimators in the $p_0$ model for directed networks. Despite these recent developments, to the best of our knowledge, differentially private analyses of networks in the presence of covariates have not been explored, in the sense that neither releasing methods nor theoretical properties are well understood.

The covariates of nodes could have important implications for link formation. A common phenomenon in social and econometric network data is that individuals tend to form connections with those like themselves, which is referred to as homophily. Therefore, it is of interest to see how covariates influence differentially private estimation in network models. The aim of this paper is to give a rigorous differentially private analysis of networks in the presence of covariates via a generalized $\beta$-model. It contains an $n$-dimensional degree parameter $\beta$ characterizing the variation in the node degrees and a $p$-dimensional regression parameter $\gamma$ of covariates measuring homophilous or heterophilous effects. A detailed description is given in Section 2.1. This model was introduced in Graham, (2017) to model economic networks, and a directed version was proposed in Yan et al., (2019). In this model, the degree sequence $d$ and the covariate term $y$ are the sufficient statistics. Therefore, it suffices to treat only this information as the privacy content of the model. Under $(k_n,\epsilon_n)$-edge differential privacy, we propose a joint Laplace mechanism to release the network statistics $d$ and $y$, to which discrete Laplace noise and continuous Laplace noise are added, respectively.

We construct estimating equations to infer the model parameters based on the original maximum likelihood equations, in which the original network statistics are directly replaced by their noisy outputs. Owing to the noises having zero mean, these are the same as the moment equations. We develop new approaches to establish the asymptotic theory of the differentially private estimators. The main contributions are as follows. First, we establish conditions on the privacy parameter and the model parameters that guarantee consistency of the differentially private estimator; these conditions control the trade-off between privacy and utility. A key idea of the proof is a two-stage Newton's method that first obtains an upper bound of the $\ell_\infty$ error between $\widehat{\beta}_\gamma$ and $\beta$ for a given $\gamma$, and then derives an upper bound of the error between $\widehat{\gamma}$ and $\gamma$ using a profiled function, where $\widehat{\beta}$ and $\widehat{\gamma}$ are the differentially private estimators of $\beta$ and $\gamma$, respectively. As a result, we obtain convergence rates of $\widehat{\beta}$ and $\widehat{\gamma}$ of respective orders $O_p(n^{-1/2})$ and $O_p(n^{-1})$ roughly, both up to a logarithmic factor. Notably, the convergence rate for $\widehat{\beta}$ matches the minimax optimal upper bound $\|\widehat{\beta}-\beta\|_\infty=O_p((\log p/n)^{1/2})$ for the Lasso estimator in the linear model with a $p=n$-dimensional parameter $\beta$ and sample size $n$ in Lounici, (2008). Second, we derive the asymptotic normal distributions of $\widehat{\beta}$ and $\widehat{\gamma}$. This is proved by applying Taylor's expansions to a series of functions constructed from the estimating equations and showing that the various remainder terms in the expansions are asymptotically negligible.
The $1/n$ convergence rate of $\widehat{\gamma}$ ensures that the asymptotic distribution of $\widehat{\beta}$ does not depend on $\widehat{\gamma}$ and therefore has no bias. The asymptotic distribution of $\widehat{\gamma}$, the estimator of the homophily parameter $\gamma$, contains a bias term in the form of a weighted sum of the covariates. Finally, we provide simulation studies as well as a real data analysis to illustrate the theoretical results.

We note that Karwa and Slavković, (2016) obtained asymptotic results for the edge-differentially private estimator based on a denoising process, whereas our asymptotic results do not require denoising. Another important difference from Karwa and Slavković, (2016) is that we characterize how the errors of the estimators depend on the privacy parameter, and we do not assume that all parameters are bounded above by a constant in the asymptotic theory.

For the rest of the paper, we proceed as follows. In Section 2, we give the necessary background on the generalized $\beta$-model and differential privacy. In Section 3, we present the releasing mechanism and the estimation procedure. In Section 4, we present the consistency and asymptotic normality of the differentially private estimator. We carry out simulations and illustrate our results with a real data analysis in Section 5. We give a summary and further discussion in Section 6. The proofs of the main results are relegated to Section 7. The proofs of the supporting lemmas are given in the supplementary material.

2 Model and differential privacy

In this section, we introduce the generalized $\beta$-model with covariates and present the necessary background on differential privacy.

2.1 Generalized $\beta$-model

Let $G_n$ be an undirected graph on $n\geq 2$ nodes labeled by "$1,\ldots,n$". Let $A=(a_{ij})_{n\times n}$ be the adjacency matrix of $G_n$, where $a_{ij}$ is an indicator denoting whether node $i$ is connected to node $j$. That is, $a_{ij}=1$ if there is a link between $i$ and $j$; $a_{ij}=0$ otherwise. We do not consider self-loops here, i.e., $a_{ii}=0$. Let $d_i=\sum_{j\neq i}a_{ij}$ be the degree of node $i$ and $d=(d_1,\ldots,d_n)^\top$ be the degree sequence of the graph $G_n$. We also observe a $p$-dimensional vector $z_{ij}$, the covariate information attached to the edge between nodes $i$ and $j$. The covariate $z_{ij}$ can be formed according to the similarity or dissimilarity between the nodal attributes $z_i$ and $z_j$ of nodes $i$ and $j$. Specifically, $z_{ij}$ can be represented through a symmetric function $g(\cdot,\cdot)$ with $z_i$ and $z_j$ as its arguments. As an example, if $z_i$ is an indicator of gender (e.g., $1$ for male and $-1$ for female), then we could use $z_{ij}=z_i\cdot z_j$ as a similarity or dissimilarity measure between $i$ and $j$.
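As a concrete illustration of this construction, the following sketch (hypothetical data; the function name `edge_covariates` is ours, not the paper's) builds the matrix of edge covariates $z_{ij}=z_i\cdot z_j$ from a vector of $\pm 1$ gender indicators:

```python
import numpy as np

# Hypothetical illustration: the node attribute z_i is a gender indicator in
# {+1, -1}; the edge covariate z_ij = z_i * z_j is +1 for same-gender pairs
# and -1 for mixed pairs, matching g(z_i, z_j) = z_i * z_j in the text.
def edge_covariates(z_node):
    z_node = np.asarray(z_node, dtype=float)
    return np.outer(z_node, z_node)  # symmetric matrix with entry z_i * z_j

Z = edge_covariates([1, -1, 1, -1])
```

Any other symmetric function $g$, e.g. an indicator of equal group membership, would fit the same pattern.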

The $\beta$-model with covariates [Graham, (2017); Yan et al., (2019)] assumes that the edge $a_{ij}$ between $i$ and $j$, conditional on the unobserved degree effects and observed covariates, has the following probability:

\mathbb{P}(a_{ij}=a|\beta,z_{ij})=\frac{e^{(\beta_{i}+\beta_{j}+z_{ij}^{\top}\gamma)a}}{1+e^{\beta_{i}+\beta_{j}+z_{ij}^{\top}\gamma}},~~a\in\{0,1\}, (1)

independent of other edges. The parameter $\beta_i$ is the intrinsic individual effect that reflects the heterogeneity of node $i$ in participating in network connections. The common parameter $\gamma$ is exogenous, measuring the homophilous or heterophilous effect. A larger homophily component $z_{ij}^\top\gamma$ means a larger homophily effect. We will refer to $\gamma$ as the homophily parameter hereafter, although it could also represent a heterophilous effect. Hereafter, we call model (1) the covariate-adjusted $\beta$-model.

The log-likelihood function is

\ell(\beta,\gamma)=\sum_{i=1}^{n}\beta_{i}d_{i}+\sum_{i<j}a_{ij}z_{ij}^{\top}\gamma-\sum_{i<j}\log\left(1+e^{\beta_{i}+\beta_{j}+z_{ij}^{\top}\gamma}\right). (2)

The maximum likelihood equations are

\begin{array}{rcl}d_{i}&=&\sum_{j\neq i}\frac{e^{\beta_{i}+\beta_{j}+z_{ij}^{\top}\gamma}}{1+e^{\beta_{i}+\beta_{j}+z_{ij}^{\top}\gamma}},\\ \sum_{i<j}a_{ij}z_{ij}&=&\sum_{i<j}\frac{z_{ij}e^{\beta_{i}+\beta_{j}+z_{ij}^{\top}\gamma}}{1+e^{\beta_{i}+\beta_{j}+z_{ij}^{\top}\gamma}}.\end{array} (3)

The R language provides the standard function "glm" to solve (3), which implements an iteratively reweighted least squares method for generalized linear models [McCullagh and Nelder, (1989)].
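To make the structure of (3) concrete, the following toy sketch (our own, not the paper's code; plain gradient ascent stands in for iteratively reweighted least squares) fits the model as a logistic regression whose design matrix has one dummy column per node, carrying $\beta_i+\beta_j$, plus the covariate column, carrying $z_{ij}^\top\gamma$:

```python
import numpy as np

# Toy sketch: the covariate-adjusted beta-model is a logistic regression on
# pairs (i, j) with node-dummy columns (for beta) and covariate columns
# (for gamma). Simulated data; gradient ascent on the log-likelihood.
rng = np.random.default_rng(0)
n, p = 8, 1
beta_true = rng.normal(0.0, 0.3, n)
z_node = rng.choice([-1.0, 1.0], n)

pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
X = np.zeros((len(pairs), n + p))
for r, (i, j) in enumerate(pairs):
    X[r, i] = 1.0                    # picks up beta_i
    X[r, j] = 1.0                    # picks up beta_j
    X[r, n] = z_node[i] * z_node[j]  # edge covariate z_ij
logits = X @ np.concatenate([beta_true, [0.5]])
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

theta = np.zeros(n + p)              # (beta_1, ..., beta_n, gamma)
for _ in range(2000):
    mu = 1.0 / (1.0 + np.exp(-(X @ theta)))
    theta += 0.05 * (X.T @ (a - mu))  # score of the logistic log-likelihood
```

At a stationary point, $X^\top(a-\mu)=0$ reproduces exactly the two sets of equations in (3).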

2.2 Differential privacy

Given an original database $D$ with the records of $n$ persons, we consider a randomized data releasing mechanism $Q$ that takes $D$ as input and outputs a sanitized database $S=(S_1,\ldots,S_\ell)$ for public use. As an illustrative example, the additive noise mechanism returns the answer $f(D)+z$ to the query $f(D)$, where $z$ is a random noise. Let $\epsilon$ be a positive real number and $\mathcal{S}$ denote the sample space of $Q$. The data releasing mechanism $Q$ is $\epsilon$-differentially private if, for any two neighboring databases $D_1$ and $D_2$ that differ on a single element (i.e., the data of one person) and all measurable subsets $B$ of $\mathcal{S}$ [Dwork et al., (2006)],

Q(S\in B|D_{1})\leq e^{\epsilon}\times Q(S\in B|D_{2}).

This says that the probability of an output $S$ given the input $D_1$ is at most the probability of that output given the input $D_2$ multiplied by the privacy factor $e^\epsilon$. The privacy parameter $\epsilon$ is chosen according to the privacy policy and controls the trade-off between privacy and utility; it is generally public. A smaller value of $\epsilon$ means more privacy protection.

Differential privacy requires that the distribution of the output is almost the same whether or not an individual's record appears in the original database. We illustrate why this protects privacy with an example. Suppose a hospital wants to release some statistics on the medical records of its patients to the public. In response, a patient may wish to have his record omitted from the study due to the privacy concern that the published results will reveal something about him personally. Differential privacy alleviates this concern because, whether or not the patient participates in the study, the probability of any possible output is almost the same. From a theoretical point of view, any test statistic has nearly no power for testing whether an individual's data is in the original database or not [Wasserman and Zhou, (2010)].

What is being protected under differential privacy is precisely the difference between two neighboring databases. For network data, depending on the definition of a graph neighbor, differential privacy is divided into $k$-node differential privacy [Hay et al., (2009)] and $k$-edge differential privacy [Nissim et al., (2007)]. If two graphs are called neighbors when they differ in exactly $k$ edges, the resulting notion is $k$-edge differential privacy. The special case with $k=1$ is generally referred to as edge differential privacy. Analogously, we can define $k$-node differential privacy by letting graphs be neighbors if one can be obtained from the other by removing $k$ nodes and their adjacent edges. Edge differential privacy protects edges from being detected, whereas node differential privacy protects nodes together with their adjacent edges, which is a stronger privacy policy. However, it may be infeasible to design algorithms that are node differentially private and still have good utility, since this generally requires a large amount of noise [e.g., Hay et al., (2009)]. Following Hay et al., (2009) and Karwa and Slavković, (2016), we use edge differential privacy here.

Let $\delta(G,G')$ be the Hamming distance between two graphs $G$ and $G'$, i.e., the number of edges on which $G$ and $G'$ differ. The formal definition of $(k,\epsilon)$-edge differential privacy is as follows.

Definition 1 (Edge differential privacy).

Let $\epsilon>0$ be a privacy parameter. A randomized mechanism $Q(\cdot|G)$ is $(k,\epsilon)$-edge differentially private if

\sup_{G,G^{\prime}\in\mathcal{G},\delta(G,G^{\prime})=k}\sup_{S\in\mathcal{S}}\frac{Q(S|G)}{Q(S|G^{\prime})}\leq e^{\epsilon},

where $\mathcal{G}$ is the set of all graphs of interest on $n$ nodes and $\mathcal{S}$ is the set of all possible outputs.

Let $f:\mathcal{G}\rightarrow\mathbb{R}^\ell$ be a function. The global sensitivity [Dwork et al., (2006)] of the function $f$, denoted $\Delta f$, is defined below.

Definition 2.

(Global Sensitivity). Let $f:\mathcal{G}\to\mathbb{R}^\ell$. The global sensitivity of $f$ is defined as

\Delta(f)=\max_{\delta(G,G^{\prime})=k}\|f(G)-f(G^{\prime})\|_{1},

where $\|\cdot\|_1$ is the $L_1$ norm.

The global sensitivity measures the largest change of the query function $f$, in terms of the $L_1$ norm, between any two neighboring graphs. The magnitude of the noise added in a differentially private algorithm $Q$ crucially depends on the global sensitivity. If the outputs are network statistics, then a simple algorithm guaranteeing edge differential privacy is the Laplace mechanism [e.g., Dwork et al., (2006)], which adds Laplace noise with scale proportional to the global sensitivity of $f$.

Lemma 1.

(Laplace Mechanism). Suppose that $f:\mathcal{G}\to\mathbb{R}^\ell$ is an output function on $\mathcal{G}$. Let $z_1,\ldots,z_\ell$ be independently and identically distributed Laplace random variables with density function $e^{-|z|/\lambda}/(2\lambda)$. Then the Laplace mechanism that outputs $f(G)+(z_1,\ldots,z_\ell)$ is $(k,\epsilon)$-edge differentially private, where $\epsilon=\Delta(f)/\lambda$.
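A quick numerical sanity check of Lemma 1 in the scalar case (our own sketch; the values of $\Delta$, $\epsilon$ and the outputs are illustrative): with scale $\lambda=\Delta(f)/\epsilon$, the ratio of the output densities under two worst-case neighboring graphs never exceeds $e^\epsilon$.

```python
import numpy as np

# Sanity check for the scalar Laplace mechanism: with noise scale
# lam = Delta / eps, the density ratio of the released value under two
# neighboring graphs is bounded by e^eps everywhere.
def laplace_density(s, center, lam):
    return np.exp(-np.abs(s - center) / lam) / (2.0 * lam)

Delta, eps = 2.0, 0.5           # illustrative sensitivity and privacy budget
lam = Delta / eps               # epsilon = Delta(f) / lambda, as in Lemma 1
f_G, f_Gp = 10.0, 10.0 + Delta  # outputs on worst-case neighboring graphs
s = np.linspace(-40.0, 60.0, 5001)
ratio = laplace_density(s, f_G, lam) / laplace_density(s, f_Gp, lam)
```

The maximum ratio $e^{|f(G)-f(G')|/\lambda}=e^{\Delta/\lambda}=e^\epsilon$ is attained on the whole half-line $s\leq f(G)$.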

When $f(G)$ is integer-valued, one can use a discrete Laplace random variable as the noise, with probability mass function:

\mathbb{P}(X=x)=\frac{1-\lambda}{1+\lambda}\lambda^{|x|},~~x\in\{0,\pm 1,\ldots\},~\lambda\in(0,1). (4)

Lemma 1 still holds if the continuous Laplace distribution is replaced by its discrete version and the privacy parameter is chosen as $\epsilon=-\Delta(f)\log\lambda$; see Karwa and Slavković, (2016).
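The discrete Laplace distribution (4) is easy to sample: the difference of two independent geometric variables on $\{0,1,\ldots\}$ with success probability $1-\lambda$ has exactly the mass function (4). A short sketch (the sampler name is ours):

```python
import numpy as np

# Sample the discrete Laplace distribution (4) as a difference of two i.i.d.
# geometric variables: P(X = x) = (1 - lam)/(1 + lam) * lam^{|x|}.
def discrete_laplace(lam, size, rng):
    g1 = rng.geometric(1.0 - lam, size) - 1  # numpy's geometric starts at 1
    g2 = rng.geometric(1.0 - lam, size) - 1
    return g1 - g2

rng = np.random.default_rng(1)
x = discrete_laplace(0.5, 200_000, rng)
# mean is 0 and variance is 2*lam/(1-lam)^2, i.e. 4 for lam = 0.5
```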

We introduce a nice property of differential privacy: any function of a differentially private mechanism is also differentially private, as stated in the lemma below.

Lemma 2 (Dwork et al., (2006)).

Let $f$ be the output of an $\epsilon$-differentially private mechanism and $g$ be any function. Then $g(f(G))$ is also $\epsilon$-differentially private.

3 Releasing network statistics and estimation

3.1 Releasing

From the log-likelihood function (2), we know that $(d,\sum_{i<j}a_{ij}z_{ij})$ is the sufficient statistic. Thus, the private information in the covariate-adjusted $\beta$-model is $(d,\sum_{i<j}a_{ij}z_{ij})$. We use the continuous Laplace mechanism in Lemma 1 and its discrete version to release the covariate statistic $\sum_{i<j}z_{ij}a_{ij}$ and the degree sequence $d$, respectively, each under $(k_n,\epsilon_n/2)$-edge differential privacy. The joint mechanism satisfies $(k_n,\epsilon_n)$-edge differential privacy. The subscript $n$ means that $k_n$ and $\epsilon_n$ are allowed to depend on $n$. If we add $k_1$ edges and remove $k_2=k_n-k_1$ edges in $G_n$ and denote the induced graph by $G_n'$, then

\|d-d^{\prime}\|_{1}\leq 2k_{n},~~\|\sum_{i<j}(a_{ij}-a^{\prime}_{ij})z_{ij}\|_{1}\leq pk_{n}z_{*},

where $d'$ is the degree sequence of $G_n'$, $a'_{ij}$ is the value of edge $(i,j)$ in $G_n'$, and $z_*=\max_{i,j,k}|z_{ijk}|$. So the global sensitivity is $2k_n$ for $d$ and $pk_nz_*$ for $\sum_{i<j}a_{ij}z_{ij}$. We release the sufficient statistics $d$ and $y:=\sum_{i<j}a_{ij}z_{ij}$ as follows:

\begin{array}{rcl}\tilde{d}_{i}&=&d_{i}+\xi_{i},~~i=1,\ldots,n,\\ \tilde{y}_{t}&=&\sum_{i<j}z_{ijt}a_{ij}+\eta_{t},~~t=1,\ldots,p,\end{array} (7)

where $\xi_i$, $i=1,\ldots,n$, are independently generated from the discrete Laplace distribution (4) with $\lambda_{n1}=e^{-\epsilon_n/(4k_n)}$, and $\eta_t$, $t=1,\ldots,p$, are independently generated from the Laplace distribution with scale $\lambda_{n2}=2pk_nz_*/\epsilon_n$.
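The joint release (7) can be sketched as follows (our own implementation of the stated noise scales; the function name `release` and the toy inputs are illustrative):

```python
import numpy as np

# Sketch of the joint release (7): degrees get discrete Laplace noise with
# lam_n1 = exp(-eps_n/(4 k_n)); the covariate statistic gets continuous
# Laplace noise with scale lam_n2 = 2 p k_n z_* / eps_n.
def release(A, Z, eps_n, k_n, rng):
    n, p = A.shape[0], Z.shape[2]
    z_star = np.abs(Z).max()
    iu = np.triu_indices(n, k=1)
    d = A.sum(axis=1)                             # degree sequence
    y = (Z[iu] * A[iu][:, None]).sum(axis=0)      # sum_{i<j} z_ij a_ij
    lam1 = np.exp(-eps_n / (4.0 * k_n))
    xi = rng.geometric(1.0 - lam1, n) - rng.geometric(1.0 - lam1, n)
    lam2 = 2.0 * p * k_n * z_star / eps_n
    eta = rng.laplace(0.0, lam2, p)
    return d + xi, y + eta

rng = np.random.default_rng(3)
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]])
Z = np.ones((4, 4, 1))
d_tilde, y_tilde = release(A, Z, eps_n=1.0, k_n=1, rng=rng)
```

As $\epsilon_n$ grows, both noise scales vanish and the released statistics approach $(d,y)$.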

3.2 Estimation

Write $\mu(x)=e^x/(1+e^x)$. Define

\pi_{ij}:=z_{ij}^{\top}\gamma+\beta_{i}+\beta_{j}. (8)

It is clear that $\mu(\pi_{ij})$ is the expectation of $a_{ij}$. When we emphasize the arguments $\beta$ and $\gamma$ in $\mu(\cdot)$, we write $\mu_{ij}(\beta,\gamma)$ instead of $\mu(\pi_{ij})$. To estimate the model parameters, we directly replace $d$ and $y$ in the maximum likelihood equations (3) with their noisy observed values $\tilde d$ and $\tilde y$:

\begin{array}{rcl}\tilde{d}_{i}&=&\sum_{j\neq i}\mu_{ij}(\beta,\gamma),~~i=1,\ldots,n,\\ \tilde{y}&=&\sum_{i=1}^{n}\sum_{j=1,j<i}^{n}z_{ij}\mu_{ij}(\beta,\gamma).\end{array} (9)

Because the expectations of the noises are zero, the above equations are the same as the moment equations.

Let $(\widehat{\beta},\widehat{\gamma})$ be the solution to the equations (9). Since $(\tilde d,\tilde y)$ satisfies $(k_n,\epsilon_n)$-edge differential privacy, $(\widehat{\beta},\widehat{\gamma})$ is also $(k_n,\epsilon_n)$-edge differentially private according to Lemma 2. A two-step iterative algorithm, alternating between solving the first equation in (9) via the fixed point method in Chatterjee et al., (2011) for a given $\gamma$ and solving the second equation in (9) via Newton's method or gradient descent, can be employed to obtain the solution.
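A minimal numerical sketch of solving (9) (toy simulated data and toy noise; simultaneous gradient updates stand in for the fixed point/Newton steps): the equations (9) are the score of the concave function $\sum_i\beta_i\tilde d_i+\gamma^\top\tilde y-\sum_{i<j}\log(1+e^{\pi_{ij}})$, so simple ascent converges whenever a solution exists.

```python
import numpy as np

# Toy sketch of solving the noisy estimating equations (9) by simultaneous
# gradient updates on (beta, gamma). The equations are the score of a
# concave objective, so the iteration converges when a solution exists.
rng = np.random.default_rng(2)
n, p = 10, 1
zn = rng.choice([-1.0, 1.0], n)
Z = (zn[:, None] * zn[None, :])[:, :, None]          # z_ij = z_i * z_j
beta0, gamma0 = rng.normal(0.0, 0.2, n), np.array([0.3])
pi = beta0[:, None] + beta0[None, :] + Z @ gamma0
P = 1.0 / (1.0 + np.exp(-pi))
U = np.triu(rng.random((n, n)), 1)
A = (U > 0) * (U < np.triu(P, 1)) * 1
A = A + A.T                                          # symmetric adjacency

iu = np.triu_indices(n, 1)
d_tilde = A.sum(1) + rng.integers(-1, 2, n)          # toy discrete noise
y_tilde = (Z[iu] * A[iu][:, None]).sum(0) + rng.laplace(0.0, 0.1, p)

beta, gamma = np.zeros(n), np.zeros(p)
for _ in range(5000):
    pi = beta[:, None] + beta[None, :] + Z @ gamma
    mu = 1.0 / (1.0 + np.exp(-pi))
    np.fill_diagonal(mu, 0.0)
    beta += 0.02 * (d_tilde - mu.sum(1))                          # 1st eq. of (9)
    gamma += 0.02 * (y_tilde - (Z[iu] * mu[iu][:, None]).sum(0))  # 2nd eq. of (9)
```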

4 Asymptotic properties

In this section, we present the consistency and asymptotic normality of the differentially private estimator $(\widehat{\beta},\widehat{\gamma})$. We first introduce some notation. For a subset $C\subset\mathbb{R}^n$, let $C^0$ and $\overline{C}$ denote the interior and closure of $C$, respectively. For a vector $x=(x_1,\ldots,x_n)^\top\in\mathbb{R}^n$, denote by $\|x\|$ a general norm on vectors, with the special cases $\|x\|_\infty=\max_{1\leq i\leq n}|x_i|$ and $\|x\|_1=\sum_i|x_i|$ for the $\ell_\infty$- and $\ell_1$-norm of $x$, respectively. Let $B(x,\epsilon)=\{y:\|x-y\|_\infty\leq\epsilon\}$ be an $\epsilon$-neighborhood of $x$. For an $n\times n$ matrix $J=(J_{ij})$, $\|J\|_\infty$ denotes the matrix norm induced by the $\ell_\infty$-norm on vectors in $\mathbb{R}^n$, i.e.,

\|J\|_{\infty}=\max_{x\neq 0}\frac{\|Jx\|_{\infty}}{\|x\|_{\infty}}=\max_{1\leq i\leq n}\sum_{j=1}^{n}|J_{ij}|,

and let $\|J\|$ be a general matrix norm. Define the matrix maximum norm $\|J\|_{\max}=\max_{i,j}|J_{ij}|$. We use the superscript "*" to denote the true parameter under which the data are generated. When there is no ambiguity, we omit the superscript "*". Define

z_{*}:=\max_{i,j}\|z_{ij}\|_{\infty}.

The notation $\sum_{j<i}$ is shorthand for $\sum_{i=1}^n\sum_{j=1,j<i}^n$.
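As a small check of the induced-norm identity above, the $\ell_\infty$ operator norm is just the maximum absolute row sum:

```python
import numpy as np

# The matrix norm induced by the vector sup-norm equals the maximum
# absolute row sum, as in the displayed identity.
J = np.array([[1.0, -2.0], [3.0, 4.0]])
row_sum_norm = np.abs(J).sum(axis=1).max()
builtin = np.linalg.norm(J, np.inf)
```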

Recall that $\mu(x)=e^x/(1+e^x)$. Write $\mu'$, $\mu''$ and $\mu'''$ for the first, second and third derivatives of $\mu(x)$ with respect to $x$, respectively. A direct calculation gives

\mu^{\prime}(x)=\frac{e^{x}}{(1+e^{x})^{2}},~~\mu^{\prime\prime}(x)=\frac{e^{x}(1-e^{x})}{(1+e^{x})^{3}},~~\mu^{\prime\prime\prime}(x)=\frac{e^{x}(1-4e^{x}+e^{2x})}{(1+e^{x})^{4}}.

It is easily checked that

|\mu^{\prime}(x)|\leq\frac{1}{4},~~|\mu^{\prime\prime}(x)|\leq\frac{1}{4},~~|\mu^{\prime\prime\prime}(x)|\leq\frac{1}{4}. (10)
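These closed forms and the uniform bounds in (10) are easy to verify numerically over a wide grid:

```python
import numpy as np

# Numerical check of the derivative formulas and the uniform bounds (10)
# for mu(x) = e^x / (1 + e^x).
x = np.linspace(-20.0, 20.0, 200001)
ex = np.exp(x)
mu1 = ex / (1.0 + ex) ** 2
mu2 = ex * (1.0 - ex) / (1.0 + ex) ** 3
mu3 = ex * (1.0 - 4.0 * ex + ex ** 2) / (1.0 + ex) ** 4
```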

Let $\epsilon_{n1}$ and $\epsilon_{n2}$ be two small positive numbers. Note that $f(x)=e^x(1+e^x)^{-2}$ is a decreasing function of $x$ when $x\geq 0$ and that $f(x)=f(-x)$. Recall that $\pi_{ij}=\beta_i+\beta_j+z_{ij}^\top\gamma$. Define

b_{n}:=\sup_{\beta\in B(\beta^{*},\epsilon_{n1}),\gamma\in B(\gamma^{*},\epsilon_{n2})}\max_{i,j}\frac{(1+e^{\pi_{ij}})^{2}}{e^{\pi_{ij}}}=O(e^{2\|\beta^{*}\|_{\infty}+\|\gamma^{*}\|_{\infty}}). (11)

In other words, we have

\inf_{\beta\in B(\beta^{*},\epsilon_{n1}),\gamma\in B(\gamma^{*},\epsilon_{n2})}\min_{i,j}\frac{e^{\pi_{ij}}}{(1+e^{\pi_{ij}})^{2}}\geq\frac{1}{b_{n}}.

Note that $b_n\geq 4$, since $(1+t)^2/t\geq 4$ for all $t>0$. When no confusion arises, we simply write $\mu_{ij}$ instead of $\mu_{ij}(\beta,\gamma)$ for short. We use the notations $\mu(\pi_{ij})$ and $\mu_{ij}(\beta,\gamma)$ interchangeably. Hereafter, we assume that the dimension $p$ of $z_{ij}$ is fixed.

4.1 Consistency

To derive consistency of the differentially private estimator, let us first define a system of functions based on the estimating equations (9). Define

F_{i}(\beta,\gamma)=\sum_{j=1,j\neq i}^{n}\mu_{ij}(\beta,\gamma)-\tilde{d}_{i},~~i=1,\ldots,n, (12)

and $F(\beta,\gamma)=(F_1(\beta,\gamma),\ldots,F_n(\beta,\gamma))^\top$. Further, we define $F_{\gamma,i}(\beta)$ as the value of $F_i(\beta,\gamma)$ for an arbitrarily fixed $\gamma$, and $F_\gamma(\beta)=(F_{\gamma,1}(\beta),\ldots,F_{\gamma,n}(\beta))^\top$. Let $\widehat{\beta}_\gamma$ be a solution to $F_\gamma(\beta)=0$. Correspondingly, we define two functions for exploring the asymptotic behavior of the estimator of the homophily parameter:

Q(\beta,\gamma)=\sum_{j<i}z_{ij}\mu_{ij}(\beta,\gamma)-\tilde{y}, (13)
Q_{c}(\gamma)=\sum_{j<i}z_{ij}\mu_{ij}(\widehat{\beta}_{\gamma},\gamma)-\tilde{y}. (14)

$Q_c(\gamma)$ can be viewed as the profile function of $Q(\beta,\gamma)$ in which the degree parameter $\beta$ is profiled out. It is clear that

F(\widehat{\beta},\widehat{\gamma})=0,~~F_{\gamma}(\widehat{\beta}_{\gamma})=0,~~Q(\widehat{\beta},\widehat{\gamma})=0,~~Q_{c}(\widehat{\gamma})=0.

By the chain rule, we have

0=\frac{\partial F_{\gamma}(\widehat{\beta}_{\gamma})}{\partial\gamma^{\top}}=\frac{\partial F(\widehat{\beta}_{\gamma},\gamma)}{\partial\beta^{\top}}\frac{\partial\widehat{\beta}_{\gamma}}{\partial\gamma^{\top}}+\frac{\partial F(\widehat{\beta}_{\gamma},\gamma)}{\partial\gamma^{\top}}, (15)
\frac{\partial Q_{c}(\gamma)}{\partial\gamma^{\top}}=\frac{\partial Q(\widehat{\beta}_{\gamma},\gamma)}{\partial\beta^{\top}}\frac{\partial\widehat{\beta}_{\gamma}}{\partial\gamma^{\top}}+\frac{\partial Q(\widehat{\beta}_{\gamma},\gamma)}{\partial\gamma^{\top}}. (16)

Solving for $\partial\widehat{\beta}_\gamma/\partial\gamma^\top$ in (15) and substituting it into (16), we get the Jacobian matrix $Q_c'(\gamma)$ $(=\partial Q_c(\gamma)/\partial\gamma^\top)$:

\frac{\partial Q_{c}(\gamma)}{\partial\gamma^{\top}}=\frac{\partial Q(\widehat{\beta}_{\gamma},\gamma)}{\partial\gamma^{\top}}-\frac{\partial Q(\widehat{\beta}_{\gamma},\gamma)}{\partial\beta^{\top}}\left[\frac{\partial F(\widehat{\beta}_{\gamma},\gamma)}{\partial\beta^{\top}}\right]^{-1}\frac{\partial F(\widehat{\beta}_{\gamma},\gamma)}{\partial\gamma^{\top}}, (17)

where

\frac{\partial Q(\widehat{\beta}_{\gamma},\gamma)}{\partial\gamma^{\top}}:=\frac{\partial Q(\beta,\gamma)}{\partial\gamma^{\top}}{\Big{|}}_{\beta=\widehat{\beta}_{\gamma}},~~\frac{\partial F(\widehat{\beta}_{\gamma},\gamma)}{\partial\beta^{\top}}:=\frac{\partial F(\beta,\gamma)}{\partial\beta^{\top}}{\Big{|}}_{\beta=\widehat{\beta}_{\gamma}}.

The asymptotic behavior of $\widehat{\gamma}$ crucially depends on the Jacobian matrix $Q_c'(\gamma)$. Since $\widehat{\beta}_\gamma$ does not have a closed form, conditions imposed directly on $Q_c'(\gamma)$ are not easily checked. To derive feasible conditions, we define

H(\beta,\gamma)=\frac{\partial Q(\beta,\gamma)}{\partial\gamma^{\top}}-\frac{\partial Q(\beta,\gamma)}{\partial\beta^{\top}}\left[\frac{\partial F(\beta,\gamma)}{\partial\beta^{\top}}\right]^{-1}\frac{\partial F(\beta,\gamma)}{\partial\gamma^{\top}}, (18)

which is a general form of $\partial Q_c(\gamma)/\partial\gamma^\top$. Note that $H(\widehat{\beta}_\gamma,\gamma)$ is the Fisher information matrix of the concentrated likelihood function $\ell_c(\gamma)$, in which the degree parameter $\beta$ is profiled out. When $\beta\in B(\beta^*,\epsilon_{n1})$ and $b_n^2\kappa_n^2\epsilon_{n1}=o(1)$, we have the approximation:

\frac{1}{n^{2}}H(\beta,\gamma^{*})=\frac{1}{n^{2}}H(\beta^{*},\gamma^{*})+o(1),

whose proof is given in the supplementary material. We assume that there exists a number $\rho_n$ such that

\sup_{\beta\in B(\beta^{*},\epsilon_{n1})}\|H^{-1}(\beta,\gamma^{*})\|_{\infty}\leq\frac{\rho_{n}}{n^{2}}.

Note that the dimension of $H(\beta,\gamma)$ is fixed and each of its entries is a sum of $n(n-1)/2$ terms. If $n^{-2}H(\beta,\gamma^*)$ converges to a constant matrix, then $\rho_n$ is bounded. Moreover, if $n^{-2}H(\beta,\gamma^*)$ is positive definite, then

\rho_{n}=\sqrt{p}\times\sup_{\beta\in B(\beta^{*},\epsilon_{n1})}1/\lambda_{\min}(\beta),

where $\lambda_{\min}(\beta)$ is the smallest eigenvalue of $n^{-2}H(\beta,\gamma^*)$.

We use a two-stage Newton iterative sequence to show consistency. In the first stage, we obtain an upper bound of the error between $\widehat{\beta}_\gamma$ and $\beta$ in terms of the $\ell_\infty$ norm for a given $\gamma$. This is done by verifying the well-known Newton-Kantorovich conditions, under which optimal error bounds are established. We then derive an upper bound of the error between $\widehat{\gamma}$ and $\gamma$ by using the profiled function $Q_c(\gamma)$ constructed from the estimating equations. Now we formally state the consistency result.

Theorem 1.

Let $\tilde\epsilon_n=1+(4k_n/\epsilon_n)(\log n/n)^{1/2}$ and $\tau_n=b_n^3+(k_n/\epsilon_n)b_n^3+z_*/(\log n)^{1/2}$. If

b_{n}^{2}\tilde{\epsilon}_{n}=o\left(\sqrt{\frac{n}{\log n}}\right),~~b_{n}^{3}\rho_{n}^{2}\tau_{n}z_{*}^{2}=o\left(\frac{n}{\log n}\right),

then the differentially private estimator $(\widehat{\beta},\widehat{\gamma})$ exists with probability approaching one and is consistent in the sense that

\|\widehat{\gamma}-\gamma^{*}\|_{\infty}=O_{p}\left(\frac{\rho_{n}\tau_{n}\log n}{n}\right)=o_{p}(1),
\|\widehat{\beta}-\beta^{*}\|_{\infty}=O_{p}\left(b_{n}\tilde{\epsilon}_{n}\sqrt{\frac{\log n}{n}}\right)=o_{p}(1).

The scalar factor $\tau_n$ appears because the magnitude of $\|Q_c(\gamma^*)\|_\infty$ is $O(\tau_n n\log n)$. Note that the error bound of the parameter estimator in Theorem 3 of Karwa and Slavković, (2016) does not depend on the privacy parameter. Our result here characterizes how the error bound varies with $\epsilon_n$. In view of the above theorem, we present the consistency conditions and the error bounds under two special cases. The first case is that the parameters and covariates are bounded. The second is that $k_n/\epsilon_n$ goes to zero.

Corollary 1.

Assume that β\|\beta^{*}\|_{\infty} and γ\|\gamma^{*}\|_{\infty} and zz_{*} are bounded above by a constant. If kn/ϵn=o(n/logn)k_{n}/\epsilon_{n}=o(n/\log n), then

γ^γ=Op(knϵnlognn),β^β=Op(lognn).\|\widehat{\gamma}-\gamma^{*}\|_{\infty}=O_{p}\left(\frac{k_{n}}{\epsilon_{n}}\cdot\frac{\log n}{n}\right),~{}~{}\|\widehat{\beta}-\beta^{*}\|_{\infty}=O_{p}\left(\sqrt{\frac{\log n}{n}}\right).
Corollary 2.

Assume that kn/ϵn0k_{n}/\epsilon_{n}\to 0. If bn=o((n/logn)1/4)b_{n}=o((n/\log n)^{1/4}), ρnbn3=o(n1/2/logn)\rho_{n}b_{n}^{3}=o(n^{1/2}/\log n) and z=o((logn)1/2)z_{*}=o((\log n)^{1/2}), then

γ^γ=Op(ρnbn3lognn),β^β=Op(bnlognn).\|\widehat{\gamma}-\gamma^{*}\|_{\infty}=O_{p}\left(\frac{\rho_{n}b_{n}^{3}\log n}{n}\right),~{}~{}\|\widehat{\beta}-\beta^{*}\|_{\infty}=O_{p}\left(b_{n}\sqrt{\frac{\log n}{n}}\right).
Remark 1.

The condition kn/ϵn0k_{n}/\epsilon_{n}\to 0 means that the outputs are nearly the same as the original network statistics. From Corollary 2, we can see that γ^\widehat{\gamma} has a convergence rate of order logn/n\log n/n.

4.2 Asymptotic normality of β^\widehat{\beta}

The asymptotic distribution of β^\widehat{\beta} depends crucially on the inverse of the Fisher information matrix VV of β\beta. Given m,M>0m,M>0, we say an n×nn\times n matrix V=(vij)V=(v_{ij}) belongs to the matrix class n(m,M)\mathcal{L}_{n}(m,M) if VV is a diagonally balanced matrix whose positive off-diagonal elements are bounded below by mm and above by MM, i.e.,

vii=j=1,jinvij,i=1,,n,0<mvijM,i,j=1,,n;ij.\begin{array}[]{l}v_{ii}=\sum_{j=1,j\neq i}^{n}v_{ij},~{}~{}i=1,\ldots,n,\\ 0<m\leq v_{ij}\leq M,~{}~{}i,j=1,\ldots,n;i\neq j.\end{array}

Clearly, Fγ(β)F^{\prime}_{\gamma}(\beta) belongs to the matrix class n(bn1,1/4)\mathcal{L}_{n}(b_{n}^{-1},1/4) when βB(β,ϵn1)\beta\in B(\beta^{*},\epsilon_{n1}) and γB(γ,ϵn2)\gamma\in B(\gamma^{*},\epsilon_{n2}). We will obtain the asymptotic distribution of the estimator β^\widehat{\beta} by deriving its asymptotic expression, which depends on the inverse of Fγ(β)F^{\prime}_{\gamma}(\beta). However, this inverse does not have a closed form. Yan et al., (2015) proposed to approximate the inverse V1V^{-1} of VV by the diagonal matrix

S=diag(1/v11,,1/vnn),S=\mathrm{diag}(1/v_{11},\ldots,1/v_{nn}), (19)

and obtained the upper bound of the approximate error.
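The quality of this diagonal approximation can be checked numerically. Below is a minimal sketch (not the paper's code): it builds a random diagonally balanced matrix in the class n(m,M)\mathcal{L}_{n}(m,M) with assumed values m=0.1m=0.1 and M=0.25M=0.25, and compares V1V^{-1} with S=diag(1/v11,,1/vnn)S=\mathrm{diag}(1/v_{11},\ldots,1/v_{nn}).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Off-diagonal entries in [0.1, 0.25], mimicking v_ij = Var(a_ij) <= 1/4
W = rng.uniform(0.1, 0.25, size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
V = W + np.diag(W.sum(axis=1))   # diagonally balanced: v_ii = sum_{j != i} v_ij
S = np.diag(1.0 / np.diag(V))    # diagonal approximation of V^{-1}

err = np.max(np.abs(np.linalg.inv(V) - S))
print(err)  # small, of order M^2/(n^2 m^3) as in the approximation-error bound
```

The observed maximum entrywise error is well within the O(M2/(n2m3))O(M^{2}/(n^{2}m^{3})) bound of Lemma 3 in the appendix.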

Note that vii=jiVar(aij)v_{ii}=\sum_{j\neq i}\mathrm{Var}(a_{ij}). By the central limit theorem for the bounded case, as in Loève, (1977, p.289), if bn=o(n1/2)b_{n}=o(n^{1/2}), then vii1/2{di𝔼(di)}v_{ii}^{-1/2}\{d_{i}-\mathbb{E}(d_{i})\} converges in distribution to the standard normal distribution. When considering the asymptotic behavior of the vector (d1,,dr)(d_{1},\ldots,d_{r}) with a fixed rr, one can replace the degrees d1,,drd_{1},\ldots,d_{r} by the independent random variables d~i=di,r+1++din\tilde{d}_{i}=d_{i,r+1}+\ldots+d_{in}, i=1,,ri=1,\ldots,r. Therefore, we have the following proposition.

Proposition 1.

If bn=o(n1/2)b_{n}=o(n^{1/2}), then as nn\to\infty:
(1)For any fixed r1r\geq 1, the components of (d1𝔼(d1),,dr𝔼(dr))(d_{1}-\mathbb{E}(d_{1}),\ldots,d_{r}-\mathbb{E}(d_{r})) are asymptotically independent and normally distributed with variances v11,,vrrv_{11},\ldots,v_{rr}, respectively.
(2)More generally, i=1nci(di𝔼(di))/vii\sum_{i=1}^{n}c_{i}(d_{i}-\mathbb{E}(d_{i}))/\sqrt{v_{ii}} is asymptotically normally distributed with mean zero and variance i=1ci2\sum_{i=1}^{\infty}c_{i}^{2} whenever c1,c2,c_{1},c_{2},\ldots are fixed constants and the latter sum is finite.

Part (2) follows from part (1) and the fact that

limrlim supnVar(i=r+1ncidi𝔼(di)vii)=0\lim_{r\to\infty}\limsup_{n\to\infty}\mathrm{Var}\left(\sum_{i=r+1}^{n}c_{i}\frac{d_{i}-\mathbb{E}(d_{i})}{\sqrt{v_{ii}}}\right)=0

by Theorem 4.2 of Billingsley, (1995). To prove the above equation, it suffices to show that the eigenvalues of the covariance matrix of (di𝔼(di))/vii1/2(d_{i}-\mathbb{E}(d_{i}))/v_{ii}^{1/2}, i=r+1,,ni=r+1,\ldots,n, are bounded by 2 (for all r<nr<n). This holds by the well-known Perron-Frobenius theory: if AA is a symmetric matrix with nonnegative elements, then its largest eigenvalue is at most its largest row sum.
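The row-sum eigenvalue bound invoked above is easy to check numerically; this small sketch (illustrative only) draws a random symmetric nonnegative matrix and compares its largest eigenvalue with its largest row sum.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
A = rng.uniform(0.0, 1.0, size=(n, n))
A = (A + A.T) / 2                      # symmetric matrix with nonnegative entries

lam_max = np.max(np.linalg.eigvalsh(A))  # largest eigenvalue
row_sum_max = np.max(A.sum(axis=1))      # largest row sum
print(lam_max <= row_sum_max + 1e-10)    # True
```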

We apply a second order Taylor expansion to jiμij(β^,γ^)\sum_{j\neq i}\mu_{ij}(\widehat{\beta},\widehat{\gamma}) to derive the asymptotic expression of β^\widehat{\beta}. In the expansion, the first order term is the sum of V(β^β)V(\widehat{\beta}-\beta) and Vγβ(γ^γ)V_{\gamma\beta}(\widehat{\gamma}-\gamma), where Vγβ=F(β,γ)/γV_{\gamma\beta}=\partial F(\beta,\gamma)/\partial\gamma^{\top}. Since V1V^{-1} does not have a closed form, we work with SS defined at (19) to approximate it. By Theorem 1, γ^\widehat{\gamma} has an n1n^{-1} convergence rate up to a logarithmic factor, which makes the term Vγβ(γ^γ)V_{\gamma\beta}(\widehat{\gamma}-\gamma) a remainder term. The second order term in the expansion is also asymptotically negligible. Then we represent β^β\widehat{\beta}-\beta as the sum of S(d𝔼d)S(d-\mathbb{E}d) and a remainder term. The central limit theorem is proved by establishing the asymptotic normality of S(d𝔼d)S(d-\mathbb{E}d) and showing the remainder is negligible. We formally state the central limit theorem as follows.

Theorem 2.

Assume that the conditions in Theorem 1 hold. If bn3ε~n2+zρnτnbn=o(n1/2/logn)b_{n}^{3}\tilde{\varepsilon}_{n}^{2}+z_{*}\rho_{n}\tau_{n}b_{n}=o(n^{1/2}/\log n), then for fixed kk, the vector (v111/2(β^1β1),,vkk1/2(β^kβk))(v_{11}^{1/2}(\widehat{\beta}_{1}-\beta^{*}_{1}),\ldots,v_{kk}^{1/2}(\widehat{\beta}_{k}-\beta^{*}_{k})) converges in distribution to the kk-dimensional multivariate standard normal distribution.

Remark 2.

The asymptotic variance of β^i\widehat{\beta}_{i} is vii1v_{ii}^{-1}, lying between O(bn/n)O(b_{n}/n) and O(1/n)O(1/n), which is the same as that of the non-private estimator.

4.3 Asymptotic normality of γ^\widehat{\gamma}

Let TijT_{ij} be an nn-dimensional column vector with iith and jjth elements ones and other elements zeros. Define

V=F(β,γ)β,Vγβ=Q(β,γ)β,sij(β,γ)=(aij𝔼aij)(zijVγβV1Tij).\begin{array}[]{c}V=\frac{\partial F(\beta^{*},\gamma^{*})}{\partial\beta^{\top}},~{}~{}V_{\gamma\beta}=\frac{\partial Q(\beta^{*},\gamma^{*})}{\partial\beta^{\top}},\\ s_{ij}(\beta,\gamma)=(a_{ij}-\mathbb{E}a_{ij})(z_{ij}-V_{\gamma\beta}V^{-1}T_{ij}).\end{array}

Note that sij(β,γ)s_{ij}(\beta,\gamma), i<ji<j, are independent vectors. By direct calculations,

(Vγβ)kj=Qk(β,γ)βj=ijzijkμ(πij),(V_{\gamma\beta})_{kj}=\frac{\partial Q_{k}(\beta^{*},\gamma^{*})}{\partial\beta_{j}}=\sum_{i\neq j}z_{ijk}\mu^{\prime}(\pi^{*}_{ij}),

such that

Vγβz4(n1).\|V_{\gamma\beta}\|_{\infty}\leq\frac{z_{*}}{4}(n-1).

By Lemma 4, we have

VγβV1VγβV1z(n1)4bn(3n4)2(n1)(n2)<bnz.\|V_{\gamma\beta}V^{-1}\|_{\infty}\leq\|V_{\gamma\beta}\|_{\infty}\|V^{-1}\|_{\infty}\leq\frac{z_{*}(n-1)}{4}\cdot\frac{b_{n}(3n-4)}{2(n-1)(n-2)}<b_{n}z_{*}.

Therefore, all sij(β,γ)s_{ij}(\beta^{*},\gamma^{*}) are bounded. Let Q¯=Q(β,γ)η\bar{Q}=Q(\beta^{*},\gamma^{*})-\eta and F¯=F(β,γ)ξ\bar{F}=F(\beta^{*},\gamma^{*})-\xi. Note that

Cov(i<jsij(β,γ))=Cov(Q¯VγβV1F¯)=H(β,γ),\mathrm{Cov}(\sum\nolimits_{i<j}s_{ij}(\beta^{*},\gamma^{*}))=\mathrm{Cov}(\bar{Q}-V_{\gamma\beta}V^{-1}\bar{F})=H(\beta^{*},\gamma^{*}),

where H(β,γ)H(\beta,\gamma) is defined at (18). By the central limit theorem for the bounded case, as in Loève, (1977, p.289), we have the following proposition.

Proposition 2.

For any nonzero fixed vector c=(c1,,cp)c=(c_{1},\ldots,c_{p})^{\top}, if cH(β,γ)cc^{\top}H(\beta^{*},\gamma^{*})c diverges, then
(cH(β,γ)c)1/2ci<jsij(β,γ)(c^{\top}H(\beta^{*},\gamma^{*})c)^{-1/2}c^{\top}\sum_{i<j}s_{ij}(\beta^{*},\gamma^{*}) converges in distribution to the standard normal distribution.

Let N=n(n1)/2N=n(n-1)/2 and

H¯=limn1NH(β,γ).\bar{H}=\lim_{n\to\infty}\frac{1}{N}H(\beta^{*},\gamma^{*}).

We assume that the above limit exists. We briefly describe the idea of proving asymptotic normality of γ^\widehat{\gamma}. We use a mean-value expansion to derive the explicit expression of γ^γ\widehat{\gamma}-\gamma^{*}, which mainly contains a term Qc(γ)Q_{c}(\gamma^{*}) multiplied by H¯1\bar{H}^{-1}. Then by applying a third-order Taylor expansion to Qc(γ)Q_{c}(\gamma^{*}), we find that the first order term is asymptotically normal, the second is the asymptotic bias term and the third is a remainder term. The asymptotic distribution of γ^\widehat{\gamma} is stated below.

Theorem 3.

Assume that the conditions in Theorem 1 hold. If bn4ϵ~n3z=o(n1/2/(logn)3/2)b_{n}^{4}\tilde{\epsilon}_{n}^{3}z_{*}=o(n^{1/2}/(\log n)^{3/2}), then as nn goes to infinity, Nc(γ^γ)\sqrt{N}c^{\top}(\widehat{\gamma}-\gamma^{*}) converges in distribution to the normal distribution with mean cH¯1B-c^{\top}\bar{H}^{-1}B_{*} and variance cH¯1cc^{\top}\bar{H}^{-1}c, where

B=1Nk=1njkzkjμ′′(πkj)jkμ(πkj).B_{*}=\frac{1}{\sqrt{N}}\sum_{k=1}^{n}\frac{\sum_{j\neq k}z_{kj}\mu^{\prime\prime}(\pi_{kj}^{*})}{\sum_{j\neq k}\mu^{\prime}(\pi_{kj}^{*})}. (20)
Remark 3.

We now discuss the bias term BB_{*} in (20). We assume that zijz_{ij} is centered and independently drawn from a pp-dimensional multivariate distribution with bounded support. Then, by Hoeffding's (1963) inequality, jkzkjμ′′(πkj)=op((nlogn)1/2)\sum_{j\neq k}z_{kj}\mu^{\prime\prime}(\pi_{kj}^{*})=o_{p}((n\log n)^{1/2}) such that B=Op((logn/n)1/2)B_{*}=O_{p}((\log n/n)^{1/2}). In this case, there is no bias in the limiting distribution of γ^\widehat{\gamma}. When B=O(1)B_{*}=O(1), the confidence intervals and the p-values of hypothesis tests constructed from γ^\widehat{\gamma} cannot achieve the nominal level without bias correction. This is the well-known incidental parameter problem in the econometrics literature [Neyman and Scott, (1948); Graham, (2017)]. As in Dzemski, (2019), we could use a simple analytical bias correction formula: γ^bc=γ^N1/2H1(β^,γ^)B^\widehat{\gamma}_{bc}=\widehat{\gamma}-N^{-1/2}H^{-1}(\widehat{\beta},\widehat{\gamma})\widehat{B}, where B^\widehat{B} is the plug-in estimate of BB_{*} obtained by replacing β\beta^{*} and γ\gamma^{*} with their estimators β^\widehat{\beta} and γ^\widehat{\gamma}, respectively.
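The analytical bias correction above is a single linear-algebra step once the plug-in estimates are available. Here is a minimal sketch with made-up inputs: the arrays `H_hat` and `B_hat` stand in for the (hypothetical) plug-in estimates of HH and BB_{*}, which in practice must be computed from the fitted model.

```python
import numpy as np

def bias_correct(gamma_hat, H_hat, B_hat, n):
    """Analytical bias correction gamma_bc = gamma_hat - N^{-1/2} H^{-1} B_hat,
    with N = n(n-1)/2, following the formula stated in Remark 3."""
    N = n * (n - 1) / 2
    return gamma_hat - np.linalg.solve(H_hat, B_hat) / np.sqrt(N)

# toy illustration with made-up values (p = 2 covariates)
gamma_hat = np.array([0.5, -0.5])
H_hat = np.array([[2.0, 0.1], [0.1, 1.5]])
B_hat = np.array([0.3, -0.2])
print(bias_correct(gamma_hat, H_hat, B_hat, n=100))
```

Using `np.linalg.solve` avoids forming the explicit inverse of the estimated HH.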

5 Numerical studies

5.1 Simulations

In this section, we evaluate the performance of the asymptotic theories on networks of finite size. The parameters in the simulations are as follows. The parameter β{\beta}^{*} took a linear form. Specifically, we set βi=(i1)clogn/(n1)\beta_{i}^{*}=(i-1)c\log n/(n-1) for i=1,,ni=1,\ldots,n, where we chose four different values for cc, i.e., c=0.05,0.15,0.3,0.5c=0.05,0.15,0.3,0.5. Since the conditions guaranteeing the asymptotic properties in the theorems depend on the quantity kn/ϵnk_{n}/\epsilon_{n} as a whole, we fixed kn=1k_{n}=1 and let ϵn\epsilon_{n} go to zero as nn increases. We considered two different values for ϵn\epsilon_{n}: logn/n1/6\log n/n^{1/6} and logn/n1/4\log n/n^{1/4}. The edge covariates were formed as follows. For each node ii, we generated two dichotomous random variables xi1x_{i1} and xi2x_{i2} from {1,1}\{1,-1\}, with unequal probabilities 0.40.4 and 0.60.6 and with equal probabilities, respectively. Then we set zij=(xi1xj1,xi2xj2)z_{ij}=(x_{i1}*x_{j1},x_{i2}*x_{j2})^{\top}. We set γ=(0.5,0.5)\gamma^{*}=(0.5,-0.5)^{\top}; the first, positive, component measures the homophily effect and the second, negative, component measures the heterophilic effect. We carried out simulations under two network sizes, n=100n=100 and n=200n=200, and each simulation was repeated 10,00010,000 times.
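This simulation design can be sketched in a few lines. The following is an illustrative reconstruction (not the authors' code), assuming edges are sampled independently with logistic probabilities μij=logit1(βi+βj+zijγ)\mu_{ij}=\mathrm{logit}^{-1}(\beta_{i}+\beta_{j}+z_{ij}^{\top}\gamma) as in the generalized β\beta-model.

```python
import numpy as np

def simulate_network(n, c, gamma=(0.5, -0.5), seed=0):
    """Sketch of the simulation design: beta_i = (i-1) c log(n)/(n-1),
    dichotomous node covariates, edge covariates z_ij = (x_i1*x_j1, x_i2*x_j2)."""
    rng = np.random.default_rng(seed)
    beta = np.arange(n) * c * np.log(n) / (n - 1)
    x1 = rng.choice([1, -1], size=n, p=[0.4, 0.6])  # unequal probabilities
    x2 = rng.choice([1, -1], size=n)                # equal probabilities
    gamma = np.asarray(gamma)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            z = np.array([x1[i] * x1[j], x2[i] * x2[j]])
            p = 1.0 / (1.0 + np.exp(-(beta[i] + beta[j] + z @ gamma)))
            A[i, j] = A[j, i] = rng.random() < p
    return A

A = simulate_network(100, 0.15)
print(A.sum(axis=0).min(), A.sum(axis=0).max())   # degree range
```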

By Theorem 2, ξ^ij=[β^iβ^j(βiβj)]/(1/v^ii+1/v^jj)1/2\hat{\xi}_{ij}=[\hat{\beta}_{i}-\hat{\beta}_{j}-(\beta_{i}^{*}-\beta_{j}^{*})]/(1/\hat{v}_{ii}+1/\hat{v}_{jj})^{1/2} converges in distribution to the standard normal distribution, where v^ii\hat{v}_{ii} is the estimate of viiv_{ii} obtained by replacing β\beta^{*} with β^\hat{\beta}. We chose five special pairs (1,2),(n/2,n/2+1)(1,2),(n/2,n/2+1), (n1,n)(n-1,n), (1,n/2)(1,n/2) and (1,n)(1,n) for (i,j)(i,j). We recorded the coverage probability of the 95%95\% confidence interval, the length of the confidence interval, and the frequency that the estimates do not exist. These values for γ^\widehat{\gamma} are also reported.

Table 1: The reported values are the coverage frequency (×100%\times 100\%) for βiβj\beta_{i}-\beta_{j} for a pair (i,j)(i,j) / the length of the confidence interval / the frequency (×100%\times 100\%) that the estimate did not exist.
nn (i,j)(i,j) c=0.05c=0.05 c=0.15c=0.15 c=0.3c=0.3 c=0.5c=0.5
ϵn=logn/n1/6\epsilon_{n}=\log n/n^{1/6}
100 (1, 2) 93.98/1.19/093.98/1.19/0 93.80/1.21/093.80/1.21/0 93.51/1.27/0.0893.51/1.27/0.08 93.36/1.38/28.1693.36/1.38/28.16
(49,50) 93.91/1.20/093.91/1.20/0 93.80/1.25/093.80/1.25/0 93.57/1.43/0.0893.57/1.43/0.08 93.26/1.84/28.1693.26/1.84/28.16
(99, 100) 93.62/1.21/093.62/1.21/0 93.69/1.32/093.69/1.32/0 93.51/1.73/0.0893.51/1.73/0.08 96.60/2.89/28.1696.60/2.89/28.16
(1,50) 94.15/1.20/094.15/1.20/0 93.88/1.23/093.88/1.23/0 93.64/1.34/0.0893.64/1.34/0.08 92.93/1.61/28.1692.93/1.61/28.16
(1, 100) 93.91/1.20/093.91/1.20/0 94.29/1.26/094.29/1.26/0 93.53/1.50/0.0893.53/1.50/0.08 95.50/2.24/28.1695.50/2.24/28.16
200 (1, 2) 94.37/0.84/094.37/0.84/0 94.57/0.85/094.57/0.85/0 94.44/0.90/0.0194.44/0.90/0.01 94.23/1.00/16.6594.23/1.00/16.65
(99,100) 94.19/0.84/094.19/0.84/0 94.56/0.89/094.56/0.89/0 94.08/1.05/0.0194.08/1.05/0.01 93.88/1.43/16.6593.88/1.43/16.65
(199,200) 94.80/0.85/094.80/0.85/0 94.11/0.95/094.11/0.95/0 94.76/1.34/0.0194.76/1.34/0.01 96.05/2.46/16.6596.05/2.46/16.65
(1,100) 94.27/0.84/094.27/0.84/0 94.38/0.87/094.38/0.87/0 94.51/0.98/0.0194.51/0.98/0.01 93.57/1.23/16.6593.57/1.23/16.65
(1,200) 94.34/0.84/094.34/0.84/0 94.60/0.90/094.60/0.90/0 94.69/1.14/0.0194.69/1.14/0.01 95.76/1.81/16.6595.76/1.81/16.65
ϵn=logn/n1/4\epsilon_{n}=\log n/n^{1/4}
100 (1, 2) 92.61/1.20/092.61/1.20/0 92.74/1.21/092.74/1.21/0 91.88/1.27/0.4091.88/1.27/0.40 91.44/1.39/42.7991.44/1.39/42.79
(49,50) 92.48/1.20/092.48/1.20/0 92.45/1.25/092.45/1.25/0 91.53/1.43/0.4091.53/1.43/0.40 90.04/1.85/42.7990.04/1.85/42.79
(99, 100) 92.69/1.21/092.69/1.21/0 92.40/1.32/092.40/1.32/0 90.98/1.73/0.4090.98/1.73/0.40 95.30/2.89/42.7995.30/2.89/42.79
(1,50) 92.66/1.20/092.66/1.20/0 92.67/1.23/092.67/1.23/0 92.03/1.35/0.4092.03/1.35/0.40 90.81/1.61/42.7990.81/1.61/42.79
(1, 100) 92.52/1.20/092.52/1.20/0 92.77/1.26/092.77/1.26/0 91.19/1.51/0.4091.19/1.51/0.40 93.60/2.17/42.7993.60/2.17/42.79
200 (1, 2) 93.74/0.84/093.74/0.84/0 93.87/0.85/093.87/0.85/0 93.59/0.90/0.0493.59/0.90/0.04 93.15/1.00/33.7293.15/1.00/33.72
(99,100) 93.47/0.84/093.47/0.84/0 93.83/0.89/093.83/0.89/0 93.18/1.05/0.0493.18/1.05/0.04 92.14/1.43/33.7292.14/1.43/33.72
(199,200) 93.95/0.85/093.95/0.85/0 93.26/0.95/093.26/0.95/0 92.98/1.34/0.0492.98/1.34/0.04 93.84/2.47/33.7293.84/2.47/33.72
(1,100) 93.76/0.84/093.76/0.84/0 93.68/0.87/093.68/0.87/0 93.57/0.98/0.0493.57/0.98/0.04 92.11/1.23/33.7292.11/1.23/33.72
(1,200) 93.55/0.84/093.55/0.84/0 93.92/0.90/093.92/0.90/0 93.09/1.13/0.0493.09/1.13/0.04 94.16/1.81/33.7294.16/1.81/33.72

Table 1 reports the simulation results for βiβj\beta_{i}^{*}-\beta_{j}^{*}. The reported frequencies and lengths are conditional on the event that the estimator exists. We found that the empirical coverage frequencies are very close to the nominal 95%95\% level when c0.3c\leq 0.3 and ϵn=logn/n1/6\epsilon_{n}=\log n/n^{1/6}, and a little less than the nominal level when ϵn=logn/n1/4\epsilon_{n}=\log n/n^{1/4} and n=100n=100. As expected, the length of the confidence interval increases with cc; conversely, it decreases as nn increases. When c=0.5c=0.5, the estimator failed to exist with frequencies over 20%20\%. In the other cases, the estimates existed in almost every simulation.

Table 2 reports the median of γ^\widehat{\gamma} as well as that of the bias-corrected estimator γ^bc\widehat{\gamma}_{bc} given in Remark 3. As we can see, the bias is small, and the empirical coverage frequencies for the bias-corrected estimator γ^bc\widehat{\gamma}_{bc} are closer to the target 95%95\% level than those for the uncorrected γ^\widehat{\gamma} when c0.15c\leq 0.15. When c=0.3c=0.3 and n=100n=100, they are a little lower than the target level. On the other hand, when nn is fixed, the length of the confidence interval of γ^\widehat{\gamma} increases as cc becomes larger.

Table 2: The reported values are the coverage frequency (×100%\times 100\%) for γi\gamma_{i} for ii with bias-correction (uncorrected)/ bias (×100\times 100) / length of confidence interval (×10\times 10) /the frequency (×100%\times 100\%) that the MLE did not exist (𝜸=(0.5,0.5)\boldsymbol{\gamma}^{*}=(0.5,-0.5)^{\top}).
nn cc γ\gamma ϵn=logn/n1/6\epsilon_{n}=\log n/n^{1/6} ϵn=logn/n1/4\epsilon_{n}=\log n/n^{1/4}
100100 0.050.05 γ1\gamma_{1} 95.14(93.69)/1.03/0.13/095.14(93.69)/1.03/0.13/0 94.84(92.37)/1.02/0.13/094.84(92.37)/1.02/0.13/0
γ2\gamma_{2} 95.31(93.62)/1.01/0.12/095.31(93.62)/1.01/0.12/0 95.01(92.62)/1.00/0.12/095.01(92.62)/1.00/0.12/0
0.150.15 γ1\gamma_{1} 94.65(93.40)/0.87/0.13/094.65(93.40)/0.87/0.13/0 94.70(93.18)/0.87/0.13/094.70(93.18)/0.87/0.13/0
γ2\gamma_{2} 94.71(92.96)/0.80/0.13/094.71(92.96)/0.80/0.13/0 94.61(92.57)/0.80/0.13/094.61(92.57)/0.80/0.13/0
0.30.3 γ1\gamma_{1} 94.10(93.74)/0.37/0.15/0.0894.10(93.74)/0.37/0.15/0.08 92.96(92.17)/0.37/0.15/0.4092.96(92.17)/0.37/0.15/0.40
γ2\gamma_{2} 93.87(93.47)/0.20/0.15/0.0893.87(93.47)/0.20/0.15/0.08 93.01(92.52)/0.20/0.15/0.4093.01(92.52)/0.20/0.15/0.40
0.50.5 γ1\gamma_{1} 91.08(92.72)/0.75/0.20/28.1691.08(92.72)/0.75/0.20/28.16 89.70(91.94)/0.76/0.20/42.7989.70(91.94)/0.76/0.20/42.79
γ2\gamma_{2} 90.41(93.19)/1.08/0.19/28.1690.41(93.19)/1.08/0.19/28.16 89.06(92.50)/1.08/0.19/42.7989.06(92.50)/1.08/0.19/42.79
200200 0.050.05 γ1\gamma_{1} 95.23(93.72)/0.51/0.06/095.23(93.72)/0.51/0.06/0 95.11(93.50)/0.51/0.06/095.11(93.50)/0.51/0.06/0
γ2\gamma_{2} 95.31(93.64)/0.50/0.06/095.31(93.64)/0.50/0.06/0 95.29(93.46)/0.50/0.06/095.29(93.46)/0.50/0.06/0
0.150.15 γ1\gamma_{1} 95.38(93.92)/0.41/0.07/095.38(93.92)/0.41/0.07/0 95.14(93.70)/0.41/0.07/095.14(93.70)/0.41/0.07/0
γ2\gamma_{2} 94.90(93.11)/0.37/0.06/094.90(93.11)/0.37/0.06/0 94.72(92.85)/0.37/0.06/094.72(92.85)/0.37/0.06/0
0.30.3 γ1\gamma_{1} 95.01(94.80)/0.08/0.08/0.0195.01(94.80)/0.08/0.08/0.01 94.60(94.30)/0.08/0.08/0.0494.60(94.30)/0.08/0.08/0.04
γ2\gamma_{2} 94.45(94.52)/0.02/0.08/0.0194.45(94.52)/0.02/0.08/0.01 94.10(94.22)/0.02/0.08/0.0494.10(94.22)/0.02/0.08/0.04
0.50.5 γ1\gamma_{1} 91.40(93.64)/0.64/0.11/16.6591.40(93.64)/0.64/0.11/16.65 90.43(93.00)/0.64/0.11/33.7290.43(93.00)/0.64/0.11/33.72
γ2\gamma_{2} 90.10(93.70)/0.86/0.10/16.6590.10(93.70)/0.86/0.10/16.65 89.56(93.29)/0.86/0.10/33.7289.56(93.29)/0.86/0.10/33.72

5.2 A real data example

We use the Enron email dataset as an example [Cohen, (2004)], available from https://www.cs.cmu.edu/~enron/. This dataset was released by William Cohen at Carnegie Mellon University and has been widely studied. The Enron email data was originally acquired and made public by the Federal Energy Regulatory Commission during its investigation into fraudulent accounting practices. Some of the emails have been deleted upon requests from affected employees. However, the raw data is messy and needs to be cleaned before any analysis is conducted. Zhou et al., (2007) applied data cleaning strategies to compile the Enron email dataset. We use their cleaned data for the subsequent analysis. The resulting data comprises 21,63521,635 messages sent between 156156 employees with their covariate information. Thus, the corresponding graph has multiple edges. We treat it as a simple undirected graph for our analysis, where an edge denotes that there is at least one message between the corresponding two nodes. We remove the isolated nodes “32” and “37” with zero degrees, for which the estimators of the corresponding degree parameters do not exist. This leaves a connected network with 154154 nodes and 18431843 edges. The minimum, 1/41/4 quantile, median, 3/43/4 quantile and maximum values of dd are 11, 1515, 2121, 3030 and 8888, respectively.
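The collapsing of the multigraph into a simple undirected graph and the removal of isolated nodes can be sketched as follows; this is an illustrative reconstruction on a toy message list, not the actual preprocessing code.

```python
import numpy as np

def to_simple_graph(messages, n_nodes):
    """Collapse a multigraph of (sender, receiver) pairs into a simple
    undirected adjacency matrix, then drop zero-degree (isolated) nodes."""
    A = np.zeros((n_nodes, n_nodes), dtype=int)
    for i, j in messages:
        if i != j:
            A[i, j] = A[j, i] = 1        # at least one message => one edge
    keep = A.sum(axis=0) > 0             # isolated nodes are removed
    return A[np.ix_(keep, keep)]

# toy example: node 3 sends/receives nothing and is removed
msgs = [(0, 1), (1, 0), (0, 1), (1, 2)]
A = to_simple_graph(msgs, 4)
print(A.shape)   # (3, 3)
```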

Refer to caption
Refer to caption
Figure 1: Visualization of the Enron email network among 154154 employees. The vertex sizes are proportional to nodal degrees. The positions of the vertices are the same in the two graphs. For nodes with degree less than 33, we set their sizes the same as that of a node with degree 33. In the left graph, the colors indicate different departments (red for legal, blue for trading and orange for other), while in the right graph, the colors represent different genders (blue for male and red for female).

Each employee has three categorical variables: department (Trading, Legal, Other), gender (Male, Female) and seniority (Senior, Junior). We plot the network with individual departments and genders in Figure 1. We can see that the degrees exhibit great variation across nodes, and it is not easy to judge homophilic or heterophilic effects by eye; this requires quantitative analysis. The 33-dimensional covariate vector zijz_{ij} of edge (i,j)(i,j) is formed by using a homophilic matching function between the three covariates of employees ii and jj, i.e., if xikx_{ik} and xjkx_{jk} are equal, then zijk=1z_{ijk}=1; otherwise zijk=1z_{ijk}=-1.
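The homophilic matching function is straightforward to implement; a minimal sketch with two hypothetical employees:

```python
import numpy as np

def matching_covariate(x_i, x_j):
    """Homophilic matching: z_ijk = 1 if the k-th attributes agree, else -1."""
    return np.where(np.asarray(x_i) == np.asarray(x_j), 1, -1)

# two employees described by (department, gender, seniority)
z = matching_covariate(("Trading", "Male", "Senior"), ("Legal", "Male", "Senior"))
print(z)   # [-1  1  1]
```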

We evaluate how close the private estimator (β^,γ^)(\hat{\beta},\hat{\gamma}) is to the non-private MLE (β~,γ~)(\tilde{\beta},\tilde{\gamma}) fitted in the generalized β\beta–model. We chose two values of ϵn\epsilon_{n} (logn/n1/6\log n/n^{1/6}, logn/n1/4\log n/n^{1/4}) and let kn=1k_{n}=1 as in the simulations, and repeatedly released dd and yy according to (7), 1,0001,000 times for each ϵn\epsilon_{n}. The frequencies that the private estimate failed to exist are 33.7%33.7\% and 45.2%45.2\% for ϵn=logn/n1/6\epsilon_{n}=\log n/n^{1/6} and ϵn=logn/n1/4\epsilon_{n}=\log n/n^{1/4}, respectively. Conditional on existence, we computed the median private estimate and the upper (97.5th97.5^{th}) and lower (2.5th2.5^{th}) quantiles. The results for the estimates γ^\widehat{\gamma} and β^\widehat{\beta} are shown in Table 3 and Figure 2. From Table 3, we can see that the median value of γ^\widehat{\gamma} coincides with the MLE and the MLE lies in the 95%95\% confidence interval. A similar phenomenon can be observed in Figure 2. From this figure, we can also see that the lengths of the confidence intervals of the private estimates under ϵn=logn/n1/6\epsilon_{n}=\log n/n^{1/6} are shorter than those under ϵn=logn/n1/4\epsilon_{n}=\log n/n^{1/4}.

Table 3: The nonprivate MLE γ^i0\widehat{\gamma}_{i}^{0} of γi\gamma_{i}, the median of private estimates of γi\gamma_{i}, the length of confidence interval for Enron email data.
Covariate γ^i0\hat{\gamma}_{i}^{0} γ^i\hat{\gamma}_{i} 95%95\% confidence interval
ϵn=logn/n1/6\epsilon_{n}=\log n/n^{1/6}
Department 0.016-0.016 0.016-0.016 [0.083,0.033][-0.083,0.033]
Gender 0.0630.063 0.0630.063 [0.005,0.136][0.005,0.136]
Seniority 0.0320.032 0.0320.032 [0.022,0.085][-0.022,0.085]
ϵn=logn/n1/4\epsilon_{n}=\log n/n^{1/4}
Department 0.016-0.016 0.016-0.016 [0.083,0.033][-0.083,0.033]
Gender 0.0630.063 0.0620.062 [0.004,0.135][0.004,0.135]
Seniority 0.0320.032 0.0320.032 [0.021,0.085][-0.021,0.085]
Refer to caption
Figure 2: The private estimate β^\hat{\beta} in red with the MLE in black for the Enron email network. The blue color indicates the upper and lower limits of the confidence interval.

6 Summary and discussion

We have presented the (kn,ϵn)(k_{n},\epsilon_{n})-edge-differentially private estimation for inferring the degree parameter and homophily parameter in the generalized β\beta–model. We establish consistency of the estimator under several conditions and also derive its asymptotic normal distribution. It is worth noting that the conditions imposed on bnb_{n} imply that the network density goes to zero at a very slow rate. When networks are very sparse, adding noise easily produces outputs with negative degrees, which lead to the non-existence of the maximum likelihood estimator in the covariate-adjusted β\beta-model. In addition, the conditions in Theorems 2 and 3 seem stronger than those needed for consistency. Note that the asymptotic behavior of the estimator depends not only on bnb_{n}, but also on the configuration of the parameters. It is of interest to see whether the conditions guaranteeing the theoretical properties could be relaxed.

We use a generalized β\beta–model to give a rigorous differential privacy analysis of networks with covariates. Notably, the assumption of the logistic distribution of an edge is not essential to our proof strategy. These principled methods should be applicable to a class of network models beyond the covariate-adjusted β\beta-model [Wang et al., (2023)]. For instance, the two-stage Newton method developed to prove the consistency of the differentially private estimator still works if the logistic distribution is replaced with the probit distribution. Further, the edge independence assumption is not directly used in this method; it only plays a role in deriving the upper bounds of dd and yy. There are many tail probability inequalities for dependent random variables that could be applied to network statistics in edge-dependent situations. We hope that the methods developed here can be further applied to edge-dependent network models.

7 Appendix

7.1 Preliminaries

In this section, we present three results that will be used in the proofs. The first is on the approximation error of using SS to approximate the inverse of VV belonging to the matrix class n(m,M)\mathcal{L}_{n}(m,M), where V=(vij)n×nV=(v_{ij})_{n\times n} and S=diag(1/v11,,1/vnn)S=\mathrm{diag}(1/v_{11},\ldots,1/v_{nn}). Yan et al., (2015) obtained the upper bound of the approximation error, which has an order n2n^{-2}. The second is a tight bound of V1\|V^{-1}\|_{\infty} in Hillar et al., (2012). These first two results are stated below as lemmas; the third, on Newton's method, is given in Lemma 5.

Lemma 3 (Yan et al., (2015)).

If Vn(m,M)V\in\mathcal{L}_{n}(m,M), then the following holds:

V1Smax1(n1)2(M2m2+nM22(n2)m3+3n22nm)=O(M2n2m3),\|V^{-1}-S\|_{\max}\leq\frac{1}{(n-1)^{2}}\left(\frac{M}{2m^{2}}+\frac{nM^{2}}{2(n-2)m^{3}}+\frac{3n-2}{2nm}\right)=O(\frac{M^{2}}{n^{2}m^{3}}),

where Amax=maxi,j|Aij|\|A\|_{\max}=\max_{i,j}|A_{ij}| for a general matrix AA.

Lemma 4 (Hillar et al., (2012)).

For Vn(m,M)V\in\mathcal{L}_{n}(m,M), when n3n\geq 3, we have

12M(n1)V13n42m(n1)(n2).\frac{1}{2M(n-1)}\leq\|V^{-1}\|_{\infty}\leq\frac{3n-4}{2m(n-1)(n-2)}.
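The two-sided bound of Lemma 4 can be checked numerically; the following sketch (illustrative only, with assumed class parameters m=0.1m=0.1 and M=0.25M=0.25) constructs a diagonally balanced matrix and compares V1\|V^{-1}\|_{\infty} with both bounds.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m_, M_ = 40, 0.1, 0.25
W = rng.uniform(m_, M_, size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
V = W + np.diag(W.sum(axis=1))           # V in L_n(m, M)

norm_inv = np.max(np.abs(np.linalg.inv(V)).sum(axis=1))   # ||V^{-1}||_inf
upper = (3 * n - 4) / (2 * m_ * (n - 1) * (n - 2))
lower = 1 / (2 * M_ * (n - 1))
print(lower <= norm_inv <= upper)   # True
```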

Let F(x):nnF(x):\mathbb{R}^{n}\to\mathbb{R}^{n} be a function vector on xnx\in\mathbb{R}^{n}. We say that a Jacobian matrix F(x)F^{\prime}(x) with xnx\in\mathbb{R}^{n} is Lipschitz continuous on a convex set DnD\subset\mathbb{R}^{n} if for any x,yDx,y\in D, there exists a constant λ>0\lambda>0 such that for any vector vnv\in\mathbb{R}^{n} the inequality

[F(x)]v[F(y)]vλxyv\|[F^{\prime}(x)]v-[F^{\prime}(y)]v\|_{\infty}\leq\lambda\|x-y\|_{\infty}\|v\|_{\infty}

holds. We will use the Newton iterative sequence to establish the existence and consistency of the differentially private estimator. Gragg and Tapia, (1974) gave the optimal error bound for the Newton method under the Kantorovich conditions [Kantorovich, (1948)]. We state only the partial results that are sufficient for our applications.

Lemma 5 (Gragg and Tapia, (1974)).

Let DD be an open convex set of n\mathbb{R}^{n} and F:DnF:D\to\mathbb{R}^{n} be Fréchet differentiable on DD with a Jacobian F(x)F^{\prime}(x) that is Lipschitz continuous on DD with Lipschitz coefficient λ\lambda. Assume that x0Dx_{0}\in D is such that [F(x0)]1[F^{\prime}(x_{0})]^{-1} exists,

[F(x0)]1,[F(x0)]1F(x0)δ,h=2λδ1,\|[F^{\prime}(x_{0})]^{-1}\|\leq\aleph,~{}~{}\|[F^{\prime}(x_{0})]^{-1}F(x_{0})\|\leq\delta,~{}~{}h=2\aleph\lambda\delta\leq 1,

and

B(x0,t)D,t=2h(11h)δ=2δ1+1h.B(x_{0},t^{*})\subset D,~{}~{}t^{*}=\frac{2}{h}(1-\sqrt{1-h})\delta=\frac{2\delta}{1+\sqrt{1-h}}.

Then: (1) The Newton iterations xk+1=xk[F(xk)]1F(xk)x_{k+1}=x_{k}-[F^{\prime}(x_{k})]^{-1}F(x_{k}) exist and xkB(x0,t)Dx_{k}\in B(x_{0},t^{*})\subset D for k0k\geq 0. (2) x=limxkx^{*}=\lim x_{k} exists, xB(x0,t)¯Dx^{*}\in\overline{B(x_{0},t^{*})}\subset D and F(x)=0F(x^{*})=0.
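The Newton iterations of Lemma 5 are the standard ones. As a self-contained sketch (a generic solver on a toy two-dimensional system, not the model's estimating equations):

```python
import numpy as np

def newton(F, Fprime, x0, tol=1e-10, max_iter=50):
    """Newton iterations x_{k+1} = x_k - [F'(x_k)]^{-1} F(x_k), as in Lemma 5."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(Fprime(x), F(x))   # solves F'(x) step = F(x)
        x = x - step
        if np.max(np.abs(step)) < tol:            # ell_infinity stopping rule
            break
    return x

# toy system F(x) = (x0^2 + x1 - 3, x0 + x1^2 - 5), with exact root (1, 2)
F = lambda x: np.array([x[0] ** 2 + x[1] - 3, x[0] + x[1] ** 2 - 5])
J = lambda x: np.array([[2 * x[0], 1.0], [1.0, 2 * x[1]]])
root = newton(F, J, [1.5, 1.5])
print(root)   # close to [1. 2.]
```

In the proofs, FF is the system of moment equations Fγ(β)F_{\gamma}(\beta) and the starting point is the true value β\beta^{*}.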

7.2 Error bound between β^γ\widehat{\beta}_{\gamma} and β\beta^{*}

We will use the Newton method to derive the error bound between β^γ\widehat{\beta}_{\gamma} and β\beta^{*} by verifying the Kantorovich conditions in Lemma 5. The Kantorovich conditions require the Lipschitz continuity of Fγ(β)F^{\prime}_{\gamma}(\beta) and an upper bound on Fγ(β)F_{\gamma}(\beta^{*}). We need the three lemmas below, whose proofs are in the supplementary material.

Lemma 6.

For any given γ\gamma, the Jacobian matrix Fγ(x)F^{\prime}_{\gamma}(x) of Fγ(x)F_{\gamma}(x) on xx is Lipschitz continuous on n\mathbb{R}^{n} with the Lipschitz coefficient (n1)(n-1).

Lemma 7.

The tail probabilities of d𝔼d\|d-\mathbb{E}d\|_{\infty} and y𝔼y\|y-\mathbb{E}y\|_{\infty} are given below:

(d𝔼dnlogn)\displaystyle\mathbb{P}\Bigg{(}\|d-\mathbb{E}d\|_{\infty}\geq\sqrt{n\log n}\Bigg{)} \displaystyle\leq 2n,\displaystyle\frac{2}{n},
(y𝔼y2zn(n1)logn/2)\displaystyle\mathbb{P}\left(\|y-\mathbb{E}y\|_{\infty}\geq 2z_{*}\sqrt{n(n-1)\log n/2}\right) \displaystyle\leq 2pn2.\displaystyle\frac{2p}{n^{2}}.
Lemma 8.

For any given c>0c>0, we have

(maxi=1,,n|ξi|>c)<nexp(cεn4kn),(maxi=1,,p|ηi|>c)<pexp(cεnpzkn).\mathbb{P}(\max_{i=1,\ldots,n}|\xi_{i}|>c)<n\exp(-\frac{c\varepsilon_{n}}{4k_{n}}),~{}~{}\mathbb{P}(\max_{i=1,\ldots,p}|\eta_{i}|>c)<p\exp(-\frac{c\varepsilon_{n}}{pz_{*}k_{n}}).
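The tail bounds in Lemma 8 are of the form a union bound gives for iid Laplace noise. As an illustrative sketch (the scale 4kn/ϵn4k_{n}/\epsilon_{n} is an assumption read off from the exponent in Lemma 8, not quoted from the paper's equation (7)), the noisy degree release can be written as:

```python
import numpy as np

def release_degrees(d, k_n, eps_n, seed=None):
    """Laplace-mechanism sketch: d_tilde_i = d_i + xi_i with iid Laplace noise.
    Scale 4*k_n/eps_n is assumed, matching the tail bound
    P(max_i |xi_i| > c) < n exp(-c eps_n / (4 k_n)) of Lemma 8."""
    rng = np.random.default_rng(seed)
    xi = rng.laplace(loc=0.0, scale=4 * k_n / eps_n, size=len(d))
    return d + xi

d = np.array([3, 5, 2, 4], dtype=float)
d_tilde = release_degrees(d, k_n=1, eps_n=0.5, seed=0)
print(d_tilde)   # noisy degrees (may be non-integer or negative)
```

Note that for sparse networks the noisy degrees can be negative, which is the existence issue discussed in Section 6.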

Recall that

ε~n=1+4knεnlognn.\tilde{\varepsilon}_{n}=1+\frac{4k_{n}}{\varepsilon_{n}}\sqrt{\frac{\log n}{n}}.

We state the error bound between β^γ\widehat{\beta}_{\gamma} and β\beta^{*} below.

Lemma 9.

Assume ϵn2=O((logn)1/2n1/2)\epsilon_{n2}=O((\log n)^{1/2}n^{-1/2}) and γB(γ,ϵn2)\gamma\in B(\gamma^{*},\epsilon_{n2}). If bn2ε~n=o((n/logn)1/2)b_{n}^{2}\tilde{\varepsilon}_{n}=o((n/\log n)^{1/2}), then with probability approaching one, β^γ\widehat{\beta}_{\gamma} exists and satisfies

β^γβ=Op(bnε~nlognn)=op(1).\|\widehat{\beta}_{\gamma}-\beta^{*}\|_{\infty}=O_{p}\left(b_{n}\tilde{\varepsilon}_{n}\sqrt{\frac{\log n}{n}}\right)=o_{p}(1).
Proof of Lemma 9.

We will derive the error bound between β^γ\widehat{\beta}_{\gamma} and β\beta^{*} by constructing the Newton iterative sequence β(n+1)=β(n)[Fγ(β(n))]1Fγ(β(n))\beta^{(n+1)}=\beta^{(n)}-[F_{\gamma}^{\prime}(\beta^{(n)})]^{-1}F_{\gamma}(\beta^{(n)}), where we choose β\beta^{*} as the starting point β(0):=β\beta^{(0)}:=\beta^{*}. To this end, it is sufficient to verify the Kantorovich conditions in Lemma 5. Note that Fγ(β)n(bn1,1/4)F^{\prime}_{\gamma}(\beta)\in\mathcal{L}_{n}(b_{n}^{-1},1/4) when βB(β,ϵn1)\beta\in B(\beta^{*},\epsilon_{n1}) and γB(γ,ϵn2)\gamma\in B(\gamma^{*},\epsilon_{n2}), where ϵn1\epsilon_{n1} is a small positive number and ϵn2=O((logn/n)1/2)\epsilon_{n2}=O((\log n/n)^{1/2}). The following calculations are based on the event EnE_{n}:

En={d:d~𝔼dε~n(nlogn)1/2}.E_{n}=\{d:\|\tilde{d}-\mathbb{E}d\|_{\infty}\leq\tilde{\varepsilon}_{n}(n\log n)^{1/2}\}.

Let V=(vij)=Fγ(β)/βV=(v_{ij})=\partial F_{\gamma}(\beta^{*})/\partial\beta^{\top} and S=diag(1/v11,,1/vnn)S=\mathrm{diag}(1/v_{11},\ldots,1/v_{nn}). By Lemma 4, we have =V1=O(bn/n)\aleph=\|V^{-1}\|_{\infty}=O(b_{n}/n). Recall that Fγ(β)=𝔼dd~F_{\gamma^{*}}(\beta^{*})=\mathbb{E}d-\tilde{d}. Note that the dimension pp of γ\gamma is a fixed constant. If ϵn2=O((logn)1/2n1/2)\epsilon_{n2}=O((\log n)^{1/2}n^{-1/2}), by the mean value theorem and the event EnE_{n}, we have

Fγ(β)\displaystyle\|F_{\gamma}(\beta^{*})\|_{\infty} \displaystyle\leq d~𝔼d+maxi|ji[μij(β,γ)μij(β,γ)]|\displaystyle\|\tilde{d}-\mathbb{E}d\|_{\infty}+\max_{i}|\sum\nolimits_{j\neq i}[\mu_{ij}(\beta^{*},\gamma)-\mu_{ij}(\beta^{*},\gamma^{*})]|
\displaystyle\leq O(ε~n(nlogn)1/2)+maxiji|μij(β,γ¯)||zij(γγ)|\displaystyle O(\tilde{\varepsilon}_{n}(n\log n)^{1/2})+\max_{i}\sum_{j\neq i}|\mu_{ij}^{\prime}(\beta^{*},\bar{\gamma})||z_{ij}^{\top}(\gamma-\gamma^{*})|
\displaystyle\leq O(ε~n(nlogn)1/2)+O(n(logn)1/2n1/2)\displaystyle O(\tilde{\varepsilon}_{n}(n\log n)^{1/2})+O(n\cdot(\log n)^{1/2}n^{-1/2})
=\displaystyle= O(ε~n(nlogn)1/2).\displaystyle O(\tilde{\varepsilon}_{n}(n\log n)^{1/2}).

Repeatedly utilizing Lemma 4, we have

\delta=\|[F^{\prime}_{\gamma}(\beta^{*})]^{-1}F_{\gamma}(\beta^{*})\|_{\infty}\leq\|[F^{\prime}_{\gamma}(\beta^{*})]^{-1}\|_{\infty}\|F_{\gamma}(\beta^{*})\|_{\infty}=O\left(\tilde{\varepsilon}_{n}b_{n}\sqrt{\frac{\log n}{n}}\right).

By Lemma 6, $F^{\prime}_{\gamma}(\beta)$ is Lipschitz continuous with Lipschitz coefficient $\lambda=n-1$. Therefore, if $\tilde{\varepsilon}_{n}b_{n}^{2}=o((n/\log n)^{1/2})$, then

h=2λδ\displaystyle h=2\aleph\lambda\delta =\displaystyle= O(bnn)×O(n)×O(ε~nbnlognn)\displaystyle O(\frac{b_{n}}{n})\times O(n)\times O(\tilde{\varepsilon}_{n}b_{n}\sqrt{\frac{\log n}{n}})
=\displaystyle= O(ε~nbn2lognn)=o(1).\displaystyle O\left(\tilde{\varepsilon}_{n}b_{n}^{2}\sqrt{\frac{\log n}{n}}\right)=o(1).

The above arguments verify the Kantorovich conditions. By Lemma 5, it yields that

β^γβ=O(ε~nbnlognn).\|\widehat{\beta}_{\gamma}-\beta^{*}\|_{\infty}=O\left(\tilde{\varepsilon}_{n}b_{n}\sqrt{\frac{\log n}{n}}\right). (21)

To finish the proof, it is left to show (Enc)0\mathbb{P}(E_{n}^{c})\to 0. Note that d~i=di+ξi\tilde{d}_{i}=d_{i}+\xi_{i}. By Lemmas 7 and 8, it can be verified as follows:

(Enc)\displaystyle\mathbb{P}(E_{n}^{c}) =\displaystyle= (d~𝔼d>ε~nnlogn)\displaystyle\mathbb{P}(\|\tilde{d}-\mathbb{E}d\|_{\infty}>\tilde{\varepsilon}_{n}\sqrt{n\log n})
\displaystyle\leq (d𝔼d>nlogn)+(maxi=1,,nξi>(ε~n1)nlogn)\displaystyle\mathbb{P}(\|d-\mathbb{E}d\|_{\infty}>\sqrt{n\log n})+\mathbb{P}(\max_{i=1,\ldots,n}\xi_{i}>(\tilde{\varepsilon}_{n}-1)\sqrt{n\log n})
\displaystyle\leq 2n0.\displaystyle\frac{2}{n}\to 0.

This completes the proof. ∎
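The fixed-$\gamma$ Newton iteration analyzed in Lemma 9 can be sketched numerically. The snippet below is a minimal illustration rather than the paper's implementation: it assumes binary edges with the logistic mean function $\mu(\pi)=1/(1+e^{-\pi})$, simulates hypothetical parameter values, and solves $F_{\gamma}(\beta)=0$ by the iteration $\beta^{(k+1)}=\beta^{(k)}-[F_{\gamma}^{\prime}(\beta^{(k)})]^{-1}F_{\gamma}(\beta^{(k)})$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 2                            # hypothetical toy sizes
beta_true = rng.normal(0.0, 0.3, n)     # degree parameters
gamma = rng.normal(0.0, 0.3, p)         # homophily parameter, held fixed here
z = rng.normal(0.0, 0.5, (n, n, p))
z = (z + z.transpose(1, 0, 2)) / 2.0    # symmetric edge covariates z_ij = z_ji

def mu(beta):
    """mu_ij = logistic(beta_i + beta_j + z_ij^T gamma), diagonal excluded."""
    pi = beta[:, None] + beta[None, :] + z @ gamma
    m = 1.0 / (1.0 + np.exp(-pi))
    np.fill_diagonal(m, 0.0)
    return m

# Noiseless stand-in for the released degree sequence d~.
d_tilde = mu(beta_true).sum(axis=1)

def newton_beta(beta0, tol=1e-10, max_iter=100):
    """Solve F_gamma(beta) = E d(beta) - d~ = 0 by Newton's method."""
    beta = beta0.copy()
    for _ in range(max_iter):
        m = mu(beta)
        F = m.sum(axis=1) - d_tilde
        V = m * (1.0 - m)                    # v_ij = mu'_ij for i != j
        np.fill_diagonal(V, V.sum(axis=1))   # v_ii = sum_{j != i} mu'_ij
        step = np.linalg.solve(V, F)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

beta_hat = newton_beta(np.zeros(n))
err = np.max(np.abs(beta_hat - beta_true))
print(err)   # small: with noiseless degrees the iteration recovers beta_true
```

With Laplace noise added to `d_tilde`, the solution is perturbed by an amount of the order described in Lemma 9.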

7.3 Proof of Theorem 1

To show Theorem 1, we need the following three lemmas.

Lemma 10.

Let $D=B(\gamma^{*},\epsilon_{n2})(\subset\mathbb{R}^{p})$ be an open convex set containing the true point $\gamma^{*}$. If $\|\tilde{d}-\mathbb{E}d\|_{\infty}=O(\tilde{\varepsilon}_{n}(n\log n)^{1/2})$, then $Q_{c}^{\prime}(\gamma)$ is Lipschitz continuous on $D$ with the Lipschitz coefficient $n^{2}b_{n}^{3}$.

Lemma 11.

Write β^\widehat{\beta}^{*} as β^γ\widehat{\beta}_{\gamma^{*}} and V=F(β,γ)/βV=\partial F(\beta^{*},\gamma^{*})/\partial\beta^{\top}. If bn2ε~n=o((n/logn)1/2)b_{n}^{2}\tilde{\varepsilon}_{n}=o((n/\log n)^{1/2}), then β^\widehat{\beta}^{*} has the following expansion:

β^β=V1F(β,γ)+V1R,\widehat{\beta}^{*}-\beta^{*}=V^{-1}F(\beta^{*},\gamma^{*})+V^{-1}R, (22)

where R=(R1,,Rn)R=(R_{1},\ldots,R_{n})^{\top} is the remainder term and

V1R=Op(bn3ε~n2lognn).\left\|V^{-1}R\right\|_{\infty}=O_{p}(\frac{b_{n}^{3}\tilde{\varepsilon}_{n}^{2}\log n}{n}).
Lemma 12.

If $b_{n}^{2}\tilde{\varepsilon}_{n}=o((n/\log n)^{1/2})$, then for any $\beta\in B(\beta^{*},\epsilon_{n1})$ and $\gamma\in B(\gamma^{*},\epsilon_{n2})$, we have

Q(β,γ)β(β^γβ)=Op(ε~nbn3nlogn).\|\frac{\partial Q(\beta,\gamma)}{\partial\beta^{\top}}(\widehat{\beta}_{\gamma}-\beta^{*})\|_{\infty}=O_{p}(\tilde{\varepsilon}_{n}b_{n}^{3}n\log n).

Now we are ready to prove Theorem 1.

Proof of Theorem 1.

We construct the Newton iterative sequence to show the consistency. It suffices to verify the Kantorovich conditions in Lemma 5. In the Newton method, we set $\gamma^{*}$ as the initial point $\gamma^{(0)}$ and $\gamma^{(k+1)}=\gamma^{(k)}-[Q_{c}^{\prime}(\gamma^{(k)})]^{-1}Q_{c}(\gamma^{(k)})$.

The following calculations are based on the event that, for $\gamma\in B(\gamma^{*},\epsilon_{n2})$, $\widehat{\beta}_{\gamma}$ exists and satisfies

β^γβ=O(ε~nbnlognn).\|\widehat{\beta}_{\gamma}-\beta^{*}\|_{\infty}=O\left(\tilde{\varepsilon}_{n}b_{n}\sqrt{\frac{\log n}{n}}\right).

This shows that β^γ(0)\widehat{\beta}_{\gamma^{(0)}} exists such that Qc(γ(0))Q_{c}(\gamma^{(0)}) and Qc(γ(0))Q_{c}^{\prime}(\gamma^{(0)}) are well defined. This in turn shows that in every iterative step, γ(k+1)\gamma^{(k+1)} exists as long as γ(k)\gamma^{(k)} exists.
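The two-stage scheme behind this proof, an inner Newton solve for the profiled $\widehat{\beta}_{\gamma}$ nested inside an outer Newton update for $\gamma$, can be illustrated end-to-end on simulated data. This is a hypothetical sketch, not the paper's code: it assumes binary edges with a logistic mean function and a scalar homophily parameter ($p=1$), and approximates the outer derivative $Q_{c}^{\prime}$ by a finite difference for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25
beta_true = rng.normal(0.0, 0.3, n)
gamma_true = 0.5                          # hypothetical scalar parameter (p = 1)
z = rng.normal(0.0, 0.5, (n, n))
z = (z + z.T) / 2.0                       # symmetric covariate

def mu(beta, gamma):
    pi = beta[:, None] + beta[None, :] + gamma * z
    m = 1.0 / (1.0 + np.exp(-pi))
    np.fill_diagonal(m, 0.0)
    return m

m_true = mu(beta_true, gamma_true)
d_tilde = m_true.sum(axis=1)              # stand-in for released degrees
y_tilde = (z * m_true).sum() / 2.0        # stand-in for the covariate moment

def profile_beta(gamma):
    """Inner stage: solve E d(beta, gamma) = d~ for beta at fixed gamma."""
    beta = np.zeros(n)
    for _ in range(100):
        m = mu(beta, gamma)
        V = m * (1.0 - m)
        np.fill_diagonal(V, V.sum(axis=1))
        step = np.linalg.solve(V, m.sum(axis=1) - d_tilde)
        beta -= step
        if np.max(np.abs(step)) < 1e-12:
            break
    return beta

def Qc(gamma):
    """Profiled moment equation Q_c(gamma) = Q(beta_hat_gamma, gamma)."""
    return (z * mu(profile_beta(gamma), gamma)).sum() / 2.0 - y_tilde

# Outer stage: gamma^{(k+1)} = gamma^{(k)} - [Q_c'(gamma^{(k)})]^{-1} Q_c(gamma^{(k)}).
gamma_hat, h = 0.0, 1e-6
for _ in range(50):
    q = Qc(gamma_hat)
    dq = (Qc(gamma_hat + h) - q) / h      # finite-difference stand-in for Q_c'
    step = q / dq
    gamma_hat -= step
    if abs(step) < 1e-9:
        break
print(abs(gamma_hat - gamma_true))        # small
```

Each outer step first recomputes the inner solve, mirroring the requirement in the proof that $\widehat{\beta}_{\gamma^{(k)}}$ exists so that $Q_{c}(\gamma^{(k)})$ and $Q_{c}^{\prime}(\gamma^{(k)})$ are well defined.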

Recall the definition of Qc(γ)Q_{c}(\gamma) and Q(β,γ)Q(\beta,\gamma) in (13) and (14). By Lemmas 7 and 8, we have

\|Q(\beta^{*},\gamma^{*})\|_{\infty} \leq \|\mathbb{E}y-y\|_{\infty}+\|\eta\|_{\infty}
 = O_{p}(z_{*}n(\log n)^{1/2})+O_{p}((pk_{n}/\varepsilon_{n})\log n)
 = O_{p}\left(\left(z_{*}+\frac{pk_{n}(\log n)^{1/2}}{n\varepsilon_{n}}\right)n(\log n)^{1/2}\right).

By the mean value theorem and Lemma 12, we have

\|Q(\widehat{\beta}^{*},\gamma^{*})-Q(\beta^{*},\gamma^{*})\|_{\infty}=\left\|\frac{\partial Q(\bar{\beta},\gamma^{*})}{\partial\beta^{\top}}(\widehat{\beta}^{*}-\beta^{*})\right\|_{\infty}=O_{p}(\tilde{\varepsilon}_{n}b_{n}^{3}n\log n),

where $\bar{\beta}$ lies between $\beta^{*}$ and $\widehat{\beta}^{*}$.

Then it follows that

\|Q_{c}(\gamma^{*})\|_{\infty} \leq \|Q(\beta^{*},\gamma^{*})\|_{\infty}+\|Q(\widehat{\beta}^{*},\gamma^{*})-Q(\beta^{*},\gamma^{*})\|_{\infty}
 = O_{p}\left((b_{n}^{3}+(k_{n}/\varepsilon_{n})b_{n}^{3}+z_{*}/(\log n)^{1/2})n\log n\right):=O_{p}(\tau_{n}n\log n).

By Lemma 10, Qc(γ)Q^{\prime}_{c}(\gamma) is Lipschitz continuous with λ=n2bn3\lambda=n^{2}b_{n}^{3}. Note that =[Qc(γ)]1=O(ρnn2)\aleph=\|[Q_{c}^{\prime}(\gamma^{*})]^{-1}\|_{\infty}=O(\rho_{n}n^{-2}). Thus,

δ=[Qc(γ)]1Qc(γ)=Op(ρnτnlognn).\delta=\|[Q_{c}^{\prime}(\gamma^{*})]^{-1}Q_{c}(\gamma^{*})\|_{\infty}=O_{p}\left(\frac{\rho_{n}\tau_{n}\log n}{n}\right).

As a result, if bn3ρn2τn=o(n/logn)b_{n}^{3}\rho_{n}^{2}\tau_{n}=o(n/\log n), then

h=2λδ=Op(ρnn2n2bn3ρnτnlognn)=op(1).h=2\aleph\lambda\delta=O_{p}(\frac{\rho_{n}}{n^{2}}\cdot n^{2}b_{n}^{3}\cdot\rho_{n}\tau_{n}\cdot\frac{\log n}{n})=o_{p}(1).

By Lemma 5, with probability approaching one, the limiting point of the sequence $\{\gamma^{(k)}\}_{k=1}^{\infty}$ exists, denoted by $\widehat{\gamma}$, and satisfies

γ^γ=Op(ρnτnlognn).\|\widehat{\gamma}-\gamma^{*}\|_{\infty}=O_{p}\left(\frac{\rho_{n}\tau_{n}\log n}{n}\right).

At the same time, by Lemma 9, $\widehat{\beta}_{\widehat{\gamma}}$ exists, denoted by $\widehat{\beta}$. The limiting point $(\widehat{\beta},\widehat{\gamma})$ satisfies equation (9). This completes the proof. ∎

7.4 Proof of Theorem 2

Proof of Theorem 2.

To simplify notations, write π^ij=β^i+β^j+zijγ^\widehat{\pi}_{ij}=\widehat{\beta}_{i}+\widehat{\beta}_{j}+z_{ij}^{\top}\widehat{\gamma}, πij=βi+βj+zijγ\pi_{ij}^{*}=\beta_{i}^{*}+\beta_{j}^{*}+z_{ij}^{\top}\gamma^{*}, μij=μ(πij)\mu_{ij}^{\prime}=\mu^{\prime}(\pi_{ij}^{*}) and

V=F(β,γ)β,Vβγ=F(β,γ)γ.V=\frac{\partial F(\beta^{*},\gamma^{*})}{\partial\beta^{\top}},~{}~{}V_{\beta\gamma}=\frac{\partial F(\beta^{*},\gamma^{*})}{\partial\gamma^{\top}}.

By a second order Taylor expansion, we have

\mu(\widehat{\pi}_{ij})-\mu(\pi_{ij}^{*})=\mu_{ij}^{\prime}(\widehat{\beta}_{i}-\beta_{i}^{*})+\mu_{ij}^{\prime}(\widehat{\beta}_{j}-\beta_{j}^{*})+\mu_{ij}^{\prime}z_{ij}^{\top}(\widehat{\gamma}-\gamma^{*})+g_{ij}, (23)

where

gij=12(β^iβiβ^jβjγ^γ)(μij′′(π~ij)μij′′(π~ij)μij′′(π~ij)zijμij′′(π~ij)μij′′(π~ij)μij′′(π~ij)zijμij′′(π~ij)zijμij′′(π~ij)zijμij′′(π~ij)zijzij)(β^iβiβ^jβjγ^γ),g_{ij}=\frac{1}{2}\begin{pmatrix}\widehat{\beta}_{i}-\beta_{i}^{*}\\ \widehat{\beta}_{j}-\beta_{j}^{*}\\ \widehat{\gamma}-\gamma^{*}\end{pmatrix}^{\top}\begin{pmatrix}\mu^{\prime\prime}_{ij}(\tilde{\pi}_{ij})&-\mu^{\prime\prime}_{ij}(\tilde{\pi}_{ij})&\mu^{\prime\prime}_{ij}(\tilde{\pi}_{ij})z_{ij}^{\top}\\ -\mu^{\prime\prime}_{ij}(\tilde{\pi}_{ij})&\mu^{\prime\prime}_{ij}(\tilde{\pi}_{ij})&-\mu^{\prime\prime}_{ij}(\tilde{\pi}_{ij})z_{ij}^{\top}\\ \mu^{\prime\prime}_{ij}(\tilde{\pi}_{ij})z_{ij}^{\top}&-\mu^{\prime\prime}_{ij}(\tilde{\pi}_{ij})z_{ij}^{\top}&\mu^{\prime\prime}_{ij}(\tilde{\pi}_{ij})z_{ij}z_{ij}^{\top}\end{pmatrix}\begin{pmatrix}\widehat{\beta}_{i}-\beta_{i}^{*}\\ \widehat{\beta}_{j}-\beta_{j}^{*}\\ \widehat{\gamma}-\gamma^{*}\end{pmatrix},

and π~ij\tilde{\pi}_{ij} lies between πij\pi_{ij}^{*} and π^ij\widehat{\pi}_{ij}. Recall that z:=maxi,jzijz_{*}:=\max_{i,j}\|z_{ij}\|_{\infty}. Since |μ′′(πij)|1/4|\mu^{\prime\prime}(\pi_{ij})|\leq 1/4 (see (10)), we have

\begin{array}[]{rcl}|g_{ij}|&\leq&\|\widehat{\beta}-\beta^{*}\|_{\infty}^{2}+\tfrac{1}{2}\|\widehat{\beta}-\beta^{*}\|_{\infty}\|\widehat{\gamma}-\gamma^{*}\|_{1}z_{*}+\tfrac{1}{4}\|\widehat{\gamma}-\gamma^{*}\|_{1}^{2}z_{*}^{2}\\ &\leq&\tfrac{1}{2}[4\|\widehat{\beta}-\beta^{*}\|_{\infty}^{2}+\|\widehat{\gamma}-\gamma^{*}\|_{1}^{2}z_{*}^{2}].\end{array}
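As a quick numerical sanity check of the constant invoked here: assuming binary edges so that $\mu$ is the logistic function, its second derivative is $\mu''(\pi)=\mu(1-\mu)(1-2\mu)$, whose supremum is in fact $\sqrt{3}/18\approx 0.096$, comfortably below the bound $1/4$ cited from (10).

```python
import numpy as np

# Check sup |mu''(pi)| for the logistic mean mu(pi) = 1/(1 + exp(-pi)),
# whose second derivative is mu(1 - mu)(1 - 2 mu).
pi = np.linspace(-20.0, 20.0, 400001)
m = 1.0 / (1.0 + np.exp(-pi))
mu2 = m * (1.0 - m) * (1.0 - 2.0 * m)
peak = np.max(np.abs(mu2))
print(peak)   # about 0.0962 = sqrt(3)/18, which is <= 1/4
```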

Let gi=jigijg_{i}=\sum_{j\neq i}g_{ij} and g=(g1,,gn)g=(g_{1},\ldots,g_{n})^{\top}. If ρn2τn2=o(n/logn)\rho_{n}^{2}\tau_{n}^{2}=o(n/\log n), by Theorem 1, we have

maxi=1,,n|gi|nmaxi,j|gij|=Op((bn2ε~n2+ρn2τn2lognn)logn)=Op(bn2ε~n2logn).\max_{i=1,\ldots,n}|g_{i}|\leq n\max_{i,j}|g_{ij}|=O_{p}\left(\left(b_{n}^{2}\tilde{\varepsilon}_{n}^{2}+\frac{\rho_{n}^{2}\tau_{n}^{2}\log n}{n}\right)\log n\right)=O_{p}\left(b_{n}^{2}\tilde{\varepsilon}_{n}^{2}\log n\right). (24)

Writing (23) in matrix form, we have

d~𝔼d=V(β^β)+Vβγ(γ^γ)+g,\tilde{d}-\mathbb{E}d=V(\widehat{\beta}-\beta^{*})+V_{\beta\gamma}(\widehat{\gamma}-\gamma^{*})+g,

which is equivalent to

\widehat{\beta}-\beta^{*}=V^{-1}(d-\mathbb{E}d)+V^{-1}\xi-V^{-1}V_{\beta\gamma}(\widehat{\gamma}-\gamma^{*})-V^{-1}g. (25)

We bound the last three remainder terms in the above equation as follows. Let W=V1SW=V^{-1}-S. Note that (Sg)i=gi/vii(Sg)_{i}=g_{i}/v_{ii} and (n1)bn1vii(n1)/4(n-1)b_{n}^{-1}\leq v_{ii}\leq(n-1)/4. By Lemma 4 and inequality (24), we have

\|V^{-1}g\|_{\infty}\leq\|V^{-1}\|_{\infty}\|g\|_{\infty}=O_{p}\left(\frac{b_{n}}{n}\times b_{n}^{2}\tilde{\varepsilon}_{n}^{2}\log n\right)=O_{p}\left(\frac{b_{n}^{3}\tilde{\varepsilon}_{n}^{2}\log n}{n}\right). (26)

Note that the $i$th row of $V_{\beta\gamma}$ is $\sum_{j=1,j\neq i}^{n}\mu^{\prime}_{ij}z_{ij}^{\top}$. By Theorem 1, we have

Vβγ(γ^γ)(n1)zγ^γ1=Op(zρnτnlogn).\|V_{\beta\gamma}(\widehat{\gamma}-\gamma^{*})\|_{\infty}\leq(n-1)z_{*}\|\widehat{\gamma}-\gamma^{*}\|_{1}=O_{p}(z_{*}\rho_{n}\tau_{n}\log n).

By Lemma 3, we have

\|V^{-1}V_{\beta\gamma}(\widehat{\gamma}-\gamma^{*})\|_{\infty}\leq\|V^{-1}\|_{\infty}\|V_{\beta\gamma}(\widehat{\gamma}-\gamma^{*})\|_{\infty}=O_{p}\left(\frac{z_{*}\rho_{n}\tau_{n}b_{n}\log n}{n}\right). (27)

By Lemma 8,

P(\|\xi\|_{\infty}>8(k_{n}/\varepsilon_{n})\log n)\leq n\times\exp\left(-8(k_{n}/\varepsilon_{n})\log n\times\frac{\varepsilon_{n}}{4k_{n}}\right)=\frac{1}{n},

so that

ξ=Op((kn/εn)logn).\|\xi\|_{\infty}=O_{p}((k_{n}/\varepsilon_{n})\log n).

Thus, it yields

V1ξV1ξ=Op(bn(kn/εn)lognn).\|V^{-1}\xi\|_{\infty}\leq\|V^{-1}\|_{\infty}\|\xi\|_{\infty}=O_{p}\left(b_{n}(k_{n}/\varepsilon_{n})\frac{\log n}{n}\right). (28)
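The tail bound for the Laplace noise used above can be checked by simulation. The snippet below is a hypothetical sketch: it assumes each $\xi_{i}$ is Laplace with scale $4k_{n}/\varepsilon_{n}$ (the scale implied by the exponent in the tail bound, taken here as an assumption), so that the union bound $n\exp(-t\varepsilon_{n}/(4k_{n}))$ at $t=8(k_{n}/\varepsilon_{n})\log n$ equals exactly $1/n$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
k_n, eps_n = 3.0, 1.0                 # hypothetical privacy parameters
scale = 4.0 * k_n / eps_n             # assumed Laplace scale of each xi_i
t = 8.0 * (k_n / eps_n) * np.log(n)   # threshold used in the proof

# Union bound: P(max_i xi_i > t) <= n * exp(-t / scale) = n^{-1}.
bound = n * np.exp(-t / scale)

# Monte Carlo estimate of the left-hand side.
reps = 20000
xi = rng.laplace(0.0, scale, size=(reps, n))
emp = np.mean(xi.max(axis=1) > t)
print(emp, bound)                     # empirical probability sits below 1/n
```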

By combining (25), (26), (27) and (28), it yields

β^iβi=[V1(d𝔼d)]i+Op((bn3ε~n2+zρnτnbn)lognn).\widehat{\beta}_{i}-\beta^{*}_{i}=[V^{-1}(d-\mathbb{E}d)]_{i}+O_{p}(\frac{(b_{n}^{3}\tilde{\varepsilon}_{n}^{2}+z_{*}\rho_{n}\tau_{n}b_{n})\log n}{n}).

Recall that $W=V^{-1}-S$ and let $U=\mathrm{Cov}(W(d-\mathbb{E}d))$. It is easy to verify that

U=V1SS(InVS)U=V^{-1}-S-S(I_{n}-VS)

and

[S(InVS)]ij=(δij1)vijviivjj.[S(I_{n}-VS)]_{ij}=\frac{(\delta_{ij}-1)v_{ij}}{v_{ii}v_{jj}}.
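Both displays can be verified numerically for a matrix $V$ with the $\beta$-model structure (symmetric off-diagonal entries in $(0,1/4)$ and $v_{ii}=\sum_{j\neq i}v_{ij}$, so that $\mathrm{Cov}(d)=V$ and hence $U=WVW^{\top}$). This is a sanity-check sketch with hypothetical entries, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
# Build V with the beta-model structure: symmetric, v_ij in (0, 1/4),
# and v_ii = sum_{j != i} v_ij, so that Cov(d) = V.
V = rng.uniform(0.05, 0.25, (n, n))
V = (V + V.T) / 2.0
np.fill_diagonal(V, 0.0)
np.fill_diagonal(V, V.sum(axis=1))

I = np.eye(n)
S = np.diag(1.0 / np.diag(V))
W = np.linalg.inv(V) - S
U = W @ V @ W.T                               # Cov(W(d - Ed)) when Cov(d) = V

# Identity U = V^{-1} - S - S(I - VS).
U_identity = np.linalg.inv(V) - S - S @ (I - V @ S)
print(np.max(np.abs(U - U_identity)))         # machine precision

# Entrywise formula [S(I - VS)]_ij = (delta_ij - 1) v_ij / (v_ii v_jj).
expected = -(1.0 - I) * V / np.outer(np.diag(V), np.diag(V))
print(np.max(np.abs(S @ (I - V @ S) - expected)))
```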

By Lemma 3, we have

Umax=O(bn3n2).\|U\|_{\max}=O(b_{n}^{3}n^{-2}).

Therefore,

[W(d𝔼d)]i=Op(bn3/2n1).[W(d-\mathbb{E}d)]_{i}=O_{p}(b_{n}^{3/2}n^{-1}). (29)

Consequently, by combining (25), (26), (27), (28) and (29), we have

\widehat{\beta}_{i}-\beta^{*}_{i}=\frac{d_{i}-\mathbb{E}d_{i}}{v_{ii}}+O_{p}\left(\frac{(b_{n}^{3}\tilde{\varepsilon}_{n}^{2}+z_{*}\rho_{n}\tau_{n}b_{n})\log n}{n}\right).

Therefore, Theorem 2 immediately follows from Proposition 1. ∎

7.5 Proof of Theorem 3

Proof of Theorem 3.

Assume that the conditions in Theorem 1 hold. A mean value expansion gives

Qc(γ^)Qc(γ)=Qc(γ¯)γ(γ^γ),Q_{c}(\widehat{\gamma})-Q_{c}(\gamma^{*})=\frac{\partial Q_{c}(\bar{\gamma})}{\partial\gamma^{\top}}(\widehat{\gamma}-\gamma^{*}),

where γ¯\bar{\gamma} lies between γ\gamma^{*} and γ^\widehat{\gamma}. By noting that Qc(γ^)=0Q_{c}(\widehat{\gamma})=0, we have

\sqrt{N}(\widehat{\gamma}-\gamma^{*})=-\Big{[}\frac{1}{N}\frac{\partial Q_{c}(\bar{\gamma})}{\partial\gamma^{\top}}\Big{]}^{-1}\times\frac{1}{\sqrt{N}}Q_{c}(\gamma^{*}).

Note that the dimension of γ\gamma is fixed. By Theorem 1 and (4.1), we have

1NQc(γ¯)γpH¯=limN1NH(β,γ).\frac{1}{N}\frac{\partial Q_{c}(\bar{\gamma})}{\partial\gamma^{\top}}\stackrel{{\scriptstyle p}}{{\to}}\bar{H}=\lim_{N\to\infty}\frac{1}{N}H(\beta^{*},\gamma^{*}).

Write $\widehat{\beta}^{*}$ as $\widehat{\beta}(\gamma^{*})$ for convenience. Let $\bar{Q}(\beta,\gamma)=Q(\beta,\gamma)-\eta$ and $\bar{Q}_{c}(\gamma)=Q_{c}(\gamma)-\eta$. Note that $\eta$ is a Laplace random vector. By Lemma 8, if $k_{n}/\varepsilon_{n}=o(n/\log n)$, then

ηN1/2=Op(p(kn/ϵn)lognn)=op(1).\frac{\|\eta\|_{\infty}}{N^{1/2}}=O_{p}\left(\frac{p(k_{n}/\epsilon_{n})\log n}{n}\right)=o_{p}(1).

Therefore,

N(γ^γ)=H¯11NQ¯(β^,γ)+op(1).\sqrt{N}(\widehat{\gamma}-\gamma^{*})=-\bar{H}^{-1}\cdot\frac{1}{\sqrt{N}}\bar{Q}(\widehat{\beta}^{*},\gamma^{*})+o_{p}(1). (30)

Applying a third-order Taylor expansion to $\bar{Q}(\widehat{\beta}^{*},\gamma^{*})$ yields

1NQ¯(β^,γ)=S1+S2+S3,\frac{1}{\sqrt{N}}\bar{Q}(\widehat{\beta}^{*},\gamma^{*})=S_{1}+S_{2}+S_{3}, (31)

where

S1=1NQ¯(β,γ)+1N[Q¯(β,γ)β](β^β),S2=12Nk=1n[(β^kβk)2Q¯(β,γ)βkβ×(β^β)],S3=16Nk=1nl=1n{(β^kβk)(β^lβl)[3Q¯(β¯,γ)βkβlβ](β^β)},\begin{array}[]{l}S_{1}=\frac{1}{\sqrt{N}}\bar{Q}(\beta^{*},\gamma^{*})+\frac{1}{\sqrt{N}}\Big{[}\frac{\partial\bar{Q}(\beta^{*},\gamma^{*})}{\partial\beta^{\top}}\Big{]}(\widehat{\beta}^{*}-\beta^{*}),\\ S_{2}=\frac{1}{2\sqrt{N}}\sum_{k=1}^{n}\Big{[}(\widehat{\beta}_{k}^{*}-\beta_{k}^{*})\frac{\partial^{2}\bar{Q}(\beta^{*},\gamma^{*})}{\partial\beta_{k}\partial\beta^{\top}}\times(\widehat{\beta}^{*}-\beta^{*})\Big{]},\\ S_{3}=\frac{1}{6\sqrt{N}}\sum_{k=1}^{n}\sum_{l=1}^{n}\{(\widehat{\beta}_{k}^{*}-\beta_{k}^{*})(\widehat{\beta}_{l}^{*}-\beta_{l}^{*})\Big{[}\frac{\partial^{3}\bar{Q}(\bar{\beta}^{*},\gamma^{*})}{\partial\beta_{k}\partial\beta_{l}\partial\beta^{\top}}\Big{]}(\widehat{\beta}^{*}-\beta^{*})\},\end{array}

and β¯=tβ+(1t)β^\bar{\beta}^{*}=t\beta^{*}+(1-t)\widehat{\beta}^{*} for some t(0,1)t\in(0,1). We will show that (1) S1S_{1} asymptotically follows a multivariate normal distribution; (2) S2S_{2} is a bias term; (3) S3S_{3} is an asymptotically negligible remainder term. Specifically, they are accurately characterized as follows:

S1\displaystyle S_{1} =\displaystyle= 1Nj<isij(β,γ)+Op(bn3ε~n3zlognn),\displaystyle\frac{1}{\sqrt{N}}\sum_{j<i}s_{ij}(\beta^{*},\gamma^{*})+O_{p}(\frac{b_{n}^{3}\tilde{\varepsilon}_{n}^{3}z_{*}\log n}{n}),
S2\displaystyle S_{2} =\displaystyle= k=1njkμkj′′(β,γ)zkjvkk+Op(bn4ε~n3z(logn)1/2n1/2),\displaystyle\sum_{k=1}^{n}\frac{\sum_{j\neq k}\mu_{kj}^{\prime\prime}(\beta^{*},\gamma^{*})z_{kj}}{v_{kk}}+O_{p}\left(\frac{b_{n}^{4}\tilde{\varepsilon}_{n}^{3}z_{*}(\log n)^{1/2}}{n^{1/2}}\right),
S3\displaystyle\|S_{3}\|_{\infty} =\displaystyle= Op((logn)3/2bn3ε~n3n1/2).\displaystyle O_{p}(\frac{(\log n)^{3/2}b_{n}^{3}\tilde{\varepsilon}_{n}^{3}}{n^{1/2}}).

We defer the proofs of the above equations to supplementary material. Substituting the above equations into (30) then gives

N(γ^γ)=H¯1B+H¯1×1Ni<jsij(β,γ)+Op(bn4ε~n3z(logn)3/2n1/2).\sqrt{N}(\widehat{\gamma}-\gamma^{*})=-\bar{H}^{-1}B_{*}+\bar{H}^{-1}\times\frac{1}{\sqrt{N}}\sum_{i<j}s_{ij}(\beta^{*},\gamma^{*})+O_{p}\left(\frac{b_{n}^{4}\tilde{\varepsilon}_{n}^{3}z_{*}(\log n)^{3/2}}{n^{1/2}}\right).

If $b_{n}^{4}\tilde{\varepsilon}_{n}^{3}z_{*}=o(n^{1/2}/(\log n)^{3/2})$, then Theorem 3 immediately follows from Proposition 2. ∎
