
A two-way heterogeneity model for dynamic networks

Binyan Jiang
Department of Applied Mathematics, The Hong Kong Polytechnic University
Hong Kong, [email protected]
   Chenlei Leng
Department of Statistics, University of Warwick
United Kingdom, [email protected]
   Ting Yan
Department of Statistics, Central China Normal University
China, [email protected]
   Qiwei Yao
Department of Statistics, London School of Economics
United Kingdom, [email protected]
   Xinyang Yu
Department of Applied Mathematics, The Hong Kong Polytechnic University
Hong Kong, [email protected]
Abstract

Dynamic network data analysis requires jointly modelling the individual snapshots and the time dynamics. This paper proposes a new two-way heterogeneity model towards this goal. The new model equips each node of the network with two heterogeneity parameters, one characterizing the propensity to form ties with other nodes and the other differentiating the tendency to retain existing ties over time. Though the negative log-likelihood function is non-convex, it is locally convex in a neighbourhood of the true value of the parameter vector. By using a novel method of moments estimator as the initial value, the consistent local maximum likelihood estimator (MLE) can be obtained by a gradient descent algorithm. To establish the upper bound for the estimation error of the MLE, we derive a new uniform deviation bound, which is of independent interest. The usefulness of the model and the associated theory are further supported by extensive simulation and the analysis of several real network data sets.

Keywords: Degree heterogeneity, Dynamic networks, Maximum likelihood estimation, Uniform deviation bound

1 Introduction

Network data featuring prominent interactions between subjects arise in various areas such as biology, economics, engineering, medicine, and social sciences [27, 20]. As a rapidly growing field of active research, statistical modelling of networks aims to capture and understand the linking patterns in these data. A large part of the literature has focused on examining these patterns for canonical, static networks observed at a single snapshot. Due to the increasing availability of networks observed multiple times, models for dynamic networks evolving in time are of growing interest. These models typically assume, among other things, that networks observed at different times are independent [28, 1], independent conditionally on some latent processes [4, 23], or drawn sequentially from an exponential random graph model conditional on the previous networks [11, 10, 21].

One of the stylized facts of real-life networks is that their nodes often have different tendencies to form ties and may evolve differently over time. The former is manifested by the fact that so-called hub nodes have many links while peripheral nodes have few connections in, for example, a big social network. The latter becomes evident when some individuals are more active in seeking new ties/friends than others. In this paper, we refer to these two kinds of heterogeneity as static heterogeneity and dynamic heterogeneity, respectively. Also known as degree heterogeneity in the static network literature, static heterogeneity has featured prominently in several popular models widely used in practice, including the stochastic block model and its degree-corrected generalization [17]. See also [15, 29, 16, 19] and the references therein. Another common and natural approach to capturing static heterogeneity is to introduce node-specific parameters, one for each node. For single networks, this is often done by modelling the logit of the link probability between each pair of nodes as the sum of their heterogeneity parameters. Termed the $\beta$-model [2], this model and its generalizations have been extensively studied when a single static network is observed [39, 18, 7, 35, 3, 30].

With $n$ observed networks each having $p$ nodes, the goal of this study is two-fold: (i) we propose a dynamic network model named the two-way heterogeneity model that captures both static heterogeneity and dynamic heterogeneity, and develop the associated inference methodology; (ii) we establish new generic asymptotic results that can be applied or extended to different models with a large number of parameters (in relation to $p$). We focus on the scenario where the number of nodes $p$ goes to infinity. Our asymptotic results hold when $np\rightarrow\infty$, even though $n$ may be fixed. The main contributions of our paper can be summarized as follows.

  • We introduce a reparameterization of the general autoregressive network model [14] to accommodate variations in both node degree and dynamic fluctuations. This novel approach can be regarded as an extension of the $\beta$-model [2] to a dynamic framework. It encompasses two sets of heterogeneity parameters: one governs static variations, akin to those in the standard $\beta$-model, while the other addresses dynamic fluctuations. Unlike the general model in [14], which necessitates a large number of network observations (i.e. $n\rightarrow\infty$), we demonstrate the validity of our formulation even in scenarios where $n$ is small but $p$ is large.

  • The formulation of our model gives rise to a high-dimensional non-convex loss function based on the likelihood. By establishing the local convexity of the loss function in a neighborhood of the true parameters, we compute the local MLE by a standard gradient descent algorithm using a newly proposed method of moments estimator (MME) as its initial value. To the best of our knowledge, this is the first result in network data analysis for solving such a non-convex optimization problem with algorithmic guarantees.

  • Furthermore, to characterize the local MLE, we derive its estimation error bounds in the $\ell_{2}$ norm and the $\ell_{\infty}$ norm when $np\rightarrow\infty$, where $n\geq 2$ can be finite. Due to the dynamic structure of the data, the Hessian matrix of the loss function exhibits a complex structure. As a result, existing analytical approaches, such as the interior point theorem [6, 37] developed for static networks, are no longer applicable; see Section 3.1 for further elaboration. We derive a novel locally uniform deviation bound in a neighborhood of the true parameters with a diverging radius. Based on this, we first establish the $\ell_{2}$ norm consistency of the MLE, which paves the way for the uniform consistency in the $\ell_{\infty}$ norm.

  • In establishing the locally uniform deviation bound, we provide a general result for functions of the form $L(\boldsymbol{\theta})=\frac{1}{p}\sum_{1\leq i\neq j\leq p}l_{i,j}\left(\theta_{i},\theta_{j}\right)Y_{i,j}$ as defined in (4.11) below. This result exploits the sparsity structure of $L(\boldsymbol{\theta})$, in the sense that most of its higher-order derivatives are zero (a condition our model satisfies), and provides a new bound that substantially extends the scope of empirical processes for M-estimators [32] from models with a fixed number of parameters to those with a growing number of parameters. The result is of independent interest as it can be applied to any model with an objective function of the form $L$.

The rest of the paper is organized as follows. We introduce in Section 2 the new two-way heterogeneity model and present its properties. The estimation of the local MLE in a neighborhood of the truth and the associated theoretical properties are presented in Section 3. The development of these properties relies on new local deviation bounds, which are presented in Section 4. Simulation studies and an analysis of an ant interaction data set are reported in Section 5. We conclude the paper in Section 6. All technical proofs are relegated to Appendix A. Additional numerical results showcasing the effectiveness of our method in aiding community detection within stochastic block structures, along with an application to understanding dynamic protein-protein interaction networks, are provided in Appendix B.

2 Two-way Heterogeneity Model

Consider a dynamic network defined on $p$ nodes which are unchanged over time. Denote by the $p\times p$ matrix ${\mathbf{X}}^{t}=(X_{i,j}^{t})$ its adjacency matrix at time $t$, i.e. $X_{i,j}^{t}=1$ indicates the existence of a connection between nodes $i$ and $j$ at time $t$, and $X_{i,j}^{t}=0$ otherwise. We focus on undirected networks without self-loops, i.e., $X_{i,j}^{t}=X_{j,i}^{t}$ for all $(i,j)\in{\mathcal{J}}\equiv\{(i,j):1\leq i<j\leq p\}$, and $X_{i,i}^{t}=0$ for $1\leq i\leq p$, though our approach can be readily extended to directed networks.

To capture the autoregressive pattern in dynamic networks, [14] proposed to model the network process via the following stationary AR(1) framework:

X^{t}_{i,j}\;=\;X^{t-1}_{i,j}\,I({\varepsilon}_{i,j}^{t}=0)\;+\;I({\varepsilon}_{i,j}^{t}=1),\quad t\geq 1,

where $I(\cdot)$ denotes the indicator function, and the ${\varepsilon}_{i,j}^{t}$, $(i,j)\in{\mathcal{J}}$, are independent innovations satisfying

P({\varepsilon}_{i,j}^{t}=1)=\alpha_{i,j},\quad P({\varepsilon}_{i,j}^{t}=-1)=\beta_{i,j},\quad P({\varepsilon}_{i,j}^{t}=0)=1-\alpha_{i,j}-\beta_{i,j},

for some positive parameters $\alpha_{i,j}$ and $\beta_{i,j}$. This general model neglects the inherent structure of the networks and estimates each pair $(\alpha_{i,j},\beta_{i,j})$ separately. As a result, there are $p(p-1)$ parameters, and consistent estimation requires $n\rightarrow\infty$. In many real-world scenarios, however, the number of network observations $n$ is modest while the number of nodes $p$ can significantly exceed $n$. Under such a small-$n$-large-$p$ scenario, the general model of [14] may not be suitable. To capture node heterogeneity in dynamic networks while accommodating small-$n$-large-$p$ data, we propose the following reparameterization of the general AR(1) model above. It not only accounts for inherent node heterogeneity but also reduces the parameter count from $p(p-1)$ to $2p$.

Definition 1.

Two-way Heterogeneity Model (TWHM). The data generating process satisfies

(2.1) X^{t}_{i,j}=I({\varepsilon}_{i,j}^{t}=0)+X^{t-1}_{i,j}I({\varepsilon}_{i,j}^{t}=1),\qquad(i,j)\in{\mathcal{J}},

where the ${\varepsilon}_{i,j}^{t}$, for $(i,j)\in{\cal J}$ and $t\geq 1$, are independent innovations with distributions satisfying

(2.2) P({\varepsilon}_{i,j}^{t}=r)=\frac{e^{\beta_{i,r}+\beta_{j,r}}}{1+\sum_{k=0}^{1}e^{\beta_{i,k}+\beta_{j,k}}}\quad{\rm for}~{}r=0,1,\qquad P({\varepsilon}_{i,j}^{t}=-1)=\frac{1}{1+\sum_{k=0}^{1}e^{\beta_{i,k}+\beta_{j,k}}}.

The TWHM defined above is a reparametrization of the AR(1) network model [14], as it reduces the total number of parameters from $p(p-1)$ therein to $2p$. By Proposition 1 of [14], the matrix process $\{{\mathbf{X}}^{t},t\geq 1\}$ is strictly stationary with

(2.3) P(X_{i,j}^{t}=1)=\frac{e^{\beta_{i,0}+\beta_{j,0}}}{1+e^{\beta_{i,0}+\beta_{j,0}}}=1-P(X_{i,j}^{t}=0),

provided that the process is initialized with ${\mathbf{X}}^{0}=(X_{i,j}^{0})$ also following this stationary marginal distribution.

Furthermore,

{\rm E}(X_{i,j}^{t})=\frac{e^{\beta_{i,0}+\beta_{j,0}}}{1+e^{\beta_{i,0}+\beta_{j,0}}},\qquad{\rm Var}(X_{i,j}^{t})=\frac{e^{\beta_{i,0}+\beta_{j,0}}}{(1+e^{\beta_{i,0}+\beta_{j,0}})^{2}},
(2.4) \rho_{i,j}(|t-s|)\equiv{\rm Corr}(X_{i,j}^{t},X_{i,j}^{s})=\left(\frac{e^{\beta_{i,1}+\beta_{j,1}}}{1+\sum_{r=0}^{1}e^{\beta_{i,r}+\beta_{j,r}}}\right)^{|t-s|}.

Note that the connection probabilities in (2.3) depend on $\boldsymbol{\beta}_{0}=(\beta_{1,0},\cdots,\beta_{p,0})^{\top}$ only, and are of the same form as the (static) $\beta$-model [2]. Hence we call $\boldsymbol{\beta}_{0}$ the static heterogeneity parameter. Proposition 1 below confirms that the means and variances of the node degrees in the TWHM also depend on $\boldsymbol{\beta}_{0}$ only, and that different values of $\beta_{i,0}$ reflect the heterogeneity in the degrees of the nodes.

Under TWHM, it holds that

(2.5) P(X^{t}_{i,j}=1|X^{t-1}_{i,j}=0)=\frac{e^{\beta_{i,0}+\beta_{j,0}}}{1+\sum_{k=0}^{1}e^{\beta_{i,k}+\beta_{j,k}}},\qquad P(X^{t}_{i,j}=0|X^{t-1}_{i,j}=1)=\frac{1}{1+\sum_{k=0}^{1}e^{\beta_{i,k}+\beta_{j,k}}}.

Hence the dynamic changes (over time) of the network ${\mathbf{X}}^{t}$ depend on $\boldsymbol{\beta}_{1}\equiv(\beta_{1,1},\cdots,\beta_{p,1})^{\top}$ in addition to $\boldsymbol{\beta}_{0}$: the larger $\beta_{i,1}$ is, the more likely $X_{i,j}^{t}$ will retain the value of $X_{i,j}^{t-1}$ for all $j$. Thus we call $\boldsymbol{\beta}_{1}$ the dynamic heterogeneity parameter, as its components reflect the different dynamic behaviours of the $p$ nodes. A schematic description of the model is given in Figure 1, where three snapshots of a dynamic network with four nodes are depicted.
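As a concrete illustration, the data generating process (2.1)-(2.2) can be simulated directly: draw $\mathbf{X}^0$ from the stationary marginal (2.3), then evolve each pair independently by the three-point innovation distribution (2.2). The sketch below is our own illustration in Python/NumPy (the function name and code organization are not from the paper):

```python
import numpy as np

def simulate_twhm(beta0, beta1, n, rng):
    """Simulate X^0, ..., X^n from the TWHM (2.1)-(2.2).

    beta0, beta1 : length-p arrays of static / dynamic heterogeneity
    parameters; returns an (n+1, p, p) array of symmetric 0/1
    adjacency matrices with zero diagonals.
    """
    p = len(beta0)
    S0 = beta0[:, None] + beta0[None, :]      # beta_{i,0} + beta_{j,0}
    S1 = beta1[:, None] + beta1[None, :]      # beta_{i,1} + beta_{j,1}
    Z = 1.0 + np.exp(S0) + np.exp(S1)
    p_new = np.exp(S0) / Z                    # P(eps = 0): a tie is forced
    p_keep = np.exp(S1) / Z                   # P(eps = 1): X^{t-1} is copied
    # initialize from the stationary marginal (2.3)
    X = (rng.random((p, p)) < np.exp(S0) / (1.0 + np.exp(S0))).astype(int)
    X = np.triu(X, 1) + np.triu(X, 1).T       # undirected, no self-loops
    snapshots = [X]
    for _ in range(n):
        U = rng.random((p, p))
        # eps = 0 with prob p_new, eps = 1 with prob p_keep, else eps = -1
        Xt = np.where(U < p_new, 1, np.where(U < p_new + p_keep, X, 0))
        Xt = np.triu(Xt, 1) + np.triu(Xt, 1).T
        snapshots.append(Xt)
        X = Xt
    return np.stack(snapshots)

# with all beta_{i,0} = 0 the stationary tie probability (2.3) is 1/2
rng = np.random.default_rng(0)
X = simulate_twhm(np.zeros(30), np.zeros(30), 200, rng)
```

The empirical edge frequency over all snapshots should then be close to $1/2$, which gives a quick sanity check of the stationarity claim (2.3).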

Figure 1: A schematic depiction of the TWHM: $\beta_{i,0}$, $i=1,\ldots,4$, are parameters characterizing the static heterogeneity of the nodes, while the $\beta_{i,1}$ characterize their dynamic heterogeneity.

From now on, let $\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}}$ denote the stationary TWHM with parameters ${\boldsymbol{\theta}}=(\boldsymbol{\beta}_{0}^{\top},\boldsymbol{\beta}_{1}^{\top})^{\top}$, and let $d_{i}^{t}=\sum_{j=1}^{p}X_{i,j}^{t}$ be the degree of node $i$ at time $t$. The proposition below lists some properties of the node degrees.

Proposition 1.

Let $\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}}$. Then $\{(d_{1}^{t},\ldots,d_{p}^{t}),t=0,1,2,\cdots\}$ is a strictly stationary process. Furthermore, for any $1\leq i\leq j\leq p$ and $t,s\geq 0$,

{\rm E}(d_{i}^{t})=\sum_{k=1,\,k\neq i}^{p}\frac{e^{\beta_{i,0}+\beta_{k,0}}}{1+e^{\beta_{i,0}+\beta_{k,0}}},\qquad{\rm Var}(d_{i}^{t})=\sum_{k=1,\,k\neq i}^{p}\frac{e^{\beta_{i,0}+\beta_{k,0}}}{(1+e^{\beta_{i,0}+\beta_{k,0}})^{2}},
\rho^{d}_{i,j}(|t-s|)\equiv{\rm Corr}(d_{i}^{t},d_{j}^{s})=\begin{cases}C_{i,\rho}\sum_{k=1,\,k\neq i}^{p}\left(\frac{e^{\beta_{i,1}+\beta_{k,1}}}{1+\sum_{r=0}^{1}e^{\beta_{i,r}+\beta_{k,r}}}\right)^{|t-s|}\frac{e^{\beta_{i,0}+\beta_{k,0}}}{(1+e^{\beta_{i,0}+\beta_{k,0}})^{2}}\quad&{\rm if}\;i=j,\\ 0&{\rm if}\;i\neq j,\end{cases}

where $C_{i,\rho}=\left(\sum_{k=1,\,k\neq i}^{p}\frac{e^{\beta_{i,0}+\beta_{k,0}}}{(1+e^{\beta_{i,0}+\beta_{k,0}})^{2}}\right)^{-1}$.

Proposition 1 implies that when there exist constants $\beta_{0}$ and $\beta_{1}$ such that $\beta_{i,0}\approx\beta_{0}$ and $\beta_{i,1}\approx\beta_{1}$ for all $i$, each degree sequence $\{d_{i}^{t},t=1,\ldots,n\}$ is approximately an AR(1) process.
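The mean and variance formulas in Proposition 1 are simple sums over the pairwise connection probabilities (2.3), so they are easy to evaluate numerically. A minimal sketch (the helper name is ours, not from the paper):

```python
import numpy as np

def degree_moments(beta0):
    """Stationary mean and variance of the degrees d_i^t (Proposition 1).

    beta0 : length-p array of static heterogeneity parameters.
    """
    S0 = beta0[:, None] + beta0[None, :]         # beta_{i,0} + beta_{k,0}
    P = np.exp(S0) / (1.0 + np.exp(S0))          # edge probabilities (2.3)
    np.fill_diagonal(P, 0.0)                     # exclude k = i
    return P.sum(axis=1), (P * (1.0 - P)).sum(axis=1)

# homogeneous case beta_{i,0} = 0, p = 5: each edge probability is 1/2,
# so E(d_i^t) = (p-1)/2 = 2 and Var(d_i^t) = (p-1)/4 = 1
mean, var = degree_moments(np.zeros(5))
```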

3 Parameter Estimation

We introduce some notation first. Denote by ${\mathbf{I}}_{p}$ the $p\times p$ identity matrix. For any $s\in\mathbb{R}$, $\textbf{s}_{p}$ denotes the $p\times 1$ vector with all elements equal to $s$. For ${\mathbf{a}}=(a_{1},\ldots,a_{p})^{\top}\in\mathbb{R}^{p}$ and ${\mathbf{A}}=(A_{i,j})\in\mathbb{R}^{p\times p}$, let $\|{\mathbf{a}}\|_{q}=\big(\sum_{i}|a_{i}|^{q}\big)^{1/q}$ for any $q\geq 1$, $\|{\mathbf{a}}\|_{\infty}=\max_{i}|a_{i}|$, and $\|{\mathbf{A}}\|_{\infty}=\max_{i}\sum_{j=1}^{p}|A_{i,j}|$. Furthermore, let $\|{\mathbf{A}}\|_{2}$ denote the spectral norm of ${\mathbf{A}}$, i.e. its largest singular value. For a random matrix ${\mathbf{W}}\in\mathbb{R}^{p\times p}$ with ${\rm E}({\mathbf{W}})=\textbf{0}$, define its matrix variance as ${\rm Var}({\mathbf{W}})=\max\big\{\|{\rm E}({\mathbf{W}}{\mathbf{W}}^{\top})\|_{2},\|{\rm E}({\mathbf{W}}^{\top}{\mathbf{W}})\|_{2}\big\}$. The notation $x\lesssim y$ means that there exists a constant $c_{1}>0$ such that $|x|\leq c_{1}|y|$, while $x\gtrsim y$ means there exists a constant $c_{2}>0$ such that $|x|\geq c_{2}|y|$. Denote by ${\mathbf{B}}_{\infty}({\mathbf{x}},r)=\{{\mathbf{y}}:\|{\mathbf{y}}-{\mathbf{x}}\|_{\infty}\leq r\}$ the ball centred at ${\mathbf{x}}$ with $\ell_{\infty}$ radius $r$. Let $c,c_{0},c_{1},\ldots,C,C_{0},C_{1},\ldots$ denote generic constants that may differ from place to place. Let ${\boldsymbol{\theta}}^{*}=(\boldsymbol{\beta}_{0}^{*\top},\boldsymbol{\beta}_{1}^{*\top})^{\top}=(\beta^{*}_{1,0},\cdots,\beta^{*}_{p,0},\beta^{*}_{1,1},\cdots,\beta^{*}_{p,1})^{\top}$ be the vector of true unknown parameters. We assume:

  • (A1)

    There exists a constant $K$ such that for any $i=1,2,\cdots,p$, the true parameters satisfy $\beta^{*}_{i,1}-\max\big(\beta^{*}_{i,0},0\big)<K$.

Condition (A1) ensures that the autocorrelation functions (ACFs) in (2.4) are bounded away from 1 for any $(i,j)\in{\mathcal{J}}$. It is worth noting that both $\beta^{*}_{i,1}$ and $\beta^{*}_{i,0}$ are allowed to vary with $p$, thus accommodating sparse networks in our analysis. In practical terms, $\beta_{i,0}^{*}$, which reflects the sparsity of the stationary network, tends to be very small for large networks. Consequently, condition (A1) holds when $\beta_{i,1}^{*}$ is bounded from above.

3.1 Maximum likelihood estimation

With the observations ${\mathbf{X}}^{0},\cdots,{\mathbf{X}}^{n}$ available, the likelihood function conditional on ${\mathbf{X}}^{0}$ factorizes as $L(\boldsymbol{\theta};{\mathbf{X}}^{n},\cdots,{\mathbf{X}}^{1}|{\mathbf{X}}^{0})=\prod_{t=1}^{n}L(\boldsymbol{\theta};{\mathbf{X}}^{t}|{\mathbf{X}}^{t-1})$. Note that the processes $\{X^{t}_{i,j}\}$ for different $(i,j)\in{\mathcal{J}}$ are independent of each other. By (2.5), the (normalized) negative log-likelihood admits the following form:

(3.6) l(\boldsymbol{\theta})=-\frac{1}{np}\log L(\boldsymbol{\theta};{\mathbf{X}}^{n},{\mathbf{X}}^{n-1},\cdots,{\mathbf{X}}^{1}|{\mathbf{X}}^{0})
=\frac{1}{p}\sum_{1\leq i<j\leq p}\log\Big(1+e^{\beta_{i,0}+\beta_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}}\Big)-\frac{1}{np}\sum_{1\leq i<j\leq p}\Bigg\{\left(\beta_{i,0}+\beta_{j,0}\right)\sum_{t=1}^{n}X_{i,j}^{t}
+\log\left(1+e^{\beta_{i,1}+\beta_{j,1}}\right)\sum_{t=1}^{n}\left(1-X_{i,j}^{t}\right)\left(1-X_{i,j}^{t-1}\right)
+\log\big(1+e^{\beta_{i,1}+\beta_{j,1}-\beta_{i,0}-\beta_{j,0}}\big)\sum_{t=1}^{n}X_{i,j}^{t}X_{i,j}^{t-1}\Bigg\}.

For brevity, write

(3.7) a_{i,j}=\sum_{t=1}^{n}X_{i,j}^{t},\quad b_{i,j}=\sum_{t=1}^{n}X_{i,j}^{t}X_{i,j}^{t-1},\quad d_{i,j}=\sum_{t=1}^{n}\big(1-X_{i,j}^{t}\big)\big(1-X_{i,j}^{t-1}\big).
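In code, these sufficient statistics are one-liners over the stacked snapshots; a minimal sketch, assuming `X` is an $(n+1)\times p\times p$ NumPy array of adjacency matrices:

```python
import numpy as np

def sufficient_stats(X):
    """The statistics a_{ij}, b_{ij}, d_{ij} of (3.7) from X^0, ..., X^n."""
    a = X[1:].sum(axis=0)                          # sum_t X_{ij}^t
    b = (X[1:] * X[:-1]).sum(axis=0)               # sum_t X_{ij}^t X_{ij}^{t-1}
    d = ((1 - X[1:]) * (1 - X[:-1])).sum(axis=0)   # sum_t (1-X^t)(1-X^{t-1})
    return a, b, d

# a pair observed as 1, 1, 0 over t = 0, 1, 2:
# a_{01} = 1 + 0 = 1, b_{01} = 1*1 + 0*1 = 1, d_{01} = 0 + 0 = 0
X = np.array([[[0, 1], [1, 0]], [[0, 1], [1, 0]], [[0, 0], [0, 0]]])
a, b, d = sufficient_stats(X)
```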

Then the Hessian matrix of $l(\boldsymbol{\theta})$ is of the form

{\mathbf{V}}(\boldsymbol{\theta})=\frac{\partial^{2}{l(\boldsymbol{\theta})}}{\partial{\boldsymbol{\theta}}\partial{\boldsymbol{\theta}^{\top}}}=\begin{bmatrix}\frac{\partial^{2}{l(\boldsymbol{\theta})}}{\partial{\boldsymbol{\beta}_{0}}\partial{\boldsymbol{\beta}_{0}^{\top}}}&\frac{\partial^{2}{l(\boldsymbol{\theta})}}{\partial{\boldsymbol{\beta}_{0}}\partial{\boldsymbol{\beta}_{1}^{\top}}}\\ \frac{\partial^{2}{l(\boldsymbol{\theta})}}{\partial{\boldsymbol{\beta}_{1}}\partial{\boldsymbol{\beta}_{0}^{\top}}}&\frac{\partial^{2}{l(\boldsymbol{\theta})}}{\partial{\boldsymbol{\beta}_{1}}\partial{\boldsymbol{\beta}_{1}^{\top}}}\end{bmatrix}:=\begin{bmatrix}{\mathbf{V}}_{1}(\boldsymbol{\theta})&{\mathbf{V}}_{2}(\boldsymbol{\theta})\\ {\mathbf{V}}_{2}(\boldsymbol{\theta})&{\mathbf{V}}_{3}(\boldsymbol{\theta})\end{bmatrix},

where for $i\neq j$,

\frac{\partial^{2}{l(\boldsymbol{\theta})}}{\partial{\beta_{i,0}}\partial{\beta_{j,0}}}=\frac{1}{p}\frac{e^{\beta_{i,0}+\beta_{j,0}}(1+e^{\beta_{i,1}+\beta_{j,1}})}{(1+e^{\beta_{i,0}+\beta_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}})^{2}}-\frac{1}{np}b_{i,j}\frac{e^{\beta_{i,0}+\beta_{j,0}+\beta_{i,1}+\beta_{j,1}}}{(e^{\beta_{i,0}+\beta_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}})^{2}},
\frac{\partial^{2}{l(\boldsymbol{\theta})}}{\partial{\beta_{i,0}}\partial{\beta_{j,1}}}=-\frac{1}{p}\frac{e^{\beta_{i,0}+\beta_{j,0}+\beta_{i,1}+\beta_{j,1}}}{(1+e^{\beta_{i,0}+\beta_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}})^{2}}+\frac{1}{np}b_{i,j}\frac{e^{\beta_{i,0}+\beta_{j,0}+\beta_{i,1}+\beta_{j,1}}}{(e^{\beta_{i,0}+\beta_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}})^{2}},
\frac{\partial^{2}{l(\boldsymbol{\theta})}}{\partial{\beta_{i,1}}\partial{\beta_{j,1}}}=\frac{1}{p}\frac{e^{\beta_{i,1}+\beta_{j,1}}(1+e^{\beta_{i,0}+\beta_{j,0}})}{(1+e^{\beta_{i,0}+\beta_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}})^{2}}-\frac{1}{np}d_{i,j}\frac{e^{\beta_{i,1}+\beta_{j,1}}}{(1+e^{\beta_{i,1}+\beta_{j,1}})^{2}}-\frac{1}{np}b_{i,j}\frac{e^{\beta_{i,0}+\beta_{j,0}+\beta_{i,1}+\beta_{j,1}}}{(e^{\beta_{i,0}+\beta_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}})^{2}}.

Note that the matrix ${\mathbf{V}}_{2}(\boldsymbol{\theta})$ is symmetric. Furthermore, the three matrices ${\mathbf{V}}_{1}(\boldsymbol{\theta})$, ${\mathbf{V}}_{2}(\boldsymbol{\theta})$ and ${\mathbf{V}}_{3}(\boldsymbol{\theta})$ are diagonally balanced [12] in the sense that each diagonal element equals the sum of the off-diagonal elements in its row, namely,

({\mathbf{V}}_{k}(\boldsymbol{\theta}))_{i,i}=\sum_{j=1,\,j\neq i}^{p}({\mathbf{V}}_{k}(\boldsymbol{\theta}))_{i,j},\quad k=1,2,3.

Unfortunately, the Hessian matrix ${\mathbf{V}}(\boldsymbol{\theta})$ is not uniformly positive definite, and hence $l(\boldsymbol{\theta})$ is not convex; see Section 5.1 for an example. Therefore, finding the global MLE by minimizing $l(\boldsymbol{\theta})$ would be infeasible, especially given the large dimensionality of $\boldsymbol{\theta}$. To overcome this obstacle, we propose the following roadmap to search for the local MLE over a neighbourhood of the true parameter value $\boldsymbol{\theta}^{*}$.

  • (1)

    First we show that $l(\boldsymbol{\theta})$ is locally convex in a neighbourhood of $\boldsymbol{\theta}^{*}$ (see Theorem 1 below). Towards this end, we first prove that ${\rm E}({\mathbf{V}}(\boldsymbol{\theta}))$ is positive definite in a neighborhood of $\boldsymbol{\theta}^{*}$. Leveraging some newly proved concentration results, we then show that ${\mathbf{V}}(\boldsymbol{\theta})$ converges to ${\rm E}({\mathbf{V}}(\boldsymbol{\theta}))$ uniformly over the neighborhood.

  • (2)

    Denote by $\widehat{\boldsymbol{\theta}}$ the local MLE in the neighbourhood identified above. We derive bounds for $\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}$ in both the $\ell_{2}$ and $\ell_{\infty}$ norms (see Theorems 2 and 3 below). The $\ell_{2}$ convergence is established by providing a uniform upper bound for the local deviation between $l(\boldsymbol{\theta})-{\rm E}(l(\boldsymbol{\theta}))$ and $l(\boldsymbol{\theta}^{*})-{\rm E}(l(\boldsymbol{\theta}^{*}))$ (see Corollary 4 in Section 4). The $\ell_{\infty}$ convergence of $\widehat{\boldsymbol{\theta}}$ is established by further exploiting the special structure of the objective function.

  • (3)

    We propose a new method of moments estimator (MME) which is proved to lie asymptotically in the neighbourhood specified in (1) above. With this MME as the initial value, the local MLE $\widehat{\boldsymbol{\theta}}$ can be obtained by a standard gradient descent algorithm.
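The roadmap can be mimicked numerically on a toy instance: code the objective (3.6) through the statistics (3.7), then refine an initial value with a first-order optimizer. The sketch below is ours, not the paper's implementation; the MME of step (3) is defined only later, so a crude zero initial value stands in for it, and a generic quasi-Newton routine (SciPy's L-BFGS-B with finite-difference gradients) stands in for plain gradient descent.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, X):
    """Normalized negative conditional log-likelihood l(theta) of (3.6).

    theta : length-2p vector (beta_0, beta_1); X : (n+1, p, p) snapshots.
    """
    n, p = X.shape[0] - 1, X.shape[1]
    b0, b1 = theta[:p], theta[p:]
    S0 = b0[:, None] + b0[None, :]                 # beta_{i,0} + beta_{j,0}
    S1 = b1[:, None] + b1[None, :]
    a = X[1:].sum(axis=0)                          # statistics of (3.7)
    b = (X[1:] * X[:-1]).sum(axis=0)
    d = ((1 - X[1:]) * (1 - X[:-1])).sum(axis=0)
    iu = np.triu_indices(p, 1)                     # sum over 1 <= i < j <= p
    logZ = np.log1p(np.exp(S0) + np.exp(S1))
    data = S0 * a + np.log1p(np.exp(S1)) * d + np.log1p(np.exp(S1 - S0)) * b
    return logZ[iu].sum() / p - data[iu].sum() / (n * p)

# toy data: any stack of symmetric 0/1 snapshots keeps l well defined
rng = np.random.default_rng(1)
p, n = 8, 40
X = rng.integers(0, 2, size=(n + 1, p, p))
X = np.triu(X, 1) + np.triu(X, 1).transpose(0, 2, 1)
theta0 = np.zeros(2 * p)                           # stand-in initial value
res = minimize(neg_loglik, theta0, args=(X,), method="L-BFGS-B")
```

In practice the gradient of (3.6) is available in closed form; the finite-difference fallback used here is adequate only for this toy dimension.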

The main technical challenges in the roadmap above can be summarized as follows.

Firstly, to establish the upper bounds stated in (2) above, we need to evaluate the uniform local deviations of the loss function. While the theoretical framework for deriving similar deviations for M-estimators is well established in, for example, [32, 31], classical empirical-process techniques for establishing uniform laws [33] are not applicable because the number of parameters in the TWHM diverges.

Secondly, for the classical $\beta$-model, proving the existence and convergence of its MLE relies strongly on the interior point theorem [6]. In particular, this theorem is applicable only because the Hessian matrix of the $\beta$-model admits a nice structure: it is diagonally dominant, and all its elements are positive and depend on the parameters only [2, 39, 37, 8]. However, the Hessian matrix of $l(\boldsymbol{\theta})$ for the TWHM depends on the random variables $X_{i,j}^{t}$ in addition to the parameters, making it impossible to verify whether the score function is uniformly Fréchet differentiable, a key assumption required by the interior point theorem.

Lastly, the higher-order derivatives of $l(\boldsymbol{\theta})$ may diverge as the order increases. To see this, notice that for any integer $k$, the $k$-th order derivatives of $l(\boldsymbol{\theta})$ are closely related to the $(k-1)$-th order derivative of the sigmoid function $S(x)=\frac{1}{1+e^{-x}}$, in that $\frac{\partial^{k}S\left(x\right)}{\partial x^{k}}=\frac{\sum_{m=0}^{k-2}-A\left(k-1,m\right)\left(-e^{x}\right)^{m+1}}{\left(1+e^{x}\right)^{k}}$, where $A\left(k-1,m\right)$ is the Eulerian number [26]. Some of the coefficients $A\left(k-1,m\right)$ diverge very quickly as $k$ increases. Thus, loosely speaking, $l(\boldsymbol{\theta})$ is not smooth. This non-smoothness, together with the growing number of parameters, makes various local approximations based on Taylor expansions highly non-trivial; note that the consistency of MLEs in many finite-dimensional models is established via such approximations.
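The growth of these coefficients is easy to see numerically. A small sketch (the helper name is ours; the recurrence $A(k,m)=(m+1)A(k-1,m)+(k-m)A(k-1,m-1)$ is the standard one for Eulerian numbers):

```python
def eulerian_row(k):
    """Return the row A(k, 0), ..., A(k, k-1) of the Eulerian triangle,
    built from A(k, m) = (m+1) A(k-1, m) + (k-m) A(k-1, m-1)."""
    row = [1]                                     # A(1, 0) = 1
    for n in range(2, k + 1):
        row = [(m + 1) * (row[m] if m < len(row) else 0)
               + (n - m) * (row[m - 1] if m >= 1 else 0)
               for m in range(n)]
    return row

# each row sums to k! while the largest entry grows super-exponentially,
# e.g. max A(9, m) = 156190
rows = [eulerian_row(k) for k in range(1, 10)]
```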

In our proofs, we make extensive use of the special sparse structure of the loss function in the form (4.11) below; this sparsity stems from the fact that most of its higher-order derivatives are zero. Based on the uniform local deviation bound obtained in Section 4, we establish an upper bound for the error of the local MLE in the $\ell_{2}$ norm. Utilizing the structure of the marginalized functions of the loss, we further establish an upper bound for the estimation error in the $\ell_{\infty}$ norm via an iterative procedure stated in Section 3.3.

3.2 Existence of the local MLE

To establish the convexity of $l(\boldsymbol{\theta})$ in a neighborhood of $\boldsymbol{\theta}^{*}$, we first show that such local convexity holds for ${\rm E}({\mathbf{V}}(\boldsymbol{\theta}))$.

Proposition 2.

Let ${\mathbf{A}}$ be a $2p\times 2p$ matrix of the form ${\mathbf{A}}=\begin{bmatrix}{\mathbf{A}}_{1}&{\mathbf{A}}_{2}\\ {\mathbf{A}}_{2}&{\mathbf{A}}_{3}\end{bmatrix}$, where ${\mathbf{A}}_{1}$, ${\mathbf{A}}_{2}$, ${\mathbf{A}}_{3}$ are $p\times p$ symmetric matrices. Then ${\mathbf{A}}$ is positive (negative) definite if $-{\mathbf{A}}_{2}$, ${\mathbf{A}}_{2}+{\mathbf{A}}_{3}$ and ${\mathbf{A}}_{2}+{\mathbf{A}}_{1}$ are all positive (negative) definite.

Proof.

For any nonzero ${\mathbf{x}}=({\mathbf{x}}_{1}^{\top},{\mathbf{x}}_{2}^{\top})^{\top}\in\mathbb{R}^{2p}$ with ${\mathbf{x}}_{1},{\mathbf{x}}_{2}\in\mathbb{R}^{p}$, we have

{\mathbf{x}}^{\top}{\mathbf{A}}{\mathbf{x}}={\mathbf{x}}_{1}^{\top}{\mathbf{A}}_{1}{\mathbf{x}}_{1}+{\mathbf{x}}_{2}^{\top}{\mathbf{A}}_{3}{\mathbf{x}}_{2}+2{\mathbf{x}}_{1}^{\top}{\mathbf{A}}_{2}{\mathbf{x}}_{2}={\mathbf{x}}_{1}^{\top}({\mathbf{A}}_{1}+{\mathbf{A}}_{2}){\mathbf{x}}_{1}+{\mathbf{x}}_{2}^{\top}({\mathbf{A}}_{3}+{\mathbf{A}}_{2}){\mathbf{x}}_{2}-({\mathbf{x}}_{1}-{\mathbf{x}}_{2})^{\top}{\mathbf{A}}_{2}({\mathbf{x}}_{1}-{\mathbf{x}}_{2}).

When $-{\mathbf{A}}_{2}$, ${\mathbf{A}}_{1}+{\mathbf{A}}_{2}$ and ${\mathbf{A}}_{3}+{\mathbf{A}}_{2}$ are positive definite, each of the three terms on the right-hand side is non-negative, and at least one is strictly positive since ${\mathbf{x}}\neq\textbf{0}$; hence ${\mathbf{x}}^{\top}{\mathbf{A}}{\mathbf{x}}>0$. The negative definite case follows analogously. This proves the proposition. ∎
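Proposition 2 is easy to check numerically. The sketch below (construction and names are ours) builds a random instance satisfying the three conditions and verifies that the resulting block matrix is positive definite:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 5

def random_pd(p, rng):
    """A random symmetric positive definite p x p matrix."""
    M = rng.standard_normal((p, p))
    return M @ M.T + p * np.eye(p)

# choose A2 so that -A2 is positive definite, then force
# A1 + A2 and A3 + A2 to be positive definite as well
A2 = -random_pd(p, rng)
A1 = random_pd(p, rng) - A2
A3 = random_pd(p, rng) - A2
A = np.block([[A1, A2], [A2, A3]])
eigs = np.linalg.eigvalsh(A)       # by Proposition 2, all should be > 0
```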

Noting that $-{\mathbf{V}}_{2}(\boldsymbol{\theta})$, ${\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{3}(\boldsymbol{\theta})$ and ${\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{1}(\boldsymbol{\theta})$ are all diagonally balanced matrices, some routine calculations show that $-{\rm E}{\mathbf{V}}_{2}(\boldsymbol{\theta}^{*})$, ${\rm E}({\mathbf{V}}_{2}(\boldsymbol{\theta}^{*})+{\mathbf{V}}_{3}(\boldsymbol{\theta}^{*}))$ and ${\rm E}({\mathbf{V}}_{2}(\boldsymbol{\theta}^{*})+{\mathbf{V}}_{1}(\boldsymbol{\theta}^{*}))$ have only positive elements, and thus are all positive definite. Therefore, ${\rm E}{\mathbf{V}}(\boldsymbol{\theta}^{*})$ is positive definite by Proposition 2. By continuity, when $\boldsymbol{\theta}$ is close enough to $\boldsymbol{\theta}^{*}$, ${\rm E}{\mathbf{V}}(\boldsymbol{\theta})$ is also positive definite, and hence ${\rm E}\,l(\boldsymbol{\theta})$ is strongly convex in a neighborhood of $\boldsymbol{\theta}^{*}$. Next we show the local convexity of $l(\boldsymbol{\theta})$ itself, whose second-order derivatives depend on the sufficient statistics $b_{i,j}=\sum_{t=1}^{n}X_{i,j}^{t}X_{i,j}^{t-1}$ and $d_{i,j}=\sum_{t=1}^{n}(1-X_{i,j}^{t})(1-X_{i,j}^{t-1})$. Noting that the network process is $\alpha$-mixing with an exponentially decaying mixing coefficient, we first obtain the following concentration results for $b_{i,j}$ and $d_{i,j}$, which ensure element-wise convergence of ${\mathbf{V}}(\boldsymbol{\theta})$ to ${\rm E}{\mathbf{V}}(\boldsymbol{\theta})$ for any given $\boldsymbol{\theta}$ when $np\rightarrow\infty$.

Lemma 1.

Suppose $\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}}$ for some ${\boldsymbol{\theta}}=(\beta_{1,0},\cdots,\beta_{p,0},\beta_{1,1},\cdots,\beta_{p,1})^{\top}$ satisfying condition (A1). Then for any $(i,j)\in{\cal J}$, $\{X^{t}_{i,j},t\geq 1\}$ is $\alpha$-mixing with an exponentially decaying rate. Moreover, for any constant $c>0$, by choosing $c_{1}>0$ large enough, it holds with probability greater than $1-(np)^{-c}$ that

\max_{1\leq i<j\leq p}\left\{n^{-1}\left|\sum_{t=1}^{n}\left\{X_{i,j}^{t}-{\rm E}\left(X_{i,j}^{t}\right)\right\}\right|,\;n^{-1}\left|b_{i,j}-{\rm E}(b_{i,j})\right|,\;n^{-1}\left|d_{i,j}-{\rm E}(d_{i,j})\right|\right\}\leq c_{1}r_{n,p},

where rn,p=n1log(np)+n1log(n)loglog(n)log(np)r_{n,p}=\sqrt{n^{-1}\log(np)}+n^{-1}\log\left(n\right)\log\log\left(n\right)\log\left(np\right).
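To illustrate the concentration in Lemma 1 numerically, the sketch below simulates a single edge process and computes the sufficient statistics $b_{i,j}$ and $d_{i,j}$. The transition probabilities used here are an assumption on our part, back-solved from the stationary edge probability $e^{\beta_{i,0}+\beta_{j,0}}/(1+e^{\beta_{i,0}+\beta_{j,0}})$ and the lag-one moment displayed in Section 3.4; the function names are ours.

```python
import numpy as np

def simulate_edge(beta0_sum, beta1_sum, n, rng):
    """Simulate one edge process {X^t} under hedged TWHM transition
    probabilities (consistent with the stationary law and the moment
    E(X^t X^{t-1}) quoted in Section 3.4):
      P(X^t = 1 | X^{t-1} = 1) = (A + B) / (1 + A + B),
      P(X^t = 1 | X^{t-1} = 0) = A / (1 + A + B),
    where A = exp(beta_{i,0}+beta_{j,0}) and B = exp(beta_{i,1}+beta_{j,1})."""
    A, B = np.exp(beta0_sum), np.exp(beta1_sum)
    p11 = (A + B) / (1 + A + B)
    p01 = A / (1 + A + B)
    x = np.empty(n + 1, dtype=int)
    x[0] = rng.random() < A / (1 + A)      # start from the stationary law
    for t in range(1, n + 1):
        p = p11 if x[t - 1] == 1 else p01
        x[t] = rng.random() < p
    return x

def sufficient_stats(x):
    """b = sum_t X^t X^{t-1} (retained ties), d = sum_t (1-X^t)(1-X^{t-1})."""
    b = int(np.sum(x[1:] * x[:-1]))
    d = int(np.sum((1 - x[1:]) * (1 - x[:-1])))
    return b, d
```

Under these assumed transitions, with $\beta_{i,0}+\beta_{j,0}=\beta_{i,1}+\beta_{j,1}=0$ the stationary mean is $1/2$ and ${\rm E}(b_{i,j})/n={\rm E}(d_{i,j})/n=1/3$, so the normalized statistics concentrate around $1/3$, in line with the rate $r_{n,p}$.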

The following lemma provides a lower bound for the smallest eigenvalue of E(𝐕(𝜽)){\rm E}({\mathbf{V}}(\boldsymbol{\theta})).

Lemma 2.

Let {𝐗t}P𝛉\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}^{*}}, 𝐁(𝛉,r):={𝛉:𝛉𝛉r}{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right):=\left\{\boldsymbol{\theta}:\|\boldsymbol{\theta}-\boldsymbol{\theta}^{*}\|_{\infty}\leq r\right\} and 𝐁(κ0,κ1):={\mathbf{B}}\left(\kappa_{0},\kappa_{1}\right):= {(𝛃0,𝛃1):𝛃0κ0,𝛃1κ1}\Big{\{}\left(\boldsymbol{\beta}_{0},\boldsymbol{\beta}_{1}\right):\|\boldsymbol{\beta}_{0}\|_{\infty}\leq\kappa_{0},\|\boldsymbol{\beta}_{1}\|_{\infty}\leq\kappa_{1}\Big{\}}. Under condition (A1), for any κ0,κ1\kappa_{0},\kappa_{1} and r=cre4κ04κ1r=c_{r}e^{-4\kappa_{0}-4\kappa_{1}} with cr>0c_{r}>0 being a small enough constant, there exists a constant C>0C>0 such that

inf𝜽𝐁(𝜽,r)𝐁(κ0,κ1);𝐚2=1𝐚E(𝐕(𝜽))𝐚Ce4κ04κ1.\inf_{\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right)\cap{\mathbf{B}}\left(\kappa_{0},\kappa_{1}\right);\|{\mathbf{a}}\|_{2}=1}{\mathbf{a}}^{\top}{\rm E}\left({\mathbf{V}}(\boldsymbol{\theta})\right){\mathbf{a}}\geq Ce^{-4\kappa_{0}-4\kappa_{1}}.

Examining the proof shows that the lower bound in Lemma 2 is attained when $\boldsymbol{\beta}_{0}=(\kappa_{0},\ldots,\kappa_{0})^{\top}$ and $\boldsymbol{\beta}_{1}=(-\kappa_{1},\ldots,-\kappa_{1})^{\top}$. Hence the smallest eigenvalue of ${\rm E}({\mathbf{V}}(\boldsymbol{\theta}))$ can decay exponentially in $\kappa_{0}$ and $\kappa_{1}$. Consequently, an upper bound on $\kappa_{0}$ and $\kappa_{1}$ must be imposed to ensure the positive definiteness of the sample analogue ${\mathbf{V}}(\boldsymbol{\theta})$. Lemma 2 also indicates that the positive definiteness of ${\rm E}({\mathbf{V}}(\boldsymbol{\theta}))$ is guaranteed when $\boldsymbol{\theta}$ lies in the $\ell_{\infty}$ ball ${\mathbf{B}}_{\infty}(\boldsymbol{\theta}^{*},r)$. To establish the existence of the local MLE in this neighborhood, we need to evaluate the closeness of ${\rm E}({\mathbf{V}}(\boldsymbol{\theta}))$ and ${\mathbf{V}}(\boldsymbol{\theta})$ in the operator norm: if, for appropriately chosen $\kappa_{0},\kappa_{1}$, $\|{\rm E}({\mathbf{V}}(\boldsymbol{\theta}))-{\mathbf{V}}(\boldsymbol{\theta})\|_{2}$ is of smaller order than $e^{-4\kappa_{0}-4\kappa_{1}}$ uniformly over the parameter space $\{\boldsymbol{\theta}:\|\boldsymbol{\beta}_{0}\|_{\infty}\leq\kappa_{0},\|\boldsymbol{\beta}_{1}\|_{\infty}\leq\kappa_{1}~\text{and}~\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}(\boldsymbol{\theta}^{*},r)\}$, the positive definiteness of ${\mathbf{V}}(\boldsymbol{\theta})$ follows.

Note that ${\mathbf{V}}_{2}(\boldsymbol{\theta})-{\rm E}{\mathbf{V}}_{2}(\boldsymbol{\theta})$, ${\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{3}(\boldsymbol{\theta})-{\rm E}({\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{3}(\boldsymbol{\theta}))$ and ${\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{1}(\boldsymbol{\theta})-{\rm E}({\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{1}(\boldsymbol{\theta}))$ are all centered, diagonally balanced matrices that can be decomposed into sums of independent random matrices. The following lemma bounds the deviations of such centered matrices in the spectral norm.

Lemma 3.

Let 𝐙=(Zi,j)1i,jp{\mathbf{Z}}=(Z_{i,j})_{1\leq i,j\leq p} be a symmetric p×pp\times p random matrix such that the off-diagonal elements Zi,j,1i<jpZ_{i,j},1\leq i<j\leq p are independent of each other and satisfy

Z_{i,i}=\sum_{j=1,\>j\neq i}^{p}Z_{i,j},\quad{\rm E}\left(Z_{i,j}\right)=0,\quad{\rm Var}\left(Z_{i,j}\right)\leq\sigma^{2},\quad{\rm and}\quad\left|Z_{i,j}\right|\leq b\quad{\rm almost~{}~{}surely}.

Then it holds that

P(𝐙2>ϵ)2pexp(ϵ22σ2(p1)+4bϵ).P\left(\left\|{\mathbf{Z}}\right\|_{2}>\epsilon\right)\leq 2p\ \exp\left(-\frac{\epsilon^{2}}{2\sigma^{2}(p-1)+4b\epsilon}\right).
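As a numerical sanity check on Lemma 3, the sketch below builds a diagonally balanced matrix from independent Rademacher off-diagonal entries (so $\sigma^{2}=1$ and $b=1$) and compares its spectral norm with the right-hand side of the tail bound. The Rademacher choice and the value of $\epsilon$ are illustrative assumptions, not part of the lemma.

```python
import numpy as np

def balanced_noise_matrix(p, rng):
    """Symmetric p x p matrix with independent Rademacher off-diagonal
    entries (mean 0, variance 1, |Z_ij| <= 1) and the diagonally
    balanced diagonal Z_ii = sum_{j != i} Z_ij used in Lemma 3."""
    Z = np.triu(rng.choice([-1.0, 1.0], size=(p, p)), k=1)
    Z = Z + Z.T                       # symmetric, zero diagonal so far
    np.fill_diagonal(Z, Z.sum(axis=1))  # row sums over j != i
    return Z

def lemma3_bound(p, eps, sigma2=1.0, b=1.0):
    """Right-hand side of Lemma 3: 2p exp(-eps^2 / (2 sigma^2 (p-1) + 4 b eps))."""
    return 2 * p * np.exp(-eps**2 / (2 * sigma2 * (p - 1) + 4 * b * eps))
```

For $p=100$ and $\epsilon=75$ the bound is already below $0.02$, and realized spectral norms in simulation stay well below $\epsilon$.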

Proposition 2, Lemma 2 and Lemma 3 imply the theorem below.

Theorem 1.

Let condition (A1) hold, assume $\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}^{*}}$, and let $\kappa_{0}:=\|\boldsymbol{\beta}_{0}^{*}\|_{\infty}$ and $\kappa_{1}:=\|\boldsymbol{\beta}_{1}^{*}\|_{\infty}$ with $\kappa_{0}+\kappa_{1}\leq c\log(np)$ for some small enough constant $c>0$. Then as $np\rightarrow\infty$ with $n\geq 2$, with probability tending to one, there exists a unique MLE in the $\ell_{\infty}$ ball ${\mathbf{B}}_{\infty}(\boldsymbol{\theta}^{*},r)=\{\boldsymbol{\theta}:\|\boldsymbol{\theta}-\boldsymbol{\theta}^{*}\|_{\infty}\leq r\}$ with $r=c_{r}e^{-4\kappa_{0}-4\kappa_{1}}$, where $c_{r}>0$ is a constant.

In the proof of Theorem 1, we show that with probability tending to one, $l(\boldsymbol{\theta})$ is convex on the closed convex set ${\mathbf{B}}_{\infty}(\boldsymbol{\theta}^{*},r)$, and consequently there exists a unique local MLE in ${\mathbf{B}}_{\infty}(\boldsymbol{\theta}^{*},r)$. Theorem 1 also shows that the radius $r$ shrinks as $\kappa_{0}+\kappa_{1}$ grows, while $r$ is of constant order when $\kappa_{0}+\kappa_{1}$ is bounded. The proof further reveals that the constant $c_{r}$ can be taken larger when the smallest eigenvalue of the expected Hessian matrix ${\rm E}({\mathbf{V}}(\boldsymbol{\theta}))$ is larger. Moreover, by allowing the upper bound of $\|\boldsymbol{\beta}^{*}_{0}\|_{\infty}$ to grow to infinity, our theoretical analysis covers sparse networks. Specifically, under the condition $\|\boldsymbol{\beta}_{0}^{*}\|_{\infty}\leq\kappa_{0}$, from (2) we obtain the following lower bound (attained when $\beta_{1,0}^{*}=\ldots=\beta_{p,0}^{*}=-\kappa_{0}$) for the density of the stationary network:

\rho:=\frac{2}{p(p-1)}\sum_{1\leq i<j\leq p}{\mathbf{P}}\left(X_{i,j}^{t}=1\right)\geq\frac{e^{-2\kappa_{0}}}{1+e^{-2\kappa_{0}}}\asymp e^{-2\kappa_{0}}.

In particular, when $\kappa_{0}\leq c\log(np)$ for some constant $c>0$, we have $\rho\geq\frac{1}{1+(np)^{2c}}$. Thus, in contrast to fully dense network processes, in which each network has of order $p^{2}$ edges, TWHM accommodates networks with far fewer edges.
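The sparsity calculation above is elementary and can be checked directly; the helper below (the function name is ours) evaluates the density lower bound $e^{-2\kappa_{0}}/(1+e^{-2\kappa_{0}})$ and its equivalent form $1/(1+(np)^{2c})$ at $\kappa_{0}=c\log(np)$.

```python
import math

def density_lower_bound(kappa0):
    """Lower bound on the stationary density rho when ||beta_0*||_inf <= kappa0,
    attained at beta_{i,0}* = -kappa0 for all i:
        rho >= exp(-2*kappa0) / (1 + exp(-2*kappa0)) = 1 / (1 + exp(2*kappa0))."""
    return math.exp(-2 * kappa0) / (1 + math.exp(-2 * kappa0))
```

For instance, with $np=10^{4}$ and $c=0.1$, the bound equals $1/(1+(np)^{0.2})$, well above the fully sparse regime.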

3.3 Consistency of the local MLE

In the previous subsection, we have proved that with probability tending to one, l(𝜽)l(\boldsymbol{\theta}) is convex in 𝐁(𝜽,r){\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right), where r=cre4κ04κ1r=c_{r}e^{-4\kappa_{0}-4\kappa_{1}} is defined in Theorem 1. Denote by 𝜽^\widehat{\boldsymbol{\theta}} the (local) MLE in 𝐁(𝜽,r){\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right). We now evaluate the 2\ell_{2} and \ell_{\infty} distances between 𝜽^\widehat{\boldsymbol{\theta}} and the true value 𝜽\boldsymbol{\theta}^{*}.

Based on Theorem 5 we obtain a local deviation bound for l(𝜽)l(\boldsymbol{\theta}) as in Corollary 4 in Section 4, from which we establish the following upper bound for the estimation error of 𝜽^\widehat{\boldsymbol{\theta}} under the 2\ell_{2} norm:

Theorem 2.

Let condition (A1) hold, assume $\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}^{*}}$, and let $\kappa_{0}:=\|\boldsymbol{\beta}_{0}^{*}\|_{\infty}$ and $\kappa_{1}:=\|\boldsymbol{\beta}_{1}^{*}\|_{\infty}$ with $\kappa_{0}+\kappa_{1}\leq c\log(np)$ for some small enough constant $c>0$. Then as $np\rightarrow\infty$ with $n\geq 2$, it holds with probability converging to one that

1p𝜽^𝜽2e4κ0+4κ1log(np)np(1+log(np)p).\frac{1}{\sqrt{p}}\left\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{2}\lesssim e^{4\kappa_{0}+4\kappa_{1}}\sqrt{\frac{\log(np)}{np}}\left(1+\frac{\log(np)}{\sqrt{p}}\right).

We discuss the implication of this theorem. When nn\to\infty and pp is finite, that is, when we have a fixed number of nodes but a growing number of network snapshots, Theorem 2 indicates that 𝜽^𝜽2=Op(log3nne4κ0+4κ1)=op(1)\left\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{2}=O_{p}\left(\sqrt{\frac{\log^{3}n}{n}}e^{4\kappa_{0}+4\kappa_{1}}\right)=o_{p}(1) when cc is small enough. On the other hand, when nn, κ0\kappa_{0} and κ1\kappa_{1} are finite, Theorem 2 indicates that as the number of parameters pp increases, the 2\ell_{2} error bound of 𝜽^\widehat{\boldsymbol{\theta}} increases at a much slower rate O(logp)O\left(\sqrt{\log p}\right).

Although Theorem 2 indicates that $\frac{1}{\sqrt{p}}\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\|_{2}=o_{p}(1)$ as $np\rightarrow\infty$, it does not guarantee the uniform convergence of all the elements of $\widehat{\boldsymbol{\theta}}$. To prove uniform convergence in the $\ell_{\infty}$ norm, we exploit a special structure of the loss function together with the $\ell_{2}$ bound of Theorem 2. Specifically, write $l(\boldsymbol{\theta})$ in (3.6) as $l(\boldsymbol{\theta})=l(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)})$, where $\boldsymbol{\theta}_{(i)}:=(\beta_{i,0},\beta_{i,1})^{\top}$ and $\boldsymbol{\theta}_{(-i)}$ contains the remaining elements of $\boldsymbol{\theta}$. Define $\boldsymbol{\theta}^{*}_{(i)}$ and $\boldsymbol{\theta}^{*}_{(-i)}$ analogously for the true parameter $\boldsymbol{\theta}^{*}$, and $\widehat{\boldsymbol{\theta}}_{(i)}$ and $\widehat{\boldsymbol{\theta}}_{(-i)}$ for the local MLE $\widehat{\boldsymbol{\theta}}$. Then $\boldsymbol{\theta}_{(i)}^{*}$ is the minimizer of ${\rm E}l(\cdot,\boldsymbol{\theta}_{(-i)}^{*})$, while $\widehat{\boldsymbol{\theta}}_{(i)}$ is the minimizer of $l(\cdot,\widehat{\boldsymbol{\theta}}_{(-i)})$. The error of $\widehat{\boldsymbol{\theta}}_{(i)}$ in estimating $\boldsymbol{\theta}_{(i)}^{*}$ therefore hinges on the distance between ${\rm E}l(\cdot,\boldsymbol{\theta}_{(-i)}^{*})$ and $l(\cdot,\widehat{\boldsymbol{\theta}}_{(-i)})$, which in turn depends on both the $\ell_{2}$ bound of $\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\|_{2}$ and the uniform local deviation bound of $l(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)})$.
Based on Theorem 2, Corollary 4 in Section 4, and a sequential approach (see equations (A.30) and (A.31) in the appendix), we obtain the following bound for the estimation error under the $\ell_{\infty}$ norm.

Theorem 3.

Let condition (A1) hold, assume $\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}^{*}}$, and let $\kappa_{0}:=\|\boldsymbol{\beta}_{0}^{*}\|_{\infty}$ and $\kappa_{1}:=\|\boldsymbol{\beta}_{1}^{*}\|_{\infty}$ with $\kappa_{0}+\kappa_{1}\leq c\log(np)$ for some small enough constant $c>0$. Then as $np\rightarrow\infty$ with $n\geq 2$, it holds with probability converging to one that

𝜽^𝜽e8κ0+8κ1loglog(np)log(np)np(1+log(np)p).\left\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{\infty}\lesssim e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\sqrt{\frac{\log(np)}{np}}\left(1+\frac{\log(np)}{\sqrt{p}}\right).

Theorem 3 indicates that $\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\|_{\infty}=o_{p}(1)$ as $np\rightarrow\infty$; thus all the components of $\widehat{\boldsymbol{\theta}}$ converge uniformly. On the other hand, when $\kappa_{0}+\kappa_{1}=c\log(np)$ for some small enough positive constant $c$, we have $e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\sqrt{\frac{\log(np)}{np}}\left(1+\frac{\log(np)}{\sqrt{p}}\right)=o(c_{r}e^{-4\kappa_{0}-4\kappa_{1}})$. Compared with Theorem 1, we observe that although the radius $r$ in Theorem 1 already tends to zero when $\|\boldsymbol{\beta}_{0}^{*}\|_{\infty}\leq\kappa_{0}$, $\|\boldsymbol{\beta}_{1}^{*}\|_{\infty}\leq\kappa_{1}$ and $\kappa_{0}+\kappa_{1}\leq c\log(np)$ for some small enough constant $c>0$, the $\ell_{\infty}$ error bound of $\widehat{\boldsymbol{\theta}}$ is asymptotically of smaller order and thus gives a tighter convergence rate.

We remark that in the MLE, $\boldsymbol{\beta}_{0}^{*}$ and $\boldsymbol{\beta}_{1}^{*}$ are estimated jointly. As can be seen from the log-likelihood function, the information on $\beta_{i,0}$ is captured by $X_{i,j}^{t}$ and $X_{i,j}^{t}X_{i,j}^{t-1}$, $t=1,\ldots,n$, $j\neq i$, while that on $\beta_{i,1}$ is captured by $(1-X_{i,j}^{t})(1-X_{i,j}^{t-1})$ and $X_{i,j}^{t}X_{i,j}^{t-1}$, $t=1,\ldots,n$, $j\neq i$. This indicates that the effective "sample sizes" for estimating $\beta_{i,0}$ and $\beta_{i,1}$ are both of order $O(np)$. Although the theorems established in this section are for $\widehat{\boldsymbol{\theta}}=({\widehat{\boldsymbol{\beta}}_{0}}^{\top},{\widehat{\boldsymbol{\beta}}_{1}}^{\top})^{\top}$ jointly, we would expect $\widehat{\boldsymbol{\beta}}_{0}$ and $\widehat{\boldsymbol{\beta}}_{1}$ to have the same rate of convergence.

3.4 A method of moments estimator

Having established the existence of a unique local MLE in ${\mathbf{B}}_{\infty}(\boldsymbol{\theta}^{*},r)$ and proved its convergence, we still need to specify how to find this local MLE. To this end, we propose an initial estimator lying in this neighborhood; thanks to the convexity of the loss function there, any convex optimization method, such as coordinate descent, can then be used to locate the local MLE. Based on (2.3), an initial estimator of $\boldsymbol{\beta}_{0}$, denoted $\tilde{\boldsymbol{\beta}}_{0}$, can be found by solving the following method of moments equations

(3.8) t=1nj=1,jipXi,jtnj=1,jipeβi,0+βj,01+eβi,0+βj,0=0,i=1,,p.\frac{\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}X_{i,j}^{t}}{n}-\sum_{j=1,\>j\neq i}^{p}\frac{e^{\beta_{i,0}+\beta_{j,0}}}{1+e^{\beta_{i,0}+\beta_{j,0}}}=0,\quad i=1,\cdots,p.

These equations are the score functions of the pseudo loss function $f(\boldsymbol{\beta}_{0}):=\sum_{1\leq i<j\leq p}\log\{1+e^{\beta_{i,0}+\beta_{j,0}}\}-n^{-1}\sum_{i=1}^{p}\{\beta_{i,0}\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}X_{i,j}^{t}\}$. Since the Hessian matrix of $f(\boldsymbol{\beta}_{0})$ is diagonally balanced with positive elements, it is positive definite, and hence $f(\boldsymbol{\beta}_{0})$ is strongly convex. By the strong convexity, the solution of (3.8) is the unique minimizer of $f(\cdot)$, which can easily be obtained by any standard algorithm such as gradient descent. On the other hand, note that

E(Xi,jtXi,jt1)=eβi,0+βj,01+eβi,0+βj,0(111+eβi,0+βj,0+eβi,1+βj,1),{\rm E}(X_{i,j}^{t}X_{i,j}^{t-1})=\frac{e^{{\beta}_{i,0}+{\beta}_{j,0}}}{1+e^{{\beta}_{i,0}+{\beta}_{j,0}}}\left(1-\frac{1}{1+e^{{\beta}_{i,0}+{\beta}_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}}}\right),

which motivates the use of the following estimating equations to obtain 𝜷~1\tilde{\boldsymbol{\beta}}_{1}, the initial estimator of 𝜷1\boldsymbol{\beta}_{1},

(3.9) t=1nj=1,jip{Xi,jtXi,jt1eβ~i,0+β~j,01+eβ~i,0+β~j,0(111+eβ~i,0+β~j,0+eβi,1+βj,1)}=0,\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}\left\{X_{i,j}^{t}X_{i,j}^{t-1}-\frac{e^{\tilde{\beta}_{i,0}+\tilde{\beta}_{j,0}}}{1+e^{\tilde{\beta}_{i,0}+\tilde{\beta}_{j,0}}}\left(1-\frac{1}{1+e^{\tilde{\beta}_{i,0}+\tilde{\beta}_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}}}\right)\right\}=0,

with $i=1,\cdots,p$. Similar to (3.8), we can formulate a pseudo loss function such that, given $\tilde{\boldsymbol{\beta}}_{0}$, its Hessian matrix corresponding to the score equations (3.9) is positive definite, and hence (3.9) can also be solved by standard gradient descent. Since $\tilde{\boldsymbol{\theta}}=(\tilde{\boldsymbol{\beta}}_{0}^{\top},\tilde{\boldsymbol{\beta}}_{1}^{\top})^{\top}$ is obtained by solving two sets of moment equations, we call it the method of moments estimator (MME). An interesting aspect of our construction is that the moment equations for estimating $\boldsymbol{\beta}_{0}$ and $\boldsymbol{\beta}_{1}$ are decoupled. Although the error in estimating $\boldsymbol{\beta}_{0}$ clearly propagates into that of estimating $\boldsymbol{\beta}_{1}$, we have the following existence, uniqueness, and uniform upper bound for the estimation error of $\tilde{\boldsymbol{\theta}}$. Our results build on a novel application of the classical interior mapping theorem [6, 38, 37].
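To make the first set of moment equations concrete, the following minimal sketch solves the population version of (3.8) by gradient descent on the pseudo loss $f(\boldsymbol{\beta}_{0})$. The step size and iteration count are illustrative assumptions, not tuned recommendations, and the function names are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mme_beta0(avg_degree, step=0.01, n_iter=5000):
    """Solve the moment equations (3.8) by gradient descent on the strongly
    convex pseudo loss f(beta_0): for each i,
        sum_{j != i} sigmoid(beta_i + beta_j) = avg_degree[i],
    where avg_degree[i] = n^{-1} sum_t sum_{j != i} X_{i,j}^t."""
    p = len(avg_degree)
    beta = np.zeros(p)
    for _ in range(n_iter):
        S = sigmoid(beta[:, None] + beta[None, :])
        np.fill_diagonal(S, 0.0)
        grad = S.sum(axis=1) - avg_degree   # score functions of f
        beta -= step * grad
    return beta
```

Feeding in the exact expected degrees of a small parameter vector recovers that vector, reflecting the strong convexity of $f$ and the uniqueness of the solution of (3.8).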

Theorem 4.

Let condition (A1) hold, and $\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}^{*}}$. The MME $\tilde{\boldsymbol{\theta}}$ defined by equations (3.8) and (3.9) exists and is unique with probability tending to one. Further, let $\kappa_{0}:=\|\boldsymbol{\beta}_{0}^{*}\|_{\infty}$ and $\kappa_{1}:=\|\boldsymbol{\beta}_{1}^{*}\|_{\infty}$ with $\kappa_{0}+\kappa_{1}\leq c\log(np)$ for some small enough constant $c>0$. Then as $np\to\infty$ and $n\geq 2$, it holds that

𝜽~𝜽Op(e14κ0+6κ1log(n)log(p)np).\left\|\tilde{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{\infty}\leq O_{p}\left(e^{14\kappa_{0}+6\kappa_{1}}\sqrt{\frac{\log(n)\log(p)}{np}}\right).

When npnp\to\infty and κ0,κ1\kappa_{0},\kappa_{1} are finite, Theorem 4 gives 𝜽~𝜽=Op(log(n)log(p)np)\left\|\tilde{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{\infty}=O_{p}\left(\sqrt{\frac{\log(n)\log(p)}{np}}\right). When κ0+κ1log(np)\kappa_{0}+\kappa_{1}\asymp\log(np), we see that the upper bound for the local MLE in Theorem 3 is dominated by the upper bound of the MME in Theorem 4. Moreover, when κ0+κ1clog(np)\kappa_{0}+\kappa_{1}\leq c\log(np) for some small enough constant c>0c>0, we have 𝜽~𝐁(𝜽,r)\tilde{\boldsymbol{\theta}}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right), where rr is defined in Theorem 1. Thus, 𝜽~\tilde{\boldsymbol{\theta}} is in the small neighborhood of 𝜽\boldsymbol{\theta}^{*} as required.

3.5 The sparse case

In the previous results, the estimation error bounds depend on $\kappa_{0}$ and $\kappa_{1}$, i.e., the upper bounds on $\|\boldsymbol{\beta}_{0}^{*}\|_{\infty}$ and $\|\boldsymbol{\beta}_{1}^{*}\|_{\infty}$. Clearly, the larger $\kappa_{0}$ is, the sparser the networks can be, and the larger $\kappa_{1}$ is, the closer the lag-one correlations (c.f. equation (2.4)) can be to one, indicating fewer fluctuations in the network process. To further characterize the effect of network sparsity, in this section we derive further properties under a relatively sparse scenario where $-\kappa_{0}\leq\beta_{i,0}^{*}\leq C_{\kappa}$ and $-\kappa_{1}\leq\beta_{i,1}^{*}\leq\kappa_{1}$ for all $i=1,\ldots,p$, where $C_{\kappa}>0$ is a constant. In this case, there exist constants $C>0$ and $C_{1}>0$ such that $Ce^{-2\kappa_{0}}\leq{\rm E}(X_{i,j}^{t})\leq C_{1}<1$. In the sparsest case where $\beta_{i,0}^{*}=-\kappa_{0}$, $i=1,\ldots,p$, the density of the stationary network is of order $O(e^{-2\kappa_{0}})$. In parallel to Lemma 2 and Theorem 1, the following corollary provides a lower bound for the smallest eigenvalue of ${\rm E}({\mathbf{V}}(\boldsymbol{\theta}))$ and establishes the existence of the MLE.

Corollary 1.

Let {𝐗t}P𝛉\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}^{*}}, 𝐁(𝛉,r)={𝛉:𝛉𝛉r}{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right)=\left\{\boldsymbol{\theta}:\|\boldsymbol{\theta}-\boldsymbol{\theta}^{*}\|_{\infty}\leq r\right\} for some r=cre2κ04κ1r=c_{r}e^{-2\kappa_{0}-4\kappa_{1}} where cr>0c_{r}>0 is a small enough constant. Denote 𝐁(κ0,κ1):={(𝛃0,𝛃1):κ0𝛃i,0Cκ,i=1,,p,{\mathbf{B}}^{\prime}\left(\kappa_{0},\kappa_{1}\right):=\Big{\{}\left(\boldsymbol{\beta}_{0},\boldsymbol{\beta}_{1}\right):-\kappa_{0}\leq\boldsymbol{\beta}_{i,0}\leq C_{\kappa},i=1,\ldots,p, 𝛃1κ1}\|\boldsymbol{\beta}_{1}\|_{\infty}\leq\kappa_{1}\Big{\}} for some constant Cκ>0C_{\kappa}>0. Then, under condition (A1), there exists a constant C>0C>0 such that

inf𝜽𝐁(𝜽,r)𝐁(κ0,κ1);𝐚2=1𝐚E(𝐕(𝜽))𝐚Ce2κ04κ1.\inf_{\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right)\cap{\mathbf{B}}^{\prime}\left(\kappa_{0},\kappa_{1}\right);\|{\mathbf{a}}\|_{2}=1}{\mathbf{a}}^{\top}{\rm E}\left({\mathbf{V}}(\boldsymbol{\theta})\right){\mathbf{a}}\geq Ce^{-2\kappa_{0}-4\kappa_{1}}.

Further, assume that 𝛉𝐁(κ0,κ1)\boldsymbol{\theta}^{*}\in{\mathbf{B}}^{\prime}\left(\kappa_{0},\kappa_{1}\right) and κ0+2κ1<clog(np)\kappa_{0}+2\kappa_{1}<c\log(np) for some positive constant c<1/6c<1/6. Then, as npnp\rightarrow\infty with n2n\geq 2, with probability tending to 1, there exists a unique MLE in 𝐁(𝛉,r){\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right).

Following Theorems 2-4, we also establish the estimation errors of the MLE and the MME in the corollaries below.

Corollary 2.

Let condition (A1) hold. Assume {𝐗t}P𝛉\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}^{*}}, 𝛃1κ1\|\boldsymbol{\beta}_{1}^{*}\|_{\infty}\leq\kappa_{1}, and κ0βi,0Cκ-\kappa_{0}\leq\beta_{i,0}^{*}\leq C_{\kappa} for i=1,,pi=1,\ldots,p, and some constant Cκ>0C_{\kappa}>0. Then as npnp\rightarrow\infty with n2n\geq 2, it holds with probability tending to one that

1p𝜽^𝜽2Ce2κ0+4κ1log(np)np(1+log(np)p),\frac{1}{\sqrt{p}}\left\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{2}\leq Ce^{2\kappa_{0}+4\kappa_{1}}\sqrt{\frac{\log(np)}{np}}\left(1+\frac{\log(np)}{\sqrt{p}}\right),
and𝜽^𝜽Ce4κ0+8κ1loglog(np)log(np)np(1+log(np)p).{\rm and}~{}~{}~{}\left\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{\infty}\leq Ce^{4\kappa_{0}+8\kappa_{1}}\log\log(np)\sqrt{\frac{\log(np)}{np}}\left(1+\frac{\log(np)}{\sqrt{p}}\right).
Corollary 3.

Let condition (A1) hold. Assume {𝐗t}P𝛉\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}^{*}}, 𝛃1κ1\|\boldsymbol{\beta}_{1}^{*}\|_{\infty}\leq\kappa_{1}, and κ0βi,0Cκ-\kappa_{0}\leq\beta_{i,0}^{*}\leq C_{\kappa} for i=1,,pi=1,\ldots,p and some constant Cκ>0C_{\kappa}>0. Then as npnp\rightarrow\infty with n2n\geq 2, it holds with probability tending to one that the MME 𝛉~\tilde{\boldsymbol{\theta}} defined in equations (3.8) and (3.9) exists uniquely, and when κ0+2κ1<clog(np)\kappa_{0}+2\kappa_{1}<c\log(np) for some constant c<1/12c<1/12, it holds that

𝜽~𝜽Op(e4κ0+6κ1log(n)log(p)np).\left\|\tilde{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{\infty}\leq O_{p}\left(e^{4\kappa_{0}+6\kappa_{1}}\sqrt{\frac{\log(n)\log(p)}{np}}\right).

From Corollary 2 we see that when $\kappa_{1}=O(1)$, the MLE is consistent whenever $\kappa_{0}\leq c\log(np)$ for some positive constant $c<1/8$, in which case the density lower bound is of order $e^{-2c\log(np)}=(np)^{-2c}\succ(np)^{-1/4}$. Similarly, from Corollary 3, when $\kappa_{1}=O(1)$ the density of the networks can be as small as $(np)^{-2c}$ for some constant $c<1/12$, i.e., the density is of larger order than $(np)^{-1/6}$ for the estimation of the MME. Further, when $6\kappa_{0}+10\kappa_{1}\leq c_{1}\log(np)$ for some constant $c_{1}<1/2$, we have $\tilde{\boldsymbol{\theta}}\in{\mathbf{B}}_{\infty}(\boldsymbol{\theta}^{*},r)$, where $r$ is defined in Corollary 1. This validates using $\tilde{\boldsymbol{\theta}}$ as an initial estimator for computing the local MLE.

4 A uniform local deviation bound under high dimensionality

As we have discussed, a key step in establishing the consistency of the local MLE is to evaluate the magnitude of $\big|[l(\boldsymbol{\theta})-{\rm E}l(\boldsymbol{\theta})]-[l(\boldsymbol{\theta}^{*})-{\rm E}l(\boldsymbol{\theta}^{*})]\big|$ for all $\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}(\boldsymbol{\theta}^{*},r)$ with $r$ specified in Theorem 1. Such local deviation bounds are important for establishing error bounds for general M-estimators in empirical process theory [32]. Note that

(4.10) l(𝜽)El(𝜽)\displaystyle l(\boldsymbol{\theta})-{\rm E}l(\boldsymbol{\theta}) =1p1i<jp{(βi,0+βj,0)(ai,jE(ai,j)n)\displaystyle=-\frac{1}{p}\sum_{1\leq i<j\leq p}\Bigg{\{}\left(\beta_{i,0}+\beta_{j,0}\right)\Big{(}\frac{a_{i,j}-{\rm E}(a_{i,j})}{n}\Big{)}
+\displaystyle+ log(1+e(βi,1+βj,1))(di,jE(di,j)n)\displaystyle\log\left(1+e^{\left(\beta_{i,1}+\beta_{j,1}\right)}\right)\Big{(}\frac{d_{i,j}-{\rm E}(d_{i,j})}{n}\Big{)}
+\displaystyle+ log(1+e(βi,1βi,0)+(βj,1βj,0))(bi,jE(bi,j)n)}\displaystyle\log\left(1+e^{\left(\beta_{i,1}-\beta_{i,0}\right)+\left(\beta_{j,1}-\beta_{j,0}\right)}\right)\Big{(}\frac{b_{i,j}-{\rm E}(b_{i,j})}{n}\Big{)}\Bigg{\}}

where ai,j,bi,ja_{i,j},b_{i,j} and di,jd_{i,j} are defined in (3.7). The three terms on the right-hand side all admit the following form

(4.11) 𝐋(𝜽)=1p1ijpli,j(θi,θj)Yi,j,{\mathbf{L}}\left(\boldsymbol{\theta}\right)=\frac{1}{p}\sum_{1\leq i\neq j\leq p}l_{i,j}\left(\theta_{i},\theta_{j}\right)Y_{i,j},

for some functions $l_{i,j}:\mathbb{R}^{2}\to\mathbb{R}$ and centered random variables $Y_{i,j}$ $(1\leq i,j\leq p)$, so that ${\mathbf{L}}:\mathbb{R}^{p}\to\mathbb{R}$. Instead of establishing the uniform bound for each term in (4.10) separately, below we establish a unified result bounding $|{\mathbf{L}}(\boldsymbol{\theta})-{\mathbf{L}}(\boldsymbol{\theta}^{\prime})|$ over a local $\ell_{\infty}$ ball $\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}(\boldsymbol{\theta}^{\prime},\cdot)$ for a general ${\mathbf{L}}$ of the form (4.11). We remark that, in general, without further assumptions on ${\mathbf{L}}$, establishing uniform deviation bounds is impossible when the dimension of the problem diverges. For our TWHM, however, the decomposition (4.10) has a particularly appealing structure in that only two-way interactions between the parameters $\theta_{i}$ and $\theta_{j}$ appear. Based on this "sparsity" structure, we develop a novel reformulation (c.f. equation (A.15)) of the main components of the Taylor series of ${\mathbf{L}}(\boldsymbol{\theta})$, under the following two conditions.

  • (L-A1)

    There exists a constant α>0\alpha>0, such that for any 1ijp1\leq i\neq j\leq p, any positive integer kk, and any non-negative integer sks\leq k, we have:

    \left|\frac{\partial^{k}l_{i,j}\left(\theta_{i},\theta_{j}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right|\leq\frac{\left(k-1\right)!}{\alpha^{k}}.
  • (L-A2)

    Random variables Yi,j,1ijpY_{i,j},1\leq i\neq j\leq p are independent satisfying E(Yi,j)=0{\rm E}\left(Y_{i,j}\right)=0, |Yi,j|b(p)|Y_{i,j}|\leq b_{(p)} and Var(Yi,j)σ(p)2{\rm Var}\left(Y_{i,j}\right)\leq\sigma_{(p)}^{2} for any ii and jj, where b(p)b_{(p)} and σ(p)2\sigma_{(p)}^{2} are constants depending on nn and pp but independent of ii and jj.

Loosely speaking, condition (L-A1) is a smoothness assumption on the higher-order derivatives of $l_{i,j}(\theta_{i},\theta_{j})$, allowing us to bound these derivatives when a Taylor expansion is applied. The assumption is mild, as the upper bound may diverge rapidly as $k$ increases. For our TWHM, it can be verified that (L-A1) holds for $l_{i,j}(\theta_{i},\theta_{j})=\theta_{i}+\theta_{j}$ and $l_{i,j}(\theta_{i},\theta_{j})=\log(1+e^{\theta_{i}+\theta_{j}})$; see (3.6). For the latter, note that the first derivative of $l(x)=\log(1+e^{x})$ is the sigmoid function:

S(x)=ex1+ex=11+ex.S\left(x\right)=\frac{e^{x}}{1+e^{x}}=\frac{1}{1+e^{-x}}.

By the expression of the higher order derivatives of the Sigmoid function [26], the kk-th order derivative of ll is

kl(x)xk=m=0k2A(k1,m)(ex)m+1(1+ex)k,\frac{\partial^{k}l\left(x\right)}{\partial x^{k}}=\frac{\sum_{m=0}^{k-2}-A\left(k-1,m\right)\left(-e^{x}\right)^{m+1}}{\left(1+e^{x}\right)^{k}},

where k2k\geq 2 and A(k1,m)A\left(k-1,m\right) is the Eulerian number. Now for any xx, we have

|m=0k2A(k1,m)(ex)m+1(1+ex)k|m=0k2A(k1,m)=(k1)!.\left|\frac{\sum_{m=0}^{k-2}-A\left(k-1,m\right)\left(-e^{x}\right)^{m+1}}{\left(1+e^{x}\right)^{k}}\right|\leq\sum_{m=0}^{k-2}A\left(k-1,m\right)=\left(k-1\right)!.

Therefore,

|kl(x)xk|(k1)!\left|\frac{\partial^{k}l\left(x\right)}{\partial x^{k}}\right|\leq{\left(k-1\right)!}

holds for all $x\in\mathbb{R}$ and $k\geq 2$. With extra arguments using the chain rule, this in turn implies that (L-A1) is satisfied with $\alpha=1$ when $l_{i,j}(\boldsymbol{\theta})=\log(1+e^{\theta_{i}+\theta_{j}})$.
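The factorial bound on the derivatives of $l(x)=\log(1+e^{x})$ can be checked numerically. The sketch below implements the standard Eulerian-number recurrence and the quoted derivative formula; the function names are ours.

```python
from math import exp, factorial

def eulerian(n, m):
    """Eulerian number A(n, m) via the standard recurrence
    A(n, m) = (m+1) A(n-1, m) + (n-m) A(n-1, m-1)."""
    if m < 0 or m > max(n - 1, 0):
        return 0
    if n == 0:
        return 1 if m == 0 else 0
    return (m + 1) * eulerian(n - 1, m) + (n - m) * eulerian(n - 1, m - 1)

def dlk(x, k):
    """k-th derivative (k >= 2) of l(x) = log(1 + e^x), using the
    Eulerian-number formula quoted above."""
    num = sum(-eulerian(k - 1, m) * (-exp(x)) ** (m + 1) for m in range(k - 1))
    return num / (1 + exp(x)) ** k
```

Since the Eulerian numbers $A(k-1,m)$ sum to $(k-1)!$ over $m$, the bound $|\partial^{k}l/\partial x^{k}|\leq(k-1)!$ holds for every $x$, which is exactly what the display above asserts.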

Condition (L-A2) is a regularity assumption on the random variables $Y_{i,j}$, $1\leq i\neq j\leq p$; the bounds on their moments are imposed to ensure point-wise concentration. For our TWHM, from Lemma 1 and Lemma 5, there exist large enough constants $C>0$ and $c>0$ such that, with probability greater than $1-(np)^{-c}$, the random variables $\frac{a_{i,j}-{\rm E}(a_{i,j})}{n}$, $\frac{b_{i,j}-{\rm E}(b_{i,j})}{n}$ and $\frac{d_{i,j}-{\rm E}(d_{i,j})}{n}$ all satisfy condition (L-A2) with $b_{(p)}=C\sqrt{n^{-1}\log(np)}+Cn^{-1}\log(n)\log\log(n)\log(np)$ and $\sigma_{(p)}^{2}=Cn^{-1}$.

We present the uniform upper bound on the deviation of 𝐋(𝜽){\mathbf{L}}(\boldsymbol{\theta}) below.

Theorem 5.

Assume conditions (L-A1) and (L-A2). For any given 𝛉p\boldsymbol{\theta}^{\prime}\in\mathbb{R}^{p} and α0(0,α/2)\alpha_{0}\in(0,\alpha/2), there exist large enough constants C>0C>0 and c>0c>0 which are independent of 𝛉\boldsymbol{\theta}^{\prime}, such that, as npnp\to\infty, with probability greater than 1(np)c1-(np)^{-c},

|𝐋(𝜽)𝐋(𝜽)|Cb(p)log(np)+σ(p)plog(np)p𝜽𝜽1\left|{\mathbf{L}}\left(\boldsymbol{\theta}\right)-{\mathbf{L}}\left(\boldsymbol{\theta}^{\prime}\right)\right|\leq C\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}\left\|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\right\|_{1}

holds uniformly for all 𝛉𝐁(𝛉,α0)\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{\prime},\alpha_{0}\right).

One of the main difficulties in analyzing ${\mathbf{L}}(\boldsymbol{\theta})$ defined in (4.11) is that $l_{i,j}(\theta_{i},\theta_{j})$ and $Y_{i,j}$ are coupled, giving rise to complex terms involving both in the Taylor expansion of ${\mathbf{L}}(\boldsymbol{\theta})$. When a Taylor expansion of order $K$ is used, condition (L-A1) reduces the number of higher-order terms from $O(p^{K})$ to $O(p^{2}2^{K})$. On the other hand, by formulating the main terms of the Taylor series in the matrix form (A.15), the uniform convergence of the sum of these terms reduces to the convergence of the spectral norm of a centered random matrix, which is independent of the parameters. Further details can be found in the proof of Theorem 5.

Define the marginal functions of 𝐋(𝜽){\mathbf{L}}\left(\boldsymbol{\theta}\right) as

𝐋i(𝜽)=1pj=1,jipli,j(θi,θj)Yi,j,i=1,,p,{\mathbf{L}}_{i}\left(\boldsymbol{\theta}\right)=\frac{1}{p}\sum_{j=1,\>j\neq i}^{p}l_{i,j}\left(\theta_{i},\theta_{j}\right)Y_{i,j},\quad i=1,\ldots,p,

by retaining only those terms related to θi\theta_{i}. Similar to Theorem 5, we state the following upper bound for these marginal functions. With some abuse of notation, let 𝜽i:=(θ1,,θi1,θi+1,,θp)\boldsymbol{\theta}_{-i}:=\left(\theta_{1},\cdots,\theta_{i-1},\theta_{i+1},\cdots,\theta_{p}\right)^{\top} be the vector containing all the elements in 𝜽\boldsymbol{\theta} except θi\theta_{i}.

Theorem 6.

If conditions (L-A1) and (L-A2) hold, then for any given 𝛉p\boldsymbol{\theta}^{\prime}\in\mathbb{R}^{p} and α0(0,α/2)\alpha_{0}\in(0,\alpha/2), there exist large enough constants C>0C>0 and c>0c>0 which are independent of 𝛉\boldsymbol{\theta}^{\prime}, such that, as npnp\to\infty, with probability greater than 1(np)c1-(np)^{-c},

|𝐋i(𝜽)𝐋i(𝜽)|Cb(p)p𝜽i𝜽i1+C(𝜽i𝜽i1+1)|θiθi|b(p)log(np)+σ(p)plog(np)p\left|{\mathbf{L}}_{i}\left(\boldsymbol{\theta}\right)-{\mathbf{L}}_{i}\left(\boldsymbol{\theta}^{\prime}\right)\right|\\ \leq C\frac{b_{(p)}}{p}\left\|\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}_{-i}^{\prime}\right\|_{1}+C\left(\left\|\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}_{-i}^{\prime}\right\|_{1}+1\right)\left|\theta_{i}-\theta^{\prime}_{i}\right|\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}

holds uniformly for all 𝛉𝐁(𝛉,α0)\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{\prime},\alpha_{0}\right), and i=1,,pi=1,\cdots,p.

Similar to (4.10), we can also decompose l(𝜽(i),𝜽(i))El(𝜽(i),𝜽(i))l\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}\right)-{\rm E}l\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}\right) into the sum of three components taking the form (4.11). Consequently, by setting 𝜽\boldsymbol{\theta}^{\prime} in Theorems 5 and 6 to be the true parameter 𝜽\boldsymbol{\theta}^{*}, we can obtain the following upper bounds.

Corollary 4.

For any given 0<\alpha_{0}<1/4, there exist large enough positive constants c_{1},c_{2},C_{1}, and C_{2} such that

  • (i)

    with probability greater than 1(np)c11-(np)^{-c_{1}},

    (4.12) |(l(𝜽)l(𝜽))(El(𝜽)El(𝜽))|C1(1+log(np)p)log(np)n𝜽𝜽2\left|\big{(}l(\boldsymbol{\theta})-l(\boldsymbol{\theta}^{*})\big{)}-\big{(}{\rm E}l(\boldsymbol{\theta})-{\rm E}l(\boldsymbol{\theta}^{*})\big{)}\right|\leq C_{1}\left(1+\frac{\log(np)}{\sqrt{p}}\right)\sqrt{\frac{\log(np)}{n}}\left\|\boldsymbol{\theta}-\boldsymbol{\theta}^{*}\right\|_{2}

    holds uniformly for all \boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},\alpha_{0}\right);

  • (ii)

    with probability greater than 1(np)c21-(np)^{-c_{2}},

    (4.13) |l(𝜽(i),𝜽(i))l(𝜽(i),𝜽(i))[El(𝜽(i),𝜽(i))El(𝜽(i),𝜽(i))]|C2(1+log(np)p)log(np)n𝜽(i)𝜽(i)2\left|l\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)-l\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-\left[{\rm E}l\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)-{\rm E}l\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)\right]\right|\\ \leq C_{2}\left(1+\frac{\log(np)}{\sqrt{p}}\right)\sqrt{\frac{\log(np)}{n}}\left\|\boldsymbol{\theta}_{(i)}-\boldsymbol{\theta}^{*}_{(i)}\right\|_{2}

    holds uniformly for all \boldsymbol{\theta}_{(i)}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*}_{(i)},\alpha_{0}\right).

In (4.12) and (4.13) we have replaced the \ell_{1}-norm-based upper bounds in Theorems 5 and 6 with \ell_{2}-norm-based bounds, using the fact that \|{\mathbf{x}}\|_{1}\leq\sqrt{p}\|{\mathbf{x}}\|_{2} for all {\mathbf{x}}\in\mathbb{R}^{p}. Networks are known to exhibit diverse characteristics, including dynamic changes, node heterogeneity, homophily, and transitivity, among others. In this paper, our primary emphasis is on node heterogeneity within dynamic networks. When integrated with other stylized features, the objective function may retain a structure similar to that of the {\mathbf{L}}(\boldsymbol{\theta}) function defined in (4.11). Moreover, many other models that incorporate node heterogeneity have log-likelihood functions of a form analogous to (4.11). One instance is the general class of network models whose edge formation probabilities take the form f(\alpha_{i},\beta_{j}), where f(\cdot) is a density or probability mass function and (\alpha_{i},\beta_{i}) are the node-specific parameters of node i; this class includes the p_{1} model [13], the directed \beta-model [36], and the bivariate gamma model [8]. Additionally, in the analysis of ranking data, it is common to introduce individual-specific parameters or scores, as in the classical Bradley-Terry model and its variants [9]. Our results are potentially applicable to the theoretical analysis of these models and their modifications when additional stylized features are considered alongside node heterogeneity.

5 Numerical study

In this section, we assess the performance of the local MLE. For comparison, we have also computed a regularized MME that is numerically more stable than the vanilla MME in (3.9). Specifically, for the regularized MME, we solve

(5.14) 1npt=1nj=1,jip{Xi,jtXi,jt1eβ~i,0+β~j,01+eβ~i,0+β~j,0(111+eβ~i,0+β~j,0+eβi,1+βj,1)}+λβi,1=0,\frac{-1}{np}\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}\left\{X_{i,j}^{t}X_{i,j}^{t-1}-\frac{e^{\tilde{\beta}_{i,0}+\tilde{\beta}_{j,0}}}{1+e^{\tilde{\beta}_{i,0}+\tilde{\beta}_{j,0}}}\left(1-\frac{1}{1+e^{\tilde{\beta}_{i,0}+\tilde{\beta}_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}}}\right)\right\}+\lambda\beta_{i,1}=0,

for i=1,\ldots,p, where \lambda\beta_{i,1} can be seen as a ridge penalty with regularization parameter \lambda>0. Denote the regularized MME by \tilde{\boldsymbol{\theta}}_{\lambda}. Similar to Theorem 4, by choosing \lambda=C_{\lambda}e^{2\kappa}\sqrt{\frac{\log(np)}{np}} for some constant C_{\lambda}, we can show that \left\|\tilde{\boldsymbol{\theta}}_{\lambda}-\boldsymbol{\theta}^{*}\right\|_{\infty}=O_{p}\left(e^{26\kappa}\sqrt{\frac{\log(n)\log(p)}{np}}\right). In our implementation we take \lambda=\sqrt{\frac{\log(np)}{np}}. The MLE of TWHM is obtained via gradient descent using \tilde{\boldsymbol{\theta}}_{\lambda} as the initial value.
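Equation (5.14) is a coupled system of p monotone equations in \beta_{1}. One possible way to solve it, shown purely as a sketch under our own choices (the empirical-moment input S, a damped fixed-point update, and the step size are all assumptions, not the paper's implementation), is:

```python
import numpy as np

def solve_regularized_mme(S, beta0, lam, step=1.0, n_iter=5000):
    """Damped fixed-point sketch for the ridge-regularized moment
    equations (5.14).  All names and inputs are illustrative:
      S[i, j] : empirical moment (1/n) * sum_t X_{ij}^t * X_{ij}^{t-1}
      beta0   : plug-in estimate of beta_0 (beta_0-tilde in the text)
      lam     : ridge parameter lambda > 0
    Returns an approximate root beta1 of the p coupled equations."""
    p = S.shape[0]
    beta1 = np.zeros(p)
    E0 = np.exp(beta0[:, None] + beta0[None, :])
    Sc = S - np.diag(np.diag(S))              # the diagonal is unused
    for _ in range(n_iter):
        E1 = np.exp(beta1[:, None] + beta1[None, :])
        model = (E0 / (1.0 + E0)) * (1.0 - 1.0 / (1.0 + E0 + E1))
        np.fill_diagonal(model, 0.0)
        # residual of (5.14): -(1/p) sum_j (S_ij - model_ij) + lam * beta1_i
        resid = -(Sc - model).sum(axis=1) / p + lam * beta1
        beta1 = beta1 - step * resid          # residual is increasing in beta1
    return beta1
```

Since the left-hand side of (5.14) is increasing in \beta_{i,1} and its Jacobian is symmetric, the residual field is the gradient of a convex potential, so the damped iteration converges for a modest step size.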

5.1 Non-convexity of l(𝜽)l(\boldsymbol{\theta}) and El(𝜽){\rm E}l(\boldsymbol{\theta})

Given the form of l(\boldsymbol{\theta}), intuitively it may not be convex everywhere. We confirm this via a simple example. Take (n,p)=(2,1000) and set each of \boldsymbol{\beta}^{*}_{0},\boldsymbol{\beta}^{*}_{1} to \textbf{0.2}_{p}, \textbf{0.5}_{p}, or \textbf{1}_{p}. We evaluate the smallest eigenvalue of the Hessian matrices of l(\boldsymbol{\theta}) and of its expectation {\rm E}l(\boldsymbol{\theta}), both at the true parameter value \boldsymbol{\theta}^{*}=(\boldsymbol{\beta}^{*\top}_{0},\boldsymbol{\beta}^{*\top}_{1})^{\top} and at \boldsymbol{\theta}=\textbf{0}_{2p}.

Table 1: Signs of the smallest eigenvalues of the Hessian matrices of l(𝜽)l(\boldsymbol{\theta}) and El(𝜽){\rm E}l(\boldsymbol{\theta}) evaluated at 𝜽=𝜽\boldsymbol{\theta}=\boldsymbol{\theta}^{*} or 02p\textbf{0}_{2p} when different values of 𝜽=(𝜷0,𝜷1)\boldsymbol{\theta}^{*}=(\boldsymbol{\beta}^{*\top}_{0},\boldsymbol{\beta}^{*\top}_{1})^{\top} are used to generate data.
Sign of the smallest eigenvalue of l(𝜽)l(\boldsymbol{\theta}^{*}) Sign of the smallest eigenvalue of El(𝜽){\rm E}l(\boldsymbol{\theta}^{*})
𝜷0\boldsymbol{\beta}^{*}_{0}= 0.2p\textbf{0.2}_{p} 𝜷0\boldsymbol{\beta}^{*}_{0}= 0.5p\textbf{0.5}_{p} 𝜷0\boldsymbol{\beta}^{*}_{0}= 1p\textbf{1}_{p} 𝜷0\boldsymbol{\beta}^{*}_{0}= 0.2p\textbf{0.2}_{p} 𝜷0\boldsymbol{\beta}^{*}_{0}= 0.5p\textbf{0.5}_{p} 𝜷0\boldsymbol{\beta}^{*}_{0}= 1p\textbf{1}_{p}
𝜷1\boldsymbol{\beta}^{*}_{1}= 0.2p\textbf{0.2}_{p} ++ ++ ++ 𝜷1\boldsymbol{\beta}^{*}_{1}= 0.2p\textbf{0.2}_{p} ++ ++ ++
𝜷1\boldsymbol{\beta}^{*}_{1}= 0.5p\textbf{0.5}_{p} ++ ++ ++ 𝜷1\boldsymbol{\beta}^{*}_{1}= 0.5p\textbf{0.5}_{p} ++ ++ ++
𝜷1\boldsymbol{\beta}^{*}_{1}= 1p\textbf{1}_{p} ++ ++ ++ 𝜷1\boldsymbol{\beta}^{*}_{1}= 1p\textbf{1}_{p} ++ ++ ++
Sign of the smallest eigenvalue of l(02p)l(\textbf{0}_{2p}) Sign of the smallest eigenvalue of El(02p){\rm E}l(\textbf{0}_{2p})
𝜷0\boldsymbol{\beta}^{*}_{0}= 0.2p\textbf{0.2}_{p} 𝜷0\boldsymbol{\beta}^{*}_{0}= 0.5p\textbf{0.5}_{p} 𝜷0\boldsymbol{\beta}^{*}_{0}= 1p\textbf{1}_{p} 𝜷0\boldsymbol{\beta}^{*}_{0}= 0.2p\textbf{0.2}_{p} 𝜷0\boldsymbol{\beta}^{*}_{0}= 0.5p\textbf{0.5}_{p} 𝜷0\boldsymbol{\beta}^{*}_{0}= 1p\textbf{1}_{p}
𝜷1\boldsymbol{\beta}^{*}_{1}= 0.2p\textbf{0.2}_{p} ++ ++ - 𝜷1\boldsymbol{\beta}^{*}_{1}= 0.2p\textbf{0.2}_{p} ++ ++ -
𝜷1\boldsymbol{\beta}^{*}_{1}= 0.5p\textbf{0.5}_{p} - - - 𝜷1\boldsymbol{\beta}^{*}_{1}= 0.5p\textbf{0.5}_{p} ++ ++ -
𝜷1\boldsymbol{\beta}^{*}_{1}= 1p\textbf{1}_{p} - - - 𝜷1\boldsymbol{\beta}^{*}_{1}= 1p\textbf{1}_{p} - - -

From the top half of Table 1 we can see that, when evaluated at \boldsymbol{\theta}^{*}, the Hessian matrices are all positive definite. However, when evaluated at \boldsymbol{\theta}=\textbf{0}_{2p}, the bottom half of the table shows that the Hessian matrices are no longer positive definite when \boldsymbol{\theta}^{*} is far away from \textbf{0}_{2p}. Even when the Hessian matrix of {\rm E}l(\boldsymbol{\theta}) is positive definite at \boldsymbol{\theta}=\textbf{0}_{2p}, as happens when \boldsymbol{\theta}^{*}=\textbf{0.5}_{2p}, the corresponding Hessian matrix of l(\boldsymbol{\theta}) at this point has a negative eigenvalue. Thus, neither {\rm E}l(\boldsymbol{\theta}) nor l(\boldsymbol{\theta}) is globally convex.
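This experiment is easy to reproduce numerically on a small scale. Below is a sketch assembling a candidate l(\boldsymbol{\theta}) from the per-pair transition log-likelihood terms (the same terms as in the likelihood display of Section 6, with uniform weights); the 1/(np) normalisation, the helper names, and the finite-difference Hessian are our own choices:

```python
import numpy as np

def neg_loglik(theta, X, X0):
    """Candidate l(theta): average negative log-likelihood of the TWHM
    transitions; the 1/(n*p) normalisation is an assumption made here
    for illustration.  theta stacks (beta_0, beta_1); X has shape
    (n, p, p); X0 is the initial network X^0."""
    n, p = X.shape[0], X.shape[1]
    b0, b1 = theta[:p], theta[p:]
    B0 = b0[:, None] + b0[None, :]
    B1 = b1[:, None] + b1[None, :]
    logD = np.log(1.0 + np.exp(B0) + np.exp(B1))
    iu = np.triu_indices(p, k=1)
    ll, prev = 0.0, X0
    for t in range(n):
        cur = X[t]
        term = (-logD + B0 * cur * (1 - prev)
                + (1 - cur) * (1 - prev) * np.log1p(np.exp(B1))
                + cur * prev * np.log(np.exp(B0) + np.exp(B1)))
        ll += term[iu].sum()
        prev = cur
    return -ll / (n * p)

def min_hessian_eig(f, theta, eps=1e-4):
    """Smallest eigenvalue of a finite-difference Hessian of f at theta."""
    d = theta.size
    H = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = eps
            ej = np.zeros(d); ej[j] = eps
            H[i, j] = (f(theta + ei + ej) - f(theta + ei)
                       - f(theta + ej) + f(theta)) / eps ** 2
    H = 0.5 * (H + H.T)  # symmetrise away numerical noise
    return float(np.linalg.eigvalsh(H).min())
```

A negative returned eigenvalue at a given \boldsymbol{\theta} certifies that the candidate l is not convex there, mirroring the sign checks reported in Table 1.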

5.2 Parameter estimation

We first evaluate the error rates of the MLE and MME under different combinations of n and p. We set n=2,5,10, or 20 and p\in\lceil 200\times 1.2^{0:6}\rceil=\{200,240,288,346,415,498,598\}, which results in a total of 28 different combinations of (n,p). For each (n,p), the data are generated such that \{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}^{*}}, where the parameters \beta_{i,0}^{*} and \beta_{i,1}^{*} (1\leq i\leq p) are drawn independently from the uniform distribution on (-1,1). Each experiment is repeated 100 times under each setting. Denote the estimator (either the MLE or the MME) by \widehat{\boldsymbol{\theta}} and the true parameter value by \boldsymbol{\theta}^{*}. We report the average \ell_{2} error \|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\|_{2}/\sqrt{p} and the average \ell_{\infty} error \|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\|_{\infty} in Figure 2.

Figure 2: Mean errors of MME and MLE in terms of the 2\ell_{2} and \ell_{\infty} norm.

From this figure, we can see that the errors in terms of the \ell_{\infty} norm and the 2\ell_{2} norm decrease for MME and MLE as nn or pp increases, while the errors of MLE are smaller across all settings. These observations are consistent with our findings in the main theory.
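For concreteness, the two reported error metrics can be computed as follows (recall that \boldsymbol{\theta} stacks \boldsymbol{\beta}_{0} and \boldsymbol{\beta}_{1} and hence has 2p entries; the helper name is ours):

```python
import numpy as np

def estimation_errors(theta_hat, theta_star):
    """The l2 error scaled by sqrt(p) and the l-infinity error, as in Figure 2."""
    p = theta_star.size // 2          # theta = (beta_0, beta_1) has 2p entries
    diff = theta_hat - theta_star
    return np.linalg.norm(diff) / np.sqrt(p), np.abs(diff).max()
```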

Next, we present further simulations to evaluate the performance of the MLE and MME by imposing different structures on \boldsymbol{\beta}_{0}^{*} and \boldsymbol{\beta}_{1}^{*}. In particular, we evaluate how the estimation accuracy changes as we vary the sparsity of the networks and the correlation of the network sequence. Note that the expected density of the stationary distribution of the network process is simply

1p(p1)E(1ijpXi,jt)=1p(p1)(1ijpeβi,0+βj,01+eβi,0+βj,0).\frac{1}{p(p-1)}{\rm E}\left(\sum_{1\leq i\neq j\leq p}X_{i,j}^{t}\right)=\frac{1}{p(p-1)}\left(\sum_{1\leq i\neq j\leq p}\frac{e^{\beta_{i,0}^{*}+\beta_{j,0}^{*}}}{1+e^{\beta_{i,0}^{*}+\beta_{j,0}^{*}}}\right).
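This density is handy for calibrating the settings below; for instance, the constant choice \beta_{i,0}^{*}\equiv-1.47 gives density 1/(1+e^{2.94})\approx 0.05, the sparse target used in Table 3. A minimal computational sketch (function name ours):

```python
import numpy as np

def expected_density(beta0):
    """Expected stationary edge density from the display above."""
    p = beta0.size
    B = beta0[:, None] + beta0[None, :]
    P = np.exp(B) / (1.0 + np.exp(B))
    np.fill_diagonal(P, 0.0)
    return P.sum() / (p * (p - 1))
```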

In the sequel, we will use two parameters aa and bb to generate 𝜷r\boldsymbol{\beta}_{r}^{*}, r=0,1r=0,1, according to the following four settings:

  • Setting 1.

    {a}\{a\}: all the elements in 𝜷r\boldsymbol{\beta}_{r}^{*} are set to be equal to aa.

  • Setting 2.

    {a,b}{\{a,b\}}: the first 10%\% elements of 𝜷r\boldsymbol{\beta}_{r}^{*} are set to be equal to aa, while the other elements are set to be equal to bb.

  • Setting 3.

    \mathcal{L}_{(a,b)}: the parameters vary linearly from a to b, namely \beta_{i,r}^{*}=a+(b-a)(i-1)/(p-1), i=1,\cdots,p.

  • Setting 4.

    U_{(a,b)}: the p elements of \boldsymbol{\beta}_{r}^{*} are generated independently from the uniform distribution on (a,b).
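The four settings can be generated as in the following sketch (the function name is ours, and the linear setting is read as interpolating from a to b):

```python
import numpy as np

def make_beta(setting, p, a, b=None, rng=None):
    """Generate beta_r^* under Settings 1-4 (illustrative helper)."""
    rng = np.random.default_rng() if rng is None else rng
    if setting == 1:      # {a}: all elements equal to a
        return np.full(p, float(a))
    if setting == 2:      # {a, b}: first 10% equal to a, the rest equal to b
        k = int(0.1 * p)
        return np.concatenate([np.full(k, float(a)), np.full(p - k, float(b))])
    if setting == 3:      # L_(a,b): linear interpolation from a to b
        return a + (b - a) * np.arange(p) / (p - 1)
    if setting == 4:      # U_(a,b): i.i.d. uniform on (a, b)
        return rng.uniform(a, b, size=p)
    raise ValueError("setting must be 1, 2, 3, or 4")
```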

In Table 2, we generate \boldsymbol{\beta}_{1}^{*} using Setting 1 with a=0, and generate \boldsymbol{\beta}_{0}^{*} using Setting 2 with different choices of a and b to obtain networks with different expected densities. In Table 3, we generate \boldsymbol{\beta}_{0}^{*} and \boldsymbol{\beta}_{1}^{*} using combinations of these four settings with parameters chosen such that the resulting networks have expected density around either 0.05 (sparse) or 0.5 (dense). The number of networks in each process and the number of nodes in each network are set as (n,p)=(20,200),(20,500),(50,200), or (50,500). The errors for estimating \boldsymbol{\theta}^{*} in the \ell_{\infty} and \ell_{2} norms are reported over 100 replications. To further compare the accuracy of estimating \boldsymbol{\beta}_{0}^{*} and \boldsymbol{\beta}_{1}^{*}, in Table 4 we conduct experiments under Settings 3 and 4 and report the estimation errors for \boldsymbol{\beta}_{0}^{*} and \boldsymbol{\beta}_{1}^{*} separately. We summarize the simulation results below:

Table 2: The estimation errors of MME and MLE under Setting 1 and Setting 2 for 𝜷0\boldsymbol{\beta}_{0}^{*} by setting 𝜷1=𝟎p\boldsymbol{\beta}^{*}_{1}=\boldsymbol{0}_{p}.
nn pp 𝜷0\boldsymbol{\beta}_{0}^{*} MME, 2\ell_{2} MME, \ell_{\infty} MLE, 2\ell_{2} MLE, \ell_{\infty}
20 200 {0} 0.074 0.219 0.071 0.212
50 200 {0} 0.046 0.138 0.045 0.136
20 500 {0} 0.046 0.150 0.045 0.146
50 500 {0} 0.029 0.093 0.028 0.092
20 200 {0.5,0.5}\{0.5,-0.5\} 0.092 0.222 0.091 0.217
50 200 {0.5,0.5}\{0.5,-0.5\} 0.058 0.140 0.058 0.139
20 500 {0.5,0.5}\{0.5,-0.5\} 0.058 0.154 0.057 0.148
50 500 {0.5,0.5}\{0.5,-0.5\} 0.036 0.095 0.036 0.093
20 200 {1,1}\{1,-1\} 0.120 0.305 0.117 0.284
50 200 {1,1}\{1,-1\} 0.074 0.186 0.074 0.177
20 500 {1,1}\{1,-1\} 0.075 0.200 0.073 0.190
50 500 {1,1}\{1,-1\} 0.038 0.125 0.036 0.119
20 200 {1.5,1.5}\{1.5,-1.5\} 0.164 0.436 0.156 0.397
50 200 {1.5,1.5}\{1.5,-1.5\} 0.102 0.255 0.097 0.236
20 500 {1.5,1.5}\{1.5,-1.5\} 0.103 0.287 0.097 0.262
50 500 {1.5,1.5}\{1.5,-1.5\} 0.065 0.178 0.061 0.164
Table 3: The average estimation errors of MME and MLE under combinations of different settings
Density = 0.05
nn pp 𝜷0\boldsymbol{\beta}_{0}^{*} 𝜷1\boldsymbol{\beta}_{1}^{*} MME, 2\ell_{2} MME, \ell_{\infty} MLE, 2\ell_{2} MLE, \ell_{\infty}
20 200 (4,0)\mathcal{L}_{(-4,0)} U(1,1)U_{(-1,1)} 0.419 1.833 0.392 1.800
50 200 (4,0)\mathcal{L}_{(-4,0)} U(1,1)U_{(-1,1)} 0.253 0.913 0.227 0.820
20 500 (4,0)\mathcal{L}_{(-4,0)} U(1,1)U_{(-1,1)} 0.246 1.119 0.218 0.900
50 500 (4,0)\mathcal{L}_{(-4,0)} U(1,1)U_{(-1,1)} 0.170 0.626 0.148 0.621
20 200 (4,0)\mathcal{L}_{(-4,0)} {0} 0.275 1.452 0.280 1.516
50 200 (4,0)\mathcal{L}_{(-4,0)} {0} 0.161 0.771 0.162 0.774
20 500 (4,0)\mathcal{L}_{(-4,0)} {0} 0.160 0.892 0.162 0.904
50 500 (4,0)\mathcal{L}_{(-4,0)} {0} 0.098 0.506 0.099 0.507
20 200 {1.47}\{-1.47\} U(1,1)U_{(-1,1)} 0.187 0.588 0.161 0.514
50 200 {1.47}\{-1.47\} U(1,1)U_{(-1,1)} 0.116 0.351 0.099 0.305
20 500 {1.47}\{-1.47\} U(1,1)U_{(-1,1)} 0.114 0.387 0.099 0.339
50 500 {1.47}\{-1.47\} U(1,1)U_{(-1,1)} 0.073 0.246 0.062 0.208
20 200 {1.47}\{-1.47\} {0} 0.150 0.482 0.151 0.484
50 200 {1.47}\{-1.47\} {0} 0.093 0.289 0.093 0.290
20 500 {1.47}\{-1.47\} {0} 0.093 0.309 0.093 0.311
50 500 {1.47}\{-1.47\} {0} 0.058 0.195 0.058 0.195
Density = 0.5
20 200 (2,2)\mathcal{L}_{(-2,2)} U(0.1,0.1)U_{(-0.1,0.1)} 0.132 0.415 0.012 0.318
50 200 (2,2)\mathcal{L}_{(-2,2)} U(0.1,0.1)U_{(-0.1,0.1)} 0.080 0.238 0.069 0.194
20 500 (2,2)\mathcal{L}_{(-2,2)} U(0.1,0.1)U_{(-0.1,0.1)} 0.080 0.272 0.068 0.217
50 500 (2,2)\mathcal{L}_{(-2,2)} U(0.1,0.1)U_{(-0.1,0.1)} 0.050 0.168 0.043 0.135
20 200 (1,1)\mathcal{L}_{(-1,1)} U(1,1)U_{(-1,1)} 0.107 0.324 0.095 0.264
50 200 (1,1)\mathcal{L}_{(-1,1)} U(1,1)U_{(-1,1)} 0.067 0.194 0.060 0.163
20 500 (1,1)\mathcal{L}_{(-1,1)} U(1,1)U_{(-1,1)} 0.071 0.267 0.061 0.205
50 500 (1,1)\mathcal{L}_{(-1,1)} U(1,1)U_{(-1,1)} 0.044 0.156 0.039 0.130
20 200 (2,2)\mathcal{L}_{(-2,2)} U(1,1)U_{(-1,1)} 0.137 0.478 0.112 0.329
50 200 (2,2)\mathcal{L}_{(-2,2)} U(1,1)U_{(-1,1)} 0.084 0.274 0.070 0.205
20 500 (2,2)\mathcal{L}_{(-2,2)} U(1,1)U_{(-1,1)} 0.087 0.352 0.071 0.250
50 500 (2,2)\mathcal{L}_{(-2,2)} U(1,1)U_{(-1,1)} 0.054 0.211 0.044 0.150
Table 4: The means and standard deviations of the errors of MME and MLE for estimating 𝜷0\boldsymbol{\beta}_{0}^{*} and 𝜷1\boldsymbol{\beta}_{1}^{*}.
nn 20 100 20 100
pp 200 200 500 500
𝜷0(1,1)\boldsymbol{\beta}_{0}^{*}\sim\mathcal{L}_{(-1,1)} and 𝜷1U(0,2)\boldsymbol{\beta}_{1}^{*}\sim U_{(0,2)}
MME, 2\ell_{2} 𝜷0\boldsymbol{\beta}_{0}^{*} 0.163(0.010) 0.096(0.006) 0.099(0.004) 0.057(0.002)
𝜷1\boldsymbol{\beta}_{1}^{*} 0.177(0.010) 0.084(0.005) 0.104(0.004) 0.050(0.002)
MME, \ell_{\infty} 𝜷0\boldsymbol{\beta}_{0}^{*} 0.570(0.103) 0.367(0.085) 0.395(0.070) 0.241(0.042)
𝜷1\boldsymbol{\beta}_{1}^{*} 0.658(0.137) 0.421(0.079) 0.438(0.076) 0.214(0.037)
MLE, 2\ell_{2} 𝜷0\boldsymbol{\beta}_{0}^{*} 0.211(0.013) 0.091(0.006) 0.121(0.005) 0.054(0.002)
𝜷1\boldsymbol{\beta}_{1}^{*} 0.166(0.011) 0.072(0.005) 0.096(0.004) 0.043(0.002)
MLE, \ell_{\infty} 𝜷0\boldsymbol{\beta}_{0}^{*} 0.809(0.180) 0.354(0.076) 0.532(0.098) 0.232(0.041)
𝜷1\boldsymbol{\beta}_{1}^{*} 0.617(0.116) 0.265(0.052) 0.399(0.065) 0.172(0.028)
𝜷0(2,0)\boldsymbol{\beta}_{0}^{*}\sim\mathcal{L}_{(-2,0)} and 𝜷1U(0,2)\boldsymbol{\beta}_{1}^{*}\sim U_{(0,2)}
MME, 2\ell_{2} 𝜷0\boldsymbol{\beta}_{0}^{*} 0.133(0.012) 0.080(0.007) 0.081(0.004) 0.047(0.002)
𝜷1\boldsymbol{\beta}_{1}^{*} 0.093(0.006) 0.053(0.004) 0.056(0.003) 0.032(0.002)
MME, \ell_{\infty} 𝜷0\boldsymbol{\beta}_{0}^{*} 0.568(0.104) 0.365(0.087) 0.394(0.071) 0.241(0.042)
𝜷1\boldsymbol{\beta}_{1}^{*} 0.387(0.069) 0.236(0.043) 0.258(0.037) 0.162(0.025)
MLE, 2\ell_{2} 𝜷0\boldsymbol{\beta}_{0}^{*} 0.176(0.016) 0.076(0.007) 0.100(0.006) 0.044(0.002)
𝜷1\boldsymbol{\beta}_{1}^{*} 0.116(0.009) 0.051(0.004) 0.068(0.003) 0.031(0.002)
MLE, \ell_{\infty} 𝜷0\boldsymbol{\beta}_{0}^{*} 0.809(0.181) 0.351(0.078) 0.531(0.099) 0.232(0.041)
𝜷1\boldsymbol{\beta}_{1}^{*} 0.513(0.088) 0.227(0.047) 0.348(0.058) 0.158(0.024)
  • The effect of (n,p)(n,p). Similar to what we have observed in Figure 2, the estimation errors become smaller when nn or pp becomes larger. Interestingly, from Tables 23 we can observe that, under the same setting, the errors in 2\ell_{2} norm when (n,p)=(50,200)(n,p)=(50,200) are very close to those when (n,p)=(20,500)(n,p)=(20,500). This is to some degree consistent with our finding in Theorem 2 where the upper bound depends on (n,p)(n,p) through their product npnp.

  • The effect of sparsity. From Table 2 we can see that, as the expected density decreases, the estimation errors increase in almost all the cases. On the other hand, even though the parameters take different values in Table 3, the errors in the sparse cases are in general larger than those in the dense cases.

  • The impact of κ0:=𝜷0\kappa_{0}:=\|\boldsymbol{\beta}_{0}^{*}\|_{\infty}. Typically, estimation errors tend to increase with larger values of κ0\kappa_{0}, as evidenced in Table 2. Additionally, when maintaining the same overall sparsity level, larger κ0\kappa_{0} values are correlated with greater estimation errors, as illustrated in Table 3.

  • MLE vs MME. In general, the estimation errors of the MLE are smaller than those of the MME in most cases, as can be seen in Tables 2 and 3. In Table 4, where the estimation errors for \boldsymbol{\beta}_{0}^{*} and \boldsymbol{\beta}_{1}^{*} are reported separately, the errors of the MME for \boldsymbol{\beta}_{1}^{*} are generally larger than those of the MLE, especially when n is large.

5.3 Real data

In this section, we apply our TWHM to a real dataset on an insect interaction network process [25]. We focus on a subset of the data named insecta-ant-colony4, which contains the social interactions of 102 ants over 41 days. In this dataset, the position and orientation of all the ants were recorded twice per second to infer their movements and interactions, from which 41 daily networks were constructed. More specifically, X_{i,j}^{t} is 1 if there is an interaction between ants i and j during day t, and 0 otherwise. The ACF and PACF plots of the degree sequences of selected ants (cf. Figure 1 in Appendix B.1) exhibit patterns similar to those of a first-order autoregressive model with long memory, motivating the use of TWHM for the analysis of this dataset.

In [25], the 41 daily networks were split into four periods with 11, 10, 10, and 10 days respectively, because the corresponding days separating these periods were identified as change-points. By excluding ants that did not interact with others, we are left with p=102p=102 nodes in period one, p=73p=73 nodes in period two, p=55p=55 nodes in period three and p=35p=35 nodes in period four. Thus we take the networks on day 1, day 12, day 22 and day 32 as the initial networks and fit four different TWHMs, one for each of the four periods.

To appreciate how TWHM captures static heterogeneity, we present a subgraph of 10 nodes during the fourth period (t=32–41), 5 of which have the largest and 5 the smallest fitted \beta_{i,0} values. The edges of this subgraph represent the aggregated static connections ({\mathbf{X}}^{32}+\cdots+{\mathbf{X}}^{41})/10 between these ants. We can see from the left panel of Figure 3 that the magnitudes of the fitted static heterogeneity parameters agree in principle with the activeness of each ant in making connections.

Figure 3: The aggregated networks of 10 selected ants during the fourth period reflect static heterogeneity (Left) and dynamic heterogeneity (Right) respectively. The thickness of each edge is proportional to the aggregation. The number in the nodes are the fitted βi,0\beta_{i,0} (Left) and βi,1\beta_{i,1} (Right).

On the other hand, we examine how TWHM captures dynamic heterogeneity. Towards this, we plot a subgraph of the 10 nodes having the smallest fitted \beta_{i,0} values in the right panel of Figure 3, where each edge represents the magnitude of \sum_{t=33}^{41}I\left(X_{i,j}^{t}=X_{i,j}^{t-1}\right)/9, a measure of the extent to which an edge is preserved across the whole period and hence of dynamic heterogeneity. Again, we can see an agreement between the fitted \beta_{i,1} values and how likely these nodes are to preserve their ties.

To evaluate how TWHM performs when it comes to making prediction, we further carry out the following experiments:

  • (i)

    From (1), given the MLE {β^i,r,i=1,,p,r=0,1}\{\widehat{\beta}_{i,r},i=1,\ldots,p,r=0,1\} and the network at time t1t-1, we can estimate the conditional expectation of node ii’s degree as

    \tilde{d}_{i}^{t}:=\sum_{j=1,\>j\neq i}^{p}{\rm E}\left(X_{i,j}^{t}\Big{|}X_{i,j}^{t-1},\widehat{\boldsymbol{\theta}}\right)=\sum_{j=1,\>j\neq i}^{p}\left(\frac{e^{\widehat{\beta}_{i,0}+\widehat{\beta}_{j,0}}}{1+e^{\widehat{\beta}_{i,0}+\widehat{\beta}_{j,0}}+e^{\widehat{\beta}_{i,1}+\widehat{\beta}_{j,1}}}+\frac{e^{\widehat{\beta}_{i,1}+\widehat{\beta}_{j,1}}}{1+e^{\widehat{\beta}_{i,0}+\widehat{\beta}_{j,0}}+e^{\widehat{\beta}_{i,1}+\widehat{\beta}_{j,1}}}X_{i,j}^{t-1}\right).

    We can then compare the density of the estimated degree sequence {d~it,i=1,,p}\{\tilde{d}_{i}^{t},i=1,\ldots,p\} with that of the observed degree sequence {dit,i=1,,p}\{d_{i}^{t},i=1,\ldots,p\} at time tt. To provide a comparison, we treat networks in each period as i.i.d. observations and utilize the classical β\beta-model to derive the degree sequence estimator {𝐝ˇt}\{\check{{\mathbf{d}}}^{t}\} for the four periods. The fitted degree distributions are depicted in Figure 4, revealing a close resemblance between the estimated and observed densities. This observation suggests that the TWHM demonstrates strong performance in one-step-ahead prediction.

    To further assess the similarity between the estimated degree sequences {𝐝~t}\{\tilde{{\mathbf{d}}}^{t}\}, {𝐝ˇt}\{\check{{\mathbf{d}}}^{t}\}, and the true degree sequence {𝐝t}\{{\mathbf{d}}^{t}\}, we compute the Kolmogorov-Smirnov (KS) distance and conduct the KS test for t=2,,41t=2,\ldots,41. The mean and standard deviation of the KS distances, the p-values of the KS test, and the rejection rate are summarized in Table 5. Notably, at a significance level of 0.05, out of the 40 KS tests, we fail to reject the null hypothesis that {𝐝~t}\{\tilde{{\mathbf{d}}}^{t}\} and {𝐝t}\{{\mathbf{d}}^{t}\} originate from the same distribution in 38 instances, resulting in a rejection rate consistent with the significance level. Conversely, for the degree sequence estimators based on the β\beta-model {𝐝ˇt}\{\check{{\mathbf{d}}}^{t}\}, 8 out of the 40 tests were rejected. These findings indicate that our model exhibits highly promising performance in recovering the degree sequences.

    Table 5: The mean and standard deviation of the KS distances, the p-values of the KS test, and the rejection rate between the true degree sequence \{{\mathbf{d}}^{t}\}, the TWHM-based estimator \tilde{{\mathbf{d}}}^{t}, and the \beta-model-based estimator \check{{\mathbf{d}}}^{t} over the 40 networks (t=2,\ldots,41) in the ant dataset.
    KS distance KS test p-value Rejection rate
    𝐝~t\tilde{{\mathbf{d}}}^{t} vs 𝐝t{\mathbf{d}}^{t} 0.179(0.058) 0.361(0.267) 0.05
    𝐝ˇt\check{{\mathbf{d}}}^{t} vs 𝐝t{\mathbf{d}}^{t} 0.192(0.061) 0.298(0.246) 0.20
    Figure 4: The observed and estimated degree distributions. X-axis: the node degrees; Red curves: the smoothed degree distributions of the estimated degree sequences; Black curves: the smoothed degree distributions of the observed degree sequences.
  • (ii)

    By incorporating network dynamics, TWHM naturally enables one-step-ahead link prediction via

    (5.15) 𝐏(X^i,jt=1|Xi,jt1)=eβ^i,0+β^j,01+eβ^i,0+β^j,0+eβ^i,1+β^j,1+eβ^i,1+β^j,11+eβ^i,0+β^j,0+eβ^i,1+β^j,1Xi,jt1.{\mathbf{P}}\left(\widehat{X}_{i,j}^{t}=1\Big{|}X_{i,j}^{t-1}\right)=\frac{e^{\widehat{\beta}_{i,0}+\widehat{\beta}_{j,0}}}{1+e^{\widehat{\beta}_{i,0}+\widehat{\beta}_{j,0}}+e^{\widehat{\beta}_{i,1}+\widehat{\beta}_{j,1}}}+\frac{e^{\widehat{\beta}_{i,1}+\widehat{\beta}_{j,1}}}{1+e^{\widehat{\beta}_{i,0}+\widehat{\beta}_{j,0}}+e^{\widehat{\beta}_{i,1}+\widehat{\beta}_{j,1}}}X_{i,j}^{t-1}.

    To transform these probabilities into links, we threshold them, setting \widehat{X}_{i,j}^{t}=1 when {\mathbf{P}}\left(\widehat{X}_{i,j}^{t}=1\Big{|}X_{i,j}^{t-1}\right)\geq c_{i,j} and \widehat{X}_{i,j}^{t}=0 otherwise, for some cut-off constants c_{i,j}. As an illustration, we first consider simply setting c_{i,j}=0.5 for all 1\leq i<j\leq p. We denote this approach by TWHM0.5.

    As an alternative, owing to the fact that networks may change slowly, for a given parameter ω\omega, we also consider the following adaptive approach for choosing ci,jc_{i,j}:

    (5.16) X~i,jt:=I{ω𝐏(X^i,jt=1)+(1ω)Xi,jt1>0.5}.\tilde{X}_{i,j}^{t}:=I\{\omega{\mathbf{P}}\left(\widehat{X}_{i,j}^{t}=1\right)+\left(1-\omega\right)X_{i,j}^{t-1}>0.5\}.

    It can be shown that the above estimator is equivalent to the prediction rule I{𝐏(X^i,jt=1)>ci,j}I\left\{{\mathbf{P}}\left(\widehat{X}_{i,j}^{t}=1\right)>c_{i,j}\right\} with cut-off values specified as

    c_{i,j}=\frac{0.5e^{\widehat{\beta}_{i,1}+\widehat{\beta}_{j,1}}+\left(1-\omega\right)e^{\widehat{\beta}_{i,0}+\widehat{\beta}_{j,0}}}{\left(1-\omega\right)+e^{\widehat{\beta}_{i,1}+\widehat{\beta}_{j,1}}+\left(1-\omega\right)e^{\widehat{\beta}_{i,0}+\widehat{\beta}_{j,0}}},\quad 1\leq i<j\leq p.

    This method is denoted as TWHMadaptive. Lastly, as a benchmark, we have also considered a naive approach that simply predicts 𝐗t{\mathbf{X}}^{t} as 𝐗t1{\mathbf{X}}^{t-1}.

    In this experiment, we set the number of training samples to n_{train}=2,5, or 8. For a given training sample size n_{train} and a period with n networks, we predict the graph {\mathbf{X}}^{n_{train}+i} based on the previous n_{train} networks \{{\mathbf{X}}^{t},t=i,\ldots,n_{train}+i-1\} for i=1,\ldots,n-n_{train}. That is, over the four periods in the data, we predict 33, 21, and 9 networks for n_{train}=2, 5, and 8 respectively, with 5151 possible edges in each network in the first period, 2628 in the second, 1485 in the third, and 595 in the fourth. The \omega parameter employed in TWHMadaptive is selected as follows. For prediction in each period, we choose the value in a sequence of candidate \omega values that produces the highest accuracy in predicting {\mathbf{X}}^{n_{train}+i-1}, and use it for predicting {\mathbf{X}}^{n_{train}+i}. For example, in the first period with n=11 networks, when n_{train}=8, we used \{{\mathbf{X}}^{t},t=i,\cdots,i+7\} to predict {\mathbf{X}}^{i+8} for i=1,2,3. For each i, let \tilde{\mathbf{X}}^{i+7} be defined as in (5.16). A set of candidate values of \omega was used to compute \tilde{\mathbf{X}}^{i+7}, and the one returning the smallest misclassification rate (in predicting {\mathbf{X}}^{i+7}) was used in TWHMadaptive for predicting {\mathbf{X}}^{i+8}. The mean of the chosen \omega is 0.936 when n_{train}=2, 0.895 when n_{train}=5, and 0.905 when n_{train}=8. The prediction accuracy of these methods, defined as the percentage of correctly predicted links, is reported in Table 6. We can see that TWHM0.5 and TWHMadaptive both outperform the naive approach in all cases. Moreover, TWHM with adaptive cut-off points improves on the accuracy of TWHM with the fixed cut-off value 0.5 in most periods.

Table 6: The prediction accuracy of TWHM with 0.50.5 as a cut-off point, TWHM with adaptive cut-off points, and the naive estimator 𝐗t1{\mathbf{X}}^{t-1}.
ntrainn_{train} Period TWHM0.5 TWHMadaptive Naive
2 One 0.773 0.800 0.749
Two 0.817 0.817 0.780
Three 0.837 0.837 0.806
Four 0.824 0.831 0.807
Overall 0.811 0.822 0.784
5 One 0.789 0.807 0.759
Two 0.826 0.823 0.779
Three 0.846 0.849 0.805
Four 0.833 0.842 0.805
Overall 0.822 0.829 0.786
8 One 0.795 0.800 0.759
Two 0.832 0.832 0.778
Three 0.855 0.845 0.823
Four 0.831 0.863 0.779
Overall 0.825 0.831 0.782
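The link-prediction pipeline of (5.15)-(5.16) can be sketched as follows; with \omega=1 the adaptive rule reduces to TWHM0.5, and with \omega=0 it reduces to the naive predictor {\mathbf{X}}^{t-1} (the helper names are ours):

```python
import numpy as np

def link_probs(beta0, beta1, X_prev):
    """One-step-ahead link probabilities from (5.15)."""
    E0 = np.exp(beta0[:, None] + beta0[None, :])
    E1 = np.exp(beta1[:, None] + beta1[None, :])
    D = 1.0 + E0 + E1
    return E0 / D + (E1 / D) * X_prev

def predict_links(beta0, beta1, X_prev, omega=1.0):
    """TWHM0.5 when omega=1; the adaptive rule (5.16) otherwise."""
    P = link_probs(beta0, beta1, X_prev)
    pred = (omega * P + (1.0 - omega) * X_prev > 0.5).astype(int)
    np.fill_diagonal(pred, 0)
    return pred

def accuracy(pred, truth):
    """Share of correctly predicted links over the upper triangle."""
    iu = np.triu_indices(pred.shape[0], k=1)
    return float((pred[iu] == truth[iu]).mean())
```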

6 Summary and Discussion

We have proposed a novel two-way heterogeneity model that utilizes two sets of parameters to explicitly capture static heterogeneity and dynamic heterogeneity. In a high-dimensional setup, we have established the existence and the rate of convergence of its local MLE, and proposed a novel method of moments estimator as an initial value for finding this local MLE. To the best of our knowledge, this is the first model in the network literature for which a consistent local MLE is obtained from a non-convex loss function. The theory of our model is established by developing new uniform upper bounds for the deviation of the loss function.

While we have focused on the estimation of the parameters in this paper, how to conduct statistical inference for the local MLE is a natural next step for research. In our setup, we assume that the parameters are time invariant, but this need not be the case. A future direction is to allow the static heterogeneity parameter \boldsymbol{\beta}_{0} and/or the dynamic heterogeneity parameter \boldsymbol{\beta}_{1} to depend on time, giving rise to non-stationary network processes. When these parameters change smoothly over time, we may estimate the parameters \beta_{i,0}^{\tau},\beta_{i,1}^{\tau} at time \tau by kernel smoothing, that is, by maximizing the following smoothed log-likelihood:

\tilde{L}(\tau,{\mathbf{X}}^{n},{\mathbf{X}}^{n-1},\cdots,{\mathbf{X}}^{1}|{\mathbf{X}}^{0})=\sum_{t=1}^{n}w_{t}\sum_{1\leq i<j\leq p}\Bigg\{-\log\Big(1+e^{\beta_{i,0}+\beta_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}}\Big)+\left(\beta_{i,0}+\beta_{j,0}\right)X_{i,j}^{t}\left(1-X_{i,j}^{t-1}\right)+\left(1-X_{i,j}^{t}\right)\left(1-X_{i,j}^{t-1}\right)\log\left(1+e^{\beta_{i,1}+\beta_{j,1}}\right)+X_{i,j}^{t}X_{i,j}^{t-1}\log\big(e^{\beta_{i,0}+\beta_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}}\big)\Bigg\},

with $w_{t}=\frac{K(h^{-1}|t-\tau|)}{\sum_{s=1}^{n}K(h^{-1}|s-\tau|)}$, where $K(\cdot)$ is a kernel function and $h$ is the bandwidth parameter. As another line of research, note that TWHM is formulated as an AR(1) process; we can extend it to include more time lags. For example, a lag-$k$ extension of TWHM can be written as
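For concreteness, the weights $w_t$ can be computed as follows; the Gaussian kernel is just one common choice of $K(\cdot)$, not one prescribed here:

```python
import numpy as np

def kernel_weights(n, tau, h, K=lambda u: np.exp(-0.5 * u**2)):
    """Normalised kernel weights w_t = K(|t - tau|/h) / sum_s K(|s - tau|/h),
    t = 1, ..., n, localising the log-likelihood around time tau."""
    t = np.arange(1, n + 1)
    w = K(np.abs(t - tau) / h)
    return w / w.sum()
```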

X^{t}_{i,j}=I({\varepsilon}_{i,j}^{t}=0)+\sum_{r=1}^{k}X^{t-r}_{i,j}I({\varepsilon}_{i,j}^{t}=r),

where the innovations ${\varepsilon}_{i,j}^{t}$ are independent such that

P({\varepsilon}_{i,j}^{t}=r)=\frac{e^{\beta_{i,r}+\beta_{j,r}}}{1+\sum_{s=0}^{k}e^{\beta_{i,s}+\beta_{j,s}}}~~{\rm for}~r=0,\cdots,k;\quad P({\varepsilon}_{i,j}^{t}=-1)=\frac{1}{1+\sum_{s=0}^{k}e^{\beta_{i,s}+\beta_{j,s}}},

with the parameter $\boldsymbol{\beta}_{0}=(\beta_{1,0},\ldots,\beta_{p,0})^{\top}$ denoting node-specific static heterogeneity and $\boldsymbol{\beta}=\left(\beta_{i,r}\right)_{1\leq i\leq p;1\leq r\leq k}\in\mathbb{R}^{p\times k}$ denoting lag-$k$ dynamic fluctuations. Other future lines of research include adding covariates to model the tendency of nodes making connections [35] and exploring additional structures [3].
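As a sanity check of the lag-$k$ formulation, a single edge of the extended process can be simulated directly from the innovation distribution above; the function below is an illustrative sketch rather than part of any estimation procedure (here `beta[r]` plays the role of $\beta_{i,r}+\beta_{j,r}$ for the given edge):

```python
import numpy as np

def simulate_twhm_lagk(beta, n_steps, X0, rng):
    """Simulate one edge of the lag-k extension
    X^t = I(eps^t = 0) + sum_{r=1}^k X^{t-r} I(eps^t = r).
    beta: length-(k+1) sequence of beta_{i,r} + beta_{j,r}, r = 0, ..., k.
    X0:   the k initial edge states, most recent last."""
    k = len(beta) - 1
    expb = np.exp(np.asarray(beta, dtype=float))
    # probabilities of eps = -1, 0, 1, ..., k, in that order
    probs = np.concatenate(([1.0], expb)) / (1.0 + expb.sum())
    hist = [int(x) for x in X0]
    path = []
    for _ in range(n_steps):
        eps = rng.choice(np.arange(-1, k + 1), p=probs)
        if eps == -1:
            x = 0              # the tie is dissolved
        elif eps == 0:
            x = 1              # a new tie is formed
        else:
            x = hist[-eps]     # the state from r = eps steps back is retained
        hist.append(x)
        path.append(x)
    return path
```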

References

  • Bhattacharjee et al., [2020] Bhattacharjee, M., Banerjee, M., and Michailidis, G. (2020). Change point estimation in a dynamic stochastic block model. Journal of Machine Learning Research, 21(107):1–59.
  • Chatterjee et al., [2011] Chatterjee, S., Diaconis, P., and Sly, A. (2011). Random graphs with a given degree sequence. The Annals of Applied Probability, 21(4):1400–1435.
  • Chen et al., [2021] Chen, M., Kato, K., and Leng, C. (2021). Analysis of networks via the sparse β\beta-model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 83(5).
  • Durante et al., [2016] Durante, D., Dunson, D. B., et al. (2016). Locally adaptive dynamic networks. The Annals of Applied Statistics, 10(4):2203–2232.
  • Fu and He, [2022] Fu, D. and He, J. (2022). DPPIN: A biological repository of dynamic protein-protein interaction network data. In 2022 IEEE International Conference on Big Data (Big Data), pages 5269–5277. IEEE.
  • Gragg and Tapia, [1974] Gragg, W. and Tapia, R. (1974). Optimal error bounds for the Newton–Kantorovich theorem. SIAM Journal on Numerical Analysis, 11(1):10–13.
  • Graham, [2017] Graham, B. S. (2017). An econometric model of network formation with degree heterogeneity. Econometrica, 85(4):1033–1063.
  • Han et al., [2020] Han, R., Chen, K., and Tan, C. (2020). Bivariate gamma model. Journal of Multivariate Analysis, 180:104666.
  • Han et al., [2023] Han, R., Xu, Y., and Chen, K. (2023). A general pairwise comparison model for extremely sparse networks. Journal of the American Statistical Association, 118(544):2422–2432.
  • Hanneke et al., [2010] Hanneke, S., Fu, W., and Xing, E. P. (2010). Discrete temporal models of social networks. Electronic journal of statistics, 4:585–605.
  • Hanneke and Xing, [2007] Hanneke, S. and Xing, E. P. (2007). Discrete temporal models of social networks. In Statistical network analysis: models, issues, and new directions: ICML 2006 workshop on statistical network analysis, Pittsburgh, PA, USA, June 29, 2006, Revised Selected Papers, pages 115–125. Springer.
  • Hillar et al., [2012] Hillar, C. J., Lin, S., and Wibisono, A. (2012). Inverses of symmetric, diagonally dominant positive matrices and applications. arXiv preprint arXiv:1203.6812.
  • Holland and Leinhardt, [1981] Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association, 76(373):33–50.
  • Jiang et al., [2023] Jiang, B., Li, J., and Yao, Q. (2023). Autoregressive networks. Journal of Machine Learning Research, 24(227):1–69.
  • Jin, [2015] Jin, J. (2015). Fast community detection by SCORE. The Annals of Statistics, 43(1):57–89.
  • Jin et al., [2022] Jin, J., Ke, Z. T., Luo, S., and Wang, M. (2022). Optimal estimation of the number of network communities. Journal of the American Statistical Association, pages 1–16.
  • Karrer and Newman, [2011] Karrer, B. and Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical review E, 83(1):016107.
  • Karwa et al., [2016] Karwa, V., Slavković, A., et al. (2016). Inference using noisy degrees: Differentially private $\beta$-model and synthetic graphs. The Annals of Statistics, 44(1):87–112.
  • Ke and Jin, [2022] Ke, Z. T. and Jin, J. (2022). The score normalization, especially for highly heterogeneous network and text data. arXiv preprint arXiv:2204.11097.
  • Kolaczyk and Csárdi, [2020] Kolaczyk, E. D. and Csárdi, G. (2020). Statistical analysis of network data with R, volume 65. Springer, 2 edition.
  • Krivitsky and Handcock, [2014] Krivitsky, P. N. and Handcock, M. S. (2014). A separable model for dynamic networks. Journal of the Royal Statistical Society. Series B, Statistical Methodology, 76(1):29.
  • Lin and Bai, [2011] Lin, Z. and Bai, Z. (2011). Probability inequalities. Springer Science & Business Media.
  • Matias and Miele, [2017] Matias, C. and Miele, V. (2017). Statistical clustering of temporal networks through a dynamic stochastic block model. Journal of the Royal Statistical Society, B, 79(4):1119–1141.
  • Merlevède et al., [2009] Merlevède, F., Peligrad, M., Rio, E., et al. (2009). Bernstein inequality and moderate deviations under strong mixing conditions. In High dimensional probability V: the Luminy volume, pages 273–292. Institute of Mathematical Statistics.
  • Mersch et al., [2013] Mersch, D. P., Crespi, A., and Keller, L. (2013). Tracking individuals shows spatial fidelity is a key regulator of ant social organization. Science, 340(6136):1090–1093.
  • Minai and Williams, [1993] Minai, A. A. and Williams, R. D. (1993). On the derivatives of the sigmoid. Neural Networks, 6(6):845–853.
  • Newman, [2018] Newman, M. (2018). Networks. Oxford university press.
  • Pensky, [2019] Pensky, M. (2019). Dynamic network models and graphon estimation. Annals of Statistics, 47(4):2378–2403.
  • Sengupta and Chen, [2018] Sengupta, S. and Chen, Y. (2018). A block model for node popularity in networks with community structure. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2):365–386.
  • Stein and Leng, [2020] Stein, S. and Leng, C. (2020). A sparse β\beta-model with covariates for networks. arXiv preprint arXiv:2010.13604.
  • Van der Vaart, [2000] Van der Vaart, A. W. (2000). Asymptotic statistics, volume 3. Cambridge university press.
  • Van Der Vaart and Wellner, [1996] Van Der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence. In Weak convergence and empirical processes, pages 16–28. Springer.
  • Wainwright, [2019] Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press.
  • Wu et al., [2006] Wu, X., Zhu, L., Guo, J., Zhang, D.-Y., and Lin, K. (2006). Prediction of yeast protein–protein interaction network: insights from the gene ontology and annotations. Nucleic acids research, 34(7):2137–2150.
  • Yan et al., [2019] Yan, T., Jiang, B., Fienberg, S. E., and Leng, C. (2019). Statistical inference in a directed network model with covariates. Journal of the American Statistical Association, 114(526):857–868.
  • Yan et al., [2015] Yan, T., Leng, C., and Zhu, J. (2015). Supplement to “asymptotics in directed exponential random graph models with an increasing bi-degree sequence”. Annals of Statistics.
  • Yan et al., [2016] Yan, T., Leng, C., Zhu, J., et al. (2016). Asymptotics in directed exponential random graph models with an increasing bi-degree sequence. Annals of Statistics, 44(1):31–57.
  • Yan and Xu, [2012] Yan, T. and Xu, J. (2012). Approximating the inverse of a balanced symmetric matrix with positive elements. arXiv preprint arXiv:1202.1058.
  • Yan and Xu, [2013] Yan, T. and Xu, J. (2013). A central limit theorem in the β\beta-model for undirected random graphs with a diverging number of vertices. Biometrika, 100(2):519–524.

Appendix A Technical proofs

For brevity, we denote $\alpha_{0,i,j}:=e^{\beta_{i,0}+\beta_{j,0}}$ and $\alpha_{1,i,j}:=e^{\beta_{i,1}+\beta_{j,1}}$, and define $\alpha^{*}_{0,i,j},\alpha^{*}_{1,i,j}$ and $\widehat{\alpha}_{0,i,j},\widehat{\alpha}_{1,i,j}$ similarly based on the true parameter $\boldsymbol{\theta}^{*}$ and the MLE $\widehat{\boldsymbol{\theta}}$.

A.1 Some technical lemmas

Before presenting the proofs of our main results, we first provide some technical lemmas that will be used repeatedly. Lemmas 4 and 5 below provide further properties of the process $\{{\mathbf{X}}^{t}\}$.

Lemma 4.

Let $\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}}$. We have:

(i) $\{{\mathbf{X}}^{t}\circ{\mathbf{X}}^{t-1},t=0,1,2,\cdots\}$, where $\circ$ is the Hadamard product operator, is strictly stationary. Furthermore, for any $1\leq i<j\leq p$, $1\leq l<m\leq p$ and $|t-s|\geq 1$, we have

E(Xi,jtXi,jt1)=α0,i,j(α0,i,j+α1,i,j)(1+α0,i,j)(1+α0,i,j+α1,i,j),\displaystyle{\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)=\frac{\alpha_{0,i,j}(\alpha_{0,i,j}+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})},
Var(Xi,jtXi,jt1)=α0,i,j(α0,i,j+α1,i,j)(2α0,i,j+α1,i,j+1)(1+α0,i,j)2(1+α0,i,j+α1,i,j)2,\displaystyle{\rm Var}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)=\frac{\alpha_{0,i,j}(\alpha_{0,i,j}+\alpha_{1,i,j})(2\alpha_{0,i,j}+\alpha_{1,i,j}+1)}{(1+\alpha_{0,i,j})^{2}(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}},
Cov(Xi,jtXi,jt1,Xl,msXl,ms1)=\displaystyle{\rm Cov}(X_{i,j}^{t}X_{i,j}^{t-1},X_{l,m}^{s}X_{l,m}^{s-1})=
{(α1,i,j1+α0,i,j+α1,i,j)|ts|1α0,i,j(α0,i,j+α1,i,j)2(1+α0,i,j)2(1+α0,i,j+α1,i,j)2,(i,j)=(l,m),0,(i,j)(l,m).\displaystyle\left\{\begin{array}[]{ccc}\left(\frac{\alpha_{1,i,j}}{1+\alpha_{0,i,j}+\alpha_{1,i,j}}\right)^{|t-s|-1}\frac{\alpha_{0,i,j}(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}{(1+\alpha_{0,i,j})^{2}(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}},&{(i,j)=(l,m)},\\ \\ 0,&{(i,j)\neq(l,m)}.\end{array}\right.

(ii) $\{\left(1-{\mathbf{X}}^{t}\right)\circ\left(1-{\mathbf{X}}^{t-1}\right),t=0,1,2,\cdots\}$ is strictly stationary. Furthermore, for any $1\leq i<j\leq p$, $1\leq l<m\leq p$ and $|t-s|\geq 1$, we have

E((1Xi,jt)(1Xi,jt1))=1+α1,i,j(1+α0,i,j)(1+α0,i,j+α1,i,j),\displaystyle{\rm E}\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)}\right)=\frac{1+\alpha_{1,i,j}}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})},
Var((1Xi,jt)(1Xi,jt1))=α0,i,j(1+α1,i,j)(α0,i,j+α1,i,j+2)(1+α0,i,j)2(1+α0,i,j+α1,i,j)2,\displaystyle{\rm Var}\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)}\right)=\frac{\alpha_{0,i,j}(1+\alpha_{1,i,j})(\alpha_{0,i,j}+\alpha_{1,i,j}+2)}{(1+\alpha_{0,i,j})^{2}(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}},
Cov((1Xi,jt)(1Xi,jt1),(1Xl.ms)(1Xl,ms1))=\displaystyle{\rm Cov}\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)},(1-X_{l.m}^{s})(1-X_{l,m}^{s-1})\right)=
{(α1,i,j1+α0,i,j+α1,i,j)|ts|1α0,i,j(1+α1,i,j)2(1+α0,i,j)2(1+α0,i,j+α1,i,j)2,(i,j)=(l,m),0,(i,j)(l,m).\displaystyle\left\{\begin{array}[]{ccc}\left(\frac{\alpha_{1,i,j}}{1+\alpha_{0,i,j}+\alpha_{1,i,j}}\right)^{|t-s|-1}\frac{\alpha_{0,i,j}(1+\alpha_{1,i,j})^{2}}{(1+\alpha_{0,i,j})^{2}(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}},&{(i,j)=(l,m)},\\ \\ 0,&{(i,j)\neq(l,m)}.\end{array}\right.
Proof.

(i) Denote $\mu_{i,j}={\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)$, $\gamma_{i,j}(k)={\rm Cov}\left(X_{i,j}^{t}X_{i,j}^{t-1},X_{i,j}^{t-k}X_{i,j}^{t-k-1}\right)$ and $\rho_{i,j}(k)=\gamma_{i,j}(k)/\gamma_{i,j}(1)$ ($k\geq 1$). For every $i<j$, we have

E(Xi,jtXi,jt1)=P(Xi,jt=1|Xi,jt1=1)P(Xi,jt1=1)=α0,i,j(α0,i,j+α1,i,j)(1+α0,i,j)(1+α0,i,j+α1,i,j),\displaystyle{\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)=P\left(X_{i,j}^{t}=1\Big{|}X_{i,j}^{t-1}=1\right)P\left(X_{i,j}^{t-1}=1\right)=\frac{\alpha_{0,i,j}(\alpha_{0,i,j}+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})},
Var(Xi,jtXi,jt1)=E((Xi,jtXi,jt1)2)E(Xi,jtXi,jt1)2\displaystyle{\rm Var}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)={\rm E}\left(\Big{(}X_{i,j}^{t}X_{i,j}^{t-1}\Big{)}^{2}\right)-{\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)^{2}
=(1E(Xi,jtXi,jt1))E(Xi,jtXi,jt1)=α0,i,j(α0,i,j+α1,i,j)(2α0,i,j+α1,i,j+1)(1+α0,i,j)2(1+α0,i,j+α1,i,j)2,\displaystyle=\left(1-{\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)\right){\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)=\frac{\alpha_{0,i,j}(\alpha_{0,i,j}+\alpha_{1,i,j})(2\alpha_{0,i,j}+\alpha_{1,i,j}+1)}{(1+\alpha_{0,i,j})^{2}(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}},

and

\gamma_{i,j}(1)={\rm E}\left(X_{i,j}^{t}(X_{i,j}^{t-1})^{2}X_{i,j}^{t-2}\right)-\mu^{2}_{i,j}=P\left(X_{i,j}^{t}X_{i,j}^{t-1}X_{i,j}^{t-2}=1\right)-\mu^{2}_{i,j}
=P\left(X_{i,j}^{t}=1\Big|X_{i,j}^{t-1}X_{i,j}^{t-2}=1\right)P\left(X_{i,j}^{t-1}X_{i,j}^{t-2}=1\right)-\mu^{2}_{i,j}
={\rm E}\left(X_{i,j}^{t}\Big|X_{i,j}^{t-1}=1\right)\mu_{i,j}-\mu^{2}_{i,j}
=\frac{\alpha_{0,i,j}(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}{(1+\alpha_{0,i,j})^{2}(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}.

For $k\geq 2$, by Proposition 1, we have

γi,j(k)\displaystyle\gamma_{i,j}(k) =\displaystyle= E(Xi,jtXi,jt1Xi,jtkXi,jtk1)μi,j2\displaystyle{\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}X_{i,j}^{t-k}X_{i,j}^{t-k-1}\right)-\mu^{2}_{i,j}
=\displaystyle= P(Xi,jtXi,jt1=1|Xi,jtkXi,jtk1=1)P(Xi,jtkXi,jtk1=1)μi,j2\displaystyle P\left(X_{i,j}^{t}X_{i,j}^{t-1}=1\Big{|}X_{i,j}^{t-k}X_{i,j}^{t-k-1}=1\right)P\left(X_{i,j}^{t-k}X_{i,j}^{t-k-1}=1\right)-\mu^{2}_{i,j}
=\displaystyle= P(Xi,jt=1|Xi,jt1=1)P(Xi,jt1=1|Xi,jtk=1)μi,jμi,j2\displaystyle P\left(X_{i,j}^{t}=1\Big{|}X_{i,j}^{t-1}=1\right)P\left(X_{i,j}^{t-1}=1\Big{|}X_{i,j}^{t-k}=1\right)\mu_{i,j}-\mu_{i,j}^{2}
=\displaystyle= P(Xi,jt=1|Xi,jt1=1)P(Xi,jt1Xi,jtk=1)P(Xi,jtk=1)1μi,jμi,j2\displaystyle P\left(X_{i,j}^{t}=1\Big{|}X_{i,j}^{t-1}=1\right)P\left(X_{i,j}^{t-1}X_{i,j}^{t-k}=1\right)P\left(X_{i,j}^{t-k}=1\right)^{-1}\mu_{i,j}-\mu^{2}_{i,j}
=\displaystyle= P(Xi,jt=1|Xi,jt1=1)P(Xi,jt1Xi,jtk=1)P(Xi,jtk+1=1|Xi,jtk=1)μi,j2\displaystyle P\left(X_{i,j}^{t}=1\Big{|}X_{i,j}^{t-1}=1\right)P\left(X_{i,j}^{t-1}X_{i,j}^{t-k}=1\right)P\left(X_{i,j}^{t-k+1}=1\Big{|}X_{i,j}^{t-k}=1\right)-\mu^{2}_{i,j}
=\displaystyle= P(Xi,jt1Xi,jtk=1)P(Xi,jt=1|Xi,jt1=1)2μi,j2\displaystyle P\left(X_{i,j}^{t-1}X_{i,j}^{t-k}=1\right)P\left(X_{i,j}^{t}=1\Big{|}X_{i,j}^{t-1}=1\right)^{2}-\mu^{2}_{i,j}
=\displaystyle= (P(Xi,jt1Xi,jtk=1)E(Xi,jt)2)P(Xi,jt=1|Xi,jt1=1)2\displaystyle\left(P\left(X_{i,j}^{t-1}X_{i,j}^{t-k}=1\right)-{\rm E}\left(X_{i,j}^{t}\right)^{2}\right)P\left(X_{i,j}^{t}=1\Big{|}X_{i,j}^{t-1}=1\right)^{2}
=\displaystyle= Cov(Xi,jt1,Xi,jtk)(α0,i,j+α1,i,j)2(1+α0,i,j+α1,i,j)2\displaystyle{\rm Cov}\left(X_{i,j}^{t-1},X_{i,j}^{t-k}\right)\frac{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}{(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}
=\displaystyle= (α1,i,j1+α0,i,j+α1,i,j)k1α0,i,j(1+α0,i,j)2(α0,i,j+α1,i,j)2(1+α0,i,j+α1,i,j)2\displaystyle\left(\frac{\alpha_{1,i,j}}{1+\alpha_{0,i,j}+\alpha_{1,i,j}}\right)^{k-1}\frac{\alpha_{0,i,j}}{(1+\alpha_{0,i,j})^{2}}\frac{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}{(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}
=\displaystyle= (α1,i,j1+α0,i,j+α1,i,j)k1γi,j(1).\displaystyle\left(\frac{\alpha_{1,i,j}}{1+\alpha_{0,i,j}+\alpha_{1,i,j}}\right)^{k-1}\gamma_{i,j}(1).

This proves (i).

(ii) Let $\mu_{i,j}^{\prime}={\rm E}\left(\left(1-X_{i,j}^{t}\right)\left(1-X_{i,j}^{t-1}\right)\right)$, $\gamma_{i,j}^{\prime}(k)={\rm Cov}\left(\left(1-X_{i,j}^{t}\right)\left(1-X_{i,j}^{t-1}\right),\left(1-X_{i,j}^{t-k}\right)\left(1-X_{i,j}^{t-k-1}\right)\right)$ and $\rho_{i,j}^{\prime}(k)=\gamma_{i,j}^{\prime}(k)/\gamma_{i,j}^{\prime}(1)$ ($k\geq 1$). Similarly, for every $i<j$, we have

E((1Xi,jt)(1Xi,jt1))\displaystyle{\rm E}\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)}\right) =\displaystyle= P(Xi,jt=0|Xi,jt1=0)P(Xi,jt1=0)\displaystyle P\left(X_{i,j}^{t}=0\Big{|}X_{i,j}^{t-1}=0\right)P\left(X_{i,j}^{t-1}=0\right)
=\displaystyle= 1+α1,i,j(1+α0,i,j)(1+α0,i,j+α1,i,j),\displaystyle\frac{1+\alpha_{1,i,j}}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})},
Var((1Xi,jt)(1Xi,jt1))\displaystyle{\rm Var}\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)}\right) =\displaystyle= (1μi,j)μi,j=α0,i,j(1+α1,i,j)(α0,i,j+α1,i,j+2)(1+α0,i,j)2(1+α0,i,j+α1,i,j)2,\displaystyle\left(1-\mu^{\prime}_{i,j}\right)\mu^{\prime}_{i,j}=\frac{\alpha_{0,i,j}(1+\alpha_{1,i,j})(\alpha_{0,i,j}+\alpha_{1,i,j}+2)}{(1+\alpha_{0,i,j})^{2}(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}},

and

γi,j(1)\displaystyle\gamma_{i,j}^{\prime}(1) =\displaystyle= Cov((1Xi,jt)(1Xi,jt1),(1Xi,jt1)(1Xi,jt2))\displaystyle{\rm Cov}\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)},\left(1-X_{i,j}^{t-1}\right)\left(1-X_{i,j}^{t-2}\right)\right)
=\displaystyle= E((1Xi,jt)(1Xi,jt1)2(1Xi,jt2))E((1Xi,jt)(1Xi,jt1))2\displaystyle{\rm E}\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)}^{2}\left(1-X_{i,j}^{t-2}\right)\right)-{\rm E}\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)}\right)^{2}
=\displaystyle= P((1Xi,jt)(1Xi,jt1)2(1Xi,jt2)=1)E((1Xi,jt)(1Xi,jt1))2\displaystyle P\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)}^{2}\left(1-X_{i,j}^{t-2}\right)=1\right)-{\rm E}\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)}\right)^{2}
=\displaystyle= P((1Xi,jt1)(1Xi,jt2)=1)P(Xi,jt=0|Xi,jt1=0)(μi,j)2\displaystyle P\left((1-X_{i,j}^{t-1})(1-X_{i,j}^{t-2})=1\right)P\left(X_{i,j}^{t}=0\Big{|}X_{i,j}^{t-1}=0\right)-\left(\mu^{\prime}_{i,j}\right)^{2}
=\displaystyle= (1+α0,i,j)(μi,j)2(μi,j)2\displaystyle(1+\alpha_{0,i,j})\left(\mu^{\prime}_{i,j}\right)^{2}-\left(\mu_{i,j}^{\prime}\right)^{2}
=\displaystyle= α0,i,j(μi,j)2\displaystyle\alpha_{0,i,j}\left(\mu^{\prime}_{i,j}\right)^{2}
=\displaystyle= α0,i,j(1+α1,i,j)2(1+α0,i,j)2(1+α0,i,j+α1,i,j)2.\displaystyle\frac{\alpha_{0,i,j}(1+\alpha_{1,i,j})^{2}}{(1+\alpha_{0,i,j})^{2}(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}.

For $k\geq 2$ we have

γi,j(k)\displaystyle\gamma_{i,j}^{\prime}(k) =\displaystyle= E((1Xi,jt)(1Xi,jt1)(1Xi,jtk)(1Xi,jtk1))(μi,j)2\displaystyle{\rm E}\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)}\left(1-X_{i,j}^{t-k}\right)\left(1-X_{i,j}^{t-k-1}\right)\right)-\left(\mu^{\prime}_{i,j}\right)^{2}
=\displaystyle= P((1Xi,jt)(1Xi,jt1)=1|(1Xi,jtk)(1Xi,jtk1)=1)μi,j(μi,j)2\displaystyle P\left(\Big{(}1-X_{i,j}^{t}\Big{)}\Big{(}1-X_{i,j}^{t-1}\Big{)}=1\Big{|}\left(1-X_{i,j}^{t-k}\right)\left(1-X_{i,j}^{t-k-1}\right)=1\right)\mu^{\prime}_{i,j}-\left(\mu^{\prime}_{i,j}\right)^{2}
=\displaystyle= P(Xi,jt=0|Xi,jt1=0)P(Xi,jt1=0|Xi,jtk=0)μi,j(μi,j)2\displaystyle P\left(X_{i,j}^{t}=0\Big{|}X_{i,j}^{t-1}=0\right)P\left(X_{i,j}^{t-1}=0\Big{|}X_{i,j}^{t-k}=0\right)\mu^{\prime}_{i,j}-\left(\mu^{\prime}_{i,j}\right)^{2}
=\displaystyle= P(Xi,jt=0|Xi,jt1=0)P(Xi,jt1=0,Xi,jtk=0)P(Xi,jtk=0)1μi,j(μi,j)2,\displaystyle P\left(X_{i,j}^{t}=0\Big{|}X_{i,j}^{t-1}=0\right)P\left(X_{i,j}^{t-1}=0,X_{i,j}^{t-k}=0\right)P\left(X_{i,j}^{t-k}=0\right)^{-1}\mu^{\prime}_{i,j}-\left(\mu^{\prime}_{i,j}\right)^{2},

with

P(Xi,jt1=0,Xi,jtk=0)\displaystyle P\left(X_{i,j}^{t-1}=0,X_{i,j}^{t-k}=0\right) =\displaystyle= Cov(1Xi,jt1,1Xi,jtk)+E(1Xi,jt)2\displaystyle{\rm Cov}\left(1-X_{i,j}^{t-1},1-X_{i,j}^{t-k}\right)+{\rm E}\left(1-X_{i,j}^{t}\right)^{2}
=\displaystyle= Cov(Xi,jt1,Xi,jtk)+1(1+α0,i,j)2\displaystyle{\rm Cov}\left(X_{i,j}^{t-1},X_{i,j}^{t-k}\right)+\frac{1}{(1+\alpha_{0,i,j})^{2}}
=\displaystyle= (α1,i,j1+α0,i,j+α1,i,j)k1α0,i,j(1+α0,i,j)2+1(1+α0,i,j)2\displaystyle\left(\frac{\alpha_{1,i,j}}{1+\alpha_{0,i,j}+\alpha_{1,i,j}}\right)^{k-1}\frac{\alpha_{0,i,j}}{(1+\alpha_{0,i,j})^{2}}+\frac{1}{(1+\alpha_{0,i,j})^{2}}
=\displaystyle= 1(1+α0,i,j)2((α1,i,j1+α0,i,j+α1,i,j)k1α0,i,j+1),\displaystyle\frac{1}{(1+\alpha_{0,i,j})^{2}}\left(\left(\frac{\alpha_{1,i,j}}{1+\alpha_{0,i,j}+\alpha_{1,i,j}}\right)^{k-1}\alpha_{0,i,j}+1\right),

and

P(Xi,jt=0|Xi,jt1=0)1(1+α0,i,j)2P(Xi,jtk=0)1=1+α1,i,j(1+α0,i,j)(1+α0,i,j+α1,i,j).P\left(X_{i,j}^{t}=0\Big{|}X_{i,j}^{t-1}=0\right)\frac{1}{(1+\alpha_{0,i,j})^{2}}P\left(X_{i,j}^{t-k}=0\right)^{-1}=\frac{1+\alpha_{1,i,j}}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})}.

Thus,

γi,j(k)\displaystyle\gamma_{i,j}^{\prime}(k)
=\displaystyle= P(Xi,jt=0|Xi,jt1=0)P(Xi,jt1=0,Xi,jtk=0)P(Xi,jtk=0)1μi,j(μi,j)2\displaystyle P\left(X_{i,j}^{t}=0\Big{|}X_{i,j}^{t-1}=0\right)P\left(X_{i,j}^{t-1}=0,X_{i,j}^{t-k}=0\right)P\left(X_{i,j}^{t-k}=0\right)^{-1}\mu^{\prime}_{i,j}-\left(\mu^{\prime}_{i,j}\right)^{2}
=\displaystyle= (P(Xi,jt=0|Xi,jt1=0)P(Xi,jt1=0,Xi,jtk=0)P(Xi,jtk=0)11)(μi,j)2\displaystyle\left(P\left(X_{i,j}^{t}=0\Big{|}X_{i,j}^{t-1}=0\right)P\left(X_{i,j}^{t-1}=0,X_{i,j}^{t-k}=0\right)P\left(X_{i,j}^{t-k}=0\right)^{-1}-1\right)\left(\mu^{\prime}_{i,j}\right)^{2}
=\displaystyle= P(Xi,jt=0|Xi,jt1=0)(α1,i,j1+α0,i,j+α1,i,j)k1α0,i,j(1+α0,i,j)2P(Xi,jtk=0)1μi,j\displaystyle P\left(X_{i,j}^{t}=0\Big{|}X_{i,j}^{t-1}=0\right)\left(\frac{\alpha_{1,i,j}}{1+\alpha_{0,i,j}+\alpha_{1,i,j}}\right)^{k-1}\frac{\alpha_{0,i,j}}{(1+\alpha_{0,i,j})^{2}}P\left(X_{i,j}^{t-k}=0\right)^{-1}\mu^{\prime}_{i,j}
=\displaystyle= (α1,i,j1+α0,i,j+α1,i,j)k1α0,i,j(1+α1,i,j)2(1+α0,i,j)2(1+α0,i,j+α1,i,j)2\displaystyle\left(\frac{\alpha_{1,i,j}}{1+\alpha_{0,i,j}+\alpha_{1,i,j}}\right)^{k-1}\frac{\alpha_{0,i,j}(1+\alpha_{1,i,j})^{2}}{(1+\alpha_{0,i,j})^{2}(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}
=\displaystyle= (α1,i,j1+α0,i,j+α1,i,j)k1γi,j(1).\displaystyle\left(\frac{\alpha_{1,i,j}}{1+\alpha_{0,i,j}+\alpha_{1,i,j}}\right)^{k-1}\gamma_{i,j}^{\prime}(1).

This proves (ii). ∎
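The closed form for ${\rm E}(X_{i,j}^{t}X_{i,j}^{t-1})$ in Lemma 4 can be checked by Monte Carlo simulation of a single edge; the simulator below implements the AR(1) transition implied by the innovation probabilities, writing $a_0,a_1$ for $\alpha_{0,i,j},\alpha_{1,i,j}$:

```python
import numpy as np

def simulate_edge(a0, a1, n, rng):
    """One edge of the AR(1) process: eps = 0 (form a tie) w.p. a0/(1+a0+a1),
    eps = 1 (retain previous state) w.p. a1/(1+a0+a1), else eps = -1 (dissolve)."""
    denom = 1.0 + a0 + a1
    x = int(rng.random() < a0 / (1.0 + a0))  # start at the stationary distribution
    path = []
    for _ in range(n):
        u = rng.random()
        if u < a0 / denom:
            x = 1
        elif u >= (a0 + a1) / denom:
            x = 0
        # otherwise eps = 1: x keeps its previous value
        path.append(x)
    return np.array(path)

# Monte Carlo check against the closed form in Lemma 4
a0, a1 = 1.0, 0.5
rng = np.random.default_rng(1)
path = simulate_edge(a0, a1, 200_000, rng)
mc = float(np.mean(path[1:] * path[:-1]))
closed = a0 * (a0 + a1) / ((1 + a0) * (1 + a0 + a1))
```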

Lemma 5.

Let $\{{\mathbf{X}}^{t}\}\sim P_{\boldsymbol{\theta}}$. Under condition (A1) we have

\sup_{1\leq i<j\leq p}\left\{{\rm Var}\left(\sum_{t=1}^{n}X_{i,j}^{t}X_{i,j}^{t-1}\right),\quad{\rm Var}\left(\sum_{t=1}^{n}X_{i,j}^{t}\right)\right\}=O(n).
Proof.

Let $Y_{1},Y_{2},\ldots$ be a sequence of Bernoulli random variables with ${\rm E}Y_{i}=\mu$ and ${\rm Var}(Y_{i})=\sigma^{2}$ for all $i=1,2,\ldots$, and assume that ${\rm Cov}\left(Y_{i},Y_{j}\right)\leq\sigma^{2}\rho^{|i-j|}$ for some $0\leq\rho<1$. We have

{\rm Var}\left(\sum_{i=1}^{n}Y_{i}\right)=\sum_{i=1}^{n}{\rm Var}\left(Y_{i}\right)+\sum_{1\leq i\neq j\leq n}{\rm Cov}\left(Y_{i},Y_{j}\right)
\leq\sigma^{2}\left(n+2\rho(n-1)+2\rho^{2}(n-2)+\cdots+2\rho^{n-1}\right)
\leq 2\sigma^{2}\left(n+\rho(n-1)+\rho^{2}(n-2)+\cdots+\rho^{n-1}\right)
\leq\frac{2n\sigma^{2}}{1-\rho}.

Lemma 5 then follows directly from Proposition 1, Lemma 4, condition (A1), and the above inequality. ∎
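The final geometric-series bound can be confirmed numerically; the helpers below compare the exact bracketed series with the simple bound $2n\sigma^{2}/(1-\rho)$:

```python
import numpy as np

def var_bound_exact(n, sigma2, rho):
    """Exact value of sigma^2 * (n + 2 * sum_{k=1}^{n-1} rho^k * (n - k))."""
    k = np.arange(1, n)
    return sigma2 * (n + 2.0 * float(np.sum(rho**k * (n - k))))

def var_bound_simple(n, sigma2, rho):
    """The simple bound 2 n sigma^2 / (1 - rho) from the proof of Lemma 5."""
    return 2.0 * n * sigma2 / (1.0 - rho)
```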

Lemma 6.

Suppose $Z_{i}$, $i=1,\cdots,p$, are independent random variables with

{\rm E}\left(Z_{i}\right)=0,\quad{\rm Var}\left(Z_{i}\right)\leq\sigma^{2},

and $Z_{i}\leq b$ almost surely. Then, for any constant $c>0$, there exists a large enough constant $C>0$ such that, with probability greater than $1-p^{-c}$,

\left|\sum_{i=1}^{p}Z_{i}\right|\leq C\left[\sqrt{p\log(p)}\,\sigma+b\log(p)\right].
Proof.

This is a direct result of Bernstein’s inequality [22]. ∎

A.2 Proof of Proposition 1

Proof.


(i) follows directly from Proposition 1 of [14]. For (ii), we have

E(dit)\displaystyle{\rm E}\left(d_{i}^{t}\right) =\displaystyle= k=1,kipE(Xi,kt)=k=1,kipeβi,0+βk,01+eβi,0+βk,0,\displaystyle\sum_{k=1,\>k\neq i}^{p}{\rm E}\left(X_{i,k}^{t}\right)=\sum_{k=1,\>k\neq i}^{p}\frac{e^{\beta_{i,0}+\beta_{k,0}}}{1+e^{\beta_{i,0}+\beta_{k,0}}},
Var(dit)\displaystyle{\rm Var}\left(d_{i}^{t}\right) =\displaystyle= k=1,kipVar(Xi,kt)=k=1,kipeβi,0+βk,0(1+eβi,0+βk,0)2,\displaystyle\sum_{k=1,\>k\neq i}^{p}{\rm Var}\left(X_{i,k}^{t}\right)=\sum_{k=1,\>k\neq i}^{p}\frac{e^{\beta_{i,0}+\beta_{k,0}}}{(1+e^{\beta_{i,0}+\beta_{k,0}})^{2}},
Cov(dit,dis)\displaystyle{\rm Cov}\left(d_{i}^{t},d_{i}^{s}\right) =\displaystyle= k=1,kipCov(Xi,kt,Xi,ks)\displaystyle\sum_{k=1,\>k\neq i}^{p}{\rm Cov}\left(X_{i,k}^{t},X_{i,k}^{s}\right)
=\displaystyle= k=1,kip(eβi,1+βk,11+r=01eβi,r+βk,r)|ts|eβi,0+βk,0(1+eβi,0+βk,0)2.\displaystyle\sum_{k=1,\>k\neq i}^{p}\left(\frac{e^{\beta_{i,1}+\beta_{k,1}}}{1+\sum_{r=0}^{1}e^{\beta_{i,r}+\beta_{k,r}}}\right)^{|t-s|}\frac{e^{\beta_{i,0}+\beta_{k,0}}}{(1+e^{\beta_{i,0}+\beta_{k,0}})^{2}}.

Thus,

ρi,jd(|ts|)\displaystyle\rho^{d}_{i,j}(|t-s|) \displaystyle\equiv Corr(dit,djs)\displaystyle{\rm Corr}(d_{i}^{t},d_{j}^{s})
=\displaystyle= {Ci,ρk=1,kip(eβi,1+βk,11+r=01eβi,r+βk,r)|ts|eβi,0+βk,0(1+eβi,0+βk,0)2ifi=j,0ifij,\displaystyle\begin{cases}C_{i,\rho}\sum_{k=1,\>k\neq i}^{p}\left(\frac{e^{\beta_{i,1}+\beta_{k,1}}}{1+\sum_{r=0}^{1}e^{\beta_{i,r}+\beta_{k,r}}}\right)^{|t-s|}\frac{e^{\beta_{i,0}+\beta_{k,0}}}{(1+e^{\beta_{i,0}+\beta_{k,0}})^{2}}\quad&{\rm if}\;i=j,\\ 0&{\rm if}\;i\neq j,\end{cases}

where Ci,ρ=(k=1,kipeβi,0+βk,0(1+eβi,0+βk,0)2)1C_{i,\rho}=\left(\sum_{k=1,\>k\neq i}^{p}\frac{e^{\beta_{i,0}+\beta_{k,0}}}{(1+e^{\beta_{i,0}+\beta_{k,0}})^{2}}\right)^{-1}.
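The closed form for $\rho^{d}_{i,i}(|t-s|)$ admits a direct numerical implementation; the function below is an illustrative evaluation of the expression above (with a 0-based node index $i$):

```python
import numpy as np

def degree_autocorr(beta0, beta1, i, lag):
    """Autocorrelation of node i's degree at the given lag, following the
    closed form in the proof of Proposition 1 (ii)."""
    p = len(beta0)
    ks = np.array([k for k in range(p) if k != i])
    a0 = np.exp(beta0[i] + np.asarray(beta0)[ks])
    a1 = np.exp(beta1[i] + np.asarray(beta1)[ks])
    var_terms = a0 / (1 + a0) ** 2                       # Var(X_{i,k}^t) terms
    cov_terms = (a1 / (1 + a0 + a1)) ** lag * var_terms  # Cov(d_i^t, d_i^s) terms
    return float(cov_terms.sum() / var_terms.sum())
```

For example, with all heterogeneity parameters equal to zero, the lag-1 autocorrelation reduces to $(\alpha_1/(1+\alpha_0+\alpha_1))^1 = 1/3$.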

A.3 Proof of Lemma 1

Proof.

By Proposition 3 of [14] and Lemma 4, the process $\{{\mathbf{X}}^{t},t=1,2,\ldots\}$ is $\alpha$-mixing with an exponentially decaying rate. Since the mixing property is hereditary, the processes $\{\left(1-{\mathbf{X}}^{t}\right)\circ\left(1-{\mathbf{X}}^{t-1}\right),t=1,2,\ldots\}$ and $\{{\mathbf{X}}^{t}\circ{\mathbf{X}}^{t-1},t=1,2,\ldots\}$ are also $\alpha$-mixing with exponentially decaying rates. From Theorem 1 in [24], we obtain the following concentration inequalities: there exists a positive constant $C$ such that

𝐏(|t=1n{Xi,jtE(Xi,jt)}|>ϵ)exp(Cϵ2n+ϵlog(n)loglog(n)),\displaystyle{\mathbf{P}}\left(\left|\sum_{t=1}^{n}\left\{X_{i,j}^{t}-{\rm E}\left(X_{i,j}^{t}\right)\right\}\right|>\epsilon\right)\leq\exp\left(\frac{-C\epsilon^{2}}{n+\epsilon\log\left(n\right)\log\log\left(n\right)}\right),
𝐏(|t=1n{Xi,jtXi,jt1E(Xi,jtXi,jt1)}|>ϵ)exp(Cϵ2n+ϵlog(n)loglog(n)),\displaystyle{\mathbf{P}}\left(\left|\sum_{t=1}^{n}\left\{X_{i,j}^{t}X_{i,j}^{t-1}-{\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)\right\}\right|>\epsilon\right)\leq\exp\left(\frac{-C\epsilon^{2}}{n+\epsilon\log\left(n\right)\log\log\left(n\right)}\right),
𝐏(|t=1n{(1Xi,jt)(1Xi,jt1)E((1Xi,jt)(1Xi,jt1))}|>ϵ)\displaystyle{\mathbf{P}}\left(\left|\sum_{t=1}^{n}\left\{(1-X_{i,j}^{t})(1-X_{i,j}^{t-1})-{\rm E}\left((1-X_{i,j}^{t})(1-X_{i,j}^{t-1})\right)\right\}\right|>\epsilon\right)
\displaystyle\leq exp(Cϵ2n+ϵlog(n)loglog(n)),\displaystyle\exp\left(\frac{-C\epsilon^{2}}{n+\epsilon\log\left(n\right)\log\log\left(n\right)}\right),

hold for all $1\leq i<j\leq p$. For any positive constant $c>0$, by setting $\epsilon=c_{1}\sqrt{n\log(np)}+c_{1}\log\left(n\right)\log\log\left(n\right)\log\left(np\right)$ with a large enough constant $c_{1}>0$, we have, with probability greater than $1-(np)^{-c}$,

|t=1n{Xi,jtE(Xi,jt)}|ϵ,\displaystyle\left|\sum_{t=1}^{n}\left\{X_{i,j}^{t}-{\rm E}\left(X_{i,j}^{t}\right)\right\}\right|\leq\epsilon,
|t=1n{Xi,jtXi,jt1E(Xi,jtXi,jt1)}|ϵ,\displaystyle\left|\sum_{t=1}^{n}\left\{X_{i,j}^{t}X_{i,j}^{t-1}-{\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)\right\}\right|\leq\epsilon,
|t=1n{(1Xi,jt)(1Xi,jt1)E((1Xi,jt)(1Xi,jt1))}|ϵ,\displaystyle\left|\sum_{t=1}^{n}\left\{(1-X_{i,j}^{t})(1-X_{i,j}^{t-1})-{\rm E}\left((1-X_{i,j}^{t})(1-X_{i,j}^{t-1})\right)\right\}\right|\leq\epsilon,

hold for all 1i<jp1\leq i<j\leq p. ∎

A.4 Proof of Lemma 2

Proof.

Note that for $a,b,c,d\in[0,1]$, we have

\left|ab-cd\right|\leq\left|ab-cb\right|+\left|cb-cd\right|=\left|a-c\right|b+\left|b-d\right|c\leq\left|a-c\right|+\left|b-d\right|.
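This elementary inequality is easy to confirm numerically on random points of $[0,1]^{4}$:

```python
import numpy as np

# Check |ab - cd| <= |a - c| + |b - d| for a, b, c, d in [0, 1]
rng = np.random.default_rng(0)
a, b, c, d = rng.random((4, 100_000))
lhs = np.abs(a * b - c * d)
rhs = np.abs(a - c) + np.abs(b - d)
violations = int(np.sum(lhs > rhs + 1e-12))
```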

With $-\kappa_{0}\leq\theta_{i,0}\leq\kappa_{0}$ and $-\kappa_{1}\leq\theta_{i,1}\leq\kappa_{1}$ for all $1\leq i\leq p$, and $\kappa_{r}=\max\left(\kappa_{r},\kappa_{r}\right)$ for $r=0,1$, we then have that there exist positive constants $C_{1},C_{2}$ such that, for all $1\leq i\neq j\leq p$ and $\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},c_{r}e^{-4\kappa_{0}-4\kappa_{1}}\right)$, where $c_{r}>0$ is a small enough constant,

{\rm E}({\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{1}(\boldsymbol{\theta}))_{i,j}=\frac{1}{p}\frac{\alpha_{0,i,j}}{(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}\geq C_{1}\frac{e^{-4\kappa_{1}-2\kappa_{0}}}{p},
E(𝐕2(𝜽))i,j\displaystyle\quad\quad{\rm E}(-{\mathbf{V}}_{2}(\boldsymbol{\theta}))_{i,j}
\begin{align*}
=\;&\frac{1}{p}\left(\frac{\alpha_{0,i,j}\alpha_{1,i,j}}{(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-{\rm E}(b_{i,j})\frac{\alpha_{0,i,j}\alpha_{1,i,j}}{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}\right)\\
=\;&\frac{1}{p}\frac{\alpha_{0,i,j}\alpha_{1,i,j}}{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}\left(\frac{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}{(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-{\rm E}(b_{i,j})\right)\\
=\;&\frac{1}{p}\frac{\alpha_{0,i,j}\alpha_{1,i,j}}{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}\Bigg\{\left(\frac{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}{(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-\frac{\alpha_{0,i,j}(\alpha_{0,i,j}+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})}\right)\\
&-\left(\frac{\alpha^{*}_{0,i,j}(\alpha^{*}_{0,i,j}+\alpha^{*}_{1,i,j})}{(1+\alpha^{*}_{0,i,j})(1+\alpha^{*}_{0,i,j}+\alpha^{*}_{1,i,j})}-\frac{\alpha_{0,i,j}(\alpha_{0,i,j}+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})}\right)\Bigg\}\\
\geq\;&\frac{1}{p}\frac{\alpha_{0,i,j}\alpha_{1,i,j}}{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}\Bigg\{\frac{\alpha_{1,i,j}(\alpha_{0,i,j}+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}\\
&-\left(\left|\frac{\alpha^{*}_{0,i,j}}{1+\alpha^{*}_{0,i,j}}-\frac{\alpha_{0,i,j}}{1+\alpha_{0,i,j}}\right|+\left|\frac{\alpha^{*}_{0,i,j}+\alpha^{*}_{1,i,j}}{1+\alpha^{*}_{0,i,j}+\alpha^{*}_{1,i,j}}-\frac{\alpha_{0,i,j}+\alpha_{1,i,j}}{1+\alpha_{0,i,j}+\alpha_{1,i,j}}\right|\right)\Bigg\}\\
\geq\;&\frac{1}{p}\frac{\alpha_{0,i,j}\alpha_{1,i,j}}{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}\Bigg\{\frac{\alpha_{1,i,j}(\alpha_{0,i,j}+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}\\
&-\left(\left|\frac{\alpha_{0,i,j}\left(\frac{\alpha^{*}_{0,i,j}}{\alpha_{0,i,j}}-1\right)}{\big(1+\alpha_{0,i,j}\big)\big(1+\alpha^{*}_{0,i,j}\big)}\right|+\left|\frac{\alpha_{0,i,j}\left(\frac{\alpha^{*}_{0,i,j}}{\alpha_{0,i,j}}-1\right)+\alpha_{1,i,j}\left(\frac{\alpha^{*}_{1,i,j}}{\alpha_{1,i,j}}-1\right)}{\big(1+\alpha^{*}_{0,i,j}+\alpha^{*}_{1,i,j}\big)\big(1+\alpha_{0,i,j}+\alpha_{1,i,j}\big)}\right|\right)\Bigg\}\\
\geq\;&\frac{1}{p}\frac{\alpha_{0,i,j}\alpha_{1,i,j}}{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}\Bigg\{\frac{\alpha_{1,i,j}(\alpha_{0,i,j}+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-\frac{2\left|\frac{\alpha^{*}_{0,i,j}}{\alpha_{0,i,j}}-1\right|+\left|\frac{\alpha^{*}_{1,i,j}}{\alpha_{1,i,j}}-1\right|}{1+\alpha^{*}_{0,i,j}+\alpha^{*}_{1,i,j}}\Bigg\}\\
\geq\;&C_{1}\frac{e^{-6\kappa_{0}-4\kappa_{1}}}{p},
\end{align*}

and

\begin{align*}
&{\rm E}({\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{3}(\boldsymbol{\theta}))_{i,j}\\
=\;&\frac{1}{p}\left(\frac{\alpha_{1,i,j}}{(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-{\rm E}(d_{i,j})\frac{\alpha_{1,i,j}}{(1+\alpha_{1,i,j})^{2}}\right)\\
=\;&\frac{1}{p}\frac{\alpha_{1,i,j}}{(1+\alpha_{1,i,j})^{2}}\left(\frac{(1+\alpha_{1,i,j})^{2}}{(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-{\rm E}(d_{i,j})\right)\\
=\;&\frac{1}{p}\frac{\alpha_{1,i,j}}{(1+\alpha_{1,i,j})^{2}}\Bigg\{\frac{(1+\alpha_{1,i,j})^{2}}{(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-\frac{1+\alpha_{1,i,j}}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})}\\
&-\left(\frac{1+\alpha^{*}_{1,i,j}}{(1+\alpha^{*}_{0,i,j})(1+\alpha^{*}_{0,i,j}+\alpha^{*}_{1,i,j})}-\frac{1+\alpha_{1,i,j}}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})}\right)\Bigg\}\\
\geq\;&\frac{1}{p}\frac{\alpha_{1,i,j}}{(1+\alpha_{1,i,j})^{2}}\Bigg\{\frac{\alpha_{0,i,j}\alpha_{1,i,j}(1+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}\\
&-\left(\left|\frac{1}{1+\alpha^{*}_{0,i,j}}-\frac{1}{1+\alpha_{0,i,j}}\right|+\left|\frac{1+\alpha^{*}_{1,i,j}}{1+\alpha^{*}_{0,i,j}+\alpha^{*}_{1,i,j}}-\frac{1+\alpha_{1,i,j}}{1+\alpha_{0,i,j}+\alpha_{1,i,j}}\right|\right)\Bigg\}\\
=\;&\frac{1}{p}\frac{\alpha_{1,i,j}}{(1+\alpha_{1,i,j})^{2}}\Bigg\{\frac{\alpha_{0,i,j}\alpha_{1,i,j}(1+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}\\
&-\left(\left|\frac{\alpha_{0,i,j}-\alpha^{*}_{0,i,j}}{\big(1+\alpha_{0,i,j}\big)\big(1+\alpha^{*}_{0,i,j}\big)}\right|+\left|\frac{\alpha_{0,i,j}-\alpha^{*}_{0,i,j}+\alpha_{0,i,j}\alpha_{1,i,j}^{*}-\alpha^{*}_{0,i,j}\alpha_{1,i,j}}{\big(1+\alpha^{*}_{0,i,j}+\alpha^{*}_{1,i,j}\big)\big(1+\alpha_{0,i,j}+\alpha_{1,i,j}\big)}\right|\right)\Bigg\}\\
=\;&\frac{1}{p}\frac{\alpha_{1,i,j}}{(1+\alpha_{1,i,j})^{2}}\Bigg\{\frac{\alpha_{0,i,j}\alpha_{1,i,j}(1+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-\left|\frac{\alpha_{0,i,j}-\alpha^{*}_{0,i,j}}{\big(1+\alpha_{0,i,j}\big)\big(1+\alpha^{*}_{0,i,j}\big)}\right|\\
&-\left|\frac{\alpha_{0,i,j}-\alpha^{*}_{0,i,j}+\alpha_{0,i,j}\alpha_{1,i,j}^{*}-\alpha_{0,i,j}^{*}\alpha_{1,i,j}^{*}+\alpha_{0,i,j}^{*}\alpha_{1,i,j}^{*}-\alpha^{*}_{0,i,j}\alpha_{1,i,j}}{\big(1+\alpha^{*}_{0,i,j}+\alpha^{*}_{1,i,j}\big)\big(1+\alpha_{0,i,j}+\alpha_{1,i,j}\big)}\right|\Bigg\}\\
=\;&\frac{1}{p}\frac{\alpha_{1,i,j}}{(1+\alpha_{1,i,j})^{2}}\Bigg\{\frac{\alpha_{0,i,j}\alpha_{1,i,j}(1+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-\left|\frac{\alpha_{0,i,j}\left(\frac{\alpha^{*}_{0,i,j}}{\alpha_{0,i,j}}-1\right)}{\big(1+\alpha_{0,i,j}\big)\big(1+\alpha^{*}_{0,i,j}\big)}\right|\\
&-\left|\frac{\alpha_{0,i,j}\left(\frac{\alpha^{*}_{0,i,j}}{\alpha_{0,i,j}}-1\right)+\alpha_{0,i,j}\alpha^{*}_{1,i,j}\left(\frac{\alpha^{*}_{0,i,j}}{\alpha_{0,i,j}}-1\right)-\alpha^{*}_{0,i,j}\alpha_{1,i,j}\left(\frac{\alpha^{*}_{1,i,j}}{\alpha_{1,i,j}}-1\right)}{\big(1+\alpha^{*}_{0,i,j}+\alpha^{*}_{1,i,j}\big)\big(1+\alpha_{0,i,j}+\alpha_{1,i,j}\big)}\right|\Bigg\}\\
\geq\;&C_{1}\frac{1}{p}\frac{\alpha_{1,i,j}}{(1+\alpha_{1,i,j})^{2}}\Bigg\{\frac{\alpha_{0,i,j}\alpha_{1,i,j}(1+\alpha_{1,i,j})}{(1+\alpha_{0,i,j})(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-\left(2\left|\frac{\alpha^{*}_{0,i,j}}{\alpha_{0,i,j}}-1\right|+\left|\frac{\alpha^{*}_{1,i,j}}{\alpha_{1,i,j}}-1\right|\right)\Bigg\}\\
\geq\;&C_{1}\frac{e^{-4\kappa_{0}-4\kappa_{1}}}{p}.
\end{align*}

Notice that the elements of $-{\rm E}{\mathbf{V}}_{2}(\boldsymbol{\theta})$, ${\rm E}({\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{3}(\boldsymbol{\theta}))$ and ${\rm E}({\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{1}(\boldsymbol{\theta}))$ are all positive. Write ${\mathbf{z}}=({\mathbf{z}}_{1}^{\top},{\mathbf{z}}_{2}^{\top})^{\top}$ with ${\mathbf{z}}_{1}=(z_{1,1},\ldots,z_{1,p})^{\top}\in\mathbb{R}^{p}$ and ${\mathbf{z}}_{2}=(z_{2,1},\ldots,z_{2,p})^{\top}\in\mathbb{R}^{p}$. Then there exists a constant $C>0$ such that

\begin{align*}
&\left\|{\rm E}({\mathbf{V}}(\boldsymbol{\theta}))\right\|_{2}\\
\geq\;&\inf_{\|{\mathbf{z}}\|_{2}=1}\Bigg(\sum_{1\leq i<j\leq p}\Big({\rm E}\left({\mathbf{V}}_{1}(\boldsymbol{\theta})+{\mathbf{V}}_{2}(\boldsymbol{\theta})\right)_{i,j}(z_{1,i}+z_{1,j})^{2}\\
&+{\rm E}\left({\mathbf{V}}_{3}(\boldsymbol{\theta})+{\mathbf{V}}_{2}(\boldsymbol{\theta})\right)_{i,j}(z_{2,i}+z_{2,j})^{2}-{\rm E}\left({\mathbf{V}}_{2}(\boldsymbol{\theta})\right)_{i,j}\left(z_{1,i}+z_{1,j}-z_{2,i}-z_{2,j}\right)^{2}\Big)\Bigg)\\
\geq\;&\inf_{\|{\mathbf{z}}\|_{2}=1}\Bigg(\sum_{1\leq i<j\leq p}{\rm E}\left({\mathbf{V}}_{1}(\boldsymbol{\theta})+{\mathbf{V}}_{2}(\boldsymbol{\theta})\right)_{i,j}(z_{1,i}+z_{1,j})^{2}+{\rm E}\left({\mathbf{V}}_{3}(\boldsymbol{\theta})+{\mathbf{V}}_{2}(\boldsymbol{\theta})\right)_{i,j}(z_{2,i}+z_{2,j})^{2}\Bigg)\\
\geq\;&C_{2}\frac{e^{-4\kappa_{0}-4\kappa_{1}}}{p}\inf_{\|{\mathbf{z}}\|_{2}=1}\sum_{1\leq i<j\leq p}\left((z_{1,i}+z_{1,j})^{2}+(z_{2,i}+z_{2,j})^{2}\right)\\
\geq\;&Ce^{-4\kappa_{0}-4\kappa_{1}}.
\end{align*}

Here in the last step we used the fact that for any ${\mathbf{a}}=(a_{1},\ldots,a_{p})^{\top}\in\mathbb{R}^{p}$, $\sum_{1\leq i<j\leq p}(a_{i}+a_{j})^{2}={\mathbf{a}}^{\top}{\mathbf{C}}{\mathbf{a}}$ with ${\mathbf{C}}=(p-2){\mathbf{I}}_{p}+{\bf 1}_{p}{\bf 1}_{p}^{\top}$, and the fact that the eigenvalues of ${\mathbf{C}}$ are all greater than or equal to $p-2$. ∎
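The quadratic-form identity and the eigenvalue fact used in the last step can be verified numerically. The following Python snippet is our illustration only (the dimension $p=6$ and the random vector are arbitrary choices, not from the paper): it checks that $\sum_{i<j}(a_i+a_j)^2={\mathbf{a}}^{\top}{\mathbf{C}}{\mathbf{a}}$ with ${\mathbf{C}}=(p-2){\mathbf{I}}_p+{\bf 1}_p{\bf 1}_p^{\top}$, whose eigenvalues are $p-2$ (multiplicity $p-1$) and $2(p-1)$.

```python
import numpy as np

# Sanity check (not part of the proof): for any a in R^p,
# sum_{i<j} (a_i + a_j)^2 = a^T C a with C = (p-2) I_p + 1 1^T,
# and the eigenvalues of C are p-2 (multiplicity p-1) and 2(p-1).
rng = np.random.default_rng(0)
p = 6
a = rng.normal(size=p)

quad = sum((a[i] + a[j]) ** 2 for i in range(p) for j in range(i + 1, p))
C = (p - 2) * np.eye(p) + np.ones((p, p))
assert np.isclose(quad, a @ C @ a)          # the quadratic-form identity

eigs = np.linalg.eigvalsh(C)
assert eigs.min() >= p - 2 - 1e-9           # smallest eigenvalue is p - 2
assert np.isclose(eigs.max(), 2 * (p - 1))  # largest eigenvalue is 2(p - 1)
print("identity and eigenvalue bound verified")
```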

A.5 Proof of Lemma 3

Proof.

Define a collection of matrices $\{{\mathbf{Y}}_{i,j}\}$, $1\leq i<j\leq p$, where the $(i,j)$, $(j,i)$, $(i,i)$ and $(j,j)$ entries of ${\mathbf{Y}}_{i,j}$ are $Z_{i,j}$ and all other entries are zero. Then the matrices ${\mathbf{Y}}_{i,j}$ are independent and

\[
\sum_{1\leq i<j\leq p}{\mathbf{Y}}_{i,j}={\mathbf{Z}}.
\]

Since the ${\mathbf{Y}}_{i,j}$ are symmetric and centred random matrices, we have ${\rm Var}\left({\mathbf{Y}}_{i,j}\right)={\rm E}\left({\mathbf{Y}}_{i,j}{\mathbf{Y}}_{i,j}\right)$. Further, by the definition of ${\mathbf{Y}}_{i,j}$, the $(i,j)$th, $(j,i)$th, $(i,i)$th and $(j,j)$th entries of ${\rm Var}({\mathbf{Y}}_{i,j})$ all equal $2{\rm Var}\left(Z_{i,j}\right)$, while all other entries are zero. Consequently,

\begin{align*}
\|{\mathbf{Y}}_{i,j}\|_{2}&\leq b\sup_{\|a\|_{2}=1}(a_{i}+a_{j})^{2}\leq 2b,\\
\left\|\sum_{1\leq i<j\leq p}{\rm Var}({\mathbf{Y}}_{i,j})\right\|_{2}&=\sup_{\|a\|_{2}=1}\left(\sum_{1\leq i<j\leq p}2{\rm Var}\left(Z_{i,j}\right)(a_{i}+a_{j})^{2}\right)\\
&\leq\max_{i,j}\left(2{\rm Var}\left(Z_{i,j}\right)\right)\sup_{\|a\|_{2}=1}\left(\sum_{1\leq i<j\leq p}(a_{i}+a_{j})^{2}\right)\\
&\leq 4\sigma^{2}(p-1).
\end{align*}

Using the matrix Bernstein inequality (cf. Theorem 6.17 of [33]), we have

\[
P\left(\left\|{\mathbf{Z}}\right\|_{2}>\epsilon\right)=P\left(\Bigg\|\sum_{1\leq i<j\leq p}{\mathbf{Y}}_{i,j}\Bigg\|_{2}>\epsilon\right)\leq 2p\,\exp\left(-\frac{\epsilon^{2}}{4\sigma^{2}(p-1)+4b\epsilon}\right).
\]
∎
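The tail bound above can be illustrated by simulation. The following Python sketch uses hypothetical parameters of our own choosing (none come from the paper): it assembles ${\mathbf{Z}}=\sum_{i<j}{\mathbf{Y}}_{i,j}$ from bounded, centred entries $Z_{i,j}$ exactly as in the proof (diagonal entry $i$ equal to $\sum_{j\neq i}Z_{i,j}$) and checks that the empirical tail frequency of $\|{\mathbf{Z}}\|_{2}$ does not exceed the Bernstein bound by more than a small Monte Carlo slack.

```python
import numpy as np

# Monte Carlo illustration (hypothetical parameters): compare the empirical
# tail of ||Z||_2 with the bound 2p exp(-eps^2 / (4 sigma^2 (p-1) + 4 b eps)).
rng = np.random.default_rng(1)
p, b, sigma = 20, 1.0, 1.0          # |Z_{i,j}| <= b, Var(Z_{i,j}) <= sigma^2
eps, trials = 40.0, 200

exceed = 0
for _ in range(trials):
    z = np.triu(rng.choice([-b, b], size=(p, p)), 1)  # centred, bounded Z_{i,j}
    Z = z + z.T                                        # symmetric off-diagonals
    np.fill_diagonal(Z, Z.sum(axis=1))                 # diag entry i = sum_{j != i} Z_{i,j}
    exceed += np.linalg.norm(Z, 2) > eps               # spectral norm exceeds eps?

bound = 2 * p * np.exp(-eps**2 / (4 * sigma**2 * (p - 1) + 4 * b * eps))
assert exceed / trials <= bound + 0.05  # empirical tail within the bound plus slack
print(f"empirical tail {exceed / trials:.3f}, theoretical bound {bound:.3f}")
```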

A.6 Proof of Theorem 1

Proof.

Note that for any $\boldsymbol{\theta}$ and $i\neq j$, we have

\begin{align*}
\left({\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{1}(\boldsymbol{\theta})\right)_{i,j}&=\frac{1}{p}\frac{\alpha_{0,i,j}}{(1+\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}>0,\\
\left({\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{1}(\boldsymbol{\theta})\right)_{i,i}&=\sum_{k=1,\,k\neq i}^{p}\left({\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{1}(\boldsymbol{\theta})\right)_{i,k}.
\end{align*}

Therefore ${\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{1}(\boldsymbol{\theta})$ is always positive definite. Next we prove that each ${\mathbf{U}}(\boldsymbol{\theta})\in\{-{\mathbf{V}}_{2}(\boldsymbol{\theta}),\,{\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{3}(\boldsymbol{\theta})\}$ is positive definite with probability tending to one, by showing that with probability tending to 1,

\[
\inf_{\|{\mathbf{a}}\|_{2}=1}\left({\mathbf{a}}^{\top}{\rm E}({\mathbf{U}}(\boldsymbol{\theta})){\mathbf{a}}\right)>\sup_{\|{\mathbf{a}}\|_{2}=1}\left({\mathbf{a}}^{\top}({\mathbf{U}}(\boldsymbol{\theta})-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))){\mathbf{a}}\right)
\]

holds uniformly for all $\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right)$. Note that

\begin{align*}
&\left\|{\mathbf{U}}(\boldsymbol{\theta})-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right\|_{2}\\
\leq\;&\left\|{\mathbf{U}}(\boldsymbol{\theta}^{*})-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))\right\|_{2}+\left\|{\mathbf{U}}(\boldsymbol{\theta})-{\mathbf{U}}(\boldsymbol{\theta}^{*})+{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right\|_{2}.
\end{align*}

We first consider $\left\|{\mathbf{U}}(\boldsymbol{\theta}^{*})-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))\right\|_{2}$. Setting $\epsilon=c_{1}\big(\sqrt{2\sigma^{2}(p-1)\log(np)}+b\log(np)\big)$ in Lemma 3, for a sufficiently large constant $c_{1}>0$, we have

\begin{align*}
&P\left(\left\|{\mathbf{Z}}\right\|_{2}>\epsilon\right)\\
\leq\;&2p\exp\left(-c_{1}^{2}\,\frac{2\sigma^{2}(p-1)\log(np)+2b\sqrt{2\sigma^{2}(p-1)}\log^{3/2}(np)+b^{2}\log^{2}(np)}{4\sigma^{2}(p-1)+4c_{1}b\sqrt{2\sigma^{2}(p-1)\log(np)}+4c_{1}b^{2}\log(np)}\right)\\
\leq\;&2p\exp\left(-c_{1}\log(np)/4\right)\\
=\;&2p\left(np\right)^{-c_{1}/4}.
\end{align*}

As $np\to\infty$, we have, with probability greater than $1-2p\left(np\right)^{-c_{1}/4}$,

\begin{equation}
\left\|{\mathbf{Z}}\right\|_{2}\leq c_{1}\left(\sigma\sqrt{2p\log(np)}+b\log(np)\right).\tag{A.21}
\end{equation}

By Lemmas 1 and 5, there exist positive constants $C_{1}$, $c_{2}$ such that, uniformly for all $\boldsymbol{\theta}$ and $1\leq i\neq j\leq p$, with probability greater than $1-(np)^{-c_{2}}$,

\begin{align*}
|{\mathbf{U}}(\boldsymbol{\theta})_{i,j}-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))_{i,j}|&\leq C_{1}\left(\sqrt{\frac{\log(np)}{np^{2}}}+\frac{\log\left(n\right)\log\log\left(n\right)\log\left(np\right)}{np}\right);\\
{\rm Var}\left({\mathbf{U}}(\boldsymbol{\theta})_{i,j}\right)&\leq\frac{\sup_{i\neq j}\left\{{\rm Var}\left(\sum_{t=1}^{n}X_{i,j}^{t}X_{i,j}^{t-1}\right),{\rm Var}\left(\sum_{t=1}^{n}X_{i,j}^{t}\right)\right\}}{n^{2}p^{2}}\\
&\leq\frac{C_{1}}{np^{2}}.
\end{align*}

Consequently, from (A.21) we have

\[
\left\|{\mathbf{U}}(\boldsymbol{\theta}^{*})-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))\right\|_{2}=O_{p}\left(\sqrt{\frac{\log(np)}{np}}+\sqrt{\frac{\log^{3}(np)}{np^{2}}}+\frac{\log\left(n\right)\log\log\left(n\right)\log^{2}\left(np\right)}{np}\right).
\]

Next we derive uniform upper bounds for $\left\|{\mathbf{U}}(\boldsymbol{\theta})-{\mathbf{U}}(\boldsymbol{\theta}^{*})+{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right\|_{2}$. When ${\mathbf{U}}(\boldsymbol{\theta})=-{\mathbf{V}}_{2}(\boldsymbol{\theta})$, by Lemma 1 there exist positive constants $C_{2}$ and $c_{3}$ such that, with probability greater than $1-(np)^{-c_{3}}$,

\begin{align*}
&\left\|{\mathbf{U}}(\boldsymbol{\theta})-{\mathbf{U}}(\boldsymbol{\theta}^{*})+{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right\|_{2}\\
=\;&\sup_{\|{\mathbf{a}}\|_{2}=1}\sum_{1\leq i,j\leq p}\left[{\mathbf{U}}(\boldsymbol{\theta})-{\mathbf{U}}(\boldsymbol{\theta}^{*})+{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right]_{i,j}a_{i}a_{j}\\
=\;&\sup_{\|{\mathbf{a}}\|_{2}=1}\sum_{1\leq i<j\leq p}\left[{\mathbf{U}}(\boldsymbol{\theta})-{\mathbf{U}}(\boldsymbol{\theta}^{*})+{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right]_{i,j}\left(a_{i}+a_{j}\right)^{2}\\
\leq\;&\max_{1\leq i<j\leq p}\left|\left[{\mathbf{U}}(\boldsymbol{\theta})-{\mathbf{U}}(\boldsymbol{\theta}^{*})+{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right]_{i,j}\right|\sup_{\|{\mathbf{a}}\|_{2}=1}\sum_{1\leq i<j\leq p}\left(a_{i}+a_{j}\right)^{2}\\
\leq\;&2\left(p-1\right)\max_{1\leq i<j\leq p}\left|{\mathbf{V}}_{2}(\boldsymbol{\theta})_{i,j}-{\mathbf{V}}_{2}(\boldsymbol{\theta}^{*})_{i,j}+{\rm E}\left({\mathbf{V}}_{2}(\boldsymbol{\theta}^{*})_{i,j}\right)-{\rm E}\left({\mathbf{V}}_{2}(\boldsymbol{\theta})_{i,j}\right)\right|\\
=\;&\frac{2\left(p-1\right)}{np}\max_{1\leq i<j\leq p}\left|\left(b_{i,j}-{\rm E}(b_{i,j})\right)\left(\frac{\alpha_{0,i,j}\alpha_{1,i,j}}{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-\frac{\alpha_{0,i,j}^{*}\alpha_{1,i,j}^{*}}{(\alpha_{0,i,j}^{*}+\alpha_{1,i,j}^{*})^{2}}\right)\right|\\
\leq\;&\max_{1\leq i<j\leq p}\frac{2\left|b_{i,j}-{\rm E}(b_{i,j})\right|}{n}\max_{1\leq i<j\leq p}\left|\frac{\left(\alpha_{0,i,j}^{*}\alpha_{1,i,j}-\alpha_{0,i,j}\alpha_{1,i,j}^{*}\right)\left(\alpha_{0,i,j}^{*}\alpha_{0,i,j}-\alpha_{1,i,j}\alpha_{1,i,j}^{*}\right)}{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}(\alpha_{0,i,j}^{*}+\alpha_{1,i,j}^{*})^{2}}\right|\\
=\;&\max_{1\leq i<j\leq p}\frac{2\left|b_{i,j}-{\rm E}(b_{i,j})\right|}{n}\max_{i\neq j}\left|\left(\frac{\alpha_{0,i,j}^{*}}{\alpha_{0,i,j}}-\frac{\alpha_{1,i,j}^{*}}{\alpha_{1,i,j}}\right)\frac{\alpha_{0,i,j}\alpha_{1,i,j}\left(\alpha_{0,i,j}^{*}\alpha_{0,i,j}-\alpha_{1,i,j}\alpha_{1,i,j}^{*}\right)}{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}(\alpha_{0,i,j}^{*}+\alpha_{1,i,j}^{*})^{2}}\right|\\
\leq\;&C_{2}\min\left\{1,\sqrt{\frac{\log(np)}{n}}+\frac{\log\left(n\right)\log\log\left(n\right)\log\left(np\right)}{n}\right\}\max_{1\leq i<j\leq p}\left|\frac{\alpha_{0,i,j}^{*}}{\alpha_{0,i,j}}-\frac{\alpha_{1,i,j}^{*}}{\alpha_{1,i,j}}\right|,
\end{align*}

holds uniformly for all $\boldsymbol{\theta}$. Similarly, when ${\mathbf{U}}(\boldsymbol{\theta})={\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{3}(\boldsymbol{\theta})$, by Lemma 1 there exist positive constants $C_{3}$ and $c_{4}$ such that, with probability greater than $1-(np)^{-c_{4}}$,

\begin{align*}
&\left\|{\mathbf{U}}(\boldsymbol{\theta})-{\mathbf{U}}(\boldsymbol{\theta}^{*})+{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right\|_{2}\\
\leq\;&\max_{1\leq i<j\leq p}\left|\left[{\mathbf{U}}(\boldsymbol{\theta})-{\mathbf{U}}(\boldsymbol{\theta}^{*})+{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right]_{i,j}\right|\sup_{\|{\mathbf{a}}\|_{2}=1}\sum_{1\leq i<j\leq p}\left(a_{i}+a_{j}\right)^{2}\\
\leq\;&2\left(p-1\right)\max_{1\leq i<j\leq p}\Big|{\mathbf{V}}_{2}(\boldsymbol{\theta})_{i,j}+{\mathbf{V}}_{3}(\boldsymbol{\theta})_{i,j}-{\mathbf{V}}_{2}(\boldsymbol{\theta}^{*})_{i,j}-{\mathbf{V}}_{3}(\boldsymbol{\theta}^{*})_{i,j}\\
&\qquad\qquad+{\rm E}\left({\mathbf{V}}_{2}(\boldsymbol{\theta}^{*})_{i,j}\right)+{\rm E}\left({\mathbf{V}}_{3}(\boldsymbol{\theta}^{*})_{i,j}\right)-{\rm E}\left({\mathbf{V}}_{2}(\boldsymbol{\theta})_{i,j}\right)-{\rm E}\left({\mathbf{V}}_{3}(\boldsymbol{\theta})_{i,j}\right)\Big|\\
\leq\;&\frac{2\left(p-1\right)}{np}\max_{1\leq i<j\leq p}\left|\left(b_{i,j}-{\rm E}(b_{i,j})\right)\left(\frac{\alpha_{0,i,j}\alpha_{1,i,j}}{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}}-\frac{\alpha_{0,i,j}^{*}\alpha_{1,i,j}^{*}}{(\alpha_{0,i,j}^{*}+\alpha_{1,i,j}^{*})^{2}}\right)\right|\\
&+\frac{2\left(p-1\right)}{np}\max_{1\leq i<j\leq p}\left|\left(d_{i,j}-{\rm E}(d_{i,j})\right)\left(\frac{\alpha_{1,i,j}}{(1+\alpha_{1,i,j})^{2}}-\frac{\alpha_{1,i,j}^{*}}{(1+\alpha_{1,i,j}^{*})^{2}}\right)\right|\\
\leq\;&2\max_{1\leq i<j\leq p}\frac{\left|b_{i,j}-{\rm E}(b_{i,j})\right|}{n}\max_{1\leq i<j\leq p}\left|\left(\frac{\alpha_{1,i,j}^{*}}{\alpha_{1,i,j}}-\frac{\alpha_{0,i,j}^{*}}{\alpha_{0,i,j}}\right)\frac{\alpha_{0,i,j}\alpha_{1,i,j}\left(\alpha_{0,i,j}\alpha_{0,i,j}^{*}-\alpha_{1,i,j}\alpha_{1,i,j}^{*}\right)}{(\alpha_{0,i,j}+\alpha_{1,i,j})^{2}(\alpha_{0,i,j}^{*}+\alpha_{1,i,j}^{*})^{2}}\right|\\
&+2\max_{1\leq i<j\leq p}\frac{\left|d_{i,j}-{\rm E}(d_{i,j})\right|}{n}\max_{1\leq i<j\leq p}\left|\left(1-\frac{\alpha_{1,i,j}^{*}}{\alpha_{1,i,j}}\right)\frac{\alpha_{1,i,j}\left(1-\alpha_{1,i,j}\alpha_{1,i,j}^{*}\right)}{(1+\alpha_{1,i,j})^{2}(1+\alpha_{1,i,j}^{*})^{2}}\right|\\
\leq\;&C_{3}\min\left\{1,\sqrt{\frac{\log(np)}{n}}+\frac{\log\left(n\right)\log\log\left(n\right)\log\left(np\right)}{n}\right\}\left(\max_{1\leq i<j\leq p}\left|\frac{\alpha_{0,i,j}^{*}}{\alpha_{0,i,j}}-\frac{\alpha_{1,i,j}^{*}}{\alpha_{1,i,j}}\right|+\max_{1\leq i<j\leq p}\left|1-\frac{\alpha_{1,i,j}^{*}}{\alpha_{1,i,j}}\right|\right),
\end{align*}

holds uniformly for all $\boldsymbol{\theta}$. Consequently, there exist positive constants $C_{4}$ and $c_{5}$ such that, with probability greater than $1-(np)^{-c_{5}}$,

\begin{align*}
&\sup_{\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right)}\left\|{\mathbf{U}}(\boldsymbol{\theta})-{\mathbf{U}}(\boldsymbol{\theta}^{*})+{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right\|_{2}\\
\leq\;&\max\left\{C_{3},C_{2}\right\}\min\left\{1,\sqrt{\frac{\log(np)}{n}}+\frac{\log\left(n\right)\log\log\left(n\right)\log\left(np\right)}{n}\right\}\\
&\times\sup_{\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right)}\left(\max_{1\leq i<j\leq p}\left|\frac{\alpha_{0,i,j}^{*}}{\alpha_{0,i,j}}-\frac{\alpha_{1,i,j}^{*}}{\alpha_{1,i,j}}\right|+\max_{1\leq i<j\leq p}\left|1-\frac{\alpha_{1,i,j}^{*}}{\alpha_{1,i,j}}\right|\right)\\
\leq\;&\max\left\{C_{3},C_{2}\right\}\min\left\{1,\sqrt{\frac{\log(np)}{n}}+\frac{\log\left(n\right)\log\log\left(n\right)\log\left(np\right)}{n}\right\}\\
&\times\sup_{\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r\right)}\left(\max_{1\leq i<j\leq p}\left|\frac{\alpha_{0,i,j}^{*}}{\alpha_{0,i,j}}-1\right|+2\max_{1\leq i<j\leq p}\left|\frac{\alpha_{1,i,j}^{*}}{\alpha_{1,i,j}}-1\right|\right)\\
\leq\;&C_{4}\min\left\{1,\sqrt{\frac{\log(np)}{n}}+\frac{\log\left(n\right)\log\log\left(n\right)\log\left(np\right)}{n}\right\}r.
\end{align*}

On the other hand, by inequalities (A.4) and (A.4), there exists a positive constant $C_{5}$ such that

\begin{align*}
\inf_{\|{\mathbf{a}}\|_{2}=1}\left({\mathbf{a}}^{\top}{\rm E}({\mathbf{U}}(\boldsymbol{\theta})){\mathbf{a}}\right)&=\inf_{\|{\mathbf{a}}\|_{2}=1}\sum_{1\leq i<j\leq p}{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))_{i,j}\left(a_{i}+a_{j}\right)^{2}\\
&\geq C_{5}e^{-4\kappa_{0}-4\kappa_{1}},
\end{align*}

holds uniformly for any $\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},c_{r}e^{-4\kappa_{0}-4\kappa_{1}}\right)$, where $c_{r}>0$ is a small enough constant. Thus, as $np\to\infty$, if $\kappa_{0}+\kappa_{1}\leq c\log\frac{np}{\log(np)}$ for some small enough constant $c>0$ and $r=c_{r}e^{-4\kappa_{0}-4\kappa_{1}}$ with $0<c_{r}<C_{5}/(2C_{4})$, then there exist large enough positive constants $C_{6},c_{6}$ such that, with probability greater than $1-(np)^{-c_{6}}$,

\begin{align*}
&\left\|{\mathbf{U}}(\boldsymbol{\theta})-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right\|_{2}\\
\leq\;&\left\|{\mathbf{U}}(\boldsymbol{\theta}^{*})-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))\right\|_{2}+\left\|{\mathbf{U}}(\boldsymbol{\theta})-{\mathbf{U}}(\boldsymbol{\theta}^{*})+{\rm E}({\mathbf{U}}(\boldsymbol{\theta}^{*}))-{\rm E}({\mathbf{U}}(\boldsymbol{\theta}))\right\|_{2}\\
\leq\;&C_{6}\left(\sqrt{\frac{\log(np)}{np}}+\sqrt{\frac{\log^{3}(np)}{np^{2}}}+\frac{\log\left(n\right)\log\log\left(n\right)\log^{2}\left(np\right)}{np}\right)+c_{r}C_{4}e^{-4\kappa_{0}-4\kappa_{1}}\\
\leq\;&2c_{r}C_{4}e^{-4\kappa_{0}-4\kappa_{1}}\\
<\;&C_{5}e^{-4\kappa_{0}-4\kappa_{1}}\\
\leq\;&\inf_{\|{\mathbf{a}}\|_{2}=1}\left({\mathbf{a}}^{\top}{\rm E}({\mathbf{U}}(\boldsymbol{\theta})){\mathbf{a}}\right),
\end{align*}

holds uniformly for all $\boldsymbol{\theta}\in{\mathbf{B}}_{\infty}(\boldsymbol{\theta}^{*},c_{r}e^{-4\kappa_{0}-4\kappa_{1}})$. The statements in Theorem 1 then follow. ∎
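The structural fact used at the start of this proof also admits a quick numerical check. The Python sketch below (our illustration with arbitrary parameters, not from the paper) builds a symmetric matrix with strictly positive off-diagonal entries whose diagonal entries equal the corresponding off-diagonal row sums, exactly the structure of ${\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{1}(\boldsymbol{\theta})$, and confirms that it is positive definite for $p\geq 3$ (its quadratic form is $\sum_{i<j}A_{i,j}(a_{i}+a_{j})^{2}$).

```python
import numpy as np

# Sanity check: a symmetric matrix with positive off-diagonal entries and
# diagonal entries equal to the off-diagonal row sums is positive definite
# for p >= 3, since a^T A a = sum_{i<j} A_{ij} (a_i + a_j)^2.
rng = np.random.default_rng(2)
p = 8
A = np.triu(rng.uniform(0.1, 1.0, size=(p, p)), 1)
A = A + A.T                         # positive off-diagonals, symmetric
np.fill_diagonal(A, A.sum(axis=1))  # diagonal = off-diagonal row sums

min_eig = np.linalg.eigvalsh(A).min()
assert min_eig > 0                  # positive definite
print("smallest eigenvalue is positive:", min_eig > 0)
```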

A.7 Proof of Theorem 2

Proof.

Recall that

\begin{align*}
l(\boldsymbol{\theta}):=\;&\frac{1}{p}\sum_{1\leq i<j\leq p}\log\Big(1+e^{\beta_{i,0}+\beta_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}}\Big)-\frac{1}{np}\sum_{1\leq i<j\leq p}\Bigg\{\left(\beta_{i,0}+\beta_{j,0}\right)\sum_{t=1}^{n}X_{i,j}^{t}\\
&+\log\left(1+e^{\beta_{i,1}+\beta_{j,1}}\right)\sum_{t=1}^{n}\left(1-X_{i,j}^{t}\right)\left(1-X_{i,j}^{t-1}\right)+\log\big(1+e^{\beta_{i,1}+\beta_{j,1}-\beta_{i,0}-\beta_{j,0}}\big)\sum_{t=1}^{n}X_{i,j}^{t}X_{i,j}^{t-1}\Bigg\},
\end{align*}

and write $l_{E}(\boldsymbol{\theta})={\rm E}\,l(\boldsymbol{\theta})$; that is,

\begin{align*}
l_{E}(\boldsymbol{\theta}):=\;&\frac{1}{p}\sum_{1\leq i<j\leq p}\log\Big(1+e^{\beta_{i,0}+\beta_{j,0}}+e^{\beta_{i,1}+\beta_{j,1}}\Big)-\frac{1}{np}\sum_{1\leq i<j\leq p}\Bigg\{\left(\beta_{i,0}+\beta_{j,0}\right)\sum_{t=1}^{n}{\rm E}\left(X_{i,j}^{t}\right)\\
&+\log\left(1+e^{\beta_{i,1}+\beta_{j,1}}\right)\sum_{t=1}^{n}{\rm E}\left[\left(1-X_{i,j}^{t}\right)\left(1-X_{i,j}^{t-1}\right)\right]\\
&+\log\big(1+e^{\beta_{i,1}+\beta_{j,1}-\beta_{i,0}-\beta_{j,0}}\big)\sum_{t=1}^{n}{\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)\Bigg\}.
\end{align*}

Define ${\mathbf{D}}_{n,p}:=\left\{\check{\boldsymbol{\theta}}:l\left(\check{\boldsymbol{\theta}}\right)-l(\boldsymbol{\theta}^{*})\leq 0\ \text{and}\ \|\check{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\|_{\infty}\leq c_{r}e^{-4\kappa_{0}-4\kappa_{1}}\right\}$. Note that when $c_{r}$ is small enough, $c_{r}e^{-4\kappa_{0}-4\kappa_{1}}<\alpha_{0}$. By Corollary 4, there exist large enough positive constants $C_{1},c_{1}$ such that, with probability greater than $1-(np)^{-c_{1}}$,

\[
\left|\big(l(\boldsymbol{\theta}^{*})-l\left(\check{\boldsymbol{\theta}}\right)\big)-\big(l_{E}(\boldsymbol{\theta}^{*})-l_{E}\left(\check{\boldsymbol{\theta}}\right)\big)\right|\leq C_{1}\left(1+\frac{\log(np)}{\sqrt{p}}\right)\sqrt{\frac{\log(np)}{n}}\left\|\check{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{2},
\]

holds uniformly for all random variables $\check{\boldsymbol{\theta}}\in{\mathbf{D}}_{n,p}$. Note that $\boldsymbol{\theta}^{*}$ is the minimizer of $l_{E}(\cdot)$. By Lemma 2, there exists a constant $C_{2}>0$ such that, for all $\check{\boldsymbol{\theta}}\in{\mathbf{D}}_{n,p}$,

\begin{align*}
&\big(l(\boldsymbol{\theta}^{*})-l\left(\check{\boldsymbol{\theta}}\right)\big)-\big(l_{E}(\boldsymbol{\theta}^{*})-l_{E}\left(\check{\boldsymbol{\theta}}\right)\big)\\
\geq\;&l_{E}\left(\check{\boldsymbol{\theta}}\right)-l_{E}(\boldsymbol{\theta}^{*})\\
=\;&\frac{1}{2}\left(\check{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right)^{\top}\nabla^{2}l_{E}(c_{\boldsymbol{\theta}}\boldsymbol{\theta}^{*}+(1-c_{\boldsymbol{\theta}})\check{\boldsymbol{\theta}})\left(\check{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right)\\
\geq\;&\frac{1}{2}\left\|\check{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{2}^{2}\inf_{\|{\mathbf{a}}\|_{2}\leq 1}\left({\mathbf{a}}^{\top}\nabla^{2}l_{E}(c_{\boldsymbol{\theta}}\boldsymbol{\theta}^{*}+(1-c_{\boldsymbol{\theta}})\check{\boldsymbol{\theta}}){\mathbf{a}}\right)\\
\geq\;&\frac{C_{2}}{e^{4\kappa_{0}+4\kappa_{1}}}\left\|\check{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{2}^{2}.
\end{align*}

Here $0\leq c_{\boldsymbol{\theta}}\leq 1$ is a random scalar that may depend on $\check{\boldsymbol{\theta}}$. Consequently, as $np\to\infty$, with probability greater than $1-(np)^{-c_{1}}$, there exists a constant $C_{3}>0$ such that

\begin{equation}
\sup_{\check{\boldsymbol{\theta}}\in{\mathbf{D}}_{n,p}}\left\|\check{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{2}\leq C_{3}e^{4\kappa_{0}+4\kappa_{1}}\sqrt{\frac{\log(np)}{n}}\left(1+\frac{\log(np)}{\sqrt{p}}\right).\tag{A.22}
\end{equation}

Note that the MLE $\widehat{\boldsymbol{\theta}}$ satisfies $\widehat{\boldsymbol{\theta}}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},c_{r}e^{-4\kappa_{0}-4\kappa_{1}}\right)$ and $l(\widehat{\boldsymbol{\theta}})\leq l(\boldsymbol{\theta}^{*})$. Therefore $\widehat{\boldsymbol{\theta}}\in{\mathbf{D}}_{n,p}$, and from (A.22) we conclude that, as $np\to\infty$, with probability tending to 1,

\[
\frac{1}{\sqrt{p}}\left\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{2}\leq C_{3}e^{4\kappa_{0}+4\kappa_{1}}\sqrt{\frac{\log(np)}{np}}\left(1+\frac{\log(np)}{\sqrt{p}}\right).
\]
∎

A.8 Proof of Theorem 3

Before presenting the proof, we first introduce a technical lemma.

Lemma 7.

For all 𝐚,𝐛,𝐜,𝐝K{\mathbf{a}},{\mathbf{b}},{\mathbf{c}},{\mathbf{d}}\in\mathbb{R}^{K} s.t. max{𝐚𝐝,𝐛𝐜}1\max\{\|{\mathbf{a}}-{\mathbf{d}}\|_{\infty},\|{\mathbf{b}}-{\mathbf{c}}\|_{\infty}\}\leq 1 and any positive z1,z2z_{1},z_{2} s.t. z1z21/4z_{1}z_{2}\geq 1/4, we have:

\frac{\left(\sum_{k=1}^{K}e^{a_{k}+b_{k}}\right)\left(\sum_{k=1}^{K}e^{c_{k}+d_{k}}\right)}{\left(\sum_{k=1}^{K}e^{a_{k}+c_{k}}\right)\left(\sum_{k=1}^{K}e^{b_{k}+d_{k}}\right)}\leq(e-1)^{2}K^{2}\prod_{1\leq j,s\leq K}e^{z_{1}\left|a_{j}-d_{j}\right|^{2}+z_{2}\left|b_{s}-c_{s}\right|^{2}}.

Here a_{k},b_{k},c_{k} and d_{k} denote the kth elements of {\mathbf{a}},{\mathbf{b}},{\mathbf{c}} and {\mathbf{d}}, respectively.

Proof.

Note that for all 0<x10<x\leq 1, we have 1(ex1)/xe11\leq\left(e^{x}-1\right)/x\leq e-1 and for all yy, we have |ey1|e|y|1|e^{y}-1|\leq e^{|y|}-1. Consequently, for all 𝐚,𝐛,𝐜,𝐝K{\mathbf{a}},{\mathbf{b}},{\mathbf{c}},{\mathbf{d}}\in\mathbb{R}^{K} s.t. max{𝐚𝐝,𝐛𝐜}1\max\{\|{\mathbf{a}}-{\mathbf{d}}\|_{\infty},\|{\mathbf{b}}-{\mathbf{c}}\|_{\infty}\}\leq 1, we have,

(k=1Keak+bk)(k=1Keck+dk)(k=1Keak+ck)(k=1Kebk+dk)\displaystyle\frac{\left(\sum_{k=1}^{K}e^{a_{k}+b_{k}}\right)\left(\sum_{k=1}^{K}e^{c_{k}+d_{k}}\right)}{\left(\sum_{k=1}^{K}e^{a_{k}+c_{k}}\right)\left(\sum_{k=1}^{K}e^{b_{k}+d_{k}}\right)}
=\displaystyle= 1+1j<sKeaj+bj+cs+dseaj+bs+cj+ds+eas+bs+cj+djeas+bj+cs+dj(k=1Keak+ck)(k=1Kebk+dk)\displaystyle 1+\sum_{1\leq j<s\leq K}\frac{e^{a_{j}+b_{j}+c_{s}+d_{s}}-e^{a_{j}+b_{s}+c_{j}+d_{s}}+e^{a_{s}+b_{s}+c_{j}+d_{j}}-e^{a_{s}+b_{j}+c_{s}+d_{j}}}{\left(\sum_{k=1}^{K}e^{a_{k}+c_{k}}\right)\left(\sum_{k=1}^{K}e^{b_{k}+d_{k}}\right)}
=\displaystyle= 1+1j<sKeas+bs+cj+dj(k=1Keak+ck)(k=1Kebk+dk)(eaj+dsasdj1)(ebj+csbscj1)\displaystyle 1+\sum_{1\leq j<s\leq K}\frac{e^{a_{s}+b_{s}+c_{j}+d_{j}}}{\left(\sum_{k=1}^{K}e^{a_{k}+c_{k}}\right)\left(\sum_{k=1}^{K}e^{b_{k}+d_{k}}\right)}\left(e^{a_{j}+d_{s}-a_{s}-d_{j}}-1\right)\left(e^{b_{j}+c_{s}-b_{s}-c_{j}}-1\right)
\displaystyle\leq 1+1j<sKeas+bs+cj+dj(k=1Keak+ck)(k=1Kebk+dk)(e|ajdj|+|asds|1)(e|bjcj|+|bscs|1)\displaystyle 1+\sum_{1\leq j<s\leq K}\frac{e^{a_{s}+b_{s}+c_{j}+d_{j}}}{\left(\sum_{k=1}^{K}e^{a_{k}+c_{k}}\right)\left(\sum_{k=1}^{K}e^{b_{k}+d_{k}}\right)}\left(e^{\left|a_{j}-d_{j}\right|+\left|a_{s}-d_{s}\right|}-1\right)\left(e^{\left|b_{j}-c_{j}\right|+\left|b_{s}-c_{s}\right|}-1\right)
\displaystyle\leq 1+(e1)21j<sKeas+bs+cj+dj(|ajdj|+|asds|)(|bjcj|+|bscs|)(k=1Keak+ck)(k=1Kebk+dk)\displaystyle 1+(e-1)^{2}\sum_{1\leq j<s\leq K}\frac{e^{a_{s}+b_{s}+c_{j}+d_{j}}\left(\left|a_{j}-d_{j}\right|+\left|a_{s}-d_{s}\right|\right)\left(\left|b_{j}-c_{j}\right|+\left|b_{s}-c_{s}\right|\right)}{\left(\sum_{k=1}^{K}e^{a_{k}+c_{k}}\right)\left(\sum_{k=1}^{K}e^{b_{k}+d_{k}}\right)}
\displaystyle\leq 1+(e1)21j<sKeas+bs+cj+dj(e(|ajdj|+|asds|)(|bjcj|+|bscs|)1)(k=1Keak+ck)(k=1Kebk+dk)\displaystyle 1+(e-1)^{2}\sum_{1\leq j<s\leq K}\frac{e^{a_{s}+b_{s}+c_{j}+d_{j}}\left(e^{\left(\left|a_{j}-d_{j}\right|+\left|a_{s}-d_{s}\right|\right)\left(\left|b_{j}-c_{j}\right|+\left|b_{s}-c_{s}\right|\right)}-1\right)}{\left(\sum_{k=1}^{K}e^{a_{k}+c_{k}}\right)\left(\sum_{k=1}^{K}e^{b_{k}+d_{k}}\right)}
\displaystyle\leq (e-1)^{2}\sum_{1\leq j,s\leq K}e^{\left|a_{j}-d_{j}\right|\left|b_{s}-c_{s}\right|}
\displaystyle\leq (e-1)^{2}\sum_{1\leq j,s\leq K}e^{z_{1}\left|a_{j}-d_{j}\right|^{2}+z_{2}\left|b_{s}-c_{s}\right|^{2}}
\displaystyle\leq (e-1)^{2}K^{2}\prod_{1\leq j,s\leq K}e^{z_{1}\left|a_{j}-d_{j}\right|^{2}+z_{2}\left|b_{s}-c_{s}\right|^{2}}

holds for any positive z1,z2z_{1},z_{2} s.t. z1z21/4z_{1}z_{2}\geq 1/4. ∎
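Lemma 7 can be sanity-checked numerically. The sketch below is illustrative and not part of the proof; the function name `lemma7_check` and all parameter choices are ours. It draws random vectors satisfying the sup-norm constraint and compares the two sides of the inequality in log space to avoid overflow.

```python
import math
import random

def lemma7_check(K=5, trials=200, z1=0.5, z2=0.5, seed=0):
    """Randomly test the inequality of Lemma 7 (requires z1 * z2 >= 1/4)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(-2, 2) for _ in range(K)]
        b = [rng.uniform(-2, 2) for _ in range(K)]
        # d (resp. c) perturbs a (resp. b) with sup-norm at most 1
        d = [ai + rng.uniform(-1, 1) for ai in a]
        c = [bi + rng.uniform(-1, 1) for bi in b]
        num = sum(math.exp(x + y) for x, y in zip(a, b)) * \
              sum(math.exp(x + y) for x, y in zip(c, d))
        den = sum(math.exp(x + y) for x, y in zip(a, c)) * \
              sum(math.exp(x + y) for x, y in zip(b, d))
        # log of (e-1)^2 K^2 prod_{j,s} exp(z1|a_j-d_j|^2 + z2|b_s-c_s|^2)
        log_bound = 2 * math.log(math.e - 1) + 2 * math.log(K) \
            + K * z1 * sum((x - y) ** 2 for x, y in zip(a, d)) \
            + K * z2 * sum((x - y) ** 2 for x, y in zip(b, c))
        if math.log(num / den) > log_bound:
            return False
    return True

print(lemma7_check())
```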

Next we proceed to the proof of Theorem 3.

Proof of Theorem 3

Proof.

Recall the functions l_{E}\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}\right) and l\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}\right) defined in the proof of Theorem 2. We denote by \widehat{\boldsymbol{\theta}} the MLE studied in Theorem 2, i.e., the local minimizer of l(\boldsymbol{\theta}) in {\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},c_{r}e^{-4\kappa_{0}-4\kappa_{1}}\right). Also, let \widehat{\boldsymbol{\theta}}^{(s)}=(\widehat{\beta}_{1,0}^{(s)},\ldots,\widehat{\beta}_{p,0}^{(s)},\widehat{\beta}_{1,1}^{(s)},\ldots,\widehat{\beta}_{p,1}^{(s)})^{\top} be the local minimizer of l(\boldsymbol{\theta}) in the \ell_{\infty} ball {\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r_{s}\right), where

(A.23) rs\displaystyle r_{s} =\displaystyle= e8κ0+8κ1loglog(np)log(np)(np)s(1+log(np)p),\displaystyle e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\frac{\sqrt{\log(np)}}{\left(np\right)^{s}}\left(1+\frac{\log(np)}{\sqrt{p}}\right),

for some given constant s\geq 0. Let

(A.24) s0\displaystyle s_{0} =\displaystyle= 12(κ0+κ1)+loglog(np)/2+logloglog(np)+log(1+log(np)p)log(cr)log(np).\displaystyle\frac{12\left(\kappa_{0}+\kappa_{1}\right)+\log\log(np)/2+\log\log\log(np)+\log\left(1+\frac{\log(np)}{\sqrt{p}}\right)-\log(c_{r})}{\log(np)}.

We then have 𝜽^(s0)=𝜽^\widehat{\boldsymbol{\theta}}^{(s_{0})}=\widehat{\boldsymbol{\theta}}. Under the condition that κ0+κ1clog(np)\kappa_{0}+\kappa_{1}\leq c\log(np) for some positive constant cc, we have s0<1/2s_{0}<1/2 when npnp is large enough.

Next we will show that with probability tending to 1, uniformly for all s[s0,1/2]s\in[s_{0},1/2], 𝜽^(s)\widehat{\boldsymbol{\theta}}^{(s)}, the local MLE for 𝐁(𝜽,rs){\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r_{s}\right), is also the local MLE in a smaller ball 𝐁(𝜽,rs){\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r_{s^{\prime}}\right) where

rs=(np)2s14rs=e8κ0+8κ1loglog(np)log(np)(np)s(1+log(np)p),r_{s^{\prime}}=(np)^{\frac{2s-1}{4}}{r_{s}}=e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\frac{\sqrt{\log(np)}}{\left(np\right)^{s^{\prime}}}\left(1+\frac{\log(np)}{\sqrt{p}}\right),

and s=2s+14<1/2s^{\prime}=\frac{2s+1}{4}<1/2. Note that 𝜽(i)\boldsymbol{\theta}_{(i)}^{*} is the minimizer of lE(,𝜽(i))l_{E}\left(\cdot,\boldsymbol{\theta}_{(-i)}^{*}\right). Given 𝜽(i)\boldsymbol{\theta}_{(-i)}^{*}, the Hessian matrix of lE(,𝜽(i))l_{E}\left(\cdot,\boldsymbol{\theta}_{(-i)}^{*}\right) evaluated at 𝜽(i)\boldsymbol{\theta}_{(i)} is a 2×22\times 2 matrix given as:

E𝐕(i)(𝜽(i)):=[E𝐕1(i)(𝜽(i))E𝐕2(i)(𝜽(i))E𝐕2(i)(𝜽(i))E𝐕3(i)(𝜽(i))].{\rm E}{\mathbf{V}}^{(i)}\left(\boldsymbol{\theta}_{(i)}\right):=\left[\begin{array}[]{ccc}{\rm E}{\mathbf{V}}^{(i)}_{1}\left(\boldsymbol{\theta}_{(i)}\right)&{\rm E}{\mathbf{V}}^{(i)}_{2}\left(\boldsymbol{\theta}_{(i)}\right)\\ {\rm E}{\mathbf{V}}^{(i)}_{2}\left(\boldsymbol{\theta}_{(i)}\right)&{\rm E}{\mathbf{V}}^{(i)}_{3}\left(\boldsymbol{\theta}_{(i)}\right)\end{array}\right].

Following the calculations in Lemma 2, there exists a constant C1>0C_{1}>0 which is independent of 𝜽(i)\boldsymbol{\theta}_{(i)}, such that, for all 𝜽(i)𝐁(𝜽(i),cre4κ04κ1)\boldsymbol{\theta}_{(i)}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*}_{(i)},c_{r}e^{-4\kappa_{0}-4\kappa_{1}}\right), we have

E𝐕2(i)(𝜽(i))+E𝐕1(i)(𝜽(i))=j=1,jip1peβi,0+βj,0(1+eβi,0+βj,0+eβi,1+βj,1)2C1e2κ04κ1,{\rm E}{\mathbf{V}}^{(i)}_{2}\left(\boldsymbol{\theta}_{(i)}\right)+{\rm E}{\mathbf{V}}^{(i)}_{1}\left(\boldsymbol{\theta}_{(i)}\right)=\sum_{j=1,\>j\neq i}^{p}\frac{1}{p}\frac{e^{\beta_{i,0}+\beta^{*}_{j,0}}}{(1+e^{\beta_{i,0}+\beta^{*}_{j,0}}+e^{\beta_{i,1}+\beta^{*}_{j,1}})^{2}}\geq C_{1}e^{-2\kappa_{0}-4\kappa_{1}},

and

E𝐕2(i)(𝜽(i))C1e4κ04κ1;E𝐕2(i)(𝜽(i))+E𝐕3(i)(𝜽(i))C1e4κ04κ1.-{\rm E}{\mathbf{V}}^{(i)}_{2}\left(\boldsymbol{\theta}_{(i)}\right)\geq C_{1}e^{-4\kappa_{0}-4\kappa_{1}};\quad\quad{\rm E}{\mathbf{V}}^{(i)}_{2}\left(\boldsymbol{\theta}_{(i)}\right)+{\rm E}{\mathbf{V}}^{(i)}_{3}\left(\boldsymbol{\theta}_{(i)}\right)\geq C_{1}e^{-4\kappa_{0}-4\kappa_{1}}.

Then we have for all 𝜽(i)𝐁(𝜽(i),cre4κ04κ1)\boldsymbol{\theta}_{(i)}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*}_{(i)},c_{r}e^{-4\kappa_{0}-4\kappa_{1}}\right),

\displaystyle\lambda_{\min}\left({\rm E}{\mathbf{V}}^{(i)}\left(\boldsymbol{\theta}_{(i)}\right)\right)
=\displaystyle= inf𝐳2=1((E𝐕2(i)(𝜽(i))+E𝐕1(i)(𝜽(i)))z12+(E𝐕2(i)(𝜽(i))+E𝐕3(i)(𝜽(i)))z22\displaystyle\inf_{\|{\mathbf{z}}\|_{2}=1}\Bigg{(}\left({\rm E}{\mathbf{V}}^{(i)}_{2}\left(\boldsymbol{\theta}_{(i)}\right)+{\rm E}{\mathbf{V}}^{(i)}_{1}\left(\boldsymbol{\theta}_{(i)}\right)\right)z_{1}^{2}+\left({\rm E}{\mathbf{V}}^{(i)}_{2}\left(\boldsymbol{\theta}_{(i)}\right)+{\rm E}{\mathbf{V}}^{(i)}_{3}\left(\boldsymbol{\theta}_{(i)}\right)\right)z_{2}^{2}
E𝐕2(i)(𝜽(i))(z1z2)2)\displaystyle-{\rm E}{\mathbf{V}}^{(i)}_{2}\left(\boldsymbol{\theta}_{(i)}\right)\left(z_{1}-z_{2}\right)^{2}\Bigg{)}
\displaystyle\geq inf𝐳2=1{(E𝐕2(i)(𝜽(i))+E𝐕1(i)(𝜽(i)))z12+(E𝐕2(i)(𝜽(i))+E𝐕3(i)(𝜽(i)))z22}\displaystyle\inf_{\|{\mathbf{z}}\|_{2}=1}\Bigg{\{}\left({\rm E}{\mathbf{V}}^{(i)}_{2}\left(\boldsymbol{\theta}_{(i)}\right)+{\rm E}{\mathbf{V}}^{(i)}_{1}\left(\boldsymbol{\theta}_{(i)}\right)\right)z_{1}^{2}+\left({\rm E}{\mathbf{V}}^{(i)}_{2}\left(\boldsymbol{\theta}_{(i)}\right)+{\rm E}{\mathbf{V}}^{(i)}_{3}\left(\boldsymbol{\theta}_{(i)}\right)\right)z_{2}^{2}\Bigg{\}}
\displaystyle\geq C1e4κ0+4κ1inf𝐳2=1{z12+z22}\displaystyle\frac{C_{1}}{e^{4\kappa_{0}+4\kappa_{1}}}\inf_{\|{\mathbf{z}}\|_{2}=1}\Bigg{\{}z_{1}^{2}+z_{2}^{2}\Bigg{\}}
=\displaystyle= C1e4κ0+4κ1.\displaystyle\frac{C_{1}}{e^{4\kappa_{0}+4\kappa_{1}}}.
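The 2-by-2 quadratic-form decomposition used in the display above can be checked numerically. The helper below is illustrative, not part of the proof; it verifies both the algebraic identity and the implied eigenvalue lower bound for random symmetric matrices with non-positive off-diagonal entry.

```python
import math
import random

def eig_bound_holds(trials=1000, seed=1):
    """Check z'Vz = (v1+v2) z1^2 + (v3+v2) z2^2 - v2 (z1-z2)^2 for
    V = [[v1, v2], [v2, v3]], and the implied bound
    lambda_min(V) >= min(v1+v2, v3+v2) when v2 <= 0."""
    rng = random.Random(seed)
    for _ in range(trials):
        v2 = -rng.uniform(0.0, 2.0)          # off-diagonal entry, v2 <= 0
        v1 = -v2 + rng.uniform(0.0, 2.0)     # ensures v1 + v2 >= 0
        v3 = -v2 + rng.uniform(0.0, 2.0)
        z1, z2 = rng.uniform(-3, 3), rng.uniform(-3, 3)
        lhs = v1 * z1 ** 2 + 2 * v2 * z1 * z2 + v3 * z2 ** 2
        rhs = (v1 + v2) * z1 ** 2 + (v3 + v2) * z2 ** 2 - v2 * (z1 - z2) ** 2
        if abs(lhs - rhs) > 1e-9:
            return False
        # closed-form smallest eigenvalue of a symmetric 2x2 matrix
        tr, det = v1 + v3, v1 * v3 - v2 * v2
        lam_min = (tr - math.sqrt(tr * tr - 4 * det)) / 2.0
        if lam_min < min(v1 + v2, v3 + v2) - 1e-9:
            return False
    return True

print(eig_bound_holds())
```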

Consequently, for any ii and s[s0,1/2]s\in[s_{0},1/2], there exists a random vector \boldsymbol{\theta}_{(i)}^{(\xi)} lying on the segment between \boldsymbol{\theta}_{(i)}^{*} and \widehat{\boldsymbol{\theta}}^{(s)}_{(i)}, s.t.

l(𝜽(i),𝜽^(i)(s))l(𝜽^(i)(s),𝜽^(i)(s))[lE(𝜽(i),𝜽(i))lE(𝜽^(i)(s),𝜽(i))]\displaystyle l\left(\boldsymbol{\theta}_{(i)}^{*},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-l\left(\widehat{\boldsymbol{\theta}}^{(s)}_{(i)},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-\left[l_{E}\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-l_{E}\left(\widehat{\boldsymbol{\theta}}^{(s)}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)\right]
\displaystyle\geq lE(𝜽^(i)(s),𝜽(i))lE(𝜽(i),𝜽(i))\displaystyle l_{E}\left(\widehat{\boldsymbol{\theta}}^{(s)}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)-l_{E}\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)
=\displaystyle= 12(𝜽(i)𝜽^(i)(s))lE′′(𝜽(i)(ξ),𝜽(i))(𝜽(i)𝜽^(i)(s))\displaystyle\frac{1}{2}\left(\boldsymbol{\theta}_{(i)}^{*}-\widehat{\boldsymbol{\theta}}^{(s)}_{(i)}\right)^{\top}l_{E}^{\prime\prime}\left(\boldsymbol{\theta}_{(i)}^{(\xi)},\boldsymbol{\theta}_{(-i)}^{*}\right)\left(\boldsymbol{\theta}_{(i)}^{*}-\widehat{\boldsymbol{\theta}}^{(s)}_{(i)}\right)
\displaystyle\geq \|\boldsymbol{\theta}_{(i)}^{*}-\widehat{\boldsymbol{\theta}}^{(s)}_{(i)}\|_{2}^{2}\left(\inf_{\boldsymbol{\theta}_{(i)}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*}_{(i)},c_{r}e^{-4\kappa_{0}-4\kappa_{1}}\right)}\lambda_{\min}\left({\rm E}{\mathbf{V}}^{(i)}\left(\boldsymbol{\theta}_{(i)}\right)\right)\right)
\displaystyle\geq \frac{C_{1}\|\boldsymbol{\theta}_{(i)}^{*}-\widehat{\boldsymbol{\theta}}^{(s)}_{(i)}\|_{\infty}^{2}}{e^{4\kappa_{0}+4\kappa_{1}}}.

On the other hand, notice that

l(𝜽(i),𝜽^(i)(s))l(𝜽(i),𝜽^(i)(s))[lE(𝜽(i),𝜽(i))lE(𝜽(i),𝜽(i))]\displaystyle l\left(\boldsymbol{\theta}_{(i)}^{*},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-l\left(\boldsymbol{\theta}_{(i)},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-\left[l_{E}\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-l_{E}\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)\right]
\displaystyle\leq |l(𝜽(i),𝜽^(i)(s))l(𝜽(i),𝜽^(i)(s))[l(𝜽(i),𝜽(i))l(𝜽(i),𝜽(i))]|\displaystyle\left|l\left(\boldsymbol{\theta}_{(i)}^{*},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-l\left(\boldsymbol{\theta}_{(i)},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-\left[l\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-l\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)\right]\right|
+|l(𝜽(i),𝜽(i))l(𝜽(i),𝜽(i))[lE(𝜽(i),𝜽(i))lE(𝜽(i),𝜽(i))]|.\displaystyle+\left|l\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-l\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)-\left[l_{E}\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-l_{E}\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)\right]\right|.

By Corollary 4, there exist large enough positive constants C2C_{2} and c1c_{1} which are independent of 𝜽(i)\boldsymbol{\theta}_{(i)} such that, with probability greater than 1(np)c11-(np)^{-c_{1}},

|l(𝜽(i),𝜽(i))l(𝜽(i),𝜽(i))[lE(𝜽(i),𝜽(i))lE(𝜽(i),𝜽(i))]|\displaystyle\left|l\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-l\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)-\left[l_{E}\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-l_{E}\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)\right]\right|
\displaystyle\leq C2(1+log(np)p)log(np)np𝜽(i)𝜽(i)2\displaystyle C_{2}\left(1+\frac{\log(np)}{\sqrt{p}}\right)\sqrt{\frac{\log(np)}{np}}\left\|\boldsymbol{\theta}_{(i)}-\boldsymbol{\theta}^{*}_{(i)}\right\|_{2}
\displaystyle\leq 2C2(1+log(np)p)log(np)np𝜽(i)𝜽(i),\displaystyle\sqrt{2}C_{2}\left(1+\frac{\log(np)}{\sqrt{p}}\right)\sqrt{\frac{\log(np)}{np}}\left\|\boldsymbol{\theta}_{(i)}-\boldsymbol{\theta}^{*}_{(i)}\right\|_{\infty},

holds uniformly for any \boldsymbol{\theta}_{(i)}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*}_{(i)},\alpha_{0}\right) where \alpha_{0}<1/4. By Lemma 7, as np\to\infty, there exist large enough positive constants C_{3} and c_{2} which are independent of \boldsymbol{\theta}_{(i)} such that, with probability greater than 1-(np)^{-c_{2}},

|l(𝜽(i),𝜽^(i)(s))l(𝜽(i),𝜽^(i)(s))[l(𝜽(i),𝜽(i))l(𝜽(i),𝜽(i))]|\displaystyle\left|l\left(\boldsymbol{\theta}_{(i)}^{*},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-l\left(\boldsymbol{\theta}_{(i)},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-\left[l\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-l\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)\right]\right|
=\displaystyle= |1npj=1,jip[nlog(1+eβi,0+βj,0+eβi,1+βj,11+eβi,0+β^j,0(s)+eβi,1+β^j,1(s)×1+eβi,0+β^j,0(s)+eβi,1+β^j,1(s)1+eβi,0+βj,0+eβi,1+βj,1)\displaystyle\Bigg{|}\frac{1}{np}\sum_{j=1,\>j\neq i}^{p}\Bigg{[}-n\log\left(\frac{1+e^{\beta^{*}_{i,0}+\beta^{*}_{j,0}}+e^{\beta^{*}_{i,1}+\beta^{*}_{j,1}}}{1+e^{\beta^{*}_{i,0}+\widehat{\beta}^{(s)}_{j,0}}+e^{\beta^{*}_{i,1}+\widehat{\beta}^{(s)}_{j,1}}}\times\frac{1+e^{\beta_{i,0}+\widehat{\beta}^{(s)}_{j,0}}+e^{\beta_{i,1}+\widehat{\beta}^{(s)}_{j,1}}}{1+e^{\beta_{i,0}+\beta^{*}_{j,0}}+e^{\beta_{i,1}+\beta^{*}_{j,1}}}\right)
+log(1+eβi,1+βj,11+eβi,1+β^j,1(s)×1+eβi,1+β^j,1(s)1+eβi,1+βj,1)di,j\displaystyle+\log\left(\frac{1+e^{\beta^{*}_{i,1}+\beta^{*}_{j,1}}}{1+e^{\beta^{*}_{i,1}+\widehat{\beta}^{(s)}_{j,1}}}\times\frac{1+e^{\beta_{i,1}+\widehat{\beta}^{(s)}_{j,1}}}{1+e^{\beta_{i,1}+\beta^{*}_{j,1}}}\right)d_{i,j}
+log(eβi,0+βj,0+eβi,1+βj,1eβi,0+β^j,0(s)+eβi,1+β^j,1(s)×eβi,0+β^j,0(s)+eβi,1+β^j,1(s)eβi,0+βj,0+eβi,1+βj,1)bi,j]|\displaystyle+\log\left(\frac{e^{\beta^{*}_{i,0}+\beta^{*}_{j,0}}+e^{\beta^{*}_{i,1}+\beta^{*}_{j,1}}}{e^{\beta^{*}_{i,0}+\widehat{\beta}^{(s)}_{j,0}}+e^{\beta^{*}_{i,1}+\widehat{\beta}^{(s)}_{j,1}}}\times\frac{e^{\beta_{i,0}+\widehat{\beta}^{(s)}_{j,0}}+e^{\beta_{i,1}+\widehat{\beta}^{(s)}_{j,1}}}{e^{\beta_{i,0}+\beta^{*}_{j,0}}+e^{\beta_{i,1}+\beta^{*}_{j,1}}}\right)b_{i,j}\Bigg{]}\Bigg{|}
\displaystyle\leq C3pj=1,jip(z1|βi,0βi,0|2+z1|βi,1βi,1|2+z2|βj,0β^j,0(s)|2+z2|βj,1β^j,1(s)|2)\displaystyle\frac{C_{3}}{p}\sum_{j=1,\>j\neq i}^{p}\Bigg{(}z_{1}\left|\beta^{*}_{i,0}-\beta_{i,0}\right|^{2}+z_{1}\left|\beta^{*}_{i,1}-\beta_{i,1}\right|^{2}+z_{2}\left|\beta^{*}_{j,0}-\widehat{\beta}^{(s)}_{j,0}\right|^{2}+z_{2}\left|\beta^{*}_{j,1}-\widehat{\beta}^{(s)}_{j,1}\right|^{2}\Bigg{)}
\displaystyle\leq 2C3z1𝜽(i)𝜽(i)2+C3z2p𝜽^(i)(s)𝜽(i)22\displaystyle 2C_{3}z_{1}\|\boldsymbol{\theta}_{(i)}-\boldsymbol{\theta}^{*}_{(i)}\|^{2}_{\infty}+\frac{C_{3}z_{2}}{p}\|\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}-\boldsymbol{\theta}^{*}_{(-i)}\|^{2}_{2}
\displaystyle\leq 2C3z1e16κ0+16κ1[loglog(np)]2log(np)(np)2s(1+log(np)p)2\displaystyle 2C_{3}z_{1}\frac{e^{16\kappa_{0}+16\kappa_{1}}[\log\log(np)]^{2}\log(np)}{\left(np\right)^{2s}}\left(1+\frac{\log(np)}{\sqrt{p}}\right)^{2}
+C3z2pe8κ0+8κ1log(np)n(1+log(np)p)2,\displaystyle+\frac{C_{3}z_{2}}{p}\frac{e^{8\kappa_{0}+8\kappa_{1}}\log(np)}{n}\left(1+\frac{\log(np)}{\sqrt{p}}\right)^{2},

holds uniformly for all positive z_{1},z_{2} s.t. z_{1}z_{2}\geq 1/4, all s\in[s_{0},1/2] and all \boldsymbol{\theta}_{(i)}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*}_{(i)},r_{s}\right). In the last step we have used inequality (A.22) and the fact that \widehat{\boldsymbol{\theta}}^{(s)}\in{\mathbf{D}}_{n,p} for all s. Setting z_{1}=0.5e^{-4\kappa_{0}-4\kappa_{1}}[\log\log(np)]^{-1}(np)^{s-1/2} and z_{2}=0.5e^{4\kappa_{0}+4\kappa_{1}}\log\log(np)(np)^{1/2-s}, we obtain that there exists a large enough constant C_{4}>0 such that, with probability greater than 1-(np)^{-c_{2}},

|l(𝜽(i),𝜽^(i)(s))l(𝜽(i),𝜽^(i)(s))[l(𝜽(i),𝜽(i))l(𝜽(i),𝜽(i))]|\displaystyle\left|l\left(\boldsymbol{\theta}_{(i)}^{*},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-l\left(\boldsymbol{\theta}_{(i)},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-\left[l\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-l\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)\right]\right|
\displaystyle\leq C4e12κ0+12κ1loglog(np)log(np)(np)s+1/2(1+log(np)p)2,\displaystyle C_{4}\frac{e^{12\kappa_{0}+12\kappa_{1}}\log\log(np)\log(np)}{(np)^{s+1/2}}\left(1+\frac{\log(np)}{\sqrt{p}}\right)^{2},

holds uniformly for all s\in[s_{0},1/2] and \boldsymbol{\theta}_{(i)}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*}_{(i)},r_{s}\right). Combining the two preceding displayed inequalities, we have, with probability greater than 1-(np)^{-c_{3}} for a large enough constant c_{3}>0,

(A.28) |l(𝜽(i),𝜽^(i)(s))l(𝜽(i),𝜽^(i)(s))[lE(𝜽(i),𝜽(i))lE(𝜽(i),𝜽(i))]|\displaystyle\left|l\left(\boldsymbol{\theta}_{(i)}^{*},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-l\left(\boldsymbol{\theta}_{(i)},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-\left[l_{E}\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-l_{E}\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)\right]\right|
\displaystyle\leq 2C2(1+log(np)p)log(np)np𝜽(i)𝜽(i)\displaystyle\sqrt{2}C_{2}\left(1+\frac{\log(np)}{\sqrt{p}}\right)\sqrt{\frac{\log(np)}{np}}\left\|\boldsymbol{\theta}_{(i)}-\boldsymbol{\theta}^{*}_{(i)}\right\|_{\infty}
+C4e12κ0+12κ1loglog(np)log(np)(np)s+1/2(1+log(np)p)2,\displaystyle+C_{4}\frac{e^{12\kappa_{0}+12\kappa_{1}}\log\log(np)\log(np)}{(np)^{s+1/2}}\left(1+\frac{\log(np)}{\sqrt{p}}\right)^{2},

holds uniformly for all s[s0,1/2]s\in[s_{0},1/2], all i=1,,pi=1,\ldots,p and 𝜽(i)𝐁(𝜽(i),rs)\boldsymbol{\theta}_{(i)}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*}_{(i)},r_{s}\right).

Combining the lower bound established at the beginning of this proof with (A.28), we have, with probability greater than 1-(np)^{-c_{3}},

C1e4κ0+4κ1𝜽^(i)(s)𝜽(i)22C2(1+log(np)p)log(np)np𝜽^(i)(s)𝜽(i)+C4e12κ0+12κ1loglog(np)log(np)(np)s+1/2(1+log(np)p)2,\frac{C_{1}}{e^{4\kappa_{0}+4\kappa_{1}}}\|\widehat{\boldsymbol{\theta}}^{(s)}_{(i)}-\boldsymbol{\theta}_{(i)}^{*}\|_{\infty}^{2}\leq\sqrt{2}C_{2}\left(1+\frac{\log(np)}{\sqrt{p}}\right)\sqrt{\frac{\log(np)}{np}}\left\|\widehat{\boldsymbol{\theta}}^{(s)}_{(i)}-\boldsymbol{\theta}^{*}_{(i)}\right\|_{\infty}\\ +C_{4}\frac{e^{12\kappa_{0}+12\kappa_{1}}\log\log(np)\log(np)}{(np)^{s+1/2}}\left(1+\frac{\log(np)}{\sqrt{p}}\right)^{2},

holds uniformly for all s\in[s_{0},1/2] and all i=1,\ldots,p. Notice that the constants C_{1}, C_{2}, C_{3}, C_{4}, c_{2} and c_{3} are all independent of r_{s} and s. This indicates that there exists a large enough constant C_{5}>0, independent of r_{s} and s, s.t.

(A.29) 𝜽^(s)𝜽\displaystyle\|\widehat{\boldsymbol{\theta}}^{(s)}-\boldsymbol{\theta}^{*}\|_{\infty}
\displaystyle\leq C5e4κ0+4κ1log(np)np(1+log(np)p)+C5e8κ0+8κ1log(np)loglog(np)(np)2s+14(1+log(np)p)\displaystyle C_{5}e^{4\kappa_{0}+4\kappa_{1}}\sqrt{\frac{\log(np)}{np}}\left(1+\frac{\log(np)}{\sqrt{p}}\right)+C_{5}\frac{e^{8\kappa_{0}+8\kappa_{1}}\sqrt{\log(np)\log\log(np)}}{(np)^{\frac{2s+1}{4}}}\left(1+\frac{\log(np)}{\sqrt{p}}\right)
\displaystyle\leq 2C5e8κ0+8κ1log(np)loglog(np)(np)2s+14(1+log(np)p)\displaystyle 2C_{5}\frac{e^{8\kappa_{0}+8\kappa_{1}}\sqrt{\log(np)\log\log(np)}}{(np)^{\frac{2s+1}{4}}}\left(1+\frac{\log(np)}{\sqrt{p}}\right)
\displaystyle\leq e8κ0+8κ1loglog(np)log(np)(np)2s+14(1+log(np)p).\displaystyle e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\frac{\sqrt{\log(np)}}{(np)^{\frac{2s+1}{4}}}\left(1+\frac{\log(np)}{\sqrt{p}}\right).

Recall that \widehat{\boldsymbol{\theta}}^{(s)} is the local minimizer of l(\boldsymbol{\theta}) in {\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r_{s}\right) with

rs\displaystyle r_{s} =\displaystyle= e8κ0+8κ1loglog(np)log(np)(np)s(1+log(np)p).\displaystyle e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\frac{\sqrt{\log(np)}}{\left(np\right)^{s}}\left(1+\frac{\log(np)}{\sqrt{p}}\right).

Thus far we have proved that: with probability greater than 1(np)c31-(np)^{-c_{3}} for some large enough constant c3>0c_{3}>0, uniformly for all s[s0,1/2]s\in[s_{0},1/2], 𝜽^(s)\widehat{\boldsymbol{\theta}}^{(s)} is also within the ball 𝐁(𝜽,rs){\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*},r_{s^{\prime}}\right) with

(A.30) rs=(np)2s14rs=e8κ0+8κ1loglog(np)log(np)(np)2s+14(1+log(np)p).{r_{s^{\prime}}}=(np)^{\frac{2s-1}{4}}{r_{s}}=e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\frac{\sqrt{\log(np)}}{(np)^{\frac{2s+1}{4}}}\left(1+\frac{\log(np)}{\sqrt{p}}\right).

Now define a sequence \{s_{i};i=0,1,\cdots\} s.t. s_{0} is defined as in equation (A.24) and s_{k}=s_{k-1}/2+1/4. We have:

(A.31) sk12=12(sk112)=12k(s012).s_{k}-\frac{1}{2}=\frac{1}{2}\left(s_{k-1}-\frac{1}{2}\right)=\frac{1}{2^{k}}\left(s_{0}-\frac{1}{2}\right).

Then we have s_{k-1}<s_{k}<1/2 for all k\geq 1 and \lim_{k\to\infty}s_{k}=1/2.

Let K=\lfloor\log_{2}\left(\log(np)\right)\rfloor+1, where \lfloor\cdot\rfloor denotes the floor function. Beginning with \widehat{\boldsymbol{\theta}}^{(s_{0})}=\widehat{\boldsymbol{\theta}} and applying the result in (A.29) repeatedly K times, we can, with probability greater than 1-(np)^{-c_{3}}, sequentially reduce the upper bound from \|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\|_{\infty}\leq e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\frac{\sqrt{\log(np)}}{(np)^{\frac{1}{2}+\frac{2s_{0}-1}{2}}}\left(1+\frac{\log(np)}{\sqrt{p}}\right) to:

𝜽^𝜽\displaystyle\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\|_{\infty} \displaystyle\leq e8κ0+8κ1loglog(np)log(np)(np)12+2s012K(1+log(np)p)\displaystyle e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\frac{\sqrt{\log(np)}}{(np)^{\frac{1}{2}+\frac{2s_{0}-1}{2^{K}}}}\left(1+\frac{\log(np)}{\sqrt{p}}\right)
=\displaystyle= (np)12s02Ke8κ0+8κ1loglog(np)log(np)np(1+log(np)p)\displaystyle(np)^{\frac{1-2s_{0}}{2^{K}}}e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\sqrt{\frac{\log(np)}{np}}\left(1+\frac{\log(np)}{\sqrt{p}}\right)
\displaystyle\leq e12s0e8κ0+8κ1loglog(np)log(np)np(1+log(np)p).\displaystyle e^{1-2s_{0}}e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\sqrt{\frac{\log(np)}{np}}\left(1+\frac{\log(np)}{\sqrt{p}}\right).

Here in the last step of above inequality, we have used the fact that when K=log2(log(np))+1>log2(log(np))K=\lfloor\log_{2}\left(\log(np)\right)\rfloor+1>\log_{2}\left(\log(np)\right),

(np)12s02K((np)log(2)log(np))12s0log(2)=212s0log(2)=e12s0.(np)^{\frac{1-2s_{0}}{2^{K}}}\leq\left((np)^{\frac{\log(2)}{\log(np)}}\right)^{\frac{1-2s_{0}}{\log(2)}}=2^{\frac{1-2s_{0}}{\log(2)}}=e^{1-2s_{0}}.
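The halving schedule for the exponent and the final cap can be verified numerically. The check below is illustrative (the function name and test values are ours): it confirms the recursion s_{k}-1/2=(s_{0}-1/2)/2^{k} and that, with K=\lfloor\log_{2}(\log(np))\rfloor+1, the factor (np)^{(1-2s_{0})/2^{K}} stays below e^{1-2s_{0}}.

```python
import math

def halving_schedule_ok(np_val, s0):
    """Verify s_k - 1/2 = (s0 - 1/2)/2^k and (np)^{(1-2s0)/2^K} <= e^{1-2s0}."""
    K = math.floor(math.log2(math.log(np_val))) + 1   # K = floor(log2 log(np)) + 1
    s = s0
    for k in range(1, K + 1):
        s = s / 2 + 0.25                              # s_k = s_{k-1}/2 + 1/4
        if abs((s - 0.5) - (s0 - 0.5) / 2 ** k) > 1e-12:
            return False
    # 2^K > log(np) implies (np)^{(1-2s0)/2^K} <= (np)^{(1-2s0)/log(np)} = e^{1-2s0}
    return np_val ** ((1 - 2 * s0) / 2 ** K) <= math.exp(1 - 2 * s0) + 1e-12

print(halving_schedule_ok(10 ** 6, 0.1))
```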

Thus, assuming that np,n2np\rightarrow\infty,n\geq 2, κ0+κ1clog(np)\kappa_{0}+\kappa_{1}\leq c\log(np) for some positive constant cc, we have, with probability tending to 1,

𝜽^𝜽e8κ0+8κ1loglog(np)log(np)np(1+log(np)p).\left\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{\infty}\lesssim e^{8\kappa_{0}+8\kappa_{1}}\log\log(np)\sqrt{\frac{\log(np)}{np}}\left(1+\frac{\log(np)}{\sqrt{p}}\right).

A.9 Proof of Theorem 4

For brevity, we denote \tilde{\alpha}_{r,i,j}:=e^{\tilde{\beta}_{i,r}+\tilde{\beta}_{j,r}}, with i,j=1,\cdots,p and r=0,1. We use the interior mapping theorem [6] to establish the existence and uniform consistency of the moment estimator. The interior mapping theorem is presented in Lemma 8.

Lemma 8.

(Interior mapping theorem). Let {\mathbf{F}}\left({\mathbf{x}}\right)=\left({\mathbf{F}}_{1}\left({\mathbf{x}}\right),\cdots,{\mathbf{F}}_{p}\left({\mathbf{x}}\right)\right)^{\top} be a vector function on an open convex subset {\mathcal{D}} of \mathbb{R}^{p} with \left\|{\mathbf{F}}^{\prime}\left({\mathbf{x}}\right)-{\mathbf{F}}^{\prime}\left({\mathbf{y}}\right)\right\|_{\infty}\leq\gamma\left\|{\mathbf{x}}-{\mathbf{y}}\right\|_{\infty} for all {\mathbf{x}},{\mathbf{y}}\in{\mathcal{D}}. Assume that there exists {\mathbf{x}}_{0}\in{\mathcal{D}} s.t.

𝐅(𝐱0)1N,𝐅(𝐱0)1𝐅(𝐱0)δ,h=2Nγδ1,\displaystyle\left\|{\mathbf{F}}^{\prime}\left({\mathbf{x}}_{0}\right)^{-1}\right\|_{\infty}\leq N,\quad\left\|{\mathbf{F}}^{\prime}\left({\mathbf{x}}_{0}\right)^{-1}{\mathbf{F}}\left({\mathbf{x}}_{0}\right)\right\|_{\infty}\leq\delta,\quad h=2N\gamma\delta\leq 1,
t2h(11h)δ,𝐁(𝐱0,t)𝒟,\displaystyle t^{*}\equiv\frac{2}{h}\left(1-\sqrt{1-h}\right)\delta,\quad{\mathbf{B}}_{\infty}\left({\mathbf{x}}_{0},t^{*}\right)\subset{\mathcal{D}},

where N and \delta are positive constants that may depend on {\mathbf{x}}_{0} and p. Then the Newton iterates {\mathbf{x}}_{n+1}\equiv{\mathbf{x}}_{n}-{\mathbf{F}}^{\prime}\left({\mathbf{x}}_{n}\right)^{-1}{\mathbf{F}}\left({\mathbf{x}}_{n}\right) exist and {\mathbf{x}}_{n}\in{\mathbf{B}}_{\infty}\left({\mathbf{x}}_{0},t^{*}\right)\subset{\mathcal{D}} for all n\geq 0. Moreover, {\mathbf{x}}^{*}=\lim_{n\to\infty}{\mathbf{x}}_{n} exists, {\mathbf{x}}^{*}\in\overline{{\mathbf{B}}_{\infty}\left({\mathbf{x}}_{0},t^{*}\right)}\subset{\mathcal{D}}, where \overline{A} denotes the closure of a set A, and {\mathbf{F}}\left({\mathbf{x}}^{*}\right)=0.
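Lemma 8 can be illustrated on a one-dimensional toy instance. The sketch below is illustrative only: it takes F(x)=x^{2}-2 on {\mathcal{D}}=(1,2) with x_{0}=1.5, computes N, \gamma, \delta, h and t^{*} in the notation of the lemma, and checks that the Newton iterates stay in the ball of radius t^{*} around x_{0} and converge to a zero of F.

```python
import math

def kantorovich_demo():
    """1-D illustration of Lemma 8 with F(x) = x^2 - 2, x0 = 1.5 (illustrative)."""
    def F(x):
        return x * x - 2.0

    def dF(x):
        return 2.0 * x

    x0 = 1.5
    N = 1.0 / abs(dF(x0))          # |F'(x0)^{-1}|
    gamma = 2.0                    # Lipschitz constant of F' on D = (1, 2)
    delta = abs(F(x0) / dF(x0))    # |F'(x0)^{-1} F(x0)|
    h = 2.0 * N * gamma * delta    # Kantorovich quantity, must satisfy h <= 1
    t_star = (2.0 / h) * (1.0 - math.sqrt(1.0 - h)) * delta
    x, inside = x0, True
    for _ in range(20):            # Newton iterates should stay in B(x0, t*)
        x = x - F(x) / dF(x)
        inside = inside and abs(x - x0) <= t_star + 1e-7
    return h, t_star, x, inside

h, t_star, root, inside = kantorovich_demo()
print(h <= 1.0, inside, abs(root * root - 2.0) < 1e-12)
```

For a quadratic F the Kantorovich radius t^{*} is essentially sharp: here t^{*}\approx 0.08579 while the limit \sqrt{2} satisfies |x_{0}-\sqrt{2}|\approx 0.08579.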

Proof of Theorem 4.

Proof.

Recall that \tilde{\boldsymbol{\theta}}_{(0)} is defined as the solution of

t=1nj=1,jipXi,jt=nj=1,jipeβi,0+βj,01+eβi,0+βj,0,i=1,,p.\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}X_{i,j}^{t}=n\sum_{j=1,\>j\neq i}^{p}\frac{e^{\beta_{i,0}+\beta_{j,0}}}{1+e^{\beta_{i,0}+\beta_{j,0}}},\quad i=1,\cdots,p.

For any 𝐱p{\mathbf{x}}\in\mathbb{R}^{p}, define a system of random functions 𝐆(𝐱){\mathbf{G}}\left({\mathbf{x}}\right):

𝐆i(𝐱):=1npt=1nj=1,jip(Xi,jtexi+xj1+exi+xj),i=1,,p;\displaystyle{\mathbf{G}}_{i}\left({\mathbf{x}}\right):=-\frac{1}{np}\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}\left(X_{i,j}^{t}-\frac{e^{x_{i}+x_{j}}}{1+e^{x_{i}+x_{j}}}\right),\quad i=1,\cdots,p;
𝐆(𝐱):=(𝐆1(𝐱),,𝐆p(𝐱)).\displaystyle{\mathbf{G}}\left({\mathbf{x}}\right):=\left({\mathbf{G}}_{1}\left({\mathbf{x}}\right),\cdots,{\mathbf{G}}_{p}\left({\mathbf{x}}\right)\right)^{\top}.
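The system {\mathbf{G}}({\mathbf{x}})=0 can be solved numerically. The sketch below is illustrative and outside the proof: the function name `solve_moment_equations` and all tuning constants (p, n, the number of sweeps, the seed) are ours. Snapshots are simulated i.i.d. from the marginal edge probabilities e^{\beta_{i,0}+\beta_{j,0}}/(1+e^{\beta_{i,0}+\beta_{j,0}}), ignoring the temporal dependence of the actual model, which does not change the form of the equations being solved; the solver uses coordinatewise (diagonal) Newton sweeps rather than the full Newton iteration of Lemma 8.

```python
import math
import random

def solve_moment_equations(p=8, n=200, seed=7, sweeps=500):
    """Simulate i.i.d. snapshots and solve the degree moment equations
    G(x) = 0 by coordinatewise Newton sweeps (illustrative sketch)."""
    rng = random.Random(seed)
    beta = [rng.uniform(-0.5, 0.5) for _ in range(p)]      # true beta*_{i,0}
    deg = [0.0] * p                                        # sum_t sum_{j != i} X_{i,j}^t
    for _ in range(n):
        for i in range(p):
            for j in range(i + 1, p):
                pij = math.exp(beta[i] + beta[j]) / (1.0 + math.exp(beta[i] + beta[j]))
                if rng.random() < pij:
                    deg[i] += 1.0
                    deg[j] += 1.0
    x = [0.0] * p                                          # starting value
    for _ in range(sweeps):
        for i in range(p):
            m, h = 0.0, 0.0
            for j in range(p):
                if j == i:
                    continue
                pij = math.exp(x[i] + x[j]) / (1.0 + math.exp(x[i] + x[j]))
                m += pij                  # model-implied expected degree per snapshot
                h += pij * (1.0 - pij)    # derivative of m with respect to x_i
            x[i] += (deg[i] / n - m) / h  # one coordinate Newton step
    resid = max(abs(deg[i] / n - sum(math.exp(x[i] + x[j]) / (1.0 + math.exp(x[i] + x[j]))
                                     for j in range(p) if j != i)) for i in range(p))
    err = max(abs(x[i] - beta[i]) for i in range(p))
    return resid, err

resid, err = solve_moment_equations()
print(resid < 1e-6, err < 0.5)
```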

As np\to\infty and \kappa_{0}=c\log(np) with a small enough constant c>0, there exist large enough constants C_{1}>0,c_{1}>0 such that with probability greater than 1-(np)^{-c_{1}},

(A.32) 𝐆(𝐱)𝐆(𝐲)\displaystyle\left\|{\mathbf{G}}^{\prime}\left({\mathbf{x}}\right)-{\mathbf{G}}^{\prime}\left({\mathbf{y}}\right)\right\|_{\infty} \displaystyle\leq C1𝐱𝐲,\displaystyle C_{1}\|{\mathbf{x}}-{\mathbf{y}}\|_{\infty},
𝐆(𝜽(0))1\displaystyle\left\|{\mathbf{G}}^{\prime}\left(\boldsymbol{\theta}_{(0)}^{*}\right)^{-1}\right\|_{\infty} \displaystyle\leq C1e2κ0,\displaystyle C_{1}e^{2\kappa_{0}},
𝐆(𝜽(0))1𝐆(𝜽(0))\displaystyle\left\|{\mathbf{G}}^{\prime}\left(\boldsymbol{\theta}_{(0)}^{*}\right)^{-1}{\mathbf{G}}\left(\boldsymbol{\theta}_{(0)}^{*}\right)\right\|_{\infty} \displaystyle\leq C1e2κ0log(n)log(p)np,\displaystyle C_{1}e^{2\kappa_{0}}\sqrt{\frac{\log(n)\log(p)}{np}},

hold for all 𝐱,𝐲𝐁(0,κ0){\mathbf{x}},{\mathbf{y}}\in{\mathbf{B}}_{\infty}\left(0,\kappa_{0}\right). For brevity, the proof of inequalities in (A.32) is provided independently in Section A.10. Let x0=𝜽(0)x_{0}=\boldsymbol{\theta}_{(0)}^{*} and 𝒟=Int (𝐁(0,κ0)){\mathcal{D}}=\text{Int }({\mathbf{B}}_{\infty}\left(0,\kappa_{0}\right)). Here the notation Int(A)\text{Int}(A) denotes the interior of a given set AA. We then have:

N=C1e2κ0,γ=C1,δ=C1e2κ0log(n)log(p)np,\displaystyle N=C_{1}e^{2\kappa_{0}},\quad\gamma=C_{1},\quad\delta=C_{1}e^{2\kappa_{0}}\sqrt{\frac{\log(n)\log(p)}{np}},
h=2Nγδ=2C13e4κ0log(n)log(p)np=o(1),\displaystyle h=2N\gamma\delta=2C_{1}^{3}e^{4\kappa_{0}}\sqrt{\frac{\log(n)\log(p)}{np}}=o\left(1\right),
t2C1h(11h)e2κ0log(n)log(p)np,\displaystyle t^{*}\equiv\frac{2C_{1}}{h}\left(1-\sqrt{1-h}\right)e^{2\kappa_{0}}\sqrt{\frac{\log(n)\log(p)}{np}},
\displaystyle{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}_{(0)}^{*},t^{*}\right)\subset{\mathcal{D}}.

Note that h(0,1)\forall h\in(0,1), 11h<1(1h)=h1-\sqrt{1-h}<1-(1-h)=h, we have

t^{*}\equiv\frac{2C_{1}}{h}\left(1-\sqrt{1-h}\right)e^{2\kappa_{0}}\sqrt{\frac{\log(n)\log(p)}{np}}<4C_{1}e^{2\kappa_{0}}\sqrt{\frac{\log(n)\log(p)}{np}}.

Consequently, by Lemma 8, we have that, with probability tending to 1,

(A.33) \left\|\tilde{\boldsymbol{\theta}}_{(0)}-\boldsymbol{\theta}_{(0)}^{*}\right\|_{\infty}\lesssim\sqrt{\frac{e^{4\kappa_{0}}\log(n)\log(p)}{np}}.

Next, we derive the error bound of 𝜽~(1)\tilde{\boldsymbol{\theta}}_{(1)} based on (A.33). For 𝐱p{\mathbf{x}}\in\mathbb{R}^{p}, define a system of random functions 𝐅(𝐱;𝜽~(0)){\mathbf{F}}\left({\mathbf{x}};\tilde{\boldsymbol{\theta}}_{(0)}\right):

𝐅i(𝐱;𝜽~(0)):=1npt=1nj=1,jip{Xi,jtXi,jt1α~0,i,j1+α~0,i,j(111+α~0,i,j+exi+xj)},\displaystyle{\mathbf{F}}_{i}\left({\mathbf{x}};\tilde{\boldsymbol{\theta}}_{(0)}\right):=-\frac{1}{np}\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}\left\{X_{i,j}^{t}X_{i,j}^{t-1}-\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\left(1-\frac{1}{1+\tilde{\alpha}_{0,i,j}+e^{x_{i}+x_{j}}}\right)\right\},
𝐅(𝐱;𝜽~(0)):=(𝐅1(𝐱;𝜽~(0)),,𝐅p(𝐱;𝜽~(0))).\displaystyle{\mathbf{F}}\left({\mathbf{x}};\tilde{\boldsymbol{\theta}}_{(0)}\right):=\left({\mathbf{F}}_{1}\left({\mathbf{x}};\tilde{\boldsymbol{\theta}}_{(0)}\right),\cdots,{\mathbf{F}}_{p}\left({\mathbf{x}};\tilde{\boldsymbol{\theta}}_{(0)}\right)\right)^{\top}.
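The definition of {\mathbf{F}}_{i} encodes the moment identity {\rm E}[X^{t}X^{t-1}]=\frac{\alpha_{0}}{1+\alpha_{0}}\left(1-\frac{1}{1+\alpha_{0}+\alpha_{1}}\right). The check below is illustrative and rests on an assumption we make explicit: the per-edge transition probabilities implied by the likelihood terms in the proof of Theorem 3, namely P(X^{t}=1\mid X^{t-1}=0)=\alpha_{0}/(1+\alpha_{0}+\alpha_{1}) and P(X^{t}=1\mid X^{t-1}=1)=(\alpha_{0}+\alpha_{1})/(1+\alpha_{0}+\alpha_{1}). Under this assumption, the stationary edge probability is \alpha_{0}/(1+\alpha_{0}), matching the moment equation for \tilde{\boldsymbol{\theta}}_{(0)}, and {\rm E}[X^{t}X^{t-1}] matches the expression inside {\mathbf{F}}_{i}.

```python
import math
import random

def moment_identity_ok(trials=500, seed=3):
    """Check pi = a0/(1+a0) and E[X^t X^{t-1}] = a0/(1+a0)*(1 - 1/(1+a0+a1))
    under the assumed per-edge transition probabilities (illustrative)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a0 = math.exp(rng.uniform(-2, 2))
        a1 = math.exp(rng.uniform(-2, 2))
        p01 = a0 / (1 + a0 + a1)               # assumed P(X^t = 1 | X^{t-1} = 0)
        p11 = (a0 + a1) / (1 + a0 + a1)        # assumed P(X^t = 1 | X^{t-1} = 1)
        pi = p01 / (1 - p11 + p01)             # stationary edge probability
        if abs(pi - a0 / (1 + a0)) > 1e-12:
            return False
        lhs = pi * p11                          # E[X^t X^{t-1}] at stationarity
        rhs = a0 / (1 + a0) * (1 - 1 / (1 + a0 + a1))
        if abs(lhs - rhs) > 1e-12:
            return False
    return True

print(moment_identity_ok())
```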

As np\to\infty and \kappa_{0}=c\log(np) with a small enough constant c>0, there exist large enough constants C_{2}>0,c_{2}>0 such that with probability greater than 1-(np)^{-c_{2}},

(A.34) 𝐅(𝐱;𝜽~(0))𝐅(𝐲;𝜽~(0))C2𝐱𝐲,\displaystyle\left\|{\mathbf{F}}^{\prime}\left({\mathbf{x}};\tilde{\boldsymbol{\theta}}_{(0)}\right)-{\mathbf{F}}^{\prime}\left({\mathbf{y}};\tilde{\boldsymbol{\theta}}_{(0)}\right)\right\|_{\infty}\leq C_{2}\left\|{\mathbf{x}}-{\mathbf{y}}\right\|_{\infty},
𝐅(𝜽(1);𝜽~(0))1C2e12κ0+6κ1,\displaystyle\left\|{\mathbf{F}}^{\prime}\left(\boldsymbol{\theta}_{(1)}^{*};\tilde{\boldsymbol{\theta}}_{(0)}\right)^{-1}\right\|_{\infty}\leq C_{2}e^{12\kappa_{0}+6\kappa_{1}},
𝐅(𝜽(1);𝜽~(0))1𝐅(𝜽(1);𝜽~(0))C2e14κ0+6κ1log(n)log(p)np,\displaystyle\left\|{\mathbf{F}}^{\prime}\left(\boldsymbol{\theta}_{(1)}^{*};\tilde{\boldsymbol{\theta}}_{(0)}\right)^{-1}{\mathbf{F}}\left(\boldsymbol{\theta}_{(1)}^{*};\tilde{\boldsymbol{\theta}}_{(0)}\right)\right\|_{\infty}\leq C_{2}e^{14\kappa_{0}+6\kappa_{1}}\sqrt{\frac{\log(n)\log(p)}{np}},

hold for all x, y ∈ B∞(0, κ1). For brevity, the proof of the inequalities in (A.34) is deferred to Section A.11. Let x0 = θ*(1) and 𝒟 = Int(B∞(0, κ)). We then have:

N=C2e12κ0+6κ1,γ=C2,δ=C2e14κ0+6κ1log(n)log(p)np,\displaystyle N=C_{2}e^{12\kappa_{0}+6\kappa_{1}},\quad\gamma=C_{2},\quad\delta=C_{2}e^{14\kappa_{0}+6\kappa_{1}}\sqrt{\frac{\log(n)\log(p)}{np}},
h=2Nγδ=2C23e26κ0+12κ1log(n)log(p)np=o(1),\displaystyle h=2N\gamma\delta=2C_{2}^{3}e^{26\kappa_{0}+12\kappa_{1}}\sqrt{\frac{\log(n)\log(p)}{np}}=o\left(1\right),
t2C2h(11h)e14κ0+6κ1log(n)log(p)np,\displaystyle t^{*}\equiv\frac{2C_{2}}{h}\left(1-\sqrt{1-h}\right)e^{14\kappa_{0}+6\kappa_{1}}\sqrt{\frac{\log(n)\log(p)}{np}},
𝐁(𝜽(1),t)𝒟.\displaystyle{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}_{(1)}^{*},t^{*}\right)\subset{\mathcal{D}}.

Note that for all h ∈ (0,1) we have 1−√(1−h) < 1−(1−h) = h, and hence

t2C2h(11h)e14κ0+6κ1log(n)log(p)np<4C2e14κ0+6κ1log(n)log(p)np.t^{*}\equiv\frac{2C_{2}}{h}\left(1-\sqrt{1-h}\right)e^{14\kappa_{0}+6\kappa_{1}}\sqrt{\frac{\log(n)\log(p)}{np}}<4C_{2}e^{14\kappa_{0}+6\kappa_{1}}\sqrt{\frac{\log(n)\log(p)}{np}}.
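The last step relies only on the elementary scalar fact that (2/h)(1−√(1−h)) stays below 2 on (0,1); a purely illustrative numerical check (not part of the proof):

```python
import math

# For h in (0, 1), the Kantorovich-type factor (2/h)*(1 - sqrt(1-h))
# lies in [1, 2); this is what removes the dependence on h from the
# final bound on t* (up to the constant 4 C_2).
for h in [1e-6, 1e-3, 0.1, 0.5, 0.9, 1 - 1e-9]:
    factor = (2.0 / h) * (1.0 - math.sqrt(1.0 - h))
    assert 1.0 <= factor < 2.0, (h, factor)
```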

Consequently, by Lemma 8, we have that, with probability tending to 1,

𝜽~(1)𝜽(1)\displaystyle\left\|\tilde{\boldsymbol{\theta}}_{(1)}-\boldsymbol{\theta}_{(1)}^{*}\right\|_{\infty} t\displaystyle\leq t^{*}\leq 4C2e14κ0+6κ1log(n)log(p)np.\displaystyle 4C_{2}e^{14\kappa_{0}+6\kappa_{1}}\sqrt{\frac{\log(n)\log(p)}{np}}.

Combining with (A.33), the theorem is proved. ∎

A.10 Proof of (A.32)

Proof.

Note that 𝐆′(𝐱) is a balanced symmetric matrix such that

𝐆(𝐱)i,j=1pα0,i,j(1+α0,i,j)2,𝐆(𝐱)i,i=j=1,jip𝐆(𝐱)i,j,{\mathbf{G}}^{\prime}\left({\mathbf{x}}\right)_{i,j}=\frac{1}{p}\frac{\alpha_{0,i,j}}{\left(1+\alpha_{0,i,j}\right)^{2}},\quad\quad\quad{\mathbf{G}}^{\prime}\left({\mathbf{x}}\right)_{i,i}=\sum_{j=1,\>j\neq i}^{p}{\mathbf{G}}^{\prime}\left({\mathbf{x}}\right)_{i,j},

for all 1 ≤ i ≠ j ≤ p. Following [38], we construct a matrix S = (S_{i,j}) to approximate the inverse of 𝐆′(𝐱). Specifically, for any i ≠ j, we set

Si,j=1𝒯,Si,i=(j=1,jip1pα0,i,j(1+α0,i,j)2)11𝒯,𝒯=1ijp1pα0,i,j(1+α0,i,j)2.S_{i,j}=-\frac{1}{{\mathcal{T}}},\quad S_{i,i}=\left(\sum_{j=1,\>j\neq i}^{p}\frac{1}{p}\frac{\alpha_{0,i,j}}{\left(1+\alpha_{0,i,j}\right)^{2}}\right)^{-1}-\frac{1}{{\mathcal{T}}},\quad{\mathcal{T}}=\sum_{1\leq i\neq j\leq p}\frac{1}{p}\frac{\alpha_{0,i,j}}{\left(1+\alpha_{0,i,j}\right)^{2}}.
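The quality of this diagonal-plus-uniform approximation can be seen numerically; the following sketch is illustrative only, with hypothetical entry magnitudes of order 1/p standing in for the α-terms of 𝐆′(𝐱):

```python
import numpy as np

# Build a balanced symmetric matrix V mimicking G'(x): positive
# off-diagonal entries of order 1/p, diagonal equal to the
# off-diagonal row sums.  Then form the approximate inverse S
# defined above and compare it entrywise with the true inverse.
rng = np.random.default_rng(0)
p = 200
A = rng.uniform(0.15, 0.25, size=(p, p)) / p
A = (A + A.T) / 2.0
np.fill_diagonal(A, 0.0)
V = A + np.diag(A.sum(axis=1))          # balanced: V_ii = sum_{j != i} V_ij

T = A.sum()                              # total off-diagonal mass
S = np.full((p, p), -1.0 / T)            # S_ij = -1/T off the diagonal
S[np.diag_indices(p)] += 1.0 / np.diag(V)  # S_ii = 1/v_ii - 1/T

Vinv = np.linalg.inv(V)
err = np.abs(Vinv - S).max()
# the entrywise error is small relative to the entries of V^{-1}
assert err < 0.05 * np.abs(Vinv).max()
```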

Note that

e2κ04p1pα0,i,j(1+α0,i,j)214p,\frac{e^{-2\kappa_{0}}}{4p}\leq\frac{1}{p}\frac{\alpha_{0,i,j}}{\left(1+\alpha_{0,i,j}\right)^{2}}\leq\frac{1}{4p},

we have

𝒯((p1)e2κ04,(p1)4).{\mathcal{T}}\in\left(\frac{(p-1)e^{-2\kappa_{0}}}{4},\frac{(p-1)}{4}\right).

By Theorem 1 in [38], with m = e^{−2κ0}/(4p) and M = 1/(4p), there exists a large enough constant C1 > 0 such that

𝐆(𝐱)1𝐒max\displaystyle\left\|{\mathbf{G}}^{\prime}\left({\mathbf{x}}\right)^{-1}-{\mathbf{S}}\right\|_{\max} \displaystyle\leq Mm2pM+(p2)m2(p2)m1(p1)2+12m(p1)2\displaystyle\frac{M}{m^{2}}\frac{pM+(p-2)m}{2(p-2)m}\frac{1}{(p-1)^{2}}+\frac{1}{2m(p-1)^{2}}
=\displaystyle= 12m(p1)2(1+Mm+M2m2pp2).\displaystyle\frac{1}{2m(p-1)^{2}}\left(1+\frac{M}{m}+\frac{M^{2}}{m^{2}}\frac{p}{p-2}\right).
\displaystyle\leq C1e6κ0p2,\displaystyle C_{1}\frac{e^{6\kappa_{0}}}{p^{2}},

where ‖A‖max = max_{i,j} |A_{i,j}|. Then there exists a large enough constant C2 > 0 such that

\left\|{\mathbf{G}}^{\prime}\left({\mathbf{x}}\right)^{-1}\right\|_{\infty}\leq\left\|{\mathbf{G}}^{\prime}\left({\mathbf{x}}\right)^{-1}-{\mathbf{S}}\right\|_{\infty}+\left\|{\mathbf{S}}\right\|_{\infty}
\displaystyle\leq p𝐆(𝐱)1𝐒max+p𝒯+maxi(j=1,jip1pα0,i,j(1+α0,i,j)2)1\displaystyle p\left\|{\mathbf{G}}^{\prime}\left({\mathbf{x}}\right)^{-1}-{\mathbf{S}}\right\|_{\max}+\frac{p}{{\mathcal{T}}}+\max_{i}\left(\sum_{j=1,\>j\neq i}^{p}\frac{1}{p}\frac{\alpha_{0,i,j}}{\left(1+\alpha_{0,i,j}\right)^{2}}\right)^{-1}
\displaystyle\leq pC1e6κ0p2+8e2κ0\displaystyle pC_{1}\frac{e^{6\kappa_{0}}}{p^{2}}+8e^{2\kappa_{0}}
<\displaystyle< C2e2κ0.\displaystyle C_{2}e^{2\kappa_{0}}.

By Lemma 1 and Lemma 6, there exist large positive constants C3, c1 such that, with probability greater than 1−(np)^{−c1},

(A.35) 𝐆(𝜽(0))\displaystyle\left\|{\mathbf{G}}\left(\boldsymbol{\theta}_{(0)}^{*}\right)\right\|_{\infty}
=\displaystyle= maxi|1npt=1nj=1,jip(Xi,jteβi,0+βj,01+eβi,0+βj,0)|\displaystyle\max_{i}\left|\frac{1}{np}\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}\left(X_{i,j}^{t}-\frac{e^{\beta^{*}_{i,0}+\beta^{*}_{j,0}}}{1+e^{\beta^{*}_{i,0}+\beta^{*}_{j,0}}}\right)\right|
=\displaystyle= maxi|1npt=1nj=1,jip(Xi,jtE(Xi,jt))|\displaystyle\max_{i}\left|\frac{1}{np}\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}\left(X_{i,j}^{t}-{\rm E}\left(X_{i,j}^{t}\right)\right)\right|
\displaystyle\leq 1npmaxi|j=1,jip(t=1n(Xi,jtE(Xi,jt)))|\displaystyle\frac{1}{np}\max_{i}\left|\sum_{j=1,\>j\neq i}^{p}\left(\sum_{t=1}^{n}\left(X_{i,j}^{t}-{\rm E}\left(X_{i,j}^{t}\right)\right)\right)\right|
\displaystyle\leq C3np(nplog(p)+nlog(np)+log(n)loglog(n)log(np))\displaystyle\frac{C_{3}}{np}\left(\sqrt{np\log(p)}+\sqrt{n\log(np)}+\log\left(n\right)\log\log\left(n\right)\log\left(np\right)\right)
<\displaystyle< 3C3log(n)log(p)np.\displaystyle 3C_{3}\sqrt{\frac{\log(n)\log(p)}{np}}.

Then, since ‖θ*(0)‖∞ ≤ κ0, we have that

(A.36) 𝐆(𝜽(0))1𝐆(𝜽(0))𝐆(𝜽(0))1𝐆(𝜽(0))2C2C3e2κ0log(n)log(p)np.\left\|{\mathbf{G}}^{\prime}\left(\boldsymbol{\theta}_{(0)}^{*}\right)^{-1}{\mathbf{G}}\left(\boldsymbol{\theta}_{(0)}^{*}\right)\right\|_{\infty}\leq\left\|{\mathbf{G}}^{\prime}\left(\boldsymbol{\theta}_{(0)}^{*}\right)^{-1}\right\|_{\infty}\left\|{\mathbf{G}}\left(\boldsymbol{\theta}_{(0)}^{*}\right)\right\|_{\infty}\leq 2C_{2}C_{3}e^{2\kappa_{0}}\sqrt{\frac{\log(n)\log(p)}{np}}.
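The first inequality in (A.36) is just submultiplicativity of the induced ∞-norm (the maximum absolute row sum); a quick random check of this elementary fact, illustrative only:

```python
import numpy as np

# ||A b||_inf <= ||A||_inf * ||b||_inf, where ||A||_inf is the
# maximum absolute row sum -- the step used to split (A.36).
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
b = rng.standard_normal(50)
lhs = np.abs(A @ b).max()
rhs = np.abs(A).sum(axis=1).max() * np.abs(b).max()
assert lhs <= rhs + 1e-12
```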

There exists a large enough constant C4 > 0 such that, for every x, y ∈ B∞(0, κ),

\left\|{\mathbf{G}}^{\prime}\left({\mathbf{x}}\right)-{\mathbf{G}}^{\prime}\left({\mathbf{y}}\right)\right\|_{\infty}
\displaystyle\leq 2\max_{i}\sum_{j=1,\>j\neq i}^{p}\frac{1}{p}\left|\frac{e^{x_{i}+x_{j}}}{\left(1+e^{x_{i}+x_{j}}\right)^{2}}-\frac{e^{y_{i}+y_{j}}}{\left(1+e^{y_{i}+y_{j}}\right)^{2}}\right|
\displaystyle=\frac{2}{p}\max_{i}\sum_{j=1,\>j\neq i}^{p}\left|\frac{e^{z_{i,j}}\left(1-e^{z_{i,j}}\right)}{\left(1+e^{z_{i,j}}\right)^{3}}\right|\left|x_{i}+x_{j}-y_{i}-y_{j}\right|
\displaystyle\leq C_{4}\left\|{\mathbf{x}}-{\mathbf{y}}\right\|_{\infty},

where z_{i,j} := (1−c_{i,j})(x_i+x_j) + c_{i,j}(y_i+y_j) for some constants c_{i,j} ∈ (0,1), by the mean value theorem. Combining (A.35), (A.36) and the Lipschitz bound displayed above, we finish the proof of (A.32). ∎

A.11 Proof of (A.34)

Proof.

For brevity, F(x; θ̃(0)) is denoted by F(x). Moreover, as all the conclusions below hold uniformly over θ̃(0), the qualifier “uniformly for all θ̃(0)” is omitted in what follows.

Note that 𝐅′(𝐱) is a balanced symmetric matrix such that

𝐅(𝐱)i,j=1pα~0,i,j1+α~0,i,jexi+xj(1+α~0,i,j+exi+xj)2,𝐅(𝐱)i,i=j=1,jip𝐅(𝐱)i,j,{\mathbf{F}}^{\prime}\left({\mathbf{x}}\right)_{i,j}=\frac{1}{p}\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\frac{e^{x_{i}+x_{j}}}{\left(1+\tilde{\alpha}_{0,i,j}+e^{x_{i}+x_{j}}\right)^{2}},\quad{\mathbf{F}}^{\prime}\left({\mathbf{x}}\right)_{i,i}=\sum_{j=1,\>j\neq i}^{p}{\mathbf{F}}^{\prime}\left({\mathbf{x}}\right)_{i,j},

for all 1 ≤ i ≠ j ≤ p. Following [38], we construct a matrix S = (S_{i,j}) to approximate the inverse of 𝐅′(𝐱). Specifically, for all i ≠ j, we set

Si,j\displaystyle S_{i,j} =1𝒯,Si,i=(j=1,jip1pα~0,i,j1+α~0,i,jexi+xj(1+α~0,i,j+exi+xj)2)11𝒯,\displaystyle=-\frac{1}{{\mathcal{T}}},\quad S_{i,i}=\left(\sum_{j=1,\>j\neq i}^{p}\frac{1}{p}\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\frac{e^{x_{i}+x_{j}}}{\left(1+\tilde{\alpha}_{0,i,j}+e^{x_{i}+x_{j}}\right)^{2}}\right)^{-1}-\frac{1}{{\mathcal{T}}},
𝒯\displaystyle{\mathcal{T}} =1ijp1pα~0,i,j1+α~0,i,jexi+xj(1+α~0,i,j+exi+xj)2.\displaystyle=\sum_{1\leq i\neq j\leq p}\frac{1}{p}\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\frac{e^{x_{i}+x_{j}}}{\left(1+\tilde{\alpha}_{0,i,j}+e^{x_{i}+x_{j}}\right)^{2}}.

Note that there exists a large enough constant C1 > 0 such that, for any i ≠ j,

C1pe4κ0+2κ11pα~0,i,j1+α~0,i,jexi+xj(1+α~0,i,j+exi+xj)2<14p,\frac{C_{1}}{pe^{4\kappa_{0}+2\kappa_{1}}}\leq\frac{1}{p}\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\frac{e^{x_{i}+x_{j}}}{\left(1+\tilde{\alpha}_{0,i,j}+e^{x_{i}+x_{j}}\right)^{2}}<\frac{1}{4p},

we have

(A.38) 𝒯=1ijp1pα~0,i,j1+α~0,i,jexi+xj(1+α~0,i,j+exi+xj)2(C1pe4κ02κ1,p4).{\mathcal{T}}=\sum_{1\leq i\neq j\leq p}\frac{1}{p}\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\frac{e^{x_{i}+x_{j}}}{\left(1+\tilde{\alpha}_{0,i,j}+e^{x_{i}+x_{j}}\right)^{2}}\quad\in\left(C_{1}pe^{-4\kappa_{0}-2\kappa_{1}},\frac{p}{4}\right).
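The entrywise upper bound feeding into (A.38) is elementary; a grid check, illustrative only, with a standing in for α̃_{0,i,j} and u for e^{x_i+x_j}:

```python
import numpy as np

# For a, u > 0:  (a/(1+a)) * u/(1+a+u)^2 < 1/4,
# since u/(1+a+u)^2 <= u/(1+u)^2 <= 1/4 and a/(1+a) < 1.
a = np.logspace(-4, 4, 400)
u = np.logspace(-4, 4, 400)
A, U = np.meshgrid(a, u)
vals = (A / (1.0 + A)) * U / (1.0 + A + U) ** 2
assert vals.max() < 0.25
```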

By Theorem 1 in [38], with m = C1/(p e^{4κ0+2κ1}) and M = C1/(4p), there exists a large enough constant C2 > 0 such that

𝐅(𝐱)1𝐒max\displaystyle\left\|{\mathbf{F}}^{\prime}\left({\mathbf{x}}\right)^{-1}-{\mathbf{S}}\right\|_{\max} \displaystyle\leq Mm2pM+(p2)m2(p2)m1(p1)2+12m(p1)2\displaystyle\frac{M}{m^{2}}\frac{pM+(p-2)m}{2(p-2)m}\frac{1}{(p-1)^{2}}+\frac{1}{2m(p-1)^{2}}
=\displaystyle= 12m(p1)2(1+Mm+M2m2pp2).\displaystyle\frac{1}{2m(p-1)^{2}}\left(1+\frac{M}{m}+\frac{M^{2}}{m^{2}}\frac{p}{p-2}\right).
\displaystyle\leq C2e12κ0+6κ1p2,\displaystyle C_{2}\frac{e^{12\kappa_{0}+6\kappa_{1}}}{p^{2}},

where ‖A‖max = max_{i,j} |A_{i,j}|. Then we have that

𝐅(𝐱)1\displaystyle\left\|{\mathbf{F}}^{\prime}\left({\mathbf{x}}\right)^{-1}\right\|_{\infty}
\displaystyle\leq 𝐒+𝐅(𝐱)1𝐒\displaystyle\left\|{\mathbf{S}}\right\|_{\infty}+\left\|{\mathbf{F}}^{\prime}\left({\mathbf{x}}\right)^{-1}-{\mathbf{S}}\right\|_{\infty}
\displaystyle\leq maxi(j=1,jip1pα~0,i,j1+α~0,i,jexi+xj(1+α~0,i,j+exi+xj)2)1+p𝒯+p𝐅(𝐱)1𝐒max\displaystyle\max_{i}\left(\sum_{j=1,\>j\neq i}^{p}\frac{1}{p}\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\frac{e^{x_{i}+x_{j}}}{\left(1+\tilde{\alpha}_{0,i,j}+e^{x_{i}+x_{j}}\right)^{2}}\right)^{-1}+\frac{p}{{\mathcal{T}}}+p\left\|{\mathbf{F}}^{\prime}\left({\mathbf{x}}\right)^{-1}-{\mathbf{S}}\right\|_{\max}
<\displaystyle< 2C2e12κ0+6κ1.\displaystyle 2C_{2}e^{12\kappa_{0}+6\kappa_{1}}.

Moreover, we have that

𝐅(𝜽(1))\displaystyle\left\|{\mathbf{F}}\left(\boldsymbol{\theta}_{(1)}^{*}\right)\right\|_{\infty}
=\displaystyle= maxi|1npt=1nj=1,jip{Xi,jtXi,jt1α~0,i,j1+α~0,i,j(111+α~0,i,j+α1,i,j)}|\displaystyle\max_{i}\left|-\frac{1}{np}\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}\left\{X_{i,j}^{t}X_{i,j}^{t-1}-\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\left(1-\frac{1}{1+\tilde{\alpha}_{0,i,j}+\alpha_{1,i,j}^{*}}\right)\right\}\right|
\displaystyle\leq maxi|1npt=1nj=1,jip{Xi,jtXi,jt1α0,i,j1+α0,i,j(111+α0,i,j+α1,i,j)}|\displaystyle\max_{i}\left|\frac{1}{np}\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}\left\{X_{i,j}^{t}X_{i,j}^{t-1}-\frac{\alpha_{0,i,j}^{*}}{1+\alpha_{0,i,j}^{*}}\left(1-\frac{1}{1+\alpha_{0,i,j}^{*}+\alpha_{1,i,j}^{*}}\right)\right\}\right|
+maxi|1npt=1nj=1,jip{α0,i,j1+α0,i,jα0,i,j+α1,i,j1+α0,i,j+α1,i,jα~0,i,j1+α~0,i,jα~0,i,j+α1,i,j1+α~0,i,j+α1,i,j}|\displaystyle+\max_{i}\left|\frac{1}{np}\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}\left\{\frac{\alpha_{0,i,j}^{*}}{1+\alpha_{0,i,j}^{*}}\frac{\alpha_{0,i,j}^{*}+\alpha_{1,i,j}^{*}}{1+\alpha_{0,i,j}^{*}+\alpha_{1,i,j}^{*}}-\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\frac{\tilde{\alpha}_{0,i,j}+\alpha_{1,i,j}^{*}}{1+\tilde{\alpha}_{0,i,j}+\alpha_{1,i,j}^{*}}\right\}\right|
=\displaystyle= L1+L2.\displaystyle L_{1}+L_{2}.

By Lemma 1 and Lemma 6, there exist large positive constants C3, c2 such that, with probability greater than 1−(np)^{−c2},

L1\displaystyle L_{1} =\displaystyle= 1npmaxi|j=1,jipt=1nXi,jtXi,jt1E(Xi,jtXi,jt1)|\displaystyle\frac{1}{np}\max_{i}\left|\sum_{j=1,\>j\neq i}^{p}\sum_{t=1}^{n}X_{i,j}^{t}X_{i,j}^{t-1}-{\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)\right|
\displaystyle\leq 1npmaxi|j=1,jip{t=1nXi,jtXi,jt1E(Xi,jtXi,jt1)}|\displaystyle\frac{1}{np}\max_{i}\left|\sum_{j=1,\>j\neq i}^{p}\left\{\sum_{t=1}^{n}X_{i,j}^{t}X_{i,j}^{t-1}-{\rm E}\left(X_{i,j}^{t}X_{i,j}^{t-1}\right)\right\}\right|
\displaystyle\leq C3np(nplog(p)+nlog(np)+log(n)loglog(n)log(np))\displaystyle\frac{C_{3}}{np}\left(\sqrt{np\log(p)}+\sqrt{n\log(np)}+\log\left(n\right)\log\log\left(n\right)\log\left(np\right)\right)
<\displaystyle< 3C3log(n)log(p)np.\displaystyle 3C_{3}\sqrt{\frac{\log(n)\log(p)}{np}}.

Moreover, we have

L2\displaystyle L_{2}
=\displaystyle= 1npmaxi|t=1nj=1,jip{α0,i,j1+α0,i,j(111+α0,i,j+α1,i,j)\displaystyle\frac{1}{np}\max_{i}\Bigg{|}\sum_{t=1}^{n}\sum_{j=1,\>j\neq i}^{p}\Bigg{\{}\frac{\alpha_{0,i,j}^{*}}{1+\alpha_{0,i,j}^{*}}\left(1-\frac{1}{1+\alpha_{0,i,j}^{*}+\alpha_{1,i,j}^{*}}\right)
α~0,i,j1+α~0,i,j(111+α~0,i,j+α1,i,j)}|\displaystyle-\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\left(1-\frac{1}{1+\tilde{\alpha}_{0,i,j}+\alpha_{1,i,j}^{*}}\right)\Bigg{\}}\Bigg{|}
\displaystyle\leq maxi,j,βξ|(βi,0+βj,0β~i,0β~j,0)eβξ,i,j(1+eβξ,i,j)2\displaystyle\max_{i,j,\beta_{\xi}}\Bigg{|}\left(\beta_{i,0}^{*}+\beta_{j,0}^{*}-\tilde{\beta}_{i,0}-\tilde{\beta}_{j,0}\right)\frac{e^{\beta_{\xi,i,j}}}{\left(1+e^{\beta_{\xi,i,j}}\right)^{2}}
+eβξ,i,j(1+eβξ,i,j)(1+eβξ,i,j+α1,i,j)eβξ,i,j((2+α1,i,j)eβξ,i,j+2e2βξ,i,j)(1+eβξ,i,j)2(1+eβξ,i,j+α1,i,j)2|\displaystyle+\frac{e^{\beta_{\xi,i,j}}\left(1+e^{\beta_{\xi,i,j}}\right)\left(1+e^{\beta_{\xi,i,j}}+\alpha_{1,i,j}^{*}\right)-e^{\beta_{\xi,i,j}}\left(\left(2+\alpha_{1,i,j}^{*}\right)e^{\beta_{\xi,i,j}}+2e^{2\beta_{\xi,i,j}}\right)}{\left(1+e^{\beta_{\xi,i,j}}\right)^{2}\left(1+e^{\beta_{\xi,i,j}}+\alpha_{1,i,j}^{*}\right)^{2}}\Bigg{|}
\displaystyle\leq maxi,j,βξ|(βi,0+βj,0β~i,0β~j,0)(eβξ,i,j(1+eβξ,i,j)2+eβξ,i,j(1+α1,i,je2βξ,i,j)(1+eβξ,i,j)2(1+eβξ,i,j+α1,i,j)2)|\displaystyle\max_{i,j,\beta_{\xi}}\Bigg{|}\left(\beta_{i,0}^{*}+\beta_{j,0}^{*}-\tilde{\beta}_{i,0}-\tilde{\beta}_{j,0}\right)\left(\frac{e^{\beta_{\xi,i,j}}}{\left(1+e^{\beta_{\xi,i,j}}\right)^{2}}+\frac{e^{\beta_{\xi,i,j}}\left(1+\alpha_{1,i,j}^{*}-e^{2\beta_{\xi,i,j}}\right)}{\left(1+e^{\beta_{\xi,i,j}}\right)^{2}\left(1+e^{\beta_{\xi,i,j}}+\alpha_{1,i,j}^{*}\right)^{2}}\right)\Bigg{|}
\displaystyle\leq 2𝜽~(0)𝜽(0),\displaystyle 2\left\|\tilde{\boldsymbol{\theta}}_{(0)}-\boldsymbol{\theta}_{(0)}^{*}\right\|_{\infty},

where βξ,i,j := (1−c_{i,j})(β*_{i,0}+β*_{j,0}) + c_{i,j}(β̃_{i,0}+β̃_{j,0}) for some constants c_{i,j} ∈ (0,1). Then, by equation (A.33), there exists a large enough constant C4 > 0 such that, with probability tending to 1,

L2\displaystyle L_{2} 2𝜽~(0)𝜽(0)\displaystyle\leq 2\left\|\tilde{\boldsymbol{\theta}}_{(0)}-\boldsymbol{\theta}_{(0)}^{*}\right\|_{\infty}\leq C4log(n)log(p)e4κ0np.\displaystyle C_{4}\sqrt{\frac{\log(n)\log(p)e^{4\kappa_{0}}}{np}}.

We can conclude that, with probability tending to 1,

\left\|{\mathbf{F}}\left(\boldsymbol{\theta}_{(1)}^{*}\right)\right\|_{\infty}\leq L_{1}+L_{2}\leq 3C_{3}\sqrt{\frac{\log(n)\log(p)}{np}}+C_{4}\sqrt{\frac{\log(n)\log(p)e^{4\kappa_{0}}}{np}}\leq C_{5}\sqrt{\frac{\log(n)\log(p)e^{4\kappa_{0}}}{np}},

where the bounds hold uniformly in i. Consequently, we have

\left\|{\mathbf{F}}^{\prime}\left(\boldsymbol{\theta}_{(1)}^{*}\right)^{-1}{\mathbf{F}}\left(\boldsymbol{\theta}_{(1)}^{*}\right)\right\|_{\infty}\leq\left\|{\mathbf{F}}^{\prime}\left(\boldsymbol{\theta}_{(1)}^{*}\right)^{-1}\right\|_{\infty}\left\|{\mathbf{F}}\left(\boldsymbol{\theta}_{(1)}^{*}\right)\right\|_{\infty}
\displaystyle\leq 2C2C5e12κ0+6κ1log(n)log(p)e4κ0np.\displaystyle 2C_{2}C_{5}e^{12\kappa_{0}+6\kappa_{1}}\sqrt{\frac{\log(n)\log(p)e^{4\kappa_{0}}}{np}}.

There exists a large enough constant C6 > 0 such that, for every x, y ∈ B∞(0, κ1),

(A.41) 𝐅(𝐱)𝐅(𝐲)\displaystyle\left\|{\mathbf{F}}^{\prime}\left({\mathbf{x}}\right)-{\mathbf{F}}^{\prime}\left({\mathbf{y}}\right)\right\|_{\infty}
\displaystyle\leq maxi|2j=1,jip(1pα~0,i,j1+α~0,i,jexi+xj(1+α~0,i,j+exi+xj)21pα~0,i,j1+α~0,i,jeyi+yj(1+α~0,i,j+eyi+yj)2)|\displaystyle\max_{i}\left|2\sum_{j=1,\>j\neq i}^{p}\left(\frac{1}{p}\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\frac{e^{x_{i}+x_{j}}}{\left(1+\tilde{\alpha}_{0,i,j}+e^{x_{i}+x_{j}}\right)^{2}}-\frac{1}{p}\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\frac{e^{y_{i}+y_{j}}}{\left(1+\tilde{\alpha}_{0,i,j}+e^{y_{i}+y_{j}}\right)^{2}}\right)\right|
=\displaystyle= 2pmaxi|j=1,jipα~0,i,j1+α~0,i,j(exi+xj(1+α~0,i,j+exi+xj)2eyi+yj(1+α~0,i,j+eyi+yj)2)|\displaystyle\frac{2}{p}\max_{i}\left|\sum_{j=1,\>j\neq i}^{p}\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\left(\frac{e^{x_{i}+x_{j}}}{\left(1+\tilde{\alpha}_{0,i,j}+e^{x_{i}+x_{j}}\right)^{2}}-\frac{e^{y_{i}+y_{j}}}{\left(1+\tilde{\alpha}_{0,i,j}+e^{y_{i}+y_{j}}\right)^{2}}\right)\right|
=\frac{2}{p}\max_{i}\left|\sum_{j=1,\>j\neq i}^{p}\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\frac{e^{z_{i,j}}\left(1+\tilde{\alpha}_{0,i,j}-e^{z_{i,j}}\right)}{\left(1+\tilde{\alpha}_{0,i,j}+e^{z_{i,j}}\right)^{3}}\left(x_{i}+x_{j}-y_{i}-y_{j}\right)\right|
\displaystyle\leq C6𝐱𝐲,\displaystyle C_{6}\left\|{\mathbf{x}}-{\mathbf{y}}\right\|_{\infty},

where z_{i,j} := (1−d_{i,j})(x_i+x_j) + d_{i,j}(y_i+y_j) for some constants d_{i,j} ∈ (0,1), by the mean value theorem. Combining the bounds on ‖F′(θ*(1))^{−1}‖∞ and ‖F′(θ*(1))^{−1}F(θ*(1))‖∞ derived above with (A.41), we finish the proof of (A.34). ∎

A.12 Proof of Corollary 1

Proof.

Suppose the conditions −κ0 ≤ β_{i,0} ≤ Cκ with Cκ = O(1) and ‖β1‖∞ ≤ κ1 hold. Then there exists a constant C1 > 0 such that, for all 1 ≤ i ≠ j ≤ p and θ ∈ B∞(θ*, r) with r = c_r e^{−2κ0−4κ1} for a small enough constant c_r > 0, it holds that

E(𝐕2(𝜽)+𝐕1(𝜽))i,jC1e4κ1p,E(𝐕2(𝜽)+𝐕3(𝜽))i,jC1e4κ12κ0p,\displaystyle{\rm E}({\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{1}(\boldsymbol{\theta}))_{i,j}\geq C_{1}\frac{e^{-4\kappa_{1}}}{p},\quad{\rm E}({\mathbf{V}}_{2}(\boldsymbol{\theta})+{\mathbf{V}}_{3}(\boldsymbol{\theta}))_{i,j}\geq C_{1}\frac{e^{-4\kappa_{1}-2\kappa_{0}}}{p},

and

E(𝐕2(𝜽))i,jC1e4κ12κ0p.\displaystyle{\rm E}({\mathbf{V}}_{2}(\boldsymbol{\theta}))_{i,j}\geq C_{1}\frac{e^{-4\kappa_{1}-2\kappa_{0}}}{p}.

Then there exists a constant C2 > 0 such that, for all θ ∈ B∞(θ*, r),

E(𝐕(𝜽))2C2e4κ12κ0.\displaystyle\left\|{\rm E}({\mathbf{V}}(\boldsymbol{\theta}))\right\|_{2}\geq C_{2}e^{-4\kappa_{1}-2\kappa_{0}}.

With similar arguments as in the proofs of Lemma 2 and Theorems 1 and 2, we can prove Corollary 1. ∎

A.13 Proof of Corollary 2

Proof.

The first inequality can be proved using the results in Corollary 1, together with reasoning analogous to that employed in the proof of Theorem 1. Replace equations (A.23) and (A.24) by

rs\displaystyle r_{s} =\displaystyle= e4κ0+8κ1loglog(np)log(np)(np)s(1+log(np)p);\displaystyle e^{4\kappa_{0}+8\kappa_{1}}\log\log(np)\frac{\sqrt{\log(np)}}{\left(np\right)^{s}}\left(1+\frac{\log(np)}{\sqrt{p}}\right);
s0\displaystyle s_{0} =\displaystyle= 6κ0+12κ1+loglog(np)/2+logloglog(np)+log(1+log(np)p)log(cr)log(np).\displaystyle\frac{6\kappa_{0}+12\kappa_{1}+\log\log(np)/2+\log\log\log(np)+\log\left(1+\frac{\log(np)}{\sqrt{p}}\right)-\log(c_{r})}{\log(np)}.

Similar to the proof of Theorem 3, let z1 = 0.5e^{−2κ0−4κ1}[loglog(np)]^{−1}(np)^{s−1/2} and z2 = 0.5e^{2κ0+4κ1}loglog(np)(np)^{1/2−s}. Then there exist sufficiently large constants C1 > 0 and c2 > 0 such that, with probability greater than 1−(np)^{−c2},

|l(𝜽(i),𝜽^(i)(s))l(𝜽(i),𝜽^(i)(s))[l(𝜽(i),𝜽(i))l(𝜽(i),𝜽(i))]|\displaystyle\left|l\left(\boldsymbol{\theta}_{(i)}^{*},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-l\left(\boldsymbol{\theta}_{(i)},\widehat{\boldsymbol{\theta}}^{(s)}_{(-i)}\right)-\left[l\left(\boldsymbol{\theta}_{(i)}^{*},\boldsymbol{\theta}_{(-i)}^{*}\right)-l\left(\boldsymbol{\theta}_{(i)},\boldsymbol{\theta}_{(-i)}^{*}\right)\right]\right|
\displaystyle\leq C1e6κ0+12κ1loglog(np)log(np)(np)s+1/2(1+log(np)p)2\displaystyle C_{1}\frac{e^{6\kappa_{0}+12\kappa_{1}}\log\log(np)\log(np)}{(np)^{s+1/2}}\left(1+\frac{\log(np)}{\sqrt{p}}\right)^{2}

holds uniformly for all s[s0,1/2]s\in[s_{0},1/2] and 𝜽(i)𝐁(𝜽(i),rs)\boldsymbol{\theta}_{(i)}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{*}_{(i)},r_{s}\right). Following similar steps, we can ascertain that as npnp\rightarrow\infty with n2n\geq 2, with probability converging to one it holds that

\left\|\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\right\|_{\infty}\leq Ce^{4\kappa_{0}+8\kappa_{1}}\log\log(np)\sqrt{\frac{\log(np)}{np}}\left(1+\frac{\log(np)}{\sqrt{p}}\right).
∎

A.14 Proof of Corollary 3

Proof.

There exist positive constants C1C_{1} and C2C_{2} such that

1ijp1pα~0,i,j1+α~0,i,jexi+xj(1+α~0,i,j+exi+xj)2(C1pe2κ02κ1,p4),\sum_{1\leq i\neq j\leq p}\frac{1}{p}\frac{\tilde{\alpha}_{0,i,j}}{1+\tilde{\alpha}_{0,i,j}}\frac{e^{x_{i}+x_{j}}}{\left(1+\tilde{\alpha}_{0,i,j}+e^{x_{i}+x_{j}}\right)^{2}}\quad\in\left(C_{1}pe^{-2\kappa_{0}-2\kappa_{1}},\frac{p}{4}\right),

for all 𝐱κ1\|{\mathbf{x}}\|_{\infty}\leq\kappa_{1}. Then the inequalities in (A.34) can be updated to

𝐅(𝐱;𝜽~(0))𝐅(𝐲;𝜽~(0))C2𝐱𝐲,\displaystyle\left\|{\mathbf{F}}^{\prime}\left({\mathbf{x}};\tilde{\boldsymbol{\theta}}_{(0)}\right)-{\mathbf{F}}^{\prime}\left({\mathbf{y}};\tilde{\boldsymbol{\theta}}_{(0)}\right)\right\|_{\infty}\leq C_{2}\left\|{\mathbf{x}}-{\mathbf{y}}\right\|_{\infty},
𝐅(𝜽(1);𝜽~(0))1C2e2κ0+6κ1,\displaystyle\left\|{\mathbf{F}}^{\prime}\left(\boldsymbol{\theta}_{(1)}^{*};\tilde{\boldsymbol{\theta}}_{(0)}\right)^{-1}\right\|_{\infty}\leq C_{2}e^{2\kappa_{0}+6\kappa_{1}},
𝐅(𝜽(1);𝜽~(0))1𝐅(𝜽(1);𝜽~(0))C2e4κ0+6κ1log(n)log(p)np,\displaystyle\left\|{\mathbf{F}}^{\prime}\left(\boldsymbol{\theta}_{(1)}^{*};\tilde{\boldsymbol{\theta}}_{(0)}\right)^{-1}{\mathbf{F}}\left(\boldsymbol{\theta}_{(1)}^{*};\tilde{\boldsymbol{\theta}}_{(0)}\right)\right\|_{\infty}\leq C_{2}e^{4\kappa_{0}+6\kappa_{1}}\sqrt{\frac{\log(n)\log(p)}{np}},

for all ‖x‖∞ ≤ κ1 and ‖y‖∞ ≤ κ1. Consequently, applying Lemma 8 as in the proof of Theorem 4, we can prove Corollary 3. ∎

A.15 Proof of Theorem 5

Proof.

To simplify the notation, in what follows we denote l_{i,j}(θ_i, θ_j) by l_{i,j}(θ). Let K = ⌊c0 log(np)⌋ + 1 for some large enough constant c0 > 0, where ⌊·⌋ denotes the floor function. By the Taylor expansion with Lagrange remainder, for any θ ∈ B∞(θ′, α0) there exists a θ^ξ ∈ B∞(θ′, α0), depending on θ, such that

𝐋(𝜽)𝐋(𝜽)\displaystyle{\mathbf{L}}\left(\boldsymbol{\theta}\right)-{\mathbf{L}}\left(\boldsymbol{\theta}^{\prime}\right)
=\displaystyle= I1=1p(θI1θI1)𝐋(𝜽)θI1\displaystyle\sum_{I_{1}=1}^{p}\left(\theta_{I_{1}}-\theta_{I_{1}}^{\prime}\right)\frac{\partial{\mathbf{L}}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{I_{1}}}
+\frac{1}{2!}\sum_{I_{1}=1}^{p}\sum_{I_{2}=1}^{p}\left(\theta_{I_{1}}-\theta_{I_{1}}^{\prime}\right)\left(\theta_{I_{2}}-\theta_{I_{2}}^{\prime}\right)\frac{\partial^{2}{\mathbf{L}}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{I_{1}}\partial\theta_{I_{2}}}
+\displaystyle+\cdots
+1K!I1=1pI2=1pIK=1p(=1K(θIθI))K𝐋(𝜽)θI1θI2θIK\displaystyle+\frac{1}{K!}\sum_{I_{1}=1}^{p}\sum_{I_{2}=1}^{p}\cdots\sum_{I_{K}=1}^{p}\left(\prod_{\ell=1}^{K}\left(\theta_{I_{\ell}}-\theta_{I_{\ell}}^{\prime}\right)\right)\frac{\partial^{K}{\mathbf{L}}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{I_{1}}\partial\theta_{I_{2}}\cdots\partial\theta_{I_{K}}}
+1(K+1)!I1=1pI2=1pIK+1=1p(=1K+1(θIθI))K+1𝐋(𝜽ξ)θI1θI2θIK+1\displaystyle+\frac{1}{\left(K+1\right)!}\sum_{I_{1}=1}^{p}\sum_{I_{2}=1}^{p}\cdots\sum_{I_{K+1}=1}^{p}\left(\prod_{\ell=1}^{K+1}\left(\theta_{I_{\ell}}-\theta_{I_{\ell}}^{\prime}\right)\right)\frac{\partial^{K+1}{\mathbf{L}}\left(\boldsymbol{\theta}^{\xi}\right)}{\partial\theta_{I_{1}}\partial\theta_{I_{2}}\cdots\partial\theta_{I_{K+1}}}
=\displaystyle= 1pk=1K{1k!1ijp(Yi,js=0k(ks)(θiθi)s(θjθj)kskli,j(𝜽)θisθjks)}\displaystyle\frac{1}{p}\sum_{k=1}^{K}\left\{\frac{1}{k!}\sum_{1\leq i\neq j\leq p}\left(Y_{i,j}\sum_{s=0}^{k}\tbinom{k}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k-s}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right)\right\}
+1p1(K+1)!1ijp(Yi,js=0K+1(K+1s)(θiθi)s(θjθj)K+1sK+1li,j(𝜽ξ)θisθjK+1s)\displaystyle+\frac{1}{p}\frac{1}{\left(K+1\right)!}\sum_{1\leq i\neq j\leq p}\left(Y_{i,j}\sum_{s=0}^{K+1}\tbinom{K+1}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{K+1-s}\frac{\partial^{K+1}l_{i,j}\left(\boldsymbol{\theta}^{\xi}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{K+1-s}}\right)
\displaystyle\leq |k=1K1p1k!1ijpYi,j((θiθi)kkli,j(𝜽)θik+(θjθj)kkli,j(𝜽)θjk)|\displaystyle\left|\sum_{k=1}^{K}\frac{1}{p}\frac{1}{k!}\sum_{1\leq i\neq j\leq p}Y_{i,j}\left(\left(\theta_{i}-\theta_{i}^{\prime}\right)^{k}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{k}}+\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{j}^{k}}\right)\right|
+|k=2K1p1k!1ijp(Yi,js=1k1(ks)(θiθi)s(θjθj)kskli,j(𝜽)θisθjks)|\displaystyle+\left|\sum_{k=2}^{K}\frac{1}{p}\frac{1}{k!}\sum_{1\leq i\neq j\leq p}\left(Y_{i,j}\sum_{s=1}^{k-1}\tbinom{k}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k-s}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right)\right|
+1p|1(K+1)!1ijp(Yi,js=0K+1(K+1s)(θiθi)s(θjθj)K+1sK+1li,j(𝜽ξ)θisθjK+1s)|\displaystyle+\frac{1}{p}\left|\frac{1}{\left(K+1\right)!}\sum_{1\leq i\neq j\leq p}\left(Y_{i,j}\sum_{s=0}^{K+1}\tbinom{K+1}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{K+1-s}\frac{\partial^{K+1}l_{i,j}\left(\boldsymbol{\theta}^{\xi}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{K+1-s}}\right)\right|
=\displaystyle= S(1)+S(2)+S(3).\displaystyle S^{(1)}+S^{(2)}+S^{(3)}.

We first consider S^{(1)}. By Lemma 6, there exist large enough constants C1 > 0 and c1 > 0 such that, uniformly for all θ ∈ B∞(θ′, α0) and all k = 1, ⋯, K, we have, with probability greater than 1−(np)^{−c1},

1p1k!|1ijpYi,j((θiθi)kkli,j(𝜽)θik+(θjθj)kkli,j(𝜽)θjk)|\displaystyle\frac{1}{p}\frac{1}{k!}\left|\sum_{1\leq i\neq j\leq p}Y_{i,j}\left(\left(\theta_{i}-\theta_{i}^{\prime}\right)^{k}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{k}}+\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{j}^{k}}\right)\right|
\displaystyle\leq 2p1k!i=1p|θiθiα|k|ji,j=1pαkYi,jkli,j(𝜽)θjk|\displaystyle\frac{2}{p}\frac{1}{k!}\sum_{i=1}^{p}\left|\frac{\theta_{i}-\theta_{i}^{\prime}}{\alpha}\right|^{k}\left|\sum_{j\neq i,j=1}^{p}\alpha^{k}Y_{i,j}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{j}^{k}}\right|
\displaystyle\leq 2p1k!maxi|ji,j=1pαkYi,jkli,j(𝜽)θjk|i=1p|θiθiα|k\displaystyle\frac{2}{p}\frac{1}{k!}\max_{i}\left|\sum_{j\neq i,j=1}^{p}\alpha^{k}Y_{i,j}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{j}^{k}}\right|\sum_{i=1}^{p}\left|\frac{\theta_{i}-\theta_{i}^{\prime}}{\alpha}\right|^{k}
\displaystyle\leq C1p1k!(b(p)log(np)+σ(p)plog(np))(k1)!𝜽𝜽αkk\displaystyle\frac{C_{1}}{p}\frac{1}{k!}\left(b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}\right)\left(k-1\right)!\left\|\frac{\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}}{\alpha}\right\|_{k}^{k}
=\displaystyle= C1𝜽𝜽αkkb(p)log(np)+σ(p)plog(np)kp.\displaystyle C_{1}\left\|\frac{\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}}{\alpha}\right\|_{k}^{k}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{kp}.

Consequently, with probability greater than 1(np)c11-(np)^{-c_{1}} we have:

S(1)\displaystyle S^{(1)} \displaystyle\leq k=1K1p1k!|1ijpYi,j((θiθi)kkli,j(𝜽)θik+(θjθj)kkli,j(𝜽)θjk)|\displaystyle\sum_{k=1}^{K}\frac{1}{p}\frac{1}{k!}\left|\sum_{1\leq i\neq j\leq p}Y_{i,j}\left(\left(\theta_{i}-\theta_{i}^{\prime}\right)^{k}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{k}}+\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{j}^{k}}\right)\right|
\displaystyle\leq C1k=1K𝜽𝜽αkkb(p)log(np)+σ(p)plog(np)kp\displaystyle C_{1}\sum_{k=1}^{K}\left\|\frac{\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}}{\alpha}\right\|_{k}^{k}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{kp}
\displaystyle\leq C1b(p)log(np)+σ(p)plog(np)pk=1K1k𝜽𝜽αkk,\displaystyle C_{1}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}\sum_{k=1}^{K}\frac{1}{k}\left\|\frac{\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}}{\alpha}\right\|_{k}^{k},

holds uniformly for all θ ∈ B∞(θ′, α0). Note that for any x = (x1, ⋯, xp)⊤ such that ‖x‖∞ ≤ a < 1 for some constant a > 0, we have

(A.42) \sum_{k=1}^{K}\frac{1}{k}\left\|{\mathbf{x}}\right\|_{k}^{k}=\sum_{k=1}^{K}\sum_{i=1}^{p}\frac{1}{k}\left|x_{i}\right|^{k}=\sum_{i=1}^{p}\left|x_{i}\right|\sum_{k=1}^{K}\frac{\left|x_{i}\right|^{k-1}}{k}\leq\left\|{\mathbf{x}}\right\|_{1}\sum_{k=1}^{K}\frac{a^{k-1}}{k}\leq\left\|{\mathbf{x}}\right\|_{1}\left(-\frac{\log(1-a)}{a}\right).

Here in the last step we have used the expansion -\log(1-a)=\sum_{k=1}^{\infty}a^{k}/k. Since \left\|\frac{\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}}{\alpha}\right\|_{\infty}\leq\alpha_{0}/\alpha<1/2 for any \boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{\prime},\alpha_{0}\right), we have, with probability greater than 1-(np)^{-c_{1}},

S(1)\displaystyle S^{(1)} \displaystyle\leq log(1/2)1/2C1b(p)log(np)+σ(p)plog(np)p𝜽𝜽1\displaystyle-\frac{\log(1/2)}{1/2}C_{1}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}\left\|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\right\|_{1}
\displaystyle\leq 2C1b(p)log(np)+σ(p)plog(np)p𝜽𝜽1.\displaystyle 2C_{1}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}\left\|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\right\|_{1}.
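The series bound (A.42) applied in the last step can also be checked numerically; below is a minimal Python sketch (the dimension p, the truncation level K and the bound a are illustrative choices, not quantities from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
p, K, a = 50, 30, 0.49                    # ||x||_inf <= a < 1, as required in (A.42)
x = rng.uniform(-a, a, size=p)

# left side: truncated sum of (1/k) * ||x||_k^k
lhs = sum(np.sum(np.abs(x) ** k) / k for k in range(1, K + 1))
# right side: ||x||_1 * (-log(1 - a) / a)
rhs = np.sum(np.abs(x)) * (-np.log(1.0 - a) / a)
assert lhs <= rhs                          # inequality (A.42)
```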

Next we derive an upper bound for S^{(2)}. Define a series of random p\times p matrices \{{\mathbf{Y}}^{s}_{k}:k=2,\ldots,K,\ s=1,\cdots,k-1\}, with the (i,j)-th element of {\mathbf{Y}}^{s}_{k} given by

(𝐘ks)i,j=Yi,jαkkli,j(𝜽)θisθjks,1ijp.\left({\mathbf{Y}}^{s}_{k}\right)_{i,j}=Y_{i,j}\alpha^{k}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}},\quad 1\leq i\neq j\leq p.

Further, for any k=2,\ldots,K, define a random (k-1)p\times(k-1)p matrix {\mathbf{W}}_{k} as:

𝐖k=[0𝐘k1𝐘k2𝐘kk2𝐘kk10].{\mathbf{W}}_{k}=\left[\begin{array}[]{ccccc}0&&&&{\mathbf{Y}}^{1}_{k}\\ &&&{\mathbf{Y}}^{2}_{k}&\\ &&\iddots&&\\ &{\mathbf{Y}}^{k-2}_{k}&&&\\ {\mathbf{Y}}^{k-1}_{k}&&&&0\\ \end{array}\right].

Also we define a series of p\times 1 vectors \{{\mathbf{z}}^{(s)}_{k}:k=2,\ldots,K,\ s=1,\cdots,k-1\}, with the i-th element of {\mathbf{z}}^{(s)}_{k} given by:

(𝐳k(s))i=(0.5α)s(θiθi)s(ks).\left({\mathbf{z}}^{(s)}_{k}\right)_{i}=\left(0.5\alpha\right)^{-s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\sqrt{\tbinom{k}{s}}.

For any k=2,\ldots,K, writing \tilde{{\mathbf{z}}}_{k} as:

𝐳~k=[𝐳k(1)𝐳k(k1)],\tilde{{\mathbf{z}}}_{k}=\left[\begin{array}[]{c}{\mathbf{z}}_{k}^{(1)}\\ \vdots\\ {\mathbf{z}}_{k}^{(k-1)}\\ \end{array}\right],

we have:

1p1k!|1ijp(Yi,js=1k1(ks)(θiθi)s(θjθj)kskli,j(𝜽)θisθjks)|\displaystyle\frac{1}{p}\frac{1}{k!}\left|\sum_{1\leq i\neq j\leq p}\left(Y_{i,j}\sum_{s=1}^{k-1}\tbinom{k}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k-s}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right)\right|
=\displaystyle= 1p12kk!|s=1k1(𝐳~k(ks))𝐘ks𝐳~k(s)|\displaystyle\frac{1}{p}\frac{1}{2^{k}k!}\left|\sum_{s=1}^{k-1}\left(\tilde{{\mathbf{z}}}_{k}^{(k-s)}\right)^{\top}{\mathbf{Y}}_{k}^{s}\tilde{{\mathbf{z}}}_{k}^{(s)}\right|
=\displaystyle= 1p12kk!|𝐳~k𝐖k𝐳~k|\displaystyle\frac{1}{p}\frac{1}{2^{k}k!}\left|\tilde{{\mathbf{z}}}_{k}^{\top}{\mathbf{W}}_{k}\tilde{{\mathbf{z}}}_{k}\right|
\displaystyle\leq 1p12kk!𝐳~k22𝐖k2.\displaystyle\frac{1}{p}\frac{1}{2^{k}k!}\|\tilde{{\mathbf{z}}}_{k}\|_{2}^{2}\|{\mathbf{W}}_{k}\|_{2}.

We remark that by formulating the cross terms in S^{(2)} via \{{\mathbf{z}}^{(s)}_{k},{\mathbf{W}}_{k}\}, we have established in (A.15) an upper bound that depends separately on the parameters through \{{\mathbf{z}}^{(s)}_{k}:k=2,\ldots,K\} and on the random matrices \{{\mathbf{W}}_{k}:k=2,\ldots,K\}. Using the fact that \sum_{l=0}^{\infty}\tbinom{l+s}{s}0.5^{l+s+1}=1, we have

k=s+1K(ks)0.5k<l=0(l+ss)0.5l+s=2.\sum_{k=s+1}^{K}\tbinom{k}{s}0.5^{k}<\sum_{l=0}^{\infty}\tbinom{l+s}{s}0.5^{l+s}=2.
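Both binomial-series facts displayed above can be verified numerically; a short Python check (the truncation levels are illustrative):

```python
from math import comb

# negative binomial series: sum_l C(l+s, s) * 0.5^{l+s+1} = 1 for each s
for s in range(0, 10):
    total = sum(comb(l + s, s) * 0.5 ** (l + s + 1) for l in range(400))
    assert abs(total - 1.0) < 1e-9

# hence the truncated sum over k is bounded by 2 for every s
K = 80
for s in range(1, 20):
    tail = sum(comb(k, s) * 0.5 ** k for k in range(s + 1, K + 1))
    assert tail < 2.0
```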

Consequently, there exists a large enough constant C_{2}>0 such that, for all \boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{\prime},\alpha_{0}\right), we have

k=2K0.5kk𝐳~k22\displaystyle\sum_{k=2}^{K}\frac{0.5^{k}}{k}\|\tilde{{\mathbf{z}}}_{k}\|_{2}^{2} =\displaystyle= k=2K0.5kki=1ps=1k1(0.5α)2s(θiθi)2s(ks)\displaystyle\sum_{k=2}^{K}\frac{0.5^{k}}{k}\sum_{i=1}^{p}\sum_{s=1}^{k-1}\left(0.5\alpha\right)^{-2s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{2s}\tbinom{k}{s}
=\displaystyle= k=2K0.5kks=1k1(ks)(0.5α)2s𝜽𝜽2s2s\displaystyle\sum_{k=2}^{K}\frac{0.5^{k}}{k}\sum_{s=1}^{k-1}\tbinom{k}{s}\left(0.5\alpha\right)^{-2s}\|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\|_{2s}^{2s}
\displaystyle\leq s=1K11s𝜽𝜽α/22s2s(k=s+1K(ks)0.5k)\displaystyle\sum_{s=1}^{K-1}\frac{1}{s}\left\|\frac{\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}}{\alpha/2}\right\|_{2s}^{2s}\left(\sum_{k=s+1}^{K}\tbinom{k}{s}0.5^{k}\right)
<\displaystyle< 2s=1K11s𝜽𝜽α/22s2s\displaystyle 2\sum_{s=1}^{K-1}\frac{1}{s}\left\|\frac{\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}}{\alpha/2}\right\|_{2s}^{2s}
<\displaystyle< 2s=1K11s𝜽𝜽α/2ss\displaystyle 2\sum_{s=1}^{K-1}\frac{1}{s}\left\|\frac{\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}}{\alpha/2}\right\|_{s}^{s}
\displaystyle\leq C2𝜽𝜽1.\displaystyle C_{2}\left\|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\right\|_{1}.

Here the last step follows from (A.42). With similar arguments as in the proof of the matrix Bernstein inequality (cf. Lemma 3), we can show that, uniformly for all k\leq K=\lfloor c_{0}\log(np)\rfloor+1, there exist large enough constants C_{3}>0 and c_{2}>0 such that, with probability greater than 1-(np)^{-c_{2}},

(A.45) 𝐖k2C3(k1)!(b(p)log(np)+σ(p)plog(np)).\|{\mathbf{W}}_{k}\|_{2}\leq C_{3}\left(k-1\right)!\left(b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}\right).

For brevity, the proof of inequality (A.45) is given separately in Section A.16. Consequently, combining the two bounds above with (A.45) and K=\lfloor c_{0}\log(np)\rfloor+1, we conclude that, with probability greater than 1-(np)^{-c_{2}},

\displaystyle S^{(2)} \displaystyle\leq \sum_{k=2}^{K}\frac{1}{p}\frac{1}{k!}\left|\sum_{1\leq i\neq j\leq p}\left(Y_{i,j}\sum_{s=1}^{k-1}\tbinom{k}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k-s}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right)\right|
\displaystyle\leq k=2K1p12kk!𝐳~k22𝐖k2\displaystyle\sum_{k=2}^{K}\frac{1}{p}\frac{1}{2^{k}k!}\|\tilde{{\mathbf{z}}}_{k}\|_{2}^{2}\|{\mathbf{W}}_{k}\|_{2}
\displaystyle\leq C3b(p)log(np)+σ(p)plog(np)pk=2K1k2k𝐳~k22\displaystyle C_{3}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}\sum_{k=2}^{K}\frac{1}{k2^{k}}\|\tilde{{\mathbf{z}}}_{k}\|_{2}^{2}
\displaystyle\leq C2C3b(p)log(np)+σ(p)plog(np)p𝜽𝜽1,\displaystyle C_{2}C_{3}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}\left\|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\right\|_{1},

uniformly for all \boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{\prime},\alpha_{0}\right). Finally, we derive an upper bound for S^{(3)}. By condition (L-A1), when c_{0} is chosen large enough, there exists a large enough constant c_{3}>1 such that, uniformly for all \boldsymbol{\theta}\in{\mathbf{B}}_{\infty}(\boldsymbol{\theta}^{\prime},\alpha_{0}) and \boldsymbol{\theta}^{\xi}, we have

S(3)\displaystyle S^{(3)}
=\displaystyle= 1p|1(K+1)!1ijp(Yi,js=0K+1(K+1s)(θiθi)s(θjθj)K+1sK+1li,j(𝜽ξ)θisθjK+1s)|\displaystyle\frac{1}{p}\left|\frac{1}{\left(K+1\right)!}\sum_{1\leq i\neq j\leq p}\left(Y_{i,j}\sum_{s=0}^{K+1}\tbinom{K+1}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{K+1-s}\frac{\partial^{K+1}l_{i,j}\left(\boldsymbol{\theta}^{\xi}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{K+1-s}}\right)\right|
\displaystyle\leq 1p1K+11ijp|Yi,j|(|θiθi|α+|θjθj|α)K+1\displaystyle\frac{1}{p}\frac{1}{K+1}\sum_{1\leq i\neq j\leq p}\left|Y_{i,j}\right|\left(\frac{\left|\theta_{i}-\theta_{i}^{\prime}\right|}{\alpha}+\frac{\left|\theta_{j}-\theta_{j}^{\prime}\right|}{\alpha}\right)^{K+1}
\displaystyle\leq pb(p)K+1(2𝜽𝜽α)K+1\displaystyle\frac{pb_{(p)}}{K+1}\left(\frac{2\left\|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\right\|_{\infty}}{\alpha}\right)^{K+1}
\displaystyle\leq pb(p)K+1(2α0α)c0log(np)\displaystyle\frac{pb_{(p)}}{K+1}\left(\frac{2\alpha_{0}}{\alpha}\right)^{c_{0}\log\left(np\right)}
\displaystyle\leq b(p)(np)c3.\displaystyle b_{(p)}\left(np\right)^{-c_{3}}.

Here the last step holds if c_{0} is chosen large enough that (2\alpha_{0}/\alpha)^{c_{0}/(c_{3}+1)}<e^{-1}. As np\rightarrow\infty, this bound is dominated by the upper bounds for S^{(1)} and S^{(2)}.

Combining the upper bounds on S^{(1)},S^{(2)} and S^{(3)}, we conclude that, for any given \boldsymbol{\theta}^{\prime} and any \alpha_{0}\in(0,\alpha/2), there exist large enough constants C>0 and c>0, independent of \boldsymbol{\theta}^{\prime}, such that, uniformly for all \boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{\prime},\alpha_{0}\right), with probability greater than 1-(np)^{-c},

|𝐋(𝜽)𝐋(𝜽)|Cb(p)log(np)+σ(p)plog(np)p𝜽𝜽1.\left|{\mathbf{L}}\left(\boldsymbol{\theta}\right)-{\mathbf{L}}\left(\boldsymbol{\theta}^{\prime}\right)\right|\leq C\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}\left\|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\right\|_{1}.

A.16 Proof of inequality (A.45)

Proof.

For any k=2,\ldots,K and 1\leq i\neq j\leq p, let {\mathbf{W}}_{k,i,j} be defined by keeping the (i,j)-th element of each {\mathbf{Y}}^{s}_{k}, s=1,\cdots,k-1, in {\mathbf{W}}_{k} unchanged, and setting all other elements to zero. Then the random matrices {\mathbf{W}}_{k,i,j}, 1\leq i\neq j\leq p, are independent, and

1ijp𝐖k,i,j=𝐖k.\sum_{1\leq i\neq j\leq p}{\mathbf{W}}_{k,i,j}={\mathbf{W}}_{k}.

For any {\mathbf{a}}\in\mathbb{R}^{(k-1)p}, we can write it as

𝐚=[𝐚(1)𝐚(k1)],{\mathbf{a}}=\left[\begin{array}[]{c}{\mathbf{a}}^{(1)}\\ \vdots\\ {\mathbf{a}}^{(k-1)}\\ \end{array}\right],

with {\mathbf{a}}^{(s)}=(a^{(s)}_{1},\ldots,a^{(s)}_{p})^{\top}\in\mathbb{R}^{p}, s=1,2,\cdots,k-1. Then for any k=2,\ldots,K and 1\leq i\neq j\leq p, we have,

𝐖k,i,j2\displaystyle\|{\mathbf{W}}_{k,i,j}\|_{2} =\displaystyle= sup𝐚2=1𝐖k,i,j𝐚2\displaystyle\sup_{\|{\mathbf{a}}\|_{2}=1}\|{\mathbf{W}}_{k,i,j}{\mathbf{a}}\|_{2}
=\displaystyle= sup𝐚2=1(s=1k1(αkYi,jkli,j(𝜽)θisθjksaj(ks))2)1/2\displaystyle\sup_{\|{\mathbf{a}}\|_{2}=1}\left(\sum_{s=1}^{k-1}\left(\alpha^{k}Y_{i,j}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}a^{(k-s)}_{j}\right)^{2}\right)^{1/2}
\displaystyle\leq \max_{1\leq s\leq k-1}\left|\alpha^{k}Y_{i,j}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right|\sup_{\|{\mathbf{a}}\|_{2}=1}\left(\sum_{s=1}^{k-1}\left(a^{(k-s)}_{j}\right)^{2}\right)^{1/2}
\displaystyle\leq (k1)!b(p).\displaystyle\left(k-1\right)!b_{(p)}.

On the other hand,

max{1ijpE(𝐖k,i,j𝐖k,i,j)2,1ijpE(𝐖k,i,j𝐖k,i,j)2}\displaystyle\max\left\{\left\|\sum_{1\leq i\neq j\leq p}{\rm E}\left({\mathbf{W}}_{k,i,j}{\mathbf{W}}_{k,i,j}^{\top}\right)\right\|_{2},\left\|\sum_{1\leq i\neq j\leq p}{\rm E}\left({\mathbf{W}}_{k,i,j}^{\top}{\mathbf{W}}_{k,i,j}\right)\right\|_{2}\right\}
=\displaystyle= max{sup𝐚2=1|𝐚(1ijpE(𝐖k,i,j𝐖k,i,j))𝐚|,\displaystyle\max\Bigg{\{}\sup_{\|{\mathbf{a}}\|_{2}=1}\left|{\mathbf{a}}^{\top}\left(\sum_{1\leq i\neq j\leq p}{\rm E}\left({\mathbf{W}}_{k,i,j}{\mathbf{W}}_{k,i,j}^{\top}\right)\right){\mathbf{a}}\right|,
sup𝐚2=1|𝐚(1ijpE(𝐖k,i,j𝐖k,i,j))𝐚|}\displaystyle\sup_{\|{\mathbf{a}}\|_{2}=1}\left|{\mathbf{a}}^{\top}\left(\sum_{1\leq i\neq j\leq p}{\rm E}\left({\mathbf{W}}_{k,i,j}^{\top}{\mathbf{W}}_{k,i,j}\right)\right){\mathbf{a}}\right|\Bigg{\}}
=\displaystyle= max{sup𝐚2=1|1ijps=1k1(Var(Yi,j)α2k|kli,j(𝜽)θisθjks|2(aj(ks))2)|,\displaystyle\max\Bigg{\{}\sup_{\|{\mathbf{a}}\|_{2}=1}\left|\sum_{1\leq i\neq j\leq p}\sum_{s=1}^{k-1}\left({\rm Var}\left(Y_{i,j}\right)\alpha^{2k}\left|\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right|^{2}\left(a^{(k-s)}_{j}\right)^{2}\right)\right|,
sup𝐚2=1|1ijps=1k1(Var(Yi,j)α2k|kli,j(𝜽)θisθjks|2(ai(s))2)|}\displaystyle\sup_{\|{\mathbf{a}}\|_{2}=1}\left|\sum_{1\leq i\neq j\leq p}\sum_{s=1}^{k-1}\left({\rm Var}\left(Y_{i,j}\right)\alpha^{2k}\left|\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right|^{2}\left(a^{(s)}_{i}\right)^{2}\right)\right|\Bigg{\}}
\displaystyle\leq maxi,j,s(Var(Yi,j)α2k|kli,j(𝜽)θisθjks|2)sup𝐚2=1|1ijps=1k1(ai(ks))2+(aj(s))2|\displaystyle\max_{i,j,s}\left({\rm Var}\left(Y_{i,j}\right)\alpha^{2k}\left|\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right|^{2}\right)\sup_{\|{\mathbf{a}}\|_{2}=1}\left|\sum_{1\leq i\neq j\leq p}\sum_{s=1}^{k-1}\left(a^{(k-s)}_{i}\right)^{2}+\left(a^{(s)}_{j}\right)^{2}\right|
\displaystyle\leq 2p((k1)!)2σ(p)2.\displaystyle 2p\left(\left(k-1\right)!\right)^{2}\sigma_{(p)}^{2}.

Using the general matrix Bernstein inequality (cf. Theorem 6.17 and equation (6.43) of [33]), we have,

P(𝐖k2>ϵ)\displaystyle P\left(\left\|{\mathbf{W}}_{k}\right\|_{2}>\epsilon\right) =P(1i<jp𝐖k,i,j2>ϵ)\displaystyle=P\left(\left\|\sum_{1\leq i<j\leq p}{\mathbf{W}}_{k,i,j}\right\|_{2}>\epsilon\right)
2pexp(ϵ2(p1)((k1)!)2σ(p)2+2(k1)!b(p)ϵ).\displaystyle\leq 2p\ \exp\left(-\frac{\epsilon^{2}}{\left(p-1\right)\left(\left(k-1\right)!\right)^{2}\sigma_{(p)}^{2}+2\left(k-1\right)!b_{(p)}\epsilon}\right).
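The scaling in this tail bound can be checked empirically. The following Python sketch is a simplified stand-in for {\mathbf{W}}_{k} (a single matrix with independent, bounded, mean-zero entries, taking b_{(p)}=\sigma_{(p)}=1; it is not the exact block construction above) and confirms that the spectral norm is of the Bernstein order b\log p+\sigma\sqrt{p\log p} up to a constant:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 200
# independent mean-zero entries bounded by b = 1 with variance sigma^2 = 1
W = rng.choice([-1.0, 1.0], size=(p, p))
spec_norm = np.linalg.norm(W, 2)

# Bernstein-type rate: b * log(p) + sigma * sqrt(p * log(p))
rate = np.log(p) + np.sqrt(p * np.log(p))
assert spec_norm <= 3.0 * rate
```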

Consequently, there exist large enough constants C_{3}>0 and c_{2}>0 such that, by choosing

\displaystyle\epsilon=C_{3}\left(k-1\right)!\left(b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}\right),

we have that, with probability greater than 1-(np)^{-c_{2}},

\|{\mathbf{W}}_{k}\|_{2}\leq C_{3}\left(k-1\right)!\left(b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}\right)

holds uniformly for all k\leq K=\lfloor c_{0}\log(np)\rfloor+1, where c_{0}>0 is a large enough constant. ∎

A.17 Proof of Theorem 6

Proof.

For any vector {\mathbf{x}}\in\mathbb{R}^{p}, we define {\mathbf{x}}_{-i}:=(x_{1},\cdots,x_{i-1},x_{i+1},\cdots,x_{p})^{\top}. With similar arguments as in the proof of Theorem 5, there exists a \boldsymbol{\theta}^{\xi} such that

𝐋i(𝜽)𝐋i(𝜽)\displaystyle{\mathbf{L}}_{i}\left(\boldsymbol{\theta}\right)-{\mathbf{L}}_{i}\left(\boldsymbol{\theta}^{\prime}\right)
=\displaystyle= 1pk=1(1k!j=1,jip(Yi,js=0k(ks)(θiθi)s(θjθj)kskli,j(𝜽)θisθjks))\displaystyle\frac{1}{p}\sum_{k=1}^{\infty}\left(\frac{1}{k!}\sum_{j=1,\>j\neq i}^{p}\left(Y_{i,j}\sum_{s=0}^{k}\tbinom{k}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k-s}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right)\right)
\displaystyle\leq |k=1K1p1k!j=1,jipYi,j((θiθi)k+(θjθj)k)kli,j(𝜽)θik|\displaystyle\left|\sum_{k=1}^{K}\frac{1}{p}\frac{1}{k!}\sum_{j=1,\>j\neq i}^{p}Y_{i,j}\left(\left(\theta_{i}-\theta_{i}^{\prime}\right)^{k}+\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k}\right)\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{k}}\right|
+|k=2K1p1k!j=1,jip(Yi,js=1k1(ks)(θiθi)s(θjθj)kskli,j(𝜽)θisθjks)|\displaystyle+\left|\sum_{k=2}^{K}\frac{1}{p}\frac{1}{k!}\sum_{j=1,\>j\neq i}^{p}\left(Y_{i,j}\sum_{s=1}^{k-1}\tbinom{k}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k-s}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right)\right|
+1p|1(K+1)!j=1,jip(Yi,js=0K+1(K+1s)(θiθi)s(θjθj)K+1sK+1li,j(𝜽ξ)θisθjK+1s)|\displaystyle+\frac{1}{p}\left|\frac{1}{\left(K+1\right)!}\sum_{j=1,\>j\neq i}^{p}\left(Y_{i,j}\sum_{s=0}^{K+1}\tbinom{K+1}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{K+1-s}\frac{\partial^{K+1}l_{i,j}\left(\boldsymbol{\theta}^{\xi}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{K+1-s}}\right)\right|
=\displaystyle= Si(1)+Si(2)+Si(3),\displaystyle S^{(1)}_{i}+S^{(2)}_{i}+S^{(3)}_{i},

where K=\lfloor c_{0}\log(np)\rfloor+1 for some large enough constant c_{0}. First consider S^{(1)}_{i}. There exist large enough constants C_{1}>0 and c_{1}>0 such that, uniformly for all i=1,\cdots,p, k=1,2,\cdots,K and all \boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{\prime},\alpha_{0}\right), we have, with probability greater than 1-(np)^{-c_{1}},

1p1k!|j=1,jipYi,j((θiθi)k+(θjθj)k)kli,j(𝜽)θik|\displaystyle\frac{1}{p}\frac{1}{k!}\left|\sum_{j=1,\>j\neq i}^{p}Y_{i,j}\left(\left(\theta_{i}-\theta_{i}^{\prime}\right)^{k}+\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k}\right)\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{k}}\right|
\displaystyle\leq \frac{1}{p}\frac{1}{k!}\left|\frac{\theta_{i}}{\alpha}-\frac{\theta_{i}^{\prime}}{\alpha}\right|^{k}\left|\sum_{j\neq i,j=1}^{p}Y_{i,j}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{k}}\alpha^{k}\right|
+1p1k!ji,j=1p|θjαθjα|k|Yi,jkli,j(𝜽)θikαk|\displaystyle+\frac{1}{p}\frac{1}{k!}\sum_{j\neq i,j=1}^{p}\left|\frac{\theta_{j}}{\alpha}-\frac{\theta_{j}^{\prime}}{\alpha}\right|^{k}\left|Y_{i,j}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{k}}\alpha^{k}\right|
\displaystyle\leq 1p1k!|θiθiα|kmaxi|ji,j=1pYi,jkli,j(𝜽)θikαk|\displaystyle\frac{1}{p}\frac{1}{k!}\left|\frac{\theta_{i}-\theta^{\prime}_{i}}{\alpha}\right|^{k}\max_{i}\left|\sum_{j\neq i,j=1}^{p}Y_{i,j}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{k}}\alpha^{k}\right|
+1p1k!𝜽i𝜽iαkkmaxi,j|Yi,jkli,j(𝜽)θikαk|\displaystyle+\frac{1}{p}\frac{1}{k!}\left\|\frac{\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}^{\prime}_{-i}}{\alpha}\right\|_{k}^{k}\max_{i,j}\left|Y_{i,j}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{k}}\alpha^{k}\right|
\displaystyle\leq C1(|θiθiα|kb(p)log(np)+σ(p)plog(np)kp+𝜽i𝜽iαkkb(p)kp).\displaystyle C_{1}\left(\left|\frac{\theta_{i}-\theta^{\prime}_{i}}{\alpha}\right|^{k}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{kp}+\left\|\frac{\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}^{\prime}_{-i}}{\alpha}\right\|_{k}^{k}\frac{b_{(p)}}{kp}\right).

Then, from inequality (A.42), there exist large enough constants C_{2}>0 and c_{2}>0 such that, with probability greater than 1-(np)^{-c_{2}},

Si(1)\displaystyle S^{(1)}_{i} \displaystyle\leq k=1K1p1k!|j=1,jipYi,j((θiθi)k+(θjθj)k)kli,j(𝜽)θik|\displaystyle\sum_{k=1}^{K}\frac{1}{p}\frac{1}{k!}\left|\sum_{j=1,\>j\neq i}^{p}Y_{i,j}\left(\left(\theta_{i}-\theta_{i}^{\prime}\right)^{k}+\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k}\right)\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{k}}\right|
\displaystyle\leq C1k=1K(|θiθiα|kb(p)log(np)+σ(p)plog(np)kp+𝜽i𝜽iαkkb(p)kp)\displaystyle C_{1}\sum_{k=1}^{K}\left(\left|\frac{\theta_{i}-\theta^{\prime}_{i}}{\alpha}\right|^{k}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{kp}+\left\|\frac{\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}^{\prime}_{-i}}{\alpha}\right\|_{k}^{k}\frac{b_{(p)}}{kp}\right)
\displaystyle\leq C1b(p)log(np)+σ(p)plog(np)p(k=1K1k|θiθiα|k)+C1b(p)pk=1K1k𝜽i𝜽iαkk\displaystyle C_{1}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}\left(\sum_{k=1}^{K}\frac{1}{k}\left|\frac{\theta_{i}-\theta^{\prime}_{i}}{\alpha}\right|^{k}\right)+C_{1}\frac{b_{(p)}}{p}\sum_{k=1}^{K}\frac{1}{k}\left\|\frac{\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}^{\prime}_{-i}}{\alpha}\right\|_{k}^{k}
\displaystyle\leq C2b(p)log(np)+σ(p)plog(np)p|θiθi|+C2b(p)p𝜽i𝜽i1,\displaystyle C_{2}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}\left|\theta_{i}-\theta^{\prime}_{i}\right|+C_{2}\frac{b_{(p)}}{p}\left\|\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}^{\prime}_{-i}\right\|_{1},

holds uniformly for i=1,\cdots,p and \boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{\prime},\alpha_{0}\right).

Next we derive an upper bound for S^{(2)}_{i}. With the fact that Y_{i,j}^{2}\leq b_{(p)}^{2} and

Var(Yi,j2)E(Yi,j4)b(p)2E(Yi,j2)=b(p)2σ(p)2,{\rm Var}\left(Y_{i,j}^{2}\right)\leq{\rm E}\left(Y_{i,j}^{4}\right)\leq b_{(p)}^{2}{\rm E}\left(Y_{i,j}^{2}\right)=b_{(p)}^{2}\sigma^{2}_{(p)},

by Lemma 6 there exist large enough constants C_{3},c_{3}>0 such that, with probability greater than 1-(np)^{-c_{3}},

maxi(j=1,jipYi,j2)1/2\displaystyle\max_{i}\left(\sum_{j=1,\>j\neq i}^{p}Y_{i,j}^{2}\right)^{1/2} C3(b(p)2log(np)+σ(p)b(p)plog(np))1/2\displaystyle\leq C_{3}\left(b_{(p)}^{2}\log(np)+\sigma_{(p)}b_{(p)}\sqrt{p\log(np)}\right)^{1/2}
C3((plog(np))1/4σ(p)b(p)+b(p)log(np)).\displaystyle\leq C_{3}\left(\left(p\log(np)\right)^{1/4}\sqrt{\sigma_{(p)}b_{(p)}}+b_{(p)}\sqrt{\log(np)}\right).
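The moment inequality {\rm Var}(Y^{2})\leq{\rm E}(Y^{4})\leq b^{2}{\rm E}(Y^{2}) used in this step relies only on |Y|\leq b, and is easy to check numerically; a Python sketch (the uniform distribution and sample size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
b = 2.0
Y = rng.uniform(-b, b, size=200_000)       # any distribution supported on [-b, b]

var_Y2 = np.var(Y ** 2)
assert var_Y2 <= np.mean(Y ** 4)                      # Var(Y^2) <= E(Y^4)
assert np.mean(Y ** 4) <= b ** 2 * np.mean(Y ** 2)    # E(Y^4) <= b^2 E(Y^2)
```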

Consequently, there exists a large enough constant C_{4}>0 such that, uniformly for all i=1,\cdots,p and \boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{\prime},\alpha_{0}\right), we have, with probability greater than 1-(np)^{-c_{3}},

S(2)i\displaystyle S^{(2)}_{i} \displaystyle\leq k=2K1p1k!j=1,jip|(Yi,js=1k1(ks)(θiθi)s(θjθj)kskli,j(𝜽)θisθjks)|\displaystyle\sum_{k=2}^{K}\frac{1}{p}\frac{1}{k!}\sum_{j=1,\>j\neq i}^{p}\left|\left(Y_{i,j}\sum_{s=1}^{k-1}\tbinom{k}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{k-s}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right)\right|
\displaystyle\leq k=2K1p1k!0.5k(j=1,jipYi,j2s=1k1(θiθiα/2)2s)1/2\displaystyle\sum_{k=2}^{K}\frac{1}{p}\frac{1}{k!}0.5^{k}\left(\sum_{j=1,\>j\neq i}^{p}Y_{i,j}^{2}\sum_{s=1}^{k-1}\left(\frac{\theta_{i}-\theta_{i}^{\prime}}{\alpha/2}\right)^{2s}\right)^{1/2}
(j=1,jips=1k1((ks)(θjθjα/2)ksαkkli,j(𝜽)θisθjks)2)1/2\displaystyle\quad\left(\sum_{j=1,\>j\neq i}^{p}\sum_{s=1}^{k-1}\left(\tbinom{k}{s}\left(\frac{\theta_{j}-\theta_{j}^{\prime}}{\alpha/2}\right)^{k-s}\alpha^{k}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right)^{2}\right)^{1/2}
\displaystyle\leq k=2K1p1k!0.5k(|θiθi|2α2/4|θiθi|2)1/2maxi(j=1,jipYi,j2)1/2\displaystyle\sum_{k=2}^{K}\frac{1}{p}\frac{1}{k!}0.5^{k}\left(\frac{\left|\theta_{i}-\theta^{\prime}_{i}\right|^{2}}{\alpha^{2}/4-\left|\theta_{i}-\theta^{\prime}_{i}\right|^{2}}\right)^{1/2}\max_{i}\left(\sum_{j=1,\>j\neq i}^{p}Y_{i,j}^{2}\right)^{1/2}
(s=1k1(ks)2j=1,jip(θjθjα/2)2(ks))1/2maxj,s,k|αkkli,j(𝜽)θisθjks|\displaystyle\quad\left(\sum_{s=1}^{k-1}\tbinom{k}{s}^{2}\sum_{j=1,\>j\neq i}^{p}\left(\frac{\theta_{j}-\theta_{j}^{\prime}}{\alpha/2}\right)^{2(k-s)}\right)^{1/2}\max_{j,s,k}\left|\alpha^{k}\frac{\partial^{k}l_{i,j}\left(\boldsymbol{\theta}^{\prime}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{k-s}}\right|
\displaystyle\leq C4|θiθi|(plog(np))1/4σ(p)b(p)+b(p)log(np)p\displaystyle C_{4}\left|\theta_{i}-\theta^{\prime}_{i}\right|\frac{\left(p\log(np)\right)^{1/4}\sqrt{\sigma_{(p)}b_{(p)}}+b_{(p)}\sqrt{\log(np)}}{p}
k=2K0.5kk(s=1k1(ks)2j=1,jip(θjθjα/2)2(ks))1/2\displaystyle\quad\sum_{k=2}^{K}\frac{0.5^{k}}{k}\left(\sum_{s=1}^{k-1}\tbinom{k}{s}^{2}\sum_{j=1,\>j\neq i}^{p}\left(\frac{\theta_{j}-\theta_{j}^{\prime}}{\alpha/2}\right)^{2(k-s)}\right)^{1/2}
\displaystyle\leq C4|θiθi|(plog(np))1/4σ(p)b(p)+b(p)log(np)p\displaystyle C_{4}\left|\theta_{i}-\theta^{\prime}_{i}\right|\frac{\left(p\log(np)\right)^{1/4}\sqrt{\sigma_{(p)}b_{(p)}}+b_{(p)}\sqrt{\log(np)}}{p}
×k=2Ks=1k1j=1,jip0.5ks(ks)|θjθjα/2|ks\displaystyle\times\sum_{k=2}^{K}\sum_{s=1}^{k-1}\sum_{j=1,\>j\neq i}^{p}\frac{0.5^{k}}{s}\tbinom{k}{s}\left|\frac{\theta_{j}-\theta_{j}^{\prime}}{\alpha/2}\right|^{k-s}
\displaystyle\leq C4|θiθi|(plog(np))1/4σ(p)b(p)+b(p)log(np)p\displaystyle C_{4}\left|\theta_{i}-\theta^{\prime}_{i}\right|\frac{\left(p\log(np)\right)^{1/4}\sqrt{\sigma_{(p)}b_{(p)}}+b_{(p)}\sqrt{\log(np)}}{p}
×s=1K1(1s𝜽i𝜽iα/2ssk=s+1K(ks)0.5k)\displaystyle\times\sum_{s=1}^{K-1}\left(\frac{1}{s}\left\|\frac{\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}_{-i}^{\prime}}{\alpha/2}\right\|_{s}^{s}\sum_{k=s+1}^{K}\tbinom{k}{s}0.5^{k}\right)
\displaystyle\leq 2C4|θiθi|(plog(np))1/4σ(p)b(p)+b(p)log(np)ps=1K11s𝜽i𝜽iα/2ss\displaystyle 2C_{4}\left|\theta_{i}-\theta^{\prime}_{i}\right|\frac{\left(p\log(np)\right)^{1/4}\sqrt{\sigma_{(p)}b_{(p)}}+b_{(p)}\sqrt{\log(np)}}{p}\sum_{s=1}^{K-1}\frac{1}{s}\left\|\frac{\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}_{-i}^{\prime}}{\alpha/2}\right\|_{s}^{s}
\displaystyle\leq 4C4𝜽i𝜽i1|θiθi|(plog(np))1/4σ(p)b(p)+b(p)log(np)p.\displaystyle 4C_{4}\left\|\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}_{-i}^{\prime}\right\|_{1}\left|\theta_{i}-\theta^{\prime}_{i}\right|\frac{\left(p\log(np)\right)^{1/4}\sqrt{\sigma_{(p)}b_{(p)}}+b_{(p)}\sqrt{\log(np)}}{p}.

Here in the above inequalities we have used the fact that \sum_{k=s+1}^{K}\tbinom{k}{s}0.5^{k}<2, and the last step follows from inequality (A.42).

Finally, we derive an upper bound for S^{(3)}_{i}. By condition (L-A1) and choosing K=\lfloor c_{0}\log(np)\rfloor+1 with c_{0} a large enough constant, there exists a large enough constant c_{4}>0 such that, uniformly for all i=1,\cdots,p, \boldsymbol{\theta}^{\xi} and \boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{\prime},\alpha_{0}\right), we have

S(3)i\displaystyle S^{(3)}_{i} =\displaystyle= 1p|1(K+1)!j=1,jip(Yi,js=0K+1(K+1s)(θiθi)s(θjθj)K+1sK+1li,j(𝜽ξ)θisθjK+1s)|\displaystyle\frac{1}{p}\left|\frac{1}{\left(K+1\right)!}\sum_{j=1,\>j\neq i}^{p}\left(Y_{i,j}\sum_{s=0}^{K+1}\tbinom{K+1}{s}\left(\theta_{i}-\theta_{i}^{\prime}\right)^{s}\left(\theta_{j}-\theta_{j}^{\prime}\right)^{K+1-s}\frac{\partial^{K+1}l_{i,j}\left(\boldsymbol{\theta}^{\xi}\right)}{\partial\theta_{i}^{s}\partial\theta_{j}^{K+1-s}}\right)\right|
\displaystyle\leq 1p1K+1j=1,jip|Yi,j|(|θiθi|α+|θjθj|α)K+1\displaystyle\frac{1}{p}\frac{1}{K+1}\sum_{j=1,\>j\neq i}^{p}\left|Y_{i,j}\right|\left(\frac{\left|\theta_{i}-\theta_{i}^{\prime}\right|}{\alpha}+\frac{\left|\theta_{j}-\theta_{j}^{\prime}\right|}{\alpha}\right)^{K+1}
\displaystyle\leq b(p)K+1(2𝜽𝜽α)K+1\displaystyle\frac{b_{(p)}}{K+1}\left(\frac{2\left\|\boldsymbol{\theta}-\boldsymbol{\theta}^{\prime}\right\|_{\infty}}{\alpha}\right)^{K+1}
\displaystyle\leq b(p)K+1(2α0α)K+1\displaystyle\frac{b_{(p)}}{K+1}\left(\frac{2\alpha_{0}}{\alpha}\right)^{K+1}
\displaystyle\leq b(p)(np)c4.\displaystyle b_{(p)}\left(np\right)^{-c_{4}}.

Here the last step holds if c_{0} is chosen large enough that (2\alpha_{0}/\alpha)^{c_{0}/(c_{4}+1)}<e^{-1}. As np\rightarrow\infty, this bound is dominated by the upper bounds for S^{(1)}_{i} and S^{(2)}_{i}.

Consequently, by choosing K=\lfloor c_{0}\log(np)\rfloor+1 with c_{0} a large enough constant, we conclude that, for any given \boldsymbol{\theta}^{\prime}, there exist large enough constants C,c>0 such that, uniformly for all \boldsymbol{\theta}\in{\mathbf{B}}_{\infty}\left(\boldsymbol{\theta}^{\prime},\alpha_{0}\right) and 1\leq i\leq p, with probability greater than 1-(np)^{-c},

|𝐋i(𝜽)𝐋i(𝜽)|\displaystyle\left|{\mathbf{L}}_{i}\left(\boldsymbol{\theta}\right)-{\mathbf{L}}_{i}\left(\boldsymbol{\theta}^{\prime}\right)\right|
\displaystyle\leq C2b(p)log(np)+σ(p)plog(np)p|θiθi|+C2b(p)p𝜽i𝜽i1\displaystyle C_{2}\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}\left|\theta_{i}-\theta^{\prime}_{i}\right|+C_{2}\frac{b_{(p)}}{p}\left\|\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}^{\prime}_{-i}\right\|_{1}
+C4𝜽i𝜽i1|θiθi|(plog(np))1/4σ(p)b(p)+b(p)log(np)p\displaystyle+C_{4}\left\|\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}_{-i}^{\prime}\right\|_{1}\left|\theta_{i}-\theta^{\prime}_{i}\right|\frac{\left(p\log(np)\right)^{1/4}\sqrt{\sigma_{(p)}b_{(p)}}+b_{(p)}\sqrt{\log(np)}}{p}
\displaystyle\leq Cb(p)p𝜽i𝜽i1+C(𝜽i𝜽i1+1)|θiθi|b(p)log(np)+σ(p)plog(np)p.\displaystyle C\frac{b_{(p)}}{p}\left\|\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}_{-i}^{\prime}\right\|_{1}+C\left(\left\|\boldsymbol{\theta}_{-i}-\boldsymbol{\theta}_{-i}^{\prime}\right\|_{1}+1\right)\left|\theta_{i}-\theta^{\prime}_{i}\right|\frac{b_{(p)}\log(np)+\sigma_{(p)}\sqrt{p\log(np)}}{p}.

Appendix B Additional numerical results

B.1 Informal justification of the use of TWHM in analyzing the insecta-ant-colony4 dataset

To motivate the use of the proposed TWHM for the analysis of the insecta-ant-colony4 dataset, we plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the degree sequences of two ants, selected as those attaining, respectively, the highest and the lowest values of

t=1n|p12j=1,jipXi,jt|.\sum_{t=1}^{n}\left|\frac{p-1}{2}-\sum_{j=1,\>j\neq i}^{p}X_{i,j}^{t}\right|.

Figure 5 shows that both degree sequences exhibit patterns reminiscent of a first-order autoregressive model with long memory, which motivates the use of the TWHM.
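The selection statistic and the lag-one autocorrelation it is meant to expose can be illustrated with a small simulation. The following Python sketch is hypothetical: the snapshot mechanism, sizes, and probabilities are illustrative choices, not the fitted TWHM:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 20                             # hypothetical numbers of snapshots and ants

# simulate 0/1 snapshots with tie retention: keep an edge w.p. 0.8, form one w.p. 0.1
X = np.zeros((n, p, p))
X[0] = (rng.random((p, p)) < 0.3).astype(float)
for t in range(1, n):
    keep = (rng.random((p, p)) < 0.8).astype(float)
    new = (rng.random((p, p)) < 0.1).astype(float)
    X[t] = X[t - 1] * keep + (1.0 - X[t - 1]) * new

deg = X.sum(axis=2)                             # (n, p): degree sequence of each ant
score = np.abs((p - 1) / 2 - deg).sum(axis=0)   # the selection statistic, per ant
i_hi, i_lo = int(score.argmax()), int(score.argmin())

def acf1(x):
    """Lag-one sample autocorrelation."""
    x = x - x.mean()
    return float((x[:-1] * x[1:]).sum() / (x * x).sum())
```

Because ties are retained with high probability, the simulated degree sequences inherit positive lag-one autocorrelation, mirroring the AR(1)-like ACF patterns in Figure 5.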

Figure 5: The ACF and PACF plots of the degree sequences of two selected ants.

B.2 Community detection under stochastic block structures

We conduct additional numerical studies to assess the efficacy of community detection based on the estimated \beta-parameters. To implement stochastic block structures within the TWHM framework, we partition the p nodes into k communities of equal size, with identical parameters (\beta_{i,0},\beta_{i,1}) for all nodes i within the same community. We also explore scenarios where the networks are independently generated from a stochastic block model (SBM). Specifically, we consider the following settings:

  • Setting 1: The networks are generated under TWHM with \beta_{i,r}=-0.2,0,0.2 (r=0,1) for all nodes i in communities 1,2 and 3, respectively.

  • Setting 2: The networks are generated under TWHM with \beta_{i,r}=-0.4,0,0.4 (r=0,1) for all nodes i in communities 1,2 and 3, respectively.

  • Setting 3: The networks are independently generated under SBM, with the probability matrix among different communities specified as

    [0.260.10.10.10.20.10.10.10.14].\displaystyle\left[\begin{array}[]{ccc}0.26&0.1&0.1\\ 0.1&0.2&0.1\\ 0.1&0.1&0.14\\ \end{array}\right].
  • Setting 4: The networks are independently generated under SBM, with the probability matrix among different communities given by

    [0.40.30.20.30.2250.150.20.150.1].\displaystyle\left[\begin{array}[]{ccc}0.4&0.3&0.2\\ 0.3&0.225&0.15\\ 0.2&0.15&0.1\\ \end{array}\right].
  • Setting 5: The networks are independently generated under SBM, with the probability matrix among different communities given by

    [0.90.50.30.50.30.20.30.20.15].\displaystyle\left[\begin{array}[]{ccc}0.9&0.5&0.3\\ 0.5&0.3&0.2\\ 0.3&0.2&0.15\\ \end{array}\right].

Networks in Settings 1-2 are generated with autoregressive dependence using our TWHM model, while networks in Setting 3 are independent samples following the classical SBM structure. In Settings 4-5, the networks are also generated under SBM structures, but the connection probability matrices are not of full rank.
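The rank deficiency in Settings 4 and 5 is easy to verify directly; a short Python check of the two probability matrices:

```python
import numpy as np

P4 = np.array([[0.4, 0.3, 0.2],
               [0.3, 0.225, 0.15],
               [0.2, 0.15, 0.1]])
P5 = np.array([[0.9, 0.5, 0.3],
               [0.5, 0.3, 0.2],
               [0.3, 0.2, 0.15]])

# rows of P4 are all multiples of (0.4, 0.3, 0.2); P5 has a vanishing determinant
assert np.linalg.matrix_rank(P4, tol=1e-8) == 1
assert np.linalg.matrix_rank(P5, tol=1e-8) == 2
```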
Once the $\beta$-parameters are estimated, we apply k-means clustering to these parameters to cluster the $p$ nodes; we denote this method as "TWHM-Cluster". For comparison, we apply spectral clustering, which is widely used for the SBM, to the time-averaged networks; we denote this method as "SBM-Spectral". All experiments are repeated 100 times, and the clustering accuracies are reported in Table 7 below.
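The two clustering pipelines can be sketched as follows, assuming numpy and scikit-learn; the helper names are ours, and clustering accuracy is computed up to the best relabelling of the $k$ clusters.

```python
import itertools
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def cluster_accuracy(true, pred, k):
    """Clustering accuracy maximised over all relabellings of the k clusters."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        mapped = np.array([perm[c] for c in pred])
        best = max(best, (mapped == true).mean())
    return best

def twhm_cluster(beta_hat, k, seed=0):
    """k-means on the estimated (beta_{i,0}, beta_{i,1}) pairs (TWHM-Cluster)."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(beta_hat)

def sbm_spectral(nets, k, seed=0):
    """Spectral clustering on the time-averaged network (SBM-Spectral)."""
    A_bar = np.mean(nets, axis=0)  # nonnegative, so usable as an affinity
    return SpectralClustering(n_clusters=k, affinity='precomputed',
                              random_state=seed).fit_predict(A_bar)
```

twhm_cluster takes the $p \times 2$ matrix of estimated node parameters, while sbm_spectral takes the list of observed adjacency matrices directly.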
From Table 7, we observe that community detection based on the $\beta$-parameters performs significantly better under Settings 1 and 2, where the data were generated from our TWHM model. This improvement is attributable to the fact that the parameter estimation accounts for the autoregressive structure of the networks.
When the networks were independently generated from the SBM under Setting 3, the performance of TWHM-Cluster is comparable to that of SBM-Spectral. However, when the probability matrix of the SBM is not of full rank (Settings 4 and 5), TWHM-Cluster still demonstrates promising performance, while classical spectral clustering can be much less satisfactory.

Table 7: Mean clustering accuracy of TWHM-Cluster and SBM-Spectral over 100 replications. Here $k$, $n$ and $p$ denote the number of communities, the number of network observations, and the number of nodes, respectively.
$(k,n,p)$ TWHM-Cluster SBM-Spectral
Setting 1 (3,2,300) 68.6% 37.1%
(3,10,300) 95.1% 37.0%
(3,50,300) 99.5% 38.7%
Setting 2 (3,2,300) 92.2% 39.7%
(3,10,300) 95.6% 48.6%
(3,50,300) 100.0% 63.0%
Setting 3 (3,10,300) 92.1% 99.8%
(3,30,300) 99.4% 100.0%
(3,50,300) 99.3% 100.0%
Setting 4 (3,10,500) 97.0% 37.1%
(3,30,500) 93.9% 37.0%
(3,50,500) 100.0% 37.5%
Setting 5 (3,10,500) 80.0% 71.2%
(3,30,500) 80.2% 70.7%
(3,50,500) 83.8% 72.8%

B.3 Dynamic protein-protein interaction networks

In this section, we apply the proposed TWHM to 12 dynamic protein-protein interaction networks (PPINs) of yeast cells examined in [5]. Each dynamic network comprises 36 network observations. The objective of investigating protein-protein interactions is to gain valuable insights into the cellular function and machinery of a proteome [34]. To provide an overview of these datasets, we present selected summary statistics in Table 8.

Table 8: Some statistics of the 12 protein-protein interaction network datasets.
Dataset # of Nodes Mean degree Density
DPPIN-Uetz 922 4.68 0.22%
DPPIN-Ito 2856 6.05 0.07%
DPPIN-Ho 1548 54.55 0.13%
DPPIN-Gavin 2541 110.22 0.08%
DPPIN-Krogan (LCMS) 2211 77.01 0.09%
DPPIN-Krogan (MALDI) 2099 74.60 0.10%
DPPIN-Yu 1163 6.19 0.17%
DPPIN-Breitkreutz 869 90.33 0.23%
DPPIN-Babu 5003 44.56 0.04%
DPPIN-Lambert 697 19.09 0.29%
DPPIN-Tarassov 1053 9.17 0.19%
DPPIN-Hazbun 143 27.40 1.40%
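The mean degree and density reported in Table 8 can be computed from the observed snapshots as follows; this is a minimal numpy sketch assuming undirected networks without self-loops, with the function name ours.

```python
import numpy as np

def summary_stats(nets):
    """Mean degree and density averaged over the n observed snapshots.

    nets: array-like of shape (n, p, p), symmetric 0/1 adjacency matrices.
    """
    nets = np.asarray(nets)
    n, p, _ = nets.shape
    degrees = nets.sum(axis=2)                 # degree of each node at each time
    mean_degree = degrees.mean()
    # fraction of the p(p-1)/2 dyads linked, averaged over snapshots
    density = nets.sum() / (n * p * (p - 1))
    return mean_degree, density
```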

We have employed our method for link prediction on these PPINs. As in the main paper, we use TWHM with either a fixed cutoff point of $0.5$ (TWHM0.5) or the adaptive rule (TWHMadaptive). For comparison, we use the naive estimator that predicts $\mathbf{X}^{t}$ by $\mathbf{X}^{t-1}$ (Naive). The training time slot size is set to $n=10$, and the results are presented in Table 9 below. As is evident from the table, our approach shows significant promise for accurate link prediction.
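The misclassification rate of the naive benchmark, which predicts $\mathbf{X}^{t}$ by the previous snapshot $\mathbf{X}^{t-1}$, can be computed as in the following numpy sketch over the upper-triangular dyads (the function name is ours).

```python
import numpy as np

def naive_misclassification(nets):
    """Misclassification rate of predicting X^t by X^{t-1}, averaged over
    all consecutive snapshot pairs and all p(p-1)/2 dyads.

    nets: array-like of shape (n, p, p), symmetric 0/1 adjacency matrices.
    """
    nets = np.asarray(nets)
    n, p, _ = nets.shape
    iu = np.triu_indices(p, 1)  # each dyad counted once
    errs = [(nets[t][iu] != nets[t - 1][iu]).mean() for t in range(1, n)]
    return float(np.mean(errs))
```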

Table 9: Comparison of TWHM0.5, TWHMadaptive, and Naive in terms of the misclassification rate of links ($\times 10^{-5}$).
Dataset TWHM0.5 TWHMadaptive Naive
DPPIN-Uetz 55 55 88
DPPIN-Ito 28 29 50
DPPIN-Ho 108 111 194
DPPIN-Gavin 139 142 250
DPPIN-Krogan(LCMS) 133 141 222
DPPIN-Krogan(MALDI) 124 127 215
DPPIN-Yu 54 55 86
DPPIN-Breitkreutz 297 305 493
DPPIN-Babu 27 28 45
DPPIN-Lambert 293 298 489
DPPIN-Tarassov 72 75 133
DPPIN-Hazbun 759 772 1198