
Modelling Volatility of Spatio-temporal Integer-valued Data
with Network Structure and Asymmetry

Yue Pan  and  Jiazhu Pan
Department of Mathematics and Statistics, University of Strathclyde, Glasgow G1 1XH, UK
Abstract

This paper proposes a spatial threshold GARCH-type model for dynamic spatio-temporal integer-valued data with network structure. The proposed model simplifies the parameterization by using the network structure in the data, and captures the asymmetric property of dynamic volatility by adopting a threshold structure. The conditional distribution is assumed to be Poisson. Asymptotic theory of maximum likelihood estimation (MLE) for the spatial model is derived when both the sample size and the network dimension are large. We obtain asymptotic statistical inference by investigating the weak dependence of components of the model and applying limit theorems for weakly dependent random fields. Simulation studies and a real data example are presented to support our methodology.

Keywords: Conditional variance/mean, high-dimensional count time series, asymmetry of volatility, spatial threshold GARCH, network structure.

MSC 2020 Subject Classification: Primary 62M10, 91B05; Secondary 60G60, 60F05.

1 Introduction

Integer-valued time series can be observed in a wide range of scientific fields, such as yearly trading volumes of houses on the real estate market (De Wit et al., 2013), numbers of stock transactions (Jones et al., 1994), daily mortality from Covid-19 (Pham, 2020) and daily new cases of Covid-19 (Jiang et al., 2022). A first idea for modelling integer-valued time series is a simple first-order autoregressive (AR) model

X_{t}=\alpha X_{t-1}+\varepsilon_{t}, \qquad (1.1)

where $0\leq\alpha<1$ is a parameter. However, in (1.1) $X_{t}$ is not necessarily an integer given integer-valued $X_{t-1}$ and $\varepsilon_{t}$, due to the multiplication structure $\alpha X_{t-1}$. To circumvent this problem, McKenzie (1985) and Al-Osh and Alzaid (1987) proposed an integer-valued counterpart of the AR model (INAR). They replaced the ordinary multiplication $\alpha X_{t-1}$ by the binomial thinning operation $\alpha\circ X_{t-1}$, where $\alpha\circ X|X\sim \mathrm{Bin}(X,\alpha)$. This was ground-breaking and led to various extensions of thinning-based linear models, including the integer-valued moving average model (INMA) (Al-Osh and Alzaid, 1988) and the INARMA model (McKenzie, 1988). An alternative approach to the multiplication problem is to consider the regression of the conditional mean $\lambda_{t}:=\mathbb{E}(X_{t}|\mathcal{H}_{t-1})$, where $\mathcal{H}_{t-1}$ is the $\sigma$-algebra generated by historical information up to $t-1$. Based on this idea, integer-valued GARCH-type models (INGARCH) were proposed by Heinen (2003), Ferland et al. (2006) and Fokianos et al. (2009) with a conditional Poisson distribution of $X_{t}$, i.e.

X_{t}|\mathcal{H}_{t-1}\sim \mathrm{Poisson}(\lambda_{t}), \qquad (1.2)
\lambda_{t}=\omega+\sum_{i=1}^{p}\alpha_{i}X_{t-i}+\sum_{j=1}^{q}\beta_{j}\lambda_{t-j},

where the parameters satisfy $\omega>0$, $\alpha_{i}\geq 0$, $i=1,\cdots,p$, $\beta_{j}\geq 0$, $j=1,\cdots,q$. Other variations of INGARCH models with different specifications of the conditional distribution include the negative binomial INGARCH (Zhu, 2010; Xu et al., 2012) and the generalized Poisson INGARCH (Zhu, 2012).
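As a concrete illustration of the recursion (1.2), here is a minimal simulation sketch of the INGARCH(1,1) case (our own illustration in Python, not code from the cited papers; the function name and the choice of starting value are ours):

```python
import numpy as np

def simulate_ingarch11(T, omega, alpha, beta, seed=0):
    """Simulate a Poisson INGARCH(1,1) path: lambda_t = omega + alpha*X_{t-1} + beta*lambda_{t-1}."""
    rng = np.random.default_rng(seed)
    X = np.zeros(T, dtype=int)
    lam = np.zeros(T)
    lam[0] = omega / (1.0 - alpha - beta)   # start from the stationary mean (requires alpha+beta < 1)
    X[0] = rng.poisson(lam[0])
    for t in range(1, T):
        lam[t] = omega + alpha * X[t - 1] + beta * lam[t - 1]
        X[t] = rng.poisson(lam[t])
    return X, lam

X, lam = simulate_ingarch11(T=500, omega=0.5, alpha=0.3, beta=0.4)
```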

The integer-valued models mentioned above are all limited to one-dimensional time series. The development of multivariate integer-valued GARCH-type models is still at an early stage. For example, the bivariate INGARCH models (Lee et al., 2018; Cui and Zhu, 2018; Cui et al., 2020), the multivariate INGARCH models (Fokianos et al., 2020; Lee et al., 2023) and some counterparts of the continuous-valued network GARCH model (Zhou et al., 2020) all concern count time series of fixed, low dimension. For high-dimensional integer-valued time series with increasing dimension, there is the Poisson network autoregressive model (PNAR) of Armillotta and Fokianos (2024). The PNAR allows for integer-valued time series with increasing network dimension; however, it adopts an ARCH-type structure without an autoregressive term on the conditional mean/variance to describe persistence in volatility, and it cannot capture asymmetric characteristics of volatility. Tao et al. (2024) proposed a grouped PNAR model which has a GARCH structure, but its network dimension is fixed, so it is not suitable for spatio-temporal integer-valued data. In this paper, we propose a Poisson threshold network GARCH model (PTNGARCH) and discuss its asymptotic inference. The specific contributions are as follows.

  1.

    A threshold structure is designed to capture the asymmetric property of high-dimensional volatility for integer-valued data. The threshold effect can also be tested under this framework.

  2.

    Our PTNGARCH includes an autoregressive term on the conditional mean/variance, so it provides a parsimonious description of the persistence of dynamic volatility in high-dimensional integer-valued time series.

  3.

    Asymptotic theory of maximum likelihood estimation for the proposed model, when both the sample size ($T$) and the network dimension ($N$) are large, is established via the limit theorems for arrays of weakly dependent random fields in Pan and Pan (2024). This makes the proposed model suitable for fitting dynamic spatio-temporal integer-valued data (the so-called ultra-high dimensional integer-valued time series).

The remainder of this paper is arranged as follows. The PTNGARCH model is introduced and its stationarity over time is discussed under fixed network dimension in Section 2. Section 3 presents the MLE for the parameters, including the threshold, and the consistency and asymptotic normality of the coefficient estimates under large sample size ($T$) and large network dimension ($N$). Section 4 gives a Wald test which can be applied to testing for the existence of a threshold effect (i.e. the asymmetric property), a GARCH effect and a network effect. Section 5 conducts simulation studies to verify the asymptotic properties of the MLE, and applies the proposed model to daily numbers of car accidents in 41 neighbourhoods of New York City, with interpretation of the results. All proofs of theoretical results are postponed to the appendix.

2 Model setting and stationarity over time

Consider an undirected and unweighted network with $N$ nodes. Define the adjacency matrix $A=(a_{ij})_{1\leq i,j\leq N}$, where $a_{ij}=1$ if there is a connection between nodes $i$ and $j$, and $a_{ij}=0$ otherwise. In addition, self-connection is not allowed for any node $i$, so $a_{ii}=0$. By this interpretation of the network structure, $A$ is symmetric since $a_{ij}=a_{ji}$; hence for any node $i$ the out-degree $d_{i}^{(out)}=\sum_{j=1}^{N}a_{ij}$ equals the in-degree $d_{i}^{(in)}=\sum_{j=1}^{N}a_{ji}$, and we use $d_{i}$ to denote both for convenience. To embed a network into statistical models, it is often convenient to use the row-normalized adjacency matrix $W$ with $(i,j)$ element $w_{ij}=a_{ij}/d_{i}$.
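A minimal sketch of this construction (ours, assuming a 0/1 adjacency matrix with zero diagonal; leaving rows of degree zero as zeros is our convention for isolated nodes):

```python
import numpy as np

def row_normalize(A):
    """Row-normalize an adjacency matrix: w_ij = a_ij / d_i (rows with d_i = 0 stay zero)."""
    d = A.sum(axis=1)                       # degrees d_i
    W = np.zeros_like(A, dtype=float)
    nz = d > 0
    W[nz] = A[nz] / d[nz, None]
    return W

# toy 4-node path network: 1-2, 2-3, 3-4 connected
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
W = row_normalize(A)                        # each nonzero row of W sums to 1
```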

For any node $i\in\{1,\cdots,N\}$ in this network, let $y_{it}$ be a non-negative integer-valued observation at time $t$, and let $\mathcal{H}_{t-1}$ denote the $\sigma$-algebra consisting of all available information up to $t-1$. In our Poisson threshold network GARCH model, for each $i=1,2,\ldots,N$ and $t\in\mathbb{Z}$, $y_{it}$ is assumed to follow a conditional (on $\mathcal{H}_{t-1}$) Poisson distribution with $(i,t)$-varying variance (mean) $\lambda_{it}$. A PTNGARCH(1,1) model has the following form:

y_{it}|\mathcal{H}_{t-1}\sim \mathrm{Poisson}(\lambda_{it}), \qquad (2.1)
\lambda_{it}=\omega+\left(\alpha^{(1)}1_{\{y_{i,t-1}\geq r\}}+\alpha^{(2)}1_{\{y_{i,t-1}<r\}}\right)y_{i,t-1}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-1}+\beta\lambda_{i,t-1},
i=1,2,\cdots,N.

The threshold parameter $r$ is a positive integer, and $1_{\{\cdot\}}$ denotes an indicator function. To ensure the positivity of the conditional variance, we assume positivity of the base parameter $\omega$ and non-negativity of all the coefficients $\alpha^{(1)}$, $\alpha^{(2)}$, $\xi$, $\beta$.

Remark.

Notice that in (2.1) we model the dynamics of the conditional mean $\lambda_{it}$, which is why the name “Poisson autoregression” is sometimes used in the literature (Fokianos et al., 2009; Wang et al., 2014). Some authors still use the name “GARCH”, since the mean equals the variance under the Poisson distribution and the dynamics of the conditional mean are GARCH-like.

Let $\{N_{it}:i=1,2,\ldots,N,t\in\mathbb{Z}\}$ be independent Poisson processes with unit intensities. Given $\lambda_{it}$, $y_{it}$ can be interpreted as a Poisson distributed random variable $N_{it}(\lambda_{it})$, the number of occurrences during the time interval $(0,\lambda_{it}]$, i.e. $\mathbb{P}(y_{it}=n|\lambda_{it}=\lambda)=\frac{\lambda^{n}}{n!}e^{-\lambda}$. We can rewrite (2.1) in a vectorized form as follows:

\mathbb{Y}_{t}=(N_{1t}(\lambda_{1t}),N_{2t}(\lambda_{2t}),\ldots,N_{Nt}(\lambda_{Nt}))^{\prime},\qquad
\Lambda_{t}=\omega\mathbf{1}_{N}+A(\mathbb{Y}_{t-1})\mathbb{Y}_{t-1}+\beta\Lambda_{t-1}, \qquad (2.2)

where

\Lambda_{t}=(\lambda_{1t},\lambda_{2t},\ldots,\lambda_{Nt})^{\prime}\in\mathbb{R}^{N},
\mathbf{1}_{N}=(1,1,\ldots,1)^{\prime}\in\mathbb{R}^{N},
A(\mathbb{Y}_{t-1})=\alpha^{(1)}S(\mathbb{Y}_{t-1})+\alpha^{(2)}(I_{N}-S(\mathbb{Y}_{t-1}))+\xi W,
S(\mathbb{Y}_{t-1})=\operatorname{diag}\left\{1_{\{y_{1,t-1}\geq r\}},1_{\{y_{2,t-1}\geq r\}},\ldots,1_{\{y_{N,t-1}\geq r\}}\right\}.

Note that $\mathbb{Y}_{t}\in\mathbb{N}^{N}$ with dimension $N$, where $\mathbb{N}=\{0,1,2,\cdots\}$. For a fixed dimension $N$, a stationarity condition for the time series $\mathbb{Y}_{t}$ can be obtained. The proof of the following theorem is given in the appendix.
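To make the recursions (2.1)-(2.2) concrete, the following is a minimal simulation sketch (our own illustration; the function name, burn-in scheme and starting values are ours, not the authors'):

```python
import numpy as np

def simulate_ptngarch(W, T, omega, a1, a2, xi, beta, r, burn=200, seed=0):
    """Simulate model (2.1)/(2.2): a Poisson threshold network GARCH(1,1) panel."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    lam = np.full(N, omega / (1.0 - beta))        # arbitrary positive starting values
    y = rng.poisson(lam)
    Y = np.empty((T, N), dtype=int)
    for t in range(T + burn):
        alpha = np.where(y >= r, a1, a2)          # regime-dependent coefficient alpha^(1)/alpha^(2)
        lam = omega + alpha * y + xi * (W @ y) + beta * lam
        y = rng.poisson(lam)
        if t >= burn:                             # discard burn-in to reduce the starting-value effect
            Y[t - burn] = y
    return Y
```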

Theorem 1.

If the parameters satisfy

\max\left\{\alpha^{(1)},\alpha^{(2)},\left|\alpha^{(1)}r-\alpha^{(2)}(r-1)\right|\right\}+\xi+\beta<1, \qquad (2.3)

there exists a unique strictly stationary process $\{\mathbb{Y}_{t}:t\in\mathbb{Z}\}$ that satisfies (2.2) and has finite first-order moment. Therefore, each component $y_{it}$ of model (2.1) has a unique strictly stationary solution with finite first-order moment.
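Condition (2.3) is straightforward to check numerically; a one-line sketch (ours):

```python
def stationarity_ok(a1, a2, xi, beta, r):
    """Check condition (2.3): max{a1, a2, |a1*r - a2*(r-1)|} + xi + beta < 1."""
    return max(a1, a2, abs(a1 * r - a2 * (r - 1))) + xi + beta < 1.0

stationarity_ok(0.3, 0.2, 0.1, 0.2, r=2)   # max{0.3, 0.2, |0.6-0.2|} + 0.3 = 0.7 < 1 -> True
```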

3 Parameter estimation with $T\to\infty$ and $N\to\infty$

Denote the parameter vector of model (2.1) by $\nu=(\theta^{\prime},r)^{\prime}$ with $\theta=(\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta)^{\prime}$. Let $\Theta=\{\theta:\omega>0,\alpha^{(1)}\geq 0,\alpha^{(2)}\geq 0,\xi\geq 0,\beta\geq 0\}$ and $\mathbb{Z}_{+}=\{0,1,2,\cdots\}$. For a given threshold $r$, a reasonable parameter space for $\theta$ is a sufficiently large compact subset of $\Theta$, such as $\Theta_{\delta}=\{\theta:\omega\geq\delta,\alpha^{(1)}\geq 0,\alpha^{(2)}\geq 0,\xi\geq 0,\beta\geq 0\}$ with a sufficiently small positive real number $\delta$.

Let $D_{NT}=\{(i,t):i=1,\cdots,N;t=1,\cdots,T\}$ be the index set. Suppose that the samples $\{y_{it}:(i,t)\in D_{NT},NT\geq 1\}$ are generated by model (2.1) with true parameter $\nu_{0}=(\omega_{0},\alpha^{(1)}_{0},\alpha^{(2)}_{0},\xi_{0},\beta_{0},r_{0})^{\prime}$.

The log-likelihood function (ignoring constants) is

L_{NT}(\nu)=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}l_{it}(\nu), \qquad l_{it}(\nu)=y_{it}\log\lambda_{it}(\nu)-\lambda_{it}(\nu), \qquad (3.1)

where $\lambda_{it}(\nu)$ is generated from model (2.1) as

\lambda_{it}(\nu)=\omega+\alpha^{(1)}1_{\{y_{i,t-1}\geq r\}}y_{i,t-1}+\alpha^{(2)}1_{\{y_{i,t-1}<r\}}y_{i,t-1}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-1}+\beta\lambda_{i,t-1}(\nu). \qquad (3.2)

In practice, (3.1) cannot be evaluated without knowing the true values of $\lambda_{i0}$ for $i=1,2,\ldots,N$. We therefore approximate (3.1) by (3.3) below, using specified initial values $\lambda_{i0}=\tilde{\lambda}_{i0}$, $i=1,2,\ldots,N$:

\tilde{L}_{NT}(\nu)=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\tilde{l}_{it}(\nu), \qquad \tilde{l}_{it}(\nu)=y_{it}\log\tilde{\lambda}_{it}(\nu)-\tilde{\lambda}_{it}(\nu). \qquad (3.3)

Therefore, the maximum likelihood estimate (MLE) of the parameter $\nu$ is defined as

\hat{\nu}_{NT}=\operatorname*{argmax}_{\nu\in\Theta\times\mathbb{Z}_{+}}\tilde{L}_{NT}(\nu). \qquad (3.4)

However, the maximizer of the target function $\tilde{L}_{NT}(\nu)$ cannot be obtained directly by solving $\partial\tilde{L}_{NT}(\nu)/\partial\nu=0$: since $r\in\mathbb{Z}_{+}$ is discrete-valued, the partial derivative of $\tilde{L}_{NT}(\nu)$ with respect to $r$ is invalid. Following Wang et al. (2014), such an optimization problem with the integer-valued parameter $r$ can be broken up into two steps, as follows (a code sketch is given after the two steps):

  1.

    Find

    \hat{\theta}_{NT}^{(r)}=\operatorname*{argmax}_{\theta\in\Theta}\tilde{L}_{NT}(\theta,r)

    for each $r$ in a predetermined range $[\underline{r},\bar{r}]\subset\mathbb{Z}_{+}$;

  2.

    Find

    \hat{r}_{NT}=\operatorname*{argmax}_{r\in[\underline{r},\bar{r}]}\tilde{L}_{NT}(\hat{\theta}_{NT}^{(r)},r).

Then $\hat{\nu}_{NT}=\left(\hat{\theta}_{NT}^{(\hat{r}_{NT})\prime},\hat{r}_{NT}\right)^{\prime}$ is the maximizer of $\tilde{L}_{NT}(\nu)$.
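A minimal sketch of this two-step profile-likelihood search (our own illustration; the recursion implements (3.2) with initial values $\tilde{\lambda}_{i0}=0$, and the choice of optimizer, starting point and grid is ours):

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, Y, W, r):
    """-L~_NT(theta, r): Poisson quasi-log-likelihood (3.3) with initial values lambda_i0 = 0."""
    omega, a1, a2, xi, beta = theta
    T, N = Y.shape
    lam = np.zeros(N)                               # \tilde{lambda}_{i0} = 0
    ll = 0.0
    for t in range(1, T):                           # accumulate from t = 1 onwards
        yprev = Y[t - 1]
        alpha = np.where(yprev >= r, a1, a2)
        lam = omega + alpha * yprev + xi * (W @ yprev) + beta * lam
        ll += np.sum(Y[t] * np.log(lam) - lam)
    return -ll / (N * T)

def fit_ptngarch(Y, W, r_grid=range(1, 11)):
    """Step 1: maximize over theta for each r; Step 2: pick the r with the best profile likelihood."""
    theta0 = np.array([0.5, 0.3, 0.3, 0.1, 0.2])    # arbitrary starting point
    bounds = [(1e-4, None)] + [(0.0, None)] * 4     # omega > 0, other coefficients >= 0
    best = None
    for r in r_grid:
        res = minimize(neg_loglik, theta0, args=(Y, W, r), method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best[0]:
            best = (res.fun, res.x, r)
    return best[1], best[2]                         # (theta_hat, r_hat)
```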

Assumption 3.1 below is a regularity condition on the parameter space. Assumptions 3.2 and 3.3 are necessary for obtaining $\eta$-weak dependence of $\{l_{it}(\nu):(i,t)\in D_{NT},NT\geq 1\}$ and their derivatives. For the definition of $\eta$-weak dependence for random fields, see Pan and Pan (2024). The consistency of the MLE in Theorem 2 can then be proved using the law of large numbers (LLN) for $\eta$-weakly dependent arrays of random fields in Pan and Pan (2024).

Assumption 3.1.

For a given threshold $r\in\mathbb{Z}_{+}$,

  • (a)

    The parameter space for $\theta$ is a compact subset of $\Theta$ and includes the true parameter $\theta_{0}$ as an interior point;

  • (b)

    Condition (2.3) holds.

Assumption 3.2.
  • (a)

    $\sup_{NT\geq 1}\sup_{(i,t)\in D_{NT}}\mathbb{E}|y_{it}|^{2p}<\infty$ for some $p>1$;

  • (b)

    The array of random fields $\{y_{it}:(i,t)\in D_{NT},NT\geq 1\}$ is $\eta$-weakly dependent with coefficient $\bar{\eta}_{y}(s):=\mathcal{O}(s^{-\mu_{y}})$ for some $\mu_{y}>\frac{2(2p-1)}{p-1}$.

Assumption 3.3.

For any $i=1,2,\ldots,N$ and $j=1,2,\ldots,N$, there exist constants $C>0$ and $b>\mu_{y}$ such that $w_{ij}\leq C|j-i|^{-b}$. That is, the strength of the connection between two nodes $i$ and $j$ decays as the distance $|i-j|$ grows.

Theorem 2.

If Assumptions 3.1, 3.2 and 3.3 hold, the MLE defined by (3.4) is consistent, i.e.

\hat{\nu}_{NT}\overset{p}{\to}\nu_{0}

as $T\to\infty$ and $N\to\infty$.

Since r^NT\hat{r}_{NT} is an integer-valued consistent estimate of r0r_{0}, r^NT\hat{r}_{NT} will eventually be equal to r0r_{0} when the sample size NTNT becomes sufficiently large. Therefore, ν^NT=(θ^NT(r^NT),r^NT)\hat{\nu}_{NT}=\left(\hat{\theta}_{NT}^{(\hat{r}_{NT})^{\prime}},\hat{r}_{NT}\right)^{\prime} is asymptotically equal to (θ^NT(r0),r0)\left(\hat{\theta}_{NT}^{(r_{0})^{\prime}},r_{0}\right)^{\prime}. In this way, the problem of investigating the asymptotic distribution of ν^NT\hat{\nu}_{NT} degenerates to investigating the asymptotic distribution of θ^NT(r0)\hat{\theta}_{NT}^{(r_{0})}.

Theorem 3.

Assume that all conditions in Theorem 2 are satisfied, with $\mu_{y}>\frac{6p-3}{p-1}\vee\frac{(4p-3)(2p-1)}{2(p-1)^{2}}$ in Assumption 3.2(b) instead. If the smallest eigenvalue $\lambda_{\min}(\Sigma_{NT})$ of

\Sigma_{NT}:=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\mathbb{E}\left[\frac{1}{\lambda_{it}(\nu_{0})}\frac{\partial\lambda_{it}(\nu_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\nu_{0})}{\partial\theta^{\prime}}\right]

satisfies that

\inf_{NT\geq 1}\lambda_{\min}(\Sigma_{NT})>0, \qquad (3.5)

then $\hat{\theta}_{NT}^{(r_{0})}$ is asymptotically normal, i.e.

\sqrt{NT}\,\Sigma_{NT}^{1/2}(\hat{\theta}_{NT}^{(r_{0})}-\theta_{0})\overset{d}{\to}N(0,I_{5})

as $T\to\infty$, $N\to\infty$ and $N=o(T)$.

Remark.

In the proof of Theorem 4 below, we will show that $\Sigma_{NT}$ can be consistently estimated in practice by

\widehat{\Sigma}_{NT}=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\nu}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\nu}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\nu}_{NT})}{\partial\theta^{\prime}}\right] \qquad (3.6)

In the simulation studies in Section 5, the elements of $\widehat{\Sigma}_{NT}$ are used to construct confidence intervals and to evaluate coverage probabilities.
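A sketch of how (3.6) can be assembled in practice (ours; the gradient of $\tilde{\lambda}_{it}$ is accumulated by differentiating the recursion (3.2) with respect to $\theta$, and the function name is hypothetical):

```python
import numpy as np

def sigma_hat(theta, r, Y, W):
    """Estimate Sigma_NT by (3.6): average of grad(lambda) grad(lambda)' / lambda."""
    omega, a1, a2, xi, beta = theta
    T, N = Y.shape
    lam = np.zeros(N)                          # \tilde{lambda}_{i0} = 0
    grad = np.zeros((N, 5))                    # d lambda_it / d theta, per node
    S = np.zeros((5, 5))
    for t in range(1, T):
        yprev = Y[t - 1]
        up, down = yprev >= r, yprev < r
        # chain rule: grad_t = (1, 1{y>=r}y, 1{y<r}y, (Wy)_i, lambda_{t-1}) + beta * grad_{t-1}
        new_grad = np.column_stack([
            np.ones(N), up * yprev, down * yprev, W @ yprev, lam
        ]) + beta * grad
        lam = omega + np.where(up, a1, a2) * yprev + xi * (W @ yprev) + beta * lam
        grad = new_grad
        S += (grad / lam[:, None]).T @ grad    # sum over i of grad grad' / lambda
    return S / (N * T)

# standard errors: square roots of the diagonal of (NT)^{-1} * Sigma_hat^{-1}
# se = np.sqrt(np.diag(np.linalg.inv(S)) / (N * T))
```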

4 Hypothesis testing with $T\to\infty$ and $N\to\infty$

Based on Theorem 2 and Theorem 3, for a sample region large enough that $\hat{r}_{NT}=r_{0}$, we can design a Wald test for the null hypothesis

H_{0}:\Gamma\theta_{0}=\eta, \qquad (4.1)

where $\Gamma$ is an $s\times 5$ matrix with rank $s$ ($1\leq s\leq 5$) and $\eta$ is an $s$-dimensional vector. For example, to test the existence of a threshold effect, simply let $\Gamma=(0,1,-1,0,0)$ and $\eta=0$, and the null hypothesis (4.1) becomes

H_{0}:\alpha^{(1)}_{0}=\alpha^{(2)}_{0};

To see whether the autoregressive term is necessary (i.e. a GARCH effect), take $\Gamma=(0,0,0,0,1)$ and $\eta=0$, and the hypothesis becomes

H_{0}:\beta_{0}=0\quad \text{v.s.}\quad H_{1}:\beta_{0}>0;

To test the network effect, let $\Gamma=(0,0,0,1,0)$ and $\eta=0$, and the question becomes testing the hypothesis

H_{0}:\xi_{0}=0\quad \text{v.s.}\quad H_{1}:\xi_{0}>0.

Corresponding to the asymptotic normality of $\hat{\theta}_{NT}^{(r_{0})}$ in Theorem 3, we define a Wald test statistic as follows:

W_{NT}:=(\Gamma\hat{\theta}_{NT}^{(r_{0})}-\eta)^{\prime}\left\{\frac{1}{NT}\Gamma\widehat{\Sigma}_{NT}^{-1}\Gamma^{\prime}\right\}^{-1}(\Gamma\hat{\theta}_{NT}^{(r_{0})}-\eta), \qquad (4.2)

where $\widehat{\Sigma}_{NT}$ is defined in (3.6). Theorem 4 below shows that $W_{NT}$ has an asymptotic $\chi^{2}$-distribution with $s$ degrees of freedom.
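A sketch of computing (4.2) (ours; `theta_hat` and `Sigma_hat` are assumed to come from the estimation step, e.g. the hypothetical `sigma_hat` above):

```python
import numpy as np
from scipy.stats import chi2

def wald_test(theta_hat, Sigma_hat, NT, Gamma, eta):
    """Wald statistic (4.2) and its chi-square p-value with s = rank(Gamma) degrees of freedom."""
    Gamma = np.atleast_2d(Gamma)
    diff = Gamma @ theta_hat - eta
    V = Gamma @ np.linalg.inv(Sigma_hat) @ Gamma.T / NT   # (1/NT) Gamma Sigma_hat^{-1} Gamma'
    W = diff @ np.linalg.solve(V, diff)
    return W, chi2.sf(W, df=Gamma.shape[0])

# threshold-effect test H0: alpha^(1) = alpha^(2), via Gamma = (0, 1, -1, 0, 0), eta = 0:
# W, p = wald_test(theta_hat, S, N * T, Gamma=[[0, 1, -1, 0, 0]], eta=np.zeros(1))
```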

Theorem 4.

Under the same assumptions as in Theorem 3, the Wald test statistic defined in (4.2) asymptotically follows a $\chi^{2}$ distribution with $s$ degrees of freedom, i.e.

W_{NT}\overset{d}{\to}\chi^{2}_{s},

as $T\to\infty$, $N\to\infty$ and $N=o(T)$.

5 Simulation studies and empirical data analysis

5.1 Simulation studies

We use four different mechanisms to simulate the network structure in model (2.1). The network structure in Example 5.1 satisfies Assumption 3.3. The simulation mechanisms in Examples 5.2-5.4 test the robustness of our estimation against network structures that may violate Assumption 3.3. A code sketch of all four generators follows Example 5.4.

Example 5.1.

(D-neighbourhood network) Each node $i\in\{1,2,\ldots,N\}$ is connected to node $j$ only if $j$ is inside $i$'s $D$-neighbourhood. That is, in the adjacency matrix, $a_{ij}=1$ if $0<|i-j|\leq D$ and $a_{ij}=0$ otherwise. Figure 1(a) is a visualization of such a network with $N=100$ and $D=10$.

Example 5.2.

(Random network) For each node $i\in\{1,2,\ldots,N\}$, we generate $D_{i}$ from the uniform distribution $U(0,5)$, and then draw $[D_{i}]$ samples randomly from $\{1,2,\ldots,N\}$ to form a set $S_{i}$ ($[x]$ denotes the integer part of $x$). $A=(a_{ij})$ is generated by letting $a_{ij}=1$ if $j\in S_{i}$ and $a_{ij}=0$ otherwise. In a network simulated with this mechanism, as indicated in Figure 1(b), there is no significantly influential node (i.e. no node with an extremely large in-degree).

Example 5.3.

(Network with power-law) Following Clauset et al. (2009), for each node $i$ in such a network, $D_{i}$ is generated in the same way as in Example 5.2. Instead of uniformly selecting $[D_{i}]$ samples from $\{1,2,\ldots,N\}$, these samples are drawn with probabilities $p_{i}=s_{i}/\sum_{i=1}^{N}s_{i}$, where $s_{i}$ is generated from a discrete power-law distribution $\mathbb{P}\{s_{i}=x\}\propto x^{-a}$ with scaling parameter $a=2.5$. As shown in Figure 1(c), a few nodes have much larger in-degrees while most have fewer than 2. Compared to Example 5.2, a network structure with a power-law distribution exhibits larger gaps between the influences of different nodes. This type of network is suitable for modelling social media such as Twitter and Instagram, where celebrities have huge influence while the ordinary majority has little.

Example 5.4.

(Network with K-blocks) As proposed in Nowicki and Snijders (2001), in a network with stochastic block structure all nodes are divided into blocks, and nodes from the same block are more likely to be connected than those from different blocks. To simulate this structure, the $N$ nodes are randomly divided into $K$ groups by assigning labels $\{1,2,\ldots,K\}$ to every node with equal probability. For any two nodes $i$ and $j$ from the same group, let $\mathbb{P}(a_{ij}=1)=0.5$, while for two nodes from different groups, $\mathbb{P}(a_{ij}=1)=0.001/N$. Hence it is very unlikely for nodes to be connected across groups. Our simulated network successfully mimics this characteristic, as Figure 1(d) shows clear boundaries between groups. The block network also has practical advantages; for instance, the price of one stock is highly relevant to those in the same industry sector.
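A compact sketch of the four generators (our own illustration; for Example 5.3 we truncate the power-law support at $N$, an assumption not specified above, and we zero the diagonals to rule out self-connections):

```python
import numpy as np

def d_neighbourhood(N, D):                     # Example 5.1
    i, j = np.indices((N, N))
    return ((np.abs(i - j) > 0) & (np.abs(i - j) <= D)).astype(int)

def random_net(N, rng):                        # Example 5.2
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        k = int(rng.uniform(0, 5))             # [D_i] with D_i ~ U(0, 5)
        A[i, rng.choice(N, size=k, replace=False)] = 1
    np.fill_diagonal(A, 0)
    return A

def power_law_net(N, rng, a=2.5):              # Example 5.3
    x = np.arange(1, N + 1)
    s = rng.choice(x, size=N, p=x**(-a) / np.sum(x**(-a)))   # s_i ~ discrete power law
    p = s / s.sum()                            # selection probabilities p_i
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        k = int(rng.uniform(0, 5))
        A[i, rng.choice(N, size=k, replace=False, p=p)] = 1
    np.fill_diagonal(A, 0)
    return A

def block_net(N, K, rng):                      # Example 5.4
    g = rng.integers(0, K, size=N)             # random group labels
    prob = np.where(g[:, None] == g[None, :], 0.5, 0.001 / N)
    A = (rng.random((N, N)) < prob).astype(int)
    np.fill_diagonal(A, 0)
    return A

rng = np.random.default_rng(0)
A = block_net(100, 10, rng)                    # e.g. the K-block network of Figure 1(d)
```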

Figure 1: Visualized network structures with $N=100$: (a) Example 5.1 ($D=10$); (b) Example 5.2; (c) Example 5.3; (d) Example 5.4 ($K=10$).

Set the true parameters of the data generating process (2.1) to $\nu_{0}=(0.5,0.7,0.6,0.1,0.1,5)^{\prime}$. For the sample region $D_{NT}=\{(i,t):i=1,2,\ldots,N;t=1,2,\ldots,T\}$, we let $T$ increase from 200 to 2000, while $N$ also increases at the relatively slower rates $\mathcal{O}(\sqrt{T})$ and $\mathcal{O}(T/\log(T))$ respectively, as shown in the following table:

$T$                   200   500   1000   2000
$N\approx\sqrt{T}$     14    22     31     44
$N\approx T/\log(T)$   37    80    144    263

For each network size $N$, the adjacency matrix $A$ is simulated according to the four mechanisms in Examples 5.1 to 5.4.

Remark.

In particular, in the empirical analysis we will study a dataset of car collisions across neighbourhoods distributed over the five boroughs of New York City. These boroughs are separated by rivers (except for Brooklyn and Queens), and neighbourhoods within the same borough are more likely to share a borderline, while cross-borough connections are very rare. Therefore the network constructed from New York City neighbourhoods follows the block structure of Example 5.4 with $N=20$ and $K=5$.

Based on a simulated network, the data are generated according to (2.1), and the true parameters are estimated by the MLE (3.4). To monitor the finite-sample performance of the MLE, data generation and parameter estimation are repeated $M=1000$ times for each combination of sample sizes $(N,T)$. The $m$-th replication produces the estimates $\hat{\theta}_{m}=(\hat{\omega}_{m},\hat{\alpha}^{(1)}_{m},\hat{\alpha}^{(2)}_{m},\hat{\xi}_{m},\hat{\beta}_{m})^{\prime}$ and $\hat{r}_{m}$. We use the following two measures to evaluate the simulation results:

  • Root-mean-square error: $RMSE_{k}=\sqrt{M^{-1}\sum_{m=1}^{M}(\hat{\theta}_{km}-\theta_{k0})^{2}}$,

  • Coverage probability: $CP_{k}=M^{-1}\sum_{m=1}^{M}1_{\{\theta_{k0}\in CI_{km}\}}$,

where $\hat{\theta}_{km}$ represents the $k$-th component of $\hat{\theta}_{m}$, and $CI_{km}$ is the 95% confidence interval defined as

CI_{km}=\left(\hat{\theta}_{km}-z_{0.975}\widehat{SE}_{km},\ \hat{\theta}_{km}+z_{0.975}\widehat{SE}_{km}\right).

Here the estimated standard error $\widehat{SE}_{km}$ is the square root of the $k$-th diagonal element of $(NT)^{-1}\widehat{\Sigma}_{NT}^{-1}$, and $z_{0.975}$ is the 0.975 quantile of the standard normal distribution. To eliminate the effect of starting points, a different initial value of $\theta$ is used for each $m$. RMSEs and CPs for different sample sizes and network simulation mechanisms are reported in Tables 1 and 2; we also report the mean estimates of the threshold $r_{0}$ in the last column of each table.
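Both measures are straightforward to compute from the $M$ replications; a short sketch (ours):

```python
import numpy as np

def evaluate(est, se, theta0, z=1.959964):
    """est, se: (M, 5) arrays of estimates and standard errors over M replications."""
    rmse = np.sqrt(np.mean((est - theta0) ** 2, axis=0))
    lo, hi = est - z * se, est + z * se
    cp = np.mean((theta0 >= lo) & (theta0 <= hi), axis=0)   # empirical coverage of 95% CIs
    return rmse, cp
```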

$T$  $N$  $\omega$  $\alpha^{(1)}$  $\alpha^{(2)}$  $\xi$  $\beta$  $\bar{r}$
Example 5.1 200 14 0.0696 (0.94) 0.0203 (0.94) 0.0278 (0.93) 0.0170 (0.95) 0.0256 (0.93) 5.028
500 22 0.0367 (0.96) 0.0100 (0.95) 0.0138 (0.95) 0.0101 (0.93) 0.0127 (0.95) 5
1000 31 0.0238 (0.95) 0.0058 (0.95) 0.0081 (0.95) 0.0062 (0.97) 0.0074 (0.95) 5
2000 44 0.0153 (0.95) 0.0035 (0.95) 0.0047 (0.95) 0.0041 (0.96) 0.0045 (0.95) 5
Example 5.2 200 14 0.0454 (0.95) 0.0200 (0.95) 0.0264 (0.94) 0.0119 (0.96) 0.0245 (0.94) 5.045
500 22 0.0284 (0.95) 0.0101 (0.95) 0.0134 (0.95) 0.0072 (0.94) 0.0126 (0.95) 5.002
1000 31 0.0162 (0.97) 0.0059 (0.96) 0.0077 (0.97) 0.0044 (0.94) 0.0074 (0.95) 5
2000 44 0.0112 (0.96) 0.0034 (0.96) 0.0047 (0.95) 0.0029 (0.94) 0.0043 (0.96) 5
Example 5.3 200 14 0.0511 (0.96) 0.0200 (0.95) 0.0272 (0.94) 0.0131 (0.95) 0.0246 (0.95) 5.034
500 22 0.0349 (0.95) 0.0102 (0.95) 0.0135 (0.96) 0.0084 (0.95) 0.0127 (0.96) 5.001
1000 31 0.0146 (0.95) 0.0060 (0.95) 0.0079 (0.95) 0.0038 (0.95) 0.0077 (0.94) 5
2000 44 0.0104 (0.95) 0.0035 (0.95) 0.0048 (0.94) 0.0025 (0.95) 0.0043 (0.96) 5
Example 5.4 200 14 0.0882 (0.95) 0.0205 (0.95) 0.0273 (0.95) 0.0227 (0.94) 0.0256 (0.93) 5.013
500 22 0.0379 (0.94) 0.0102 (0.95) 0.0136 (0.95) 0.0096 (0.95) 0.0124 (0.95) 5
1000 31 0.0218 (0.95) 0.0060 (0.95) 0.0078 (0.95) 0.0055 (0.95) 0.0073 (0.96) 5
2000 44 0.0118 (0.94) 0.0035 (0.96) 0.0047 (0.95) 0.0029 (0.95) 0.0043 (0.96) 5
Table 1: Simulation results, RMSE with CP in parentheses, under different network structures ($N\approx\sqrt{T}$).
$T$  $N$  $\omega$  $\alpha^{(1)}$  $\alpha^{(2)}$  $\xi$  $\beta$  $\bar{r}$
Example 5.1 200 37 0.0537 (0.95) 0.0124 (0.95) 0.0164 (0.95) 0.0143 (0.94) 0.0158 (0.94) 5.002
500 80 0.0287 (0.96) 0.0054 (0.94) 0.0071 (0.95) 0.0078 (0.95) 0.0066 (0.95) 5
1000 144 0.0201 (0.95) 0.0029 (0.94) 0.0040 (0.93) 0.0055 (0.95) 0.0036 (0.94) 5
2000 263 0.0136 (0.95) 0.0015 (0.94) 0.0019 (0.95) 0.0038 (0.95) 0.0019 (0.93) 5
Example 5.2 200 37 0.0347 (0.95) 0.0121 (0.95) 0.0170 (0.95) 0.0089 (0.95) 0.0161 (0.93) 5.008
500 80 0.0140 (0.95) 0.0053 (0.95) 0.0070 (0.95) 0.0035 (0.95) 0.0066 (0.95) 5
1000 144 0.0073 (0.95) 0.0029 (0.93) 0.0036 (0.95) 0.0020 (0.94) 0.0036 (0.93) 5
2000 263 0.0041 (0.95) 0.0014 (0.95) 0.0020 (0.94) 0.0011 (0.95) 0.0018 (0.96) 5
Example 5.3 200 37 0.0385 (0.95) 0.0124 (0.94) 0.0168 (0.95) 0.0092 (0.95) 0.0152 (0.95) 5.003
500 80 0.0144 (0.95) 0.0054 (0.95) 0.0071 (0.94) 0.0036 (0.95) 0.0067 (0.95) 5
1000 144 0.0073 (0.94) 0.0029 (0.94) 0.0035 (0.96) 0.0019 (0.94) 0.0035 (0.95) 5
2000 263 0.0037 (0.95) 0.0015 (0.95) 0.0019 (0.96) 0.0009 (0.95) 0.0018 (0.95) 5
Example 5.4 200 37 0.0498 (0.95) 0.0120 (0.95) 0.0165 (0.94) 0.0129 (0.94) 0.0148 (0.96) 5.011
500 80 0.0176 (0.94) 0.0055 (0.94) 0.0071 (0.94) 0.0045 (0.94) 0.0069 (0.94) 5
1000 144 0.0083 (0.97) 0.0028 (0.95) 0.0036 (0.96) 0.0022 (0.96) 0.0034 (0.95) 5
2000 263 0.0048 (0.95) 0.0015 (0.95) 0.0019 (0.95) 0.0012 (0.96) 0.0019 (0.95) 5
Table 2: Simulation results, RMSE with CP in parentheses, under different network structures ($N\approx T/\log(T)$).

From Tables 1 and 2 it can be seen that the RMSEs of $\hat{\theta}_{NT}$ decrease toward zero as the sample size grows, and the mean of $\hat{r}_{NT}$ equals $r_{0}=5$ for sufficiently large sample sizes. These results support the consistency of the MLE (3.4) stated in Theorem 2. The reported CPs are close to the nominal value 0.95, which shows that $\widehat{SE}$ provides a reliable estimate of the true standard error of $\hat{\theta}_{NT}$. Moreover, normal Q-Q plots of the estimates (Figures 2 to 5) are presented for $T=2000,N=44$ and $T=2000,N=263$ under the different network structures. These Q-Q plots provide additional evidence for the asymptotic normality of $\hat{\theta}_{NT}$ in Theorem 3.

Figure 2: Q-Q plots of estimates for Example 5.1: (a) $T=2000,N=44$; (b) $T=2000,N=263$.
Figure 3: Q-Q plots of estimates for Example 5.2: (a) $T=2000,N=44$; (b) $T=2000,N=263$.
Figure 4: Q-Q plots of estimates for Example 5.3: (a) $T=2000,N=44$; (b) $T=2000,N=263$.
Figure 5: Q-Q plots of estimates for Example 5.4: (a) $T=2000,N=44$; (b) $T=2000,N=263$.

5.2 Analysis of daily numbers of car accidents in New York City

The New York City Police Department (NYPD) publishes and regularly updates detailed data on motor vehicle collisions that have occurred city-wide. These data are openly accessible on the NYPD website (https://www1.nyc.gov/site/nypd/stats/traffic-data/traffic-data-collision.page) and contain sufficient information for us to apply our model. We collected all records from 16th February 2021 to 30th June 2022; each record includes the date when an accident happened and the zip code of where it happened. We classified all records into 41 neighbourhoods according to the correspondence between zip codes and the geographic locations they represent. Re-grouping the data by neighbourhood and date of occurrence, we obtain a high-dimensional time series with dimension $N=41$ and sample size $T=500$.

Two neighbourhoods are regarded as connected nodes if they share a borderline. Based on the geographic information provided by the data, we can therefore construct a reasonable network with 41 nodes, visualized in Figure 6. In Figure 7 we plot histograms of the daily numbers of car accidents in 9 randomly selected neighbourhoods; the shapes of the histograms suggest Poisson distributions. Moreover, in Figure 8 we can easily observe volatility clustering in the daily numbers of car accidents in four selected neighbourhoods of NYC, indicating a potential autoregressive structure in the conditional heteroscedasticity of the data.

Figure 6: Network of 41 neighbourhoods in New York City.
Figure 7: Distributions of daily occurrences of car accidents in selected neighbourhoods.
Figure 8: Daily occurrences of car accidents in 4 neighbourhoods.

Our model was fitted to this dataset by the method proposed in Section 3. The parameter estimates are reported in Table 3 below.

$\omega$  $\alpha^{(1)}$  $\alpha^{(2)}$  $\xi$  $\beta$  $r$
Estimation 0.018693 0.126472 0.135026 0.002727 0.862244 10
SE 4.12e-03 4.40e-03 4.68e-03 1.09e-03 4.73e-03 \
Table 3: Estimation results based on daily number of car accidents in 41 neighbourhoods of NYC.

We now interpret these results. Firstly, it is worth noting that $\alpha^{(1)}$ is slightly smaller than $\alpha^{(2)}$, which means that the conditional variance of the number of car accidents in these neighbourhoods is less affected by the previous day's count when that count is above the threshold $r=10$. Secondly, the volatility of the number of car accidents in one area is also affected by its geographically neighbouring areas. In addition, the estimated value of $\beta$ is significantly larger than the other coefficients, indicating a strong persistence in volatility that leads to volatility clustering.

Finally, we use the Wald test of Section 4 to investigate the following important properties of this real dataset; these properties provide substantial evidence in support of our model setting:

(i) The existence of a threshold effect (i.e. the asymmetric property) in volatility. The null hypothesis is $H_{0}:\alpha^{(1)}_{0}=\alpha^{(2)}_{0}$ (taking $\Gamma=(0,1,-1,0,0)$ and $\eta=0$ in (4.1)). In this case the Wald statistic (4.2) is $W_{NT}=18.94$, which implies rejection of $H_{0}$ at a significance level below 0.01 according to Theorem 4. This test shows that the proposed threshold structure is genuinely useful for capturing the nature of daily numbers of car accidents in New York City.

(ii) The existence of a GARCH effect (persistence in volatility). For the null hypothesis $H_{0}:\beta_{0}=0$, the Wald statistic (4.2) is $W_{NT}=33212$ with a $p$-value very close to 0. This strongly suggests that the autoregressive term (i.e. $\lambda_{i,t-1}$) should be included in the model, which is consistent with the large estimate of $\beta$.

(iii) The existence of a network effect. The Wald statistic (4.2) is $W_{NT}=6.31$ with a $p$-value of 0.012. This indicates a significant network effect among the daily numbers of car accidents.

6 Conclusion

We have proposed a model to describe spatio-temporal integer-valued data which are observed at the nodes of a network and exhibit asymmetry. Although the proposed model is of GARCH type for volatility (conditional variance), it can also be used to model the conditional mean, because the conditional distribution is assumed to be Poisson. Asymptotic inference for the proposed model, covering parameter estimation and hypothesis testing, is developed by applying limit theorems for nonstationary arrays of random fields. Simulations with different network structures and an application to a real dataset show that our methodology is useful for modelling dynamic spatio-temporal integer-valued data.

Appendix A Proofs of theoretical results

In this appendix, we give details of proofs for our theoretical results.

Lemma A.1.

If $0\leq\beta<1$, $\mathbb{E}|y_{it}|<\infty$ and $\mathbb{E}|\lambda_{it}(\nu)|<\infty$ for all $(i,t)\in D_{NT}$, $NT\geq 1$, then

\lambda_{it}(\nu)=\sum_{k=1}^{\infty}\beta^{k-1}\left[\omega+\alpha_{i,t-k}y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right] \qquad (A.1)

with probability one for all $(i,t)\in D_{NT}$, $NT\geq 1$ and $\nu\in\Theta\times\mathbb{Z}_{+}$, where $\alpha_{i,t-k}=\alpha^{(1)}1_{\{y_{i,t-k}\geq r\}}+\alpha^{(2)}1_{\{y_{i,t-k}<r\}}$.

Proof.

When $\beta=0$, (A.1) obviously holds. Now consider the case $0<\beta<1$. Let $\log^{+}(x)=\log(x)$ if $x>1$ and $0$ otherwise, and let $u_{i,t-k}(\nu):=\omega+\alpha_{i,t-k}y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}$. By Jensen's inequality we have

\mathbb{E}\log^{+}|u_{i,t-k}(\nu)|\leq\log^{+}\mathbb{E}\left|\omega+\alpha_{i,t-k}y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right|<\infty.

By Lemma 2.2 in Berkes et al. (2003) we have $\sum_{k=1}^{\infty}\mathbb{P}\left[|u_{i,t-k}(\nu)|>\zeta^{k}\right]<\infty$ for any $\zeta>1$. Therefore, by the Borel-Cantelli lemma, $|u_{i,t-k}(\nu)|\leq\zeta^{k}$ almost surely for all sufficiently large $k$. Taking $1<\zeta<\frac{1}{|\beta|}$, the right-hand side of (A.1) converges almost surely.

It remains to show that

\lambda_{it}(\nu)=\sum_{k=1}^{\infty}\beta^{k-1}u_{i,t-k}(\nu).

From (3.2), we have

\lambda_{it}(\nu)-\beta^{k}\lambda_{i,t-k-1}(\nu)=u_{i,t-1}(\nu)+\beta u_{i,t-2}(\nu)+\cdots+\beta^{k-1}u_{i,t-k}(\nu).

Using the Markov inequality, we obtain $\sum_{k=1}^{\infty}\mathbb{P}\left\{|\beta^{k}\lambda_{i,t-k-1}(\nu)|>\delta\right\}<\infty$ for any $\delta>0$. Then, by the Borel-Cantelli lemma, $|\beta^{k}\lambda_{i,t-k-1}(\nu)|\overset{a.s.}{\to}0$ as $k\to\infty$. Letting $k\to\infty$ on both sides of the above equation, we conclude the result of Lemma A.1. ∎

A.1 Proof of Theorem 1

Our proof of Theorem 1 relies on the arguments given by Ferland et al. (2006) in their proof of Corollary 1. Let

\Lambda_{t}^{(0)}:=\left(\lambda_{1t}^{(0)},\lambda_{2t}^{(0)},\ldots,\lambda_{Nt}^{(0)}\right)^{\prime};\qquad
\mathbb{Y}_{t}^{(0)}:=\left(N_{1t}(\lambda_{1t}^{(0)}),N_{2t}(\lambda_{2t}^{(0)}),\ldots,N_{Nt}(\lambda_{Nt}^{(0)})\right)^{\prime},

where $\{\lambda_{it}^{(0)}:i=1,2,\ldots,N,t\in\mathbb{Z}\}$ are IID positive random variables with mean 1. For each $n\geq 1$, we define $\{\mathbb{Y}_{t}^{(n)}:t\in\mathbb{Z}\}$ and $\{\Lambda_{t}^{(n)}:t\in\mathbb{Z}\}$ through the following recursion:

\mathbb{Y}_{t}^{(n)}=(N_{1t}(\lambda_{1t}^{(n)}),N_{2t}(\lambda_{2t}^{(n)}),\ldots,N_{Nt}(\lambda_{Nt}^{(n)}))^{\prime};\qquad
\Lambda_{t}^{(n)}=\omega\mathbf{1}_{N}+A(\mathbb{Y}_{t-1}^{(n-1)})\mathbb{Y}_{t-1}^{(n-1)}+\beta\Lambda_{t-1}^{(n-1)}. \qquad (A.2)
Claim A.1.

$\{\mathbb{Y}_{t}^{(n)}:t\in\mathbb{Z}\}$ is strictly stationary for each $n\geq 0$.

Proof.

Since $\{N_{it}(\cdot):i=1,2,\ldots,N,t\in\mathbb{Z}\}$ are independent Poisson processes with unit intensity, for any $t$ and $h$ we have

\mathbb{P}\left\{\mathbb{Y}_{1+h}^{(n)}=\mathbf{y}_{1},\ldots,\mathbb{Y}_{t+h}^{(n)}=\mathbf{y}_{t}\right\}
=\mathbb{E}\left(\mathbb{P}\left\{\mathbb{Y}_{1+h}^{(n)}=\mathbf{y}_{1},\ldots,\mathbb{Y}_{t+h}^{(n)}=\mathbf{y}_{t}\left|\Lambda_{1+h}^{(n)},\ldots,\Lambda_{t+h}^{(n)}\right.\right\}\right)
=\mathbb{E}\left(\prod_{k=1}^{t}\prod_{i=1}^{N}\frac{\left(\lambda_{i,k+h}^{(n)}\right)^{y_{ik}}}{y_{ik}!}e^{-\lambda_{i,k+h}^{(n)}}\right). \qquad (A.3)

When $n=0$, $\mathbb{P}\left\{\mathbb{Y}_{1+h}^{(0)}=\mathbf{y}_{1},\ldots,\mathbb{Y}_{t+h}^{(0)}=\mathbf{y}_{t}\right\}$ is $h$-invariant for any $t$ and $h$, by (A.3) and the IID property of $\{\lambda_{it}^{(0)}:i=1,2,\ldots,N,t\in\mathbb{Z}\}$. Therefore $\{\mathbb{Y}_{t}^{(0)}:t\in\mathbb{Z}\}$ is strictly stationary. Assume that $\{\mathbb{Y}_{t}^{(n-1)}:t\in\mathbb{Z}\}$ and $\{\Lambda_{t}^{(n-1)}:t\in\mathbb{Z}\}$ are strictly stationary; then $\{\Lambda_{t}^{(n)}:t\in\mathbb{Z}\}$ is also strictly stationary since $\Lambda_{t}^{(n)}=\omega\mathbf{1}_{N}+A(\mathbb{Y}_{t-1}^{(n-1)})\mathbb{Y}_{t-1}^{(n-1)}+\beta\Lambda_{t-1}^{(n-1)}$. According to (A.3) and the strict stationarity of $\{\Lambda_{t}^{(n)}:t\in\mathbb{Z}\}$, $\{\mathbb{Y}_{t}^{(n)}:t\in\mathbb{Z}\}$ is strictly stationary too. Claim A.1 is thus proved by induction. ∎

Let $\left\lVert\mathbf{x}\right\rVert_{1}=|x_{1}|+|x_{2}|+\cdots+|x_{N}|$ for a vector $\mathbf{x}=(x_{1},x_{2},\ldots,x_{N})^{\prime}$. In the following claim we prove the convergence of $\mathbb{Y}_{t}^{(n)}$ as $n\to\infty$.

Claim A.2.

$\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}\leq C\rho^{n}$ for some constants $C>0$ and $0<\rho<1$.

Proof.

Since $N_{it}$ is a Poisson process with unit intensity, $N_{it}(\lambda_{it}^{(n+1)})-N_{it}(\lambda_{it}^{(n)})$ is Poisson distributed with parameter $\lambda_{it}^{(n+1)}-\lambda_{it}^{(n)}$, assuming $\lambda_{it}^{(n+1)}\geq\lambda_{it}^{(n)}$. Then it is easy to verify that

\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}
=\mathbb{E}\left[\mathbb{E}\left(\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}\left|\Lambda_{t}^{(n+1)},\Lambda_{t}^{(n)}\right.\right)\right]
=\mathbb{E}\left[\mathbb{E}\left(\sum_{i=1}^{N}\left|N_{it}(\lambda_{it}^{(n+1)})-N_{it}(\lambda_{it}^{(n)})\right|\left|\Lambda_{t}^{(n+1)},\Lambda_{t}^{(n)}\right.\right)\right]
=\mathbb{E}\left[\sum_{i=1}^{N}\left|\lambda_{it}^{(n+1)}-\lambda_{it}^{(n)}\right|\right]
=\mathbb{E}\left\lVert\Lambda_{t}^{(n+1)}-\Lambda_{t}^{(n)}\right\rVert_{1}.

Recall that, from (A.2),

\Lambda_{t}^{(n)}=\omega\mathbf{1}_{N}+A(\mathbb{Y}_{t-1}^{(n-1)})\mathbb{Y}_{t-1}^{(n-1)}+\beta\Lambda_{t-1}^{(n-1)}.

Then

\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}\leq\left\lVert A(\mathbb{Y}_{t-1}^{(n)})\mathbb{Y}_{t-1}^{(n)}-A(\mathbb{Y}_{t-1}^{(n-1)})\mathbb{Y}_{t-1}^{(n-1)}\right\rVert_{1}+\beta\left\lVert\Lambda_{t-1}^{(n)}-\Lambda_{t-1}^{(n-1)}\right\rVert_{1}. \qquad (A.4)

To bound the terms on the right-hand side of the above inequality, we define the function $\psi(y)=\alpha^{(1)}1_{\{y\geq r\}}y+\alpha^{(2)}1_{\{y<r\}}y$ for $y\in\mathbb{N}$. For any $y,y^{\prime}\in\mathbb{N}$, we have the following cases:

  • If $y\geq r$ and $y^{\prime}\geq r$, then $|\psi(y^{\prime})-\psi(y)|=\alpha^{(1)}|y^{\prime}-y|\leq\alpha^{*}|y^{\prime}-y|$, where $\alpha^{*}=\max\left\{\alpha^{(1)},\alpha^{(2)},\left|\alpha^{(1)}r-\alpha^{(2)}(r-1)\right|\right\}$;

  • If $y<r$ and $y^{\prime}<r$, then $|\psi(y^{\prime})-\psi(y)|=\alpha^{(2)}|y^{\prime}-y|\leq\alpha^{*}|y^{\prime}-y|$.

  • If $y$ and $y^{\prime}$ are on different sides of $r$, we assume $y\geq r$ and $y^{\prime}<r$ without loss of generality. Notice that

    \frac{\psi(y)-\psi(y^{\prime})}{y-y^{\prime}}=\frac{\alpha^{(1)}y-\alpha^{(2)}y^{\prime}}{y-y^{\prime}}=\alpha^{(2)}+(\alpha^{(1)}-\alpha^{(2)})\frac{y}{y-y^{\prime}}.

    When $\alpha^{(1)}\geq\alpha^{(2)}$, we see

    0<\frac{\psi(y)-\psi(y^{\prime})}{y-y^{\prime}}\leq\alpha^{(2)}+(\alpha^{(1)}-\alpha^{(2)})\frac{y}{y-(r-1)}\leq\alpha^{(2)}+(\alpha^{(1)}-\alpha^{(2)})r;

    When $\alpha^{(1)}<\alpha^{(2)}$, we see

    \alpha^{(2)}\geq\frac{\psi(y)-\psi(y^{\prime})}{y-y^{\prime}}\geq\alpha^{(2)}+(\alpha^{(1)}-\alpha^{(2)})\frac{y}{y-(r-1)}\geq\alpha^{(2)}+(\alpha^{(1)}-\alpha^{(2)})r.

Combining the above cases, we obtain

|\psi(y^{\prime})-\psi(y)|\leq\alpha^{*}|y^{\prime}-y| \qquad (A.5)

for any $y^{\prime},y\in\mathbb{N}$.

Therefore,

\left|\left(A(\mathbb{Y}_{t-1}^{(n)})\mathbb{Y}_{t-1}^{(n)}-A(\mathbb{Y}_{t-1}^{(n-1)})\mathbb{Y}_{t-1}^{(n-1)}\right)_{i}\right|
=\left|\psi(y_{i,t-1}^{(n)})-\psi(y_{i,t-1}^{(n-1)})+\xi\sum_{j=1}^{N}w_{ij}(y_{j,t-1}^{(n)}-y_{j,t-1}^{(n-1)})\right| \qquad (A.6)
\leq\alpha^{*}\left|y_{i,t-1}^{(n)}-y_{i,t-1}^{(n-1)}\right|+\xi\sum_{j=1}^{N}w_{ij}\left|y_{j,t-1}^{(n)}-y_{j,t-1}^{(n-1)}\right|

for $i=1,2,\ldots,N$, where $(\mathbb{Y})_{i}$ is the $i$-th element of $\mathbb{Y}$.

Combining (A.4) and (A.6), we get

\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}
\leq\mathbb{E}\left\lVert(\alpha^{*}I_{N}+\xi W+\beta I_{N})(\mathbb{Y}_{t-1}^{(n)}-\mathbb{Y}_{t-1}^{(n-1)})\right\rVert_{1}
\leq\rho(\alpha^{*}I_{N}+\xi W+\beta I_{N})\,\mathbb{E}\left\lVert\mathbb{Y}_{t-1}^{(n)}-\mathbb{Y}_{t-1}^{(n-1)}\right\rVert_{1}
\leq(\alpha^{*}+\xi+\beta)\,\mathbb{E}\left\lVert\mathbb{Y}_{t-1}^{(n)}-\mathbb{Y}_{t-1}^{(n-1)}\right\rVert_{1},

where $\rho(\cdot)$ denotes the spectral radius, and the last inequality is due to the Gershgorin circle theorem. According to (2.3), we can take $\rho=\alpha^{*}+\xi+\beta<1$ such that

\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}
\leq\rho\,\mathbb{E}\left\lVert\mathbb{Y}_{t-1}^{(n)}-\mathbb{Y}_{t-1}^{(n-1)}\right\rVert_{1}
\leq\rho^{n}\,\mathbb{E}\left\lVert\mathbb{Y}_{t-n}^{(1)}-\mathbb{Y}_{t-n}^{(0)}\right\rVert_{1}
=\rho^{n}\,\mathbb{E}\left\lVert\Lambda_{t-n}^{(1)}-\Lambda_{t-n}^{(0)}\right\rVert_{1}
\leq C\rho^{n}

for $C=\mathbb{E}\left\lVert\Lambda_{t-n}^{(1)}-\Lambda_{t-n}^{(0)}\right\rVert_{1}<\infty$. ∎

By Claim A.2,

\mathbb{P}\left\{\mathbb{Y}_{t}^{(n+1)}\neq\mathbb{Y}_{t}^{(n)}\right\}=\sum_{h=1}^{\infty}\mathbb{P}\left\{\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}=h\right\}
\leq\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}
\leq C\rho^{n}.

This implies that $\sum_{n=1}^{\infty}\mathbb{P}\left\{\mathbb{Y}_{t}^{(n+1)}\neq\mathbb{Y}_{t}^{(n)}\right\}<\infty$ and

\mathbb{P}\left\{\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty}\left[\mathbb{Y}_{t}^{(k+1)}\neq\mathbb{Y}_{t}^{(k)}\right]\right\}=0

according to the Borel-Cantelli lemma. This means that, with probability one, there exists $M$ such that for all $n>M$, $\mathbb{Y}_{t}^{(n)}$ equals some $\mathbb{Y}_{t}$ with integer components. That is, $\mathbb{Y}_{t}=\lim_{n\to\infty}\mathbb{Y}_{t}^{(n)}$ exists almost surely. Clearly, $\{\mathbb{Y}_{t}:t\in\mathbb{Z}\}$ is strictly stationary, since $\{\mathbb{Y}_{t}^{(n)}:t\in\mathbb{Z}\}$ is strictly stationary for each $n\geq 0$ according to Claim A.1.

Finally, by Claim A.2, we also have

\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+m)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}\leq\sum_{k=0}^{m-1}\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+k+1)}-\mathbb{Y}_{t}^{(n+k)}\right\rVert_{1}\leq C\rho^{n}\sum_{k=0}^{m-1}\rho^{k},

for any $n,m\in\mathbb{N}$. Therefore, $\{\mathbb{Y}_{t}^{(n)}:n\geq 0\}$ is a Cauchy sequence in $\mathbb{L}^{1}$, hence $\mathbb{E}\left\lVert\mathbb{Y}_{t}\right\rVert_{1}<\infty$. ∎

A.2 Proof of Theorem 2

From Lemma A.1, we have

\lambda_{it}(\nu)=\sum_{k=1}^{\infty}\beta^{k-1}\left[\omega+\alpha_{i,t-k}y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right]

and

\sup_{NT\geq 1}\sup_{(i,t)\in D_{NT}}\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}|\lambda_{it}(\nu)|<\infty \qquad (A.7)

with probability one, where $\alpha_{i,t-k}=\alpha^{(1)}1_{\{y_{i,t-k}\geq r\}}+\alpha^{(2)}1_{\{y_{i,t-k}<r\}}$. Given initial values $\tilde{\lambda}_{i0}=0$ for $i=1,2,\ldots,N$, we can replace $\lambda_{it}(\nu)$ with $\tilde{\lambda}_{it}(\nu)$ and get

\tilde{\lambda}_{it}(\nu)=\sum_{k=1}^{t}\beta^{k-1}\left[\omega+\alpha_{i,t-k}y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right]

for $i=1,2,\ldots,N$ and $t\geq 1$. Hence

\lambda_{it}(\nu)-\tilde{\lambda}_{it}(\nu)=\beta^{t}\lambda_{i0}(\nu). \qquad (A.8)

Now we are ready to prove the consistency of $\hat{\nu}_{NT}$ as $T\to\infty$ and $N\to\infty$. The proof is broken up into Claims A.3 to A.6 below: Claim A.3 shows that the choice of initial values is asymptotically negligible; Claims A.4 and A.5 verify the weak dependence of $\{l_{it}(\nu):(i,t)\in D_{NT},NT\geq 1\}$ and facilitate the use of the LLN; Claim A.6 concerns the identifiability of the true parameter $\nu_{0}$.

Claim A.3.

For any $\nu\in\Theta\times\mathbb{Z}_{+}$, $|L_{NT}(\nu)-\tilde{L}_{NT}(\nu)|\overset{p}{\to}0$ as $T\to\infty$ and $N\to\infty$.

Proof.

By (A.7) and (A.8), we have

\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}|\lambda_{it}(\nu)-\tilde{\lambda}_{it}(\nu)|\leq C\rho^{t} \qquad (A.9)

almost surely. Therefore,

|L_{NT}(\nu)-\tilde{L}_{NT}(\nu)|
\leq\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left|y_{it}\log\left[\frac{\lambda_{it}(\nu)}{\tilde{\lambda}_{it}(\nu)}\right]-[\lambda_{it}(\nu)-\tilde{\lambda}_{it}(\nu)]\right|
\leq\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[y_{it}\left|\frac{\lambda_{it}(\nu)-\tilde{\lambda}_{it}(\nu)}{\tilde{\lambda}_{it}(\nu)}\right|+\left|\lambda_{it}(\nu)-\tilde{\lambda}_{it}(\nu)\right|\right]
\leq\frac{1}{NT}\sum_{(i,t)\in D_{NT}}C\rho^{t}\left(\frac{y_{it}}{\omega}+1\right)

almost surely. By the Markov inequality, for any δ>0\delta>0,

\mathbb{P}\left\{|L_{NT}(\nu)-\tilde{L}_{NT}(\nu)|>\delta\right\}
\leq\frac{1}{\delta NT}\sum_{i=1}^{N}\sum_{t=1}^{T}C_{1}\rho^{t}\,\mathbb{E}\left|\frac{y_{it}}{\omega}+1\right|
\leq\frac{1}{\delta NT}\sum_{i=1}^{N}\sum_{t=1}^{T}C_{2}\rho^{t}\quad\text{(by Assumption 3.2(a))}
\leq\frac{1}{\delta T}\frac{C_{2}\rho}{1-\rho}\to 0

as $N\to\infty$ and $T\to\infty$. ∎

In the following proofs, for a random variable $X$, we denote its $\mathbb{L}^{p}$-norm by $\left\lVert X\right\rVert_{p}=(\mathbb{E}|X|^{p})^{1/p}$.

Claim A.4.

The functions $l_{it}(\nu)$ are uniformly $\mathbb{L}^{p}$-bounded for some $p>1$, i.e.

\sup_{NT\geq 1}\sup_{(i,t)\in D_{NT}}\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\left\lVert l_{it}(\nu)\right\rVert_{p}<\infty.
Proof.

By the Hölder inequality, we have

\left\lVert l_{it}(\nu)\right\rVert_{p}=\left\lVert y_{it}\log\lambda_{it}(\nu)-\lambda_{it}(\nu)\right\rVert_{p}
\leq\left\lVert y_{it}\log\lambda_{it}(\nu)\right\rVert_{p}+\left\lVert\lambda_{it}(\nu)\right\rVert_{p}
\leq\left\lVert y_{it}\right\rVert_{2p}\left\lVert\log\lambda_{it}(\nu)\right\rVert_{2p}+\left\lVert\lambda_{it}(\nu)\right\rVert_{p}.

Note that

\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\left\lVert\log\lambda_{it}(\nu)\right\rVert_{2p}
\leq\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\left\lVert\log^{+}\lambda_{it}(\nu)\right\rVert_{2p}+\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\left\lVert\log^{-}\lambda_{it}(\nu)\right\rVert_{2p}
\leq\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\left\lVert\lambda_{it}(\nu)+1\right\rVert_{2p}+\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\max\{-\log(\omega),0\}.

Then, by Assumption 3.2(a) and (A.7), we complete the proof. ∎

Claim A.5.

For any element $\nu$ in the parameter space satisfying Assumption 3.1, the array of random fields $\{l_{it}(\nu):(i,t)\in D_{NT},NT\geq 1\}$ is $\eta$-weakly dependent with coefficient $\bar{\eta}_{0}(s)\leq Cs^{-\mu_{0}}$, where $\mu_{0}>2$.

Proof.

For each $(i,t)\in D_{NT}$ and $h=1,2,\ldots$, define $\{y_{j\tau}^{(h)}:(j,\tau)\in D_{NT},NT\geq 1\}$ such that $y_{j\tau}^{(h)}\neq y_{j\tau}$ if and only if $\rho((i,t),(j,\tau))=h$, and let

\lambda_{it}^{(h)}(\nu)=\sum_{k=1}^{\infty}\beta^{k-1}\left[\omega+\alpha_{i,t-k}^{(h)}y_{i,t-k}^{(h)}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}^{(h)}\right],

where

\alpha_{i,t-k}^{(h)}=\alpha^{(1)}1_{\{y_{i,t-k}^{(h)}\geq r\}}+\alpha^{(2)}1_{\{y_{i,t-k}^{(h)}<r\}}.

Then, by (A.5) and Assumption 3.3, we have

|\lambda_{it}(\nu)-\lambda_{it}^{(h)}(\nu)|
\leq\sum_{k=1}^{\infty}\beta^{k-1}|\alpha_{i,t-k}y_{i,t-k}-\alpha_{i,t-k}^{(h)}y_{i,t-k}^{(h)}|+\sum_{k=1}^{\infty}\sum_{j=1}^{N}\beta^{k-1}\xi w_{ij}|y_{j,t-k}-y_{j,t-k}^{(h)}|
=\beta^{h-1}|\alpha_{i,t-h}y_{i,t-h}-\alpha_{i,t-h}^{(h)}y_{i,t-h}^{(h)}|+\xi\beta^{h-1}\sum_{1\leq|j-i|\leq h}w_{ij}|y_{j,t-h}-y_{j,t-h}^{(h)}|+\xi w_{i,i\pm h}\sum_{k=1}^{h}\beta^{k-1}|y_{i\pm h,t-k}-y_{i\pm h,t-k}^{(h)}| \qquad (A.10)
\leq\alpha^{*}\beta^{h-1}|y_{i,t-h}-y_{i,t-h}^{(h)}|+\xi\beta^{h-1}\sum_{1\leq|j-i|\leq h}|y_{j,t-h}-y_{j,t-h}^{(h)}|+C\xi h^{-b}\sum_{k=1}^{h}|y_{i\pm h,t-k}-y_{i\pm h,t-k}^{(h)}|.

Therefore $\lambda_{it}(\nu)$ satisfies condition (2.7) in Pan and Pan (2024) with $B_{(i,t),NT}(h)\leq Ch^{-b}$ and $l=0$. By Proposition 2 and Example 2.1 in Pan and Pan (2024), the array of random fields $\{\lambda_{it}(\nu):(i,t)\in D_{NT},NT\geq 1\}$ is $\eta$-weakly dependent with coefficient $\bar{\eta}_{\lambda}(s)\leq Cs^{-\mu_{y}+2}$.

Similarly we can define

l_{it}^{(h)}(\nu)=y_{it}^{(h)}\log\lambda_{it}^{(h)}(\nu)-\lambda_{it}^{(h)}(\nu).

Since

|l_{it}(\nu)-l_{it}^{(h)}(\nu)|\leq y_{it}\left|\log\frac{\lambda_{it}(\nu)}{\lambda_{it}^{(h)}(\nu)}\right|+|\lambda_{it}(\nu)-\lambda_{it}^{(h)}(\nu)|
\leq y_{it}\left|\frac{\lambda_{it}(\nu)}{\lambda_{it}^{(h)}(\nu)}-1\right|+|\lambda_{it}(\nu)-\lambda_{it}^{(h)}(\nu)|
\leq\frac{y_{it}}{\omega}|\lambda_{it}(\nu)-\lambda_{it}^{(h)}(\nu)|+|\lambda_{it}(\nu)-\lambda_{it}^{(h)}(\nu)|,

$l_{it}(\nu)$ also satisfies condition (2.7) in Pan and Pan (2024), with $B_{(i,t),NT}(h)\leq Ch^{-b}$ and $l=1$ by (A.10). Hence the array of random fields $\{l_{it}(\nu):(i,t)\in D_{NT},NT\geq 1\}$ is $\eta$-weakly dependent with coefficient $\bar{\eta}_{0}(s)\leq Cs^{-\frac{2p-2}{2p-1}\mu_{y}+2}$. Note that $\frac{2p-2}{2p-1}\mu_{y}-2>2$ because $\mu_{y}>\frac{4p-2}{p-1}$. So Claim A.5 is proved. ∎

Claim A.6.

λit(ν)=λit(ν0)\lambda_{it}(\nu)=\lambda_{it}(\nu_{0}) for all (i,t)DNT(i,t)\in D_{NT} if and only if ν=ν0\nu=\nu_{0}.

Proof.

The 'if' part is obvious, so it remains to prove the 'only if' part. Observe that

(1βB)λit(ν)=ω+αByit+ξj=1NwijByjt,(1-\beta B)\lambda_{it}(\nu)=\omega+\alpha By_{it}+\xi\sum_{j=1}^{N}w_{ij}By_{jt},

where BB stands for the back-shift operator in the sense that Byit=yi,t1By_{it}=y_{i,t-1}, and α\alpha represents either α(1)\alpha^{(1)} or α(2)\alpha^{(2)} according to the threshold regime of the corresponding lagged observation. Therefore we have

(1βB)Λt(ν)=ω𝟏N+(αBIN+ξBW)𝕐t.(1-\beta B)\Lambda_{t}(\nu)=\omega\mathbf{1}_{N}+(\alpha BI_{N}+\xi BW)\mathbb{Y}_{t}.

The polynomial 1βx1-\beta x has a root x=1/βx=1/\beta, which lies outside the unit circle since 0<β<10<\beta<1. Therefore the inverse 11βx\frac{1}{1-\beta x} is well-defined for any |x|1|x|\leq 1, and we have

Λt(ν)=ω1β𝟏N+𝒫ν(B)𝕐t\Lambda_{t}(\nu)=\frac{\omega}{1-\beta}\mathbf{1}_{N}+\mathcal{P}_{\nu}(B)\mathbb{Y}_{t}

with 𝒫ν(B):=αB1βBIN+ξB1βBW.\mathcal{P}_{\nu}(B):=\frac{\alpha B}{1-\beta B}I_{N}+\frac{\xi B}{1-\beta B}W. As λit(ν)=λit(ν0)\lambda_{it}(\nu)=\lambda_{it}(\nu_{0}) for each i=1,2,,Ni=1,2,...,N,

[𝒫ν(B)𝒫ν0(B)]𝕐t=(ω01β0ω1β)𝟏N.\left[\mathcal{P}_{\nu}(B)-\mathcal{P}_{\nu_{0}}(B)\right]\mathbb{Y}_{t}=\left(\frac{\omega_{0}}{1-\beta_{0}}-\frac{\omega}{1-\beta}\right)\mathbf{1}_{N}.

We can deduce from the above equation that 𝒫ν(x)=𝒫ν0(x)\mathcal{P}_{\nu}(x)=\mathcal{P}_{\nu_{0}}(x) for any |x|1|x|\leq 1; otherwise 𝕐t\mathbb{Y}_{t} would degenerate to a deterministic vector given t1\mathcal{H}_{t-1}. The identity 𝒫ν(x)=𝒫ν0(x)\mathcal{P}_{\nu}(x)=\mathcal{P}_{\nu_{0}}(x) implies that

αx1βxINα0x1β0xIN=(ξ0x1β0xξx1βx)W.\frac{\alpha x}{1-\beta x}I_{N}-\frac{\alpha_{0}x}{1-\beta_{0}x}I_{N}=\left(\frac{\xi_{0}x}{1-\beta_{0}x}-\frac{\xi x}{1-\beta x}\right)W.

The diagonal elements of WW are all zero, while the matrix on the left-hand side of the above equation is diagonal. Equating the diagonal and the off-diagonal entries on both sides (WW is not the zero matrix), we have

αx1βx=α0x1β0x,\displaystyle\frac{\alpha x}{1-\beta x}=\frac{\alpha_{0}x}{1-\beta_{0}x},
ξx1βx=ξ0x1β0x,\displaystyle\frac{\xi x}{1-\beta x}=\frac{\xi_{0}x}{1-\beta_{0}x},

which imply α=α0\alpha=\alpha_{0}, β=β0\beta=\beta_{0} and ξ=ξ0\xi=\xi_{0}: cross-multiplying the first identity gives αx(1β0x)=α0x(1βx)\alpha x(1-\beta_{0}x)=\alpha_{0}x(1-\beta x) for all |x|1|x|\leq 1, so α=α0\alpha=\alpha_{0} and αβ0=α0β\alpha\beta_{0}=\alpha_{0}\beta, whence β=β0\beta=\beta_{0}, and the second identity then yields ξ=ξ0\xi=\xi_{0}. Finally, ω=ω0\omega=\omega_{0} follows from ω1β=ω01β0\frac{\omega}{1-\beta}=\frac{\omega_{0}}{1-\beta_{0}}. ∎

With Claim A.4 and Claim A.5, we can apply Theorem 1 in Pan and Pan, (2024) and obtain that

[LNT(ν)𝔼LNT(ν)]𝑝0\left[L_{NT}(\nu)-\mathbb{E}L_{NT}(\nu)\right]\overset{p}{\to}0 (A.11)

for any νΘ×+\nu\in\Theta\times\mathbb{Z}_{+}. Therefore, we have

limT,N[LNT(ν)LNT(ν0)]\displaystyle\lim_{T,N\to\infty}[L_{NT}(\nu)-L_{NT}(\nu_{0})]
=\displaystyle= limT,N𝔼[LNT(ν)LNT(ν0)]\displaystyle\lim_{T,N\to\infty}\mathbb{E}\left[L_{NT}(\nu)-L_{NT}(\nu_{0})\right]
=\displaystyle= limT,N1NT(i,t)DNT𝔼[yitlogλit(ν)λit(ν0)(λit(ν)λit(ν0))]\displaystyle\lim_{T,N\to\infty}\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\mathbb{E}\left[y_{it}\log\frac{\lambda_{it}(\nu)}{\lambda_{it}(\nu_{0})}-(\lambda_{it}(\nu)-\lambda_{it}(\nu_{0}))\right]
=\displaystyle= limT,N1NT(i,t)DNT𝔼{𝔼[yitlogλit(ν)λit(ν0)(λit(ν)λit(ν0))|λit(ν),λit(ν0)]}\displaystyle\lim_{T,N\to\infty}\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\mathbb{E}\left\{\mathbb{E}\left[\left.y_{it}\log\frac{\lambda_{it}(\nu)}{\lambda_{it}(\nu_{0})}-(\lambda_{it}(\nu)-\lambda_{it}(\nu_{0}))\right|\lambda_{it}(\nu),\lambda_{it}(\nu_{0})\right]\right\} (A.12)
=\displaystyle= limT,N1NT(i,t)DNT𝔼[λit(ν0)logλit(ν)λit(ν0)(λit(ν)λit(ν0))]\displaystyle\lim_{T,N\to\infty}\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\mathbb{E}\left[\lambda_{it}(\nu_{0})\log\frac{\lambda_{it}(\nu)}{\lambda_{it}(\nu_{0})}-(\lambda_{it}(\nu)-\lambda_{it}(\nu_{0}))\right]
\displaystyle\leq limT,N1NT(i,t)DNT𝔼{λit(ν0)[λit(ν)λit(ν0)1](λit(ν)λit(ν0))}\displaystyle\lim_{T,N\to\infty}\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\mathbb{E}\left\{\lambda_{it}(\nu_{0})\left[\frac{\lambda_{it}(\nu)}{\lambda_{it}(\nu_{0})}-1\right]-(\lambda_{it}(\nu)-\lambda_{it}(\nu_{0}))\right\}
=\displaystyle= 0,\displaystyle 0,

with equality if and only if λit(ν)=λit(ν0)\lambda_{it}(\nu)=\lambda_{it}(\nu_{0}) for all (i,t)DNT(i,t)\in D_{NT}, which is equivalent to ν=ν0\nu=\nu_{0} by Claim A.6.
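The inequality in the final two steps of (A.12) is the elementary bound log⁡x ≤ x−1: for fixed λ0>0, the function f(λ)=λ0log(λ/λ0)−(λ−λ0) is non-positive and vanishes only at λ=λ0 (up to sign, it is the Kullback–Leibler divergence between Poisson(λ0) and Poisson(λ)). A quick numerical illustration in Python, with an arbitrary intensity and grid:

```python
import numpy as np

lam0 = 2.5                            # fixed "true" intensity lambda_it(nu_0)
lam = np.linspace(0.1, 10.0, 1001)    # candidate intensities lambda_it(nu)

# f(lam) = lam0*log(lam/lam0) - (lam - lam0), cf. the last two steps of (A.12)
f = lam0 * np.log(lam / lam0) - (lam - lam0)

print(f.max() <= 0.0)                 # True: f is non-positive everywhere
print(lam[np.argmax(f)])              # ~2.5: the maximum (zero) is at lam = lam0
```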

Note that Claim A.3 implies that

limT,N[|LNT(ν^NT)L~NT(ν^NT)|<δ3]=1\lim_{T,N\to\infty}\mathbb{P}\left[|L_{NT}(\hat{\nu}_{NT})-\tilde{L}_{NT}(\hat{\nu}_{NT})|<\frac{\delta}{3}\right]=1

for any δ>0\delta>0, hence

limT,N[LNT(ν^NT)>L~NT(ν^NT)δ3]=1.\lim_{T,N\to\infty}\mathbb{P}\left[L_{NT}(\hat{\nu}_{NT})>\tilde{L}_{NT}(\hat{\nu}_{NT})-\frac{\delta}{3}\right]=1.

Since ν^NT\hat{\nu}_{NT} maximizes L~NT(ν)\tilde{L}_{NT}(\nu), we have

limT,N[L~NT(ν^NT)>L~NT(ν0)δ3]=1.\lim_{T,N\to\infty}\mathbb{P}\left[\tilde{L}_{NT}(\hat{\nu}_{NT})>\tilde{L}_{NT}({\nu}_{0})-\frac{\delta}{3}\right]=1.

So

limT,N[LNT(ν^NT)>L~NT(ν0)2δ3]=1.\lim_{T,N\to\infty}\mathbb{P}\left[L_{NT}(\hat{\nu}_{NT})>\tilde{L}_{NT}({\nu}_{0})-\frac{2\delta}{3}\right]=1.

Furthermore, from Claim A.3,

limT,N[L~NT(ν0)>LNT(ν0)δ3]=1.\lim_{T,N\to\infty}\mathbb{P}\left[\tilde{L}_{NT}({\nu}_{0})>L_{NT}({\nu}_{0})-\frac{\delta}{3}\right]=1.

Therefore we have

limT,N[0LNT(ν0)LNT(ν^NT)<δ]=1.\lim_{T,N\to\infty}\mathbb{P}\left[0\leq L_{NT}(\nu_{0})-L_{NT}(\hat{\nu}_{NT})<\delta\right]=1. (A.13)

Let Vk(θ)V_{k}(\theta) be the open ball with centre θ\theta and radius 1/k1/k. By (A.12) we can find

δ=infθΘVk(θ0)rr0[LNT(ν0)LNT(ν)]>0.\delta=\inf_{\begin{subarray}{c}\theta\in\Theta\smallsetminus V_{k}(\theta_{0})\\ r\neq r_{0}\end{subarray}}\left[L_{NT}(\nu_{0})-L_{NT}(\nu)\right]>0.

Then by (A.13),

limT,N{0LNT(ν0)LNT(ν^NT)<infθΘVk(θ0)rr0[LNT(ν0)LNT(ν)]}=1.\displaystyle\lim_{T,N\to\infty}\mathbb{P}\left\{0\leq L_{NT}(\nu_{0})-L_{NT}(\hat{\nu}_{NT})<\inf_{\begin{subarray}{c}\theta\in\Theta\smallsetminus V_{k}(\theta_{0})\\ r\neq r_{0}\end{subarray}}\left[L_{NT}(\nu_{0})-L_{NT}(\nu)\right]\right\}=1.

This implies that

limT,N[θ^NTVk(θ0),r^NT=r0]=1\displaystyle\lim_{T,N\to\infty}\mathbb{P}\left[\hat{\theta}_{NT}\in V_{k}(\theta_{0}),\hat{r}_{NT}=r_{0}\right]=1

for any given k>0k>0. Letting kk\to\infty completes the proof.

A.3 Proof of Theorem 3

With the threshold parameter fixed at r=r0r=r_{0}, we write θ^NT:=θ^NT(r0)\hat{\theta}_{NT}:=\hat{\theta}_{NT}^{(r_{0})}, λit(θ):=λit(θ,r0)\lambda_{it}(\theta):=\lambda_{it}(\theta,r_{0}) and lit(θ):=lit(θ,r0)l_{it}(\theta):=l_{it}(\theta,r_{0}), etc., in the succeeding proofs for simplicity of notation. To prove asymptotic normality, we first derive some intermediate results on the first-, second- and third-order derivatives of λit(θ)\lambda_{it}(\theta).

Since

λit(θ)=k=1βk1[ω+(α(1)1{yi,tkr}+α(2)1{yi,tk<r})yi,tk+ξj=1Nwijyj,tk]\lambda_{it}(\theta)=\sum_{k=1}^{\infty}\beta^{k-1}\left[\omega+\left(\alpha^{(1)}1_{\{y_{i,t-k}\geq r\}}+\alpha^{(2)}1_{\{y_{i,t-k}<r\}}\right)y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right]

almost surely, the partial derivatives of λit(θ)\lambda_{it}(\theta) are

λit(θ)ω=k=1βk1,\displaystyle\frac{\partial\lambda_{it}(\theta)}{\partial\omega}=\sum_{k=1}^{\infty}\beta^{k-1},
λit(θ)α(1)=k=1βk1yi,tk1{yi,tkr},\displaystyle\frac{\partial\lambda_{it}(\theta)}{\partial\alpha^{(1)}}=\sum_{k=1}^{\infty}\beta^{k-1}y_{i,t-k}1_{\{y_{i,t-k}\geq r\}},
λit(θ)α(2)=k=1βk1yi,tk1{yi,tk<r},\displaystyle\frac{\partial\lambda_{it}(\theta)}{\partial\alpha^{(2)}}=\sum_{k=1}^{\infty}\beta^{k-1}y_{i,t-k}1_{\{y_{i,t-k}<r\}}, (A.14)
λit(θ)ξ=k=1βk1(j=1Nwijyj,tk),\displaystyle\frac{\partial\lambda_{it}(\theta)}{\partial\xi}=\sum_{k=1}^{\infty}\beta^{k-1}\left(\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right),
λit(θ)β=k=2(k1)βk2ui,tk(θ),\displaystyle\frac{\partial\lambda_{it}(\theta)}{\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta^{k-2}u_{i,t-k}(\theta),

where

ui,tk(θ)=ω+α(1)yi,tk1{yi,tkr}+α(2)yi,tk1{yi,tk<r}+ξj=1Nwijyj,tk.u_{i,t-k}(\theta)=\omega+\alpha^{(1)}y_{i,t-k}1_{\{y_{i,t-k}\geq r\}}+\alpha^{(2)}y_{i,t-k}1_{\{y_{i,t-k}<r\}}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}.
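As a sanity check on (A.14), one can truncate the infinite series for λit(θ) at a large lag K and compare the analytic derivative with a central finite difference. The following minimal Python sketch does this for the derivative with respect to β; all data and parameter values are hypothetical, and the network term ∑j wij yj,t−k is simulated directly rather than built from a weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
K, r = 400, 3                           # truncation lag and hypothetical threshold
y = rng.poisson(3.0, size=K)            # hypothetical own counts y_{i,t-k}, k = 1..K
ynb = rng.poisson(2.0, size=K)          # hypothetical network term sum_j w_ij y_{j,t-k}

def u(omega, a1, a2, xi):
    """u_{i,t-k}(theta) for k = 1..K, with the threshold coefficient alpha_{i,t-k}."""
    alpha = np.where(y >= r, a1, a2)
    return omega + alpha * y + xi * ynb

def lam(omega, a1, a2, xi, beta):
    """Truncated series representation of lambda_it(theta)."""
    k = np.arange(1, K + 1)
    return np.sum(beta ** (k - 1) * u(omega, a1, a2, xi))

def dlam_dbeta(omega, a1, a2, xi, beta):
    """Analytic derivative in (A.14): sum_{k>=2} (k-1) beta^{k-2} u_{i,t-k}(theta)."""
    k = np.arange(1, K + 1)
    return np.sum((k - 1) * beta ** (k - 2.0) * u(omega, a1, a2, xi))

theta, eps = (0.5, 0.3, 0.2, 0.1, 0.4), 1e-6
fd = (lam(*theta[:4], theta[4] + eps) - lam(*theta[:4], theta[4] - eps)) / (2 * eps)
print(np.isclose(fd, dlam_dbeta(*theta)))   # True, up to truncation/rounding error
```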

Note that

λit(θ)θλ~it(θ)θ=tβt1λi0(ν)𝐞5+βtλi0(θ)θ,\frac{\partial\lambda_{it}(\theta)}{\partial\theta}-\frac{\partial\tilde{\lambda}_{it}(\theta)}{\partial\theta}=t\beta^{t-1}\lambda_{i0}(\nu)\mathbf{e}_{5}+\beta^{t}\frac{\partial\lambda_{i0}(\theta)}{\partial\theta}, (A.15)

where 𝐞5=(0,0,0,0,1)\mathbf{e}_{5}=(0,0,0,0,1)^{\prime}.

For the second order derivatives, we get that, for any θm,θn{ω,α(1),α(2),ξ}\theta_{m},\theta_{n}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi\},

2λit(θ)θmθn=0,\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}=0,

and

2λit(θ)ωβ=k=2(k1)βk2,\displaystyle\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\omega\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta^{k-2},
2λit(θ)α(1)β=k=2(k1)βk2yi,tk1{yi,tkr},\displaystyle\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\alpha^{(1)}\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta^{k-2}y_{i,t-k}1_{\{y_{i,t-k}\geq r\}},
2λit(θ)α(2)β=k=2(k1)βk2yi,tk1{yi,tk<r},\displaystyle\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\alpha^{(2)}\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta^{k-2}y_{i,t-k}1_{\{y_{i,t-k}<r\}}, (A.16)
2λit(θ)ξβ=k=2(k1)βk2(j=1Nwijyj,tk),\displaystyle\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\xi\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta^{k-2}\left(\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right),
2λit(θ)β2=k=3(k1)(k2)βk3ui,tk(θ).\displaystyle\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\beta^{2}}=\sum_{k=3}^{\infty}(k-1)(k-2)\beta^{k-3}u_{i,t-k}(\theta).

We also have

2λit(ν)θθ2λ~it(ν)θθ=t(t1)βt2λi0(ν)𝐞5𝐞5+2tβt1λi0(ν)θ𝐞5+βt2λi0(ν)θθ,\frac{\partial^{2}\lambda_{it}(\nu)}{\partial\theta\partial\theta^{\prime}}-\frac{\partial^{2}\tilde{\lambda}_{it}(\nu)}{\partial\theta\partial\theta^{\prime}}=t(t-1)\beta^{t-2}\lambda_{i0}(\nu)\mathbf{e}_{5}\mathbf{e}_{5}^{\prime}+2t\beta^{t-1}\frac{\partial\lambda_{i0}(\nu)}{\partial\theta}\mathbf{e}_{5}^{\prime}+\beta^{t}\frac{\partial^{2}\lambda_{i0}(\nu)}{\partial\theta\partial\theta^{\prime}}, (A.17)

where, as before, 𝐞5=(0,0,0,0,1)\mathbf{e}_{5}=(0,0,0,0,1)^{\prime}.

For the third order derivatives of λit(θ)\lambda_{it}(\theta),

3λit(θ)ωβ2=k=3(k1)(k2)βk3,\displaystyle\frac{\partial^{3}\lambda_{it}(\theta)}{\partial\omega\partial\beta^{2}}=\sum_{k=3}^{\infty}(k-1)(k-2)\beta^{k-3},
3λit(θ)α(1)β2=k=3(k1)(k2)βk3yi,tk1{yi,tkr},\displaystyle\frac{\partial^{3}\lambda_{it}(\theta)}{\partial\alpha^{(1)}\partial\beta^{2}}=\sum_{k=3}^{\infty}(k-1)(k-2)\beta^{k-3}y_{i,t-k}1_{\{y_{i,t-k}\geq r\}},
3λit(θ)α(2)β2=k=3(k1)(k2)βk3yi,tk1{yi,tk<r},\displaystyle\frac{\partial^{3}\lambda_{it}(\theta)}{\partial\alpha^{(2)}\partial\beta^{2}}=\sum_{k=3}^{\infty}(k-1)(k-2)\beta^{k-3}y_{i,t-k}1_{\{y_{i,t-k}<r\}}, (A.18)
3λit(θ)ξβ2=k=3(k1)(k2)βk3(j=1Nwijyj,tk),\displaystyle\frac{\partial^{3}\lambda_{it}(\theta)}{\partial\xi\partial\beta^{2}}=\sum_{k=3}^{\infty}(k-1)(k-2)\beta^{k-3}\left(\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right),
3λit(θ)β3=k=4(k1)(k2)(k3)βk4ui,tk(θ).\displaystyle\frac{\partial^{3}\lambda_{it}(\theta)}{\partial\beta^{3}}=\sum_{k=4}^{\infty}(k-1)(k-2)(k-3)\beta^{k-4}u_{i,t-k}(\theta).
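The series in (A.14), (A.16) and (A.18) are dominated by derivatives of a geometric series, whose closed forms are used repeatedly in the bounds below; they are easy to check numerically, e.g. with an arbitrary β ∈ (0,1):

```python
import numpy as np

beta = 0.4
k = np.arange(1, 2000, dtype=float)

# sum_{k>=1} beta^{k-1} = 1/(1-beta), cf. (A.14)
print(np.isclose(np.sum(beta ** (k - 1)), 1 / (1 - beta)))
# sum_{k>=2} (k-1) beta^{k-2} = 1/(1-beta)^2, cf. (A.16)
print(np.isclose(np.sum((k - 1) * beta ** (k - 2)), 1 / (1 - beta) ** 2))
# sum_{k>=3} (k-1)(k-2) beta^{k-3} = 2/(1-beta)^3, cf. (A.18)
print(np.isclose(np.sum((k - 1) * (k - 2) * beta ** (k - 3)), 2 / (1 - beta) ** 3))
```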

Based on the consistency of θ^NT\hat{\theta}_{NT}, we are now ready to prove asymptotic normality. We split the proof into Claim A.7 to Claim A.10 below.

Claim A.7.

For any θm{ω,α(1),α(2),ξ,β}\theta_{m}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta\}, NT|L~NT(θ0)θmLNT(θ0)θm|𝑝0\sqrt{NT}\left|\frac{\partial\tilde{L}_{NT}(\theta_{0})}{\partial\theta_{m}}-\frac{\partial L_{NT}(\theta_{0})}{\partial\theta_{m}}\right|\overset{p}{\to}0 as min{N,T}\min\left\{N,T\right\}\to\infty and TN\frac{T}{N}\to\infty.

Proof.

Note that

{LNT(θ)θ=1NT(i,t)DNT(yitλit(θ)1)λit(θ)θ,λit(θ)θ=𝐡i,t1+βλi,t1(θ)θ,\left\{\begin{aligned} &\frac{\partial L_{NT}(\theta)}{\partial\theta}=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left(\frac{y_{it}}{\lambda_{it}(\theta)}-1\right)\frac{\partial\lambda_{it}(\theta)}{\partial\theta},\\ &\frac{\partial\lambda_{it}(\theta)}{\partial\theta}=\mathbf{h}_{i,t-1}+\beta\frac{\partial\lambda_{i,t-1}(\theta)}{\partial\theta},\end{aligned}\right. (A.19)

where

𝐡i,t1:=(1,yi,t11{yi,t1r},yi,t11{yi,t1<r},j=1Nwijyj,t1,λi,t1),\mathbf{h}_{i,t-1}:=\left(1,y_{i,t-1}1_{\{y_{i,t-1}\geq r\}},y_{i,t-1}1_{\{y_{i,t-1}<r\}},\sum_{j=1}^{N}w_{ij}y_{j,t-1},\lambda_{i,t-1}\right)^{\prime},

and similarly

{L~NT(θ)θ=1NT(i,t)DNT(yitλ~it(θ)1)λ~it(θ)θ,λ~it(θ)θ=𝐡~i,t1+βλ~i,t1(θ)θ.\left\{\begin{aligned} &\frac{\partial\tilde{L}_{NT}(\theta)}{\partial\theta}=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left(\frac{y_{it}}{\tilde{\lambda}_{it}(\theta)}-1\right)\frac{\partial\tilde{\lambda}_{it}(\theta)}{\partial\theta},\\ &\frac{\partial\tilde{\lambda}_{it}(\theta)}{\partial\theta}=\tilde{\mathbf{h}}_{i,t-1}+\beta\frac{\partial\tilde{\lambda}_{i,t-1}(\theta)}{\partial\theta}.\end{aligned}\right. (A.20)

Therefore, we have

NT|L~NT(θ0)βLNT(θ0)β|\displaystyle\sqrt{NT}\left|\frac{\partial\tilde{L}_{NT}(\theta_{0})}{\partial\beta}-\frac{\partial L_{NT}(\theta_{0})}{\partial\beta}\right|
\displaystyle\leq 1NT(i,t)DNT|yit[λit(θ0)λ~it(θ0)λ~it(θ0)λit(θ0)λ~it(θ0)β\displaystyle\frac{1}{\sqrt{NT}}\sum_{(i,t)\in D_{NT}}\left|y_{it}\left[\frac{\lambda_{it}(\theta_{0})-\tilde{\lambda}_{it}(\theta_{0})}{\tilde{\lambda}_{it}(\theta_{0})\lambda_{it}(\theta_{0})}\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}\right.\right.
+1λit(θ0)(λ~it(θ0)βλit(θ0)β)](λ~it(θ0)βλit(θ0)β)|\displaystyle\left.\left.+\frac{1}{\lambda_{it}(\theta_{0})}\left(\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}\right)\right]-\left(\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}\right)\right|
\displaystyle\leq 1NT(i,t)DNTyitω02|λit(θ0)λ~it(θ0)||λ~it(θ0)β|\displaystyle\frac{1}{\sqrt{NT}}\sum_{(i,t)\in D_{NT}}\frac{y_{it}}{\omega_{0}^{2}}\left|\lambda_{it}(\theta_{0})-\tilde{\lambda}_{it}(\theta_{0})\right|\left|\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}\right|
+1NT(i,t)DNT(yitω0+1)|λit(θ0)βλ~it(θ0)β|.\displaystyle+\frac{1}{\sqrt{NT}}\sum_{(i,t)\in D_{NT}}\left(\frac{y_{it}}{\omega_{0}}+1\right)\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}\right|.

But, by Assumption 3.2(a) and (A.8), we have

1NT(i,t)DNTyitω02|λit(θ0)λ~it(θ0)||λ~it(θ0)β|1\displaystyle\left\lVert\frac{1}{\sqrt{NT}}\sum_{(i,t)\in D_{NT}}\frac{y_{it}}{\omega_{0}^{2}}\left|\lambda_{it}(\theta_{0})-\tilde{\lambda}_{it}(\theta_{0})\right|\left|\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}\right|\right\rVert_{1}
\displaystyle\leq C1NTi=1Nt=1Tβ0tyit1\displaystyle\frac{C_{1}}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\beta_{0}^{t}\left\lVert y_{it}\right\rVert_{1} (A.21)
\displaystyle\leq C2NTi=1Nβ01β00\displaystyle\frac{C_{2}}{\sqrt{NT}}\sum_{i=1}^{N}\frac{\beta_{0}}{1-\beta_{0}}\rightarrow 0

when min{N,T}\min\left\{N,T\right\}\to\infty and T/NT/N\to\infty. Then, in view of (A.15),

1NT(i,t)DNT(yitω0+1)|λit(θ0)βλ~it(θ0)β|1\displaystyle\left\lVert\frac{1}{\sqrt{NT}}\sum_{(i,t)\in D_{NT}}\left(\frac{y_{it}}{\omega_{0}}+1\right)\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}\right|\right\rVert_{1}
\displaystyle\leq C1NTi=1Nt=1Ttβ0t1yitω0+11+C2NTi=1Nt=1Tβ0tyitω0+11\displaystyle\frac{C_{1}}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}t\beta_{0}^{t-1}\left\lVert\frac{y_{it}}{\omega_{0}}+1\right\rVert_{1}+\frac{C_{2}}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\beta_{0}^{t}\left\lVert\frac{y_{it}}{\omega_{0}}+1\right\rVert_{1} (A.22)
\displaystyle\leq C3NTi=1Nt=1Ttβ0t1+C4NTi=1Nt=1Tβ0t\displaystyle\frac{C_{3}}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}t\beta_{0}^{t-1}+\frac{C_{4}}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\beta_{0}^{t}
\displaystyle\leq C3NTi=1N1(1β0)2+C4NTi=1Nβ01β00\displaystyle\frac{C_{3}}{\sqrt{NT}}\sum_{i=1}^{N}\frac{1}{(1-\beta_{0})^{2}}+\frac{C_{4}}{\sqrt{NT}}\sum_{i=1}^{N}\frac{\beta_{0}}{1-\beta_{0}}\rightarrow 0

when min{N,T}\min\left\{N,T\right\}\to\infty and T/NT/N\to\infty. In light of (A.21) and (A.22), we can prove that

NT|L~NT(θ0)βLNT(θ0)β|𝑝0.\sqrt{NT}\left|\frac{\partial\tilde{L}_{NT}(\theta_{0})}{\partial\beta}-\frac{\partial L_{NT}(\theta_{0})}{\partial\beta}\right|\overset{p}{\to}0.

The proofs regarding partial derivatives w.r.t. ω\omega, α(1)\alpha^{(1)}, α(2)\alpha^{(2)} and ξ\xi follow similar arguments and are therefore omitted here. ∎
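For concreteness, the recursions in (A.19) that underlie this proof can be implemented directly. The sketch below is purely illustrative, assuming hypothetical data, equal row-normalised weights and a crude stationary initialisation of λi,1 and its gradient; it illustrates the recursions only and is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, r = 10, 500, 3
W = (np.ones((N, N)) - np.eye(N)) / (N - 1)   # hypothetical row-normalised weights
y = rng.poisson(3.0, size=(T, N))             # hypothetical panel of counts y_it

def score(theta):
    """Score dL_NT(theta)/dtheta via (A.19); theta = (omega, a1, a2, xi, beta)."""
    omega, a1, a2, xi, beta = theta
    lam = np.full(N, omega / (1.0 - beta))    # crude initialisation of lambda_{i,1}
    dlam = np.zeros((N, 5))                   # d lambda_{i,1} / d theta
    s = np.zeros(5)
    for t in range(1, T):
        yp = y[t - 1]                                       # y_{i,t-1}
        h = np.stack([np.ones(N), yp * (yp >= r), yp * (yp < r),
                      W @ yp, lam], axis=1)                 # h_{i,t-1}
        dlam = h + beta * dlam                              # d lambda_it / d theta
        alpha = np.where(yp >= r, a1, a2)
        lam = omega + alpha * yp + xi * (W @ yp) + beta * lam   # lambda_it
        s += ((y[t] / lam - 1.0)[:, None] * dlam).sum(axis=0)
    return s / (N * (T - 1))

print(score((0.5, 0.3, 0.2, 0.1, 0.4)))
```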

Claim A.8.

For any θm,θn{ω,α(1),α(2),ξ,β}\theta_{m},\theta_{n}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta\}, sup|θθ0|<ξ|2L~NT(θ)θmθn2LNT(θ0)θmθn|=𝒪p(ξ)\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{\partial^{2}\tilde{L}_{NT}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|=\mathcal{O}_{p}(\xi) as min{N,T}\min\left\{N,T\right\}\to\infty and T/NT/N\to\infty.

Proof.

For any θm,θn{ω,α(1),α(2),ξ,β}\theta_{m},\theta_{n}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta\}, we have

2LNT(θ)θmθn\displaystyle\frac{\partial^{2}L_{NT}(\theta)}{\partial\theta_{m}\partial\theta_{n}} (A.23)
=\displaystyle= 1NTi=1Nt=1T[(yitλit(θ)1)2λit(θ)θmθnyitλit2(θ)λit(θ)θmλit(θ)θn],\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left[\left(\frac{y_{it}}{\lambda_{it}(\theta)}-1\right)\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{y_{it}}{\lambda_{it}^{2}(\theta)}\frac{\partial\lambda_{it}(\theta)}{\partial\theta_{m}}\frac{\partial\lambda_{it}(\theta)}{\partial\theta_{n}}\right],

and

2L~NT(θ)θmθn\displaystyle\frac{\partial^{2}\tilde{L}_{NT}(\theta)}{\partial\theta_{m}\partial\theta_{n}} (A.24)
=\displaystyle= 1NTi=1Nt=1T[(yitλ~it(θ)1)2λ~it(θ)θmθnyitλ~it2(θ)λ~it(θ)θmλ~it(θ)θn].\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left[\left(\frac{y_{it}}{\tilde{\lambda}_{it}(\theta)}-1\right)\frac{\partial^{2}\tilde{\lambda}_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{y_{it}}{\tilde{\lambda}_{it}^{2}(\theta)}\frac{\partial\tilde{\lambda}_{it}(\theta)}{\partial\theta_{m}}\frac{\partial\tilde{\lambda}_{it}(\theta)}{\partial\theta_{n}}\right].

Note that

sup|θθ0|<ξ|2L~NT(θ)θmθn2LNT(θ0)θmθn|\displaystyle\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{\partial^{2}\tilde{L}_{NT}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right| (A.25)
\displaystyle\leq 1NTi=1Nt=1TsupθΘ|2l~it(θ)θmθn2lit(θ)θmθn|\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\sup_{\theta\in\Theta}\left|\frac{\partial^{2}\tilde{l}_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}\right|
+1NTi=1Nt=1Tsup|θθ0|<ξ|2lit(θ)θmθn2lit(θ0)θmθn|.\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{\partial^{2}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|.

We handle these two terms separately below.

For the first term on the right-hand side of (A.25), we see

1NTi=1Nt=1TsupθΘ|2l~it(θ)θmθn2lit(θ)θmθn|1\displaystyle\left\lVert\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\sup_{\theta\in\Theta}\left|\frac{\partial^{2}\tilde{l}_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}\right|\right\rVert_{1}
\displaystyle\leq 1NTi=1Nt=1TyitsupθΘ(1λit1λ~it)supθΘ2λitθmθn1\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left\lVert y_{it}\sup_{\theta\in\Theta}\left(\frac{1}{\lambda_{it}}-\frac{1}{\tilde{\lambda}_{it}}\right)\sup_{\theta\in\Theta}\frac{\partial^{2}\lambda_{it}}{\partial\theta_{m}\partial\theta_{n}}\right\rVert_{1}
+1NTi=1Nt=1TsupθΘ(yitλ~it1)supθΘ(2λitθmθn2λ~itθmθn)1\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left\lVert\sup_{\theta\in\Theta}\left(\frac{y_{it}}{\tilde{\lambda}_{it}}-1\right)\sup_{\theta\in\Theta}\left(\frac{\partial^{2}\lambda_{it}}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}\tilde{\lambda}_{it}}{\partial\theta_{m}\partial\theta_{n}}\right)\right\rVert_{1}
+1NTi=1Nt=1TyitsupθΘ(λit2λ~it21)supθΘ1λit2λitθmλitθn1\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left\lVert y_{it}\sup_{\theta\in\Theta}\left(\frac{\lambda^{2}_{it}}{\tilde{\lambda}^{2}_{it}}-1\right)\sup_{\theta\in\Theta}\frac{1}{\lambda_{it}^{2}}\frac{\partial\lambda_{it}}{\partial\theta_{m}}\frac{\partial\lambda_{it}}{\partial\theta_{n}}\right\rVert_{1} (A.26)
+1NTi=1Nt=1TsupθΘyitλ~it2[λ~itθm(λ~itθnλitθn)]1\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left\lVert\sup_{\theta\in\Theta}\frac{y_{it}}{\tilde{\lambda}^{2}_{it}}\left[\frac{\partial\tilde{\lambda}_{it}}{\partial\theta_{m}}\left(\frac{\partial\tilde{\lambda}_{it}}{\partial\theta_{n}}-\frac{\partial\lambda_{it}}{\partial\theta_{n}}\right)\right]\right\rVert_{1}
+1NTi=1Nt=1TsupθΘyitλ~it2[λitθn(λ~itθmλitθm)]1\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left\lVert\sup_{\theta\in\Theta}\frac{y_{it}}{\tilde{\lambda}^{2}_{it}}\left[\frac{\partial\lambda_{it}}{\partial\theta_{n}}\left(\frac{\partial\tilde{\lambda}_{it}}{\partial\theta_{m}}-\frac{\partial\lambda_{it}}{\partial\theta_{m}}\right)\right]\right\rVert_{1}
:=\displaystyle:= T1+T2+T3+T4+T5T_{1}+T_{2}+T_{3}+T_{4}+T_{5}.

By arguments analogous to the proof of (A.21), we can show that T10T_{1}\to 0 and T30T_{3}\to 0 as min{N,T}\min\{N,T\}\to\infty and T/NT/N\to\infty. Using (A.17), we can also verify that

T21NTi=1Nt=1T[C1t(t1)ρt2+C2tρt1+C3ρt]supθΘ(yitλ~it1)1.T_{2}\leq\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}[C_{1}t(t-1)\rho^{t-2}+C_{2}t\rho^{t-1}+C_{3}\rho^{t}]\left\lVert\sup_{\theta\in\Theta}\left(\frac{y_{it}}{\tilde{\lambda}_{it}}-1\right)\right\rVert_{1}.

Then T20T_{2}\rightarrow 0 as well. Similarly, using (A.15), we obtain that T40T_{4}\rightarrow 0 and T50T_{5}\rightarrow 0.

For the second term on the right-hand side of (A.25), we notice that a Taylor expansion of 2lit(θ)θmθn\frac{\partial^{2}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}} at θ0\theta_{0} yields that

1NTi=1Nt=1Tsup|θθ0|<ξ|2lit(θ)θmθn2lit(θ0)θmθn|\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{\partial^{2}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|
\displaystyle\leq 1NTi=1Nt=1Tξsup|θθ0|<ξ|3lit(θ)θmθnθl|\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{\partial^{3}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}\partial\theta_{l}}\right|
\displaystyle\leq 1NTi=1Nt=1Tξsup|θθ0|<ξ|yitλit1||3λitθmθnθl|\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{y_{it}}{\lambda_{it}}-1\right|\left|\frac{\partial^{3}\lambda_{it}}{\partial\theta_{m}\partial\theta_{n}\partial\theta_{l}}\right|
+1NTi=1Nt=1Tξsup|θθ0|<ξ|2yitλit3||λitθlλitθmλitθn|\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{2y_{it}}{\lambda_{it}^{3}}\right|\left|\frac{\partial\lambda_{it}}{\partial\theta_{l}}\frac{\partial\lambda_{it}}{\partial\theta_{m}}\frac{\partial\lambda_{it}}{\partial\theta_{n}}\right| (A.27)
+1NTi=1Nt=1Tξsup|θθ0|<ξ|yitλit2||λitθl2λitθmθn|\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{y_{it}}{\lambda_{it}^{2}}\right|\left|\frac{\partial\lambda_{it}}{\partial\theta_{l}}\frac{\partial^{2}\lambda_{it}}{\partial\theta_{m}\partial\theta_{n}}\right|
+1NTi=1Nt=1Tξsup|θθ0|<ξ|yitλit2||λitθn2λitθlθm|\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{y_{it}}{\lambda_{it}^{2}}\right|\left|\frac{\partial\lambda_{it}}{\partial\theta_{n}}\frac{\partial^{2}\lambda_{it}}{\partial\theta_{l}\partial\theta_{m}}\right|
+1NTi=1Nt=1Tξsup|θθ0|<ξ|yitλit2||λitθm2λitθnθl|\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{y_{it}}{\lambda_{it}^{2}}\right|\left|\frac{\partial\lambda_{it}}{\partial\theta_{m}}\frac{\partial^{2}\lambda_{it}}{\partial\theta_{n}\partial\theta_{l}}\right|
:=B1+B2+B3+B4+B5\displaystyle:=B_{1}+B_{2}+B_{3}+B_{4}+B_{5}

for any θl,θm,θn{ω,α(1),α(2),ξ,β}\theta_{l},\theta_{m},\theta_{n}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta\}. According to Assumption 3.2(a) and (A.18), we can verify that

𝔼|yitλit1||3λitθmθnθl|<,\mathbb{E}\left|\frac{y_{it}}{\lambda_{it}}-1\right|\left|\frac{\partial^{3}\lambda_{it}}{\partial\theta_{m}\partial\theta_{n}\partial\theta_{l}}\right|<\infty,

hence B1=𝒪p(ξ)B_{1}=\mathcal{O}_{p}(\xi). The other terms can be bounded along similar lines, in light of (A.14) and (A.16).

Substituting (A.26) and (A.27) into (A.25), we complete the proof. ∎

Claim A.9.
  • (a)

    supNT1sup(i,t)DNTlit(θ0)θ2p<\sup_{NT\geq 1}\sup_{(i,t)\in D_{NT}}\left\lVert\frac{\partial l_{it}(\theta_{0})}{\partial\theta}\right\rVert_{2p}<\infty for some p>1p>1;

  • (b)

    For each 𝐯5\mathbf{v}\in\mathbb{R}^{5} such that |𝐯|=1|\mathbf{v}|=1, {𝐯lit(θ0)θ:(i,t)DNT,NT1}\left\{\mathbf{v}^{\prime}\frac{\partial l_{it}(\theta_{0})}{\partial\theta}:(i,t)\in D_{NT},NT\geq 1\right\} are η\eta-weakly dependent, with dependence coefficient η¯1(s)Csμ1\bar{\eta}_{1}(s)\leq Cs^{-\mu_{1}} where μ1>42p1p1\mu_{1}>4\vee\frac{2p-1}{p-1}.

Proof.

Recall that, from (A.19),

lit(θ0)θ=yitλit(θ0)λit(θ0)θλit(θ0)θ.\frac{\partial l_{it}(\theta_{0})}{\partial\theta}=\frac{y_{it}}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}-\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}.

From Assumption 3.2 and Lemma A.1, it is easy to see that (a) holds.

Now we verify (b). In the proof of Claim A.5, for each (i,t)DNT(i,t)\in D_{NT} and h=1,2,h=1,2,..., we defined {yjτ(h):(j,τ)DNT,NT1}\{y_{j\tau}^{(h)}:(j,\tau)\in D_{NT},NT\geq 1\} such that yjτ(h)yjτy_{j\tau}^{(h)}\neq y_{j\tau} if and only if ρ((i,t),(j,τ))=h\rho((i,t),(j,\tau))=h. We first verify that lit(θ0)β\frac{\partial l_{it}(\theta_{0})}{\partial\beta} satisfies condition (2.7) in Pan and Pan, (2024). Note that

|lit(θ0)βlit(h)(θ0)β|\displaystyle\left|\frac{\partial l_{it}(\theta_{0})}{\partial\beta}-\frac{\partial l_{it}^{(h)}(\theta_{0})}{\partial\beta}\right| (A.28)
\displaystyle\leq yit|1λit(θ0)λit(θ0)β1λit(h)(θ0)λit(h)(θ0)β|+|λit(θ0)βλit(h)(θ0)β|\displaystyle y_{it}\left|\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{1}{\lambda_{it}^{(h)}(\theta_{0})}\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\beta}\right|+\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\beta}\right|
\displaystyle\leq |yitω0+1||λit(θ0)βλit(h)(θ0)β|+yitω02|λit(h)(θ0)β||λit(θ0)λit(h)(θ0)|\displaystyle\left|\frac{y_{it}}{\omega_{0}}+1\right|\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\beta}\right|+\frac{y_{it}}{\omega_{0}^{2}}\left|\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\beta}\right|\left|\lambda_{it}(\theta_{0})-\lambda_{it}^{(h)}(\theta_{0})\right|

and

λit(θ0)β=k=2(k1)β0k2ui,tk(θ0),\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta_{0}^{k-2}u_{i,t-k}(\theta_{0}),

where

ui,tk(θ0)=ω0+α0(1)yi,tk1{yi,tkr0}+α0(2)yi,tk1{yi,tk<r0}+ξ0j=1Nwijyj,tk.u_{i,t-k}(\theta_{0})=\omega_{0}+\alpha^{(1)}_{0}y_{i,t-k}1_{\{y_{i,t-k}\geq r_{0}\}}+\alpha^{(2)}_{0}y_{i,t-k}1_{\{y_{i,t-k}<r_{0}\}}+\xi_{0}\sum_{j=1}^{N}w_{ij}y_{j,t-k}.

Following arguments analogous to those in (A.10), we obtain that

|λit(θ0)βλit(h)(θ0)β|\displaystyle\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\beta}\right|\leq α0(h1)β0h2|yi,thyi,th(h)|\displaystyle\alpha^{*}_{0}(h-1)\beta_{0}^{h-2}|y_{i,t-h}-y_{i,t-h}^{(h)}| (A.29)
+ξ0(h1)β0h21|ij|h|yj,thyj,th(h)|\displaystyle+\xi_{0}(h-1)\beta_{0}^{h-2}\sum_{1\leq|i-j|\leq h}|y_{j,t-h}-y_{j,t-h}^{(h)}|
+Chbk=2h|yi±h,tkyi±h,tk(h)|.\displaystyle+Ch^{-b}\sum_{k=2}^{h}|y_{i\pm h,t-k}-y_{i\pm h,t-k}^{(h)}|.

Combining (A.10), (A.28) and (A.29), we can verify that lit(θ0)β\frac{\partial l_{it}(\theta_{0})}{\partial\beta} satisfies condition (2.7) in Pan and Pan, (2024) with B(i,t),NT(h)ChbB_{(i,t),NT}(h)\leq Ch^{-b} and l=1l=1. Partial derivatives of lit(θ0)l_{it}(\theta_{0}) with respect to the other parameters in θ0\theta_{0} follow similarly. Therefore 𝐯lit(θ0)θ\mathbf{v}^{\prime}\frac{\partial l_{it}(\theta_{0})}{\partial\theta} satisfies condition (2.7) in Pan and Pan, (2024) with B(i,t),NT(h)ChbB_{(i,t),NT}(h)\leq Ch^{-b} and l=1l=1 for each 𝐯5\mathbf{v}\in\mathbb{R}^{5}.

According to Proposition 2 and Example 2.1 in Pan and Pan, (2024), the array of random fields {𝐯lit(θ0)θ:(i,t)DNT,NT1}\{\mathbf{v}^{\prime}\frac{\partial l_{it}(\theta_{0})}{\partial\theta}:(i,t)\in D_{NT},NT\geq 1\} is η\eta-weakly dependent with coefficient η¯1(s)Cs2p22p1μy+2\bar{\eta}_{1}(s)\leq Cs^{-\frac{2p-2}{2p-1}\mu_{y}+2}. But 2p22p1μy2>42p1p1\frac{2p-2}{2p-1}\mu_{y}-2>4\vee\frac{2p-1}{p-1} because μy>6p3p1(4p3)(2p1)2(p1)2\mu_{y}>\frac{6p-3}{p-1}\vee\frac{(4p-3)(2p-1)}{2(p-1)^{2}}. So (b) is verified. ∎

Claim A.10.
  • (a)

    supNT1sup(i,t)DNT2lit(θ0)θθp<\sup_{NT\geq 1}\sup_{(i,t)\in D_{NT}}\left\lVert\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}\right\rVert_{p}<\infty for some p>1p>1;

  • (b)

For all θm,θn{ω,α(1),α(2),ξ,β}\theta_{m},\theta_{n}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta\}, {2lit(θ0)θmθn:(i,t)DNT,NT1}\left\{\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}:(i,t)\in D_{NT},NT\geq 1\right\} are η\eta-weakly dependent, with dependence coefficient η¯2(s)Csμ2\bar{\eta}_{2}(s)\leq Cs^{-\mu_{2}} where μ2>2\mu_{2}>2.

Proof.

Recall that, from (A.23),

2lit(θ0)θmθn=(yitλit(θ0)1)2λit(θ0)θmθnyitλit2(θ0)λit(θ0)θmλit(θ0)θn.\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}=\left(\frac{y_{it}}{\lambda_{it}(\theta_{0})}-1\right)\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}-\frac{y_{it}}{\lambda_{it}^{2}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{m}}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{n}}.

Then Claim A.10(a) follows directly from Assumption 3.2(a).

Following the same idea as in previous proofs, for each (i,t)DNT(i,t)\in D_{NT} and h=1,2,h=1,2,..., we define {yjτ(h):(j,τ)DNT,NT1}\{y_{j\tau}^{(h)}:(j,\tau)\in D_{NT},NT\geq 1\} such that yjτ(h)yjτy_{j\tau}^{(h)}\neq y_{j\tau} if and only if ρ((i,t),(j,τ))=h\rho((i,t),(j,\tau))=h. To prove (b), we verify that 2lit(θ0)θmθn\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}} satisfies condition (2.7) in Pan and Pan, (2024). Firstly we have

|2lit(θ0)θmθn2lit(h)(θ0)θmθn|\displaystyle\left|\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}l_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|
\displaystyle\leq |yitλit(θ0)+1||2λit(θ0)θmθn2λit(h)(θ0)θmθn|+yit|2λit(h)(θ0)θmθn||1λit(θ0)1λit(h)(θ0)|\displaystyle\left|\frac{y_{it}}{\lambda_{it}(\theta_{0})}+1\right|\left|\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|+y_{it}\left|\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|\left|\frac{1}{\lambda_{it}(\theta_{0})}-\frac{1}{\lambda_{it}^{(h)}(\theta_{0})}\right|
+yitλit2(θ0)|λit(θ0)θmλit(θ0)θnλit(h)(θ0)θmλit(h)(θ0)θn|\displaystyle+\frac{y_{it}}{\lambda_{it}^{2}(\theta_{0})}\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{m}}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{n}}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}}\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|
+|λit(h)(θ0)θmλit(h)(θ0)θn||yitλit2(θ0)yit(λit(h)(θ0))2|\displaystyle+\left|\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}}\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|\left|\frac{y_{it}}{\lambda_{it}^{2}(\theta_{0})}-\frac{y_{it}}{(\lambda_{it}^{(h)}(\theta_{0}))^{2}}\right| (A.30)
\displaystyle\leq |yitλit(θ0)+1||2λit(θ0)θmθn2λit(h)(θ0)θmθn|\displaystyle\left|\frac{y_{it}}{\lambda_{it}(\theta_{0})}+1\right|\left|\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|
+yitλit(θ0)λit(h)(θ0)|2λit(h)(θ0)θmθn||λit(θ0)λit(h)(θ0)|\displaystyle+\frac{y_{it}}{\lambda_{it}(\theta_{0})\lambda_{it}^{(h)}(\theta_{0})}\left|\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|\left|\lambda_{it}(\theta_{0})-\lambda_{it}^{(h)}(\theta_{0})\right|
+yitλit2(θ0)|λit(θ0)θm||λit(θ0)θnλit(h)(θ0)θn|\displaystyle+\frac{y_{it}}{\lambda_{it}^{2}(\theta_{0})}\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{m}}\right|\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{n}}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|
+yitλit2(θ0)|λit(h)(θ0)θn||λit(θ0)θmλit(h)(θ0)θm|\displaystyle+\frac{y_{it}}{\lambda_{it}^{2}(\theta_{0})}\left|\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{m}}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}}\right|
+yitλit(θ0)λit(h)(θ0)|λit(h)(θ0)θmλit(h)(θ0)θn||1λit(θ0)+1λit(h)(θ0)||λit(θ0)λit(h)(θ0)|\displaystyle+\frac{y_{it}}{\lambda_{it}(\theta_{0})\lambda_{it}^{(h)}(\theta_{0})}\left|\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}}\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|\left|\frac{1}{\lambda_{it}(\theta_{0})}+\frac{1}{\lambda_{it}^{(h)}(\theta_{0})}\right|\left|\lambda_{it}(\theta_{0})-\lambda_{it}^{(h)}(\theta_{0})\right|
\displaystyle\leq (yitω0+1)|2λit(θ0)θmθn2λit(h)(θ0)θmθn|+C1yitω02|λit(θ0)λit(h)(θ0)|\displaystyle\left(\frac{y_{it}}{\omega_{0}}+1\right)\left|\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|+C_{1}\frac{y_{it}}{\omega_{0}^{2}}\left|\lambda_{it}(\theta_{0})-\lambda_{it}^{(h)}(\theta_{0})\right|
+C2yitω02|λit(θ0)θnλit(h)(θ0)θn|+C3yitω02|λit(θ0)θmλit(h)(θ0)θm|\displaystyle+C_{2}\frac{y_{it}}{\omega_{0}^{2}}\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{n}}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|+C_{3}\frac{y_{it}}{\omega_{0}^{2}}\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{m}}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}}\right|
+C4yitω03|λit(θ0)λit(h)(θ0)|.\displaystyle+C_{4}\frac{y_{it}}{\omega_{0}^{3}}\left|\lambda_{it}(\theta_{0})-\lambda_{it}^{(h)}(\theta_{0})\right|.

Taking the second-order derivative with respect to ξ\xi and β\beta as an example, by arguments analogous to (A.10) and (A.29) we get

|2λit(θ0)ξβ2λit(h)(θ0)ξβ|\displaystyle\left|\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\xi\partial\beta}-\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\xi\partial\beta}\right|
\displaystyle\leq k=2(k1)β0k2|j=1Nwijyj,tkj=1Nwijyj,tk(h)|\sum_{k=2}^{\infty}(k-1)\beta_{0}^{k-2}\left|\sum_{j=1}^{N}w_{ij}y_{j,t-k}-\sum_{j=1}^{N}w_{ij}y_{j,t-k}^{(h)}\right| (A.31)
\displaystyle\leq (h1)β0h2|ij|h|yj,thyj,th(h)|\displaystyle(h-1)\beta_{0}^{h-2}\sum_{|i-j|\leq h}|y_{j,t-h}-y_{j,t-h}^{(h)}|
+Chbk=2h|yi±h,tkyi±h,tk(h)|.\displaystyle+Ch^{-b}\sum_{k=2}^{h}|y_{i\pm h,t-k}-y_{i\pm h,t-k}^{(h)}|.

Proofs regarding second-order derivatives with respect to the other parameters follow similar arguments and are therefore omitted here. Substituting (A.10), (A.29) and (A.31) into (A.30), we have that 2lit(θ0)θmθn\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}} satisfies condition (2.7) in Pan and Pan, (2024) with B(i,t),NT(h)ChbB_{(i,t),NT}(h)\leq Ch^{-b} and l=1l=1.

According to Proposition 2 and Example 2.1 in Pan and Pan, (2024), the array of random fields {2lit(θ0)θmθn:(i,t)DNT,NT1}\{\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}:(i,t)\in D_{NT},NT\geq 1\} is η\eta-weakly dependent with coefficient η¯2(s)Cs2p22p1μy+2\bar{\eta}_{2}(s)\leq Cs^{-\frac{2p-2}{2p-1}\mu_{y}+2}, and 2p22p1μy2>2\frac{2p-2}{2p-1}\mu_{y}-2>2. So (b) is verified. ∎

By the Taylor expansion, for some θ\theta^{*} between θ^NT\hat{\theta}_{NT} and θ0\theta_{0} we have

L~NT(θ^NT)θ=L~NT(θ0)θ+2L~NT(θ)θθ(θ^NTθ0).\frac{\partial\tilde{L}_{NT}(\hat{\theta}_{NT})}{\partial\theta}=\frac{\partial\tilde{L}_{NT}(\theta_{0})}{\partial\theta}+\frac{\partial^{2}\tilde{L}_{NT}(\theta^{*})}{\partial\theta\partial\theta^{\prime}}(\hat{\theta}_{NT}-\theta_{0}).

Since L~NT(θ^NT)θ=0\frac{\partial\tilde{L}_{NT}(\hat{\theta}_{NT})}{\partial\theta}=0, we have

NTΣNT1/2(θ^NTθ0)\displaystyle\sqrt{NT}\Sigma_{NT}^{1/2}(\hat{\theta}_{NT}-\theta_{0})
=\displaystyle= ΣNT1/2(2L~NT(θ)θθ)1NTL~NT(θ0)θ\displaystyle-\Sigma_{NT}^{1/2}\left(\frac{\partial^{2}\tilde{L}_{NT}(\theta^{*})}{\partial\theta\partial\theta^{\prime}}\right)^{-1}\sqrt{NT}\frac{\partial\tilde{L}_{NT}(\theta_{0})}{\partial\theta} (A.32)
=\displaystyle= ΣNT1/2(ΣNT1/22LNT(θ0)θθ)1ΣNT1/2NTLNT(θ0)θ+op(1)\displaystyle-\Sigma_{NT}^{1/2}\left(\Sigma_{NT}^{-1/2}\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}\right)^{-1}\Sigma_{NT}^{-1/2}\sqrt{NT}\frac{\partial L_{NT}(\theta_{0})}{\partial\theta}+o_{p}(1)

according to Claims A.7 and A.8.

Note that yit=Nit(λit(θ0))y_{it}=N_{it}(\lambda_{it}(\theta_{0})) is Poisson distributed with mean λit(θ0)\lambda_{it}(\theta_{0}) conditional on the historical information t1\mathcal{H}_{t-1}, where {Nit:(i,t)DNT,NT1}\{N_{it}:(i,t)\in D_{NT},NT\geq 1\} are IID Poisson point processes with unit intensity. Therefore we have

𝔼(2LNT(θ0)θθ)\displaystyle\mathbb{E}\left(\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}\right)
=\displaystyle= 1NTi=1Nt=1T𝔼{𝔼[(Nit(λit(θ0))λit(θ0)1)2λit(θ0)θθ|t1]}\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\mathbb{E}\left\{\mathbb{E}\left[\left(\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}(\theta_{0})}-1\right)\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}|\mathcal{H}_{t-1}\right]\right\}
1NTi=1Nt=1T𝔼{𝔼[Nit(λit(θ0))λit2(θ0)λit(θ0)θλit(θ0)θ|t1]}\displaystyle-\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\mathbb{E}\left\{\mathbb{E}\left[\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}^{2}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}|\mathcal{H}_{t-1}\right]\right\}
=\displaystyle= 1NTi=1Nt=1T𝔼[1λit(θ0)λit(θ0)θλit(θ0)θ]\displaystyle-\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\mathbb{E}\left[\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]
=\displaystyle= ΣNT.\displaystyle-\Sigma_{NT}.
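The computation above rests on two conditional moment identities of the Poisson distribution: 𝔼[N(λ)/λ−1 | λ]=0, which removes the term involving the second derivative of λit, and 𝔼[N(λ)/λ² | λ]=1/λ, which produces −ΣNT; the same facts give the variance computation for the score below. A small Monte Carlo sketch with an arbitrary intensity:

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 3.7                                # arbitrary intensity lambda_it(theta_0)
y = rng.poisson(lam, size=10**7)

print(np.mean(y / lam - 1))              # ~0: the second-derivative term vanishes
print(np.mean(y / lam**2), 1 / lam)      # ~1/lam: yields the -Sigma_NT term
print(np.var(y / lam - 1), 1 / lam)      # ~1/lam: E[(y/lam - 1)^2] = 1/lam
```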

By Claim A.10, we apply Theorem 1 in Pan and Pan, (2024) and obtain that

2LNT(θ0)θθ+ΣNT𝑝0.\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}+\Sigma_{NT}\overset{p}{\to}0. (A.33)

According to condition (3.5) we can further prove that

(ΣNT1/22LNT(θ0)θθ)ΣNT1/2=(ΣNT1/2+op(1))ΣNT1/2=I5+op(1).-\left(\Sigma_{NT}^{-1/2}\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}\right)\Sigma_{NT}^{-1/2}=\left(\Sigma_{NT}^{1/2}+o_{p}(1)\right)\Sigma_{NT}^{-1/2}=I_{5}+o_{p}(1). (A.34)

When τt\tau\neq t or jij\neq i, we have

𝔼[(Nit(λit(θ0))λit(θ0)1)(Njτ(λjτ(θ0))λjτ(θ0)1)λit(θ0)θλjτ(θ0)θ|t1]=0\mathbb{E}\left[\left(\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}(\theta_{0})}-1\right)\left(\frac{N_{j\tau}(\lambda_{j\tau}(\theta_{0}))}{\lambda_{j\tau}(\theta_{0})}-1\right)\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{j\tau}(\theta_{0})}{\partial\theta^{\prime}}|\mathcal{H}_{t-1}\right]=0

assuming without loss of generality that τ<t\tau<t; the case τ=t\tau=t with jij\neq i follows from the conditional independence of NitN_{it} and NjtN_{jt} given t1\mathcal{H}_{t-1}. Then we can verify that

Var(NTLNT(θ0)θ)\displaystyle\operatorname*{Var}\left(\sqrt{NT}\frac{\partial L_{NT}(\theta_{0})}{\partial\theta}\right)
=\displaystyle= 1NT𝔼{[i=1Nt=1T(Nit(λit(θ0))λit(θ0)1)λit(θ0)θ]\displaystyle\frac{1}{NT}\mathbb{E}\left\{\left[\sum_{i=1}^{N}\sum_{t=1}^{T}\left(\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}(\theta_{0})}-1\right)\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\right]\right.
×[i=1Nt=1T(Nit(λit(θ0))λit(θ0)1)λit(θ0)θ]}\displaystyle\times\left.\left[\sum_{i=1}^{N}\sum_{t=1}^{T}\left(\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}(\theta_{0})}-1\right)\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]\right\}
=\displaystyle= 1NTi=1Nt=1T𝔼[(Nit(λit(θ0))λit(θ0)1)2λit(θ0)θλit(θ0)θ]\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\mathbb{E}\left[\left(\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}(\theta_{0})}-1\right)^{2}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]
=\displaystyle= ΣNT.\displaystyle\Sigma_{NT}.

For each 𝐯5\mathbf{v}\in\mathbb{R}^{5}, Var((i,t)DNT𝐯lit(θ0)θ)=(NT)𝐯ΣNT𝐯.\operatorname*{Var}\left(\sum_{(i,t)\in D_{NT}}\mathbf{v}^{\prime}\frac{\partial l_{it}(\theta_{0})}{\partial\theta}\right)=(NT)\mathbf{v}^{\prime}\Sigma_{NT}\mathbf{v}. By (3.5) and the symmetry of ΣNT\Sigma_{NT}, it follows that

infNT1𝐯ΣNT𝐯>0.\inf_{NT\geq 1}\mathbf{v}^{\prime}\Sigma_{NT}\mathbf{v}>0.

Then, by Claim A.9 and Theorem 2 in Pan and Pan, (2024), we can prove that

[(NT)𝐯ΣNT𝐯]1/2𝐯(NT)LNT(θ0)θ𝑑N(0,1).[(NT)\mathbf{v}^{\prime}\Sigma_{NT}\mathbf{v}]^{-1/2}\mathbf{v}^{\prime}(NT)\frac{\partial L_{NT}(\theta_{0})}{\partial\theta}\overset{d}{\to}N(0,1).

According to the Cramér-Wold theorem, we have

(ΣNT)1/2NTLNT(θ0)θ𝑑N(0,I5).(\Sigma_{NT})^{-1/2}\sqrt{NT}\frac{\partial L_{NT}(\theta_{0})}{\partial\theta}\overset{d}{\to}N(0,I_{5}). (A.35)

Combining (A.32), (A.34) and (A.35), we complete the proof of Theorem 3.

A.4 Proof of Theorem 4

Recall from (4.2) that the Wald statistic is

WNT:=(Γθ^NTη){ΓNTΣ^NT1Γ}1(Γθ^NTη),W_{NT}:=(\Gamma\hat{\theta}_{NT}-\eta)^{\prime}\left\{\frac{\Gamma}{NT}\widehat{\Sigma}_{NT}^{-1}\Gamma^{\prime}\right\}^{-1}(\Gamma\hat{\theta}_{NT}-\eta),

where

Σ^NT:=1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ].\widehat{\Sigma}_{NT}:=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}\right].
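Given θ^NT and the plug-in matrix Σ^NT above, evaluating WNT is a short linear-algebra computation. The Python sketch below is purely illustrative: Γ, η and all numerical inputs are hypothetical, with Γ chosen to test the no-asymmetry hypothesis α(1)=α(2):

```python
import numpy as np

def wald_statistic(theta_hat, Sigma_hat, Gamma, eta, N, T):
    """W_NT = (G th - eta)' { G Sigma_hat^{-1} G' / (NT) }^{-1} (G th - eta)."""
    d = Gamma @ theta_hat - eta
    M = Gamma @ np.linalg.inv(Sigma_hat) @ Gamma.T / (N * T)
    return float(d @ np.linalg.solve(M, d))

# hypothetical test of alpha^(1) = alpha^(2); theta = (omega, a1, a2, xi, beta)
Gamma = np.array([[0.0, 1.0, -1.0, 0.0, 0.0]])
eta = np.array([0.0])
theta_hat = np.array([0.5, 0.32, 0.19, 0.1, 0.4])   # hypothetical estimate
Sigma_hat = np.diag([2.0, 1.5, 1.8, 3.0, 2.2])      # hypothetical Sigma_hat_NT

# under H0, W_NT is asymptotically chi-square with rank(Gamma) = 1 d.o.f.
print(wald_statistic(theta_hat, Sigma_hat, Gamma, eta, N=25, T=400))
```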

It is sufficient to show that

1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ]ΣNT𝑝0.\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}\right]-\Sigma_{NT}\overset{p}{\to}0. (A.36)

Note that

1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ]ΣNT\displaystyle\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}\right]-\Sigma_{NT}
=\displaystyle= 1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ𝔼(1λit(θ0)λit(θ0)θλit(θ0)θ)]\displaystyle\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}-\mathbb{E}\left(\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right)\right]
=\displaystyle= 1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ1λit(θ0)λit(θ0)θλit(θ0)θ]\displaystyle\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}-\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]
+1NT(i,t)DNT[1λit(θ0)λit(θ0)θλit(θ0)θ𝔼(1λit(θ0)λit(θ0)θλit(θ0)θ)]\displaystyle+\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}-\mathbb{E}\left(\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right)\right]
:=\displaystyle:= T1+T2.\displaystyle T_{1}+T_{2}.

Similarly to the proof of Claim A.10, we can verify that the law of large numbers (Theorem 1 in Pan and Pan, (2024)) applies to {1λit(θ0)λit(θ0)θλit(θ0)θ:(i,t)DNT,NT1}\left\{\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}:(i,t)\in D_{NT},NT\geq 1\right\}, and therefore T2𝑝0T_{2}\overset{p}{\to}0. T1T_{1} can be further decomposed as follows:

1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ1λit(θ0)λit(θ0)θλit(θ0)θ]\displaystyle\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}-\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]
=\displaystyle= 1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ1λit(θ^NT)λit(θ^NT)θλit(θ^NT)θ]\displaystyle\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}-\frac{1}{\lambda_{it}(\hat{\theta}_{NT})}\frac{\partial\lambda_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\lambda_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}\right]
+1NT(i,t)DNT[1λit(θ^NT)λit(θ^NT)θλit(θ^NT)θ1λit(θ0)λit(θ0)θλit(θ0)θ]\displaystyle+\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\lambda_{it}(\hat{\theta}_{NT})}\frac{\partial\lambda_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\lambda_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}-\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]
:=\displaystyle:= S1+S2.\displaystyle S_{1}+S_{2}.

But S2𝑝0S_{2}\overset{p}{\to}0 since θ^NT𝑝θ0\hat{\theta}_{NT}\overset{p}{\to}\theta_{0}, and the proof of S1𝑝0S_{1}\overset{p}{\to}0 is similar to that of (A.26) and is therefore omitted here. This completes the proof of Theorem 4. ∎

References

  • Al-Osh and Alzaid, (1987) Al-Osh, M. A. and Alzaid, A. A. (1987). First-order integer-valued autoregressive (INAR(1)) process. Journal of Time Series Analysis, 8(3):261–275.
  • Al-Osh and Alzaid, (1988) Al-Osh, M. A. and Alzaid, A. A. (1988). Integer-valued moving average (INMA) process. Statistical Papers, 29:281–300.
  • Armillotta and Fokianos, (2024) Armillotta, M. and Fokianos, K. (2024). Count network autoregression. Journal of Time Series Analysis, 45(4):584–612.
  • Berkes et al., (2003) Berkes, I., Horváth, L., and Kokoszka, P. (2003). GARCH processes: structure and estimation. Bernoulli, 9(2):201–227.
  • Clauset et al., (2009) Clauset, A., Shalizi, C. R., and Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4):661–703.
  • Cui et al., (2020) Cui, Y., Li, Q., and Zhu, F. (2020). Flexible bivariate Poisson integer-valued GARCH model. Annals of the Institute of Statistical Mathematics, 72(6):1449–1477.
  • Cui and Zhu, (2018) Cui, Y. and Zhu, F. (2018). A new bivariate integer-valued GARCH model allowing for negative cross-correlation. Test, 27:428–452.
  • De Wit et al., (2013) De Wit, E. R., Englund, P., and Francke, M. K. (2013). Price and transaction volume in the Dutch housing market. Regional Science and Urban Economics, 43(2):220–241.
  • Ferland et al., (2006) Ferland, R., Latour, A., and Oraichi, D. (2006). Integer-valued GARCH process. Journal of Time Series Analysis, 27(6):923–942.
  • Fokianos et al., (2009) Fokianos, K., Rahbek, A., and Tjøstheim, D. (2009). Poisson autoregression. Journal of the American Statistical Association, 104(488):1430–1439.
  • Fokianos et al., (2020) Fokianos, K., Støve, B., Tjøstheim, D., and Doukhan, P. (2020). Multivariate count autoregression. Bernoulli, 26(1):471–499.
  • Heinen, (2003) Heinen, A. (2003). Modelling time series count data: an autoregressive conditional Poisson model. Available at SSRN 1117187.
  • Jiang et al., (2022) Jiang, F., Zhao, Z., and Shao, X. (2022). Modelling the Covid-19 infection trajectory: a piecewise linear quantile trend model. Journal of the Royal Statistical Society: Series B, 84(5):1589–1607.
  • Jones et al., (1994) Jones, C. M., Kaul, G., and Lipson, M. L. (1994). Transactions, volume, and volatility. The Review of Financial Studies, 7(4):631–651.
  • Lee et al., (2023) Lee, S., Kim, D., and Kim, B. (2023). Modeling and inference for multivariate time series of counts based on the INGARCH scheme. Computational Statistics & Data Analysis, 177:107579.
  • Lee et al., (2018) Lee, Y., Lee, S., and Tjøstheim, D. (2018). Asymptotic normality and parameter change test for bivariate Poisson INGARCH models. Test, 27:52–69.
  • McKenzie, (1985) McKenzie, E. (1985). Some simple models for discrete variate time series. Journal of the American Water Resources Association, 21(4):645–650.
  • McKenzie, (1988) McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. Advances in Applied Probability, 20(4):822–835.
  • Nowicki and Snijders, (2001) Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic block structures. Journal of the American Statistical Association, 96(455):1077–1087.
  • Pan and Pan, (2024) Pan, Y. and Pan, J. (2024). Limit theorems for weakly dependent non-stationary random field arrays and asymptotic inference of dynamic spatio-temporal models. Available at arXiv:2408.07429.
  • Pham, (2020) Pham, H. (2020). On estimating the number of deaths related to Covid-19. Mathematics, 8(5):655.
  • Tao et al., (2024) Tao, Y., Li, D., and Niu, X. (2024). Grouped network Poisson autoregressive model. Statistica Sinica, 34:1603–1624.
  • Wang et al., (2014) Wang, C., Liu, H., Yao, J.-F., Davis, R. A., and Li, W. K. (2014). Self-excited threshold Poisson autoregression. Journal of the American Statistical Association, 109(506):777–787.
  • Xu et al., (2012) Xu, H.-Y., Xie, M., Goh, T. N., and Fu, X. (2012). A model for integer-valued time series with conditional overdispersion. Computational Statistics & Data Analysis, 56(12):4229–4242.
  • Zhou et al., (2020) Zhou, J., Li, D., Pan, R., and Wang, H. (2020). Network GARCH model. Statistica Sinica, 30:1–18.
  • Zhu, (2010) Zhu, F. (2010). A negative binomial integer-valued GARCH model. Journal of Time Series Analysis, 32(1):54–67.
  • Zhu, (2012) Zhu, F. (2012). Modeling overdispersed or underdispersed count data with generalized Poisson integer-valued GARCH models. Journal of Mathematical Analysis and Applications, 389(1):58–71.