
Modelling Volatility of Spatio-temporal Integer-valued Data
with Network Structure and Asymmetry

Yue Pan  and  Jiazhu Pan
Department of Mathematics and Statistics, University of Strathclyde, Glasgow G1 1XH, UK
Abstract

This paper proposes a spatial threshold GARCH-type model for dynamic spatio-temporal integer-valued data with network structure. The proposed model simplifies the parameterization by using the network structure in the data, and captures the asymmetric property of dynamic volatility by adopting a threshold structure. The conditional distribution is assumed to be Poisson. Asymptotic theory of maximum likelihood estimation (MLE) for the spatial model is derived when both the sample size and the network dimension are large. We obtain asymptotic statistical inference by investigating the weak dependence of components of the model and applying limit theorems for weakly dependent random fields. Simulation studies and a real data example are presented to support our methodology.

Keywords: Conditional variance/mean, high-dimensional count time series, asymmetry of volatility, spatial threshold GARCH, network structure.

MSC 2020 Subject Classification: Primary 62M10, 91B05; Secondary 60G60, 60F05.

1 Introduction

Integer-valued time series can be observed in a wide range of scientific fields, such as yearly trading volumes of houses on the real estate market (De Wit et al., 2013), numbers of stock transactions (Jones et al., 1994), daily mortality from Covid-19 (Pham, 2020) and daily new cases of Covid-19 (Jiang et al., 2022). A first idea for modelling integer-valued time series is a simple first-order autoregressive (AR) model

X_{t}=\alpha X_{t-1}+\varepsilon_{t}, \qquad (1.1)

where $0\leq\alpha<1$ is a parameter. However, in (1.1) $X_{t}$ is not necessarily an integer given integer-valued $X_{t-1}$ and $\varepsilon_{t}$, due to the multiplication structure $\alpha X_{t-1}$. To circumvent this problem, McKenzie (1985) and Al-Osh and Alzaid (1987) proposed an integer-valued counterpart of the AR model (INAR). They replaced the ordinary multiplication $\alpha X_{t-1}$ by the binomial thinning operation $\alpha\circ X_{t-1}$, where $\alpha\circ X|X\sim \mathrm{Bin}(X,\alpha)$. This was ground-breaking and led to various extensions of thinning-based linear models, including the integer-valued moving average model (INMA) (Al-Osh and Alzaid, 1988) and the INARMA model (McKenzie, 1988). An alternative approach to the multiplication problem is to consider the regression of the conditional mean $\lambda_{t}:=\mathbb{E}(X_{t}|\mathcal{H}_{t-1})$, where $\mathcal{H}_{t-1}$ is the $\sigma$-algebra generated by historical information up to $t-1$. Based on this idea, integer-valued GARCH-type models (INGARCH) were proposed by Heinen (2003), Ferland et al. (2006) and Fokianos et al. (2009) with a conditional Poisson distribution of $X_{t}$, i.e.

X_{t}|\mathcal{H}_{t-1}\sim \mathrm{Poisson}(\lambda_{t}), \qquad (1.2)
\lambda_{t}=\omega+\sum_{i=1}^{p}\alpha_{i}X_{t-i}+\sum_{j=1}^{q}\beta_{j}\lambda_{t-j},

where the parameters satisfy $\omega>0$, $\alpha_{i}\geq 0$, $i=1,\cdots,p$, $\beta_{j}\geq 0$, $j=1,\cdots,q$. Other variations of INGARCH models with different specifications of the conditional distribution include the negative binomial INGARCH (Zhu, 2010; Xu et al., 2012) and the generalized Poisson INGARCH (Zhu, 2012).
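As a concrete illustration of the recursion (1.2), here is a minimal simulation sketch of the INGARCH(1,1) case (our own illustration in Python, not code from the cited papers; the function name and the choice of starting value are ours):

```python
import numpy as np

def simulate_ingarch11(T, omega, alpha, beta, seed=0):
    """Simulate a Poisson INGARCH(1,1) path: lambda_t = omega + alpha*X_{t-1} + beta*lambda_{t-1}."""
    rng = np.random.default_rng(seed)
    X = np.zeros(T, dtype=int)
    lam = np.zeros(T)
    lam[0] = omega / (1.0 - alpha - beta)   # start from the stationary mean (requires alpha+beta < 1)
    X[0] = rng.poisson(lam[0])
    for t in range(1, T):
        lam[t] = omega + alpha * X[t - 1] + beta * lam[t - 1]
        X[t] = rng.poisson(lam[t])
    return X, lam

X, lam = simulate_ingarch11(T=500, omega=0.5, alpha=0.3, beta=0.4)
```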

The integer-valued models mentioned above are all limited to one-dimensional time series. The development of multivariate integer-valued GARCH-type models is still at an early stage. For example, the bivariate INGARCH models (Lee et al., 2018; Cui and Zhu, 2018; Cui et al., 2020), the multivariate INGARCH models (Fokianos et al., 2020; Lee et al., 2023) and some counterparts of the continuous-valued network GARCH model (Zhou et al., 2020) all concern count time series of fixed, low dimension. For high-dimensional integer-valued time series with increasing dimension, there is the Poisson network autoregressive model (PNAR) of Armillotta and Fokianos (2024). The PNAR allows for integer-valued time series with increasing network dimension; however, it adopts an ARCH-type structure without an autoregressive term on the conditional mean/variance to describe persistence in volatility, and it cannot capture asymmetric characteristics of volatility. Tao et al. (2024) proposed a grouped PNAR model which has a GARCH structure, but its network dimension is fixed, so it is not suitable for spatio-temporal integer-valued data. In this paper, we propose a Poisson threshold network GARCH model (PTNGARCH) and discuss its asymptotic inference. The specific contributions are as follows.

  1.

    A threshold structure is designed to capture the asymmetric property of high-dimensional volatility for integer-valued data. The threshold effect can also be tested under this framework.

  2.

    Our PTNGARCH includes an autoregressive term on the conditional mean/variance, so it provides a parsimonious description of the persistence of dynamic volatility in high-dimensional integer-valued time series.

  3.

    Asymptotic theory of maximum likelihood estimation for the proposed model, when both the sample size ($T$) and the network dimension ($N$) are large, is established via the limit theorems for arrays of weakly dependent random fields in Pan and Pan (2024). This makes the proposed model suitable for fitting dynamic spatio-temporal integer-valued data (the so-called ultra-high dimensional integer-valued time series).

The remainder of this paper is arranged as follows. The PTNGARCH model is introduced and its stationarity over time is discussed under fixed network dimension in Section 2. Section 3 presents the MLE for the parameters, including the threshold, and the consistency and asymptotic normality of the coefficient estimates under large sample size ($T$) and large network dimension ($N$). Section 4 gives a Wald test which can be applied to testing for the existence of a threshold effect (i.e. the asymmetric property), a GARCH effect and a network effect. Section 5 conducts simulation studies to verify the asymptotic properties of the MLE, and applies the proposed model to daily numbers of car accidents in 41 neighbourhoods of New York City, with interpretation of the results. All proofs of theoretical results are postponed to the appendix.

2 Model setting and stationarity over time

Consider an undirected and unweighted network with $N$ nodes. Define the adjacency matrix $A=(a_{ij})_{1\leq i,j\leq N}$, where $a_{ij}=1$ if there is a connection between nodes $i$ and $j$, and $a_{ij}=0$ otherwise. In addition, self-connection is not allowed for any node $i$, so $a_{ii}=0$. By this interpretation of the network structure, $A$ is symmetric since $a_{ij}=a_{ji}$; hence for any node $i$ the out-degree $d_{i}^{(out)}=\sum_{j=1}^{N}a_{ij}$ equals the in-degree $d_{i}^{(in)}=\sum_{j=1}^{N}a_{ji}$, and we use $d_{i}$ to denote both for convenience. To embed a network into statistical models, it is often convenient to use the row-normalized adjacency matrix $W$ with $(i,j)$ element $w_{ij}=a_{ij}/d_{i}$.
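A minimal sketch of this construction (ours, assuming a 0/1 adjacency matrix with zero diagonal; leaving rows of degree zero as zeros is our convention for isolated nodes):

```python
import numpy as np

def row_normalize(A):
    """Row-normalize an adjacency matrix: w_ij = a_ij / d_i (rows with d_i = 0 stay zero)."""
    d = A.sum(axis=1)                       # degrees d_i
    W = np.zeros_like(A, dtype=float)
    nz = d > 0
    W[nz] = A[nz] / d[nz, None]
    return W

# toy 4-node path network: 1-2, 2-3, 3-4 connected
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
W = row_normalize(A)                        # each nonzero row of W sums to 1
```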

For any node $i\in\{1,\cdots,N\}$ in this network, let $y_{it}$ be a non-negative integer-valued observation at time $t$, and let $\mathcal{H}_{t-1}$ denote the $\sigma$-algebra consisting of all available information up to $t-1$. In our Poisson threshold network GARCH model, for each $i=1,2,\ldots,N$ and $t\in\mathbb{Z}$, $y_{it}$ is assumed to follow a conditional (on $\mathcal{H}_{t-1}$) Poisson distribution with $(i,t)$-varying variance (mean) $\lambda_{it}$. A PTNGARCH(1,1) model has the following form:

y_{it}|\mathcal{H}_{t-1}\sim \mathrm{Poisson}(\lambda_{it}), \qquad (2.1)
\lambda_{it}=\omega+\left(\alpha^{(1)}1_{\{y_{i,t-1}\geq r\}}+\alpha^{(2)}1_{\{y_{i,t-1}<r\}}\right)y_{i,t-1}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-1}+\beta\lambda_{i,t-1},
i=1,2,\cdots,N.

The threshold parameter $r$ is a positive integer, and $1_{\{\cdot\}}$ denotes an indicator function. To ensure the positivity of the conditional variance, we assume positivity of the base parameter $\omega$ and non-negativity of all the coefficients $\alpha^{(1)}$, $\alpha^{(2)}$, $\xi$, $\beta$.

Remark.

Notice that in (2.1) we model the dynamics of the conditional mean $\lambda_{it}$, which is why the name “Poisson autoregression” is sometimes used in the literature (Fokianos et al., 2009; Wang et al., 2014). Some authors still use the name “GARCH”, since the mean equals the variance under the Poisson distribution and the dynamics of the conditional mean are GARCH-like.

Let $\{N_{it}:i=1,2,\ldots,N,t\in\mathbb{Z}\}$ be independent Poisson processes with unit intensities. Given $\lambda_{it}$, $y_{it}$ can be interpreted as a Poisson distributed random variable $N_{it}(\lambda_{it})$, the number of occurrences during the time interval $(0,\lambda_{it}]$, i.e. $\mathbb{P}(y_{it}=n|\lambda_{it}=\lambda)=\frac{\lambda^{n}}{n!}e^{-\lambda}$. We can rewrite (2.1) in a vectorized form as follows:

\mathbb{Y}_{t}=(N_{1t}(\lambda_{1t}),N_{2t}(\lambda_{2t}),\ldots,N_{Nt}(\lambda_{Nt}))^{\prime},\qquad
\Lambda_{t}=\omega\mathbf{1}_{N}+A(\mathbb{Y}_{t-1})\mathbb{Y}_{t-1}+\beta\Lambda_{t-1}, \qquad (2.2)

where

\Lambda_{t}=(\lambda_{1t},\lambda_{2t},\ldots,\lambda_{Nt})^{\prime}\in\mathbb{R}^{N},
\mathbf{1}_{N}=(1,1,\ldots,1)^{\prime}\in\mathbb{R}^{N},
A(\mathbb{Y}_{t-1})=\alpha^{(1)}S(\mathbb{Y}_{t-1})+\alpha^{(2)}(I_{N}-S(\mathbb{Y}_{t-1}))+\xi W,
S(\mathbb{Y}_{t-1})=\operatorname{diag}\left\{1_{\{y_{1,t-1}\geq r\}},1_{\{y_{2,t-1}\geq r\}},\ldots,1_{\{y_{N,t-1}\geq r\}}\right\}.

Note that $\mathbb{Y}_{t}\in\mathbb{N}^{N}$ with dimension $N$, where $\mathbb{N}=\{0,1,2,\cdots\}$. For a fixed dimension $N$, a stationarity condition for the time series $\mathbb{Y}_{t}$ can be obtained. The proof of the following theorem is given in the appendix.
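To make the recursions (2.1)-(2.2) concrete, the following is a minimal simulation sketch (our own illustration; the function name, burn-in scheme and starting values are ours, not the authors'):

```python
import numpy as np

def simulate_ptngarch(W, T, omega, a1, a2, xi, beta, r, burn=200, seed=0):
    """Simulate model (2.1)/(2.2): a Poisson threshold network GARCH(1,1) panel."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    lam = np.full(N, omega / (1.0 - beta))        # arbitrary positive starting values
    y = rng.poisson(lam)
    Y = np.empty((T, N), dtype=int)
    for t in range(T + burn):
        alpha = np.where(y >= r, a1, a2)          # regime-dependent coefficient alpha^(1)/alpha^(2)
        lam = omega + alpha * y + xi * (W @ y) + beta * lam
        y = rng.poisson(lam)
        if t >= burn:                             # discard burn-in to reduce the starting-value effect
            Y[t - burn] = y
    return Y
```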

Theorem 1.

If the parameters satisfy

\max\left\{\alpha^{(1)},\alpha^{(2)},\left|\alpha^{(1)}r-\alpha^{(2)}(r-1)\right|\right\}+\xi+\beta<1, \qquad (2.3)

there exists a unique strictly stationary process $\{\mathbb{Y}_{t}:t\in\mathbb{Z}\}$ that satisfies (2.2) and has finite first-order moment. Therefore, each component $y_{it}$ of model (2.1) has a unique strictly stationary solution with finite first-order moment.
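Condition (2.3) is straightforward to check numerically; a one-line sketch (ours):

```python
def stationarity_ok(a1, a2, xi, beta, r):
    """Check condition (2.3): max{a1, a2, |a1*r - a2*(r-1)|} + xi + beta < 1."""
    return max(a1, a2, abs(a1 * r - a2 * (r - 1))) + xi + beta < 1.0

stationarity_ok(0.3, 0.2, 0.1, 0.2, r=2)   # max{0.3, 0.2, |0.6-0.2|} + 0.3 = 0.7 < 1 -> True
```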

3 Parameter estimation with $T\to\infty$ and $N\to\infty$

Denote the parameter vector of model (2.1) by $\nu=(\theta^{\prime},r)^{\prime}$ with $\theta=(\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta)^{\prime}$. Let $\Theta=\{\theta:\omega>0,\alpha^{(1)}\geq 0,\alpha^{(2)}\geq 0,\xi\geq 0,\beta\geq 0\}$ and $\mathbb{Z}_{+}=\{0,1,2,\cdots\}$. For a given threshold $r$, a reasonable parameter space for $\theta$ is a sufficiently large compact subset of $\Theta$, such as $\Theta_{\delta}=\{\theta:\omega\geq\delta,\alpha^{(1)}\geq 0,\alpha^{(2)}\geq 0,\xi\geq 0,\beta\geq 0\}$ with a sufficiently small positive real number $\delta$.

Let $D_{NT}=\{(i,t):i=1,\cdots,N;t=1,\cdots,T\}$ be the index set. Suppose that the samples $\{y_{it}:(i,t)\in D_{NT},NT\geq 1\}$ are generated by model (2.1) with true parameter $\nu_{0}=(\omega_{0},\alpha^{(1)}_{0},\alpha^{(2)}_{0},\xi_{0},\beta_{0},r_{0})^{\prime}$.

The log-likelihood function (ignoring constants) is

L_{NT}(\nu)=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}l_{it}(\nu), \qquad l_{it}(\nu)=y_{it}\log\lambda_{it}(\nu)-\lambda_{it}(\nu), \qquad (3.1)

where $\lambda_{it}(\nu)$ is generated from model (2.1) as

\lambda_{it}(\nu)=\omega+\alpha^{(1)}1_{\{y_{i,t-1}\geq r\}}y_{i,t-1}+\alpha^{(2)}1_{\{y_{i,t-1}<r\}}y_{i,t-1}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-1}+\beta\lambda_{i,t-1}(\nu). \qquad (3.2)

In practice, (3.1) cannot be evaluated without knowing the true values of $\lambda_{i0}$ for $i=1,2,\ldots,N$. We therefore approximate (3.1) by (3.3) below, using specified initial values $\lambda_{i0}=\tilde{\lambda}_{i0}$, $i=1,2,\ldots,N$:

\tilde{L}_{NT}(\nu)=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\tilde{l}_{it}(\nu), \qquad \tilde{l}_{it}(\nu)=y_{it}\log\tilde{\lambda}_{it}(\nu)-\tilde{\lambda}_{it}(\nu). \qquad (3.3)

Therefore, the maximum likelihood estimate (MLE) of the parameter $\nu$ is defined as

\hat{\nu}_{NT}=\operatorname*{argmax}_{\nu\in\Theta\times\mathbb{Z}_{+}}\tilde{L}_{NT}(\nu). \qquad (3.4)

However, the maximizer of the target function $\tilde{L}_{NT}(\nu)$ cannot be obtained directly by solving $\partial\tilde{L}_{NT}(\nu)/\partial\nu=0$: since $r\in\mathbb{Z}_{+}$ is discrete-valued, the partial derivative of $\tilde{L}_{NT}(\nu)$ with respect to $r$ is invalid. Following Wang et al. (2014), such an optimization problem with the integer-valued parameter $r$ can be broken up into two steps, as follows (a code sketch is given after the two steps):

  1.

    Find

    \hat{\theta}_{NT}^{(r)}=\operatorname*{argmax}_{\theta\in\Theta}\tilde{L}_{NT}(\theta,r)

    for each $r$ in a predetermined range $[\underline{r},\bar{r}]\subset\mathbb{Z}_{+}$;

  2.

    Find

    \hat{r}_{NT}=\operatorname*{argmax}_{r\in[\underline{r},\bar{r}]}\tilde{L}_{NT}(\hat{\theta}_{NT}^{(r)},r).

Then $\hat{\nu}_{NT}=\left(\hat{\theta}_{NT}^{(\hat{r}_{NT})\prime},\hat{r}_{NT}\right)^{\prime}$ is the maximizer of $\tilde{L}_{NT}(\nu)$.
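A minimal sketch of this two-step profile-likelihood search (our own illustration; the recursion implements (3.2) with initial values $\tilde{\lambda}_{i0}=0$, and the choice of optimizer, starting point and grid is ours):

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, Y, W, r):
    """-L~_NT(theta, r): Poisson quasi-log-likelihood (3.3) with initial values lambda_i0 = 0."""
    omega, a1, a2, xi, beta = theta
    T, N = Y.shape
    lam = np.zeros(N)                               # \tilde{lambda}_{i0} = 0
    ll = 0.0
    for t in range(1, T):                           # accumulate from t = 1 onwards
        yprev = Y[t - 1]
        alpha = np.where(yprev >= r, a1, a2)
        lam = omega + alpha * yprev + xi * (W @ yprev) + beta * lam
        ll += np.sum(Y[t] * np.log(lam) - lam)
    return -ll / (N * T)

def fit_ptngarch(Y, W, r_grid=range(1, 11)):
    """Step 1: maximize over theta for each r; Step 2: pick the r with the best profile likelihood."""
    theta0 = np.array([0.5, 0.3, 0.3, 0.1, 0.2])    # arbitrary starting point
    bounds = [(1e-4, None)] + [(0.0, None)] * 4     # omega > 0, other coefficients >= 0
    best = None
    for r in r_grid:
        res = minimize(neg_loglik, theta0, args=(Y, W, r), method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best[0]:
            best = (res.fun, res.x, r)
    return best[1], best[2]                         # (theta_hat, r_hat)
```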

Assumption 3.1 below is a regularity condition on the parameter space. Assumptions 3.2 and 3.3 are necessary for obtaining $\eta$-weak dependence of $\{l_{it}(\nu):(i,t)\in D_{NT},NT\geq 1\}$ and their derivatives. For the definition of $\eta$-weak dependence for random fields, see Pan and Pan (2024). The consistency of the MLE in Theorem 2 can then be proved using the law of large numbers (LLN) for $\eta$-weakly dependent arrays of random fields in Pan and Pan (2024).

Assumption 3.1.

For a given threshold $r\in\mathbb{Z}_{+}$,

  • (a)

    The parameter space for $\theta$ is a compact subset of $\Theta$ and includes the true parameter $\theta_{0}$ as an interior point;

  • (b)

    Condition (2.3) holds.

Assumption 3.2.
  • (a)

    $\sup_{NT\geq 1}\sup_{(i,t)\in D_{NT}}\mathbb{E}|y_{it}|^{2p}<\infty$ for some $p>1$;

  • (b)

    The array of random fields $\{y_{it}:(i,t)\in D_{NT},NT\geq 1\}$ is $\eta$-weakly dependent with coefficient $\bar{\eta}_{y}(s):=\mathcal{O}(s^{-\mu_{y}})$ for some $\mu_{y}>\frac{2(2p-1)}{p-1}$.

Assumption 3.3.

For any $i=1,2,\ldots,N$ and $j=1,2,\ldots,N$, there exist constants $C>0$ and $b>\mu_{y}$ such that $w_{ij}\leq C|j-i|^{-b}$. That is, the strength of the connection between two nodes $i$ and $j$ decays as the distance $|i-j|$ grows.

Theorem 2.

If Assumptions 3.1, 3.2 and 3.3 hold, the MLE defined by (3.4) is consistent, i.e.

\hat{\nu}_{NT}\overset{p}{\to}\nu_{0}

as $T\to\infty$ and $N\to\infty$.

Since r^NT\hat{r}_{NT} is an integer-valued consistent estimate of r0r_{0}, r^NT\hat{r}_{NT} will eventually be equal to r0r_{0} when the sample size NTNT becomes sufficiently large. Therefore, ν^NT=(θ^NT(r^NT),r^NT)\hat{\nu}_{NT}=\left(\hat{\theta}_{NT}^{(\hat{r}_{NT})^{\prime}},\hat{r}_{NT}\right)^{\prime} is asymptotically equal to (θ^NT(r0),r0)\left(\hat{\theta}_{NT}^{(r_{0})^{\prime}},r_{0}\right)^{\prime}. In this way, the problem of investigating the asymptotic distribution of ν^NT\hat{\nu}_{NT} degenerates to investigating the asymptotic distribution of θ^NT(r0)\hat{\theta}_{NT}^{(r_{0})}.

Theorem 3.

Assume that all conditions in Theorem 2 are satisfied, with $\mu_{y}>\frac{6p-3}{p-1}\vee\frac{(4p-3)(2p-1)}{2(p-1)^{2}}$ in Assumption 3.2(b) instead. If the smallest eigenvalue $\lambda_{\min}(\Sigma_{NT})$ of

\Sigma_{NT}:=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\mathbb{E}\left[\frac{1}{\lambda_{it}(\nu_{0})}\frac{\partial\lambda_{it}(\nu_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\nu_{0})}{\partial\theta^{\prime}}\right]

satisfies that

\inf_{NT\geq 1}\lambda_{\min}(\Sigma_{NT})>0, \qquad (3.5)

then $\hat{\theta}_{NT}^{(r_{0})}$ is asymptotically normal, i.e.

\sqrt{NT}\,\Sigma_{NT}^{1/2}(\hat{\theta}_{NT}^{(r_{0})}-\theta_{0})\overset{d}{\to}N(0,I_{5})

as $T\to\infty$, $N\to\infty$ and $N=o(T)$.

Remark.

In the proof of Theorem 4 below, we will show that $\Sigma_{NT}$ can be consistently estimated in practice by

\widehat{\Sigma}_{NT}=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\nu}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\nu}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\nu}_{NT})}{\partial\theta^{\prime}}\right] \qquad (3.6)

In the simulation studies in Section 5, the elements of $\widehat{\Sigma}_{NT}$ are used to construct confidence intervals and to evaluate coverage probabilities.
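A sketch of how (3.6) can be assembled in practice (ours; the gradient of $\tilde{\lambda}_{it}$ is accumulated by differentiating the recursion (3.2) with respect to $\theta$, and the function name is hypothetical):

```python
import numpy as np

def sigma_hat(theta, r, Y, W):
    """Estimate Sigma_NT by (3.6): average of grad(lambda) grad(lambda)' / lambda."""
    omega, a1, a2, xi, beta = theta
    T, N = Y.shape
    lam = np.zeros(N)                          # \tilde{lambda}_{i0} = 0
    grad = np.zeros((N, 5))                    # d lambda_it / d theta, per node
    S = np.zeros((5, 5))
    for t in range(1, T):
        yprev = Y[t - 1]
        up, down = yprev >= r, yprev < r
        # chain rule: grad_t = (1, 1{y>=r}y, 1{y<r}y, (Wy)_i, lambda_{t-1}) + beta * grad_{t-1}
        new_grad = np.column_stack([
            np.ones(N), up * yprev, down * yprev, W @ yprev, lam
        ]) + beta * grad
        lam = omega + np.where(up, a1, a2) * yprev + xi * (W @ yprev) + beta * lam
        grad = new_grad
        S += (grad / lam[:, None]).T @ grad    # sum over i of grad grad' / lambda
    return S / (N * T)

# standard errors: square roots of the diagonal of (NT)^{-1} * Sigma_hat^{-1}
# se = np.sqrt(np.diag(np.linalg.inv(S)) / (N * T))
```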

4 Hypothesis testing with $T\to\infty$ and $N\to\infty$

Based on Theorem 2 and Theorem 3, for a sample region large enough that $\hat{r}_{NT}=r_{0}$, we can design a Wald test for the null hypothesis

H_{0}:\Gamma\theta_{0}=\eta, \qquad (4.1)

where $\Gamma$ is an $s\times 5$ matrix with rank $s$ ($1\leq s\leq 5$) and $\eta$ is an $s$-dimensional vector. For example, to test the existence of a threshold effect, simply let $\Gamma=(0,1,-1,0,0)$ and $\eta=0$, and the null hypothesis (4.1) becomes

H_{0}:\alpha^{(1)}_{0}=\alpha^{(2)}_{0};

To see whether the autoregressive term is necessary (i.e. a GARCH effect), take $\Gamma=(0,0,0,0,1)$ and $\eta=0$, and the hypothesis becomes

H_{0}:\beta_{0}=0\quad \text{v.s.}\quad H_{1}:\beta_{0}>0;

To test the network effect, let $\Gamma=(0,0,0,1,0)$ and $\eta=0$, and the question becomes testing the hypothesis

H_{0}:\xi_{0}=0\quad \text{v.s.}\quad H_{1}:\xi_{0}>0.

Corresponding to the asymptotic normality of $\hat{\theta}_{NT}^{(r_{0})}$ in Theorem 3, we define a Wald test statistic as follows:

W_{NT}:=(\Gamma\hat{\theta}_{NT}^{(r_{0})}-\eta)^{\prime}\left\{\frac{1}{NT}\Gamma\widehat{\Sigma}_{NT}^{-1}\Gamma^{\prime}\right\}^{-1}(\Gamma\hat{\theta}_{NT}^{(r_{0})}-\eta), \qquad (4.2)

where $\widehat{\Sigma}_{NT}$ is defined in (3.6). Theorem 4 below shows that $W_{NT}$ has an asymptotic $\chi^{2}$-distribution with $s$ degrees of freedom.
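A sketch of computing (4.2) (ours; `theta_hat` and `Sigma_hat` are assumed to come from the estimation step, e.g. the hypothetical `sigma_hat` above):

```python
import numpy as np
from scipy.stats import chi2

def wald_test(theta_hat, Sigma_hat, NT, Gamma, eta):
    """Wald statistic (4.2) and its chi-square p-value with s = rank(Gamma) degrees of freedom."""
    Gamma = np.atleast_2d(Gamma)
    diff = Gamma @ theta_hat - eta
    V = Gamma @ np.linalg.inv(Sigma_hat) @ Gamma.T / NT   # (1/NT) Gamma Sigma_hat^{-1} Gamma'
    W = diff @ np.linalg.solve(V, diff)
    return W, chi2.sf(W, df=Gamma.shape[0])

# threshold-effect test H0: alpha^(1) = alpha^(2), via Gamma = (0, 1, -1, 0, 0), eta = 0:
# W, p = wald_test(theta_hat, S, N * T, Gamma=[[0, 1, -1, 0, 0]], eta=np.zeros(1))
```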

Theorem 4.

Under the same assumptions as in Theorem 3, the Wald test statistic defined in (4.2) asymptotically follows a $\chi^{2}$ distribution with $s$ degrees of freedom, i.e.

W_{NT}\overset{d}{\to}\chi^{2}_{s},

as $T\to\infty$, $N\to\infty$ and $N=o(T)$.

5 Simulation studies and empirical data analysis

5.1 Simulation studies

We use four different mechanisms to simulate the network structure in model (2.1). The network structure in Example 5.1 satisfies Assumption 3.3. The simulation mechanisms in Examples 5.2-5.4 test the robustness of our estimation against network structures that may violate Assumption 3.3. A code sketch of all four generators follows Example 5.4.

Example 5.1.

(D-neighbourhood network) Each node $i\in\{1,2,\ldots,N\}$ is connected to node $j$ only if $j$ is inside $i$'s $D$-neighbourhood. That is, in the adjacency matrix, $a_{ij}=1$ if $0<|i-j|\leq D$ and $a_{ij}=0$ otherwise. Figure 1(a) is a visualization of such a network with $N=100$ and $D=10$.

Example 5.2.

(Random network) For each node $i\in\{1,2,\ldots,N\}$, we generate $D_{i}$ from the uniform distribution $U(0,5)$, and then draw $[D_{i}]$ samples randomly from $\{1,2,\ldots,N\}$ to form a set $S_{i}$ ($[x]$ denotes the integer part of $x$). $A=(a_{ij})$ is generated by letting $a_{ij}=1$ if $j\in S_{i}$ and $a_{ij}=0$ otherwise. In a network simulated with this mechanism, as indicated in Figure 1(b), there is no significantly influential node (i.e. no node with an extremely large in-degree).

Example 5.3.

(Network with power-law) Following Clauset et al. (2009), for each node $i$ in such a network, $D_{i}$ is generated in the same way as in Example 5.2. Instead of uniformly selecting $[D_{i}]$ samples from $\{1,2,\ldots,N\}$, these samples are drawn with probabilities $p_{i}=s_{i}/\sum_{i=1}^{N}s_{i}$, where $s_{i}$ is generated from a discrete power-law distribution $\mathbb{P}\{s_{i}=x\}\propto x^{-a}$ with scaling parameter $a=2.5$. As shown in Figure 1(c), a few nodes have much larger in-degrees while most have fewer than 2. Compared to Example 5.2, a network structure with a power-law distribution exhibits larger gaps between the influences of different nodes. This type of network is suitable for modelling social media such as Twitter and Instagram, where celebrities have huge influence while the ordinary majority has little.

Example 5.4.

(Network with K-blocks) As proposed in Nowicki and Snijders (2001), in a network with stochastic block structure all nodes are divided into blocks, and nodes from the same block are more likely to be connected than those from different blocks. To simulate this structure, the $N$ nodes are randomly divided into $K$ groups by assigning labels $\{1,2,\ldots,K\}$ to every node with equal probability. For any two nodes $i$ and $j$ from the same group, let $\mathbb{P}(a_{ij}=1)=0.5$, while for two nodes from different groups, $\mathbb{P}(a_{ij}=1)=0.001/N$. Hence it is very unlikely for nodes to be connected across groups. Our simulated network successfully mimics this characteristic, as Figure 1(d) shows clear boundaries between groups. The block network also has practical advantages; for instance, the price of one stock is highly relevant to those in the same industry sector.
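A compact sketch of the four generators (our own illustration; for Example 5.3 we truncate the power-law support at $N$, an assumption not specified above, and we zero the diagonals to rule out self-connections):

```python
import numpy as np

def d_neighbourhood(N, D):                     # Example 5.1
    i, j = np.indices((N, N))
    return ((np.abs(i - j) > 0) & (np.abs(i - j) <= D)).astype(int)

def random_net(N, rng):                        # Example 5.2
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        k = int(rng.uniform(0, 5))             # [D_i] with D_i ~ U(0, 5)
        A[i, rng.choice(N, size=k, replace=False)] = 1
    np.fill_diagonal(A, 0)
    return A

def power_law_net(N, rng, a=2.5):              # Example 5.3
    x = np.arange(1, N + 1)
    s = rng.choice(x, size=N, p=x**(-a) / np.sum(x**(-a)))   # s_i ~ discrete power law
    p = s / s.sum()                            # selection probabilities p_i
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        k = int(rng.uniform(0, 5))
        A[i, rng.choice(N, size=k, replace=False, p=p)] = 1
    np.fill_diagonal(A, 0)
    return A

def block_net(N, K, rng):                      # Example 5.4
    g = rng.integers(0, K, size=N)             # random group labels
    prob = np.where(g[:, None] == g[None, :], 0.5, 0.001 / N)
    A = (rng.random((N, N)) < prob).astype(int)
    np.fill_diagonal(A, 0)
    return A

rng = np.random.default_rng(0)
A = block_net(100, 10, rng)                    # e.g. the K-block network of Figure 1(d)
```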

Figure 1: Visualized network structures with $N=100$: (a) Example 5.1 ($D=10$); (b) Example 5.2; (c) Example 5.3; (d) Example 5.4 ($K=10$).

Set the true parameters of the data generating process (2.1) to $\nu_{0}=(0.5,0.7,0.6,0.1,0.1,5)^{\prime}$. For the sample region $D_{NT}=\{(i,t):i=1,2,\ldots,N;t=1,2,\ldots,T\}$, we let $T$ increase from 200 to 2000, while $N$ also increases at the relatively slower rates $\mathcal{O}(\sqrt{T})$ and $\mathcal{O}(T/\log(T))$ respectively, as shown in the following table:

$T$                   200   500   1000   2000
$N\approx\sqrt{T}$     14    22     31     44
$N\approx T/\log(T)$   37    80    144    263

For each network size $N$, the adjacency matrix $A$ is simulated according to the four mechanisms in Examples 5.1 to 5.4.

Remark.

In particular, in the empirical analysis we will study a dataset of car collisions across neighbourhoods distributed over the five boroughs of New York City. These boroughs are separated by rivers (except for Brooklyn and Queens), and neighbourhoods within the same borough are more likely to share a borderline, while cross-borough connections are very rare. Therefore the network constructed from New York City neighbourhoods follows the block structure of Example 5.4 with $N=20$ and $K=5$.

Based on a simulated network, the data are generated according to (2.1), and the true parameters are estimated by the MLE (3.4). To monitor the finite-sample performance of the MLE, data generation and parameter estimation are repeated $M=1000$ times for each combination of sample sizes $(N,T)$. The $m$-th replication produces the estimates $\hat{\theta}_{m}=(\hat{\omega}_{m},\hat{\alpha}^{(1)}_{m},\hat{\alpha}^{(2)}_{m},\hat{\xi}_{m},\hat{\beta}_{m})^{\prime}$ and $\hat{r}_{m}$. We use the following two measures to evaluate the simulation results:

  • Root-mean-square error: $RMSE_{k}=\sqrt{M^{-1}\sum_{m=1}^{M}(\hat{\theta}_{km}-\theta_{k0})^{2}}$,

  • Coverage probability: $CP_{k}=M^{-1}\sum_{m=1}^{M}1_{\{\theta_{k0}\in CI_{km}\}}$,

where $\hat{\theta}_{km}$ represents the $k$-th component of $\hat{\theta}_{m}$, and $CI_{km}$ is the 95% confidence interval defined as

CI_{km}=\left(\hat{\theta}_{km}-z_{0.975}\widehat{SE}_{km},\ \hat{\theta}_{km}+z_{0.975}\widehat{SE}_{km}\right).

Here the estimated standard error $\widehat{SE}_{km}$ is the square root of the $k$-th diagonal element of $(NT)^{-1}\widehat{\Sigma}_{NT}^{-1}$, and $z_{0.975}$ is the 0.975 quantile of the standard normal distribution. To eliminate the effect of starting points, a different initial value of $\theta$ is used for each $m$. RMSEs and CPs for different sample sizes and network simulation mechanisms are reported in Tables 1 and 2; we also report the mean estimates of the threshold $r_{0}$ in the last column of each table.
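Both measures are straightforward to compute from the $M$ replications; a short sketch (ours):

```python
import numpy as np

def evaluate(est, se, theta0, z=1.959964):
    """est, se: (M, 5) arrays of estimates and standard errors over M replications."""
    rmse = np.sqrt(np.mean((est - theta0) ** 2, axis=0))
    lo, hi = est - z * se, est + z * se
    cp = np.mean((theta0 >= lo) & (theta0 <= hi), axis=0)   # empirical coverage of 95% CIs
    return rmse, cp
```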

$T$  $N$  $\omega$  $\alpha^{(1)}$  $\alpha^{(2)}$  $\xi$  $\beta$  $\bar{r}$
Example 5.1 200 14 0.0696 (0.94) 0.0203 (0.94) 0.0278 (0.93) 0.0170 (0.95) 0.0256 (0.93) 5.028
500 22 0.0367 (0.96) 0.0100 (0.95) 0.0138 (0.95) 0.0101 (0.93) 0.0127 (0.95) 5
1000 31 0.0238 (0.95) 0.0058 (0.95) 0.0081 (0.95) 0.0062 (0.97) 0.0074 (0.95) 5
2000 44 0.0153 (0.95) 0.0035 (0.95) 0.0047 (0.95) 0.0041 (0.96) 0.0045 (0.95) 5
Example 5.2 200 14 0.0454 (0.95) 0.0200 (0.95) 0.0264 (0.94) 0.0119 (0.96) 0.0245 (0.94) 5.045
500 22 0.0284 (0.95) 0.0101 (0.95) 0.0134 (0.95) 0.0072 (0.94) 0.0126 (0.95) 5.002
1000 31 0.0162 (0.97) 0.0059 (0.96) 0.0077 (0.97) 0.0044 (0.94) 0.0074 (0.95) 5
2000 44 0.0112 (0.96) 0.0034 (0.96) 0.0047 (0.95) 0.0029 (0.94) 0.0043 (0.96) 5
Example 5.3 200 14 0.0511 (0.96) 0.0200 (0.95) 0.0272 (0.94) 0.0131 (0.95) 0.0246 (0.95) 5.034
500 22 0.0349 (0.95) 0.0102 (0.95) 0.0135 (0.96) 0.0084 (0.95) 0.0127 (0.96) 5.001
1000 31 0.0146 (0.95) 0.0060 (0.95) 0.0079 (0.95) 0.0038 (0.95) 0.0077 (0.94) 5
2000 44 0.0104 (0.95) 0.0035 (0.95) 0.0048 (0.94) 0.0025 (0.95) 0.0043 (0.96) 5
Example 5.4 200 14 0.0882 (0.95) 0.0205 (0.95) 0.0273 (0.95) 0.0227 (0.94) 0.0256 (0.93) 5.013
500 22 0.0379 (0.94) 0.0102 (0.95) 0.0136 (0.95) 0.0096 (0.95) 0.0124 (0.95) 5
1000 31 0.0218 (0.95) 0.0060 (0.95) 0.0078 (0.95) 0.0055 (0.95) 0.0073 (0.96) 5
2000 44 0.0118 (0.94) 0.0035 (0.96) 0.0047 (0.95) 0.0029 (0.95) 0.0043 (0.96) 5
Table 1: Simulation results, RMSE with CP in parentheses, under different network structures ($N\approx\sqrt{T}$).
$T$  $N$  $\omega$  $\alpha^{(1)}$  $\alpha^{(2)}$  $\xi$  $\beta$  $\bar{r}$
Example 5.1 200 37 0.0537 (0.95) 0.0124 (0.95) 0.0164 (0.95) 0.0143 (0.94) 0.0158 (0.94) 5.002
500 80 0.0287 (0.96) 0.0054 (0.94) 0.0071 (0.95) 0.0078 (0.95) 0.0066 (0.95) 5
1000 144 0.0201 (0.95) 0.0029 (0.94) 0.0040 (0.93) 0.0055 (0.95) 0.0036 (0.94) 5
2000 263 0.0136 (0.95) 0.0015 (0.94) 0.0019 (0.95) 0.0038 (0.95) 0.0019 (0.93) 5
Example 5.2 200 37 0.0347 (0.95) 0.0121 (0.95) 0.0170 (0.95) 0.0089 (0.95) 0.0161 (0.93) 5.008
500 80 0.0140 (0.95) 0.0053 (0.95) 0.0070 (0.95) 0.0035 (0.95) 0.0066 (0.95) 5
1000 144 0.0073 (0.95) 0.0029 (0.93) 0.0036 (0.95) 0.0020 (0.94) 0.0036 (0.93) 5
2000 263 0.0041 (0.95) 0.0014 (0.95) 0.0020 (0.94) 0.0011 (0.95) 0.0018 (0.96) 5
Example 5.3 200 37 0.0385 (0.95) 0.0124 (0.94) 0.0168 (0.95) 0.0092 (0.95) 0.0152 (0.95) 5.003
500 80 0.0144 (0.95) 0.0054 (0.95) 0.0071 (0.94) 0.0036 (0.95) 0.0067 (0.95) 5
1000 144 0.0073 (0.94) 0.0029 (0.94) 0.0035 (0.96) 0.0019 (0.94) 0.0035 (0.95) 5
2000 263 0.0037 (0.95) 0.0015 (0.95) 0.0019 (0.96) 0.0009 (0.95) 0.0018 (0.95) 5
Example 5.4 200 37 0.0498 (0.95) 0.0120 (0.95) 0.0165 (0.94) 0.0129 (0.94) 0.0148 (0.96) 5.011
500 80 0.0176 (0.94) 0.0055 (0.94) 0.0071 (0.94) 0.0045 (0.94) 0.0069 (0.94) 5
1000 144 0.0083 (0.97) 0.0028 (0.95) 0.0036 (0.96) 0.0022 (0.96) 0.0034 (0.95) 5
2000 263 0.0048 (0.95) 0.0015 (0.95) 0.0019 (0.95) 0.0012 (0.96) 0.0019 (0.95) 5
Table 2: Simulation results, RMSE with CP in parentheses, under different network structures ($N\approx T/\log(T)$).

From Tables 1 and 2 it can be seen that the RMSEs of $\hat{\theta}_{NT}$ decrease toward zero as the sample size grows, and the mean of $\hat{r}_{NT}$ equals $r_{0}=5$ for sufficiently large sample sizes. These results support the consistency of the MLE (3.4) stated in Theorem 2. The reported CPs are close to the nominal value 0.95, which shows that $\widehat{SE}$ provides a reliable estimate of the true standard error of $\hat{\theta}_{NT}$. Moreover, normal Q-Q plots of the estimates (Figures 2 to 5) are presented for $T=2000,N=44$ and $T=2000,N=263$ under the different network structures. These Q-Q plots provide additional evidence for the asymptotic normality of $\hat{\theta}_{NT}$ in Theorem 3.

Figure 2: Q-Q plots of estimates for Example 5.1: (a) $T=2000,N=44$; (b) $T=2000,N=263$.
Figure 3: Q-Q plots of estimates for Example 5.2: (a) $T=2000,N=44$; (b) $T=2000,N=263$.
Figure 4: Q-Q plots of estimates for Example 5.3: (a) $T=2000,N=44$; (b) $T=2000,N=263$.
Figure 5: Q-Q plots of estimates for Example 5.4: (a) $T=2000,N=44$; (b) $T=2000,N=263$.

5.2 Analysis of daily numbers of car accidents in New York City

The New York City Police Department (NYPD) publishes and regularly updates detailed data on motor vehicle collisions that have occurred city-wide. These data are openly accessible on the NYPD website (https://www1.nyc.gov/site/nypd/stats/traffic-data/traffic-data-collision.page) and contain sufficient information for us to apply our model. We collected all records from 16th February 2021 to 30th June 2022; each record includes the date when an accident happened and the zip code of where it happened. We classified all records into 41 neighbourhoods according to the correspondence between zip codes and the geographic locations they represent. Re-grouping the data by neighbourhood and date of occurrence, we obtain a high-dimensional time series with dimension $N=41$ and sample size $T=500$.

Two neighbourhoods are regarded as connected nodes if they share a borderline. Based on the geographic information provided by the data, we can therefore construct a reasonable network with 41 nodes, visualized in Figure 6. In Figure 7 we plot histograms of the daily numbers of car accidents in 9 randomly selected neighbourhoods; the shapes of the histograms suggest Poisson distributions. Moreover, in Figure 8 we can easily observe volatility clustering in the daily numbers of car accidents in four selected neighbourhoods of NYC, indicating a potential autoregressive structure in the conditional heteroscedasticity of the data.

Figure 6: Network of 41 neighbourhoods in New York City.
Figure 7: Distributions of daily occurrences of car accidents in selected neighbourhoods.
Figure 8: Daily occurrences of car accidents in 4 neighbourhoods.

Our model was fitted to this dataset by the method proposed in Section 3. The parameter estimates are reported in Table 3 below.

$\omega$  $\alpha^{(1)}$  $\alpha^{(2)}$  $\xi$  $\beta$  $r$
Estimation 0.018693 0.126472 0.135026 0.002727 0.862244 10
SE 4.12e-03 4.40e-03 4.68e-03 1.09e-03 4.73e-03 \
Table 3: Estimation results based on daily number of car accidents in 41 neighbourhoods of NYC.

We now interpret these results. Firstly, it is worth noting that $\alpha^{(1)}$ is slightly smaller than $\alpha^{(2)}$, which means that the conditional variance of the number of car accidents in these neighbourhoods is less affected by the previous day's count when that count is above the threshold $r=10$. Secondly, the volatility of the number of car accidents in one area is also affected by its geographically neighbouring areas. In addition, the estimated value of $\beta$ is significantly larger than the other coefficients, indicating a strong persistence in volatility that leads to volatility clustering.

Finally, we use the Wald test of Section 4 to investigate the following important properties of this real dataset; these properties provide substantial evidence in support of our model setting:

(i) The existence of a threshold effect (i.e. the asymmetric property) in volatility. The null hypothesis is $H_{0}:\alpha^{(1)}_{0}=\alpha^{(2)}_{0}$ (taking $\Gamma=(0,1,-1,0,0)$ and $\eta=0$ in (4.1)). In this case the Wald statistic (4.2) is $W_{NT}=18.94$, which implies rejection of $H_{0}$ at a significance level below 0.01 according to Theorem 4. This test shows that the proposed threshold structure is genuinely useful for capturing the nature of daily numbers of car accidents in New York City.

(ii) The existence of a GARCH effect (persistence in volatility). For the null hypothesis $H_{0}:\beta_{0}=0$, the Wald statistic (4.2) is $W_{NT}=33212$ with a $p$-value very close to 0. This strongly suggests that the autoregressive term (i.e. $\lambda_{i,t-1}$) should be included in the model, which is consistent with the large estimate of $\beta$.

(iii) The existence of a network effect. The Wald statistic (4.2) is $W_{NT}=6.31$ with a $p$-value of 0.012. This indicates a significant network effect among the daily numbers of car accidents.

6 Conclusion

We have proposed a model to describe spatio-temporal integer-valued data which are observed at the nodes of a network and exhibit asymmetry. Although the proposed model is of GARCH type for volatility (conditional variance), it can also be used to model the conditional mean, because the conditional distribution is assumed to be Poisson. Asymptotic inference for the proposed model, covering parameter estimation and hypothesis testing, is developed by applying limit theorems for nonstationary arrays of random fields. Simulations with different network structures and an application to a real dataset show that our methodology is useful for modelling dynamic spatio-temporal integer-valued data.

Appendix A Proofs of theoretical results

In this appendix, we give details of proofs for our theoretical results.

Lemma A.1.

If $0\leq\beta<1$, $\mathbb{E}|y_{it}|<\infty$ and $\mathbb{E}|\lambda_{it}(\nu)|<\infty$ for all $(i,t)\in D_{NT}$, $NT\geq 1$, then

\lambda_{it}(\nu)=\sum_{k=1}^{\infty}\beta^{k-1}\left[\omega+\alpha_{i,t-k}y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right] \qquad (A.1)

with probability one for all $(i,t)\in D_{NT}$, $NT\geq 1$ and $\nu\in\Theta\times\mathbb{Z}_{+}$, where $\alpha_{i,t-k}=\alpha^{(1)}1_{\{y_{i,t-k}\geq r\}}+\alpha^{(2)}1_{\{y_{i,t-k}<r\}}$.

Proof.

When $\beta=0$, (A.1) obviously holds. Now consider the case $0<\beta<1$. Let $\log^{+}(x)=\log(x)$ if $x>1$ and $0$ otherwise, and let $u_{i,t-k}(\nu):=\omega+\alpha_{i,t-k}y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}$. By Jensen's inequality we have

\mathbb{E}\log^{+}|u_{i,t-k}(\nu)|\leq\log^{+}\mathbb{E}\left|\omega+\alpha_{i,t-k}y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right|<\infty.

By Lemma 2.2 in Berkes et al. (2003) we have $\sum_{k=1}^{\infty}\mathbb{P}\left[|u_{i,t-k}(\nu)|>\zeta^{k}\right]<\infty$ for any $\zeta>1$. Therefore, by the Borel-Cantelli lemma, $|u_{i,t-k}(\nu)|\leq\zeta^{k}$ almost surely for all sufficiently large $k$. Taking $1<\zeta<\frac{1}{|\beta|}$, the right-hand side of (A.1) converges almost surely.

It remains to show that

\lambda_{it}(\nu)=\sum_{k=1}^{\infty}\beta^{k-1}u_{i,t-k}(\nu).

From (3.2), we have

\lambda_{it}(\nu)-\beta^{k}\lambda_{i,t-k-1}(\nu)=u_{i,t-1}(\nu)+\beta u_{i,t-2}(\nu)+\cdots+\beta^{k-1}u_{i,t-k}(\nu).

Using the Markov inequality, we obtain $\sum_{k=1}^{\infty}\mathbb{P}\left\{|\beta^{k}\lambda_{i,t-k-1}(\nu)|>\delta\right\}<\infty$ for any $\delta>0$. Then, by the Borel-Cantelli lemma, $|\beta^{k}\lambda_{i,t-k-1}(\nu)|\overset{a.s.}{\to}0$ as $k\to\infty$. Letting $k\to\infty$ on both sides of the above equation, we conclude the result of Lemma A.1. ∎

A.1 Proof of Theorem 1

Our proof of Theorem 1 relies on the arguments given by Ferland et al. (2006) in their proof of Corollary 1. Let

\Lambda_{t}^{(0)}:=\left(\lambda_{1t}^{(0)},\lambda_{2t}^{(0)},\ldots,\lambda_{Nt}^{(0)}\right)^{\prime};\qquad
\mathbb{Y}_{t}^{(0)}:=\left(N_{1t}(\lambda_{1t}^{(0)}),N_{2t}(\lambda_{2t}^{(0)}),\ldots,N_{Nt}(\lambda_{Nt}^{(0)})\right)^{\prime},

where $\{\lambda_{it}^{(0)}:i=1,2,\ldots,N,t\in\mathbb{Z}\}$ are IID positive random variables with mean 1. For each $n\geq 1$, we define $\{\mathbb{Y}_{t}^{(n)}:t\in\mathbb{Z}\}$ and $\{\Lambda_{t}^{(n)}:t\in\mathbb{Z}\}$ through the following recursion:

\mathbb{Y}_{t}^{(n)}=(N_{1t}(\lambda_{1t}^{(n)}),N_{2t}(\lambda_{2t}^{(n)}),\ldots,N_{Nt}(\lambda_{Nt}^{(n)}))^{\prime};\qquad
\Lambda_{t}^{(n)}=\omega\mathbf{1}_{N}+A(\mathbb{Y}_{t-1}^{(n-1)})\mathbb{Y}_{t-1}^{(n-1)}+\beta\Lambda_{t-1}^{(n-1)}. \qquad (A.2)
Claim A.1.

$\{\mathbb{Y}_{t}^{(n)}:t\in\mathbb{Z}\}$ is strictly stationary for each $n\geq 0$.

Proof.

Since $\{N_{it}(\cdot):i=1,2,\ldots,N,t\in\mathbb{Z}\}$ are independent Poisson processes with unit intensity, for any $t$ and $h$ we have

\mathbb{P}\left\{\mathbb{Y}_{1+h}^{(n)}=\mathbf{y}_{1},\ldots,\mathbb{Y}_{t+h}^{(n)}=\mathbf{y}_{t}\right\}
=\mathbb{E}\left(\mathbb{P}\left\{\mathbb{Y}_{1+h}^{(n)}=\mathbf{y}_{1},\ldots,\mathbb{Y}_{t+h}^{(n)}=\mathbf{y}_{t}\left|\Lambda_{1+h}^{(n)},\ldots,\Lambda_{t+h}^{(n)}\right.\right\}\right)
=\mathbb{E}\left(\prod_{k=1}^{t}\prod_{i=1}^{N}\frac{\left(\lambda_{i,k+h}^{(n)}\right)^{y_{ik}}}{y_{ik}!}e^{-\lambda_{i,k+h}^{(n)}}\right). \qquad (A.3)

When $n=0$, $\mathbb{P}\left\{\mathbb{Y}_{1+h}^{(0)}=\mathbf{y}_{1},\ldots,\mathbb{Y}_{t+h}^{(0)}=\mathbf{y}_{t}\right\}$ is $h$-invariant for any $t$ and $h$, by (A.3) and the IID property of $\{\lambda_{it}^{(0)}:i=1,2,\ldots,N,t\in\mathbb{Z}\}$. Therefore $\{\mathbb{Y}_{t}^{(0)}:t\in\mathbb{Z}\}$ is strictly stationary. Assume that $\{\mathbb{Y}_{t}^{(n-1)}:t\in\mathbb{Z}\}$ and $\{\Lambda_{t}^{(n-1)}:t\in\mathbb{Z}\}$ are strictly stationary; then $\{\Lambda_{t}^{(n)}:t\in\mathbb{Z}\}$ is also strictly stationary since $\Lambda_{t}^{(n)}=\omega\mathbf{1}_{N}+A(\mathbb{Y}_{t-1}^{(n-1)})\mathbb{Y}_{t-1}^{(n-1)}+\beta\Lambda_{t-1}^{(n-1)}$. According to (A.3) and the strict stationarity of $\{\Lambda_{t}^{(n)}:t\in\mathbb{Z}\}$, $\{\mathbb{Y}_{t}^{(n)}:t\in\mathbb{Z}\}$ is strictly stationary too. Claim A.1 is thus proved by induction. ∎

Let $\left\lVert\mathbf{x}\right\rVert_{1}=|x_{1}|+|x_{2}|+\cdots+|x_{N}|$ for a vector $\mathbf{x}=(x_{1},x_{2},\ldots,x_{N})^{\prime}$. In the following claim we prove the convergence of $\mathbb{Y}_{t}^{(n)}$ as $n\to\infty$.

Claim A.2.

$\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}\leq C\rho^{n}$ for some constants $C>0$ and $0<\rho<1$.

Proof.

Since $N_{it}$ is a Poisson process with unit intensity, $N_{it}(\lambda_{it}^{(n+1)})-N_{it}(\lambda_{it}^{(n)})$ is Poisson distributed with parameter $\lambda_{it}^{(n+1)}-\lambda_{it}^{(n)}$, assuming $\lambda_{it}^{(n+1)}\geq\lambda_{it}^{(n)}$. Then it is easy to verify that

\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}
=\mathbb{E}\left[\mathbb{E}\left(\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}\left|\Lambda_{t}^{(n+1)},\Lambda_{t}^{(n)}\right.\right)\right]
=\mathbb{E}\left[\mathbb{E}\left(\sum_{i=1}^{N}\left|N_{it}(\lambda_{it}^{(n+1)})-N_{it}(\lambda_{it}^{(n)})\right|\left|\Lambda_{t}^{(n+1)},\Lambda_{t}^{(n)}\right.\right)\right]
=\mathbb{E}\left[\sum_{i=1}^{N}\left|\lambda_{it}^{(n+1)}-\lambda_{it}^{(n)}\right|\right]
=\mathbb{E}\left\lVert\Lambda_{t}^{(n+1)}-\Lambda_{t}^{(n)}\right\rVert_{1}.

Recall that, from (A.2),

\Lambda_{t}^{(n)}=\omega\mathbf{1}_{N}+A(\mathbb{Y}_{t-1}^{(n-1)})\mathbb{Y}_{t-1}^{(n-1)}+\beta\Lambda_{t-1}^{(n-1)}.

Then

\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}\leq\left\lVert A(\mathbb{Y}_{t-1}^{(n)})\mathbb{Y}_{t-1}^{(n)}-A(\mathbb{Y}_{t-1}^{(n-1)})\mathbb{Y}_{t-1}^{(n-1)}\right\rVert_{1}+\beta\left\lVert\Lambda_{t-1}^{(n)}-\Lambda_{t-1}^{(n-1)}\right\rVert_{1}. \qquad (A.4)

To bound the terms on the right-hand side of the above inequality, we define the function $\psi(y)=\alpha^{(1)}1_{\{y\geq r\}}y+\alpha^{(2)}1_{\{y<r\}}y$ for $y\in\mathbb{N}$. For any $y,y^{\prime}\in\mathbb{N}$, we have the following cases:

  • If $y\geq r$ and $y^{\prime}\geq r$, then $|\psi(y^{\prime})-\psi(y)|=\alpha^{(1)}|y^{\prime}-y|\leq\alpha^{*}|y^{\prime}-y|$, where $\alpha^{*}=\max\left\{\alpha^{(1)},\alpha^{(2)},\left|\alpha^{(1)}r-\alpha^{(2)}(r-1)\right|\right\}$;

  • If $y<r$ and $y^{\prime}<r$, then $|\psi(y^{\prime})-\psi(y)|=\alpha^{(2)}|y^{\prime}-y|\leq\alpha^{*}|y^{\prime}-y|$.

  • If $y$ and $y^{\prime}$ are on different sides of $r$, we assume $y\geq r$ and $y^{\prime}<r$ without loss of generality. Notice that

    \frac{\psi(y)-\psi(y^{\prime})}{y-y^{\prime}}=\frac{\alpha^{(1)}y-\alpha^{(2)}y^{\prime}}{y-y^{\prime}}=\alpha^{(2)}+(\alpha^{(1)}-\alpha^{(2)})\frac{y}{y-y^{\prime}}.

    When $\alpha^{(1)}\geq\alpha^{(2)}$, we see

    0<\frac{\psi(y)-\psi(y^{\prime})}{y-y^{\prime}}\leq\alpha^{(2)}+(\alpha^{(1)}-\alpha^{(2)})\frac{y}{y-(r-1)}\leq\alpha^{(2)}+(\alpha^{(1)}-\alpha^{(2)})r;

    When $\alpha^{(1)}<\alpha^{(2)}$, we see

    \alpha^{(2)}\geq\frac{\psi(y)-\psi(y^{\prime})}{y-y^{\prime}}\geq\alpha^{(2)}+(\alpha^{(1)}-\alpha^{(2)})\frac{y}{y-(r-1)}\geq\alpha^{(2)}+(\alpha^{(1)}-\alpha^{(2)})r.

Combining the above cases, we obtain

|\psi(y^{\prime})-\psi(y)|\leq\alpha^{*}|y^{\prime}-y| \qquad (A.5)

for any $y^{\prime},y\in\mathbb{N}$.

Therefore,

\left|\left(A(\mathbb{Y}_{t-1}^{(n)})\mathbb{Y}_{t-1}^{(n)}-A(\mathbb{Y}_{t-1}^{(n-1)})\mathbb{Y}_{t-1}^{(n-1)}\right)_{i}\right|
=\left|\psi(y_{i,t-1}^{(n)})-\psi(y_{i,t-1}^{(n-1)})+\xi\sum_{j=1}^{N}w_{ij}(y_{j,t-1}^{(n)}-y_{j,t-1}^{(n-1)})\right| \qquad (A.6)
\leq\alpha^{*}\left|y_{i,t-1}^{(n)}-y_{i,t-1}^{(n-1)}\right|+\xi\sum_{j=1}^{N}w_{ij}\left|y_{j,t-1}^{(n)}-y_{j,t-1}^{(n-1)}\right|

for $i=1,2,\ldots,N$, where $(\mathbb{Y})_{i}$ is the $i$-th element of $\mathbb{Y}$.

Combining (A.4) and (A.6), we get

\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}
\leq\mathbb{E}\left\lVert(\alpha^{*}I_{N}+\xi W+\beta I_{N})(\mathbb{Y}_{t-1}^{(n)}-\mathbb{Y}_{t-1}^{(n-1)})\right\rVert_{1}
\leq\rho(\alpha^{*}I_{N}+\xi W+\beta I_{N})\,\mathbb{E}\left\lVert\mathbb{Y}_{t-1}^{(n)}-\mathbb{Y}_{t-1}^{(n-1)}\right\rVert_{1}
\leq(\alpha^{*}+\xi+\beta)\,\mathbb{E}\left\lVert\mathbb{Y}_{t-1}^{(n)}-\mathbb{Y}_{t-1}^{(n-1)}\right\rVert_{1},

where $\rho(\cdot)$ denotes the spectral radius, and the last inequality is due to the Gershgorin circle theorem. According to (2.3), we can take $\rho=\alpha^{*}+\xi+\beta<1$ such that

\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}
\leq\rho\,\mathbb{E}\left\lVert\mathbb{Y}_{t-1}^{(n)}-\mathbb{Y}_{t-1}^{(n-1)}\right\rVert_{1}
\leq\rho^{n}\,\mathbb{E}\left\lVert\mathbb{Y}_{t-n}^{(1)}-\mathbb{Y}_{t-n}^{(0)}\right\rVert_{1}
=\rho^{n}\,\mathbb{E}\left\lVert\Lambda_{t-n}^{(1)}-\Lambda_{t-n}^{(0)}\right\rVert_{1}
\leq C\rho^{n}

for $C=\mathbb{E}\left\lVert\Lambda_{t-n}^{(1)}-\Lambda_{t-n}^{(0)}\right\rVert_{1}<\infty$. ∎

By Claim A.2,

\mathbb{P}\left\{\mathbb{Y}_{t}^{(n+1)}\neq\mathbb{Y}_{t}^{(n)}\right\}=\sum_{h=1}^{\infty}\mathbb{P}\left\{\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}=h\right\}
\leq\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+1)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}
\leq C\rho^{n}.

This implies that $\sum_{n=1}^{\infty}\mathbb{P}\left\{\mathbb{Y}_{t}^{(n+1)}\neq\mathbb{Y}_{t}^{(n)}\right\}<\infty$ and

\mathbb{P}\left\{\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty}\left[\mathbb{Y}_{t}^{(k+1)}\neq\mathbb{Y}_{t}^{(k)}\right]\right\}=0

according to the Borel-Cantelli lemma. This means that, with probability one, there exists $M$ such that for all $n>M$, $\mathbb{Y}_{t}^{(n)}$ equals some $\mathbb{Y}_{t}$ with integer components. That is, $\mathbb{Y}_{t}=\lim_{n\to\infty}\mathbb{Y}_{t}^{(n)}$ exists almost surely. Clearly, $\{\mathbb{Y}_{t}:t\in\mathbb{Z}\}$ is strictly stationary, since $\{\mathbb{Y}_{t}^{(n)}:t\in\mathbb{Z}\}$ is strictly stationary for each $n\geq 0$ according to Claim A.1.

Finally, by Claim A.2, we also have

\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+m)}-\mathbb{Y}_{t}^{(n)}\right\rVert_{1}\leq\sum_{k=0}^{m-1}\mathbb{E}\left\lVert\mathbb{Y}_{t}^{(n+k+1)}-\mathbb{Y}_{t}^{(n+k)}\right\rVert_{1}\leq C\rho^{n}\sum_{k=0}^{m-1}\rho^{k},

for any $n,m\in\mathbb{N}$. Therefore, $\{\mathbb{Y}_{t}^{(n)}:n\geq 0\}$ is a Cauchy sequence in $\mathbb{L}^{1}$, hence $\mathbb{E}\left\lVert\mathbb{Y}_{t}\right\rVert_{1}<\infty$. ∎

A.2 Proof of Theorem 2

From Lemma A.1, we have

\lambda_{it}(\nu)=\sum_{k=1}^{\infty}\beta^{k-1}\left[\omega+\alpha_{i,t-k}y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right]

and

\sup_{NT\geq 1}\sup_{(i,t)\in D_{NT}}\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}|\lambda_{it}(\nu)|<\infty \qquad (A.7)

with probability one, where $\alpha_{i,t-k}=\alpha^{(1)}1_{\{y_{i,t-k}\geq r\}}+\alpha^{(2)}1_{\{y_{i,t-k}<r\}}$. Given initial values $\tilde{\lambda}_{i0}=0$ for $i=1,2,\ldots,N$, we can replace $\lambda_{it}(\nu)$ with $\tilde{\lambda}_{it}(\nu)$ and get

\tilde{\lambda}_{it}(\nu)=\sum_{k=1}^{t}\beta^{k-1}\left[\omega+\alpha_{i,t-k}y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right]

for $i=1,2,\ldots,N$ and $t\geq 1$. Hence

\lambda_{it}(\nu)-\tilde{\lambda}_{it}(\nu)=\beta^{t}\lambda_{i0}(\nu). \qquad (A.8)

Now we are ready to prove the consistency of $\hat{\nu}_{NT}$ as $T\to\infty$ and $N\to\infty$. The proof is broken up into Claims A.3 to A.6 below: Claim A.3 shows that the choice of initial values is asymptotically negligible; Claims A.4 and A.5 verify the weak dependence of $\{l_{it}(\nu):(i,t)\in D_{NT},NT\geq 1\}$ and facilitate the use of the LLN; Claim A.6 concerns the identifiability of the true parameter $\nu_{0}$.

Claim A.3.

For any $\nu\in\Theta\times\mathbb{Z}_{+}$, $|L_{NT}(\nu)-\tilde{L}_{NT}(\nu)|\overset{p}{\to}0$ as $T\to\infty$ and $N\to\infty$.

Proof.

By (A.7) and (A.8), we have

\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}|\lambda_{it}(\nu)-\tilde{\lambda}_{it}(\nu)|\leq C\rho^{t} \qquad (A.9)

almost surely. Therefore,

|L_{NT}(\nu)-\tilde{L}_{NT}(\nu)|
\leq\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left|y_{it}\log\left[\frac{\lambda_{it}(\nu)}{\tilde{\lambda}_{it}(\nu)}\right]-[\lambda_{it}(\nu)-\tilde{\lambda}_{it}(\nu)]\right|
\leq\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[y_{it}\left|\frac{\lambda_{it}(\nu)-\tilde{\lambda}_{it}(\nu)}{\tilde{\lambda}_{it}(\nu)}\right|+\left|\lambda_{it}(\nu)-\tilde{\lambda}_{it}(\nu)\right|\right]
\leq\frac{1}{NT}\sum_{(i,t)\in D_{NT}}C\rho^{t}\left(\frac{y_{it}}{\omega}+1\right)

almost surely. By the Markov inequality, for any δ>0\delta>0,

\mathbb{P}\left\{|L_{NT}(\nu)-\tilde{L}_{NT}(\nu)|>\delta\right\}
\leq\frac{1}{\delta NT}\sum_{i=1}^{N}\sum_{t=1}^{T}C_{1}\rho^{t}\,\mathbb{E}\left|\frac{y_{it}}{\omega}+1\right|
\leq\frac{1}{\delta NT}\sum_{i=1}^{N}\sum_{t=1}^{T}C_{2}\rho^{t}\quad\text{(by Assumption 3.2(a))}
\leq\frac{1}{\delta T}\frac{C_{2}\rho}{1-\rho}\to 0

as $N\to\infty$ and $T\to\infty$. ∎

In the following proofs, for a random variable $X$, we denote its $\mathbb{L}^{p}$-norm by $\left\lVert X\right\rVert_{p}=(\mathbb{E}|X|^{p})^{1/p}$.

Claim A.4.

The functions $l_{it}(\nu)$ are uniformly $\mathbb{L}^{p}$-bounded for some $p>1$, i.e.

\sup_{NT\geq 1}\sup_{(i,t)\in D_{NT}}\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\left\lVert l_{it}(\nu)\right\rVert_{p}<\infty.
Proof.

By the Hölder inequality, we have

\left\lVert l_{it}(\nu)\right\rVert_{p}=\left\lVert y_{it}\log\lambda_{it}(\nu)-\lambda_{it}(\nu)\right\rVert_{p}
\leq\left\lVert y_{it}\log\lambda_{it}(\nu)\right\rVert_{p}+\left\lVert\lambda_{it}(\nu)\right\rVert_{p}
\leq\left\lVert y_{it}\right\rVert_{2p}\left\lVert\log\lambda_{it}(\nu)\right\rVert_{2p}+\left\lVert\lambda_{it}(\nu)\right\rVert_{p}.

Note that

\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\left\lVert\log\lambda_{it}(\nu)\right\rVert_{2p}
\leq\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\left\lVert\log^{+}\lambda_{it}(\nu)\right\rVert_{2p}+\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\left\lVert\log^{-}\lambda_{it}(\nu)\right\rVert_{2p}
\leq\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\left\lVert\lambda_{it}(\nu)+1\right\rVert_{2p}+\sup_{\nu\in\Theta\times\mathbb{Z}_{+}}\max\{-\log(\omega),0\}.

Then, by Assumption 3.2(a) and (A.7), we complete the proof. ∎

Claim A.5.

For any element $\nu$ in the parameter space satisfying Assumption 3.1, the array of random fields $\{l_{it}(\nu):(i,t)\in D_{NT},NT\geq 1\}$ is $\eta$-weakly dependent with coefficient $\bar{\eta}_{0}(s)\leq Cs^{-\mu_{0}}$, where $\mu_{0}>2$.

Proof.

For each $(i,t)\in D_{NT}$ and $h=1,2,\ldots$, define $\{y_{j\tau}^{(h)}:(j,\tau)\in D_{NT},NT\geq 1\}$ such that $y_{j\tau}^{(h)}\neq y_{j\tau}$ if and only if $\rho((i,t),(j,\tau))=h$, and let

\lambda_{it}^{(h)}(\nu)=\sum_{k=1}^{\infty}\beta^{k-1}\left[\omega+\alpha_{i,t-k}^{(h)}y_{i,t-k}^{(h)}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}^{(h)}\right],

where

\alpha_{i,t-k}^{(h)}=\alpha^{(1)}1_{\{y_{i,t-k}^{(h)}\geq r\}}+\alpha^{(2)}1_{\{y_{i,t-k}^{(h)}<r\}}.

Then, by (A.5) and Assumption 3.3, we have

|\lambda_{it}(\nu)-\lambda_{it}^{(h)}(\nu)|
\leq\sum_{k=1}^{\infty}\beta^{k-1}|\alpha_{i,t-k}y_{i,t-k}-\alpha_{i,t-k}^{(h)}y_{i,t-k}^{(h)}|+\sum_{k=1}^{\infty}\sum_{j=1}^{N}\beta^{k-1}\xi w_{ij}|y_{j,t-k}-y_{j,t-k}^{(h)}|
=\beta^{h-1}|\alpha_{i,t-h}y_{i,t-h}-\alpha_{i,t-h}^{(h)}y_{i,t-h}^{(h)}|+\xi\beta^{h-1}\sum_{1\leq|j-i|\leq h}w_{ij}|y_{j,t-h}-y_{j,t-h}^{(h)}|+\xi w_{i,i\pm h}\sum_{k=1}^{h}\beta^{k-1}|y_{i\pm h,t-k}-y_{i\pm h,t-k}^{(h)}| \qquad (A.10)
\leq\alpha^{*}\beta^{h-1}|y_{i,t-h}-y_{i,t-h}^{(h)}|+\xi\beta^{h-1}\sum_{1\leq|j-i|\leq h}|y_{j,t-h}-y_{j,t-h}^{(h)}|+C\xi h^{-b}\sum_{k=1}^{h}|y_{i\pm h,t-k}-y_{i\pm h,t-k}^{(h)}|.

Therefore $\lambda_{it}(\nu)$ satisfies condition (2.7) in Pan and Pan (2024) with $B_{(i,t),NT}(h)\leq Ch^{-b}$ and $l=0$. By Proposition 2 and Example 2.1 in Pan and Pan (2024), the array of random fields $\{\lambda_{it}(\nu):(i,t)\in D_{NT},NT\geq 1\}$ is $\eta$-weakly dependent with coefficient $\bar{\eta}_{\lambda}(s)\leq Cs^{-\mu_{y}+2}$.

Similarly we can define

l_{it}^{(h)}(\nu)=y_{it}^{(h)}\log\lambda_{it}^{(h)}(\nu)-\lambda_{it}^{(h)}(\nu).

Since

|l_{it}(\nu)-l_{it}^{(h)}(\nu)|\leq y_{it}\left|\log\frac{\lambda_{it}(\nu)}{\lambda_{it}^{(h)}(\nu)}\right|+|\lambda_{it}(\nu)-\lambda_{it}^{(h)}(\nu)|
\leq y_{it}\left|\frac{\lambda_{it}(\nu)}{\lambda_{it}^{(h)}(\nu)}-1\right|+|\lambda_{it}(\nu)-\lambda_{it}^{(h)}(\nu)|
\leq\frac{y_{it}}{\omega}|\lambda_{it}(\nu)-\lambda_{it}^{(h)}(\nu)|+|\lambda_{it}(\nu)-\lambda_{it}^{(h)}(\nu)|,

$l_{it}(\nu)$ also satisfies condition (2.7) in Pan and Pan (2024), with $B_{(i,t),NT}(h)\leq Ch^{-b}$ and $l=1$ by (A.10). Hence the array of random fields $\{l_{it}(\nu):(i,t)\in D_{NT},NT\geq 1\}$ is $\eta$-weakly dependent with coefficient $\bar{\eta}_{0}(s)\leq Cs^{-\frac{2p-2}{2p-1}\mu_{y}+2}$. Note that $\frac{2p-2}{2p-1}\mu_{y}-2>2$ because $\mu_{y}>\frac{4p-2}{p-1}$. So Claim A.5 is proved. ∎

Claim A.6.

λit(ν)=λit(ν0)\lambda_{it}(\nu)=\lambda_{it}(\nu_{0}) for all (i,t)DNT(i,t)\in D_{NT} if and only if ν=ν0\nu=\nu_{0}.

Proof.

The 'if' part is obvious, so it remains to prove the 'only if' part. Observe that

(1βB)λit(ν)=ω+αByit+ξj=1NwijByjt,(1-\beta B)\lambda_{it}(\nu)=\omega+\alpha By_{it}+\xi\sum_{j=1}^{N}w_{ij}By_{jt},

where BB stands for the back-shift operator in the sense that Byit=yi,t1By_{it}=y_{i,t-1}, and α\alpha represents either α(1)\alpha^{(1)} or α(2)\alpha^{(2)} according to the threshold regime of the corresponding lagged observation. Therefore we have

(1βB)Λt(ν)=ω𝟏N+(αBIN+ξBW)𝕐t.(1-\beta B)\Lambda_{t}(\nu)=\omega\mathbf{1}_{N}+(\alpha BI_{N}+\xi BW)\mathbb{Y}_{t}.

The polynomial 1βx1-\beta x has a root x=1/βx=1/\beta, which lies outside the unit circle since 0<β<10<\beta<1. Therefore the inverse 11βx\frac{1}{1-\beta x} is well-defined for any |x|1|x|\leq 1, and we have

Λt(ν)=ω1β𝟏N+𝒫ν(B)𝕐t\Lambda_{t}(\nu)=\frac{\omega}{1-\beta}\mathbf{1}_{N}+\mathcal{P}_{\nu}(B)\mathbb{Y}_{t}

with 𝒫ν(B):=αB1βBIN+ξB1βBW.\mathcal{P}_{\nu}(B):=\frac{\alpha B}{1-\beta B}I_{N}+\frac{\xi B}{1-\beta B}W. As λit(ν)=λit(ν0)\lambda_{it}(\nu)=\lambda_{it}(\nu_{0}) for each i=1,2,,Ni=1,2,...,N,

[𝒫ν(B)𝒫ν0(B)]𝕐t=(ω01β0ω1β)𝟏N.\left[\mathcal{P}_{\nu}(B)-\mathcal{P}_{\nu_{0}}(B)\right]\mathbb{Y}_{t}=\left(\frac{\omega_{0}}{1-\beta_{0}}-\frac{\omega}{1-\beta}\right)\mathbf{1}_{N}.

We can deduce from the above equation that 𝒫ν(x)=𝒫ν0(x)\mathcal{P}_{\nu}(x)=\mathcal{P}_{\nu_{0}}(x) for any |x|1|x|\leq 1; otherwise 𝕐t\mathbb{Y}_{t} would degenerate to a deterministic vector given t1\mathcal{H}_{t-1}. The identity 𝒫ν(x)=𝒫ν0(x)\mathcal{P}_{\nu}(x)=\mathcal{P}_{\nu_{0}}(x) implies that

αx1βxINα0x1β0xIN=(ξ0x1β0xξx1βx)W.\frac{\alpha x}{1-\beta x}I_{N}-\frac{\alpha_{0}x}{1-\beta_{0}x}I_{N}=\left(\frac{\xi_{0}x}{1-\beta_{0}x}-\frac{\xi x}{1-\beta x}\right)W.

The diagonal elements of WW are all zero, while the matrix on the left-hand side of the above equation is diagonal. Equating the diagonal and the off-diagonal entries on both sides (WW is not the zero matrix), we have

αx1βx=α0x1β0x,\displaystyle\frac{\alpha x}{1-\beta x}=\frac{\alpha_{0}x}{1-\beta_{0}x},
ξx1βx=ξ0x1β0x,\displaystyle\frac{\xi x}{1-\beta x}=\frac{\xi_{0}x}{1-\beta_{0}x},

which imply α=α0\alpha=\alpha_{0}, β=β0\beta=\beta_{0} and ξ=ξ0\xi=\xi_{0}: cross-multiplying the first identity gives αx(1β0x)=α0x(1βx)\alpha x(1-\beta_{0}x)=\alpha_{0}x(1-\beta x) for all |x|1|x|\leq 1, so α=α0\alpha=\alpha_{0} and αβ0=α0β\alpha\beta_{0}=\alpha_{0}\beta, whence β=β0\beta=\beta_{0}, and the second identity then yields ξ=ξ0\xi=\xi_{0}. Finally, ω=ω0\omega=\omega_{0} follows from ω1β=ω01β0\frac{\omega}{1-\beta}=\frac{\omega_{0}}{1-\beta_{0}}. ∎

With Claim A.4 and Claim A.5, we can apply Theorem 1 in Pan and Pan, (2024) and obtain that

[LNT(ν)𝔼LNT(ν)]𝑝0\left[L_{NT}(\nu)-\mathbb{E}L_{NT}(\nu)\right]\overset{p}{\to}0 (A.11)

for any νΘ×+\nu\in\Theta\times\mathbb{Z}_{+}. Therefore, we have

limT,N[LNT(ν)LNT(ν0)]\displaystyle\lim_{T,N\to\infty}[L_{NT}(\nu)-L_{NT}(\nu_{0})]
=\displaystyle= limT,N𝔼[LNT(ν)LNT(ν0)]\displaystyle\lim_{T,N\to\infty}\mathbb{E}\left[L_{NT}(\nu)-L_{NT}(\nu_{0})\right]
=\displaystyle= limT,N1NT(i,t)DNT𝔼[yitlogλit(ν)λit(ν0)(λit(ν)λit(ν0))]\displaystyle\lim_{T,N\to\infty}\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\mathbb{E}\left[y_{it}\log\frac{\lambda_{it}(\nu)}{\lambda_{it}(\nu_{0})}-(\lambda_{it}(\nu)-\lambda_{it}(\nu_{0}))\right]
=\displaystyle= limT,N1NT(i,t)DNT𝔼{𝔼[yitlogλit(ν)λit(ν0)(λit(ν)λit(ν0))|λit(ν),λit(ν0)]}\displaystyle\lim_{T,N\to\infty}\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\mathbb{E}\left\{\mathbb{E}\left[\left.y_{it}\log\frac{\lambda_{it}(\nu)}{\lambda_{it}(\nu_{0})}-(\lambda_{it}(\nu)-\lambda_{it}(\nu_{0}))\right|\lambda_{it}(\nu),\lambda_{it}(\nu_{0})\right]\right\} (A.12)
=\displaystyle= limT,N1NT(i,t)DNT𝔼[λit(ν0)logλit(ν)λit(ν0)(λit(ν)λit(ν0))]\displaystyle\lim_{T,N\to\infty}\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\mathbb{E}\left[\lambda_{it}(\nu_{0})\log\frac{\lambda_{it}(\nu)}{\lambda_{it}(\nu_{0})}-(\lambda_{it}(\nu)-\lambda_{it}(\nu_{0}))\right]
\displaystyle\leq limT,N1NT(i,t)DNT𝔼{λit(ν0)[λit(ν)λit(ν0)1](λit(ν)λit(ν0))}\displaystyle\lim_{T,N\to\infty}\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\mathbb{E}\left\{\lambda_{it}(\nu_{0})\left[\frac{\lambda_{it}(\nu)}{\lambda_{it}(\nu_{0})}-1\right]-(\lambda_{it}(\nu)-\lambda_{it}(\nu_{0}))\right\}
=\displaystyle= 0,\displaystyle 0,

with equality if and only if λit(ν)=λit(ν0)\lambda_{it}(\nu)=\lambda_{it}(\nu_{0}) for all (i,t)DNT(i,t)\in D_{NT}, which is equivalent to ν=ν0\nu=\nu_{0} by Claim A.6.
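The inequality in the final two steps of (A.12) is the elementary bound log⁡x ≤ x−1: for fixed λ0>0, the function f(λ)=λ0log(λ/λ0)−(λ−λ0) is non-positive and vanishes only at λ=λ0 (up to sign, it is the Kullback–Leibler divergence between Poisson(λ0) and Poisson(λ)). A quick numerical illustration in Python, with an arbitrary intensity and grid:

```python
import numpy as np

lam0 = 2.5                            # fixed "true" intensity lambda_it(nu_0)
lam = np.linspace(0.1, 10.0, 1001)    # candidate intensities lambda_it(nu)

# f(lam) = lam0*log(lam/lam0) - (lam - lam0), cf. the last two steps of (A.12)
f = lam0 * np.log(lam / lam0) - (lam - lam0)

print(f.max() <= 0.0)                 # True: f is non-positive everywhere
print(lam[np.argmax(f)])              # ~2.5: the maximum (zero) is at lam = lam0
```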

Note that Claim A.3 implies that

limT,N[|LNT(ν^NT)L~NT(ν^NT)|<δ3]=1\lim_{T,N\to\infty}\mathbb{P}\left[|L_{NT}(\hat{\nu}_{NT})-\tilde{L}_{NT}(\hat{\nu}_{NT})|<\frac{\delta}{3}\right]=1

for any δ>0\delta>0, hence

limT,N[LNT(ν^NT)>L~NT(ν^NT)δ3]=1.\lim_{T,N\to\infty}\mathbb{P}\left[L_{NT}(\hat{\nu}_{NT})>\tilde{L}_{NT}(\hat{\nu}_{NT})-\frac{\delta}{3}\right]=1.

Since ν^NT\hat{\nu}_{NT} maximizes L~NT(ν)\tilde{L}_{NT}(\nu), we have

limT,N[L~NT(ν^NT)>L~NT(ν0)δ3]=1.\lim_{T,N\to\infty}\mathbb{P}\left[\tilde{L}_{NT}(\hat{\nu}_{NT})>\tilde{L}_{NT}({\nu}_{0})-\frac{\delta}{3}\right]=1.

So

limT,N[LNT(ν^NT)>L~NT(ν0)2δ3]=1.\lim_{T,N\to\infty}\mathbb{P}\left[L_{NT}(\hat{\nu}_{NT})>\tilde{L}_{NT}({\nu}_{0})-\frac{2\delta}{3}\right]=1.

Furthermore, from Claim A.3,

limT,N[L~NT(ν0)>LNT(ν0)δ3]=1.\lim_{T,N\to\infty}\mathbb{P}\left[\tilde{L}_{NT}({\nu}_{0})>L_{NT}({\nu}_{0})-\frac{\delta}{3}\right]=1.

Therefore we have

limT,N[0LNT(ν0)LNT(ν^NT)<δ]=1.\lim_{T,N\to\infty}\mathbb{P}\left[0\leq L_{NT}(\nu_{0})-L_{NT}(\hat{\nu}_{NT})<\delta\right]=1. (A.13)

Let Vk(θ)V_{k}(\theta) be the open ball with centre θ\theta and radius 1/k1/k. By (A.12) we can find

δ=infθΘVk(θ0)rr0[LNT(ν0)LNT(ν)]>0.\delta=\inf_{\begin{subarray}{c}\theta\in\Theta\smallsetminus V_{k}(\theta_{0})\\ r\neq r_{0}\end{subarray}}\left[L_{NT}(\nu_{0})-L_{NT}(\nu)\right]>0.

Then by (A.13),

limT,N{0LNT(ν0)LNT(ν^NT)<infθΘVk(θ0)rr0[LNT(ν0)LNT(ν)]}=1.\displaystyle\lim_{T,N\to\infty}\mathbb{P}\left\{0\leq L_{NT}(\nu_{0})-L_{NT}(\hat{\nu}_{NT})<\inf_{\begin{subarray}{c}\theta\in\Theta\smallsetminus V_{k}(\theta_{0})\\ r\neq r_{0}\end{subarray}}\left[L_{NT}(\nu_{0})-L_{NT}(\nu)\right]\right\}=1.

This implies that

limT,N[θ^NTVk(θ0),r^NT=r0]=1\displaystyle\lim_{T,N\to\infty}\mathbb{P}\left[\hat{\theta}_{NT}\in V_{k}(\theta_{0}),\hat{r}_{NT}=r_{0}\right]=1

for any given k>0k>0. Letting kk\to\infty completes the proof.

A.3 Proof of Theorem 3

With the threshold parameter fixed at r=r0r=r_{0}, we write θ^NT:=θ^NT(r0)\hat{\theta}_{NT}:=\hat{\theta}_{NT}^{(r_{0})}, λit(θ):=λit(θ,r0)\lambda_{it}(\theta):=\lambda_{it}(\theta,r_{0}) and lit(θ):=lit(θ,r0)l_{it}(\theta):=l_{it}(\theta,r_{0}), etc., in the succeeding proofs for simplicity of notation. To prove asymptotic normality, we first derive some intermediate results on the first-, second- and third-order derivatives of λit(θ)\lambda_{it}(\theta).

Since

λit(θ)=k=1βk1[ω+(α(1)1{yi,tkr}+α(2)1{yi,tk<r})yi,tk+ξj=1Nwijyj,tk]\lambda_{it}(\theta)=\sum_{k=1}^{\infty}\beta^{k-1}\left[\omega+\left(\alpha^{(1)}1_{\{y_{i,t-k}\geq r\}}+\alpha^{(2)}1_{\{y_{i,t-k}<r\}}\right)y_{i,t-k}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right]

almost surely, the partial derivatives of λit(θ)\lambda_{it}(\theta) are

λit(θ)ω=k=1βk1,\displaystyle\frac{\partial\lambda_{it}(\theta)}{\partial\omega}=\sum_{k=1}^{\infty}\beta^{k-1},
λit(θ)α(1)=k=1βk1yi,tk1{yi,tkr},\displaystyle\frac{\partial\lambda_{it}(\theta)}{\partial\alpha^{(1)}}=\sum_{k=1}^{\infty}\beta^{k-1}y_{i,t-k}1_{\{y_{i,t-k}\geq r\}},
λit(θ)α(2)=k=1βk1yi,tk1{yi,tk<r},\displaystyle\frac{\partial\lambda_{it}(\theta)}{\partial\alpha^{(2)}}=\sum_{k=1}^{\infty}\beta^{k-1}y_{i,t-k}1_{\{y_{i,t-k}<r\}}, (A.14)
λit(θ)ξ=k=1βk1(j=1Nwijyj,tk),\displaystyle\frac{\partial\lambda_{it}(\theta)}{\partial\xi}=\sum_{k=1}^{\infty}\beta^{k-1}\left(\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right),
λit(θ)β=k=2(k1)βk2ui,tk(θ),\displaystyle\frac{\partial\lambda_{it}(\theta)}{\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta^{k-2}u_{i,t-k}(\theta),

where

ui,tk(θ)=ω+α(1)yi,tk1{yi,tkr}+α(2)yi,tk1{yi,tk<r}+ξj=1Nwijyj,tk.u_{i,t-k}(\theta)=\omega+\alpha^{(1)}y_{i,t-k}1_{\{y_{i,t-k}\geq r\}}+\alpha^{(2)}y_{i,t-k}1_{\{y_{i,t-k}<r\}}+\xi\sum_{j=1}^{N}w_{ij}y_{j,t-k}.
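As a sanity check on (A.14), one can truncate the infinite series for λit(θ) at a large lag K and compare the analytic derivative with a central finite difference. The following minimal Python sketch does this for the derivative with respect to β; all data and parameter values are hypothetical, and the network term ∑j wij yj,t−k is simulated directly rather than built from a weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
K, r = 400, 3                           # truncation lag and hypothetical threshold
y = rng.poisson(3.0, size=K)            # hypothetical own counts y_{i,t-k}, k = 1..K
ynb = rng.poisson(2.0, size=K)          # hypothetical network term sum_j w_ij y_{j,t-k}

def u(omega, a1, a2, xi):
    """u_{i,t-k}(theta) for k = 1..K, with the threshold coefficient alpha_{i,t-k}."""
    alpha = np.where(y >= r, a1, a2)
    return omega + alpha * y + xi * ynb

def lam(omega, a1, a2, xi, beta):
    """Truncated series representation of lambda_it(theta)."""
    k = np.arange(1, K + 1)
    return np.sum(beta ** (k - 1) * u(omega, a1, a2, xi))

def dlam_dbeta(omega, a1, a2, xi, beta):
    """Analytic derivative in (A.14): sum_{k>=2} (k-1) beta^{k-2} u_{i,t-k}(theta)."""
    k = np.arange(1, K + 1)
    return np.sum((k - 1) * beta ** (k - 2.0) * u(omega, a1, a2, xi))

theta, eps = (0.5, 0.3, 0.2, 0.1, 0.4), 1e-6
fd = (lam(*theta[:4], theta[4] + eps) - lam(*theta[:4], theta[4] - eps)) / (2 * eps)
print(np.isclose(fd, dlam_dbeta(*theta)))   # True, up to truncation/rounding error
```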

Note that

λit(θ)θλ~it(θ)θ=tβt1λi0(ν)𝐞5+βtλi0(θ)θ,\frac{\partial\lambda_{it}(\theta)}{\partial\theta}-\frac{\partial\tilde{\lambda}_{it}(\theta)}{\partial\theta}=t\beta^{t-1}\lambda_{i0}(\nu)\mathbf{e}_{5}+\beta^{t}\frac{\partial\lambda_{i0}(\theta)}{\partial\theta}, (A.15)

where 𝐞5=(0,0,0,0,1)\mathbf{e}_{5}=(0,0,0,0,1)^{\prime}.

For the second order derivatives, we get that, for any θm,θn{ω,α(1),α(2),ξ}\theta_{m},\theta_{n}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi\},

2λit(θ)θmθn=0,\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}=0,

and

2λit(θ)ωβ=k=2(k1)βk2,\displaystyle\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\omega\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta^{k-2},
2λit(θ)α(1)β=k=2(k1)βk2yi,tk1{yi,tkr},\displaystyle\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\alpha^{(1)}\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta^{k-2}y_{i,t-k}1_{\{y_{i,t-k}\geq r\}},
2λit(θ)α(2)β=k=2(k1)βk2yi,tk1{yi,tk<r},\displaystyle\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\alpha^{(2)}\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta^{k-2}y_{i,t-k}1_{\{y_{i,t-k}<r\}}, (A.16)
2λit(θ)ξβ=k=2(k1)βk2(j=1Nwijyj,tk),\displaystyle\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\xi\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta^{k-2}\left(\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right),
2λit(θ)β2=k=3(k1)(k2)βk3ui,tk(θ).\displaystyle\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\beta^{2}}=\sum_{k=3}^{\infty}(k-1)(k-2)\beta^{k-3}u_{i,t-k}(\theta).

We also have

2λit(ν)θθ2λ~it(ν)θθ=t(t1)βt2λi0(ν)𝐞5𝐞5+2tβt1λi0(ν)θ𝐞5+βt2λi0(ν)θθ,\frac{\partial^{2}\lambda_{it}(\nu)}{\partial\theta\partial\theta^{\prime}}-\frac{\partial^{2}\tilde{\lambda}_{it}(\nu)}{\partial\theta\partial\theta^{\prime}}=t(t-1)\beta^{t-2}\lambda_{i0}(\nu)\mathbf{e}_{5}\mathbf{e}_{5}^{\prime}+2t\beta^{t-1}\frac{\partial\lambda_{i0}(\nu)}{\partial\theta}\mathbf{e}_{5}^{\prime}+\beta^{t}\frac{\partial^{2}\lambda_{i0}(\nu)}{\partial\theta\partial\theta^{\prime}}, (A.17)

where, as before, 𝐞5=(0,0,0,0,1)\mathbf{e}_{5}=(0,0,0,0,1)^{\prime}.

For the third order derivatives of λit(θ)\lambda_{it}(\theta),

3λit(θ)ωβ2=k=3(k1)(k2)βk3,\displaystyle\frac{\partial^{3}\lambda_{it}(\theta)}{\partial\omega\partial\beta^{2}}=\sum_{k=3}^{\infty}(k-1)(k-2)\beta^{k-3},
3λit(θ)α(1)β2=k=3(k1)(k2)βk3yi,tk1{yi,tkr},\displaystyle\frac{\partial^{3}\lambda_{it}(\theta)}{\partial\alpha^{(1)}\partial\beta^{2}}=\sum_{k=3}^{\infty}(k-1)(k-2)\beta^{k-3}y_{i,t-k}1_{\{y_{i,t-k}\geq r\}},
3λit(θ)α(2)β2=k=3(k1)(k2)βk3yi,tk1{yi,tk<r},\displaystyle\frac{\partial^{3}\lambda_{it}(\theta)}{\partial\alpha^{(2)}\partial\beta^{2}}=\sum_{k=3}^{\infty}(k-1)(k-2)\beta^{k-3}y_{i,t-k}1_{\{y_{i,t-k}<r\}}, (A.18)
3λit(θ)ξβ2=k=3(k1)(k2)βk3(j=1Nwijyj,tk),\displaystyle\frac{\partial^{3}\lambda_{it}(\theta)}{\partial\xi\partial\beta^{2}}=\sum_{k=3}^{\infty}(k-1)(k-2)\beta^{k-3}\left(\sum_{j=1}^{N}w_{ij}y_{j,t-k}\right),
3λit(θ)β3=k=4(k1)(k2)(k3)βk4ui,tk(θ).\displaystyle\frac{\partial^{3}\lambda_{it}(\theta)}{\partial\beta^{3}}=\sum_{k=4}^{\infty}(k-1)(k-2)(k-3)\beta^{k-4}u_{i,t-k}(\theta).
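The series in (A.14), (A.16) and (A.18) are dominated by derivatives of a geometric series, whose closed forms are used repeatedly in the bounds below; they are easy to check numerically, e.g. with an arbitrary β ∈ (0,1):

```python
import numpy as np

beta = 0.4
k = np.arange(1, 2000, dtype=float)

# sum_{k>=1} beta^{k-1} = 1/(1-beta), cf. (A.14)
print(np.isclose(np.sum(beta ** (k - 1)), 1 / (1 - beta)))
# sum_{k>=2} (k-1) beta^{k-2} = 1/(1-beta)^2, cf. (A.16)
print(np.isclose(np.sum((k - 1) * beta ** (k - 2)), 1 / (1 - beta) ** 2))
# sum_{k>=3} (k-1)(k-2) beta^{k-3} = 2/(1-beta)^3, cf. (A.18)
print(np.isclose(np.sum((k - 1) * (k - 2) * beta ** (k - 3)), 2 / (1 - beta) ** 3))
```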

Based on the consistency of θ^NT\hat{\theta}_{NT}, we are now ready to prove asymptotic normality. We split the proof into Claim A.7 to Claim A.10 below.

Claim A.7.

For any θm{ω,α(1),α(2),ξ,β}\theta_{m}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta\}, NT|L~NT(θ0)θmLNT(θ0)θm|𝑝0\sqrt{NT}\left|\frac{\partial\tilde{L}_{NT}(\theta_{0})}{\partial\theta_{m}}-\frac{\partial L_{NT}(\theta_{0})}{\partial\theta_{m}}\right|\overset{p}{\to}0 as min{N,T}\min\left\{N,T\right\}\to\infty and TN\frac{T}{N}\to\infty.

Proof.

Note that

{LNT(θ)θ=1NT(i,t)DNT(yitλit(θ)1)λit(θ)θ,λit(θ)θ=𝐡i,t1+βλi,t1(θ)θ,\left\{\begin{aligned} &\frac{\partial L_{NT}(\theta)}{\partial\theta}=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left(\frac{y_{it}}{\lambda_{it}(\theta)}-1\right)\frac{\partial\lambda_{it}(\theta)}{\partial\theta},\\ &\frac{\partial\lambda_{it}(\theta)}{\partial\theta}=\mathbf{h}_{i,t-1}+\beta\frac{\partial\lambda_{i,t-1}(\theta)}{\partial\theta},\end{aligned}\right. (A.19)

where

𝐡i,t1:=(1,yi,t11{yi,t1r},yi,t11{yi,t1<r},j=1Nwijyj,t1,λi,t1),\mathbf{h}_{i,t-1}:=\left(1,y_{i,t-1}1_{\{y_{i,t-1}\geq r\}},y_{i,t-1}1_{\{y_{i,t-1}<r\}},\sum_{j=1}^{N}w_{ij}y_{j,t-1},\lambda_{i,t-1}\right)^{\prime},

and similarly

{L~NT(θ)θ=1NT(i,t)DNT(yitλ~it(θ)1)λ~it(θ)θ,λ~it(θ)θ=𝐡~i,t1+βλ~i,t1(θ)θ.\left\{\begin{aligned} &\frac{\partial\tilde{L}_{NT}(\theta)}{\partial\theta}=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left(\frac{y_{it}}{\tilde{\lambda}_{it}(\theta)}-1\right)\frac{\partial\tilde{\lambda}_{it}(\theta)}{\partial\theta},\\ &\frac{\partial\tilde{\lambda}_{it}(\theta)}{\partial\theta}=\tilde{\mathbf{h}}_{i,t-1}+\beta\frac{\partial\tilde{\lambda}_{i,t-1}(\theta)}{\partial\theta}.\end{aligned}\right. (A.20)

Therefore, we have

NT|L~NT(θ0)βLNT(θ0)β|\displaystyle\sqrt{NT}\left|\frac{\partial\tilde{L}_{NT}(\theta_{0})}{\partial\beta}-\frac{\partial L_{NT}(\theta_{0})}{\partial\beta}\right|
\displaystyle\leq 1NT(i,t)DNT|yit[λit(θ0)λ~it(θ0)λ~it(θ0)λit(θ0)λ~it(θ0)β\displaystyle\frac{1}{\sqrt{NT}}\sum_{(i,t)\in D_{NT}}\left|y_{it}\left[\frac{\lambda_{it}(\theta_{0})-\tilde{\lambda}_{it}(\theta_{0})}{\tilde{\lambda}_{it}(\theta_{0})\lambda_{it}(\theta_{0})}\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}\right.\right.
+1λit(θ0)(λ~it(θ0)βλit(θ0)β)](λ~it(θ0)βλit(θ0)β)|\displaystyle\left.\left.+\frac{1}{\lambda_{it}(\theta_{0})}\left(\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}\right)\right]-\left(\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}\right)\right|
\displaystyle\leq 1NT(i,t)DNTyitω02|λit(θ0)λ~it(θ0)||λ~it(θ0)β|\displaystyle\frac{1}{\sqrt{NT}}\sum_{(i,t)\in D_{NT}}\frac{y_{it}}{\omega_{0}^{2}}\left|\lambda_{it}(\theta_{0})-\tilde{\lambda}_{it}(\theta_{0})\right|\left|\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}\right|
+1NT(i,t)DNT(yitω0+1)|λit(θ0)βλ~it(θ0)β|.\displaystyle+\frac{1}{\sqrt{NT}}\sum_{(i,t)\in D_{NT}}\left(\frac{y_{it}}{\omega_{0}}+1\right)\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}\right|.

But, by Assumption 3.2(a) and (A.8), we have

1NT(i,t)DNTyitω02|λit(θ0)λ~it(θ0)||λ~it(θ0)β|1\displaystyle\left\lVert\frac{1}{\sqrt{NT}}\sum_{(i,t)\in D_{NT}}\frac{y_{it}}{\omega_{0}^{2}}\left|\lambda_{it}(\theta_{0})-\tilde{\lambda}_{it}(\theta_{0})\right|\left|\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}\right|\right\rVert_{1}
\displaystyle\leq C1NTi=1Nt=1Tβ0tyit1\displaystyle\frac{C_{1}}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\beta_{0}^{t}\left\lVert y_{it}\right\rVert_{1} (A.21)
\displaystyle\leq C2NTi=1Nβ01β00\displaystyle\frac{C_{2}}{\sqrt{NT}}\sum_{i=1}^{N}\frac{\beta_{0}}{1-\beta_{0}}\rightarrow 0

when min{N,T}\min\left\{N,T\right\}\to\infty and T/NT/N\to\infty. Then, in view of (A.15),

1NT(i,t)DNT(yitω0+1)|λit(θ0)βλ~it(θ0)β|1\displaystyle\left\lVert\frac{1}{\sqrt{NT}}\sum_{(i,t)\in D_{NT}}\left(\frac{y_{it}}{\omega_{0}}+1\right)\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\tilde{\lambda}_{it}(\theta_{0})}{\partial\beta}\right|\right\rVert_{1}
\displaystyle\leq C1NTi=1Nt=1Ttβ0t1yitω0+11+C2NTi=1Nt=1Tβ0tyitω0+11\displaystyle\frac{C_{1}}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}t\beta_{0}^{t-1}\left\lVert\frac{y_{it}}{\omega_{0}}+1\right\rVert_{1}+\frac{C_{2}}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\beta_{0}^{t}\left\lVert\frac{y_{it}}{\omega_{0}}+1\right\rVert_{1} (A.22)
\displaystyle\leq C3NTi=1Nt=1Ttβ0t1+C4NTi=1Nt=1Tβ0t\displaystyle\frac{C_{3}}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}t\beta_{0}^{t-1}+\frac{C_{4}}{\sqrt{NT}}\sum_{i=1}^{N}\sum_{t=1}^{T}\beta_{0}^{t}
\displaystyle\leq C3NTi=1N1(1β0)2+C4NTi=1Nβ01β00\displaystyle\frac{C_{3}}{\sqrt{NT}}\sum_{i=1}^{N}\frac{1}{(1-\beta_{0})^{2}}+\frac{C_{4}}{\sqrt{NT}}\sum_{i=1}^{N}\frac{\beta_{0}}{1-\beta_{0}}\rightarrow 0

when min{N,T}\min\left\{N,T\right\}\to\infty and T/NT/N\to\infty. In light of (A.21) and (A.22), we can prove that

NT|L~NT(θ0)βLNT(θ0)β|𝑝0.\sqrt{NT}\left|\frac{\partial\tilde{L}_{NT}(\theta_{0})}{\partial\beta}-\frac{\partial L_{NT}(\theta_{0})}{\partial\beta}\right|\overset{p}{\to}0.

The proofs regarding partial derivatives w.r.t. ω\omega, α(1)\alpha^{(1)}, α(2)\alpha^{(2)} and ξ\xi follow similar arguments and are therefore omitted here. ∎
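For concreteness, the recursions in (A.19) that underlie this proof can be implemented directly. The sketch below is purely illustrative, assuming hypothetical data, equal row-normalised weights and a crude stationary initialisation of λi,1 and its gradient; it illustrates the recursions only and is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, r = 10, 500, 3
W = (np.ones((N, N)) - np.eye(N)) / (N - 1)   # hypothetical row-normalised weights
y = rng.poisson(3.0, size=(T, N))             # hypothetical panel of counts y_it

def score(theta):
    """Score dL_NT(theta)/dtheta via (A.19); theta = (omega, a1, a2, xi, beta)."""
    omega, a1, a2, xi, beta = theta
    lam = np.full(N, omega / (1.0 - beta))    # crude initialisation of lambda_{i,1}
    dlam = np.zeros((N, 5))                   # d lambda_{i,1} / d theta
    s = np.zeros(5)
    for t in range(1, T):
        yp = y[t - 1]                                       # y_{i,t-1}
        h = np.stack([np.ones(N), yp * (yp >= r), yp * (yp < r),
                      W @ yp, lam], axis=1)                 # h_{i,t-1}
        dlam = h + beta * dlam                              # d lambda_it / d theta
        alpha = np.where(yp >= r, a1, a2)
        lam = omega + alpha * yp + xi * (W @ yp) + beta * lam   # lambda_it
        s += ((y[t] / lam - 1.0)[:, None] * dlam).sum(axis=0)
    return s / (N * (T - 1))

print(score((0.5, 0.3, 0.2, 0.1, 0.4)))
```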

Claim A.8.

For any θm,θn{ω,α(1),α(2),ξ,β}\theta_{m},\theta_{n}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta\}, sup|θθ0|<ξ|2L~NT(θ)θmθn2LNT(θ0)θmθn|=𝒪p(ξ)\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{\partial^{2}\tilde{L}_{NT}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|=\mathcal{O}_{p}(\xi) as min{N,T}\min\left\{N,T\right\}\to\infty and T/NT/N\to\infty.

Proof.

For any θm,θn{ω,α(1),α(2),ξ,β}\theta_{m},\theta_{n}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta\}, we have

2LNT(θ)θmθn\displaystyle\frac{\partial^{2}L_{NT}(\theta)}{\partial\theta_{m}\partial\theta_{n}} (A.23)
=\displaystyle= 1NTi=1Nt=1T[(yitλit(θ)1)2λit(θ)θmθnyitλit2(θ)λit(θ)θmλit(θ)θn],\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left[\left(\frac{y_{it}}{\lambda_{it}(\theta)}-1\right)\frac{\partial^{2}\lambda_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{y_{it}}{\lambda_{it}^{2}(\theta)}\frac{\partial\lambda_{it}(\theta)}{\partial\theta_{m}}\frac{\partial\lambda_{it}(\theta)}{\partial\theta_{n}}\right],

and

2L~NT(θ)θmθn\displaystyle\frac{\partial^{2}\tilde{L}_{NT}(\theta)}{\partial\theta_{m}\partial\theta_{n}} (A.24)
=\displaystyle= 1NTi=1Nt=1T[(yitλ~it(θ)1)2λ~it(θ)θmθnyitλ~it2(θ)λ~it(θ)θmλ~it(θ)θn].\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left[\left(\frac{y_{it}}{\tilde{\lambda}_{it}(\theta)}-1\right)\frac{\partial^{2}\tilde{\lambda}_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{y_{it}}{\tilde{\lambda}_{it}^{2}(\theta)}\frac{\partial\tilde{\lambda}_{it}(\theta)}{\partial\theta_{m}}\frac{\partial\tilde{\lambda}_{it}(\theta)}{\partial\theta_{n}}\right].

Note that

sup|θθ0|<ξ|2L~NT(θ)θmθn2LNT(θ0)θmθn|\displaystyle\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{\partial^{2}\tilde{L}_{NT}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right| (A.25)
\displaystyle\leq 1NTi=1Nt=1TsupθΘ|2l~it(θ)θmθn2lit(θ)θmθn|\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\sup_{\theta\in\Theta}\left|\frac{\partial^{2}\tilde{l}_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}\right|
+1NTi=1Nt=1Tsup|θθ0|<ξ|2lit(θ)θmθn2lit(θ0)θmθn|.\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{\partial^{2}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|.

We handle these two terms separately below.

For the first term on the right-hand side of (A.25), we see

1NTi=1Nt=1TsupθΘ|2l~it(θ)θmθn2lit(θ)θmθn|1\displaystyle\left\lVert\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\sup_{\theta\in\Theta}\left|\frac{\partial^{2}\tilde{l}_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}\right|\right\rVert_{1}
\displaystyle\leq 1NTi=1Nt=1TyitsupθΘ(1λit1λ~it)supθΘ2λitθmθn1\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left\lVert y_{it}\sup_{\theta\in\Theta}\left(\frac{1}{\lambda_{it}}-\frac{1}{\tilde{\lambda}_{it}}\right)\sup_{\theta\in\Theta}\frac{\partial^{2}\lambda_{it}}{\partial\theta_{m}\partial\theta_{n}}\right\rVert_{1}
+1NTi=1Nt=1TsupθΘ(yitλ~it1)supθΘ(2λitθmθn2λ~itθmθn)1\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left\lVert\sup_{\theta\in\Theta}\left(\frac{y_{it}}{\tilde{\lambda}_{it}}-1\right)\sup_{\theta\in\Theta}\left(\frac{\partial^{2}\lambda_{it}}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}\tilde{\lambda}_{it}}{\partial\theta_{m}\partial\theta_{n}}\right)\right\rVert_{1}
+1NTi=1Nt=1TyitsupθΘ(λit2λ~it21)supθΘ1λit2λitθmλitθn1\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left\lVert y_{it}\sup_{\theta\in\Theta}\left(\frac{\lambda^{2}_{it}}{\tilde{\lambda}^{2}_{it}}-1\right)\sup_{\theta\in\Theta}\frac{1}{\lambda_{it}^{2}}\frac{\partial\lambda_{it}}{\partial\theta_{m}}\frac{\partial\lambda_{it}}{\partial\theta_{n}}\right\rVert_{1} (A.26)
+1NTi=1Nt=1TsupθΘyitλ~it2[λ~itθm(λ~itθnλitθn)]1\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left\lVert\sup_{\theta\in\Theta}\frac{y_{it}}{\tilde{\lambda}^{2}_{it}}\left[\frac{\partial\tilde{\lambda}_{it}}{\partial\theta_{m}}\left(\frac{\partial\tilde{\lambda}_{it}}{\partial\theta_{n}}-\frac{\partial\lambda_{it}}{\partial\theta_{n}}\right)\right]\right\rVert_{1}
+1NTi=1Nt=1TsupθΘyitλ~it2[λitθn(λ~itθmλitθm)]1\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left\lVert\sup_{\theta\in\Theta}\frac{y_{it}}{\tilde{\lambda}^{2}_{it}}\left[\frac{\partial\lambda_{it}}{\partial\theta_{n}}\left(\frac{\partial\tilde{\lambda}_{it}}{\partial\theta_{m}}-\frac{\partial\lambda_{it}}{\partial\theta_{m}}\right)\right]\right\rVert_{1}
:=\displaystyle:= T1+T2+T3+T4+T5T_{1}+T_{2}+T_{3}+T_{4}+T_{5}.

By arguments analogous to the proof of (A.21), we can show that T10T_{1}\to 0 and T30T_{3}\to 0 as min{N,T}\min\{N,T\}\to\infty and T/NT/N\to\infty. Using (A.17), we can also verify that

T21NTi=1Nt=1T[C1t(t1)ρt2+C2tρt1+C3ρt]supθΘ(yitλ~it1)1.T_{2}\leq\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}[C_{1}t(t-1)\rho^{t-2}+C_{2}t\rho^{t-1}+C_{3}\rho^{t}]\left\lVert\sup_{\theta\in\Theta}\left(\frac{y_{it}}{\tilde{\lambda}_{it}}-1\right)\right\rVert_{1}.

Then T20T_{2}\rightarrow 0 as well. Similarly, using (A.15), we obtain that T40T_{4}\rightarrow 0 and T50T_{5}\rightarrow 0.

For the second term on the right-hand side of (A.25), we notice that a Taylor expansion of 2lit(θ)θmθn\frac{\partial^{2}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}} at θ0\theta_{0} yields that

1NTi=1Nt=1Tsup|θθ0|<ξ|2lit(θ)θmθn2lit(θ0)θmθn|\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{\partial^{2}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|
\displaystyle\leq 1NTi=1Nt=1Tξsup|θθ0|<ξ|3lit(θ)θmθnθl|\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{\partial^{3}l_{it}(\theta)}{\partial\theta_{m}\partial\theta_{n}\partial\theta_{l}}\right|
\displaystyle\leq 1NTi=1Nt=1Tξsup|θθ0|<ξ|yitλit1||3λitθmθnθl|\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{y_{it}}{\lambda_{it}}-1\right|\left|\frac{\partial^{3}\lambda_{it}}{\partial\theta_{m}\partial\theta_{n}\partial\theta_{l}}\right|
+1NTi=1Nt=1Tξsup|θθ0|<ξ|2yitλit3||λitθlλitθmλitθn|\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{2y_{it}}{\lambda_{it}^{3}}\right|\left|\frac{\partial\lambda_{it}}{\partial\theta_{l}}\frac{\partial\lambda_{it}}{\partial\theta_{m}}\frac{\partial\lambda_{it}}{\partial\theta_{n}}\right| (A.27)
+1NTi=1Nt=1Tξsup|θθ0|<ξ|yitλit2||λitθl2λitθmθn|\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{y_{it}}{\lambda_{it}^{2}}\right|\left|\frac{\partial\lambda_{it}}{\partial\theta_{l}}\frac{\partial^{2}\lambda_{it}}{\partial\theta_{m}\partial\theta_{n}}\right|
+1NTi=1Nt=1Tξsup|θθ0|<ξ|yitλit2||λitθn2λitθlθm|\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{y_{it}}{\lambda_{it}^{2}}\right|\left|\frac{\partial\lambda_{it}}{\partial\theta_{n}}\frac{\partial^{2}\lambda_{it}}{\partial\theta_{l}\partial\theta_{m}}\right|
+1NTi=1Nt=1Tξsup|θθ0|<ξ|yitλit2||λitθm2λitθnθl|\displaystyle+\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\xi\sup_{|\theta-\theta_{0}|<\xi}\left|\frac{y_{it}}{\lambda_{it}^{2}}\right|\left|\frac{\partial\lambda_{it}}{\partial\theta_{m}}\frac{\partial^{2}\lambda_{it}}{\partial\theta_{n}\partial\theta_{l}}\right|
:=B1+B2+B3+B4+B5\displaystyle:=B_{1}+B_{2}+B_{3}+B_{4}+B_{5}

for any θl,θm,θn{ω,α(1),α(2),ξ,β}\theta_{l},\theta_{m},\theta_{n}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta\}. According to Assumption 3.2(a) and (A.18), we can verify that

𝔼|yitλit1||3λitθmθnθl|<,\mathbb{E}\left|\frac{y_{it}}{\lambda_{it}}-1\right|\left|\frac{\partial^{3}\lambda_{it}}{\partial\theta_{m}\partial\theta_{n}\partial\theta_{l}}\right|<\infty,

hence B1=𝒪p(ξ)B_{1}=\mathcal{O}_{p}(\xi). The other terms can be bounded along similar lines, in light of (A.14) and (A.16).

Substituting (A.26) and (A.27) into (A.25), we complete the proof. ∎

Claim A.9.
  • (a)

    supNT1sup(i,t)DNTlit(θ0)θ2p<\sup_{NT\geq 1}\sup_{(i,t)\in D_{NT}}\left\lVert\frac{\partial l_{it}(\theta_{0})}{\partial\theta}\right\rVert_{2p}<\infty for some p>1p>1;

  • (b)

    For each 𝐯5\mathbf{v}\in\mathbb{R}^{5} such that |𝐯|=1|\mathbf{v}|=1, {𝐯lit(θ0)θ:(i,t)DNT,NT1}\left\{\mathbf{v}^{\prime}\frac{\partial l_{it}(\theta_{0})}{\partial\theta}:(i,t)\in D_{NT},NT\geq 1\right\} are η\eta-weakly dependent, with dependence coefficient η¯1(s)Csμ1\bar{\eta}_{1}(s)\leq Cs^{-\mu_{1}} where μ1>42p1p1\mu_{1}>4\vee\frac{2p-1}{p-1}.

Proof.

Recall that, from (A.19),

lit(θ0)θ=yitλit(θ0)λit(θ0)θλit(θ0)θ.\frac{\partial l_{it}(\theta_{0})}{\partial\theta}=\frac{y_{it}}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}-\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}.

From Assumption 3.2 and Lemma A.1, it is easy to see that (a) holds.

Now we verify (b). In the proof of Claim A.5, for each (i,t)DNT(i,t)\in D_{NT} and h=1,2,h=1,2,..., we defined {yjτ(h):(j,τ)DNT,NT1}\{y_{j\tau}^{(h)}:(j,\tau)\in D_{NT},NT\geq 1\} such that yjτ(h)yjτy_{j\tau}^{(h)}\neq y_{j\tau} if and only if ρ((i,t),(j,τ))=h\rho((i,t),(j,\tau))=h. We first verify that lit(θ0)β\frac{\partial l_{it}(\theta_{0})}{\partial\beta} satisfies condition (2.7) in Pan and Pan, (2024). Note that

|lit(θ0)βlit(h)(θ0)β|\displaystyle\left|\frac{\partial l_{it}(\theta_{0})}{\partial\beta}-\frac{\partial l_{it}^{(h)}(\theta_{0})}{\partial\beta}\right| (A.28)
\displaystyle\leq yit|1λit(θ0)λit(θ0)β1λit(h)(θ0)λit(h)(θ0)β|+|λit(θ0)βλit(h)(θ0)β|\displaystyle y_{it}\left|\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{1}{\lambda_{it}^{(h)}(\theta_{0})}\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\beta}\right|+\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\beta}\right|
\displaystyle\leq |yitω0+1||λit(θ0)βλit(h)(θ0)β|+yitω02|λit(h)(θ0)β||λit(θ0)λit(h)(θ0)|\displaystyle\left|\frac{y_{it}}{\omega_{0}}+1\right|\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\beta}\right|+\frac{y_{it}}{\omega_{0}^{2}}\left|\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\beta}\right|\left|\lambda_{it}(\theta_{0})-\lambda_{it}^{(h)}(\theta_{0})\right|

and

λit(θ0)β=k=2(k1)β0k2ui,tk(θ0),\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}=\sum_{k=2}^{\infty}(k-1)\beta_{0}^{k-2}u_{i,t-k}(\theta_{0}),

where

ui,tk(θ0)=ω0+α0(1)yi,tk1{yi,tkr0}+α0(2)yi,tk1{yi,tk<r0}+ξ0j=1Nwijyj,tk.u_{i,t-k}(\theta_{0})=\omega_{0}+\alpha^{(1)}_{0}y_{i,t-k}1_{\{y_{i,t-k}\geq r_{0}\}}+\alpha^{(2)}_{0}y_{i,t-k}1_{\{y_{i,t-k}<r_{0}\}}+\xi_{0}\sum_{j=1}^{N}w_{ij}y_{j,t-k}.

Following arguments analogous to those in (A.10), we obtain that

|λit(θ0)βλit(h)(θ0)β|\displaystyle\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\beta}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\beta}\right|\leq α0(h1)β0h2|yi,thyi,th(h)|\displaystyle\alpha^{*}_{0}(h-1)\beta_{0}^{h-2}|y_{i,t-h}-y_{i,t-h}^{(h)}| (A.29)
+ξ0(h1)β0h21|ij|h|yj,thyj,th(h)|\displaystyle+\xi_{0}(h-1)\beta_{0}^{h-2}\sum_{1\leq|i-j|\leq h}|y_{j,t-h}-y_{j,t-h}^{(h)}|
+Chbk=2h|yi±h,tkyi±h,tk(h)|.\displaystyle+Ch^{-b}\sum_{k=2}^{h}|y_{i\pm h,t-k}-y_{i\pm h,t-k}^{(h)}|.

Combining (A.10), (A.28) and (A.29), we can verify that lit(θ0)β\frac{\partial l_{it}(\theta_{0})}{\partial\beta} satisfies condition (2.7) in Pan and Pan, (2024) with B(i,t),NT(h)ChbB_{(i,t),NT}(h)\leq Ch^{-b} and l=1l=1. Partial derivatives of lit(θ0)l_{it}(\theta_{0}) with respect to the other parameters in θ0\theta_{0} follow similarly. Therefore 𝐯lit(θ0)θ\mathbf{v}^{\prime}\frac{\partial l_{it}(\theta_{0})}{\partial\theta} satisfies condition (2.7) in Pan and Pan, (2024) with B(i,t),NT(h)ChbB_{(i,t),NT}(h)\leq Ch^{-b} and l=1l=1 for each 𝐯5\mathbf{v}\in\mathbb{R}^{5}.

According to Proposition 2 and Example 2.1 in Pan and Pan, (2024), the array of random fields {𝐯lit(θ0)θ:(i,t)DNT,NT1}\{\mathbf{v}^{\prime}\frac{\partial l_{it}(\theta_{0})}{\partial\theta}:(i,t)\in D_{NT},NT\geq 1\} is η\eta-weakly dependent with coefficient η¯1(s)Cs2p22p1μy+2\bar{\eta}_{1}(s)\leq Cs^{-\frac{2p-2}{2p-1}\mu_{y}+2}. But 2p22p1μy2>42p1p1\frac{2p-2}{2p-1}\mu_{y}-2>4\vee\frac{2p-1}{p-1} because μy>6p3p1(4p3)(2p1)2(p1)2\mu_{y}>\frac{6p-3}{p-1}\vee\frac{(4p-3)(2p-1)}{2(p-1)^{2}}. So (b) is verified. ∎

Claim A.10.
  • (a)

    supNT1sup(i,t)DNT2lit(θ0)θθp<\sup_{NT\geq 1}\sup_{(i,t)\in D_{NT}}\left\lVert\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}\right\rVert_{p}<\infty for some p>1p>1;

  • (b)

For all θm,θn{ω,α(1),α(2),ξ,β}\theta_{m},\theta_{n}\in\{\omega,\alpha^{(1)},\alpha^{(2)},\xi,\beta\}, {2lit(θ0)θmθn:(i,t)DNT,NT1}\left\{\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}:(i,t)\in D_{NT},NT\geq 1\right\} are η\eta-weakly dependent, with dependence coefficient η¯2(s)Csμ2\bar{\eta}_{2}(s)\leq Cs^{-\mu_{2}} where μ2>2\mu_{2}>2.

Proof.

Recall that, from (A.23),

2lit(θ0)θmθn=(yitλit(θ0)1)2λit(θ0)θmθnyitλit2(θ0)λit(θ0)θmλit(θ0)θn.\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}=\left(\frac{y_{it}}{\lambda_{it}(\theta_{0})}-1\right)\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}-\frac{y_{it}}{\lambda_{it}^{2}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{m}}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{n}}.

Then Claim A.10(a) follows directly from Assumption 3.2(a).

Following the same idea as in previous proofs, for each (i,t)DNT(i,t)\in D_{NT} and h=1,2,h=1,2,..., we define {yjτ(h):(j,τ)DNT,NT1}\{y_{j\tau}^{(h)}:(j,\tau)\in D_{NT},NT\geq 1\} such that yjτ(h)yjτy_{j\tau}^{(h)}\neq y_{j\tau} if and only if ρ((i,t),(j,τ))=h\rho((i,t),(j,\tau))=h. To prove (b), we verify that 2lit(θ0)θmθn\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}} satisfies condition (2.7) in Pan and Pan, (2024). Firstly we have

|2lit(θ0)θmθn2lit(h)(θ0)θmθn|\displaystyle\left|\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}l_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|
\displaystyle\leq |yitλit(θ0)+1||2λit(θ0)θmθn2λit(h)(θ0)θmθn|+yit|2λit(h)(θ0)θmθn||1λit(θ0)1λit(h)(θ0)|\displaystyle\left|\frac{y_{it}}{\lambda_{it}(\theta_{0})}+1\right|\left|\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|+y_{it}\left|\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|\left|\frac{1}{\lambda_{it}(\theta_{0})}-\frac{1}{\lambda_{it}^{(h)}(\theta_{0})}\right|
+yitλit2(θ0)|λit(θ0)θmλit(θ0)θnλit(h)(θ0)θmλit(h)(θ0)θn|\displaystyle+\frac{y_{it}}{\lambda_{it}^{2}(\theta_{0})}\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{m}}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{n}}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}}\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|
+|λit(h)(θ0)θmλit(h)(θ0)θn||yitλit2(θ0)yit(λit(h)(θ0))2|\displaystyle+\left|\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}}\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|\left|\frac{y_{it}}{\lambda_{it}^{2}(\theta_{0})}-\frac{y_{it}}{(\lambda_{it}^{(h)}(\theta_{0}))^{2}}\right| (A.30)
\displaystyle\leq |yitλit(θ0)+1||2λit(θ0)θmθn2λit(h)(θ0)θmθn|\displaystyle\left|\frac{y_{it}}{\lambda_{it}(\theta_{0})}+1\right|\left|\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|
+yitλit(θ0)λit(h)(θ0)|2λit(h)(θ0)θmθn||λit(θ0)λit(h)(θ0)|\displaystyle+\frac{y_{it}}{\lambda_{it}(\theta_{0})\lambda_{it}^{(h)}(\theta_{0})}\left|\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|\left|\lambda_{it}(\theta_{0})-\lambda_{it}^{(h)}(\theta_{0})\right|
+yitλit2(θ0)|λit(θ0)θm||λit(θ0)θnλit(h)(θ0)θn|\displaystyle+\frac{y_{it}}{\lambda_{it}^{2}(\theta_{0})}\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{m}}\right|\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{n}}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|
+yitλit2(θ0)|λit(h)(θ0)θn||λit(θ0)θmλit(h)(θ0)θm|\displaystyle+\frac{y_{it}}{\lambda_{it}^{2}(\theta_{0})}\left|\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{m}}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}}\right|
+yitλit(θ0)λit(h)(θ0)|λit(h)(θ0)θmλit(h)(θ0)θn||1λit(θ0)+1λit(h)(θ0)||λit(θ0)λit(h)(θ0)|\displaystyle+\frac{y_{it}}{\lambda_{it}(\theta_{0})\lambda_{it}^{(h)}(\theta_{0})}\left|\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}}\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|\left|\frac{1}{\lambda_{it}(\theta_{0})}+\frac{1}{\lambda_{it}^{(h)}(\theta_{0})}\right|\left|\lambda_{it}(\theta_{0})-\lambda_{it}^{(h)}(\theta_{0})\right|
\displaystyle\leq (yitω0+1)|2λit(θ0)θmθn2λit(h)(θ0)θmθn|+C1yitω02|λit(θ0)λit(h)(θ0)|\displaystyle\left(\frac{y_{it}}{\omega_{0}}+1\right)\left|\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}-\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}\right|+C_{1}\frac{y_{it}}{\omega_{0}^{2}}\left|\lambda_{it}(\theta_{0})-\lambda_{it}^{(h)}(\theta_{0})\right|
+C2yitω02|λit(θ0)θnλit(h)(θ0)θn|+C3yitω02|λit(θ0)θmλit(h)(θ0)θm|\displaystyle+C_{2}\frac{y_{it}}{\omega_{0}^{2}}\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{n}}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{n}}\right|+C_{3}\frac{y_{it}}{\omega_{0}^{2}}\left|\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta_{m}}-\frac{\partial\lambda_{it}^{(h)}(\theta_{0})}{\partial\theta_{m}}\right|
+C4yitω03|λit(θ0)λit(h)(θ0)|.\displaystyle+C_{4}\frac{y_{it}}{\omega_{0}^{3}}\left|\lambda_{it}(\theta_{0})-\lambda_{it}^{(h)}(\theta_{0})\right|.

Taking the second-order derivative with respect to ξ\xi and β\beta as an example, by arguments analogous to (A.10) and (A.29) we get

|2λit(θ0)ξβ2λit(h)(θ0)ξβ|\displaystyle\left|\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\xi\partial\beta}-\frac{\partial^{2}\lambda_{it}^{(h)}(\theta_{0})}{\partial\xi\partial\beta}\right|
\displaystyle\leq k=2(k1)β0k2|j=1Nwijyj,tkj=1Nwijyj,tk(h)|\sum_{k=2}^{\infty}(k-1)\beta_{0}^{k-2}\left|\sum_{j=1}^{N}w_{ij}y_{j,t-k}-\sum_{j=1}^{N}w_{ij}y_{j,t-k}^{(h)}\right| (A.31)
\displaystyle\leq (h1)β0h2|ij|h|yj,thyj,th(h)|\displaystyle(h-1)\beta_{0}^{h-2}\sum_{|i-j|\leq h}|y_{j,t-h}-y_{j,t-h}^{(h)}|
+Chbk=2h|yi±h,tkyi±h,tk(h)|.\displaystyle+Ch^{-b}\sum_{k=2}^{h}|y_{i\pm h,t-k}-y_{i\pm h,t-k}^{(h)}|.

Proofs regarding second-order derivatives with respect to the other parameters follow similar arguments and are therefore omitted here. Substituting (A.10), (A.29) and (A.31) into (A.30), we have that 2lit(θ0)θmθn\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}} satisfies condition (2.7) in Pan and Pan, (2024) with B(i,t),NT(h)ChbB_{(i,t),NT}(h)\leq Ch^{-b} and l=1l=1.

According to Proposition 2 and Example 2.1 in Pan and Pan, (2024), the array of random fields {2lit(θ0)θmθn:(i,t)DNT,NT1}\{\frac{\partial^{2}l_{it}(\theta_{0})}{\partial\theta_{m}\partial\theta_{n}}:(i,t)\in D_{NT},NT\geq 1\} is η\eta-weakly dependent with coefficient η¯2(s)Cs2p22p1μy+2\bar{\eta}_{2}(s)\leq Cs^{-\frac{2p-2}{2p-1}\mu_{y}+2}, and 2p22p1μy2>2\frac{2p-2}{2p-1}\mu_{y}-2>2. So (b) is verified. ∎

By the Taylor expansion, for some θ\theta^{*} between θ^NT\hat{\theta}_{NT} and θ0\theta_{0} we have

L~NT(θ^NT)θ=L~NT(θ0)θ+2L~NT(θ)θθ(θ^NTθ0).\frac{\partial\tilde{L}_{NT}(\hat{\theta}_{NT})}{\partial\theta}=\frac{\partial\tilde{L}_{NT}(\theta_{0})}{\partial\theta}+\frac{\partial^{2}\tilde{L}_{NT}(\theta^{*})}{\partial\theta\partial\theta^{\prime}}(\hat{\theta}_{NT}-\theta_{0}).

Since L~NT(θ^NT)θ=0\frac{\partial\tilde{L}_{NT}(\hat{\theta}_{NT})}{\partial\theta}=0, we have

NTΣNT1/2(θ^NTθ0)\displaystyle\sqrt{NT}\Sigma_{NT}^{1/2}(\hat{\theta}_{NT}-\theta_{0})
=\displaystyle= ΣNT1/2(2L~NT(θ)θθ)1NTL~NT(θ0)θ\displaystyle-\Sigma_{NT}^{1/2}\left(\frac{\partial^{2}\tilde{L}_{NT}(\theta^{*})}{\partial\theta\partial\theta^{\prime}}\right)^{-1}\sqrt{NT}\frac{\partial\tilde{L}_{NT}(\theta_{0})}{\partial\theta} (A.32)
=\displaystyle= ΣNT1/2(ΣNT1/22LNT(θ0)θθ)1ΣNT1/2NTLNT(θ0)θ+op(1)\displaystyle-\Sigma_{NT}^{1/2}\left(\Sigma_{NT}^{-1/2}\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}\right)^{-1}\Sigma_{NT}^{-1/2}\sqrt{NT}\frac{\partial L_{NT}(\theta_{0})}{\partial\theta}+o_{p}(1)

according to Claims A.7 and A.8.

Note that yit=Nit(λit(θ0))y_{it}=N_{it}(\lambda_{it}(\theta_{0})) is Poisson distributed with mean λit(θ0)\lambda_{it}(\theta_{0}) conditional on the historical information t1\mathcal{H}_{t-1}, where {Nit:(i,t)DNT,NT1}\{N_{it}:(i,t)\in D_{NT},NT\geq 1\} are IID Poisson point processes with unit intensity. Therefore we have

𝔼(2LNT(θ0)θθ)\displaystyle\mathbb{E}\left(\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}\right)
=\displaystyle= 1NTi=1Nt=1T𝔼{𝔼[(Nit(λit(θ0))λit(θ0)1)2λit(θ0)θθ|t1]}\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\mathbb{E}\left\{\mathbb{E}\left[\left(\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}(\theta_{0})}-1\right)\frac{\partial^{2}\lambda_{it}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}|\mathcal{H}_{t-1}\right]\right\}
1NTi=1Nt=1T𝔼{𝔼[Nit(λit(θ0))λit2(θ0)λit(θ0)θλit(θ0)θ|t1]}\displaystyle-\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\mathbb{E}\left\{\mathbb{E}\left[\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}^{2}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}|\mathcal{H}_{t-1}\right]\right\}
=\displaystyle= 1NTi=1Nt=1T𝔼[1λit(θ0)λit(θ0)θλit(θ0)θ]\displaystyle-\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\mathbb{E}\left[\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]
=\displaystyle= ΣNT.\displaystyle-\Sigma_{NT}.
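The computation above rests on two conditional moment identities of the Poisson distribution: 𝔼[N(λ)/λ−1 | λ]=0, which removes the term involving the second derivative of λit, and 𝔼[N(λ)/λ² | λ]=1/λ, which produces −ΣNT; the same facts give the variance computation for the score below. A small Monte Carlo sketch with an arbitrary intensity:

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 3.7                                # arbitrary intensity lambda_it(theta_0)
y = rng.poisson(lam, size=10**7)

print(np.mean(y / lam - 1))              # ~0: the second-derivative term vanishes
print(np.mean(y / lam**2), 1 / lam)      # ~1/lam: yields the -Sigma_NT term
print(np.var(y / lam - 1), 1 / lam)      # ~1/lam: E[(y/lam - 1)^2] = 1/lam
```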

By Claim A.10, we apply Theorem 1 in Pan and Pan, (2024) and obtain that

2LNT(θ0)θθ+ΣNT𝑝0.\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}+\Sigma_{NT}\overset{p}{\to}0. (A.33)

According to condition (3.5) we can further prove that

(ΣNT1/22LNT(θ0)θθ)ΣNT1/2=(ΣNT1/2+op(1))ΣNT1/2=I5+op(1).-\left(\Sigma_{NT}^{-1/2}\frac{\partial^{2}L_{NT}(\theta_{0})}{\partial\theta\partial\theta^{\prime}}\right)\Sigma_{NT}^{-1/2}=\left(\Sigma_{NT}^{1/2}+o_{p}(1)\right)\Sigma_{NT}^{-1/2}=I_{5}+o_{p}(1). (A.34)

When τt\tau\neq t or jij\neq i, we have

𝔼[(Nit(λit(θ0))λit(θ0)1)(Njτ(λjτ(θ0))λjτ(θ0)1)λit(θ0)θλjτ(θ0)θ|t1]=0\mathbb{E}\left[\left(\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}(\theta_{0})}-1\right)\left(\frac{N_{j\tau}(\lambda_{j\tau}(\theta_{0}))}{\lambda_{j\tau}(\theta_{0})}-1\right)\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{j\tau}(\theta_{0})}{\partial\theta^{\prime}}|\mathcal{H}_{t-1}\right]=0

assuming without loss of generality that τ<t\tau<t; the case τ=t\tau=t with jij\neq i follows from the conditional independence of NitN_{it} and NjtN_{jt} given t1\mathcal{H}_{t-1}. Then we can verify that

Var(NTLNT(θ0)θ)\displaystyle\operatorname*{Var}\left(\sqrt{NT}\frac{\partial L_{NT}(\theta_{0})}{\partial\theta}\right)
=\displaystyle= 1NT𝔼{[i=1Nt=1T(Nit(λit(θ0))λit(θ0)1)λit(θ0)θ]\displaystyle\frac{1}{NT}\mathbb{E}\left\{\left[\sum_{i=1}^{N}\sum_{t=1}^{T}\left(\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}(\theta_{0})}-1\right)\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\right]\right.
×[i=1Nt=1T(Nit(λit(θ0))λit(θ0)1)λit(θ0)θ]}\displaystyle\times\left.\left[\sum_{i=1}^{N}\sum_{t=1}^{T}\left(\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}(\theta_{0})}-1\right)\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]\right\}
=\displaystyle= 1NTi=1Nt=1T𝔼[(Nit(λit(θ0))λit(θ0)1)2λit(θ0)θλit(θ0)θ]\displaystyle\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\mathbb{E}\left[\left(\frac{N_{it}(\lambda_{it}(\theta_{0}))}{\lambda_{it}(\theta_{0})}-1\right)^{2}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]
=\displaystyle= ΣNT.\displaystyle\Sigma_{NT}.

For each 𝐯5\mathbf{v}\in\mathbb{R}^{5}, Var((i,t)DNT𝐯lit(θ0)θ)=(NT)𝐯ΣNT𝐯.\operatorname*{Var}\left(\sum_{(i,t)\in D_{NT}}\mathbf{v}^{\prime}\frac{\partial l_{it}(\theta_{0})}{\partial\theta}\right)=(NT)\mathbf{v}^{\prime}\Sigma_{NT}\mathbf{v}. By (3.5) and the symmetry of ΣNT\Sigma_{NT}, it follows that

infNT1𝐯ΣNT𝐯>0.\inf_{NT\geq 1}\mathbf{v}^{\prime}\Sigma_{NT}\mathbf{v}>0.

Then, by Claim A.9 and Theorem 2 in Pan and Pan, (2024), we can prove that

[(NT)𝐯ΣNT𝐯]1/2𝐯(NT)LNT(θ0)θ𝑑N(0,1).[(NT)\mathbf{v}^{\prime}\Sigma_{NT}\mathbf{v}]^{-1/2}\mathbf{v}^{\prime}(NT)\frac{\partial L_{NT}(\theta_{0})}{\partial\theta}\overset{d}{\to}N(0,1).

According to the Cramér-Wold theorem, we have

(ΣNT)1/2NTLNT(θ0)θ𝑑N(0,I5).(\Sigma_{NT})^{-1/2}\sqrt{NT}\frac{\partial L_{NT}(\theta_{0})}{\partial\theta}\overset{d}{\to}N(0,I_{5}). (A.35)

Combining (A.32), (A.34) and (A.35), we complete the proof of Theorem 3.

A.4 Proof of Theorem 4

Recall from (4.2) that the Wald statistic is

WNT:=(Γθ^NTη){ΓNTΣ^NT1Γ}1(Γθ^NTη),W_{NT}:=(\Gamma\hat{\theta}_{NT}-\eta)^{\prime}\left\{\frac{\Gamma}{NT}\widehat{\Sigma}_{NT}^{-1}\Gamma^{\prime}\right\}^{-1}(\Gamma\hat{\theta}_{NT}-\eta),

where

Σ^NT:=1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ].\widehat{\Sigma}_{NT}:=\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}\right].
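Given θ^NT and the plug-in matrix Σ^NT above, evaluating WNT is a short linear-algebra computation. The Python sketch below is purely illustrative: Γ, η and all numerical inputs are hypothetical, with Γ chosen to test the no-asymmetry hypothesis α(1)=α(2):

```python
import numpy as np

def wald_statistic(theta_hat, Sigma_hat, Gamma, eta, N, T):
    """W_NT = (G th - eta)' { G Sigma_hat^{-1} G' / (NT) }^{-1} (G th - eta)."""
    d = Gamma @ theta_hat - eta
    M = Gamma @ np.linalg.inv(Sigma_hat) @ Gamma.T / (N * T)
    return float(d @ np.linalg.solve(M, d))

# hypothetical test of alpha^(1) = alpha^(2); theta = (omega, a1, a2, xi, beta)
Gamma = np.array([[0.0, 1.0, -1.0, 0.0, 0.0]])
eta = np.array([0.0])
theta_hat = np.array([0.5, 0.32, 0.19, 0.1, 0.4])   # hypothetical estimate
Sigma_hat = np.diag([2.0, 1.5, 1.8, 3.0, 2.2])      # hypothetical Sigma_hat_NT

# under H0, W_NT is asymptotically chi-square with rank(Gamma) = 1 d.o.f.
print(wald_statistic(theta_hat, Sigma_hat, Gamma, eta, N=25, T=400))
```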

It is sufficient to show that

1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ]ΣNT𝑝0.\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}\right]-\Sigma_{NT}\overset{p}{\to}0. (A.36)

Note that

1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ]ΣNT\displaystyle\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}\right]-\Sigma_{NT}
=\displaystyle= 1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ𝔼(1λit(θ0)λit(θ0)θλit(θ0)θ)]\displaystyle\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}-\mathbb{E}\left(\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right)\right]
=\displaystyle= 1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ1λit(θ0)λit(θ0)θλit(θ0)θ]\displaystyle\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}-\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]
+1NT(i,t)DNT[1λit(θ0)λit(θ0)θλit(θ0)θ𝔼(1λit(θ0)λit(θ0)θλit(θ0)θ)]\displaystyle+\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}-\mathbb{E}\left(\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right)\right]
:=\displaystyle:= T1+T2.\displaystyle T_{1}+T_{2}.

Similarly to the proof of Claim A.10, we can verify that the law of large numbers (Theorem 1 in Pan and Pan, (2024)) applies to {1λit(θ0)λit(θ0)θλit(θ0)θ:(i,t)DNT,NT1}\left\{\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}:(i,t)\in D_{NT},NT\geq 1\right\}, and therefore T2𝑝0T_{2}\overset{p}{\to}0. T1T_{1} can be further decomposed as follows:

1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ1λit(θ0)λit(θ0)θλit(θ0)θ]\displaystyle\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}-\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]
=\displaystyle= 1NT(i,t)DNT[1λ~it(θ^NT)λ~it(θ^NT)θλ~it(θ^NT)θ1λit(θ^NT)λit(θ^NT)θλit(θ^NT)θ]\displaystyle\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\tilde{\lambda}_{it}(\hat{\theta}_{NT})}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\tilde{\lambda}_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}-\frac{1}{\lambda_{it}(\hat{\theta}_{NT})}\frac{\partial\lambda_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\lambda_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}\right]
+1NT(i,t)DNT[1λit(θ^NT)λit(θ^NT)θλit(θ^NT)θ1λit(θ0)λit(θ0)θλit(θ0)θ]\displaystyle+\frac{1}{NT}\sum_{(i,t)\in D_{NT}}\left[\frac{1}{\lambda_{it}(\hat{\theta}_{NT})}\frac{\partial\lambda_{it}(\hat{\theta}_{NT})}{\partial\theta}\frac{\partial\lambda_{it}(\hat{\theta}_{NT})}{\partial\theta^{\prime}}-\frac{1}{\lambda_{it}(\theta_{0})}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta}\frac{\partial\lambda_{it}(\theta_{0})}{\partial\theta^{\prime}}\right]
:=\displaystyle:= S1+S2.\displaystyle S_{1}+S_{2}.

But S2𝑝0S_{2}\overset{p}{\to}0 since θ^NT𝑝θ0\hat{\theta}_{NT}\overset{p}{\to}\theta_{0}, and the proof of S1𝑝0S_{1}\overset{p}{\to}0 is similar to that of (A.26) and is therefore omitted here. This completes the proof of Theorem 4. ∎

References

  • Al-Osh and Alzaid, (1987) Al-Osh, M. A. and Alzaid, A. A. (1987). First-order integer-valued autoregressive (INAR(1)) process. Journal of Time Series Analysis, 8(3):261–275.
  • Al-Osh and Alzaid, (1988) Al-Osh, M. A. and Alzaid, A. A. (1988). Integer-valued moving average (INMA) process. Statistical Papers, 29:281–300.
  • Armillotta and Fokianos, (2024) Armillotta, M. and Fokianos, K. (2024). Count network autoregression. Journal of Time Series Analysis, 45(4):584–612.
  • Berkes et al., (2003) Berkes, I., Horváth, L., and Kokoszka, P. (2003). GARCH processes: structure and estimation. Bernoulli, 9(2):201–227.
  • Clauset et al., (2009) Clauset, A., Shalizi, C. R., and Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review, 51(4):661–703.
  • Cui et al., (2020) Cui, Y., Li, Q., and Zhu, F. (2020). Flexible bivariate Poisson integer-valued GARCH model. Annals of the Institute of Statistical Mathematics, 72(6):1449–1477.
  • Cui and Zhu, (2018) Cui, Y. and Zhu, F. (2018). A new bivariate integer-valued GARCH model allowing for negative cross-correlation. Test, 27:428–452.
  • De Wit et al., (2013) De Wit, E. R., Englund, P., and Francke, M. K. (2013). Price and transaction volume in the Dutch housing market. Regional Science and Urban Economics, 43(2):220–241.
  • Ferland et al., (2006) Ferland, R., Latour, A., and Oraichi, D. (2006). Integer-valued GARCH process. Journal of Time Series Analysis, 27(6):923–942.
  • Fokianos et al., (2009) Fokianos, K., Rahbek, A., and Tjøstheim, D. (2009). Poisson autoregression. Journal of the American Statistical Association, 104(488):1430–1439.
  • Fokianos et al., (2020) Fokianos, K., Støve, B., Tjøstheim, D., and Doukhan, P. (2020). Multivariate count autoregression. Bernoulli, 26(1):471–499.
  • Heinen, (2003) Heinen, A. (2003). Modelling time series count data: an autoregressive conditional Poisson model. Available at SSRN 1117187.
  • Jiang et al., (2022) Jiang, F., Zhao, Z., and Shao, X. (2022). Modelling the Covid-19 infection trajectory: a piecewise linear quantile trend model. Journal of the Royal Statistical Society: Series B, 84(5):1589–1607.
  • Jones et al., (1994) Jones, C. M., Kaul, G., and Lipson, M. L. (1994). Transactions, volume, and volatility. The Review of Financial Studies, 7(4):631–651.
  • Lee et al., (2023) Lee, S., Kim, D., and Kim, B. (2023). Modeling and inference for multivariate time series of counts based on the INGARCH scheme. Computational Statistics & Data Analysis, 177:107579.
  • Lee et al., (2018) Lee, Y., Lee, S., and Tjøstheim, D. (2018). Asymptotic normality and parameter change test for bivariate Poisson INGARCH models. Test, 27:52–69.
  • McKenzie, (1985) McKenzie, E. (1985). Some simple models for discrete variate time series. Journal of the American Water Resources Association, 21(4):645–650.
  • McKenzie, (1988) McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. Advances in Applied Probability, 20(4):822–835.
  • Nowicki and Snijders, (2001) Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic block structures. Journal of the American Statistical Association, 96(455):1077–1087.
  • Pan and Pan, (2024) Pan, Y. and Pan, J. (2024). Limit theorems for weakly dependent non-stationary random field arrays and asymptotic inference of dynamic spatio-temporal models. Available at arXiv:2408.07429.
  • Pham, (2020) Pham, H. (2020). On estimating the number of deaths related to Covid-19. Mathematics, 8(5):655.
  • Tao et al., (2024) Tao, Y., Li, D., and Niu, X. (2024). Grouped network Poisson autoregressive model. Statistica Sinica, 34:1603–1624.
  • Wang et al., (2014) Wang, C., Liu, H., Yao, J.-F., Davis, R. A., and Li, W. K. (2014). Self-excited threshold Poisson autoregression. Journal of the American Statistical Association, 109(506):777–787.
  • Xu et al., (2012) Xu, H.-Y., Xie, M., Goh, T. N., and Fu, X. (2012). A model for integer-valued time series with conditional overdispersion. Computational Statistics & Data Analysis, 56(12):4229–4242.
  • Zhou et al., (2020) Zhou, J., Li, D., Pan, R., and Wang, H. (2020). Network GARCH model. Statistica Sinica, 30:1–18.
  • Zhu, (2010) Zhu, F. (2010). A negative binomial integer-valued GARCH model. Journal of Time Series Analysis, 32(1):54–67.
  • Zhu, (2012) Zhu, F. (2012). Modeling overdispersed or underdispersed count data with generalized Poisson integer-valued GARCH models. Journal of Mathematical Analysis and Applications, 389(1):58–71.