Functional Network Autoregressive Models for Panel Data

Tomohiro Ando and Tadao Hoshino. Melbourne Business School, the University of Melbourne. Address correspondence to: Tadao Hoshino, School of Political Science and Economics, Waseda University.

Abstract

This study proposes a novel functional vector autoregressive framework for analyzing network interactions of functional outcomes in panel data settings. In this framework, an individual’s outcome function is influenced by the outcomes of others through a simultaneous equation system. To estimate the functional parameters of interest, we need to address the endogeneity issue arising from these simultaneous interactions among outcome functions. This issue is carefully handled by developing a novel functional moment-based estimator. We establish the consistency, convergence rate, and pointwise asymptotic normality of the proposed estimator. Additionally, we discuss the estimation of marginal effects and impulse response analysis. As an empirical illustration, we analyze the demand for a bike-sharing service in the U.S. The results reveal statistically significant spatial interactions in bike availability across stations, with interaction patterns varying over the time of day.

Keywords: bicycle-sharing systems, endogeneity, functional data analysis, panel data, network autoregressive models.

Introduction

The availability of functional data has been rapidly expanding across all fields of research, leading to a growing need for statistical tools that appropriately account for the unique characteristics of each type of functional data. In the analysis of socioeconomic data, there are at least two key aspects that should be addressed. The first is that an individual’s decision or behavioral pattern may influence that of others through social networks—interactions between individuals. The second is that individuals are intrinsically heterogeneous, even after controlling for observable characteristics—unobserved heterogeneity of individuals. Therefore, analyzing socioeconomic functional data requires functional models that jointly capture both of these aspects, which is the aim of this study.

More specifically, to account for the interactions among units, we extend the network (or spatial) autoregressive (NAR) modeling approach to a functional response model. To address the unobserved individual heterogeneity, we introduce the functional fixed effects approach, given the availability of panel data. When the response variable is a scalar rather than a function, there already exists a vast body of studies investigating fixed-effect NAR models for panel data, such as

\displaystyle Y_{it}=\alpha_{0}\sum_{j=1}^{n}w_{i,j}Y_{jt}+X_{it}^{\top}\beta_{0}+f_{0i}+\varepsilon_{it},\;\;i=1,\ldots,n,\;t=1,\ldots,T

(1.1)

and its variants (e.g., Yu et al., 2008; Lee and Yu, 2010, 2014; Kuersteiner and Prucha, 2020, among others). Here, $Y_{it}$ is a scalar outcome, $w_{i,j}$ denotes a known weight term measuring the social or geographical proximity between units $i$ and $j$ , $X_{it}$ is a vector of covariates, $f_{0i}$ represents a fixed effect specific to each $i$ , and $\varepsilon_{it}$ denotes an error term. The term $\sum_{j=1}^{n}w_{i,j}Y_{jt}$ captures the local trend of the outcome variable in the neighborhood of $i$ . Models of the form (1.1) are typically applied to data in fields such as health, real estate, transportation, and education, as well as to municipal data. However, with the increasing availability of functional data in these fields, such as real-time activity recognition, real-time population mobility and congestion patterns, and regional wealth distributions, scalar models like (1.1) may fail to capture the complex nature of these interactions.
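To make the simultaneity in (1.1) concrete, the sketch below simulates the scalar model through its reduced form $Y_{t}=(I_{n}-\alpha_{0}W_{n})^{-1}(X_{t}\beta_{0}+f_{0}+\varepsilon_{t})$ , where $f_{0}=(f_{01},\ldots,f_{0n})^{\top}$ , assuming $I_{n}-\alpha_{0}W_{n}$ is invertible. It is a minimal NumPy illustration; the row-normalized complete-graph weights, the parameter values, and the Gaussian draws are hypothetical choices, not taken from the paper.

```python
import numpy as np

def simulate_scalar_sar_panel(W, X, alpha, beta, f, eps):
    """Simulate the scalar fixed-effect NAR panel model (1.1) via its reduced form.

    W: (n, n) weights; X: (T, n, d_x) covariates; beta: (d_x,);
    f: (n,) fixed effects; eps: (T, n) errors. Returns Y of shape (T, n).
    Each period solves the simultaneous system (I - alpha W) Y_t = X_t beta + f + eps_t.
    """
    n = W.shape[0]
    S = np.eye(n) - alpha * W
    Y = np.empty((X.shape[0], n))
    for t in range(X.shape[0]):
        Y[t] = np.linalg.solve(S, X[t] @ beta + f + eps[t])
    return Y

rng = np.random.default_rng(0)
n, T, d_x = 5, 3, 2
W = np.ones((n, n)) - np.eye(n)
W /= W.sum(axis=1, keepdims=True)          # hypothetical row-normalized complete graph
X = rng.normal(size=(T, n, d_x))
beta = np.array([1.0, -0.5])
f = rng.normal(size=n)
eps = rng.normal(size=(T, n))
Y = simulate_scalar_sar_panel(W, X, 0.4, beta, f, eps)

# The simulated outcomes satisfy (1.1) up to floating-point error;
# Y @ W.T has (t, i) entry sum_j w_ij Y_jt.
resid = Y - (0.4 * Y @ W.T + X @ beta + f + eps)
print(np.max(np.abs(resid)))
```

Each period requires solving one $n\times n$ linear system, which is the scalar analogue of the functional solution discussed in Section 2.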

The above discussion motivates us to extend (1.1) to the following model: for $s\in[0,1]$ ,

\displaystyle Y_{it}(s)=\alpha_{0}(s)\sum_{j=1}^{n}w_{i,j}A(Y_{jt},s)+X_{it}^{\top}\beta_{0}(s)+f_{0i}(s)+\varepsilon_{it}(s),

(1.2)

where $Y_{it}$ represents the outcome function of interest, which may or may not be a smooth function of $s$ , $\alpha_{0}$ is the interaction effect function, $\beta_{0}$ is a vector of functional coefficients, $f_{0i}$ is the fixed effect function, and $\varepsilon_{it}$ is the functional error term with mean zero at each $s$ . Here, $A(\cdot,s)$ denotes a known linear functional, whose form may differ according to the research interest. Since the response variable is a function, various types of interaction patterns can be considered. The most typical form is the “concurrent” interaction, in which only the responses of others at the same evaluation point $s$ are influential. In this case, $A(\cdot,s)$ is the point-evaluation functional at $s$ : $A(Y_{jt},s)=Y_{jt}(s)$ . When $s$ represents time, past outcomes can affect future outcomes but not conversely, which motivates the choice $A(Y_{jt},s)=\int_{0}^{1}Y_{jt}(u)\nu(u,s)\text{d}u$ , where $\nu(u,s)$ is a user-chosen function that is non-negative, increasing in $u$ up to $s$ , and equal to zero for $u>s$ . As another example, if others’ responses at all evaluation points are equally influential, we may use $A(Y_{jt},s)=\int_{0}^{1}Y_{jt}(u)\text{d}u$ . All of these examples can be represented as an integral operator $A(Y_{jt},s)=\int_{0}^{1}Y_{jt}(u)\nu(u,s)\text{d}u$ with some kernel weight function $\nu(u,s)$ ; for the point-evaluation functional, for instance, we can set $\nu(u,s)=\delta(u-s)$ , where $\delta$ denotes the Dirac delta function.
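On a grid, each choice of $A(\cdot,s)$ above reduces to a matrix acting on function values. The following sketch assumes a midpoint grid and a hypothetical exponential causal kernel $\nu(u,s)=e^{-(s-u)/\tau}\bm{1}\{u\leq s\}$ (non-negative, increasing in $u$ up to $s$ , zero afterwards); it builds the concurrent, causal, and uniform-average operators and checks the causal property numerically.

```python
import numpy as np

G = 200
s = (np.arange(G) + 0.5) / G              # midpoint grid on [0, 1]

def integral_operator(nu, grid):
    """Matrix M such that (M @ h)[l] approximates int_0^1 h(u) nu(u, s_l) du."""
    U, S = np.meshgrid(grid, grid)        # U[l, m] = grid[m] (u), S[l, m] = grid[l] (s)
    return nu(U, S) / len(grid)           # midpoint rule

tau = 0.1                                 # hypothetical decay scale of the causal kernel
A_concurrent = np.eye(G)                  # A(h, s) = h(s)
A_causal = integral_operator(
    lambda u, sv: np.where(u <= sv, np.exp(-(sv - u) / tau), 0.0), s)
A_uniform = integral_operator(lambda u, sv: np.ones_like(u), s)  # A(h, s) = int h(u) du

h = np.sin(2 * np.pi * s)
# Causality check: altering h after s = 0.5 leaves A(h, s) unchanged for s < 0.5
h_changed = h.copy()
h_changed[s > 0.5] += 1.0
early = s < 0.5
print(np.allclose((A_causal @ h)[early], (A_causal @ h_changed)[early]))  # True
print(abs((A_uniform @ h)[0]) < 1e-10)    # True: sin(2 pi u) integrates to zero
```

Because $A$ is linear, $A(\overline{Y}_{it},s)$ on the grid can be obtained by applying the same matrix to the grid values of $\sum_{j}w_{i,j}Y_{jt}$ .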

We now give three examples of empirical topics to which model (1.2) can be naturally applied.

Example 1.1 (Health data analysis).

In the health literature, researchers have increasingly focused on real-time activity data collected through wearable devices or smartphone apps (see, e.g., Di et al. (2024) for a review). As a typical example, $Y_{it}(s)$ represents the activity level of individual $i$ , measured by an accelerometer at time $s$ on day $t$ . Now, suppose the dataset consists of an elderly population, where some people in the same neighborhood frequently engage in fitness activities such as running. If we apply our model to their activity-level data, with $w_{i,j}$ representing neighborhood membership and $A(Y_{jt},s)=Y_{jt}(s)$ , we may observe a significantly positive interaction effect.

Example 1.2 (Demographic data analysis).

Demographic analysis is a major application of functional data analysis (FDA). For instance, functional analysis of regional age distributions (i.e., population pyramids) has been studied extensively (e.g., Delicado, 2011; Hron et al., 2016; Hoshino, 2024). Among these studies, Hoshino (2024) considered a functional spatial autoregressive model in which $Y_{i}(s)$ represents the $s$ -th age quantile of city $i$ , allowing interactions with the age quantiles of neighboring cities. Another common application in FDA is the analysis of mortality and fertility rates (e.g., Hyndman and Ullah, 2007; Chen and Müller, 2012). In such a setting, the outcome function $Y_{it}(s)$ may represent the age-specific fertility rate of women of age $s$ in city $i$ in year $t$ . The presence of regional interactions in fertility would be unsurprising.

Example 1.3 (Transportation data analysis).

Functional data analysis of transportation data, such as traffic flows and demand for transportation services, has been gaining significant attention (see, e.g., Ma et al. (2024) for recent advancements). In the empirical application of this study, we apply our model to bike-usage data from a U.S. bike-sharing system, where $Y_{it}(s)$ represents the availability of bikes at station $i$ at time $s$ during week $t$ . We find statistically significant spatial interactions among the bike availabilities of nearby stations, which are positive or negative depending on the time of day. Further details are provided in Section 6.

In model (1.2), the parameters of primary interest are $\alpha_{0}$ and $\beta_{0}$ . Since the number of time periods $T$ may be either large or small, we apply a first-differencing transformation to eliminate the individual fixed effects from the model. For the transformed model, rather than estimating $\alpha_{0}(s)$ and $\beta_{0}(s)$ separately at many different $s$ -values, we approximate them using orthonormal basis expansions and estimate their entire functional forms jointly in a single step. Our proposed estimator is based on the generalized method of moments (GMM). Specifically, we first derive a set of moment conditions at each $s$ , in a similar manner to Lin and Lee (2010), and then integrate these conditions numerically over $s\in[0,1]$ . These integrated moment functions define our GMM objective function, and the resulting estimator is referred to as the integrated-GMM estimator. Once $\alpha_{0}(s)$ and $\beta_{0}(s)$ are estimated, the fixed effects $f_{01}(s),\ldots,f_{0n}(s)$ can, if needed, be estimated simply by taking the individual-level means of the residuals.

Note that in model (1.2) the outcome functions appear on both the left- and right-hand sides, so the model is a system of simultaneous functional equations. Depending on the true values of the functional parameters, the model may exhibit an explosive network interaction process, leading to non-stationarity and inconsistency of the proposed estimator. We therefore first derive a condition under which the model is stationary in our context; in particular, we show that the magnitude of network interactions must lie within a certain range.

Then, under the stationarity condition on network interactions, along with some regularity conditions, we derive the convergence rates of the integrated-GMM estimators for $\alpha_{0}$ and $\beta_{0}$ . In addition, we prove that the estimators are asymptotically normal at each evaluation point $s$ . Due to the complexity of characterizing the stochastic process of functional outcomes, the numerical integration of moment functions, the first-differencing elimination of functional fixed effects, and the need to appropriately control the order of basis expansion, among other factors, establishing these results involves new mathematical challenges and requires careful discussion. These theoretical results are numerically corroborated through a series of Monte Carlo experiments.

As an empirical illustration of our method, we apply it to the demand analysis of a bike-sharing system in the San Francisco Bay Area, U.S. Using publicly available data from Bay Area Bike Share, we analyze spatial interactions in bike availability across 70 stations from May 2014 to August 2015. Our results reveal significant positive spatial interactions in bike availability during the morning hours, while negative interactions emerge in the early evening. Furthermore, we conduct an impulse response analysis to demonstrate how a reduction in bike availability at a given station propagates to nearby stations over time. These findings highlight the importance of spatial interactions in shared mobility services and demonstrate the practical applicability of our method.

Our paper relates to a broad range of theoretical and empirical literature. From a theoretical perspective, our study contributes to both the FDA literature and the network/spatial interactions literature by proposing a new model that connects the two. In this sense, one of the most closely related studies to ours is Zhu et al. (2022), who proposed a functional NAR model similar to (1.2) but not in panel data settings. In contrast to Zhu et al. (2022), our GMM estimator requires neither parametric assumptions nor i.i.d. conditions on the disturbance term. This weaker requirement stems from the fact that we treat the individual effects as parameters, whereas Zhu et al. (2022) control for them through functional principal component analysis under a homogeneity condition. Moreover, they considered only the concurrent interaction case (i.e., $A(Y_{jt},s)=Y_{jt}(s)$ ). As the examples of health, demographic, and transportation data illustrate, the variable $s$ typically represents time on some scale. Concurrent interaction rules out interactions even with immediately preceding outcomes and allows only strictly simultaneous ones, which limits the applicability of the model. Computationally, our estimator recovers the full functional forms of the functional parameters in a single step, whereas the estimator in Zhu et al. (2022) must be applied repeatedly at each evaluation point $s$ .

On the empirical side, demand forecasting for bike-sharing systems has been an active topic in the data science literature (e.g., Faghih-Imani and Eluru, 2016; Lin et al., 2018; Eren and Uz, 2020; Torti et al., 2021, among others). Among these studies, Faghih-Imani and Eluru (2016) is most closely related to our study in that they employed a spatial panel model similar to (1.1) to analyze the spatial and temporal interaction structure for the bike-sharing system in New York City, CitiBike. In their approach, however, the data are not treated as functional, and thus the model parameters are not allowed to vary over time. By contrast, Torti et al. (2021) analyzed the flow of bikes in the bike-sharing system in Milan, BikeMi, through a functional linear model with functional coefficients; however, they did not account for the spatial interactions of mobility. Thus, our empirical study can be viewed as combining the strengths of these two papers.

The major contributions of this paper are summarized as follows: First, we propose a novel model for analyzing various forms of network and spatial interactions underlying socioeconomic functional data. Second, we formally establish a condition that ensures the outcome functions follow a unique network-stationary process within the model. Third, we develop a novel GMM-type estimator, the integrated-GMM estimator, for estimating the functional parameters. Fourth, we establish the asymptotic properties of the integrated-GMM estimator, including its consistency, rate of convergence, and pointwise limiting distribution. Fifth, we additionally develop a new approach for implementing network impulse response analysis and investigate its convergence property. Finally, we apply our method to the analysis of bike-sharing demand, offering new empirical insights into functional spatial interactions in mobility.

Paper organization

The rest of the paper is organized as follows. In Section 2, we present our model and discuss its stationarity condition. Section 3 introduces our integrated-GMM estimator and investigates its asymptotic properties. In Section 4, we discuss additional topics related to our model, including the estimation of marginal effects and network impulse response analysis. Section 5 conducts a set of Monte Carlo simulations to numerically demonstrate the properties of our estimator. Section 6 presents our empirical analysis of the U.S. bike-sharing data, and Section 7 concludes. Proofs of all technical results are provided in the Appendix.

Notation

For a function $h$ defined on $[0,1]$ and $p\in[1,\infty)$ , the $L^{p}$ norm of $h$ is written as $||h||_{L^{p}}\coloneqq(\int_{0}^{1}|h(s)|^{p}\text{d}s)^{1/p}$ , and $L^{p}(0,1)$ denotes the set of $h$ ’s such that $||h||_{L^{p}}<\infty$ . For a random variable $X$ , the $L^{p}$ norm of $X$ is written as $||X||_{p}\coloneqq(\mathbb{E}|X|^{p})^{1/p}$ . For a matrix $M$ , $||M||$ , $||M||_{1}$ , and $||M||_{\infty}$ denote the Frobenius norm, the maximum absolute column sum, and the maximum absolute row sum of $M$ , respectively. If $M$ is a square matrix, we use $\lambda_{\max}(M)$ and $\lambda_{\min}(M)$ to denote its largest and smallest eigenvalues, respectively. For a positive integer $Z$ , we denote $[Z]\coloneqq\{1,\ldots,Z\}$ . We use $I_{Z}$ to denote an identity matrix of dimension $Z$ . Finally, $X\lesssim Y$ if $X=O(Y)$ almost surely, and $X\lesssim_{P}Y$ if $X=O_{P}(Y)$ .

Functional Network Autoregressive Model

The model

Suppose that we have balanced panel data of size $(n,T)$ : $\{(Y_{it},X_{it},w_{i,1},\ldots,w_{i,n}):i\in[n],\>t\in[T]\}$ . The number of time periods $T$ can be either fixed or tending to infinity jointly with the sample size $n$ . Here, $Y_{it}:[0,1]\to\mathbb{R}$ denotes a random outcome function of interest with the common support $[0,1]$ , $X_{it}=(X_{it}^{1},\ldots,X_{it}^{d_{x}})^{\top}$ denotes a vector of covariates, and $w_{i,j}\in\mathbb{R}$ is the $(i,j)$ -th element of an $n\times n$ time-invariant interaction matrix $W_{n}=(w_{i,j})$ . Each $w_{i,j}$ is non-random and determined in advance. In social network analysis, it is common to set $w_{i,j}=c_{i,j}\bm{1}\{\text{$i$ and $j$ are peers}\}$ , where $c_{i,j}$ is some normalizing constant. Similarly, if each $i$ represents a spatial unit, one may use $w_{i,j}=c_{i,j}\bm{1}\{\Delta(i,j)\leq\overline{\Delta}\}$ , where $\Delta(i,j)$ is the distance between $i$ and $j$ , and $\overline{\Delta}$ is a given threshold. Following convention, we normalize $w_{i,i}=0$ for all $i$ .
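A distance-threshold weight matrix of the kind just described can be built as follows. This is a sketch assuming Euclidean distance and row normalization, $c_{i,j}=1/\#\{j\neq i:\Delta(i,j)\leq\overline{\Delta}\}$ , which is one common choice of normalizing constant rather than one prescribed here; the function name is hypothetical.

```python
import numpy as np

def threshold_weight_matrix(coords, threshold):
    """Row-normalized weights w_ij = c_ij * 1{dist(i, j) <= threshold}, with w_ii = 0.

    coords: (n, d) locations. Here c_ij = 1 / (number of neighbours of i),
    an illustrative choice of the normalizing constant.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # Euclidean distance (assumption)
    adj = (dist <= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                      # convention w_ii = 0
    deg = adj.sum(axis=1, keepdims=True)
    # Isolated units (no neighbours) keep an all-zero row
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
W = threshold_weight_matrix(coords, threshold=1.5)
print(W)
```

Units without neighbours within the threshold keep a row of zeros, so such rows do not sum to one.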

As shown in (1.2), our working model is given as follows: for $s\in[0,1]$ ,

\displaystyle Y_{it}(s)=\alpha_{0}(s)A(\overline{Y}_{it},s)+X_{it}^{\top}\beta_{0}(s)+f_{0i}(s)+\varepsilon_{it}(s),\;\;i\in[n],\;\;t\in[T]

(2.1)

where $\overline{Y}_{it}=\sum_{j=1}^{n}w_{i,j}Y_{jt}$ . Recall that we assume throughout that $A(\cdot,\cdot)$ is linear in its first argument, so that $\sum_{j=1}^{n}w_{i,j}A(Y_{jt},\cdot)=A(\overline{Y}_{it},\cdot)$ . As an alternative structure of network interaction, Beyaztas et al. (2024) and Hoshino (2024) consider the form $\int_{0}^{1}\overline{Y}_{it}(u)\alpha_{0}(u,s)\,\text{d}u$ , reflecting the usual functional linear form in the FDA literature. Neither of these two interaction structures is clearly more general than the other, but ours may offer simpler interpretation. We impose additional shape restrictions on $A(\cdot,\cdot)$ later.

The parameters of primary interest are the interaction effect function $\alpha_{0}(s)$ and the coefficient functions $\beta_{0}(s)=(\beta_{01}(s),\ldots,\beta_{0d_{x}}(s))$ . The functional individual effects $f_{01}(s),\ldots,f_{0n}(s)$ are treated as nuisance parameters. Restricting the support of $s$ to the unit interval is a normalization, which is harmless as long as the response functions share a common interval support. For simplicity, we do not explicitly allow $X_{it}$ to be a function of $s$ ; this can be relaxed easily at the expense of more complicated notation and proofs. A constant term is not included in $X_{it}$ . In the following, we assume that $Y_{it}\in L^{2}(0,1)$ , $\varepsilon_{it}\in L^{2}(0,1)$ , and that $\alpha_{0}$ and $\beta_{0}$ are continuous.

Stationarity

We discuss the stationarity of our model. Recall that our model is a system of simultaneous functional equations, which may not have a unique interior solution in general, depending on the parameter values. Thus, just like the stability condition for vector autoregressive models in the time series literature, we need to impose some conditions on the model to ensure that the outcome functions follow a unique stationary data-generating process and prevent explosive behavior.

Let $Y_{t}(s)=(Y_{1t}(s),\ldots,Y_{nt}(s))^{\top}$ , $\bm{A}(Y_{t},s)=(A(Y_{1t},s),\ldots,A(Y_{nt},s))^{\top}$ , $X_{t}=(X_{1t},\ldots,X_{nt})^{\top}$ , $F_{0}(s)=(f_{01}(s),\ldots,f_{0n}(s))^{\top}$ , and $\mathcal{E}_{t}(s)=(\varepsilon_{1t}(s),\ldots,\varepsilon_{nt}(s))^{\top}$ . Then, we can re-write (1.2) in matrix form as

\displaystyle Y_{t}(s)=\alpha_{0}(s)W_{n}\bm{A}(Y_{t},s)+X_{t}\beta_{0}(s)+F_{0}(s)+\mathcal{E}_{t}(s),\;\;t\in[T].

(2.2)

This expression shows that our model consists of $T$ distinct systems of $n$ functional equations. We introduce the following assumption to ensure that the model has a unique stationary solution.

Assumption 2.1 (Stationarity).

(i) $\overline{\alpha}_{0}\lesssim 1$ and $||W_{n}||_{\infty}\lesssim 1$ such that $\overline{\alpha}_{0}||W_{n}||_{\infty}<1$ , where $\overline{\alpha}_{0}\coloneqq\max_{s\in[0,1]}|\alpha_{0}(s)|$ . (ii) For any $h\in L^{2}(0,1)$ , $||A(h,\cdot)||_{L^{2}}\leq||h||_{L^{2}}$ .

Assumption 2.1(i) requires that the network interaction not be too strong. The existence of $\overline{\alpha}_{0}$ is guaranteed by the continuity of $\alpha_{0}$ . With Assumption 2.1(ii) and the linearity of $A$ , we have for any $h,h^{\prime}\in L^{2}(0,1)$ that $||A(h,\cdot)-A(h^{\prime},\cdot)||_{L^{2}}=||A(h-h^{\prime},\cdot)||_{L^{2}}\leq||h-h^{\prime}||_{L^{2}}$ , so that $A$ is non-expansive (Lipschitz continuous with constant one). This assumption still accommodates many empirically interesting interaction patterns. For example, the point-evaluation functional $A(h,s)=h(s)$ trivially satisfies $||A(h,\cdot)||_{L^{2}}=||h||_{L^{2}}$ . As another example, suppose $A(h,s)=\int_{0}^{1}h(u)\nu(u,s)\text{d}u$ for some continuous $\nu$ . Since

\displaystyle||A(h,\cdot)||_{L^{2}}^{2}\leq\int_{0}^{1}\int_{0}^{1}\left|h(u)\nu(u,s)\right|^{2}\text{d}u\text{d}s\leq\overline{\nu}^{2}||h||_{L^{2}}^{2},

(2.3)

where $\overline{\nu}\coloneqq\max_{(u,s)\in[0,1]^{2}}|\nu(u,s)|$ (the first inequality follows from the Cauchy-Schwarz inequality), Assumption 2.1(ii) holds whenever $\overline{\nu}\leq 1$ .
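The bound (2.3) can be checked numerically for a causal kernel. The sketch below assumes a midpoint-rule discretization and the hypothetical kernel $\nu(u,s)=e^{-(s-u)/\tau}\bm{1}\{u\leq s\}$ , for which $\overline{\nu}=1$ . On the grid, the same Cauchy-Schwarz argument yields the discrete analogue of (2.3) exactly, so the ratio $||A(h,\cdot)||_{L^{2}}/||h||_{L^{2}}$ should never exceed $\overline{\nu}$ , even for rough test functions.

```python
import numpy as np

G = 400
s = (np.arange(G) + 0.5) / G
U, S = np.meshgrid(s, s)                 # U[l, m] = u_m, S[l, m] = s_l
tau = 0.2                                # hypothetical decay scale
nu = np.where(U <= S, np.exp(-(S - U) / tau), 0.0)   # causal kernel nu(u, s)
nu_bar = nu.max()                        # equals 1, attained at u = s
A = nu / G                               # midpoint-rule discretization of A(., s)

def l2_norm(h):
    """Discrete version of the L^2(0, 1) norm on the midpoint grid."""
    return np.sqrt(np.mean(h ** 2))

rng = np.random.default_rng(1)
ratios = [l2_norm(A @ h) / l2_norm(h) for h in rng.normal(size=(100, G))]
print(max(ratios) <= nu_bar)             # True, consistent with (2.3)
```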

Now, denote $\mathcal{H}_{n,p}\coloneqq\{H=(h_{1},\ldots,h_{n}):h_{i}\in L^{p}(0,1)\;\text{for all}\;i\}$ , and define a linear operator $\mathcal{A}$ as

\displaystyle(\mathcal{A}H)(s)\coloneqq\alpha_{0}(s)W_{n}\bm{A}(H,s),\;\;H\in\mathcal{H}_{n,p}.

(2.4)

Then, we can write our model symbolically as follows:

\displaystyle Y_{t}=\mathcal{A}Y_{t}+X_{t}\beta_{0}+F_{0}+\mathcal{E}_{t},\;\;t\in[T].

(2.5)

Further, letting Id denote the identity operator, if the inverse operator $(\text{Id}-\mathcal{A})^{-1}$ exists, the solution $Y_{t}$ of the system is uniquely determined up to an equivalence class in $\mathcal{H}_{n,2}$ as

\displaystyle Y_{t}=(\text{Id}-\mathcal{A})^{-1}[X_{t}\beta_{0}+F_{0}+\mathcal{E}_{t}],\;\;t\in[T].

(2.6)

The next proposition states that Assumption 2.1 is sufficient for the existence of $(\text{Id}-\mathcal{A})^{-1}$ .

Proposition 2.1.

Suppose that Assumption 2.1 holds. Then, $(\text{Id}-\mathcal{A})^{-1}$ exists, and for each $t\in[T]$ , $Y_{t}$ is the only solution of (1.2) in the Banach space $(\mathcal{H}_{n,2},||\cdot||_{\infty,2})$ , where $||H||_{\infty,p}\coloneqq\max_{1\leq i\leq n}||h_{i}||_{L^{p}}$ .

Note that the explicit form of the inverse operator $(\text{Id}-\mathcal{A})^{-1}$ cannot be derived in general, except in some simple cases such as the concurrent case $(\mathcal{A}Y_{t})(s)=\alpha_{0}(s)W_{n}Y_{t}(s)$ , for which $(\text{Id}-\mathcal{A})^{-1}$ acts pointwise as multiplication by $(I_{n}-\alpha_{0}(s)W_{n})^{-1}$ at each $s$ . In practice, however, the inverse can be approximated with arbitrary precision by truncating the Neumann series expansion $(\text{Id}-\mathcal{A})^{-1}=\sum_{\ell=0}^{\infty}\mathcal{A}^{\ell}$ at a sufficiently large order (see, e.g., Kress, 2014). See Remark 1 in Zhu et al. (2022) for a related discussion.
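A minimal sketch of the truncated Neumann series on a grid follows. It assumes that $A(\cdot,s)$ has been discretized into a matrix $M$ (the identity in the concurrent case), represents each $h_{i}$ by its grid values, and checks Assumption 2.1(i) before iterating. In the concurrent case, the result is compared with the pointwise inverse $(I_{n}-\alpha_{0}(s)W_{n})^{-1}$ ; the function names, the choice of $\alpha_{0}$ , and the network are hypothetical.

```python
import numpy as np

def apply_network_operator(H, alpha, W, M):
    """(A H)(s_l) = alpha(s_l) * W @ A(H, s_l) on a grid.

    H: (n, G) array whose i-th row holds h_i on the grid;
    alpha: (G,) values of alpha_0; M: (G, G) discretized A(., s).
    """
    return alpha[None, :] * (W @ (H @ M.T))

def solve_by_neumann(B, alpha, W, M, tol=1e-12, max_terms=1000):
    """Approximate (Id - A)^{-1} B by the truncated Neumann series sum_l A^l B."""
    if np.max(np.abs(alpha)) * np.abs(W).sum(axis=1).max() >= 1:
        raise ValueError("Assumption 2.1(i) fails: alpha_bar * ||W||_inf >= 1")
    term, total = B.copy(), B.copy()
    for _ in range(max_terms):
        term = apply_network_operator(term, alpha, W, M)
        total += term
        if np.max(np.abs(term)) < tol:
            break
    return total

rng = np.random.default_rng(2)
n, G = 6, 50
s = (np.arange(G) + 0.5) / G
W = np.ones((n, n)) - np.eye(n)
W /= W.sum(axis=1, keepdims=True)          # row-normalized, so ||W||_inf = 1
alpha = 0.6 * np.sin(2 * np.pi * s)        # hypothetical alpha_0 with max |alpha| = 0.6
B = rng.normal(size=(n, G))                # stands in for X_t beta_0 + F_0 + E_t
Y = solve_by_neumann(B, alpha, W, np.eye(G))

# Concurrent case: exact pointwise inverse (I_n - alpha_0(s_l) W_n)^{-1}
Y_exact = np.column_stack(
    [np.linalg.solve(np.eye(n) - alpha[l] * W, B[:, l]) for l in range(G)])
print(np.max(np.abs(Y - Y_exact)))
```

For a non-concurrent operator, such as a causal integral kernel, the same function applies with $M$ replaced by the corresponding discretized kernel matrix; in that case no closed-form inverse is available for comparison.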

Estimation and Asymptotic Theory

Integrated-GMM estimation

There are broadly two approaches to estimating the unknown functional parameters $\alpha_{0}(s)$ and $\beta_{0}(s)$ . The first is a “local” approach that estimates the values of these functions at specific $s$ -values, repeating the estimation across different points to recover the full functional forms. The second is a “global” approach that estimates the entire functional forms in a single step using a series approximation method. Although both approaches are theoretically valid, the local approach typically requires more computation time and often leads to larger variance (but smaller bias) because it does not exploit information from nearby evaluation points. This study adopts the global approach.

Let $\{\phi_{k}:k=1,2,\ldots\}$ be a sequence of orthonormal basis functions, which we assume throughout to be continuous on $[0,1]$ . Then, if the functions $\alpha_{0}$ and $\beta_{0}$ are sufficiently smooth, we can approximate

\displaystyle\alpha_{0}(s)\approx\sum_{k=1}^{K}\phi_{k}(s)\theta_{0\alpha,k},

(3.1)

\displaystyle\beta_{0j}(s)\approx\sum_{k=1}^{K}\phi_{k}(s)\theta_{0j,k},\;\;j\in[d_{x}],

(3.2)

uniformly in $s\in[0,1]$ , for some coefficient vectors $\theta_{0\alpha}=(\theta_{0\alpha,1},\ldots,\theta_{0\alpha,K})^{\top}$ and $\theta_{0j}=(\theta_{0j,1},\ldots,\theta_{0j,K})^{\top}$ , $j\in[d_{x}]$ . Here, $K\equiv K_{nT}$ is a sequence of positive integers tending to infinity as $nT$ increases. For simplicity of presentation, we use the same basis function $\phi_{k}$ and the same basis order $K$ to approximate both $\alpha_{0}$ and $\beta_{0}$ . Define $\theta_{0}=(\theta_{0\alpha}^{\top},\theta_{01}^{\top},\ldots,\theta_{0d_{x}}^{\top})^{\top}$ , $\phi^{K}(s)=(\phi_{1}(s),\ldots,\phi_{K}(s))^{\top}$ ,

\displaystyle R_{it}(s)\coloneqq(A(\overline{Y}_{it},s),X_{it}^{\top})^{\top},\;\;H_{it}(s)\coloneqq R_{it}(s)\otimes\phi^{K}(s),\;\;\text{and}\;\;H_{t}(s)=(H_{1t}(s),\ldots,H_{nt}(s))^{\top}.

(3.3)
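To illustrate (3.1) to (3.3), the sketch below uses the cosine basis $\phi_{1}(s)=1$ , $\phi_{k}(s)=\sqrt{2}\cos((k-1)\pi s)$ , which is one continuous orthonormal basis on $[0,1]$ ; the basis choice here is an assumption, and the target function and covariate values are hypothetical. It checks orthonormality numerically, shows that the uniform series approximation error of a smooth function decreases with $K$ , and forms $H_{it}(s)=R_{it}(s)\otimes\phi^{K}(s)$ with a Kronecker product.

```python
import numpy as np

def cosine_basis(s, K):
    """phi^K(s) for the cosine basis phi_1 = 1, phi_k(s) = sqrt(2) cos((k-1) pi s).

    Returns an array of shape (len(s), K); row l is phi^K(s_l)'.
    """
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(np.atleast_1d(s), np.arange(K)))
    Phi[:, 0] = 1.0
    return Phi

G = 2000
s = (np.arange(G) + 0.5) / G                      # fine midpoint grid on [0, 1]
Phi = cosine_basis(s, 16)
gram = Phi.T @ Phi / G                            # approximates int phi_j phi_k ds
print(np.allclose(gram, np.eye(16), atol=1e-10))  # True: orthonormal

# Series approximation (3.1) of a hypothetical alpha_0: uniform error shrinks with K
alpha0 = 0.3 + 0.2 * np.sin(2 * np.pi * s)
errs = []
for K in (4, 8, 16):
    theta_alpha = Phi[:, :K].T @ alpha0 / G       # coefficients <alpha_0, phi_k>
    errs.append(np.max(np.abs(Phi[:, :K] @ theta_alpha - alpha0)))
print(errs)

# Regressor block (3.3) at one s: H_it(s) = R_it(s) kron phi^K(s)
R_it = np.array([1.7, 0.4, -1.1])                # hypothetical (A(Ybar_it, s), X_it') with d_x = 2
H_it = np.kron(R_it, cosine_basis(0.25, 8)[0])
print(H_it.shape)                                 # (24,) = ((d_x + 1) K,)
```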

Then, we can further re-write the model (1.2) as

\displaystyle Y_{t}(s)=H_{t}(s)\theta_{0}+F_{0}(s)+V_{t}(s)+\mathcal{E}_{t}(s),\;\;t\in[T].

(3.4)

Here, $V_{t}(s)=(v_{1t}(s),\ldots,v_{nt}(s))^{\top}$ is an $n\times 1$ vector of series approximation errors:

\displaystyle v_{it}(s)\coloneqq A(\overline{Y}_{it},s)\{\alpha_{0}(s)-\phi^{K}(s)^{\top}\theta_{0\alpha}\}+\sum_{j=1}^{d_{x}}X_{it}^{j}\{\beta_{0j}(s)-\phi^{K}(s)^{\top}\theta_{0j}\}.

(3.5)

Under the assumptions we will introduce, this approximation error diminishes to zero at a certain rate as $K$ goes to infinity. How to choose an appropriate $K$ will be discussed in Remark 3.2.

Further, let

\displaystyle\bm{Y}(s)=\left(\begin{array}[]{c}Y_{1}(s)\\ \vdots\\ Y_{T}(s)\end{array}\right),\quad\bm{H}(s)=\left(\begin{array}[]{c}H_{1}(s)\\ \vdots\\ H_{T}(s)\end{array}\right),\quad\bm{V}(s)=\left(\begin{array}[]{c}V_{1}(s)\\ \vdots\\ V_{T}(s)\end{array}\right),\quad\bm{\mathcal{E}}(s)=\left(\begin{array}[]{c}\mathcal{E}_{1}(s)\\ \vdots\\ \mathcal{E}_{T}(s)\end{array}\right),

(3.18)

and let $\underset{n(T-1)\times nT}{\bm{D}}=(d_{ij})$ be the first-difference operator, whose $(i,j)$ -th element is defined as

\displaystyle d_{ij}=\begin{cases}-1&\text{if}\;\;i=j\\ 1&\text{if}\;\;n+i=j\\ 0&\text{otherwise}\end{cases}

(3.19)
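The matrix $\bm{D}$ in (3.19) is the Kronecker product of a $(T-1)\times T$ differencing matrix with $I_{n}$ . The sketch below, with hypothetical small dimensions, builds it both ways, checks that they agree, and confirms that a time-invariant fixed effect stacked over $t$ is mapped to zero.

```python
import numpy as np

def first_difference_matrix(n, T):
    """The n(T-1) x nT matrix D of (3.19): (D y)_i = y_{n+i} - y_i.

    Applied to a stacked vector (Y_1', ..., Y_T')', it returns
    (Y_2 - Y_1, ..., Y_T - Y_{T-1}), i.e. first differences over t.
    """
    Delta = np.eye(T - 1, T, k=1) - np.eye(T - 1, T)  # -1 on diagonal, +1 to its right
    return np.kron(Delta, np.eye(n))

n, T = 3, 4
D = first_difference_matrix(n, T)

# Entry-by-entry construction directly from the definition (3.19), 1-indexed
D_def = np.zeros((n * (T - 1), n * T))
for i in range(1, n * (T - 1) + 1):
    for j in range(1, n * T + 1):
        if i == j:
            D_def[i - 1, j - 1] = -1.0
        elif n + i == j:
            D_def[i - 1, j - 1] = 1.0

# A time-invariant fixed effect, stacked over t, is annihilated by D
F0 = np.array([2.0, -1.0, 0.5])
print(np.allclose(D, D_def), np.allclose(D @ np.tile(F0, T), 0.0))  # True True
```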

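Since $\bm{D}$ maps any stacked time-invariant vector $(F_{0}(s)^{\top},\ldots,F_{0}(s)^{\top})^{\top}$ to zero, premultiplying the stacked version of (3.4) by $\bm{D}$ removes the unknown fixed effects: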

\displaystyle\bm{D}\bm{Y}(s)=\bm{D}\bm{H}(s)\theta_{0}+\bm{D}\bm{V}(s)+\bm{D}\bm{\mathcal{E}}(s).

(3.20)

We estimate $\theta_{0}$ based on this expression. In order to consistently estimate $\theta_{0}$ , we need to address the endogeneity issue caused by the simultaneous interactions of the response functions; that is, since $A(\overline{Y}_{it},s)$ is correlated with the error term $\varepsilon_{it}(s)$ in general, simply regressing $\bm{D}\bm{Y}(s)$ on $\bm{D}\bm{H}(s)$ does not yield a consistent estimate of $\theta_{0}$ . To tackle this issue, we employ an instrumental variable (IV) approach.

Suppose we have a $d_{q}\times 1$ vector of IVs $Q_{it}=(Q_{it}^{1},\ldots,Q_{it}^{d_{q}})^{\top}$ for $A(\overline{Y}_{it},s)$ . For example, noting that $W_{n}Y_{t}=W_{n}\mathcal{A}Y_{t}+W_{n}X_{t}\beta_{0}+W_{n}F_{0}+W_{n}\mathcal{E}_{t}$ , we see that the network-lagged covariates $\overline{X}_{it}\coloneqq\sum_{j=1}^{n}w_{i,j}X_{jt}$ (and also their lags) are valid candidates for $Q_{it}$ . Define

\displaystyle B_{it}\coloneqq(Q_{it}^{\top},X_{it}^{\top})^{\top},\;\;Z_{it}(s)\coloneqq B_{it}\otimes\phi^{K}(s),\;\;Z_{t}(s)=(Z_{1t}(s),\ldots,Z_{nt}(s))^{\top},

(3.21)

and $\bm{Z}(s)=(Z_{1}(s)^{\top},\ldots,Z_{T}(s)^{\top})^{\top}$ . Then, we have $\mathbb{E}[\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s)]=\bm{0}_{(d_{q}+d_{x})K}$ .

Although one can estimate $\theta_{0}$ based only on the linear moment conditions $\mathbb{E}[\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s)]=\bm{0}_{(d_{q}+d_{x})K}$ , which results in a two-stage least squares (2SLS) type estimator, we can additionally utilize quadratic moment conditions to improve the efficiency of estimation (see, e.g., Lin and Lee, 2010). That is, under the independence assumption on the error terms $\{\varepsilon_{it}(s)\}_{i\in[n],t\in[T]}$ (see Assumption 3.3(i) below), for any $n(T-1)\times n(T-1)$ matrices of the form $P_{m}\coloneqq I_{T-1}\otimes P_{m,1}$ , where each $P_{m,1}$ ( $m=1,\ldots,M$ ) is an $n\times n$ matrix whose diagonal elements are all zero, we have

\displaystyle\mathbb{E}[\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s)]=0,\;\;m\in[M].

(3.22)

Some examples of $P_{m,1}$ include $P_{m,1}=W_{n}$ and $P_{m,1}=W_{n}^{\top}W_{n}-\text{diag}(W_{n}^{\top}W_{n})$ .
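The quadratic moment condition (3.22) relies only on the zero diagonal of $P_{m,1}$ and the independence of the errors. Writing $\Delta$ for the $(T-1)\times T$ differencing matrix, so that $\bm{D}=\Delta\otimes I_{n}$ , we have $\bm{D}^{\top}P_{m}\bm{D}=(\Delta^{\top}\Delta)\otimes P_{m,1}$ , whose diagonal is zero; since $\mathbb{E}[\bm{\mathcal{E}}^{\top}Q\bm{\mathcal{E}}]=\text{tr}(Q\Sigma)$ with a diagonal covariance $\Sigma$ , the expectation vanishes even under heteroskedasticity. The Monte Carlo sketch below, assuming a hypothetical ring network and independent Gaussian errors with unit-specific variances, compares $P_{m,1}=W_{n}$ and $W_{n}^{\top}W_{n}-\text{diag}(W_{n}^{\top}W_{n})$ with $W_{n}^{\top}W_{n}$ , whose nonzero diagonal violates the condition.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, reps = 15, 4, 20000

W = np.zeros((n, n))                                 # hypothetical ring network
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
WtW = W.T @ W
P1_choices = {
    "W": W,                                          # zero diagonal
    "W'W - diag(W'W)": WtW - np.diag(np.diag(WtW)),  # zero diagonal
    "W'W": WtW,                                      # nonzero diagonal, for contrast
}
D = np.kron(np.eye(T - 1, T, k=1) - np.eye(T - 1, T), np.eye(n))

# Independent, heteroskedastic errors: Var(eps_it) = sigma_i^2, stacked over t
sigma2 = np.tile(rng.uniform(0.5, 2.0, size=n), T)
E = rng.normal(size=(reps, n * T)) * np.sqrt(sigma2)

results = {}
for name, P1 in P1_choices.items():
    Qmat = D.T @ np.kron(np.eye(T - 1), P1) @ D      # D' P_m D
    exact = np.sum(np.diag(Qmat) * sigma2)           # E[e' Q e] = tr(Q Sigma)
    draws = np.sum((E @ Qmat) * E, axis=1)           # e' Q e for each replication
    results[name] = (exact, draws.mean(), draws.std() / np.sqrt(reps))
    print(f"{name:>16}: exact {exact:8.4f}  simulated {draws.mean():8.4f}")
```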

Combining the linear and quadratic moment conditions, we can construct our estimator based on the following $d_{g}\coloneqq(d_{q}+d_{x})K+M$ moment conditions: for $s\in[0,1]$ ,

\displaystyle\frac{1}{n(T-1)}\left(\begin{array}[]{c}\mathbb{E}[\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s)]\\ \mathbb{E}[\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{\mathcal{E}}(s)]\\ \vdots\\ \mathbb{E}[\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{\mathcal{E}}(s)]\end{array}\right)=\bm{0}_{d_{g}}.

(3.27)

As the empirical counterpart of these moment conditions, given a candidate value $\theta$ for $\theta_{0}$ , we define

\displaystyle\underset{d_{g}\times 1}{g_{nT}(s;\theta)}\coloneqq\frac{1}{n(T-1)}\left(\begin{array}[]{c}\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{E}(s;\theta)\\ \bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{E}(s;\theta)\\ \vdots\\ \bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{E}(s;\theta)\end{array}\right),

(3.32)

where $\bm{E}(s;\theta)\coloneqq\bm{Y}(s)-\bm{H}(s)\theta$ . The estimator of $\theta_{0}$ is obtained by minimizing a weighted norm of the moment function $g_{nT}(s;\theta)$ integrated over $s\in[0,1]$ . To this end, we pre-specify $L\equiv L_{nT}$ grid points in $[0,1]$ , denoted by $0<s_{1}<\cdots<s_{L}<1$ , and numerically integrate the moment functions across these points:

\displaystyle\overline{g}_{nT}(\theta)\coloneqq\frac{1}{L}\sum_{l=1}^{L}g_{nT}(s_{l};\theta).

(3.33)

Situations where the response functions are not fully observable are discussed in Remark 3.3.
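For concreteness, a direct transcription of (3.32) and (3.33) is sketched below. It assumes that $\bm{Y}(s_{l})$ , $\bm{H}(s_{l})$ , and $\bm{Z}(s_{l})$ have already been evaluated on the grid and stacked as arrays; the function name and the toy dimensions are hypothetical. In the toy check there is no error term, so $\bm{D}$ removes the fixed effects and every moment vanishes at the true $\theta$ .

```python
import numpy as np

def integrated_moments(theta, Y, H, Z, D, P_list):
    """Integrated moment vector g_bar_nT(theta) of (3.33).

    Averages g_nT(s_l; theta) from (3.32) over the L grid points. Shapes:
    Y: (L, nT) stacked outcomes; H: (L, nT, p) regressors; Z: (L, nT, q)
    instruments; D: (n(T-1), nT) first-difference matrix; P_list: the M
    matrices P_m of size (n(T-1), n(T-1)). Returns a vector of length q + M.
    """
    n_T1 = D.shape[0]                                   # n(T - 1)
    g_sum = np.zeros(Z.shape[2] + len(P_list))
    for Y_l, H_l, Z_l in zip(Y, H, Z):
        DE = D @ (Y_l - H_l @ theta)                    # D E(s_l; theta)
        linear = (D @ Z_l).T @ DE                       # Z' D' D E
        quadratic = [DE @ P @ DE for P in P_list]       # E' D' P_m D E
        g_sum += np.concatenate([linear, quadratic]) / n_T1
    return g_sum / len(Y)

# Toy check with hypothetical dimensions
rng = np.random.default_rng(4)
n, T, L, p, q = 5, 3, 10, 4, 6
Delta = np.eye(T - 1, T, k=1) - np.eye(T - 1, T)        # rows: y_{t+1} - y_t
D = np.kron(Delta, np.eye(n))
P1 = np.ones((n, n)) - np.eye(n)                        # zero diagonal
P_list = [np.kron(np.eye(T - 1), P1)]
H = rng.normal(size=(L, n * T, p))
Z = rng.normal(size=(L, n * T, q))
theta0 = rng.normal(size=p)
F = rng.normal(size=(L, n))                             # F_0(s_l), repeated over t
Y = H @ theta0 + np.tile(F, (1, T))
print(np.max(np.abs(integrated_moments(theta0, Y, H, Z, D, P_list))))
```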

Now, we are ready to introduce our estimator:

\displaystyle\begin{split}&\widehat{\theta}_{nT}=(\widehat{\theta}_{nT,\alpha}^{\top},\widehat{\theta}_{nT,1}^{\top},\ldots,\widehat{\theta}_{nT,d_{x}}^{\top})^{\top}\coloneqq\operatorname*{argmin}_{\theta\in\Theta_{K}}\mathcal{Q}_{nT}(\theta)\\ &\text{where}\;\;\mathcal{Q}_{nT}(\theta)\coloneqq\overline{g}_{nT}(\theta)^{\top}\Omega_{nT}\overline{g}_{nT}(\theta),\end{split}

(3.34)

$\Omega_{nT}$ is a $d_{g}\times d_{g}$ positive definite symmetric weight matrix, and $\Theta_{K}\subset\mathbb{R}^{(d_{x}+1)K}$ is a compact parameter space containing $\theta_{0}$ in its interior. For example, one can use $\Omega_{nT}=I_{d_{g}}$ or

\displaystyle\Omega_{nT}=\left(\begin{array}[]{cc}\left(\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{Z}(s_{l})/N\right)^{-1}&\bm{0}_{(d_{q}+d_{x})K\times M}\\ \bm{0}_{M\times(d_{q}+d_{x})K}&I_{M}\end{array}\right),

(3.37)

where

\displaystyle N\coloneqq nL(T-1).

(3.38)
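The sketch below illustrates the estimator on simulated data in a simplified form. It uses only the linear moment conditions ( $M=0$ ), in which case $\overline{g}_{nT}(\theta)=b-G\theta$ is affine in $\theta$ , where $b$ and $G$ collect the grid sums of $\bm{Z}^{\top}\bm{D}^{\top}\bm{D}\bm{Y}$ and $\bm{Z}^{\top}\bm{D}^{\top}\bm{D}\bm{H}$ divided by $N$ ; minimizing (3.34) with the upper-left block of (3.37) as the weight matrix then has the closed-form 2SLS-type solution $\widehat{\theta}=(G^{\top}\Omega G)^{-1}G^{\top}\Omega b$ . Everything else is a hypothetical choice: a concurrent interaction, a ring network, a cosine basis with $K=3$ , one covariate, instruments $(W_{n}X_{t},W_{n}^{2}X_{t})$ , and Gaussian errors that are independent across grid points. Including the quadratic moments would require numerical minimization of (3.34).

```python
import numpy as np

rng = np.random.default_rng(5)
n, T, L, K = 60, 8, 25, 3
s_grid = (np.arange(L) + 0.5) / L                 # grid points s_1 < ... < s_L

def cosine_basis(s, K):
    """Hypothetical basis choice: phi_1 = 1, phi_k = sqrt(2) cos((k-1) pi s)."""
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(np.atleast_1d(s), np.arange(K)))
    Phi[:, 0] = 1.0
    return Phi                                    # shape (len(s), K)

Phi = cosine_basis(s_grid, K)                     # row l is phi^K(s_l)'
alpha0 = 0.4 + 0.2 * np.cos(np.pi * s_grid)       # hypothetical true functions, both
beta0 = 1.0 + 0.5 * np.cos(np.pi * s_grid)        # lying in the span of phi^K

W = np.zeros((n, n))                              # ring network, row-normalized
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

# Simulate the concurrent model pointwise on the grid
X = rng.normal(size=(T, n))                       # one scalar covariate (d_x = 1)
F0 = rng.normal(size=(n, 1)) * (1.0 + s_grid)     # fixed effect functions f_0i(s_l)
Y = np.empty((T, n, L))
for t in range(T):
    for l in range(L):
        rhs = X[t] * beta0[l] + F0[:, l] + 0.2 * rng.normal(size=n)
        Y[t, :, l] = np.linalg.solve(np.eye(n) - alpha0[l] * W, rhs)

D = np.kron(np.eye(T - 1, T, k=1) - np.eye(T - 1, T), np.eye(n))  # (3.19)
Ybar = np.einsum("ij,tjl->til", W, Y)             # A(Ybar_it, s_l) in the concurrent case
WX = X @ W.T                                      # network-lagged covariate W X_t
Qiv = np.stack([WX, WX @ W.T], axis=-1)           # instruments (W X_t, W^2 X_t)

N = n * L * (T - 1)
G_mat, b_vec, ZZ = 0.0, 0.0, 0.0
for l in range(L):
    R = np.stack([Ybar[:, :, l], X], axis=-1).reshape(n * T, 2)
    B = np.concatenate([Qiv, X[..., None]], axis=-1).reshape(n * T, 3)
    DH = D @ np.kron(R, Phi[l])                   # D H(s_l), rows R_it kron phi^K(s_l)
    DZ = D @ np.kron(B, Phi[l])                   # D Z(s_l), rows B_it kron phi^K(s_l)
    DY = D @ Y[:, :, l].reshape(n * T)
    G_mat = G_mat + DZ.T @ DH / N
    b_vec = b_vec + DZ.T @ DY / N
    ZZ = ZZ + DZ.T @ DZ / N

# Linear moments only: g_bar(theta) = b_vec - G_mat @ theta, so minimizing (3.34)
# with the linear block of (3.37) as weight has a closed-form solution.
Omega = np.linalg.inv(ZZ)
theta_hat = np.linalg.solve(G_mat.T @ Omega @ G_mat, G_mat.T @ Omega @ b_vec)
alpha_hat = Phi @ theta_hat[:K]                   # (3.39) on the grid
beta_hat = Phi @ theta_hat[K:]                    # (3.40) on the grid
print(np.max(np.abs(alpha_hat - alpha0)), np.max(np.abs(beta_hat - beta0)))
```

With these settings both maximum errors should be small relative to the variation of $\alpha_{0}$ and $\beta_{0}$ , although their exact values depend on the random draws.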

Once $\widehat{\theta}_{nT}$ is obtained, the estimators of $\alpha_{0}(s)$ and $\beta_{0}(s)$ are given as

\displaystyle\widehat{\alpha}_{nT}(s)\coloneqq\phi^{K}(s)^{\top}\widehat{\theta}_{nT,\alpha},

(3.39)

\displaystyle\widehat{\beta}_{nT,j}(s)\coloneqq\phi^{K}(s)^{\top}\widehat{\theta}_{nT,j},\;\;j\in[d_{x}]

(3.40)

which we refer to as the integrated-GMM estimators. Additionally, if one is interested in the estimation of individual fixed effect functions, the following estimator can be used:

\displaystyle\widehat{f}_{ni}(s)\coloneqq\frac{1}{T}\sum_{t=1}^{T}\left(Y_{it}(s)-\widehat{\alpha}_{nT}(s)A(\overline{Y}_{it},s)-X_{it}^{\top}\widehat{\beta}_{nT}(s)\right),\;\;i\in[n]

(3.41)

where $\widehat{\beta}_{nT}(s)=(\widehat{\beta}_{nT,1}(s),\ldots,\widehat{\beta}_{nT,d_{x}}(s))^{\top}$ . Consistent estimation of $f_{0i}(s)$ by $\widehat{f}_{ni}(s)$ requires $T$ to increase to infinity, whereas $\alpha_{0}(s)$ and $\beta_{0}(s)$ can be consistently estimated even when $T$ is fixed. More specifically, because $\widehat{f}_{ni}(s)$ averages over only the $T$ observations of unit $i$ , the mean-zero errors $\varepsilon_{it}(s)$ are averaged out by the law of large numbers only if $T\to\infty$ . Unlike $\widehat{\alpha}_{nT}(s)$ and $\widehat{\beta}_{nT}(s)$ , the estimator $\widehat{f}_{ni}(s)$ is not necessarily continuous, as we do not preclude cases where $Y_{it}(s)$ and $A(\overline{Y}_{it},s)$ are discontinuous in $s$ .
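A minimal sketch of (3.41) on a grid of evaluation points, assuming that the interaction term $A(\overline{Y}_{it},s)$ has already been evaluated on that grid; the array layout and the function name are hypothetical.

```python
import numpy as np


def fixed_effect_functions(Y, AY, X, alpha_hat, beta_hat):
    """Evaluate the fixed-effect estimator (3.41) on a grid of G points.

    Y, AY: (n, T, G) arrays with Y_it(s_g) and A(Ybar_it, s_g);
    X: (n, T, d_x) covariates; alpha_hat: (G,); beta_hat: (G, d_x).
    Returns an (n, G) array whose row i is f_hat_i on the grid.
    """
    fitted = alpha_hat[None, None, :] * AY + np.einsum("ntd,gd->ntg", X, beta_hat)
    return (Y - fitted).mean(axis=1)   # time average over t = 1, ..., T
```

When the true $\alpha_{0}$ and $\beta_{0}$ are plugged in, $\widehat{f}_{ni}(s)-f_{0i}(s)$ equals $T^{-1}\sum_{t=1}^{T}\varepsilon_{it}(s)$ exactly, which makes the role of $T$ explicit.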

Asymptotic theory

To derive the asymptotic properties of our estimator, we first need to specify the structure of the sampling space. Following Jenish and Prucha (2012), let $\mathcal{D}\subset\mathbb{R}^{d}$ , $1\leq d<\infty$ , be a possibly uneven lattice, and let $\mathcal{D}_{n}\subset\mathcal{D}$ be the set of observation locations. Once the observation locations are determined for a given sample of size $n$ , we assume that they do not vary over time $t$ . For spatial data, $\mathcal{D}$ would be a geographical space with $d=2$ . Note that $\mathcal{D}$ need not be observable to the researcher. For example, $\mathcal{D}$ may be a complex space of general social and economic characteristics; in this case, it can be regarded as an embedding of individuals in a latent space rather than as their physical locations.

We first derive the rates of convergence of our estimators under the following set of assumptions.

Assumption 3.1 (Sampling space).

(i) The maximum coordinate difference (i.e., the Chebyshev distance) between any two distinct observations $i,j\in\mathcal{D}$ , which we denote as $\Delta(i,j)$ , is at least 1 (without loss of generality); and (ii) there exists a threshold distance $\overline{\Delta}$ such that $w_{i,j}=0$ if $\Delta(i,j)>\overline{\Delta}$ .

Assumption 3.2 (Observables).

(i) $\{(X_{it},Q_{it})\}_{i\in[n],t\in[T]}$ are non-stochastic and uniformly bounded; and (ii) for all $s\in[0,1]$ , $i\in[n]$ , and $t\in[T]$ , $||Y_{it}(s)||_{p}\lesssim 1$ for some $p>4$ .

Assumption 3.3 (Error term).

(i) $\{\varepsilon_{it}\}_{i\in[n],t\in[T]}$ are independent; (ii) for all $s\in[0,1]$ , $i\in[n]$ , and $t\in[T]$ , $\mathbb{E}[\varepsilon_{it}(s)]=0$ , $||\varepsilon_{it}(s)||_{2}>0$ , and $||\varepsilon_{it}(s)||_{4}\lesssim 1$ ; and (iii) for all $i\in[n]$ and $t\in[T]$ , $\sum_{k=1}^{K}\left(L^{-2}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{it}(s_{l},s_{l^{\prime}})\phi_{k}(s_{l})\phi_{k}(s_{l^{\prime}})\right)\lesssim 1$ uniformly in $K$ , where $\Gamma_{it}(s_{l},s_{l^{\prime}})\coloneqq\text{Cov}(\varepsilon_{it}(s_{l}),\varepsilon_{it}(s_{l^{\prime}}))$ .

Assumption 3.4 (Interaction operator).

There exists a function $\omega_{p}$ satisfying $|A(h,s)|^{p}\leq\int_{0}^{1}|h(u)|^{p}\>\omega_{p}(u,s)\text{d}u$ for any given $1\leq p<\infty$ , such that $\int_{0}^{1}\omega_{p}(u,s)\text{d}u\leq 1$ for all $s\in[0,1]$ .

Assumption 3.5 (Weight matrices).

(i) For all $m\in[M]$ , $P_{m,1}$ is symmetric, $\text{diag}(P_{m,1})=\bm{0}_{n}$ , and $||P_{m,1}||_{1},||P_{m,1}||_{\infty}\lesssim 1$ . In addition, writing $P_{m,1}=(p_{m,i,j})$ , a threshold distance $\overline{\Delta}_{m}$ exists such that $p_{m,i,j}=0$ if $\Delta(i,j)>\overline{\Delta}_{m}$ ; and (ii) $0<\lambda_{\min}(\Omega_{nT})\leq\lambda_{\max}(\Omega_{nT})\lesssim 1$ for all sufficiently large $nT$ .

Assumption 3.6 (Identification).

For all sufficiently large $nT$ , $0<\lambda_{\min}\left(\Pi_{nT}^{\top}\Pi_{nT}\right)\leq\lambda_{\max}\left(\Pi_{nT}^{\top}\Pi_{nT}\right)\lesssim 1$ , where $\Pi_{nT}\coloneqq N^{-1}\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{H}(s_{l})]$ , and $N=nL(T-1)$ .

Assumption 3.7 (Series approximation).

$\{\phi_{k}:k=1,2,\ldots\}$ is a series of continuous orthonormal basis functions satisfying $\sup_{s\in[0,1]}|\alpha_{0}(s)-\phi^{K}(s)^{\top}\theta_{0\alpha}|\lesssim K^{-\pi}$ and $\max_{j\in[d_{x}]}\sup_{s\in[0,1]}|\beta_{0j}(s)-\phi^{K}(s)^{\top}\theta_{0j}|\lesssim K^{-\pi}$ .

Assumptions 3.1(i) and (ii) together imply that the number of interacting partners of each unit is bounded (i.e., the network must be sparse). These assumptions play a crucial role in characterizing the stochastic process of the outcome functions. In Assumption 3.2, part (i) assumes that the covariates are non-stochastic and bounded. This type of assumption is frequently used in the spatial and network literature and can be interpreted as conducting the analysis conditional on the realized values of the covariates. Meanwhile, part (ii) is a moment condition used to establish the convergence results for the quadratic moments.
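The sparsity implication can be made concrete with a volume argument (our own illustration, not taken from the paper). If distinct units are at Chebyshev distance at least 1, the open cubes of half-width 1/2 centred at them are disjoint, and those belonging to units within distance $\overline{\Delta}$ of unit $i$ all fit inside a cube of side $2\overline{\Delta}+1$ , so unit $i$ has at most $(2\overline{\Delta}+1)^{d}-1$ potential partners. The sketch checks this bound on randomly occupied cells of an integer lattice; the lattice size and the function names are hypothetical.

```python
import numpy as np


def max_partners(coords, threshold):
    """Largest number of other units within Chebyshev distance `threshold` of a unit."""
    cheb = np.abs(coords[:, None, :] - coords[None, :, :]).max(axis=2)
    within = (cheb <= threshold) & ~np.eye(len(coords), dtype=bool)
    return int(within.sum(axis=1).max())


def partner_bound(threshold, d):
    """Volume bound (2 * threshold + 1)^d - 1 implied by Assumption 3.1."""
    return (2 * threshold + 1) ** d - 1


# Hypothetical example: 80 units on distinct cells of a 13 x 13 integer lattice,
# so any two distinct units are at Chebyshev distance >= 1, as in Assumption 3.1(i).
rng = np.random.default_rng(0)
side, n = 13, 80
cells = rng.choice(side * side, size=n, replace=False)
coords = np.column_stack(np.unravel_index(cells, (side, side))).astype(float)
```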

Assumption 3.3(i) allows the error terms to be fully heteroskedastic, since they need not be identically distributed. Part (ii) is standard. Part (iii) is a high-level condition that plays an important role in obtaining the parametric convergence rate for the GMM estimator. In general, if $\Gamma_{it}$ belongs to $L^{2}([0,1]^{2})$ , it admits the following series expansion:

\displaystyle\Gamma_{it}(s_{l},s_{l^{\prime}})=\sum_{k_{1},k_{2}=1}^{\infty}\kappa_{it,k_{1},k_{2}}\phi_{k_{1}}(s_{l})\phi_{k_{2}}(s_{l^{\prime}})

(3.42)

for some sequence of constants $\{\kappa_{it,k_{1},k_{2}}\}$ . By the orthonormality of $\phi_{k}$ ,

	$\displaystyle\int_{0}^{1}\int_{0}^{1}\Gamma_{it}(s_{l},s_{l^{\prime}})\phi_{k}(s_{l})\phi_{k}(s_{l^{\prime}})\,\text{d}s_{l}\,\text{d}s_{l^{\prime}}$	$\displaystyle=\sum_{k_{1},k_{2}=1}^{\infty}\kappa_{it,k_{1},k_{2}}\left(\int_{0}^{1}\phi_{k}(s_{l})\phi_{k_{1}}(s_{l})\,\text{d}s_{l}\right)\left(\int_{0}^{1}\phi_{k}(s_{l^{\prime}})\phi_{k_{2}}(s_{l^{\prime}})\,\text{d}s_{l^{\prime}}\right)$		(3.43)
		$\displaystyle=\kappa_{it,k,k}.$		(3.44)

Since $L^{-2}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{it}(s_{l},s_{l^{\prime}})\phi_{k}(s_{l})\phi_{k}(s_{l^{\prime}})$ can be seen as a numerical approximation of the left-hand side of the above expression, Assumption 3.3(iii) essentially requires that $\sum_{k=1}^{K}\kappa_{it,k,k}\lesssim 1$ uniformly in $K$ . In particular, if the $\kappa_{it,k,k}$ ’s are arranged in decreasing order such that $\kappa_{it,1,1}\geq\kappa_{it,2,2}\geq\cdots$ , two sufficient conditions for this assumption are as follows: there exists a constant $a>1$ such that $\kappa_{it,k,k}\lesssim k^{-a}$ , or there exists a fixed $b$ such that $\kappa_{it,k,k}=0$ for all $k>b$ .
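To see what Assumption 3.3(iii) looks like in a concrete case, the sketch below evaluates the per- $k$ terms for the error covariance implied by the simulation design in Section 5 (taking $\text{deg}_{i}=0$ ), namely $\Gamma(s,s^{\prime})=0.4^{2}(1+ss^{\prime}+s^{2}s^{\prime 2})$ . It uses an orthonormal cosine basis on an evenly spaced midpoint grid; this basis is our choice for illustration, not the basis used in the paper. On this grid the first $L$ cosine functions are discretely orthogonal (the DCT-II property), so the partial sums increase to exactly $L^{-1}\sum_{l}\Gamma(s_{l},s_{l})\approx\int_{0}^{1}\Gamma(s,s)\text{d}s$ and remain bounded in $K$ .

```python
import numpy as np


def cosine_basis(s, K):
    """Orthonormal cosine basis on [0, 1]: phi_1 = 1, phi_k = sqrt(2) cos((k - 1) pi s)."""
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(s, np.arange(K)))
    Phi[:, 0] = 1.0
    return Phi  # shape (L, K)


def assumption_terms(Gamma, Phi):
    """k-th entry: L^{-2} sum_{l, l'} Gamma(s_l, s_l') phi_k(s_l) phi_k(s_l')."""
    L = Gamma.shape[0]
    return np.einsum("lk,lm,mk->k", Phi, Gamma, Phi) / L**2


# Error covariance implied by the Section 5 design with degree 0:
# eps(s) = e1 + e2 s + e3 s^2 with e_j ~ N(0, 0.4^2) independent.
L = 30
s = (np.arange(1, L + 1) - 0.5) / L          # evenly spaced midpoints in (0, 1)
V = np.column_stack([np.ones(L), s, s**2])
Gamma = 0.4**2 * V @ V.T                     # Gamma(s_l, s_l') on the grid
terms = assumption_terms(Gamma, cosine_basis(s, L))
partial_sums = np.cumsum(terms)              # nondecreasing because Gamma is PSD
```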

Assumption 3.4 is not restrictive in most empirically relevant situations. For example, in the case where $A(h,s)=h(s)$ , we can set $\omega_{p}(u,s)=\delta(u-s)$ for any $p$ . As another example, when $A(h,s)=\int_{0}^{1}h(u)\nu(u,s)\text{d}u$ for some kernel $\nu(u,s)$ , Hölder's inequality gives $|A(h,s)|^{p}\leq\int_{0}^{1}|h(u)|^{p}|\nu(u,s)|^{p}\text{d}u$ , so we can set $\omega_{p}(u,s)=|\nu(u,s)|^{p}$ , provided that $\int_{0}^{1}|\nu(u,s)|^{p}\text{d}u\leq 1$ for all $s\in[0,1]$ .
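A quick numerical check of the second example, using the kernel $\nu(u,s)=0.75(1-(u-s)^{2})$ from the simulation design in Section 5: the sketch verifies on a grid that $\int_{0}^{1}|\nu(u,s)|^{p}\text{d}u\leq 1$ for several values of $p$ and that the Hölder bound holds for an arbitrary test function $h$ . The grid size, the test function, and the function names are arbitrary choices for illustration.

```python
import numpy as np

G = 200
u = (np.arange(1, G + 1) - 0.5) / G      # midpoint grid on (0, 1)
du = 1.0 / G


def nu(u, s):
    """Kernel from the Section 5 design: nu(u, s) = 0.75 (1 - (u - s)^2)."""
    return 0.75 * (1.0 - (u - s) ** 2)


def omega_mass(p):
    """max over grid points s of int_0^1 |nu(u, s)|^p du (Assumption 3.4 needs <= 1)."""
    return max(np.sum(np.abs(nu(u, s)) ** p) * du for s in u)


def holder_gap(h, s, p):
    """int |h|^p |nu|^p du - |A(h, s)|^p, which Hoelder's inequality makes >= 0."""
    A = np.sum(h * nu(u, s)) * du
    return np.sum(np.abs(h) ** p * np.abs(nu(u, s)) ** p) * du - abs(A) ** p
```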

In Assumption 3.5, we assume that the weight matrices in the quadratic moments are symmetric. This assumption entails no loss of generality because $a^{\top}P_{m,1}a=a^{\top}P_{m,1}^{\top}a$ for any $n\times 1$ vector $a$ ; if $P_{m,1}$ is not symmetric in practice, we can always symmetrize it as $(P_{m,1}+P_{m,1}^{\top})/2$ . The assumption of a threshold distance $\overline{\Delta}_{m}$ may be non-standard, but it simplifies the proof. Since the $P_{m,1}$ ’s are usually constructed from the interaction matrix $W_{n}$ and its powers, this assumption is consistent with Assumption 3.1(ii).
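The symmetrization argument can be checked directly. The sketch below uses a hypothetical random row-normalized $W_{n}$ , takes $P_{1,1}=W_{n}$ (which is generally not symmetric), and confirms that $(P_{1,1}+P_{1,1}^{\top})/2$ is symmetric with a zero diagonal and yields the same quadratic form. It also builds $W_{n}^{\top}W_{n}-\text{diag}(W_{n}^{\top}W_{n})$ , the second weight matrix used in Section 5, which is symmetric by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# Hypothetical sparse weights with zero diagonal, then row-normalized.
W = rng.random((n, n)) * (rng.random((n, n)) < 0.4)
np.fill_diagonal(W, 0.0)
row_sums = W.sum(axis=1, keepdims=True)
W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

P = W                                   # e.g. P_{1,1} = W_n, generally not symmetric
P_sym = (P + P.T) / 2                   # symmetrized version satisfying Assumption 3.5(i)
WtW = W.T @ W
P2 = WtW - np.diag(np.diag(WtW))        # P_{2,1} from Section 5: symmetric, zero diagonal
a = rng.normal(size=n)                  # arbitrary n x 1 vector
```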

Assumption 3.6 is a regularity condition that ensures the identifiability of $\theta_{0}$ . Assumption 3.7 is standard; for example, it is satisfied if spline basis functions are used and $\alpha_{0}$ and the $\beta_{0j}$ ’s belong to a Hölder class of smoothness order $\pi$ (see, e.g., Chen, 2007; Belloni et al., 2015).

Theorem 3.1 (Rates of convergence).

Suppose that Assumptions 2.1, 3.1 – 3.7 hold. In addition, assume that $K/\sqrt{nT}\to 0$ and $K^{1-\pi}\to 0$ as $nT\to\infty$ . Then,

(i)

$||\widehat{\theta}_{nT}-\theta_{0}||\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2}$
(ii)

$||\widehat{\alpha}_{nT}-\alpha_{0}||_{L^{2}}\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2}$ , and $\sup_{s\in[0,1]}|\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)|\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}$
(iii)

$||\widehat{\beta}_{nT,j}-\beta_{0j}||_{L^{2}}\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2}$ , and $\sup_{s\in[0,1]}|\widehat{\beta}_{nT,j}(s)-\beta_{0j}(s)|\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}$ for all $j\in[d_{x}]$ .

Result (i) of Theorem 3.1 states that if the functional parameters are sufficiently smooth that $K^{(1-2\pi)/2}\lesssim 1/\sqrt{nT}$ , the series coefficient estimator is consistent and converges at the parametric rate. This result might seem somewhat surprising since the dimension of $\theta_{0}$ increases to infinity. An intuitive explanation is that, although the sample size is $nT$ , the total number of observation points used in the estimation is $N$ ( $=nL(T-1)$ ). This fact, in conjunction with Assumption 3.3(iii), leads to the result. The same convergence rate applies to the $L^{2}$ -convergence rate of the functional estimators, as shown in (ii) and (iii). The uniform convergence rate of these estimators is slower than the $L^{2}$ -convergence rate by a factor of $K^{1/2}$ . However, note that the convergence results obtained here are not necessarily the sharpest, and the theoretically optimal convergence rates under our setup are also unknown. These points are left for future research.

Remark 3.1 (Local estimation approach).

If one adopts a local approach that directly estimates $\alpha_{0}(s)$ and $\beta_{0}(s)$ at each $s$ , then, since there are exactly $nT$ observations at each $s$ , it can be readily shown that $|\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)|\lesssim_{p}1/\sqrt{nT}$ and $||\widehat{\beta}_{nT}(s)-\beta_{0}(s)||\lesssim_{p}1/\sqrt{nT}$ . Since the local approach does not rely on series approximation, these rates are free from approximation bias terms. However, although it avoids this bias, the local estimator is difficult to analyze uniformly in $s$ , and deriving its uniform convergence rate is challenging.

We next present the limiting distributions of our estimators. To this end, we introduce the following notation and an additional assumption:

	$\displaystyle\underset{d_{g}\times(d_{x}+1)K}{J_{nT}(s;\theta)}\coloneqq\frac{\partial g_{nT}(s;\theta)}{\partial\theta^{\top}},\;\;\overline{J}_{nT}(\theta)\coloneqq\frac{1}{L}\sum_{l=1}^{L}J_{nT}(s_{l};\theta),\;\;\overline{J}_{nT}\coloneqq\mathbb{E}\left[\overline{J}_{nT}(\theta_{0})\right]$		(3.45)
	$\displaystyle\underset{d_{g}\times 1}{g_{1,nT}(s)}\coloneqq\frac{1}{n(T-1)}\left(\begin{array}[]{c}\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s)\\ \bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{\mathcal{E}}(s)\\ \vdots\\ \bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{\mathcal{E}}(s)\end{array}\right),\;\;\overline{g}_{1,nT}\coloneqq\frac{1}{L}\sum_{l=1}^{L}g_{1,nT}(s_{l})$		(3.50)
	$\displaystyle\mathcal{V}_{nT}\coloneqq n(T-1)\mathbb{E}\left[\overline{g}_{1,nT}\overline{g}_{1,nT}^{\top}\right]$		(3.51)
	$\displaystyle\underset{(d_{x}+1)K\times(d_{x}+1)K}{\Sigma_{nT}}\coloneqq\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}\mathcal{V}_{nT}\Omega_{nT}\overline{J}_{nT}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}.$		(3.52)

More explicit forms of the matrices $J_{nT}(s;\theta)$ and $\mathcal{V}_{nT}$ can be found in (A.23) and in (A.38) in Appendix A, respectively. Further, let $\mathbb{S}_{\alpha}$ and $\mathbb{S}_{j}$ be the $K\times(d_{x}+1)K$ selection matrices such that $\theta_{0\alpha}=\mathbb{S}_{\alpha}\theta_{0}$ and $\theta_{0j}=\mathbb{S}_{j}\theta_{0}$ hold.

Assumption 3.8 (Misc.).

For all sufficiently large $nT$ , (i) $\lambda_{\max}\left(N^{-1}\allowbreak\sum_{l=1}^{L}\mathbb{E}[\bm{H}(s_{l})^{\top}\bm{H}(s_{l})]\right)\lesssim 1$ ; (ii) $0<\lambda_{\min}\left(\overline{J}^{\top}_{nT}\overline{J}_{nT}\right)\leq\lambda_{\max}\left(\overline{J}^{\top}_{nT}\overline{J}_{nT}\right)\lesssim 1$ ; and (iii) $0<\lambda_{\min}\left(\mathcal{V}_{nT}\right)\leq\lambda_{\max}\left(\mathcal{V}_{nT}\right)\lesssim 1$ .

Theorem 3.2 (Asymptotic normality).

Suppose that Assumptions 2.1, 3.1 – 3.8 hold. In addition, assume that $K/\sqrt{nT}\to 0$ , $K^{2}/(\sqrt{nT}\left\|\phi^{K}(s)\right\|^{2})\to 0$ , and $\sqrt{nT}K^{(1-2\pi)/2}\to 0$ as $nT\to\infty$ . Then,

	(i)	$\displaystyle\;\;\frac{\sqrt{n(T-1)}\left(\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)\right)}{\sigma_{nT,\alpha}(s)}\overset{d}{\to}N(0,1)$		(3.53)
	(ii)	$\displaystyle\;\;\frac{\sqrt{n(T-1)}\left(\widehat{\beta}_{nT,j}(s)-\beta_{0j}(s)\right)}{\sigma_{nT,j}(s)}\overset{d}{\to}N(0,1),$		(3.54)

where $[\sigma_{nT,\alpha}(s)]^{2}\coloneqq\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\Sigma_{nT}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)$ , and $[\sigma_{nT,j}(s)]^{2}\coloneqq\phi^{K}(s)^{\top}\mathbb{S}_{j}\Sigma_{nT}\mathbb{S}_{j}^{\top}\phi^{K}(s)$ .

Theorem 3.2 establishes the pointwise asymptotic normality of the integrated-GMM estimators. As is common in series estimation, we impose additional undersmoothing conditions to ensure that the bias terms vanish sufficiently quickly.

In order to perform statistical inference based on the results of Theorem 3.2, we need to consistently estimate the variances $[\sigma_{nT,\alpha}(s)]^{2}$ and $[\sigma_{nT,j}(s)]^{2}$ . To save space, the procedure for consistent variance estimation is not discussed here but is provided in Appendix C.
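Given consistent estimates of $\overline{J}_{nT}$ , $\Omega_{nT}$ , and $\mathcal{V}_{nT}$ , a pointwise confidence interval follows from (3.52) to (3.54) by plugging them in. The variance estimator of Appendix C is not reproduced here; the sketch simply takes the three matrices as inputs, and all names are hypothetical. It also shows the selection matrices $\mathbb{S}_{\alpha}$ and $\mathbb{S}_{j}$ for the ordering $\theta=(\theta_{\alpha}^{\top},\theta_{1}^{\top},\ldots,\theta_{d_{x}}^{\top})^{\top}$ used in (3.34).

```python
import numpy as np


def selection_matrix(block, K, d_x):
    """K x (d_x+1)K matrix picking block `block` of theta (0: alpha, j: beta_j)."""
    S = np.zeros((K, (d_x + 1) * K))
    S[:, block * K:(block + 1) * K] = np.eye(K)
    return S


def sandwich(J, Omega, V):
    """Sigma_nT in (3.52) from (estimates of) Jbar_nT, Omega_nT and V_nT."""
    bread = np.linalg.inv(J.T @ Omega @ J)
    return bread @ J.T @ Omega @ V @ Omega @ J @ bread


def pointwise_ci(estimate, phi_s, S, Sigma, n, T, z=1.96):
    """Approximate 95% interval for alpha_0(s) or beta_0j(s) based on (3.53)-(3.54)."""
    sigma = np.sqrt(phi_s @ S @ Sigma @ S.T @ phi_s)
    half_width = z * sigma / np.sqrt(n * (T - 1))
    return estimate - half_width, estimate + half_width
```

When $\Omega_{nT}=\mathcal{V}_{nT}^{-1}$ , the sandwich collapses to $(\overline{J}_{nT}^{\top}\mathcal{V}_{nT}^{-1}\overline{J}_{nT})^{-1}$ , which is a convenient sanity check for an implementation.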

Remark 3.2 (Choice of $K$ ).

Suppose that $K$ is proportional to $(nT)^{c}$ for some $c>0$ and that $||\phi^{K}(s)||^{2}$ is of order $K$ . Then the condition $K^{2}/(\sqrt{nT}\left\|\phi^{K}(s)\right\|^{2})\to 0$ reduces to $K/\sqrt{nT}\to 0$ , so asymptotic normality requires $K/\sqrt{nT}\to 0$ and $\sqrt{nT}K^{(1-2\pi)/2}\to 0$ simultaneously. These two conditions reduce to the following condition on $c$ : $(2\pi-1)^{-1}<c<1/2$ . Since this interval is nonempty only if $(2\pi-1)^{-1}<1/2$ , the target functions must be sufficiently smooth, with smoothness $\pi>3/2$ .
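The arithmetic of Remark 3.2 can be encoded in a few lines. The sketch returns the admissible interval for $c$ and checks the two exponent conditions directly; as in the remark, it assumes that $||\phi^{K}(s)||^{2}$ is of order $K$ . The function names are hypothetical.

```python
def admissible_exponents(smoothness):
    """Interval of c (with K proportional to (nT)^c) allowed by Remark 3.2, or None if empty."""
    if smoothness <= 1.5:
        return None
    return 1.0 / (2.0 * smoothness - 1.0), 0.5


def rate_conditions_hold(c, smoothness):
    """Exponent checks for K / sqrt(nT) -> 0 and sqrt(nT) K^{(1 - 2 pi)/2} -> 0."""
    first = c - 0.5                                    # exponent of nT in K / sqrt(nT)
    second = 0.5 + c * (1.0 - 2.0 * smoothness) / 2.0  # exponent in sqrt(nT) K^{(1-2pi)/2}
    return first < 0 and second < 0
```

For example, with $\pi=2$ the admissible range is $1/3<c<1/2$ , whereas for $\pi\leq 3/2$ no exponent works.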

Remark 3.3 (Incompletely observed response function).

In its exact form, the integrated-GMM estimator is often infeasible because the response functions are typically observed only at a finite set of points in $[0,1]$ . Even in such cases, we can approximate the entire functional form of $Y_{it}$ by linear interpolation. Suppose that for each $(i,t)$ , $Y_{it}$ is observed at $L_{it}$ distinct points $0\leq s_{it,1}<s_{it,2}<\dots<s_{it,L_{it}}\leq 1$ . Then, for each given $s\in[s_{it,l},s_{it,l+1}]$ , define

\displaystyle Y^{\text{int}}_{it}(s)\coloneqq Y_{it}(s_{it,l})+\frac{Y_{it}(s_{it,l+1})-Y_{it}(s_{it,l})}{s_{it,l+1}-s_{it,l}}(s-s_{it,l}).

(3.55)

When $s<s_{it,1}$ (resp. $s>s_{it,L_{it}}$ ), we set $Y^{\text{int}}_{it}(s)\coloneqq Y_{it}(s_{it,1})$ (resp. $Y^{\text{int}}_{it}(s)\coloneqq Y_{it}(s_{it,L_{it}})$ ). Other than linear interpolation, one may also use a kernel method, as in Zhu et al. (2022), to obtain $Y^{\text{int}}_{it}(s)$ . Then, using $Y^{\text{int}}_{it}(s)$ in place of $Y_{it}(s)$ , we can write

\displaystyle Y^{\text{int}}_{it}(s)=\alpha_{0}(s)A(\overline{Y}^{\text{int}}_{it},s)+X_{it}^{\top}\beta_{0}(s)+f_{0i}(s)+\varepsilon_{it}(s)+u_{it}(s),

(3.56)

where $u_{it}(s)$ is the interpolation error: $u_{it}(s)\coloneqq Y^{\text{int}}_{it}(s)-Y_{it}(s)+\alpha_{0}(s)A(\overline{Y}_{it}-\overline{Y}^{\text{int}}_{it},s)$ . Thus, if $u_{it}(s)$ converges to zero sufficiently quickly for all $s\in[0,1]$ , $i\in[n]$ , and $t\in[T]$ , we can apply the same estimation and inference strategy as above.
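For linear interpolation, NumPy's `np.interp` already implements (3.55) together with the boundary rule above, because it returns the first (last) observed value for points to the left (right) of the observation range. A minimal sketch for a single curve $Y_{it}$ , with a hypothetical function name:

```python
import numpy as np


def interpolate_response(s_obs, y_obs, s_grid):
    """Y^int in (3.55): piecewise linear between observed points, constant beyond them.

    np.interp holds the first/last observed value outside [s_obs[0], s_obs[-1]],
    which matches the boundary rule of Remark 3.3.
    """
    s_obs = np.asarray(s_obs, dtype=float)
    if np.any(np.diff(s_obs) <= 0):
        raise ValueError("observation points must be distinct and increasing")
    return np.interp(s_grid, s_obs, y_obs)
```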

Network multiplier effects: marginal effects and impulse responses

Once the model is estimated, as a next step, one might be interested in computing the marginal effects of covariates on the outcome. In a standard linear regression model without network interaction, the estimated coefficients directly represent the marginal effects of their corresponding covariates. However, in the presence of intricate functional interaction, this is no longer the case.

As shown in Section 2, under Assumption 2.1, we have the following moving-average type representation:

\displaystyle Y_{t}=\sum_{\ell=0}^{\infty}\mathcal{A}^{\ell}X_{t}\beta_{0}+\sum_{\ell=0}^{\infty}\mathcal{A}^{\ell}F_{0}+\sum_{\ell=0}^{\infty}\mathcal{A}^{\ell}\mathcal{E}_{t},\;\;t\in[T].

(4.1)

This expression indicates that the marginal effect of increasing $X_{it}^{j}$ by one unit on $Y_{t}$ is given by $\partial Y_{t}/(\partial X_{it}^{j})=\lim_{c\to 0}\sum_{\ell=0}^{\infty}[\mathcal{A}^{\ell}(X_{t}^{j}+\bm{e}_{i}c)\beta_{0j}-\mathcal{A}^{\ell}X_{t}^{j}\beta_{0j}]/c=\sum_{\ell=0}^{\infty}\mathcal{A}^{\ell}\bm{e}_{i}\beta_{0j}$ by the linearity of $\mathcal{A}^{\ell}$ , where $X_{t}^{j}$ is the $j$ -th column of $X_{t}$ , and $\bm{e}_{i}$ denotes the $i$ -th column of $I_{n}$ . A somewhat more informative expression can be obtained as follows: letting $\gamma(h,s)\coloneqq\alpha_{0}(s)A(h,s)$ ,

	$\displaystyle M(i,j,s)$	$\displaystyle\coloneqq\partial Y_{t}(s)/(\partial X_{it}^{j})=\bm{e}_{i}\beta_{0j}(s)+W_{n}\bm{e}_{i}\gamma(\beta_{0j},s)+W_{n}^{2}\bm{e}_{i}\gamma^{2}(\beta_{0j},s)+\cdots$		(4.2)
		$\displaystyle=\sum_{\ell=0}^{\infty}W_{n}^{\ell}\bm{e}_{i}\gamma^{\ell}(\beta_{0j},s),$		(4.3)

where $\gamma^{0}(\beta_{0j},s)=\beta_{0j}(s)$ , and $\gamma^{\ell}(\beta_{0j},s)=\gamma(\gamma^{\ell-1}(\beta_{0j},\cdot),s)$ for $\ell\geq 1$ . This expression shows that the marginal effect $M(i,j,s)$ of increasing $X_{it}^{j}$ consists of the direct effect on unit $i$ , the indirect effect on $i$ ’s immediate neighbors, the second-order indirect effect on $i$ ’s neighbors’ neighbors, and so forth, highlighting the presence of a network multiplier effect. More specifically, recall that when $W_{n}$ represents a (weighted) adjacency matrix, the $(i,j)$ -th element of $W_{n}^{\ell}$ corresponds to the number of (weighted) walks of length $\ell$ between $i$ and $j$ . Thus, the $k$ -th element of $M(i,j,s)$ can be interpreted as a weighted count of the walks from $i$ to $k$ , in which each walk of length $\ell$ receives the weight $\gamma^{\ell}(\beta_{0j},s)$ , which decays exponentially in $\ell$ .

To estimate the marginal effects, in addition to replacing the unknown parameters with their estimators, we generally need to approximate the infinite sum by a finite sum: for some positive integer $S$ ,

\displaystyle\widehat{M}^{S}_{nT}(i,j,s)\coloneqq\sum_{\ell=0}^{S}W_{n}^{\ell}\bm{e}_{i}\widehat{\gamma}_{nT}^{\ell}(\widehat{\beta}_{nT,j},s),

(4.4)

where $\widehat{\gamma}_{nT}(h,s)\coloneqq\widehat{\alpha}_{nT}(s)A(h,s)$ . Meanwhile, in the special case of concurrent interaction such that $\gamma(h,s)=\alpha_{0}(s)h(s)$ , $\gamma^{2}(h,s)=(\alpha_{0}(s))^{2}h(s)$ , …, it is easy to see that $M(i,j,s)=\sum_{\ell=0}^{\infty}(\alpha_{0}(s)W_{n})^{\ell}\bm{e}_{i}\beta_{0j}(s)=(I_{n}-\alpha_{0}(s)W_{n})^{-1}\bm{e}_{i}\beta_{0j}(s)$ holds. This implies that, in this case, we can estimate $M(i,j,s)$ directly as $(I_{n}-\widehat{\alpha}_{nT}(s)W_{n})^{-1}\bm{e}_{i}\widehat{\beta}_{nT,j}(s)$ , without computing the infinite sum.
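The truncated estimator (4.4) can be computed by repeatedly applying $W_{n}$ to $\bm{e}_{i}$ and $\gamma$ to $\beta_{0j}$ on a grid. The sketch below takes $\gamma$ as a user-supplied function on grid values (a hypothetical interface) and, for the concurrent case, provides the closed form $(I_{n}-\alpha_{0}(s)W_{n})^{-1}\bm{e}_{i}\beta_{0j}(s)$ for comparison. The ring network and the functions $\alpha$ and $\beta$ are made up for illustration.

```python
import numpy as np


def marginal_effects(W, i, beta_grid, gamma, S):
    """Truncated marginal effects (4.4): sum_{l=0}^{S} W^l e_i gamma^l(beta_j, .).

    beta_grid: (G,) values of beta_j on a grid; gamma maps a (G,) array h to
    gamma(h, .) on the same grid. Returns an (n, G) array (row k: effect on unit k).
    """
    walk = np.zeros(W.shape[0])
    walk[i] = 1.0                                # W^0 e_i
    h = np.asarray(beta_grid, dtype=float)       # gamma^0(beta_j, .) = beta_j
    total = np.outer(walk, h)
    for _ in range(S):
        walk = W @ walk                          # W^l e_i
        h = gamma(h)                             # gamma^l(beta_j, .)
        total += np.outer(walk, h)
    return total


def marginal_effects_concurrent(W, i, alpha_grid, beta_grid):
    """Closed form (I - alpha(s) W)^{-1} e_i beta(s) for concurrent interaction."""
    n = W.shape[0]
    e_i = np.zeros(n)
    e_i[i] = 1.0
    columns = [np.linalg.solve(np.eye(n) - a * W, e_i) * b
               for a, b in zip(alpha_grid, beta_grid)]
    return np.column_stack(columns)


# Hypothetical example: ring network with row-normalized weights.
n, G = 6, 5
W = np.zeros((n, n))
for k in range(n):
    W[k, (k - 1) % n] = W[k, (k + 1) % n] = 0.5
s = np.linspace(0.1, 0.9, G)
alpha, beta = 0.6 * s, 1.0 + s
```

With concurrent interaction, passing `gamma=lambda h: alpha * h` makes the truncated sum approach the closed form geometrically as $S$ grows.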

In the above discussion, we have demonstrated how the impact of shifting one unit's covariate propagates to others. Analogously to impulse response analysis in time-series vector autoregressions, we can also consider network impulse responses to an external shock at a given unit. In particular, in a similar spirit to Koop et al. (1996), we define

\displaystyle I(i,\eta,s)\coloneqq\mathbb{E}[Y_{t}(s)\mid\varepsilon_{it}=\eta]-\mathbb{E}[Y_{t}(s)],

(4.5)

where $\eta$ is a given function representing the external shock. By a calculation similar to the one above, we obtain

\displaystyle I(i,\eta,s)=\sum_{\ell=0}^{\infty}W_{n}^{\ell}\bm{e}_{i}\gamma^{\ell}(\eta,s).

(4.6)

Plotted against $\ell=0,1,2,\ldots$ , each element of $W_{n}^{\ell}\bm{e}_{i}\gamma^{\ell}(\eta,s)$ can be interpreted as a network version of the impulse response function (as a function of $\ell$ ), similarly to Denbee et al. (2021). The expected total social impact of an external shock to unit $i$ can be expressed as $\int_{0}^{1}\bm{1}_{n}^{\top}I(i,\eta,s)\text{d}s$ , and the unit that exerts the largest influence on society is given by $i^{*}\coloneqq\operatorname*{argmax}_{i\in[n]}\int_{0}^{1}\bm{1}_{n}^{\top}I(i,\eta,s)\text{d}s$ . Denbee et al. (2021) referred to this unit as the risk key player, in the sense that an external shock to $i^{*}$ leads to the highest volatility in the aggregate outcome.

When assuming a concurrent interaction model, the impulse responses at $s$ take the following form: $I(i,\eta,s)=(I_{n}-\alpha_{0}(s)W_{n})^{-1}\bm{e}_{i}\eta(s)$ . Thus, if there is no exogenous shock at $s$ , i.e., if $\eta(s)=0$ , the expected outcome at $s$ remains unaffected. This implies, for instance, that a travel demand shock that occurred five minutes ago has no impact on current mobility availability, which is unrealistic. On the other hand, if the interaction structure is given by $A(h,s)=\int_{0}^{1}h(u)\nu(u,s)\text{d}u$ with $\nu(s^{\prime},s)\neq 0$ for $s^{\prime}<s$ , then a shock occurring at $s^{\prime}$ can transmit to the outcome at $s$ , leading to nonzero impulse responses at $s$ even when $\eta(s)=0$ .
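The contrast between concurrent and integral interaction can be illustrated numerically. The sketch below uses a hypothetical five-unit star network, a constant $\alpha_{0}(s)=0.5$ , and the kernel $\nu(u,s)=0.75(1-(u-s)^{2})$ from Section 5, and feeds in a shock $\eta$ that is nonzero only for $s<0.2$ . Under the integral operator, units linked to the shocked unit respond at later $s$ where $\eta(s)=0$ , whereas under concurrent interaction they do not. It also approximates $\int_{0}^{1}\bm{1}_{n}^{\top}I(i,\eta,s)\text{d}s$ for each $i$ and picks the risk key player, which in this example is the hub, unit 0. All names and values are illustrative assumptions.

```python
import numpy as np

G = 99
s = (np.arange(1, G + 1) - 0.5) / G                  # midpoint grid on (0, 1)
ds = 1.0 / G
alpha = np.full(G, 0.5)                              # hypothetical alpha_0(s)
NU = 0.75 * (1.0 - (s[:, None] - s[None, :]) ** 2)  # NU[u, s] = nu(u, s)


def gamma_integral(h):
    """gamma(h, s) = alpha(s) * int_0^1 h(u) nu(u, s) du (midpoint rule)."""
    return alpha * (h @ NU) * ds


def gamma_concurrent(h):
    """gamma(h, s) = alpha(s) h(s): no transmission across s."""
    return alpha * h


def impulse_response(W, i, eta, gamma, S=10):
    """Truncated I(i, eta, s) = sum_{l=0}^{S} W^l e_i gamma^l(eta, s); shape (n, G)."""
    walk = np.zeros(W.shape[0])
    walk[i] = 1.0
    h = np.asarray(eta, dtype=float)
    out = np.outer(walk, h)
    for _ in range(S):
        walk = W @ walk
        h = gamma(h)
        out += np.outer(walk, h)
    return out


# Hypothetical star network: unit 0 is linked to units 1 to 4 (row-normalized).
W = np.zeros((5, 5))
W[0, 1:] = 0.25
W[1:, 0] = 1.0
eta = (s < 0.2).astype(float)                        # shock active only early in the day
totals = [impulse_response(W, i, eta, gamma_integral).sum() * ds for i in range(5)]
key_player = int(np.argmax(totals))                  # approximates i* defined above
```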

The estimation of $I(i,\eta,s)$ can be performed in the same manner as above. For some positive integer $S$ , we estimate $I(i,\eta,s)$ by $\widehat{I}^{S}_{nT}(i,\eta,s)\coloneqq\sum_{\ell=0}^{S}W_{n}^{\ell}\bm{e}_{i}\widehat{\gamma}_{nT}^{\ell}(\eta,s)$ . The next proposition provides the convergence rate of $\widehat{M}^{S}_{nT}(i,j,s)$ and that of $\widehat{I}^{S}_{nT}(i,\eta,s)$ .

Proposition 4.1.

Suppose that the assumptions in Theorem 3.1 hold. In addition, assume that $\overline{\alpha}_{0}<1$ . Then, uniformly in $s\in[0,1]$ ,

(i)

$\max_{i\in[n]}\left\|\widehat{M}^{S}_{nT}(i,j,s)-M(i,j,s)\right\|_{\infty}\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}+\overline{\alpha}_{0}^{S+1}$ ,
(ii)

$\max_{i\in[n]}\left\|\widehat{I}^{S}_{nT}(i,\eta,s)-I(i,\eta,s)\right\|_{\infty}\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}+\overline{\alpha}_{0}^{S+1}$ .

This proposition indicates that the uniform convergence rates of the marginal effect and impulse response estimators depend on the uniform convergence rate of the integrated-GMM estimator and on the truncation order $S$ . Since the truncation error $\overline{\alpha}_{0}^{S+1}$ decreases geometrically in $S$ , a small value such as $S=4$ or $5$ is likely to suffice in practice unless $\overline{\alpha}_{0}$ is close to one.
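Since the truncation term in Proposition 4.1 is of order $\overline{\alpha}_{0}^{S+1}$ , one rough way to choose $S$ is to take the smallest value for which $\overline{\alpha}_{0}^{S+1}$ falls below a tolerance. The sketch ignores the constants implicit in $\lesssim_{p}$ , so it is only a heuristic; the function name is hypothetical.

```python
def smallest_truncation_order(alpha_bar, tol):
    """Smallest S with alpha_bar ** (S + 1) < tol.

    This ignores the unknown constants hidden in the rate of Proposition 4.1,
    so it is only a rough guide to how fast the truncation term vanishes.
    """
    if not 0.0 < alpha_bar < 1.0:
        raise ValueError("alpha_bar must lie in (0, 1)")
    if tol <= 0.0:
        raise ValueError("tol must be positive")
    S = 0
    while alpha_bar ** (S + 1) >= tol:
        S += 1
    return S
```

For instance, $\overline{\alpha}_{0}=0.5$ and a tolerance of 0.05 give $S=4$ , while $\overline{\alpha}_{0}=0.9$ requires $S=28$ for the same tolerance, so small values of $S$ rely on $\overline{\alpha}_{0}$ being moderate.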

Monte Carlo Simulation

In this section, we conduct a series of Monte Carlo experiments to evaluate the finite-sample performance of the integrated-GMM estimator. Throughout the experiments, we consider the following data-generating process (DGP):

\displaystyle Y_{it}(s)=\alpha_{0}(s)\int_{0}^{1}\overline{Y}_{it}(u)\nu(u,s)\text{d}u+X_{it}\beta_{0}(s)+f_{0i}(s)+\varepsilon_{it}(s),

(5.1)

where $X_{it}\sim N(0,1)$ , $\alpha_{0}(s)=\phi(s;0.4,0.5^{2})+0.2s-0.4s^{2}$ , $\phi(\cdot;\mu,\sigma^{2})$ denotes the normal density function with mean $\mu$ and variance $\sigma^{2}$ , $\nu(u,s)=0.75(1-(u-s)^{2})$ (i.e., the Epanechnikov kernel function), and the individual fixed effects are given by $f_{0i}(s)=1+\cos(is)$ . The coefficient function is given by $\beta_{0}(s)=r(\sqrt{1+s}+s(1-s))$ , where $r$ is chosen from $r\in\{0.4,1\}$ . We use $Q_{it}=(\overline{X}_{it},\overline{\overline{X}}_{it})$ as the IVs for $\overline{Y}_{it}$ , and thus the magnitude of $r$ determines the strength of these IVs. For the error term, we generate $\varepsilon_{it}(s)=\sqrt{1+\text{deg}_{i}}(e_{1,it},e_{2,it},e_{3,it})^{\top}(1,s,s^{2})$ , where $\text{deg}_{i}$ denotes the number of units connected to $i$ (i.e., $i$ ’s degree), and $e_{j,it}\sim N(0,0.4^{2})$ for $j=1,2,3$ . The weight matrix $W_{n}$ is a row-normalized adjacency matrix, which is constructed by randomly placing $n$ units on a $[\sqrt{2n}]\times[\sqrt{2n}]$ lattice, where $[a]$ denotes the nearest integer to $a$ . Any two units are connected if the Euclidean distance between them is exactly one. The sample size $n$ is chosen from $n\in\{40,80\}$ , and $T$ from $T\in\{5,10\}$ .
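The building blocks of this design can be sketched as follows. The lattice construction follows the description above (distinct random cells, links at Euclidean distance exactly one, row normalization); leaving isolated units with a zero row, the random-number interface, and the function names are our own assumptions.

```python
import numpy as np


def lattice_weight_matrix(n, rng):
    """Row-normalized adjacency for n units on a [sqrt(2n)] x [sqrt(2n)] lattice.

    Units occupy distinct random cells and are linked when their Euclidean
    distance is exactly one; isolated units keep a zero row.
    """
    side = int(round(np.sqrt(2 * n)))
    cells = rng.choice(side * side, size=n, replace=False)
    xy = np.column_stack(np.unravel_index(cells, (side, side)))
    dist = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
    adj = np.isclose(dist, 1.0).astype(float)
    deg = adj.sum(axis=1)
    W = np.divide(adj, deg[:, None], out=np.zeros_like(adj), where=deg[:, None] > 0)
    return W, deg


def alpha0(s):
    """Normal density with mean 0.4 and variance 0.5^2, plus 0.2 s - 0.4 s^2."""
    dens = np.exp(-0.5 * ((s - 0.4) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))
    return dens + 0.2 * s - 0.4 * s ** 2


def beta0(s, r):
    """beta_0(s) = r (sqrt(1 + s) + s (1 - s)), with r in {0.4, 1}."""
    return r * (np.sqrt(1 + s) + s * (1 - s))


def f0(i, s):
    """Fixed-effect function f_0i(s) = 1 + cos(i s)."""
    return 1.0 + np.cos(i * s)


def errors(deg, s, rng, sd=0.4):
    """eps_i(s) = sqrt(1 + deg_i) (e1 + e2 s + e3 s^2), e_j ~ N(0, sd^2); shape (n, G)."""
    e = rng.normal(0.0, sd, size=(len(deg), 3))
    powers = np.vstack([np.ones_like(s), s, s ** 2])   # rows: 1, s, s^2
    return np.sqrt(1.0 + deg)[:, None] * (e @ powers)
```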

Since our DGP satisfies the conditions in Assumption 2.1, we can generate the outcome functions using the Neumann series approximation: $Y_{t}\approx Y_{t}^{(S)}\coloneqq\sum_{\ell=0}^{S}\mathcal{A}^{\ell}[X_{t}\beta_{0}+F_{0}+\mathcal{E}_{t}]$ , where $S$ is increased iteratively until $\max_{i\in[n]}|Y_{it}^{(S)}(s)-Y_{it}^{(S-1)}(s)|<0.001$ is satisfied at each $t$ and $s$ . Throughout the simulations, integrals over $[0,1]$ are approximated by finite summations over 99 equally-spaced grid points.
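A minimal sketch of the Neumann-series generation for a single period $t$ on a discretized grid. It assumes that $\overline{Y}_{it}$ is the $i$ -th row of $W_{n}Y_{t}$ , that the integral in (5.1) is replaced by an average over the $G$ grid points, and that the stopping rule compares successive partial sums, whose difference is the newest term $\mathcal{A}^{S}[X_{t}\beta_{0}+F_{0}+\mathcal{E}_{t}]$ . The function name and arguments are hypothetical.

```python
import numpy as np


def neumann_outcomes(W, alpha, NU, base, tol=1e-3, max_iter=500):
    """Generate Y = sum_l A^l base, adding terms until the sup-change is below tol.

    W: (n, n); alpha: (G,) values of alpha_0 on the grid; NU[u, s] = nu(u, s);
    base: (n, G) array of X_it beta_0(s) + f_0i(s) + eps_it(s).
    Integrals over [0, 1] are replaced by averages over the G grid points.
    """
    G = NU.shape[0]
    term = base.copy()
    Y = base.copy()
    for _ in range(max_iter):
        term = alpha[None, :] * ((W @ term) @ NU) / G   # apply the operator once more
        Y += term                                        # Y^(S) = Y^(S-1) + A^S base
        if np.max(np.abs(term)) < tol:
            return Y
    raise RuntimeError("Neumann series did not converge")
```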

For the choice of basis functions, we use the cubic B-splines orthonormalized via the Gram-Schmidt procedure. The number of inner knots for the B-spline is selected from $\widetilde{K}\in\{2,3\}$ . The number of grid points used to evaluate the moment function is chosen from $L\in\{10,30\}$ , with the points $0<s_{1}<\dots<s_{L}<1$ evenly spaced over $[0,1]$ . For the quadratic moments, we use two weight matrices ( $M=2$ ): $P_{1,1}=W_{n}$ and $P_{2,1}=W_{n}^{\top}W_{n}-\text{diag}(W_{n}^{\top}W_{n})$ . We then compare the performance of three different estimators: GMM 1: the integrated-GMM estimator using the weight matrix given in (3.37), GMM 2: the integrated-GMM estimator using the identity weight matrix, and (integrated) 2SLS: GMM 1 estimator without utilizing the quadratic moment conditions.
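The orthonormalized basis can be sketched with SciPy's `BSpline` class. Here orthonormality is taken with respect to the $L^{2}[0,1]$ inner product approximated by an average over an evaluation grid, and the Gram-Schmidt step is carried out through a QR decomposition, which produces the same orthonormal functions (after fixing signs) in a numerically more stable way. Equally spaced inner knots and the function name are assumptions. With $\widetilde{K}$ inner knots, a cubic spline basis has $K=\widetilde{K}+4$ functions.

```python
import numpy as np
from scipy.interpolate import BSpline


def orthonormal_cubic_bsplines(n_inner, grid):
    """Cubic B-splines on [0, 1] with equally spaced inner knots, orthonormalized
    in L^2[0, 1] (approximated by the grid average) via a QR decomposition.

    Returns an array of shape (len(grid), n_inner + 4).
    """
    k = 3
    inner = np.linspace(0.0, 1.0, n_inner + 2)[1:-1]
    t = np.concatenate([np.zeros(k + 1), inner, np.ones(k + 1)])  # clamped knots
    K = len(t) - k - 1                                            # = n_inner + 4
    B = np.column_stack([BSpline(t, np.eye(K)[j], k)(grid) for j in range(K)])
    w = 1.0 / len(grid)                         # equal quadrature weights
    Q, R = np.linalg.qr(B * np.sqrt(w))
    Q = Q * np.sign(np.diag(R))                 # positive diagonal, as in Gram-Schmidt
    return Q / np.sqrt(w)
```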

For each setup, we generate 500 datasets. The performance of the estimators is evaluated based on the average bias (BIAS) and the average root-mean-squared error (RMSE). Specifically, the BIAS and RMSE of estimating $\alpha_{0}$ are defined as

	$\displaystyle\text{BIAS}(\alpha_{0})$	$\displaystyle\coloneqq\frac{1}{500}\sum_{b=1}^{500}\left[\frac{1}{99}\sum_{l=1}^{99}\left(\widehat{\alpha}_{nT}^{(b)}(s_{l})-\alpha_{0}(s_{l})\right)\right]$		(5.2)
	$\displaystyle\text{RMSE}(\alpha_{0})$	$\displaystyle\coloneqq\frac{1}{500}\sum_{b=1}^{500}\sqrt{\frac{1}{99}\sum_{l=1}^{99}\left(\widehat{\alpha}_{nT}^{(b)}(s_{l})-\alpha_{0}(s_{l})\right)^{2}}$		(5.3)

respectively. Here, $\widehat{\alpha}_{nT}^{(b)}$ denotes the estimator of $\alpha_{0}$ obtained from the $b$ -th replicated dataset. The BIAS and RMSE for $\beta_{0}$ are defined analogously.
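The two criteria in (5.2) and (5.3) differ in where the averaging happens: BIAS averages signed errors over the grid and the replications, whereas RMSE takes a root mean square over the grid within each replication and then averages across replications. A small helper, with a hypothetical name:

```python
import numpy as np


def bias_rmse(estimates, truth):
    """BIAS and RMSE as in (5.2)-(5.3).

    estimates: (B, G) array, row b = estimate from replication b on the evaluation grid;
    truth: (G,) true function values on the same grid.
    """
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)[None, :]
    bias = err.mean(axis=1).mean()                   # grid average, then replications
    rmse = np.sqrt((err ** 2).mean(axis=1)).mean()   # per-replication RMSE, then averaged
    return bias, rmse
```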

The simulation results for the estimation of $\alpha_{0}$ are summarized in Table 5.1. All three estimators perform reasonably well in terms of BIAS. In terms of RMSE, however, GMM 1 clearly outperforms the other two estimators in all scenarios. Because GMM 1 and 2SLS are numerically identical when the quadratic moment conditions are removed, GMM 1’s efficiency gain relative to 2SLS stems solely from the quadratic moment conditions. Interestingly, 2SLS even outperforms GMM 2. These findings suggest that, while incorporating quadratic moments does improve efficiency, the choice of the GMM weight matrix is equally (or potentially more) crucial. When $r$ increases from 0.4 to 1, the RMSEs of the GMM estimators fall by roughly 40 to 55 percent, whereas those of 2SLS fall by about 60 percent. This indicates that, as anticipated, the 2SLS estimator is more sensitive to IV strength. The choices of $L$ and $\widetilde{K}$ appear to have only minor effects on performance. When the sample size increases from $nT=200$ to $nT=800$ , the RMSE values are roughly halved, which is consistent with the $\sqrt{nT}$ -consistency of the estimators and numerically corroborates our theoretical result in Theorem 3.1(ii).

Table 5.2 presents the simulation results for estimating $\beta_{0}$ . Here, GMM 1 and 2SLS perform quite similarly, whereas GMM 2 is less accurate, with RMSEs roughly 35 to 65 percent larger than those of GMM 1. Notably, the RMSEs remain largely unchanged across different values of $r$ for all three estimators, suggesting that IV strength is not a critical factor for $\beta_{0}$ . As with $\alpha_{0}$ , the RMSEs for $\beta_{0}$ are roughly halved when $nT$ increases from 200 to 800, in line with $\sqrt{nT}$ -consistency.

Table 5.1: Simulation result: $\alpha_{0}$

					GMM 1		GMM 2		2SLS
$n$	$T$	$L$	$\widetilde{K}$	$r$	BIAS	RMSE	BIAS	RMSE	BIAS	RMSE
40	5	10	2	0.4	0.0062	0.1132	-0.0517	0.3820	0.0479	0.2135
				1	0.0083	0.0649	-0.0205	0.2128	0.0136	0.0820
			3	0.4	0.0061	0.1131	-0.0564	0.3938	0.0479	0.2135
				1	0.0083	0.0649	-0.0242	0.2230	0.0136	0.0820
		30	2	0.4	0.0064	0.1135	-0.0406	0.3562	0.0479	0.2135
				1	0.0084	0.0650	-0.0138	0.1930	0.0136	0.0820
			3	0.4	0.0064	0.1135	-0.0418	0.3610	0.0479	0.2134
				1	0.0084	0.0650	-0.0150	0.1965	0.0136	0.0820
	10	10	2	0.4	-0.0027	0.0773	-0.0357	0.2701	0.0142	0.1370
				1	-0.0007	0.0456	-0.0179	0.1419	0.0028	0.0540
			3	0.4	-0.0027	0.0773	-0.0387	0.2796	0.0142	0.1370
				1	-0.0007	0.0456	-0.0203	0.1493	0.0028	0.0540
		30	2	0.4	-0.0028	0.0773	-0.0296	0.2478	0.0142	0.1370
				1	-0.0007	0.0455	-0.0146	0.1260	0.0028	0.0540
			3	0.4	-0.0029	0.0773	-0.0307	0.2520	0.0142	0.1370
				1	-0.0007	0.0455	-0.0150	0.1283	0.0028	0.0540
80	5	10	2	0.4	-0.0030	0.0812	-0.0394	0.2801	0.0140	0.1495
				1	-0.0004	0.0470	-0.0196	0.1508	0.0019	0.0593
			3	0.4	-0.0030	0.0812	-0.0423	0.2904	0.0140	0.1495
				1	-0.0004	0.0470	-0.0210	0.1578	0.0019	0.0593
		30	2	0.4	-0.0029	0.0813	-0.0336	0.2578	0.0140	0.1495
				1	-0.0004	0.0470	-0.0163	0.1344	0.0019	0.0593
			3	0.4	-0.0029	0.0813	-0.0342	0.2619	0.0140	0.1495
				1	-0.0004	0.0470	-0.0163	0.1365	0.0019	0.0593
	10	10	2	0.4	-0.0024	0.0539	-0.0142	0.1996	0.0023	0.0966
				1	-0.0016	0.0321	-0.0058	0.0980	-0.0007	0.0385
			3	0.4	-0.0024	0.0539	-0.0168	0.2071	0.0023	0.0966
				1	-0.0016	0.0321	-0.0076	0.1032	-0.0007	0.0385
		30	2	0.4	-0.0023	0.0538	-0.0109	0.1812	0.0023	0.0966
				1	-0.0016	0.0320	-0.0044	0.0860	-0.0007	0.0385
			3	0.4	-0.0024	0.0538	-0.0118	0.1842	0.0023	0.0966
				1	-0.0016	0.0320	-0.0046	0.0874	-0.0007	0.0385

Table 5.2: Simulation result: $\beta_{0}$

					GMM 1		GMM 2		2SLS
$n$	$T$	$L$	$\widetilde{K}$	$r$	BIAS	RMSE	BIAS	RMSE	BIAS	RMSE
40	5	10	2	0.4	0.0015	0.0653	0.0039	0.0966	0.0025	0.0681
				1	0.0034	0.0669	0.0128	0.1091	-0.0019	0.0670
			3	0.4	0.0015	0.0653	0.0042	0.0978	0.0025	0.0681
				1	0.0034	0.0669	0.0139	0.1115	-0.0019	0.0670
		30	2	0.4	0.0015	0.0653	0.0032	0.0928	0.0025	0.0681
				1	0.0034	0.0669	0.0098	0.1029	-0.0019	0.0670
			3	0.4	0.0015	0.0653	0.0033	0.0930	0.0025	0.0681
				1	0.0034	0.0669	0.0100	0.1032	-0.0019	0.0670
	10	10	2	0.4	0.0004	0.0428	0.0032	0.0612	0.0001	0.0438
				1	0.0014	0.0435	0.0088	0.0654	-0.0014	0.0439
			3	0.4	0.0004	0.0428	0.0034	0.0617	0.0001	0.0438
				1	0.0014	0.0435	0.0094	0.0666	-0.0014	0.0439
		30	2	0.4	0.0004	0.0428	0.0026	0.0590	0.0001	0.0438
				1	0.0014	0.0435	0.0072	0.0620	-0.0014	0.0439
			3	0.4	0.0004	0.0428	0.0027	0.0591	0.0001	0.0438
				1	0.0014	0.0435	0.0073	0.0621	-0.0014	0.0439
80	5	10	2	0.4	0.0022	0.0435	0.0057	0.0669	0.0030	0.0445
				1	0.0039	0.0442	0.0128	0.0739	0.0010	0.0442
			3	0.4	0.0022	0.0435	0.0059	0.0676	0.0030	0.0445
				1	0.0039	0.0442	0.0135	0.0754	0.0010	0.0442
		30	2	0.4	0.0022	0.0435	0.0051	0.0641	0.0030	0.0445
				1	0.0038	0.0442	0.0110	0.0695	0.0010	0.0442
			3	0.4	0.0022	0.0435	0.0051	0.0642	0.0030	0.0445
				1	0.0038	0.0442	0.0110	0.0697	0.0010	0.0442
	10	10	2	0.4	0.0009	0.0304	0.0015	0.0424	0.0013	0.0306
				1	0.0016	0.0306	0.0041	0.0446	0.0005	0.0305
			3	0.4	0.0009	0.0304	0.0017	0.0428	0.0013	0.0306
				1	0.0016	0.0306	0.0045	0.0455	0.0005	0.0305
		30	2	0.4	0.0009	0.0303	0.0012	0.0407	0.0013	0.0306
				1	0.0016	0.0306	0.0034	0.0421	0.0005	0.0305
			3	0.4	0.0009	0.0303	0.0013	0.0408	0.0013	0.0306
				1	0.0016	0.0306	0.0034	0.0422	0.0005	0.0305

Analyzing the Demand of Bike-Sharing System

As an empirical application of our method, we analyze spatial interactions in the demand for a bike-sharing system in the U.S. Demand analysis of shared mobility has been a highly active research topic in recent years across various areas, including transportation research, marketing, economics, and environmental studies. In particular, bike-sharing systems have attracted increasing attention. For a comprehensive review of this literature, see Eren and Uz (2020), for instance.

Data

The dataset used in this analysis comes from the Bay Area Bike Share in San Francisco, which was established in August 2013 and is now known as Bay Wheels. The dataset is publicly available on the Kaggle website (https://www.kaggle.com/datasets/benhamner/sf-bay-area-bike-share). It contains detailed information about the system from August 2013 to August 2015, including station locations, the number of available bicycles at each station over time, and all trip-level data during this time period. The trip data include details such as start and end times and stations, as well as the user type (subscriber or casual user). In this dataset, there are 70 bike stations in total; for a map of all 70 station locations, see Figure 6.1.

Figure 6.1: Locations of bike stations

After the initial installation of stations in August 2013, the 70th station (Ryland Park station) was added in April 2014. Accordingly, we use data from May 2014 to August 2015 for this analysis, which constitute the largest balanced panel that can be extracted from the raw data.

One concern in the analysis is that shared mobility services often relocate vehicles or bikes from one station to another to maintain service availability across all locations. To detect potential relocations, we first identified instances in which the number of available bicycles jumped up or down by 10 or more at once. We then examined the distribution of these events across hours and days of the week, as shown in Figure D.1 in Appendix D. The figure shows that sudden drops or increases in bike availability tend to occur between midnight and early morning, particularly on Sundays. Although we do not have access to formal records of actual relocation operations, these patterns suggest that such jumps are likely the result of bike relocation carried out by the service provider. Another concern is the sheer size of the dataset. Because the original data are recorded minute by minute every day, using the raw data directly can cause memory problems. Moreover, daily data tend to fluctuate and to be noisy because of random events.
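A minimal sketch of the jump screen described above, in Python. It assumes each station's availability records are stored as time-sorted `(timestamp, bikes_available)` pairs; the function names `flag_jumps` and `tally_by_weekday_hour` and the toy records are hypothetical, and only the threshold of 10 comes from the text.

```python
from collections import Counter
from datetime import datetime


def flag_jumps(records, threshold=10):
    """Timestamps at which ONE station's available-bike count moves up or down
    by at least `threshold` between consecutive records.

    records: list of (datetime, bikes_available) pairs sorted by time
    (hypothetical input layout).
    """
    flagged = []
    for (_, prev_bikes), (ts, bikes) in zip(records, records[1:]):
        if abs(bikes - prev_bikes) >= threshold:
            flagged.append(ts)
    return flagged


def tally_by_weekday_hour(timestamps):
    """Count flagged events by (weekday, hour), with weekday 0 = Monday, 6 = Sunday."""
    return Counter((ts.weekday(), ts.hour) for ts in timestamps)


if __name__ == "__main__":
    toy = [  # hypothetical records for one station
        (datetime(2014, 5, 4, 2, 0), 3),    # Sunday 2:00 AM
        (datetime(2014, 5, 4, 2, 1), 15),   # +12 at once: flagged
        (datetime(2014, 5, 4, 2, 2), 14),
        (datetime(2014, 5, 5, 8, 0), 14),   # Monday 8:00 AM
        (datetime(2014, 5, 5, 8, 1), 4),    # -10 at once: flagged
    ]
    print(tally_by_weekday_hour(flag_jumps(toy)))
```

Pooling such tallies across stations gives the kind of hour-by-day breakdown described for Figure D.1.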

To address these issues, we first rounded the trip data to 15-minute intervals and then averaged them over Monday through Friday at each interval, discarding data from Saturdays and Sundays. Furthermore, to avoid potential bike relocation events on weekdays, we restrict the analysis to the period from 6 AM to 9 PM. Consequently, our final dataset is a weekly panel with $n=70$ stations and $T=69$ weeks. The outcome of interest is the number of available bicycles at each station as a function of $s\in[0,1]$, where $s=0$ corresponds to 6 AM and $s=1$ corresponds to 9 PM.
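The aggregation step can be sketched as follows, assuming minute-level `(timestamp, bikes_available)` records for one station. The helper names are hypothetical, and rounding each record to the nearest 15-minute grid point is an assumption about what "rounded" means here. The output maps each ISO week to the 61 grid values at $s_{l}=l/60$, $l=0,\ldots,60$, between 6 AM and 9 PM.

```python
from collections import defaultdict
from datetime import datetime

DAY_START = 6 * 60                           # 6 AM in minutes, s = 0
DAY_END = 21 * 60                            # 9 PM in minutes, s = 1
SLOT = 15                                    # grid spacing in minutes
N_GRID = (DAY_END - DAY_START) // SLOT + 1   # 61 grid points


def weekly_curves(records):
    """Average one station's availability over Monday-Friday at each 15-minute
    grid point, separately for every ISO week.

    records: iterable of (datetime, bikes_available) pairs (hypothetical layout).
    Returns {(iso_year, iso_week): [value at s_0, ..., value at s_60]};
    grid points with no observation are NaN.
    """
    sums = defaultdict(lambda: [0.0] * N_GRID)
    counts = defaultdict(lambda: [0] * N_GRID)
    for ts, bikes in records:
        if ts.weekday() >= 5:                   # drop Saturdays (5) and Sundays (6)
            continue
        m = ts.hour * 60 + ts.minute - DAY_START
        if m < 0 or m > DAY_END - DAY_START:    # keep 6 AM to 9 PM only
            continue
        slot = (m + SLOT // 2) // SLOT          # nearest grid point (assumption)
        week = tuple(ts.isocalendar()[:2])
        sums[week][slot] += bikes
        counts[week][slot] += 1
    return {w: [s / c if c else float("nan") for s, c in zip(sums[w], counts[w])]
            for w in sums}


def grid_point(slot):
    """Map a grid index l = 0, ..., 60 to s = l / 60 in [0, 1]."""
    return slot / (N_GRID - 1)


if __name__ == "__main__":
    recs = [(datetime(2014, 5, 5, 8, 0), 10), (datetime(2014, 5, 6, 8, 2), 14)]
    for week, curve in weekly_curves(recs).items():
        print(week, grid_point(8), curve[8])
```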

Figure 6.2 presents the trajectories of average bike availability for all 70 stations during the first week in our panel. It clearly shows that most of the variation in bike availability occurs between 6 AM and 9 PM.

Empirical results

Based on the dataset constructed as described above, we estimate model (1.2), where

$\displaystyle Y_{it}(s)$	$\displaystyle=\text{number of available bikes at station $i$ at time $s$}$	(6.1)
$\displaystyle A(Y_{jt},s)$	$\displaystyle=\text{average of $Y_{jt}$ over the past one hour up to time $s$}$	(6.2)
$\displaystyle X_{it}$	$\displaystyle=\big[\,\text{ratio of round trips, ratio of subscribers (departing from station $i$), ratio of}$	(6.3)
	$\displaystyle\qquad\text{subscribers (arriving at station $i$), rainy day dummy, month dummies}\,\big]^{\top}$	(6.4)
$\displaystyle w_{ij}$	$\displaystyle=\frac{\widetilde{w}_{ij}}{\sum_{k\neq i}\widetilde{w}_{ik}},\;\;\text{where}\;\;\widetilde{w}_{ij}=\frac{\bm{1}\{\text{dist}(i,j)\leq 1\,\text{km}\}}{\text{dist}(i,j)}$	(6.5)

Here, $\text{dist}(i,j)$ denotes the Euclidean distance between stations $i$ and $j$. The estimation procedure is essentially the same as that for the GMM 1 estimator in Section 5, with $\widetilde{K}=3$. The rainy day dummy and the month dummies are not used as instruments. All integrals are approximated by finite sums over grid points at 15-minute intervals.
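A sketch of how the weights in (6.5) and the network regressor $\sum_{j}w_{ij}A(Y_{jt},s)$ might be computed on the 15-minute grid. The following are assumptions, not taken from the paper: station coordinates are planar and in kilometres; a station with no neighbor within 1 km gets an all-zero row; and the one-hour average $A(Y_{jt},s_{l})$ is approximated by the mean of the current and three preceding grid values (truncated at 6 AM). All function names are hypothetical.

```python
import math

N_GRID = 61   # 15-minute grid from 6 AM (s = 0) to 9 PM (s = 1)
WINDOW = 4    # hypothetical: current grid point plus the three preceding ones


def weight_matrix(coords_km, cutoff_km=1.0):
    """Row-normalized inverse-distance weights with a distance cutoff, as in (6.5).

    coords_km: list of (x, y) planar station coordinates in km (assumed projected,
    all distinct). A station with no neighbor within the cutoff keeps a zero row.
    """
    n = len(coords_km)
    W = []
    for i in range(n):
        raw = [0.0] * n
        for j in range(n):
            if j != i:
                d = math.dist(coords_km[i], coords_km[j])
                if d <= cutoff_km:
                    raw[j] = 1.0 / d
        total = sum(raw)
        W.append([r / total for r in raw] if total > 0 else raw)
    return W


def past_hour_average(y, l):
    """Finite-sum stand-in for A(Y, s_l): mean of y over the last hour of grid points."""
    window = y[max(0, l - WINDOW + 1): l + 1]
    return sum(window) / len(window)


def network_lag(Y, W):
    """sum_j w_ij * A(Y_j, s_l) for every station i and grid point l."""
    A = [[past_hour_average(y, l) for l in range(N_GRID)] for y in Y]
    return [[sum(w * A[j][l] for j, w in enumerate(row)) for l in range(N_GRID)]
            for row in W]


if __name__ == "__main__":
    W = weight_matrix([(0.0, 0.0), (0.5, 0.0), (0.0, 1.0), (3.0, 0.0)])
    for row in W:
        print([round(w, 3) for w in row])
```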

Figure 6.3 presents the estimated interaction effect function $\alpha_{0}$, where the shaded area depicts the pointwise 95% confidence interval. The figure shows positive spatial interaction in bicycle availability during the morning hours. Although the model itself is agnostic about the source of the interaction, one plausible explanation is that as bike-sharing becomes more popular, particularly among commuters, it encourages further use of the service, thereby reinforcing demand in the morning. Interestingly, a negative interaction appears around 5–7 PM. In the evening, the main users may include not only returning commuters but also people going out for dining, shopping, concerts, and so on. As a result, bicycles might accumulate at certain popular stations while nearby, less popular stations experience lower availability, leading to the negative interaction.
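For a sieve representation $\alpha(s;\theta)=\phi^{K}(s)^{\top}\theta_{\alpha}$, a pointwise band of the kind shown in Figure 6.3 can be formed from the variance of the linear combination $\phi^{K}(s)^{\top}\widehat{\theta}_{\alpha}$. The sketch below assumes that an estimate of the covariance matrix of $\widehat{\theta}_{\alpha}$ is already available. The polynomial basis and all inputs are hypothetical, and the paper's own variance estimator is not reproduced here.

```python
from statistics import NormalDist


def poly_basis(s, K):
    """Hypothetical basis phi^K(s) = (1, s, ..., s^(K-1)); the paper's basis may differ."""
    return [s ** k for k in range(K)]


def pointwise_band(s_grid, theta_alpha, cov_alpha, level=0.95):
    """Pointwise confidence band for alpha(s) = phi^K(s)' theta_alpha.

    cov_alpha: K x K estimated covariance matrix of the estimate of theta_alpha
    (assumed given). The variance of phi' theta_hat is phi' cov_alpha phi.
    Returns a list of (lower, estimate, upper) triples, one per grid point.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    K = len(theta_alpha)
    band = []
    for s in s_grid:
        phi = poly_basis(s, K)
        est = sum(p * t for p, t in zip(phi, theta_alpha))
        var = sum(phi[a] * cov_alpha[a][b] * phi[b] for a in range(K) for b in range(K))
        half = z * var ** 0.5
        band.append((est - half, est, est + half))
    return band


if __name__ == "__main__":
    grid = [l / 60 for l in range(61)]          # 15-minute grid, 6 AM to 9 PM
    theta = [0.1, 0.2, 0.0, 0.0]                # hypothetical coefficients
    cov = [[0.01 if a == b else 0.0 for b in range(4)] for a in range(4)]
    lower, est, upper = pointwise_band(grid, theta, cov)[30]
    print(f"s = 0.5: {est:.3f} [{lower:.3f}, {upper:.3f}]")
```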

To save space, the estimation results for $\beta_{0}(s)$, excluding the coefficients on the month dummies, are presented in Figure D.2 in Appendix D. Among the key covariates, only the ratio of arriving subscribers has a statistically significant positive effect on bike availability. This result is intuitive, as stations at which more regular users arrive are expected to hold a larger stock of bikes. Among the other variables, the rainy day dummy has a positive effect on bike availability, which is consistent with previous studies, although the effect is not statistically significant. One possible explanation is that the rainy "day" dummy does not capture finer temporal variation (i.e., it is not a function of $s$), and averaging the data over weeks may have further diluted its impact.

Lastly, we conduct an impulse response analysis, whose results are summarized in Figure 6.4. For illustration, we arbitrarily select the Embarcadero at Folsom station as the target station receiving an external shock. Specifically, we consider a hypothetical scenario in which the bike stock at this station is reduced by 2 at the peak of 9 AM (panel (a)). Panels (b) and (c) illustrate how the shock propagates to its two nearest stations, Spear at Folsom and Temporary Transbay Terminal. The shock spills over to these stations with a slight time delay, with the responses peaking just before 10 AM. Since both the external shock and the spatial interaction are moderate in magnitude in this analysis, the impulse responses at both stations are relatively mild.
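The propagation mechanism can be illustrated with a discretized version of the reduced form $Y_{t}=(\text{Id}-\mathcal{A})^{-1}[X_{t}\beta_{0}+F_{0}]+(\text{Id}-\mathcal{A})^{-1}\mathcal{E}_{t}$ stated in the proof of Lemma B.3: a shock $\Delta$ to one station's error moves all outcomes by $(\text{Id}-\mathcal{A})^{-1}\Delta$, computed below by fixed-point (Neumann series) iteration. Everything here is hypothetical: a three-station network, a constant $\alpha_{0}(s)=0.3$, a one-hour window of four grid points, and a shock of $-2$ at the 9 AM grid point only. This is a toy sketch, not the procedure behind Figure 6.4.

```python
N_GRID = 61   # 15-minute grid, 6 AM (s = 0) to 9 PM (s = 1)
WINDOW = 4    # hypothetical one-hour window: current point and the three before it


def past_hour_average(h, l):
    """A(h, s_l) approximated by the mean of h over the last WINDOW grid points."""
    lo = max(0, l - WINDOW + 1)
    return sum(h[lo:l + 1]) / (l + 1 - lo)


def apply_operator(H, W, alpha):
    """(calA H)_i(s_l) = alpha(s_l) * sum_j w_ij * A(H_j, s_l) on the grid."""
    A = [[past_hour_average(h, l) for l in range(N_GRID)] for h in H]
    return [[alpha[l] * sum(w * A[j][l] for j, w in enumerate(row))
             for l in range(N_GRID)] for row in W]


def impulse_response(shock, W, alpha, tol=1e-12, max_iter=500):
    """Solve dY = shock + calA dY, i.e. dY = (Id - calA)^{-1} shock, by fixed-point
    iteration. It converges when max|alpha| * ||W||_inf < 1, because the
    averaging step never increases the sup norm."""
    dY = [row[:] for row in shock]
    for _ in range(max_iter):
        lag = apply_operator(dY, W, alpha)
        new = [[e + a for e, a in zip(er, ar)] for er, ar in zip(shock, lag)]
        gap = max(abs(x - y) for nr, dr in zip(new, dY) for x, y in zip(nr, dr))
        dY = new
        if gap < tol:
            break
    return dY


if __name__ == "__main__":
    # Hypothetical network: station 0 is shocked, stations 1 and 2 are its neighbors.
    W = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
    alpha = [0.3] * N_GRID                 # hypothetical constant interaction effect
    nine_am = (9 - 6) * 60 // 15           # grid index of 9 AM (= 12)
    shock = [[0.0] * N_GRID for _ in W]
    shock[0][nine_am] = -2.0               # remove 2 bikes at station 0 at 9 AM
    resp = impulse_response(shock, W, alpha)
    for l in range(nine_am, nine_am + 6):
        print(f"{6 + l * 15 // 60:02d}:{l * 15 % 60:02d}  neighbor {resp[1][l]:+.4f}")
```

Because the window only looks backward, the neighbors' responses are exactly zero before 9 AM; in this toy setup they grow over the next three grid points and then fade once 9 AM leaves the window, which is one simple way a time-delayed spillover can arise.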

Conclusion

In this paper, we proposed a novel functional regression framework to analyze spatial and network interactions in functional panel data settings. By extending the standard NAR model to accommodate functional outcomes and individual fixed effects, we developed an integrated-GMM estimator that can potentially estimate the functional parameters more efficiently than 2SLS-based estimators. Under certain conditions, we established the theoretical properties of our estimator, including consistency, convergence rates, and pointwise asymptotic normality, and we confirmed its finite-sample performance through Monte Carlo simulations. As an empirical application, we analyzed the demand for a bike-sharing system in the San Francisco Bay Area, revealing significant spatial interactions in bike availability that vary over the time of day. Our findings highlight the importance of accounting for functional spatial dependencies in the demand for shared mobility services and the practical usefulness of our method.

Several questions remain open. How can we specify the weight function $\nu$ in a data-driven manner? Is it possible to extend the current framework to cases where functional outcomes are only sparsely observed? How can we construct a uniform confidence band for each functional parameter? How can we estimate the model when it has a large number of covariates? We leave these questions for future research.

Acknowledgments

Hoshino’s work was supported by JSPS KAKENHI Grant Number 23KK0226. Most parts of this paper were written during Hoshino’s research visit at the Melbourne Business School (MBS), University of Melbourne. He is deeply grateful to MBS for its hospitality.

Appendix A Preparation

The following definition is from Jenish and Prucha (2012).

Definition A.1 (Near-epoch dependence).

Let $\bm{x}=\{x_{n,i}:i\in\mathcal{D}_{n};\;n\geq 1\}$ and $\bm{e}=\{e_{n,i}:i\in\mathcal{D}_{n};\;n\geq 1\}$ be triangular arrays of random fields, where $x$ and $e$ are real-valued and general (possibly infinite-dimensional) random variables, respectively. Then, the random field $\bm{x}$ is said to be $L^{p}$ -near-epoch dependent (NED) on $\bm{e}$ if

\left\|x_{n,i}-\mathbb{E}\left[x_{n,i}\mid\mathcal{F}_{n,i}(\delta)\right]\right\|_{p}\leq c_{n,i}\rho(\delta)

for an array of finite positive constants $\{c_{n,i}:i\in\mathcal{D}_{n};\;n\geq 1\}$ and some function $\rho(\delta)\geq 0$ with $\rho(\delta)\to 0$ as $\delta\to\infty$ , where $\mathcal{F}_{n,i}(\delta)$ is the $\sigma$ -field generated by $\{e_{n,j}:\Delta(i,j)\leq\delta\}$ . The $c_{n,i}$ ’s and $\rho(\delta)$ are called the NED scaling factors and NED coefficient, respectively. The $\bm{x}$ is said to be uniformly $L^{p}$ -NED on $\bm{e}$ if $c_{n,i}$ is uniformly bounded. If $\rho(\delta)\lesssim\varrho^{\delta}$ for some $0<\varrho<1$ , then it is called geometrically $L^{p}$ -NED.

In the following, for a general $\theta=(\theta_{\alpha}^{\top},\theta_{1}^{\top},\ldots,\theta_{d_{x}}^{\top})^{\top}\in\Theta_{K}$ , we denote

	$\displaystyle\alpha(s;\theta)$	$\displaystyle\coloneqq\phi^{K}(s)^{\top}\theta_{\alpha}$		(A.1)
	$\displaystyle\beta_{j}(s;\theta)$	$\displaystyle\coloneqq\phi^{K}(s)^{\top}\theta_{j},\;\;j\in[d_{x}]$		(A.2)

Since we have assumed that the basis functions are continuous, so are $\alpha(s;\theta)$ and $\beta_{j}(s;\theta)$, and thus they are uniformly bounded on $[0,1]$ by the extreme value theorem. For a given $\theta$, the residual vector can be written as $\bm{E}(s;\theta)=(E_{1}(s;\theta)^{\top},\ldots,E_{T}(s;\theta)^{\top})^{\top}$, where

	$\displaystyle E_{t}(s;\theta)$	$\displaystyle=(e_{1t}(s;\theta),\ldots,e_{nt}(s;\theta))^{\top}$		(A.3)
	$\displaystyle e_{it}(s;\theta)$	$\displaystyle=Y_{it}(s)-\alpha(s;\theta)A(\overline{Y}_{it},s)-\sum_{j=1}^{d_{x}}X_{it}^{j}\beta_{j}(s;\theta).$		(A.4)
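For concreteness, (A.1)–(A.4) can be evaluated pointwise as below. The monomial basis and the argument layout are hypothetical stand-ins (the appendix only requires continuous basis functions), and `net_lag` is assumed to hold $A(\overline{Y}_{it},s)=\sum_{j}w_{i,j}A(Y_{jt},s)$, consistent with the bound in (A.5).

```python
def basis(s, K):
    """Hypothetical continuous basis phi^K(s) = (1, s, ..., s^(K-1))."""
    return [s ** k for k in range(K)]


def sieve_value(s, coef):
    """alpha(s; theta) or beta_j(s; theta): phi^K(s)' times one coefficient block,
    as in (A.1)-(A.2)."""
    return sum(p * c for p, c in zip(basis(s, len(coef)), coef))


def residual(y, net_lag, x, theta_alpha, theta_betas, s):
    """e_it(s; theta) of (A.4) for a single (i, t) and grid point s.

    y: Y_it(s); net_lag: A(Ybar_it, s) = sum_j w_ij A(Y_jt, s);
    x: the d_x covariates X_it^1, ..., X_it^{d_x};
    theta_betas: one coefficient block per covariate, each of length K.
    """
    fitted = sieve_value(s, theta_alpha) * net_lag
    fitted += sum(xj * sieve_value(s, tb) for xj, tb in zip(x, theta_betas))
    return y - fitted


if __name__ == "__main__":
    # alpha(s) = 0.2 and beta_1(s) = 1 + 2s (hypothetical values)
    print(residual(y=5.0, net_lag=10.0, x=[0.5], theta_alpha=[0.2, 0.0, 0.0],
                   theta_betas=[[1.0, 2.0, 0.0]], s=0.5))
```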

Under Assumptions 2.1(i), 3.2, and 3.4, we have

	$\displaystyle\left\|e_{it}(s;\theta)\right\|_{p}$	$\displaystyle\leq\left\|Y_{it}(s)\right\|_{p}+|\alpha(s;\theta)|\sum_{j=1}^{n}|w_{i,j}|\left\|A(Y_{jt},s)\right\|_{p}+\sum_{j=1}^{d_{x}}|X_{it}^{j}|\cdot|\beta_{j}(s;\theta)|$		(A.5)
		$\displaystyle\lesssim 1$		(A.6)

for $p>4$ , uniformly in $s\in[0,1]$ , $\theta\in\Theta_{K}$ , and $(i,t)$ .

Finally, for ease of reference, we provide a list of some basic facts below:

$\displaystyle\bm{D}\bm{E}(s;\theta)$	$\displaystyle=\bm{D}\bm{H}(s)(\theta_{0}-\theta)+\bm{D}\bm{V}(s)+\bm{D}\bm{\mathcal{E}}(s)$	(A.7)
$\displaystyle\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{E}(s;\theta)$	$\displaystyle=\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{H}(s)(\theta_{0}-\theta)+\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s)+\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s)$	(A.8)
$\displaystyle\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{E}(s;\theta_{0})$	$\displaystyle=\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s)+\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s)$	(A.9)
$\displaystyle\bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s;\theta)$	$\displaystyle=(\theta_{0}-\theta)^{\top}\bm{H}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s)(\theta_{0}-\theta)+\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{V}(s)$	(A.10)
	$\displaystyle\quad+\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s)+2\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s)(\theta_{0}-\theta)$	(A.11)
	$\displaystyle\quad+2\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s)(\theta_{0}-\theta)+2\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s)$	(A.12)
$\displaystyle\bm{E}(s;\theta_{0})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s;\theta_{0})$	$\displaystyle=\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{V}(s)+\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s)+2\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s)$	(A.13)

Empirical moment function:

\displaystyle g_{nT}(s;\theta)\coloneqq\frac{1}{n(T-1)}\left(\begin{array}[]{c}\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{E}(s;\theta)\\ \bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{E}(s;\theta)\\ \vdots\\ \bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{E}(s;\theta)\end{array}\right)

(A.18)
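A direct transcription of (A.18) at a single grid point $s$, written in plain Python for small examples. Two structural points are assumptions inferred from (B.18) rather than stated here: $\bm{D}$ takes first differences over $t$ within each unit, and each $P_{m}$ acts period by period as an $n\times n$ matrix with zero diagonal. The function name and input layout are hypothetical.

```python
def moment_vector(Z, E, P_list):
    """Empirical moments g_nT(s; theta) of (A.18) at one grid point s.

    Z[t][i]: instrument row (length d_z) of unit i in period t, t = 0, ..., T-1;
    E[t][i]: residual e_it(s; theta);
    P_list: the n x n matrices P_1, ..., P_M (zero diagonals).
    Assumes D is the within-unit first difference over t and that each P_m is
    applied separately in every differenced period (cf. (B.18)).
    """
    T, n = len(E), len(E[0])
    d_z = len(Z[0][0])
    # D E and D Z: differences between consecutive periods
    dE = [[E[t + 1][i] - E[t][i] for i in range(n)] for t in range(T - 1)]
    dZ = [[[b - a for a, b in zip(Z[t][i], Z[t + 1][i])] for i in range(n)]
          for t in range(T - 1)]
    scale = 1.0 / (n * (T - 1))
    # Linear block: Z' D' D E = (D Z)' (D E)
    linear = [scale * sum(dZ[t][i][k] * dE[t][i] for t in range(T - 1) for i in range(n))
              for k in range(d_z)]
    # Quadratic blocks: E' D' P_m D E = sum_t (D E)_t' P_m (D E)_t
    quadratic = [scale * sum(P[i][j] * dE[t][i] * dE[t][j]
                             for t in range(T - 1) for i in range(n) for j in range(n))
                 for P in P_list]
    return linear + quadratic


if __name__ == "__main__":
    E = [[1.0, 2.0], [2.0, 4.0], [4.0, 4.0]]            # T = 3, n = 2 (toy values)
    Z = [[[0.0], [1.0]], [[1.0], [1.0]], [[1.0], [3.0]]]
    P = [[0.0, 1.0], [1.0, 0.0]]
    print(moment_vector(Z, E, [P]))
```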

Jacobian of $g_{nT}(s;\theta)$ :

\displaystyle J_{nT}(s;\theta)\coloneqq\frac{\partial g_{nT}(s;\theta)}{\partial\theta^{\top}}=-\frac{1}{n(T-1)}\left(\begin{array}[]{c}\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{H}(s)\\ 2\bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{H}(s)\\ \vdots\\ 2\bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{H}(s)\end{array}\right)

(A.23)

Decompose $\overline{g}_{nT}(\theta_{0})=\overline{g}_{1,nT}+\overline{g}_{2,nT}$ with

	$\displaystyle\overline{g}_{1,nT}$	$\displaystyle\coloneqq\frac{1}{N}\sum_{l=1}^{L}\left(\begin{array}[]{c}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})\\ \bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{\mathcal{E}}(s_{l})\\ \vdots\\ \bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{\mathcal{E}}(s_{l})\end{array}\right)$		(A.28)
	$\displaystyle\overline{g}_{2,nT}$	$\displaystyle\coloneqq\frac{1}{N}\sum_{l=1}^{L}\left(\begin{array}[]{c}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s_{l})\\ \bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{V}(s_{l})+2\bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{\mathcal{E}}(s_{l})\\ \vdots\\ \bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{V}(s_{l})+2\bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{\mathcal{E}}(s_{l})\end{array}\right)$		(A.33)

The variance-covariance matrix of $\sqrt{n(T-1)}\overline{g}_{1,nT}$ :

\displaystyle\mathcal{V}_{nT}\coloneqq n(T-1)\mathbb{E}\left[\overline{g}_{1,nT}\overline{g}_{1,nT}^{\top}\right]=\left(\begin{array}[]{cccc}\mathcal{V}_{z,nT}&\bm{0}_{(d_{q}+d_{x})K\times 1}&\cdots&\bm{0}_{(d_{q}+d_{x})K\times 1}\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\mathcal{V}_{11,nT}&\cdots&\mathcal{V}_{1M,nT}\\ \vdots&\vdots&\ddots&\vdots\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\mathcal{V}_{M1,nT}&\cdots&\mathcal{V}_{MM,nT}\end{array}\right)

(A.38)

where

\displaystyle\begin{split}\mathcal{V}_{z,nT}&\coloneqq\frac{n(T-1)}{N^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{\mathcal{E}}(s_{l})\bm{\mathcal{E}}(s_{l^{\prime}})^{\top}]\bm{D}^{\top}\bm{D}\bm{Z}(s_{l^{\prime}})\\ &=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T}\sum_{i=1}^{n}z_{it}^{\dagger}(s_{l})z_{it}^{\dagger}(s_{l^{\prime}})^{\top}\Gamma_{it}(s_{l},s_{l^{\prime}}),\end{split}

(A.39)

$z_{it}^{\dagger}(s)$ denotes the $it$ -th column of $\underbracket{\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}}_{(d_{q}+d_{x})K\times nT}$ , $\underbracket{\bm{D}^{\top}P_{m}\bm{D}}_{nT\times nT}\coloneqq\widetilde{P}_{m}=(\widetilde{p}_{m,it,jt})_{11\leq it,jt\leq nT}$ , and, noting that $\text{diag}(\widetilde{P}_{m})=\bm{0}_{n(T-1)}$ and that $\widetilde{P}_{m}$ is symmetric,

\displaystyle\begin{split}\mathcal{V}_{ab,nT}&\coloneqq\frac{n(T-1)}{N^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\mathbb{E}[\bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}P_{a}\bm{D}\bm{\mathcal{E}}(s_{l})\bm{\mathcal{E}}(s_{l^{\prime}})^{\top}\bm{D}^{\top}P_{b}\bm{D}\bm{\mathcal{E}}(s_{l^{\prime}})]\\ &=\frac{1}{n(T-1)}\sum_{t=1}^{T}\sum_{t^{\prime}=1}^{T}\sum_{1\leq i_{1},i_{2}\leq n}\sum_{1\leq j_{1},j_{2}\leq n}\widetilde{p}_{a,i_{1}t,i_{2}t}\widetilde{p}_{b,j_{1}t^{\prime},j_{2}t^{\prime}}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\mathbb{E}[\varepsilon_{i_{1}t}(s_{l})\varepsilon_{i_{2}t}(s_{l})\varepsilon_{j_{1}t^{\prime}}(s_{l^{\prime}})\varepsilon_{j_{2}t^{\prime}}(s_{l^{\prime}})]\\ &=\frac{1}{n(T-1)}\sum_{t=1}^{T}\sum_{1\leq i_{1},i_{2}\leq n}\widetilde{p}_{a,i_{1}t,i_{2}t}\widetilde{p}_{b,i_{1}t,i_{2}t}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{i_{1}t}(s_{l},s_{l^{\prime}})\Gamma_{i_{2}t}(s_{l},s_{l^{\prime}})\\ &\quad+\frac{1}{n(T-1)}\sum_{t=1}^{T}\sum_{1\leq i_{1},i_{2}\leq n}\widetilde{p}_{a,i_{1}t,i_{2}t}\widetilde{p}_{b,i_{2}t,i_{1}t}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{i_{1}t}(s_{l},s_{l^{\prime}})\Gamma_{i_{2}t}(s_{l},s_{l^{\prime}})\\ &=\frac{2}{n(T-1)}\sum_{t=1}^{T}\sum_{1\leq i_{1},i_{2}\leq n}\widetilde{p}_{a,i_{1}t,i_{2}t}\widetilde{p}_{b,i_{1}t,i_{2}t}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{i_{1}t}(s_{l},s_{l^{\prime}})\Gamma_{i_{2}t}(s_{l},s_{l^{\prime}}).\end{split}

(A.40)

Note that the cross terms between the linear and quadratic moments are zero:

	$\displaystyle\frac{n(T-1)}{N^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\mathbb{E}[\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})\bm{\mathcal{E}}(s_{l^{\prime}})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s_{l^{\prime}})]$		(A.41)
	$\displaystyle=\frac{n(T-1)}{N^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T}\sum_{t^{\prime}=1}^{T}\sum_{i=1}^{n}\sum_{1\leq i_{1},i_{2}\leq n}z_{it}^{\dagger}(s_{l})\widetilde{p}_{m,i_{1}t^{\prime},i_{2}t^{\prime}}\mathbb{E}[\varepsilon_{it}(s_{l})\varepsilon_{i_{1}t^{\prime}}(s_{l^{\prime}})\varepsilon_{i_{2}t^{\prime}}(s_{l^{\prime}})]=\bm{0}_{(d_{q}+d_{x})K}.$		(A.42)

Appendix B Proofs

Proof of Proposition 2.1

Under Assumption 2.1, we have

\displaystyle\begin{split}\left\|\{\mathcal{A}H\}_{i}\right\|_{L^{2}}=\left\|\alpha_{0}(\cdot)\sum_{j=1}^{n}w_{i,j}A(h_{j},\cdot)\right\|_{L^{2}}&\leq\sum_{j=1}^{n}|w_{i,j}|\left\|\alpha_{0}(\cdot)A(h_{j},\cdot)\right\|_{L^{2}}\\ &=\sum_{j=1}^{n}|w_{i,j}|\left(\int_{0}^{1}\left|\alpha_{0}(s)A(h_{j},s)\right|^{2}\text{d}s\right)^{1/2}\\ &\leq\overline{\alpha}_{0}\sum_{j=1}^{n}|w_{i,j}|\left\|A(h_{j},\cdot)\right\|_{L^{2}}\\ &\leq\overline{\alpha}_{0}||W_{n}||_{\infty}\max_{1\leq j\leq n}||h_{j}||_{L^{2}}<||H||_{\infty,2}<\infty\end{split}

(B.1)

for any $H\in\mathcal{H}_{n,2}$. This implies that $\mathcal{A}H\in\mathcal{H}_{n,2}$. As is well known, if the operator norm of $\mathcal{A}$ is smaller than one, $(\text{Id}-\mathcal{A})^{-1}$ exists (e.g., Theorem 2.14, Kress (2014)). By (B.1), $\left\|\mathcal{A}H\right\|_{\infty,2}\leq\overline{\alpha}_{0}||W_{n}||_{\infty}<1$ for any $H$ such that $||H||_{\infty,2}=1$, so the operator norm of $\mathcal{A}$ is bounded by $\overline{\alpha}_{0}||W_{n}||_{\infty}<1$, which yields the desired result. $\blacksquare$

Lemma B.1.

Suppose that Assumptions 2.1(i), 3.2, 3.3(ii), 3.4, 3.5(i), and 3.7 hold. Then, $\mathbb{E}[g_{nT}(s;\theta_{0})]\lesssim\bm{1}_{d_{g}}K^{-\pi}$.

Proof.

Observe that

\displaystyle\mathbb{E}[g_{nT}(s;\theta_{0})]=\frac{1}{n(T-1)}\left(\begin{array}[]{c}\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{V}(s)]\\ \mathbb{E}[\bm{V}(s)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{V}(s)]+2\mathbb{E}[\bm{V}(s)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{\mathcal{E}}(s)]\\ \vdots\\ \mathbb{E}[\bm{V}(s)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{V}(s)]+2\mathbb{E}[\bm{V}(s)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{\mathcal{E}}(s)]\end{array}\right)

(B.6)

By Assumptions 3.2(ii) and 3.4,

	$\displaystyle|A(\mathbb{E}[Y_{jt}],s)|$	$\displaystyle\leq\int_{0}^{1}|\mathbb{E}[Y_{jt}(u)]|\,\omega_{1}(u,s)\text{d}u$		(B.7)
		$\displaystyle\leq\int_{0}^{1}\mathbb{E}|Y_{jt}(u)|\,\omega_{1}(u,s)\text{d}u\lesssim 1$		(B.8)

uniformly in $s\in[0,1]$ , implying that $\sup_{s\in[0,1]}|A(\mathbb{E}[Y_{jt}],s)|\lesssim 1$ . Then, we have

	$\displaystyle|\mathbb{E}[v_{it}(s)]|$	$\displaystyle\leq\sum_{j=1}^{n}|w_{i,j}|\cdot|A(\mathbb{E}[Y_{jt}],s)|\cdot|\alpha_{0}(s)-\alpha(s;\theta_{0})|+\sum_{j=1}^{d_{x}}|X_{it}^{j}|\cdot|\beta_{0j}(s)-\beta_{j}(s;\theta_{0})|$		(B.9)
		$\displaystyle\lesssim K^{-\pi}$		(B.10)

uniformly in $s\in[0,1]$ and $(i,t)$ under Assumption 3.7. This implies that the first $(d_{q}+d_{x})K$ elements of $\mathbb{E}[g_{nT}(s;\theta_{0})]$ are of order $K^{-\pi}$ .

Next, by the Cauchy-Schwarz inequality and the facts that $\lambda_{\max}(\bm{D}\bm{D}^{\top})\leq 4$ and $\lambda_{\max}(P_{m}P_{m}^{\top})\lesssim 1$ under Assumption 3.5(i), we obtain

$\displaystyle\mathbb{E}\left\|\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{V}(s)\right\|$	$\displaystyle\leq\sqrt{\mathbb{E}\left\|\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\right\|^{2}}\sqrt{\mathbb{E}\left\|\bm{V}(s)\right\|^{2}}$	(B.11)
	$\displaystyle\leq\sqrt{\text{trace}\{\bm{D}^{\top}P_{m}\bm{D}\bm{D}^{\top}P_{m}^{\top}\bm{D}\mathbb{E}[\bm{V}(s)\bm{V}(s)^{\top}]\}}\sqrt{\mathbb{E}\left\|\bm{V}(s)\right\|^{2}}$	(B.12)
	$\displaystyle\lesssim\sum_{t=1}^{T}\sum_{i=1}^{n}\mathbb{E}\|v_{it}(s)\|^{2}.$	(B.13)
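The eigenvalue bound $\lambda_{\max}(\bm{D}\bm{D}^{\top})\leq 4$ used above can be checked directly, assuming (as (B.18) suggests) that $\bm{D}=D_{T}\otimes I_{n}$, where $D_{T}$ is the $(T-1)\times T$ first-difference matrix. This is a supplementary derivation, not part of the original proof.

```latex
% Assumes D = D_T \otimes I_n, with D_T the (T-1) x T first-difference matrix (cf. (B.18)).
\begin{aligned}
\bm{D}\bm{D}^{\top} &= (D_T D_T^{\top}) \otimes I_n, \qquad
D_T D_T^{\top} =
\begin{pmatrix}
 2 & -1 &        &        \\
-1 &  2 & \ddots &        \\
   & \ddots & \ddots & -1 \\
   &        & -1     & 2
\end{pmatrix}_{(T-1)\times(T-1)},\\
\lambda_k(D_T D_T^{\top}) &= 2 - 2\cos\!\Big(\frac{k\pi}{T}\Big), \quad k = 1,\ldots,T-1
\;\;\Longrightarrow\;\;
\lambda_{\max}(\bm{D}\bm{D}^{\top}) = 2 + 2\cos\!\Big(\frac{\pi}{T}\Big) < 4 .
\end{aligned}
```

The Kronecker factor $I_{n}$ only repeats each eigenvalue $n$ times, so the bound does not depend on $n$.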

Similarly, by the $c_{r}$ inequality,

	$\displaystyle\mathbb{E}\|v_{it}(s)\|^{2}$	$\displaystyle\leq 2\mathbb{E}\left\|\sum_{j=1}^{n}w_{i,j}A(Y_{jt},s)[\alpha_{0}(s)-\alpha(s;\theta_{0})]\right\|^{2}+2\left\|\sum_{j=1}^{d_{x}}X_{it}^{j}(\beta_{0j}(s)-\beta_{j}(s;\theta_{0}))\right\|^{2}$		(B.14)
		$\displaystyle\leq 2\sum_{j=1}^{n}\sum_{j^{\prime}=1}^{n}w_{i,j}w_{i,j^{\prime}}\mathbb{E}[A(Y_{jt},s)A(Y_{j^{\prime}t},s)][\alpha_{0}(s)-\alpha(s;\theta_{0})]^{2}+cK^{-2\pi}.$		(B.15)

By the Cauchy-Schwarz inequality,

\displaystyle|\mathbb{E}[A(Y_{jt},s)A(Y_{j^{\prime}t},s)]|\leq\left\|A(Y_{jt},s)\right\|_{2}\left\|A(Y_{j^{\prime}t},s)\right\|_{2}.

(B.16)

Further, Assumptions 3.2(ii) and 3.4 imply that $\mathbb{E}|A(Y_{jt},s)|^{2}\leq\int_{0}^{1}\mathbb{E}|Y_{jt}(u)|^{2}\omega_{2}(u,s)\text{d}u\lesssim 1$ uniformly in $s\in[0,1]$ and $(j,t)$ . Thus, $|\mathbb{E}[A(Y_{jt},s)A(Y_{j^{\prime}t},s)]|$ is uniformly bounded, and we have

\displaystyle\left\|v_{it}(s)\right\|_{2}\lesssim K^{-\pi}

(B.17)

uniformly in $s\in[0,1]$ and $(i,t)$ .

Lastly, by the Cauchy-Schwarz and Minkowski inequalities,

$\displaystyle\mathbb{E}[\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s)]$	$\displaystyle=\sum_{t\in[T-1]}\sum_{1\leq i,j\leq n}p_{m,i,j}\mathbb{E}[(v_{i,t+1}(s)-v_{it}(s))(\varepsilon_{j,t+1}(s)-\varepsilon_{jt}(s))]$	(B.18)
	$\displaystyle\leq\sum_{t\in[T-1]}\sum_{1\leq i,j\leq n}|p_{m,i,j}|\,\mathbb{E}|(v_{i,t+1}(s)-v_{it}(s))(\varepsilon_{j,t+1}(s)-\varepsilon_{jt}(s))|$	(B.19)
	$\displaystyle\leq\sum_{t\in[T-1]}\sum_{1\leq i,j\leq n}|p_{m,i,j}|\cdot\|v_{i,t+1}(s)-v_{it}(s)\|_{2}\,\|\varepsilon_{j,t+1}(s)-\varepsilon_{jt}(s)\|_{2}$	(B.20)
	$\displaystyle\leq\sum_{t\in[T-1]}\sum_{1\leq i,j\leq n}|p_{m,i,j}|\cdot\{\|v_{i,t+1}(s)\|_{2}+\|v_{it}(s)\|_{2}\}\{\|\varepsilon_{j,t+1}(s)\|_{2}+\|\varepsilon_{jt}(s)\|_{2}\}$	(B.21)
	$\displaystyle\lesssim n(T-1)K^{-\pi}$	(B.22)

where the last inequality follows from (B.17) and Assumptions 3.3(ii) and 3.5(i). Combining these results gives the desired result. ∎

Denote the population GMM objective function as follows:

\displaystyle\mathcal{Q}^{*}_{nT}(\theta)\coloneqq\mathbb{E}[\overline{g}_{nT}(\theta)]^{\top}\Omega_{nT}\mathbb{E}[\overline{g}_{nT}(\theta)]

(B.23)

Lemma B.2.

Suppose that Assumptions 2.1(i), 3.2, 3.3(ii), 3.4, 3.5, 3.6, and 3.7 hold. In addition, assume that $K^{(1-2\pi)/2}\to 0$ as $nT\to\infty$ . Then, for any $\theta\in\Theta_{K}$ and $e>0$ such that $\left\|\theta-\theta_{0}\right\|\geq e$ , there exists a constant $c_{e}>0$ such that $\mathcal{Q}^{*}_{nT}(\theta)-\mathcal{Q}^{*}_{nT}(\theta_{0})>c_{e}$ for all sufficiently large $nT$ .

Proof.

Decompose

	$\displaystyle\mathcal{Q}^{*}_{nT}(\theta)-\mathcal{Q}^{*}_{nT}(\theta_{0})$	$\displaystyle=\underbracket{\left(\mathbb{E}[\overline{g}_{nT}(\theta)]-\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right)^{\top}\Omega_{nT}\left(\mathbb{E}[\overline{g}_{nT}(\theta)]-\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right)}_{\eqqcolon A_{nT}(\theta)}$		(B.24)
		$\displaystyle\quad+2\left(\mathbb{E}[\overline{g}_{nT}(\theta)]-\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right)^{\top}\Omega_{nT}\mathbb{E}[\overline{g}_{nT}(\theta_{0})]$		(B.25)

In view of

\displaystyle\mathbb{E}[\overline{g}_{nT}(\theta)]-\mathbb{E}[\overline{g}_{nT}(\theta_{0})]=\frac{1}{N}\sum_{l=1}^{L}\left(\begin{array}[]{c}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{H}(s_{l})]\\ \mathbb{E}\left[(\bm{H}(s_{l})(\theta_{0}-\theta)+2\bm{V}(s_{l})+2\bm{\mathcal{E}}(s_{l}))^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{H}(s_{l})\right]\\ \vdots\\ \mathbb{E}\left[(\bm{H}(s_{l})(\theta_{0}-\theta)+2\bm{V}(s_{l})+2\bm{\mathcal{E}}(s_{l}))^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{H}(s_{l})\right]\end{array}\right)(\theta_{0}-\theta),

(B.30)

we find that $A_{nT}(\theta)$ is bounded below by $\lambda_{\min}(\Omega_{nT})\lambda_{\min}(\Pi_{nT}^{\top}\Pi_{nT})||\theta_{0}-\theta||^{2}\geq c_{1}e^{2}$ for some $c_{1}>0$ and all sufficiently large $nT$, under Assumptions 3.5(ii) and 3.6. Further, since $\mathbb{E}[\overline{g}_{nT}(\theta_{0})]$ has $d_{g}=(d_{q}+d_{x})K+M$ elements, each of order $K^{-\pi}$ by Lemma B.1, the Cauchy-Schwarz inequality gives

	$\displaystyle\left\|\left(\mathbb{E}[\overline{g}_{nT}(\theta)]-\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right)^{\top}\Omega_{nT}\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right\|$	$\displaystyle\leq(A_{nT}(\theta))^{1/2}\left(\mathbb{E}[\overline{g}_{nT}(\theta_{0})]^{\top}\Omega_{nT}\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right)^{1/2}$		(B.31)
		$\displaystyle\leq c_{2}(A_{nT}(\theta))^{1/2}K^{(1-2\pi)/2}$		(B.32)

Hence, since $(A_{nT}(\theta))^{1/2}$ is bounded away from zero and $K^{(1-2\pi)/2}\to 0$, we have

	$\displaystyle\mathcal{Q}^{*}_{nT}(\theta)-\mathcal{Q}^{*}_{nT}(\theta_{0})$	$\displaystyle\geq A_{nT}(\theta)-2c_{2}(A_{nT}(\theta))^{1/2}K^{(1-2\pi)/2}$		(B.33)
		$\displaystyle=(A_{nT}(\theta))^{1/2}((A_{nT}(\theta))^{1/2}-2c_{2}K^{(1-2\pi)/2})>0$		(B.34)

for all sufficiently large $nT$ . This completes the proof. ∎

Lemma B.3.

Suppose that Assumptions 2.1, 3.1, 3.2, and 3.4 hold. Then, for any given $s\in[0,1]$ and all $t\in[T]$ , $\{Y_{it}(s):i\in\mathcal{D}_{n};\;n\geq 1\}$ is uniformly and geometrically $L^{2}$ -NED on $\{\varepsilon_{it}:i\in\mathcal{D}_{n};\;n\geq 1\}$ .

Proof.

We prove the lemma in a manner similar to Jenish (2012) and Hoshino (2022). Recall that $Y_{t}$ is uniquely determined in $\mathcal{H}_{n,2}$ as $Y_{t}=(\text{Id}-\mathcal{A})^{-1}[X_{t}\beta_{0}+F_{0}]+(\text{Id}-\mathcal{A})^{-1}\mathcal{E}_{t}$ under Assumption 2.1 for all $t\in[T]$. We denote

\displaystyle[\xi_{1t}(\cdot),\ldots,\xi_{nt}(\cdot)]^{\top}=(\text{Id}-\mathcal{A})^{-1}[X_{t}\beta_{0}+F_{0}]+(\text{Id}-\mathcal{A})^{-1}[\cdot]:\mathcal{H}_{n,2}\to\mathcal{H}_{n,2}

(B.35)

such that $Y_{it}=\xi_{it}(\mathcal{E}_{t})$ holds for each $i\in[n]$ .

Define

\displaystyle\mathcal{E}_{1,it}^{(\delta)}\coloneqq\{\varepsilon_{jt}\}_{j:\Delta(i,j)\leq\delta},\qquad\mathcal{E}_{2,it}^{(\delta)}\coloneqq\{\varepsilon_{jt}\}_{j:\Delta(i,j)>\delta}

(B.36)

for some $\delta>0$. Since $L^{2}(0,1)$ is separable, $\mathcal{E}_{1,it}^{(\delta)}$ and $\mathcal{E}_{2,it}^{(\delta)}$ are random elements taking values in the Polish spaces $(\mathcal{H}_{|\{j:\Delta(i,j)\leq\delta\}|,2},||\cdot||_{\infty,2})$ and $(\mathcal{H}_{|\{j:\Delta(i,j)>\delta\}|,2},||\cdot||_{\infty,2})$, respectively. Then, by Lemma 2.11 of Dudley and Philipp (1983) (see also Lemma A.1 of Jenish (2012)), there exists a function $\chi$ such that $(\mathcal{E}_{1,it}^{(\delta)},\chi(U,\mathcal{E}_{1,it}^{(\delta)}))$ has the same law as $(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})$, which is a suitable permutation of $\mathcal{E}_{t}$, where $U$ is a random variable uniformly distributed on $[0,1]$ and independent of $\mathcal{E}_{1,it}^{(\delta)}$.

Now, with a slight abuse of notation, we write

	$\displaystyle Y_{it}$	$\displaystyle=\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})\equiv\xi_{it}(\mathcal{E}_{t})$		(B.37)
	$\displaystyle Y^{(\delta)}_{it}$	$\displaystyle\coloneqq\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(U,\mathcal{E}_{1,it}^{(\delta)}))\equiv\xi_{it}(\mathcal{E}_{t}^{(\delta)})$		(B.38)

where $\mathcal{E}^{(\delta)}_{t}=(\varepsilon_{1t}^{(\delta)},\ldots,\varepsilon_{nt}^{(\delta)})^{\top}$ . To be specific,

	$\displaystyle Y_{it}^{(\delta)}(s)$	$\displaystyle=\left\{(\text{Id}-\mathcal{A})^{-1}[X_{t}\beta_{0}+F_{0}+\mathcal{E}_{t}^{(\delta)}](s)\right\}_{i}$		(B.39)
		$\displaystyle=\alpha_{0}(s)\sum_{j=1}^{n}w_{i,j}A(Y_{jt}^{(\delta)},s)+X_{it}^{\top}\beta_{0}(s)+f_{0i}(s)+\varepsilon_{it}^{(\delta)}(s).$		(B.40)

By construction, we have

	$\displaystyle\mathbb{E}[Y_{it}(s)\mid\mathcal{F}_{it}(\delta)]$	$\displaystyle=\mathbb{E}\left[\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})(s)\mid\mathcal{E}_{1,it}^{(\delta)}\right]$
		$\displaystyle=\mathbb{E}\left[\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(U,\mathcal{E}_{1,it}^{(\delta)}))(s)\mid\mathcal{E}_{1,it}^{(\delta)}\right]=\mathbb{E}[Y^{(\delta)}_{it}(s)\mid\mathcal{F}_{it}(\delta)],$

where $\mathcal{F}_{it}(\delta)$ is the $\sigma$-field generated by $\mathcal{E}_{1,it}^{(\delta)}=\{\varepsilon_{jt}:\Delta(i,j)\leq\delta\}$. Moreover, since $Y^{(\delta)}_{it}(u)$ has the same distribution as $Y_{it}(u)$, Minkowski's inequality gives $||Y_{it}(u)-Y^{(\delta)}_{it}(u)||^{2}_{2}\leq 4||Y_{it}(u)||_{2}^{2}$.

Here, suppose that $0<\delta<\overline{\Delta}$ , where $\overline{\Delta}$ is as provided in Assumption 3.1(ii). Then, because at least $i$ ’s own $\varepsilon_{it}$ is included in $\mathcal{E}_{1,it}^{(\delta)}$ , we have $\varepsilon_{it}\equiv\varepsilon_{it}^{(\delta)}$ , and hence

\displaystyle Y_{it}(s)-Y^{(\delta)}_{it}(s)=\alpha_{0}(s)\sum_{j=1}^{n}w_{i,j}A(Y_{jt}-Y^{(\delta)}_{jt},s)

(B.41)

holds. Thus, by Minkowski’s inequality and Assumptions 3.2(ii) and 3.4,

	$\displaystyle\left\|Y_{it}(s)-Y^{(\delta)}_{it}(s)\right\|_{2}$	$\displaystyle=\left\|\alpha_{0}(s)\sum_{j=1}^{n}w_{i,j}A(Y_{jt}-Y^{(\delta)}_{jt},s)\right\|_{2}$
		$\displaystyle\leq|\alpha_{0}(s)|\sum_{j=1}^{n}|w_{i,j}|\cdot\left\|A(Y_{jt}-Y^{(\delta)}_{jt},s)\right\|_{2}$
		$\displaystyle\leq|\alpha_{0}(s)|\sum_{j=1}^{n}|w_{i,j}|\cdot\left(\int_{0}^{1}\left\|Y_{jt}(u)-Y^{(\delta)}_{jt}(u)\right\|_{2}^{2}\>\omega_{2}(u,s)\text{d}u\right)^{1/2}\leq C\cdot\varrho,$

where $C\coloneqq 2\max_{i,t}\operatorname*{ess\,sup}_{u\in[0,1]}||Y_{it}(u)||_{2}$ and $\varrho\coloneqq\overline{\alpha}_{0}||W_{n}||_{\infty}$. Similarly, when $\overline{\Delta}\leq\delta<2\overline{\Delta}$, noting that under Assumption 3.1(ii) we now have $\varepsilon_{jt}\equiv\varepsilon^{(\delta)}_{jt}$ for every direct neighbor $j$ of $i$,

\displaystyle\begin{split}\left\|Y_{it}(s)-Y^{(\delta)}_{it}(s)\right\|_{2}&\leq\overline{\alpha}_{0}\sum_{j=1}^{n}|w_{i,j}|\cdot\left\|A(Y_{jt}-Y^{(\delta)}_{jt},s)\right\|_{2}\\ &=\overline{\alpha}_{0}\sum_{j=1}^{n}|w_{i,j}|\cdot\left\|\sum_{k=1}^{n}w_{j,k}A(\alpha_{0}(\cdot)A(Y_{kt}-Y_{kt}^{(\delta)},\cdot),s)\right\|_{2}\\ &\leq\overline{\alpha}_{0}\sum_{j=1}^{n}|w_{i,j}|\sum_{k=1}^{n}|w_{j,k}|\cdot\left\|A(\alpha_{0}(\cdot)A(Y_{kt}-Y_{kt}^{(\delta)},\cdot),s)\right\|_{2}\\ &\leq\overline{\alpha}_{0}\sum_{j=1}^{n}|w_{i,j}|\sum_{k=1}^{n}|w_{j,k}|\cdot\left(\int_{0}^{1}\left\|\alpha_{0}(u)A(Y_{kt}-Y_{kt}^{(\delta)},u)\right\|_{2}^{2}\omega_{2}(u,s)\text{d}u\right)^{1/2}\\ &\leq\overline{\alpha}_{0}^{2}\sum_{j=1}^{n}|w_{i,j}|\sum_{k=1}^{n}|w_{j,k}|\cdot\left(\int_{0}^{1}\left\|A(Y_{kt}-Y_{kt}^{(\delta)},u)\right\|_{2}^{2}\omega_{2}(u,s)\text{d}u\right)^{1/2}\\ &\leq\overline{\alpha}_{0}^{2}\sum_{j=1}^{n}|w_{i,j}|\sum_{k=1}^{n}|w_{j,k}|\cdot\left(\int_{0}^{1}\int_{0}^{1}\left\|Y_{kt}(v)-Y^{(\delta)}_{kt}(v)\right\|_{2}^{2}\>\omega_{2}(v,u)\omega_{2}(u,s)\text{d}v\text{d}u\right)^{1/2}\leq C\cdot\varrho^{2}.\end{split}

Applying the same argument recursively, for $m\overline{\Delta}\leq\delta<(m+1)\overline{\Delta}$, so that $\varepsilon_{jt}\equiv\varepsilon^{(\delta)}_{jt}$ for all $j$ in the $m$-th order neighborhood of $i$, we obtain

\displaystyle\left\|Y_{it}(s)-Y^{(\delta)}_{it}(s)\right\|_{2}\leq C\cdot\varrho^{\lfloor\delta/\overline{\Delta}\rfloor+1}.

(B.42)

Finally, by Jensen’s inequality and (B.42),

\displaystyle\begin{split}\left\|Y_{it}(s)-\mathbb{E}[Y_{it}(s)\mid\mathcal{F}_{it}(\delta)]\right\|_{2}&=\left\|\int_{0}^{1}\left[\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})(s)-\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(u,\mathcal{E}_{1,it}^{(\delta)}))(s)\right]\text{d}u\right\|_{2}\\ &\leq\left\{\mathbb{E}\int_{0}^{1}\left|\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})(s)-\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(u,\mathcal{E}_{1,it}^{(\delta)}))(s)\right|^{2}\text{d}u\right\}^{1/2}\\ &=\left\{\mathbb{E}\left|\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})(s)-\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(U,\mathcal{E}_{1,it}^{(\delta)}))(s)\right|^{2}\right\}^{1/2}\\ &=\left\|\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})(s)-\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(U,\mathcal{E}_{1,it}^{(\delta)}))(s)\right\|_{2}\\ &=\left\|Y_{it}(s)-Y^{(\delta)}_{it}(s)\right\|_{2}\leq C\cdot\varrho^{\lfloor\delta/\overline{\Delta}\rfloor+1}\to 0\end{split}

(B.43)

as $\delta\to\infty$, since $\varrho<1$ under Assumption 2.1. This completes the proof. ∎
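The coupling argument behind (B.42) can be checked numerically in a scalar analogue of the model. The sketch below rests on assumptions that are not taken from the paper: the operator $A$ is replaced by the identity, $\alpha_{0}(s)\equiv a$ is constant, the network is a hypothetical path graph with row-normalised weights (so $\overline{\Delta}=1$ and $\varrho=a\|W_{n}\|_{\infty}=a$), and the shocks are bounded uniforms. It redraws the shocks outside the $\delta$-neighbourhood of a unit $i$, mimicking the construction of $Y^{(\delta)}$, and compares $|Y_{i}-Y^{(\delta)}_{i}|$ with $C\varrho^{\delta+1}$, where $C$ is a sample analogue of the constant in the text (twice the largest absolute outcome across the two draws). The helper names are illustrative only.

```python
import numpy as np


def path_weights(n):
    """Row-normalised adjacency matrix of a path graph (hypothetical network)."""
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        W[i, nbrs] = 1.0 / len(nbrs)
    return W


def solve_nar(W, eps, a):
    """Solve the scalar network autoregression Y = a * W @ Y + eps."""
    return np.linalg.solve(np.eye(W.shape[0]) - a * W, eps)


def coupling_gaps(n=61, a=0.6, i=30, max_delta=20, seed=0):
    """Return |Y_i - Y_i^(delta)| and C * a**(delta + 1) for delta = 0, ..., max_delta - 1."""
    rng = np.random.default_rng(seed)
    W = path_weights(n)
    eps = rng.uniform(-1.0, 1.0, n)               # bounded shocks
    Y = solve_nar(W, eps, a)
    dist = np.abs(np.arange(n) - i)               # graph distance to unit i
    gaps, bounds = [], []
    for delta in range(max_delta):
        eps_d = eps.copy()
        far = dist > delta                        # units outside the delta-neighbourhood of i
        eps_d[far] = rng.uniform(-1.0, 1.0, far.sum())  # independent redraws
        Y_d = solve_nar(W, eps_d, a)
        C = 2.0 * max(np.abs(Y).max(), np.abs(Y_d).max())
        gaps.append(abs(Y[i] - Y_d[i]))
        bounds.append(C * a ** (delta + 1))       # floor(delta / 1) + 1 = delta + 1
    return np.array(gaps), np.array(bounds)


gaps, bounds = coupling_gaps()
for delta in (0, 5, 10, 15):
    print(delta, gaps[delta], bounds[delta])
```

In this scalar setting the recursion in the proof holds path by path: $Y_{i}-Y^{(\delta)}_{i}=a^{\delta+1}\sum_{k}(W^{\delta+1})_{ik}(Y_{k}-Y^{(\delta)}_{k})$, so the printed gaps never exceed the bounds and shrink geometrically in $\delta$.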

Lemma B.4.

Suppose that Assumptions 2.1, 3.1, 3.2, and 3.4 hold. Then, for any given $s\in[0,1]$ and all $t\in[T]$, $\{A(\overline{Y}_{it},s):i\in\mathcal{D}_{n};\;n\geq 1\}$ is uniformly and geometrically $L^{2}$-NED on $\{\varepsilon_{it}:i\in\mathcal{D}_{n};\;n\geq 1\}$.

Proof.

Note that $\mathcal{F}_{jt}((\delta-1)\overline{\Delta})\subseteq\mathcal{F}_{it}(\delta\overline{\Delta})$ for pairs $(i,j)$ with $\Delta(i,j)\leq\overline{\Delta}$ and $\delta>1$. Thus, by Lemma B.3,

$\displaystyle\left\|\overline{Y}_{it}(s)-\mathbb{E}[\overline{Y}_{it}(s)\mid\mathcal{F}_{it}(\delta\overline{\Delta})]\right\|_{2}$	$\displaystyle\leq\sum_{j=1}^{n}\|w_{i,j}\|\left\|Y_{jt}(s)-\mathbb{E}[Y_{jt}(s)\mid\mathcal{F}_{it}(\delta\overline{\Delta})]\right\|_{2}$	(B.44)
	$\displaystyle\lesssim\sum_{j:\Delta(i,j)\leq\overline{\Delta}}\left\|Y_{jt}(s)-\mathbb{E}[Y_{jt}(s)\mid\mathcal{F}_{jt}((\delta-1)\overline{\Delta})]\right\|_{2}$	(B.45)
	$\displaystyle\lesssim\varrho^{\lfloor\delta\rfloor},$	(B.46)

which implies that $\{\overline{Y}_{it}(s)\}$ is uniformly and geometrically $L^{2}$-NED. By Assumption 3.4,

$\displaystyle\left\|A(\overline{Y}_{it},s)-\mathbb{E}[A(\overline{Y}_{it},s)\mid\mathcal{F}_{it}(\delta)]\right\|_{2}$	$\displaystyle=\left\|A(\overline{Y}_{it}-\mathbb{E}[\overline{Y}_{it}\mid\mathcal{F}_{it}(\delta)],s)\right\|_{2}$	(B.47)
	$\displaystyle\leq\left(\int_{0}^{1}\left\|\overline{Y}_{it}(u)-\mathbb{E}[\overline{Y}_{it}(u)\mid\mathcal{F}_{it}(\delta)]\right\|_{2}^{2}\>\omega_{2}(u,s)\text{d}u\right)^{1/2}$	(B.48)
	$\displaystyle\lesssim\varrho^{\lfloor\delta/\overline{\Delta}\rfloor}.$	(B.49)

This proves the desired result. ∎

As a useful consequence of Lemmas B.3 and B.4, we have

$\displaystyle\left\|e_{it}(s;\theta)-\mathbb{E}[e_{it}(s;\theta)\mid\mathcal{F}_{it}(\delta)]\right\|_{2}$	$\displaystyle\leq\left\|Y_{it}(s)-\mathbb{E}[Y_{it}(s)\mid\mathcal{F}_{it}(\delta)]\right\|_{2}$	(B.50)
	$\displaystyle\quad+|\alpha(s;\theta)|\cdot\left\|A(\overline{Y}_{it},s)-\mathbb{E}[A(\overline{Y}_{it},s)\mid\mathcal{F}_{it}(\delta)]\right\|_{2}$	(B.51)
	$\displaystyle\lesssim\varrho^{\lfloor\delta/\overline{\Delta}\rfloor}$	(B.52)

uniformly in $s\in[0,1]$, $\theta\in\Theta_{K}$, and $(i,t)$; that is, $\{e_{it}(s;\theta)\}$ is uniformly and geometrically $L^{2}$-NED.

Lemma B.5.

Suppose that Assumption 3.3(i) holds. Let $\{\xi_{it}:i\in\mathcal{D}_{n};\;n\geq 1\}$ be a geometrically $L^{2}$-NED random field on $\{\varepsilon_{it}:i\in\mathcal{D}_{n};\;n\geq 1\}$ for all $t\in[T]$, independent of $\{\varepsilon_{it^{\prime}}:i\in\mathcal{D}_{n};\;n\geq 1\}$ for $t^{\prime}\neq t$. Denote $\vec{\xi}_{it}\coloneqq\xi_{i,t+1}-\xi_{it}$ and $C_{\xi}\coloneqq\max_{i,t}\|\xi_{it}\|_{2}$. Then,

(i)

$\left|\text{Cov}\left(\vec{\xi}_{it},\vec{\xi}_{jt}\right)\right|\lesssim C_{\xi}^{2}\rho(\Delta(i,j)/3)$ for all $t\in[T-1]$ with some geometric NED coefficient $\rho$;
(ii)

$\left|\text{Cov}\left(\vec{\xi}_{i,t+1},\vec{\xi}_{jt}\right)\right|\lesssim C_{\xi}^{2}\rho(\Delta(i,j)/3)$ for all $t\in[T-2]$ with some geometric NED coefficient $\rho$.

Proof.

Since the proofs are similar, we only prove (ii). Decompose $\vec{\xi}_{it}=\vec{\xi}_{1,it}^{(\delta)}+\vec{\xi}_{2,it}^{(\delta)}$ , where

\displaystyle\vec{\xi}_{1,it}^{(\delta)}\coloneqq\mathbb{E}\left[\vec{\xi}_{it}\mid\mathcal{F}^{+}_{it}(\delta)\right],\;\;\text{and}\;\;\vec{\xi}_{2,it}^{(\delta)}\coloneqq\vec{\xi}_{it}-\mathbb{E}\left[\vec{\xi}_{it}\mid\mathcal{F}^{+}_{it}(\delta)\right],

(B.53)

where $\mathcal{F}^{+}_{it}(\delta)$ is the $\sigma$-field generated by $\{(\varepsilon_{i^{\prime}t},\varepsilon_{i^{\prime},t+1}):\Delta(i,i^{\prime})\leq\delta\}$. Since $\varepsilon_{i^{\prime}t}$ and $\varepsilon_{i^{\prime},t+1}$ are assumed to be independent, $\mathcal{F}^{+}_{it}(\delta)=\mathcal{F}_{it}(\delta)\lor\mathcal{F}_{i,t+1}(\delta)$ holds. Then, for each pair $\vec{\xi}_{i,t+1}$ and $\vec{\xi}_{jt}$, denoting $\delta_{ij}\coloneqq\Delta(i,j)/3$,

$\displaystyle\left\|\mathrm{Cov}\left(\vec{\xi}_{i,t+1},\vec{\xi}_{jt}\right)\right\|$	$\displaystyle=\left\|\mathrm{Cov}\left(\vec{\xi}_{1,i,t+1}^{(\delta_{ij})}+\vec{\xi}_{2,i,t+1}^{(\delta_{ij})},\vec{\xi}_{1,jt}^{(\delta_{ij})}+\vec{\xi}_{2,jt}^{(\delta_{ij})}\right)\right\|$	(B.54)
	$\displaystyle\leq\left\|\mathrm{Cov}\left(\vec{\xi}_{1,i,t+1}^{(\delta_{ij})},\vec{\xi}_{1,jt}^{(\delta_{ij})}\right)\right\|+\left\|\mathrm{Cov}\left(\vec{\xi}_{1,i,t+1}^{(\delta_{ij})},\vec{\xi}_{2,jt}^{(\delta_{ij})}\right)\right\|$	(B.55)
	$\displaystyle\quad+\left\|\mathrm{Cov}\left(\vec{\xi}_{2,i,t+1}^{(\delta_{ij})},\vec{\xi}_{1,jt}^{(\delta_{ij})}\right)\right\|+\left\|\mathrm{Cov}\left(\vec{\xi}_{2,i,t+1}^{(\delta_{ij})},\vec{\xi}_{2,jt}^{(\delta_{ij})}\right)\right\|.$	(B.56)

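The first term on the right-hand side is zero by Assumption 3.3(i): $\vec{\xi}_{1,i,t+1}^{(\delta_{ij})}$ and $\vec{\xi}_{1,jt}^{(\delta_{ij})}$ are measurable with respect to $\sigma$-fields generated by the $\varepsilon$'s at two disjoint sets of locations (within distance $\Delta(i,j)/3$ of $i$ and of $j$, respectively) and are therefore independent. Note that, by Jensen’s and the triangle inequalities, $\|\vec{\xi}_{1,it}^{(\delta_{ij})}\|_{2}\leq\|\vec{\xi}_{it}\|_{2}\leq\|\xi_{i,t+1}\|_{2}+\|\xi_{it}\|_{2}\leq 2C_{\xi}$. In addition, $\|\vec{\xi}_{2,it}^{(\delta_{ij})}\|_{2}\leq 2\|\vec{\xi}_{it}\|_{2}\leq 4C_{\xi}$. Then, since $\{\xi_{it}\}$ is assumed to be $L^{2}$-NED on $\{\varepsilon_{it}\}$ at each $t$, it holds that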

	$\displaystyle\left\|\vec{\xi}_{2,it}^{(\delta_{ij})}\right\|_{2}$	$\displaystyle=\left\|\vec{\xi}_{it}-\mathbb{E}\left[\vec{\xi}_{it}\mid\mathcal{F}^{+}_{it}(\delta_{ij})\right]\right\|_{2}$	(B.57)
	$\displaystyle\leq\left\|\xi_{i,t+1}-\mathbb{E}\left[\xi_{i,t+1}\mid\mathcal{F}_{i,t+1}(\delta_{ij})\right]\right\|_{2}+\left\|\xi_{it}-\mathbb{E}\left[\xi_{it}\mid\mathcal{F}_{it}(\delta_{ij})\right]\right\|_{2}\leq 4C_{\xi}\rho(\delta_{ij}).$	(B.58)

Hence, the Cauchy–Schwarz inequality gives

\displaystyle\left|\mathrm{Cov}\left(\vec{\xi}_{1,i,t+1}^{(\delta_{ij})},\vec{\xi}_{2,jt}^{(\delta_{ij})}\right)\right|

\displaystyle\leq 4\left\|\vec{\xi}_{1,i,t+1}^{(\delta_{ij})}\right\|_{2}\left\|\vec{\xi}_{2,jt}^{(\delta_{ij})}\right\|_{2}\leq 32C_{\xi}^{2}\rho(\delta_{ij}).

(B.59)

The same inequality applies to $\left|\mathrm{Cov}\left(\vec{\xi}_{2,i,t+1}^{(\delta_{ij})},\vec{\xi}_{1,jt}^{(\delta_{ij})}\right)\right|$ . Furthermore,

\displaystyle\left|\mathrm{Cov}\left(\vec{\xi}_{2,i,t+1}^{(\delta_{ij})},\vec{\xi}_{2,jt}^{(\delta_{ij})}\right)\right|

\displaystyle\leq 4\left\|\vec{\xi}_{2,i,t+1}^{(\delta_{ij})}\right\|_{2}\left\|\vec{\xi}_{2,jt}^{(\delta_{ij})}\right\|_{2}\leq 64C_{\xi}^{2}\rho(\delta_{ij}).

(B.60)

This completes the proof. ∎
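Lemma B.5 can be illustrated exactly in a hypothetical scalar setting (an assumption for illustration, not the paper's model): $\xi_{it}=Y_{it}$ with $Y_{t}=S\varepsilon_{t}$, $S=(I-aW)^{-1}$, $W$ the row-normalised adjacency matrix of a path graph, and $\varepsilon_{it}$ i.i.d. over $(i,t)$ with variance $\sigma^{2}$. Independence over time gives $\text{Cov}(\vec{\xi}_{it},\vec{\xi}_{jt})=2\Sigma_{ij}$ and $\text{Cov}(\vec{\xi}_{i,t+1},\vec{\xi}_{jt})=-\Sigma_{ij}$ with $\Sigma=\sigma^{2}SS^{\top}$. Because $0\leq S_{ij}\leq a^{|i-j|}/(1-a)$ on a path graph, summing $a^{|i-k|+|j-k|}$ over $k$ bounds $|\Sigma_{ij}|$ by $\sigma^{2}a^{d}(d+1+2a^{2}/(1-a^{2}))/(1-a)^{2}$ at distance $d=|i-j|$, a geometric decay up to a linear factor. The helper names are illustrative.

```python
import numpy as np


def path_weights(n):
    """Row-normalised adjacency matrix of a path graph (hypothetical network)."""
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        W[i, nbrs] = 1.0 / len(nbrs)
    return W


def differenced_covariances(n=81, a=0.5, sigma2=1.0, max_dist=30):
    """Exact covariances of first differences between the centre unit and units at distance d."""
    S = np.linalg.inv(np.eye(n) - a * path_weights(n))   # Y_t = S @ eps_t
    Sigma = sigma2 * S @ S.T                               # Cov(Y_it, Y_jt), the same for every t
    i = n // 2
    d = np.arange(max_dist)
    cov_same = 2.0 * Sigma[i, i + d]   # Cov(dY_it, dY_jt): periods t and t+1 both contribute
    cov_lag = -Sigma[i, i + d]         # Cov(dY_{i,t+1}, dY_jt): only period t+1 is shared
    envelope = sigma2 * a**d * (d + 1 + 2 * a**2 / (1 - a**2)) / (1 - a) ** 2
    return d, cov_same, cov_lag, envelope


d, cov_same, cov_lag, envelope = differenced_covariances()
print(np.column_stack([d, cov_same, cov_lag, envelope])[::5])
```

Both covariance sequences stay below (a multiple of) the envelope, which plays the role of $C_{\xi}^{2}\rho(\Delta(i,j)/3)$ in the lemma.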

Lemma B.6.

Suppose that Assumptions 2.1, 3.1 – 3.4, 3.5(i), and 3.7 hold. For all $m\in[M]$ and $\theta\in\Theta_{K}$ ,

(i)

$\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])(\theta_{0}-\theta)/N\right\|\lesssim_{p}\sqrt{K}/\sqrt{nT}$
(ii)

$\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])/N\right\|\lesssim_{p}K/\sqrt{nT}$
(iii)

$\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s_{l})/N\right\|\lesssim_{p}K^{(1-2\pi)/2}$ , $\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{V}(s_{l})]/N\right\|\lesssim K^{(1-2\pi)/2}$
(iv)

$\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})/N\right\|\lesssim_{p}1/\sqrt{nT}$
(v)

$\left|\sum_{l=1}^{L}\left\{\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)-\mathbb{E}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)]\right\}/N\right|\lesssim_{p}1/\sqrt{nT}$
(vi)

$\left|\sum_{l=1}^{L}\bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{V}(s_{l})/N\right|\lesssim_{p}K^{-2\pi}$
(vii)

$\left|\sum_{l=1}^{L}\bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s_{l})/N\right|\lesssim_{p}K^{-\pi}$
(viii)

$\left|\sum_{l=1}^{L}\bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s_{l})/N\right|\lesssim_{p}1/\sqrt{nT}$ .

Proof.

Below, for a generic variable $\text{x}$ indexed by $i$ and $t$, we denote $\vec{\text{x}}_{it}=\text{x}_{i,t+1}-\text{x}_{it}$. In addition, we write $a_{it}(s)\coloneqq A(\overline{Y}_{it},s)$.

(i) Observe that for each $s_{l}$ ,

\displaystyle\left\|\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])(\theta_{0}-\theta)\right\|

\displaystyle=\left\|\sum_{i=1}^{n}\sum_{t=1}^{T-1}\vec{B}_{it}\otimes\phi^{K}(s_{l})[\vec{a}_{it}(s_{l})-\mathbb{E}(\vec{a}_{it}(s_{l}))]\alpha(s_{l};\theta_{0}-\theta)\right\|.

(B.61)

Consider, as a typical element, the first element of $\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])(\theta_{0}-\theta)$; its variance is given by

$\displaystyle\text{Var}[\{\bm{Z}^{\top}\bm{D}^{\top}\bm{D}(\bm{H}-\mathbb{E}[\bm{H}])(\theta_{0}-\theta)\}_{1}]$	$\displaystyle=\mathbb{E}\left(\sum_{i=1}^{n}\sum_{t=1}^{T-1}\vec{Q}^{1}_{it}\phi_{1}(s_{l})(\vec{a}_{it}-\mathbb{E}[\vec{a}_{it}])\alpha(s_{l};\theta_{0}-\theta)\right)^{2}$	(B.62)
	$\displaystyle\lesssim\sum_{t=1}^{T-1}\sum_{t^{\prime}=1}^{T-1}\sum_{i=1}^{n}\sum_{i^{\prime}=1}^{n}\left\|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t^{\prime}})\right\|$	(B.63)
	$\displaystyle=\sum_{t=1}^{T-1}\sum_{i=1}^{n}\text{Var}(\vec{a}_{it})+\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\left\|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t})\right\|$	(B.64)
	$\displaystyle\quad+\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{i=1}^{n}\left\|\text{Cov}(\vec{a}_{it},\vec{a}_{it^{\prime}})\right\|+\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\left\|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t^{\prime}})\right\|.$	(B.65)

Here, the argument $(s_{l})$ is occasionally omitted for notational simplicity. First, from Assumptions 2.1(i), 3.2(ii), and 3.4, it is easy to see that $\sum_{t=1}^{T-1}\sum_{i=1}^{n}\text{Var}(\vec{a}_{it})\lesssim nT$. Second, by Lemmas B.4 and B.5(i), there exists a geometric NED coefficient $\rho$ that satisfies

$\displaystyle\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\left\|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t})\right\|$	$\displaystyle\lesssim\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\rho(\Delta(i,i^{\prime})/3)$	(B.66)
	$\displaystyle=\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{m=1}^{\infty}\sum_{i^{\prime}:\Delta(i,i^{\prime})\in[m,m+1)}\rho(\Delta(i,i^{\prime})/3)$	(B.67)
	$\displaystyle\lesssim\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{m=1}^{\infty}m^{d-1}\rho(m/3)\lesssim nT,$	(B.68)

where the second inequality follows from Lemma A.1(iii) of Jenish and Prucha (2009), and the final bound holds because $\rho$ decays geometrically, so that $\sum_{m=1}^{\infty}m^{d-1}\rho(m/3)<\infty$. Third, since $\vec{a}_{it}$ and $\vec{a}_{it^{\prime}}$ are independent if $|t-t^{\prime}|\geq 2$, it holds that

\displaystyle\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{i=1}^{n}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{it^{\prime}})\right|

\displaystyle=\sum_{t=1}^{T-1}\sum_{t^{\prime}\in\{t-1,t+1\}}\sum_{i=1}^{n}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{it^{\prime}})\right|\lesssim nT.

(B.69)

Finally, noting that $\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t^{\prime}})\right|=\sum_{t=1}^{T-1}\sum_{t^{\prime}\in\{t-1,t+1\}}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t^{\prime}})\right|$, Lemma B.5(ii), combined with the summation argument in (B.66)–(B.68), implies that this term is also $O(nT)$.

Combining the above results shows that $\mathbb{E}\left\|\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])(\theta_{0}-\theta)/(n(T-1))\right\|^{2}\lesssim K/(nT)$ for each $s_{l}$, which completes the proof by Markov’s and the triangle inequalities.
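The per-unit factor in (B.68) is finite for any geometric coefficient. As a minimal sketch, assume the hypothetical form $\rho(x)=r^{x}$ with $0<r<1$; then $\sum_{m\geq 1}m^{d-1}\rho(m/3)=\sum_{m\geq 1}m^{d-1}x^{m}$ with $x=r^{1/3}<1$, which has the standard closed forms $x/(1-x)$, $x/(1-x)^{2}$, and $x(1+x)/(1-x)^{3}$ for $d=1,2,3$. The code compares truncated sums with these closed forms; multiplying by the $n(T-1)$ outer terms gives the $O(nT)$ order.

```python
import math


def weighted_tail_sum(d, r, terms=3000):
    """Partial sum of sum_{m>=1} m**(d-1) * rho(m/3) with the hypothetical rho(x) = r**x."""
    x = r ** (1.0 / 3.0)
    return math.fsum(m ** (d - 1) * x**m for m in range(1, terms + 1))


r = 0.5
x = r ** (1.0 / 3.0)
closed_forms = {
    1: x / (1 - x),                  # sum_m x^m
    2: x / (1 - x) ** 2,             # sum_m m x^m
    3: x * (1 + x) / (1 - x) ** 3,   # sum_m m^2 x^m
}
for d, exact in closed_forms.items():
    print(d, weighted_tail_sum(d, r), exact)
```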

(ii) Analogous to the proof of (i).

(iii) Observe that

	$\displaystyle\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s_{l})/N\right\|$	$\displaystyle=\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\phi^{K}(s_{l})\vec{v}_{it}(s_{l})/N\right\|$	(B.70)
	$\displaystyle\lesssim\frac{1}{N}\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\left\|\vec{v}_{it}(s_{l})\phi^{K}(s_{l})\right\|.$	(B.71)

Then, the result follows from Markov’s inequality and $\mathbb{E}\|\vec{v}_{it}(s_{l})\phi^{K}(s_{l})\|\leq\mathbb{E}|\vec{v}_{it}(s_{l})|\sup_{s\in[0,1]}\|\phi^{K}(s)\|\lesssim K^{(1-2\pi)/2}$. The second part can be proved analogously.
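The factor $\sup_{s\in[0,1]}\|\phi^{K}(s)\|\lesssim\sqrt{K}$ is easy to verify for common orthonormal bases. A minimal sketch, assuming the cosine basis $\phi_{1}(s)=1$, $\phi_{k}(s)=\sqrt{2}\cos((k-1)\pi s)$ for $k\geq 2$ (a hypothetical choice; the basis used in the paper may differ): since $\|\phi^{K}(s)\|^{2}=1+2\sum_{k=1}^{K-1}\cos^{2}(k\pi s)\leq 2K-1$, with equality at $s=0$, the supremum equals $\sqrt{2K-1}\leq\sqrt{2}\sqrt{K}$. The helper names are illustrative.

```python
import numpy as np


def cosine_basis(s, K):
    """Hypothetical orthonormal cosine basis on [0, 1]; returns an array of shape (len(s), K)."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(s, np.arange(K)))
    Phi[:, 0] = 1.0
    return Phi


def sup_basis_norm(K, grid_size=2001):
    """Approximate sup_s ||phi^K(s)|| over an equally spaced grid that contains s = 0."""
    grid = np.linspace(0.0, 1.0, grid_size)
    return np.linalg.norm(cosine_basis(grid, K), axis=1).max()


for K in (5, 20, 80):
    print(K, sup_basis_norm(K), np.sqrt(2 * K - 1))
```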

(iv) By the triangle inequality,

	$\displaystyle\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})/N\right\|$	$\displaystyle=\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\phi^{K}(s_{l})\vec{\varepsilon}_{it}(s_{l})/N\right\|$	(B.72)
	$\displaystyle\leq\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\varepsilon_{i,t+1}(s_{l})\phi^{K}(s_{l})/N\right\|+\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\varepsilon_{it}(s_{l})\phi^{K}(s_{l})/N\right\|.$	(B.73)

Further, by Assumptions 3.3(i) and (iii),

	$\displaystyle\mathbb{E}\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\varepsilon_{it}(s_{l})\phi^{K}(s_{l})/N\right\|^{2}$	(B.74)
	$\displaystyle=\frac{1}{n^{2}(T-1)^{2}}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\text{trace}\left\{\vec{B}_{it}\vec{B}_{it}^{\top}\otimes\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{it}(s_{l},s_{l^{\prime}})\phi^{K}(s_{l})\phi^{K}(s_{l^{\prime}})^{\top}\right\}$		(B.75)
	$\displaystyle=\frac{1}{n^{2}(T-1)^{2}}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\text{trace}\left\{\vec{B}_{it}\vec{B}_{it}^{\top}\right\}\text{trace}\left\{\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{it}(s_{l},s_{l^{\prime}})\phi^{K}(s_{l})\phi^{K}(s_{l^{\prime}})^{\top}\right\}\lesssim 1/(nT).$		(B.76)

Repeating the same calculation for the other term and applying Markov’s inequality gives the result.
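Two facts drive the step from (B.75) to (B.76): $\text{trace}(A\otimes B)=\text{trace}(A)\,\text{trace}(B)$, and the second trace stays bounded as $K$ grows rather than growing like $K$. A minimal sketch, assuming a hypothetical cosine basis, a midpoint grid $s_{l}=(l-1/2)/L$, and a Brownian-motion kernel $\Gamma(s,s^{\prime})=\min(s,s^{\prime})$ standing in for $\Gamma_{it}$ (none of these is taken from the paper): on this grid the basis is discretely orthonormal for $K\leq L$, so the term is a partial trace of a positive semidefinite matrix and cannot exceed $L^{-1}\sum_{l}\Gamma(s_{l},s_{l})=1/2$. Helper names are illustrative.

```python
import numpy as np


def cosine_basis(s, K):
    """Hypothetical orthonormal cosine basis on [0, 1], evaluated at the points s."""
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(s, np.arange(K)))
    Phi[:, 0] = 1.0
    return Phi


def kernel_trace_term(K, L=200):
    """trace{(1/L^2) sum_{l,l'} Gamma(s_l, s_l') phi^K(s_l) phi^K(s_l')'} for Gamma = min."""
    s = (np.arange(1, L + 1) - 0.5) / L          # midpoint grid
    Gamma = np.minimum.outer(s, s)               # hypothetical covariance kernel
    Phi = cosine_basis(s, K)                     # L x K
    return np.trace(Phi.T @ Gamma @ Phi) / L**2


vals = [kernel_trace_term(K) for K in (5, 20, 80, 200)]
print(vals)

# Kronecker-trace identity used in (B.76), with random stand-ins for B_it and the kernel matrix.
rng = np.random.default_rng(2)
b = rng.normal(size=3)
M = rng.normal(size=(4, 4))
kron_lhs = np.trace(np.kron(np.outer(b, b), M))
kron_rhs = np.trace(np.outer(b, b)) * np.trace(M)
```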

(v) Observe that

	$\displaystyle\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)-\mathbb{E}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)]$		(B.77)
	$\displaystyle=\sum_{t=1}^{T-1}\sum_{1\leq i,j\leq n}p_{m,i,j}\left(\vec{e}_{it}(s_{l};\theta)\vec{e}_{jt}(s_{l};\theta)-\mathbb{E}[\vec{e}_{it}(s_{l};\theta)\vec{e}_{jt}(s_{l};\theta)]\right).$		(B.78)

Here, let $\bm{e}_{m,jt}(s_{l};\theta)\coloneqq\sum_{i=1}^{n}p_{m,i,j}e_{it}(s_{l};\theta)$ and recall that there is a constant $\overline{\Delta}_{m}$ such that $p_{m,i,j}=0$ if $\Delta(i,j)>\overline{\Delta}_{m}$ . Then, noting that $\mathcal{F}_{it}((\delta-1)\overline{\Delta}_{m})\subseteq\mathcal{F}_{jt}(\delta\overline{\Delta}_{m})$ for $(i,j)$ with $\Delta(i,j)\leq\overline{\Delta}_{m}$ and $\delta>1$ ,

$\displaystyle\left\|\bm{e}_{m,jt}(s_{l};\theta)-\mathbb{E}[\bm{e}_{m,jt}(s_{l};\theta)\mid\mathcal{F}_{jt}(\delta\overline{\Delta}_{m})]\right\|_{2}$	$\displaystyle\leq\sum_{i=1}^{n}\|p_{m,i,j}\|\left\|e_{it}(s_{l};\theta)-\mathbb{E}[e_{it}(s_{l};\theta)\mid\mathcal{F}_{jt}(\delta\overline{\Delta}_{m})]\right\|_{2}$	(B.79)
	$\displaystyle\lesssim\sum_{i:\Delta(i,j)\leq\overline{\Delta}_{m}}\left\|e_{it}(s_{l};\theta)-\mathbb{E}[e_{it}(s_{l};\theta)\mid\mathcal{F}_{it}((\delta-1)\overline{\Delta}_{m})]\right\|_{2}$	(B.80)
	$\displaystyle\lesssim\varrho^{\lfloor(\delta-1)\overline{\Delta}_{m}/\overline{\Delta}\rfloor},$	(B.81)

which implies that $\{\bm{e}_{m,jt}(s_{l};\theta)\}$ is uniformly and geometrically $L^{2}$ -NED, for all $l\in[L]$ , $m\in[M]$ , and $\theta\in\Theta_{K}$ .

Now, suppressing the dependence on both $s_{l}$ and $\theta$ ,

$\displaystyle\text{Var}[\bm{E}^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}]$	$\displaystyle=\mathbb{E}\left(\sum_{t=1}^{T-1}\sum_{1\leq i,j\leq n}p_{m,i,j}\{\vec{e}_{it}\vec{e}_{jt}-\mathbb{E}[\vec{e}_{it}\vec{e}_{jt}]\}\right)^{2}$	(B.82)
	$\displaystyle=\mathbb{E}\left(\sum_{t=1}^{T-1}\sum_{j=1}^{n}\{\vec{\bm{e}}_{m,jt}\vec{e}_{jt}-\mathbb{E}[\vec{\bm{e}}_{m,jt}\vec{e}_{jt}]\}\right)^{2}$	(B.83)
	$\displaystyle\leq\sum_{t=1}^{T-1}\sum_{t^{\prime}=1}^{T-1}\sum_{j=1}^{n}\sum_{j^{\prime}=1}^{n}\left\|\text{Cov}(\vec{\bm{e}}_{m,jt}\vec{e}_{jt},\vec{\bm{e}}_{m,j^{\prime}t^{\prime}}\vec{e}_{j^{\prime}t^{\prime}})\right\|$	(B.84)
	$\displaystyle=\sum_{t=1}^{T-1}\sum_{j=1}^{n}\text{Var}(\vec{\bm{e}}_{m,jt}\vec{e}_{jt})+\sum_{t=1}^{T-1}\sum_{j=1}^{n}\sum_{j^{\prime}\neq j}\left\|\text{Cov}(\vec{\bm{e}}_{m,jt}\vec{e}_{jt},\vec{\bm{e}}_{m,j^{\prime}t}\vec{e}_{j^{\prime}t})\right\|$	(B.85)
	$\displaystyle\quad+\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{j=1}^{n}\left\|\text{Cov}(\vec{\bm{e}}_{m,jt}\vec{e}_{jt},\vec{\bm{e}}_{m,jt^{\prime}}\vec{e}_{jt^{\prime}})\right\|+\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{j=1}^{n}\sum_{j^{\prime}\neq j}\left\|\text{Cov}(\vec{\bm{e}}_{m,jt}\vec{e}_{jt},\vec{\bm{e}}_{m,j^{\prime}t^{\prime}}\vec{e}_{j^{\prime}t^{\prime}})\right\|.$	(B.86)

As shown in (A.5), $\|e_{jt}\|_{p}<\infty$ and $\|\bm{e}_{m,jt}\|_{p}<\infty$ for $p>4$. This allows us to use Lemma A.1 of Xu and Lee (2015) (see also Corollary 4.3(b) of Gallant and White (1988)) to show that $\{\bm{e}_{m,jt}e_{jt}\}$ is uniformly and geometrically $L^{2}$-NED. Then, following the same argument as in the proof of (i), we can show that $\text{Var}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)]\lesssim nT$ for each $s_{l}$, which gives the desired result by Markov’s and the triangle inequalities.
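The passage from (B.82) to (B.83) rewrites the quadratic form as $\sum_{i,j}p_{m,i,j}e_{i}e_{j}=\sum_{j}\bm{e}_{m,j}e_{j}$ with $\bm{e}_{m,j}=\sum_{i}p_{m,i,j}e_{i}$, and the NED bound (B.79)–(B.81) uses the fact that $\bm{e}_{m,j}$ only involves units within $\overline{\Delta}_{m}$ of $j$. A single-period sketch, assuming a hypothetical banded random matrix in place of $P_{m}$ and index gaps in place of $\Delta(i,j)$:

```python
import numpy as np


def banded_matrix(n, bandwidth, rng):
    """Hypothetical P_m with p_ij = 0 whenever |i - j| > bandwidth."""
    P = rng.normal(size=(n, n))
    i, j = np.indices((n, n))
    P[np.abs(i - j) > bandwidth] = 0.0
    return P


rng = np.random.default_rng(3)
n, bandwidth = 50, 2
P = banded_matrix(n, bandwidth, rng)
e = rng.normal(size=n)

quad_form = e @ P @ e              # sum_{i,j} p_ij e_i e_j, as in (B.82)
e_m = P.T @ e                      # bold e_{m,j} = sum_i p_ij e_i
rewritten = np.sum(e_m * e)        # sum_j bold e_{m,j} e_j, as in (B.83)

# Redrawing e_i far from unit j leaves bold e_{m,j} unchanged.
j = 25
e_far = e.copy()
far = np.abs(np.arange(n) - j) > bandwidth
e_far[far] = rng.normal(size=far.sum())
local_unchanged = np.isclose((P.T @ e_far)[j], e_m[j])
print(quad_form, rewritten, local_unchanged)
```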

(vi), (vii) These can be proved in a similar manner to the proof of Lemma B.1.

(viii) For each $s_{l}$ , $|\bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s_{l})/(n(T-1))|\lesssim_{p}1/\sqrt{nT}$ holds under Assumptions 3.3(i) and (ii), as in Lemma 9 in Yu et al. (2008) and Lemma 1 in Lee and Yu (2014). Then, the result is straightforward.

∎

Lemma B.7.

Suppose that Assumptions 2.1(i), 3.2, 3.4, and 3.5(i) hold. Then, $\sup_{\theta\in\Theta_{K}}\left\|\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|\lesssim\sqrt{K}$ .

Proof.

Observe that

\displaystyle\left\|\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|

\displaystyle\leq\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{E}(s_{l};\theta)]/N\right\|+\sum_{m=1}^{M}\left|\sum_{l=1}^{L}\mathbb{E}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)]/N\right|.

(B.87)

For the first term,

$\displaystyle\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{E}(s_{l};\theta)]/N\right\|$	$\displaystyle=\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\mathbb{E}[\vec{e}_{it}(s_{l};\theta)]\phi^{K}(s_{l})/N\right\|$	(B.88)
	$\displaystyle\lesssim\frac{1}{N}\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\left\|\mathbb{E}[\vec{e}_{it}(s_{l};\theta)]\phi^{K}(s_{l})\right\|$	(B.89)
	$\displaystyle\lesssim\sqrt{K}$	(B.90)

uniformly in $\theta\in\Theta_{K}$, since $|\mathbb{E}[e_{it}(s_{l};\theta)]|\leq\mathbb{E}|e_{it}(s_{l};\theta)|\lesssim 1$ and $\sup_{s\in[0,1]}\|\phi^{K}(s)\|\lesssim\sqrt{K}$.

For the second term,

$\displaystyle\left\|\sum_{l=1}^{L}\mathbb{E}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)]/N\right\|$	$\displaystyle=\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{1\leq i,j\leq n}p_{m,i,j}\mathbb{E}[\vec{e}_{it}(s_{l};\theta)\vec{e}_{jt}(s_{l};\theta)]/N\right\|$	(B.91)
	$\displaystyle\leq\frac{1}{N}\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{1\leq i,j\leq n}\|p_{m,i,j}\|\cdot\|\mathbb{E}[\vec{e}_{it}(s_{l};\theta)\vec{e}_{jt}(s_{l};\theta)]\|$	(B.92)
	$\displaystyle\lesssim 1$	(B.93)

uniformly in $\theta\in\Theta_{K}$ , since

$\displaystyle\|\mathbb{E}[\vec{e}_{it}(s_{l};\theta)\vec{e}_{jt}(s_{l};\theta)]\|$	$\displaystyle\leq\mathbb{E}[\|\vec{e}_{it}(s_{l};\theta)\|\cdot\|\vec{e}_{jt}(s_{l};\theta)\|]$	(B.94)
	$\displaystyle\leq\|\|\vec{e}_{it}(s_{l};\theta)\|\|_{2}\cdot\|\|\vec{e}_{jt}(s_{l};\theta)\|\|_{2}$	(B.95)
	$\displaystyle\leq\{\|\|e_{i,t+1}(s_{l};\theta)\|\|_{2}+\|\|e_{it}(s_{l};\theta)\|\|_{2}\}\cdot\{\|\|e_{j,t+1}(s_{l};\theta)\|\|_{2}+\|\|e_{jt}(s_{l};\theta)\|\|_{2}\}$	(B.96)
	$\displaystyle\lesssim 1.$	(B.97)

This completes the proof. ∎

Lemma B.8.

Suppose that Assumptions 2.1, 3.1 – 3.5, and 3.7 hold. In addition, assume that $K/\sqrt{nT}\to 0$ and $K^{1-\pi}\to 0$ as $nT\to\infty$ . Then, $||\widehat{\theta}_{nT}-\theta_{0}||=o_{P}(1)$ .

Proof.

Observe that

\displaystyle\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]=\left(\begin{array}[]{c}A_{0}(\theta)\\ A_{1}(\theta)\\ \vdots\\ A_{M}(\theta)\end{array}\right)

(B.102)

where

	$\displaystyle A_{0}(\theta)$	$\displaystyle\coloneqq\underbracket{\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])(\theta_{0}-\theta)/N}_{\lesssim_{p}\sqrt{K}/\sqrt{nT}:\>\text{Lemma B.6(i)}}$	(B.103)
	$\displaystyle\quad+\underbracket{\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s_{l})/N-\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{V}(s_{l})]/N}_{\lesssim_{p}K^{(1-2\pi)/2}:\>\text{Lemma B.6(iii)}}+\underbracket{\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})/N}_{\lesssim_{p}1/\sqrt{nT}:\>\text{Lemma B.6(iv)}}$	(B.104)

and, for $m=1,\ldots,M$ ,

\displaystyle A_{m}(\theta)

\displaystyle\coloneqq\underbracket{\sum_{l=1}^{L}\left\{\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)-\mathbb{E}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)]\right\}/N}_{\lesssim_{p}1/\sqrt{nT}:\>\text{Lemma B.6(v)}}.

(B.105)

Hence,

	$\displaystyle\left\|\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|$	$\displaystyle\leq\left\|A_{0}(\theta)\right\|+\sum_{m=1}^{M}\left\|A_{m}(\theta)\right\|$	(B.106)
		$\displaystyle\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{(1-2\pi)/2}$		(B.107)

uniformly in $\theta\in\Theta_{K}$. Further, by the Cauchy–Schwarz inequality and Lemma B.7,

$\displaystyle\sup_{\theta\in\Theta_{K}}\left\|\mathcal{Q}_{nT}(\theta)-\mathcal{Q}^{*}_{nT}(\theta)\right\|$	$\displaystyle\leq\sup_{\theta\in\Theta_{K}}\left\|\left(\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right)^{\top}\Omega_{nT}\left(\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right)\right\|$	(B.108)
	$\displaystyle\quad+2\sup_{\theta\in\Theta_{K}}\left\|\left(\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right)^{\top}\Omega_{nT}\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|$	(B.109)
	$\displaystyle\lesssim\sup_{\theta\in\Theta_{K}}\left\|\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|^{2}+\sup_{\theta\in\Theta_{K}}\left\|\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|\sup_{\theta\in\Theta_{K}}\left\|\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|$	(B.110)
	$\displaystyle\lesssim_{p}K/\sqrt{nT}+K^{1-\pi}.$	(B.111)

Combined with the identifiability of $\theta_{0}$ (Lemma B.2), the above result implies the consistency of $\widehat{\theta}_{nT}$ (see, e.g., the proof of Theorem 3.3 in Su and Hoshino (2016)). ∎
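For completeness, the order in (B.111) follows from (B.107) and Lemma B.7 by elementary arithmetic, using $(x+y)^{2}\leq 2x^{2}+2y^{2}$ together with $K/(nT)\leq K/\sqrt{nT}$ and $K^{1-2\pi}\leq K^{1-\pi}$ for $K,nT\geq 1$ and $\pi>0$:

\displaystyle\begin{split}\sup_{\theta\in\Theta_{K}}\left\|\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|^{2}&\lesssim_{p}\left(\frac{\sqrt{K}}{\sqrt{nT}}+K^{(1-2\pi)/2}\right)^{2}\lesssim\frac{K}{nT}+K^{1-2\pi},\\ \sup_{\theta\in\Theta_{K}}\left\|\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|\cdot\sup_{\theta\in\Theta_{K}}\left\|\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|&\lesssim_{p}\sqrt{K}\left(\frac{\sqrt{K}}{\sqrt{nT}}+K^{(1-2\pi)/2}\right)=\frac{K}{\sqrt{nT}}+K^{1-\pi}.\end{split}

The second line dominates the first, which gives the stated order $K/\sqrt{nT}+K^{1-\pi}$.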

Proof of Theorem 3.1.

(i) Given the consistency result in Lemma B.8, if we can show that for an arbitrary $\epsilon>0$ , there exists a constant $C_{\epsilon}$ such that for all sufficiently large $nT$ ,

\displaystyle\Pr\left(\inf_{||\bm{u}||=C_{\epsilon}}\mathcal{Q}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})>\mathcal{Q}_{nT}(\theta_{0})\right)\geq 1-\epsilon,

(B.112)

we can conclude that $||\widehat{\theta}_{nT}-\theta_{0}||\lesssim_{p}\zeta_{nT}$ .

Decompose

	$\displaystyle\mathcal{Q}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\mathcal{Q}_{nT}(\theta_{0})$	$\displaystyle=\underbracket{\left(\overline{g}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\overline{g}_{nT}(\theta_{0})\right)^{\top}\Omega_{nT}\left(\overline{g}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\overline{g}_{nT}(\theta_{0})\right)}_{\eqqcolon\widetilde{A}_{nT}(\theta)}$		(B.113)
		$\displaystyle\quad+2\left(\overline{g}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\overline{g}_{nT}(\theta_{0})\right)^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0}).$		(B.114)

Lemma B.6(ii) implies that $\|\Pi_{nT}-\widehat{\Pi}_{nT}\|=o_{P}(1)$, where $\widehat{\Pi}_{nT}\coloneqq\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{H}(s_{l})/N$. Thus, by Assumption 3.6, we have $\lambda_{\min}(\widehat{\Pi}_{nT}^{\top}\widehat{\Pi}_{nT})>0$ with probability approaching one. Observing that

\displaystyle\overline{g}_{nT}(\theta)-\overline{g}_{nT}(\theta_{0})

\displaystyle=\left(\begin{array}[]{c}\widehat{\Pi}_{nT}\\ \sum_{l=1}^{L}\left[\bm{H}(s_{l})(\theta_{0}-\theta)+2\bm{V}(s_{l})+2\bm{\mathcal{E}}(s_{l})\right]^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{H}(s_{l})/N\\ \vdots\\ \sum_{l=1}^{L}\left[\bm{H}(s_{l})(\theta_{0}-\theta)+2\bm{V}(s_{l})+2\bm{\mathcal{E}}(s_{l})\right]^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{H}(s_{l})/N\end{array}\right)(\theta_{0}-\theta),

(B.119)

we obtain $\widetilde{A}_{nT}(\theta)\geq c_{1}\zeta_{nT}^{2}C_{\epsilon}^{2}$ for some $c_{1}>0$ with probability approaching one.

For the second term, the Cauchy–Schwarz inequality gives

	$\displaystyle\left\|\left(\overline{g}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\overline{g}_{nT}(\theta_{0})\right)^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0})\right\|$	$\displaystyle\leq(\widetilde{A}_{nT}(\theta))^{1/2}\left(\overline{g}_{nT}(\theta_{0})^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0})\right)^{1/2}$		(B.120)
	$\displaystyle\leq c_{2}(\widetilde{A}_{nT}(\theta))^{1/2}\|\overline{g}_{nT}(\theta_{0})\|.$	(B.121)

Hence,

	$\displaystyle\mathcal{Q}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\mathcal{Q}_{nT}(\theta_{0})$	$\displaystyle\geq\widetilde{A}_{nT}(\theta)-2c_{2}(\widetilde{A}_{nT}(\theta))^{1/2}\|\overline{g}_{nT}(\theta_{0})\|$	(B.122)
	$\displaystyle=(\widetilde{A}_{nT}(\theta))^{1/2}((\widetilde{A}_{nT}(\theta))^{1/2}-2c_{2}\|\overline{g}_{nT}(\theta_{0})\|).$	(B.123)

Since $(\widetilde{A}_{nT}(\theta))^{1/2}$ is bounded below by $\sqrt{c_{1}}\zeta_{nT}C_{\epsilon}$, if we set $\zeta_{nT}\propto\|\overline{g}_{nT}(\theta_{0})\|$, we obtain the desired inequality by choosing a sufficiently large $C_{\epsilon}$. From Lemma B.6(iii), (iv), (vi), (vii), and (viii), we have

\displaystyle\left\|\overline{g}_{nT}(\theta_{0})\right\|\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2},

(B.124)

and this completes the proof.
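To see how the two terms in (B.124) and the rate conditions of Lemma B.8 interact, the sketch below evaluates them for a hypothetical sieve dimension $K\approx(nT)^{1/4}$ and smoothness index $\pi=2$. Both choices are assumptions for illustration, not recommendations from the paper. With these values the approximation term $K^{(1-2\pi)/2}=K^{-3/2}\approx(nT)^{-3/8}$ dominates the stochastic term $1/\sqrt{nT}$, while $K/\sqrt{nT}$ and $K^{1-\pi}$ both shrink as $nT$ grows.

```python
import math


def rate_terms(nT, K, pi):
    """The two terms of the bound (B.124): 1/sqrt(nT) and K^{(1 - 2 pi)/2}."""
    return 1.0 / math.sqrt(nT), K ** ((1.0 - 2.0 * pi) / 2.0)


def lemma_b8_conditions(nT, K, pi):
    """Quantities that Lemma B.8 requires to vanish: K / sqrt(nT) and K^{1 - pi}."""
    return K / math.sqrt(nT), K ** (1.0 - pi)


pi = 2.0                                   # hypothetical smoothness index
rows = []
for nT in (10**3, 10**5, 10**7):
    K = max(1, round(nT ** 0.25))          # hypothetical sieve dimension, K ~ (nT)^{1/4}
    rows.append((nT, K, *rate_terms(nT, K, pi), *lemma_b8_conditions(nT, K, pi)))
for row in rows:
    print(row)
```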

(ii) Note that $\int_{0}^{1}\phi^{K}(s)\phi^{K}(s)^{\top}\text{d}s=I_{K}$ by orthonormality. Then, by result (i) and Assumption 3.7,

$\displaystyle\left\|\widehat{\alpha}_{nT}-\alpha_{0}\right\|_{L^{2}}$	$\displaystyle\leq\left\|\phi^{K}(\cdot)^{\top}(\widehat{\theta}_{nT,\alpha}-\theta_{0\alpha})\right\|_{L^{2}}+\left\|\phi^{K}(\cdot)^{\top}\theta_{0\alpha}-\alpha_{0}(\cdot)\right\|_{L^{2}}$	(B.125)
	$\displaystyle\lesssim\left((\widehat{\theta}_{nT,\alpha}-\theta_{0\alpha})^{\top}\left[\int_{0}^{1}\phi^{K}(s)\phi^{K}(s)^{\top}\text{d}s\right](\widehat{\theta}_{nT,\alpha}-\theta_{0\alpha})\right)^{1/2}+K^{-\pi}$	(B.126)
	$\displaystyle\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2}.$	(B.127)

It is also straightforward to see that $\sup_{s\in[0,1]}|\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)|\leq\sup_{s\in[0,1]}\|\phi^{K}(s)\|\cdot\|\widehat{\theta}_{nT,\alpha}-\theta_{0\alpha}\|+K^{-\pi}\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}$.
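The step from (B.125) to (B.126) uses $\int_{0}^{1}\phi^{K}(s)\phi^{K}(s)^{\top}\text{d}s=I_{K}$, so that $\|\phi^{K}(\cdot)^{\top}c\|_{L^{2}}=\|c\|$ for any coefficient vector $c$. A numerical sketch, assuming a hypothetical cosine basis (the paper's basis may differ) and a midpoint rule with $L\geq K$ nodes, which integrates products of these cosines exactly; the random vector $c$ merely plays the role of $\widehat{\theta}_{nT,\alpha}-\theta_{0\alpha}$.

```python
import numpy as np


def cosine_basis(s, K):
    """Hypothetical orthonormal cosine basis on [0, 1], evaluated at the points s."""
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(s, np.arange(K)))
    Phi[:, 0] = 1.0
    return Phi


def l2_norm_of_expansion(c, L=1000):
    """Midpoint-rule value of || phi^K(.)' c ||_{L^2[0,1]}."""
    s = (np.arange(1, L + 1) - 0.5) / L
    f = cosine_basis(s, len(c)) @ c
    return np.sqrt(np.mean(f**2))


K, L = 20, 1000
s = (np.arange(1, L + 1) - 0.5) / L
Phi = cosine_basis(s, K)
gram = Phi.T @ Phi / L                     # midpoint approximation of int phi phi' ds
c = np.random.default_rng(4).normal(size=K)
print(np.abs(gram - np.eye(K)).max(), l2_norm_of_expansion(c), np.linalg.norm(c))
```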

(iii) Analogous to the proof of (ii). ∎

Lemma B.9.

Suppose that Assumptions 2.1, 3.1 – 3.7, and 3.8(i), (ii) hold. In addition, assume that $K/\sqrt{nT}\to 0$ and $K^{1-\pi}\to 0$ as $nT\to\infty$ . Let $\overline{\theta}_{nT}$ be any vector in between $\widehat{\theta}_{nT}$ and $\theta_{0}$ . Then,

(i)

$\left\|\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT}\right\|\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}$
(ii)

$\left\|\overline{J}_{nT}(\widehat{\theta}_{nT})\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}-\overline{J}_{nT}\overline{J}^{\top}_{nT}\right\|\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}$
(iii)

$\left\|\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})-\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right\|\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}$
(iv)

$\left\|\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)^{-}-\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\right\|\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}$

Proof.

(i) Observe that

\displaystyle\left\|\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT}\right\|\leq B_{1,nT}+2\sum_{m=1}^{M}\left(B_{2,m,nT}+B_{3,m,nT}\right)

(B.128)

where

$\displaystyle B_{1,nT}$	$\displaystyle\coloneqq\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\left\{\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})]\right\}/N\right\|\lesssim_{p}\underset{\text{Lemma B.6(ii)}}{K/\sqrt{nT}}$	(B.129)
$\displaystyle B_{2,m,nT}$	$\displaystyle\coloneqq\left\|\sum_{l=1}^{L}\left\{\bm{E}(s_{l};\widehat{\theta}_{nT})-\bm{E}(s_{l};\theta_{0})\right\}^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s_{l})/N\right\|$	(B.130)
$\displaystyle B_{3,m,nT}$	$\displaystyle\coloneqq\left\|\sum_{l=1}^{L}\left\{\bm{E}(s_{l};\theta_{0})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s_{l})-\mathbb{E}[\bm{E}(s_{l};\theta_{0})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s_{l})]\right\}/N\right\|.$	(B.131)

In a manner similar to the proof of Lemma B.6(v), we can show that $\|N^{-1}\sum_{l=1}^{L}\{\bm{H}(s_{l})^{\top}\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})^{\top}\bm{H}(s_{l})]\}\|\lesssim_{p}K/\sqrt{nT}$. Then, by Assumption 3.8(i) and Theorem 3.1(i), we have

$\displaystyle B_{2,m,nT}$	$\displaystyle=\left\|\left(\frac{1}{N}\sum_{l=1}^{L}\bm{H}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s_{l})\right)\left(\widehat{\theta}_{nT}-\theta_{0}\right)\right\|$	(B.132)
	$\displaystyle\leq\lambda_{\max}\left(\frac{1}{N}\sum_{l=1}^{L}\bm{H}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s_{l})\right)\cdot\left\|\widehat{\theta}_{nT}-\theta_{0}\right\|$	(B.133)
	$\displaystyle\leq\lambda_{\max}\left(\bm{D}^{\top}P_{m}\bm{D}\right)\cdot\lambda_{\max}\left(\frac{1}{N}\sum_{l=1}^{L}\bm{H}(s_{l})^{\top}\bm{H}(s_{l})\right)\cdot\left\|\widehat{\theta}_{nT}-\theta_{0}\right\|\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2}.$	(B.134)

For $B_{3,m,nT}$ , by the same argument as in Lemma B.6(v), we can show that $B_{3,m,nT}\lesssim_{p}\sqrt{K}/\sqrt{nT}$ . This completes the proof.

(ii) By the triangle inequality,

$\displaystyle\left\|\overline{J}_{nT}(\widehat{\theta}_{nT})\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}-\overline{J}_{nT}\overline{J}^{\top}_{nT}\right\|$	$\displaystyle\leq\left\|(\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT})(\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT})^{\top}\right\|$	(B.135)
	$\displaystyle\quad+2\left\|\overline{J}_{nT}(\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT})^{\top}\right\|$	(B.136)
	$\displaystyle\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}$	(B.137)

where the last inequality is from result (i) and Assumption 3.8(ii).

(iii) By definition of $\overline{\theta}_{nT}$ , we have $\left\|\overline{\theta}_{nT}-\theta_{0}\right\|\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2}$ and thus $\left\|\overline{J}_{nT}(\overline{\theta}_{nT})-\overline{J}_{nT}\right\|\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}$ , as in result (i). Then, by the triangle inequality,

$\displaystyle\left\\|\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})-\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right\\|$	$\displaystyle\leq\left\\|(\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT})^{\top}\Omega_{nT}(\overline{J}_{nT}(\overline{\theta}_{nT})-\overline{J}_{nT})\right\\|$	(B.138)
	$\displaystyle\quad+\left\\|\overline{J}^{\top}_{nT}\Omega_{nT}(\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT})\right\\|$	(B.139)
	$\displaystyle\quad+\left\\|\overline{J}^{\top}_{nT}\Omega_{nT}(\overline{J}_{nT}(\overline{\theta}_{nT})-\overline{J}_{nT})\right\\|$	(B.140)
	$\displaystyle\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}.$	(B.141)

(iv) As a result of (iii), we have $\lambda_{\min}\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)>0$ with probability approaching one by Assumptions 3.5(ii) and 3.8(ii). Then, noting the equality

	$\displaystyle\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)^{-}-\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}$		(B.142)
	$\displaystyle=\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)^{-}\left[\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}-\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right]\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1},$		(B.143)

the result is straightforward. ∎
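The equality (B.142)-(B.143) is the resolvent identity $A^{-1}-B^{-1}=A^{-1}(B-A)B^{-1}$. It applies once $A=\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})$ is invertible, in which case its generalized inverse coincides with the ordinary inverse. Below is a minimal $2\times 2$ check; the matrices are arbitrary illustrations, and $A$ is deliberately non-symmetric, as the mixed-argument product can be.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inv2(M):
    """Inverse of a 2x2 matrix; raises if it is (numerically) singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def resolvent_gap(A, B):
    """Max entrywise gap between A^{-1} - B^{-1} and A^{-1} (B - A) B^{-1}."""
    Ai, Bi = inv2(A), inv2(B)
    lhs = [[Ai[i][j] - Bi[i][j] for j in range(2)] for i in range(2)]
    BmA = [[B[i][j] - A[i][j] for j in range(2)] for i in range(2)]
    rhs = matmul(matmul(Ai, BmA), Bi)
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))


A = [[2.0, 0.3], [0.1, 1.5]]  # non-symmetric, plays the role of the mixed product
B = [[1.8, 0.2], [0.2, 1.4]]  # symmetric, plays the role of J' Omega J
print(resolvent_gap(A, B))
```

Taking norms of the right-hand side gives $\|A^{-1}-B^{-1}\|\leq\|A^{-1}\|\,\|B-A\|\,\|B^{-1}\|$. This is how the rate in (iii) carries over to the difference of the inverses.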

Proof of Theorem 3.2.

Since the proofs of (i) and (ii) are similar, we only prove (i). By the first-order condition of the minimization problem and a mean-value expansion, we have

	$\displaystyle\bm{0}_{(d_{x}+1)K}$	$\displaystyle=\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{g}_{nT}(\widehat{\theta}_{nT})$		(B.144)
		$\displaystyle=\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\left[\overline{g}_{nT}(\theta_{0})+\overline{J}_{nT}(\overline{\theta}_{nT})\left(\widehat{\theta}_{nT}-\theta_{0}\right)\right],$		(B.145)

where $\overline{\theta}_{nT}\in[\widehat{\theta}_{nT},\theta_{0}]$ , leading to

	$\displaystyle\left(\widehat{\theta}_{nT}-\theta_{0}\right)$	$\displaystyle=-\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)^{-}\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0})$		(B.146)
		$\displaystyle=-[G_{1,nT}+G_{2,nT}+G_{3,nT}+G_{4,nT}],$		(B.147)

with

$\displaystyle G_{1,nT}$	$\displaystyle\coloneqq\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}\overline{g}_{1,nT}$	(B.148)
$\displaystyle G_{2,nT}$	$\displaystyle\coloneqq\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}\overline{g}_{2,nT}$	(B.149)
$\displaystyle G_{3,nT}$	$\displaystyle\coloneqq\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\left\{\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT}\right\}^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0})$	(B.150)
$\displaystyle G_{4,nT}$	$\displaystyle\coloneqq\left\{\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)^{-}-\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\right\}\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0}).$	(B.151)
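The four-term split (B.147)-(B.151) is exact algebra, under the reading suggested by the notation that $\overline{g}_{nT}(\theta_{0})=\overline{g}_{1,nT}+\overline{g}_{2,nT}$:

- $G_{1,nT}+G_{2,nT}$ uses $\overline{J}_{nT}$ throughout.
- $G_{3,nT}$ replaces $\overline{J}_{nT}$ by $\overline{J}_{nT}(\widehat{\theta}_{nT})$ in the score.
- $G_{4,nT}$ replaces the inverse.

The sketch below confirms on random inputs that the four pieces add up to $(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT}))^{-1}\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0})$. The two-parameter dimension, the diagonal weight $\Omega$, and the perturbation sizes are arbitrary assumptions.

```python
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def tr(X):
    return [list(col) for col in zip(*X)]


def lin(X, Y, c=1.0):
    """Entrywise X + c * Y."""
    return [[x + c * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def decomposition_gap(m=5, seed=0):
    """Max gap between G1 + G2 + G3 + G4 and A^{-1} J_hat' Om g (two parameters)."""
    rng = random.Random(seed)

    def noise(scale, rows, cols):
        return [[scale * rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    J = noise(1.0, m, 2)                       # J_nT
    J_hat = lin(J, noise(0.1, m, 2))           # J_nT(theta_hat)
    J_bar = lin(J, noise(0.1, m, 2))           # J_nT(theta_bar)
    Om = [[1.0 + rng.random() if i == k else 0.0 for k in range(m)] for i in range(m)]
    g1, g2 = noise(1.0, m, 1), noise(0.1, m, 1)
    g = lin(g1, g2)                            # g_nT(theta_0) = g_1 + g_2 (assumed)
    B_inv = inv2(matmul(matmul(tr(J), Om), J))
    A_inv = inv2(matmul(matmul(tr(J_hat), Om), J_bar))
    G1 = matmul(matmul(matmul(B_inv, tr(J)), Om), g1)
    G2 = matmul(matmul(matmul(B_inv, tr(J)), Om), g2)
    G3 = matmul(matmul(matmul(B_inv, tr(lin(J_hat, J, -1.0))), Om), g)
    G4 = matmul(matmul(matmul(lin(A_inv, B_inv, -1.0), tr(J_hat)), Om), g)
    total = matmul(matmul(matmul(A_inv, tr(J_hat)), Om), g)
    parts = lin(lin(G1, G2), lin(G3, G4))
    return max(abs(parts[i][0] - total[i][0]) for i in range(2))


print(decomposition_gap())
```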

First, observing that

$\displaystyle\left\\|G_{2,nT}\right\\|^{2}$	$\displaystyle=\overline{g}_{2,nT}^{\top}\Omega_{nT}\overline{J}_{nT}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-2}\overline{J}^{\top}_{nT}\Omega_{nT}\overline{g}_{2,nT}$	(B.152)
	$\displaystyle\lesssim\overline{g}_{2,nT}^{\top}\Omega_{nT}\overline{J}_{nT}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}\overline{g}_{2,nT}$	(B.153)
	$\displaystyle\lesssim\overline{g}_{2,nT}^{\top}\Omega_{nT}\overline{g}_{2,nT}$	(B.154)
	$\displaystyle\lesssim\left\\|\overline{g}_{2,nT}\right\\|^{2},$	(B.155)

we obtain $\left\|G_{2,nT}\right\|\lesssim_{p}K^{(1-2\pi)/2}$ by Lemma B.6(iii), (vi), and (vii). Next, it is easy to see that

	$\displaystyle\left\\|G_{3,nT}\right\\|$	$\displaystyle\lesssim\underbracket{\left\\|\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\left\{\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT}\right\}^{\top}\right\\|}_{\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}:\>\text{Lemma B.9(i)}}\cdot\underbracket{\left\\|\overline{g}_{nT}(\theta_{0})\right\\|}_{\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2}:\>\eqref{eq:g0}}$		(B.156)
		$\displaystyle\lesssim_{p}K/(nT)$		(B.157)

since $\sqrt{nT}K^{(1-2\pi)/2}\to 0$. Further, for $G_{4,nT}$, Lemma B.9(ii) and (iv), together with (B.124) and Assumption 3.8(ii), yield $\left\|G_{4,nT}\right\|\lesssim_{p}K/(nT)$.
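Steps (B.152)-(B.155) use two facts for $M=\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}$ and $u=\overline{J}^{\top}_{nT}\Omega_{nT}\overline{g}_{2,nT}$:

1. $\|M^{-1}u\|^{2}\leq\lambda_{\min}(M)^{-1}u^{\top}M^{-1}u$.
2. $\Omega_{nT}^{1/2}\overline{J}_{nT}M^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}^{1/2}$ is an orthogonal projection, so $u^{\top}M^{-1}u\leq\overline{g}_{2,nT}^{\top}\Omega_{nT}\overline{g}_{2,nT}$.

A minimal numerical check follows, with two parameters and a diagonal positive $\Omega$. Both choices are assumptions made for brevity.

```python
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def tr(X):
    return [list(col) for col in zip(*X)]


def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def projection_check(m=6, seed=0):
    rng = random.Random(seed)
    J = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(m)]
    Om = [[0.5 + rng.random() if i == k else 0.0 for k in range(m)] for i in range(m)]
    g2 = [[rng.gauss(0.0, 1.0)] for _ in range(m)]
    M = matmul(matmul(tr(J), Om), J)           # J' Om J (2 x 2, positive definite)
    u = matmul(matmul(tr(J), Om), g2)          # J' Om g2 (2 x 1)
    G2 = matmul(inv2(M), u)
    norm_G2_sq = sum(v[0] ** 2 for v in G2)                # (B.152)
    proj = matmul(tr(u), matmul(inv2(M), u))[0][0]         # (B.153) without the constant
    full = matmul(tr(g2), matmul(Om, g2))[0][0]            # (B.154)
    t = M[0][0] + M[1][1]
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    lam_min = (t - max(t * t - 4.0 * d, 0.0) ** 0.5) / 2.0  # smallest eigenvalue of M
    return norm_G2_sq, proj / lam_min, proj, full


n2, b2, p, f = projection_check()
print(n2 <= b2, p <= f)
```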

Combining all these results and noting that

\displaystyle[\sigma_{nT,\alpha}(s)]^{2}\geq c\phi^{K}(s)^{\top}\underbracket{\mathbb{S}_{\alpha}\mathbb{S}_{\alpha}^{\top}}_{=I_{K}}\phi^{K}(s)\geq c||\phi^{K}(s)||^{2}>0

(B.158)

for sufficiently large $nT$ , we have

$\displaystyle\frac{\sqrt{n(T-1)}\left(\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)\right)}{\sigma_{nT,\alpha}(s)}$	$\displaystyle=-\frac{\sqrt{n(T-1)}\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}[G_{1,nT}+O_{P}(K/(nT))+O_{P}(K^{(1-2\pi)/2})]}{\sigma_{nT,\alpha}(s)}$	(B.159)
	$\displaystyle\quad+\frac{\sqrt{n(T-1)}O(K^{-\pi})}{\sigma_{nT,\alpha}(s)}$	(B.160)
	$\displaystyle=-\frac{\sqrt{n(T-1)}\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}G_{1,nT}}{\sigma_{nT,\alpha}(s)}+o_{P}(1)$	(B.161)

Here, let $\Lambda_{z,nT}(s)$ and $\Lambda_{m,nT}(s)$ denote the first $(d_{q}+d_{x})K$ elements and $((d_{q}+d_{x})K+m)$ -th element of $\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}/\sigma_{nT,\alpha}(s)$ , respectively. Then, we can write

	$\displaystyle\frac{\sqrt{n(T-1)}\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}G_{1,nT}}{\sigma_{nT,\alpha}(s)}$		(B.162)
	$\displaystyle=\frac{\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}\left[\sqrt{n(T-1)}\overline{g}_{1,nT}\right]}{\sigma_{nT,\alpha}(s)}$		(B.163)
	$\displaystyle=\frac{1}{L\sqrt{n(T-1)}}\sum_{l=1}^{L}\left[\Lambda_{z,nT}(s)\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})+\bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}\left(\sum_{m=1}^{M}\Lambda_{m,nT}(s)P_{m}\right)\bm{D}\bm{\mathcal{E}}(s_{l})\right].$		(B.164)

Moreover, for convenience, we re-label the data such that $(it)=(11)\iff\text{i}=1$ , $(it)=(21)\iff\text{i}=2$ , …, $(it)=(nT)\iff\text{i}=\bm{n}$ (where $\bm{n}=nT$ ).

Let $\underset{\bm{n}\times\bm{n}}{\Pi_{M}(s)}\coloneqq\bm{D}^{\top}\left(\sum_{m=1}^{M}\Lambda_{m,nT}(s)P_{m}\right)\bm{D}=(\pi_{M,\text{i},\text{j}}(s))$. Recalling the block-diagonal structure of $P_{m}$ and that its diagonal elements are all zero, it follows that the diagonal elements of $\Pi_{M}(s)$ are also all zero. Further, note that $\Pi_{M}(s)$ is symmetric. Now, letting $z_{\text{i}}^{\dagger}(s_{l})$ be the i-th column of $\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}$, define

$\displaystyle a_{\text{i}}(s)$	$\displaystyle\coloneqq\frac{1}{L\sqrt{n(T-1)}}\sum_{l=1}^{L}\Lambda_{z,nT}(s)z_{\text{i}}^{\dagger}(s_{l})\varepsilon_{\text{i}}(s_{l})$	(B.165)
$\displaystyle b_{\text{i},\text{j}}(s)$	$\displaystyle\coloneqq\frac{1}{L\sqrt{n(T-1)}}\sum_{l=1}^{L}\pi_{M,\text{i},\text{j}}(s)\varepsilon_{\text{i}}(s_{l})\varepsilon_{\text{j}}(s_{l})$	(B.166)
$\displaystyle\gamma_{\text{i}}(s)$	$\displaystyle\coloneqq a_{\text{i}}(s)+2\sum_{\text{j}=1}^{\text{i}-1}b_{\text{i},\text{j}}(s),$	(B.167)

and we further re-write

\displaystyle\frac{\sqrt{n(T-1)}\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}G_{1,nT}}{\sigma_{nT,\alpha}(s)}=\sum_{\text{i}=1}^{\bm{n}}\gamma_{\text{i}}(s).

(B.168)
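The rewriting (B.168) relies on a fact about any symmetric matrix $\Pi$ with zero diagonal: $\bm{\mathcal{E}}^{\top}\Pi\bm{\mathcal{E}}=2\sum_{\text{i}}\sum_{\text{j}<\text{i}}\pi_{\text{i},\text{j}}\varepsilon_{\text{i}}\varepsilon_{\text{j}}$. The i-th summand involves only $\varepsilon_{1},\ldots,\varepsilon_{\text{i}}$, which is what makes $\{\gamma_{\text{i}}(s)\}$ adapted to $\mathscr{F}_{nT}(\text{i})$. The sketch below checks the identity for scalar errors at a single grid point ($L=1$, an assumption made for brevity). The function names are illustrative.

```python
import random


def random_sym_zero_diag(n, rng):
    """Random symmetric n x n matrix with zero diagonal (like Pi_M(s))."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            P[i][j] = P[j][i] = rng.gauss(0.0, 1.0)
    return P


def quad_form(P, e):
    return sum(P[i][j] * e[i] * e[j] for i in range(len(e)) for j in range(len(e)))


def martingale_pieces(P, e):
    """i-th entry: 2 * sum_{j < i} P[i][j] e_i e_j, which uses only e_0, ..., e_i."""
    return [2.0 * sum(P[i][j] * e[i] * e[j] for j in range(i)) for i in range(len(e))]


rng = random.Random(0)
P = random_sym_zero_diag(8, rng)
e = [rng.gauss(0.0, 1.0) for _ in range(8)]
print(quad_form(P, e), sum(martingale_pieces(P, e)))
```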

Here, let $\mathscr{F}_{nT}(\text{i})$ denote the $\sigma$ -field generated by $\{\varepsilon_{\text{j}}:1\leq\text{j}\leq\text{i}\}$ . Under Assumption 3.3(i), we have $\mathbb{E}[\gamma_{\text{i}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)]=0$ , implying that $\{\gamma_{\text{i}}(s)\}$ forms a martingale difference sequence for each $\bm{n}\geq 1$ . Then, it suffices to check the following two conditions for the central limit theorem of Scott (1973):

	$\displaystyle(1)\;\;$	$\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[(\gamma_{\text{i}}(s))^{2}\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}1$		(B.169)
	$\displaystyle(2)\;\;$	$\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[(\gamma_{\text{i}}(s))^{2}\bm{1}\{\|\gamma_{\text{i}}(s)\|\geq\eta\}\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}0\;\;\text{for any $\eta>0$}$		(B.170)

Verification of condition (1)

Observe that

	$\displaystyle\mathbb{E}[(\gamma_{\text{i}}(s))^{2}\mid\mathscr{F}_{nT}(\text{i}-1)]$	$\displaystyle=\mathbb{E}[(a_{\text{i}}(s))^{2}]+4\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\mathbb{E}[b_{\text{i},\text{j}_{1}}(s)b_{\text{i},\text{j}_{2}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)]$		(B.171)
		$\displaystyle\quad+4\sum_{\text{j}=1}^{\text{i}-1}\mathbb{E}[a_{\text{i}}(s)b_{\text{i},\text{j}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)].$		(B.172)

Recalling the definition of $\mathcal{V}_{z,nT}$ in (A.39), we can easily see that $\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[(a_{\text{i}}(s))^{2}]=\Lambda_{z,nT}(s)\mathcal{V}_{z,nT}\Lambda_{z,nT}(s)^{\top}$ .

For the second term on the right-hand side, noting that $\text{j}_{1},\text{j}_{2}\leq\text{i}-1$ ,

	$\displaystyle 4\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\mathbb{E}[b_{\text{i},\text{j}_{1}}(s)b_{\text{i},\text{j}_{2}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)]$		(B.173)
	$\displaystyle=\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\underbracket{\frac{4}{n(T-1)}\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\pi_{M,\text{i},\text{j}_{1}}(s)\pi_{M,\text{i},\text{j}_{2}}(s)\Gamma_{\text{i}}(s_{l},s_{l^{\prime}})\varepsilon_{\text{j}_{1}}(s_{l})\varepsilon_{\text{j}_{2}}(s_{l^{\prime}})}_{\eqqcolon D(s,s_{l},s_{l^{\prime}})}.$		(B.174)

Since $\Pi_{M}(s)$ is symmetric and its diagonals are zero, recalling the definition of $\mathcal{V}_{ab,nT}$ in (A.40), direct calculation yields

$\displaystyle\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\mathbb{E}[D(s,s_{l},s_{l^{\prime}})]$	$\displaystyle=\frac{4}{n(T-1)}\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}=1}^{\text{i}-1}[\pi_{M,\text{i},\text{j}}(s)]^{2}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{\text{i}}(s_{l},s_{l^{\prime}})\Gamma_{\text{j}}(s_{l},s_{l^{\prime}})$	(B.175)
	$\displaystyle=\frac{2}{n(T-1)}\sum_{1\leq\text{i},\text{j}\leq\bm{n}}[\pi_{M,\text{i},\text{j}}(s)]^{2}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{\text{i}}(s_{l},s_{l^{\prime}})\Gamma_{\text{j}}(s_{l},s_{l^{\prime}})$	(B.176)
	$\displaystyle=\frac{2}{n(T-1)}\sum_{1\leq\text{i},\text{j}\leq\bm{n}}\left[\sum_{m=1}^{M}\Lambda_{m,nT}(s)\widetilde{p}_{m,\text{i},\text{j}}\right]^{2}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{\text{i}}(s_{l},s_{l^{\prime}})\Gamma_{\text{j}}(s_{l},s_{l^{\prime}})$	(B.177)
	$\displaystyle=\sum_{a=1}^{M}\sum_{b=1}^{M}\Lambda_{a,nT}(s)\Lambda_{b,nT}(s)\frac{2}{n(T-1)}\sum_{1\leq\text{i},\text{j}\leq\bm{n}}\widetilde{p}_{a,\text{i},\text{j}}\widetilde{p}_{b,\text{i},\text{j}}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{\text{i}}(s_{l},s_{l^{\prime}})\Gamma_{\text{j}}(s_{l},s_{l^{\prime}})$	(B.178)
	$\displaystyle=\sum_{a=1}^{M}\sum_{b=1}^{M}\Lambda_{a,nT}(s)\Lambda_{b,nT}(s)\mathcal{V}_{ab,nT}.$	(B.179)
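Consider the special case $L=1$. Here $\Gamma_{\text{i}}(s_{l},s_{l^{\prime}})$ reduces to a variance $\sigma_{\text{i}}^{2}$; this assumption serves only to keep the sketch short. Then (B.175)-(B.176) state that $4\sum_{\text{i}}\sum_{\text{j}<\text{i}}\pi_{\text{i},\text{j}}^{2}\sigma_{\text{i}}^{2}\sigma_{\text{j}}^{2}=2\sum_{\text{i},\text{j}}\pi_{\text{i},\text{j}}^{2}\sigma_{\text{i}}^{2}\sigma_{\text{j}}^{2}$. By independence, only the terms with $\text{j}_{1}=\text{j}_{2}$ survive in the expectation, and the factor 4 becomes 2 after symmetrizing. The sketch below evaluates both sides exactly, omitting the common scaling $1/(n(T-1))$.

```python
import random


def expected_D(P, var):
    """4 * sum_i sum_{j1, j2 < i} P[i][j1] P[i][j2] var_i E[e_j1 e_j2], independent e_j."""
    n = len(var)
    total = 0.0
    for i in range(n):
        for j1 in range(i):
            for j2 in range(i):
                cov = var[j1] if j1 == j2 else 0.0  # only j1 == j2 survives
                total += P[i][j1] * P[i][j2] * var[i] * cov
    return 4.0 * total


def symmetrized(P, var):
    """2 * sum_{i, j} P[i][j]^2 var_i var_j, the form in (B.176)."""
    n = len(var)
    return 2.0 * sum(P[i][j] ** 2 * var[i] * var[j] for i in range(n) for j in range(n))


rng = random.Random(1)
n = 7
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i):
        P[i][j] = P[j][i] = rng.gauss(0.0, 1.0)  # symmetric, zero diagonal
var = [0.5 + rng.random() for _ in range(n)]
print(expected_D(P, var), symmetrized(P, var))
```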

Meanwhile,

$\displaystyle\text{Var}\left(D(s,s_{l},s_{l^{\prime}})\right)$	$\displaystyle\lesssim\frac{1}{n^{2}(T-1)^{2}}\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\sum_{\text{i}^{\prime}=1}^{\bm{n}}\sum_{\text{k}_{1}=1}^{\text{i}^{\prime}-1}\sum_{\text{k}_{2}=1}^{\text{i}^{\prime}-1}\left\|\pi_{M,\text{i},\text{j}_{1}}(s)\pi_{M,\text{i},\text{j}_{2}}(s)\pi_{M,\text{i}^{\prime},\text{k}_{1}}(s)\pi_{M,\text{i}^{\prime},\text{k}_{2}}(s)\right\|$	(B.180)
	$\displaystyle\quad\times\left\|\mathbb{E}\{(\varepsilon_{\text{j}_{1}}(s_{l})\varepsilon_{\text{j}_{2}}(s_{l^{\prime}})-\mathbb{E}[\varepsilon_{\text{j}_{1}}(s_{l})\varepsilon_{\text{j}_{2}}(s_{l^{\prime}})])(\varepsilon_{\text{k}_{1}}(s_{l})\varepsilon_{\text{k}_{2}}(s_{l^{\prime}})-\mathbb{E}[\varepsilon_{\text{k}_{1}}(s_{l})\varepsilon_{\text{k}_{2}}(s_{l^{\prime}})])\}\right\|$	(B.181)
	$\displaystyle\lesssim\frac{1}{n^{2}(T-1)^{2}}\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{i}^{\prime}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\left\|\pi_{M,\text{i},\text{j}_{1}}(s)\pi_{M,\text{i},\text{j}_{2}}(s)\pi_{M,\text{i}^{\prime},\text{j}_{1}}(s)\pi_{M,\text{i}^{\prime},\text{j}_{2}}(s)\right\|$	(B.182)
	$\displaystyle\leq\frac{1}{n^{2}(T-1)^{2}}\sum_{\text{i}^{\prime}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\bm{n}}\sum_{\text{j}_{2}=1}^{\bm{n}}\sum_{\text{i}=1}^{\bm{n}}\|\pi_{M,\text{i},\text{j}_{1}}(s)\|\cdot\|\pi_{M,\text{i},\text{j}_{2}}(s)\|\cdot\|\pi_{M,\text{i}^{\prime},\text{j}_{1}}(s)\|\cdot\|\pi_{M,\text{i}^{\prime},\text{j}_{2}}(s)\|$	(B.183)
	$\displaystyle\leq\frac{1}{n^{2}(T-1)^{2}}\sum_{\text{i}^{\prime}=1}^{\bm{n}}\|\|\Pi_{M}(s)\|\|_{1}^{2}\cdot\|\|\Pi_{M}(s)\|\|_{\infty}^{2}\lesssim 1/(nT).$	(B.184)

Consequently, Chebyshev's inequality yields $4\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\mathbb{E}[b_{\text{i},\text{j}_{1}}(s)b_{\text{i},\text{j}_{2}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}\sum_{a=1}^{M}\sum_{b=1}^{M}\Lambda_{a,nT}(s)\Lambda_{b,nT}(s)\mathcal{V}_{ab,nT}$.

For the third term, for $\text{j}\leq\text{i}-1$ ,

	$\displaystyle\mathbb{E}[a_{\text{i}}(s)b_{\text{i},\text{j}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)]$	$\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Lambda_{z,nT}(s)z_{\text{i}}^{\dagger}(s_{l})\pi_{M,\text{i},\text{j}}(s)\mathbb{E}[\varepsilon_{\text{i}}(s_{l})\varepsilon_{\text{i}}(s_{l^{\prime}})]\varepsilon_{\text{j}}(s_{l^{\prime}})$		(B.185)
		$\displaystyle=\frac{1}{Ln(T-1)}\sum_{l=1}^{L}\pi_{M,\text{i},\text{j}}(s)h_{\text{i}}(s,s_{l})\varepsilon_{\text{j}}(s_{l}),$		(B.186)

where $h_{\text{i}}(s,s_{l^{\prime}})\coloneqq L^{-1}\sum_{l=1}^{L}\Lambda_{z,nT}(s)z_{\text{i}}^{\dagger}(s_{l})\mathbb{E}[\varepsilon_{\text{i}}(s_{l})\varepsilon_{\text{i}}(s_{l^{\prime}})]$ . Hence, we can write

\displaystyle 4\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}=1}^{\text{i}-1}\mathbb{E}[a_{\text{i}}(s)b_{\text{i},\text{j}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)]

\displaystyle=\frac{4}{L}\sum_{l=1}^{L}\left(\frac{1}{n(T-1)}\sum_{\text{j}=1}^{\bm{n}}\sum_{\text{i}=\text{j}+1}^{\bm{n}}\pi_{M,\text{i},\text{j}}(s)h_{\text{i}}(s,s_{l})\varepsilon_{\text{j}}(s_{l})\right).

(B.187)

Noting that $|h_{\text{i}}(s,s_{l^{\prime}})|\lesssim K/||\phi^{K}(s)||$ ,

	$\displaystyle\mathbb{E}\left\|\frac{1}{n(T-1)}\sum_{\text{j}=1}^{\bm{n}}\sum_{\text{i}=\text{j}+1}^{\bm{n}}\pi_{M,\text{i},\text{j}}(s)h_{\text{i}}(s,s_{l})\varepsilon_{\text{j}}(s_{l})\right\|^{2}$	$\displaystyle\lesssim\frac{K^{2}}{n^{2}(T-1)^{2}}\sum_{\text{j}=1}^{\bm{n}}\sum_{\text{i}=\text{j}+1}^{\bm{n}}\sum_{\text{i}^{\prime}=\text{j}+1}^{\bm{n}}\|\pi_{M,\text{i},\text{j}}(s)\|\cdot\|\pi_{M,\text{i}^{\prime},\text{j}}(s)\|$		(B.188)
		$\displaystyle\leq\frac{K^{2}}{n^{2}(T-1)^{2}}\sum_{\text{j}=1}^{\bm{n}}\|\|\Pi_{M}(s)\|\|_{1}^{2}\lesssim K^{2}/(nT)$		(B.189)

Then, by Markov’s inequality, we obtain $4\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}=1}^{\text{i}-1}\mathbb{E}[a_{\text{i}}(s)b_{\text{i},\text{j}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}0$ .

Finally, combining the above results gives

	$\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[(\gamma_{\text{i}}(s))^{2}\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}\Lambda_{z,nT}(s)\mathcal{V}_{z,nT}\Lambda_{z,nT}(s)^{\top}+\sum_{a=1}^{M}\sum_{b=1}^{M}\Lambda_{a,nT}(s)\Lambda_{b,nT}(s)\mathcal{V}_{ab,nT}$		(B.190)
	$\displaystyle=\left(\begin{array}[]{cccc}\Lambda_{z,nT}(s)&\Lambda_{1,nT}(s)&\cdots&\Lambda_{M,nT}(s)\end{array}\right)\left(\begin{array}[]{cccc}\mathcal{V}_{z,nT}&\bm{0}_{(d_{q}+d_{x})K\times 1}&\cdots&\bm{0}_{(d_{q}+d_{x})K\times 1}\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\mathcal{V}_{11,nT}&\cdots&\mathcal{V}_{1M,nT}\\ \vdots&\vdots&\ddots&\vdots\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\mathcal{V}_{M1,nT}&\cdots&\mathcal{V}_{MM,nT}\end{array}\right)\left(\begin{array}[]{c}\Lambda_{z,nT}(s)^{\top}\\ \Lambda_{1,nT}(s)\\ \vdots\\ \Lambda_{M,nT}(s)\end{array}\right)$		(B.200)
	$\displaystyle=\frac{\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\Sigma_{nT}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)}{[\sigma_{nT,\alpha}(s)]^{2}}=1,$		(B.201)

as desired.

Verification of condition (2)

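To verify condition (2), it suffices to show that $\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[|\gamma_{\text{i}}(s)|^{4}\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}0$, since $(\gamma_{\text{i}}(s))^{2}\bm{1}\{|\gamma_{\text{i}}(s)|\geq\eta\}\leq\eta^{-2}|\gamma_{\text{i}}(s)|^{4}$ for any $\eta>0$. By the $c_{r}$ inequality,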

\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[|\gamma_{\text{i}}(s)|^{4}\mid\mathscr{F}_{nT}(\text{i}-1)]\leq 8\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[|a_{\text{i}}(s)|^{4}]+128\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}\left[\left|\sum_{\text{j}=1}^{\text{i}-1}b_{\text{i},\text{j}}(s)\right|^{4}\mid\mathscr{F}_{nT}(\text{i}-1)\right].

(B.202)
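The constants in (B.202) come from the $c_{r}$ inequality $|x+y|^{4}\leq 2^{3}(|x|^{4}+|y|^{4})$, applied with $x=a_{\text{i}}(s)$ and $y=2\sum_{\text{j}<\text{i}}b_{\text{i},\text{j}}(s)$. This gives the coefficients $8$ and $8\cdot 2^{4}=128$. A quick scalar check follows; the random draws are for illustration only.

```python
import random


def cr_sides(a, b):
    """Both sides of |a + 2b|^4 <= 8|a|^4 + 128|b|^4 (c_r inequality with r = 4)."""
    lhs = abs(a + 2.0 * b) ** 4
    rhs = 8.0 * abs(a) ** 4 + 128.0 * abs(b) ** 4
    return lhs, rhs


rng = random.Random(0)
draws = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10_000)]
ok = all(l <= r * (1.0 + 1e-12) for l, r in (cr_sides(a, b) for a, b in draws))
print(ok, cr_sides(2.0, 1.0))  # the bound is attained when a == 2 * b
```

The equality case $a=2b$ shows that the constants cannot be improved.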

For the first term on the right-hand side, noting that Assumption 3.3(ii) implies $\mathbb{E}|\prod_{k=1}^{4}\varepsilon_{it}(s_{k})|<\infty$ by Hölder’s inequality,

	$\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[\|a_{\text{i}}(s)\|^{4}]$	$\displaystyle\leq\frac{1}{L^{4}n^{2}(T-1)^{2}}\sum_{\text{i}=1}^{\bm{n}}\sum_{1\leq l_{1},l_{2},l_{3},l_{4}\leq L}\left\|\prod_{j=1}^{4}\Lambda_{z,nT}(s)z_{\text{i}}^{\dagger}(s_{l_{j}})\right\|\cdot\mathbb{E}\left\|\prod_{j=1}^{4}\varepsilon_{\text{i}}(s_{l_{j}})\right\|$		(B.203)
		$\displaystyle\lesssim\frac{1}{n^{2}(T-1)^{2}}\sum_{\text{i}=1}^{\bm{n}}\frac{1}{L^{4}}\sum_{1\leq l_{1},l_{2},l_{3},l_{4}\leq L}\left\|\prod_{j=1}^{4}\Lambda_{z,nT}(s)z_{\text{i}}^{\dagger}(s_{l_{j}})\right\|\lesssim\frac{K^{4}}{nT\left\\|\phi^{K}(s)\right\\|^{4}}.$		(B.204)

For the second term, observe that

	$\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}\left[\left\|\sum_{\text{j}=1}^{\text{i}-1}b_{\text{i},\text{j}}(s)\right\|^{4}\mid\mathscr{F}_{nT}(\text{i}-1)\right]$		(B.205)
	$\displaystyle=\frac{1}{L^{4}n^{2}(T-1)^{2}}\sum_{\text{i}=1}^{\bm{n}}\sum_{1\leq l_{1},l_{2},l_{3},l_{4}\leq L}\left(\sum_{\text{j}_{1}=1}^{\text{i}-1}\pi_{M,\text{i},\text{j}_{1}}(s)\varepsilon_{\text{j}_{1}}(s_{l_{1}})\right)\cdots\left(\sum_{\text{j}_{4}=1}^{\text{i}-1}\pi_{M,\text{i},\text{j}_{4}}(s)\varepsilon_{\text{j}_{4}}(s_{l_{4}})\right)\mathbb{E}\left(\prod_{k=1}^{4}\varepsilon_{\text{i}}(s_{l_{k}})\right).$		(B.206)

Further, it is easy to see that $\sum_{\text{j}=1}^{\text{i}-1}\pi_{M,\text{i},\text{j}}(s)\varepsilon_{\text{j}}(s_{l})\lesssim_{p}1$ by Markov’s inequality. Hence, we have $\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}\left[\left|\sum_{\text{j}=1}^{\text{i}-1}b_{\text{i},\text{j}}(s)\right|^{4}\mid\mathscr{F}_{nT}(\text{i}-1)\right]\lesssim_{p}1/(nT)$ , and combining this with the previous result implies condition (2).

∎

Proof of Proposition 4.1.

Since the proofs of (i) and (ii) are almost identical, we only prove (i). By the triangle inequality,

\displaystyle\left\|\widehat{M}^{S}_{nT}(i,j,s)-M(i,j,s)\right\|_{\infty}

\displaystyle\leq\left\|\widehat{M}^{S}_{nT}(i,j,s)-M^{S}(i,j,s)\right\|_{\infty}+\left\|M^{S}(i,j,s)-M(i,j,s)\right\|_{\infty},

(B.207)

where $M^{S}(i,j,s)\coloneqq\sum_{\ell=0}^{S}W_{n}^{\ell}\bm{e}_{i}\gamma^{\ell}(\beta_{0j},s)$ . For the first term on the right-hand side, observe that

	$\displaystyle\left\|\{\widehat{M}^{S}_{nT}(i,j,s)\}_{k}-\{M^{S}(i,j,s)\}_{k}\right\|$	$\displaystyle\leq\sum_{\ell=0}^{S}\left\|\{W_{n}^{\ell}\bm{e}_{i}\}_{k}\right\|\cdot\left\|\widehat{\gamma}_{nT}^{\ell}(\widehat{\beta}_{nT,j},s)-\gamma^{\ell}(\beta_{0j},s)\right\|$		(B.208)
		$\displaystyle\lesssim\sum_{\ell=0}^{S}\left\|\widehat{\gamma}_{nT}^{\ell}(\widehat{\beta}_{nT,j},s)-\gamma^{\ell}(\beta_{0j},s)\right\|$		(B.209)

for all $k\in[n]$ . By definition, when $\ell=0$ ,

\displaystyle\left|\widehat{\gamma}_{nT}^{0}(\widehat{\beta}_{nT,j},s)-\gamma^{0}(\beta_{0j},s)\right|

\displaystyle=\left|\widehat{\beta}_{nT,j}(s)-\beta_{0j}(s)\right|\lesssim_{p}c_{n}

(B.210)

uniformly in $s\in[0,1]$ , where $c_{n}\coloneqq\sqrt{K}/\sqrt{nT}+K^{1-\pi}$ . When $\ell=1$ , by Assumption 3.4,

$\displaystyle\left\|\widehat{\gamma}_{nT}^{1}(\widehat{\beta}_{nT,j},s)-\gamma^{1}(\beta_{0j},s)\right\|$	$\displaystyle=\left\|\widehat{\alpha}_{nT}(s)A(\widehat{\beta}_{nT,j},s)-\alpha_{0}(s)A(\beta_{0j},s)\right\|$	(B.211)
	$\displaystyle\leq\left\|\widehat{\alpha}_{nT}(s)\|\cdot\|A(\widehat{\beta}_{nT,j}-\beta_{0j},s)\right\|+\left\|\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)\|\cdot\|A(\beta_{0j},s)\right\|$	(B.212)
	$\displaystyle\lesssim_{p}(\overline{\alpha}_{0}+c_{n})\cdot\sup_{s\in[0,1]}\|\widehat{\beta}_{nT,j}(s)-\beta_{0j}(s)\|+c_{n}$	(B.213)
	$\displaystyle\lesssim\overline{\alpha}_{0}c_{n}+c_{n}$	(B.214)

uniformly in $s\in[0,1]$ . Similarly, when $\ell=2$ , we have

$\displaystyle\left\|\widehat{\gamma}_{nT}^{2}(\widehat{\beta}_{nT,j},s)-\gamma^{2}(\beta_{0j},s)\right\|$	$\displaystyle=\left\|\widehat{\alpha}_{nT}(s)A(\widehat{\gamma}_{nT}^{1}(\widehat{\beta}_{nT,j},\cdot),s)-\alpha_{0}(s)A(\gamma^{1}(\beta_{0j},\cdot),s)\right\|$	(B.215)
	$\displaystyle\leq\left\|\widehat{\alpha}_{nT}(s)\|\cdot\|A(\widehat{\gamma}_{nT}^{1}(\widehat{\beta}_{nT,j},\cdot)-\gamma^{1}(\beta_{0j},\cdot),s)\right\|+\left\|\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)\|\cdot\|A(\gamma^{1}(\beta_{0j},\cdot),s)\right\|$	(B.216)
	$\displaystyle\lesssim_{p}(\overline{\alpha}_{0}+c_{n})\cdot(\overline{\alpha}_{0}c_{n}+c_{n})+\overline{\alpha}_{0}c_{n}$	(B.217)
	$\displaystyle\lesssim\overline{\alpha}_{0}^{2}c_{n}+\overline{\alpha}_{0}c_{n}.$	(B.218)

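Thus, repeating the same computation recursively and writing $\rho_{n}\coloneqq\overline{\alpha}_{0}+c_{n}$, we see that each step multiplies the previous error by at most $\rho_{n}$ and adds a term of order $\overline{\alpha}_{0}^{\ell-1}c_{n}\leq\rho_{n}^{\ell-1}c_{n}$. Hence $|\widehat{\gamma}_{nT}^{\ell}(\widehat{\beta}_{nT,j},s)-\gamma^{\ell}(\beta_{0j},s)|\lesssim_{p}\ell\rho_{n}^{\ell-1}c_{n}$ uniformly in $s\in[0,1]$ for general $\ell\geq 1$, with a constant that does not depend on $\ell$. Under $\overline{\alpha}_{0}<1$, we have $\rho_{n}\leq(1+\overline{\alpha}_{0})/2$ for sufficiently large $nT$, and hence $\sum_{\ell=1}^{S}\ell\rho_{n}^{\ell-1}\leq(1-\rho_{n})^{-2}\leq 4/(1-\overline{\alpha}_{0})^{2}$. This leads to $||\widehat{M}^{S}_{nT}(i,j,s)-M^{S}(i,j,s)||_{\infty}\lesssim_{p}c_{n}$.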
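To see how the per-step errors accumulate, the sketch below iterates the recursion implied by (B.211)-(B.218): $e_{0}=c_{n}$ and $e_{\ell}=(\overline{\alpha}_{0}+c_{n})e_{\ell-1}+c_{n}\overline{\alpha}_{0}^{\ell-1}\overline{\beta}_{0j}$. It rests on two illustrative assumptions: the operator satisfies $|A(f,s)|\leq\sup_{s}|f(s)|$, as the bounds in (B.219) suggest, and all constants hidden in $\lesssim$ equal one. The individual errors pick up an extra factor of order $\ell$ relative to $\overline{\alpha}_{0}^{\ell-1}c_{n}$. Their sum nevertheless stays below a bound that does not depend on $S$, so the conclusion that the estimated truncated effect is within $O_{P}(c_{n})$ of $M^{S}$ is unaffected.

```python
def error_path(alpha_bar, beta_bar, c, S):
    """Iterate e_0 = c, e_l = (alpha_bar + c) e_{l-1} + c * alpha_bar**(l-1) * beta_bar.

    Mimics (B.211)-(B.218) under the illustrative assumption |A(f, s)| <= sup |f|
    and with all hidden constants set to one.
    """
    rho = alpha_bar + c
    errors = [c]
    for l in range(1, S + 1):
        errors.append(rho * errors[-1] + c * alpha_bar ** (l - 1) * beta_bar)
    return errors


def summed_bound(alpha_bar, beta_bar, c):
    """c * (1/(1 - rho) + beta_bar/(1 - rho)^2): bounds sum_l e_l for every S."""
    rho = alpha_bar + c
    if rho >= 1.0:
        raise ValueError("need alpha_bar + c < 1")
    return c * (1.0 / (1.0 - rho) + beta_bar / (1.0 - rho) ** 2)


errs = error_path(alpha_bar=0.9, beta_bar=1.0, c=0.01, S=200)
print(sum(errs), summed_bound(0.9, 1.0, 0.01))
# The ratio e_l / (alpha_bar**(l-1) * c) grows roughly linearly in l:
print([round(errs[l] / (0.9 ** (l - 1) * 0.01), 1) for l in (1, 5, 50)])
```

The bound follows from $e_{\ell}\leq\rho^{\ell}c+c\,\overline{\beta}_{0j}\,\ell\rho^{\ell-1}$ together with $\sum_{\ell\geq 1}\ell\rho^{\ell-1}=(1-\rho)^{-2}$.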

Next, observe that $M(i,j,s)-M^{S}(i,j,s)=\sum_{\ell=S+1}^{\infty}W_{n}^{\ell}\bm{e}_{i}\gamma^{\ell}(\beta_{0j},s)$ and that

\displaystyle|\gamma^{0}(\beta_{0j},s)|\leq\overline{\beta}_{0j},\;\;|\gamma^{1}(\beta_{0j},s)|\leq\overline{\alpha}_{0}\overline{\beta}_{0j},\;\;\ldots,\;\;|\gamma^{\ell}(\beta_{0j},s)|\leq\overline{\alpha}_{0}^{\ell}\overline{\beta}_{0j}

(B.219)

by repeatedly applying Assumption 3.4, where $\overline{\beta}_{0j}\coloneqq\sup_{s\in[0,1]}|\beta_{0j}(s)|$ . Hence, we have

\displaystyle\left\|M^{S}(i,j,s)-M(i,j,s)\right\|_{\infty}\lesssim\sum_{\ell=S+1}^{\infty}|\gamma^{\ell}(\beta_{0j},s)|\leq\frac{\overline{\beta}_{0j}}{1-\overline{\alpha}_{0}}\cdot\overline{\alpha}_{0}^{S+1}.

(B.220)
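The bound (B.220) can also be read in reverse, as a rule for choosing the truncation order. Because the tail is geometric, the smallest $S$ with $\overline{\beta}_{0j}\overline{\alpha}_{0}^{S+1}/(1-\overline{\alpha}_{0})\leq\text{tol}$ can be found by a short search. In practice $\overline{\alpha}_{0}$ and $\overline{\beta}_{0j}$ are unknown and would have to be replaced by estimates. The helper names and plug-in values below are hypothetical.

```python
def tail_bound(alpha_bar, beta_bar, S):
    """Right-hand side of (B.220): beta_bar * alpha_bar**(S + 1) / (1 - alpha_bar)."""
    if not 0.0 <= alpha_bar < 1.0:
        raise ValueError("need 0 <= alpha_bar < 1")
    return beta_bar * alpha_bar ** (S + 1) / (1.0 - alpha_bar)


def truncation_order(alpha_bar, beta_bar, tol):
    """Smallest S whose truncation bound (B.220) is at most tol (hypothetical helper)."""
    S = 0
    while tail_bound(alpha_bar, beta_bar, S) > tol:
        S += 1
    return S


# Direct partial sum of the geometric tail, for comparison with the closed form.
alpha_bar, beta_bar, S = 0.6, 1.5, 7
direct = sum(beta_bar * alpha_bar ** l for l in range(S + 1, 2000))
print(direct, tail_bound(alpha_bar, beta_bar, S), truncation_order(0.5, 1.0, 1e-3))
```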

Combining these results completes the proof.

∎

Appendix C Consistent variance estimation

First, observe the following alternative representations of $\mathcal{V}_{z,nT}$ and $\mathcal{V}_{ab,nT}$ :

	$\displaystyle\mathcal{V}_{z,nT}$	$\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>\|t^{\prime}-t\|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\mathbb{E}[\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})]$		(C.1)
	$\displaystyle\mathcal{V}_{ab,nT}$	$\displaystyle=\frac{2}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>\|t^{\prime}-t\|\leq 1}\sum_{1\leq i,j\leq n}p_{a,i,j}p_{b,i,j}\mathbb{E}[\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})]\mathbb{E}[\vec{\varepsilon}_{jt}(s_{l})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}})].$		(C.2)

Define $\vec{\widehat{e}}_{it}(s)\coloneqq e_{i,t+1}(s;\widehat{\theta}_{nT})-e_{it}(s;\widehat{\theta}_{nT})$ and the following estimators:

$\displaystyle\widehat{\mathcal{V}}_{z,nT}$	$\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>\|t^{\prime}-t\|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\vec{\widehat{e}}_{it}(s_{l})\vec{\widehat{e}}_{it^{\prime}}(s_{l^{\prime}})$	(C.3)
$\displaystyle\widehat{\mathcal{V}}_{ab,nT}$	$\displaystyle=\frac{2}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>\|t^{\prime}-t\|\leq 1}\sum_{1\leq i,j\leq n}p_{a,i,j}p_{b,i,j}\vec{\widehat{e}}_{it}(s_{l})\vec{\widehat{e}}_{it^{\prime}}(s_{l^{\prime}})\vec{\widehat{e}}_{jt}(s_{l})\vec{\widehat{e}}_{jt^{\prime}}(s_{l^{\prime}})$	(C.4)
$\displaystyle\widehat{\mathcal{V}}_{nT}$	$\displaystyle\coloneqq\left(\begin{array}[]{cccc}\widehat{\mathcal{V}}_{z,nT}&\bm{0}_{(d_{q}+d_{x})K\times 1}&\cdots&\bm{0}_{(d_{q}+d_{x})K\times 1}\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\widehat{\mathcal{V}}_{11,nT}&\cdots&\widehat{\mathcal{V}}_{1M,nT}\\ \vdots&\vdots&\ddots&\vdots\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\widehat{\mathcal{V}}_{M1,nT}&\cdots&\widehat{\mathcal{V}}_{MM,nT}\end{array}\right)$	(C.9)
$\displaystyle\widehat{\Sigma}_{nT}$	$\displaystyle\coloneqq\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\widehat{\theta}_{nT})\right)^{-1}\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\widehat{\mathcal{V}}_{nT}\Omega_{nT}\overline{J}_{nT}(\widehat{\theta}_{nT})\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\widehat{\theta}_{nT})\right)^{-1}.$	(C.10)
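The estimator (C.3) is a sum over cross-products of neighbouring periods only, matching the restriction $|t^{\prime}-t|\leq 1$ in (C.1). This is consistent with first-differenced errors that are correlated only with their immediate neighbours. Below is a direct, unoptimized sketch with nested lists. The data layout (`z[i][t][l]` as a length-$p$ list, `e[i][t][l]` as the differenced residual) and the function name are assumptions of this sketch, not taken from the paper.

```python
def v_z_hat(z, e):
    """Sample analogue (C.3) of V_z with the window |t' - t| <= 1.

    z[i][t][l] is the instrument vector z_it(s_l) (a list of length p) and
    e[i][t][l] the differenced residual at grid point s_l, for i < n, t < T - 1,
    l < L (0-based indices).
    """
    n, Tm1, L = len(z), len(z[0]), len(z[0][0])
    p = len(z[0][0][0])
    V = [[0.0] * p for _ in range(p)]
    for i in range(n):
        for t in range(Tm1):
            for tp in range(max(0, t - 1), min(Tm1, t + 2)):  # |t' - t| <= 1
                for l in range(L):
                    for lp in range(L):
                        w = e[i][t][l] * e[i][tp][lp]
                        for a in range(p):
                            for b in range(p):
                                V[a][b] += z[i][t][l][a] * z[i][tp][lp][b] * w
    scale = 1.0 / (L * L * n * Tm1)
    return [[scale * v for v in row] for row in V]


# One unit, two differenced periods, one grid point: the lag-one terms cancel.
print(v_z_hat([[[[1.0]], [[1.0]]]], [[[1.0], [-1.0]]]))
```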

Then, our estimators of $\sigma_{nT,\alpha}(s)$ and $\sigma_{nT,j}(s)$ ($j\in[d_{x}]$) are given by

	$\displaystyle\widehat{\sigma}_{nT,\alpha}(s)$	$\displaystyle\coloneqq\sqrt{\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\widehat{\Sigma}_{nT}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)}$		(C.11)
	$\displaystyle\widehat{\sigma}_{nT,j}(s)$	$\displaystyle\coloneqq\sqrt{\phi^{K}(s)^{\top}\mathbb{S}_{j}\widehat{\Sigma}_{nT}\mathbb{S}_{j}^{\top}\phi^{K}(s)},$		(C.12)

respectively.
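Once $\widehat{\mathcal{V}}_{nT}$ is assembled, (C.10)-(C.12) reduce to a GMM-type sandwich followed by a quadratic form in the basis vector $\phi^{K}(s)$. The sketch below spells this out with plain lists. The selection matrix $\mathbb{S}$ is passed in as a $K\times p$ matrix, and all names are illustrative rather than taken from the authors' code. Given the normalization in (B.159), a pointwise 95% interval would then take a form such as $\widehat{\alpha}_{nT}(s)\pm 1.96\,\widehat{\sigma}_{nT,\alpha}(s)/\sqrt{n(T-1)}$.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def tr(X):
    return [list(col) for col in zip(*X)]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        A[c], A[piv] = A[piv], A[c]
        pv = A[c][c]
        A[c] = [v / pv for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]


def sandwich(J, Om, V):
    """(J' Om J)^{-1} J' Om V Om J (J' Om J)^{-1}, the form of (C.10)."""
    bread = inverse(matmul(matmul(tr(J), Om), J))
    meat = matmul(matmul(matmul(matmul(tr(J), Om), V), Om), J)
    return matmul(matmul(bread, meat), bread)


def sigma_hat(phi, S, Sigma):
    """sqrt(phi' S Sigma S' phi), the form of (C.11)-(C.12); phi has length K."""
    row = matmul([phi], S)                          # 1 x p
    val = matmul(matmul(row, Sigma), tr(row))[0][0]
    return max(val, 0.0) ** 0.5


J = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]           # m = 3 moments, p = 2 parameters
Om = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
V = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
S = [[1.0, 0.0], [0.0, 1.0]]                        # K = 2, keeps both coefficients
print(sigma_hat([1.0, 0.0], S, sandwich(J, Om, V)))
```

With identity weights and $\widehat{\mathcal{V}}_{nT}=I$, the sandwich collapses to $(J^{\top}J)^{-1}$, which is a convenient check.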

Proposition C.1 (Consistent variance estimation).

Suppose that the assumptions in Theorem 3.2 are satisfied. In addition, assume that $K^{3}/(nT)\to 0$ and $K^{2-\pi}\to 0$ as $nT\to\infty$ . Then,

(i)

$\left\|\widehat{\mathcal{V}}_{z,nT}-\mathcal{V}_{z,nT}\right\|\lesssim_{p}K^{3/2}/\sqrt{nT}+K^{2-\pi}$
(ii)

$\left\|\widehat{\mathcal{V}}_{ab,nT}-\mathcal{V}_{ab,nT}\right\|\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}$ for all $1\leq a,b\leq M$
(iii)

$\widehat{\sigma}_{nT,\alpha}(s)/\sigma_{nT,\alpha}(s)\overset{p}{\to}1$
(iv)

$\widehat{\sigma}_{nT,j}(s)/\sigma_{nT,j}(s)\overset{p}{\to}1$ for all $j\in[d_{x}]$ .

Proof.

(i) Decompose

\displaystyle\widehat{\mathcal{V}}_{z,nT}-\mathcal{V}_{z,nT}=\left(\widehat{\mathcal{V}}_{z,nT}-\widetilde{\mathcal{V}}_{z,nT}\right)+\left(\widetilde{{\mathcal{V}}}_{z,nT}-\mathcal{V}_{z,nT}\right),

(C.13)

where

\displaystyle\widetilde{\mathcal{V}}_{z,nT}

\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}}).

(C.14)

Since

\displaystyle\vec{\widehat{e}}_{it}(s)=\underbracket{\vec{a}_{it}(s)[\alpha_{0}(s)-\widehat{\alpha}_{nT}(s)]+\sum_{j=1}^{d_{x}}\vec{X}_{it}^{j}[\beta_{0j}(s)-\widehat{\beta}_{nT,j}(s)]}_{\eqqcolon b_{it}(s)}+\vec{\varepsilon}_{it}(s),

(C.15)

we have

\displaystyle\vec{\widehat{e}}_{it}(s_{l})\vec{\widehat{e}}_{it^{\prime}}(s_{l^{\prime}})

\displaystyle=\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})+b_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}})+b_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})+\vec{\varepsilon}_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}}).

(C.16)

Then, we can write

$\displaystyle\widehat{\mathcal{V}}_{z,nT}-\widetilde{\mathcal{V}}_{z,nT}$	$\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>\|t^{\prime}-t\|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}[\vec{\widehat{e}}_{it}(s_{l})\vec{\widehat{e}}_{it^{\prime}}(s_{l^{\prime}})-\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})]$	(C.17)
	$\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>\|t^{\prime}-t\|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}b_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}})$	(C.18)
	$\displaystyle\quad+\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>\|t^{\prime}-t\|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}b_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})$	(C.19)
	$\displaystyle\quad+\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>\|t^{\prime}-t\|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\vec{\varepsilon}_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}}).$	(C.20)

In view of Theorem 3.1(ii) and (iii), $|b_{it}(s)|\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}$ uniformly in $s$ and $(i,t)$. In addition, $||\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}||\lesssim K$, and, by Markov's inequality, both $||\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\vec{\varepsilon}_{it}(s_{l})||$ and $||\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})||$ are $\lesssim_{p}K$. Hence, we have $||\widehat{\mathcal{V}}_{z,nT}-\widetilde{\mathcal{V}}_{z,nT}||\lesssim_{p}K^{3/2}/\sqrt{nT}+K^{2-\pi}$.

Meanwhile, it is not difficult to see that $||\widetilde{\mathcal{V}}_{z,nT}-\mathcal{V}_{z,nT}||\lesssim_{p}K/\sqrt{nT}$ by Markov’s inequality. Then, the result follows from the triangle inequality.

(ii) Similarly to (i), we decompose

\displaystyle\widehat{\mathcal{V}}_{ab,nT}-\mathcal{V}_{ab,nT}=\left(\widehat{\mathcal{V}}_{ab,nT}-\widetilde{\mathcal{V}}_{ab,nT}\right)+\left(\widetilde{{\mathcal{V}}}_{ab,nT}-\mathcal{V}_{ab,nT}\right),

(C.21)

where

\displaystyle\widetilde{\mathcal{V}}_{ab,nT}

\displaystyle=\frac{2}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{1\leq i,j\leq n}p_{a,i,j}p_{b,i,j}\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})\vec{\varepsilon}_{jt}(s_{l})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}}).

(C.22)

For the first term on the right-hand side, noting that

$\displaystyle\vec{\widehat{e}}_{it}(s_{l})\vec{\widehat{e}}_{it^{\prime}}(s_{l^{\prime}})\vec{\widehat{e}}_{jt}(s_{l})\vec{\widehat{e}}_{jt^{\prime}}(s_{l^{\prime}})$	$\displaystyle=\{\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})+b_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}})+b_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})+\vec{\varepsilon}_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}})\}$	(C.23)
	$\displaystyle\quad\times\{\vec{\varepsilon}_{jt}(s_{l})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}})+b_{jt}(s_{l})b_{jt^{\prime}}(s_{l^{\prime}})+b_{jt}(s_{l})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}})+\vec{\varepsilon}_{jt}(s_{l})b_{jt^{\prime}}(s_{l^{\prime}})\}$	(C.24)
	$\displaystyle=\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})\vec{\varepsilon}_{jt}(s_{l})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}})$	(C.25)
	$\displaystyle\quad+\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}})b_{jt}(s_{l})+\;\;\cdots\text{(terms with three $\vec{\varepsilon}$'s and one $b$)}$	(C.26)
	$\displaystyle\quad+\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})b_{jt}(s_{l})b_{jt^{\prime}}(s_{l^{\prime}})+\;\;\cdots\text{(terms with two $\vec{\varepsilon}$'s and two $b$'s)}$	(C.27)
	$\displaystyle\quad+\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})b_{it}(s_{l})b_{jt}(s_{l})b_{jt^{\prime}}(s_{l^{\prime}})+\;\;\cdots\text{(terms with one $\vec{\varepsilon}$ and three $b$'s)}$	(C.28)
	$\displaystyle\quad+b_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}})b_{jt}(s_{l})b_{jt^{\prime}}(s_{l^{\prime}}),$	(C.29)

and bounding each group of terms with the uniform bound on $|b_{it}(s)|$ established in the proof of (i), one can show that $|\widehat{\mathcal{V}}_{ab,nT}-\widetilde{\mathcal{V}}_{ab,nT}|\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}$.
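The grouping in (C.23)–(C.29) is the binomial expansion of four factors, each of the form $\vec{\varepsilon}+b$. The short enumeration below (purely illustrative; the string labels are ours) lists all $2^{4}=16$ terms and counts them by the number of $b$ factors: one $b$-free term, which is the summand of $\widetilde{\mathcal{V}}_{ab,nT}$, and 15 terms that each carry at least one factor $b$ and therefore make up $\widehat{\mathcal{V}}_{ab,nT}-\widetilde{\mathcal{V}}_{ab,nT}$.

```python
from collections import Counter
from itertools import product

# The four residual factors in (C.23)-(C.24); each equals eps + b.
FACTORS = ["it(s_l)", "it'(s_l')", "jt(s_l)", "jt'(s_l')"]


def expand():
    """Return every term of prod_k (eps_k + b_k) as a tuple of labels."""
    return [
        tuple(f"{kind}_{idx}" for kind, idx in zip(choice, FACTORS))
        for choice in product(("eps", "b"), repeat=len(FACTORS))
    ]


terms = expand()
# Group the terms by how many b factors they contain.
by_num_b = dict(sorted(Counter(
    sum(label.startswith("b_") for label in term) for term in terms
).items()))
print(len(terms), by_num_b)  # 16 {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}
```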

For the second term, an argument analogous to the proof of Proposition 2 in Lin and Lee (2010) gives $|\widetilde{\mathcal{V}}_{ab,nT}-\mathcal{V}_{ab,nT}|\lesssim_{p}1/\sqrt{nT}$. Combining the two bounds yields (ii).

(iii) To prove the result, it suffices to show that

$\displaystyle\left|\frac{[\widehat{\sigma}_{nT,\alpha}(s)]^{2}-[\sigma_{nT,\alpha}(s)]^{2}}{[\sigma_{nT,\alpha}(s)]^{2}}\right|\overset{p}{\to}0.$	(C.30)

As shown in the proof of Theorem 3.2, $[\sigma_{nT,\alpha}(s)]^{2}$ is bounded below by $c||\phi^{K}(s)||^{2}$ for some constant $c>0$. On the other hand, write $R_{nT}\coloneqq\Omega_{nT}\overline{J}_{nT}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}$ and let $\widehat{R}_{nT}$ denote its estimator counterpart. By the triangle inequality,

$\displaystyle\left|[\widehat{\sigma}_{nT,\alpha}(s)]^{2}-[\sigma_{nT,\alpha}(s)]^{2}\right|$	$\displaystyle=\left|\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\left[\widehat{\Sigma}_{nT}-\Sigma_{nT}\right]\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)\right|$	(C.31)
	$\displaystyle\leq\left\|\widehat{\mathcal{V}}_{nT}-\mathcal{V}_{nT}\right\|\cdot\left\|R_{nT}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)\right\|^{2}+\lambda_{\max}(\widehat{\mathcal{V}}_{nT})\cdot\left\|\left\{\widehat{R}_{nT}-R_{nT}\right\}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)\right\|^{2}$	(C.32)
	$\displaystyle\quad+2\left|\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}R_{nT}^{\top}\widehat{\mathcal{V}}_{nT}\left\{\widehat{R}_{nT}-R_{nT}\right\}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)\right|$	(C.33)
	$\displaystyle\lesssim_{p}\left(K^{3/2}/\sqrt{nT}+K^{2-\pi}\right)\left\|\phi^{K}(s)\right\|^{2},$	(C.34)

where the last inequality follows from $||\widehat{R}_{nT}-R_{nT}||\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}$ (Lemma B.9(i) and (iv)) and $||\widehat{\mathcal{V}}_{nT}-\mathcal{V}_{nT}||\lesssim_{p}K^{3/2}/\sqrt{nT}+K^{2-\pi}$ (results (i) and (ii)). Dividing by the lower bound on $[\sigma_{nT,\alpha}(s)]^{2}$, we thus have

$\displaystyle\left|\frac{[\widehat{\sigma}_{nT,\alpha}(s)]^{2}-[\sigma_{nT,\alpha}(s)]^{2}}{[\sigma_{nT,\alpha}(s)]^{2}}\right|\lesssim_{p}K^{3/2}/\sqrt{nT}+K^{2-\pi}\to 0.$	(C.35)
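The three terms in (C.31)–(C.33) come from an elementary identity for quadratic forms. Assuming $[\sigma_{nT,\alpha}(s)]^{2}=x^{\top}\mathcal{V}_{nT}x$ with $x\coloneqq R_{nT}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)$, and analogously for the estimate with $\widehat{x}\coloneqq\widehat{R}_{nT}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)$, set $d\coloneqq\widehat{x}-x$; for symmetric $\widehat{\mathcal{V}}$ one has $\widehat{x}^{\top}\widehat{\mathcal{V}}\widehat{x}-x^{\top}\mathcal{V}x=x^{\top}(\widehat{\mathcal{V}}-\mathcal{V})x+2x^{\top}\widehat{\mathcal{V}}d+d^{\top}\widehat{\mathcal{V}}d$. The NumPy check below uses random symmetric matrices and vectors as stand-ins (illustrative only, not the paper's objects) and the spectral norm, which equals $\lambda_{\max}$ when the matrix is positive semidefinite.

```python
import numpy as np


def spectral_norm(A):
    """Largest singular value of A."""
    return np.linalg.norm(A, 2)


rng = np.random.default_rng(2)
K = 6
# Random symmetric stand-ins for V_nT and its estimate (illustrative only).
A = rng.standard_normal((K, K))
B = rng.standard_normal((K, K))
V = (A + A.T) / 2
V_hat = (B + B.T) / 2
x = rng.standard_normal(K)       # plays the role of R S^T phi
x_hat = rng.standard_normal(K)   # plays the role of R_hat S^T phi
d = x_hat - x

lhs = x_hat @ V_hat @ x_hat - x @ V @ x
identity_rhs = x @ (V_hat - V) @ x + 2 * x @ V_hat @ d + d @ V_hat @ d
# Term-by-term bound mirroring (C.32)-(C.33).
bound = (spectral_norm(V_hat - V) * (x @ x)
         + spectral_norm(V_hat) * (d @ d)
         + 2 * abs(x @ V_hat @ d))
```

In the proof, the first term dominates: the remaining two are of order $(K/\sqrt{nT}+K^{(1-2\pi)/2})^{2}$ and $K/\sqrt{nT}+K^{(1-2\pi)/2}$ times $||\phi^{K}(s)||^{2}$, both no larger than the rate in (C.34).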

(iv) The proof is analogous to that of (iii). ∎

Appendix D Supplementary figures for the empirical analysis

References

Belloni et al. (2015) Belloni, A., Chernozhukov, V., Chetverikov, D., and Kato, K., 2015. Some new asymptotic theory for least squares series: Pointwise and uniform results, Journal of Econometrics, 186 (2), 345–366.
Beyaztas et al. (2024) Beyaztas, U., Shang, H.L., Sezer, G.B., Mandal, A., Zoh, R.S., and Tekwe, C.D., 2024. Spatial function-on-function regression, arXiv preprint, 2412.17327.
Chen and Müller (2012) Chen, K. and Müller, H.G., 2012. Modeling repeated functional observations, Journal of the American Statistical Association, 107 (500), 1599–1609.
Chen (2007) Chen, X., 2007. Large sample sieve estimation of semi-nonparametric models, in: Handbook of Econometrics, vol. 6, Chapter 76, Elsevier, 5549–5632.
Delicado (2011) Delicado, P., 2011. Dimensionality reduction when data are density functions, Computational Statistics & Data Analysis, 55 (1), 401–420.
Denbee et al. (2021) Denbee, E., Julliard, C., Li, Y., and Yuan, K., 2021. Network risk and key players: A structural analysis of interbank liquidity, Journal of Financial Economics, 141 (3), 831–859.
Di et al. (2024) Di, C., Wang, G., Wu, S., Evenson, K.R., LaMonte, M.J., and LaCroix, A.Z., 2024. Utilizing wearable devices to improve precision in physical activity epidemiology: Sensors, data and analytic methods, in: Statistics in Precision Health: Theory, Methods and Applications, Springer, 41–64.
Dudley and Philipp (1983) Dudley, R.M. and Philipp, W., 1983. Invariance principles for sums of Banach space valued random elements and empirical processes, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 62 (4), 509–552.
Eren and Uz (2020) Eren, E. and Uz, V.E., 2020. A review on bike-sharing: The factors affecting bike-sharing demand, Sustainable Cities and Society, 54, 101882.
Faghih-Imani and Eluru (2016) Faghih-Imani, A. and Eluru, N., 2016. Incorporating the impact of spatio-temporal interactions on bicycle sharing system demand: A case study of New York CitiBike system, Journal of Transport Geography, 54, 218–227.
Gallant and White (1988) Gallant, R. and White, H., 1988. A unified theory of estimation and inference for nonlinear dynamic models, Blackwell.
Hoshino (2022) Hoshino, T., 2022. Sieve IV estimation of cross-sectional interaction models with nonparametric endogenous effect, Journal of Econometrics, 229 (2), 263–275.
Hoshino (2024) Hoshino, T., 2024. Functional spatial autoregressive models, arXiv preprint, 2402.14763.
Hron et al. (2016) Hron, K., Menafoglio, A., Templ, M., Hruzová, K., and Filzmoser, P., 2016. Simplicial principal component analysis for density functions in Bayes spaces, Computational Statistics & Data Analysis, 94, 330–350.
Hyndman and Ullah (2007) Hyndman, R.J. and Ullah, M.S., 2007. Robust forecasting of mortality and fertility rates: A functional data approach, Computational Statistics & Data Analysis, 51 (10), 4942–4956.
Jenish (2012) Jenish, N., 2012. Nonparametric spatial regression under near-epoch dependence, Journal of Econometrics, 167 (1), 224–239.
Jenish and Prucha (2009) Jenish, N. and Prucha, I.R., 2009. Central limit theorems and uniform laws of large numbers for arrays of random fields, Journal of Econometrics, 150 (1), 86–98.
Jenish and Prucha (2012) Jenish, N. and Prucha, I.R., 2012. On spatial processes and asymptotic inference under near-epoch dependence, Journal of Econometrics, 170 (1), 178–190.
Koop et al. (1996) Koop, G., Pesaran, M.H., and Potter, S.M., 1996. Impulse response analysis in nonlinear multivariate models, Journal of Econometrics, 74 (1), 119–147.
Kress (2014) Kress, R., 2014. Linear Integral Equations, Third Edition, Springer.
Kuersteiner and Prucha (2020) Kuersteiner, G.M. and Prucha, I.R., 2020. Dynamic spatial panel models: Networks, common shocks, and sequential exogeneity, Econometrica, 88 (5), 2109–2146.
Lee and Yu (2010) Lee, L.F. and Yu, J., 2010. Estimation of spatial autoregressive panel data models with fixed effects, Journal of Econometrics, 154 (2), 165–185.
Lee and Yu (2014) Lee, L.F. and Yu, J., 2014. Efficient GMM estimation of spatial dynamic panel data models with fixed effects, Journal of Econometrics, 180 (2), 174–197.
Lin et al. (2018) Lin, L., He, Z., and Peeta, S., 2018. Predicting station-level hourly demand in a large-scale bike-sharing network: A graph convolutional neural network approach, Transportation Research Part C: Emerging Technologies, 97, 258–276.
Lin and Lee (2010) Lin, X. and Lee, L.F., 2010. GMM estimation of spatial autoregressive models with unknown heteroskedasticity, Journal of Econometrics, 157 (1), 34–52.
Ma et al. (2024) Ma, T., Yao, F., and Zhou, Z., 2024. Network-level traffic flow prediction: Functional time series vs. functional neural network approach, The Annals of Applied Statistics, 18 (1), 424–444.
Scott (1973) Scott, D.J., 1973. Central limit theorems for martingales and for processes with stationary increments using a Skorokhod representation approach, Advances in Applied Probability, 5 (1), 119–137.
Su and Hoshino (2016) Su, L. and Hoshino, T., 2016. Sieve instrumental variable quantile regression estimation of functional coefficient models, Journal of Econometrics, 191 (1), 231–254.
Torti et al. (2021) Torti, A., Pini, A., and Vantini, S., 2021. Modelling time-varying mobility flows using function-on-function regression: Analysis of a bike sharing system in the City of Milan, Journal of the Royal Statistical Society Series C: Applied Statistics, 70 (1), 226–247.
Xu and Lee (2015) Xu, X. and Lee, L.F., 2015. A spatial autoregressive model with a nonlinear transformation of the dependent variable, Journal of Econometrics, 186 (1), 1–18.
Yu et al. (2008) Yu, J., De Jong, R., and Lee, L.F., 2008. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large, Journal of Econometrics, 146 (1), 118–134.
Zhu et al. (2022) Zhu, X., Cai, Z., and Ma, Y., 2022. Network functional varying coefficient model, Journal of the American Statistical Association, 117 (540), 2074–2085.
