
Functional Network Autoregressive Models for Panel Data

Tomohiro Ando (Melbourne Business School, the University of Melbourne. Email: [email protected]) and Tadao Hoshino (School of Political Science and Economics, Waseda University. Email: [email protected]). Address correspondence to: Tadao Hoshino.
Abstract

This study proposes a novel functional vector autoregressive framework for analyzing network interactions of functional outcomes in panel data settings. In this framework, an individual’s outcome function is influenced by the outcomes of others through a simultaneous equation system. To estimate the functional parameters of interest, we need to address the endogeneity issue arising from these simultaneous interactions among outcome functions. This issue is carefully handled by developing a novel functional moment-based estimator. We establish the consistency, convergence rate, and pointwise asymptotic normality of the proposed estimator. Additionally, we discuss the estimation of marginal effects and impulse response analysis. As an empirical illustration, we analyze the demand for a bike-sharing service in the U.S. The results reveal statistically significant spatial interactions in bike availability across stations, with interaction patterns varying over the time of day.


Keywords: bicycle-sharing systems, endogeneity, functional data analysis, panel data, network autoregressive models.

Introduction

The availability of functional data has been rapidly expanding across all fields of research, leading to a growing need for statistical tools that appropriately account for the unique characteristics of each type of functional data. In the analysis of socioeconomic data, there are at least two key aspects that should be addressed. The first is that an individual’s decision or behavioral pattern may influence that of others through social networks—interactions between individuals. The second is that individuals are intrinsically heterogeneous, even after controlling for observable characteristics—unobserved heterogeneity of individuals. Therefore, analyzing socioeconomic functional data requires functional models that jointly capture both of these aspects, which is the aim of this study.

More specifically, to account for the interactions among units, we extend the network (or spatial) autoregressive (NAR) modeling approach to a functional response model. To address the unobserved individual heterogeneity, we introduce the functional fixed effects approach, given the availability of panel data. When the response variable is a scalar rather than a function, there already exists a vast body of studies investigating fixed-effect NAR models for panel data, such as

\[
Y_{it}=\alpha_{0}\sum_{j=1}^{n}w_{i,j}Y_{jt}+X_{it}^{\top}\beta_{0}+f_{0i}+\varepsilon_{it},\;\;i=1,\ldots,n,\;t=1,\ldots,T, \tag{1.1}
\]

and its variants (e.g., Yu et al., 2008; Lee and Yu, 2010, 2014; Kuersteiner and Prucha, 2020, among others). Here, $Y_{it}$ is a scalar outcome, $w_{i,j}$ denotes a known weight term measuring the social or geographical proximity between units $i$ and $j$, $X_{it}$ is a vector of covariates, $f_{0i}$ represents a fixed effect specific to each $i$, and $\varepsilon_{it}$ denotes an error term. The term $\sum_{j=1}^{n}w_{i,j}Y_{jt}$ captures the local trend of the outcome variable in the neighborhood of $i$. Model (1.1) is typically applied in fields such as health, real estate, transportation, education, and municipal data. However, with the increasing availability of functional data in these fields, such as real-time activity recognition, real-time population mobility and congestion patterns, and regional wealth distributions, scalar models like (1.1) may fail to appropriately capture the complex nature of these interactions.

The above discussion motivates us to extend (1.1) to the following model: for $s\in[0,1]$,

\[
Y_{it}(s)=\alpha_{0}(s)\sum_{j=1}^{n}w_{i,j}A(Y_{jt},s)+X_{it}^{\top}\beta_{0}(s)+f_{0i}(s)+\varepsilon_{it}(s), \tag{1.2}
\]

where $Y_{it}$ represents the outcome function of interest, which may or may not be a smooth function of $s$, $\alpha_{0}$ is the interaction effect function, $\beta_{0}$ is a vector of functional coefficients, $f_{0i}$ is the fixed effect function, and $\varepsilon_{it}$ is the functional error term with mean zero at each $s$. Here, $A(\cdot,s)$ denotes a known linear functional, whose form may differ according to the research interest. Since the response variable is a function, we can consider various types of interaction patterns. The most typical form of interaction is the "concurrent" interaction, where only the responses of others at the same evaluation point $s$ are influential. In this case, $A(\cdot,s)$ is a point-evaluation functional at $s$: $A(Y_{jt},s)=Y_{jt}(s)$. When $s$ represents a time, past outcomes should affect future outcomes (but not conversely), which motivates employing $A(Y_{jt},s)=\int_{0}^{1}Y_{jt}(u)\nu(u,s)\,\text{d}u$, where $\nu(u,s)$ is a user-chosen function that is non-negative, increasing in $u$ up to $s$, and satisfies $\nu(u,s)=0$ for $u>s$. As another example, if others' responses at all evaluation points are equally influential, we may use $A(Y_{jt},s)=\int_{0}^{1}Y_{jt}(u)\,\text{d}u$. All of these examples can be represented as an integral operator $A(Y_{jt},s)=\int_{0}^{1}Y_{jt}(u)\nu(u,s)\,\text{d}u$ with some kernel weight function $\nu(u,s)$; for the point-evaluation functional, we can set $\nu(u,s)=\delta(u-s)$, where $\delta$ denotes the Dirac delta function.
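
To fix ideas, the following sketch (our own illustration, not code from the paper) evaluates these three choices of $A(\cdot,s)$ on a discretized grid; the grid size, the example function $h$, and the simple indicator kernel used for the causal case are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of the three interaction operators A(h, s) discussed
# above, discretized on an equally spaced grid over [0, 1].
grid = np.linspace(0.0, 1.0, 101)      # evaluation points s (and u)
h = np.sin(2 * np.pi * grid)           # an example function h(u)

# (a) Concurrent interaction: A(h, s) = h(s), i.e., nu(u, s) = delta(u - s).
A_concurrent = h.copy()

# (b) A "causal" kernel with nu(u, s) = 1{u <= s} (a simple special case of a
#     weight vanishing for u > s): A(h, s) = int_0^s h(u) du.
A_causal = np.array([np.trapz(h[grid <= s], grid[grid <= s]) for s in grid])

# (c) Uniform weighting: A(h, s) = int_0^1 h(u) du for every s.
A_uniform = np.full_like(grid, np.trapz(h, grid))
```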

We now provide three empirical settings to which model (1.2) can be naturally applied.

Example 1.1 (Health data analysis).

In the health literature, researchers have increasingly focused on real-time activity data collected through wearable devices or smartphone apps (see, e.g., Di et al. (2024) for a review). As a typical example, $Y_{it}(s)$ represents the activity level of individual $i$, measured by an accelerometer at time $s$ on day $t$. Now, suppose the dataset consists of an elderly population, where some people in the same neighborhood frequently engage in fitness activities such as running. If we apply our model to their activity-level data, with $w_{i,j}$ representing neighborhood membership and $A(Y_{jt},s)=Y_{jt}(s)$, we may observe a significantly positive interaction effect.

Example 1.2 (Demographic data analysis).

Demographic analysis is a major application of functional data analysis (FDA). For instance, functional analysis of regional age distributions (i.e., population pyramids) has been studied extensively (e.g., Delicado, 2011; Hron et al., 2016; Hoshino, 2024). Among these studies, Hoshino (2024) considered a functional spatial autoregressive model in which $Y_{i}(s)$ represents the $s$-th age quantile of city $i$, allowing interactions with the age quantiles of neighboring cities. Another common application in FDA is the analysis of mortality and fertility rates (e.g., Hyndman and Ullah, 2007; Chen and Müller, 2012). In such a setting, the outcome function $Y_{it}(s)$ may represent the age-specific fertility rate of women of age $s$ in city $i$ in year $t$. The presence of regional interactions in fertility would be unsurprising.

Example 1.3 (Transportation data analysis).

Functional data analysis of transportation data, such as traffic flows and demand for transportation services, has been gaining significant attention (see, e.g., Ma et al. (2024) for recent advancements). In the empirical application of this study, we apply our model to bike-usage data from a U.S. bike-sharing system, where $Y_{it}(s)$ represents the availability of bikes at station $i$ at time $s$ during week $t$. We find statistically significant positive or negative spatial interactions among the bike availabilities of nearby stations, depending on the time of day. Further details are provided in Section 6.

In model (1.2), the parameters of primary interest are $\alpha_{0}$ and $\beta_{0}$. Since the total number of time periods $T$ may be either large or small, we apply a first-differencing transformation to eliminate the individual fixed effects from the model. For the transformed model, rather than estimating $\alpha_{0}(s)$ and $\beta_{0}(s)$ pointwise at many different $s$-values separately, we approximate them using orthonormal basis expansions and estimate their entire functional forms jointly in a single estimation. Our proposed estimator is based on the generalized method of moments (GMM). Specifically, we first derive a set of moment conditions at each $s$, in a manner similar to Lin and Lee (2010), and then integrate these conditions numerically over $s\in[0,1]$. These integrated moment functions define our GMM objective function, and the resulting estimator is referred to as the integrated-GMM estimator. Once $\alpha_{0}(s)$ and $\beta_{0}(s)$ are estimated, we can, if necessary, estimate the fixed effects $f_{01}(s),\ldots,f_{0n}(s)$ simply by taking the individual-level mean of the residuals.

Note that in model (1.2), the outcome functions appear on both the left- and right-hand sides, implying that it is formulated as a system of simultaneous functional equations. Depending on the true values of the functional parameters, the model may exhibit an explosive network interaction process, leading to non-stationarity and inconsistency of the proposed estimator. Thus, we first derive the condition under which the model attains stationarity in our context, showing that the magnitude of network interactions must reside within a certain range.

Then, under the stationarity condition on network interactions, along with some regularity conditions, we derive the convergence rates of the integrated-GMM estimators for $\alpha_{0}$ and $\beta_{0}$. In addition, we prove that the estimators are asymptotically normal at each evaluation point $s$. Due to the complexity of characterizing the stochastic process of functional outcomes, the numerical integration of moment functions, the first-differencing elimination of functional fixed effects, and the need to appropriately control the order of the basis expansion, among other factors, establishing these results involves new mathematical challenges and requires careful discussion. These theoretical results are numerically corroborated through a series of Monte Carlo experiments.

As an empirical illustration of our method, we apply it to the demand analysis of a bike-sharing system in the San Francisco Bay Area, U.S. Using publicly available data from Bay Area Bike Share, we analyze spatial interactions in bike availability across 70 stations from May 2014 to August 2015. Our results reveal significant positive spatial interactions in bike availability during the morning hours, while negative interactions emerge in the early evening. Furthermore, we conduct an impulse response analysis to demonstrate how a reduction in bike availability at a given station propagates to nearby stations over time. These findings highlight the importance of spatial interactions in shared mobility services and demonstrate the practical applicability of our method.


Our paper relates to a broad range of theoretical and empirical literature. From a theoretical perspective, our study contributes to both the FDA literature and the network/spatial interactions literature by proposing a new model that connects the two. In this sense, one of the studies most closely related to ours is Zhu et al. (2022), who proposed a functional NAR model similar to (1.2), but not in a panel data setting. In contrast to Zhu et al. (2022), our novel GMM estimator requires neither parametric assumptions nor i.i.d. conditions for the disturbance term. This weaker requirement arises because we treat the individual effects as parameters, whereas Zhu et al. (2022) perform functional principal component analysis to control for them based on a homogeneity condition. Moreover, they considered only the concurrent interaction case (i.e., $A(Y_{jt},s)=Y_{jt}(s)$). As described earlier through the examples of health, demographic, and transportation data, the variable $s$ typically represents a time on some scale. Concurrent interaction rules out interactions even with immediate past outcomes and allows only strictly simultaneous interactions, which limits the applicability of the model. Computationally, our estimator recovers the full functional forms of the functional parameters in a single step, while the estimator in Zhu et al. (2022) must be applied repeatedly at each evaluation point $s$.

On the empirical side, demand forecasting for bike-sharing systems has been an active topic in the data science literature (e.g., Faghih-Imani and Eluru, 2016; Lin et al., 2018; Eren and Uz, 2020; Torti et al., 2021, among others). Among these studies, Faghih-Imani and Eluru (2016) is most closely related to our study in that they employed a spatial panel model similar to (1.1) to analyze the spatial and temporal interaction structure for the bike-sharing system in New York City, CitiBike. In their approach, however, the data are not treated as functional, and thus the model parameters are not allowed to vary over time. By contrast, Torti et al. (2021) analyzed the flow of bikes in the bike-sharing system in Milan, BikeMi, through a functional linear model with functional coefficients; however, they did not account for the spatial interactions of mobility. Thus, our empirical study can be viewed as combining the strengths of these two papers.


The major contributions of this paper are summarized as follows: First, we propose a novel model for analyzing various forms of network and spatial interactions underlying socioeconomic functional data. Second, we formally establish a condition that ensures the outcome functions follow a unique network-stationary process within the model. Third, we develop a novel GMM-type estimator, the integrated-GMM estimator, for estimating the functional parameters. Fourth, we establish the asymptotic properties of the integrated-GMM estimator, including its consistency, rate of convergence, and pointwise limiting distribution. Fifth, we additionally develop a new approach for implementing network impulse response analysis and investigate its convergence property. Finally, we apply our method to the analysis of bike-sharing demand, offering new empirical insights into functional spatial interactions in mobility.

Paper organization

The rest of the paper is organized as follows. In Section 2, we present our model and discuss its stationarity condition. Section 3 introduces our integrated-GMM estimator and investigates its asymptotic properties. In Section 4, we discuss additional topics related to our model, including the estimation of marginal effects and network impulse response analysis. Section 5 conducts a set of Monte Carlo simulations to numerically demonstrate the properties of our estimator. Section 6 presents our empirical analysis of the U.S. bike-sharing data, and Section 7 concludes. Proofs of all technical results are provided in the Appendix.

Notation

For a function $h$ defined on $[0,1]$ and $p\in[1,\infty)$, the $L^{p}$ norm of $h$ is written as $||h||_{L^{p}}\coloneqq(\int_{0}^{1}|h(s)|^{p}\,\text{d}s)^{1/p}$, and $L^{p}(0,1)$ denotes the set of $h$'s such that $||h||_{L^{p}}<\infty$. For a random variable $X$, the $L^{p}$ norm of $X$ is written as $||X||_{p}\coloneqq(\mathbb{E}|X|^{p})^{1/p}$. For a matrix $M$, $||M||$, $||M||_{1}$, and $||M||_{\infty}$ denote the Frobenius norm, the maximum absolute column sum, and the maximum absolute row sum of $M$, respectively. If $M$ is a square matrix, we use $\lambda_{\max}(M)$ and $\lambda_{\min}(M)$ to denote its largest and smallest eigenvalues, respectively. For a positive integer $Z$, we denote $[Z]\coloneqq\{1,\ldots,Z\}$. We use $I_{Z}$ to denote the identity matrix of dimension $Z$. Finally, $X\lesssim Y$ means that $X=O(Y)$ almost surely, and $X\lesssim_{P}Y$ means that $X=O_{P}(Y)$.

Functional Network Autoregressive Model

The model

Suppose that we have balanced panel data of size $(n,T)$: $\{(Y_{it},X_{it},w_{i,1},\ldots,w_{i,n}):i\in[n],\;t\in[T]\}$. The number of time periods $T$ can be either fixed or tending to infinity jointly with the sample size $n$. Here, $Y_{it}:[0,1]\to\mathbb{R}$ denotes a random outcome function of interest with common support $[0,1]$, $X_{it}=(X_{it}^{1},\ldots,X_{it}^{d_{x}})^{\top}$ denotes a vector of covariates, and $w_{i,j}\in\mathbb{R}$ is the $(i,j)$-th element of an $n\times n$ time-invariant interaction matrix $W_{n}=(w_{i,j})$. The value of each $w_{i,j}$ is pre-determined non-randomly. In social network analysis, it is common to set $w_{i,j}=c_{i,j}\bm{1}\{\text{$i$ and $j$ are peers}\}$, where $c_{i,j}$ is some normalizing constant. Similarly, if each $i$ represents a spatial unit, one may use $w_{i,j}=c_{i,j}\bm{1}\{\Delta(i,j)\leq\overline{\Delta}\}$, where $\Delta(i,j)$ is the distance between $i$ and $j$, and $\overline{\Delta}$ is a given threshold. As is conventional, we set $w_{i,i}=0$ for all $i$ as a normalization.

As shown in (1.2), our working model is given as follows: for $s\in[0,1]$,

\[
Y_{it}(s)=\alpha_{0}(s)A(\overline{Y}_{it},s)+X_{it}^{\top}\beta_{0}(s)+f_{0i}(s)+\varepsilon_{it}(s),\;\;i\in[n],\;\;t\in[T], \tag{2.1}
\]

where $\overline{Y}_{it}=\sum_{j=1}^{n}w_{i,j}Y_{jt}$. Recall that we assume throughout that $A(\cdot,\cdot)$ is linear in its first argument, so that $\sum_{j=1}^{n}w_{i,j}A(Y_{jt},\cdot)=A(\overline{Y}_{it},\cdot)$. For the structure of network interaction, Beyaztas et al. (2024) and Hoshino (2024) alternatively consider the form $\int_{0}^{1}\overline{Y}_{it}(u)\alpha_{0}(u,s)\,\text{d}u$, reflecting the usual functional linear form in the FDA literature. Neither this type of interaction structure nor the proposed one is more general than the other, but ours may offer some interpretational simplicity. We impose additional shape restrictions on $A(\cdot,\cdot)$ later.

The parameters of primary interest are the interaction effect function $\alpha_{0}(s)$ and the coefficient functions $\beta_{0}(s)=(\beta_{01}(s),\ldots,\beta_{0d_{x}}(s))$. The functional individual effects $f_{01}(s),\ldots,f_{0n}(s)$ are treated as nuisance parameters. Restricting the support of $s$ to the unit interval is a harmless normalization as long as the response functions share an identical interval support. For simplicity, we do not explicitly allow $X_{it}$ to be a function of $s$; this can be relaxed easily at the expense of more complicated notation and proofs. A constant term is not included in $X_{it}$. In the following, we assume that $Y_{it}\in L^{2}(0,1)$, $\varepsilon_{it}\in L^{2}(0,1)$, and that $\alpha_{0}$ and $\beta_{0}$ are continuous.

Stationarity

We discuss the stationarity of our model. Recall that our model is a system of simultaneous functional equations, which may not have a unique interior solution in general, depending on the parameter values. Thus, just like the stability condition for vector autoregressive models in the time series literature, we need to impose some conditions on the model to ensure that the outcome functions follow a unique stationary data-generating process and prevent explosive behavior.

Let $Y_{t}(s)=(Y_{1t}(s),\ldots,Y_{nt}(s))^{\top}$, $\bm{A}(Y_{t},s)=(A(Y_{1t},s),\ldots,A(Y_{nt},s))^{\top}$, $X_{t}=(X_{1t},\ldots,X_{nt})^{\top}$, $F_{0}(s)=(f_{01}(s),\ldots,f_{0n}(s))^{\top}$, and $\mathcal{E}_{t}(s)=(\varepsilon_{1t}(s),\ldots,\varepsilon_{nt}(s))^{\top}$. Then, we can re-write (1.2) in matrix form as

\[
Y_{t}(s)=\alpha_{0}(s)W_{n}\bm{A}(Y_{t},s)+X_{t}\beta_{0}(s)+F_{0}(s)+\mathcal{E}_{t}(s),\;\;t\in[T]. \tag{2.2}
\]

This expression clearly indicates that our model is characterized as $T$ distinct systems of functional equations, each of size $n$. We introduce the following assumption to ensure that the model has a unique stationary solution.

Assumption 2.1 (Stationarity).

(i) $\overline{\alpha}_{0}\lesssim 1$ and $||W_{n}||_{\infty}\lesssim 1$ such that $\overline{\alpha}_{0}||W_{n}||_{\infty}<1$, where $\overline{\alpha}_{0}\coloneqq\max_{s\in[0,1]}|\alpha_{0}(s)|$. (ii) For any $h\in L^{2}(0,1)$, $||A(h,\cdot)||_{L^{2}}\leq||h||_{L^{2}}$.

Assumption 2.1(i) requires that the magnitude of the network interaction not be too strong. The existence of $\overline{\alpha}_{0}$ is guaranteed by the continuity of $\alpha_{0}$. With Assumption 2.1(ii), we have for any $h,h^{\prime}\in L^{2}(0,1)$ that $||A(h-h^{\prime},\cdot)||_{L^{2}}\leq||h-h^{\prime}||_{L^{2}}$, which indicates the contraction property of the operator $A$. This assumption still accommodates many empirically interesting interaction patterns. For example, the point-evaluation functional $A(h,s)=h(s)$ trivially satisfies $||A(h,\cdot)||_{L^{2}}=||h||_{L^{2}}$. As another example, suppose $A(h,s)=\int_{0}^{1}h(u)\nu(u,s)\,\text{d}u$ for some continuous $\nu$. Since

\[
||A(h,\cdot)||_{L^{2}}^{2}\leq\int_{0}^{1}\int_{0}^{1}\left|h(u)\nu(u,s)\right|^{2}\text{d}u\,\text{d}s\leq\overline{\nu}^{2}||h||_{L^{2}}^{2}, \tag{2.3}
\]

where $\overline{\nu}\coloneqq\max_{(u,s)\in[0,1]^{2}}|\nu(u,s)|$, Assumption 2.1(ii) is implied whenever $\overline{\nu}\leq 1$.


Now, denote $\mathcal{H}_{n,p}\coloneqq\{H=(h_{1},\ldots,h_{n}):h_{i}\in L^{p}(0,1)\;\text{for all}\;i\}$, and define a linear operator $\mathcal{A}$ as

\[
(\mathcal{A}H)(s)\coloneqq\alpha_{0}(s)W_{n}\bm{A}(H,s),\;\;H\in\mathcal{H}_{n,p}. \tag{2.4}
\]

Then, we can write our model symbolically as follows:

\[
Y_{t}=\mathcal{A}Y_{t}+X_{t}\beta_{0}+F_{0}+\mathcal{E}_{t},\;\;t\in[T]. \tag{2.5}
\]

Further, denoting by $\text{Id}$ the identity operator, if the inverse operator $(\text{Id}-\mathcal{A})^{-1}$ exists, the solution $Y_{t}$ of the system is uniquely determined, up to an equivalence class in $\mathcal{H}_{n,2}$, as

\[
Y_{t}=(\text{Id}-\mathcal{A})^{-1}[X_{t}\beta_{0}+F_{0}+\mathcal{E}_{t}],\;\;t\in[T]. \tag{2.6}
\]

The next proposition states that Assumption 2.1 is sufficient for the existence of $(\text{Id}-\mathcal{A})^{-1}$.

Proposition 2.1.

Suppose that Assumption 2.1 holds. Then, $(\text{Id}-\mathcal{A})^{-1}$ exists, and for each $t\in[T]$, $Y_{t}$ is the only solution of (1.2) in the Banach space $(\mathcal{H}_{n,2},||\cdot||_{\infty,2})$, where $||H||_{\infty,p}\coloneqq\max_{1\leq i\leq n}||h_{i}||_{L^{p}}$.

Note that the explicit form of the inverse operator $(\text{Id}-\mathcal{A})^{-1}$ cannot be derived in general, except in some simple cases such as $(\mathcal{A}Y_{t})(s)=\alpha_{0}(s)W_{n}Y_{t}(s)$, in which case $(\text{Id}-\mathcal{A})^{-1}$ is obtained as $(I_{n}-\alpha_{0}(\cdot)W_{n})^{-1}$. In practice, however, we can approximate it with arbitrary precision by truncating the Neumann series expansion $(\text{Id}-\mathcal{A})^{-1}=\sum_{\ell=0}^{\infty}\mathcal{A}^{\ell}$ at a sufficiently large order (see, e.g., Kress, 2014). See Remark 1 in Zhu et al. (2022) for a related discussion.
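
As a minimal illustration of this truncation, the following sketch (our own, for the concurrent case, where the operator reduces to the matrix $I_{n}-\alpha_{0}(s)W_{n}$ at each $s$) compares the truncated Neumann series with the exact inverse; the row-normalized random $W_{n}$ and the value $\alpha_{0}(s)=0.4$ are illustrative assumptions.

```python
import numpy as np

# Truncated Neumann series for the concurrent case, where (Id - A)^{-1}
# reduces to (I_n - alpha0(s) W_n)^{-1} at each evaluation point s.
def neumann_inverse(alpha_s, W, order=20):
    """Approximate (I - alpha_s W)^{-1} by sum_{l=0}^{order} (alpha_s W)^l."""
    n = W.shape[0]
    out = np.eye(n)
    term = np.eye(n)
    for _ in range(order):
        term = alpha_s * (term @ W)    # next term (alpha_s W)^{l+1}
        out += term
    return out

rng = np.random.default_rng(0)
W = rng.uniform(size=(5, 5))
np.fill_diagonal(W, 0.0)               # w_{i,i} = 0 by normalization
W /= W.sum(axis=1, keepdims=True)      # row-normalize so ||W||_inf = 1
approx = neumann_inverse(0.4, W)       # stationarity holds: 0.4 * 1 < 1
exact = np.linalg.inv(np.eye(5) - 0.4 * W)
assert np.allclose(approx, exact, atol=1e-6)
```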

Estimation and Asymptotic Theory

Integrated-GMM estimation

To estimate the unknown functional parameters $\alpha_{0}(s)$ and $\beta_{0}(s)$, there are broadly two approaches. The first is a "local" approach that estimates the values of these functions at specific $s$-values, repeating the estimation across different points to recover the full functional forms. The second is a "global" approach that estimates the entire functional forms in a single step using a series approximation method. Although both approaches are theoretically valid, the local approach typically requires more computation time and often leads to larger variance (but smaller bias) because it does not exploit information from nearby evaluation points. This study adopts the global approach.

Let $\{\phi_{k}:k=1,2,\ldots\}$ be a series of orthonormal basis functions. We assume throughout that the $\phi_{k}$'s are continuous on $[0,1]$. Then, if the functions $\alpha_{0}$ and $\beta_{0}$ are sufficiently smooth, we can approximate

\[
\alpha_{0}(s)\approx\sum_{k=1}^{K}\phi_{k}(s)\theta_{0\alpha,k}, \tag{3.1}
\]
\[
\beta_{0j}(s)\approx\sum_{k=1}^{K}\phi_{k}(s)\theta_{0j,k},\;\;j\in[d_{x}], \tag{3.2}
\]

uniformly in $s\in[0,1]$, for some coefficient vectors $\theta_{0\alpha}=(\theta_{0\alpha,1},\ldots,\theta_{0\alpha,K})^{\top}$ and $\theta_{0j}=(\theta_{0j,1},\ldots,\theta_{0j,K})^{\top}$, $j\in[d_{x}]$. Here, $K\equiv K_{nT}$ is a sequence of positive integers tending to infinity as $nT$ increases. For simplicity of presentation, we use the same basis functions $\phi_{k}$ and the same basis order $K$ to approximate both $\alpha_{0}$ and $\beta_{0}$. Define $\theta_{0}=(\theta_{0\alpha}^{\top},\theta_{01}^{\top},\ldots,\theta_{0d_{x}}^{\top})^{\top}$, $\phi^{K}(s)=(\phi_{1}(s),\ldots,\phi_{K}(s))^{\top}$,

\[
R_{it}(s)\coloneqq(A(\overline{Y}_{it},s),X_{it}^{\top})^{\top},\;\;H_{it}(s)\coloneqq R_{it}(s)\otimes\phi^{K}(s),\;\;\text{and}\;\;H_{t}(s)=(H_{1t}(s),\ldots,H_{nt}(s))^{\top}. \tag{3.3}
\]

Then, we can further re-write the model (1.2) as

\[
Y_{t}(s)=H_{t}(s)\theta_{0}+F_{0}(s)+V_{t}(s)+\mathcal{E}_{t}(s),\;\;t\in[T]. \tag{3.4}
\]

Here, $V_{t}(s)=(v_{1t}(s),\ldots,v_{nt}(s))^{\top}$ is an $n\times 1$ vector of series approximation errors:

\[
v_{it}(s)\coloneqq A(\overline{Y}_{it},s)\{\alpha_{0}(s)-\phi^{K}(s)^{\top}\theta_{0\alpha}\}+\sum_{j=1}^{d_{x}}X_{it}^{j}\{\beta_{0j}(s)-\phi^{K}(s)^{\top}\theta_{0j}\}. \tag{3.5}
\]

Under the assumptions we will introduce, this approximation error diminishes to zero at a certain rate as $K$ goes to infinity. How to choose an appropriate $K$ is discussed in Remark 3.2.
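
For concreteness, a minimal sketch of such a basis expansion is given below, using the cosine basis $\phi_{1}(s)=1$, $\phi_{k}(s)=\sqrt{2}\cos((k-1)\pi s)$ for $k\geq 2$, which is continuous and orthonormal on $[0,1]$; the particular basis and the coefficient values are our own illustrative choices, not prescribed by the paper.

```python
import numpy as np

# Cosine basis: phi_1(s) = 1, phi_k(s) = sqrt(2) cos((k-1) pi s) for k >= 2,
# orthonormal on [0, 1].
def phi_K(s, K):
    """Return the matrix whose rows are phi^K(s) = (phi_1(s), ..., phi_K(s))."""
    s = np.atleast_1d(s)
    k = np.arange(K)                            # exponents k - 1 = 0, ..., K-1
    out = np.sqrt(2.0) * np.cos(np.pi * np.outer(s, k))
    out[:, 0] = 1.0                             # phi_1 is the constant function
    return out                                  # shape (len(s), K)

# With a coefficient vector theta_alpha, approximation (3.1) is a matrix product:
theta_alpha = np.array([0.5, -0.2, 0.1])        # hypothetical coefficients
alpha_approx = phi_K(np.linspace(0, 1, 11), K=3) @ theta_alpha
```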

Further, let

\[
\bm{Y}(s)=\begin{pmatrix}Y_{1}(s)\\ \vdots\\ Y_{T}(s)\end{pmatrix},\quad\bm{H}(s)=\begin{pmatrix}H_{1}(s)\\ \vdots\\ H_{T}(s)\end{pmatrix},\quad\bm{V}(s)=\begin{pmatrix}V_{1}(s)\\ \vdots\\ V_{T}(s)\end{pmatrix},\quad\bm{\mathcal{E}}(s)=\begin{pmatrix}\mathcal{E}_{1}(s)\\ \vdots\\ \mathcal{E}_{T}(s)\end{pmatrix}, \tag{3.18}
\]

and let $\bm{D}=(d_{ij})$ be the $n(T-1)\times nT$ first-differencing (one-period lag) operator, whose $(i,j)$-th element is defined as

\[
d_{ij}=\begin{cases}-1&\text{if}\;\;i=j,\\ 1&\text{if}\;\;n+i=j,\\ 0&\text{otherwise}.\end{cases} \tag{3.19}
\]

Then, we can remove the unknown fixed effects from the model in the following manner:

\[
\bm{D}\bm{Y}(s)=\bm{D}\bm{H}(s)\theta_{0}+\bm{D}\bm{V}(s)+\bm{D}\bm{\mathcal{E}}(s). \tag{3.20}
\]
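
The operator $\bm{D}$ has a simple Kronecker structure, $\bm{D}=\bm{D}_{1}\otimes I_{n}$ with $\bm{D}_{1}$ the $(T-1)\times T$ differencing matrix, which the following sketch (our own) exploits; it also verifies numerically that $\bm{D}$ annihilates any time-invariant vector of fixed effects.

```python
import numpy as np

# Build the n(T-1) x nT first-differencing operator D of (3.19) via a
# Kronecker product, and check that it removes time-invariant fixed effects.
def diff_operator(n, T):
    D1 = np.zeros((T - 1, T))
    D1[np.arange(T - 1), np.arange(T - 1)] = -1.0   # -Y_t block
    D1[np.arange(T - 1), np.arange(1, T)] = 1.0     # +Y_{t+1} block
    return np.kron(D1, np.eye(n))

n, T = 3, 4
D = diff_operator(n, T)
F0 = np.tile(np.arange(n, dtype=float), T)          # F_0 stacked over t
assert np.allclose(D @ F0, 0.0)                     # differencing removes F_0
```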

We estimate $\theta_{0}$ based on (3.20). To estimate $\theta_{0}$ consistently, we need to address the endogeneity issue caused by the simultaneous interactions of the response functions; that is, since $A(\overline{Y}_{it},s)$ is in general correlated with the error term $\varepsilon_{it}(s)$, simply regressing $\bm{D}\bm{Y}(s)$ on $\bm{D}\bm{H}(s)$ does not yield a consistent estimate of $\theta_{0}$. To tackle this issue, we employ an instrumental variable (IV) approach.

Suppose we have a $d_{q}\times 1$ vector of IVs $Q_{it}=(Q_{it}^{1},\ldots,Q_{it}^{d_{q}})^{\top}$ for $A(\overline{Y}_{it},s)$. For example, noting that $W_{n}Y_{t}=W_{n}\mathcal{A}Y_{t}+W_{n}X_{t}\beta_{0}+W_{n}F_{0}+W_{n}\mathcal{E}_{t}$, we find that the network-lagged covariates $\overline{X}_{it}\coloneqq\sum_{j=1}^{n}w_{i,j}X_{jt}$ (and also their lags) are valid candidates for $Q_{it}$. Define

\[
B_{it}\coloneqq(Q_{it}^{\top},X_{it}^{\top})^{\top},\;\;Z_{it}(s)\coloneqq B_{it}\otimes\phi^{K}(s),\;\;Z_{t}(s)=(Z_{1t}(s),\ldots,Z_{nt}(s))^{\top}, \tag{3.21}
\]

and $\bm{Z}(s)=(Z_{1}(s)^{\top},\ldots,Z_{T}(s)^{\top})^{\top}$. Then, we have $\mathbb{E}[\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s)]=\bm{0}_{(d_{q}+d_{x})K}$.

Now, although one can estimate $\theta_{0}$ based only on the linear moment conditions $\mathbb{E}[\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s)]=\bm{0}_{(d_{q}+d_{x})K}$, which results in a two-stage least squares (2SLS) type estimator, we can additionally utilize quadratic moment conditions to improve the efficiency of estimation (see, e.g., Lin and Lee, 2010). That is, under the independence assumption on the error terms $\{\varepsilon_{it}(s)\}_{i\in[n],t\in[T]}$ (see Assumption 3.3(i) below), for any $n(T-1)\times n(T-1)$ matrices $P_{m}\coloneqq I_{T-1}\otimes P_{m,1}$, where $P_{m,1}$ ($m=1,\ldots,M$) is an $n\times n$ matrix whose diagonal elements are all zero, we have

\[
\mathbb{E}[\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s)]=0,\;\;m\in[M]. \tag{3.22}
\]

Some examples of $P_{m,1}$ include $P_{m,1}=W_{n}$ and $P_{m,1}=W_{n}^{\top}W_{n}-\text{diag}(W_{n}^{\top}W_{n})$.

Combining the linear and quadratic moment conditions, we can construct our estimator based on the following $d_{g}\coloneqq(d_{q}+d_{x})K+M$ moment conditions: for $s\in[0,1]$,

\[
\frac{1}{n(T-1)}\begin{pmatrix}\mathbb{E}[\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s)]\\ \mathbb{E}[\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{\mathcal{E}}(s)]\\ \vdots\\ \mathbb{E}[\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{\mathcal{E}}(s)]\end{pmatrix}=\bm{0}_{d_{g}}. \tag{3.27}
\]

As the empirical counterpart of these moment conditions, given a candidate value θ\theta for θ0\theta_{0}, we define

\[
g_{nT}(s;\theta)\coloneqq\frac{1}{n(T-1)}\begin{pmatrix}\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{E}(s;\theta)\\ \bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{E}(s;\theta)\\ \vdots\\ \bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{E}(s;\theta)\end{pmatrix}\in\mathbb{R}^{d_{g}}, \tag{3.32}
\]

where $\bm{E}(s;\theta)\coloneqq\bm{Y}(s)-\bm{H}(s)\theta$. The estimator of $\theta_{0}$ is obtained by minimizing the norm of $g_{nT}(s;\theta)$ over $s\in[0,1]$. To this end, we pre-specify $L\equiv L_{nT}$ grid points in $[0,1]$, denoted by $0<s_{1}<\cdots<s_{L}<1$, and numerically integrate the moment functions across these points:

\[
\overline{g}_{nT}(\theta)\coloneqq\frac{1}{L}\sum_{l=1}^{L}g_{nT}(s_{l};\theta). \tag{3.33}
\]

Situations where the response functions are not fully observable are discussed in Remark 3.3.

Now, we are ready to introduce our estimator:

\[
\widehat{\theta}_{nT}=(\widehat{\theta}_{nT,\alpha}^{\top},\widehat{\theta}_{nT,1}^{\top},\ldots,\widehat{\theta}_{nT,d_{x}}^{\top})^{\top}\coloneqq\operatorname*{argmin}_{\theta\in\Theta_{K}}\mathcal{Q}_{nT}(\theta),\quad\text{where}\;\;\mathcal{Q}_{nT}(\theta)\coloneqq\overline{g}_{nT}(\theta)^{\top}\Omega_{nT}\overline{g}_{nT}(\theta), \tag{3.34}
\]

$\Omega_{nT}$ is a $d_{g}\times d_{g}$ positive definite symmetric weight matrix, and $\Theta_{K}\subset\mathbb{R}^{(d_{x}+1)K}$ is a compact parameter space containing $\theta_{0}$ in its interior. As one example of the weight matrix, we can use $\Omega_{nT}=I_{d_{g}}$; as another,

\[
\Omega_{nT}=\begin{pmatrix}\left(\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{Z}(s_{l})/N\right)^{-1}&\bm{0}_{(d_{q}+d_{x})K\times M}\\ \bm{0}_{M\times(d_{q}+d_{x})K}&I_{M}\end{pmatrix}, \tag{3.37}
\]

where

\[
N\coloneqq nL(T-1). \tag{3.38}
\]
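
Schematically, the estimation step can be organized as in the following sketch, which is our own illustration rather than the authors' code: the arrays `Y`, `H`, and `Z` are assumed to hold $\bm{Y}(s_{l})$, $\bm{H}(s_{l})$, and $\bm{Z}(s_{l})$ on the grid, `D` is the differencing operator above, and `P_list` collects the $n(T-1)\times n(T-1)$ matrices $P_{m}=I_{T-1}\otimes P_{m,1}$.

```python
import numpy as np
from scipy.optimize import minimize

# Integrated-GMM objective (3.33)-(3.34), written schematically; Y has shape
# (L, nT), H has shape (L, nT, (dx+1)K), and Z has shape (L, nT, (dq+dx)K).
def gmm_objective(theta, Y, H, Z, D, P_list, Omega):
    L = Y.shape[0]
    nT1 = D.shape[0]                                 # n(T-1)
    g_bar = 0.0
    for l in range(L):                               # integrate over s_1, ..., s_L
        dE = D @ (Y[l] - H[l] @ theta)               # D E(s_l; theta)
        lin = Z[l].T @ (D.T @ dE)                    # linear moments Z'D'D E
        quad = np.array([dE @ (P @ dE) for P in P_list])  # quadratic moments
        g_bar = g_bar + np.concatenate([lin, quad]) / nT1
    g_bar = g_bar / L
    return g_bar @ Omega @ g_bar                     # Q_nT(theta)

# theta_hat = minimize(gmm_objective, theta_init,
#                      args=(Y, H, Z, D, P_list, Omega), method="BFGS").x
```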

Once $\widehat{\theta}_{nT}$ is obtained, the estimators of $\alpha_{0}(s)$ and $\beta_{0}(s)$ are given as

\[
\widehat{\alpha}_{nT}(s)\coloneqq\phi^{K}(s)^{\top}\widehat{\theta}_{nT,\alpha}, \tag{3.39}
\]
\[
\widehat{\beta}_{nT,j}(s)\coloneqq\phi^{K}(s)^{\top}\widehat{\theta}_{nT,j},\;\;j\in[d_{x}], \tag{3.40}
\]

which we refer to as the integrated-GMM estimators. Additionally, if one is interested in the estimation of individual fixed effect functions, the following estimator can be used:

\[
\widehat{f}_{ni}(s)\coloneqq\frac{1}{T}\sum_{t=1}^{T}\left(Y_{it}(s)-\widehat{\alpha}_{nT}(s)A(\overline{Y}_{it},s)-X_{it}^{\top}\widehat{\beta}_{nT}(s)\right),\;\;i\in[n], \tag{3.41}
\]

where $\widehat{\beta}_{nT}(s)=(\widehat{\beta}_{nT,1}(s),\ldots,\widehat{\beta}_{nT,d_{x}}(s))^{\top}$. Consistent estimation of $f_{0i}(s)$ by $\widehat{f}_{ni}(s)$ requires $T$ to increase to infinity, while $\alpha_{0}(s)$ and $\beta_{0}(s)$ can be consistently estimated even when $T$ is fixed. More specifically, since the error term $\varepsilon_{it}(s)$ has mean zero, averaging out the errors by the law of large numbers at each $i$ requires $T$ to increase to infinity. Unlike $\widehat{\alpha}_{nT}(s)$ and $\widehat{\beta}_{nT}(s)$, the estimator $\widehat{f}_{ni}(s)$ is not necessarily continuous, as we do not preclude cases where $Y_{it}(s)$ and $A(\overline{Y}_{it},s)$ are discontinuous in $s$.

Asymptotic theory

To derive the asymptotic properties of our estimator, we first need to specify the structure of our sampling space. Following Jenish and Prucha (2012), let $\mathcal{D}\subset\mathbb{R}^{d}$, $1\leq d<\infty$, be a possibly uneven lattice, and let $\mathcal{D}_{n}\subset\mathcal{D}$ be the set of observation locations. Once the observation locations are determined for a given sample of size $n$, we assume that they do not vary over time $t$. For spatial data, $\mathcal{D}$ would be a geographical space with $d=2$. Note that $\mathcal{D}$ need not be exactly observable to us; for example, $\mathcal{D}$ may be a complex space of general social and economic characteristics, in which case we can view it as an embedding of individuals in a latent space rather than their physical locations.

We first derive the rates of convergence of our estimators under the following set of assumptions.

Assumption 3.1 (Sampling space).

(i) The maximum coordinate difference (i.e., the Chebyshev distance) between any two observations $i,j\in\mathcal{D}$, which we denote as $\Delta(i,j)$, is at least (without loss of generality) 1; and (ii) there exists a threshold distance $\overline{\Delta}$ such that $w_{i,j}=0$ if $\Delta(i,j)>\overline{\Delta}$.

Assumption 3.2 (Observables).

(i) $\{(X_{it},Q_{it})\}_{i\in[n],t\in[T]}$ are non-stochastic and uniformly bounded; and (ii) for all $s\in[0,1]$, $i\in[n]$, and $t\in[T]$, $||Y_{it}(s)||_{p}\lesssim 1$ for some $p>4$.

Assumption 3.3 (Error term).

(i) $\{\varepsilon_{it}\}_{i\in[n],t\in[T]}$ are independent; (ii) for all $s\in[0,1]$, $i\in[n]$, and $t\in[T]$, $\mathbb{E}[\varepsilon_{it}(s)]=0$, $||\varepsilon_{it}(s)||_{2}>0$, and $||\varepsilon_{it}(s)||_{4}\lesssim 1$; and (iii) for all $i\in[n]$ and $t\in[T]$, $\sum_{k=1}^{K}\left(L^{-2}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{it}(s_{l},s_{l^{\prime}})\phi_{k}(s_{l})\phi_{k}(s_{l^{\prime}})\right)\lesssim 1$ uniformly in $K$, where $\Gamma_{it}(s_{l},s_{l^{\prime}})\coloneqq\text{Cov}(\varepsilon_{it}(s_{l}),\varepsilon_{it}(s_{l^{\prime}}))$.

Assumption 3.4 (Interaction operator).

There exists a function $\omega_{p}$ satisfying $|A(h,s)|^{p}\leq\int_{0}^{1}|h(u)|^{p}\,\omega_{p}(u,s)\,\text{d}u$ for any given $1\leq p<\infty$, such that $\int_{0}^{1}\omega_{p}(u,s)\,\text{d}u\leq 1$ for all $s\in[0,1]$.

Assumption 3.5 (Weight matrices).

(i) For all $m\in[M]$, $P_{m,1}$ is symmetric, $\text{diag}(P_{m,1})=\bm{0}_{n}$, and $||P_{m,1}||_{1},||P_{m,1}||_{\infty}\lesssim 1$. In addition, writing $P_{m,1}=(p_{m,i,j})$, there exists a threshold distance $\overline{\Delta}_{m}$ such that $p_{m,i,j}=0$ if $\Delta(i,j)>\overline{\Delta}_{m}$; and (ii) $0<\lambda_{\min}(\Omega_{nT})\leq\lambda_{\max}(\Omega_{nT})\lesssim 1$ for all sufficiently large $nT$.

Assumption 3.6 (Identification).

For all sufficiently large $nT$, $0<\lambda_{\min}\left(\Pi_{nT}^{\top}\Pi_{nT}\right)\leq\lambda_{\max}\left(\Pi_{nT}^{\top}\Pi_{nT}\right)\lesssim 1$, where $\Pi_{nT}\coloneqq N^{-1}\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\,\mathbb{E}[\bm{H}(s_{l})]$ and $N=nL(T-1)$.

Assumption 3.7 (Series approximation).

$\{\phi_{k}:k=1,2,\ldots\}$ is a series of continuous orthonormal basis functions satisfying $\sup_{s\in[0,1]}|\alpha_{0}(s)-\phi^{K}(s)^{\top}\theta_{0\alpha}|\lesssim K^{-\pi}$ and $\max_{j\in[d_{x}]}\sup_{s\in[0,1]}|\beta_{0j}(s)-\phi^{K}(s)^{\top}\theta_{0j}|\lesssim K^{-\pi}$.

Assumptions 3.1(i) and (ii) together imply that the number of interacting partners for each unit is bounded (i.e., the network must be sparse). These assumptions play a crucial role in characterizing the stochastic process of the outcome functions. In Assumption 3.2, part (i) assumes that the covariates are non-stochastic and bounded. This type of assumption is frequently utilized in the spatial and network literature and can be interpreted as viewing the analysis conditional on the realized values of the covariates. Meanwhile, part (ii) is introduced to ensure some convergence results for the quadratic moments.

Assumption 3.3(i) allows the error terms to be fully heteroskedastic. Part (ii) should be standard. Part (iii) is a high-level condition, which plays an important role in obtaining the parametric convergence rate for the GMM estimator. In general, if $\Gamma_{it}$ belongs to $L^{2}([0,1]^{2})$, it admits the following series expansion:

\[
\Gamma_{it}(s_{l},s_{l^{\prime}})=\sum_{k_{1},k_{2}=1}^{\infty}\kappa_{it,k_{1},k_{2}}\phi_{k_{1}}(s_{l})\phi_{k_{2}}(s_{l^{\prime}}) \tag{3.42}
\]

for some sequence of constants $\{\kappa_{it,k_{1},k_{2}}\}$. By the orthonormality of the $\phi_{k}$'s,

\[
\begin{aligned}
\int_{0}^{1}\int_{0}^{1}\Gamma_{it}(s_{l},s_{l^{\prime}})\phi_{k}(s_{l})\phi_{k}(s_{l^{\prime}})\,\text{d}s_{l}\,\text{d}s_{l^{\prime}}&=\sum_{k_{1},k_{2}=1}^{\infty}\kappa_{it,k_{1},k_{2}}\left(\int_{0}^{1}\phi_{k}(s_{l})\phi_{k_{1}}(s_{l})\,\text{d}s_{l}\right)\left(\int_{0}^{1}\phi_{k}(s_{l^{\prime}})\phi_{k_{2}}(s_{l^{\prime}})\,\text{d}s_{l^{\prime}}\right)&&(3.43)\\
&=\kappa_{it,k,k}.&&(3.44)
\end{aligned}
\]

Since $L^{-2}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{it}(s_{l},s_{l^{\prime}})\phi_{k}(s_{l})\phi_{k}(s_{l^{\prime}})$ can be seen as a numerical approximation of the left-hand side of the above expression, Assumption 3.3(iii) essentially requires that $\sum_{k=1}^{K}\kappa_{it,k,k}\lesssim 1$ uniformly in $K$. In particular, if the $\kappa_{it,k,k}$'s are ordered decreasingly such that $\kappa_{it,1,1}\geq\kappa_{it,2,2}\geq\cdots$, this assumption can be interpreted in two ways: either there exists a constant $a>1$ such that $\kappa_{it,k,k}\lesssim k^{-a}$, or there exists a fixed $b$ such that $\kappa_{it,k,k}=0$ for all $k>b$.

Assumption 3.4 is not restrictive in most empirically relevant situations. For example, when $A(h,s)=h(s)$, we can set $\omega_{p}(u,s)=\delta(u-s)$ for any $p$. As another example, when $A(h,s)=\int_{0}^{1}h(u)\nu(u,s)\,\text{d}u$ for some kernel $\nu(u,s)$, since $|A(h,s)|^{p}\leq\int_{0}^{1}|h(u)|^{p}|\nu(u,s)|^{p}\,\text{d}u$, we can set $\omega_{p}(u,s)=|\nu(u,s)|^{p}$ in this case.

In Assumption 3.5, we assume that the weight matrices in the quadratic moments are symmetric. This assumption entails no loss of generality because $A^{\top}P_{m,1}A=A^{\top}P_{m,1}^{\top}A$ for any $n\times 1$ vector $A$; if $P_{m,1}$ is not symmetric in practice, we can always symmetrize it as $(P_{m,1}+P_{m,1}^{\top})/2$. The assumed existence of a threshold distance $\overline{\Delta}_{m}$ may be non-standard, but it simplifies the proof. Since the $P_{m,1}$'s are usually created from the interaction matrix $W_{n}$ and its powers, this assumption is consistent with Assumption 3.1(ii).

Assumption 3.6 is a regularity condition ensuring the identifiability of $\theta_{0}$. Assumption 3.7 is standard; for example, it is satisfied if spline basis functions are used and $\alpha_{0}$ and the $\beta_{0j}$'s belong to a Hölder class of smoothness order $\pi$ (see, e.g., Chen, 2007; Belloni et al., 2015).

Theorem 3.1 (Rates of convergence).

Suppose that Assumptions 2.1 and 3.1–3.7 hold. In addition, assume that $K/\sqrt{nT}\to 0$ and $K^{1-\pi}\to 0$ as $nT\to\infty$. Then,

  • (i) $||\widehat{\theta}_{nT}-\theta_{0}||\lesssim_{P}1/\sqrt{nT}+K^{(1-2\pi)/2}$;

  • (ii) $||\widehat{\alpha}_{nT}-\alpha_{0}||_{L^{2}}\lesssim_{P}1/\sqrt{nT}+K^{(1-2\pi)/2}$, and $\sup_{s\in[0,1]}|\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)|\lesssim_{P}\sqrt{K}/\sqrt{nT}+K^{1-\pi}$;

  • (iii) $||\widehat{\beta}_{nT,j}-\beta_{0j}||_{L^{2}}\lesssim_{P}1/\sqrt{nT}+K^{(1-2\pi)/2}$, and $\sup_{s\in[0,1]}|\widehat{\beta}_{nT,j}(s)-\beta_{0j}(s)|\lesssim_{P}\sqrt{K}/\sqrt{nT}+K^{1-\pi}$ for all $j\in[d_{x}]$.

Result (i) of Theorem 3.1 states that if the functional parameters are sufficiently smooth that $K^{(1-2\pi)/2}\lesssim 1/\sqrt{nT}$, the series coefficient estimator is consistent and converges at the parametric rate. This result might seem somewhat surprising, since the dimension of $\theta_{0}$ increases to infinity. An intuitive explanation for this phenomenon is that, although the sample size is $nT$, the total number of observation points used in the estimation is $N$ ($=nL(T-1)$). This fact, in conjunction with Assumption 3.3(iii), leads to the result. The same convergence rate applies to the $L^{2}$-convergence of the functional estimators, as shown in (ii) and (iii). The uniform convergence rate for these estimators is slower than the $L^{2}$ rate by a factor of $K^{1/2}$. However, the convergence rates obtained here are not necessarily the sharpest, and the theoretically optimal rates under our setup are unknown; these points are left for future research.

Remark 3.1 (Local estimation approach).

If one adopts a local approach that directly estimates $\alpha_{0}(s)$ and $\beta_{0}(s)$ at each $s$, then, since there are exactly $nT$ observations at each $s$, it can readily be shown that $|\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)|\lesssim_{P}1/\sqrt{nT}$ and $||\widehat{\beta}_{nT}(s)-\beta_{0}(s)||\lesssim_{P}1/\sqrt{nT}$. Since the local approach does not rely on series approximation, these results are free from bias terms. However, while achieving unbiasedness, the local estimator faces challenges in deriving a uniform convergence rate.


We next present the limiting distribution of our estimators. To this end, we introduce the following notation and an additional assumption:

\[
J_{nT}(s;\theta)\coloneqq\frac{\partial g_{nT}(s;\theta)}{\partial\theta^{\top}}\in\mathbb{R}^{d_{g}\times(d_{x}+1)K},\;\;\overline{J}_{nT}(\theta)\coloneqq\frac{1}{L}\sum_{l=1}^{L}J_{nT}(s_{l};\theta),\;\;\overline{J}_{nT}\coloneqq\mathbb{E}\left[\overline{J}_{nT}(\theta_{0})\right], \tag{3.45}
\]
\[
g_{1,nT}(s)\coloneqq\frac{1}{n(T-1)}\begin{pmatrix}\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s)\\ \bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{\mathcal{E}}(s)\\ \vdots\\ \bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{\mathcal{E}}(s)\end{pmatrix}\in\mathbb{R}^{d_{g}},\;\;\overline{g}_{1,nT}\coloneqq\frac{1}{L}\sum_{l=1}^{L}g_{1,nT}(s_{l}), \tag{3.50}
\]
\[
\mathcal{V}_{nT}\coloneqq n(T-1)\,\mathbb{E}\left[\overline{g}_{1,nT}\overline{g}_{1,nT}^{\top}\right], \tag{3.51}
\]
\[
\Sigma_{nT}\coloneqq\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}\mathcal{V}_{nT}\Omega_{nT}\overline{J}_{nT}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\in\mathbb{R}^{(d_{x}+1)K\times(d_{x}+1)K}. \tag{3.52}
\]

More explicit forms of the matrices $J_{nT}(s;\theta)$ and $\mathcal{V}_{nT}$ can be found in (A.23) and (A.38) in Appendix A, respectively. Further, let $\mathbb{S}_{\alpha}$ and $\mathbb{S}_{j}$ be the $K\times(d_{x}+1)K$ selection matrices such that $\theta_{0\alpha}=\mathbb{S}_{\alpha}\theta_{0}$ and $\theta_{0j}=\mathbb{S}_{j}\theta_{0}$ hold.

Assumption 3.8 (Misc.).

For all sufficiently large $nT$: (i) $\lambda_{\max}\left(N^{-1}\sum_{l=1}^{L}\mathbb{E}[\bm{H}(s_{l})^{\top}\bm{H}(s_{l})]\right)\lesssim 1$; (ii) $0<\lambda_{\min}\left(\overline{J}^{\top}_{nT}\overline{J}_{nT}\right)\leq\lambda_{\max}\left(\overline{J}^{\top}_{nT}\overline{J}_{nT}\right)\lesssim 1$; and (iii) $0<\lambda_{\min}\left(\mathcal{V}_{nT}\right)\leq\lambda_{\max}\left(\mathcal{V}_{nT}\right)\lesssim 1$.

Theorem 3.2 (Asymptotic normality).

Suppose that Assumptions 2.1 and 3.1–3.8 hold. In addition, assume that $K/\sqrt{nT}\to 0$, $K^{2}/(\sqrt{nT}\,||\phi^{K}(s)||^{2})\to 0$, and $\sqrt{nT}K^{(1-2\pi)/2}\to 0$ as $nT\to\infty$. Then,

\[
\text{(i)}\;\;\frac{\sqrt{n(T-1)}\left(\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)\right)}{\sigma_{nT,\alpha}(s)}\overset{d}{\to}N(0,1), \tag{3.53}
\]
\[
\text{(ii)}\;\;\frac{\sqrt{n(T-1)}\left(\widehat{\beta}_{nT,j}(s)-\beta_{0j}(s)\right)}{\sigma_{nT,j}(s)}\overset{d}{\to}N(0,1), \tag{3.54}
\]

where $[\sigma_{nT,\alpha}(s)]^{2}\coloneqq\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\Sigma_{nT}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)$ and $[\sigma_{nT,j}(s)]^{2}\coloneqq\phi^{K}(s)^{\top}\mathbb{S}_{j}\Sigma_{nT}\mathbb{S}_{j}^{\top}\phi^{K}(s)$.

Theorem 3.2 establishes the pointwise asymptotic normality of the integrated-GMM estimators. As is common in series estimation, we impose additional undersmoothing conditions to ensure that the bias terms vanish sufficiently quickly.

In order to perform statistical inference based on Theorem 3.2, we need to consistently estimate the variances $[\sigma_{nT,\alpha}(s)]^{2}$ and $[\sigma_{nT,j}(s)]^{2}$. To save space, the procedure for consistent variance estimation is deferred to Appendix C.

Remark 3.2 (Choice of KK).

Suppose that $K$ is proportional to $(nT)^{c}$ for some $c>0$ and that $||\phi^{K}(s)||^{2}$ is of order $K$. Then, to achieve asymptotic normality, we require $K/\sqrt{nT}\to 0$ and $\sqrt{nT}K^{(1-2\pi)/2}\to 0$ simultaneously, which reduces to the following condition on $c$: $(2\pi-1)^{-1}<c<1/2$. Since this interval is nonempty only when $(2\pi-1)^{-1}<1/2$, the target functions must be sufficiently smooth, with smoothness order $\pi$ greater than $3/2$.

Remark 3.3 (Incompletely observed response function).

The integrated-GMM estimator is often infeasible as stated, because the response functions are typically observed only at a finite set of points in $[0,1]$. Even in such cases, we can approximate the entire functional form of $Y_{it}$ using a linear interpolation method. Suppose that, for each $(i,t)$, $Y_{it}$ is observed at $L_{it}$ distinct points $0\leq s_{it,1}<s_{it,2}<\dots<s_{it,L_{it}}\leq 1$. Then, for each given $s\in[s_{it,l},s_{it,l+1}]$, define

\[
Y^{\text{int}}_{it}(s)\coloneqq Y_{it}(s_{it,l})+\frac{Y_{it}(s_{it,l+1})-Y_{it}(s_{it,l})}{s_{it,l+1}-s_{it,l}}(s-s_{it,l}). \tag{3.55}
\]

When $s<s_{it,1}$ (resp. $s>s_{it,L_{it}}$), we set $Y^{\text{int}}_{it}(s)\coloneqq Y_{it}(s_{it,1})$ (resp. $Y^{\text{int}}_{it}(s)\coloneqq Y_{it}(s_{it,L_{it}})$). Other than linear interpolation, one may also use a kernel method, as in Zhu et al. (2022), to obtain $Y^{\text{int}}_{it}(s)$. Then, using $Y^{\text{int}}_{it}(s)$ in place of $Y_{it}(s)$, we can write

\[
Y^{\text{int}}_{it}(s)=\alpha_{0}(s)A(\overline{Y}^{\text{int}}_{it},s)+X_{it}^{\top}\beta_{0}(s)+f_{0i}(s)+\varepsilon_{it}(s)+u_{it}(s), \tag{3.56}
\]

where $u_{it}(s)$ is the interpolation error: $u_{it}(s)\coloneqq Y^{\text{int}}_{it}(s)-Y_{it}(s)+\alpha_{0}(s)A(\overline{Y}_{it}-\overline{Y}^{\text{int}}_{it},s)$. Thus, if $u_{it}(s)$ converges to zero sufficiently quickly for all $s\in[0,1]$, $i\in[n]$, and $t\in[T]$, we can apply the same estimation and inference strategy as above.
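
In the common case of piecewise-linear interpolation with boundary clamping, the rule (3.55) together with the boundary convention above corresponds exactly to `numpy.interp`, as the following short sketch (with made-up observation points) illustrates.

```python
import numpy as np

# Linear interpolation of one response function Y_it observed at a finite set
# of points; np.interp implements (3.55) and clamps to the boundary values,
# matching the convention for s < s_{it,1} and s > s_{it,L_it}.
s_obs = np.array([0.1, 0.4, 0.7, 0.95])    # hypothetical observation points
y_obs = np.array([1.0, 0.6, 0.8, 0.3])     # Y_it evaluated at those points
grid = np.linspace(0.0, 1.0, 101)          # the estimation grid s_1, ..., s_L
y_int = np.interp(grid, s_obs, y_obs)      # Y_it^int evaluated on the grid
```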

Network multiplier effects: marginal effects and impulse responses

Once the model is estimated, as a next step, one might be interested in computing the marginal effects of covariates on the outcome. In a standard linear regression model without network interaction, the estimated coefficients directly represent the marginal effects of their corresponding covariates. However, in the presence of intricate functional interaction, this is no longer the case.

As shown in Section 2, under Assumption 2.1, we have the following moving-average type representation:

\[
Y_{t}=\sum_{\ell=0}^{\infty}\mathcal{A}^{\ell}X_{t}\beta_{0}+\sum_{\ell=0}^{\infty}\mathcal{A}^{\ell}F_{0}+\sum_{\ell=0}^{\infty}\mathcal{A}^{\ell}\mathcal{E}_{t},\;\;t\in[T]. \tag{4.1}
\]

This expression indicates that the marginal effect on $Y_{t}$ of increasing $X_{it}^{j}$ by one unit is given by $\partial Y_{t}/\partial X_{it}^{j}=\lim_{c\to 0}\sum_{\ell=0}^{\infty}[\mathcal{A}^{\ell}(X_{t}^{j}+\bm{e}_{i}c)\beta_{0j}-\mathcal{A}^{\ell}X_{t}^{j}\beta_{0j}]/c=\sum_{\ell=0}^{\infty}\mathcal{A}^{\ell}\bm{e}_{i}\beta_{0j}$ by the linearity of $\mathcal{A}^{\ell}$, where $X_{t}^{j}$ is the $j$-th column of $X_{t}$, and $\bm{e}_{i}$ denotes the $i$-th column of $I_{n}$. Alternatively, a somewhat more informative expression can be obtained as follows: letting $\gamma(h,s)\coloneqq\alpha_{0}(s)A(h,s)$,

\[
\begin{aligned}
M(i,j,s)&\coloneqq\partial Y_{t}(s)/\partial X_{it}^{j}=\bm{e}_{i}\beta_{0j}(s)+W_{n}\bm{e}_{i}\gamma(\beta_{0j},s)+W_{n}^{2}\bm{e}_{i}\gamma^{2}(\beta_{0j},s)+\cdots&&(4.2)\\
&=\sum_{\ell=0}^{\infty}W_{n}^{\ell}\bm{e}_{i}\gamma^{\ell}(\beta_{0j},s),&&(4.3)
\end{aligned}
\]

where $\gamma^{0}(\beta_{0j},s)=\beta_{0j}(s)$, and $\gamma^{\ell}(\beta_{0j},s)=\gamma(\gamma^{\ell-1}(\beta_{0j},\cdot),s)$ for $\ell\geq 1$. From this, we can clearly see that the marginal effects $M(i,j,s)$ of increasing $X_{it}^{j}$ consist of the direct effect on unit $i$, the indirect effect on $i$'s immediate neighbors, the second-order indirect effect on $i$'s neighbors' neighbors, and so forth, highlighting the presence of a network multiplier effect. More specifically, recall that when $W_{n}$ represents a (weighted) adjacency matrix, the $(i,j)$-th element of $W_{n}^{\ell}$ corresponds to the number of (weighted) walks of length $\ell$ between $i$ and $j$. Thus, the $k$-th element of $M(i,j,s)$ is interpreted as a weighted sum of the numbers of walks from $i$ to $k$, where the contribution of each length-$\ell$ walk is discounted through $\gamma^{\ell}(\beta_{0j},s)$.

To estimate the marginal effects, in addition to replacing the unknown parameters with their estimators, the infinite sum generally needs to be approximated by a finite sum: for some positive integer $S$,

\[
\widehat{M}^{S}_{nT}(i,j,s)\coloneqq\sum_{\ell=0}^{S}W_{n}^{\ell}\bm{e}_{i}\widehat{\gamma}_{nT}^{\ell}(\widehat{\beta}_{nT,j},s), \tag{4.4}
\]

where $\widehat{\gamma}_{nT}(h,s)\coloneqq\widehat{\alpha}_{nT}(s)A(h,s)$. Meanwhile, in the special case of concurrent interaction, where $\gamma(h,s)=\alpha_{0}(s)h(s)$, $\gamma^{2}(h,s)=(\alpha_{0}(s))^{2}h(s)$, and so on, it is easy to see that $M(i,j,s)=\sum_{\ell=0}^{\infty}(\alpha_{0}(s)W_{n})^{\ell}\bm{e}_{i}\beta_{0j}(s)=(I_{n}-\alpha_{0}(s)W_{n})^{-1}\bm{e}_{i}\beta_{0j}(s)$. This implies that, in this case, we can estimate $M(i,j,s)$ directly as $(I_{n}-\widehat{\alpha}_{nT}(s)W_{n})^{-1}\bm{e}_{i}\widehat{\beta}_{nT,j}(s)$, without computing the infinite sum.
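
The following sketch (our own, specialized to the concurrent case $A(h,s)=h(s)$, where $\gamma^{\ell}(\beta_{0j},s)=(\alpha_{0}(s))^{\ell}\beta_{0j}(s)$) computes the truncated sum (4.4); the estimated functions `alpha_hat` and `beta_hat_j` are assumed to be user-supplied callables returning scalars.

```python
import numpy as np

# Truncated marginal-effect vector M_hat^S(i, j, s) of (4.4), concurrent case:
# the effect of X_it^j on the n x 1 outcome vector Y_t(s).
def marginal_effect(i, s, alpha_hat, beta_hat_j, W, S=5):
    n = W.shape[0]
    e_i = np.zeros(n)
    e_i[i] = 1.0
    out, walk = np.zeros(n), e_i.copy()     # walk holds W^l e_i
    for l in range(S + 1):
        out += walk * (alpha_hat(s) ** l) * beta_hat_j(s)
        walk = W @ walk                     # advance to W^{l+1} e_i
    return out

# Closed-form check in the concurrent case:
# (I_n - alpha_hat(s) W)^{-1} e_i beta_hat_j(s).
```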


In the above discussion, we have demonstrated how the impact of shifting one unit's covariate propagates to others. Similarly, just like impulse response analysis in time-series vector autoregression, we can consider network impulse responses when an external shock occurs at a given unit. In particular, in a similar spirit to Koop et al. (1996), we define

\[
I(i,\eta,s)\coloneqq\mathbb{E}[Y_{t}(s)\mid\varepsilon_{it}=\eta]-\mathbb{E}[Y_{t}(s)], \tag{4.5}
\]

where $\eta$ is a given "function" representing the external shock. By a similar calculation as above, we obtain

\[
I(i,\eta,s)=\sum_{\ell=0}^{\infty}W_{n}^{\ell}\bm{e}_{i}\gamma^{\ell}(\eta,s). \tag{4.6}
\]

When each element of $W_{n}^{\ell}\bm{e}_{i}\gamma^{\ell}(\eta,s)$ is plotted against $\ell=0,1,2,\ldots$, it can be interpreted as a network version of the impulse response function (as a function of $\ell$), similarly to Denbee et al. (2021). The expected total social impact caused by an external shock to unit $i$ can be expressed as $\int_{0}^{1}\bm{1}_{n}^{\top}I(i,\eta,s)\,\text{d}s$, and the unit that exerts the largest influence on society is given by $i^{*}\coloneqq\operatorname*{argmax}_{i\in[n]}\int_{0}^{1}\bm{1}_{n}^{\top}I(i,\eta,s)\,\text{d}s$. Denbee et al. (2021) referred to this unit as the risk key player, in the sense that an external shock to $i^{*}$ leads to the highest volatility in the aggregate outcome.

When assuming a concurrent interaction model, the impulse responses at ss take the following form: I(i,η,s)=(Inα0(s)Wn)1𝒆iη(s)I(i,\eta,s)=(I_{n}-\alpha_{0}(s)W_{n})^{-1}\bm{e}_{i}\eta(s). Thus, if there is no exogenous shock at ss, i.e., if η(s)=0\eta(s)=0, the expected outcome at ss remains unaffected. This implies, for instance, that a travel demand shock that occurred five minutes ago has no impact on current mobility availability, which is unrealistic. On the other hand, if the interaction structure is given by A(h,s)=01h(u)ν(u,s)duA(h,s)=\int_{0}^{1}h(u)\nu(u,s)\text{d}u with ν(s,s)0\nu(s^{\prime},s)\neq 0 for s<ss^{\prime}<s, then a shock occurring at ss^{\prime} can transmit to the outcome at ss, leading to nonzero impulse responses at ss even when η(s)=0\eta(s)=0.

The estimation of I(i,η,s)I(i,\eta,s) can be performed in the same manner as above. For some positive integer SS, we estimate I(i,η,s)I(i,\eta,s) by I^nTS(i,η,s)=0SWn𝒆iγ^nT(η,s)\widehat{I}^{S}_{nT}(i,\eta,s)\coloneqq\sum_{\ell=0}^{S}W_{n}^{\ell}\bm{e}_{i}\widehat{\gamma}_{nT}^{\ell}(\eta,s). The next proposition provides the convergence rate of M^nTS(i,j,s)\widehat{M}^{S}_{nT}(i,j,s) and that of I^nTS(i,η,s)\widehat{I}^{S}_{nT}(i,\eta,s).

Proposition 4.1.

Suppose that the assumptions in Theorem 3.1 hold. In addition, assume that α¯0<1\overline{\alpha}_{0}<1. Then, uniformly in s[0,1]s\in[0,1],

  • (i)

    maxi[n]M^nTS(i,j,s)M(i,j,s)pK/nT+K1π+α¯0S+1\max_{i\in[n]}\left\|\widehat{M}^{S}_{nT}(i,j,s)-M(i,j,s)\right\|_{\infty}\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}+\overline{\alpha}_{0}^{S+1},

  • (ii)

    maxi[n]I^nTS(i,η,s)I(i,η,s)pK/nT+K1π+α¯0S+1\max_{i\in[n]}\left\|\widehat{I}^{S}_{nT}(i,\eta,s)-I(i,\eta,s)\right\|_{\infty}\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}+\overline{\alpha}_{0}^{S+1}.

This proposition indicates that the uniform convergence rates for the marginal effect and the impulse response estimators depend on the uniform convergence rate of the integrated-GMM estimator and the summation order SS. Since the approximation error from truncating the infinite sum decreases geometrically as SS increases, in practice, setting S=4S=4 or 55 would be sufficient.

Monte Carlo Simulation

In this section, we conduct a series of Monte Carlo experiments to evaluate the finite-sample performance of the integrated-GMM estimator. Throughout the experiments, we consider the following data-generating process (DGP):

Yit(s)=α0(s)01Y¯it(u)ν(u,s)du+Xitβ0(s)+f0i(s)+εit(s),\displaystyle Y_{it}(s)=\alpha_{0}(s)\int_{0}^{1}\overline{Y}_{it}(u)\nu(u,s)\text{d}u+X_{it}\beta_{0}(s)+f_{0i}(s)+\varepsilon_{it}(s), (5.1)

where X_{it} ∼ N(0, 1), α_0(s) = ϕ(s; 0.4, 0.5²) + 0.2s − 0.4s², ϕ(·; μ, σ²) denotes the normal density function with mean μ and variance σ², ν(u, s) = 0.75(1 − (u − s)²) (i.e., the Epanechnikov kernel function), and the individual fixed effects are given by f_{0i}(s) = 1 + cos(is). The coefficient function is β_0(s) = r(√(1+s) + s(1 − s)), where r is chosen from {0.4, 1}. We use Q_{it} = (X̄_{it}, X̄̄_{it}) as the IVs for Ȳ_{it}; thus, the magnitude of r determines the strength of these IVs. For the error term, we generate ε_{it}(s) = √(1 + deg_i) (e_{1,it}, e_{2,it}, e_{3,it})^⊤ (1, s, s²), where deg_i denotes the number of units connected to i (i.e., i's degree), and e_{j,it} ∼ N(0, 0.4²) for j = 1, 2, 3. The weight matrix W_n is a row-normalized adjacency matrix, constructed by randomly placing the n units on a [√(2n)] × [√(2n)] lattice, where [a] denotes the nearest integer to a; two units are connected if and only if the Euclidean distance between them is exactly one. The sample size n is chosen from {40, 80}, and T from {5, 10}.
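The simulated network can be constructed as in the following sketch; the random seed and the handling of isolated units are implementation choices that the text leaves open.

import numpy as np

def make_lattice_W(n, rng):
    # Place n units at distinct random sites on a [sqrt(2n)] x [sqrt(2n)] lattice,
    # connect pairs at Euclidean distance exactly one, and row-normalize.
    side = int(round(np.sqrt(2 * n)))              # [sqrt(2n)], nearest integer
    sites = np.array([(r, c) for r in range(side) for c in range(side)], dtype=float)
    coords = sites[rng.choice(len(sites), size=n, replace=False)]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    adj = (np.abs(d - 1.0) < 1e-9).astype(float)   # lattice neighbors only
    rowsum = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, rowsum, out=np.zeros_like(adj), where=rowsum > 0)

W = make_lattice_W(40, np.random.default_rng(0))   # e.g., n = 40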

Since our DGP satisfies the conditions in Assumption 2.1, we can generate the outcome functions using the Neumann series approximation: YtYt(S)=0S𝒜[Xtβ0+F0+t]Y_{t}\approx Y_{t}^{(S)}\coloneqq\sum_{\ell=0}^{S}\mathcal{A}^{\ell}[X_{t}\beta_{0}+F_{0}+\mathcal{E}_{t}], where SS is increased iteratively until maxi[n]|Yit(S)(s)Yit(S1)(s)|<0.001\max_{i\in[n]}|Y_{it}^{(S)}(s)-Y_{it}^{(S-1)}(s)|<0.001 is satisfied at each tt and ss. Throughout the simulations, integrals over [0,1][0,1] are approximated by finite summations over 99 equally-spaced grid points.
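The outcome generation can be sketched as follows, reusing s_grid and A_op from the Section 4 sketch and W from the sketch above; the stopping rule mirrors the 0.001 tolerance described in the text, and convergence relies on the operator norm of 𝒜 being less than one.

import numpy as np

alpha0 = (np.exp(-(s_grid - 0.4) ** 2 / (2 * 0.5 ** 2)) / (0.5 * np.sqrt(2 * np.pi))
          + 0.2 * s_grid - 0.4 * s_grid ** 2)      # alpha_0(s) as in (5.1)

def script_A(H):
    # {A H}_i(s) = alpha0(s) * sum_j w_ij A(H_j, s), for an n x |s_grid| array H.
    AH = np.vstack([A_op(H[j]) for j in range(H.shape[0])])
    return alpha0[None, :] * (W @ AH)

def generate_Y(base, tol=1e-3):
    # Neumann series: Y^(S) = sum_{l=0}^S A^l[base], with base = X_t beta0 + F0 + E_t,
    # iterated until successive partial sums differ by less than tol everywhere.
    Y, term = base.copy(), base.copy()
    while True:
        term = script_A(term)
        Y_new = Y + term
        if np.max(np.abs(Y_new - Y)) < tol:
            return Y_new
        Y = Y_new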

For the basis functions, we use cubic B-splines orthonormalized via the Gram-Schmidt procedure. The number of inner knots for the B-spline is selected from K̃ ∈ {2, 3}. The number of grid points used to evaluate the moment function is chosen from L ∈ {10, 30}, with the points 0 < s_1 < ⋯ < s_L < 1 evenly spaced over [0, 1]. For the quadratic moments, we use two weight matrices (M = 2): P_{1,1} = W_n and P_{2,1} = W_n^⊤ W_n − diag(W_n^⊤ W_n). We then compare the performance of three estimators: GMM 1, the integrated-GMM estimator using the weight matrix given in (3.37); GMM 2, the integrated-GMM estimator using the identity weight matrix; and (integrated) 2SLS, the GMM 1 estimator without the quadratic moment conditions.
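A sketch of the basis construction is given below, using SciPy's BSpline and a grid-based Gram-Schmidt step with trapezoidal integrals; the equally spaced placement of the inner knots is an assumption, as the text does not pin down the knot locations.

import numpy as np
from scipy.interpolate import BSpline

def orthonormal_bspline_basis(K_inner, s_grid, degree=3):
    # Cubic B-splines with K_inner equally spaced inner knots on [0, 1],
    # orthonormalized in L^2[0, 1] by Gram-Schmidt (integrals on the grid).
    inner = np.linspace(0.0, 1.0, K_inner + 2)[1:-1]
    knots = np.r_[np.zeros(degree + 1), inner, np.ones(degree + 1)]
    K = len(knots) - degree - 1                    # number of basis functions
    B = [BSpline(knots, np.eye(K)[k], degree)(s_grid) for k in range(K)]
    Phi = []
    for b in B:
        for p in Phi:
            b = b - np.trapz(b * p, s_grid) * p    # project out earlier functions
        Phi.append(b / np.sqrt(np.trapz(b * b, s_grid)))   # unit L^2 norm
    return np.vstack(Phi)                          # K x |s_grid| matrix of basis values

phi = orthonormal_bspline_basis(3, np.linspace(0.0, 1.0, 99))   # K-tilde = 3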

For each setup, we generate the dataset 500 times. The performance of the estimators is evaluated based on the average bias (BIAS) and the average root-mean-squared error (RMSE). Specifically, the BIAS and RMSE of estimating α0\alpha_{0} are defined as

BIAS(α0)\displaystyle\text{BIAS}(\alpha_{0}) 1500b=1500[199l=199(α^nT(b)(sl)α0(sl))]\displaystyle\coloneqq\frac{1}{500}\sum_{b=1}^{500}\left[\frac{1}{99}\sum_{l=1}^{99}\left(\widehat{\alpha}_{nT}^{(b)}(s_{l})-\alpha_{0}(s_{l})\right)\right] (5.2)
RMSE(α0)\displaystyle\text{RMSE}(\alpha_{0}) 1500b=1500199l=199(α^nT(b)(sl)α0(sl))2\displaystyle\coloneqq\frac{1}{500}\sum_{b=1}^{500}\sqrt{\frac{1}{99}\sum_{l=1}^{99}\left(\widehat{\alpha}_{nT}^{(b)}(s_{l})-\alpha_{0}(s_{l})\right)^{2}} (5.3)

respectively. Here, α^nT(b)\widehat{\alpha}_{nT}^{(b)} denotes the estimator of α0\alpha_{0} obtained from the bb-th replicated dataset. The BIAS and RMSE for β0\beta_{0} are defined analogously.
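In code, (5.2) and (5.3) amount to averaging over the grid first and over the Monte Carlo replications second. In the sketch below, alpha_hats is a hypothetical 500 x 99 array of grid-evaluated estimates and alpha0_vals the true values on the grid.

import numpy as np

def bias_rmse(alpha_hats, alpha0_vals):
    dev = alpha_hats - alpha0_vals[None, :]        # (replications x grid) deviations
    bias = dev.mean(axis=1).mean()                 # grid average, then MC average
    rmse = np.sqrt((dev ** 2).mean(axis=1)).mean() # grid RMS, then MC average
    return bias, rmse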

The simulation results for the estimation of α0 are summarized in Table 5.1. From these results, we observe that all three estimators perform reasonably well in terms of BIAS. In terms of RMSE, however, GMM 1 clearly outperforms the other two estimators across all scenarios. Recalling that GMM 1 and 2SLS are numerically identical once the quadratic moment conditions are dropped, it follows that GMM 1's efficiency gain relative to 2SLS stems solely from the quadratic moment conditions. Interestingly, when comparing GMM 2 and 2SLS, we find that 2SLS even outperforms GMM 2. These findings suggest that while incorporating quadratic moments does improve efficiency, the choice of the GMM weight matrix is equally (or potentially more) crucial. When r increases from 0.4 to 1, the RMSEs of the GMM estimators fall to roughly half of their original values or slightly more, whereas those of 2SLS fall by more than half; as anticipated, the 2SLS estimator is thus more sensitive to IV strength. The choices of L and K̃ have only minor impacts on performance. Finally, when the sample size increases from nT = 200 to nT = 800, the RMSE values are roughly halved, which numerically corroborates the √nT-consistency established in Theorem 3.1(ii).

Table 5.2 presents the simulation results for estimating β0\beta_{0}. Here, GMM 1 and 2SLS perform quite similarly, whereas GMM 2 is slightly less accurate than the other two. Notably, the RMSEs remain almost unchanged across different values of rr for all three estimators, suggesting that IV strength is not a critical factor. As with α0\alpha_{0}, the estimation of β0\beta_{0} also demonstrates nT\sqrt{nT}-consistency.

Table 5.1: Simulation result: α0

                           GMM 1               GMM 2               2SLS
  n    T    L   K̃    r    BIAS     RMSE       BIAS     RMSE       BIAS     RMSE
 40    5   10   2   0.4   0.0062   0.1132    -0.0517   0.3820     0.0479   0.2135
 40    5   10   2   1     0.0083   0.0649    -0.0205   0.2128     0.0136   0.0820
 40    5   10   3   0.4   0.0061   0.1131    -0.0564   0.3938     0.0479   0.2135
 40    5   10   3   1     0.0083   0.0649    -0.0242   0.2230     0.0136   0.0820
 40    5   30   2   0.4   0.0064   0.1135    -0.0406   0.3562     0.0479   0.2135
 40    5   30   2   1     0.0084   0.0650    -0.0138   0.1930     0.0136   0.0820
 40    5   30   3   0.4   0.0064   0.1135    -0.0418   0.3610     0.0479   0.2134
 40    5   30   3   1     0.0084   0.0650    -0.0150   0.1965     0.0136   0.0820
 40   10   10   2   0.4  -0.0027   0.0773    -0.0357   0.2701     0.0142   0.1370
 40   10   10   2   1    -0.0007   0.0456    -0.0179   0.1419     0.0028   0.0540
 40   10   10   3   0.4  -0.0027   0.0773    -0.0387   0.2796     0.0142   0.1370
 40   10   10   3   1    -0.0007   0.0456    -0.0203   0.1493     0.0028   0.0540
 40   10   30   2   0.4  -0.0028   0.0773    -0.0296   0.2478     0.0142   0.1370
 40   10   30   2   1    -0.0007   0.0455    -0.0146   0.1260     0.0028   0.0540
 40   10   30   3   0.4  -0.0029   0.0773    -0.0307   0.2520     0.0142   0.1370
 40   10   30   3   1    -0.0007   0.0455    -0.0150   0.1283     0.0028   0.0540
 80    5   10   2   0.4  -0.0030   0.0812    -0.0394   0.2801     0.0140   0.1495
 80    5   10   2   1    -0.0004   0.0470    -0.0196   0.1508     0.0019   0.0593
 80    5   10   3   0.4  -0.0030   0.0812    -0.0423   0.2904     0.0140   0.1495
 80    5   10   3   1    -0.0004   0.0470    -0.0210   0.1578     0.0019   0.0593
 80    5   30   2   0.4  -0.0029   0.0813    -0.0336   0.2578     0.0140   0.1495
 80    5   30   2   1    -0.0004   0.0470    -0.0163   0.1344     0.0019   0.0593
 80    5   30   3   0.4  -0.0029   0.0813    -0.0342   0.2619     0.0140   0.1495
 80    5   30   3   1    -0.0004   0.0470    -0.0163   0.1365     0.0019   0.0593
 80   10   10   2   0.4  -0.0024   0.0539    -0.0142   0.1996     0.0023   0.0966
 80   10   10   2   1    -0.0016   0.0321    -0.0058   0.0980    -0.0007   0.0385
 80   10   10   3   0.4  -0.0024   0.0539    -0.0168   0.2071     0.0023   0.0966
 80   10   10   3   1    -0.0016   0.0321    -0.0076   0.1032    -0.0007   0.0385
 80   10   30   2   0.4  -0.0023   0.0538    -0.0109   0.1812     0.0023   0.0966
 80   10   30   2   1    -0.0016   0.0320    -0.0044   0.0860    -0.0007   0.0385
 80   10   30   3   0.4  -0.0024   0.0538    -0.0118   0.1842     0.0023   0.0966
 80   10   30   3   1    -0.0016   0.0320    -0.0046   0.0874    -0.0007   0.0385
Table 5.2: Simulation result: β0

                           GMM 1               GMM 2               2SLS
  n    T    L   K̃    r    BIAS     RMSE       BIAS     RMSE       BIAS     RMSE
 40    5   10   2   0.4   0.0015   0.0653     0.0039   0.0966     0.0025   0.0681
 40    5   10   2   1     0.0034   0.0669     0.0128   0.1091    -0.0019   0.0670
 40    5   10   3   0.4   0.0015   0.0653     0.0042   0.0978     0.0025   0.0681
 40    5   10   3   1     0.0034   0.0669     0.0139   0.1115    -0.0019   0.0670
 40    5   30   2   0.4   0.0015   0.0653     0.0032   0.0928     0.0025   0.0681
 40    5   30   2   1     0.0034   0.0669     0.0098   0.1029    -0.0019   0.0670
 40    5   30   3   0.4   0.0015   0.0653     0.0033   0.0930     0.0025   0.0681
 40    5   30   3   1     0.0034   0.0669     0.0100   0.1032    -0.0019   0.0670
 40   10   10   2   0.4   0.0004   0.0428     0.0032   0.0612     0.0001   0.0438
 40   10   10   2   1     0.0014   0.0435     0.0088   0.0654    -0.0014   0.0439
 40   10   10   3   0.4   0.0004   0.0428     0.0034   0.0617     0.0001   0.0438
 40   10   10   3   1     0.0014   0.0435     0.0094   0.0666    -0.0014   0.0439
 40   10   30   2   0.4   0.0004   0.0428     0.0026   0.0590     0.0001   0.0438
 40   10   30   2   1     0.0014   0.0435     0.0072   0.0620    -0.0014   0.0439
 40   10   30   3   0.4   0.0004   0.0428     0.0027   0.0591     0.0001   0.0438
 40   10   30   3   1     0.0014   0.0435     0.0073   0.0621    -0.0014   0.0439
 80    5   10   2   0.4   0.0022   0.0435     0.0057   0.0669     0.0030   0.0445
 80    5   10   2   1     0.0039   0.0442     0.0128   0.0739     0.0010   0.0442
 80    5   10   3   0.4   0.0022   0.0435     0.0059   0.0676     0.0030   0.0445
 80    5   10   3   1     0.0039   0.0442     0.0135   0.0754     0.0010   0.0442
 80    5   30   2   0.4   0.0022   0.0435     0.0051   0.0641     0.0030   0.0445
 80    5   30   2   1     0.0038   0.0442     0.0110   0.0695     0.0010   0.0442
 80    5   30   3   0.4   0.0022   0.0435     0.0051   0.0642     0.0030   0.0445
 80    5   30   3   1     0.0038   0.0442     0.0110   0.0697     0.0010   0.0442
 80   10   10   2   0.4   0.0009   0.0304     0.0015   0.0424     0.0013   0.0306
 80   10   10   2   1     0.0016   0.0306     0.0041   0.0446     0.0005   0.0305
 80   10   10   3   0.4   0.0009   0.0304     0.0017   0.0428     0.0013   0.0306
 80   10   10   3   1     0.0016   0.0306     0.0045   0.0455     0.0005   0.0305
 80   10   30   2   0.4   0.0009   0.0303     0.0012   0.0407     0.0013   0.0306
 80   10   30   2   1     0.0016   0.0306     0.0034   0.0421     0.0005   0.0305
 80   10   30   3   0.4   0.0009   0.0303     0.0013   0.0408     0.0013   0.0306
 80   10   30   3   1     0.0016   0.0306     0.0034   0.0422     0.0005   0.0305

Analyzing the Demand for a Bike-Sharing System

As an empirical application of our method, we analyze spatial interactions in the demand for a bike-sharing system in the U.S. Demand analysis of shared mobility has been a highly active research topic in recent years across various areas, including transportation research, marketing, economics, and environmental studies. In particular, bike-sharing systems have attracted increasing attention. For a comprehensive review of this literature, see Eren and Uz (2020), for instance.

Data

The dataset used in this analysis comes from the Bay Area Bike Share in San Francisco, which was established in August 2013 and is now known as Bay Wheels. The dataset is publicly available on the Kaggle website (https://www.kaggle.com/datasets/benhamner/sf-bay-area-bike-share). It contains detailed information about the system from August 2013 to August 2015, including station locations, the number of available bicycles at each station over time, and all trip-level data during this time period. The trip data include details such as start and end times and stations, as well as the user type (subscriber or casual user). In this dataset, there are 70 bike stations in total; for a map of all 70 station locations, see Figure 6.1.

Figure 6.1: Locations of bike stations

Although most stations have been in place since the system's launch in August 2013, the 70th station (Ryland Park) was not added until April 2014. Accordingly, we use data from May 2014 to August 2015 for this analysis, which represents the largest balanced panel that can be extracted from the raw data.

One concern in the analysis is that shared mobility services often relocate vehicles or bikes from one station to another to maintain service availability across all locations. To detect potential relocations, we first identified instances where the number of available bicycles jumped up or down by 10 or more at once. We then examined the distribution of these events across different hours and days, as shown in Figure D.1 in Appendix D. From this figure, we observe that sudden drops or increases in bike availability tend to occur between midnight and early morning, particularly on Sundays. Although we do not have access to formal records of actual relocation operations, these patterns suggest that such jumps are likely the result of bike relocations carried out by the service provider. Another concern is the enormous size of the dataset: because the original data are recorded at the minute level, using the raw data directly can lead to memory problems. Moreover, daily data tend to fluctuate and be noisy due to random events.

To address these issues, we first rounded the trip data to 15-minute intervals and then averaged over Monday through Friday at each interval, discarding data from Saturdays and Sundays. Furthermore, to avoid potential bike relocation events on weekdays, we restrict the analysis to the period from 6 AM to 9 PM. Consequently, our final dataset is a weekly-level panel with n = 70 stations and T = 69 weeks. The outcome of interest is the number of available bicycles at each station at intraday time s ∈ [0, 1], where s = 0 corresponds to 6 AM and s = 1 corresponds to 9 PM.

Figure 6.2 presents the trajectories of average bike availability for all 70 stations during the first week in our panel. It clearly shows that most of the variation in bike availability occurs between 6 AM and 9 PM.

Figure 6.2: Availability of bikes at each station (averaged over 5-9 May, 2014)

Empirical results

Based on the dataset constructed as described above, we estimate model (1.2), where

Yit(s)\displaystyle Y_{it}(s) =number of available bikes at each station\displaystyle=\text{number of available bikes at each station} (6.1)
A(Yjt, s) = average of Yjt over the past hour (6.2)
Xit\displaystyle X_{it} =[ ratio of round trips, ratio of subscribers (departing from station i), ratio of\displaystyle=\text{[\;ratio of round trips, ratio of subscribers (departing from station $i$), ratio of} (6.3)
subscribers (arriving at station ii), rainy day dummy, month dummies ] (6.4)
wij\displaystyle w_{ij} =w~ijjiw~ij,wherew~ij=𝟏{dist(i,j)1km}dist(i,j)\displaystyle=\frac{\widetilde{w}_{ij}}{\sum_{j\neq i}\widetilde{w}_{ij}},\;\;\text{where}\;\;\widetilde{w}_{ij}=\frac{\bm{1}\{\text{dist}(i,j)\leq 1\text{km}\}}{\text{dist}(i,j)} (6.5)

Here, dist(i, j) denotes the Euclidean distance between stations i and j. The estimation procedure is essentially the same as that of the GMM 1 estimator in Section 5, with K̃ = 3. The rainy day dummy and the month dummies are not used as IVs. All integrals are approximated by finite summations over grid points at 15-minute intervals.
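The weight construction in (6.5) can be sketched as follows; coords_km, an array of station coordinates in kilometers, is a hypothetical input (in practice, one would project the latitude and longitude pairs contained in the raw data), and distinct station locations are assumed.

import numpy as np

def station_weights(coords_km, cutoff_km=1.0):
    # w-tilde_ij = 1{dist(i,j) <= 1 km} / dist(i,j); w_ij = w-tilde_ij / sum_{j != i} w-tilde_ij
    d = np.linalg.norm(coords_km[:, None, :] - coords_km[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                    # exclude self-links
    w_tilde = np.where(d <= cutoff_km, 1.0 / d, 0.0)
    rowsum = w_tilde.sum(axis=1, keepdims=True)
    return np.divide(w_tilde, rowsum, out=np.zeros_like(w_tilde), where=rowsum > 0)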

The estimation result for the interaction effect function α0 is presented in Figure 6.3, where the shaded area depicts the (pointwise) 95% confidence interval. From the figure, we observe positive spatial interaction in bicycle availability during the morning hours. Although the model itself is agnostic about the source of this interaction, it is plausible that as bike-sharing becomes more popular, particularly among commuters, it encourages further use of the service, thereby reinforcing demand during the morning. Interestingly, a negative interaction appears around 5–7 PM. In the evening, the main users may include not only returning commuters but also individuals going out for dining, shopping, concerts, and the like. As a result, bicycles might accumulate at certain popular stations while nearby less popular stations experience lower availability, leading to the negative interaction.

Figure 6.3: Estimated α0(s)\alpha_{0}(s)

To save space, the estimation results for β0(s) are presented in Figure D.2 in Appendix D, excluding the coefficients on the month dummies. Among the key covariates, we observe that only the ratio of arriving subscribers has a statistically significant positive impact on bike availability. This result is intuitive, as stations at which more regular users arrive are expected to hold a richer stock of bikes. As for the other variables, the rainy day dummy, for instance, has a positive effect on bike availability, which is consistent with previous studies, though the effect is not statistically significant. One possible explanation is that the rainy "day" dummy does not capture detailed intraday variation (i.e., it is not a function of s), and since our dataset is averaged over weeks, these features may have diluted its impact.

Lastly, we conduct an impulse response analysis; the results are summarized in Figure 6.4. For illustration, we select the Embarcadero at Folsom station as the target station receiving an external shock. Specifically, we consider a hypothetical scenario in which the bike stock at this station is reduced by two bikes, with the shock peaking at 9 AM (panel (a)). Panels (b) and (c) illustrate how the shock propagates to its two nearest stations, Spear at Folsom and Temporary Transbay Terminal. The external shock spills over to these stations with a slight time delay, peaking just before 10 AM. Since the magnitudes of both the external shock and the spatial interaction are moderate in this analysis, the impulse responses at both stations are relatively mild.

Figure 6.4: Impulse responses. (a) External shock function η; (b) Spear at Folsom station; (c) Temporary Transbay Terminal station.

Note: IRF0 = W_n^0 𝒆_i γ̂_{nT}^0(η, s), IRF1 = IRF0 + W_n^1 𝒆_i γ̂_{nT}^1(η, s), and IRF2 = IRF1 + W_n^2 𝒆_i γ̂_{nT}^2(η, s).
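The partial sums IRF0, IRF1, and IRF2 can be computed as in the sketch below. The Gaussian-bump shock of size -2 peaking at 9 AM and the discrete one-hour moving average are illustrative parametrizations of η and A, and W_hat and alpha_hat_vals stand in for the fitted weight matrix and interaction function.

import numpy as np

s_grid = np.linspace(0.0, 1.0, 61)                 # 15-minute grid: 6 AM (s=0) to 9 PM (s=1)
eta = -2.0 * np.exp(-((s_grid - 0.2) ** 2) / (2 * 0.02 ** 2))   # 9 AM is s = 3/15 = 0.2

def A_hour(h_vals):
    # A(Y, s): average over the past hour, i.e., the last four 15-minute grid steps.
    return np.array([h_vals[max(0, k - 4):k + 1].mean() for k in range(len(h_vals))])

def irf_partial_sums(W_hat, alpha_hat_vals, eta_vals, i, S=2):
    # Returns [IRF0, IRF1, ..., IRFS]: cumulative sums of W^l e_i gamma-hat^l(eta, s).
    n = W_hat.shape[0]
    walk = np.zeros(n); walk[i] = 1.0
    gamma_l = eta_vals.copy()
    acc = np.outer(walk, gamma_l)
    out = [acc.copy()]
    for _ in range(S):
        walk = W_hat @ walk
        gamma_l = alpha_hat_vals * A_hour(gamma_l)
        acc = acc + np.outer(walk, gamma_l)
        out.append(acc.copy())
    return out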

Conclusion

In this paper, we proposed a novel functional regression framework to analyze spatial and network interactions in functional panel data settings. By extending the standard NAR model to accommodate functional outcomes and individual fixed effects, we developed an integrated-GMM estimator that can estimate the functional parameters potentially more efficiently than 2SLS-based estimators. Under certain conditions, we established the theoretical properties of our estimator, including the consistency, convergence rates, and asymptotic normality, and confirmed its finite-sample performance through Monte Carlo simulations. As an empirical application, we analyzed the demand for a bike-sharing system in the San Francisco Bay Area, revealing significant spatial interactions in bike availability that vary over the time of day. Our findings highlight the importance of accounting for functional spatial dependencies in the demand for shared mobility services and the practical usefulness of our method.

Several open questions remain. These include: How can we specify the weight function ν in a data-driven manner? Is it possible to extend the current framework to cases where functional outcomes are only sparsely observed? How can we construct a uniform confidence band for each functional parameter? How can we estimate the model when it has a large number of covariates? We leave these questions for future research.

Acknowledgments

Hoshino’s work was supported by JSPS KAKENHI Grant Number 23KK0226. Most parts of this paper were written during Hoshino’s research visit at the Melbourne Business School (MBS), University of Melbourne. He is deeply grateful to MBS for their hospitality.

Appendix A Preparation

The following definition is from Jenish and Prucha (2012).

Definition A.1 (Near-epoch dependence).

Let 𝒙={xn,i:i𝒟n;n1}\bm{x}=\{x_{n,i}:i\in\mathcal{D}_{n};\;n\geq 1\} and 𝒆={en,i:i𝒟n;n1}\bm{e}=\{e_{n,i}:i\in\mathcal{D}_{n};\;n\geq 1\} be triangular arrays of random fields, where xx and ee are real-valued and general (possibly infinite-dimensional) random variables, respectively. Then, the random field 𝒙\bm{x} is said to be LpL^{p}-near-epoch dependent (NED) on 𝒆\bm{e} if

xn,i𝔼[xn,in,i(δ)]pcn,iρ(δ)\left\|x_{n,i}-\mathbb{E}\left[x_{n,i}\mid\mathcal{F}_{n,i}(\delta)\right]\right\|_{p}\leq c_{n,i}\rho(\delta)

for an array of finite positive constants {c_{n,i} : i ∈ 𝒟_n; n ≥ 1} and some function ρ(δ) ≥ 0 with ρ(δ) → 0 as δ → ∞, where ℱ_{n,i}(δ) is the σ-field generated by {e_{n,j} : Δ(i, j) ≤ δ}. The c_{n,i}'s and ρ(δ) are called the NED scaling factors and the NED coefficient, respectively. The random field 𝒙 is said to be uniformly L^p-NED on 𝒆 if c_{n,i} is uniformly bounded. If ρ(δ) ≲ ϱ^δ for some 0 < ϱ < 1, then 𝒙 is said to be geometrically L^p-NED.

In the following, for a general θ=(θα,θ1,,θdx)ΘK\theta=(\theta_{\alpha}^{\top},\theta_{1}^{\top},\ldots,\theta_{d_{x}}^{\top})^{\top}\in\Theta_{K}, we denote

α(s;θ)\displaystyle\alpha(s;\theta) ϕK(s)θα\displaystyle\coloneqq\phi^{K}(s)^{\top}\theta_{\alpha} (A.1)
βj(s;θ)\displaystyle\beta_{j}(s;\theta) ϕK(s)θj,j[dx]\displaystyle\coloneqq\phi^{K}(s)^{\top}\theta_{j},\;\;j\in[d_{x}] (A.2)

Since we have assumed that the basis functions are continuous, so are α(s;θ)\alpha(s;\theta) and βj(s;θ)\beta_{j}(s;\theta), and thus they are uniformly bounded on [0,1][0,1] by the extreme value theorem. For a given θ\theta, the residual vector can be written as 𝑬(s;θ)=(E1(s;θ),,ET(s;θ))\bm{E}(s;\theta)=(E_{1}(s;\theta)^{\top},\ldots,E_{T}(s;\theta)^{\top}), where

Et(s;θ)\displaystyle E_{t}(s;\theta) =(e1t(s;θ),,ent(s;θ))\displaystyle=(e_{1t}(s;\theta),\ldots,e_{nt}(s;\theta))^{\top} (A.3)
eit(s;θ)\displaystyle e_{it}(s;\theta) =Yit(s)α(s;θ)A(Y¯it,s)j=1dxXitjβj(s;θ).\displaystyle=Y_{it}(s)-\alpha(s;\theta)A(\overline{Y}_{it},s)-\sum_{j=1}^{d_{x}}X_{it}^{j}\beta_{j}(s;\theta). (A.4)

Under Assumptions 2.1(i), 3.2, and 3.4, we have

eit(s;θ)p\displaystyle\left\|e_{it}(s;\theta)\right\|_{p} Yit(s)p+|α(s;θ)|j=1n|wi,j|A(Yjt,s)p+j=1dx|Xitj||βj(s;θ)|\displaystyle\leq\left\|Y_{it}(s)\right\|_{p}+|\alpha(s;\theta)|\sum_{j=1}^{n}|w_{i,j}|\left\|A(Y_{jt},s)\right\|_{p}+\sum_{j=1}^{d_{x}}|X_{it}^{j}|\cdot|\beta_{j}(s;\theta)| (A.5)
1\displaystyle\lesssim 1 (A.6)

for p>4p>4, uniformly in s[0,1]s\in[0,1], θΘK\theta\in\Theta_{K}, and (i,t)(i,t).

Finally, for ease of reference, we provide a list of some basic facts below:

𝑫𝑬(s;θ)\displaystyle\bm{D}\bm{E}(s;\theta) =𝑫𝑯(s)(θ0θ)+𝑫𝑽(s)+𝑫𝓔(s)\displaystyle=\bm{D}\bm{H}(s)(\theta_{0}-\theta)+\bm{D}\bm{V}(s)+\bm{D}\bm{\mathcal{E}}(s) (A.7)
𝒁(s)𝑫𝑫𝑬(s;θ)\displaystyle\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{E}(s;\theta) =𝒁(s)𝑫𝑫𝑯(s)(θ0θ)+𝒁(s)𝑫𝑫𝑽(s)+𝒁(s)𝑫𝑫𝓔(s)\displaystyle=\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{H}(s)(\theta_{0}-\theta)+\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s)+\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s) (A.8)
𝒁(s)𝑫𝑫𝑬(s;θ0)\displaystyle\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{E}(s;\theta_{0}) =𝒁(s)𝑫𝑫𝑽(s)+𝒁(s)𝑫𝑫𝓔(s)\displaystyle=\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s)+\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s) (A.9)
𝑬(s;θ)𝑫Pm𝑫𝑬(s;θ)\displaystyle\bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s;\theta) =(θ0θ)𝑯(s)𝑫Pm𝑫𝑯(s)(θ0θ)+𝑽(s)𝑫Pm𝑫𝑽(s)\displaystyle=(\theta_{0}-\theta)^{\top}\bm{H}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s)(\theta_{0}-\theta)+\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{V}(s) (A.10)
+𝓔(s)𝑫Pm𝑫𝓔(s)+2𝑽(s)𝑫Pm𝑫𝑯(s)(θ0θ)\displaystyle\quad+\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s)+2\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s)(\theta_{0}-\theta) (A.11)
+2𝓔(s)𝑫Pm𝑫𝑯(s)(θ0θ)+2𝑽(s)𝑫Pm𝑫𝓔(s)\displaystyle\quad+2\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s)(\theta_{0}-\theta)+2\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s) (A.12)
𝑬(s;θ0)𝑫Pm𝑫𝑬(s;θ0)\displaystyle\bm{E}(s;\theta_{0})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s;\theta_{0}) =𝑽(s)𝑫Pm𝑫𝑽(s)+𝓔(s)𝑫Pm𝑫𝓔(s)+2𝑽(s)𝑫Pm𝑫𝓔(s)\displaystyle=\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{V}(s)+\bm{\mathcal{E}}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s)+2\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s) (A.13)

Empirical moment function:

gnT(s;θ)1n(T1)(𝒁(s)𝑫𝑫𝑬(s;θ)𝑬(s;θ)𝑫P1𝑫𝑬(s;θ)𝑬(s;θ)𝑫PM𝑫𝑬(s;θ))\displaystyle g_{nT}(s;\theta)\coloneqq\frac{1}{n(T-1)}\left(\begin{array}[]{c}\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{E}(s;\theta)\\ \bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{E}(s;\theta)\\ \vdots\\ \bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{E}(s;\theta)\end{array}\right) (A.18)

Jacobian of gnT(s;θ)g_{nT}(s;\theta):

JnT(s;θ)\displaystyle J_{nT}(s;\theta) gnT(s;θ)θ=1n(T1)(𝒁(s)𝑫𝑫𝑯(s)2𝑬(s;θ)𝑫P1𝑫𝑯(s)2𝑬(s;θ)𝑫PM𝑫𝑯(s))\displaystyle\coloneqq\frac{\partial g_{nT}(s;\theta)}{\partial\theta^{\top}}=-\frac{1}{n(T-1)}\left(\begin{array}[]{c}\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\bm{H}(s)\\ 2\bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{H}(s)\\ \vdots\\ 2\bm{E}(s;\theta)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{H}(s)\end{array}\right) (A.23)

Decompose g¯nT(θ0)=g¯1,nT+g¯2,nT\overline{g}_{nT}(\theta_{0})=\overline{g}_{1,nT}+\overline{g}_{2,nT} with

g¯1,nT\displaystyle\overline{g}_{1,nT} 1Nl=1L(𝒁(sl)𝑫𝑫𝓔(sl)𝓔(sl)𝑫P1𝑫𝓔(sl)𝓔(sl)𝑫PM𝑫𝓔(sl))\displaystyle\coloneqq\frac{1}{N}\sum_{l=1}^{L}\left(\begin{array}[]{c}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})\\ \bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{\mathcal{E}}(s_{l})\\ \vdots\\ \bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{\mathcal{E}}(s_{l})\end{array}\right) (A.28)
g¯2,nT\displaystyle\overline{g}_{2,nT} 1Nl=1L(𝒁(sl)𝑫𝑫𝑽(sl)𝑽(sl)𝑫P1𝑫𝑽(sl)+2𝑽(sl)𝑫P1𝑫𝓔(sl)𝑽(sl)𝑫PM𝑫𝑽(sl)+2𝑽(sl)𝑫PM𝑫𝓔(sl))\displaystyle\coloneqq\frac{1}{N}\sum_{l=1}^{L}\left(\begin{array}[]{c}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s_{l})\\ \bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{V}(s_{l})+2\bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{\mathcal{E}}(s_{l})\\ \vdots\\ \bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{V}(s_{l})+2\bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{\mathcal{E}}(s_{l})\end{array}\right) (A.33)

The variance-covariance matrix of n(T1)g¯1,nT\sqrt{n(T-1)}\overline{g}_{1,nT}:

𝒱nTn(T1)𝔼[g¯1,nTg¯1,nT]=(𝒱z,nT𝟎(dq+dx)K×1𝟎(dq+dx)K×1𝟎1×(dq+dx)K𝒱11,nT𝒱1M,nT𝟎1×(dq+dx)K𝒱M1,nT𝒱MM,nT)\displaystyle\mathcal{V}_{nT}\coloneqq n(T-1)\mathbb{E}\left[\overline{g}_{1,nT}\overline{g}_{1,nT}^{\top}\right]=\left(\begin{array}[]{cccc}\mathcal{V}_{z,nT}&\bm{0}_{(d_{q}+d_{x})K\times 1}&\cdots&\bm{0}_{(d_{q}+d_{x})K\times 1}\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\mathcal{V}_{11,nT}&\cdots&\mathcal{V}_{1M,nT}\\ \vdots&\vdots&\ddots&\vdots\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\mathcal{V}_{M1,nT}&\cdots&\mathcal{V}_{MM,nT}\end{array}\right) (A.38)

where

𝒱z,nTn(T1)N2l=1Ll=1L𝒁(sl)𝑫𝑫𝔼[𝓔(sl)𝓔(sl)]𝑫𝑫𝒁(sl)=1L2n(T1)l=1Ll=1Lt=1Ti=1nzit(sl)zit(sl)Γit(sl,sl),\displaystyle\begin{split}\mathcal{V}_{z,nT}&\coloneqq\frac{n(T-1)}{N^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{\mathcal{E}}(s_{l})\bm{\mathcal{E}}(s_{l^{\prime}})^{\top}]\bm{D}^{\top}\bm{D}\bm{Z}(s_{l^{\prime}})\\ &=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T}\sum_{i=1}^{n}z_{it}^{\dagger}(s_{l})z_{it}^{\dagger}(s_{l^{\prime}})^{\top}\Gamma_{it}(s_{l},s_{l^{\prime}}),\end{split} (A.39)

z_{it}^†(s) denotes the it-th column of the (d_q + d_x)K × nT matrix Z(s)^⊤D^⊤D, and we write P̃_m ≔ D^⊤P_mD = (p̃_{m,it,jt})_{1 ≤ it, jt ≤ nT}. Noting that diag(P̃_m) = 𝟎 and that P̃_m is symmetric,

𝒱ab,nTn(T1)N2l=1Ll=1L𝔼[𝓔(sl)𝑫Pa𝑫𝓔(sl)𝓔(sl)𝑫Pb𝑫𝓔(sl)]=1n(T1)t=1Tt=1T1i1,i2n1j1,j2np~a,i1t,i2tp~b,j1t,j2t1L2l=1Ll=1L𝔼[εi1t(sl)εi2t(sl)εj1t(sl)εj2t(sl)]=1n(T1)t=1T1i1,i2np~a,i1t,i2tp~b,i1t,i2t1L2l=1Ll=1LΓi1t(sl,sl)Γi2t(sl,sl)+1n(T1)t=1T1i1,i2np~a,i1t,i2tp~b,i2t,i1t1L2l=1Ll=1LΓi1t(sl,sl)Γi2t(sl,sl)=2n(T1)t=1T1i1,i2np~a,i1t,i2tp~b,i1t,i2t1L2l=1Ll=1LΓi1t(sl,sl)Γi2t(sl,sl).\displaystyle\begin{split}\mathcal{V}_{ab,nT}&\coloneqq\frac{n(T-1)}{N^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\mathbb{E}[\bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}P_{a}\bm{D}\bm{\mathcal{E}}(s_{l})\bm{\mathcal{E}}(s_{l^{\prime}})^{\top}\bm{D}^{\top}P_{b}\bm{D}\bm{\mathcal{E}}(s_{l^{\prime}})]\\ &=\frac{1}{n(T-1)}\sum_{t=1}^{T}\sum_{t^{\prime}=1}^{T}\sum_{1\leq i_{1},i_{2}\leq n}\sum_{1\leq j_{1},j_{2}\leq n}\widetilde{p}_{a,i_{1}t,i_{2}t}\widetilde{p}_{b,j_{1}t^{\prime},j_{2}t^{\prime}}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\mathbb{E}[\varepsilon_{i_{1}t}(s_{l})\varepsilon_{i_{2}t}(s_{l})\varepsilon_{j_{1}t^{\prime}}(s_{l^{\prime}})\varepsilon_{j_{2}t^{\prime}}(s_{l^{\prime}})]\\ &=\frac{1}{n(T-1)}\sum_{t=1}^{T}\sum_{1\leq i_{1},i_{2}\leq n}\widetilde{p}_{a,i_{1}t,i_{2}t}\widetilde{p}_{b,i_{1}t,i_{2}t}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{i_{1}t}(s_{l},s_{l^{\prime}})\Gamma_{i_{2}t}(s_{l},s_{l^{\prime}})\\ &\quad+\frac{1}{n(T-1)}\sum_{t=1}^{T}\sum_{1\leq i_{1},i_{2}\leq n}\widetilde{p}_{a,i_{1}t,i_{2}t}\widetilde{p}_{b,i_{2}t,i_{1}t}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{i_{1}t}(s_{l},s_{l^{\prime}})\Gamma_{i_{2}t}(s_{l},s_{l^{\prime}})\\ &=\frac{2}{n(T-1)}\sum_{t=1}^{T}\sum_{1\leq i_{1},i_{2}\leq n}\widetilde{p}_{a,i_{1}t,i_{2}t}\widetilde{p}_{b,i_{1}t,i_{2}t}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{i_{1}t}(s_{l},s_{l^{\prime}})\Gamma_{i_{2}t}(s_{l},s_{l^{\prime}}).\end{split} (A.40)

Note that the cross terms between the linear and quadratic moments are zero:

n(T1)N2l=1Ll=1L𝔼[𝒁(sl)𝑫𝑫𝓔(sl)𝓔(sl)𝑫Pm𝑫𝓔(sl)]\displaystyle\frac{n(T-1)}{N^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\mathbb{E}[\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})\bm{\mathcal{E}}(s_{l^{\prime}})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s_{l^{\prime}})] (A.41)
=n(T1)N2l=1Ll=1Lt=1Tt=1Ti=1n1i1,i2nzit(sl)p~m,i1t,i2t𝔼[εit(sl)εi1t(sl)εi2t(sl)]=𝟎(dq+dx)K.\displaystyle=\frac{n(T-1)}{N^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T}\sum_{t^{\prime}=1}^{T}\sum_{i=1}^{n}\sum_{1\leq i_{1},i_{2}\leq n}z_{it}^{\dagger}(s_{l})\widetilde{p}_{m,i_{1}t^{\prime},i_{2}t^{\prime}}\mathbb{E}[\varepsilon_{it}(s_{l})\varepsilon_{i_{1}t^{\prime}}(s_{l^{\prime}})\varepsilon_{i_{2}t^{\prime}}(s_{l^{\prime}})]=\bm{0}_{(d_{q}+d_{x})K}. (A.42)

Appendix B Proofs

Proof of Proposition 2.1

Under Assumption 2.1, we have

{𝒜H}iL2=α0()j=1nwi,jA(hj,)L2j=1n|wi,j|α0()A(hj,)L2=j=1n|wi,j|(01|α0(s)A(hj,s)|2ds)1/2α¯0j=1n|wi,j|A(hj,)L2α¯0Wnmax1jnhjL2<H,2<\displaystyle\begin{split}\left\|\{\mathcal{A}H\}_{i}\right\|_{L^{2}}=\left\|\alpha_{0}(\cdot)\sum_{j=1}^{n}w_{i,j}A(h_{j},\cdot)\right\|_{L^{2}}&\leq\sum_{j=1}^{n}|w_{i,j}|\left\|\alpha_{0}(\cdot)A(h_{j},\cdot)\right\|_{L^{2}}\\ &=\sum_{j=1}^{n}|w_{i,j}|\left(\int_{0}^{1}\left|\alpha_{0}(s)A(h_{j},s)\right|^{2}\text{d}s\right)^{1/2}\\ &\leq\overline{\alpha}_{0}\sum_{j=1}^{n}|w_{i,j}|\left\|A(h_{j},\cdot)\right\|_{L^{2}}\\ &\leq\overline{\alpha}_{0}||W_{n}||_{\infty}\max_{1\leq j\leq n}||h_{j}||_{L^{2}}<||H||_{\infty,2}<\infty\end{split} (B.1)

for any Hn,2H\in\mathcal{H}_{n,2}. This implies that 𝒜Hn,2\mathcal{A}H\in\mathcal{H}_{n,2}. As is well known, if the operator norm of 𝒜\mathcal{A} is smaller than one, (Id𝒜)1(\text{Id}-\mathcal{A})^{-1} exists (e.g., Theorem 2.14, Kress (2014)). It is immediate from (B.1) that 𝒜H,2<1\left\|\mathcal{A}H\right\|_{\infty,2}<1 follows for any HH such that H,2=1||H||_{\infty,2}=1, which yields the desired result. \blacksquare


Lemma B.1.

Suppose that Assumptions 2.1(i), 3.2, 3.3(ii), 3.4, 3.5(i), and 3.7 hold. Then, 𝔼[g_{nT}(s; θ_0)] ≲ 𝟏_{d_g} K^{−π}.

Proof.

Observe that

𝔼[gnT(s;θ0)]\displaystyle\mathbb{E}[g_{nT}(s;\theta_{0})] =1n(T1)(𝒁(s)𝑫𝑫𝔼[𝑽(s)]𝔼[𝑽(s)𝑫P1𝑫𝑽(s)]+2𝔼[𝑽(s)𝑫P1𝑫𝓔(s)]𝔼[𝑽(s)𝑫PM𝑫𝑽(s)]+2𝔼[𝑽(s)𝑫PM𝑫𝓔(s)])\displaystyle=\frac{1}{n(T-1)}\left(\begin{array}[]{c}\bm{Z}(s)^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{V}(s)]\\ \mathbb{E}[\bm{V}(s)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{V}(s)]+2\mathbb{E}[\bm{V}(s)^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{\mathcal{E}}(s)]\\ \vdots\\ \mathbb{E}[\bm{V}(s)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{V}(s)]+2\mathbb{E}[\bm{V}(s)^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{\mathcal{E}}(s)]\end{array}\right) (B.6)

By Assumptions 3.2(ii) and 3.4,

|A(𝔼[Yjt],s)|\displaystyle|A(\mathbb{E}[Y_{jt}],s)| 01|𝔼[Yjt(u)]|ω1(u,s)du\displaystyle\leq\int_{0}^{1}|\mathbb{E}[Y_{jt}(u)]|\omega_{1}(u,s)\text{d}u (B.7)
01𝔼|Yjt(u)|ω1(u,s)du1\displaystyle\leq\int_{0}^{1}\mathbb{E}|Y_{jt}(u)|\omega_{1}(u,s)\text{d}u\lesssim 1 (B.8)

uniformly in s[0,1]s\in[0,1], implying that sups[0,1]|A(𝔼[Yjt],s)|1\sup_{s\in[0,1]}|A(\mathbb{E}[Y_{jt}],s)|\lesssim 1. Then, we have

|𝔼[vit(s)]|\displaystyle|\mathbb{E}[v_{it}(s)]| j=1n|wi,j||A(𝔼[Yjt],s)||α0(s)α(s;θ0)|+j=1dx|Xitj||β0j(s)βj(s;θ0)|\displaystyle\leq\sum_{j=1}^{n}|w_{i,j}|\cdot|A(\mathbb{E}[Y_{jt}],s)|\cdot|\alpha_{0}(s)-\alpha(s;\theta_{0})|+\sum_{j=1}^{d_{x}}|X_{it}^{j}|\cdot|\beta_{0j}(s)-\beta_{j}(s;\theta_{0})| (B.9)
Kπ\displaystyle\lesssim K^{-\pi} (B.10)

uniformly in s[0,1]s\in[0,1] and (i,t)(i,t) under Assumption 3.7. This implies that the first (dq+dx)K(d_{q}+d_{x})K elements of 𝔼[gnT(s;θ0)]\mathbb{E}[g_{nT}(s;\theta_{0})] are of order KπK^{-\pi}.

Next, by Cauchy-Schwarz inequality and the facts that λmax(𝑫𝑫)4\lambda_{\max}(\bm{D}\bm{D}^{\top})\leq 4 and λmax(PmPm)1\lambda_{\max}(P_{m}P_{m}^{\top})\lesssim 1 under Assumption 3.5(i), we obtain

𝔼|𝑽(s)𝑫Pm𝑫𝑽(s)|\displaystyle\mathbb{E}\left|\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{V}(s)\right| 𝔼𝑽(s)𝑫Pm𝑫2𝔼𝑽(s)2\displaystyle\leq\sqrt{\mathbb{E}\left\|\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\right\|^{2}}\sqrt{\mathbb{E}\left\|\bm{V}(s)\right\|^{2}} (B.11)
trace{𝑫Pm𝑫𝑫Pm𝑫𝔼[𝑽(s)𝑽(s)]}𝔼𝑽(s)2\displaystyle\leq\sqrt{\text{trace}\{\bm{D}^{\top}P_{m}\bm{D}\bm{D}^{\top}P_{m}^{\top}\bm{D}\mathbb{E}[\bm{V}(s)\bm{V}(s)^{\top}]\}}\sqrt{\mathbb{E}\left\|\bm{V}(s)\right\|^{2}} (B.12)
t=1Ti=1n𝔼|vit(s)|2.\displaystyle\lesssim\sum_{t=1}^{T}\sum_{i=1}^{n}\mathbb{E}|v_{it}(s)|^{2}. (B.13)

Similarly as above, by the crc_{r} inequality,

𝔼|vit(s)|2\displaystyle\mathbb{E}|v_{it}(s)|^{2} 2𝔼|j=1nwi,jA(Yjt,s)[α0(s)α(s;θ0)]|2+2|j=1dxXitj(β0j(s)βj(s;θ0))|2\displaystyle\leq 2\mathbb{E}\left|\sum_{j=1}^{n}w_{i,j}A(Y_{jt},s)[\alpha_{0}(s)-\alpha(s;\theta_{0})]\right|^{2}+2\left|\sum_{j=1}^{d_{x}}X_{it}^{j}(\beta_{0j}(s)-\beta_{j}(s;\theta_{0}))\right|^{2} (B.14)
2j=1nj=1nwi,jwi,j𝔼[A(Yjt,s)A(Yjt,s)][α0(s)α(s;θ0)]2+cK2π.\displaystyle\leq 2\sum_{j=1}^{n}\sum_{j^{\prime}=1}^{n}w_{i,j}w_{i,j^{\prime}}\mathbb{E}[A(Y_{jt},s)A(Y_{j^{\prime}t},s)][\alpha_{0}(s)-\alpha(s;\theta_{0})]^{2}+cK^{-2\pi}. (B.15)

By Cauchy-Schwarz inequality,

|𝔼[A(Yjt,s)A(Yjt,s)]|\displaystyle|\mathbb{E}[A(Y_{jt},s)A(Y_{j^{\prime}t},s)]| A(Yjt,s)2A(Yjt,s)2.\displaystyle\leq\left\|A(Y_{jt},s)\right\|_{2}\left\|A(Y_{j^{\prime}t},s)\right\|_{2}. (B.16)

Further, Assumptions 3.2(ii) and 3.4 imply that 𝔼|A(Yjt,s)|201𝔼|Yjt(u)|2ω2(u,s)du1\mathbb{E}|A(Y_{jt},s)|^{2}\leq\int_{0}^{1}\mathbb{E}|Y_{jt}(u)|^{2}\omega_{2}(u,s)\text{d}u\lesssim 1 uniformly in s[0,1]s\in[0,1] and (j,t)(j,t). Thus, |𝔼[A(Yjt,s)A(Yjt,s)]||\mathbb{E}[A(Y_{jt},s)A(Y_{j^{\prime}t},s)]| is uniformly bounded, and we have

vit(s)2Kπ\displaystyle\left\|v_{it}(s)\right\|_{2}\lesssim K^{-\pi} (B.17)

uniformly in s[0,1]s\in[0,1] and (i,t)(i,t).

Lastly, by Cauchy-Schwarz and Minkowski’s inequalities,

𝔼[𝑽(s)𝑫Pm𝑫𝓔(s)]\displaystyle\mathbb{E}[\bm{V}(s)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s)] =t[T1]1i,jnpm,i,j𝔼[(vi,t+1(s)vit(s))(εj,t+1(s)εjt(s))]\displaystyle=\sum_{t\in[T-1]}\sum_{1\leq i,j\leq n}p_{m,i,j}\mathbb{E}[(v_{i,t+1}(s)-v_{it}(s))(\varepsilon_{j,t+1}(s)-\varepsilon_{jt}(s))] (B.18)
t[T1]1i,jn|pm,i,j|𝔼|(vi,t+1(s)vit(s))(εj,t+1(s)εjt(s))|\displaystyle\leq\sum_{t\in[T-1]}\sum_{1\leq i,j\leq n}|p_{m,i,j}|\mathbb{E}|(v_{i,t+1}(s)-v_{it}(s))(\varepsilon_{j,t+1}(s)-\varepsilon_{jt}(s))| (B.19)
t[T1]1i,jn|pm,i,j|vi,t+1(s)vit(s)2εj,t+1(s)εjt(s)2\displaystyle\leq\sum_{t\in[T-1]}\sum_{1\leq i,j\leq n}|p_{m,i,j}|\cdot||v_{i,t+1}(s)-v_{it}(s)||_{2}||\varepsilon_{j,t+1}(s)-\varepsilon_{jt}(s)||_{2} (B.20)
t[T1]1i,jn|pm,i,j|{vi,t+1(s)2+vit(s)2}{εj,t+1(s)2+εjt(s)2}\displaystyle\leq\sum_{t\in[T-1]}\sum_{1\leq i,j\leq n}|p_{m,i,j}|\cdot\{||v_{i,t+1}(s)||_{2}+||v_{it}(s)||_{2}\}\{||\varepsilon_{j,t+1}(s)||_{2}+||\varepsilon_{jt}(s)||_{2}\} (B.21)
n(T1)Kπ\displaystyle\lesssim n(T-1)K^{-\pi} (B.22)

where the last inequality follows from (B.17) and Assumptions 3.3(ii) and 3.5(i). Combining these results gives the desired result. ∎


Denote the population GMM objective function as follows:

𝒬nT(θ)𝔼[g¯nT(θ)]ΩnT𝔼[g¯nT(θ)]\displaystyle\mathcal{Q}^{*}_{nT}(\theta)\coloneqq\mathbb{E}[\overline{g}_{nT}(\theta)]^{\top}\Omega_{nT}\mathbb{E}[\overline{g}_{nT}(\theta)] (B.23)
Lemma B.2.

Suppose that Assumptions 2.1(i), 3.2, 3.3(ii), 3.4, 3.5, 3.6, and 3.7 hold. In addition, assume that K(12π)/20K^{(1-2\pi)/2}\to 0 as nTnT\to\infty. Then, for any θΘK\theta\in\Theta_{K} and e>0e>0 such that θθ0e\left\|\theta-\theta_{0}\right\|\geq e, there exists a constant ce>0c_{e}>0 such that 𝒬nT(θ)𝒬nT(θ0)>ce\mathcal{Q}^{*}_{nT}(\theta)-\mathcal{Q}^{*}_{nT}(\theta_{0})>c_{e} for all sufficiently large nTnT.

Proof.

Decompose

𝒬nT(θ)𝒬nT(θ0)\displaystyle\mathcal{Q}^{*}_{nT}(\theta)-\mathcal{Q}^{*}_{nT}(\theta_{0}) =(𝔼[g¯nT(θ)]𝔼[g¯nT(θ0)])ΩnT(𝔼[g¯nT(θ)]𝔼[g¯nT(θ0)])AnT(θ)\displaystyle=\underbracket{\left(\mathbb{E}[\overline{g}_{nT}(\theta)]-\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right)^{\top}\Omega_{nT}\left(\mathbb{E}[\overline{g}_{nT}(\theta)]-\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right)}_{\eqqcolon A_{nT}(\theta)} (B.24)
+2(𝔼[g¯nT(θ)]𝔼[g¯nT(θ0)])ΩnT𝔼[g¯nT(θ0)]\displaystyle\quad+2\left(\mathbb{E}[\overline{g}_{nT}(\theta)]-\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right)^{\top}\Omega_{nT}\mathbb{E}[\overline{g}_{nT}(\theta_{0})] (B.25)

In view of

𝔼[g¯nT(θ)]𝔼[g¯nT(θ0)]=1Nl=1L(𝒁(sl)𝑫𝑫𝔼[𝑯(sl)]𝔼[(𝑯(sl)(θ0θ)+2𝑽(sl)+2𝓔(sl))𝑫P1𝑫𝑯(sl)]𝔼[(𝑯(sl)(θ0θ)+2𝑽(sl)+2𝓔(sl))𝑫PM𝑫𝑯(sl)])(θ0θ),\displaystyle\mathbb{E}[\overline{g}_{nT}(\theta)]-\mathbb{E}[\overline{g}_{nT}(\theta_{0})]=\frac{1}{N}\sum_{l=1}^{L}\left(\begin{array}[]{c}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{H}(s_{l})]\\ \mathbb{E}\left[(\bm{H}(s_{l})(\theta_{0}-\theta)+2\bm{V}(s_{l})+2\bm{\mathcal{E}}(s_{l}))^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{H}(s_{l})\right]\\ \vdots\\ \mathbb{E}\left[(\bm{H}(s_{l})(\theta_{0}-\theta)+2\bm{V}(s_{l})+2\bm{\mathcal{E}}(s_{l}))^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{H}(s_{l})\right]\end{array}\right)(\theta_{0}-\theta), (B.30)

we can see that A_{nT}(θ) is bounded from below by λ_min(Ω_{nT}) λ_min(Π_{nT}^⊤ Π_{nT}) ‖θ_0 − θ‖² ≥ c_1 e² for some c_1 > 0 and all sufficiently large nT, under Assumptions 3.5(ii) and 3.6. Further, the Cauchy-Schwarz inequality and Lemma B.1 give that

|(𝔼[g¯nT(θ)]𝔼[g¯nT(θ0)])ΩnT𝔼[g¯nT(θ0)]|\displaystyle\left|\left(\mathbb{E}[\overline{g}_{nT}(\theta)]-\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right)^{\top}\Omega_{nT}\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right| (AnT(θ))1/2(𝔼[g¯nT(θ0)]ΩnT𝔼[g¯nT(θ0)])1/2\displaystyle\leq(A_{nT}(\theta))^{1/2}\left(\mathbb{E}[\overline{g}_{nT}(\theta_{0})]^{\top}\Omega_{nT}\mathbb{E}[\overline{g}_{nT}(\theta_{0})]\right)^{1/2} (B.31)
c2(AnT(θ))1/2K(12π)/2\displaystyle\leq c_{2}(A_{nT}(\theta))^{1/2}K^{(1-2\pi)/2} (B.32)

Hence, since (A_{nT}(θ))^{1/2} is bounded away from zero and K^{(1−2π)/2} → 0, we have

𝒬nT(θ)𝒬nT(θ0)\displaystyle\mathcal{Q}^{*}_{nT}(\theta)-\mathcal{Q}^{*}_{nT}(\theta_{0}) AnT(θ)2c2(AnT(θ))1/2K(12π)/2\displaystyle\geq A_{nT}(\theta)-2c_{2}(A_{nT}(\theta))^{1/2}K^{(1-2\pi)/2} (B.33)
=(AnT(θ))1/2((AnT(θ))1/22c2K(12π)/2)>0\displaystyle=(A_{nT}(\theta))^{1/2}((A_{nT}(\theta))^{1/2}-2c_{2}K^{(1-2\pi)/2})>0 (B.34)

for all sufficiently large nTnT. This completes the proof. ∎

Lemma B.3.

Suppose that Assumptions 2.1, 3.1, 3.2, and 3.4 hold. Then, for any given s[0,1]s\in[0,1] and all t[T]t\in[T], {Yit(s):i𝒟n;n1}\{Y_{it}(s):i\in\mathcal{D}_{n};\;n\geq 1\} is uniformly and geometrically L2L^{2}-NED on {εit:i𝒟n;n1}\{\varepsilon_{it}:i\in\mathcal{D}_{n};\;n\geq 1\}.

Proof.

We prove the lemma in a similar manner to Jenish (2012) and Hoshino (2022). Recall that YtY_{t} is uniquely determined in n,2\mathcal{H}_{n,2} as Yt=(Id𝒜)1[Xtβ0+F0]+(Id𝒜)1tY_{t}=(\text{Id}-\mathcal{A})^{-1}[X_{t}\beta_{0}+F_{0}]+(\text{Id}-\mathcal{A})^{-1}\mathcal{E}_{t} under Assumption 2.1 for all t[T]t\in[T]. We denote

[ξ1t(),,ξnt()]=(Id𝒜)1[Xtβ0+F0]+(Id𝒜)1[]:n,2n,2\displaystyle[\xi_{1t}(\cdot),\ldots,\xi_{nt}(\cdot)]^{\top}=(\text{Id}-\mathcal{A})^{-1}[X_{t}\beta_{0}+F_{0}]+(\text{Id}-\mathcal{A})^{-1}[\cdot]:\mathcal{H}_{n,2}\to\mathcal{H}_{n,2} (B.35)

such that Yit=ξit(t)Y_{it}=\xi_{it}(\mathcal{E}_{t}) holds for each i[n]i\in[n].

Define

1,it(δ){εjt}j:Δ(i,j)δ,2,it(δ){εjt}j:Δ(i,j)>δ\displaystyle\mathcal{E}_{1,it}^{(\delta)}\coloneqq\{\varepsilon_{jt}\}_{j:\Delta(i,j)\leq\delta},\qquad\mathcal{E}_{2,it}^{(\delta)}\coloneqq\{\varepsilon_{jt}\}_{j:\Delta(i,j)>\delta} (B.36)

for some δ>0\delta>0. Since L2(0,1)L^{2}(0,1) is separable, both 1,it(δ)\mathcal{E}_{1,it}^{(\delta)} and 2,it(δ)\mathcal{E}_{2,it}^{(\delta)} are Polish space-valued random elements in (|{j:Δ(i,j)δ}|,2,||||,2)(\mathcal{H}_{|\{j:\Delta(i,j)\leq\delta\}|,2},||\cdot||_{\infty,2}) and (|{j:Δ(i,j)>δ}|,2,||||,2)(\mathcal{H}_{|\{j:\Delta(i,j)>\delta\}|,2},||\cdot||_{\infty,2}), respectively. Then, by Lemma 2.11 of Dudley and Philipp (1983) (see also Lemma A.1 of Jenish (2012)), a function χ\chi exists such that (1,it(δ),χ(U,1,it(δ)))(\mathcal{E}_{1,it}^{(\delta)},\chi(U,\mathcal{E}_{1,it}^{(\delta)})) has the same law as that of (1,it(δ),2,it(δ))(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)}), which is an appropriate permutation of \mathcal{E}, where UU is a random variable uniformly distributed on [0,1][0,1] and independent of 1,it(δ)\mathcal{E}_{1,it}^{(\delta)}.

Now, with a slight abuse of notation, we write

Yit\displaystyle Y_{it} =ξit(1,it(δ),2,it(δ))ξit(t)\displaystyle=\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})\equiv\xi_{it}(\mathcal{E}_{t}) (B.37)
Yit(δ)\displaystyle Y^{(\delta)}_{it} ξit(1,it(δ),χ(U,1,it(δ)))ξit(t(δ))\displaystyle\coloneqq\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(U,\mathcal{E}_{1,it}^{(\delta)}))\equiv\xi_{it}(\mathcal{E}_{t}^{(\delta)}) (B.38)

where t(δ)=(ε1t(δ),,εnt(δ))\mathcal{E}^{(\delta)}_{t}=(\varepsilon_{1t}^{(\delta)},\ldots,\varepsilon_{nt}^{(\delta)})^{\top}. To be specific,

Yit(δ)(s)\displaystyle Y_{it}^{(\delta)}(s) ={(Id𝒜)1[Xtβ0+F0+t(δ)](s)}i\displaystyle=\left\{(\text{Id}-\mathcal{A})^{-1}[X_{t}\beta_{0}+F_{0}+\mathcal{E}_{t}^{(\delta)}](s)\right\}_{i} (B.39)
=α0(s)j=1nwi,jA(Yjt(δ),s)+Xitβ0(s)+f0i(s)+εit(δ)(s).\displaystyle=\alpha_{0}(s)\sum_{j=1}^{n}w_{i,j}A(Y_{jt}^{(\delta)},s)+X_{it}^{\top}\beta_{0}(s)+f_{0i}(s)+\varepsilon_{it}^{(\delta)}(s). (B.40)

By construction, we have

𝔼[Yit(s)it(δ)]\displaystyle\mathbb{E}[Y_{it}(s)\mid\mathcal{F}_{it}(\delta)] =𝔼[ξit(1,it(δ),2,it(δ))(s)1,it(δ)]\displaystyle=\mathbb{E}\left[\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})(s)\mid\mathcal{E}_{1,it}^{(\delta)}\right]
=𝔼[ξit(1,it(δ),χ(U,1,it(δ)))(s)1,it(δ)]=𝔼[Yit(δ)(s)it(δ)],\displaystyle=\mathbb{E}\left[\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(U,\mathcal{E}_{1,it}^{(\delta)}))(s)\mid\mathcal{E}_{1,it}^{(\delta)}\right]=\mathbb{E}[Y^{(\delta)}_{it}(s)\mid\mathcal{F}_{it}(\delta)],

where it(δ)\mathcal{F}_{it}(\delta) is the σ\sigma-field generated by 1,it(δ)={εjt:Δ(i,j)δ}\mathcal{E}_{1,it}^{(\delta)}=\{\varepsilon_{jt}:\Delta(i,j)\leq\delta\}. Similarly, we have Yit(u)Yit(δ)(u)224Yit(u)22||Y_{it}(u)-Y^{(\delta)}_{it}(u)||^{2}_{2}\leq 4||Y_{it}(u)||_{2}^{2}.

Here, suppose that 0<δ<Δ¯0<\delta<\overline{\Delta}, where Δ¯\overline{\Delta} is as provided in Assumption 3.1(ii). Then, because at least ii’s own εit\varepsilon_{it} is included in 1,it(δ)\mathcal{E}_{1,it}^{(\delta)}, we have εitεit(δ)\varepsilon_{it}\equiv\varepsilon_{it}^{(\delta)}, and hence

Yit(s)Yit(δ)(s)=α0(s)j=1nwi,jA(YjtYjt(δ),s)\displaystyle Y_{it}(s)-Y^{(\delta)}_{it}(s)=\alpha_{0}(s)\sum_{j=1}^{n}w_{i,j}A(Y_{jt}-Y^{(\delta)}_{jt},s) (B.41)

holds. Thus, by Minkowski’s inequality and Assumptions 3.2(ii) and 3.4,

Yit(s)Yit(δ)(s)2\displaystyle\left\|Y_{it}(s)-Y^{(\delta)}_{it}(s)\right\|_{2} =α0(s)j=1nwi,jA(YjtYjt(δ),s)2\displaystyle=\left\|\alpha_{0}(s)\sum_{j=1}^{n}w_{i,j}A(Y_{jt}-Y^{(\delta)}_{jt},s)\right\|_{2}
|α0(s)|j=1n|wi,j|A(YjtYjt(δ),s)2\displaystyle\leq|\alpha_{0}(s)|\sum_{j=1}^{n}|w_{i,j}|\cdot\left\|A(Y_{jt}-Y^{(\delta)}_{jt},s)\right\|_{2}
|α0(s)|j=1n|wi,j|(01Yjt(u)Yjt(δ)(u)22ω2(u,s)du)1/2Cϱ,\displaystyle\leq|\alpha_{0}(s)|\sum_{j=1}^{n}|w_{i,j}|\cdot\left(\int_{0}^{1}\left\|Y_{jt}(u)-Y^{(\delta)}_{jt}(u)\right\|_{2}^{2}\>\omega_{2}(u,s)\text{d}u\right)^{1/2}\leq C\cdot\varrho,

where C ≔ 2 max_{i,t} ess sup_{u∈[0,1]} ‖Y_{it}(u)‖_2 and ϱ ≔ α̅_0 ‖W_n‖_∞. Similarly, when Δ̄ ≤ δ < 2Δ̄ holds, noting that under Assumption 3.1(ii) we have ε_{jt} ≡ ε_{jt}^{(δ)} for all j that are direct neighbors of i,

Yit(s)Yit(δ)(s)2α¯0j=1n|wi,j|A(YjtYjt(δ),s)2=α¯0j=1n|wi,j|k=1nwj,kA(α0()A(YktYkt(δ),),s)2α¯0j=1n|wi,j|k=1n|wj,k|A(α0()A(YktYkt(δ),),s)2α¯0j=1n|wi,j|k=1n|wj,k|(01α0(u)A(YktYkt(δ),u)22ω2(u,s)du)1/2α¯02j=1n|wi,j|k=1n|wj,k|(01A(YktYkt(δ),u)22ω2(u,s)du)1/2α¯02j=1n|wi,j|k=1n|wj,k|(0101Ykt(t)Ykt(δ)(t)22ω2(t,u)ω2(u,s)dtdu)1/2Cϱ2.\displaystyle\begin{split}\left\|Y_{it}(s)-Y^{(\delta)}_{it}(s)\right\|_{2}&\leq\overline{\alpha}_{0}\sum_{j=1}^{n}|w_{i,j}|\cdot\left\|A(Y_{jt}-Y^{(\delta)}_{jt},s)\right\|_{2}\\ &=\overline{\alpha}_{0}\sum_{j=1}^{n}|w_{i,j}|\cdot\left\|\sum_{k=1}^{n}w_{j,k}A(\alpha_{0}(\cdot)A(Y_{kt}-Y_{kt}^{(\delta)},\cdot),s)\right\|_{2}\\ &\leq\overline{\alpha}_{0}\sum_{j=1}^{n}|w_{i,j}|\sum_{k=1}^{n}|w_{j,k}|\cdot\left\|A(\alpha_{0}(\cdot)A(Y_{kt}-Y_{kt}^{(\delta)},\cdot),s)\right\|_{2}\\ &\leq\overline{\alpha}_{0}\sum_{j=1}^{n}|w_{i,j}|\sum_{k=1}^{n}|w_{j,k}|\cdot\left(\int_{0}^{1}\left\|\alpha_{0}(u)A(Y_{kt}-Y_{kt}^{(\delta)},u)\right\|_{2}^{2}\omega_{2}(u,s)\text{d}u\right)^{1/2}\\ &\leq\overline{\alpha}_{0}^{2}\sum_{j=1}^{n}|w_{i,j}|\sum_{k=1}^{n}|w_{j,k}|\cdot\left(\int_{0}^{1}\left\|A(Y_{kt}-Y_{kt}^{(\delta)},u)\right\|_{2}^{2}\omega_{2}(u,s)\text{d}u\right)^{1/2}\\ &\leq\overline{\alpha}_{0}^{2}\sum_{j=1}^{n}|w_{i,j}|\sum_{k=1}^{n}|w_{j,k}|\cdot\left(\int_{0}^{1}\int_{0}^{1}\left\|Y_{kt}(t)-Y^{(\delta)}_{kt}(t)\right\|_{2}^{2}\>\omega_{2}(t,u)\omega_{2}(u,s)\text{d}t\text{d}u\right)^{1/2}\leq C\cdot\varrho^{2}.\end{split}

Applying the same argument recursively, for mΔ¯δ<(m+1)Δ¯m\overline{\Delta}\leq\delta<(m+1)\overline{\Delta} such that εjtεjt(δ)\varepsilon_{jt}\equiv\varepsilon^{(\delta)}_{jt} for all jj’s in the mm-th order neighborhood of ii, we obtain

Yit(s)Yit(δ)(s)2Cϱδ/Δ¯+1.\displaystyle\left\|Y_{it}(s)-Y^{(\delta)}_{it}(s)\right\|_{2}\leq C\cdot\varrho^{\lfloor\delta/\overline{\Delta}\rfloor+1}. (B.42)

Finally, by Jensen’s inequality and (B.42),

Yit(s)𝔼[Yit(s)it(δ)]2=01[ξit(1,it(δ),2,it(δ))(s)ξit(1,it(δ),χ(u,1,it(δ)))(s)]du2{𝔼01|ξit(1,it(δ),2,it(δ))(s)ξit(1,it(δ),χ(u,1,it(δ)))(s)|2du}1/2={𝔼|ξit(1,it(δ),2,it(δ))(s)ξit(1,it(δ),χ(U,1,it(δ)))(s)|2}1/2=ξit(1,it(δ),2,it(δ))(s)ξit(1,it(δ),χ(U,1,it(δ)))(s)2=Yit(s)Yit(δ)(s)2Cϱδ/Δ¯+10\displaystyle\begin{split}\left\|Y_{it}(s)-\mathbb{E}[Y_{it}(s)\mid\mathcal{F}_{it}(\delta)]\right\|_{2}&=\left\|\int_{0}^{1}\left[\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})(s)-\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(u,\mathcal{E}_{1,it}^{(\delta)}))(s)\right]\text{d}u\right\|_{2}\\ &\leq\left\{\mathbb{E}\int_{0}^{1}\left|\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})(s)-\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(u,\mathcal{E}_{1,it}^{(\delta)}))(s)\right|^{2}\text{d}u\right\}^{1/2}\\ &=\left\{\mathbb{E}\left|\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})(s)-\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(U,\mathcal{E}_{1,it}^{(\delta)}))(s)\right|^{2}\right\}^{1/2}\\ &=\left\|\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\mathcal{E}_{2,it}^{(\delta)})(s)-\xi_{it}(\mathcal{E}_{1,it}^{(\delta)},\chi(U,\mathcal{E}_{1,it}^{(\delta)}))(s)\right\|_{2}\\ &=\left\|Y_{it}(s)-Y^{(\delta)}_{it}(s)\right\|_{2}\leq C\cdot\varrho^{\lfloor\delta/\overline{\Delta}\rfloor+1}\to 0\end{split} (B.43)

as δ\delta\to\infty by Assumption 2.1. This completes the proof. ∎

Lemma B.4.

Suppose that Assumptions 2.1, 3.1, 3.2, and 3.4 hold. Then, for any given s[0,1]s\in[0,1] and all t[T]t\in[T], {A(Y¯it,s):i𝒟n;n1}\{A(\overline{Y}_{it},s):i\in\mathcal{D}_{n};\;n\geq 1\} is uniformly and geometrically L2L^{2}-NED on {εit:i𝒟n;n1}\{\varepsilon_{it}:i\in\mathcal{D}_{n};\;n\geq 1\}.

Proof.

Note that jt((δ1)Δ¯)it(δΔ¯)\mathcal{F}_{jt}((\delta-1)\overline{\Delta})\subseteq\mathcal{F}_{it}(\delta\overline{\Delta}) for (i,j)(i,j) with Δ(i,j)Δ¯\Delta(i,j)\leq\overline{\Delta} and δ>1\delta>1. Thus, by Lemma B.3,

Y¯it(s)𝔼[Y¯it(s)it(δΔ¯)]2\displaystyle\left\|\overline{Y}_{it}(s)-\mathbb{E}[\overline{Y}_{it}(s)\mid\mathcal{F}_{it}(\delta\overline{\Delta})]\right\|_{2} j=1n|wi,j|Yjt(s)𝔼[Yjt(s)it(δΔ¯)]2\displaystyle\leq\sum_{j=1}^{n}|w_{i,j}|\left\|Y_{jt}(s)-\mathbb{E}[Y_{jt}(s)\mid\mathcal{F}_{it}(\delta\overline{\Delta})]\right\|_{2} (B.44)
j:Δ(i,j)Δ¯Yjt(s)𝔼[Yjt(s)jt((δ1)Δ¯)]2\displaystyle\lesssim\sum_{j:\Delta(i,j)\leq\overline{\Delta}}\left\|Y_{jt}(s)-\mathbb{E}[Y_{jt}(s)\mid\mathcal{F}_{jt}((\delta-1)\overline{\Delta})]\right\|_{2} (B.45)
ϱδ,\displaystyle\lesssim\varrho^{\lfloor\delta\rfloor}, (B.46)

which implies that {Y¯it(s)}\{\overline{Y}_{it}(s)\} is uniformly and geometrically L2L^{2}-NED. By Assumption 3.4,

A(Y¯it,s)𝔼[A(Y¯it,s)it(δ)]2\displaystyle\left\|A(\overline{Y}_{it},s)-\mathbb{E}[A(\overline{Y}_{it},s)\mid\mathcal{F}_{it}(\delta)]\right\|_{2} =A(Y¯it𝔼[Y¯itit(δ)],s)2\displaystyle=\left\|A(\overline{Y}_{it}-\mathbb{E}[\overline{Y}_{it}\mid\mathcal{F}_{it}(\delta)],s)\right\|_{2} (B.47)
(01Y¯it(u)𝔼[Y¯it(u)it(δ)]22ω2(u,s)du)1/2\displaystyle\leq\left(\int_{0}^{1}\left\|\overline{Y}_{it}(u)-\mathbb{E}[\overline{Y}_{it}(u)\mid\mathcal{F}_{it}(\delta)]\right\|_{2}^{2}\>\omega_{2}(u,s)\text{d}u\right)^{1/2} (B.48)
ϱδ/Δ¯.\displaystyle\lesssim\varrho^{\lfloor\delta/\overline{\Delta}\rfloor}. (B.49)

This proves the desired result. ∎

As a useful consequence of Lemmas B.3 and B.4, we have

eit(s;θ)𝔼[eit(s;θ)it(δ)]2\displaystyle\left\|e_{it}(s;\theta)-\mathbb{E}[e_{it}(s;\theta)\mid\mathcal{F}_{it}(\delta)]\right\|_{2} Yit(s)𝔼[Yit(s)it(δ)]2\displaystyle\leq\left\|Y_{it}(s)-\mathbb{E}[Y_{it}(s)\mid\mathcal{F}_{it}(\delta)]\right\|_{2} (B.50)
+|α(s;θ)|A(Y¯it,s)𝔼[A(Y¯it,s)it(δ)]2\displaystyle\quad+|\alpha(s;\theta)|\cdot\left\|A(\overline{Y}_{it},s)-\mathbb{E}[A(\overline{Y}_{it},s)\mid\mathcal{F}_{it}(\delta)]\right\|_{2} (B.51)
ϱδ/Δ¯\displaystyle\lesssim\varrho^{\lfloor\delta/\overline{\Delta}\rfloor} (B.52)

uniformly in s[0,1]s\in[0,1], θΘK\theta\in\Theta_{K}, and (i,t)(i,t); that is, {eit(s;θ)}\{e_{it}(s;\theta)\} is uniformly and geometrically L2L^{2}-NED.

Lemma B.5.

Suppose that Assumption 3.3(i) holds. Let {ξit:i𝒟n;n1}\{\xi_{it}:i\in\mathcal{D}_{n};\;n\geq 1\} be a geometrically L2L^{2}-NED random field on {εit:i𝒟n;n1}\{\varepsilon_{it}:i\in\mathcal{D}_{n};\;n\geq 1\} for all t[T]t\in[T], independent of {εit:i𝒟n;n1}\{\varepsilon_{it^{\prime}}:i\in\mathcal{D}_{n};\;n\geq 1\} for ttt^{\prime}\neq t. Denote ξitξi,t+1ξit\vec{\xi}_{it}\coloneqq\xi_{i,t+1}-\xi_{it} and Cξmaxi,tξit2C_{\xi}\coloneqq\max_{i,t}||\xi_{it}||_{2}. Then,

  • (i)

    |Cov(ξit,ξjt)|Cξ2ρ(Δ(i,j)/3)\left|\text{Cov}\left(\vec{\xi}_{it},\vec{\xi}_{jt}\right)\right|\lesssim C_{\xi}^{2}\rho(\Delta(i,j)/3) for all t[T]t\in[T] with some geometric NED coefficient ρ\rho;

  • (ii)

    |Cov(ξi,t+1,ξjt)|Cξ2ρ(Δ(i,j)/3)\left|\text{Cov}\left(\vec{\xi}_{i,t+1},\vec{\xi}_{jt}\right)\right|\lesssim C_{\xi}^{2}\rho(\Delta(i,j)/3) for all t[T1]t\in[T-1] with some geometric NED coefficient ρ\rho.

Proof.

Since the proofs are similar, we only prove (ii). Decompose ξit=ξ1,it(δ)+ξ2,it(δ)\vec{\xi}_{it}=\vec{\xi}_{1,it}^{(\delta)}+\vec{\xi}_{2,it}^{(\delta)}, where

ξ1,it(δ)𝔼[ξitit+(δ)],andξ2,it(δ)ξit𝔼[ξitit+(δ)],\displaystyle\vec{\xi}_{1,it}^{(\delta)}\coloneqq\mathbb{E}\left[\vec{\xi}_{it}\mid\mathcal{F}^{+}_{it}(\delta)\right],\;\;\text{and}\;\;\vec{\xi}_{2,it}^{(\delta)}\coloneqq\vec{\xi}_{it}-\mathbb{E}\left[\vec{\xi}_{it}\mid\mathcal{F}^{+}_{it}(\delta)\right], (B.53)

where ℱ_{it}^+(δ) is the σ-field generated by {(ε_{i′t}, ε_{i′,t+1}) : Δ(i, i′) ≤ δ}. Since ε_{i′t} and ε_{i′,t+1} are assumed to be independent, ℱ_{it}^+(δ) = ℱ_{it}(δ) ∨ ℱ_{i,t+1}(δ) holds. Then, for each pair ξ⃗_{i,t+1} and ξ⃗_{jt}, denoting δ_{ij} ≔ Δ(i, j)/3,

|Cov(ξi,t+1,ξjt)|\displaystyle\left|\mathrm{Cov}\left(\vec{\xi}_{i,t+1},\vec{\xi}_{jt}\right)\right| =|Cov(ξ1,i,t+1(δij)+ξ2,i,t+1(δij),ξ1,jt(δij)+ξ2,jt(δij))|\displaystyle=\left|\mathrm{Cov}\left(\vec{\xi}_{1,i,t+1}^{(\delta_{ij})}+\vec{\xi}_{2,i,t+1}^{(\delta_{ij})},\vec{\xi}_{1,jt}^{(\delta_{ij})}+\vec{\xi}_{2,jt}^{(\delta_{ij})}\right)\right| (B.54)
|Cov(ξ1,i,t+1(δij),ξ1,jt(δij))|+|Cov(ξ1,i,t+1(δij),ξ2,jt(δij))|\displaystyle\leq\left|\mathrm{Cov}\left(\vec{\xi}_{1,i,t+1}^{(\delta_{ij})},\vec{\xi}_{1,jt}^{(\delta_{ij})}\right)\right|+\left|\mathrm{Cov}\left(\vec{\xi}_{1,i,t+1}^{(\delta_{ij})},\vec{\xi}_{2,jt}^{(\delta_{ij})}\right)\right| (B.55)
+|Cov(ξ2,i,t+1(δij),ξ1,jt(δij))|+|Cov(ξ2,i,t+1(δij),ξ2,jt(δij))|.\displaystyle\quad+\left|\mathrm{Cov}\left(\vec{\xi}_{2,i,t+1}^{(\delta_{ij})},\vec{\xi}_{1,jt}^{(\delta_{ij})}\right)\right|+\left|\mathrm{Cov}\left(\vec{\xi}_{2,i,t+1}^{(\delta_{ij})},\vec{\xi}_{2,jt}^{(\delta_{ij})}\right)\right|. (B.56)

The first term on the right-hand side is zero by Assumption 3.3(i). Note that, by Jensen’s and triangle inequalities, ξ1,it(δij)2ξit2ξi,t+12+ξit22Cξ||\vec{\xi}_{1,it}^{(\delta_{ij})}||_{2}\leq||\vec{\xi}_{it}||_{2}\leq||\xi_{i,t+1}||_{2}+||\xi_{it}||_{2}\leq 2C_{\xi}. In addition, ξ2,it(δij)22ξit24Cξ||\vec{\xi}_{2,it}^{(\delta_{ij})}||_{2}\leq 2||\vec{\xi}_{it}||_{2}\leq 4C_{\xi}. Then, since {ξit}\{\xi_{it}\} is assumed to be L2L^{2}-NED on {εit}\{\varepsilon_{it}\} at each tt, it holds that

ξ2,it(δij)2\displaystyle\left\|\vec{\xi}_{2,it}^{(\delta_{ij})}\right\|_{2} =ξit𝔼[ξitit+(δij)]2\displaystyle=\left\|\vec{\xi}_{it}-\mathbb{E}\left[\vec{\xi}_{it}\mid\mathcal{F}^{+}_{it}(\delta_{ij})\right]\right\|_{2} (B.57)
ξi,t+1𝔼[ξi,t+1i,t+1(δij)]2+ξit𝔼[ξitit(δij)]24Cξρ(δij).\displaystyle\leq\left\|\xi_{i,t+1}-\mathbb{E}\left[\xi_{i,t+1}\mid\mathcal{F}_{i,t+1}(\delta_{ij})\right]\right\|_{2}+\left\|\xi_{it}-\mathbb{E}\left[\xi_{it}\mid\mathcal{F}_{it}(\delta_{ij})\right]\right\|_{2}\leq 4C_{\xi}\rho(\delta_{ij}). (B.58)

Hence, the Cauchy–Schwarz inequality gives

|Cov(ξ1,i,t+1(δij),ξ2,jt(δij))|\displaystyle\left|\mathrm{Cov}\left(\vec{\xi}_{1,i,t+1}^{(\delta_{ij})},\vec{\xi}_{2,jt}^{(\delta_{ij})}\right)\right| 4ξ1,i,t+1(δij)2ξ2,jt(δij)232Cξ2ρ(δij).\displaystyle\leq 4\left\|\vec{\xi}_{1,i,t+1}^{(\delta_{ij})}\right\|_{2}\left\|\vec{\xi}_{2,jt}^{(\delta_{ij})}\right\|_{2}\leq 32C_{\xi}^{2}\rho(\delta_{ij}). (B.59)

The same inequality applies to |Cov(ξ2,i,t+1(δij),ξ1,jt(δij))|\left|\mathrm{Cov}\left(\vec{\xi}_{2,i,t+1}^{(\delta_{ij})},\vec{\xi}_{1,jt}^{(\delta_{ij})}\right)\right|. Furthermore,

|Cov(ξ2,i,t+1(δij),ξ2,jt(δij))|\displaystyle\left|\mathrm{Cov}\left(\vec{\xi}_{2,i,t+1}^{(\delta_{ij})},\vec{\xi}_{2,jt}^{(\delta_{ij})}\right)\right| 4ξ2,i,t+1(δij)2ξ2,jt(δij)264Cξ2ρ(δij).\displaystyle\leq 4\left\|\vec{\xi}_{2,i,t+1}^{(\delta_{ij})}\right\|_{2}\left\|\vec{\xi}_{2,jt}^{(\delta_{ij})}\right\|_{2}\leq 64C_{\xi}^{2}\rho(\delta_{ij}). (B.60)

This completes the proof. ∎
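Remark (numerical illustration). The geometric covariance decay established above can be checked on a concrete example. The following Python sketch, which is purely illustrative and not part of the formal argument, simulates a spatial moving-average field with geometrically decaying weights (a geometrically L2L^{2}-NED functional of an iid innovation field; the grid size, radius, and weight are arbitrary choices of ours) and reports how the empirical covariance decays with the distance Δ(i,j)\Delta(i,j):

import numpy as np

rng = np.random.default_rng(0)
G, R, w, reps = 40, 5, 0.4, 500   # torus side, MA radius, geometric weight, replications
offsets = [(dx, dy) for dx in range(-R, R + 1) for dy in range(-R, R + 1)]

def xi_field(eps):
    # spatial MA(R) with weights w^{max(|dx|,|dy|)}: an NED functional of eps
    out = np.zeros_like(eps)
    for dx, dy in offsets:
        out += w ** max(abs(dx), abs(dy)) * np.roll(np.roll(eps, dx, 0), dy, 1)
    return out

samples = {d: [] for d in range(12)}
for _ in range(reps):
    xi = xi_field(rng.standard_normal((G, G)))
    for d in samples:
        samples[d].append((xi[0, 0], xi[d, 0]))

for d, ab in samples.items():
    a = np.asarray(ab)
    print(f"Delta = {d:2d}:  empirical cov = {np.cov(a[:, 0], a[:, 1])[0, 1]: .4f}")
# the covariances decay geometrically in Delta (and vanish beyond 2R),
# consistent with the bound in part (i)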

Lemma B.6.

Suppose that Assumptions 2.1, 3.1–3.4, 3.5(i), and 3.7 hold. For all m[M]m\in[M] and θΘK\theta\in\Theta_{K},

  • (i)

    l=1L𝒁(sl)𝑫𝑫(𝑯(sl)𝔼[𝑯(sl)])(θ0θ)/NpK/nT\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])(\theta_{0}-\theta)/N\right\|\lesssim_{p}\sqrt{K}/\sqrt{nT}

  • (ii)

    l=1L𝒁(sl)𝑫𝑫(𝑯(sl)𝔼[𝑯(sl)])/NpK/nT\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])/N\right\|\lesssim_{p}K/\sqrt{nT}

  • (iii)

    l=1L𝒁(sl)𝑫𝑫𝑽(sl)/NpK(12π)/2\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s_{l})/N\right\|\lesssim_{p}K^{(1-2\pi)/2}, l=1L𝒁(sl)𝑫𝑫𝔼[𝑽(sl)]/NK(12π)/2\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{V}(s_{l})]/N\right\|\lesssim K^{(1-2\pi)/2}

  • (iv)

    l=1L𝒁(sl)𝑫𝑫𝓔(sl)/Np1/nT\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})/N\right\|\lesssim_{p}1/\sqrt{nT}

  • (v)

    |l=1L{𝑬(sl;θ)𝑫Pm𝑫𝑬(sl;θ)𝔼[𝑬(sl;θ)𝑫Pm𝑫𝑬(sl;θ)]}/N|p1/nT\left|\sum_{l=1}^{L}\left\{\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)-\mathbb{E}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)]\right\}/N\right|\lesssim_{p}1/\sqrt{nT}

  • (vi)

    |l=1L𝑽(sl)𝑫Pm𝑫𝑽(sl)/N|pK2π\left|\sum_{l=1}^{L}\bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{V}(s_{l})/N\right|\lesssim_{p}K^{-2\pi}

  • (vii)

    |l=1L𝑽(sl)𝑫Pm𝑫𝓔(sl)/N|pKπ\left|\sum_{l=1}^{L}\bm{V}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s_{l})/N\right|\lesssim_{p}K^{-\pi}

  • (viii)

    |l=1L𝓔(sl)𝑫Pm𝑫𝓔(sl)/N|p1/nT\left|\sum_{l=1}^{L}\bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s_{l})/N\right|\lesssim_{p}1/\sqrt{nT}.

Proof.

Below, for a generic variable x indexed by ii and tt, we denote xit=xi,t+1xit\vec{\text{x}}_{it}=\text{x}_{i,t+1}-\text{x}_{it}. In addition, we write ait(s)A(Y¯it,s)a_{it}(s)\coloneqq A(\overline{Y}_{it},s).


(i) Observe that for each sls_{l},

𝒁(sl)𝑫𝑫(𝑯(sl)𝔼[𝑯(sl)])(θ0θ)\displaystyle\left\|\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])(\theta_{0}-\theta)\right\| =i=1nt=1T1BitϕK(sl)[ait(sl)𝔼(ait(sl))]α(sl;θ0θ).\displaystyle=\left\|\sum_{i=1}^{n}\sum_{t=1}^{T-1}\vec{B}_{it}\otimes\phi^{K}(s_{l})[\vec{a}_{it}(s_{l})-\mathbb{E}(\vec{a}_{it}(s_{l}))]\alpha(s_{l};\theta_{0}-\theta)\right\|. (B.61)

The variance of the first element of 𝒁(sl)𝑫𝑫(𝑯(sl)𝔼[𝑯(sl)])(θ0θ)\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])(\theta_{0}-\theta), which is representative of the other elements, is given as

Var[{𝒁𝑫𝑫(𝑯𝔼[𝑯])(θ0θ)}1]\displaystyle\text{Var}[\{\bm{Z}^{\top}\bm{D}^{\top}\bm{D}(\bm{H}-\mathbb{E}[\bm{H}])(\theta_{0}-\theta)\}_{1}] =𝔼(i=1nt=1T1Qit1ϕ1(sl)(ait𝔼[ait])α(sl;θ0θ))2\displaystyle=\mathbb{E}\left(\sum_{i=1}^{n}\sum_{t=1}^{T-1}\vec{Q}^{1}_{it}\phi_{1}(s_{l})(\vec{a}_{it}-\mathbb{E}[\vec{a}_{it}])\alpha(s_{l};\theta_{0}-\theta)\right)^{2} (B.62)
t=1T1t=1T1i=1ni=1n|Cov(ait,ait)|\displaystyle\lesssim\sum_{t=1}^{T-1}\sum_{t^{\prime}=1}^{T-1}\sum_{i=1}^{n}\sum_{i^{\prime}=1}^{n}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t^{\prime}})\right| (B.63)
=t=1T1i=1nVar(ait)+t=1T1i=1nii|Cov(ait,ait)|\displaystyle=\sum_{t=1}^{T-1}\sum_{i=1}^{n}\text{Var}(\vec{a}_{it})+\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t})\right| (B.64)
+t=1T1tti=1n|Cov(ait,ait)|+t=1T1tti=1nii|Cov(ait,ait)|.\displaystyle\quad+\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{i=1}^{n}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{it^{\prime}})\right|+\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t^{\prime}})\right|. (B.65)

Here, the dependence on sls_{l} is occasionally suppressed for notational simplicity. First, from Assumptions 2.1(i), 3.2(ii), and 3.4, we can easily see that t=1T1i=1nVar(ait)nT\sum_{t=1}^{T-1}\sum_{i=1}^{n}\text{Var}(\vec{a}_{it})\lesssim nT. Second, by Lemmas B.4 and B.5(i), there exists a geometric NED coefficient ρ\rho that satisfies

t=1T1i=1nii|Cov(ait,ait)|\displaystyle\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t})\right| t=1T1i=1niiρ(Δ(i,i)/3)\displaystyle\lesssim\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\rho(\Delta(i,i^{\prime})/3) (B.66)
=t=1T1i=1nm=1i:Δ(i,i)[m,m+1)ρ(Δ(i,i)/3)\displaystyle=\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{m=1}^{\infty}\sum_{i^{\prime}:\Delta(i,i^{\prime})\in[m,m+1)}\rho(\Delta(i,i^{\prime})/3) (B.67)
t=1T1i=1nm=1md1ρ(m)nT,\displaystyle\lesssim\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{m=1}^{\infty}m^{d-1}\rho(m)\lesssim nT, (B.68)

where the second inequality is from Lemma A.1(iii) of Jenish and Prucha (2009), and the final claim follows from the geometric NED property. Third, since ait\vec{a}_{it} and ait\vec{a}_{it^{\prime}} are independent if |tt|2|t-t^{\prime}|\geq 2, it holds that

t=1T1tti=1n|Cov(ait,ait)|\displaystyle\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{i=1}^{n}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{it^{\prime}})\right| =t{t1,t+1}t=1T1i=1n|Cov(ait,ait)|nT.\displaystyle=\sum_{t^{\prime}\in\{t-1,t+1\}}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{it^{\prime}})\right|\lesssim nT. (B.69)

Finally, noting that t=1T1tti=1nii|Cov(ait,ait)|=t{t1,t+1}t=1T1i=1nii|Cov(ait,ait)|\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t^{\prime}})\right|=\sum_{t^{\prime}\in\{t-1,t+1\}}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\sum_{i^{\prime}\neq i}\left|\text{Cov}(\vec{a}_{it},\vec{a}_{i^{\prime}t^{\prime}})\right|, Lemma B.5(ii) implies that this term is also of order nTnT.

Combining the above results shows that 𝔼𝒁(sl)𝑫𝑫(𝑯(sl)𝔼[𝑯(sl)])(θ0θ)/(n(T1))2K/(nT)\mathbb{E}\left\|\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])(\theta_{0}-\theta)/(n(T-1))\right\|^{2}\lesssim K/(nT) for each sls_{l}, and the claim follows by applying Markov's inequality and the triangle inequality.
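As a side note, the counting step in (B.66)–(B.68) is purely geometric: the number of sites at a distance in [m,m+1)[m,m+1) from a given ii grows at most like md1m^{d-1}, so a geometric ρ\rho makes the inner sum converge and the double sum grow linearly in nn. A short, deterministic Python check of this fact, with an arbitrary illustrative coefficient ρ(m)=0.5m\rho(m)=0.5^{m} and d=2d=2:

import numpy as np

rho = lambda m: 0.5 ** m                     # illustrative geometric NED coefficient

for side in (10, 20, 40):
    n = side * side                          # sites of a side-by-side grid (d = 2)
    xs, ys = np.meshgrid(np.arange(side), np.arange(side))
    pts = np.column_stack([xs.ravel(), ys.ravel()])
    # Chebyshev distances Delta(i, i') between all pairs of sites
    delta = np.max(np.abs(pts[:, None, :] - pts[None, :, :]), axis=2)
    total = rho(delta[delta > 0] / 3).sum()  # sum_i sum_{i' != i} rho(Delta(i,i')/3)
    print(f"n = {n:4d}:  double sum = {total:9.1f},  per site = {total / n:.3f}")
# 'per site' stabilizes as n grows, so the double sum is O(n), as claimed in (B.68)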


(ii) Analogous to the proof of (i).


(iii) Observe that

l=1L𝒁(sl)𝑫𝑫𝑽(sl)/N\displaystyle\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s_{l})/N\right\| =l=1Lt=1T1i=1nBitϕK(sl)vit(sl)/N\displaystyle=\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\phi^{K}(s_{l})\vec{v}_{it}(s_{l})/N\right\| (B.70)
1Nl=1Lt=1T1i=1nvit(sl)ϕK(sl).\displaystyle\lesssim\frac{1}{N}\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\left\|\vec{v}_{it}(s_{l})\phi^{K}(s_{l})\right\|. (B.71)

Then, the result follows from 𝔼vit(sl)ϕK(sl)𝔼|vit(sl)|sups[0,1]ϕK(s)K(12π)/2\mathbb{E}||\vec{v}_{it}(s_{l})\phi^{K}(s_{l})||\leq\mathbb{E}|\vec{v}_{it}(s_{l})|\sup_{s\in[0,1]}||\phi^{K}(s)||\lesssim K^{(1-2\pi)/2}. The second part can be proved analogously.


(iv) By the triangle inequality,

l=1L𝒁(sl)𝑫𝑫𝓔(sl)/N\displaystyle\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})/N\right\| =l=1Lt=1T1i=1nBitϕK(sl)εit(sl)/N\displaystyle=\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\phi^{K}(s_{l})\vec{\varepsilon}_{it}(s_{l})/N\right\| (B.72)
l=1Lt=1T1i=1nBitεi,t+1(sl)ϕK(sl)/N+l=1Lt=1T1i=1nBitεit(sl)ϕK(sl)/N.\displaystyle\leq\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\varepsilon_{i,t+1}(s_{l})\phi^{K}(s_{l})/N\right\|+\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\varepsilon_{it}(s_{l})\phi^{K}(s_{l})/N\right\|. (B.73)

Further, by Assumptions 3.3(i) and (iii),

𝔼l=1Lt=1T1i=1nBitεit(sl)ϕK(sl)/N2\displaystyle\mathbb{E}\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\varepsilon_{it}(s_{l})\phi^{K}(s_{l})/N\right\|^{2} (B.74)
=1n2(T1)2t=1T1i=1ntrace{BitBit1L2l=1Ll=1LΓit(sl,sl)ϕK(sl)ϕK(sl)}\displaystyle=\frac{1}{n^{2}(T-1)^{2}}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\text{trace}\left\{\vec{B}_{it}\vec{B}_{it}^{\top}\otimes\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{it}(s_{l},s_{l^{\prime}})\phi^{K}(s_{l})\phi^{K}(s_{l^{\prime}})^{\top}\right\} (B.75)
=1n2(T1)2t=1T1i=1ntrace{BitBit}trace{1L2l=1Ll=1LΓit(sl,sl)ϕK(sl)ϕK(sl)}1/(nT).\displaystyle=\frac{1}{n^{2}(T-1)^{2}}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\text{trace}\left\{\vec{B}_{it}\vec{B}_{it}^{\top}\right\}\text{trace}\left\{\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{it}(s_{l},s_{l^{\prime}})\phi^{K}(s_{l})\phi^{K}(s_{l^{\prime}})^{\top}\right\}\lesssim 1/(nT). (B.76)

Repeating the same calculation for the other term, the result follows from Markov’s inequality.


(v) Observe that

𝑬(sl;θ)𝑫Pm𝑫𝑬(sl;θ)𝔼[𝑬(sl;θ)𝑫Pm𝑫𝑬(sl;θ)]\displaystyle\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)-\mathbb{E}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)] (B.77)
=t=1T11i,jnpm,i,j(eit(sl;θ)ejt(sl;θ)𝔼[eit(sl;θ)ejt(sl;θ)]).\displaystyle=\sum_{t=1}^{T-1}\sum_{1\leq i,j\leq n}p_{m,i,j}\left(\vec{e}_{it}(s_{l};\theta)\vec{e}_{jt}(s_{l};\theta)-\mathbb{E}[\vec{e}_{it}(s_{l};\theta)\vec{e}_{jt}(s_{l};\theta)]\right). (B.78)

Here, let 𝒆m,jt(sl;θ)i=1npm,i,jeit(sl;θ)\bm{e}_{m,jt}(s_{l};\theta)\coloneqq\sum_{i=1}^{n}p_{m,i,j}e_{it}(s_{l};\theta) and recall that there is a constant Δ¯m\overline{\Delta}_{m} such that pm,i,j=0p_{m,i,j}=0 if Δ(i,j)>Δ¯m\Delta(i,j)>\overline{\Delta}_{m}. Then, noting that it((δ1)Δ¯m)jt(δΔ¯m)\mathcal{F}_{it}((\delta-1)\overline{\Delta}_{m})\subseteq\mathcal{F}_{jt}(\delta\overline{\Delta}_{m}) for (i,j)(i,j) with Δ(i,j)Δ¯m\Delta(i,j)\leq\overline{\Delta}_{m} and δ>1\delta>1,

𝒆m,jt(sl;θ)𝔼[𝒆m,jt(sl;θ)jt(δΔ¯m)]2\displaystyle\left\|\bm{e}_{m,jt}(s_{l};\theta)-\mathbb{E}[\bm{e}_{m,jt}(s_{l};\theta)\mid\mathcal{F}_{jt}(\delta\overline{\Delta}_{m})]\right\|_{2} i=1n|pm,i,j|eit(sl;θ)𝔼[eit(sl;θ)jt(δΔ¯m)]2\displaystyle\leq\sum_{i=1}^{n}|p_{m,i,j}|\left\|e_{it}(s_{l};\theta)-\mathbb{E}[e_{it}(s_{l};\theta)\mid\mathcal{F}_{jt}(\delta\overline{\Delta}_{m})]\right\|_{2} (B.79)
i:Δ(i,j)Δ¯meit(sl;θ)𝔼[eit(sl;θ)it((δ1)Δ¯m)]2\displaystyle\lesssim\sum_{i:\Delta(i,j)\leq\overline{\Delta}_{m}}\left\|e_{it}(s_{l};\theta)-\mathbb{E}[e_{it}(s_{l};\theta)\mid\mathcal{F}_{it}((\delta-1)\overline{\Delta}_{m})]\right\|_{2} (B.80)
ϱ(δ1)Δ¯m/Δ¯,\displaystyle\lesssim\varrho^{\lfloor(\delta-1)\overline{\Delta}_{m}/\overline{\Delta}\rfloor}, (B.81)

which implies that {𝒆m,jt(sl;θ)}\{\bm{e}_{m,jt}(s_{l};\theta)\} is uniformly and geometrically L2L^{2}-NED, for all l[L]l\in[L], m[M]m\in[M], and θΘK\theta\in\Theta_{K}.

Now, suppressing the dependence on both sls_{l} and θ\theta,

Var[𝑬𝑫Pm𝑫𝑬]\displaystyle\text{Var}[\bm{E}^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}] =𝔼(t=1T11i,jnpm,i,j{eitejt𝔼[eitejt]})2\displaystyle=\mathbb{E}\left(\sum_{t=1}^{T-1}\sum_{1\leq i,j\leq n}p_{m,i,j}\{\vec{e}_{it}\vec{e}_{jt}-\mathbb{E}[\vec{e}_{it}\vec{e}_{jt}]\}\right)^{2} (B.82)
=𝔼(t=1T1j=1n{𝒆m,jtejt𝔼[𝒆m,jtejt]})2\displaystyle=\mathbb{E}\left(\sum_{t=1}^{T-1}\sum_{j=1}^{n}\{\vec{\bm{e}}_{m,jt}\vec{e}_{jt}-\mathbb{E}[\vec{\bm{e}}_{m,jt}\vec{e}_{jt}]\}\right)^{2} (B.83)
t=1T1t=1T1j=1nj=1n|Cov(𝒆m,jtejt,𝒆m,jtejt)|\displaystyle\leq\sum_{t=1}^{T-1}\sum_{t^{\prime}=1}^{T-1}\sum_{j=1}^{n}\sum_{j^{\prime}=1}^{n}\left|\text{Cov}(\vec{\bm{e}}_{m,jt}\vec{e}_{jt},\vec{\bm{e}}_{m,j^{\prime}t^{\prime}}\vec{e}_{j^{\prime}t^{\prime}})\right| (B.84)
=t=1T1j=1nVar(𝒆m,jtejt)+t=1T1j=1njj|Cov(𝒆m,jtejt,𝒆m,jtejt)|\displaystyle=\sum_{t=1}^{T-1}\sum_{j=1}^{n}\text{Var}(\vec{\bm{e}}_{m,jt}\vec{e}_{jt})+\sum_{t=1}^{T-1}\sum_{j=1}^{n}\sum_{j^{\prime}\neq j}\left|\text{Cov}(\vec{\bm{e}}_{m,jt}\vec{e}_{jt},\vec{\bm{e}}_{m,j^{\prime}t}\vec{e}_{j^{\prime}t})\right| (B.85)
+t=1T1ttj=1n|Cov(𝒆m,jtejt,𝒆m,jtejt)|+t=1T1ttj=1njj|Cov(𝒆m,jtejt,𝒆m,jtejt)|.\displaystyle\quad+\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{j=1}^{n}\left|\text{Cov}(\vec{\bm{e}}_{m,jt}\vec{e}_{jt},\vec{\bm{e}}_{m,jt^{\prime}}\vec{e}_{jt^{\prime}})\right|+\sum_{t=1}^{T-1}\sum_{t^{\prime}\neq t}\sum_{j=1}^{n}\sum_{j^{\prime}\neq j}\left|\text{Cov}(\vec{\bm{e}}_{m,jt}\vec{e}_{jt},\vec{\bm{e}}_{m,j^{\prime}t^{\prime}}\vec{e}_{j^{\prime}t^{\prime}})\right|. (B.86)

As we have seen in (A.5), we have ejtp<||e_{jt}||_{p}<\infty and 𝒆m,jtp<||\bm{e}_{m,jt}||_{p}<\infty for some p>4p>4. This allows us to use Lemma A.1 of Xu and Lee (2015) (see also Corollary 4.3(b) of Gallant and White (1988)) to show that {𝒆m,jtejt}\{\bm{e}_{m,jt}e_{jt}\} is uniformly and geometrically L2L^{2}-NED. Then, following the same argument as in the proof of (i), we can show that Var[𝑬(sl;θ)𝑫Pm𝑫𝑬(sl;θ)]nT\text{Var}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)]\lesssim nT for each sls_{l}, which gives the desired result by the triangle inequality.


(vi), (vii) These can be proved in a similar manner to the proof of Lemma B.1.


(viii) For each sls_{l}, |𝓔(sl)𝑫Pm𝑫𝓔(sl)/(n(T1))|p1/nT|\bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{\mathcal{E}}(s_{l})/(n(T-1))|\lesssim_{p}1/\sqrt{nT} holds under Assumptions 3.3(i) and (ii), as in Lemma 9 in Yu et al. (2008) and Lemma 1 in Lee and Yu (2014). Then, the result is straightforward. ∎

Lemma B.7.

Suppose that Assumptions 2.1(i), 3.2, 3.4, and 3.5(i) hold. Then, supθΘK𝔼[g¯nT(θ)]K\sup_{\theta\in\Theta_{K}}\left\|\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|\lesssim\sqrt{K}.

Proof.

Observe that

𝔼[g¯nT(θ)]\displaystyle\left\|\mathbb{E}[\overline{g}_{nT}(\theta)]\right\| l=1L𝒁(sl)𝑫𝑫𝔼[𝑬(sl;θ)]/N+m=1M|l=1L𝔼[𝑬(sl;θ)𝑫Pm𝑫𝑬(sl;θ)]/N|.\displaystyle\leq\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{E}(s_{l};\theta)]/N\right\|+\sum_{m=1}^{M}\left|\sum_{l=1}^{L}\mathbb{E}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)]/N\right|. (B.87)

For the first term,

l=1L𝒁(sl)𝑫𝑫𝔼[𝑬(sl;θ)]/N\displaystyle\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{E}(s_{l};\theta)]/N\right\| =l=1Lt=1T1i=1nBit𝔼[eit(sl;θ)]ϕK(sl)/N\displaystyle=\left\|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\vec{B}_{it}\otimes\mathbb{E}[\vec{e}_{it}(s_{l};\theta)]\phi^{K}(s_{l})/N\right\| (B.88)
1Nl=1Lt=1T1i=1n𝔼[eit(sl;θ)]ϕK(sl)\displaystyle\lesssim\frac{1}{N}\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{i=1}^{n}\left\|\mathbb{E}[\vec{e}_{it}(s_{l};\theta)]\phi^{K}(s_{l})\right\| (B.89)
K\displaystyle\lesssim\sqrt{K} (B.90)

uniformly in θΘK\theta\in\Theta_{K}, since |𝔼[eit(sl;θ)]|𝔼|eit(sl;θ)|1|\mathbb{E}[e_{it}(s_{l};\theta)]|\leq\mathbb{E}|e_{it}(s_{l};\theta)|\lesssim 1 and sups[0,1]ϕK(s)K\sup_{s\in[0,1]}||\phi^{K}(s)||\lesssim\sqrt{K}.

For the second term,

|l=1L𝔼[𝑬(sl;θ)𝑫Pm𝑫𝑬(sl;θ)]/N|\displaystyle\left|\sum_{l=1}^{L}\mathbb{E}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)]/N\right| =|l=1Lt=1T11i,jnpm,i,j𝔼[eit(sl;θ)ejt(sl;θ)]/N|\displaystyle=\left|\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{1\leq i,j\leq n}p_{m,i,j}\mathbb{E}[\vec{e}_{it}(s_{l};\theta)\vec{e}_{jt}(s_{l};\theta)]/N\right| (B.91)
1Nl=1Lt=1T11i,jn|pm,i,j||𝔼[eit(sl;θ)ejt(sl;θ)]|\displaystyle\leq\frac{1}{N}\sum_{l=1}^{L}\sum_{t=1}^{T-1}\sum_{1\leq i,j\leq n}|p_{m,i,j}|\cdot|\mathbb{E}[\vec{e}_{it}(s_{l};\theta)\vec{e}_{jt}(s_{l};\theta)]| (B.92)
1\displaystyle\lesssim 1 (B.93)

uniformly in θΘK\theta\in\Theta_{K}, since

|𝔼[eit(sl;θ)ejt(sl;θ)]|\displaystyle|\mathbb{E}[\vec{e}_{it}(s_{l};\theta)\vec{e}_{jt}(s_{l};\theta)]| 𝔼[|eit(sl;θ)||ejt(sl;θ)|]\displaystyle\leq\mathbb{E}[|\vec{e}_{it}(s_{l};\theta)|\cdot|\vec{e}_{jt}(s_{l};\theta)|] (B.94)
≤eit(sl;θ)2ejt(sl;θ)2\displaystyle\leq||\vec{e}_{it}(s_{l};\theta)||_{2}\cdot||\vec{e}_{jt}(s_{l};\theta)||_{2} (B.95)
{ei,t+1(sl;θ)2+eit(sl;θ)2}{ej,t+1(sl;θ)2+ejt(sl;θ)2}\displaystyle\leq\{||e_{i,t+1}(s_{l};\theta)||_{2}+||e_{it}(s_{l};\theta)||_{2}\}\cdot\{||e_{j,t+1}(s_{l};\theta)||_{2}+||e_{jt}(s_{l};\theta)||_{2}\} (B.96)
1.\displaystyle\lesssim 1. (B.97)

This completes the proof. ∎

Lemma B.8.

Suppose that Assumptions 2.1, 3.1–3.5, and 3.7 hold. In addition, assume that K/nT0K/\sqrt{nT}\to 0 and K1π0K^{1-\pi}\to 0 as nTnT\to\infty. Then, θ^nTθ0=oP(1)||\widehat{\theta}_{nT}-\theta_{0}||=o_{P}(1).

Proof.

Observe that

g¯nT(θ)𝔼[g¯nT(θ)]=(A0(θ)A1(θ)AM(θ))\displaystyle\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]=\left(\begin{array}[]{c}A_{0}(\theta)\\ A_{1}(\theta)\\ \vdots\\ A_{M}(\theta)\end{array}\right) (B.102)

where

A0(θ)\displaystyle A_{0}(\theta) l=1L𝒁(sl)𝑫𝑫(𝑯(sl)𝔼[𝑯(sl)])(θ0θ)/NpK/nT:Lemma B.6(i)\displaystyle\coloneqq\underbracket{\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}(\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})])(\theta_{0}-\theta)/N}_{\lesssim_{p}\sqrt{K}/\sqrt{nT}:\>\text{Lemma \ref{lem:matLLN}(i)}} (B.103)
+l=1L𝒁(sl)𝑫𝑫𝑽(sl)/Nl=1L𝒁(sl)𝑫𝑫𝔼[𝑽(sl)]/NpK(12π)/2:Lemma B.6(iii)+l=1L𝒁(sl)𝑫𝑫𝓔(sl)/Np1/nT:Lemma B.6(iv)\displaystyle\quad+\underbracket{\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{V}(s_{l})/N-\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\mathbb{E}[\bm{V}(s_{l})]/N}_{\lesssim_{p}K^{(1-2\pi)/2}:\>\text{Lemma \ref{lem:matLLN}(iii)}}+\underbracket{\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})/N}_{\lesssim_{p}1/\sqrt{nT}:\>\text{Lemma \ref{lem:matLLN}(iv)}} (B.104)

and, for m=1,,Mm=1,\ldots,M,

Am(θ)\displaystyle A_{m}(\theta) l=1L{𝑬(sl;θ)𝑫Pm𝑫𝑬(sl;θ)𝔼[𝑬(sl;θ)𝑫Pm𝑫𝑬(sl;θ)]}/Np1/nT:Lemma B.6(v)\coloneqq\underbracket{\sum_{l=1}^{L}\left\{\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)-\mathbb{E}[\bm{E}(s_{l};\theta)^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{E}(s_{l};\theta)]\right\}/N}_{\lesssim_{p}1/\sqrt{nT}:\>\text{Lemma \ref{lem:matLLN}(v)}}. (B.105)

Hence,

g¯nT(θ)𝔼[g¯nT(θ)]\displaystyle\left\|\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right\| A0(θ)+m=1M|Am(θ)|\displaystyle\leq\left\|A_{0}(\theta)\right\|+\sum_{m=1}^{M}\left|A_{m}(\theta)\right| (B.106)
pK/nT+K(12π)/2\displaystyle\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{(1-2\pi)/2} (B.107)

uniformly in θΘK\theta\in\Theta_{K}. Further, by the Cauchy–Schwarz inequality and Lemma B.7,

supθΘK|𝒬nT(θ)𝒬nT(θ)|\displaystyle\sup_{\theta\in\Theta_{K}}\left|\mathcal{Q}_{nT}(\theta)-\mathcal{Q}^{*}_{nT}(\theta)\right| supθΘK|(g¯nT(θ)𝔼[g¯nT(θ)])ΩnT(g¯nT(θ)𝔼[g¯nT(θ)])|\displaystyle\leq\sup_{\theta\in\Theta_{K}}\left|\left(\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right)^{\top}\Omega_{nT}\left(\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right)\right| (B.108)
+2supθΘK|(g¯nT(θ)𝔼[g¯nT(θ)])ΩnT𝔼[g¯nT(θ)]|\displaystyle\quad+2\sup_{\theta\in\Theta_{K}}\left|\left(\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right)^{\top}\Omega_{nT}\mathbb{E}[\overline{g}_{nT}(\theta)]\right| (B.109)
supθΘKg¯nT(θ)𝔼[g¯nT(θ)]2+supθΘK𝔼[g¯nT(θ)]supθΘKg¯nT(θ)𝔼[g¯nT(θ)]\displaystyle\lesssim\sup_{\theta\in\Theta_{K}}\left\|\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|^{2}+\sup_{\theta\in\Theta_{K}}\left\|\mathbb{E}[\overline{g}_{nT}(\theta)]\right\|\sup_{\theta\in\Theta_{K}}\left\|\overline{g}_{nT}(\theta)-\mathbb{E}[\overline{g}_{nT}(\theta)]\right\| (B.110)
pK/nT+K1π.\displaystyle\lesssim_{p}K/\sqrt{nT}+K^{1-\pi}. (B.111)

Combined with the identifiability of θ0\theta_{0} (Lemma B.2), the above result implies the consistency of θ^nT\widehat{\theta}_{nT} (see, e.g., the proof of Theorem 3.3 in Su and Hoshino (2016)). ∎
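Remark. The argument above follows the standard consistency template for GMM-type estimators: uniform convergence of the criterion 𝒬nT\mathcal{Q}_{nT} to 𝒬nT\mathcal{Q}^{*}_{nT} plus identification of θ0\theta_{0}. Purely for illustration, the minimization step can be sketched in Python as follows, with a toy linear moment function standing in for the functional moments g¯nT\overline{g}_{nT}; all names and the toy moments are ours and do not describe the paper's implementation:

import numpy as np
from scipy.optimize import minimize

def gmm_estimate(g_bar, omega, theta_init):
    # minimize Q(theta) = g_bar(theta)' Omega g_bar(theta)
    def Q(theta):
        g = g_bar(theta)
        return g @ omega @ g
    return minimize(Q, theta_init, method="BFGS").x

# toy example: linear moments g(theta) = A theta - b, whose minimizer solves
# (A' Omega A) theta = A' Omega b
rng = np.random.default_rng(1)
A, b = rng.standard_normal((6, 3)), rng.standard_normal(6)
omega = np.eye(6)
theta_hat = gmm_estimate(lambda th: A @ th - b, omega, np.zeros(3))
print(theta_hat - np.linalg.solve(A.T @ omega @ A, A.T @ omega @ b))  # ~ 0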

Proof of Theorem 3.1.

(i) Given the consistency result in Lemma B.8, if we can show that for an arbitrary ϵ>0\epsilon>0, there exists a constant CϵC_{\epsilon} such that for all sufficiently large nTnT,

Pr(inf𝒖=Cϵ𝒬nT(θ0+ζnT𝒖)>𝒬nT(θ0))1ϵ,\displaystyle\Pr\left(\inf_{||\bm{u}||=C_{\epsilon}}\mathcal{Q}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})>\mathcal{Q}_{nT}(\theta_{0})\right)\geq 1-\epsilon, (B.112)

we can conclude that θ^nTθ0pζnT||\widehat{\theta}_{nT}-\theta_{0}||\lesssim_{p}\zeta_{nT}.

Decompose

𝒬nT(θ0+ζnT𝒖)𝒬nT(θ0)\displaystyle\mathcal{Q}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\mathcal{Q}_{nT}(\theta_{0}) =(g¯nT(θ0+ζnT𝒖)g¯nT(θ0))ΩnT(g¯nT(θ0+ζnT𝒖)g¯nT(θ0))A~nT(θ)\displaystyle=\underbracket{\left(\overline{g}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\overline{g}_{nT}(\theta_{0})\right)^{\top}\Omega_{nT}\left(\overline{g}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\overline{g}_{nT}(\theta_{0})\right)}_{\eqqcolon\widetilde{A}_{nT}(\theta)} (B.113)
+2(g¯nT(θ0+ζnT𝒖)g¯nT(θ0))ΩnTg¯nT(θ0).\displaystyle\quad+2\left(\overline{g}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\overline{g}_{nT}(\theta_{0})\right)^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0}). (B.114)

Lemma B.6(ii) implies that ΠnTΠ^nT=oP(1)||\Pi_{nT}-\widehat{\Pi}_{nT}||=o_{P}(1), where Π^nTl=1L𝒁(sl)𝑫𝑫𝑯(sl)/N\widehat{\Pi}_{nT}\coloneqq\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{H}(s_{l})/N. Thus, by Assumption 3.6, we have λmin(Π^nTΠ^nT)>0\lambda_{\min}(\widehat{\Pi}_{nT}^{\top}\widehat{\Pi}_{nT})>0 with probability approaching one. Observing that

g¯nT(θ)g¯nT(θ0)\displaystyle\overline{g}_{nT}(\theta)-\overline{g}_{nT}(\theta_{0}) =(Π^nTl=1L[𝑯(sl)(θ0θ)+2𝑽(sl)+2𝓔(sl)]𝑫P1𝑫𝑯(sl)/Nl=1L[𝑯(sl)(θ0θ)+2𝑽(sl)+2𝓔(sl)]𝑫PM𝑫𝑯(sl)/N)(θ0θ),\displaystyle=\left(\begin{array}[]{c}\widehat{\Pi}_{nT}\\ \sum_{l=1}^{L}\left[\bm{H}(s_{l})(\theta_{0}-\theta)+2\bm{V}(s_{l})+2\bm{\mathcal{E}}(s_{l})\right]^{\top}\bm{D}^{\top}P_{1}\bm{D}\bm{H}(s_{l})/N\\ \vdots\\ \sum_{l=1}^{L}\left[\bm{H}(s_{l})(\theta_{0}-\theta)+2\bm{V}(s_{l})+2\bm{\mathcal{E}}(s_{l})\right]^{\top}\bm{D}^{\top}P_{M}\bm{D}\bm{H}(s_{l})/N\end{array}\right)(\theta_{0}-\theta), (B.119)

we obtain A~nT(θ)c1ζnT2Cϵ2\widetilde{A}_{nT}(\theta)\geq c_{1}\zeta_{nT}^{2}C_{\epsilon}^{2} for some c1>0c_{1}>0 with probability approaching one.

For the second term, the Cauchy–Schwarz inequality yields

|(g¯nT(θ0+ζnT𝒖)g¯nT(θ0))ΩnTg¯nT(θ0)|\displaystyle\left|\left(\overline{g}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\overline{g}_{nT}(\theta_{0})\right)^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0})\right| (A~nT(θ))1/2(g¯nT(θ0)ΩnTg¯nT(θ0))1/2\displaystyle\leq(\widetilde{A}_{nT}(\theta))^{1/2}\left(\overline{g}_{nT}(\theta_{0})^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0})\right)^{1/2} (B.120)
c2(A~nT(θ))1/2g¯nT(θ0).\displaystyle\leq c_{2}(\widetilde{A}_{nT}(\theta))^{1/2}||\overline{g}_{nT}(\theta_{0})||. (B.121)

Hence,

𝒬nT(θ0+ζnT𝒖)𝒬nT(θ0)\displaystyle\mathcal{Q}_{nT}(\theta_{0}+\zeta_{nT}\bm{u})-\mathcal{Q}_{nT}(\theta_{0}) A~nT(θ)2c2(A~nT(θ))1/2g¯nT(θ0)\displaystyle\geq\widetilde{A}_{nT}(\theta)-2c_{2}(\widetilde{A}_{nT}(\theta))^{1/2}||\overline{g}_{nT}(\theta_{0})|| (B.122)
=(A~nT(θ))1/2((A~nT(θ))1/22c2g¯nT(θ0)).\displaystyle=(\widetilde{A}_{nT}(\theta))^{1/2}((\widetilde{A}_{nT}(\theta))^{1/2}-2c_{2}||\overline{g}_{nT}(\theta_{0})||). (B.123)

Since (A~nT(θ))1/2(\widetilde{A}_{nT}(\theta))^{1/2} is bounded from below by c1ζnTCϵ\sqrt{c_{1}}\zeta_{nT}C_{\epsilon}, if we set ζnTg¯nT(θ0)\zeta_{nT}\propto||\overline{g}_{nT}(\theta_{0})||, we can obtain the desired inequality by choosing a sufficiently large CϵC_{\epsilon}. From Lemma B.6(iii), (iv), (vi), (vii), and (viii), we have

g¯nT(θ0)p1/nT+K(12π)/2,\displaystyle\left\|\overline{g}_{nT}(\theta_{0})\right\|\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2}, (B.124)

and this completes the proof.


(ii) Note that 01ϕK(s)ϕK(s)ds=IK\int_{0}^{1}\phi^{K}(s)\phi^{K}(s)^{\top}\text{d}s=I_{K} by orthonormality. Then, by result (i) and Assumption 3.7,

α^nTα0L2\displaystyle\left\|\widehat{\alpha}_{nT}-\alpha_{0}\right\|_{L^{2}} ϕK()(θ^nT,αθ0α)L2+ϕK()θ0αα0()L2\displaystyle\leq\left\|\phi^{K}(\cdot)^{\top}(\widehat{\theta}_{nT,\alpha}-\theta_{0\alpha})\right\|_{L^{2}}+\left\|\phi^{K}(\cdot)^{\top}\theta_{0\alpha}-\alpha_{0}(\cdot)\right\|_{L^{2}} (B.125)
((θ^nT,αθ0α)[01ϕK(s)ϕK(s)ds](θ^nT,αθ0α))1/2+Kπ\displaystyle\lesssim\left((\widehat{\theta}_{nT,\alpha}-\theta_{0\alpha})^{\top}\left[\int_{0}^{1}\phi^{K}(s)\phi^{K}(s)^{\top}\text{d}s\right](\widehat{\theta}_{nT,\alpha}-\theta_{0\alpha})\right)^{1/2}+K^{-\pi} (B.126)
p1/nT+K(12π)/2.\displaystyle\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2}. (B.127)

It is also straightforward to see that sups[0,1]|α^nT(s)α0(s)|sups[0,1]ϕK(s)θ^nT,αθ0α+KπpK/nT+K1π\sup_{s\in[0,1]}|\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)|\leq\sup_{s\in[0,1]}||\phi^{K}(s)||\cdot||\widehat{\theta}_{nT,\alpha}-\theta_{0\alpha}||+K^{-\pi}\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}.


(iii) Analogous to the proof of (ii). ∎
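Remark. As a concrete reading of the rates in Theorem 3.1(ii), the two terms in (B.127) can be balanced in KK; this back-of-the-envelope calculation is ours and is used nowhere else. Setting 1/nTK(12π)/21/\sqrt{nT}\asymp K^{(1-2\pi)/2}, that is, K(nT)1/(2π1)K^{*}\asymp(nT)^{1/(2\pi-1)}, gives the parametric L2L^{2} rate α^nTα0L2p(nT)1/2\|\widehat{\alpha}_{nT}-\alpha_{0}\|_{L^{2}}\lesssim_{p}(nT)^{-1/2}, while the corresponding uniform rate becomes K/nT+(K)1π(nT)(π1)/(2π1)\sqrt{K^{*}}/\sqrt{nT}+(K^{*})^{1-\pi}\asymp(nT)^{-(\pi-1)/(2\pi-1)}, where the two terms of the uniform bound are exactly balanced as well.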


Lemma B.9.

Suppose that Assumptions 2.1, 3.1–3.7, and 3.8(i), (ii) hold. In addition, assume that K/nT0K/\sqrt{nT}\to 0 and K1π0K^{1-\pi}\to 0 as nTnT\to\infty. Let θ¯nT\overline{\theta}_{nT} be any vector lying between θ^nT\widehat{\theta}_{nT} and θ0\theta_{0}. Then,

  • (i)

    J¯nT(θ^nT)J¯nTpK/nT+K(12π)/2\left\|\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT}\right\|\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}

  • (ii)

    J¯nT(θ^nT)J¯nT(θ^nT)J¯nTJ¯nTpK/nT+K(12π)/2\left\|\overline{J}_{nT}(\widehat{\theta}_{nT})\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}-\overline{J}_{nT}\overline{J}^{\top}_{nT}\right\|\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}

  • (iii)

    J¯nT(θ^nT)ΩnTJ¯nT(θ¯nT)J¯nTΩnTJ¯nTpK/nT+K(12π)/2\left\|\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})-\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right\|\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}

  • (iv)

    (J¯nT(θ^nT)ΩnTJ¯nT(θ¯nT))(J¯nTΩnTJ¯nT)1pK/nT+K(12π)/2\left\|\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)^{-}-\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\right\|\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}

Proof.

(i) Observe that

J¯nT(θ^nT)J¯nTB1,nT+2m=1M(B2,m,nT+B3,m,nT)\displaystyle\left\|\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT}\right\|\leq B_{1,nT}+2\sum_{m=1}^{M}\left(B_{2,m,nT}+B_{3,m,nT}\right) (B.128)

where

B1,nT\displaystyle B_{1,nT} l=1L𝒁(sl)𝑫𝑫{𝑯(sl)𝔼[𝑯(sl)]}/NpK/nTLemma B.6(ii)\displaystyle\coloneqq\left\|\sum_{l=1}^{L}\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\left\{\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})]\right\}/N\right\|\lesssim_{p}\underset{\text{Lemma \ref{lem:matLLN}(ii)}}{K/\sqrt{nT}} (B.129)
B2,m,nT\displaystyle B_{2,m,nT} l=1L{𝑬(sl;θ^nT)𝑬(sl;θ0)}𝑫Pm𝑫𝑯(sl)/N\displaystyle\coloneqq\left\|\sum_{l=1}^{L}\left\{\bm{E}(s_{l};\widehat{\theta}_{nT})-\bm{E}(s_{l};\theta_{0})\right\}^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s_{l})/N\right\| (B.130)
B3,m,nT\displaystyle B_{3,m,nT} l=1L{𝑬(sl;θ0)𝑫Pm𝑫𝑯(sl)𝔼[𝑬(sl;θ0)𝑫Pm𝑫𝑯(sl)]}/N.\displaystyle\coloneqq\left\|\sum_{l=1}^{L}\left\{\bm{E}(s_{l};\theta_{0})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s_{l})-\mathbb{E}[\bm{E}(s_{l};\theta_{0})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s_{l})]\right\}/N\right\|. (B.131)

In a similar manner to the proof of Lemma B.6(v), we can show that N1l=1L{𝑯(sl)𝑯(sl)𝔼[𝑯(sl)𝑯(sl)]}pK/nT||N^{-1}\sum_{l=1}^{L}\{\bm{H}(s_{l})^{\top}\bm{H}(s_{l})-\mathbb{E}[\bm{H}(s_{l})^{\top}\bm{H}(s_{l})]\}||\lesssim_{p}K/\sqrt{nT}. Then, by Assumption 3.8(i) and Theorem 3.1(i), we have

B2,m,nT\displaystyle B_{2,m,nT} =(1Nl=1L𝑯(sl)𝑫Pm𝑫𝑯(sl))(θ^nTθ0)\displaystyle=\left\|\left(\frac{1}{N}\sum_{l=1}^{L}\bm{H}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s_{l})\right)\left(\widehat{\theta}_{nT}-\theta_{0}\right)\right\| (B.132)
λmax(1Nl=1L𝑯(sl)𝑫Pm𝑫𝑯(sl))θ^nTθ0\displaystyle\leq\lambda_{\max}\left(\frac{1}{N}\sum_{l=1}^{L}\bm{H}(s_{l})^{\top}\bm{D}^{\top}P_{m}\bm{D}\bm{H}(s_{l})\right)\cdot\left\|\widehat{\theta}_{nT}-\theta_{0}\right\| (B.133)
λmax(𝑫Pm𝑫)λmax(1Nl=1L𝑯(sl)𝑯(sl))θ^nTθ0p1/nT+K(12π)/2.\displaystyle\leq\lambda_{\max}\left(\bm{D}^{\top}P_{m}\bm{D}\right)\cdot\lambda_{\max}\left(\frac{1}{N}\sum_{l=1}^{L}\bm{H}(s_{l})^{\top}\bm{H}(s_{l})\right)\cdot\left\|\widehat{\theta}_{nT}-\theta_{0}\right\|\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2}. (B.134)

For B3,m,nTB_{3,m,nT}, by the same argument as in Lemma B.6(v), we can show that B3,m,nTpK/nTB_{3,m,nT}\lesssim_{p}\sqrt{K}/\sqrt{nT}. This completes the proof.


(ii) By the triangle inequality,

J¯nT(θ^nT)J¯nT(θ^nT)J¯nTJ¯nT\displaystyle\left\|\overline{J}_{nT}(\widehat{\theta}_{nT})\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}-\overline{J}_{nT}\overline{J}^{\top}_{nT}\right\| (J¯nT(θ^nT)J¯nT)(J¯nT(θ^nT)J¯nT)\displaystyle\leq\left\|(\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT})(\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT})^{\top}\right\| (B.135)
+2J¯nT(J¯nT(θ^nT)J¯nT)\displaystyle\quad+2\left\|\overline{J}_{nT}(\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT})^{\top}\right\| (B.136)
pK/nT+K(12π)/2\displaystyle\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2} (B.137)

where the last inequality is from result (i) and Assumption 3.8(ii).


(iii) By definition of θ¯nT\overline{\theta}_{nT}, we have θ¯nTθ0p1/nT+K(12π)/2\left\|\overline{\theta}_{nT}-\theta_{0}\right\|\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2} and thus J¯nT(θ¯nT)J¯nTpK/nT+K(12π)/2\left\|\overline{J}_{nT}(\overline{\theta}_{nT})-\overline{J}_{nT}\right\|\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}, as in result (i). Then, by the triangle inequality,

J¯nT(θ^nT)ΩnTJ¯nT(θ¯nT)J¯nTΩnTJ¯nT\left\|\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})-\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right\| (J¯nT(θ^nT)J¯nT)ΩnT(J¯nT(θ¯nT)J¯nT)\displaystyle\leq\left\|(\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT})^{\top}\Omega_{nT}(\overline{J}_{nT}(\overline{\theta}_{nT})-\overline{J}_{nT})\right\| (B.138)
+J¯nTΩnT(J¯nT(θ^nT)J¯nT)\displaystyle\quad+\left\|\overline{J}^{\top}_{nT}\Omega_{nT}(\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT})\right\| (B.139)
+J¯nTΩnT(J¯nT(θ¯nT)J¯nT)\displaystyle\quad+\left\|\overline{J}^{\top}_{nT}\Omega_{nT}(\overline{J}_{nT}(\overline{\theta}_{nT})-\overline{J}_{nT})\right\| (B.140)
pK/nT+K(12π)/2.\displaystyle\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}. (B.141)

(iv) As a result of (iii), we have λmin(J¯nT(θ^nT)ΩnTJ¯nT(θ¯nT))>0\lambda_{\min}\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)>0 with probability approaching one by Assumptions 3.5(ii) and 3.8(ii). Then, noting the equality

(J¯nT(θ^nT)ΩnTJ¯nT(θ¯nT))(J¯nTΩnTJ¯nT)1\displaystyle\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)^{-}-\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1} (B.142)
=(J¯nT(θ^nT)ΩnTJ¯nT(θ¯nT))[J¯nTΩnTJ¯nTJ¯nT(θ^nT)ΩnTJ¯nT(θ¯nT)](J¯nTΩnTJ¯nT)1,\displaystyle=\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)^{-}\left[\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}-\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right]\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}, (B.143)

the result is straightforward. ∎

Proof of Theorem 3.2.

Since the proofs of (i) and (ii) are similar, we only prove (i). By the first-order condition of the minimization and a mean-value expansion, we have

𝟎(dx+1)K\displaystyle\bm{0}_{(d_{x}+1)K} =J¯nT(θ^nT)ΩnTg¯nT(θ^nT)\displaystyle=\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{g}_{nT}(\widehat{\theta}_{nT}) (B.144)
=J¯nT(θ^nT)ΩnT[g¯nT(θ0)+J¯nT(θ¯nT)(θ^nTθ0)],\displaystyle=\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\left[\overline{g}_{nT}(\theta_{0})+\overline{J}_{nT}(\overline{\theta}_{nT})\left(\widehat{\theta}_{nT}-\theta_{0}\right)\right], (B.145)

where θ¯nT[θ^nT,θ0]\overline{\theta}_{nT}\in[\widehat{\theta}_{nT},\theta_{0}], leading to

(θ^nTθ0)\displaystyle\left(\widehat{\theta}_{nT}-\theta_{0}\right) =(J¯nT(θ^nT)ΩnTJ¯nT(θ¯nT))J¯nT(θ^nT)ΩnTg¯nT(θ0)\displaystyle=-\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)^{-}\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0}) (B.146)
=[G1,nT+G2,nT+G3,nT+G4,nT],\displaystyle=-[G_{1,nT}+G_{2,nT}+G_{3,nT}+G_{4,nT}], (B.147)

with

G1,nT\displaystyle G_{1,nT} (J¯nTΩnTJ¯nT)1J¯nTΩnTg¯1,nT\displaystyle\coloneqq\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}\overline{g}_{1,nT} (B.148)
G2,nT\displaystyle G_{2,nT} (J¯nTΩnTJ¯nT)1J¯nTΩnTg¯2,nT\displaystyle\coloneqq\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}\overline{g}_{2,nT} (B.149)
G3,nT\displaystyle G_{3,nT} (J¯nTΩnTJ¯nT)1{J¯nT(θ^nT)J¯nT}ΩnTg¯nT(θ0)\displaystyle\coloneqq\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\left\{\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT}\right\}^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0}) (B.150)
G4,nT\displaystyle G_{4,nT} {(J¯nT(θ^nT)ΩnTJ¯nT(θ¯nT))(J¯nTΩnTJ¯nT)1}J¯nT(θ^nT)ΩnTg¯nT(θ0).\displaystyle\coloneqq\left\{\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\overline{\theta}_{nT})\right)^{-}-\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\right\}\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{g}_{nT}(\theta_{0}). (B.151)

First, observing that

G2,nT2\displaystyle\left\|G_{2,nT}\right\|^{2} =g¯2,nTΩnTJ¯nT(J¯nTΩnTJ¯nT)2J¯nTΩnTg¯2,nT\displaystyle=\overline{g}_{2,nT}^{\top}\Omega_{nT}\overline{J}_{nT}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-2}\overline{J}^{\top}_{nT}\Omega_{nT}\overline{g}_{2,nT} (B.152)
g¯2,nTΩnTJ¯nT(J¯nTΩnTJ¯nT)1J¯nTΩnTg¯2,nT\displaystyle\lesssim\overline{g}_{2,nT}^{\top}\Omega_{nT}\overline{J}_{nT}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}\overline{g}_{2,nT} (B.153)
g¯2,nTΩnTg¯2,nT\displaystyle\lesssim\overline{g}_{2,nT}^{\top}\Omega_{nT}\overline{g}_{2,nT} (B.154)
g¯2,nT2,\displaystyle\lesssim\left\|\overline{g}_{2,nT}\right\|^{2}, (B.155)

we can find that G2,nTpK(12π)/2\left\|G_{2,nT}\right\|\lesssim_{p}K^{(1-2\pi)/2} by Lemma B.6(iii), (vi), and (vii). Next, it is easy to see that

G3,nT\displaystyle\left\|G_{3,nT}\right\| (J¯nTΩnTJ¯nT)1{J¯nT(θ^nT)J¯nT}pK/nT+K(12π)/2:Lemma B.9(i)g¯nT(θ0)p1/nT+K(12π)/2:(B.124)\displaystyle\lesssim\underbracket{\left\|\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\left\{\overline{J}_{nT}(\widehat{\theta}_{nT})-\overline{J}_{nT}\right\}^{\top}\right\|}_{\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}:\>\text{Lemma \ref{lem:matLLN2}(i)}}\cdot\underbracket{\left\|\overline{g}_{nT}(\theta_{0})\right\|}_{\lesssim_{p}1/\sqrt{nT}+K^{(1-2\pi)/2}:\>\eqref{eq:g0}} (B.156)
pK/(nT)\displaystyle\lesssim_{p}K/(nT) (B.157)

from nTK(12π)/20\sqrt{nT}K^{(1-2\pi)/2}\to 0. Further, for G4,nTG_{4,nT}, by Lemma B.9(ii) and (iv), (B.124), and Assumption 3.8(ii), we can show that G4,nTpK/(nT)\left\|G_{4,nT}\right\|\lesssim_{p}K/(nT).

Combining all these results and noting that

[σnT,α(s)]2cϕK(s)𝕊α𝕊α=IKϕK(s)cϕK(s)2>0\displaystyle[\sigma_{nT,\alpha}(s)]^{2}\geq c\phi^{K}(s)^{\top}\underbracket{\mathbb{S}_{\alpha}\mathbb{S}_{\alpha}^{\top}}_{=I_{K}}\phi^{K}(s)\geq c||\phi^{K}(s)||^{2}>0 (B.158)

for sufficiently large nTnT, we have

n(T1)(α^nT(s)α0(s))σnT,α(s)\displaystyle\frac{\sqrt{n(T-1)}\left(\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)\right)}{\sigma_{nT,\alpha}(s)} =n(T1)ϕK(s)𝕊α[G1,nT+OP(K/(nT))+OP(K(12π)/2)]σnT,α(s)\displaystyle=-\frac{\sqrt{n(T-1)}\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}[G_{1,nT}+O_{P}(K/(nT))+O_{P}(K^{(1-2\pi)/2})]}{\sigma_{nT,\alpha}(s)} (B.159)
+n(T1)O(Kπ)σnT,α(s)\displaystyle\quad+\frac{\sqrt{n(T-1)}O(K^{-\pi})}{\sigma_{nT,\alpha}(s)} (B.160)
=n(T1)ϕK(s)𝕊αG1,nTσnT,α(s)+oP(1)\displaystyle=-\frac{\sqrt{n(T-1)}\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}G_{1,nT}}{\sigma_{nT,\alpha}(s)}+o_{P}(1) (B.161)

Here, let Λz,nT(s)\Lambda_{z,nT}(s) and Λm,nT(s)\Lambda_{m,nT}(s) denote the first (dq+dx)K(d_{q}+d_{x})K elements and ((dq+dx)K+m)((d_{q}+d_{x})K+m)-th element of ϕK(s)𝕊α(J¯nTΩnTJ¯nT)1J¯nTΩnT/σnT,α(s)\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}/\sigma_{nT,\alpha}(s), respectively. Then, we can write

n(T1)ϕK(s)𝕊αG1,nTσnT,α(s)\displaystyle\frac{\sqrt{n(T-1)}\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}G_{1,nT}}{\sigma_{nT,\alpha}(s)} (B.162)
=ϕK(s)𝕊α(J¯nTΩnTJ¯nT)1J¯nTΩnT[n(T1)g¯1,nT]σnT,α(s)\displaystyle=\frac{\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1}\overline{J}^{\top}_{nT}\Omega_{nT}\left[\sqrt{n(T-1)}\overline{g}_{1,nT}\right]}{\sigma_{nT,\alpha}(s)} (B.163)
=1Ln(T1)l=1L[Λz,nT(s)𝒁(sl)𝑫𝑫𝓔(sl)+𝓔(sl)𝑫(m=1MΛm,nT(s)Pm)𝑫𝓔(sl)].\displaystyle=\frac{1}{L\sqrt{n(T-1)}}\sum_{l=1}^{L}\left[\Lambda_{z,nT}(s)\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}\bm{\mathcal{E}}(s_{l})+\bm{\mathcal{E}}(s_{l})^{\top}\bm{D}^{\top}\left(\sum_{m=1}^{M}\Lambda_{m,nT}(s)P_{m}\right)\bm{D}\bm{\mathcal{E}}(s_{l})\right]. (B.164)

Moreover, for convenience, we re-label the data such that (it)=(11)i=1(it)=(11)\iff\text{i}=1, (it)=(21)i=2(it)=(21)\iff\text{i}=2, …, (it)=(nT)i=𝒏(it)=(nT)\iff\text{i}=\bm{n} (where 𝒏=nT\bm{n}=nT).

Let ΠM(s)𝒏×𝒏𝑫(m=1MΛm,nT(s)Pm)𝑫=(πM,i,j(s))\underset{\bm{n}\times\bm{n}}{\Pi_{M}(s)}\coloneqq\bm{D}^{\top}\left(\sum_{m=1}^{M}\Lambda_{m,nT}(s)P_{m}\right)\bm{D}=(\pi_{M,\text{i},\text{j}}(s)). Recalling the block-diagonal structure of PmP_{m} and that its diagonals are all zero, we can find that the diagonal elements of ΠM(s)\Pi_{M}(s) are also all zero. Further, note that ΠM(s)\Pi_{M}(s) is symmetric. Now, letting zi(sl)z_{\text{i}}^{\dagger}(s_{l}) be the i-th column of 𝒁(sl)𝑫𝑫\bm{Z}(s_{l})^{\top}\bm{D}^{\top}\bm{D}, define

ai(s)\displaystyle a_{\text{i}}(s) 1Ln(T1)l=1LΛz,nT(s)zi(sl)εi(sl)\displaystyle\coloneqq\frac{1}{L\sqrt{n(T-1)}}\sum_{l=1}^{L}\Lambda_{z,nT}(s)z_{\text{i}}^{\dagger}(s_{l})\varepsilon_{\text{i}}(s_{l}) (B.165)
bi,j(s)\displaystyle b_{\text{i},\text{j}}(s) 1Ln(T1)l=1LπM,i,j(s)εi(sl)εj(sl)\displaystyle\coloneqq\frac{1}{L\sqrt{n(T-1)}}\sum_{l=1}^{L}\pi_{M,\text{i},\text{j}}(s)\varepsilon_{\text{i}}(s_{l})\varepsilon_{\text{j}}(s_{l}) (B.166)
γi(s)\displaystyle\gamma_{\text{i}}(s) ai(s)+2j=1i1bi,j(s),\displaystyle\coloneqq a_{\text{i}}(s)+2\sum_{\text{j}=1}^{\text{i}-1}b_{\text{i},\text{j}}(s), (B.167)

and we further re-write

n(T1)ϕK(s)𝕊αG1,nTσnT,α(s)=i=1𝒏γi(s).\displaystyle\frac{\sqrt{n(T-1)}\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}G_{1,nT}}{\sigma_{nT,\alpha}(s)}=\sum_{\text{i}=1}^{\bm{n}}\gamma_{\text{i}}(s). (B.168)

Here, let nT(i)\mathscr{F}_{nT}(\text{i}) denote the σ\sigma-field generated by {εj:1ji}\{\varepsilon_{\text{j}}:1\leq\text{j}\leq\text{i}\}. Under Assumption 3.3(i), we have 𝔼[γi(s)nT(i1)]=0\mathbb{E}[\gamma_{\text{i}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)]=0, implying that {γi(s)}\{\gamma_{\text{i}}(s)\} forms a martingale difference sequence for each 𝒏1\bm{n}\geq 1. Then, it suffices to check the following two conditions for the central limit theorem of Scott (1973):

(1)\displaystyle(1)\;\; i=1𝒏𝔼[(γi(s))2nT(i1)]𝑝1\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[(\gamma_{\text{i}}(s))^{2}\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}1 (B.169)
(2)\displaystyle(2)\;\; i=1𝒏𝔼[(γi(s))2𝟏{|γi(s)|η}nT(i1)]𝑝0for any η>0\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[(\gamma_{\text{i}}(s))^{2}\bm{1}\{|\gamma_{\text{i}}(s)|\geq\eta\}\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}0\;\;\text{for any $\eta>0$} (B.170)

Verification of condition (1)

Observe that

𝔼[(γi(s))2nT(i1)]\displaystyle\mathbb{E}[(\gamma_{\text{i}}(s))^{2}\mid\mathscr{F}_{nT}(\text{i}-1)] =𝔼[(ai(s))2]+4j1=1i1j2=1i1𝔼[bi,j1(s)bi,j2(s)nT(i1)]\displaystyle=\mathbb{E}[(a_{\text{i}}(s))^{2}]+4\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\mathbb{E}[b_{\text{i},\text{j}_{1}}(s)b_{\text{i},\text{j}_{2}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)] (B.171)
+4j=1i1𝔼[ai(s)bi,j(s)nT(i1)].\displaystyle\quad+4\sum_{\text{j}=1}^{\text{i}-1}\mathbb{E}[a_{\text{i}}(s)b_{\text{i},\text{j}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)]. (B.172)

Recalling the definition of 𝒱z,nT\mathcal{V}_{z,nT} in (A.39), we can easily see that i=1𝒏𝔼[(ai(s))2]=Λz,nT(s)𝒱z,nTΛz,nT(s)\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[(a_{\text{i}}(s))^{2}]=\Lambda_{z,nT}(s)\mathcal{V}_{z,nT}\Lambda_{z,nT}(s)^{\top}.

For the second term on the right-hand side, noting that j1,j2i1\text{j}_{1},\text{j}_{2}\leq\text{i}-1,

4i=1𝒏j1=1i1j2=1i1𝔼[bi,j1(s)bi,j2(s)nT(i1)]\displaystyle 4\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\mathbb{E}[b_{\text{i},\text{j}_{1}}(s)b_{\text{i},\text{j}_{2}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)] (B.173)
=1L2l=1Ll=1L4n(T1)i=1𝒏j1=1i1j2=1i1πM,i,j1(s)πM,i,j2(s)Γi(sl,sl)εj1(sl)εj2(sl)D(s,sl,sl).\displaystyle=\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\underbracket{\frac{4}{n(T-1)}\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\pi_{M,\text{i},\text{j}_{1}}(s)\pi_{M,\text{i},\text{j}_{2}}(s)\Gamma_{\text{i}}(s_{l},s_{l^{\prime}})\varepsilon_{\text{j}_{1}}(s_{l})\varepsilon_{\text{j}_{2}}(s_{l^{\prime}})}_{\eqqcolon D(s,s_{l},s_{l^{\prime}})}. (B.174)

Since ΠM(s)\Pi_{M}(s) is symmetric and its diagonals are zero, recalling the definition of 𝒱ab,nT\mathcal{V}_{ab,nT} in (A.40), direct calculation yields

1L2l=1Ll=1L𝔼[D(s,sl,sl)]\displaystyle\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\mathbb{E}[D(s,s_{l},s_{l^{\prime}})] =4n(T1)i=1𝒏j=1i1[πM,i,j(s)]21L2l=1Ll=1LΓi(sl,sl)Γj(sl,sl)\displaystyle=\frac{4}{n(T-1)}\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}=1}^{\text{i}-1}[\pi_{M,\text{i},\text{j}}(s)]^{2}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{\text{i}}(s_{l},s_{l^{\prime}})\Gamma_{\text{j}}(s_{l},s_{l^{\prime}}) (B.175)
=2n(T1)1i,j𝒏[πM,i,j(s)]21L2l=1Ll=1LΓi(sl,sl)Γj(sl,sl)\displaystyle=\frac{2}{n(T-1)}\sum_{1\leq\text{i},\text{j}\leq\bm{n}}[\pi_{M,\text{i},\text{j}}(s)]^{2}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{\text{i}}(s_{l},s_{l^{\prime}})\Gamma_{\text{j}}(s_{l},s_{l^{\prime}}) (B.176)
=2n(T1)1i,j𝒏[m=1MΛm,nT(s)p~m,i,j]21L2l=1Ll=1LΓi(sl,sl)Γj(sl,sl)\displaystyle=\frac{2}{n(T-1)}\sum_{1\leq\text{i},\text{j}\leq\bm{n}}\left[\sum_{m=1}^{M}\Lambda_{m,nT}(s)\widetilde{p}_{m,\text{i},\text{j}}\right]^{2}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{\text{i}}(s_{l},s_{l^{\prime}})\Gamma_{\text{j}}(s_{l},s_{l^{\prime}}) (B.177)
=a=1Mb=1MΛa,nT(s)Λb,nT(s)2n(T1)1i,j𝒏p~a,i,jp~b,i,j1L2l=1Ll=1LΓi(sl,sl)Γj(sl,sl)\displaystyle=\sum_{a=1}^{M}\sum_{b=1}^{M}\Lambda_{a,nT}(s)\Lambda_{b,nT}(s)\frac{2}{n(T-1)}\sum_{1\leq\text{i},\text{j}\leq\bm{n}}\widetilde{p}_{a,\text{i},\text{j}}\widetilde{p}_{b,\text{i},\text{j}}\frac{1}{L^{2}}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Gamma_{\text{i}}(s_{l},s_{l^{\prime}})\Gamma_{\text{j}}(s_{l},s_{l^{\prime}}) (B.178)
=a=1Mb=1MΛa,nT(s)Λb,nT(s)𝒱ab,nT.\displaystyle=\sum_{a=1}^{M}\sum_{b=1}^{M}\Lambda_{a,nT}(s)\Lambda_{b,nT}(s)\mathcal{V}_{ab,nT}. (B.179)

Meanwhile,

Var(D(s,sl,sl))\displaystyle\text{Var}\left(D(s,s_{l},s_{l^{\prime}})\right) 1n2(T1)2i=1𝒏j1=1i1j2=1i1i=1𝒏k1=1i1k2=1i1|πM,i,j1(s)πM,i,j2(s)πM,i,k1(s)πM,i,k2(s)|\displaystyle\lesssim\frac{1}{n^{2}(T-1)^{2}}\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\sum_{\text{i}^{\prime}=1}^{\bm{n}}\sum_{\text{k}_{1}=1}^{\text{i}^{\prime}-1}\sum_{\text{k}_{2}=1}^{\text{i}^{\prime}-1}\left|\pi_{M,\text{i},\text{j}_{1}}(s)\pi_{M,\text{i},\text{j}_{2}}(s)\pi_{M,\text{i}^{\prime},\text{k}_{1}}(s)\pi_{M,\text{i}^{\prime},\text{k}_{2}}(s)\right| (B.180)
×|𝔼{(εj1(sl)εj2(sl)𝔼[εj1(sl)εj2(sl)])(εk1(sl)εk2(sl)𝔼[εk1(sl)εk2(sl)])}|\displaystyle\quad\times\left|\mathbb{E}\{(\varepsilon_{\text{j}_{1}}(s_{l})\varepsilon_{\text{j}_{2}}(s_{l^{\prime}})-\mathbb{E}[\varepsilon_{\text{j}_{1}}(s_{l})\varepsilon_{\text{j}_{2}}(s_{l^{\prime}})])(\varepsilon_{\text{k}_{1}}(s_{l})\varepsilon_{\text{k}_{2}}(s_{l^{\prime}})-\mathbb{E}[\varepsilon_{\text{k}_{1}}(s_{l})\varepsilon_{\text{k}_{2}}(s_{l^{\prime}})])\}\right| (B.181)
1n2(T1)2i=1𝒏i=1𝒏j1=1i1j2=1i1|πM,i,j1(s)πM,i,j2(s)πM,i,j1(s)πM,i,j2(s)|\displaystyle\lesssim\frac{1}{n^{2}(T-1)^{2}}\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{i}^{\prime}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\left|\pi_{M,\text{i},\text{j}_{1}}(s)\pi_{M,\text{i},\text{j}_{2}}(s)\pi_{M,\text{i}^{\prime},\text{j}_{1}}(s)\pi_{M,\text{i}^{\prime},\text{j}_{2}}(s)\right| (B.182)
=1n2(T1)2i=1𝒏j1=1i1j2=1i1i=1𝒏|πM,i,j1(s)||πM,i,j2(s)||πM,i,j1(s)||πM,i,j2(s)|\displaystyle=\frac{1}{n^{2}(T-1)^{2}}\sum_{\text{i}^{\prime}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\sum_{\text{i}=1}^{\bm{n}}|\pi_{M,\text{i},\text{j}_{1}}(s)|\cdot|\pi_{M,\text{i},\text{j}_{2}}(s)|\cdot|\pi_{M,\text{i}^{\prime},\text{j}_{1}}(s)|\cdot|\pi_{M,\text{i}^{\prime},\text{j}_{2}}(s)| (B.183)
1n2(T1)2i=1𝒏||ΠM(s)||12||ΠM(s)||21/(nT).\displaystyle\leq\frac{1}{n^{2}(T-1)^{2}}\sum_{\text{i}^{\prime}=1}^{\bm{n}}||\Pi_{M}(s)||_{1}^{2}\cdot||\Pi_{M}(s)||_{\infty}^{2}\lesssim 1/(nT). (B.184)

Consequently, 4i=1𝒏j1=1i1j2=1i1𝔼[bi,j1(s)bi,j2(s)nT(i1)]𝑝a=1Mb=1MΛa,nT(s)Λb,nT(s)𝒱ab,nT4\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}_{1}=1}^{\text{i}-1}\sum_{\text{j}_{2}=1}^{\text{i}-1}\mathbb{E}[b_{\text{i},\text{j}_{1}}(s)b_{\text{i},\text{j}_{2}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}\sum_{a=1}^{M}\sum_{b=1}^{M}\Lambda_{a,nT}(s)\Lambda_{b,nT}(s)\mathcal{V}_{ab,nT} holds from Chebyshev’s inequality.

For the third term, for ji1\text{j}\leq\text{i}-1,

𝔼[ai(s)bi,j(s)nT(i1)]\displaystyle\mathbb{E}[a_{\text{i}}(s)b_{\text{i},\text{j}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)] =1L2n(T1)l=1Ll=1LΛz,nT(s)zi(sl)πM,i,j(s)𝔼[εi(sl)εi(sl)]εj(sl)\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\Lambda_{z,nT}(s)z_{\text{i}}^{\dagger}(s_{l})\pi_{M,\text{i},\text{j}}(s)\mathbb{E}[\varepsilon_{\text{i}}(s_{l})\varepsilon_{\text{i}}(s_{l^{\prime}})]\varepsilon_{\text{j}}(s_{l^{\prime}}) (B.185)
=1Ln(T1)l=1LπM,i,j(s)hi(s,sl)εj(sl),\displaystyle=\frac{1}{Ln(T-1)}\sum_{l=1}^{L}\pi_{M,\text{i},\text{j}}(s)h_{\text{i}}(s,s_{l})\varepsilon_{\text{j}}(s_{l}), (B.186)

where hi(s,sl)L1l=1LΛz,nT(s)zi(sl)𝔼[εi(sl)εi(sl)]h_{\text{i}}(s,s_{l^{\prime}})\coloneqq L^{-1}\sum_{l=1}^{L}\Lambda_{z,nT}(s)z_{\text{i}}^{\dagger}(s_{l})\mathbb{E}[\varepsilon_{\text{i}}(s_{l})\varepsilon_{\text{i}}(s_{l^{\prime}})]. Hence, we can write

4i=1𝒏j=1i1𝔼[ai(s)bi,j(s)nT(i1)]\displaystyle 4\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}=1}^{\text{i}-1}\mathbb{E}[a_{\text{i}}(s)b_{\text{i},\text{j}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)] =4Ll=1L(1n(T1)j=1𝒏i=j+1𝒏πM,i,j(s)hi(s,sl)εj(sl)).\displaystyle=\frac{4}{L}\sum_{l=1}^{L}\left(\frac{1}{n(T-1)}\sum_{\text{j}=1}^{\bm{n}}\sum_{\text{i}=\text{j}+1}^{\bm{n}}\pi_{M,\text{i},\text{j}}(s)h_{\text{i}}(s,s_{l})\varepsilon_{\text{j}}(s_{l})\right). (B.187)

Noting that |hi(s,sl)|K/||ϕK(s)|||h_{\text{i}}(s,s_{l^{\prime}})|\lesssim K/||\phi^{K}(s)||,

𝔼|1n(T1)j=1𝒏i=j+1𝒏πM,i,j(s)hi(s,sl)εj(sl)|2\displaystyle\mathbb{E}\left|\frac{1}{n(T-1)}\sum_{\text{j}=1}^{\bm{n}}\sum_{\text{i}=\text{j}+1}^{\bm{n}}\pi_{M,\text{i},\text{j}}(s)h_{\text{i}}(s,s_{l})\varepsilon_{\text{j}}(s_{l})\right|^{2} K2n2(T1)2j=1𝒏i=j+1𝒏i=j+1𝒏|πM,i,j(s)||πM,i,j(s)|\displaystyle\lesssim\frac{K^{2}}{n^{2}(T-1)^{2}}\sum_{\text{j}=1}^{\bm{n}}\sum_{\text{i}=\text{j}+1}^{\bm{n}}\sum_{\text{i}^{\prime}=\text{j}+1}^{\bm{n}}|\pi_{M,\text{i},\text{j}}(s)|\cdot|\pi_{M,\text{i}^{\prime},\text{j}}(s)| (B.188)
K2n2(T1)2j=1𝒏||ΠM(s)||12K2/(nT)\displaystyle\leq\frac{K^{2}}{n^{2}(T-1)^{2}}\sum_{\text{j}=1}^{\bm{n}}||\Pi_{M}(s)||_{1}^{2}\lesssim K^{2}/(nT) (B.189)

Then, by Markov’s inequality, we obtain 4i=1𝒏j=1i1𝔼[ai(s)bi,j(s)nT(i1)]𝑝04\sum_{\text{i}=1}^{\bm{n}}\sum_{\text{j}=1}^{\text{i}-1}\mathbb{E}[a_{\text{i}}(s)b_{\text{i},\text{j}}(s)\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}0.

Finally, combining the above results gives

i=1𝒏𝔼[(γi(s))2nT(i1)]𝑝Λz,nT(s)𝒱z,nTΛz,nT(s)+a=1Mb=1MΛa,nT(s)Λb,nT(s)𝒱ab,nT\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[(\gamma_{\text{i}}(s))^{2}\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}\Lambda_{z,nT}(s)\mathcal{V}_{z,nT}\Lambda_{z,nT}(s)^{\top}+\sum_{a=1}^{M}\sum_{b=1}^{M}\Lambda_{a,nT}(s)\Lambda_{b,nT}(s)\mathcal{V}_{ab,nT} (B.190)
=(Λz,nT(s)Λ1,nT(s)ΛM,nT(s))(𝒱z,nT𝟎(dq+dx)K×1𝟎(dq+dx)K×1𝟎1×(dq+dx)K𝒱11,nT𝒱1M,nT𝟎1×(dq+dx)K𝒱M1,nT𝒱MM,nT)(Λz,nT(s)Λ1,nT(s)ΛM,nT(s))\displaystyle=\left(\begin{array}[]{cccc}\Lambda_{z,nT}(s)&\Lambda_{1,nT}(s)&\cdots&\Lambda_{M,nT}(s)\end{array}\right)\left(\begin{array}[]{cccc}\mathcal{V}_{z,nT}&\bm{0}_{(d_{q}+d_{x})K\times 1}&\cdots&\bm{0}_{(d_{q}+d_{x})K\times 1}\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\mathcal{V}_{11,nT}&\cdots&\mathcal{V}_{1M,nT}\\ \vdots&\vdots&\ddots&\vdots\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\mathcal{V}_{M1,nT}&\cdots&\mathcal{V}_{MM,nT}\end{array}\right)\left(\begin{array}[]{c}\Lambda_{z,nT}(s)^{\top}\\ \Lambda_{1,nT}(s)\\ \vdots\\ \Lambda_{M,nT}(s)\end{array}\right) (B.200)
=ϕK(s)𝕊αΣnT𝕊αϕK(s)[σnT,α(s)]2=1,\displaystyle=\frac{\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\Sigma_{nT}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)}{[\sigma_{nT,\alpha}(s)]^{2}}=1, (B.201)

as desired.

Verification of condition (2)

To verify condition (2), it is sufficient to show that i=1𝒏𝔼[|γi(s)|4nT(i1)]𝑝0\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[|\gamma_{\text{i}}(s)|^{4}\mid\mathscr{F}_{nT}(\text{i}-1)]\overset{p}{\to}0. Moreover, by the crc_{r} inequality,

i=1𝒏𝔼[|γi(s)|4nT(i1)]8i=1𝒏𝔼[|ai(s)|4]+128i=1𝒏𝔼[|j=1i1bi,j(s)|4nT(i1)].\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[|\gamma_{\text{i}}(s)|^{4}\mid\mathscr{F}_{nT}(\text{i}-1)]\leq 8\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[|a_{\text{i}}(s)|^{4}]+128\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}\left[\left|\sum_{\text{j}=1}^{\text{i}-1}b_{\text{i},\text{j}}(s)\right|^{4}\mid\mathscr{F}_{nT}(\text{i}-1)\right]. (B.202)

For the first term on the right-hand side, noting that Assumption 3.3(ii) implies 𝔼|k=14εit(sk)|<\mathbb{E}|\prod_{k=1}^{4}\varepsilon_{it}(s_{k})|<\infty by Hölder’s inequality,

i=1𝒏𝔼[|ai(s)|4]\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}[|a_{\text{i}}(s)|^{4}] =1L4n2(T1)2i=1𝒏1l1,l2,l3,l4L|j=14Λz,nT(s)zi(slj)|𝔼|j=14εi(slj)|\displaystyle=\frac{1}{L^{4}n^{2}(T-1)^{2}}\sum_{\text{i}=1}^{\bm{n}}\sum_{1\leq l_{1},l_{2},l_{3},l_{4}\leq L}\left|\prod_{j=1}^{4}\Lambda_{z,nT}(s)z_{\text{i}}^{\dagger}(s_{l_{j}})\right|\cdot\mathbb{E}\left|\prod_{j=1}^{4}\varepsilon_{\text{i}}(s_{l_{j}})\right| (B.203)
1n2(T1)2i=1𝒏1L41l1,l2,l3,l4L|j=14Λz,nT(s)zi(slj)|K4nTϕK(s)4.\displaystyle\lesssim\frac{1}{n^{2}(T-1)^{2}}\sum_{\text{i}=1}^{\bm{n}}\frac{1}{L^{4}}\sum_{1\leq l_{1},l_{2},l_{3},l_{4}\leq L}\left|\prod_{j=1}^{4}\Lambda_{z,nT}(s)z_{\text{i}}^{\dagger}(s_{l_{j}})\right|\lesssim\frac{K^{4}}{nT\left\|\phi^{K}(s)\right\|^{4}}. (B.204)

For the second term, observe that

i=1𝒏𝔼[|j=1i1bi,j(s)|4nT(i1)]\displaystyle\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}\left[\left|\sum_{\text{j}=1}^{\text{i}-1}b_{\text{i},\text{j}}(s)\right|^{4}\mid\mathscr{F}_{nT}(\text{i}-1)\right] (B.205)
=1L4n2(T1)2i=1𝒏1l1,l2,l3,l4L(j1=1i1πM,i,j1(s)εj1(sl1))(j4=1i1πM,i,j4(s)εj4(sl4))𝔼(k=14εi(slk)).\displaystyle=\frac{1}{L^{4}n^{2}(T-1)^{2}}\sum_{\text{i}=1}^{\bm{n}}\sum_{1\leq l_{1},l_{2},l_{3},l_{4}\leq L}\left(\sum_{\text{j}_{1}=1}^{\text{i}-1}\pi_{M,\text{i},\text{j}_{1}}(s)\varepsilon_{\text{j}_{1}}(s_{l_{1}})\right)\cdots\left(\sum_{\text{j}_{4}=1}^{\text{i}-1}\pi_{M,\text{i},\text{j}_{4}}(s)\varepsilon_{\text{j}_{4}}(s_{l_{4}})\right)\mathbb{E}\left(\prod_{k=1}^{4}\varepsilon_{\text{i}}(s_{l_{k}})\right). (B.206)

Further, it is easy to see that j=1i1πM,i,j(s)εj(sl)p1\sum_{\text{j}=1}^{\text{i}-1}\pi_{M,\text{i},\text{j}}(s)\varepsilon_{\text{j}}(s_{l})\lesssim_{p}1 by Markov's inequality. Hence, we have i=1𝒏𝔼[|j=1i1bi,j(s)|4nT(i1)]p1/(nT)\sum_{\text{i}=1}^{\bm{n}}\mathbb{E}\left[\left|\sum_{\text{j}=1}^{\text{i}-1}b_{\text{i},\text{j}}(s)\right|^{4}\mid\mathscr{F}_{nT}(\text{i}-1)\right]\lesssim_{p}1/(nT), and combining this with the previous result implies condition (2). ∎
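Remark. The statistic i=1𝒏γi(s)\sum_{\text{i}=1}^{\bm{n}}\gamma_{\text{i}}(s) is a standardized linear–quadratic form in the errors, which is exactly the structure covered by Scott's (1973) martingale CLT. For illustration only, the Python sketch below simulates a generic standardized linear–quadratic form with a banded, symmetric, zero-diagonal coefficient matrix, mimicking the structure of ΠM(s)\Pi_{M}(s); all constants are arbitrary choices of ours:

import numpy as np

rng = np.random.default_rng(2)
n, reps = 400, 2000
a = rng.standard_normal(n)                         # linear coefficients
band = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 3
Pi = np.triu(rng.standard_normal((n, n)) * band, 1)
Pi = Pi + Pi.T                                     # symmetric, zero diagonal, banded
var = a @ a + 2 * np.sum(Pi ** 2)                  # Var(a'e + e'Pi e) for iid N(0,1) e
stats = np.empty(reps)
for r in range(reps):
    eps = rng.standard_normal(n)
    stats[r] = (a @ eps + eps @ Pi @ eps) / np.sqrt(var)
print(stats.mean(), stats.var())                   # approx. 0 and 1, as the CLT predicts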


Proof of Proposition 4.1.

Since the proofs of (i) and (ii) are almost identical, we only prove (i). By the triangle inequality,

M^SnT(i,j,s)M(i,j,s)\displaystyle\left\|\widehat{M}^{S}_{nT}(i,j,s)-M(i,j,s)\right\|_{\infty} M^SnT(i,j,s)MS(i,j,s)+MS(i,j,s)M(i,j,s),\displaystyle\leq\left\|\widehat{M}^{S}_{nT}(i,j,s)-M^{S}(i,j,s)\right\|_{\infty}+\left\|M^{S}(i,j,s)-M(i,j,s)\right\|_{\infty}, (B.207)

where MS(i,j,s)=0SWn𝒆iγ(β0j,s)M^{S}(i,j,s)\coloneqq\sum_{\ell=0}^{S}W_{n}^{\ell}\bm{e}_{i}\gamma^{\ell}(\beta_{0j},s). For the first term on the right-hand side, observe that

|{M^SnT(i,j,s)}k{MS(i,j,s)}k|\displaystyle\left|\{\widehat{M}^{S}_{nT}(i,j,s)\}_{k}-\{M^{S}(i,j,s)\}_{k}\right| =0S|{Wn𝒆i}k||γ^nT(β^nT,j,s)γ(β0j,s)|\displaystyle\leq\sum_{\ell=0}^{S}\left|\{W_{n}^{\ell}\bm{e}_{i}\}_{k}\right|\cdot\left|\widehat{\gamma}_{nT}^{\ell}(\widehat{\beta}_{nT,j},s)-\gamma^{\ell}(\beta_{0j},s)\right| (B.208)
=0S|γ^nT(β^nT,j,s)γ(β0j,s)|\displaystyle\lesssim\sum_{\ell=0}^{S}\left|\widehat{\gamma}_{nT}^{\ell}(\widehat{\beta}_{nT,j},s)-\gamma^{\ell}(\beta_{0j},s)\right| (B.209)

for all k[n]k\in[n]. By definition, when =0\ell=0,

|γ^nT0(β^nT,j,s)γ0(β0j,s)|\displaystyle\left|\widehat{\gamma}_{nT}^{0}(\widehat{\beta}_{nT,j},s)-\gamma^{0}(\beta_{0j},s)\right| =|β^nT,j(s)β0j(s)|pcn\displaystyle=\left|\widehat{\beta}_{nT,j}(s)-\beta_{0j}(s)\right|\lesssim_{p}c_{n} (B.210)

uniformly in s[0,1]s\in[0,1], where cnK/nT+K1πc_{n}\coloneqq\sqrt{K}/\sqrt{nT}+K^{1-\pi}. When =1\ell=1, by Assumption 3.4,

|γ^nT1(β^nT,j,s)γ1(β0j,s)|\displaystyle\left|\widehat{\gamma}_{nT}^{1}(\widehat{\beta}_{nT,j},s)-\gamma^{1}(\beta_{0j},s)\right| =|α^nT(s)A(β^nT,j,s)α0(s)A(β0j,s)|\displaystyle=\left|\widehat{\alpha}_{nT}(s)A(\widehat{\beta}_{nT,j},s)-\alpha_{0}(s)A(\beta_{0j},s)\right| (B.211)
|α^nT(s)||A(β^nT,jβ0j,s)|+|α^nT(s)α0(s)||A(β0j,s)|\displaystyle\leq|\widehat{\alpha}_{nT}(s)|\cdot|A(\widehat{\beta}_{nT,j}-\beta_{0j},s)|+|\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)|\cdot|A(\beta_{0j},s)| (B.212)
p(α¯0+cn)sups[0,1]|β^nT,j(s)β0j(s)|+cn\displaystyle\lesssim_{p}(\overline{\alpha}_{0}+c_{n})\cdot\sup_{s\in[0,1]}|\widehat{\beta}_{nT,j}(s)-\beta_{0j}(s)|+c_{n} (B.213)
α¯0cn+cn\displaystyle\lesssim\overline{\alpha}_{0}c_{n}+c_{n} (B.214)

uniformly in s[0,1]s\in[0,1]. Similarly, when =2\ell=2, we have

|γ^nT2(β^nT,j,s)γ2(β0j,s)|\displaystyle\left|\widehat{\gamma}_{nT}^{2}(\widehat{\beta}_{nT,j},s)-\gamma^{2}(\beta_{0j},s)\right| =|α^nT(s)A(γ^nT1(β^nT,j,),s)α0(s)A(γ1(β0j,),s)|\displaystyle=\left|\widehat{\alpha}_{nT}(s)A(\widehat{\gamma}_{nT}^{1}(\widehat{\beta}_{nT,j},\cdot),s)-\alpha_{0}(s)A(\gamma^{1}(\beta_{0j},\cdot),s)\right| (B.215)
|α^nT(s)||A(γ^nT1(β^nT,j,)γ1(β0j,),s)|+|α^nT(s)α0(s)||A(γ1(β0j,),s)|\displaystyle\leq|\widehat{\alpha}_{nT}(s)|\cdot|A(\widehat{\gamma}_{nT}^{1}(\widehat{\beta}_{nT,j},\cdot)-\gamma^{1}(\beta_{0j},\cdot),s)|+|\widehat{\alpha}_{nT}(s)-\alpha_{0}(s)|\cdot|A(\gamma^{1}(\beta_{0j},\cdot),s)| (B.216)
p(α¯0+cn)(α¯0cn+cn)+α¯0cn\displaystyle\lesssim_{p}(\overline{\alpha}_{0}+c_{n})\cdot(\overline{\alpha}_{0}c_{n}+c_{n})+\overline{\alpha}_{0}c_{n} (B.217)
α¯02cn+α¯0cn.\displaystyle\lesssim\overline{\alpha}_{0}^{2}c_{n}+\overline{\alpha}_{0}c_{n}. (B.218)

Thus, repeating the same computation recursively, we can obtain |γ^nT(β^nT,j,s)γ(β0j,s)|pα¯01cn|\widehat{\gamma}_{nT}^{\ell}(\widehat{\beta}_{nT,j},s)-\gamma^{\ell}(\beta_{0j},s)|\lesssim_{p}\overline{\alpha}_{0}^{\ell-1}c_{n} for general 1\ell\geq 1 under α¯0<1\overline{\alpha}_{0}<1. From a straightforward calculation, we have =1Sα¯01cn=cn(1α¯0S)/(1α¯0)\sum_{\ell=1}^{S}\overline{\alpha}_{0}^{\ell-1}c_{n}=c_{n}(1-\overline{\alpha}_{0}^{S})/(1-\overline{\alpha}_{0}), which leads to ||M^SnT(i,j,s)MS(i,j,s)||pcn||\widehat{M}^{S}_{nT}(i,j,s)-M^{S}(i,j,s)||_{\infty}\lesssim_{p}c_{n}.

Next, observe that MS(i,j,s)M(i,j,s)==S+1Wn𝒆iγ(β0j,s)M^{S}(i,j,s)-M(i,j,s)=\sum_{\ell=S+1}^{\infty}W_{n}^{\ell}\bm{e}_{i}\gamma^{\ell}(\beta_{0j},s) and that

|γ0(β0j,s)|β¯0j,|γ1(β0j,s)|α¯0β¯0j,,|γ(β0j,s)|α¯0β¯0j\displaystyle|\gamma^{0}(\beta_{0j},s)|\leq\overline{\beta}_{0j},\;\;|\gamma^{1}(\beta_{0j},s)|\leq\overline{\alpha}_{0}\overline{\beta}_{0j},\;\;\ldots,\;\;|\gamma^{\ell}(\beta_{0j},s)|\leq\overline{\alpha}_{0}^{\ell}\overline{\beta}_{0j} (B.219)

by repeatedly applying Assumption 3.4, where β¯0jsups[0,1]|β0j(s)|\overline{\beta}_{0j}\coloneqq\sup_{s\in[0,1]}|\beta_{0j}(s)|. Hence, we have

MS(i,j,s)M(i,j,s)=S+1|γ(β0j,s)|β¯0j1α¯0α¯0S+1.\displaystyle\left\|M^{S}(i,j,s)-M(i,j,s)\right\|_{\infty}\lesssim\sum_{\ell=S+1}^{\infty}|\gamma^{\ell}(\beta_{0j},s)|\leq\frac{\overline{\beta}_{0j}}{1-\overline{\alpha}_{0}}\cdot\overline{\alpha}_{0}^{S+1}. (B.220)

Combining these results completes the proof.
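
As a computational aside (not part of the formal argument), the truncated series \widehat{M}^{S}_{nT}(i,j,s) can be evaluated by exactly the recursion used above: build \widehat{\gamma}^{\ell}_{nT} iteratively and accumulate W_{n}^{\ell}\bm{e}_{i} by repeated multiplication. The sketch below is illustrative only; it assumes that the estimates \widehat{\alpha}_{nT}(s_{l}) and \widehat{\beta}_{nT,j}(s_{l}) are stored on a grid of L evaluation points and that the operator A(\cdot,s) is supplied as a callable (its concrete form depends on the model). All names are hypothetical.

```python
import numpy as np

def gamma_sequence(alpha_hat, beta_hat_j, A, S):
    # gamma^0 = beta_hat_j; gamma^ell(s) = alpha_hat(s) * A(gamma^{ell-1})(s).
    # alpha_hat, beta_hat_j: (L,) arrays on the s-grid; A: callable taking
    # and returning an (L,) grid function (assumed given by the model).
    gammas = [np.asarray(beta_hat_j, dtype=float)]
    for _ in range(S):
        gammas.append(np.asarray(alpha_hat) * A(gammas[-1]))
    return gammas  # list of S + 1 arrays, each of shape (L,)

def truncated_effect(W, i, gammas):
    # M_hat^S(i, j, s_l) = sum over ell = 0..S of (W^ell e_i) * gamma^ell(s_l),
    # returned as an (n, L) array: row k holds {M_hat^S(i, j, s_l)}_k.
    n = W.shape[0]
    v = np.zeros(n)
    v[i] = 1.0                        # e_i
    M = np.zeros((n, len(gammas[0])))
    for g in gammas:
        M += np.outer(v, g)           # add the (W^ell e_i) gamma^ell term
        v = W @ v                     # advance to W^{ell+1} e_i
    return M
```

Since the truncation error decays at the geometric rate \overline{\alpha}_{0}^{S+1}, a small S, chosen so that \overline{\alpha}_{0}^{S+1} falls below a prescribed tolerance, already suffices in practice.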

Appendix C Consistent variance estimation

First, observe the following alternative representations of 𝒱z,nT\mathcal{V}_{z,nT} and 𝒱ab,nT\mathcal{V}_{ab,nT}:

𝒱z,nT\displaystyle\mathcal{V}_{z,nT} =1L2n(T1)l=1Ll=1Lt=1T1t:|tt|1i=1nzit(sl)zit(sl)𝔼[εit(sl)εit(sl)]\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\mathbb{E}[\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})] (C.1)
𝒱ab,nT\displaystyle\mathcal{V}_{ab,nT} =2L2n(T1)l=1Ll=1Lt=1T1t:|tt|11i,jnpa,i,jpb,i,j𝔼[εit(sl)εit(sl)]𝔼[εjt(sl)εjt(sl)].\displaystyle=\frac{2}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{1\leq i,j\leq n}p_{a,i,j}p_{b,i,j}\mathbb{E}[\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})]\mathbb{E}[\vec{\varepsilon}_{jt}(s_{l})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}})]. (C.2)

Define e^it(s)ei,t+1(s;θ^nT)eit(s;θ^nT)\vec{\widehat{e}}_{it}(s)\coloneqq e_{i,t+1}(s;\widehat{\theta}_{nT})-e_{it}(s;\widehat{\theta}_{nT}),

𝒱^z,nT\displaystyle\widehat{\mathcal{V}}_{z,nT} =1L2n(T1)l=1Ll=1Lt=1T1t:|tt|1i=1nzit(sl)zit(sl)e^it(sl)e^it(sl)\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\vec{\widehat{e}}_{it}(s_{l})\vec{\widehat{e}}_{it^{\prime}}(s_{l^{\prime}}) (C.3)
𝒱^ab,nT\displaystyle\widehat{\mathcal{V}}_{ab,nT} =2L2n(T1)l=1Ll=1Lt=1T1t:|tt|11i,jnpa,i,jpb,i,je^it(sl)e^it(sl)e^jt(sl)e^jt(sl)\displaystyle=\frac{2}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{1\leq i,j\leq n}p_{a,i,j}p_{b,i,j}\vec{\widehat{e}}_{it}(s_{l})\vec{\widehat{e}}_{it^{\prime}}(s_{l^{\prime}})\vec{\widehat{e}}_{jt}(s_{l})\vec{\widehat{e}}_{jt^{\prime}}(s_{l^{\prime}}) (C.4)
𝒱^nT\displaystyle\widehat{\mathcal{V}}_{nT} (𝒱^z,nT𝟎(dq+dx)K×1𝟎(dq+dx)K×1𝟎1×(dq+dx)K𝒱^11,nT𝒱^1M,nT𝟎1×(dq+dx)K𝒱^M1,nT𝒱^MM,nT)\displaystyle\coloneqq\left(\begin{array}[]{cccc}\widehat{\mathcal{V}}_{z,nT}&\bm{0}_{(d_{q}+d_{x})K\times 1}&\cdots&\bm{0}_{(d_{q}+d_{x})K\times 1}\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\widehat{\mathcal{V}}_{11,nT}&\cdots&\widehat{\mathcal{V}}_{1M,nT}\\ \vdots&\vdots&\ddots&\vdots\\ \bm{0}_{1\times(d_{q}+d_{x})K}&\widehat{\mathcal{V}}_{M1,nT}&\cdots&\widehat{\mathcal{V}}_{MM,nT}\end{array}\right) (C.9)
Σ^nT\displaystyle\widehat{\Sigma}_{nT} (J¯nT(θ^nT)ΩnTJ¯nT(θ^nT))1J¯nT(θ^nT)ΩnT𝒱^nTΩnTJ¯nT(θ^nT)(J¯nT(θ^nT)ΩnTJ¯nT(θ^nT))1.\displaystyle\coloneqq\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\widehat{\theta}_{nT})\right)^{-1}\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\widehat{\mathcal{V}}_{nT}\Omega_{nT}\overline{J}_{nT}(\widehat{\theta}_{nT})\left(\overline{J}_{nT}(\widehat{\theta}_{nT})^{\top}\Omega_{nT}\overline{J}_{nT}(\widehat{\theta}_{nT})\right)^{-1}. (C.10)

Then, our variance estimators for \sigma_{nT,\alpha}(s) and \sigma_{nT,j}(s), j\in[d_{x}], are given as

σ^nT,α(s)\displaystyle\widehat{\sigma}_{nT,\alpha}(s) ϕK(s)𝕊αΣ^nT𝕊αϕK(s)\displaystyle\coloneqq\sqrt{\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\widehat{\Sigma}_{nT}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)} (C.11)
σ^nT,j(s)\displaystyle\widehat{\sigma}_{nT,j}(s) ϕK(s)𝕊jΣ^nT𝕊jϕK(s),\displaystyle\coloneqq\sqrt{\phi^{K}(s)^{\top}\mathbb{S}_{j}\widehat{\Sigma}_{nT}\mathbb{S}_{j}^{\top}\phi^{K}(s)}, (C.12)

respectively.
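
These estimators translate directly into matrix computations. The following is a minimal sketch (not part of the paper) of (C.3) and of the sandwich formula (C.10)–(C.12); it assumes the instrument arrays \vec{z}_{it}(s_{l}), the differenced residuals \vec{\widehat{e}}_{it}(s_{l}), the Jacobian \overline{J}_{nT}(\widehat{\theta}_{nT}), the weighting matrix \Omega_{nT}, the selector \mathbb{S}_{\alpha}, and the basis vector \phi^{K}(s) have been computed elsewhere, and all function and variable names are hypothetical. The remaining blocks \widehat{\mathcal{V}}_{ab,nT} of \widehat{\mathcal{V}}_{nT} would be assembled analogously from (C.4).

```python
import numpy as np

def V_z_hat(z, e_hat):
    # Sample analogue of (C.3).  z: (n, T-1, L, d) array of z_it(s_l);
    # e_hat: (n, T-1, L) array of the differenced residuals e_hat_it(s_l).
    n, Tm1, L, d = z.shape
    V = np.zeros((d, d))
    for t in range(Tm1):
        for tp in (t - 1, t, t + 1):           # the |t' - t| <= 1 band
            if 0 <= tp < Tm1:
                # a_i = (1/L) sum_l e_hat_it(s_l) z_it(s_l); b_i likewise at t'
                a = np.einsum('il,ild->id', e_hat[:, t], z[:, t]) / L
                b = np.einsum('il,ild->id', e_hat[:, tp], z[:, tp]) / L
                V += a.T @ b                    # accumulates sum_i a_i b_i'
    return V / (n * Tm1)

def sigma_hat(phi_K_s, S_sel, J_bar, Omega, V_hat):
    # Sandwich matrix (C.10) and pointwise standard error (C.11)/(C.12).
    B_inv = np.linalg.inv(J_bar.T @ Omega @ J_bar)
    Sigma = B_inv @ J_bar.T @ Omega @ V_hat @ Omega @ J_bar @ B_inv
    v = S_sel.T @ phi_K_s                       # S' phi^K(s)
    return float(np.sqrt(v @ Sigma @ v))
```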

Proposition C.1 (Consistent variance estimation).

Suppose that the assumptions in Theorem 3.2 are satisfied. In addition, assume that K^{3}/(nT)\to 0 and K^{2-\pi}\to 0 as nT\to\infty (for instance, K\propto(nT)^{1/4} with \pi>2 satisfies both conditions). Then,

  • (i) \left\|\widehat{\mathcal{V}}_{z,nT}-\mathcal{V}_{z,nT}\right\|\lesssim_{p}K^{3/2}/\sqrt{nT}+K^{2-\pi};
  • (ii) \left\|\widehat{\mathcal{V}}_{ab,nT}-\mathcal{V}_{ab,nT}\right\|\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi} for all 1\leq a,b\leq M;
  • (iii) \widehat{\sigma}_{nT,\alpha}(s)/\sigma_{nT,\alpha}(s)\overset{p}{\to}1;
  • (iv) \widehat{\sigma}_{nT,j}(s)/\sigma_{nT,j}(s)\overset{p}{\to}1 for all j\in[d_{x}].

Proof.

(i) Decompose

𝒱^z,nT𝒱z,nT=(𝒱^z,nT𝒱~z,nT)+(𝒱~z,nT𝒱z,nT),\displaystyle\widehat{\mathcal{V}}_{z,nT}-\mathcal{V}_{z,nT}=\left(\widehat{\mathcal{V}}_{z,nT}-\widetilde{\mathcal{V}}_{z,nT}\right)+\left(\widetilde{{\mathcal{V}}}_{z,nT}-\mathcal{V}_{z,nT}\right), (C.13)

where

𝒱~z,nT\displaystyle\widetilde{\mathcal{V}}_{z,nT} =1L2n(T1)l=1Ll=1Lt=1T1t:|tt|1i=1nzit(sl)zit(sl)εit(sl)εit(sl).\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}}). (C.14)

Since

e^it(s)=ait(s)[α0(s)α^nT(s)]+j=1dxXitj[β0j(s)β^nT,j(s)]bit(s)+εit(s),\displaystyle\vec{\widehat{e}}_{it}(s)=\underbracket{\vec{a}_{it}(s)[\alpha_{0}(s)-\widehat{\alpha}_{nT}(s)]+\sum_{j=1}^{d_{x}}\vec{X}_{it}^{j}[\beta_{0j}(s)-\widehat{\beta}_{nT,j}(s)]}_{\eqqcolon b_{it}(s)}+\vec{\varepsilon}_{it}(s), (C.15)

we have

\displaystyle\vec{\widehat{e}}_{it}(s_{l})\vec{\widehat{e}}_{it^{\prime}}(s_{l^{\prime}})=\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})+b_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}})+b_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})+\vec{\varepsilon}_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}}). (C.16)

Then, we can write

𝒱^z,nT𝒱~z,nT\displaystyle\widehat{\mathcal{V}}_{z,nT}-\widetilde{\mathcal{V}}_{z,nT} =1L2n(T1)l=1Ll=1Lt=1T1t:|tt|1i=1nzit(sl)zit(sl)[e^it(sl)e^it(sl)εit(sl)εit(sl)]\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}[\vec{\widehat{e}}_{it}(s_{l})\vec{\widehat{e}}_{it^{\prime}}(s_{l^{\prime}})-\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})] (C.17)
=1L2n(T1)l=1Ll=1Lt=1T1t:|tt|1i=1nzit(sl)zit(sl)bit(sl)bit(sl)\displaystyle=\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}b_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}}) (C.18)
\displaystyle\quad+\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}b_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}}) (C.19)
+1L2n(T1)l=1Ll=1Lt=1T1t:|tt|1i=1nzit(sl)zit(sl)εit(sl)bit(sl).\displaystyle\quad+\frac{1}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{i=1}^{n}\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\vec{\varepsilon}_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}}). (C.20)

In view of Theorem 3.1(ii) and (iii), |b_{it}(s)|\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}=c_{n} uniformly in s and (i,t). In addition, ||\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}||\lesssim K, and both ||\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\vec{\varepsilon}_{it}(s_{l})|| and ||\vec{z}_{it}(s_{l})\vec{z}_{it^{\prime}}(s_{l^{\prime}})^{\top}\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})|| are \lesssim_{p}K by Markov’s inequality. Hence, the term in (C.18) is \lesssim_{p}Kc_{n}^{2}, and those in (C.19) and (C.20) are \lesssim_{p}Kc_{n}, so that ||\widehat{\mathcal{V}}_{z,nT}-\widetilde{\mathcal{V}}_{z,nT}||\lesssim_{p}Kc_{n}=K^{3/2}/\sqrt{nT}+K^{2-\pi}.

Meanwhile, it is not difficult to see that ||𝒱~z,nT𝒱z,nT||pK/nT||\widetilde{\mathcal{V}}_{z,nT}-\mathcal{V}_{z,nT}||\lesssim_{p}K/\sqrt{nT} by Markov’s inequality. Then, the result follows from the triangle inequality.


(ii) Similar to the above, we decompose

𝒱^ab,nT𝒱ab,nT=(𝒱^ab,nT𝒱~ab,nT)+(𝒱~ab,nT𝒱ab,nT),\displaystyle\widehat{\mathcal{V}}_{ab,nT}-\mathcal{V}_{ab,nT}=\left(\widehat{\mathcal{V}}_{ab,nT}-\widetilde{\mathcal{V}}_{ab,nT}\right)+\left(\widetilde{{\mathcal{V}}}_{ab,nT}-\mathcal{V}_{ab,nT}\right), (C.21)

where

𝒱~ab,nT\displaystyle\widetilde{\mathcal{V}}_{ab,nT} =2L2n(T1)l=1Ll=1Lt=1T1t:|tt|11i,jnpa,i,jpb,i,jεit(sl)εit(sl)εjt(sl)εjt(sl).\displaystyle=\frac{2}{L^{2}n(T-1)}\sum_{l=1}^{L}\sum_{l^{\prime}=1}^{L}\sum_{t=1}^{T-1}\sum_{t^{\prime}:\>|t^{\prime}-t|\leq 1}\sum_{1\leq i,j\leq n}p_{a,i,j}p_{b,i,j}\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})\vec{\varepsilon}_{jt}(s_{l})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}}). (C.22)

For the first term on the right-hand side, noting that

\displaystyle\vec{\widehat{e}}_{it}(s_{l})\vec{\widehat{e}}_{it^{\prime}}(s_{l^{\prime}})\vec{\widehat{e}}_{jt}(s_{l})\vec{\widehat{e}}_{jt^{\prime}}(s_{l^{\prime}}) =\{\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})+b_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}})+b_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})+\vec{\varepsilon}_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}})\} (C.23)
\displaystyle\quad\times\{\vec{\varepsilon}_{jt}(s_{l})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}})+b_{jt}(s_{l})b_{jt^{\prime}}(s_{l^{\prime}})+b_{jt}(s_{l})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}})+\vec{\varepsilon}_{jt}(s_{l})b_{jt^{\prime}}(s_{l^{\prime}})\} (C.24)
\displaystyle=\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})\vec{\varepsilon}_{jt}(s_{l})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}}) (C.25)
\displaystyle\quad+\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})\vec{\varepsilon}_{jt^{\prime}}(s_{l^{\prime}})b_{jt}(s_{l})+\cdots\;\text{(remaining terms with three $\vec{\varepsilon}$'s and one $b$)} (C.26)
\displaystyle\quad+\vec{\varepsilon}_{it}(s_{l})\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})b_{jt}(s_{l})b_{jt^{\prime}}(s_{l^{\prime}})+\cdots\;\text{(remaining terms with two $\vec{\varepsilon}$'s and two $b$'s)} (C.27)
\displaystyle\quad+\vec{\varepsilon}_{it^{\prime}}(s_{l^{\prime}})b_{it}(s_{l})b_{jt}(s_{l})b_{jt^{\prime}}(s_{l^{\prime}})+\cdots\;\text{(remaining terms with one $\vec{\varepsilon}$ and three $b$'s)} (C.28)
\displaystyle\quad+b_{it}(s_{l})b_{it^{\prime}}(s_{l^{\prime}})b_{jt}(s_{l})b_{jt^{\prime}}(s_{l^{\prime}}), (C.29)

it is not difficult to show that |𝒱^ab,nT𝒱~ab,nT|pK/nT+K1π|\widehat{\mathcal{V}}_{ab,nT}-\widetilde{\mathcal{V}}_{ab,nT}|\lesssim_{p}\sqrt{K}/\sqrt{nT}+K^{1-\pi}.

For the second term, following an argument analogous to the proof of Proposition 2 of Lin and Lee (2010), we have |\widetilde{{\mathcal{V}}}_{ab,nT}-\mathcal{V}_{ab,nT}|\lesssim_{p}1/\sqrt{nT}. The desired result then follows from the triangle inequality.


(iii) To prove the result, it suffices to show that

|[σ^nT,α(s)]2[σnT,α(s)]2[σnT,α(s)]2|𝑝0.\displaystyle\left|\frac{[\widehat{\sigma}_{nT,\alpha}(s)]^{2}-[\sigma_{nT,\alpha}(s)]^{2}}{[\sigma_{nT,\alpha}(s)]^{2}}\right|\overset{p}{\to}0. (C.30)

As shown in the proof of Theorem 3.2, [\sigma_{nT,\alpha}(s)]^{2} is bounded below by c||\phi^{K}(s)||^{2}. On the other hand, writing R_{nT}\coloneqq\Omega_{nT}\overline{J}_{nT}\left(\overline{J}^{\top}_{nT}\Omega_{nT}\overline{J}_{nT}\right)^{-1} and denoting its plug-in counterpart by \widehat{R}_{nT}, we have, by the triangle inequality,

|[σ^nT,α(s)]2[σnT,α(s)]2|\displaystyle\left|[\widehat{\sigma}_{nT,\alpha}(s)]^{2}-[\sigma_{nT,\alpha}(s)]^{2}\right| |ϕK(s)𝕊α[Σ^nTΣnT]𝕊αϕK(s)|\displaystyle\leq\left|\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}\left[\widehat{\Sigma}_{nT}-\Sigma_{nT}\right]\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)\right| (C.31)
𝒱^nT𝒱nTR^nT𝕊αϕK(s)2+λmax(𝒱^nT){R^nTRnT}𝕊αϕK(s)2\displaystyle\leq\left\|\widehat{\mathcal{V}}_{nT}-\mathcal{V}_{nT}\right\|\cdot\left\|\widehat{R}_{nT}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)\right\|^{2}+\lambda_{\max}(\widehat{\mathcal{V}}_{nT})\cdot\left\|\left\{\widehat{R}_{nT}-R_{nT}\right\}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)\right\|^{2} (C.32)
\displaystyle\quad+2\left|\phi^{K}(s)^{\top}\mathbb{S}_{\alpha}R_{nT}^{\top}\widehat{\mathcal{V}}_{nT}\left\{\widehat{R}_{nT}-R_{nT}\right\}\mathbb{S}_{\alpha}^{\top}\phi^{K}(s)\right| (C.33)
p(K3/2/nT+K2π)ϕK(s)2,\displaystyle\lesssim_{p}\left(K^{3/2}/\sqrt{nT}+K^{2-\pi}\right)\left\|\phi^{K}(s)\right\|^{2}, (C.34)

where the last inequality follows from ||\widehat{R}_{nT}-R_{nT}||\lesssim_{p}K/\sqrt{nT}+K^{(1-2\pi)/2}, which holds by Lemma B.9(i) and (iv), and from ||\widehat{\mathcal{V}}_{nT}-\mathcal{V}_{nT}||\lesssim_{p}K^{3/2}/\sqrt{nT}+K^{2-\pi}, which holds by results (i) and (ii). Thus, we have

|[σ^nT,α(s)]2[σnT,α(s)]2[σnT,α(s)]2|pK3/2/nT+K2π0.\displaystyle\left|\frac{[\widehat{\sigma}_{nT,\alpha}(s)]^{2}-[\sigma_{nT,\alpha}(s)]^{2}}{[\sigma_{nT,\alpha}(s)]^{2}}\right|\lesssim_{p}K^{3/2}/\sqrt{nT}+K^{2-\pi}\to 0. (C.35)

(iv) The proof is analogous to that of (iii). ∎

Appendix D Supplementary figures for the empirical analysis

Figure D.1: Distribution of potential bike relocation events. [figure omitted]

Figure D.2: Estimated \beta_{0}(s); panels (a) X^{1}_{it}: ratio of round trips, (b) X^{2}_{it}: ratio of departing subscribers, (c) X^{3}_{it}: ratio of arriving subscribers, (d) X^{4}_{it}: rainy day. [figures omitted]

References

  • Belloni et al. (2015) Belloni, A., Chernozhukov, V., Chetverikov, D., and Kato, K., 2015. Some new asymptotic theory for least squares series: Pointwise and uniform results, Journal of Econometrics, 186 (2), 345–366.
  • Beyaztas et al. (2024) Beyaztas, U., Shang, H.L., Sezer, G.B., Mandal, A., Zoh, R.S., and Tekwe, C.D., 2024. Spatial function-on-function regression, arXiv preprint, 2412.17327.
  • Chen and Müller (2012) Chen, K. and Müller, H.G., 2012. Modeling repeated functional observations, Journal of the American Statistical Association, 107 (500), 1599–1609.
  • Chen (2007) Chen, X., 2007. Large sample sieve estimation of semi-nonparametric models, in: Handbook of Econometrics, vol. 6, Elsevier, 5549–5632.
  • Delicado (2011) Delicado, P., 2011. Dimensionality reduction when data are density functions, Computational Statistics & Data Analysis, 55 (1), 401–420.
  • Denbee et al. (2021) Denbee, E., Julliard, C., Li, Y., and Yuan, K., 2021. Network risk and key players: A structural analysis of interbank liquidity, Journal of Financial Economics, 141 (3), 831–859.
  • Di et al. (2024) Di, C., Wang, G., Wu, S., Evenson, K.R., LaMonte, M.J., and LaCroix, A.Z., 2024. Utilizing wearable devices to improve precision in physical activity epidemiology: Sensors, data and analytic methods, in: Statistics in Precision Health: Theory, Methods and Applications, Springer, 41–64.
  • Dudley and Philipp (1983) Dudley, R.M. and Philipp, W., 1983. Invariance principles for sums of Banach space valued random elements and empirical processes, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 62 (4), 509–552.
  • Eren and Uz (2020) Eren, E. and Uz, V.E., 2020. A review on bike-sharing: The factors affecting bike-sharing demand, Sustainable Cities and Society, 54, 101882.
  • Faghih-Imani and Eluru (2016) Faghih-Imani, A. and Eluru, N., 2016. Incorporating the impact of spatio-temporal interactions on bicycle sharing system demand: A case study of New York CitiBike system, Journal of Transport Geography, 54, 218–227.
  • Gallant and White (1988) Gallant, R. and White, H., 1988. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models, Blackwell.
  • Hoshino (2022) Hoshino, T., 2022. Sieve IV estimation of cross-sectional interaction models with nonparametric endogenous effect, Journal of Econometrics, 229 (2), 263–275.
  • Hoshino (2024) Hoshino, T., 2024. Functional spatial autoregressive models, arXiv preprint, 2402.14763.
  • Hron et al. (2016) Hron, K., Menafoglio, A., Templ, M., Hruzová, K., and Filzmoser, P., 2016. Simplicial principal component analysis for density functions in Bayes spaces, Computational Statistics & Data Analysis, 94, 330–350.
  • Hyndman and Ullah (2007) Hyndman, R.J. and Ullah, M.S., 2007. Robust forecasting of mortality and fertility rates: A functional data approach, Computational Statistics & Data Analysis, 51 (10), 4942–4956.
  • Jenish (2012) Jenish, N., 2012. Nonparametric spatial regression under near-epoch dependence, Journal of Econometrics, 167 (1), 224–239.
  • Jenish and Prucha (2009) Jenish, N. and Prucha, I.R., 2009. Central limit theorems and uniform laws of large numbers for arrays of random fields, Journal of Econometrics, 150 (1), 86–98.
  • Jenish and Prucha (2012) Jenish, N. and Prucha, I.R., 2012. On spatial processes and asymptotic inference under near-epoch dependence, Journal of Econometrics, 170 (1), 178–190.
  • Koop et al. (1996) Koop, G., Pesaran, M.H., and Potter, S.M., 1996. Impulse response analysis in nonlinear multivariate models, Journal of Econometrics, 74 (1), 119–147.
  • Kress (2014) Kress, R., 2014. Linear Integral Equations, Third Edition, Springer.
  • Kuersteiner and Prucha (2020) Kuersteiner, G.M. and Prucha, I.R., 2020. Dynamic spatial panel models: Networks, common shocks, and sequential exogeneity, Econometrica, 88 (5), 2109–2146.
  • Lee and Yu (2010) Lee, L.F. and Yu, J., 2010. Estimation of spatial autoregressive panel data models with fixed effects, Journal of Econometrics, 154 (2), 165–185.
  • Lee and Yu (2014) Lee, L.F. and Yu, J., 2014. Efficient GMM estimation of spatial dynamic panel data models with fixed effects, Journal of Econometrics, 180 (2), 174–197.
  • Lin et al. (2018) Lin, L., He, Z., and Peeta, S., 2018. Predicting station-level hourly demand in a large-scale bike-sharing network: A graph convolutional neural network approach, Transportation Research Part C: Emerging Technologies, 97, 258–276.
  • Lin and Lee (2010) Lin, X. and Lee, L.F., 2010. GMM estimation of spatial autoregressive models with unknown heteroskedasticity, Journal of Econometrics, 157 (1), 34–52.
  • Ma et al. (2024) Ma, T., Yao, F., and Zhou, Z., 2024. Network-level traffic flow prediction: Functional time series vs. functional neural network approach, The Annals of Applied Statistics, 18 (1), 424–444.
  • Scott (1973) Scott, D.J., 1973. Central limit theorems for martingales and for processes with stationary increments using a Skorokhod representation approach, Advances in Applied Probability, 5 (1), 119–137.
  • Su and Hoshino (2016) Su, L. and Hoshino, T., 2016. Sieve instrumental variable quantile regression estimation of functional coefficient models, Journal of Econometrics, 191 (1), 231–254.
  • Torti et al. (2021) Torti, A., Pini, A., and Vantini, S., 2021. Modelling time-varying mobility flows using function-on-function regression: Analysis of a bike sharing system in the city of Milan, Journal of the Royal Statistical Society Series C: Applied Statistics, 70 (1), 226–247.
  • Xu and Lee (2015) Xu, X. and Lee, L.F., 2015. A spatial autoregressive model with a nonlinear transformation of the dependent variable, Journal of Econometrics, 186 (1), 1–18.
  • Yu et al. (2008) Yu, J., De Jong, R., and Lee, L.F., 2008. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large, Journal of Econometrics, 146 (1), 118–134.
  • Zhu et al. (2022) Zhu, X., Cai, Z., and Ma, Y., 2022. Network functional varying coefficient model, Journal of the American Statistical Association, 117 (540), 2074–2085.