Functional Network Autoregressive Models for Panel Data
Abstract
This study proposes a novel functional vector autoregressive framework for analyzing network interactions of functional outcomes in panel data settings. In this framework, an individual’s outcome function is influenced by the outcomes of others through a simultaneous equation system. To estimate the functional parameters of interest, we need to address the endogeneity issue arising from these simultaneous interactions among outcome functions. This issue is carefully handled by developing a novel functional moment-based estimator. We establish the consistency, convergence rate, and pointwise asymptotic normality of the proposed estimator. Additionally, we discuss the estimation of marginal effects and impulse response analysis. As an empirical illustration, we analyze the demand for a bike-sharing service in the U.S. The results reveal statistically significant spatial interactions in bike availability across stations, with interaction patterns varying over the time of day.
Keywords: bicycle-sharing systems, endogeneity, functional data analysis, panel data, network autoregressive models.
Introduction
The availability of functional data has been rapidly expanding across all fields of research, leading to a growing need for statistical tools that appropriately account for the unique characteristics of each type of functional data. In the analysis of socioeconomic data, there are at least two key aspects that should be addressed. The first is that an individual’s decision or behavioral pattern may influence that of others through social networks—interactions between individuals. The second is that individuals are intrinsically heterogeneous, even after controlling for observable characteristics—unobserved heterogeneity of individuals. Therefore, analyzing socioeconomic functional data requires functional models that jointly capture both of these aspects, which is the aim of this study.
More specifically, to account for the interactions among units, we extend the network (or spatial) autoregressive (NAR) modeling approach to a functional response model. To address the unobserved individual heterogeneity, we introduce the functional fixed effects approach, given the availability of panel data. When the response variable is a scalar rather than a function, there already exists a vast body of studies investigating fixed-effect NAR models for panel data, such as
(1.1) |
and its variants (e.g., Yu et al., 2008; Lee and Yu, 2010, 2014; Kuersteiner and Prucha, 2020, among others). Here, is a scalar outcome, denotes a known weight term measuring the social or geographical proximity between units and , is a vector of covariates, represents a fixed effect specific to each , and denotes an error term. The term captures the local trend of the outcome variable in the neighborhood of . Model (1.1) is typically applied in fields such as health, real estate, transportation, education, and municipal data. However, with the increasing availability of functional data in these fields, such as real-time activity recognition, real-time population mobility and congestion patterns, and regional wealth distributions, scalar models like (1.1) may fail to appropriately capture the complex nature of these interactions.
The above discussion motivates us to extend (1.1) to the following model: for ,
(1.2) |
where represents the outcome function of interest, which may or may not be a smooth function of , is the interaction effect function, is a vector of functional coefficients, is the fixed effect function, and is the functional error term with mean zero at each . Here, denotes a known linear functional, whose functional form may differ according to the research interest. Since the response variable is a function, we can consider various types of interaction patterns. The most typical form of interaction would be the ”concurrent” interaction, where only the responses of others at the same evaluation point are influential. In this case, is a point-evaluation functional at : . When represents a time, then the past outcome should affect the future outcome (but the converse should not), which would motivate us to employ , where is a user-chosen function that is non-negative, increasing in up to , and for . For other examples, if others’ responses at all evaluation points are equivalently influential, we may use . These examples can be represented as an integral operator with some kernel weight function . For example, in the case of point-evaluation functional, we can set , where denotes the Dirac delta function.
Here, we provide three types of empirical topics to which the model (1.2) would be nicely applied.
Example 1.1 (Health data analysis).
In the health literature, researchers have increasingly focused on real-time activity data collected through wearable devices or smartphone apps (see, e.g., Di et al. (2024) for a review). As a typical example, represents the activity level of individual , measured by an accelerometer at time on day . Now, suppose the dataset consists of an elderly population, where some people in the same neighborhood frequently engage in fitness activities such as running. If we apply our model to their activity-level data, with representing neighborhood membership and , we may observe a significantly positive interaction effect.
Example 1.2 (Demographic data analysis).
Demographic analysis is a major application of functional data analysis (FDA). For instance, functional analysis of regional age distributions (i.e., population pyramids) has been studied extensively (e.g., Delicado, 2011; Hron et al., 2016; Hoshino, 2024). Among these studies, Hoshino (2024) considered a functional spatial autoregressive model in which represents the -th age quantile of city , allowing interactions with the age quantiles of neighboring cities. Another common application in FDA is the analysis of mortality and fertility rates (e.g., Hyndman and Ullah, 2007; Chen and Müller, 2012). In such a setting, the outcome function may represent the age-specific fertility rate of women of age in city in year . The presence of regional interactions in fertility would be unsurprising.
Example 1.3 (Transportation data analysis).
Functional data analysis of transportation data, such as traffic flows and demand for transportation services, has been gaining significant attention (see, e.g., Ma et al. (2024) for recent advancements). In the empirical application of this study, we apply our model to analyze the bike use data in a U.S. bike-sharing system. In our empirical analysis, represents the availability of bikes at station at time during week . We find statistically significant positive or negative spatial interactions among bike availabilities of nearby stations, depending on the time of day. Further details are provided in Section 6.
In model (1.2), the parameters of our primary interest to be estimated are and . With the total time periods possibly large or small, we apply a first-differencing transformation to eliminate the individual fixed effects from the model. For the transformed model, rather than estimating and pointwise at many different -values separately, we approximate them using orthonormal basis expansions and estimate their entire functional forms jointly in a single estimation. Our proposed estimator is based on the generalized method of moments (GMM). Specifically, we first derive a set of moment conditions at each , in a similar manner to Lin and Lee (2010), and then integrate these conditions numerically over . These integrated moment functions define our GMM objective function, and the resulting estimator is referred to as the integrated-GMM estimator. Once and are estimated, if necessary, we can estimate the fixed effects simply by taking the individual-level mean of the residuals.
Note that in model (1.2), the outcome functions appear on both the left- and right-hand sides, implying that it is formulated as a system of simultaneous functional equations. Depending on the true values of the functional parameters, the model may exhibit an explosive network interaction process, leading to non-stationarity and inconsistency of the proposed estimator. Thus, we first derive the condition under which the model attains the stationarity in our context. We consequently show that the magnitude of network interactions must reside within a certain range.
Then, under the stationarity condition on network interactions, along with some regularity conditions, we derive the convergence rates of the integrated-GMM estimators for and . In addition, we prove that the estimators are asymptotically normal at each evaluation point . Due to the complexity of characterizing the stochastic process of functional outcomes, the numerical integration of moment functions, the first-differencing elimination of functional fixed effects, and the need to appropriately control the order of basis expansion, among other factors, establishing these results involves new mathematical challenges and requires careful discussion. These theoretical results are numerically corroborated through a series of Monte Carlo experiments.
As an empirical illustration of our method, we apply it to the demand analysis of a bike-sharing system in the San Francisco Bay Area, U.S. Using publicly available data from Bay Area Bike Share, we analyze spatial interactions in bike availability across 70 stations from May 2014 to August 2015. Our results reveal significant positive spatial interactions in bike availability during the morning hours, while negative interactions emerge in the early evening. Furthermore, we conduct an impulse response analysis to demonstrate how a reduction in bike availability at a given station propagates to nearby stations over time. These findings highlight the importance of spatial interactions in shared mobility services and demonstrate the practical applicability of our method.
Our paper relates to a broad range of theoretical and empirical literature. From a theoretical perspective, our study contributes to both the FDA literature and the network/spatial interactions literature by proposing a new model that connects the two. In this sense, one of the most closely related studies to ours is Zhu et al. (2022). They proposed a functional NAR model similar to (1.2) but not in panel data settings. In contrast to Zhu et al. (2022), our novel GMM estimator requires neither parametric assumptions nor I.I.D. conditions for the disturbance term. This weak requirement arises from that we treat the individual effects as parameters, whereas Zhu et al. (2022) perform functional principal component analysis to control them based on some homogeneity condition. Moreover, they considered only a concurrent interaction case (i.e., ). As described earlier through the examples of health data, demographic data, and transportation data, the variable typically represents a time on some scale. The concurrent interaction rules out interactions even with immediate past outcomes and allows only strictly simultaneous interactions, which should limit the applicability of the model. Computationally, our estimator can recover the full functional forms of the functional parameters in a single step, while the estimator in Zhu et al. (2022) must be repeatedly applied at each evaluation point .
On the empirical side, demand forecasting for bike-sharing systems has been an active topic in the data science literature (e.g., Faghih-Imani and Eluru, 2016; Lin et al., 2018; Eren and Uz, 2020; Torti et al., 2021, among others). Among these studies, Faghih-Imani and Eluru (2016) is most closely related to our study in that they employed a spatial panel model similar to (1.1) to analyze the spatial and temporal interaction structure for the bike-sharing system in New York City, CitiBike. In their approach, however, the data are not treated as functional, and thus the model parameters are not allowed to vary over time. By contrast, Torti et al. (2021) analyzed the flow of bikes in the bike-sharing system in Milan, BikeMi, through a functional linear model with functional coefficients; however, they did not account for the spatial interactions of mobility. Thus, our empirical study can be viewed as combining the strengths of these two papers.
The major contributions of this paper are summarized as follows: First, we propose a novel model for analyzing various forms of network and spatial interactions underlying socioeconomic functional data. Second, we formally establish a condition that ensures the outcome functions follow a unique network-stationary process within the model. Third, we develop a novel GMM-type estimator, the integrated-GMM estimator, for estimating the functional parameters. Fourth, we establish the asymptotic properties of the integrated-GMM estimator, including its consistency, rate of convergence, and pointwise limiting distribution. Fifth, we additionally develop a new approach for implementing network impulse response analysis and investigate its convergence property. Finally, we apply our method to the analysis of bike-sharing demand, offering new empirical insights into functional spatial interactions in mobility.
Paper organization
The rest of the paper is organized as follows. In Section 2, we present our model and discuss its stationarity condition. Section 3 introduces our integrated-GMM estimator and investigates its asymptotic properties. In Section 4, we discuss additional topics related to our model, including the estimation of marginal effects and network impulse response analysis. Section 5 conducts a set of Monte Carlo simulations to numerically demonstrate the properties of our estimator. Section 6 presents our empirical analysis on the U.S. bike-sharing data, and Section 7 concludes. Proofs of all technical results are provided in Appendix.
Notation
For a function defined on and , the norm of is written as , and denotes the set of ’s such that . For a random variable , the norm of is written as . For a matrix , , , and denote the Frobenius norm, the maximum absolute column sum, and the maximum absolute row sum of , respectively. If is a square matrix, we use and to denote its largest and smallest eigenvalues, respectively. For a positive integer , we denote . We use to denote an identity matrix of dimension . Finally, if almost surely, and if .
Functional Network Autoregressive Model
The model
Suppose that we have balanced panel data of size : . The number of time periods can be either fixed or tending to infinity jointly with the sample size . Here, denotes a random outcome function of interest with the common support , denotes a vector of covariates, and is the -th element of an time-invariant interaction matrix . The value of each is pre-determined non-randomly. In social network analysis, it is common to set , where is some normalizing constant. Similarly, if each represents a spatial unit, one may use , where is the distance between and , and is a given threshold. As is the convention, we set for all for normalization.
As shown in (1.2), our working model is given as follows: for ,
(2.1) |
where . Recall that we throughout assume is linear in its first argument so that we have . For the structure of network interaction, Beyaztas et al. (2024) and Hoshino (2024) consider alternatively the following form: , reflecting the usual functional linear form in the FDA literature. We cannot say which of this type of interaction structure or the proposed one is more general, but ours may offer some interpretational simplicity. We impose additional shape restrictions on later.
The parameters of primary interest are the interaction effect function and the coefficient functions . The functional individual effects are treated as nuisance parameters. Restricting the support of to be a unit interval is a normalization, which is harmless as long as the response functions have the identical interval support. For simplicity, we do not explicitly assume that is a function of , which can be relaxed easily at the expense of more complicated notations and proofs. A constant term is not included in . In the following, we assume that , , and that and are continuous.
Stationarity
We discuss the stationarity of our model. Recall that our model is a system of simultaneous functional equations, which may not have a unique interior solution in general, depending on the parameter values. Thus, just like the stability condition for vector autoregressive models in the time series literature, we need to impose some conditions on the model to ensure that the outcome functions follow a unique stationary data-generating process and prevent explosive behavior.
Let , , , , and . Then, we can re-write (1.2) in matrix form as
(2.2) |
This expression clearly indicates that our model is characterized as distinct systems of functional equations of size . We introduce the following assumption to ensure that the model has a unique stationary solution.
Assumption 2.1 (Stationarity).
(i) and such that , where . (ii) For any , .
Assumption 2.1(i) requires that the magnitude of the network interaction is not too strong. The existence of is guaranteed by the continuity of . With Assumption 2.1(ii), we have for any that , which indicates the contraction property of the operator . This assumption still accommodates many empirically interesting interaction patterns. For example, in the case of point-evaluation functional , it trivially satisfies . For another example, suppose for some continuous . Since
(2.3) |
where , Assumption 2.1(ii) is implied if holds.
Now, denote , and define a linear operator as
(2.4) |
Then, we can write our model symbolically as follows:
(2.5) |
Further, denoting Id to be the identity operator, if the inverse operator exists, the solution of the system can be uniquely determined up to an equivalence class in as
(2.6) |
The next proposition states that Assumption 2.1 is sufficient for the existence of .
Proposition 2.1.
Note that the explicit form of the inverse operator cannot be derived in general, except for some simple cases such as . In this case, is obtained as . However, in practice, we can approximate it with arbitrary precision by truncating the Neumann series expansion at a sufficiently large order (see, e.g., Kress, 2014). See Remark 1 in Zhu et al. (2022) for a related discussion.
Estimation and Asymptotic Theory
Integrated-GMM estimation
To estimate the unknown functional parameters and , there are broadly two approaches. The first is a ”local” approach that estimates the values of these functions at specific -values, repeating the estimation across different points to recover the full functional forms. The second is a ”global” approach that estimates the entire functional forms in a single step using a series approximation method. Although both approaches are theoretically valid, the local approach typically requires more computation time and often leads to larger variance (but smaller bias) because it does not exploit information from nearby evaluation points. This study adopts the global approach.
Let be a series of orthonormal basis functions. We throughout assume that ’s are continuous on . Then, if the functions and are sufficiently smooth, we can approximate
(3.1) | ||||
(3.2) |
uniformly in , for some coefficient vectors and , . Here, is a sequence of positive integers tending to infinity as increases. For simplicity of presentation, we use the same basis function and the same basis order to approximate both and . Define , ,
(3.3) |
Then, we can further re-write the model (1.2) as
(3.4) |
Here, is an vector of series approximation errors:
(3.5) |
Under the assumptions we will introduce, this approximation error diminishes to zero at a certain rate as goes to infinity. How to choose an appropriate will be discussed in Remark 3.2.
Further, let
(3.18) |
and be the one-period lag operator, whose -th element is defined as
(3.19) |
Then, we can remove the unknown fixed effects from the model in the following manner:
(3.20) |
We estimate based on this expression. In order to consistently estimate , we need to address the endogeneity issue caused by the simultaneous interactions of the response functions; that is, since is correlated with the error term in general, simply regressing on does not yield a consistent estimate of . To tackle this issue, we employ an instrumental variable (IV) approach.
Suppose we have a vector of IVs for . For example, noting , we can find that the network lagged covariates (and also their lags) are valid candidates for . Define
(3.21) |
and . Then, we have .
Now, although one can estimate based on the linear moment conditions only, which results in a two-stage least squares (2SLS) type estimator, we can utilize additionally the quadratic moment conditions to improve the efficiency of estimation (see, e.g., Lin and Lee, 2010). That is, under the independence assumption on the error terms (see Assumption 3.3(i) below), for any matrices , where () is an matrix whose diagonal elements are all zero, we have
(3.22) |
Some examples of include and .
Combining the linear and quadratic moment conditions, we can construct our estimator based on the following moment conditions: for ,
(3.27) |
As the empirical counterpart of these moment conditions, given a candidate value for , we define
(3.32) |
where . The estimator of is obtained by minimizing the norm of over . To this end, we pre-specify grid points in , denoted by , and numerically integrate the moment functions across these points:
(3.33) |
Situations where the response functions are not fully observable are discussed in Remark 3.3.
Now, we are ready to introduce our estimator:
(3.34) |
is a positive definite symmetric weight matrix, and is a compact parameter space containing in its interior. For one example of the weight matrix, we can use , or for another example,
(3.37) |
where
(3.38) |
Once is obtained, the estimators of and are given as
(3.39) | ||||
(3.40) |
which we refer to as the integrated-GMM estimators. Additionally, if one is interested in the estimation of individual fixed effect functions, the following estimator can be used:
(3.41) |
where . Consistent estimation of by requires to increase to infinity, while and can be consistently estimated even when is fixed. More specifically, noting that the error term has mean zero, to average out the errors applying the law of large numbers at each , must increase to infinity. Unlike and , the estimator is not necessarily continuous, as we do not preclude cases where and are discontinuous in .
Asymptotic theory
To derive the asymptotic properties of our estimator, we first need to specify the structure of our sampling space. Following Jenish and Prucha (2012), let , be a possibly uneven lattice, and be the set of observation locations. Once the observation locations are determined for a given sample of size , we assume that they do not vary over time . For spatial data, would be defined by a geographical space with . Note that does not necessarily have to be exactly observable to us. For example, is possibly a complex space of general social and economic characteristics. In this case, we can consider it to be an embedding of individuals in a latent space, rather than their physical locations.
We first derive the rates of convergence of our estimators under the following set of assumptions.
Assumption 3.1 (Sampling space).
(i) The maximum coordinate difference (i.e., the Chebyshev distance) between any two observations , which we denote as , is at least (without loss of generality) 1; and (ii) a threshold distance exists such that if .
Assumption 3.2 (Observables).
(i) are non-stochastic and uniformly bounded; and (ii) for all , , and , for some .
Assumption 3.3 (Error term).
(i) are independent; (ii) for all , , and , , , and ; and (iii) for all and , uniformly in , where .
Assumption 3.4 (Interaction operator).
There exists a function satisfying for any given , such that for all .
Assumption 3.5 (Weight matrices).
(i) For all , is symmetric, , and . In addition, writing , a threshold distance exists such that if ; and (ii) for all sufficiently large .
Assumption 3.6 (Identification).
For all sufficiently large , , where , and .
Assumption 3.7 (Series approximation).
is a series of continuous orthonormal basis functions satisfying and .
Assumptions 3.1(i) and (ii) together imply that the number of interacting partners for each unit is bounded (i.e., the network must be sparse). These assumptions play a crucial role in characterizing the stochastic process of the outcome functions. In Assumption 3.2, part (i) assumes that the covariates are non-stochastic and bounded. This type of assumption is frequently utilized in the spatial and network literature and can be interpreted as viewing the analysis conditional on the realized values of the covariates. Meanwhile, part (ii) is introduced to ensure some convergence results for the quadratic moments.
Assumption 3.3(i) allows the error terms to be fully heteroskedastic. Part (ii) should be standard. Part (iii) is a high-level condition, which plays an important role to obtain the parametric convergence rate for the GMM estimator. In general, if belongs to , it admits the following series expansion:
(3.42) |
for some sequence of constants . By the orthonormality of ,
(3.43) | ||||
(3.44) |
Since can be seen as a numerical approximation of the left-hand side of the above expression, Assumption 3.3 part (iii) essentially requires that uniformly in . In particular, if the ’s are ordered in decreasing manner such that , this assumption can be interpreted in two ways: there exists a constant such that , or there exists a fixed such that for all .
Assumption 3.4 is not restrictive in most empirically relevant situations. For example, in the case where , we can set for any . For another example, when for some kernel , since , we can set in this case.
In Assumption 3.5, we assume that the weight matrices in the quadratic moments are symmetric. Note that this assumption does not lose any generality because for any vector . If is not symmetric in practice, we can always symmetrize it as . The assumption for the existence of a threshold distance may be non-standard, but it simplifies the proof. Since ’s are usually created from the interaction matrix and its powers, this assumption is consistent with Assumption 3.1(ii).
Assumption 3.6 is a regularity condition to ensure the identifiability of . Assumption 3.7 is standard. For example, it is satisfied if spline basis functions are used and and ’s are Hölder class of smoothness order (see, e.g., Chen, 2007; Belloni et al., 2015).
Theorem 3.1 (Rates of convergence).
Result (i) of Theorem 3.1 states that if the functional parameters are sufficiently smooth such that , the series coefficient estimator is consistent and converges at the parametric rate. This result might seem somewhat surprising since the dimension of is increasing to infinity. An intuitive explanation for this phenomenon is that, although the sample size is , the total number of observation points used in the estimation is (). This fact, in conjunction with Assumption 3.3(iii), leads to the result. The same convergence rate applies to the -convergence rate of the functional estimators, as shown in (ii) and (iii). The uniform convergence rate for these estimators is slower than the -convergence rate. However, note that the convergence results obtained here are not necessarily the sharpest, and the theoretically optimal convergence rates under our setup are also unknown. These points are left for future research.
Remark 3.1 (Local estimation approach).
If one adopts a local approach that directly estimates and at each , since there are exactly observations at each , it can be readily shown that and . Since the local approach does not rely on series approximation, these results are free from bias terms. However, while achieving unbiasedness, the local estimator faces challenges in deriving the uniform convergence rate.
We next present the limiting distribution of our estimators. To this end, we introduce the following notations and additional assumption:
(3.45) | |||
(3.50) | |||
(3.51) | |||
(3.52) |
More explicit forms of the matrices and can be found in (A.23) and in (A.38) in Appendix A, respectively. Further, let and be the selection matrices such that and hold.
Assumption 3.8 (Misc.).
For all sufficiently large , (i) ; (ii) ; and (iii) .
Theorem 3.2 (Asymptotic normality).
Theorem 3.2 establishes the pointwise asymptotic normality of the integrated-GMM estimators. As is common in series estimation, we impose additional undersmoothing conditions to ensure that the bias terms vanish sufficiently quickly.
In order to perform statistical inference based on the results of Theorem 3.2, we need to consistently estimate the variances and . To save space, the procedure for consistent variance estimation is not discussed here but is provided in Appendix C.
Remark 3.2 (Choice of ).
Suppose that is proportional to for some and that is of order . Then, to achieve the asymptotic normality, we require and simultaneously, which can be reduced to the following condition on : . This result automatically implies that the target functions must be sufficiently smooth with the smoothness at least greater than .
Remark 3.3 (Incompletely observed response function).
The integrated-GMM estimator is often infeasible because the response functions are typically observed only at a finite set of points in . Even in such cases, we can approximate the entire functional form of using a linear interpolation method. Suppose that for each , is observed at distinct points . Then, for each given , define
(3.55) |
When (resp. ), we set (resp. ). Other than linear interpolation, one may also use a kernel method, as in Zhu et al. (2022), to obtain . Then, using in place of , we can write
(3.56) |
where is the interpolation error: . Thus, if converges to zero sufficiently quickly for all , , and , we can apply the same estimation and inference strategy as above.
Network multiplier effects: marginal effects and impulse responses
Once the model is estimated, as a next step, one might be interested in computing the marginal effects of covariates on the outcome. In a standard linear regression model without network interaction, the estimated coefficients directly represent the marginal effects of their corresponding covariates. However, in the presence of intricate functional interaction, this is no longer the case.
As shown in Section 2, under Assumption 2.1, we have the following moving-average type representation:
(4.1) |
This expression indicates that the marginal effects of increasing by one unit on is given by by the linearity of , where is the -th column of , and denotes the -th column of . Alternatively, a little more informative expression can be obtained as follows: letting ,
(4.2) | ||||
(4.3) |
where , and for . From this, we can clearly see that the marginal effects of increasing consist of the direct effect on unit , the indirect effect on ’s immediate neighbors, the second-order indirect effect on ’s neighbors’ neighbors, and so forth, highlighting the presence of the network multiplier effect. More specifically, recall that when represents a (weighted) adjacency matrix, the -th element of corresponds to the number of (weighted) walks between and of length . Thus, the -th element of is interpreted as the weighted sum of the number of walks from to , where the contribution of each length- walk to the sum decays exponentially at .
To estimate the marginal effects, not just replacing the unknown parameters with their estimators, the infinite sum generally needs to be approximated by a finite sum: for some positive integer ,
(4.4) |
where . Meanwhile, in the special case of concurrent interaction such that , , …, it is easy to see that holds. This implies that, in this case, we can estimate directly as , without computing the infinite sum.
In the above discussion, we have demonstrated how the impacts of shifting one’s covariate propagate to others. Similarly, just like the impulse response analysis in time-series vector autoregression, we can consider network impulse responses when an external shock occurs at a given unit. In particular, in a similar spirit to Koop et al. (1996), we define
(4.5) |
where is a given ”function” representing the external shock. By a similar calculation as above, we obtain
(4.6) |
When plotting each element of against , it can be interpreted as a network version of the impulse response function (as a function of ), similarly to Denbee et al. (2021). The expected total social impact caused by an external shock to unit can be expressed as , and the unit that exerts the largest influence on the society is given by . Denbee et al. (2021) referred to this unit as the risk key player, in the sense that an external shock to leads to the highest volatility in the aggregate outcome.
When assuming a concurrent interaction model, the impulse responses at take the following form: . Thus, if there is no exogenous shock at , i.e., if , the expected outcome at remains unaffected. This implies, for instance, that a travel demand shock that occurred five minutes ago has no impact on current mobility availability, which is unrealistic. On the other hand, if the interaction structure is given by with for , then a shock occurring at can transmit to the outcome at , leading to nonzero impulse responses at even when .
The estimation of can be performed in the same manner as above. For some positive integer , we estimate by . The next proposition provides the convergence rate of and that of .
Proposition 4.1.
Suppose that the assumptions in Theorem 3.1 hold. In addition, assume that . Then, uniformly in ,
-
(i)
,
-
(ii)
.
This proposition indicates that the uniform convergence rates for the marginal effect and the impulse response estimators depend on the uniform convergence rate of the integrated-GMM estimator and the summation order . Since the approximation error from truncating the infinite sum decreases geometrically as increases, in practice, setting or would be sufficient.
Monte Carlo Simulation
In this section, we conduct a series of Monte Carlo experiments to evaluate the finite-sample performance of the integrated-GMM estimator. Throughout the experiments, we consider the following data-generating process (DGP):
(5.1) |
where , , denotes the standard normal density function with mean and variance , (i.e., the Epanechnikov kernel function), and the individual fixed effects are given by . The coefficient function is given by , where is chosen from . We use as the IVs for , and, thus, the magnitude of determines the strength of these IVs. For the error term, we generate , where denotes the number of units connected to (i.e., ’s degree), and for . The weight matrix is a row-normalized adjacency matrix, which is constructed by randomly placing units on a lattice, where denotes the nearest integer to . Any two units are connected if the Euclidean distance between them is exactly one. The sample size is chosen from , and is from .
Since our DGP satisfies the conditions in Assumption 2.1, we can generate the outcome functions using the Neumann series approximation: , where is increased iteratively until is satisfied at each and . Throughout the simulations, integrals over are approximated by finite summations over 99 equally-spaced grid points.
For the choice of basis functions, we use the cubic B-splines orthonormalized via the Gram-Schmidt procedure. The number of inner knots for the B-spline is selected from . The number of grid points used to evaluate the moment function is chosen from , with the points evenly spaced over . For the quadratic moments, we use two weight matrices (): and . We then compare the performance of three different estimators: GMM 1: the integrated-GMM estimator using the weight matrix given in (3.37), GMM 2: the integrated-GMM estimator using the identity weight matrix, and (integrated) 2SLS: GMM 1 estimator without utilizing the quadratic moment conditions.
For each setup, we generate the dataset 500 times. The performance of the estimators is evaluated based on the average bias (BIAS) and the average root-mean-squared error (RMSE). Specifically, the BIAS and RMSE of estimating are defined as
(5.2) | ||||
(5.3) |
respectively. Here, denotes the estimator of obtained from the -th replicated dataset. The BIAS and RMSE for are defined analogously.
The simulation results for the estimation of are summarized in Table 5.1. From these results, we observe that all three estimators perform reasonably well in terms of BIAS. However, in terms of RMSE, GMM 1 clearly outperforms the other two estimators across all scenarios. Recalling that, without the quadratic moment conditions, GMM 1 and 2SLS are numerically identical, it follows that GMM 1’s efficiency gain relative to 2SLS stems solely from the quadratic moment conditions. Interestingly, when comparing GMM 2 and 2SLS, we find that 2SLS even outperforms GMM 2. These findings suggest that while incorporating quadratic moments does improve the efficiency, the choice of the GMM weight matrix is equally (or potentially more) crucial. When increases from 0.4 to 1, the RMSEs for the GMM estimators are roughly halved or slightly more, whereas those for 2SLS shrink by more than half. This indicates that the 2SLS estimator is more sensitive to IV strength, as anticipated. The choices of and seem to have only minor impacts on performance. When we increase the sample size from to , the RMSE values are roughly halved, demonstrating -consistency of the estimators, which numerically corroborates our theoretical result in Theorem 3.1(ii).
Table 5.2 presents the simulation results for estimating . Here, GMM 1 and 2SLS perform quite similarly, whereas GMM 2 is slightly less accurate than the other two. Notably, the RMSEs remain almost unchanged across different values of for all three estimators, suggesting that IV strength is not a critical factor. As with , the estimation of also demonstrates -consistency.
GMM 1 | GMM 2 | 2SLS | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
BIAS | RMSE | BIAS | RMSE | BIAS | RMSE | |||||
40 | 5 | 10 | 2 | 0.4 | 0.0062 | 0.1132 | -0.0517 | 0.3820 | 0.0479 | 0.2135 |
1 | 0.0083 | 0.0649 | -0.0205 | 0.2128 | 0.0136 | 0.0820 | ||||
3 | 0.4 | 0.0061 | 0.1131 | -0.0564 | 0.3938 | 0.0479 | 0.2135 | |||
1 | 0.0083 | 0.0649 | -0.0242 | 0.2230 | 0.0136 | 0.0820 | ||||
30 | 2 | 0.4 | 0.0064 | 0.1135 | -0.0406 | 0.3562 | 0.0479 | 0.2135 | ||
1 | 0.0084 | 0.0650 | -0.0138 | 0.1930 | 0.0136 | 0.0820 | ||||
3 | 0.4 | 0.0064 | 0.1135 | -0.0418 | 0.3610 | 0.0479 | 0.2134 | |||
1 | 0.0084 | 0.0650 | -0.0150 | 0.1965 | 0.0136 | 0.0820 | ||||
10 | 10 | 2 | 0.4 | -0.0027 | 0.0773 | -0.0357 | 0.2701 | 0.0142 | 0.1370 | |
1 | -0.0007 | 0.0456 | -0.0179 | 0.1419 | 0.0028 | 0.0540 | ||||
3 | 0.4 | -0.0027 | 0.0773 | -0.0387 | 0.2796 | 0.0142 | 0.1370 | |||
1 | -0.0007 | 0.0456 | -0.0203 | 0.1493 | 0.0028 | 0.0540 | ||||
30 | 2 | 0.4 | -0.0028 | 0.0773 | -0.0296 | 0.2478 | 0.0142 | 0.1370 | ||
1 | -0.0007 | 0.0455 | -0.0146 | 0.1260 | 0.0028 | 0.0540 | ||||
3 | 0.4 | -0.0029 | 0.0773 | -0.0307 | 0.2520 | 0.0142 | 0.1370 | |||
1 | -0.0007 | 0.0455 | -0.0150 | 0.1283 | 0.0028 | 0.0540 | ||||
80 | 5 | 10 | 2 | 0.4 | -0.0030 | 0.0812 | -0.0394 | 0.2801 | 0.0140 | 0.1495 |
1 | -0.0004 | 0.0470 | -0.0196 | 0.1508 | 0.0019 | 0.0593 | ||||
3 | 0.4 | -0.0030 | 0.0812 | -0.0423 | 0.2904 | 0.0140 | 0.1495 | |||
1 | -0.0004 | 0.0470 | -0.0210 | 0.1578 | 0.0019 | 0.0593 | ||||
30 | 2 | 0.4 | -0.0029 | 0.0813 | -0.0336 | 0.2578 | 0.0140 | 0.1495 | ||
1 | -0.0004 | 0.0470 | -0.0163 | 0.1344 | 0.0019 | 0.0593 | ||||
3 | 0.4 | -0.0029 | 0.0813 | -0.0342 | 0.2619 | 0.0140 | 0.1495 | |||
1 | -0.0004 | 0.0470 | -0.0163 | 0.1365 | 0.0019 | 0.0593 | ||||
10 | 10 | 2 | 0.4 | -0.0024 | 0.0539 | -0.0142 | 0.1996 | 0.0023 | 0.0966 | |
1 | -0.0016 | 0.0321 | -0.0058 | 0.0980 | -0.0007 | 0.0385 | ||||
3 | 0.4 | -0.0024 | 0.0539 | -0.0168 | 0.2071 | 0.0023 | 0.0966 | |||
1 | -0.0016 | 0.0321 | -0.0076 | 0.1032 | -0.0007 | 0.0385 | ||||
30 | 2 | 0.4 | -0.0023 | 0.0538 | -0.0109 | 0.1812 | 0.0023 | 0.0966 | ||
1 | -0.0016 | 0.0320 | -0.0044 | 0.0860 | -0.0007 | 0.0385 | ||||
3 | 0.4 | -0.0024 | 0.0538 | -0.0118 | 0.1842 | 0.0023 | 0.0966 | |||
1 | -0.0016 | 0.0320 | -0.0046 | 0.0874 | -0.0007 | 0.0385 |
GMM 1 | GMM 2 | 2SLS | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
BIAS | RMSE | BIAS | RMSE | BIAS | RMSE | |||||
40 | 5 | 10 | 2 | 0.4 | 0.0015 | 0.0653 | 0.0039 | 0.0966 | 0.0025 | 0.0681 |
1 | 0.0034 | 0.0669 | 0.0128 | 0.1091 | -0.0019 | 0.0670 | ||||
3 | 0.4 | 0.0015 | 0.0653 | 0.0042 | 0.0978 | 0.0025 | 0.0681 | |||
1 | 0.0034 | 0.0669 | 0.0139 | 0.1115 | -0.0019 | 0.0670 | ||||
30 | 2 | 0.4 | 0.0015 | 0.0653 | 0.0032 | 0.0928 | 0.0025 | 0.0681 | ||
1 | 0.0034 | 0.0669 | 0.0098 | 0.1029 | -0.0019 | 0.0670 | ||||
3 | 0.4 | 0.0015 | 0.0653 | 0.0033 | 0.0930 | 0.0025 | 0.0681 | |||
1 | 0.0034 | 0.0669 | 0.0100 | 0.1032 | -0.0019 | 0.0670 | ||||
10 | 10 | 2 | 0.4 | 0.0004 | 0.0428 | 0.0032 | 0.0612 | 0.0001 | 0.0438 | |
1 | 0.0014 | 0.0435 | 0.0088 | 0.0654 | -0.0014 | 0.0439 | ||||
3 | 0.4 | 0.0004 | 0.0428 | 0.0034 | 0.0617 | 0.0001 | 0.0438 | |||
1 | 0.0014 | 0.0435 | 0.0094 | 0.0666 | -0.0014 | 0.0439 | ||||
30 | 2 | 0.4 | 0.0004 | 0.0428 | 0.0026 | 0.0590 | 0.0001 | 0.0438 | ||
1 | 0.0014 | 0.0435 | 0.0072 | 0.0620 | -0.0014 | 0.0439 | ||||
3 | 0.4 | 0.0004 | 0.0428 | 0.0027 | 0.0591 | 0.0001 | 0.0438 | |||
1 | 0.0014 | 0.0435 | 0.0073 | 0.0621 | -0.0014 | 0.0439 | ||||
80 | 5 | 10 | 2 | 0.4 | 0.0022 | 0.0435 | 0.0057 | 0.0669 | 0.0030 | 0.0445 |
1 | 0.0039 | 0.0442 | 0.0128 | 0.0739 | 0.0010 | 0.0442 | ||||
3 | 0.4 | 0.0022 | 0.0435 | 0.0059 | 0.0676 | 0.0030 | 0.0445 | |||
1 | 0.0039 | 0.0442 | 0.0135 | 0.0754 | 0.0010 | 0.0442 | ||||
30 | 2 | 0.4 | 0.0022 | 0.0435 | 0.0051 | 0.0641 | 0.0030 | 0.0445 | ||
1 | 0.0038 | 0.0442 | 0.0110 | 0.0695 | 0.0010 | 0.0442 | ||||
3 | 0.4 | 0.0022 | 0.0435 | 0.0051 | 0.0642 | 0.0030 | 0.0445 | |||
1 | 0.0038 | 0.0442 | 0.0110 | 0.0697 | 0.0010 | 0.0442 | ||||
10 | 10 | 2 | 0.4 | 0.0009 | 0.0304 | 0.0015 | 0.0424 | 0.0013 | 0.0306 | |
1 | 0.0016 | 0.0306 | 0.0041 | 0.0446 | 0.0005 | 0.0305 | ||||
3 | 0.4 | 0.0009 | 0.0304 | 0.0017 | 0.0428 | 0.0013 | 0.0306 | |||
1 | 0.0016 | 0.0306 | 0.0045 | 0.0455 | 0.0005 | 0.0305 | ||||
30 | 2 | 0.4 | 0.0009 | 0.0303 | 0.0012 | 0.0407 | 0.0013 | 0.0306 | ||
1 | 0.0016 | 0.0306 | 0.0034 | 0.0421 | 0.0005 | 0.0305 | ||||
3 | 0.4 | 0.0009 | 0.0303 | 0.0013 | 0.0408 | 0.0013 | 0.0306 | |||
1 | 0.0016 | 0.0306 | 0.0034 | 0.0422 | 0.0005 | 0.0305 |
Analyzing the Demand of Bike-Sharing System
As an empirical application of our method, we analyze spatial interactions in the demand for a bike-sharing system in the U.S. Demand analysis of shared mobility has been a highly active research topic in recent years across various areas, including transportation research, marketing, economics, and environmental studies. In particular, bike-sharing systems have attracted increasing attention. For a comprehensive review of this literature, see Eren and Uz (2020), for instance.
Data
The dataset used in this analysis comes from the Bay Area Bike Share in San Francisco, which was established in August 2013 and is now known as Bay Wheels. The dataset is publicly available on the Kaggle website (https://www.kaggle.com/datasets/benhamner/sf-bay-area-bike-share). It contains detailed information about the system from August 2013 to August 2015, including station locations, the number of available bicycles at each station over time, and all trip-level data during this time period. The trip data include details such as start and end times and stations, as well as the user type (subscriber or casual user). In this dataset, there are 70 bike stations in total; for a map of all 70 station locations, see Figure 6.1.

Since the initial installation of stations in August 2013, the 70th station (Ryland Park station) was added in April 2014. Accordingly, we use data from May 2014 to August 2015 for this analysis, which represents the largest balanced panel dataset that can be extracted from the raw data.
One concern in the analysis is that the shared mobility services often relocate vehicles or bikes from one station to another to maintain service availability across all locations. To detect potential relocations, we first identified instances where the number of available bicycles jumped up/down by more than or equal to 10 all at once. We then examined the distribution of these events across different hours and days, as given in Figure D.1 in Appendix D. From this figure, we can observe that sudden drops or increases in bike availability tend to occur between midnight and early morning, particularly on Sundays. Although we cannot access to formal records of actual relocation operations, these patterns may suggest that they are likely the result of bike relocation carried out by the service provider. Another concern is the enormous size of the dataset. Because the original data are recorded in minutes every day, using the raw data directly can lead to a memory problem. Moreover, daily data tend to fluctuate and to be noisy due to random events.
To address the aforementioned issues, we first rounded the trip data to 15-minute intervals and then averaged over Monday through Friday at each interval, discarding data from Saturdays and Sundays. Furthermore, to avoid potential bike relocation events in weekdays, we restrict the analysis to the time period from 6 AM to 9 PM. Consequently, our final dataset is a weekly-level panel with stations and weeks. The outcome of interest is the number of bicycles at each station for , where corresponds to 6 AM and corresponds to 9 PM.
Figure 6.2 presents the trajectories of average bike availability for all 70 stations during the first week in our panel. It clearly shows that most of the variation in bike availability occurs between 6 AM and 9 PM.

Empirical results
Based on the dataset constructed as described above, we estimate model (1.2), where
(6.1) | ||||
(6.2) | ||||
(6.3) | ||||
subscribers (arriving at station ), rainy day dummy, month dummies ]⊤ | (6.4) | |||
(6.5) |
Here, denotes the Euclidean distance between stations and . The estimation procedure is basically the same as the GMM 1 estimator in Section 5, with . The rainy day dummy and month dummy variables are not used as IVs. All integrals are approximated by finite summations over grid points at 15-minute intervals.
The estimation result for the interaction effect function is presented in Figure 6.3. In the figure, the shaded area depicts the (pointwise) 95% confidence interval. From the figure, we observe that positive spatial interaction in bicycle availability exists during the morning hours. Although the model itself is agnostic regarding the reason for the interaction, it is plausible that as bike-sharing becomes more popular particularly among commuters, it encourages further use of the service, thereby reinforcing demand during the morning. Meanwhile, interestingly, negative interaction appears around 5–7 PM. In the evening, main users may include not only returning commuters but also individuals going out for dining, shopping, concerts, etc. As a result, bicycles might accumulate at certain popular stations while nearby less-popular stations experience lower availability, leading to the negative interaction.

To save space, the estimation results for are presented in Figure D.2 in Appendix D, excluding the coefficients for the month dummies. Among the key covariates, we observe that only the ratio of arriving subscribers has a statistically significant positive impact on bike availability. This result is intuitive, as stations with a higher number of regular users arriving are expected to hold a richer stock of bikes. For other variables, for instance, the rainy day dummy has a positive effect on bike availability, which is consistent with previous studies, though the effect is not statistically significant. One possible explanation is that the rainy ”day” dummy does not capture detailed temporal variations (i.e., it is not a function of ), and since our dataset is averaged over weeks, these may have diluted its impact.
Lastly, we conduct an impulse response analysis. The figures summarizing the results are presented in Figure 6.4. For illustration, we arbitrarily select the Embarcadero at Folsom station as the target station receiving an external shock. Specifically, we consider a hypothetical scenario in which the bike stock at this station is reduced by 2 at the peak of 9 AM (panel (a)). Panels (b) and (c) illustrate how the shock propagates to its two nearest stations, Spear at Folsom and Temporary Transbay Terminal. These figures indicate that the external shock spills over to these stations with a slight time delay, peaking just before 10 AM. Since the magnitude of both the external shock and spatial interaction is moderate in this analysis, the impulse responses for both stations are relatively mild.



IRF: , , and .
Conclusion
In this paper, we proposed a novel functional regression framework to analyze spatial and network interactions in functional panel data settings. By extending the standard NAR model to accommodate functional outcomes and individual fixed effects, we developed an integrated-GMM estimator that can estimate the functional parameters potentially more efficiently than 2SLS-based estimators. Under certain conditions, we established the theoretical properties of our estimator, including the consistency, convergence rates, and asymptotic normality, and confirmed its finite-sample performance through Monte Carlo simulations. As an empirical application, we analyzed the demand for a bike-sharing system in the San Francisco Bay Area, revealing significant spatial interactions in bike availability that vary over the time of day. Our findings highlight the importance of accounting for functional spatial dependencies in the demand for shared mobility services and the practical usefulness of our method.
Several unsolved questions still remain. These include: How can we specify the weight function in a data-driven manner? Is it possible to extend the current framework to cases where functional outcomes are only sparsely observed? How can we construct a uniform confidence band for each functional parameter? How can we estimate the model when it has a large number of covariates? We leave these questions for future research.
Acknowledgments
Hoshino’s work was supported by JSPS KAKENHI Grant Number 23KK0226. Most parts of this paper were written during Hoshino’s research visit at the Melbourne Business School (MBS), University of Melbourne. He is deeply grateful to MBS for their hospitality.
Appendix A Preparation
The following definition is from Jenish and Prucha (2012).
Definition A.1 (Near-epoch dependence).
Let and be triangular arrays of random fields, where and are real-valued and general (possibly infinite-dimensional) random variables, respectively. Then, the random field is said to be -near-epoch dependent (NED) on if
for an array of finite positive constants and some function with as , where is the -field generated by . The ’s and are called the NED scaling factors and NED coefficient, respectively. The is said to be uniformly -NED on if is uniformly bounded. If for some , then it is called geometrically -NED.
In the following, for a general , we denote
(A.1) | ||||
(A.2) |
Since we have assumed that the basis functions are continuous, so are and , and thus they are uniformly bounded on by the extreme value theorem. For a given , the residual vector can be written as , where
(A.3) | ||||
(A.4) |
Under Assumptions 2.1(i), 3.2, and 3.4, we have
(A.5) | ||||
(A.6) |
for , uniformly in , , and .
Finally, for ease of reference, we provide a list of some basic facts below:
(A.7) | ||||
(A.8) | ||||
(A.9) | ||||
(A.10) | ||||
(A.11) | ||||
(A.12) | ||||
(A.13) |
Empirical moment function:
(A.18) |
Jacobian of :
(A.23) |
Decompose with
(A.28) | ||||
(A.33) |
The variance-covariance matrix of :
(A.38) |
where
(A.39) |
denotes the -th column of , , and, noting that and that is symmetric,
(A.40) |
Note that the cross terms between the linear and quadratic moments are zero:
(A.41) | |||
(A.42) |
Appendix B Proofs
Proof of Proposition 2.1
Under Assumption 2.1, we have
(B.1) |
for any . This implies that . As is well known, if the operator norm of is smaller than one, exists (e.g., Theorem 2.14, Kress (2014)). It is immediate from (B.1) that follows for any such that , which yields the desired result.
Proof.
Observe that
(B.6) |
By Assumptions 3.2(ii) and 3.4,
(B.7) | ||||
(B.8) |
uniformly in , implying that . Then, we have
(B.9) | ||||
(B.10) |
uniformly in and under Assumption 3.7. This implies that the first elements of are of order .
Next, by Cauchy-Schwarz inequality and the facts that and under Assumption 3.5(i), we obtain
(B.11) | ||||
(B.12) | ||||
(B.13) |
Similarly as above, by the inequality,
(B.14) | ||||
(B.15) |
By Cauchy-Schwarz inequality,
(B.16) |
Further, Assumptions 3.2(ii) and 3.4 imply that uniformly in and . Thus, is uniformly bounded, and we have
(B.17) |
uniformly in and .
Denote the population GMM objective function as follows:
(B.23) |
Lemma B.2.
Proof.
Decompose
(B.24) | ||||
(B.25) |
In view of
(B.30) |
we can find that is bounded below from for some for all sufficiently large , under Assumptions 3.5(ii) and 3.6. Further, Cauchy-Schwarz inequality and Lemma B.1 give that
(B.31) | ||||
(B.32) |
Hence, since is bounded below from zero and , we have
(B.33) | ||||
(B.34) |
for all sufficiently large . This completes the proof. ∎
Lemma B.3.
Proof.
We prove the lemma in a similar manner to Jenish (2012) and Hoshino (2022). Recall that is uniquely determined in as under Assumption 2.1 for all . We denote
(B.35) |
such that holds for each .
Define
(B.36) |
for some . Since is separable, both and are Polish space-valued random elements in and , respectively. Then, by Lemma 2.11 of Dudley and Philipp (1983) (see also Lemma A.1 of Jenish (2012)), a function exists such that has the same law as that of , which is an appropriate permutation of , where is a random variable uniformly distributed on and independent of .
Now, with a slight abuse of notation, we write
(B.37) | ||||
(B.38) |
where . To be specific,
(B.39) | ||||
(B.40) |
By construction, we have
where is the -field generated by . Similarly, we have .
Here, suppose that , where is as provided in Assumption 3.1(ii). Then, because at least ’s own is included in , we have , and hence
(B.41) |
holds. Thus, by Minkowski’s inequality and Assumptions 3.2(ii) and 3.4,
where , and . Similarly, when holds, noting now that under Assumption 3.1(ii) we have for all ’s who are direct neighbors of ,
Applying the same argument recursively, for such that for all ’s in the -th order neighborhood of , we obtain
(B.42) |
Finally, by Jensen’s inequality and (B.42),
(B.43) |
as by Assumption 2.1. This completes the proof. ∎
Lemma B.4.
Proof.
As a useful consequence from Lemma B.3 and B.4, we have
(B.50) | ||||
(B.51) | ||||
(B.52) |
uniformly in , , and ; that is, is uniformly and geometrically -NED.
Lemma B.5.
Suppose that Assumption 3.3(i) holds. Let be a geometrically -NED random field on for all , independent of for . Denote and . Then,
-
(i)
for all with some geometric NED coefficient ;
-
(ii)
for all with some geometric NED coefficient .
Proof.
Since the proofs are similar, we only prove (ii). Decompose , where
(B.53) |
where is the sigma field generated from . Since and are assumed to be independent, holds. Then, for each pair and , denoting ,
(B.54) | ||||
(B.55) | ||||
(B.56) |
The first term on the right-hand side is zero by Assumption 3.3(i). Note that, by Jensen’s and triangle inequalities, . In addition, . Then, since is assumed to be -NED on at each , it holds that
(B.57) | ||||
(B.58) |
Hence, Cauchy–Schwarz inequality gives
(B.59) |
The same inequality applies to . Furthermore,
(B.60) |
This completes the proof. ∎
Lemma B.6.
Proof.
Below, for a generic variable x indexed by and , we denote . In addition, we write .
(i) Observe that for each ,
(B.61) |
As a typical element, the variance of the first element of is given as
(B.62) | ||||
(B.63) | ||||
(B.64) | ||||
(B.65) |
Here, the dependence on , ””, is occasionally omitted for notational simplicity. First, from Assumptions 2.1(i), 3.2(ii), and 3.4, we can easily see that . Second, by Lemma B.4 and B.5(i), there exists a geometric NED coefficient that satisfies
(B.66) | ||||
(B.67) | ||||
(B.68) |
where the second inequality is from Lemma A.1(iii) Jenish and Prucha (2009), and the final claim follows from the geometric NED property. Third, since and are independent if , it holds that
(B.69) |
Finally, noting that , Lemma B.5(ii) implies that this term is also of order .
Combining the above results suggests that for each , which completes the proof by applying Markov’s and triangle inequalities.
(ii) Analogous to the proof of (i).
(iii) Observe that
(B.70) | ||||
(B.71) |
Then, the result follows from . The second part can be proved analogously.
(iv) By the triangle inequality,
(B.72) | ||||
(B.73) |
Further, by Assumptions 3.3(i) and (iii),
(B.74) | |||
(B.75) | |||
(B.76) |
Repeating the same calculation for the other term, the result follows from Markov’s inequality.
(v) Observe that
(B.77) | |||
(B.78) |
Here, let and recall that there is a constant such that if . Then, noting that for with and ,
(B.79) | ||||
(B.80) | ||||
(B.81) |
which implies that is uniformly and geometrically -NED, for all , , and .
Now, suppressing the dependence on both and ,
(B.82) | ||||
(B.83) | ||||
(B.84) | ||||
(B.85) | ||||
(B.86) |
As we have seen in (A.5), we have , for . This allows us to use Lemma A.1 of Xu and Lee (2015) (see also Corollary 4.3(b) of Gallant and White (1988)) to show that is uniformly and geometrically -NED. Then, following the same argument as in the proof of (i), we can show that for each , which gives the desired result by the triangle inequality.
(vi), (vii) These can be proved in a similar manner to the proof of Lemma B.1.
(viii) For each , holds under Assumptions 3.3(i) and (ii), as in Lemma 9 in Yu et al. (2008) and Lemma 1 in Lee and Yu (2014). Then, the result is straightforward.
∎
Proof.
Observe that
(B.87) |
For the first term,
(B.88) | ||||
(B.89) | ||||
(B.90) |
uniformly in , since and .
For the second term,
(B.91) | ||||
(B.92) | ||||
(B.93) |
uniformly in , since
(B.94) | ||||
(B.95) | ||||
(B.96) | ||||
(B.97) |
This completes the proof. ∎
Lemma B.8.
Proof.
Observe that
(B.102) |
where
(B.103) | ||||
(B.104) |
and, for ,
(B.105) |
Hence,
(B.106) | ||||
(B.107) |
uniformly in . Further, by Cauchy-Schwarz inequality and Lemma B.7,
(B.108) | ||||
(B.109) | ||||
(B.110) | ||||
(B.111) |
Combined with the identifiability of (Lemma B.2), the above result implies the consistency of (see, e.g., the proof of Theorem 3.3 in Su and Hoshino (2016)). ∎
Proof of Theorem 3.1.
(i) Given the consistency result in Lemma B.8, if we can show that for an arbitrary , there exists a constant such that for all sufficiently large ,
(B.112) |
we can conclude that .
Decompose
(B.113) | ||||
(B.114) |
Lemma B.6(ii) implies that , where . Thus, by Assumption 3.6, we have with probability approaching one. Observing that
(B.119) |
we obtain for some with probability approaching one.
For the second term, we can find by Cauchy-Schwarz inequality that
(B.120) | ||||
(B.121) |
Hence,
(B.122) | ||||
(B.123) |
Since is bounded below from , if we set , we can obtain the desired inequality by choosing a sufficiently large . From Lemma B.6(iii), (iv), (vi), (vii), and (viii), we have
(B.124) |
and this completes the proof.
(ii) Note that by orthonormality. Then, by result (i) and Assumption 3.7,
(B.125) | ||||
(B.126) | ||||
(B.127) |
It is also straightforward to see that .
(iii) Analogous to the proof of (ii). ∎
Lemma B.9.
Proof.
(i) Observe that
(B.128) |
where
(B.129) | ||||
(B.130) | ||||
(B.131) |
In a similar manner to the proof of Lemma B.6(v), we can show that . Then, by Assumption 3.8(i) and Theorem 3.1(i), we have
(B.132) | ||||
(B.133) | ||||
(B.134) |
For , by the same argument as in Lemma B.6(v), we can show that . This completes the proof.
(ii) By the triangle inequality,
(B.135) | ||||
(B.136) | ||||
(B.137) |
where the last inequality is from result (i) and Assumption 3.8(ii).
(iii) By definition of , we have and thus , as in result (i). Then, by the triangle inequality,
(B.138) | ||||
(B.139) | ||||
(B.140) | ||||
(B.141) |
Proof of Theorem 3.2.
Since the proof is similar, we only prove (i). By the first-order condition of minimization and the mean-value expansion, we have
(B.144) | ||||
(B.145) |
where , leading to
(B.146) | ||||
(B.147) |
with
(B.148) | ||||
(B.149) | ||||
(B.150) | ||||
(B.151) |
First, observing that
(B.152) | ||||
(B.153) | ||||
(B.154) | ||||
(B.155) |
we can find that by Lemma B.6(iii), (vi), and (vii). Next, it is easy to see that
(B.156) | ||||
(B.157) |
from . Further, for , by Lemma B.9(ii) and (iv) and (B.124) with Assumption 3.8(ii), we can show that .
Combining all these results and noting that
(B.158) |
for sufficiently large , we have
(B.159) | ||||
(B.160) | ||||
(B.161) |
Here, let and denote the first elements and -th element of , respectively. Then, we can write
(B.162) | |||
(B.163) | |||
(B.164) |
Moreover, for convenience, we re-label the data such that , , …, (where ).
Let . Recalling the block-diagonal structure of and that its diagonals are all zero, we can find that the diagonal elements of are also all zero. Further, note that is symmetric. Now, letting be the i-th column of , define
(B.165) | ||||
(B.166) | ||||
(B.167) |
and we further re-write
(B.168) |
Here, let denote the -field generated by . Under Assumption 3.3(i), we have , implying that forms a martingale difference sequence for each . Then, it suffices to check the following two conditions for the central limit theorem of Scott (1973):
(B.169) | ||||
(B.170) |
Verification of condition (1)
For the second term on the right-hand side, noting that ,
(B.173) | |||
(B.174) |
Since is symmetric and its diagonals are zero, recalling the definition of in (A.40), direct calculation yields
(B.175) | ||||
(B.176) | ||||
(B.177) | ||||
(B.178) | ||||
(B.179) |
Meanwhile,
(B.180) | ||||
(B.181) | ||||
(B.182) | ||||
(B.183) | ||||
(B.184) |
Consequently, holds from Chebyshev’s inequality.
For the third term, for ,
(B.185) | ||||
(B.186) |
where . Hence, we can write
(B.187) |
Noting that ,
(B.188) | ||||
(B.189) |
Then, by Markov’s inequality, we obtain .
Finally, combining the above results gives
(B.190) | |||
(B.200) | |||
(B.201) |
as desired.
Verification of condition (2)
To verify condition (2), it is sufficient to show that . Moreover, by the inequality,
(B.202) |
For the first term on the right-hand side, noting that Assumption 3.3(ii) implies by Hölder’s inequality,
(B.203) | ||||
(B.204) |
For the second term, observe that
(B.205) | |||
(B.206) |
Further, it is easy to see that by Markov’s inequality. Hence, we have , and combining this with the previous result implies condition (2).
∎
Proof of Proposition 4.1.
Since the proofs of (i) and (ii) are almost identical, we only prove (i). By the triangle inequality,
(B.207) |
where . For the first term on the right-hand side, observe that
(B.208) | ||||
(B.209) |
for all . By definition, when ,
(B.210) |
uniformly in , where . When , by Assumption 3.4,
(B.211) | ||||
(B.212) | ||||
(B.213) | ||||
(B.214) |
uniformly in . Similarly, when , we have
(B.215) | ||||
(B.216) | ||||
(B.217) | ||||
(B.218) |
Thus, repeating the same computation recursively, we can obtain for general under . From a straightforward calculation, we have , which leads to .
Next, observe that and that
(B.219) |
by repeatedly applying Assumption 3.4, where . Hence, we have
(B.220) |
Combining these results completes the proof.
∎
Appendix C Consistent variance estimation
First, observe the following alternative representations of and :
(C.1) | ||||
(C.2) |
Define ,
(C.3) | ||||
(C.4) | ||||
(C.9) | ||||
(C.10) |
Then, our variance estimators for and are given as
(C.11) | ||||
(C.12) |
respectively.
Proposition C.1 (Consistent variance estimation).
Suppose that the assumptions in Theorem 3.2 are satisfied. In addition, assume that and as . Then,
-
(i)
-
(ii)
for all
-
(iii)
-
(iv)
for all .
Proof.
(i) Decompose
(C.13) |
where
(C.14) |
Since
(C.15) |
we have
(C.16) |
Then, we can write
(C.17) | ||||
(C.18) | ||||
(C.19) | ||||
(C.20) |
In view of Theorem 3.1(ii) and (iii), we can easily find that uniformly in and . In addition, it is straightforward to see that , and , by Markov’s inequality. Hence, we have .
Meanwhile, it is not difficult to see that by Markov’s inequality. Then, the result follows from the triangle inequality.
(ii) Similar to the above, we decompose
(C.21) |
where
(C.22) |
For the first term on the right-hand side, noting that
(C.23) | ||||
(C.24) | ||||
(C.25) | ||||
(C.26) | ||||
(C.27) | ||||
(C.28) | ||||
(C.29) |
it is not difficult to show that .
For the second term, following the analogous argument as in the proof of Proposition 2 Lin and Lee (2010), holds. Hence, we obtain the desired result.
(iii) To prove the result, it suffices to show that
(C.30) |
As shown in the proof of Theorem 3.2, is bounded below from . On the other hand, writing and its estimator counterpart as , by the triangle inequality,
(C.31) | ||||
(C.32) | ||||
(C.33) | ||||
(C.34) |
where the last inequality is from by Lemma B.9(i) and (iv) and by results (i) and (ii). Thus, we have
(C.35) |
(iv) Analogous to the proof of (iii). ∎
Appendix D Supplementary figures for the empirical analysis





References
- Belloni et al. (2015) Belloni, A., Chernozhukov, V., Chetverikov, D., and Kato, K., 2015. Some new asymptotic theory for least squares series: Pointwise and uniform results, Journal of Econometrics, 186 (2), 345–366.
- Beyaztas et al. (2024) Beyaztas, U., Shang, H.L., Sezer, G.B., Mandal, A., Zoh, R.S., and Tekwe, C.D., 2024. Spatial function-on-function regression, arXiv preprint, 2412.17327.
- Chen and Müller (2012) Chen, K. and Müller, H.G., 2012. Modeling repeated functional observations, Journal of the American Statistical Association, 107 (500), 1599–1609.
- Chen (2007) Chen, X., 2007. Chapter 76 large sample sieve estimation of semi-nonparametric models, Elsevier, Handbook of Econometrics, vol. 6, 5549–5632.
- Delicado (2011) Delicado, P., 2011. Dimensionality reduction when data are density functions, Computational Statistics & Data Analysis, 55 (1), 401–420.
- Denbee et al. (2021) Denbee, E., Julliard, C., Li, Y., and Yuan, K., 2021. Network risk and key players: A structural analysis of interbank liquidity, Journal of Financial Economics, 141 (3), 831–859.
- Di et al. (2024) Di, C., Wang, G., Wu, S., Evenson, K.R., LaMonte, M.J., and LaCroix, A.Z., 2024. Utilizing wearable devices to improve precision in physical activity epidemiology: Sensors, data and analytic methods, in: Statistics in Precision Health: Theory, Methods and Applications, Springer, 41–64.
- Dudley and Philipp (1983) Dudley, R.M. and Philipp, W., 1983. Invariance principles for sums of Banach space valued random elements and empirical processes, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 62 (4), 509–552.
- Eren and Uz (2020) Eren, E. and Uz, V.E., 2020. A review on bike-sharing: The factors affecting bike-sharing demand, Sustainable Cities and Society, 54, 101882.
- Faghih-Imani and Eluru (2016) Faghih-Imani, A. and Eluru, N., 2016. Incorporating the impact of spatio-temporal interactions on bicycle sharing system demand: A case study of New York CitiBike system, Journal of Transport Geography, 54, 218–227.
- Gallant and White (1988) Gallant, R. and White, H., 1988. A unified theory of estimation and inference for nonlinear dynamic models, Blackwell.
- Hoshino (2022) Hoshino, T., 2022. Sieve IV estimation of cross-sectional interaction models with nonparametric endogenous effect, Journal of Econometrics, 229 (2), 263–275.
- Hoshino (2024) Hoshino, T., 2024. Functional spatial autoregressive models, arXiv preprint, 2402.14763.
- Hron et al. (2016) Hron, K., Menafoglio, A., Templ, M., Hruzová, K., and Filzmoser, P., 2016. Simplicial principal component analysis for density functions in bayes spaces, Computational Statistics & Data Analysis, 94, 330–350.
- Hyndman and Ullah (2007) Hyndman, R.J. and Ullah, M.S., 2007. Robust forecasting of mortality and fertility rates: A functional data approach, Computational Statistics & Data Analysis, 51 (10), 4942–4956.
- Jenish (2012) Jenish, N., 2012. Nonparametric spatial regression under near-epoch dependence, Journal of Econometrics, 167 (1), 224–239.
- Jenish and Prucha (2009) Jenish, N. and Prucha, I.R., 2009. Central limit theorems and uniform laws of large numbers for arrays of random fields, Journal of Econometrics, 150 (1), 86–98.
- Jenish and Prucha (2012) Jenish, N. and Prucha, I.R., 2012. On spatial processes and asymptotic inference under near-epoch dependence, Journal of Econometrics, 170 (1), 178–190.
- Koop et al. (1996) Koop, G., Pesaran, M.H., and Potter, S.M., 1996. Impulse response analysis in nonlinear multivariate models, Journal of Econometrics, 74 (1), 119–147.
- Kress (2014) Kress, R., 2014. Linear Integral Equations, Third Edition, Springer.
- Kuersteiner and Prucha (2020) Kuersteiner, G.M. and Prucha, I.R., 2020. Dynamic spatial panel models: Networks, common shocks, and sequential exogeneity, Econometrica, 88 (5), 2109–2146.
- Lee and Yu (2010) Lee, L.F. and Yu, J., 2010. Estimation of spatial autoregressive panel data models with fixed effects, Journal of Econometrics, 154 (2), 165–185.
- Lee and Yu (2014) Lee, L.F. and Yu, J., 2014. Efficient GMM estimation of spatial dynamic panel data models with fixed effects, Journal of Econometrics, 180 (2), 174–197.
- Lin et al. (2018) Lin, L., He, Z., and Peeta, S., 2018. Predicting station-level hourly demand in a large-scale bike-sharing network: A graph convolutional neural network approach, Transportation Research Part C: Emerging Technologies, 97, 258–276.
- Lin and Lee (2010) Lin, X. and Lee, L.F., 2010. GMM estimation of spatial autoregressive models with unknown heteroskedasticity, Journal of Econometrics, 157 (1), 34–52.
- Ma et al. (2024) Ma, T., Yao, F., and Zhou, Z., 2024. Network-level traffic flow prediction: Functional time series vs. functional neural network approach, The Annals of Applied Statistics, 18 (1), 424–444.
- Scott (1973) Scott, D.J., 1973. Central limit theorems for martingales and for processes with stationary increments using a skorokhod representation approach, Advances in Applied Probability, 5 (1), 119–137.
- Su and Hoshino (2016) Su, L. and Hoshino, T., 2016. Sieve instrumental variable quantile regression estimation of functional coefficient models, Journal of Econometrics, 191 (1), 231–254.
- Torti et al. (2021) Torti, A., Pini, A., and Vantini, S., 2021. Modelling time-varying mobility flows using function-on-function regression: Analysis of a bike sharing system in the city of milan, Journal of the Royal Statistical Society Series C: Applied Statistics, 70 (1), 226–247.
- Xu and Lee (2015) Xu, X. and Lee, L.F., 2015. A spatial autoregressive model with a nonlinear transformation of the dependent variable, Journal of Econometrics, 186 (1), 1–18.
- Yu et al. (2008) Yu, J., De Jong, R., and Lee, L.F., 2008. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large, Journal of Econometrics, 146 (1), 118–134.
- Zhu et al. (2022) Zhu, X., Cai, Z., and Ma, Y., 2022. Network functional varying coefficient model, Journal of the American Statistical Association, 117 (540), 2074–2085.