
1 School of Mathematics, Jilin University, Changchun 130012, China.
2 Department of Statistics, Florida State University, Tallahassee, FL 32306, U.S.A.

Regression analysis of mixed sparse synchronous and asynchronous longitudinal covariates with varying-coefficient models

Congmin Liu1, Zhuowei Sun1 and Hongyuan Cao2
Abstract

We consider varying-coefficient models for mixed synchronous and asynchronous longitudinal covariates, where asynchronicity refers to the misalignment of longitudinal measurement times within an individual. We propose three different methods of parameter estimation and inference. The first method is a one-step approach that estimates the non-parametric regression functions for synchronous and asynchronous longitudinal covariates simultaneously. The second method is a two-step approach: in the first step, the longitudinal response is regressed on the centered synchronous longitudinal covariates, and in the second step, the residuals from the first step are regressed on the asynchronous longitudinal covariates. The third method is the same as the second except that, in the first step, we omit the asynchronous longitudinal covariate and include a non-parametric intercept in the regression of the longitudinal response on the synchronous longitudinal covariates. We further construct simultaneous confidence bands for the non-parametric regression functions to quantify the overall magnitude of variation. Extensive simulation studies provide numerical support for the theoretical findings. The practical utility of the methods is illustrated on a dataset from the ADNI study.

Keywords: Asynchronous longitudinal data; Convergence rate; Kernel-weighting; Simultaneous confidence band; Varying-coefficient model.

1 Introduction

Asynchronous longitudinal data arise when the longitudinal covariates and response are misaligned within individuals. One example of asynchronous longitudinal data is the analysis of electronic health records (EHRs). EHR data consist of longitudinal medical records of a large number of patients in one or more electronic health care systems, such as vital signs, laboratory tests, procedure codes, and medications, among others (Lou et al., 2021). Due to the retrospective nature of EHRs, measurements are collected at each clinical encounter, and the measurement times can be irregular, sparse, and heterogeneous across patients and asynchronous within patients. Another example comes from a dataset in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study (Li et al., 2022). In this dataset, 256 subjects are followed up for 5 years, where cognitive decline metrics, such as the Mini-Mental State Examination (MMSE) score, are mismatched within individuals with medical imaging measurements, such as fractional anisotropy (FA), which reflects fiber density, axonal diameter, and myelination in white matter. Other covariates, such as age, education level, and the number of APOE4 genes, a genetic risk factor for Alzheimer's disease, are synchronous with the longitudinal response MMSE, creating mixed synchronous and asynchronous longitudinal covariates. In Figure 1, we plot the visit times for MMSE scores and FA of all patients and of two randomly selected patients.

Figure 1: Asynchronous observation times of MMSE and FA.

For regression analysis of sparse asynchronous longitudinal data, several methods have been proposed in the literature. For example, Xiong and Dubin (2010) proposed to bin the asynchronous longitudinal covariates to align with the longitudinal response, after which classic longitudinal data analysis methods can be used. Sentürk et al. (2013) explicitly addressed the asynchronous setting for generalized varying-coefficient models with one covariate but did not provide the theoretical properties of the estimators. Cao et al. (2015) proposed a non-parametric kernel weighting approach for generalized linear models to address the asynchronous structure and rigorously established the consistency and asymptotic normality of the resulting estimators. This was extended to a more general setup in Cao et al. (2016) and a partially linear model in Chen and Cao (2017). Sun et al. (2021) studied an informative measurement process for asynchronous longitudinal data. Li et al. (2022) proposed a class of generalized functional partial-linear varying-coefficient models for temporally asynchronous functional neuroimaging data. These methods assume that all longitudinal covariates share the same measurement times, which are asynchronous with the longitudinal response. The problem of mixed synchronous and asynchronous longitudinal covariates has not been considered before.

In this paper, we propose statistical methods for estimation and inference of mixed synchronous and asynchronous longitudinal covariates in varying-coefficient models. There is a large literature on varying-coefficient models (Hastie and Tibshirani, 1993). For example, Fan et al. (2007) considered a semiparametric varying-coefficient partially linear model for synchronous longitudinal data. The main methodological and theoretical developments on varying-coefficient models are summarized in Fan and Zhang (2008). Unlike these classic results, the data structure we work with is very different. The longitudinal covariates have two parts: one part is measured synchronously with the longitudinal response, and the other part is asynchronous with the longitudinal response. We consider a dynamic model that allows the covariates' effects to change with respect to time. To evaluate the overall pattern and magnitude of the time-dependent coefficients, or to test whether certain parametric functions are adequate in describing the overall trend of the regression relationship over time, it is more informative to construct simultaneous confidence bands (SCBs) of the non-parametric regression functions. Theoretically, to construct SCBs, one has to establish the limiting distribution of the maximum deviation between the estimated and the true function. Due to the sparse, irregular, and asynchronous longitudinal data structure, one has to investigate a dependent empirical process uniformly, controlling stochastic variation over time and over both the synchronous and asynchronous longitudinal processes. Progress on SCBs for longitudinal data can be found in Ma et al. (2012), Gu et al. (2014), Zheng et al. (2014), and Cao et al. (2018), among others. For time series data, the simultaneous inference problem is studied in Zhou and Wu (2010) and Liu and Wu (2010), among others. Due to the slow rate of convergence of the extreme value distribution, a resampling method, such as the wild bootstrap, is generally adopted in SCB construction (Zhu et al., 2007, 2012). Theoretical guarantees for the wild bootstrap can be found in Wu (1986), Liu (1988), and Chapter 2.9 of van der Vaart and Wellner (1996).

To improve the efficiency of the varying-coefficient estimation for synchronous longitudinal covariates, we propose a two-step method. In the first step, we regress the response on the synchronous longitudinal covariates; in the second step, the residuals from the first step are regressed on the asynchronous longitudinal covariates. By avoiding unnecessary smoothing of the synchronous longitudinal covariates, we obtain the non-parametric rate of convergence $n^{2/5}$ for their time-varying coefficient estimation, which is faster than the $n^{1/3}$ rate of convergence of a method that estimates the coefficients of mixed synchronous and asynchronous longitudinal covariates simultaneously. The trade-off is that the two-step method requires the synchronous and asynchronous longitudinal covariates to be uncorrelated at all times, whereas the one-step method does not impose such an assumption.

The remainder of the paper is organized as follows. Section 2 establishes methods and asymptotic theory for varying-coefficient estimation of mixed synchronous and asynchronous longitudinal covariates using one-step and two-step methods, respectively. We also develop an inferential strategy with simultaneous confidence bands for uncertainty quantification of the non-parametric regression functions. Section 3 studies the finite-sample performance of the proposed methods through simulations. Section 4 analyzes a dataset from the ADNI study. Section 5 gives some concluding remarks. All technical proofs are relegated to the Appendix.

2 Estimation and inference

Suppose we have a longitudinal study with $n$ subjects. For the $i$th subject, denote $Y_i(t)$ as the response variable at time $t$, and let $X_i(t)$ be a $p\times 1$ vector of time-dependent covariates and $Z_i(t)$ be a $q\times 1$ vector of time-dependent covariates, respectively. Mixed synchronous and asynchronous longitudinal covariates for the $i$th subject refer to the observations $\{Y_i(T_{ij}), X_i(T_{ij}), Z_i(S_{ik})\}$, $j=1,\ldots,L_i$; $k=1,\ldots,M_i$, where $T_{ij}, j=1,\ldots,L_i$, are the observation times for the longitudinal response and synchronous longitudinal covariates, and $S_{ik}, k=1,\ldots,M_i$, are the observation times for the asynchronous longitudinal covariates, with the requirement that $L_i$ and $M_i$ are finite with probability $1$. We use univariate and bivariate counting processes to represent the observation times. Specifically, for subject $i=1,\ldots,n$, $N_i(t)=\sum_{j=1}^{L_i}I(T_{ij}\leq t)$ counts the number of observation times up to $t$ for the response and synchronous longitudinal covariates, and $N^*_i(t,s)=\sum_{j=1}^{L_i}\sum_{k=1}^{M_i}I(T_{ij}\leq t, S_{ik}\leq s)$ counts the number of observation-time pairs up to $t$ for the response and synchronous longitudinal covariates and up to $s$ for the asynchronous longitudinal covariates (Cao et al., 2015, 2016).

We consider the following varying-coefficient model:

$$E\{Y(t)\mid X(t),Z(t)\}=X(t)^T\beta(t)+Z(t)^T\gamma(t), \qquad (2.1)$$

where $\beta(t)$ and $\gamma(t)$ are non-parametric regression functions for the synchronous and asynchronous longitudinal covariates, respectively. We are interested in their statistical estimation and inference.

2.1 A one-step approach

To estimate $\beta(t)$ and $\gamma(t)$ simultaneously, we consider a local linear estimator (Fan and Gijbels, 1996; Cao et al., 2018):

$$\begin{split}
&\{\hat{\beta}_w(t),\hat{\dot{\beta}}_w(t),\hat{\gamma}_w(t),\hat{\dot{\gamma}}_w(t)\}\\
&=\mathop{\arg\min}_{\{\beta_0,\beta_1,\gamma_0,\gamma_1\}}\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)\big\{Y_i(t_1)-X_i(t_1)^T\beta_0-X_i(t_1)^T\beta_1(t_1-t)\\
&\qquad\qquad\qquad\qquad-Z_i(t_2)^T\gamma_0-Z_i(t_2)^T\gamma_1(t_2-t)\big\}^2\,dN^*_i(t_1,t_2).
\end{split} \qquad (2.2)$$
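The minimization in (2.2) is a kernel-weighted least squares problem: each pair of observation times $(T_{ij}, S_{ik})$ contributes one weighted row. The following is a minimal Python sketch, assuming a per-subject data layout with fields T, S, Y, X, Z; the layout, names, and `epanechnikov` helper are illustrative, not the paper's implementation:

```python
import numpy as np

def epanechnikov(u):
    """Univariate Epanechnikov kernel K(u) = 0.75 (1 - u^2)_+."""
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def one_step_estimate(t, data, h1, h2):
    """One-step local linear estimator (2.2) at a fixed time t.

    `data` is a list with one dict per subject holding
      T: times of (Y, X),  S: times of Z,
      Y: responses at T,   X: (L_i, p) covariates,  Z: (M_i, q) covariates.
    """
    rows, wts, ys = [], [], []
    for d in data:
        for j, t1 in enumerate(d["T"]):
            for k, t2 in enumerate(d["S"]):
                # product kernel K_{h1,h2}(t1 - t, t2 - t)
                w = (epanechnikov((t1 - t) / h1) / h1) * (epanechnikov((t2 - t) / h2) / h2)
                if w == 0.0:
                    continue
                x, z = d["X"][j], d["Z"][k]
                # R_i(t1, t2, t) = {X^T, X^T (t1 - t), Z^T, Z^T (t2 - t)}^T
                rows.append(np.concatenate([x, x * (t1 - t), z, z * (t2 - t)]))
                wts.append(w)
                ys.append(d["Y"][j])
    R, w, y = np.array(rows), np.array(wts), np.array(ys)
    # weighted least squares: solve (R^T W R) rho = R^T W y
    rho = np.linalg.solve(R.T @ (R * w[:, None]), R.T @ (w * y))
    p, q = data[0]["X"].shape[1], data[0]["Z"].shape[1]
    return rho[:p], rho[2 * p:2 * p + q], rho   # beta_hat, gamma_hat, full rho
```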

For any fixed time point $t$, denote $R(t_1,t_2,t)=\{X(t_1)^T, X(t_1)^T(t_1-t), Z(t_2)^T, Z(t_2)^T(t_2-t)\}^T$ and $\rho_0(t)\triangleq\{\beta_0(t)^T,\dot{\beta}_0(t)^T,\gamma_0(t)^T,\dot{\gamma}_0(t)^T\}^T$, where $\beta_0(t)$ and $\gamma_0(t)$ are the true non-parametric regression functions and $\dot{\beta}_0(t)$ and $\dot{\gamma}_0(t)$ are the corresponding derivative functions. Assume that the conditional variance function $\mathrm{var}\{Y(t)\mid X(t),Z(t)\}=\sigma\{t,X(t),Z(t)\}^2<\infty$. We need the following assumptions.

  1. (A1) $N^*_i(t,s)$ is independent of $\{X_i(t),Z_i(t),Y_i(t)\}$ and $E\{dN^*_i(t,s)\}=\eta(t,s)\,dt\,ds$, $i=1,\ldots,n$, where $\eta(t,s)$ is a twice continuously differentiable function for any $(t,s)\in[0,1]^{\otimes 2}$. For $t_1\neq s_1$, $t_2\neq s_2$, $P\{dN^*(t_1,t_2)=1\mid N^*(s_1,s_2)-N^*(s_1-,s_2-)=1\}=f(t_1,t_2,s_1,s_2)\,dt_1\,dt_2$, where $f(t_1,t_2,s_1,s_2)$ is continuous for $t_1\neq s_1$, $t_2\neq s_2$, and $f(t_1\pm,t_2\pm,s_1\pm,s_2\pm)$ exist.

  2. (A2) $\rho_0(t)$ is twice continuously differentiable for all $t\in[0,1]$. For any fixed time point $t$, $E\{R(t_1,t_2,t)R(s_1,s_2,t)^T\}\in\mathbb{R}^{2(p+q)\times 2(p+q)}$ and $E\{R(t_1,t_2,t)\}\in\mathbb{R}^{2(p+q)}$ are twice continuously differentiable for $(t_1,t_2,s_1,s_2)\in[0,1]^{\otimes 4}$. $E\{C(t,t)C(t,t)^T\}\eta(t,t)\in\mathbb{R}^{(p+q)\times(p+q)}$ is positive definite for all $t\in[0,1]$, where $C(t,t)=\{X(t)^T,Z(t)^T\}^T$. Moreover,
$$\big\|E\big[C(t,t)C(t,t)^T\sigma\{t,X(t),Z(t)\}^2\big]\big\|_\infty\,\eta(t,t)<\infty,\quad\forall t\in[0,1],$$
where for a square matrix $A$, $\|A\|_\infty=\max_{1\leq i\leq n}\sum_{j=1}^n|a_{ij}|$.

  3. (A3) $K(\cdot,\cdot)$ is a bivariate density function satisfying $\iint z_1K(z_1,z_2)\,dz_1dz_2=\iint z_2K(z_1,z_2)\,dz_1dz_2=0$, $\iint K(z_1,z_2)^2\,dz_1dz_2<\infty$, $\iint z_1^2K(z_1,z_2)\,dz_1dz_2<\infty$, $\iint z_2^2K(z_1,z_2)\,dz_1dz_2<\infty$, and $\iint z_1^2z_2^2K(z_1,z_2)\,dz_1dz_2<\infty$.

  4. (A4) $(nh_1h_2)^{1/2}(h_1^2+h_2^2)\to 0$ and $nh_1h_2\to\infty$.

Condition (A1) requires that the observation process be independent of the covariate and response processes. Analogous assumptions have been made in Lin and Ying (2001) and Cao et al. (2015). Condition (A2) ensures the identifiability of $\rho_0(t)$ at any fixed time point $t$ and imposes smoothness assumptions on the expectations of certain functionals of $R(t_1,t_2,t)$ and on $\rho_0(t)$. Conditions (A3) and (A4) specify valid kernels and bandwidths. For the bivariate kernel function, we recommend kernel functions with bounded support, such as the product of univariate Epanechnikov kernels.

The following theorem establishes the asymptotic properties of $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$. The proofs are relegated to the Appendix.

Theorem 1.

Under conditions (A1)-(A4), as $n\rightarrow\infty$, we have

$$\sqrt{nh_1h_2}\left[\{\hat{\beta}_w(t)-\beta_0(t)\}^T,\{\hat{\gamma}_w(t)-\gamma_0(t)\}^T\right]^T\stackrel{d}{\to}N\{0,A^*(t)^{-1}\Sigma^*(t)A^*(t)^{-1}\},$$

where

$$A^*(t)=E\{C(t,t)C(t,t)^T\}\eta(t,t)$$

and

$$\Sigma^*(t)=\Big\{\iint K(z_1,z_2)^2\,dz_1dz_2\Big\}E\left[C(t,t)C(t,t)^T\sigma\{t,X(t),Z(t)\}^2\right]\eta(t,t).$$

Theorem 1 derives the joint asymptotic distribution of $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$. If we let $h_\alpha=h_1=h_2$, the bias of $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$ in Theorem 1 is $O(h_\alpha^2)$, which is the same as that in Theorem 2 of Cao et al. (2015). The $O(h_\alpha)$ term vanishes because we use symmetric density functions as the kernel functions, as specified in (A3). As the sample size $n$ goes to infinity, under (A4), the bias terms vanish. In general, the off-diagonal block of the limiting variance covariance matrix is non-zero. In the special case that $E\{X(t)Z(t)^T\}=0_{p\times q}$, $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$ are asymptotically independent. Moreover, if we assume that $\sigma\{t,X(t),Z(t)\}^2=\sigma(t)^2$, then $\mathrm{Var}\{\hat{\beta}_w(t)\}=[E\{X(t)X(t)^T\}\eta(t,t)]^{-1}\big\{\iint K(z_1,z_2)^2\,dz_1dz_2\big\}\sigma(t)^2$ and $\mathrm{Var}\{\hat{\gamma}_w(t)\}=[E\{Z(t)Z(t)^T\}\eta(t,t)]^{-1}\big\{\iint K(z_1,z_2)^2\,dz_1dz_2\big\}\sigma(t)^2$, respectively.

Our method depends on a proper choice of bandwidths. If we let $h_\alpha=h_1=h_2$, then according to condition (A4), a valid bandwidth is larger than $O(n^{-1/2})$; since the bias of $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$ in Theorem 1 is of order $h_\alpha^2$, we should choose a bandwidth $h_\alpha=o(n^{-1/6})$. With this choice of $h_\alpha$, we obtain a rate of convergence $O(n^{1/3})$, which is the same as that achieved in Cao et al. (2015). This is slower than the non-parametric rate of convergence $n^{2/5}$ due to the asynchronous longitudinal data structure.

In Theorem 1, $A^*(t)$ and $\Sigma^*(t)$ are not directly computable. In practice, we use the estimating equation derived from (2.2) and estimate the variance covariance matrix of $\hat{\rho}_w(t)$ with the sandwich formula:

$$\begin{split}
&\widehat{\mathrm{var}}\{\hat{\rho}_w(t)\}\\
&=\left\{\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)R_i(t_1,t_2,t)R_i(t_1,t_2,t)^T\,dN^*_i(t_1,t_2)\right\}^{-1}\\
&\quad\times\sum_{i=1}^n\left[\iint K_{h_1,h_2}(t_1-t,t_2-t)R_i(t_1,t_2,t)\{Y_i(t_1)-R_i(t_1,t_2,t)^T\hat{\rho}_w(t)\}\,dN^*_i(t_1,t_2)\right]^{\otimes 2}\\
&\quad\times\left\{\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)R_i(t_1,t_2,t)R_i(t_1,t_2,t)^T\,dN^*_i(t_1,t_2)\right\}^{-1}.
\end{split}$$

The variance covariance matrices of $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$ are the upper $p\times p$ submatrix of the upper left $(2p)\times(2p)$ submatrix and the upper $q\times q$ submatrix of the lower right $(2q)\times(2q)$ submatrix of $\widehat{\mathrm{var}}\{\hat{\rho}_w(t)\}$, respectively.
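In the notation of the earlier sketch, the sandwich formula accumulates the bread term over all time pairs and the meat term over per-subject estimating functions. A minimal sketch, reusing the assumed data layout and the `epanechnikov` helper:

```python
def sandwich_var(t, data, h1, h2, rho):
    """Sandwich variance estimator for rho_hat_w(t); a sketch."""
    dim = len(rho)
    bread = np.zeros((dim, dim))   # sum_i of the kernel-weighted R R^T terms
    meat = np.zeros((dim, dim))    # sum_i of squared subject-level scores
    for d in data:
        gi = np.zeros(dim)         # subject i's estimating function
        for j, t1 in enumerate(d["T"]):
            for k, t2 in enumerate(d["S"]):
                w = (epanechnikov((t1 - t) / h1) / h1) * (epanechnikov((t2 - t) / h2) / h2)
                if w == 0.0:
                    continue
                x, z = d["X"][j], d["Z"][k]
                r = np.concatenate([x, x * (t1 - t), z, z * (t2 - t)])
                bread += w * np.outer(r, r)
                gi += w * r * (d["Y"][j] - r @ rho)
        meat += np.outer(gi, gi)
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv
```

The upper left $p\times p$ block of the result then estimates the variance of $\hat{\beta}_w(t)$, and the $q\times q$ block starting at row and column $2p$ estimates that of $\hat{\gamma}_w(t)$, matching the block positions described above.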

2.2 Simultaneous confidence bands

To quantify uncertainty and evaluate the overall magnitude of variation of the non-parametric regression functions, we construct simultaneous confidence bands (SCBs). Here, we use $\gamma(t)$ for illustration. Specifically, for a pre-specified confidence level $1-\alpha$ and a given $q$-dimensional smooth function $a(t)$, we aim to find smooth random functions $L(t)$ and $U(t)$ such that

$$\mathbb{P}\{L(t)\leq a(t)^T\gamma(t)\leq U(t),\ \forall t\in[a,b]\}\rightarrow 1-\alpha$$

as the number of subjects $n\rightarrow\infty$, for a pre-specified closed interval $[a,b]$. Theoretical results on SCB construction rely on extreme value theory, which has a slow rate of convergence (Fan and Zhang, 2000; Cao et al., 2018). We therefore adopt a wild bootstrap approach that resamples the residuals (Wu, 1986; Liu, 1988). The wild bootstrap is computationally efficient, as it multiplies the residuals by random disturbances without repeated estimation of the non-parametric regression function. The specific steps are as follows.

Step 1. For each subject $i$, $i=1,2,\ldots,n$, calculate the kernel-weighted residual

$$\hat{r}_i(t_1,t_2,t)=K_{h_1,h_2}(t_1-t,t_2-t)\{Y_i(t_1)-R_i(t_1,t_2,t)^T\hat{\rho}_w(t)\}.$$

Step 2. Obtain an approximation of $\hat{\gamma}_w(t)-\gamma_0(t)$ through

$$\hat{Q}_n(t)=\frac{1}{n}\sum_{i=1}^n\{f_n(t)\}^{-1}\iint Z_i(t_2)\hat{r}_i(t_1,t_2,t)\,dN^*_i(t_1,t_2),$$

where $f_n(t)=\frac{1}{n}\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)Z_i(t_2)Z_i(t_2)^T\,dN^*_i(t_1,t_2)$.

Step 3. Independently generate $\{u_i, i=1,\ldots,n\}$ from a distribution with mean $0$ and standard deviation $1$. For example, we can use $N(0,1)$ or the Rademacher distribution, which takes values $+1$ and $-1$ with equal probability. Construct the stochastic process

$$\hat{Q}_n^u(t)=\frac{1}{n}\sum_{i=1}^n u_i\{f_n(t)\}^{-1}\iint Z_i(t_2)\hat{r}_i(t_1,t_2,t)\,dN^*_i(t_1,t_2).$$

Step 4. Repeat Step 3 $B$ times to obtain $\{\sup_t|a(t)^T\hat{Q}_n^{u(1)}(t)|,\ldots,\sup_t|a(t)^T\hat{Q}_n^{u(B)}(t)|\}$, and denote its $1-\alpha$ empirical percentile as $c_\alpha$. The SCB for $a(t)^T\gamma(t)$ can be written as

$$\left(a(t)^T\hat{\gamma}_w(t)-c_\alpha,\ a(t)^T\hat{\gamma}_w(t)+c_\alpha\right).$$

Based on extensive simulation studies with different multiplier distributions, we recommend the Rademacher distribution as the multiplier in Step 3. The details are omitted.
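Steps 3 and 4 involve only resampling the multipliers, so the subject-level contributions to $\hat{Q}_n(t)$ can be precomputed once on a grid of time points. A minimal sketch; the array layout is an assumption for illustration:

```python
rng = np.random.default_rng(0)

def scb_critical_value(Qn_terms, a_grid, alpha=0.05, B=1000):
    """Wild-bootstrap critical value c_alpha for the SCB (Steps 3-4).

    Qn_terms: (n, n_grid, q) array; Qn_terms[i, g] is subject i's
        contribution {f_n(t_g)}^{-1} * (integral of Z_i r_hat_i dN*_i)
        at grid point t_g, so Q_hat_n(t_g) = Qn_terms[:, g].mean(axis=0).
    a_grid: (n_grid, q) array holding a(t_g) on the same grid.
    """
    n = Qn_terms.shape[0]
    sups = np.empty(B)
    for b in range(B):
        u = rng.choice([-1.0, 1.0], size=n)            # Rademacher multipliers
        Qu = np.einsum("i,igq->gq", u, Qn_terms) / n   # Q_hat_n^u on the grid
        sups[b] = np.max(np.abs(np.sum(a_grid * Qu, axis=1)))
    return np.quantile(sups, 1 - alpha)
```

The band is then $a(t)^T\hat{\gamma}_w(t)\pm c_\alpha$ evaluated over the grid.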

We remark that for a given $p$-dimensional smooth function $d(t)$, the construction of the SCB for $d(t)^T\beta(t)$ is exactly the same, except that in Step 2 we let $\hat{J}_n(t)=\frac{1}{n}\sum_{i=1}^n\{g_n(t)\}^{-1}\iint X_i(t_1)\hat{r}_i(t_1,t_2,t)\,dN^*_i(t_1,t_2)$, where $g_n(t)=\frac{1}{n}\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)X_i(t_1)X_i(t_1)^T\,dN^*_i(t_1,t_2)$. The remaining steps are based on $\hat{J}_n(t)$, with $a(t)$ replaced by $d(t)$.

2.3 A two-step approach

In (2.1), $\beta(t)$ is the regression function of the synchronous longitudinal covariates. In classic non-parametric regression analysis of longitudinal data, we get the $O(n^{2/5})$ rate of convergence. Can we achieve the same rate of convergence for the estimation of $\beta(t)$ with mixed synchronous and asynchronous longitudinal covariates? In this subsection, we provide an affirmative answer to this question.

Specifically, we propose a two-step approach to estimate $\beta(t)$ and $\gamma(t)$, respectively. In the first step, we aim to obtain unbiased estimation of $\beta(t)$ from (2.1) using only $X(t)$ and $Y(t)$. There are two strategies to handle the part that involves $\gamma(t)$: we can either remove it or absorb it into a non-parametric function. In the second step, we regress the residuals from the first step on the asynchronous longitudinal covariates $Z(t)$ to obtain an estimate of the regression function $\gamma(t)$, similar to that in Cao et al. (2015).

We need a critical assumption:

$$E\{Z(t)\mid X(t)\}=E\{Z(t)\}. \qquad (2.3)$$

Condition (2.3) states that $Z(t)$ is mean independent of $X(t)$ at any time $t$, which implies that $X(t)$ and $Z(t)$ are uncorrelated. This condition is plausible when $X(t)$ indicates a time-invariant treatment that is randomized and independent of the other covariates; it means that there is no unmeasured confounding between treatment and outcome. The assumption can be strong in observational studies, where unmeasured confounders abound. In classic linear models, when the covariates are orthogonal to each other, the estimation of the regression coefficients remains unbiased, with a larger variance, if some covariates are omitted. Unlike in linear models, for longitudinal data bias occurs when orthogonal covariates are omitted unless a non-parametric intercept is added. If $X(t)$ and $Z(t)$ are correlated, the proposed two-step method will be biased.

A centering approach to estimate $\beta(t)$.  From (2.1), taking a conditional expectation given $X(t)$ and using (2.3), we have

$$E\{Y(t)\mid X(t)\}=X(t)^T\beta(t)+\{EZ(t)\}^T\gamma(t). \qquad (2.4)$$

Taking a further expectation, we obtain the unconditional expectation of $Y(t)$:

$$E\{Y(t)\}=\{EX(t)\}^T\beta(t)+\{EZ(t)\}^T\gamma(t). \qquad (2.5)$$

By taking the difference between (2.4) and (2.5), we obtain

$$E\{\tilde{Y}(t)\mid\tilde{X}(t)\}=\tilde{X}(t)^T\beta(t), \qquad (2.6)$$

where $\tilde{Y}(t)=Y(t)-E\{Y(t)\}$ and $\tilde{X}(t)=X(t)-E\{X(t)\}$. Unbiased estimation of $\beta(t)$ can be obtained through (2.6). In (2.6), $m_Y(t)=E\{Y(t)\}$ and $m_X(t)=E\{X(t)\}$ are not directly observed. We propose Nadaraya-Watson type estimators as follows (Nadaraya, 1964; Watson, 1964). Let $\hat{m}_Y(t_0)=\{\sum_{i=1}^n\int K_h(t-t_0)Y_i(t)\,dN_i(t)\}/\{\sum_{i=1}^n\int K_h(t-t_0)\,dN_i(t)\}$ and $\hat{m}_X(t_0)=\{\sum_{i=1}^n\int K_h(t-t_0)X_i(t)\,dN_i(t)\}/\{\sum_{i=1}^n\int K_h(t-t_0)\,dN_i(t)\}$, where $K(\cdot)$ is a kernel function with bounded support satisfying $K(\cdot)\geq 0$ and $\int_{-\infty}^{\infty}K(t)\,dt=1$, and $K_h(t)=h^{-1}K(t/h)$. The bandwidth satisfies $h\rightarrow 0$ and $nh\rightarrow\infty$. Denote $\hat{Y}_i(t)=Y_i(t)-\hat{m}_Y(t)$ and $\hat{X}_i(t)=X_i(t)-\hat{m}_X(t)$. We aim to find

$$\begin{split}
&\{\hat{\beta}_c(t),\hat{\dot{\beta}}_c(t)\}\\
&=\mathop{\arg\min}_{\beta_0,\beta_1\in R^p}\Big[\frac{1}{n}\sum_{i=1}^n\int K_h(t_1-t)\{\hat{Y}_i(t_1)-\hat{X}_i(t_1)^T\beta_0-\hat{X}_i(t_1)^T\beta_1(t_1-t)\}^2\,dN_i(t_1)\Big].
\end{split} \qquad (2.7)$$

Define

$$\begin{split}
S_{n,l}(t)&=\frac{1}{nh}\sum_{i=1}^n\int K_h(t_1-t)\hat{X}_i(t_1)\hat{X}_i(t_1)^T\{(t_1-t)/h\}^l\,dN_i(t_1),\quad l=0,1,2,\\
\mbox{and}\quad q_{n,l}(t)&=\frac{1}{nh}\sum_{i=1}^n\int K_h(t_1-t)\hat{X}_i(t_1)\hat{Y}_i(t_1)\{(t_1-t)/h\}^l\,dN_i(t_1),\quad l=0,1.
\end{split}$$

Then

$$\begin{pmatrix}\hat{\beta}_c(t)\\ h\hat{\dot{\beta}}_c(t)\end{pmatrix}=\begin{pmatrix}S_{n,0}&S_{n,1}\\ S_{n,1}&S_{n,2}\end{pmatrix}^{-1}\begin{pmatrix}q_{n,0}\\ q_{n,1}\end{pmatrix}.$$
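A minimal sketch of this first-step centering estimator, pooling observations across subjects; the function and argument names are illustrative, and `epanechnikov` is as in the earlier sketch:

```python
def nw_mean(t0, times, values, h):
    """Nadaraya-Watson estimate of a marginal mean at t0 from pooled data."""
    w = epanechnikov((times - t0) / h) / h
    return (w @ values) / w.sum()

def centered_beta(t, T_all, Y_all, X_all, h_nw, h):
    """Two-step centering estimator of beta(t) via (2.7); a sketch.

    T_all: pooled observation times; Y_all: pooled responses;
    X_all: (N, p) pooled synchronous covariates.
    """
    # center Y and X by their estimated marginal means
    Yc = Y_all - np.array([nw_mean(s, T_all, Y_all, h_nw) for s in T_all])
    Xc = X_all - np.vstack([nw_mean(s, T_all, X_all, h_nw) for s in T_all])
    # local linear fit of centered Y on centered X at time t
    w = epanechnikov((T_all - t) / h) / h
    W = np.hstack([Xc, Xc * (T_all - t)[:, None]])   # {X_hat^T, X_hat^T (t1 - t)}
    theta = np.linalg.solve(W.T @ (W * w[:, None]), W.T @ (w * Yc))
    p = X_all.shape[1]
    return theta[:p]   # beta_c_hat(t); theta[p:] estimates the derivative part
```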

Next, we present the asymptotic properties of $\hat{\beta}_c(t)$. Denote $\tilde{W}_i(t_1,t)=\{\tilde{X}_i(t_1)^T,\tilde{X}_i(t_1)^T(t_1-t)\}^T$, $\theta_0(t)=\{\beta_0(t)^T,\dot{\beta}_0(t)^T\}^T$, $\hat{\theta}_c(t)=\{\hat{\beta}_c(t)^T,\hat{\dot{\beta}}_c(t)^T\}^T$, and $\mathrm{var}\{\tilde{Y}(t)\mid\tilde{X}(t)\}=\sigma\{t,\tilde{X}(t)\}^2$. We need the following assumptions.

  1. (A5) $N_i(t)$ is independent of $\{X_i(t),Y_i(t)\}$, and $E\{dN_i(t)\}=\lambda(t)\,dt$, $i=1,\ldots,n$, where $\lambda(t)$ is twice continuously differentiable for all $t\in[0,1]$.

  2. (A6) $\theta_0(t)$ is twice continuously differentiable for all $t\in[0,1]$. For any fixed time point $t$, $E\{\tilde{W}(t_1,t)\}$ and $E\{\tilde{W}(t_1,t)\tilde{W}(s_1,t)^T\}$ are twice continuously differentiable for $(t_1,s_1)\in[0,1]^{\otimes 2}$. Furthermore, $E\{\tilde{X}(t)\tilde{X}(t)^T\}\lambda(t)$ is positive definite for all $t\in[0,1]$, and
$$\big\|E\big[\tilde{X}(t)\tilde{X}(t)^T\sigma\{t,\tilde{X}(t)\}^2\big]\big\|_\infty\,\lambda(t)<\infty,$$
where for a square matrix $A$, $\|A\|_\infty=\max_{1\leq i\leq n}\sum_{j=1}^n|a_{ij}|$.

  3. (A7) $K(\cdot)$ is a symmetric density function satisfying $\int zK(z)\,dz=0$, $\int z^2K(z)\,dz<\infty$, and $\int K(z)^2\,dz<\infty$.

  4. (A8) $nh\to\infty$ and $nh^5\to 0$.

Conditions (A5)-(A8) are similar in spirit to conditions (A1)-(A4) in Section 2.1. (A5) is the univariate version of (A1). (A6) is a modified identifiability condition for $\theta_0(t)$. (A7) and (A8) are requirements on the kernel function and the bandwidth.

We state the asymptotic distribution of $\hat{\beta}_c(t)$ in the following theorem and relegate the proofs to the Appendix.

Theorem 2.

Under conditions (2.3) and (A5)-(A8), as $n\rightarrow\infty$, we have

$$\sqrt{nh}\{\hat{\beta}_c(t)-\beta_0(t)\}\stackrel{d}{\to}N\{0,A^{-1}(t)\Sigma(t)A^{-1}(t)\},$$

where

$$A(t)=E\{\tilde{X}(t)\tilde{X}(t)^T\}\lambda(t)\quad\mbox{and}\quad\Sigma(t)=\Big\{\int K(z)^2\,dz\Big\}E\left[\tilde{X}(t)\tilde{X}(t)^T\sigma\{t,\tilde{X}(t)\}^2\right]\lambda(t).$$

We need one bandwidth $h$, and the bias of $\hat{\beta}_c(t)$ in Theorem 2 is $O(h^2)$, the same as in classic non-parametric regression, such as Theorem 1 of Fan and Zhang (2008). Our method depends on the selection of the bandwidth, which strikes a balance between bias and variability. From (A8), we shall choose the bandwidth $h=o(n^{-1/5})$. With this choice of bandwidth, we achieve a convergence rate $O(n^{2/5})$, the same as non-parametric regression for classic longitudinal data (Fan and Zhang, 2008) and faster than that obtained in Theorem 1. It may seem counterintuitive that we get an improved rate of convergence with less information: we do not use any information contained in $Z_i(S_{ik})$, $i=1,\ldots,n$; $k=1,\ldots,M_i$. The key is the orthogonality assumption imposed in (2.3), which allows us to obtain unbiased estimation of the non-parametric regression function $\beta_0(t)$. When (2.3) is violated, we can regress the residuals from regressing $Y(t)$ on $Z(t)$ on the residuals from regressing $X(t)$ on $Z(t)$, similar to the FWL theorem in linear models (Frisch and Waugh, 1933; Lovell, 1963; Ding, 2021). We conjecture that the results would be similar to the simultaneous estimation of $\beta(t)$ and $\gamma(t)$, with a slower convergence rate for the estimation of $\beta(t)$.

For statistical inference, we need to estimate the variance of $\hat{\beta}_c(t)$. We use the sandwich formula to estimate the variance covariance matrix of $\{\hat{\beta}_c(t),\hat{\dot{\beta}}_c(t)\}$ as follows:

$$\begin{split}
\widehat{\mathrm{var}}\{\hat{\theta}_c(t)\}=&\left\{\sum_{i=1}^n\int K_h(t_1-t)\hat{W}_i(t_1,t)\hat{W}_i(t_1,t)^T\,dN_i(t_1)\right\}^{-1}\\
&\times\sum_{i=1}^n\left[\int K_h(t_1-t)\hat{W}_i(t_1,t)\{\hat{Y}_i(t_1)-\hat{W}_i(t_1,t)^T\hat{\theta}_c(t)\}\,dN_i(t_1)\right]^{\otimes 2}\\
&\times\left\{\sum_{i=1}^n\int K_h(t_1-t)\hat{W}_i(t_1,t)\hat{W}_i(t_1,t)^T\,dN_i(t_1)\right\}^{-1},
\end{split}$$

where $\hat{W}_i(t_1,t)=\{\hat{X}_i(t_1)^T,\hat{X}_i(t_1)^T(t_1-t)\}^T$, $\hat{X}_i(t)=X_i(t)-\hat{m}_X(t)$, and $\hat{Y}_i(t)=Y_i(t)-\hat{m}_Y(t)$. The variance covariance matrix of $\hat{\beta}_c(t)$ is the upper left $p\times p$ submatrix of $\widehat{\mathrm{var}}\{\hat{\theta}_c(t)\}$, while we treat $\dot{\beta}_c(t)$ as a nuisance function. In the literature on density estimation, $\hat{\dot{\beta}}_c(t)$ estimated from (2.7) has more desirable properties than conventional kernel density estimation. More details can be found in Cattaneo et al. (2020).

A varying-coefficient model approach to estimate $\beta(t)$.  Another strategy to obtain unbiased estimation of $\beta(t)$ is to absorb the part involving $\gamma(t)$ into a non-parametric function. We use (2.5) and treat $E\{Z(t)\}^T\gamma(t)$ as a non-parametric intercept. Specifically, consider

$$E\{Y(t)\mid X(t)\}=X^*(t)^T\beta^*(t), \qquad (2.8)$$

where $X^*(t)=\{1,X(t)^T\}^T$ and $\beta^*(t)=\{\alpha(t),\beta(t)^T\}^T$, with $\alpha(t)$ and $\beta(t)$ non-parametric regression functions. Our aim is to find

$$\begin{split}
&\{\hat{\beta}^*_v(t),\hat{\dot{\beta}}^*_v(t)\}\\
&=\mathop{\arg\min}_{\beta_0^*,\beta_1^*\in R^{p+1}}\sum_{i=1}^n\int K_h(t_1-t)\{Y_i(t_1)-X_i^*(t_1)^T\beta_0^*-X_i^*(t_1)^T\beta_1^*(t_1-t)\}^2\,dN_i(t_1),
\end{split} \qquad (2.9)$$

where $K(\cdot)$ is a kernel function, $K_h(t)=K(t/h)/h$, and $h$ is a bandwidth.

Next, we present the asymptotic distribution of $\hat{\beta}_v(t)$ and relegate the proofs to the Appendix.

Theorem 3.

Under conditions (2.3) and (A5)-(A8), as $n\rightarrow\infty$, we have

$$\sqrt{nh}\{\hat{\beta}_v(t)-\beta_0(t)\}\stackrel{d}{\to}N\{0,A^{-1}(t)\Sigma(t)A^{-1}(t)\},$$

where $A(t)$ and $\Sigma(t)$ are specified in Theorem 2.

The asymptotic distribution and the bias of $\hat{\beta}_v(t)$ are the same as those of $\hat{\beta}_c(t)$; simulation studies provide numerical support for this. For variance estimation, let $Q_i(t_1,t)=\{1,X_i(t_1)^T,(t_1-t),X_i(t_1)^T(t_1-t)\}^T$ and $\hat{\iota}_v(t)\triangleq\{\hat{\alpha}_v(t),\hat{\beta}_v(t)^T,\hat{\dot{\alpha}}_v(t),\hat{\dot{\beta}}_v(t)^T\}^T$. We calculate the variance of $\hat{\iota}_v(t)$ by the following formula:

$$\begin{split}
\widehat{\mathrm{var}}\{\hat{\iota}_v(t)\}=&\left\{\sum_{i=1}^n\int K_h(t_1-t)Q_i(t_1,t)Q_i(t_1,t)^T\,dN_i(t_1)\right\}^{-1}\\
&\times\sum_{i=1}^n\left[\int K_h(t_1-t)Q_i(t_1,t)\{Y_i(t_1)-Q_i(t_1,t)^T\hat{\iota}_v(t)\}\,dN_i(t_1)\right]^{\otimes 2}\\
&\times\left\{\sum_{i=1}^n\int K_h(t_1-t)Q_i(t_1,t)Q_i(t_1,t)^T\,dN_i(t_1)\right\}^{-1}.
\end{split}$$

The variance covariance matrix of $\{\hat{\alpha}_v(t),\hat{\beta}_v(t)^T\}^T$ is the upper left $(p+1)\times(p+1)$ submatrix of $\widehat{\mathrm{var}}\{\hat{\iota}_v(t)\}$. There is a trade-off between the centering approach and the non-parametric intercept approach: in terms of computation, the centering approach is faster; if we are interested in the non-parametric intercept term, the varying-coefficient model approach is more informative.

A kernel weighting approach to estimate $\gamma(t)$.  In the first stage, $\beta(t)$ can be estimated using either the centering approach or the varying-coefficient model approach, and the two estimators have the same asymptotic distribution. Once $\hat{\beta}(t)$ is obtained, we can estimate $\gamma(t)$ by regressing the residuals from the first step on the asynchronous longitudinal covariates $Z(t)$. Due to the asynchronicity of $Z(t)$, we need two bandwidths in the calculation of $\hat{\gamma}(t)$. Specifically,

$$\begin{split}
\{\hat{\gamma}(t),\hat{\dot{\gamma}}(t)\}=\mathop{\arg\min}_{\gamma_0,\gamma_1\in R^q}\sum_{i=1}^n\iint&K_{h_1,h_2}(t_1-t,t_2-t)\{Y_i(t_1)-X_i(t_1)^T\hat{\beta}(t_1)\\
&-Z_i(t_2)^T\gamma_0-Z_i(t_2)^T\gamma_1(t_2-t)\}^2\,dN^*_i(t_1,t_2),
\end{split} \qquad (2.10)$$

where $K_{h_1,h_2}(t,s)=K(t/h_1,s/h_2)/(h_1h_2)$, $K(t,s)$ is a bivariate kernel function, usually taken to be the product of univariate kernel functions, and $h_1$ and $h_2$ are bandwidths. As $\hat{\beta}(t)$ has a faster convergence rate, whether we use the true $\beta(t)$ or $\hat{\beta}(t)$ does not affect the inference on $\hat{\gamma}(t)$.
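In code, the second step reuses the kernel-weighted least squares machinery of the one-step sketch, with the first-step fit entering as an offset. A minimal sketch, where `beta_hat` is assumed to be a callable returning the first-step estimate at a time point:

```python
def two_step_gamma(t, data, beta_hat, h1, h2):
    """Second-step kernel weighting estimator (2.10) of gamma(t); a sketch."""
    rows, wts, resid = [], [], []
    for d in data:
        for j, t1 in enumerate(d["T"]):
            r1 = d["Y"][j] - d["X"][j] @ beta_hat(t1)   # first-step residual
            for k, t2 in enumerate(d["S"]):
                w = (epanechnikov((t1 - t) / h1) / h1) * (epanechnikov((t2 - t) / h2) / h2)
                if w == 0.0:
                    continue
                z = d["Z"][k]
                rows.append(np.concatenate([z, z * (t2 - t)]))  # V_i(t2, t)
                wts.append(w)
                resid.append(r1)
    V, w, r = np.array(rows), np.array(wts), np.array(resid)
    phi = np.linalg.solve(V.T @ (V * w[:, None]), V.T @ (w * r))
    q = data[0]["Z"].shape[1]
    return phi[:q]   # gamma_hat(t); phi[q:] estimates the derivative part
```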

We establish the asymptotic distribution of $\hat{\gamma}(t)$ in the following theorem and relegate the proofs to the Appendix.

Theorem 4.

Under conditions (2.3) and (A1)-(A4), as $n\rightarrow\infty$, we have

$$\sqrt{nh_1h_2}\{\hat{\gamma}(t)-\gamma_0(t)\}\stackrel{d}{\to}N\{0,A^+(t)^{-1}\Sigma^+(t)A^+(t)^{-1}\},$$

where

$$A^+(t)=E\{Z(t)Z(t)^T\}\eta(t,t)\quad\mbox{and}\quad\Sigma^+(t)=\Big\{\iint K(z_1,z_2)^2\,dz_1dz_2\Big\}E\left[Z(t)Z(t)^T\sigma\{t,X(t),Z(t)\}^2\right]\eta(t,t).$$

In the second step, we need two bandwidths. If we set $h_1=h_2=h_\beta$, the bias of $\hat{\gamma}(t)$ in Theorem 4 is $O(h_\beta^2)$. We achieve the same rate of convergence $O(n^{1/3})$ for the estimation of $\gamma(t)$ as in the one-step method. This is the price we pay for the asynchronous longitudinal data structure. Whether this rate of convergence can be improved remains an open question.

Let $\hat{\phi}(t)=\{\hat{\gamma}(t)^T,\hat{\dot{\gamma}}(t)^T\}^T$ and $V_i(t_2,t)=\{Z_i(t_2)^T,Z_i(t_2)^T(t_2-t)\}^T$. We can calculate

$$\begin{split}
\widehat{\mathrm{var}}\{\hat{\phi}(t)\}=&\left\{\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)V_i(t_2,t)V_i(t_2,t)^T\,dN^*_i(t_1,t_2)\right\}^{-1}\\
&\times\sum_{i=1}^n\left[\iint K_{h_1,h_2}(t_1-t,t_2-t)V_i(t_2,t)\{Y_i(t_1)-X_i(t_1)^T\hat{\beta}(t_1)-V_i(t_2,t)^T\hat{\phi}(t)\}\,dN^*_i(t_1,t_2)\right]^{\otimes 2}\\
&\times\left\{\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)V_i(t_2,t)V_i(t_2,t)^T\,dN^*_i(t_1,t_2)\right\}^{-1}.
\end{split}$$

The variance covariance matrix of $\hat{\gamma}(t)$ is the upper left $q\times q$ submatrix of $\widehat{\mathrm{var}}\{\hat{\phi}(t)\}$. Statistical inference can be carried out afterwards.

2.4 Bandwidth selection

Our methods depend critically on the choice of the bandwidth. The asymptotic results provide a theoretical range of valid bandwidths. In practice, it is desirable to perform data-driven bandwidth selection. Cao et al. (2015) and Chen and Cao (2017) used a data-splitting strategy to separately estimate the bias and the variability and chose the bandwidth that minimizes the mean squared error.

In this paper, we use a kernel-smoothed $D$-fold cross-validation procedure to select the optimal bandwidth. We use the two-step approach, with centering in the first step and kernel weighting in the second step, to illustrate. Specifically, we split the dataset into $D$ folds and estimate the non-parametric regression functions withholding one fold at a time: the prediction error is calculated on the withheld fold, with the regression functions estimated from the other $D-1$ folds. We repeat this $D$ times, and the optimal bandwidth is the one that minimizes the average squared prediction error at any fixed time point $t$.

Specifically, for $\hat{\beta}(t)$ at the time point $t$, the average squared prediction error is

$$\mathrm{ASPE}_{\hat{\beta}(t)}(h;t)=\frac{1}{D}\sum_{k=1}^D A_{n^{(k)}}(h;t)/B_{n^{(k)}}(h;t), \qquad (2.11)$$

where

$$\begin{split}
A_{n^{(k)}}(h;t)&=\sum_{i=1}^{n^{(k)}}\int K_h(t_1-t)\{\hat{Y}_i(t_1)-\hat{X}_i(t_1)^T\hat{\beta}^{(-k)}(t_1)\}^2\,dN_i(t_1)\\
\mbox{and}\quad B_{n^{(k)}}(h;t)&=\sum_{i=1}^{n^{(k)}}\int K_h(t_1-t)\,dN_i(t_1),\quad k=1,\ldots,D,
\end{split}$$

where $n^{(k)}$ is the number of subjects in the $k$th fold, $\hat{\beta}^{(-k)}(t)$ is estimated without the subjects in the $k$th fold, $\hat{Y}_i(t)$ estimates $\tilde{Y}_i(t)$, $\hat{X}_i(t)$ estimates $\tilde{X}_i(t)$, $K_h(t)=K(t/h)/h$, $K(\cdot)$ is a kernel density function, and $h$ is the bandwidth. Similarly, for $\hat{\gamma}(t)$ at the time point $t$, the average squared prediction error is

$$\mathrm{ASPE}_{\hat{\gamma}(t)}(h_1,h_2;t)=\frac{1}{D}\sum_{k=1}^D C_{n^{(k)}}(h_1,h_2;t)/D_{n^{(k)}}(h_1,h_2;t), \qquad (2.12)$$

where

$$\begin{split}
C_{n^{(k)}}(h_1,h_2;t)=&\sum_{i=1}^{n^{(k)}}\iint K_{h_1,h_2}(t_1-t,t_2-t)\\
&\times\{Y_i(t_1)-X_i(t_1)^T\hat{\beta}(t_1)-Z_i(t_2)^T\hat{\gamma}^{(-k)}(t_2)\}^2\,dN^*_i(t_1,t_2)\\
\mbox{and}\quad D_{n^{(k)}}(h_1,h_2;t)=&\sum_{i=1}^{n^{(k)}}\iint K_{h_1,h_2}(t_1-t,t_2-t)\,dN^*_i(t_1,t_2),
\end{split}$$

where $n^{(k)}$ is the number of subjects in the $k$th fold, $\hat{\gamma}^{(-k)}(t)$ is estimated without the subjects in the $k$th fold, $\hat{\beta}(t)$ is estimated from Step 1 with the optimal bandwidth plugged in, $K_{h_1,h_2}(\cdot,\cdot)$ is a bivariate kernel function, and $h_1$ and $h_2$ are bandwidths. It is worth noting that, when using the two-step approach, we first select the bandwidth $h_{\rm opt}$ for $\hat{\beta}(t)$ by minimizing (2.11), and then select the optimal bandwidth for $\hat{\gamma}(t)$ by minimizing (2.12). If we want an optimal bandwidth for the regression function over a range of $t$, we can minimize the integrated ASPE. A sketch of the first-step selection is given below.
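The following minimal sketch implements the fold loop behind (2.11) for a single time point $t$, reusing `centered_beta`, `nw_mean`, and `epanechnikov` from the earlier sketches; the fold construction and names are illustrative:

```python
def cv_bandwidth_beta(t, folds, h_grid, h_nw):
    """Select h for beta_hat(t) by minimizing ASPE (2.11); a sketch."""
    best_h, best_aspe = None, np.inf
    for h in h_grid:
        aspe = 0.0
        for k, held_out in enumerate(folds):
            train = [s for j, f in enumerate(folds) if j != k for s in f]
            T_tr = np.concatenate([d["T"] for d in train])
            Y_tr = np.concatenate([d["Y"] for d in train])
            X_tr = np.vstack([d["X"] for d in train])
            num = den = 0.0
            for d in held_out:
                for j, t1 in enumerate(d["T"]):
                    w = epanechnikov((t1 - t) / h) / h
                    if w == 0.0:
                        continue
                    b = centered_beta(t1, T_tr, Y_tr, X_tr, h_nw, h)  # beta^{(-k)}(t1)
                    yc = d["Y"][j] - nw_mean(t1, T_tr, Y_tr, h_nw)    # Y_hat_i(t1)
                    xc = d["X"][j] - nw_mean(t1, T_tr, X_tr, h_nw)    # X_hat_i(t1)
                    num += w * (yc - xc @ b) ** 2
                    den += w
            if den > 0:
                aspe += num / den                                     # A / B for fold k
        aspe /= len(folds)
        if aspe < best_aspe:
            best_h, best_aspe = h, aspe
    return best_h
```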

3 Numerical studies

In this section, we evaluate the finite-sample performance of the proposed methods through simulations.

3.1 Data generating process and model specification

We generate 1,000 datasets, each consisting of 400 or 900 subjects. For each subject, the numbers of observations of the synchronous longitudinal data $\{X(t),Y(t)\}$ and of the asynchronous longitudinal covariate $Z(t)$ follow $\mbox{Poisson}(5)+1$. The observation times are independently generated from the standard uniform distribution $\mathcal{U}(0,1)$ for $\{X(t),Y(t)\}$ and $Z(t)$, respectively. The covariate processes $X(t)$ and $Z(t)$ are both Gaussian, with $E\{X(t)\}=0$, $E\{Z(t)\}=2(t-0.5)^2$, and $\mbox{Cov}\{X(t),X(s)\}=\mbox{Cov}\{Z(t),Z(s)\}=e^{-|t-s|}$. The values of the stochastic process $Z(t)$ are recorded at the observation times of $\{X(t),Y(t)\}$ only in the data generation stage, to generate $Y(t)$ in model (2.1).

The longitudinal response is generated from

$$Y(t)=X(t)^T\beta(t)+Z(t)^T\gamma(t)+\epsilon(t),$$

where $\beta(t)$ and $\gamma(t)$ are time-dependent coefficients, and $\epsilon(t)$ is a Gaussian process, independent of $X(t)$ and $Z(t)$, with $E\{\epsilon(t)\}=0$ and $\mbox{Cov}\{\epsilon(t),\epsilon(s)\}=2^{-|t-s|}$. For the time-dependent coefficients, we consider two different settings: (i) $\beta(t)=3(t-0.4)^2$, $\gamma(t)=\sin(2\pi t)$; and (ii) $\beta(t)=0.4t+0.5$, $\gamma(t)=\sqrt{t}$.
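For concreteness, the data generating process under setting (i) can be sketched as follows, producing subjects in the layout assumed by the earlier estimation sketches; the function names and random seed are illustrative:

```python
rng = np.random.default_rng(1)

def gp_path(times, mean_fn, cov_fn):
    """Draw one Gaussian-process path at the given times."""
    D = np.abs(times[:, None] - times[None, :])
    L = np.linalg.cholesky(cov_fn(D) + 1e-10 * np.eye(len(times)))
    return mean_fn(times) + L @ rng.standard_normal(len(times))

def simulate_subject(beta, gamma):
    """One subject under setting (i) of Section 3.1; a sketch."""
    L_i = rng.poisson(5) + 1                     # observations of (X, Y)
    M_i = rng.poisson(5) + 1                     # observations of Z
    T = np.sort(rng.uniform(0, 1, L_i))
    S = np.sort(rng.uniform(0, 1, M_i))
    X = gp_path(T, lambda t: np.zeros_like(t), lambda d: np.exp(-d))
    # draw Z jointly at T (used only to generate Y) and at S (observed)
    Z_all = gp_path(np.concatenate([T, S]),
                    lambda t: 2 * (t - 0.5) ** 2, lambda d: np.exp(-d))
    Z_at_T, Z_obs = Z_all[:L_i], Z_all[L_i:]
    eps = gp_path(T, lambda t: np.zeros_like(t), lambda d: 2.0 ** (-d))
    Y = X * beta(T) + Z_at_T * gamma(T) + eps
    return {"T": T, "S": S, "Y": Y, "X": X[:, None], "Z": Z_obs[:, None]}

data = [simulate_subject(lambda t: 3 * (t - 0.4) ** 2,
                         lambda t: np.sin(2 * np.pi * t)) for _ in range(400)]
```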

3.2 Comparison of one-step and two-step methods

We compare two methods: the one-step kernel weighting approach and the two-step approach with centering in the first step and kernel weighting in the second step. Summary tables for the two-step approach that uses a non-parametric intercept, omitting the asynchronous longitudinal covariate, in the first step and kernel weighting in the second step can be found in the Appendix. That method produces results similar to the two-step approach with centering in the first step and kernel weighting in the second step.

As suggested in Fan and Gijbels (1996), we use the Epanechnikov kernel $K(t)=0.75(1-t^2)_+$, where $x_+=\max\{x,0\}$, due to its excellent empirical performance. For the one-step kernel weighting method, we need a two-dimensional kernel function, for which we use the product of univariate Epanechnikov kernels, $K(t,s)=0.5625(1-t^2)_+(1-s^2)_+$. Other kernel functions produce similar results, and the details are omitted.

We present the results of setting (i). The results of setting (ii) can be found in the Appendix. The simulation results for $n=400$ and $n=900$ are summarized in Table 1. We assess the estimation and inference of $\gamma(t)$ and $\beta(t)$ at $t=0.3, 0.6$ and $0.9$. The results at other time points are similar and thus omitted.

Table 1: 1000 simulation results for $\beta(t)=3(t-0.4)^2$ and $\gamma(t)=\sin(2\pi t)$. The three blocks of columns (Bias, SD, SE, CP) correspond to $t=0.3$, $t=0.6$, and $t=0.9$, respectively.

| BD | NP-F | Bias | SD | SE | CP | Bias | SD | SE | CP | Bias | SD | SE | CP |
|----|------|------|----|----|----|------|----|----|----|------|----|----|----|
| $n=400$, Two-step (Centering+KW) | | | | | | | | | | | | | |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.007 | 0.154 | 0.144 | 92.4 | 0.003 | 0.130 | 0.123 | 92.5 | 0.005 | 0.131 | 0.122 | 91.8 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | -0.008 | 0.195 | 0.186 | 92.0 | -0.008 | 0.166 | 0.154 | 91.2 | 0.004 | 0.168 | 0.155 | 91.4 |
| auto | $\beta(t)$ | -0.004 | 0.166 | 0.154 | 91.7 | -0.005 | 0.130 | 0.121 | 92.2 | 0.006 | 0.137 | 0.132 | 92.6 |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.047 | 0.146 | 0.139 | 90.3 | 0.035 | 0.150 | 0.141 | 91.0 | 0.030 | 0.148 | 0.136 | 90.2 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.046 | 0.157 | 0.157 | 91.4 | 0.024 | 0.156 | 0.146 | 90.3 | 0.025 | 0.149 | 0.141 | 90.8 |
| auto | $\gamma(t)$ | -0.047 | 0.154 | 0.142 | 90.2 | 0.023 | 0.156 | 0.140 | 89.4 | 0.026 | 0.144 | 0.136 | 90.3 |
| $n=400$, One-step (KW) | | | | | | | | | | | | | |
| $h_1=h_2=n^{-0.45}$ | $\beta(t)$ | 0.003 | 0.120 | 0.114 | 94.2 | 0.002 | 0.125 | 0.116 | 91.1 | 0.002 | 0.125 | 0.117 | 90.9 |
| $h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.002 | 0.148 | 0.140 | 92.0 | 0.004 | 0.150 | 0.138 | 90.7 | -0.007 | 0.154 | 0.137 | 90.0 |
| auto | $\beta(t)$ | 0.001 | 0.111 | 0.106 | 92.6 | 0.003 | 0.118 | 0.110 | 92.0 | 0.007 | 0.114 | 0.107 | 92.4 |
| $h_1=h_2=n^{-0.45}$ | $\gamma(t)$ | -0.050 | 0.128 | 0.117 | 87.9 | 0.033 | 0.121 | 0.116 | 91.1 | 0.031 | 0.113 | 0.110 | 92.3 |
| $h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.042 | 0.156 | 0.142 | 88.2 | 0.019 | 0.152 | 0.138 | 89.8 | 0.019 | 0.149 | 0.130 | 87.9 |
| auto | $\gamma(t)$ | -0.056 | 0.119 | 0.107 | 87.4 | 0.034 | 0.115 | 0.111 | 90.8 | 0.036 | 0.108 | 0.102 | 91.5 |
| $n=900$, Two-step (Centering+KW) | | | | | | | | | | | | | |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.004 | 0.122 | 0.119 | 92.8 | 0.001 | 0.105 | 0.100 | 92.2 | 0.004 | 0.103 | 0.100 | 95.0 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | -0.008 | 0.176 | 0.163 | 92.7 | 0.004 | 0.140 | 0.135 | 92.8 | -0.007 | 0.143 | 0.136 | 92.0 |
| auto | $\beta(t)$ | -0.002 | 0.124 | 0.119 | 93.3 | 0.001 | 0.105 | 0.102 | 93.0 | 0.002 | 0.102 | 0.100 | 93.6 |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.036 | 0.131 | 0.132 | 92.2 | 0.028 | 0.134 | 0.126 | 91.8 | 0.009 | 0.128 | 0.124 | 92.8 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.039 | 0.151 | 0.150 | 91.0 | 0.016 | 0.141 | 0.136 | 93.0 | 0.017 | 0.132 | 0.138 | 92.2 |
| auto | $\gamma(t)$ | -0.031 | 0.125 | 0.121 | 91.8 | 0.023 | 0.134 | 0.128 | 92.4 | 0.016 | 0.135 | 0.124 | 91.2 |
| $n=900$, One-step (KW) | | | | | | | | | | | | | |
| $h_1=h_2=n^{-0.45}$ | $\beta(t)$ | 0.000 | 0.105 | 0.102 | 92.9 | 0.001 | 0.102 | 0.100 | 92.4 | -0.002 | 0.107 | 0.100 | 92.8 |
| $h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.002 | 0.141 | 0.126 | 91.3 | 0.000 | 0.127 | 0.125 | 92.9 | -0.001 | 0.132 | 0.124 | 92.1 |
| auto | $\beta(t)$ | 0.005 | 0.092 | 0.090 | 92.7 | 0.003 | 0.091 | 0.085 | 92.3 | 0.001 | 0.086 | 0.083 | 93.2 |
| $h_1=h_2=n^{-0.45}$ | $\gamma(t)$ | -0.032 | 0.105 | 0.101 | 92.0 | 0.014 | 0.107 | 0.100 | 92.8 | 0.023 | 0.097 | 0.096 | 92.8 |
| $h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.022 | 0.137 | 0.125 | 92.1 | 0.016 | 0.137 | 0.126 | 92.0 | 0.016 | 0.128 | 0.118 | 92.5 |
| auto | $\gamma(t)$ | -0.036 | 0.092 | 0.090 | 91.4 | 0.026 | 0.091 | 0.086 | 92.2 | 0.025 | 0.080 | 0.080 | 91.6 |

  • Note: "BD" represents the bandwidths, where $h$ is the bandwidth in the centering approach, $h_1$ and $h_2$ are the bandwidths in the kernel weighting (KW) approach, and "auto" represents automatic bandwidth selection. "NP-F" represents the non-parametric function. "Bias" is the bias, "SD" is the sample standard deviation, "SE" is the average of the standard errors, and "CP" is the pointwise 95% coverage probability.

For the estimation of $\gamma(t)$ based on the one-step kernel weighting method or the two-step method with centering in the first step and kernel weighting in the second step, we find that the bias is generally small, the estimated standard error is close to the empirical standard deviation, and the coverage probability is close to the nominal 95% for large sample sizes at the examined time points $t=0.3, 0.6$ and $0.9$. Performance improves as the sample size increases. Therefore, we can conduct valid pointwise inference on $\gamma(t)$ in practice using the proposed methods.

Both methods are valid for inference on $\beta(t)$. From Figure 2, we observe that with the same bandwidth, the standard deviation of $\hat{\beta}(t)$ is smaller using the two-step method with kernel weighting in the second step than using the one-step kernel weighting method, which is consistent with our theoretical prediction of the convergence rates.

Figure 2: Typical estimated curves, 95% point-wise confidence intervals, and 95% simultaneous confidence bands for $\beta(t)=3(t-0.4)^2$, $\gamma(t)=\sin(2\pi t)$ with the same bandwidth $h=h_1=h_2=n^{-0.5}$, $n=400$ (first row) and $n=900$ (second row).

In the one-step kernel weighting method, we let $h_1=h_2=h$. The allowable range of $h$ is $(n^{-1/2},n^{-1/6})$. In the simulation, we choose the optimal bandwidth in the range $(n^{-0.5},n^{-0.4})$. In the two-step method, the allowable bandwidth in the first step for the estimation of $\beta(t)$ is $(n^{-1},n^{-1/5})$, and we choose the optimal bandwidth in the range $(n^{-0.8},n^{-0.6})$. The allowable bandwidth in the second step for the estimation of $\gamma(t)$ is $(n^{-1/2},n^{-1/6})$ when $h_1=h_2=h$, and the optimal bandwidth is selected in the range $(n^{-0.5},n^{-0.4})$. Automatic bandwidth selection performs well and can be used to conduct inference.

3.3 Simultaneous confidence bands

In this section, we examine the performance of the simultaneous confidence bands. Instead of pointwise confidence intervals at fixed time points, it is more informative to examine the overall magnitude of variation of the non-parametric regression functions. The data generation process is the same as that in Section 3.1, and we use setting (i) to illustrate. We are interested in constructing simultaneous confidence bands for $\beta(t)$ and $\gamma(t)$ based on the one-step kernel weighting method and the two-step method with kernel weighting in the second stage.

Specifically, we use the wild bootstrap approach described in Section 2 to find the empirical 95th percentile of the supremum of the difference between the estimated and the true non-parametric regression function. We use the Rademacher distribution, which takes value $+1$ with probability 50% and value $-1$ with probability 50%, in the wild bootstrap. In the calculation of the empirical coverage probability, we use 181 equally spaced grid points in $[0.05,0.95]$. We use the Epanechnikov kernel. Results based on other commonly used kernel functions, such as the uniform kernel, are similar and thus omitted.

In addition to the simultaneous coverage probability of the pointwise confidence interval and the newly proposed simultaneous confidence band, we also present the root average squared error (RASE). Using $\gamma(t)$ as an example, the RASE for a non-parametric function is defined as ${\rm RASE}^2=n_{grid}^{-1}\sum_{k=1}^{n_{grid}}\{\hat{\gamma}(t_k)-\gamma(t_k)\}^2$, where $\{t_k, k=1,\ldots,n_{grid}\}$ are grid points at which the non-parametric function $\gamma(\cdot)$ is estimated.
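As a small illustration, the RASE over a grid can be computed as follows (names are illustrative):

```python
def rase(est_vals, true_vals):
    """Root average squared error over grid points."""
    est_vals, true_vals = np.asarray(est_vals), np.asarray(true_vals)
    return np.sqrt(np.mean((est_vals - true_vals) ** 2))
```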

The results are summarized in Table 2 for $n=400$ and $900$. We can see that, as the sample size increases, the simultaneous coverage probabilities of the confidence bands based on the proposed one-step kernel weighting method and the two-step method with kernel weighting in the second step are close to the nominal level, whereas the pointwise confidence interval is not valid for simultaneous inference. The RASE gets smaller as the sample size increases.

Table 2: Simulation results for $\beta(t)=3(t-0.4)^2$, $\gamma(t)=\sin(2\pi t)$. The first block of columns (RASE(SD), CI, SCB) corresponds to $n=400$ and the second to $n=900$.

| BD | NP-F | RASE(SD) | CI | SCB | RASE(SD) | CI | SCB |
|----|------|----------|----|-----|----------|----|-----|
| Two-step (Centering+KW) | | | | | | | |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.134 (0.022) | 4.8 | 95.1 | 0.109 (0.014) | 1.0 | 94.9 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.177 (0.022) | 0.0 | 93.6 | 0.152 (0.013) | 0.0 | 93.8 |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | 0.150 (0.027) | 3.4 | 93.1 | 0.133 (0.020) | 1.6 | 94.1 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | 0.158 (0.027) | 1.5 | 95.4 | 0.142 (0.018) | 0.2 | 95.0 |
| One-step (KW) | | | | | | | |
| $h_1=h_2=n^{-0.45}$ | $\beta(t)$ | 0.120 (0.025) | 14.2 | 97.3 | 0.104 (0.018) | 5.8 | 95.8 |
| $h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.148 (0.027) | 4.6 | 95.8 | 0.133 (0.019) | 1.7 | 95.5 |
| $h_1=h_2=n^{-0.45}$ | $\gamma(t)$ | 0.125 (0.027) | 9.7 | 94.8 | 0.103 (0.019) | 7.1 | 95.2 |
| $h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | 0.149 (0.026) | 3.3 | 94.6 | 0.133 (0.020) | 1.2 | 94.2 |

  • Note: "BD" represents the bandwidths, where $h$ is the bandwidth in the centering approach and $h_1$ and $h_2$ are the bandwidths in the kernel weighting (KW) approach. "NP-F" represents the non-parametric function. "RASE" is the average of the RASE, and "SD" is the standard deviation of the RASE. "CI" is the simultaneous coverage probability based on the pointwise confidence intervals; "SCB" is the simultaneous coverage probability based on the SCB. The nominal level is 95%.

Figure 2 shows typical graphs of $\beta(t)$ and $\gamma(t)$ based on the one-step kernel weighting method and the two-step method with kernel weighting at the second step for $n=400$ and $900$. We can see that, in the estimation of $\beta(t)$, both the pointwise confidence interval and the simultaneous confidence band are narrower for the two-step method with kernel weighting at the second step than for the one-step kernel weighting method. This reflects that the two-step method yields a more efficient estimate of $\beta(t)$ than the one-step method. As the sample size increases, we obtain narrower confidence bands for $\beta(t)$ and $\gamma(t)$.

4 Application to the ADNI Data

In this section, we illustrate the proposed methods on a dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study (http://www.adni-info.org/). ADNI is a large-scale multicenter neuroimaging study that collects genetic, clinical, imaging, and cognitive data at multiple time points to better understand Alzheimer's disease. We used a subset of ADNI consisting of $n=256$ subjects over a 5-year follow-up to examine the relationship between diffusion-weighted imaging (DWI) and cognitive decline. For the measurement of DWI, we use fractional anisotropy (FA), which reflects fiber density, axonal diameter, and myelination in white matter. FA is observed at 1 to 8 time points. For the outcome related to cognitive decline, we use the Mini-Mental State Examination (MMSE) score, with lower scores indicating impairment. The MMSE score is observed at 1 to 13 time points. For each subject, the measurement times of MMSE and FA are mismatched. Other covariates, such as age (ranging from 55 to 96 years), years of education (ranging from 11 to 20), and the number of APOE4 genes, are synchronous with MMSE, creating mixed synchronous and asynchronous longitudinal covariates. The growth curves for $Y_{\rm MMSE}(t)$, $X_{\rm Age}(t)$ and $Z_{\rm FA}(t)$ of randomly selected patients are presented in Figure 3. Details of the registration and normalization of the imaging data can be found in Li et al. (2022).

Figure 3: The growth curves for $Y_{\rm MMSE}(t)$, $X_{\rm Age}(t)$ and $Z_{\rm FA}(t)$ of patients 9, 156 and 256.

We aim to examine the association between MMSE and the log hazard function of FA at region 0.35 (the whole brain is divided into 100 regions), accounting for other baseline covariates. We first normalize the data as recommended in Diggle et al. (2002) and Fan and Li (2004). Specifically, for the continuous baseline covariate education level, we subtract the mean and divide by the standard deviation across all individuals. For continuous longitudinal data, such as age, at time point $t$, we obtain Nadaraya-Watson type estimators of $\bar{X}_{\rm Age}(t)$ and $\bar{X}^{2}_{\rm Age}(t)$. The standard deviation is $s_{\rm Age}(t)=\sqrt{\bar{X}^{2}_{\rm Age}(t)-\{\bar{X}_{\rm Age}(t)\}^{2}}$ and the normalized version is $\{X_{\rm Age}(t)-\bar{X}_{\rm Age}(t)\}/s_{\rm Age}(t)$. Other longitudinal data can be normalized in the same way, and we omit the details. Below we use the same notation for the normalized data.
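
The following R sketch illustrates this normalization under simplifying assumptions: it pools observations across subjects, uses a Gaussian kernel with a hypothetical bandwidth h, and the objects obs_t and obs_x are toy stand-ins for the pooled observation times and age values.

## Nadaraya-Watson estimates of the mean and second moment, and the
## resulting standardization of each observed value at its own time
nw <- function(t0, obs_t, obs_x, h) {
  w <- dnorm((obs_t - t0) / h)              # Gaussian kernel weights
  sum(w * obs_x) / sum(w)
}
normalize <- function(obs_t, obs_x, h = 0.1) {
  sapply(seq_along(obs_x), function(j) {
    m1 <- nw(obs_t[j], obs_t, obs_x,   h)   # estimate of the mean at t
    m2 <- nw(obs_t[j], obs_t, obs_x^2, h)   # estimate of the second moment
    (obs_x[j] - m1) / sqrt(m2 - m1^2)       # standardized observation
  })
}
## toy usage
set.seed(1)
obs_t <- runif(50)
obs_x <- 60 + 10 * obs_t + rnorm(50)
head(normalize(obs_t, obs_x))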

Our model is

E\{Y_{\rm MMSE}(t)\mid X_{\rm Age}(t),X_{\rm Edu},X_{\rm APOE4},Z_{\rm FA}(t)\}
= \alpha(t)+\beta_{\rm Age}(t)X_{\rm Age}(t)+\beta_{\rm Edu}(t)X_{\rm Edu}+\beta_{\rm APOE4}(t)X_{\rm APOE4}+\gamma_{\rm FA}(t)Z_{\rm FA}(t),  (4.13)

where $Y_{\rm MMSE}(t)$ denotes the longitudinal response of MMSE, $X_{\rm Age}(t)$, $X_{\rm Edu}$ and $X_{\rm APOE4}$ denote the synchronous longitudinal covariates, and $Z_{\rm FA}(t)$ denotes the asynchronous longitudinal covariate. In (4.13), except for $X_{\rm APOE4}$, which denotes the number of copies of the APOE4 gene (0, 1 or 2), all variables are continuous.

Among our proposed approaches, the two-step method requires that the synchronous and asynchronous longitudinal covariates be uncorrelated. We therefore regress the log hazard function of FA on age, education level, and the number of APOE4 genes, with three univariate regression models and one multivariate regression model, as follows:

E\{Z_{\rm FA}(t)\mid X_{\rm Age}(t),X_{\rm Edu},X_{\rm APOE4}\}=\alpha+\beta_{1}X_{\rm Age}(t)+\beta_{2}X_{\rm Edu}+\beta_{3}X_{\rm APOE4}.  (4.14)

The results are summarized in Table 3. We observe that education level and the number of APOE4 genes are not statistically significantly associated with FA, while there is a significant effect of age. We shall therefore use the one-step method to estimate the regression functions in (4.13) and apply the two-step method without age as a sensitivity analysis. The first stage of the two-step approach uses the bandwidth $h=4n^{-0.6}\approx 0.144$, and the second stage of the two-step approach and the one-step kernel weighting approach use the bandwidths $h_{1}=h_{2}=4n^{-0.6}\approx 0.144$ in the ADNI data.
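
For model (4.14), the kernel weighting idea of Cao et al. (2015) pairs every FA observation with every covariate observation within the same subject and downweights pairs whose observation times are far apart. The R sketch below is one way to set this up; the data layout (a list dat with per-subject FA times s, FA values z, covariate times t and covariate data frame X) and the toy generator mk() are hypothetical, and the standard errors reported by lm() are not the asymptotically valid ones of Table 3.

## Kernel-weighted least squares for a constant-coefficient asynchronous
## regression: weight each (response, covariate) pair by K_h of the time gap
kw_lm <- function(dat, h) {
  pairs <- do.call(rbind, lapply(dat, function(d) {
    g <- expand.grid(j = seq_along(d$s), k = seq_along(d$t))
    data.frame(z = d$z[g$j],
               d$X[g$k, , drop = FALSE],
               w = dnorm((d$s[g$j] - d$t[g$k]) / h) / h)
  }))
  lm(z ~ age + edu + apoe4, data = pairs, weights = w)
}

## toy usage with hypothetical subjects
set.seed(1)
mk <- function() {
  t <- sort(runif(5)); s <- sort(runif(4))
  X <- data.frame(age = rnorm(5), edu = rnorm(5), apoe4 = rbinom(5, 2, 0.3))
  list(t = t, s = s, X = X, z = 0.01 * mean(X$age) + rnorm(4, sd = 0.1))
}
dat <- replicate(10, mk(), simplify = FALSE)
coef(kw_lm(dat, h = 0.144))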

Table 3: Regression results of FA on age, education level and APOE4

                   Fit separately                  Fit in one model
              Age       Edu      APOE4        Age       Edu      APOE4
Estimate     0.008    -0.001    -0.021       0.009     0.001     0.004
SE           0.003     0.007     0.029       0.003     0.007     0.031
p-value      0.003     0.923     0.475       0.003     0.925     0.904

Figure 4 presents the estimated non-parametric regression functions and SCBs of $\beta_{\rm Age}(t)$, $\beta_{\rm Edu}(t)$, $\beta_{\rm APOE4}(t)$ and $\gamma_{\rm FA}(t)$ based on the one-step method. We also draw constant and linear lines to check their adequacy in explaining the variation. We observe that the SCB of $\beta_{\rm Age}(t)$ does not contain any constant or linear line, which implies that age has a non-linear time-varying effect on MMSE. Education level is not statistically significantly associated with MMSE. There is a negative constant effect of APOE4 on MMSE, which is consistent with results in the medical literature (Safieh et al., 2019). FA has a non-zero constant effect on MMSE.

Figure 4: Estimates of $\beta_{\rm Age}(t)$, $\beta_{\rm Edu}(t)$, $\beta_{\rm APOE4}(t)$ and $\gamma_{\rm FA}(t)$ and their SCBs in model (4.13).

Therefore our analysis suggests the following model

E\{Y_{\rm MMSE}(t)\mid X_{\rm Age}(t),X_{\rm APOE4},Z_{\rm FA}(t)\}
= \alpha(t)+\beta_{\rm Age}(t)X_{\rm Age}(t)+\beta_{\rm APOE4}X_{\rm APOE4}+\gamma_{\rm FA}Z_{\rm FA}(t),  (4.15)

We fit model (4.15) using the one-step method. As a comparison, we omit age and use the two-step approach, with centering at the first step and kernel weighting at the second step, to estimate the constant coefficients for APOE4 and FA. In other words, at the first step we minimize $\sum_{i=1}^{256}\sum_{j=1}^{L_{i}}\{\hat{Y}_{\rm MMSE}(T_{ij})-\beta_{\rm APOE4}\hat{X}_{\rm APOE4}\}^{2}$ to obtain $\hat{\beta}_{\rm APOE4}$, where $\hat{Y}_{\rm MMSE}$ and $\hat{X}_{\rm APOE4}$ are the centering estimators. Then we combine the profile least squares method (Fan and Li, 2004) and the kernel weighting method (Cao et al., 2015) to estimate $\gamma_{\rm FA}$.
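
A hedged sketch of this two-step fit in R, using toy centered data (y_c and x_c are hypothetical stacked centered MMSE and APOE4 values at the synchronous visit times):

## Step 1: least squares of the centered response on the centered covariate;
## no intercept is needed because both variables are centered
set.seed(1)
n_obs <- 500
x_c   <- rnorm(n_obs)
y_c   <- -0.47 * x_c + rnorm(n_obs)
step1 <- lm(y_c ~ x_c - 1)
beta_apoe4 <- coef(step1)

## Step 2: the residuals carry gamma_FA * Z_FA(t); regress them on the
## asynchronous FA measurements with a kernel-weighted routine such as
## the kw_lm() sketched after model (4.14)
res <- resid(step1)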

The constant coefficient estimates of $\beta_{\rm APOE4}$ and $\gamma_{\rm FA}$ are summarized in Table 4.

Table 4: Constant coefficient estimates of APOE4 and FA

                       Estimate       SE      p-value
One-step    APOE4       -0.472      0.078     <0.001
            FA           0.129      0.048      0.008
Two-step    APOE4       -0.471      0.080     <0.001
            FA           0.121      0.050      0.015

We can see that APOE4 has a negative, statistically significant association with MMSE and FA has a positive, statistically significant association with MMSE, based on both the one-step and two-step methods. In addition, Figure 5 presents the estimated regression function $\beta_{\rm Age}(t)$ based on the one-step method. The estimated age effect is highly non-linear, which cannot be captured by existing methods.

Figure 5: The estimate of $\beta_{\rm Age}(t)$ and its 95% SCB in model (4.15).

5 Concluding remarks

In this paper, we propose new methods to conduct statistical inference for mixed synchronous and asynchronous longitudinal covariates with varying-coefficient models. Our one-step method allows arbitrary dependence between the synchronous and asynchronous longitudinal covariates. If we further restrict the synchronous and asynchronous longitudinal covariates to be uncorrelated, we obtain the non-parametric rate of convergence $O(n^{2/5})$ for the regression functions of the synchronous longitudinal covariates, faster than the rate of convergence $O(n^{1/3})$ for the regression functions of the asynchronous longitudinal covariates. It remains an open question whether the $O(n^{2/5})$ rate of convergence can be attained for the regression function of asynchronous longitudinal covariates. To quantify the overall magnitude of variation of the non-parametric regression functions, we construct simultaneous confidence bands with the wild bootstrap. The computation is scalable, as we put random variation in the residuals without the need to recompute the non-parametric regression function for each bootstrap sample. Such SCBs allow us to assess whether a regression function can be simplified to a constant or a linear function.

We use a working independence covariance matrix, similar to GEE (Liang and Zeger, 1986). Pepe and Anderson (1994) showed that with time-dependent covariates, working independence is a safe choice. Efficiency can be improved with a correctly specified covariance matrix, which is usually difficult to obtain in practice.

R code for the simulation and data analysis can be found at https://github.com/hongyuan-cao/mixed-longitudinal. The dataset can be found at https://github.com/BIG-S2/GFPLVCM.

Acknowledgements

We thank Ting Li for help with data acquisition. Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

References

  • Andersen and Gill (1982) Andersen, P. K. and Gill, R. D. (1982). Cox’s regression model for counting processes: a large sample study. Ann. Stat., 10, 1100–1120.
  • Cao et al. (2016) Cao, H., Li, J. and Fine, J. P. (2016). On last observation carried forward and asynchronous longitudinal regression analysis. Electron. J. Stat., 10, 1155–1180.
  • Cao et al. (2015) Cao, H., Zeng, D. and Fine, J. P. (2015). Regression analysis of sparse asynchronous longitudinal data. J. R. Stat. Soc. Ser. B-Stat. Methodol., 77, 755–776.
  • Cao et al. (2018) Cao, H., Liu, W. and Zhou, Z. (2018). Simultaneous nonparametric regression analysis of sparse longitudinal data. Bernoulli, 24, 3013–3038.
  • Cattaneo et al. (2020) Cattaneo, M. D., Jansson, M. and Ma, X. (2020). Simple local polynomial density estimators. J. Am. Stat. Assoc., 115, 1449–1455.
  • Chen and Cao (2017) Chen, L. and Cao, H. (2017). Analysis of asynchronous longitudinal data with partially linear models. Electron. J. Stat., 11, 1549–1569.
  • Diggle et al. (2002) Diggle, P. J., Heagerty, P., Liang, K. Y. and Zeger, S. L. (2002). Analysis of longitudinal data (2nd ed.), Clarendon, TX: Clarendon Press.
  • Ding (2021) Ding, P. (2021). The Frisch-Waugh-Lovell theorem for standard errors. Stat. Probab. Lett., 168, 108945.
  • Fan and Gijbels (1996) Fan, J. and Gijbels, I. (1996). Local polynomial modelling and its applications. Chapman and Hall, London.
  • Fan and Li (2004) Fan, J. and Li, R. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J. Am. Stat. Assoc., 99, 710–723.
  • Fan et al. (2007) Fan, J., Huang, T. and Li, R. (2007). Analysis of Longitudinal Data with Semiparametric Estimation of Covariance Function. J. Am. Stat. Assoc., 102, 632–641.
  • Fan and Zhang (2000) Fan, J. and Zhang, W. (2000). Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scand. J. Stat., 27, 715–731.
  • Fan and Zhang (2008) Fan, J. and Zhang, W. (2008). Statistical methods with varying coefficient models. Stat. Interface, 1, 179–195.
  • Frisch and Waugh (1933) Frisch, R. and Waugh, F. V. (1933). Partial time regressions as compared with individual trends. Econometrica, 1, 387–401.
  • Gu et al. (2014) Gu, L., Wang, L., Härdle, W. K. and Yang, L. (2014). A simultaneous confidence corridor for varying coefficient regression with sparse functional data. Test, 23, 806–843.
  • Hastie and Tibshirani (1993) Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. J. R. Stat. Soc. Ser. B-Stat. Methodol., 55, 757–796.
  • Li et al. (2022) Li, T., Li, T., Zhu, Z. and Zhu H. (2022). Regression analysis of asynchronous longitudinal functional and scalar data. J. Am. Stat. Assoc., 117, 1228–1242.
  • Liang and Zeger (1986) Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
  • Lin and Ying (2001) Lin, D. Y. and Ying, Z. (2001). Semiparametric and nonparametric regression analysis of longitudinal data. J. Am. Stat. Assoc., 96, 103–126.
  • Liu (1988) Liu, R. Y. (1988). Bootstrap procedures under some non-i.i.d. models. Ann. Stat., 16, 1696–1708.
  • Liu and Wu (2010) Liu, W. and Wu, W. B. (2010). Simultaneous nonparametric inference of time series. Ann. Stat., 38, 2388–2421.
  • Lou et al. (2021) Lou, J., Wang, Y., Li, L. and Zeng, D. (2021). Learning latent heterogeneity for type 2 diabetes patients using longitudinal health markers in electronic health records. Stat. Med., 40, 1930–1946.
  • Lovell (1963) Lovell, M. C. (1963). Seasonal adjustment of economic time series and multiple regression analysis. J. Am. Stat. Assoc., 58, 993–1010.
  • Ma et al. (2012) Ma, S., Yang, L. and Carroll, R. J. (2012). A simultaneous confidence band for sparse longitudinal regression. Stat. Sin., 22, 95–122.
  • Nadaraya (1964) Nadaraya, E. A. (1964). On estimating regression. Theory Probab. Appl., 9, 141–142.
  • Pepe and Anderson (1994) Pepe, M. S. and Anderson, G. L. (1994). A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Commun. Stat.-Simul. Comput., 23, 939–951.
  • Safieh et al. (2019) Safieh, M., Korczyn, A. D. and Michaelson, D. M. (2019). ApoE4: an emerging therapeutic target for Alzheimer’s disease. BMC Med., 17, 64.
  • Sentürk et al. (2013) Sentürk, D., Dalrymple, L. S., Mohammed, S. M., Kaysen, G. A. and Nguyen, D. V. (2013). Modeling time-varying effects with generalized and unsynchronized longitudinal data. Stat. Med., 32, 2971–2987.
  • Sun et al. (2021) Sun, D., Zhao, H. and Sun, J. (2021). Regression analysis of asynchronous longitudinal data with informative observation processes. Comput. Stat. Data Anal., 107161.
  • van der Vaart and Wellner (1996) van der Vaart, A. and Wellner, J. (1996). Weak convergence and empirical processes. Springer, New York.
  • Watson (1964) Watson, G. S. (1964). Smooth regression analysis. Sankhyā Ser. A., 26, 359–372.
  • Wu (1986) Wu, C. F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Stat., 14, 1261–1295.
  • Xiong and Dubin (2010) Xiong, X. and Dubin, J. A. (2010). A binning method for analyzing mixed longitudinal data measured at distinct time points. Stat. Med., 29, 1919–1931.
  • Zheng et al. (2014) Zheng, S., Yang, L. and Härdle, W. K. (2014). A smooth simultaneous confidence corridor for the mean of sparse functional data. J. Am. Stat. Assoc., 109, 661–673.
  • Zhou and Wu (2010) Zhou, Z. and Wu, W. B. (2010). Simultaneous inference of linear models with time varying coefficients. J. R. Stat. Soc. Ser. B-Stat. Methodol., 72, 513–531.
  • Zhu et al. (2007) Zhu, H., Ibrahim, J. G., Tang, N., Rowe, D. B., Hao, X., Bansal, R. and Peterson, B. S. (2007). A statistical analysis of brain morphology using wild bootstrapping. IEEE Trans. Med. Imaging, 26, 954–966.
  • Zhu et al. (2012) Zhu, H., Li, R. and Kong, L. (2012). Multivariate varying coefficient model for functional responses. Ann. Stat., 40, 2634–2666.

Appendix A Proofs of main results

In this section, we provide detailed proofs of Theorems 1 to 4. Our main tools come from empirical process theory.

A.1 Proof of Theorem 1

Theorem 1. Under conditions (A1)-(A4), as $n\rightarrow\infty$, we have

\sqrt{nh_{1}h_{2}}\left[\{\hat{\beta}_{w}(t)-\beta_{0}(t)\}^{T},\{\hat{\gamma}_{w}(t)-\gamma_{0}(t)\}^{T}\right]^{T}\stackrel{d}{\to}N\{0,A^{*}(t)^{-1}\Sigma^{*}(t)A^{*}(t)^{-1}\},

where

A^{*}(t)=E\{C(t,t)C(t,t)^{T}\}\eta(t,t)

and

\Sigma^{*}(t)=\big\{\iint K(z_{1},z_{2})^{2}\,dz_{1}dz_{2}\big\}E\left[C(t,t)C(t,t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t).
Proof.

For any time point $t$, denote the true regression function

\rho_{0}(t)=\{\beta_{0}(t)^{T},\dot{\beta}_{0}(t)^{T},\gamma_{0}(t)^{T},\dot{\gamma}_{0}(t)^{T}\}^{T},

estimated regression function

\hat{\rho}_{w}(t)=\{\hat{\beta}_{w}(t)^{T},\hat{\dot{\beta}}_{w}(t)^{T},\hat{\gamma}_{w}(t)^{T},\hat{\dot{\gamma}}_{w}(t)^{T}\}^{T},

and longitudinal covariates

R_{i}(t_{1},t_{2},t)=\{X_{i}(t_{1})^{T},X_{i}(t_{1})^{T}(t_{1}-t),Z_{i}(t_{2})^{T},Z_{i}(t_{2})^{T}(t_{2}-t)\}^{T}.

Let

e^{*}=\begin{pmatrix}e_{1,p\times p}&0_{p\times p}&0_{p\times q}&0_{p\times q}\\ 0_{q\times p}&0_{q\times p}&e_{2,q\times q}&0_{q\times q}\end{pmatrix}^{T}

be a matrix of dimension $(2p+2q)\times(p+q)$, where $e_{1,p\times p}$ is the $p\times p$ identity matrix, $e_{2,q\times q}$ is the $q\times q$ identity matrix, and the remaining blocks are zero matrices. Then our estimating equation becomes

{e^{*}}^{T}U_{n}^{w}\{\rho(t)\}
\triangleq n^{-1}\sum_{i=1}^{n}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho(t)\}\,dN^{*}_{i}(t_{1},t_{2})
= n^{-1}\sum_{i=1}^{n}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,C_{i}(t_{1},t_{2})\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho(t)\}\,dN^{*}_{i}(t_{1},t_{2}),  (S.1)

where $C_{i}(t_{1},t_{2})=\{X_{i}(t_{1})^{T},Z_{i}(t_{2})^{T}\}^{T}$, $i=1,\ldots,n$. Solving $U_{n}^{w}\{\rho(t)\}=0$, we obtain $\hat{\rho}_{w}(t)$. We first show the consistency of $\hat{\beta}_{w}(t)$ and $\hat{\gamma}_{w}(t)$. Note that

E\left[{e^{*}}^{T}U_{n}^{w}\{\rho(t)\}\right]
= n^{-1}\sum_{i=1}^{n}E\Big(E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho(t)\}\,dN_{i}^{*}(t_{1},t_{2})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\Big]\Big)
= n^{-1}\sum_{i=1}^{n}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\,E\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho(t)\mid X(t),Z(s),t,s\in[0,1]\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
= n^{-1}\sum_{i=1}^{n}E\Big(\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\big[R_{i}(t_{1},t_{1},t)^{T}\rho_{0}(t)-R_{i}(t_{1},t_{2},t)^{T}\rho(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big).

By conditions (A2)-(A4) and Taylor expansion,

E\left[{e^{*}}^{T}U_{n}^{w}\{\rho(t)\}\right]=E\{C(t,t)C(t,t)^{T}\}\eta(t,t)\left[\{\beta_{0}(t)-\beta(t)\}^{T},\{\gamma_{0}(t)-\gamma(t)\}^{T}\right]^{T}+o_{p}(1).

By the law of large numbers, ${e^{*}}^{T}U_{n}^{w}\{\rho(t)\}\stackrel{p}{\rightarrow}u^{w}\{\beta(t),\gamma(t)\}$ as $n\to\infty$, where $u^{w}\{\beta(t),\gamma(t)\}=E\{C(t,t)C(t,t)^{T}\}\eta(t,t)\left[\{\beta_{0}(t)-\beta(t)\}^{T},\{\gamma_{0}(t)-\gamma(t)\}^{T}\right]^{T}$ is a column vector in ${\mathbb{R}}^{p+q}$. Under condition (A2), $\{\beta_{0}(t)^{T},\gamma_{0}(t)^{T}\}^{T}$ is the unique solution to $u^{w}\{\beta(t),\gamma(t)\}=0$, and $\{\hat{\beta}_{w}(t)^{T},\hat{\gamma}_{w}(t)^{T}\}^{T}$ solves the estimating equation ${e^{*}}^{T}U_{n}^{w}\{\rho(t)\}=0$. By Andersen and Gill (1982), it follows that for any fixed $t$, $\{\hat{\beta}_{w}(t)^{T},\hat{\gamma}_{w}(t)^{T}\}^{T}\stackrel{p}{\to}\{\beta_{0}(t)^{T},\gamma_{0}(t)^{T}\}^{T}$.

We next show the asymptotic normality of $\{\hat{\beta}_{w}(t)^{T},\hat{\gamma}_{w}(t)^{T}\}^{T}$. Let $\mathcal{P}_{n}$ and $\mathcal{P}$ denote the empirical measure and the true probability measure, respectively. We have

(nh_{1}h_{2})^{1/2}{e^{*}}^{T}U_{n}^{w}\{\rho(t)\}
= (nh_{1}h_{2})^{1/2}(\mathcal{P}_{n}-\mathcal{P})\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho(t)\}\,dN^{*}(t_{1},t_{2})\Big]
\quad + (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho(t)\}\,dN^{*}(t_{1},t_{2})\Big]
= \mathrm{I}+\mathrm{II}.  (S.2)

For $\mathrm{I}$, we consider the class of functions

\Big\{(h_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho(t)\}\,dN^{*}(t_{1},t_{2}) : |\rho(t)-\rho_{0}(t)|<\epsilon\Big\}

for any fixed $t$ and a given constant $\epsilon$. The functions in this class are differentiable and their first-order derivatives are bounded by condition (A2); therefore they are Lipschitz continuous in $\rho(t)$ at any fixed $t$, with Lipschitz constant uniformly bounded by $(h_{1}h_{2})^{1/2}\|\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)R(t_{1},t_{2},t)^{T}\,dN^{*}(t_{1},t_{2})\|_{\infty}$. It can be shown that this is a P-Donsker class (van der Vaart and Wellner, 1996). We obtain that, for $|\rho(t)-\rho_{0}(t)|<M(nh_{1}h_{2})^{-1/2}$ with a constant $M$, the first term in (S.2) is equal to

\mathrm{I} = (nh_{1}h_{2})^{1/2}(\mathcal{P}_{n}-\mathcal{P})\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho_{0}(t)\}\,dN^{*}(t_{1},t_{2})\Big]+o_{p}(1)
= (nh_{1}h_{2})^{1/2}{e^{*}}^{T}\big(U_{n}^{w}\{\rho_{0}(t)\}-E[U_{n}^{w}\{\rho_{0}(t)\}]\big)+o_{p}(1).  (S.3)

For the second term on the right-hand side of equation (S.2), we have

\mathrm{II} = (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho_{0}(t)\}\,dN^{*}(t_{1},t_{2})\Big]
\quad - (nh_{1}h_{2})^{1/2}E\Big\{\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)R(t_{1},t_{2},t)^{T}\,dN^{*}(t_{1},t_{2})\Big\}\{\rho(t)-\rho_{0}(t)\}
= I_{1}-I_{2}.

Define

\varphi_{i}(t)=(h_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho_{0}(t)\}\,dN^{*}_{i}(t_{1},t_{2}),

then $(nh_{1}h_{2})^{1/2}{e^{*}}^{T}U_{n}^{w}\{\rho_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\varphi_{i}(t)$, and we have

I_{1} = \sqrt{n}\,E\{\varphi(t)\}
= \sqrt{n}\,E\big[E\{\varphi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\big]
= (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\,E\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho_{0}(t)\mid X(t),Z(s),t,s\in[0,1]\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
= (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{X(t_{1})^{T}\beta_{0}(t_{1})+Z(t_{1})^{T}\gamma_{0}(t_{1})-R(t_{1},t_{2},t)^{T}\rho_{0}(t)\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
= (nh_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,E\big[{e^{*}}^{T}R(t_{1},t_{2},t)\{R(t_{1},t_{1},t)^{T}-R(t_{1},t_{2},t)^{T}\}\rho_{0}(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\eta(t_{1},t_{2})\,dt_{1}dt_{2}.

By conditions (A3) and (A4), $I_{1}=o(1)$.

For $I_{2}$, we have

I_{2} = (nh_{1}h_{2})^{1/2}E\{C(t,t)C(t,t)^{T}\}\eta(t,t)\left[\{\beta(t)-\beta_{0}(t)\}^{T},\{\gamma(t)-\gamma_{0}(t)\}^{T}\right]^{T}+O\{(nh_{1}h_{2})^{1/2}(h_{1}^{2}+h_{2}^{2})\}.

As $n\to\infty$ and $(nh_{1}h_{2})^{1/2}(h_{1}^{2}+h_{2}^{2})\to 0$,

\mathrm{II}=-(nh_{1}h_{2})^{1/2}E\{C(t,t)C(t,t)^{T}\}\eta(t,t)\left[\{\beta(t)-\beta_{0}(t)\}^{T},\{\gamma(t)-\gamma_{0}(t)\}^{T}\right]^{T}+o(1).  (S.4)

Combining (S.2), (S.3) and (S.4), we obtain

(nh_{1}h_{2})^{1/2}{e^{*}}^{T}\big(U_{n}^{w}\{\rho_{0}(t)\}-E[U_{n}^{w}\{\rho_{0}(t)\}]\big)
= (nh_{1}h_{2})^{1/2}E\{C(t,t)C(t,t)^{T}\}\eta(t,t)\left[\{\hat{\beta}_{w}(t)-\beta_{0}(t)\}^{T},\{\hat{\gamma}_{w}(t)-\gamma_{0}(t)\}^{T}\right]^{T}+o(1).  (S.5)

Recall

\varphi_{i}(t)=(h_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho_{0}(t)\}\,dN^{*}_{i}(t_{1},t_{2}).

Since $(nh_{1}h_{2})^{1/2}{e^{*}}^{T}U_{n}^{w}\{\rho_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\varphi_{i}(t)$ is a sum of independent and identically distributed (i.i.d.) random variables, we next calculate $\mathrm{var}\{\varphi(t)\}$. Note that

\mathrm{var}\{\varphi(t)\} = E\big[\mathrm{var}\{\varphi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\big] + \mathrm{var}\big[E\{\varphi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\big],

where

E\big[\mathrm{var}\{\varphi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\big]
= h_{1}h_{2}E\Big[\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)R(s_{1},s_{2},t)^{T}e^{*}\,E\{Y(t_{1})Y(s_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\,dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big]
\quad - h_{1}h_{2}E\Big[\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)R(s_{1},s_{2},t)^{T}e^{*}\,E\{Y(t_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\,E\{Y(s_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\,dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big]
= \iint K(z_{1},z_{2})^{2}\,dz_{1}dz_{2}\,E\big[C(t,t)C(t,t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\big]\eta(t,t)+O(h_{1}^{2}+h_{2}^{2}),

and

\mathrm{var}\big[E\{\varphi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\big]
= h_{1}h_{2}\,\mathrm{var}\Big(\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\big[\{R(t_{1},t_{1},t)-R(t_{1},t_{2},t)\}^{T}\rho_{0}(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\,dN^{*}(t_{1},t_{2})\Big)
= h_{1}h_{2}E\Big(\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\big[\{R(t_{1},t_{1},t)-R(t_{1},t_{2},t)\}^{T}\rho_{0}(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\big[\{R(s_{1},s_{1},t)-R(s_{1},s_{2},t)\}^{T}\rho_{0}(t)+O_{p}\{(s_{1}-t)^{2}\}\big]R(s_{1},s_{2},t)^{T}e^{*}\,dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big)
\quad - h_{1}h_{2}\Big\{\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,E\Big({e^{*}}^{T}R(t_{1},t_{2},t)\big[\{R(t_{1},t_{1},t)-R(t_{1},t_{2},t)\}^{T}\rho_{0}(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\Big)\,dN^{*}(t_{1},t_{2})\Big\}^{\otimes 2}
= O(h_{1}^{2}+h_{2}^{2}).

To prove the asymptotic normality, we verify the Lyapunov condition. Note that

(nh_{1}h_{2})^{1/2}{e^{*}}^{T}U_{n}^{w}\{\rho_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\varphi_{i}(t)=\sum_{i=1}^{n}n^{1/2}n^{-1}\varphi_{i}(t),

then, similarly to the calculation of the variance, we have

\sum_{i=1}^{n}E\big[|n^{1/2}n^{-1}\varphi_{i}(t)-E\{n^{1/2}n^{-1}\varphi_{i}(t)\}|^{3}\big] = nO\{(nh_{1}h_{2})^{3/2}n^{-3}(h_{1}h_{2})^{-2}\} = O\{(nh_{1}h_{2})^{-1/2}\}.

Therefore, by condition (A4), $nh_{1}h_{2}\to\infty$ and $(nh_{1}h_{2})^{1/2}(h_{1}^{2}+h_{2}^{2})\to 0$, we have

(nh_{1}h_{2})^{1/2}{e^{*}}^{T}\big(U_{n}^{w}\{\rho_{0}(t)\}-E[U_{n}^{w}\{\rho_{0}(t)\}]\big)\xrightarrow{d}N\{0,\Sigma^{*}(t)\},

where $\Sigma^{*}(t)=\big\{\iint K(z_{1},z_{2})^{2}\,dz_{1}dz_{2}\big\}E\big[C(t,t)C(t,t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\big]\eta(t,t)$. Combining with equation (S.5), we have

\sqrt{nh_{1}h_{2}}\left[\{\hat{\beta}_{w}(t)-\beta_{0}(t)\}^{T},\{\hat{\gamma}_{w}(t)-\gamma_{0}(t)\}^{T}\right]^{T}\stackrel{d}{\to}N\{0,A^{*}(t)^{-1}\Sigma^{*}(t)A^{*}(t)^{-1}\},

where

A^{*}(t)=E\{C(t,t)C(t,t)^{T}\}\eta(t,t)

and

\Sigma^{*}(t)=\big\{\iint K(z_{1},z_{2})^{2}\,dz_{1}dz_{2}\big\}E\left[C(t,t)C(t,t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t).

Therefore the conclusion of Theorem 1 holds. ∎

A.2 Proof of Theorem 2

Theorem 2. Under conditions (2.3), (A5)-(A8), as $n\rightarrow\infty$, we have

\sqrt{nh}\{\hat{\beta}_{c}(t)-\beta_{0}(t)\}\stackrel{d}{\to}N\{0,A^{-1}(t)\Sigma(t)A^{-1}(t)\},

where

A(t)=E\{\tilde{X}(t)\tilde{X}(t)^{T}\}\lambda(t)
\quad\mbox{and}\quad \Sigma(t)=\big\{\int K(z)^{2}\,dz\big\}E\left[\tilde{X}(t)\tilde{X}(t)^{T}\sigma\{t,\tilde{X}(t)\}^{2}\right]\lambda(t).
Proof.

For any $t$, let

\theta_{0}(t)=\{\beta_{0}(t)^{T},\dot{\beta}_{0}(t)^{T}\}^{T},\quad \hat{\theta}_{c}(t)=\{\hat{\beta}_{c}(t)^{T},\hat{\dot{\beta}}_{c}(t)^{T}\}^{T},

and

\tilde{W}_{i}(t_{1},t)=\{\tilde{X}_{i}(t_{1})^{T},\tilde{X}_{i}(t_{1})^{T}(t_{1}-t)\}^{T}.

Under conditions (A7) and (A8), $\hat{m}_{X}(t)\stackrel{p}{\to}m_{X}(t)$ and $\hat{m}_{Y}(t)\stackrel{p}{\to}m_{Y}(t)$. Let $e=(e_{1,p\times p},0_{p\times p})^{T}\in{\mathbb{R}}^{2p\times p}$, where $e_{1,p\times p}$ is the $p\times p$ identity matrix; then the estimating equation is

e^{T}U_{n}^{c}\{\theta(t)\}=n^{-1}\sum_{i=1}^{n}\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}_{i}(t_{1},t)\{\tilde{Y}_{i}(t_{1})-\tilde{W}_{i}(t_{1},t)^{T}\theta(t)\}\,dN_{i}(t_{1})+o_{p}(1).  (S.6)

Solving $e^{T}U_{n}^{c}\{\theta(t)\}=0$, we obtain $\hat{\beta}_{c}(t)$. We first show the consistency of $\hat{\beta}_{c}(t)$. Note that

E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\}\,dN(t_{1})\right]
= E\left(E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\}\,dN(t_{1})\mid X(t),N(t),t\in[0,1]\right]\right)
= E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\,E\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\mid X(t),t\in[0,1]\}\lambda(t_{1})\,dt_{1}\right]
= E\left(\int K_{h}(t_{1}-t)\left[e^{T}\tilde{W}(t_{1},t)\tilde{W}(t_{1},t)^{T}\{\theta_{0}(t)-\theta(t)\}+O_{p}\{(t_{1}-t)^{2}\}\right]\lambda(t_{1})\,dt_{1}\right).

Let $t_{1}=t+hz$; by conditions (A6)-(A8) and the law of large numbers, we have $e^{T}U_{n}^{c}\{\theta(t)\}\stackrel{p}{\to}E\{\tilde{X}(t)\tilde{X}(t)^{T}\}\lambda(t)\{\beta_{0}(t)-\beta(t)\}$ as $n\to\infty$. Similarly to the proof of Theorem 1, under condition (A6) and the convexity lemma (Andersen and Gill, 1982), $\hat{\beta}_{c}(t)\stackrel{p}{\to}\beta_{0}(t)$.

Next we show the asymptotic normality of $\hat{\beta}_{c}(t)$. Let $\mathcal{P}_{n}$ and $\mathcal{P}$ denote the empirical measure and the true probability measure, respectively; we have

\sqrt{nh}\,e^{T}U_{n}^{c}\{\theta(t)\} = \sqrt{nh}(\mathcal{P}_{n}-\mathcal{P})\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\}\,dN(t_{1})\right]
\quad + \sqrt{nh}\,E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\}\,dN(t_{1})\right].  (S.7)

For the first term in (S.7), we consider the class of functions

\left\{\sqrt{h}\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\}\,dN(t_{1}) : |\theta(t)-\theta_{0}(t)|<\epsilon\right\}

for any given constant $\epsilon$ and any fixed time point $t$. Similarly to the proof of Theorem 1, it can be shown that this is a P-Donsker class (van der Vaart and Wellner, 1996). We obtain that, for $|\theta(t)-\theta_{0}(t)|<M(nh)^{-1/2}$ with a constant $M$, the first term in (S.7) is equal to

\sqrt{nh}(\mathcal{P}_{n}-\mathcal{P})\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\}\,dN(t_{1})\right]+o_{p}(1)
= \sqrt{nh}\,e^{T}\big(U_{n}^{c}\{\theta_{0}(t)\}-E[U_{n}^{c}\{\theta_{0}(t)\}]\big)+o_{p}(1).  (S.8)

For the second term on the right-hand side of equation (S.7), we have

\sqrt{nh}\,E\Big(\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\left[\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\{\theta(t)-\theta_{0}(t)\}-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\right]dN(t_{1})\Big)
= \sqrt{nh}\,E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\}\,dN(t_{1})\right]
\quad - \sqrt{nh}\,E\left\{\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\tilde{W}(t_{1},t)^{T}\lambda(t_{1})\,dt_{1}\right\}\{\theta(t)-\theta_{0}(t)\}
= I_{1}-I_{2}.  (S.9)

For $I_{1}$, let $\psi_{i}(t)=\sqrt{h}\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}_{i}(t_{1},t)\{\tilde{Y}_{i}(t_{1})-\tilde{W}_{i}(t_{1},t)^{T}\theta_{0}(t)\}\,dN_{i}(t_{1})$. The $\psi_{i}(t)$ are i.i.d. and $\sqrt{nh}\,e^{T}U_{n}^{c}\{\theta_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\psi_{i}(t)$. We have

I_{1} = \sqrt{n}\,E\{\psi(t)\}
= \sqrt{n}\,E\big[E\{\psi(t)\mid\tilde{X}(t),N(t),t\in[0,1]\}\big]
= (nh)^{1/2}E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\,E\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\mid\tilde{X}(t),t\in[0,1]\}\lambda(t_{1})\,dt_{1}\right]
= (nh)^{1/2}E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{W}(t_{1},t_{1})^{T}\theta_{0}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\}\lambda(t_{1})\,dt_{1}\right]
= (nh)^{1/2}\int K_{h}(t_{1}-t)\,E\big[e^{T}\{\tilde{W}(t_{1},t)\tilde{W}(t_{1},t_{1})^{T}-\tilde{W}(t_{1},t)\tilde{W}(t_{1},t)^{T}\}\theta_{0}(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\lambda(t_{1})\,dt_{1}.

Letting $t_{1}=t+hz$, similarly to the proof of Theorem 1, by conditions (A7) and (A8), we have $I_{1}=o(1)$.

For $I_{2}$, we have

I_{2} = \sqrt{nh}\,e^{T}E\left\{\int K(z)\tilde{W}(t+hz,t)\tilde{W}(t+hz,t)^{T}\lambda(t+hz)\,dz\right\}\{\theta(t)-\theta_{0}(t)\}
= \sqrt{nh}\,E\{\tilde{X}(t)\tilde{X}(t)^{T}\}\lambda(t)\{\beta(t)-\beta_{0}(t)\}+O(n^{1/2}h^{5/2}).

Consequently,

\sqrt{nh}\,E\{\tilde{X}(t)\tilde{X}(t)^{T}\}\lambda(t)\{\hat{\beta}_{c}(t)-\beta_{0}(t)\}+o(1)=\sqrt{nh}\,e^{T}\big(U_{n}^{c}\{\theta_{0}(t)\}-E[U_{n}^{c}\{\theta_{0}(t)\}]\big).  (S.10)

The variance of $\sqrt{nh}\,e^{T}U_{n}^{c}\{\theta_{0}(t)\}$ can be computed using the $\psi_{i}(t)$ defined earlier:

\mathrm{var}\{\psi(t)\} = E\big[\mathrm{var}\{\psi(t)\mid\tilde{X}(t),N(t),t\in[0,1]\}\big]+\mathrm{var}\big[E\{\psi(t)\mid\tilde{X}(t),N(t),t\in[0,1]\}\big] = J_{1}+J_{2},

where

J_{1} = E\left\{E\left(\left[\psi(t)-E\{\psi(t)\mid\tilde{X}(t),N(t),t\in[0,1]\}\right]^{2}\mid\tilde{X}(t),N(t),t\in[0,1]\right)\right\}
= hE\Big\{\iint K_{h}(t_{1}-t)K_{h}(s_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\tilde{W}(s_{1},t)^{T}e\,E\big(\big[\tilde{Y}(t_{1})-E\{\tilde{Y}(t_{1})\mid\tilde{X}(t),N(t),t\in[0,1]\}\big]\big[\tilde{Y}(s_{1})-E\{\tilde{Y}(s_{1})\mid\tilde{X}(t),N(t),t\in[0,1]\}\big]\mid\tilde{X}(t),N(t),t\in[0,1]\big)\,dN(t_{1})dN(s_{1})\Big\}
= \int K(z)^{2}\,dz\,E\big[\tilde{X}(t)\tilde{X}(t)^{T}\sigma\{t,\tilde{X}(t)\}^{2}\big]\lambda(t)+O(h^{2}),

and

J_{2} = h\,\mathrm{var}\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{W}(t_{1},t_{1})^{T}\theta_{0}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\}\,dN(t_{1})\right]
= hE\Big[\iint K_{h}(t_{1}-t)K_{h}(s_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{W}(t_{1},t_{1})^{T}\theta_{0}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\}\{\tilde{W}(s_{1},s_{1})^{T}\theta_{0}(s_{1})-\tilde{W}(s_{1},t)^{T}\theta_{0}(t)\}\tilde{W}(s_{1},t)^{T}e\,dN(t_{1})dN(s_{1})\Big]
\quad - h\left\{\int K_{h}(t_{1}-t)\,e^{T}E\big(\tilde{W}(t_{1},t)\big[\tilde{W}(t_{1},t_{1})^{T}\theta_{0}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\big]\big)\,dN(t_{1})\right\}^{\otimes 2}
= O(h^{2}).

To prove the asymptotic normality, we verify the Lyapunov condition. Note that

(nh)^{1/2}e^{T}U_{n}^{c}\{\theta_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\psi_{i}(t)=\sum_{i=1}^{n}n^{1/2}n^{-1}\psi_{i}(t),

then, similarly to the calculation of the variance,

\sum_{i=1}^{n}E\big[|n^{1/2}n^{-1}\psi_{i}(t)-E\{n^{1/2}n^{-1}\psi_{i}(t)\}|^{3}\big]=nO\{(nh)^{3/2}n^{-3}h^{-2}\}=O\{(nh)^{-1/2}\}.

Combining with (S.10), we have

\sqrt{nh}\{\hat{\beta}_{c}(t)-\beta_{0}(t)\}\stackrel{d}{\to}N\{0,A(t)^{-1}\Sigma(t)A(t)^{-1}\},

where

A(t)=E\{\tilde{X}(t)\tilde{X}(t)^{T}\}\lambda(t)
\quad\mbox{and}\quad \Sigma(t)=\big\{\int K(z)^{2}\,dz\big\}E\left[\tilde{X}(t)\tilde{X}(t)^{T}\sigma\{t,\tilde{X}(t)\}^{2}\right]\lambda(t).

Therefore the conclusion of Theorem 2 holds. ∎

A.3 Proof of Theorem 3

Theorem 3. Under conditions (2.3), (A5)-(A8), as $n\rightarrow\infty$, we have

\sqrt{nh}\{\hat{\beta}_{v}(t)-\beta_{0}(t)\}\stackrel{d}{\to}N\{0,A^{-1}(t)\Sigma(t)A^{-1}(t)\},

where $A(t)$ and $\Sigma(t)$ are specified in Theorem 2.

Proof.

From Fan and Zhang (2008), $\hat{\iota}_{v}(t)\triangleq\{\hat{\alpha}_{v}(t),\hat{\beta}_{v}(t)^{T},\hat{\dot{\alpha}}_{v}(t),\hat{\dot{\beta}}_{v}(t)^{T}\}^{T}$ is a consistent estimate of the coefficients in model (2.8). Here we show that $\hat{\beta}_{v}(t)$ is a consistent estimator of $\beta(t)$ in model (2.1). For any fixed time point $t$, define

U_{n}^{v}\{\iota(t)\}=n^{-1}\sum_{i=1}^{n}\int K_{h}(t_{1}-t)\{Y_{i}(t_{1})-Q_{i}(t_{1},t)^{T}\iota(t)\}^{2}\,dN_{i}(t_{1}),  (S.11)

where $\iota(t)=\{\alpha(t),\beta(t)^{T},\dot{\alpha}(t),\dot{\beta}(t)^{T}\}^{T}$ and $Q_{i}(t_{1},t)=\{1,X_{i}(t_{1})^{T},(t_{1}-t),X_{i}(t_{1})^{T}(t_{1}-t)\}^{T}$. From (2.5), $\alpha(t)=E\{Z(t)\}^{T}\gamma(t)=E\{Y(t)\}-E\{X(t)\}^{T}\beta(t)$, so we have

U_{n}^{v}\{\iota(t)\}
= n^{-1}\sum_{i=1}^{n}\int K_{h}(t_{1}-t)\{Y_{i}(t_{1})-\alpha(t)-\dot{\alpha}(t)(t_{1}-t)-X_{i}(t_{1})^{T}\beta(t)-X_{i}(t_{1})^{T}\dot{\beta}(t)(t_{1}-t)\}^{2}\,dN_{i}(t_{1})
= n^{-1}\sum_{i=1}^{n}\int K_{h}(t_{1}-t)\{Y_{i}(t_{1})-EY(t_{1})+O_{p}(t_{1}-t)^{2}+EX(t_{1})^{T}\beta(t_{1})-X_{i}(t_{1})^{T}\beta(t)-X_{i}(t_{1})^{T}\dot{\beta}(t)(t_{1}-t)\}^{2}\,dN_{i}(t_{1})
= n^{-1}\sum_{i=1}^{n}\int K_{h}(t_{1}-t)\{\tilde{Y}_{i}(t_{1})-\tilde{W}_{i}(t_{1},t)^{T}\theta(t)\}^{2}\,dN_{i}(t_{1})+o_{p}(1),  (S.12)

where $\theta(t)=\{\beta(t)^{T},\dot{\beta}(t)^{T}\}^{T}$, $\tilde{W}_{i}(t_{1},t)=\{\tilde{X}_{i}(t_{1})^{T},\tilde{X}_{i}(t_{1})^{T}(t_{1}-t)\}^{T}$, $\tilde{X}_{i}(t_{1})=X_{i}(t_{1})-EX(t_{1})$, and $\tilde{Y}_{i}(t_{1})=Y_{i}(t_{1})-EY(t_{1})$. The estimating equation is the same as in the centering approach. Consequently, let $\hat{\theta}_{v}(t)=\{\hat{\beta}_{v}(t)^{T},\hat{\dot{\beta}}_{v}(t)^{T}\}^{T}$ be the estimator from the varying-coefficient model approach and $\theta_{0}(t)$ the true function. The proof of Theorem 3 is the same as that of Theorem 2 and is thus omitted. ∎

A.4 Proof of Theorem 4

Theorem 4. Under conditions (2.3), (A1)-(A4), as $n\rightarrow\infty$, we have

\sqrt{nh_{1}h_{2}}\{\hat{\gamma}(t)-\gamma_{0}(t)\}\stackrel{d}{\to}N\{0,A^{+}(t)^{-1}\Sigma^{+}(t)A^{+}(t)^{-1}\},

where

A^{+}(t)=E\{Z(t)Z(t)^{T}\}\eta(t,t)
\quad\mbox{and}\quad \Sigma^{+}(t)=\big\{\iint K(z_{1},z_{2})^{2}\,dz_{1}dz_{2}\big\}E\left[Z(t)Z(t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t).
Proof.

For any fixed time point $t$, let $\phi_{0}(t)=\{\gamma_{0}(t)^{T},\dot{\gamma}_{0}(t)^{T}\}^{T}$, $\hat{\phi}(t)=\{\hat{\gamma}(t)^{T},\hat{\dot{\gamma}}(t)^{T}\}^{T}$, $V_{i}(t_{2},t)=\{Z_{i}(t_{2})^{T},Z_{i}(t_{2})^{T}(t_{2}-t)\}^{T}$, $\theta_{0}(t)=\{\beta_{0}(t)^{T},\dot{\beta}_{0}(t)^{T}\}^{T}$, $\hat{\theta}(t)=\{\hat{\beta}(t)^{T},\hat{\dot{\beta}}(t)^{T}\}^{T}$, and $W_{i}(t_{1},t)=\{X_{i}(t_{1})^{T},X_{i}(t_{1})^{T}(t_{1}-t)\}^{T}$. For $|\hat{\theta}(t)-\theta_{0}(t)|<\epsilon$ with any given constant $\epsilon$, the estimating equation for $\phi(t)$ becomes

U_{n}\{\phi(t),\hat{\theta}(t)\}
= n^{-1}\sum_{i=1}^{n}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,V_{i}(t_{2},t)\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\theta_{0}(t)-V_{i}(t_{2},t)^{T}\phi(t)\}\,dN^{*}_{i}(t_{1},t_{2})+o_{p}(1).  (S.13)

Let $e^{+}=(e_{2,q\times q},0_{q\times q})^{T}\in{\mathbb{R}}^{2q\times q}$. Taking the expected value of ${e^{+}}^{T}U_{n}\{\phi(t),\hat{\theta}(t)\}$, we have

n^{-1}\sum_{i=1}^{n}{e^{+}}^{T}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,V_{i}(t_{2},t)\,E\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\theta_{0}(t)-V_{i}(t_{2},t)^{T}\phi(t)\mid X(t),Z(s),t,s\in[0,1]\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]+o_{p}(1)
= n^{-1}\sum_{i=1}^{n}{e^{+}}^{T}E\Big(\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,V_{i}(t_{2},t)\big[V_{i}(t_{1},t)^{T}\phi_{0}(t)-V_{i}(t_{2},t)^{T}\phi(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big).

Let $t_{1}=h_{1}z_{1}+t$ and $t_{2}=h_{2}z_{2}+t$; then by conditions (A2)-(A4) and Taylor expansion, we have $E[{e^{+}}^{T}U_{n}\{\phi(t),\hat{\theta}(t)\}]=E\{Z(t)Z(t)^{T}\}\eta(t,t)\{\gamma_{0}(t)-\gamma(t)\}+o(1)$. By the law of large numbers, ${e^{+}}^{T}U_{n}\{\phi(t),\hat{\theta}(t)\}\stackrel{p}{\rightarrow}u\{\gamma(t)\}$ as $n\to\infty$, where $u\{\gamma(t)\}=E\{Z(t)Z(t)^{T}\}\eta(t,t)\{\gamma_{0}(t)-\gamma(t)\}$. Under condition (A2), $\gamma_{0}(t)$ is the unique solution to $u\{\gamma(t)\}=0$, and $\hat{\gamma}(t)$ solves the estimating equation ${e^{+}}^{T}U_{n}\{\phi(t),\hat{\theta}(t)\}=0$. By Andersen and Gill (1982), it follows that for any fixed $t$, $\hat{\gamma}(t)\stackrel{p}{\to}\gamma_{0}(t)$.

We next show the asymptotic normality of $\hat{\gamma}(t)$. Using $\mathcal{P}_{n}$ and $\mathcal{P}$ to denote the empirical measure and the true probability measure, respectively, we have

(nh_{1}h_{2})^{1/2}{e^{+}}^{T}U_{n}\{\phi(t),\hat{\theta}(t)\}
= (nh_{1}h_{2})^{1/2}(\mathcal{P}_{n}-\mathcal{P})\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V_{i}(t_{2},t)\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\hat{\theta}(t)-V_{i}(t_{2},t)^{T}\phi(t)\}\,dN^{*}_{i}(t_{1},t_{2})\Big]
\quad + (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V_{i}(t_{2},t)\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\hat{\theta}(t)-V_{i}(t_{2},t)^{T}\phi(t)\}\,dN^{*}_{i}(t_{1},t_{2})\Big]
= \mathrm{I}+\mathrm{II}.  (S.14)

For the term $\mathrm{I}$ and any fixed time point $t$, we consider the class of functions

\Big\{(h_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V(t_{2},t)\{Y(t_{1})-W(t_{1},t)^{T}\theta(t)-V(t_{2},t)^{T}\phi(t)\}\,dN^{*}(t_{1},t_{2}) : |\theta(t)-\theta_{0}(t)|<\epsilon_{1},\ |\phi(t)-\phi_{0}(t)|<\epsilon_{2}\Big\}

for given constants $\epsilon_{1}$ and $\epsilon_{2}$. Similarly to the proof of Theorem 1, we obtain that, for $|\theta(t)-\theta_{0}(t)|<M_{1}(nh)^{-1/2}$ and $|\phi(t)-\phi_{0}(t)|<M_{2}(nh_{1}h_{2})^{-1/2}$, where $M_{1}$ and $M_{2}$ are constants, the first term $\mathrm{I}$ in (S.14) is equal to

(nh_{1}h_{2})^{1/2}(\mathcal{P}_{n}-\mathcal{P})\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V_{i}(t_{2},t)\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\theta_{0}(t)-V_{i}(t_{2},t)^{T}\phi_{0}(t)\}\,dN^{*}_{i}(t_{1},t_{2})\Big]+o_{p}(1)
= (nh_{1}h_{2})^{1/2}{e^{+}}^{T}\big(U_{n}\{\phi_{0}(t),\theta_{0}(t)\}-E[U_{n}\{\phi_{0}(t),\theta_{0}(t)\}]\big)+o_{p}(1).

For the second term II\mathrm{II} on the right-hand side of equation (S.14), we have

\mathrm{II}
= (nh_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,E\big[{e^{+}}^{T}V(t_{2},t)\{Y(t_{1})-W(t_{1},t)^{T}\hat{\theta}(t)-V(t_{2},t)^{T}\phi(t)\}\big]\eta(t_{1},t_{2})\,dt_{1}dt_{2}
= (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V(t_{2},t)\{Y(t_{1})-W(t_{1},t)^{T}\theta_{0}(t)-V(t_{2},t)^{T}\phi_{0}(t)\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
\quad - (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V(t_{2},t)W(t_{1},t)^{T}\{\hat{\theta}(t)-\theta_{0}(t)\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
\quad - (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V(t_{2},t)V(t_{2},t)^{T}\{\phi(t)-\phi_{0}(t)\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
= I_{1}-I_{2}-I_{3}.

For I_{1}, let

\begin{split}
\pi_{i}(t)=&\,(h_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t){e^{+}}^{T}V_{i}(t_{2},t)\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\theta_{0}(t)\\
&-V_{i}(t_{2},t)^{T}\phi_{0}(t)\}dN^{*}_{i}(t_{1},t_{2}),
\end{split}

so that the \pi_{i}(t) are i.i.d. and (nh_{1}h_{2})^{1/2}{e^{+}}^{T}U_{n}\{\phi_{0}(t),\theta_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\pi_{i}(t). Then we have

\begin{split}
I_{1}=&\,\sqrt{n}E\{\pi(t)\}\\
=&\,\sqrt{n}E\left[E\{\pi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\right]\\
=&\,(nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t){e^{+}}^{T}V(t_{2},t)\\
&\quad\times E\{Y(t_{1})-W(t_{1},t)^{T}\theta_{0}(t)-V(t_{2},t)^{T}\phi_{0}(t)\mid X(t),Z(s),t,s\in[0,1]\}\eta(t_{1},t_{2})dt_{1}dt_{2}\Big]\\
=&\,(nh_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)E\Big[{e^{+}}^{T}\{V(t_{2},t)V(t_{1},t)^{T}-V(t_{2},t)V(t_{2},t)^{T}\}\phi_{0}(t)\\
&\quad+O\{(t_{1}-t)^{2}\}\Big]\eta(t_{1},t_{2})dt_{1}dt_{2}.
\end{split}

Letting t_{1}=h_{1}z_{1}+t and t_{2}=h_{2}z_{2}+t and arguing as in the proof of Theorem 1, we have I_{1}=o(1). By the proof of Theorem 2 and conditions (A4) and (A8), we have I_{2}=o(1).

For I_{3}, we have

\begin{split}
I_{3}=&\,(nh_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t){e^{+}}^{T}E\{V(t_{2},t)V(t_{2},t)^{T}\}\{\phi(t)-\phi_{0}(t)\}\eta(t_{1},t_{2})dt_{1}dt_{2}\\
=&\,(nh_{1}h_{2})^{1/2}\iint K(z_{1},z_{2}){e^{+}}^{T}E\{V(h_{2}z_{2}+t,t)V(h_{2}z_{2}+t,t)^{T}\}\eta(h_{1}z_{1}+t,h_{2}z_{2}+t)dz_{1}dz_{2}\\
&\quad\times\{\phi(t)-\phi_{0}(t)\}\\
=&\,(nh_{1}h_{2})^{1/2}E\{Z(t)Z(t)^{T}\}\eta(t,t)\{\gamma(t)-\gamma_{0}(t)\}+o(1).
\end{split}

Consequently,

\begin{split}
&(nh_{1}h_{2})^{1/2}{e^{+}}^{T}\left(U_{n}\{\phi_{0}(t),\theta_{0}(t)\}-E\left[U_{n}\{\phi_{0}(t),\theta_{0}(t)\}\right]\right)\\
=&\,(nh_{1}h_{2})^{1/2}E\{Z(t)Z(t)^{T}\}\eta(t,t)\{\hat{\gamma}(t)-\gamma_{0}(t)\}+o_{p}(1).
\end{split}

Now we show that (nh_{1}h_{2})^{1/2}{e^{+}}^{T}U_{n}\{\phi_{0}(t),\theta_{0}(t)\} satisfies the central limit theorem. The variance can be computed using \pi_{i}(t):

\begin{split}
\mathrm{var}\{\pi(t)\}=&\,E\left[\mathrm{var}\{\pi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\right]\\
&+\mathrm{var}\left[E\{\pi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\right]\\
=&\,J_{1}+J_{2},
\end{split}

where

\begin{split}
J_{1}=&\,h_{1}h_{2}E\Big[\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t){e^{+}}^{T}V(t_{2},t)V(s_{2},t)^{T}e^{+}\\
&\quad\times E\{Y(t_{1})Y(s_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big]\\
&-h_{1}h_{2}E\Big[\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t){e^{+}}^{T}V(t_{2},t)V(s_{2},t)^{T}e^{+}\\
&\quad\times E\{Y(t_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}E\{Y(s_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\\
&\quad\times dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big]\\
=&\,\iint K(z_{1},z_{2})^{2}dz_{1}dz_{2}\,E\left[Z(t)Z(t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t)+O(h_{1}^{2}+h_{2}^{2}),
\end{split}

and

\begin{split}
J_{2}=&\,h_{1}h_{2}\mathrm{var}\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t){e^{+}}^{T}V(t_{2},t)\{Z(t_{1})^{T}\gamma_{0}(t_{1})-V(t_{2},t)^{T}\phi_{0}(t)\}dN^{*}(t_{1},t_{2})\Big]\\
=&\,h_{1}h_{2}E\Big[\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t){e^{+}}^{T}V(t_{2},t)\{V(t_{1},t)-V(t_{2},t)\}^{T}\phi_{0}(t)\\
&\quad\times\phi_{0}(t)^{T}\{V(s_{1},t)-V(s_{2},t)\}V(s_{2},t)^{T}e^{+}\,dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big]\\
&-h_{1}h_{2}\Big(\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)E\left[{e^{+}}^{T}V(t_{2},t)\{V(t_{1},t)-V(t_{2},t)\}^{T}\right]\phi_{0}(t)\,\eta(t_{1},t_{2})dt_{1}dt_{2}\Big)^{\otimes 2}\\
=&\,O(h_{1}^{2}+h_{2}^{2}).
\end{split}
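Here the law of total variance is applied conditionally on the covariate processes and the counting process N^{*}: J_{1} is driven by the conditional variance \sigma\{t,X(t),Z(t)\}^{2} of the response, while J_{2}, the variance of the conditional mean, only involves the smoothing-bias factor \{V(t_{1},t)-V(t_{2},t)\}^{T}\phi_{0}(t); since |t_{1}-t_{2}|\leq h_{1}+h_{2} on the support of the kernel, this factor is O(h_{1}+h_{2}), which explains the order J_{2}=O(h_{1}^{2}+h_{2}^{2}).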

Thus

\begin{split}
\mathrm{var}\{\pi(t)\}&=\iint K(z_{1},z_{2})^{2}dz_{1}dz_{2}\,E\left[Z(t)Z(t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t)+O(h_{1}^{2}+h_{2}^{2})\\
&=\Sigma^{+}(t)+O(h_{1}^{2}+h_{2}^{2}).
\end{split}

To prove the asymptotic normality, we verify the Lyapunov condition. Note that

(nh_{1}h_{2})^{1/2}{e^{+}}^{T}U_{n}\{\phi_{0}(t),\theta_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\pi_{i}(t)=\sum_{i=1}^{n}n^{1/2}n^{-1}\pi_{i}(t),

then, by a calculation similar to that of the variance,

\begin{split}
\sum_{i=1}^{n}E\left[\left|n^{1/2}n^{-1}\pi_{i}(t)-E\{n^{1/2}n^{-1}\pi_{i}(t)\}\right|^{3}\right]&=nO\{(nh_{1}h_{2})^{3/2}n^{-3}(h_{1}h_{2})^{-2}\}\\
&=O\{(nh_{1}h_{2})^{-1/2}\}.
\end{split}
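To see the order, note that E|\pi_{i}(t)|^{3}=O\{(h_{1}h_{2})^{-1/2}\}: the scaling (h_{1}h_{2})^{1/2} contributes (h_{1}h_{2})^{3/2}, the three kernel factors contribute (h_{1}h_{2})^{-3}, and the effective integration region contributes O(h_{1}h_{2}); combining this with the factor n\cdot n^{-3/2} from the triangular-array weights yields the stated order.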

Consequently, by condition (A4) we have

(nh_{1}h_{2})^{1/2}{e^{+}}^{T}\left(U_{n}\{\phi_{0}(t),\theta_{0}(t)\}-E\left[U_{n}\{\phi_{0}(t),\theta_{0}(t)\}\right]\right)\xrightarrow{d}N\{0,\Sigma^{+}(t)\}.

Then we have

\sqrt{nh_{1}h_{2}}\{\hat{\gamma}(t)-\gamma_{0}(t)\}\stackrel{d}{\to}N\{0,A^{+}(t)^{-1}\Sigma^{+}(t)A^{+}(t)^{-1}\},

where

\begin{split}
A^{+}(t)&=E\{Z(t)Z(t)^{T}\}\eta(t,t)\\
\mbox{and}\quad\Sigma^{+}(t)&=\Big\{\iint K(z_{1},z_{2})^{2}dz_{1}dz_{2}\Big\}E\left[Z(t)Z(t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t).
\end{split}

Therefore the conclusion of Theorem 4 holds. ∎
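For readers who wish to compute the plug-in sandwich standard errors implied by Theorem 4, the following minimal sketch (Python with NumPy) illustrates one possible implementation of \hat{A}^{+}(t)^{-1}\hat{\Sigma}^{+}(t)\hat{A}^{+}(t)^{-1}/(nh_{1}h_{2}); the Epanechnikov kernel, the per-subject data layout, and all names (epanechnikov, sandwich_se) are our own illustrative assumptions, not the code used for the reported simulations.

import numpy as np

def epanechnikov(u):
    # Epanechnikov kernel; any symmetric density meeting the
    # regularity conditions on K could be substituted.
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def sandwich_se(t, subjects, gamma_hat, h1, h2):
    # Plug-in standard errors for gamma_hat(t).
    # `subjects` is a hypothetical list of per-subject records:
    #   'L1': response times (m1,), 'R': working responses
    #         Y_i(t_1) - W_i(t_1, t)^T theta_hat(t) at those times,
    #   'L2': asynchronous covariate times (m2,),
    #   'Z' : covariates observed at 'L2', shape (m2, p).
    n = len(subjects)
    p = subjects[0]['Z'].shape[1]
    A = np.zeros((p, p))
    Sigma = np.zeros((p, p))
    for s in subjects:
        # kernel weight K_{h1,h2}(t1 - t, t2 - t) for every (t1, t2) pair
        w = (epanechnikov((s['L1'][:, None] - t) / h1) *
             epanechnikov((s['L2'][None, :] - t) / h2)) / (h1 * h2)
        resid = s['R'][:, None] - (s['Z'] @ gamma_hat)[None, :]
        # pi_i(t): kernel-weighted score, including the (h1 h2)^{1/2} scaling
        pi = np.sqrt(h1 * h2) * np.einsum('jk,jk,kp->p', w, resid, s['Z'])
        A += np.einsum('jk,kp,kq->pq', w, s['Z'], s['Z'])
        Sigma += np.outer(pi, pi)
    A /= n        # estimates A^+(t) = E{Z(t)Z(t)^T} eta(t, t)
    Sigma /= n    # estimates Sigma^+(t)
    Ainv = np.linalg.inv(A)
    return np.sqrt(np.diag(Ainv @ Sigma @ Ainv / (n * h1 * h2)))

Here gamma_hat would be the solution of the kernel-weighted estimating equation at the same t, and the returned values estimate the point-wise standard errors of the kind summarized as "SE" in Appendix B.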

Appendix B Additional simulations

In this section, we present simulation results for setting (i) \beta(t)=3(t-0.4)^{2} and \gamma(t)=\sin(2\pi t) (Table S.1), using the two-step approach with the varying-coefficient model (VCM) method in the first step and kernel weighting (KW) in the second step. We further present results for setting (ii) \beta(t)=0.4t+0.5 and \gamma(t)=\sqrt{t} using all three proposed methods (Table S.2).
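To make the simulation settings concrete, the sketch below (Python with NumPy) generates one hypothetical dataset matching setting (ii); the Poisson observation counts, uniform measurement times, linear subject-level Z process, and Gaussian errors are placeholder assumptions for illustration only and are not necessarily the designs used to produce Tables S.1-S.2.

import numpy as np

rng = np.random.default_rng(2024)

def beta(t):    # setting (ii): coefficient of the synchronous covariate
    return 0.4 * t + 0.5

def gamma(t):   # setting (ii): coefficient of the asynchronous covariate
    return np.sqrt(t)

def simulate_subject(rng):
    # Hypothetical observation scheme: the numbers of response times and of
    # asynchronous covariate times are Poisson, and the times themselves are
    # independent uniforms on [0, 1], so the two sets of times are misaligned.
    m1 = rng.poisson(5) + 1
    m2 = rng.poisson(5) + 1
    t1 = np.sort(rng.uniform(size=m1))   # times of Y and the synchronous X
    t2 = np.sort(rng.uniform(size=m2))   # times of the asynchronous Z
    a, b = rng.normal(size=2)            # placeholder smooth Z_i(t) = a + b t
    X = rng.normal(size=m1)              # synchronous covariate observed at t1
    eps = rng.normal(scale=0.5, size=m1)
    Y = X * beta(t1) + (a + b * t1) * gamma(t1) + eps
    return {'t1': t1, 'Y': Y, 'X': X, 't2': t2, 'Z': a + b * t2}

data = [simulate_subject(rng) for _ in range(400)]   # n = 400, as in Table S.2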

Table S.1: 1000 simulation results for \beta(t)=3(t-0.4)^{2} and \gamma(t)=\sin(2\pi t)

BD | NP-F | t=0.3: Bias SD SE CP | t=0.6: Bias SD SE CP | t=0.9: Bias SD SE CP

Results for n=400, Two-step (VCM+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.003 0.157 0.144 92.2 | 0.005 0.124 0.122 92.9 | -0.001 0.133 0.122 92.5
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.004 0.198 0.186 92.2 | -0.001 0.168 0.155 91.3 | 0.006 0.168 0.156 92.6
auto | \beta(t) | -0.007 0.154 0.145 91.5 | 0.003 0.130 0.126 92.9 | -0.003 0.127 0.125 93.5
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.045 0.153 0.144 90.3 | 0.033 0.155 0.140 88.8 | 0.034 0.147 0.134 89.7
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.045 0.163 0.151 89.7 | 0.029 0.166 0.148 89.1 | 0.028 0.148 0.142 91.0
auto | \gamma(t) | -0.039 0.153 0.141 90.6 | 0.035 0.148 0.141 91.4 | 0.032 0.133 0.128 92.0

Results for n=900, Two-step (VCM+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.001 0.120 0.119 94.6 | -0.003 0.101 0.100 93.8 | 0.001 0.102 0.100 93.4
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.007 0.171 0.161 92.1 | 0.000 0.140 0.137 94.1 | 0.002 0.147 0.136 91.9
auto | \beta(t) | -0.002 0.134 0.125 92.8 | 0.000 0.104 0.100 93.3 | -0.001 0.103 0.100 92.6
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.024 0.140 0.131 91.0 | 0.018 0.135 0.130 92.1 | 0.028 0.127 0.122 91.6
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.028 0.147 0.156 90.8 | 0.013 0.138 0.135 92.7 | 0.016 0.130 0.139 92.5
auto | \gamma(t) | -0.036 0.141 0.129 90.2 | 0.014 0.135 0.129 92.7 | 0.021 0.116 0.123 92.1
  • Note: “BD” denotes the bandwidths, where h is the bandwidth in the varying-coefficient model (VCM) approach and h_{1} and h_{2} are the bandwidths in the kernel weighting (KW) approach; “auto” denotes automatic bandwidth selection. “NP-F” denotes the non-parametric function. “Bias” is the empirical bias, “SD” is the sample standard deviation, “SE” is the average of the estimated standard errors, and “CP” is the point-wise 95% coverage probability.

Table S.2: 1000 simulation results for \beta(t)=0.4t+0.5 and \gamma(t)=\sqrt{t}

BD | NP-F | t=0.3: Bias SD SE CP | t=0.6: Bias SD SE CP | t=0.9: Bias SD SE CP

Results for n=400, Two-step (Centering+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.003 0.130 0.120 91.4 | -0.003 0.138 0.133 92.6 | -0.004 0.150 0.145 93.1
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.010 0.164 0.155 92.0 | 0.000 0.177 0.172 92.6 | 0.004 0.200 0.186 91.8
auto | \beta(t) | -0.003 0.142 0.133 91.9 | 0.005 0.134 0.132 93.1 | -0.001 0.164 0.154 92.6
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.025 0.147 0.138 91.6 | -0.031 0.152 0.140 88.9 | -0.035 0.149 0.135 89.6
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.026 0.158 0.150 92.5 | -0.027 0.156 0.149 91.0 | -0.029 0.152 0.144 90.8
auto | \gamma(t) | -0.024 0.144 0.133 90.4 | -0.029 0.151 0.143 91.0 | -0.029 0.144 0.136 91.2

Two-step (VCM+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | -0.002 0.131 0.119 91.8 | -0.011 0.132 0.132 94.2 | -0.004 0.147 0.144 93.1
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | -0.002 0.167 0.154 91.6 | -0.002 0.187 0.169 91.2 | 0.001 0.199 0.188 92.7
auto | \beta(t) | 0.004 0.130 0.128 93.7 | 0.012 0.144 0.134 92.0 | 0.004 0.151 0.144 91.9
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.029 0.152 0.136 89.9 | -0.028 0.149 0.141 92.3 | -0.033 0.145 0.134 90.5
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.026 0.159 0.143 89.6 | -0.032 0.157 0.153 91.0 | -0.039 0.154 0.150 91.6
auto | \gamma(t) | -0.025 0.142 0.133 91.1 | -0.030 0.151 0.137 90.4 | -0.043 0.140 0.129 90.4

One-step (KW)
h_{1}=h_{2}=n^{-0.45} | \beta(t) | 0.001 0.124 0.114 91.8 | -0.004 0.118 0.115 91.8 | -0.005 0.116 0.119 93.9
h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.004 0.146 0.136 90.6 | -0.005 0.148 0.135 90.3 | -0.002 0.150 0.138 91.5
auto | \beta(t) | -0.001 0.101 0.096 93.1 | 0.006 0.125 0.114 91.7 | 0.000 0.111 0.103 92.3
h_{1}=h_{2}=n^{-0.45} | \gamma(t) | -0.025 0.122 0.115 92.6 | -0.024 0.124 0.115 91.4 | -0.030 0.120 0.111 90.7
h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.009 0.148 0.137 91.0 | -0.020 0.148 0.135 90.2 | -0.018 0.151 0.135 91.6
auto | \gamma(t) | -0.026 0.101 0.096 91.7 | -0.026 0.122 0.113 91.0 | -0.040 0.102 0.098 91.2

Results for n=900, Two-step (Centering+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | -0.001 0.099 0.098 93.7 | -0.004 0.115 0.110 92.8 | 0.007 0.118 0.119 94.2
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.000 0.143 0.134 92.9 | 0.001 0.157 0.147 90.6 | 0.000 0.166 0.162 93.4
auto | \beta(t) | 0.000 0.112 0.110 93.9 | 0.002 0.114 0.109 92.7 | 0.001 0.139 0.134 92.7
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.015 0.137 0.126 90.4 | -0.016 0.134 0.127 92.3 | -0.015 0.126 0.124 92.3
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.010 0.140 0.143 91.6 | -0.021 0.142 0.143 92.5 | -0.025 0.140 0.177 92.6
auto | \gamma(t) | -0.014 0.132 0.128 91.3 | -0.014 0.134 0.127 92.1 | -0.024 0.133 0.129 91.2

Two-step (VCM+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | -0.002 0.104 0.099 93.0 | 0.004 0.113 0.109 93.9 | 0.000 0.123 0.119 92.8
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.004 0.143 0.133 91.4 | 0.000 0.157 0.148 92.3 | 0.004 0.166 0.160 93.4
auto | \beta(t) | -0.001 0.095 0.098 94.9 | 0.006 0.116 0.113 93.6 | 0.004 0.123 0.119 92.9
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.009 0.130 0.127 92.9 | -0.015 0.132 0.127 92.6 | -0.022 0.129 0.124 91.9
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.016 0.141 0.132 91.6 | -0.026 0.138 0.138 92.2 | -0.025 0.136 0.155 91.6
auto | \gamma(t) | -0.013 0.124 0.119 92.6 | -0.020 0.125 0.119 92.2 | -0.031 0.131 0.123 90.5

One-step (KW)
h_{1}=h_{2}=n^{-0.45} | \beta(t) | 0.002 0.102 0.098 94.2 | -0.003 0.107 0.099 92.9 | 0.003 0.106 0.100 92.6
h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.000 0.136 0.125 90.4 | 0.002 0.135 0.126 91.7 | -0.001 0.139 0.128 92.6
auto | \beta(t) | -0.001 0.086 0.082 93.1 | 0.001 0.085 0.083 93.3 | 0.001 0.100 0.098 93.4
h_{1}=h_{2}=n^{-0.45} | \gamma(t) | -0.011 0.104 0.099 90.9 | -0.017 0.105 0.098 92.0 | -0.018 0.099 0.094 91.8
h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.006 0.131 0.125 92.3 | -0.015 0.133 0.126 92.8 | -0.012 0.134 0.121 90.8
auto | \gamma(t) | -0.021 0.086 0.081 91.7 | -0.026 0.088 0.083 91.1 | -0.025 0.095 0.093 93.1
  • Note: “BD” denotes the bandwidths, where h is the bandwidth in the centering and varying-coefficient model (VCM) approaches and h_{1} and h_{2} are the bandwidths in the kernel weighting (KW) approach; “auto” denotes automatic bandwidth selection. “NP-F” denotes the non-parametric function. “Bias” is the empirical bias, “SD” is the sample standard deviation, “SE” is the average of the estimated standard errors, and “CP” is the point-wise 95% coverage probability.