
1 School of Mathematics, Jilin University, Changchun 130012, China.
2 Department of Statistics, Florida State University, Tallahassee, FL 32306, U.S.A.

Regression analysis of mixed sparse synchronous and asynchronous longitudinal covariates with varying-coefficient models

Congmin Liu1, Zhuowei Sun1 and Hongyuan Cao2
Abstract

We consider varying-coefficient models for mixed synchronous and asynchronous longitudinal covariates, where asynchronicity refers to the misalignment of longitudinal measurement times within an individual. We propose three different methods of parameter estimation and inference. The first method is a one-step approach that estimates the non-parametric regression functions for synchronous and asynchronous longitudinal covariates simultaneously. The second method is a two-step approach: in the first step, the longitudinal response is regressed on the centered synchronous longitudinal covariates, and in the second step, the residuals from the first step are regressed on the asynchronous longitudinal covariates. The third method is the same as the second except that, in the first step, we omit the asynchronous longitudinal covariate and include a non-parametric intercept in the regression of the longitudinal response on the synchronous longitudinal covariates. We further construct simultaneous confidence bands for the non-parametric regression functions to quantify the overall magnitude of variation. Extensive simulation studies provide numerical support for the theoretical findings. The practical utility of the methods is illustrated on a dataset from the ADNI study.

Keywords: Asynchronous longitudinal data; Convergence rate; Kernel-weighting; Simultaneous confidence band; Varying-coefficient model.

1 Introduction

Asynchronous longitudinal data arise when the longitudinal covariates and response are misaligned within individuals. One example of asynchronous longitudinal data is the analysis of electronic health records (EHRs). EHR data consist of longitudinal medical records of a large number of patients in one or more electronic health care systems, such as vital signs, laboratory tests, procedure codes, and medications, among others (Lou et al., 2021). Due to the retrospective nature of EHRs, measurements are collected at each clinical encounter, and the measurement times can be irregular, sparse, and heterogeneous across patients and asynchronous within patients. Another example comes from a dataset in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study (Li et al., 2022). In this dataset, 256 subjects are followed up for 5 years, where cognitive decline metrics, such as the Mini-Mental State Examination (MMSE) score, are mismatched within individuals with medical imaging measurements, such as fractional anisotropy (FA), which reflects fiber density, axonal diameter, and myelination in white matter. Other covariates, such as age, education level, and the number of APOE4 genes, a genetic risk factor for Alzheimer's disease, are synchronous with the longitudinal response MMSE, creating mixed synchronous and asynchronous longitudinal covariates. In Figure 1, we plot the visit times for MMSE scores and FA of all patients and of two randomly selected patients.

Figure 1: Asynchronous observation times of MMSE and FA.

For regression analysis of sparse asynchronous longitudinal data, several methods have been proposed in the literature. For example, Xiong and Dubin (2010) proposed to bin the asynchronous longitudinal covariates to align with the longitudinal response, after which classic longitudinal data analysis methods can be used. Sentürk et al. (2013) explicitly addressed the asynchronous setting for generalized varying-coefficient models with one covariate but did not provide the theoretical properties of the estimators. Cao et al. (2015) proposed a non-parametric kernel weighting approach for generalized linear models to address the asynchronous structure and rigorously established the consistency and asymptotic normality of the resulting estimators. This was extended to a more general setup in Cao et al. (2016) and a partially linear model in Chen and Cao (2017). Sun et al. (2021) studied an informative measurement process for asynchronous longitudinal data. Li et al. (2022) proposed a class of generalized functional partial-linear varying-coefficient models for temporally asynchronous functional neuroimaging data. These methods assume that all longitudinal covariates share the same measurement times, which are asynchronous with the longitudinal response. The problem of mixed synchronous and asynchronous longitudinal covariates has not been considered before.

In this paper, we propose statistical methods for estimation and inference of mixed synchronous and asynchronous longitudinal covariates in varying-coefficient models. There is a large literature on varying-coefficient models (Hastie and Tibshirani, 1993). For example, Fan et al. (2007) considered a semiparametric varying-coefficient partially linear model for synchronous longitudinal data. The main methodological and theoretical developments on varying-coefficient models are summarized in Fan and Zhang (2008). Unlike these classic results, the data structure we work with is very different. The longitudinal covariates have two parts: one part is measured synchronously with the longitudinal response, and the other part is asynchronous with the longitudinal response. We consider a dynamic model that allows the covariates' effects to change with respect to time. To evaluate the overall pattern and magnitude of the time-dependent coefficients, or to test whether certain parametric functions are adequate in describing the overall trend of the regression relationship over time, it is more informative to construct simultaneous confidence bands (SCBs) of the non-parametric regression functions. Theoretically, to construct SCBs, one has to establish the limiting distribution of the maximum deviation between the estimated and the true function. Due to the sparse, irregular, and asynchronous longitudinal data structure, one has to investigate a dependent empirical process uniformly, controlling stochastic variation over time and over both the synchronous and asynchronous longitudinal processes. Progress on SCBs for longitudinal data can be found in Ma et al. (2012), Gu et al. (2014), Zheng et al. (2014), and Cao et al. (2018), among others. For time series data, the simultaneous inference problem is studied in Zhou and Wu (2010) and Liu and Wu (2010), among others. Due to the slow rate of convergence of the extreme value distribution, a resampling method, such as the wild bootstrap, is generally adopted in SCB construction (Zhu et al., 2007, 2012). Theoretical guarantees for the wild bootstrap can be found in Wu (1986), Liu (1988), and Chapter 2.9 of van der Vaart and Wellner (1996).

To improve the efficiency of the varying-coefficient estimation for synchronous longitudinal covariates, we propose a two-step method. In the first step, we regress the response on the synchronous longitudinal covariates; in the second step, the residuals from the first step are regressed on the asynchronous longitudinal covariates. By avoiding unnecessary smoothing of the synchronous longitudinal covariates, we obtain the non-parametric rate of convergence $n^{2/5}$ for their time-varying coefficient estimation, which is faster than the $n^{1/3}$ rate of convergence of a method that estimates the coefficients of mixed synchronous and asynchronous longitudinal covariates simultaneously. The trade-off is that the two-step method requires the synchronous and asynchronous longitudinal covariates to be uncorrelated at all times, whereas the one-step method does not impose such an assumption.

The remainder of the paper is organized as follows. Section 2 establishes methods and asymptotic theory for varying-coefficient estimation of mixed synchronous and asynchronous longitudinal covariates using one-step and two-step methods, respectively. We also develop an inferential strategy with simultaneous confidence bands for uncertainty quantification of the non-parametric regression functions. Section 3 studies the finite-sample performance of the proposed methods through simulations. Section 4 analyzes a dataset from the ADNI study. Section 5 gives some concluding remarks. All technical proofs are relegated to the Appendix.

2 Estimation and inference

Suppose we have a longitudinal study with $n$ subjects. For the $i$th subject, denote $Y_i(t)$ as the response variable at time $t$, and let $X_i(t)$ be a $p\times 1$ vector of time-dependent covariates and $Z_i(t)$ be a $q\times 1$ vector of time-dependent covariates, respectively. Mixed synchronous and asynchronous longitudinal covariates for the $i$th subject refer to the observations $\{Y_i(T_{ij}), X_i(T_{ij}), Z_i(S_{ik})\}$, $j=1,\ldots,L_i$; $k=1,\ldots,M_i$, where $T_{ij}, j=1,\ldots,L_i$, are the observation times for the longitudinal response and synchronous longitudinal covariates, and $S_{ik}, k=1,\ldots,M_i$, are the observation times for the asynchronous longitudinal covariates, with the requirement that $L_i$ and $M_i$ are finite with probability $1$. We use univariate and bivariate counting processes to represent the observation times. Specifically, for subject $i=1,\ldots,n$, $N_i(t)=\sum_{j=1}^{L_i}I(T_{ij}\leq t)$ counts the number of observation times up to $t$ for the response and synchronous longitudinal covariates, and $N^*_i(t,s)=\sum_{j=1}^{L_i}\sum_{k=1}^{M_i}I(T_{ij}\leq t, S_{ik}\leq s)$ counts the number of observation-time pairs up to $t$ for the response and synchronous longitudinal covariates and up to $s$ for the asynchronous longitudinal covariates (Cao et al., 2015, 2016).

We consider the following varying-coefficient model:

$$E\{Y(t)\mid X(t),Z(t)\}=X(t)^T\beta(t)+Z(t)^T\gamma(t), \qquad (2.1)$$

where $\beta(t)$ and $\gamma(t)$ are non-parametric regression functions for the synchronous and asynchronous longitudinal covariates, respectively. We are interested in their statistical estimation and inference.

2.1 A one-step approach

To estimate $\beta(t)$ and $\gamma(t)$ simultaneously, we consider a local linear estimator (Fan and Gijbels, 1996; Cao et al., 2018):

$$\begin{split}
&\{\hat{\beta}_w(t),\hat{\dot{\beta}}_w(t),\hat{\gamma}_w(t),\hat{\dot{\gamma}}_w(t)\}\\
&=\mathop{\arg\min}_{\{\beta_0,\beta_1,\gamma_0,\gamma_1\}}\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)\big\{Y_i(t_1)-X_i(t_1)^T\beta_0-X_i(t_1)^T\beta_1(t_1-t)\\
&\qquad\qquad\qquad\qquad-Z_i(t_2)^T\gamma_0-Z_i(t_2)^T\gamma_1(t_2-t)\big\}^2\,dN^*_i(t_1,t_2).
\end{split} \qquad (2.2)$$
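The minimization in (2.2) is a kernel-weighted least squares problem: each pair of observation times $(T_{ij}, S_{ik})$ contributes one weighted row. The following is a minimal Python sketch, assuming a per-subject data layout with fields T, S, Y, X, Z; the layout, names, and `epanechnikov` helper are illustrative, not the paper's implementation:

```python
import numpy as np

def epanechnikov(u):
    """Univariate Epanechnikov kernel K(u) = 0.75 (1 - u^2)_+."""
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def one_step_estimate(t, data, h1, h2):
    """One-step local linear estimator (2.2) at a fixed time t.

    `data` is a list with one dict per subject holding
      T: times of (Y, X),  S: times of Z,
      Y: responses at T,   X: (L_i, p) covariates,  Z: (M_i, q) covariates.
    """
    rows, wts, ys = [], [], []
    for d in data:
        for j, t1 in enumerate(d["T"]):
            for k, t2 in enumerate(d["S"]):
                # product kernel K_{h1,h2}(t1 - t, t2 - t)
                w = (epanechnikov((t1 - t) / h1) / h1) * (epanechnikov((t2 - t) / h2) / h2)
                if w == 0.0:
                    continue
                x, z = d["X"][j], d["Z"][k]
                # R_i(t1, t2, t) = {X^T, X^T (t1 - t), Z^T, Z^T (t2 - t)}^T
                rows.append(np.concatenate([x, x * (t1 - t), z, z * (t2 - t)]))
                wts.append(w)
                ys.append(d["Y"][j])
    R, w, y = np.array(rows), np.array(wts), np.array(ys)
    # weighted least squares: solve (R^T W R) rho = R^T W y
    rho = np.linalg.solve(R.T @ (R * w[:, None]), R.T @ (w * y))
    p, q = data[0]["X"].shape[1], data[0]["Z"].shape[1]
    return rho[:p], rho[2 * p:2 * p + q], rho   # beta_hat, gamma_hat, full rho
```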

For any fixed time point $t$, denote $R(t_1,t_2,t)=\{X(t_1)^T, X(t_1)^T(t_1-t), Z(t_2)^T, Z(t_2)^T(t_2-t)\}^T$ and $\rho_0(t)\triangleq\{\beta_0(t)^T,\dot{\beta}_0(t)^T,\gamma_0(t)^T,\dot{\gamma}_0(t)^T\}^T$, where $\beta_0(t)$ and $\gamma_0(t)$ are the true non-parametric regression functions and $\dot{\beta}_0(t)$ and $\dot{\gamma}_0(t)$ are the corresponding derivative functions. Assume that the conditional variance function $\mathrm{var}\{Y(t)\mid X(t),Z(t)\}=\sigma\{t,X(t),Z(t)\}^2<\infty$. We need the following assumptions.

  1. (A1) $N^*_i(t,s)$ is independent of $\{X_i(t),Z_i(t),Y_i(t)\}$ and $E\{dN^*_i(t,s)\}=\eta(t,s)\,dt\,ds$, $i=1,\ldots,n$, where $\eta(t,s)$ is a twice continuously differentiable function for any $(t,s)\in[0,1]^{\otimes 2}$. For $t_1\neq s_1$, $t_2\neq s_2$, $P\{dN^*(t_1,t_2)=1\mid N^*(s_1,s_2)-N^*(s_1-,s_2-)=1\}=f(t_1,t_2,s_1,s_2)\,dt_1\,dt_2$, where $f(t_1,t_2,s_1,s_2)$ is continuous for $t_1\neq s_1$, $t_2\neq s_2$, and $f(t_1\pm,t_2\pm,s_1\pm,s_2\pm)$ exist.

  2. (A2) $\rho_0(t)$ is twice continuously differentiable for all $t\in[0,1]$. For any fixed time point $t$, $E\{R(t_1,t_2,t)R(s_1,s_2,t)^T\}\in\mathbb{R}^{2(p+q)\times 2(p+q)}$ and $E\{R(t_1,t_2,t)\}\in\mathbb{R}^{2(p+q)}$ are twice continuously differentiable for $(t_1,t_2,s_1,s_2)\in[0,1]^{\otimes 4}$. $E\{C(t,t)C(t,t)^T\}\eta(t,t)\in\mathbb{R}^{(p+q)\times(p+q)}$ is positive definite for all $t\in[0,1]$, where $C(t,t)=\{X(t)^T,Z(t)^T\}^T$. Moreover,
$$\big\|E\big[C(t,t)C(t,t)^T\sigma\{t,X(t),Z(t)\}^2\big]\big\|_\infty\,\eta(t,t)<\infty,\quad\forall t\in[0,1],$$
where for a square matrix $A$, $\|A\|_\infty=\max_{1\leq i\leq n}\sum_{j=1}^n|a_{ij}|$.

  3. (A3) $K(\cdot,\cdot)$ is a bivariate density function satisfying $\iint z_1K(z_1,z_2)\,dz_1dz_2=\iint z_2K(z_1,z_2)\,dz_1dz_2=0$, $\iint K(z_1,z_2)^2\,dz_1dz_2<\infty$, $\iint z_1^2K(z_1,z_2)\,dz_1dz_2<\infty$, $\iint z_2^2K(z_1,z_2)\,dz_1dz_2<\infty$, and $\iint z_1^2z_2^2K(z_1,z_2)\,dz_1dz_2<\infty$.

  4. (A4) $(nh_1h_2)^{1/2}(h_1^2+h_2^2)\to 0$ and $nh_1h_2\to\infty$.

Condition (A1) requires that the observation process be independent of the covariate and response processes. Analogous assumptions have been made in Lin and Ying (2001) and Cao et al. (2015). Condition (A2) ensures the identifiability of $\rho_0(t)$ at any fixed time point $t$ and imposes smoothness assumptions on the expectations of certain functionals of $R(t_1,t_2,t)$ and on $\rho_0(t)$. Conditions (A3) and (A4) specify valid kernels and bandwidths. For the bivariate kernel function, we recommend kernel functions with bounded support, such as the product of univariate Epanechnikov kernels.

The following theorem establishes the asymptotic properties of $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$. The proofs are relegated to the Appendix.

Theorem 1.

Under conditions (A1)-(A4), as $n\rightarrow\infty$, we have

$$\sqrt{nh_1h_2}\left[\{\hat{\beta}_w(t)-\beta_0(t)\}^T,\{\hat{\gamma}_w(t)-\gamma_0(t)\}^T\right]^T\stackrel{d}{\to}N\{0,A^*(t)^{-1}\Sigma^*(t)A^*(t)^{-1}\},$$

where

$$A^*(t)=E\{C(t,t)C(t,t)^T\}\eta(t,t)$$

and

$$\Sigma^*(t)=\Big\{\iint K(z_1,z_2)^2\,dz_1dz_2\Big\}E\left[C(t,t)C(t,t)^T\sigma\{t,X(t),Z(t)\}^2\right]\eta(t,t).$$

Theorem 1 derives the joint asymptotic distribution of $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$. If we let $h_\alpha=h_1=h_2$, the bias of $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$ in Theorem 1 is $O(h_\alpha^2)$, which is the same as that in Theorem 2 of Cao et al. (2015). The $O(h_\alpha)$ term vanishes because we use symmetric density functions as the kernel functions, as specified in (A3). As the sample size $n$ goes to infinity, under (A4), the bias terms vanish. In general, the off-diagonal block of the limiting variance covariance matrix is non-zero. In the special case that $E\{X(t)Z(t)^T\}=0_{p\times q}$, $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$ are asymptotically independent. Moreover, if we assume that $\sigma\{t,X(t),Z(t)\}^2=\sigma(t)^2$, then $\mathrm{Var}\{\hat{\beta}_w(t)\}=[E\{X(t)X(t)^T\}\eta(t,t)]^{-1}\big\{\iint K(z_1,z_2)^2\,dz_1dz_2\big\}\sigma(t)^2$ and $\mathrm{Var}\{\hat{\gamma}_w(t)\}=[E\{Z(t)Z(t)^T\}\eta(t,t)]^{-1}\big\{\iint K(z_1,z_2)^2\,dz_1dz_2\big\}\sigma(t)^2$, respectively.

Our method depends on a proper choice of bandwidths. If we let $h_\alpha=h_1=h_2$, then according to condition (A4), a valid bandwidth is larger than $O(n^{-1/2})$; since the bias of $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$ in Theorem 1 is of order $h_\alpha^2$, we should choose a bandwidth $h_\alpha=o(n^{-1/6})$. With this choice of $h_\alpha$, we obtain a rate of convergence $O(n^{1/3})$, which is the same as that achieved in Cao et al. (2015). This is slower than the non-parametric rate of convergence $n^{2/5}$ due to the asynchronous longitudinal data structure.

In Theorem 1, $A^*(t)$ and $\Sigma^*(t)$ are not directly computable. In practice, we use the estimating equation derived from (2.2) and estimate the variance covariance matrix of $\hat{\rho}_w(t)$ with the sandwich formula:

$$\begin{split}
&\widehat{\mathrm{var}}\{\hat{\rho}_w(t)\}\\
&=\left\{\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)R_i(t_1,t_2,t)R_i(t_1,t_2,t)^T\,dN^*_i(t_1,t_2)\right\}^{-1}\\
&\quad\times\sum_{i=1}^n\left[\iint K_{h_1,h_2}(t_1-t,t_2-t)R_i(t_1,t_2,t)\{Y_i(t_1)-R_i(t_1,t_2,t)^T\hat{\rho}_w(t)\}\,dN^*_i(t_1,t_2)\right]^{\otimes 2}\\
&\quad\times\left\{\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)R_i(t_1,t_2,t)R_i(t_1,t_2,t)^T\,dN^*_i(t_1,t_2)\right\}^{-1}.
\end{split}$$

The variance covariance matrices of $\hat{\beta}_w(t)$ and $\hat{\gamma}_w(t)$ are the upper $p\times p$ submatrix of the upper left $(2p)\times(2p)$ submatrix and the upper $q\times q$ submatrix of the lower right $(2q)\times(2q)$ submatrix of $\widehat{\mathrm{var}}\{\hat{\rho}_w(t)\}$, respectively.
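In the notation of the earlier sketch, the sandwich formula accumulates the bread term over all time pairs and the meat term over per-subject estimating functions. A minimal sketch, reusing the assumed data layout and the `epanechnikov` helper:

```python
def sandwich_var(t, data, h1, h2, rho):
    """Sandwich variance estimator for rho_hat_w(t); a sketch."""
    dim = len(rho)
    bread = np.zeros((dim, dim))   # sum_i of the kernel-weighted R R^T terms
    meat = np.zeros((dim, dim))    # sum_i of squared subject-level scores
    for d in data:
        gi = np.zeros(dim)         # subject i's estimating function
        for j, t1 in enumerate(d["T"]):
            for k, t2 in enumerate(d["S"]):
                w = (epanechnikov((t1 - t) / h1) / h1) * (epanechnikov((t2 - t) / h2) / h2)
                if w == 0.0:
                    continue
                x, z = d["X"][j], d["Z"][k]
                r = np.concatenate([x, x * (t1 - t), z, z * (t2 - t)])
                bread += w * np.outer(r, r)
                gi += w * r * (d["Y"][j] - r @ rho)
        meat += np.outer(gi, gi)
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv
```

The upper left $p\times p$ block of the result then estimates the variance of $\hat{\beta}_w(t)$, and the $q\times q$ block starting at row and column $2p$ estimates that of $\hat{\gamma}_w(t)$, matching the block positions described above.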

2.2 Simultaneous confidence bands

To quantify uncertainty and evaluate the overall magnitude of variation of the non-parametric regression functions, we construct simultaneous confidence bands (SCBs). Here, we use $\gamma(t)$ for illustration. Specifically, for a pre-specified confidence level $1-\alpha$ and a given $q$-dimensional smooth function $a(t)$, we aim to find smooth random functions $L(t)$ and $U(t)$ such that

$$\mathbb{P}\{L(t)\leq a(t)^T\gamma(t)\leq U(t),\ \forall t\in[a,b]\}\rightarrow 1-\alpha$$

as the number of subjects $n\rightarrow\infty$, for a pre-specified closed interval $[a,b]$. Theoretical results on SCB construction rely on extreme value theory, which has a slow rate of convergence (Fan and Zhang, 2000; Cao et al., 2018). We therefore adopt a wild bootstrap approach that resamples the residuals (Wu, 1986; Liu, 1988). The wild bootstrap is computationally efficient, as it multiplies the residuals by random disturbances without repeated estimation of the non-parametric regression function. The specific steps are as follows.

Step 1. For each subject $i$, $i=1,2,\ldots,n$, calculate the kernel-weighted residual

$$\hat{r}_i(t_1,t_2,t)=K_{h_1,h_2}(t_1-t,t_2-t)\{Y_i(t_1)-R_i(t_1,t_2,t)^T\hat{\rho}_w(t)\}.$$

Step 2. Obtain an approximation of $\hat{\gamma}_w(t)-\gamma_0(t)$ through

$$\hat{Q}_n(t)=\frac{1}{n}\sum_{i=1}^n\{f_n(t)\}^{-1}\iint Z_i(t_2)\hat{r}_i(t_1,t_2,t)\,dN^*_i(t_1,t_2),$$

where $f_n(t)=\frac{1}{n}\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)Z_i(t_2)Z_i(t_2)^T\,dN^*_i(t_1,t_2)$.

Step 3. Independently generate $\{u_i, i=1,\ldots,n\}$ from a distribution with mean $0$ and standard deviation $1$. For example, we can use $N(0,1)$ or the Rademacher distribution, which takes values $+1$ and $-1$ with equal probability. Construct the stochastic process

$$\hat{Q}_n^u(t)=\frac{1}{n}\sum_{i=1}^n u_i\{f_n(t)\}^{-1}\iint Z_i(t_2)\hat{r}_i(t_1,t_2,t)\,dN^*_i(t_1,t_2).$$

Step 4. Repeat Step 3 $B$ times to obtain $\{\sup_t|a(t)^T\hat{Q}_n^{u(1)}(t)|,\ldots,\sup_t|a(t)^T\hat{Q}_n^{u(B)}(t)|\}$, and denote its $1-\alpha$ empirical percentile as $c_\alpha$. The SCB for $a(t)^T\gamma(t)$ can be written as

$$\left(a(t)^T\hat{\gamma}_w(t)-c_\alpha,\ a(t)^T\hat{\gamma}_w(t)+c_\alpha\right).$$

Based on extensive simulation studies with different multiplier distributions, we recommend the Rademacher distribution as the multiplier in Step 3. The details are omitted.
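Steps 3 and 4 involve only resampling the multipliers, so the subject-level contributions to $\hat{Q}_n(t)$ can be precomputed once on a grid of time points. A minimal sketch; the array layout is an assumption for illustration:

```python
rng = np.random.default_rng(0)

def scb_critical_value(Qn_terms, a_grid, alpha=0.05, B=1000):
    """Wild-bootstrap critical value c_alpha for the SCB (Steps 3-4).

    Qn_terms: (n, n_grid, q) array; Qn_terms[i, g] is subject i's
        contribution {f_n(t_g)}^{-1} * (integral of Z_i r_hat_i dN*_i)
        at grid point t_g, so Q_hat_n(t_g) = Qn_terms[:, g].mean(axis=0).
    a_grid: (n_grid, q) array holding a(t_g) on the same grid.
    """
    n = Qn_terms.shape[0]
    sups = np.empty(B)
    for b in range(B):
        u = rng.choice([-1.0, 1.0], size=n)            # Rademacher multipliers
        Qu = np.einsum("i,igq->gq", u, Qn_terms) / n   # Q_hat_n^u on the grid
        sups[b] = np.max(np.abs(np.sum(a_grid * Qu, axis=1)))
    return np.quantile(sups, 1 - alpha)
```

The band is then $a(t)^T\hat{\gamma}_w(t)\pm c_\alpha$ evaluated over the grid.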

We remark that for a given $p$-dimensional smooth function $d(t)$, the construction of the SCB for $d(t)^T\beta(t)$ is exactly the same, except that in Step 2 we let $\hat{J}_n(t)=\frac{1}{n}\sum_{i=1}^n\{g_n(t)\}^{-1}\iint X_i(t_1)\hat{r}_i(t_1,t_2,t)\,dN^*_i(t_1,t_2)$, where $g_n(t)=\frac{1}{n}\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)X_i(t_1)X_i(t_1)^T\,dN^*_i(t_1,t_2)$. The remaining steps are based on $\hat{J}_n(t)$, with $a(t)$ replaced by $d(t)$.

2.3 A two-step approach

In (2.1), $\beta(t)$ is the regression function of the synchronous longitudinal covariates. In classic non-parametric regression analysis of longitudinal data, we get the $O(n^{2/5})$ rate of convergence. Can we achieve the same rate of convergence for the estimation of $\beta(t)$ with mixed synchronous and asynchronous longitudinal covariates? In this subsection, we provide an affirmative answer to this question.

Specifically, we propose a two-step approach to estimate $\beta(t)$ and $\gamma(t)$, respectively. In the first step, we aim to obtain unbiased estimation of $\beta(t)$ from (2.1) using only $X(t)$ and $Y(t)$. There are two strategies to handle the part that involves $\gamma(t)$: we can either remove it or absorb it into a non-parametric function. In the second step, we regress the residuals from the first step on the asynchronous longitudinal covariates $Z(t)$ to obtain an estimate of the regression function $\gamma(t)$, similar to that in Cao et al. (2015).

We need a critical assumption:

$$E\{Z(t)\mid X(t)\}=E\{Z(t)\}. \qquad (2.3)$$

Condition (2.3) states that $Z(t)$ is mean independent of $X(t)$ at any time $t$, which implies that $X(t)$ and $Z(t)$ are uncorrelated. This condition is plausible when $X(t)$ indicates a time-invariant treatment that is randomized and independent of the other covariates; it means that there is no unmeasured confounding between treatment and outcome. The assumption can be strong in observational studies, where unmeasured confounders abound. In classic linear models, when the covariates are orthogonal to each other, the estimation of the regression coefficients remains unbiased, with a larger variance, if some covariates are omitted. Unlike in linear models, for longitudinal data bias occurs when orthogonal covariates are omitted unless a non-parametric intercept is added. If $X(t)$ and $Z(t)$ are correlated, the proposed two-step method will be biased.

A centering approach to estimate $\beta(t)$.  From (2.1), taking a conditional expectation given $X(t)$ and using (2.3), we have

$$E\{Y(t)\mid X(t)\}=X(t)^T\beta(t)+\{EZ(t)\}^T\gamma(t). \qquad (2.4)$$

Taking a further expectation, we obtain the unconditional expectation of $Y(t)$:

$$E\{Y(t)\}=\{EX(t)\}^T\beta(t)+\{EZ(t)\}^T\gamma(t). \qquad (2.5)$$

By taking the difference between (2.4) and (2.5), we obtain

$$E\{\tilde{Y}(t)\mid\tilde{X}(t)\}=\tilde{X}(t)^T\beta(t), \qquad (2.6)$$

where $\tilde{Y}(t)=Y(t)-E\{Y(t)\}$ and $\tilde{X}(t)=X(t)-E\{X(t)\}$. Unbiased estimation of $\beta(t)$ can be obtained through (2.6). In (2.6), $m_Y(t)=E\{Y(t)\}$ and $m_X(t)=E\{X(t)\}$ are not directly observed. We propose Nadaraya-Watson type estimators as follows (Nadaraya, 1964; Watson, 1964). Let $\hat{m}_Y(t_0)=\{\sum_{i=1}^n\int K_h(t-t_0)Y_i(t)\,dN_i(t)\}/\{\sum_{i=1}^n\int K_h(t-t_0)\,dN_i(t)\}$ and $\hat{m}_X(t_0)=\{\sum_{i=1}^n\int K_h(t-t_0)X_i(t)\,dN_i(t)\}/\{\sum_{i=1}^n\int K_h(t-t_0)\,dN_i(t)\}$, where $K(\cdot)$ is a kernel function with bounded support satisfying $K(\cdot)\geq 0$ and $\int_{-\infty}^{\infty}K(t)\,dt=1$, and $K_h(t)=h^{-1}K(t/h)$. The bandwidth satisfies $h\rightarrow 0$ and $nh\rightarrow\infty$. Denote $\hat{Y}_i(t)=Y_i(t)-\hat{m}_Y(t)$ and $\hat{X}_i(t)=X_i(t)-\hat{m}_X(t)$. We aim to find

$$\begin{split}
&\{\hat{\beta}_c(t),\hat{\dot{\beta}}_c(t)\}\\
&=\mathop{\arg\min}_{\beta_0,\beta_1\in R^p}\Big[\frac{1}{n}\sum_{i=1}^n\int K_h(t_1-t)\{\hat{Y}_i(t_1)-\hat{X}_i(t_1)^T\beta_0-\hat{X}_i(t_1)^T\beta_1(t_1-t)\}^2\,dN_i(t_1)\Big].
\end{split} \qquad (2.7)$$

Define

$$\begin{split}
S_{n,l}(t)&=\frac{1}{nh}\sum_{i=1}^n\int K_h(t_1-t)\hat{X}_i(t_1)\hat{X}_i(t_1)^T\{(t_1-t)/h\}^l\,dN_i(t_1),\quad l=0,1,2,\\
\mbox{and}\quad q_{n,l}(t)&=\frac{1}{nh}\sum_{i=1}^n\int K_h(t_1-t)\hat{X}_i(t_1)\hat{Y}_i(t_1)\{(t_1-t)/h\}^l\,dN_i(t_1),\quad l=0,1.
\end{split}$$

Then

$$\begin{pmatrix}\hat{\beta}_c(t)\\ h\hat{\dot{\beta}}_c(t)\end{pmatrix}=\begin{pmatrix}S_{n,0}&S_{n,1}\\ S_{n,1}&S_{n,2}\end{pmatrix}^{-1}\begin{pmatrix}q_{n,0}\\ q_{n,1}\end{pmatrix}.$$
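A minimal sketch of this first-step centering estimator, pooling observations across subjects; the function and argument names are illustrative, and `epanechnikov` is as in the earlier sketch:

```python
def nw_mean(t0, times, values, h):
    """Nadaraya-Watson estimate of a marginal mean at t0 from pooled data."""
    w = epanechnikov((times - t0) / h) / h
    return (w @ values) / w.sum()

def centered_beta(t, T_all, Y_all, X_all, h_nw, h):
    """Two-step centering estimator of beta(t) via (2.7); a sketch.

    T_all: pooled observation times; Y_all: pooled responses;
    X_all: (N, p) pooled synchronous covariates.
    """
    # center Y and X by their estimated marginal means
    Yc = Y_all - np.array([nw_mean(s, T_all, Y_all, h_nw) for s in T_all])
    Xc = X_all - np.vstack([nw_mean(s, T_all, X_all, h_nw) for s in T_all])
    # local linear fit of centered Y on centered X at time t
    w = epanechnikov((T_all - t) / h) / h
    W = np.hstack([Xc, Xc * (T_all - t)[:, None]])   # {X_hat^T, X_hat^T (t1 - t)}
    theta = np.linalg.solve(W.T @ (W * w[:, None]), W.T @ (w * Yc))
    p = X_all.shape[1]
    return theta[:p]   # beta_c_hat(t); theta[p:] estimates the derivative part
```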

Next, we present the asymptotic properties of $\hat{\beta}_c(t)$. Denote $\tilde{W}_i(t_1,t)=\{\tilde{X}_i(t_1)^T,\tilde{X}_i(t_1)^T(t_1-t)\}^T$, $\theta_0(t)=\{\beta_0(t)^T,\dot{\beta}_0(t)^T\}^T$, $\hat{\theta}_c(t)=\{\hat{\beta}_c(t)^T,\hat{\dot{\beta}}_c(t)^T\}^T$, and $\mathrm{var}\{\tilde{Y}(t)\mid\tilde{X}(t)\}=\sigma\{t,\tilde{X}(t)\}^2$. We need the following assumptions.

  1. (A5) $N_i(t)$ is independent of $\{X_i(t),Y_i(t)\}$, and $E\{dN_i(t)\}=\lambda(t)\,dt$, $i=1,\ldots,n$, where $\lambda(t)$ is twice continuously differentiable for all $t\in[0,1]$.

  2. (A6) $\theta_0(t)$ is twice continuously differentiable for all $t\in[0,1]$. For any fixed time point $t$, $E\{\tilde{W}(t_1,t)\}$ and $E\{\tilde{W}(t_1,t)\tilde{W}(s_1,t)^T\}$ are twice continuously differentiable for $(t_1,s_1)\in[0,1]^{\otimes 2}$. Furthermore, $E\{\tilde{X}(t)\tilde{X}(t)^T\}\lambda(t)$ is positive definite for all $t\in[0,1]$, and
$$\big\|E\big[\tilde{X}(t)\tilde{X}(t)^T\sigma\{t,\tilde{X}(t)\}^2\big]\big\|_\infty\,\lambda(t)<\infty,$$
where for a square matrix $A$, $\|A\|_\infty=\max_{1\leq i\leq n}\sum_{j=1}^n|a_{ij}|$.

  3. (A7) $K(\cdot)$ is a symmetric density function satisfying $\int zK(z)\,dz=0$, $\int z^2K(z)\,dz<\infty$, and $\int K(z)^2\,dz<\infty$.

  4. (A8) $nh\to\infty$ and $nh^5\to 0$.

Conditions (A5)-(A8) are similar in spirit to conditions (A1)-(A4) in Section 2.1. (A5) is the univariate version of (A1). (A6) is a modified identifiability condition for $\theta_0(t)$. (A7) and (A8) are requirements on the kernel function and the bandwidth.

We state the asymptotic distribution of $\hat{\beta}_c(t)$ in the following theorem and relegate the proofs to the Appendix.

Theorem 2.

Under conditions (2.3) and (A5)-(A8), as $n\rightarrow\infty$, we have

$$\sqrt{nh}\{\hat{\beta}_c(t)-\beta_0(t)\}\stackrel{d}{\to}N\{0,A^{-1}(t)\Sigma(t)A^{-1}(t)\},$$

where

$$A(t)=E\{\tilde{X}(t)\tilde{X}(t)^T\}\lambda(t)\quad\mbox{and}\quad\Sigma(t)=\Big\{\int K(z)^2\,dz\Big\}E\left[\tilde{X}(t)\tilde{X}(t)^T\sigma\{t,\tilde{X}(t)\}^2\right]\lambda(t).$$

We need one bandwidth $h$, and the bias of $\hat{\beta}_c(t)$ in Theorem 2 is $O(h^2)$, the same as in classic non-parametric regression, such as Theorem 1 of Fan and Zhang (2008). Our method depends on the selection of the bandwidth, which strikes a balance between bias and variability. From (A8), we shall choose the bandwidth $h=o(n^{-1/5})$. With this choice of bandwidth, we achieve a convergence rate $O(n^{2/5})$, the same as non-parametric regression for classic longitudinal data (Fan and Zhang, 2008) and faster than that obtained in Theorem 1. It may seem counterintuitive that we get an improved rate of convergence with less information: we do not use any information contained in $Z_i(S_{ik})$, $i=1,\ldots,n$; $k=1,\ldots,M_i$. The key is the orthogonality assumption imposed in (2.3), which allows us to obtain unbiased estimation of the non-parametric regression function $\beta_0(t)$. When (2.3) is violated, we can regress the residuals from regressing $Y(t)$ on $Z(t)$ on the residuals from regressing $X(t)$ on $Z(t)$, similar to the FWL theorem in linear models (Frisch and Waugh, 1933; Lovell, 1963; Ding, 2021). We conjecture that the results would be similar to the simultaneous estimation of $\beta(t)$ and $\gamma(t)$, with a slower convergence rate for the estimation of $\beta(t)$.

For statistical inference, we need to estimate the variance of $\hat{\beta}_c(t)$. We use the sandwich formula to estimate the variance covariance matrix of $\{\hat{\beta}_c(t),\hat{\dot{\beta}}_c(t)\}$ as follows:

$$\begin{split}
\widehat{\mathrm{var}}\{\hat{\theta}_c(t)\}=&\left\{\sum_{i=1}^n\int K_h(t_1-t)\hat{W}_i(t_1,t)\hat{W}_i(t_1,t)^T\,dN_i(t_1)\right\}^{-1}\\
&\times\sum_{i=1}^n\left[\int K_h(t_1-t)\hat{W}_i(t_1,t)\{\hat{Y}_i(t_1)-\hat{W}_i(t_1,t)^T\hat{\theta}_c(t)\}\,dN_i(t_1)\right]^{\otimes 2}\\
&\times\left\{\sum_{i=1}^n\int K_h(t_1-t)\hat{W}_i(t_1,t)\hat{W}_i(t_1,t)^T\,dN_i(t_1)\right\}^{-1},
\end{split}$$

where $\hat{W}_i(t_1,t)=\{\hat{X}_i(t_1)^T,\hat{X}_i(t_1)^T(t_1-t)\}^T$, $\hat{X}_i(t)=X_i(t)-\hat{m}_X(t)$, and $\hat{Y}_i(t)=Y_i(t)-\hat{m}_Y(t)$. The variance covariance matrix of $\hat{\beta}_c(t)$ is the upper left $p\times p$ submatrix of $\widehat{\mathrm{var}}\{\hat{\theta}_c(t)\}$, while we treat $\dot{\beta}_c(t)$ as a nuisance function. In the literature on density estimation, $\hat{\dot{\beta}}_c(t)$ estimated from (2.7) has more desirable properties than conventional kernel density estimation. More details can be found in Cattaneo et al. (2020).

A varying-coefficient model approach to estimate $\beta(t)$.  Another strategy to obtain unbiased estimation of $\beta(t)$ is to absorb the part involving $\gamma(t)$ into a non-parametric function. We use (2.5) and treat $E\{Z(t)\}^T\gamma(t)$ as a non-parametric intercept. Specifically, consider

$$E\{Y(t)\mid X(t)\}=X^*(t)^T\beta^*(t), \qquad (2.8)$$

where $X^*(t)=\{1,X(t)^T\}^T$ and $\beta^*(t)=\{\alpha(t),\beta(t)^T\}^T$, with $\alpha(t)$ and $\beta(t)$ non-parametric regression functions. Our aim is to find

$$\begin{split}
&\{\hat{\beta}^*_v(t),\hat{\dot{\beta}}^*_v(t)\}\\
&=\mathop{\arg\min}_{\beta_0^*,\beta_1^*\in R^{p+1}}\sum_{i=1}^n\int K_h(t_1-t)\{Y_i(t_1)-X_i^*(t_1)^T\beta_0^*-X_i^*(t_1)^T\beta_1^*(t_1-t)\}^2\,dN_i(t_1),
\end{split} \qquad (2.9)$$

where $K(\cdot)$ is a kernel function, $K_h(t)=K(t/h)/h$, and $h$ is a bandwidth.

Next, we present the asymptotic distribution of $\hat{\beta}_v(t)$ and relegate the proofs to the Appendix.

Theorem 3.

Under conditions (2.3) and (A5)-(A8), as $n\rightarrow\infty$, we have

$$\sqrt{nh}\{\hat{\beta}_v(t)-\beta_0(t)\}\stackrel{d}{\to}N\{0,A^{-1}(t)\Sigma(t)A^{-1}(t)\},$$

where $A(t)$ and $\Sigma(t)$ are specified in Theorem 2.

The asymptotic distribution and the bias of $\hat{\beta}_v(t)$ are the same as those of $\hat{\beta}_c(t)$; simulation studies provide numerical support for this. For variance estimation, let $Q_i(t_1,t)=\{1,X_i(t_1)^T,(t_1-t),X_i(t_1)^T(t_1-t)\}^T$ and $\hat{\iota}_v(t)\triangleq\{\hat{\alpha}_v(t),\hat{\beta}_v(t)^T,\hat{\dot{\alpha}}_v(t),\hat{\dot{\beta}}_v(t)^T\}^T$. We calculate the variance of $\hat{\iota}_v(t)$ by the following formula:

$$\begin{split}
\widehat{\mathrm{var}}\{\hat{\iota}_v(t)\}=&\left\{\sum_{i=1}^n\int K_h(t_1-t)Q_i(t_1,t)Q_i(t_1,t)^T\,dN_i(t_1)\right\}^{-1}\\
&\times\sum_{i=1}^n\left[\int K_h(t_1-t)Q_i(t_1,t)\{Y_i(t_1)-Q_i(t_1,t)^T\hat{\iota}_v(t)\}\,dN_i(t_1)\right]^{\otimes 2}\\
&\times\left\{\sum_{i=1}^n\int K_h(t_1-t)Q_i(t_1,t)Q_i(t_1,t)^T\,dN_i(t_1)\right\}^{-1}.
\end{split}$$

The variance covariance matrix of $\{\hat{\alpha}_v(t),\hat{\beta}_v(t)^T\}^T$ is the upper left $(p+1)\times(p+1)$ submatrix of $\widehat{\mathrm{var}}\{\hat{\iota}_v(t)\}$. There is a trade-off between the centering approach and the non-parametric intercept approach: in terms of computation, the centering approach is faster; if we are interested in the non-parametric intercept term, the varying-coefficient model approach is more informative.

A kernel weighting approach to estimate $\gamma(t)$.  In the first stage, $\beta(t)$ can be estimated using either the centering approach or the varying-coefficient model approach, and the two estimators have the same asymptotic distribution. Once $\hat{\beta}(t)$ is obtained, we can estimate $\gamma(t)$ by regressing the residuals from the first step on the asynchronous longitudinal covariates $Z(t)$. Due to the asynchronicity of $Z(t)$, we need two bandwidths in the calculation of $\hat{\gamma}(t)$. Specifically,

$$\begin{split}
\{\hat{\gamma}(t),\hat{\dot{\gamma}}(t)\}=\mathop{\arg\min}_{\gamma_0,\gamma_1\in R^q}\sum_{i=1}^n\iint&K_{h_1,h_2}(t_1-t,t_2-t)\{Y_i(t_1)-X_i(t_1)^T\hat{\beta}(t_1)\\
&-Z_i(t_2)^T\gamma_0-Z_i(t_2)^T\gamma_1(t_2-t)\}^2\,dN^*_i(t_1,t_2),
\end{split} \qquad (2.10)$$

where $K_{h_1,h_2}(t,s)=K(t/h_1,s/h_2)/(h_1h_2)$, $K(t,s)$ is a bivariate kernel function, usually taken to be the product of univariate kernel functions, and $h_1$ and $h_2$ are bandwidths. As $\hat{\beta}(t)$ has a faster convergence rate, whether we use the true $\beta(t)$ or $\hat{\beta}(t)$ does not affect the inference on $\hat{\gamma}(t)$.
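In code, the second step reuses the kernel-weighted least squares machinery of the one-step sketch, with the first-step fit entering as an offset. A minimal sketch, where `beta_hat` is assumed to be a callable returning the first-step estimate at a time point:

```python
def two_step_gamma(t, data, beta_hat, h1, h2):
    """Second-step kernel weighting estimator (2.10) of gamma(t); a sketch."""
    rows, wts, resid = [], [], []
    for d in data:
        for j, t1 in enumerate(d["T"]):
            r1 = d["Y"][j] - d["X"][j] @ beta_hat(t1)   # first-step residual
            for k, t2 in enumerate(d["S"]):
                w = (epanechnikov((t1 - t) / h1) / h1) * (epanechnikov((t2 - t) / h2) / h2)
                if w == 0.0:
                    continue
                z = d["Z"][k]
                rows.append(np.concatenate([z, z * (t2 - t)]))  # V_i(t2, t)
                wts.append(w)
                resid.append(r1)
    V, w, r = np.array(rows), np.array(wts), np.array(resid)
    phi = np.linalg.solve(V.T @ (V * w[:, None]), V.T @ (w * r))
    q = data[0]["Z"].shape[1]
    return phi[:q]   # gamma_hat(t); phi[q:] estimates the derivative part
```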

We establish the asymptotic distribution of $\hat{\gamma}(t)$ in the following theorem and relegate the proofs to the Appendix.

Theorem 4.

Under conditions (2.3) and (A1)-(A4), as $n\rightarrow\infty$, we have

$$\sqrt{nh_1h_2}\{\hat{\gamma}(t)-\gamma_0(t)\}\stackrel{d}{\to}N\{0,A^+(t)^{-1}\Sigma^+(t)A^+(t)^{-1}\},$$

where

$$A^+(t)=E\{Z(t)Z(t)^T\}\eta(t,t)\quad\mbox{and}\quad\Sigma^+(t)=\Big\{\iint K(z_1,z_2)^2\,dz_1dz_2\Big\}E\left[Z(t)Z(t)^T\sigma\{t,X(t),Z(t)\}^2\right]\eta(t,t).$$

In the second step, we need two bandwidths. If we set $h_1=h_2=h_\beta$, the bias of $\hat{\gamma}(t)$ in Theorem 4 is $O(h_\beta^2)$. We achieve the same rate of convergence $O(n^{1/3})$ for the estimation of $\gamma(t)$ as in the one-step method. This is the price we pay for the asynchronous longitudinal data structure. Whether this rate of convergence can be improved remains an open question.

Let $\hat{\phi}(t)=\{\hat{\gamma}(t)^T,\hat{\dot{\gamma}}(t)^T\}^T$ and $V_i(t_2,t)=\{Z_i(t_2)^T,Z_i(t_2)^T(t_2-t)\}^T$. We can calculate

$$\begin{split}
\widehat{\mathrm{var}}\{\hat{\phi}(t)\}=&\left\{\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)V_i(t_2,t)V_i(t_2,t)^T\,dN^*_i(t_1,t_2)\right\}^{-1}\\
&\times\sum_{i=1}^n\left[\iint K_{h_1,h_2}(t_1-t,t_2-t)V_i(t_2,t)\{Y_i(t_1)-X_i(t_1)^T\hat{\beta}(t_1)-V_i(t_2,t)^T\hat{\phi}(t)\}\,dN^*_i(t_1,t_2)\right]^{\otimes 2}\\
&\times\left\{\sum_{i=1}^n\iint K_{h_1,h_2}(t_1-t,t_2-t)V_i(t_2,t)V_i(t_2,t)^T\,dN^*_i(t_1,t_2)\right\}^{-1}.
\end{split}$$

The variance covariance matrix of $\hat{\gamma}(t)$ is the upper left $q\times q$ submatrix of $\widehat{\mathrm{var}}\{\hat{\phi}(t)\}$. Statistical inference can be carried out afterwards.

2.4 Bandwidth selection

Our methods depend critically on the choice of the bandwidth. The asymptotic results provide a theoretical range of valid bandwidths. In practice, it is desirable to perform data-driven bandwidth selection. Cao et al. (2015) and Chen and Cao (2017) used a data-splitting strategy to separately estimate the bias and the variability and chose the bandwidth that minimizes the mean squared error.

In this paper, we use a kernel-smoothed $D$-fold cross-validation procedure to select the optimal bandwidth. We use the two-step approach, with centering in the first step and kernel weighting in the second step, to illustrate. Specifically, we split the dataset into $D$ folds and estimate the non-parametric regression functions withholding one fold at a time: the prediction error is calculated on the withheld fold, with the regression functions estimated from the other $D-1$ folds. We repeat this $D$ times, and the optimal bandwidth is the one that minimizes the average squared prediction error at any fixed time point $t$.

Specifically, for $\hat{\beta}(t)$ at the time point $t$, the average squared prediction error is

$$\mathrm{ASPE}_{\hat{\beta}(t)}(h;t)=\frac{1}{D}\sum_{k=1}^D A_{n^{(k)}}(h;t)/B_{n^{(k)}}(h;t), \qquad (2.11)$$

where

$$\begin{split}
A_{n^{(k)}}(h;t)&=\sum_{i=1}^{n^{(k)}}\int K_h(t_1-t)\{\hat{Y}_i(t_1)-\hat{X}_i(t_1)^T\hat{\beta}^{(-k)}(t_1)\}^2\,dN_i(t_1)\\
\mbox{and}\quad B_{n^{(k)}}(h;t)&=\sum_{i=1}^{n^{(k)}}\int K_h(t_1-t)\,dN_i(t_1),\quad k=1,\ldots,D,
\end{split}$$

where $n^{(k)}$ is the number of subjects in the $k$th fold, $\hat{\beta}^{(-k)}(t)$ is estimated without the subjects in the $k$th fold, $\hat{Y}_i(t)$ estimates $\tilde{Y}_i(t)$, $\hat{X}_i(t)$ estimates $\tilde{X}_i(t)$, $K_h(t)=K(t/h)/h$, $K(\cdot)$ is a kernel density function, and $h$ is the bandwidth. Similarly, for $\hat{\gamma}(t)$ at the time point $t$, the average squared prediction error is

$$\mathrm{ASPE}_{\hat{\gamma}(t)}(h_1,h_2;t)=\frac{1}{D}\sum_{k=1}^D C_{n^{(k)}}(h_1,h_2;t)/D_{n^{(k)}}(h_1,h_2;t), \qquad (2.12)$$

where

$$\begin{split}
C_{n^{(k)}}(h_1,h_2;t)=&\sum_{i=1}^{n^{(k)}}\iint K_{h_1,h_2}(t_1-t,t_2-t)\\
&\times\{Y_i(t_1)-X_i(t_1)^T\hat{\beta}(t_1)-Z_i(t_2)^T\hat{\gamma}^{(-k)}(t_2)\}^2\,dN^*_i(t_1,t_2)\\
\mbox{and}\quad D_{n^{(k)}}(h_1,h_2;t)=&\sum_{i=1}^{n^{(k)}}\iint K_{h_1,h_2}(t_1-t,t_2-t)\,dN^*_i(t_1,t_2),
\end{split}$$

where $n^{(k)}$ is the number of subjects in the $k$th fold, $\hat{\gamma}^{(-k)}(t)$ is estimated without the subjects in the $k$th fold, $\hat{\beta}(t)$ is estimated from Step 1 with the optimal bandwidth plugged in, $K_{h_1,h_2}(\cdot,\cdot)$ is a bivariate kernel function, and $h_1$ and $h_2$ are bandwidths. It is worth noting that, when using the two-step approach, we first select the bandwidth $h_{\rm opt}$ for $\hat{\beta}(t)$ by minimizing (2.11), and then select the optimal bandwidth for $\hat{\gamma}(t)$ by minimizing (2.12). If we want an optimal bandwidth for the regression function over a range of $t$, we can minimize the integrated ASPE. A sketch of the first-step selection is given below.
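The following minimal sketch implements the fold loop behind (2.11) for a single time point $t$, reusing `centered_beta`, `nw_mean`, and `epanechnikov` from the earlier sketches; the fold construction and names are illustrative:

```python
def cv_bandwidth_beta(t, folds, h_grid, h_nw):
    """Select h for beta_hat(t) by minimizing ASPE (2.11); a sketch."""
    best_h, best_aspe = None, np.inf
    for h in h_grid:
        aspe = 0.0
        for k, held_out in enumerate(folds):
            train = [s for j, f in enumerate(folds) if j != k for s in f]
            T_tr = np.concatenate([d["T"] for d in train])
            Y_tr = np.concatenate([d["Y"] for d in train])
            X_tr = np.vstack([d["X"] for d in train])
            num = den = 0.0
            for d in held_out:
                for j, t1 in enumerate(d["T"]):
                    w = epanechnikov((t1 - t) / h) / h
                    if w == 0.0:
                        continue
                    b = centered_beta(t1, T_tr, Y_tr, X_tr, h_nw, h)  # beta^{(-k)}(t1)
                    yc = d["Y"][j] - nw_mean(t1, T_tr, Y_tr, h_nw)    # Y_hat_i(t1)
                    xc = d["X"][j] - nw_mean(t1, T_tr, X_tr, h_nw)    # X_hat_i(t1)
                    num += w * (yc - xc @ b) ** 2
                    den += w
            if den > 0:
                aspe += num / den                                     # A / B for fold k
        aspe /= len(folds)
        if aspe < best_aspe:
            best_h, best_aspe = h, aspe
    return best_h
```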

3 Numerical studies

In this section, we evaluate the finite-sample performance of the proposed methods through simulations.

3.1 Data generating process and model specification

We generate 1,000 datasets, each consisting of 400 or 900 subjects. For each subject, the numbers of observations of the synchronous longitudinal data $\{X(t),Y(t)\}$ and of the asynchronous longitudinal covariate $Z(t)$ follow $\mbox{Poisson}(5)+1$. The observation times are independently generated from the standard uniform distribution $\mathcal{U}(0,1)$ for $\{X(t),Y(t)\}$ and $Z(t)$, respectively. The covariate processes $X(t)$ and $Z(t)$ are both Gaussian, with $E\{X(t)\}=0$, $E\{Z(t)\}=2(t-0.5)^2$, and $\mbox{Cov}\{X(t),X(s)\}=\mbox{Cov}\{Z(t),Z(s)\}=e^{-|t-s|}$. The values of the stochastic process $Z(t)$ are recorded at the observation times of $\{X(t),Y(t)\}$ only in the data generation stage, to generate $Y(t)$ in model (2.1).

The longitudinal response is generated from

$$Y(t)=X(t)^T\beta(t)+Z(t)^T\gamma(t)+\epsilon(t),$$

where $\beta(t)$ and $\gamma(t)$ are time-dependent coefficients, and $\epsilon(t)$ is a Gaussian process, independent of $X(t)$ and $Z(t)$, with $E\{\epsilon(t)\}=0$ and $\mbox{Cov}\{\epsilon(t),\epsilon(s)\}=2^{-|t-s|}$. For the time-dependent coefficients, we consider two different settings: (i) $\beta(t)=3(t-0.4)^2$, $\gamma(t)=\sin(2\pi t)$; and (ii) $\beta(t)=0.4t+0.5$, $\gamma(t)=\sqrt{t}$.
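For concreteness, the data generating process under setting (i) can be sketched as follows, producing subjects in the layout assumed by the earlier estimation sketches; the function names and random seed are illustrative:

```python
rng = np.random.default_rng(1)

def gp_path(times, mean_fn, cov_fn):
    """Draw one Gaussian-process path at the given times."""
    D = np.abs(times[:, None] - times[None, :])
    L = np.linalg.cholesky(cov_fn(D) + 1e-10 * np.eye(len(times)))
    return mean_fn(times) + L @ rng.standard_normal(len(times))

def simulate_subject(beta, gamma):
    """One subject under setting (i) of Section 3.1; a sketch."""
    L_i = rng.poisson(5) + 1                     # observations of (X, Y)
    M_i = rng.poisson(5) + 1                     # observations of Z
    T = np.sort(rng.uniform(0, 1, L_i))
    S = np.sort(rng.uniform(0, 1, M_i))
    X = gp_path(T, lambda t: np.zeros_like(t), lambda d: np.exp(-d))
    # draw Z jointly at T (used only to generate Y) and at S (observed)
    Z_all = gp_path(np.concatenate([T, S]),
                    lambda t: 2 * (t - 0.5) ** 2, lambda d: np.exp(-d))
    Z_at_T, Z_obs = Z_all[:L_i], Z_all[L_i:]
    eps = gp_path(T, lambda t: np.zeros_like(t), lambda d: 2.0 ** (-d))
    Y = X * beta(T) + Z_at_T * gamma(T) + eps
    return {"T": T, "S": S, "Y": Y, "X": X[:, None], "Z": Z_obs[:, None]}

data = [simulate_subject(lambda t: 3 * (t - 0.4) ** 2,
                         lambda t: np.sin(2 * np.pi * t)) for _ in range(400)]
```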

3.2 Comparison of one-step and two-step methods

We compare two methods: the one-step kernel weighting approach and the two-step approach with centering in the first step and kernel weighting in the second step. Summary tables for the two-step approach that uses a non-parametric intercept, omitting the asynchronous longitudinal covariate, in the first step and kernel weighting in the second step can be found in the Appendix. That method produces results similar to the two-step approach with centering in the first step and kernel weighting in the second step.

As suggested in Fan and Gijbels (1996), we use the Epanechnikov kernel $K(t)=0.75(1-t^2)_+$, where $x_+=\max\{x,0\}$, due to its excellent empirical performance. For the one-step kernel weighting method, we need a two-dimensional kernel function, for which we use the product of univariate Epanechnikov kernels, $K(t,s)=0.5625(1-t^2)_+(1-s^2)_+$. Other kernel functions produce similar results, and the details are omitted.

We present the results of setting (i). The results of setting (ii) can be found in the Appendix. The simulation results for $n=400$ and $n=900$ are summarized in Table 1. We assess the estimation and inference of $\gamma(t)$ and $\beta(t)$ at $t=0.3, 0.6$ and $0.9$. The results at other time points are similar and thus omitted.

Table 1: 1000 simulation results for $\beta(t)=3(t-0.4)^2$ and $\gamma(t)=\sin(2\pi t)$. The three blocks of columns (Bias, SD, SE, CP) correspond to $t=0.3$, $t=0.6$, and $t=0.9$, respectively.

| BD | NP-F | Bias | SD | SE | CP | Bias | SD | SE | CP | Bias | SD | SE | CP |
|----|------|------|----|----|----|------|----|----|----|------|----|----|----|
| $n=400$, Two-step (Centering+KW) | | | | | | | | | | | | | |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.007 | 0.154 | 0.144 | 92.4 | 0.003 | 0.130 | 0.123 | 92.5 | 0.005 | 0.131 | 0.122 | 91.8 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | -0.008 | 0.195 | 0.186 | 92.0 | -0.008 | 0.166 | 0.154 | 91.2 | 0.004 | 0.168 | 0.155 | 91.4 |
| auto | $\beta(t)$ | -0.004 | 0.166 | 0.154 | 91.7 | -0.005 | 0.130 | 0.121 | 92.2 | 0.006 | 0.137 | 0.132 | 92.6 |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.047 | 0.146 | 0.139 | 90.3 | 0.035 | 0.150 | 0.141 | 91.0 | 0.030 | 0.148 | 0.136 | 90.2 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.046 | 0.157 | 0.157 | 91.4 | 0.024 | 0.156 | 0.146 | 90.3 | 0.025 | 0.149 | 0.141 | 90.8 |
| auto | $\gamma(t)$ | -0.047 | 0.154 | 0.142 | 90.2 | 0.023 | 0.156 | 0.140 | 89.4 | 0.026 | 0.144 | 0.136 | 90.3 |
| $n=400$, One-step (KW) | | | | | | | | | | | | | |
| $h_1=h_2=n^{-0.45}$ | $\beta(t)$ | 0.003 | 0.120 | 0.114 | 94.2 | 0.002 | 0.125 | 0.116 | 91.1 | 0.002 | 0.125 | 0.117 | 90.9 |
| $h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.002 | 0.148 | 0.140 | 92.0 | 0.004 | 0.150 | 0.138 | 90.7 | -0.007 | 0.154 | 0.137 | 90.0 |
| auto | $\beta(t)$ | 0.001 | 0.111 | 0.106 | 92.6 | 0.003 | 0.118 | 0.110 | 92.0 | 0.007 | 0.114 | 0.107 | 92.4 |
| $h_1=h_2=n^{-0.45}$ | $\gamma(t)$ | -0.050 | 0.128 | 0.117 | 87.9 | 0.033 | 0.121 | 0.116 | 91.1 | 0.031 | 0.113 | 0.110 | 92.3 |
| $h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.042 | 0.156 | 0.142 | 88.2 | 0.019 | 0.152 | 0.138 | 89.8 | 0.019 | 0.149 | 0.130 | 87.9 |
| auto | $\gamma(t)$ | -0.056 | 0.119 | 0.107 | 87.4 | 0.034 | 0.115 | 0.111 | 90.8 | 0.036 | 0.108 | 0.102 | 91.5 |
| $n=900$, Two-step (Centering+KW) | | | | | | | | | | | | | |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.004 | 0.122 | 0.119 | 92.8 | 0.001 | 0.105 | 0.100 | 92.2 | 0.004 | 0.103 | 0.100 | 95.0 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | -0.008 | 0.176 | 0.163 | 92.7 | 0.004 | 0.140 | 0.135 | 92.8 | -0.007 | 0.143 | 0.136 | 92.0 |
| auto | $\beta(t)$ | -0.002 | 0.124 | 0.119 | 93.3 | 0.001 | 0.105 | 0.102 | 93.0 | 0.002 | 0.102 | 0.100 | 93.6 |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.036 | 0.131 | 0.132 | 92.2 | 0.028 | 0.134 | 0.126 | 91.8 | 0.009 | 0.128 | 0.124 | 92.8 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.039 | 0.151 | 0.150 | 91.0 | 0.016 | 0.141 | 0.136 | 93.0 | 0.017 | 0.132 | 0.138 | 92.2 |
| auto | $\gamma(t)$ | -0.031 | 0.125 | 0.121 | 91.8 | 0.023 | 0.134 | 0.128 | 92.4 | 0.016 | 0.135 | 0.124 | 91.2 |
| $n=900$, One-step (KW) | | | | | | | | | | | | | |
| $h_1=h_2=n^{-0.45}$ | $\beta(t)$ | 0.000 | 0.105 | 0.102 | 92.9 | 0.001 | 0.102 | 0.100 | 92.4 | -0.002 | 0.107 | 0.100 | 92.8 |
| $h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.002 | 0.141 | 0.126 | 91.3 | 0.000 | 0.127 | 0.125 | 92.9 | -0.001 | 0.132 | 0.124 | 92.1 |
| auto | $\beta(t)$ | 0.005 | 0.092 | 0.090 | 92.7 | 0.003 | 0.091 | 0.085 | 92.3 | 0.001 | 0.086 | 0.083 | 93.2 |
| $h_1=h_2=n^{-0.45}$ | $\gamma(t)$ | -0.032 | 0.105 | 0.101 | 92.0 | 0.014 | 0.107 | 0.100 | 92.8 | 0.023 | 0.097 | 0.096 | 92.8 |
| $h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | -0.022 | 0.137 | 0.125 | 92.1 | 0.016 | 0.137 | 0.126 | 92.0 | 0.016 | 0.128 | 0.118 | 92.5 |
| auto | $\gamma(t)$ | -0.036 | 0.092 | 0.090 | 91.4 | 0.026 | 0.091 | 0.086 | 92.2 | 0.025 | 0.080 | 0.080 | 91.6 |

  • Note: "BD" represents the bandwidths, where $h$ is the bandwidth in the centering approach, $h_1$ and $h_2$ are the bandwidths in the kernel weighting (KW) approach, and "auto" represents automatic bandwidth selection. "NP-F" represents the non-parametric function. "Bias" is the bias, "SD" is the sample standard deviation, "SE" is the average of the standard errors, and "CP" is the pointwise 95% coverage probability.

For the estimation of $\gamma(t)$ based on the one-step kernel weighting method or the two-step method with centering in the first step and kernel weighting in the second step, we find that the bias is generally small, the estimated standard error is close to the empirical standard deviation, and the coverage probability is close to the nominal 95% for large sample sizes at the examined time points $t=0.3, 0.6$ and $0.9$. Performance improves as the sample size increases. Therefore, we can conduct valid pointwise inference on $\gamma(t)$ in practice using the proposed methods.

Both methods are valid for inference on $\beta(t)$. From Figure 2, we observe that with the same bandwidth, the standard deviation of $\hat{\beta}(t)$ is smaller using the two-step method with kernel weighting in the second step than using the one-step kernel weighting method, which is consistent with our theoretical prediction of the convergence rates.

Figure 2: Typical estimated curves, 95% point-wise confidence intervals, and 95% simultaneous confidence bands for $\beta(t)=3(t-0.4)^2$, $\gamma(t)=\sin(2\pi t)$ with the same bandwidth $h=h_1=h_2=n^{-0.5}$, $n=400$ (first row) and $n=900$ (second row).

In the one-step kernel weighting method, we let $h_1=h_2=h$. The allowable range of $h$ is $(n^{-1/2},n^{-1/6})$. In the simulation, we choose the optimal bandwidth in the range $(n^{-0.5},n^{-0.4})$. In the two-step method, the allowable bandwidth in the first step for the estimation of $\beta(t)$ is $(n^{-1},n^{-1/5})$, and we choose the optimal bandwidth in the range $(n^{-0.8},n^{-0.6})$. The allowable bandwidth in the second step for the estimation of $\gamma(t)$ is $(n^{-1/2},n^{-1/6})$ when $h_1=h_2=h$, and the optimal bandwidth is selected in the range $(n^{-0.5},n^{-0.4})$. Automatic bandwidth selection performs well and can be used to conduct inference.

3.3 Simultaneous confidence bands

In this section, we examine the performance of the simultaneous confidence bands. Instead of pointwise confidence intervals at fixed time points, it is more informative to examine the overall magnitude of variation of the non-parametric regression functions. The data generation process is the same as that in Section 3.1, and we use setting (i) to illustrate. We are interested in constructing simultaneous confidence bands for $\beta(t)$ and $\gamma(t)$ based on the one-step kernel weighting method and the two-step method with kernel weighting in the second stage.

Specifically, we use the wild bootstrap approach described in Section 2 to find the empirical 95th percentile of the supremum of the difference between the estimated and the true non-parametric regression function. We use the Rademacher distribution, which takes value $+1$ with probability 50% and value $-1$ with probability 50%, in the wild bootstrap. In the calculation of the empirical coverage probability, we use 181 equally spaced grid points in $[0.05,0.95]$. We use the Epanechnikov kernel. Results based on other commonly used kernel functions, such as the uniform kernel, are similar and thus omitted.

In addition to the simultaneous coverage probability of the pointwise confidence interval and the newly proposed simultaneous confidence band, we also present the root average squared error (RASE). Using $\gamma(t)$ as an example, the RASE for a non-parametric function is defined as ${\rm RASE}^2=n_{grid}^{-1}\sum_{k=1}^{n_{grid}}\{\hat{\gamma}(t_k)-\gamma(t_k)\}^2$, where $\{t_k, k=1,\ldots,n_{grid}\}$ are grid points at which the non-parametric function $\gamma(\cdot)$ is estimated.
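As a small illustration, the RASE over a grid can be computed as follows (names are illustrative):

```python
def rase(est_vals, true_vals):
    """Root average squared error over grid points."""
    est_vals, true_vals = np.asarray(est_vals), np.asarray(true_vals)
    return np.sqrt(np.mean((est_vals - true_vals) ** 2))
```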

The results are summarized in Table 2 for $n=400$ and $900$. We can see that, as the sample size increases, the simultaneous coverage probabilities of the confidence bands based on the proposed one-step kernel weighting method and the two-step method with kernel weighting in the second step are close to the nominal level, whereas the pointwise confidence interval is not valid for simultaneous inference. The RASE gets smaller as the sample size increases.

Table 2: Simulation results for $\beta(t)=3(t-0.4)^2$, $\gamma(t)=\sin(2\pi t)$. The first block of columns (RASE(SD), CI, SCB) corresponds to $n=400$ and the second to $n=900$.

| BD | NP-F | RASE(SD) | CI | SCB | RASE(SD) | CI | SCB |
|----|------|----------|----|-----|----------|----|-----|
| Two-step (Centering+KW) | | | | | | | |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.134 (0.022) | 4.8 | 95.1 | 0.109 (0.014) | 1.0 | 94.9 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.177 (0.022) | 0.0 | 93.6 | 0.152 (0.013) | 0.0 | 93.8 |
| $h=n^{-0.6}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | 0.150 (0.027) | 3.4 | 93.1 | 0.133 (0.020) | 1.6 | 94.1 |
| $h=n^{-0.7}, h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | 0.158 (0.027) | 1.5 | 95.4 | 0.142 (0.018) | 0.2 | 95.0 |
| One-step (KW) | | | | | | | |
| $h_1=h_2=n^{-0.45}$ | $\beta(t)$ | 0.120 (0.025) | 14.2 | 97.3 | 0.104 (0.018) | 5.8 | 95.8 |
| $h_1=h_2=n^{-0.5}$ | $\beta(t)$ | 0.148 (0.027) | 4.6 | 95.8 | 0.133 (0.019) | 1.7 | 95.5 |
| $h_1=h_2=n^{-0.45}$ | $\gamma(t)$ | 0.125 (0.027) | 9.7 | 94.8 | 0.103 (0.019) | 7.1 | 95.2 |
| $h_1=h_2=n^{-0.5}$ | $\gamma(t)$ | 0.149 (0.026) | 3.3 | 94.6 | 0.133 (0.020) | 1.2 | 94.2 |

  • Note: "BD" represents the bandwidths, where $h$ is the bandwidth in the centering approach and $h_1$ and $h_2$ are the bandwidths in the kernel weighting (KW) approach. "NP-F" represents the non-parametric function. "RASE" is the average of the RASE, and "SD" is the standard deviation of the RASE. "CI" is the simultaneous coverage probability based on the pointwise confidence intervals; "SCB" is the simultaneous coverage probability based on the SCB. The nominal level is 95%.

Figure 2 shows typical graphs of $\beta(t)$ and $\gamma(t)$ based on the one-step kernel weighting method and the two-step method with kernel weighting at the second step for $n=400$ and $900$. We can see that, in the estimation of $\beta(t)$, both the pointwise confidence interval and the simultaneous confidence band are narrower for the two-step method with kernel weighting at the second step than for the one-step kernel weighting method. This reflects that the two-step method yields a more efficient estimate of $\beta(t)$ than the one-step method. As the sample size increases, we obtain narrower confidence bands for $\beta(t)$ and $\gamma(t)$.

4 Application to the ADNI Data

In this section, we illustrate the proposed methods on a dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study (http://www.adni-info.org/). ADNI is a large-scale multicenter neuroimaging study that collects genetic, clinical, imaging, and cognitive data at multiple time points to better understand Alzheimer's disease. We used a subset of ADNI consisting of $n=256$ subjects over a 5-year follow-up to examine the relationship between diffusion-weighted imaging (DWI) and cognitive decline. For the measurement of DWI, we use fractional anisotropy (FA), which reflects fiber density, axonal diameter, and myelination in white matter. FA is observed at 1 to 8 time points. For the outcome related to cognitive decline, we use the Mini-Mental State Examination (MMSE) score, with lower scores indicating impairment. The MMSE score is observed at 1 to 13 time points. For each subject, the measurement times of MMSE and FA are mismatched. Other covariates, such as age (ranging from 55 to 96 years), years of education (ranging from 11 to 20), and the number of APOE4 genes, are synchronous with MMSE, creating mixed synchronous and asynchronous longitudinal covariates. The growth curves for $Y_{\rm MMSE}(t)$, $X_{\rm Age}(t)$ and $Z_{\rm FA}(t)$ of randomly selected patients are presented in Figure 3. Details of the registration and normalization of the imaging data can be found in Li et al. (2022).

Figure 3: The growth curves for $Y_{\rm MMSE}(t)$, $X_{\rm Age}(t)$ and $Z_{\rm FA}(t)$ of patients 9, 156 and 256.

We aim to examine the association between MMSE and the log hazard function of FA at region 0.35 (the whole brain is divided into 100 regions), accounting for other baseline covariates. We first normalize the data as recommended in Diggle et al. (2002) and Fan and Li (2004). Specifically, for the continuous baseline covariate education level, we subtract the mean and divide by the standard deviation across all individuals. For continuous longitudinal data, such as age, at time point $t$, we obtain Nadaraya-Watson type estimators of $\bar{X}_{\rm Age}(t)$ and $\bar{X}^{2}_{\rm Age}(t)$. The standard deviation is $s_{\rm Age}(t)=\sqrt{\bar{X}^{2}_{\rm Age}(t)-\{\bar{X}_{\rm Age}(t)\}^{2}}$ and the normalized version is $\{X_{\rm Age}(t)-\bar{X}_{\rm Age}(t)\}/s_{\rm Age}(t)$. Other longitudinal data can be normalized in the same way, and we omit the details. Below we use the same notation for the normalized data.
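
The following R sketch illustrates this normalization under simplifying assumptions: it pools observations across subjects, uses a Gaussian kernel with a hypothetical bandwidth h, and the objects obs_t and obs_x are toy stand-ins for the pooled observation times and age values.

## Nadaraya-Watson estimates of the mean and second moment, and the
## resulting standardization of each observed value at its own time
nw <- function(t0, obs_t, obs_x, h) {
  w <- dnorm((obs_t - t0) / h)              # Gaussian kernel weights
  sum(w * obs_x) / sum(w)
}
normalize <- function(obs_t, obs_x, h = 0.1) {
  sapply(seq_along(obs_x), function(j) {
    m1 <- nw(obs_t[j], obs_t, obs_x,   h)   # estimate of the mean at t
    m2 <- nw(obs_t[j], obs_t, obs_x^2, h)   # estimate of the second moment
    (obs_x[j] - m1) / sqrt(m2 - m1^2)       # standardized observation
  })
}
## toy usage
set.seed(1)
obs_t <- runif(50)
obs_x <- 60 + 10 * obs_t + rnorm(50)
head(normalize(obs_t, obs_x))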

Our model is

E\{Y_{\rm MMSE}(t)\mid X_{\rm Age}(t),X_{\rm Edu},X_{\rm APOE4},Z_{\rm FA}(t)\}
= \alpha(t)+\beta_{\rm Age}(t)X_{\rm Age}(t)+\beta_{\rm Edu}(t)X_{\rm Edu}+\beta_{\rm APOE4}(t)X_{\rm APOE4}+\gamma_{\rm FA}(t)Z_{\rm FA}(t),  (4.13)

where $Y_{\rm MMSE}(t)$ denotes the longitudinal response of MMSE, $X_{\rm Age}(t)$, $X_{\rm Edu}$ and $X_{\rm APOE4}$ denote the synchronous longitudinal covariates, and $Z_{\rm FA}(t)$ denotes the asynchronous longitudinal covariate. In (4.13), except for $X_{\rm APOE4}$, which denotes the number of copies of the APOE4 gene (0, 1 or 2), all variables are continuous.

Among our proposed approaches, the two-step method requires that the synchronous and asynchronous longitudinal covariates be uncorrelated. We therefore regress the log hazard function of FA on age, education level, and the number of APOE4 genes, with three univariate regression models and one multivariate regression model, as follows:

E\{Z_{\rm FA}(t)\mid X_{\rm Age}(t),X_{\rm Edu},X_{\rm APOE4}\}=\alpha+\beta_{1}X_{\rm Age}(t)+\beta_{2}X_{\rm Edu}+\beta_{3}X_{\rm APOE4}.  (4.14)

The results are summarized in Table 3. We observe that education level and the number of APOE4 genes are not statistically significantly associated with FA, while there is a significant effect of age. We shall therefore use the one-step method to estimate the regression functions in (4.13) and apply the two-step method without age as a sensitivity analysis. The first stage of the two-step approach uses the bandwidth $h=4n^{-0.6}\approx 0.144$, and the second stage of the two-step approach and the one-step kernel weighting approach use the bandwidths $h_{1}=h_{2}=4n^{-0.6}\approx 0.144$ in the ADNI data.
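
For model (4.14), the kernel weighting idea of Cao et al. (2015) pairs every FA observation with every covariate observation within the same subject and downweights pairs whose observation times are far apart. The R sketch below is one way to set this up; the data layout (a list dat with per-subject FA times s, FA values z, covariate times t and covariate data frame X) and the toy generator mk() are hypothetical, and the standard errors reported by lm() are not the asymptotically valid ones of Table 3.

## Kernel-weighted least squares for a constant-coefficient asynchronous
## regression: weight each (response, covariate) pair by K_h of the time gap
kw_lm <- function(dat, h) {
  pairs <- do.call(rbind, lapply(dat, function(d) {
    g <- expand.grid(j = seq_along(d$s), k = seq_along(d$t))
    data.frame(z = d$z[g$j],
               d$X[g$k, , drop = FALSE],
               w = dnorm((d$s[g$j] - d$t[g$k]) / h) / h)
  }))
  lm(z ~ age + edu + apoe4, data = pairs, weights = w)
}

## toy usage with hypothetical subjects
set.seed(1)
mk <- function() {
  t <- sort(runif(5)); s <- sort(runif(4))
  X <- data.frame(age = rnorm(5), edu = rnorm(5), apoe4 = rbinom(5, 2, 0.3))
  list(t = t, s = s, X = X, z = 0.01 * mean(X$age) + rnorm(4, sd = 0.1))
}
dat <- replicate(10, mk(), simplify = FALSE)
coef(kw_lm(dat, h = 0.144))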

Table 3: Regression results of FA on age, education level and APOE4

                   Fit separately                  Fit in one model
              Age       Edu      APOE4        Age       Edu      APOE4
Estimate     0.008    -0.001    -0.021       0.009     0.001     0.004
SE           0.003     0.007     0.029       0.003     0.007     0.031
p-value      0.003     0.923     0.475       0.003     0.925     0.904

Figure 4 presents the estimated non-parametric regression functions and SCBs of $\beta_{\rm Age}(t)$, $\beta_{\rm Edu}(t)$, $\beta_{\rm APOE4}(t)$ and $\gamma_{\rm FA}(t)$ based on the one-step method. We also draw constant and linear lines to check their adequacy in explaining the variation. We observe that the SCB of $\beta_{\rm Age}(t)$ does not contain any constant or linear line, which implies that age has a non-linear time-varying effect on MMSE. Education level is not statistically significantly associated with MMSE. There is a negative constant effect of APOE4 on MMSE, which is consistent with results in the medical literature (Safieh et al., 2019). FA has a non-zero constant effect on MMSE.

Figure 4: Estimates of $\beta_{\rm Age}(t)$, $\beta_{\rm Edu}(t)$, $\beta_{\rm APOE4}(t)$ and $\gamma_{\rm FA}(t)$ and their SCBs in model (4.13).

Therefore our analysis suggests the following model

E\{Y_{\rm MMSE}(t)\mid X_{\rm Age}(t),X_{\rm APOE4},Z_{\rm FA}(t)\}
= \alpha(t)+\beta_{\rm Age}(t)X_{\rm Age}(t)+\beta_{\rm APOE4}X_{\rm APOE4}+\gamma_{\rm FA}Z_{\rm FA}(t),  (4.15)

We fit model (4.15) using the one-step method. As a comparison, we omit age and use the two-step approach, with centering at the first step and kernel weighting at the second step, to estimate the constant coefficients for APOE4 and FA. In other words, at the first step we minimize $\sum_{i=1}^{256}\sum_{j=1}^{L_{i}}\{\hat{Y}_{\rm MMSE}(T_{ij})-\beta_{\rm APOE4}\hat{X}_{\rm APOE4}\}^{2}$ to obtain $\hat{\beta}_{\rm APOE4}$, where $\hat{Y}_{\rm MMSE}$ and $\hat{X}_{\rm APOE4}$ are the centering estimators. Then we combine the profile least squares method (Fan and Li, 2004) and the kernel weighting method (Cao et al., 2015) to estimate $\gamma_{\rm FA}$.
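
A hedged sketch of this two-step fit in R, using toy centered data (y_c and x_c are hypothetical stacked centered MMSE and APOE4 values at the synchronous visit times):

## Step 1: least squares of the centered response on the centered covariate;
## no intercept is needed because both variables are centered
set.seed(1)
n_obs <- 500
x_c   <- rnorm(n_obs)
y_c   <- -0.47 * x_c + rnorm(n_obs)
step1 <- lm(y_c ~ x_c - 1)
beta_apoe4 <- coef(step1)

## Step 2: the residuals carry gamma_FA * Z_FA(t); regress them on the
## asynchronous FA measurements with a kernel-weighted routine such as
## the kw_lm() sketched after model (4.14)
res <- resid(step1)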

The constant coefficient estimates of $\beta_{\rm APOE4}$ and $\gamma_{\rm FA}$ are summarized in Table 4.

Table 4: Constant coefficient estimates of APOE4 and FA

                       Estimate       SE      p-value
One-step    APOE4       -0.472      0.078     <0.001
            FA           0.129      0.048      0.008
Two-step    APOE4       -0.471      0.080     <0.001
            FA           0.121      0.050      0.015

We can see that APOE4 has a negative, statistically significant association with MMSE and FA has a positive, statistically significant association with MMSE, based on both the one-step and two-step methods. In addition, Figure 5 presents the estimated regression function $\beta_{\rm Age}(t)$ based on the one-step method. The estimated age effect is highly non-linear, which cannot be captured by existing methods.

Figure 5: The estimate of $\beta_{\rm Age}(t)$ and its 95% SCB in model (4.15).

5 Concluding remarks

In this paper, we propose new methods to conduct statistical inference for mixed synchronous and asynchronous longitudinal covariates with varying-coefficient models. Our one-step method allows arbitrary dependence between the synchronous and asynchronous longitudinal covariates. If we further restrict the synchronous and asynchronous longitudinal covariates to be uncorrelated, we obtain the non-parametric rate of convergence $O(n^{2/5})$ for the regression functions of the synchronous longitudinal covariates, faster than the rate of convergence $O(n^{1/3})$ for the regression functions of the asynchronous longitudinal covariates. It remains an open question whether the $O(n^{2/5})$ rate of convergence can be attained for the regression function of asynchronous longitudinal covariates. To quantify the overall magnitude of variation of the non-parametric regression functions, we construct simultaneous confidence bands with the wild bootstrap. The computation is scalable, as we put random variation in the residuals without the need to recompute the non-parametric regression function for each bootstrap sample. Such SCBs allow us to assess whether a regression function can be simplified to a constant or a linear function.

We use a working independence covariance matrix, similar to GEE (Liang and Zeger, 1986). Pepe and Anderson (1994) showed that with time-dependent covariates, working independence is a safe choice. Efficiency can be improved with a correctly specified covariance matrix, which is usually difficult to obtain in practice.

R code for the simulation and data analysis can be found at https://github.com/hongyuan-cao/mixed-longitudinal. The dataset can be found at https://github.com/BIG-S2/GFPLVCM.

Acknowledgements

We thank Ting Li for help with data acquisition. Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

References

  • Andersen and Gill (1982) Andersen, P. K. and Gill, R. D. (1982). Cox’s regression model for counting processes: a large sample study. Ann. Stat., 10, 1100–1120.
  • Cao et al. (2016) Cao, H., Li, J. and Fine, J. P. (2016). On last observation carried forward and asynchronous longitudinal regression analysis. Electron. J. Stat., 10, 1155–1180.
  • Cao et al. (2015) Cao, H., Zeng, D. and Fine, J. P. (2015). Regression analysis of sparse asynchronous longitudinal data. J. R. Stat. Soc. Ser. B-Stat. Methodol., 77, 755–776.
  • Cao et al. (2018) Cao, H., Liu, W. and Zhou, Z. (2018). Simultaneous nonparametric regression analysis of sparse longitudinal data. Bernoulli, 24, 3013–3038.
  • Cattaneo et al. (2020) Cattaneo, M. D., Jansson, M. and Ma, X. (2020). Simple local polynomial density estimators. J. Am. Stat. Assoc., 115, 1449–1455.
  • Chen and Cao (2017) Chen, L. and Cao, H. (2017). Analysis of asynchronous longitudinal data with partially linear models. Electron. J. Stat., 11, 1549–1569.
  • Diggle et al. (2002) Diggle, P. J., Heagerty, P., Liang, K. Y. and Zeger, S. L. (2002). Analysis of longitudinal data (2nd ed.), Clarendon, TX: Clarendon Press.
  • Ding (2021) Ding, P. (2021). The Frisch-Waugh-Lovell theorem for standard errors. Stat. Probab. Lett., 168, 108945.
  • Fan and Gijbels (1996) Fan, J. and Gijbels, I. (1996). Local polynomial modelling and its applications. Chapman and Hall, London.
  • Fan and Li (2004) Fan, J. and Li, R. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J. Am. Stat. Assoc., 99, 710–723.
  • Fan et al. (2007) Fan, J., Huang, T. and Li, R. (2007). Analysis of Longitudinal Data with Semiparametric Estimation of Covariance Function. J. Am. Stat. Assoc., 102, 632–641.
  • Fan and Zhang (2000) Fan, J. and Zhang, W. (2000). Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scand. J. Stat., 27, 715–731.
  • Fan and Zhang (2008) Fan, J. and Zhang, W. (2008). Statistical methods with varying coefficient models. Stat. Interface, 1, 179–195.
  • Frisch and Waugh (1933) Frisch, R. and Waugh, F. V. (1933). Partial time regressions as compared with individual trends. Econometrica, 1, 387–401.
  • Gu et al. (2014) Gu, L., Wang, L., Härdle, W. K. and Yang, L. (2014). A simultaneous confidence corridor for varying coefficient regression with sparse functional data. Test, 23, 806–843.
  • Hastie and Tibshirani (1993) Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. J. R. Stat. Soc. Ser. B-Stat. Methodol., 55, 757–796.
  • Li et al. (2022) Li, T., Li, T., Zhu, Z. and Zhu H. (2022). Regression analysis of asynchronous longitudinal functional and scalar data. J. Am. Stat. Assoc., 117, 1228–1242.
  • Liang and Zeger (1986) Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
  • Lin and Ying (2001) Lin, D. Y. and Ying, Z. (2001). Semiparametric and nonparametric regression analysis of longitudinal data. J. Am. Stat. Assoc., 96, 103–126.
  • Liu (1988) Liu, R. Y. (1988). Bootstrap procedures under some non-i.i.d. models. Ann. Stat., 16, 1696–1708.
  • Liu and Wu (2010) Liu, W. and Wu, W. B. (2010). Simultaneous nonparametric inference of time series. Ann. Stat., 38, 2388–2421.
  • Lou et al. (2021) Lou, J., Wang, Y., Li, L. and Zeng, D. (2021). Learning latent heterogeneity for type 2 diabetes patients using longitudinal health markers in electronic health records. Stat. Med., 40, 1930–1946.
  • Lovell (1963) Lovell, M. C. (1963). Seasonal adjustment of economic time series and multiple regression analysis. J. Am. Stat. Assoc., 58, 993–1010.
  • Ma et al. (2012) Ma, S., Yang, L. and Carroll, R. J. (2012). A simultaneous confidence band for sparse longitudinal regression. Stat. Sin., 22, 95–122.
  • Nadaraya (1964) Nadaraya, E. A. (1964). On estimating regression. Theory Probab. Appl., 9, 141–142.
  • Pepe and Anderson (1994) Pepe, M. S. and Anderson, G. L. (1994). A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Commun. Stat.-Simul. Comput., 23, 939–951.
  • Safieh et al. (2019) Safieh, M., Korczyn, A. D. and Michaelson, D. M. (2019). ApoE4: an emerging therapeutic target for Alzheimer’s disease. BMC Med., 17, 64.
  • Sentürk et al. (2013) Sentürk, D., Dalrymple, L. S., Mohammed, S. M., Kaysen, G. A. and Nguyen, D. V. (2013). Modeling time-varying effects with generalized and unsynchronized longitudinal data. Stat. Med., 32, 2971–2987.
  • Sun et al. (2021) Sun, D., Zhao, H. and Sun, J. (2021). Regression analysis of asynchronous longitudinal data with informative observation processes. Comput. Stat. Data Anal., 107161.
  • van der Vaart and Wellner (1996) van der Vaart, A. and Wellner, J. (1996). Weak convergence and empirical processes. Springer, New York.
  • Watson (1964) Watson, G. S. (1964). Smooth regression analysis. Sankhyā Ser. A., 26, 359–372.
  • Wu (1986) Wu, C. F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Stat., 14, 1261–1295.
  • Xiong and Dubin (2010) Xiong, X. and Dubin, J. A. (2010). A binning method for analyzing mixed longitudinal data measured at distinct time points. Stat. Med., 29, 1919–1931.
  • Zheng et al. (2014) Zheng, S., Yang, L. and Härdle, W. K. (2014). A smooth simultaneous confidence corridor for the mean of sparse functional data. J. Am. Stat. Assoc., 109, 661–673.
  • Zhou and Wu (2010) Zhou, Z. and Wu, W. B. (2010). Simultaneous inference of linear models with time varying coefficients. J. R. Stat. Soc. Ser. B-Stat. Methodol., 72, 513–531.
  • Zhu et al. (2007) Zhu, H., Ibrahim, J. G., Tang, N., Rowe, D. B., Hao, X., Bansal, R. and Peterson, B. S. (2007). A statistical analysis of brain morphology using wild bootstrapping. IEEE Trans. Med. Imaging, 26, 954–966.
  • Zhu et al. (2012) Zhu, H., Li, R. and Kong, L. (2012). Multivariate varying coefficient model for functional responses. Ann. Stat., 40, 2634–2666.

Appendix A Proofs of main results

In this section, we provide detailed proofs of Theorems 1 to 4. Our main tools come from empirical process theory.

A.1 Proof of Theorem 1

Theorem 1. Under conditions (A1)-(A4), as $n\rightarrow\infty$, we have

\sqrt{nh_{1}h_{2}}\left[\{\hat{\beta}_{w}(t)-\beta_{0}(t)\}^{T},\{\hat{\gamma}_{w}(t)-\gamma_{0}(t)\}^{T}\right]^{T}\stackrel{d}{\to}N\{0,A^{*}(t)^{-1}\Sigma^{*}(t)A^{*}(t)^{-1}\},

where

A^{*}(t)=E\{C(t,t)C(t,t)^{T}\}\eta(t,t)

and

\Sigma^{*}(t)=\big\{\iint K(z_{1},z_{2})^{2}\,dz_{1}dz_{2}\big\}E\left[C(t,t)C(t,t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t).
Proof.

For any time point $t$, denote the true regression function

\rho_{0}(t)=\{\beta_{0}(t)^{T},\dot{\beta}_{0}(t)^{T},\gamma_{0}(t)^{T},\dot{\gamma}_{0}(t)^{T}\}^{T},

estimated regression function

\hat{\rho}_{w}(t)=\{\hat{\beta}_{w}(t)^{T},\hat{\dot{\beta}}_{w}(t)^{T},\hat{\gamma}_{w}(t)^{T},\hat{\dot{\gamma}}_{w}(t)^{T}\}^{T},

and longitudinal covariates

R_{i}(t_{1},t_{2},t)=\{X_{i}(t_{1})^{T},X_{i}(t_{1})^{T}(t_{1}-t),Z_{i}(t_{2})^{T},Z_{i}(t_{2})^{T}(t_{2}-t)\}^{T}.

Let

e^{*}=\begin{pmatrix}e_{1,p\times p}&0_{p\times p}&0_{p\times q}&0_{p\times q}\\ 0_{q\times p}&0_{q\times p}&e_{2,q\times q}&0_{q\times q}\end{pmatrix}^{T}

be a matrix of dimension $(2p+2q)\times(p+q)$, where $e_{1,p\times p}$ is the $p\times p$ identity matrix, $e_{2,q\times q}$ is the $q\times q$ identity matrix, and the remaining blocks are zero matrices. Then our estimating equation becomes

{e^{*}}^{T}U_{n}^{w}\{\rho(t)\}
\triangleq n^{-1}\sum_{i=1}^{n}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho(t)\}\,dN^{*}_{i}(t_{1},t_{2})
= n^{-1}\sum_{i=1}^{n}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,C_{i}(t_{1},t_{2})\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho(t)\}\,dN^{*}_{i}(t_{1},t_{2}),  (S.1)

where $C_{i}(t_{1},t_{2})=\{X_{i}(t_{1})^{T},Z_{i}(t_{2})^{T}\}^{T}$, $i=1,\ldots,n$. Solving $U_{n}^{w}\{\rho(t)\}=0$, we obtain $\hat{\rho}_{w}(t)$. We first show the consistency of $\hat{\beta}_{w}(t)$ and $\hat{\gamma}_{w}(t)$. Note that

E\left[{e^{*}}^{T}U_{n}^{w}\{\rho(t)\}\right]
= n^{-1}\sum_{i=1}^{n}E\Big(E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho(t)\}\,dN_{i}^{*}(t_{1},t_{2})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\Big]\Big)
= n^{-1}\sum_{i=1}^{n}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\,E\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho(t)\mid X(t),Z(s),t,s\in[0,1]\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
= n^{-1}\sum_{i=1}^{n}E\Big(\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\big[R_{i}(t_{1},t_{1},t)^{T}\rho_{0}(t)-R_{i}(t_{1},t_{2},t)^{T}\rho(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big).

By conditions (A2)-(A4) and Taylor expansion,

E\left[{e^{*}}^{T}U_{n}^{w}\{\rho(t)\}\right]=E\{C(t,t)C(t,t)^{T}\}\eta(t,t)\left[\{\beta_{0}(t)-\beta(t)\}^{T},\{\gamma_{0}(t)-\gamma(t)\}^{T}\right]^{T}+o_{p}(1).

By the law of large numbers, ${e^{*}}^{T}U_{n}^{w}\{\rho(t)\}\stackrel{p}{\rightarrow}u^{w}\{\beta(t),\gamma(t)\}$ as $n\to\infty$, where $u^{w}\{\beta(t),\gamma(t)\}=E\{C(t,t)C(t,t)^{T}\}\eta(t,t)\left[\{\beta_{0}(t)-\beta(t)\}^{T},\{\gamma_{0}(t)-\gamma(t)\}^{T}\right]^{T}$ is a column vector in ${\mathbb{R}}^{p+q}$. Under condition (A2), $\{\beta_{0}(t)^{T},\gamma_{0}(t)^{T}\}^{T}$ is the unique solution to $u^{w}\{\beta(t),\gamma(t)\}=0$, and $\{\hat{\beta}_{w}(t)^{T},\hat{\gamma}_{w}(t)^{T}\}^{T}$ solves the estimating equation ${e^{*}}^{T}U_{n}^{w}\{\rho(t)\}=0$. By Andersen and Gill (1982), it follows that for any fixed $t$, $\{\hat{\beta}_{w}(t)^{T},\hat{\gamma}_{w}(t)^{T}\}^{T}\stackrel{p}{\to}\{\beta_{0}(t)^{T},\gamma_{0}(t)^{T}\}^{T}$.

We next show the asymptotic normality of $\{\hat{\beta}_{w}(t)^{T},\hat{\gamma}_{w}(t)^{T}\}^{T}$. Let $\mathcal{P}_{n}$ and $\mathcal{P}$ denote the empirical measure and the true probability measure, respectively. We have

(nh_{1}h_{2})^{1/2}{e^{*}}^{T}U_{n}^{w}\{\rho(t)\}
= (nh_{1}h_{2})^{1/2}(\mathcal{P}_{n}-\mathcal{P})\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho(t)\}\,dN^{*}(t_{1},t_{2})\Big]
\quad + (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho(t)\}\,dN^{*}(t_{1},t_{2})\Big]
= \mathrm{I}+\mathrm{II}.  (S.2)

For $\mathrm{I}$, we consider the class of functions

\Big\{(h_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho(t)\}\,dN^{*}(t_{1},t_{2}) : |\rho(t)-\rho_{0}(t)|<\epsilon\Big\}

for any fixed $t$ and a given constant $\epsilon$. The functions in this class are differentiable and their first-order derivatives are bounded by condition (A2); therefore they are Lipschitz continuous in $\rho(t)$ at any fixed $t$, with Lipschitz constant uniformly bounded by $(h_{1}h_{2})^{1/2}\|\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)R(t_{1},t_{2},t)^{T}\,dN^{*}(t_{1},t_{2})\|_{\infty}$. It can be shown that this is a P-Donsker class (van der Vaart and Wellner, 1996). We obtain that, for $|\rho(t)-\rho_{0}(t)|<M(nh_{1}h_{2})^{-1/2}$ with a constant $M$, the first term in (S.2) is equal to

\mathrm{I} = (nh_{1}h_{2})^{1/2}(\mathcal{P}_{n}-\mathcal{P})\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho_{0}(t)\}\,dN^{*}(t_{1},t_{2})\Big]+o_{p}(1)
= (nh_{1}h_{2})^{1/2}{e^{*}}^{T}\big(U_{n}^{w}\{\rho_{0}(t)\}-E[U_{n}^{w}\{\rho_{0}(t)\}]\big)+o_{p}(1).  (S.3)

For the second term on the right-hand side of equation (S.2), we have

\mathrm{II} = (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho_{0}(t)\}\,dN^{*}(t_{1},t_{2})\Big]
\quad - (nh_{1}h_{2})^{1/2}E\Big\{\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)R(t_{1},t_{2},t)^{T}\,dN^{*}(t_{1},t_{2})\Big\}\{\rho(t)-\rho_{0}(t)\}
= I_{1}-I_{2}.

Define

\varphi_{i}(t)=(h_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho_{0}(t)\}\,dN^{*}_{i}(t_{1},t_{2}),

then $(nh_{1}h_{2})^{1/2}{e^{*}}^{T}U_{n}^{w}\{\rho_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\varphi_{i}(t)$, and we have

I_{1} = \sqrt{n}\,E\{\varphi(t)\}
= \sqrt{n}\,E\big[E\{\varphi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\big]
= (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\,E\{Y(t_{1})-R(t_{1},t_{2},t)^{T}\rho_{0}(t)\mid X(t),Z(s),t,s\in[0,1]\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
= (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\{X(t_{1})^{T}\beta_{0}(t_{1})+Z(t_{1})^{T}\gamma_{0}(t_{1})-R(t_{1},t_{2},t)^{T}\rho_{0}(t)\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
= (nh_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,E\big[{e^{*}}^{T}R(t_{1},t_{2},t)\{R(t_{1},t_{1},t)^{T}-R(t_{1},t_{2},t)^{T}\}\rho_{0}(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\eta(t_{1},t_{2})\,dt_{1}dt_{2}.

By conditions (A3) and (A4), $I_{1}=o(1)$.

For $I_{2}$, we have

I_{2} = (nh_{1}h_{2})^{1/2}E\{C(t,t)C(t,t)^{T}\}\eta(t,t)\left[\{\beta(t)-\beta_{0}(t)\}^{T},\{\gamma(t)-\gamma_{0}(t)\}^{T}\right]^{T}+O\{(nh_{1}h_{2})^{1/2}(h_{1}^{2}+h_{2}^{2})\}.

As $n\to\infty$ and $(nh_{1}h_{2})^{1/2}(h_{1}^{2}+h_{2}^{2})\to 0$,

\mathrm{II}=-(nh_{1}h_{2})^{1/2}E\{C(t,t)C(t,t)^{T}\}\eta(t,t)\left[\{\beta(t)-\beta_{0}(t)\}^{T},\{\gamma(t)-\gamma_{0}(t)\}^{T}\right]^{T}+o(1).  (S.4)

Combining (S.2), (S.3) and (S.4), we obtain

(nh_{1}h_{2})^{1/2}{e^{*}}^{T}\big(U_{n}^{w}\{\rho_{0}(t)\}-E[U_{n}^{w}\{\rho_{0}(t)\}]\big)
= (nh_{1}h_{2})^{1/2}E\{C(t,t)C(t,t)^{T}\}\eta(t,t)\left[\{\hat{\beta}_{w}(t)-\beta_{0}(t)\}^{T},\{\hat{\gamma}_{w}(t)-\gamma_{0}(t)\}^{T}\right]^{T}+o(1).  (S.5)

Recall

\varphi_{i}(t)=(h_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R_{i}(t_{1},t_{2},t)\{Y_{i}(t_{1})-R_{i}(t_{1},t_{2},t)^{T}\rho_{0}(t)\}\,dN^{*}_{i}(t_{1},t_{2}).

Since $(nh_{1}h_{2})^{1/2}{e^{*}}^{T}U_{n}^{w}\{\rho_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\varphi_{i}(t)$ is a sum of independent and identically distributed (i.i.d.) random variables, we next calculate $\mathrm{var}\{\varphi(t)\}$. Note that

\mathrm{var}\{\varphi(t)\} = E\big[\mathrm{var}\{\varphi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\big] + \mathrm{var}\big[E\{\varphi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\big],

where

E\big[\mathrm{var}\{\varphi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\big]
= h_{1}h_{2}E\Big[\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)R(s_{1},s_{2},t)^{T}e^{*}\,E\{Y(t_{1})Y(s_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\,dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big]
\quad - h_{1}h_{2}E\Big[\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)R(s_{1},s_{2},t)^{T}e^{*}\,E\{Y(t_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\,E\{Y(s_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\,dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big]
= \iint K(z_{1},z_{2})^{2}\,dz_{1}dz_{2}\,E\big[C(t,t)C(t,t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\big]\eta(t,t)+O(h_{1}^{2}+h_{2}^{2}),

and

\mathrm{var}\big[E\{\varphi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\big]
= h_{1}h_{2}\,\mathrm{var}\Big(\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\big[\{R(t_{1},t_{1},t)-R(t_{1},t_{2},t)\}^{T}\rho_{0}(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\,dN^{*}(t_{1},t_{2})\Big)
= h_{1}h_{2}E\Big(\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t)\,{e^{*}}^{T}R(t_{1},t_{2},t)\big[\{R(t_{1},t_{1},t)-R(t_{1},t_{2},t)\}^{T}\rho_{0}(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\big[\{R(s_{1},s_{1},t)-R(s_{1},s_{2},t)\}^{T}\rho_{0}(t)+O_{p}\{(s_{1}-t)^{2}\}\big]R(s_{1},s_{2},t)^{T}e^{*}\,dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big)
\quad - h_{1}h_{2}\Big\{\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,E\Big({e^{*}}^{T}R(t_{1},t_{2},t)\big[\{R(t_{1},t_{1},t)-R(t_{1},t_{2},t)\}^{T}\rho_{0}(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\Big)\,dN^{*}(t_{1},t_{2})\Big\}^{\otimes 2}
= O(h_{1}^{2}+h_{2}^{2}).

To prove the asymptotic normality, we verify the Lyapunov condition. Note that

(nh_{1}h_{2})^{1/2}{e^{*}}^{T}U_{n}^{w}\{\rho_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\varphi_{i}(t)=\sum_{i=1}^{n}n^{1/2}n^{-1}\varphi_{i}(t),

then, similarly to the calculation of the variance, we have

\sum_{i=1}^{n}E\big[|n^{1/2}n^{-1}\varphi_{i}(t)-E\{n^{1/2}n^{-1}\varphi_{i}(t)\}|^{3}\big] = nO\{(nh_{1}h_{2})^{3/2}n^{-3}(h_{1}h_{2})^{-2}\} = O\{(nh_{1}h_{2})^{-1/2}\}.

Therefore, by condition (A4), $nh_{1}h_{2}\to\infty$ and $(nh_{1}h_{2})^{1/2}(h_{1}^{2}+h_{2}^{2})\to 0$, we have

(nh_{1}h_{2})^{1/2}{e^{*}}^{T}\big(U_{n}^{w}\{\rho_{0}(t)\}-E[U_{n}^{w}\{\rho_{0}(t)\}]\big)\xrightarrow{d}N\{0,\Sigma^{*}(t)\},

where $\Sigma^{*}(t)=\big\{\iint K(z_{1},z_{2})^{2}\,dz_{1}dz_{2}\big\}E\big[C(t,t)C(t,t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\big]\eta(t,t)$. Combining with equation (S.5), we have

\sqrt{nh_{1}h_{2}}\left[\{\hat{\beta}_{w}(t)-\beta_{0}(t)\}^{T},\{\hat{\gamma}_{w}(t)-\gamma_{0}(t)\}^{T}\right]^{T}\stackrel{d}{\to}N\{0,A^{*}(t)^{-1}\Sigma^{*}(t)A^{*}(t)^{-1}\},

where

A^{*}(t)=E\{C(t,t)C(t,t)^{T}\}\eta(t,t)

and

\Sigma^{*}(t)=\big\{\iint K(z_{1},z_{2})^{2}\,dz_{1}dz_{2}\big\}E\left[C(t,t)C(t,t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t).

Therefore the conclusion of Theorem 1 holds. ∎

A.2 Proof of Theorem 2

Theorem 2. Under conditions (2.3), (A5)-(A8), as $n\rightarrow\infty$, we have

\sqrt{nh}\{\hat{\beta}_{c}(t)-\beta_{0}(t)\}\stackrel{d}{\to}N\{0,A^{-1}(t)\Sigma(t)A^{-1}(t)\},

where

A(t)=E\{\tilde{X}(t)\tilde{X}(t)^{T}\}\lambda(t)
\quad\mbox{and}\quad \Sigma(t)=\big\{\int K(z)^{2}\,dz\big\}E\left[\tilde{X}(t)\tilde{X}(t)^{T}\sigma\{t,\tilde{X}(t)\}^{2}\right]\lambda(t).
Proof.

For any $t$, let

\theta_{0}(t)=\{\beta_{0}(t)^{T},\dot{\beta}_{0}(t)^{T}\}^{T},\quad \hat{\theta}_{c}(t)=\{\hat{\beta}_{c}(t)^{T},\hat{\dot{\beta}}_{c}(t)^{T}\}^{T},

and

\tilde{W}_{i}(t_{1},t)=\{\tilde{X}_{i}(t_{1})^{T},\tilde{X}_{i}(t_{1})^{T}(t_{1}-t)\}^{T}.

Under conditions (A7) and (A8), $\hat{m}_{X}(t)\stackrel{p}{\to}m_{X}(t)$ and $\hat{m}_{Y}(t)\stackrel{p}{\to}m_{Y}(t)$. Let $e=(e_{1,p\times p},0_{p\times p})^{T}\in{\mathbb{R}}^{2p\times p}$, where $e_{1,p\times p}$ is the $p\times p$ identity matrix; then the estimating equation is

e^{T}U_{n}^{c}\{\theta(t)\}=n^{-1}\sum_{i=1}^{n}\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}_{i}(t_{1},t)\{\tilde{Y}_{i}(t_{1})-\tilde{W}_{i}(t_{1},t)^{T}\theta(t)\}\,dN_{i}(t_{1})+o_{p}(1).  (S.6)

Solving $e^{T}U_{n}^{c}\{\theta(t)\}=0$, we obtain $\hat{\beta}_{c}(t)$. We first show the consistency of $\hat{\beta}_{c}(t)$. Note that

E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\}\,dN(t_{1})\right]
= E\left(E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\}\,dN(t_{1})\mid X(t),N(t),t\in[0,1]\right]\right)
= E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\,E\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\mid X(t),t\in[0,1]\}\lambda(t_{1})\,dt_{1}\right]
= E\left(\int K_{h}(t_{1}-t)\left[e^{T}\tilde{W}(t_{1},t)\tilde{W}(t_{1},t)^{T}\{\theta_{0}(t)-\theta(t)\}+O_{p}\{(t_{1}-t)^{2}\}\right]\lambda(t_{1})\,dt_{1}\right).

Let $t_{1}=t+hz$; by conditions (A6)-(A8) and the law of large numbers, we have $e^{T}U_{n}^{c}\{\theta(t)\}\stackrel{p}{\to}E\{\tilde{X}(t)\tilde{X}(t)^{T}\}\lambda(t)\{\beta_{0}(t)-\beta(t)\}$ as $n\to\infty$. Similarly to the proof of Theorem 1, under condition (A6) and the convexity lemma (Andersen and Gill, 1982), $\hat{\beta}_{c}(t)\stackrel{p}{\to}\beta_{0}(t)$.

Next we show the asymptotic normality of $\hat{\beta}_{c}(t)$. Let $\mathcal{P}_{n}$ and $\mathcal{P}$ denote the empirical measure and the true probability measure, respectively; we have

\sqrt{nh}\,e^{T}U_{n}^{c}\{\theta(t)\} = \sqrt{nh}(\mathcal{P}_{n}-\mathcal{P})\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\}\,dN(t_{1})\right]
\quad + \sqrt{nh}\,E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\}\,dN(t_{1})\right].  (S.7)

For the first term in (S.7), we consider the class of functions

\left\{\sqrt{h}\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta(t)\}\,dN(t_{1}) : |\theta(t)-\theta_{0}(t)|<\epsilon\right\}

for any given constant $\epsilon$ and any fixed time point $t$. Similarly to the proof of Theorem 1, it can be shown that this is a P-Donsker class (van der Vaart and Wellner, 1996). We obtain that, for $|\theta(t)-\theta_{0}(t)|<M(nh)^{-1/2}$ with a constant $M$, the first term in (S.7) is equal to

\sqrt{nh}(\mathcal{P}_{n}-\mathcal{P})\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\}\,dN(t_{1})\right]+o_{p}(1)
= \sqrt{nh}\,e^{T}\big(U_{n}^{c}\{\theta_{0}(t)\}-E[U_{n}^{c}\{\theta_{0}(t)\}]\big)+o_{p}(1).  (S.8)

For the second term on the right-hand side of equation (S.7), we have

\sqrt{nh}\,E\Big(\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\left[\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\{\theta(t)-\theta_{0}(t)\}-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\right]dN(t_{1})\Big)
= \sqrt{nh}\,E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\}\,dN(t_{1})\right]
\quad - \sqrt{nh}\,E\left\{\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\tilde{W}(t_{1},t)^{T}\lambda(t_{1})\,dt_{1}\right\}\{\theta(t)-\theta_{0}(t)\}
= I_{1}-I_{2}.  (S.9)

For $I_{1}$, let $\psi_{i}(t)=\sqrt{h}\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}_{i}(t_{1},t)\{\tilde{Y}_{i}(t_{1})-\tilde{W}_{i}(t_{1},t)^{T}\theta_{0}(t)\}\,dN_{i}(t_{1})$. The $\psi_{i}(t)$ are i.i.d. and $\sqrt{nh}\,e^{T}U_{n}^{c}\{\theta_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\psi_{i}(t)$. We have

I_{1} = \sqrt{n}\,E\{\psi(t)\}
= \sqrt{n}\,E\big[E\{\psi(t)\mid\tilde{X}(t),N(t),t\in[0,1]\}\big]
= (nh)^{1/2}E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\,E\{\tilde{Y}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\mid\tilde{X}(t),t\in[0,1]\}\lambda(t_{1})\,dt_{1}\right]
= (nh)^{1/2}E\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{W}(t_{1},t_{1})^{T}\theta_{0}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\}\lambda(t_{1})\,dt_{1}\right]
= (nh)^{1/2}\int K_{h}(t_{1}-t)\,E\big[e^{T}\{\tilde{W}(t_{1},t)\tilde{W}(t_{1},t_{1})^{T}-\tilde{W}(t_{1},t)\tilde{W}(t_{1},t)^{T}\}\theta_{0}(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\lambda(t_{1})\,dt_{1}.

Letting $t_{1}=t+hz$, similarly to the proof of Theorem 1, by conditions (A7) and (A8), we have $I_{1}=o(1)$.

For $I_{2}$, we have

I_{2} = \sqrt{nh}\,e^{T}E\left\{\int K(z)\tilde{W}(t+hz,t)\tilde{W}(t+hz,t)^{T}\lambda(t+hz)\,dz\right\}\{\theta(t)-\theta_{0}(t)\}
= \sqrt{nh}\,E\{\tilde{X}(t)\tilde{X}(t)^{T}\}\lambda(t)\{\beta(t)-\beta_{0}(t)\}+O(n^{1/2}h^{5/2}).

Consequently,

\sqrt{nh}\,E\{\tilde{X}(t)\tilde{X}(t)^{T}\}\lambda(t)\{\hat{\beta}_{c}(t)-\beta_{0}(t)\}+o(1)=\sqrt{nh}\,e^{T}\big(U_{n}^{c}\{\theta_{0}(t)\}-E[U_{n}^{c}\{\theta_{0}(t)\}]\big).  (S.10)

The variance of $\sqrt{nh}\,e^{T}U_{n}^{c}\{\theta_{0}(t)\}$ can be computed using the $\psi_{i}(t)$ defined earlier:

\mathrm{var}\{\psi(t)\} = E\big[\mathrm{var}\{\psi(t)\mid\tilde{X}(t),N(t),t\in[0,1]\}\big]+\mathrm{var}\big[E\{\psi(t)\mid\tilde{X}(t),N(t),t\in[0,1]\}\big] = J_{1}+J_{2},

where

J_{1} = E\left\{E\left(\left[\psi(t)-E\{\psi(t)\mid\tilde{X}(t),N(t),t\in[0,1]\}\right]^{2}\mid\tilde{X}(t),N(t),t\in[0,1]\right)\right\}
= hE\Big\{\iint K_{h}(t_{1}-t)K_{h}(s_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\tilde{W}(s_{1},t)^{T}e\,E\big(\big[\tilde{Y}(t_{1})-E\{\tilde{Y}(t_{1})\mid\tilde{X}(t),N(t),t\in[0,1]\}\big]\big[\tilde{Y}(s_{1})-E\{\tilde{Y}(s_{1})\mid\tilde{X}(t),N(t),t\in[0,1]\}\big]\mid\tilde{X}(t),N(t),t\in[0,1]\big)\,dN(t_{1})dN(s_{1})\Big\}
= \int K(z)^{2}\,dz\,E\big[\tilde{X}(t)\tilde{X}(t)^{T}\sigma\{t,\tilde{X}(t)\}^{2}\big]\lambda(t)+O(h^{2}),

and

J_{2} = h\,\mathrm{var}\left[\int K_{h}(t_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{W}(t_{1},t_{1})^{T}\theta_{0}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\}\,dN(t_{1})\right]
= hE\Big[\iint K_{h}(t_{1}-t)K_{h}(s_{1}-t)\,e^{T}\tilde{W}(t_{1},t)\{\tilde{W}(t_{1},t_{1})^{T}\theta_{0}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\}\{\tilde{W}(s_{1},s_{1})^{T}\theta_{0}(s_{1})-\tilde{W}(s_{1},t)^{T}\theta_{0}(t)\}\tilde{W}(s_{1},t)^{T}e\,dN(t_{1})dN(s_{1})\Big]
\quad - h\left\{\int K_{h}(t_{1}-t)\,e^{T}E\big(\tilde{W}(t_{1},t)\big[\tilde{W}(t_{1},t_{1})^{T}\theta_{0}(t_{1})-\tilde{W}(t_{1},t)^{T}\theta_{0}(t)\big]\big)\,dN(t_{1})\right\}^{\otimes 2}
= O(h^{2}).

To prove the asymptotic normality, we verify the Lyapunov condition. Note that

(nh)^{1/2}e^{T}U_{n}^{c}\{\theta_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\psi_{i}(t)=\sum_{i=1}^{n}n^{1/2}n^{-1}\psi_{i}(t),

then, similarly to the calculation of the variance,

\sum_{i=1}^{n}E\big[|n^{1/2}n^{-1}\psi_{i}(t)-E\{n^{1/2}n^{-1}\psi_{i}(t)\}|^{3}\big]=nO\{(nh)^{3/2}n^{-3}h^{-2}\}=O\{(nh)^{-1/2}\}.

Combining with (S.10), we have

\sqrt{nh}\{\hat{\beta}_{c}(t)-\beta_{0}(t)\}\stackrel{d}{\to}N\{0,A(t)^{-1}\Sigma(t)A(t)^{-1}\},

where

A(t)=E\{\tilde{X}(t)\tilde{X}(t)^{T}\}\lambda(t)
\quad\mbox{and}\quad \Sigma(t)=\big\{\int K(z)^{2}\,dz\big\}E\left[\tilde{X}(t)\tilde{X}(t)^{T}\sigma\{t,\tilde{X}(t)\}^{2}\right]\lambda(t).

Therefore the conclusion of Theorem 2 holds. ∎

A.3 Proof of Theorem 3

Theorem 3. Under conditions (2.3), (A5)-(A8), as $n\rightarrow\infty$, we have

\sqrt{nh}\{\hat{\beta}_{v}(t)-\beta_{0}(t)\}\stackrel{d}{\to}N\{0,A^{-1}(t)\Sigma(t)A^{-1}(t)\},

where $A(t)$ and $\Sigma(t)$ are specified in Theorem 2.

Proof.

From Fan and Zhang (2008), $\hat{\iota}_{v}(t)\triangleq\{\hat{\alpha}_{v}(t),\hat{\beta}_{v}(t)^{T},\hat{\dot{\alpha}}_{v}(t),\hat{\dot{\beta}}_{v}(t)^{T}\}^{T}$ is a consistent estimate of the coefficients in model (2.8). Here we show that $\hat{\beta}_{v}(t)$ is a consistent estimator of $\beta(t)$ in model (2.1). For any fixed time point $t$, define

U_{n}^{v}\{\iota(t)\}=n^{-1}\sum_{i=1}^{n}\int K_{h}(t_{1}-t)\{Y_{i}(t_{1})-Q_{i}(t_{1},t)^{T}\iota(t)\}^{2}\,dN_{i}(t_{1}),  (S.11)

where $\iota(t)=\{\alpha(t),\beta(t)^{T},\dot{\alpha}(t),\dot{\beta}(t)^{T}\}^{T}$ and $Q_{i}(t_{1},t)=\{1,X_{i}(t_{1})^{T},(t_{1}-t),X_{i}(t_{1})^{T}(t_{1}-t)\}^{T}$. From (2.5), $\alpha(t)=E\{Z(t)\}^{T}\gamma(t)=E\{Y(t)\}-E\{X(t)\}^{T}\beta(t)$, so we have

U_{n}^{v}\{\iota(t)\}
= n^{-1}\sum_{i=1}^{n}\int K_{h}(t_{1}-t)\{Y_{i}(t_{1})-\alpha(t)-\dot{\alpha}(t)(t_{1}-t)-X_{i}(t_{1})^{T}\beta(t)-X_{i}(t_{1})^{T}\dot{\beta}(t)(t_{1}-t)\}^{2}\,dN_{i}(t_{1})
= n^{-1}\sum_{i=1}^{n}\int K_{h}(t_{1}-t)\{Y_{i}(t_{1})-EY(t_{1})+O_{p}(t_{1}-t)^{2}+EX(t_{1})^{T}\beta(t_{1})-X_{i}(t_{1})^{T}\beta(t)-X_{i}(t_{1})^{T}\dot{\beta}(t)(t_{1}-t)\}^{2}\,dN_{i}(t_{1})
= n^{-1}\sum_{i=1}^{n}\int K_{h}(t_{1}-t)\{\tilde{Y}_{i}(t_{1})-\tilde{W}_{i}(t_{1},t)^{T}\theta(t)\}^{2}\,dN_{i}(t_{1})+o_{p}(1),  (S.12)

where $\theta(t)=\{\beta(t)^{T},\dot{\beta}(t)^{T}\}^{T}$, $\tilde{W}_{i}(t_{1},t)=\{\tilde{X}_{i}(t_{1})^{T},\tilde{X}_{i}(t_{1})^{T}(t_{1}-t)\}^{T}$, $\tilde{X}_{i}(t_{1})=X_{i}(t_{1})-EX(t_{1})$, and $\tilde{Y}_{i}(t_{1})=Y_{i}(t_{1})-EY(t_{1})$. The estimating equation is the same as in the centering approach. Consequently, let $\hat{\theta}_{v}(t)=\{\hat{\beta}_{v}(t)^{T},\hat{\dot{\beta}}_{v}(t)^{T}\}^{T}$ be the estimator from the varying-coefficient model approach and $\theta_{0}(t)$ the true function. The proof of Theorem 3 is the same as that of Theorem 2 and is thus omitted. ∎

A.4 Proof of Theorem 4

Theorem 4. Under conditions (2.3), (A1)-(A4), as $n\rightarrow\infty$, we have

\sqrt{nh_{1}h_{2}}\{\hat{\gamma}(t)-\gamma_{0}(t)\}\stackrel{d}{\to}N\{0,A^{+}(t)^{-1}\Sigma^{+}(t)A^{+}(t)^{-1}\},

where

A^{+}(t)=E\{Z(t)Z(t)^{T}\}\eta(t,t)
\quad\mbox{and}\quad \Sigma^{+}(t)=\big\{\iint K(z_{1},z_{2})^{2}\,dz_{1}dz_{2}\big\}E\left[Z(t)Z(t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t).
Proof.

For any fixed time point $t$, let $\phi_{0}(t)=\{\gamma_{0}(t)^{T},\dot{\gamma}_{0}(t)^{T}\}^{T}$, $\hat{\phi}(t)=\{\hat{\gamma}(t)^{T},\hat{\dot{\gamma}}(t)^{T}\}^{T}$, $V_{i}(t_{2},t)=\{Z_{i}(t_{2})^{T},Z_{i}(t_{2})^{T}(t_{2}-t)\}^{T}$, $\theta_{0}(t)=\{\beta_{0}(t)^{T},\dot{\beta}_{0}(t)^{T}\}^{T}$, $\hat{\theta}(t)=\{\hat{\beta}(t)^{T},\hat{\dot{\beta}}(t)^{T}\}^{T}$, and $W_{i}(t_{1},t)=\{X_{i}(t_{1})^{T},X_{i}(t_{1})^{T}(t_{1}-t)\}^{T}$. For $|\hat{\theta}(t)-\theta_{0}(t)|<\epsilon$ with any given constant $\epsilon$, the estimating equation for $\phi(t)$ becomes

U_{n}\{\phi(t),\hat{\theta}(t)\}
= n^{-1}\sum_{i=1}^{n}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,V_{i}(t_{2},t)\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\theta_{0}(t)-V_{i}(t_{2},t)^{T}\phi(t)\}\,dN^{*}_{i}(t_{1},t_{2})+o_{p}(1).  (S.13)

Let $e^{+}=(e_{2,q\times q},0_{q\times q})^{T}\in{\mathbb{R}}^{2q\times q}$. Taking the expected value of ${e^{+}}^{T}U_{n}\{\phi(t),\hat{\theta}(t)\}$, we have

n^{-1}\sum_{i=1}^{n}{e^{+}}^{T}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,V_{i}(t_{2},t)\,E\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\theta_{0}(t)-V_{i}(t_{2},t)^{T}\phi(t)\mid X(t),Z(s),t,s\in[0,1]\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]+o_{p}(1)
= n^{-1}\sum_{i=1}^{n}{e^{+}}^{T}E\Big(\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,V_{i}(t_{2},t)\big[V_{i}(t_{1},t)^{T}\phi_{0}(t)-V_{i}(t_{2},t)^{T}\phi(t)+O_{p}\{(t_{1}-t)^{2}\}\big]\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big).

Let $t_{1}=h_{1}z_{1}+t$ and $t_{2}=h_{2}z_{2}+t$; then by conditions (A2)-(A4) and Taylor expansion, we have $E[{e^{+}}^{T}U_{n}\{\phi(t),\hat{\theta}(t)\}]=E\{Z(t)Z(t)^{T}\}\eta(t,t)\{\gamma_{0}(t)-\gamma(t)\}+o(1)$. By the law of large numbers, ${e^{+}}^{T}U_{n}\{\phi(t),\hat{\theta}(t)\}\stackrel{p}{\rightarrow}u\{\gamma(t)\}$ as $n\to\infty$, where $u\{\gamma(t)\}=E\{Z(t)Z(t)^{T}\}\eta(t,t)\{\gamma_{0}(t)-\gamma(t)\}$. Under condition (A2), $\gamma_{0}(t)$ is the unique solution to $u\{\gamma(t)\}=0$, and $\hat{\gamma}(t)$ solves the estimating equation ${e^{+}}^{T}U_{n}\{\phi(t),\hat{\theta}(t)\}=0$. By Andersen and Gill (1982), it follows that for any fixed $t$, $\hat{\gamma}(t)\stackrel{p}{\to}\gamma_{0}(t)$.

We next show the asymptotic normality of $\hat{\gamma}(t)$. Using $\mathcal{P}_{n}$ and $\mathcal{P}$ to denote the empirical measure and the true probability measure, respectively, we have

(nh_{1}h_{2})^{1/2}{e^{+}}^{T}U_{n}\{\phi(t),\hat{\theta}(t)\}
= (nh_{1}h_{2})^{1/2}(\mathcal{P}_{n}-\mathcal{P})\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V_{i}(t_{2},t)\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\hat{\theta}(t)-V_{i}(t_{2},t)^{T}\phi(t)\}\,dN^{*}_{i}(t_{1},t_{2})\Big]
\quad + (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V_{i}(t_{2},t)\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\hat{\theta}(t)-V_{i}(t_{2},t)^{T}\phi(t)\}\,dN^{*}_{i}(t_{1},t_{2})\Big]
= \mathrm{I}+\mathrm{II}.  (S.14)

For the term $\mathrm{I}$ and any fixed time point $t$, we consider the class of functions

\Big\{(h_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V(t_{2},t)\{Y(t_{1})-W(t_{1},t)^{T}\theta(t)-V(t_{2},t)^{T}\phi(t)\}\,dN^{*}(t_{1},t_{2}) : |\theta(t)-\theta_{0}(t)|<\epsilon_{1},\ |\phi(t)-\phi_{0}(t)|<\epsilon_{2}\Big\}

for given constants $\epsilon_{1}$ and $\epsilon_{2}$. Similarly to the proof of Theorem 1, we obtain that, for $|\theta(t)-\theta_{0}(t)|<M_{1}(nh)^{-1/2}$ and $|\phi(t)-\phi_{0}(t)|<M_{2}(nh_{1}h_{2})^{-1/2}$, where $M_{1}$ and $M_{2}$ are constants, the first term $\mathrm{I}$ in (S.14) is equal to

(nh_{1}h_{2})^{1/2}(\mathcal{P}_{n}-\mathcal{P})\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V_{i}(t_{2},t)\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\theta_{0}(t)-V_{i}(t_{2},t)^{T}\phi_{0}(t)\}\,dN^{*}_{i}(t_{1},t_{2})\Big]+o_{p}(1)
= (nh_{1}h_{2})^{1/2}{e^{+}}^{T}\big(U_{n}\{\phi_{0}(t),\theta_{0}(t)\}-E[U_{n}\{\phi_{0}(t),\theta_{0}(t)\}]\big)+o_{p}(1).

For the second term II\mathrm{II} on the right-hand side of equation (S.14), we have

\mathrm{II}
= (nh_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,E\big[{e^{+}}^{T}V(t_{2},t)\{Y(t_{1})-W(t_{1},t)^{T}\hat{\theta}(t)-V(t_{2},t)^{T}\phi(t)\}\big]\eta(t_{1},t_{2})\,dt_{1}dt_{2}
= (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V(t_{2},t)\{Y(t_{1})-W(t_{1},t)^{T}\theta_{0}(t)-V(t_{2},t)^{T}\phi_{0}(t)\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
\quad - (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V(t_{2},t)W(t_{1},t)^{T}\{\hat{\theta}(t)-\theta_{0}(t)\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
\quad - (nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)\,{e^{+}}^{T}V(t_{2},t)V(t_{2},t)^{T}\{\phi(t)-\phi_{0}(t)\}\eta(t_{1},t_{2})\,dt_{1}dt_{2}\Big]
= I_{1}-I_{2}-I_{3}.

For I_{1}, let

\begin{split}
\pi_{i}(t)=&\,(h_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t){e^{+}}^{T}V_{i}(t_{2},t)\{Y_{i}(t_{1})-W_{i}(t_{1},t)^{T}\theta_{0}(t)\\
&-V_{i}(t_{2},t)^{T}\phi_{0}(t)\}dN^{*}_{i}(t_{1},t_{2}),
\end{split}

so that the \pi_{i}(t) are i.i.d. and (nh_{1}h_{2})^{1/2}{e^{+}}^{T}U_{n}\{\phi_{0}(t),\theta_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\pi_{i}(t). Then we have

\begin{split}
I_{1}=&\,\sqrt{n}E\{\pi(t)\}\\
=&\,\sqrt{n}E\left[E\{\pi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\right]\\
=&\,(nh_{1}h_{2})^{1/2}E\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t){e^{+}}^{T}V(t_{2},t)\\
&\quad\times E\{Y(t_{1})-W(t_{1},t)^{T}\theta_{0}(t)-V(t_{2},t)^{T}\phi_{0}(t)\mid X(t),Z(s),t,s\in[0,1]\}\eta(t_{1},t_{2})dt_{1}dt_{2}\Big]\\
=&\,(nh_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)E\Big[{e^{+}}^{T}\{V(t_{2},t)V(t_{1},t)^{T}-V(t_{2},t)V(t_{2},t)^{T}\}\phi_{0}(t)\\
&\quad+O\{(t_{1}-t)^{2}\}\Big]\eta(t_{1},t_{2})dt_{1}dt_{2}.
\end{split}

Letting t_{1}=h_{1}z_{1}+t and t_{2}=h_{2}z_{2}+t and arguing as in the proof of Theorem 1, we have I_{1}=o(1). By the proof of Theorem 2 and conditions (A4) and (A8), we have I_{2}=o(1).

For I_{3}, we have

\begin{split}
I_{3}=&\,(nh_{1}h_{2})^{1/2}\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t){e^{+}}^{T}E\{V(t_{2},t)V(t_{2},t)^{T}\}\{\phi(t)-\phi_{0}(t)\}\eta(t_{1},t_{2})dt_{1}dt_{2}\\
=&\,(nh_{1}h_{2})^{1/2}\iint K(z_{1},z_{2}){e^{+}}^{T}E\{V(h_{2}z_{2}+t,t)V(h_{2}z_{2}+t,t)^{T}\}\eta(h_{1}z_{1}+t,h_{2}z_{2}+t)dz_{1}dz_{2}\\
&\quad\times\{\phi(t)-\phi_{0}(t)\}\\
=&\,(nh_{1}h_{2})^{1/2}E\{Z(t)Z(t)^{T}\}\eta(t,t)\{\gamma(t)-\gamma_{0}(t)\}+o(1).
\end{split}

Consequently,

\begin{split}
&(nh_{1}h_{2})^{1/2}{e^{+}}^{T}\left(U_{n}\{\phi_{0}(t),\theta_{0}(t)\}-E\left[U_{n}\{\phi_{0}(t),\theta_{0}(t)\}\right]\right)\\
=&\,(nh_{1}h_{2})^{1/2}E\{Z(t)Z(t)^{T}\}\eta(t,t)\{\hat{\gamma}(t)-\gamma_{0}(t)\}+o_{p}(1).
\end{split}

Now we show that (nh_{1}h_{2})^{1/2}{e^{+}}^{T}U_{n}\{\phi_{0}(t),\theta_{0}(t)\} satisfies the central limit theorem. The variance can be computed using \pi_{i}(t):

\begin{split}
\mathrm{var}\{\pi(t)\}=&\,E\left[\mathrm{var}\{\pi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\right]\\
&+\mathrm{var}\left[E\{\pi(t)\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\right]\\
=&\,J_{1}+J_{2},
\end{split}

where

\begin{split}
J_{1}=&\,h_{1}h_{2}E\Big[\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t){e^{+}}^{T}V(t_{2},t)V(s_{2},t)^{T}e^{+}\\
&\quad\times E\{Y(t_{1})Y(s_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big]\\
&-h_{1}h_{2}E\Big[\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t){e^{+}}^{T}V(t_{2},t)V(s_{2},t)^{T}e^{+}\\
&\quad\times E\{Y(t_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}E\{Y(s_{1})\mid X(t),Z(s),N^{*}(t,s),t,s\in[0,1]\}\\
&\quad\times dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big]\\
=&\,\iint K(z_{1},z_{2})^{2}dz_{1}dz_{2}\,E\left[Z(t)Z(t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t)+O(h_{1}^{2}+h_{2}^{2}),
\end{split}

and

\begin{split}
J_{2}=&\,h_{1}h_{2}\mathrm{var}\Big[\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t){e^{+}}^{T}V(t_{2},t)\{Z(t_{1})^{T}\gamma_{0}(t_{1})-V(t_{2},t)^{T}\phi_{0}(t)\}dN^{*}(t_{1},t_{2})\Big]\\
=&\,h_{1}h_{2}E\Big[\iiiint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)K_{h_{1},h_{2}}(s_{1}-t,s_{2}-t){e^{+}}^{T}V(t_{2},t)\{V(t_{1},t)-V(t_{2},t)\}^{T}\phi_{0}(t)\\
&\quad\times\phi_{0}(t)^{T}\{V(s_{1},t)-V(s_{2},t)\}V(s_{2},t)^{T}e^{+}\,dN^{*}(t_{1},t_{2})dN^{*}(s_{1},s_{2})\Big]\\
&-h_{1}h_{2}\Big(\iint K_{h_{1},h_{2}}(t_{1}-t,t_{2}-t)E\left[{e^{+}}^{T}V(t_{2},t)\{V(t_{1},t)-V(t_{2},t)\}^{T}\right]\phi_{0}(t)\,\eta(t_{1},t_{2})dt_{1}dt_{2}\Big)^{\otimes 2}\\
=&\,O(h_{1}^{2}+h_{2}^{2}).
\end{split}
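Here the law of total variance is applied conditionally on the covariate processes and the counting process N^{*}: J_{1} is driven by the conditional variance \sigma\{t,X(t),Z(t)\}^{2} of the response, while J_{2}, the variance of the conditional mean, only involves the smoothing-bias factor \{V(t_{1},t)-V(t_{2},t)\}^{T}\phi_{0}(t); since |t_{1}-t_{2}|\leq h_{1}+h_{2} on the support of the kernel, this factor is O(h_{1}+h_{2}), which explains the order J_{2}=O(h_{1}^{2}+h_{2}^{2}).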

Thus

\begin{split}
\mathrm{var}\{\pi(t)\}&=\iint K(z_{1},z_{2})^{2}dz_{1}dz_{2}\,E\left[Z(t)Z(t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t)+O(h_{1}^{2}+h_{2}^{2})\\
&=\Sigma^{+}(t)+O(h_{1}^{2}+h_{2}^{2}).
\end{split}

To prove the asymptotic normality, we verify the Lyapunov condition. Note that

(nh_{1}h_{2})^{1/2}{e^{+}}^{T}U_{n}\{\phi_{0}(t),\theta_{0}(t)\}=n^{-1/2}\sum_{i=1}^{n}\pi_{i}(t)=\sum_{i=1}^{n}n^{1/2}n^{-1}\pi_{i}(t),

then, by a calculation similar to that of the variance,

\begin{split}
\sum_{i=1}^{n}E\left[\left|n^{1/2}n^{-1}\pi_{i}(t)-E\{n^{1/2}n^{-1}\pi_{i}(t)\}\right|^{3}\right]&=nO\{(nh_{1}h_{2})^{3/2}n^{-3}(h_{1}h_{2})^{-2}\}\\
&=O\{(nh_{1}h_{2})^{-1/2}\}.
\end{split}
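To see the order, note that E|\pi_{i}(t)|^{3}=O\{(h_{1}h_{2})^{-1/2}\}: the scaling (h_{1}h_{2})^{1/2} contributes (h_{1}h_{2})^{3/2}, the three kernel factors contribute (h_{1}h_{2})^{-3}, and the effective integration region contributes O(h_{1}h_{2}); combining this with the factor n\cdot n^{-3/2} from the triangular-array weights yields the stated order.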

Consequently, by condition (A4) we have

(nh_{1}h_{2})^{1/2}{e^{+}}^{T}\left(U_{n}\{\phi_{0}(t),\theta_{0}(t)\}-E\left[U_{n}\{\phi_{0}(t),\theta_{0}(t)\}\right]\right)\xrightarrow{d}N\{0,\Sigma^{+}(t)\}.

Then we have

\sqrt{nh_{1}h_{2}}\{\hat{\gamma}(t)-\gamma_{0}(t)\}\stackrel{d}{\to}N\{0,A^{+}(t)^{-1}\Sigma^{+}(t)A^{+}(t)^{-1}\},

where

\begin{split}
A^{+}(t)&=E\{Z(t)Z(t)^{T}\}\eta(t,t)\\
\mbox{and}\quad\Sigma^{+}(t)&=\Big\{\iint K(z_{1},z_{2})^{2}dz_{1}dz_{2}\Big\}E\left[Z(t)Z(t)^{T}\sigma\{t,X(t),Z(t)\}^{2}\right]\eta(t,t).
\end{split}

Therefore the conclusion of Theorem 4 holds. ∎
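For readers who wish to compute the plug-in sandwich standard errors implied by Theorem 4, the following minimal sketch (Python with NumPy) illustrates one possible implementation of \hat{A}^{+}(t)^{-1}\hat{\Sigma}^{+}(t)\hat{A}^{+}(t)^{-1}/(nh_{1}h_{2}); the Epanechnikov kernel, the per-subject data layout, and all names (epanechnikov, sandwich_se) are our own illustrative assumptions, not the code used for the reported simulations.

import numpy as np

def epanechnikov(u):
    # Epanechnikov kernel; any symmetric density meeting the
    # regularity conditions on K could be substituted.
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def sandwich_se(t, subjects, gamma_hat, h1, h2):
    # Plug-in standard errors for gamma_hat(t).
    # `subjects` is a hypothetical list of per-subject records:
    #   'L1': response times (m1,), 'R': working responses
    #         Y_i(t_1) - W_i(t_1, t)^T theta_hat(t) at those times,
    #   'L2': asynchronous covariate times (m2,),
    #   'Z' : covariates observed at 'L2', shape (m2, p).
    n = len(subjects)
    p = subjects[0]['Z'].shape[1]
    A = np.zeros((p, p))
    Sigma = np.zeros((p, p))
    for s in subjects:
        # kernel weight K_{h1,h2}(t1 - t, t2 - t) for every (t1, t2) pair
        w = (epanechnikov((s['L1'][:, None] - t) / h1) *
             epanechnikov((s['L2'][None, :] - t) / h2)) / (h1 * h2)
        resid = s['R'][:, None] - (s['Z'] @ gamma_hat)[None, :]
        # pi_i(t): kernel-weighted score, including the (h1 h2)^{1/2} scaling
        pi = np.sqrt(h1 * h2) * np.einsum('jk,jk,kp->p', w, resid, s['Z'])
        A += np.einsum('jk,kp,kq->pq', w, s['Z'], s['Z'])
        Sigma += np.outer(pi, pi)
    A /= n        # estimates A^+(t) = E{Z(t)Z(t)^T} eta(t, t)
    Sigma /= n    # estimates Sigma^+(t)
    Ainv = np.linalg.inv(A)
    return np.sqrt(np.diag(Ainv @ Sigma @ Ainv / (n * h1 * h2)))

Here gamma_hat would be the solution of the kernel-weighted estimating equation at the same t, and the returned values estimate the point-wise standard errors of the kind summarized as "SE" in Appendix B.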

Appendix B Additional simulations

In this section, we present simulation results for setting (i) \beta(t)=3(t-0.4)^{2} and \gamma(t)=\sin(2\pi t) (Table S.1), using the two-step approach with the varying-coefficient model (VCM) method in the first step and kernel weighting (KW) in the second step. We further present results for setting (ii) \beta(t)=0.4t+0.5 and \gamma(t)=\sqrt{t} using all three proposed methods (Table S.2).
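To make the simulation settings concrete, the sketch below (Python with NumPy) generates one hypothetical dataset matching setting (ii); the Poisson observation counts, uniform measurement times, linear subject-level Z process, and Gaussian errors are placeholder assumptions for illustration only and are not necessarily the designs used to produce Tables S.1-S.2.

import numpy as np

rng = np.random.default_rng(2024)

def beta(t):    # setting (ii): coefficient of the synchronous covariate
    return 0.4 * t + 0.5

def gamma(t):   # setting (ii): coefficient of the asynchronous covariate
    return np.sqrt(t)

def simulate_subject(rng):
    # Hypothetical observation scheme: the numbers of response times and of
    # asynchronous covariate times are Poisson, and the times themselves are
    # independent uniforms on [0, 1], so the two sets of times are misaligned.
    m1 = rng.poisson(5) + 1
    m2 = rng.poisson(5) + 1
    t1 = np.sort(rng.uniform(size=m1))   # times of Y and the synchronous X
    t2 = np.sort(rng.uniform(size=m2))   # times of the asynchronous Z
    a, b = rng.normal(size=2)            # placeholder smooth Z_i(t) = a + b t
    X = rng.normal(size=m1)              # synchronous covariate observed at t1
    eps = rng.normal(scale=0.5, size=m1)
    Y = X * beta(t1) + (a + b * t1) * gamma(t1) + eps
    return {'t1': t1, 'Y': Y, 'X': X, 't2': t2, 'Z': a + b * t2}

data = [simulate_subject(rng) for _ in range(400)]   # n = 400, as in Table S.2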

Table S.1: 1000 simulation results for \beta(t)=3(t-0.4)^{2} and \gamma(t)=\sin(2\pi t)

BD | NP-F | t=0.3: Bias SD SE CP | t=0.6: Bias SD SE CP | t=0.9: Bias SD SE CP

Results for n=400, Two-step (VCM+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.003 0.157 0.144 92.2 | 0.005 0.124 0.122 92.9 | -0.001 0.133 0.122 92.5
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.004 0.198 0.186 92.2 | -0.001 0.168 0.155 91.3 | 0.006 0.168 0.156 92.6
auto | \beta(t) | -0.007 0.154 0.145 91.5 | 0.003 0.130 0.126 92.9 | -0.003 0.127 0.125 93.5
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.045 0.153 0.144 90.3 | 0.033 0.155 0.140 88.8 | 0.034 0.147 0.134 89.7
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.045 0.163 0.151 89.7 | 0.029 0.166 0.148 89.1 | 0.028 0.148 0.142 91.0
auto | \gamma(t) | -0.039 0.153 0.141 90.6 | 0.035 0.148 0.141 91.4 | 0.032 0.133 0.128 92.0

Results for n=900, Two-step (VCM+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.001 0.120 0.119 94.6 | -0.003 0.101 0.100 93.8 | 0.001 0.102 0.100 93.4
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.007 0.171 0.161 92.1 | 0.000 0.140 0.137 94.1 | 0.002 0.147 0.136 91.9
auto | \beta(t) | -0.002 0.134 0.125 92.8 | 0.000 0.104 0.100 93.3 | -0.001 0.103 0.100 92.6
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.024 0.140 0.131 91.0 | 0.018 0.135 0.130 92.1 | 0.028 0.127 0.122 91.6
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.028 0.147 0.156 90.8 | 0.013 0.138 0.135 92.7 | 0.016 0.130 0.139 92.5
auto | \gamma(t) | -0.036 0.141 0.129 90.2 | 0.014 0.135 0.129 92.7 | 0.021 0.116 0.123 92.1
  • Note: “BD” denotes the bandwidths, where h is the bandwidth in the varying-coefficient model (VCM) approach and h_{1} and h_{2} are the bandwidths in the kernel weighting (KW) approach; “auto” denotes automatic bandwidth selection. “NP-F” denotes the non-parametric function. “Bias” is the empirical bias, “SD” is the sample standard deviation, “SE” is the average of the estimated standard errors, and “CP” is the point-wise 95% coverage probability.

Table S.2: 1000 simulation results for \beta(t)=0.4t+0.5 and \gamma(t)=\sqrt{t}

BD | NP-F | t=0.3: Bias SD SE CP | t=0.6: Bias SD SE CP | t=0.9: Bias SD SE CP

Results for n=400, Two-step (Centering+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.003 0.130 0.120 91.4 | -0.003 0.138 0.133 92.6 | -0.004 0.150 0.145 93.1
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.010 0.164 0.155 92.0 | 0.000 0.177 0.172 92.6 | 0.004 0.200 0.186 91.8
auto | \beta(t) | -0.003 0.142 0.133 91.9 | 0.005 0.134 0.132 93.1 | -0.001 0.164 0.154 92.6
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.025 0.147 0.138 91.6 | -0.031 0.152 0.140 88.9 | -0.035 0.149 0.135 89.6
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.026 0.158 0.150 92.5 | -0.027 0.156 0.149 91.0 | -0.029 0.152 0.144 90.8
auto | \gamma(t) | -0.024 0.144 0.133 90.4 | -0.029 0.151 0.143 91.0 | -0.029 0.144 0.136 91.2

Two-step (VCM+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | -0.002 0.131 0.119 91.8 | -0.011 0.132 0.132 94.2 | -0.004 0.147 0.144 93.1
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | -0.002 0.167 0.154 91.6 | -0.002 0.187 0.169 91.2 | 0.001 0.199 0.188 92.7
auto | \beta(t) | 0.004 0.130 0.128 93.7 | 0.012 0.144 0.134 92.0 | 0.004 0.151 0.144 91.9
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.029 0.152 0.136 89.9 | -0.028 0.149 0.141 92.3 | -0.033 0.145 0.134 90.5
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.026 0.159 0.143 89.6 | -0.032 0.157 0.153 91.0 | -0.039 0.154 0.150 91.6
auto | \gamma(t) | -0.025 0.142 0.133 91.1 | -0.030 0.151 0.137 90.4 | -0.043 0.140 0.129 90.4

One-step (KW)
h_{1}=h_{2}=n^{-0.45} | \beta(t) | 0.001 0.124 0.114 91.8 | -0.004 0.118 0.115 91.8 | -0.005 0.116 0.119 93.9
h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.004 0.146 0.136 90.6 | -0.005 0.148 0.135 90.3 | -0.002 0.150 0.138 91.5
auto | \beta(t) | -0.001 0.101 0.096 93.1 | 0.006 0.125 0.114 91.7 | 0.000 0.111 0.103 92.3
h_{1}=h_{2}=n^{-0.45} | \gamma(t) | -0.025 0.122 0.115 92.6 | -0.024 0.124 0.115 91.4 | -0.030 0.120 0.111 90.7
h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.009 0.148 0.137 91.0 | -0.020 0.148 0.135 90.2 | -0.018 0.151 0.135 91.6
auto | \gamma(t) | -0.026 0.101 0.096 91.7 | -0.026 0.122 0.113 91.0 | -0.040 0.102 0.098 91.2

Results for n=900, Two-step (Centering+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | -0.001 0.099 0.098 93.7 | -0.004 0.115 0.110 92.8 | 0.007 0.118 0.119 94.2
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.000 0.143 0.134 92.9 | 0.001 0.157 0.147 90.6 | 0.000 0.166 0.162 93.4
auto | \beta(t) | 0.000 0.112 0.110 93.9 | 0.002 0.114 0.109 92.7 | 0.001 0.139 0.134 92.7
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.015 0.137 0.126 90.4 | -0.016 0.134 0.127 92.3 | -0.015 0.126 0.124 92.3
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.010 0.140 0.143 91.6 | -0.021 0.142 0.143 92.5 | -0.025 0.140 0.177 92.6
auto | \gamma(t) | -0.014 0.132 0.128 91.3 | -0.014 0.134 0.127 92.1 | -0.024 0.133 0.129 91.2

Two-step (VCM+KW)
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | -0.002 0.104 0.099 93.0 | 0.004 0.113 0.109 93.9 | 0.000 0.123 0.119 92.8
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.004 0.143 0.133 91.4 | 0.000 0.157 0.148 92.3 | 0.004 0.166 0.160 93.4
auto | \beta(t) | -0.001 0.095 0.098 94.9 | 0.006 0.116 0.113 93.6 | 0.004 0.123 0.119 92.9
h=n^{-0.6}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.009 0.130 0.127 92.9 | -0.015 0.132 0.127 92.6 | -0.022 0.129 0.124 91.9
h=n^{-0.7}, h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.016 0.141 0.132 91.6 | -0.026 0.138 0.138 92.2 | -0.025 0.136 0.155 91.6
auto | \gamma(t) | -0.013 0.124 0.119 92.6 | -0.020 0.125 0.119 92.2 | -0.031 0.131 0.123 90.5

One-step (KW)
h_{1}=h_{2}=n^{-0.45} | \beta(t) | 0.002 0.102 0.098 94.2 | -0.003 0.107 0.099 92.9 | 0.003 0.106 0.100 92.6
h_{1}=h_{2}=n^{-0.5} | \beta(t) | 0.000 0.136 0.125 90.4 | 0.002 0.135 0.126 91.7 | -0.001 0.139 0.128 92.6
auto | \beta(t) | -0.001 0.086 0.082 93.1 | 0.001 0.085 0.083 93.3 | 0.001 0.100 0.098 93.4
h_{1}=h_{2}=n^{-0.45} | \gamma(t) | -0.011 0.104 0.099 90.9 | -0.017 0.105 0.098 92.0 | -0.018 0.099 0.094 91.8
h_{1}=h_{2}=n^{-0.5} | \gamma(t) | -0.006 0.131 0.125 92.3 | -0.015 0.133 0.126 92.8 | -0.012 0.134 0.121 90.8
auto | \gamma(t) | -0.021 0.086 0.081 91.7 | -0.026 0.088 0.083 91.1 | -0.025 0.095 0.093 93.1
  • Note: “BD” denotes the bandwidths, where h is the bandwidth in the centering and varying-coefficient model (VCM) approaches and h_{1} and h_{2} are the bandwidths in the kernel weighting (KW) approach; “auto” denotes automatic bandwidth selection. “NP-F” denotes the non-parametric function. “Bias” is the empirical bias, “SD” is the sample standard deviation, “SE” is the average of the estimated standard errors, and “CP” is the point-wise 95% coverage probability.