Regression analysis of mixed sparse synchronous and asynchronous longitudinal covariates with varying-coefficient models
Abstract
We consider varying-coefficient models for mixed synchronous and asynchronous longitudinal covariates, where asynchronicity refers to the misalignment of longitudinal measurement times within an individual. We propose three different methods of parameter estimation and inference. The first method is a one-step approach that estimates non-parametric regression functions for synchronous and asynchronous longitudinal covariates simultaneously. The second method is a two-step approach in which synchronous longitudinal covariates are regressed with the longitudinal response by centering the synchronous longitudinal covariates first and, in the second step, the residuals from the first step are regressed with asynchronous longitudinal covariates. The third method is the same as the second method except that in the first step, we omit the asynchronous longitudinal covariate and include a non-parametric intercept in the regression analysis of synchronous longitudinal covariates and the longitudinal response. We further construct simultaneous confidence bands for the non-parametric regression functions to quantify the overall magnitude of variation. Extensive simulation studies provide numerical support for the theoretical findings. The practical utility of the methods is illustrated on a dataset from the ADNI study.
Keywords: Asynchronous longitudinal data; Convergence rate; Kernel-weighting; Simultaneous confidence band; Varying-coefficient model.
1 Introduction
Asynchronous longitudinal data arise when the longitudinal covariates and response are misaligned within individuals. An example of asynchronous longitudinal data is in the analysis of electronic health records (EHRs). EHR data consist of longitudinal medical records of a large number of patients in one or more electronic health care systems, such as vital signs, laboratory tests, procedure codes, and medications, among others (Lou et al., 2021). Due to the retrospective nature of EHRs, measurement times are collected at each clinical encounter, which can be irregular, sparse, and heterogeneous across patients and asynchronous within patients. Another example comes from a dataset in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study (Li et al., 2022). In this dataset, subjects are followed up for years, where cognitive decline metrics, such as the Mini-Mental State Examination (MMSE) score, are mismatched with medical imaging measurement, such as fractional anisotropy (FA), which reflects fiber density, axonal diameter, and myelination in white matter, within individuals. Other covariates, such as age, education level, and the number of APOE4 genes, a genetic risk factor for Alzheimer’s disease, are synchronous with longitudinal response MMSE, creating mixed synchronous and asynchronous longitudinal covariates. In Figure 1, we plot the visit times for MMSE scores and FA of all patients and two randomly selected patients.


For regression analysis of sparse asynchronous longitudinal data, several methods have been proposed in the literature. For example, Xiong and Dubin (2010) proposed to bin the asynchronous longitudinal covariates to align with the longitudinal response. Classic longitudinal data analysis methods can be used afterwards. Sentürk et al. (2013) explicitly addressed the asynchronous setting for generalized varying-coefficient models with one covariate but did not provide the theoretical properties of the estimators. Cao et al. (2015) proposed a non-parametric kernel weighting approach for generalized linear models to address the asynchronous structure and rigorously established the consistency and asymptotic normality of the resulting estimators. This was extended to a more general set up in Cao et al. (2016) and a partially linear model in Chen and Cao (2017). Sun et al. (2021) studied an informative measurement process for asynchronous longitudinal data. Li et al. (2022) proposed a class of generalized functional partial-linear varying-coefficient models for temporally asynchronous functional neuroimaging data. These methods assume that all longitudinal covariates have the same measurement times that are asynchronous with the longitudinal response. The problem of mixed synchronous and asynchronous longitudinal covariates has not been considered before.
In this paper, we propose statistical methods for estimation and inference of mixed synchronous and asynchronous longitudinal covariates in varying-coefficient models. There is a huge literature on varying-coefficient models (Hastie and Tibshirani, 1993). For example, Fan et al. (2007) considered a semiparametric varying-coefficient partially linear model for synchronous longitudinal data. The main methodological and theoretical developments on varying-coefficient models are summarized in Fan and Zhang (2008). Unlike these classic results, the data structure we work with is very different. The longitudinal covariates have two parts, one part is measured synchronously with the longitudinal response, and another part is asynchronous with the longitudinal response. We consider a dynamic model that allows the covariates’ effect to change with respect to time. To evaluate the overall pattern and magnitude of the time-dependent coefficient or to test whether certain parametric functions are adequate in describing the overall trend of the regression relationship over time, it is more informative to construct simultaneous confidence bands (SCBs) of the non-parametric regression function. Theoretically, to construct SCBs, one has to establish the limiting distribution of the maximum deviation between the estimated and the true function. Due to the sparse, irregular, and asynchronous longitudinal data structure, one has to perform uniform investigations into a dependent empirical process where stochastic variations have to be controlled in time, synchronous longitudinal process, and asynchronous longitudinal process. Progress made in SCB for longitudinal data can be found in Ma et al. (2012), Gu et al. (2014), Zheng et al. (2014), and Cao et al. (2018), among others. For time series data, the simultaneous inference problem is studied in Zhou and Wu (2010) and Liu and Wu (2010), among others. Due to the slow rate of convergence of the extreme value distribution, a resampling method, such as wild bootstrap, is generally adopted in SCB construction (Zhu et al., 2007, 2012). The theoretical guarantees of wild bootstrap can be found in Wu (1986), Liu (1988) and Chapter 2.9 of van der Vaart and Wellner (1996).
To improve efficiency of the varying-coefficient estimation of synchronous longitudinal covariates, we propose a two-step method. In the first step, we regress the response to the synchronous longitudinal covariates; in the second step, the residuals from the first step are regressed to the asynchronous longitudinal covariates. By avoiding unnecessary smoothing of the synchronous longitudinal covariates, we get the non-parametric rate of convergence for their time-varying coefficient estimation, which is faster than rate of convergence with a method that estimates mixed synchronous and asynchronous longitudinal covariates simultaneously. The trade-off is that the two-step method requires that the synchronous and asynchronous longitudinal covariates be uncorrelated at all time whereas the one-step method does not impose such an assumption.
The remainder of the paper is organized as follows. Section 2 establishes methods and asymptotic theory for varying-coefficient estimation of mixed synchronous and asynchronous longitudinal covariates using one-step and two-step methods, respectively. We also develop an inferential strategy with simultaneous confidence band for uncertainty quantification of the non-parametric regression function. Section 3 studies the finite-sample performance of the proposed method through simulations. Section 4 analyzes a data set from the ADNI study. Section 5 gives some concluding remarks. All technical proofs are relegated in the Appendix.
2 Estimation and inference
Suppose we have a longitudinal study with subjects. For the th subject, denote as the response variable at time let be a vector of time-dependent covariate and be a vector of time-dependent covariate, respectively. Mixed synchronous and asynchronous longitudinal covariates for the th subject refer to the observation of , where are the observation times for the longitudinal response and synchronous longitudinal covariates and are the observation times for the asynchronous longitudinal covariates with the requirement that and are finite with probability We use univariate and bivariate counting processes to represent the observation times. Specifically, for subject counts the number of observation times up to in the response and synchronous longitudinal covariates and counts the number of observation times up to on the response and synchronous longitudinal covariates and up to in the asynchronous longitudinal covariates (Cao et al., 2015, 2016).
We consider the following varying-coefficient model:
(2.1) |
where and are non-parametric regression functions for synchronous and asynchronous longitudinal covariates, respectively. We are interested in the statistical estimation and inference of them.
2.1 A one-step approach
To estimate and simultaneously, we consider a local linear estimator (Fan and Gijbels, 1996; Cao et al., 2018):
(2.2) |
For any fixed time point denote , and where and are the true non-parametric regression functions and and are the corresponding derivative functions. Assume that the conditional variance function We need the following assumptions.
-
(A1)
is independent of and where is a twice continuously differentiable function for any For where is continuous for and
exist. -
(A2)
is twice continuously differentiable for For any fixed time point and are twice continuously differentiable for is positive definite for where . Moreover,
where for a square matrix
-
(A3)
is a bivariate density function satisfying
and -
(A4)
and
Condition (A1) requires that the observation process be independent of the covariate processes. Analogous assumptions have been made in Lin and Ying (2001) and Cao et al. (2015). Condition (A2) ensures the identifiability of at any fixed time point and some smoothness assumptions on the expectation of certain functional of and Conditions (A3) and (A4) specify valid kernels and bandwidths. For the bivariate kernel function, we recommend kernel functions with bounded support, such as the product of the univariate Epanechnikov kernel.
The following theorem establishes the asymptotic results of and The proofs are relegated in the Appendix.
Theorem 1.
Under conditions (A1)-(A4), as we have
where
and
Theorem 1 derives the joint asymptotic distribution of and If we let , the bias of and in Theorem 1 is , which is the same as that in Theorem 2 of Cao et al. (2015). The reason why the term vanishes is that we use symmetric density functions as the kernel functions, which are specified in (A3). When the sample size goes to infinity, under (A4), the bias terms vanish. In general, the off-diagonal part of the limiting variance covariance matrix is non-zero. In the special case that and are independent. Moreover, if we assume that then and respectively.
Our method depends on the proper choice of bandwidths. If we let according to condition (A4), a valid bandwidth is larger than and the bias of and in Theorem 1 is of order so we should choose bandwidth With this choice of , we obtain a rate of convergence which is the same as that achieved in Cao et al. (2015). This is slower than the non-parametric rate of convergence due to the asynchronous longitudinal data structure.
In Theorem 1, and are not directly computable. In practice, we use the estimating equation derived from (2.2) and estimate the variance covariance matrix of with the sandwich formula:
The variance covariance matrices of and are the upper submatrix of the upper left submatrix and the upper submatrix of the lower right submatrix of , respectively.
2.2 Simultaneous confidence bands
To get uncertainty quantification and evaluate the overall magnitude of variation of the non-parametric regression functions, we construct simultaneous confidence bands (SCB). Here, we use to illustrate. Specifically, for a pre-specified confidence level and a given smooth function of dimensions we aim to find smooth random functions and , such that
as number of subjects for a pre-specified closed interval Theoretical results on SCB construction rely on extreme value theory with slow rate of convergence (Fan and Zhang, 2000; Cao et al., 2018). We shall adopt a wild bootstrap approach to resample the residuals (Wu, 1986; Liu, 1988). The wild bootstrap is computationally efficient as it multiplies a random disturbance on the residuals without repeated estimation of the non-parametric regression function. Specific steps are as follows.
Step 1. For each subject calculate the kernel-weighted residual
Step 2. Obtain an approximation of through
where
Step 3. Independently generate from a distribution with mean and standard deviation . For example, we can use or the Rademacher distribution which takes values and with equal probability. Construct the stochastic process
Step 4. Repeat Step 3 times to obtain and denote its empirical percentile as The SCB for can be written as
We recommend the Rademacher distribution as the multiplier in Step 3 through extensive simulation studies with different random variables. The details are omitted.
We remark that for a given -dimensional smooth function the construction of SCB for is exactly the same except in Step 2, let where The remaining steps are based on and we replace by
2.3 A two-step approach
In (2.1), is the regression function of the synchronous longitudinal covariate. In classic non-parametric regression analysis of longitudinal data, we get the rate of convergence. Can we achieve the same rate of convergence as the estimation of with mixed synchronous and asynchronous longitudinal covariates? In this subsection, we provide an affirmative answer to this question.
Specifically, we propose a two-step approach to estimate and respectively. In the first step, we aim to obtain unbiased estimation of from (2.1) using only and . There are two strategies to handle the part that involves We can either remove it or take it into account by a non-parametric function. In the second step, we regress residuals from the first step on the asynchronous longitudinal covariates to get regression function estimation of similar to that in Cao et al. (2015).
We need a critical assumption
(2.3) |
Condition (2.3) states that and are uncorrelated when for any This condition is plausible when indicates a time-invariant treatment, which is randomized and independent of other covariates. This condition means that there is no unmeasured confounder between treatment and outcome. This assumption can be strong in observational studies, where unmeasured confounders abound. In classic linear models, when the covariates are orthogonal with each other, the estimation of regression coefficients remains unbiased with a larger variance if some covariates are omitted. Unlike linear models, bias occurs when orthogonal covariates are omitted for longitudinal data if a non-parametric intercept is not added. If and are correlated, there will be bias using the proposed two-step method.
A centering approach to estimate . From (2.1), taking a conditional expectation on by (2.3), we have
(2.4) |
Taking another expectation, we get unconditional expectation of as
(2.5) |
By taking the difference between (2.4) and (2.5), we obtain
(2.6) |
where and Unbiased estimation of can be obtained through (2.6). In (2.6), and are not directly observed. We propose Nadaraya-Watson type of estimators as follows (Nadaraya, 1964; Watson, 1964). Let and where is a kernel function with bounded support, and The bandwidth and Denote and We aim to find
(2.7) |
Define
Then
Next, we show the asymptotic properties of . Denote and We need the following assumptions.
-
(A5)
is independent of and where is twice continuously differentiable for
-
(A6)
is twice continuously differentiable for . For any fixed time point and are twice continuously differentiable for . Furthermore, is positive definite for . Furthermore,
where for a square matrix
-
(A7)
is a symmetric density function satisfying and
-
(A8)
and
Conditions (A5)-(A8) are similar in spirit to conditions (A1)-(A4) in Section 2.1. (A5) is the univariate version of (A1). (A6) is a modified identifiability condition for (A7) and (A8) are requirements on the kernel function and the bandwidth.
We state the asymptotic distribution of in the following theorem and relegate the proofs in the Appendix.
Theorem 2.
We need one bandwidth and the bias of in Theorem 2 is , which is the same as in classic nonparametric regression such as Theorem 1 of Fan and Zhang (2008). Our method depends on the selection of the bandwidth, which strikes a balance between bias and variability. From (A8), we shall choose the bandwidth With this choice of the bandwidth, we achieve a convergence rate the same as non-parametric regression of classic longitudinal data (Fan and Zhang, 2008) and faster than that obtained in Theorem 1. It is counterintuitive that we get an improved rate of convergence with less information – we do not use any information contained in The key is the orthogonality assumption imposed in (2.3), which allows us to obtain an unbiased estimation of the non-parametric regression function When (2.3) is violated, we can regress the residuals of on on the residuals of on similar to the FWL theorem in linear models (Frisch and Waugh, 1933; Lovell, 1963; Ding, 2021). We conjecture that the results would be similar to the simultaneous estimation of and with a slower convergence rate for the estimation of
For statistical inference, we need to estimate the variance of We use the sandwich formula to estimate the variance covariance matrix of as follows:
where and The variance covariance matrix of is the upper left submatrix of while we treat as a nuisance function. In the literature on density estimation, estimated from (2.7) has more desirable properties than conventional kernel density estimation. More details on this can be found in Cattaneo et al. (2020).
A varying-coefficient model approach to estimate . Another strategy to obtain an unbiased estimation of is to absorb the part involving through a non-parametric function. We use (2.5) and treat as a non-parametric intercept. Specifically, consider
(2.8) |
where and where and are non-parametric regression functions. Our aim is to find
(2.9) |
where is a kernel function, and is a bandwidth.
Next we present the asymptotic distribution of and relegate the proofs in the Appendix.
The asymptotic distribution and the bias of is the same as that of Simulation studies provide numerical support for this. For variance estimation, let and We calculate the variance of by the following formula:
The variance covariance matrix of is the upper submatrix of There is a trade-off between the centering approach and the non-parametric intercept approach. In terms of computation, the centering approach is faster. If we are interested in the non-parametric intercept term, the varying-coefficient model is more informative.
A kernel weighting approach to estimate . In the first stage, can be estimated using either the centering or the varying-coefficient model approach, and they have the same asymptotic distribution. Once is obtained, we can estimate by regressing the residuals from the first step on the asynchronous longitudinal covariates Due to the asynchronicity of we need two bandwidths in the calculation of Specifically,
(2.10) |
where where is a bivariate kernel function, which is usually taken to be the product of the univariate kernel function and and are bandwidths. As has a faster convergence rate, whether we use the true or does not affect the inference of
We establish the asymptotic distribution of in the following theorem and relegate the proofs to the Appendix.
Theorem 4.
In the second step, we need two bandwidths. If we set the bias of in Theorem 4 is We achieve the same rate of convergence for the estimation of as in the one-step method. This is the price we pay for the asynchronous longitudinal data structure. Whether we can improve this rate of convergence remains open at the moment.
Let we can calculate
The variance covariance matrix of is the upper submatrix of Statistical inference can be carried out afterwards.
2.4 Bandwidth selection
Our methods depend critically on the choice of the bandwidth. Asymptotic results provide theoretical range of valid bandwidths. In practice, it is desirable to perform a data-driven bandwidth selection. Cao et al. (2015) and Chen and Cao (2017) used a data-splitting strategy to separately calculate bias and variability and choose a bandwidth that minimizes the mean squared error.
In this paper, we use a kernel smoothed folds cross-validation procedure to select the optimal bandwidth. We use the two-step approach with centering in the first step and kernel weighting in the second step to illustrate this point. Specifically, we split the data set into folds, and estimate the non-parametric regression functions withholding one fold. The prediction error is calculated based on the one-fold data withheld and the regression functions are estimated with the other fold data. We do this times and the optimal bandwidth is the one that minimizes the average squared prediction error for any fixed time point
Specifically, for at the time point the average squared prediction error is
(2.11) |
where
where is the number of subjects in the th fold, is estimated without the subjects in the th fold, estimates , estimates , is a kernel density function and is the bandwidth. Similarly, for at time point the average squared prediction error is
(2.12) |
where
where is the number of subjects in the th fold, is estimated without the subjects in the th fold, is estimated from Step 1 with the optimal bandwidth plugged in, is a bivariate kernel function, and and are bandwidths. It is worth noting that when using the two-step approach, we need to first select the bandwidth for by minimizing (2.11), and then select the optimal bandwidth for by minimizing (2.12). If we want to calculate the optimal bandwidth of the regression function in a certain range of we can calculate the integrated ASPE.
3 Numerical studies
In this section, we evaluate the performance of the finite sample of the proposed method through simulations.
3.1 Data generating process and model specification
We generate datasets, each consisting of or subjects. For each subject, the number of observations of synchronous longitudinal data and asynchronous longitudinal covariate follow . The observational times are independently generated from the standard uniform distribution for and respectively. The covariate processes and are both Gaussian, with and The values of the stochastic process are recorded at the observational times of only in the data generation stage, to generate in the model (2.1).
The longitudinal response is generated from
where and are time-dependent coefficients, and is a Gaussian process, independent of and with and For the time-dependent coefficients, we consider two different settings: (i) and (ii)
3.2 Comparison of one-step and two-step methods
We compare two methods: one-step kernel weighting approach and two-step approach with centering in the first step and kernel weighting in the second step. Summary tables of two-step approach using non-parametric intercept and omitting the asynchronous longitudinal covariate at the first step and kernel weighting in the second step can be found in the Appendix. This method produces results similar to the two-step approach with centering in the first step and kernel weighting at the second step.
As suggested in Fan and Gijbels (1996), we use the Epanechnikov kernel where due to its excellent empirical performance. For the one-step kernel weighting method, we need kernel functions of two dimensions, where we use the product of a one-dimensional Epanechnikov kernel Other kernel functions produce similar results and the details are omitted.
We present the results of setting (i). The results of setting (ii) can be found in the Appendix. The simulation results for and are summarized in Table 1. We assess the estimation and inference of and at and The results at other time points are similar and thus omitted.
BD | NP-F | Bias | SD | SE | CP | Bias | SD | SE | CP | Bias | SD | SE | CP | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Two-step (Centering+KW) | |||||||||||||||
auto | |||||||||||||||
auto | |||||||||||||||
One-step (KW) | |||||||||||||||
auto | |||||||||||||||
auto | |||||||||||||||
Two-step (Centering+KW) | |||||||||||||||
auto | |||||||||||||||
auto | |||||||||||||||
One-step (KW) | |||||||||||||||
auto | |||||||||||||||
auto |
-
•
Note: “BD” represents the bandwidths, where represents the bandwidth in the centering approach, and represent the bandwidths in the kernel weighting (KW) approach. “NP-F” represents the non-parametric function. “Bias” is the absolute bias. “SD” is the sample standard deviation. “SE” is the average of the standard error, and “CP” is the pointwise coverage probability. “auto” represents automatic bandwidth selection.
For the estimation of based on the one-step kernel weighting method or the two-step method with centering in the first step and kernel weighting in the second step, we find that the bias is generally small, the estimated standard deviation is close to the empirical standard deviation, and the probability of coverage is close to the nominal with a large sample size at the examined time points and . Performance improves as the sample size increases. Therefore, we can conduct a valid pointwise inference of in practice using the proposed method.
Both methods are valid for inference of . From Figure 2, we observe that with the same bandwidth, the standard deviation of is smaller using the two-step method with kernel weighting in the second step compared to the one-step kernel weighting method, which is consistent with our theoretical prediction of the convergence rate.

In the one-step kernel weighting method, we let The allowable range of is In the simulation, we choose the optimal bandwidth in the range of In the two-step method, the allowable bandwidth in the first step for the estimation of is and we choose the optimal bandwidth in the range of The allowable bandwidth in the second step for the estimation of is when and the optimal bandwidth is selected in the range Automatic bandwidth selection performs well and can be used to conduct inference.
3.3 Simultaneous confidence bands
In this section, we examine the performance of the simultaneous confidence band. Instead of a point-wise confidence interval for fixed time points, it is more informative to examine the overall magnitude of variation of the non-parametric regression function. The data generation process is the same as that in Section 3.1 and we use setting (i) to illustrate. We are interested in constructing simultaneous confidence bands for and based on the one-step kernel weighting method and the two-step method with kernel weighting at the second stage.
Specifically, we use the wild bootstrap approach described in Section 2 to find the empirical percentile of the supremum of the difference between estimated non-parametric regression function and true non-parametric regression function. We use the Rademacher distribution, which takes value with probability and value with probability in the wild bootstrap. In the calculation of the empirical coverage probability, we use equally spaced grid points in We use the Epanechikov kernel. Results based on other commonly used kernel functions such as the uniform kernel are similar, and thus omitted.
In addition to the simultaneous coverage probability of the pointwise confidence interval and the newly proposed simultaneous confidence band, we also present the root average squared error (RASE). Using as an example, the RASE for a non-parametric function is defined as where are grid points at which the non-parametric function is estimated.
The results are summarized in Table 2 with and We can see that as the sample size increases, the simultaneous confidence band coverage probabilities based on the proposed one-step kernel weighting and two-step method with kernel weighting in the second step are close to the nominal ones, and the point-wise confidence interval is not valid for simultaneous inference. The RASE gets smaller with increased sample size.
BD | NP-F | RASE(SD) | CI | SCB | RASE(SD) | CI | SCB | |
---|---|---|---|---|---|---|---|---|
Two-step (Centering+KW) | ||||||||
One-step (KW) | ||||||||
-
•
Note: “BD” represents the bandwidths, where represents the bandwidth in the centering approach, and represent the bandwidths in the kernel weighting (KW) approach. “NP-F” represents the non-parametric function. “RASE” represents the average of RASE, and “SD” represents the standard deviation of RASE. “CI” represents the simultaneous coverage probability based on the pointwise confidence interval. “SCB” represents the simultaneous coverage probability based on SCB. The nominal level is .
Figure 2 shows typical graphs of and based on the one-step kernel weighting method and the two-step method with kernel weighting at the second step for and We can see that in the estimation of both the pointwise confidence interval and the simultaneous confidence band are narrower in the two-step method with kernel weighting at the second step than in that based on the one-step kernel weighting method. This reflects that we obtain a more efficient estimate of by using the two-step method compared to the one-step method. As the sample size increases, we obtain narrower confidence bands for and .
4 Application to the ADNI Data
In this section, we illustrate the proposed methods on a data set from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study (http://www.adni-info.org/). ADNI is a large-scale multicenter neuroimaging study that collects genetic, clinical, imaging, and cognitive data at multiple time points to better understand Alzheimer’s Disease. We used a subset of ADNI consisting of subjects over a year follow-up to examine the relationship between diffusion-weighted imaging (DWI) and cognitive decline. For the measurement of DWI, we use fractional anisotropy (FA), which reflects fiber density, axonal diameter, and myelination in white matter. FA is observed at to time points. For the outcome related to cognitive decline, we use the Mini-Mental State Examination (MMSE) score, with lower scores indicating impairment. The MMSE score is examined from to time points. For each subject, the measurement times of MMSE and FA are mismatched. Other covariates such as age (ranging from to 96 years), years of education (ranging from to ) and the number of APOE4 genes are synchronous with MMSE, creating mixed synchronous and asynchronous longitudinal covariates. The growth curves for and of randomly selected patients are presented in Figure 3. Details of the registration and normalization of imaging data can be found in Li et al. (2022).

We aim to examine the association between MMSE and log hazard function of FA at region (the whole brain is divided into regions) accounting for other baseline covariates. We first normalize the data as recommended in Diggle et al. (2002); Fan and Li (2004). Specifically, for continuous baseline covariate education level, we subtract the mean and divide the standard deviation across all individuals. For continuous longitudinal data, such as age, at time point , we get Nadaraya-Watson type of estimator of and The standard deviation and the normalized version is Other longitudinal data can be normalized in the same way and we omit the details. Below we use the same notation for normalized data.
Our model is
(4.13) | |||||
where denotes longitudinal response of MMSE, denote synchronous longitudinal covariates and denotes asynchronous longitudinal covariate. In (LABEL:realdata1), except for which denotes number of copies of gene APOE4 ( or ), the rest are continuous variables.
In our proposed approaches, the two-step method requires that the synchronous and asynchronous longitudinal covariates are uncorrelated. We regress log hazard function of FA on age, education level, and the number of APOE4 genes with three univariate regression models and one multivariate regression model as follows
(4.14) |
The results are summarized in Table 3. We observe that education level and APOE4 genes are not statistically significantly associated with FA. There is a significant effect of age. We shall use one-step method to estimate regression functions in (LABEL:realdata1) and do the two-step method without age as a sensitivity analysis. The first stage of the two-step approach uses the bandwidth , and the second stage of the two-step approach and one-step kernel weighting approach use the bandwidth in ADNI data.
Fit separately | Fit in one model | ||||||
---|---|---|---|---|---|---|---|
Age | Edu | APOE4 | Age | Edu | APOE4 | ||
Estimate | |||||||
SE | |||||||
-value |
Figure 4 presents the non-parametric regression functions and SCBs of and based on one-step method. We also draw constant and linear lines to check their adequacy in explaining the variation. We observe that the SCB of does not contain linear or constant lines, which implies that age has a non-linear time varying effect on MMSE. Education level is not statistically significantly associated with MMSE. There is a negative constant effect of APOE4 on MMSE, which is consistent with results in the medical literature (Safieh et al., 2019). FA has a non-zero constant effect on MMSE.

Therefore our analysis suggests the following model
(4.15) | |||||
we fit model (LABEL:realdata3) based on one step method. As a comparison, we omit age and use the two step approach with centering at the first step and kernel weighting at the second step to estimate the constant coefficients for APOE4 and FA. In other words, we minimize to obtain at the first step where and are the centering estimators. Then we combine profile least squares method (Fan and Li, 2004) and kernel weighting method (Cao et al., 2015) to estimate
The results of constant coefficient estimates of and are summarized in Table 4.
Estimate | SE | p-value | ||
---|---|---|---|---|
One-step | APOE4 | |||
FA | ||||
Two-step | APOE4 | |||
FA |
We can see that APOE4 has a negative statistically significant association with MMSE and FA has a positive statistically significant association with MMSE based on both one-step and two-step methods. In addition, Figure 5 presents the regression function estimation of based on one-step method. It seems that the age effect is highly non-linear, which cannot be obtained by existing methods.

5 Concluding remarks
In this paper, we propose new methods to conduct statistical inference of mixed synchronous and asynchronous longitudinal covariates with varying coefficient models. Our one-step method allows arbitrary dependence between the synchronous and asynchronous longitudinal covariates. If we further restrict the synchronous and asynchronous longitudinal covariates to be uncorrelated, we get non-parametric rate of convergence for the regression function of synchronous longitudinal covariates, faster than the rate of convergence for the regression function of asynchronous longitudinal covariates. It remains an open question whether rate of convergence can be obtained for the regression function of asynchronous longitudinal covariates. To quantify the overall magnitude of variation of the non-parametric regression function, we construct simultaneous confidence bands for the non-parametric regression functions with wild bootstrap. The computation is scalable as we put random variation in the residual without the need to compute the non-parametric regression function for each bootstrap sample. Such SCBs allow us to see if the regression function can be simplified as a constant, or a linear function.
We use working independence covariance matrix similar to the GEE (Liang and Zeger, 1986). Pepe and Anderson (1994) shows that with time dependent covariats, working independence is a safe choice. Efficiency can be improved by correctly specified covariance matrix, which is usually difficult.
R code for simulation and data analysis can be found in https://github.com/hongyuan-cao/mixed-longitudinal. The dataset can be found in https://github.com/BIG-S2/GFPLVCM.
Acknowledgements
We thank Ting Li for help with data acquisition. Data used in preparation of this article were obtained from the Alzheimers Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
References
- Andersen and Gill (1982) Andersen, P. K. and Gill, R. D. (1982). Cox’s regression model for counting processes: a large sample study. Ann. Stat., 10, 1100–1120.
- Cao et al. (2016) Cao, H., Li, J. and Fine, J. P. (2016). On last observation carried forward and asynchronous longitudinal regression analysis. Electron. J. Stat., 10, 1155–1180.
- Cao et al. (2015) Cao, H., Zeng, D. and Fine, J. P. (2015). Regression analysis of sparse asynchronous longitudinal data. J. R. Stat. Soc. Ser. B-Stat. Methodol., 77, 755–776.
- Cao et al. (2018) Cao, H., Liu, W. and Zhou, Z. (2018). Simultaneous nonparametric regression analysis of sparse longitudinal data. Bernoulli, 24, 3013–3038.
- Cattaneo et al. (2020) Cattaneo, M. D., Jansson, M. and Ma, X. (2020). Simple local polynomial density estimators. J. Am. Stat. Assoc., 115, 1449–1455.
- Chen and Cao (2017) Chen, L. and Cao, H. (2017). Analysis of asynchronous longitudinal data with partially linear models. Electron. J. Stat., 11, 1549–1569.
- Diggle et al. (2002) Diggle, P. J., Heagerty, P., Liang, K. Y. and Zeger, S. L. (2002). Analysis of longitudinal data (2nd ed.), Clarendon, TX: Clarendon Press.
- Ding (2021) Ding, P. (2021). The Frisch-Waugh-Lovell theorem for standard errors. Stat. Probab. Lett., 168, 108945.
- Fan and Gijbels (1996) Fan, J. and Gijbels, I. (1996). Local polynomial modelling and its applications. Chapman and Hall, London.
- Fan and Li (2004) Fan, J. and Li, R. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J. Am. Stat. Assoc., 99, 710–723.
- Fan et al. (2007) Fan, J., Huang, T. and Li, R. (2007). Analysis of Longitudinal Data with Semiparametric Estimation of Covariance Function. J. Am. Stat. Assoc., 102, 632–641.
- Fan and Zhang (2000) Fan, J. and Zhang, W. (2000). Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scand. J. Stat., 27, 715–731.
- Fan and Zhang (2008) Fan, J. and Zhang, W. (2008). Statistical methods with varying coefficient models. Stat. Interface, 1, 179–195.
- Frisch and Waugh (1933) Frisch, R. and Waugh, F. V. (1933). Partial time regressions as compared with individual trends. Econometrica, 1, 387–401.
- Gu et al. (2014) Gu, L., Wang, L., Härdle, W. K. and Yang, L. (2014). A simultaneous confidence corridor for varying coefficient regression with sparse functional data. Test, 23, 806–843.
- Hastie and Tibshirani (1993) Hastie, T. and Tibshirani R. (1993). Varying-coefficient models. J. R. Stat. Soc. Ser. B-Stat. Methodol., 55, 757–796.
- Li et al. (2022) Li, T., Li, T., Zhu, Z. and Zhu H. (2022). Regression analysis of asynchronous longitudinal functional and scalar data. J. Am. Stat. Assoc., 117, 1228–1242.
- Liang and Zeger (1986) Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
- Lin and Ying (2001) Lin, D. Y. and Ying, Z. (2001). Semiparametric and nonparametric regression analysis of longitudinal data. J. Am. Stat. Assoc., 96, 103–126.
- Liu (1988) Liu, R. Y. (1988). Bootstrap procedures under some non-i.i.d. models. Ann. Stat., 16, 1696–1708.
- Liu and Wu (2010) Liu, W. and Wu, W. B. (2010). Simultaneous nonparametric inference of time series. Ann. Stat., 38, 2388–2421.
- Lou et al. (2021) Lou, J., Wang, Y., Li, L. and Zeng, D. (2021). Learning latent heterogeneity for type 2 diabetes patients using longitudinal health markers in electronic health records. Stat. Med., 40, 1930–1946.
- Lovell (1963) Lovell, M. C. (1963). Seasonal adjustment of economic time series and multiple regression analysis. J. Am. Stat. Assoc., 58, 993–1010.
- Ma et al. (2012) Ma, S., Yang, L. and Carroll, R. J. (2012). A simultaneous confidence band for sparse longitudinal regression. Stat. Sin., 22, 95–122.
- Nadaraya (1964) Nadaraya, E. A. (1964). On estimating regression. Theory Probab. Appl., 9, 141–142.
- Pepe and Anderson (1994) Pepe, M. S. and Anderson, G. L. (1994). A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Commun. Stat.-Simul. Comput., 23, 939–951.
- Safieh et al. (2019) Safieh, M., Korczyn, A. D. and Michaelson, D. M. (2019). ApoE4: an emerging therapeutic target for Alzheimer’s disease. BMC Med., 17, 64.
- Sentürk et al. (2013) Sentürk, D., Dalrymple, L. S., Mohammed, S. M., Kaysen, G. A. and Nguyen, D. V. (2013). Modeling time-varying effects with generalized and unsynchronized longitudinal data. Stat. Med., 32, 2971–2987.
- Sun et al. (2021) Sun, D., Zhao, H. and Sun, J. (2021). Regression analysis of asynchronous longitudinal data with informative observation processes. Comput. Stat. Data Anal. 107161.
- van der Vaart and Wellner (1996) van der Vaart A. and Wellner, J. (1996). Weak convergence and empirical processes. Springer, New York.
- Watson (1964) Watson, G. S. (1964). Smooth regression analysis. Sankhyā Ser. A., 26, 359–372.
- Wu (1986) Wu, C. F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Stat., 14, 1261–1295.
- Xiong and Dubin (2010) Xiong, X. and Dubin, J. A. (2010). A binning method for analyzing mixed longitudinal data measured at distinct time points. Stat. Med., 29, 1919–1931.
- Zheng et al. (2014) Zheng, S., Yang, L. and Härdle, W. K. (2014). A smooth simultaneous confidence corridor for the mean of sparse functional data. J. Am. Stat. Assoc., 109, 661–-673.
- Zhou and Wu (2010) Zhou, Z. and Wu, W. B. (2010). Simultaneous inference of linear models with time varying coefficients. J. R. Stat. Soc. Ser. B-Stat. Methodol., 72, 513–531.
- Zhu et al. (2007) Zhu, H., Ibrahim, J. G., Tang, N., Rowe, D. B., Hao, X., Bansal, R. and Peterson, B. S. (2007). A statistical analysis of brain morphology using wild bootstrapping. IEEE Trans. Med. Imaging, 26, 954–966.
- Zhu et al. (2012) Zhu, H., Li, R. and Kong, L. (2012). Multivariate varying coefficient model for functional responses. Ann. Stat., 40, 2634–2666.
Appendix A Proofs of main results
In this section, we provide detailed proofs of Theorems 1 to 4. Our main tools are empirical processes.
A.1 Proof of Theorem 1
Theorem 1. Under conditions (A1)-(A4), as we have
where
and
Proof.
For any time point , denote true regression function
estimated regression function
and longitudinal covariates
Let
be a matrix with dimension where is a identity matrix, is a identity matrix and the others are matrices with entries. Then our estimating equation becomes
(S.1) |
where . Solving we obtain . We first show the consistency of and Note that
By conditions (A2)-(A4) and Taylor expansion,
By the law of large numbers, as where a column vector in Under condition (A2), is the unique solution to
solves the estimation equation By Andersen and Gill (1982), it follows that for any fixed .
We next show the asymptotic normality of Let and denote the empirical measure and the true probability measure, respectively. We have
(S.2) |
For we consider the class of functions
for any fixed and a given constant Note that the functions in this class are differentiable and their corresponding first-order derivatives are bounded according to condition (A2), therefore the functions in this class are Lipschitz continuous in at any fixed and the Lipschitz constant is uniformly bounded by It can be shown that this is a P-Donsker class (van der Vaart and Wellner, 1996). We obtain that for where is a constant, the first term in (S.2) is equal to
(S.3) |
For the second term on the right-hand side of equation (S.2), we have
Define
then and we have
By conditions (A3) and (A4),
For we have
As and
(S.4) |
Combining (S.2), (S.3) and (S.4), we obtain
(S.5) |
Recall
Since , sum of independent and identically distributed (i.i.d.) random variables, we next calculate Note that
where
and
To prove the asymptotic normality, we verify the Lyapunov condition. Note that
then similar to the calculation of variance, we have
Therefore, by condition (A4) and we have
where Combining with equation (S.5), we have
where
and
Therefore the conclusion of Theorem holds. ∎
A.2 Proof of Theorem 2
Proof.
For any let
and
under conditions (A7) and (A8), and Let where is a unit matrix, then the estimating equation is
(S.6) |
Solving we obtain . We first show the consistency of
Let , by conditions (A6)-(A8) and the law of large numbers, we have as Similar to the proof of Theorem , under condition (A6) and convexity lemma (Andersen and Gill, 1982), .
Next we show the asymptotic normality of Let and denote the empirical measure and the true probability measure respectively, we have
(S.7) |
For the first term in (S.7), we consider the class of functions
for any given constant and any fixed time point Similarly to the proof in Theorem , it can be shown that this is a P-Donsker class (van der Vaart and Wellner, 1996), we obtain that for where is a constant, the first term in (S.7) is equal to
(S.8) |
For the second term on the right-hand side of equation (S.7), we have
(S.9) |
For let . s are i.i.d. and We have
let similar to the proof of Theorem , by conditions (A7) and (A8), we have
For , we have
Consequently,
(S.10) |
The variance of can be computed using defined earlier,
where
and
To prove the asymptotic normality, we verify the Lyapunov condition. Note that
then similar to the calculation of variance,
Combining (S.10), we have
where
Therefore the conclusion of Theorem holds. ∎
A.3 Proof of Theorem 3
Proof.
From Fan and Zhang (2008), is consistent estimates of the coefficients in model (2.8). Here we show that is a consistent estimator for in model (2.1). For any fixed time point , define
(S.11) |
where and . From (2.5), we have
(S.12) |
where , and The estimating equation is the same as the centering approach. Consequently, let be the estimating function using the varying-coefficient model approach and be the true function. The proof of Theorem is the same as that of Theorem and is thus omitted. ∎
A.4 Proof of Theorem 4
Proof.
For any fixed time point let , and for with any given constant the estimating equation about becomes
(S.13) |
Let . By taking expected value of , we have
Let then by conditions (A2)-(A4) and Taylor expansion, we have By the law of large numbers, as where Under condition (A2), is the unique solution to solves the estimation equation By Andersen and Gill (1982), it follows that for any fixed .
We next show the asymptotic normality of Using and to denote the empirical measure and true probability measure respectively, we have
(S.14) |
For the term and any fixed time point we consider the class of functions
for given constants and Similarly the proof in Theorem , we obtain that for where and are some constants, the first term in (S.14) is equal to
For the second term on the right-hand side of equation (S.14), we have
For let
s are i.i.d. and then we have
let similar to the proof of Theorem we have By the proof of Theorem , conditions (A4) and (A8), we have
For we have
Consequently,
Now we show that follows the central limit theorem. The variance can be computed using
where
and
Thus
To prove the asymptotic normality, we verify the Lyapunov condition. Note that
then similar to the calculation of variance,
Consequently, by condition (A4) we have
Then we have
where
Therefore the conclusion of Theorem holds. ∎
Appendix B Additional simulations
In this section, we present simulation results for setting (i) and (Table S.1) by using two-step approach with a varying-coefficient model method at the first step and kernel weighting at the second step. We further show results for setting (ii) and using the proposed three methods (Table S.2).
BD | NP-F | Bias | SD | SE | CP | Bias | SD | SE | CP | Bias | SD | SE | CP | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Results for | Two-step (VCM+KW) | ||||||||||||||
auto | |||||||||||||||
auto | |||||||||||||||
Results for | Two-step (VCM+KW) | ||||||||||||||
auto | |||||||||||||||
auto |
-
•
Note: “BD” represents the bandwidths, where represents the bandwidth in the varying coefficient model (VCM) approach, and represent the bandwidths in the kernel weighting (KW) approach. “NP-F” represents the non-parametric function. “Bias” is the absolute bias. “SD” is the sample standard deviation. “SE” is the average of the standard error and “CP” is the point-wise coverage probability. “auto” represents automatic bandwidth selection.
BD | NP-F | Bias | SD | SE | CP | Bias | SD | SE | CP | Bias | SD | SE | CP | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Results for | Two-step (Centering+KW) | ||||||||||||||
auto | |||||||||||||||
auto | |||||||||||||||
Two-step (VCM+KW) | |||||||||||||||
auto | |||||||||||||||
auto | |||||||||||||||
One-step (KW) | |||||||||||||||
auto | |||||||||||||||
auto | |||||||||||||||
Results for | Two-step (Centering+KW) | ||||||||||||||
auto | |||||||||||||||
auto | |||||||||||||||
Two-step (VCM+KW) | |||||||||||||||
auto | |||||||||||||||
auto | |||||||||||||||
One-step (KW) | |||||||||||||||
auto | |||||||||||||||
auto |
-
•
Note: “BD” represents the bandwidths, where represents the bandwidth in the centering and varying coefficient model (VCM) approaches, and represent the bandwidths in the kernel weighting (KW) approach. “NP-F” represents the non-parametric function. “Bias” is the absolute bias. “SD” is the sample standard deviation. “SE” is the average of the standard error and “CP” is the point-wise coverage probability. “auto” represents automatic bandwidth selection.