
Dynamic Risk Prediction Triggered by Intermediate Events Using Survival Tree Ensembles

Yifei Sun, Sy Han Chiou, Colin O. Wu, Meghan McGarry, and Chiung-Yu Huang

Yifei Sun is Assistant Professor, Department of Biostatistics, Columbia University Mailman School of Public Health, New York, NY 10032 (Email: [email protected]). Sy Han Chiou is Assistant Professor, Department of Mathematical Sciences, University of Texas at Dallas (Email: [email protected]). Meghan McGarry is Assistant Professor, Department of Pediatrics, School of Medicine, University of California San Francisco (Email: [email protected]). Colin O. Wu is Mathematical Statistician, National Heart, Lung, and Blood Institute, National Institutes of Health (Email: [email protected]). Chiung-Yu Huang is Professor, Department of Epidemiology and Biostatistics, School of Medicine, University of California San Francisco, San Francisco, CA 94158 (Email: [email protected]).
Abstract

With the availability of massive amounts of data from electronic health records and registry databases, incorporating time-varying patient information to improve risk prediction has attracted great attention. To exploit the growing amount of predictor information over time, we develop a unified framework for landmark prediction using survival tree ensembles, where an updated prediction can be performed when new information becomes available. Compared to conventional landmark prediction with fixed landmark times, our methods allow the landmark times to be subject-specific and triggered by an intermediate clinical event. Moreover, the nonparametric approach circumvents the thorny issue of model incompatibility at different landmark times. In our framework, both the longitudinal predictors and the event time outcome are subject to right censoring, and thus existing tree-based approaches cannot be directly applied. To tackle the analytical challenges, we propose a risk-set-based ensemble procedure by averaging martingale estimating equations from individual trees. Extensive simulation studies are conducted to evaluate the performance of our methods. The methods are applied to the Cystic Fibrosis Foundation Patient Registry (CFFPR) data to perform dynamic prediction of lung disease in cystic fibrosis patients and to identify important prognostic factors.

KEY WORDS: Dynamic prediction, landmark analysis, multi-state model, survival tree, time-dependent predictors.

1 Introduction

Cystic fibrosis (CF) is a genetic disease characterized by a progressive, irreversible decline in lung function caused by chronic microbial infections of the airways. Despite recent advances in diagnosis and treatment, the burden of CF care remains high, and most patients succumb to respiratory failure. There is currently no cure for CF, so early prevention of lung disease in high-risk patients is essential for successful disease management. The goal of this research is to develop flexible and accurate event risk prediction algorithms for abnormal lung function in pediatric CF patients by exploiting the rich longitudinal information made available by the Cystic Fibrosis Foundation Patient Registry (CFFPR).

The CFFPR is a large electronic health record database that collects encounter-based records of over 300 unique variables on patients from over 120 accredited CF care centers in the United States (Knapp et al., 2016). The CFFPR contains detailed information on potential risk factors, including symptoms, pulmonary infections, medications, test results, and medical history. Analyses of the CFFPR suggest that the variability in spirometry measurements over time is highly predictive of subsequent lung function decline (Morgan et al., 2016). Moreover, the acquisition of chronic, mucoid, or multidrug-resistant subtypes of Pseudomonas aeruginosa (PA) leads to more severe pulmonary disease, accelerating the decline in lung function (McGarry et al., 2020). In this paper, the event of interest is the progressive loss of lung function, defined as the first time that the percent predicted forced expiratory volume in 1 second (ppFEV1) recorded in the CFFPR drops below 80%. Since risk factors such as weight and height in pediatric patients can change substantially over time, models with baseline predictors have limited potential for long-term prognosis. Incorporating repeated measurements and intermediate clinical events would reflect ongoing CF management and result in more accurate prediction.
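As a concrete illustration of this outcome definition, the following sketch derives a subject's event age as the first recorded age at which ppFEV1 falls below 80%. The helper function and the visit data are hypothetical, not drawn from the CFFPR.

```python
def first_decline_age(ages, ppfev1, threshold=80.0):
    """Return the first age at which ppFEV1 falls below `threshold`,
    or None if no observed measurement crosses it (illustrative helper)."""
    for age, value in zip(ages, ppfev1):
        if value < threshold:
            return age
    return None

# Hypothetical subject measured at ages 6-10 with declining lung function.
ages = [6.0, 7.0, 8.0, 9.0, 10.0]
ppfev1 = [95.2, 90.1, 84.7, 78.9, 72.3]
event_age = first_decline_age(ages, ppfev1)  # first crossing at age 9.0
```

In practice the event time would be interval-censored between visits; here, following the registry-based definition above, the first visit with ppFEV1 below 80% is taken as the event time.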

To incorporate longitudinal patient information in risk prediction, one major approach is joint modeling (see, for example, Rizopoulos, 2011; Taylor et al., 2013). Under the joint modeling framework, a longitudinal submodel for the time-dependent variables and a survival submodel for the time-to-event outcome are postulated; the submodels are typically linked via latent variables. Such a model formulation provides a complete specification of the joint distribution, based on which the survival probability given the history of longitudinal measurements can be derived. Most joint modeling methods consider a single, continuous time-dependent variable. Although attempts have been made to incorporate multiple time-dependent predictor variables (Proust-Lima et al., 2016; Wang et al., 2017), correct specification of the model forms for all the time-dependent covariates and their associations with the event outcome remains a major challenge. Moreover, it is not clear how existing joint modeling approaches can further incorporate information on multiple intermediate events, such as the acquisition of different subtypes of PA, in risk prediction.

Another major approach that can account for longitudinal predictors is landmark analysis, where models are constructed at pre-specified landmark times to predict the event risk in a future time interval. For example, at each landmark time, one may postulate a working Cox model with appropriate summaries of the covariate history up to the landmark time (e.g., last observed values) as predictors and then fit the Cox model using data from subjects who are at risk of the event. The estimation can be performed using either a separate model at each landmark time point or a supermodel for all landmark time points (van Houwelingen, 2007; van Houwelingen and Putter, 2008, 2011). This way, multiple and mixed-type time-dependent predictors can be easily incorporated. Moreover, to better exploit the repeated measurements and to handle measurement errors, one may also consider a two-stage landmark approach (Rizopoulos et al., 2017; Sweeting et al., 2017; Ferrer et al., 2019): in the first step, mixed-effects models are used to model the longitudinal predictors; in the second step, functions of the best linear unbiased prediction (BLUP) estimator of the random effects are included as predictors of the landmark Cox model. Other than Cox models, Parast et al. (2012) considered time-varying coefficient models to incorporate a single intermediate event and multiple biomarker measurements. Zheng and Heagerty (2005), Maziarz et al. (2017), and Zhu et al. (2019) further considered the impact of informative observation times of repeated measurements on future risk.
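The landmark data set construction described above can be sketched as follows: at a pre-specified landmark time, keep only subjects still at risk, reset the time origin to the landmark, and carry forward the last observed covariate value. The data layout and the `landmark_dataset` helper are hypothetical; the resulting data set would then be passed to a Cox model fitter.

```python
def landmark_dataset(subjects, s):
    """Build a landmark data set at landmark time s (illustrative).
    subjects: list of dicts with keys
      'Y'     -- observed follow-up time,
      'event' -- 1 if failure observed, 0 if censored,
      'obs'   -- list of (time, value) covariate measurements."""
    data = []
    for subj in subjects:
        if subj['Y'] < s:            # failed or censored before the landmark
            continue
        history = [v for (t, v) in subj['obs'] if t <= s]
        if not history:              # no measurement before the landmark
            continue
        data.append({'time': subj['Y'] - s,      # residual time after s
                     'event': subj['event'],
                     'last_value': history[-1]})  # last observed value
    return data

subjects = [
    {'Y': 10.0, 'event': 1, 'obs': [(1.0, 5.2), (4.0, 4.8)]},
    {'Y': 3.0,  'event': 0, 'obs': [(1.0, 6.1)]},   # censored before s = 5
]
lm = landmark_dataset(subjects, s=5.0)  # only the first subject remains
```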

Direct application of existing landmark analysis methods to the CFFPR data may not be ideal for the following reasons. First, imposing semiparametric working models at different landmark times may result in incompatible models and inconsistent predictions (Jewell and Nielsen, 1993; Rizopoulos et al., 2017). In other words, a joint distribution of predictors and event times that satisfies the models at all the landmark times simultaneously may not exist. Second, the specification of how the predictor history affects the future event risk may require deep clinical insight. For example, researchers have shown that various summaries of the repeated measurements, including the variability (Morgan et al., 2016), the rate of change (Mannino et al., 2006), and the area under the trajectory curve (Domanski et al., 2020), can serve as important predictors of disease risks, while the last observed value has been commonly used in the statistical literature. Therefore, nonparametric statistical learning methods are appealing in landmark prediction, because they require minimal model assumptions and have the potential to deal with a large number of complicated predictors. Tanner et al. (2021) applied super learners for landmark prediction in CF patients, where discrete-time survival analysis was conducted using machine learning algorithms for binary outcomes.

In this paper, we propose a unified framework for landmark prediction using survival tree ensembles, where the landmark times can be subject-specific. A subject-specific landmark can be defined by an intermediate clinical event that modifies patients’ risk profiles and triggers the need for an updated evaluation of future risk. In our application, the acquisition of chronic PA usually leads to accelerated deterioration in pulmonary function and serves as a natural landmark. When the landmark time is random, the number of observed predictors at the landmark time often varies across subjects, creating analytical challenges in fully utilizing the available information. Moreover, unlike static risk prediction models where baseline predictors are completely observed, the observation of the time-dependent predictors is further subject to right censoring. To tackle these problems, we propose a risk-set-based approach to handle the possibly censored predictors. To avoid the instability of a single tree, we propose a novel ensemble procedure based on averaging unbiased martingale estimating equations derived from individual trees. Our ensemble method differs from existing ensemble methods that directly average the cumulative hazard predictions and shows strong empirical performance in dealing with censored data.

The rest of this article is organized as follows. In Section 2, we introduce a landmark prediction framework that incorporates the repeated measurements and intermediate events. In Section 3, we propose tree-based ensemble methods to deal with censored predictors and outcomes. We propose a concordance measure to evaluate the prediction performance in Section 4 and define a permutation variable importance measure in Section 5. The proposed methods are evaluated by extensive simulation studies in Section 6 and are applied to the CFFPR data in Section 7. We conclude the paper with a discussion in Section 8.

2 Model Setup

In contrast to static risk prediction methods that output a conditional survival function given baseline predictors, dynamic landmark prediction focuses on the survival function conditioning on the predictor history up to the landmark time. Since history information involves complicated stochastic processes, challenges arise as to how to partition the history processes when applying tree-based methods. In what follows, we first introduce a generalized definition of the landmark survival function, starting from either a fixed or subject-specific landmark time. We then express the history information as a fixed-length predictor vector on which recursive partition can be applied.

Denote by $T$ a continuous failure event time and by $T_{\rm L}$ a landmark time. The landmark is selected a priori and is usually clinically meaningful. We allow $T_{\rm L}$ to be either fixed or subject-specific. We focus on the subpopulation that is free of the failure event at $T_{\rm L}$ and predict the risk after $T_{\rm L}$. Denote by $\bm{Z}$ the baseline predictors and by $\mathcal{H}(t)$ other information observed on $[0,t]$. Our goal is to predict the probability conditioning on all the available information up to $T_{\rm L}$, that is,

\[
P(T-T_{\rm L}\geq t\mid T\geq T_{\rm L},\,T_{\rm L},\,\mathcal{H}(T_{\rm L}),\,\bm{Z}). \tag{1}
\]

To illustrate the observed history $\mathcal{H}(t)$, we consider two types of predictors that are available in the CFFPR data. The first type is repeated measurements of time-dependent variables such as weight and ppFEV1. It is worthwhile to point out that both internal and external time-dependent predictors can be included, as only their history up to $T_{\rm L}$ will be used. We denote this type of predictors by $\bm{W}(t)$, a $q$-dimensional vector of time-dependent variables, and assume $\bm{W}(\cdot)$ is available at fixed time points $t_1,t_2,\ldots,t_K$. The observed history up to $t$ is $\mathcal{H}_W(t)=\{\bm{W}(s)dO(s),\,0<s\leq t\}$, where $O(t)$ is a counting process that jumps by one when $\bm{W}(\cdot)$ is measured (i.e., $dO(t_k)=1$ for $k=1,\ldots,K$). The second type is the timing of intermediate clinical events such as pseudomonas infections. Denote by $U_j$ the time to the $j$th intermediate event, $j=1,\ldots,J$. The observed history up to $t$ is $\mathcal{H}_U(t)=\{I(U_j\leq s),\,0<s\leq t,\,j=1,\ldots,J\}$. Collectively, we have a system of history processes $\mathcal{H}(t)=(\mathcal{H}_W(t),\mathcal{H}_U(t))$.

In our framework, both $t_k$ and $U_j$ can serve as landmark times. Due to the stochastic nature of $U_j$, the order of $\{t_k, U_j,\,k=1,\ldots,K,\,j=1,\ldots,J\}$ cannot be pre-determined. As a result, the number of available predictors at a given landmark time can vary across subjects. For illustration, we consider one fixed time point $t_1$ at age 7 and one intermediate event, chronic PA (cPA), whose occurrence time is denoted by $U_1$. Figure 1 depicts the observed data of two study subjects. At $t_1=7$, subject 1 has experienced cPA ($U_1=4.8\leq t_1$), while subject 2 remains free of cPA ($U_1=10.2>t_1$). Let $\bm{W}(t)$ be the body weight measured at time $t$ (i.e., $q=1$). The probabilities of interest are given as follows:

  1. (I)

    At a fixed landmark time $T_{\rm L}=t_1$, we predict the risk at $t_1+t$, $t>0$, among those who are at risk, that is, $T\geq t_1$. Note that subjects in the risk set may or may not have experienced the intermediate event prior to $t_1$. Given $\bm{W}(t_1)$ and the partially observed $U_1$, the conditional survival probability (1) can be reexpressed as

    \[
    \begin{cases}
    P(T\geq t+t_1\mid T\geq t_1,\,\bm{Z},\,\bm{W}(t_1),\,U_1), & \text{if } U_1\leq t_1,\\
    P(T\geq t+t_1\mid T\geq t_1,\,\bm{Z},\,\bm{W}(t_1),\,U_1>t_1), & \text{otherwise}.
    \end{cases}
    \]

    In other words, at $t_1$, we output the former for subjects who experienced the intermediate event prior to $t_1$ (subject 1, Figure 1(a)) and the latter for the others (subject 2, Figure 1(b)).

  2. (II)

    At a random landmark time $T_{\rm L}=U_1$, we predict the risk for subjects who have experienced the intermediate event and are free of the failure event. The predictor value $\bm{W}(t_1)$ is available only if $U_1\geq t_1$. In this case, we predict

    \[
    \begin{cases}
    P(T\geq t+U_1\mid T\geq U_1,\,\bm{Z},\,U_1), & \text{if } U_1\leq t_1,\\
    P(T\geq t+U_1\mid T\geq U_1,\,\bm{Z},\,\bm{W}(t_1),\,U_1), & \text{otherwise}.
    \end{cases}
    \]

    Therefore, at $U_1$, we output the former for subjects whose $\bm{W}(t_1)$ is measured after $U_1$ (subject 1, Figure 1(c)) and the latter for the others (subject 2, Figure 1(d)).

(a) Subject 1, landmark at $t_1=7$. (b) Subject 2, landmark at $t_1=7$. (c) Subject 1, landmark at $U_1$. (d) Subject 2, landmark at $U_1$.

Figure 1: Illustration of fixed and random landmark times. Each panel marks the landmark time point, the information available at the landmark time, the information unavailable at the landmark time, and the target prediction interval.

In the above example, the observed predictors vary across subjects. We then represent the history $\mathcal{H}(T_{\rm L})$ as a vector with a fixed length, so that tree-based methods can be applied to estimate the probability in (1). Define the complete predictors $\bm{X}=\{\bm{W}(t_1),\ldots,\bm{W}(t_K),U_1,\ldots,U_J\}$. The information in $\bm{X}$ may not be fully available at a given landmark time. We define the available information up to $t$ by $\bm{X}(t)=\{\bm{W}(t_1,t),\ldots,\bm{W}(t_K,t),U_1(t),\ldots,U_J(t)\}$, where

\[
\bm{W}(t_k,t)=\begin{cases}\bm{W}(t_k), & \text{if } t_k\leq t,\\ \textbf{NA}_q, & \text{otherwise},\end{cases}
\qquad
U_j(t)=\begin{cases}U_j, & \text{if } U_j\leq t,\\ t^+, & \text{otherwise}.\end{cases}
\]

By a slight abuse of notation, we write $U_j(t)=t^+$ if $U_j>t$ and $\bm{W}(t_k,t)=\textbf{NA}_q$ if $t_k>t$, with $\textbf{NA}_q\overset{\rm def}{=}({\rm NA},\ldots,{\rm NA})$ denoting a non-numeric $q$-dimensional vector. Here an NA value indicates that the covariate value is collected after the landmark time and thus should not be used for constructing prediction models. In other words, under our setting NA is treated as an attribute rather than missing data, as the target probability is not conditional on $\bm{W}(t_k)$ for $t_k>T_{\rm L}$. The covariate history $\mathcal{H}(T_{\rm L})$ can then be expressed as $\bm{X}(T_{\rm L})$, which is a $(qK+J)$-dimensional vector. This way, a covariate not being observed is itself predictive of the outcome, and the target survival probability function can be expressed as follows:

\[
S(t\mid a,\bm{z},\bm{x})=P(T\geq t+a\mid T\geq T_{\rm L}=a,\,\bm{Z}=\bm{z},\,\bm{X}(T_{\rm L})=\bm{x}). \tag{2}
\]

In the example depicted in Figure 1, we have $\bm{X}(t)=\{\bm{W}(t_1,t),U_1(t)\}$. For $T_{\rm L}=t_1$, the predictor values in Figures 1(a) and 1(b) correspond to $\bm{x}=(19.7,\,4.8)$ and $\bm{x}=(21.5,\,7^+)$, respectively. For $T_{\rm L}=U_1$, the predictor values in Figures 1(c) and 1(d) correspond to $\bm{x}=({\rm NA},\,4.8)$ and $\bm{x}=(21.5,\,10.2)$, respectively. Since the information at $T_{\rm L}$ involves left-bounded intervals and non-numeric values, applying semiparametric methods to estimate $S(t\mid a,\bm{z},\bm{x})$ is challenging. Hence we propose tree-based methods to handle partially observed predictors.
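A minimal sketch of this fixed-length encoding, using Python's `None` to play the role of NA and the string `'t+'` to mark an intermediate event not yet observed at the landmark (both representations are our own illustrative choices, mirroring Figure 1 with $K=1$, $q=1$, $J=1$):

```python
def available_predictors(t_land, t_k_list, W_vals, U_list):
    """Return X(t_land) = (W(t_1,t),...,W(t_K,t), U_1(t),...,U_J(t)).
    None encodes NA; the string 't+' encodes an event time exceeding
    the landmark (illustrative encoding)."""
    W_part = [w if tk <= t_land else None
              for tk, w in zip(t_k_list, W_vals)]
    U_part = [u if u <= t_land else 't+' for u in U_list]
    return W_part + U_part

# Subject 1 of Figure 1: weight 19.7 kg at t1 = 7, cPA at U1 = 4.8.
x_fixed = available_predictors(7.0, [7.0], [19.7], [4.8])    # [19.7, 4.8]
x_random = available_predictors(4.8, [7.0], [19.7], [4.8])   # [None, 4.8]
```

At the fixed landmark $t_1=7$ both predictors are observed; at the random landmark $U_1=4.8$ the age-7 weight is not yet measured and enters as NA, matching $\bm{x}=({\rm NA},\,4.8)$ above.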

3 Survival trees and ensembles for landmark prediction

At a given landmark time, we build a tree-based model to predict future event risk. Survival trees are popular nonparametric tools for risk prediction. The original survival trees take baseline predictors as input variables and output the survival probability conditioning on the baseline predictors (see, for example, Gordon and Olshen, 1985; Ciampi et al., 1986; Segal, 1988; Davis and Anderson, 1989; LeBlanc and Crowley, 1992, 1993; Zhang, 1995; Molinaro et al., 2004; Steingrimsson et al., 2016), and ensemble methods have been applied to address the instability of a single tree (Hothorn et al., 2004, 2006; Ishwaran et al., 2008; Zhu and Kosorok, 2012; Steingrimsson et al., 2019). However, existing methods cannot be directly applied, because the predictors in $\bm{X}$ are not completely observed at $T_{\rm L}$, and the available predictors $(T_{\rm L},\bm{X}(T_{\rm L}))$ are subject to right censoring. For the case without censoring, we introduce a partition scheme for subjects who are event-free at the landmark time in Section 3.1. To handle censored data, we propose risk-set methods to estimate the partition-based landmark survival probability in Section 3.2 and propose an ensemble procedure in Section 3.3.

3.1 Partition on partially observed predictors at the landmark time

A tree partitions the predictor space into disjoint subsets termed terminal nodes and assigns the same survival prediction to subjects that enter the same terminal node. In dynamic risk prediction, the population of interest is subjects who remain event-free at the landmark time, that is, those with $T\geq T_{\rm L}$. We use $\mathcal{T}=\{\tau_1,\tau_2,\ldots,\tau_M\}$ to denote a partition on the sample space of $(T_{\rm L},\bm{Z},\bm{X}(T_{\rm L}))\mid T\geq T_{\rm L}$, where $\tau_m$, $m=1,\ldots,M$, are the terminal nodes. The terminal nodes are formed recursively using binary partitions, by asking a sequence of yes-or-no questions. Existing implementations of trees usually do not handle mixtures of numeric and nominal variables. Since a variable in $\bm{X}(T_{\rm L})$ may take either numeric/ordinal values or NA, the conventional partition scheme needs to be extended. When a variable in $\bm{W}(t_k)$ is nominal, its counterpart in $\bm{W}(t_k,T_{\rm L})$ is also nominal and can be split applying existing approaches. In what follows, we focus on the case where the longitudinal marker measurements $\bm{W}(t_k)$ are numeric/ordinal.

When $T_{\rm L}$ is random, the partition is based on the variables in $(T_{\rm L},\bm{Z},\bm{X}(T_{\rm L}))$. We consider the following situations:

  1. (R1)

    When the $\bm{W}(t_k,T_{\rm L})$'s are the splitting variables, conventional splitting rules may not be directly applicable because these variables take NA values when $t_k>T_{\rm L}$. Suppose $W$ is an element of $\bm{W}(t_k,T_{\rm L})$ and $c$ is a cutoff value. We consider two possible splits, (a) $\{W>c\}$ versus $\{W\leq c$ or $W={\rm NA}\}$ and (b) $\{W>c$ or $W={\rm NA}\}$ versus $\{W\leq c\}$, and select the split that yields the larger improvement in the splitting criterion. We note that $W={\rm NA}$ (i.e., $T_{\rm L}<t_k$) is treated as an attribute rather than missing data.

  2. (R2)

    Conventional splitting rules cannot be directly applied to $U_j(T_{\rm L})$, because $U_j(T_{\rm L})$ can be $T_{\rm L}^+$ and the support of $U_j(T_{\rm L})$ is not an ordered set. To tackle this problem, we employ a set of transformed predictors $\{U_1(T_{\rm L})/T_{\rm L},\ldots,U_J(T_{\rm L})/T_{\rm L}\}$, which, together with $T_{\rm L}$, contains the same information as $\{U_1(T_{\rm L}),\ldots,U_J(T_{\rm L}),T_{\rm L}\}$. Then the rule $U_j(T_{\rm L})/T_{\rm L}>c$ can be applied for $c\in(0,1]$, and $U_j(T_{\rm L})/T_{\rm L}>1$ is equivalent to $U_j(T_{\rm L})=T_{\rm L}^+$.

The splitting rules in (R1) and (R2) can be implemented by transforming $\bm{X}(T_{\rm L})$ into a numeric/ordinal vector and passing the transformed predictors to a tree algorithm that uses conventional partition rules. First, we create two features for each element $W$ in $\bm{W}(t_k,T_{\rm L})$. The two features, denoted by $W^+$ and $W^-$, take the same value as $W$ when $W\neq{\rm NA}$ (i.e., $t_k\leq T_{\rm L}$), and take the extreme values $M$ and $-M$ otherwise, where $M$ is a large positive number outside the possible range of predictor values. Instead of using $W$, we use $W^+$ and $W^-$ as candidate variables for splitting. The partition "$\{W^+>c\}$ versus $\{W^+\leq c\}$" is equivalent to the type (b) partition "$\{W>c$ or $W={\rm NA}\}$ versus $\{W\leq c\}$", because observations satisfying $W={\rm NA}$ or $W>c$ are assigned to the same child node; similarly, splitting based on $W^-$ yields type (a) partitions. As an example, if the landmark is cPA and $W$ denotes the age-7 weight observed by the landmark time, then the rule "Is $W^+>20$?" is equivalent to "Is the weight at age 7 greater than 20 kg, or does the subject have cPA before age 7?". Implementing such a partition rule is equivalent to the "missings together" approach (Zhang et al., 1996) and missingness incorporated in attributes (Twala et al., 2008). Second, for each transformed intermediate event time $U_j(T_{\rm L})/T_{\rm L}$ $(j=1,\ldots,J)$, we replace the value $1^+$ with $M$. The details are described in Algorithm 1 (see Table 1). The partition scheme also applies to the trivial case of fixed landmark times. Figure 2 illustrates the proposed data preprocessing procedure.
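The preprocessing step can be sketched as follows; the helper name and the particular choice of $M$ are illustrative, and `None` again stands in for NA.

```python
M = 1e6  # a value outside the plausible range of predictor values

def preprocess(t_land, W_avail, U_list):
    """Map the partially observed predictors at the landmark time into a
    purely numeric feature vector: each element W of W(t_k, T_L) becomes
    two features (W+, W-) equal to (M, -M) when W is NA, and each
    intermediate event time becomes U_j(T_L)/T_L, with M substituted when
    the event has not occurred by T_L (illustrative sketch)."""
    features = []
    for w in W_avail:                      # w is None when NA
        if w is None:
            features.extend([M, -M])       # (W+, W-) for an NA value
        else:
            features.extend([w, w])        # both features equal W
    for u in U_list:
        features.append(u / t_land if u <= t_land else M)
    return features

# Subject 1 at the random landmark U1 = 4.8 (age-7 weight not yet seen):
f1 = preprocess(4.8, [None], [4.8])    # [M, -M, 1.0]
# Subject 2 at the fixed landmark t1 = 7 (cPA not yet acquired):
f2 = preprocess(7.0, [21.5], [10.2])   # [21.5, 21.5, M]
```

A split of the form $\{W^+>c\}$ then sends NA observations to the same child as large values of $W$, while $\{W^->c\}$ sends them to the same child as small values, reproducing the type (a) and (b) splits above.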

Landmark time $T_{\rm L}$ | ID | At-risk status $I(T\geq T_{\rm L})$ | $\bm{W}(t_1,T_{\rm L})$: observed | processed | $U_1(T_{\rm L})$: observed | processed
$t_1=7$ | 1 | 1 | 19.7 | (19.7, 19.7) | 4.8 | 4.8/7
$t_1=7$ | 2 | 1 | 21.5 | (21.5, 21.5) | $7^+$ | $M$
$t_1=7$ | 3 | 0 | – | – | – | –
$U_1$ | 1 | 1 | NA | ($M$, $-M$) | 4.8 | 4.8/4.8
$U_1$ | 2 | 1 | 21.5 | (21.5, 21.5) | 10.2 | 10.2/10.2
$U_1$ | 3 | 0 | – | – | – | –

Figure 2: Illustration of observed data and preprocessed data from three subjects at fixed and random landmark times (subject 1: cPA at $U_1=4.8$, $T=9.7$; subject 2: cPA at $U_1=10.2$, $T=11.7$; subject 3: $T=5.7$). At each landmark time $T_{\rm L}$, only subjects with $T\geq T_{\rm L}$ (subjects 1 and 2) are of interest. For each predictor that can take the value NA (e.g., $\bm{W}(t_1,T_{\rm L})$), we create two features that take the extreme values $M$ and $-M$ if the predictor is not observed. For partially observed intermediate events (e.g., $U_1(T_{\rm L})$), we first divide the event time by $T_{\rm L}$ and replace the value $1^+$ with $M$. After preprocessing, zero-variance features and duplicate features can be removed before running the tree algorithm.

The partition scheme described above guarantees that each individual has a well-defined pathway to determine its node membership. Given such a partition $\mathcal{T}$, one can define a partition function $l_{\mathcal{T}}(a,\bm{z},\bm{x})$, which returns the terminal node in $\mathcal{T}$ that contains $(a,\bm{z},\bm{x})$. We define the following partition-based survival function,

\[
S_{\mathcal{T}}(t\mid a,\bm{z},\bm{x})=P(T\geq t+a\mid T\geq T_{\rm L},\,(T_{\rm L},\bm{Z},\bm{X}(T_{\rm L}))\in l_{\mathcal{T}}(a,\bm{z},\bm{x})). \tag{3}
\]

The probability $S_{\mathcal{T}}(t\mid a,\bm{z},\bm{x})$ approximates the target function $S(t\mid a,\bm{z},\bm{x})$.

3.2 Partition-based estimation at the landmark time

The follow-up of a subject can be terminated by loss to follow-up or the end of the study. We now consider estimating $S_{\mathcal{T}}(t\mid a,\bm{z},\bm{x})$ with censored data. Denote by $C$ the censoring time and assume $C$ is independent of $(T_{\rm L},T,\bm{X})$ given $\bm{Z}$. We follow the convention to define $Y=\min(T,C)$ and $\Delta=I(T\leq C)$. We further define $Y_{\rm L}=\min(T_{\rm L},Y)$ and $\Delta_{\rm L}=I(T_{\rm L}\leq T,\,T_{\rm L}\leq C)$. Note that $\Delta_{\rm L}=1$ is the at-risk indicator at $T_{\rm L}$. For subjects who are free of the failure event at $T_{\rm L}$, one can observe $T_{\rm L}$ and $\bm{X}(T_{\rm L})$ when $\Delta_{\rm L}=1$. The training data are $\{(Y_i,\Delta_i,Y_{{\rm L}i},\Delta_{{\rm L}i},\bm{X}_i(Y_{{\rm L}i}),\bm{Z}_i),\,i=1,\ldots,n\}$, which are assumed to be independent and identically distributed replicates of $(Y,\Delta,Y_{\rm L},\Delta_{\rm L},\bm{X}(Y_{\rm L}),\bm{Z})$.
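A small sketch of these observed-data definitions (function name illustrative); in practice only $(Y,\Delta,Y_{\rm L},\Delta_{\rm L})$ would be observed, but computing them from latent $(T,C,T_{\rm L})$ clarifies the construction:

```python
def observed_quantities(T, C, T_L):
    """Compute (Y, Delta, Y_L, Delta_L) from the latent failure time T,
    censoring time C, and landmark time T_L, per the definitions above."""
    Y = min(T, C)                          # observed follow-up time
    Delta = int(T <= C)                    # failure observed?
    Y_L = min(T_L, Y)                      # observed landmark time
    Delta_L = int(T_L <= T and T_L <= C)   # at risk at the landmark?
    return Y, Delta, Y_L, Delta_L

# A subject who reaches the landmark at 7 and fails at 9.7 before
# censoring at 12:
obs = observed_quantities(T=9.7, C=12.0, T_L=7.0)   # (9.7, 1, 7.0, 1)
```

For a subject failing before the landmark (e.g., $T=5.7$, $T_{\rm L}=7$), $\Delta_{\rm L}=0$ and the subject contributes no landmark prediction, matching Figure 2.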

Define $N(t,a)=\Delta I(Y-a\leq t)$. Denote by $\lambda(t\mid a,\bm{z},\bm{x})$ the landmark hazard function, that is, $\lambda(t\mid a,\bm{z},\bm{x})dt=P(T-T_{\rm L}\in[t,t+dt)\mid T-T_{\rm L}\geq t,\,T_{\rm L}=a,\,\bm{Z}=\bm{z},\,\bm{X}(T_{\rm L})=\bm{x})$. The survival function $S(t\mid a,\bm{z},\bm{x})$ and the hazard function $\lambda(t\mid a,\bm{z},\bm{x})$ are in one-to-one correspondence: $S(t\mid a,\bm{z},\bm{x})=\exp\{-\int_0^t\lambda(u\mid a,\bm{z},\bm{x})du\}$. For $t>0$, we have

\begin{align*}
&E\{N(dt,Y_{\rm L})-I(Y-Y_{\rm L}\geq t)\lambda(t\mid a,\bm{z},\bm{x})dt\mid \Delta_{\rm L}=1,\,Y_{\rm L}=a,\,\bm{Z}=\bm{z},\,\bm{X}(Y_{\rm L})=\bm{x}\}\\
&\quad=E\{N(dt,a)-I(T\geq a+t,\,C\geq a+t)\lambda(t\mid a,\bm{z},\bm{x})dt\mid C\geq a,\,T\geq T_{\rm L}=a,\,\bm{Z}=\bm{z},\,\bm{X}(T_{\rm L})=\bm{x}\}\\
&\quad=E\{I(T-T_{\rm L}\in[t,t+dt))-I(T-T_{\rm L}\geq t)\lambda(t\mid a,\bm{z},\bm{x})dt\mid C\geq a,\,T\geq T_{\rm L}=a,\,\bm{Z}=\bm{z},\,\bm{X}(T_{\rm L})=\bm{x}\}\times P(C\geq a+t\mid C\geq a,\,\bm{Z}=\bm{z})\\
&\quad=0.
\end{align*}

Then we have

\[
\lambda(t\mid a,\bm{z},\bm{x})dt=\frac{E\{N(dt,Y_{\rm L})\mid \Delta_{\rm L}=1,\,Y_{\rm L}=a,\,\bm{Z}=\bm{z},\,\bm{X}(Y_{\rm L})=\bm{x}\}}{E\{I(Y-Y_{\rm L}\geq t)\mid \Delta_{\rm L}=1,\,Y_{\rm L}=a,\,\bm{Z}=\bm{z},\,\bm{X}(Y_{\rm L})=\bm{x}\}}. \tag{4}
\]

Conditioning on $\Delta_{\rm L}=1$ and $(Y_{\rm L},\bm{Z},\bm{X}(Y_{\rm L}))$, subjects with $Y-Y_{\rm L}\geq t$ can be viewed as a representative sample of the population with $T-T_{\rm L}\geq t$ for each $t>0$. Heuristically, the numerator and denominator in (4) can be estimated using partition-based estimators in the subsample with $\Delta_{\rm L}=1$. Given a partition $\mathcal{T}$, the function $S_{\mathcal{T}}(t\mid a,\bm{z},\bm{x})$ in (3) can be estimated by the following estimator,

\[
\widehat{S}_{\mathcal{T}}(t\mid a,\bm{z},\bm{x})=\exp\left\{-\int_0^t\frac{\sum_{i=1}^n\Delta_{{\rm L}i}I((Y_{{\rm L}i},\bm{Z}_i,\bm{X}_i(Y_{{\rm L}i}))\in l_{\mathcal{T}}(a,\bm{z},\bm{x}))N_i(du,Y_{{\rm L}i})}{\sum_{i=1}^n\Delta_{{\rm L}i}I((Y_{{\rm L}i},\bm{Z}_i,\bm{X}_i(Y_{{\rm L}i}))\in l_{\mathcal{T}}(a,\bm{z},\bm{x}),\,Y_i-Y_{{\rm L}i}\geq u)}\right\}. \tag{5}
\]

When a new subject is event-free at the landmark time $T_{{\rm L}0}$ with predictors $\bm{Z}_0$ and $\bm{X}_0(T_{{\rm L}0})$, the predicted survival probability based on a single tree is $\widehat{S}_{\mathcal{T}}(t\mid T_{{\rm L}0},\bm{Z}_0,\bm{X}_0(T_{{\rm L}0}))$. In practice, the partition $\mathcal{T}$ can be constructed via a recursive partition algorithm, and split-complexity pruning can be applied to determine the size of the tree (LeBlanc and Crowley, 1993). More details of the tree algorithm are given in the Supplementary Material (Sun et al., 2022).
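The estimator in (5) is, within each terminal node, a Nelson-Aalen-type estimator applied to the residual times $Y_i - Y_{{\rm L}i}$ of the at-risk ($\Delta_{\rm L}=1$) subjects. A minimal sketch for a single node, with a hypothetical data layout:

```python
import math

def node_survival(t, members):
    """Estimate S_T(t | node) as in (5): within a terminal node, apply a
    Nelson-Aalen-type estimator to residual times after the landmark.
    `members` lists (residual_time, delta) pairs for the Delta_L = 1
    subjects whose (Y_L, Z, X(Y_L)) falls in the node (illustrative)."""
    event_times = sorted({r for r, d in members if d == 1 and r <= t})
    cum_haz = 0.0
    for u in event_times:
        d_u = sum(1 for r, d in members if d == 1 and r == u)  # events at u
        n_u = sum(1 for r, d in members if r >= u)             # at risk at u
        cum_haz += d_u / n_u
    return math.exp(-cum_haz)

# Residual times Y_i - Y_Li for three in-node subjects (third censored):
members = [(2.7, 1), (4.7, 1), (6.0, 0)]
S3 = node_survival(3.0, members)   # one event by t = 3 among 3 at risk
```

With one event among three at-risk subjects by $t=3$, the node-level estimate is $\exp(-1/3)$; a new subject falling into this node at its landmark time receives this curve as its prediction.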

3.3 Survival tree ensembles based on martingale estimating equations

Since the prediction based on a single tree is often unstable, ensemble methods such as bagging (Breiman, 1996) and random forests (Breiman, 2001) have been commonly applied. The original random forests perform prediction for a new data point by averaging the predictions from a large number of trees, which are often grown sufficiently deep to achieve low bias (Friedman et al., 2001). However, for censored data, a large tree may result in a small number of observed failures in the terminal nodes, leading to increased estimation bias of the survival or cumulative hazard functions. Existing survival forests inherit this strategy from the original random forests and directly average the cumulative hazard predictions from the individual trees; the node size parameter therefore needs to be carefully tuned to achieve accurate prediction. On the other hand, if the target estimate can be expressed as the solution of an unbiased estimating equation, a natural alternative is to solve the averaged estimating equation. In what follows, we propose an ensemble procedure based on averaging martingale estimating equations.

For b=1,\ldots,B, we draw the bth bootstrap sample from the training data. Let \mathbb{T}=\{{\mathcal{T}}_{b}\}_{b=1}^{B} be a collection of B partitions constructed from the bootstrap datasets. Each partition is constructed via a recursive partitioning procedure where, at each split, m predictors are randomly selected as the candidate splitting variables, with m smaller than the total number of predictors. Let l_{b} be the partition function based on the partition {\mathcal{T}}_{b}. The tree-based estimate from {\mathcal{T}}_{b} can be obtained from the following estimating equation,

\sum_{i=1}^{n}w_{bi}I(({Y_{\rm L}}_{i},{\bm{Z}}_{i},{\bm{X}}_{i}({Y_{\rm L}}_{i}))\in l_{b}(a,{\bm{z}},{\bm{x}})){\Delta_{\rm L}}_{i}\{N_{i}(dt,{Y_{\rm L}}_{i})-I(Y_{i}-{Y_{\rm L}}_{i}\geq t)\lambda(t\mid a,{\bm{z}},{\bm{x}})dt\}=0,

where w_{bi} is the frequency of the ith observation in the bth bootstrap sample. Note that when w_{bi}=1 for all i=1,\ldots,n, solving the above estimating equation yields the estimator in (5). To perform prediction using all the trees, we consider the following averaged estimating equation,

\sum_{i=1}^{n}w_{i}(a,{\bm{z}},{\bm{x}})\{N_{i}(dt,{Y_{\rm L}}_{i})-I(Y_{i}-{Y_{\rm L}}_{i}\geq t)\lambda(t\mid a,{\bm{z}},{\bm{x}})dt\}=0,

where w_{i}(a,{\bm{z}},{\bm{x}})=\sum_{b=1}^{B}w_{bi}I\{({Y_{\rm L}}_{i},{\bm{Z}}_{i},{\bm{X}}_{i}({Y_{\rm L}}_{i}))\in l_{b}(a,{\bm{z}},{\bm{x}})\}{\Delta_{\rm L}}_{i}/B. Solving the averaged estimating equation yields

\widehat{S}_{\mathbb{T}}(t\mid a,{\bm{z}},{\bm{x}})=\exp\left\{-\int_{0}^{t}\frac{\sum_{i=1}^{n}w_{i}(a,{\bm{z}},{\bm{x}})N_{i}(ds,{Y_{\rm L}}_{i})}{\sum_{i=1}^{n}w_{i}(a,{\bm{z}},{\bm{x}})I(Y_{i}-{Y_{\rm L}}_{i}\geq s)}\right\}.

The estimator \widehat{S}_{\mathbb{T}}(t\mid a,{\bm{z}},{\bm{x}}) can be viewed as an adaptive nearest neighbour estimator (Lin and Jeon, 2006), where the weight assigned to each observation is derived from the random forest. The risk prediction algorithm is given in Table 1.
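The weighting construction just described can be sketched in Python as follows; the leaf-assignment arrays, bootstrap counts, and event data below are hypothetical stand-ins for the output of a fitted forest, not an actual implementation of the paper's software.

```python
import math

def forest_weights(leaf_train, leaf_new, boot_counts, delta_L):
    """Compute w_i(a, z, x) = (1/B) * sum_b w_bi * 1{subject i shares the
    terminal node of the new point in tree b} * Delta_Li.
    leaf_train[b][i]: terminal node of subject i in tree b;
    leaf_new[b]: terminal node of the new point (a, z, x) in tree b;
    boot_counts[b][i]: bootstrap frequency w_bi."""
    B, n = len(leaf_train), len(leaf_train[0])
    w = [0.0] * n
    for b in range(B):
        for i in range(n):
            if delta_L[i] == 1 and leaf_train[b][i] == leaf_new[b]:
                w[i] += boot_counts[b][i]
    return [wi / B for wi in w]

def ensemble_survival(w, residual_times, events, t):
    """Weighted Nelson-Aalen survival estimate obtained by solving the
    averaged martingale estimating equation."""
    cum = 0.0
    for u in sorted({rt for rt, d in zip(residual_times, events) if d and rt <= t}):
        num = sum(wi for wi, rt, d in zip(w, residual_times, events) if d and rt == u)
        den = sum(wi for wi, rt in zip(w, residual_times) if rt >= u)
        cum += num / den
    return math.exp(-cum)
```

Averaging the estimating equations (rather than the per-tree survival curves) pools the risk sets across trees before the hazard increment is formed, which is what makes deep trees with few within-node failures usable.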

Table 1: The data preprocessing and risk prediction algorithm

Input: \{(Y_{i},\Delta_{i},{Y_{{\rm L}i}},{\Delta_{\rm L}}_{i},{\bm{X}}_{i}({Y_{{\rm L}i}}),{\bm{Z}}_{i}),i=1,\ldots,n\}
Output: \widehat{S}_{\mathbb{T}}(t\mid a,{\bm{z}},{\bm{x}})

Function Transform(a,{\bm{x}}):
  /* The function transforms the longitudinal predictors {\bm{X}}({T_{\rm L}}) (i.e., {\bm{x}}) into a vector \widetilde{\bm{X}}({T_{\rm L}}) (i.e., \widetilde{\bm{x}}) such that the rules in (R1) and (R2) on {\bm{X}}({T_{\rm L}}) can be achieved using conventional tree-structured rules on \widetilde{\bm{X}}({T_{\rm L}}) */
  \widetilde{\bm{x}}\leftarrow{\rm vector}({\rm size}:2Kq+J)
  for l=1 to Kq+J do
    if 1\leq l\leq Kq and {\bm{x}}[l]\neq{\rm NA} then  /* observed longitudinal measurements */
      \widetilde{\bm{x}}[2l-1]\leftarrow{\bm{x}}[l]; \widetilde{\bm{x}}[2l]\leftarrow{\bm{x}}[l]
    else if 1\leq l\leq Kq and {\bm{x}}[l]={\rm NA} then  /* unobserved longitudinal measurements */
      \widetilde{\bm{x}}[2l-1]\leftarrow M; \widetilde{\bm{x}}[2l]\leftarrow-M
      /* The split \widetilde{\bm{x}}[2l-1]>c vs. \widetilde{\bm{x}}[2l-1]\leq c corresponds to ({\bm{x}}[l]>c or {\bm{x}}[l]={\rm NA}) vs. {\bm{x}}[l]\leq c; the split \widetilde{\bm{x}}[2l]>c vs. \widetilde{\bm{x}}[2l]\leq c corresponds to {\bm{x}}[l]>c vs. ({\bm{x}}[l]\leq c or {\bm{x}}[l]={\rm NA}) */
    else if Kq+1\leq l\leq Kq+J and {\bm{x}}[l]=a+ then  /* unobserved intermediate events */
      \widetilde{\bm{x}}[Kq+l]\leftarrow M
    else  /* observed intermediate events */
      \widetilde{\bm{x}}[Kq+l]\leftarrow{\bm{x}}[l]/a
    end if
    /* The split \widetilde{\bm{x}}[Kq+l]>c vs. \widetilde{\bm{x}}[Kq+l]\leq c corresponds to {\bm{x}}[l]/a>c vs. {\bm{x}}[l]/a\leq c */
  end for
  return \widetilde{\bm{x}}
end

Data preprocessing:
  for i=1 to n do
    if {\Delta_{\rm L}}_{i}=1 then \widetilde{\bm{X}}_{i}({Y_{\rm L}}_{i})\leftarrow Transform({Y_{\rm L}}_{i},{\bm{X}}_{i}({Y_{\rm L}}_{i}))
  end for
  \widetilde{\bm{x}}\leftarrow Transform(a,{\bm{x}})

Survival probability prediction:
  Initialize the weights: w_{i}\leftarrow 0, i=1,\ldots,n
  for b=1 to B do
    Draw the bth bootstrap sample from the training data
    Construct a partition {\mathcal{T}}_{b} using subjects with {\Delta_{\rm L}}=1 in the bth bootstrap sample:
      - the predictors are \{{Y_{\rm L}},{\bm{Z}},\widetilde{\bm{X}}({Y_{\rm L}})\} and the censored outcome is (Y-{Y_{\rm L}},\Delta);
      - at each split, a random selection of m predictors are used as the candidate splitting variables
    for i=1 to n do
      if {\Delta_{\rm L}}_{i}=1 and ({Y_{\rm L}}_{i},{\bm{Z}}_{i},\widetilde{\bm{X}}_{i}({Y_{\rm L}}_{i})) and (a,{\bm{z}},\widetilde{\bm{x}}) are in the same terminal node then
        w_{i}\leftarrow w_{i}+w_{bi}  /* w_{bi} is the frequency of the ith observation in the bth bootstrap sample */
      end if
    end for
  end for
  Predict the survival probability using
  \widehat{S}_{\mathbb{T}}(t\mid a,{\bm{z}},{\bm{x}})=\exp\left\{-\sum_{i=1}^{n}\frac{w_{i}\Delta_{i}I(Y_{i}-{Y_{\rm L}}_{i}\leq t)}{\sum_{j=1}^{n}w_{j}I(Y_{j}-{Y_{\rm L}}_{j}\geq Y_{i}-{Y_{\rm L}}_{i})}\right\}.

Algorithm 1: The survival tree ensemble algorithm
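The Transform step of the algorithm can be sketched in Python as follows; here None is used as a stand-in for NA (a missing longitudinal value) or for an intermediate event not yet observed by the landmark time a, and M is any constant exceeding all observed predictor values. This is an illustrative sketch, not the authors' implementation.

```python
def transform(a, x, Kq, J, M=1e9):
    """Map the raw predictors x (length Kq + J) into x_tilde (length
    2Kq + J) so that splitting rules handling missing values and
    intermediate events reduce to ordinary axis-aligned splits.
    None encodes NA / an intermediate event unobserved by time a."""
    xt = [0.0] * (2 * Kq + J)
    for l in range(1, Kq + J + 1):           # 1-based index as in the algorithm
        if l <= Kq:                          # longitudinal measurements
            if x[l - 1] is not None:         # observed value: duplicate it
                xt[2 * l - 2] = x[l - 1]
                xt[2 * l - 1] = x[l - 1]
            else:                            # NA: send the two copies to opposite extremes
                xt[2 * l - 2] = M            # split ">c" groups NA with the high side
                xt[2 * l - 1] = -M           # split ">c" groups NA with the low side
        else:                                # intermediate event times
            if x[l - 1] is None:             # not observed by the landmark time a
                xt[Kq + l - 1] = M
            else:
                xt[Kq + l - 1] = x[l - 1] / a  # scale by the landmark time
    return xt
```

For instance, with K = 1 monitoring time, q = 2 longitudinal markers, and J = 1 intermediate event, a subject with one missing marker gets the two sentinel values, and an observed intermediate event time is replaced by its ratio to the landmark time.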

    4 Evaluating the landmark prediction performance

To evaluate the performance of the predicted risk score, we extend the cumulative/dynamic receiver operating characteristic (ROC) curves (Heagerty et al., 2000), which have been commonly used when a risk score is based on baseline predictors. We note that ROC curves and concordance indices for dynamic prediction at a fixed landmark time have been studied in the literature (Rizopoulos et al., 2017; Wang et al., 2017). Here we consider a more general case where the landmark time {T_{\rm L}} is random and subject to right censoring. For t>0, subjects with 0\leq T-{T_{\rm L}}<t are considered as cases and subjects with T-{T_{\rm L}}\geq t are considered as controls. The ROC curve then evaluates the performance of a risk score that discriminates between subjects who experience the event prior to {T_{\rm L}}+t and those who do not.

Let g(t,{T_{\rm L}},{\bm{Z}},{\bm{X}}({T_{\rm L}})) denote a risk score based on ({T_{\rm L}},{\bm{Z}},{\bm{X}}({T_{\rm L}})), with a larger value indicating a higher chance of being a case. For each t>0, the true positive rate and false positive rate at a threshold c are defined as follows,

{\rm TPR}_{t}(c)=P(g(t,{T_{\rm L}},{\bm{Z}},{\bm{X}}({T_{\rm L}}))>c\mid 0\leq T-{T_{\rm L}}<t,\,{T_{\rm L}}\leq\tau_{0}),
{\rm FPR}_{t}(c)=P(g(t,{T_{\rm L}},{\bm{Z}},{\bm{X}}({T_{\rm L}}))>c\mid T-{T_{\rm L}}\geq t,\,{T_{\rm L}}\leq\tau_{0}),

    where τ0\tau_{0} is a pre-specified constant. The ROC curve at tt is defined as ROCt(p)=TPRt(FPRt1(p)){\rm ROC}_{t}(p)={\rm TPR}_{t}({\rm FPR}_{t}^{-1}(p)). Following the arguments of McIntosh and Pepe (2002), it can be shown that g(t,a,𝒛,𝒙)=1S(ta,𝒛,𝒙)g(t,a,{\bm{z}},{\bm{x}})=1-S(t\mid a,{\bm{z}},{\bm{x}}) yields the highest ROC curve, which justifies the use of the proposed time-dependent ROC curve. Moreover, the area under the ROC curve is equivalent to the following concordance measure,

{\rm CON}_{t}(g)=P\{g(t,{T_{\rm L}}_{1},{\bm{Z}}_{1},{\bm{X}}_{1}({T_{\rm L}}_{1}))<g(t,{T_{\rm L}}_{2},{\bm{Z}}_{2},{\bm{X}}_{2}({T_{\rm L}}_{2}))\mid 0\leq T_{2}-{T_{\rm L}}_{2}<t\leq T_{1}-{T_{\rm L}}_{1},\,{T_{\rm L}}_{1}\leq\tau_{0},\,{T_{\rm L}}_{2}\leq\tau_{0}\}
\quad+0.5\,P\{g(t,{T_{\rm L}}_{1},{\bm{Z}}_{1},{\bm{X}}_{1}({T_{\rm L}}_{1}))=g(t,{T_{\rm L}}_{2},{\bm{Z}}_{2},{\bm{X}}_{2}({T_{\rm L}}_{2}))\mid 0\leq T_{2}-{T_{\rm L}}_{2}<t\leq T_{1}-{T_{\rm L}}_{1},\,{T_{\rm L}}_{1}\leq\tau_{0},\,{T_{\rm L}}_{2}\leq\tau_{0}\},

    where (TL1,𝒁1,𝑿1(TL1),T1)({T_{\rm L}}_{1},{\bm{Z}}_{1},{\bm{X}}_{1}({T_{\rm L}}_{1}),T_{1}) and (TL2,𝒁2,𝑿2(TL2),T2)({T_{\rm L}}_{2},{\bm{Z}}_{2},{\bm{X}}_{2}({T_{\rm L}}_{2}),T_{2}) are independent pairs of observations, and the second term accounts for potential ties in the risk score.

In practice, one usually builds the model on a training dataset and evaluates its performance on an independent test dataset that is also subject to right censoring. To simplify notation, we construct the estimator for {\rm CON}_{t}(g) using the observed data introduced in Section 3.2, although {\rm CON}_{t} evaluated on the test data should be used in real applications. Define d_{ij}(t)=g(t,{Y_{\rm L}}_{j},{\bm{Z}}_{j},{\bm{X}}_{j}({Y_{\rm L}}_{j}))-g(t,{Y_{\rm L}}_{i},{\bm{Z}}_{i},{\bm{X}}_{i}({Y_{\rm L}}_{i})). The {\rm CON}_{t}(g) measure can be estimated by

\widehat{{\rm CON}}_{t}(g)=\frac{\sum_{i\neq j}\{I(d_{ij}(t)>0)+0.5I(d_{ij}(t)=0)\}\dfrac{\Delta_{j}I(0\leq Y_{j}-{Y_{\rm L}}_{j}\leq t<Y_{i}-{Y_{\rm L}}_{i},\,{Y_{\rm L}}_{i}\leq\tau_{0},\,{Y_{\rm L}}_{j}\leq\tau_{0})}{\widehat{S}_{C}(Y_{j}\mid{\bm{Z}}_{j})\widehat{S}_{C}({Y_{\rm L}}_{i}+t\mid{\bm{Z}}_{i})}}{\sum_{i\neq j}\dfrac{\Delta_{j}I(0\leq Y_{j}-{Y_{\rm L}}_{j}\leq t<Y_{i}-{Y_{\rm L}}_{i},\,{Y_{\rm L}}_{i}\leq\tau_{0},\,{Y_{\rm L}}_{j}\leq\tau_{0})}{\widehat{S}_{C}(Y_{j}\mid{\bm{Z}}_{j})\widehat{S}_{C}({Y_{\rm L}}_{i}+t\mid{\bm{Z}}_{i})}},

    where S^C(t𝒛)\widehat{S}_{C}(t\mid{\bm{z}}) is an estimator for the conditional censoring distribution SC(t𝒛)=P(Ct𝒁=𝒛)S_{C}(t\mid{\bm{z}})=P(C\geq t\mid{\bm{Z}}={\bm{z}}). For example, survival trees or forests can be applied to estimate SC(t𝒛)S_{C}(t\mid{\bm{z}}); when censoring is completely random, the Kaplan-Meier estimator can also be applied. Under regularity conditions, we show that CON^t(g)\widehat{{\rm CON}}_{t}(g) consistently estimates CONt(g){{\rm CON}}_{t}(g) in the Supplementary Material. In practice, one can either use the concordance at a given time point tt or an integrated measure to evaluate the overall concordance on a given time interval [tL,tU][t_{\rm L},t_{\rm U}]. In the latter case, a weighted average of concordance on a grid of time points in [tL,tU][t_{\rm L},t_{\rm U}] can be reported. For example, one may assign equal weights to all the time points. Another example is a weight proportional to the denominator in CON^t(g)\widehat{\rm CON}_{t}(g), that is, ijΔjI(0YjYLjt<YiYLi,YLiτ0,YLjτ0)/S^C(Yj𝒁j)S^C(YLi+t𝒁i)\sum_{i\neq j}{\Delta_{j}I(0\leq Y_{j}-{Y_{\rm L}}_{j}\leq t<Y_{i}-{Y_{\rm L}}_{i},{{Y_{\rm L}}_{i}\leq\tau_{0},{Y_{\rm L}}_{j}\leq\tau_{0}})}/{{\widehat{S}}_{C}(Y_{j}\mid{\bm{Z}}_{j}){\widehat{S}}_{C}({Y_{\rm L}}_{i}+t\mid{\bm{Z}}_{i})}, to avoid potentially unstable estimation for very small or large time points.
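A minimal Python sketch of the inverse-probability-of-censoring-weighted concordance estimator follows; here S_C is supplied as a function argument (e.g., a Kaplan-Meier estimate under completely random censoring), and the variable names are illustrative rather than taken from the paper's software.

```python
def concordance_t(t, g, Y, YL, delta, S_C, tau0=float('inf')):
    """IPCW estimate of CON_t(g): subject j is a case (event within t of
    the landmark), subject i a control (event-free at landmark + t);
    S_C(u) estimates the censoring survival function P(C >= u)."""
    num = den = 0.0
    n = len(Y)
    for j in range(n):
        # case condition: Delta_j = 1, 0 <= Y_j - Y_Lj <= t, Y_Lj <= tau0
        if not (delta[j] == 1 and 0.0 <= Y[j] - YL[j] <= t and YL[j] <= tau0):
            continue
        for i in range(n):
            # control condition: Y_i - Y_Li > t, Y_Li <= tau0
            if i == j or not (Y[i] - YL[i] > t and YL[i] <= tau0):
                continue
            wgt = 1.0 / (S_C(Y[j]) * S_C(YL[i] + t))   # IPCW weight
            d = g[j] - g[i]                            # d_ij(t): case minus control score
            num += ((d > 0) + 0.5 * (d == 0)) * wgt    # ties get half credit
            den += wgt
    return num / den if den > 0 else float('nan')
```

With no censoring (S_C identically 1), the estimator reduces to the usual proportion of concordant case-control pairs among eligible pairs.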

    5 Permutation variable importance

    Variable importance is a useful measure for understanding the impact of predictors in tree ensembles and can be used as a reference for variable selection (Breiman, 2001). In the original random forests, each tree is constructed using a bootstrap sample of the original data, and the out-of-bag (OOB) data can be used to estimate the OOB prediction performance. The permutation variable importance of a predictor is computed as the average decrease in model accuracy on the OOB samples when the respective feature values are randomly permuted.

    To study variable importance in dynamic risk prediction using censored data, we consider an extension of variable importance. Following the original random forests, the OOB prediction for each training observation is made based on trees constructed without using this observation. Applying the same arguments as in Section 4, the prediction based on trees built without the iith subject is

\widehat{S}_{-i}(t\mid a,{\bm{z}},{\bm{x}})=\exp\left\{-\int_{0}^{t}\frac{\sum_{k=1}^{n}w_{k,-i}(a,{\bm{z}},{\bm{x}})N_{k}(ds,{Y_{\rm L}}_{k})}{\sum_{k=1}^{n}w_{k,-i}(a,{\bm{z}},{\bm{x}})I(Y_{k}-{Y_{\rm L}}_{k}\geq s)}\right\},

where w_{k,-i}(a,{\bm{z}},{\bm{x}})=\sum_{b=1}^{B}w_{bk}I(({Y_{\rm L}}_{k},{\bm{Z}}_{k},{\bm{X}}_{k}({Y_{\rm L}}_{k}))\in l_{b}(a,{\bm{z}},{\bm{x}}),w_{bi}=0){\Delta_{\rm L}}_{k}. Define d_{ij}(t)=\widehat{S}_{-i}(t\mid{Y_{\rm L}}_{i},{\bm{Z}}_{i},{\bm{X}}_{i}({Y_{\rm L}}_{i}))-\widehat{S}_{-j}(t\mid{Y_{\rm L}}_{j},{\bm{Z}}_{j},{\bm{X}}_{j}({Y_{\rm L}}_{j})), so that a larger d_{ij}(t) indicates a higher predicted risk for subject j relative to subject i. The OOB concordance at t can be calculated by applying the estimator \widehat{{\rm CON}}_{t} in Section 4. To compute the variable importance of a predictor, we permute this predictor and calculate the OOB concordance after the permutation. We repeat the permutation multiple times (e.g., 100 times) and define the variable importance as the average difference in OOB concordance over all the permutations.

Permuting variables is straightforward in the case of fixed landmark times (i.e., {T_{\rm L}}=t_{k}), where we randomly shuffle the observed values of the predictor among individuals who remain under observation at the landmark time, that is, subjects with {\Delta_{\rm L}}=1. When the landmark time is random (i.e., {T_{\rm L}}=U_{j}), we propose different permutation procedures according to the type of the variable: (1) If the variable of interest is completely observed at the landmark time {T_{\rm L}} (e.g., the value of {T_{\rm L}} and the baseline covariates {\bm{Z}}), we randomly shuffle its values among subjects with {\Delta_{\rm L}}=1. (2) If the variable of interest is an intermediate event time U_{j'} (j'\neq j), we permute its value relative to the landmark time, that is, U_{j'}({T_{\rm L}})/{T_{\rm L}}, among subjects with {\Delta_{\rm L}}=1. We permute the ratio U_{j'}({T_{\rm L}})/{T_{\rm L}} rather than the untransformed intermediate event time to avoid incompatible pairs of the intermediate event time and the landmark time under permutation. (3) If the variable of interest is an element of {\bm{W}}(t_{k}), we permute its values among subjects with complete observations at the landmark time (i.e., {T_{\rm L}}\geq t_{k} and {\Delta_{\rm L}}=1), because {\bm{W}}(t_{k}) is not used in prediction for subjects with {T_{\rm L}}<t_{k}.
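The permutation scheme for an intermediate event time (case (2) above) can be sketched as follows: the ratios U_{j'}/T_L are shuffled among subjects with Delta_L = 1 and then mapped back to the time scale, so the permuted times remain compatible with each subject's landmark time. The function name and arguments are illustrative.

```python
import random

def permute_intermediate_event(U_prime, T_L, delta_L, rng=None):
    """Permute an intermediate event time U_j' for variable importance:
    shuffle the ratios U_j' / T_L among subjects with Delta_L = 1 and map
    them back, leaving subjects with Delta_L = 0 untouched."""
    rng = rng or random.Random()
    idx = [i for i, d in enumerate(delta_L) if d == 1]
    ratios = [U_prime[i] / T_L[i] for i in idx]    # relative event times
    rng.shuffle(ratios)                            # permute among Delta_L = 1
    out = list(U_prime)
    for i, r in zip(idx, ratios):
        out[i] = r * T_L[i]                        # back to the time scale
    return out
```

Shuffling raw times instead could assign a subject an intermediate event time exceeding their own landmark time, which is exactly the incompatibility the ratio avoids.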

    6 Simulation

Simulation studies were conducted to assess the performance of the proposed methods in estimating the landmark survival probability in (1) when the landmark time is fixed or random. In both cases, we generated the time-independent predictors {\bm{Z}}=(Z_{1},\ldots,Z_{10}) from a multivariate normal distribution with E(Z_{i})=1, {\rm Var}(Z_{i})=1, and {\rm Cov}(Z_{i},Z_{j})=0 for i\neq j, i,j=1,\ldots,10. The longitudinal predictors were observed intermittently at the time points specified below.

    In the first set of simulations, the longitudinal predictors 𝑾(){\bm{W}}(\cdot) were measured at fixed landmark time points tk=kt_{k}=k for k=1,,Kk=1,\ldots,K. The longitudinal predictors 𝑾(t)=(W1(t),,W10(t)){\bm{W}}(t)=(W_{1}(t),\ldots,W_{10}(t)) were generated from Wi(t)=aiF(bit)/tW_{i}(t)=a_{i}F(b_{i}t)/t, i=1,,10i=1,\ldots,10, where aia_{i} and bib_{i} are independent standard uniform random variables and F(x)=1exp(x2)F(x)=1-\exp(-x^{2}). The probability in (1) can be expressed as P(Ttk+tTtk,𝑾(t1),,𝑾(tk),𝒁)P(T\geq t_{k}+t\mid T\geq t_{k},{\bm{W}}(t_{1}),\ldots,{\bm{W}}(t_{k}),{\bm{Z}}) for 0=t0<t<tk+1tk0=t_{0}<t<t_{k+1}-t_{k}, k1k\geq 1, and P(Tt𝒁)P(T\geq t\mid{\bm{Z}}) for 0<t<t10<t<t_{1}. For k1k\geq 1, we assume the hazard function of TT on (tk,tk+1)(t_{k},t_{k+1}) depends on the history of 𝑾(){\bm{W}}(\cdot) up to tt only through its value at tkt_{k}. For t(tk,tk+1)t\in(t_{k},t_{k+1}) and k0k\geq 0, we consider the following hazard functions for TT,
(I) \lambda(t\mid{\mathcal{H}}_{W}(t),{\bm{Z}})=t^{2}\exp\left\{-5+\sum_{j=1}^{10}\alpha_{kj}W_{j}(t_{k})+\sum_{j=1}^{10}\beta_{kj}W_{j}(t_{k})Z_{j}+\sum_{j=1}^{3}Z_{j}^{2}\right\}, where \alpha_{kj}=\beta_{kj}=2I(k=1)+4I(k\geq 2) for 1\leq j\leq 3 and \alpha_{kj}=\beta_{kj}=0 for 4\leq j\leq 10;
(II) \lambda(t\mid{\mathcal{H}}_{W}(t),{\bm{Z}})=0.1t^{2}+\exp\left\{-5+\sum_{j=1}^{10}\alpha_{kj}W_{j}(t_{k})+\sum_{j=1}^{10}\beta_{kj}W_{j}(t_{k})Z_{j}+\sum_{j=1}^{3}Z_{j}^{2}\right\}, where \alpha_{kj}=\beta_{kj}=I(k=1)+2I(k\geq 2) for 1\leq j\leq 3 and \alpha_{kj}=\beta_{kj}=0 for 4\leq j\leq 10.
    The closed-form expressions of the true landmark survival probabilities under Models (I) and (II) are given in the Supplementary Material. When evaluating the prediction performance of different approaches, we focus on the landmark probability at t2=2t_{2}=2.

In the second set of simulations, we consider the case where both an intermediate event and longitudinal markers are present. Specifically, we generated the event times from an irreversible multi-state model with three states: healthy, diseased, and death. We assume that all subjects start in the healthy state, disease onset is the intermediate event, and death is the event of interest. We generated the time to the first transition, denoted by D, from a uniform distribution on [0, 5]. Define the disease indicator \Pi, where \Pi=1 indicates that the subject moves from the healthy state to the diseased state at time D, and \Pi=0 indicates that the subject moves from the healthy state to death at time D. The disease indicator was generated from a logistic regression model, \operatorname{logit}P(\Pi=1\mid{\bm{Z}},{\bm{W}}(D))=\sum_{j=1}^{3}W_{j}(D)+\sum_{j=1}^{3}Z_{j}+\gamma, where \gamma is a frailty variable following a gamma distribution with mean 1 and variance 0.5, {\bm{W}}(t)=(W_{1}(t),\ldots,W_{10}(t)) were generated from W_{j}(t)=a_{j}\{1-\exp(-0.04t^{2})\}, and a_{j} follows a uniform distribution on [-1,1] for j=1,\ldots,10. Given that a subject had developed the disease at time D, i.e., \Pi=1, the residual survival time R was generated from the following models,
(III) \log R=-5+\sum_{j=1}^{3}W_{j}(D)+\sum_{j=1}^{3}Z_{j}^{2}+\sum_{j=1}^{3}W_{j}(D)Z_{j}+\log(1+D)+\gamma+\epsilon, where \epsilon is a standard normal random variable;
(IV) \log R=-5+\sum_{j=1}^{3}W_{j}(D)+\sum_{j=1}^{3}Z_{j}^{2}+\sum_{j=1}^{3}W_{j}(D)Z_{j}+\log(1+D)+\gamma+I(Z_{1}>0)\epsilon_{1}+I(Z_{1}\leq 0)\epsilon_{2}, where \epsilon_{1} and \epsilon_{2} are independent normal random variables with variances 1 and 0.25, respectively;
(V) The hazard function of R is t^{2}\exp[-5+\sum_{j=1}^{3}\{2I(1\leq D<2)+4I(D\geq 2)\}\{W_{j}(D)+W_{j}(D)Z_{j}+Z_{j}^{2}\}].
    When Π=1\Pi=1, the time to death is T=D+RT=D+R, and the time to the intermediate event is U=DU=D; when Π=0\Pi=0, the time to death is T=DT=D and the intermediate event does not occur. Under Models (III)-(V), we consider the following three scenarios depending on how the landmark time and the longitudinal markers are observed:

(A) The landmark is the intermediate event, i.e., {T_{\rm L}}=U, and {\bm{W}}(\cdot) is observed intermittently at t_{k}=k, k=1,\ldots,5. The target probability in (1) is

P(T\geq U+t\mid T\geq U,U,{\bm{W}}(t_{1},U),\ldots,{\bm{W}}(t_{K},U),{\bm{Z}}).

(B) The landmark time is fixed at {T_{\rm L}}=a, and {\bm{W}}(\cdot) is observed at the intermediate event. The target probability is

\begin{cases}P(T-a\geq t\mid T\geq a,U,{\bm{W}}(U),{\bm{Z}}),&U\leq a,\\ P(T-a\geq t\mid T\geq a,U>a,{\bm{Z}}),&U>a.\end{cases}

(C) The landmark is the intermediate event, and {\bm{W}}(\cdot) is observed at the intermediate event. The target probability is P(T\geq U+t\mid T\geq U,U,{\bm{W}}(U),{\bm{Z}}).

Scenario (A) is motivated by the CFFPR data, where the longitudinal marker is regularly monitored. Scenarios (B) and (C) are motivated by applications where markers are observed when a disease is diagnosed. In Scenario (B), we set a=2. Due to the complicated relationship between the event times and the longitudinal markers, deriving closed-form expressions of the true probabilities under Models (III)-(V) is challenging. We outline the Monte Carlo method used to approximate the true probabilities in the Supplementary Material.
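As an illustration of the data-generating mechanism, the sketch below simulates one subject from the multi-state model under Model (III), following the distributions stated above (Z_i ~ N(1,1), a_j ~ Uniform(-1,1), gamma frailty with mean 1 and variance 0.5, D ~ Uniform(0,5)). It is a hedged reconstruction for exposition, not the authors' simulation code.

```python
import math
import random

def simulate_subject(rng):
    """One subject from the three-state generator under Model (III):
    healthy -> diseased (intermediate event) or healthy -> death."""
    Z = [rng.gauss(1.0, 1.0) for _ in range(10)]          # baseline predictors
    a = [rng.uniform(-1.0, 1.0) for _ in range(10)]
    W = lambda t: [aj * (1.0 - math.exp(-0.04 * t * t)) for aj in a]
    gamma = rng.gammavariate(2.0, 0.5)                    # mean 1, variance 0.5
    D = rng.uniform(0.0, 5.0)                             # time of first transition
    WD = W(D)
    eta = sum(WD[:3]) + sum(Z[:3]) + gamma                # logistic linear predictor
    if rng.random() < 1.0 / (1.0 + math.exp(-eta)):       # Pi = 1: disease onset at D
        logR = (-5.0 + sum(WD[:3]) + sum(z * z for z in Z[:3])
                + sum(w * z for w, z in zip(WD[:3], Z[:3]))
                + math.log(1.0 + D) + gamma + rng.gauss(0.0, 1.0))
        return {'U': D, 'T': D + math.exp(logR)}          # death after the disease
    return {'U': None, 'T': D}                            # death without the disease
```

Subjects with U = None never enter the diseased state, so under Scenario (A) or (C) they never reach the landmark and contribute with Delta_L = 0.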

For all scenarios, the censoring time was generated from an independent exponential distribution with rate c, where c was chosen to achieve either a 20% or 40% censoring rate at baseline. We simulated 500 training datasets with sample sizes of 200 and 400 at baseline. The results for large sample sizes (n=5000) are included in the Supplementary Material. The trees were constructed with a minimum terminal node size of 15. When a single tree was used for prediction, the size of the tree was determined by split-complexity pruning via ten-fold cross-validation. To grow the trees in the ensemble method, we randomly selected \left\lceil\sqrt{p}\right\rceil variables at each splitting step and did not prune the trees. We applied the log-rank splitting rule in the ensemble method in order to compare the martingale-based ensemble approach with the default ensemble approach in random survival forests. Each fitted model was evaluated on independent test data with 500 observations. The evaluation criteria were the integrated mean absolute error (IMAE), the integrated mean squared error (IMSE), the integrated Brier score (IBS), and the integrated concordance over [0,t_{0}], where t_{0} is set to be approximately the 90th percentile of T-{T_{\rm L}}. The integrated concordance was defined in Section 4, and the other criteria are defined as follows:

{\rm IMAE}=\sum_{i=1}^{500}\int_{0}^{t_{0}}\left|\widehat{S}(t\mid T^{0}_{{\rm L}i},{\bm{Z}}_{i}^{0},{\mathcal{H}}_{i}^{0}(T^{0}_{{\rm L}i}))-S(t\mid T^{0}_{{\rm L}i},{\bm{Z}}_{i}^{0},{\mathcal{H}}_{i}^{0}(T^{0}_{{\rm L}i}))\right|dt/500,
{\rm IMSE}=\sum_{i=1}^{500}\int_{0}^{t_{0}}\left\{\widehat{S}(t\mid T^{0}_{{\rm L}i},{\bm{Z}}_{i}^{0},{\mathcal{H}}_{i}^{0}(T^{0}_{{\rm L}i}))-S(t\mid T^{0}_{{\rm L}i},{\bm{Z}}_{i}^{0},{\mathcal{H}}_{i}^{0}(T^{0}_{{\rm L}i}))\right\}^{2}dt/500,
{\rm IBS}=\sum_{i=1}^{500}\int_{0}^{t_{0}}\left\{\widehat{S}(t\mid T^{0}_{{\rm L}i},{\bm{Z}}_{i}^{0},{\mathcal{H}}_{i}^{0}(T^{0}_{{\rm L}i}))-I(T^{0}_{i}-{T_{\rm L}}^{0}_{i}\geq t)\right\}^{2}dt/500,

    where S(tTL,𝒁,(TL))=P(TTLtTTL,TL,𝒁,(TL))S(t\mid{T_{\rm L}},{\bm{Z}},{\mathcal{H}}({T_{\rm L}}))=P(T-{T_{\rm L}}\geq t\mid T\geq{T_{\rm L}},{T_{\rm L}},{\bm{Z}},{\mathcal{H}}({T_{\rm L}})) is the target survival probability, and the superscript 0 is used to denote the test data.
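These criteria can be approximated by numerical integration on a grid; the sketch below computes one test subject's contribution to the IMAE, IMSE, and IBS (averaging over the 500 test subjects then amounts to summing the contributions and dividing by 500). The function names and the Riemann-sum grid are illustrative choices.

```python
def integrated_errors(S_hat, S_true, residual_time, t_grid):
    """One test subject's contribution to the IMAE, IMSE, and IBS on an
    equally spaced grid over [0, t0]; S_hat and S_true map t to the
    predicted and true survival of the residual time T - T_L, and
    residual_time is the subject's observed T0 - T_L0."""
    dt = t_grid[1] - t_grid[0]                     # grid spacing (Riemann sum)
    imae = sum(abs(S_hat(t) - S_true(t)) for t in t_grid) * dt
    imse = sum((S_hat(t) - S_true(t)) ** 2 for t in t_grid) * dt
    ibs = sum((S_hat(t) - (1.0 if residual_time >= t else 0.0)) ** 2
              for t in t_grid) * dt
    return imae, imse, ibs
```

The IMAE and IMSE require knowing the true survival curve (available in simulations), whereas the IBS compares the prediction with the observed at-risk indicator and is therefore also computable on real data.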

For comparison, we applied the conventional random survival forest implemented in the R package ranger (Wright and Ziegler, 2017) and the simple landmark Cox model in all scenarios; we also applied the two-stage landmark approach in Models (I) and (II). We used the proposed data preprocessing procedure to prepare the predictors for the conventional random survival forest but calculated the predicted survival probabilities by its default (i.e., by averaging the cumulative hazard predictions). Under Models (I) and (II), the predictors in the simple landmark Cox model are \{{\bm{W}}(t_{1}),\ldots,{\bm{W}}(t_{k}),{\bm{Z}}\}. Under Models (III)-(V), the predictors in the simple landmark Cox model are \{U,{\bm{Z}},{\bm{W}}(t_{k})I(t_{k}\leq U),I(t_{k}>U);k\geq 1\}, \{{\bm{Z}},UI(U\leq a),{\bm{W}}(U)I(U\leq a),I(a>U)\}, and \{U,{\bm{Z}},{\bm{W}}(U)\} in sub-scenarios (A), (B), and (C), respectively. For the two-stage landmark approach, we fit a separate linear mixed model for each variable in {\bm{W}}(\cdot): each model includes a time variable and a random intercept and was estimated using the repeated measurements from subjects at risk at the landmark time. The predictors of the landmark Cox model then include the BLUPs of the random effects and {\bm{Z}}.

The simulation results are summarized in Tables 2 and 3, in which the proposed ensemble method outperforms the others under all four evaluation criteria we considered. As expected, when the sample size increases from 200 to 400, the IMAEs, IMSEs, and IBSs of the proposed methods decrease, while the integrated concordance increases. The conventional random survival forest approach performs similarly to the proposed ensemble method when n=200 but loses its edge as the sample size increases. On the other hand, the simple landmark Cox model and the two-stage landmark approach yield similar results under Model (I), but the former yields better results under Model (II). When comparing the tree models with the Cox models, we observe that a small IMSE is not necessarily accompanied by a large integrated concordance. We conjecture this is because the concordance measure depends only on the order of the predicted survival probabilities and is less sensitive to risk calibration. In summary, the proposed ensemble method performs strongly and serves as an appealing tool for dynamic risk prediction.

Table 2: Summaries of integrated mean absolute error (IMAE) (×1000), integrated mean squared error (IMSE) (×1000), integrated Brier score (IBS) (×1000), and integrated concordance (ICON) (×1000) of different methods in the first set of simulations. cen is the censoring percentage; Tr: the proposed tree; E1: the proposed survival tree ensemble; E2: the original random survival forest; C1: the landmark Cox model; C2: the two-stage landmark Cox model.
    IMAE IMSE IBS ICON
    nn cen Tr E1 E2 C1 C2 Tr E1 E2 C1 C2 Tr E1 E2 C1 C2 Tr E1 E2 C1 C2
    Scenario (I)
    200 20% 213 196 219 220 216 74 64 71 91 91 192 183 196 208 208 508 642 634 572 624
    40% 215 202 222 236 239 76 68 74 106 112 185 177 188 212 217 509 621 615 562 607
    400 20% 201 172 207 204 197 65 48 63 74 66 181 165 188 192 184 515 694 687 593 655
    40% 205 178 212 209 206 70 51 66 80 73 176 160 180 188 181 526 675 666 586 647
    Scenario (II)
    200 20% 73 76 83 139 175 13 13 14 38 58 172 172 182 196 214 531 537 537 533 534
    40% 76 79 86 192 236 13 14 15 69 98 161 161 171 208 235 530 538 536 536 536
    400 20% 64 59 81 91 114 11 10 13 19 26 164 160 172 177 184 530 547 546 539 541
    40% 62 64 83 113 145 10 11 14 26 41 156 150 159 172 184 531 544 542 536 538
    Table 3: Summaries of integrated mean absolute error (IMAE) (×1000)(\times 1000), integrated mean squared error (IMSE) (×1000)(\times 1000), integrated Brier score (IBS) (×1000)(\times 1000), and integrated concordance (ICON) (×1000)(\times 1000) of different methods in the second set of simulations. cen is the censoring percentage; Tr: the proposed tree; E1: the proposed survival tree ensemble; E2: the original random survival forest; C1: the landmark Cox model.
    IMAE IMSE IBS ICON
    nn cen Tr E1 E2 C1 Tr E1 E2 C1 Tr E1 E2 C1 Tr E1 E2 C1
    Scenario (III-A)
    200 20% 192 167 178 310 84 72 79 237 98 79 86 260 816 902 900 531
    40% 266 194 198 405 114 85 93 319 140 81 87 323 610 906 906 530
    400 20% 173 154 169 255 68 61 69 185 70 67 78 210 904 909 909 544
    40% 204 178 191 323 85 76 84 239 74 64 76 241 882 917 915 523
    Scenario (III-B)
    200 20% 234 216 219 249 94 81 89 122 143 132 133 161 544 713 700 614
    40% 245 228 215 268 99 83 87 138 144 133 133 169 545 652 645 584
    400 20% 212 181 192 213 84 65 70 92 132 116 126 133 572 764 760 646
    40% 224 196 193 226 92 67 68 101 135 117 127 136 584 708 703 619
    Scenario (III-C)
    200 20% 236 214 228 259 108 93 97 125 150 138 138 168 524 822 713 643
    40% 253 233 256 292 121 106 110 150 160 147 147 187 522 783 678 623
    400 20% 232 187 206 236 95 66 85 98 138 113 128 143 550 901 810 688
    40% 247 204 233 258 106 77 94 114 146 122 134 153 574 872 777 669
    Scenario (IV-A)
    200 20% 183 163 170 311 80 69 69 238 94 80 81 261 844 902 876 519
    40% 248 191 195 411 109 83 81 324 127 81 83 329 680 906 871 526
    400 20% 168 150 165 258 74 59 66 187 79 67 78 209 906 909 879 510
    40% 197 173 187 325 90 72 79 241 83 64 76 240 885 915 876 510
    Scenario (IV-B)
    200 20% 199 180 181 206 73 62 63 91 142 132 134 161 543 717 706 615
    40% 198 178 177 216 72 59 60 98 144 133 133 168 549 655 648 585
    400 20% 184 153 156 190 65 49 53 72 130 115 125 132 581 770 770 647
    40% 186 152 157 188 66 46 50 72 134 117 127 135 580 712 707 620
    Scenario (IV-C)
    200 20% 269 245 243 263 114 98 103 130 150 136 137 167 520 828 833 649
    40% 299 276 275 297 128 112 117 155 159 146 146 185 522 789 789 631
    400 20% 252 209 212 244 101 81 90 114 147 111 125 151 552 904 897 694
    40% 278 239 240 274 114 93 100 131 155 120 132 162 571 878 871 674
    Scenario (V-A)
    200 20% 212 185 190 349 89 75 78 311 133 112 115 333 688 832 824 534
    40% 229 208 215 368 103 90 92 350 115 100 103 316 707 815 820 526
    400 20% 193 166 179 377 75 66 69 310 107 105 105 362 828 859 843 534
    40% 220 190 206 427 91 83 84 365 95 92 93 347 810 847 842 539
    Scenario (V-B)
    200 20% 180 158 154 188 54 42 42 70 161 150 151 175 564 672 677 572
    40% 189 164 164 200 62 46 47 83 147 135 138 164 569 650 651 566
    400 20% 168 136 141 173 53 31 35 55 160 135 143 161 566 715 710 599
    40% 177 141 149 178 63 33 39 62 147 120 128 146 604 691 685 589
    Scenario (V-C)
    200 20% 234 208 207 239 89 71 71 108 157 140 141 174 527 747 733 580
    40% 242 222 223 257 101 80 81 128 145 129 130 167 549 712 694 568
    400 20% 219 179 181 226 83 54 55 91 151 125 127 159 581 827 812 609
    40% 219 192 194 237 90 61 62 101 137 114 112 146 631 808 797 595

    7 Application to the Cystic Fibrosis Foundation Patient Registry Data

    Understanding the risk factors associated with the progressive loss of lung function is crucial in managing CF. The risk for lung disease depends on patient characteristics, and certain patient groups, such as Hispanic patients, are at increased risk of severe disease for reasons not yet known (McGarry et al., 2019). Our goal is to build prediction models for the development of moderate airflow limitation, defined as the first time that ppFEV1 drops below 80% in the CFFPR. Our analysis focused on 5,398 pediatric CF patients who were diagnosed before age one between 2008 and 2013; among them, 419 were Hispanic. The data were subject to right censoring due to loss to follow-up or administrative censoring. A total of 4,507 failure events were observed, with a median follow-up time of 7.71 years.

    The rich information in the CFFPR data makes possible a comprehensive evaluation of important risk factors. We considered baseline predictors including gender, ethnicity, maternal education status ($\geq$16 years of education vs. else), insurance status, geographic location (West, Midwest, Northeast, and South), and mutation class (severe, mild, and unknown). Since baseline factors may have limited predictive ability, we further included repeated measurements and intermediate events as predictors. The longitudinal measurements, ppFEV1, percent predicted forced vital capacity (ppFVC), weight, and height, were assessed regularly throughout the study and were annualized at integer ages via last observation carried forward. The intermediate events include different subtypes of PA (initial acquisition, mucoid, chronic, multidrug-resistant), methicillin-sensitive Staphylococcus aureus (MSSA), methicillin-resistant Staphylococcus aureus (MRSA), and the diagnoses of CF-related diabetes (CFRD) and pancreatic insufficiency.
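The annualization step described above can be sketched in a few lines. Here `annualize_locf` is a hypothetical helper (not code from the paper's software) that carries the last observed value of a marker forward to each integer age, returning a missing value for ages before the first measurement:

```python
from bisect import bisect_right

def annualize_locf(times, values, ages):
    """Annualize an irregularly measured marker at integer ages using
    last observation carried forward (LOCF): for each target age, take
    the most recent measurement at or before that age, or None if no
    measurement has been observed yet. `times` must be sorted."""
    out = []
    for age in ages:
        i = bisect_right(times, age)
        out.append(values[i - 1] if i > 0 else None)
    return out

# Example: a marker measured at ages 6.2, 6.9, and 8.4.
times = [6.2, 6.9, 8.4]
marker = [95.0, 92.0, 88.0]
print(annualize_locf(times, marker, ages=[6, 7, 8, 9]))
# [None, 92.0, 92.0, 88.0]
```

The `None` placeholder at age 6 mirrors the fact that a marker carries no information before its first observation.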

    For landmark prediction, we considered the following fixed and random landmark times:

    1. (LM1)

      The landmark time is age 7, with the target prediction interval $[7, 22]$.

    2. (LM2)

      The landmark time is age 12, with the target prediction interval $[12, 22]$.

    3. (LM3)

      The landmark time is the acquisition of chronic Pseudomonas aeruginosa (cPA), and the target prediction interval is from the time of acquiring cPA to age 22.

    The fixed landmark ages 7 and 12 correspond to middle childhood and preadolescence. The random landmark event cPA was considered because patients with cPA are more likely to develop increased inflammation, leading to an accelerated loss of lung function (Kamata et al., 2017). The median time to cPA in our dataset was 15.5 years. While PA is a frequent pathogen in cystic fibrosis, early PA is eradicated in the majority of patients through inhaled and intravenous antibiotics (Döring et al., 2004; Heltshe et al., 2018). When initial PA is not eradicated, it converts to chronic PA. Since initial PA is frequently eradicated and thus has limited long-term impact on pulmonary function, we chose cPA as the landmark event (Harun et al., 2016). A model using initial PA as the landmark is reported in the Supplementary Material. To perform risk prediction, we used Model (LM1) to obtain the future risk for a patient who is event-free at age 7. An updated prediction can be carried out using Model (LM2) if the patient remains event-free at age 12. Upon converting to cPA, the predicted risk can be updated using Model (LM3). Although landmark prediction models can be constructed at multiple time points, we suggest that practitioners focus on the predicted probabilities given by the landmark model closest in time. To illustrate the event history and landmark times, we show the occurrence of PA during the follow-up period for a random sample of 50 patients in the Supplementary Material. At each landmark time, the exact timing of an intermediate event is available only if the event has occurred before the landmark time.

    The data were partitioned into a training set (60%) and a test set (40%). The landmark prediction models were built using the training data and evaluated on the test data via the proposed concordance measure. Results from the simple landmark Cox model and the two-stage approach are also reported for comparison. As in conventional landmark prediction, the Cox models were constructed using subjects who remained event-free and uncensored at each landmark time. In the Cox models, the $j$th partially observed intermediate event was incorporated via two predictors, $U_{j}I(U_{j}\leq T_{\rm L})$ and $I(U_{j}>T_{\rm L})$. In the simple landmark Cox model, the partially observed repeated measurements at $t_{k}$ were expressed using ${\bm W}(t_{k})I(t_{k}\leq T_{\rm L})$ and $I(t_{k}>T_{\rm L})$ under Model (LM3). The models were built as described in the simulation section.
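The two-predictor encoding of a partially observed intermediate event time $U_j$ at landmark time $T_{\rm L}$ can be sketched as follows; `encode_intermediate_event` is a hypothetical helper illustrating the construction, not code from the paper:

```python
def encode_intermediate_event(u, t_landmark):
    """Encode a partially observed intermediate event time U at landmark
    time T_L via the two predictors used in the landmark Cox models:
    U*I(U <= T_L), the observed event time (0 if not yet observed), and
    I(U > T_L), an indicator that the event has not occurred by T_L.
    `u` may be float('inf') for subjects with no event by the landmark."""
    occurred = u <= t_landmark
    return (u if occurred else 0.0, 0 if occurred else 1)

# A subject who acquired the event at age 5, evaluated at landmark age 7:
print(encode_intermediate_event(5.0, 7.0))           # (5.0, 0)
# A subject still event-free at the landmark:
print(encode_intermediate_event(float('inf'), 7.0))  # (0.0, 1)
```

The indicator component lets the model distinguish "event not yet observed" from "event observed at time 0", so the zero in the first component is never misread as an actual event time.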

    The concordance measures are summarized in Table 4. For landmark prediction at ages 7 and 12, we report the average ${\rm CON}_{t}$ at 50 equally spaced time points on the target prediction intervals. For prediction at the landmark age of 12, both Models (LM1) and (LM2) can be applied: (LM1) uses history up to age 7, while (LM2) uses history up to age 12. As expected, incorporating the additional information between ages 7 and 12 increases the average concordance from 0.711 to 0.739 in our ensemble model. For the landmark model at cPA, we used the concordance at a time horizon of 5 years after cPA as the evaluation criterion. Since we focus on the risk prior to age 22, predicting the 5-year risk for individuals who acquired the chronic form of PA after age 17 is not feasible; therefore, the concordance was evaluated in the subsample of subjects who developed cPA before age 17. The ensemble method yielded better performance than its competitors.

    Table 4: Concordance measures evaluated using the test data in the CFFPR analysis. The column Landmark gives the left bound of the target prediction interval. The integrated concordance is reported for Models (LM1) and (LM2) over the interval where the risk prediction is performed, where $\widehat{\rm CON}_{[a,b]}=\sum_{j=1}^{50}\widehat{\rm CON}_{t_{j}}/50$ and $t_{j}=a+(b-a)j/50$. The concordance at year 5 after cPA is reported for Model (LM3).
    Landmark Measure Model Tree Ensemble Cox-1 Cox-2
    Age 7 $\widehat{\rm CON}_{[0,15]}$ LM1 0.654 0.748 0.695 0.699
    Age 12 $\widehat{\rm CON}_{[0,10]}$ LM2 0.667 0.739 0.674 0.674
    Age 12 $\widehat{\rm CON}_{[0,10]}$ LM1 0.620 0.711 0.611 0.669
    cPA $\widehat{\rm CON}_{5}$ LM3 0.763 0.813 0.788 -

    Note: Cox-1 stands for the simple one-stage landmark Cox models, and Cox-2 stands for the two-stage landmark Cox models.
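The integrated concordance reported for Models (LM1) and (LM2) is a simple average of the time-dependent concordance over 50 equally spaced grid points on the prediction interval. A minimal sketch, with `con_at` standing in for an arbitrary (hypothetical) estimator of ${\rm CON}_{t}$:

```python
def integrated_concordance(con_at, a, b, n_points=50):
    """Average the time-dependent concordance CON_t over n_points equally
    spaced evaluation times t_j = a + (b - a) * j / n_points, j = 1..n_points.
    `con_at` is any callable returning a concordance estimate at time t;
    here it is a stand-in for the estimator described in the paper."""
    ts = [a + (b - a) * j / n_points for j in range(1, n_points + 1)]
    return sum(con_at(t) for t in ts) / n_points

# Sanity check: a constant concordance averages to that constant.
print(integrated_concordance(lambda t: 0.5, a=0.0, b=15.0))  # 0.5
```

Note the grid starts at $t_1 = a + (b-a)/50$ and ends at $t_{50} = b$, matching the formula in the Table 4 caption.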

    To identify important predictors in the ensemble models, we computed the permutation variable importance with 100 permutations. Under the fixed landmark time models, we permuted all repeated measurements of a marker simultaneously to evaluate the overall impact of the longitudinal marker. The results are summarized in Figure 3, where ppFEV1, ppFVC, weight, and height are identified as the top four predictors for both landmark ages 7 and 12 (Figures 3(a) and 3(b)). Following them, intermediate events related to PA and Staphylococcus aureus are moderately important. We note that mucoid PA and MSSA became more important at age 12 than at age 7, likely because these intermediate events are uncommon before age 7. When using cPA as the landmark, the repeated measurements after the acquisition of cPA were not used in prediction, and thus the number of observed repeated measurements varied across subjects. Unlike baseline variables and intermediate events, whose permutations were performed among all subjects who experienced cPA, the permutation of a marker at a specific time point (e.g., age 7) was performed among subjects who experienced cPA after that time point. Accordingly, we plot the variable importance for predictors at different ages separately, so that the variable importance measures in each plot are based on permuting the same set of subjects. Figure 4(a) shows the importance of baseline variables and intermediate events. The timing of cPA plays an important role in predicting future event risk. We note that baseline factors such as ethnicity have relatively low variable importance. This does not mean, however, that ethnicity does not affect the risk of lung function decline; we conjecture that the effect of ethnicity is predominantly mediated through the spirometry measurements, so the variable importance should be interpreted with caution.
    When applying the proposed method in health disparity research, one can further build separate models for Hispanic and non-Hispanic patients. Additional analyses in different ethnic groups are included in the Supplementary Material.
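The permutation variable importance computed above can be sketched generically. Below, `score` is a hypothetical stand-in for evaluating the out-of-bag concordance of a fitted ensemble on a data set, and the toy example only illustrates the mechanics:

```python
import random

def permutation_importance(score, X, n_perm=100, seed=1):
    """Generic sketch of permutation variable importance: for each column j,
    permute its values across subjects n_perm times and average the drop in
    a user-supplied accuracy score (e.g., an out-of-bag concordance).
    X is a list of feature lists; `score(X)` returns a number."""
    rng = random.Random(seed)
    base = score(X)
    importance = {}
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_perm):
            perm = [row[j] for row in X]
            rng.shuffle(perm)
            Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, perm)]
            drops.append(base - score(Xp))
        importance[j] = sum(drops) / n_perm
    return importance

# Toy check: the score rewards column 0 matching the row index and
# ignores column 1, so column 1 gets zero importance.
X = [[i, i % 3] for i in range(20)]
score = lambda data: sum(row[0] == i for i, row in enumerate(data)) / len(data)
imp = permutation_importance(score, X, n_perm=10)
print(imp[1])  # 0.0
```

As in the paper's analysis, permuting an informative predictor produces a large average drop in the score, while permuting an irrelevant one leaves the score unchanged.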

    (a) Variable importance when the landmark time is age 7 and the risk prediction interval is [7,22][7,22].
    (b) Variable importance when the landmark time is age 12 and the risk prediction interval is [12,22][12,22].
    Figure 3: The permutation variable importance in the CFFPR data analysis, when the landmark times are ages 7 and 12. The boxplots show the decreases in OOB concordances from the 100 permutations and are ranked in descending order according to the mean value.
    (a) Variable importance for baseline variables and intermediate events
    (b) Variable importance for markers at age 7.
    (c) Variable importance for markers at age 12.
    Figure 4: The permutation variable importance in the CFFPR data analysis, when the landmark time is cPA. The boxplots show the decreases in OOB concordances from the 100 permutations and are ranked in descending order according to the mean value.

    Our models identify repeated measurements of weight and height as important variables in landmark prediction. To provide more insight into how historical weight measurements affect future risk in our ensemble model, we present the predicted event-free probabilities for hypothetical patients with different weight trajectories in Figure 5. Specifically, we consider Hispanic and non-Hispanic male patients whose weights were at the 10th, 50th, and 90th weight-for-age percentiles of the corresponding ethnicity-gender subgroup. For all six patients, the intermediate event times were fixed at the medians derived from the Kaplan-Meier estimates, while categorical and continuous predictors were fixed at the reference levels and mean values, respectively. At the landmark age 7, patients with the 10th percentile weight trajectories had the highest predicted risk, followed by patients with 90th percentile and those with 50th percentile weight trajectories (Figure 5(a)). At the landmark age 12, the predicted risk for patients with 10th percentile weight remains the highest, followed by patients with 50th percentile and those with 90th percentile weight trajectories (Figure 5(b)). A possible explanation for the flip between the 50th and 90th percentile curves from landmark age 7 to landmark age 12 is a higher degree of survivor bias at the later landmark age; in other words, the event-free individuals at landmark age 12 tended to have better lung function than those at landmark age 7. If being overweight is associated with a higher risk of lung function decline, the 90th percentile patients are more likely to fail before age 12; as a result, the overweight subjects remaining at landmark age 12 are healthier, and their risk beyond age 12 can be lower than that of individuals with 50th percentile weight. For underweight individuals, the degree of selection bias may not be large enough to compensate for their poor lung function, and thus they remain the highest-risk group at landmark age 12. To summarize, the landmark prediction models built at ages 7 and 12 are for different survivor populations and thus cannot be directly compared.

    (a) Landmark time point at age 7, predicting the event risk on [7,22][7,22].
    (b) Landmark time point at age 12, predicting the event risk on [12,22][12,22].
    Figure 5: Survival predictions for patients in different weight groups. Curves on the left show the repeated weight measurements before the landmark time. Curves on the right show the survival predictions over the interval of interest for the corresponding groups. The weight groups were defined by weight-for-age percentile and ethnicity: the 10th, 50th, and 90th percentiles for non-Hispanic patients, and the 10th, 50th, and 90th percentiles for Hispanic patients.

    8 Discussion

    In this paper, we proposed a unified framework for tree-based risk prediction with updated information. Compared with semiparametric methods, our methods can handle a large, growing number of predictors over time and do not impose strong model assumptions. Furthermore, the landmark times at which a prediction is performed are allowed to be subject-specific and defined by intermediate clinical events. Notably, our ensemble procedure averages the unbiased martingale estimating equations instead of survival probabilities, and thus avoids the potential bias arising from small terminal node sizes.

    Our discussion has focused on the case where the time-dependent variables ${\bm W}(\cdot)$ are observed at fixed time points $t_{1},\ldots,t_{K}$. The proposed method can also be applied when repeated measurements are collected at irregular time points, such as hospitalizations. When building prediction models, one can use up to $K$ repeated measurements as predictors, where $K$ is a fixed integer. Denote by $V_{1}<V_{2}<\cdots<V_{K}$ the potential observation times of ${\bm W}(\cdot)$. At time $t$, the available information can be expressed as ${\bm X}(t)=\{{\bm W}(V_{1},t),\ldots,{\bm W}(V_{K},t),V_{1}(t),\ldots,V_{K}(t)\}$, where ${\bm W}(V_{k},t)={\bm W}(V_{k})$ and $V_{k}(t)=V_{k}$ if $V_{k}\leq t$, while ${\bm W}(V_{k},t)=\textbf{NA}_{q}$ and $V_{k}(t)=t^{+}$ otherwise. In other words, the random measurement times $V_{1},\ldots,V_{K}$ are treated as intermediate event times. In this way, our framework can incorporate irregularly observed covariate information.
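The construction of ${\bm X}(t)$ with placeholder codes for not-yet-observed measurement times can be sketched as follows; `features_at` is a hypothetical illustration in which `None` stands in for the $\textbf{NA}_{q}$ code and $t$ itself for $t^{+}$:

```python
def features_at(t, v_times, w_values, na=None):
    """Assemble the predictor vector X(t): for each potential measurement
    time V_k, include (W(V_k), V_k) if V_k <= t, and (NA, t+) otherwise,
    treating the random measurement times as intermediate event times.
    `na` and the use of t for t+ are illustrative encoding choices."""
    W, V = [], []
    for vk, wk in zip(v_times, w_values):
        if vk <= t:
            W.append(wk)
            V.append(vk)
        else:
            W.append(na)
            V.append(t)
    return W + V

# K = 3 potential measurements; only the first two observed by t = 5:
print(features_at(5.0, [1.0, 4.0, 8.0], [0.9, 0.8, 0.7]))
# [0.9, 0.8, None, 1.0, 4.0, 5.0]
```

Because each $V_k$ enters the feature vector alongside its measurement, a tree can split both on the measured values and on how recently (or whether) they were observed.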

    Acknowledgements

    The authors would like to thank the three anonymous referees, the Associate Editor, and the Editor for their helpful comments that improved the quality of this paper. The authors thank the CF Foundation for the use of the CFFPR data to conduct this study. Additionally, we would like to thank the patients, care providers, and care coordinators at CF centers throughout the U.S. for their contributions to the CFFPR. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung and Blood Institute (NHLBI), the National Institutes of Health (NIH) and the U.S. Department of Health and Human Services. Sun’s research was supported by NIH (R21HL156228). Wu’s research was supported in part by the Intramural Research Program of the NHLBI/NIH. Huang’s research was supported by NIH (R01CA193888). McGarry’s research was supported by NIH (1K23HL133437-01A1) and CF Foundation Therapeutics (MCGARR16A0).

    References

    • Breiman (1996) Breiman, L. (1996). Bagging predictors. Machine Learning 24, 123–140.
    • Breiman (2001) Breiman, L. (2001). Random forests. Machine Learning 45, 5–32.
    • Ciampi et al. (1986) Ciampi, A., Thiffault, J., Nakache, J.-P., and Asselain, B. (1986). Stratification by stepwise regression, correspondence analysis and recursive partition: A comparison of three methods of analysis for survival data with covariates. Computational Statistics & Data Analysis 4, 185–204.
    • Davis and Anderson (1989) Davis, R. B. and Anderson, J. R. (1989). Exponential survival trees. Statistics in Medicine 8, 947–961.
    • Domanski et al. (2020) Domanski, M. J., Tian, X., Wu, C. O., Reis, J. P., Dey, A. K., Gu, Y., Zhao, L., Bae, S., Liu, K., Hasan, A. A., et al. (2020). Time course of LDL cholesterol exposure and cardiovascular disease event risk. Journal of the American College of Cardiology 76, 1507–1516.
    • Döring et al. (2004) Döring, G., Hoiby, N., Group, C. S., et al. (2004). Early intervention and prevention of lung disease in cystic fibrosis: A European consensus. Journal of Cystic Fibrosis 3, 67–91.
    • Ferrer et al. (2019) Ferrer, L., Putter, H., and Proust-Lima, C. (2019). Individual dynamic predictions using landmarking and joint modelling: Validation of estimators and robustness assessment. Statistical Methods in Medical Research 28, 3649–3666.
    • Friedman et al. (2001) Friedman, J., Hastie, T., and Tibshirani, R. (2001). The Elements of Statistical Learning. New York: Springer.
    • Gordon and Olshen (1985) Gordon, L. and Olshen, R. A. (1985). Tree-structured survival analysis. Cancer Treatment Reports 69, 1065–1069.
    • Harun et al. (2016) Harun, S. N., Wainwright, C., Klein, K., and Hennig, S. (2016). A systematic review of studies examining the rate of lung function decline in patients with cystic fibrosis. Paediatric Respiratory Reviews 20, 55–66.
    • Heagerty et al. (2000) Heagerty, P. J., Lumley, T., and Pepe, M. S. (2000). Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337–344.
    • Heltshe et al. (2018) Heltshe, S., Khan, U., Beckett, V., Baines, A., Emerson, J., Sanders, D., Gibson, R., Morgan, W., and Rosenfeld, M. (2018). Longitudinal development of initial, chronic and mucoid pseudomonas aeruginosa infection in young children with cystic fibrosis. Journal of Cystic Fibrosis 17, 341–347.
    • Hothorn et al. (2006) Hothorn, T., Bühlmann, P., Dudoit, S., Molinaro, A., and Van Der Laan, M. J. (2006). Survival ensembles. Biostatistics 7, 355–373.
    • Hothorn et al. (2004) Hothorn, T., Lausen, B., Benner, A., and Radespiel-Tröger, M. (2004). Bagging survival trees. Statistics in Medicine 23, 77–91.
    • Ishwaran et al. (2008) Ishwaran, H., Kogalur, U. B., Blackstone, E. H., and Lauer, M. S. (2008). Random survival forests. The Annals of Applied Statistics 2, 841–860.
    • Jewell and Nielsen (1993) Jewell, N. P. and Nielsen, J. P. (1993). A framework for consistent prediction rules based on markers. Biometrika 80, 153–164.
    • Kamata et al. (2017) Kamata, H., Asakura, T., Suzuki, S., Namkoong, H., Yagi, K., Funatsu, Y., Okamori, S., Uno, S., Uwamino, Y., Fujiwara, H., Nishimura, T., Ishii, M., Betsuyaku, T., and Hasegawa, N. (2017). Impact of chronic pseudomonas aeruginosa infection on health-related quality of life in mycobacterium avium complex lung disease. BMC Pulmonary Medicine 17, 198.
    • Knapp et al. (2016) Knapp, E. A., Fink, A. K., Goss, C. H., Sewall, A., Ostrenga, J., Dowd, C., Elbert, A., Petren, K. M., and Marshall, B. C. (2016). The Cystic Fibrosis Foundation Patient Registry. Design and methods of a national observational disease registry. Annals of the American Thoracic Society 13, 1173–1179.
    • LeBlanc and Crowley (1992) LeBlanc, M. and Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics 48, 411–425.
    • LeBlanc and Crowley (1993) LeBlanc, M. and Crowley, J. (1993). Survival trees by goodness of split. Journal of the American Statistical Association 88, 457–467.
    • Lin and Jeon (2006) Lin, Y. and Jeon, Y. (2006). Random forests and adaptive nearest neighbors. Journal of the American Statistical Association 101, 578–590.
    • Mannino et al. (2006) Mannino, D. M., Reichert, M. M., and Davis, K. J. (2006). Lung function decline and outcomes in an adult population. American Journal of Respiratory and Critical Care Medicine 173, 985–990.
    • Maziarz et al. (2017) Maziarz, M., Heagerty, P., Cai, T., and Zheng, Y. (2017). On longitudinal prediction with time-to-event outcome: Comparison of modeling options. Biometrics 73, 83–93.
    • McGarry et al. (2020) McGarry, M. E., Huang, C.-Y., Nielson, D. W., and Ly, N. P. (2020). Early acquisition and conversion of Pseudomonas aeruginosa in Hispanic youth with cystic fibrosis in the United States. Journal of Cystic Fibrosis, in press.
    • McGarry et al. (2019) McGarry, M. E., Neuhaus, J. M., Nielson, D. W., and Ly, N. P. (2019). Regional variations in longitudinal pulmonary function: A comparison of Hispanic and non-Hispanic subjects with cystic fibrosis in the United States. Pediatric Pulmonology 54, 1382–1390.
    • McIntosh and Pepe (2002) McIntosh, M. W. and Pepe, M. S. (2002). Combining several screening tests: Optimality of the risk score. Biometrics 58, 657–664.
    • Molinaro et al. (2004) Molinaro, A. M., Dudoit, S., and Van der Laan, M. J. (2004). Tree-based multivariate regression and density estimation with right-censored data. Journal of Multivariate Analysis 90, 154–177.
    • Morgan et al. (2016) Morgan, W. J., Vandevanter, D. R., Pasta, D. J., Foreman, A. J., Wagener, J. S., Konstan, M. W., Morgan, W., Konstan, M., Liou, T., McColley, S., et al. (2016). Forced expiratory volume in 1 second variability helps identify patients with cystic fibrosis at risk of greater loss of lung function. The Journal of Pediatrics 169, 116–121.
    • Parast et al. (2012) Parast, L., Cheng, S.-C., and Cai, T. (2012). Landmark prediction of long-term survival incorporating short-term event time information. Journal of the American Statistical Association 107, 1492–1501.
    • Proust-Lima et al. (2016) Proust-Lima, C., Dartigues, J.-F., and Jacqmin-Gadda, H. (2016). Joint modeling of repeated multivariate cognitive measures and competing risks of dementia and death: A latent process and latent class approach. Statistics in Medicine 35, 382–398.
    • Rizopoulos (2011) Rizopoulos, D. (2011). Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics 67, 819–829.
    • Rizopoulos et al. (2017) Rizopoulos, D., Molenberghs, G., and Lesaffre, E. M. (2017). Dynamic predictions with time-dependent covariates in survival analysis using joint modeling and landmarking. Biometrical Journal 59, 1261–1276.
    • Segal (1988) Segal, M. R. (1988). Regression trees for censored data. Biometrics 44, 35–47.
    • Steingrimsson et al. (2016) Steingrimsson, J. A., Diao, L., Molinaro, A. M., and Strawderman, R. L. (2016). Doubly robust survival trees. Statistics in Medicine 35, 3595–3612.
    • Steingrimsson et al. (2019) Steingrimsson, J. A., Diao, L., and Strawderman, R. L. (2019). Censoring unbiased regression trees and ensembles. Journal of the American Statistical Association 114, 370–383.
    • Sun et al. (2022) Sun, Y., Chiou, S. H., Wu, C. O., McGarry, M., and Huang, C.-Y. (2022). Supplement to “dynamic risk prediction triggered by intermediate events using survival tree ensembles”.
    • Sweeting et al. (2017) Sweeting, M. J., Barrett, J. K., Thompson, S. G., and Wood, A. M. (2017). The use of repeated blood pressure measures for cardiovascular risk prediction: A comparison of statistical models in the ARIC study. Statistics in Medicine 36, 4514–4528.
    • Tanner et al. (2021) Tanner, K. T., Sharples, L. D., Daniel, R. M., and Keogh, R. H. (2021). Dynamic survival prediction combining landmarking with a machine learning ensemble: Methodology and empirical comparison. Journal of the Royal Statistical Society: Series A (Statistics in Society) 184, 3–30.
    • Taylor et al. (2013) Taylor, J. M. G., Park, Y., Ankerst, D. P., Proust-Lima, C., Williams, S., Kestin, L., Bae, K., Pickles, T., and Sandler, H. (2013). Real-time individual predictions of prostate cancer recurrence using joint models. Biometrics 69, 206–213.
    • Twala et al. (2008) Twala, B., Jones, M. C., and Hand, D. J. (2008). Good methods for coping with missing data in decision trees. Pattern Recognition Letters 29, 950–956.
    • van Houwelingen (2007) van Houwelingen, H. C. (2007). Dynamic prediction by landmarking in event history analysis. Scandinavian Journal of Statistics 34, 70–85.
    • van Houwelingen and Putter (2008) van Houwelingen, H. C. and Putter, H. (2008). Dynamic predicting by landmarking as an alternative for multi-state modeling: An application to acute lymphoid leukemia data. Lifetime Data Analysis 14, 447–463.
    • van Houwelingen and Putter (2011) van Houwelingen, H. C. and Putter, H. (2011). Dynamic prediction in clinical survival analysis. Boca Raton: CRC Press.
    • Wang et al. (2017) Wang, J., Luo, S., and Li, L. (2017). Dynamic prediction for multiple repeated measures and event time data: An application to Parkinson’s disease. The Annals of Applied Statistics 11, 1787–1809.
    • Wright and Ziegler (2017) Wright, M. N. and Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software 77, 1–17.
    • Zhang (1995) Zhang, H. (1995). Splitting criteria in survival trees. Statistical Modelling 104, 305–313.
    • Zhang et al. (1996) Zhang, H., Holford, T., and Bracken, M. B. (1996). A tree-based method of analysis for prospective studies. Statistics in Medicine 15, 37–49.
    • Zheng and Heagerty (2005) Zheng, Y. and Heagerty, P. J. (2005). Partly conditional survival models for longitudinal data. Biometrics 61, 379–391.
    • Zhu and Kosorok (2012) Zhu, R. and Kosorok, M. R. (2012). Recursively imputed survival trees. Journal of the American Statistical Association 107, 331–340.
    • Zhu et al. (2019) Zhu, Y., Li, L., and Huang, X. (2019). Landmark linear transformation model for dynamic prediction with application to a longitudinal cohort study of chronic disease. Journal of the Royal Statistical Society: Series C (Applied Statistics) 68, 771–791.