Regression Trees for Cumulative Incidence Functions
Abstract
The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past decade. The problems of modeling, estimation and inference have been treated using parametric, nonparametric and semi-parametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and related ensemble methods, have begun only recently. In this paper, we develop a novel approach to building regression trees for estimating cumulative incidence curves in a competing risks setting. The proposed methods employ augmented estimators of the Brier score risk as the primary basis for building and pruning trees. The proposed methods are easily implemented using the
R statistical software package. Simulation studies demonstrate the utility of our approach in the competing risks setting. Data from the Radiation Therapy Oncology Group (trial 9410) are used to illustrate these new methods.
Keywords: Brier score; CART; Cause-specific hazard; Fine and Gray model; Sub-distribution function
1 Introduction
A subject being followed over time may experience several types of events related, for example, to disease morbidity and mortality. For example, in a Phase III trial of concomitant versus sequential chemotherapy and thoracic radiotherapy for patients with inoperable non-small cell lung cancer (NSCLC) conducted by the Radiation Therapy Oncology Group (RTOG), patients were followed up to 5 years, the occurrence of either disease progression or death being of particular interest. Such “competing risks” data are commonly encountered in cancer and other biomedical follow-up studies, and are often further complicated by right-censoring of the event time(s) of interest.
Two quantities are often used when analyzing competing risks data: the cause-specific hazard function (CSH) and the cumulative incidence function (CIF). For a given event, the former describes the instantaneous risk of this event at time $t$, given that no events have yet occurred; the latter describes the probability of occurrence, or absolute risk, of that event across time and can be derived directly from the subdistribution hazard function (fine1999proportional). The literature on competing risks is growing rapidly, particularly for hazard-based regression modeling; see dignam for a contemporary review, where parametric and semi-parametric approaches to modeling both the CSH and CIF are considered. The literature on tree-based methods for competing risks remains much less developed. leblanc1992relative extended the classification and regression tree (CART) procedure of breiman1984classification to the case of right-censored survival data by replacing the squared error loss function ordinarily used for a continuous outcome with an approximation to the likelihood function derived under a certain proportional hazards assumption. Motivated by methods for handling monotone missing data, molinaro2004tree later proposed using an inverse probability of censoring weighted (IPCW) squared error loss function to directly generalize the original CART procedure for right-censored survival data, obtaining a procedure that reduces to the standard form of CART for a continuous outcome when censoring is absent. steingrimsson2016doubly recently proposed a “doubly robust” extension motivated by semiparametric efficiency theory for missing data (robins1994estimation; tsiatis2007semiparametric), and demonstrated significantly improved performance in comparison to the IPCW-based tree procedure of molinaro2004tree. However, such methods cannot be used when there is more than one type of event (i.e., in addition to censoring).
For two or more competing risks, callaghan2008 proposed two methods of building classification trees for the CIF: by maximizing between-node heterogeneity via the two-sample log-rank test of gray1988class; and, by maximizing within-node homogeneity using sums of event-specific martingale residuals. To our knowledge, no software package is available that implements these methods, and more importantly, there is also no software specifically targeted to the problem of fitting regression trees to competing risk outcomes. In contrast, ensemble methods for estimating the CIF as a function of covariates do exist; see, for example, ishwaran2014random and mogensen2013random. ishwaran2014random implement their methods in the randomForestSRC package (rfsrc), where the unpruned trees that make up the bootstrap ensemble are typically built using splitting criteria designed for group comparisons with a single competing risk outcome, such as the generalized logrank test.
This paper proposes a novel CART-based approach to building regression trees under competing risks. In Section 2, we introduce the relevant data structures and describe a direct approach to building regression trees for estimating the CIF for a specified cause when there is no loss to follow-up. Section 3 develops the necessary extensions of these loss functions for right-censored outcomes, along with extensions for estimating the CIF at several time points. These methods focus on directly estimating the CIF, avoiding estimation of the cause-specific or subdistribution hazard function. In Section 4, simulation studies are used to investigate performance of these new methods and Section 5 summarizes analysis results for the RTOG 9410 Phase III lung cancer trial mentioned at the beginning of this section. The paper concludes with comments on future work in Section 6.
2 CIF Regression Trees with Uncensored Data
This section introduces the relevant competing risks data structure and reviews a CART-based approach to building regression trees for estimating the CIF in the case where there is no other loss to follow-up (i.e., no random censoring). This necessary background will allow us to develop an analogous approach when there is loss to follow-up.
2.1 Full Data Structure
Let $\tilde T_j$ be the time to failure for failure type $j$, $j = 1, \dots, J$, and let $W \in \mathcal{W}$ be a vector of covariates. Let $T = \min\{\tilde T_1, \dots, \tilde T_J\}$ be the minimum of all latent failure times; it is assumed that $(\tilde T_1, \dots, \tilde T_J)$ has a bounded multivariate density function and hence that $T$ is continuous. Then, in the absence of other loss to follow-up, $(T, M, W)$ is assumed to be the fully observed (or full) data for a subject, where $M$ is the event type corresponding to $T$. This assumption implies that $\tilde T_M = T$ is observed and, in addition, that $\tilde T_j > T$ for every $j \neq M$; however, $\tilde T_j$ itself is not assumed to be observed for $j \neq M$. Let $(T_i, M_i, W_i)$, $i = 1, \dots, n$, be the full data observed on $n$ independent subjects, where these observations are assumed to be identically distributed (i.i.d.). The dependence structure of $(\tilde T_1, \dots, \tilde T_J)$ is left unspecified.
2.2 CIF Estimation Under a Brier Loss Function
Let $F_j(t \mid w) = P(T \le t, M = j \mid W = w)$ denote the CIF for cause $j$, $j = 1, \dots, J$. The set of CIFs can be estimated from the data using any of several suitable parametric or semiparametric methods without further assumptions (e.g., independence of $\tilde T_1, \dots, \tilde T_J$). This section introduces a loss-based method for estimating $F_j(t \mid w)$ for a fixed cause $j$ and time point $t$ in the case where $F_j(t \mid w)$ is piecewise constant as a function of $w$. This is a key step in our proposed approach to building a regression tree to estimate $F_j(t \mid w)$.
Let $N_1, \dots, N_K$ form a partition of $\mathcal{W}$. Define $Z_i(t) = I(T_i \le t, M_i = j)$ and suppose that $F_j(t \mid w) = \beta_{0k}(t)$ for every $w \in N_k$; that is, subjects falling into partition $N_k$ all share the same CIF. Define $\beta(t) = (\beta_1(t), \dots, \beta_K(t))$ and let $\sum_{k=1}^K \beta_k(t) I(w \in N_k)$ be a model for $F_j(t \mid w)$. Then, fixing both $j$ and $t$, the Brier loss (cf., brier1950verification) $\{Z(t) - \sum_{k=1}^K \beta_k(t) I(W \in N_k)\}^2$ has risk $E[\{Z(t) - \sum_{k=1}^K \beta_k(t) I(W \in N_k)\}^2]$. Assuming the full data are observed,

$$L_j(\beta; t) = \frac{1}{n} \sum_{i=1}^n \Big\{ Z_i(t) - \sum_{k=1}^K \beta_k(t) I(W_i \in N_k) \Big\}^2 \tag{1}$$

is an unbiased estimator of this risk. When considered as a function of the set of scalar parameters $\beta_1(t), \dots, \beta_K(t)$ (i.e., for fixed $j$ and $t$), the empirical Brier loss (1) is minimized when $\beta_k(t) = \hat\beta_k(t)$, $k = 1, \dots, K$, where

$$\hat\beta_k(t) = \frac{\sum_{i=1}^n Z_i(t) I(W_i \in N_k)}{\sum_{i=1}^n I(W_i \in N_k)} \tag{2}$$

is a nonparametric estimate for $F_j(t \mid w)$, $w \in N_k$.
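With full data, the minimizer (2) is just the within-node proportion of subjects who have failed from cause $j$ by time $t$. A minimal Python sketch of this computation follows (the paper's own implementation uses R; all names here are illustrative):

```python
# Terminal-node estimator in the spirit of (2): with full (uncensored) data,
# the CIF estimate for cause j at time t within a node is the proportion of
# the node's subjects who failed from cause j by time t.

def node_cif_estimate(times, causes, t, cause):
    """Nonparametric estimate of F_j(t) among the subjects in one node."""
    z = [1 if (T <= t and m == cause) else 0 for T, m in zip(times, causes)]
    return sum(z) / len(z)

# Example: 5 subjects in a node; causes coded 1 and 2.
times = [0.5, 1.2, 2.0, 0.8, 3.1]
causes = [1, 2, 1, 1, 2]
est = node_cif_estimate(times, causes, t=1.0, cause=1)  # = 2/5 = 0.4
```

The same routine, applied within each terminal node of a fitted partition, reproduces the piecewise constant CIF estimate described above.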
2.3 Estimating a CIF Regression Tree Using CART
The CART algorithm of breiman1984classification estimates a regression tree as follows:
1. Using recursive binary partitioning, grow a maximal tree by selecting a (covariate, cutpoint) combination at every stage that minimizes an appropriate loss function;
2. Using cross-validation, select the best tree from the sequence of candidate trees generated by Step 1 via cost complexity pruning (i.e., using penalized loss).
In its most commonly used form, CART estimates the conditional mean response as a piecewise constant function on the covariate space, making all decisions on the basis of minimizing squared error loss. The resulting tree-structured regression function estimates the predicted response in each terminal node using the sample mean of the observations falling into that node. This process generalizes in a straightforward way to more general loss functions.
In the absence of censoring and under a piecewise constant model

$$F_j(t \mid w) = \sum_{k=1}^K \beta_k(t) I(w \in N_k) \tag{3}$$

for $w \in \mathcal{W}$, Section 2.2 shows that a nonparametric estimate for $F_j(t \mid w)$ for a given cause $j$ at a fixed time $t$ can be obtained by minimizing the loss (1), a problem equivalent to estimating the conditional mean response from the modified dataset $\{(Z_i(t), W_i),\ i = 1, \dots, n\}$ by minimizing the squared error loss (1). Thus, any implementation of CART for squared error loss (e.g., rpart) applied to this modified dataset will produce a regression tree estimate of $F_j(t \mid w)$. In particular, CART estimates the partition $N_1, \dots, N_K$ and the associated terminal nodes from the data and, within each terminal node, estimates $F_j(t \mid \cdot)$ by (2). While statistically inefficient, this process can be repeated for each cause $j$ and any time $t$ to generate nonparametric estimates of the CIF at a given set of time points for every cause.
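To make the reduction concrete, the following Python sketch transforms competing risks responses into the binary outcomes $Z_i(t)$ and performs a single squared-error split search, the elementary operation that CART repeats recursively; the helper names are illustrative, not from the paper:

```python
# Reduction of CIF tree building (uncensored case) to squared-error CART:
# transform responses to Z_i(t) = I(T_i <= t, M_i = j), then split on the
# cutpoint minimizing the total within-node sum of squared errors.

def transform_response(times, causes, t, cause):
    return [1.0 if (T <= t and m == cause) else 0.0 for T, m in zip(times, causes)]

def sse(z):
    """Sum of squared deviations from the node mean."""
    if not z:
        return 0.0
    mu = sum(z) / len(z)
    return sum((v - mu) ** 2 for v in z)

def best_split(x, z):
    """Return the cutpoint on covariate x minimizing total within-node SSE."""
    best = (None, float("inf"))
    for c in sorted(set(x))[:-1]:  # candidate cutpoints (all but the largest value)
        left = [zi for xi, zi in zip(x, z) if xi <= c]
        right = [zi for xi, zi in zip(x, z) if xi > c]
        loss = sse(left) + sse(right)
        if loss < best[1]:
            best = (c, loss)
    return best

# Example: failures from cause 1 occur early only for small x.
x = [1, 2, 3, 4]
z = transform_response([0.5, 0.9, 2.5, 3.0], [1, 1, 1, 2], t=1.0, cause=1)  # [1, 1, 0, 0]
cut, loss = best_split(x, z)  # perfect split at x <= 2, loss 0.0
```

Any off-the-shelf squared-error regression tree applied to $(Z_i(t), W_i)$ performs exactly this search at every node.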
3 CIF Regression Trees with Censored Data
With full data, the squared error loss (1) plays a critical role in both steps of the CART algorithm described at the beginning of Section 2.3 when estimating a conditional mean (e.g., breiman1984classification, Chapter 3.3). In follow-up studies involving competing risks outcomes, the full data might not be observed due to loss to follow-up. In this case, estimating $F_j(t \mid w)$ for a specified $j$ and $t$ under the squared error loss (1) is not possible and one cannot run the CART procedure of Section 2.3 as described.
An attractive way to overcome this difficulty is to use the observed data to construct an observed data loss function that has the same risk as the desired full data loss function (cf., molinaro2004tree; lostritto2012partitioning; steingrimsson2016doubly), that is, the empirical Brier loss (1). This section extends the developments of Section 2 to the case of right-censored competing risks data by deriving several new observed data loss functions that share the same risk as (1). By substituting any of these new loss functions for (1) everywhere throughout the CART algorithm of Section 2.3, one obtains a new regression tree method for estimating $F_j(t \mid w)$. Similarly to steingrimsson2016doubly, direct implementation is possible using rpart, which provides users with the ability to customize loss functions and decision making procedures. An appealing feature of the algorithms induced by these new loss functions is that each also reduces to the algorithm described in Section 2.3 when censoring is absent.
3.1 Observed Data Structure
Let $C$ be a continuous right-censoring random variable that, given $W$, is statistically independent of $(T, M)$. For a given subject, assume that instead of $(T, M, W)$ we only observe $(\tilde T, \Delta, \Delta M, W)$, where $\tilde T = \min(T, C)$ and $\Delta = I(T \le C)$ is the (any) event indicator. The observed data on $n$ i.i.d. subjects is $\mathcal{O} = \{(\tilde T_i, \Delta_i, \Delta_i M_i, W_i),\ i = 1, \dots, n\}$. Similarly to the case of a single survival endpoint, this form of random censoring on $T$ permits estimation of the CIF from the data $\mathcal{O}$.
3.2 Estimating via IPCW Loss
Fix $t$ and let $Z_i(t) = I(\tilde T_i \le t, M_i = j)$. Define $\tilde T_i(s) = \min(\tilde T_i, s)$ for any $s > 0$ and suppose that $G(t \mid W) \ge \epsilon > 0$ almost surely for some $\epsilon$, where $G(u \mid w) = P(C > u \mid W = w)$. Let $\Delta_i(s) = I(\min(T_i, s) \le C_i)$ and, in addition, define $L_{j,i}(\beta; t) = \{Z_i(t) - \sum_{k=1}^K \beta_k(t) I(W_i \in N_k)\}^2$. Easy calculations then show

$$E\!\left[ \frac{\Delta_i(s)}{G(\tilde T_i(s) \mid W_i)}\, L_{j,i}(\beta; t) \right] = E\big[ L_{j,i}(\beta; t) \big]$$

for a fixed $s \ge t$. This risk equivalence motivates the construction of an IPCW-type observed data loss function that is an unbiased estimator for the full data risk. Define, for any suitable survivor function $G$,

$$L_j^{ipcw}(\beta; t, s) = \frac{1}{n} \sum_{i=1}^n \frac{\Delta_i(s)}{G(\tilde T_i(s) \mid W_i)} \Big\{ Z_i(t) - \sum_{k=1}^K \beta_k(t) I(W_i \in N_k) \Big\}^2; \tag{4}$$

then, $L_j^{ipcw}(\beta; t, s)$ has the same risk as (1) and it follows that (4) is minimized by

$$\hat\beta_k(t; s) = \frac{\sum_{i=1}^n \Delta_i(s) Z_i(t) I(W_i \in N_k) / G(\tilde T_i(s) \mid W_i)}{\sum_{i=1}^n \Delta_i(s) I(W_i \in N_k) / G(\tilde T_i(s) \mid W_i)}. \tag{5}$$

Observe that (4) and (5) respectively reduce to (1) and (2) when censoring is absent.
When $s = \infty$ and $J = 1$, we have $\Delta_i(s) = \Delta_i$ and $\tilde T_i(s) = \tilde T_i$, and the loss function (4) is then just a special case of that considered in molinaro2004tree (see also steingrimsson2016doubly). Similarly, for $s = t$ and $J = 1$, the loss function (4) is just that considered in lostritto2012partitioning. Hence, (4) extends several existing loss functions to the problem of estimating a CIF. In practice, an estimator for $G$ is used in (4); popular approaches here include product-limit estimators derived from the Kaplan-Meier and Cox regression estimation procedures.

Given $t$, different choices of $s$ in (4) generate different losses, hence different CART algorithms. However, there are just two important choices of $s$ in (4): $s = \infty$ (standard IPCW; IPCW$_1$) and $s = t$ (modified IPCW; IPCW$_2$). In selecting $s = t$, any observations that are censored after time $t$ can still contribute (all-cause failure) information to (4); as $t \to \infty$, the influence of these observations eventually vanishes. Consequently, the IPCW$_2$ loss uses more of the observed data in calculating the loss function and associated minimizers compared to the IPCW$_1$ loss, and one can therefore expect a regression tree built using (4) for $s = t$ to perform as well as or better than one derived from (4) with $s = \infty$. The use of $\tilde T_i(t)$ in place of $\tilde T_i$ has an additional practical advantage: since $G(\tilde T_i(t) \mid W_i) \ge G(\tilde T_i \mid W_i)$ for every $i$, it is easier to satisfy the empirical positivity condition required for implementation.
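The two weighting schemes can be sketched as follows, assuming $G$ is estimated by the Kaplan-Meier method applied to the censoring times (the paper's default in the simulations); the function names and the simple tie handling are illustrative, not from the paper's code:

```python
# Sketch of the two IPCW weighting schemes in (4). s = infinity gives the
# standard IPCW1 weights (only uncensored subjects contribute); s = t gives
# the modified IPCW2 weights, under which subjects censored after t also
# keep a positive weight.

def km_censoring_survivor(obs_times, deltas):
    """Kaplan-Meier estimate of G(u) = P(C > u); censoring is the 'event'.
    Tie handling is deliberately crude for this illustration."""
    pairs = sorted(zip(obs_times, deltas))
    n_at_risk, g, steps = len(pairs), 1.0, []
    for u, d in pairs:
        if d == 0:  # a censoring event occurs at u
            g *= (n_at_risk - 1) / n_at_risk
        n_at_risk -= 1
        steps.append((u, g))
    def G(u):
        out = 1.0
        for time, val in steps:
            if time <= u:
                out = val
            else:
                break
        return out
    return G

def ipcw_weights(obs_times, deltas, t, G, modified=True):
    """Weights Delta_i(s)/G(min(obs, s)): s = t if modified, else s = inf."""
    w = []
    for T_obs, d in zip(obs_times, deltas):
        if modified:  # IPCW2: failed, or censored after t, contributes
            contributes = (d == 1) or (T_obs > t)
            u = min(T_obs, t)
        else:         # IPCW1: only uncensored subjects contribute
            contributes = (d == 1)
            u = T_obs
        w.append((1.0 / G(u)) if contributes else 0.0)
    return w
```

With no censoring both schemes return unit weights, so the weighted Brier loss collapses back to (1), matching the reduction noted above.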
3.3 Improving the Efficiency of IPCW Loss Functions
3.3.1 Estimating $F_j(t \mid w)$ by augmenting the IPCW loss
As in steingrimsson2016doubly, one can employ semiparametric estimation theory for missing data to construct an improved estimator of the full data risk $E[L_j(\beta; t)]$. In particular, one can augment the IPCW loss function (4) with additional information on censored subjects, thereby incorporating additional information into the tree building process.
Consider first the loss function (4) with $s = t$; we denote this by $L_j^{ipcw_2}(\beta; t)$. Define $\mathcal{F} = \{F_1, \dots, F_J\}$ as the set of CIFs of interest and let $\tilde{\mathcal{F}}$ denote a corresponding model that may or may not coincide with $\mathcal{F}$. For any $u \le t$ and $w$, define $m_j(u, w; \beta, t) = E[\{Z(t) - \sum_{k=1}^K \beta_k(t) I(W \in N_k)\}^2 \mid T > u, W = w]$; it is shown later how this expression depends on $\tilde{\mathcal{F}}$. Then, fixing $t$, the augmented estimator of $E[L_j(\beta; t)]$ having the smallest variance that can be constructed from the unbiased estimator $L_j^{ipcw_2}(\beta; t)$ is given by $L_j^{dr}(\beta; t) = L_j^{ipcw_2}(\beta; t) + A_j(\beta; t)$, where, for suitable models $\tilde{\mathcal{F}}$ and $G$,

$$A_j(\beta; t) = \frac{1}{n} \sum_{i=1}^n \int_0^{\tilde T_i(t)} \frac{m_j(u, W_i; \beta, t)}{G(u \mid W_i)}\, dM_{G,i}(u) \tag{6}$$

and $dM_{G,i}(u) = dN_{G,i}(u) - I(\tilde T_i \ge u)\, d\Lambda_G(u \mid W_i)$, with $N_{G,i}(u) = I(\tilde T_i \le u, \Delta_i = 0)$ and $\Lambda_G(\cdot \mid w)$ denoting the cumulative hazard function corresponding to $G(\cdot \mid w)$ (cf. tsiatis2007semiparametric, Sec. 9.3, 10.3). The “doubly robust” loss function $L_j^{dr}(\beta; t)$ reduces to a special case of the class of loss functions proposed in steingrimsson2016doubly when $J = 1$.
Instead of augmenting $L_j^{ipcw_2}(\beta; t)$ (i.e., IPCW$_2$), one might instead augment (4) when $s = \infty$ (i.e., IPCW$_1$). However, it turns out that the resulting loss function is identical to $L_j^{dr}(\beta; t)$; see the Appendix (Section A.1). Hence, no additional efficiency is gained from augmenting IPCW$_1$ and it suffices to consider $L_j^{dr}(\beta; t)$ only.
Returning to $L_j^{dr}(\beta; t)$: because $Z(t)$ is binary, we have

$$m_j(u, w; \beta, t) = \sum_{k=1}^K I(w \in N_k) \big\{ \beta_k^2(t) + (1 - 2\beta_k(t))\, \pi_j(u, t \mid w) \big\} \tag{7}$$

for any suitable $u$ (e.g., $u \le t$), where $\pi_j(u, t \mid w) = P(T \le t, M = j \mid T > u, W = w)$ reduces to

$$\pi_j(u, t \mid w) = \frac{\tilde F_j(t \mid w) - \tilde F_j(u \mid w)}{\tilde S(u \mid w)}, \qquad u \le t. \tag{8}$$

The notation $\tilde F_j$ and $\tilde S(u \mid w) = 1 - \sum_{l=1}^J \tilde F_l(u \mid w)$ means that these quantities are calculated under the CIF model specification $\tilde{\mathcal{F}}$. Hence, under a model $\tilde{\mathcal{F}}$, the calculation of $\pi_j(u, t \mid w)$ requires estimating both the CIF for cause $j$ and the all-cause event-free survival probability $\tilde S(u \mid w)$.
Considering $L_j^{dr}(\beta; t)$ as a function of the scalar parameters $\beta_1(t), \dots, \beta_K(t)$ only and differentiating with respect to each one, it can be shown that

$$\hat\beta_k^{dr}(t) = \frac{\sum_{i=1}^n I(W_i \in N_k) \Big\{ \frac{\Delta_i(t) Z_i(t)}{G(\tilde T_i(t) \mid W_i)} + \int_0^{\tilde T_i(t)} \frac{\pi_j(u, t \mid W_i)}{G(u \mid W_i)}\, dM_{G,i}(u) \Big\}}{\sum_{i=1}^n I(W_i \in N_k) \Big\{ \frac{\Delta_i(t)}{G(\tilde T_i(t) \mid W_i)} + \int_0^{\tilde T_i(t)} \frac{dM_{G,i}(u)}{G(u \mid W_i)} \Big\}}, \qquad k = 1, \dots, K, \tag{9}$$

minimize $L_j^{dr}(\beta; t)$.

The validity of this result relies on the assumption that $G(t \mid w) \ge \epsilon > 0$ for some $\epsilon$ and each $w$. Under this same assumption, Lemma 1 of strawderman2000estimating implies

$$\frac{\Delta_i(t)}{G(\tilde T_i(t) \mid W_i)} + \int_0^{\tilde T_i(t)} \frac{dM_{G,i}(u)}{G(u \mid W_i)} = 1;$$

letting $n_k = \sum_{i=1}^n I(W_i \in N_k)$, it follows that (9) can be rewritten as

$$\hat\beta_k^{dr}(t) = \frac{1}{n_k} \sum_{i=1}^n I(W_i \in N_k) \Big\{ \frac{\Delta_i(t) Z_i(t)}{G(\tilde T_i(t) \mid W_i)} + \int_0^{\tilde T_i(t)} \frac{\pi_j(u, t \mid W_i)}{G(u \mid W_i)}\, dM_{G,i}(u) \Big\}. \tag{10}$$

As in Section 3.2, $L_j^{dr}(\beta; t)$ and (10) respectively reduce to (1) and (2) when censoring is absent.
The specification $G(u \mid w) = 1$ for all $u$ and $w$ generates an interesting special case of $L_j^{dr}(\beta; t)$, despite the fact that the censoring distribution is then necessarily incorrectly modeled in the presence of censoring. In particular, for any suitable choice of $\tilde{\mathcal{F}}$, (i) the augmentation integral in (6) simplifies because $dM_{G,i}(u) = dN_{G,i}(u)$; and, (ii) the resulting loss, say $L_j^{bj}(\beta; t)$, is an unbiased estimator of the risk $E[L_j(\beta; t)]$. In fact, noting that (7) implies $L_j^{bj}(\beta; t)$ can be rewritten in terms of an imputed response for every censored subject, the minimizer of $L_j^{bj}(\beta; t)$ replaces the unobserved $Z_i(t)$ for each subject censored before $t$ with its conditional expectation $\pi_j(\tilde T_i, t \mid W_i)$ computed under $\tilde{\mathcal{F}}$. That is, under the loss $L_j^{bj}(\beta; t)$, the estimator for $\beta_k(t)$ is the Buckley-James (BJ) estimator of the mean response within node $N_k$ (buckley1979linear), an estimator that can also be derived directly from (10) by setting $G(u \mid w) = 1$. For this reason, we refer to $L_j^{bj}(\beta; t)$ as the Buckley-James loss function.
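The Buckley-James terminal-node estimator can be sketched as follows: observed responses are used directly, and subjects censored before $t$ contribute an imputed value computed from a working CIF model. The model functions passed in (`F_cause`, `S`) are illustrative placeholders for estimates produced by any suitable CIF estimation method:

```python
# Sketch of the Buckley-James node estimator obtained by setting G = 1:
# Z_i(t) is used when known (failure observed, or censoring at/after t),
# and otherwise imputed by pi_j(u, t) = P(T <= t, M = j | T > u) evaluated
# at the censoring time u under a working CIF model.

def pi_j(F_cause, S, u, t):
    """P(T <= t, M = j | T > u) = {F_j(t) - F_j(u)} / S(u) for u <= t."""
    return (F_cause(t) - F_cause(u)) / S(u)

def bj_node_estimate(obs_times, deltas, causes, t, cause, F_cause, S):
    """Buckley-James estimate of the node-level CIF at time t."""
    total = 0.0
    for T_obs, d, m in zip(obs_times, deltas, causes):
        if d == 1:  # failure observed: response Z_i(t) is known
            total += 1.0 if (T_obs <= t and m == cause) else 0.0
        elif T_obs < t:  # censored before t: impute E[Z(t) | T > T_obs]
            total += pi_j(F_cause, S, T_obs, t)
        # censored at or after t: Z_i(t) = 0 is known exactly; add nothing
    return total / len(obs_times)
```

Without censoring, the imputation branch is never reached and the estimator reduces to the within-node proportion in (2), consistent with the reduction noted above.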
3.3.2 Composite augmented loss functions with multiple time points
Under a tree model of the form (3), and similarly to methods used for survival trees (e.g., leblanc1992relative; molinaro2004tree; steingrimsson2016doubly), the quantity being estimated within each terminal node depends on the choice of $t$, but the underlying partition structure is time-invariant. That is, the effects of baseline covariates and their interactions on the CIF are not allowed to change over time. As a result, for a given cause $j$, we can further reduce variability when estimating $F_j(\cdot \mid w)$ by considering loss functions constructed from $L_j^{dr}(\beta; t)$ that incorporate information over several time points.
Recall that $L_j^{dr}(\beta; t) = L_j^{ipcw_2}(\beta; t) + A_j(\beta; t)$, where $L_j^{ipcw_2}(\beta; t)$ is given by (4) (i.e., for $s = t$) and $A_j(\beta; t)$ is given by (6). For a given set of time points $t_1 < t_2 < \dots < t_M$, a simple composite loss function for a given event type $j$ can be formed by calculating

$$L_j^{comp}(\beta_1, \dots, \beta_M) = \sum_{m=1}^M a_m L_j^{dr}(\beta_m; t_m), \tag{11}$$

where $a_1, \dots, a_M$ are positive weights that, without loss of generality, can be assumed to sum to one. Minimizing (11) with respect to $\beta_m = \beta(t_m)$, $m = 1, \dots, M$, gives

$$\hat\beta_k^{dr}(t_m), \qquad k = 1, \dots, K, \quad m = 1, \dots, M. \tag{12}$$
The terminal node estimators (12) are exactly equivalent to (10) computed for $t = t_m$ and do not depend on the weights $a_1, \dots, a_M$. However, the corresponding assessment of total loss, and thus decisions to split nodes at any given cutpoint for any given covariate, are directly influenced by the combined specifications of $(t_m, a_m)$, $m = 1, \dots, M$. Consequently, performance gains in the resulting tree-based estimates for $F_j(\cdot \mid w)$ occur in the determination of when and where to split, rather than in the manner by which individual terminal nodes are estimated. In the absence of censoring, the indicated composite loss function also reduces to that which would be computed by extending the loss function introduced in Section 2.2 in the manner described above.
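A sketch of (11) in Python: the composite loss is simply a convex combination of single-time-point losses, taken here as an arbitrary callable so that any of the IPCW, Buckley-James, or doubly robust losses could be plugged in (all names illustrative):

```python
# Composite loss in the spirit of (11): a weighted sum of single-time-point
# losses over a grid t_1 < ... < t_M, with weights summing to one.

def composite_loss(single_loss, betas, time_points, weights):
    """Compute sum_m a_m * L(beta_m; t_m)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to one"
    return sum(a * single_loss(b, t)
               for b, t, a in zip(betas, time_points, weights))

# Toy single-time-point loss purely for illustration.
toy_loss = lambda b, t: (b - 0.5) ** 2
val = composite_loss(toy_loss, [0.5, 1.5], [1.0, 2.0], [0.5, 0.5])
```

Changing the weights changes split decisions (through the total loss) but not the terminal-node estimates, mirroring the remark about (12) above.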
4 Simulation Studies
In this section, we summarize the structure and results of several simulation studies designed to evaluate the performance of regression trees built using the IPCW, Buckley-James, and doubly robust Brier loss functions introduced in the previous section. We first describe how data are generated. We then describe the measures used to evaluate tree-building performance, followed by a summary of the results.
4.1 Simulation Settings
Covariates are generated independently of each other. The underlying competing risks model assumes two causes of failure and takes the form
(13) | |||
(14) |
where and and are regression coefficients (cf., fine1999proportional). Note that this formulation satisfies the additivity constraint $F_1(\infty \mid w) + F_2(\infty \mid w) = 1$. Three settings are considered: (i) and with (High Signal); (ii) and with (Medium Signal); and, (iii) and with (Low Signal). The corresponding cumulative incidence functions for these settings are shown in Figure 1.
[Figure 1: Cumulative incidence functions for the three simulation settings.]
Similarly to steingrimsson2016doubly, we generate training sets of size 250, 500 and 1000 (with noninformative censoring) for the purposes of estimation and an independent uncensored test set of size 2000 for evaluating performance. Censoring is exponentially distributed with a rate chosen to give approximately 50% censoring on $T$. For each training set size, 500 simulations are run for each of the three settings.
4.2 Performance Measures
For assessing estimator performance, it is assumed there is an underlying true tree structure to be recovered. Let $\hat F_j(t \mid W_i)$ be the tree-predicted cumulative incidence for the cause of interest for a subject having covariates $W_i$; let $F_j(t \mid W_i)$ be the corresponding true cumulative incidence function. Then, as a measure of predictive performance, we consider

$$\mathrm{MSPE} = \frac{1}{n_{test}} \sum_{i=1}^{n_{test}} \big\{ \hat F_j(t \mid W_i) - F_j(t \mid W_i) \big\}^2, \tag{15}$$

where $n_{test}$ is the number of observations in the test dataset. We also use several other performance measures focused more on the identification of the underlying tree structure:
- fitted size − true size: This quantity measures the difference between the size of the fitted tree and the size of the true tree, where size denotes the number of terminal nodes. It can be seen from Figure 1 that the true tree size is 3.
- NSP: This quantity is the average, across simulation runs, of the number of times any of the noise covariates (i.e., covariates that do not affect the true CIF) appear in the fitted tree.
- PCSP: This quantity is the proportion of simulation runs in which the fitted tree has the same covariates and the same number of splits as the true tree that defines the CIF. While a fitted tree may involve only the signal covariates, it cannot be expected to have exactly the same splits as the true tree (especially for a continuous covariate); thus, in this measure, a ‘correct tree’ is one that looks like the true tree but may involve different split points.
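The measures above reduce to simple summaries over the test set and the simulation runs; a Python sketch follows (field names are illustrative):

```python
# Sketch of the evaluation measures of Section 4.2, computed from per-run
# summaries of the fitted trees and from test-set predictions.

def mspe(pred, truth):
    """Mean squared prediction error (15) over the test set."""
    return sum((p - q) ** 2 for p, q in zip(pred, truth)) / len(pred)

def size_diff(fitted_sizes, true_size=3):
    """Average difference between fitted and true number of terminal nodes."""
    return sum(s - true_size for s in fitted_sizes) / len(fitted_sizes)

def nsp(noise_counts):
    """Average number of noise-covariate appearances per simulation run."""
    return sum(noise_counts) / len(noise_counts)

def pcsp(correct_flags):
    """Proportion of runs recovering the true tree structure (0/1 flags)."""
    return sum(correct_flags) / len(correct_flags)
```

For example, `pcsp([1, 0, 1, 1])` returns 0.75, the proportion of four hypothetical runs whose fitted tree matched the true structure.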
4.3 Estimation of $G$ and $\pi_j$
The IPCW and doubly robust Brier loss functions require estimation of the censoring survivor function $G$. The Buckley-James and doubly robust Brier loss functions both rely on the function $\pi_j(u, t \mid w)$ in (8), the optimal choice requiring specification of both $\tilde F_j(\cdot \mid w)$ and the event-free survival function $\tilde S(\cdot \mid w)$.

Estimation of $G$ is comparatively straightforward; for example, one may use regression procedures (e.g., Cox regression models, random survival forests) or product-limit estimators as appropriate. For all simulations considered here, the censoring distribution is estimated by the Kaplan-Meier method. For building an IPCW-type tree with $s = \infty$ in (4), the resulting censoring weight may not satisfy the required positivity assumption. Hence, similarly to steingrimsson2016doubly, we truncate the estimate $\hat G(\cdot)$ from below when calculating IPCW$_1$, with the truncation level chosen so that the marginal truncation rate of observed survival times is 5%. No such modification is required when calculating IPCW$_2$, that is, when computing (4) with $s = t$.
Calculation of the Buckley-James and doubly robust Brier loss functions in practice requires using an estimated model $\tilde{\mathcal{F}}$ in place of $\mathcal{F}$ when computing (8). Parametric models have been proposed that ensure compatibility between the models used for the cause-specific CIFs as well as the ability to calculate the relevant probabilities; see, for example, jeong2006direct, jeong2007parametric, cheng2009modeling, and shi2013constrained. Compared to existing semiparametric methods (e.g., fine1999proportional; scheike2008predicting; eriksson2015proportional), the parametric likelihood methods of jeong2006direct and jeong2007parametric enable researchers to estimate two or more CIFs jointly. However, cheng2009modeling and shi2013constrained show that even these methods fail to enforce the additivity constraint and propose methods that resolve this inconsistency.
Since our focus is on a particular cause $j$, we will without loss of generality assume that $j = 1$ and that cause 2 captures all causes $j \neq 1$; in this case, specification of $F_1$ and $F_2$ is equivalent to specifying $\pi_1(u, t \mid w)$ in (8). Our simulation study considers the following estimators:
1. RF(npar): random survival forests as proposed in ishwaran2014random.
2. RF+lgtc: a hybrid method of random survival forests and parametric modeling.
3. FG(true): the simulation model of Section 4.1 is an example of the class of models considered in jeong2007parametric; the CIFs are estimated by maximum likelihood under a correctly specified parametric model.
4. lgtc+godds: the CIF is estimated by maximum likelihood using the parametric cumulative incidence model proposed in shi2013constrained.
Further details on the RF(npar) and RF+lgtc estimators, including the parametric models used in #2 and #4 above, may be found in Section A.3 of the Appendix. We anticipate that FG(true) will lead to the best performance because the underlying models for the event of interest and for censoring are both specified correctly. As in steingrimsson2016doubly, the estimates of $G$ and $\tilde{\mathcal{F}}$ depend only on the observed data and are computed in advance of the process of building the desired regression tree for estimating $F_j(t \mid w)$.
4.4 Simulation Results
Each combination of loss function and method for estimating $G$ and/or $\tilde{\mathcal{F}}$ leads to a distinct algorithm. In this section, we show results for the IPCW-type estimators with $s = \infty$ (IPCW$_1$) and $s = t$ (IPCW$_2$); the Buckley-James Brier loss, where RF(npar) (algorithm BJ-RF(npar)) and FG(true) (algorithm BJ-FG(true)) are respectively used to estimate $\tilde{\mathcal{F}}$; and, the doubly robust Brier loss function, where RF(npar) (algorithm DR-RF(npar)) and FG(true) (algorithm DR-FG(true)) are respectively used to estimate $\tilde{\mathcal{F}}$. Results using RF+lgtc and lgtc+godds for estimating $\tilde{\mathcal{F}}$ are summarized in the Appendix (Section A.5).
In fitting the trees, we consider three fixed time points, respectively representing the 25th, 50th and 75th percentiles of the true marginal distribution of $T$. These times are used for building individual trees as well as for composite loss functions, for each of the training set sizes 250, 500 and 1000. Below, we focus on the results using composite loss functions as described in Section 3.3.2. Table 1 summarizes the tree fitting performance measures for estimating (13) using a composite loss function with equal weights. For the chosen metrics, and irrespective of signal strength, there is a clear benefit to using augmented loss functions over IPCW loss functions, the exception being BJ-RF(npar) (i.e., the Buckley-James loss with $\tilde{\mathcal{F}}$ estimated by random survival forests). Overall, the BJ-FG(true) algorithm that uses the correct model for $\mathcal{F}$ performs best, followed by DR-FG(true), DR-RF(npar), IPCW$_2$, BJ-RF(npar) and then IPCW$_1$. Prediction error performance is considered in Figure 2, which shows boxplots of the mean squared prediction error given in (15) under equally weighted composite loss. Here, the BJ-FG(true) algorithm again performs uniformly best, with the augmented methods DR-FG(true), DR-RF(npar), and BJ-RF(npar) all performing similarly, followed by IPCW$_2$ and then IPCW$_1$.
In general, the methods that incorporate information beyond that used by IPCW$_1$ exhibit significant improvement in performance, with the methodology for estimating quantities needed to compute the augmented loss function playing a smaller role, particularly as sample size increases. Considering only methods where no knowledge of the true data generating model is used, the doubly robust loss function also provides gains over both the Buckley-James and IPCW$_2$ loss functions; however, the gains achieved are substantially less compared to the gains over IPCW$_1$. This phenomenon is expected and can be explained by noting that (i) the censoring distribution is being consistently estimated in all cases; (ii) IPCW$_2$ can be viewed as an augmented version of IPCW$_1$; and, (iii) IPCW$_2$ only requires estimating one infinite dimensional parameter.
Numerical and graphical results for other methods of nuisance parameter estimation and other settings (a single time point loss and composite loss with various training set sample sizes) are shown in the Appendix. These results demonstrate no significant deviations from the trends and observations summarized above. Some important conclusions that can be drawn from the combined set of results include the following: (i) the method for estimating $\pi_j(u, t \mid w)$ in (8) has little overall impact on performance, with greater differences arising from the type of loss (IPCW versus DR versus BJ) and whether or not a composite loss function is used; (ii) getting the data generating model exactly right yields noticeable gains, and there is otherwise a degree of robustness across methods, particularly for the doubly robust loss; and, (iii) the performance of IPCW$_2$ is often reasonably close to that for the augmented losses, particularly with composite loss, and the comparative simplicity involved in implementing this method has much to recommend it.
| Method | High Sig: fitted−3 | High Sig: NSP | High Sig: PCSP | Med Sig: fitted−3 | Med Sig: NSP | Med Sig: PCSP | Low Sig: fitted−3 | Low Sig: NSP | Low Sig: PCSP |
|---|---|---|---|---|---|---|---|---|---|
| IPCW1 | 0.132 | 0.084 | 0.916 | 0.182 | 0.132 | 0.874 | 0.558 | 0.164 | 0.658 |
| IPCW2 | 0.124 | 0.082 | 0.932 | 0.138 | 0.100 | 0.906 | 0.282 | 0.136 | 0.830 |
| BJ-RF(npar) | 0.142 | 0.090 | 0.904 | 0.148 | 0.088 | 0.908 | 0.226 | 0.128 | 0.878 |
| BJ-FG(true) | 0.092 | 0.068 | 0.940 | 0.102 | 0.064 | 0.934 | 0.118 | 0.082 | 0.918 |
| DR-RF(npar) | 0.066 | 0.052 | 0.958 | 0.092 | 0.056 | 0.940 | 0.216 | 0.118 | 0.856 |
| DR-FG(true) | 0.058 | 0.044 | 0.966 | 0.092 | 0.058 | 0.942 | 0.176 | 0.076 | 0.874 |
[Figure 2: Boxplots of the mean squared prediction error (15) under equally weighted composite loss.]
5 Example: Lung Cancer Treatment Trial
We illustrate our methods using data from RTOG 9410, a randomized trial of patients with locally advanced inoperable non-small cell lung cancer. The motivation for this trial was to ascertain whether sequential or concurrent delivery of chemotherapy and thoracic radiotherapy (TRT) is the better treatment strategy. The original RTOG 9410 study randomized 610 patients to three treatment arms: sequential chemotherapy followed by radiotherapy (RX=1); once-daily chemotherapy concurrent with radiotherapy (RX=2); and, twice-daily chemotherapy concurrent with radiotherapy (RX=3). The primary endpoint of interest was overall survival and the main trial analysis results were published in curran2011sequential, demonstrating a survival benefit of concurrent delivery of chemotherapy and TRT compared with sequential delivery. Secondary analyses of the data using the time from randomization to the first occurrence of three possible outcomes are considered: in-field failure (cancer recurrence within the treatment field for TRT); out-field failure (cancer recurrence and distant metastasis outside of the treatment field for TRT); and, death without documented in-field or out-field failure (i.e., death without documented cancer progression). Among these event types, patients who first experienced out-field failures are of particular interest, since these patients typically have suboptimal prognosis and may be candidates for more intensified treatment regimens intended to prevent distant metastasis, including but not limited to consolidative chemotherapy, prophylactic cranial irradiation (for brain metastases), and so on. As such, patients that experienced both in-field failure and out-field failure were considered to be out-field failures for purposes of this analysis.
At the time the study database was last updated in 2009, there were 577 patients, with approximately 3% censoring on the aforementioned outcomes. Because this censoring rate is so low, we removed these censored observations and compare the results of analyses of the resulting uncensored dataset (554 patients) to analyses of data created using an artificially induced censoring mechanism. The purpose of this analysis is to evaluate how censoring affects the results and, in particular, to illustrate how well the various procedures are able to recover the tree(s) that would be built had all outcomes been fully observed. We focus on building trees for each outcome using a composite loss function with 5 time points (5.2, 6.2, 8.5, 11.8, 19.0 months), selected as the 25th, 35th, 50th, 65th and 80th percentiles of the observed “all cause” event time. Baseline covariates included in this analysis are Treatment (RX), Age, Stage (American Joint Committee on Cancer [AJCC] stage IIIB vs. IIIA or II), Gender, Karnofsky score (70, 80, 90 or 100), Race (White vs. non-White), and Histology (Squamous vs. non-Squamous). Censoring is created according to a Uniform distribution, generating approximately 29% censoring on $T$. In addition to building a tree using the uncensored dataset (i.e., via the composite loss of Section 3.3.2), we consider the methods IPCW$_1$, IPCW$_2$, BJ-RF(npar), BJ-RF+lgtc, DR-RF(npar) and DR-RF+lgtc; this allows us to evaluate the effect of censoring as well as the different models used in constructing the augmented loss. Similarly to steingrimsson2016doubly, we use repeated 10-fold cross-validation to improve the stability of the tree building process and select the final tree as that having the lowest mean cross-validated error. For the in-field failure outcome, all methods (including those applied to the uncensored dataset) lead to trees that consist only of a root node. Hence, we focus on out-field failure and death without progression below.
For out-field failure, the estimated tree obtained using the uncensored data creates 4 risk groupings defined by splits on age (cutpoints 50.5 and 70.5) and histology (see Figure A.4 in the Appendix). The plot of the CIF indicates that older patients with non-squamous histology have the lowest risk, and that this risk is very similar to that of the youngest patients with squamous histology. With censored data, the IPCW$_1$, IPCW$_2$, BJ-RF(npar) and DR-RF(npar) methods all lead to the same tree structure; see Figures A.5–A.8 in the Appendix. The BJ-RF+lgtc and DR-RF+lgtc methods also share the same tree structure (see Figures A.9 and A.10) and identify nearly the same risk groups as the other censored data methods, the difference being that the secondary split on age occurs at 70.5 rather than 71.5. Importantly, the trees for out-field failure respectively built using the uncensored and censored datasets are very similar. Figure 3 summarizes the estimated CIFs obtained using the uncensored, DR-RF(npar), IPCW and BJ-RF(npar) loss functions. There are slight differences in the latter three estimators due to the way in which the CIFs (i.e., terminal nodes) are estimated. We see that the three highest risk groups are identified as being the same for all methods; the differences created by censoring essentially occur in the risk determination for the oldest squamous patients, where censoring tends to be the heaviest. In general, the results show that younger patients with non-squamous histology have the highest risk of out-field failure. This observation is important, as these patients may be considered as candidates for more intensified treatment regimens.

For death without progression, the fitted tree from the uncensored data creates 5 risk groups: non-squamous histology with age 72.5; male, with age and squamous histology; male, with age 67.5 and squamous histology; female with squamous histology; and, non-squamous histology with age 72.5. This tree is given in Figure A.11 in the Appendix. The tree built using IPCW results in a root node. The trees built with IPCW and the two doubly robust methods are nearly identical to that built using the uncensored dataset, the difference being that the age cutpoint of 72.5 is replaced by 70.5; see Figures A.12 – A.14 of the Appendix. The tree built using BJ-RF(npar) is also very similar, but incorporates an extra cutpoint on Karnofsky score for younger males with squamous histology; this distinguishes those having a score of 100 from those having a lower score (see Figure A.15 of the Appendix). The tree built using the BJ-RF+lgtc method differs somewhat more substantially for younger males with squamous histology, splitting on both Karnofsky score (70 or 80 vs. 90 or 100) and treatment (RX = 1 vs. not); see Figure A.16 of the Appendix. The comparison of these trees points to a benefit of using the doubly robust loss over the Buckley-James loss, as the censoring distribution is modeled correctly; the comparison of the augmented loss results to those obtained for IPCW (where censoring is modeled correctly) further highlights the value of incorporating additional information into the model building process. Figure 4 is obtained analogously to Figure 3 and shows the estimated CIFs and risk stratification obtained using the uncensored and censored datasets. For death without progression, the risk evidently increases with increasing age and being male; the risk also increases with squamous histology, though in a way that appears to be age-dependent.

6 Discussion
Trees remain of significant interest to practitioners, especially clinicians. The proposed doubly robust CIF regression tree demonstrates improved performance compared to IPCW-based methods whether a single time point or a composite loss function is used, with composite loss functions giving much better performance overall. To our knowledge, there are no publicly available software packages that directly implement tree-based regression methods for competing risks; our proposed methods can be implemented using existing software, with example code made available as part of the supplementary materials.
Several extensions are possible. For example, it is easy in principle to combine the proposed estimation procedure with ensemble methods, providing an interesting alternative to the random forest methods recently introduced by mogensen2013random and ishwaran2014random. With no random feature selection, this extension is easily accomplished through bagging (i.e., bootstrap aggregation). However, with random feature selection, new software is required: existing random forest software does not offer the same facility as rpart for substituting user-written loss functions. A second interesting direction is to extend both regression tree and ensemble procedures to the problem of simultaneous estimation of multiple CIFs. Similarly to the case of a single CIF, we anticipate that such multivariate problems can be handled using existing software in the case of regression trees and that new software will again be needed for the related ensemble extensions.
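Absent random feature selection, the bagging extension mentioned above amounts to refitting the tree on bootstrap resamples and averaging the resulting predictions. A minimal sketch follows, with a hypothetical one-split "stump" learner standing in for the CIF regression tree:

```python
import random
import statistics

def bagged_predict(data, fit, predict, x, n_boot=50, seed=7):
    """Bootstrap aggregation (bagging): refit the base learner on
    resampled datasets and average its predictions at x."""
    rng = random.Random(seed)
    n = len(data)
    preds = []
    for _ in range(n_boot):
        boot = [data[rng.randrange(n)] for _ in range(n)]  # resample with replacement
        preds.append(predict(fit(boot), x))
    return statistics.mean(preds)

# Hypothetical base learner: a one-split "stump" in place of a CIF tree.
def fit_stump(data):
    xs = sorted(x for x, _ in data)
    cut = xs[len(xs) // 2]  # split at the sample median covariate value
    left = [y for x, y in data if x <= cut] or [0.0]
    right = [y for x, y in data if x > cut] or [0.0]
    return cut, statistics.mean(left), statistics.mean(right)

def predict_stump(model, x):
    cut, left_mean, right_mean = model
    return left_mean if x <= cut else right_mean

data = [(float(i), float(i > 5)) for i in range(11)]
p = bagged_predict(data, fit_stump, predict_stump, x=9.0)
```

With a CIF tree as the base learner, `predict` would return the terminal-node CIF estimate at the chosen time point rather than a scalar mean.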
Acknowledgments
This work was supported by the National Institutes of Health (R01CA163687: AMM, RLS, YC; U10-CA180822: CH). We thank the NRG Oncology Statistics and Data Management Center for providing de-identified RTOG 9410 clinical trial data under a data use agreement.
Supplementary Material for
Regression Trees for Cumulative Incidence Functions
by Cho, Molinaro, Hu and Strawderman
Appendix A Appendix
References to figures, tables, theorems and equations preceded by “A.” are internal to this appendix; all other references refer to the main paper.
A.1 Equivalence of augmentation for IPCW and IPCW
The equivalence result to be proved below holds for any pair of suitable models of and , including and . Throughout, we suppose that suitable models and are given. The proof will be facilitated by first establishing that IPCW can be rewritten as an IPCW estimator in standard form and augmented similarly to IPCW. We will then show that the two augmented versions are mathematically identical.
Define the modified data where and for . We first observe that the IPCW loss can be re-expressed as a standard IPCW estimator that can be computed from . In particular, the IPCW loss (i.e., (4) with ) is mathematically equivalent to
(A.1)
This equivalence follows because almost surely provided that is continuous. Strict inequality on the right-hand side is necessary here because the fact that almost surely implies whenever and . The critical observation is that the loss estimator (A.1) is now an IPCW estimator in standard form constructed from , and thus can be augmented similarly to IPCW (cf. tsiatis2007semiparametric, Secs. 9.3 and 10.3). In particular, the augmented loss is given by , where
Note that this last expression is just (6), but with an upper limit of integration of in place of .
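For orientation, the generic shape of an augmented IPCW loss from semiparametric missing-data theory (tsiatis2007semiparametric) is recorded below. The notation here is generic and assumed for illustration only: $\hat G$ denotes the estimated censoring survivor function, $\hat M_{G,i}$ the associated censoring martingale, and $L_i$ the full-data loss contribution; the exact terms and limits of integration are those defined in display (6) of the main paper.

```latex
\frac{1}{n}\sum_{i=1}^{n}\left\{
  \frac{\Delta_i}{\hat G(\tilde T_i \mid X_i)}\, L_i
  \;+\; \int \frac{E\{L_i \mid T_i \ge u,\, X_i\}}{\hat G(u \mid X_i)}\, d\hat M_{G,i}(u)
\right\}
```

The first term is the standard inverse-probability-weighted loss; the augmentation integral recovers information from censored observations and yields the double robustness property.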
We can now establish the equivalence of the augmented loss functions respectively derived from the IPCW and IPCW losses. Trivially, we may rewrite
(A.2)
and
(A.3)
Consequently, almost surely, where
and
Consider the integral term appearing in (B) for any fixed value of . By the definition of , the integral is clearly zero if (i.e., when ); hence, we only need to consider the calculation of
for Recall that , where is given in (8). By definition, we have that for ; hence, for
(A.4)
Calculations analogous to those done in bai2013doubly show that
(A.5)
Substituting (A.5) into (A.4) gives
(A.6)
Hence we can see that
almost surely, where
Recall that and are both continuous; hence, . We now consider the calculation of under the 6 possible ways in which and can be ordered with positive probability:
1. In this case, and Hence,
2. As in Case 1, and hence,
3. In this case, and . Moreover, regardless of because Hence,
4. In this case, and Hence,
5. In this case, and Hence,
6. In this case, and regardless of because . Hence,
In summary, we have proved that except possibly on a set which has probability measure 0.
A.2 Further details on CART algorithms using augmented loss
The extension of the CIF regression tree methodology described in Section 2.3 to the case where is observed has three main steps:
1. Throughout, replace the loss function used by CART with ;
2. Use the modified algorithm in Step #1 to grow a maximal tree, thereby generating a sequence of trees as candidates for the best tree;
3. Using cross-validation, select the best tree from this sequence via cost complexity pruning.
Regarding Step #1, the overall structure of the CART algorithm is independent of the choice of loss function; the specification of the loss function only plays a role in determining how the splitting, evaluation, and model selection decisions are made in Steps #2 and #3 (cf., breiman1984classification).
Step #2 serves to restrict the search space, that is, the set of possible trees that one must consider in determining which tree is optimal; see below. To describe how Step #3 is carried out, we first need to define the notion of a cross-validated risk estimator. Let be a partition of such that each contains all observed data on some subset of subjects and each subject in appears in exactly one of the sets . Let where if observation is in dataset and zero otherwise. Recalling (7), define the modified notation
as given, is now a function of . Fixing and for each let
where is the prediction obtained from a tree that estimates from the reduced dataset Then, similarly to steingrimsson2016doubly, a cross-validated doubly robust risk estimator can be defined as follows:
Returning to Step #3: Suppose Steps #1 and 2 have been run. The corresponding unpruned maximal tree generates a sequence of (say) subtrees, each of which is a candidate for the final tree. Each of these trees can be identified from this maximal tree by a unique choice of the so-called cost complexity tuning penalty parameter; call these parameter values . Now, for each fixed let denote the sequence of trees obtained by running the doubly robust tree building procedure on the datasets where each such tree is determined by cost complexity pruning using the penalty parameter . Define to be the corresponding cross-validated risk estimates. The final tree is then obtained from the initial maximal tree using the penalty parameter where minimizes
The indicated procedure is modified in an obvious way to accommodate IPCW-type and Buckley-James Brier loss functions. In the case of the corresponding composite loss function, a weighted sum of the desired loss estimates over is instead used. Beyond these loss modifications, estimation of the maximal tree and selection of the corresponding best tree is carried out exactly as described above. In practice, implementation of the algorithm just described is possible using rpart’s capabilities for incorporating user-written splitting and evaluation functions (therneau2015introduction). Examples of such code are provided as part of the Supplementary Materials for this paper.
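The final selection step described above can be sketched as follows. The helper `select_penalty` and the numbers are hypothetical, standing in for the cost complexity pruning sequence and its cross-validated risk estimates:

```python
def select_penalty(alphas, cv_risks):
    """Pick the cost complexity penalty whose cross-validated risk
    estimate is smallest; ties are broken toward the larger penalty
    (i.e., the smaller tree) as a design choice of this sketch."""
    best = min(range(len(alphas)), key=lambda m: (cv_risks[m], -alphas[m]))
    return alphas[best]

# Hypothetical risk curve over a pruning sequence:
alphas = [0.0, 0.01, 0.05, 0.1, 0.5]
cv_risks = [0.31, 0.27, 0.25, 0.25, 0.40]
best_alpha = select_penalty(alphas, cv_risks)  # 0.1: tie broken toward pruning
```

The final tree is then the subtree of the maximal tree determined by pruning with `best_alpha`.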
A.3 Estimating for computing augmented loss functions
Section 4.3 discusses methods for estimating both the censoring distribution and the parameter when computing observed data loss functions. This section expands on the four methods, described in Section 4.3, for estimating as needed for computing (8). We assume
1. RF(npar): for , is estimated using nonparametric random forests. More specifically, the rfsrc function in the randomForestSRC package is used (ishwaran2014random), where the predict command is used to estimate and for and . For and bootstrap samples, the components of are given by ; here, for the bootstrap sample, is the estimated CIF of interest and is the corresponding "all other causes" estimate. The event-free survival function is estimated in the obvious manner.
2. RF+lgtc: for , is estimated using a "random parametric forest". That is, as above, the rfsrc function in the randomForestSRC package is first used (ishwaran2014random) to construct a forest (i.e., trees) with . Using the tree obtained for the bootstrapped sample, are then estimated by fitting the three-parameter logistic cumulative incidence model of cheng2009modeling separately to the data falling into each terminal node. Finally, similarly to RF(npar), the desired ensemble predictors are obtained by computing bootstrap averages.
3. FG(true): the simulation model of Section 4.1 is an example of the class of models considered in jeong2007parametric. For , are estimated by maximum likelihood under a correctly specified parametric model.
4. lgtc+godds: for , is estimated by maximum likelihood using the parametric cumulative incidence model proposed in shi2013constrained.
A.3.1 The CIF models of cheng2009modeling and shi2013constrained
Here, we describe the parametric models of cheng2009modeling and shi2013constrained that are used by RF+lgtc and lgtc+godds. Specifically, assuming and hence for these models are given below:
1. Three-parameter logistic model without covariates (RF+lgtc; cheng2009modeling): This model is motivated by nonlinear modeling in bioassay and dose-response curves (ritz2005bioassay). Specifically, the parametric model that is fit by maximum likelihood separately within each terminal node of a tree is given by , where consists of three parameters: is the upper asymptote of when ; and, and are the slope and center of the curve (cheng2009modeling). To satisfy the additivity constraint required of CIFs, we further assume .
2. Three-parameter logistic model with covariates (lgtc+godds; shi2013constrained): shi2013constrained extends the model of cheng2009modeling to incorporate covariates. Specifically, the parametric model used for is , where is a specified nondecreasing function and is the covariate-independent model of cheng2009modeling. One possible form of is , where is a parameter for each event type. This model is an extension of the generalized odds-rate model (dabrowska1988estimation). When using a generalized odds model as the link function , the CIF for is given by the formula where ; the CIF for is given by , where . Observe that the additivity constraint is satisfied. In addition, note that a separate covariate effect for is not modeled and hence there are no additional regression parameters to be estimated (i.e., even though the CIF for does depend on covariates through ).
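One common parameterization of a three-parameter logistic CIF can be sketched numerically; the asymptote/slope/center form below is an illustrative assumption, not the exact display of cheng2009modeling or shi2013constrained:

```python
import math

def logistic_cif(t, a, b, c):
    """Illustrative three-parameter logistic CIF: `a` is the upper
    asymptote (the lifetime probability of this event type), `b` the
    slope and `c` the center of the curve on the log-time scale.
    This parameterization is an assumption for illustration only."""
    return a / (1.0 + math.exp(-b * (math.log(t) - c)))

# Additivity constraint with two competing events: the asymptotes sum
# to at most one, so F1(t) + F2(t) <= 1 at every t.
a1, a2 = 0.6, 0.4
F1 = logistic_cif(10.0, a1, 1.5, math.log(3.0))
F2 = logistic_cif(10.0, a2, 1.0, math.log(5.0))
total = F1 + F2
```

Constraining the asymptotes to sum to at most one is what enforces the additivity requirement on the fitted CIFs.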
A.4 Miscellaneous Algorithm Specifications
Two important tuning parameters that govern the size of the maximal tree built in Step #2 of the general CART algorithm are minbucket, the minimum possible number of observations in a terminal node, and minsplit, the minimum number of observations in a node to be considered for a possible split. Throughout, we set minbucket = 10 and minsplit = 30. For the doubly robust and Buckley-James methods, both uncensored and censored observations are counted when considering these limits; for the IPCW methods, only uncensored observations are counted.
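The role these limits play during splitting can be sketched as follows; `split_allowed` is a hypothetical helper (rpart enforces the equivalent checks internally):

```python
def split_allowed(node_size, left_size, right_size, minsplit=30, minbucket=10):
    """A node is eligible for splitting only if it holds at least
    `minsplit` observations, and a candidate split is kept only if
    both children hold at least `minbucket` observations."""
    return (node_size >= minsplit
            and left_size >= minbucket
            and right_size >= minbucket)

eligible = split_allowed(40, 15, 25)     # True: node and both children large enough
too_small = split_allowed(25, 12, 13)    # False: node below minsplit
thin_child = split_allowed(40, 5, 35)    # False: left child below minbucket
```

For the IPCW methods, `node_size` and the child sizes would count only uncensored observations, per the rule stated above.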
A.5 Additional simulation results
In this section, we summarize the full set of simulation results using both "single time point" and composite loss functions. We recall that IPCW1 and IPCW2 refer to the usual () and modified () IPCW methods, respectively. As described elsewhere, the Buckley-James (BJ) and doubly robust (DR) methods use several approaches to estimating (8), which is required for computing the augmented loss function. In particular, as described in Section 4.3 (see also Section A.3.1), we use nonparametric random forests (RF(npar)), random parametric forests (RF+lgtc), the true data generating model (FG(true)) and the three-parameter logistic distribution in combination with a generalized odds model (lgtc+godds). Results are summarized for each type of loss function. For example, results corresponding to "BJ-RF(npar)" mean that we use the BJ loss function in combination with nonparametric random forests for calculating (8).
Tables A.1 – A.4 summarize the numerical performance of the fitted trees and Figures A.1 – A.3 show boxplots of prediction error. The choice of time points and the method of summarizing the results in the tables and figures are the same as in Section 4.4. It is easy to see that (i) performance generally improves with increasing sample size and tends to be worst at ; (ii) there is substantially more variation in results using a single time point loss in comparison to a composite loss; and, (iii) the use of the composite loss function results in better overall performance, including lower prediction error, in comparison to the use of any single time point loss function.
In the medium and low signal settings, greater differences between methods emerge, with a clear improvement at compared to . Regardless of sample size, IPCW1 has the least favorable performance overall in both the medium and low signal settings, and BJ-FG(true) has the most favorable performance overall; these differences are especially noticeable in the low signal setting.
It is also clear that IPCW2 generally exhibits less variability than IPCW1 at time points and , whereas for time , the two methods perform similarly. This last result is expected since as .
The results further show that the choice of method for estimating the conditional expectation has little overall impact on performance; the main differences stem from the type of loss (IPCW versus DR versus BJ) and whether or not a composite loss function is used. The results also clearly highlight the value of getting the data generating model exactly right, and otherwise demonstrate a desirable degree of robustness across methods, particularly for doubly robust loss. When only comparing IPCW1 and the doubly robust methods, the efficiency gain is evident, particularly at time point using a single time point loss. The performance for IPCW2 is often reasonably close to that for the Buckley-James and doubly robust loss functions, especially for a composite loss function; the comparative degree of simplicity involved in implementing this method has much to recommend it.
High Sig | Med Sig | Low Sig | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
fitted-3 | NSP | PCSP | fitted-3 | NSP | PCSP | fitted-3 | NSP | PCSP | ||
IPCW1 | 0.432 | 0.196 | 0.724 | 1.668 | 0.118 | 0.112 | 1.890 | 0.072 | 0.020 | |
0.204 | 0.138 | 0.890 | 0.838 | 0.166 | 0.494 | 1.644 | 0.124 | 0.118 | ||
0.294 | 0.188 | 0.816 | 0.780 | 0.250 | 0.556 | 1.332 | 0.198 | 0.246 | ||
IPCW2 | 0.350 | 0.230 | 0.800 | 1.428 | 0.208 | 0.220 | 1.834 | 0.114 | 0.040 | |
0.130 | 0.072 | 0.922 | 0.546 | 0.168 | 0.674 | 1.500 | 0.150 | 0.182 | ||
0.354 | 0.242 | 0.800 | 0.726 | 0.216 | 0.576 | 1.318 | 0.134 | 0.238 | ||
BJ-RF(npar) | 0.392 | 0.280 | 0.814 | 1.298 | 0.286 | 0.284 | 1.840 | 0.160 | 0.042 | |
0.236 | 0.162 | 0.876 | 0.424 | 0.238 | 0.770 | 1.354 | 0.258 | 0.252 | ||
0.320 | 0.258 | 0.858 | 0.336 | 0.218 | 0.818 | 0.980 | 0.304 | 0.438 | ||
BJ-RF+lgtc | 0.326 | 0.234 | 0.822 | 1.302 | 0.304 | 0.290 | 1.840 | 0.140 | 0.044 | |
0.256 | 0.182 | 0.864 | 0.404 | 0.228 | 0.774 | 1.344 | 0.232 | 0.256 | ||
0.302 | 0.236 | 0.858 | 0.234 | 0.144 | 0.854 | 0.930 | 0.256 | 0.448 | ||
BJ-FG(true) | 0.336 | 0.260 | 0.826 | 0.950 | 0.284 | 0.482 | 1.694 | 0.152 | 0.106 | |
0.190 | 0.140 | 0.902 | 0.204 | 0.134 | 0.882 | 0.682 | 0.246 | 0.630 | ||
0.142 | 0.116 | 0.932 | 0.150 | 0.098 | 0.908 | 0.202 | 0.138 | 0.886 | ||
BJ-lgtc+godds | 0.392 | 0.282 | 0.792 | 1.182 | 0.320 | 0.354 | 1.766 | 0.146 | 0.070 | |
0.342 | 0.244 | 0.850 | 0.368 | 0.236 | 0.818 | 1.024 | 0.284 | 0.404 | ||
0.616 | 0.336 | 0.728 | 0.426 | 0.262 | 0.816 | 0.702 | 0.386 | 0.602 | ||
DR-RF(npar) | 0.342 | 0.248 | 0.814 | 1.378 | 0.246 | 0.246 | 1.838 | 0.138 | 0.044 | |
0.170 | 0.128 | 0.898 | 0.508 | 0.242 | 0.696 | 1.506 | 0.222 | 0.182 | ||
0.288 | 0.188 | 0.860 | 0.480 | 0.128 | 0.688 | 1.382 | 0.122 | 0.236 | ||
DR-RF+lgtc | 0.338 | 0.236 | 0.816 | 1.392 | 0.288 | 0.250 | 1.826 | 0.132 | 0.044 | |
0.166 | 0.122 | 0.916 | 0.524 | 0.248 | 0.692 | 1.518 | 0.210 | 0.168 | ||
0.270 | 0.190 | 0.864 | 0.544 | 0.138 | 0.660 | 1.426 | 0.144 | 0.222 | ||
DR-FG(true) | 0.366 | 0.266 | 0.818 | 1.362 | 0.268 | 0.274 | 1.830 | 0.134 | 0.046 | |
0.168 | 0.126 | 0.910 | 0.482 | 0.208 | 0.734 | 1.416 | 0.150 | 0.212 | ||
0.244 | 0.164 | 0.866 | 0.588 | 0.234 | 0.686 | 1.318 | 0.148 | 0.284 | ||
DR-lgtc+godds | 0.366 | 0.274 | 0.802 | 1.366 | 0.250 | 0.264 | 1.842 | 0.112 | 0.036 | |
0.126 | 0.100 | 0.920 | 0.510 | 0.236 | 0.722 | 1.518 | 0.222 | 0.180 | ||
0.236 | 0.160 | 0.854 | 0.578 | 0.174 | 0.648 | 1.404 | 0.116 | 0.236 |
High Sig | Med Sig | Low Sig | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
fitted-3 | NSP | PCSP | fitted-3 | NSP | PCSP | fitted-3 | NSP | PCSP | ||
IPCW1 | 0.154 | 0.110 | 0.892 | 1.058 | 0.184 | 0.422 | 1.842 | 0.078 | 0.064 | |
0.124 | 0.066 | 0.932 | 0.222 | 0.138 | 0.850 | 1.020 | 0.122 | 0.422 | ||
0.134 | 0.092 | 0.916 | 0.178 | 0.106 | 0.892 | 0.672 | 0.134 | 0.594 | ||
IPCW2 | 0.198 | 0.130 | 0.886 | 0.480 | 0.220 | 0.726 | 1.588 | 0.172 | 0.166 | |
0.118 | 0.076 | 0.928 | 0.120 | 0.090 | 0.904 | 0.666 | 0.170 | 0.614 | ||
0.116 | 0.086 | 0.916 | 0.162 | 0.094 | 0.898 | 0.632 | 0.176 | 0.626 | ||
BJ-RF(npar) | 0.192 | 0.128 | 0.906 | 0.394 | 0.208 | 0.774 | 1.506 | 0.192 | 0.214 |
0.120 | 0.070 | 0.924 | 0.178 | 0.126 | 0.894 | 0.366 | 0.162 | 0.784 | ||
0.136 | 0.086 | 0.918 | 0.182 | 0.114 | 0.894 | 0.190 | 0.136 | 0.868 | ||
BJ-RF+lgtc | 0.192 | 0.136 | 0.904 | 0.428 | 0.240 | 0.762 | 1.554 | 0.198 | 0.196 | |
0.086 | 0.052 | 0.934 | 0.170 | 0.120 | 0.898 | 0.362 | 0.162 | 0.792 | ||
0.142 | 0.090 | 0.918 | 0.174 | 0.108 | 0.900 | 0.200 | 0.140 | 0.870 | ||
BJ-FG(true) | 0.184 | 0.136 | 0.912 | 0.250 | 0.166 | 0.852 | 1.190 | 0.244 | 0.370 | |
0.110 | 0.072 | 0.950 | 0.186 | 0.134 | 0.880 | 0.122 | 0.086 | 0.906 | ||
0.082 | 0.066 | 0.956 | 0.104 | 0.074 | 0.942 | 0.098 | 0.058 | 0.936 | ||
BJ-lgtc+godds | 0.194 | 0.140 | 0.904 | 0.326 | 0.222 | 0.804 | 1.314 | 0.214 | 0.300 | |
0.196 | 0.126 | 0.902 | 0.226 | 0.146 | 0.862 | 0.270 | 0.152 | 0.828 | ||
0.526 | 0.190 | 0.704 | 0.350 | 0.192 | 0.818 | 0.218 | 0.146 | 0.852 | ||
DR-RF(npar) | 0.184 | 0.136 | 0.910 | 0.456 | 0.204 | 0.742 | 1.546 | 0.140 | 0.186 | |
0.118 | 0.066 | 0.934 | 0.134 | 0.094 | 0.908 | 0.626 | 0.148 | 0.654 | ||
0.102 | 0.068 | 0.938 | 0.088 | 0.056 | 0.944 | 0.498 | 0.140 | 0.692 | ||
DR-RF+lgtc | 0.158 | 0.116 | 0.918 | 0.482 | 0.202 | 0.722 | 1.568 | 0.146 | 0.180 | |
0.108 | 0.052 | 0.930 | 0.118 | 0.088 | 0.916 | 0.644 | 0.158 | 0.652 | ||
0.076 | 0.054 | 0.952 | 0.110 | 0.070 | 0.936 | 0.524 | 0.140 | 0.678 | ||
DR-FG(true) | 0.190 | 0.140 | 0.906 | 0.444 | 0.188 | 0.756 | 1.576 | 0.172 | 0.186 | |
0.084 | 0.050 | 0.956 | 0.124 | 0.080 | 0.916 | 0.552 | 0.134 | 0.676 | ||
0.096 | 0.066 | 0.946 | 0.078 | 0.054 | 0.950 | 0.522 | 0.128 | 0.686 | ||
DR-lgtc+godds | 0.206 | 0.140 | 0.904 | 0.440 | 0.186 | 0.744 | 1.552 | 0.192 | 0.170 | |
0.098 | 0.050 | 0.946 | 0.138 | 0.094 | 0.908 | 0.594 | 0.148 | 0.668 | ||
0.084 | 0.054 | 0.948 | 0.118 | 0.076 | 0.928 | 0.496 | 0.112 | 0.682 |
High Sig | Med Sig | Low Sig | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
fitted-3 | NSP | PCSP | fitted-3 | NSP | PCSP | fitted-3 | NSP | PCSP | ||
IPCW1 | 0.106 | 0.076 | 0.930 | 0.132 | 0.072 | 0.908 | 1.328 | 0.134 | 0.284 | |
0.066 | 0.042 | 0.962 | 0.134 | 0.114 | 0.904 | 0.208 | 0.116 | 0.864 | ||
0.092 | 0.054 | 0.936 | 0.070 | 0.052 | 0.952 | 0.124 | 0.088 | 0.916 | ||
IPCW2 | 0.088 | 0.072 | 0.946 | 0.092 | 0.070 | 0.934 | 0.646 | 0.170 | 0.650 | |
0.064 | 0.042 | 0.956 | 0.116 | 0.090 | 0.924 | 0.122 | 0.076 | 0.916 | ||
0.100 | 0.056 | 0.932 | 0.082 | 0.056 | 0.944 | 0.130 | 0.094 | 0.922 | ||
BJ-RF(npar) | 0.106 | 0.082 | 0.924 | 0.114 | 0.084 | 0.932 | 0.488 | 0.156 | 0.722 |
0.088 | 0.056 | 0.950 | 0.168 | 0.118 | 0.890 | 0.108 | 0.070 | 0.924 | ||
0.114 | 0.060 | 0.932 | 0.180 | 0.110 | 0.894 | 0.142 | 0.106 | 0.906 | ||
BJ-RF+lgtc | 0.126 | 0.102 | 0.922 | 0.104 | 0.078 | 0.938 | 0.458 | 0.152 | 0.738 | |
0.064 | 0.048 | 0.958 | 0.136 | 0.094 | 0.904 | 0.086 | 0.056 | 0.934 | ||
0.124 | 0.068 | 0.924 | 0.160 | 0.104 | 0.898 | 0.138 | 0.094 | 0.904 | ||
BJ-FG(true) | 0.108 | 0.088 | 0.932 | 0.088 | 0.056 | 0.946 | 0.218 | 0.100 | 0.866 | |
0.084 | 0.070 | 0.956 | 0.058 | 0.042 | 0.954 | 0.076 | 0.050 | 0.944 | ||
0.036 | 0.024 | 0.970 | 0.104 | 0.074 | 0.942 | 0.060 | 0.040 | 0.958 | ||
BJ-lgtc+godds | 0.112 | 0.084 | 0.922 | 0.110 | 0.082 | 0.934 | 0.302 | 0.120 | 0.820 | |
0.164 | 0.094 | 0.922 | 0.162 | 0.090 | 0.902 | 0.100 | 0.076 | 0.932 | ||
0.774 | 0.128 | 0.508 | 0.358 | 0.126 | 0.786 | 0.204 | 0.122 | 0.860 | ||
DR-RF(npar) | 0.088 | 0.070 | 0.942 | 0.102 | 0.072 | 0.934 | 0.556 | 0.168 | 0.692 | |
0.068 | 0.048 | 0.958 | 0.098 | 0.068 | 0.930 | 0.104 | 0.072 | 0.926 | ||
0.048 | 0.038 | 0.978 | 0.048 | 0.030 | 0.964 | 0.098 | 0.076 | 0.936 | ||
DR-RF+lgtc | 0.092 | 0.070 | 0.938 | 0.094 | 0.066 | 0.938 | 0.588 | 0.168 | 0.682 | |
0.048 | 0.034 | 0.962 | 0.100 | 0.068 | 0.932 | 0.090 | 0.064 | 0.934 | ||
0.040 | 0.032 | 0.982 | 0.034 | 0.024 | 0.974 | 0.098 | 0.074 | 0.938 | ||
DR-FG(true) | 0.080 | 0.066 | 0.940 | 0.068 | 0.048 | 0.954 | 0.532 | 0.164 | 0.706 | |
0.082 | 0.062 | 0.960 | 0.092 | 0.066 | 0.938 | 0.072 | 0.058 | 0.950 | ||
0.050 | 0.042 | 0.974 | 0.056 | 0.036 | 0.968 | 0.088 | 0.074 | 0.948 | ||
DR-lgtc+godds | 0.068 | 0.054 | 0.948 | 0.094 | 0.064 | 0.938 | 0.564 | 0.162 | 0.690 | |
0.054 | 0.036 | 0.962 | 0.114 | 0.078 | 0.934 | 0.074 | 0.054 | 0.938 | ||
0.048 | 0.022 | 0.970 | 0.040 | 0.024 | 0.972 | 0.062 | 0.052 | 0.948 |
High Sig | Med Sig | Low Sig | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
fitted-3 | NSP | PCSP | fitted-3 | NSP | PCSP | fitted-3 | NSP | PCSP | ||
IPCW1 | 0.168 | 0.116 | 0.914 | 0.506 | 0.132 | 0.670 | 1.286 | 0.102 | 0.272 | |
IPCW2 | 0.156 | 0.120 | 0.916 | 0.314 | 0.142 | 0.798 | 1.128 | 0.146 | 0.358 | |
BJ-RF(npar) | 0.234 | 0.186 | 0.878 | 0.304 | 0.158 | 0.810 | 1.128 | 0.246 | 0.364 | |
BJ-RF+lgtc | 0.250 | 0.196 | 0.882 | 0.290 | 0.158 | 0.824 | 1.110 | 0.244 | 0.360 | |
BJ-FG(true) | 0.156 | 0.116 | 0.910 | 0.194 | 0.142 | 0.892 | 0.348 | 0.182 | 0.790 | |
BJ-lgtc+godds | 0.358 | 0.246 | 0.844 | 0.364 | 0.266 | 0.814 | 0.788 | 0.328 | 0.532 | |
DR-RF(npar) | 0.150 | 0.120 | 0.924 | 0.272 | 0.138 | 0.828 | 1.156 | 0.146 | 0.334 | |
DR-RF+lgtc | 0.152 | 0.126 | 0.912 | 0.264 | 0.120 | 0.834 | 1.134 | 0.178 | 0.342 | |
DR-FG(true) | 0.132 | 0.090 | 0.932 | 0.194 | 0.092 | 0.880 | 1.050 | 0.170 | 0.408 | |
DR-lgtc+godds | 0.120 | 0.092 | 0.926 | 0.286 | 0.144 | 0.846 | 1.044 | 0.146 | 0.378 | |
IPCW1 | 0.132 | 0.084 | 0.916 | 0.182 | 0.132 | 0.874 | 0.558 | 0.164 | 0.658 | |
IPCW2 | 0.124 | 0.082 | 0.932 | 0.138 | 0.100 | 0.906 | 0.282 | 0.136 | 0.830 | |
BJ-RF(npar) | 0.142 | 0.090 | 0.904 | 0.148 | 0.088 | 0.908 | 0.226 | 0.128 | 0.878 | |
BJ-RF+lgtc | 0.130 | 0.082 | 0.908 | 0.156 | 0.094 | 0.908 | 0.210 | 0.124 | 0.880 | |
BJ-FG(true) | 0.092 | 0.068 | 0.940 | 0.102 | 0.064 | 0.934 | 0.118 | 0.082 | 0.918 | |
BJ-lgtc+godds | 0.250 | 0.122 | 0.848 | 0.156 | 0.080 | 0.896 | 0.154 | 0.104 | 0.896 | |
DR-RF(npar) | 0.066 | 0.052 | 0.958 | 0.092 | 0.056 | 0.940 | 0.216 | 0.118 | 0.856 | |
DR-RF+lgtc | 0.040 | 0.030 | 0.966 | 0.092 | 0.052 | 0.938 | 0.230 | 0.106 | 0.834 | |
DR-FG(true) | 0.058 | 0.044 | 0.966 | 0.092 | 0.058 | 0.942 | 0.176 | 0.076 | 0.874 | |
DR-lgtc+godds | 0.040 | 0.030 | 0.970 | 0.082 | 0.056 | 0.950 | 0.214 | 0.102 | 0.852 | |
IPCW1 | 0.080 | 0.036 | 0.940 | 0.122 | 0.096 | 0.928 | 0.088 | 0.074 | 0.936 | |
IPCW2 | 0.068 | 0.052 | 0.948 | 0.084 | 0.046 | 0.946 | 0.094 | 0.074 | 0.938 | |
BJ-RF(npar) | 0.110 | 0.062 | 0.918 | 0.112 | 0.074 | 0.918 | 0.096 | 0.068 | 0.934 | |
BJ-RF+lgtc | 0.080 | 0.052 | 0.938 | 0.126 | 0.084 | 0.912 | 0.094 | 0.064 | 0.936 | |
BJ-FG(true) | 0.072 | 0.046 | 0.954 | 0.088 | 0.060 | 0.940 | 0.086 | 0.070 | 0.954 | |
BJ-lgtc+godds | 0.270 | 0.078 | 0.818 | 0.140 | 0.076 | 0.904 | 0.100 | 0.066 | 0.928 | |
DR-RF(npar) | 0.050 | 0.028 | 0.976 | 0.068 | 0.046 | 0.952 | 0.098 | 0.068 | 0.928 | |
DR-RF+lgtc | 0.038 | 0.022 | 0.978 | 0.056 | 0.044 | 0.956 | 0.104 | 0.078 | 0.930 | |
DR-FG(true) | 0.030 | 0.022 | 0.978 | 0.056 | 0.034 | 0.966 | 0.058 | 0.042 | 0.958 | |
DR-lgtc+godds | 0.028 | 0.016 | 0.984 | 0.060 | 0.044 | 0.952 | 0.082 | 0.056 | 0.944 |
A.6 Tree results in data analysis
In this subsection, we show various trees built for the RTOG9410 study described in Section 5. Numbers in each terminal node represent terminal node estimators (i.e., CIF estimates) generated by the loss function used to build the corresponding tree.