Bayesian modelling of response to therapy and drug-sensitivity in acute lymphoblastic leukemia
Abstract
Acute lymphoblastic leukemia (ALL) is a heterogeneous haematologic malignancy involving the abnormal proliferation of immature lymphocytes and is the most common paediatric cancer. The management of ALL in children has improved greatly in recent decades, thanks to a better understanding of the disease and to improved treatment strategies validated in clinical trials. Common therapy regimens involve a first course of chemotherapy (induction phase), followed by treatment with a combination of anti-leukemia drugs. A measure of efficacy early in the course of therapy is the presence of minimal residual disease (MRD). MRD quantifies residual tumor cells and indicates the effectiveness of the treatment over the course of therapy. MRD positivity is defined for values of MRD greater than 0.01%, yielding left-censored MRD observations. We propose a Bayesian model to study the relationship between patient features (leukemia subtype, baseline characteristics, and drug sensitivity profile) and MRD observed at two time points during the induction phase. Specifically, we model the observed MRD values via an auto-regressive model, accounting for left-censoring of the data and for the fact that some patients are already in remission after the first stage of induction therapy. Patient characteristics are included in the model via linear regression terms. In particular, patient-specific drug sensitivity based on ex vivo assays of patient samples is exploited to identify groups of subjects with similar profiles. We include this information as a covariate in the model for MRD. We adopt horseshoe priors for the regression coefficients to perform variable selection and identify important covariates. We fit the proposed approach to data from three prospective paediatric ALL clinical trials carried out at the St. Jude Children’s Research Hospital.
Our results highlight that drug sensitivity profiles and leukemic subtypes play an important role in the response to induction therapy as measured by serial MRD measures.
keywords: conditional modelling, left censoring, Lethal Concentration 50, minimal residual disease, mixture models
1 Introduction
Acute lymphoblastic leukemia (ALL) accounts for around 25% of pediatric cancers (up to 18 years of age) and is the most common cancer among children [18]. The cure rate for children with ALL has improved greatly over the past few decades. Ten-year survival for childhood ALL has risen from around 11% in the first half of the 1960s to more than 90% by 2010, thanks to advances in the understanding of the disease and to clinical trials that have greatly improved outcomes [24, 17]. Past research has revealed cytogenetic characteristics of patients’ leukemic blast cells that are associated with prognosis [3]. For example, children with ETV6-RUNX1-positive ALL have a better outcome from chemotherapy than those who are negative for this fusion gene [11]. There are now many ALL subtypes defined by mutations, gene fusions, and transcriptomic profiles, with some directly informing new treatment strategies [14]. In addition to genomic determinants of prognosis, pharmacokinetic variability, some of which is influenced by pharmacogenomics, contributes to heterogeneity in treatment response and toxicity, both systemically and at the cellular level [28, 10].
The general treatment strategy for leukemia and many other hematologic malignancies is to first reduce cancer cell burden substantially, with the objective of inducing clinical remission. This initial phase of a chemotherapeutic regimen is called the induction phase. The induction phase often includes sequential treatment with a combination of cytotoxic and/or targeted-therapies. The strategy uses intermediate indicators of treatment response after a first course of the drugs and adapts subsequent drug doses and combinations accordingly. The goal is to achieve remission by the end of the induction phase, without which the patient’s prognosis is much poorer.
In ALL, the absence of minimal residual disease (MRD) has served as an early indicator of benefit from the anti-leukemia chemotherapy for decades [7]. Methodologies for MRD measurement detect and quantify residual tumor cells beyond the sensitivity level of cytomorphology. MRD assessment can be refined by the evaluation of additional genomic markers [9]. Nowadays the clinical impact of MRD is widely accepted, and MRD is considered the most important prognostic factor in the management of ALL. The prognostic significance of MRD can vary, depending on the timing in which it is measured. Early assessment of MRD during and at the end of the induction phase is often used for clinical decision making when treating patients with ALL [5].
Assays for MRD have a lower limit of detection. A patient is defined as MRD negative if the patient’s MRD levels fall below a detection threshold, typically less than 1/10000 cells. It is common in clinical practice to adopt a threshold of 0.01% to define MRD positivity. This value represents a limit of detection by flow cytometry, and retrospective analyses have shown that patients whose MRD > 0.01% after induction therapy have a greater risk of relapse and poorer prognosis [4, 2, 15]. It is possible to achieve a routine sensitivity of 0.001% by PCR in clinical samples, and similar sensitivity may be achieved by flow cytometry with specific B-cell ALL subtypes [26].
The study population for this work comprises patients who participated in three clinical trials at the St. Jude Children’s Research Hospital. The treatment regimens call for MRD assessments on day 15 and day 42 (end of induction) to determine further treatment. Patients with MRD of 1% or higher on day 15 receive more intensive therapy for the remainder of the induction phase. Further intensification is reserved for patients with 5% or more leukemic cells on day 15. Patients with so-called standard-risk ALL who have MRD of 0.01% or higher on day 42 are reclassified as high-risk ALL.
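The thresholds described above can be summarised as a small decision rule. The sketch below is only a simplified reading of this paragraph, not the trial protocols themselves; the function name, argument names, and action labels are hypothetical.

```python
def induction_actions(mrd_day15, mrd_day42, standard_risk=True):
    """Return the actions implied by the MRD thresholds quoted in the text.

    MRD values are percentages. This is a simplified illustration, not the
    actual Total Therapy protocol logic.
    """
    actions = []
    if mrd_day15 >= 1.0:
        # MRD of 1% or higher on day 15: more intensive therapy.
        actions.append("more intensive induction")
    if mrd_day15 >= 5.0:
        # 5% or more leukemic cells on day 15: further intensification.
        actions.append("further intensification")
    if standard_risk and mrd_day42 >= 0.01:
        # Standard-risk patients with day 42 MRD of 0.01% or higher.
        actions.append("reclassify as high risk")
    return actions
```

For example, `induction_actions(7.0, 0.02)` returns all three actions, while a patient with day 15 MRD of 0.5% and undetectable day 42 MRD triggers none.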
Additional information on the participants in the studies, collected at the beginning of the treatment, is available, such as age, gender, white blood cell count (WBC), and ALL subtype, as well as results of an ex vivo drug sensitivity screening to characterize the patients’ sensitivity profiles to anti-leukemic drugs.
The aim of this work is to investigate the relationship between patients’ leukemia subtype, leukemic cell drug sensitivity, and clinical benefit as measured by the early marker of treatment effect (MRD). To this end, we specify a joint Bayesian model for the MRD outcomes collected at days 15 and 42 that accounts for censoring and for time dependence between the MRD measurements. The model combines several features and allows full Bayesian inference. First, it allows for the presence of leukemic cells below the assay’s lower limit of detection by treating MRD assessments as left censored if recorded as below 0.01%. Second, it considers MRD assessments on days 15 and 42 jointly via an autoregressive model. For day 42 MRD, inference is conditional on the presence of MRD on day 15, since no patient who is MRD negative on day 15 becomes MRD positive on day 42. This conditional autoregressive model allows for clearer interpretation of covariate effects. Third, the model includes Bayesian clustering as part of posterior fitting. The clustering of patients is based on ex vivo measurements of the sensitivity of each patient’s own leukemic cells to the anticancer drugs the patient receives during induction. The investigation also includes the genomic ALL subtype of each patient. A final component of the model is the use of the horseshoe prior [6] for selecting important covariates from the large number of features examined. This class of prior distributions for the regression coefficients provides a way to identify important covariates: its density has a singularity at zero, which shrinks small coefficients towards zero more aggressively than other standard prior distributions, while its heavy tails leave important larger coefficients essentially unaffected.
A performance comparison between the horseshoe prior and the spike-and-slab approach of [12] is offered in [6], showing that the two approaches lead to consistent posterior variable selection results. For further discussion on prior elicitation for the horseshoe distribution, see [20].
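For orientation, the horseshoe prior of [6] can be written as a scale mixture of normals. The display below uses generic symbols ($\beta_j$, $\lambda_j$, $\tau$, $\kappa_j$) that are illustrative and not necessarily the notation or the exact hyperprior used in this paper, which follows [20]:

$$
\beta_j \mid \lambda_j, \tau \sim \mathrm{N}(0, \lambda_j^2 \tau^2), \qquad \lambda_j \sim \mathrm{C}^{+}(0, 1), \qquad j = 1, \ldots, p,
$$

where $\mathrm{C}^{+}(0, 1)$ is the standard half-Cauchy distribution and $\tau$ is a global scale. With $\tau = 1$ and unit error variance, the shrinkage weight $\kappa_j = 1/(1 + \lambda_j^2)$ has a $\mathrm{Beta}(1/2, 1/2)$ distribution, whose mass near $\kappa_j = 1$ (strong shrinkage) and near $\kappa_j = 0$ (no shrinkage) gives the prior its name.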
2 ALL Dataset
The motivating application consists of data for patients treated in three prospective pediatric ALL clinical trials at the St. Jude Children’s Research Hospital. These studies were part of the long-standing Total Therapy clinical trials program at the institution, initiated in 1962, comprising clinical trials that continue to build sequentially on each other [23]. These studies have shown the benefit of full-dose chemotherapy and treatment directed at the central nervous system for treating pediatric ALL, with greatly improved outcomes over the years, currently achieving event-free survival that exceeds 90% [24].
Our data set includes MRD measurements for childhood ALL patients in Total Therapy studies XV, XVI, and XVII (sample sizes are reported in Table 1). The treatment regimens for the studies include many of the same drugs during the six-week induction phase. Several anti-leukemic agents are common to the three studies: prednisone, vincristine, daunorubicin, PEG-asparaginase, methotrexate, cyclophosphamide, cytarabine, and mercaptopurine. MRD measurements are made on day 15 and at the end of the remission induction phase of treatment (day 42).
The detection limit for MRD is 0.01% (1 leukemia cell among 10,000 normal cells in the bone marrow), below which the exact values are not considered accurately measurable. Figure 3 displays the observed MRD values at days 15 and 42 (on a logarithmic scale). Values of MRD smaller than 0.01% are therefore censored and are jittered around 0.001 in the figure to improve presentation. The detection limit for MRD poses a modelling challenge, which we address in this work. The day 15 MRD assessment determines subsequent treatment during the remainder of the induction phase. Additionally, patients whose leukemic cells exhibit genomic variants for which targeted agents are available receive these agents (e.g., a tyrosine kinase inhibitor for patients whose ALL cells harbor a chromosomal translocation creating a BCR/ABL fusion). In our analysis, we focus on the drugs that all patients receive.
Additional information recorded at entry includes age, gender, white blood cell count (WBC), ALL subtype, and treatment protocol number. Characteristics of the patients that are available across the three studies are summarised in Table 1. The median age of the children is between 4.9 years and 5.6 years, typical for childhood ALL. There are slightly more male than female patients. Baseline white blood cell counts are somewhat lower in Total XVI than in the other two studies.
The genomic subtype categories are shown in Figure 4. In the data set, 23 subtypes are observed, with the most common subtypes being hyperdiploid and ETV6-RUNX1, typical for childhood ALL. As noted in Section 1, the fusion product ETV6-RUNX1 confers a favorable prognosis. Hyperdiploid ALL is also a common subtype that is considered to have a more favorable prognosis [19]. We merged several subtypes that included fewer than 10 subjects each to preserve degrees of freedom for this analysis. For example, we merged the Ph-like CRLF2 and the Ph-like non-CRLF2 subtypes [25]. The analysis included 12 distinct subtypes. The detected MRD levels vary across genomic categories, as shown in Figure 1, which displays the measured MRD values on a logarithmic scale at each time point and across subtypes.

The data set includes estimates of patient-specific cancer cell sensitivities to various drugs in the combination chemotherapeutic regimen. The estimates arise from ex vivo dilution assays of patient samples. In essence, the patient’s leukemic cells are plated with each drug via dilution assays to determine sensitivity. For each drug concentration, a count of the number of surviving cells provides an estimate of the lethality of that concentration to the patient’s leukemia cells. From the roughly six concentrations per drug, an estimated LC50, the concentration that led to 50% of the cells dying, is used as the measure of the sensitivity of each patient’s disease to the specific drug. We included five anti-leukemia drugs in these analyses, selected after excluding the compounds for which the missing rate is more than 40%. The compounds used for this analysis are asparaginase, prednisone, vincristine, 6TG and 6MP. The entire ALL pharmacotype dataset was previously published [16]. To highlight the relationship between the observed MRD values and the estimated LC50s, in Figure 2 we present the Pearson correlations between these two variables for each of the five compounds and for each subtype category, averaged over 100 imputed LC50 data sets (see Section 3).
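To make the notion of an LC50 concrete, the following sketch interpolates the concentration at which the surviving fraction crosses 50%. The text does not state how the LC50s were actually estimated, so the log-linear interpolation, the function name, and the input format are all assumptions made for illustration.

```python
import math


def lc50_by_interpolation(concentrations, survival_fractions):
    """Interpolate the concentration at which survival crosses 0.5.

    Interpolation is linear in survival and in log10 concentration between
    the two tested concentrations that bracket 50% survival. Returns None
    when 50% survival is not bracketed by the tested range. This is an
    illustrative assumption, not the estimation method used for the data.
    """
    pairs = sorted(zip(concentrations, survival_fractions))
    for (c_lo, s_lo), (c_hi, s_hi) in zip(pairs, pairs[1:]):
        if s_lo >= 0.5 >= s_hi and s_lo != s_hi:
            weight = (s_lo - 0.5) / (s_lo - s_hi)
            log_lc50 = math.log10(c_lo) + weight * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_lc50
    return None
```

A return value of `None` corresponds to the situation in which the 50% point falls outside the tested concentration range, so that the LC50 is only known to lie below or above that range.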

Estimates of LC50s for patients are not available for some medications if the number of available cells was not sufficient to test all medications. Therefore, we applied multiple imputation for the missing observations. We impute missing data via the R package mice [27].
Table 1: Characteristics of the patients in the three studies.
| | Total XV (N = 192) | Total XVI (N = 428) | Total XVII (N = 168) | Overall (N = 788) |
| Age (years) | | | | |
| Mean (SD) | 6.45 (4.36) | 7.13 (4.82) | 6.89 (4.60) | 6.92 (4.67) |
| Median [Min, Max] | 4.89 [1.02, 18.7] | 5.63 [0.120, 18.9] | 5.58 [1.00, 18.5] | 5.42 [0.120, 18.9] |
| Gender | | | | |
| Female | 93 (48.4%) | 182 (42.5%) | 81 (48.2%) | 356 (45.2%) |
| Male | 99 (51.6%) | 246 (57.5%) | 87 (51.8%) | 432 (54.8%) |
| WBC at diagnosis | | | | |
| Mean (SD) | 56.1 (105) | 46.3 (89.5) | 59.4 (105) | 51.5 (96.8) |
| Median [Min, Max] | 17.4 [1.20, 1010] | 14.2 [0.700, 638] | 20.2 [1.30, 730] | 16.6 [0.700, 1014] |
| MRD, % (day 15) | | | | |
| Mean (SD) | 3.50 (10.7) | 4.67 (13.3) | 3.62 (11.7) | 4.16 (12.4) |
| Median [Min, Max] | 0.0295 [0, 72.1] | 0.116 [0, 86.8] | 0.0320 [0, 80.7] | 0.0500 [0, 86.8] |
| MRD, % (day 42) | | | | |
| Mean (SD) | 0.460 (2.90) | 0.131 (0.846) | 0.167 (1.96) | 0.219 (1.81) |
| Median [Min, Max] | 0 [0, 36.0] | 0 [0, 12.2] | 0 [0, 25.4] | 0 [0, 36.0] |
| MRD cat. (day 15) | | | | |
| | 69 (35.9%) | 129 (30.1%) | 65 (38.7%) | 263 (33.4%) |
| | 83 (43.2%) | 161 (37.6%) | 58 (34.5%) | 302 (38.3%) |
| | 13 (6.8%) | 76 (17.8%) | 25 (14.9%) | 114 (14.5%) |
| | 27 (14.1%) | 62 (14.5%) | 20 (11.9%) | 109 (13.8%) |
| MRD cat. (day 42) | | | | |
| | 152 (79.2%) | 365 (85.3%) | 149 (88.7%) | 666 (84.5%) |
| | 27 (14.1%) | 49 (11.4%) | 18 (10.7%) | 94 (11.9%) |
| | 13 (6.8%) | 14 (3.3%) | 1 (0.6%) | 28 (3.6%) |
The main goals of the statistical model are: (i) to characterize the MRD responses at both time points accounting for their time dependence and the censoring; (ii) to investigate the effect of baseline and in-trial covariate information on the MRD levels; (iii) to identify subgroups of patients presenting similar sensitivity to drug exposures and (iv) to investigate the relationship between the clustering structure and drug-subtype interactions.


3 Model specification
Data are available for $n = 788$ children, for whom both baseline and in-trial information are recorded. Covariates included in the analysis are age, gender, WBC at diagnosis, ALL subtype and therapy protocol, as well as drug sensitivity profiles obtained from ex vivo dilution assays. The regression model includes eleven indicator variables for the twelve ALL subtypes. For each patient, at most one of these eleven indicators equals 1. The baseline category is ETV6-RUNX1, the most prevalent subtype. If a patient is positive for ETV6-RUNX1, all eleven subtype indicators are 0.
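The indicator coding of the subtypes can be sketched as follows. The helper name and the three-level example are hypothetical; the analysis itself uses twelve subtype categories, of which only the reference level ETV6-RUNX1 is fixed by the text.

```python
def subtype_indicators(subtype, levels, baseline="ETV6-RUNX1"):
    """Code a subtype as 0/1 indicators, one per non-baseline level.

    The baseline subtype maps to a vector of zeros, so with twelve levels
    the function returns eleven indicators.
    """
    if subtype not in levels:
        raise ValueError(f"unknown subtype: {subtype}")
    non_baseline = [level for level in levels if level != baseline]
    return [1 if subtype == level else 0 for level in non_baseline]


# Hypothetical three-level example (the analysis uses twelve levels).
levels = ["ETV6-RUNX1", "Hyperdiploid", "Ph-like"]
baseline_row = subtype_indicators("ETV6-RUNX1", levels)
other_row = subtype_indicators("Ph-like", levels)
```

With this coding, each regression coefficient of a subtype indicator is interpreted as a difference in mean response relative to ETV6-RUNX1.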
The ALL data set presents two main statistical challenges: (i) censoring of the MRD levels due to the assay’s lower limit of detection (i.e., 0.01%); (ii) estimation of each patient’s drug sensitivity profile. As stated above, MRD values above the threshold 0.01% are observed. We chose to treat the lack of MRD as a censored observation instead, allowing for the fact that there may still be residual disease but at an undetectable level. This consideration is particularly important when introducing temporal dependence. In the likelihood, censoring can be accounted for using standard strategies from survival analysis and censored observations are imputed as part of the Markov chain Monte Carlo (MCMC) algorithm.
We account for the repeated measures aspect of the MRD assessments by including an autoregressive term (of order 1) to model dependence between the MRD at day 15 and day 42, only for those observations for which MRD at day 15 is above the detection threshold. Otherwise, the two time points are treated as conditionally independent. We opt for this strategy because, in the data set, if MRD at day 15 is censored (i.e., below 0.01%), then MRD at day 42 also falls below the detection threshold (see Figure 3). Introducing dependence among the observations censored at both times would bias the estimate of the temporal effect as well as of the covariate effects, since the statistical model includes covariate information via a linear regression term.
Another important feature of our modelling strategy is the inclusion of a patient-specific measure of the patient’s leukemic cells’ sensitivity to individual drugs in our model of MRD. Drug sensitivity profiles for each patient are estimated from ex vivo drug sensitivity assays. The model includes a finite mixture model for the profiles to allow the data to perhaps determine categories that relate to overall sensitivity or resistance to one or more of the anticancer drugs. A major advantage of the Bayesian framework is that it allows for joint estimation of the MRD model and the drug sensitivity profiles, enabling a probabilistically sound quantification and propagation of uncertainty.
We describe in detail the statistical strategy below.
Modelling censored MRD observations
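Let $Y_{i1}$ and $Y_{i2}$ denote the minimal residual disease of subject $i$ measured at day 15 and day 42 from entry into the study, respectively, for $i = 1, \ldots, n$. Let $\boldsymbol{Y}_i = (Y_{i1}, Y_{i2})$. Let $f_t(\cdot \mid \boldsymbol{x}_i, \boldsymbol{z}_i, \boldsymbol{\theta})$ be the conditional density function of the $i$-th observation at time $t$ ($t$ = 1, 2), given the covariates $\boldsymbol{x}_i$, the drug sensitivity profile $\boldsymbol{z}_i$ and the vector of parameters $\boldsymbol{\theta}$. Similarly, let $F_t(\cdot \mid \boldsymbol{x}_i, \boldsymbol{z}_i, \boldsymbol{\theta})$ be the conditional distribution function corresponding to the density $f_t$, so that $F_t(y \mid \boldsymbol{x}_i, \boldsymbol{z}_i, \boldsymbol{\theta})$ is the probability of the response falling below a value $y$. Recall that the MRD values below 0.01 are censored at the lower detection level. Let $c$ be this threshold value. The probability of observing a value lower than the detection threshold is, therefore:
$$
P(Y_{it} < c \mid \boldsymbol{x}_i, \boldsymbol{z}_i, \boldsymbol{\theta}) = F_t(c \mid \boldsymbol{x}_i, \boldsymbol{z}_i, \boldsymbol{\theta}). \tag{1}
$$
We assume that $\boldsymbol{Y}_i$ and $\boldsymbol{Y}_j$, $i \neq j$, are conditionally independent given covariate information, the drug sensitivity profiles, and the parameters in the model. Let $\delta_{it}$ be the left-censoring indicator at time $t$ for subject $i$, where $\delta_{it} = 1$ if the corresponding MRD value is below the detection threshold (i.e., censored) and $\delta_{it} = 0$ if it is observed. The likelihood of the model is then:
$$
L(\boldsymbol{\theta}) = \prod_{i=1}^{n} \prod_{t=1}^{2} f_t(y_{it} \mid \boldsymbol{x}_i, \boldsymbol{z}_i, \boldsymbol{\theta})^{1 - \delta_{it}} \, F_t(c \mid \boldsymbol{x}_i, \boldsymbol{z}_i, \boldsymbol{\theta})^{\delta_{it}}. \tag{2}
$$
We specify a Normal distribution as $f_t$ to model the MRD observations and assume
$$
\begin{aligned}
Y_{it} \mid \mu_{it}, \sigma_t^2 &\sim \mathrm{N}(\mu_{it}, \sigma_t^2), \quad t = 1, 2, \\
\mu_{i1} &= \boldsymbol{x}_i^\top \boldsymbol{\beta}_1 + \boldsymbol{z}_i^\top \boldsymbol{\gamma}_1, \\
\mu_{i2} &= \rho\, y_{i1} + \boldsymbol{x}_i^\top \boldsymbol{\beta}_2 + \boldsymbol{z}_i^\top \boldsymbol{\gamma}_2 \quad \text{if } \delta_{i1} = 0, \\
(\boldsymbol{\beta}_t, \boldsymbol{\gamma}_t) &\sim \mathrm{HS}, \\
\sigma_t^2 &\sim \text{inv-gamma}(a, b),
\end{aligned} \tag{3}
$$
where $\mathrm{N}(\mu, \sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$, HS represents the horseshoe prior [6] for the regression parameters following the specification in [20], and $\text{inv-gamma}(a, b)$ is the inverse Gamma distribution with mean $b/(a-1)$ and variance $b^2/\{(a-1)^2(a-2)\}$. The HS prior belongs to the family of continuous shrinkage prior distributions, characterized by a singularity at zero, but whose fat tails still allow for large values of the coefficients. For each subject $i$, the covariate vector $\boldsymbol{x}_i$ contains information on age, gender, WBC (on a logarithmic scale), ALL subtype and therapy protocol. The variable $\boldsymbol{z}_i$ contains information on the drug sensitivity profile (i.e., cluster membership) of the $i$-th subject and is based on the results of the ex vivo assays. Estimation of $\boldsymbol{z}_i$ is a key component of the proposed modelling strategy and will be discussed later in more detail.
Note that the specification of the mean terms $\mu_{i1}$ and $\mu_{i2}$ is different. For day 15, $\mu_{i1}$ contains standard regression terms on patient covariates, including ALL subtype and the clusters from the sensitivity profiles. At time $t = 2$, however, the mean term contains an auto-regressive term ($\rho\, y_{i1}$) along with regression coefficients, and estimation only occurs if the MRD level at time $t = 1$ is not censored (i.e., $\delta_{i1} = 0$). By doing so, the treatment effect at day 15 is taken into account only if the patient has detectable MRD on day 15 (right quadrant of Figure 3). Indeed, in our data set, if MRD is not detected at day 15, it is also not detected at day 42. The regression results would otherwise be biased by the imputation of the censored observations (which do not contain much information on covariate effects at day 42).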
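The structure of the censored, conditionally auto-regressive likelihood can be illustrated with a short sketch for a single subject. Several choices here are assumptions made for illustration rather than details confirmed by the text: the response is taken on the log10 scale, a subject censored at day 15 contributes only the day 15 censoring probability (the day 42 term is skipped), and all function and argument names are hypothetical.

```python
import math
from statistics import NormalDist

# Detection threshold of 0.01% on the log10 scale (the scale is an assumption).
LOG_THRESHOLD = math.log10(0.01)


def subject_loglik(y1, y2, mu1, reg2, rho, sd1, sd2, c=LOG_THRESHOLD):
    """Log-likelihood contribution of one subject.

    y1, y2: responses at days 15 and 42, or None when left-censored.
    mu1: day 15 mean; reg2: day 42 regression part without the AR term.
    """
    day15 = NormalDist(mu1, sd1)
    if y1 is None:
        # Censored at day 15: contribute P(Y1 < c). Skipping the day 42
        # term for these subjects is an assumption of this sketch.
        return math.log(day15.cdf(c))
    loglik = math.log(day15.pdf(y1))
    # Day 42 mean includes the auto-regressive term rho * y1.
    day42 = NormalDist(rho * y1 + reg2, sd2)
    if y2 is None:
        loglik += math.log(day42.cdf(c))
    else:
        loglik += math.log(day42.pdf(y2))
    return loglik
```

Observed values enter through the density and censored values through the distribution function evaluated at the threshold, mirroring the two factors of the likelihood. In a sampler, the censored responses would instead be imputed from the corresponding truncated Normal distributions.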
Drug sensitivity profiles
In this section we describe how the results from the ex vivo drug sensitivity assays (see Section 2) are used to estimate drug sensitivity profiles. Here drug sensitivity is estimated by the concentration of a drug that kills half of the tested cells in culture (LC50). Let $\boldsymbol{w}_i$ be patient $i$’s vector of LC50 values for the panel of $D = 5$ anti-leukemic drugs. Our goal is to cluster individuals based on their cancer cells’ sensitivity to these five drugs. We perform an initial analysis of the drug sensitivity profiles to determine how many clusters are supported by the data. Because the missing rate is high (between 19.43% and 39.48%), we repeat the imputation procedure 100 times, producing 100 imputed data sets. We apply clustering to each data set and analyze the results to check that the clustering is robust to the imputation. Specifically, we cluster the subjects in each imputed data set using the popular $k$-means method.
Figure 5 shows the total within-cluster sum of squares (WSS) obtained for different numbers of pre-specified clusters $K$ (elbow plot). The figure shows the results for each imputed data set in grey, with the average across the 100 imputed data sets in red. It is common practice to select the point where the curve of WSS values in the elbow plot begins to plateau as the number of clusters increases. In our case, we set $K = 3$. (We examined larger values of $K$, but the results were similar.) For the main analysis, we select the imputed data set with the lowest WSS value when $K = 3$. In summary, this exploratory analysis is employed to determine the number of clusters for the different drug-sensitivity profiles in the data, as well as to choose an imputed data set on which to perform the main analysis.
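The elbow computation can be sketched with a plain implementation of Lloyd's k-means algorithm. The toy two-dimensional data, the single random initialisation, and the function name are assumptions made for illustration; the analysis in the paper is run on the five-dimensional imputed LC50 data and was presumably carried out with standard R tools.

```python
import random


def kmeans_wss(points, k, n_iter=50, seed=0):
    """Run Lloyd's algorithm and return the total within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Update step: centers become group means (empty groups keep theirs).
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Toy data: three simulated groups in two dimensions (illustrative only).
rng = random.Random(1)
blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)) for cx, cy in blobs for _ in range(30)]
wss = {k: kmeans_wss(points, k) for k in range(1, 6)}
```

Plotting `wss` against `k` gives the elbow plot. In the setting described above, this computation would be repeated for each of the 100 imputed data sets, ideally with several random restarts per value of `k`, since a single run of Lloyd's algorithm can stop at a local optimum.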

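The selected imputed data set is then modelled via a finite mixture with $K = 3$ components. Writing $\boldsymbol{w}_i$ for the vector of LC50 values of patient $i$ and $D$ for the number of drugs, we assume:
$$
\begin{aligned}
\boldsymbol{w}_i \mid \boldsymbol{\pi}, \boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_K, \boldsymbol{\Sigma}_1, \ldots, \boldsymbol{\Sigma}_K &\sim \sum_{k=1}^{K} \pi_k \, \mathrm{N}_D(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \quad i = 1, \ldots, n, \\
\boldsymbol{\pi} = (\pi_1, \ldots, \pi_K) &\sim \mathrm{Dir}(\boldsymbol{\alpha}),
\end{aligned} \tag{4}
$$
where $\mathrm{N}_D(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is the $D$-dimensional multivariate Normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, while $\mathrm{Dir}(\boldsymbol{\alpha})$ is the Dirichlet distribution with parameter vector $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_K)$. We set $\alpha_1 = \cdots = \alpha_K$, corresponding to uniform prior probabilities. By fitting a mixture model to the imputed data set, we also obtain an estimate of cluster membership for each individual. That is, using the latent variable representation of a mixture model, we can estimate $s_i \in \{1, \ldots, K\}$, which denotes the mixture component to which the $i$-th observation belongs. Clusters correspond to distinct data-driven drug sensitivity profiles in our statistical model. A patient’s cluster membership is included in model (3) through the regression terms $\boldsymbol{z}_i^\top \boldsymbol{\gamma}_t$, for $i = 1, \ldots, n$ and $t = 1, 2$, with cluster membership coded via a vector of dummy variables $\boldsymbol{z}_i$ of dimension $K - 1$. Since $K = 3$, we set the reference category to be $\boldsymbol{z}_i = (0, 0)$, representing membership to Cluster 1, while $\boldsymbol{z}_i = (1, 0)$ and $\boldsymbol{z}_i = (0, 1)$ denote membership to Cluster 2 and Cluster 3, respectively. We stress the fact that models (3) and (4) are estimated simultaneously, with $\boldsymbol{z}_i$ random and updated within the MCMC algorithm jointly with the remaining parameters. The full model is summarised in Figure 6.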
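The link between the mixture allocation and the regression covariate can be sketched in two small helpers. The first computes the standard conditional allocation probabilities of a finite mixture, proportional to weight times component density, from log-scale inputs that are assumed to be computed elsewhere; the second applies the dummy coding with Cluster 1 as reference. Both function names are hypothetical.

```python
import math


def allocation_probabilities(log_weights, log_densities):
    """Normalise weight * density across components using log-sum-exp.

    log_weights[k] is the log mixture weight of component k and
    log_densities[k] the log density of the observation under component k.
    """
    scores = [lw + ld for lw, ld in zip(log_weights, log_densities)]
    top = max(scores)
    unnormalised = [math.exp(s - top) for s in scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def cluster_dummies(label, n_clusters=3):
    """Dummy-code a cluster label in 1..n_clusters with Cluster 1 as reference."""
    return [1 if label == k else 0 for k in range(2, n_clusters + 1)]
```

In an MCMC sweep, a new label would be drawn from `allocation_probabilities(...)` for each patient and then passed through `cluster_dummies` before updating the regression part of the model, which is what propagates clustering uncertainty into the MRD regression.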
4 Posterior inference
Posterior inference under the proposed model is obtained via JAGS [22], interfaced with R via the R package rjags [21]. We run the MCMC chain for 15000 iterations, discarding the first 5000 as burn-in and thinning the remainder by retaining every second draw.
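As a quick check of the resulting posterior sample size, the sketch below lists the iteration indices that are kept under one reading of this schedule, namely that the 15000 iterations include the burn-in and that thinning starts immediately after it. That reading and the function name are assumptions.

```python
def retained_iterations(n_iter=15000, burn_in=5000, thin=2):
    """Zero-based indices of the MCMC draws kept after burn-in and thinning."""
    return list(range(burn_in, n_iter, thin))


kept = retained_iterations()
n_kept = len(kept)
```

Under this reading, `n_kept` equals 5000 retained draws.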
Figure 7 displays scatterplots of the posterior samples of the auto-regressive coefficient $\rho$ and of the residual variances $\sigma_1^2$ and $\sigma_2^2$ at days 15 and 42. The posterior distribution of the auto-regressive coefficient is concentrated around the value 1 (posterior median 0.98, posterior interquartile range 0.92-1.05), indicating that the responses at the two time points are strongly associated for those subjects with detectable MRD at day 15. Furthermore, the residual variances $\sigma_1^2$ and $\sigma_2^2$ show little a posteriori correlation.



We now discuss the effect of the baseline covariates $\boldsymbol{x}_i$ and of the drug sensitivity profiles $\boldsymbol{z}_i$ on the responses. The posterior distributions of the regression coefficients for $t = 1, 2$ are summarized in Figure 9, where we report their posterior means and 95% credible intervals. Each bar corresponds to one of the covariates included in the analysis. Most of the covariates are associated only with MRD measured at time $t = 1$, i.e., day 15 (their 95% credible interval does not contain the value zero). Interestingly, we observe that age, WBC, and protocol are all relevant factors at day 15, while gender is not. The ALL subtype of the patient is also a relevant factor. Many subtype coefficients at day 15 are positive, with credible intervals excluding zero, indicating substantially higher day 15 MRD for these subtypes relative to the reference subtype ETV6-RUNX1. This holds also for the drug sensitivity profiles (see the posterior distribution of variables Cluster 2 and Cluster 3), relative to Cluster 1, the reference cluster. We see that Cluster 2 is associated with lower MRD on day 15 than Cluster 1, while Cluster 3 patients tend to have higher day 15 MRD values than Cluster 1 patients. Cluster 2 has more ETV6-RUNX1 and hyperdiploid patients, which are associated with sensitivity to prednisone and asparaginase, consistent with previous publications.


Recall that inference for day 42 depends on the presence of MRD on day 15. We see that Protocol T15 is associated with a higher mean MRD value than the other two studies. T16 and T17 occurred later and included some newer targeted anticancer drugs. We also see an improvement in the effect for the hyperdiploid subtype on day 42, relative to the reference genomic subtype. We also present posterior inference on the censored MRD values, which are imputed within the MCMC algorithm. In Figure 10 we show the posterior means for the censored observations, as well as the observed MRD values. Note that the bottom-left quadrant corresponds to MRD values censored at both time points, while the bottom-right quadrant corresponds to observations censored only at day 42. No observation lies in the upper-left quadrant, indicating that none of the subjects in the study experiences a relapse between days 15 and 42.

Clustering of the subjects is achieved via model (4), in which the sensitivity profiles are modelled via a three-component mixture distribution. The posterior estimate of the partition, obtained by minimizing Binder’s loss function as implemented in the R package salso [8], presents three clusters of sizes 271, 267 and 250, respectively. We provide an interpretation of the cluster features by investigating the average values of LC50s within each cluster, as well as the proportion of subjects characterised by each subtype. The results are summarised in the radar plots in Figure 11. We observe differences among clusters in the values of the LC50 for the drugs prednisone and asparaginase, and in the proportion of subjects with subtype ETV6-RUNX1. Cluster 2 is associated with the lowest MRD on day 15 (best prognosis). This cluster has more ETV6-RUNX1 and hyperdiploid patients, which are associated with sensitivity to prednisone and asparaginase, consistent with previous publications (e.g., [13, 1]).
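The idea behind a Binder-loss point estimate of the partition can be illustrated in a few lines. The sketch below is not the salso algorithm, which searches over partitions more cleverly; as a simplifying assumption it restricts the search to the sampled partitions and uses equal misclassification costs. All function names are hypothetical.

```python
from itertools import combinations


def posterior_similarity(samples):
    """Estimate, for each pair of subjects, the probability of co-clustering."""
    n = len(samples[0])
    psm = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    psm[i][j] += 1.0 / len(samples)
    return psm


def binder_loss(partition, psm):
    """Expected Binder loss with equal costs, summed over pairs i < j."""
    return sum(
        abs(float(partition[i] == partition[j]) - psm[i][j])
        for i, j in combinations(range(len(partition)), 2)
    )


def binder_estimate(samples):
    """Pick the sampled partition with the smallest expected Binder loss."""
    psm = posterior_similarity(samples)
    return min(samples, key=lambda labels: binder_loss(labels, psm))


# Three hypothetical MCMC draws of cluster labels for three subjects.
draws = [[1, 1, 2], [1, 1, 2], [1, 2, 2]]
estimate = binder_estimate(draws)
```

Here the estimate is the partition that groups the first two subjects together, since that pairing appears in two of the three draws.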






5 Conclusions
Minimal residual disease is used in clinical practice to assess treatment response in patients with ALL. MRD is typically regarded as an independent prognostic factor and is used in clinical trials for risk assignment and to guide clinical therapeutic choices in the management of the treatment of the cancer. In this work we combine information from MRD measurements taken at two time points with patient-specific drug sensitivity estimates and leukemia genomics, with the goal of better understanding treatment response and the role of potential prognostic or predictive (i.e., relevant for treatment selection) factors. Our results highlight that clustering of individuals in terms of sensitivity to drugs that are part of the induction phase of therapy is mainly driven by sensitivity to two anticancer agents, prednisone and asparaginase. The analysis also highlights the role of genomic subtypes as important determinants of response to induction therapy as measured by MRD.
From a methodological point of view, the Bayesian framework allows us to jointly estimate the parameters that relate MRD at different time points during the course of therapy to patient-specific covariates, including simultaneously inferred drug sensitivity profiles. Moreover, it is straightforward to account for censoring. A key component of our modelling strategy is that we model MRD at day 42 conditionally on the event that MRD on day 15 is detected. This allows for extra flexibility and accounts for the fact that in our data set remission at day 15 implies remission at day 42.
A drawback of our modelling strategy is the pre-imputation of missing LC50 values, which leads to underestimation of uncertainty. We opted for this strategy because of the substantial missing rate. In theory, it is straightforward to impute missing observations within the MCMC algorithm. Future work includes a further investigation of the relationship between subtypes and drug sensitivity. In the data, estimation of LC50 occasionally led to estimates below or above the range of concentrations tested for the patient. Our model did not account for this censoring of the LC50 estimates for the affected drugs and patients’ samples.
Another limitation is that the analysis was confined to a subset of ALL subtypes, pooling subtypes with fewer than ten patients. We considered hierarchical modeling but still opted for pooling. Future work will consider all ALL subtypes as separate categories, including those with small numbers. A robust multi-level model may be useful to account for the uncertainty of the rarer subtypes. Such an extension would be useful, as we can imagine that more and more “subtypes” with very few patients will be defined in the future.
Future research will consider the incorporation of a greater number of serial MRD measures in the model. Such a strategy could include measures after remission induction that could be part of monitoring the disease in peripheral blood rather than bone marrow and help fine tune treatment decision rules.
References
- Autry et al. [2020] Robert J. Autry, Steven W. Paugh, Robert Carter, Lei Shi, Jingjing Liu, Daniel C. Ferguson, Calvin E. Lau, Erik J. Bonten, Wenjian Yang, J. Robert McCorkle, Jordan A. Beard, John C. Panetta, Jonathan D. Diedrich, Kristine R. Crews, Deqing Pei, Christopher J. Coke, Sivaraman Natarajan, Alireza Khatamian, Seth E. Karol, Elixabet Lopez-Lopez, Barthelemy Diouf, Colton Smith, Yoshihiro Gocho, Kohei Hagiwara, Kathryn G. Roberts, Stanley Pounds, Steven M. Kornblau, Wendy Stock, Elisabeth M. Paietta, Mark R. Litzow, Hiroto Inaba, Charles G. Mullighan, Sima Jeha, Ching-Hon Pui, Cheng Cheng, Daniel Savic, Jiyang Yu, Charles Gawad, Mary V. Relling, Jun J. Yang, and William E. Evans. Integrative genomic analyses reveal mechanisms of glucocorticoid resistance in acute lymphoblastic leukemia. Nature Cancer, 1(3):329–344, 2020. doi: 10.1038/s43018-020-0037-3. URL https://doi.org/10.1038/s43018-020-0037-3.
- Borowitz et al. [2015] M. J. Borowitz, B. L. Wood, M. Devidas, M. L. Loh, E. A. Raetz, W. L. Salzer, J. B. Nachman, A. J. Carroll, N. A. Heerema, J. M. Gastier-Foster, C. L. Willman, Y. Dai, N. J. Winick, S. P. Hunger, W. L. Carroll, and E. Larsen. Prognostic significance of minimal residual disease in high risk B-ALL: a report from Children’s Oncology Group study AALL0232. Blood, 126(8):964–71, 2015. ISSN 1528-0020 (Electronic) 0006-4971 (Linking). doi: 10.1182/blood-2015-03-633685. URL https://www.ncbi.nlm.nih.gov/pubmed/26124497.
- Campana [2008] D. Campana. Molecular determinants of treatment response in acute lymphoblastic leukemia. Hematology Am Soc Hematol Educ Program, pages 366–73, 2008. ISSN 1520-4391 (Print) 1520-4383. doi: 10.1182/asheducation-2008.1.366.
- Campana [2010] Dario Campana. Minimal residual disease in acute lymphoblastic leukemia. Hematology 2010, the American Society of Hematology Education Program Book, 2010(1):7–12, 2010.
- Campana and Pui [2017] Dario Campana and Ching-Hon Pui. Minimal residual disease-guided therapy in childhood acute lymphoblastic leukemia. Blood, 129(14):1913–1918, 04 2017. ISSN 0006-4971. doi: 10.1182/blood-2016-12-725804. URL https://doi.org/10.1182/blood-2016-12-725804.
- Carvalho et al. [2010] Carlos M. Carvalho, Nicholas G. Polson, and James G. Scott. The horseshoe estimator for sparse signals. Biometrika, 97(2):465–480, 2010. ISSN 00063444, 14643510. URL http://www.jstor.org/stable/25734098.
- Cavé et al. [1998] Hélène Cavé, Jutte van der Werff ten Bosch, Stefan Suciu, Christine Guidal, Christine Waterkeyn, Jacques Otten, Marleen Bakkus, Kris Thielemans, Bernard Grandchamp, Etienne Vilmer, Brigitte Nelken, Martine Fournier, Patrick Boutard, Emmanuel Lebrun, Françoise Méchinaud, Richard Garand, Alain Robert, Nicole Dastugue, Emmanuel Plouvier, Evelyne Racadot, Alice Ferster, Jan Gyselinck, Odile Fenneteau, Michel Duval, Gabriel Solbu, and Anne-Marie Manel. Clinical significance of minimal residual disease in childhood acute lymphoblastic leukemia. New England Journal of Medicine, 339(9):591–598, 1998. doi: 10.1056/NEJM199808273390904. URL https://doi.org/10.1056/NEJM199808273390904. PMID: 9718378.
- Dahl et al. [2020] David B. Dahl, Devin J. Johnson, and Peter Müller. salso: Search Algorithms and Loss Functions for Bayesian Clustering, 2020. URL https://CRAN.R-project.org/package=salso. R package version 0.2.5.
- Della Starza et al. [2019] Irene Della Starza, Sabina Chiaretti, Maria S De Propris, Loredana Elia, Marzia Cavalli, Lucia A De Novi, Roberta Soscia, Monica Messina, Antonella Vitale, Anna Guarini, et al. Minimal residual disease in acute lymphoblastic leukemia: technical and clinical advances. Frontiers in oncology, page 726, 2019.
- Estlin et al. [2000] EJ Estlin, M Ronghe, GAA Burke, and SM Yule. The clinical and cellular pharmacology of vincristine, corticosteroids, l-asparaginase, anthracyclines and cyclophosphamide in relation to childhood acute lymphoblastic leukaemia. British journal of haematology, 110(4):780–790, 2000.
- Gandemer et al. [2012] Virginie Gandemer, Sylvie Chevret, Arnaud Petit, Christiane Vermylen, Thierry Leblanc, Gérard Michel, Claudine Schmitt, Odile Lejars, Pascale Schneider, François Demeocq, Brigitte Bader-Meunier, Françoise Bernaudin, Yves Perel, Marie-Françoise Auclerc, Jean-Michel Cayuela, Guy Leverger, and André Baruchel. Excellent prognosis of late relapses of ETV6/RUNX1-positive childhood acute lymphoblastic leukemia: lessons from the FRALLE 93 protocol. Haematologica, 97(11):1743–1750, Nov 2012. ISSN 1592-8721 (Electronic); 0390-6078 (Print); 0390-6078 (Linking). doi: 10.3324/haematol.2011.059584.
- George and McCulloch [1993] Edward I George and Robert E McCulloch. Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881–889, 1993.
- Holleman et al. [2004] Amy Holleman, Meyling H. Cheok, Monique L. den Boer, Wenjian Yang, Anjo J.P. Veerman, Karin M. Kazemier, Deqing Pei, Cheng Cheng, Ching-Hon Pui, Mary V. Relling, Gritta E. Janka-Schaub, Rob Pieters, and William E. Evans. Gene-expression patterns in drug-resistant acute lymphoblastic leukemia cells and response to treatment. New England Journal of Medicine, 351(6):533–542, 2004. doi: 10.1056/NEJMoa033513. URL https://doi.org/10.1056/NEJMoa033513. PMID: 15295046.
- Jeha et al. [2021] Sima Jeha, John Choi, Kathryn G. Roberts, Deqing Pei, Elaine Coustan-Smith, Hiroto Inaba, Jeffrey E. Rubnitz, Raul C. Ribeiro, Tanja A. Gruber, Susana C. Raimondi, Seth E. Karol, Chunxu Qu, Samuel W. Brady, Zhaohui Gu, Jun J. Yang, Cheng Cheng, James R. Downing, William E. Evans, Mary V. Relling, Dario Campana, Charles G. Mullighan, and Ching-Hon Pui. Clinical Significance of Novel Subtypes of Acute Lymphoblastic Leukemia in the Context of Minimal Residual Disease–Directed Therapy. Blood Cancer Discovery, 2(4):326–337, 07 2021. ISSN 2643-3230. doi: 10.1158/2643-3230.BCD-20-0229. URL https://doi.org/10.1158/2643-3230.BCD-20-0229.
- Kruse et al. [2020] A. Kruse, N. Abdel-Azim, H. N. Kim, Y. Ruan, V. Phan, H. Ogana, W. Wang, R. Lee, E. J. Gang, S. Khazal, and Y. M. Kim. Minimal residual disease detection in acute lymphoblastic leukemia. Int J Mol Sci, 21(3), 2020. ISSN 1422-0067 (Electronic) 1422-0067 (Linking). doi: 10.3390/ijms21031054. URL https://www.ncbi.nlm.nih.gov/pubmed/32033444.
- Lee et al. [2023] Shawn H. R. Lee, Wenjian Yang, Yoshihiro Gocho, August John, Lauren Rowland, Brandon Smart, Hannah Williams, Dylan Maxwell, Jeremy Hunt, Wentao Yang, Kristine R. Crews, Kathryn G. Roberts, Sima Jeha, Cheng Cheng, Seth E. Karol, Mary V. Relling, Gary L. Rosner, Hiroto Inaba, Charles G. Mullighan, Ching-Hon Pui, William E. Evans, and Jun J. Yang. Pharmacotypes across the genomic landscape of pediatric acute lymphoblastic leukemia and impact on treatment response. Nature Medicine, 29(1):170–179, 2023. doi: 10.1038/s41591-022-02112-7. URL https://doi.org/10.1038/s41591-022-02112-7.
- Ma et al. [2014] Haiqing Ma, Huanhuan Sun, and Xiaoping Sun. Survival improvement by decade of patients aged 0-14 years with acute lymphoblastic leukemia: a SEER analysis. Scientific Reports, 4(1):4227, 2014. doi: 10.1038/srep04227. URL https://doi.org/10.1038/srep04227.
- Marcotte et al. [2021] Erin L. Marcotte, Allison M. Domingues, Jeannette M. Sample, Michaela R. Richardson, and Logan G. Spector. Racial and ethnic disparities in pediatric cancer incidence among children and young adults in the United States by single year of age. Cancer, 127(19):3651–3663, 2021. doi: 10.1002/cncr.33678. URL https://acsjournals.onlinelibrary.wiley.com/doi/abs/10.1002/cncr.33678.
- Paulsson et al. [2010] Kajsa Paulsson, Erik Forestier, Henrik Lilljebjörn, Jesper Heldrup, Mikael Behrendtz, Bryan D. Young, and Bertil Johansson. Genetic landscape of high hyperdiploid childhood acute lymphoblastic leukemia. Proceedings of the National Academy of Sciences, 107(50):21719–21724, 2010. doi: 10.1073/pnas.1006981107. URL https://www.pnas.org/doi/abs/10.1073/pnas.1006981107.
- Piironen and Vehtari [2017] Juho Piironen and Aki Vehtari. Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2):5018–5051, 2017.
- Plummer [2019] Martyn Plummer. rjags: Bayesian Graphical Models using MCMC, 2019. URL https://CRAN.R-project.org/package=rjags. R package version 4-10.
- Plummer et al. [2003] Martyn Plummer et al. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd international workshop on distributed statistical computing, 124(125.10):1–10, 2003.
- Pui et al. [2010] C H Pui, D. Pei, J T Sandlund, R C Ribeiro, J E Rubnitz, S C Raimondi, M. Onciu, D. Campana, L E Kun, S. Jeha, C. Cheng, S C Howard, M L Metzger, D. Bhojwani, J R Downing, W E Evans, and M V Relling. Long-term results of St Jude Total Therapy studies 11, 12, 13A, 13B, and 14 for childhood acute lymphoblastic leukemia. Leukemia, 24(2):371–382, 2010. doi: 10.1038/leu.2009.252. URL https://doi.org/10.1038/leu.2009.252.
- Pui and Evans [2013] Ching-Hon Pui and William E. Evans. A 50-year journey to cure childhood acute lymphoblastic leukemia. Seminars in Hematology, 50(3):185–196, 2013. ISSN 0037-1963. doi: 10.1053/j.seminhematol.2013.06.007. URL https://www.sciencedirect.com/science/article/pii/S0037196313000905. Consultative Hematology.
- Tasian et al. [2017] S. K. Tasian, M. L. Loh, and S. P. Hunger. Philadelphia chromosome-like acute lymphoblastic leukemia. Blood, 130(19):2064–2072, 2017. ISSN 0006-4971 (Print) 0006-4971. doi: 10.1182/blood-2017-06-743252.
- Theunissen et al. [2017] P. Theunissen, E. Mejstrikova, L. Sedek, A. J. van der Sluijs-Gelling, G. Gaipa, M. Bartels, E. Sobral da Costa, M. Kotrova, M. Novakova, E. Sonneveld, C. Buracchi, P. Bonaccorso, E. Oliveira, J. G. Te Marvelde, T. Szczepanski, L. Lhermitte, O. Hrusak, Q. Lecrevisse, G. E. Grigore, E. Fronkova, J. Trka, M. Bruggemann, A. Orfao, J. J. van Dongen, V. H. van der Velden, and Consortium EuroFlow. Standardized flow cytometry for highly sensitive MRD measurements in B-cell acute lymphoblastic leukemia. Blood, 129(3):347–357, 2017. ISSN 1528-0020 (Electronic) 0006-4971 (Linking). doi: 10.1182/blood-2016-07-726307. URL https://www.ncbi.nlm.nih.gov/pubmed/27903527.
- Van Buuren and Groothuis-Oudshoorn [2011] Stef Van Buuren and Karin Groothuis-Oudshoorn. mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45:1–67, 2011.
- Wijaya et al. [2020] Juwina Wijaya, Tomoka Gose, and John D. Schuetz. Using pharmacology to squeeze the life out of childhood leukemia, and potential strategies to achieve breakthroughs in medulloblastoma treatment. Pharmacological Reviews, 72(3):668–691, 2020. ISSN 0031-6997. doi: 10.1124/pr.118.016824. URL https://pharmrev.aspetjournals.org/content/72/3/668.