Random Survival Forest for Censored Functional Data

Elvira Romano Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Caserta, Italy Giuseppe Loffredo Department of Mathematics and Physics, University of Campania "Luigi Vanvitelli", Caserta, Italy Fabrizio Maturo Department of Economics, Statistics and Business, Faculty of Technological and Innovation Sciences, Universitas Mercatorum, Rome, Italy

Abstract

This paper introduces a Random Survival Forest (RSF) method for functional data. The focus is specifically on defining a new functional data structure, the Censored Functional Data (CFD), for dealing with temporal observations that are censored due to study limitations or incomplete data collection. This approach allows for precise modelling of functional survival trajectories, leading to improved interpretation and prediction of survival dynamics across different groups. A medical survival study on the benchmark SOFA data set is presented. Results show good performance of the proposed approach, particularly in ranking the importance of predicting variables, as captured through dynamic changes in SOFA scores and patient mortality rates.

Keywords: Functional Data Analysis, Survival Analysis, Random Survival Forest, Functional Principal Component Analysis, Functional Random Survival Forest.

⁰⁰footnotetext: Abbreviations: FDA, Functional Data Analysis; FPC, Functional Principal Components; FCTs, Functional Classification Trees; FRF, Functional Random Forest; FBG, Functional Bagging; OOBFD, Out of Bag Functional Data; IBFD, In-Bag Functional Data; ESC, Empirical Splitting Curve; TSC, Theoretical Splitting Curve; FBGSS, Functional Between Groups Sum of Squares; FBLSS, Functional Between Leaves Sum of Squares.

1 Introduction

Survival analysis focuses on estimating and predicting the time until an event of interest, such as death or another unique occurrence, using historical data Kleinbaum and Klein (2012). This type of analysis can be challenging because the exact timing of the event may be unknown in certain instances. Survival trees (STs), a specific type of model within this field, allow for uncovering intricate nonlinear relationships in an intuitive, interpretable format. By iteratively dividing the population into subgroups and predicting a unique survival distribution for each terminal node, STs provide a powerful tool for understanding and forecasting time-to-event data. STs and their ensembles are widespread in non-parametric modelling. STs are particularly noteworthy for their ability to handle complex relationships and interactions among variables Ishwaran et al. (2008). They offer a clear picture of the decision-making procedure, making it easier to interpret survival patterns and recognise critical prognostic factors. As a result, STs have established a unique position in survival analysis by delivering a visually understandable framework for modelling and predicting survival probabilities. Well-known limitations of single STs are variance and overfitting. To address these concerns, Bagging (SB) and Random Survival Forest (RSF) Wang et al. (2019) are frequently used. These techniques enhance reliability by averaging multiple trees, reducing variance, preventing overfitting, and improving prediction accuracy and robustness. An extensive review can be found in Wang and Li (2017). SB trains multiple STs on bootstrapped samples generated from the original dataset. Then, the single predictions of STs are combined through averaging or voting. RSF expands the concept of traditional Random Forest (RF) to survival analysis Hemant et al. (2008), Biemann and Kearney (2010). In RSF, each tree is trained on a random subset of observations and a random subset of features at each split. Introducing randomness in the tree-building process, RSF diminishes the correlation between STs and improves the diversity of the ensemble, resulting in better generalisation performance. Further research also focused on combining survival analysis with various modelling approaches, discussed in more detail in Wey et al. (2015).

Studying the relationship between time-varying processes and time-to-event data is challenging in clinical research. A modern alternative solution involves using Functional Data Analysis (FDA) Ramsay and Silverman (2005). FDA transforms longitudinal data into curves, which are then integrated into survival analysis models. A first approach proposed by Lin et al. (2021) incorporates multiple longitudinal outcomes by extracting features using two methods: multivariate functional principal component analysis and multivariate fast covariance estimation for sparse functional data. These extracted features are subsequently used as covariates in a survival model, with FPCA aiding in the estimation process. Spreafico et al. (2023) Spreafico et al. (2023) proposed a novel approach for addressing time-varying covariates in survival analysis using FDA. The latter research aims to transform longitudinal data into functional data, facilitating their treatment and analysis within the survival framework. For longitudinal studies, patients often return for follow-up visits at regular intervals. Therefore, the true event time is only known to lie within an interval between visits, and the exact time is obscured. Multiple studies have demonstrated biased survival outcome estimates using the Cox model’s well-developed methodology for modelling interval-censored data Zhang and Sun (2010). In many studies, the focus has been on reconstructing curves over the entire follow-up period of study Delaigle and Hall (2013)Strzalkowska-Kominiak and Romo (2021) to manage the data at the same observation time. Starting from these latter ideas, our research strives to extend the concept of RSF to the functional data framework, exploiting the FDA’s potential in dimensionality reduction, predictive power, interpretability, and ability to reconstruct censored data.

The paper positions itself within the literature on combining FDA and statistical learning. The starting point of our proposal is to extend the Functional Random Forest (FRF) offered by Maturo and Verde (2023), Maturo and Verde (2022) to the context of irregular data in survival analysis, with a particular focus on the issues of censured data reconstruction and explainability of the RSF.

Implementing RSF in the functional context poses significant challenges, especially given the vast amount of available data and the incomplete nature of each statistical unit’s temporal sequences. This work considers a new type of functional data, defined as Censored Functional Data (CFD). Unlike previous studies, we concentrate on the actual observation period to reconstruct the trajectories associated with each unit rather than extending the curves over the entire temporal domain of the follow-up study. This approach aims to maximise the utilisation of available information during the study period without excessive extensions or interpolations. Essentially, the main goal is to accurately model the trajectory of curves, considering only the observation period associated with each unit, thus ensuring a precise representation of the survival dynamic within that specific period.

By integrating the CFD with FPCA, we aim to capture relationships and dynamic patterns in the data, leading to more accurate and interpretable survival predictions within STs. Section 2 introduces preliminaries on Functional Data Analysis and Survival Random Forest and describes our contribution. Section 3 assesses the performance of our proposed approach on the well-known "SOFA" data set. The paper ends with a discussion and conclusion on our proposal.

2 Material and Methods

2.1 Preliminaries on Functional Data Analysis (FDA)

The field of Functional Data Analysis (FDA) Ramsay et al. (2009); Ramsay and Silverman (2005); Ferraty (2011) deals with data represented in functional form rather than using traditional discrete observations. Typically, data are handled as vectors or matrices, with an assumption of independence among observations. However, in many situations, the observations are displayed on a discrete time scale or spatial domain, and the information is not available on the entire study domain. Indeed, FDA aims to capture the underlying structure and the variability present in discrete data by transforming it into a functional data object that assumes values within a functional space denoted as $\mathcal{H}$ . In general, the functional data object can be represented as a random variable $\mathcal{X}=\{X(t);t\in\mathcal{T}\}$ , where $\mathcal{T}$ is a common domain and $X(t)$ denotes the value that the function assumes at time or position $t$ . In this article, we will consider $\mathcal{T}$ to be a space-time domain. This real-valued function can be viewed as the realization of a one-dimensional stochastic process, often assumed to be in a Hilbert space, such as $L^{2}$ defined on the compact domain $\mathcal{T}$ . Here, a stochastic process $\mathcal{X}$ is said to be an $L^{2}$ process if and only if it satisfies $E\big{[}\int_{\mathcal{T}}X(t)\,dt\big{]}<\infty$ . The $L^{2}$ space is defined as a normed space of functions where the norm is given by $\|\cdot\|_{2}$ . In general, considering a time domain $\mathcal{T}$ , in the Hilbert space $(L^{2}(\mathcal{T}),\|\cdot\|_{2})$ is defined the distance between two functions $f(t)$ and $g(t)$ as follows:

d_{L^{2}}(f,g)=\|f-g\|_{2}=\left(\int_{\mathcal{T}}\left|f(t)-g(t)\right|^{2}dt\right)^{\frac{1}{2}}\,.

(1)

FDA aims to convert discrete data into a functional form using techniques such as smoothing, interpolation, and regression Schimek (2013). We observe the discrete-time data given by $N$ pairs $(Y_{i},t_{i})$ , where $t_{i}\in\mathcal{T}$ and $Y_{i}\in\mathbb{R}$ are the values recorded at the time $t_{i}$ for $i=1,...,N$ statistic units. The $i$ -th recorded value $Y_{i}=Y(t_{i})$ corresponds to a realization of a generic function $X(\cdot)$ at time $t_{i}\in\mathcal{T}$ . In FDA, we can define the function $X(\cdot)$ by finite linear combination Ramsay and Silverman (2005) as follows:

X(t)=\sum_{j=1}^{K}c_{j}\phi_{j}(t)\quad t\in\mathcal{T}

(2)

where $\{\phi_{j}(t)\}_{j=1}^{K}$ is a set of basis functions used in the representation, and $\{c_{j}\}_{j=1}^{K}$ are the coefficients of each basis function. However, in real cases, the $i$ -th observation can be affected by the $i$ -th term’s error, $\varepsilon_{i}$ , due to measurement inaccuracies or other factors introducing variability. Thus, we observe $Y_{i}$ as $Y_{i}=X(t_{i})+\varepsilon_{i}$ . In the Equation (2), the functional data is approximated by a finite linear combination of basis functions that can be chosen based on the specific problem and the characteristics of the data trend. The coefficients $\{c_{j}\}_{j=1}^{K}$ can be calculated using the sum of squared errors (SSE), where the problem is given by:

\min\,(\varepsilon_{i})^{2}=\min_{c_{1},c_{2},\ldots,c_{K}}\sum_{i=1}^{N}\left(Y_{i}-\sum_{j=1}^{K}c_{j}\phi_{j}(t_{i})\right)^{2}\,,

(3)

where $N$ is the number of data points observed, $t_{i}$ represents the $i$ -th instant time, and $K$ is the number of basis functions chosen for the functional representation. In the context of FDA, considering the $N$ data functions expressed as $X_{1}(t),...,X_{N}(t)$ , we can introduce the standard estimators for the mean and covariance functions:

\hat{\mu}(t)=\frac{1}{N}\sum_{i=1}^{N}X_{i}(t)

(4)

and

\widehat{C}(t,s)=\frac{1}{N}\sum_{i=1}^{N}[(X_{i}(t)-\hat{\mu}(t))(X_{i}(s)-\hat{\mu}(s))]\,.

(5)

Moreover, the FDA provides many useful tools for dealing with functional data to interpret and make predictions e.g., the Functional Principal Component Analysis (FPCA) Ramsay and Silverman (2005); Ferraty and Vieu (2006), Functional Regression Analysis (FRA) Ramsay and Silverman (2005); Febrero-Bande and de la Fuente (2012a) and Functional Classification Trees (FCTs) Maturo and Verde (2023, 2022, 2024). FPCA extends classical Principal Component Analysis (PCA) in a functional context. The method allows us to reduce the high dimensionality of the data, preserving the maximum amount of information Ramsay and Silverman (2005); Aguilera and Aguilera-Morillo (2013); Febrero-Bande and de la Fuente (2012b). In this context, the approximated functional data $X(\cdot)$ is given by:

X(t)=\sum_{m=1}^{p}\nu_{m}\xi_{m}(t)\quad t\in\mathcal{T}\,,

(6)

where $p$ represents the total number of Functional Principal Components Scores (FPCs), where $\xi_{m}(t)$ denotes the eigenfunction for the $m$ -th of the function $X(t)$ . The explained variance of these curves is determined by $\sum_{m=1}^{p}\lambda_{m}$ , where $\lambda_{m}$ represents the variance of the $m$ -th FPC. It’s important to note that the variance explained by each FPC decreases as the index $m$ increases, that is $\lambda_{1}\geq\lambda_{2}\geq\ldots\geq\lambda_{p}$ . We assume $X_{i}(t)$ functions, for $i=1,\ldots,N$ , be centered, i.e., $\int_{\mathcal{T}}X_{i}(t)dt=0$ , which implies that the overall mean function $\mu(t)$ is zero. The $m$ -th FPC can be obtained as follows:

\nu_{im}=\int_{\mathcal{T}}X_{i}(t)\xi_{m}(t)dt,\quad i=1,\ldots,N,\quad m=1,\ldots,p

(7)

where the eigenfunctions $\xi_{m}(t)$ are obtained resolving this problem:

\max_{\xi_{m}}Var\Bigg{[}\int_{\mathcal{T}}X_{i}(t)\xi_{m}(t)dt\Bigg{]},

(8)

s.t.

\|\xi_{m}\|_{2}^{2}=\int_{\mathcal{T}}\xi_{m}(t)^{2}\,dt=1\,,

(9)

and

\int_{\mathcal{T}}\xi_{m}(t)\xi_{n}(t)\,dt=0\,\qquad\text{for}\qquad m\neq n.

(10)

In statistics and machine learning, the data points can be classified into distinct categories using the features collected during the study, which can vary across different fields and applications. In many situations, utilizing functional data objects as features can be advantageous. Starting from a classic Classification Tree (CT), we can move to the functional approach, which involves adapting traditional techniques to accommodate functional data, which are functions defined over a continuous domain. The method is called Functional Classification Tree (FCT) and allows us to deal with functions defined in a functional space $L^{2}(\mathcal{T})$ . The methodology takes the pairs $\{Z_{i},X_{i}(t)\}$ , where $X_{i}(t)$ is a predictor curve for $i$ -th unit defined within metric space $(L^{2}(\mathcal{T}),\|\cdot\|)$ , and $Z_{i}$ is a scalar response observed for $i$ -th unit. Thus, the FCT algorithm predicts the vector-response $\bm{Z}\in\mathbb{R}^{N}$ considering as features the FPCs’ scores obtained from decomposition (6) applied for each $X_{i}(t)$ to obtain the score-matrix as follows:

\bm{V}=\begin{pmatrix}\nu_{11}&\nu_{12}&\dots&\nu_{1p}\\ \nu_{21}&\nu_{22}&\dots&\nu_{2p}\\ \vdots&\vdots&\ddots&\vdots\\ \nu_{N1}&\nu_{N2}&\dots&\nu_{Np}\\ \end{pmatrix}

(11)

where $\bm{V}\in\mathbb{R}^{N\times p}$ and $\nu_{im}$ correspond to FPC score of the $i$ -th curve relative to the $m$ -th eigenfuncion $\xi_{m}$ for $i=1,...,N$ and $m=1,...,p$ Maturo and Verde (2022, 2023). This approach is called the Functional Classification Tree with Principal Components (FCT-FPCs). The FCT-FPCs organize data into groups based on their characteristics. Starting with all data points, which are the functions, the algorithm involves stratifying and segmenting the predictor space into rectangular regions, called terminal nodes or lives, based on certain criteria that make the groups more homogeneous by using as features the score-vector $\bm{\nu}_{m}=(\nu_{1m},...,\nu_{Nm})^{T}$ for $m=1,...,p$ . The method selects the best features $\bm{\nu_{m}}$ and the best threshold $c$ to define at each internal node of the tree two half-planes given by:

R_{1}(m,c)=\{X(t)\in L^{2}(\mathcal{T})|\bm{\nu_{m}}\leq c\}\quad\text{and}\quad R_{2}(m,c)=\{X(t)\in L^{2}(\mathcal{T})|\bm{\nu_{m}}>c\}

(12)

Then, to determine the best feature and threshold to split on, the algorithm uses the splitting criterion such as the Gini index or the Shannon-Weiner index Chao and Shen (2003); Chao et al. (2014). Let $Q(\text{node})$ be the splitting criterion at node $h$ , which divides it into two daughter nodes left-daughter, right-daughter. Let $Q(\text{left-daughter})$ , $Q(\text{right-daughter})$ be the splitting criteria for the node and its two daughter nodes. The impurity is computed as follows:

\Delta Q=Q(\text{node})-\left(\frac{n_{\text{left}}}{n_{\text{node}}}Q(\text{left-daughter})+\frac{n_{\text{right}}}{n_{\text{node}}}Q(\text{right-daughter})\right)

(13)

Where $n_{\text{node}}$ is the number of samples in the splitting-node, $n_{\text{left}}$ and $n_{\text{right}}$ are the number of samples in the left and right daughter nodes. The feature $\bm{\nu}_{m}$ and threshold $c$ that maximize $\Delta Q$ are chosen for splitting at each node.

2.2 Contribute

This section focuses on managing survival time data, starting with the creation of censored functional data and ending with the creation of the Functional Random Survival Forest (FRSF) based on FPCs. We explore the complexity of managing survival data, address challenges posed by censoring, and use techniques such as Principal Components Analysis through Conditional Expectation (PACE) to extract meaningful features from irregular functional data.

2.2.1 Censored Functional Data

In a survival study, $N$ subjects are enrolled in a given period $\mathcal{T}=[a,b]$ called a follow-up. In this period, different data are collected for each $i$ -th subject, with $i=1,\ldots,N$ . In general, the times $T_{i}$ denote the survival time, and $C_{i}$ denote the censored time, i.e., an event did not occur for the subject. Let $\delta_{i}$ be defined for each subject as follows:

\delta_{i}:=\begin{cases}1&\text{if $T_{i}\leq C_{i}$ (Event occurred)}\\ 0&\text{if $T_{i}>C_{i}$ (Censoring occurred)}\end{cases}\quad i=1,...,N\,.

(14)

During the follow-up study, $J_{i}$ values are collected for each subject until the event time, denoted by $T_{i}^{*}=\mbox{min}\,(C_{i},T_{i})$ , occurs. This information defines the Survival Time Data (STD) given by $(\bm{Y}_{i},\bm{t}_{i},T_{i}^{*},\delta_{i})$ , where $\bm{Y}_{i}=(Y_{11},...,Y_{1J_{i}})^{T}$ refers to the observed values vector with $Y_{ij}=Y(t_{ij})$ and $\bm{t}_{i}=(t_{i1},...,t_{iJ_{i}})^{T}$ is the time vector for the $i$ -th unit.

In this context, the main challenge of employing the FDA approach and leveraging functional tools is to retrieve data for every subject, even when there are no observations during the entire observation period. For this reason, our approach involves the introduction of CFD, enabling the continuous reconstruction of information for each subject at any given moment. The main issue of this extension is how to replace these data by using different functions according to the number of recording values $J_{i}$ . Note that $t_{iJ_{i}}\leq T_{i}^{*}\leq t_{iJ_{i+1}}$ for each $i$ -th unit and no longitudinal measurements are available after $T_{i}^{*}$ . Thus, the observation of the function $\mathcal{X}_{i}$ consist of $J_{i}$ pairs $(t_{ij},Y_{ij})$ for $i=1,...,N$ and $j=1,\ldots,J_{i}$ .

Constructing functional data requires a different approach in survival analysis with censored data. Hence, we aim to develop a methodology that handles censoring and addresses the unique aspects of functional data, such as temporal truncation, to ensure accurate modelling and interpretation of the underlying patterns over time. In our case, we have $J_{i}\neq J$ as observed data, which are $J_{i}$ -dimensional. For this reason, the first step is to convert the observed values $\{(Y_{ij},t_{ij}):i=1,...,N\>\text{and}\>j=1,...,J_{i}\}$ into a functional form by considering different recording values. In this connection, the standard approach to estimating the functional form by starting with observed values is the basis approximation. The CFD can be defined as follows:

Definition 1

Let functional space $L^{2}(\mathcal{T})$ where $\mathcal{T}=[a,b]$ is a compact interval. The CFD is a functional data expressed as follows:

\mathcal{X}_{i}=\{X_{i}(t):t\in[a,T_{i}^{*}]\}\quad\text{with}\quad\>T_{i}^{*}=\mbox{min}\,(C_{i},T_{i})\quad\text{and}\quad\delta_{i}=1_{[T_{i}^{*}=T_{i}]}\quad\text{for}\quad i=1,...,N

(15)

where $T_{i}$ is the true event time and $C_{i}$ is the censored time.

As in (15), we can express these functions according to numbers of $J_{i}$ . For which, using a fixed basis representation and other functional representations, we obtain the CFD as follows:

Definition 2

Let functional space $L^{2}(\mathcal{T})$ defined on the compact interval $\mathcal{T}=[a,b]$ and pairs observed values $(t_{ij},Y_{ij})$ , where $t_{ij}\in\mathcal{T}$ for $j=1,...,J_{i}$ , the CFD is built as follows:

X_{i}(t):=\begin{cases}Y_{ij}&\text{if $J_{i}=1$}\\ \beta_{0}+\beta_{1}t&\text{if $J_{i}=2$}\\ \sum_{k=1}^{K_{i}}c_{ik}\phi_{k}(t)&\text{if $J_{i}\geq 3$}\end{cases}\quad\text{with}\quad t\in[a,T_{i}^{*}]\quad\text{and}\quad\delta_{i}=1_{[T_{i}^{*}=T_{i}]}\quad\text{for}\quad i=1,...,N\

(16)

where $c_{ik}$ corresponds to the $k$ -th coefficient for $i$ -th unit and $\phi_{k}(t)$ to $k$ -th basis function that can be chosen to approximate the full basis expansion.

Moreover, $K_{i}$ is determined individually for $i$ -th unit when the number of recorded values exceeds or equals $3$ . It is determined using leave-one-out cross-validation. The method minimizes the prediction error of the number of components $K$ choosing the best $\hat{K}_{i}$ for the $i$ -th subject as follows:

\hat{K}_{i}=\arg\min_{K}\sum_{j=1}^{J_{i}}\left\{Y_{ij}-\hat{Y}_{i}^{(-i)}(t_{ij})\right\}^{2}\quad\text{for}\quad i=1,\ldots,N

(17)

In addition, considering that in real-world applications the measurement of $Y_{ij}$ may be subject to errors, we consider the random noise $\varepsilon_{ij}$ with $E[\varepsilon_{ij}]=0$ and $var[\varepsilon_{ij}]=\sigma_{ij}^{2}$ , where $\varepsilon_{ij}$ are independent across $i$ and $j$ and the errors are considered homoschedastic with $\sigma_{ij}^{2}=\sigma^{2}$ . Then, we can express the observed values as $Y_{ij}=X_{i}(t_{ij})+\varepsilon_{ij}$ , with $t_{ij}\in[a,T_{i}^{*}]$ , max $\{T_{i}^{*}:i=1,...,N\}\leq\tau$ , where $\tau$ is the length of the study follow-up. Hence, for different numbers $J_{i}\geq 2$ using the definition (16) we obtain the follow model:

Y_{ij}=\begin{cases}\beta_{0}+\beta_{1}t_{ij}+\varepsilon_{ij}&\text{if $J_{i}=2$}\\ \sum_{k=1}^{K_{i}}c_{ik}\phi_{k}(t_{ij})+\varepsilon_{ij}&\text{if $J_{i}\geq 3$}\end{cases}\quad\text{with}\quad t_{ij}\in[a,T_{i}^{*}]\quad\text{for}\quad i=1,...,N\,\text{,}\,j=1,...,J_{i}\,.

(18)

In order to evaluate the coefficients for the (18), we use the SSD criterion as follows:

\min\,\Bigg{[}\sum_{j=1}^{J_{i}}[Y_{ij}-X_{i}(t_{ij})]^{2}\Bigg{]}=\begin{cases}\min_{\beta_{0},\beta_{1}}\,\big{[}\sum_{j=1}^{J_{i}}[Y_{ij}-\beta_{0}-\beta_{1}t_{ij}]\big{]}&\text{if $J_{i}=2$}\\ \min_{c_{i1},c_{i2},...,c_{iK_{i}}}\,\big{[}\sum_{j=1}^{J_{i}}[Y_{ij}-\sum_{k=1}^{K_{i}}c_{ik}\phi_{k}(t_{ij})]\big{]}&\text{if $J_{i}\geq 3$}\end{cases}

(19)

calculated for all subjects $i=1,...,N$ .

2.2.2 Principal component analysis for Censored Functional Data

FPCA is an extension of the classical PCA because the principle is to replace vectors with functions, matrices by linear operators, and, in particular, covariance by auto-covariance operators Ramsay and Silverman (2005). Moreover, scalar products in vector are replaced by scalar products in function space $L^{2}$ . Let’s suppose that $X(t)$ is a generic trajectory defined on Hilbert space $L^{2}(\mathcal{T})$ with its mean function $\mu(t)=E[X(t)]$ and its covariance function $G(s,t)=Cov(X(t),X(s))$ , with $t,s\in\mathcal{T}$ . The covariance function can be expressed by the spectral decomposition as $G(s,t)=\sum_{m=1}^{\infty}\lambda_{m}\xi_{m}(t)\xi_{m}(s)$ , where $\{\lambda_{m}\}_{m=1,..,\infty}$ , correspond to a set non-increasing eigenvalues such that $\sum_{m}\lambda_{m}<\infty$ and $\{\xi_{m}(t)\}_{m=1,..,\infty}$ correspond to the eigenfunctions. Thus, the trajectory $X(t)$ admits the following Karhunen-Loevè expansion:

X(t)=\mu(t)+\sum_{m=1}^{\infty}\nu_{m}\xi_{m}(t)\quad\text{with}\quad t\in\mathcal{T}\,,

(20)

where the coefficient $\nu_{m}=\int_{\mathcal{T}}[X(t)-\mu(t)]\xi_{m}(t)\,dt$ corresponds to the $m$ -th Functional Principal Component (FPC) score of $X(\cdot)$ and satisfies the condition $E[\nu_{m},\nu_{n}]=\delta_{m\,n}n_{m}$ with $\delta_{m\,n}=1$ if $m=n$ and $0$ otherwise.

For the case of sparse or irregular functional data, a fully non-parametric approach denoted as PACE Yao et al. (2005) is used. Let $Y_{ij^{(h)}}$ be the $j$ -hth observation obtained from CFD (16) evaluating the $i$ -th curve at time $t_{ij^{(h)}}\in[a,T_{i}^{*}]$ , where $h=t_{ij^{(h)}}-t_{i{j-1}^{(h)}}$ is the step size chosen for the time increment by considering $t_{i1}=t_{i1^{(h)}}\leq t_{i2^{(h)}}\leq...\leq t_{iJ_{i-1}^{(h)}}\leq t_{iJ_{i}^{(h)}}=t_{iJ_{i}}$ for $i=1,...,N$ . Then, including additional measurement errors $\varepsilon_{ij}$ , uncorrelated among them with $E[\varepsilon_{ij^{(h)}}]=0$ and $Var[\varepsilon_{ij^{(h)}}]=\sigma^{2}$ , to a model with measurements $Y_{ij^{(h)}}$ calculated at time $t_{ij^{(h)}}$ from (20) we obtain:

Y_{ij^{(h)}}=X_{i}(t_{ij^{(h)}})+\varepsilon_{ij^{(h)}}=\mu(t_{ij^{(h)}})+\sum_{m=1}^{\infty}\nu_{im}\xi_{m}(t_{ij^{(h)}})+\varepsilon_{ij^{(h)}}\quad\text{with}\quad t_{ij^{(h)}}\in\mathcal{T}\,,

(21)

where the expansion (21) can be approximated as follows:

Y_{ij^{(h)}}\approx\mu(t_{ij^{(h)}})+\sum_{m=1}^{p}\nu_{im}\xi_{m}(t_{ij^{(h)}})+\varepsilon_{ij^{(h)}}\quad\text{with}\quad t_{ij^{(h)}}\in\mathcal{T}.

(22)

Moreover, the mean function, the covariance function and the eigenfunctions are calculated using the local linear smoothers Fan and Gijbels (1992). From (21), we note that the $Cov(Y_{ij^{(h)}},Y_{il^{(h)}}|\,t_{ij^{(h)}},t_{il^{(h)}}=Cov(X(t_{ij^{(h)}},X(t_{il^{(h)}})+\sigma^{2}\delta_{jl}$ for which $G_{i}(t_{ij^{(h)}},t_{il^{(h)}})=(Y_{ij^{(h)}}-\hat{\mu}(t_{ij^{(h)}})\,(Y_{il^{(h)}}-\hat{\mu}(t_{il^{(h)}})$ , where $\hat{\mu}(t)$ is a mean function obtained by applying a local linear smoother to the scatterplot $\{(t_{ij^{(h)}},Y_{ij^{(h)}}):1\leq i\leq N,\,1\leq j\leq J_{i}^{(h)}\}$ . The estimated covariance surface, $\hat{G}(s,t)$ , is calculated considering the raw estimates $G_{i}(t_{ij^{(h)}},t_{il^{(h)}})$ and then applying two-dimensional smoothing to the scatterplot $\{(\,(t_{ij^{(h)}},t_{il^{(h)}})\,;\,G_{i}(t_{ij^{(h)}},t_{il^{(h)}})\,):1\leq i\leq N,\,1\leq j\neq l\leq J_{i}^{(h)}\}$ . Nevertheless, the calculation of scores $\nu_{im}$ is done according to the definition of integrals in $L^{2}(\mathcal{T})$ , but in this case having that $Y_{ij^{(h)}}$ are available only at discrete random times $t_{ij^{(h)}}$ we can obtain the estimates $\hat{\nu}_{im}$ from (21) as $\hat{\nu}_{im}=\sum_{j=1}^{J_{i}^{h}}(Y_{ij^{(h)}}-\hat{\mu}(t_{ij^{(h)}})\,\hat{\xi}_{m}(t_{ij^{(h)}})\,(t_{ij^{(h)}}-t_{ij-1^{(h)}})$ . The goal is to obtain predicted trajectories $\hat{X}_{i}(t)$ for irregular data $Y_{ij^{(h)}}$ . In summary, by discretising continuous trajectories with step sizes $h$ , we can effectively analyse functional data even when observations are sparse or irregular. This approach allows for the practical implementation of FPCA, capturing essential features of the data while accounting for measurement errors and irregular observation patterns.

An alternative to the Riemann sum given by the previous formula is to assume that in (21), the $\nu_{im}$ and $\varepsilon_{ij^{(h)}}$ are jointly Gaussian. For this reason, defining $\bm{\hat{Y}}_{i}^{(h)}=(Y_{i1^{(h)}},...,Y_{iJ_{i}^{(h)}})^{T}$ , $\bm{\mu}_{i}^{(h)}=(\mu(t_{i1^{(h)}}),...,\mu(t_{iJ_{i}^{(h)}}))^{T}$ and $\bm{\xi}_{im}^{(h)}=(\xi_{m}(t_{i1^{(h)}}),...,\xi_{m}(t_{iJ_{i}^{(h)}}))^{T}$ under Gaussian assumptions, we obtain the best prediction for the $m$ -th FPC score $\nu_{im}$ for the $i$ -th subject as conditional expectation $E[\nu_{im}|\bm{\hat{Y}}_{i}^{(h)}]$ as follows:

\hat{\nu}_{im}=E[\nu_{im}|\bm{\hat{Y}}_{i}^{(h)}]=E[\nu_{im}]+Cov(\nu_{im},\bm{\hat{Y}}_{i}^{(h)})Cov(\bm{\hat{Y}}_{i}^{(h)},\bm{\hat{Y}}_{i}^{(h)})^{-1}(\bm{\hat{Y}}_{i}^{(h)}-\bm{\mu}_{i}^{(h)})=\lambda_{m}\bm{\xi}_{im}^{(h)\,T}\bm{\Sigma}_{\bm{\hat{Y}}_{i}^{(h)}}^{-1}(\bm{\hat{Y}}_{i}^{(h)}-\bm{\mu}_{i}^{(h)})\,,

(23)

where $\bm{\Sigma}_{\bm{\hat{Y}}_{i}^{(h)}}=\text{Cov}(\bm{\hat{Y}}_{i}^{(h)},\bm{\hat{Y}}_{i}^{(h)})=\text{Cov}(\bm{\hat{X}}_{i}^{(h)},\bm{\hat{X}}_{i}^{(h)})+\sigma^{2}\bm{I}$ , with $\bm{\hat{X}}_{i}^{(h)}=(\bm{X}(t_{i1^{(h)}}),...,\bm{X}(t_{iJ_{i}^{(h)}}))$ and $\bm{I}\in\mathbb{R}^{J_{i}^{(h)}\times J_{i}^{(h)}}$ identity matrix. Thus, $(\bm{\Sigma}_{\bm{\hat{Y}}_{i}^{(h)}})_{jl}=G(t_{ij^{(h)}},t_{il^{(h)}})+\sigma^{2}\delta_{jl}$ , where $\delta_{jl}=1$ if $j=l$ and $0$ otherwise. Equation (23) can be obtained by using the estimates of $\lambda_{m}$ , $\bm{\xi}_{im}^{(h)}$ , and $\bm{\Sigma}_{\bm{\hat{Y}}_{i}^{(h)}}$ as follows:

\hat{\nu}_{im}=\hat{\lambda}_{m}\hat{\bm{\xi}}_{im}^{\,(h)\,T}\hat{\bm{\Sigma}}_{\bm{\hat{Y}}_{i}^{(h)}}^{-1}(\bm{\hat{Y}}_{i}^{(h)}-\hat{\bm{\mu}}_{i}^{(h)}).

(24)

Finally, from (24) the prediction for the trajectory $X_{i}(t)$ for $i$ th subject can be done considering the first $p$ eigenfunctions as follows:

\hat{X}_{i}(t)=\hat{\mu}(t)+\sum_{m=1}^{p}\hat{\nu}_{im}\hat{\xi}_{m}(t)\quad\text{with}\quad t\in\mathcal{T}.

(25)

2.2.3 Functional Survival Tree (FST)

In supervised learning, Survival Trees (STs) extend the classical Classification Trees (CTs) grown on censored data. Unlike classical decision trees, STs are not directly applicable to censored data; a ST focuses on predicting the time until an event of interest occurs for a subject. These problems introduce a distinctive challenge when dealing with survival data, making certain aspects of implementing ST more intricate than decision trees for classification tasks. In survival studies, we have, for each subject, an observation vector $\bm{Y}_{i}=(Y_{i1},...,Y_{iJ_{i}})^{T}$ that is a data collected in $J_{i}$ values until the event time $T_{i}^{*}=\mbox{min}\,(C_{i},T_{i})$ occurs for the $i$ -th unit and the response vector $\bm{Z}=(Z_{1},Z_{2},...,Z_{N})^{T}$ that can have binary values given by $Z_{i}=0$ or $Z_{i}=1$ for $i=1,...,N$ . Thus, we observe, for the $i$ -th subject, the data given by $(\bm{Y}_{i},\bm{t}_{i},Z_{i},T_{i}^{*},\delta_{i})$ , with $\delta_{i}$ censoring indicator.

In traditional CTs, the non-pruned trees lead to identifying homogeneous terminal nodes (leaves) using some splitting criteria based on decreasing an impurity metric, e.g. the Gini or Shannon-Weier indexes. However, in STs, the splitting criterion used in each node $s$ of the tree, denoted by $T$ , differs from classical methods when dealing with censored data. Indeed, iteratively, the procedure grows a tree until each node defines, at the very least, a singular distinct event by using survival splitting as a means for maximizing between-node survival differences given by curves defined through parametric (Cox regression) and non-parametric (Kaplan-Meier) approaches Shimokawa et al. (2015).

In a classical decision tree, the features vectors $\bm{X}\in\mathbb{R}^{J}$ have the same size for each unit: on the contrary, in STs the challenge is how to grow a tree having, for each subject, data collected during follow-up study $\mathcal{T}$ given by $Y_{ij}=Y(t_{ij})$ for $j=1,...,J_{i}$ values with $J_{i}\neq J_{k}$ for some $i\neq k$ with $t_{ij}\in\mathcal{T}$ . For this reason, starting from these data types and defining the CFD, we deal with new features using the values estimated on CFD and applying the PACE decomposition on them. Therefore, extending Functional Classification Trees Maturo and Verde (2022) to the context of survival analysis, we propose the Functional Survival Classification Tree with Principal Components (FSCTs-FPCs). The latter classifier considers the new data $(\bm{\hat{\nu}}_{i},Z_{i},T_{i}^{*},\delta_{i})$ , where $\bm{\hat{\nu}}_{i}=(\hat{\nu}_{i1},\hat{\nu}_{i2},...,\hat{\nu}_{ip})^{T}$ corresponds to the $i$ -th score vector obtained from Equation 24. Hence, the features matrix is defined as follows:

\bm{\hat{V}}=\begin{pmatrix}\hat{\nu}_{11}&\hat{\nu}_{12}&\dots&\hat{\nu}_{1p}\\ \hat{\nu}_{21}&\hat{\nu}_{22}&\dots&\hat{\nu}_{2p}\\ \vdots&\vdots&\ddots&\vdots\\ \hat{\nu}_{N1}&\hat{\nu}_{N2}&\dots&\hat{\nu}_{Np}\\ \end{pmatrix}

(26)

where $\bm{\hat{V}}\in\mathbb{R}^{N\times p}$ and $\hat{\nu}_{im}$ is the $m$ -th FPC score of the $i$ -th curve relative to the eigenfunction $\hat{\xi}_{m}$ for $i=1,...,N$ and $m=1,...,p$ .

Typically, considering the node $s$ of the survival tree $T$ , given a score-predictor $\bm{\hat{\nu}}_{m}$ , the method finds a value $c$ and selects the best $\bm{\hat{\nu}}_{0}$ such that survival differences between the two inequalities, $\bm{\hat{\nu}}_{0}\leq c$ and $\bm{\hat{\nu}}_{0}>c$ , are maximized in order to predict the response $\bm{Z}$ . The aim is to define in node $s$ two regions $R_{1}(m,c)=\{\hat{X}(t)\in L^{2}(\mathcal{T})|\bm{\hat{\nu}}_{m}\leq c\}$ and $R_{2}(m,c)=\{\hat{X}(t)\in L^{2}(\mathcal{T})|\bm{\hat{\nu}}_{m}>c\}$ to identify $l=1,...,L$ times for the two groups, with $t_{1}\leq t_{2}\leq...\leq t_{L}$ . One of the survival criteria used within node $s$ , to select the threshold $c$ and the functional score-prediction $\bm{\hat{\nu}}_{0}$ , is the log-rank test as splitting method Ziegler et al. (2007). It can be used to maximize a splitting value $c$ for a predictor variable $\bm{\hat{\nu}}$ as follows:

|L(\bm{\hat{\nu}},c)|=\frac{\sum_{l=1}^{L}\Big{(}d_{l1}-r_{l1}\frac{d_{l}}{r_{l}}\Big{)}}{\sqrt{\sum_{l=1}^{L}\frac{r_{l1}}{r_{l}}\Big{(}1-\frac{r_{l1}}{r_{l}}\Big{)}\Big{(}\frac{r_{l}-d_{l}}{r_{l}-1}\Big{)}d_{l}}}

(27)

with $d_{lj}$ number of events in daughter nodes $j\in\{1,2\}$ at time $t_{l}\in\mathcal{T}$ (time of observation), where $d_{l}=d_{l1}+d_{l2}$ is the total number of events within node $s$ . Moreover, $r_{lj}$ is the number of individuals who are at risk (alive) in daughter nodes $j\in\{1,2\}$ at time $t_{l}\in\mathcal{T}$ , where $r_{l}=r_{l1}+r_{2}$ is the total number of individuals alive within node $s$ . $K$ is the total number of times between two daughter nodes. The number $r_{l1}$ corresponds to $T_{l}\geq t_{l}$ for which $\bm{\hat{\nu}}\leq c$ , while $r_{l2}$ corresponds to $T_{l}\geq t_{l}$ for which $\bm{\hat{\nu}}>c$ within node $s$ . The Equation (27) allows finding the best $\bm{\hat{\nu}}^{*}$ and the best $c^{*}$ which $|L(\bm{\hat{\nu}}^{*},c^{*})|\geq|L(\bm{\hat{\nu}},c)|$ for each $\bm{\hat{\nu}}$ and $c$ chosen randomly. This process is repeated randomly at every node $s$ until the terminal node is reached. The FSCTs-FPCs use different methods for constructing predictive models. Two common alternative approaches include log-rank splitting, which divides data based on survival differences, and the Nelson-Aalen estimator, used for estimating cumulative hazard. The last method helps us to identify significant predictors of survival outcomes in the terminal nodes. Once the survival tree $T$ is built, let $s$ be the terminal node, where $s=1,...,S$ , in which are defined the times $t_{1s}\leq t_{2s}\leq...\leq t_{L_{s}s}$ and let $d_{ls}$ and $r_{ls}$ the number of events and individuals at risk at time $t_{ls}$ , the survival predictor in $T$ is defined in terms of the predictor within each terminal node $s$ . For this, the Cumulative Hazard Function (CHF) and the Survival Function (SF) are obtained by using the Nelson-Aalen estimator and Kaplan-Meier estimator defined as follows:

\mathcal{\hat{H}}_{s}(t)=\sum_{t_{ls}\leq t}\frac{d_{ls}}{r_{ls}},\quad\hat{S}_{s}(t)=\prod_{t_{ls}\leq t}\Bigg{(}1-\frac{d_{ls}}{r_{ls}}\Bigg{)}\,

(28)

where $\mathcal{H}_{s}(t)$ and $\hat{S}_{s}(t)$ are the hazard and the survival estimate for a terminal node $s$ obtained considering all distinct event times $t_{ls}\leq t$ . However, predicting survival outcomes after the survival tree construction is essential. This evaluation often involves comparing the predicted survival probabilities with the observed outcomes. One standard method for evaluating the importance of a survival tree is using the Concordance Index (C-index), also known as Harrell’s C or the C-statistics, which tells us how accurately the model ranks people’s survival times. A higher C-index indicates better predictive performance of the survival tree model. Other metrics, such as the Brier score, can be used for further evaluation.

2.2.4 Functional Random Survival Forest (FRSF)

The Functional Random Survival Forest (FRSF) extends the classic RSF. In our approach, the method is built for the CFD considering the pairs $\{Z_{i},X_{i}(t)\}$ , where $X_{i}(t)$ is a CFD curve for $i$ -th unit defined in the metric space $(L^{2}(\mathcal{T}),\|\cdot\|_{2})$ , while $Z_{i}$ is a scalar response. Starting from the dataset $D=\{Z_{i},\hat{\bm{\nu}}_{i},T_{i}^{*},\delta_{i}\}_{i=1}^{N}$ used for the realization of the model, where $\hat{\bm{\nu}}_{i}\in\mathbb{R}^{p}$ is the $i$ -th score calculated by PACE on CFD, $Z_{i}$ is a binary response, $T_{i}^{*}=\min\,(C_{i},T_{i})$ , and $\delta_{i}$ is a censoring indicator at sample $i=1,...,N$ . The method generates $B$ bootstrap samples from $D$ by creating the Training data set (In-bag data) and Validation data set (Out-of-bag data) to construct multiple STs.

The following pseudo-code can summarise the FRSF:

Algorithm 1 Functional Random Survival Forest (FRSF)

1:Input: Dataset

D=\{(Z_{i},\hat{\bm{\nu}}_{i},T_{i}^{*},\delta_{i})\}_{i=1}^{N}

, number of trees

B

, number of random predictors

q

2:for

b=1,2,...,B

3: Sample a

b

bootstrap sample of size

N

from

D

with replacement (In-bag data)

4: Use the remaining data as Out-of-bag data

5: Train a survival tree

T_{b}

on the bootstrap sample

b

using

q

random FPCs’ scores at the node

6: Calculate a CHF in each terminal node

s

7:end for

8:Calculate the average CHF over all survival trees

T_{1},T_{2},\ldots,T_{B}

for IB and OOB data

9:Output: Survival Trees

\{T_{1},T_{2},...,T_{B}\}

and the averaged CHF

The step $8$ of the algorithm can be explained from (28) as follows:

\overline{\mathcal{H}}^{IB}(t|\hat{\bm{\nu}})=\frac{1}{B}\sum_{b=1}^{B}{\mathcal{H}}_{b}(t|\hat{\bm{\nu}})\,,\qquad\overline{\mathcal{H}}_{i}^{OOB}(t|\hat{\bm{\nu}}_{i})=\frac{1}{B}\sum_{b=1}^{B}{\mathcal{H}}_{b}(t|\hat{\bm{\nu}}_{i})

(29)

where ${\mathcal{H}}_{b}(t|\hat{\bm{\nu}})$ is the in-bag CHF, while ${\mathcal{H}}_{b}(t|\hat{\bm{\nu}}_{i})$ is the out-of-bag CHF for $b$ -th survival tree.

The metrics derived from the ensemble CHF, such as the predictor error and the variable importance (VIMP), provide crucial insights into the model’s goodness-of-fit calculated using only the OOB data. This process evaluates the accuracy of predictors and determines their relative importance in predicting survival outcomes. In FRSF the Breiman-Cutler Variable Importance (VIMP), also known as permutation importance Breiman (2002) is used to quantify the importance of each predictor variable by evaluating its impact on prediction error. The VIMP method permutes the out-of-bag (OOB) values of $\hat{\bm{\nu}}$ within each tree. This permutation process entails randomising the values of $\hat{\bm{\nu}}$ , while keeping the values of the other predictor variables unchanged. Moreover, the modified data influences decision paths through the tree structure based on the altered $\hat{\bm{\nu}}$ values. The resulting out-of-bag error, indicative of prediction accuracy with the permuted $\hat{\bm{\nu}}$ is compared to the original error calculated with unaltered $\hat{\bm{\nu}}$ values. The discrepancy between these errors quantifies the importance of $\hat{\bm{\nu}}$ in predicting outcomes. Aggregating these individual importance scores across all trees in the forest yields the permutation importance for variable $\hat{\bm{\nu}}$ , providing insights into its contribution to the overall predictive performance of the model.

3 Application

We consider a dataset from the field of critical care medicine called the Sequential Organ Failure Assessment (SOFA) score, designed to provide a comprehensive evaluation of organ function in critically ill patients. The SOFA score dataset is available in the R package ’refund’. Many applications have been developed utilizing the SOFA score dataset, as evidenced by several studies Moreno et al. (2023). This dataset concerns specifically $520$ patients admitted to the Intensive Care Unit (ICU) with Acute Lung Injury. For each patient, daily measurements of SOFA scores, which range from $0$ to $24$ , have been recorded for each patient over $173$ ICU days as shown in Figure (1), indicating the severity of organ failure.

Refer to caption — Figure 1: SOFA scores collected in $173$ ICU days concerning outcome

Other variables collected during hospitalization include ICU death indicators, ICU length of stay (LOS), patient age, gender, and the Charlson co-morbidity index, providing insights into baseline health status. Figure (2) shows descriptive statistics for the variables. Boxplots visually summarize the distributional characteristics of the numerical variables, while barplots compare categorical data. All data concerning the outcome variable indicate the survival status, corresponding to $237$ patients died and $283$ lived until the last recording time.

The first problem we address is converting the SOFA scores into a functional form to monitor recovery progress throughout hospitalization. We thus propose a functional approach based on the previously described definition in Equation 2.2.1, where the response variable corresponds to the SOFA scores, and the independent variable represents the $t$ at which the score was calculated for different recorded values. For each statistical unit, truncated curves will be defined based on the scores recorded during hospitalization as follows:

$\displaystyle\hat{Y}_{\text{score}}(t)_{(1)}$	$\displaystyle=Y_{\text{score}}$				(30)
$\displaystyle\hat{Y}_{\text{score}}(t)_{(2)}$	$\displaystyle=\hat{\beta}_{0}+\hat{\beta}_{1}t$				(31)
$\displaystyle\hat{Y}_{\text{score}}(t)_{(i)}$	$\displaystyle=\hat{c}_{1}\hat{\phi_{1}}(t)+\hat{c}_{2}\hat{\phi_{2}}(t)+\ldots+\hat{c}_{K_{i}}\hat{\phi}_{K_{i}}(t)$	with	$\displaystyle i\in\Omega$	$\displaystyle\Omega=\{i\in\mathcal{I}:J_{i}\geq 3\}$	(32)

where in the equation (32) we select the set of functions $\{\hat{\phi}_{k}\}_{k=1,\ldots,K_{i}}$ as B-splines of order $4$ . The truncated curves have been visualized respect to the corresponding outcome variable, as shown in the Figure (3). To assess the model quality of the FRSF, we propose to use different training datasets, comprising $50\%$ , $60\%$ , $70\%$ , and $80\%$ of the SOFA dataset.

Train Dataset	STD Model		CFD Model(h=0.5)		CFD Model(h=0.2)
Train Dataset	(OOB) CRPS	(OOB) RPE	(OOB) CRPS	(OOB) RPE	(OOB) CRPS	(OOB) RPE
50%	0.11830311	0.14836430	0.11861220	0.14734700	0.11130006	0.13733469
60%	0.11631665	0.14853905	0.11517514	0.14402103	0.11273948	0.14061401
70%	0.09189498	0.14906593	0.09253855	0.14428571	0.08753912	0.14195055
80%	0.10149814	0.14921755	0.10135369	0.14619833	0.09974069	0.14570930

Table 1: Comparison of OOB scores for CRPS and RPE across different training datasets.

The model is evaluated considering the four scenarios obtained from the previously defined partition. In particular, we compare our proposed method (FRSF for CFD) and the traditional FRSF for STD data. We assess the models’ performance across the different training dataset proportions using various metrics, such as the Continuous Ranked Probability Score (CRPS) and the Requested Performance Error (RPE) on the OOB data proportion. As seen in Table 1, the models show varying degrees of accuracy depending on the proportion of the training dataset.

A more detailed analysis of temporal evaluations with different temporal discretisation values, represented by the parameter $h$ , enables capturing information that may otherwise be lost during the follow-up period. By carefully selecting $h$ , we ensure a more detailed and accurate temporal assessment, filling in gaps where the information was not collected initially. Specifically, we have chosen two different bandwidth values, $h=0.5$ and $h=0.2$ , within the CFD model. Moreover, the model’s performance has been analyzed using the RPE scores based on the number of trees and the CRPS scores to observe the forecasted temporal trend. Graphs for the four different scenarios (4) illustrate the model’s predictive performance evolution.

It can be noted that there is a significant improvement in the RPE (left panel) and CRPS (right panel) in all cases.

The CRPS function shows better performance for the FRSF with CFD during intermediate daily periods when a substantial amount of data is available. For example, when examining the CRPS and RPE graphs for the Model CFD with $h=0.2$ , we observe a gradual decrease in both metrics over time, indicating improved model accuracy. Furthermore, the quality of the models has been shown through Variable Importance (VIMP) and Relative Importance (RI) for scenarios (50%, 60%, 70%, and 80% of the Training dataset) for the three models. In our specific analysis, we focus on the scenario using 80% of the training dataset (results for other scenarios are provided in the appendix). The variables considered include the first four FPCs scores, Age, Charlson Comorbidity Index, and Gender. By examining the variable importance values across different scenarios and models, we can determine each feature relative contribution to the predictive performance of the models (Table 2).

Variable	Importance	Relative Importance
PC1	0.4138	1.0000
PC2	0.1325	0.3201
PC4	0.0786	0.1900
PC3	0.0510	0.1231
Age	0.0435	0.1052
Charlson	0.0284	0.0686
Gender	-0.0004	-0.0009

Variable	Importance	Relative Importance
PC1	0.4202	1.0000
PC2	0.1287	0.3062
PC4	0.0806	0.1919
PC3	0.0487	0.1159
Age	0.0416	0.0989
Charlson	0.0270	0.0644
Gender	-0.0008	-0.0019

Variable	Importance	Relative Importance
PC1	0.4396	1.0000
PC2	0.1191	0.2709
PC4	0.0945	0.2149
PC3	0.0532	0.1211
Age	0.0434	0.0987
Charlson	0.0346	0.0786
Gender	-0.0004	-0.0008

Table 2: Variable Importance (VIMP) for Model employing 80% of the data as the training set. The top-left table corresponds to the Model STD, the top-right table corresponds to the Model CFD (h=0.5), and the bottom table represents the Model CFD (h=0.2).

In general, PC1 consistently exhibits the highest importance across all cases and models, followed by PC2 and PC4. Age and Charlson Comorbidity Index also show moderate importance, while Gender has an insignificant impact on the model. These values underscore the varying degrees of influence that different variables have on the predictive accuracy of the models.

4 Discussion and Conclusions

This study presents a new approach to survival analysis by developing Functional Random Survival Forest (FRSF), specially designed for Censored Functional Data (CFD). This approach embeds Functional Data Analysis within the framework of Random Survival Forest, thereby utilizing the strength of FPCA in managing complex, high-dimensional, censored, and temporally correlated data.

This study’s critical contribution is introducing the FRSF model, a novel extension of traditional RSF incorporating functional data. This offers significant advantages compared to conventional methods, especially when the data are censored and/or high-dimensional. The implementation of CFD provides a robust framework for handling incomplete or irregularly spaced data by enabling the reconstruction of continuous trajectories from discrete observations and performing a more comprehensive analysis of survival dynamics. Following the work of Maturo and Verde on FRF Maturo and Verde (2023), the current study adapts and extends FRF to the context of survival analysis and mainly deals with the difficulties posed by censored data. Fusing FPCA into the FRSF framework allows for dimension reduction while retaining key features, which gives the model interpretability to the FST and explainability for the overall RSF model. That means the model remains flexible in accounting for differing observation times and censoring mechanisms, leading to robust inferences without some common assumptions made for the parametric survival models.

A crucial aspect of this study is the introduction of the parameter $h$ , which denotes the step size for time increments in the evaluation of functional data. The parameter $h$ is meaningful as it permits discretizing continuous trajectories, enabling effective functional data analysis even when observations are sparse or irregular. Different values of $h$ can capture various levels of detail in the data, with smaller values providing finer resolution and potentially revealing more fine patterns in the temporal evolution of the data. Our analysis used different values of $h$ to demonstrate how different discretization levels impact the model’s performance. The proposed FRSF model was empirically validated using the SOFA dataset, which consists of daily measurements of critically ill patients.

Despite its strengths, the FRSF model has certain limitations. One of the primary challenges is the computational complexity associated with the FPCA decomposition and the subsequent construction of survival trees. The choice of basis functions and FPCs can also impact the model’s performance. Future research should optimise these aspects to enhance the model’s efficiency and scalability and test different basis functions, such as wavelets, to capture more complex and localised patterns in the data. In addition, future research could explore integrating FRSF with other machine learning approaches, such as boosting or deep learning, which could further enhance its predictive capabilities and scalability.

In conclusion, the FRSF offers a powerful and versatile tool for survival analysis, particularly in the presence of censored and irregularly spaced functional data. Its integration of FDA and RSF techniques, along with empirical validation, demonstrates its potential for significant contributions to theoretical advancements and practical applications in survival analysis.

Funding and/or Conflicts of interests/Competing interests

All the authors declare that they did not receive support from any organisation for the submitted work. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

References

Aguilera and Aguilera-Morillo [2013] A. Aguilera and M. Aguilera-Morillo. Penalized PCA approaches for b-spline expansions of smooth functional data. Applied Mathematics and Computation, 219(14):7805–7819, mar 2013. doi: 10.1016/j.amc.2013.02.009. URL https://doi.org/10.1016%2Fj.amc.2013.02.009.
Biemann and Kearney [2010] T. Biemann and E. Kearney. Size does matter: How varying group sizes in a sample affect the most common measures of group diversity. Organizational Research Methods, 13(3):582–599, jul 2010. doi: 10.1177/1094428109338875. URL https://doi.org/10.1177%2F1094428109338875.
Breiman [2002] L. Breiman. Manual on setting up, using, and understanding random forests. (1), 2002.
Chao and Shen [2003] A. Chao and T.-J. Shen. Nonparametric estimation of shannon’s index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10(4):429–443, 2003. doi: 10.1023/B:ENVR.0000043140.70017.ac.
Chao et al. [2014] A. Chao, N. J. Gotelli, T. C. Hsieh, E. L. Sander, K. H. Ma, R. K. Colwell, and A. M. Ellison. Rarefaction and extrapolation with hill numbers: a framework for sampling and estimation in species diversity studies. Ecological Monographs, 84(1):45–67, feb 2014. doi: 10.1890/13-0133.1. URL https://doi.org/10.1890%2F13-0133.1.
Delaigle and Hall [2013] A. Delaigle and P. Hall. Classification using censored functional data. Journal of the American Statistical Association, 108(504):1269–1283, 2013. doi: 10.1080/01621459.2013.824893. URL https://doi.org/10.1080/01621459.2013.824893.
Fan and Gijbels [1992] J. Fan and I. Gijbels. Variable bandwidth and local linear regression smoothers. Annals of Statistics, 20(4):2008–2036, December 1992. doi: 10.1214/aos/1176348900.
Febrero-Bande and de la Fuente [2012a] M. Febrero-Bande and M. de la Fuente. Statistical computing in functional data analysis: The r package fda.usc. Journal of Statistical Software, Articles, 51(4):1–28, 2012a. doi: 10.18637/jss.v051.i04. URL https://www.jstatsoft.org/v051/i04.
Febrero-Bande and de la Fuente [2012b] M. Febrero-Bande and M. O. de la Fuente. Statistical computing in functional data analysis: The R package fda.usc. Journal of Statistical Software, 2012b. ISSN 15487660. doi: 10.18637/jss.v051.i04.
Ferraty [2011] F. Ferraty. Recent Advances in Functional Data Analysis and Related Topics. Physica-Verlag HD, 2011. doi: 10.1007/978-3-7908-2736-1. URL https://doi.org/10.1007%2F978-3-7908-2736-1.
Ferraty and Vieu [2006] F. Ferraty and P. Vieu. Nonparametric Functional Data Analysis. Springer New York, 2006. doi: 10.1007/0-387-36620-2. URL https://doi.org/10.1007%2F0-387-36620-2.
Hemant et al. [2008] I. Hemant, U. Kogalur, E. Blackstone, and M. Lauer. Random survival forests. The Annals of Applied Statistics, 2:841–860, 2008.
Ishwaran et al. [2008] H. Ishwaran, U. B. Kogalur, E. H. Blackstone, and M. S. Lauer. Random survival forests. The Annals of Applied Statistics, 2:841–860, 2008.
Kleinbaum and Klein [2012] D. G. Kleinbaum and M. Klein. Survival Analysis: A Self-learning Text. Springer, third edition, 2012. ISBN 978-1441966452.
Lin et al. [2021] J. Lin, K. Li, and S. Luo. Functional survival forests for multivariate longitudinal outcomes: Dynamic prediction of Alzheimer’s disease progression. Statistical Methods in Medical Research, 30(1):99–111, 2021. doi: 10.1177/0962280220941532. URL https://doi.org/10.1177/0962280220941532.
Maturo and Verde [2022] F. Maturo and R. Verde. Pooling random forest and functional data analysis for biomedical signals supervised classification: Theory and application to electrocardiogram data. Statistics in Medicine, 41(12):2247–2275, 2022. doi: https://doi.org/10.1002/sim.9353. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9353.
Maturo and Verde [2023] F. Maturo and R. Verde. Supervised classification of curves via a combined use of functional data analysis and tree-based methods. Computational Statistics, 38:419–459, 2023. doi: 10.1007/s00180-022-01236-1. URL https://doi.org/10.1007/s00180-022-01236-1.
Maturo and Verde [2024] F. Maturo and R. Verde. Combining unsupervised and supervised learning techniques for enhancing the performance of functional data classifiers. Computational Statistics, 39(1):239–270, 2024. doi: 10.1007/s00180-022-01259-8.
Moreno et al. [2023] R. Moreno, A. Rhodes, L. Piquilloud, et al. The sequential organ failure assessment (sofa) score: has the time come for an update? Critical Care, 27(1):15, 2023.
Ramsay and Silverman [2005] J. Ramsay and B. Silverman. Functional Data Analysis, 2nd edn. Springer, New York, 2005.
Ramsay et al. [2009] J. Ramsay, G. Hooker, and S. Graves. Introduction to functional data analysis. In Functional Data Analysis with R and MATLAB, pages 1–19. Springer New York, 2009. doi: 10.1007/978-0-387-98185-7\_1. URL https://doi.org/10.1007%2F978-0-387-98185-7_1.
Schimek [2013] M. G. Schimek, editor. Smoothing and Regression: Approaches, Computation, and Application. John Wiley & Sons, 2013.
Shimokawa et al. [2015] A. Shimokawa, Y. Kawasaki, and E. Miyaoka. Comparison of splitting methods on survival tree. The International Journal of Biostatistics, 11:175 – 188, 2015. URL https://api.semanticscholar.org/CorpusID:11441090.
Spreafico et al. [2023] M. Spreafico, F. Ieva, and M. Fiocco. Modelling time-varying covariates effect on survival via functional data analysis: application to the MRC BO06 trial in osteosarcoma. Statistical Methods & Applications, 32:271–298, 2023. doi: 10.1007/s10260-022-00647-0. URL https://doi.org/10.1007/s10260-022-00647-0.
Strzalkowska-Kominiak and Romo [2021] E. Strzalkowska-Kominiak and J. Romo. Censored functional data for incomplete follow-up studies. Statistics in Medicine, 40:2821–2838, 2021. doi: 10.1002/sim.8930. URL https://doi.org/10.1002/sim.8930.
Wang and Li [2017] H. Wang and G. Li. A selective review on random survival forests for high dimensional data. Quantitative Biology, 36(2):85–96, 2017. doi: 10.22283/qbs.2017.36.2.85.
Wang et al. [2019] P. Wang, Y. Li, and C. K. Reddy. Machine learning for survival analysis: A survey. ACM Computing Surveys (CSUR), 51(6):1–36, 2019.
Wey et al. [2015] A. Wey, J. Connett, and K. Rudser. Combining parametric, semi-parametric, and non-parametric survival models with stacked survival models. Biostatistics, 16(3):537–549, 02 2015. ISSN 1465-4644. doi: 10.1093/biostatistics/kxv001. URL https://doi.org/10.1093/biostatistics/kxv001.
Yao et al. [2005] F. Yao, H.-G. Müller, and J.-L. Wang. Functional linear regression analysis for longitudinal data. The Annals of Statistics, pages 287 3–2903, 2005.
Zhang and Sun [2010] Z. Zhang and J. Sun. Interval censoring. Statistical Methods in Medical Research, 19(1):53–70, Feb 2010. doi: 10.1177/0962280209105023. URL https://doi.org/10.1177/0962280209105023.
Ziegler et al. [2007] A. Ziegler, S. Lange, and R. Bender. Survival analysis: log rank test. Dtsch Med Wochenschr, 132(Suppl 1):e39–e41, 2007.

Appendix

VIMP Tables

Variable	Importance	Relative Importance
PC1	0.4466	1.0000
PC2	0.1209	0.2708
PC4	0.0610	0.1366
Age	0.0526	0.1177
PC3	0.0512	0.1147
Charlson	0.0398	0.0891
Gender	-0.0004	-0.0008

Variable	Importance	Relative Importance
PC1	0.4549	1.0000
PC2	0.1233	0.2710
PC4	0.0701	0.1541
Age	0.0497	0.1093
PC3	0.0459	0.1010
Charlson	0.0388	0.0854
Gender	0.0000	0.0000

Variable	Importance	Relative Importance
PC1	0.4749	1.0000
PC2	0.1144	0.2409
PC4	0.0800	0.1685
Age	0.0475	0.0999
PC3	0.0464	0.0977
Charlson	0.0381	0.0801
Gender	-0.0001	-0.0002

Table 3: Variable Importance (VIMP) for 50% of the training dataset. The top-left table corresponds to the Model STD, the top-right table corresponds to the Model CFD (h=0.5), and the bottom table represents the Model CFD (h=0.2).

Variable	Importance	Relative Importance
PC1	0.4473	1.0000
PC2	0.1078	0.2409
PC4	0.0768	0.1716
Age	0.0656	0.1467
PC3	0.0467	0.1043
Charlson	0.0424	0.0947
Gender	-0.0007	-0.0015

Variable	Importance	Relative Importance
PC1	0.4571	1.0000
PC2	0.1117	0.2445
PC4	0.0824	0.1803
Age	0.0585	0.1280
PC3	0.0399	0.0873
Charlson	0.0389	0.0852
Gender	-0.0006	-0.0012

Variable	Importance	Relative Importance
PC1	0.4674	1.0000
PC2	0.1094	0.2340
PC4	0.0916	0.1959
Age	0.0536	0.1147
PC3	0.0423	0.0904
Charlson	0.0414	0.0886
Gender	-0.0003	-0.0006

Table 4: Variable Importance (VIMP) for 60% of the training dataset. The top-left table corresponds to the Model STD, the top-right table corresponds to the Model CFD (h=0.5), and the bottom table represents the Model CFD (h=0.2).

Variable	Importance	Relative Importance
PC1	0.4338	1.0000
PC2	0.1169	0.2694
PC4	0.1067	0.2459
Age	0.0612	0.1411
PC3	0.0567	0.1308
Charlson	0.0362	0.0836
Gender	-0.0001	-0.0003

Variable	Importance	Relative Importance
PC1	0.4457	1.0000
PC2	0.1104	0.2477
PC4	0.1062	0.2382
Age	0.0557	0.1249
PC3	0.0515	0.1156
Charlson	0.0432	0.0969
Gender	-0.0001	-0.0001

Variable	Importance	Relative Importance
PC1	0.4465	1.0000
PC4	0.1045	0.2340
PC2	0.1024	0.2293
Age	0.0524	0.1174
PC3	0.0517	0.1157
Charlson	0.0415	0.0928
Gender	-0.0003	-0.0008

Table 5: Variable Importance (VIMP) for 70% of the training dataset. The top-left table corresponds to the Model STD, the top-right table corresponds to the Model CFD (h=0.5), and the bottom table represents the Model CFD (h=0.2).