Deep Learning on Hester Davis Scores for
Inpatient Fall Prediction
Abstract
Fall risk prediction among hospitalized patients is a critical aspect of patient safety in clinical settings, and accurate models can help prevent adverse events. The Hester Davis Score (HDS) is commonly used to assess fall risk, with current clinical practice relying on a threshold-based approach. In this method, a patient is classified as high-risk when their HDS exceeds a predefined threshold. However, this approach may fail to capture dynamic patterns in fall risk over time. In this study, we model the threshold-based approach and propose two machine learning approaches for enhanced fall prediction: One-step ahead fall prediction and sequence-to-point fall prediction. The one-step ahead model uses the HDS at the current timestamp to predict the risk at the next timestamp, while the sequence-to-point model leverages all preceding HDS values to predict fall risk using deep learning. We compare these approaches to assess their accuracy in fall risk prediction, demonstrating that deep learning can outperform the traditional threshold-based method by capturing temporal patterns and improving prediction reliability. These findings highlight the potential for data-driven approaches to enhance patient safety through more reliable fall prevention strategies.
Index Terms:
Fall risk, fall prediction, Hester Davis score, machine learning.
I Introduction
Fall risk assessment is a critical process in healthcare, aimed at identifying hospitalized patients who are at higher risk of falling during their stay [1]. This assessment typically involves evaluating a combination of factors such as age, mobility, mental status, use of certain medications, and previous fall history. Tools like the Hester Davis Score (HDS) are widely employed to quantify fall risk based on these factors, allowing healthcare providers to classify patients into low, moderate, or high-risk categories [2, 3]. By accurately identifying high-risk patients, hospitals can implement preventive measures, such as increasing monitoring, modifying the patient’s environment, or providing assistive devices to prevent falls. Effective fall risk assessment is key to improving patient safety, reducing fall-related injuries, and minimizing healthcare costs associated with prolonged hospital stays and related complications [1, 4].
The HDS is a widely used tool in clinical settings for fall risk evaluation, offering a structured and standardized scoring system that assesses key factors such as age, mental status, mobility, medication usage, continence, recent fall history, and behavioral tendencies. Each factor is assigned a weighted score based on its contribution to fall risk, producing a cumulative score that categorizes patients into risk levels. The HDS allows for near real-time reassessment, as it incorporates both static and dynamic characteristics of the patient. This facilitates timely interventions, such as bed alarms or increased supervision, to mitigate fall risks in hospitalized patients [5].
Despite the utility of the HDS and other threshold-based models, these approaches often fail to capture the evolving risk patterns over time. They rely on instantaneous values to trigger preventive measures, which may not reflect the subtle, progressive changes in a patient’s condition. To address this limitation, machine learning models can offer a more dynamic and data-driven approach to fall risk prediction by incorporating the sequential pattern in the data. Similar approaches have been proposed for other challenges in healthcare, such as early warning systems [6, 7], hypertension detection [8], and human activity recognition [9, 10].
Machine learning, and particularly deep learning, has demonstrated superior performance in a variety of healthcare applications, from COVID-19 lung prognosis detection using chest computed tomography (CT) scans [11], to cervical spine fracture detection [12], and in-hospital mortality prediction among diabetic intensive care unit (ICU) patients [13]. In fall prediction, these models can be used to analyze complex, non-linear interactions between clinical variables, offering enhanced predictive power.
In this paper, we model the traditional threshold-based fall risk assessment approach using HDS and propose two machine learning-based alternatives: One-step ahead fall prediction and sequence-to-point fall prediction using deep learning. The former uses the HDS at a given time to predict fall risk at the next timestamp, while the latter leverages all preceding samples in a time series to forecast fall events. Sequence-to-point prediction is particularly important, as it captures the entire sequence of events leading up to a fall, allowing the model to identify temporal patterns and trends that threshold-based methods might overlook. For example, a gradual increase in HDS values over time may signify rising fall risk, even if the individual scores do not exceed predefined thresholds. This approach enables more accurate and timely predictions, enhancing the ability to intervene before a fall occurs. In particular, recurrent neural networks (RNNs) [14, 15], long short-term memory (LSTM) [16] networks, and gated recurrent unit (GRU) [17] networks are proposed to learn from the temporal patterns in the HDSs. We compare the performance of these methods to evaluate their potential in improving the accuracy and timeliness of fall risk predictions in clinical settings.
The source code and data used in this project can be made available upon reasonable request and approval by the corresponding authorities; please contact the corresponding author.


II Fall Prediction Models
In this section, fall prediction using HDS is modeled in two schemes, as illustrated in Figure 1. Let $\mathbf{x}^{(i)} = (x^{(i)}_1, \ldots, x^{(i)}_T)$ represent the HDSs of an individual $i$ from admission to discharge, where $t$ denotes a prediction timestamp, $T$ is the total number of retrospective samples, and $i \in \{1, \ldots, N\}$, where $N$ is the total number of individuals. The HDS of each patient is recalculated at fixed intervals of several hours during the stay. In the one-step ahead fall prediction scheme, in order to make a prediction for an individual at the future timestamp $t+1$, only the last HDS $x^{(i)}_t$ is used. In the sequence-to-point fall prediction scheme, the entire sequence of HDS samples since admission, $(x^{(i)}_1, \ldots, x^{(i)}_t)$, is used to make a prediction for the individual at the future timestamp $t+1$.
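As a concrete illustration of the two input formats, the following sketch (in Python, with purely hypothetical HDS values and helper names) shapes one patient's score sequence into the single-sample pairs used by the one-step ahead models and into the full admission-to-discharge sequence used by the sequence-to-point models; the labeling convention shown is an illustrative assumption, not the study's exact annotation procedure.

```python
import numpy as np

def one_step_pairs(hds, fall_at_end):
    """Pair each score x_t with a label for timestamp t+1 (one-step ahead scheme).

    `hds` is one patient's HDS sequence and `fall_at_end` marks whether the stay
    ended with a documented fall; both names are illustrative placeholders.
    """
    x = np.asarray(hds[:-1], dtype=np.float32).reshape(-1, 1)  # x_1, ..., x_{T-1}
    y = np.zeros(len(x), dtype=np.int64)                       # outcome at t+1
    y[-1] = int(fall_at_end)                                   # only the last transition is labeled here
    return x, y

def full_sequence(hds):
    """Return the admission-to-discharge sequence used by sequence-to-point models."""
    return np.asarray(hds, dtype=np.float32).reshape(-1, 1)    # shape (T, 1)

scores = [6, 8, 9, 12, 15]                                     # hypothetical HDS trajectory
x_pairs, y_pairs = one_step_pairs(scores, fall_at_end=True)
print(x_pairs.shape, y_pairs, full_sequence(scores).shape)
```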
II-A One-Step Ahead Fall Prediction
The current clinical practice approach involves comparing the HDS to a predefined threshold. In this section, we mathematically model this approach and then propose machine learning models to learn from the HDS at the current timestamp $t$ in order to predict the outcome at the subsequent timestamp $t+1$.
II-A1 Threshold-based Method
Most clinical providers use an absolute threshold to determine whether a patient is at a high risk of fall and needs extra care and increased monitoring. In this approach, if at any timestamp $t$ the HDS value $x^{(i)}_t$ for patient $i$ exceeds the threshold $\gamma$, the patient is classified as high-risk for falls, defined as

$$\hat{y}^{(i)}_{t+1} = \begin{cases} 1, & \text{if } x^{(i)}_t > \gamma \\ 0, & \text{otherwise,} \end{cases} \qquad (1)$$

where $\hat{y}^{(i)}_{t+1} = 1$ means the patient is at high risk of fall at the future timestamp $t+1$ and $\hat{y}^{(i)}_{t+1} = 0$ means otherwise.
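A minimal sketch of this rule, assuming the score sequence is available as a plain array and using an arbitrary cut-off value for illustration:

```python
import numpy as np

def threshold_predict(hds_sequence, gamma):
    """Eq. (1): flag high fall risk at t+1 whenever the current HDS exceeds gamma."""
    return (np.asarray(hds_sequence) > gamma).astype(int)

# Hypothetical trajectory; a cut-off of 20 is one of the operating points compared later.
print(threshold_predict([6, 12, 21, 18], gamma=20))  # -> [0 0 1 0]
```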
II-A2 Machine Learning Methods
It is possible to build a binary fall prediction model for the fall outcome at a future timestamp $t+1$ based solely on the currently available HDS sample $x^{(i)}_t$. The task is to predict a binary label $y^{(i)}_{t+1}$ at time $t+1$, using only the value of $x^{(i)}_t$, the sample immediately preceding $t+1$. For each sample $x^{(i)}_t$ in the retrospective dataset, we pair it with a corresponding label $y^{(i)}_{t+1}$, which represents the outcome at the next time step. The prediction model is built using a binary classifier $f(\cdot)$, which maps each $x^{(i)}_t$ to a binary outcome as

$$\hat{y}^{(i)}_{t+1} = f\big(x^{(i)}_t\big), \qquad (2)$$

where the training set consists of pairs $\{(x^{(i)}_t, y^{(i)}_{t+1})\}$ and each $x^{(i)}_t$ serves as a feature to predict the binary outcome $y^{(i)}_{t+1}$. The classifier is trained to minimize the prediction error by adjusting its parameters to best capture the relationship between the single time series sample and the next time step's binary fall label. Various machine learning models such as k-nearest neighbors (KNN) [18], support vector machine (SVM) [19], random forest (RF) [20], and extreme gradient boosting (XGB) [21] are evaluated for this aim in the experiments section.
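The sketch below, using scikit-learn on toy $(x_t, y_{t+1})$ pairs, shows how such single-sample classifiers could be fitted and queried; the data and hyperparameters are illustrative assumptions rather than the study's actual configuration, and XGB is omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical training pairs: each row is one HDS value x_t, each label is the
# fall outcome recorded at the following timestamp t+1.
X_train = np.array([[6.0], [9.0], [14.0], [21.0], [8.0], [17.0]])
y_train = np.array([0, 0, 0, 1, 0, 1])

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=1),
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                     # learn f: x_t -> y_{t+1}, as in Eq. (2)
    print(name, clf.predict(np.array([[19.0]])))  # one-step ahead prediction
```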
II-B Sequence-to-Point Fall Prediction
Sequence-to-point fall prediction, which utilizes all preceding samples in a time series to predict a fall event, holds significant importance in clinical settings. Unlike traditional threshold-based methods that rely on single, instantaneous values, sequence-to-point prediction leverages the entire sequence of data leading up to the moment for which the fall event is predicted. This approach captures temporal patterns and trends that may be missed when using isolated samples.
II-B1 Recurrent Neural Networks
RNNs are a popular method for modeling sequential dependencies within a time series, making them suitable for tasks where the prediction at the final time step depends on the prior inputs. These networks leverage their hidden state to capture the temporal dynamics of sequential inputs, enabling the model to predict the risk of a fall at the final time step.
In order to model the sequence-to-point binary classification task for inpatient fall risk prediction using RNNs [15], let $x_t$ represent the HDS at time $t$ for an individual, dropping the index $i$ without loss of generality. The RNN processes the time series up to time step $t$ to predict whether a fall occurs at time $t+1$. The hidden state at each time step is computed as

$$h_t = \phi\big(W_{xh} x_t + W_{hh} h_{t-1} + b_h\big), \qquad (3)$$

where $\phi(\cdot)$ is the activation function, $h_{t-1}$ is the hidden state at time step $t-1$, $W_{xh}$ is the input weight matrix, $W_{hh}$ is the recurrent weight matrix, and $b_h$ is the bias vector. The hidden state $h_t$ at time step $t$ captures the information of the HDSs up to that point.

At time step $t$, the hidden state $h_t$ is used to predict the occurrence of a fall at time step $t+1$. The hidden state is passed through a fully connected layer followed by a Softmax activation to produce the output logits as

$$\mathbf{z} = W_{hy} h_t + b_y, \qquad (4)$$

where $W_{hy}$ is the output weight matrix and $b_y$ is the bias term. The Softmax activation function is applied to the logits to obtain the probability distribution over the two classes (fall or no fall) as

$$\hat{p}_k = \frac{e^{z_k}}{\sum_{j=1}^{2} e^{z_j}}, \quad k \in \{1, 2\}, \qquad (5)$$

where $z_k$ is the logit corresponding to class $k$ (fall or no fall), and the predicted outcome is

$$\hat{y}_{t+1} = \arg\max_{k} \, \hat{p}_k. \qquad (6)$$

For simplicity, denote by $\hat{y}^{(i)}$ the predicted probability of the fall outcome class for individual $i$. The network is trained using backpropagation and the cross-entropy loss function

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y^{(i)}\log \hat{y}^{(i)} + \big(1-y^{(i)}\big)\log\big(1-\hat{y}^{(i)}\big)\Big], \qquad (7)$$

where $y^{(i)}$ is the true label and $\hat{y}^{(i)}$ is the predicted probability of the fall outcome class of individual $i$.
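A compact PyTorch sketch of this sequence-to-point formulation is given below; the architecture sizes, learning rate, and toy trajectories are assumptions for illustration, and the softmax of Eq. (5) is folded into PyTorch's cross-entropy loss rather than applied explicitly.

```python
import torch
import torch.nn as nn

class RNNFallClassifier(nn.Module):
    """Read HDS samples x_1..x_t and emit logits for fall vs. no fall at t+1."""

    def __init__(self, hidden_size=32):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 2)      # Eq. (4): logits from the last hidden state

    def forward(self, x):                        # x: (batch, seq_len, 1)
        _, h_last = self.rnn(x)                  # Eq. (3) applied across the sequence
        return self.fc(h_last.squeeze(0))        # logits over {no fall, fall}

# Toy batch of two hypothetical HDS trajectories and their next-step outcomes.
x = torch.tensor([[[6.0], [8.0], [12.0], [15.0]],
                  [[5.0], [5.0], [6.0], [7.0]]])
y = torch.tensor([1, 0])

model = RNNFallClassifier()
criterion = nn.CrossEntropyLoss()                # cross-entropy loss of Eq. (7)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(5):                               # a few illustrative gradient steps
    optimizer.zero_grad()
    loss = criterion(model(x), y)                # softmax of Eq. (5) is folded into the loss
    loss.backward()
    optimizer.step()
print(loss.item())
```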
II-B2 Long Short-Term Memory Networks
To address the issue of vanishing gradients commonly faced by standard RNNs, we implemented an LSTM network, which introduces gates to control information flow and maintain long-range dependencies across time steps [15]. It maintains an internal memory state $c_t$ along with the hidden state $h_t$. At each time step $t$, the LSTM computes the input gate as

$$i_t = \sigma\big(W_{xi} x_t + W_{hi} h_{t-1} + b_i\big), \qquad (8)$$

where $W_{xi}$ is the weight matrix from the input layer to the input gate, $W_{hi}$ is the weight matrix from the hidden state to the input gate, and $b_i$ is the bias of the input gate. The forget gate $f_t$ is defined as

$$f_t = \sigma\big(W_{xf} x_t + W_{hf} h_{t-1} + b_f\big), \qquad (9)$$

where $W_{xf}$ is the weight matrix from the input layer to the forget gate, $W_{hf}$ is the weight matrix from the hidden state to the forget gate, and $b_f$ is the bias of the forget gate. The cell gate $g_t$ is computed as

$$g_t = \tanh\big(W_{xg} x_t + W_{hg} h_{t-1} + b_g\big), \qquad (10)$$

where $W_{xg}$ is the weight matrix from the input layer to the cell gate, $W_{hg}$ is the weight matrix from the hidden state to the cell gate, and $b_g$ is the bias of the cell gate. The output gate $o_t$ is

$$o_t = \sigma\big(W_{xo} x_t + W_{ho} h_{t-1} + b_o\big), \qquad (11)$$

where $W_{xo}$ is the weight matrix from the input layer to the output gate, $W_{ho}$ is the weight matrix from the hidden state to the output gate, and $b_o$ is the bias of the output gate. The memory state and hidden state are then updated as

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t, \qquad (12)$$

$$h_t = o_t \odot \tanh(c_t), \qquad (13)$$

where $\sigma(\cdot)$ is the sigmoid function and $\odot$ denotes element-wise multiplication.
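For readers who want to see the gating arithmetic of Eqs. (8)-(13) directly, the following sketch writes out a single LSTM step in PyTorch with illustrative weight shapes; in practice a library implementation such as torch.nn.LSTM would typically be used instead of this hand-rolled cell.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step written out gate by gate, mirroring Eqs. (8)-(13)."""
    i_t = torch.sigmoid(x_t @ p["xi"] + h_prev @ p["hi"] + p["bi"])   # input gate, Eq. (8)
    f_t = torch.sigmoid(x_t @ p["xf"] + h_prev @ p["hf"] + p["bf"])   # forget gate, Eq. (9)
    g_t = torch.tanh(x_t @ p["xg"] + h_prev @ p["hg"] + p["bg"])      # cell gate, Eq. (10)
    o_t = torch.sigmoid(x_t @ p["xo"] + h_prev @ p["ho"] + p["bo"])   # output gate, Eq. (11)
    c_t = f_t * c_prev + i_t * g_t                                    # memory update, Eq. (12)
    h_t = o_t * torch.tanh(c_t)                                       # hidden state, Eq. (13)
    return h_t, c_t

H = 4  # illustrative hidden size; the HDS input is a single scalar per time step
p = {k: 0.1 * torch.randn(1 if k[0] == "x" else H, H)
     for k in ["xi", "hi", "xf", "hf", "xg", "hg", "xo", "ho"]}
p.update({k: torch.zeros(H) for k in ["bi", "bf", "bg", "bo"]})

h, c = lstm_step(torch.tensor([[12.0]]), torch.zeros(1, H), torch.zeros(1, H), p)
print(h.shape, c.shape)  # both (1, H)
```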
II-B3 Gated Recurrent Unit
The GRU is a simplified variant of the LSTM that reduces the number of gates while retaining the ability to manage long-range dependencies [15]. GRUs simplify the gating mechanism by combining the forget and input gates into a single update gate. At each time step $t$, the GRU computes an update gate as

$$z_t = \sigma\big(W_{xz} x_t + W_{hz} h_{t-1} + b_z\big), \qquad (14)$$

the reset gate as

$$r_t = \sigma\big(W_{xr} x_t + W_{hr} h_{t-1} + b_r\big), \qquad (15)$$

and the candidate hidden state as

$$\tilde{h}_t = \tanh\big(W_{xh} x_t + W_{hh}\,(r_t \odot h_{t-1}) + b_h\big). \qquad (16)$$

The hidden state is then updated as

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t. \qquad (17)$$

At time step $t$, the hidden state $h_t$ is used to predict the fall event at timestamp $t+1$, similar to Eqs. (4) and (7) in training the RNN.
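In code, swapping the GRU into the same sequence-to-point head used for the RNN sketch above is a small change; the snippet below (with hypothetical sizes again) relies on torch.nn.GRU, which applies the update and reset gating of Eqs. (14)-(17) internally.

```python
import torch
import torch.nn as nn

class GRUFallClassifier(nn.Module):
    """Same sequence-to-point head as the earlier RNN sketch, with the recurrence replaced by a GRU."""

    def __init__(self, hidden_size=32):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 2)

    def forward(self, x):             # x: (batch, seq_len, 1) of HDS values
        _, h_last = self.gru(x)       # Eqs. (14)-(17) are computed inside nn.GRU
        return self.fc(h_last.squeeze(0))

logits = GRUFallClassifier()(torch.rand(2, 5, 1))  # toy batch of two 5-step trajectories
print(logits.shape)                                # (2, 2): no-fall / fall logits
```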
III Experiments
III-A Data
Our Institutional Review Board approved the study protocol. The dataset consisted of hospitalized patients, including who experienced a fall (median age ; male) and who did not (median age ; male). Retrospective data was collected from consecutive patients admitted between January 1, 2018, and May 23, 2023, for various medical and surgical conditions across 4 academic and 13 community hospitals in Arizona, Florida, Minnesota, and Wisconsin in the United States. Patients were identified using electronic medical records. Adults aged 18 years and older who had been hospitalized for at least one day were included in the study, while those admitted to critical care units, hospice, or psychiatric units were excluded.
Table I: One-step ahead fall prediction performance (Avg. ± Std.)

| Model | Accuracy | F1 Score | Specificity | Sensitivity | PPV | AUC |
|---|---|---|---|---|---|---|
| HDS 7 | 0.57±0.01 | 0.57±0.01 | 0.52±0.01 | 0.62±0.01 | 0.54±0.01 | 0.57±0.01 |
| HDS 20 | 0.60±0.01 | 0.56±0.00 | 0.92±0.00 | 0.29±0.01 | 0.65±0.01 | 0.60±0.01 |
| KNN | 0.52±0.00 | 0.39±0.01 | 0.99±0.01 | 0.05±0.01 | 0.05±0.01 | 0.54±0.01 |
| SVM | 0.63±0.01 | 0.62±0.01 | 0.82±0.01 | 0.44±0.03 | 0.57±0.01 | 0.66±0.01 |
| RF | 0.63±0.01 | 0.62±0.01 | 0.80±0.01 | 0.46±0.01 | 0.62±0.01 | 0.70±0.01 |
| XGB | 0.63±0.01 | 0.62±0.01 | 0.81±0.01 | 0.46±0.01 | 0.62±0.01 | 0.70±0.01 |
Table II: Sequence-to-point fall prediction performance (Avg. ± Std.)

| Model | Accuracy | F1 Score | Specificity | Sensitivity | PPV | AUC |
|---|---|---|---|---|---|---|
| RNN | 0.69±0.13 | 0.64±0.18 | 0.69±0.24 | 0.69±0.37 | 0.69±0.19 | 0.66±0.12 |
| LSTM | 0.70±0.12 | 0.66±0.18 | 0.64±0.22 | 0.44±0.27 | 0.76±0.12 | 0.70±0.10 |
| GRU | 0.74±0.20 | 0.67±0.28 | 0.94±0.07 | 0.53±0.44 | 0.53±0.18 | 0.77±0.09 |
III-B Evaluation Setup
A 10-fold cross-validation was performed, with the average (Avg.) and standard deviation (Std.) of each performance metric recorded. In each independent run, the models were trained from scratch on a randomly selected training dataset and evaluated on a randomly selected balanced test dataset. For each cross-validation fold, a balanced test set was created by randomly selecting 10% of the data from the fall event class and 10% from the no fall event class. This left the remaining dataset imbalanced. To address this, a balanced training dataset was constructed for each fold by including all remaining encounters from the fall event class and randomly selecting an equal number of encounters from the no fall event class. The combined samples were shuffled prior to each training iteration.
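The following sketch shows one way the per-fold balanced split described above could be constructed from lists of encounter identifiers; the sampling-size convention (taking a fall-class-sized 10% slice from each class so that the test set stays balanced) is our interpretation of the setup, and all names and counts are placeholders.

```python
import numpy as np

def balanced_fold(fall_ids, no_fall_ids, test_frac=0.10, seed=0):
    """Build one fold: balanced test set, then balanced training set from the remainder."""
    rng = np.random.default_rng(seed)
    fall_ids = rng.permutation(fall_ids)
    no_fall_ids = rng.permutation(no_fall_ids)

    n_test = int(test_frac * len(fall_ids))          # 10% sized by the (smaller) fall class
    test = np.concatenate([fall_ids[:n_test], no_fall_ids[:n_test]])

    train_fall = fall_ids[n_test:]                                 # all remaining fall encounters
    train_no_fall = no_fall_ids[n_test:n_test + len(train_fall)]   # equal-sized random sample
    train = rng.permutation(np.concatenate([train_fall, train_no_fall]))
    return train, test

train_ids, test_ids = balanced_fold(np.arange(100), np.arange(100, 1100))
print(len(train_ids), len(test_ids))  # 180 balanced training, 20 balanced test encounters
```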
The machine learning models were evaluated using several metrics. Accuracy is defined as

$$\text{Accuracy} = \frac{TP + TN}{P + N}, \qquad (18)$$

where $TP$ is the number of true positives, $TN$ is the number of true negatives, $P$ is the number of true fall encounters, and $N$ is the number of true encounters without a fall. With a balanced test dataset, accuracy equals balanced accuracy. The F1 Score is given by

$$\text{F1} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad (19)$$

where $FP$ represents false positives (encounters incorrectly predicted as fall events) and $FN$ denotes false negatives (encounters incorrectly predicted as no fall). Specificity, or true negative rate, is defined as

$$\text{Specificity} = \frac{TN}{TN + FP}, \qquad (20)$$

and sensitivity, or true positive rate, is calculated as

$$\text{Sensitivity} = \frac{TP}{TP + FN}. \qquad (21)$$

The Positive Predictive Value (PPV) is defined as the proportion of $TP$s out of the total number of positive results, calculated as

$$\text{PPV} = \frac{TP}{TP + FP}. \qquad (22)$$
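These five metrics reduce to simple functions of the confusion-matrix counts; a small helper, with illustrative counts chosen freely rather than taken from the study, might look like:

```python
def classification_metrics(tp, tn, fp, fn):
    """Eqs. (18)-(22) from confusion-matrix counts."""
    p, n = tp + fn, tn + fp                      # true fall / true no-fall encounter counts
    return {
        "accuracy": (tp + tn) / (p + n),         # Eq. (18)
        "f1": 2 * tp / (2 * tp + fp + fn),       # Eq. (19)
        "specificity": tn / (tn + fp),           # Eq. (20)
        "sensitivity": tp / (tp + fn),           # Eq. (21)
        "ppv": tp / (tp + fp),                   # Eq. (22)
    }

print(classification_metrics(tp=46, tn=80, fp=20, fn=54))  # illustrative counts only
```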
III-C Training Setup
Hyperparameter tuning was conducted via random search, using a held-out portion of the training data as the validation dataset, kept separate from the test dataset. All models were implemented in Python and PyTorch [22] and trained on two NVIDIA A6000 GPUs.
The SVM [23] model was built with a radial basis function (RBF) kernel, with the regularization parameter selected by grid search. The KNN model was evaluated for various numbers of nearest neighbors, and the 1-nearest-neighbor configuration was selected. The XGB model was trained with tuned numbers of estimators and parallel trees and a tuned regularization coefficient. The number of trees in the RF was selected by grid search, and the maximum depth of the trees was limited to prevent overfitting.
Hyperparameter tuning for the RNNs, LSTMs, and GRUs involved selecting optimal values for each parameter to maximize model performance and efficiency. The number of hidden units was searched with a single recurrent layer. An exponentially decaying learning rate with the Adam optimizer was used, and early stopping was applied during training. Dropout [24] was applied for regularization. In the LSTMs, initializing the forget gate bias appropriately helped the model retain long-term dependencies, and in the GRUs, adjusting the update gate bias similarly enhanced performance. The rectified linear unit (ReLU) [25] activation function was used due to its non-linearity and faster convergence.
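The optimization choices above (Adam with an exponentially decaying learning rate, dropout, and early stopping on a validation set) can be wired together as in the sketch below; the specific rates, decay factor, patience, and placeholder model are assumptions, since the tuned values are not reproduced here.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the recurrent classifiers; the dropout rate is illustrative.
model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Dropout(p=0.2), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)                  # assumed starting rate
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)  # exponential decay

best_val, patience, bad_epochs = float("inf"), 1, 0
for epoch in range(100):
    # ... one pass over the balanced training set would go here ...
    val_loss = float(torch.rand(1))        # stand-in for the real validation loss
    scheduler.step()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs > patience:          # early stopping once validation stops improving
            break
```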
III-D Performance Results Analysis
Table I presents the performance results for various models in one-step ahead fall prediction, with metrics normalized to a scale of 1 and averaged over 10-fold cross-validation. The models evaluated include two threshold-based methods, HDS 7 and HDS 20, and several machine learning algorithms including KNN, SVM, RF, and XGB. The HDS 7 threshold-based method demonstrates moderate and consistent performance across all metrics, while HDS 20 achieves high specificity at 0.92 but low sensitivity at 0.29, indicating its strength in identifying non-fall events at the expense of detecting actual falls. KNN exhibits the highest specificity at 0.99 but suffers from extremely low sensitivity at 0.05, highlighting its poor performance in fall detection. In contrast, the machine learning models SVM, RF, and XGB show comparable performance, with accuracy around 0.63, balanced F1 scores, and area under the curve (AUC) values ranging from 0.66 to 0.70. These results indicate that these models are more effective in balancing sensitivity and specificity compared to the threshold-based methods, with RF and XGB providing the best overall discriminative power, as evidenced by their higher AUC scores at 0.70.
Table II presents the performance results for various models in sequence-to-point fall prediction. Among the LSTM, GRU, and RNN models, distinct differences in effectiveness are observed. The GRU model achieves the highest accuracy at 0.74, demonstrating its strong capability to classify instances correctly. It also shows a commendable F1 score of 0.67, reflecting a balance between precision and recall, alongside impressive specificity at 0.94 and moderate sensitivity at 0.53. Conversely, the RNN model, with a slightly lower accuracy of 0.69, demonstrates higher sensitivity at 0.69, suggesting better performance in identifying positive instances. The LSTM model exhibits competitive performance with an accuracy of 0.70 and a favorable F1 score of 0.66, though its specificity and sensitivity indicate a trade-off in accurately identifying true negatives and positives.
The superior performance of the GRU compared to the RNN and LSTM can be attributed to its streamlined architecture, which employs fewer parameters while effectively capturing long-range dependencies. By combining the forget and input gates into a single update gate, the GRU simplifies the model, enhancing its learning efficiency. This design helps mitigate the vanishing gradient problem that often plagues traditional RNNs, allowing the GRU to converge faster during training. Additionally, the GRU typically requires less computational resources, making it an attractive option in scenarios where model efficiency is critical.

Figure 2 displays the receiver operating characteristic (ROC) curves of the models. Overall, the GRU model stands out for its accuracy and specificity, making it a preferable choice for applications prioritizing precision. However, if maximizing the identification of positive cases is the primary goal, the RNN model may be more suitable due to its higher sensitivity. Therefore, the selection of the model should be guided by the specific requirements of the application, whether focusing on maximizing correct classifications or optimizing for sensitivity.
IV Conclusion
In conclusion, effective fall risk assessment is crucial for enhancing patient safety in healthcare settings, particularly for hospitalized individuals. Traditional methods, such as the threshold-based HDS, provide a structured approach to evaluating fall risk but often fall short in capturing the dynamic nature of patient conditions. This study highlights the limitations of threshold-based models, which may overlook subtle changes in risk factors over time. In contrast, machine learning approaches, including one-step ahead and sequence-to-point fall prediction methods, offer a more sophisticated framework for predicting fall risk by analyzing temporal patterns and interactions among clinical variables. The comparative analysis demonstrates that machine learning models, particularly the GRU, outperformed traditional methods and provided a more balanced sensitivity and specificity. By utilizing advanced algorithms, healthcare providers can achieve more accurate predictions, leading to timely interventions that can significantly reduce the incidence of falls and associated complications. Future research should continue to explore and refine these machine learning techniques to further enhance fall risk assessment strategies in clinical practice.
References
- [1] Karen L Perell, Audrey Nelson, Ronald L Goldman, Stephen L Luther, Nicole Prieto-Lewis, and Laurence Z Rubenstein. Fall risk assessment measures: an analytic review. The Journals of Gerontology Series A: Biological Sciences and Medical Sciences, 56(12):M761–M766, 2001.
- [2] Gideon Moseti Nyakundi. Use of the Hester Davis Falls Risk Assessment Scale in Medical-Surgical Patients. PhD thesis, Walden University, 2022.
- [3] Amelia Payne. Impact of the Hester Davis Fall Risk Scale on Inpatient Falls. University of Missouri-Saint Louis, 2020.
- [4] Veronica Strini, Roberta Schiavolin, and Angela Prendin. Fall risk assessment scales: A systematic literature review. Nursing Reports, 11(2):430–443, 2021.
- [5] Amy L Hester and Dees M Davis. Validation of the hester davis scale for fall risk assessment in a neurosciences population. Journal of Neuroscience Nursing, 45(5):298–305, 2013.
- [6] Hojjat Salehinejad, Anne M. Meehan, Parvez A. Rahman, Marcia A. Core, Bijan J. Borah, and Pedro J. Caraballo. Novel machine learning model to improve performance of an early warning system in hospitalized patients: a retrospective multisite cross-validation study. eClinicalMedicine, 66:102312, 2023.
- [7] Hojjat Salehinejad, Anne M. Meehan, Pedro J. Caraballo, and Bijan J. Borah. Contrastive transfer learning for prediction of adverse events in hospitalized patients. IEEE Journal of Translational Engineering in Health and Medicine, 12:215–224, 2024.
- [8] Navid Hasanzadeh, Shahrokh Valaee, and Hojjat Salehinejad. Hypertension detection from high-dimensional representation of photoplethysmogram signals. In 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), pages 1–4, 2023.
- [9] Hojjat Salehinejad and Shahrokh Valaee. Litehar: Lightweight human activity recognition from wifi signals with random convolution kernels. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4068–4072, 2022.
- [10] Hojjat Salehinejad, Radomir Djogo, Navid Hasanzadeh, and Shahrokh Valaee. Smctl: Subcarrier masking contrastive transfer learning for human gesture recognition with passive wi-fi sensing. In 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG), pages 1–5, 2024.
- [11] Edward H Lee, Jimmy Zheng, Errol Colak, Maryam Mohammadzadeh, Golnaz Houshmand, Nicholas Bevins, Felipe Kitamura, Emre Altinmakas, Eduardo Pontes Reis, Jae-Kwang Kim, et al. Deep covid detect: an international experience on covid-19 lung detection and prognosis using chest ct. NPJ digital medicine, 4(1):11, 2021.
- [12] Hojjat Salehinejad, Edward Ho, Hui-Ming Lin, Priscila Crivellaro, Oleksandra Samorodova, Monica Tafur Arciniegas, Zamir Merali, Suradech Suthiphosuwan, Aditya Bharatha, Kristen Yeom, et al. Deep sequential learning for cervical spine fracture detection on computed tomography imaging. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), pages 1911–1914. IEEE, 2021.
- [13] Julian Theis, William L Galanter, Andrew D Boyd, and Houshang Darabi. Improving the in-hospital mortality prediction of diabetes icu patients using a process mining/deep learning architecture. IEEE Journal of Biomedical and Health Informatics, 26(1):388–399, 2021.
- [14] Simon Haykin. Recurrent neural networks for. Digital Signal Processing Systems: Implementation Techniques: Advances in Theory and Applications, page 89, 1995.
- [15] Hojjat Salehinejad, Sharan Sankar, Joseph Barfett, Errol Colak, and Shahrokh Valaee. Recent advances in recurrent neural networks. arXiv preprint arXiv:1801.01078, 2017.
- [16] Martin Sundermeyer, Ralf Schlüter, and Hermann Ney. Lstm neural networks for language modeling. In Interspeech, volume 2012, pages 194–197, 2012.
- [17] Rahul Dey and Fathi M Salem. Gate-variants of gated recurrent unit (gru) neural networks. In 2017 IEEE 60th international midwest symposium on circuits and systems (MWSCAS), pages 1597–1600. IEEE, 2017.
- [18] Jorma Laaksonen and Erkki Oja. Classification with learning k-nearest neighbors. In Proceedings of international conference on neural networks (ICNN’96), volume 3, pages 1480–1483. IEEE, 1996.
- [19] Alex J Smola and Bernhard Schölkopf. A tutorial on support vector regression. Statistics and computing, 14(3):199–222, 2004.
- [20] Leo Breiman. Random forests. Machine learning, 45:5–32, 2001.
- [21] Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785–794, 2016.
- [22] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
- [23] Marti A. Hearst, Susan T Dumais, Edgar Osuna, John Platt, and Bernhard Scholkopf. Support vector machines. IEEE Intelligent Systems and their applications, 13(4):18–28, 1998.
- [24] A. Labach, H. Salehinejad, and S. Valaee. Survey of dropout methods for deep neural networks. arXiv preprint arXiv:1904.13310, 2019.
- [25] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Shun Chen. Lstm fully convolutional networks for time series classification. IEEE access, 6:1662–1669, 2017.