DeepSOFA: A Continuous Acuity Score for Critically Ill Patients using Clinically Interpretable Deep
Traditional methods for assessing illness severity and predicting in-hospital mortality among critically ill patients require time-consuming, error-prone calculations using static variable thresholds. These methods do not capitalize on the emerging availability of streaming electronic health record data or capture time-sensitive individual physiological patterns, a critical task in the intensive care unit. We propose a novel acuity score framework (DeepSOFA) that leverages temporal measurements and interpretable deep learning models to assess illness severity at any point during an ICU stay. We compare DeepSOFA with SOFA (Sequential Organ Failure Assessment) baseline models using the same model inputs and find that at any point during an ICU admission, DeepSOFA yields significantly more accurate predictions of in-hospital mortality. A DeepSOFA model developed in a public database and validated in a single institutional cohort had a mean AUC for the entire ICU stay of 0.90 (95% CI 0.90–0.91) compared with baseline SOFA models with mean AUC 0.79 (95% CI 0.79–0.80) and 0.85 (95% CI 0.85–0.86). Deep models are well-suited to identify ICU patients in need of life-saving interventions prior to the occurrence of an unexpected adverse event and inform shared decision-making processes among patients, providers, and families regarding goals of care and optimal resource utilization.
Critically ill patients in the intensive care unit (ICU) have a life-threatening condition or the propensity to develop one at any moment. Early recognition of evolving illness severity in the ICU is invaluable. Timely and accurate illness severity assessments may identify patients in need of life-saving interventions prior to the occurrence of an unexpected adverse event and may inform shared decision-making processes among patients, providers, and families regarding goals of care and optimal resource utilization.
One of the most commonly used tools for assessing ICU patient acuity is the Sequential Organ Failure Assessment (SOFA) score [1]. SOFA considers 13 variables representing six different organ systems (cardiovascular, respiratory, nervous, liver, coagulation, and renal) and uses their worst measurements over a given interval (typically 24 hours) in conjunction with static value thresholds to assign numerical scores for each component. The sum of these component scores yields the overall SOFA score, which can be used to assess illness severity and predict mortality [2,3,4]. Although SOFA provides a reasonably accurate assessment of a patient’s overall condition and mortality risk, its accuracy is hindered by fixed cutoff points for each component score, and SOFA variables are often infrequent or missing in electronic health records. In particular, Glasgow Coma Scale scores and measurements of serum bilirubin and partial pressure of arterial oxygen are often sparse. Badawi et al. [5] performed retrospective hourly recalculations of several acuity scores for ICU patients at 208 hospitals in the Philips eICU Research Institute database, reporting that hourly SOFA scores predicted ICU mortality with mean area under the receiver operating characteristic curve (AUC) of 0.86. Although hourly acuity score calculations may provide advantages despite the potentially confounding impact of transient and self-limited fluctuations in real-time data [6], they are only feasible if implemented as an autonomous real-time process.
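The scoring rule described above (worst value in a window, mapped through fixed cutoffs, then summed) can be sketched for two of the six components. This is an illustrative sketch only, not the authors' implementation: the function names are hypothetical, the cutoffs are the commonly published SOFA thresholds for platelets and bilirubin and should be checked against the original SOFA reference, and treating a component with no measurements as normal (score 0) is an assumption made here for simplicity.

```python
from typing import Sequence


def coagulation_score(platelets: float) -> int:
    """Coagulation component; platelets in 10^3/uL (lower is worse)."""
    if platelets < 20:
        return 4
    if platelets < 50:
        return 3
    if platelets < 100:
        return 2
    if platelets < 150:
        return 1
    return 0


def liver_score(bilirubin: float) -> int:
    """Liver component; bilirubin in mg/dL (higher is worse).

    Handling of a value of exactly 12.0 is an assumption of this sketch.
    """
    if bilirubin >= 12.0:
        return 4
    if bilirubin >= 6.0:
        return 3
    if bilirubin >= 2.0:
        return 2
    if bilirubin >= 1.2:
        return 1
    return 0


def partial_sofa(platelets_window: Sequence[float],
                 bilirubin_window: Sequence[float]) -> int:
    """Sum of two component scores from the worst value in each window.

    An empty window is scored as 0 (assumed normal); this is an
    assumption of the sketch, not something stated in the paper.
    """
    coag = coagulation_score(min(platelets_window)) if platelets_window else 0
    liver = liver_score(max(bilirubin_window)) if bilirubin_window else 0
    return coag + liver


# Worst platelets = 95 (score 2), worst bilirubin = 2.4 (score 2).
total = partial_sofa([180, 95, 120], [0.8, 2.4])
```

The fixed cutoffs are visible in the step behavior: under this sketch, a platelet count of 99 scores 2 while a count of 101 scores 1, although the two measurements are nearly identical.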
The availability of temporal trends and high-fidelity physiologic measurements in the ICU offers the opportunity to apply computational approaches beyond existing conventional models [7,8,9]. Our primary aim was to develop an acuity score framework that encompasses the full scope of a patient’s physiologic measurements over time to generate dynamic in-hospital mortality predictions. Our solution uses deep learning, a branch of machine learning that encompasses models and architectures that learn optimal features from the data itself, capturing increasingly complex representations of raw data by combining layers of nonlinear data transformations [10,11]. Deep learning models automatically discover latent patterns and form high-level representations from large amounts of raw data without the need for manual feature extraction based on a priori domain knowledge or practitioner intuition, which is time-consuming and error-prone. Deep learning has revolutionized natural language processing, speech recognition, and computer vision, and is gaining momentum within healthcare [12]. Computer vision has been used to identify diabetic retinopathy [13] and recognize skin cancer with accuracy similar to that of a board-certified dermatologist [14]. Deep models have also been used to predict pain responses [15], the onset of heart failure [16], and ICU mortality [17].
Here we report the development and external validation of DeepSOFA, a deep learning model that employs a clinician-interpretable variant of recurrent neural network (RNN) to analyze multivariate temporal clinical data in the ICU. Experiments were performed with two independent hospital populations and were designed to be cross-institutional; we report internally and externally validated results for both hospital cohorts. Cohorts were derived from ICU admissions at the University of Florida Health Hospital and from the publicly available Medical Information Mart for Intensive Care (MIMIC-III) dataset, which contains records for ICU patients from the Beth Israel Deaconess Medical Center in Boston, Massachusetts [18]. We compared deep learning mortality prediction models trained on hourly measurements with baseline models that apply traditional SOFA score definitions to the same hourly measurements, in both cases using the entirety of a patient’s data stream over the same time period. Two baseline SOFA models were tested: a Bedside SOFA model using published mortality rates correlating with any given total SOFA score [2], and a Traditional SOFA model in which hourly SOFA scores are correlated with in-hospital mortality for individual patients [5]. Because deep models automatically learn the complex, nonlinear associations among input variables, we hypothesized that DeepSOFA would yield greater accuracy in predicting in-hospital mortality among ICU patients compared with traditional SOFA techniques.
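Accuracy in these comparisons is reported as AUC. As a reminder of what that number measures, the following minimal sketch computes AUC directly from its pairwise definition: the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name and the toy data are hypothetical, and this is not the evaluation code used in the study.

```python
from typing import Sequence


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the pairwise (Mann-Whitney) definition.

    labels: 1 for in-hospital death, 0 for survival.
    scores: predicted mortality risk, higher means higher risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: three of the four (death, survivor) pairs are ordered correctly.
toy_auc = pairwise_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.5])
```

The double loop is quadratic in cohort size, so it serves only to make the definition concrete; a rank-based computation would be the practical choice for cohorts of this scale.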
Development of DeepSOFA model
Two datasets, UFHealth and MIMIC, were derived from distinct cohorts of ICU patients at two academic medical centers, University of Florida Health (Gainesville, FL) and Beth Israel Deaconess Medical Center (Boston, MA), respectively, and were used for model development and external cross-validation (Table 1). The UFHealth cohort included 36,216 ICU admissions for 27,660 patients, and the MIMIC cohort included 48,948 ICU admissions for 35,993 patients. To ensure that results would be generalizable to all patients entering an ICU at any phase of a hospital admission, all ICU admissions and readmissions for all patients were analyzed. The cohorts were comparable in terms of patient characteristics and outcomes. Compared with the UFHealth cohort, the MIMIC cohort had slightly shorter ICU stays (median 2.1 days, 25th–75th percentiles 1.2–4.1, vs. 2.9 days, 25th–75th percentiles 1.5–5.9), a shorter time between hospital and ICU admission (median 0.1 hours, 25th–75th percentiles 0.0–24.6, vs. 7.0 hours, 25th–75th percentiles 1.8–21.9), a longer time between ICU and hospital discharge (median 73.1 hours, 25th–75th percentiles 27.0–143.0, vs. 48.4 hours, 25th–75th percentiles 0.0–122.6), and a greater proportion of ICU stays requiring mechanical ventilation (47.8% vs. 30.4%). The MIMIC cohort also included a greater proportion of Medical ICU admissions (38.9% vs. 25.1%) and Cardiac ICU admissions (32.4% vs. 18.3%), with fewer Surgical ICU admissions (28.7% vs. 33.0%). The DeepSOFA model was trained and internally validated with 5-fold cross-validation in each cohort separately.
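Because both cohorts contain more admissions than patients, a 5-fold split has to decide how readmissions are handled. The sketch below shows one possible approach, assigning folds at the patient level so that all admissions of a patient fall in the same fold. This grouping is an assumption made here for illustration; the text does not state how the folds were constructed, and the function name and identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def patient_level_folds(admissions: Sequence[Tuple[str, str]],
                        k: int = 5,
                        seed: int = 0) -> List[List[str]]:
    """Split admissions into k folds, keeping each patient in one fold.

    admissions: (admission_id, patient_id) pairs.
    Returns a list of k lists of admission ids.
    """
    # Shuffle unique patients reproducibly, then deal them out round-robin.
    patients = sorted({pid for _, pid in admissions})
    random.Random(seed).shuffle(patients)
    fold_of = {pid: i % k for i, pid in enumerate(patients)}

    folds: List[List[str]] = [[] for _ in range(k)]
    for adm_id, pid in admissions:
        folds[fold_of[pid]].append(adm_id)
    return folds


# Toy cohort: 10 patients with 2 admissions each.
toy = [(f"a{i}", f"p{i // 2}") for i in range(20)]
toy_folds = patient_level_folds(toy)
```

In each round of cross-validation, one fold would serve as the held-out validation set and the remaining four as training data.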
Learn more at: https://www.nature.com/articles/s41598-019-38491-0