Database Credentialed Access
SCRIPT CarpeDiem Dataset: demographics, outcomes, and per-day clinical parameters for critically ill patients with suspected pneumonia
Nikolay Markov, Catherine A Gao, Thomas Stoeger, Anna Pawlowski, Mengjia Kang, Prasanth Nannapaneni, Rogan Grant, Luke Rasmussen, Daniel Schneider, Justin Starren, Richard Wunderink, GR Scott Budinger, Alexander Misharin, Benjamin Singer, NU SCRIPT Study Investigators
Published: March 13, 2023. Version: 1.1.0
When using this resource, please cite:
Markov, N., Gao, C. A., Stoeger, T., Pawlowski, A., Kang, M., Nannapaneni, P., Grant, R., Rasmussen, L., Schneider, D., Starren, J., Wunderink, R., Budinger, G. S., Misharin, A., Singer, B., & Study Investigators, N. S. (2023). SCRIPT CarpeDiem Dataset: demographics, outcomes, and per-day clinical parameters for critically ill patients with suspected pneumonia (version 1.1.0). PhysioNet. https://doi.org/10.13026/5phr-4r89.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Traditional approaches to analyzing episodes of patient care in the ICU examine features on presentation and outcomes on discharge, collapsing the numerous events that happen during a patient’s stay and ignoring intercurrent ICU complications or interventions. With CarpeDiem, we aimed to examine clinical features on a per-day basis as done during the common practice of daily physician rounds. The Successful Clinical Response in Pneumonia Therapy (SCRIPT) CarpeDiem Dataset features 12,495 patient-ICU-days from 585 patients enrolled in the SCRIPT study between June 2018 and March 2022, all of whom were critically ill patients on mechanical ventilation suspected of having pneumonia who underwent a bronchoalveolar lavage as part of routine clinical care. Each patient has demographics, pneumonia episode information as adjudicated by a panel of critical care physicians, outcomes, and 44 clinical parameters for which data are present including vital signs, laboratory parameters, and mechanical support devices. Data have been deidentified per Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor rules. This dataset combines expert clinician adjudication with per-day granular information to provide a unique tool to examine the clinical courses of critically ill patients with pneumonia. We use this dataset in a manuscript examining the contribution of ventilator-associated pneumonia to patient outcomes (see Usage section for details) and share it for others to work with.
Machine learning has emerged as a powerful tool for integrating information from complex biomedical datasets, allowing identification of biomarkers, improving diagnosis, and predicting outcomes across the spectrum of medical conditions [1-3]. Traditional approaches have been applied at the level of patient encounters, using clinical data collected on admission to predict therapies that will improve outcomes on discharge [4-6]. This strategy largely ignores the duration of illness or intercurrent events such as ICU complications or interventions occurring over the course of the illness that are only indirectly related to the initial diagnosis. These approaches are particularly ill-suited to understanding patients with long lengths of stay. With CarpeDiem, we aimed to examine clinical features on a per-day basis, as done during the common practice of daily physician rounds. Since many labs are checked at least once a day, and major changes are often made at least once a day after team rounds, we chose this time block to bin our features. We hope that by providing these data with this discretization, scientists can examine more of the intercurrent events during what is often a prolonged ICU course.
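As a rough illustration of the per-day binning described above, the sketch below groups repeated measurements by ICU day and reduces each parameter to one value per day. The record layout, the function name `bin_by_day`, and the choice of the mean as the daily summary are assumptions made for illustration only; the aggregation rules actually used for the dataset are documented in the authors' code repository.

```python
from collections import defaultdict
from statistics import mean


def bin_by_day(measurements):
    """Summarize measurements into one value per (patient, ICU day, parameter).

    `measurements` is an iterable of (patient_id, icu_day, parameter, value)
    tuples. This layout is hypothetical, and the mean is only one possible
    daily summary.
    """
    bins = defaultdict(list)
    for patient_id, icu_day, parameter, value in measurements:
        bins[(patient_id, icu_day, parameter)].append(value)
    # Collapse each bin to a single daily value.
    return {key: mean(values) for key, values in bins.items()}


# Hypothetical example: two heart-rate readings on day 1, one on day 2.
example = [
    ("P1", 1, "heart_rate", 90),
    ("P1", 1, "heart_rate", 110),
    ("P1", 2, "heart_rate", 80),
]
daily = bin_by_day(example)
```

Other summaries (minimum, maximum, last value of the day) may suit some parameters better than the mean; the function could take the summary as an argument if that flexibility is needed.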
Data were extracted from the electronic health record for patients enrolled in the SCRIPT study. The SCRIPT CarpeDiem Dataset features 12,495 patient-ICU-days from 585 patients enrolled in the SCRIPT study between June 2018 and March 2022, all of whom were critically ill patients on mechanical ventilation suspected of having pneumonia who underwent a bronchoalveolar lavage as part of routine clinical care. Each patient has demographics, pneumonia episode information as adjudicated by a panel of critical care physicians, outcomes, and 44 clinical parameters for which data are present including vital signs, laboratory parameters, and mechanical support devices; additional details are described in our manuscript. Details of how clinical parameters were aggregated are available on our code repository. Factors used for the SOFA score calculation are included in the 44 clinical parameters, and the SOFA score is also calculated for each ICU patient-day.
Demographic data (including age, gender, ethnicity, race, BMI, smoking status, admission source, and admission acute physiology score [APS] and Sequential Organ Failure Assessment [SOFA] score) and outcome data (including discharge disposition, cumulative ICU days, cumulative intubation days, number of ICU stays, and tracheostomy requirement) are provided for the hospitalization. Discharge dispositions are categorized into six categories: Home, Acute inpatient rehabilitation (‘Rehab’), Skilled Nursing Facility (‘SNF’), Long-term acute care hospital (‘LTACH’), Hospice, or Died. Given our focus on patients with severe respiratory failure, patients who underwent lung transplant during their stay are categorized as having died, as their native lungs have been replaced and they would likely have died had they not undergone lung transplantation. A panel of critical care physicians adjudicated patients on the basis of their enrollment bronchoalveolar lavage results into four categories: Non-pneumonia Control, COVID-19, Other viral pneumonia, and Other pneumonia. This panel also identified, through an interactive review process, individual pneumonia episodes, and whether these episodes were deemed to be successfully cured, indeterminate, or not cured. Details are available in our manuscripts describing the adjudication process [7,8].
The data were deidentified according to the HIPAA Safe Harbor rules. All dates have been removed and are presented relative to a patient’s ICU stay, with ICU stay 1, day 1 being the first day of the patient’s first ICU stay. Ages above 89 are aggregated into a pool labeled 91, as done by others. Deidentification was performed by CAG and NSM, with review by LVR, MK, DS, and JS.
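The age-pooling rule can be written in a few lines. This is a minimal sketch of the rule as stated (ages above 89 replaced by the label 91); the function name `pool_age` and its parameters are hypothetical and are not taken from the authors' deidentification code.

```python
def pool_age(age, threshold=89, pooled_label=91):
    """Return the pooled label for ages above the threshold, else the age."""
    return pooled_label if age > threshold else age


# Hypothetical ages: 89 is kept as-is, anything above it becomes 91.
ages = [45, 89, 90, 97]
pooled = [pool_age(a) for a in ages]
```

One consequence for analysis is that an age of 91 in the dataset should be read as "90 or older" rather than as an exact value.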
This is a cohort of critically ill patients from the Successful Clinical Response in Pneumonia Therapy (SCRIPT) study, a prospective single-center observational study of patients who were hospitalized in a medical ICU, required mechanical ventilation, and received a bronchoalveolar lavage given suspicion for pneumonia. SCRIPT seeks to delineate the host/pathogen interactions during pneumonia using multiomic analysis of bronchoalveolar lavage fluid joined with clinical data and physician adjudication.
There are 585 patients in the cohort with median [IQR] age of 62 [51,72]. 190 had COVID-19, 50 had pneumonia secondary to other respiratory viruses, 252 had other pneumonia (bacterial), and 93 were initially suspected of having pneumonia yet subsequently adjudicated as having respiratory failure unrelated to pneumonia (non-pneumonia controls). 45% of the cohort had an unfavorable outcome defined as discharge to hospice or death. There are 136 episodes of community-acquired pneumonia (CAP), 214 episodes of hospital-acquired pneumonia (HAP), and 328 episodes of ventilator-associated pneumonia (VAP). 317 episodes were adjudicated to be successfully cured, 131 were adjudicated as indeterminate, and 230 were adjudicated as not cured.
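The unfavorable-outcome definition above (discharge to hospice or death) can be computed per patient from patient-day rows, as sketched below. Because the dataset has one row per patient-ICU-day, the sketch first reduces the rows to one disposition per patient. The tuple layout, the function name, and the exact label strings `"Hospice"` and `"Died"` are assumptions; the actual column names and values should be checked against the data dictionary.

```python
# Assumed labels for the two unfavorable discharge dispositions.
UNFAVORABLE = {"Hospice", "Died"}


def unfavorable_fraction(rows):
    """Fraction of patients discharged to hospice or who died.

    `rows` is an iterable of (patient_id, discharge_disposition) pairs,
    one per patient-day, so a patient may appear many times.
    """
    per_patient = {}
    for patient_id, disposition in rows:
        # The disposition is a per-hospitalization value, repeated on each row.
        per_patient[patient_id] = disposition
    if not per_patient:
        raise ValueError("no rows given")
    n_unfavorable = sum(d in UNFAVORABLE for d in per_patient.values())
    return n_unfavorable / len(per_patient)


# Hypothetical rows for four patients.
rows = [
    ("P1", "Home"), ("P1", "Home"),
    ("P2", "Died"),
    ("P3", "Hospice"), ("P3", "Hospice"),
    ("P4", "SNF"),
]
fraction = unfavorable_fraction(rows)
```

Counting rows instead of patients would weight the result toward patients with long ICU stays, which is why the reduction to one entry per patient comes first.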
Dataset - ‘CarpeDiem_dataset.csv’ This csv file is a data table that has each patient-ICU-day presented in a single row, along with admission summary information. The first 20 columns present demographic and outcome data summarized across the patient’s stay, the next 47 columns contain day-by-day values, and the last 5 columns present clinical pneumonia episode adjudication data. Columns represent values of features examined daily in the practice of physician rounds, including whether or not the patient required support from mechanical ventilation, extracorporeal membrane oxygenation, and dialysis. Vitals, ventilator parameters, and lab values are also summarized.
Data dictionary - ‘CarpeDiem_data_dictionary.csv’ This csv file is a dictionary of the variables included, with an explanation of each column name.
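The column layout described for ‘CarpeDiem_dataset.csv’ (20 summary columns, then 47 day-by-day columns, then 5 episode adjudication columns) suggests a simple positional split of the header. The following is a sketch under that assumption; the helper names `read_header` and `split_columns` are hypothetical, and the positional grouping should be verified against the data dictionary before relying on it.

```python
import csv


def read_header(path):
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle))


def split_columns(header, n_summary=20, n_daily=47, n_episode=5):
    """Split column names into summary, day-by-day, and episode groups."""
    expected = n_summary + n_daily + n_episode
    if len(header) != expected:
        raise ValueError(f"expected {expected} columns, found {len(header)}")
    summary = header[:n_summary]
    daily = header[n_summary:n_summary + n_daily]
    episode = header[n_summary + n_daily:]
    return summary, daily, episode


# Placeholder header with 72 made-up names; with file access one would call
# split_columns(read_header("CarpeDiem_dataset.csv")) instead.
header = [f"col_{i}" for i in range(72)]
summary, daily, episode = split_columns(header)
```

The length check is there so that a change in the file layout in a later version raises an error instead of silently shifting the groups.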
We used this dataset to examine daily clinical states by performing unsupervised clustering on patient-day features, and identified 14 clusters with different clinical characteristics. We used this map to ask questions about patterns in pneumonia between patients who had or did not have COVID-19. The code repository used in our paper is available at GitHub. Please feel free to open issues on that platform with questions.
This unique feature of clinical adjudication provided by a panel of critical care physicians will allow scientists to examine the dataset for features associated with successful pneumonia treatment. Additional analyses can be done on outcome prediction with traditional and more novel tools such as machine learning algorithms.
Limitations include the missing data inherent to EHR data, though we believe this is informative in and of itself, as physicians will not check labs when they are not felt to be helpful or clinically important. Other patterns of missing data - such as the lack of ventilator information when a patient is extubated - are also inherent to the absence or presence of such a support device. We chose 44 parameters of clinical interest as identified by our critical care physicians, but there are many other parameters available or measured in the ICU that are not included in our dataset.
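Since missingness is described as informative, a first step in many analyses is to quantify it per parameter. The sketch below computes the fraction of patient-days with no value for each column, assuming rows are read as dictionaries (for example with `csv.DictReader`) and that a missing value appears as an empty field. Both that encoding and the column names in the example are assumptions, not details confirmed by the dataset documentation.

```python
def missing_fraction(rows, columns):
    """Return {column: fraction of rows where the value is missing}.

    A value counts as missing if the key is absent, None, or an empty string.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("no rows given")
    return {
        col: sum(1 for row in rows if row.get(col) in ("", None)) / len(rows)
        for col in columns
    }


# Hypothetical patient-day rows with made-up column names.
rows = [
    {"lactate": "2.1", "peep": ""},
    {"lactate": "", "peep": ""},
    {"lactate": "1.0", "peep": "5"},
    {"lactate": "1.5", "peep": "8"},
]
fractions = missing_fraction(rows, ["lactate", "peep"])
```

Comparing these fractions between rows with and without a given support device is one way to separate structural missingness (no ventilator settings after extubation) from labs that were simply not ordered.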
Version 1.0.0 initial release.
This study is approved by the Northwestern University Institutional Review Board with study ID STU00204868.
The authors would like to thank Northwestern Memorial Hospital and the Northwestern University Feinberg School of Medicine for their support, as well as all the patients, providers, and SCRIPT team members. SCRIPT is funded by NIH NIAID U19AI135964.
Conflicts of Interest
BDS holds US patent 10,905,706, “Compositions and methods to accelerate resolution of acute lung inflammation,” and serves on the Scientific Advisory Board of Zoe Biosciences, for which he holds stock options. The other authors have no conflicts of interest to declare.
- Deo, R. C. Machine Learning in Medicine. Circulation 132, 1920–1930 (2015).
- Wiens, J. & Shenoy, E. S. Machine Learning for Healthcare: On the Verge of a Major Shift in Healthcare Epidemiology. Clin. Infect. Dis. 66, 149–153 (2018).
- Kanjilal, S. et al. A decision algorithm to promote outpatient antimicrobial stewardship for uncomplicated urinary tract infection. Sci. Transl. Med. 12, (2020).
- Chan, P. S., Jain, R., Nallamothu, B. K., Berg, R. A. & Sasson, C. Rapid Response Teams: A Systematic Review and Meta-analysis. Arch. Intern. Med. 170, 18–26 (2010).
- Buist, M. D. et al. Effects of a medical emergency team on reduction of incidence of and mortality from unexpected cardiac arrests in hospital: preliminary study. BMJ 324, 387–390 (2002).
- McNeill, G. & Bryden, D. Do either early warning systems or emergency response teams improve hospital patient survival? A systematic review. Resuscitation 84, 1652–1667 (2013).
- Pickens CI et al. An Adjudication Protocol for Severe Bacterial and Viral Pneumonia. medRxiv (2022). doi: 10.1101/2022.10.26.22281461
- Grant, R. A. et al. Circuits between infected macrophages and T cells in SARS-CoV-2 pneumonia. Nature (2021) doi:10.1038/s41586-020-03148-w.
- Office for Civil Rights (OCR). Guidance regarding methods for DE-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. HHS.gov. 2012; published online Sept 7. https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html (accessed Sept 9, 2022).
- Johnson, A. et al. MIMIC-IV. (2022) doi:10.13026/7VCR-E114.
- NU SCRIPT. “Successful Clinical Response In Pneumonia Therapy (SCRIPT) Systems Biology Center.” https://script.northwestern.edu/
- NU SCRIPT. “CarpeDiem resources”. https://nupulmonary.org/carpediem/
- NU SCRIPT. “NUSCRIPT/Carpediem: Notebooks and Code for the Carpediem Project.” GitHub, https://github.com/NUSCRIPT/carpediem.
- Gao, Markov, Stoeger et al. A machine learning approach identifies unresolving secondary pneumonia as a contributor to mortality in patients with severe pneumonia, including COVID-19. medRxiv (2022) doi: 10.1101/2022.09.23.22280118v1
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
CITI Data or Specimens Only Research
- be a credentialed user
- complete required training:
- CITI Data or Specimens Only Research
- sign the data use agreement for the project