MIMIC-III Clinical Database CareVue subset

Alistair Johnson Tom Pollard Roger Mark

Published: Sept. 21, 2022. Version: 1.4


When using this resource, please cite:
Johnson, A., Pollard, T., & Mark, R. (2022). MIMIC-III Clinical Database CareVue subset (version 1.4). PhysioNet. https://doi.org/10.13026/8a4q-w170.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

MIMIC-III is a database of critically ill patients admitted to an intensive care unit (ICU) at the Beth Israel Deaconess Medical Center (BIDMC) in Boston, MA. MIMIC-III has seen broad use, and was updated with the release of MIMIC-IV. MIMIC-IV contains more recent stays, higher granularity data, and expanded domains of information. To maximize the sample size of MIMIC-IV, the database overlaps with MIMIC-III: both databases contain the same admissions that occurred between 2008 and 2012. This overlap complicates analyses that use the two databases together. Here we provide a subset of MIMIC-III containing patients who are not in MIMIC-IV. The goal of this project is to simplify the combination of MIMIC-III with MIMIC-IV.


Background

MIMIC-III was first published in 2015 and has since attracted considerable interest from researchers, students, and practitioners worldwide. A significant source of data in MIMIC-III was the clinical information system present in the ICUs at the time. From 2001 to 2008, the information system in use was Philips CareVue. In 2008, the ICUs changed to iMDsoft MetaVision as the clinical information system. As a result, many concepts in MIMIC-III must be coded twice: once using an algorithm to extract the values from CareVue, and once again using an algorithm to extract the values from MetaVision. The MIMIC Code Repository simplified this exercise for most investigators by providing a community resource where derivations from MIMIC-III could be collaboratively developed [2].

MIMIC-IV was released in 2020 and updated MIMIC-III with new datatypes and more recent data, with the period of data collection spanning 2008 - 2019 [3]. In the release of MIMIC-IV, a key decision about date shifting affected its relationship with MIMIC-III: the mechanism of date shifting was changed, resulting in new date shifts for each patient. The date shifting approach in MIMIC-III was chosen to preserve day of the week and season, allowing investigators to study the effects of these phenomena on clinical care. In order to protect patient privacy and mask the true dates of service, retaining the day of week and season required obscuring the year of admission for each patient.

Since the publication of MIMIC-III, it has become apparent that preserving the year of admission is more desirable than preserving the day of the week and season, as it allows for the study of changing clinical practice over time. The date shifting approach was therefore changed in MIMIC-IV to preserve the year of admission (with added noise), while the day of the week and season were obscured. As a result of changing the date shift, new subject_id values were generated for each patient, and the individuals in MIMIC-III cannot be linked to MIMIC-IV.

The generation of a new subject_id for each individual in MIMIC-IV makes it difficult to identify the same individual in MIMIC-III. This is particularly relevant given their overlapping data collection periods. Independence of observations is an important assumption of many analytical approaches, and researchers are currently unable to ensure this assumption is met when merging MIMIC-III and MIMIC-IV.


Methods

We aimed to subselect MIMIC-III to contain only those subject_id values that do not occur in MIMIC-IV. The MIMIC-III Clinical Database v1.4 was downloaded from PhysioNet [1]. We extracted CareVue stays from the downloaded icustays table using dbsource = 'carevue'. This provided a table with the subject_id, hadm_id, and icustay_id to retain in the CareVue subset. For tables with no patient-specific information (e.g. d_items), we retained the entire table. For tables with individual patient data, we filtered the data using the most specific patient identifier that is present for all rows. The identifier used for each table was:

  • subject_id - patients
  • hadm_id - admissions, callout, cptevents, diagnoses_icd, drgcodes, microbiologyevents, prescriptions, procedures_icd, services, transfers
  • icustay_id - chartevents, datetimeevents, icustays, inputevents_cv, inputevents_mv, outputevents, procedureevents_mv
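The filtering rule above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' extraction code: the function names, the FILTER_KEY and UNFILTERED mappings, and the synthetic rows are all assumptions made for the example, and column names are written in lowercase as in the text (the headers in the released files may differ in case).

```python
# Identifier used to filter each table, following the list above.
FILTER_KEY = {
    "patients": "subject_id",
    "admissions": "hadm_id", "callout": "hadm_id", "cptevents": "hadm_id",
    "diagnoses_icd": "hadm_id", "drgcodes": "hadm_id",
    "microbiologyevents": "hadm_id", "prescriptions": "hadm_id",
    "procedures_icd": "hadm_id", "services": "hadm_id", "transfers": "hadm_id",
    "chartevents": "icustay_id", "datetimeevents": "icustay_id",
    "icustays": "icustay_id", "inputevents_cv": "icustay_id",
    "inputevents_mv": "icustay_id", "outputevents": "icustay_id",
    "procedureevents_mv": "icustay_id",
}
# Tables without patient-specific information are kept whole.
# Only the example named in the text is listed here (illustrative, incomplete).
UNFILTERED = {"d_items"}


def carevue_identifiers(icustays):
    """Collect the identifiers of stays whose dbsource is 'carevue'."""
    keep = {"subject_id": set(), "hadm_id": set(), "icustay_id": set()}
    for row in icustays:
        if row["dbsource"] == "carevue":
            for key in keep:
                keep[key].add(row[key])
    return keep


def filter_table(name, rows, keep):
    """Keep rows whose table-specific identifier belongs to a CareVue stay."""
    if name in UNFILTERED:
        return list(rows)
    key = FILTER_KEY[name]  # raises KeyError for tables not covered by this sketch
    return [row for row in rows if row[key] in keep[key]]


# Tiny synthetic example (not real MIMIC data).
icustays = [
    {"subject_id": 1, "hadm_id": 10, "icustay_id": 100, "dbsource": "carevue"},
    {"subject_id": 2, "hadm_id": 20, "icustay_id": 200, "dbsource": "metavision"},
]
admissions = [{"subject_id": 1, "hadm_id": 10}, {"subject_id": 2, "hadm_id": 20}]

keep = carevue_identifiers(icustays)
kept_admissions = filter_table("admissions", admissions, keep)
```

In this sketch labevents and noteevents are deliberately absent from FILTER_KEY, since they are filtered by date rather than by identifier.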

For the above tables, filtering to the stated patient identifiers for the CareVue subset implicitly restricts the data to the 2001 - 2008 collection period. For labevents and noteevents, we retained only data documented before the discharge date of the last hospitalization in the CareVue subset.
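The date-based rule for labevents and noteevents could look like the following sketch. It rests on several assumptions that the text does not settle: the cutoff is interpreted per subject (the latest dischtime among that subject's retained admissions), the boundary is treated as inclusive, and rows with a missing timestamp or an unknown subject are dropped. The function names and sample rows are hypothetical.

```python
from datetime import datetime


def last_discharge(admissions):
    """Map each subject_id to the latest dischtime among the retained admissions."""
    last = {}
    for adm in admissions:
        discharged = datetime.fromisoformat(adm["dischtime"])
        subject = adm["subject_id"]
        if subject not in last or discharged > last[subject]:
            last[subject] = discharged
    return last


def filter_events(events, last, time_col="charttime"):
    """Keep events documented no later than the subject's last discharge.

    Assumption: rows with an empty timestamp, or from subjects without a
    retained admission, are dropped.
    """
    kept = []
    for event in events:
        cutoff = last.get(event["subject_id"])
        stamp = event[time_col]
        if cutoff is not None and stamp and datetime.fromisoformat(stamp) <= cutoff:
            kept.append(event)
    return kept


# Tiny synthetic example (not real MIMIC data); dates mimic shifted timestamps.
admissions = [
    {"subject_id": 1, "hadm_id": 10, "dischtime": "2105-03-01 12:00:00"},
    {"subject_id": 1, "hadm_id": 11, "dischtime": "2106-07-15 09:30:00"},
]
labevents = [
    {"subject_id": 1, "charttime": "2106-07-01 08:00:00"},  # before cutoff
    {"subject_id": 1, "charttime": "2107-01-01 08:00:00"},  # after cutoff
    {"subject_id": 2, "charttime": "2105-01-01 08:00:00"},  # subject not retained
]

cutoffs = last_discharge(admissions)
kept_labs = filter_events(labevents, cutoffs)
```

For noteevents, the time_col argument would be set to whichever date column is populated for the rows of interest.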

Once the CareVue subset tables had been generated, a second filter was applied to remove individuals admitted during the MIMIC-IV data collection period. Any subject with a hospitalization in MIMIC-IV v2.0 was removed. This resulted in 6,636 unique individuals being removed from the CareVue subset.
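The second filter amounts to an anti-join on subject_id. The sketch below assumes that a set of overlapping subject_id values is already available; how that set was obtained is not described here, so `overlapping_subjects` is a hypothetical input, as are the function name and the sample rows.

```python
def remove_subjects(rows, excluded_subjects):
    """Drop every row that belongs to a subject in excluded_subjects."""
    return [row for row in rows if row["subject_id"] not in excluded_subjects]


# Tiny synthetic example (not real MIMIC data).
patients = [{"subject_id": 1}, {"subject_id": 2}, {"subject_id": 3}]
overlapping_subjects = {2}  # hypothetical: subjects also hospitalized in MIMIC-IV

remaining = remove_subjects(patients, overlapping_subjects)
removed_count = len(patients) - len(remaining)
```

Applying the same function to every patient-level table keeps the tables consistent with one another.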


Data Description

The MIMIC-III Clinical Database CareVue subset is formatted identically to MIMIC-III. For detail on the data structure, see the MIMIC-III Clinical Database PhysioNet page [1]. MetaVision-associated tables, including inputevents_mv and procedureevents_mv, are retained in the database to allow compatibility with MIMIC-III queries. These tables are empty because no stay in the CareVue subset was documented using the MetaVision system.


Usage Notes

The CareVue subset can be used in much the same way as the original MIMIC-III Clinical Database [1]. The data files were exported from a PostgreSQL database using the "CSV" (comma separated value) argument and compressed using the gzip application. After decompression, the files follow the RFC 4180 guidance for CSV files.
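As a minimal sketch of reading one of these files without decompressing it to disk first, Python's standard gzip and csv modules can be combined. The helper name read_gzip_csv, the UTF-8 encoding, and the demo file are assumptions made for illustration; the round trip below uses a synthetic file rather than real data.

```python
import csv
import gzip
import os
import tempfile


def read_gzip_csv(path):
    """Return the rows of a gzip-compressed CSV file as a list of dicts."""
    # newline="" leaves line endings untouched so the csv module can handle
    # quoted fields that contain line breaks, which RFC 4180 permits.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Round-trip demonstration on a tiny synthetic file (not real MIMIC data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv.gz")
    with gzip.open(demo_path, mode="wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["row_id", "text"])
        writer.writerow(["1", "line one\nline two"])
    rows = read_gzip_csv(demo_path)
```

Loading a whole table into a list is convenient for small tables; for the larger event tables, iterating over the csv.DictReader row by row avoids holding everything in memory.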

Note that the removal of patients present in MIMIC-IV results in a bias towards mortality: the removed patients went on to have a hospitalization recorded in MIMIC-IV, so they are survivors of their earlier stays. Researchers should make note of this bias and contextualize their results accordingly.


Release Notes

The release notes will follow those of the MIMIC-III Clinical Database [1]. The current version of the MIMIC-III Clinical Database CareVue subset is v1.4, and it was derived from MIMIC-III v1.4.


Ethics

The project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was deidentified.


Acknowledgements

We would like to thank the Beth Israel Deaconess Medical Center for their continued support of the MIMIC project. In particular we would like to thank Carolyn Conti, Alvin Gayles, Larry Markson, Ayad Shammout, Lu Shen, and Manu Tandon for their assistance. We would also like to thank the NIH for their gracious support.


Conflicts of Interest

None to declare.


References

  1. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
  2. Johnson, A. E., Stone, D. J., Celi, L. A., & Pollard, T. J. (2018). The MIMIC Code Repository: enabling reproducibility in critical care research. Journal of the American Medical Informatics Association, 25(1), 32-39.
  3. Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2021). MIMIC-IV (version 1.0). PhysioNet. https://doi.org/10.13026/s6n6-xd98.

Parent Projects
MIMIC-III Clinical Database CareVue subset was derived from the MIMIC-III Clinical Database v1.4 [1]. Please cite it when using this project.
Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research
