Database Credentialed Access
MIMIC-IV-ED
Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Leo Anthony Celi, Roger Mark, Steven Horng
Published: June 3, 2021. Version: 1.0
When using this resource, please cite:
Johnson, A., Bulgarelli, L., Pollard, T., Celi, L. A., Mark, R., & Horng, S. (2021). MIMIC-IV-ED (version 1.0). PhysioNet. https://doi.org/10.13026/77z6-9w59.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Abstract
MIMIC-ED is a large, freely available database of emergency department (ED) admissions at the Beth Israel Deaconess Medical Center between 2011 and 2019. As of MIMIC-ED v1.0, the database contains 448,972 ED stays. Vital signs, triage information, medication reconciliation, medication administration, and discharge diagnoses are available. All data are deidentified to comply with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-ED is intended to support a diverse range of education initiatives and research studies.
Background
The emergency department (ED) is a high-demand environment where patients are assessed and triaged for further care. ED patients compose a heterogeneous cohort with severity ranging from mild abrasions to life-threatening cardiac complications. The ED is fundamentally a resource-limited environment where the most important resource available, human attention, is rationed to maximize positive patient outcomes. Recent advances in algorithmic approaches present an exciting opportunity for improving the quality of care delivered in the ED. Sufficiently large datasets are a prerequisite for data-driven analyses, and broad data accessibility enables reproducibility of research. MIMIC-ED is intended to support data analysis in emergency care by providing a large database of admissions to an ED at a tertiary academic medical center in Boston, MA. It is a module of MIMIC-IV [1].
Methods
Data were extracted from the Beth Israel Deaconess Medical Center (BIDMC) ED in Extensible Markup Language (XML), and subsequently converted from XML into a denormalized relational database designed to simplify analysis. All data were deidentified to comply with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision [2]. Patient identifiers were replaced with randomized surrogates. Three deidentified patient identifiers are present in the dataset: subject_id, hadm_id, and stay_id. All three of these identifiers were generated in concordance with MIMIC-IV and MIMIC-CXR, allowing linkage of these datasets using one or more of the aforementioned identifiers [1,3-5]. Dates were shifted to a random time between 2100 and 2200 on a patient-specific basis. Date shifts were applied consistently for a single subject_id, so all times associated with a single subject_id are temporally consistent and reflect the true order of events. Conversely, distinct subject_id values whose data overlap in time were not necessarily present in the ED at the same time. Finally, free-text fields were processed using a hybrid deidentification algorithm, and detected PHI entities were replaced with three underscores ("___") [6].
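The effect of a patient-specific date shift can be illustrated with a short sketch. This is not the deidentification code used for MIMIC-ED: the function names, the offset range, and the sample timestamps are all hypothetical, and the only property it aims to show is that one fixed offset per subject_id preserves ordering and elapsed time within a subject.

```python
from datetime import datetime, timedelta
import random

def make_shifts(subject_ids, seed=0):
    """Assign one fixed, random offset per subject (hypothetical scheme)."""
    rng = random.Random(seed)
    # Offsets of roughly 90 to 180 years, chosen only for illustration.
    return {sid: timedelta(days=rng.randint(90 * 365, 180 * 365))
            for sid in subject_ids}

def shift_events(events, shifts):
    """Apply each subject's own offset to all of that subject's timestamps."""
    return [(sid, t + shifts[sid]) for sid, t in events]

# Synthetic (subject_id, timestamp) pairs.
events = [
    (1, datetime(2015, 3, 1, 8, 0)),
    (1, datetime(2015, 3, 1, 12, 30)),
    (2, datetime(2015, 3, 1, 9, 0)),
]
shifts = make_shifts([1, 2])
shifted = shift_events(events, shifts)

# Within subject 1, the gap between events is unchanged by the shift.
gap_before = events[1][1] - events[0][1]
gap_after = shifted[1][1] - shifted[0][1]
```

Because each subject receives a different offset, comparing shifted timestamps across subjects (for example, to estimate how crowded the ED was) is not meaningful.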
A schema comprising six tables was created. The edstays table was created to track patient admission and discharge from the ED for a single patient stay as identified by stay_id. Five data tables store information documented during the patient's stay: diagnosis, medrecon, pyxis, triage, and vitalsign. Tables are named to reflect the data within or its provenance. A core aim of MIMIC-ED is to provide real-world clinical data for research, so preprocessing prior to data release was kept limited; nevertheless, a number of data cleaning steps were necessary during transformation. Observations were deduplicated upon insertion using the table-specific primary key. The primary key was a combination of stay_id, charttime if present, and additional attribute columns as appropriate (e.g. the name column in pyxis). For deidentification purposes, a regular expression was used to retain only numeric vital signs in the triage and vitalsign tables. Observations more than one year outside of the ED stay (usually the result of typographical errors in the charted time) were removed.
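The three cleaning steps above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used to build MIMIC-ED: the regular expression, the choice to set non-numeric values to missing instead of dropping the row, the 365-day approximation of one year, and the sample rows are all hypothetical.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern: an optionally signed integer or decimal, nothing else.
NUMERIC = re.compile(r"^\s*-?\d+(\.\d+)?\s*$")

def clean(rows, intime, outtime):
    """rows: dicts with stay_id, charttime, and a raw heartrate string."""
    window = timedelta(days=365)  # approximates "one year"
    seen, kept = set(), []
    for row in rows:
        key = (row["stay_id"], row["charttime"])  # assumed primary key
        if key in seen:
            continue  # duplicate primary key: keep the first occurrence only
        seen.add(key)
        t = row["charttime"]
        if t < intime - window or t > outtime + window:
            continue  # charted more than a year outside the stay
        raw = row["heartrate"]
        # Keep purely numeric entries; anything else becomes missing.
        value = float(raw) if NUMERIC.match(raw) else None
        kept.append(dict(row, heartrate=value))
    return kept

intime = datetime(2150, 1, 1, 10, 0)
outtime = datetime(2150, 1, 1, 18, 0)
rows = [
    {"stay_id": 1, "charttime": datetime(2150, 1, 1, 10, 5), "heartrate": "88"},
    {"stay_id": 1, "charttime": datetime(2150, 1, 1, 10, 5), "heartrate": "88"},
    {"stay_id": 1, "charttime": datetime(2150, 1, 1, 11, 0), "heartrate": "irregular"},
    {"stay_id": 1, "charttime": datetime(2105, 1, 1, 11, 0), "heartrate": "90"},
]
cleaned = clean(rows, intime, outtime)
```

In this synthetic example the duplicate row and the row charted in 2105 are discarded, and the free-text heart rate is retained as a missing value.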
The project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center, Boston, MA and the Massachusetts Institute of Technology, Cambridge, MA (#2001P001699).
Data Description
MIMIC-ED is composed of a single patient tracking table, edstays, and five data tables: diagnosis, medrecon, pyxis, triage, and vitalsign.
edstays
Patient stays are tracked in the edstays table. Each row of the edstays table has a unique stay_id, which represents a unique patient stay in the ED. The edstays table contains the following columns: subject_id, hadm_id, stay_id, intime, and outtime. The intime indicates the time at which the patient was admitted to the ED, and the outtime indicates the time at which the patient was discharged from the ED. If the patient was admitted to the hospital following their ED stay, the hadm_id column will be populated with an identifier representing their hospital stay. hadm_id can be linked with the hadm_id in MIMIC-IV to obtain further detail about the patient’s hospital stay. Finally, each individual is assigned a unique subject_id, and patients with multiple ED stays will have the same subject_id across stays in the edstays table. Note that subject_id can be linked with MIMIC-IV to obtain patient demographics. subject_id can also be linked with the PatientID DICOM attribute in MIMIC-CXR to obtain chest x-rays for patients if they were taken [3].
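As a minimal sketch of how the edstays columns might be used, the following derives the ED length of stay from intime and outtime and flags stays followed by a hospital admission. The rows are synthetic and the summarize helper is hypothetical; a missing hadm_id is represented here as None.

```python
from datetime import datetime

# Synthetic rows: (subject_id, hadm_id, stay_id, intime, outtime).
edstays = [
    (100, None, 3001, datetime(2150, 1, 1, 10, 0), datetime(2150, 1, 1, 14, 30)),
    (100, 2001, 3002, datetime(2150, 6, 9, 22, 15), datetime(2150, 6, 10, 3, 45)),
    (101, None, 3003, datetime(2133, 4, 2, 8, 0), datetime(2133, 4, 2, 9, 0)),
]

def summarize(rows):
    """Return per-stay length of stay (hours) and an admission flag."""
    out = {}
    for subject_id, hadm_id, stay_id, intime, outtime in rows:
        out[stay_id] = {
            "subject_id": subject_id,
            "los_hours": (outtime - intime).total_seconds() / 3600,
            # hadm_id is populated only when a hospital admission followed.
            "admitted": hadm_id is not None,
        }
    return out

summary = summarize(edstays)
```

Subject 100 appears twice with different stay_id values, which mirrors how repeat visitors share a subject_id across stays.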
diagnosis
The diagnosis table provides coded diagnoses for the patient in the International Classification of Diseases (ICD) Ninth or Tenth revision (ICD-9 or ICD-10). These diagnoses are determined by trained coders after discharge from the emergency department and are used for billing purposes. There are six columns in the diagnosis table: subject_id, stay_id, seq_num, icd_code, icd_version, and icd_title. A maximum of 9 ICD codes are available for a single stay. The seq_num column provides a pseudo-order for the ICD codes, with a value of 1 usually indicating highest relevance and a value of 9 indicating least relevance. The icd_code provides the coded representation of the diagnosis using the ICD ontology, the icd_version column is either 9 or 10 indicating whether the ontology used is ICD-9 or ICD-10, and the icd_title column provides the textual description of the ICD code.
It is important to note that the billed diagnoses in the diagnosis table are exclusively related to the patient's emergency department stay. If the patient is subsequently admitted to the hospital, they will have a separate set of billed diagnoses for their hospital stay, which are not recorded in this table. See the usage notes for details regarding linking MIMIC-ED to MIMIC-IV, which would facilitate comparison of the billed ED diagnoses with billed hospital diagnoses.
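One common use of seq_num is to pick a single, most relevant diagnosis per stay. The sketch below does this on synthetic rows; the primary_diagnoses helper is hypothetical, and the code formatting (no decimal point) is an assumption about how codes are stored. The ICD version is kept alongside the code because a code string is only interpretable together with its revision.

```python
# Synthetic rows:
# (subject_id, stay_id, seq_num, icd_code, icd_version, icd_title)
diagnosis = [
    (100, 3001, 1, "78650", 9, "Chest pain, unspecified"),
    (100, 3001, 2, "4019", 9, "Unspecified essential hypertension"),
    (100, 3002, 1, "R079", 10, "Chest pain, unspecified"),
]

def primary_diagnoses(rows):
    """Map stay_id -> (icd_version, icd_code, icd_title) for seq_num == 1."""
    primary = {}
    for subject_id, stay_id, seq_num, code, version, title in rows:
        if seq_num == 1:
            primary[stay_id] = (version, code, title)
    return primary

primary = primary_diagnoses(diagnosis)
```

Since seq_num is only a pseudo-order, treating seq_num 1 as the primary diagnosis is a convention for analysis rather than a guarantee.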
medrecon
The medrecon table provides medicine reconciliation for each patient, that is, a list of the medications which the patient was taking prior to their ED stay. The medrecon table has nine columns: subject_id, stay_id, charttime, name, gsn, ndc, etc_rn, etccode, and etcdescription. The charttime provides the date and time at which the medicine reconciliation was documented. The name column provides a text description of the medicine, the gsn column provides the Generic Sequence Number (GSN), and the ndc column provides the National Drug Code (NDC). Note that a gsn or an ndc of 0 indicates that the value is missing. Columns prefixed with etc provide an ontology for grouping together drugs of a similar class. Because a medicine can be classified in multiple groups in the ontology, there may be more than one row for a single medication. For example, the medication Adderall is (1) a CNS stimulant, (2) an Attention Deficit-Hyperactivity Therapy, and (3) a narcolepsy therapy. As a result, patients taking Adderall prior to their admission will have three rows in the medrecon table, delineated by the sequential, monotonically increasing integer etc_rn. The etccode provides the coded form of the ontology group, and the etcdescription provides the textual description of the ontology group.
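Because one medication can span several medrecon rows, analyses that count medications usually need to collapse the ontology rows first. The following sketch shows one way to do so on synthetic data. The helper names are hypothetical, the gsn, ndc, and etccode values are placeholders, the "Analgesics" group is invented for the example, and the storage type of gsn and ndc (number or text) is not assumed.

```python
from collections import defaultdict

def missing_if_zero(value):
    """Treat 0 (stored as a number or as text) as a missing code."""
    return None if str(value).strip() == "0" else value

# Synthetic rows:
# (stay_id, charttime, name, gsn, ndc, etc_rn, etccode, etcdescription)
medrecon = [
    (3001, "2150-01-01 10:20", "Adderall", 11111, 0, 1, 101, "CNS Stimulants"),
    (3001, "2150-01-01 10:20", "Adderall", 11111, 0, 2, 102,
     "Attention Deficit-Hyperactivity Therapy"),
    (3001, "2150-01-01 10:20", "Adderall", 11111, 0, 3, 103, "Narcolepsy Therapy"),
    (3001, "2150-01-01 10:20", "Aspirin", 0, 0, 1, 104, "Analgesics"),
]

def collapse(rows):
    """One entry per medication, with its ontology groups in etc_rn order."""
    groups = defaultdict(list)
    for stay_id, charttime, name, gsn, ndc, etc_rn, _code, description in rows:
        key = (stay_id, charttime, name, missing_if_zero(gsn), missing_if_zero(ndc))
        groups[key].append((etc_rn, description))
    return {key: [d for _, d in sorted(members)] for key, members in groups.items()}

meds = collapse(medrecon)
```

Here the three Adderall rows collapse into a single medication with three ontology groups, and the zero-valued codes are reported as missing.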
pyxis
The pyxis table provides dispensation information for medications provided by the BD Pyxis MedStation, an automated medication dispensing system present in the ED [7]. The pyxis table has seven columns: subject_id, stay_id, charttime, med_rn, name, gsn_rn, and gsn. The charttime provides the time at which the medication was dispensed. If multiple medications were dispensed at the same time, the med_rn column delineates these medications. The name column provides a textual description of the medication dispensed, and may additionally contain auxiliary information such as the formulation. The gsn column provides the Generic Sequence Number (GSN) if available, and gsn_rn delineates multiple GSN values associated with the same medication. Note that a gsn of 0 indicates that the GSN is missing. Not all medications are dispensed by the Pyxis MedStation, and as a result not all medications are recorded in the pyxis table. For example, large fluid volumes (such as those used for resuscitation) are not present in this table.
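The roles of med_rn and gsn_rn can be made concrete with a small sketch that rebuilds one record per dispensed medication. The rows, medication names, and GSN values are synthetic placeholders, and the dispensations helper is hypothetical.

```python
# Synthetic rows: (stay_id, charttime, med_rn, name, gsn_rn, gsn).
pyxis = [
    (3001, "2150-01-01 11:05", 1, "Ondansetron", 1, 22222),
    (3001, "2150-01-01 11:05", 1, "Ondansetron", 2, 33333),
    (3001, "2150-01-01 11:05", 2, "Morphine", 1, 0),
]

def dispensations(rows):
    """One entry per (stay_id, charttime, med_rn), collecting its GSN values."""
    out = {}
    for stay_id, charttime, med_rn, name, gsn_rn, gsn in rows:
        entry = out.setdefault((stay_id, charttime, med_rn),
                               {"name": name, "gsns": []})
        if gsn != 0:  # a gsn of 0 marks a missing GSN
            entry["gsns"].append(gsn)
    return out

dispensed = dispensations(pyxis)
```

Counting rows in pyxis directly would report three dispensations here, whereas grouping on med_rn shows that two medications were dispensed at that time.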
triage
The triage table provides information collected from the patient at the time of triage. All patients who present to the ED are immediately triaged, a process which involves assessing their health status and ascertaining the reason for their visit. The triage table has eleven columns: subject_id, stay_id, temperature, heartrate, resprate, o2sat, sbp, dbp, pain, acuity, and chiefcomplaint. Vital signs collected at triage include patient temperature (temperature), heart rate (heartrate), respiratory rate (resprate), oxygen saturation (o2sat), systolic blood pressure (sbp), and diastolic blood pressure (dbp). Although vital signs can be documented as free-text, the deidentification approach retained only numeric vital signs. A patient-reported pain level is available in the pain column. The chiefcomplaint is a free-text field which contains the patient’s reported reason for presenting to the ED. The chiefcomplaint field is usually a comma-separated list of entries. PHI present in the chiefcomplaint field has been replaced by three underscores ("___"). Based upon the triage assessment, the care provider will assign an integer level of severity (acuity), where 1 indicates the highest severity and 5 indicates the lowest severity.
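Since chiefcomplaint is usually a comma-separated list, a first processing step is often to split it into individual entries. The sketch below is one hypothetical way to do this; dropping entries that consist solely of the "___" placeholder is a design choice made for the example, and the sample strings are synthetic.

```python
def parse_complaints(text):
    """Split a chiefcomplaint string into trimmed, non-empty entries."""
    if not text:
        return []
    parts = [part.strip() for part in text.split(",")]
    # Drop empty entries and entries that are only the PHI placeholder.
    return [part for part in parts if part and part != "___"]

examples = [
    parse_complaints("Chest pain, Dyspnea"),
    parse_complaints("___, Abd pain"),
    parse_complaints("Transfer from ___"),
    parse_complaints(None),
]
```

Entries that merely contain the placeholder, such as the third example, are kept intact because the surrounding text may still be informative.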
vitalsign
The vitalsign table contains aperiodic vital signs documented for patients during their stay. The vitalsign table has eleven columns: subject_id, stay_id, charttime, temperature, heartrate, resprate, o2sat, sbp, dbp, rhythm, and pain. Vital signs in the vitalsign table are similar to those collected in the triage table. The rhythm column additionally provides the heart rhythm for the patient. The charttime provides the time at which the vital signs were recorded.
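Because vital signs are charted aperiodically and individual values may be missing, a typical derived quantity is the last recorded value per stay. A minimal sketch on synthetic rows follows; the last_recorded helper is hypothetical, and charttime is represented as an ISO-formatted string so that lexical order matches chronological order.

```python
# Synthetic rows; only a subset of the vitalsign columns is shown.
vitalsign = [
    {"stay_id": 3001, "charttime": "2150-01-01 12:00", "heartrate": None},
    {"stay_id": 3001, "charttime": "2150-01-01 10:30", "heartrate": 92.0},
    {"stay_id": 3001, "charttime": "2150-01-01 11:15", "heartrate": 84.0},
    {"stay_id": 3002, "charttime": "2150-06-09 23:00", "heartrate": None},
]

def last_recorded(rows, column):
    """Map stay_id -> most recent non-missing value of the given column."""
    latest = {}
    for row in sorted(rows, key=lambda r: r["charttime"]):
        if row[column] is not None:
            latest[row["stay_id"]] = row[column]  # later rows overwrite earlier
    return latest

last_hr = last_recorded(vitalsign, "heartrate")
```

Stays with no usable measurement, such as stay 3002 here, are simply absent from the result.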
Usage Notes
Organization
MIMIC-ED is organized in a star schema: a single table sits at the center of the star, and all other tables link to this central table using the same identifier. The edstays table provides admission and discharge times for each stay in MIMIC-ED, uniquely referred to by the identifier stay_id. All other tables may be linked to edstays through stay_id, and most tables have more than one row per stay_id.
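The star schema can be sketched with an in-memory SQLite database, with edstays at the center and the other tables joined on stay_id. The tables below carry only a subset of the documented columns, the rows are synthetic, and SQLite is used purely for illustration; it is not the loading code referenced for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edstays (subject_id INTEGER, hadm_id INTEGER,
                      stay_id INTEGER PRIMARY KEY, intime TEXT, outtime TEXT);
CREATE TABLE triage (subject_id INTEGER, stay_id INTEGER,
                     acuity INTEGER, chiefcomplaint TEXT);
CREATE TABLE vitalsign (subject_id INTEGER, stay_id INTEGER,
                        charttime TEXT, heartrate REAL);

INSERT INTO edstays VALUES
  (100, NULL, 3001, '2150-01-01 10:00', '2150-01-01 14:30'),
  (100, 2001, 3002, '2150-06-09 22:15', '2150-06-10 03:45');
INSERT INTO triage VALUES
  (100, 3001, 2, 'Chest pain'),
  (100, 3002, 3, 'Abd pain');
INSERT INTO vitalsign VALUES
  (100, 3001, '2150-01-01 10:30', 92.0),
  (100, 3001, '2150-01-01 11:15', 84.0);
""")

# Every data table joins to the central edstays table through stay_id.
per_stay = conn.execute("""
    SELECT e.stay_id, t.acuity, COUNT(v.charttime) AS n_vitals
    FROM edstays e
    LEFT JOIN triage t ON t.stay_id = e.stay_id
    LEFT JOIN vitalsign v ON v.stay_id = e.stay_id
    GROUP BY e.stay_id, t.acuity
    ORDER BY e.stay_id
""").fetchall()
```

LEFT JOIN is used so that stays without any charted vital signs still appear, with a count of zero.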
MIMIC-ED may be analyzed using any number of software programs, including relational database management systems. Code for loading MIMIC-ED into PostgreSQL is provided in an open source repository [8,9]. The repository also contains code for deriving concepts, tutorials, and data analysis notebooks, and acts as a forum for community discussion [8,9]. We further provide MIMIC-ED natively in cloud-based database services including Google BigQuery, allowing immediate use of the dataset for credentialed investigators.
Data Linkage
MIMIC-ED is usable as a standalone research database, but may also be linked to MIMIC-IV and MIMIC-CXR [1,3]. The subject_id value provides an implicit link between the datasets; that is, all three databases refer to the same individual with the same subject_id. All ED stays in MIMIC-ED, represented by stay_id, are present in the MIMIC-IV transfers table. Linking to MIMIC-IV, for example, would provide the approximate age of ED patients who are present in the patients table. Laboratory measurements for ED patients would be available in the labevents table of the hosp module in MIMIC-IV, prescribed medications would be available in the prescriptions table of the hosp module in MIMIC-IV, and so on. Emergency department patients who are eventually admitted to the intensive care unit would have information regarding their subsequent ICU stay in the icu module of MIMIC-IV. As a result, MIMIC-ED may be used to acquire pre-ICU information for critically ill patients in MIMIC-IV. MIMIC-IV covers a wider time frame than MIMIC-ED, and as such not all emergency department stays in MIMIC-IV will be present in MIMIC-ED, but almost all ED admissions in MIMIC-ED will be present in MIMIC-IV.
Patients within MIMIC-CXR are a subset of patients within MIMIC-ED. As a result, many ED patients who have a chest x-ray ordered will have the image and radiology report available in MIMIC-CXR. Note that not all ED patients will have x-rays in MIMIC-CXR as MIMIC-ED covers a larger time frame, but almost all ED stays which have x-rays in MIMIC-CXR will have the associated stay present in MIMIC-ED.
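The linkage to MIMIC-IV described above can be sketched with two joins: subject_id for patient-level information and hadm_id for the hospital stay. The example uses an in-memory SQLite database with synthetic rows. The patients and admissions tables are reduced stand-ins, and their column names other than subject_id and hadm_id (anchor_age, admittime) are assumptions that should be checked against the MIMIC-IV documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edstays (subject_id INTEGER, hadm_id INTEGER, stay_id INTEGER);
-- Reduced stand-ins for MIMIC-IV tables; column names are assumptions.
CREATE TABLE patients (subject_id INTEGER, anchor_age INTEGER);
CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER, admittime TEXT);

INSERT INTO edstays VALUES (100, NULL, 3001), (100, 2001, 3002);
INSERT INTO patients VALUES (100, 52);
INSERT INTO admissions VALUES (100, 2001, '2150-06-10 03:45');
""")

linked = conn.execute("""
    SELECT e.stay_id,
           p.anchor_age,
           a.hadm_id IS NOT NULL AS has_admission
    FROM edstays e
    LEFT JOIN patients p ON p.subject_id = e.subject_id
    LEFT JOIN admissions a ON a.hadm_id = e.hadm_id
    ORDER BY e.stay_id
""").fetchall()
```

Stays with a NULL hadm_id never match an admissions row, so has_admission is 0 for ED visits that did not lead to a hospital admission.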
Limitations
Data contained within MIMIC-ED are collected during routine clinical care, and their use for research is secondary to their use in clinical care. The data may contain implicit biases as a result of local data collection practices, implausible values for measurements, and missing documentation for provided treatments. Many interventions, including major events such as endotracheal intubation, are not documented clearly. Researchers should take care to address these issues in their work.
Release Notes
The current version of MIMIC-ED is v1.0, the initial release of MIMIC-ED.
Acknowledgements
We would like to thank the Beth Israel Deaconess Medical Center for their continued collaboration and support of MIMIC. In particular we thank Carolyn Conti, Alvin Gayles, Ayad Shammout, and Lu Shen for their help with data extraction.
Conflicts of Interest
Nothing to declare.
References
1. Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2021). MIMIC-IV (version 1.0). PhysioNet. https://doi.org/10.13026/s6n6-xd98.
2. Health Insurance Portability and Accountability Act [HIPAA] of 1996, Pub. L. No. 104-191. https://www.congress.gov/104/plaws/publ191/PLAW-104publ191.pdf
3. Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. (2019). MIMIC-CXR Database (version 2.0.0). PhysioNet. https://doi.org/10.13026/C2JT1Q.
4. Johnson, A., Lungren, M., Peng, Y., Lu, Z., Mark, R., Berkowitz, S., & Horng, S. (2019). MIMIC-CXR-JPG - chest radiographs with structured labels (version 2.0.0). PhysioNet. https://doi.org/10.13026/8360-t248.
5. Johnson, A.E.W., Pollard, T.J., Berkowitz, S.J. et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data 6, 317 (2019). https://doi.org/10.1038/s41597-019-0322-0
6. Johnson AEW, Bulgarelli L, and Pollard T. 2020. Deidentification of free-text medical records using pre-trained bidirectional transformers. In Proceedings of the ACM Conference on Health, Inference, and Learning (CHIL '20). Association for Computing Machinery, New York, NY, USA, 214–221. DOI:https://doi.org/10.1145/3368555.3384455
7. Pyxis Medstation Website. https://www.bd.com/en-us/offerings/capabilities/medication-and-supply-management/medication-and-supply-management-technologies/pyxis-medication-technologies/pyxis-medstation-es-system [Accessed: 10 April 2021]
8. MIMIC Code Repository on GitHub. https://github.com/MIT-LCP/mimic-code/ [Accessed: 1 May 2021]
9. Alistair E W Johnson, David J Stone, Leo A Celi, Tom J Pollard, The MIMIC Code Repository: enabling reproducibility in critical care research, Journal of the American Medical Informatics Association, Volume 25, Issue 1, January 2018, Pages 32–39, https://doi.org/10.1093/jamia/ocx084
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.0):
https://doi.org/10.13026/77z6-9w59
DOI (latest version):
https://doi.org/10.13026/1cjn-2370
Topics:
ed
emergency
mimic-iv
mimic
Project Website:
https://mimic-iv.mit.edu
Files
To access the files, you must:
- be a credentialed user
- complete the required training: CITI Data or Specimens Only Research
- sign the data use agreement for the project