Database Open Access
MIMIC-IV Clinical Database Demo
Published: June 22, 2022. Version: 1.0
When using this resource, please cite:
Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2022). MIMIC-IV Clinical Database Demo (version 1.0). PhysioNet. https://doi.org/10.13026/jwtp-v091.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
The Medical Information Mart for Intensive Care (MIMIC)-IV database comprises deidentified electronic health records for patients admitted to the Beth Israel Deaconess Medical Center. Access to MIMIC-IV is limited to credentialed users. Here, we provide an openly available demo of MIMIC-IV containing a subset of 100 patients. The dataset includes similar content to MIMIC-IV, but excludes free-text clinical notes. The demo may be useful for running workshops and for assessing whether MIMIC-IV is appropriate for a study before making an access request.
The increasing adoption of digital electronic health records has led to the existence of large datasets that could be used to carry out important research across many areas of medicine. Research progress has been held back, however, by the way that the datasets are curated and made available for research. The MIMIC datasets allow credentialed researchers around the world unprecedented access to real-world clinical data, helping to reduce the barriers to conducting important medical research. The public availability of the data allows studies to be reproduced and collaboratively improved in ways that would not otherwise be possible.
First, the set of individuals to include in the demo was chosen. Each person in MIMIC-IV is assigned a unique subject_id. As the subject_id is randomly generated, ordering by subject_id results in a random subset of individuals. We only considered individuals with an anchor_year_group value of 2011 - 2013 or 2014 - 2016 to ensure overlap with MIMIC-CXR v2.0.0. The first 100 subject_id values that satisfied the anchor_year_group criteria were selected for the demo dataset.
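The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name select_demo_subjects and the toy rows are hypothetical, and the rows are assumed to be dictionaries carrying the subject_id and anchor_year_group columns as strings, as they would be when read from a CSV file.

```python
# Year groups considered for the demo, written as they appear in the text above.
ALLOWED_GROUPS = {"2011 - 2013", "2014 - 2016"}


def select_demo_subjects(patients, n=100, allowed_groups=ALLOWED_GROUPS):
    """Return the n smallest subject_id values whose anchor_year_group is allowed.

    Because subject_id is randomly generated, taking the first n values in
    ascending order amounts to drawing a random subset.
    """
    eligible = sorted(
        int(row["subject_id"])
        for row in patients
        if row["anchor_year_group"] in allowed_groups
    )
    return eligible[:n]


# Toy rows standing in for the patients table (all values are made up).
toy_patients = [
    {"subject_id": "3", "anchor_year_group": "2014 - 2016"},
    {"subject_id": "1", "anchor_year_group": "2008 - 2010"},
    {"subject_id": "2", "anchor_year_group": "2011 - 2013"},
    {"subject_id": "4", "anchor_year_group": "2011 - 2013"},
]

print(select_demo_subjects(toy_patients, n=2))  # [2, 3]
```

Subject 1 is dropped because its year group falls outside the two allowed values, and subject 4 is dropped because only the first two eligible identifiers are requested.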
All tables from MIMIC-IV were included in the demo dataset. Tables containing patient information, such as emar or labevents, were filtered using the list of selected subject_id values. Tables that do not contain patient-level information were included in their entirety (e.g. d_items or d_labitems). Note that all tables that do not contain patient-level information are prefixed with the characters 'd_'.
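The table-level rule above might look like the following in Python. This is a minimal sketch, not the code used to build the demo: build_demo_table is a hypothetical helper, and each table is assumed to be an iterable of dictionaries in which patient-level tables carry a subject_id column.

```python
def build_demo_table(table_name, rows, selected_ids):
    """Apply the demo rule to a single table.

    Dictionary tables (names starting with 'd_') hold no patient-level
    information and are copied whole. Every other table is filtered down
    to rows belonging to the selected subject_id values.
    """
    if table_name.startswith("d_"):
        return list(rows)
    selected = set(selected_ids)
    return [row for row in rows if int(row["subject_id"]) in selected]


# Toy data (made up) showing both branches of the rule.
selected = [2, 3]
labevents = [
    {"subject_id": "1", "itemid": "50801"},
    {"subject_id": "2", "itemid": "50801"},
    {"subject_id": "3", "itemid": "50802"},
]
d_labitems = [{"itemid": "50801"}, {"itemid": "50802"}]

print(len(build_demo_table("labevents", labevents, selected)))    # 2
print(len(build_demo_table("d_labitems", d_labitems, selected)))  # 2
```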
Deidentification was performed following the same approach as the MIMIC-IV database. Protected health information (PHI) as listed in the HIPAA Safe Harbor provision was removed. Patient identifiers were replaced using a random cipher, resulting in deidentified integer identifiers for patients, hospitalizations, and ICU stays. Stringent rules were applied to structured columns based on the data type. Dates were shifted consistently using a random integer, removing seasonality, day of the week, and year information. Text fields were filtered by manually curated allow and block lists, as well as context-specific regular expressions. For example, columns containing dose values were filtered to only contain numeric values. If necessary, a free-text deidentification algorithm was applied to remove PHI from free text. Results of this algorithm were manually reviewed and verified to remove identified PHI.
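To illustrate what a consistent date shift means in practice, the sketch below draws one random offset per patient and applies it to all of that patient's timestamps, so that intervals within a patient's record are preserved. The details are assumptions made for illustration only: the source does not state the offset range, its direction, or its unit, and the names make_offsets and shift_timestamp are hypothetical.

```python
import random
from datetime import datetime, timedelta


def make_offsets(subject_ids, seed=None, max_days=36500):
    """Draw one random day offset per subject (range is an assumed value)."""
    rng = random.Random(seed)
    return {sid: timedelta(days=rng.randint(1, max_days)) for sid in subject_ids}


def shift_timestamp(timestamp, subject_id, offsets):
    """Shift a timestamp by the offset assigned to its subject."""
    return timestamp + offsets[subject_id]


# Toy example: two events for one made-up subject.
offsets = make_offsets([2, 3], seed=0)
admit = datetime(2015, 3, 1, 8, 30)
discharge = datetime(2015, 3, 4, 14, 0)

shifted_admit = shift_timestamp(admit, 2, offsets)
shifted_discharge = shift_timestamp(discharge, 2, offsets)

# The length of stay is unchanged by the shift.
print(shifted_discharge - shifted_admit == discharge - admit)  # True
```

Because every timestamp for a given subject moves by the same amount, durations such as length of stay survive the shift, while the shifted calendar dates no longer reveal the original year, season, or day of the week.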
MIMIC-IV is a relational database consisting of 26 tables. For a detailed description of the database structure, see the MIMIC-IV Clinical Database page or the MIMIC-IV online documentation. The demo has the same schema and structure as the equivalent version of MIMIC-IV.
Data files are distributed in comma separated value (CSV) format following the RFC 4180 standard. The dataset is also made available on Google BigQuery. Instructions for accessing the dataset on BigQuery are provided in the online MIMIC-IV documentation, under the cloud page.
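Since the files are gzip-compressed CSV, they can be read without first decompressing them to disk. The following is a minimal sketch using only the Python standard library; read_csv_gz is a hypothetical helper name, and UTF-8 encoding is an assumption rather than something the source states.

```python
import csv
import gzip


def read_csv_gz(path):
    """Yield each row of a gzip-compressed CSV file as a dictionary.

    newline="" lets the csv module handle line endings and quoted fields
    itself, which is what it expects for RFC 4180 style input.
    """
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


# Example usage, assuming d_labitems.csv.gz has been downloaded to the
# current directory (the local path is an assumption):
#
#     for row in read_csv_gz("d_labitems.csv.gz"):
#         print(row)
```

All values come back as strings, so numeric columns need an explicit conversion before analysis.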
The MIMIC-IV demo provides researchers with the opportunity to better understand MIMIC-IV data.
CSV files can be opened natively using any text editor or spreadsheet program. However, as some tables are large, it may be preferable to navigate the data via a relational database. We suggest either working with the data in Google BigQuery (see the "Files" section for access details) or creating an SQLite database using the CSV files. SQLite is a lightweight database format which stores all constituent tables in a single file, and SQLite databases interoperate well with a number of software tools.
Code is made available for use with MIMIC-IV on the MIMIC-IV code repository. Code provided includes derivation of clinical concepts, tutorials, and reproducible analyses.
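One way to create such an SQLite database is sketched below with the Python standard library. It is an illustration under assumptions, not an official loader: the function names are hypothetical, each table is named after its file, columns are created without declared types (so values are stored as text), and the files are assumed to be UTF-8 encoded.

```python
import csv
import gzip
import sqlite3
from pathlib import Path


def load_csv_into_sqlite(conn, csv_path):
    """Create one table from one .csv.gz file; the table is named after the file."""
    csv_path = Path(csv_path)
    table = csv_path.name.split(".")[0]  # e.g. "labevents.csv.gz" -> "labevents"
    with gzip.open(csv_path, "rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return table


def build_database(data_dir, db_path):
    """Load every .csv.gz file found under data_dir into a single SQLite file."""
    conn = sqlite3.connect(db_path)
    for csv_path in sorted(Path(data_dir).rglob("*.csv.gz")):
        load_csv_into_sqlite(conn, csv_path)
    return conn


# Example usage, assuming the files were downloaded into ./mimic-iv-demo
# (both paths are assumptions):
#
#     conn = build_database("mimic-iv-demo", "mimic_iv_demo.sqlite")
#     print(conn.execute("SELECT COUNT(*) FROM labevents").fetchone())
```

Leaving the columns untyped keeps the loader short, but empty CSV fields are stored as empty strings rather than NULL and numeric comparisons need a CAST. A schema with explicit column types would be the more careful choice for real analyses.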
Release notes for the demo follow the release notes for the MIMIC-IV database.
This project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was deidentified.
This research and development was supported by grants NIH-R01-EB017205, NIH-R01-EB001659, and NIH-R01-GM104987 from the National Institutes of Health. The authors would also like to thank Philips Healthcare and staff at the Beth Israel Deaconess Medical Center, Boston, for supporting database development, and Ken Pierce for providing ongoing support for the MIMIC research community.
Conflicts of Interest
The authors declare no competing financial interests.
- Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2021). MIMIC-IV (version 1.0). PhysioNet. https://doi.org/10.13026/s6n6-xd98.
- MIMIC Online Documentation. Accessed June 6th 2022. https://mimic.mit.edu/
- Shafranovich Y. Common format and MIME type for comma-separated values (CSV) files. https://www.hjp.at/doc/rfc/rfc4180.html
- Johnson AE, Stone DJ, Celi LA, Pollard TJ. The MIMIC Code Repository: enabling reproducibility in critical care research. Journal of the American Medical Informatics Association. 2018 Jan;25(1):32-9. https://github.com/MIT-LCP/mimic-code
Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files):
Open Data Commons Open Database License v1.0
Total uncompressed size: 14.2 MB.
Access the files
- Download the ZIP file (14.2 MB)
- Download the files using your terminal:
wget -r -N -c -np https://physionet.org/files/mimic-iv-demo/1.0/
| File | Size | Date |
|---|---|---|
| d_hcpcs.csv.gz | 498.3 KB | 2022-05-15 |
| d_icd_diagnoses.csv.gz | 1.7 MB | 2022-05-15 |
| d_icd_procedures.csv.gz | 1.0 MB | 2022-05-15 |
| d_labitems.csv.gz | 14.6 KB | 2022-05-15 |
| diagnoses_icd.csv.gz | 23.7 KB | 2022-05-15 |
| drgcodes.csv.gz | 7.3 KB | 2022-05-15 |
| emar.csv.gz | 697.1 KB | 2022-05-15 |
| emar_detail.csv.gz | 663.9 KB | 2022-05-15 |
| hcpcsevents.csv.gz | 974 B | 2022-05-15 |
| labevents.csv.gz | 1.8 MB | 2022-05-15 |
| microbiologyevents.csv.gz | 76.9 KB | 2022-05-15 |
| pharmacy.csv.gz | 492.1 KB | 2022-05-15 |
| poe.csv.gz | 590.4 KB | 2022-05-15 |
| poe_detail.csv.gz | 21.7 KB | 2022-05-15 |
| prescriptions.csv.gz | 416.9 KB | 2022-05-15 |
| procedures_icd.csv.gz | 6.4 KB | 2022-05-15 |
| services.csv.gz | 5.0 KB | 2022-05-15 |