Database Credentialed Access
MIMIC-IV-Ext-CLIF: MIMIC-IV in the Common Longitudinal ICU data Format (CLIF)
Zewei Liao, Shan Guleria, Kevin Smith, Rachel Baccile, Kaveri Chhikara, Dema Therese, Vaishvik Chaudhari, Michael Craig Burkhart, Brett Beaulieu-Jones, Snigdha Jain, Kathryn Connell, Kevin Buell, Juan Rojas, Patrick Lyons, Siva Bhavani, Catherine A Gao, Chad Hochberg, Nick Ingraham, William Parker, CLIF Consortium
Published: March 23, 2026. Version: 1.1.0
When using this resource, please cite:
Liao, Z., Guleria, S., Smith, K., Baccile, R., Chhikara, K., Therese, D., Chaudhari, V., Burkhart, M. C., Beaulieu-Jones, B., Jain, S., Connell, K., Buell, K., Rojas, J., Lyons, P., Bhavani, S., Gao, C. A., Hochberg, C., Ingraham, N., Parker, W., & Consortium, C. (2026). MIMIC-IV-Ext-CLIF: MIMIC-IV in the Common Longitudinal ICU data Format (CLIF) (version 1.1.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/j481-g420
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
Abstract
MIMIC-IV-Ext-CLIF is derived from MIMIC-IV v3.1 and transformed into the Common Longitudinal ICU data Format (CLIF). CLIF is an open-source critical care data model developed by a consortium of more than ten U.S. academic medical centers to standardize intensive care unit (ICU) data for multi-center research. While CLIF has demonstrated value in federated research settings, access has been limited to consortium members with institutional electronic health record (EHR) data. This dataset addresses that gap by providing CLIF-formatted data derived from the freely accessible de-identified MIMIC-IV dataset, so that researchers worldwide can use the CLIF format without institutional EHR access.
MIMIC-IV-Ext-CLIF contains 14 tables that follow CLIF 2.1.0, the latest CLIF version at the time of release. They cover core demographics (patient, hospitalization), clinical monitoring (vitals, labs), respiratory support, medication administration, and specialized ICU interventions (e.g., continuous renal replacement therapy). The transformation uses a reproducible ETL pipeline, with mapping decisions documented transparently in user-friendly spreadsheets.
Background
Each year, more than five million Americans suffer from critical illness, that is, acute organ failure that necessitates life-sustaining interventions. While electronic health records (EHRs) contain granular data that could inform better understanding and management of critical illness, large-scale EHR research is hampered by challenges related to data handling, security, and standardization [1].
To address these issues, a consortium of critical care clinicians and data scientists from over ten U.S. health systems created the Common Longitudinal Intensive Care Unit (ICU) data Format (CLIF). CLIF is an open-source data model that harmonizes a minimum set of ICU Data Elements (mCIDE) to support research in critical care. Its effectiveness has been demonstrated in federated studies analyzing data from over 100,000 ICU patients across multiple health systems, highlighting its utility in reproducible multi-center research, from mortality prediction validation to clinical subphenotyping [1].
However, implementing CLIF requires substantial data science and clinical expertise, and access has been limited to consortium institutions with EHR infrastructure. This creates a barrier for researchers who could benefit from CLIF's standardized format but lack institutional resources. MIMIC-IV-Ext-CLIF closes this gap by providing a freely accessible CLIF-formatted dataset derived from MIMIC-IV [2]. Researchers without institutional EHR access can now develop code against MIMIC-IV-Ext-CLIF and then scale their studies across the CLIF consortium.
Methods
The dataset is created following a four-step process: (1) query, (2) map, (3) program, and (4) validate.
(1) Query
An extensive query of the original MIMIC-IV database was performed to identify candidate data elements for each CLIF table. The query used case-insensitive keyword matching against MIMIC tables and fields, supplemented by the online documentation on the official MIMIC website and in the GitHub community.
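The keyword search described above can be sketched as follows. This is a minimal illustration, not the project's actual query code: the function name `find_candidates`, the in-memory `ITEMS` list, and every row except itemid 220045 ("Heart Rate") are assumptions made for the example.

```python
import re

# Hypothetical excerpt of a MIMIC item dictionary. Apart from 220045
# ("Heart Rate"), the rows are made up for illustration.
ITEMS = [
    {"itemid": 220045, "label": "Heart Rate"},
    {"itemid": 900001, "label": "heart rate alarm - high"},
    {"itemid": 900002, "label": "Respiratory Rate"},
    {"itemid": 900003, "label": "Temperature Site"},
]


def find_candidates(items, keywords, field="label"):
    """Return the items whose `field` contains any keyword, ignoring case."""
    if not keywords:
        return []
    # Escape each keyword so it is matched literally, then OR them together.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [item for item in items if pattern.search(str(item.get(field, "")))]


heart_rate_candidates = find_candidates(ITEMS, ["heart rate", "pulse"])
```

Substring matching of this kind deliberately over-collects (the alarm setting is returned alongside the measurement), which is why the candidates are then reviewed by hand in the mapping step.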
To systematically evaluate candidates, we developed utility functions that automatically generate comprehensive listings of MIMIC data elements with relevant statistics: minimum, mean, median, maximum values for numeric variables, and distinct categories and their frequency counts for categorical variables.
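A rough sketch of such summary utilities is shown below, using only the Python standard library. The function names and sample values are hypothetical; the project's own helpers may be organized differently.

```python
from collections import Counter
from statistics import mean, median


def summarize_numeric(values):
    """Minimum, mean, median, and maximum of the non-missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return {"n": 0, "min": None, "mean": None, "median": None, "max": None}
    return {
        "n": len(present),
        "min": min(present),
        "mean": mean(present),
        "median": median(present),
        "max": max(present),
    }


def summarize_categorical(values):
    """Distinct categories with their frequency counts, most frequent first."""
    return Counter(v for v in values if v is not None).most_common()


# Made-up values for one numeric and one categorical candidate item.
heart_rates = [72, 88, None, 110, 64]
rhythms = ["SR", "AF", "SR", None, "SR"]
numeric_summary = summarize_numeric(heart_rates)
categorical_summary = summarize_categorical(rhythms)
```

Summaries like these let a reviewer spot implausible ranges or unexpected category labels before deciding how an item should be mapped.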
(2) Map
Candidate MIMIC data elements, along with their statistical summaries, are documented in a user-friendly spreadsheet [3], where they are reviewed and mapped to CLIF mCIDE categories by a group of one to three expert physician-scientists and one to two data scientists. In most cases, each MIMIC item or data element reviewed is assigned one of the following decision labels:
- TO MAP, AS IS: Direct mapping to CLIF category without transformation
- TO MAP, CONVERT UOM: Mapping requires unit of measurement conversion (e.g., mg/dL to mmol/L)
- NO MAPPING: This MIMIC data element has no appropriate CLIF counterpart and should not be mapped
- MAPPED ELSEWHERE: This data element is captured in a different CLIF table or field
- UNSURE: Requires additional clinical or technical review
- NOT AVAILABLE: CLIF data element not present or insufficient in MIMIC-IV
This open-access documentation approach prioritizes transparency, allowing clinicians and researchers to evaluate mapping decisions without requiring programming expertise. The spreadsheet also tracks review status, reviewer identities, and actionable next steps.
For each mapping, we preserve both the MIMIC-specific terminology (in *_name fields) and the standardized CLIF category (in *_category fields). For example, MIMIC's itemid 220045 ("Heart Rate") maps to CLIF vital_category = "heart_rate" while vital_name preserves the original MIMIC label. This dual representation enables both standardization and traceability.
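The dual representation can be illustrated with a small lookup. In this sketch only the 220045 row comes from the text; the function name, the `vital_value` output field, and the unmapped item 900002 are assumptions for illustration.

```python
# Hypothetical mapping table keyed by MIMIC itemid. Only the 220045 row comes
# from the text above.
VITALS_MAPPING = {
    220045: {"vital_name": "Heart Rate", "vital_category": "heart_rate"},
}


def to_clif_vital(event, mapping=VITALS_MAPPING):
    """Attach vital_name and vital_category to an event; None if unmapped."""
    entry = mapping.get(event["itemid"])
    if entry is None:
        return None
    return {
        # Original MIMIC label, kept for traceability.
        "vital_name": entry["vital_name"],
        # Standardized CLIF category, used for cross-site analysis.
        "vital_category": entry["vital_category"],
        # Assumed output field name for the measured value.
        "vital_value": event["valuenum"],
    }


mapped = to_clif_vital({"itemid": 220045, "valuenum": 84.0})
unmapped = to_clif_vital({"itemid": 900002, "valuenum": 1.0})
```

Filtering on `vital_category` gives the standardized view, while `vital_name` lets a user trace any row back to the MIMIC item it came from.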
(3) Program
Mapping decisions documented in the spreadsheet are implemented as modular Python scripts using modern data engineering frameworks. Each CLIF table is built using the Hamilton DAG pattern [4] to ensure modularity, testability, and reproducibility. The pipeline uses exported CSV copies of the mapping spreadsheet as the "source of truth," avoiding error-prone hard-coding. When a mapping decision changes (e.g., updating "TO MAP, AS IS" to "NO MAPPING"), re-running the pipeline automatically incorporates the change without code modification. The ETL pipeline is implemented in the CLIF-MIMIC GitHub repository [5] and is publicly available for review and reuse.
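The "source of truth" idea can be sketched without any framework: the code reads decisions from the exported CSV rather than listing item IDs itself. The sketch below does not use Hamilton, and the column names (`itemid`, `label`, `vital_category`, `decision`) and the 900001 row are assumptions, not the spreadsheet's actual layout.

```python
import csv
import io

# Decision labels that mean "include this item", as listed in the Map step.
MAPPED_DECISIONS = {"TO MAP, AS IS", "TO MAP, CONVERT UOM"}


def load_mapping(csv_text):
    """Build {itemid: category} from the rows whose decision says to map them."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        int(row["itemid"]): row["vital_category"]
        for row in reader
        if row["decision"].strip().upper() in MAPPED_DECISIONS
    }


# Hypothetical CSV export; the column names and the 900001 row are assumptions.
EXPORT_V1 = '''itemid,label,vital_category,decision
220045,Heart Rate,heart_rate,"TO MAP, AS IS"
900001,Heart Rate Alarm - High,heart_rate,"TO MAP, AS IS"
'''

# The same export after a reviewer changes one decision in the spreadsheet.
EXPORT_V2 = EXPORT_V1.replace(
    '900001,Heart Rate Alarm - High,heart_rate,"TO MAP, AS IS"',
    '900001,Heart Rate Alarm - High,heart_rate,"NO MAPPING"',
)

mapping_v1 = load_mapping(EXPORT_V1)
mapping_v2 = load_mapping(EXPORT_V2)
```

Here `load_mapping` is identical for both exports; only the CSV content differs, so the changed decision drops item 900001 from the result without any edit to the code.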
(4) Validate
Validation occurs at multiple levels to ensure CLIF 2.1.0 compliance and data quality.
The pandera framework [6] is deployed to validate the schema of each transformed CLIF table, checking for compliance in data types, nullability, and permissible mCIDE categories. These validations are accompanied by more complex and comprehensive checks using tools in the CLIF ecosystem such as CLIF TableOne [7] and CLIF Lighthouse [8] against CLIF consortium-wide quality benchmarks.
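The three kinds of schema check named above (data type, nullability, permissible categories) can be sketched in plain Python. This is a stand-in written for illustration, not pandera's API and not the project's actual schemas; the column names other than `hospitalization_id` and `vital_category`, and the permitted category set, are assumptions.

```python
from datetime import datetime

# Hypothetical schema fragment for a vitals-like table. The authoritative
# column definitions and category lists live in the CLIF data dictionary.
SCHEMA = {
    "hospitalization_id": {"type": str, "nullable": False},
    "recorded_dttm": {"type": datetime, "nullable": False},
    "vital_category": {
        "type": str,
        "nullable": False,
        "allowed": {"heart_rate", "sbp", "dbp"},
    },
    "vital_value": {"type": float, "nullable": True},
}


def validate_rows(rows, schema=SCHEMA):
    """Return (row_index, column, problem) tuples; an empty list means valid."""
    errors = []
    for i, row in enumerate(rows):
        for column, rule in schema.items():
            value = row.get(column)
            if value is None:
                if not rule["nullable"]:
                    errors.append((i, column, "null not allowed"))
                continue
            if not isinstance(value, rule["type"]):
                errors.append((i, column, "wrong type"))
            elif "allowed" in rule and value not in rule["allowed"]:
                errors.append((i, column, "category not permitted"))
    return errors
```

A declarative validation library expresses the same rules as a schema object and reports all failures at once, which is generally preferable to hand-written loops in a real pipeline.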
For complex transformations such as flattening the timestamps in the medication administration tables, unit tests are written to ensure the robustness of the transformation.
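As an illustration of what such a unit test might look like, the sketch below assumes that "flattening" means turning interval records (a start time, an end time, and a rate) into a time-ordered sequence of point events. That reading, the function `flatten_intervals`, and the field names `admin_dttm`, `rate`, and `action` are all assumptions; the actual transformation and its tests are in the project repository [5].

```python
from datetime import datetime


def flatten_intervals(intervals):
    """Turn (start, end, rate) infusion intervals into time-ordered point events.

    Each interval yields a "start" event carrying its rate and a "stop" event
    with rate 0. When one interval ends exactly where the next begins, the
    "stop" is dropped so the timeline shows a single event at that time.
    Overlapping intervals are not handled in this sketch.
    """
    ordered = sorted(intervals, key=lambda iv: iv["start"])
    events = []
    for i, iv in enumerate(ordered):
        events.append({"admin_dttm": iv["start"], "rate": iv["rate"], "action": "start"})
        next_start = ordered[i + 1]["start"] if i + 1 < len(ordered) else None
        if next_start != iv["end"]:
            events.append({"admin_dttm": iv["end"], "rate": 0.0, "action": "stop"})
    return events


def _at(hour):
    """Helper: a timestamp on an arbitrary de-identified date."""
    return datetime(2150, 1, 1, hour)


def _summary(events):
    return [(e["admin_dttm"].hour, e["rate"], e["action"]) for e in events]


def test_back_to_back_intervals_share_one_timestamp():
    events = flatten_intervals([
        {"start": _at(10), "end": _at(12), "rate": 4.0},
        {"start": _at(8), "end": _at(10), "rate": 2.0},
    ])
    assert _summary(events) == [(8, 2.0, "start"), (10, 4.0, "start"), (12, 0.0, "stop")]


def test_gap_between_intervals_keeps_the_stop():
    events = flatten_intervals([
        {"start": _at(8), "end": _at(9), "rate": 2.0},
        {"start": _at(11), "end": _at(12), "rate": 3.0},
    ])
    assert _summary(events) == [
        (8, 2.0, "start"), (9, 0.0, "stop"), (11, 3.0, "start"), (12, 0.0, "stop"),
    ]
```

The test functions follow the pytest naming convention, so a test runner would discover them automatically; they can also be called directly.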
Data Description
The dataset consists of 14 CLIF tables derived from MIMIC-IV v3.1, each stored as a separate Parquet file, along with a supplemental table (clif_patient_assessments_raw_gcs) containing non-imputed GCS scores taken directly from chartevents. Following CLIF's entity-relationship model [1], the tables are organized by clinical information type and linked by patient_id and hospitalization_id. The anchor tables — patient, hospitalization, and adt (admission-discharge-transfer) — capture core demographics, encounter-level admission and discharge details, and patient movement across hospital locations, respectively. The remaining longitudinal event tables capture clinical events recorded over the course of each ICU stay, organized by domain: clinical monitoring (vitals, labs, patient_assessments, position), respiratory care (respiratory_support), medication administration (medication_admin_continuous, medication_admin_intermittent), specialized ICU interventions (crrt_therapy), and administrative records (code_status, hospital_diagnosis, patient_procedures).
For detailed descriptions of each CLIF table, see the data dictionary for the latest CLIF 2.1.0 version [9].
mimic-iv-ext-clif/
├── README.md
├── clif_patient.parquet
├── clif_hospitalization.parquet
├── clif_adt.parquet
├── clif_vitals.parquet
├── clif_labs.parquet
├── clif_respiratory_support.parquet
├── clif_patient_assessments.parquet
├── clif_patient_assessments_raw_gcs.parquet
├── clif_medication_admin_continuous.parquet
├── clif_medication_admin_intermittent.parquet
├── clif_position.parquet
├── clif_crrt_therapy.parquet
├── clif_code_status.parquet
├── clif_hospital_diagnosis.parquet
└── clif_patient_procedures.parquet
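The way the tables link through patient_id and hospitalization_id can be sketched with a few made-up rows. In practice the Parquet files would be read with a dataframe library and joined there; the in-memory rows below, and every column other than the two linking keys and `vital_category`, are assumptions for illustration.

```python
from collections import defaultdict

# Made-up rows standing in for three of the Parquet files.
patients = [{"patient_id": "P1"}]
hospitalizations = [
    {"hospitalization_id": "H1", "patient_id": "P1"},
    {"hospitalization_id": "H2", "patient_id": "P1"},
]
vitals = [
    {"hospitalization_id": "H1", "vital_category": "heart_rate", "vital_value": 72.0},
    {"hospitalization_id": "H2", "vital_category": "heart_rate", "vital_value": 95.0},
    {"hospitalization_id": "H2", "vital_category": "heart_rate", "vital_value": 101.0},
]


def vitals_by_patient(patients, hospitalizations, vitals):
    """Group vitals rows under each patient by following the two linking keys."""
    known_patients = {p["patient_id"] for p in patients}
    # hospitalization_id -> patient_id, taken from the hospitalization table.
    hosp_to_patient = {h["hospitalization_id"]: h["patient_id"] for h in hospitalizations}
    grouped = defaultdict(list)
    for row in vitals:
        patient_id = hosp_to_patient.get(row["hospitalization_id"])
        if patient_id in known_patients:
            grouped[patient_id].append(row)
    return dict(grouped)


grouped_vitals = vitals_by_patient(patients, hospitalizations, vitals)
```

Event tables carry hospitalization_id, so reaching patient-level attributes goes through the hospitalization table, as the lookup dictionary in the sketch does.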
Usage Notes
Reuse potential
As a freely accessible implementation of the CLIF format, this dataset offers substantial reuse potential for researchers both within and outside the CLIF consortium. For researchers who already have CLIF-formatted institutional data, it can serve as a validation dataset for code development and project prototyping. For researchers currently building their own CLIF ETL pipelines, it can serve as a reference implementation for certain CLIF-specific transformations. For researchers without CLIF-formatted institutional data, it provides a low-barrier entry point to the CLIF format: code developed against this dataset can be scaled across the entire CLIF consortium, and any researcher can reproduce findings from CLIF consortium studies using this open-access implementation.
This dataset has already been used in CLIF projects examining heterogeneity in adherence to lung-protective ventilation [10], rates and outcomes associated with ICU readmission [11], early opportunities for mobilization in patients on mechanical ventilation [12], and the validation of machine learning models that predict short-term risk of ventilator-associated pneumonia [13]. Each project has its own associated code repository.
Known issues or limitations
The following are select issues and mapping considerations in the current release. For a comprehensive listing of all issues encountered and decisions made when mapping MIMIC-IV to CLIF, see the ISSUESLOG [14].
Race and ethnicity mapping. In MIMIC-IV, race and ethnicity are documented per encounter and may vary across encounters for the same patient. To assign a unique value in CLIF's patient table, we select the highest-frequency informative value (excluding "Unknown" and "Other"), breaking ties by recency.
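The selection rule described above can be sketched as follows. The function name, the (admission time, value) input shape, the case-insensitive comparison, and the `None` fallback when no informative value exists are assumptions; the text does not specify them.

```python
from collections import Counter

# Values treated as uninformative, per the text ("Unknown" and "Other").
UNINFORMATIVE = {"unknown", "other"}


def select_race(encounters):
    """Pick one race value from (admit_time, race) pairs.

    Chooses the most frequent informative value; ties go to the value seen in
    the most recent encounter. Returns None if no informative value exists.
    """
    informative = [
        (t, r) for t, r in encounters if r.strip().lower() not in UNINFORMATIVE
    ]
    if not informative:
        return None
    counts = Counter(r for _, r in informative)
    # Latest encounter time at which each value was recorded.
    last_seen = {}
    for t, r in informative:
        last_seen[r] = max(t, last_seen.get(r, t))
    # Compare by frequency first, then by recency.
    return max(counts, key=lambda r: (counts[r], last_seen[r]))
```

For example, a patient recorded as "ASIAN" in 2150 and "WHITE" in 2151 has a frequency tie, so the more recent value "WHITE" is selected; two "ASIAN" encounters followed by one "WHITE" encounter would yield "ASIAN".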
Lab order datetime. CLIF's labs table includes three datetime fields: lab_order_dttm, lab_collect_dttm, and lab_result_dttm. MIMIC-IV's two-timestamp model (charttime for specimen acquisition and storetime for result availability) does not include a direct analog for order time. To ensure the field is populated for downstream use, lab_order_dttm is set to the same charttime value used for lab_collect_dttm. Users should be aware that these two fields are identical in this derived dataset.
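In code, the timestamp assignment amounts to the following minimal sketch. The function name and the dictionary-based row are hypothetical, and the use of storetime for lab_result_dttm is inferred from the description of storetime as result availability.

```python
def lab_datetimes(mimic_row):
    """Derive CLIF's three lab datetime fields from MIMIC's two timestamps."""
    return {
        # MIMIC-IV has no order time, so charttime is reused here.
        "lab_order_dttm": mimic_row["charttime"],
        # charttime: specimen acquisition.
        "lab_collect_dttm": mimic_row["charttime"],
        # storetime: result availability.
        "lab_result_dttm": mimic_row["storetime"],
    }


example = lab_datetimes(
    {"charttime": "2150-03-01 06:00:00", "storetime": "2150-03-01 07:15:00"}
)
```

One practical consequence is that any order-to-collection interval computed from this dataset is always zero, so analyses of ordering delays should not rely on lab_order_dttm here.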
Medication route mapping. In MIMIC-IV, medication route information is dispersed across multiple fields (ordercategoryname, secondaryordercategoryname, ordercomponenttypedescription, ordercategorydescription, category). In most cases, a combination of these fields determines the med_route_category in CLIF. In rarer cases, the specific medication is also needed—for example, for the same ordercategoryname = '11-Prophylaxis (Non IV)', Heparin Sodium is mapped to intramuscular ('im') while Pantoprazole is mapped to enteral. Two medications—Insulin-Humalog and Naloxone—are marked only as '05-Med Bolus' and are currently mapped to 'iv', though they can theoretically be administered via IM or inhalation in rare cases.
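A two-tier lookup is one way to express this logic: medication-specific rules are consulted first, then rules based on the order fields alone. The sketch below is illustrative only. The four medication-specific entries restate the examples in the text (with 'enteral' assumed as the stored string), while the function name, the use of a single field in the fallback tier, and the "01-Drips" rule are hypothetical.

```python
# Medication-specific rules keyed by (ordercategoryname, medication).
# These four entries restate the examples given in the text.
MEDICATION_ROUTE = {
    ("11-Prophylaxis (Non IV)", "Heparin Sodium"): "im",
    ("11-Prophylaxis (Non IV)", "Pantoprazole"): "enteral",
    ("05-Med Bolus", "Insulin-Humalog"): "iv",
    ("05-Med Bolus", "Naloxone"): "iv",
}

# Field-only rules. This entry is hypothetical and only shows the fallback tier;
# the real pipeline combines several MIMIC fields, not ordercategoryname alone.
CATEGORY_ROUTE = {"01-Drips": "iv"}


def med_route_category(ordercategoryname, medication):
    """Return the route for a medication order, or None if no rule applies."""
    specific = MEDICATION_ROUTE.get((ordercategoryname, medication))
    if specific is not None:
        return specific
    return CATEGORY_ROUTE.get(ordercategoryname)
```

Checking the medication-specific table first keeps the exceptions visible in one place, so that a general rule for an order category cannot silently override them.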
Complementary resources
- To view or run the ETL pipeline that generates all the CLIF tables presented, see the CLIF-MIMIC GitHub repository [5]
- To review the mapping decisions, see the MIMIC-to-CLIF mapping spreadsheet [3]
- To understand the structure of each CLIF table, review the CLIF data dictionary [9]
- To learn more about CLIF, visit the CLIF website [15]
Release Notes
If this page lags behind the most recent release, refer to the CHANGELOG [16] in this project's GitHub repository for the latest release as well as past releases.
| MIMIC version | CLIF version | Latest CLIF-MIMIC release | Status |
|---|---|---|---|
| IV-3.1 | 2.1.0 | v1.1.0 | 🧩 partial (✅ stable on the already-released tables) |
| IV-3.1 | 2.0.0 | v0.1.0 | ✅ stable |
[v1.1.0] - 2026-02-13
Readme
Tables updated: labs, patient_assessments.
New
- Improve lab_category coverage in the labs table by adding basophils_percent, basophils_absolute, lymphocytes_absolute, eosinophils_absolute, neutrophils_absolute, monocytes_absolute and expanding capture of wbc.
- Add patient_assessments_raw_gcs supplemental table with non-imputed GCS scores taken directly from chartevents. See ISSUESLOG [14] for details on the difference from the imputed GCS in patient_assessments.
Ethics
MIMIC-IV-Ext-CLIF is derived from MIMIC-IV and is covered by the same IRB.
Acknowledgements
We thank the MIMIC team at MIT Laboratory for Computational Physiology and Beth Israel Deaconess Medical Center for creating and maintaining the MIMIC-IV database, without which this public CLIF implementation would not be possible. We are in particular grateful to Dr. Alistair Johnson and Dr. Tom Pollard for graciously answering our questions during both the ETL and project submission processes. We thank PhysioNet for hosting and distributing critical care datasets that enable reproducible research worldwide.
Dr. Lyons is supported by NIH/NCI K08CA270383. Dr. Rojas is supported by NIH/NIDA R01DA051464 and the Robert Wood Johnson Foundation and has received consulting fees from Truveta. Dr. Buell is supported by an institutional research training grant (NIH/NHLBI T32 HL007605). Dr. Bhavani is supported by NIH/NIGMS K23GM144867. Dr. Gao is supported by NIH/NHLBI K23HL169815, a Parker B. Francis Opportunity Award, and an American Thoracic Society Unrestricted Grant. Dr. Hochberg is supported by NIH/NHLBI K23HL169743. Dr. Ingraham is supported by NIH/NHLBI K23HL166783. Dr. Barker is supported by an institutional research training grant (NIH/NHLBI T32 HL007749). Dr. Parker is supported by NIH K08HL150291, R01LM014263, and the Greenwall Foundation.
Conflicts of Interest
The author(s) have no conflicts of interest to declare.
References
1. Rojas JC, Lyons PG, Chhikara K, Chaudhari V, Bhavani SV, Nour M, et al. A common longitudinal intensive care unit data format (CLIF) for critical illness research. Intensive Care Med [Internet]. 2025 Mar 1 [cited 2025 Nov 11];51(3):556–69. Available from: https://doi.org/10.1007/s00134-025-07848-7
2. Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data [Internet]. 2023 Jan 3 [cited 2025 Nov 11];10(1):1. Available from: https://www.nature.com/articles/s41597-022-01899-x
3. MIMIC-to-CLIF Mapping Spreadsheet. Common Longitudinal ICU data Format (CLIF) [Internet]. 2025 [cited 2025 Nov 11]. Available from: https://docs.google.com/spreadsheets/d/1QhybvnlIuNFw0t94JPE6ei2Ei6UgzZAbGgwjwZCTtxE
4. Krawczyk S, Izzy E ben, Quinn D. Hamilton: enabling software engineering best practices for data transformations via generalized dataflow graphs. In: Cappiello C, Geisler S, Vidal ME, editors. 1st International Workshop on Data Ecosystems co-located with 48th International Conference on Very Large Databases (VLDB 2022) [Internet]. 2022. p. 41–50. Available from: https://ceur-ws.org/Vol-3306/paper5.pdf
5. CLIF-MIMIC GitHub Repository. Common Longitudinal ICU data Format (CLIF) [Internet]. 2025 [cited 2025 Nov 11]. Available from: https://github.com/Common-Longitudinal-ICU-data-Format/CLIF-MIMIC
6. Bantilan N. pandera: Statistical Data Validation of Pandas Dataframes. In: Agarwal M, Calloway C, Niederhut D, Shupe D, editors. Proceedings of the 19th Python in Science Conference. 2020. p. 116–24.
7. CLIF TableOne GitHub Repository. Common Longitudinal ICU data Format (CLIF) [Internet]. 2025 [cited 2025 Nov 11]. Available from: https://github.com/Common-Longitudinal-ICU-data-Format/CLIF-TableOne
8. CLIF Lighthouse GitHub Repository. Common Longitudinal ICU data Format (CLIF) [Internet]. 2025 [cited 2025 Nov 11]. Available from: https://github.com/Common-Longitudinal-ICU-data-Format/CLIF-Lighthouse
9. CLIF Data Dictionary 2.1.0. Common Longitudinal ICU data Format (CLIF) [Internet]. 2025 [cited 2025 Nov 11]. Available from: https://clif-icu.com/data-dictionary/data-dictionary-2.1.0
10. Ingraham NE, Chhikara K, Eddington C, Ortiz AC, Schmid B, Weissman GE, et al. The Association of Sex and Height With Low-tidal Volume Ventilation in a Multi-center Cohort of Critically Ill Adults. Am J Respir Crit Care Med [Internet]. 2025 May [cited 2025 Nov 11];211(Abstracts):A7695–A7695. Available from: https://www.atsjournals.org/doi/abs/10.1164/ajrccm.2025.211.Abstracts.A7695
11. Amagai S, Chaudhari V, Chhikara K, Ingraham NE, Hochberg CH, Barker AK, et al. The Epidemiology of ICU Readmissions Across Ten Health Systems. Crit Care Explor [Internet]. 2025 Nov [cited 2025 Nov 11];7(11):e1341. Available from: https://journals.lww.com/ccejournal/fulltext/2025/11000/the_epidemiology_of_icu_readmissions_across_ten.1.aspx
12. Patel BK, Chhikara K, Liao Z, Ingraham NE, Eddington C, Jain S, et al. Identifying Windows of Opportunity for Early Mobilization of Mechanically Ventilated Patients: Multi-center Comparative Analysis of Clinical Trial and Consensus Guideline Eligibility Criteria. Am J Respir Crit Care Med [Internet]. 2025 May [cited 2025 Nov 11];211(Abstracts):A2870–A2870. Available from: https://www.atsjournals.org/doi/abs/10.1164/ajrccm.2025.211.Abstracts.A2870
13. Peltekian AK, Liao WT, Guggilla V, Markov N, Senkow K, Liao Z, et al. Developing and externally validating machine learning models to forecast short-term risk of ventilator-associated pneumonia [Internet]. medRxiv; 2026 [cited 2026 Feb 13]. p. 2026.01.28.26344858. Available from: https://www.medrxiv.org/content/10.64898/2026.01.28.26344858v1
14. CLIF-MIMIC Issues Log. Common Longitudinal ICU data Format (CLIF) [Internet]. 2025 [cited 2025 Nov 11]. Available from: https://github.com/Common-Longitudinal-ICU-data-Format/CLIF-MIMIC/blob/main/ISSUESLOG.md
15. CLIF: Common Longitudinal ICU data Format. Common Longitudinal ICU data Format (CLIF) [Internet]. 2025 [cited 2025 Nov 11]. Available from: https://clif-icu.com/
16. CLIF-MIMIC Changelog. Common Longitudinal ICU data Format (CLIF) [Internet]. 2025 [cited 2025 Nov 11]. Available from: https://github.com/Common-Longitudinal-ICU-data-Format/CLIF-MIMIC/blob/main/CHANGELOG.md
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.1.0):
https://doi.org/10.13026/j481-g420
DOI (latest version):
https://doi.org/10.13026/86jk-m086
Topics:
critical care
mimic
clif
the common longitudinal icu data format
Project Website:
https://github.com/Common-Longitudinal-ICU-data-Format/CLIF-MIMIC
Files
To access the files, you must:
- be a credentialed user
- complete the required training: CITI Data or Specimens Only Research
- sign the data use agreement for the project