Name: MIMIC-IV demo data in the Medical Event Data Standard (MEDS)
Published: Sept. 29, 2025
License: https://opendatacommons.org/licenses/odbl/index.html

Database Open Access

Robin Philippus van de Water , Ethan Steinberg , Michael Wornow , Patrick Rockenschaub , Matthew McDermott

Published: Sept. 29, 2025. Version: 0.0.1

When using this resource, please cite:
van de Water, R. P., Steinberg, E., Wornow, M., Rockenschaub, P., & McDermott, M. (2025). MIMIC-IV demo data in the Medical Event Data Standard (MEDS) (version 0.0.1). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/t2y8-ea41

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.

Abstract

This dataset is an automated ETL conversion of the MIMIC-IV Clinical Database Demo into the Medical Event Data Standard (MEDS). MEDS is a data schema for storing streams of medical events such as those sourced from Electronic Health Records or claims records. MEDS is intentionally a minimal standard, designed for maximum interoperability across datasets, existing tools, and model architectures. By providing a simple standardization layer between datasets and model-specific code, MEDS is intended to help make machine learning research for EHR data more reproducible, robust, computationally performant, and collaborative.

Background

The Medical Event Data Standard (MEDS) [1] is a data schema for storing streams of medical events such as those sourced from either Electronic Health Records or claims records. MEDS is intentionally a minimal standard, designed for maximum interoperability across datasets, existing tools, and model architectures. The MEDS schema is simple and scalable and supports a growing suite of tools that can help accelerate the development of machine learning models. Our community is actively expanding the ecosystem to cover both public and private datasets. Private datasets remain private, but a shared format helps results from models trained on private data to be reproduced with minimal friction in other settings, with other machine learning models, and on clinically relevant benchmark tasks. To better facilitate experimenting with MEDS and improve understanding of the format as a whole, we provide an official demo dataset. The MIMIC-IV demo has similarly been converted to the OMOP (Observational Medical Outcomes Partnership) standard [2, 3].

Methods

We created a pipeline that extracts the MIMIC-IV dataset into the MEDS format. The dataset provided here was generated using this pipeline, specifically v0.0.6 of the MIMIC-IV-MEDS Python package [5], available on the Python Package Index, PyPI (https://pypi.org/project/MIMIC-IV-MEDS/0.0.6/).

Generating the MEDS demo

The dataset was generated by first installing the package and setting the environment variables for the download of the MIMIC-IV demo dataset (v2.2) [4], using the following commands:

pip install MIMIC_IV_MEDS==0.0.6
export DATASET_DOWNLOAD_USERNAME=$PHYSIONET_USERNAME
export DATASET_DOWNLOAD_PASSWORD=$PHYSIONET_PASSWORD
export ROOT_OUTPUT_DIR=path/to/your/desired/directory

The following command was then run to convert the data to the MEDS format:

MEDS_extract-MIMIC_IV root_output_dir=$ROOT_OUTPUT_DIR do_demo=True

The MIMIC_IV_MEDS package uses MEDS-Transforms [6] to manage the data transformation. MEDS-Transforms can be adapted to create ETLs from any dataset to MEDS (see, for example, the list at: https://github.com/Medical-Event-Data-Standard#datasets--benchmarks).

While MEDS supports sharding, the demo dataset comprises a single shard due to the limited size of the dataset. Sharding is useful for larger datasets to enable tasks to be parallelized, but is unnecessary in this case.

Data Description

The MIMIC-IV demo dataset, and this MEDS transform of it, contains routinely collected electronic health record data for 100 critical care patients (referred to as "subjects"). The core MEDS dataset is contained within a data folder, which is split into three subsets:

data/train (80 subjects)
data/tuning, also known as validation (10 subjects)
data/held_out, also known as test (10 subjects)

Within each folder there is a Parquet file containing the event-based patient records. The MEDS data schema is simple, comprising only three required columns (subject_id, time, and code) and two optional columns (numeric_value and text_value). These data are supplemented by three metadata files:

metadata/codes.parquet: details on the unique codes used in the dataset and their parent codes.
metadata/dataset.json: general information about the ETL and dataset
metadata/subject_splits.parquet: specifies which subjects appear in the training, tuning, and held out subsets.
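
As a minimal sketch of the column layout described above, the following Python checks a list of column names against the three required and two optional MEDS columns. The helper names (check_meds_columns, parquet_column_names) are illustrative and are not part of any MEDS package; reading the names from a file assumes that PyArrow is installed.

```python
REQUIRED_COLUMNS = ("subject_id", "time", "code")
OPTIONAL_COLUMNS = ("numeric_value", "text_value")


def check_meds_columns(columns):
    """Return (missing_required, unexpected) for a list of column names."""
    present = set(columns)
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    known = set(REQUIRED_COLUMNS) | set(OPTIONAL_COLUMNS)
    unexpected = sorted(present - known)
    return missing, unexpected


def parquet_column_names(path):
    """Read only the column names of a Parquet file (assumes PyArrow)."""
    # Imported lazily so that check_meds_columns has no dependencies.
    import pyarrow.parquet as pq

    return pq.read_schema(path).names


if __name__ == "__main__":
    # Column names as listed in this section; no file access is needed.
    print(check_meds_columns(["subject_id", "time", "code", "numeric_value"]))
```

To check a real file, the output of parquet_column_names for a Parquet file under data/train could be passed to check_meds_columns.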

codes.parquet

This file contains metadata about the code vocabulary featured in the data files. It contains the following three columns:

code: The code value, of type string.
description: An optional free-text, human-readable description of the code, of type string.
parent_codes: An optional list of links to parent codes in this dataset or external ontology nodes associated with this code, of type list[string].
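
To illustrate how the parent_codes column links codes together, here is a small sketch that follows those links to collect every ancestor of a code. The function name and the example codes are invented for illustration and do not come from the dataset; the mapping is assumed to have been built from the code and parent_codes columns.

```python
def ancestors(code, parents_by_code):
    """Collect every code reachable by following parent_codes links."""
    seen = set()
    stack = list(parents_by_code.get(code) or [])
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue  # guards against cycles and repeated parents
        seen.add(parent)
        # A parent may be an external ontology node with no row of its own.
        stack.extend(parents_by_code.get(parent) or [])
    return sorted(seen)


# Hypothetical codes, invented for illustration only.
example = {
    "LAB//A": ["GROUP//X"],
    "GROUP//X": ["ROOT//ALL"],
    "ROOT//ALL": None,  # parent_codes is optional, so it may be empty
}
print(ancestors("LAB//A", example))
```

Parents that have no entry in the mapping, such as external ontology nodes, are still reported but are not expanded further.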

dataset.json

This file contains metadata about the dataset itself, including the following:

dataset_name: The name of the dataset, of type string.
dataset_version: The version of the dataset, of type string. Ensuring the version numbers used are meaningful and unique is important for reproducibility, but is ultimately not enforced by the MEDS schema and is left to the dataset creator.
etl_name: The name of the ETL process used to generate the dataset, of type string.
etl_version: The version of the ETL process used to generate the dataset, of type string.
meds_version: The version of the MEDS standard used to generate the dataset, of type string.
created_at: The timestamp at which the dataset was created, of type string in ISO 8601 format (note that this is not an official timestamp type, but rather a string representation of a timestamp as this is a JSON file).
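
Because created_at is stored as an ISO 8601 string, it has to be parsed before it can be used as a timestamp. The sketch below parses a dataset.json payload with the Python standard library and reports which of the fields named above are absent. The function name and all sample values are illustrative placeholders, not the contents of the published file.

```python
import json
from datetime import datetime

EXPECTED_KEYS = (
    "dataset_name",
    "dataset_version",
    "etl_name",
    "etl_version",
    "meds_version",
    "created_at",
)


def parse_dataset_metadata(text):
    """Return (metadata, absent_keys, created_at as datetime or None)."""
    meta = json.loads(text)
    absent = [key for key in EXPECTED_KEYS if key not in meta]
    created = meta.get("created_at")
    created_at = datetime.fromisoformat(created) if created else None
    return meta, absent, created_at


# Illustrative values only; these are not copied from the real file.
sample = json.dumps(
    {
        "dataset_name": "example_dataset",
        "dataset_version": "0.0.1",
        "etl_name": "example_etl",
        "etl_version": "0.0.6",
        "meds_version": "0.3.3",
        "created_at": "2025-05-09T12:00:00+00:00",
    }
)
meta, absent, created_at = parse_dataset_metadata(sample)
print(absent, created_at.isoformat())
```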

subject_splits.parquet

This file maps subject IDs to pre-defined splits of the dataset, such as training, hyperparameter tuning, and held-out sets. In the MEDS splits file, each row contains a subject_id (int64) and a split (string) column, where split is the name of the split in which that subject lives. For the three canonical AI/ML splits, MEDS uses the following split names:

train: The training split. This data can be used for any purpose during model building, and in supervised training, labels over this split will be visible to the model.
tuning: The hyperparameter tuning split. This split is sometimes called the "dev" or "val" split in other contexts. This data can be used for tuning hyperparameters or for training of the final model, but should not be used for final evaluation of model performance. Users may choose to merge this split with the training split and re-shuffle the data themselves if they need more splits or a different split ratio. Not all datasets will specify this split, as it is optional.
held_out: The final evaluation held-out split. This data should not be used for training or tuning, and should only be used for final evaluation of model performance. This split is sometimes called the "test" split in other contexts. No data about these patients should be assumed to be available during data pre-processing, training, or tuning.
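
The split assignments can be sanity-checked with a few lines of Python. The sketch below groups subject IDs by split name and looks for subjects assigned to more than one split. The rows are synthetic, with made-up IDs that only mirror the 80/10/10 split sizes of this dataset, and the function names are illustrative.

```python
from collections import defaultdict


def subjects_by_split(rows):
    """Group subject IDs by split name from (subject_id, split) rows."""
    groups = defaultdict(set)
    for row in rows:
        groups[row["split"]].add(row["subject_id"])
    return dict(groups)


def overlapping_subjects(groups):
    """Return subject IDs that appear in more than one split."""
    seen, overlap = set(), set()
    for subjects in groups.values():
        overlap |= seen & subjects
        seen |= subjects
    return overlap


# Synthetic rows; the subject IDs are made up for illustration.
rows = (
    [{"subject_id": i, "split": "train"} for i in range(80)]
    + [{"subject_id": i, "split": "tuning"} for i in range(80, 90)]
    + [{"subject_id": i, "split": "held_out"} for i in range(90, 100)]
)
groups = subjects_by_split(rows)
print({name: len(ids) for name, ids in groups.items()})
print(overlapping_subjects(groups))
```

With PyArrow installed, equivalent rows could be obtained from metadata/subject_splits.parquet, for example via pyarrow.parquet.read_table(path).to_pylist().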

For more information on the MEDS data structure, we suggest referring to the MEDS documentation [7].

Usage Notes

The data are stored in the Parquet format (.parquet files). Use a compatible library, such as PyArrow or Polars for Python, to read the data. Alternatively, you can inspect the data with any Parquet reader or viewer. In addition, there is a growing set of MEDS-compliant tools, such as:

MEDS-Reader: A software package for efficient EHR processing [8].
MEDS-Transforms: A set of functions and scripts for transforming data to and from MEDS [6].
MEDS-Tab: A software package designed to automatically tabularize and prepare MEDS data [9].
MEDS-Inspect: A software package to interactively inspect MEDS data [10].
ACES: A software package and configuration language for reproducible extraction of task cohorts [11].

For more information, tutorials, and compatible tools, see the MEDS documentation [7].
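
As one way of reading the event files with PyArrow, the sketch below loads a Parquet file into a list of row dictionaries and counts the most frequent codes. The function names and the example events are illustrative, the codes are invented, and the path to pass to read_events depends on where the files were downloaded.

```python
from collections import Counter


def read_events(path):
    """Load a MEDS Parquet file as a list of row dicts (assumes PyArrow)."""
    # Imported lazily so that the counting helper has no dependencies.
    import pyarrow.parquet as pq

    return pq.read_table(path).to_pylist()


def most_common_codes(events, n=3):
    """Return the n most frequent values of the code column with counts."""
    counts = Counter(event["code"] for event in events)
    return counts.most_common(n)


# Synthetic events with invented codes, for illustration only.
events = [
    {"subject_id": 1, "time": None, "code": "HR", "numeric_value": 80.0},
    {"subject_id": 1, "time": None, "code": "HR", "numeric_value": 82.0},
    {"subject_id": 2, "time": None, "code": "HR", "numeric_value": 75.0},
    {"subject_id": 2, "time": None, "code": "LAB//A", "numeric_value": 1.2},
    {"subject_id": 2, "time": None, "code": "LAB//A", "numeric_value": 1.4},
    {"subject_id": 2, "time": None, "code": "DX//B", "numeric_value": None},
]
print(most_common_codes(events, n=2))
```

For a downloaded copy of the dataset, most_common_codes(read_events(path)) would give the same summary for a file under data/train.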

Release Notes

Version 0.0.1: Initial release. Added dataset in MEDS 0.3.3 with MIMIC-IV ETL 0.0.6.

Ethics

The MIMIC-IV Medical Event Data Standard (MEDS) demo dataset was derived from the MIMIC-IV Clinical Database Demo (v2.2) [4].

Acknowledgements

We acknowledge the work of the MEDS and MEDS-DEV community:

MEDS: Edward Choi, Ethan Steinberg, Jason A. Fries, Jungwoo Oh, Matthew B. A. McDermott, Michael Wornow, Nigam H. Shah, Patrick Rockenschaub, Robin P. van de Water, Tom J. Pollard

MEDS-DEV: Teya S. Bergamaschi, Jeffrey N. Chiang, Edward Choi, Young Sang Choi, Jason A. Fries, Jack Gallifant, Raffaele Giancotti, Xinzhuo Jiang, Hyewon Jeong, Vincent Jeanselme, Shalmali Joshi, Alistair Johnson, Apara Kashyap, Kiril V. Klein, Aleksia Kolo, Yuta Kobayashi, Ryan C. King, Simon A. Lee, Yanwei Li, Matthew B. A. McDermott, Maria E. Montgomery, Mikkel Odgaard, Jungwoo Oh, Nassim Oufattole, Chao Pang, Tom J. Pollard, Pawel Renc, Patrick Rockenschaub, Nigam H. Shah, Martin Sillesen, Ethan Steinberg, Kamilė Stankevičiūtė, Robin P. van de Water, Michael Wornow, Justin Xu, Mads Nielsen.

Conflicts of Interest

No conflicts of interest to report.

References

1. MEDS Working Group: Arnrich, B., Choi, E., Fries, J. A., McDermott, M. B. A., Oh, J., Pollard, T., Shah, N., Steinberg, E., Wornow, M., & van de Water, R. (2024). Medical Event Data Standard (MEDS): Facilitating machine learning for health. In ICLR 2024 Workshop on Learning from Time Series For Health. https://openreview.net/forum?id=IsHy2ebjIG
2. Kallfelz, M., Tsvetkova, A., Pollard, T., Kwong, M., Lipori, G., Huser, V., Osborn, J., Hao, S., & Williams, A. (2021). MIMIC-IV demo data in the OMOP Common Data Model (version 0.9). PhysioNet. https://doi.org/10.13026/p1f5-7x35
3. Hripcsak, G., Duke, J. D., Shah, N. H., et al. (2015). Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. Stud Health Technol Inform. 216:574-8. PMC4815923. https://pubmed.ncbi.nlm.nih.gov/26262116/
4. Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2023). MIMIC-IV Clinical Database Demo (version 2.2). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/dp1f-ex47
5. https://github.com/Medical-Event-Data-Standard/MIMIC_IV_MEDS [Accessed 2025-07-08]
6. https://github.com/mmcdermott/MEDS_extract/ [Accessed 2025-07-08]
7. MEDS website: https://medical-event-data-standard.github.io/
8. MEDS Reader: https://meds-reader.readthedocs.io/en/latest/
9. MEDS Tab: https://meds-tab.readthedocs.io/en/latest/
10. MEDS Inspect: https://github.com/rvandewater/MEDS-Inspect
11. Xu, J., Gallifant, J., Johnson, A. E. W., & McDermott, M. B. A. (2025). ACES: Automatic Cohort Extraction System for Event-Stream Datasets. https://arxiv.org/abs/2406.19653

Parent Projects

MIMIC-IV demo data in the Medical Event Data Standard (MEDS) was derived from:

MIMIC-IV Clinical Database Demo v2.2

Please cite them when using this project.

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Open Data Commons Open Database License v1.0

Discovery

DOI (version 0.0.1):
https://doi.org/10.13026/t2y8-ea41

DOI (latest version):
https://doi.org/10.13026/y9xz-1347

Topics:
ehr critical care electronic health record machine learning mimic meds medical event data standard

Project Website:
https://github.com/Medical-Event-Data-Standard

Versions

0.0.1 Sept. 29, 2025

Files

Total uncompressed size: 5.7 MB.

Access the files

Download the ZIP file (4.7 MB)

Download the files using your terminal:

wget -r -N -c -np https://physionet.org/files/mimic-iv-demo-meds/0.0.1/

Download the files using AWS command line tools:

aws s3 sync --no-sign-request s3://physionet-open/mimic-iv-demo-meds/0.0.1/ DESTINATION

Folder Navigation:

Name	Size	Modified
codes.parquet	381.0 KB	2025-05-09
dataset.json	184 B	2025-05-09
subject_splits.parquet	1.3 KB	2025-05-09
