April 18, 2023

Responsible use of MIMIC data with online services like GPT

We have received inquiries regarding the use of credentialed data (MIMIC-III, MIMIC-IV, MIMIC-CXR) with online services such as GPT. The PhysioNet Credentialed Data Use Agreement explicitly prohibits sharing access to the data with third parties, including sending it through APIs provided by companies like OpenAI, or using it in online platforms like ChatGPT.

If you are interested in using the GPT family of models, we suggest using one of the following services:

  • Azure OpenAI service. You'll need to opt out of human review of the data via this form. Reasons for opting out are: 1) you are processing sensitive data where the likelihood of harmful outputs and/or misuse is low, and 2) you do not have the right to permit Microsoft to process the data for abuse detection due to the data use agreement you have signed.
  • Amazon Bedrock. Bedrock provides options for fine-tuning foundation models using private labeled data. After creating a copy of a base foundation model for exclusive use, data is not shared back to the base model for training.
  • Google's Gemini via Vertex AI on Google Cloud Platform. Gemini doesn't use your prompts or its responses as data to train its models. If making use of additional features offered through the Gemini for Google Cloud Trusted Tester Program, you should obtain the appropriate opt-outs for data sharing, or otherwise not perform tasks that require the sharing of data.
  • Anthropic Claude. Claude does not use your prompts or its responses as data to train its models by default, and routine human review of data is not performed.

If you have any questions about this policy, feel free to reach out: https://physionet.org/about/#contact_us

April 24, 2024

Guidelines for creating datasets and models from MIMIC

We recognize that there is value in creating datasets or models that are either derived from MIMIC or which augment MIMIC in some way (for example, by adding annotations). Here are some guidelines on creating these datasets and models:

  • Any derived datasets or models should be treated as containing sensitive information. If you wish to share these resources, they should be shared on PhysioNet under the same agreement as the source data.
  • If you would like to use the MIMIC acronym in your project name, please include the letters “Ext” (for example, “MIMIC-IV-Ext-YOUR-DATASET”). Ext may either indicate “extracted” (e.g. a derived subset) or “extended” (e.g. annotations), depending on your use case.
  • Please select the relevant "Parent Projects" in the Discovery tab of the submission portal when preparing your project for submission.

March 27, 2025

This repository is under review by NIH for potential modification in compliance with U.S. federal Administration directives.

July 15, 2025

Access Restrictions Under DOJ Data Security Program

PhysioNet has introduced updated access policies for certain datasets to comply with the U.S. Department of Justice’s Data Security Program (DSP) under Executive Order 14117. The DSP final rule took effect on April 8, 2025 and full enforcement began July 8, 2025: https://www.justice.gov/opa/media/1396351/dl

The DSP imposes export-control–style restrictions on U.S. persons sharing or transferring bulk sensitive personal data (e.g., genomic, biometric, health, financial, geolocation) and U.S. government-related data with specified countries or "covered persons". The rule applies to interactions with countries including: China (including Hong Kong and Macau), Cuba, Iran, North Korea, Russia, and Venezuela, as well as individuals or entities connected to them.

PhysioNet now prevents access to certain controlled-access datasets for users connecting from IP addresses or affiliations in those regions, or for those classified as “covered persons”. These steps are taken to satisfy legal obligations and are not a judgment on your work as researchers.

We understand these changes may affect ongoing research. PhysioNet is committed to supporting your efforts to understand the policy and explore compliant access options.

Featured Resources

Database Open Access

VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients

Hyung-Chul Lee, Chul-Woo Jung

VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients. Published: Sept. 21, 2022. Version: 1.0.0
Database Credentialed Access

MIMIC-IV

Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Steven Horng, Leo Anthony Celi, Roger Mark

Large database of de-identified health information from patients admitted to Beth Israel Deaconess Medical Center. Published: Jan. 6, 2023. Version: 2.2
Database Credentialed Access

MIMIC-CXR Database

Alistair Johnson, Tom Pollard, Roger Mark, Seth Berkowitz, Steven Horng

Chest radiographs in DICOM format with associated free-text reports. Published: Sept. 19, 2019. Version: 2.0.0

Latest Resources

Software Open Access

Vital File Web Viewer: A Browser-Based Tool for Viewing High-Resolution Vital Signs Data

Eunsun Rachel Lee, Hyung-Chul Lee

A standalone, browser-based viewer for .vital files that enables interactive visualization of high-resolution vital signs data without server-side components. Published: Sept. 1, 2025. Version: 1.0.0
Database Open Access

MIMIC-IV Clinical Database Demo on FHIR

Alex Bennett, Hannes Ulrich, Joshua Wiedekopf, Piotr Szul, John Grimes, Alistair Johnson

The MIMIC-IV Clinical Database Demo on FHIR is a 100 patient subset of the MIMIC-IV v2.2 and MIMIC-IV-ED v2.2 clinical databases converted into the Fast Healthcare Interoperability Resources (FHIR) format. Published: Aug. 27, 2025. Version: 2.1.0
Database Restricted Access

Community-Acquired Pneumonia, Endotypes and Phenotypes (NACef): Prospective, observational cohort study of Translational Medicine

Natalia Sanabria-Herrera, Esteban Garcia Gallo, Luis Felipe Reyes

Community-Acquired Pneumonia (CAP) poses a significant health risk, linked to high in-hospital morbidity and mortality rates. The dataset includes clinical details of 768 CAP patients at Clinica Universidad de La Sabana, Colombia. Published: Aug. 21, 2025. Version: 2.0.1

News

Aug. 18, 2025

Roger Mark and George Moody Receive the 2026 IEEE Biomedical Engineering Award

Each year, the IEEE Awards Board selects a distinguished group of individuals to receive IEEE’s highest honors, recognizing exceptional achievements and significant contributions to technology, society, and the engineering profession.

We are honored to share that Professor Roger G. Mark and the late George B. Moody have been named co-recipients of the 2026 IEEE Biomedical Engineering Award for their leadership in ECG signal processing and the creation and distribution of curated biomedical and clinical data. View announcement on the IEEE website.

This recognition highlights the profound and lasting impact that Roger Mark and George Moody have had on biomedical engineering and the global research community. Their vision and contributions continue to underpin our work on PhysioNet and databases such as MIMIC.

About Roger G. Mark

Roger G. Mark is Distinguished Professor of Health Sciences and Technology Emeritus at the Institute for Medical Engineering & Science at MIT. His work spans physiological signal processing, patient monitoring, and critical care decision support. He is the co-founder of PhysioNet, launched in 1999 to provide open access to physiologic signals, clinical data, and open-source software for the research community.

About George B. Moody

George B. Moody made transformative contributions to biomedical signal processing through his work in electrocardiography. He developed the WFDB libraries and much of the code available on PhysioNet, which remains essential for ECG signal processing worldwide. He also created and led the PhysioNet/Computing in Cardiology Challenges for 15 years, fostering global collaboration and innovation.

Feb. 26, 2025

BioNLP @ACL 2025 Shared Task on Grounded Electronic Health Record Question Answering (ArchEHR-QA)

The overarching goal of the ArchEHR-QA 2025 (pronounced "Archer") shared task is to develop automated responses to patients' questions by generating answers that are grounded in key clinical evidence from their electronic health records (EHRs). The proposed dataset, ArchEHR-QA, comprises hand-curated, realistic patient questions (reflective of patient portal messages), relevant focus areas identified within these questions (as determined by a clinician), corresponding clinician-rewritten versions (crafted to aid in formulating responses), and note excerpts providing essential clinical context.

Feb. 4, 2025

New Dataset: Bridge2AI-Voice v1.0 Now Available on PhysioNet

We are pleased to announce the release of Bridge2AI-Voice v1.0, a dataset designed to advance research into the use of voice as a biomarker of health. This dataset, developed as part of the NIH Bridge2AI initiative, aims to support artificial intelligence research by providing ethically sourced, high-quality voice-derived data linked to clinical information.

Bridge2AI-Voice v1.0 includes 12,523 voice-derived recordings from 306 participants across five North American sites. Participants were selected based on conditions known to affect vocal characteristics, including:

  • Voice disorders (e.g., laryngeal conditions affecting phonation)
  • Neurological and neurodegenerative disorders (e.g., Parkinson’s, ALS, stroke)
  • Mood and psychiatric disorders (e.g., depression, anxiety)
  • Respiratory disorders (e.g., asthma, chronic cough)
  • Pediatric voice and speech disorders

The initial release does not include raw voice recordings. Instead, it provides derived acoustic features, such as spectrograms, along with detailed demographic, clinical, and validated questionnaire data.

Jan. 21, 2025

The George B. Moody PhysioNet Challenge 2025 has begun

This year's Challenge focuses on detecting Chagas disease from ECGs. Chagas disease is a parasitic disease in Central and South America that affects an estimated 6.5 million people and causes nearly 10,000 deaths annually. Timely treatment may prevent or slow damage to the cardiovascular system, but serological testing capacity is limited, so detection through ECGs can help to identify potential Chagas patients for testing and treatment. 

Nov. 12, 2024

MIMIC-IV v3.1 is now available on BigQuery

MIMIC-IV v3.1 is now available on BigQuery. Users may request access via PhysioNet. Currently, MIMIC-IV v3.1 is available in the mimiciv_v3_1_hosp and mimiciv_v3_1_icu datasets. MIMIC-IV v2.2 is available in the mimiciv_v2_2_hosp and mimiciv_v2_2_icu datasets, as well as in the mimiciv_hosp and mimiciv_icu datasets. On November 25, 2024, we will replace the data in mimiciv_hosp and mimiciv_icu with MIMIC-IV v3.1.
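
Because the unversioned mimiciv_hosp and mimiciv_icu names will point to different data after the switch, analyses that need to stay reproducible may prefer to reference the versioned names explicitly. The following is a minimal sketch of one way to do that in Python. The helper names (table_ref, count_rows_sql), the project ID "physionet-data", and the table name "patients" are assumptions made for illustration and are not taken from this announcement; the sketch only builds query strings and does not contact BigQuery.

```python
# Minimal sketch: pin queries to a versioned MIMIC-IV BigQuery dataset.
# The dataset names below come from the announcement; everything else
# (project ID, helper names, table name) is an illustrative assumption.

VERSIONED_DATASETS = {
    ("3.1", "hosp"): "mimiciv_v3_1_hosp",
    ("3.1", "icu"): "mimiciv_v3_1_icu",
    ("2.2", "hosp"): "mimiciv_v2_2_hosp",
    ("2.2", "icu"): "mimiciv_v2_2_icu",
}

# Assumption: the Google Cloud project that hosts the datasets.
DEFAULT_PROJECT = "physionet-data"


def table_ref(version: str, module: str, table: str,
              project: str = DEFAULT_PROJECT) -> str:
    """Return a backtick-quoted, fully qualified BigQuery table reference."""
    try:
        dataset = VERSIONED_DATASETS[(version, module)]
    except KeyError:
        raise ValueError(
            f"No versioned dataset known for MIMIC-IV v{version}, module {module!r}"
        ) from None
    return f"`{project}.{dataset}.{table}`"


def count_rows_sql(version: str, module: str, table: str) -> str:
    """Build a simple row-count query against a pinned dataset version."""
    return f"SELECT COUNT(*) AS n FROM {table_ref(version, module, table)}"


if __name__ == "__main__":
    # "patients" is assumed here to be a table in the hosp module.
    print(count_rows_sql("3.1", "hosp", "patients"))
    print(count_rows_sql("2.2", "hosp", "patients"))
```

The generated SQL could then be run in the BigQuery console or through a client library once access has been granted via PhysioNet.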