April 24, 2024

Guidelines for creating datasets and models from MIMIC

We recognize that there is value in creating datasets or models that are either derived from MIMIC or which augment MIMIC in some way (for example, by adding annotations). Here are some guidelines on creating these datasets and models:

  • Any derived datasets or models should be treated as containing sensitive information. If you wish to share these resources, they should be shared on PhysioNet under the same agreement as the source data.
  • If you would like to use the MIMIC acronym in your project name, please include the letters “Ext” (for example, “MIMIC-IV-Ext-YOUR-DATASET”). “Ext” may indicate either “extracted” (e.g., a derived subset) or “extended” (e.g., annotations), depending on your use case.
  • Please select the relevant "Parent Projects" in the Discovery tab of the submission portal when preparing your project for submission.
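The naming guideline above is simple enough to check automatically before submission. The sketch below is a hypothetical helper, not a PhysioNet tool. It assumes that a name "uses the MIMIC acronym" whenever it contains "MIMIC" (case-insensitively), and that "Ext" should appear as its own hyphen-separated token, as in the example MIMIC-IV-Ext-YOUR-DATASET.

```python
import re

# Hypothetical check for the "Ext" naming guideline (not an official PhysioNet tool).
# Assumption: "Ext" must appear as a whole hyphen-delimited token, so
# "MIMIC-IV-Ext-YOUR-DATASET" passes but "MIMIC-Extra" does not.
_EXT_TOKEN = re.compile(r"(?:^|-)Ext(?:-|$)")


def follows_mimic_naming_guideline(project_name: str) -> bool:
    """Return True if the name either avoids the MIMIC acronym or includes an 'Ext' token."""
    if "mimic" not in project_name.lower():
        # The guideline only applies to names that use the MIMIC acronym.
        return True
    return bool(_EXT_TOKEN.search(project_name))


if __name__ == "__main__":
    for name in ["MIMIC-IV-Ext-YOUR-DATASET", "MIMIC-IV-Annotations"]:
        print(name, follows_mimic_naming_guideline(name))
    # MIMIC-IV-Ext-YOUR-DATASET True
    # MIMIC-IV-Annotations False
```

Requiring a whole token rather than a plain substring match is a deliberate choice in this sketch: it avoids accepting names where "Ext" only happens to begin a longer word.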

March 27, 2025

This repository is under review by NIH for potential modification in compliance with U.S. federal Administration directives.

July 15, 2025

Access Restrictions Under DOJ Data Security Program

PhysioNet has introduced updated access policies for certain datasets to comply with the U.S. Department of Justice’s Data Security Program (DSP) under Executive Order 14117. The DSP final rule took effect on April 8, 2025, and full enforcement began on July 8, 2025: https://www.justice.gov/opa/media/1396351/dl

The DSP imposes export-control–style restrictions on U.S. persons sharing or transferring bulk sensitive personal data (e.g., genomic, biometric, health, financial, geolocation) and U.S. government-related data with specified countries or "covered persons". The rule applies to interactions with countries including: China (including Hong Kong and Macau), Cuba, Iran, North Korea, Russia, and Venezuela, as well as individuals or entities connected to them.

PhysioNet now prevents access to certain controlled-access datasets for users connecting from IP addresses or affiliations in those regions, or for those classified as “covered persons”. These steps are taken to satisfy legal obligations and are not a judgment on your work as researchers.

We understand these changes may affect ongoing research. PhysioNet is committed to supporting your efforts to understand the policy and explore compliant access options.

Sept. 24, 2025

Use of MIMIC Data with Large Language Models and Online Services

We have received inquiries about the use of credentialed and restricted data on PhysioNet, including MIMIC-III, MIMIC-IV, MIMIC-CXR, and their derivatives, with large language models (LLMs) and online services. The PhysioNet Credentialed Data Use Agreement explicitly prohibits sharing access to the data with third parties, including sending it through APIs or using it on online platforms.

Key Requirements:

  • Zero Data Retention: MIMIC data must not be stored or retained by third-party LLM services.

  • User Responsibility: Researchers are responsible for ensuring compliance with the Data Use Agreement.

Recommendations:

  • Strongly Recommended: Use locally deployed LLMs to maintain full control over the data.

  • If Using Cloud Services or APIs: Verify that the service’s settings ensure zero data retention, no use of data for model training, and no human review. Many services retain data by default. Even when services claim "zero data retention," their safeguards may be insufficient because of internal processing, logging, or caching practices. Review platform policies regularly, as they may change without notice. If a service’s data handling practices are unclear or cannot be fully verified, do not use the service.

Important Disclaimer: PhysioNet cannot verify the data practices of external services and does not endorse or recommend specific platforms.

Nov. 10, 2025

Seeking Applications for Exceptional Candidates for the Director, National Institute of General Medical Sciences (NIGMS), NIH

The National Institutes of Health (NIH) is seeking applications for the position of Director, National Institute of General Medical Sciences (NIGMS). The Director, NIGMS, provides leadership and administers, fosters, and supports research in the basic and general medical sciences and in related natural or behavioral sciences. The Director develops Institute goals, priorities, policies, and program activities, and keeps the Director, NIH, abreast of NIGMS developments, accomplishments, and needs as they relate to the overall mission of the NIH. In exercising the Director’s responsibilities for program planning, implementation, and evaluation, the incumbent works with and seeks the advice of a wide range of groups within the scientific community, including investigators, institutions, scientific societies, and relevant commercial organizations.

The Director is responsible for managing a high-level, complex organization and serving as the chief visionary for the Institute. The Director actively engages others to create a shared vision of the purpose and direction of the organization and works collaboratively within the Institute, across the NIH, and with external entities to generate, gain commitment for, and accomplish NIGMS goals. The Director must demonstrate a keen awareness of the workings of the public sector and successfully navigate that environment to promote and reach NIGMS and NIH objectives.

The position is open for applications from Friday, November 7, 2025, to Friday, November 21, 2025. Additional information on the position and the application process can be found here: Director, National Institute of General Medical Sciences | Office of Human Resources

Featured Resources

Database Open Access

VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients

Hyung-Chul Lee, Chul-Woo Jung

VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients Published: Sept. 21, 2022. Version: 1.0.0
Database Credentialed Access

MIMIC-IV

Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Steven Horng, Leo Anthony Celi, Roger Mark

Large database of de-identified health information from patients admitted to Beth Israel Deaconess Medical Center Published: Jan. 6, 2023. Version: 2.2
Database Credentialed Access

MIMIC-CXR Database

Alistair Johnson, Tom Pollard, Roger Mark, Seth Berkowitz, Steven Horng

Chest radiographs in DICOM format with associated free-text reports. Published: Sept. 19, 2019. Version: 2.0.0

Latest Resources

Database Credentialed Access

Predictors of Hospital Onset Infection: A Matched Retrospective Cohort Dataset

Ziming Wei, Luke Sagers, Caroline McKenna, Ted Pak, Chanu Rhee, Michael Klompas, Sanjat Kanjilal

NPA-CP is a freely accessible dataset derived from electronic health record (EHR) information at Mass General Brigham (MGB) between 2015 and 2024. The dataset covers 11 different pathogens and can be used to predict hospital-onset infections caused by these pathogens. Published: Nov. 4, 2025. Version: 1.0.0
Database Credentialed Access

MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark

Asad Aali, Vasiliki Bikia, Maya Varma, Nicole Chiou, Sophie Ostmeier, Arnav Singhvi, Magdalini Paschali, Ashwin Kumar, Andrew Johnston, Karimar Amador Martinez, Eduardo Perez Guerrero, Paola Cruz Rivera, Sergios Gatidis, Christian Bluethgen, Eduardo Pontes Reis, Eddy Zandee van Rilland, Poonam Hosamani, Kevin Keet, Minjoung Go, Evelyn Ling, Curtis Langlotz, Roxana Daneshjou, Jason Hom, Sanmi Koyejo, Emily Alsentzer, Akshay Chaudhari

MedVAL-Bench is the first large-scale physician-validated benchmark for medical text validation, spanning 6 diverse medical tasks and containing 840 language model-generated outputs annotated by 12 physicians with error assessments and risk grades. Published: Nov. 3, 2025. Version: 1.0.0
Database Open Access

HeartCycle: A comprehensive dataset of synchronized impedance cardiography and echocardiography for accurate hemodynamic predictions

Eduardo Illueca Fernandez, Ricardo Couceiro, Farhad Abtahi, Jorge Henriques, Rui Pedro Paiva, Lino Goncalves, Jose Millet, Fernando Seoane, Jens Muehlsteff, Paulo Carvalho

An impedance cardiography (ICG) dataset that combines ICG signals and other methodologies with gold-standard echocardiography. Researchers can use this dataset to compare ICG points with the actual hemodynamic events. Published: Nov. 2, 2025. Version: 1.0.0

News

Sept. 11, 2025

Bridge2AI Raw Audio Data Access

The published Bridge2AI-Voice dataset contains derived features from the audio waveforms. Interested users can request access to the original raw audio data by contacting: DACO@b2ai-voice.org

The raw audio data will be disseminated through controlled access only, to protect participants' privacy.

Aug. 18, 2025

Roger Mark and George Moody Receive the 2026 IEEE Biomedical Engineering Award

Each year, the IEEE Awards Board selects a distinguished group of individuals to receive IEEE’s highest honors, recognizing exceptional achievements and significant contributions to technology, society, and the engineering profession.

We are honored to share that Professor Roger G. Mark and the late George B. Moody have been named co-recipients of the 2026 IEEE Biomedical Engineering Award for their leadership in ECG signal processing and the creation and distribution of curated biomedical and clinical data. View announcement on the IEEE website.

This recognition highlights the profound and lasting impact that Roger Mark and George Moody have had on biomedical engineering and the global research community. Their vision and contributions continue to underpin our work on PhysioNet and databases such as MIMIC.

About Roger G. Mark

Roger G. Mark is Distinguished Professor of Health Sciences and Technology Emeritus at the Institute for Medical Engineering & Science at MIT. His work spans physiological signal processing, patient monitoring, and critical care decision support. He is the co-founder of PhysioNet, launched in 1999 to provide open access to physiologic signals, clinical data, and open-source software for the research community.

About George B. Moody

George B. Moody made transformative contributions to biomedical signal processing through his work in electrocardiography. He developed the WFDB libraries and much of the code available on PhysioNet, which remains essential for ECG signal processing worldwide. He also created and led the PhysioNet/Computing in Cardiology Challenges for 15 years, fostering global collaboration and innovation.

Feb. 26, 2025

BioNLP @ACL 2025 Shared Task on Grounded Electronic Health Record Question Answering (ArchEHR-QA)

The overarching goal of the ArchEHR-QA 2025 (pronounced "Archer") shared task is to develop automated responses to patients' questions by generating answers that are grounded in key clinical evidence from their electronic health records (EHRs). The proposed dataset, ArchEHR-QA, comprises hand-curated, realistic patient questions (reflective of patient portal messages), relevant focus areas identified within these questions (as determined by a clinician), corresponding clinician-rewritten versions (crafted to aid in formulating responses), and note excerpts providing essential clinical context.

Feb. 4, 2025

New Dataset: Bridge2AI-Voice v1.0 Now Available on PhysioNet

We are pleased to announce the release of Bridge2AI-Voice v1.0, a dataset designed to advance research into the use of voice as a biomarker of health. This dataset, developed as part of the NIH Bridge2AI initiative, aims to support artificial intelligence research by providing ethically sourced, high-quality voice-derived data linked to clinical information.

Bridge2AI-Voice v1.0 includes 12,523 voice-derived recordings from 306 participants across five North American sites. Participants were selected based on conditions known to affect vocal characteristics, including:

  • Voice disorders (e.g., laryngeal conditions affecting phonation)
  • Neurological and neurodegenerative disorders (e.g., Parkinson’s, ALS, stroke)
  • Mood and psychiatric disorders (e.g., depression, anxiety)
  • Respiratory disorders (e.g., asthma, chronic cough)
  • Pediatric voice and speech disorders

The initial release does not include raw voice recordings. Instead, it provides derived acoustic features, such as spectrograms, along with detailed demographic, clinical, and validated questionnaire data.

Jan. 21, 2025

The George B. Moody PhysioNet Challenge 2025 has begun

This year's Challenge focuses on detecting Chagas disease from ECGs. Chagas disease is a parasitic disease in Central and South America that affects an estimated 6.5 million people and causes nearly 10,000 deaths annually. Timely treatment may prevent or slow damage to the cardiovascular system, but serological testing capacity is limited, so detection through ECGs can help to identify potential Chagas patients for testing and treatment.