PhysioNet Index

Database Credentialed Access

MIMIC-IV-ECG-Ext-ICD: Diagnostic labels for MIMIC-IV-ECG

Nils Strodthoff, Juan Miguel Lopez Alcaraz, Wilhelm Haverkamp

Dataset that links ECG records from MIMIC-IV-ECG to ED discharge and hospital discharge diagnoses, enabling the training of general ECG prediction models based on clinical labels and facilitating the retrieval of further clinical metadata from MIMIC-IV.

electrocardiography, mimic, machine learning

Published: Aug. 30, 2024. Version: 1.0.1

Database Credentialed Access

CORAL: expert-Curated medical Oncology Reports to Advance Language model inference

Madhumita Sushil, Vanessa Kennedy, Divneet Mandair, et al.

Medical oncology progress notes annotated with advanced, comprehensive oncology-relevant concepts and relationships.

information extraction, artificial intelligence, oncology, natural language processing, electronic health records, large language models

Published: Feb. 7, 2024. Version: 1.0

Database Open Access

KINECAL

Sean Maudsley-Barton, Moi Hoon Yap

A dataset for balance falls-risk assessment and balance impairment analysis

balance, posturography, clinical tests, postural sway, falls-risk, age-related changes

Published: June 8, 2023. Version: 1.0.3

Database Credentialed Access

Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information

Yael Bensoussan, Alexandros Sigaras, Anais Rameau, et al.

A dataset of features from voice recordings and metadata to enable the development, benchmarking, and validation of clinically applicable machine-learning models for diagnosing a wide range of health conditions.

health, biomarkers, bridge2ai, voice

Published: May 1, 2026. Version: 3.1.0

Database Credentialed Access

Bridge2AI-Voice Pediatric Dataset

Yael Bensoussan, Alexandros Sigaras, Anais Rameau, et al.

A dataset of questionnaire responses, spectrograms, and other information for pediatric participants collected for the Bridge2AI voice as a biomarker of health project.

health, pediatric, biomarkers, bridge2ai, voice

Published: May 1, 2026. Version: 1.1.0

Database Credentialed Access

MIMIC-IV-ECHO: Echocardiogram Matched Subset

Brian Gow, Tom Pollard, Nathaniel Greenbaum, et al.

The MIMIC-IV-ECHO module contains structured measurements from over 200,000 echocardiograms and more than 500,000 echocardiogram DICOM files. Patients overlap with those in the MIMIC-IV Clinical Database.

Published: March 10, 2026. Version: 1.0

Database Credentialed Access

Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation

Jong Hak Moon, Geon Choi, Paloma Rabaey, et al.

A radiologist-annotated benchmark of structured chest X-ray reports at single and sequential levels, comprising 1,473 reports across 18 relation types and 80 longitudinal cases.

fine-grained structured reports, attribute-level clinical reasoning, medical text structuring, longitudinal clinical reasoning, chest x-ray report parsing, medical information structuring, benchmark dataset for radiology report, medical information extraction, structured radiology reports, temporal relation extraction, radiology report benchmarking, longitudinal clinical understanding

Published: Jan. 11, 2026. Version: 1.0.0

Database Restricted Access

Microbiological, Immunological and Biochemical Characteristics of the Development of Ventilator Associated Pneumonia

Natalia Sanabria-Herrera, Ingrid Gisell Bustos Moya, Luis Felipe Reyes

This study explores the role of the respiratory microbiome in nosocomial lower respiratory tract infections in ICU patients. Conducted in Chía, Colombia, it reveals the microbiome's impact on disease progression.

Published: Dec. 5, 2025. Version: 1.1.1

Database Credentialed Access

Antibiotic Resistance Microbiology Dataset Mass General Brigham (ARMD-MGB)

Ziming Wei, Sanjat Kanjilal

ARMD-MGB contains detailed microbiology and clinical metadata for >225,000 patients and >970,000 cultures collected over 10 years.

medical informatics, antimicrobial resistance, electronic health records

Published: Dec. 5, 2025. Version: 1.0.0

Database Credentialed Access

EchoGraph-annotated ECHO-NOTE2NUM examples

Chieh-Ju Chao, Mohammad Asadi

EchoGraph is a model that automatically extracts and structures clinical information from echocardiogram reports. The Annotated ECHO-NOTE2NUM Dataset contains MIMIC-III echo reports enhanced with EchoGraph annotations to support future research.

Published: Dec. 3, 2025. Version: 1.0.0
