Resources


Database Credentialed Access

NCH Sleep DataBank: A Large Collection of Real-world Pediatric Sleep Studies with Longitudinal Clinical Data

Harlin Lee, Boyue Li, Yungui Huang, Yuejie Chi, Simon Lin

The NCH Sleep DataBank includes 3,984 pediatric sleep studies on 3,673 unique patients conducted at Nationwide Children's Hospital between 2017 and 2019. It contains polysomnography (PSG), clinical annotations, and longitudinal clinical data.

eeg ehr pediatrics polysomnography clinical decision support sleep study ecg sleep disorders electronic health records

Published: Oct. 27, 2021. Version: 3.1.0


Database Open Access

MIMIC-IV demo data in the Medical Event Data Standard (MEDS)

Robin Philippus van de Water, Ethan Steinberg, Michael Wornow, Patrick Rockenschaub, Matthew McDermott

MIMIC-IV Clinical Database Demo in MEDS (Medical Event Data Standard) format.

ehr critical care electronic health record mimic machine learning meds medical event data standard

Published: Sept. 29, 2025. Version: 0.0.1


Database Credentialed Access

MIMIC-Ext-DrugDetection

Fabrice Harel-Canada, Nanyun Peng, David Goodman, Ruby Romero, Allan Nguyen, Brandon Moghanian, Anabel Salimian

This project offers a multilabel annotated dataset of clinical note sentences from MIMIC-III/IV for substance use detection. It supports NLP research for identifying various co-occurring drug use mentions in patient records.

ehr mimic-iv substance use clinical notes methamphetamine multi-label cocaine drug detection polysubstance use prescription opioid misuse cannabis benzodiazepine misuse injection drug use heroin mimic-iii

Published: Sept. 25, 2025. Version: 1.0.0


Database Credentialed Access

Immunosuppressive Condition and Medication Annotations for Admission Notes in the MIMIC-III Database

Vijeeth Guggilla, Melissa Bak, Mengjia Kang, Theresa Walunas, Catherine A Gao

This database contains 200 MIMIC-III admission notes with adjudicated labels for histories of various immunosuppressive conditions and usage of various immunosuppressive medications.

Published: Aug. 4, 2025. Version: 1.0.0


Challenge Credentialed Access

ArchEHR-QA: BioNLP at ACL 2025 Shared Task on Grounded Electronic Health Record Question Answering

Sarvesh Soni, Dina Demner-Fushman

A dataset for grounded question answering (QA) from electronic health records (EHRs).

electronic health record question answering clinicians patient portals

Published: April 11, 2025. Version: 1.2


Database Credentialed Access

MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei JI, Eric Chang, Tackeun Kim, Edward Choi

We introduce MIMIC-Ext-MIMIC-CXR-VQA, a complex, diverse, and large-scale dataset designed for Visual Question Answering (VQA) tasks within the medical domain, focusing primarily on chest radiographs.

question answering chest x-ray benchmark evaluation radiology electronic health records machine learning multimodal deep learning visual question answering

Published: July 19, 2024. Version: 1.0.0


Challenge Credentialed Access

Discharge Me: BioNLP ACL'24 Shared Task on Streamlining Discharge Documentation

Justin Xu

Data for the "Discharge Me!" Shared Task on Streamlining Discharge Documentation for BioNLP ACL'24

generation bionlp acl discharge summary

Published: April 12, 2024. Version: 1.3


Database Credentialed Access

NCH Sleep DataBank: A Large Collection of Real-world Pediatric Sleep Studies with Longitudinal Clinical Data

Harlin Lee, Boyue Li, Yungui Huang, Yuejie Chi, Simon Lin

The NCH Sleep DataBank includes 3,984 pediatric sleep studies on 3,673 unique patients conducted at Nationwide Children's Hospital between 2017 and 2019. It contains polysomnography (PSG), clinical annotations, and longitudinal clinical data.

eeg ehr pediatrics polysomnography clinical decision support sleep study ecg sleep disorders electronic health records

Published: Oct. 27, 2021. Version: 3.1.0


Model Credentialed Access

Clinical BERT Models Trained on Pseudo Re-identified MIMIC-III Notes

Eric Lehman, Sarthak Jain, Karl Pichotta, Yoav Goldberg, Byron Wallace

We explore recovering sensitive info from BERT trained over non-deidentified EHR. We make our models and data available to further facilitate research.

Published: April 28, 2021. Version: 1.0.0


Database Credentialed Access

AMR-UTI: Antimicrobial Resistance in Urinary Tract Infections

Michael Oberst, Soorajnath Boominathan, Helen Zhou, Sanjat Kanjilal, David Sontag

AMR-UTI is a freely accessible dataset, derived from electronic health record (EHR) information on over 100,000 urinary tract infections (UTI) treated at Massachusetts General Hospital and Brigham & Women's Hospital in Boston, MA, USA.

antibiotic resistance causal inference policy learning antimicrobial resistance urinary tract infection clinical decision support machine learning

Published: Nov. 4, 2020. Version: 1.0.0