Resources


Database Credentialed Access

Annotation dataset of problematic opioid use and related contexts from MIMIC-III Critical Care Database discharge summaries

Melissa Poulsen, Vanessa Troiani, Philip Freda, Danielle Mowery, Anahita Davoudi

A corpus of annotated data from the MIMIC-III Critical Care Database, produced in a study that aimed to develop and apply an annotation schema to characterize opioid use disorder and related contextual factors.

opioid use disorder substance use natural language processing clinical notes

Published: Feb. 8, 2023. Version: 1.0.0


Database Credentialed Access

CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes

James Mullenbach, Yada Pruksachatkun, Sean Adler, Jennifer Seale, Jordan Swartz, T Greg McKelvey, Yi Yang, David Sontag

Clinical action items annotated over MIMIC-III. 718 discharge summaries are labeled at the sentence and character level with multiple action labels including Appointment, Lab, Procedure, Medication, Imaging, Patient Instructions, and Other.

Published: June 21, 2021. Version: 1.0.0


Database Credentialed Access

RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports

Jean-Benoit Delbrouck

RadGraph-XL is a large, expert-annotated dataset of 2,300 radiology reports covering multiple modalities and anatomies. It enables accurate extraction of clinical entities and relations for downstream medical AI tasks.

Published: Sept. 12, 2025. Version: 1.0.0


Database Open Access

NInFEA: Non-Invasive Multimodal Foetal ECG-Doppler Dataset for Antenatal Cardiology Research

Danilo Pani, Eleonora Sulas, Monica Urru, Reza Sameni, Luigi Raffo, Roberto Tumbarello

Open dataset featuring non-invasive electrophysiological recordings, fetal pulsed-wave Doppler and maternal respiration signals. It provides a ground truth on the fetal heart activity when an invasive scalp lead is unavailable.

foetus pwd doppler foetal ecg maternal ecg pwd envelope non-invasive cardiology early pregnancy antenatal fecg ecg

Published: Nov. 12, 2020. Version: 1.0.0


Database Credentialed Access

CORAL: expert-Curated medical Oncology Reports to Advance Language model inference

Madhumita Sushil, Vanessa Kennedy, Divneet Mandair, Brenda Miao, Travis Zack, Atul Butte

Medical oncology progress notes annotated with advanced, comprehensive oncology-relevant concepts and relationships.

artificial intelligence information extraction oncology natural language processing electronic health records large language models

Published: Feb. 7, 2024. Version: 1.0


Database Credentialed Access

MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark

Asad Aali, Vasiliki Bikia, Maya Varma, Nicole Chiou, Sophie Ostmeier, Arnav Singhvi, Magdalini Paschali, Ashwin Kumar, Andrew Johnston, Karimar Amador Martinez, Eduardo Perez Guerrero, Paola Cruz Rivera, Sergios Gatidis, Christian Bluethgen, Eduardo Pontes Reis, Eddy Zandee van Rilland, Poonam Hosamani, Kevin Keet, Minjoung Go, Evelyn Ling, David Larson, Curtis Langlotz, Roxana Daneshjou, Jason Hom, Sanmi Koyejo, Emily Alsentzer, Akshay Chaudhari

MedVAL-Bench is the first large-scale physician-validated benchmark for medical text validation, spanning 6 diverse medical tasks and containing 840 language model-generated outputs annotated by 12 physicians with error assessments and risk grades.

Published: Nov. 14, 2025. Version: 1.0.1


Database Credentialed Access

MIMIC-IV-ECG-Ext-ICD: Diagnostic labels for MIMIC-IV-ECG

Nils Strodthoff, Juan Miguel Lopez Alcaraz, Wilhelm Haverkamp

Dataset that links ECG records from MIMIC-IV-ECG to ED discharge and hospital discharge diagnoses, which enables training general ECG prediction models based on clinical labels and facilitates the retrieval of further clinical metadata from MIMIC-IV.

machine learning electrocardiography mimic

Published: Aug. 30, 2024. Version: 1.0.1


Database Restricted Access

DREAMT: Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology

Ke Wang, Jiamu Yang, Ayush Shetty, Jessilyn Dunn

We present high-resolution multichannel wearable device data along with clinically labeled sleep stage and polysomnography (PSG) recordings from 100 sleep-abnormal patients with sleep apnea.

wearable sleep disorders biomedical time series classification

Published: April 30, 2025. Version: 2.1.0


Challenge Credentialed Access

ShAReCLEF eHealth 2013: Natural Language Processing and Information Retrieval for Clinical Care

Danielle Mowery

2013 ShARe/CLEF eHealth Evaluation Lab: Natural Language Processing and Information Retrieval for Clinical Care (Tasks 1 and 2).

natural language processing

Published: Feb. 15, 2013. Version: 1.0