Resources


Challenge Credentialed Access

SNOMED CT Entity Linking Challenge

Will Hardman, Mark Banks, Rory Davidson, et al.

272 discharge notes from the MIMIC-IV-Note dataset annotated with SNOMED CT concepts.

snomed entity linking clinical annotation

Published: Feb. 17, 2026. Version: 1.2.1


Database Credentialed Access

Predictors of Hospital Onset Infection: A Matched Retrospective Cohort Dataset

Ziming Wei, Luke Sagers, Caroline McKenna, et al.

NPA-CP is a freely accessible dataset derived from electronic health record (EHR) information at Mass General Brigham (MGB) between 2015 and 2024. The dataset covers 11 pathogens and can be used to predict hospital-onset infections caused by them.

electronic health records infection control clinical machine learning infectious diseases hospital onset infection colonization pressure

Published: Nov. 4, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-Ext-DrugDetection

Fabrice Harel-Canada, Nanyun Peng, David Goodman, et al.

This project offers a multilabel annotated dataset of clinical note sentences from MIMIC-III/IV for substance use detection. It supports NLP research for identifying various co-occurring drug use mentions in patient records.

ehr mimic-iv substance use clinical notes mimic-iii methamphetamine multi-label cocaine drug detection polysubstance use prescription opioid misuse cannabis benzodiazepine misuse injection drug use heroin

Published: Sept. 25, 2025. Version: 1.0.0


Database Open Access

Myocardial perfusion scintigraphy image database

Wesley Calixto, Solange Nogueira, Fernanda Luz, et al.

This database provides a collection of myocardial perfusion scintigraphy images. The dataset encompasses a diversity of clinical cases, including various perfusion patterns and underlying cardiac conditions.

nifti artificial intelligence anonymization clinical diagnosis myocardial perfusion systems modeling myocardial perfusion scintigraphy metadata ventricular walls coronary artery disease convolutional neural networks automated segmentation dicom

Published: Sept. 9, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext Triage Instruction Corpus

Qingyang Shen, Quan Guo

MIMIC-IV-Ext Triage Instruction Corpus includes 9,629 emergency department (ED) triage cases organized by the five-level Emergency Severity Index (ESI), enabling LLMs to improve triage accuracy. It provides CSV data, generation prompts, expert validation samples, and SQL quality-control (QC) scripts.

nlp clinical decision support machine learning large language models emergency severity index emergency triage

Published: March 4, 2025. Version: 1.0.0


Database Credentialed Access

Annotation dataset of problematic opioid use and related contexts from MIMIC-III Critical Care Database discharge summaries

Melissa Poulsen, Vanessa Troiani, Philip Freda, et al.

The database contains a corpus of annotated data drawn from the MIMIC-III Critical Care Database. It was produced by a study that aimed to develop and apply an annotation schema to characterize opioid use disorder and related contextual factors.

opioid use disorder substance use natural language processing clinical notes

Published: Feb. 8, 2023. Version: 1.0.0


Database Credentialed Access

AMR-UTI: Antimicrobial Resistance in Urinary Tract Infections

Michael Oberst, Soorajnath Boominathan, Helen Zhou, et al.

AMR-UTI is a freely accessible dataset derived from electronic health record (EHR) information on over 100,000 urinary tract infections (UTIs) treated at Massachusetts General Hospital and Brigham & Women's Hospital in Boston, MA, USA.

antibiotic resistance causal inference policy learning urinary tract infection antimicrobial resistance clinical decision support machine learning

Published: Nov. 4, 2020. Version: 1.0.0


Database Credentialed Access

MIMIC-III-Ext-CA: a MIMIC-III Derived Dataset of Cardiac Arrests in Photoplethysmographs

Gerben Hup, Xi Long, Rik Vullings

The MIMIC-III-Ext-CA dataset contains annotations of 31 PPG-captured cardiac arrest episodes from the MIMIC-III clinical and waveform databases.

ppg photoplethysmography mimic-iii cardiac arrest out-of-hospital cardiac arrest ohca

Published: March 10, 2026. Version: 1.0.0


Database Open Access

tOLIet: Single-lead Thigh-based Electrocardiography Using Polimeric Dry Electrodes

Aline Santos Silva, Hugo Plácido da Silva, Miguel Correia, et al.

We present tOLIet, the first thigh ECG dataset with real signals captured by a toilet seat with electrodes. There are 149 recordings from 86 people, useful for research into cardiovascular assessment using "invisible" ECG.

Published: Feb. 2, 2026. Version: 1.0.1