Resources


Database Credentialed Access

MIMIC-III-Ext-Notes

Darren Liu, Monique Bouvier, Delgersuren Bold, et al.

We evaluated general large language models' performance in clinical information extraction on MIMIC-III notes.

Published: Feb. 27, 2026. Version: 1.0.0


Challenge Credentialed Access

SNOMED CT Entity Linking Challenge

Will Hardman, Mark Banks, Rory Davidson, et al.

272 discharge notes from the MIMIC-IV-Note dataset annotated with SNOMED CT concepts.

snomed entity linking clinical annotation

Published: Feb. 17, 2026. Version: 1.2.1


Database Credentialed Access

Predictors of Hospital Onset Infection: A Matched Retrospective Cohort Dataset

Ziming Wei, Luke Sagers, Caroline McKenna, et al.

NPA-CP is a freely accessible dataset derived from electronic health record (EHR) information at MGB between 2015 and 2024. The dataset includes 11 different pathogens and can be used to predict hospital-onset infections for these pathogens.

electronic health records infection control clinical machine learning infectious diseases hospital onset infection colonization pressure

Published: Nov. 4, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext clinical decision support for referral, triage and diagnosis

Farieda Gaber, Altuna Akalin

This MIMIC-IV extended dataset is designed to evaluate and improve LLMs' ability to assist with triage, specialist referral, and diagnosis, using critical patient information such as history of present illness,vitals signs and other relevant data.

Published: Oct. 8, 2025. Version: 1.0.2


Database Open Access

MIMIC-IV demo data in the Medical Event Data Standard (MEDS)

Robin Philippus van de Water, Ethan Steinberg, Michael Wornow, et al.

MIMIC-IV Clinical Database Demo in MEDS (Medical Event Data Standard) format.

ehr critical care electronic health record machine learning mimic meds medical event data standard

Published: Sept. 29, 2025. Version: 0.0.1


Database Credentialed Access

MIMIC-Ext-DrugDetection

Fabrice Harel-Canada, Nanyun Peng, David Goodman, et al.

This project offers a multilabel annotated dataset of clinical note sentences from MIMIC-III/IV for substance use detection. It supports NLP research for identifying various co-occurring drug use mentions in patient records.

ehr mimic-iv substance use clinical notes methamphetamine multi-label cocaine drug detection polysubstance use prescription opioid misuse cannabis benzodiazepine misuse injection drug use heroin mimic-iii

Published: Sept. 25, 2025. Version: 1.0.0


Database Credentialed Access

Annotated Social Determinants of Health Dataset for Adverse Pregnancy Outcomes

Nidhi Soley, MaKhaila Bentil, Jash Shah, et al.

This project provides a manually annotated dataset of social determinants of health—social support, occupation, and substance use—linked to pregnancy outcomes, extracted from MIMIC-III and MIMIC-IV discharge summary notes.

Published: Aug. 4, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-Ext-CXR-QBA: A Structured, Tagged, and Localized Visual Question Answering Dataset with Question-Box-Answer Triplets and Scene Graphs for Chest X-ray Images

Philip Müller, Friederike Jungmann, Georgios Kaissis, et al.

We present a large-scale CXR VQA dataset derived from MIMIC-CXR with 42M QA pairs, featuring hierarchical answers, bounding boxes, and structured tags. We generated QA-pairs using LLM-based extraction from radiology reports and localization models.

chest x-rays vqa localization scene graphs

Published: July 22, 2025. Version: 1.0.0


Database Open Access

Hillel Yaffe Glaucoma Dataset (HYGD): A Gold-Standard Annotated Fundus Dataset for Glaucoma Detection

Or Abramovich, Hadas Pizem, Jonathan Fhima, et al.

HYGD is a rigorously annotated fundus image dataset with gold-standard clinical labels designed to improve and benchmark deep learning models for accurate glaucoma detection.

ophthalmology retina dfi gold-standard gon fundus glaucoma

Published: June 3, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext Cardiac Disease

Jiawei Cao, Sendong Zhao

The subset of the MIMIC-IV dataset includes the examination results and diagnostic information of 4,761 cardiac disease patients. The examination results for each patient are listed separately as evidence for the final diagnosis.

Published: May 6, 2025. Version: 1.0.0