Resources


Database Credentialed Access

Immunosuppressive Condition and Medication Annotations for Admission Notes in the MIMIC-III Database

Vijeeth Guggilla, Melissa Bak, Mengjia Kang, et al.

This database contains 200 MIMIC-III admission notes with adjudicated labels for histories of various immunosuppressive conditions and usage of various immunosuppressive medications.

Published: Aug. 4, 2025. Version: 1.0.0


Database Credentialed Access

SCRIPT X2B8 Dataset: per-day clinical features to model successful next-day extubation

Sam Fenske, Alec Peltekian, Mengjia Kang, et al.

This dataset contains electronic health record (EHR) data from ICU patients receiving mechanical ventilation, aggregated on a daily basis, along with annotations of intubation, extubation, tracheostomy days, and cases of failed extubation.

Published: Jan. 28, 2025. Version: 1.0.0


Database Credentialed Access

TherLid: A Thermometry Linked Dataset

Jeremy Tan, Inês Martins, João Matos, et al.

TherLiD is an open-source dataset of 13,251 paired temperature readings (contact and infrared) from the MIMIC-IV and eICU databases. With added demographics and derived data, it supports research on racial and ethnic disparities in infrared thermometry.

thermometry, intensive care unit, health equity, electronic health records

Published: Jan. 21, 2025. Version: 1.0.0


Database Credentialed Access

ENCoDE, mEasuring skiN Color to correct pulse Oximetry DisparitiEs: skin tone and clinical data from a prospective trial on acute care patients.

Sicheng Hao, Katelyn Dempsey, João Matos, et al.

A prospectively collected, EHR-linked database of skin tone measurements in OMOP format, with an emphasis on pulse oximetry disparities.

Published: Aug. 22, 2024. Version: 1.0.0


Database Credentialed Access

EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries

Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak, et al.

An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries

Published: June 26, 2024. Version: 1.0.1


Database Credentialed Access

ODD: A Benchmark Dataset for the NLP-based Opioid Related Aberrant Behavior Detection

Sunjae Kwon, Xun Wang, Weisong Liu, et al.

The Opioid-Related Aberrant Behaviors (ORABs) Detection Dataset (ODD) is a large, expert-annotated, multi-label classification benchmark dataset for the ORAB detection task.

substance use, natural language processing, opioid related aberrant behavior

Published: Jan. 11, 2024. Version: 1.0.0


Challenge Credentialed Access

BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization

Yanjun Gao, Dmitriy Dligach, Timothy Miller, et al.

This is the data repository for the BioNLP Workshop Shared Task 1A: Problem List Summarization.

bionlp, clinical natural language processing, electronic health record, summarization

Published: Nov. 12, 2023. Version: 2.0.0


Database Credentialed Access

BOLD, a blood-gas and oximetry linked dataset

João Matos, Tristan Struja, Jack Gallifant, et al.

An open-source pulse oximetry and arterial blood gas dataset, derived from MIMIC-III, MIMIC-IV, and eICU-CRD.

pulse oximetry, intensive care unit, health equity, electronic health records

Published: Nov. 8, 2023. Version: 1.0


Database Open Access

MIMIC-IV Clinical Database Demo

Alistair Johnson, Lucas Bulgarelli, Tom Pollard, et al.

An openly available subset of patients in the MIMIC-IV database.

critical care, electronic health record, mimic

Published: Jan. 31, 2023. Version: 2.2


Database Credentialed Access

Tasks 1 and 3 from Progress Note Understanding Suite of Tasks: SOAP Note Tagging and Problem List Summarization

Yanjun Gao, John Caskey, Timothy Miller, et al.

We introduce a hierarchical annotation suite of tasks addressing clinical text understanding, reasoning and abstraction over evidence, and diagnosis summarization. One task is tagging the major sections of a note, and the other is diagnosis generation.

Published: Sept. 30, 2022. Version: 1.0.0