Resources


Database Credentialed Access

MIMIC-IV-ECHO-Ext-MIMICEchoQA: A Benchmark Dataset for Echocardiogram-Based Visual Question Answering

Rahul Thapa, Andrew Li, Qingyang Wu, et al.

We present MIMICEchoQA, a benchmark dataset for echocardiogram-based question answering, built from the publicly available MIMIC-IV-ECHO database.

Published: Oct. 7, 2025. Version: 1.0.0


Database Credentialed Access

CXR-Align: A Benchmark for CXR-Report Alignment with Negations

Hanbin Ko

CXR-Align is a benchmark dataset created to evaluate vision-language models' capability to interpret negations in chest X-ray (CXR) reports, featuring systematically modified reports from MIMIC-CXR.

Published: Aug. 21, 2025. Version: 1.0.0


Database Credentialed Access

ODD: A Benchmark Dataset for the NLP-based Opioid Related Aberrant Behavior Detection

Sunjae Kwon, Xun Wang, Weisong Liu, et al.

The Opioid-Related Aberrant Behaviors (ORABs) Detection Dataset (ODD) is a large, expert-annotated, multi-label classification benchmark dataset for the ORAB detection task.

substance use, natural language processing, opioid related aberrant behavior

Published: Jan. 11, 2024. Version: 1.0.0


Database Open Access

VTaC: A Benchmark Dataset of Ventricular Tachycardia Alarms from ICU Monitors

Li-wei Lehman, Benjamin Moody, Lucas McCullum, et al.

VTaC is an annotated ventricular tachycardia (VT) arrhythmia alarm database containing over 5,000 waveform recordings with VT alarms from ICU monitors, with each alarm labeled as either true or false by at least two human expert annotators.

arrhythmia, machine learning, icu, false alarms, benchmark dataset, ventricular tachycardia

Published: Oct. 1, 2024. Version: 1.0


Database Credentialed Access

MIMIC-III-Ext-PPG: A PPG Benchmark Dataset for Cardiorespiratory Analysis

Mohammad Moulaeifard, Peter H Charlton, Nils Strodthoff

Large-Scale, Quality-Assessed PPG-based Benchmark Dataset for Cardiovascular and Respiratory Signal Analysis based on MIMIC-III

blood pressure, critical care, electrocardiogram, photoplethysmography, signal quality, heart rhythm, respiratory rate

Published: March 17, 2026. Version: 1.1.0


Database Restricted Access

VinDr-Mammo: A large-scale benchmark dataset for computer-aided detection and diagnosis in full-field digital mammography

Hieu Huy Pham, Hieu Nguyen Trung, Ha Quy Nguyen

A large-scale benchmark dataset for computer-aided detection and diagnosis in mammography

Published: March 21, 2022. Version: 1.0.0


Database Credentialed Access

MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, et al.

We introduce MIMIC-Ext-MIMIC-CXR-VQA, a complex, diverse, and large-scale dataset designed for Visual Question Answering (VQA) tasks within the medical domain, focusing primarily on chest radiographs.

question answering, machine learning, electronic health records, evaluation, chest x-ray, radiology, deep learning, benchmark, multimodal, visual question answering

Published: July 19, 2024. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-MedicalBench: Evaluating Large Language Models Towards Improved Medical Concept Extraction

Zhichao Yang, Gregory Lyng, Sanjit Batra, et al.

This dataset is an evidence‑grounded benchmark built on MIMIC‑IV discharge summaries that evaluates how well large language models can verify ICD‑10 medical concepts, including implicitly documented diagnoses, by identifying supporting text evidence.

Published: March 23, 2026. Version: 1.0.0


Database Credentialed Access

MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context

Zishan Gu, Jiayuan Chen, Fenglin Liu, et al.

MedVH provides a visual hallucination evaluation benchmark for large language models in the medical context. It formulates tests using chest X-ray images, including multi-choice question answering and long-text generation tasks.

Published: Dec. 10, 2025. Version: 1.0.1