Resources


Challenge Credentialed Access

CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays

Gregory Holste, Mingquan Lin, Song Wang, et al.

CXR-LT 2024 was a challenge for long-tailed, multi-label, and zero-shot thorax disease classification on chest X-rays, held at MICCAI 2024. This page contains long-tailed labels for 45 diseases from the CXR-LT 2024 and 2023 challenges.

disease classification artificial intelligence chest x-ray deep learning computer-aided diagnosis long-tailed learning cardiopulmonary disease zero-shot learning

Published: March 19, 2025. Version: 2.0.0


Database Open Access

VitalDB Arrhythmia Database: An Anesthesiologist-Validated Large-Scale Intraoperative Arrhythmia Dataset with Beat and Rhythm Labels

Dain Eun, Kayoung Shim, Hyunsoo Lee, et al.

We present a comprehensive intraoperative arrhythmia dataset with 734,528 seconds of ECG recordings from 482 patients, featuring over 660,000 beats annotated and validated by five anesthesiologists.

ppg vitaldb ecg arterial waveform intraoperative dataset

Published: Feb. 26, 2026. Version: 1.0.0


Database Restricted Access

MIMIC-IV-Ext-Apixaban-Trial-Criteria-Questions

Elizabeth Woo, Michael Craig Burkhart, Emily Alsentzer, et al.

We created 23 questions resembling eligibility criteria from the apixaban clinical trial and evaluated them on a random sample of 100 patient notes from MIMIC-IV. This yields 2,300 question-answer pairs (23 questions × 100 notes), which we release here as a dataset.

clinical q and a evaluation set clinical trial eligibility

Published: April 30, 2025. Version: 1.0.0


Database Restricted Access

MIMIC-III-Ext-Synthetic-Clinical-Trial-Questions

Elizabeth Woo, Michael Craig Burkhart, Emily Alsentzer, et al.

In our recent study, we used Llama-3.1-70B-Instruct to generate synthetic training examples resembling clinical trial eligibility criteria. We manually reviewed 1000 of these examples and release them here.

large language models synthetic data distillation clinical trial eligibility

Published: April 22, 2025. Version: 1.0.0


Database Open Access

Image-derived cardiomegaly biomarker values for 96K chest X-rays in MIMIC-CXR/MIMIC-CXR-JPG

Benjamin Duvieusart, Felix Krones, Guy Parsons, et al.

Automatically extracted cardiomegaly biomarkers, the cardiothoracic ratio (CTR) and the cardiopulmonary area ratio (CPAR), for all posteroanterior (PA) chest X-ray scans in MIMIC-CXR/MIMIC-CXR-JPG.

biomarkers mimic-cxr cpar ctr cardiomegaly

Published: Aug. 23, 2024. Version: 1.0.0
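The CTR in the entry above has a standard definition: the maximal horizontal width of the heart divided by the maximal horizontal width of the thorax. The following is a minimal sketch of how such a ratio, together with an area-based CPAR, could be computed from binary heart and lung segmentation masks. It is not the extraction pipeline used to build this dataset. The mask format, the function names, approximating thoracic width by the horizontal extent of the lung mask, and defining CPAR as heart area divided by lung area are all assumptions made for illustration.

```python
from typing import List

# A 2-D binary segmentation mask: 1 = structure present, 0 = background.
# Hypothetical input format; the dataset's own pipeline is not described here.
Mask = List[List[int]]


def max_horizontal_width(mask: Mask) -> int:
    """Widest left-to-right extent of the foreground, in pixels.

    Uses the leftmost and rightmost foreground columns over the whole mask,
    so the two extremes may come from different rows.
    """
    cols = [c for row in mask for c, v in enumerate(row) if v]
    return (max(cols) - min(cols) + 1) if cols else 0


def area(mask: Mask) -> int:
    """Number of foreground pixels."""
    return sum(sum(row) for row in mask)


def cardiothoracic_ratio(heart: Mask, lungs: Mask) -> float:
    """CTR = maximal horizontal cardiac width / maximal horizontal thoracic width.

    Assumption: thoracic width is approximated by the lung mask's horizontal extent.
    """
    thorax = max_horizontal_width(lungs)
    if thorax == 0:
        raise ValueError("lung mask is empty")
    return max_horizontal_width(heart) / thorax


def cardiopulmonary_area_ratio(heart: Mask, lungs: Mask) -> float:
    """CPAR, assumed here to be heart area divided by lung area."""
    lung_area = area(lungs)
    if lung_area == 0:
        raise ValueError("lung mask is empty")
    return area(heart) / lung_area


# Toy 4 x 10 masks (illustrative only, not real data).
TOY_LUNGS: Mask = [
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
TOY_HEART: Mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

print(cardiothoracic_ratio(TOY_HEART, TOY_LUNGS))        # 6 / 10 = 0.6
print(cardiopulmonary_area_ratio(TOY_HEART, TOY_LUNGS))  # 10 / 16 = 0.625
```

Because both measurements in each ratio are taken on the same pixel grid, the pixel spacing cancels and the ratios are unitless. On a posteroanterior film, a CTR above 0.5 is the conventional threshold for cardiomegaly.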


Database Credentialed Access

EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries

Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak, et al.

An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries

Published: June 26, 2024. Version: 1.0.1


Database Credentialed Access

Tasks 1 and 3 from Progress Note Understanding Suite of Tasks: SOAP Note Tagging and Problem List Summarization

Yanjun Gao, John Caskey, Timothy Miller, et al.

We introduce a hierarchical annotation suite of tasks addressing clinical text understanding, reasoning and abstraction over evidence, and diagnosis summarization. This release contains two of them: SOAP note tagging, which labels each section of a progress note with its major SOAP section, and problem list summarization, which generates the patient's diagnoses.

Published: Sept. 30, 2022. Version: 1.0.0


Database Credentialed Access

Embedding-Based Representations for BRSET and mBRSET

David Restrepo, Chenwei Wu, Michael Morley, et al.

Precomputed image embeddings for the BRSET and mBRSET Brazilian retinal datasets to support efficient, secure, and equitable ophthalmic AI research, enabling tasks such as classification, clustering, multimodal modeling, and fairness analysis.

computer vision ophthalmology vector embeddings

Published: March 30, 2026. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-MedicalBench: Evaluating Large Language Models Towards Improved Medical Concept Extraction

Zhichao Yang, Gregory Lyng, Sanjit Batra, et al.

This dataset is an evidence-grounded benchmark built on MIMIC-IV discharge summaries that evaluates how well large language models can verify ICD-10 medical concepts, including implicitly documented diagnoses, by identifying supporting text evidence.

Published: March 23, 2026. Version: 1.0.0


Model Credentialed Access

Fine-tuning foundational models to code diagnoses from veterinary health records

Adam Kiehl, Nadia Saklou, G Joseph Strecker, et al.

Fine-tuned GatorTron LLM for veterinary diagnosis coding to 7,739 SNOMED-CT codes based on clinical summary text from the Colorado State University Veterinary Teaching Hospital.

transformers natural language processing large language models foundational models one health diagnoses snomed-ct veterinary medicine omop cdm veterinary medical records clinical coding

Published: Jan. 25, 2026. Version: 1.0.0