Resources


Database Credentialed Access

MS-CXR-T: Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing

Shruthi Bannur, Stephanie Hyland, Qianchu Liu, et al.

MS-CXR-T is a multimodal benchmark that extends the MIMIC-CXR v2 dataset with expert-verified annotations. Its goal is to evaluate how well biomedical vision-language processing models capture temporal semantics in images and text.

disease progression cxr vision-language processing chest x-ray radiology multimodal

Published: March 17, 2023. Version: 1.0.0


Database Credentialed Access

Tasks 1 and 3 from Progress Note Understanding Suite of Tasks: SOAP Note Tagging and Problem List Summarization

Yanjun Gao, John Caskey, Timothy Miller, et al.

We introduce a hierarchical annotation suite of tasks addressing clinical text understanding, reasoning and abstraction over evidence, and diagnosis summarization. One task is major-section tagging and the other is diagnosis generation.

Published: Sept. 30, 2022. Version: 1.0.0


Database Open Access

Cerebral perfusion and cognitive decline in type 2 diabetes

Vera Novak, Rodrigo Quispe, Charles Saunders

Dataset collected during a study of the effects of type 2 diabetes on brain blood flow, vasoreactivity, and functional outcomes (gait and balance), using TCD, MRI perfusion, foot pressure distribution, and gait measures.

vasoregulation brain diabetes

Published: Aug. 5, 2022. Version: 1.0.1


Database Restricted Access

VinDr-SpineXR: A large annotated medical image dataset for spinal lesions detection and classification from radiographs

Hieu Huy Pham, Hieu Nguyen Trung, Ha Quy Nguyen

Published: Aug. 24, 2021. Version: 1.0.0


Database Credentialed Access

FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark

Mingjie Li, Wenjia Cai, Rui Liu, et al.

Benchmark dataset for report generation based on fundus fluorescein angiography images and reports.

fundus fluorescein angiography medical report generation vision and language explainable and reliable evaluation

Published: Jan. 21, 2025. Version: 1.1.0


Database Credentialed Access

MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, et al.

We introduce MIMIC-Ext-MIMIC-CXR-VQA, a complex, diverse, and large-scale dataset designed for Visual Question Answering (VQA) tasks within the medical domain, focusing primarily on chest radiographs.

question answering machine learning electronic health records evaluation chest x-ray radiology benchmark multimodal deep learning visual question answering

Published: July 19, 2024. Version: 1.0.0


Database Open Access

Brugada-HUCA: 12-Lead ECG Recordings for the Study of Brugada Syndrome

Nahuel Costa Cortez, Daniel Garcia Iglesias

Brugada syndrome is a rare but potentially life-threatening cardiac arrhythmia disorder, with an elevated risk of sudden cardiac death. This dataset provides 12-lead ECG recordings gathered to support the study of this rare disease.

Published: Feb. 2, 2026. Version: 1.0.0


Model Credentialed Access

Fine-tuning foundational models to code diagnoses from veterinary health records

Adam Kiehl, Nadia Saklou, G Joseph Strecker, et al.

Fine-tuned GatorTron LLM for veterinary diagnosis coding to 7,739 SNOMED-CT codes based on clinical summary text from the Colorado State University Veterinary Teaching Hospital.

transformers natural language processing large language models foundational models one health diagnoses snomed-ct veterinary medicine omop cdm veterinary medical records clinical coding

Published: Jan. 25, 2026. Version: 1.0.0


Database Open Access

PSG-IPA: A PolySomnoGraphic Inter-scorer Performance Assessment database

Diego Alvarez-Estevez

The HMC-IPA dataset comprises 20 PSG recordings, each with manual and computer-assisted scorings by 12 sleep technologists, for studying inter-scorer variability and evaluating automated sleep analysis algorithms.

Published: Jan. 8, 2026. Version: 1.0.0


Database Credentialed Access

EchoGraph-annotated ECHO-NOTE2NUM examples

Chieh-Ju Chao, Mohammad Asadi

EchoGraph is a model that automatically extracts and structures clinical information from echocardiogram reports. The annotated ECHO-NOTE2NUM dataset contains MIMIC-III echo reports augmented with EchoGraph annotations to support future research.

Published: Dec. 3, 2025. Version: 1.0.0