Resources


Model Credentialed Access

Me-LLaMA: Foundation Large Language Models for Medical Applications

Qianqian Xie, Qingyu Chen, Aokun Chen, Cheng Peng, Yan Hu, Fongci Lin, Xueqing Peng, Jimin Huang, Jeffrey Zhang, Vipina Keloth, Xinyu Zhou, Huan He, Lucila Ohno-Machado, Yonghui Wu, Hua Xu, Jiang Bian

Me-LLaMA is a family of large language models for medical applications trained using clinical text with LLaMA2 models as the base. We release model weights for the foundation models as well as the chat-enhanced models.

large language models

Published: June 5, 2024. Version: 1.0.0


Database Restricted Access

Swiss-Mammo: A physician-written, synthetic dataset of German mammography reports

Daniel Reichenpfader, Sandro von Däniken, Harald Marcel Bonel

Swiss-Mammo is a physician-written, synthetic dataset of 28 German mammography reports. The dataset is stratified by BI-RADS category and available in German and English.

radiology mammography structured reporting bi-rads

Published: June 24, 2025. Version: 1.0.1


Database Credentialed Access

EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric Chang, Tackeun Kim, Edward Choi

We present EHRXQA, the first multi-modal EHR QA dataset combining structured patient records with aligned chest X-ray images. EHRXQA contains a comprehensive set of QA pairs covering image-related, table-related, and image+table-related questions.

question answering chest x-ray benchmark evaluation multi-modal question answering ehr question answering semantic parsing machine learning deep learning electronic health records visual question answering

Published: July 23, 2024. Version: 1.0.0


Database Open Access

Hillel Yaffe Glaucoma Dataset (HYGD): A Gold-Standard Annotated Fundus Dataset for Glaucoma Detection

Or Abramovich, Hadas Pizem, Jonathan Fhima, Eran Berkowitz, Ben Gofrit, Jan Van Eijgen, Eytan Blumenthal, Joachim Behar

HYGD is a rigorously annotated fundus image dataset with gold-standard clinical labels designed to improve and benchmark deep learning models for accurate glaucoma detection.

ophthalmology retina dfi gold-standard gon fundus glaucoma

Published: June 3, 2025. Version: 1.0.0


Database Credentialed Access

RadNLI: A natural language inference dataset for the radiology domain

Yasuhide Miura, Yuhao Zhang, Emily Tsai, Curtis Langlotz, Dan Jurafsky

A radiology natural language inference (NLI) dataset introduced in the paper "Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation".

Published: June 29, 2021. Version: 1.0.0


Database Credentialed Access

FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark

Mingjie Li, Wenjia Cai, Rui Liu, Yuetian Weng, Tengfei Liu, Cong Wang, Xin Chen, Zhong Liu, Caineng Pan, Mengke Li, Yingfeng Zheng, Yizhi Liu, Flora Salim, Karin Verspoor, Xiaodan Liang, Xiaojun Chang

Benchmark dataset for report generation based on fundus fluorescein angiography images and reports.

fundus fluorescein angiography medical report generation vision and language explainable and reliable evaluation

Published: Jan. 21, 2025. Version: 1.1.0


Database Credentialed Access

RadGraph2: Tracking Findings Over Time in Radiology Reports

Adam Dejl, Sameer Khanna, Patricia Therese Pile, Kibo Yoon, Steven QH Truong, Hanh Duong, Agustina Saenz, Pranav Rajpurkar

RadGraph2 is a dataset of 800 chest radiology reports annotated using a fine-grained entity-relationship schema, which captures key findings as well as mentions of changes that occurred in comparison with the previous radiology studies.

chest x-rays relation extraction disease progression information extraction radiology reports named entity recognition

Published: Aug. 8, 2024. Version: 1.0.0


Database Restricted Access

Pulmonary Edema Severity Grades Based on MIMIC-CXR

Ruizhi Liao, Geeticka Chauhan, Polina Golland, Seth Berkowitz, Steven Horng

Pulmonary edema metadata and labels for MIMIC-CXR

Published: Feb. 9, 2021. Version: 1.0.1


Database Credentialed Access

Medical-CXR-VQA dataset: A Large-Scale LLM-Enhanced Medical Dataset for Visual Question Answering on Chest X-Ray Images

Xinyue Hu, Lin Gu, Kazuma Kobayashi, Liangchen Liu, Mengliang Zhang, Tatsuya Harada, Ronald Summers, Yingying Zhu

Medical-CXR-VQA provides a large-scale LLM-enhanced dataset for visual question answering in medical chest x-ray images.

Published: Jan. 21, 2025. Version: 1.0.0