Resources
Database Credentialed Access
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays
Hyungyung Lee, Geon Choi, Jung Oh Lee, Hangyul Yoon, Hyuk Gi Hong, Edward Choi
evaluation chest x-ray benchmark structured chest x-ray qa intermediate reasoning steps structured reasoning grounded reasoning diagnostic reasoning structured diagnostic pipeline
Published: Oct. 23, 2025. Version: 1.0.1
Database Credentialed Access
MIMIC-IV-Ext-MDS-ED: Multimodal Decision Support in the Emergency Department - a Benchmark Dataset for Diagnoses and Deterioration Prediction in Emergency Medicine
Juan Miguel Lopez Alcaraz, Nils Strodthoff
emergency department ecg diagnoses prediction deterioration prediction benchmark multimodal
Published: Sept. 12, 2024. Version: 1.0.0
Database Contributor Review
ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room
Mel Molina, Nikita Mehandru, Niloufar Golchini, Ahmed Alaa
Published: Oct. 23, 2025. Version: 1.0.0
Database Credentialed Access
FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark
Mingjie Li, Wenjia Cai, Rui Liu, Yuetian Weng, Tengfei Liu, Cong Wang, Xin Chen, Zhong Liu, Caineng Pan, Mengke Li, Yingfeng Zheng, Yizhi Liu, Flora Salim, Karin Verspoor, Xiaodan Liang, Xiaojun Chang
fundus fluorescein angiography medical report generation vision and language explainable and reliable evaluation
Published: Jan. 21, 2025. Version: 1.1.0
Database Credentialed Access
EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries
Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak, Dongchul Cha, Hangyul Yoon, Kwang Hyun Kim, Jeewon Yang, Seunghyun Won, Edward Choi
Published: June 26, 2024. Version: 1.0.1
Database Credentialed Access
MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark
Asad Aali, Vasiliki Bikia, Maya Varma, Nicole Chiou, Sophie Ostmeier, Arnav Singhvi, Magdalini Paschali, Ashwin Kumar, Andrew Johnston, Karimar Amador Martinez, Eduardo Perez Guerrero, Paola Cruz Rivera, Sergios Gatidis, Christian Bluethgen, Eduardo Pontes Reis, Eddy Zandee van Rilland, Poonam Hosamani, Kevin Keet, Minjoung Go, Evelyn Ling, David Larson, Curtis Langlotz, Roxana Daneshjou, Jason Hom, Sanmi Koyejo, Emily Alsentzer, Akshay Chaudhari
Published: Nov. 14, 2025. Version: 1.0.1
Database Credentialed Access
MIMIC-IV-ECHO-Ext-MIMICEchoQA: A Benchmark Dataset for Echocardiogram-Based Visual Question Answering
Rahul Thapa, Andrew Li, Qingyang Wu, Bryan He, Yuki Sahashi, Christina Binder-Rodriguez, Angela Zhang, David Ouyang, James Zou
Published: Oct. 7, 2025. Version: 1.0.0
Database Credentialed Access
CXR-Align: A Benchmark for CXR-Report Alignment with Negations
Hanbin Ko
Published: Aug. 21, 2025. Version: 1.0.0