Resources
Database Credentialed Access
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays
Hyungyung Lee, Geon Choi, Jung Oh Lee, Hangyul Yoon, Hyuk Gi Hong, Edward Choi
evaluation chest x-ray benchmark structured chest x-ray qa intermediate reasoning steps structured reasoning grounded reasoning diagnostic reasoning structured diagnostic pipeline
Published: Oct. 23, 2025. Version: 1.0.1
Database Credentialed Access
MIMIC-IV-Ext-MDS-ED: Multimodal Decision Support in the Emergency Department - a Benchmark Dataset for Diagnoses and Deterioration Prediction in Emergency Medicine
Juan Miguel Lopez Alcaraz, Nils Strodthoff
emergency department ecg diagnoses prediction deterioration prediction benchmark multimodal
Published: Sept. 12, 2024. Version: 1.0.0
Database Contributor Review
ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room
Mel Molina, Nikita Mehandru, Niloufar Golchini, Ahmed Alaa
Published: Oct. 23, 2025. Version: 1.0.0
Database Credentialed Access
FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark
Mingjie Li, Wenjia Cai, Rui Liu, Yuetian Weng, Tengfei Liu, Cong Wang, Xin Chen, Zhong Liu, Caineng Pan, Mengke Li, Yingfeng Zheng, Yizhi Liu, Flora Salim, Karin Verspoor, Xiaodan Liang, Xiaojun Chang
fundus fluorescein angiography medical report generation vision and language explainable and reliable evaluation
Published: Jan. 21, 2025. Version: 1.1.0
Database Credentialed Access
EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries
Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak, Dongchul Cha, Hangyul Yoon, Kwang Hyun Kim, Jeewon Yang, Seunghyun Won, Edward Choi
Published: June 26, 2024. Version: 1.0.1
Database Credentialed Access
MIMIC-IV-ECHO-Ext-MIMICEchoQA: A Benchmark Dataset for Echocardiogram-Based Visual Question Answering
Rahul Thapa, Andrew Li, Qingyang Wu, Bryan He, Yuki Sahashi, Christina Binder-Rodriguez, Angela Zhang, David Ouyang, James Zou
Published: Oct. 7, 2025. Version: 1.0.0
Database Credentialed Access
CXR-Align: A Benchmark for CXR-Report Alignment with Negations
Hanbin Ko
Published: Aug. 21, 2025. Version: 1.0.0
Database Credentialed Access
ODD: A Benchmark Dataset for the NLP-based Opioid Related Aberrant Behavior Detection
Sunjae Kwon, Xun Wang, Weisong Liu, Emily Druhl, Minhee Sung, Joel Reisman, Wenjun Li, Robert Kerns, William Becker, Hong Yu
substance use natural language processing opioid related aberrant behavior
Published: Jan. 11, 2024. Version: 1.0.0