Resources


Database Credentialed Access

CHIFIR: Cytology and Histopathology Invasive Fungal Infection Reports

Vlada Rozova, Anna Khanina, Jasmine Teng, et al.

A corpus of cytology and histopathology reports annotated for terminology relevant to fungal infections. Ideal for validation of named entity recognition and relation extraction methods.

nlp clinical documentation information extraction invasive fungal infections

Published: Feb. 20, 2024. Version: 1.0.2


Database Open Access

KINECAL

Sean Maudsley-Barton, Moi Hoon Yap

A dataset for balance falls-risk assessment and balance impairment analysis

balance posturography clinical tests postural sway falls-risk age-related changes

Published: June 8, 2023. Version: 1.0.3


Database Credentialed Access

CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays

Hyungyung Lee, Geon Choi, Jung Oh Lee, et al.

CheXStruct is an automated pipeline that derives structured diagnostic reasoning steps from chest X-rays. CXReasonBench builds on this to evaluate whether models perform clinically grounded, multi-step reasoning beyond final diagnoses.

evaluation chest x-ray benchmark structured chest x-ray qa intermediate reasoning steps structured reasoning grounded reasoning diagnostic reasoning structured diagnostic pipeline

Published: Oct. 23, 2025. Version: 1.0.1


Database Credentialed Access

CXR-Align: A Benchmark for CXR-Report Alignment with Negations

Hanbin Ko

CXR-Align is a benchmark dataset created to evaluate vision-language models' capability to interpret negations in chest X-ray (CXR) reports, featuring systematically modified reports from MIMIC-CXR.

Published: Aug. 21, 2025. Version: 1.0.0


Database Credentialed Access

Medical-CXR-VQA dataset: A Large-Scale LLM-Enhanced Medical Dataset for Visual Question Answering on Chest X-Ray Images

Xinyue Hu, Lin Gu, Kazuma Kobayashi, et al.

Medical-CXR-VQA provides a large-scale LLM-enhanced dataset for visual question answering in medical chest x-ray images.

Published: Jan. 21, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-ECG-Ext-ICD: Diagnostic labels for MIMIC-IV-ECG

Nils Strodthoff, Juan Miguel Lopez Alcaraz, Wilhelm Haverkamp

Dataset that links ECG records from MIMIC-IV-ECG to ED discharge and hospital discharge diagnoses, which enables to train general ECG prediction models based on clinical labels and facilitates the retrieval of further clinical metadata from MIMIC-IV.

machine learning electrocardiography mimic

Published: Aug. 30, 2024. Version: 1.0.1


Database Credentialed Access

CORAL: expert-Curated medical Oncology Reports to Advance Language model inference

Madhumita Sushil, Vanessa Kennedy, Divneet Mandair, et al.

Medical oncology progress notes annotated with advanced, comprehensive oncology-relevant concepts and relationships.

artificial intelligence information extraction oncology natural language processing electronic health records large language models

Published: Feb. 7, 2024. Version: 1.0


Database Open Access

KINECAL

Sean Maudsley-Barton, Moi Hoon Yap

A dataset for balance falls-risk assessment and balance impairment analysis

balance posturography clinical tests postural sway falls-risk age-related changes

Published: June 8, 2023. Version: 1.0.3


Database Credentialed Access

Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation

Jong Hak Moon, Geon Choi, Paloma Rabaey, et al.

A radiologist-annotated benchmark of structured chest X-ray reports at single and sequential levels, comprising 1,473 reports across 18 relation types and 80 longitudinal cases.

fine-grained structured reports attribute-level clinical reasoning medical text structuring longitudinal clinical reasoning chest x-ray report parsing medical information structuring benchmark dataset for radiology report medical information extraction structured radiology reports temporal relation extraction radiology report benchmarking longitudinal clinical understanding

Published: Jan. 11, 2026. Version: 1.0.0


Database Restricted Access

MIMIC-IV-Ext-Apixaban-Trial-Criteria-Questions

Elizabeth Woo, Michael Craig Burkhart, Emily Alsentzer, et al.

We created 23 questions resembling eligibility criteria from the apixaban clinical trial and evaluated them on a random sample of 100 patient notes from MIMIC-IV. We release the 2300 total question-answer pairs as a dataset here.

clinical q and a evaluation set clinical trial eligibility

Published: April 30, 2025. Version: 1.0.0