Resources


Database Credentialed Access

MeDiSumQA: Patient-Oriented Question-Answer Generation from Discharge Letters

Amin Dada, Osman Alperen Koras, Marie Bauer, Amanda Butler, Kaleb Smith, Jens Kleesiek, Julian Friedrich

MeDiSumQA is a dataset of patient-oriented QA pairs from MIMIC-IV discharge summaries, designed to evaluate LLMs in generating safe, patient-friendly medical responses for clinical QA and healthcare communication.

Published: May 5, 2025. Version: 1.0.0


Database Credentialed Access

Learning to Ask Like a Physician: a Discharge Summary Clinical Questions (DiSCQ) Dataset

Eric Lehman

A dataset of questions asked by medical experts about patients. The experts read a discharge summary line by line, (1) asking any questions they have and (2) recording what in the text "triggered" each question.

question generation question answering machine learning

Published: July 28, 2022. Version: 1.0


Database Credentialed Access

EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems

Konstantin Kotschenreuther

A dataset of question-answer pairs synthetically generated from medical discharge summaries, designed to facilitate the training and development of large language models tailored for healthcare applications.

mimic-iv clinical question-answering medical discharge summaries large language models

Published: Jan. 11, 2024. Version: 1.0.0


Challenge Credentialed Access

ArchEHR-QA: BioNLP at ACL 2025 Shared Task on Grounded Electronic Health Record Question Answering

Sarvesh Soni, Dina Demner-Fushman

A dataset for grounded question answering (QA) from electronic health records (EHRs).

electronic health record question answering clinicians patient portals

Published: April 11, 2025. Version: 1.2


Database Credentialed Access

MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei JI, Eric Chang, Tackeun Kim, Edward Choi

We introduce MIMIC-Ext-MIMIC-CXR-VQA, a complex, diverse, and large-scale dataset designed for Visual Question Answering (VQA) tasks within the medical domain, focusing primarily on chest radiographs.

question answering chest x-ray benchmark evaluation machine learning radiology deep learning multimodal electronic health records visual question answering

Published: July 19, 2024. Version: 1.0.0


Database Credentialed Access

Annotated Question-Answer Pairs for Clinical Notes in the MIMIC-III Database

Xiang Yue, Xinliang Frederick Zhang, Huan Sun

Annotated Question Answering Pairs for Clinical Notes in the MIMIC-III Database

clinical question answering clinical nlp clinical reading comprehension

Published: Jan. 15, 2021. Version: 1.0.0


Database Restricted Access

MIMIC-IV-Ext-Apixaban-Trial-Criteria-Questions

Elizabeth Woo, Michael Craig Burkhart, Emily Alsentzer, Brett Beaulieu-Jones

We created 23 questions resembling eligibility criteria from the apixaban clinical trial and evaluated them on a random sample of 100 patient notes from MIMIC-IV. We release the 2300 total question-answer pairs as a dataset here.

clinical q and a evaluation set clinical trial eligibility

Published: April 30, 2025. Version: 1.0.0