Resources


Database Credentialed Access

LLaVA-Rad MIMIC-CXR Annotations

Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu-Hsin Wei, Tristan Naumann, Muhao Chen, Matthew Lungren, Akshay Chaudhari, Serena Yeung, Curtis Langlotz, Sheng Wang, Hoifung Poon

This dataset provides GPT-4 extracted sections of radiology reports from MIMIC-CXR, complementing rule-based section extractions with additional reports with findings, and removing references to priors from findings.

Published: Jan. 24, 2025. Version: 1.0.0


Database Credentialed Access

Generalized Image Embeddings for the MIMIC Chest X-Ray dataset

Andrew Sellergren, Atilla Kiraly, Tom Pollard, Wei-Hung Weng, Yun Liu, Akib Uddin, Christina Chen

This database contains compact information-rich embeddings of the MIMIC-CXR Database v2.0.0 using the CXR Foundation API v1.0.

Published: Feb. 22, 2023. Version: 1.0


Database Open Access

Synthetic Mention Corpora for Disease Entity Recognition and Normalization

Kuleen Sasse, John David Osborne

We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, containing 128000 disease mentions from the UMLS disorder group, generated by an LLM. This corpus aims to improve these tasks in biomedical and clinical texts.

nlp, named entity recognition, machine learning, data augmentation, entity normalization

Published: Feb. 3, 2025. Version: 1.0.0


Database Credentialed Access

MeDiSumQA: Patient-Oriented Question-Answer Generation from Discharge Letters

Amin Dada, Osman Alperen Koras, Marie Bauer, Amanda Butler, Kaleb Smith, Jens Kleesiek, Julian Friedrich

MeDiSumQA is a dataset of patient-oriented QA pairs from MIMIC-IV discharge summaries, designed to evaluate LLMs in generating safe, patient-friendly medical responses for clinical QA and healthcare communication.

Published: May 5, 2025. Version: 1.0.0


Database Credentialed Access

Tasks 1 and 3 from Progress Note Understanding Suite of Tasks: SOAP Note Tagging and Problem List Summarization

Yanjun Gao, John Caskey, Timothy Miller, Brihat Sharma, Matthew Churpek, Dmitriy Dligach, Majid Afshar

We introduce a hierarchical annotation suite of tasks addressing clinical text understanding, reasoning and abstraction over evidence, and diagnosis summarization. One task is major-section tagging of SOAP notes and the other is diagnosis generation.

Published: Sept. 30, 2022. Version: 1.0.0


Database Credentialed Access

Annotated Question-Answer Pairs for Clinical Notes in the MIMIC-III Database

Xiang Yue, Xinliang Frederick Zhang, Huan Sun

Annotated Question Answering Pairs for Clinical Notes in the MIMIC-III Database

clinical question answering, clinical nlp, clinical reading comprehension

Published: Jan. 15, 2021. Version: 1.0.0


Database Open Access

ReXErr-v1: Clinically Meaningful Chest X-Ray Report Errors Derived from MIMIC-CXR

Vishwanatha Rao, Serena Zhang, Julian Acosta, Subathra Adithan, Pranav Rajpurkar

Chest X-Ray reports containing synthetic errors based upon the MIMIC-CXR database. Errors were injected using LLMs and sampled across common human and AI model errors.

Published: March 19, 2025. Version: 1.0.0


Challenge Credentialed Access

BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization

Yanjun Gao, Dmitriy Dligach, Timothy Miller, Majid Afshar

This is the data storage for BioNLP Workshop Shared Task 1A: Problem List Summarization.

bionlp, clinical natural language processing, electronic health record, summarization

Published: Nov. 12, 2023. Version: 2.0.0


Database Restricted Access

MIMIC-III-Ext-Synthetic-Clinical-Trial-Questions

Elizabeth Woo, Michael Craig Burkhart, Emily Alsentzer, Brett Beaulieu-Jones

In our recent study, we used Llama-3.1-70B-Instruct to generate synthetic training examples resembling clinical trial eligibility criteria. We manually reviewed 1000 of these examples and release them here.

large language models synthetic data distillation clinical trial eligibility

Published: April 22, 2025. Version: 1.0.0

