Resources


Database Credentialed Access

MIMIC-Ext-CXR-QBA: A Structured, Tagged, and Localized Visual Question Answering Dataset with Question-Box-Answer Triplets and Scene Graphs for Chest X-ray Images

Philip Müller, Friederike Jungmann, Georgios Kaissis, et al.

We present a large-scale CXR VQA dataset derived from MIMIC-CXR with 42M QA pairs, featuring hierarchical answers, bounding boxes, and structured tags. We generated the QA pairs using LLM-based extraction from radiology reports and localization models.

chest x-rays vqa localization scene graphs

Published: July 22, 2025. Version: 1.0.0


Database Restricted Access

Application of Med-PaLM 2 in the refinement of MIMIC-CXR labels

Kendall Park, Rory Sayres, Andrew Sellergren, et al.

This work further refines the CheXpert labels in MIMIC-CXR-JPG 2.0.0 by filtering with Med-PaLM 2, followed by manual review and verification by three US board-certified radiologists.

mimic-cxr labels

Published: Feb. 4, 2025. Version: 1.0.0


Database Restricted Access

Visual Question Answering evaluation dataset for MIMIC CXR

Timo Kohlberger, Charles Lau, Tom Pollard, et al.

This dataset provides 224 VQAs for 40 test set cases, and 111 VQAs for 23 validation set cases of the MIMIC CXR dataset.

Published: Jan. 28, 2025. Version: 1.0.0


Database Credentialed Access

RadCoref: Fine-tuning coreference resolution for different styles of clinical narratives

Yuxiang Liao, Hantao Liu, Irena Spasic

RadCoref is a small subset of MIMIC-CXR with manually annotated coreference mentions and clusters. Based on the annotated data, we fine-tuned a deep neural model and used it to annotate the whole MIMIC-CXR dataset. Both the manual and the model-generated annotations are available.

natural language processing coreference resolution radiology

Published: Jan. 30, 2024. Version: 1.0.0


Database Credentialed Access

LLaVA-Rad MIMIC-CXR Annotations

Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, et al.

This dataset provides GPT-4 extracted sections of radiology reports from MIMIC-CXR, complementing rule-based section extractions with additional reports with findings, and removing references to priors from findings.

Published: Jan. 24, 2025. Version: 1.0.0


Database Restricted Access

LATTE-CXR: Locally Aligned TexT and imagE, Explainable dataset for Chest X-Rays

Elham Ghelichkhan, Tolga Tasdizen

This dataset includes bounding box-statement pairs for chest X-ray images, derived from radiologists’ eye-tracking data (for explainability) and annotations, for local visual-language models.

eye-tracking chest x-ray dataset automatically generated dataset caption-guided object detection image captioning with region-level description grounded radiology report generation phrase grounding xai multi-modal learning local visual-language models localization

Published: Feb. 4, 2025. Version: 1.0.0


Database Open Access

ReXErr-v1: Clinically Meaningful Chest X-Ray Report Errors Derived from MIMIC-CXR

Vishwanatha Rao, Serena Zhang, Julian Acosta, et al.

Chest X-Ray reports containing synthetic errors based upon the MIMIC-CXR database. Errors were injected using LLMs and sampled across common human and AI model errors.

Published: March 19, 2025. Version: 1.0.0


Database Contributor Review

CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools

Eulalia Farre Maduell, Salvador Lima-Lopez, Santiago Andres Frid, et al.

CARMEN-I is a Spanish corpus of 2,000 clinical records from Hospital Clínic, Barcelona. It covers COVID-19 patients and comorbidities, serving as a resource for training clinical NLP models and for researchers applying NLP to clinical documents.

de-identification clinical ner anonymization

Published: April 20, 2024. Version: 1.0.1


Database Credentialed Access

Eye Gaze Data for Chest X-rays

Alexandros Karargyris, Satyananda Kashyap, Ismini Lourentzou, et al.

This dataset was collected using an eye tracking system while a radiologist interpreted and read 1,083 public CXR images. The dataset contains the following aligned modalities: image, transcribed report text, dictation audio, and eye gaze data.

convolutional network heatmap eye tracking explainability audio chest cxr machine learning chest x-ray radiology multimodal deep learning

Published: Sept. 12, 2020. Version: 1.0.0

