Resources


Database Credentialed Access

RadVLM Instruction Dataset

Nicolas Deperrois, Hidetoshi Matsuo, Samuel Ruiperez-Campillo, Moritz Vandenhirtz, Sonia Laguna, Alain Ryser, Koji Fujimoto, Mizuho Nishio, Thomas Sutter, Julia Vogt, Jonas Kluckert, Thomas Frauenfelder, Christian Bluethgen, Farhad Nooralahzadeh, Michael Krauthammer

This dataset is designed to construct RadVLM, a vision–language model for chest X-ray interpretation. It includes instruction data for tasks such as report generation, abnormality detection, and region grounding, and multitask conversation.

chest x-rays vision-language models medical ai

Published: Sept. 25, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-Instr: A Dataset of 450K+ EHR-Grounded Instruction-Following Examples

Zhenbang Wu, Anant Dadu, Mike Nalls, Faraz Faghri, Jimeng Sun

This dataset contains 450K open-ended instruction-following examples generated using GPT-3.5 based on the MIMIC-IV EHR database.

large language models medical question answering instruction tuning

Published: Sept. 9, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-III-Ext-VeriFact-BHC: Labeled Propositions From Brief Hospital Course Summaries for Long-form Clinical Text Evaluation

Philip Chung, Akshay Swaminathan, Alex Goodell, Yeasul Kim, Momsen Reincke, Lichy Han, Ben Deverett, Mohammad Amin Sadeghi, Abdel badih El Ariss, Marc Ghanem, David Seong, Andrew Lee, Caitlin Coombes, Brad Bradshaw, Mahir Sufian, Hyo Jung Hong, Teresa Nguyen, Mohammad Rasouli, Komal Kamra, Mark Burbridge, James McAvoy, Roya Saffary, Stephen Parnell Ma, Dev Dash, James Xie, Ellen Wang, Cliff Schmiesing, Nigam Shah, Nima Aghaeepour

A clinician-labeled dataset for fact-checking long-form clinical text against patient EHRs. The dataset contains LLM-written and human-written Brief Hospital Course summaries decomposed to atomic claim and sentence propositions with annotations.

artificial intelligence natural language processing clinical notes electronic health records large language models brief hospital course long-form text chart review text reranking atomic claim hybrid retrieval clinical informatics clinical medicine fact verification retrieval-augmented generation logical atomism text embedding formal logic llm-as-a-judge llm evaluation

Published: April 9, 2025. Version: 1.0.0


Database Restricted Access

LATTE-CXR: Locally Aligned TexT and imagE, Explainable dataset for Chest X-Rays

Elham Ghelichkhan, Tolga Tasdizen

This dataset includes bounding box-statement pairs for chest X-ray images, derived from radiologists’ eye-tracking data (for explainability) and annotations, for local visual-language models.

eye-tracking chest x-ray dataset automatically generated dataset caption-guided object detection image captioning with region-level description grounded radiology report generation phrase grounding xai multi-modal learning local visual-language models localization

Published: Feb. 4, 2025. Version: 1.0.0


Database Credentialed Access

RaDialog Instruct Dataset

Chantal Pellegrini, Ege Özsoy, Benjamin Busam, Nassir Navab, Matthias Keicher

Image-based instruct data for Chest X-Ray understanding and analysis.

medical image understaning radiology chatbot radiology report generation radiology assistant large vision-language models

Published: July 12, 2024. Version: 1.1.0


Model Credentialed Access

Characterization of Stigmatizing Language in Medical Records

Keith Harrigian, Ayah Zirikly, Brant Chee, Alya Ahmad, Anne Links, Somnath Saha, Mary Catherine Beach, Mark Dredze

A suite of classifiers for detecting three types of stigmatizing language in electronic medical records. Trained on MIMIC-IV discharge notes.

clinical natural language processing domain transfer bias stigmatizing language large language models mimic

Published: Nov. 6, 2023. Version: 1.0.0


Database Open Access

CPAP Pressure and Flow Data from a Local Trial of 30 Adults at the University of Canterbury

Ella Guy, Jennifer Knopp, Geoff Chase

A pressure and flow dataset was collected from a trial of 30 adults at the University of Canterbury undergoing CPAP therapy for a variety of instructed breath rates at PEEP levels of 4cmH2O and 7cmH2O.

peep cpap respiratory mechanics pulmonary mechanics respiratory modelling biomedical engineering

Published: March 24, 2022. Version: 1.0.1


Database Credentialed Access

EchoGraph-annotated ECHO-NOTE2NUM examples

Chieh-Ju Chao, Mohammad Asadi

EchoGraph is a model that automatically extracts and structures clinical information from echocardiogram reports. The Annotated ECHO-NOTE2NUM Dataset contains MIMIC-III echo reports enhanced with EchoGraph annotations to enhance future research.

Published: Dec. 3, 2025. Version: 1.0.0


Database Contributor Review

InReDD-Dataset-PAN924

Caio Uehara Martins, Camila Tirapelli, Hugo Gaêta-Araujo, Jose Augusto Baranauskas, Breno Zancan, Jose Carneiro, Alessandra Macedo

InReDD‑Dataset-V1 is a collection of 924 anonymised panoramic dental radiographs curated by the Interdisciplinary Research Group in Digital Dentistry (InReDD) at the University of São Paulo.

Published: Nov. 22, 2025. Version: 1.0.0


Database Credentialed Access

MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark

Asad Aali, Vasiliki Bikia, Maya Varma, Nicole Chiou, Sophie Ostmeier, Arnav Singhvi, Magdalini Paschali, Ashwin Kumar, Andrew Johnston, Karimar Amador Martinez, Eduardo Perez Guerrero, Paola Cruz Rivera, Sergios Gatidis, Christian Bluethgen, Eduardo Pontes Reis, Eddy Zandee van Rilland, Poonam Hosamani, Kevin Keet, Minjoung Go, Evelyn Ling, David Larson, Curtis Langlotz, Roxana Daneshjou, Jason Hom, Sanmi Koyejo, Emily Alsentzer, Akshay Chaudhari

MedVAL-Bench is the first large-scale physician-validated benchmark for medical text validation, spanning 6 diverse medical tasks and containing 840 language model-generated outputs annotated by 12 physicians with error assessments and risk grades.

Published: Nov. 14, 2025. Version: 1.0.1