Resources


Database Credentialed Access

Learning to Ask Like a Physician: a Discharge Summary Clinical Questions (DiSCQ) Dataset

Eric Lehman

Dataset of questions asked by medical experts about patients. The experts read a discharge summary line by line and (1) asked any questions they had and (2) recorded what in the text "triggered" each question.

question generation, question answering, machine learning

Published: July 28, 2022. Version: 1.0


Database Credentialed Access

MedNLI - A Natural Language Inference Dataset For The Clinical Domain

Chaitanya Shivade

This is a resource for training machine learning models for language inference in the medical domain.

natural language inference, recognizing textual entailment

Published: Oct. 1, 2019. Version: 1.0.0


Database Open Access

MIMIC-III Clinical Database Demo

Alistair Johnson, Tom Pollard, Roger Mark

An open-source demo of the MIMIC-III Clinical Database.

critical care, electronic health records, mimic

Published: April 24, 2019. Version: 1.4


Challenge Credentialed Access

ShAReCLEF eHealth Evaluation Lab 2014 (Task 2): Disorder Attributes in Clinical Reports

Danielle Mowery

The ShARe/CLEF eHealth 2014 Challenge (Task 2) on Disorder Attributes in Clinical Reports

Published: Nov. 1, 2013. Version: 1.0


Database Credentialed Access

Symile-MIMIC: a multimodal clinical dataset of chest X-rays, electrocardiograms, and blood labs from MIMIC-IV

Adriel Saporta, Aahlad Manas Puli, Mark Goldstein, et al.

A multimodal clinical dataset consisting of CXRs, ECGs, and blood labs, designed to evaluate Symile, a simple contrastive loss that accommodates any number of modalities and allows any model to produce representations for each modality.

database, cxr, chest x-ray, contrastive learning, model, multimodal, mimic, electrocardiogram, ecg

Published: Jan. 28, 2025. Version: 1.0.0


Database Credentialed Access

RadCoref: Fine-tuning coreference resolution for different styles of clinical narratives

Yuxiang Liao, Hantao Liu, Irena Spasic

RadCoref is a small subset of MIMIC-CXR with manually annotated coreference mentions and clusters. Based on the annotated data, we fine-tuned a deep neural model and used it to annotate the whole MIMIC-CXR dataset. Both the manually annotated subset and the model-annotated dataset are available.

natural language processing, coreference resolution, radiology

Published: Jan. 30, 2024. Version: 1.0.0


Database Open Access

MIMIC-IV Clinical Database Demo

Alistair Johnson, Lucas Bulgarelli, Tom Pollard, et al.

An openly available subset of patients in the MIMIC-IV database.

critical care, electronic health record, mimic

Published: Jan. 31, 2023. Version: 2.2


Database Restricted Access

Upper body thermal images and associated clinical data from a pilot cohort study of COVID-19

Jose Tamez-Peña, Adam Yala, Servando Cardona, et al.

Thermal videos of people with positive and negative COVID-19 tests.

thermal videos, sars-cov-2, clinical symptoms, covid-19

Published: Aug. 16, 2021. Version: 1.1


Database Credentialed Access

RadGraph: Extracting Clinical Entities and Relations from Radiology Reports

Saahil Jain, Ashwin Agrawal, Adriel Saporta, et al.

RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports, annotated using a novel information extraction (IE) schema designed to capture the clinically relevant information in a radiology report.

entity and relation extraction, graph, multi-modal, natural language processing, radiology

Published: June 3, 2021. Version: 1.0.0


Database Open Access

Clinical data from the MIMIC-II database for a case study on indwelling arterial catheters

Jesse Raffa

Dataset extracted from MIMIC-II for a tutorial on the effect of indwelling arterial catheters on mortality in hemodynamically stable patients with respiratory failure.

Published: Oct. 28, 2016. Version: 1.0