Resources


Database Credentialed Access

MIMIC-IV-Ext-22MCTS: A 22 Millions-Event Temporal Clinical Time-Series Dataset with Relative Timestamp

Jing Wang, Xing Niu, Tong Zhang, Jie Shen, Juyong Kim, Jeremy Weiss

A time-series dataset of clinical events with concrete temporal information. The dataset consists of 22,588,586 clinical events and their associated timestamps, drawn from 267,284 discharge summaries in MIMIC-IV-Note.

mimic clinical event annotation time series temporal annotation

Published: Sept. 29, 2025. Version: 1.0.0


Model Credentialed Access

Clinical BERT Models Trained on Pseudo Re-identified MIMIC-III Notes

Eric Lehman, Sarthak Jain, Karl Pichotta, Yoav Goldberg, Byron Wallace

We explore recovering sensitive information from BERT models trained on non-deidentified EHR data. We make our models and data available to further facilitate research.

Published: April 28, 2021. Version: 1.0.0


Database Credentialed Access

Curated Data for Describing Blood Glucose Management in the Intensive Care Unit

Aldo Robles Arévalo, Roselyn Mateo-Collado, Leo Anthony Celi

The data subsets consist of time-series files that include all the curated entries of glucose readings and insulin inputs from the MIMIC-III database.

insulin replacement therapy glycemic control critical care

Published: April 19, 2021. Version: 1.0.1


Database Restricted Access

CheXchoNet: A Chest Radiograph Dataset with Gold Standard Echocardiography Labels

Pierre Elias, Shreyas Bhave

Early detection of heart failure is vital for improving outcomes. The dataset contains 71,589 CXRs paired with gold standard labels from echocardiograms to enable the training of models to detect pathologies indicative of early stage heart failure.

chest x-rays heart failure early detection cardiac structural abnormalities deep learning

Published: March 20, 2024. Version: 1.0.0


Database Restricted Access

MIMIC-Eye: Integrating MIMIC Datasets with REFLACX and Eye Gaze for Multimodal Deep Learning Applications

Chihcheng Hsieh, Chun Ouyang, Jacinto C Nascimento, Joao Pereira, Joaquim Jorge, Catarina Moreira

Published: March 23, 2023. Version: 1.0.0


Database Credentialed Access

CAD-Chest: Comprehensive Annotation of Diseases based on MIMIC-CXR Radiology Report

Mengliang Zhang, Xinyue Hu, Lin Gu, Tatsuya Harada, Kazuma Kobayashi, Ronald Summers, Yingying Zhu

The CAD-Chest dataset provides comprehensive disease annotations, including disease severity, uncertainty, and location, based on the MIMIC-CXR radiology reports.

chest x-ray disease label

Published: Dec. 8, 2023. Version: 1.0


Database Credentialed Access

MIMIC-III-Ext-VeriFact-BHC: Labeled Propositions From Brief Hospital Course Summaries for Long-form Clinical Text Evaluation

Philip Chung, Akshay Swaminathan, Alex Goodell, Yeasul Kim, Momsen Reincke, Lichy Han, Ben Deverett, Mohammad Amin Sadeghi, Abdel badih El Ariss, Marc Ghanem, David Seong, Andrew Lee, Caitlin Coombes, Brad Bradshaw, Mahir Sufian, Hyo Jung Hong, Teresa Nguyen, Mohammad Rasouli, Komal Kamra, Mark Burbridge, James McAvoy, Roya Saffary, Stephen Parnell Ma, Dev Dash, James Xie, Ellen Wang, Cliff Schmiesing, Nigam Shah, Nima Aghaeepour

A clinician-labeled dataset for fact-checking long-form clinical text against patient EHRs. The dataset contains LLM-written and human-written Brief Hospital Course summaries decomposed into atomic claim and sentence propositions, each with annotations.

artificial intelligence natural language processing clinical notes large language models brief hospital course electronic health records long-form text chart review text reranking atomic claim hybrid retrieval clinical informatics clinical medicine fact verification retrieval-augmented generation logical atomism text embedding formal logic llm-as-a-judge llm evaluation

Published: April 9, 2025. Version: 1.0.0


Database Credentialed Access

RadCoref: Fine-tuning coreference resolution for different styles of clinical narratives

Yuxiang Liao, Hantao Liu, Irena Spasic

RadCoref is a small subset of MIMIC-CXR with manually annotated coreference mentions and clusters. Based on the annotated data, we fine-tuned a deep neural model and used it to annotate the whole MIMIC-CXR dataset. Both the manually annotated subset and the model-annotated dataset are available.

natural language processing coreference resolution radiology

Published: Jan. 30, 2024. Version: 1.0.0


Database Credentialed Access

PIFIR: PET-CT Invasive Fungal Infection Reports

Vlada Rozova, Anna Khanina, Jeremy Ong, Ramin Alipour, Leon Worth, Monica Slavin, Karin Thursky, Karin Verspoor

A corpus of PET-CT reports annotated for terminology relevant to fungal infections. Ideal for validation of named entity recognition and relation extraction methods.

nlp clinical documentation information extraction invasive fungal infections

Published: Feb. 27, 2025. Version: 1.0.0


Database Credentialed Access

CHIFIR: Cytology and Histopathology Invasive Fungal Infection Reports

Vlada Rozova, Anna Khanina, Jasmine Teng, Joanne Teh, Leon Worth, Monica Slavin, Karin Thursky, Karin Verspoor

A corpus of cytology and histopathology reports annotated for terminology relevant to fungal infections. Ideal for validation of named entity recognition and relation extraction methods.

nlp clinical documentation information extraction invasive fungal infections

Published: Feb. 20, 2024. Version: 1.0.2