Resources


Database Credentialed Access

MIMIC-IV-Ext-CEKG: A Process-Oriented Dataset Derived from MIMIC-IV for Enhanced Clinical Insights

Milad Naeimaei Aali, Felix Mannhardt, Pieter Jelle Toussaint

The MIMIC-IV-Ext-CEKG dataset is crafted for object-centric process mining in healthcare, specifically to create clinical event knowledge graphs for patients with multimorbidity, as well as for data mining and machine learning tasks.

mimic, process mining, multi entity process mining, object centric event log, clinical event knowledge graph

Published: April 8, 2025. Version: 1.0.0


Challenge Credentialed Access

CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays

Gregory Holste, Mingquan Lin, Song Wang, et al.

CXR-LT 2024 was a challenge for long-tailed, multi-label, and zero-shot thorax disease classification on chest X-rays, held at MICCAI 2024. This page contains long-tailed labels for 45 diseases from the CXR-LT 2024 and 2023 challenges.

disease classification, artificial intelligence, chest x-ray, computer-aided diagnosis, long-tailed learning, cardiopulmonary disease, zero-shot learning, deep learning

Published: March 19, 2025. Version: 2.0.0


Database Credentialed Access

MIMIC-IV-Ext Triage Instruction Corpus

Qingyang Shen, Quan Guo

MIMIC-IV-Ext Triage Instruction Corpus includes 9,629 emergency department (ED) triage cases organized by the five-level Emergency Severity Index (ESI), intended to help improve the triage accuracy of LLMs. It provides CSV data, generation prompts, expert validation samples, and SQL quality-control (QC) scripts.

nlp, clinical decision support, large language models, emergency severity index, emergency triage, machine learning

Published: March 4, 2025. Version: 1.0.0


Database Restricted Access

OpenOximetry Repository

Nicholas Fong, Michael Lipnick, Philip Bickler, et al.

A repository of matched arterial oxygen and pulse oximeter readings obtained under controlled conditions, with high-frequency physiologic waveforms and skin color measurements.

Published: Feb. 28, 2025. Version: 1.1.1


Database Credentialed Access

MIMIC-IV-Ext-BHC: Labeled Clinical Notes Dataset for Hospital Course Summarization

Asad Aali, Dave Van Veen, Yamin Arefeen, et al.

This dataset presents a collection of preprocessed and labeled clinical notes derived from "MIMIC-IV-Note" and aims to facilitate the development of ML models that summarize brief hospital courses (BHC) from clinical notes.

natural language processing, clinical notes, brief hospital course, text summarization, machine learning

Published: Feb. 3, 2025. Version: 1.2.0


Database Open Access

Synthetic Mention Corpora for Disease Entity Recognition and Normalization

Kuleen Sasse, John David Osborne

We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, containing 128,000 LLM-generated disease mentions from the UMLS disorder group. The corpus aims to improve disease entity recognition and normalization in biomedical and clinical texts.

nlp, named entity recognition, data augmentation, entity normalization, machine learning

Published: Feb. 3, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV on FHIR

Alex Bennett, Joshua Wiedekopf, Hannes Ulrich, et al.

MIMIC-IV and MIMIC-IV-ED data mapped into FHIR resources.

mimic-iv, fhir, electronic health record, us core, fast healthcare interoperability resources, mimic

Published: Nov. 12, 2024. Version: 2.1


Database Credentialed Access

C-REACT: Contextualized Race and Ethnicity Annotations for Clinical Text

Oliver Bear Don't Walk IV, Adrienne Pichon, Harry Reyes Nieva, et al.

Two sets of gold-standard annotations for race and ethnicity information from clinical notes in MIMIC-III. Contains race and ethnicity label assignments and related information such as country of origin and spoken language.

clinical notes, patient country information, race and ethnicity, patient language information

Published: Oct. 21, 2024. Version: 1.0.0


Database Credentialed Access

MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, et al.

We introduce MIMIC-Ext-MIMIC-CXR-VQA, a complex, diverse, and large-scale dataset designed for Visual Question Answering (VQA) tasks within the medical domain, focusing primarily on chest radiographs.

question answering, electronic health records, evaluation, chest x-ray, radiology, benchmark, multimodal, visual question answering, deep learning, machine learning

Published: July 19, 2024. Version: 1.0.0


Database Credentialed Access

MIMIC-CXR-JPG - chest radiographs with structured labels

Alistair Johnson, Matthew Lungren, Yifan Peng, et al.

Chest X-rays in JPG format with structured labels derived from the associated radiology reports.

computer vision, chest x-ray, radiology, mimic, deep learning

Published: March 12, 2024. Version: 2.1.0