Resources


Database Credentialed Access

AIPatient KG: MIMIC-III and CORAL Electronic Health Records based Patient Knowledge Graph

Lizhou Fan, Huizi Yu

This project integrates MIMIC-III and CORAL electronic health records into knowledge graphs to improve medical analysis and enhance decision-making capabilities. Resources include two knowledge graph snapshots and two question-answering datasets.

Published: April 15, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-III-Ext-tPatchGNN

Chenlong Yin, Weijia Zhang

The processed MIMIC-III dataset used as a benchmark in "Irregular Multivariate Time Series Forecasting: A Transformable Patching Graph Neural Networks Approach".

Published: April 9, 2025. Version: 1.0.0


Database Open Access

Synthetic Mention Corpora for Disease Entity Recognition and Normalization

Kuleen Sasse, John David Osborne

We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, containing 128,000 disease mentions from the UMLS disorder group, generated by an LLM. The corpus aims to improve disease entity recognition and normalization in biomedical and clinical texts.

nlp, machine learning, named entity recognition, data augmentation, entity normalization

Published: Feb. 3, 2025. Version: 1.0.0


Database Restricted Access

MIMIC-IV-Ext-DiReCT

Bowen Wang, Jiuyang Chang, Yiming Qian

A diagnostic reasoning dataset designed to evaluate the performance of large language models in aligning with human doctors when making diagnoses from clinical notes.

Published: Jan. 21, 2025. Version: 1.0.0


Database Contributor Review

A multimodal dental dataset facilitating machine learning research and clinic services

Wenjing Liu, Yunyou Huang, Suqin Tang

A new dental dataset covering 169 patients, three commonly used dental imaging modalities, and images of various oral health conditions.

Published: Oct. 11, 2024. Version: 1.1.0


Database Restricted Access

Multimodal Physiological Indices During Surgery Under Anesthesia

Sandya Subramanian, Bryan Tseng, Riccardo Barbieri, Emery Brown

Multimodal physiological indices collected from patients under anesthesia during surgery.

anesthesia, nociception

Published: Aug. 23, 2024. Version: 1.0


Database Open Access

Brno University of Technology Smartphone PPG Database (BUT PPG)

Andrea Nemcova, Radovan Smisek, Eniko Vargova, Lucie Maršánová, Martin Vitek, Lukas Smital, Marina Filipenska, Pavlina Sikorova, Pavel Gálík

BUT PPG is a database created for evaluating PPG signal quality and heart rate estimation. The data comprise 3,888 10-second PPG recordings captured by smartphone, with associated ECG and accelerometer (ACC) signals and annotations.

heart rate, artificial intelligence, ppg, ecg, acc, signal quality assessment, annotations, accelerometric data, photoplethysmography, electrocardiogram

Published: Aug. 23, 2024. Version: 2.0.0


Database Credentialed Access

CORAL: expert-Curated medical Oncology Reports to Advance Language model inference

Madhumita Sushil, Vanessa Kennedy, Divneet Mandair, Brenda Miao, Travis Zack, Atul Butte

Medical oncology progress notes annotated with advanced, comprehensive oncology-relevant concepts and relationships.

artificial intelligence, information extraction, oncology, natural language processing, large language models, electronic health records

Published: Feb. 7, 2024. Version: 1.0


Database Credentialed Access

EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems

Konstantin Kotschenreuther

A dataset of question-answer pairs synthetically generated from medical discharge summaries, designed to support the training and development of large language models tailored to healthcare applications.

mimic-iv, clinical question-answering, medical discharge summaries, large language models

Published: Jan. 11, 2024. Version: 1.0.0


Database Open Access

Patient-level dataset to study the effect of COVID-19 in people with Multiple Sclerosis

Hamza Khan, Lotte Geys, Peer Baneke, Giancarlo Comi, Liesbet Peeters

This dataset is part of the Global Data Sharing Initiative. The data were entered by people with MS (PwMS) and clinicians using a fast data entry tool. The dataset includes the demographics, comorbidities, hospital stays, and COVID-19 symptoms of PwMS.

Published: Jan. 2, 2024. Version: 1.0.1