Resources


Database Credentialed Access

MIMIC-III-Ext-tPatchGNN

Chenlong Yin, Weijia Zhang

The processed MIMIC-III dataset used as a benchmark in "Irregular Multivariate Time Series Forecasting: A Transformable Patching Graph Neural Networks Approach."

Published: April 9, 2025. Version: 1.0.0


Database Open Access

Synthetic Mention Corpora for Disease Entity Recognition and Normalization

Kuleen Sasse, John David Osborne

We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, which contain 128,000 LLM-generated disease mentions drawn from the UMLS disorder semantic group. The corpora aim to improve disease entity recognition and normalization in biomedical and clinical text.

nlp named entity recognition machine learning data augmentation entity normalization

Published: Feb. 3, 2025. Version: 1.0.0


Database Restricted Access

MIMIC-IV-Ext-DiReCT

Bowen Wang, Jiuyang Chang, Yiming Qian

A diagnostic reasoning dataset designed to evaluate the performance of large language models in aligning with human doctors when making diagnoses from clinical notes.

Published: Jan. 21, 2025. Version: 1.0.0


Database Contributor Review

A multimodal dental dataset facilitating machine learning research and clinic services

Wenjing Liu, Yunyou Huang, Suqin Tang

A new dental dataset containing data from 169 patients, covering three commonly used dental imaging modalities and images of a range of oral health conditions.

Published: Oct. 11, 2024. Version: 1.1.0


Database Restricted Access

Multimodal Physiological Indices During Surgery Under Anesthesia

Sandya Subramanian, Bryan Tseng, Riccardo Barbieri, Emery Brown

Multimodal physiological indices collected during surgery while patients were under anesthesia.

anesthesia nociception

Published: Aug. 23, 2024. Version: 1.0


Database Open Access

Brno University of Technology Smartphone PPG Database (BUT PPG)

Andrea Nemcova, Radovan Smisek, Eniko Vargova, Lucie Maršánová, Martin Vitek, Lukas Smital, Marina Filipenska, Pavlina Sikorova, Pavel Gálík

BUT PPG is a database created for evaluating PPG signal quality and estimating heart rate. It comprises 3,888 10-second PPG recordings captured with a smartphone, together with the associated ECG and accelerometer (ACC) signals and annotations.

heart rate artificial intelligence ppg ecg acc signal quality assessment annotations accelerometric data photoplethysmography electrocardiogram

Published: Aug. 23, 2024. Version: 2.0.0


Database Credentialed Access

CORAL: expert-Curated medical Oncology Reports to Advance Language model inference

Madhumita Sushil, Vanessa Kennedy, Divneet Mandair, Brenda Miao, Travis Zack, Atul Butte

Medical oncology progress notes annotated with advanced, comprehensive oncology-relevant concepts and relationships.

artificial intelligence information extraction oncology natural language processing electronic health records large language models

Published: Feb. 7, 2024. Version: 1.0


Database Credentialed Access

EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems

Konstantin Kotschenreuther

A dataset of question-answer pairs synthetically generated from medical discharge summaries, designed to support the training and development of large language models tailored to healthcare applications.

mimic-iv clinical question-answering medical discharge summaries large language models

Published: Jan. 11, 2024. Version: 1.0.0


Database Open Access

Patient-level dataset to study the effect of COVID-19 in people with Multiple Sclerosis

Hamza Khan, Lotte Geys, Peer Baneke, Giancarlo Comi, Liesbet Peeters

This dataset is part of the Global Data Sharing Initiative. The data were entered by people with MS (PwMS) and clinicians using a fast data entry tool. The dataset includes the demographics, comorbidities, hospital stays, and COVID-19 symptoms of PwMS.

Published: Jan. 2, 2024. Version: 1.0.1


Challenge Credentialed Access

BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization

Yanjun Gao, Dmitriy Dligach, Timothy Miller, Majid Afshar

This project hosts the data for the BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization.

bionlp clinical natural language processing electronic health record summarization

Published: Nov. 12, 2023. Version: 2.0.0