Resources


Database Credentialed Access

EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

Yeonsu Kwon, Jiho Kim, Gyubok Lee, Seongsu Bae, Daeun Kyung, Wonchul Cha, Tom Pollard, Alistair Johnson, Edward Choi

Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

Published: March 19, 2025. Version: 1.0.1


Database Credentialed Access

MIMIC-IV

Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Brian Gow, Benjamin Moody, Steven Horng, Leo Anthony Celi, Roger Mark

Large database of de-identified health information from patients admitted to Beth Israel Deaconess Medical Center

critical care intensive care unit machine learning mimic

Published: Oct. 11, 2024. Version: 3.1


Database Contributor Review

A multimodal dental dataset facilitating machine learning research and clinic services

Wenjing Liu, Yunyou Huang, Suqin Tang

A new dental dataset that contains 169 patients, three commonly used dental image modalities, and images of various health conditions of the oral cavity.

Published: Oct. 11, 2024. Version: 1.1.0


Database Open Access

Integration of Electroencephalogram and Eye-Gaze Datasets for Performance Evaluation in Fundamentals of Laparoscopic Surgery (FLS) Tasks

Somayeh B Shafiei, Saeed Shadpour

Brain activity and eye gaze data were collected from a group of 25 participants who completed the FLS tasks using a trainer box (Pyxus®). Each participant performed the tasks five times, and their performance was evaluated by an expert rater.

Published: Aug. 23, 2023. Version: 1.0.0

Database Credentialed Access

TherLid: A Thermometry Linked Dataset

Jeremy Tan, Inês Martins, João Matos, Tiago Filipe Sousa Gonçalves, Tetsu Ohnuma, Jaime dos Santos Cardoso, Leo Anthony Celi, Vijay Krishnamoorthy, Andrea Lane, An Kwok Wong

TherLiD is an open-source dataset of 13,251 paired temperature readings (contact and infrared) from the MIMIC-IV and eICU databases. With added demographics and derived data, it supports research on racial and ethnic disparities in infrared thermometry.

thermometry intensive care unit health equity electronic health records

Published: Jan. 21, 2025. Version: 1.0.0


Software Credentialed Access

Code for generating the HAIM multimodal dataset of MIMIC-IV clinical data and x-rays

Luis R Soenksen, Yu Ma, Cynthia Zeng, Leonard David Jean Boussioux, Kimberly Villalobos Carballo, Liangyuan Na, Holly Wiberg, Michael Li, Ignacio Fuentes, Dimitris Bertsimas

Code for generating the HAIM multimodal dataset of MIMIC-IV clinical data and x-rays

database code multimodality

Published: Aug. 23, 2022. Version: 1.0.1


Database Credentialed Access

Comprehensive Polysomnography (CPS) Dataset: A Resource for Sleep-Related Arousal Research

Stefan Kraft, Andreas Theissler, Vera Wienhausen-Wilke, Philipp Walter, Gjergji Kasneci

This dataset includes polysomnographic sleep recordings from a study on sleep-related arousal diagnostics, featuring raw and derived data channels, annotated event types, and questionnaire data.

polysomnography sleep disorders machine learning in healthcare sleep arousal diagnostics pulse wave analysis

Published: Sept. 18, 2024. Version: 1.0.0


Challenge Credentialed Access

ShAReCLEF eHealth 2013: Natural Language Processing and Information Retrieval for Clinical Care

Danielle Mowery

2013 ShARe/CLEF eHealth Evaluation Lab: Natural Language Processing and Information Retrieval for Clinical Care (Tasks 1 and 2).

natural language processing

Published: Feb. 15, 2013. Version: 1.0


Challenge Open Access

Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019

Matthew Reyna, Chris Josef, Russell Jeter, Supreeth Shashikumar, Benjamin Moody, M. Brandon Westover, Ashish Sharma, Shamim Nemati, Gari D. Clifford

The 2019 PhysioNet/Computing in Cardiology Challenge invites participants to predict sepsis from clinical data.

prediction challenge sepsis

Published: Aug. 5, 2019. Version: 1.0.0


Database Credentialed Access

EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems

Konstantin Kotschenreuther

Dataset consisting of question-and-answer pairs synthetically generated from medical discharge summaries, designed to facilitate the training and development of large language models tailored for healthcare applications.

mimic-iv clinical question-answering medical discharge summaries large language models

Published: Jan. 11, 2024. Version: 1.0.0