Resources


Model Credentialed Access

Characterization of Stigmatizing Language in Medical Records

Keith Harrigian, Ayah Zirikly, Brant Chee, et al.

A suite of classifiers for detecting three types of stigmatizing language in electronic medical records, trained on MIMIC-IV discharge notes.

clinical natural language processing domain transfer bias stigmatizing language mimic large language models

Published: Nov. 6, 2023. Version: 1.0.0


Database Open Access

CPAP Pressure and Flow Data from a Local Trial of 30 Adults at the University of Canterbury

Ella Guy, Jennifer Knopp, Geoff Chase

A pressure and flow dataset collected from 30 adults undergoing CPAP therapy in a trial at the University of Canterbury, covering a variety of instructed breath rates at PEEP levels of 4 cmH2O and 7 cmH2O.

peep cpap respiratory mechanics pulmonary mechanics respiratory modelling biomedical engineering

Published: March 24, 2022. Version: 1.0.1


Database Open Access

Neurophysiological Dataset of Stress Resilience During Human-Computer Interaction

Shotabdi Roy, Joseph Nuamah

This dataset contains multimodal neurophysiological and physiological recordings collected from participants performing cognitively demanding tasks to study the temporal dynamics of stress resilience during human-computer interaction.

Published: Feb. 27, 2026. Version: 1.0.0


Challenge Credentialed Access

ArchEHR-QA: A Dataset for Addressing Patient's Information Needs related to Clinical Course of Hospitalization

Sarvesh Soni, Dina Demner-Fushman

A dataset for grounded question answering (QA) from electronic health records (EHRs).

question answering electronic health record patient portals clinicians

Published: Jan. 1, 2026. Version: 1.3


Database Credentialed Access

EchoGraph-annotated ECHO-NOTE2NUM examples

Chieh-Ju Chao, Mohammad Asadi

EchoGraph is a model that automatically extracts and structures clinical information from echocardiogram reports. The Annotated ECHO-NOTE2NUM Dataset contains MIMIC-III echo reports augmented with EchoGraph annotations to support future research.

Published: Dec. 3, 2025. Version: 1.0.0


Database Contributor Review

InReDD-Dataset-PAN924

Caio Uehara Martins, Camila Tirapelli, Hugo Gaêta-Araujo, et al.

InReDD-Dataset-V1 is a collection of 924 anonymised panoramic dental radiographs curated by the Interdisciplinary Research Group in Digital Dentistry (InReDD) at the University of São Paulo.

Published: Nov. 22, 2025. Version: 1.0.0


Database Credentialed Access

MedVAL-Bench: Expert-Annotated Medical Text Validation Benchmark

Asad Aali, Vasiliki Bikia, Maya Varma, et al.

MedVAL-Bench is the first large-scale physician-validated benchmark for medical text validation, spanning 6 diverse medical tasks and containing 840 language model-generated outputs annotated by 12 physicians with error assessments and risk grades.

Published: Nov. 14, 2025. Version: 1.0.1


Database Credentialed Access

PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions

Daeun Kyung, Hyunseung Chung, Seongsu Bae, et al.

PatientSim is a patient simulator that generates realistic and diverse personas for clinical scenarios, enabling robust training and evaluation of doctor-patient interactions in multi-turn dialogues.

electronic health records multi-turn dialogue llm simulation doctor-patient consultation

Published: Oct. 18, 2025. Version: 1.0.0


Database Restricted Access

TN-Mammo: A Multi-view Mammography Dataset for Breast Density Classification

Binh Nguyen, Cat Le, Loc Vu, et al.

We release the first version of TN-Mammo (June 2024), a mammogram dataset of 676 cases with breast density labels, providing high-quality data to support machine learning and early breast cancer detection.

Published: Oct. 4, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-22MCTS: A 22 Millions-Event Temporal Clinical Time-Series Dataset with Relative Timestamp

Jing Wang, Xing Niu, Tong Zhang, et al.

MIMIC-IV-Ext-22MCTS is a clinical time-series dataset of events with concrete temporal information. It consists of 22,588,586 clinical events and their timestamps from 267,284 discharge summaries in MIMIC-IV-Note.

mimic clinical event annotation time series temporal annotation

Published: Sept. 29, 2025. Version: 1.0.0