Resources


Database Credentialed Access

EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems

Konstantin Kotschenreuther

Dataset consisting of question and answer pairs synthetically generated from medical discharge summaries, designed to facilitate the training and development of large language models specifically tailored for healthcare applications

mimic-iv clinical question-answering medical discharge summaries large language models

Published: Jan. 11, 2024. Version: 1.0.0


Database Credentialed Access

EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems

Konstantin Kotschenreuther

Dataset consisting of question and answer pairs synthetically generated from medical discharge summaries, designed to facilitate the training and development of large language models specifically tailored for healthcare applications

mimic-iv clinical question-answering medical discharge summaries large language models

Published: Jan. 11, 2024. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-BHC: Labeled Clinical Notes Dataset for Hospital Course Summarization

Asad Aali, Dave Van Veen, Yamin Arefeen, Jason Hom, Christian Bluethgen, Eduardo Pontes Reis, Sergios Gatidis, Namuun Clifford, Joseph Daws, Arash Tehrani, Jangwon Kim, Akshay Chaudhari

This dataset presents a collection of preprocessed and labeled clinical notes derived from "MIMIC-IV-Note", and aims to facilitate the development of ML models focused on summarizing brief hospital courses (BHC) from clinical notes.

natural language processing clinical notes brief hospital course machine learning text summarization

Published: Feb. 3, 2025. Version: 1.2.0


Database Restricted Access

CXRGraph: Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation

Yuxiang Liao, Hoisang Heung, Hantao Liu, Irena Spasic

CXRGraph is a structured radiology report dataset built upon RadGraph and tailored for the Automatic Radiology Report Generation task. It can identify more task-relevant information such as abnormalities and hallucinated prior references.

relation extraction information extraction natural language processing named entity recognition structured radiology report

Published: Feb. 3, 2025. Version: 1.0.0


Database Credentialed Access

MedDec: Medical Decisions for Discharge Summaries in the MIMIC-III Database

Mohamed Elgaar, Jiali Cheng, Nidhi Vakil, Hadi Amiri, Leo Anthony Celi

Annotations of ten types of medical decisions from discharge summaries in the MIMIC-III database.

natural language processing medical decisions span classification discharge summary mimic

Published: Oct. 16, 2024. Version: 1.0.0


Database Credentialed Access

CORAL: expert-Curated medical Oncology Reports to Advance Language model inference

Madhumita Sushil, Vanessa Kennedy, Divneet Mandair, Brenda Miao, Travis Zack, Atul Butte

Medical oncology progress notes annotated with advanced, comprehensive oncology-relevant concepts and relationships.

artificial intelligence information extraction oncology natural language processing large language models electronic health records

Published: Feb. 7, 2024. Version: 1.0


Database Credentialed Access

RadCoref: Fine-tuning coreference resolution for different styles of clinical narratives

Yuxiang Liao, Hantao Liu, Irena Spasic

RadCoref is a small subset of MIMIC-CXR with manually annotated coreference mentions and clusters. Based on the annotated data, we fine-tuned a deep neural model and used it to annotate the whole MIMIC-CXR dataset. Both data are available.

natural language processing coreference resolution radiology

Published: Jan. 30, 2024. Version: 1.0.0


Database Credentialed Access

Annotation dataset of social determinants of health from MIMIC-III Clinical Care Database

Marco Guevara, Shan Chen, Spencer Thomas, Danielle Bitterman

Annotation dataset of social determinants of health from MIMC-III Clinical Care Database notes.

natural language processing social determinants of health

Published: Jan. 24, 2024. Version: 1.0.1


Challenge Credentialed Access

BioNLP Workshop 2023 Shared Task 1A: Problem List Summarization

Yanjun Gao, Dmitriy Dligach, Timothy Miller, Majid Afshar

This is the data storage for BioNLP Workshop Shared Task 1A: Problem List Summarization.

bionlp clinical natural language processing electronic health record summarization

Published: Nov. 12, 2023. Version: 2.0.0


Model Credentialed Access

Characterization of Stigmatizing Language in Medical Records

Keith Harrigian, Ayah Zirikly, Brant Chee, Alya Ahmad, Anne Links, Somnath Saha, Mary Catherine Beach, Mark Dredze

A suite of classifiers for detecting three types of stigmatizing language in electronic medical records. Trained on MIMIC-IV discharge notes.

clinical natural language processing domain transfer bias stigmatizing language large language models mimic

Published: Nov. 6, 2023. Version: 1.0.0