Resources


Challenge Credentialed Access

ShAReCLEF eHealth 2013: Natural Language Processing and Information Retrieval for Clinical Care

Danielle Mowery

2013 ShARe/CLEF eHealth Evaluation Lab: Natural Language Processing and Information Retrieval for Clinical Care (Tasks 1 and 2).

natural language processing

Published: Feb. 15, 2013. Version: 1.0


Database Restricted Access

MIMIC-III-Ext-Synthetic-Clinical-Trial-Questions

Elizabeth Woo, Michael Craig Burkhart, Emily Alsentzer, Brett Beaulieu-Jones

In our recent study, we used Llama-3.1-70B-Instruct to generate synthetic training examples resembling clinical trial eligibility criteria. We manually reviewed 1000 of these examples and release them here.

large language models synthetic data distillation clinical trial eligibility

Published: April 22, 2025. Version: 1.0.0


Database Credentialed Access

RadGraph: Extracting Clinical Entities and Relations from Radiology Reports

Saahil Jain, Ashwin Agrawal, Adriel Saporta, Steven QH Truong, Du Nguyen Duong, Tan Bui, Pierre Chambon, Matthew Lungren, Andrew Ng, Curtis Langlotz, Pranav Rajpurkar

RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports, which are obtained using a novel information extraction (IE) schema to capture clinically relevant information in a radiology report.

entity and relation extraction graph multi-modal natural language processing radiology

Published: June 3, 2021. Version: 1.0.0


Database Credentialed Access

EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems

Konstantin Kotschenreuther

Dataset consisting of question and answer pairs synthetically generated from medical discharge summaries, designed to facilitate the training and development of large language models specifically tailored for healthcare applications

mimic-iv clinical question-answering medical discharge summaries large language models

Published: Jan. 11, 2024. Version: 1.0.0


Database Credentialed Access

EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries

Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak, Dongchul Cha, Hangyul Yoon, Kwang Hyun Kim, Jeewon Yang, Seunghyun Won, Edward Choi

An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries

Published: June 26, 2024. Version: 1.0.1


Database Credentialed Access

RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports

Jean-Benoit Delbrouck

RadGraph-XL is a large, expert-annotated dataset of 2,300 radiology reports covering multiple modalities and anatomies. It enables accurate extraction of clinical entities and relations for downstream medical AI tasks.

Published: Sept. 12, 2025. Version: 1.0.0


Database Restricted Access

Swiss-Mammo: A physician-written, synthetic dataset of German mammography reports

Daniel Reichenpfader, Sandro von Däniken, Harald Marcel Bonel

Swiss-Mammo: A physician-written, synthetic dataset of 28 German mammography reports. The dataset is stratified based on BI-RADS categories and available in German and English.

radiology mammography structured reporting bi-rads

Published: June 24, 2025. Version: 1.0.1


Database Open Access

Synthetic Mention Corpora for Disease Entity Recognition and Normalization

Kuleen Sasse, John David Osborne

We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, containing 128000 disease mentions from the UMLS disorder group, generated by an LLM. This corpus aims to improve these tasks in biomedical and clinical texts.

nlp named entity recognition machine learning data augmentation entity normalization

Published: Feb. 3, 2025. Version: 1.0.0


Software Open Access

Cerebral Haemodynamic Autoregulatory Information System GUI

Acute Brain injury (ABI) is a devastating event requiring intensive acute treatment and post-injury rehabilitation, both delivered for indeterminate periods of time. For severe ABIs, acute treatment is aimed at stabilizing the patient to prevent sec…

brain injury intracranial pressure

Published: Dec. 16, 2016. Version: 1.0.0


Database Credentialed Access

AIPatient KG: MIMIC-III and CORAL Electronic Health Records based Patient Knowledge Graph

Lizhou Fan, Huizi Yu

This project integrates MIMIC-III and CORAL electronic health records into knowledge graphs to improve medical analysis and enhance decision-making capabilities. Resources include two knowledge graph snapshots and two question-and-answering datasets.

Published: April 15, 2025. Version: 1.0.0