Resources


Model Credentialed Access

Asclepius-R: Clinical Large Language Model Built on MIMIC-III Discharge Summaries

Sunjun Kweon, Junu Kim, Jiyoun Kim, Sujeong Im, Eunbyeol Cho, Seongsu Bae, Jungwoo Oh, Gyubok Lee, Jong Hak Moon, Seng Chan You, Seungjin Baek, Chang Hoon Han, Yoon Bin Jung, Yohan Jo, Edward Choi

Asclepius: Publicly Available Clinical Large Language Models with Synthetic Clinical Notes. Asclepius-R: an instruction-finetuned large language model trained on MIMIC-III clinical notes.

clinical notes, synthetic clinical notes, synthetic notes, asclepius, open-source llm, clinical llm, large language model

Published: March 25, 2024. Version: 1.1.0


Software Open Access

Transformer-DeID: Deidentification of free-text clinical notes with transformers

Callandra Moore, Lucas Bulgarelli, Tom Pollard, Alistair Johnson

Fine-tune transformer-based neural networks to deidentify clinical text data.

deidentification, neural networks, transformers

Published: Nov. 2, 2023. Version: 1.0.0


Database Credentialed Access

ReFiSco: Report Fix and Score Dataset for Radiology Report Generation

Katherine Tian, Sina J Hartung, Andrew A Li, Jaehwan Jeong, Fardad Behzadi, Juan Calle-Toro, Subathra Adithan, Michael Pohlen, David Osayande, Pranav Rajpurkar

A preliminary human expert evaluation study of 60 MIMIC-CXR radiology reports.

Published: Aug. 23, 2023. Version: 0.0


Model Credentialed Access

Medical AI Research Foundations: A repository of medical foundation models

Shekoofeh Azizi, Jan Freyberg, Laura Culp, Patricia MacWilliams, Sara Mahdavi, Vivek Natarajan, Alan Karthikesalingam

Medical AI Research Foundations is a repository of medical foundation models.

Published: April 25, 2023. Version: 1.0.0


Database Credentialed Access

MS-CXR-T: Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing

Shruthi Bannur, Stephanie Hyland, Qianchu Liu, Fernando Pérez-García, Max Ilse, Daniel Coelho de Castro, Benedikt Boecking, Harshita Sharma, Kenza Bouzid, Anton Schwaighofer, Maria Teodora Wetscherek, Hannah Richardson, Tristan Naumann, Javier Alvarez Valle, Ozan Oktay

MS-CXR-T is a multimodal benchmark that extends the MIMIC-CXR v2 dataset with expert-verified annotations. It is designed to evaluate biomedical vision-language processing models on the temporal semantics extracted from images and text.

disease progression, cxr, vision-language processing, chest x-ray, radiology, multimodal

Published: March 17, 2023. Version: 1.0.0


Model Credentialed Access

Clinical-T5: Large Language Models Built Using MIMIC Clinical Text

Eric Lehman, Alistair Johnson

We train T5-Base and T5-Large models from scratch on MIMIC-III and MIMIC-IV. We also further pretrain T5-Base and SciFive on MIMIC notes. The model weights are released on PhysioNet.

Published: Jan. 25, 2023. Version: 1.0.0


Database Open Access

PTB-XL, a large publicly available electrocardiography dataset

Patrick Wagner, Nils Strodthoff, Ralf-Dieter Bousseljot, Wojciech Samek, Tobias Schaeffter

The PTB-XL ECG dataset is a large dataset of 21,801 clinical 12-lead ECGs, each 10 seconds long, from 18,869 patients. The raw signal data have been annotated by up to two cardiologists with 71 different ECG statements and are supplemented by rich metadata.

ptb-xl, ptb, ecg, electrocardiography

Published: Nov. 9, 2022. Version: 1.0.3


Database Credentialed Access

Chest X-ray segmentation images based on MIMIC-CXR

Li-Ching Chen, Po-Chih Kuo, Ryan Wang, Judy Gichoya, Leo Anthony Celi

A chest X-ray segmentation dataset derived from MIMIC-CXR, produced with a deep learning algorithm and human examination.

segmentation, chest x-rays, cxr

Published: Aug. 18, 2022. Version: 1.0.0


Database Credentialed Access

DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries

Jayetri Bardhan, Anthony Colas, Kirk Roberts, Daisy Zhe Wang

DrugEHRQA is a question answering (QA) dataset containing question-answer pairs based on MIMIC-III tables and discharge summaries.

question-answer, qa

Published: April 12, 2022. Version: 1.0.0


Database Credentialed Access

RuMedNLI: A Russian Natural Language Inference Dataset For The Clinical Domain

Pavel Blinov, Aleksandr Nesterov, Galina Zubkova, Arina Reshetnikova, Vladimir Kokh, Chaitanya Shivade

RuMedNLI is the full Russian-language counterpart of the MedNLI dataset.

natural language inference, recognizing textual entailment, russian language

Published: April 1, 2022. Version: 1.0.0