Resources


Database Contributor Review

CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools

Eulalia Farre Maduell, Salvador Lima-Lopez, Santiago Andres Frid, Artur Conesa, Elisa Asensio, Antonio Lopez-Rueda, Helena Arino, Elena Calvo, Maria Jesús Bertran, Maria Angeles Marcos, Montserrat Nofre Maiz, Laura Tañá Velasco, Antonia Marti, Ricardo Farreres, Xavier Pastor, Xavier Borrat Frigola, Martin Krallinger

CARMEN-I is a Spanish corpus of 2,000 clinical records from Hospital Clínic, Barcelona. It covers COVID-19 patients and their comorbidities, and serves as a resource both for training clinical NLP models and for researchers working on NLP applied to clinical documents.

de-identification clinical ner anonymization

Published: April 20, 2024. Version: 1.0.1


Database Credentialed Access

CORAL: expert-Curated medical Oncology Reports to Advance Language model inference

Madhumita Sushil, Vanessa Kennedy, Divneet Mandair, Brenda Miao, Travis Zack, Atul Butte

Medical oncology progress notes annotated with advanced, comprehensive oncology-relevant concepts and relationships.

artificial intelligence electronic health records information extraction natural language processing oncology large language models

Published: Feb. 7, 2024. Version: 1.0


Challenge Credentialed Access

SNOMED CT Entity Linking Challenge

Will Hardman, Mark Banks, Rory Davidson, Donna Truran, Nindya Widita Ayuningtyas, Hoa Ngo, Alistair Johnson, Tom Pollard

272 discharge notes from the MIMIC-IV-Note dataset annotated with SNOMED CT concepts.

snomed entity linking clinical annotation

Published: Dec. 19, 2023. Version: 1.0.0


Database Credentialed Access

Radiology Report Expert Evaluation (ReXVal) Dataset

Feiyang Yu, Mark Endo, Rayan Krishnan, Ian Pan, Andy Tsai, Eduardo Pontes Reis, Eduardo Kaiser Ururahy Nunes Fonseca, Henrique Lee, Zahra Shakeri, Andrew Ng, Curtis Langlotz, Vasantha Kumar Venugopal, Pranav Rajpurkar

The Radiology Report Expert Evaluation (ReXVal) Dataset is a publicly available dataset of radiologist evaluations of errors in automatically generated radiology reports.

Published: June 20, 2023. Version: 1.0.0


Database Open Access

Simultaneous physiological measurements with five devices at different cognitive and physical loads

Marcus Vollmer, Dominic Bläsing, Julian Elias Reiser, Maria Nisser, Anja Buder

Dataset to support comparison of the usability and accuracy of five devices (NeXus-10 MKII, eMotion Faros 360°, Hexoskin Hx1, SOMNOTouch NIBP, Polar RS800 Multi), based on simultaneous measurements collected from 13 subjects.

holter multiparameter photoplethysmogram noise accelerometer heart rate ecg movement temperature hrv respiration

Published: Jan. 18, 2023. Version: 1.0.2
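
One common way to compare the accuracy of two devices that measure the same quantity at the same time is a Bland-Altman analysis: the mean difference (bias) and the 95% limits of agreement. The sketch below is a minimal illustration, not the dataset authors' method; it assumes paired, already time-aligned readings (for example heart rate in bpm), and the function name `bland_altman` and the sample values are hypothetical.

```python
import statistics

def bland_altman(a, b):
    """Return (bias, lower_loa, upper_loa) for paired measurements a and b.

    bias is the mean of the differences a - b; the limits of agreement are
    bias +/- 1.96 * sample standard deviation of those differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical time-aligned heart-rate readings (bpm) from two devices.
device_a = [72.0, 75.0, 80.0, 90.0, 101.0]
device_b = [70.0, 76.0, 78.0, 91.0, 99.0]
bias, lo, hi = bland_altman(device_a, device_b)
print(f"bias={bias:.2f} bpm, LoA=[{lo:.2f}, {hi:.2f}]")
```

Aligning the device streams in time (and resampling them to a common rate) is assumed to have happened beforehand; that step usually dominates the effort with multi-device recordings.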


Database Credentialed Access

RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports

Sarvesh Soni, Kirk Roberts

RadQA is an electronic health record question answering dataset containing clinical questions that can be answered using the Findings and Impressions sections of radiology reports.

electronic health records clinical notes question answering radiology reports machine reading comprehension

Published: Dec. 9, 2022. Version: 1.0.0


Database Credentialed Access

MS-CXR: Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing

Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel Coelho de Castro, Anton Schwaighofer, Stephanie Hyland, Maria Teodora Wetscherek, Tristan Naumann, Aditya Nori, Javier Alvarez Valle, Hoifung Poon, Ozan Oktay

MS-CXR is a new dataset containing 1,162 chest X-ray bounding-box labels paired with radiology text descriptions, annotated and verified by two board-certified radiologists.

chest x-ray vision-language processing

Published: May 16, 2022. Version: 0.1
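
Bounding-box labels like these are typically compared with predicted boxes using intersection over union (IoU). The sketch below is a generic illustration, not necessarily the metric the MS-CXR authors report; it assumes boxes are given as `(x, y, width, height)` in pixels, which is an assumption to check against the dataset's own documentation, and the function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# Two 100x100 boxes overlapping in a 50x50 region: 2500 / 17500 = 1/7.
print(iou((10, 10, 100, 100), (60, 60, 100, 100)))
```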


Database Open Access

Lobachevsky University Electrocardiography Database

Alena Kalyakulina, Igor Yusipov, Viktor Moskalenko, Alexander Nikolskiy, Konstantin Kosonogov, Nikolai Zolotykh, Mikhail Ivanchenko

An ECG signal database consisting of 200 10-second, 12-lead records. The boundaries and peaks of P and T waves and QRS complexes were manually annotated by cardiologists. Each record is annotated with the corresponding diagnosis.

diagnosis electrocardiography delineation open database database ecg

Published: Jan. 19, 2021. Version: 1.0.1
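
Because QRS complexes are annotated in every record, their peak positions can be turned into RR intervals and a mean heart rate. This is a minimal sketch, assuming the R-peak sample indices have already been extracted from the annotation files and that the sampling frequency is read from the record header (500 Hz is used below only as an example value); the function name and the indices are hypothetical.

```python
def mean_heart_rate(r_peaks, fs):
    """Mean heart rate in beats per minute from R-peak sample indices.

    r_peaks: strictly increasing sample indices of annotated R peaks.
    fs: sampling frequency in Hz.
    """
    if len(r_peaks) < 2:
        raise ValueError("need at least two R peaks")
    # RR intervals in seconds between consecutive peaks.
    rr = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]
    return 60.0 / (sum(rr) / len(rr))

# Hypothetical peaks 400 samples apart at 500 Hz: RR = 0.8 s, about 75 bpm.
print(mean_heart_rate([0, 400, 800], fs=500))
```

In a 10-second record only a handful of beats are available, so the mean RR interval is a coarse estimate of heart rate.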


Database Open Access

PhysioZoo - mammalian NSR databases

Ori Shemla, Joachim Behar

PhysioZoo is a collaborative platform dedicated to the study of heart rate variability in electrophysiological recordings from mammals.

heart rate variability electrophysiology mammals ecg

Published: Aug. 27, 2019. Version: 1.0.0
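
Heart rate variability is often summarized with time-domain measures such as SDNN (standard deviation of normal-to-normal intervals) and RMSSD (root mean square of successive differences). The sketch below is a generic illustration of those two definitions, not PhysioZoo's own implementation; it assumes a clean list of NN intervals in milliseconds, uses the sample standard deviation for SDNN (some tools use the population form), and the function name and values are hypothetical. Normal ranges differ widely between species, so no reference values are implied.

```python
import math
import statistics

def time_domain_hrv(nn_ms):
    """Return (SDNN, RMSSD) in ms for a list of NN intervals in ms."""
    if len(nn_ms) < 2:
        raise ValueError("need at least two intervals")
    sdnn = statistics.stdev(nn_ms)  # sample standard deviation of the intervals
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]  # successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return sdnn, rmssd

# Hypothetical NN intervals (ms).
sdnn, rmssd = time_domain_hrv([800, 810, 790, 800])
print(f"SDNN={sdnn:.2f} ms, RMSSD={rmssd:.2f} ms")
```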


Software Open Access

ECGPUWAVE

ecgpuwave analyses an ECG signal from the specified record, detecting the QRS complexes and locating the beginning, peak, and end of the P, QRS, and ST-T waveforms. The output of ecgpuwave is written as a standard WFDB-format annotation file associa…

ecg

Published: Oct. 29, 2018. Version: 1.3.4
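
Since ecgpuwave writes its delineation as a WFDB-format annotation file, a common post-processing step is to regroup the flat annotation stream into per-wave onset, peak, and offset positions. The sketch below assumes the usual WFDB delineation convention of an onset `(`, a peak symbol (`p` for P, `N` for QRS, `t` for T), and an offset `)`; verify that mapping against real ecgpuwave output before relying on it. It takes plain `(sample, symbol)` pairs, such as those built from the `sample` and `symbol` attributes of an annotation read with wfdb-python's `rdann`. The names `PEAK_SYMBOLS` and `group_waves` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed mapping from WFDB peak symbols to wave names; other symbols are ignored.
PEAK_SYMBOLS = {"p": "P", "N": "QRS", "t": "T"}

Wave = Tuple[str, Optional[int], int, Optional[int]]

def group_waves(annotations: List[Tuple[int, str]]) -> List[Wave]:
    """Group (sample, symbol) pairs into (wave, onset, peak, offset) tuples.

    An onset "(" is attached to the next peak symbol; an offset ")" closes
    the most recent peak that has no offset yet. Missing boundaries stay None.
    """
    waves = []
    pending_onset = None
    for sample, symbol in annotations:
        if symbol == "(":
            pending_onset = sample
        elif symbol in PEAK_SYMBOLS:
            waves.append([PEAK_SYMBOLS[symbol], pending_onset, sample, None])
            pending_onset = None
        elif symbol == ")":
            if waves and waves[-1][3] is None:
                waves[-1][3] = sample
    return [tuple(w) for w in waves]

# Hypothetical annotation stream for one beat.
ann = [(100, "("), (120, "p"), (140, ")"),
       (180, "("), (200, "N"), (220, ")"),
       (300, "("), (340, "t"), (380, ")")]
for wave in group_waves(ann):
    print(wave)
```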