Resources


Database Contributor Review

CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools

Eulalia Farre Maduell, Salvador Lima-Lopez, Santiago Andres Frid, Artur Conesa, Elisa Asensio, Antonio Lopez-Rueda, Helena Arino, Elena Calvo, Maria Jesús Bertran, Maria Angeles Marcos, Montserrat Nofre Maiz, Laura Tañá Velasco, Antonia Marti, Ricardo Farreres, Xavier Pastor, Xavier Borrat Frigola, Martin Krallinger

CARMEN-I is a Spanish corpus of 2,000 clinical records from Hospital Clínic, Barcelona. It covers COVID-19 patients and comorbidities, and serves as a resource for training clinical NLP models and for researchers applying NLP to clinical documents.

de-identification clinical ner anonymization

Published: April 20, 2024. Version: 1.0.1


Database Credentialed Access

Nosocomial Risk Datasets from MIMIC-III

Travis Goodwin

Text-based Longitudinal Data for Predicting Nosocomial Disease Risk as used by CANTRIP.

pressure injury risk prediction acute kidney injury anemia forecasting natural language processing deep learning

Published: Sept. 15, 2022. Version: 1.0


Database Contributor Review

BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language

Henrique Dias, Ana Helena Dias Pereira dos Ulbrich

Brazilian clinical dataset containing over 70,000 admissions from 10 hospitals in two Brazilian states.

prescriptions exams tertiary care natural language processing clinical notes

Published: July 14, 2022. Version: 1.1


Database Credentialed Access

Maternal fat ultrasound measurement and nutritional assessment during pregnancy: A dataset centered in gestational outcomes

Alexandre da Silva Rocha, Juliana Rombaldi Bernardi, Alice Schoffel, Daniela Kretzer, Salete Matos, José Antônio Magalhães, Marcelo Goldani

Dataset collected as part of a prospective study in which abdominal maternal fat tissue measurements were compared with outcomes during hospitalization for labor and delivery.

pregnancy ultrasound abdominal

Published: Dec. 4, 2020. Version: 1.0.0


Database Credentialed Access

AMR-UTI: Antimicrobial Resistance in Urinary Tract Infections

Michael Oberst, Soorajnath Boominathan, Helen Zhou, Sanjat Kanjilal, David Sontag

AMR-UTI is a freely accessible dataset, derived from electronic health record (EHR) information on over 100,000 urinary tract infections (UTI) treated at Massachusetts General Hospital and Brigham & Women's Hospital in Boston, MA, USA.

antibiotic resistance causal inference policy learning antimicrobial resistance urinary tract infection clinical decision support machine learning

Published: Nov. 4, 2020. Version: 1.0.0


Database Credentialed Access

Phenotype Annotations for Patient Notes in the MIMIC-III Database

Edward Moseley, Leo Anthony Celi, Joy Wu, Franck Dernoncourt

Clinical notes, annotated by at least two expert annotators for over ten patient phenotypes, including advanced cancer, substance abuse, and treatment non-adherence.

patient classification natural language processing

Published: March 5, 2020. Version: 1.20.03


Challenge Open Access

Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019

Matthew Reyna, Chris Josef, Russell Jeter, Supreeth Shashikumar, Benjamin Moody, M. Brandon Westover, Ashish Sharma, Shamim Nemati, Gari D. Clifford

The 2019 PhysioNet/Computing in Cardiology Challenge invites participants to predict sepsis from clinical data.

prediction challenge sepsis

Published: Aug. 5, 2019. Version: 1.0.0


Challenge Open Access

Predicting Mortality of ICU Patients: The PhysioNet/Computing in Cardiology Challenge 2012

The focus of the PhysioNet/CinC Challenge 2012 is to develop methods for patient-specific prediction of in-hospital mortality. Participants will use information collected during the first two days of an ICU stay to predict which patients survive the…

mortality prediction challenge ehr mimic

Published: Jan. 20, 2012. Version: 1.0.0


Database Open Access

HeartCycle: A comprehensive dataset of synchronized impedance cardiography and echocardiography for accurate hemodynamic predictions

Eduardo Illueca Fernandez, Ricardo Couceiro, Farhad Abtahi, Jorge Henriques, Rui Pedro Paiva, Lino Goncalves, Jose Millet, Fernando Seoane, Jens Muehlsteff, Paulo Carvalho

Impedance cardiography (ICG) dataset that combines ICG signals and other methodologies with gold-standard echocardiography. Researchers can use this dataset to compare the ICG points with the real hemodynamic events.

machine learning cardiovascular physiology electrophysiological study echocardiography impedance cardiography

Published: Nov. 2, 2025. Version: 1.0.0


Database Open Access

tOLIet: Single-lead Thigh-based Electrocardiography Using Polimeric Dry Electrodes

Aline Santos Silva, Hugo Plácido da Silva, Miguel Correia, Andreia Cristina Gonçalves da Costa, Sérgio Laranjo

We present tOLIet, the first thigh ECG dataset with real signals captured by a toilet seat with electrodes. There are 149 recordings from 86 people, useful for research into cardiovascular assessment using "invisible" ECG.

Published: June 24, 2025. Version: 1.0.0