Resources


Database Contributor Review

CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools

Eulalia Farre Maduell, Salvador Lima-Lopez, Santiago Andres Frid, et al.

CARMEN-I is a Spanish corpus of 2,000 clinical records from Hospital Clínic, Barcelona. It covers COVID-19 patients and their comorbidities, and serves as a resource for training clinical NLP models and for researchers applying NLP to clinical documents.

de-identification clinical ner anonymization

Published: April 20, 2024. Version: 1.0.1


Database Open Access

PADS - Parkinsons Disease Smartwatch dataset

Julian Varghese, Alexander Brenner, Lucas Plagwitz, et al.

The PADS dataset contains smartwatch-based records from interactive neurological assessments of Parkinson's disease patients, patients with differential diagnoses, and healthy controls. The data is complemented with non-motor symptoms and medical history information.

wearables movement disorders parkinsons disease

Published: March 25, 2024. Version: 1.0.0


Database Open Access

ScientISST MOVE: Annotated Wearable Multimodal Biosignals recorded during Everyday Life Activities in Naturalistic Environments

João Areias Saraiva, Mariana Abreu, Ana Sofia Carmo, et al.

Multimodal (ECG, EMG, EDA, PPG, TEMP, ACC) biosignal dataset of everyday activities. Created with 3 wearable devices based on ScientISST Sense and Empatica E4.

greet lift uncontrolled environments run jump gesticulate walk wearable multimodal

Published: March 25, 2024. Version: 1.0.1


Database Open Access

Respiratory and heart rate monitoring dataset from aeration study

Ella Frances Sophia Guy, Isaac Flett, Jaimey Anne Clifton, et al.

Respiratory and cardiovascular data collected from 20 subjects. Pressure, flow, aeration, and heart-rate data were collected during trials which included resting breathing, CPAP at varied PEEP settings, breath-holds, and forced expiratory manoeuvres.

Published: March 20, 2024. Version: 1.0.0


Database Restricted Access

CheXchoNet: A Chest Radiograph Dataset with Gold Standard Echocardiography Labels

Pierre Elias, Shreyas Bhave

Early detection of heart failure is vital for improving outcomes. The dataset contains 71,589 chest radiographs (CXRs) paired with gold-standard labels from echocardiograms to enable the training of models to detect pathologies indicative of early-stage heart failure.

chest x-rays heart failure early detection cardiac structural abnormalities deep learning

Published: March 20, 2024. Version: 1.0.0


Database Credentialed Access

MIMIC-CXR-JPG - chest radiographs with structured labels

Alistair Johnson, Matthew Lungren, Yifan Peng, et al.

Chest x-rays in JPG format with structured labels derived from the associated radiology reports.

computer vision chest x-ray radiology mimic deep learning

Published: March 12, 2024. Version: 2.1.0


Database Credentialed Access

EchoNotes Structured Database derived from MIMIC-III (ECHO-NOTE2NUM)

Gloria Hyunjung Kwak, Dana Moukheiber, Mira Moukheiber, et al.

A structured echocardiogram database derived from 43,472 observational notes obtained during echocardiogram studies conducted in the intensive care unit at the Beth Israel Deaconess Medical Center between 2001 and 2012.

Published: Feb. 23, 2024. Version: 1.0.0


Database Credentialed Access

CHIFIR: Cytology and Histopathology Invasive Fungal Infection Reports

Vlada Rozova, Anna Khanina, Jasmine Teng, et al.

A corpus of cytology and histopathology reports annotated for terminology relevant to fungal infections. Ideal for validation of named entity recognition and relation extraction methods.

nlp clinical documentation information extraction invasive fungal infections

Published: Feb. 20, 2024. Version: 1.0.2


Database Credentialed Access

CORAL: expert-Curated medical Oncology Reports to Advance Language model inference

Madhumita Sushil, Vanessa Kennedy, Divneet Mandair, et al.

Medical oncology progress notes annotated with advanced, comprehensive oncology-relevant concepts and relationships.

artificial intelligence information extraction oncology natural language processing electronic health records large language models

Published: Feb. 7, 2024. Version: 1.0


Database Open Access

A Multi-Modal Satellite Imagery Dataset for Public Health Analysis in Colombia

Sebastian A Cajas, David Restrepo, Dana Moukheiber, et al.

A multimodal satellite imagery dataset for public health analysis in Colombia, with spatiotemporally aligned satellite images and their corresponding metadata across 81 municipalities (2016-2018), facilitating multimodal AI applications.

multimodality satellite imagery

Published: Jan. 30, 2024. Version: 1.0.0