Resources


Database Credentialed Access

CHIFIR: Cytology and Histopathology Invasive Fungal Infection Reports

Vlada Rozova, Anna Khanina, Jasmine Teng, Joanne Teh, Leon Worth, Monica Slavin, Karin Thursky, Karin Verspoor

A corpus of cytology and histopathology reports annotated for terminology relevant to fungal infections. Ideal for validation of named entity recognition and relation extraction methods.

nlp clinical documentation information extraction invasive fungal infections

Published: Feb. 20, 2024. Version: 1.0.2


Database Contributor Review

CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools

Eulalia Farre Maduell, Salvador Lima-Lopez, Santiago Andres Frid, Artur Conesa, Elisa Asensio, Antonio Lopez-Rueda, Helena Arino, Elena Calvo, Maria Jesús Bertran, Maria Angeles Marcos, Montserrat Nofre Maiz, Laura Tañá Velasco, Antonia Marti, Ricardo Farreres, Xavier Pastor, Xavier Borrat Frigola, Martin Krallinger

CARMEN-I is a Spanish corpus of 2,000 clinical records from Hospital Clínic, Barcelona. It covers COVID-19 patients and their comorbidities, serving as a resource for training clinical NLP models and for researchers applying NLP to clinical documents.

de-identification clinical ner anonymization

Published: April 20, 2024. Version: 1.0.1


Database Restricted Access

MIMIC-Eye: Integrating MIMIC Datasets with REFLACX and Eye Gaze for Multimodal Deep Learning Applications

Chihcheng Hsieh, Chun Ouyang, Jacinto C Nascimento, Joao Pereira, Joaquim Jorge, Catarina Moreira

Published: March 23, 2023. Version: 1.0.0


Database Credentialed Access

RadGraph2: Tracking Findings Over Time in Radiology Reports

Adam Dejl, Sameer Khanna, Patricia Therese Pile, Kibo Yoon, Steven QH Truong, Hanh Duong, Agustina Saenz, Pranav Rajpurkar

RadGraph2 is a dataset of 800 chest radiology reports annotated using a fine-grained entity-relationship schema, which captures key findings as well as mentions of changes that occurred in comparison with the previous radiology studies.

chest x-rays relation extraction disease progression information extraction radiology reports named entity recognition

Published: Aug. 8, 2024. Version: 1.0.0


Model Credentialed Access

EntityBERT: BERT-based Models Pretrained on MIMIC-III with or without Entity-centric Masking Strategy for the Clinical Domain

Chen Lin, Steven Bethard, Guergana Savova, Timothy Miller, Dmitriy Dligach

Models with a broad representation of biomedical terminology (PubMedBERT), further pretrained on the MIMIC-III corpus with or without a novel entity-centric masking strategy.

Published: March 17, 2022. Version: 1.0.1


Database Credentialed Access

RadNLI: A natural language inference dataset for the radiology domain

Yasuhide Miura, Yuhao Zhang, Emily Tsai, Curtis Langlotz, Dan Jurafsky

A radiology NLI dataset introduced in the paper: Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation

Published: June 29, 2021. Version: 1.0.0


Database Credentialed Access

National Institutes of Health Stroke Scale (NIHSS) Annotations for the MIMIC-III Database

Jiayang Wang, Xiaoshuo Huang, Lin Yang, Jiao Li

A dataset of annotated NIHSS scale items and corresponding scores from stroke patients' discharge summaries in MIMIC-III.

Published: Jan. 25, 2021. Version: 1.0.0


Model Credentialed Access

Characterization of Stigmatizing Language in Medical Records

Keith Harrigian, Ayah Zirikly, Brant Chee, Alya Ahmad, Anne Links, Somnath Saha, Mary Catherine Beach, Mark Dredze

A suite of classifiers for detecting three types of stigmatizing language in electronic medical records. Trained on MIMIC-IV discharge notes.

clinical natural language processing domain transfer bias stigmatizing language large language models mimic

Published: Nov. 6, 2023. Version: 1.0.0


Database Credentialed Access

MedNLI - A Natural Language Inference Dataset For The Clinical Domain

Chaitanya Shivade

This is a resource for training machine learning models for language inference in the medical domain.

natural language inference recognizing textual entailment

Published: Oct. 1, 2019. Version: 1.0.0