Resources


Database Open Access

CUILESS2016

A corpus of Concept Unique Identifier concepts taken from the SemEval2015 Task 14.

concept umls snomed

Published: Jan. 24, 2018. Version: 1.0.0


Database Credentialed Access

Paediatric Intensive Care database

Haomin Li, Xian Zeng, Gang Yu

PIC (Paediatric Intensive Care) is a large paediatric-specific, single-centre, bilingual database comprising information relating to children admitted to critical care units at a large children’s hospital in China.

intensive care natural language processing critical care pediatrics

Published: Nov. 12, 2020. Version: 1.1.0


Database Credentialed Access

MIMIC-III Clinical Database

Alistair Johnson, Tom Pollard, Roger Mark

MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The databas…

intensive care clinical natural language processing critical care machine learning

Published: Sept. 4, 2016. Version: 1.4


Challenge Credentialed Access

ShAReCLEF eHealth 2013: Natural Language Processing and Information Retrieval for Clinical Care

Danielle Mowery

2013 ShARe/CLEF eHealth Evaluation Lab: Natural Language Processing and Information Retrieval for Clinical Care (Tasks 1 and 2).

natural language processing

Published: Feb. 15, 2013. Version: 1.0


Challenge Credentialed Access

Analysis of Clinical Text: Task 14 of SemEval 2015

Guergana Savova

This is the dataset for SemEval 2014 and 2015, Analysis of Clinical Text

nlp semeval

Published: Dec. 28, 2014. Version: 2.0


Database Credentialed Access

RadNLI: A natural language inference dataset for the radiology domain

Yasuhide Miura, Yuhao Zhang, Emily Tsai, Curtis Langlotz, Dan Jurafsky

A radiology NLI dataset introduced in the paper: Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation

Published: June 29, 2021. Version: 1.0.0


Model Credentialed Access

EntityBERT: BERT-based Models Pretrained on MIMIC-III with or without Entity-centric Masking Strategy for the Clinical Domain

Chen Lin, Timothy Miller, Steven Bethard, Guergana Savova, Dmitriy Dligach

Pretraining of models with a broad representation of biomedical terminology (PubMedBERT) on MIMIC-III corpus along with or without a novel entity-centric masking strategy.

Published: Sept. 9, 2021. Version: 1.0.0


Model Credentialed Access

Clinical BERT Models Trained on Pseudo Re-identified MIMIC-III Notes

Eric Lehman, Sarthak Jain, Karl Pichotta, Yoav Goldberg, Byron Wallace

We explore recovering sensitive info from BERT trained over non-deidentified EHR. We make our models and data available to further facilitate research.

Published: April 28, 2021. Version: 1.0.0


Database Credentialed Access

National Institutes of Health Stroke Scale (NIHSS) Annotations for the MIMIC-III Database

Jiayang Wang, Xiaoshuo Huang, Lin Yang, Jiao Li

A dataset of annotated NIHSS scale items and corresponding scores from stroke patients discharge summaries in MIMIC-III.

Published: Jan. 25, 2021. Version: 1.0.0


Model Credentialed Access

Transformer models trained on MIMIC-III to generate synthetic patient notes

Ali Amin-Nejad, Julia Ive, Sumithra Velupillai

Machine learning models that have been trained using MIMIC-III to enable the creation of synthetic discharge summaries.

Published: May 27, 2020. Version: 1.0.0