Resources
Database Open Access
Synthetic Mention Corpora for Disease Entity Recognition and Normalization
We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, which contain 128,000 LLM-generated disease mentions drawn from the UMLS disorder semantic group. The corpus aims to improve disease entity recognition and normalization in biomedical and clinical texts.
nlp machine learning named entity recognition data augmentation entity normalization
Published: Feb. 3, 2025. Version: 1.0.0
Database Contributor Review
CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools
CARMEN-I is a Spanish corpus of 2,000 clinical records from Hospital Clínic, Barcelona. It covers COVID-19 patients and their comorbidities, and serves both as a resource for training clinical NLP models and as a resource for researchers working on NLP applied to clinical documents.
de-identification clinical ner anonymization
Published: April 20, 2024. Version: 1.0.1