Database Credentialed Access

# MedNLI - A Natural Language Inference Dataset For The Clinical Domain

Published: Oct. 1, 2019. Version: 1.0.0

Shivade, C. (2019). MedNLI - A Natural Language Inference Dataset For The Clinical Domain (version 1.0.0). PhysioNet. https://doi.org/10.13026/C2RS98.

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

## Abstract

State-of-the-art models using deep neural networks have become very good at learning an accurate mapping from inputs to outputs. However, they still lack generalization capabilities in conditions that differ from the ones encountered during training. This is even more challenging in specialized and knowledge-intensive domains, where training data is limited. To address this gap, we introduce MedNLI - a dataset annotated by doctors performing a natural language inference (NLI) task, grounded in the medical history of patients. As the source of premise sentences, we used the MIMIC-III database. More specifically, to minimize the risks to patient privacy, we worked with clinical notes corresponding to deceased patients. The clinicians in our team suggested the Past Medical History to be the most informative section of a clinical note, from which useful inferences can be drawn about the patient.

## Background

Owing to restricted access to patient data, the common approach of using crowdsourcing platforms to get annotations is not possible in the medical domain. Labeling requires domain experts, which results in prohibitive annotation costs. Owing to these constraints, community-driven NLP research facilitated by shared resources has started late in the clinical domain. Despite these barriers, publicly available datasets have been created through initiatives such as i2b2 and CLEF. However, most of these datasets have modest sizes, and they target either fundamental NLP problems (e.g. co-reference resolution) or information extraction tasks (e.g. named entity extraction). Currently, the clinical domain lacks large labeled datasets to train modern data-intensive models for end-to-end tasks such as NLI, question answering, or paraphrasing. These problems (particularly NLI) have excellent resources in the open domain. To close this gap, we curated MedNLI and made it publicly available [1].

## Methods

The clinicians in our team suggested the Past Medical History to be the most informative section of a clinical note, from which useful inferences can be drawn about the patient. Therefore, we segmented the clinical notes into sections using a simple rule-based program capturing the formatting of the section headers. We extracted the Past Medical History section and used a sentence splitter from LingPipe, trained on biomedical articles, to get a pool of candidate premises. We then randomly sampled sentences from these candidates and presented them to the clinicians for annotation. The exact prompt shown to the clinicians for the annotation task is as follows.
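The premise-selection steps described here (rule-based sectioning, sentence splitting, random sampling) can be illustrated with a minimal Python sketch. It is not the authors' program: the header pattern `HEADER_RE`, the function names, and the punctuation-based splitter standing in for the LingPipe sentence model are all assumptions made for illustration, and real MIMIC-III notes are formatted far less regularly than the toy note below.

```python
import random
import re

# Hypothetical header pattern: a capitalized phrase followed by a colon at the
# start of a line, e.g. "Past Medical History:". This is an assumption, not the
# rule set used to build MedNLI.
HEADER_RE = re.compile(r"^([A-Z][A-Za-z /]{2,40}):[ \t]*", re.MULTILINE)


def split_sections(note):
    """Map each lower-cased section header to the text that follows it."""
    matches = list(HEADER_RE.finditer(note))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[match.group(1).strip().lower()] = note[match.end():end].strip()
    return sections


def naive_sentences(text):
    """Stand-in splitter: break on newlines and after '.', '?' or '!'."""
    parts = re.split(r"(?<=[.?!])\s+|\n+", text)
    return [part.strip() for part in parts if part.strip()]


def sample_premises(notes, k, seed=0):
    """Pool Past Medical History sentences from all notes and sample up to k."""
    pool = []
    for note in notes:
        pmh = split_sections(note).get("past medical history", "")
        pool.extend(naive_sentences(pmh))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    toy_note = (
        "Chief Complaint: chest pain\n"
        "Past Medical History:\n"
        "Type II diabetes. Hypertension since 2001.\n"
        "Social History: lives alone\n"
    )
    print(sample_premises([toy_note], k=2))
```

A sentence splitter trained on biomedical text, as used for the dataset, would handle abbreviations and list-style notes much better than the regular expression above.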

You will be shown a sentence from Past Medical History section of a de-identified clinical note. Using only this sentence, your knowledge about the field of medicine, and common sense:

• Write one alternate sentence that is definitely a true description of the patient. Example, for the sentence "Patient has type II diabetes" you could write "Patient suffers from a chronic condition".
• Write one alternate sentence that might be a true description of the patient. Example, for the sentence "Patient has type II diabetes" you could write "Patient has hypertension".
• Write one sentence that is definitely a false description of the patient. Example, for the sentence "Patient has type II diabetes" you could write "The patient's insulin levels are normal without any medications".

## Data Description

There are four files in the dataset:

1. README.txt: A short introduction to the dataset.
2. mli_train_v1.jsonl: The training split.
3. mli_dev_v1.jsonl: The development/validation split.
4. mli_test_v1.jsonl: The test split.
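As a rough sketch, the three split files could be read in Python with the standard library alone. The file names come from the list above; the helper names `read_jsonl` and `load_mednli`, and the local folder passed as `data_dir`, are hypothetical.

```python
import json
from pathlib import Path

# File names as listed in the dataset description.
SPLIT_FILES = {
    "train": "mli_train_v1.jsonl",
    "dev": "mli_dev_v1.jsonl",
    "test": "mli_test_v1.jsonl",
}


def read_jsonl(path):
    """Read one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_mednli(data_dir):
    """Load all splits from data_dir, a hypothetical local download folder."""
    root = Path(data_dir)
    return {split: read_jsonl(root / name) for split, name in SPLIT_FILES.items()}
```

Calling `load_mednli("path/to/mednli")` (a placeholder path) would return a dictionary with the keys `train`, `dev`, and `test`, each holding a list of records.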

## Usage Notes

The clinical notes from the NOTEEVENTS table of MIMIC-III (v1.4) are the source for the premise statements in this dataset [2]. More specifically, each note was segmented into sections, and sentences from the "past medical history" section were randomly sampled. The dataset is in JSON Lines format and follows exactly the same format as the SNLI and Multi_NLI datasets. Each record of this test set is a JSON line with the following structure:

1. gold_label: entailment, contradiction, or neutral (redacted since this is a test set).
2. sentence1: the premise statement.
3. sentence2: the hypothesis statement.
4. sentence1_parse: The constituency parse of the premise using the Stanford parser.
5. sentence2_parse: The constituency parse of the hypothesis using the Stanford parser.
6. sentence1_binary_parse: The binary parse of the premise using the Stanford parser.
7. sentence2_binary_parse: The binary parse of the hypothesis using the Stanford parser.
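The binary parse fields are space-separated bracket strings, so the tokens and the tree shape can be recovered without a parser library. The sketch below assumes the convention visible in the sample record: every bracket is its own token, and literal parentheses in the text are written as `-LRB-` and `-RRB-`. Both function names are hypothetical.

```python
def tokens_from_binary_parse(parse):
    """Return the leaf tokens of a binary parse string, in order."""
    return [tok for tok in parse.split() if tok not in ("(", ")")]


def tree_from_binary_parse(parse):
    """Convert a well-formed bracketed string into nested Python lists."""
    stack = [[]]
    for tok in parse.split():
        if tok == "(":
            stack.append([])          # open a new constituent
        elif tok == ")":
            node = stack.pop()        # close it and attach to its parent
            stack[-1].append(node)
        else:
            stack[-1].append(tok)     # leaf token
    return stack[0][0]


if __name__ == "__main__":
    parse = "( Patient ( has ( elevated Cr ) ) )"
    print(tokens_from_binary_parse(parse))
    print(tree_from_binary_parse(parse))
```

`tree_from_binary_parse` does not validate its input; an unbalanced string would raise an error or give a partial tree.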

A sample record from the training set is shown below:

{"sentence1": "Labs were notable for Cr 1.7 (baseline 0.5 per old records) and lactate 2.4.", "pairID": "23eb94b8-66c7-11e7-a8dc-f45c89b91419", "sentence1_parse": "(ROOT (S (NP (NNPS Labs)) (VP (VBD were) (ADJP (JJ notable) (PP (IN for) (NP (NP (NP (NN Cr) (CD 1.7)) (PRN (-LRB- -LRB-) (NP (NP (NN baseline) (CD 0.5)) (PP (IN per) (NP (JJ old) (NNS records)))) (-RRB- -RRB-))) (CC and) (NP (NN lactate) (CD 2.4)))))) (. .)))", "sentence1_binary_parse": "( Labs ( ( were ( notable ( for ( ( ( ( Cr 1.7 ) ( -LRB- ( ( ( baseline 0.5 ) ( per ( old records ) ) ) -RRB- ) ) ) and ) ( lactate 2.4 ) ) ) ) ) . ) )", "sentence2": " Patient has elevated Cr", "sentence2_parse": "(ROOT (S (NP (NN Patient)) (VP (VBZ has) (NP (JJ elevated) (NN Cr)))))", "sentence2_binary_parse": "( Patient ( has ( elevated Cr ) ) )", "gold_label": "entailment"}

The goal of the task is to classify a given premise-hypothesis pair into one of the three classes: entailment, contradiction, or neutral.
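To make the three-way classification setup concrete, here is a small sketch of label encoding and accuracy scoring, with a majority-class baseline as a sanity check. The integer ids, the function names, and the toy records are illustrative assumptions; the records are not taken from the dataset.

```python
from collections import Counter

LABELS = ("entailment", "contradiction", "neutral")
LABEL_TO_ID = {label: idx for idx, label in enumerate(LABELS)}  # arbitrary id order


def encode(records):
    """Split records into (premise, hypothesis) pairs and integer targets."""
    pairs = [(r["sentence1"].strip(), r["sentence2"].strip()) for r in records]
    targets = [LABEL_TO_ID[r["gold_label"]] for r in records]
    return pairs, targets


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def majority_baseline(train_targets, n):
    """Predict the most frequent training label for n examples."""
    most_common = Counter(train_targets).most_common(1)[0][0]
    return [most_common] * n


if __name__ == "__main__":
    # Toy records for illustration only.
    toy = [
        {"sentence1": "Patient has type II diabetes", "sentence2": "Patient suffers from a chronic condition", "gold_label": "entailment"},
        {"sentence1": "Patient has type II diabetes", "sentence2": "Patient has hypertension", "gold_label": "neutral"},
    ]
    _, targets = encode(toy)
    print(accuracy(majority_baseline(targets, len(targets)), targets))
```

The `.strip()` calls are there because the sample record's `sentence2` value begins with a space.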

## Release Notes

This page represents the republished version of the original MedNLI data first published here. There is no change in the data. All contents remain the same. The new look simply reflects the updated platform on which MIMIC-derived data is hosted.

## Acknowledgements

This work would not have been possible without Adam Coy, Andrew Colucci, Chanida Thammachart, and Hassan Ahmad, who helped us in creating the dataset.

## Conflicts of Interest

The authors have no conflicts of interest to declare.

## References

1. Romanov, A., & Shivade, C. (2018). Lessons from Natural Language Inference in the Clinical Domain. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 1586-1596).
2. Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035.

##### Parent Projects
MedNLI - A Natural Language Inference Dataset For The Clinical Domain was derived from: Please cite them when using this project.
##### Access

Access Policy:
Only credentialed users who sign the DUA can access the files.