
CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes

James Mullenbach Yada Pruksachatkun Sean Adler Jennifer Seale Jordan Swartz T Greg McKelvey Yi Yang David Sontag

Published: June 21, 2021. Version: 1.0.0

When using this resource, please cite:
Mullenbach, J., Pruksachatkun, Y., Adler, S., Seale, J., Swartz, J., McKelvey, T. G., Yang, Y., & Sontag, D. (2021). CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes (version 1.0.0). PhysioNet.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.


We created a dataset of clinical action items annotated over MIMIC-III. This dataset, which we call CLIP, is annotated by physicians and covers 718 discharge summaries, representing 107,494 sentences. Annotations were collected as character-level spans over discharge summaries after applying surrogate generation to fill in the anonymized templates from MIMIC-III text with synthetic data. We release these spans, their aggregation into sentence-level labels, and the sentence tokenizer used to aggregate the spans and label sentences. We also release the surrogate data generator, and the document IDs used for training, validation, and test splits, to enable reproduction. The spans are annotated with 0 or more labels of 7 different types, representing the different actions that may need to be taken: Appointment, Lab, Procedure, Medication, Imaging, Patient Instructions, and Other. We encourage the community to use this dataset to develop methods for automatically extracting clinical action items from discharge summaries.


Continuity of care is crucial to ensuring positive health outcomes for patients discharged from an inpatient hospital setting, and improved information sharing via discharge summaries can help. When patients are discharged, they often require further actions, such as reviewing lab test results once they are available, to be taken by their primary care provider (PCP), who manages their long-term health. Jackson et al. [1] found that following up on pending clinical actions is critical for minimizing risk of medical error during care transitions, especially for patients with complex treatment plans. However, discharge summaries are often lengthy, so scanning the document for specific action items can be time-consuming and error-prone.

We introduce this dataset to enable work on a new task in clinical natural language processing that accomplishes focused information extraction from ICU discharge summaries. Given a discharge summary, the task is to extract all the clinically actionable follow-up items in the note, at any level of specificity: character-, word-, or sentence-level. A successful NLP model for this task can generate a focused summary by extracting spans from a discharge summary to present to a physician. This may reduce physicians’ cognitive load, time spent reading documentation, and likelihood of omitting important care items when receiving patients recently discharged from an inpatient setting. An associated paper provides further detail on the dataset, and describes work to develop machine learning models that output clinically actionable follow-up items given an input note [2, 3].


CLIP is created on top of the popular clinical dataset MIMIC-III [4,5]. The MIMIC-III dataset contains 59,652 critical care discharge summaries from the Beth Israel Deaconess Medical Center over the period of 2001 to 2012, among millions of other notes and structured data. We annotated 718 randomly sampled discharge summaries from the set of patients that were discharged from the ICU (i.e., survived) and thus brought back to the care of their primary care physician or relevant specialists. This dataset is also the first of its kind in the clinical space. The total number of sentences (using our provided sentence tokenizer) is 107,494, of which 12,079 have at least one label. 

Our dataset was annotated by four physicians and one resident over the course of three months. We built a special-purpose annotation tool, which allowed annotators to select and label arbitrary character-level spans of text within the document. Since MIMIC-III is an anonymized dataset, entities such as names, dates, phone numbers, hospital names, and others that were censored were replaced with synthetic substitute entities, to make reading and annotating notes easier.

After collecting initial annotations, we met with the annotators in multiple sessions to reconcile differences in their annotations. We adjusted the annotation guidelines slightly to reduce ambiguity and improve labeling consistency. Specifically, the Patient Instructions label originally instructed annotators to choose only those instructions that are unique to that patient, and exclude general guidelines such as “Call your doctor if you experience a fever.” However, we observed this was too ambiguous in practice, so we chose to automatically label any sentence in document sections “Followup instructions” and “Discharge instructions” as the Patient Instructions label, using regular expressions to identify these common section headers in the MIMIC-III discharge summaries. We then had two of the original annotators revise all existing annotations, to catch mistakes and adjust to the refined guidelines. We release our final guidelines along with the dataset.
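The automatic labeling rule can be sketched roughly as follows. This is not the released implementation: the header pattern, the assumption that a section ends at the next short line ending in a colon, and the function name `label_instruction_lines` are all illustrative.

```python
import re

# Hypothetical pattern for the two target section headers.
TARGET_HEADER = re.compile(r"^(follow-?up|discharge) instructions:", re.IGNORECASE)
# Assumption: any line made of words followed by a colon starts a new section.
ANY_HEADER = re.compile(r"^[A-Za-z][A-Za-z /-]*:\s*$")


def label_instruction_lines(lines):
    """Return one boolean per line: True if it lies inside a target section."""
    in_target = False
    flags = []
    for line in lines:
        stripped = line.strip()
        if TARGET_HEADER.match(stripped):
            in_target = True
            flags.append(False)  # the header itself is not labeled
        elif ANY_HEADER.match(stripped):
            in_target = False  # a different section begins
            flags.append(False)
        else:
            flags.append(in_target and bool(stripped))
    return flags


# Fabricated note fragment, one line per entry.
note = [
    "Discharge Medications:",
    "Aspirin 81 mg daily",
    "Discharge Instructions:",
    "Call your doctor if you experience a fever.",
    "Followup Instructions:",
    "See your PCP in 2 weeks.",
]
flags = label_instruction_lines(note)
```

In this toy note, only the lines under the two instruction headers are flagged; a real pipeline would apply the rule to tokenized sentences rather than raw lines.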

We estimated inter-rater reliability by having two physician annotators independently annotate a set of 13 documents comprising 2,600 sentences. Comparing their annotations on a binary reduction of the task, in which a match indicates that both annotators labeled a sentence (regardless of chosen label types), we measured a Cohen's kappa statistic of 0.925.
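For reference, Cohen's kappa on such a binary reduction compares observed agreement with the agreement expected by chance. The sketch below uses made-up toy labels, not CLIP annotations, and the function name `cohens_kappa` is our own.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary labels (0/1)."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # fraction of sentences annotator A labeled
    p_b = sum(b) / n  # fraction of sentences annotator B labeled
    # Chance agreement: both label the sentence, or both leave it unlabeled.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Toy data: 1 = the annotator labeled the sentence, 0 = did not.
annotator_a = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
annotator_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(annotator_a, annotator_b)
```

Here the annotators agree on 9 of 10 sentences (observed agreement 0.9) against a chance agreement of 0.62, giving a kappa of about 0.74.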

The sampled MIMIC-III data is further split randomly into training, validation, and test sets, such that all sentences from a document go to the same set, with 518, 100, and 100 notes respectively.
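A document-level split of this kind can be sketched as below. The seed, the shuffling procedure, and the function name `split_doc_ids` are assumptions for illustration; to reproduce the published splits, use the released ID files rather than re-sampling.

```python
import random


def split_doc_ids(doc_ids, n_val, n_test, seed=0):
    """Shuffle document IDs and partition them so no document spans two sets."""
    ids = sorted(doc_ids)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


doc_ids = range(718)  # placeholder IDs, not real MIMIC-III identifiers
train, val, test = split_doc_ids(doc_ids, n_val=100, n_test=100)
```

Because the split is made over document IDs, every sentence inherits the set of its parent document.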

Data Description

The sentence-level data (sentence_level.csv) is a csv with four columns: "doc_id" representing the unique id for the discharge summary, "sent_index" representing the location of the sentence within the document, the "sentence" as a pre-tokenized list of words, and the "labels" as a (possibly empty) list of labels for that sentence.
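A minimal sketch of reading this file with the Python standard library follows. It assumes the two list-valued columns are serialized as Python-style list literals, which should be checked against the actual file; the sample rows and the label string are fabricated.

```python
import ast
import csv
import io


def read_sentence_rows(handle):
    """Parse sentence-level rows, decoding the list-valued columns."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "doc_id": row["doc_id"],
            "sent_index": int(row["sent_index"]),
            # Assumption: lists are stored as Python literals, e.g. "['a', 'b']".
            "sentence": ast.literal_eval(row["sentence"]),
            "labels": ast.literal_eval(row["labels"]),
        })
    return rows


# Fabricated sample in the assumed serialization.
SAMPLE_CSV = '''doc_id,sent_index,sentence,labels
101,0,"['Please', 'follow-up', 'for', 'EGD', 'with', 'GI', '.']","['Procedure']"
101,1,"['He', 'was', 'afebrile', '.']",[]
'''
rows = read_sentence_rows(io.StringIO(SAMPLE_CSV))
```

For the real file, the same function could be called on `open("sentence_level.csv", newline="")`.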

The character-level data (`character_level/*.json`) is stored as one JSON file per document, with the following format:

{
  id: <doc_id>,
  text: <full text with surrogate generated entities>,
  spans: [
    {
      document_id: <doc_id>,
      type: [label_type_1, ...],
      start: <character index of span start>,
      end: <character index of span end>
    },
    ...
  ]
}
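The sketch below shows one way to read spans from a document in the format above and to aggregate them into labels for a sentence. It assumes that `end` is an exclusive index and that a sentence inherits the labels of every span it overlaps; both are assumptions to verify against the released tokenizer and aggregation code. The document and the function names are fabricated for illustration.

```python
import json


def span_texts(doc):
    """Yield (labels, text) for each annotated span in one document."""
    for span in doc["spans"]:
        # Assumption: "end" is exclusive, as in Python slicing.
        yield span["type"], doc["text"][span["start"]:span["end"]]


def sentence_labels(doc, sent_start, sent_end):
    """Collect labels of all spans overlapping the range [sent_start, sent_end)."""
    labels = set()
    for span in doc["spans"]:
        if span["start"] < sent_end and span["end"] > sent_start:
            labels.update(span["type"])
    return sorted(labels)


# Fabricated document following the format above.
text = "He did well. Please follow-up for EGD with GI."
target = "Please follow-up for EGD with GI."
start = text.index(target)
raw = json.dumps({
    "id": "101",
    "text": text,
    "spans": [{"document_id": "101", "type": ["Procedure"],
               "start": start, "end": start + len(target)}],
})
doc = json.loads(raw)
extracted = list(span_texts(doc))
```

With real data, `doc` would come from `json.load` on one of the per-document files.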

The train, validation, and test split document ids are found in train_ids.csv, val_ids.csv, and test_ids.csv, respectively, each with a single column containing the document id.

The seven action item aspects that we labeled in the dataset, along with (fabricated) example discharge summary snippets for each one, are presented below:

Action Type: Patient Instructions
Description: Post-discharge instructions that are directed to the patient, so the PCP can ensure the patient understands and performs them.
Example: No driving until post-op visit and you are no longer taking pain medications.

Action Type: Appointment
Description: Appointments to be made by the PCP, or monitored to ensure the patient attends them.
Example: The patient requires a neurology consult at XYZ for evaluation.

Action Type: Medication
Description: Medications that the PCP needs to ensure the patient is taking correctly, e.g. time-limited medications or new medications that may need dose adjustment.
Example: The patient was instructed to hold ASA and refrain from NSAIDs for 2 weeks.

Action Type: Lab
Description: Laboratory tests that either have results pending or need to be ordered by the PCP.
Example: We ask that the patient’s family physician repeat these tests in 2 weeks to ensure resolution.

Action Type: Procedure
Description: Procedures that the PCP needs to either order, ensure another caregiver orders, or ensure the patient undergoes.
Example: Please follow-up for EGD with GI.

Action Type: Imaging
Description: Imaging studies that either have results pending or need to be ordered by the PCP.
Example: Superior segment of the left lower lobe: rounded density which could have been related to infection, but follow-up for resolution recommended to exclude possible malignancy

Action Type: Other
Description: Other actionable information that is important to relay to the PCP but does not fall under existing aspects (e.g. the need to closely observe the patient's diet, or fax results to another provider).
Example: Since the patient has been struggling to gain weight this past year, we will monitor his nutritional status and trend weights closely.

The sentence-level prevalence of each label is as follows:

  • Patient Instructions: 6.55%
  • Appointment: 4.59%
  • Medication: 1.88%
  • Lab: 0.69%
  • Procedure: 0.28%
  • Imaging: 0.18%
  • Other: 0.05%
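These percentages sum to 14.22%, more than the share of sentences with at least one label (12,079 of 107,494, about 11.24%), because a sentence may carry several labels. The sketch below shows how such prevalences could be computed from per-sentence label lists; the toy data and the function name `label_prevalence` are illustrative.

```python
from collections import Counter


def label_prevalence(label_lists):
    """Percentage of sentences carrying each label (one label list per sentence)."""
    n = len(label_lists)
    # set() guards against a label being listed twice on one sentence.
    counts = Counter(label for labels in label_lists for label in set(labels))
    return {label: 100.0 * count / n for label, count in counts.items()}


# Toy data: 8 sentences, 2 of them labeled.
toy = [["Appointment"], [], [], ["Appointment", "Medication"], [], [], [], []]
prevalence = label_prevalence(toy)

# Share of CLIP sentences with at least one label, from the reported counts.
any_label_pct = 100.0 * 12079 / 107494
```

In the toy data, two labeled sentences out of eight (25%) yield per-label prevalences of 25% and 12.5%, which sum to more than 25% for the same reason.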

Usage Notes

In an associated paper, we describe work to develop machine learning models that output clinically actionable follow-up items given an input note [2]. Code for reproducing this paper is provided in the code folder and is also available on GitHub [3].

To replicate one of the main results of the paper (Table 4 row 8, “MIMIC-DNote-BERT+Context”), run the following loop, which invokes the training script 10 times:

for seed in 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346; do
  python ../../../data/all_revised_data/train.csv  clinicalbert_disch --criterion auc_macro --n_context_sentences 2 --seed $seed
done

For further details on reproducing the study, please refer to the file provided with the code.


We thank our annotation team of physicians for their fruitful collaboration.

Conflicts of Interest

The authors have no conflicts of interest to declare.


  1. Jackson, C. T., Shahsahebi, M., Wedlake, T., & Dubard, C. A. (2015). Timeliness of outpatient follow-up: An evidence-based approach for planning after hospital discharge. Annals of Family Medicine, 13(2), 115–122.
  2. Mullenbach, J., Pruksachatkun, Y., Adler, S., Seale, J., Swartz, J., McKelvey, G., Dai, H., Yang, Y., & Sontag, D. (2021). CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes. ACL 2021.
  3. Code for the CLIP Dataset. [Accessed: 20 May 2021]
  4. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet.
  5. Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035.

Parent Projects
CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes was derived from the MIMIC-III Clinical Database [4]. Please cite it when using this project.

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research


DOI (version 1.0.0):

DOI (latest version):
