
Annotation dataset of problematic opioid use and related contexts from MIMIC-III Critical Care Database discharge summaries

Melissa Poulsen, Vanessa Troiani, Philip Freda, Danielle Mowery, Anahita Davoudi

Published: Feb. 8, 2023. Version: 1.0.0


When using this resource, please cite:
Poulsen, M., Troiani, V., Freda, P., Mowery, D., & Davoudi, A. (2023). Annotation dataset of problematic opioid use and related contexts from MIMIC-III Critical Care Database discharge summaries (version 1.0.0). PhysioNet. https://doi.org/10.13026/tbb4-t319.

Additionally, please cite the original publication:

Poulsen MN, Freda PJ, Troiani V, Davoudi A, Mowery DL. Classifying Characteristics of Opioid Use Disorder From Hospital Discharge Summaries Using Natural Language Processing. Frontiers in Public Health. 2022;10.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Opioid use disorder (OUD) is underdiagnosed in health system settings, limiting research on OUD using electronic health records (EHRs). Medical encounter notes can enrich structured EHR data with documented signs and symptoms of OUD and social risks and behaviors. To capture this information at scale, natural language processing tools must be developed and evaluated. We conducted a pilot study that aimed to 1) develop and apply an annotation schema to deeply characterize OUD and related clinical, behavioral, and environmental factors; and 2) automate the annotation schema using machine learning and deep learning-based approaches. De-identified patient data for this study included hospital discharge summaries of patients with International Classification of Diseases (ICD-9) OUD diagnostic codes, obtained from the MIMIC-III Critical Care Database. We developed an annotation schema to characterize problematic opioid use, identify individuals with potential OUD, and provide psychosocial context. The final annotation schema contained 33 classes. Two annotators reviewed discharge summaries from a random sample of 100 of these patients. The first corpus of 40 patients was reviewed by both annotators. We achieved moderate inter-annotator agreement, with F1-scores across all classes increasing from 48% to 66%. The second corpus of 60 patients was reviewed by a single annotator. The shared database contains the resulting 3,270 annotations with the note identifier, span offset with accompanying text snippet, and class assignments and may be useful to future development of natural language processing systems related to OUD.


Background

Prior studies have utilized natural language processing (NLP) to identify problematic opioid use and opioid overdose from electronic health records [1-7]. However, several gaps remain in the development of NLP systems. For example, clinical, behavioral, and environmental factors linked to opioid use disorder (OUD) that are documented in clinical notes have generally been omitted. Prior studies have also been primarily conducted among patients on long-term prescription opioids (e.g., as therapy for chronic pain), missing opportunities to identify and study OUD related to illicit opioid use.

Studies are needed that utilize rigorous annotation approaches to inform NLP systems that include individuals who developed OUD through illicit opioid use and that draw upon additional information contained in clinical text to deeply characterize problematic opioid use. The shared database contains the annotations from a pilot study that aimed to 1) develop and apply an annotation schema to deeply characterize OUD and related clinical, behavioral, and environmental factors; and 2) automate the annotation schema using machine learning and deep learning-based approaches. Details of the study are available in [8].

Given the time and effort required to create a corpus of annotated data, we hope that the shared set of annotations will facilitate the work of other researchers who are developing NLP systems related to OUD. The annotations are linked to the classes and attributes from our annotation schema; thus, future researchers can select annotations related to classes/attributes that may be relevant to their work, or simply use the full set of annotations.


Methods

Patients included in this study came from the MIMIC-III Critical Care Database, a publicly available, de-identified dataset that includes clinical data for roughly 60,000 patients with a hospital stay at Beth Israel Deaconess Medical Center in Boston, Massachusetts between 2001 and 2012 [9]. From the MIMIC-III dataset, we downloaded discharge summaries from 762 patients who had an International Classification of Diseases, version 9 (ICD-9) code related to opioid use disorder (304.00–304.03, 304.7, 304.70–304.73, 304.8, 304.81, 304.82, 304.83, 305.50–305.53, 965.00, 965.01, 965.02, 965.09, E850.0, E935.0). A random sample of 100 patients was selected for annotation.
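The cohort selection described above can be sketched as a filter over diagnosis records. This is an illustration only, not the authors' extraction code. It assumes rows shaped like the MIMIC-III DIAGNOSES_ICD table (SUBJECT_ID, ICD9_CODE) with codes stored without the decimal point; the names normalize, OUD_CODE_SET, and select_oud_patients are hypothetical.

```python
# Illustrative sketch: select patients with any OUD-related ICD-9 code.
# Assumption: diagnosis rows store codes without the decimal point.

# Codes listed in the Methods text, with the ranges written out.
OUD_ICD9_CODES = [
    "304.00", "304.01", "304.02", "304.03",
    "304.7", "304.70", "304.71", "304.72", "304.73",
    "304.8", "304.81", "304.82", "304.83",
    "305.50", "305.51", "305.52", "305.53",
    "965.00", "965.01", "965.02", "965.09",
    "E850.0", "E935.0",
]


def normalize(code: str) -> str:
    """Drop the decimal point and whitespace; upper-case E-codes."""
    return code.replace(".", "").strip().upper()


OUD_CODE_SET = {normalize(c) for c in OUD_ICD9_CODES}


def select_oud_patients(diagnosis_rows):
    """Return sorted subject IDs that have at least one matching code."""
    return sorted({
        row["SUBJECT_ID"]
        for row in diagnosis_rows
        if normalize(row["ICD9_CODE"]) in OUD_CODE_SET
    })


# Made-up rows for demonstration; not real patient data.
rows = [
    {"SUBJECT_ID": 1, "ICD9_CODE": "30401"},
    {"SUBJECT_ID": 2, "ICD9_CODE": "4019"},
    {"SUBJECT_ID": 3, "ICD9_CODE": "E8500"},
]
print(select_oud_patients(rows))
```

Normalizing both sides of the comparison lets the same function accept codes written with or without the decimal point.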

We developed an annotation schema to characterize problematic opioid use, identify individuals with potential OUD, and provide psychosocial context. Details of the annotation schema development process are available in [8]. The final annotation schema contained 33 classes. Table 1 lists the classes, class attributes used to further characterize the text (attributes not present for all classes), and class descriptions.

For the annotation work, we used an open-source text annotation tool called the extensible Human Oracle Suite of Tools (eHOST) [10]. Annotation was primarily completed at the sentence level, assigning full sentences to one or more relevant classes. The exceptions were the class opioid type, for which we annotated phrases (i.e., the specific opioid name), and the patient-level OUD assertion, which was assigned to the discharge summary as a whole. The first corpus of 40 patients was reviewed by two annotators. Annotations were adjudicated, with disagreements resolved through discussion with all study authors. We achieved moderate inter-annotator agreement (IAA), with F1-scores across all classes increasing from 48% to 66%. Once the IAA was deemed sufficient to begin separate annotation work, the same annotators reviewed discharge summaries for a second corpus of 60 patients (30 patients each).
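The agreement figures above are F1-scores. One common way to compute pairwise agreement is to treat one annotator as the reference and score the other against it. The sketch below does this over exact (file, span, class) matches. The matching criterion and the function name pairwise_f1 are assumptions for illustration; the procedure actually used is described in [8].

```python
def pairwise_f1(annotator_a, annotator_b):
    """F1 agreement between two collections of (file, span, class) tuples.

    Annotator A is treated as the reference. F1 is symmetric, so swapping
    the two annotators gives the same value.
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        return 1.0  # neither annotator marked anything
    matches = len(a & b)
    if matches == 0:
        return 0.0
    precision = matches / len(b)  # share of B's annotations that A also made
    recall = matches / len(a)     # share of A's annotations that B also made
    return 2 * precision * recall / (precision + recall)


# Made-up annotations: the annotators agree on two of three items.
a = {
    ("note1", "10-42", "Pain"),
    ("note1", "50-80", "OUD tx"),
    ("note1", "90-99", "Opioid type"),
}
b = {
    ("note1", "10-42", "Pain"),
    ("note1", "50-80", "OUD current"),
    ("note1", "90-99", "Opioid type"),
}
print(round(pairwise_f1(a, b), 3))
```

Exact matching is the strictest option; relaxing it to overlapping spans would raise the score and should be reported as such.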

The shared database contains the resulting 3,270 annotations from the annotation process, with each annotation linked to one or more classes and attributes.

 

Table 1. Classes with their attributes and a brief description and example term

| Class | Attributes | Brief description/example term |
| --- | --- | --- |
| Drug screening | Type: non-opioid, opioid, unspecified; Result: negative, positive | Drug screening results by type of drug for which the patient was screened and the result of the screening (positive or negative) (e.g., “toxicology screen positive for benzos and opiates”); “unspecified” type denotes insufficient information to determine the type of drug for which the patient was screened |
| Interpersonal and legal consequences | None | Consequences of opioid use related to interpersonal problems (relationships) or legal issues (e.g., “was arrested last Tuesday for possession of drug”) |
| Opioid abuse uncertain | None | Opioid abuse possible but unclear (e.g., “question of possible heroin use prior to the accident”) |
| Opioid dependency indicator | None | Indication of physical dependency on opioids, including signs of withdrawal, craving, or tolerance (e.g., “patient in acute opiate withdrawal”) |
| Opioid illicit use | None | Recent history of illicit opioid use (e.g., “patient admits to using heroin”) |
| Opioid past illicit use | None | Patient has a history of illicit opioid use (e.g., “history of IV heroin use”) |
| Opioid reduction successful | None | Efforts to cut down, control, or wean from opioid use successful (e.g., “weaned himself down to 3-4 mg PO”) |
| Opioid reduction unsuccessful* | None | Efforts to cut down, control, or wean from opioid use unsuccessful (no examples identified) |
| Opioid seeking* | None | Opioid-seeking behaviors such as reporting lost opioid pills, seeking early refills, using others’ prescriptions, etc. (e.g., “reported taking husband’s methadone for headaches”) |
| Opioid type | None | Any mention of an opioid (e.g., “hydromorphone,” “methadone”) |
| Opioid use underspecified | None | No context as to whether opioid use is as prescribed or illicit (e.g., “chronic use of opiate”) |
| Other contexts | None | Any other context that may be relevant to identifying opioid use disorder; this class is highly variable |
| Other drug use | Type: illicit, non-illicit, unknown | Use of addictive substances by type: non-illicit (e.g., tobacco, alcohol) or illicit drugs other than opioids (e.g., “cocaine abuse”) |
| Other drug use negated | Type: illicit, non-illicit, unknown | Patient does not use an addictive substance, by type: non-illicit or illicit drugs other than opioids (e.g., “denies tobacco, EtOH, IVDU”) |
| Other drug use past | Type: illicit, non-illicit, unknown | History of use of addictive substances, by type: non-illicit or illicit drugs other than opioids (e.g., “former heavy alcohol use”) |
| OUD current | None | Clinical writer asserts patient has an opioid use disorder (e.g., “heroin addiction”) |
| OUD negated | None | Clinical writer notes patient does not have an opioid use disorder (e.g., “no history of IVDU or opiate pain med use”) |
| OUD past | None | Clinical writer asserts patient had an opioid use disorder in the past (e.g., “history of narcotic dependence”) |
| OUD psychosocial stressors related | None | Events such as trauma, homelessness, friends/family who use drugs, etc. that have potentially triggered drug misuse (e.g., “brother with IVDU”) |
| OUD tx | None | Treatment for opioid use disorder including medication-based treatment, counseling, detoxification, rehabilitation (e.g., “IVDU on methadone”) |
| OUD tx negated | None | Patient not in treatment for opioid use disorder (e.g., “offered addictions counseling but refused”) |
| Overdose | Type: non-opioid, opioid, unknown | Overdose from an opioid or other drug by type (e.g., “heroin and cocaine overdose”) |
| Overdose negated | Type: non-opioid, opioid, unknown | Patient did not overdose, by type (e.g., “not consistent with a narcotic overdose”) |
| Overdose past | Type: non-opioid, opioid, unspecified | Past drug overdose, by type (e.g., “has been hospitalized for 7 medication overdoses”) |
| Overdose uncertain | Type: non-opioid, opioid, unknown | Uncertain whether patient had a drug overdose or not, by type (e.g., “did not respond to Narcan in the ED”) |
| Pain | None | Descriptions of pain (e.g., “chronic back pain”) |
| Pain mgmt | None | Pain management efforts (e.g., “Oxycodone 5 mg Q4H as needed for breakthrough pain”) |
| Patient-level OUD assertion | Type: clinical writer-asserted positive, negative*, uncertain*, not-specified | Clinical writer’s assertion regarding patient’s opioid use disorder status (positive, negative, uncertain, not specified) |
| Psychiatric | None | Acute psychiatric or mental health condition is a factor in the visit (e.g., “suicidal ideations”) |
| Psychiatric negated | None | Lack of psychiatric or mental health condition (e.g., “denies feeling depressed”) |
| Psychiatric past | None | History of a psychiatric or mental health condition (e.g., “history of anxiety”) |
| Psychiatric uncertain | None | Clinical writer uncertain about whether patient has a psychiatric or mental health condition (e.g., “psychiatry questioned the patient’s diagnoses”) |
| Vocational interferences* | None | Consequences of opioid use related to patient’s employment or schooling (e.g., “she was a nurse but lost her license secondary to substance abuse”) |

*Indicates that class or attribute was part of the annotation schema but was not used in the annotations and so will not appear in the annotation dataset

Note on illicit versus non-illicit drug use: non-illicit use refers to legal substances including alcohol and tobacco; illicit use refers to illegal substances such as cocaine; illicit opioid use refers to non-medical use of opioids, such as heroin use or misuse of prescription opioids


Data Description

The dataset is contained in a single .csv file entitled OUD_MIMIC_annotations_all.csv. Data include 3,270 rows of annotated sentences or words (“terms”) from 137 discharge summaries of 100 patients. Each patient had between one and six associated discharge summaries. Each term is linked to a class and (for some classes) attributes of that class, the result of the annotation process described in the Methods section. Table 1 lists the classes, class attributes used to further characterize the text (attributes not present for all classes), and class descriptions. A single term could be linked to more than one class and so duplicate annotations exist in the dataset. The file contains the following variables (columns):

  • File name: The name of the source file from the MIMIC-III Critical Care Database containing each discharge summary; the leading three- or four-digit number in the name identifies the patient. For example, the name 1084_194111.0_47905_Dischargesummary references file 194111.0_47905, which is a discharge summary, from patient 1084.
  • File name with extension: The name of the source file, with the extension (.txt).
  • Term: The annotated sentence/word/phrase. Note that for the patient-level OUD assertion, the class was assigned to the full discharge summary and so the associated text is simply the first word contained in the discharge summary (see limitations of the patient-level OUD assertion in [8]).
  • Span: The location of the annotated term within the associated discharge summary.
  • Class: The class assigned to the annotated term.
  • Att1: Att1 indicates that the class contained a first-level attribute (“type”). If blank, the class did not contain any attributes. See Table 1 for attributes assigned to classes.
  • Att1 value: The value of attribute 1 that was assigned to the annotated term.
  • Att2: If a class contained second-level attributes, Att2 will contain a value (“result”). If blank, the class did not contain a second-level attribute. See Table 1 for attributes assigned to classes.
  • Att2 value: The value of attribute 2 that was assigned to the annotated term.
  • Annotation set: One of two values (“adjudicated” or “independent”) indicates whether the annotations were the result of the first corpus (adjudicated by two annotators) or the second corpus (annotated by a single annotator).
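The column layout above can be read with Python's standard csv module. The sketch below is a minimal illustration under stated assumptions: the header spellings ("File name", "Span", "Class", and so on) and the span format are guesses to be checked against the real file, the sample rows are made up, and the helper names are hypothetical. It derives the patient identifier from the file name and collapses the duplicate rows that arise when one term carries several classes.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for OUD_MIMIC_annotations_all.csv (made-up rows).
# Assumption: header spellings and span format may differ in the real file.
SAMPLE = """File name,Term,Span,Class,Att1,Att1 value,Annotation set
1084_194111.0_47905_Dischargesummary,chronic back pain,100-117,Pain,,,adjudicated
1084_194111.0_47905_Dischargesummary,chronic back pain,100-117,Other contexts,,,adjudicated
1084_194111.0_47905_Dischargesummary,heroin and cocaine overdose,200-227,Overdose,type,opioid,adjudicated
"""


def patient_id(file_name: str) -> str:
    """The number before the first underscore identifies the patient."""
    return file_name.split("_", 1)[0]


def load_annotations(handle):
    """Read rows as dicts and attach a derived 'patient' field."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        row["patient"] = patient_id(row["File name"])
    return rows


def classes_by_term(rows):
    """Map each (file, span) pair to the set of classes assigned to it."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[(row["File name"], row["Span"])].add(row["Class"])
    return dict(grouped)


rows = load_annotations(io.StringIO(SAMPLE))
labels = classes_by_term(rows)
for key, classes in labels.items():
    print(key, sorted(classes))
```

To read the real file, replace the StringIO object with open(path, newline="", encoding="utf-8"), where path points to the downloaded CSV.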

Usage Notes

When publishing findings based on the use of this dataset, please reference the following publication:

Poulsen MN, Freda PJ, Troiani V, Davoudi A, Mowery DL. Classifying Characteristics of Opioid Use Disorder From Hospital Discharge Summaries Using Natural Language Processing. Frontiers in Public Health. 2022;10.

This publication provides detailed information regarding the full methods of the pilot study that may be helpful in the reuse of this data, including the process of developing the annotation schema and the limitations of the schema.

Briefly, the annotation dataset was developed during a pilot study that aimed to develop and apply an annotation schema to deeply characterize OUD and related clinical, behavioral, and environmental factors and to automate the schema using machine learning and deep learning-based approaches [8]. Following development of the annotation schema and the annotation dataset, we divided the patients with their associated annotated sentences into training and testing sets. We used the training set to generate features, employing three NLP algorithms/knowledge sources. We trained and tested prediction models for classification with a traditional machine learner (logistic regression) and a deep learning approach (AutoGluon based on ELECTRA’s replaced token detection model).
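Dividing patients, rather than individual sentences, into training and testing sets keeps every sentence from a given patient on the same side of the split. A minimal sketch of that idea follows. The 80/20 ratio, the seed, the "patient" field, and the function name are assumptions for illustration, not the settings used in [8].

```python
import random


def split_by_patient(rows, test_fraction=0.2, seed=0):
    """Split annotation rows so that no patient appears in both sets."""
    patients = sorted({row["patient"] for row in rows})
    rng = random.Random(seed)  # local generator keeps the split reproducible
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in rows if r["patient"] not in test_patients]
    test = [r for r in rows if r["patient"] in test_patients]
    return train, test


# Made-up rows: 10 patients with 3 annotated sentences each.
rows = [
    {"patient": str(p), "Term": f"sentence {i}"}
    for p in range(10)
    for i in range(3)
]
train, test = split_by_patient(rows)
print(len(train), len(test))
```

Splitting at the sentence level instead would let text from one patient inform both training and evaluation, which tends to inflate test scores.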

Given the time and effort required to create a corpus of annotated data, we hope that the shared set of annotations will facilitate the work of other researchers who are developing NLP systems related to OUD. The annotations are linked to the classes and attributes from our annotation schema; thus, future researchers can select annotations related to classes/attributes that may be relevant to their work, or simply use the full set of annotations. Potential uses of the dataset include basic NLP research such as training sentence classifiers related to OUD and studying linguistic constructions of OUD assertions from clinical notes.
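Selecting annotations for particular classes or attribute values can be sketched as a simple filter. The example assumes each row is a dict keyed by the column names from the Data Description section ("Class", "Att1 value"); those exact header spellings, the sample rows, and the function name are assumptions.

```python
def select_annotations(rows, classes=None, att1_value=None):
    """Keep rows matching the given classes and, optionally, attribute value.

    Passing None for a criterion disables that filter.
    """
    selected = []
    for row in rows:
        if classes is not None and row["Class"] not in classes:
            continue
        if att1_value is not None and row.get("Att1 value") != att1_value:
            continue
        selected.append(row)
    return selected


# Made-up rows using class and attribute names from Table 1.
rows = [
    {"Class": "Overdose", "Att1 value": "opioid"},
    {"Class": "Overdose", "Att1 value": "non-opioid"},
    {"Class": "Pain", "Att1 value": ""},
]
opioid_overdoses = select_annotations(rows, classes={"Overdose"}, att1_value="opioid")
print(opioid_overdoses)
```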

Limitations related to the annotation work are described in detail in our publication referenced above [8]. Briefly, the patient data used for annotation were obtained solely from admissions to a single U.S. hospital, which has implications for the generalizability of study findings to non-hospital and non-U.S. settings. The number of annotations for some classes may also be insufficient to conduct analyses on these classes. Finally, we identified limitations to our annotation schema, including ambiguity of some classes, that will be improved upon in our future research.
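Before modeling a class, it can help to check how many annotations it has. The sketch below counts rows per class and lists those under a threshold. The threshold of 30 is an arbitrary placeholder, not a recommendation from [8], and the rows and function names are made up for illustration.

```python
from collections import Counter


def class_counts(rows):
    """Count annotation rows per class."""
    return Counter(row["Class"] for row in rows)


def sparse_classes(rows, min_count=30):
    """Return, in sorted order, the classes with fewer than min_count rows."""
    return sorted(
        cls for cls, n in class_counts(rows).items() if n < min_count
    )


# Made-up rows: one well-populated class and one sparse class.
rows = [{"Class": "Pain"}] * 40 + [{"Class": "Overdose negated"}] * 3
print(class_counts(rows))
print(sparse_classes(rows))
```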


Release Notes

Initial release version 1.0.0


Ethics

The Geisinger and University of Pennsylvania Institutional Review Boards reviewed the protocol for this study and determined it met criteria for exempt human subjects research, as all data were fully de-identified.


Acknowledgements

MP was supported by the National Institute on Drug Abuse of the National Institutes of Health under Award Number K01DA049903. PF was supported by the Ruth L. Kirschstein National Research Award (T32 HG009495). VT was supported by the National Institute on Drug Abuse of the National Institutes of Health under Award Number R01DA044015 (PI: VT) and the Pennsylvania Department of Health. DM, PF, and AD were supported by DM’s start-up funding through the University of Pennsylvania.


Conflicts of Interest

The authors have no conflicts of interest to declare.


References

  1. Alzeer A, Jones JF, Bair MJ, Liu X, Alfantouck LA, Patel J, et al. A comparison of text mining versus diagnostic codes to identify opioid use problem: A retrospective study. Preprint. 2020.
  2. Carrell DS, Cronkite D, Palmer RE, Saunders K, Gross DE, Masters ET, et al. Using natural language processing to identify problem usage of prescription opioids. International journal of medical informatics. 2015;84(12):1057-64.
  3. Haller IV, Renier CM, Juusola M, Hitz P, Steffen W, Asmus MJ, et al. Enhancing Risk Assessment in Patients Receiving Chronic Opioid Analgesic Therapy Using Natural Language Processing. Pain Med. 2017;18(10):1952-60.
  4. Hazlehurst B, Green CA, Perrin NA, Brandes J, Carrell DS, Baer A, et al. Using natural language processing of clinical text to enhance identification of opioid-related overdoses in electronic health records data. Pharmacoepidemiol Drug Saf. 2019;28(8):1143-51.
  5. Hylan TR, Von Korff M, Saunders K, Masters E, Palmer RE, Carrell D, et al. Automated prediction of risk for problem opioid use in a primary care setting. J Pain. 2015;16(4):380-7.
  6. Lingeman JM, Wang P, Becker W, Yu H. Detecting Opioid-Related Aberrant Behavior using Natural Language Processing. AMIA Annual Symposium proceedings AMIA Symposium. 2018;2017:1179-85.
  7. Sharma B, Dligach D, Swope K, Salisbury-Afshar E, Karnik NS, Joyce C, et al. Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients. BMC Med Inform Decis Mak. 2020;20(1):79.
  8. Poulsen MN, Freda PJ, Troiani V, Davoudi A, Mowery DL. Classifying Characteristics of Opioid Use Disorder From Hospital Discharge Summaries Using Natural Language Processing. Frontiers in Public Health. 2022;10.
  9. Johnson AEW, Pollard TJ, Shen L, Lehman L-wH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Scientific Data. 2016;3(1):160035.
  10. South B, Shen S, Leng J, Forbush T, DuVall S, Chapman WW, editors. A prototype tool set to support machine-assisted annotation. Proceedings of the 2012 Workshop on Biomedical Natural Language Processing; 2012; Stroudsburg, PA: Association for Computational Linguistics; 2012.

Parent Projects
Annotation dataset of problematic opioid use and related contexts from MIMIC-III Critical Care Database discharge summaries was derived from: Please cite them when using this project.
Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research
