Database Restricted Access

MIMIC-IV-Ext-Apixaban-Trial-Criteria-Questions

Elizabeth Woo, Michael Craig Burkhart, Emily Alsentzer, Brett Beaulieu-Jones

Published: April 30, 2025. Version: 1.0.0


When using this resource, please cite:
Woo, E., Burkhart, M. C., Alsentzer, E., & Beaulieu-Jones, B. (2025). MIMIC-IV-Ext-Apixaban-Trial-Criteria-Questions (version 1.0.0). PhysioNet. https://doi.org/10.13026/4p6q-vb04.

Additionally, please cite the original publication:

Woo EG, Burkhart MC, Alsentzer E, & Beaulieu-Jones BK (2024). "Synthetic Data Distillation Enables the Extraction of Clinical Information at Scale". medRxiv 2024.09.27.24314517; doi: https://doi.org/10.1101/2024.09.27.24314517

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Large language models (LLMs) show promise for extracting information from clinical notes, but deploying them at scale can be challenging because of high computational costs, regulatory constraints, and privacy concerns. To address these challenges, synthetic data distillation can be used to fine-tune smaller, open-source LLMs that achieve performance similar to that of the teacher model. These smaller models can be run on less expensive local hardware or at a vastly reduced cost in cloud deployments. In our recent study, we used Llama-3.1-70B-Instruct to generate synthetic training examples in the form of question-answer pairs along with supporting information. We then used these question-answer pairs to fine-tune smaller versions of Llama to improve their ability to extract clinical information from notes.

To evaluate the resulting models, we created 23 questions resembling eligibility criteria from the apixaban clinical trial and evaluated the models on a random sample of 100 patient notes from MIMIC-IV. We restricted the sample to MIMIC-IV notes taken after 2012 to ensure no overlap with the MIMIC-III notes used to generate the fine-tuning data. We release the 2300 resulting question-answer pairs as a dataset here.


Background

In our recent article [1], we created 23 boolean and numeric questions resembling eligibility criteria from the 2011 ARISTOTLE trial [2] comparing apixaban to warfarin. Using these questions, we manually annotated notes for 100 patients from MIMIC-IV, taken after 2012.

Our primary motivation for sharing this dataset on PhysioNet is to give other credentialed users access to manually created question-answer pairs for clinical notes. We used these question-answer pairs in our manuscript to evaluate the effectiveness of fine-tuning LLMs for answering questions about clinical notes. In addition to reproducing our results, members of the research community could use these examples as ground-truth data for fine-tuning their own models, or as a benchmark dataset for validating LLM performance in the clinical domain.


Methods

We restricted MIMIC-IV to notes taken after 2012 and then randomly selected 100 patients. A human reviewer validated each of the 2300 question-answer pairs and corrected it where necessary.

There were 23 questions (15 boolean, 8 numeric) answered for each of the 100 patients, giving 2300 question-answer pairs in total. We created the set of questions to model clinical trial inclusion criteria, and asked the same questions for each patient.
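
The 23 × 100 structure described above can be checked mechanically. The following is a minimal sketch, not part of the released code. It assumes each patient contributes exactly one note, so that `note_id` identifies a patient, and that `question_type` holds the lowercase strings "boolean" and "numeric". The rows in the demonstration are synthetic placeholders, not data from the release.

```python
from collections import Counter, defaultdict

# Expected number of questions of each type per note (from the text above).
EXPECTED_PER_NOTE = {"boolean": 15, "numeric": 8}


def check_structure(rows):
    """Return the note_ids whose question counts differ from 15 boolean + 8 numeric."""
    per_note = defaultdict(Counter)
    for row in rows:
        per_note[row["note_id"]][row["question_type"]] += 1
    return sorted(
        note_id
        for note_id, counts in per_note.items()
        if dict(counts) != EXPECTED_PER_NOTE
    )


# Synthetic stand-in for the real table: 100 notes x 23 questions.
demo_rows = [
    {"note_id": f"note-{n:03d}", "question_type": qtype}
    for n in range(100)
    for qtype in ["boolean"] * 15 + ["numeric"] * 8
]

print(len(demo_rows))              # 2300
print(check_structure(demo_rows))  # []
```

An empty list means every note carries the full set of questions; any returned `note_id` would point to a missing or duplicated row.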

The 15 boolean questions were as follows:

  1. Does the note describe the patient as having atrial fibrillation (afib)? Answer "No" if the note describes the patient as having afib secondary to another reversible cause.
  2. Does the note describe the patient as ever being diagnosed with depression or major depressive disorder (MDD)? Answer "No" unless the note describes a diagnosis or history of depression.
  3. Does the note describe the patient as ever being diagnosed with schizophrenia or any schizoaffective disorders? Answer "No" unless the note describes a diagnosis or history of a schizoaffective disorder.
  4. Does the note describe the patient as ever being diagnosed with bipolar disorder? Answer "No" unless the note describes a diagnosis or history of bipolar disorder.
  5. Does the note describe the patient as ever having any hemorrhagic tendencies or blood dyscrasias? Answer "No" unless the note describes a diagnosis or history of hemorrhagic tendencies or blood dyscrasias.
  6. Does the note describe the patient as having a stroke during this admission or within the last month? (Answer "Yes" for any recent stroke if the date is unclear, answer "No" if no stroke is mentioned or a prior stroke occurred but it was not recent.)
  7. Does the note describe the patient as ever having peptic ulcer disease?
  8. Does the note describe the patient as having serious bleeding in the past 6 months? Answer "No" unless the note describes a serious recent bleeding issue.
  9. Does the note describe the patient as having a planned or past ablation procedure for afib? Answer "No" unless the note includes information about a past or planned ablation for afib.
  10. Does the note describe the patient as ever having valvular disease (stenosis) requiring surgery? Answer "No" if there is mention of stenosis without surgery.
  11. Does the note describe the patient as having heart failure?
  12. Does the note describe the patient as having diabetes mellitus (DM1, DM2, T2D, T1DM, T2DM)?
  13. Does the note describe the patient as having arterial hypertension (high bp e.g. >140, or HTN)? This includes pre-existing hypertension and treated hypertension.
  14. Does the note describe the patient as ever having a stroke or transient ischemic attack (TIA)? Answer "No" unless the note includes information about the patient having a prior stroke or TIA.
  15. Does the note describe the patient as being unable to make medical decisions upon discharge? Answer "No" unless there is evidence the patient cannot make their own medical decisions. Answer "Yes" if there is clear mention of dementia or the patient is deceased.

The 8 numeric questions were as follows:

  1. What is the lowest platelet count (PLT) mentioned in the note? Answer "NA" if no platelet count (PLT) is available in the note.
  2. What is the highest total bilirubin (TotBili, Bili) mentioned in the note? Answer "NA" if no bilirubin value is available in the note.
  3. What is the highest aspartate aminotransferase level (AST) mentioned in the note? Answer "NA" if no AST value is available in the note.
  4. What is the highest serum creatinine (Creat) mentioned in the note? Answer "NA" if no creatinine value is available in the note.
  5. What is the lowest hemoglobin (HGB) mentioned in the note? Answer "NA" if no HGB value is available in the note.
  6. What is the highest CHADS2 score mentioned? Answer "NA" if no CHADS2 score is in the note.
  7. What is the lowest left ventricular ejection (LVEF, ef, ejection fraction) fraction mentioned in the note? Answer "NA" if no LVEF is in the note, Answer 55 if the lowest value is 55% or greater.
  8. What is the highest blood glucose lab mentioned? Answer "NA" if no blood glucose score is in the note.
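
Because the questions mix "Yes"/"No" answers with numeric values and "NA", downstream code usually needs to normalize the answer strings first. The sketch below shows one way to do this. It assumes the `answer` column stores the literal strings used in the question wording ("Yes", "No", "NA", or a number); the function name `parse_answer` is hypothetical and the actual encoding in the file should be checked before relying on it.

```python
def parse_answer(answer, question_type):
    """Normalize a raw answer string.

    Returns True/False for boolean questions, a float for numeric
    questions, and None when a numeric question is answered "NA".
    """
    value = answer.strip()
    if question_type == "boolean":
        lowered = value.lower()
        if lowered in ("yes", "no"):
            return lowered == "yes"
        raise ValueError(f"unexpected boolean answer: {answer!r}")
    if value.upper() == "NA":
        return None
    return float(value)


print(parse_answer("Yes", "boolean"))    # True
print(parse_answer("NA", "numeric"))     # None
print(parse_answer("147.5", "numeric"))  # 147.5
```

Raising on an unexpected boolean value, rather than silently mapping it to "No", makes encoding surprises visible early.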

Data Description

The csv file 'annotated_apixaban_combined.csv' contains a header and 2300 rows with the following columns:

| Column name | Description |
| --- | --- |
| text | text of the MIMIC note |
| note_id | note identifier from MIMIC |
| hadm_id | hospital admission identifier from MIMIC |
| criterion | question label |
| question_type | numeric or boolean |
| question | one of the 23 questions listed above |
| answer | as determined from manual review |
| not_specified | boolean indicating whether the question cannot be answered from the contents of the note |
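
As a minimal sketch of reading the file with the Python standard library, the helper below loads the rows and verifies that the documented columns are present. The function name `load_rows`, the UTF-8 encoding, and the column order are assumptions; the demonstration uses a synthetic one-row CSV, not data from the release.

```python
import csv
import io

# Column names as documented in the table above.
EXPECTED_COLUMNS = [
    "text", "note_id", "hadm_id", "criterion",
    "question_type", "question", "answer", "not_specified",
]


def load_rows(handle):
    """Read all rows as dicts and fail early if a documented column is missing."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


# Synthetic example; note text may span several lines inside a quoted field.
sample = (
    "text,note_id,hadm_id,criterion,question_type,question,answer,not_specified\n"
    '"Line one\nline two",n1,h1,label,boolean,Example question?,Yes,False\n'
)
sample_rows = load_rows(io.StringIO(sample))
print(len(sample_rows))           # 1
print(sample_rows[0]["answer"])   # Yes
```

For the real file, the handle would come from `open("annotated_apixaban_combined.csv", newline="", encoding="utf-8")`; passing `newline=""` lets the `csv` module handle line breaks embedded in the note text. Very long notes may also require raising `csv.field_size_limit`.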

Summary statistics for the 15 boolean questions were as follows:

| # | Question | Yes, count (%) | No, count (%) |
| --- | --- | --- | --- |
| 1 | Does the note describe the patient as having atrial fibrillation (afib)? Answer "No" if the note describes the patient as having afib secondary to another reversible cause. | 71 (71%) | 29 (29%) |
| 2 | Does the note describe the patient as ever being diagnosed with depression or major depressive disorder (MDD)? Answer "No" unless the note describes a diagnosis or history of depression. | 23 (23%) | 77 (77%) |
| 3 | Does the note describe the patient as ever being diagnosed with schizophrenia or any schizoaffective disorders? Answer "No" unless the note describes a diagnosis or history of a schizoaffective disorder. | 2 (2%) | 98 (98%) |
| 4 | Does the note describe the patient as ever being diagnosed with bipolar disorder? Answer "No" unless the note describes a diagnosis or history of bipolar disorder. | 5 (5%) | 95 (95%) |
| 5 | Does the note describe the patient as ever having any hemorrhagic tendencies or blood dyscrasias? Answer "No" unless the note describes a diagnosis or history of hemorrhagic tendencies or blood dyscrasias. | 18 (18%) | 82 (82%) |
| 6 | Does the note describe the patient as having a stroke during this admission or within the last month? (Answer "Yes" for any recent stroke if the date is unclear, answer "No" if no stroke is mentioned or a prior stroke occurred but it was not recent.) | 16 (16%) | 84 (84%) |
| 7 | Does the note describe the patient as ever having peptic ulcer disease? | 6 (6%) | 94 (94%) |
| 8 | Does the note describe the patient as having serious bleeding in the past 6 months? Answer "No" unless the note describes a serious recent bleeding issue. | 20 (20%) | 80 (80%) |
| 9 | Does the note describe the patient as having a planned or past ablation procedure for afib? Answer "No" unless the note includes information about a past or planned ablation for afib. | 5 (5%) | 95 (95%) |
| 10 | Does the note describe the patient as ever having valvular disease (stenosis) requiring surgery? Answer "No" if there is mention of stenosis without surgery. | 10 (10%) | 90 (90%) |
| 11 | Does the note describe the patient as having heart failure? | 53 (53%) | 47 (47%) |
| 12 | Does the note describe the patient as having diabetes mellitus (DM1, DM2, T2D, T1DM, T2DM)? | 44 (44%) | 56 (56%) |
| 13 | Does the note describe the patient as having arterial hypertension (high bp e.g. >140, or HTN)? This includes pre-existing hypertension and treated hypertension. | 82 (82%) | 18 (18%) |
| 14 | Does the note describe the patient as ever having a stroke or transient ischemic attack (TIA)? Answer "No" unless the note includes information about the patient having a prior stroke or TIA. | 19 (19%) | 81 (81%) |
| 15 | Does the note describe the patient as being unable to make medical decisions upon discharge? Answer "No" unless there is evidence the patient cannot make their own medical decisions. Answer "Yes" if there is clear mention of dementia or the patient is deceased. | 13 (13%) | 87 (87%) |
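
Counts of this kind can be recomputed from the rows with a short tally. This is a sketch under assumptions: it groups by the `criterion` column, assumes boolean answers are stored as "Yes"/"No" (case-insensitively), and uses hypothetical function names. The demonstration rows are synthetic, and "example_label" is a placeholder rather than a real criterion label.

```python
from collections import Counter, defaultdict


def tally_boolean(rows):
    """Count Yes/No answers per criterion, considering boolean questions only."""
    counts = defaultdict(Counter)
    for row in rows:
        if row["question_type"] == "boolean":
            label = row["answer"].strip().capitalize()  # "yes" -> "Yes"
            counts[row["criterion"]][label] += 1
    return counts


def format_count(counter, label):
    """Render one cell in the 'count (percent)' style used in the table."""
    total = sum(counter.values())
    n = counter[label]
    return f"{n} ({100 * n / total:.0f}%)"


# Synthetic rows: 71 "Yes" and 29 "No" for a single placeholder criterion.
demo_rows = [
    {"criterion": "example_label", "question_type": "boolean", "answer": a}
    for a in ["Yes"] * 71 + ["No"] * 29
]
demo_counts = tally_boolean(demo_rows)
print(format_count(demo_counts["example_label"], "Yes"))  # 71 (71%)
print(format_count(demo_counts["example_label"], "No"))   # 29 (29%)
```

Since every question is answered for each of the 100 notes, the Yes and No counts for a question should sum to 100, which makes a convenient consistency check.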

Summary statistics for the 8 numeric questions were as follows:

| # | Question | Mean value | Median value | Standard deviation | Range | NAs |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | What is the lowest platelet count (PLT) mentioned in the note? Answer "NA" if no platelet count (PLT) is available in the note. | 148.53 | 147.50 | 90.8 | 15-364 | 60 (60%) |
| 2 | What is the highest total bilirubin (TotBili, Bili) mentioned in the note? Answer "NA" if no bilirubin value is available in the note. | 0.903 | 0.600 | 1.11 | 0.2-6.8 | 33 (33%) |
| 3 | What is the highest aspartate aminotransferase level (AST) mentioned in the note? Answer "NA" if no AST value is available in the note. | 194.4 | 36.0 | 1049.597 | 8-8627 | 33 (33%) |
| 4 | What is the highest serum creatinine (Creat) mentioned in the note? Answer "NA" if no creatinine value is available in the note. | 1.586 | 1.200 | 1.199 | 0.5-7.8 | 3 (3%) |
| 5 | What is the lowest hemoglobin (HGB) mentioned in the note? Answer "NA" if no HGB value is available in the note. | 10.21 | 10.15 | 2.054 | 6.0-15.9 | 2 (2%) |
| 6 | What is the highest CHADS2 score mentioned? Answer "NA" if no CHADS2 score is in the note. | 3.95 | 3.50 | 1.39 | 1-6 | 80 (80%) |
| 7 | What is the lowest left ventricular ejection (LVEF, ef, ejection fraction) fraction mentioned in the note? Answer "NA" if no LVEF is in the note, Answer 55 if the lowest value is 55% or greater. | 47.89 | 50.00 | 14.4 | 20-75 | 53 (53%) |
| 8 | What is the highest blood glucose lab mentioned? Answer "NA" if no blood glucose score is in the note. | 142.1 | 126.0 | 52.2 | 78-412 | 3 (3%) |
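
The same kind of summary can be sketched for a numeric question with the standard `statistics` module. Two assumptions apply: missing values are stored as the string "NA", and the standard deviation is the sample (n - 1) version, which the description above does not specify. The input values in the demonstration are invented for illustration.

```python
import statistics


def summarize_numeric(answers):
    """Summarize the answers to one numeric question, excluding "NA" entries.

    Requires at least two non-NA values, since a sample standard
    deviation is undefined otherwise.
    """
    values = [float(a) for a in answers if a.strip().upper() != "NA"]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD; an assumption
        "min": min(values),
        "max": max(values),
        "n_na": len(answers) - len(values),
    }


demo_summary = summarize_numeric(["10", "20", "30", "NA"])
print(demo_summary["mean"], demo_summary["sd"], demo_summary["n_na"])  # 20.0 10.0 1
```

If recomputed values differ slightly from the table, `statistics.pstdev` (the population version) is the first alternative worth trying.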

Usage Notes

This dataset accompanies our manuscript [1] that includes more detailed methods and associated results. In brief, this dataset could be used to help evaluate an LLM's ability to answer clinical questions from notes. This is a pressing problem that has received active interest from the research community recently [4].

Code to evaluate LLMs using this data can be found on GitHub [3].

Known limitations

We used this dataset to evaluate the performance of LLMs at answering questions about clinical notes. During evaluation, we supplied the note and the question to the LLM and compared the LLM's response with the correct answer. For this purpose, the question-answer pairs were adequate. Because each question-answer pair required manual review, we included data for only 100 patients in this set. This is small relative to the full MIMIC-IV dataset and may not be large enough to represent the MIMIC-IV cohort for some applications.
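
The evaluation procedure described above (supply the note and question, then compare the response with the reference answer) can be sketched as a small scoring loop. This is an illustration only and not the scoring used in the manuscript; the released code [3] remains the authoritative reference. The `predict` callable, the exact-match rule, and the numeric tolerance are all assumptions, and the demonstration rows and stub model are synthetic.

```python
def answers_match(predicted, reference, question_type, tol=1e-6):
    """Compare a model response with the reference answer.

    Numeric answers match within a small tolerance; everything else
    (Yes/No, NA) uses case-insensitive exact matching.
    """
    p, r = predicted.strip().lower(), reference.strip().lower()
    if question_type == "numeric" and "na" not in (p, r):
        try:
            return abs(float(p) - float(r)) <= tol
        except ValueError:
            return False  # response was not a parseable number
    return p == r


def evaluate(rows, predict):
    """Return accuracy per question type for predict(text, question) -> str."""
    correct, total = {}, {}
    for row in rows:
        qtype = row["question_type"]
        response = predict(row["text"], row["question"])
        total[qtype] = total.get(qtype, 0) + 1
        if answers_match(response, row["answer"], qtype):
            correct[qtype] = correct.get(qtype, 0) + 1
    return {qtype: correct.get(qtype, 0) / n for qtype, n in total.items()}


# Synthetic rows and a stub standing in for an LLM call.
demo_rows = [
    {"text": "...", "question": "q1", "question_type": "boolean", "answer": "Yes"},
    {"text": "...", "question": "q2", "question_type": "numeric", "answer": "1.20"},
    {"text": "...", "question": "q3", "question_type": "numeric", "answer": "NA"},
]


def stub_predict(text, question):
    return {"q1": "yes", "q2": "1.2", "q3": "55"}[question]


demo_scores = evaluate(demo_rows, stub_predict)
print(demo_scores)  # {'boolean': 1.0, 'numeric': 0.5}
```

Reporting boolean and numeric accuracy separately keeps the highly imbalanced Yes/No questions from masking performance on the numeric ones.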


Release Notes

This version (1.0.0) corresponds to the first release.


Ethics

This dataset is derived from MIMIC and does not link to any external data sources or include any analyses which would enable the re-identification of participants in MIMIC. It therefore falls under the same consent and ethics approvals as the original MIMIC dataset.

All model training and analysis was performed on the Randi high-performance computing cluster at the University of Chicago's Center for Research Informatics. Randi is HIPAA-compliant and has been audited and approved for the handling of patient data.


Acknowledgements

This work was funded in part by the National Institutes of Health, specifically the National Institute of Neurological Disorders and Stroke grant number R00NS114850 to BKB. This project would not have been possible without the support of the Center for Research Informatics at the University of Chicago and particularly the High-Performance Computing team. The authors are grateful for the resources and support this team provided throughout the duration of the project. The Center for Research Informatics is funded by the Biological Sciences Division at the University of Chicago with additional funding provided by the Institute for Translational Medicine, CTSA grant number 2U54TR002389-06 from the National Institutes of Health.


Conflicts of Interest

The authors have no conflicts of interest to declare.


References

  1. Woo EG, Burkhart MC, Alsentzer E, Beaulieu-Jones BK (2024). "Synthetic Data Distillation Enables the Extraction of Clinical Information at Scale". medRxiv 2024.09.27.24314517; doi: https://doi.org/10.1101/2024.09.27.24314517
  2. Granger CB, et al. (2011). "Apixaban versus warfarin in patients with atrial fibrillation". N. Engl. J. Med. 365: 981–992.
  3. Beaulieu-Jones BK. "clinical-synthetic-data-distil." Available from: https://github.com/bbj-lab/clinical-synthetic-data-distil
  4. Hager P, Jungmann F, Holland R, et al. (2024). "Evaluation and mitigation of the limitations of large language models in clinical decision-making." Nat. Med. 30: 2613–2622.

Parent Projects
MIMIC-IV-Ext-Apixaban-Trial-Criteria-Questions was derived from: Please cite them when using this project.
Access

Access Policy:
Only registered users who sign the specified data use agreement can access the files.

License (for files):
PhysioNet Restricted Health Data License 1.5.0

Data Use Agreement:
PhysioNet Restricted Health Data Use Agreement 1.5.0
