Database Restricted Access
Pulmonary Edema Severity Grades Based on MIMIC-CXR
Published: Jan. 26, 2021. Version: 1.0
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Clinical management decisions for patients with acutely decompensated heart failure and many other diseases are often based on grades of pulmonary edema severity, rather than its mere absence or presence. Chest radiographs are commonly performed to assess pulmonary edema. The MIMIC-CXR dataset, which consists of 377,110 chest radiographs with free-text radiology reports, offers a tremendous opportunity to study this subject.
This dataset is curated from MIMIC-CXR and contains three metadata files with pulmonary edema severity grades extracted from MIMIC-CXR through three different means: 1) by regular expression (regex) from radiology reports, 2) by expert labeling from radiology reports, and 3) by consensus labeling from chest radiographs.
This dataset aims to support the algorithmic development of pulmonary edema assessment from chest x-ray images and benchmark its performance. The metadata files have subject IDs, study IDs, DICOM IDs, and the numerical grades of pulmonary edema severity. The IDs listed in this dataset have the same mapping structure as in MIMIC-CXR.
Clinical management decisions for patients with acutely decompensated heart failure and many other diseases are often based on grades of pulmonary edema severity, rather than its mere absence or presence. Clinicians often monitor changes in pulmonary edema severity to assess the efficacy of therapy. Accurate monitoring of pulmonary edema is essential when competing clinical priorities complicate clinical management. The extracted pulmonary edema severity labels in this dataset were numerically coded as follows: 0, none; 1, vascular congestion; 2, interstitial edema; and 3, alveolar edema.
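The numeric coding above can be captured in a small lookup type. The following is a minimal sketch; the class name `EdemaSeverity` and the helper `describe` are illustrative and not part of the dataset.

```python
from enum import IntEnum


class EdemaSeverity(IntEnum):
    """Numeric coding of the severity labels (names are illustrative)."""
    NONE = 0
    VASCULAR_CONGESTION = 1
    INTERSTITIAL_EDEMA = 2
    ALVEOLAR_EDEMA = 3


def describe(grade: int) -> str:
    """Return a human-readable name for a numeric grade (hypothetical helper)."""
    return EdemaSeverity(grade).name.replace("_", " ").lower()


label = describe(2)
most_severe = max(EdemaSeverity(1), EdemaSeverity(3))
```

Because the grades are ordered integers, `max` picks the more severe of two grades directly.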
Large-scale and common datasets have been the catalyst for the rise of machine learning today. In 2019, investigators released MIMIC-CXR, a large-scale publicly available chest radiograph dataset with free-text radiology reports. This dataset builds upon MIMIC-CXR, aiming to catalyze and benchmark future algorithmic developments in grading pulmonary edema severity from chest radiographs [2-4].
We aimed to identify patients with congestive heart failure (CHF) within the MIMIC-CXR dataset to limit confounding labels from other disease processes. MIMIC-CXR contains 17,857 images acquired during visits with an emergency department discharge diagnosis code consistent with CHF. These images correspond to 16,108 radiology reports from 1,916 patients with CHF. The label curation described below was performed within this CHF cohort. The cohort information is summarized in
The pulmonary edema severity grades are extracted from the MIMIC-CXR dataset through 3 different means, described as follows.
Regular expression labeling from radiology reports.
The edema severity grades were extracted from radiology reports using regular expression (regex). Each severity level is associated with several keyword terms that are representative of that severity group (e.g., "Kerley B lines" in "2-interstitial edema"), as listed in . If multiple keyword terms are detected and affirmed in a report, the most severe level is assigned to that report. Of the 16,108 radiology reports in the CHF cohort, regex was able to label 6,710.
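The keyword-based rule described above can be sketched as follows. This is an illustration only, not the authors' implementation: apart from "Kerley B lines", the keyword lists are hypothetical placeholders modeled on the grade names, and the negation check (a "no" or "without" earlier in the same sentence) is a deliberately crude assumption about what counts as an affirmed mention.

```python
import re
from typing import Optional

# Hypothetical keyword patterns per grade. Only "Kerley B lines" (grade 2)
# comes from the text; the rest are placeholders for illustration.
KEYWORDS = {
    0: [r"\bno (?:pulmonary )?edema"],
    1: [r"vascular congestion"],
    2: [r"interstitial edema", r"kerley b lines?"],
    3: [r"alveolar edema"],
}

# Assumed negation cues; a real pipeline would need a more careful rule.
NEGATION = re.compile(r"\b(?:no|without)\b", re.IGNORECASE)


def grade_report(text: str) -> Optional[int]:
    """Return the most severe affirmed grade, or None if nothing matches."""
    grades = []
    for sentence in re.split(r"[.\n]+", text):
        for grade, patterns in KEYWORDS.items():
            for pattern in patterns:
                for match in re.finditer(pattern, sentence, re.IGNORECASE):
                    # Grade-0 patterns are negated phrases by construction;
                    # other grades count only when no negation cue precedes them.
                    negated = NEGATION.search(sentence[: match.start()])
                    if grade == 0 or negated is None:
                        grades.append(grade)
    return max(grades) if grades else None


example = grade_report("Mild vascular congestion. Kerley B lines are present.")
```

Reports with no affirmed keyword return `None`, which mirrors the fact that regex labels only a subset of the reports.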
Expert labeling from radiology reports.
A board-certified radiologist and two domain experts read 485 radiology reports randomly selected from the 16,108 radiology reports in the CHF cohort and assigned pulmonary edema severity grades based on the reports (more details in ).
Consensus labeling from chest radiographs.
Three senior radiology residents and one attending radiologist manually labeled a set of 141 frontal-view chest radiographs from 123 CHF patients in MIMIC-CXR. The three residents labeled the images independently. If all three residents assigned the same pulmonary edema severity grade to an image, that grade became the consensus label. If only two of the three residents agreed, an attending radiologist was added as a reviewer. If a majority of the reviewers (three out of four) then agreed, a consensus label was assigned. If no consensus was reached, the four radiologists discussed their interpretations in a round-robin process and then voted again, anonymously, on the edema severity level. If a majority of the votes was reached, a consensus label was assigned. If no consensus was reached, another round-robin discussion was held, followed by another anonymous vote. This process was then repeated one additional time, and if no consensus was reached, the image was labeled as no consensus. After independent labeling, discussion, and voting, the inter-rater agreement (Fleiss' Kappa) among the 3 radiology residents was 0.97. Our modified Delphi process yielded consensus labels for all 141 images (more details in ).
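The decision rule in this modified Delphi process can be sketched in code. The sketch rests on several assumptions that the text does not settle: a "majority" in the discussion rounds is taken to mean three of four votes, the attending is added whenever the residents are not unanimous, and the text is read as allowing at most three discussion rounds. All function and parameter names are illustrative.

```python
from collections import Counter
from typing import Optional, Sequence

MAX_DISCUSSION_ROUNDS = 3  # assumed reading: first discussion plus two repeats


def majority(votes: Sequence[Optional[int]], needed: int) -> Optional[int]:
    """Return the most common vote if it has at least `needed` supporters."""
    grade, count = Counter(votes).most_common(1)[0]
    return grade if count >= needed else None


def consensus_label(
    resident_votes: Sequence[int],
    attending_vote: Optional[int] = None,
    discussion_votes: Sequence[Sequence[int]] = (),
) -> Optional[int]:
    """Apply the staged voting rule; None means no consensus was reached."""
    # Stage 1: all three residents agree independently.
    if len(set(resident_votes)) == 1:
        return resident_votes[0]
    # Stage 2: add the attending and require three of four votes.
    label = majority(list(resident_votes) + [attending_vote], needed=3)
    if label is not None:
        return label
    # Stage 3: repeated round-robin discussion followed by anonymous votes.
    for round_votes in list(discussion_votes)[:MAX_DISCUSSION_ROUNDS]:
        label = majority(round_votes, needed=3)
        if label is not None:
            return label
    return None


resolved = consensus_label([1, 1, 2], attending_vote=2, discussion_votes=[[1, 2, 2, 2]])
```

In this example the first four votes split two against two, so the label is only settled by the first discussion round.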
This dataset contains 3 metadata files that consist of pulmonary edema severity grades extracted from the MIMIC-CXR dataset through different means.
regex_report_edema_severity.csv. The edema severity grades were extracted from radiology reports using regular expression (regex). Regex was able to label 6,710 radiology reports.
expert_report_edema_severity.csv. A board-certified radiologist and two domain experts read 485 radiology reports and assigned pulmonary edema severity grades based on the reports.
consensus_image_edema_severity.csv. Three senior radiology residents and one attending radiologist labeled 141 chest radiographs. This label set is the highest-quality of the three sets, and we recommend holding it out for testing.
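A minimal sketch of reading one of these files with the standard library is shown below. The column names (`subject_id`, `study_id`, `dicom_id`, `edema_severity`) and the sample rows are assumptions made for illustration; check the header of each actual CSV file before relying on them.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for a metadata file; the header is hypothetical.
SAMPLE_CSV = """subject_id,study_id,dicom_id,edema_severity
10000001,50000001,aaaa-0001,0
10000001,50000002,aaaa-0002,2
10000002,50000003,bbbb-0001,3
"""


def load_grades(handle):
    """Parse rows into dictionaries with integer IDs and grades."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                "subject_id": int(row["subject_id"]),
                "study_id": int(row["study_id"]),
                "dicom_id": row["dicom_id"],
                "edema_severity": int(row["edema_severity"]),
            }
        )
    return rows


rows = load_grades(io.StringIO(SAMPLE_CSV))
grade_counts = Counter(row["edema_severity"] for row in rows)
```

For a downloaded file, the same function could be called as `load_grades(open("consensus_image_edema_severity.csv", newline=""))`, adjusting the column names to match the real header.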
Together with MIMIC-CXR, prior work has utilized this dataset to develop chest x-ray image models for pulmonary edema assessment, using semi-supervised learning that leverages the large number of chest x-ray images in MIMIC-CXR or image-text joint learning that further leverages the raw text in the radiology reports.
The regex labels have been used for model training. The expert labels (from reports) and/or the consensus labels (from images) have been used for model testing.
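When regex labels are used for training and the expert or consensus labels for testing, one common precaution is to drop training rows for any patient who also appears in the test set. The source does not state that this filtering was applied, so the sketch below is an assumption about good practice rather than a description of the authors' pipeline; the `subject_id` key and the sample rows are hypothetical.

```python
def exclude_test_subjects(train_rows, test_rows):
    """Keep only training rows whose patient does not appear in the test rows."""
    test_subjects = {row["subject_id"] for row in test_rows}
    return [row for row in train_rows if row["subject_id"] not in test_subjects]


# Made-up rows for illustration only.
regex_rows = [
    {"subject_id": 1, "study_id": 11, "edema_severity": 0},
    {"subject_id": 2, "study_id": 21, "edema_severity": 2},
    {"subject_id": 3, "study_id": 31, "edema_severity": 3},
]
consensus_rows = [
    {"subject_id": 2, "study_id": 22, "edema_severity": 1},
]

train_rows = exclude_test_subjects(regex_rows, consensus_rows)
```

Filtering by patient rather than by study keeps other studies of a held-out patient out of the training set as well.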
While this dataset is curated within the CHF cohort in MIMIC-CXR, pulmonary edema is a manifestation of volume status in sepsis and renal failure, just as in CHF. Future work could extend the label curation to other disease contexts or use other clinical data in MIMIC-IV to obtain surrogates for patient volume status.
The authors thank Alistair Johnson, James L. Smith, Stanley Y. Kim, and Amalie C. Thavikulwat for helping with the data curation. Research related to this dataset was supported by NIH NIBIB NAC P41EB015902, Philips, Wistron, MIT Lincoln Laboratory, and MIT Deshpande Center.
Conflicts of Interest
Philips Healthcare supported the creation of this resource.
1. Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. (2019). MIMIC-CXR Database (version 2.0.0). PhysioNet. https://doi.org/10.13026/C2JT1Q.
2. Liao, R., Rubin, J., Lam, G., Berkowitz, S., Dalal, S., Wells, W., ... & Golland, P. (2019). Semi-supervised learning for quantification of pulmonary edema in chest x-ray images. arXiv preprint arXiv:1902.10785.
3. Chauhan*, G., Liao*, R., Wells, W., Andreas, J., Wang, X., Berkowitz, S., ... & Golland, P. (2020, October). Joint Modeling of Chest Radiographs and Radiology Reports for Pulmonary Edema Assessment. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 529-539). Springer, Cham.
4. Horng*, S., Liao*, R., Wang, X., Dalal, S., Golland, P., & Berkowitz, S. J. (2021). Deep learning to quantify pulmonary edema in chest radiographs. Radiology: Artificial Intelligence, e190228.
5. Zhao, C. Y., Xu-Wilson, M., Gangireddy, S. R., & Horng, S. (2018). Predicting Disposition Decision, Mortality, and Readmission for Acute Heart Failure Patients in the Emergency Department Using Vital Sign, Laboratory, Echocardiographic, and Other Clinical Data. Circulation, 138(Suppl_1), A14287.
Only logged-in users who sign the specified data use agreement can access the files.
License (for files):
PhysioNet Restricted Health Data License 1.5.0