Database Open Access

Heart and lung segmentations for MIMIC-CXR/MIMIC-CXR-JPG and Montgomery County TB databases

Benjamin Duvieusart Felix Krones Guy Parsons Lionel Tarassenko Bartlomiej W Papiez Adam Mahdi

Published: Aug. 14, 2023. Version: 1.0.0

When using this resource, please cite:
Duvieusart, B., Krones, F., Parsons, G., Tarassenko, L., Papiez, B. W., & Mahdi, A. (2023). Heart and lung segmentations for MIMIC-CXR/MIMIC-CXR-JPG and Montgomery County TB databases (version 1.0.0). PhysioNet.

Additionally, please cite the original publication:

Duvieusart, B., Krones, F., Parsons, G., Tarassenko, L., Papież, B.W., Mahdi, A. (2022). Multimodal Cardiomegaly Classification with Image-Derived Digital Biomarkers. In: Yang, G., Aviles-Rivero, A., Roberts, M., Schönlieb, CB. (eds) Medical Image Understanding and Analysis. MIUA 2022. Lecture Notes in Computer Science, vol 13413. Springer, Cham.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.


Segmenting the heart and lungs in chest X-ray images is essential for accurate disease diagnosis, as it enables the calculation of image-derived digital biomarkers for cardiopulmonary health assessment. Furthermore, the creation of annotated data sets can support the development of machine learning applications in disease detection, reduce image noise for better clarity, or promote standardization for tracking disease progression or treatment effects. Generating manual segmentations is a time-intensive process that also requires access to trained medical professionals to verify the quality and accuracy of the segmentations. Originally made to train heart and lung segmentation and detection models for cardiomegaly diagnosis, the manual segmentations described here are published in the hope of aiding the development of AI models for heart and lung identification in chest X-rays. This database presents heart and lung segmentations for 200 semi-randomly chosen MIMIC-CXR/MIMIC-CXR-JPG posterior-anterior chest X-rays for the purpose of training detection and segmentation networks. Additionally, it contains heart segmentations for the 138 posterior-anterior chest X-rays in the Montgomery County tuberculosis database.


X-ray scans are one of the most common medical imaging modalities in hospitals and clinics worldwide [1]. However, there is a global shortage of trained radiologists, leaving doctors unable to meet imaging and diagnostic demands [2]. This makes X-rays, and chest X-rays in particular, a prime candidate for the development of AI-powered diagnostic aids, which have the potential to relieve the burden placed on doctors and increase the quality and equity of healthcare worldwide [3,4]. When developing AI tools for medical purposes, a large number of high-quality ground truth segmentations is a fundamental requirement. This can be a major hurdle, as each segmentation is time intensive and qualified medical personnel are needed to validate the accuracy and consistency of the segmentations. This database of 200 manually segmented lungs and 338 manually segmented hearts was originally curated for [3], a study developing a multimodal diagnostic tool for cardiomegaly. Specifically, this database was used to train Faster-RCNN and Mask-RCNN models to build a pipeline that automatically retrieves cardiothoracic ratio and cardiopulmonary area ratio biomarker values. We hope this database of segmentations can be used to train novel segmentation and detection networks and ease the burden of manual segmentation by increasing the pool of widely available scan/segmentation pairs.
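As a rough illustration of the two biomarkers named above, the following Python sketch computes them from a pair of binary masks held as NumPy arrays. It is a minimal sketch under stated assumptions, not the pipeline used in [3]: the thoracic width is approximated by the horizontal extent of the lung mask, the cardiopulmonary area ratio is taken as heart pixels divided by lung pixels, and the function names are hypothetical.

```python
import numpy as np


def horizontal_extent(mask: np.ndarray) -> int:
    """Width in pixels between the left-most and right-most foreground columns."""
    cols = np.flatnonzero(mask.any(axis=0))
    if cols.size == 0:
        raise ValueError("mask has no foreground pixels")
    return int(cols[-1] - cols[0] + 1)


def cardiothoracic_ratio(heart: np.ndarray, lungs: np.ndarray) -> float:
    """Heart width divided by the width spanned by both lungs.

    Assumption: the lung mask extent stands in for the thoracic width.
    """
    return horizontal_extent(heart) / horizontal_extent(lungs)


def cardiopulmonary_area_ratio(heart: np.ndarray, lungs: np.ndarray) -> float:
    """Heart area divided by lung area, both counted in pixels."""
    lung_area = int(np.count_nonzero(lungs))
    if lung_area == 0:
        raise ValueError("lung mask has no foreground pixels")
    return int(np.count_nonzero(heart)) / lung_area
```

Both functions expect the heart and lung masks of the same scan at the same resolution, with non-zero pixels marking the structure.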


Data Sources

This database contains:

  • Heart and lung segmentations relating to 200 posterior-anterior (PA) chest X-rays which were semi-randomly selected from MIMIC-CXR [5]/MIMIC-CXR-JPG [6].
  • Heart segmentations relating to all 138 PA chest X-rays from the Montgomery County tuberculosis database [7].

The 200 scans from MIMIC-CXR/MIMIC-CXR-JPG with ground truth segmentation masks in this database were selected randomly after the application of three selection criteria. The criteria were:

  1. Scans had to be from the posterior-anterior (PA) view (i.e. all other views were excluded).
  2. Scans had to be of high quality: visual quality control was used to remove scans in which the chest was not centered and clearly visible.
  3. Scans could not have 'positive' or 'negative' cardiomegaly labels; only scans with 'missing' or 'uncertain' labels were eligible, in order to avoid the samples used for cardiomegaly classification in [3]. Labels were generated by combining the NegBio and CheXpert labels in MIMIC-CXR-JPG.
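The selection logic above can be sketched in a few lines of Python. This is an illustrative sketch only: the record keys (`view`, `cardiomegaly`, `passed_visual_qc`), the label encoding (`None` for missing, `-1.0` for uncertain) and the function names are assumptions, the visual quality check is represented by a manually supplied flag, and the way the NegBio and CheXpert labels were combined is not reproduced here.

```python
import random
from typing import Iterable, Optional


def eligible(view: str, cardiomegaly: Optional[float], passed_visual_qc: bool) -> bool:
    """Apply the three selection criteria to a single scan."""
    is_pa = view == "PA"
    # Assumed encoding: None is a missing label, -1.0 an uncertain one.
    label_ok = cardiomegaly is None or cardiomegaly == -1.0
    return is_pa and passed_visual_qc and label_ok


def select_scans(scans: Iterable[dict], n: int, seed: int = 0) -> list:
    """Randomly draw n scans from those that satisfy all three criteria."""
    pool = [
        s for s in scans
        if eligible(s["view"], s["cardiomegaly"], s["passed_visual_qc"])
    ]
    if n > len(pool):
        raise ValueError(f"only {len(pool)} eligible scans, {n} requested")
    # A seeded generator keeps the draw reproducible.
    return random.Random(seed).sample(pool, n)
```

Calling `select_scans(records, 200)` on a list of such records would return 200 eligible scans, or raise an error if fewer are available.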

Data Preparation

The definition of the lung area used for the ground truth lung segmentations is taken from [8], where it is given as "any pixel for which radiation passed through the lung, but not through the mediastinum, the heart, structures below the diaphragm, and the aorta. The vena cava superior, when visible, is not considered to be part of the mediastinum." Similarly, the definition of the heart area used for the cardiac segmentations is taken from [8], where it is given as "pixels for which radiation passes through the heart. From anatomical knowledge the heart border at the central top and bottom part can be drawn". To aid segmentation, particularly of the heart, the non-medical contributors were given guidance by an experienced NHS clinician on identifying and segmenting anatomical markers in a PA chest X-ray.

Segmentations were generated from DICOM files using ITK-Snap software [9] and exported into binary JPG files in a three-stage process:

  1. Initial heart and lung segmentations were generated by non-medical contributors.
  2. Segmentations were examined and reviewed by an experienced NHS clinician, who compiled feedback on an image-by-image basis to achieve gold-standard segmentations.
  3. The feedback given by the NHS clinician was implemented to produce the gold-standard segmentations, and the final masks were exported in JPG format.

Data Description

The dataset consists of two folders, mimic_masks and mgmy_masks, according to the source of the scan.

mimic_masks contains one file, MIMIC_links.csv, and two subfolders, /lungs/ and /heart/, which contain segmentations of the lungs and heart respectively. The MIMIC_links.csv file links the mask files to the corresponding DICOM file paths from MIMIC-CXR.

mgmy_masks contains segmentations of hearts from the Montgomery County TB database.

Detailed Description

  • mimic_masks: folder containing binary PNG images of manually created ‘gold standard’ heart and lung segmentation masks for 200 PA chest X-rays taken from MIMIC-CXR, together with a linking file.
    • MIMIC_links.csv: spreadsheet linking segmentation file names (3-digit format, ###.png) to DICOM file paths from MIMIC-CXR and MIMIC-CXR-JPG. An example of linking segmentation masks to the original images can be found in the CardiomegalyBiomarkers GitHub repository [10].
    • mimic_masks/heart/: subfolder containing PNG files of heart segmentations; file names are in 3-digit format (###.png).
    • mimic_masks/lungs/: subfolder containing PNG files of lung segmentations; file names are in 3-digit format (###.png).
  • mgmy_masks: folder containing binary PNG images of manually created ‘gold standard’ heart segmentation masks for the 138 PA chest X-rays from the Montgomery County TB database. File names are in the format MCUCXR_#####_#.png. N.B. the file names of the segmentations are identical to those of the corresponding CXR files in the Montgomery County TB database.
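As an illustration of how the linking file might be read, the Python sketch below builds a lookup from mask file name to DICOM path. The column headers of MIMIC_links.csv are not stated in this description, so the sketch takes them as arguments rather than guessing them; the function name `load_links` is hypothetical.

```python
import csv


def load_links(csv_path, mask_column, dicom_column):
    """Map each mask file name to its DICOM path.

    The two column names must be supplied by the caller after inspecting
    the header row of the CSV file; none are assumed here.
    """
    links = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = {mask_column, dicom_column} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"columns not found in {csv_path}: {sorted(missing)}")
        for row in reader:
            links[row[mask_column]] = row[dicom_column]
    return links
```

The returned dictionary can then be used to look up the source image for any mask in mimic_masks/heart/ or mimic_masks/lungs/.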

Folder Structure

  ├── mgmy_masks
  │   ├── MCUCXR_#####_#.png
  │   ├── MCUCXR_#####_#.png
  │   └── ...
  │
  └── mimic_masks
      ├── MIMIC_links.csv
      ├── heart
      │   ├── ###.png
      │   ├── ###.png
      │   └── ...
      │
      └── lungs
          ├── ###.png
          ├── ###.png
          └── ...
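Given the layout above, heart and lung masks for the same scan share a file name, so they can be paired by their three-digit identifier. The following Python sketch shows one way to do this with the standard library; the function name `index_mimic_masks` is hypothetical and the path passed in is wherever the mimic_masks folder has been downloaded.

```python
from pathlib import Path


def index_mimic_masks(root):
    """Return {mask_id: (heart_path, lung_path)} for IDs present in both subfolders."""
    root = Path(root)
    heart = {p.stem: p for p in (root / "heart").glob("*.png")}
    lungs = {p.stem: p for p in (root / "lungs").glob("*.png")}
    # Keep only identifiers that have both a heart and a lung mask.
    shared = sorted(heart.keys() & lungs.keys())
    return {mask_id: (heart[mask_id], lungs[mask_id]) for mask_id in shared}
```

Identifiers found in only one of the two subfolders are silently skipped, which makes incomplete downloads easy to spot by comparing the size of the result with the expected number of scans.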

Usage Notes

Note that while the segmentations were verified by an NHS clinician to ensure the highest possible level of consistency and accuracy, they were completed by three different non-medically trained researchers, so there may be inter-segmenter variability. Code showing how the heart and lung segmentations are used to train Faster-RCNN or Mask-RCNN models can be found in the CardiomegalyBiomarkers GitHub repository [10].
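Two small helpers relate to the notes above; both are sketches under stated assumptions and are not taken from the CardiomegalyBiomarkers repository. The first derives a tight bounding box from a binary mask, which is one common way to obtain targets for a detection model; the half-open (x_min, y_min, x_max, y_max) convention is an assumption. The second computes the Dice overlap between two masks, one possible way to quantify variability between segmenters if the same scan were segmented twice. The function names are hypothetical.

```python
import numpy as np


def mask_to_box(mask):
    """Tight box (x_min, y_min, x_max, y_max) around the foreground pixels.

    x_max and y_max are exclusive, so a single-pixel mask still has width 1.
    """
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("mask has no foreground pixels")
    return int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1


def dice(a, b):
    """Dice overlap between two binary masks of the same shape (1.0 = identical)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * int(np.logical_and(a, b).sum()) / total
```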


The authors declare no ethics concerns.

Conflicts of Interest

The authors have no conflicts of interest to declare.


  1. McAdams HP, Samei E, Dobbins J 3rd, Tourassi GD, Ravin CE. Recent advances in chest radiography. Radiology. 2006 Dec;241(3):663-83. doi: 10.1148/radiol.2413051535. PMID: 17114619.
  2. Rimmer A. Radiologist shortage leaves patient care at risk, warns royal college. BMJ. 2017;359:j4683. doi: 10.1136/bmj.j4683
  3. Duvieusart, B., Krones, F., Parsons, G., Tarassenko, L., Papież, B.W., Mahdi, A. (2022). Multimodal Cardiomegaly Classification with Image-Derived Digital Biomarkers. In: Yang, G., Aviles-Rivero, A., Roberts, M., Schönlieb, CB. (eds) Medical Image Understanding and Analysis. MIUA 2022. Lecture Notes in Computer Science, vol 13413. Springer, Cham. doi: 10.1007/978-3-031-12053-4_2
  4. Moukheiber D, Mahindre S, Moukheiber L, Moukheiber M, Wang S, Ma C, Shih G, Peng Y, Gao M. Few-Shot Learning Geometric Ensemble for Multi-label Classification of Chest X-Rays. In: MICCAI Workshop on Data Augmentation, Labelling, and Imperfections. 2022 Sep 16 (pp. 112-122). Cham: Springer Nature Switzerland.
  5. Johnson, A., Pollard, T., Mark, R., Berkowitz, S., Horng, S. (2019). MIMIC-CXR Database (version 2.0.0). PhysioNet. doi: 10.13026/C2JT1Q.
  6. Johnson, A., Lungren, M., Peng, Y., Lu, Z., Mark, R., Berkowitz, S., Horng, S. (2019). MIMIC-CXR-JPG - chest radiographs with structured labels (version 2.0.0). PhysioNet. doi: 10.13026/8360-t248.
  7. Jaeger S, Candemir S, Antani S, Wáng YX, Lu PX, Thoma G. Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg. 2014 Dec;4(6):475-7. doi: 10.3978/j.issn.2223-4292.2014.11.20. PMID: 25525580; PMCID: PMC4256233.
  8. van Ginneken, B., Stegmann, M., Loog, M. (2006). Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database. Medical Image Analysis, vol 10(1). doi: 10.1016/
  9. Yushkevich, P., Piven, J., Hazlett, H. C., Smith, R. G., Ho, S., Gee, J., Gerig, G. (2006). User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. Neuroimage, vol 31(3). doi: 10.1016/j.neuroimage.2006.01.015
  10. Duvieusart, B., Krones, F. (2022). CardiomegalyBiomarkers GitHub Repository. GitHub. Retrieved on 7 July 2023 from

Parent Projects
Heart and lung segmentations for MIMIC-CXR/MIMIC-CXR-JPG and Montgomery County TB databases was derived from: Please cite them when using this project.

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Open Data Commons Attribution License v1.0



Total uncompressed size: 7.0 MB.

Access the files
Name Size Modified
LICENSE.txt 19.9 KB 2023-08-09
SHA256SUMS.txt 48.4 KB 2023-08-14