Challenge Credentialed Access

CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays

Gregory Holste Song Wang Ajay Jaiswal Yuzhe Yang Mingquan Lin Yifan Peng Atlas Wang

Published: Sept. 28, 2023. Version: 1.1.0

When using this resource, please cite:
Holste, G., Wang, S., Jaiswal, A., Yang, Y., Lin, M., Peng, Y., & Wang, A. (2023). CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays (version 1.1.0). PhysioNet.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.


Many real-world problems, including diagnostic medical imaging exams, are “long-tailed” – there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple disease findings simultaneously. This is distinct from most large-scale image classification benchmarks, where each image belongs to only one label and the distribution of labels is relatively balanced. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied its interplay with label co-occurrence. This competition will provide a challenging large-scale multi-label long-tailed learning task on chest X-rays (CXRs), encouraging community engagement with this emerging interdisciplinary topic. This project contains labels for the CXR-LT 2023 competition dataset, containing 377,110 CXRs from 26 classes, and a related subset used in the MICCAI 2023 paper, "How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers?", containing 257,018 frontal CXRs from 19 classes.



Chest radiography, like many diagnostic medical exams, produces a long-tailed distribution of clinical findings; while a small subset of diseases are routinely observed, the vast majority of diseases are relatively rare [1]. This poses a challenge for standard deep learning methods, which exhibit bias toward the most common classes at the expense of the important, but rare, “tail” classes [2]. Many existing methods [3] have been proposed to tackle this specific type of imbalance, though only recently with attention to long-tailed medical image recognition problems [4-6]. Diagnosis on chest X-rays (CXRs) is also a multi-label problem, as patients often present with multiple disease findings simultaneously; however, only a select few studies incorporate knowledge of label co-occurrence into the learning process [7-9, 12].

Since most large-scale image classification benchmarks contain single-label images with a mostly balanced distribution of labels, many standard deep learning methods fail to accommodate the class imbalance and co-occurrence problems posed by the long-tailed, multi-label nature of tasks like disease diagnosis on CXRs [2].

To develop a benchmark for long-tailed, multi-label medical image classification, we expand upon the MIMIC-CXR-JPG [10,11] dataset by enlarging the set of target classes from 14 to 26 (see full details in “Data Description”), generating labels for 12 new disease findings by parsing radiology reports. This follows the procedure of Holste et al. [2], who added 5 new findings to MIMIC-CXR-JPG – Calcification of the Aorta, Subcutaneous Emphysema, Tortuous Aorta, Pneumomediastinum, and Pneumoperitoneum – to study long-tailed learning approaches for CXRs, and Moukheiber et al. [12], who added 5 new classes – Chronic Obstructive Pulmonary Disease, Emphysema, Interstitial Lung Disease, Calcification, and Fibrosis – to study ensemble methods for few-shot learning on CXRs.

CXR-LT challenge task

Given a CXR, detect all clinical findings. If no findings are present, predict "No Finding" (with the exception that "No Finding" can co-occur with "Support Devices"). To do this, participants will train multi-label thorax disease classifiers on the provided labeled training data.

ICCV 2023 CVAMD workshop

This shared task is hosted in conjunction with the ICCV 2023 [13] workshop, Computer Vision for Automated Medical Diagnosis (CVAMD) [14]. Upon completion of the competition, we will invite participants to submit their solutions for potential presentation at CVAMD 2023 and publication in the ICCV 2023 workshop proceedings. We intend to accept 5-6 papers for publication and select 2-3 of the accepted papers for oral presentation at CVAMD in Paris.


This shared task will be conducted on CodaLab [15]. Participants must have credentialed access to MIMIC-CXR-JPG v2.0.0 [10,11] (more details in the "Data Description" section) and register for the competition on CodaLab.

Tentative timeline

  • 05/01/2023: Development Phase begins. Participants can begin making submissions and tracking results on the public leaderboard.
  • 07/14/2023: Testing Phase begins. Unlabeled test data will be released to registered participants. The leaderboard will be kept private for this phase.
  • 07/17/2023: Competition ends. Participants are invited to submit their solutions as 8-page papers to ICCV CVAMD 2023!
  • 07/28/2023: ICCV CVAMD 2023 submission deadline. (Competition participants may receive an extension if needed.)
  • 08/04/2023: ICCV CVAMD 2023 acceptance notification.
  • 10/02/2023: ICCV CVAMD 2023 workshop.

Data Description

CXR-LT 2023 challenge data

This challenge will use an expanded version of MIMIC-CXR-JPG v2.0.0 [10,11], a large benchmark dataset for automated thorax disease classification. Following Holste et al. [2], each CXR study in the dataset was labeled with 12 newly added disease findings extracted from the associated radiology reports. The resulting long-tailed (LT) dataset contains 377,110 CXRs, each labeled with at least one of 26 clinical findings (including a "No Finding" class). In addition to the 14 clinical findings in the original MIMIC-CXR-JPG v2.0.0 dataset, the following 12 new findings have been added:

  • Calcification of the Aorta
  • Emphysema
  • Fibrosis
  • Hernia
  • Infiltration
  • Mass
  • Nodule
  • Pleural Thickening
  • Pneumomediastinum
  • Pneumoperitoneum
  • Subcutaneous Emphysema
  • Tortuous Aorta

Within the cxr-lt-2023/ directory, training set and validation set image IDs, metadata, and labels can be found in train.csv and development.csv, respectively. Alternatively, these files are available to registered participants on CodaLab [15] under "Participate" -> "Files". Test set image IDs, metadata, and labels can be found in test.csv after the conclusion of the competition.

MICCAI 2023 PruneCXR data

Additionally, a subset of this dataset used in the forthcoming MICCAI 2023 paper, "How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers?" [16], is provided here to ensure reproducibility; please see the accompanying GitHub repository [17] for full details on implementation and reproducibility. This study consists of 257,018 frontal CXRs, each labeled with at least one of 19 clinical findings (including a "No Finding" class). The following 5 findings are included in the label set in addition to the original 14 found in MIMIC-CXR-JPG v2.0.0:

  • Calcification of the Aorta
  • Pneumomediastinum
  • Pneumoperitoneum
  • Subcutaneous Emphysema
  • Tortuous Aorta

Within the miccai-2023_mimic-cxr-lt/ directory, the training, validation, and test set image IDs and labels for this study can be found, respectively, in miccai2023_mimic-cxr-lt_labels_train.csv, miccai2023_mimic-cxr-lt_labels_val.csv, and miccai2023_mimic-cxr-lt_labels_test.csv.


Participants will upload image-level predictions on the provided test sets for evaluation. Since this is a multi-label classification problem with severe imbalance, the primary evaluation metric will be mean Average Precision (mAP), i.e., "macro-averaged" AP across the 26 classes. While Area Under the Receiver Operating Characteristic Curve (AUC) is a standard metric for related datasets, AUC can be heavily inflated in the presence of strong imbalance. Instead, mAP is more appropriate for the long-tailed, multi-label setting since it both (i) measures performance across decision thresholds and (ii) does not degrade under class imbalance. For thoroughness, mean AUC (mAUC) and mean F1 score (mF1), the latter using a threshold of 0.5 for each class, will be calculated and appear on the leaderboard, but will not contribute to team rankings.
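
To make the primary metric concrete, the following is a minimal sketch of macro-averaged AP in plain Python. It assumes the common step-wise definition of AP (the sum of recall gains weighted by precision at each score threshold, with tied scores treated as one threshold); the official CodaLab scoring script is not reproduced here and may differ in detail. The function names and the toy class names "Common" and "Rare" are illustrative only.

```python
from typing import Dict, Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise AP for one class: sum of (recall gain) * precision per threshold."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AP is undefined for a class with no positive examples")
    # Rank examples from most to least confident.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    ap, tp, prev_recall, i = 0.0, 0, 0.0, 0
    while i < len(ranked):
        # Consume every example sharing this score so ties form one threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        recall = tp / n_pos
        precision = tp / j  # j examples have been predicted positive so far
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


def mean_average_precision(
    labels: Dict[str, Sequence[int]], scores: Dict[str, Sequence[float]]
) -> float:
    """Macro-average: unweighted mean of per-class AP, so rare classes count equally."""
    per_class = [average_precision(labels[c], scores[c]) for c in labels]
    return sum(per_class) / len(per_class)


# Toy example with one frequent and one rare class (hypothetical data).
toy_labels = {"Common": [1, 1, 0, 1], "Rare": [0, 0, 1, 0]}
toy_scores = {"Common": [0.9, 0.8, 0.7, 0.6], "Rare": [0.2, 0.6, 0.4, 0.1]}
toy_map = mean_average_precision(toy_labels, toy_scores)
```

Because the average is unweighted, the single "Rare" positive contributes as much to the final score as all "Common" positives together, which is why a macro-averaged metric suits a long-tailed label distribution.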

A sample submission .csv for the development phase can be found in development_sample_submission.csv or on CodaLab [15] under "Participate" -> "Files". A sample submission for the test phase (test_sample_submission.csv) will be released when the testing phase begins on 07/14/2023. Reproducible code is required for this challenge, so all final submissions must contain all necessary code for preprocessing, training, inference, etc. See instructions below for how to successfully submit on CodaLab.

Submission file structure

All CodaLab submissions are required to be in .zip format. For this competition, this compressed .zip file must contain (i) a predictions .csv file and (ii) a "code/" directory with all of your training and inference code. The required file structure is as follows:

        xxx.csv  # predictions .csv file
        code/  # code directory
        ├── ...

To create the final submission .zip file, you might then run zip -r submission.zip xxx.csv code (the first argument to zip is the name of the archive to create). Please note that the names of your individual submission files do not matter, though the code directory must be named "code".
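
For participants who prefer to package the archive from Python, here is a minimal sketch using the standard-library zipfile module. The function name build_submission and all paths are hypothetical; the only layout facts taken from the description above are that the predictions .csv sits at the top level and that the code directory inside the archive is named "code".

```python
import zipfile
from pathlib import Path
from typing import List


def build_submission(predictions_csv: str, code_dir: str, out_zip: str) -> List[str]:
    """Zip a predictions file plus a code directory; return the archive member names."""
    code_root = Path(code_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        # The predictions file is stored at the top level of the archive.
        zf.write(predictions_csv, arcname=Path(predictions_csv).name)
        # Every file under the local code directory is stored under "code/",
        # whatever the local directory happens to be called.
        for path in sorted(code_root.rglob("*")):
            if path.is_file():
                member = Path("code") / path.relative_to(code_root)
                zf.write(path, arcname=member.as_posix())
        return zf.namelist()
```

Writing the output archive outside the code directory avoids accidentally including the archive in itself.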

Predictions .csv file requirements

Your predictions .csv file must contain image-level predictions of the probability that each of the 26 classes is present in a given image. Specifically,

  • You must have a "dicom_id" column with the provided unique image IDs for the given evaluation set.
    • Entries must be strings.
  • You must have a column for each of the 26 class labels ("Atelectasis", "Calcification of the Aorta", etc.).
    • Entries must be floats or integers in the interval [0, 1].
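
The requirements above can be checked locally before uploading. The sketch below is one possible pre-submission check, not the official CodaLab validator: the function name validate_predictions is hypothetical, and the list of class names is passed in by the caller rather than hard-coded, since the full list of 26 labels should be taken from the sample submission file.

```python
import csv
import io
from typing import Iterable, List


def validate_predictions(csv_text: str, class_names: Iterable[str]) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    required = ["dicom_id", *class_names]
    missing = [col for col in required if col not in header]
    if missing:
        return [f"missing column: {col}" for col in missing]
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not row["dicom_id"]:
            problems.append(f"line {line_no}: empty dicom_id")
        for col in required[1:]:
            try:
                value = float(row[col])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {col} is not numeric")
                continue
            # NaN fails this comparison too, so it is reported as out of range.
            if not 0.0 <= value <= 1.0:
                problems.append(f"line {line_no}: {col}={value} outside [0, 1]")
    return problems
```

For example, validate_predictions(open("xxx.csv").read(), class_names) would return an empty list for a well-formed file, where class_names holds the 26 label strings read from the sample submission header.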

Release Notes

Version 1.1.0: This is an update to the CXR-LT 2023 challenge dataset now that the competition has concluded. All test phase image IDs, metadata, and labels have been added. Additionally, a subset of the CXR-LT challenge dataset used in Holste et al. [16] has been included for reproducibility.


This shared task uses image data from MIMIC-CXR-JPG v2.0.0 and generates labels from free-text radiology reports in MIMIC-CXR, a de-identified dataset that we gained access to through a PhysioNet Credentialed Health Data Use Agreement (v1.5.0).


We thank our steering committee for their support of this project: Leo Anthony Celi, Zhiyong Lu, George Shih, and Ronald Summers.

Conflicts of Interest

The authors have no conflicts of interest to declare.


References

  1. Zhou SK, Greenspan H, Davatzikos C, Duncan JS, Van Ginneken B, Madabhushi A, Prince JL, Rueckert D, Summers RM. A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises. Proceedings of the IEEE. 2021 Feb 26;109(5):820-38.
  2. Holste G, Wang S, Jiang Z, Shen TC, Shih G, Summers RM, Peng Y, Wang Z. Long-Tailed Classification of Thorax Diseases on Chest X-Ray: A New Benchmark Study. In Data Augmentation, Labelling, and Imperfections: Second MICCAI Workshop, DALI 2022, Held in Conjunction with MICCAI 2022, Singapore, September 22, 2022, Proceedings 2022 Sep 16 (pp. 22-32). Cham: Springer Nature Switzerland.
  3. Zhang Y, Kang B, Hooi B, Yan S, Feng J. Deep long-tailed learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023 Apr 19.
  4. Zhang R, Haihong E, Yuan L, He J, Zhang H, Zhang S, Wang Y, Song M, Wang L. MBNM: multi-branch network based on memory features for long-tailed medical image recognition. Computer Methods and Programs in Biomedicine. 2021 Nov 1;212:106448.
  5. Ju L, Wang X, Wang L, Liu T, Zhao X, Drummond T, Mahapatra D, Ge Z. Relational subsets knowledge distillation for long-tailed retinal diseases recognition. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VIII 24 2021 (pp. 3-12). Springer International Publishing.
  6. Yang Z, Pan J, Yang Y, Shi X, Zhou HY, Zhang Z, Bian C. ProCo: Prototype-Aware Contrastive Learning for Long-Tailed Medical Image Classification. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part VIII 2022 Sep 16 (pp. 173-182). Cham: Springer Nature Switzerland.
  7. Chen H, Miao S, Xu D, Hager GD, Harrison AP. Deep hierarchical multi-label classification of chest X-ray images. In International Conference on Medical Imaging with Deep Learning 2019 May 24 (pp. 109-120). PMLR.
  8. Wang G, Wang P, Cong J, Liu K, Wei B. BB-GCN: A Bi-modal Bridged Graph Convolutional Network for Multi-label Chest X-Ray Recognition. arXiv preprint arXiv:2302.11082. 2023 Feb 22.
  9. Chen B, Li J, Lu G, Yu H, Zhang D. Label co-occurrence learning with graph convolutional networks for multi-label chest x-ray image classification. IEEE Journal of Biomedical and Health Informatics. 2020 Jan 16;24(8):2292-302.
  10. Johnson AE, Pollard TJ, Greenbaum NR, Lungren MP, Deng CY, Peng Y, Lu Z, Mark RG, Berkowitz SJ, Horng S. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042. 2019 Jan 21.
  11. PhysioNet. MIMIC-CXR-JPG - chest radiographs with structured labels [Internet]. Available from:
  12. Moukheiber D, Mahindre S, Moukheiber L, Moukheiber M, Wang S, Ma C, Shih G, Peng Y, Gao M. Few-Shot Learning Geometric Ensemble for Multi-label Classification of Chest X-Rays. In Data Augmentation, Labelling, and Imperfections: Second MICCAI Workshop, DALI 2022, Held in Conjunction with MICCAI 2022, Singapore, September 22, 2022, Proceedings 2022 Sep 16 (pp. 112-122). Cham: Springer Nature Switzerland.
  13. IEEE/CVF. International Conference on Computer Vision 2023 [Internet]. Available from:
  14. Yang Y, Wang F, Holste G, Peng Y. Computer Vision for Automated Medical Diagnosis [Internet]. Available from:
  15. CodaLab. CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays [Internet]. Available from:
  16. Holste G, Jiang Z, Jaiswal A, Hanna M, Minkowitz S, Legasto AC, Escalon JG, Steinberger S, Bittman M, Shen TC, Ding Y, Summers RM, Shih G, Peng Y, Wang Z. How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers? arXiv preprint arXiv:2308.09180. 2023 Aug 17.
  17. GitHub. PruneCXR: How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers? [Internet]. Available from:

Parent Projects
CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays was derived from: Please cite them when using this project.

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research
