Database Credentialed Access

MIMIC-CXR-Ext-ILS: Lesion Segmentation Masks and Instruction-Answer Pairs for Chest X-rays

Geon Choi, Hangyul Yoon, Hyunju Shin, Hyunki Park, Sang Hoon Seo, Eunho Yang, Edward Choi

Published: March 25, 2026. Version: 1.0.0


When using this resource, please cite:
Choi, G., Yoon, H., Shin, H., Park, H., Seo, S. H., Yang, E., & Choi, E. (2026). MIMIC-CXR-Ext-ILS: Lesion Segmentation Masks and Instruction-Answer Pairs for Chest X-rays (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/8ejy-4t06

Additionally, please cite the original publication:

Choi, Geon, et al. "Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset." arXiv preprint arXiv:2511.15186 (2025).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.

Abstract

The applicability of current lesion segmentation models for chest X-rays (CXRs) has been limited both by a small number of target labels and the reliance on complex, expert-level text inputs, creating a barrier to practical use. To address these limitations, we introduce instruction-guided lesion segmentation (ILS), a medical-domain adaptation of referring image segmentation (RIS) designed to segment diverse lesion types based on simple, user-friendly instructions. Under this task, we construct MIMIC-CXR-Ext-ILS, the first large-scale instruction-answer dataset for CXR lesion segmentation, using our fully automated multimodal pipeline that generates annotations from CXR images and their corresponding reports. MIMIC-CXR-Ext-ILS contains 1.1M instruction-answer pairs derived from 192K images and 91K unique segmentation masks, covering seven major lesion types. Although the dataset was constructed entirely without human intervention, expert evaluations report an acceptance rate of over 95%.


Background

Medical imaging is an essential technique in modern medicine, enabling accurate diagnosis and appropriate treatment. Among various imaging modalities, chest X-ray (CXR) is one of the most common examinations due to its high accessibility and rapid acquisition [1]. Radiologists reach a diagnosis by integrating visual evidence from CXRs with their clinical knowledge, and describe these findings in a text format known as a radiology report. A key step in this diagnostic process is identifying the precise location and boundary of a lesion—an abnormal region with pathological changes [2]. This task is labor-intensive and demands substantial clinical expertise and analytical precision.

To alleviate physicians’ workload in localizing pathological regions, there is a growing demand for automated lesion segmentation models in CXRs. Recently, vision–language models (VLMs) equipped with segmentation modules [3-5] have emerged as a promising solution for referring image segmentation (RIS), as they can interpret diverse user-specific needs expressed through natural language instructions. However, despite the success of such VLMs in general-domain RIS, their application to CXRs remains limited. Although prior studies [6, 7] have explored CXR lesion segmentation using text prompts, they are limited to a single lesion type (e.g., COVID-19) and, moreover, require long, detailed, expert-level medical descriptions based on a tailored CXR review (e.g., "Bilateral pulmonary infection, two infected areas, upper right lung and upper left lung.") as input.

To address these limitations, we propose a more user-friendly task, namely instruction-guided lesion segmentation (ILS). In this task, the model is required to process diverse user instructions, ranging from prompts that specify the lesion type and target location, to requests that look for abnormalities globally. If the requested lesion is not present, the model should reliably report its absence. Additionally, the model should be able to provide textual descriptions regarding a lesion's location or type, even if not explicitly prompted by the user.

However, existing CXR datasets mostly provide only coarse bounding-box localization or single lesion type masks without explicit links to textual prompts [8-14]. To bridge this gap, we introduce MIMIC-CXR-Ext-ILS, a large-scale dataset constructed via a novel automated pipeline. By extracting high-confidence anomalous regions and structured text from MIMIC-CXR [15-18], we generated 1.1M instruction–answer pairs and 91K lesion segmentation masks across 192K images. MIMIC-CXR-Ext-ILS is the first to offer dense segmentation annotations explicitly paired with versatile instructions, enabling the development of interactive and accessible medical AI.


Methods

The MIMIC-CXR dataset, comprising DICOM images and paired radiology reports, serves as the foundation of our work. Our data curation pipeline consists of three main phases: (1) automated generation of lesion masks leveraging both image and report modalities; (2) creation of instruction-answer pairs aligned with the generated masks; and (3) quality verification by four board-certified radiation oncologists to ensure clinical validity. For further details, please refer to the original paper [19].

Grounded Lesion Mask Generation

To construct our dataset without manual annotation, we utilize the MIMIC-CXR repository, leveraging paired images and radiology reports through a four-stage automated pipeline.

  • Report Structuring and Location Mapping. The first step employs large language models (LLMs) to parse reports into structured tuples consisting of the following categories: entity, sentence index, presence, certainty, location, and predicted lesion type. For example, if the second sentence in a radiology report is "The lower lung opacity is pneumonia," its corresponding output is (opacity, 2, positive, definitive, [right lung base, left lung base], pneumonia). Here, the term "lower lung" in the original text is explicitly mapped to the standard anatomical labels "right lung base" and "left lung base."
  • Spatial Information Extraction. The second step utilizes three specialized models: RadEdit [20], CXAS [21], and a pretrained YOLO [22]. These models are used to generate an anomaly map, anatomy masks, and lesion box masks, respectively, which serve as essential visual cues for the subsequent mask generation process. Specifically, each model functions as follows:
    • RadEdit: This diffusion-based editing model modifies X-ray images guided by text prompts. By inputting an X-ray image containing a lesion along with the prompt "No acute cardiopulmonary process," we generate a counterfactual image with the lesion removed. We then calculate the pixel-level difference between the original and edited images to construct an anomaly map, operating on the premise that differences exceeding a predefined threshold indicate the presence of a lesion.
    • CXAS: This model segments various anatomical regions within chest X-rays. Leveraging the locations extracted during the initial mapping step, CXAS generates an anatomy mask that approximates the spatial extent of the specific disease mentioned in the radiology report. This mask is subsequently employed to spatially filter the anomaly map.
    • Pretrained YOLO: This model detects various abnormalities in chest X-rays, outputting bounding boxes with associated confidence scores. To generate the final lesion masks, we create binary box masks by filling the interiors of the high-confidence bounding boxes. These box masks are then utilized to spatially filter the anomaly map.
  • Lesion Mask Generation. With the three visual cues extracted, initial lesion masks are generated. The anomaly map is decomposed and aligned with specific report findings using the anatomy and lesion box masks. These initial masks then undergo a post-processing step involving the removal of small, noisy artifacts to produce the final, refined lesion masks.
  • Location Verification. In the final step, we explicitly verify whether each lesion mask has been successfully grounded to the structured report. To assess the grounding status, we define three types of locations: reported location, grounded location, and empty location. The reported location is the set of anatomical labels derived from the initial report structuring. Based on this set, the grounded location is defined as the subset of the reported location that spatially overlaps with a generated lesion mask, confirming the successful localization of the reported finding. Finally, we introduce an empty location, which refers to a lung region with no reported lesions, used to generate negative samples.
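The filtering and verification logic in the steps above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the pipeline's actual implementation: the arrays stand in for the RadEdit anomaly map, a CXAS anatomy mask, and a filled YOLO box mask, all pre-aligned to the same grid, and the threshold value is illustrative.

```python
import numpy as np

# Illustrative inputs; in the real pipeline these come from RadEdit
# (anomaly map), CXAS (anatomy mask), and the pretrained YOLO (box mask).
rng = np.random.default_rng(0)
anomaly_map = rng.random((8, 8))             # |original - counterfactual| per pixel
anatomy_mask = np.zeros((8, 8), dtype=bool)  # e.g., CXAS mask for "left lung base"
anatomy_mask[4:, :4] = True
box_mask = np.zeros((8, 8), dtype=bool)      # filled high-confidence YOLO box
box_mask[3:7, :5] = True

# Lesion mask generation: threshold the anomaly map, then keep only pixels
# supported by both the anatomy mask and the detector box mask.
THRESHOLD = 0.5  # illustrative; the real threshold is pipeline-specific
lesion_mask = (anomaly_map > THRESHOLD) & anatomy_mask & box_mask

# Location verification: a reported location counts as grounded only if
# the final lesion mask spatially overlaps its anatomy mask.
grounded = bool((lesion_mask & anatomy_mask).any())
```

The post-processing step that removes small, noisy artifacts (e.g., via connected-component filtering) is omitted here for brevity.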

Instruction-Answer Pair Generation

We build our dataset for seven major lesion types found in CXRs: cardiomegaly, pneumonia, atelectasis, opacity, consolidation, edema, and effusion. These lesions are not only the most frequently mentioned in radiology reports, but also clinically significant enough to be common annotation targets in other medical datasets [8, 15, 16, 23]. For each lesion, we construct positive instruction-answer pairs, which include a ground-truth lesion mask. Negative pairs using an empty mask are also generated to enable the model to confirm the absence of lesions.

  • Instruction Types and Limitations. We consider three types of segmentation instructions (Table 1). A basic instruction specifies both the segmentation target and its location. The location can be a broad region (such as left lung or right lung), one of eight more specific zones (apical, upper, mid, and lower zones for each lung), or a combination of these regions. In contrast, a global instruction specifies only the segmentation target. A lesion inference instruction asks the model to predict the type of lesion represented by an opacity within a given location. The generation of these instructions is directly bounded by the outcomes of the grounded lesion mask generation. For instance, a global instruction becomes invalid if the generated mask only partially covers the target lesion. To mitigate this, our framework dynamically synthesizes only the instruction-answer pairs that are strictly validated by the grounding information available for each specific image.
Table 1. Templates for each question type. Each type includes answer templates for both positive and negative cases, with the negative answers listed last in each cell.

Type | Role | Template
Basic | Instruction | Segment the [Target] in the [Location].
 | Answer | [SEG]
 |  | [SEG] There is no [Target] in the [Location].
Global | Instruction | Segment the [Target].
 | Answer | [SEG] It is located in the [Location].
 |  | [SEG] There is no [Target].
Lesion Inference | Instruction | Segment the opacity in the [Location] and predict its type.
 | Answer | [SEG] It is highly suggestive of [Lesion].
 |  | [SEG] It possibly reflects [Lesion].
 |  | [SEG] There is no opacity in the [Location].
  • Instruction Generation. The instruction generation process begins by creating a basic instruction for each grounded lesion. Next, we determine whether a global instruction can be generated. The global instruction is created only when the grounded location and the reported location are identical. Separately, we generate lesion inference instructions by transforming the basic instructions for pneumonia, atelectasis, and edema, replacing these specific lesion types with opacity. Negative samples are generated by (1) selecting lesion types that are not mentioned or explicitly negated in the radiology report; or (2) utilizing empty locations to substitute the original location in the basic instruction of a positive sample.
  • Answer Generation. Each answer consists of a lesion mask and a textual description. The answer lesion masks for positive pairs are determined differently depending on whether they represent organ-level or localized abnormalities. For cardiomegaly, we utilize a heart mask as its corresponding lesion mask since this condition is defined by the state of a specific organ [24]. In contrast, localized abnormalities (e.g., pneumonia or effusion) can appear in variable locations, so for these findings we use the lesion masks generated in grounded lesion mask generation. For negative pairs, an empty mask is used. A textual description is provided for both positive and negative samples; in particular, the answer template for lesion inference incorporates a certainty level.
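The template-filling part of this process can be sketched as follows. This is a simplified illustration using only the basic and global templates from Table 1 (the lesion inference templates, which add a certainty level, are omitted); the `TEMPLATES` dictionary and `make_pair` helper are hypothetical names, not part of the released dataset tooling.

```python
# Minimal sketch of filling the Table 1 templates; names are illustrative.
TEMPLATES = {
    "basic": {
        "instruction": "Segment the {target} in the {location}.",
        "positive": "[SEG]",
        "negative": "[SEG] There is no {target} in the {location}.",
    },
    "global": {
        "instruction": "Segment the {target}.",
        "positive": "[SEG] It is located in the {location}.",
        "negative": "[SEG] There is no {target}.",
    },
}

def make_pair(kind: str, target: str, location: str, positive: bool) -> dict:
    """Fill one instruction/answer template pair for a given lesion."""
    t = TEMPLATES[kind]
    answer_tpl = t["positive"] if positive else t["negative"]
    return {
        "instruction": t["instruction"].format(target=target, location=location),
        "answer": answer_tpl.format(target=target, location=location),
        "seg": positive,
    }

pair = make_pair("basic", "effusion", "left lung base", positive=True)
# pair["instruction"] == "Segment the effusion in the left lung base."
```

In the actual pipeline, which pairs get synthesized for a given image is constrained by the grounding information, as described above.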

Quality Verification for Test Set

To assess the quality of MIMIC-CXR-Ext-ILS, an expert review was conducted by four radiation oncologists. For the test set samples, clinicians classified each case as either acceptable or unacceptable based on mask quality. Positive cases were reviewed by all experts, while negatives were split among them. Any sample judged unacceptable by at least one expert was excluded from the final test set. Among the 10.5K mask samples initially reviewed, 96.4% were rated as acceptable and were included in the final test set.


Data Description

Overview

MIMIC-CXR-Ext-ILS is an automatically generated large-scale dataset derived from MIMIC-CXR images and radiology reports, consisting of the following key elements:

  • Images: 191,563 CXR images from MIMIC-CXR
  • Instruction-Answer Pairs: 1,065,621 pairs in total
    • Positive Pairs: 135,291 positive pairs in total (train: 131,825, val: 1,112, test: 2,354)
    • Negative Pairs: 930,330 negative pairs in total (train: 913,315, val: 7,134, test: 9,881)
  • Annotations: 91,301 lesion segmentation masks
  • Target Findings: Seven major lesion types frequently found in CXRs: cardiomegaly, pneumonia, atelectasis, opacity, consolidation, edema, and effusion. The distribution of pairs per lesion type is detailed in Table 2.
  • Structure: Each CXR image corresponds to multiple instruction-answer pairs. For positive samples, the lesion segmentation mask serves as the target label.
Table 2. Distribution of the MIMIC-CXR-Ext-ILS dataset.

Lesion Type | # of Positive Pairs | # of Negative Pairs
Cardiomegaly | 40,243 | 24,414
Pneumonia | 9,781 | 151,256
Atelectasis | 21,471 | 148,622
Opacity | 10,933 | 148,852
Consolidation | 3,919 | 154,005
Edema | 34,676 | 151,300
Effusion | 14,268 | 151,881

Files

The dataset is organized into the lesion_mask/ folder, which contains segmentation masks, and mimic_ils_instruction_answer.json, a comprehensive JSON file containing metadata, report contexts, and instruction-answer pairs.

Folder Structure

Upon extracting lesion_mask.zip, the directory is organized as follows:

base/
├── lesion_mask/
│   ├── s10000000
│   │   ├── s10000000_effusion_0.png
│   │   ├── s10000000_pneumonia_1.png
│   │   └── s10000000_atelectasis_2.png
│   ├── s10000001
│   ├── s10000002
│   ├── ...
│   └── s99999999
└── mimic_ils_instruction_answer.json
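Given this layout, where each mask filename encodes the study ID, lesion type, and an index (`<study>_<lesion>_<index>.png`), the masks for each study can be enumerated with a small helper. This is a sketch; `list_study_masks` is an illustrative name, not part of the release.

```python
from pathlib import Path

def list_study_masks(mask_root: str) -> dict:
    """Map each study ID to its sorted list of lesion mask filenames
    inside the extracted lesion_mask/ folder."""
    masks = {}
    for study_dir in sorted(Path(mask_root).iterdir()):
        if study_dir.is_dir():
            masks[study_dir.name] = sorted(p.name for p in study_dir.glob("*.png"))
    return masks
```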

JSON File Structure

mimic_ils_instruction_answer.json encapsulates comprehensive metadata and instruction-answer annotations, structured with the following keys:

  • subject_id: Unique identifier for the patient.
  • dicom_id: Unique identifier for the CXR image.
  • image_path: Relative path to the image consistent with the MIMIC-CXR-JPG structure.
  • section_name: Source section parsed from the radiology report (e.g., findings, impression, last_paragraph).
  • section_content: Text content of the parsed section.
  • instruction_answer_pairs: A dictionary containing the generated instruction and answer pairs.
    • pair_id: Unique identifier for the specific instruction-answer pair.
    • instruction: The input prompt or query provided to the model.
    • answer: The expected output response. [SEG] is a special token for segmentation tasks.
    • type: The category of the instruction (e.g., basic, global, lesion_inference).
    • target: The specific pathological finding or lesion class targeted by the instruction.
    • location: This field specifies the anatomical location for the following cases:
      • Negative Pairs: Indicates the location where a lesion is absent.
      • Cardiomegaly: Since the target location for cardiomegaly is consistently the heart, this field is always ["heart"] for these cases.
    • reported_location: This field is used exclusively for positive pairs, excluding cardiomegaly cases. It contains the anatomical locations mentioned in the original radiology report.
    • grounded_location: This field is used exclusively for positive pairs, excluding cardiomegaly cases, and represents a subset of the reported_location. While the reported_location includes all areas mentioned in the original radiology report, the grounded_location only includes the specific regions where a corresponding lesion mask has been generated and verified.
    • sent_idx: Index of the source sentence within the report section.
    • seg: Boolean flag indicating whether the model should perform segmentation for the given instruction.
      • True: Assigned to positive pairs, where the instruction refers to a lesion that is actually present and requires segmentation.
      • False: Assigned to negative pairs, where the specified lesion is absent, indicating that no segmentation should be performed.
    • seg_mask_path: Relative path to the corresponding binary lesion segmentation mask.
      • For positive pairs (seg: True), this field provides the file path to the specific mask for the target lesion.
      • For negative pairs (seg: False), this field is set to null, as there is no corresponding lesion to segment. When training models with these pairs, users are encouraged to treat these cases as samples requiring an empty (all-zero) binary mask to supervise the model in correctly predicting the absence of findings.
{
    "train": {
        "s10000000": {
            "subject_id": "p10000000",
            "dicom_id": "xxxxxxxx-xxxxxxxx-xxxxxxxx-xxxxxxxx-xxxxxxxx",
            "image_path": "p10/p10000000/s10000000/xxxxxxxx-xxxxxxxx-xxxxxxxx-xxxxxxxx-xxxxxxxx.jpg",
            "section_name": "findings",
            "section_content": "... (2) Small left and moderate layering right pleural effusions have increased. ...",
            "instruction_answer_pairs": {
                "positive_pairs": [
                    {
                        "pair_id": "s10000000_positive_0",
                        "instruction": "Segment the effusion in the left lung base.",
                        "answer": "[SEG]",
                        "type": "basic",
                        "target": "effusion",
                        "reported_location": [
                            "left lung",
                            "right lung base"
                        ],
                        "grounded_location": [
                            "left lung base"
                        ],
                        "sent_idx": "2",
                        "seg": true,
                        "seg_mask_path": "s10000000/s10000000_effusion_0.png"
                    },
                    ...
                ],
                "negative_pairs": [
                    {
                        "pair_id": "s10000000_negative_0",
                        "instruction": "Segment the consolidation in the right lung.",
                        "answer": "[SEG] There is no consolidation in the right lung.",
                        "type": "basic",
                        "target": "consolidation",
                        "location": [
                            "right lung"
                        ],
                        "seg": false,
                        "seg_mask_path": null
                    },
                    ...
                ]
            }
        },
    "val": { ... },
    "test": { ... }
}
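Following this structure, the seg / seg_mask_path convention can be turned into per-pair supervision masks with a small helper. This is a sketch, not official tooling: `MASK_ROOT`, `MASK_SIZE`, and `load_answer_mask` are illustrative names, the resolution is a placeholder, and the positive branch assumes Pillow is installed and the masks have been extracted locally.

```python
import numpy as np

MASK_ROOT = "lesion_mask"   # folder extracted from lesion_mask.zip
MASK_SIZE = (512, 512)      # illustrative; match your model's input size

def load_answer_mask(pair: dict, size=MASK_SIZE) -> np.ndarray:
    """Return the supervision mask for one instruction-answer pair.

    Positive pairs (seg: true) load the referenced PNG; negative pairs
    (seg_mask_path: null) are supervised with an all-zero mask, as the
    dataset description recommends.
    """
    if pair.get("seg") and pair.get("seg_mask_path"):
        from PIL import Image  # imported lazily; only positive pairs need Pillow
        mask = Image.open(f"{MASK_ROOT}/{pair['seg_mask_path']}").convert("L")
        mask = mask.resize(size, Image.NEAREST)  # nearest keeps the mask binary
        return (np.asarray(mask) > 0).astype(np.uint8)
    return np.zeros(size, dtype=np.uint8)

# A negative pair yields an empty (all-zero) mask:
empty = load_answer_mask({"seg": False, "seg_mask_path": None})
```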

Usage Notes

The following Python code snippet demonstrates how to load the dataset and access specific samples.

import json

# --- Configuration ---
SPLIT = 'train'

# Update the path to your actual dataset location
JSON_PATH = 'mimic_ils_instruction_answer.json'

# --- Load Dataset ---
with open(JSON_PATH, 'r') as f:
    dataset = json.load(f)

# --- Visualize Samples ---
print(f"Successfully loaded dataset. Visualizing samples from '{SPLIT}' split...\n")

for i, (study_id, data) in enumerate(dataset[SPLIT].items()):
    if i >= 5:
        break

    print(f"{'='*20} Sample {i+1} (Study ID: {study_id}) {'='*20}")
    print(f"Image Path: {data['image_path']}\n")

    pairs_data = data['instruction_answer_pairs']

    # Iterate over pair types to reduce code duplication
    for pair_type in ['positive_pairs', 'negative_pairs']:
        pairs = pairs_data.get(pair_type, [])
        header = pair_type.replace('_', ' ').title()
        print(f"[{header}] - {len(pairs)} item(s)")

        for idx, pair in enumerate(pairs):
            print(f"  {idx+1}. Instruction : {pair['instruction']}")
            print(f"     Answer      : {pair['answer']}")
            print(f"     Mask Path   : {pair['seg_mask_path']}")

        print("-" * 30)
    print("\n")

Intended Usage

The primary intended use of the MIMIC-CXR-Ext-ILS dataset is to enable models to segment seven distinct types of CXR lesions based on simple natural language instructions.

A key intended outcome of models trained on this dataset is usability for laypersons. Users without medical expertise cannot be expected to visually identify lesions and provide precise instructions. However, because MIMIC-CXR-Ext-ILS extensively incorporates negative cases for absence confirmation, a model trained on this dataset can overcome this limitation. Through the following interactions, users can obtain effective screening results without needing to visually inspect the image themselves:

  • Iterative Querying: Users can iteratively query the model across various anatomical locations. The system will autonomously verify the presence of a lesion—outputting a precise segmentation mask if it exists, or confirming its absence otherwise.

  • Global Instructions: Users can make broad, intuitive inquiries (e.g., "Segment the opacity"). Coupled with the model’s robust handling of negative cases, this allows for comprehensive and automated visual screening.

Flexible Usage

While MIMIC-CXR-Ext-ILS is primarily designed for instruction-guided lesion segmentation, its rich metadata allows for flexible usage tailored to specific research needs. Users can filter and extract internal information to create customized subsets. For example, by selecting cases with global instructions (e.g., "Segment the effusion") and their corresponding masks, one can train a specialized model dedicated solely to effusion segmentation.
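Such a subset can be extracted with a simple filter over the JSON file. This is a sketch under stated assumptions: `filter_pairs` is an illustrative helper name, and the `demo` dictionary below is an in-memory stand-in for the real `mimic_ils_instruction_answer.json` loaded as in the Usage Notes.

```python
def filter_pairs(dataset: dict, split: str, target: str, pair_type: str) -> list:
    """Collect (study_id, pair) tuples matching a lesion type and an
    instruction type, e.g. all global effusion instructions."""
    subset = []
    for study_id, study in dataset[split].items():
        groups = study["instruction_answer_pairs"]
        for group in ("positive_pairs", "negative_pairs"):
            for pair in groups.get(group, []):
                if pair["target"] == target and pair["type"] == pair_type:
                    subset.append((study_id, pair))
    return subset

# In-memory stand-in for the real JSON file (structure only):
demo = {"train": {"s1": {"instruction_answer_pairs": {
    "positive_pairs": [
        {"target": "effusion", "type": "global",
         "instruction": "Segment the effusion."}
    ],
    "negative_pairs": [],
}}}}
effusion_global = filter_pairs(demo, "train", "effusion", "global")
```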

Data Access and Dependency

Please note that this project provides only the derived lesion segmentation masks and instruction-answer annotations. The corresponding CXR images are not included in this repository. To use this dataset, users must separately request and access the source images through the parent MIMIC-CXR and MIMIC-CXR-JPG projects.

Limitations

As the lesion masks and instruction-answer pairs in the MIMIC-CXR-Ext-ILS dataset were generated automatically, users should be aware that the data may contain occasional label or localization errors. Additionally, there may be inherent biases in the distribution of the generated instructions. We acknowledge these artifacts of the automated pipeline and plan to continuously refine the dataset to address these distributional biases and improve annotation quality in future updates.


Release Notes

Version 1.0.0: Initial release

For any questions or inquiries regarding this dataset, please contact us at choigeon@kaist.ac.kr.


Ethics

The MIMIC-CXR-Ext-ILS dataset is derived from the parent MIMIC-CXR and MIMIC-CXR-JPG datasets. As a derived work, this project strictly adheres to the same controlled-access policy, license, and data use agreement (DUA) established by the parent projects. Furthermore, no new protected health information (PHI) has been introduced, reconstructed, or exposed in this derived dataset.


Conflicts of Interest

The authors have no conflicts of interest to declare.


References

  1. Broder JS. Diagnostic Imaging for the Emergency Physician E-Book: Diagnostic Imaging for the Emergency Physician E-Book. Elsevier Health Sciences; 2011 Mar 15.
  2. De Coronado S, Haber MW, Sioutos N, Tuttle MS, Wright LW. NCI Thesaurus: using science-based terminology to integrate cancer research results. InMedinfo 2004 Jan 1 (pp. 33-37).
  3. Lai X, Tian Z, Chen Y, Li Y, Yuan Y, Liu S, Jia J. Lisa: Reasoning segmentation via large language model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024;9579-9589.
  4. Lan M, Chen C, Zhou Y, Xu J, Ke Y, Wang X, Feng L, Zhang W. Text4Seg: Reimagining image segmentation as text generation. arXiv. 2024. arXiv:2410.09855.
  5. Ren Z, Huang Z, Wei Y, Zhao Y, Fu D, Feng J, Jin X. Pixellm: Pixel reasoning with large multimodal model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024;26374-26383.
  6. Huang X, Li H, Cao M, Chen L, You C, An D. Cross-modal conditioned reconstruction for language-guided medical image segmentation. IEEE Transactions on Medical Imaging. 2024 Dec 26;44(4):1821-35.
  7. Li Z, Li Y, Li Q, Wang P, Guo D, Lu L, Jin D, Zhang Y, Hong Q. Lvit: language meets vision transformer in medical image segmentation. IEEE transactions on medical imaging. 2023 Jul 3;43(1):96-107.
  8. Boecking B, Usuyama N, Bannur S, Castro DC, Schwaighofer A, Hyland S, et al. Making the most of text semantics to improve biomedical vision–language processing. In Proceedings of the European Conference on Computer Vision. 2022;1-21.
  9. de Castro DC, Bustos A, Bannur S, Hyland SL, Bouzid K, Wetscherek MT, Sánchez-Valverde MD, Jaques-Pérez L, Pérez-Rodríguez L, Takeda K, Salinas-Serrano JM. Padchest-gr: A bilingual chest x-ray dataset for grounded radiology report generation. NEJM AI. 2025 Jun 26;2(7):AIdbp2401120.
  10. Liu Y, Wu YH, Ban Y, Wang H, Cheng MM. Rethinking computer-aided tuberculosis diagnosis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020;2646-2655.
  11. Nguyen HQ, Lam K, Le LT, Pham HH, Tran DQ, Nguyen DB, Le DD, Pham CM, Tong HT, Dinh DH, Do CD. VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations. Scientific Data. 2022 Jul 20;9(1):429.
  12. Zawacki A, Wu C, Shih G, Elliott J, Fomitchev M, Hussain M, Lakhani P, Culliton P, Bao S. SIIM-ACR Pneumothorax Segmentation. Kaggle. 2019. Accessed March 20, 2026. https://kaggle.com/competitions/siim-acr-pneumothorax-segmentation
  13. Degerli A, Kiranyaz S, Chowdhury ME, Gabbouj M. Osegnet: Operational segmentation network for COVID-19 detection using chest X-ray images. In Proceedings of the IEEE International Conference on Image Processing (ICIP). 2022;2306-2310.
  14. Danilov VV, Proutski A, Karpovsky A, Kirpich A, Litmanovich D, Nefaridze D, Talalov O, Semyonov S, Koniukhovskii V, Shvartc V, Gankin Y. Indirect supervision applied to COVID-19 and pneumonia classification. Informatics in Medicine Unlocked. 2022 Jan 1;28:100835.
  15. Johnson A, Pollard T, Mark R, Berkowitz S, Horng S. MIMIC-CXR Database. PhysioNet. 2024. doi:10.13026/C2JT1Q.
  16. Johnson AEW, Pollard TJ, Mark RG, Berkowitz SJ, Horng S. MIMIC-CXR Database (version 2.0.0). PhysioNet. 2024. doi:10.13026/C2JT1Q.
  17. Johnson AE, Pollard TJ, Greenbaum NR, Lungren MP, Deng CY, Peng Y, Lu Z, Mark RG, Berkowitz SJ, Horng S. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042. 2019 Jan 21.
  18. Johnson A, Lungren M, Peng Y, Lu Z, Mark R, Berkowitz S, Horng S. MIMIC-CXR-JPG: chest radiographs with structured labels. PhysioNet. 2019.
  19. Choi G, Yoon H, Shin H, Park H, Seo SH, Yang E, Choi E. Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset. arXiv preprint arXiv:2511.15186. 2025 Nov 19.
  20. Pérez-García F, Bond-Taylor S, Sanchez PP, van Breugel B, Castro DC, Sharma H, Salvatelli V, Wetscherek MT, Richardson H, Lungren MP, Nori A. Radedit: stress-testing biomedical vision models via diffusion image editing. InEuropean Conference on Computer Vision 2024 Sep 29 (pp. 358-376). Cham: Springer Nature Switzerland.
  21. Seibold C, Jaus A, Fink MA, Kim M, Reiß S, Herrmann K, et al. Accurate fine-grained segmentation of human anatomy in radiographs via volumetric pseudo-labeling. arXiv. 2023. arXiv:2306.03934.
  22. Nguyen D, Ho MK, Ta H, Nguyen TT, Chen Q, Rav K, Dang QD, Ramchandre S, Phung SL, Liao Z, To MS. Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs. arXiv preprint arXiv:2505.00744. 2025 Apr 30.
  23. Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. InProceedings of the IEEE conference on computer vision and pattern recognition 2017 (pp. 2097-2106).
  24. Gaggion N, Mansilla L, Mosquera C, Milone DH, Ferrante E. Improving anatomical plausibility in medical image segmentation via hybrid graph neural networks: applications to chest x-ray analysis. IEEE Transactions on Medical Imaging. 2022 Nov 24;42(2):546-56.

Parent Projects

MIMIC-CXR-Ext-ILS: Lesion Segmentation Masks and Instruction-Answer Pairs for Chest X-rays was derived from the MIMIC-CXR and MIMIC-CXR-JPG projects. Please cite them when using this project.
Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research
