Database Credentialed Access

CXR-Align: A Benchmark for CXR-Report Alignment with Negations

Hanbin Ko

Published: Aug. 21, 2025. Version: 1.0.0


When using this resource, please cite:
Ko, H. (2025). CXR-Align: A Benchmark for CXR-Report Alignment with Negations (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/7ebc-s018

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.

Abstract

CXR-Align is a benchmark dataset designed to evaluate vision-language processing (VLP) models' ability to accurately interpret negations in chest X-ray (CXR) reports. Negations are prevalent in medical documentation and pose significant challenges for automated analysis, as misinterpretation can lead to critical diagnostic errors. Existing medical VLP systems often inadequately handle negated findings, motivating the creation of CXR-Align to specifically target and mitigate this limitation.

The dataset comprises systematically modified and anonymized clinical reports derived from the MIMIC-CXR database. Each report in CXR-Align includes controlled alterations, chiefly negations of clinically relevant positive findings. Reports describing normal chest X-rays are excluded, ensuring that each entry carries direct diagnostic significance and maximizing the dataset's utility for assessing nuanced language comprehension.

CXR-Align enables researchers to rigorously test the accuracy and robustness of VLP models, advancing the development of systems capable of reliably interpreting complex clinical language. The dataset's structured format and clear diagnostic focus make it an essential resource for researchers aiming to enhance model comprehension of medical negations, ultimately contributing to safer and more effective clinical decision-support tools.


Background

Negations are frequently encountered in medical reporting, particularly in chest X-ray (CXR) analysis, and their accurate interpretation is critical for effective clinical decision-making. However, existing medical vision-language processing (VLP) models often struggle with negation comprehension, risking diagnostic inaccuracies [1,2]. Current datasets and tools rarely handle negation in a structured way, limiting models' ability to align radiology reports with imaging findings. In particular, the subtlety and complexity of clinical negations can produce misinterpretations that significantly affect patient care and outcomes. To address these limitations, CXR-Align is introduced as a dedicated benchmark dataset explicitly designed to evaluate and improve the handling of negations in medical VLP systems. By systematically modifying and anonymizing clinical reports to include controlled negations, the dataset provides a robust framework for assessing model accuracy and reliability. The anticipated benefit of releasing this dataset is to stimulate the development of more sophisticated, clinically relevant VLP models, ultimately leading to safer and more effective clinical decision-support tools.


Methods

CXR-Align draws upon clinical reports from the MIMIC-CXR dataset [3]. Each case in CXR-Align corresponds to a single chest X-ray, ensuring that the text does not describe multiple timepoints or follow-up comparisons. By focusing on standalone examinations, CXR-Align provides a clearer view of clinically relevant findings within a single temporal context.

Inclusion and Exclusion Criteria

The primary inclusion criterion was the presence of at least one clinically relevant positive finding, such as cardiomegaly, atelectasis, edema, pleural effusion, pneumothorax, or consolidation. Reports documenting exclusively "normal" findings were excluded so that the dataset focuses on diagnostically significant scenarios. We additionally removed reports that lacked sufficient textual detail for reliable entity extraction, maintaining consistency and clarity across the corpus.
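
To make the filtering concrete, the sketch below (Python) shows one way the inclusion logic could be implemented, assuming CheXbert-style labels in a pandas DataFrame. The column names follow CheXbert conventions, and the minimum-length threshold is an assumed stand-in for the textual-detail criterion; neither reflects the exact code used to build the dataset.

import pandas as pd

TARGET_FINDINGS = ["Cardiomegaly", "Atelectasis", "Edema",
                   "Pleural Effusion", "Pneumothorax", "Consolidation"]

def select_reports(df, min_chars=100):
    """Keep reports with at least one positive target finding.

    Assumes one row per report with CheXbert labels (1.0 = positive) and a
    'report_text' column; min_chars is an assumed proxy for textual detail.
    """
    has_positive = (df[TARGET_FINDINGS] == 1.0).any(axis=1)
    normal_only = df["No Finding"] == 1.0          # drop "normal"-only reports
    detailed = df["report_text"].str.len() >= min_chars
    return df[has_positive & ~normal_only & detailed]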

Negation Pipeline and Report Variants

After filtering, CheXbert [4] was used to identify positive clinical entities in each report. For every report containing a suitable positive finding, we standardized the text by removing irrelevant or repetitive details and normalizing terminology (e.g., mapping the synonym "cardiac enlargement" to "cardiomegaly"). Each processed report then underwent two transformations:

  1. Removal of Findings: All content related to the chosen entity was deleted to prevent contradictory statements and pave the way for a controlled negation step.

  2. Insertion of a Negated Statement: A sentence negating the selected finding was inserted at a random position (beginning, middle, or end) within the standardized report. For mediastinal-related findings, we used one of the following predefined sentences:

    • "The cardiomediastinal silhouette is normal."

    • "The cardiac silhouette is unremarkable."

    • "The heart size is normal."

    • "The cardiomediastinal silhouette is within normal limits."

    • "No cardiomegaly."

    For other findings, we employed one of these standard templates:

    • "No (finding) is seen."

    • "No (finding) is observed."

    • "There is no (finding)."

    • "No evidence of (finding)."

If all references to the entity had already been removed, the negated statement was appended to the end of the report. This process yielded two modified variants of each original report: one lacking the targeted finding and one containing a newly introduced negation.
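
The insertion step can be sketched as follows (Python). This is an illustrative reimplementation rather than the released pipeline; the sentence-level representation and function names are assumptions.

import random

MEDIASTINAL_TEMPLATES = [
    "The cardiomediastinal silhouette is normal.",
    "The cardiac silhouette is unremarkable.",
    "The heart size is normal.",
    "The cardiomediastinal silhouette is within normal limits.",
    "No cardiomegaly.",
]

GENERIC_TEMPLATES = [
    "No {f} is seen.",
    "No {f} is observed.",
    "There is no {f}.",
    "No evidence of {f}.",
]

def insert_negation(sentences, finding, mediastinal=False):
    """Insert a negated statement into an 'omitted' report at a random spot.

    `sentences` is the standardized report, split into sentences, with all
    mentions of `finding` already removed (transformation 1). Returns the
    'negation' variant and the recorded insertion location.
    """
    pool = MEDIASTINAL_TEMPLATES if mediastinal else GENERIC_TEMPLATES
    negation = random.choice(pool).format(f=finding)  # no-op for mediastinal
    if not sentences:                  # nothing left: append at the end
        return [negation], "end"
    idx = random.randint(0, len(sentences))
    location = ("beginning" if idx == 0
                else "end" if idx == len(sentences) else "middle")
    return sentences[:idx] + [negation] + sentences[idx:], location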

Quality Assurance

To ensure clinical accuracy, we randomly sampled 500 negation-annotated reports from the final corpus. A researcher with four years of experience in chest X-ray analysis examined these reports, verifying that the negations preserved clinical integrity and that no contradictory statements remained. Any inconsistencies, from incomplete negations to minor grammatical slips, were resolved to uphold high data quality standards.

Dataset and Documentation

The resulting dataset includes each original report alongside its "removal" and "negation" variants, paired with the corresponding chest X-ray images. All scripts and prompt templates used for this pipeline are available in a public repository [5], enabling other researchers to reproduce or extend the CXR-Align approach for their own clinical language-processing initiatives.


Data Description

CXR-Align is structured as a JSON file containing the primary key "mimic". This key maps to a dictionary of object identifiers, each corresponding to a specific clinical report. Each object identifier includes the following sub-fields:

  • "report": The original report text segmented into individual statements.

  • "chosen": The clinical entity selected for negation or omission.

  • "location": The position within the report (beginning, middle, or end) where the negated statement is introduced.
  • "omitted": The report version from which all sentences containing the chosen clinical entity have been removed. This tests model comprehension of the full context by assessing similarities with the original report.
  • "negation": The modified report created from the "omitted" report by inserting a manually defined negated statement at the specified "location." If a mediastinal-related finding is chosen, one of the predefined statements (e.g., "The heart size is normal," "No cardiomegaly") is added. For other findings, negation statements such as "No (finding) is seen," or "There is no (finding)" are used.

This structured format facilitates precise and reproducible evaluations for vision-language processing research.
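
As a usage illustration, the file can be loaded with the standard library; the filename "cxr_align.json" is an assumption and may differ in the released archive.

import json

with open("cxr_align.json") as f:
    data = json.load(f)["mimic"]

# Inspect a few entries; field names follow the schema described above.
for object_id, entry in list(data.items())[:3]:
    print(object_id, "| chosen:", entry["chosen"],
          "| location:", entry["location"])
    print("  original:", entry["report"])
    print("  negation:", entry["negation"])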


Usage Notes

Researchers are encouraged to use CXR-Align to evaluate and benchmark medical vision-language models, especially for tasks involving negation handling, radiology report alignment, diagnostic accuracy assessment, and clinical decision support. The dataset is particularly useful for analyzing model robustness and interpretability in the presence of complex linguistic negations.
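
One simple evaluation protocol, sketched below, checks whether a model scores the paired image as more similar to the original report than to its negated variant. The encode_image and encode_text callables are placeholders for the encoders of the model under test; they are not part of CXR-Align.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def negation_accuracy(cases, encode_image, encode_text):
    """Fraction of cases where the original report outscores its negation.

    `cases` yields (image, original_report, negated_report) triples; the
    encoders are supplied by the vision-language model being evaluated.
    """
    correct = total = 0
    for image, original, negated in cases:
        v = encode_image(image)
        correct += cosine(v, encode_text(original)) > cosine(v, encode_text(negated))
        total += 1
    return correct / total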

Known limitations of the dataset include the domain specificity of findings and negation patterns derived exclusively from chest X-ray reports, which may restrict its applicability to other medical imaging modalities or clinical domains. Future developments will aim to increase the diversity of negation prompts, currently limited to predefined templates, to enhance the dataset’s robustness and applicability.

Users must cite both the CXR-Align dataset and the original MIMIC-CXR dataset [3] in any related research or publications. Any modifications or extensions of this dataset should explicitly reference the original CXR-Align benchmark. The tools and code used for data modification are available in the accompanying repository [5].


Release Notes

Version 1.0.0: Initial release of the CXR-Align dataset.

Content: Includes modified clinical reports derived from the MIMIC-CXR dataset, structured in JSON format. CheXpert reports were explicitly excluded due to licensing restrictions.

Features: Controlled negations, omission of selected entities, and precise annotation of negation locations.

Purpose: Designed specifically for evaluating the comprehension of negations by vision-language processing models.


Ethics

This dataset comprises modified clinical reports derived from publicly available datasets (MIMIC-CXR), which were previously anonymized and de-identified by their original creators. No additional protected health information (PHI) was collected or used in this work. The modifications introduced into the reports involve only systematic negations and entity omissions for benchmarking purposes and do not pose additional risks to patient privacy. The dataset promotes beneficial advancements in medical artificial intelligence and clinical decision-making by improving models' ability to correctly interpret negations. There are no foreseeable risks associated with the use of this dataset.


Conflicts of Interest

No conflicts of interest.


References

  1. Bannur S, Hyland S, Liu Q, Perez-Garcia F, Ilse M, Castro DC, et al. Learning to exploit temporal structure for biomedical vision-language processing. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023.
  2. Singh J, Shrivastava I, Vatsa M, Singh R, Bharati A. Learn "no" to say "yes" better: Improving vision-language models via negations. arXiv preprint arXiv:2403.20312; 2024.
  3. Johnson AE, Pollard TJ, Berkowitz SJ, Greenbaum NR, Lungren MP, Deng CY, Mark RG, Horng S. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data. 2019;6:317.
  4. Smit A, Jain S, Rajpurkar P, Pareek A, Ng AY, Lungren MP. CheXbert: combining automatic labelers and expert annotations for accurate radiology report labeling using BERT. arXiv preprint arXiv:2004.09167; 2020.
  5. CXR-Align [Internet]. Available from: https://github.com/lukeingawesome/cxralign/tree/main/cxr-align [Accessed 2025 Jul 29].

Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research

Discovery

DOI (version 1.0.0):
https://doi.org/10.13026/7ebc-s018

DOI (latest version):
https://doi.org/10.13026/vvdj-2j30

