Database Restricted Access

REFLACX: Reports and eye-tracking data for localization of abnormalities in chest x-rays

Ricardo Bigolin Lanfredi, Mingyuan Zhang, William Auffermann, Jessica Chan, Phuong-Anh Duong, Vivek Srikumar, Trafton Drew, Joyce Schroeder, Tolga Tasdizen

Published: Sept. 27, 2021. Version: 1.0.0

When using this resource, please cite:
Bigolin Lanfredi, R., Zhang, M., Auffermann, W., Chan, J., Duong, P., Srikumar, V., Drew, T., Schroeder, J., & Tasdizen, T. (2021). REFLACX: Reports and eye-tracking data for localization of abnormalities in chest x-rays (version 1.0.0). PhysioNet.

Additionally, please cite the original publication:

Bigolin Lanfredi, R., Zhang, M., Auffermann, W., Chan, J., Duong, P., Srikumar, V., Drew, T., Schroeder, J., & Tasdizen, T. (2021). REFLACX, a dataset of reports and eye-tracking data for localization of abnormalities in chest x-rays.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Labels localizing anomalies are rare in current chest x-ray datasets. As a proof of concept for collecting implicit localization data with an eye tracker, we collected a dataset using a method that could potentially be scaled up. This dataset, named REFLACX, provides eye-tracking data collected while radiologists dictated reports for frontal chest x-rays from the MIMIC-CXR dataset, paired with the timestamped transcriptions of the dictations. The dataset contains 3,032 cases of synchronized eye-tracking and transcription pairs, labeled by five radiologists. For each case, we also provide additional labels for validating algorithms derived from this dataset: image-level labels of anomalies found during the reading, ellipses localizing these anomalies, and bounding boxes around the lungs and heart. A subset of 109 chest x-rays was read by all radiologists, which allows the variability of the provided labels to be estimated.

Background

Li et al. have shown that localization information improves the performance of a model when used as weak supervision for chest x-ray classification [1]. Although some datasets provide this localization information as manually labeled bounding boxes [2-4], such datasets are rare and usually contain relatively few bounding boxes. To develop a potentially scalable method for acquiring this type of data, we collected a dataset with implicit localization of anomalies in chest x-rays through eye-tracking data recorded while radiologists dictated reports. This dataset is also in line with the prioritization of automatic labeling techniques in the NIH's roadmap for AI in medical imaging [5].

Our data collection involved five radiologists using a custom interface that mimics clinical practice. We sampled the displayed chest x-rays from the MIMIC-CXR dataset [6]. Radiologists dictated reports for these images while eye-tracking data and audio were recorded and synchronized with each other. They also provided manual labels for each image: a chest bounding box, including the lungs and heart, used to normalize the chest location, and, for validating algorithms developed with this dataset, image-level labels and ellipses localizing the abnormalities. Data collection was separated into three phases, with the last phase comprising the bulk of our dataset. In phases 1 and 2, every radiologist read the same set of chest x-rays, which allows the labeling variability among readers to be estimated. In phase 3, each radiologist read a separate set of images.

Methods

Images were sampled from the MIMIC-CXR dataset [6-8]. Before sampling, we filtered them to include only frontal x-rays, only studies with exactly one frontal x-ray, and only images present in the labels table of the MIMIC-CXR-JPG dataset [8-10]. Images were sampled so that 20% came from the MIMIC-CXR test set and 80% from the rest of the dataset. After sampling, images with large parts of the lungs missing, images that were highly rotated, and images that were clearly flipped were excluded. For phase 3, images with anonymization rectangles were also excluded.
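
The filtering and 80/20 sampling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record fields (dicom_id, study_id, view, split, in_labels_table) are hypothetical stand-ins for information available in the MIMIC-CXR and MIMIC-CXR-JPG metadata, and treating the PA and AP view positions as "frontal" is an assumption. The manual exclusion of rotated, flipped, or incomplete images after sampling is not modeled.

```python
import random
from collections import Counter

# Assumption: view positions that count as "frontal".
FRONTAL_VIEWS = {"PA", "AP"}

def eligible_images(records):
    """Frontal images from studies with exactly one frontal image that are in the labels table.

    Each record is a dict with hypothetical keys:
    dicom_id, study_id, view, split, in_labels_table.
    """
    frontal = [r for r in records if r["view"] in FRONTAL_VIEWS]
    frontal_per_study = Counter(r["study_id"] for r in frontal)
    return [
        r for r in frontal
        if frontal_per_study[r["study_id"]] == 1 and r["in_labels_table"]
    ]

def sample_images(candidates, n_total, test_fraction=0.2, seed=0):
    """Draw n_total images, test_fraction of them from the MIMIC-CXR test split."""
    rng = random.Random(seed)
    test = [r for r in candidates if r["split"] == "test"]
    rest = [r for r in candidates if r["split"] != "test"]
    n_test = round(n_total * test_fraction)
    # random.sample raises ValueError if a pool is smaller than requested.
    return rng.sample(test, n_test) + rng.sample(rest, n_total - n_test)
```

Filtering before sampling keeps the 20%/80% ratio exact among the images that are actually eligible.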

We used a custom interface built with MATLAB R2019a and Psychtoolbox 3.0.7 [11-13] for data recording, radiologist interaction, and image display on a 4K screen. The interface supported the dictation of reports, followed by manual labeling and transcription correction, and allowed zooming, panning, and windowing of the chest x-rays. To collect the eye-tracking data, we used an EyeLink 1000 Plus with a 25 mm lens. The eye tracker was used in remote mode, which required the radiologists to wear a sticker on their forehead. They had some freedom of movement, as long as the sticker and the tracked eye stayed in the camera's field of view and the distance to the camera did not change enough to defocus the image. To calibrate the correspondence between eye state and on-screen gaze location, the radiologists looked at a sequence of 13 points in different screen locations. Calibrations were performed at the beginning of each data collection session, after breaks, after every 25 cases without a calibration, and whenever data-quality problems arose that a calibration could fix. Dictation audio was recorded with a PowerMic II microphone at 48,000 Hz and transcribed by an IBM Watson Speech-To-Text model trained, depending on the phase, with reports from the MIMIC-CXR dataset, reports from our radiologists, and audio from our radiologists. Transcriptions were corrected by the radiologists and, for clear mistakes, by another person.

Eye-tracking data were automatically parsed for fixations, i.e., periods when the gaze stabilizes at a specific position, by the EyeLink 1000 Host PC, which was the system recording all the eye-tracking data. The data collected by the MATLAB interface, including dictation audio, were synchronized with the eye-tracking data through synchronization messages sent from MATLAB to the EyeLink 1000 Host PC. Because the transcriptions were corrected, the timestamps had to be adapted to the new set of words. We computed the differences between the original and corrected transcriptions, i.e., groups of consecutive words that differ between the two. For each difference, we interpolated the timestamps from the syllables of the original words to the syllables of the new words. The eye-tracking data were checked for quality using the duration and percentage of blinks, i.e., moments when the eye tracker did not find the pupil, cornea, or sticker. We removed eye-tracking data when a blink lasted more than 3 s or when blinks represented more than 15% of the data of a case (n=41). We also removed eye-tracking data when we found software problems in the saving process (n=6), when radiologists indicated the end of dictation before finishing it (n=7), when the eye tracker identified glasses as the radiologist's eyes (n=2), and when the displayed images should have been excluded by our filtering rules (n=6).
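
A simplified sketch of the timestamp realignment is shown below, assuming the original and corrected transcriptions are lists of words and each original word has a (start, end) pair in seconds. It uses Python's difflib.SequenceMatcher to find groups of differing words and then spreads each group's time span over the corrected words in proportion to their syllable counts. This proportional split is a simplification of the syllable-level interpolation described above, and the vowel-group syllable counter is a crude heuristic chosen for illustration; neither is the authors' implementation.

```python
import difflib
import re

def count_syllables(word):
    # Crude heuristic (assumption): one syllable per run of vowels, at least one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def realign(orig_words, orig_times, new_words):
    """Return (start, end) times for new_words, given timestamped orig_words (non-empty)."""
    new_times = []
    matcher = difflib.SequenceMatcher(a=orig_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            new_times.extend(orig_times[i1:i2])  # unchanged words keep their times
            continue
        if j1 == j2:
            continue  # deleted words: nothing to assign
        if i1 < i2:
            # Time span covered by the replaced original words.
            t0, t1 = orig_times[i1][0], orig_times[i2 - 1][1]
        else:
            # Pure insertion: use the gap between the neighbouring original words.
            t0 = orig_times[i1 - 1][1] if i1 > 0 else orig_times[0][0]
            t1 = orig_times[i1][0] if i1 < len(orig_words) else orig_times[-1][1]
        counts = [count_syllables(w) for w in new_words[j1:j2]]
        total, done = sum(counts), 0
        for n in counts:
            start = t0 + (t1 - t0) * done / total
            done += n
            new_times.append((start, t0 + (t1 - t0) * done / total))
    return new_times
```

For a hypothetical recognition error "plural fusion" corrected to "pleural effusion", the two corrected words share the span of the original pair, split 2:3 according to their estimated syllables.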

Data Description

This dataset is separated into two folders: main_data, which contains the main files for the intended use of the dataset and the metadata tables, and gaze_data, which is larger and contains the complete eye-tracking recordings at 1000 Hz. Both folders contain several subfolders, one for each reading of a chest x-ray, named after the ID assigned to that reading. All IDs and some of their metadata are listed in the metadata tables. The dataset has three metadata tables, one for each phase of data collection: two preliminary phases, collected from November 11, 2020, to January 4, 2021, and from March 1, 2021, to March 11, 2021, in which radiologists read a shared set of 109 chest x-rays, and the main phase, collected from March 24, 2021, to June 7, 2021, with readings of 2,507 chest x-rays. Each subfolder contains several comma-separated tables, one for each type of data collected: fixations, localization ellipses, chest bounding boxes, and timestamped transcriptions. An example of the folder structure for one of the cases is:
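
main_data/
    metadata_phase_1.csv
    metadata_phase_2.csv
    metadata_phase_3.csv
    <id>/
        anomaly_location_ellipses.csv
        chest_bounding_box.csv
        fixations.csv
        timestamps_transcription.csv
        transcription.txt
gaze_data/
    <id>/
        gaze.csv
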
Of the 3,052 IDs in total, 20 have no eye-tracking data or reports: their eye-tracking data were discarded for quality reasons, but the readings were kept in the dataset to calculate agreement scores for the manual labels. All readings together cover 2,616 unique chest x-rays and 2,199 unique subjects. The columns of each table are described below. All timestamp columns are in seconds, counted from the start of the audio recording. All columns representing pixel coordinates have their origin at the top-left corner, with x coordinates along the horizontal axis and y coordinates along the vertical axis.
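
The blink criteria from the Methods (no blink longer than 3 s, and blinks covering at most 15% of a case) can be checked on the 1000 Hz samples of a gaze.csv file. The sketch below assumes, as described for gaze.csv below, that rows with an empty x_position mark moments when the eye was not found, and it treats every such sample as part of a blink. The function names are hypothetical; this illustrates the rule rather than reproducing the authors' quality-control code.

```python
import csv

def missing_flags(lines):
    """True for each gaze.csv row whose x_position is empty (eye not found).

    `lines` can be an open file or any iterable of CSV text lines.
    """
    return [row["x_position"].strip() == "" for row in csv.DictReader(lines)]

def blink_quality_ok(missing, sample_rate_hz=1000, max_blink_s=3.0, max_fraction=0.15):
    """Apply the blink-duration and blink-fraction thresholds to per-sample flags."""
    longest = run = 0
    for is_missing in missing:
        run = run + 1 if is_missing else 0  # length of the current blink, in samples
        longest = max(longest, run)
    fraction = sum(missing) / len(missing) if missing else 0.0
    return longest / sample_rate_hz <= max_blink_s and fraction <= max_fraction

# Usage (the path is a placeholder for a real reading folder):
# with open("gaze_data/<id>/gaze.csv", newline="") as f:
#     keep = blink_quality_ok(missing_flags(f))
```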

  • main_data/metadata_phase_<phase>.csv:
    • id (string): used to identify each chest x-ray reading.
    • split (string): the split (train, validate or test) given by the MIMIC-CXR dataset.
    • eye_tracking_data_discarded (Boolean): True for the 20 readings from phases 1 and 2 whose eye-tracking data were discarded but whose validation labels were kept to calculate the variability scores for these phases. The fixations.csv, gaze.csv, timestamps_transcription.csv, and transcription.txt files are not provided for readings for which this column is True.
    • image (string): folder location of the used chest x-ray in the MIMIC-CXR dataset.
    • dicom_id (string): ID of the chest x-ray that was read, which can be used to link this table with the MIMIC-CXR tables.
    • subject_id (string): ID of the patient of the chest x-ray, from the MIMIC-CXR and MIMIC-IV [8,14] datasets.
    • image_size_x, image_size_y (int): horizontal and vertical sizes of the chest x-ray, in pixels.
    • Airway wall thickening, Atelectasis, Consolidation, Emphysema, Enlarged cardiac silhouette, Fibrosis, Fracture, Groundglass opacity, Mass, Nodule, Pleural effusion, Pleural thickening, Pneumothorax, Pulmonary edema, Wide mediastinum (phase 1); Abnormal mediastinal contour, Acute fracture, Atelectasis, Consolidation, Enlarged cardiac silhouette, Enlarged hilum, Groundglass opacity, Hiatal hernia, High lung volume / emphysema, Interstitial lung disease, Lung nodule or mass, Pleural abnormality, Pneumothorax, Pulmonary edema (phases 2 and 3) (int): certainty of each image-level label, from 0 to 5, given as the maximum certainty selected over all ellipses drawn for that label: 0: not selected by radiologist, 1: Unlikely (<10%), 2: Less Likely (~25%), 3: Possibly (~50%), 4: Suspicious for/Probably (~75%), 5: Consistent with (>90%), following the definitions by Panicek et al. [15].
    • Quality issue, Support devices (phase 1); Support devices (phases 2 and 3) (Boolean): image-level labels with only presence indicated.
    • Other (string): written additional labels, separated by a "|" symbol.
  • main_data/<id>/fixations.csv: list of fixations, i.e., periods when the radiologist's gaze stabilized at a specific location, for the <id> reading.
    • timestamp_start_fixation, timestamp_end_fixation (float): times when the fixation started and ended. The difference between these two values, i.e., the fixation duration, can be used to weight the importance of that fixation.
    • x_position, y_position (int): average position of the fixation, in image space.
    • pupil_area_normalized (float): area of the pupil, normalized by the pupil area measured during calibration at the beginning of the data collection session.
    • window_level, window_width (float): variables representing the average windowing state over the fixation. Images were shown according to

      $$\text{image\_shown} = \frac{\text{original\_image} - \text{window\_level}}{\text{window\_width}} + 0.5,$$

      where image_shown was then clipped to the range [0, 1], original_image was the loaded DICOM normalized to the range [0, 1] (usually by a division by 4096), window_level varied from 0 to 1, and window_width varied from 1.5e-5 to 2.
    • angular_resolution_x_pixels_per_degree, angular_resolution_y_pixels_per_degree (int): the number of image pixels per degree of visual angle along each image axis, which depends on the screen region where the fixation was and on the zoom level applied to the image.
    • xmin_shown_from_image, ymin_shown_from_image, xmax_shown_from_image, ymax_shown_from_image (int): the part of the image that was shown on the screen.
    • xmin_in_screen_coordinates, ymin_in_screen_coordinates, xmax_in_screen_coordinates, ymax_in_screen_coordinates (int): the screen coordinates where the image was shown. Together with the <…>_shown_from_image columns, these columns represent the zooming and panning state at the start of the fixation.
  • gaze_data/<id>/gaze.csv: list of gaze locations at 1000 Hz. This table has the same columns as the fixations table, except for the timestamp columns, which were replaced by a single timestamp_sample column. Rows with empty values for positions, pupil area, and angular resolutions represent moments when the radiologist's eye was not found, e.g., when radiologists blinked.
  • main_data/<id>/timestamps_transcription.csv:
    • word (string): the transcribed word, after correction by the radiologists and by another person. Periods, commas, and slashes were also treated as words, since the radiologists dictated them.
    • timestamp_start_word, timestamp_end_word (float): times when the dictation of the word started and ended.
  • main_data/<id>/transcription.txt: full corrected transcription in text format.
  • main_data/<id>/anomaly_location_ellipses.csv:
    • xmin, ymin, xmax, ymax (int): extreme coordinates of the drawn ellipse along each axis, in image space.
    • certainty (int): certainty of the presence of the highlighted finding, following the same 0-5 representation as in the metadata table.
    • (Boolean) The remaining columns, one for each possible label, indicate which labels the highlighted finding may represent. Radiologists were asked not to draw ellipses for the labels "Support devices," "Quality issue," and "Other," while drawing ellipses was mandatory for all other labels.
  • main_data/<id>/chest_bounding_box.csv: table with one row representing the drawn chest bounding box around the lungs and heart.
    • xmin, ymin, xmax, ymax (int): extreme coordinates of the bounding box in image space.
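
The windowing formula given for the window_level and window_width columns can be applied as below to approximate what a radiologist saw during a fixation. This is a sketch with hypothetical function names: the default divisor of 4096 follows the "usually" in the description and may not hold for every DICOM, and because the tables store the average windowing over each fixation, the result only approximates the displayed image.

```python
def window_pixel(original, window_level, window_width):
    """Displayed intensity for a DICOM value already normalized to [0, 1]."""
    shown = (original - window_level) / window_width + 0.5
    return min(1.0, max(0.0, shown))  # the displayed value was clipped to [0, 1]

def window_image(raw_pixels, window_level, window_width, max_value=4096):
    """Apply windowing to a 2D list of raw DICOM values, normalized by max_value."""
    return [
        [window_pixel(v / max_value, window_level, window_width) for v in row]
        for row in raw_pixels
    ]
```

With window_level = 0.5 and window_width = 1.0 the mapping is the identity on [0, 1]; smaller widths increase contrast around the level.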

Usage Notes

The chest x-rays read in this study originated from the MIMIC-CXR dataset [6-8], so access to that dataset is needed to make meaningful use of this dataset. The primary uses for which this dataset was designed include:

  • combining fixations into a saliency map,
  • using the sequences of fixations with the timestamped transcriptions to get label-specific localization,
  • validating this localization using the drawn ellipses,
  • using the chest bounding boxes to normalize chest x-ray positioning, and
  • training a model for outputting chest bounding boxes.
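
As an illustration of the first two uses, the sketch below builds a label-specific heatmap: it keeps the fixations that overlap a time window before a label's word was dictated and sums one Gaussian per fixation, weighted by fixation duration as suggested in the fixations table description. The 2 s window, the Gaussian width sigma (in image pixels), and the coarse cell grid are illustrative assumptions, not values chosen by the dataset authors; sigma could, for example, be derived from the angular-resolution columns. Fixations are dicts keyed by the fixations.csv column names, with numeric values, and all function names are hypothetical.

```python
import math

def fixations_in_window(fixations, t_start, t_end):
    """Fixations overlapping [t_start, t_end], in seconds from the audio start."""
    return [f for f in fixations
            if f["timestamp_end_fixation"] >= t_start and f["timestamp_start_fixation"] <= t_end]

def heatmap(fixations, width, height, sigma=50.0, cell=10):
    """Duration-weighted sum of Gaussians, sampled at the centre of each cell, max-normalized."""
    cols, rows = math.ceil(width / cell), math.ceil(height / cell)
    grid = [[0.0] * cols for _ in range(rows)]
    for f in fixations:
        weight = f["timestamp_end_fixation"] - f["timestamp_start_fixation"]
        for r in range(rows):
            dy = (r + 0.5) * cell - f["y_position"]
            for c in range(cols):
                dx = (c + 0.5) * cell - f["x_position"]
                grid[r][c] += weight * math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid

def label_heatmap(fixations, word_start, width, height, window_s=2.0, **kwargs):
    """Heatmap from fixations in the window_s seconds before a word was dictated."""
    selected = fixations_in_window(fixations, word_start - window_s, word_start)
    return heatmap(selected, width, height, **kwargs)
```

The nested loops keep the example dependency-free; for full-resolution maps of whole chest x-rays, a vectorized array implementation would be much faster.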

We provide, in a GitHub repository [16], the MATLAB interface code, the code used to postprocess the data, and examples of how to use the dataset:

  • how to calculate the brightness of the chest x-ray for an additional pupil area normalization,
  • how to generate heatmaps,
  • how to normalize chest x-ray positions using the chest bounding box,
  • how to filter fixations that are out of the chest x-ray, and
  • how to load tables from the dataset.
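
In the spirit of the last three items above, the following sketch loads fixations.csv and chest_bounding_box.csv, drops fixations that fall outside the image, and expresses the remaining ones relative to the chest bounding box. It is an independent illustration with hypothetical function names, not the code from the repository [16]; image_size_x and image_size_y are taken from the metadata table.

```python
import csv

def read_table(lines):
    """Parse CSV lines (an open file or a list of strings) into dicts with numeric fields as float."""
    rows = []
    for row in csv.DictReader(lines):
        parsed = {}
        for key, value in row.items():
            try:
                parsed[key] = float(value)
            except (TypeError, ValueError):  # empty cells, "True"/"False", label text
                parsed[key] = value
        rows.append(parsed)
    return rows

def chest_normalized_fixations(fixation_lines, box_lines, image_size_x, image_size_y):
    """Fixation positions inside the image, rescaled so the chest box spans [0, 1] on each axis."""
    box = read_table(box_lines)[0]  # chest_bounding_box.csv has a single row
    width, height = box["xmax"] - box["xmin"], box["ymax"] - box["ymin"]
    points = []
    for fix in read_table(fixation_lines):
        x, y = fix["x_position"], fix["y_position"]
        if not (0 <= x < image_size_x and 0 <= y < image_size_y):
            continue  # the gaze was on the screen but outside the chest x-ray
        points.append(((x - box["xmin"]) / width, (y - box["ymin"]) / height))
    return points
```

Normalizing to the chest box makes fixation positions comparable across images with different patient positioning and field of view.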

This proof of concept does not perfectly replicate the clinical setting and has limitations that still need to be addressed, including:

  • the need for frequent calibration,
  • the limitation of head movement,
  • the use of a single screen showing only one frontal x-ray,
  • the unavailability of report templates and some types of report modification,
  • the use of a single chest x-ray dataset,
  • the display of only 8-bit intensities, and
  • the intervention of a second person for calibration and real-time data-quality checks.

Release Notes

Initial release of the dataset

Acknowledgements

The collection of this dataset was funded by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award Number R21EB028367. Lauren Williams provided coding examples for the use of Psychtoolbox. David Alonso explained eye-tracking data collection procedures. Christine Picket helped edit the text. Howard Mann was one of the readers of chest x-rays.

Conflicts of Interest

The authors have no conflicts of interest to declare.

References

  1. Li Z, Wang C, Han M, Xue Y, Wei W, Li L-J, et al. Thoracic Disease Identification and Localization with Limited Supervision. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE; 2018.
  2. Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2017.
  3. Nguyen HQ, Lam K, Le LT, Pham HH, Tran DQ, Nguyen DB, Le DD, Pham CM, Tong HT, Dinh DH, Do CD. VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations. arXiv preprint arXiv:2012.15029. 2020 Dec 30.
  4. Shih G, Wu CC, Halabi SS, Kohli MD, Prevedello LM, Cook TS, et al. Augmenting the National Institutes of Health Chest Radiograph Dataset with Expert Annotations of Possible Pneumonia. Radiology: Artificial Intelligence. 2019 Jan;1(1):e180041.
  5. Langlotz CP, Allen B, Erickson BJ, Kalpathy-Cramer J, Bigelow K, Cook TS, et al. A Roadmap for Foundational Research on Artificial Intelligence in Medical Imaging: From the 2018 NIH/RSNA/ACR/The Academy Workshop. Radiology. 2019 Jun;291(3):781–91.
  6. Johnson AEW, Pollard TJ, Berkowitz SJ, Greenbaum NR, Lungren MP, Deng C, et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data. 2019 Dec;6(1).
  7. Johnson A, Pollard T, Mark R, Berkowitz S, Horng S. MIMIC-CXR Database (version 2.0.0). PhysioNet. 2019.
  8. Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):e215–e220.
  9. Johnson A, Lungren M, Peng Y, Lu Z, Mark R, Berkowitz S, Horng S. MIMIC-CXR-JPG - chest radiographs with structured labels (version 2.0.0). PhysioNet. 2019.
  10. Johnson AE, Pollard TJ, Greenbaum NR, Lungren MP, Deng CY, Peng Y, Lu Z, Mark RG, Berkowitz SJ, Horng S. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042. 2019 Jan 21.
  11. Brainard DH. The Psychophysics Toolbox. Spatial Vis. 1997;10(4):433–6.
  12. Pelli DG. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spatial Vis. 1997;10(4):437–42.
  13. Kleiner M, Brainard D, Pelli D, Ingling A, Murray R, Broussard C. What's new in psychtoolbox-3. Perception. 2007;36(14):1-16.
  14. Johnson A, Bulgarelli L, Pollard T, Horng S, Celi LA, Mark R. MIMIC-IV. PhysioNet; 2021.
  15. Panicek DM, Hricak H. How Sure Are You, Doctor? A Standardized Lexicon to Describe the Radiologist's Level of Certainty. American Journal of Roentgenology. 2016 Jul;207(1):2–3.
  16. Bigolin Lanfredi R. ricbl/eyetracking: Code for REFLACX dataset v1.0. 2021. doi:10.5281/zenodo.5501093.

Parent Projects
REFLACX: Reports and eye-tracking data for localization of abnormalities in chest x-rays was derived from: Please cite them when using this project.

Access Policy:
Only logged in users who sign the specified data use agreement can access the files.

License (for files):
PhysioNet Restricted Health Data License 1.5.0
