
Eye Gaze Data for Chest X-rays

Alexandros Karargyris, Satyananda Kashyap, Ismini Lourentzou, Joy Wu, Matthew Tong, Arjun Sharma, Shafiq Abedin, David Beymer, Vandana Mukherjee, Elizabeth Krupinski, Mehdi Moradi

Published: Sept. 12, 2020. Version: 1.0.0

When using this resource, please cite:
Karargyris, A., Kashyap, S., Lourentzou, I., Wu, J., Tong, M., Sharma, A., Abedin, S., Beymer, D., Mukherjee, V., Krupinski, E., & Moradi, M. (2020). Eye Gaze Data for Chest X-rays (version 1.0.0). PhysioNet.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.


Abstract

We created a rich multimodal dataset for the chest X-ray (CXR) domain. The data was collected using an eye tracking system while a radiologist interpreted and read 1,083 public CXR images. The dataset contains the following aligned modalities: image, transcribed report text, dictation audio, and eye gaze data. We hope this dataset can contribute to various fields of research with applications in machine learning, such as deep learning explainability, multimodal fusion, disease classification, and automated radiology report generation. The images were selected from the MIMIC-CXR Database and were associated with studies from 1,038 subjects (female: 495, male: 543) aged 20 to 80 years.


Background

CXR is the most common imaging modality in the United States, making up as much as 74% of all imaging studies ordered by physicians [1]. In recent years, with the proliferation of deep learning techniques and publicly available CXR datasets ([2], [3], [4]), numerous machine learning approaches have been proposed and deployed in radiology settings for disease detection.

Eye tracking in radiology has been extensively studied for the purposes of education, perception understanding, and fatigue measurement (see the literature reviews [5], [6], [7], [8] for more details). More recently, efforts such as [9], [10], and [11] have used eye gaze data to improve segmentation and disease classification in Computed Tomography (CT) radiography by integrating the gaze data into deep learning techniques.

Currently, there is a lack of public datasets that capture eye gaze data in the CXR domain. Given the promise of such data for machine learning, we are releasing this first-of-its-kind dataset so that the research community can explore and implement novel applications.


Methods

The dataset was collected using an eye tracking system (GP3 HD Eye Tracker, Gazepoint). An American Board of Radiology (ABR) certified radiologist with 5 years of attending experience interpreted and read 1,083 CXR images. The analysis software (Gazepoint Analysis UX Edition) allowed for recording and exporting of eye gaze data and dictation audio.

To identify the images for this study we used the MIMIC-CXR Database [2], a large public dataset of CXR images, in conjunction with the MIMIC-IV Clinical Database [12], which contains clinical outcomes. Inclusion and exclusion criteria were applied to the Emergency Department clinical notes from the MIMIC-IV Clinical Database [12], resulting in a subset of 1,083 cases covering 3 prominent clinical conditions equally (i.e. Normal, Pneumonia and Congestive Heart Failure (CHF)). The corresponding CXR images of these cases were extracted from the MIMIC-CXR Database [2].

The radiologist read these CXR images using Gazepoint's GP3 HD Eye Tracker, the Gazepoint Analysis UX Edition software, a headset microphone, a PC and a monitor (Dell S2719DGF) set at 1920x1080 resolution. Reading took place in multiple sessions (i.e. 30 cases per session) over a period of 2 months (i.e. March - May 2020). The Gazepoint Analysis UX Edition software exported raw and processed eye gaze fixations (.csv format) and the voice dictation (audio) of the radiologist's reading. The audio files were further processed with speech-to-text software (i.e. Google Speech-to-Text API) to extract text transcripts along with timing information for the dictated words (.json format). These transcripts were then manually corrected. The final dataset contains the eye gaze signal information (.csv), audio files (.wav, .mp3) and transcript files (.json).

Data Description

The dataset consists of the following data documents:

  • master_sheet.csv: Master spreadsheet containing DICOM_IDs (i.e. original MIMIC-CXR Database IDs) along with disease labels.
  • fixations.csv: Spreadsheet containing fixation eye gaze data, keyed by DICOM_ID, as exported by the Gazepoint Analysis UX Edition software.
  • eye_gaze.csv: Spreadsheet containing raw eye gaze data, keyed by DICOM_ID, as exported by the Gazepoint Analysis UX Edition software.
  • bounding_boxes.csv: Spreadsheet containing bounding box coordinates for the anatomical structures, keyed by DICOM_ID.
  • inclusion_exclusion_criteria_outputs: Folder containing 3 spreadsheet files that were generated after applying the inclusion/exclusion criteria. The sampling script can use these 3 files to generate master_sheet.csv. This folder is optional and is shared for reproducibility.
  • audio_segmentation_transcripts: Folder containing, for each DICOM_ID, i) dictation audio files (i.e. mp3, wav), ii) a transcript file (i.e. json), and iii) anatomy segmentation mask files (i.e. png).

The user can move easily between the data documents, and between this dataset and the MIMIC-CXR Database, by using the DICOM_ID.
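As an illustration of linking documents through the DICOM_ID, the following minimal sketch groups the rows of two spreadsheets by their identifier column and attaches the fixation rows to the matching master row. It uses only the Python standard library. The in-memory CSV snippets, the `label` column, the IDs and the helper name `group_rows_by_dicom_id` are hypothetical stand-ins, not part of the dataset.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_dicom_id(csv_file, id_column="DICOM_ID"):
    """Read an open CSV file object and group its rows by the identifier column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[id_column]].append(row)
    return dict(groups)


# Tiny in-memory stand-ins for master_sheet.csv and fixations.csv.
# The IDs, the "label" column and all values are made up for illustration.
master_csv = io.StringIO("DICOM_ID,label\nimg_a,Normal\nimg_b,CHF\n")
fixations_csv = io.StringIO(
    "DICOM_ID,FPOGX,FPOGY\nimg_a,0.40,0.50\nimg_a,0.45,0.55\nimg_b,0.30,0.20\n"
)

master = group_rows_by_dicom_id(master_csv)
fixations = group_rows_by_dicom_id(fixations_csv)

# Attach every fixation row to the master row that shares its DICOM_ID.
joined = {
    dicom_id: {"master": rows[0], "fixations": fixations.get(dicom_id, [])}
    for dicom_id, rows in master.items()
}

print(len(joined["img_a"]["fixations"]))
```

With the real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`. Since bounding_boxes.csv names its identifier column `dicom_id` in lowercase, `id_column="dicom_id"` would be passed for that file.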

NOTE: The bounding_boxes.csv file and the anatomy segmentation mask files are provided as supplemental sources to support in-depth and correlation analysis (e.g. eye gaze vs. anatomical structures) and/or anatomical structure segmentation.

Detailed Description

1) The master_sheet.csv spreadsheet provides the following key information (detailed description found in table_descriptions.pdf):

  • The DICOM_ID column maps each row to the original MIMIC CXR image as well as the rest of the documents in this dataset.
  • Granular disease labels given by the MIMIC-CXR Database [2] (i.e. produced by the CheXpert NLP tool [3])
  • The reason-for-exam sentences sectioned out from the Indication section of the original MIMIC-CXR report

2) The fixations.csv and eye_gaze.csv spreadsheets contain the eye tracking information. They were exported by the Gazepoint Analysis UX Edition software. The difference between fixations.csv and eye_gaze.csv is that the former file is a subset of the latter one.

Specifically, the eye_gaze.csv file contains one (1) row for every data sample collected from the eye tracker, while the fixations.csv file contains a single data entry per fixation. A fixation is defined as the maintaining of the eye gaze on a single location (i.e. an eye gaze cluster). The Gazepoint Analysis UX Edition software generates the fixations.csv file by post-processing (i.e. 'sweeping') the eye_gaze.csv file and storing the last entry for each fixation. Both spreadsheets contain the same columns. Key columns found in both are listed below (detailed description found in table_descriptions.pdf):

  • DICOM_ID: maps rows to the original MIMIC image name.
  • TIME (in secs): the time elapsed in seconds since the last system initialization or calibration (i.e. when a new CXR image was presented to the radiologist)
  • FPOGX: the X coordinate of the fixation point of gaze (POG), as a fraction of the screen size. (0, 0) is top left, (0.5, 0.5) is the screen center, and (1.0, 1.0) is bottom right.
  • FPOGY: the Y coordinate of the fixation point of gaze (POG), as a fraction of the screen size. (0, 0) is top left, (0.5, 0.5) is the screen center, and (1.0, 1.0) is bottom right.
  • X_ORIGINAL: the X coordinate of the fixation in original MIMIC DICOM image coordinates.
  • Y_ORIGINAL: the Y coordinate of the fixation in original MIMIC DICOM image coordinates.
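To make the fractional screen coordinates concrete, the sketch below converts an (FPOGX, FPOGY) pair into pixel positions on a 1920x1080 monitor, the resolution reported for the reading sessions. The function names are hypothetical. Treating values outside the range 0 to 1 as off-screen gaze is an assumption that follows from the coordinate definition, not something stated in the dataset documentation.

```python
def fpog_to_screen_pixels(fpog_x, fpog_y, screen_width=1920, screen_height=1080):
    """Convert fractional gaze coordinates to pixel positions on the screen.

    (0, 0) maps to the top-left corner and (1.0, 1.0) to the bottom-right corner.
    """
    return fpog_x * screen_width, fpog_y * screen_height


def is_on_screen(fpog_x, fpog_y):
    """Assumption: a fraction outside [0, 1] means the gaze fell off the screen."""
    return 0.0 <= fpog_x <= 1.0 and 0.0 <= fpog_y <= 1.0


center = fpog_to_screen_pixels(0.5, 0.5)
print(center)  # (960.0, 540.0)
```

These are screen pixels, not image pixels. For positions on the CXR image itself, the X_ORIGINAL and Y_ORIGINAL columns already give the fixation in the original DICOM image coordinates.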

3) The bounding_boxes.csv contains the following columns:

  • dicom_id: the MIMIC DICOM image name
  • bbox_name: the anatomy name
  • x1: the X coordinate of the top left corner point of the bounding box in original MIMIC DICOM image coordinates
  • y1: the Y coordinate of the top left corner point of the bounding box in original MIMIC DICOM image coordinates
  • x2: the X coordinate of the bottom right corner point of the bounding box in original MIMIC DICOM image coordinates
  • y2: the Y coordinate of the bottom right corner point of the bounding box in original MIMIC DICOM image coordinates
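Because the bounding boxes and the X_ORIGINAL/Y_ORIGINAL fixation columns share the original DICOM image coordinate system, a fixation can be tested against each box directly. The following minimal sketch shows one way to do this. The helper names, the box values and the anatomy names are made up for illustration, and treating box edges as inclusive is an assumption.

```python
def point_in_box(x, y, box):
    """Return True when (x, y) lies inside the box, edges included."""
    return box["x1"] <= x <= box["x2"] and box["y1"] <= y <= box["y2"]


def anatomies_at(x, y, boxes):
    """List the bbox_name of every box that contains the point (x, y)."""
    return [box["bbox_name"] for box in boxes if point_in_box(x, y, box)]


# Hypothetical boxes for a single image: (x1, y1) is the top-left corner
# and (x2, y2) the bottom-right corner, in image coordinates.
boxes = [
    {"bbox_name": "right lung", "x1": 100, "y1": 100, "x2": 400, "y2": 700},
    {"bbox_name": "left lung", "x1": 500, "y1": 100, "x2": 800, "y2": 700},
    {"bbox_name": "mediastinum", "x1": 350, "y1": 150, "x2": 550, "y2": 750},
]

print(anatomies_at(380, 300, boxes))  # ['right lung', 'mediastinum']
```

Boxes may overlap, so a fixation can fall inside more than one structure. Values read from bounding_boxes.csv with a CSV reader arrive as strings and would need converting with `float()` before the comparison.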

4) The audio_segmentation_transcripts folder contains subfolders named using DICOM_IDs. Each subfolder contains the following files:

  • audio.wav: the dictation audio in wav format
  • audio.mp3: the dictation audio in mp3 format
  • transcript.json: the transcript of the dictation audio with timestamps for each spoken phrase. Specifically, the phrase tag contains the phrase text, the begin_time tag contains the time (in seconds) at which dictation of the phrase starts, and the end_time tag contains the time (in seconds) at which it ends.
  • left_lung.png, right_lung.png, mediastinum.png and aortic_knob.png: the manual segmentation images of four (4) key anatomies: left lung, right lung, mediastinum and aortic knob, respectively.
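The phrase timestamps make it possible to look up what was being dictated at a given moment. The sketch below assumes the transcript can be reduced to a list of entries that each carry the phrase, begin_time and end_time tags described above. The exact nesting inside transcript.json may differ, and the sample phrases and the `phrase_at` helper are hypothetical.

```python
import json

# Hypothetical transcript content using the three documented tags.
transcript_text = """
[
  {"phrase": "heart size is normal", "begin_time": 0.0, "end_time": 2.5},
  {"phrase": "lungs are clear", "begin_time": 2.5, "end_time": 4.0}
]
"""


def phrase_at(time_sec, phrases):
    """Return the phrase being dictated at time_sec, or None if there is none."""
    for entry in phrases:
        if entry["begin_time"] <= time_sec < entry["end_time"]:
            return entry["phrase"]
    return None


phrases = json.loads(transcript_text)
print(phrase_at(3.0, phrases))  # lungs are clear
```

Pairing a fixation with a phrase in this way additionally assumes that the TIME column of the gaze spreadsheets and the transcript timestamps are measured from the same starting point, which should be verified against table_descriptions.pdf before relying on it.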

Usage Notes

The dataset requires access to the CXR DICOM images found in the MIMIC-CXR Database [2]. In general, users are advised to use the fixations.csv spreadsheet for their experiments because it contains the eye gaze signal as post-processed by the Gazepoint Analysis UX Edition software. However, users who want access to the raw sampled eye gaze signal should use eye_gaze.csv.
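For users who start from the raw samples, the relationship between the two spreadsheets (one row per sample versus the last entry of each fixation) can be sketched as follows. This is not the Gazepoint software's actual algorithm. It assumes each raw sample carries a fixation identifier, here hypothetically named `FPOGID`, and simply keeps the last sample seen for each identifier.

```python
def last_sample_per_fixation(samples, fixation_key="FPOGID"):
    """Keep only the final sample of each fixation, in order of first appearance."""
    last = {}
    for sample in samples:
        # A later sample with the same fixation ID overwrites the earlier one.
        last[sample[fixation_key]] = sample
    return list(last.values())


# Hypothetical raw gaze samples: two fixations with two samples each.
raw_samples = [
    {"FPOGID": 1, "TIME": 0.00, "FPOGX": 0.40, "FPOGY": 0.50},
    {"FPOGID": 1, "TIME": 0.02, "FPOGX": 0.41, "FPOGY": 0.50},
    {"FPOGID": 2, "TIME": 0.04, "FPOGX": 0.70, "FPOGY": 0.30},
    {"FPOGID": 2, "TIME": 0.06, "FPOGX": 0.71, "FPOGY": 0.31},
]

fixation_rows = last_sample_per_fixation(raw_samples)
print([row["TIME"] for row in fixation_rows])  # [0.02, 0.06]
```

In practice the samples would be grouped per DICOM_ID first, since a fixation identifier is only meaningful within a single reading.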

As mentioned in the Data Description section, the user can combine information from the different data documents by using the DICOM_ID tag found across all of them. Examples of data usage can be found at

Release Notes

Version 1.0.0: Initial upload of dataset 

Conflicts of Interest

No conflicts of interest to declare


References

  1. Mettler, F. A., Bhargavan, M., Faulkner, K., Gilley, D. B., Gray, J. E., Ibbott, G. S., Lipoti, J. A., Mahesh, M., McCrohan, J. L., Stabin, M. G., Thomadsen, B. R., and Yoshizumi, T. T., “Radiologic and Nuclear Medicine Studies in the United States and Worldwide: Frequency, Radiation Dose, and Comparison with Other Radiation Sources 1950–2007,” Radiology 253, 520–531 (Nov 2009)
  2. Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. (2019). MIMIC-CXR Database (version 2.0.0). PhysioNet.
  3. Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., Haghgoo, B., Ball, R., Shpanskaya, K., et al., 2019. CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. arXiv preprint arXiv:1901.07031
  4. Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE CVPR 2017
  5. Stephen Anthony Waite, Arkadij Grigorian, Robert G Alexander, Stephen Louis Macknik, Marisa Carrasco, David Heeger, and Susana Martinez-Conde. 2019. Analysis of perceptual expertise in radiology–Current knowledge and a new perspective. Frontiers in human neuroscience 13 (2019), 213
  6. Van der Gijp, A., Ravesloot, C., Jarodzka, H., Van der Schaaf, M., Van der Schaaf, I., Van Schaik, J., & Ten Cate, T. J. (2016). How visual search relates to visual diagnostic performance: a narrative systematic review of eye-tracking research in radiology. Advances in Health Sciences Education, 1-23. doi: 10.1007/s10459-016-9698-1
  7. Krupinski, E. A. (2010). Current perspectives in medical image perception. Attention, Perception, & Psychophysics, 72(5), 1205–1217.
  8. Tourassi G, Voisin S, Paquit V, Krupinski E: Investigating the link between radiologists’ gaze, diagnostic decision, and image content. J Am Med Inform Assoc 20(6):1067–1075, 2013
  9. Khosravan N, Celik H, Turkbey B, Jones EC, Wood B, Bagci U: A collaborative computer aided diagnosis (C-CAD) system with eye-tracking, sparse attentional model, and deep learning. Med Image Anal 51:101–115, 2019
  10. Stember, J.N., Celik, H., Krupinski, E. et al. Eye Tracking for Deep Learning Segmentation Using Convolutional Neural Networks. J Digit Imaging 32, 597–604 (2019).
  11. Aresta, Guilherme, et al. "Automatic lung nodule detection combined with gaze information improves radiologists' screening performance." IEEE Journal of Biomedical and Health Informatics (2020).
  12. Johnson, Alistair, et al. "MIMIC-IV" (version 0.4). PhysioNet (2020)

Parent Projects
Eye Gaze Data for Chest X-rays was derived from: Please cite them when using this project.

Access Policy:
Only PhysioNet credentialed users who sign the specified DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0
