Database Credentialed Access
Eye Gaze Data for Chest X-rays
Published: Sept. 12, 2020. Version: 1.0.0
When using this resource, please cite:
Karargyris, A., Kashyap, S., Lourentzou, I., Wu, J., Tong, M., Sharma, A., Abedin, S., Beymer, D., Mukherjee, V., Krupinski, E., & Moradi, M. (2020). Eye Gaze Data for Chest X-rays (version 1.0.0). PhysioNet. https://doi.org/10.13026/qfdz-zr67.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
We created a rich multimodal dataset for the Chest X-Ray (CXR) domain. The data were collected using an eye tracking system while a radiologist interpreted and read 1,083 public CXR images. The dataset contains the following aligned modalities: image, transcribed report text, dictation audio and eye gaze data. We hope this dataset can contribute to various fields of research with applications in machine learning, such as deep learning explainability, multi-modal fusion, disease classification, and automated radiology report generation. The images were selected from the MIMIC-CXR Database and were associated with studies from 1,038 subjects (female: 495, male: 543) aged 20 to 80 years.
CXR is the most common imaging modality in the United States, accounting for up to 74% of all imaging ordered by physicians. In recent years, with the proliferation of deep learning techniques and publicly available CXR datasets, numerous machine learning approaches have been proposed and deployed in radiology settings for disease detection.
Eye tracking in radiology has been extensively studied for the purposes of education, perception understanding, and fatigue measurement (see the literature reviews in the reference list for more details). More recently, several efforts have shown that eye gaze data can improve segmentation and disease classification in Computed Tomography (CT) when incorporated into deep learning techniques.
Currently, there is a lack of public datasets that capture eye gaze data in the CXR space. Given the promise of such data for machine learning, we are releasing this first-of-its-kind dataset to the research community to explore and implement novel applications.
The dataset was collected using an eye tracking system (GP3 HD Eye Tracker, Gazepoint). A radiologist, American Board of Radiology (ABR) certified, with 5 years of attending experience performed interpretation/reading on 1,083 CXR images. The analysis software (Gazepoint Analysis UX Edition) allowed for recording and exporting of eye gaze data and dictation audio.
To identify the images for this study, we used the MIMIC-CXR Database, a large public dataset of CXR images, in conjunction with the MIMIC-IV Clinical Database, which contains clinical outcomes. Inclusion and exclusion criteria were applied to the Emergency Department clinical notes from the MIMIC-IV Clinical Database, resulting in a subset of 1,083 cases covering 3 prominent clinical conditions equally (i.e. Normal, Pneumonia and Congestive Heart Failure (CHF)). The corresponding CXR images of these cases were extracted from the MIMIC-CXR Database.
The radiologist read these CXR images using Gazepoint's GP3 Eye Tracker, Gazepoint Analysis UX Edition software, a headset microphone, a PC and a monitor (Dell S2719DGF) set at 1920x1080 resolution. Reading took place in multiple sessions (i.e. 30 cases per session) over a period of 2 months (i.e. March - May 2020). The Gazepoint Analysis UX Edition software exported raw and processed eye gaze fixations (.csv format) and the voice dictation (audio) of the radiologist's reading. The audio files were further processed with speech-to-text software (i.e. Google Speech-to-Text API) to extract text transcripts along with dictation timing information (.json format). These transcripts were then manually corrected. The final dataset contains the eye gaze signal information (.csv), audio files (.wav, .mp3) and transcript files (.json).
The dataset consists of the following data documents:
- master_sheet.csv: Master spreadsheet containing DICOM_IDs (i.e. original MIMIC-CXR Database IDs) along with disease labels
- fixations.csv: Spreadsheet containing fixation eye gaze data as exported by Gazepoint Analysis UX Edition software
- eye_gaze.csv: Spreadsheet containing raw eye gaze data as exported by Gazepoint Analysis UX Edition software
- bounding_boxes.csv: Spreadsheet containing bounding box coordinates for the anatomical structures
- inclusion_exclusion_criteria_outputs: Folder containing 3 spreadsheet files that were generated after applying inclusion/exclusion criteria. These 3 spreadsheet files can be used by the sampling script to generate master_sheet.csv. This folder is optional and is shared for reproducibility purposes.
- audio_segmentation_transcripts: Folder with i) dictation audio files (i.e. mp3, wav), ii) transcript file (i.e. json), iii) anatomy segmentation mask files (i.e. png) for each DICOM_ID
The user can easily traverse between the data documents, as well as to the MIMIC-CXR Database, using the DICOM_ID.
The bounding_boxes.csv and anatomy segmentation mask files are provided as supplemental sources to support in-depth and correlation analysis (e.g. eye gaze vs. anatomical structures) and/or anatomical structure segmentation.
master_sheet.csv spreadsheet provides the following key information (detailed description found in
- The DICOM_ID column maps each row to the original MIMIC-CXR image as well as the rest of the documents in this dataset.
- Granular disease labels given by the MIMIC CXR database  (i.e. CheXpert NLP tool )
- The reason for exam sentences sectioned out from Indication section of the original MIMIC-CXR report
The fixations.csv and eye_gaze.csv spreadsheets contain the eye tracking information. They were exported by the Gazepoint Analysis UX Edition software. The difference between fixations.csv and eye_gaze.csv is that the former file is a subset of the latter one. The eye_gaze.csv file contains one (1) row for every data sample collected from the eye tracker, while the fixations.csv file contains a single data entry per fixation. A fixation is defined as the maintaining of the eye gaze on a single location (i.e. an eye gaze cluster). The Gazepoint Analysis UX Edition software therefore generates the fixations.csv file by post-processing (i.e. 'sweeping') the eye_gaze.csv file and storing the last entry for each fixation. Both fixations.csv and
eye_gaze.csv spreadsheets contain the same columns. Key columns that are found in both spreadsheets are listed below (detailed description found in
DICOM_ID: maps rows to the original MIMIC image name.
TIME (in secs): presents the time elapsed in seconds since the last system initialization or calibration (i.e. when a new CXR image was presented to the radiologist)
FPOGX: the X coordinate of the fixation point of gaze (POG), as a fraction of the screen size. (0, 0) is top left, (0.5, 0.5) is the screen center, and (1.0, 1.0) is bottom right.
FPOGY: the Y coordinate of the fixation point of gaze (POG), as a fraction of the screen size. (0, 0) is top left, (0.5, 0.5) is the screen center, and (1.0, 1.0) is bottom right.
X_ORIGINAL: the X coordinate of the fixation in original MIMIC DICOM image coordinates.
Y_ORIGINAL: the Y coordinate of the fixation in original MIMIC DICOM image coordinates.
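As a minimal sketch of how the fractional FPOGX/FPOGY values relate to screen pixels, the snippet below scales them by the 1920x1080 monitor resolution reported for the reading sessions. The helper name `pog_to_screen_pixels` and the sample values are illustrative assumptions, not part of the dataset. Note that the result is a position on the screen, not in the image; for image-space analysis, the X_ORIGINAL and Y_ORIGINAL columns already give coordinates in the original DICOM image.

```python
SCREEN_WIDTH = 1920   # monitor resolution reported in this description
SCREEN_HEIGHT = 1080


def pog_to_screen_pixels(fpogx, fpogy, width=SCREEN_WIDTH, height=SCREEN_HEIGHT):
    """Scale a fractional point of gaze to screen pixels (hypothetical helper).

    (0, 0) is the top-left corner and (1.0, 1.0) the bottom-right corner.
    Returns None when the point falls outside the screen.
    """
    if not (0.0 <= fpogx <= 1.0 and 0.0 <= fpogy <= 1.0):
        return None
    return (fpogx * width, fpogy * height)


# Made-up sample value: the screen center.
center = pog_to_screen_pixels(0.5, 0.5)
print(center)
```

Returning None for out-of-range values is one possible design choice; a user might instead prefer to clip such points or keep them for quality checks.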
bounding_boxes.csv contains the following columns:
dicom_id: the MIMIC DICOM image name
bbox_name: the anatomy name
x1: the X coordinate of the top left corner point of the bounding box in original MIMIC DICOM image coordinates
y1: the Y coordinate of the top left corner point of the bounding box in original MIMIC DICOM image coordinates
x2: the X coordinate of the bottom right corner point of the bounding box in original MIMIC DICOM image coordinates
y2: the Y coordinate of the bottom right corner point of the bounding box in original MIMIC DICOM image coordinates
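Because both the fixation coordinates (X_ORIGINAL, Y_ORIGINAL) and the bounding boxes (x1, y1, x2, y2) are expressed in original MIMIC DICOM image coordinates, a fixation can be tested against an anatomical box directly. The following is a rough sketch under stated assumptions: the sample rows, the image IDs and the bbox_name values are made up for illustration, and the function name `anatomies_for_fixations` is hypothetical.

```python
import csv
import io

# Made-up sample rows; real values come from fixations.csv and bounding_boxes.csv.
FIXATIONS_CSV = """DICOM_ID,X_ORIGINAL,Y_ORIGINAL
img-a,120,200
img-a,900,450
img-b,130,210
"""

BOXES_CSV = """dicom_id,bbox_name,x1,y1,x2,y2
img-a,left lung,100,150,400,800
img-a,right lung,600,150,1000,800
img-b,left lung,500,500,600,600
"""


def anatomies_for_fixations(fixations_text, boxes_text):
    """For each fixation, list the boxes of the same image that contain it."""
    # Group the bounding boxes by image ID.
    boxes_by_image = {}
    for box in csv.DictReader(io.StringIO(boxes_text)):
        boxes_by_image.setdefault(box["dicom_id"], []).append(box)

    results = []
    for fix in csv.DictReader(io.StringIO(fixations_text)):
        x, y = float(fix["X_ORIGINAL"]), float(fix["Y_ORIGINAL"])
        # (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner.
        hits = [
            box["bbox_name"]
            for box in boxes_by_image.get(fix["DICOM_ID"], [])
            if float(box["x1"]) <= x <= float(box["x2"])
            and float(box["y1"]) <= y <= float(box["y2"])
        ]
        results.append((fix["DICOM_ID"], hits))
    return results


matches = anatomies_for_fixations(FIXATIONS_CSV, BOXES_CSV)
for dicom_id, names in matches:
    print(dicom_id, names)
```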
The audio_segmentation_transcripts folder contains subfolders named using DICOM_IDs. Each subfolder contains the following files:
audio.wav: the dictation audio in wav format
audio.mp3: the dictation audio in mp3 format
transcript.json: the transcript of the dictation audio with timestamps for each spoken phrase. Specifically, the phrase tag contains the phrase text, the begin_time tag contains the starting time (in seconds) of dictation for the phrase, and the end_time tag contains the end time (in seconds) of dictation for the phrase.
aortic_knob.png and the other anatomy mask files (.png): the manual segmentation images of four (4) key anatomies: left lung, right lung, mediastinum, and aortic knob.
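To illustrate how the phrase timestamps might be used, the sketch below looks up which phrase was being dictated at a given time. It assumes that the transcript is a JSON list of records, each carrying the phrase, begin_time and end_time tags with numeric times; the actual transcript.json may nest these records under another key or store times differently, so the layout, the sample phrases and the helper name `phrase_at` are all assumptions.

```python
import json

# Assumed layout: a JSON list of phrase records. The real file may nest
# these records under another key; the phrases below are made up.
SAMPLE_TRANSCRIPT = """
[
  {"phrase": "first example phrase", "begin_time": 0.0, "end_time": 1.5},
  {"phrase": "second example phrase", "begin_time": 1.5, "end_time": 4.0}
]
"""


def phrase_at(records, t):
    """Return the phrase text being dictated at time t (seconds), or None."""
    for rec in records:
        # Half-open interval so that a boundary time maps to one phrase only.
        if rec["begin_time"] <= t < rec["end_time"]:
            return rec["phrase"]
    return None


records = json.loads(SAMPLE_TRANSCRIPT)
print(phrase_at(records, 2.0))
```

Pairing such a lookup with the TIME column of the eye gaze files would additionally assume that the audio and the eye tracker share the same clock, which should be verified against the data before use.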
The dataset requires access to the CXR DICOM images found in the MIMIC-CXR Database. In general, the user is advised to use the fixations.csv spreadsheet for their experiments because it contains the eye gaze signal as post-processed by the Gazepoint Analysis UX Edition software. However, if the user wants access to the raw sampled eye gaze signal, they are advised to use eye_gaze.csv.
As mentioned in the Data Description section, the user can combine information from the different data documents by using the DICOM_ID tag found across all of them. Examples of data usage can be found at https://github.com/cxr-eye-gaze/eye-gaze-dataset
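A minimal sketch of combining documents through the DICOM_ID tag is shown below: it counts the fixations per image and attaches each count to the matching master sheet row. The sample rows are made up, and the `condition` column name is a hypothetical stand-in for whichever label column of master_sheet.csv is of interest; only the DICOM_ID column name is taken from the description above.

```python
import csv
import io
from collections import Counter

# Made-up sample rows. "condition" is a hypothetical label column name.
MASTER_CSV = """DICOM_ID,condition
img-a,Normal
img-b,CHF
img-c,Pneumonia
"""

FIXATIONS_CSV = """DICOM_ID,FPOGX,FPOGY
img-a,0.41,0.52
img-a,0.47,0.55
img-b,0.62,0.38
"""


def fixation_counts_by_image(master_text, fixations_text):
    """Map each DICOM_ID in the master sheet to (label, number of fixations)."""
    counts = Counter(
        row["DICOM_ID"] for row in csv.DictReader(io.StringIO(fixations_text))
    )
    return {
        row["DICOM_ID"]: (row["condition"], counts.get(row["DICOM_ID"], 0))
        for row in csv.DictReader(io.StringIO(master_text))
    }


summary = fixation_counts_by_image(MASTER_CSV, FIXATIONS_CSV)
for dicom_id, (condition, n_fixations) in summary.items():
    print(dicom_id, condition, n_fixations)
```

The same keying would apply to the per-image subfolders of audio_segmentation_transcripts, which are named using DICOM_IDs.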
Version 1.0.0: Initial upload of dataset
Conflicts of Interest
No conflicts of interest to declare
- Mettler, F. A., Bhargavan, M., Faulkner, K., Gilley, D. B., Gray, J. E., Ibbott, G. S., Lipoti, J. A., Mahesh, M., McCrohan, J. L., Stabin, M. G., Thomadsen, B. R., and Yoshizumi, T. T., “Radiologic and Nuclear Medicine Studies in the United States and Worldwide: Frequency, Radiation Dose, and Comparison with Other Radiation Sources 1950–2007,” Radiology 253, 520–531 (Nov. 2009)
- Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. (2019). MIMIC-CXR Database (version 2.0.0). PhysioNet. https://doi.org/10.13026/C2JT1Q.
- Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., Haghgoo, B., Ball, R., Shpanskaya, K., et al., 2019. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. arXiv preprint arXiv:1901.07031
- Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE CVPR 2017.
- Stephen Anthony Waite, Arkadij Grigorian, Robert G Alexander, Stephen Louis Macknik, Marisa Carrasco, David Heeger, and Susana Martinez-Conde. 2019. Analysis of perceptual expertise in radiology–Current knowledge and a new perspective. Frontiers in human neuroscience 13 (2019), 213
- Van der Gijp, A., Ravesloot, C., Jarodzka, H., Van der Schaaf, M., Van der Schaaf, I., Van Schaik, J., & Ten Cate, T. J. (2016). How visual search relates to visual diagnostic performance: a narrative systematic review of eye-tracking research in radiology. Advances in Health Sciences Education, 1-23. doi: 10.1007/s10459-016-9698-1
- Krupinski, E. A. (2010). Current perspectives in medical image perception. Attention, Perception, & Psychophysics, 72(5), 1205–1217.
- Tourassi G, Voisin S, Paquit V, Krupinski E: Investigating the link between radiologists’ gaze, diagnostic decision, and image content. J Am Med Inform Assoc 20(6):1067–1075, 2013
- Khosravan N, Celik H, Turkbey B, Jones EC, Wood B, Bagci U: A collaborative computer aided diagnosis (C-CAD) system with eye-tracking, sparse attentional model, and deep learning. Med Image Anal 51:101–115, 2019
- Stember, J.N., Celik, H., Krupinski, E. et al. Eye Tracking for Deep Learning Segmentation Using Convolutional Neural Networks. J Digit Imaging 32, 597–604 (2019). https://doi.org/10.1007/s10278-019-00220-4
- Aresta, Guilherme, et al. "Automatic lung nodule detection combined with gaze information improves radiologists' screening performance." IEEE Journal of Biomedical and Health Informatics (2020).
- Johnson, Alistair, et al. "MIMIC-IV" (version 0.4). PhysioNet (2020), https://doi.org/10.13026/a3wn-hq05.
Only PhysioNet credentialed users who sign the specified DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0