Database Credentialed Access

BOLD, a blood-gas and oximetry linked dataset

João Matos, Tristan Struja, Jack Gallifant, Luis Filipe Nakayama, Marie Charpignon, Xiaoli Liu, Jaime dos Santos Cardoso, Leo Anthony Celi, An Kwok Wong

Published: Nov. 8, 2023. Version: 1.0

When using this resource, please cite:
Matos, J., Struja, T., Gallifant, J., Nakayama, L. F., Charpignon, M., Liu, X., dos Santos Cardoso, J., Celi, L. A., & Wong, A. K. (2023). BOLD, a blood-gas and oximetry linked dataset (version 1.0). PhysioNet.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Pulse oximeters measure peripheral arterial oxygen saturation (SpO2) noninvasively, whereas the gold standard, arterial oxygen saturation (SaO2), requires an arterial blood gas measurement. Pulse oximeter performance is known to differ across racial and ethnic groups. BOLD is a new, comprehensive dataset intended to support work on biases in pulse oximetry accuracy, which particularly affect people with darker skin tones. The dataset was created by harmonizing three Electronic Health Record databases (MIMIC-III, MIMIC-IV, eICU-CRD) comprising Intensive Care Unit stays. Paired SpO2 and SaO2 measurements were time-aligned and combined with other clinical parameters to provide a detailed picture of each patient. It includes 49,099 paired measurements taken within a 5-minute window, with oxygen saturation levels between 70% and 100%. Notably, about 25% of the data represents minority racial and ethnic groups, a proportion seldom achieved in previous studies. The code has been made publicly available to facilitate replication. We hope that BOLD will be a valuable resource for health equity studies and for the development of pulse oximetry debiasing algorithms.

Background

The measurement and management of arterial blood gas (ABG) and pulse oximetry in the Intensive Care Unit (ICU) have long been the subject of clinical interest but are often under-studied. Pulse oximeters and ABG are tools for evaluating systemic oxygen saturation and providing guidance for clinical decision-making. Standardization in pairing arterial blood gas samples with pulse oximeter readings, a critical component for effective patient monitoring and management, is particularly scarce. This is due in part to challenges in coordinating large electronic health record (EHR) datasets and synchronizing clinical protocols across multiple medical centers.

Recent research has added another layer of complexity by uncovering racial disparities in pulse oximeter reading accuracy, which have critical implications for patient care and outcomes [1-4]. Such disparities further emphasize the urgent need for robust and inclusive datasets for thorough analysis. Given these challenges and revelations, real-world data for retrospective analysis can offer invaluable insights. All of these studies have used EHR data, which was stored in multiple formats that may not be easy to use, representing a barrier to entry.

Existing large-scale EHR datasets, though available, are often not in a form that can readily answer these nuanced and urgently needed clinical questions [5-7]. Making an individual EHR dataset usable requires nontrivial understanding of its structure; preprocessing the sources into a unified, easily usable dataset therefore removes a barrier to entry. Consequently, this dataset aims to provide a holistic approach to the extraction, processing, and analysis of ABG samples and pulse oximeter readings from EHR data, with an emphasis on the study of racial disparities.

To address these issues, our study proposes a validated and reproducible methodology to convert unprocessed database queries into a clinically useful dataset. Our multidisciplinary team, which includes clinicians (i.e., pulmonologists and intensivists) and data scientists, has developed rules based on clinical and physiological standards for pairing each arterial blood gas sample with a corresponding pulse oximeter reading along with clinical scores, vital signs, and laboratory test values.

Our primary objective is to facilitate extensive data analysis in pulse oximetry, with the aim of reducing inequities, by merging data from three major, publicly available ICU-EHR databases: MIMIC-III, MIMIC-IV, and eICU-CRD. A combined dataset not only offers a solution to the paucity of large and diverse datasets but also provides a unique platform for exploring the disparities observed in pulse oximetry readings. By making this dataset publicly available, we provide researchers with the means to develop models that address these racial and ethnic disparities [8], thus contributing to the accuracy and fairness of healthcare delivery. Furthermore, because the platform and code are available, this work can serve as an example for similar studies that would benefit from linked databases.

Methods

BOLD is sourced from three ICU-EHR databases: MIMIC-III, MIMIC-IV, and eICU-CRD. Its creation was carried out in four steps:

  1. SaO2-SpO2 matching: Each pulse oximetry reading (SpO2) is required to precede the arterial blood gas measurement (SaO2) by no more than 5 minutes. Missing ABG data is not permitted, ensuring a robust comparison of the two measurements. Pairs where oxygen saturation is not within the 70-100% range are excluded. Since a patient can have multiple ABG readings, and therefore multiple (SaO2, SpO2) pairs, in the same hospital stay, only the first pair per hospitalization (across all ICU admissions during a hospitalization) was kept. The goal is to avoid a non-uniform overrepresentation of patients.

  2. Time alignment and curation across the databases: Time-varying data were aligned with each (SaO2, SpO2) pair, using left-sided windows (only past data, relative to the SaO2 timestamp, were considered). This was done to provide a comprehensive clinical picture of the patient at the time of the SaO2 measurement, with a variable-specific tolerance in the past.

  3. Harmonization of concepts across the databases: The resulting dataset was standardized to ensure consistency across different databases. Only variables present in all databases were included to minimize missing data. Categorical variables were mapped to the same labels, across databases. Patient identifiers, demographics, admission characteristics, vital signs, laboratory values, and clinical scores are stored in the same formats, ensuring a cohesive dataset. For race and ethnicity, standardized categories were used, although the data retains the original limitations in distinguishing between the concepts of race and ethnicity. Since Hispanic was used as a race and ethnicity group, patients identified as Black or White are presumed to be non-Hispanic Black and non-Hispanic White. It is presumed unreliable to differentiate Hispanic Black and Hispanic White from recorded race and ethnicity data. The final categories included: "American Indian / Alaska Native"; "Asian"; "Black"; "Hispanic OR Latino"; "More Than One Race"; "Native Hawaiian / Pacific Islander"; "Unknown"; and "White".

  4. Merging of databases: After ensuring harmonization of data formats, with all columns matching, the three databases were merged through row-wise concatenation, i.e., by stacking the rows of the three harmonized tables.
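The matching rule in step 1 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' BigQuery implementation: the function name, the row layout `(stay_id, timestamp, value)`, and the choice of the latest eligible SpO2 reading are assumptions, as is applying the 70-100% range to both values and taking the first SaO2 draw with an eligible SpO2 before checking the range.

```python
from datetime import datetime, timedelta

# Assumed parameters mirroring the inclusion criteria described in step 1.
MAX_LAG = timedelta(minutes=5)   # SpO2 may precede SaO2 by at most 5 minutes
SAT_MIN, SAT_MAX = 70.0, 100.0   # accepted saturation range, in percent


def first_pair_per_stay(sao2_rows, spo2_rows):
    """Return at most one (SaO2, SpO2) pair per hospital stay.

    Rows are (stay_id, timestamp, value) tuples (hypothetical layout).
    For each stay, the earliest SaO2 draw that has an SpO2 reading within
    MAX_LAG before it is considered; it is kept only if both values are in range.
    """
    pairs = {}
    decided = set()  # stays whose first candidate pair was already evaluated
    for stay_id, sao2_time, sao2 in sorted(sao2_rows, key=lambda r: r[1]):
        if stay_id in decided:
            continue
        # Left-sided window: SpO2 at or before the SaO2 time, within MAX_LAG.
        candidates = [
            (t, v) for s, t, v in spo2_rows
            if s == stay_id and timedelta(0) <= sao2_time - t <= MAX_LAG
        ]
        if not candidates:
            continue  # no usable SpO2 for this draw; try the next draw
        decided.add(stay_id)
        spo2_time, spo2 = max(candidates, key=lambda c: c[0])  # latest reading
        if SAT_MIN <= sao2 <= SAT_MAX and SAT_MIN <= spo2 <= SAT_MAX:
            pairs[stay_id] = {
                "sao2": sao2,
                "spo2": spo2,
                "lag": sao2_time - spo2_time,
            }
    return pairs


t0 = datetime(2023, 1, 1, 8, 0)
sao2_rows = [
    ("A", t0, 95.0),
    ("A", t0 + timedelta(hours=1), 92.0),  # later draw, same stay: ignored
    ("B", t0, 65.0),                       # SaO2 below 70%: excluded
    ("C", t0, 90.0),                       # only SpO2 is after the draw
]
spo2_rows = [
    ("A", t0 - timedelta(minutes=3), 97.0),
    ("A", t0 - timedelta(minutes=10), 99.0),  # too old
    ("A", t0 + timedelta(minutes=57), 94.0),
    ("B", t0 - timedelta(minutes=1), 72.0),
    ("C", t0 + timedelta(minutes=2), 91.0),   # follows the SaO2 draw
]
pairs = first_pair_per_stay(sao2_rows, spo2_rows)
print(pairs)
```

In this toy input only stay A yields a pair (SaO2 95, SpO2 97, lag of 3 minutes); stays B and C are dropped by the range and ordering rules, respectively.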

These mappings and methodology are fully accompanied by open-source code, which can be easily modified by interested users to accommodate new needs.
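As a rough illustration of the harmonization and merging steps, the sketch below maps source-specific race and ethnicity labels to the final categories and then stacks tables that share identical columns, tagging each row with its source. The raw labels in `RAW_TO_HARMONIZED`, the column names (`race_ethnicity`, `source_db`), and both function names are hypothetical; the actual mappings are defined in the project's open-source code.

```python
# The eight final categories listed in the harmonization step.
HARMONIZED_CATEGORIES = {
    "American Indian / Alaska Native", "Asian", "Black", "Hispanic OR Latino",
    "More Than One Race", "Native Hawaiian / Pacific Islander", "Unknown", "White",
}

# Hypothetical raw labels, for illustration only.
RAW_TO_HARMONIZED = {
    "BLACK/AFRICAN AMERICAN": "Black",
    "African American": "Black",
    "WHITE": "White",
    "Caucasian": "White",
    "HISPANIC OR LATINO": "Hispanic OR Latino",
    "Hispanic": "Hispanic OR Latino",
}


def harmonize_race(raw_label):
    """Map a raw label to a harmonized category; unmapped labels become 'Unknown'."""
    return RAW_TO_HARMONIZED.get(raw_label, "Unknown")


def merge_sources(tables):
    """Stack rows from several sources after checking that columns match.

    `tables` maps a source name to a list of row dictionaries.
    """
    expected_columns = None
    merged = []
    for source_db, rows in tables.items():
        for row in rows:
            if expected_columns is None:
                expected_columns = set(row)
            elif set(row) != expected_columns:
                raise ValueError(f"column mismatch in {source_db}")
            merged.append({
                **row,
                "race_ethnicity": harmonize_race(row["race_ethnicity"]),
                "source_db": source_db,  # keeps track of the originating database
            })
    return merged


tables = {
    "mimic_iii": [{"sao2": 95.0, "spo2": 97.0, "race_ethnicity": "WHITE"}],
    "mimic_iv": [{"sao2": 91.0, "spo2": 94.0, "race_ethnicity": "BLACK/AFRICAN AMERICAN"}],
    "eicu": [{"sao2": 88.0, "spo2": 93.0, "race_ethnicity": "Hispanic"}],
}
merged = merge_sources(tables)
for row in merged:
    print(row["source_db"], row["race_ethnicity"])
```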

Data Description

There are several categories of data helpful in augmenting analyses and characterizing patients receiving a temporally proximate (SaO2, SpO2) pair.

All adjunctive data (e.g., vital signs and laboratory values) are referenced to the time of the ABG. A time delta (delta_ prefix) is the time difference between the most recently recorded value of a covariate and the ABG measurement. Each time-varying covariate is accompanied by a time delta. The ABG measurement time is the reference and falls after the covariate measurement or reading, unless otherwise noted.
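The left-sided alignment and its time delta can be illustrated as follows. This is a sketch under stated assumptions: the function name, the tolerance values, and expressing the delta as a positive number of minutes are illustrative choices, and the data dictionary remains the reference for the actual units and sign convention.

```python
from datetime import datetime, timedelta


def latest_before(abg_time, readings, tolerance):
    """Return (value, delta_minutes) for the most recent reading at or before
    abg_time and no older than `tolerance`; (None, None) if there is none.

    `readings` is a list of (timestamp, value) tuples.
    """
    eligible = [
        (t, v) for t, v in readings
        if timedelta(0) <= abg_time - t <= tolerance
    ]
    if not eligible:
        return None, None
    t, v = max(eligible, key=lambda r: r[0])
    return v, (abg_time - t).total_seconds() / 60


abg_time = datetime(2023, 1, 1, 12, 0)
heart_rate = [
    (datetime(2023, 1, 1, 11, 20), 92),
    (datetime(2023, 1, 1, 11, 50), 88),
    (datetime(2023, 1, 1, 12, 5), 101),  # after the ABG: never used
]
lactate = [(datetime(2023, 1, 1, 9, 0), 2.1)]  # older than the tolerance below

hr_value, hr_delta = latest_before(abg_time, heart_rate, timedelta(minutes=30))
lac_value, lac_delta = latest_before(abg_time, lactate, timedelta(hours=2))
print(hr_value, hr_delta)    # 88 10.0
print(lac_value, lac_delta)  # None None
```

A covariate with no reading inside its window stays missing, which is why each time-varying covariate carries its own delta.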

Sample Size: The final dataset comprises 49,099 pairs (representing 44,907 patients) in total, where 4,921 pairs (4,778 patients) come from MIMIC-IV; 740 pairs (728 patients) from MIMIC-III; and 43,438 pairs (39,401 patients) from eICU-CRD. Note that there are more pairs than patients because a single patient can have multiple hospital stays.

Identifiers: Each encounter has three identifiers, at different levels: patient, hospital, and ICU admission. The original identifiers are kept to allow linking the data with the original databases and eventually pull other variables of interest. However, to avoid overlap among the databases, we created new, unique identifiers for our dataset that reflect each of these three identifiers. Each encounter also has an identifier to reflect the source database.

Among the three source databases, only eICU-CRD has hospital identifiers, since the MIMIC databases come from a single hospital, Beth Israel Deaconess Medical Center (BIDMC). As a result, MIMIC data was assigned a hospital index of 9999, which is outside the range of eICU hospital indices. Other hospital-related variables (number of beds, US region, and teaching status) were harmonized accordingly.
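One way to picture the identifier handling is sketched below. The scheme shown (sequential integers keyed on source and original identifier) and all field names are assumptions made for illustration; only the hospital index of 9999 for MIMIC rows comes from the description above.

```python
MIMIC_HOSPITAL_ID = 9999  # hospital index assigned to MIMIC rows


def assign_unique_ids(records):
    """Add a dataset-wide patient identifier and a harmonized hospital index.

    Original identifiers are left in place so rows can still be linked
    back to the source databases.
    """
    id_map = {}
    out = []
    for rec in records:
        key = (rec["source_db"], rec["original_patient_id"])
        if key not in id_map:
            id_map[key] = len(id_map) + 1
        is_eicu = rec["source_db"] == "eicu"
        out.append({
            **rec,
            "unique_patient_id": id_map[key],
            "hospital_id": rec["hospital_id"] if is_eicu else MIMIC_HOSPITAL_ID,
        })
    return out


records = [
    {"source_db": "mimic_iv", "original_patient_id": 100, "hospital_id": None},
    {"source_db": "eicu", "original_patient_id": 100, "hospital_id": 73},
    {"source_db": "eicu", "original_patient_id": 100, "hospital_id": 73},
]
with_ids = assign_unique_ids(records)
for rec in with_ids:
    print(rec["source_db"], rec["unique_patient_id"], rec["hospital_id"])
```

The same original identifier in two databases receives two different unique identifiers, while repeated rows for one patient share a single identifier.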

Demographics: Admission age, sex, and race and ethnicity were extracted from the demographics tables of each database. Admission age was unified: ages between 18 and 89 are reported as recorded, and ages recorded as “≥ 90” are set to 90.
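The age unification can be sketched as a small helper. The raw encodings handled here (a string starting with “>” or “≥”, or a numeric value of 90 or more) are assumptions about how a source might represent older patients, and `unify_age` is a hypothetical name.

```python
def unify_age(raw):
    """Return admission age as an integer, with any age of 90 or more set to 90."""
    text = str(raw).strip()
    if text.startswith((">", "≥")):
        return 90  # censored ages such as "≥ 90" or "> 89"
    return min(int(float(text)), 90)


ages = [unify_age(v) for v in ["≥ 90", "> 89", 45, "67", 89.0, 300]]
print(ages)
```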

Admission characteristics and patient outcomes: Comorbidities are calculated by van Walraven Elixhauser score (MIMIC-III) and Charlson Comorbidity Index (MIMIC-IV, eICU-CRD). BMI was computed with the weight and height on admission, for each database. Admission characteristics (e.g., hospital size) and patient outcomes (e.g., in-hospital mortality) were recorded for each patient and annotated per row. 
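For reference, BMI is weight in kilograms divided by the square of height in meters. The helper below is a minimal sketch that assumes weight is recorded in kilograms and height in centimeters; the units in each source database may differ and would need converting first.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2, or None when an input is missing or not positive."""
    if not weight_kg or not height_cm or weight_kg <= 0 or height_cm <= 0:
        return None
    height_m = height_cm / 100
    return weight_kg / (height_m ** 2)


example_bmi = bmi(70, 175)
print(round(example_bmi, 1))  # 22.9
print(bmi(None, 175))         # None
```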

Vital signs: Temperature, blood pressure (both non-invasive and invasive), heart rate, respiratory rate, and SpO2 were extracted. These data were obtained from the chartevents and nursecharting tables of the original MIMIC and eICU databases, respectively. The prefix “vitals_” is used for each variable of this type, except for SpO2.

Laboratory test values: Common laboratory values were merged within time windows as noted. Laboratory measurements were pulled from the following categories: ABG (no prefix); complete blood count (“cbc_” prefix); coagulation (“coag_” prefix); basic metabolic panel (“bmp_” prefix); hepatic function panel (“hfp_” prefix); and other enzymes (“other_” prefix). In the MIMIC databases, all laboratory data were collected from the original labevents table; in eICU-CRD, data were collected from the labs table.
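The column prefixes make it straightforward to group variables programmatically. The sketch below uses the prefixes documented in this section; the example column names and the assumption that a delta column is named by prepending “delta_” to the covariate name are hypothetical, so the data dictionary should be checked for the real names.

```python
PREFIX_GROUPS = {
    "vitals_": "vital signs",
    "cbc_": "complete blood count",
    "coag_": "coagulation",
    "bmp_": "basic metabolic panel",
    "hfp_": "hepatic function panel",
    "other_": "other enzymes",
    "sofa_past_": "SOFA before ABG",
    "sofa_future_": "SOFA after ABG",
}


def describe_column(name):
    """Return (group, is_delta) for a column name based on its prefix."""
    is_delta = name.startswith("delta_")
    base = name[len("delta_"):] if is_delta else name
    for prefix, group in PREFIX_GROUPS.items():
        if base.startswith(prefix):
            return group, is_delta
    # ABG values, SpO2, identifiers, and demographics carry no prefix.
    return "unprefixed", is_delta


# Hypothetical column names, for illustration only.
for column in ["cbc_hemoglobin", "delta_vitals_heart_rate", "SaO2"]:
    print(column, describe_column(column))
```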

Hourly SOFA scores: To characterize organ dysfunction and severity of illness, sequential organ failure assessment (SOFA) scores were used [9]. SOFA scores were calculated hourly for each database. The score from the hour prior to the ABG was extracted to characterize underlying organ dysfunction, unaffected by the ABG itself (“sofa_past_” prefix). The score 24 hours after the ABG was also extracted to quantify the subsequent impact on organ dysfunction (“sofa_future_” prefix).
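The selection of past and future SOFA values from an hourly series might look like the sketch below. It assumes scores are indexed by integer hour since ICU admission, that “the hour prior” means the score at the ABG hour minus one, and that “24 hours after” means the score at the ABG hour plus 24; the function and key names are hypothetical.

```python
def sofa_around_abg(hourly_sofa, abg_hour, future_offset=24):
    """Pick the SOFA score one hour before the ABG and `future_offset` hours after.

    `hourly_sofa` maps an integer hour index to a score; missing hours give None.
    """
    return {
        "sofa_past": hourly_sofa.get(abg_hour - 1),
        "sofa_future": hourly_sofa.get(abg_hour + future_offset),
    }


hourly_sofa = {hour: 4 for hour in range(0, 40)}
hourly_sofa[9] = 6    # score in the hour before an ABG drawn at hour 10
hourly_sofa[34] = 8   # score 24 hours after that ABG

scores = sofa_around_abg(hourly_sofa, abg_hour=10)
print(scores)
short_stay = sofa_around_abg({0: 3, 1: 3}, abg_hour=2)
print(short_stay)
```

When the stay ends before the future time point, the future score is simply missing, as in the second call.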

Usage Notes

The dataset is shared as a single comma-separated values (CSV) file, accompanied by a data dictionary in PDF format. We also provide all the code to recreate the dataset [10]. The 1_dataset.ipynb notebook contains all the queries, optimized for Google’s BigQuery (standard SQL), needed to generate the final CSV file. The key inclusion criteria (lower and upper SaO2 bounds, and lower and upper time windows) are soft-coded so that these parameters can be changed easily. Analysts need to set up a BigQuery project according to the instructions in our notebook. We also share the notebooks 2_consort_diagram.ipynb, 3_tableones.ipynb, and 4_technical_validation.ipynb to recreate all the analyses provided in this paper.

Ethics

This research used data from MIMIC-III, MIMIC-IV, and eICU-CRD, all fully de-identified databases (containing no protected health information), which we were permitted to use under a PhysioNet Credentialed Health Data Use Agreement (v1.5.0). The study was determined to be exempt from human subjects research. All experiments must follow the PhysioNet Credentialed Health Data License Agreement. Users should note that medical charting by providers in the electronic health record is at risk of multiple types of bias.

Conflicts of Interest

AIW holds equity and management roles in Ataia Medical. All other authors report no conflicts of interest.

References

  1. Sjoding MW, Dickson RP, Iwashyna TJ, et al.: Racial Bias in Pulse Oximetry Measurement. N Engl J Med 2020; 383:2477–2478
  2. Charpignon ML, Byers J, Cabral S, Celi LA, Fernandes C, Gallifant J, Lough ME, Mlombwa D, Moukheiber L, Ong BA, Panitchote A. Critical bias in critical care devices. Critical Care Clinics. 2023 Oct 1;39(4):795-813.
  3. Gottlieb ER, Ziegler J, Morley K, et al.: Assessment of Racial and Ethnic Differences in Oxygen Supplementation Among Patients in the Intensive Care Unit [Internet]. JAMA Internal Medicine 2022; 182:849 Available from:
  4. Nazer LH, Zatarah R, Waldrip S, Ke JX, Moukheiber M, Khanna AK, Hicklen RS, Moukheiber L, Moukheiber D, Ma H, Mathur P. Bias in artificial intelligence algorithms and recommendations for mitigation. PLOS Digital Health. 2023 Jun 22;2(6):e0000278.
  5. Pollard TJ, Johnson AEW, Raffa J, et al.: The eICU Collaborative Research Database [Internet]. 2017; Available from:
  6. Johnson AEW, Pollard TJ, Shen L, et al.: MIMIC-III, a freely accessible critical care database. Sci Data 2016; 3:160035
  7. Johnson A, Bulgarelli L, Pollard T, et al.: MIMIC-IV [Internet]. 2021; Available from:
  8. Matos J, Struja T, Gallifant J, et al.: Shining Light on Dark Skin: Pulse Oximetry Correction Models. In: 2023 IEEE 7th Portuguese Meeting on Bioengineering (ENBENG). 2023. p. 211–214.
  9. Vincent JL, Moreno R, Takala J, et al.: The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med 1996; 22:707–710
  10. "BOLD, a blood-gas and oximetry linked dataset". GitHub, Accessed 20 September 2023.

Parent Projects
BOLD, a blood-gas and oximetry linked dataset was derived from: Please cite them when using this project.

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research
