Database Contributor Review
HiRID, a high time-resolution ICU dataset
Martin Faltys , Marc Zimmermann , Xinrui Lyu , Matthias Hüser , Stephanie Hyland , Gunnar Rätsch , Tobias Merz
Published: Feb. 18, 2021. Version: 1.1.1
New on PhysioNet: the HiRID critical care dataset (June 5, 2020, 12:29 p.m.)
We are pleased to announce the release of the HiRID critical care dataset, developed as part of a collaboration between Bern University Hospital and the Swiss Federal Institute of Technology (ETH). HiRID is a freely accessible critical care dataset containing data relating to more than 33 thousand admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland, an interdisciplinary 60-bed unit admitting >6,500 patients per year.
When using this resource, please cite:
Faltys, M., Zimmermann, M., Lyu, X., Hüser, M., Hyland, S., Rätsch, G., & Merz, T. (2021). HiRID, a high time-resolution ICU dataset (version 1.1.1). PhysioNet. https://doi.org/10.13026/nkwc-js72.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Abstract
HiRID is a freely accessible critical care dataset containing data relating to almost 34 thousand patient admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU), an interdisciplinary 60-bed unit admitting >6,500 patients per year. The ICU offers the full range of modern interdisciplinary intensive care medicine for adult patients. The dataset was developed in cooperation between the Swiss Federal Institute of Technology (ETH) Zürich, Switzerland and the ICU.
The dataset contains de-identified demographic information and a total of 681 routinely collected physiological variables, diagnostic test results and treatment parameters from almost 34 thousand admissions during the period from January 2008 to June 2016. Data is stored with a uniquely high time resolution of one entry every two minutes.
Background
Critical illness is characterized by the presence or risk of developing life-threatening organ dysfunction. Critically ill patients are typically cared for in intensive care units (ICUs), which specialize in providing continuous monitoring and advanced therapeutic and diagnostic technologies. This dataset was collected during routine care at the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU), an interdisciplinary 60-bed unit admitting >6,500 patients per year. It was initially extracted to support a study on the early prediction of circulatory failure in the intensive care unit using machine learning [1]. The latest documentation for the dataset is available at hirid.intensivecare.ai [2].
Methods
The HiRID database contains a large selection of all routinely collected data relating to patient admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU). The data was extracted from the ICU Patient Data Management System, which is used to prospectively register patient health information, measurements of organ function parameters, results of laboratory tests and treatment parameters from ICU admission to discharge.
HiRID contains:
- Demographic data
- Measurements from bedside monitoring
- Measurements and settings of medical devices such as mechanical ventilation
- Observations by health care providers, e.g. GCS, RASS, urine and other fluid output
- Lab values
- Administered drugs, fluids and nutrition
HiRID has a higher time resolution than other published datasets, most importantly for bedside monitoring, where most parameters are recorded every 2 minutes.
Anonymization procedure
To ensure the anonymization of individuals in the dataset, we followed the procedures successfully applied for the MIMIC-III and AmsterdamUMCdb datasets, which adopted the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor requirements and, in the case of AmsterdamUMCdb, also the European Union's General Data Protection Regulation (GDPR) standards [3,4].
- All eighteen identifying data elements listed in HIPAA were removed.
- Dates were shifted by a random offset such that the admission date lies between 2100 and 2200. We made sure to preserve the seasonality, time of day and day of week.
- Patient age, height and weight are binned into bins of size 5. For patient age, the maximum bin is 90 years and also contains all older patients.
- Measurements and medications whose units changed over time were standardized to the latest unit used. This standardization was necessary so that admission times cannot be estimated from the units used for a specific patient.
- Free text was removed from the database.
- k-anonymization was applied to patient age, weight, height and sex.
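As a rough illustration of the binning and date-shifting steps above, the following Python sketch uses only the standard library. The function names, the choice to floor values to the lower bin edge, and the use of a whole-week offset are assumptions made for this example; they are not taken from the HiRID pipeline, and all values are synthetic.

```python
from datetime import datetime, timedelta

def bin_value(value, bin_size=5, cap=None):
    """Floor a value to the lower edge of its bin (assumed convention).

    If cap is given, every bin above the cap is merged into the cap bin.
    """
    binned = int(value // bin_size) * bin_size
    if cap is not None and binned > cap:
        binned = cap
    return binned

def shift_by_weeks(timestamp, weeks):
    """Shift a timestamp by a whole number of weeks.

    A whole-week offset keeps the weekday and the time of day unchanged.
    """
    return timestamp + timedelta(weeks=weeks)

# Synthetic example values, not taken from the dataset.
age_bin = bin_value(93, cap=90)   # ages of 90 and above share the 90 bin
weight_bin = bin_value(72.4)      # falls into the bin starting at 70

original = datetime(2012, 3, 14, 8, 30)
# Hypothetical offset: 5218 weeks is within a few days of 100 years,
# so the season is roughly preserved as well.
shifted = shift_by_weeks(original, weeks=5218)
```

Choosing an offset that is a multiple of seven days is one simple way to keep weekday and time of day intact; keeping the offset close to a whole number of years keeps the season approximately unchanged.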
Ethical approval and patient consent
The institutional review board (IRB) of the Canton of Bern approved the study. The need for obtaining informed patient consent was waived owing to the retrospective and observational nature of the study.
Data Description
The data is available in two states: as raw data and as pre-processed data. Additionally, there are three reference tables for variable lookup.
Reference tables
- variable reference (*hirid_variable_reference.csv*) - reference table for variables (for raw stage)
- ordinal variable reference (*ordinal_vars_ref.csv*) - reference table for categorical/ordinal variables for string value lookup
- pre-processed variable reference (*hirid_variable_reference_preprocessed.csv*) - reference table for variables (for merged and imputed stage)
Raw data
The raw data was only processed if this was necessary for patient de-identification and otherwise left unchanged compared to the original source. The source data contains the complete set of available variables (685 variables). It consists of the following tables:
- observations
- pharma records
- general data
Preprocessed data
The pre-processed data consists of intermediary pipeline stages from the accompanying publication by Hyland et al. [1]. Source variables representing the same clinical concepts were merged into one meta-variable per concept. The data contains only the 18 most predictive meta-variables, as defined in our publication. Two stages of the pipeline are available:
- Merged stage: source variables are merged into meta-variables by clinical concept, e.g. non-opioid-analgesics. The time grid is left unchanged and is sparse.
- Imputed stage: the data from the merged stage is downsampled to a five-minute time grid, and the grid is filled with imputed values. The imputation strategy is complex and is discussed in the original publication.
The code used to generate these stages can be found in this GitHub repository under the preprocessing folder [5].
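To give a feel for what moving from a sparse time grid to a regular five-minute grid involves, here is a minimal Python sketch. It uses plain forward filling as a stand-in; this is not the imputation strategy from the publication, which is more complex. The function name and the sample readings are hypothetical.

```python
from datetime import datetime, timedelta

def to_five_minute_grid(samples, start, end, step=timedelta(minutes=5)):
    """Resample sparse (timestamp, value) pairs onto a regular grid.

    Each grid point takes the most recent observation at or before it
    (forward fill). Grid points before the first observation stay None.
    """
    samples = sorted(samples)
    grid = []
    last_value = None
    i = 0
    t = start
    while t <= end:
        # Consume every observation up to and including this grid point.
        while i < len(samples) and samples[i][0] <= t:
            last_value = samples[i][1]
            i += 1
        grid.append((t, last_value))
        t += step
    return grid

# Synthetic readings at irregular times (not real data).
t0 = datetime(2100, 1, 1, 12, 0)
sparse = [
    (t0 + timedelta(minutes=2), 80),
    (t0 + timedelta(minutes=4), 82),
    (t0 + timedelta(minutes=11), 90),
]
grid = to_five_minute_grid(sparse, t0, t0 + timedelta(minutes=15))
values = [v for _, v in grid]
```

In this sketch the first grid point has no value because nothing was observed yet; a real pipeline has to decide how to fill such leading gaps, which is one reason the published imputation is more involved.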
Which data to use?
The pre-processed data is intended mainly as a quick way to jump-start a project or for use in a proof of concept. We recommend using the source data whenever possible for regular projects. It is the most flexible form and contains the complete set of variables in the original time resolution.
Data formats
Data is available in two formats: CSV for wide compatibility and Apache Parquet for convenience and performance.
Since the data sets are fairly large, they are split into partitions so that they can be processed in parallel in a straightforward way. The lookup table mapping patient id to partition id is provided in the file named {data_set}_index.csv along with the data. The partitions are aligned between the different data sets and tables, such that the data of a patient can always be found in the partition with the same id. Note, however, that a patient may not occur in all data sets; for example, a patient might be missing from the pre-processed data because they did not meet the demographic criteria for inclusion in the study.
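The partition lookup described above could be used roughly as follows. This is a sketch under stated assumptions: the column names `patientid` and `part` are assumed for illustration and should be checked against the actual index file and *schemata.pdf*; the helper names are hypothetical, and the index content is synthetic.

```python
import csv
import io

def load_partition_index(index_file):
    """Read a patient-to-partition index into a dict {patient_id: partition_id}.

    Assumes columns named 'patientid' and 'part' (an assumption for this sketch).
    """
    reader = csv.DictReader(index_file)
    return {int(row["patientid"]): int(row["part"]) for row in reader}

def group_by_partition(index, patient_ids):
    """Group requested patients by partition id.

    Patients that do not appear in the index are skipped, since a patient
    may be absent from a given data set.
    """
    groups = {}
    for pid in patient_ids:
        part = index.get(pid)
        if part is None:
            continue  # patient not present in this data set
        groups.setdefault(part, []).append(pid)
    return groups

# Synthetic stand-in for a {data_set}_index.csv file.
index_csv = io.StringIO("patientid,part\n1,0\n2,0\n3,1\n")
index = load_partition_index(index_csv)
groups = group_by_partition(index, [1, 3, 4])
```

Grouping patients by partition first means each partition file only needs to be opened once, and the groups can be handed to separate workers for parallel processing.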
Patient ID / ICU admission
The dataset treats each ICU admission uniquely and it is not possible to identify multiple ICU admissions as originating from the same patient. For each ICU (re-)admission a unique "Patient ID" is generated.
Data schemata
The schemata of every table can be found in the *schemata.pdf* file.
Usage Notes
As the database contains detailed information regarding the clinical care of patients, it must be treated with appropriate care and respect.
Researchers are required to formally request access via PhysioNet. To be granted access, the user has to be a credentialed PhysioNet user, digitally sign the Data Use Agreement and provide a specific research question.
Release Notes
Changes from version 1.0:
- Introduction of the discharge status (alive/dead/unknown) as a column in the general table
- Corrected an issue with the unit of Troponin (Variable ID: 24000538, 24000806). Now all measurements have ng/l as unit. Some measurements had a wrong unit before.
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Hyland, S.L., Faltys, M., Hüser, M. et al. Early prediction of circulatory failure in the intensive care unit using machine learning. Nat Med 26, 364–373 (2020). https://doi.org/10.1038/s41591-020-0789-4
2. HiRID Database website and documentation. https://hirid.intensivecare.ai/
3. Johnson, A., Pollard, T., Shen, L., Lehman, L., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L., & Mark, R. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3, 160035.
4. AmsterdamUMCdb website and documentation. https://amsterdammedicaldatascience.nl/
5. HiRID GitHub Repository. http://github.com/ratschlab/circEWS/
Access
Access Policy:
Only credentialed users who sign the DUA can access the files. In addition, users must have individual studies reviewed by the contributor.
License (for files):
PhysioNet Contributor Review Health Data License 1.5.0
Data Use Agreement:
PhysioNet Contributor Review Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.1.1):
https://doi.org/10.13026/nkwc-js72
DOI (latest version):
https://doi.org/10.13026/323r-nk04
Topics:
icu
clinical
intensive care
high resolution
critical care
machine learning
Project Website:
https://hirid.intensivecare.ai/
Files
To access the files, you must:
- be a credentialed user
- complete the required training: CITI Data or Specimens Only Research
- submit a request to the authors to use the data for your project