Name: Clinical Time Series Datasets for Trajectory Flow Matching Evaluation: ICU Sepsis, ICU Cardiac Arrest, and ICU GIB Cohorts
Published: March 23, 2026
License: https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

Database Credentialed Access

Yuan Pu, Dennis Shung, Alexander Tong, Nicole Zhang, Yuki Kawamura

Published: March 23, 2026. Version: 1.0.0

When using this resource, please cite:
Pu, Y., Shung, D., Tong, A., Zhang, N., & Kawamura, Y. (2026). Clinical Time Series Datasets for Trajectory Flow Matching Evaluation: ICU Sepsis, ICU Cardiac Arrest, and ICU GIB Cohorts (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/pqe3-bv16

MLA	Pu, Yuan, et al. "Clinical Time Series Datasets for Trajectory Flow Matching Evaluation: ICU Sepsis, ICU Cardiac Arrest, and ICU GIB Cohorts" (version 1.0.0). PhysioNet (2026). RRID:SCR_007345. https://doi.org/10.13026/pqe3-bv16
APA	Pu, Y., Shung, D., Tong, A., Zhang, N., & Kawamura, Y. (2026). Clinical Time Series Datasets for Trajectory Flow Matching Evaluation: ICU Sepsis, ICU Cardiac Arrest, and ICU GIB Cohorts (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/pqe3-bv16
Chicago	Pu, Yuan, Shung, Dennis, Tong, Alexander, Zhang, Nicole, and Yuki Kawamura. "Clinical Time Series Datasets for Trajectory Flow Matching Evaluation: ICU Sepsis, ICU Cardiac Arrest, and ICU GIB Cohorts" (version 1.0.0). PhysioNet (2026). RRID:SCR_007345. https://doi.org/10.13026/pqe3-bv16
Harvard	Pu, Y., Shung, D., Tong, A., Zhang, N., and Kawamura, Y. (2026) 'Clinical Time Series Datasets for Trajectory Flow Matching Evaluation: ICU Sepsis, ICU Cardiac Arrest, and ICU GIB Cohorts' (version 1.0.0), PhysioNet. RRID:SCR_007345. Available at: https://doi.org/10.13026/pqe3-bv16
Vancouver	Pu Y, Shung D, Tong A, Zhang N, Kawamura Y. Clinical Time Series Datasets for Trajectory Flow Matching Evaluation: ICU Sepsis, ICU Cardiac Arrest, and ICU GIB Cohorts (version 1.0.0). PhysioNet. 2026. RRID:SCR_007345. Available from: https://doi.org/10.13026/pqe3-bv16

BibTeX

@article{PhysioNet-clinical-trajectory-flow-icu-1.0.0,
  author = {Pu, Yuan and Shung, Dennis and Tong, Alexander and Zhang, Nicole and Kawamura, Yuki},
  title = {{Clinical Time Series Datasets for Trajectory Flow Matching Evaluation: ICU Sepsis, ICU Cardiac Arrest, and ICU GIB Cohorts}},
  journal = {{PhysioNet}},
  year = {2026},
  month = mar,
  note = {Version 1.0.0},
  doi = {10.13026/pqe3-bv16},
  url = {https://doi.org/10.13026/pqe3-bv16}
}

Additionally, please cite the original publication:

Zhang X, Pu Y, Kawamura Y, Loza A, Bengio Y, Shung DL, Tong A. Trajectory Flow Matching with Applications to Clinical Time Series Modelling. Advances in Neural Information Processing Systems 2024. 37: 107198-107224

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.

APA	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
MLA	Goldberger, A., et al. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000). RRID:SCR_007345.
CHICAGO	Goldberger, A., L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000). RRID:SCR_007345.
HARVARD	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P.C., Mark, R., Mietus, J.E., Moody, G.B., Peng, C.K. and Stanley, H.E., 2000. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
VANCOUVER	Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.

Abstract

This resource comprises three clinical time series datasets used in the paper Trajectory Flow Matching with Applications to Clinical Time Series Modelling to evaluate models for handling irregularly sampled data in critical care settings. The ICU Sepsis and ICU Cardiac Arrest datasets are derived from the eICU Collaborative Research Database v2.0, and the ICU Gastrointestinal Bleeding (GIB) Dataset is derived from the MIMIC-III database. The ICU Sepsis Dataset includes 3362 patients with sepsis as the primary admission diagnosis, the ICU Cardiac Arrest Dataset contains 64589 ICU patients at risk for cardiac arrest, and the ICU GIB Dataset includes 2602 patients with gastrointestinal bleeding. Each dataset is split into training, validation, and test sets with a ratio of 0.8/0.1/0.1. These datasets were used to demonstrate the improved performance of Trajectory Flow Matching (TFM). By sharing these datasets, we aim to facilitate further exploration of clinical time series modeling, support the development of new models in the medical domain, and help others replicate the experimental results.

Background

The motivation for this dataset was the need for providers and healthcare systems to track hemodynamic status for patients over time, particularly those with clinical conditions (e.g. sepsis, acute gastrointestinal bleeding, ICU patients at risk of cardiac arrest) where hemodynamic changes can directly impact patient outcomes. Approaches that model the dynamic trajectories of hemodynamics should account for irregular sampling and incorporate baseline patient conditions, such as significant co-morbidities, but no datasets existed to test approaches that do both, such as trajectory flow matching. We created these cohorts to describe more faithfully the hemodynamic trajectories of patients requiring ICU care and to support the specific question of applying trajectory flow matching or other dynamic risk modeling efforts in an irregularly sampled context.

The three datasets provided in this resource were derived from two well-known critical care databases: the eICU Collaborative Research Database v2.0 and the MIMIC-III database. They were created to evaluate the performance of TFM [1], a novel approach for modeling stochastic and irregularly sampled clinical time series data, against NeuralODE, LatentODE, Aligned FM, and NeuralSDE, specifically in critical care settings [2,3,4,5]. By sharing these datasets, we also aim to support the broader scientific community in advancing the development of robust time series models for critical care.

Methods

The following description is adapted from Appendix B2.2 of the TFM paper [1].

In the TFM project, patient trajectories consisted of heart rate and blood pressure measurements during the first 24 hours following ICU admission. The timeline for each trajectory was scaled to a range between 0 and 1: times recorded in minutes were divided by 1440, and times recorded in hours (ICU GIB Dataset) were divided by 24. Heart rate and blood pressure values were z-score normalized. Other available variables were used as conditional inputs.
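The two transformations described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the sample values are made up, and the use of the population standard deviation is an assumption, since the text does not say which estimator or which subset (for example, the training set only) the normalization statistics were computed on.

```python
from statistics import mean, pstdev

MINUTES_PER_DAY = 1440  # 24 hours, the window covered by each trajectory


def scale_time(minutes):
    """Map minutes since ICU admission onto the [0, 1] interval."""
    return [m / MINUTES_PER_DAY for m in minutes]


def z_score(values):
    """Standardize values to zero mean and unit variance.

    Assumption: population standard deviation over the values passed in.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical measurements for illustration only.
time_from_adm = [0, 360, 720, 1440]
heart_rate = [80.0, 90.0, 100.0, 110.0]

time_scaled = scale_time(time_from_adm)  # [0.0, 0.25, 0.5, 1.0]
hr_normalized = z_score(heart_rate)
```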

Intensive Care Unit Sepsis (ICU Sepsis) Dataset

The eICU Collaborative Research Database v2.0 [6] contains de-identified information collected from over 200,000 patients in multiple intensive care units (ICUs) in the United States from 2014 to 2015. The ICU Sepsis Dataset was created by subsetting the eICU database to 3362 patients with sepsis as the primary admission diagnosis (2689 patients in the training set, 336 in the validation set, and 337 in the test set). The following data fields were extracted: patient sex, age, heart rate, mean arterial pressure, norepinephrine dose and infusion rate, and a validated ICU score (APACHE-IV). Each patient's complete paired measurements of heart rate and mean arterial pressure over time form one trajectory to be modeled. Norepinephrine infusion rates were calculated by converting drug doses or infusion rates to μg/kg/min; where drug doses were not explicitly available, the dose was inferred from the free text given in the drug name. Start and end times for norepinephrine infusion were calculated by dividing the dose by the infusion rate. Where multiple infusions appeared to run at the same time, the maximum infusion rate was taken as the infusion rate. As a conditional input to the models, the norepinephrine infusion doses were then scaled to between 0 and 1 by dividing by the maximum norepinephrine value in the dataset. The APACHE-IV score, a validated critical care risk score, predicts individual patient mortality risk [7]. In data preprocessing, we used logistic regression of binary hospital mortality on the score to generate a probability for each patient, which serves as an additional variable, 'apache_outcome_prob'.
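As a rough illustration of how a variable like 'apache_outcome_prob' could be derived, the sketch below fits a one-feature logistic regression of hospital mortality on the APACHE-IV score. The fitting procedure (plain gradient descent on a standardized score), the function name `fit_logistic_1d`, and the toy data are all assumptions made for this example; the software and settings the authors used are not described here.

```python
import math


def fit_logistic_1d(x, y, lr=0.1, n_iter=5000):
    """Fit p(y=1 | x) = sigmoid(w * z + b), where z is the standardized x.

    Hypothetical helper: plain gradient descent on the mean log-loss.
    """
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    z = [(v - mu) / sd for v in x]
    w, b = 0.0, 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(w * zi + b))) for zi in z]
        grad_w = sum((pi - yi) * zi for pi, yi, zi in zip(p, y, z)) / n
        grad_b = sum(pi - yi for pi, yi in zip(p, y)) / n
        w -= lr * grad_w
        b -= lr * grad_b

    def predict(score):
        """Return the fitted mortality probability for a raw score."""
        return 1.0 / (1.0 + math.exp(-(w * (score - mu) / sd + b)))

    return predict


# Made-up scores and outcomes (1 = died in hospital), for illustration only.
apache = [40, 55, 60, 75, 90, 100, 120, 140]
hosp_mort = [0, 0, 1, 0, 0, 1, 1, 1]

predict = fit_logistic_1d(apache, hosp_mort)
apache_outcome_prob = [predict(s) for s in apache]
```

Because the probability is fitted to the mortality outcome, it carries outcome information and should be handled accordingly in downstream prediction tasks.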

Intensive Care Unit Cardiac Arrest (ICU Cardiac Arrest) Dataset

This dataset was extracted from the eICU Collaborative Research Database v2.0 [6] described above to reflect ICU patients at risk for cardiac arrest. This dataset excludes patients who presented with myocardial infarction (MI) and includes variables used in the Cardiac Arrest Risk Triage (CART) score [8]: respiratory rate, heart rate, diastolic blood pressure, and age recorded at the time of ICU admission. 51671 patients were included in the training set, with 6459 patients each in the validation and test sets.

Intensive Care Unit Acute Gastrointestinal Bleeding (ICU GIB) Dataset

The Medical Information Mart for Intensive Care III (MIMIC-III) critical care database contains data for over 40,000 patients who required an ICU stay at the Beth Israel Deaconess Medical Center from 2001 to 2012 [9]. We selected a cohort of 2602 ICU patients with a primary diagnosis of gastrointestinal bleeding to form the ICU GIB Dataset, split into a training set of 2082 patients and validation and test sets of 260 patients each. We extracted the following variables: age, sex, heart rate, systolic blood pressure, diastolic blood pressure, vasopressor use, blood product use, and liver disease. The outcome variable was in-hospital mortality. As in the ICU Sepsis Dataset, the trajectories modeled in the TFM paper consist of complete pairs of heart rate and mean arterial pressure measurements, with mean arterial pressure calculated from systolic and diastolic blood pressure.
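The text does not state which formula was used to derive mean arterial pressure from systolic and diastolic values. A common clinical approximation, shown here purely as an assumption, adds one third of the pulse pressure to the diastolic pressure:

```python
def mean_arterial_pressure(sbp, dbp):
    """Approximate MAP (mmHg) as DBP plus one third of the pulse pressure.

    Assumption: this is the widely used bedside approximation, not
    necessarily the calculation applied when building the dataset.
    """
    return dbp + (sbp - dbp) / 3.0


# Hypothetical reading of 120/80 mmHg.
map_value = mean_arterial_pressure(120.0, 80.0)
```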

Data Description

This resource contains three datasets stored as .csv files, each of which can be loaded as a pandas DataFrame. The column 'label', with values 'train', 'val', and 'test', indicates whether a row belongs to the training, validation, or test set, respectively, in the experiments and evaluations reported in the TFM paper [1]. Rows sharing the same 'HADM_ID' represent the vital sign trajectory of a single patient during the first 24 hours following ICU admission (retaining only complete pairs of heart rate and blood pressure measurements), along with the relevant static and dynamic conditions.
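One way to turn this row layout into per-patient trajectories is sketched below using only the Python standard library. The inline CSV is a made-up extract that borrows column names from the ICU Sepsis file, and `load_trajectories` is a hypothetical helper, not part of the TFM code base.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up extract; real files contain many more rows and columns.
sample_csv = """HADM_ID,label,time_scaled,hr_normalized,map_normalized
1,train,0.25,0.30,-0.10
1,train,0.00,0.10,-0.20
2,test,0.00,-0.50,0.40
2,test,0.50,-0.40,0.35
2,test,0.75,-0.20,0.30
"""


def load_trajectories(handle):
    """Return {split: {HADM_ID: [(t, hr, map), ...]}} with points sorted by time."""
    splits = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(handle):
        point = (
            float(row["time_scaled"]),
            float(row["hr_normalized"]),
            float(row["map_normalized"]),
        )
        splits[row["label"]][row["HADM_ID"]].append(point)
    for patients in splits.values():
        for points in patients.values():
            points.sort(key=lambda p: p[0])
    return splits


trajectories = load_trajectories(io.StringIO(sample_csv))
```

With a downloaded file, the same function could be called on `open("eICU_sepsis_physionet.csv", newline="")` instead of the in-memory sample.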

ICU Sepsis Dataset (`eICU_sepsis_physionet.csv`)

This dataset includes 3362 patients with sepsis as the primary diagnosis, sourced from the eICU database. The training set includes 2689 patients, the validation set contains 336 patients, and the test set comprises 337 patients. Variables include:

TIME_FROM_ADM: time in minutes since ICU admission
time_scaled*: TIME_FROM_ADM scaled by 1440
hr_normalized*: z-score normalized heart rate
map_normalized*: z-score normalized mean arterial pressure
norepi_inf_scaled*: norepinephrine infusion value scaled by the max dose
apache_outcome_prob*: a risk probability obtained by logistic regression of hospital mortality on the APACHE-IV score. Note that this variable contains outcome information
apache: raw APACHE-IV score calculated on admission
ICU_MORT: binary ICU mortality outcome (1 indicating death, 0 indicating survival)
HOSP_MORT: binary in-hospital mortality outcome (1 indicating death, 0 indicating survival)

ICU Cardiac Arrest Dataset (`eICU_cardiacArrest_physionet.csv`)

This dataset contains data from 64589 ICU patients at risk for cardiac arrest, also sourced from the eICU database. 51671 patients were included in the training set, with 6459 patients each in the validation and test sets. Variables include:

TIME_FROM_ADM: time in minutes since ICU admission
time_scaled*: TIME_FROM_ADM scaled by 1440
hr_normalized*: z-score normalized heart rate
dbp_normalized*: z-score normalized diastolic blood pressure
rr_normalized*: z-score normalized respiratory rate
age_normalized*: z-score normalized age at admission
MI: binary myocardial infarction status developed during the hospital stay (1 indicating presence, 0 indicating absence)
ICU_MORT: binary ICU mortality outcome (1 indicating death, 0 indicating survival)
HOSP_MORT: binary in-hospital mortality outcome (1 indicating death, 0 indicating survival)

ICU GIB Dataset (`MIMIC_gib_physionet.csv`)

This dataset consists of 2602 ICU patients diagnosed with gastrointestinal bleeding, extracted from the MIMIC-III database. The training set includes 2082 patients, and the validation and test sets contain 260 patients each. Variables include:

TIME_FROM_ADM: time in hours since ICU admission
time_scaled*: TIME_FROM_ADM scaled by 24
hr_normalized*: z-score normalized heart rate
map_normalized*: z-score normalized mean arterial pressure
pressor: binary vasopressor usage (1 = used, 0 = not used)
pressor_gaussian*: vasopressor usage with a Gaussian decay applied over time to the pressor usage records. Note that this decay was applied before excluding time steps with missing hr or map, so the values in this column still reflect the influence of usage records that were later excluded and may not align perfectly with the binary usage in the final DataFrame.
bloodprod: binary usage of blood products other than packed red blood cells (1 = used, 0 = not used)
bloodprod_gaussian*: usage of blood products other than packed red blood cells with a Gaussian decay applied over time to the bloodprod usage records. Note that this decay was applied before excluding time steps with missing hr or map, so the values in this column still reflect the influence of usage records that were later excluded and may not align perfectly with the binary usage in the final DataFrame.
prbc*: binary packed red blood cells usage (1 = used, 0 = not used)
severe_liver*: binary liver disease status (1 = presence of severe liver disease, 0 = absence)
HOSP_MORT: binary in-hospital mortality outcome (1 = death, 0 = survival)

* indicates variables used in the TFM paper experiments [1].
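The note on pressor_gaussian and bloodprod_gaussian can be made concrete with a small sketch. The kernel below is hypothetical: the symmetric Gaussian shape, the width `sigma`, and the choice to take the maximum over usage events are all assumptions, since the exact decay used to build the dataset is not specified here. The point of the example is only to show why smoothed values can remain non-zero after the originating usage record has been dropped.

```python
import math


def gaussian_decay(times, usage, sigma=2.0):
    """Smooth binary usage flags with a Gaussian kernel centred on each usage time.

    Assumptions (not from the dataset description): symmetric kernel,
    `sigma` in the same unit as `times`, maximum taken over usage events.
    """
    events = [t for t, u in zip(times, usage) if u == 1]
    return [
        max((math.exp(-((t - e) ** 2) / (2 * sigma ** 2)) for e in events), default=0.0)
        for t in times
    ]


# Hypothetical records: hours since admission, binary pressor flag, heart rate.
hours = [0.0, 1.0, 2.0, 4.0, 8.0]
pressor = [0, 1, 0, 0, 0]
hr = [80.0, None, 85.0, 90.0, 88.0]  # heart rate missing at the usage time

# The decay is computed first, then incomplete time steps are dropped.
pressor_gaussian = gaussian_decay(hours, pressor)
kept = [
    (t, p, g)
    for t, p, g, h in zip(hours, pressor, pressor_gaussian, hr)
    if h is not None
]
```

In `kept`, every binary flag is 0 while the smoothed column is still positive near hour 1, which mirrors the misalignment described for the ICU GIB Dataset.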

Usage Notes

These datasets have already been used to evaluate the performance of TFM, NeuralODE, LatentODE, Aligned FM, and NeuralSDE in the context of clinical time series modeling [1,2,3,4,5]. For more details on data usage, including code and implementation details, please refer to the associated GitHub repository for TFM [10]. In addition to the variables used in these experiments, we have included outcome variables such as mortality and the development of severe conditions. Although these outcomes were not part of our original TFM experiments, they are provided to support further research on clinical outcome prediction, which represents a natural extension of clinical time series modeling.

One known limitation of the data is completeness. For each patient, only time points with complete measurements of heart rate and blood pressure (plus respiratory rate for the ICU Cardiac Arrest Dataset) were retained; time points with incomplete measurements were excluded. The resulting trajectories therefore represent the patient's vital signs based solely on these complete measurements over the first 24 hours following admission to the ICU. We do not explicitly use other data fields available in the ICU databases (e.g. unstructured note text, laboratory values, and imaging).
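The retention rule amounts to complete-case filtering over the required vital signs. A minimal sketch, with made-up rows and a hypothetical helper `keep_complete`, using the three vitals required for the ICU Cardiac Arrest Dataset:

```python
def keep_complete(rows, required):
    """Keep only rows where every required field is present (not None)."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]


# Made-up time points (minutes since admission) with some missing vitals.
rows = [
    {"t": 0, "hr": 82.0, "dbp": 60.0, "rr": 18.0},
    {"t": 15, "hr": 85.0, "dbp": None, "rr": 20.0},
    {"t": 30, "hr": None, "dbp": 62.0, "rr": 19.0},
    {"t": 45, "hr": 88.0, "dbp": 64.0, "rr": 21.0},
]

complete = keep_complete(rows, required=("hr", "dbp", "rr"))
```

Only the time points at 0 and 45 minutes survive, so the retained trajectory is sparser and more irregular than the original recording.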

Ethics

The ICU GIB Dataset is a subset of MIMIC, a project approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). The requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was de-identified. The ICU Sepsis and ICU Cardiac Arrest datasets are subsets of the eICU Collaborative Research Database, a study that is exempt from institutional review board approval due to its retrospective design, lack of direct patient intervention, and security schema, for which the re-identification risk was certified as meeting safe harbor standards by an independent privacy expert (Privacert, Cambridge, MA) (Health Insurance Portability and Accountability Act Certification no. 1031219-2).

Conflicts of Interest

The authors have no conflicts of interest to declare.

References

1. Zhang XN, Pu Y, Kawamura Y, Loza A, Bengio Y, Shung D, Tong A. Trajectory Flow Matching with Applications to Clinical Time Series Modelling. Advances in Neural Information Processing Systems. 2025 Jan 31;37:107198-224.
2. Chen RT, Rubanova Y, Bettencourt J, Duvenaud DK. Neural ordinary differential equations. Advances in neural information processing systems. 2018;31.
3. Lipman Y, Chen RT, Ben-Hamu H, Nickel M, Le M. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747. 2022 Oct 6.
4. Rubanova Y, Chen RT, Duvenaud DK. Latent ordinary differential equations for irregularly-sampled time series. Advances in neural information processing systems. 2019;32.
5. Liu X, Xiao T, Si S, Cao Q, Kumar S, Hsieh CJ. Neural sde: Stabilizing neural ode networks with stochastic noise. arXiv preprint arXiv:1906.02355. 2019 Jun 5.
6. Pollard TJ, Johnson AE, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific data. 2018 Sep 11;5(1):1-3.
7. Zimmerman JE, Kramer AA, McNair DS, Malila FM. Acute Physiology and Chronic Health Evaluation (APACHE) IV: hospital mortality assessment for today’s critically ill patients. Critical care medicine. 2006 May 1;34(5):1297-310.
8. Churpek MM, Yuen TC, Park SY, Meltzer DO, Hall JB, Edelson DP. Derivation of a cardiac arrest prediction model using ward vital signs. Critical care medicine. 2012 Jul 1;40(7):2102-8.
9. Johnson AE, Pollard TJ, Shen L, Lehman LW, Feng M, Ghassemi M, Moody B, Szolovits P, Anthony Celi L, Mark RG. MIMIC-III, a freely accessible critical care database. Scientific data. 2016 May 24;3(1):1-9.