Database Credentialed Access
Multimodal Clinical Monitoring in the Emergency Department (MC-MED)
Aman Kansal, Emma Chen, Tom Jin, Pranav Rajpurkar, David Kim
Published: March 3, 2025. Version: 1.0.0
When using this resource, please cite:
Kansal, A., Chen, E., Jin, T., Rajpurkar, P., & Kim, D. (2025). Multimodal Clinical Monitoring in the Emergency Department (MC-MED) (version 1.0.0). PhysioNet. https://doi.org/10.13026/jz99-4j81.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Abstract
Emergency department (ED) patients often present with undiagnosed complaints and can exhibit rapidly evolving physiology. Therefore, data from continuous physiologic monitoring, in addition to the electronic health record, are essential to understanding the acute course of illness and responses to interventions. The complexity of ED care and the large amount of unstructured multimodal data it produces have limited the accessibility of detailed ED data for research. We release Multimodal Clinical Monitoring in the Emergency Department (MC-MED), a comprehensive, multimodal, and de-identified clinical and physiological dataset. MC-MED includes 118,385 adult ED visits to an academic medical center from 2020 to 2022. Data include continuously monitored vital signs, physiologic waveforms (electrocardiogram, photoplethysmogram, respiration), patient demographics, medical histories, orders, medication administrations, laboratory and imaging results, and visit outcomes. MC-MED is the first dataset to combine detailed physiologic monitoring with clinical events and outcomes for a large, diverse ED population.
Background
Emergency departments (EDs) play a critical role in evaluating and treating patients with a wide range of medical conditions, and ED care has major implications for patient outcomes, healthcare costs, and downstream inpatient and ambulatory care[1,2]. ED patients often present with undifferentiated complaints, and the nature and severity of their condition may become clear only over the course of the visit, which can include rapidly changing physiology. Thus, ED patients undergo continuous monitoring while receiving a large variety of diagnostic and therapeutic interventions, resulting in a large volume of heterogeneous, time-varying data. These data include triage reports, free-text notes, continuously monitored vital signs and physiologic waveforms such as electrocardiogram (ECG) and photoplethysmography (PPG), medication administration logs, laboratory and imaging results, diagnoses, disposition decisions, and subsequent encounters. While structured data elements are typically stored in the electronic health record (EHR), other types of data, such as the high-resolution time series produced by bedside monitoring, are seldom integrated with clinical data, and are often discarded due to their size[3]. Moreover, noise, missingness, and the ubiquity of protected health information can make these data challenging to navigate, and limit their availability for research[4].
Few comprehensive ED datasets exist for general use. Currently, the only publicly available ED dataset is MIMIC-IV-ED[5], a module of MIMIC-IV[6]. MIMIC-IV contains data from the ED and intensive care unit (ICU) of Beth Israel Deaconess Medical Center from 2008 to 2019, including 73K ICU admissions and 358K ED visits. Its ICU data include continuous vital signs and physiologic waveforms, but its ED visits include only infrequent vital sign measurements. Other ICU-only datasets include MIMIC-III[7], eICU[8], HiRID[9], and AmsterdamUMCdb[10]. These contain structured EHR data and vital signs recorded at various frequencies, most commonly at 1-minute intervals. None contain physiologic waveforms (Table 10).
We present Multimodal Clinical Monitoring in the Emergency Department (MC-MED), a first-of-its-kind dataset containing multimodal clinical and physiological data from 118,385 adult ED visits to monitored beds of the Stanford adult ED between September 2020 and September 2022. The dataset includes: patient demographics, medical histories, and home medications; continuously monitored vital signs and ECG, PPG, and respiratory waveforms; orders placed and medications administered during the visit; laboratory and imaging results; diagnoses, visit disposition, and length of stay. Figure 1 presents an overview of the time-varying data modalities included throughout an ED visit. MC-MED differs from existing datasets in its focus on a diverse ED population and its inclusion of continuously recorded vital signs and physiologic waveforms. Moreover, it is the first dataset to exclusively cover ED patients during and after the peak of the COVID-19 pandemic. Thus, MC-MED represents a valuable resource for researchers exploring many aspects of modern emergency care, with a focus on granular physiological measurements.
Methods
Acquisition & Transformation
MC-MED includes 118,385 adult ED visits to monitored beds of the Stanford Health Care Emergency Department between 2020 and 2022, from 70,545 unique patients aged 18 or older at the time of visit. Clinical EHR data were derived from the STAnford medicine Research data Repository (STARR), a clinical data warehouse containing data from the Epic EHR at Stanford Health Care, and from auxiliary hospital applications such as the radiology Picture Archiving and Communications System. Continuously monitored vital signs and physiologic waveforms were captured with Philips IntelliVue bedside monitors, stored in a separate data warehouse, and extracted using Philips PIC iX DWC Toolkit (C.03.31). The data acquisition, transformation, and de-identification processes are documented below. The study was approved by the Stanford University Institutional Review Board (58581).
MC-MED data is organized into four categories: (1) structured EHR data (visit data, prior diagnoses, home medications, laboratory results, orders), (2) free-text radiology reports, (3) continuously monitored vital signs, and (4) ECG, PPG, and respiratory waveforms. Data categories 1-3 are consolidated in tables and stored as CSV files. Physiologic waveforms are stored as WaveForm DataBase (WFDB) files, in folders nested by visit identifier (CSN).
Deidentification
MC-MED underwent a comprehensive deidentification process to remove all patient identifiers specified in the HIPAA Privacy Rule[11]. Patient (MRN) and visit (CSN) identifiers were mapped to random integers, and ages and date-times were randomly shifted at the patient level. Free-text radiology impressions were processed to remove any protected health information or information about specific providers, and manually verified by human reviewers. Figure 2 illustrates the de-identification process. Specific data elements were de-identified as follows:
- Medical Record Number (MRN) is a unique patient-level identifier. We randomly generated a new unique integer to replace each original MRN.
- Contact Serial Number (CSN) is a unique visit identifier, allowing linkage of the various data elements of MC-MED. We randomly generated a unique integer to replace each CSN.
- Age at time of visit was altered by adding or subtracting a uniformly random number of years ranging from 0 to 2. Adjusted ages below 18 (all reflecting actual ages of at least 18) were set to 18, and ages exceeding 90 were set to 90. This obscures actual ages while keeping reported ages within 2 years of the true age for all but the oldest patients, whose ages are top-coded at 90.
- Datetime fields were shifted by a random, patient-level (MRN) interval. Each patient received a single random timeshift, which was applied to all of that patient's datetime fields across all data elements. This moved them to new values anchored between 2150 and 2350 while preserving season (January-March, April-June, July-September, and October-December). Because every record of a patient is shifted by the same amount, time intervals within that patient's data are preserved. Shifted datetime fields include: Arrival_time, Roomed_time, Admit_time, Dispo_time, and Departure_time (visits table); Order_time, Result_time, and First_admin_time (orders table); Entry_date, Start_date, and End_date (home medications table); Noted_date (past medical history table); Order_time and Result_time (labs and rads tables).
- Laboratory results with qualitative free-text interpretations unique to a specific patient were removed.
- Radiology reports were deidentified with the Stanford MIDRC Penn Deidentifier [12], an automated deidentification model designed to remove PHI from free-text radiology reports. Any PHI was replaced with three underscores ("___").
All de-identified data elements were manually inspected for residual PHI by author and non-author human reviewers.
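The age perturbation in the de-identification steps above can be illustrated with a short sketch. It is not the authors' code. It assumes that offsets are integers drawn uniformly from -2 to +2 years, with one offset per patient (MRN) to match the patient-level shifting. The function name `perturb_ages` is hypothetical.

```python
import numpy as np


def perturb_ages(mrns, ages, rng, max_shift=2, min_age=18, max_age=90):
    """Shift each patient's ages by one random integer offset, then clip.

    Assumption: offsets are integers drawn uniformly from -max_shift..max_shift,
    one per unique MRN; the exact MC-MED sampling scheme is not specified.
    """
    mrns = np.asarray(mrns)
    ages = np.asarray(ages, dtype=int)
    # One offset per unique patient; inverse maps each visit back to its patient
    unique_mrns, inverse = np.unique(mrns, return_inverse=True)
    # rng.integers has an exclusive upper bound, hence max_shift + 1
    offsets = rng.integers(-max_shift, max_shift + 1, size=unique_mrns.size)
    # Adjusted ages below 18 become 18; ages above 90 become 90
    return np.clip(ages + offsets[inverse], min_age, max_age)


# Example: perturb_ages([1, 1, 2], [40, 41, 95], np.random.default_rng(0))
```

Drawing one offset per MRN keeps the age differences between a patient's visits intact. Clipping explains why the oldest patients all appear as 90.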
Waveform and Vital Sign Preprocessing
In the study hospital's monitoring database, waveform and numeric data recorded by the bedside monitors are associated with ED beds rather than patient visits. We therefore used patient rooming locations, together with rooming and departure times, to segment the continuously recorded vital signs and waveforms by patient visit. We used the WFDB Python package to process and export the waveform data.
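The bed-to-visit segmentation described above can be sketched in pandas. This assumes a hypothetical table of visit-bed intervals with columns `CSN`, `bed`, `start`, and `end`. The hospital's rooming-location data and the authors' actual pipeline are not part of the release, so every column name here is illustrative.

```python
import pandas as pd


def assign_samples_to_visits(samples: pd.DataFrame, stays: pd.DataFrame) -> pd.DataFrame:
    """Attach a visit identifier (CSN) to bed-level monitor samples.

    samples: one row per monitor observation, with hypothetical columns ["bed", "time", ...]
    stays:   one row per visit-bed interval, with hypothetical columns
             ["CSN", "bed", "start", "end"] (end is exclusive)
    Samples that fall outside every interval for their bed are dropped.
    """
    # Pair every sample with every stay in the same bed...
    merged = samples.merge(stays, on="bed", how="inner")
    # ...and keep only pairs where the sample time lies inside the stay
    in_interval = (merged["time"] >= merged["start"]) & (merged["time"] < merged["end"])
    return merged.loc[in_interval, ["CSN", *samples.columns]].reset_index(drop=True)
```

A merge followed by a filter is the simplest correct form. For millions of samples per bed, an interval join (for example, `pd.merge_asof` on sorted times) would use less memory.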
Visits are associated with variable periods of monitoring for different modalities. For instance, ECG leads and PPG probes may be detached during patient movement and transport, then reattached upon return to the room. We therefore present the waveform data for each visit in multiple segments for each modality (ECG, PPG, respiration), and exclude recordings without physiologically meaningful signals (for instance, from detached leads). Specifically, we computed the derivative waveform w', where w'[i+1] = w[i+1] - w[i], and removed waveform segments in which w' was 0 for 10 seconds or longer, that is, segments in which the signal stayed constant for at least 10 seconds. This processing keeps the waveform representation compact and limits it to signal that is likely to be physiologically meaningful.
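The flat-signal filter described above can be sketched in NumPy. This is a minimal version, assuming a single-channel waveform sampled at `fs` Hz. The function name is hypothetical. Dropping both endpoints of each constant run is an illustrative choice, not something the MC-MED pipeline is known to do.

```python
import numpy as np


def nonflat_segments(w, fs, max_flat_s=10.0):
    """Return (start, stop) sample ranges of w that survive flat-run removal.

    A run is removed when the first difference w'[i+1] = w[i+1] - w[i] stays at 0
    for at least max_flat_s seconds. stop indices are exclusive.
    """
    w = np.asarray(w, dtype=float)
    if w.size == 0:
        return []
    zero_diff = np.diff(w) == 0  # zero_diff[i] is True when w[i+1] == w[i]
    min_run = int(round(max_flat_s * fs))  # minimum flat length, in samples of w'
    keep = np.ones(w.size, dtype=bool)

    # Locate runs of consecutive zero differences via rising/falling edges
    padded = np.concatenate(([False], zero_diff, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    for run_start, run_end in zip(edges[::2], edges[1::2]):
        # w[run_start] .. w[run_end] are all equal
        if run_end - run_start >= min_run:
            keep[run_start:run_end + 1] = False

    # Turn the boolean mask into contiguous (start, stop) segments
    padded_keep = np.concatenate(([False], keep, [False]))
    bounds = np.flatnonzero(padded_keep[1:] != padded_keep[:-1])
    return [(int(a), int(b)) for a, b in zip(bounds[::2], bounds[1::2])]


# Example: a 125 Hz ECG trace with a 12 s disconnected-lead plateau would be split
# into two segments around the plateau: nonflat_segments(ecg, fs=125)
```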
Train-Validation-Test Splits
Though researchers may segment MC-MED in the manner most appropriate for their research question, we release two training/validation/test splits for general use. For both splits, the training set contains 80% of visits, and validation and test sets each contain 10% of visits.
Random patient-level split: visits are randomly assigned at the patient level, so all CSNs (visits) belonging to the same MRN (patient) are in the same set: split_random_train.csv, split_random_val.csv, and split_random_test.csv.
Chronological split: All visits in the validation set occur after the final visit in the training set, and all visits in the test set occur after the final visit in the validation set. To prevent patient data leakage between sets, each patient (MRN) is again restricted to only one of the training, validation, or test sets. This removes 13,007 visits from the chronological split. The remaining visits are divided approximately 78%, 11%, and 11% among split_chrono_train.csv, split_chrono_val.csv, and split_chrono_test.csv.
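The split properties above can be checked with a small sanity-check sketch. It assumes each split_*.csv file has a `CSN` column; this assumption is not confirmed by the text. Because timestamps are shifted independently for each patient, chronological order across patients cannot be re-derived from the shifted Arrival_time values. The sketch therefore checks only patient disjointness and set sizes. The function name is hypothetical.

```python
import pandas as pd


def check_split(visits, splits):
    """Report patient overlap and set sizes for a dict of split DataFrames.

    visits: the visits table (needs CSN and MRN columns)
    splits: e.g. {"train": df, "val": df, "test": df}, each with a CSN column
            (assumed layout of the split_*.csv files)
    """
    csn_to_mrn = visits.set_index("CSN")["MRN"]
    patients = {name: set(csn_to_mrn.loc[df["CSN"]]) for name, df in splits.items()}
    names = list(splits)
    # Number of patients (MRNs) shared by each pair of sets; all should be 0
    overlaps = {
        (a, b): len(patients[a] & patients[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    total = sum(len(df) for df in splits.values())
    fractions = {name: len(df) / total for name, df in splits.items()}
    return overlaps, fractions


# Example (file names from the release; the CSN column name is an assumption):
# splits = {s: pd.read_csv(f"split_chrono_{s}.csv") for s in ("train", "val", "test")}
# overlaps, fractions = check_split(pd.read_csv("visits.csv"), splits)
```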
Data Description
The Multimodal Clinical Monitoring in the Emergency Department (MC-MED) dataset offers a comprehensive collection of de-identified emergency department (ED) patient visits, encompassing both clinical data and continuous physiological waveforms. The dataset is structured to facilitate easy navigation and analysis, organized as follows:
./
├── labs.csv
├── meds.csv
├── numerics.csv
├── orders.csv
├── pmh.csv
├── rads.csv
├── split_chrono_train.csv
├── split_chrono_val.csv
├── split_chrono_test.csv
├── split_random_train.csv
├── split_random_val.csv
├── split_random_test.csv
├── visits.csv
├── waveform_summary.csv
└── waveforms/
Top-Level Files:
- visits.csv: Contains high-level information for each ED visit, including patient demographics (age, sex), arrival method, chief complaint, disposition, and event timings.
- pmh.csv (Past Medical History): Lists historical diagnoses for patients, with corresponding ICD-9/ICD-10 codes and descriptions.
- meds.csv (Home Medications): Details patients' home medications, including start and end dates, along with coded medication identifiers.
- orders.csv: Records orders placed during the ED visit, such as labs, imaging studies, and medications, along with their timestamps.
- labs.csv: Provides laboratory test results, including component values, abnormal flags, and reference ranges.
- rads.csv (Radiology): Contains information on imaging studies, including study names and summarized impressions.
- numerics.csv: Offers minute-level numeric vital signs recorded during the ED stay, such as heart rate (HR), respiratory rate (RR), oxygen saturation (SpO2), systolic and diastolic blood pressure (SBP, DBP), mean arterial pressure (MAP), temperature (Temp), perfusion index (Perf), pain score, oxygen flow rate (LPM_O2), and heart rate variability metrics (1min_HRV, 5min_HRV).
- waveform_summary.csv: Summarizes available waveform segments (e.g., ECG, Pleth, Resp) for each visit, including total duration and segment count.
- split_*.csv files: Provide predefined training, validation, and test splits. Two types of splits are available:
  - split_random_*.csv: Random 80/10/10 split by patient.
  - split_chrono_*.csv: Chronological split ensuring no patient overlap between sets.
Waveform Data:
The waveform data are organized in the waveforms folder, which has the following structure:
{CSN_suffix}/                # Folder named by the last three digits of the CSN
    {Full_CSN}/              # Folder named by the full CSN (visit identifier)
        II/                  # ECG waveform segments
            {Full_CSN}_{segment_number}.dat
            {Full_CSN}_{segment_number}.hea
        Pleth/               # PPG waveform segments
            {Full_CSN}_{segment_number}.dat
            {Full_CSN}_{segment_number}.hea
            ...
        Resp/                # Respiration waveform segments
            {Full_CSN}_{segment_number}.dat
            {Full_CSN}_{segment_number}.hea
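The layout above can be navigated with a short sketch. It assumes that `root` points at the waveforms folder, that suffix folders keep leading zeros (for example "005"), that segment numbers are integers, and that each segment record holds a single channel. The helper names are hypothetical. `wfdb.rdrecord` is the reader from the WFDB Python package and is imported only when segments are actually read.

```python
from pathlib import Path


def segment_headers(root, csn, waveform="II"):
    """List WFDB header files for one visit and waveform type, in segment order.

    Layout: {root}/{last 3 digits of CSN}/{CSN}/{waveform}/{CSN}_{n}.hea,
    where waveform is one of "II", "Pleth", "Resp".
    """
    csn = str(csn)
    folder = Path(root) / csn[-3:] / csn / waveform
    if not folder.is_dir():
        return []
    # Sort numerically by the segment number after the last underscore
    return sorted(folder.glob(f"{csn}_*.hea"), key=lambda p: int(p.stem.rsplit("_", 1)[1]))


def read_segments(root, csn, waveform="II"):
    """Yield (record_name, signal, fs) for each segment (assumes one channel per record)."""
    import wfdb  # pip install wfdb

    for hea in segment_headers(root, csn, waveform):
        record = wfdb.rdrecord(str(hea.with_suffix("")))  # record name without extension
        yield hea.stem, record.p_signal[:, 0], record.fs


# Example: for name, sig, fs in read_segments("waveforms", 98874959, "Pleth"): ...
```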
Data elements are described in the following sections (and separately in the data dictionary file), and Table 11 summarizes patient/visit characteristics and statistics.
Visits
The visits table describes high-level characteristics of each visit. Data available at the time of patient arrival include: patient demographics ("Age", "Gender", "Race", "Ethnicity"), means of arrival to the ED ("Means_of_arrival"), triage vital signs ("Triage_Temp", "Triage_HR", "Triage_RR", "Triage_SpO2", "Triage_SBP", "Triage_DBP"), triage acuity by Emergency Severity Index (ESI) ("Triage_acuity"), and chief complaint ("CC"). Data summarizing the visit itself include ED disposition ("ED_dispo"), ED length of stay in hours ("ED_LOS"), and class of primary visit payor ("Payor_class"). They also include the primary diagnosis, given as an ICD9 (International Classification of Diseases, Ninth Revision; "Dx_ICD9") code and an ICD10 (International Classification of Diseases, Tenth Revision; "Dx_ICD10") code, with a free-text description ("Dx_name"). For patients admitted to the hospital, the table includes admitting service ("Admit_service"), hospital length of stay in days ("Hosp_LOS"), and disposition on hospital discharge ("DC_dispo"). Shifted timestamps include: "Arrival_time" (arrival in ED), "Roomed_time" (first rooming), "Dispo_time" (time of disposition decision), "Admit_time" (time of admission), and "Departure_time" (time of departure from ED). Finally, the visits table includes the number of visits from a given patient in the dataset ("Visits"), the sequence number of a given visit ("Visit_no"), the hours from ED departure until the patient's next ED visit ("Hours_to_next_visit"), and the disposition of the next ED visit ("Dispo_class_next_visit"). The visits table can be linked to other tables by CSN (orders, labs, rads, numerics, waveform_summary) or MRN (meds, PMH).
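Linking a CSN-keyed table to the visits table makes it possible to express event times relative to ED arrival. The sketch below computes hours since arrival and assumes timestamps look like the samples ("2262-01-09T03:16:07Z"). pandas' default nanosecond-resolution timestamps end in April 2262, so depending on the pandas version, `pd.to_datetime` may raise `OutOfBoundsDatetime` for many shifted dates. The sketch therefore parses with Python's `datetime`. The function names are hypothetical.

```python
from datetime import datetime

import pandas as pd


def parse_shifted(ts):
    """Parse a shifted MC-MED timestamp such as '2262-01-09T03:16:07Z'."""
    # datetime.fromisoformat accepts a trailing 'Z' only from Python 3.11 on
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def hours_since_arrival(events, visits, time_col):
    """Hours from ED arrival to events[time_col], matched on CSN.

    Returns a float Series aligned with events.index; missing times or
    CSNs absent from visits give NaN.
    """
    arrival = {csn: parse_shifted(t) for csn, t in zip(visits["CSN"], visits["Arrival_time"])}

    def hours(csn, ts):
        if not isinstance(ts, str) or csn not in arrival:
            return float("nan")
        return (parse_shifted(ts) - arrival[csn]).total_seconds() / 3600

    return pd.Series(
        [hours(c, t) for c, t in zip(events["CSN"], events[time_col])],
        index=events.index,
        dtype=float,
    )


# Example: labs["Hours_from_arrival"] = hours_since_arrival(labs, visits, "Result_time")
```

Differences like these remain valid after de-identification because all of a patient's timestamps share one shift.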
Table 1: Visits data elements
Column Name | Description | Data Type | Sample Data
MRN | Patient identifier (Random ID mapped to original MRN) | int | 99940664
CSN | Visit identifier (Random ID mapped to original CSN) | int | 98874959
Visit_no | The visit number for this patient in the dataset | int | 1
Visits | Total number of visits for this patient in the dataset | int | 1
Age | Patient age in years at time of visit (Random perturbation of age +/- 2 years; ages below 18 set to 18, ages greater than 90 set to 90) | int | 90
Gender | Patient gender | string | F
Race | Patient race | string | White
Ethnicity | Patient Hispanic ethnicity | string | Non-Hispanic/Non-Latino
Means_of_arrival | Means of arrival to the ED | string | Self
Triage_Temp | Temperature at triage (C) | float | 36.7
Triage_HR | Heart rate at triage (bpm) | float | 80.0
Triage_RR | Respiratory rate at triage (breaths per min.) | float | 18.0
Triage_SpO2 | Oxygen saturation at triage (%) | float | 100.0
Triage_SBP | Systolic blood pressure at triage (mmHg) | float | 128.0
Triage_DBP | Diastolic blood pressure at triage (mmHg) | float | 78.0
Triage_acuity | Emergency Severity Index (ESI) at triage (1-5) | string | 3-Urgent
CC | Chief complaint(s) at triage | string | ABDOMINAL PAIN
ED_dispo | Disposition of patient from the ED | string | Discharge
Hours_to_next_visit | For patients with a subsequent visit, number of hours from departure of current visit to arrival of next visit | float | 40.0
Dispo_class_next_visit | Disposition class of the next visit | string | Discharge
ED_LOS | Length of ED stay, hours | float | 4.82
Hosp_LOS | Length of hospital stay (including post-ED admission), days | float | 1.0
DC_dispo | Final disposition of patient from the hospital | string | Home/Work (includes foster care)
Payor_class | Class of primary visit payor | string | Medicare
Admit_service | For admitted patients, service admitting the patient from the ED | string | Emergency Medicine
Dx_ICD9 | Primary visit diagnosis, ICD9 code | string | 786.50
Dx_ICD10 | Primary visit diagnosis, ICD10 code | string | R07.9
Dx_name | Name of ICD10 code | string | Chest pain, unspecified type
Arrival_time | Time of arrival to ED (Random-shift date, keeping season constant) | datetime | 2262-01-09T03:16:07Z
Roomed_time | Time of patient rooming (Random-shift date, keeping season constant) | datetime | 2283-03-02T07:36:59Z
Dispo_time | Time of disposition decision (Random-shift date, keeping season constant) | datetime | 2247-09-22T10:54:42Z
Admit_time | For admitted patients, time of admission order (Random-shift date, keeping season constant) | datetime | 2283-03-02T12:29:59Z
Departure_time | Time of departure from ED (Random-shift date, keeping season constant) | datetime | 2209-08-12T11:31:38Z
Orders
The orders table contains all orders placed by the ED physician during the visit, and is linked to other tables by CSN. "Order_type" categorizes orders, e.g., lab tests, imaging, medications, consults, nursing orders. "Procedure_name" describes the order, and "Procedure_ID" gives an accompanying procedure code. The following timestamps are shifted at the MRN level: "Order_time" records when an order was placed, "First_admin_time" when an ordered medication was first administered to the patient, and "Result_time" when a laboratory or imaging order produced a reported result.
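The order and result timestamps support a simple turnaround analysis. The sketch below computes the median minutes from "Order_time" to "Result_time" for each "Order_type". It assumes rows are passed as dicts (for example, from `orders_df.to_dict("records")`) and that missing values arrive as non-strings such as NaN. Only "Lab" is confirmed as an Order_type value by the sample data. The function names are hypothetical.

```python
import statistics
from datetime import datetime


def _parse(ts):
    """'2226-01-15T15:39:22Z' -> aware datetime; None for missing values."""
    if not isinstance(ts, str):
        return None
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def median_turnaround_minutes(orders):
    """Median minutes from Order_time to Result_time, per Order_type.

    Orders without both timestamps (e.g. consults or nursing orders) are skipped.
    Both timestamps share the patient's date shift, so the difference is unaffected.
    """
    by_type = {}
    for row in orders:
        start, end = _parse(row.get("Order_time")), _parse(row.get("Result_time"))
        if start is None or end is None:
            continue
        minutes = (end - start).total_seconds() / 60
        by_type.setdefault(row["Order_type"], []).append(minutes)
    return {order_type: statistics.median(v) for order_type, v in by_type.items()}
```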
Table 2: Orders data elements
Column Name | Description | Data Type | Sample Data
CSN | Visit identifier (Random ID mapped to original CSN) | int | 99139687
Order_time | Time of order (Random-shift date, keeping season constant) | datetime | 2226-01-15T15:39:22Z
Order_type | Type of order (lab, imaging, medication, etc.) | string | Lab
Procedure_name | Name of order | string | CBC WITH DIFFERENTIAL
Procedure_ID | Identifier for order (Mapped to CPT codes) | string | LABMETC
First_admin_time | For medications, time of first administration (Random-shift date, keeping season constant) | datetime | 2212-11-03T16:51:00Z
Result_time | For lab and imaging tests, time of result (Random-shift date, keeping season constant) | datetime | 2295-08-25T16:55:28Z
Meds
The meds table contains patient home medications, organized by patient (MRN). "Med_ID" gives a unique medication code, and "NDC" the National Drug Code, where available. "Name" and "Generic_name" describe the medication. "Med_class" gives a high-level classification of the medication, and "Med_subclass" a more detailed classification. "Active" indicates whether a patient was thought to be using the medication at the time of the visit. "Start_date" and "End_date" give shifted dates of medication initiation and termination, where applicable. These dates can be used to identify home medications at the time of a given visit.
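The start and end dates can identify home medications at the time of a visit. The sketch below joins meds to visits on MRN and keeps medications whose [Start_date, End_date] window covers "Arrival_time". It compares the shifted ISO-8601 strings directly, assuming all timestamps share the "YYYY-MM-DDTHH:MM:SSZ" layout, so that string order equals time order. This also avoids pandas' nanosecond date-range limit. Treating a missing Start_date as "started earlier" and a missing End_date as "ongoing" is an assumption. The function name is hypothetical.

```python
import pandas as pd


def home_meds_at_visit(visits, meds):
    """Rows of meds (joined to visits on MRN) that were current at ED arrival."""
    merged = visits[["CSN", "MRN", "Arrival_time"]].merge(meds, on="MRN", how="inner")
    # "" sorts before any date; "9999" sorts after any shifted date (2150-2350)
    started = merged["Start_date"].fillna("") <= merged["Arrival_time"]
    not_ended = merged["End_date"].fillna("9999") >= merged["Arrival_time"]
    return merged.loc[started & not_ended].reset_index(drop=True)
```

The "Active" column offers a simpler alternative, but it reflects a single status rather than a status at each visit.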
Table 3: Meds data elements
Column Name | Description | Data Type | Sample Data
MRN | Patient identifier (Random ID mapped to original MRN) | int | 99721983
Med_ID | Medication identifier (mapped to NDC id) | int | 14113
NDC | National Drug Code identifier | string | 69618-066-10
Name | Medication name | string | ASPIRIN 81 MG PO TBEC
Generic_name | Generic name | string | aspirin 81 mg tablet,delayed release
Med_class | High-level classification of the medication | string | VITAMIN D PREPARATIONS
Med_subclass | A more detailed classification | string | Vitamins - D Derivatives
Active | Indicates whether a patient was thought to be using the medication at the time of the visit | string | Y
Entry_date | Medication entry date (Random-shift date, keeping season constant) | date | 2270-07-20T00:00:00Z
Start_date | Medication start date (Random-shift date, keeping season constant) | date | 2275-08-31T00:00:00Z
End_date | Medication end date (Random-shift date, keeping season constant) | date | 2241-08-06T00:00:00Z
PMH
The PMH (Past Medical History) table contains prior diagnoses, organized by patient (MRN). "Noted_date" gives the shifted date when the diagnosis was recorded, and can be used to identify known diagnoses at the time of a given visit. "CodeType" specifies whether the "Code" should be interpreted as an ICD9 or ICD10 code. "Desc10" gives a text description of the ICD code. "CCS" gives the Clinical Classification Software category of the diagnosis, and "DescCCS" a text description of the CCS category.
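"Noted_date" can be used to build per-visit comorbidity features from diagnoses known before arrival. The sketch below flags visits with a prior ICD-10 code starting with a given prefix. It uses rows whose "CodeType" is "Dx10" (the value shown in the sample data). It compares shifted ISO-8601 strings directly, assuming a uniform timestamp format. Treating a missing Noted_date as "not known" is a conservative assumption. The function name is hypothetical.

```python
import pandas as pd


def comorbidity_flag(visits, pmh, icd10_prefix):
    """Boolean Series indexed by CSN: was a matching ICD-10 diagnosis noted before arrival?"""
    is_match = (pmh["CodeType"] == "Dx10") & pmh["Code"].str.startswith(icd10_prefix, na=False)
    dx = pmh.loc[is_match, ["MRN", "Noted_date"]]
    # Left join keeps visits whose patient has no matching diagnosis (Noted_date is NaN)
    merged = visits[["CSN", "MRN", "Arrival_time"]].merge(dx, on="MRN", how="left")
    merged["hit"] = merged["Noted_date"].notna() & (
        merged["Noted_date"].fillna("") <= merged["Arrival_time"]
    )
    return merged.groupby("CSN")["hit"].any()


# Example: visits["prior_htn"] = visits["CSN"].map(comorbidity_flag(visits, pmh, "I10"))
```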
Table 4: PMH data elements
Column Name | Description | Data Type | Sample Data
MRN | Patient identifier (Random ID mapped to original MRN) | int | 99084665
Noted_date | Date when the diagnosis was recorded (Random-shift date, keeping season constant) | date | 2219-08-10T00:00:00Z
CodeType | Whether code is ICD9 or ICD10 | string | Dx10
Code | Diagnosis code | string | I10
Desc10 | Text description of the code | string | Essential (primary) hypertension
CCS | Clinical Classification Software category of the diagnosis | float | 259.0
DescCCS | Text description of the CCS category | string | Residual codes; unclassified
Labs
The labs table gives results for lab tests ordered during the ED visit (CSN). "Display_name" describes the test or panel of tests (e.g. comprehensive metabolic panel), while "Component_name" describes the specific measurement (e.g. serum sodium). "Abnormal" indicates whether any result in the test falls outside the normal range, and "Component_abnormal" whether a specific measurement is abnormal. "Component_result" gives the specific result, which may be numeric or categorical, while "Component_value" assigns a numeric value to all results. "Component_units" gives the units in which "Component_value" is measured, and "Component_nml_low" and "Component_nml_high" describe the normal range, where applicable. "Order_time" is the shifted time the test was ordered (which may not exactly match the timestamps in the orders table, since the laboratory may modify or further specify orders), and "Result_time" gives the time the result was reported.
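"Component_value" is stored as a string even when it is numeric, so analyses usually start by converting it. The sketch below coerces the column to numbers and re-derives a simple low/high/normal flag from "Component_nml_low" and "Component_nml_high". It illustrates how these columns fit together and is not a replacement for "Component_abnormal". The function name is hypothetical.

```python
import pandas as pd


def flag_out_of_range(labs):
    """Return 'low', 'high', 'normal', or '' for each lab row.

    Non-numeric values (e.g. 'Negative') become NaN and, like rows without a
    normal range, receive an empty flag.
    """
    value = pd.to_numeric(labs["Component_value"], errors="coerce")
    low, high = labs["Component_nml_low"], labs["Component_nml_high"]
    flag = pd.Series("", index=labs.index, dtype=object)
    flag.loc[value < low] = "low"
    flag.loc[value > high] = "high"
    # between() is inclusive on both ends; NaN bounds or values give False
    flag.loc[value.between(low, high)] = "normal"
    return flag
```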
Table 5: Labs data elements
Column Name | Description | Data Type | Sample Data
CSN | Visit identifier (Random ID mapped to original CSN) | int | 99880957
Order_time | Time of lab order (Random-shift date, keeping season constant) | datetime | 2221-07-03T18:37:12Z
Result_time | Time of lab result (Random-shift date, keeping season constant) | datetime | 2233-07-07T04:37:14Z
Display_name | Name of order | string | CBC with Differential (CBCD)
Abnormal | Flag for abnormal or critical result | string | Abnormal
Component_name | Name of lab component | string | SODIUM
Component_result | Lab result (Removed results containing dates or names) | string | Negative
Component_value | Lab value (Sometimes identical to result, sometimes different, e.g. result may be categorical and value numeric. Removed results containing dates or names) | string | 1
Component_units | Units of Component_value, where applicable | string | %
Component_abnormal | Flag for abnormal Component_value | string | Normal
Component_nml_low | Low end of normal range for component | float | 0.0
Component_nml_high | High end of normal range for component | float | 5.2
Rads
The rads table contains results of imaging studies ordered during each visit (CSN). "Study" is the type of imaging test (e.g., "XR HAND 3 VIEWS LEFT"), and "Impression" gives the de-identified free text summary of the resulting radiology report. As for the labs table, "Order_time" is the shifted time the study was ordered, and "Result_time" reflects when the attending radiologist's impression was posted.
Table 6: Rads data elements
Column Name | Description | Data Type | Sample Data
CSN | Visit identifier (Random ID mapped to original CSN) | int | 99002166
Order_time | Time of imaging order (Random-shift date, keeping season constant) | datetime | 2205-07-07T11:11:36Z
Result_time | Time of imaging result (Random-shift date, keeping season constant) | datetime | 2291-10-02T05:06:13Z
Study | Imaging study name | string | XR CHEST 1 VIEW
Impression | Imaging result impression (de-identified with [12]) | string | "1. No acute cardiopulmonary disease."
Numerics
The numerics table contains the non-waveform monitoring data, by visit (CSN). "Measure" indicates one of 12 measurements: heart rate (HR), respiratory rate (RR), oxygen saturation by pulse oximetry (SpO2), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), temperature in degrees Fahrenheit (Temp), mean last-minute perfusion index derived from the PPG waveform (Perf), pain rating on 0-10 scale (Pain), liters per minute of supplemental oxygen (LPM_O2), and heart rate variability over the last 1 minute (1min_HRV) or 5 minutes (5min_HRV), calculated as the standard deviation of the beat-to-beat RR interval of the ECG waveform over this period. "Value" gives the accompanying numeric value of the observation. Where underlying observations are made more frequently than once per minute, they are aggregated to the mean value over the 60 seconds preceding the timestamp given in "Time". "Source" indicates whether a value was recorded by nursing (Chart) or derived directly from the monitoring database (Monitor).
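The numerics table is in long format, with one row per (Time, Measure) pair. The sketch below pivots one visit into one column per measure. It also includes a helper that mirrors the HRV definition above, the standard deviation of beat-to-beat RR intervals. The use of the sample standard deviation (ddof=1) and the millisecond output unit are assumptions. The text does not state the units of 1min_HRV and 5min_HRV. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def numerics_wide(numerics, csn, source="Monitor"):
    """Pivot one visit's numerics to one column per Measure, indexed by the Time string.

    Duplicate (Time, Measure) pairs, if any, are averaged. With uniform ISO-8601
    timestamps, the string index sorts chronologically.
    """
    rows = numerics[(numerics["CSN"] == csn) & (numerics["Source"] == source)]
    return rows.pivot_table(index="Time", columns="Measure", values="Value", aggfunc="mean")


def sdnn_ms(rr_intervals_s):
    """Standard deviation of RR intervals given in seconds, returned in milliseconds."""
    rr = np.asarray(rr_intervals_s, dtype=float)
    return float(np.std(rr, ddof=1) * 1000)
```

Filtering on "Source" keeps monitor-derived values separate from nurse-charted ones, which can differ in timing and frequency.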
Table 7: Numerics Data Elements
Column Name | Description | Data Type | Sample Data
CSN | Visit identifier (Random ID mapped to original CSN) | int | 99833461
Source | Indicates whether a value was recorded by nursing (Chart) or derived directly from the monitoring database (Monitor) | string | Monitor
Measure | One of 12 measurements: HR (heart rate, beats per minute); RR (respiratory rate, breaths per minute); SpO2 (oxygen saturation by pulse oximetry, %); SBP (systolic blood pressure, mmHg); DBP (diastolic blood pressure, mmHg); MAP (mean arterial pressure, mmHg); Temp (body temperature, degrees Fahrenheit); Perf (mean last-minute perfusion index derived from the PPG waveform); Pain (self-reported pain rating, 0 = no pain to 10 = worst pain); LPM_O2 (flow rate of supplemental oxygen, liters per minute); 1min_HRV or 5min_HRV (heart rate variability over the last 1 minute or 5 minutes, calculated as the standard deviation of the beat-to-beat RR interval of the ECG waveform over this period) | string | SpO2
Value | Observation value | float | 100.0
Time | Timestamp of the measurement. When underlying observations are made more frequently than once per minute, they are aggregated to the mean value over the 60 seconds preceding the timestamp. | string | 2247-05-09T06:40:42Z
Waveforms
The waveforms folder contains ECG (electrocardiogram), PPG (photoplethysmogram), and Resp (respiration) waveform data as WFDB records, organized into folders by visit. Parent folders are named by the last 3 digits of the CSN. Each parent folder contains CSN-level folders. These in turn contain one subfolder per waveform type (II for ECG, Pleth for PPG, and Resp for respiration) holding the multi-segment, WFDB-compatible records of that waveform.
Waveform Summary File
Table 8: Waveform summary data elements
Column Name | Description | Data Type | Sample Data
CSN | Visit identifier (Random ID mapped to original CSN) | int | 99633476
Type | One of 3 waveforms: II (electrocardiogram); Pleth (photoplethysmogram); Resp (respiration) | string | Pleth
Segments | The number of waveform segments | int | 1
Duration | The total duration for all segments for the waveform type, in seconds | float | 119.984
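The summary file can be used to select visits with enough signal before any waveform files are opened. The sketch below returns the CSNs with at least a given number of hours of one waveform type, using the "Type" and "Duration" columns described above. The function name and the one-hour default are illustrative.

```python
import pandas as pd


def visits_with_waveform(summary, waveform="II", min_hours=1.0):
    """Sorted CSNs whose total Duration (seconds) for `waveform` is at least min_hours."""
    rows = summary[(summary["Type"] == waveform) & (summary["Duration"] >= min_hours * 3600)]
    return sorted(rows["CSN"].tolist())


# Example: ecg_csns = visits_with_waveform(pd.read_csv("waveform_summary.csv"), "II", 2.0)
```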
Waveform Data
Table 9: Waveform type description
Type | Description | Common Uses
Electrocardiogram (ECG) | Records the voltage and timing of the heart's electrical activity | Diagnosis of myocardial injury, arrhythmias, electrolyte derangements; assessment of medication responses
Photoplethysmogram (Pleth/PPG) | Records changes in blood volume over time at the site of the sensor | Estimation of heart rate, respiratory rate, blood pressure, blood oxygen saturation
Respiration (Resp) | Estimates chest wall expansion and contraction | Estimation of respiratory rate, tidal volume, respiratory function
Exhibits
Table 10. Comparison of MC-MED and existing ED or ICU datasets
Dataset |
Source |
Time range |
Unique patients |
Visits |
ED data |
ICU data |
Structured EHR |
Free-text data |
Waveforms |
Vital Signs |
MIMIC-III [7] |
The ICU of Beth Israel Deaconess Medical Center, Boston, Massachusetts |
2001 - 2012 |
38,597 (adults) |
53,423 (adults); 7,870 (neonates) |
✓ |
✓ |
✓ |
Hourly time-stamped nurse-verified physiological measurements |
||
MIMIC-IV (v2.2) [6] |
The ICU of Beth Israel Deaconess Medical Center |
2008–2019 |
299,712 |
73,181 ICU stays |
✓ |
✓ |
✓ |
ECG, PPG, and Blood Pressure signals |
Digitally derived from waveforms, or sampled irregularly (such as non-invasive blood pressure) |
|
The ED of Beth Israel Deaconess Medical Center |
358,050 ED stays |
✓ |
✓ |
✓ |
Vital signs taken every 1-4 hours for ED patients |
|||||
eICU (v2.0) [8] |
335 ICU units at 208 hospitals located throughout the US |
2014 - 2015 |
139,367 |
200,859 |
✓ |
✓ |
⭘ (Semi-structured notes) |
Vital signs are recorded both hourly and continuously (at 1-minute intervals, with 5-minute medians) |
||
HiRID [9] |
The Department of Intensive Care Medicine of the Bern University Hospital, Switzerland |
January, 2008 - June, 2016 |
33,905 |
55,602 |
✓ |
✓ |
Bedside vital signs are recorded continuously (at regular intervals of 2 min) |
|||
AmsterdamUMCdb (v1.0.2) [10] |
The department of Intensive Care, a mixed medical-surgical ICU, from Amsterdam University Medical Center |
2003 - 2016 |
20,109 |
23,106 |
✓ |
✓ |
✓ |
Recorded up to one value every minute |
||
EHRSHOT[15] |
Stanford Medicine |
1990 - 2023 |
6,739 |
921,499 (all encounters, not restricted to ED or ICU) |
✓ |
✓ |
✓ |
|||
MC-MED |
Stanford Health Care Emergency Department |
2020 - 2022 |
70,545 |
118,385 |
✓ |
✓ |
✓ |
ECG, PPG, and respiration |
Recorded continuously, reported as 1-minute means |
Table 11: Summary of visit metadata (de-identified).
Group | Category | Count | Percentage (%)
Age | 18-30 | 19525 | 16.49
Age | 30-40 | 18096 | 15.29
Age | 40-50 | 15664 | 13.23
Age | 50-60 | 17864 | 15.09
Age | 60-70 | 17982 | 15.19
Age | 70-80 | 15107 | 12.76
Age | 80-90 | 14147 | 11.95
Gender | F | 64272 | 54.29
Gender | M | 54077 | 45.68
Gender | U | 36 | 0.03
Race | American Indian or Alaska Native | 309 | 0.26
Race | Asian | 19430 | 16.41
Race | Black or African American | 7653 | 6.46
Race | Declines to State | 551 | 0.47
Race | Native Hawaiian or Other Pacific Islander | 2468 | 2.08
Race | Other | 39951 | 33.75
Race | Unknown | 519 | 0.44
Race | White | 47504 | 40.13
Ethnicity | Hispanic/Latino | 32494 | 27.45
Ethnicity | Non-Hispanic/Non-Latino | 84557 | 71.43
Ethnicity | Unknown | 701 | 0.59
Ethnicity | Declines to State | 633 | 0.53
Triage Acuity | 1-Resuscitation | 1066 | 0.90
Triage Acuity | 2-Emergent | 27880 | 23.55
Triage Acuity | 3-Urgent | 77259 | 65.26
Triage Acuity | 4-Semi-Urgent | 10973 | 9.27
Triage Acuity | 5-Non-Urgent | 680 | 0.57
Triage Acuity | Unknown | 527 | 0.45
ED Disposition | Discharge | 70246 | 59.34
ED Disposition | Inpatient | 29483 | 24.90
ED Disposition | Observation | 15890 | 13.42
ED Disposition | ICU | 2766 | 2.34
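Each percentage in Table 11 is the category count divided by the 118,385 visits in the dataset. The short consistency check below is our own illustration, not part of the release. It recomputes two groups from the transcribed counts and refuses to run if a group's counts do not add up to the total, which catches transcription errors.

```python
TOTAL_VISITS = 118_385  # adult ED visits in MC-MED

# Counts transcribed from Table 11.
GENDER = {"F": 64272, "M": 54077, "U": 36}
ED_DISPOSITION = {"Discharge": 70246, "Inpatient": 29483, "Observation": 15890, "ICU": 2766}


def percentages(counts, total=TOTAL_VISITS):
    """Return each category's share of all visits (%), rounded to two decimals.

    Raises ValueError if the categories do not account for every visit.
    """
    observed = sum(counts.values())
    if observed != total:
        raise ValueError(f"counts sum to {observed}, expected {total}")
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


print(percentages(GENDER))
print(percentages(ED_DISPOSITION))
```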
Figure 1. Time-varying data modalities recorded throughout a patient's ED visit (CSN 99797372). (A) shows the orders placed by the ED physician. (B) shows selected numeric monitoring data, including heart rate (HR), blood pressure (BP), peripheral oxygen saturation (SpO2), respiratory rate (RR), and heart rate variability (HRV). (C) shows electrocardiogram (ECG), photoplethysmogram (PPG), and respiration (Resp) waveforms at a specific point in the visit. (D) shows a free-text radiology report. (E) depicts one of numerous time-stamped laboratory results.
Figure 2: De-identification process. Patient (MRN) and visit (CSN) identifiers are mapped to random integers. Patient age is randomly perturbed by 0-2 years. All timestamps are shifted by a patient-level random time interval, maintaining seasonality. PHI is stripped from free-text radiology impressions using a BERT-based de-identification tool.
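The de-identification steps in Figure 2 can be sketched as below. This is a hypothetical illustration, not the authors' code. The record keys, the ID range, the 1 to 100 year shift range, and the additive 0 to 2 year age offset are all assumptions. We also assume that "maintaining seasonality" means shifting by whole (mean-length) years. Because every timestamp for a patient receives the same shift, intervals within that patient's record are preserved while absolute dates are not.

```python
import random
from datetime import datetime, timedelta

MEAN_YEAR_DAYS = 365.2425  # mean Gregorian year length in days


def random_id_map(original_ids, rng):
    """Map each distinct identifier (e.g., an MRN or CSN) to a unique random integer."""
    distinct = sorted(set(original_ids))
    # Sampling without replacement guarantees no two originals share a new ID.
    new_ids = rng.sample(range(10**8, 10**9), len(distinct))
    return dict(zip(distinct, new_ids))


def seasonal_shift(rng, min_years=1, max_years=100):
    """ASSUMPTION: shift by a whole number of mean-length years, which keeps each
    timestamp within about two days of the same point in the year (season)."""
    years = rng.randint(min_years, max_years)
    return timedelta(days=round(years * MEAN_YEAR_DAYS))


def deidentify(records, seed=0):
    """records: dicts with HYPOTHETICAL keys 'mrn', 'csn', 'age', 'time'."""
    rng = random.Random(seed)
    mrn_map = random_id_map((r["mrn"] for r in records), rng)
    csn_map = random_id_map((r["csn"] for r in records), rng)
    # One time shift and one age offset per patient keep a patient's visits
    # mutually consistent. The additive 0..2-year age offset is an ASSUMPTION.
    shift = {mrn: seasonal_shift(rng) for mrn in mrn_map}
    age_offset = {mrn: rng.randint(0, 2) for mrn in mrn_map}
    return [
        {
            "patient_id": mrn_map[r["mrn"]],
            "visit_id": csn_map[r["csn"]],
            "age": r["age"] + age_offset[r["mrn"]],
            "time": r["time"] + shift[r["mrn"]],
        }
        for r in records
    ]


visits = [
    {"mrn": "A1", "csn": "V1", "age": 40, "time": datetime(2021, 3, 15, 8, 0)},
    {"mrn": "A1", "csn": "V2", "age": 40, "time": datetime(2021, 9, 2, 22, 30)},
    {"mrn": "B7", "csn": "V3", "age": 67, "time": datetime(2022, 1, 5, 13, 15)},
]
deid = deidentify(visits)
# The gap between patient A1's two visits survives; the calendar dates do not.
print(deid[1]["time"] - deid[0]["time"] == visits[1]["time"] - visits[0]["time"])
```

The free-text step (removing PHI from radiology impressions with a BERT-based tool) is not sketched here.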
Usage Notes
We demonstrate how to read, link, and visualize the data in the MC-MED GitHub repository [14]. For detailed notes on how to unpack the data, please refer to the README and the data-descriptor files.
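Records from different tables are linked through the de-identified patient (MRN) and visit (CSN) identifiers described in Figure 2. The sketch below illustrates the idea using only the Python standard library. The file contents and the column names `CSN`, `MRN`, `Dispo`, `Component`, and `Value` are hypothetical stand-ins. The README and data-descriptor files document the real schema.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL table excerpts; real MC-MED file names and columns differ.
VISITS_CSV = """CSN,MRN,Dispo
101,9001,Discharge
102,9001,Inpatient
103,9002,ICU
"""
LABS_CSV = """CSN,Component,Value
101,Lactate,1.1
103,Lactate,4.2
103,Troponin,0.05
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def link_labs_to_visits(visits, labs):
    """Attach lab results to visits by visit ID, then group visits by patient ID."""
    labs_by_csn = defaultdict(list)
    for row in labs:
        labs_by_csn[row["CSN"]].append((row["Component"], float(row["Value"])))
    visits_by_patient = defaultdict(list)
    for visit in visits:
        linked_visit = dict(visit, labs=labs_by_csn.get(visit["CSN"], []))
        visits_by_patient[visit["MRN"]].append(linked_visit)
    return visits_by_patient


linked = link_labs_to_visits(read_rows(VISITS_CSV), read_rows(LABS_CSV))
for mrn, patient_visits in linked.items():
    print(mrn, [(v["CSN"], v["Dispo"], v["labs"]) for v in patient_visits])
```

Visits without matching lab rows keep an empty `labs` list, so missing data stays explicit rather than being silently dropped.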
MC-MED supports a wide range of research uses, including early detection of critical illness, improved patient risk stratification, and the development of advanced clinical decision support tools. By combining continuous physiologic monitoring with comprehensive clinical records, it also supports precision-medicine research. However, MC-MED comes from a single center, and the data may contain noise and missing values. The per-patient time shifts and age perturbation used for de-identification protect privacy but limit analyses that depend on exact calendar dates or exact ages. Despite these constraints, MC-MED serves as a benchmark for evaluating algorithms, as demonstrated in the Multimodal Clinical Benchmark for Emergency Care (MC-BEC) paper [13], and provides a foundation for emergency care research.
Release Notes
This is a provisional submission for peer reviewers only and will be updated when made public.
Ethics
The project was reviewed by the Stanford University Institutional Review Board and Secondary Data Use Committee. The study was approved by the Stanford University Institutional Review Board (58581) with waiver of consent for retrospective research on de-identified, routinely collected data.
Acknowledgements
We thank Stephanie Bogdan and Xiaoli Yang for their independent verification of de-identification.
Conflicts of Interest
The authors report no conflicts of interest.
References
- Burke LG, Burke RC, Epstein SK, Orav EJ, Jha AK. Trends in Costs of Care for Medicare Beneficiaries Treated in the Emergency Department From 2011 to 2016. JAMA Netw Open. 2020 Aug 3;3(8):e208229.
- Sun BC, Hsia RY, Weiss RE, Zingmond D, Liang LJ, Han W, et al. Effect of emergency department crowding on outcomes of admitted patients. Ann Emerg Med. 2013 Jun;61(6):605–11.e6.
- De Georgia MA, Kaffashi F, Jacono FJ, Loparo KA. Information technology in critical care: review of monitoring and data acquisition systems for patient care and research. ScientificWorldJournal. 2015 Feb 4;2015:727694.
- Seh AH, Zarour M, Alenezi M, Sarkar AK, Agrawal A, Kumar R, et al. Healthcare Data Breaches: Insights and Implications. Healthcare (Basel) [Internet]. 2020 May 13;8(2). Available from: http://dx.doi.org/10.3390/healthcare8020133
- Johnson A, Bulgarelli L, Pollard T, Celi LA, Mark R, Horng S. MIMIC-IV-ED [Internet]. PhysioNet; 2023. Available from: https://physionet.org/content/mimic-iv-ed/2.2/
- Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023 Jan 3;10(1):1.
- Johnson AEW, Pollard TJ, Shen L, Lehman LWH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016 May 24;3:160035.
- Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018 Sep 11;5:180178.
- Yèche H, Kuznetsova R, Zimmermann M, Hüser M, Lyu X, Faltys M, et al. HiRID-ICU-Benchmark -- A Comprehensive Machine Learning Benchmark on High-resolution ICU Data [Internet]. arXiv [cs.LG]. 2021. Available from: http://arxiv.org/abs/2111.08536
- Thoral PJ, Peppink JM, Driessen RH, Sijbrands EJG, Kompanje EJO, Kaplan L, et al. Sharing ICU Patient Data Responsibly Under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) Example. Crit Care Med. 2021 Jun 1;49(6):e563–77.
- U.S. Department of Health and Human Services, Office for Civil Rights. Guidance regarding methods for de-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule [Internet]. 2012. Available from: https://privacysecurityacademy.com/wp-content/uploads/2021/03/HHS-OCR-Guidance-on-De-Identification-of-PHI-2012.pdf
- Chambon PJ, Wu C, Steinkamp JM, Adleberg J, Cook TS, Langlotz CP. Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods. J Am Med Inform Assoc. 2023 Jan 18;30(2):318–28.
- Chen E, Kansal A, Chen J, Jin BT, Reisler J, Kim DE, et al. Multimodal Clinical Benchmark for Emergency Care (MC-BEC): A Comprehensive Benchmark for Evaluating Foundation Models in Emergency Medicine. Advances in Neural Information Processing Systems. 2023 Dec 15;36:45794–811.
- GitHub - dkimlab/MCMED [Internet]. GitHub. [cited 2025 Jan 8]. Available from: https://github.com/dkimlab/MCMED
- Wornow M, Thapa R, Steinberg E, Fries JA, Shah NH. EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models [Internet]. arXiv [cs.LG]. 2023. Available from: http://arxiv.org/abs/2307.02028
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/jz99-4j81
DOI (latest version):
https://doi.org/10.13026/xgx1-7x47
Files
To access the files, you must:
- be a credentialed user
- complete the required training: CITI Data or Specimens Only Research
- sign the data use agreement for the project