Database Credentialed Access

Multimodal Clinical Monitoring in the Emergency Department (MC-MED)

Aman Kansal, Emma Chen, Tom Jin, Pranav Rajpurkar, David Kim

Published: March 3, 2025. Version: 1.0.0


When using this resource, please cite:
Kansal, A., Chen, E., Jin, T., Rajpurkar, P., & Kim, D. (2025). Multimodal Clinical Monitoring in the Emergency Department (MC-MED) (version 1.0.0). PhysioNet. https://doi.org/10.13026/jz99-4j81.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Emergency department (ED) patients often present with undiagnosed complaints, and can exhibit rapidly evolving physiology. Therefore, data from continuous physiologic monitoring, in addition to the electronic health record, are essential for understanding the acute course of illness and responses to interventions. The complexity of ED care and the large amount of unstructured multimodal data it produces have limited the accessibility of detailed ED data for research. We release Multimodal Clinical Monitoring in the Emergency Department (MC-MED), a comprehensive, multimodal, and de-identified clinical and physiological dataset. MC-MED includes 118,385 adult ED visits to an academic medical center from 2020 to 2022. Data include continuously monitored vital signs, physiologic waveforms (electrocardiogram, photoplethysmogram, respiration), patient demographics, medical histories, orders, medication administrations, laboratory and imaging results, and visit outcomes. MC-MED is the first dataset to combine detailed physiologic monitoring with clinical events and outcomes for a large, diverse ED population.


Background

Emergency departments (EDs) play a critical role in evaluating and treating patients with a wide range of medical conditions, and ED care has major implications for patient outcomes, healthcare costs, and downstream inpatient and ambulatory care[1,2]. ED patients often present with undifferentiated complaints, and the nature and severity of their condition may become clear only over the course of the visit, which can include rapidly changing physiology. Thus, ED patients undergo continuous monitoring while receiving a large variety of diagnostic and therapeutic interventions, resulting in a large volume of heterogeneous, time-varying data. These data include triage reports, free-text notes, continuously monitored vital signs and physiologic waveforms such as electrocardiogram (ECG) and photoplethysmography (PPG), medication administration logs, laboratory and imaging results, diagnoses, disposition decisions, and subsequent encounters. While structured data elements are typically stored in the electronic health record (EHR), other types of data, such as the high-resolution time series produced by bedside monitoring, are seldom integrated with clinical data, and are often discarded due to their size[3]. Moreover, noise, missingness, and the ubiquity of protected health information can make these data challenging to navigate, and limit their availability for research[4].

Few comprehensive ED datasets exist for general use. Currently, the only publicly available ED dataset is MIMIC-IV-ED[5], a module of MIMIC-IV[6]. MIMIC-IV contains data from the ED and intensive care unit (ICU) of Beth Israel Deaconess Medical Center, from 2008 to 2019, including 73K ICU admissions and 358K ED visits. While ICU data include continuous vital signs and physiologic waveforms, ED visits include only infrequent vital sign measurements. Other ICU-only datasets include MIMIC-III[7], eICU[8], HiRID[9], and AmsterdamUMCdb[10]. These contain structured EHR data, and vital signs recorded at various frequencies, with 1-minute intervals being most common. None contain physiologic waveforms (Table 10).

We present Multimodal Clinical Monitoring in the Emergency Department (MC-MED), a first-of-its-kind dataset containing multimodal clinical and physiological data from 118,385 adult ED visits to monitored beds of the Stanford adult ED between September 2020 and September 2022. The dataset includes: patient demographics, medical histories, and home medications; continuously monitored vital signs and ECG, PPG, and respiratory waveforms; orders placed and medications administered during the visit; laboratory and imaging results; diagnoses, visit disposition, and length of stay. Figure 1 presents an overview of the time-varying data modalities included throughout an ED visit. MC-MED differs from existing datasets in its focus on a diverse ED population and its inclusion of continuously recorded vital signs and physiologic waveforms. Moreover, it is the first dataset to exclusively cover ED patients during and after the peak of the COVID-19 pandemic. Thus, MC-MED represents a valuable resource for researchers exploring many aspects of modern emergency care, with a focus on granular physiological measurements.


Methods

Acquisition & Transformation

MC-MED includes 118,385 adult ED visits to monitored beds of the Stanford Health Care Emergency Department between 2020 and 2022, from 70,545 unique patients aged 18 or older at the time of visit. Clinical EHR data were derived from the STAnford medicine Research data Repository (STARR), a clinical data warehouse containing data from the Epic EHR at Stanford Health Care, and from auxiliary hospital applications such as the radiology Picture Archiving and Communications System. Continuously monitored vital signs and physiologic waveforms were captured with Philips IntelliVue bedside monitors, stored in a separate data warehouse, and extracted using Philips PIC iX DWC Toolkit (C.03.31). The data acquisition, transformation, and de-identification processes are documented below. The study was approved by the Stanford University Institutional Review Board (58581).

MC-MED data is organized into four categories: (1) structured EHR data (visit data, prior diagnoses, home medications, laboratory results, orders), (2) free-text radiology reports, (3) continuously monitored vital signs, and (4) ECG, PPG, and respiratory waveforms. Data categories 1-3 are consolidated in tables and stored as CSV files. Physiologic waveforms are stored as WaveForm DataBase (WFDB) files, in folders nested by visit identifier (CSN).

Deidentification

MC-MED underwent a comprehensive deidentification process to remove all patient identifiers specified in the HIPAA Privacy Rule[11]. Patient (MRN) and visit (CSN) identifiers were mapped to random integers, and ages and date-times were randomly shifted at the patient level. Free-text radiology impressions were processed to remove any protected health information or information about specific providers, and manually verified by human reviewers. Figure 2 illustrates the de-identification process. Specific data elements were de-identified as follows:

  • Medical Record Number (MRN) is a unique patient-level identifier. We randomly generated a new unique integer to replace each original MRN.
  • Contact Serial Number (CSN) is a unique visit identifier, allowing linkage of the various data elements of MC-MED. We randomly generated a unique integer to replace each CSN.
  • Age at time of visit was altered by adding or subtracting a uniformly random number of years ranging from 0 to 2. Adjusted ages below 18 (all reflecting actual ages of at least 18) were set to 18, and ages exceeding 90 were set to 90. This ensures that actual ages are obscured, while providing researchers with an accurate age range for analysis.
  • Datetime fields are shifted by a random interval for each patient (MRN). We generated a random timeshift for each MRN, shifting all of that patient's datetime fields across all data elements to new values anchored between 2150 and 2350, preserving seasonality (January-March, April-June, July-September, and October-December). Shifted datetime fields include: Arrival_time, Roomed_time, Admit_time, Dispo_time, and Departure_time (visits table); Order_time, Result_time, and First_admin_time (orders table); Entry_date, Start_date and End_date (home medications table); Noted_date (past medical history table); Order_time and Result_time (labs and rads tables).
  • Laboratory results with qualitative free-text interpretations unique to a specific patient were removed.
  • Radiology reports were deidentified with the Stanford MIDRC Penn Deidentifier [12], an automated deidentification model designed to remove PHI from free-text radiology reports. Any PHI was replaced with three underscores ("___").
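The identifier and age steps above can be sketched in a few lines. This is an illustration only, not the authors' pipeline: the function names, the whole-year offset drawn uniformly from -2 to 2, and the eight-digit identifier range are assumptions made for the example.

```python
import random

def perturb_age(age, rng, max_shift=2, floor=18, cap=90):
    """Shift an age by a random whole number of years, then clamp to [floor, cap].

    Assumption: the offset is an integer drawn uniformly from -max_shift..max_shift.
    """
    shifted = age + rng.randint(-max_shift, max_shift)
    return max(floor, min(cap, shifted))

def remap_ids(ids, rng, low=10_000_000, high=99_999_999):
    """Map each distinct original identifier to a distinct random integer.

    Assumption: eight-digit replacement IDs, chosen only for illustration.
    """
    unique = sorted(set(ids))
    replacements = rng.sample(range(low, high + 1), k=len(unique))
    return dict(zip(unique, replacements))

rng = random.Random(0)  # fixed seed so the example is repeatable
shifted_ages = [perturb_age(a, rng) for a in (18, 19, 45, 89, 97)]
id_map = remap_ids(["A", "B", "A", "C"], rng)
```

Clamping after the shift is what keeps every released age between 18 and 90, at the cost of a small pile-up at both ends of the range.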

All de-identified data elements were manually inspected for residual PHI by author and non-author reviewers.

Waveform and Vital Sign Preprocessing

In the study hospital's monitoring database, waveform and numeric data recorded by the bedside monitors are associated with ED beds rather than patient visits. We therefore used data on patient rooming locations, and rooming and departure times to segment continuously recorded vital signs and waveforms by patient visit. We used Python's WFDB library to process and export this waveform data.

Because visits are associated with variable periods of monitoring of different modalities (for instance, ECG leads and PPG probes may be detached during patient movement and transport, then reattached upon return to the room), we present waveform data for each visit in multiple segments for each modality (ECG, PPG, respiration), and exclude recordings without physiologically meaningful signals (for instance, from detached leads). Specifically, waveform segments with constant values for 10 seconds or longer were removed. We computed derivative waveforms, w', where w'[i+1] = w[i+1]-w[i]. We then removed waveform segments for which w' was 0 for 10 seconds or greater. This processing ensures efficient representation and reliability of waveform data linked to complete ED visits.
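The flat-signal rule above amounts to finding runs where the first difference w' stays at zero. The sketch below is one way to locate such runs, not the authors' code: the function name is hypothetical, the sampling rate `fs` must be supplied by the caller, and a run is counted as flat once it spans at least `fs * 10` identical consecutive samples, which is an assumption about how the 10-second threshold is measured.

```python
def flat_runs(samples, fs, min_seconds=10.0):
    """Return (start, end) index pairs, end exclusive, of constant-valued runs.

    A run qualifies when it contains at least fs * min_seconds samples
    (assumed interpretation of "constant for 10 seconds or longer").
    """
    min_len = int(round(fs * min_seconds))
    runs = []
    start = 0
    for i in range(1, len(samples) + 1):
        # A run ends at the end of the signal or where the difference is nonzero.
        if i == len(samples) or samples[i] != samples[i - 1]:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs

# Toy signal at a made-up 5 Hz rate: 3 varying samples, 50 flat samples, 3 varying.
toy = [0, 1, 2] + [7] * 50 + [1, 2, 3]
flat = flat_runs(toy, fs=5)
```

The returned index pairs can then be cut out of the recording, leaving the surrounding pieces as separate segments.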

Train-Validation-Test Splits

Though researchers may segment MC-MED in the manner most appropriate for their research question, we release two training/validation/test splits for general use. For both splits, the training set contains 80% of visits, and validation and test sets each contain 10% of visits.

Random patient-level split: CSNs (visits) corresponding to the same MRN (patient) are present in the same set: split_random_train.csv, split_random_val.csv and split_random_test.csv.
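A patient-level split of this kind can be sketched as follows. The function name and seed are hypothetical, and the sketch assigns 80/10/10 of patients rather than of visits, so visit proportions will only approximate the released splits.

```python
import random

def split_by_patient(visits, seed=0, frac_train=0.8, frac_val=0.1):
    """Assign every visit of a patient to the same set.

    visits: list of (csn, mrn) pairs. Returns {"train": [...], "val": [...], "test": [...]}
    with CSNs as values.
    """
    mrns = sorted({mrn for _, mrn in visits})
    random.Random(seed).shuffle(mrns)
    n_train = int(len(mrns) * frac_train)
    n_val = int(len(mrns) * frac_val)
    assignment = {}
    for i, mrn in enumerate(mrns):
        if i < n_train:
            assignment[mrn] = "train"
        elif i < n_train + n_val:
            assignment[mrn] = "val"
        else:
            assignment[mrn] = "test"
    splits = {"train": [], "val": [], "test": []}
    for csn, mrn in visits:
        splits[assignment[mrn]].append(csn)
    return splits

# Toy data: 10 patients with 2 visits each (identifiers are made up).
toy_visits = [(csn, csn // 2) for csn in range(20)]
toy_splits = split_by_patient(toy_visits)
```

Shuffling patients rather than visits is what prevents the same MRN from appearing in more than one set.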

Chronological split: All visits in the validation set occur after the final visit in the training set, and all visits in the test set occur after the final visit in the validation set. To prevent patient data leakage between sets, each patient (MRN) is again restricted to only one of the training, validation, or test sets. This results in 13,007 visits being removed from these sets, and exact splits of 78%, 11%, and 11% for split_chrono_train.csv, split_chrono_val.csv, and split_chrono_test.csv.


Data Description

The Multimodal Clinical Monitoring in the Emergency Department (MC-MED) dataset offers a comprehensive collection of de-identified emergency department (ED) patient visits, encompassing both clinical data and continuous physiological waveforms. The dataset is structured to facilitate easy navigation and analysis, organized as follows:

./
├── labs.csv
├── meds.csv
├── numerics.csv
├── orders.csv
├── pmh.csv
├── rads.csv
├── split_chrono_train.csv
├── split_chrono_val.csv
├── split_chrono_test.csv
├── split_random_train.csv
├── split_random_val.csv
├── split_random_test.csv
├── visits.csv
├── waveform_summary.csv
└── waveforms

Top-Level Files:

  • visits.csv: Contains high-level information for each ED visit, including patient demographics (age, sex), arrival method, chief complaint, disposition, and event timings.
  • pmh.csv (Past Medical History): Lists historical diagnoses for patients, with corresponding ICD-9/ICD-10 codes and descriptions.
  • meds.csv (Home Medications): Details patients' home medications, including start and end dates, along with coded medication identifiers.
  • orders.csv: Records orders placed during the ED visit, such as labs, imaging studies, and medications, along with their timestamps.
  • labs.csv: Provides laboratory test results, including component values, abnormal flags, and reference ranges.
  • rads.csv (Radiology): Contains information on imaging studies, including study names and summarized impressions.
  • numerics.csv: Offers minute-level numeric vital signs recorded during the ED stay, such as heart rate (HR), respiratory rate (RR), oxygen saturation (SpO2), systolic and diastolic blood pressure (SBP, DBP), mean arterial pressure (MAP), temperature (Temp), perfusion index (Perf), pain score, oxygen flow rate (LPM_O2), and heart rate variability metrics (1min_HRV, 5min_HRV).
  • waveform_summary.csv: Summarizes available waveform segments (e.g., ECG, Pleth, Resp) for each visit, including total duration and segment count.
  • split_*.csv files: Provide predefined training, validation, and test splits. Two types of splits are available:
    • split_random_*.csv: Random 80/10/10 split by patient.
    • split_chrono_*.csv: Chronological split ensuring no patient overlap between sets.

Waveform Data:

The waveform data are organized into the waveforms folder, which has the following structure:

  {CSN_suffix}/              # Folder named by last three digits of the CSN
        {Full_CSN}/            # Folder named by the full CSN (visit identifier)
            II/                # ECG waveform segments
                {Full_CSN}_{segment_number}.dat
                {Full_CSN}_{segment_number}.hea
            Pleth/             # PPG waveform segments
                {Full_CSN}_{segment_number}.dat
                {Full_CSN}_{segment_number}.hea
                ...
            Resp/              # Respiration waveform segments
                {Full_CSN}_{segment_number}.dat
                {Full_CSN}_{segment_number}.hea
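Given this layout, the record path for a segment can be built from the CSN alone. The helper below is a hypothetical sketch: the root folder name follows the top-level listing, and the segment number is passed through unchanged because its exact format (starting index, zero padding) is not specified here. The extension is left off because WFDB readers, such as `rdrecord` in the Python wfdb package, take a record name without `.dat` or `.hea`.

```python
from pathlib import PurePosixPath

WAVEFORM_TYPES = ("II", "Pleth", "Resp")

def waveform_record_path(root, csn, wave_type, segment):
    """Build the extension-free record path for one waveform segment."""
    if wave_type not in WAVEFORM_TYPES:
        raise ValueError(f"unknown waveform type: {wave_type!r}")
    csn = str(csn)
    # Parent folder is the last three digits of the CSN, then the full CSN.
    return PurePosixPath(root) / csn[-3:] / csn / wave_type / f"{csn}_{segment}"

# Example with the sample CSN from the visits table; segment 1 is a made-up value.
example_path = waveform_record_path("waveforms", 98874959, "II", 1)
```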

Data elements are described in the following sections (and separately in the data dictionary file), and Table 11 summarizes patient/visit characteristics and statistics.

Visits

The visits table describes high-level characteristics of each visit. Data available at the time of patient arrival include: patient demographics ("Age", "Gender", "Race", "Ethnicity"), means of arrival to the ED ("Means_of_arrival"), triage vital signs ("Triage_Temp", "Triage_HR", "Triage_RR", "Triage_SpO2", "Triage_SBP", "Triage_DBP"), triage acuity by Emergency Severity Index (ESI) ("Triage_acuity"), and chief complaint ("CC"). Data summarizing the visit itself include ED disposition ("ED_dispo"), ED length of stay in hours ("ED_LOS"), class of primary visit payor ("Payor_class"), and primary diagnosis, by ICD9 (International Classification of Diseases, Ninth Revision) and ICD10 (International Classification of Diseases, Tenth Revision) codes, accompanied by free-text descriptions ("Dx_name"). For patients admitted to the hospital, the table includes admitting service ("Admit_service"), hospital length of stay in days ("Hosp_LOS"), and disposition on hospital discharge ("DC_dispo"). Shifted timestamps include: "Arrival_time" (arrival in ED), "Roomed_time" (first rooming), "Dispo_time" (time of disposition decision), "Admit_time" (time of admission), and "Departure_time" (time of departure from ED). Finally, the visits table includes the number of visits from a given patient in the dataset ("Visits"), the sequence of a given visit ("Visit_no"), the hours from ED departure until a patient's next ED visit ("Hours_to_next_visit"), and the disposition of the next ED visit ("Dispo_class_next_visit"). The visits table can be linked to other tables by CSN (orders, labs, rads, numerics) or MRN (meds, PMH).

Table 1: Visits data elements

Column Name | Description | Data Type | Sample Data
MRN | Patient identifier (Random ID mapped to original MRN) | int | 99940664
CSN | Visit identifier (Random ID mapped to original CSN) | int | 98874959
Visit_no | The visit number for this patient in the dataset | int | 1
Visits | Total number of visits for this patient in the dataset | int | 1
Age | Patient age in years at time of visit (Random perturbation of age +/- 2 years, Ages greater than 90 set to 90) | int | 90
Gender | Patient gender | string | F
Race | Patient race | string | White
Ethnicity | Patient Hispanic ethnicity | string | Non-Hispanic/Non-Latino
Means_of_arrival | Means of arrival to the ED | string | Self
Triage_Temp | Temperature at triage (C) | float | 36.7
Triage_HR | Heart rate at triage (bpm) | float | 80.0
Triage_RR | Respiratory rate at triage (breaths per min.) | float | 18.0
Triage_SpO2 | Oxygen saturation at triage (%) | float | 100.0
Triage_SBP | Systolic blood pressure at triage (mmHg) | float | 128.0
Triage_DBP | Diastolic blood pressure at triage (mmHg) | float | 78.0
Triage_acuity | Emergency Severity Index (ESI) at triage (1-5) | string | 3-Urgent
CC | Chief complaint(s) at triage | string | ABDOMINAL PAIN
ED_dispo | Disposition of patient from the ED | string | Discharge
Hours_to_next_visit | For patients with a subsequent visit, number of hours from departure of current visit to arrival of next visit | float | 40.0
Dispo_class_next_visit | Dispo_class for next visit | string | Discharge
ED_LOS | Length of ED stay, hours | float | 4.82
Hosp_LOS | Length of hospital stay (including post-ED admission), days | float | 1.0
DC_dispo | Final disposition of patient from the hospital | string | Home/Work (includes foster care)
Payor_class | Class of primary visit payor | string | Medicare
Admit_service | For admitted patients, service admitting the patient from the ED | string | Emergency Medicine
Dx_ICD9 | Primary visit diagnosis, ICD9 code | string | 786.50
Dx_ICD10 | Primary visit diagnosis, ICD10 code | string | R07.9
Dx_name | Name of ICD10 code | string | Chest pain, unspecified type
Arrival_time | Time of arrival to ED (Random-shift date, keeping season constant) | datetime | 2262-01-09T03:16:07Z
Roomed_time | Time of patient rooming (Random-shift date, keeping season constant) | datetime | 2283-03-02T07:36:59Z
Dispo_time | Time of disposition decision (Random-shift date, keeping season constant) | datetime | 2247-09-22T10:54:42Z
Admit_time | For admitted patients, time of admission order (Random-shift date, keeping season constant) | datetime | 2283-03-02T12:29:59Z
Departure_time | Time of departure from ED (Random-shift date, keeping season constant) | datetime | 2209-08-12T11:31:38Z
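The shifted timestamps in the visits table can be parsed with the standard library, as sketched below. The format string is inferred from the sample values, and the departure time in the example is made up. One practical caution: the shifted years run up to 2350, while pandas' default nanosecond-resolution timestamps cannot represent dates after 2262-04-11, so a direct conversion there may fail or need a coarser time unit. Because shifts are applied per patient, intervals are meaningful within one patient's records but not across patients.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # inferred from sample values such as 2262-01-09T03:16:07Z

def parse_shifted(ts):
    """Parse a shifted timestamp string into a datetime."""
    return datetime.strptime(ts, TIME_FORMAT)

def hours_between(start, end):
    """Elapsed hours between two shifted timestamps of the same patient."""
    return (parse_shifted(end) - parse_shifted(start)).total_seconds() / 3600

# Arrival is the sample value from Table 1; the departure time is hypothetical.
stay_hours = hours_between("2262-01-09T03:16:07Z", "2262-01-09T08:05:19Z")
```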

Orders

The orders table contains all orders placed by the ED physician during the visit, and is linked to other tables by CSN. "Order_type" categorizes orders, e.g., lab tests, imaging, medications, consults, nursing orders. "Procedure_name" describes the order, and "Procedure_ID" gives an accompanying procedure code. The following timestamps are shifted at the MRN level: "Order_time" describes when an order is placed, "First_admin_time" when a medication order is administered to a patient, and "Result_time" when a laboratory or imaging order produces a reported result.

Table 2: Orders data elements

Column Name | Description | Data Type | Sample Data
CSN | Visit identifier (Random ID mapped to original CSN) | int | 99139687
Order_time | Time of order (Random-shift date, keeping season constant) | datetime | 2226-01-15T15:39:22Z
Order_type | Type of order (lab, imaging, medication, etc) | string | Lab
Procedure_name | Name of order | string | CBC WITH DIFFERENTIAL
Procedure_ID | Identifier for order (Mapped to CPT codes) | string | LABMETC
First_admin_time | For medications, time of first administration (Random-shift date, keeping season constant) | datetime | 2212-11-03T16:51:00Z
Result_time | For lab and imaging tests, time of result (Random-shift date, keeping season constant) | datetime | 2295-08-25T16:55:28Z
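Linking orders to visits by CSN can be sketched with the standard csv module. The rows below are invented toy data that reuse only the column names documented here; real files would be opened from disk instead of from strings.

```python
import csv
import io
from datetime import datetime

# Toy stand-ins for visits.csv and orders.csv (all values are made up).
visits_csv = """CSN,MRN,Arrival_time
1001,501,2200-01-01T10:00:00Z
1002,502,2250-06-01T08:00:00Z
"""
orders_csv = """CSN,Order_time,Order_type
1001,2200-01-01T10:30:00Z,Lab
1001,2200-01-01T11:15:00Z,Imaging
1002,2250-06-01T08:05:00Z,Medication
"""

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

# Index visits by CSN, then look up each order's visit to get minutes since arrival.
arrival_by_csn = {
    row["CSN"]: parse(row["Arrival_time"])
    for row in csv.DictReader(io.StringIO(visits_csv))
}
minutes_to_order = []
for row in csv.DictReader(io.StringIO(orders_csv)):
    delta = parse(row["Order_time"]) - arrival_by_csn[row["CSN"]]
    minutes_to_order.append((row["CSN"], row["Order_type"], delta.total_seconds() / 60))
```

Since both timestamps belong to the same visit, they share the same patient-level shift and their difference is unaffected by it.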

Meds

The meds table contains patient home medications, organized by patient (MRN). "Med_ID" gives a unique medication code, and "NDC" the National Drug Code, where available. "Name" and "Generic_name" describe the medication. "Med_class" gives a high-level classification of the medication, and "Med_subclass" a more detailed classification. "Active" indicates whether a patient was thought to be using the medication at the time of the visit. "Start_date" and "End_date" give shifted dates of medication initiation and termination, where applicable. These dates can be used to identify home medications at the time of a given visit.

Table 3: Meds data elements

Column Name | Description | Data Type | Sample Data
MRN | Patient identifier (Random ID mapped to original MRN) | int | 99721983
Med_ID | Medication identifier (mapped to NDC id) | int | 14113
NDC | National Drug Code identifier | string | 69618-066-10
Name | Medication name | string | ASPIRIN 81 MG PO TBEC
Generic_name | Generic name | string | aspirin 81 mg tablet,delayed release
Med_class | High-level classification of the medication | string | VITAMIN D PREPARATIONS
Med_subclass | A more detailed classification | string | Vitamins - D Derivatives
Active | Indicates whether a patient was thought to be using the medication at the time of the visit | string | Y
Entry_date | Medication entry date (Random-shift date, keeping season constant) | date | 2270-07-20T00:00:00Z
Start_date | Medication start date (Random-shift date, keeping season constant) | date | 2275-08-31T00:00:00Z
End_date | Medication end date (Random-shift date, keeping season constant) | date | 2241-08-06T00:00:00Z
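One way to use Start_date and End_date to pick out the home medications in effect at a given visit is sketched below. The function name is hypothetical, and two assumptions are made about the data: missing dates appear as empty strings, and a missing start or end date means the interval is open on that side.

```python
from datetime import datetime

def parse_or_none(ts):
    """Parse a shifted date string; empty or missing values become None."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ") if ts else None

def meds_at_visit(med_rows, mrn, arrival_time):
    """Names of a patient's medications whose date range covers the arrival time."""
    arrival = parse_or_none(arrival_time)
    current = []
    for row in med_rows:
        if row["MRN"] != mrn:
            continue
        start = parse_or_none(row["Start_date"])
        end = parse_or_none(row["End_date"])
        # Assumption: a missing bound leaves the interval open on that side.
        if (start is None or start <= arrival) and (end is None or end >= arrival):
            current.append(row["Name"])
    return current

# Toy rows (made up) using the documented column names.
toy_meds = [
    {"MRN": "1", "Name": "ASPIRIN 81 MG PO TBEC", "Start_date": "2200-01-01T00:00:00Z", "End_date": ""},
    {"MRN": "1", "Name": "OLD MED", "Start_date": "2199-01-01T00:00:00Z", "End_date": "2199-06-01T00:00:00Z"},
    {"MRN": "2", "Name": "OTHER", "Start_date": "", "End_date": ""},
]
active_names = meds_at_visit(toy_meds, "1", "2200-03-01T12:00:00Z")
```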

PMH

The PMH (Past Medical History) table contains prior diagnoses, organized by patient (MRN). "Noted_date" gives the shifted date when the diagnosis was recorded, and can be used to identify known diagnoses at the time of a given visit. "CodeType" specifies whether the "Code" should be interpreted as an ICD9 or ICD10 code. "Desc10" gives a text description of the ICD code. "CCS" gives the Clinical Classification Software category of the diagnosis, and "DescCCS" a text description of the CCS category.

Table 4: PMH data elements

Column Name | Description | Data Type | Sample Data
MRN | Patient identifier (Random ID mapped to original MRN) | int | 99084665
Noted_date | Date when the diagnosis was recorded (Random-shift date, keeping season constant) | date | 2219-08-10T00:00:00Z
CodeType | Whether code is ICD9 or ICD10 | string | Dx10
Code | Diagnosis code | string | I10
Desc10 | Text description of the code | string | Essential (primary) hypertension
CCS | Clinical Classification Software category of the diagnosis | float | 259.0
DescCCS | Text description of the CCS category | string | Residual codes; unclassified

Labs

The labs table gives results for lab tests ordered during the ED visit (CSN). "Display_name" describes the test or panel of tests (e.g. comprehensive metabolic panel), while "Component_name" describes the specific measurement (e.g. serum sodium). "Abnormal" indicates whether any result in the test falls outside the normal range, and "Component_abnormal" whether a specific measurement is abnormal. "Component_result" gives the specific result, which may be numeric or categorical, while "Component_value" assigns a numeric value to all results. "Component_units" gives the units in which "Component_value" is measured, and "Component_nml_low" and "Component_nml_high" describe the normal range, where applicable. "Order_time" is the shifted time the test was ordered (which may not exactly match the timestamps in the orders table, since the laboratory may modify or further specify orders), and "Result_time" gives the time the result was reported.

Table 5: Labs data elements

Column Name | Description | Data Type | Sample Data
CSN | Visit identifier (Random ID mapped to original CSN) | int | 99880957
Order_time | Time of lab order (Random-shift date, keeping season constant) | datetime | 2221-07-03T18:37:12Z
Result_time | Time of lab result (Random-shift date, keeping season constant) | datetime | 2233-07-07T04:37:14Z
Display_name | Name of order | string | CBC with Differential (CBCD)
Abnormal | Flag for abnormal or critical result | string | Abnormal
Component_name | Name of lab component | string | SODIUM
Component_result | Lab result (Removed results containing dates or names) | string | Negative
Component_value | Lab value (Sometimes identical to result, sometimes different, e.g. result may be categorical and value numeric. Removed results containing dates or names) | string | 1
Component_units | Units of component_value, where applicable | string | %
Component_abnormal | Flag for abnormal component_value | string | Normal
Component_nml_low | Low end of normal range for component | float | 0.0
Component_nml_high | High end of normal range for component | float | 5.2
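Because Component_value is stored as a string, a defensive conversion helps before comparing it with the normal range. The sketch below is illustrative: the function names are hypothetical, and returning None for non-numeric values or missing ranges is a design choice for the example rather than documented dataset behavior.

```python
def to_float(text):
    """Convert text to float; return None when it is missing, non-numeric, or NaN."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return None if value != value else value  # NaN is the only value not equal to itself

def outside_normal_range(row):
    """True/False when the value can be compared with its range, None otherwise."""
    value = to_float(row.get("Component_value"))
    low = to_float(row.get("Component_nml_low"))
    high = to_float(row.get("Component_nml_high"))
    if value is None or (low is None and high is None):
        return None
    if low is not None and value < low:
        return True
    if high is not None and value > high:
        return True
    return False

# Toy row (made up) with a value above the upper bound.
flag = outside_normal_range(
    {"Component_value": "6.1", "Component_nml_low": "0.0", "Component_nml_high": "5.2"}
)
```

The table's own "Component_abnormal" flag remains the authoritative indicator; a check like this is mainly useful for validating parsed values.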

Rads

The rads table contains results of imaging studies ordered during each visit (CSN). "Study" is the type of imaging test (e.g., "XR HAND 3 VIEWS LEFT"), and "Impression" gives the de-identified free text summary of the resulting radiology report. As for the labs table, "Order_time" is the shifted time the study was ordered, and "Result_time" reflects when the attending radiologist's impression was posted.

Table 6: Rads data elements

Column Name | Description | Data Type | Sample Data
CSN | Visit identifier (Random ID mapped to original CSN) | int | 99002166
Order_time | Time of imaging order (Random-shift date, keeping season constant) | datetime | 2205-07-07T11:11:36Z
Result_time | Time of imaging result (Random-shift date, keeping season constant) | datetime | 2291-10-02T05:06:13Z
Study | Imaging study name | string | XR CHEST 1 VIEW
Impression | Imaging result impression (de-identified with [12]) | string | "1. No acute cardiopulmonary disease."

Numerics

The numerics table contains the non-waveform monitoring data, by visit (CSN). "Measure" indicates one of 12 measurements: heart rate (HR), respiratory rate (RR), oxygen saturation by pulse oximetry (SpO2), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), temperature in degrees Fahrenheit (Temp), mean last-minute perfusion index derived from the PPG waveform (Perf), pain rating on 0-10 scale (Pain), liters per minute of supplemental oxygen (LPM_O2), and heart rate variability over the last 1 minute (1min_HRV) or 5 minutes (5min_HRV), calculated as the standard deviation of the beat-to-beat RR interval of the ECG waveform over this period. "Value" gives the accompanying numeric value of the observation. Where underlying observations are made more frequently than once per minute, they are aggregated to the mean value over the 60 seconds preceding the timestamp given in "Time". "Source" indicates whether a value was recorded by nursing (Chart) or derived directly from the monitoring database (Monitor).

Table 7: Numerics data elements

Column Name | Description | Data Type | Sample Data
CSN | Visit identifier (Random ID mapped to original CSN) | int | 99833461
Source | Indicates whether a value was recorded by nursing (Chart) or derived directly from the monitoring database (Monitor) | string | Monitor
Measure | One of 12 measurements (listed below) | string | SpO2
Value | Observation value | float | 100.0
Time | Timestamp of the measurement. When underlying observations are made more frequently than once per minute, they are aggregated to the mean value over the 60 seconds preceding the timestamp. | string | 2247-05-09T06:40:42Z

Measure values:

  • HR: Heart rate, heartbeats per minute
  • RR: Respiratory rate, breaths per minute
  • SpO2: Oxygen saturation by pulse oximetry, %
  • SBP: Systolic blood pressure, mmHg
  • DBP: Diastolic blood pressure, mmHg
  • MAP: Mean arterial pressure, mmHg
  • Temp: Body temperature, degrees Fahrenheit
  • Perf: Mean last-minute perfusion index derived from the PPG waveform
  • Pain: Self-reported pain rating, 0 (no pain) to 10 (worst pain)
  • LPM_O2: Flow rate of supplemental oxygen, liters per minute
  • 1min_HRV or 5min_HRV: Heart rate variability over the last 1 minute or 5 minutes, calculated as the standard deviation of the beat-to-beat RR interval of the ECG waveform over this period.
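The numerics table is in long format, with one row per measurement, so many analyses start by pivoting it to one record per visit and timestamp. The sketch below uses hypothetical function names and toy rows. It also converts Temp from Fahrenheit to Celsius, which is useful when comparing with Triage_Temp in the visits table, reported in Celsius. If Chart and Monitor rows share a CSN, time, and measure, this sketch simply keeps the last one read.

```python
from collections import defaultdict

def pivot_numerics(rows):
    """Turn long-format rows into {(CSN, Time): {Measure: Value}}."""
    wide = defaultdict(dict)
    for row in rows:
        wide[(row["CSN"], row["Time"])][row["Measure"]] = float(row["Value"])
    return dict(wide)

def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32.0) * 5.0 / 9.0

# Toy rows (made up) using the documented column names.
toy_numerics = [
    {"CSN": "1001", "Source": "Monitor", "Measure": "HR", "Value": "80", "Time": "2247-05-09T06:40:42Z"},
    {"CSN": "1001", "Source": "Monitor", "Measure": "SpO2", "Value": "100.0", "Time": "2247-05-09T06:40:42Z"},
    {"CSN": "1001", "Source": "Chart", "Measure": "Temp", "Value": "98.6", "Time": "2247-05-09T06:45:00Z"},
]
wide = pivot_numerics(toy_numerics)
temp_c = fahrenheit_to_celsius(wide[("1001", "2247-05-09T06:45:00Z")]["Temp"])
```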

Waveforms

The waveforms folder contains ECG (electrocardiogram), PPG (photoplethysmogram), and Resp (respiration) waveform data as WFDB records. The waveform data is organized into folders by visit. Parent folders are named by the last 3 digits of the CSN. These folders contain CSN-level folders, which contain subfolders for each waveform data type (ECG, PPG, respiration), containing multi-segment WFDB-compatible records of the waveforms.

Waveform Summary File

Table 8: Waveform summary data elements

Column Name | Description | Data Type | Sample Data
CSN | Visit identifier (Random ID mapped to original CSN) | int | 99633476
Type | One of 3 waveforms: II (Electrocardiogram), Pleth (Photoplethysmogram), Resp (Respiration) | string | Pleth
Segments | The number of waveform segments | int | 1
Duration | The total duration for all segments for the waveform type, in seconds | float | 119.984
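The summary file makes it possible to select visits with enough waveform data before opening any WFDB records. The sketch below uses a hypothetical function name and toy rows, and picks visits whose total duration meets a threshold for every requested waveform type.

```python
def csns_with_min_duration(summary_rows, min_seconds, types=("II", "Pleth", "Resp")):
    """CSNs whose total duration is at least min_seconds for each type in `types`."""
    durations = {}
    for row in summary_rows:
        durations.setdefault(row["CSN"], {})[row["Type"]] = float(row["Duration"])
    return sorted(
        csn
        for csn, by_type in durations.items()
        # A type missing from the summary counts as zero seconds.
        if all(by_type.get(t, 0.0) >= min_seconds for t in types)
    )

# Toy rows (made up) using the documented column names.
toy_summary = [
    {"CSN": "1001", "Type": "II", "Segments": "2", "Duration": "3600.0"},
    {"CSN": "1001", "Type": "Pleth", "Segments": "1", "Duration": "1800.0"},
    {"CSN": "1001", "Type": "Resp", "Segments": "1", "Duration": "900.0"},
    {"CSN": "1002", "Type": "Pleth", "Segments": "1", "Duration": "119.984"},
]
all_three = csns_with_min_duration(toy_summary, min_seconds=600)
pleth_only = csns_with_min_duration(toy_summary, min_seconds=60, types=("Pleth",))
```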

Waveform Data

Table 9: Waveform type description

Type | Description | Common Uses
Electrocardiogram (ECG) | Records the voltage and timing of the heart's electrical activity | Diagnosis of myocardial injury, arrhythmias, electrolyte derangements; assessment of medication responses
Photoplethysmogram (Pleth/PPG) | Records changes in blood volume over time at the site of the sensor | Estimation of heart rate, respiratory rate, blood pressure, blood oxygen saturation
Respiration (Resp) | Estimates chest wall expansion and contraction | Estimation of respiratory rate, tidal volume, respiratory function

Exhibits

Table 10. Comparison of MC-MED and existing ED or ICU datasets

Dataset | Source | Time range | Unique patients | Visits | Waveforms | Vital Signs
MIMIC-III [7] | The ICU of Beth Israel Deaconess Medical Center, Boston, Massachusetts | 2001 - 2012 | 38,597 (adults) | 53,423 (adults); 7,870 (neonates) | | Hourly time-stamped nurse-verified physiological measurements
MIMIC-IV (v2.2) [6] | The ICU of Beth Israel Deaconess Medical Center | 2008 - 2019 | 299,712 | 73,181 ICU stays | ECG, PPG, and Blood Pressure signals | Digitally derived from waveforms, or sampled irregularly (such as non-invasive blood pressure)
 | The ED of Beth Israel Deaconess Medical Center | | | 358,050 ED stays | | Vital signs taken every 1-4 hours for ED patients
eICU (v2.0) [8] | 335 ICU units at 208 hospitals located throughout the US | 2014 - 2015 | 139,367 | 200,859 | | Vital signs are recorded both hourly and continuously (at 1-minute intervals, with 5-minute medians)
HiRID [9] | The Department of Intensive Care Medicine of the Bern University Hospital, Switzerland | January, 2008 - June, 2016 | 33,905 | 55,602 | | Bedside vital signs are recorded continuously (at regular intervals of 2 min)
AmsterdamUMCdb (v1.0.2) [10] | The department of Intensive Care, a mixed medical-surgical ICU, from Amsterdam University Medical Center | 2003 - 2016 | 20,109 | 23,106 | | Recorded up to one value every minute
EHRSHOT [15] | Stanford Medicine | 1990 - 2023 | 6,739 | 921,499 (all encounters, not restricted to ED or ICU) | |
MC-MED | Stanford Health Care Emergency Department | 2020 - 2022 | 70,545 | 118,385 | ECG, PPG, and respiration | Recorded continuously, reported as 1-minute means

Additional columns (ED data, ICU data, Structured EHR, Free-text data): per-dataset marks are not preserved here, apart from the eICU entry "⭘ (Semi-structured notes)".

Table 11: Summary of visit metadata (de-identified).

Category        Group                                      Count   Percentage
Age             18-30                                      19525   16.49
Age             30-40                                      18096   15.29
Age             40-50                                      15664   13.23
Age             50-60                                      17864   15.09
Age             60-70                                      17982   15.19
Age             70-80                                      15107   12.76
Age             80-90                                      14147   11.95
Gender          F                                          64272   54.29
Gender          M                                          54077   45.68
Gender          U                                          36      0.03
Race            American Indian or Alaska Native           309     0.26
Race            Asian                                      19430   16.41
Race            Black or African American                  7653    6.46
Race            Declines to State                          551     0.47
Race            Native Hawaiian or Other Pacific Islander  2468    2.08
Race            Other                                      39951   33.75
Race            Unknown                                    519     0.44
Race            White                                      47504   40.13
Ethnicity       Hispanic/Latino                            32494   27.45
Ethnicity       Non-Hispanic/Non-Latino                    84557   71.43
Ethnicity       Unknown                                    701     0.59
Ethnicity       Declines to State                          633     0.53
Triage Acuity   1-Resuscitation                            1066    0.90
Triage Acuity   2-Emergent                                 27880   23.55
Triage Acuity   3-Urgent                                   77259   65.26
Triage Acuity   4-Semi-Urgent                              10973   9.27
Triage Acuity   5-Non-Urgent                               680     0.57
Triage Acuity   Unknown                                    527     0.45
ED Disposition  Discharge                                  70246   59.34
ED Disposition  Inpatient                                  29483   24.90
ED Disposition  Observation                                15890   13.42
ED Disposition  ICU                                        2766    2.34
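The percentages in Table 11 are each group's share of all 118,385 ED visits. As a small worked check, the sketch below recomputes the Triage Acuity percentages from the counts in the table. The variable names are illustrative only and are not part of the dataset.

```python
# Triage acuity counts copied from Table 11.
triage_counts = {
    "1-Resuscitation": 1066,
    "2-Emergent": 27880,
    "3-Urgent": 77259,
    "4-Semi-Urgent": 10973,
    "5-Non-Urgent": 680,
    "Unknown": 527,
}

# The groups partition the visits, so the counts sum to the number of ED visits.
total_visits = sum(triage_counts.values())

# Share of all visits in each group, formatted to two decimal places.
triage_pct = {
    group: f"{100 * count / total_visits:.2f}"
    for group, count in triage_counts.items()
}

for group, pct in triage_pct.items():
    print(f"{group:<16} {triage_counts[group]:>6} {pct:>6}")
```

The same check applies to the other categories in the table, since each one also sums to 118,385 visits.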

Figure 1. Time-varying data modalities recorded throughout a patient's ED visit (CSN 99797372). (A) shows the orders placed by the ED physician. (B) shows selected numeric monitoring data, including heart rate (HR), blood pressure (BP), peripheral oxygen saturation (SpO2), respiratory rate (RR), and heart rate variability (HRV). (C) shows electrocardiogram (ECG), photoplethysmogram (PPG), and respiration (Resp) waveforms at a specific point in the visit. (D) shows a free-text radiology report. (E) depicts one of numerous time-stamped laboratory results.

Figure 2: De-identification process. Patient (MRN) and visit (CSN) identifiers are mapped to random integers. Patient age is randomly perturbed by 0-2 years. All timestamps are shifted by a patient-level random time interval, maintaining seasonality. PHI is stripped from free-text radiology impressions using a BERT-based de-identification tool.
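The structured steps in Figure 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used to produce MC-MED. The function names, the input field names (`mrn`, `csn`, `age`, `arrival_time`), the 8-digit surrogate range, the per-patient age offset drawn from {0, 1, 2}, and the use of whole multiples of 364 days as the time shift are all hypothetical. The source states only that shifts are patient-level and maintain seasonality, and does not describe the mechanism. Free-text de-identification is not covered here.

```python
import random
from datetime import datetime, timedelta


def random_id_map(ids, rng):
    """Assign each distinct identifier a distinct random 8-digit integer."""
    distinct = sorted(set(ids))
    surrogates = rng.sample(range(10_000_000, 100_000_000), k=len(distinct))
    return dict(zip(distinct, surrogates))


def deidentify_visits(visits, seed=None):
    """Return de-identified copies of visit records (hypothetical schema)."""
    rng = random.Random(seed)
    mrn_map = random_id_map([v["mrn"] for v in visits], rng)
    csn_map = random_id_map([v["csn"] for v in visits], rng)

    # Draw one time shift and one age offset per patient.
    # Assumption: a shift of k * 364 days (52 whole weeks) keeps the day of
    # week and the approximate season; the real mechanism is not documented.
    shift, age_offset = {}, {}
    for mrn in mrn_map:
        shift[mrn] = timedelta(days=364 * rng.randint(1, 10))
        age_offset[mrn] = rng.randint(0, 2)  # assumption: offset in {0, 1, 2}

    return [
        {
            "MRN": mrn_map[v["mrn"]],
            "CSN": csn_map[v["csn"]],
            "age": v["age"] + age_offset[v["mrn"]],
            "arrival_time": v["arrival_time"] + shift[v["mrn"]],
        }
        for v in visits
    ]
```

Because the shift in this sketch is drawn once per patient, the intervals between one patient's events are unchanged, while absolute dates are no longer comparable across patients.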


Usage Notes

We demonstrate how to read, link, and visualize the data in the MC-MED GitHub repository [14]. For detailed notes on how to unpack the data, please refer to the README and the data-descriptor files.
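As a rough illustration of linking, the sketch below joins per-visit records to monitoring rows through the visit identifier (CSN). It is not taken from the MC-MED repository. The in-memory CSV text stands in for real files, and every column name other than `CSN`, along with all values, is a made-up assumption. Consult the README and data-descriptor files for the actual file names and schemas.

```python
import csv
import io
from collections import defaultdict

# Stand-ins for two CSV files. Contents and column names (except CSN)
# are hypothetical and do not reflect the real MC-MED schema.
VISITS_CSV = """CSN,Age,Triage_acuity
1001,54,3-Urgent
1002,71,2-Emergent
"""

NUMERICS_CSV = """CSN,Time,Measure,Value
1001,2021-01-05T08:00:00,HR,88
1001,2021-01-05T08:01:00,HR,90
1002,2021-02-11T14:30:00,HR,112
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def link_by_csn(visits, numerics):
    """Attach to each visit the monitoring rows that share its CSN."""
    by_csn = defaultdict(list)
    for row in numerics:
        by_csn[row["CSN"]].append(row)
    return [{**visit, "numerics": by_csn.get(visit["CSN"], [])} for visit in visits]


linked = link_by_csn(read_rows(VISITS_CSV), read_rows(NUMERICS_CSV))
for visit in linked:
    print(visit["CSN"], len(visit["numerics"]), "monitoring rows")
```

Grouping the monitoring rows by CSN once, before the join, keeps the linking step linear in the number of rows, which matters for minute-level data.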

MC-MED supports a wide range of research uses, including early detection of critical illness, patient risk stratification, and the development of clinical decision support tools. It facilitates precision medicine by combining continuous physiologic monitoring with comprehensive clinical records. However, MC-MED comes from a single center, and the data may contain noise and missing values. The de-identification steps (patient-level time shifts and age perturbation) protect privacy but limit analyses that depend on exact calendar dates or exact patient age. Despite these constraints, MC-MED can serve as a benchmark for evaluating algorithms, as demonstrated in the Multimodal Clinical Benchmark for Emergency Care (MC-BEC) paper [13], and provides a foundation for emergency care research.


Release Notes

This is a provisional submission for peer reviewers only and will be updated when made public.


Ethics

The project was reviewed by the Stanford University Institutional Review Board and Secondary Data Use Committee. The study was approved by the Stanford University Institutional Review Board (58581) with waiver of consent for retrospective research on de-identified, routinely collected data.


Acknowledgements

We thank Stephanie Bogdan and Xiaoli Yang for their independent verification of de-identification.


Conflicts of Interest

The authors report no conflicts of interest.


References

  1. Burke LG, Burke RC, Epstein SK, Orav EJ, Jha AK. Trends in Costs of Care for Medicare Beneficiaries Treated in the Emergency Department From 2011 to 2016. JAMA Netw Open. 2020 Aug 3;3(8):e208229.
  2. Sun BC, Hsia RY, Weiss RE, Zingmond D, Liang LJ, Han W, et al. Effect of emergency department crowding on outcomes of admitted patients. Ann Emerg Med. 2013 Jun;61(6):605–11.e6.
  3. De Georgia MA, Kaffashi F, Jacono FJ, Loparo KA. Information technology in critical care: review of monitoring and data acquisition systems for patient care and research. ScientificWorldJournal. 2015 Feb 4;2015:727694.
  4. Seh AH, Zarour M, Alenezi M, Sarkar AK, Agrawal A, Kumar R, et al. Healthcare Data Breaches: Insights and Implications. Healthcare (Basel) [Internet]. 2020 May 13;8(2). Available from: http://dx.doi.org/10.3390/healthcare8020133
  5. Johnson A, Bulgarelli L, Pollard T, Celi LA, Mark R, Horng S. MIMIC-IV-ED [Internet]. PhysioNet; 2023. Available from: https://physionet.org/content/mimic-iv-ed/2.2/
  6. Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023 Jan 3;10(1):1.
  7. Johnson AEW, Pollard TJ, Shen L, Lehman LWH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016 May 24;3:160035.
  8. Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018 Sep 11;5:180178.
  9. Yèche H, Kuznetsova R, Zimmermann M, Hüser M, Lyu X, Faltys M, et al. HiRID-ICU-Benchmark -- A Comprehensive Machine Learning Benchmark on High-resolution ICU Data [Internet]. arXiv [cs.LG]. 2021. Available from: http://arxiv.org/abs/2111.08536
  10. Thoral PJ, Peppink JM, Driessen RH, Sijbrands EJG, Kompanje EJO, Kaplan L, et al. Sharing ICU Patient Data Responsibly Under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) Example. Crit Care Med. 2021 Jun 1;49(6):e563–77.
  11. Office for Civil Rights, U.S. Department of Health and Human Services. Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule [Internet]. 2012. Available from: https://privacysecurityacademy.com/wp-content/uploads/2021/03/HHS-OCR-Guidance-on-De-Identification-of-PHI-2012.pdf
  12. Chambon PJ, Wu C, Steinkamp JM, Adleberg J, Cook TS, Langlotz CP. Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods. J Am Med Inform Assoc. 2023 Jan 18;30(2):318–28.
  13. Chen E, Kansal A, Chen J, Jin BT, Reisler J, Kim DE, et al. Multimodal Clinical Benchmark for Emergency Care (MC-BEC): A Comprehensive Benchmark for Evaluating Foundation Models in Emergency Medicine. Advances in Neural Information Processing Systems. 2023 Dec 15;36:45794–811.
  14. GitHub - dkimlab/MCMED [Internet]. GitHub. [cited 2025 Jan 8]. Available from: https://github.com/dkimlab/MCMED
  15. Wornow M, Thapa R, Steinberg E, Fries JA, Shah NH. EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models [Internet]. arXiv [cs.LG]. 2023. Available from: http://arxiv.org/abs/2307.02028

Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research

Discovery

DOI (version 1.0.0):
https://doi.org/10.13026/jz99-4j81

DOI (latest version):
https://doi.org/10.13026/xgx1-7x47
