Database Open Access
VitalDB Arrhythmia Database: An Anesthesiologist-Validated Large-Scale Intraoperative Arrhythmia Dataset with Beat and Rhythm Labels
Dain Eun, Kayoung Shim, Hyunsoo Lee, Yeji Lim, Hanbyeol Lim, Hyeonhoon Lee, Hyung-Chul Lee
Published: Feb. 26, 2026. Version: 1.0.0
When using this resource, please cite:
Eun, D., Shim, K., Lee, H., Lim, Y., Lim, H., Lee, H., & Lee, H. (2026). VitalDB Arrhythmia Database: An Anesthesiologist-Validated Large-Scale Intraoperative Arrhythmia Dataset with Beat and Rhythm Labels (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/axd6-wm13
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
Abstract
Intraoperative cardiac arrhythmias present distinct characteristics and clinical challenges compared to non-surgical environments, yet publicly available electrocardiogram (ECG) databases have primarily focused on ambulatory and intensive care environments. To address this gap, we present the VitalDB Arrhythmia Database, a comprehensive collection of annotated intraoperative ECG recordings specifically designed for developing and validating arrhythmia detection algorithms in the surgical context. The database comprises 734,528 seconds of continuous ECG data from 482 surgical patients, with over 660,000 individually annotated heartbeats classified across four beat types and 10 distinct rhythm categories. To efficiently process the extensive source data, we developed a custom deep learning beat classifier that served as an automated screening tool for arrhythmia candidate segments. All annotations underwent rigorous validation by five anesthesiologists, with each segment independently reviewed by at least two anesthesiologists. Inter-rater reliability analysis demonstrated excellent agreement with an overall Cohen's kappa of 0.930 ± 0.130. This publicly accessible resource provides the research community with clinically validated intraoperative arrhythmia data, facilitating the development of robust detection algorithms suited to the unique physiological and technical challenges of the perioperative environment.
Background
The intraoperative period is characterized by a high incidence of cardiac arrhythmias triggered by surgical stimuli, anesthetic agents, and autonomic fluctuations [1, 2]. Although intraoperative arrhythmias exhibit distinct patterns, existing public ECG datasets consist predominantly of ambulatory Holter recordings and do not represent the characteristics of the surgical environment. Creating such datasets has been difficult because experienced anesthesiologists must manually review hours of ECG recordings. Most perioperative arrhythmias are benign, but some require immediate intervention, and delayed treatment can increase morbidity and mortality. The scarcity of labeled arrhythmia datasets from intraoperative ECG data has hindered the development of detection algorithms for this setting. To address this, we analyzed 734,528 seconds of ECG data from VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients [3, 4], with rhythm and beat labels validated by anesthesiologists. We present the resulting VitalDB Arrhythmia Database as a publicly accessible dataset to facilitate medical research and algorithm development in this clinical context.
Methods
The database was constructed through a multi-stage process involving automated screening of the public VitalDB open dataset [3, 4], followed by meticulous manual annotation and validation by five anesthesiologists.
1. Automated Candidate Screening: A deep learning beat classifier, UniMS-ECGNet, was used to efficiently identify potential arrhythmia segments. The algorithm was trained on data from the VitalDB open dataset, the MIT-BIH Arrhythmia Database [3, 5], and the CU Ventricular Tachyarrhythmia Database [3, 6]. Candidate segments were selected based on criteria such as consecutive abnormal beats, high R-R interval variability, or patterned ectopy. Further details on the model architecture and usage can be found in the referenced repository [7].
2. Annotation and Validation: Individual ECG beats were classified into four primary classes (Normal, Supraventricular, Ventricular, Unclassifiable). Each continuous segment was assigned one of ten final rhythm labels. All automated selections underwent rigorous clinical validation by a board of five anesthesiologists, with each segment being independently labeled by at least two physicians to reach a final consensus.
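For readers unfamiliar with the agreement statistic reported in the Abstract, the following is a minimal sketch of Cohen's kappa for two raters who labeled the same items. It is a generic illustration and not the authors' analysis code; the function name `cohens_kappa` and the single-letter example labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement computed from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical beat labels from two reviewers (abbreviations are illustrative).
reviewer_1 = ["N", "N", "V", "S", "N", "V"]
reviewer_2 = ["N", "N", "V", "N", "N", "V"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

Kappa discounts the agreement expected by chance, so it is lower than raw percent agreement whenever one label dominates, as normal beats do in most ECG recordings.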
Data Description
The dataset includes a metadata file and 482 individual annotation files. The raw ECG waveform data is not included in this package but can be freely accessed and downloaded from the public VitalDB database using the corresponding case_id [3, 4]. Alternatively, the waveform data can be accessed programmatically using the VitalDB Python package (pip install vitaldb) without requiring manual file downloads or additional authentication.
- `metadata.csv`: A single file summarizing all cases, including columns such as `case_id`, `analyzed_duration_sec`, `total_beats`, and a list of `rhythm_classes` present in the case. It also includes relevant surgical and clinical information.
- `Annotation_Files/` folder: Contains one CSV file per case with detailed beat and rhythm annotations. Files are named `Annotation_file_[case_id].csv`.
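As an illustration of how the metadata file might be used to select cases, the sketch below filters rows by rhythm class. It assumes that `rhythm_classes` is stored as text containing the rhythm names; the exact delimiter and spelling in the released file are not specified here, so a case-insensitive substring match is used. The helper name `cases_with_rhythm` and the demo rows are hypothetical.

```python
import pandas as pd

def cases_with_rhythm(metadata: pd.DataFrame, rhythm: str) -> pd.DataFrame:
    """Return metadata rows whose rhythm_classes text mentions `rhythm`."""
    # Assumption: rhythm_classes is stored as text; a case-insensitive
    # substring match avoids depending on the exact list delimiter.
    mask = metadata["rhythm_classes"].astype(str).str.contains(
        rhythm, case=False, regex=False
    )
    return metadata.loc[mask, ["case_id", "analyzed_duration_sec", "total_beats"]]

# Synthetic stand-in for the real file; in practice: pd.read_csv("metadata.csv")
demo = pd.DataFrame({
    "case_id": [1, 2, 3],
    "analyzed_duration_sec": [1200, 900, 1500],
    "total_beats": [1400, 1000, 1900],
    "rhythm_classes": [
        "Normal Sinus Rhythm, Atrial Fibrillation",
        "Normal Sinus Rhythm",
        "Atrial Fibrillation",
    ],
})
afib_cases = cases_with_rhythm(demo, "Atrial Fibrillation")
```

Substring matching can over-select when one rhythm name is contained in another, so it is worth inspecting the actual `rhythm_classes` values before relying on this filter.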
Column definitions for the annotation files are as follows:
- `time_second`: The timestamp of the R-peak in seconds, measured from the beginning of the recording.
- `beat_type`: The classification of the individual heartbeat.
- `rhythm_label`: The overall heart rhythm label for the segment in which the beat occurs.
- `bad_signal_quality`: A boolean marker (True/False) indicating whether the beat is located within a segment of excessive noise or poor signal quality.
- `bad_signal_quality_label`: A label indicating the start or end of a bad signal quality segment (e.g., `Start1`, `End1`). This column is empty for rows that do not mark these boundaries.
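The boundary markers can be paired into time intervals. The sketch below assumes that each `StartK` marker is matched by a later `EndK` marker with the same number, and that the `time_second` value on the marker row gives the boundary time; neither assumption is confirmed beyond the `Start1`/`End1` example above. The function name `quality_intervals` and the demo values are hypothetical.

```python
import re

def quality_intervals(times, boundary_labels):
    """Pair StartK/EndK markers into (start_sec, end_sec) tuples."""
    starts, intervals = {}, []
    for t, label in zip(times, boundary_labels):
        match = re.fullmatch(r"(Start|End)(\d+)", str(label).strip())
        if match is None:
            continue  # empty cell, NaN, or a row that is not a boundary
        kind, idx = match.group(1), int(match.group(2))
        if kind == "Start":
            starts[idx] = float(t)
        elif idx in starts:
            # Close the interval opened by the Start marker with the same number.
            intervals.append((starts.pop(idx), float(t)))
    return intervals

# Synthetic example columns standing in for time_second and bad_signal_quality_label.
times = [10.2, 11.0, 11.8, 12.7, 30.1, 31.0]
labels = ["", "Start1", "", "End1", "Start2", "End2"]
bad_spans = quality_intervals(times, labels)
```

With a loaded annotation file, the same function could be called as `quality_intervals(annotations["time_second"], annotations["bad_signal_quality_label"])`.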
Rhythm and Signal Quality Annotations
Rhythm Classes
The database contains 10 distinct rhythm categories, with summary statistics presented below:
| Rhythm Label | Number of cases | Number of beats | Duration in Seconds |
|---|---|---|---|
| Normal Sinus Rhythm | 370 | 408,420 | 384,407 |
| Noise | 250 | - | 67,734 |
| Atrial Fibrillation | 111 | 163,270 | 121,888 |
| Patterned Ventricular Ectopy | 109 | 24,069 | 47,481 |
| Supraventricular Tachyarrhythmia | 109 | 6,416 | 14,799 |
| Ventricular Tachyarrhythmia | 88 | 1,598 | 9,927 |
| Patterned Atrial Ectopy | 85 | 20,326 | 40,600 |
| Sinus Node Dysfunction | 66 | 23,141 | 31,942 |
| Wandering Atrial Pacemaker / Multifocal Atrial Rhythm | 26 | 10,132 | 9,630 |
| Atrioventricular Block | 10 | 4,323 | 5,486 |
| Unclassifiable | 6 | 199 | 631 |
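Because rhythm labels are stored per beat, contiguous rhythm episodes have to be derived from runs of identical labels. The following is a minimal sketch of that run-length grouping; it is not the procedure used to produce the table above, and the function name `rhythm_segments` and the demo values are hypothetical.

```python
from itertools import groupby

def rhythm_segments(times, rhythm_labels):
    """Collapse per-beat rhythm labels into (label, first_sec, last_sec, n_beats) runs."""
    segments = []
    rows = zip(times, rhythm_labels)
    # groupby only merges consecutive rows, so a rhythm that recurs later
    # yields a separate segment each time it appears.
    for label, run in groupby(rows, key=lambda row: row[1]):
        run_times = [t for t, _ in run]
        segments.append((label, run_times[0], run_times[-1], len(run_times)))
    return segments

# Synthetic beats standing in for the time_second and rhythm_label columns.
times = [0.8, 1.6, 2.4, 3.0, 3.5, 4.1, 4.9]
labels = ["Normal Sinus Rhythm"] * 3 + ["Atrial Fibrillation"] * 3 + ["Normal Sinus Rhythm"]
segments = rhythm_segments(times, labels)
```

Segment boundaries here are the first and last R-peak times of each run, so durations computed this way will slightly underestimate the true extent of an episode.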
Signal Quality Labels
Specific criteria were used to label segments with poor signal quality:
- Bad Signal Quality: This flag was used for segments where QRS complexes are visible but noise obscures P-wave or T-wave morphology, making accurate interpretation difficult. This label can be used alongside other rhythm labels when the underlying rhythm remains interpretable.
- Noise: This label was applied to segments where artifacts are so severe that QRS complexes themselves cannot be detected. Therefore, segments labeled as 'Noise' do not contain beat-level annotations.
Usage Notes
Accessing and Using the Data
The VitalDB Arrhythmia Database provides expert annotations for ECG waveforms in VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients, which is also available on PhysioNet [3, 4].
While our annotations are specific to ECG arrhythmias, users can easily link them to the full, raw waveform data from the original VitalDB project. This allows researchers to seamlessly integrate our labels with synchronized biosignals like PPG and arterial pressure waveforms, paving the way for novel multi-modal algorithm development.
A detailed, executable guide on how to merge and utilize both datasets is provided in the UsageNote.ipynb Jupyter Notebook in our GitHub repository [8]. The recommended workflow is as follows:
- Download Annotations: Download the annotation files from this PhysioNet project.
- Install Required Packages: Install the necessary Python packages to access waveform data and handle annotations.

  ```shell
  pip install vitaldb pandas numpy matplotlib
  ```
- Load ECG Waveforms and Annotations: Use the `case_id` to download the ECG waveform from VitalDB and load the corresponding annotation file.

  ```python
  import vitaldb
  import pandas as pd
  import numpy as np

  # Example case_id
  case_id = 337

  # Load ECG waveform from VitalDB (sampled at 500 Hz).
  # load_case returns a NumPy array of shape (n_samples, n_tracks),
  # so the single requested track is column 0.
  vals = vitaldb.load_case(case_id, ['SNUADC/ECG_II'], 1/500)
  ecg_data = vals[:, 0]

  # Load annotation file (adjust the path to your download location)
  annotation_file = f'Annotation_file_{case_id}.csv'
  annotations = pd.read_csv(annotation_file)

  # Display the structure of annotations
  print(annotations.head())

  # Extract specific columns
  time_seconds = annotations['time_second'].values
  beat_types = annotations['beat_type'].values
  rhythm_labels = annotations['rhythm_label'].values
  signal_quality = annotations['bad_signal_quality'].values

  # Example: Get annotations for a specific time range (e.g., 100-110 seconds)
  start_time = 100
  end_time = 110
  segment_annotations = annotations[
      (annotations['time_second'] >= start_time) &
      (annotations['time_second'] <= end_time)
  ]
  print(f"\nAnnotations between {start_time}s and {end_time}s:")
  print(segment_annotations[['time_second', 'beat_type', 'rhythm_label']])

  # Example: Filter by specific rhythm type
  # (the string must match the label text in the CSV exactly)
  afib_beats = annotations[annotations['rhythm_label'] == 'Atrial fibrillation']
  print(f"\nTotal Atrial fibrillation beats: {len(afib_beats)}")
  ```
- Combine and Analyze: The annotation file contains beat-level information with timestamps (`time_second`), which can be matched with the ECG waveform data for visualization and analysis. Each `time_second` value corresponds to the R-peak location of a detected heartbeat.
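To make the matching step concrete, the sketch below converts R-peak times to sample indices and cuts a fixed-width window around each beat. It assumes the ECG was loaded at 500 Hz and that time zero of the annotations coincides with the first waveform sample; the function name `beat_windows`, the 0.3 s half-width, and the synthetic signal are hypothetical choices for illustration.

```python
import numpy as np

FS_HZ = 500  # assumed sampling rate of the loaded ECG waveform

def beat_windows(ecg, r_peak_seconds, fs_hz=FS_HZ, half_width_sec=0.3):
    """Return (windows, valid): ECG samples around each R-peak and a keep mask."""
    ecg = np.asarray(ecg, dtype=float)
    half = int(round(half_width_sec * fs_hz))
    # time_second is relative to the start of the recording, so the sample
    # index is the time multiplied by the sampling rate.
    centers = np.rint(np.asarray(r_peak_seconds, dtype=float) * fs_hz).astype(int)
    # Drop beats whose window would run past either end of the signal.
    valid = (centers - half >= 0) & (centers + half < len(ecg))
    offsets = np.arange(-half, half + 1)
    windows = ecg[centers[valid][:, None] + offsets]
    return windows, valid

# Synthetic 10-second ramp standing in for a real ECG trace.
ecg_demo = np.arange(5000, dtype=float)
windows, valid = beat_windows(ecg_demo, [0.1, 1.0, 9.9])
```

The returned mask lets beat-level labels be filtered consistently, for example `annotations.loc[valid, "beat_type"]` when the times come from the same DataFrame. Waveforms loaded from VitalDB may contain missing samples, so windows should be checked for NaN values before use.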
Limitations
- Focused Annotation Scope: The annotations in this dataset are concentrated on specific segments (approximately 20 minutes per case) that were identified as arrhythmia candidates by our screening process and then labeled by anesthesiologists. They do not cover the full duration of anesthesia, so the dataset is not a continuous, exhaustive record of every beat throughout the surgical procedure.
- Data Integration: To facilitate efficient distribution, this package contains only the expert annotations and metadata. It is designed to be used together with VitalDB [3, 4], where the corresponding high-fidelity raw ECG waveforms are readily accessible for analysis.
Release Notes
This is the initial release (version 1.0.0) of the VitalDB Arrhythmia Database.
Ethics
This project uses VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients. VitalDB is a publicly available, de-identified dataset from Seoul National University Hospital, approved by the institutional review board (H-1408-101-605). As this work involves secondary analysis of publicly accessible, de-identified data, no additional ethical approval was required.
All shared files are fully de-identified and contain no protected health information (PHI). To ensure privacy, no absolute dates or timestamps are included in the data, metadata, or filenames. All temporal information is provided as relative time offsets.
Conflicts of Interest
The authors declare no competing interests.
References
1. Kwon CH, Kim SH. Intraoperative management of critical arrhythmia. Korean J Anesthesiol. 2017;70(2):120-6.
2. Staikou C, Stamelos M, Stavroulakis E. Impact of anaesthetic drugs and adjuvants on ECG markers of torsadogenicity. Br J Anaesth. 2014;112(2):217-30.
3. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):e215-20.
4. Lee HC, Park Y, Yoon SB, Yang SM, Park D, Jung CW. VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients. Sci Data. 2022;9(1):279.
5. Moody GB, Mark RG. The impact of the MIT-BIH Arrhythmia Database. IEEE Eng Med Biol Mag. 2001;20(3):45-50.
6. Nolle FM, Badura FK, Catlett JM, Bowser RW, Sketch MH. CREI-GARD, a new concept in computerized arrhythmia monitoring systems. Comput Cardiol. 1986;13:515-8.
7. VitalDB. Vital beat noise detection [Internet]. GitHub; 2025 [cited 2026 Jan 24]. Available from: https://github.com/vitaldb/arrdb/blob/main/Vital_beat_noise_detection.ipynb
8. VitalDB. arrdb [Internet]. GitHub; [cited 2026 Jan 24]. Available from: https://github.com/vitaldb/arrdb/
Access
Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files):
Creative Commons Attribution 4.0 International Public License
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/axd6-wm13
DOI (latest version):
https://doi.org/10.13026/cprk-3z36
Topics:
ppg
vitaldb
ecg
arterial waveform
intraoperative dataset
Project Website:
https://github.com/vitaldb/arrdb
Files
Total uncompressed size: 20.9 MB.
Access the files
- Download the ZIP file (3.4 MB)
- Download the files using your terminal: `wget -r -N -c -np https://physionet.org/files/vitaldb-arrhythmia/1.0.0/`
| Name | Size | Modified |
|---|---|---|
| Annotation_Files | | |
| LICENSE.txt | 14.5 KB | 2026-02-24 |
| README.md | 11.8 KB | 2026-01-30 |
| SHA256SUMS.txt | 50.5 KB | 2026-02-26 |
| metadata.csv | 218.3 KB | 2026-01-24 |
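After downloading, the files can be checked against `SHA256SUMS.txt`. The sketch below assumes that file follows the common `sha256sum` layout of one `<hex digest> <relative path>` pair per line, which is not confirmed here. The function names and the temporary demo directory are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def parse_checksums(text):
    """Parse lines of the form '<hex digest> <relative path>' into a dict."""
    expected = {}
    for line in text.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2:
            digest, rel_path = parts
            # sha256sum marks binary mode with a leading '*', which is dropped.
            expected[rel_path.lstrip("*")] = digest.lower()
    return expected

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files are not read at once."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_mismatches(root, checksum_file="SHA256SUMS.txt"):
    """Return relative paths that are missing or whose digest differs."""
    root = Path(root)
    expected = parse_checksums((root / checksum_file).read_text())
    return [
        rel for rel, digest in expected.items()
        if not (root / rel).is_file() or sha256_of(root / rel) != digest
    ]

# Self-contained demo on a temporary directory instead of the real download.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "metadata.csv").write_text("case_id\n337\n")
    digest = sha256_of(Path(tmp) / "metadata.csv")
    (Path(tmp) / "SHA256SUMS.txt").write_text(f"{digest} metadata.csv\n")
    demo_mismatches = find_mismatches(tmp)
```

For a real download, `find_mismatches` would be called with the directory that contains `SHA256SUMS.txt`; an empty list means every listed file was found and matched its digest.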