Database Open Access
ECG Fragment Database for the Exploration of Dangerous Arrhythmia
Published: March 17, 2022. Version: 1.0.0
When using this resource, please cite:
Nemirko, A., Manilo, L., Tatarinova, A., Alekseev, B., & Evdakova, E. (2022). ECG Fragment Database for the Exploration of Dangerous Arrhythmia (version 1.0.0). PhysioNet. https://doi.org/10.13026/kpfg-xs25.
L. A. Manilo, A. P. Nemirko, E. G. Evdakova and A. A. Tatarinova, "ECG Database for Evaluating the Efficiency of Recognizing Dangerous Arrhythmias," 2021 IEEE Ural-Siberian Conference on Computational Technologies in Cognitive Science, Genomics and Biomedicine (CSGB), 2021, pp. 120-123, doi: 10.1109/CSGB53040.2021.9496029.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
The database contains a set of 2 s ECG fragments with rhythm disturbances, grouped into separate classes according to the degree of threat to the patient's life. It is intended for developing and evaluating algorithms that detect dangerous arrhythmias in continuous monitoring systems.
The MIT-BIH Malignant Ventricular Ectopy Database (MVED) was selected as the primary source of ECG records. It contains the heart rhythm disturbances needed for this research and is widely used for testing and comparing algorithms that detect dangerous arrhythmias. The signal is a modified limb lead II (MLII).
Reliable detection of life-threatening arrhythmias is one of the challenges in monitoring cardiovascular disease, especially in patients with cardiac arrhythmias. In continuous ECG monitoring systems, the alarm must be issued immediately when the arrhythmia occurs, so that effective resuscitation can begin without delay.
Approaches to the automatic detection of dangerous arrhythmias vary widely, both in the recognition methods used and in the length of the time window used for detection. Researchers test their algorithms on different sets of ECG signals, which makes objective comparison of algorithm performance difficult. Developers also use different time windows for indicating life-threatening disorders (from 2 s to 8 s). However, it has recently been suggested that the optimal time for detecting a life-threatening arrhythmia is 2 s.
There is therefore a need for a database of short ECG records that can be used to objectively assess the performance of existing and newly developed algorithms. We offer a database of short 2 s ECG fragments, with arrhythmias ranked according to the degree of risk to the patient's life. This makes it possible to test existing algorithms and to compare their performance on a common dataset.
The proposed classification (ranking) is based on the potential risk that a rhythm disturbance poses to the patient's health. The ranking follows the medical literature [2-6] and is supported by the medical specialists who supervised this work. It can help both in detecting dangerous ventricular fibrillation (class 1) and in recognizing precursors of these arrhythmias, such as high-rate ventricular tachycardia (monomorphic and polymorphic) (class 3) and ventricular tachycardia torsade de pointes (class 2). The remaining classes 4, 5, and 6 form the alternative group, and dividing them into separate types simplifies the identification and analysis of possible erroneous decisions.
ECG records from the MIT-BIH Malignant Ventricular Ectopy Database (MVED) were used as the initial data set.
Because training samples of short ECG fragments covering a variety of waveforms were needed, the source records were cut manually based on the annotations available in the database. After a detailed review of the 22 half-hour ECG records in the MVED database (sampling rate 360 Hz), we formed 6 classes of 2 s signal fragments. Classification was based on the potential risk to the patient.
When forming the samples of fragments of each class, the annotations given in the MVED database were initially used. Modification (and verification) of annotations was then carried out as follows.
Ventricular tachycardia (VT) was divided into three types of arrhythmia:
- VTTdP (class 2): ventricular tachycardia torsade de pointes,
- VTHR (class 3): high rate ventricular tachycardia (monomorphic and polymorphic),
- VTLR (class 4): low rate ventricular tachycardia (monomorphic and polymorphic).
Defining these subgroups was deemed to be important for the task of early detection of dangerous arrhythmia. The classes were assigned manually since the source database did not contain the relevant annotations.
While reviewing the original records, we also annotated "BBB: sinus rhythm with bundle branch block". Fragments of this type are included in class 6 (sinus rhythm).
During a detailed review of the extracted fragments, some annotations were corrected, mainly those concerning the degree of danger: (VF/VFL); (VFL, VF/VTHR); (VT/VF, VFL); (HGEA/VTHR); (BI/HGEA); (NOD/VER); (N/BBB); (N/BI); (N/SVTA); (N/HGEA). In total, 412 annotations were revised out of a complete dataset of 1016 fragments.
An experienced cardiologist from the cardiological research laboratory of the Almazov National Medical Research Centre took part in this work. In difficult cases, we consulted other qualified cardiologists at the same centre.
In addition to the 2 s ECG fragments (labelled according to the accepted classification), spectral descriptions of the signals were computed and used in algorithms for classifying dangerous arrhythmias. The spectral descriptions are samples of the power spectral density (PSD) calculated in a given frequency range.
The spectrum of each ECG fragment was calculated using the Fast Fourier Transform . The calculation procedure is described in detail in . In the 0-15 Hz range, Daniell's periodogram estimate was used to form two sets of spectral coefficients that differ in the number of features and in the step size along the frequency axis (1 Hz and 1.5 Hz). In addition to this band-limited description, the full set of spectral coefficients in the 0-180 Hz range was calculated.
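The smoothing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a plain FFT periodogram normalised by N and forms the smoothed features by averaging non-overlapping groups of 2 or 3 adjacent 0.5 Hz bins in the 0-15 Hz band. Block averaging of neighbouring bins is one simple form of Daniell smoothing, and it reproduces the 15 and 10 features listed for the 15_2 and 10_3 files (1 Hz and 1.5 Hz steps). The window, normalisation and handling of the zero-frequency bin used for the published files are not specified here, and the function names are hypothetical.

```python
import numpy as np

FS_HZ = 360.0       # sampling rate stated in the text
FRAGMENT_S = 2.0    # fragment duration (s)


def one_sided_periodogram(x: np.ndarray) -> np.ndarray:
    """Raw periodogram |X[k]|^2 / N for bins k = 0 .. N/2 - 1 (0.5 Hz apart for 720 samples)."""
    n = len(x)
    return (np.abs(np.fft.fft(x)) ** 2 / n)[: n // 2]


def band_averaged_features(pxx: np.ndarray, df_hz: float,
                           f_max_hz: float, bins_per_band: int) -> np.ndarray:
    """Average non-overlapping groups of adjacent periodogram bins below f_max_hz.

    Assumed Daniell-style smoothing; not the authors' documented procedure.
    """
    n_bins = int(round(f_max_hz / df_hz))          # 0-15 Hz at 0.5 Hz -> 30 bins
    n_bands = n_bins // bins_per_band
    usable = pxx[: n_bands * bins_per_band]
    return usable.reshape(n_bands, bins_per_band).mean(axis=1)


# Stand-in for one 720-sample fragment (random noise, not database data).
rng = np.random.default_rng(0)
x = rng.standard_normal(int(FS_HZ * FRAGMENT_S))
pxx = one_sided_periodogram(x)
df_hz = FS_HZ / len(x)                                         # 0.5 Hz
features_1hz = band_averaged_features(pxx, df_hz, 15.0, 2)     # 1 Hz step
features_1p5hz = band_averaged_features(pxx, df_hz, 15.0, 3)   # 1.5 Hz step
print(features_1hz.shape, features_1p5hz.shape)  # (15,) (10,)
```

Averaging neighbouring bins trades frequency resolution for a lower-variance estimate, which is why the smoothed sets have fewer, coarser features than the 0.5 Hz periodogram.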
We used these spectral descriptions to study the efficiency of algorithms for recognizing dangerous arrhythmias [10, 11, 12]; the results are discussed in the "Usage Notes" section. Algorithm developers can also use the spectral estimates directly. Because the full spectrum of each signal is provided, different frequency representations of the ECG fragments can be explored to obtain more reliable results.
ECG fragments with a duration of 2 s were selected manually from the MVED database. For analysis in the frequency domain, a set of power spectral density (PSD) estimates covering 0 to 180 Hz was obtained. This set contains the full spectrum (0-180 Hz) in 0.5 Hz steps and smoothed spectra in the 0-15 Hz range, represented by PSD samples in 1 Hz and 1.5 Hz steps. The zeroth sample of each smoothed spectrum file holds the full signal power. In the full spectrum, only the first 360 samples should be used, limiting the spectrum to 180 Hz: because the ECG fragment is a real-valued signal, the second half of its discrete Fourier transform spectrum is a mirror image of the first half.
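A short numpy sketch of why only the first 360 samples of the full spectrum are needed. It assumes the full spectrum is a periodogram |X[k]|²/N of a 720-sample fragment (2 s at 360 Hz); the exact scaling used for the published files is an assumption. A synthetic two-tone signal stands in for an ECG fragment.

```python
import numpy as np

FS_HZ = 360.0
N = 720  # 2 s at 360 Hz


def power_all_bins(x: np.ndarray) -> np.ndarray:
    """Periodogram |X[k]|^2 / N over all N DFT bins (0 .. FS_HZ in 0.5 Hz steps)."""
    return np.abs(np.fft.fft(x)) ** 2 / len(x)


# Synthetic test signal: a 5 Hz sine plus a small 40 Hz component.
t = np.arange(N) / FS_HZ
x = np.sin(2 * np.pi * 5.0 * t) + 0.1 * np.cos(2 * np.pi * 40.0 * t)

power = power_all_bins(x)
freqs = np.arange(N) * FS_HZ / N

# Keep only the first N/2 = 360 bins (0 .. 179.5 Hz).
half_freqs, half_power = freqs[: N // 2], power[: N // 2]
print(len(half_power), half_freqs[1], half_freqs[-1])  # 360 0.5 179.5

# For a real signal, bin k and bin N - k carry the same power,
# so bins 361..719 repeat bins 359..1 in reverse order.
print(np.allclose(power[1 : N // 2], power[: N // 2 : -1]))  # True
```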
Beginning in early 2018, 22 records from the MVED database were reviewed in detail to select ECG fragments. The work was carried out by three specialists from the Department of Biotechnical Systems, Saint Petersburg Electrotechnical University LETI, and a cardiologist from the Almazov National Medical Research Centre; together, the team has over 30 years of experience in this domain. The annotations provided in the original MVED database were used to compile the data samples, and the annotation scheme was slightly refined:
- subdivision of fragments annotated as VT into VTTdP, VTHR and VTLR;
- addition of the BBB rhythm type;
- clarification of the arrhythmia type in borderline cases.
This concerned 412 fragments. ECG fragments were excluded from further consideration only if the signal contained significant noise or if there was an obvious disagreement of opinion about the type of rhythm disturbance.
The new database was formed in two stages. In the first stage, more than 1000 short ECG fragments were selected and classified into 6 main groups (classes). In the second stage, the extracted fragments were reviewed and the arrhythmia type given in the annotation was refined.
Fragments were selected to represent the greatest possible variety of ECG waveforms in each class, based on visual assessment. The 2 s fragment duration is justified by the requirement to detect dangerous arrhythmias as quickly as possible. This is critical for continuous ECG monitoring systems, since the patient's life depends on the speed of decision-making. At the same time, as shown in [1, 9], this duration is sufficient to reveal the ECG properties characteristic of the types of dangerous arrhythmia considered.
The proposed risk scale is based on data from the medical literature [2-6], which describes in detail the consequences of catastrophic arrhythmias and their development over time, and analyzes their possible precursors.
Early studies of the causes and precursors of sudden cardiac death [2, 3, 4] showed that its immediate cause was a fatal ventricular arrhythmia:
- monomorphic ventricular tachycardia degenerating into ventricular fibrillation or flutter, in 62.4% of cases;
- ventricular tachycardia of the «torsades de pointes» type, in 12.7% of cases;
- primary ventricular fibrillation, in 8.3% of cases.
Given these data, as well as several other studies conducted to date, for example [5, 6], ventricular fibrillation and flutter, sustained ventricular tachycardia and ventricular tachycardia of the "torsades de pointes" type are considered life-threatening arrhythmias. The generally accepted method of primary and secondary prevention of sudden death in such cases is the use of non-implantable and implantable cardioverter-defibrillators. These devices must recognize a life-threatening arrhythmia within the first seconds and select the mode of therapy.
Classes 1, 2, and 3 are formed on this basis. The remaining classes (4, 5, and 6) contain non-fatal arrhythmias, grouped so as to separate ventricular from supraventricular disorders, which pose different degrees of danger to the patient's life. Given the importance of recognizing the most dangerous forms of arrhythmia, the classes are ordered by decreasing risk of severe consequences (from class 1 to class 6).
From each of the 22 original ECG recordings, after careful review, short ECG fragments were cut out for every rhythm disturbance present in the annotations. The number of fragments cut from each record is shown in Table 2 in the Content description section. Fragments were selected to give the maximum variety of ECG signal shapes.
The start and end points of each fragment were chosen arbitrarily, because existing algorithms in ECG monitoring systems also analyze arbitrarily positioned signal windows. We implement this principle fully by using the spectral description of the ECG fragments.
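The sketch below illustrates cutting a 2 s window at an arbitrary start time and, for comparison, stepping such windows through a signal as an online monitor might. The helper names, the 1 s hop and the ramp test signal are hypothetical; only the 360 Hz sampling rate and the 2 s duration come from the text.

```python
import numpy as np

FS_HZ = 360                # sampling rate stated in the text
FRAGMENT_LEN = 2 * FS_HZ   # 720 samples per 2 s fragment


def cut_fragment(record: np.ndarray, start_s: float) -> np.ndarray:
    """Return the 2 s fragment starting start_s seconds into the record (hypothetical helper)."""
    start = int(round(start_s * FS_HZ))
    stop = start + FRAGMENT_LEN
    if start < 0 or stop > len(record):
        raise ValueError("fragment would extend beyond the record")
    return record[start:stop]


def sliding_fragments(record: np.ndarray, hop_s: float = 1.0):
    """Yield (start_s, fragment) for successive 2 s windows, as an online monitor might."""
    hop = int(round(hop_s * FS_HZ))
    for start in range(0, len(record) - FRAGMENT_LEN + 1, hop):
        yield start / FS_HZ, record[start : start + FRAGMENT_LEN]


record = np.arange(10 * FS_HZ, dtype=float)   # stand-in for a 10 s signal
fragment = cut_fragment(record, 3.25)         # start not aligned to any beat
print(len(fragment), fragment[0])             # 720 1170.0
print([s for s, _ in sliding_fragments(record)])
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```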
The classes of arrhythmia included in the training dataset are presented below, broadly in order of decreasing risk to the patient:
- Life-threatening arrhythmias requiring urgent resuscitation:
  - VFL: ventricular flutter;
  - VF: ventricular fibrillation.
- A special form of life-threatening arrhythmias:
  - VTTdP: ventricular tachycardia torsade de pointes.
- Life-threatening ventricular arrhythmias:
  - VTHR: high rate ventricular tachycardia (monomorphic and polymorphic).
- Potentially dangerous ventricular arrhythmias:
  - VTLR: low rate ventricular tachycardia (monomorphic and polymorphic);
  - B: ventricular bigeminy;
  - HGEA: high degree of ventricular ectopic activity;
  - VER: ventricular escape rhythm.
- Supraventricular arrhythmias:
  - AFIB: atrial fibrillation;
  - SVTA: supraventricular tachycardia;
  - SBR: sinus bradycardia;
  - BI: first-degree heart block;
  - NOD: nodal (a-v) rhythm.
- Sinus rhythm:
  - BBB: sinus rhythm with bundle branch block;
  - N: normal sinus rhythm;
  - Ne: normal rhythm with single extrasystole.
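For programmatic use, the list above can be turned into a lookup from rhythm label to risk class. This is a sketch: class numbers 1, 2, 3, 4 and 6 follow the text, while assigning the supraventricular group to class 5 is inferred from the order of the list. The dictionary and function names are hypothetical.

```python
# Rhythm label -> risk class (1 = highest risk, 6 = sinus rhythm).
RISK_CLASS = {
    # Class 1: life-threatening, urgent resuscitation
    "VFL": 1, "VF": 1,
    # Class 2: special form of life-threatening arrhythmia
    "VTTdP": 2,
    # Class 3: life-threatening ventricular arrhythmia
    "VTHR": 3,
    # Class 4: potentially dangerous ventricular arrhythmias
    "VTLR": 4, "B": 4, "HGEA": 4, "VER": 4,
    # Class 5 (inferred from list order): supraventricular arrhythmias
    "AFIB": 5, "SVTA": 5, "SBR": 5, "BI": 5, "NOD": 5,
    # Class 6: sinus rhythm
    "BBB": 6, "N": 6, "Ne": 6,
}


def is_life_threatening(label: str) -> bool:
    """Classes 1-3 form the life-threatening group described in the text."""
    return RISK_CLASS[label] <= 3


print(RISK_CLASS["VTTdP"], is_life_threatening("VTLR"))  # 2 False
```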
The quantitative composition of various types of arrhythmias is presented in Table 1, below:
| Class | Types of arrhythmias | Number of fragments | Total in class |
The distribution of fragments across the source ECG records is shown in Table 2, below:
| Records | Number of fragments |
The database contains 2 s ECG fragments, the full spectrum of each fragment (0-180 Hz) with a step of 0.5 Hz, and smoothed spectra (0-15 Hz) with steps of 1 Hz and 1.5 Hz. As a result, 4 files were saved for each 2 s ECG fragment.
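The spectral step sizes and band limits follow from the 2 s fragment length T and the 360 Hz sampling rate f_s stated in the text; the arithmetic is:

```latex
\begin{aligned}
N &= f_s T = 360\ \text{Hz} \times 2\ \text{s} = 720 \text{ samples},\\
\Delta f &= \frac{f_s}{N} = \frac{1}{T} = 0.5\ \text{Hz},\\
f_{\text{Nyquist}} &= \frac{f_s}{2} = 180\ \text{Hz} = \frac{N}{2}\,\Delta f \quad (360 \text{ usable bins}),\\
\frac{15\ \text{Hz}}{1\ \text{Hz}} &= 15 \text{ features}, \qquad \frac{15\ \text{Hz}}{1.5\ \text{Hz}} = 10 \text{ features}.
\end{aligned}
```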
File naming and structure
The list below shows a set of example filenames:
These filenames can be interpreted as follows:
- ECG record number (
- Diagnostic result (
- Start time of the 2 s fragment, measured from the beginning of the record (
- Data type:
  - frag: 2 s signal fragment;
  - full: full spectrum (0-180 Hz), step 0.5 Hz;
  - 15_2: smoothed spectrum (0-15 Hz), step 1 Hz; 15 features;
  - 10_3: smoothed spectrum (0-15 Hz), step 1.5 Hz; 10 features.
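The data-type codes can be captured in a small lookup table. The sketch below checks that each spectral type spans the stated number of steps (360, 15 and 10). It does not claim how many values each stored file contains, since the smoothed files also hold the total signal power in their zeroth sample and the exact layout is not given here. The names are hypothetical, and the filename helper assumes, without confirmation from the file list, that the data-type code ends the file stem.

```python
from typing import NamedTuple, Optional, Tuple


class DataType(NamedTuple):
    description: str
    band_hz: Optional[Tuple[float, float]]  # None for the time-domain fragment
    step_hz: Optional[float]


# Values taken from the data-type list above.
DATA_TYPES = {
    "frag": DataType("2 s signal fragment", None, None),
    "full": DataType("full spectrum", (0.0, 180.0), 0.5),
    "15_2": DataType("smoothed spectrum", (0.0, 15.0), 1.0),
    "10_3": DataType("smoothed spectrum", (0.0, 15.0), 1.5),
}


def n_spectral_values(code: str) -> int:
    """Number of frequency steps spanning the band of a spectrum data type."""
    dt = DATA_TYPES[code]
    if dt.band_hz is None or dt.step_hz is None:
        raise ValueError(f"{code!r} is a time-domain fragment, not a spectrum")
    low, high = dt.band_hz
    return int(round((high - low) / dt.step_hz))


def data_type_from_stem(stem: str) -> str:
    """Hypothetical: assumes the data-type code is the last component of a file stem."""
    for code in DATA_TYPES:
        if stem.endswith(code):
            return code
    raise ValueError(f"no known data-type code at the end of {stem!r}")


print({code: n_spectral_values(code) for code in ("full", "15_2", "10_3")})
# {'full': 360, '15_2': 15, '10_3': 10}
```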
The zeroth sample of each smoothed spectrum file holds the full signal power of the 2 s fragment.
The project folder is organized into 6 sections:
Each section contains 4 subsections that store files of the corresponding type:
- frag: initial data;
- 15_2: spectra in the format indicated by the subsection name.
This database contains labelled ECG fragments. It may help to support research into heart rhythm disturbances, as well as comparative analysis of algorithms for arrhythmia detection.
We have used the ECG fragments and the corresponding spectral coefficients to develop algorithms for detecting dangerous arrhythmias such as ventricular fibrillation and ventricular tachycardia [10, 11, 12]. Each of the six classes was represented by a sample of 90 objects. Classification was carried out using classical methods: the k-nearest neighbors method (kNN), the nearest convex hull method (LNCH), linear discriminant analysis (FLD) and the support vector machine (SVM). To date, the cubic SVM-based algorithm has given the highest accuracy (94.8%), but our research is ongoing.
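The shape of such a classification experiment can be illustrated with a minimal kNN sketch, kNN being one of the classical methods listed above. It is not the authors' code: the feature vectors are synthetic random stand-ins for 15 spectral features, not data from this database, and k = 5 and the 70/30 hold-out split are arbitrary choices. The printed accuracy says nothing about performance on real ECG fragments.

```python
import numpy as np


def knn_predict(train_x: np.ndarray, train_y: np.ndarray,
                test_x: np.ndarray, k: int = 5) -> np.ndarray:
    """Majority vote among the k nearest training vectors (Euclidean distance).

    Labels must be non-negative integers (class numbers 1-6 here);
    ties are broken in favour of the smaller class number.
    """
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]                      # shape (n_test, k)
    return np.array([np.bincount(row).argmax() for row in votes])


# Synthetic stand-in: 6 classes x 90 objects x 15 features (not database data).
rng = np.random.default_rng(42)
n_per_class, n_features = 90, 15
class_means = rng.normal(0.0, 3.0, size=(6, n_features))
x = np.vstack([rng.normal(class_means[c], 1.0, size=(n_per_class, n_features))
               for c in range(6)])
y = np.repeat(np.arange(1, 7), n_per_class)

perm = rng.permutation(len(y))
x, y = x[perm], y[perm]
split = int(0.7 * len(y))                         # simple 70/30 hold-out split
pred = knn_predict(x[:split], y[:split], x[split:])
print(f"hold-out accuracy on synthetic data: {(pred == y[split:]).mean():.3f}")
```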
Using sets of short ECG fragments, researchers will be able to test algorithms for recognizing arrhythmias and to compare the effectiveness of new and existing methods for detecting dangerous disorders. The multi-class ranking makes it possible to address many problems, such as detecting ventricular fibrillation (VF) fragments, recognizing precursors of dangerous disorders, and detecting the critical (torsade de pointes) form of ventricular tachycardia. Providing both the signal fragments and their spectral descriptions supports arrhythmia recognition in both the time and frequency domains.
The main limitation of the dataset is the fixed length of the time interval over which each ECG fragment is formed. This restricts its use to a certain class of online algorithms designed to recognize dangerous arrhythmias in short 2 s intervals. In addition, the database does not include ECG signals for two cases: asystole and artificial pacemaker rhythm. The data also do not contain demographic information (e.g. age, gender, height, weight) or medications.
The authors declare no ethics concerns.
The development of this database was supported in part by the Russian Foundation for Basic Research project 19-29-01009.
Conflicts of Interest
The authors have no conflicts of interest to declare.
- Acharya U. R., Fujita H., Oh S. L., Raghavendra U., Tan J. H., Adam M., Gertych A., Hagiwara Y. Automated identification of shockable and non-shockable life-threatening ventricular arrhythmias using convolutional neural network // Future Generation Computer Systems. 2018. Vol. 79. P. 952–959.
- Bayés de Luna A., Coumel P., Leclercq J.F. Ambulatory sudden cardiac death: mechanisms of production of fatal arrhythmia on the basis of data from 157 cases // Am. Heart J. 1989. V. 117 (1). P. 151–159.
- Lown B., Wolf M. Approaches to sudden death from coronary heart disease // Circulation. 1971; 44. P. 130–142.
- Bigger J.T. Jr. Identification of patients at high risk for sudden cardiac death // Am. J. Cardiol. 1984; 54(9). P. 3D–8D.
- Priori S.G., Blomstrom-Lundqvist C., Mazzanti A. et al. 2015 ESC Guidelines for the management of patients with ventricular arrhythmias and the prevention of sudden cardiac death // European Heart Journal. 2015; 36(41). P. 2793–2867.
- Glikson M., Nielsen J.C., Kronborg M.B. et al. 2021 ESC Guidelines on cardiac pacing and cardiac resynchronization therapy // European Heart Journal. 2021; 00. P. 1–94.
- MIT-BIH Malignant Ventricular Ectopy Database // Massachusetts Institute of Technology https://www.physionet.org/content/vfdb/1.0.0/
- Marple S. L. Digital Spectral Analysis with Applications. Prentice Hall, Englewood Cliffs, NJ, 1987.
- Manilo L.A., Nemirko A.P. Recognition of biomedical signals based on their spectral description data analysis // Pattern Recognition and Image Analysis. 2016. Vol. 26, № 4. P. 782–788.
- Nemirko A., Manilo L., Alekseev B., Sokolova A., Yuldashev Z. The Comparison of Algorithms for Life-threatening Cardiac Arrhythmias Recognition // SCITEPRESS - Science and Technology Publications. 2020. P. 402–407.
- Manilo L.A., Nemirko A.P., Evdakova E.G. Recognition of Dangerous Rhythm Disturbances from Short ECG Fragments // 2021 Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT). IEEE, 2021. P. 0041–0044. https://ieeexplore.ieee.org/document/9455071/
- Manilo L.A., Nemirko A.P., Evdakova E.G., Tatarinova A.A. ECG Database for Evaluating the Efficiency of Recognizing Dangerous Arrhythmias // 2021 IEEE Ural-Siberian Conference on Computational Technologies in Cognitive Science, Genomics and Biomedicine (CSGB). IEEE, 2021. P. 120–123. https://ieeexplore.ieee.org/document/9496029/
Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files):
Open Data Commons Attribution License v1.0
Total uncompressed size: 5.6 MB.
Access the files
- Download the ZIP file (4.2 MB)
- Download the files using your terminal:
wget -r -N -c -np https://physionet.org/files/ecg-fragment-high-risk-label/1.0.0/
| File | Size | Last modified |
| LICENSE.txt | 19.9 KB | 2022-03-16 |
| README.txt | 198 B | 2022-01-14 |
| RECORDS | 44.1 KB | 2022-01-14 |
| SHA256SUMS.txt | 563.8 KB | 2022-03-22 |