Database Open Access

Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset

Shawn Tan Satya Ortiz-Gagné Nicolas Beaudoin-Gagnon Pierre Fecteau Aaron Courville Yoshua Bengio Joseph Paul Cohen

Published: April 12, 2022. Version: 1.0


When using this resource, please cite:
Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952.

Additionally, please cite the original publication:

Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This is a dataset of continuous raw electrocardiogram (ECG) signals from 11 thousand patients, containing 2 billion labelled beats. The signals were recorded at 16-bit resolution and 250 Hz with a fixed, chest-mounted, single-lead probe for up to 2 weeks. The average patient age is 62.2±17.4 years. Twenty technologists annotated each beat's type (Normal, Premature Atrial Contraction, Premature Ventricular Contraction) and rhythm (Normal Sinusal Rhythm, Atrial Fibrillation, Atrial Flutter).


Background

Arrhythmia detection is presently performed by cardiologists or technologists familiar with ECG readings. Recently, supervised machine learning has been successfully applied to automated detection of many arrhythmias [1,2,3,4]. However, there may be ECG anomalies that warrant further investigation because they do not fit the morphology of presently known arrhythmias. We seek a data-driven approach to finding these differences, which cardiologists have anecdotally observed. Existing public ECG datasets include the MIMIC-III Waveform Database and the ECG-ViEW II dataset [5,6]. Here we present Icentia11k, a dataset of continuous raw electrocardiogram (ECG) signals from 11 thousand patients, containing 2 billion labelled beats.


Methods

Our data was collected with the CardioSTAT, a single-lead heart monitor device from Icentia [7]. The raw signals were recorded at 16-bit resolution and sampled at 250 Hz with the CardioSTAT in a modified lead 1 position. The wealth of data this provides may allow us to improve on the techniques currently used by the medical industry to process days' worth of ECG data, and perhaps to catch anomalous events earlier than is currently possible.

The dataset is processed from data provided by 11,000 patients who used the CardioSTAT device at various medical centers, predominantly in Ontario, Canada. While the device captures ECG data for up to two weeks, most prescribed wear durations were one week.

The data was analyzed by Icentia's team of 20 technologists, who performed annotation using proprietary analysis tools. Initial beat detection is performed automatically. A technologist then analyses the record, labelling beat and rhythm types in a full disclosure analysis (i.e. they see the whole recording). Finally, the analysis is approved by a senior technologist before it enters the dataset.

The ethics institutional review boards at the Université de Montréal approved the study and release of data (CERSES-19-065-D).


Data Description

We split each patient record into segments of 2^{20}+1 = 1,048,577 signal samples (≈70 minutes at 250 Hz). This long time context was informed by discussions with technologists: the context is useful for rhythm detection. We made the length a power of two plus a middle sample to allow for easier convolution stack parameterization. From the resulting list of segments, we randomly select 50 segments and their respective labels. The goal is to reduce the size of the dataset while maintaining a fair representation of each patient.
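The segmentation and sampling step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that segments do not overlap, that a trailing partial segment is dropped, and that the kept segments are returned in chronological order. The function names and the one-week recording length are hypothetical.

```python
import random

SAMPLE_RATE_HZ = 250       # from the dataset description
SEGMENT_LEN = 2**20 + 1    # 1,048,577 samples per segment
MAX_SEGMENTS = 50          # segments kept per patient


def segment_bounds(n_samples, segment_len=SEGMENT_LEN):
    """Return (start, end) sample indices of consecutive full-length segments.

    Assumption: segments do not overlap and a trailing partial segment is dropped.
    """
    n_full = n_samples // segment_len
    return [(i * segment_len, (i + 1) * segment_len) for i in range(n_full)]


def choose_segments(bounds, k=MAX_SEGMENTS, seed=0):
    """Randomly keep at most k segments, sorted by start sample (an assumption)."""
    if len(bounds) <= k:
        return list(bounds)
    rng = random.Random(seed)
    return sorted(rng.sample(bounds, k))


# Duration of one segment in minutes.
segment_minutes = SEGMENT_LEN / SAMPLE_RATE_HZ / 60
print(f"{segment_minutes:.1f} minutes per segment")

# A hypothetical one-week recording at 250 Hz.
week_samples = 7 * 24 * 3600 * SAMPLE_RATE_HZ
bounds = segment_bounds(week_samples)
kept = choose_segments(bounds)
print(len(bounds), "full segments,", len(kept), "kept")
```

Under these assumptions, a recording shorter than 50 full segments contributes all of its segments.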

Data structure

The data is structured into patients and segments.

Patient level (3-14 days)

At this level, the data can capture features that vary systematically across a recording rather than isolated events, such as the placement of the probes or patient-specific noise.

Segment level (1,048,577 int16 samples, approximately 70 minutes)

A cardiologist can look at a specific segment and identify patterns that indicate a disease while ignoring noise in the signal, such as a unique signal amplitude. Looking at trends in the segment helps to correctly identify arrhythmia, as half an hour provides the necessary context to observe the stress of a specific activity.

Aggregate statistics

Aggregate statistics are shown below:

Statistic                   Value
Number of patients          11,000
Number of labeled beats     2,774,054,987
Sample rate                 250 Hz
Segment size                2^{20}+1 = 1,048,577 samples
Total number of segments    541,794 (not all patients have enough data for 50 segments)
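As a quick sanity check, the figures in this table can be combined with the 16-bit sample size to estimate segments per patient, total recording time, and raw signal volume. The snippet below is only arithmetic on the published numbers. The storage estimate assumes 2 bytes per sample and ignores headers and annotation files.

```python
n_patients = 11_000
n_segments = 541_794
segment_len = 2**20 + 1   # samples per segment
sample_rate_hz = 250
bytes_per_sample = 2      # assumption: int16 samples, no compression

# Average number of segments kept per patient (at most 50).
segments_per_patient = n_segments / n_patients

# Total signal length across the dataset.
total_samples = n_segments * segment_len
total_hours = total_samples / sample_rate_hz / 3600

# Raw signal volume, excluding headers and annotations.
signal_terabytes = total_samples * bytes_per_sample / 1e12

print(f"{segments_per_patient:.2f} segments per patient on average")
print(f"{total_hours:,.0f} hours of signal")
print(f"{signal_terabytes:.2f} TB of raw int16 samples")
```

The storage estimate lands in the same range as the total uncompressed size reported for the files.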

Beats are annotated in ann.symbol at the R timepoint in the QRS complex. The sample index in rec.p_signal for each annotation is found in ann.sample. The table below shows the beat counts over the entire dataset. Annotations with a '+' symbol carry no beat label; they only indicate that a rhythm annotation is present (see the rhythm table further down).

Symbol  Beat description                                                                         Count
N       Normal                                                                                   2,061,141,216
S       ESSV (PAC): Premature or ectopic supraventricular beat, premature atrial contraction     19,346,728
V       ESV (PVC): Premature or ectopic ventricular beat, premature ventricular contraction      17,203,041
Q       Undefined: Unclassifiable beat                                                           676,364,002
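The per-symbol counts can be cross-checked against the total number of labeled beats, and the same symbols can be tallied for any loaded window. The sketch below works on a plain list of annotation symbols such as the one wfdb returns. The helper name `count_beats` and the short example list are illustrative only.

```python
from collections import Counter

BEAT_SYMBOLS = {"N", "S", "V", "Q"}


def count_beats(symbols):
    """Count beat symbols, skipping '+' entries, which only mark rhythm annotations."""
    return Counter(s for s in symbols if s in BEAT_SYMBOLS)


# Counts copied from the beat table.
table_counts = {
    "N": 2_061_141_216,
    "S": 19_346_728,
    "V": 17_203_041,
    "Q": 676_364_002,
}
total = sum(table_counts.values())
fractions = {sym: n / total for sym, n in table_counts.items()}

print(total)  # equals the "Number of labeled beats" statistic
for sym, frac in fractions.items():
    print(f"{sym}: {frac:.1%}")

# Tallying a short, made-up symbol sequence.
example = count_beats(["N", "N", "+", "S", "N", "V", "Q"])
print(example)
```

Roughly a quarter of all beats fall into the unclassifiable class Q, which is worth keeping in mind when building class-balanced training sets.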

Rhythms are annotated in ann.aux_note at each timepoint. For example, a normal sinusal rhythm starts with a '(N' annotation and ends with a ')' annotation; the entire sequence in between is annotated as normal sinusal rhythm. Below are the counts of annotated regions, each of which may span one beat or thousands.

Symbol        Rhythm label                   Count
(N ... )      NSR (Normal sinusal rhythm)    16,083,158
(AFIB ... )   AFib (Atrial fibrillation)     848,564
(AFL ... )    AFlutter (Atrial flutter)      313,251
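The opening and closing markers can be turned into explicit rhythm regions by pairing each '(LABEL' with the next ')'. The following is a minimal sketch operating on two parallel lists, the annotation sample indices and the aux_note strings. It assumes regions do not nest, and it reports a region that is still open at the end of the loaded window with an end of None. The function name and example values are hypothetical.

```python
def rhythm_intervals(samples, aux_notes):
    """Pair each '(LABEL' with the next ')' and return (label, start, end) tuples.

    Assumptions: regions do not nest; a region without a closing ')' in the
    given window is returned with end=None.
    """
    intervals = []
    current = None  # (label, start_sample) of the currently open region
    for sample, note in zip(samples, aux_notes):
        # Defensive cleanup: drop padding that some readers leave on aux strings.
        note = note.rstrip("\x00").strip()
        if note.startswith("("):
            if current is not None:
                intervals.append((current[0], current[1], None))
            current = (note[1:], sample)
        elif note == ")" and current is not None:
            intervals.append((current[0], current[1], sample))
            current = None
    if current is not None:
        intervals.append((current[0], current[1], None))
    return intervals


# Made-up annotation positions and notes for illustration.
samples = [10, 250, 480, 700, 930, 1200]
aux_notes = ["(N", "", "", ")", "(AFIB", ")"]
print(rhythm_intervals(samples, aux_notes))
```

With a loaded annotation object, the two lists would typically be ann.sample and ann.aux_note.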

Details on how the dataset is encoded into wfdb format are available on GitHub [8].


Usage Notes

By releasing this dataset, we seek to enable the research community to develop better models for detection of arrhythmia and related heart disease. The dataset is described in more detail in our accompanying paper [9], which also describes our efforts to evaluate existing models for classification of arrhythmia. Code for working with the data, including executable notebooks, is available on GitHub [8].

Example code

To look at patient 9000 and segment 0, the filename would be p09/p09000/p09000_s00, and it can be loaded using wfdb as follows:

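import wfdb

data_path = '.'  # assumption: directory that contains the p00, p01, ... folders
patient_id = 9000
segment_id = 0
start = 2000    # first sample to read
length = 1024   # number of samples to read

# Patients are grouped by thousands (folder p00 holds p00000 to p00999), so use integer division.
filename = f'{data_path}/p{patient_id // 1000:02d}/p{patient_id:05d}/p{patient_id:05d}_s{segment_id:02d}'

# Read the signal window and its annotations; shift_samps=True makes the
# annotation sample indices relative to sampfrom.
rec = wfdb.rdrecord(filename, sampfrom=start, sampto=start+length)
ann = wfdb.rdann(filename, "atr", sampfrom=start, sampto=start+length, shift_samps=True)
wfdb.plot_wfdb(rec, ann, plot_sym=True, figsize=(15,4))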
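Because not every patient has 50 segments, it can help to list the segment records that actually exist for a patient before loading them. The sketch below is one possible approach, not part of the official tooling. It assumes the directory layout shown in the example (patients grouped by thousands) and that each segment has a WFDB header file with the .hea extension. The function name is hypothetical.

```python
from pathlib import Path


def patient_segment_records(data_path, patient_id, max_segments=50):
    """Return record names (without extension) of one patient's segments found on disk.

    Assumptions: folders group patients by thousands (p00, p01, ...), and each
    segment has a header file named like p09000_s00.hea.
    """
    patient_dir = Path(data_path) / f"p{patient_id // 1000:02d}" / f"p{patient_id:05d}"
    records = []
    for segment_id in range(max_segments):
        record = patient_dir / f"p{patient_id:05d}_s{segment_id:02d}"
        if record.with_suffix(".hea").exists():
            records.append(str(record))
    return records


# Prints an empty list when the dataset is not present under the given path.
print(patient_segment_records(".", 9000))
```

Each returned name can be passed directly to wfdb.rdrecord and wfdb.rdann.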

Limitations

It should be noted that since the people who wear the device are patients, the dataset does not represent a true random sample of the global population. For one, the average patient age is 62.2±17.4 years. Furthermore, although the CardioSTAT can be worn by any patient, it is mostly used for third-line exams, so the majority of records in the dataset exhibit arrhythmias. No particular effort was made on patient selection, other than that data collection was conducted over 2017 and 2018.


Ethics

The authors declare no ethics concerns. The ethics institutional review boards at the University of Montreal approved the study and release of data (#CERSES-19-065-D).


Release Notes

Version 1.0: First release on PhysioNet. Prior to this release data was made available on AcademicTorrents [10].


Acknowledgements

We thank Leon Glass, Yannick Le Devehat, Germain Ethier, Margaux Luck, Kris Sankaran, and Gabriele Prato for useful discussions. This work is partially funded by a grant from Icentia, Fonds de Recherche en Santé du Québec, and the Institut de valorisation des donnees (IVADO). This work utilized the supercomputing facilities managed by Compute Canada and Calcul Quebec. We thank AcademicTorrents.com for making data available for our research.


Conflicts of Interest

None


References

  1. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019.
  2. Yıldırım O, Pławiak P, Tan RS, Acharya UR. Arrhythmia detection using deep convolutional neural network with long duration ECG signals. Computers in Biology and Medicine 2018.
  3. Minchole A, Rodriguez B. Artificial intelligence for the electrocardiogram. Nature Medicine 2019.
  4. Porumb M, Iadanza E, Massaro S, Pecchia L. A convolutional neural network approach to detect congestive heart failure. Biomedical Signal Processing and Control 2020.
  5. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
  6. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, Park RW. ECG-ViEW II, a freely accessible electrocardiogram database. PloS one 2017.
  7. Icentia website. https://www.icentia.com/
  8. Icentia11k project on GitHub. https://github.com/shawntan/icentia-ecg/tree/master/physionet
  9. Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC). https://www.cinc.org/2021/Program/accepted/229_Preprint.pdf
  10. Icentia11k Dataset on Academic Torrents. https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

Files

Total uncompressed size: 1.1 TB.

Folder <base>/p00 contains 1,000 patient directories, p00000 through p00999.