Database Open Access

Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset

Shawn Tan Satya Ortiz-Gagné Nicolas Beaudoin-Gagnon Pierre Fecteau Aaron Courville Yoshua Bengio Joseph Paul Cohen

Published: April 12, 2022. Version: 1.0


When using this resource, please cite:
Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952.

Additionally, please cite the original publication:

Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This is a dataset of continuous raw electrocardiogram (ECG) signals containing 11 thousand patients and 2 billion labelled beats. The signals were recorded with a 16-bit resolution at 250Hz with a fixed chest-mounted single-lead probe for up to 2 weeks. The average patient age is 62.2±17.4 years. 20 technologists annotated each beat's type (Normal, Premature Atrial Contraction, Premature Ventricular Contraction) and rhythm (Normal Sinusal Rhythm, Atrial Fibrillation, Atrial Flutter).


Background

Arrhythmia detection is presently performed by cardiologists or technologists familiar with ECG readings. Recently, supervised machine learning has been successfully applied to perform automated detection of many arrhythmias [1,2,3,4]. However, there may be ECG anomalies that warrant further investigation because they do not fit the morphology of presently known arrhythmias. We seek to use a data-driven approach to finding these differences that cardiologists have anecdotally observed. Existing public ECG datasets include the MIMIC-III Waveform Database and the ECG-ViEW II dataset [5,6]. Here we present Icentia11k, a dataset of continuous raw electrocardiogram (ECG) signals containing 11 thousand patients and 2 billion labelled beats.


Methods

Our data is collected by the CardioSTAT, a single-lead heart monitor device from Icentia [7]. The raw signals were recorded with a 16-bit resolution and sampled at 250Hz with the CardioSTAT in a modified lead 1 position. The wealth of data this provides can allow us to improve on the techniques currently used by the medical industry to process days' worth of ECG data, and perhaps to catch anomalous events earlier than currently possible.

The dataset is processed from data provided by 11,000 patients who used the CardioSTAT device, predominantly in Ontario, Canada, at various medical centers. While the device captures ECG data for up to two weeks, most prescribed wear durations were one week.

The data was analyzed by Icentia's team of 20 technologists, who performed annotation using proprietary analysis tools. Initial beat detection is performed automatically, and then a technologist analyses the record, labelling beat and rhythm types while performing a full disclosure analysis (i.e. they see the whole recording). Finally, the analysis is approved by a senior technologist before making it into the dataset.

The ethics institutional review boards at the Université de Montréal approved the study and release of data (CERSES-19-065-D).


Data Description

We segment each patient record into segments of $2^{20}+1$ signal samples (≈70 minutes). This longer time context was informed by discussions with technologists: the context is useful for rhythm detection. We made it a power of two with a middle sample to allow for easier convolution stack parameterization. From this, we randomly select 50 of the segments and their respective labels from the list of segments. The goal here is to reduce the size of the dataset while maintaining a fair representation of each patient.
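The segmentation and random selection described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the choices to use non-overlapping segments, drop the trailing remainder, and keep the selected segments in chronological order are assumptions that the text does not confirm.

```python
import numpy as np

SEGMENT_SIZE = 2**20 + 1      # 1,048,577 samples per segment
SEGMENTS_PER_PATIENT = 50     # maximum number of segments kept per patient

def split_into_segments(signal, segment_size=SEGMENT_SIZE):
    """Cut a 1-D signal into consecutive full-length segments.

    Assumption: segments do not overlap and the trailing remainder is dropped.
    """
    n_segments = len(signal) // segment_size
    return [signal[i * segment_size:(i + 1) * segment_size]
            for i in range(n_segments)]

def sample_segments(segments, k=SEGMENTS_PER_PATIENT, seed=0):
    """Randomly keep at most k segments (assumption: chronological order is kept)."""
    if len(segments) <= k:
        return list(segments)
    rng = np.random.default_rng(seed)
    keep = np.sort(rng.choice(len(segments), size=k, replace=False))
    return [segments[i] for i in keep]

# Toy demonstration with a tiny segment size so that it runs instantly.
toy_signal = np.arange(1000, dtype=np.int16)
toy_segments = split_into_segments(toy_signal, segment_size=9)
toy_selected = sample_segments(toy_segments, k=50)
```

Patients with fewer than 50 full segments simply keep all of them, which is consistent with the total segment count being below 50 per patient on average.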

Data structure

The data is structured into patients and segments.

Patient level (3-14 days)

At this level, the data can capture features that vary in a systematic way, like the placement of the probes or patient-specific noise, rather than isolated events.

Segment level (1,048,577 int16 samples, approximately 70 minutes)

A cardiologist can look at a specific segment and identify patterns which indicate a disease while ignoring noise from the signal such as a unique signal amplitude. Looking at trends in the segment helps to correctly identify arrhythmias, as half an hour provides the necessary context to observe the stress of a specific activity.

Aggregate statistics

Aggregate statistics are shown below:

| Statistic | Value (units) |
| Number of patients | 11,000 |
| Number of labeled beats | 2,774,054,987 |
| Sample rate | 250Hz |
| Segment size | $2^{20}+1$ = 1,048,577 samples |
| Total number of segments | 541,794 (not all patients have enough for 50 segments) |
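As a quick sanity check on these figures, the arithmetic below derives the segment duration, the gap to the 50-per-patient maximum, and a rough lower bound on the raw signal size. It is only a back-of-the-envelope sketch: the byte estimate assumes two bytes per int16 sample and ignores headers and annotation files.

```python
SAMPLE_RATE_HZ = 250
SEGMENT_SIZE = 2**20 + 1          # samples per segment
N_PATIENTS = 11_000
N_SEGMENTS = 541_794

# Duration of one segment at 250 samples per second.
segment_seconds = SEGMENT_SIZE / SAMPLE_RATE_HZ
segment_minutes = segment_seconds / 60

# How far the dataset is from 50 segments for every patient.
max_segments = N_PATIENTS * 50
missing_segments = max_segments - N_SEGMENTS

# Raw signal payload, assuming 2 bytes per int16 sample (signal only).
raw_bytes = N_SEGMENTS * SEGMENT_SIZE * 2

print(f"{segment_minutes:.1f} minutes per segment")
print(f"{missing_segments} segments below the 50-per-patient maximum")
print(f"{raw_bytes / 1e12:.2f} TB of raw int16 samples")
```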

Beats are annotated in ann.symbol at the R timepoint in the QRS complex. The timepoint in rec.p_signal for each annotation is found in ann.sample. The table below shows the beat counts over the entire dataset. There are also annotations with a '+' symbol, which just mean there is a rhythm annotation (see the rhythm table further below).

| Symbol | Beat description | Count |
| N | Normal | 2,061,141,216 |
| S | ESSV (PAC): Premature or ectopic supraventricular beat, premature atrial contraction | 19,346,728 |
| V | ESV (PVC): Premature ventricular contraction | 17,203,041 |
| Q | Undefined: Unclassifiable beat | 676,364,002 |
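A per-record version of this table can be produced by counting the beat symbols while skipping the '+' rhythm markers. The sketch below works on a plain list of symbols so that it runs without the dataset; with wfdb, the list would come from the annotation object's symbol attribute. The helper name `count_beats` and the example list are illustrative only. The last lines check that the four dataset-wide counts add up to the reported number of labelled beats.

```python
from collections import Counter

BEAT_SYMBOLS = ("N", "S", "V", "Q")

def count_beats(symbols):
    """Count beat symbols, ignoring '+' entries that only carry rhythm annotations."""
    counts = Counter(s for s in symbols if s in BEAT_SYMBOLS)
    return {sym: counts.get(sym, 0) for sym in BEAT_SYMBOLS}

# Made-up symbol sequence standing in for one annotation file.
example_symbols = ["+", "N", "N", "S", "N", "V", "+", "Q", "N"]
example_counts = count_beats(example_symbols)

# Dataset-wide counts copied from the table above.
dataset_counts = {
    "N": 2_061_141_216,
    "S": 19_346_728,
    "V": 17_203_041,
    "Q": 676_364_002,
}
dataset_total = sum(dataset_counts.values())
```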

Rhythms are annotated in ann.aux_note at each timepoint. For example, a normal sinusal rhythm starts with a '(N' annotation and ends with a ')' annotation. The entire sequence in between is annotated as a normal sinusal rhythm. Below are the counts of each annotated region, which could span one beat or thousands.

| Symbol | Rhythm label | Count |
| (N ... ) | NSR (Normal sinusal rhythm) | 16,083,158 |
| (AFIB ... ) | AFib (Atrial fibrillation) | 848,564 |
| (AFL ... ) | AFlutter (Atrial flutter) | 313,251 |
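The open and close markers can be turned into labelled sample ranges with a single pass over the annotations. The following is a minimal sketch under stated assumptions: the helper name `rhythm_intervals` is hypothetical, each ')' is assumed to close the most recent '(LABEL', regions left open at the end of a segment are dropped, and stray whitespace or null characters in the notes are stripped as a precaution. It takes plain lists, which with wfdb would be the annotation's sample and aux_note attributes.

```python
def rhythm_intervals(samples, aux_notes):
    """Return (label, start_sample, end_sample) for each '(LABEL' ... ')' pair."""
    intervals = []
    current_label, current_start = None, None
    for sample, note in zip(samples, aux_notes):
        note = note.strip(" \x00")  # defensive cleanup of padding characters
        if note.startswith("("):
            # A new rhythm region opens at this annotation.
            current_label, current_start = note[1:], sample
        elif note == ")" and current_label is not None:
            # The open region closes here.
            intervals.append((current_label, current_start, sample))
            current_label, current_start = None, None
    return intervals

# Made-up annotations: a normal sinus region followed by an atrial fibrillation region.
samples = [10, 250, 480, 700, 930, 1200]
aux_notes = ["(N", "", "", ")", "(AFIB", ")"]
intervals = rhythm_intervals(samples, aux_notes)
```

Returning sample indices rather than durations keeps the result directly usable for slicing the signal array; dividing by the 250Hz sample rate gives times in seconds.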

Details on how the dataset is encoded into wfdb format are available on GitHub [8].


Usage Notes

By releasing this dataset, we seek to enable the research community to develop better models for detection of arrhythmia and related heart disease. The dataset is described in more detail in our accompanying paper [9], which also describes our efforts to evaluate existing models for classification of arrhythmia. Code for working with the data, including executable notebooks, is available on GitHub [8].

Example code

To look at patient 9000 and segment 0, the filename would be p09/p09000/p09000_s00, and it can be loaded using wfdb as follows:

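import wfdb

data_path = 'icentia11k'  # assumed local root of the downloaded dataset; adjust as needed
patient_id = 9000
segment_id = 0
start = 2000   # first sample to read
length = 1024  # number of samples to read

# Patients are grouped in folders by thousands, e.g. p09/p09000/p09000_s00
filename = f'{data_path}/p{patient_id // 1000:02d}/p{patient_id:05d}/p{patient_id:05d}_s{segment_id:02d}'

# Read the signal window and its annotations (annotation samples shifted to the window)
rec = wfdb.rdrecord(filename, sampfrom=start, sampto=start + length)
ann = wfdb.rdann(filename, 'atr', sampfrom=start, sampto=start + length, shift_samps=True)
wfdb.plot_wfdb(rec, ann, plot_sym=True, figsize=(15, 4))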
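When looping over many patients and segments, it can help to wrap the path layout in a small helper. The function below is a hypothetical convenience, not part of the dataset tooling, and it assumes that patients are grouped into folders by the first two digits of their zero-padded five-digit ID, as the p09/p09000 example suggests.

```python
def segment_path(data_path, patient_id, segment_id):
    """Build the record name (without extension) for one patient segment."""
    patient = f"p{patient_id:05d}"   # e.g. 9000 -> "p09000"
    group = patient[:3]              # e.g. "p09" (assumed thousands grouping)
    return f"{data_path}/{group}/{patient}/{patient}_s{segment_id:02d}"

# The root folder name here is a placeholder for wherever the data was downloaded.
example = segment_path("icentia11k", 9000, 0)
print(example)
```

The returned string can be passed as the record name to wfdb.rdrecord and wfdb.rdann.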

Limitations

It should be noted that since the people who wear the device are patients, the dataset does not represent a true random sample of the global population. For one, the average patient age is 62.2±17.4 years. Furthermore, whereas the CardioSTAT can be worn by any patient, it is mostly used as a third-line exam, so the majority of records in the dataset exhibit arrhythmias. No particular effort was made on patient selection, except that data collection was conducted over the years 2017 and 2018.


Ethics

The authors declare no ethics concerns. The ethics institutional review boards at the University of Montreal approved the study and release of data (#CERSES-19-065-D).


Release Notes

Version 1.0: First release on PhysioNet. Prior to this release data was made available on AcademicTorrents [10].


Acknowledgements

We thank Leon Glass, Yannick Le Devehat, Germain Ethier, Margaux Luck, Kris Sankaran, and Gabriele Prato for useful discussions. This work is partially funded by a grant from Icentia, Fonds de Recherche en Santé du Québec, and the Institut de valorisation des donnees (IVADO). This work utilized the supercomputing facilities managed by Compute Canada and Calcul Quebec. We thank AcademicTorrents.com for making data available for our research.


Conflicts of Interest

None


References

  1. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019.
  2. Yıldırım O, Pławiak P, Tan RS, Acharya UR. Arrhythmia detection using deep convolutional neural network with long duration ecg signals. Computers in biology and medicine 2018.
  3. Minchole A, Rodriguez B. Artificial intelligence for the electrocardiogram. Nature Medicine 1 2019.
  4. Porumb M, Iadanza E, Massaro S, Pecchia L. A convolutional neural network approach to detect congestive heart failure. Biomedical Signal Processing and Control 2020.
  5. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
  6. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, Park RW. ECG-ViEW II, a freely accessible electrocardiogram database. PloS one 2017.
  7. Icentia website. https://www.icentia.com/
  8. Icentia11k project on GitHub. https://github.com/shawntan/icentia-ecg/tree/master/physionet
  9. Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC). https://www.cinc.org/2021/Program/accepted/229_Preprint.pdf
  10. Icentia11k Dataset on Academic Torrents. https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

Files

Total uncompressed size: 1.1 TB.