Database Open Access

Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset

Shawn Tan Satya Ortiz-Gagné Nicolas Beaudoin-Gagnon Pierre Fecteau Aaron Courville Yoshua Bengio Joseph Paul Cohen

Published: April 12, 2022. Version: 1.0


When using this resource, please cite:
Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952.

Additionally, please cite the original publication:

Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This is a dataset of continuous raw electrocardiogram (ECG) signals containing 11 thousand patients and 2 billion labelled beats. The signals were recorded with a 16-bit resolution at 250 Hz with a fixed chest-mounted single-lead probe for up to 2 weeks. The average patient age is 62.2±17.4 years. A team of 20 technologists annotated each beat's type (Normal, Premature Atrial Contraction, Premature Ventricular Contraction) and rhythm (Normal Sinusal Rhythm, Atrial Fibrillation, Atrial Flutter).


Background

Arrhythmia detection is presently performed by cardiologists or technologists familiar with ECG readings. Recently, supervised machine learning has been successfully applied to perform automated detection of many arrhythmias [1,2,3,4]. However, there may be ECG anomalies that warrant further investigation because they do not fit the morphology of presently known arrhythmias. We seek to use a data-driven approach to find these differences, which cardiologists have anecdotally observed. Existing public ECG datasets include the MIMIC-III Waveform Database and the ECG-ViEW II dataset [5,6]. Here we present Icentia11k, a dataset of continuous raw electrocardiogram (ECG) signals containing 11 thousand patients and 2 billion labelled beats.


Methods

Our data is collected by the CardioSTAT, a single-lead heart monitor device from Icentia [7]. The raw signals were recorded with a 16-bit resolution and sampled at 250 Hz with the CardioSTAT in a modified lead 1 position. The wealth of data this provides may allow us to improve on the techniques currently used by the medical industry to process days' worth of ECG data, and perhaps to catch anomalous events earlier than is currently possible.

The dataset is processed from data provided by 11,000 patients who used the CardioSTAT device predominantly in Ontario, Canada, from various medical centers. While the device captures ECG data for up to two weeks, the majority of the prescribed duration of wear was one week.

The data is analyzed by Icentia's team of 20 technologists, who performed annotation using proprietary analysis tools. Initial beat detection is performed automatically, and then a technologist analyses the record, labelling beat and rhythm types in a full disclosure analysis (i.e. they see the whole recording). Finally, the analysis is approved by a senior technologist before it is added to the dataset.

The ethics institutional review boards at the Université de Montréal approved the study and release of data (CERSES-19-065-D).


Data Description

We segment each patient record into segments of $2^{20}+1$ signal samples (≈70 minutes). This longer time context was informed by discussions with technologists: the context is useful for rhythm detection. We made it a power of two with a middle sample to allow for easier convolution stack parameterization. From this, we randomly select 50 of the segments and their respective labels from the list of segments. The goal here is to reduce the size of the dataset while maintaining a fair representation of each patient.
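As a rough illustration of the segmentation described above, the following sketch computes the duration of one segment and draws up to 50 segments from a record of a given length. The function names, the use of non-overlapping segments, the dropping of a trailing partial segment, and the fixed seed are all assumptions made for illustration; the source does not specify these details, and this is not the authors' pipeline.

```python
import random

SAMPLE_RATE_HZ = 250
SEGMENT_LEN = 2**20 + 1  # 1,048,577 samples per segment


def segment_duration_minutes(n_samples=SEGMENT_LEN, fs=SAMPLE_RATE_HZ):
    """Duration of a run of samples in minutes."""
    return n_samples / fs / 60


def sample_segments(n_record_samples, max_segments=50, seed=0):
    """Return sorted (start, end) sample bounds of randomly chosen segments.

    Assumptions (hypothetical, not from the dataset description):
    segments do not overlap and a trailing partial segment is dropped.
    """
    n_full = n_record_samples // SEGMENT_LEN
    bounds = [(i * SEGMENT_LEN, (i + 1) * SEGMENT_LEN) for i in range(n_full)]
    rng = random.Random(seed)
    # A short record yields fewer than max_segments segments.
    return sorted(rng.sample(bounds, min(max_segments, n_full)))


print(round(segment_duration_minutes(), 1))  # 69.9

# One week of recording at 250 Hz
one_week = 7 * 24 * 3600 * SAMPLE_RATE_HZ
print(len(sample_segments(one_week)))  # 50
```

A record shorter than 50 full segments simply returns every full segment it contains, which matches the note in the statistics that not all patients have enough data for 50 segments.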

Data structure

The data is structured into patients and segments.

Patient level (3-14 days)

At this level, the data can capture features that vary systematically rather than as isolated events, such as the placement of the probes or patient-specific noise.

Segment level (1,048,577 int16 samples, approximately 70 minutes)

A cardiologist can look at a specific segment and identify patterns which indicate a disease while ignoring noise from the signal such as a unique signal amplitude. Looking at trends in the segment helps to correctly identify arrhythmia, as half an hour provides the necessary context to observe the stress of a specific activity.

Aggregate statistics

Aggregate statistics are shown below:

Statistic # (units)
Number of patients 11,000
Number of labeled beats 2,774,054,987
Sample rate 250Hz
Segment size $2^{20}+1$ = 1,048,577
Total number of segments 541,794 (not all patients have enough for 50 segments)

Beats are annotated in ann.symbol at the R timepoint of the QRS complex. The sample index in the record's signal (rec.p_signal in wfdb-python) for each annotation is found in ann.sample. The table below shows the beat counts over the entire dataset. There are also annotations with a '+' symbol, which only indicate that a rhythm annotation is present (see the next table).

Symbol Beat Description Count
N Normal 2,061,141,216
S ESSV (PAC): Premature or ectopic supraventricular beat, premature atrial contraction 19,346,728
V ESV (PVC): Premature ventricular contraction 17,203,041
Q Undefined: Unclassifiable beat 676,364,002
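
To make the beat annotation layout concrete, here is a minimal sketch that tallies beat symbols and looks up the sample positions of one beat type. It works on plain Python lists standing in for the `symbol` and `sample` attributes of a wfdb-python annotation object; the helper names `count_beats` and `beat_samples` and the toy values are hypothetical and are not part of the dataset tooling.

```python
from collections import Counter

BEAT_SYMBOLS = ("N", "S", "V", "Q")  # beat types listed in the table above


def count_beats(symbols):
    """Count beat symbols, skipping '+' entries that only flag a rhythm annotation."""
    counts = Counter(s for s in symbols if s in BEAT_SYMBOLS)
    return {sym: counts[sym] for sym in BEAT_SYMBOLS}


def beat_samples(symbols, samples, wanted):
    """Return the sample indices of every beat labelled `wanted`."""
    return [pos for sym, pos in zip(symbols, samples) if sym == wanted]


# Toy stand-ins for ann.symbol and ann.sample (not real data)
symbols = ["+", "N", "N", "V", "N", "S", "Q"]
samples = [0, 120, 340, 510, 790, 1010, 1260]

print(count_beats(symbols))
print(beat_samples(symbols, samples, "V"))
```

With a real annotation loaded through wfdb, the same helpers could presumably be called as `count_beats(ann.symbol)` and `beat_samples(ann.symbol, ann.sample, "V")`.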

Rhythms are annotated in ann.aux_note at each timepoint. For example, a normal sinusal rhythm will start with a '(N' annotation and end with a ')' annotation. The entire sequence in between is annotated as a normal sinusal rhythm. Below are the counts of each annotated region, which may span a single beat or thousands of beats.

Symbol Rhythm Labels Count
(N ... ) NSR (Normal sinusal rhythm) 16,083,158
(AFIB ... ) AFib (Atrial fibrillation) 848,564
(AFL ... ) AFlutter (Atrial flutter) 313,251
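
The start and end markers described above can be turned into labelled intervals with a small state machine. The sketch below is one possible way to do this, assuming that regions never nest, that an unmatched start marker is dropped, and that aux_note strings may carry trailing whitespace or null characters. The function name `rhythm_regions` and the toy inputs are hypothetical.

```python
RHYTHM_NAMES = {"(N": "NSR", "(AFIB": "AFib", "(AFL": "AFlutter"}


def rhythm_regions(samples, aux_notes):
    """Pair each '(X' start marker with the next ')' end marker.

    Returns a list of (label, start_sample, end_sample) tuples.
    Assumptions: regions do not nest; an unmatched start is dropped.
    """
    regions = []
    open_label, open_start = None, None
    for sample, note in zip(samples, aux_notes):
        note = note.strip().rstrip("\x00")  # tolerate padding in the note
        if note in RHYTHM_NAMES:
            open_label, open_start = RHYTHM_NAMES[note], sample
        elif note == ")" and open_label is not None:
            regions.append((open_label, open_start, sample))
            open_label = None
    return regions


# Toy stand-ins for ann.sample and ann.aux_note (not real data)
samples = [10, 20, 30, 40, 50, 60]
aux_notes = ["(N", "", "", ")", "(AFIB", ")"]

print(rhythm_regions(samples, aux_notes))
# [('NSR', 10, 40), ('AFib', 50, 60)]
```

Note that when only a window of a segment is loaded, a region may begin or end outside the window; this sketch ignores such partial regions.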

Details on how the dataset is encoded into wfdb format are available on GitHub [8].


Usage Notes

By releasing this dataset, we seek to enable the research community to develop better models for detection of arrhythmia and related heart disease. The dataset is described in more detail in our accompanying paper [9], which also describes our efforts to evaluate existing models for classification of arrhythmia. Code for working with the data, including executable notebooks, is available on GitHub [8].

Example code

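To look at patient 9000 and segment 0, the filename would be p09/p09000/p09000_s00, and it can be loaded using wfdb as follows:

import wfdb

data_path = '.'  # placeholder: directory that contains the p00, p01, ... folders
patient_id = 9000
segment_id = 0
start = 2000
length = 1024
# Patients appear to be grouped in folders of 1,000 (p01 holds p01000 to p01999),
# so the folder index is patient_id // 1000 rather than the first digit of the ID.
filename = f'{data_path}/p{patient_id // 1000:02d}/p{patient_id:05d}/p{patient_id:05d}_s{segment_id:02d}'
# Read `length` samples of the signal and the annotations in the same window
rec = wfdb.rdrecord(filename, sampfrom=start, sampto=start+length)
ann = wfdb.rdann(filename, "atr", sampfrom=start, sampto=start+length, shift_samps=True)
wfdb.plot_wfdb(rec, ann, plot_sym=True, figsize=(15,4))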
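
For loading many records, the path construction can be wrapped in a small helper. This is a sketch under an assumption: the folder name is taken to be the patient ID divided by 1,000, which is consistent with the p01 folder holding p01000 through p01999 but is not stated explicitly in the description. The function name `segment_path` is hypothetical.

```python
def segment_path(data_path, patient_id, segment_id):
    """Build the record path (without extension) for one segment.

    Assumption: patients are grouped in folders of 1,000, so patient 9000
    lives under p09 and patient 10500 under p10.
    """
    if patient_id < 0 or segment_id < 0:
        raise ValueError("patient_id and segment_id must be non-negative")
    group = patient_id // 1000
    return (
        f"{data_path}/p{group:02d}/p{patient_id:05d}/"
        f"p{patient_id:05d}_s{segment_id:02d}"
    )


print(segment_path("data", 9000, 0))   # data/p09/p09000/p09000_s00
print(segment_path("data", 10500, 3))  # data/p10/p10500/p10500_s03
print(segment_path("data", 500, 12))   # data/p00/p00500/p00500_s12
```

The returned string can be passed directly to `wfdb.rdrecord` and `wfdb.rdann` in place of the hand-built filename.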

Limitations

Because the people who wear the device are patients, the dataset does not represent a true random sample of the global population. For one, the average patient age is 62.2±17.4 years. Furthermore, although the CardioSTAT can be worn by any patient, it is mostly used as a third-line exam, so the majority of records in the dataset exhibit arrhythmias. No particular patient selection was applied, other than that data collection was conducted over the years 2017 and 2018.


Ethics

The authors declare no ethics concerns. The ethics institutional review boards at the University of Montreal approved the study and release of data (#CERSES-19-065-D).


Release Notes

Version 1.0: First release on PhysioNet. Prior to this release data was made available on AcademicTorrents [10].


Acknowledgements

We thank Leon Glass, Yannick Le Devehat, Germain Ethier, Margaux Luck, Kris Sankaran, and Gabriele Prato for useful discussions. This work is partially funded by a grant from Icentia, Fonds de Recherche en Santé du Québec, and the Institut de valorisation des donnees (IVADO). This work utilized the supercomputing facilities managed by Compute Canada and Calcul Quebec. We thank AcademicTorrents.com for making data available for our research.


Conflicts of Interest

None


References

  1. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019.
  2. Yıldırım O, Pławiak P, Tan RS, Acharya UR. Arrhythmia detection using deep convolutional neural network with long duration ecg signals. Computers in biology and medicine 2018.
  3. Minchole A, Rodriguez B. Artificial intelligence for the electrocardiogram. Nature Medicine 1 2019.
  4. Porumb M, Iadanza E, Massaro S, Pecchia L. A convolutional neural network approach to detect congestive heart failure. Biomedical Signal Processing and Control 2020.
  5. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
  6. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, Park RW. ECG-ViEW II, a freely accessible electrocardiogram database. PloS one 2017.
  7. Icentia website. https://www.icentia.com/
  8. Icentia11k project on GitHub. https://github.com/shawntan/icentia-ecg/tree/master/physionet
  9. Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC). https://www.cinc.org/2021/Program/accepted/229_Preprint.pdf
  10. Icentia11k Dataset on Academic Torrents. https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

Files

Total uncompressed size: 1.1 TB.

Folder <base>/p01 contains the patient directories p01000 through p01999.