Database Open Access

Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset

Shawn Tan, Satya Ortiz-Gagné, Nicolas Beaudoin-Gagnon, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

Published: April 12, 2022. Version: 1.0


When using this resource, please cite:
Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952.

Additionally, please cite the original publication:

Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This is a dataset of continuous raw electrocardiogram (ECG) signals from 11 thousand patients, containing 2 billion labelled beats. The signals were recorded at 16-bit resolution and 250 Hz with a fixed, chest-mounted, single-lead probe for up to 2 weeks. The average patient age is 62.2±17.4 years. Twenty technologists annotated each beat's type (Normal, Premature Atrial Contraction, Premature Ventricular Contraction) and rhythm (Normal Sinusal Rhythm, Atrial Fibrillation, Atrial Flutter).


Background

Arrhythmia detection is presently performed by cardiologists or technologists familiar with ECG readings. Recently, supervised machine learning has been successfully applied to automated detection of many arrhythmias [1,2,3,4]. However, there may be ECG anomalies that warrant further investigation because they do not fit the morphology of presently known arrhythmias. We seek to use a data-driven approach to find these differences, which cardiologists have anecdotally observed. Existing public ECG datasets include the MIMIC-III Waveform Database and the ECG-ViEW II dataset [5,6]. Here we present Icentia11k, a dataset of continuous raw electrocardiogram (ECG) signals from 11 thousand patients, containing 2 billion labelled beats.


Methods

Our data was collected with the CardioSTAT, a single-lead heart monitor device from Icentia [7]. The raw signals were recorded at 16-bit resolution and sampled at 250 Hz with the CardioSTAT in a modified lead 1 position. The wealth of data this provides may allow us to improve on the techniques the medical industry currently uses to process days' worth of ECG data, and perhaps to catch anomalous events earlier than is currently possible.

The dataset is processed from data provided by 11,000 patients who used the CardioSTAT device, predominantly in Ontario, Canada, at various medical centers. While the device captures ECG data for up to two weeks, most prescribed wear durations were one week.

The data was analyzed by Icentia's team of 20 technologists, who performed annotation using proprietary analysis tools. Initial beat detection is performed automatically; a technologist then analyses the record, labelling beat and rhythm types in a full disclosure analysis (i.e. they see the whole recording). Finally, the analysis is approved by a senior technologist before it enters the dataset.

The ethics institutional review boards at the Université de Montréal approved the study and release of data (CERSES-19-065-D).


Data Description

We divide each patient record into segments of $2^{20}+1$ signal samples (≈70 minutes). This long time context was informed by discussions with technologists: the context is useful for rhythm detection. We chose a power of two plus a middle sample to make convolution stacks easier to parameterize. From the resulting list of segments, we randomly select 50 segments and their respective labels. The goal is to reduce the size of the dataset while maintaining a fair representation of each patient.
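The segmentation and subsampling described above can be sketched roughly as follows. This is an illustrative sketch only, not the preprocessing code used to build the dataset: the function name `sample_segments`, the choice to drop a trailing partial segment, and the seeded random generator are all assumptions made for the example.

```python
import random

SEGMENT_LEN = 2**20 + 1   # 1,048,577 samples per segment
MAX_SEGMENTS = 50         # segments kept per patient

def sample_segments(signal, rng, segment_len=SEGMENT_LEN, max_segments=MAX_SEGMENTS):
    """Split a 1-D recording into full-length segments and keep at most max_segments.

    Assumption: a trailing partial segment is dropped; the source does not say
    how the end of a recording is handled.
    """
    n_full = len(signal) // segment_len
    indices = list(range(n_full))
    if n_full > max_segments:
        # Random selection without replacement, kept in chronological order.
        indices = sorted(rng.sample(indices, max_segments))
    return [signal[i * segment_len:(i + 1) * segment_len] for i in indices]

# Toy demonstration with a tiny segment length so that it runs instantly.
rng = random.Random(0)
toy_signal = [0] * 1000
toy_segments = sample_segments(toy_signal, rng, segment_len=9)
short_segments = sample_segments([0] * 30, rng, segment_len=9)
```

In the toy run, the 1000-sample signal yields 111 full segments, of which 50 are kept, while the 30-sample signal yields only 3 segments and all of them are kept.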

Data structure

The data is structured into patients and segments.

Patient level (3-14 days)

At this level, the data can capture features that vary in a systematic way, such as the placement of the probes or patient-specific noise, rather than isolated events.

Segment level (1,048,577 int16 samples, approximately 70 minutes)

A cardiologist can look at a specific segment and identify patterns that indicate a disease while ignoring noise from the signal, such as a unique signal amplitude. Looking at trends in the segment helps to correctly identify arrhythmia, as half an hour provides the necessary context to observe the stress of a specific activity.

Aggregate statistics

Aggregate statistics are shown below:

Statistic # (units)
Number of patients 11,000
Number of labeled beats 2,774,054,987
Sample rate 250Hz
Segment size $2^{20}+1$ = 1,048,577
Total number of segments 541,794 (not all patients have enough for 50 segments)
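As a rough sanity check on the figures in the table above, the segment duration and raw signal volume can be worked out directly. The sketch below assumes that every segment has the full length and that each sample occupies 2 bytes (int16); it ignores header and annotation files, so the storage figure is only an estimate.

```python
SEGMENT_LEN = 2**20 + 1      # samples per segment
SAMPLE_RATE_HZ = 250         # samples per second
N_SEGMENTS = 541_794         # total number of segments
BYTES_PER_SAMPLE = 2         # int16

# Duration of one segment.
segment_seconds = SEGMENT_LEN / SAMPLE_RATE_HZ
segment_minutes = segment_seconds / 60

# Raw signal volume, assuming every segment is full length.
total_samples = N_SEGMENTS * SEGMENT_LEN
total_terabytes = total_samples * BYTES_PER_SAMPLE / 1e12

print(f"segment duration: {segment_minutes:.1f} minutes")
print(f"raw signal volume: {total_terabytes:.2f} TB")
```

Under these assumptions a segment lasts just under 70 minutes and the raw samples amount to a little over 1.1 TB.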

Beats are annotated in ann.symbol at the R timepoint in the QRS complex. The timepoint in rec.p_signal for each annotation is found in ann.sample. The table below shows the beat counts over the entire dataset. There are also annotations with a '+' symbol, which simply mark a rhythm annotation (see the next table).

Symbol Beat Description Count
N Normal 2,061,141,216
S ESSV (PAC): Premature or ectopic supraventricular beat, premature atrial contraction 19,346,728
V ESV (PVC): Premature ventricular contraction 17,203,041
Q Undefined: Unclassifiable beat 676,364,002
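A per-segment version of this tally can be sketched as below. The helper `count_beats` and the example symbol list are hypothetical; in practice the list would come from the annotation object returned by `wfdb.rdann` (its `symbol` attribute). The sketch also checks that the four counts in the table add up to the total number of labelled beats reported in the aggregate statistics.

```python
from collections import Counter

BEAT_SYMBOLS = ("N", "S", "V", "Q")

def count_beats(symbols):
    """Count beat symbols, skipping '+' entries that only mark rhythm annotations."""
    counts = Counter(s for s in symbols if s in BEAT_SYMBOLS)
    return {s: counts.get(s, 0) for s in BEAT_SYMBOLS}

# Made-up symbol sequence for illustration only.
example_symbols = ["+", "N", "N", "S", "N", "V", "Q", "+", "N"]
example_counts = count_beats(example_symbols)

# Counts copied from the table above.
TABLE_COUNTS = {
    "N": 2_061_141_216,
    "S": 19_346_728,
    "V": 17_203_041,
    "Q": 676_364_002,
}
table_total = sum(TABLE_COUNTS.values())
```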

Rhythms are annotated in ann.aux_note at each timepoint. For example, a normal sinusal rhythm starts with a '(N' annotation and ends with a ')' annotation. The entire sequence in between is annotated as a normal sinusal rhythm. Below are the counts of each annotated region, which could span one beat or thousands.

Symbol Rhythm Labels Count
(N ... ) NSR (Normal sinusal rhythm) 16,083,158
(AFIB ... ) AFib (Atrial fibrillation) 848,564
(AFL ... ) AFlutter (Atrial flutter) 313,251
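The opening and closing markers can be turned into labelled sample intervals with a small helper. This is a minimal sketch under stated assumptions: the function `rhythm_intervals` is hypothetical, the inputs are parallel lists like `ann.sample` and `ann.aux_note`, annotations without a rhythm marker carry an empty string, and any whitespace or null padding around a marker is stripped. Regions that are opened but never closed within the window are ignored.

```python
def rhythm_intervals(samples, aux_notes):
    """Return (label, start_sample, end_sample) for each closed rhythm region."""
    intervals = []
    label, start = None, None
    for sample, note in zip(samples, aux_notes):
        note = note.strip().strip("\x00")   # assumption: padding may be present
        if note.startswith("("):
            # Opening marker such as '(N', '(AFIB' or '(AFL'.
            label, start = note[1:], sample
        elif note == ")" and label is not None:
            # Closing marker ends the currently open region.
            intervals.append((label, start, sample))
            label, start = None, None
    return intervals

# Made-up annotation stream for illustration only.
samples = [10, 120, 250, 400, 530, 700]
aux_notes = ["(N", "", "", ")", "(AFIB", ")"]
example_intervals = rhythm_intervals(samples, aux_notes)
```

For the made-up stream, the helper returns one normal sinusal rhythm region from sample 10 to 400 and one atrial fibrillation region from sample 530 to 700.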

Details on how the dataset is encoded into wfdb format are available on GitHub [8].


Usage Notes

By releasing this dataset, we seek to enable the research community to develop better models for detection of arrhythmia and related heart disease. The dataset is described in more detail in our accompanying paper [9], which also describes our efforts to evaluate existing models for classification of arrhythmia. Code for working with the data, including executable notebooks, is available on GitHub [8].

Example code

To look at patient 9000 and segment 0, the filename would be p09/p09000/p09000_s00, and it can be loaded using wfdb as follows:

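import wfdb

data_path = '.'  # assumption: folder containing the downloaded dataset; adjust as needed
patient_id = 9000
segment_id = 0
start = 2000
length = 1024
# Patients are grouped in folders named after the first two digits of the zero-padded ID
filename = f'{data_path}/p{patient_id // 1000:02d}/p{patient_id:05d}/p{patient_id:05d}_s{segment_id:02d}'
# Read the signal and the annotations for the selected window
rec = wfdb.rdrecord(filename, sampfrom=start, sampto=start + length)
ann = wfdb.rdann(filename, "atr", sampfrom=start, sampto=start + length, shift_samps=True)
wfdb.plot_wfdb(rec, ann, plot_sym=True, figsize=(15, 4))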
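The record name can also be built by a small helper that works for any patient ID. This is a sketch under the assumption that patients are grouped into folders of one thousand, named after the first two digits of the zero-padded five-digit patient ID; the function name `segment_path` is hypothetical and not part of wfdb.

```python
def segment_path(data_path, patient_id, segment_id):
    """Build the wfdb record name (no file extension) for one segment.

    Assumption: folder 'pXX' holds patients XX000 to XX999.
    """
    group = patient_id // 1000
    patient = f"p{patient_id:05d}"
    return f"{data_path}/p{group:02d}/{patient}/{patient}_s{segment_id:02d}"

example_path = segment_path("data", 9000, 0)
low_id_path = segment_path("data", 42, 7)
high_id_path = segment_path("data", 10500, 12)
```

Using integer division rather than slicing the decimal string keeps the folder name correct for IDs below 1000 and for IDs of 10000 and above.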

Limitations

It should be noted that, since the people who wear the device are patients, the dataset does not represent a true random sample of the global population. For one, the average patient age is 62.2±17.4 years. Furthermore, although the CardioSTAT can be worn by any patient, it is mostly used as a third-line exam, so the majority of records in the dataset exhibit arrhythmias. No particular effort was made on patient selection, except that data collection was conducted over the years 2017 and 2018.


Ethics

The authors declare no ethics concerns. The ethics institutional review boards at the University of Montreal approved the study and release of data (#CERSES-19-065-D).


Release Notes

Version 1.0: First release on PhysioNet. Prior to this release data was made available on AcademicTorrents [10].


Acknowledgements

We thank Leon Glass, Yannick Le Devehat, Germain Ethier, Margaux Luck, Kris Sankaran, and Gabriele Prato for useful discussions. This work is partially funded by a grant from Icentia, Fonds de Recherche en Santé du Québec, and the Institut de valorisation des données (IVADO). This work utilized the supercomputing facilities managed by Compute Canada and Calcul Québec. We thank AcademicTorrents.com for making data available for our research.


Conflicts of Interest

None


References

  1. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019.
  2. Yıldırım O, Pławiak P, Tan RS, Acharya UR. Arrhythmia detection using deep convolutional neural network with long duration ecg signals. Computers in biology and medicine 2018.
  3. Minchole A, Rodriguez B. Artificial intelligence for the electrocardiogram. Nature Medicine 1 2019.
  4. Porumb M, Iadanza E, Massaro S, Pecchia L. A convolutional neural network approach to detect congestive heart failure. Biomedical Signal Processing and Control 2020.
  5. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
  6. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, Park RW. ECG-ViEW II, a freely accessible electrocardiogram database. PloS one 2017.
  7. Icentia website. https://www.icentia.com/
  8. Icentia11k project on GitHub. https://github.com/shawntan/icentia-ecg/tree/master/physionet
  9. Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC). https://www.cinc.org/2021/Program/accepted/229_Preprint.pdf
  10. Icentia11k Dataset on Academic Torrents. https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

Files

Total uncompressed size: 1.1 TB.

Folder Navigation: <base>/p03 (patient directories p03000 through p03999)