Database Open Access

Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset

Shawn Tan, Satya Ortiz-Gagné, Nicolas Beaudoin-Gagnon, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

Published: April 12, 2022. Version: 1.0


When using this resource, please cite:
Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952.

Additionally, please cite the original publication:

Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This dataset contains continuous raw electrocardiogram (ECG) signals from 11 thousand patients, with 2 billion labelled beats. The signals were recorded at 250 Hz with 16-bit resolution using a fixed, chest-mounted, single-lead probe for up to 2 weeks. The average patient age is 62.2±17.4 years. Twenty technologists annotated each beat's type (Normal, Premature Atrial Contraction, Premature Ventricular Contraction) and rhythm (Normal Sinusal Rhythm, Atrial Fibrillation, Atrial Flutter).


Background

Arrhythmia detection is presently performed by cardiologists or technologists familiar with ECG readings. Recently, supervised machine learning has been successfully applied to automated detection of many arrhythmias [1,2,3,4]. However, there may be ECG anomalies that warrant further investigation because they do not fit the morphology of presently known arrhythmias. We seek a data-driven approach to finding these differences, which cardiologists have observed anecdotally. Existing public ECG datasets include the MIMIC-III Waveform Database and the ECG-ViEW II dataset [5,6]. Here we present Icentia11k, a dataset of continuous raw electrocardiogram (ECG) signals from 11 thousand patients, with 2 billion labelled beats.


Methods

Our data is collected by the CardioSTAT, a single-lead heart monitor device from Icentia [7]. The raw signals were recorded with 16-bit resolution and sampled at 250 Hz, with the CardioSTAT in a modified lead 1 position. This volume of data may help improve the techniques currently used by the medical industry to process days' worth of ECG data, and perhaps catch anomalous events earlier than is currently possible.

The dataset is processed from data provided by 11,000 patients who used the CardioSTAT device at various medical centers, predominantly in Ontario, Canada. While the device captures ECG data for up to two weeks, most patients were prescribed one week of wear.

The data was analyzed by Icentia's team of 20 technologists, who performed the annotation using proprietary analysis tools. Initial beat detection is performed automatically. A technologist then analyzes the record, labelling beat and rhythm types in a full disclosure analysis (i.e. they see the whole recording). Finally, the analysis is approved by a senior technologist before it enters the dataset.

The ethics institutional review boards at the Université de Montréal approved the study and release of data (CERSES-19-065-D).


Data Description

We segment each patient record into segments of $2^{20}+1$ signal samples (≈70 minutes). This longer time context was informed by discussions with technologists: the context is useful for rhythm detection. We made the length a power of two plus a middle sample to allow for easier convolution stack parameterization. From the resulting list of segments, we randomly select 50 segments per patient, together with their labels. The goal here is to reduce the size of the dataset while maintaining a fair representation of each patient.
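
The segmentation and sampling described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the non-overlapping layout, the dropping of a trailing partial segment, and the fixed random seed are all assumptions made for the example.

```python
import random

SAMPLE_RATE_HZ = 250        # sampling rate stated in the dataset description
SEGMENT_LEN = 2**20 + 1     # 1,048,577 samples per segment
MAX_SEGMENTS = 50           # segments kept per patient


def segment_bounds(n_samples, segment_len=SEGMENT_LEN):
    """Return (start, end) sample indices of consecutive full-length segments.

    Assumption: segments do not overlap and a trailing partial segment is dropped.
    """
    return [(s, s + segment_len)
            for s in range(0, n_samples - segment_len + 1, segment_len)]


def sample_segments(bounds, k=MAX_SEGMENTS, seed=0):
    """Pick at most k segments uniformly at random, returned in temporal order."""
    if len(bounds) <= k:
        return list(bounds)
    rng = random.Random(seed)
    return sorted(rng.sample(bounds, k))


# Duration of one segment in minutes.
segment_minutes = SEGMENT_LEN / SAMPLE_RATE_HZ / 60

# Hypothetical recording: exactly one week of continuous signal.
one_week_samples = 7 * 24 * 3600 * SAMPLE_RATE_HZ
bounds = segment_bounds(one_week_samples)
chosen = sample_segments(bounds)

print(f"{segment_minutes:.1f} minutes per segment")
print(f"{len(bounds)} full segments in one week, {len(chosen)} kept")
```

Under these assumptions a recording shorter than 50 segments keeps all of its segments, which is consistent with the note in the statistics that not all patients have enough data for 50 segments.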

Data structure

The data is structured into patients and segments.

Patient level (3-14 days)

At this level, the data can capture features that vary systematically, such as the placement of the probes or patient-specific noise, rather than isolated events.

Segment level (1,048,577 int16 samples, approximately 70 minutes)

A cardiologist can look at a specific segment and identify patterns that indicate a disease while ignoring noise in the signal, such as a unique signal amplitude. Looking at trends in the segment helps to correctly identify arrhythmia, as half an hour provides the necessary context to observe the stress of a specific activity.

Aggregate statistics

Aggregate statistics are shown below:

| Statistic | # (units) |
| Number of patients | 11,000 |
| Number of labeled beats | 2,774,054,987 |
| Sample rate | 250 Hz |
| Segment size | $2^{20}+1$ = 1,048,577 samples |
| Total number of segments | 541,794 (not all patients have enough for 50 segments) |
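
As a rough cross-check of these figures, the arithmetic below estimates the raw signal volume. It is a back-of-the-envelope sketch that assumes every segment has the full length and that samples are stored as 2-byte int16 values with no header or annotation overhead; the variable names are illustrative only.

```python
N_PATIENTS = 11_000
N_SEGMENTS = 541_794
SEGMENT_LEN = 2**20 + 1     # samples per segment
BYTES_PER_SAMPLE = 2        # int16
SAMPLE_RATE_HZ = 250

# Upper bound on the segment count if every patient had 50 segments.
max_segments = N_PATIENTS * 50

# Total signal volume, assuming every segment is full length.
total_samples = N_SEGMENTS * SEGMENT_LEN
raw_terabytes = total_samples * BYTES_PER_SAMPLE / 1e12
total_years = total_samples / SAMPLE_RATE_HZ / (365 * 24 * 3600)

print(f"segments: {N_SEGMENTS:,} of at most {max_segments:,}")
print(f"raw signal: about {raw_terabytes:.2f} TB")
print(f"recording time: about {total_years:.0f} years")
```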

Beats are annotated in ann.symbol at the R timepoint in the QRS complex. The sample index in rec.p_signal for each annotation is found in ann.sample. The table below shows the beat counts over the entire dataset. There are also annotations with a '+' symbol, which only indicate that there is a rhythm annotation (next table).

| Symbol | Beat Description | Count |
| N | Normal | 2,061,141,216 |
| S | ESSV (PAC): Premature or ectopic supraventricular beat, premature atrial contraction | 19,346,728 |
| V | ESV (PVC): Premature ventricular contraction | 17,203,041 |
| Q | Undefined: Unclassifiable beat | 676,364,002 |
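
The per-class counts can be checked against the total number of labeled beats, and they also show how imbalanced the classes are. The snippet below is a small sketch using only the numbers in the table; the dictionary and variable names are illustrative.

```python
# Beat counts copied from the table above.
beat_counts = {
    "N": 2_061_141_216,   # normal
    "S": 19_346_728,      # premature atrial contraction
    "V": 17_203_041,      # premature ventricular contraction
    "Q": 676_364_002,     # unclassifiable
}

total_beats = sum(beat_counts.values())
fractions = {symbol: count / total_beats for symbol, count in beat_counts.items()}

print(f"total labeled beats: {total_beats:,}")
for symbol, fraction in fractions.items():
    print(f"{symbol}: {fraction:.2%}")
```

The total matches the number of labeled beats reported in the aggregate statistics. The S and V classes each make up less than 1% of the beats, which is worth keeping in mind when choosing evaluation metrics or sampling strategies.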

Rhythms are annotated in ann.aux_note at each timepoint. For example, a normal sinusal rhythm starts with a '(N' annotation and ends with a ')' annotation. The entire sequence in between is annotated as a normal sinusal rhythm. Below are the counts of annotated regions of each type; a region may span one beat or thousands.

| Symbol | Rhythm Labels | Count |
| (N ... ) | NSR (Normal sinusal rhythm) | 16,083,158 |
| (AFIB ... ) | AFib (Atrial fibrillation) | 848,564 |
| (AFL ... ) | AFlutter (Atrial flutter) | 313,251 |
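
The opening and closing markers can be paired into labelled intervals. The following is a minimal sketch that works on plain lists shaped like ann.sample and ann.aux_note. The function name is hypothetical, the input values are made up for illustration, and the handling of edge cases (stripping padding characters, dropping a region left open at the end of a segment) is an assumption rather than documented behaviour.

```python
def rhythm_intervals(samples, aux_notes):
    """Pair each '(LABEL' note with the next ')' note.

    Returns a list of (label, start_sample, end_sample) tuples.
    Assumption: a region still open when the input ends is dropped.
    """
    intervals = []
    label, start = None, None
    for sample, note in zip(samples, aux_notes):
        note = note.strip(" \x00")          # defensive: remove padding, if any
        if note.startswith("(") and len(note) > 1:
            label, start = note[1:], sample  # region opens at this beat
        elif note == ")" and label is not None:
            intervals.append((label, start, sample))
            label, start = None, None        # region closed
    return intervals


# Made-up annotation positions and notes, for illustration only.
samples = [10, 200, 410, 640, 900, 1130, 1400]
aux_notes = ["(N", "", ")", "(AFIB", "", ")", "(AFL"]

for label, start, end in rhythm_intervals(samples, aux_notes):
    print(label, start, end)
```

With real data, the two lists would come from the annotation object returned by wfdb.rdann.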

Details on how the dataset is encoded into wfdb format are available on GitHub [8].


Usage Notes

By releasing this dataset, we seek to enable the research community to develop better models for detection of arrhythmia and related heart disease. The dataset is described in more detail in our accompanying paper [9], which also describes our efforts to evaluate existing models for classification of arrhythmia. Code for working with the data, including executable notebooks, is available on GitHub [8].

Example code

To look at patient 9000 and segment 0, the filename would be p09/p09000/p09000_s00, and it can be loaded using wfdb as follows:

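import wfdb

data_path = '.'  # assumption: directory that holds the p00, p01, ... folders
patient_id = 9000
segment_id = 0
start = 2000
length = 1024
# Patients are grouped by thousands: p09000 is in p09, p10000 is in p10.
filename = f'{data_path}/p{patient_id // 1000:02d}/p{patient_id:05d}/p{patient_id:05d}_s{segment_id:02d}'
# Read `length` samples of the signal, starting at sample `start`.
rec = wfdb.rdrecord(filename, sampfrom=start, sampto=start+length)
# shift_samps=True makes the annotation indices relative to `start`.
ann = wfdb.rdann(filename, "atr", sampfrom=start, sampto=start+length, shift_samps=True)
wfdb.plot_wfdb(rec, ann, plot_sym=True, figsize=(15,4))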
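
The naming convention can also be wrapped in a small helper. This is a sketch, not part of the dataset's tooling: the function name is hypothetical, and grouping patients into folders by thousands (including a p00 folder for IDs below 1000) is inferred from the example filename and the folder listing rather than stated explicitly.

```python
from pathlib import PurePosixPath


def segment_record_name(data_path, patient_id, segment_id):
    """Build the wfdb record name (no file extension) for one segment.

    Assumption: patients are grouped by thousands, so patient 9000 is
    stored under p09 and patient 10000 under p10.
    """
    group = f"p{patient_id // 1000:02d}"
    patient = f"p{patient_id:05d}"
    record = f"{patient}_s{segment_id:02d}"
    return str(PurePosixPath(data_path) / group / patient / record)


print(segment_record_name("data", 9000, 0))
print(segment_record_name("data", 10999, 49))
```

The returned string can be passed as the record name to wfdb.rdrecord and wfdb.rdann.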

Limitations

Because the people who wear the device are patients, the dataset is not a true random sample of the global population. For one, the average patient age is 62.2±17.4 years. Furthermore, although the CardioSTAT can be worn by any patient, it is mostly used as a third-line exam, so the majority of records in the dataset exhibit arrhythmias. No particular patient selection was applied, other than that data collection was conducted over the years 2017 and 2018.


Ethics

The authors declare no ethics concerns. The ethics institutional review boards at the University of Montreal approved the study and release of data (#CERSES-19-065-D).


Release Notes

Version 1.0: First release on PhysioNet. Prior to this release data was made available on AcademicTorrents [10].


Acknowledgements

We thank Leon Glass, Yannick Le Devehat, Germain Ethier, Margaux Luck, Kris Sankaran, and Gabriele Prato for useful discussions. This work is partially funded by a grant from Icentia, Fonds de Recherche en Santé du Québec, and the Institut de valorisation des données (IVADO). This work utilized the supercomputing facilities managed by Compute Canada and Calcul Québec. We thank AcademicTorrents.com for making data available for our research.


Conflicts of Interest

None


References

  1. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019.
  2. Yıldırım O, Pławiak P, Tan RS, Acharya UR. Arrhythmia detection using deep convolutional neural network with long duration ECG signals. Computers in Biology and Medicine 2018.
  3. Minchole A, Rodriguez B. Artificial intelligence for the electrocardiogram. Nature Medicine 1 2019.
  4. Porumb M, Iadanza E, Massaro S, Pecchia L. A convolutional neural network approach to detect congestive heart failure. Biomedical Signal Processing and Control 2020.
  5. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
  6. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, Park RW. ECG-ViEW II, a freely accessible electrocardiogram database. PloS one 2017.
  7. Icentia website. https://www.icentia.com/
  8. Icentia11k project on GitHub. https://github.com/shawntan/icentia-ecg/tree/master/physionet
  9. Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC). https://www.cinc.org/2021/Program/accepted/229_Preprint.pdf
  10. Icentia11k Dataset on Academic Torrents. https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

Files

Total uncompressed size: 1.1 TB.

Folder listing for <base>/p10: patient directories p10000 through p10999 (one directory per patient).