Database Open Access

Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset

Shawn Tan, Satya Ortiz-Gagné, Nicolas Beaudoin-Gagnon, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

Published: April 12, 2022. Version: 1.0


When using this resource, please cite:
Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952.

Additionally, please cite the original publication:

Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This is a dataset of continuous raw electrocardiogram (ECG) signals from 11 thousand patients, containing 2 billion labelled beats. The signals were recorded at 16-bit resolution and 250 Hz with a fixed, chest-mounted, single-lead probe for up to 2 weeks. The average patient age is 62.2±17.4 years. A team of 20 technologists annotated each beat's type (Normal, Premature Atrial Contraction, Premature Ventricular Contraction) and rhythm (Normal Sinusal Rhythm, Atrial Fibrillation, Atrial Flutter).


Background

Arrhythmia detection is presently performed by cardiologists or technologists familiar with ECG readings. Recently, supervised machine learning has been successfully applied to automated detection of many arrhythmias [1,2,3,4]. However, there may be ECG anomalies that warrant further investigation because they do not fit the morphology of presently known arrhythmias. We seek a data-driven approach to finding these differences, which cardiologists have observed anecdotally. Existing public ECG datasets include the MIMIC-III Waveform Database and the ECG-ViEW II dataset [5,6]. Here we present Icentia11k, a dataset of continuous raw electrocardiogram (ECG) signals from 11 thousand patients, containing 2 billion labelled beats.


Methods

Our data were collected with the CardioSTAT, a single-lead heart monitor device from Icentia [7]. The raw signals were recorded at 16-bit resolution and sampled at 250 Hz with the CardioSTAT in a modified lead 1 position. The wealth of data this provides may allow us to improve on the techniques currently used by the medical industry to process days' worth of ECG data, and perhaps to catch anomalous events earlier than is currently possible.

The dataset is derived from recordings of 11,000 patients who used the CardioSTAT device at various medical centers, predominantly in Ontario, Canada. While the device can capture ECG data for up to two weeks, the prescribed duration of wear was one week for the majority of patients.

The data were analyzed by Icentia's team of 20 technologists, who performed the annotation using proprietary analysis tools. Initial beat detection is performed automatically. A technologist then analyses the record and labels beat and rhythm types in a full disclosure analysis (i.e. they see the whole recording). Finally, the analysis is approved by a senior technologist before it is added to the dataset.

The ethics institutional review boards at the Université de Montréal approved the study and release of data (CERSES-19-065-D).


Data Description

We segment each patient record into segments of $2^{20}+1$ signal samples (≈70 minutes). This longer time context was informed by discussions with technologists: the context is useful for rhythm detection. We made it a power of two with a middle sample to allow for easier convolution stack parameterization. From this, we randomly select 50 of the segments and their respective labels from the list of segments. The goal here is to reduce the size of the dataset while maintaining a fair representation of each patient.
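The segment arithmetic and the per-patient subsampling described above can be sketched as follows. This is only an illustration, not the authors' preprocessing code: the helper names, the choice of non-overlapping segments, the dropping of a trailing partial segment, the fixed seed, and the chronological ordering of the selection are all assumptions made for the example.

```python
import random

FS_HZ = 250               # sampling rate stated in the text
SEGMENT_LEN = 2**20 + 1   # samples per segment stated in the text
MAX_SEGMENTS = 50         # segments kept per patient, as stated in the text


def segment_minutes(n_samples=SEGMENT_LEN, fs=FS_HZ):
    """Duration of one segment in minutes."""
    return n_samples / fs / 60


def split_into_segments(n_total, seg_len=SEGMENT_LEN):
    """(start, stop) sample indices of full-length segments.

    Assumption: segments do not overlap and a trailing partial
    segment is discarded.
    """
    return [(s, s + seg_len) for s in range(0, n_total - seg_len + 1, seg_len)]


def sample_segments(bounds, k=MAX_SEGMENTS, seed=0):
    """Randomly keep at most k segments (assumed: returned in time order)."""
    if len(bounds) <= k:
        return list(bounds)
    return sorted(random.Random(seed).sample(bounds, k))


# Example: a hypothetical one-week recording at 250 Hz.
week_samples = 7 * 24 * 3600 * FS_HZ
kept = sample_segments(split_into_segments(week_samples))
```

Under these assumptions a one-week recording yields 144 candidate segments, of which 50 are kept; a recording with fewer than 50 full segments keeps all of them, which is consistent with the note that not every patient contributes 50 segments.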

Data structure

The data is structured into patients and segments.

Patient level (3-14 days)

At this level, the data can capture features that vary systematically rather than as isolated events, such as the placement of the probes or patient-specific noise.

Segment level (1,048,577 int16 samples, approximately 70 minutes)

A cardiologist can look at a specific segment and identify patterns which indicate a disease while ignoring signal-specific noise such as a unique signal amplitude. Looking at trends within the segment helps to correctly identify arrhythmia, as half an hour provides the necessary context to observe the stress of a specific activity.

Aggregate statistics

Aggregate statistics are shown below:

Statistic                   Value (units)
Number of patients          11,000
Number of labeled beats     2,774,054,987
Sample rate                 250 Hz
Segment size                $2^{20}+1$ = 1,048,577 samples
Total number of segments    541,794 (not all patients have enough data for 50 segments)

Beats are annotated in ann.symbol at the R timepoint in the QRS complex. The sample index into the signal (rec.p_signal) for each annotation is found in ann.sample. The table below shows the beat counts over the entire dataset. There are also annotations with a '+' symbol, which only indicate that a rhythm annotation is present (see the rhythm table further below).

Symbol    Beat Description                                                                       Count
N         Normal                                                                                 2,061,141,216
S         ESSV (PAC): Premature or ectopic supraventricular beat, premature atrial contraction   19,346,728
V         ESV (PVC): Premature ventricular contraction                                           17,203,041
Q         Undefined: Unclassifiable beat                                                         676,364,002
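As a small illustration of how the beat table relates to the annotation symbols, the sketch below tallies beat symbols from a list of annotation symbols while skipping the '+' rhythm markers. The function name and the toy input are hypothetical; with wfdb, the list would typically come from the `symbol` attribute of the annotation object returned by `wfdb.rdann`.

```python
from collections import Counter

# Beat symbols listed in the table above; '+' marks a rhythm annotation only.
BEAT_SYMBOLS = ("N", "S", "V", "Q")

# Dataset-wide counts copied from the table above.
TABLE_COUNTS = {
    "N": 2_061_141_216,
    "S": 19_346_728,
    "V": 17_203_041,
    "Q": 676_364_002,
}


def count_beats(symbols):
    """Count beat symbols, ignoring anything that is not a beat (e.g. '+')."""
    counts = Counter(s for s in symbols if s in BEAT_SYMBOLS)
    return {sym: counts.get(sym, 0) for sym in BEAT_SYMBOLS}


# Toy example standing in for the symbols of one annotated window.
example = count_beats(["N", "+", "N", "V", "Q", "S", "N"])
total_beats = sum(TABLE_COUNTS.values())
```

The four per-symbol counts in the table add up to the 2,774,054,987 labeled beats reported in the aggregate statistics.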

Rhythms are annotated in ann.aux_note at each timepoint. For example a normal sinusal rhythm will start with a '(N' annotation and then end with a ')' annotation. The entire sequence in between is annotated as a normal sinusal rhythm. Below are the counts of each annotated region which could be one beat or thousands.

Symbol         Rhythm Label                   Count
(N ... )       NSR (Normal sinusal rhythm)    16,083,158
(AFIB ... )    AFib (Atrial fibrillation)     848,564
(AFL ... )     AFlutter (Atrial flutter)      313,251
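The open/close convention for rhythm annotations can be turned into explicit intervals with a short helper. The following is a minimal sketch under stated assumptions: the function name is hypothetical, regions are assumed not to nest, and a region that is still open at the end of the input (for example because the window was cut) is simply dropped. The inputs correspond to the `sample` and `aux_note` attributes of a wfdb annotation object.

```python
def rhythm_intervals(samples, aux_notes):
    """Convert '(LABEL' ... ')' markers into (label, start, end) tuples.

    samples   : annotation sample indices
    aux_notes : auxiliary note strings, one per annotation ('' if none)
    """
    intervals = []
    current = None  # (label, start_sample) of the currently open region
    for sample, note in zip(samples, aux_notes):
        # Remove padding that some readers leave around the note text.
        note = note.replace("\x00", "").strip()
        if note.startswith("(") and len(note) > 1:
            current = (note[1:], sample)
        elif note == ")" and current is not None:
            label, start = current
            intervals.append((label, start, sample))
            current = None
    return intervals


# Toy example: a normal sinusal region followed by an atrial fibrillation region.
toy_samples = [10, 50, 200, 210, 400]
toy_notes = ["(N", "", ")", "(AFIB", ")"]
toy_intervals = rhythm_intervals(toy_samples, toy_notes)
```

Each returned tuple corresponds to one annotated region in the sense of the count table above, whether it spans one beat or thousands.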

Details on how the dataset is encoded into wfdb format are available on GitHub [8].


Usage Notes

By releasing this dataset, we seek to enable the research community to develop better models for detection of arrhythmia and related heart disease. The dataset is described in more detail in our accompanying paper [9], which also describes our efforts to evaluate existing models for classification of arrhythmia. Code for working with the data, including executable notebooks, is available on GitHub [8].

Example code

To look at patient 9000 and segment 0, the filename would be p09/p09000/p09000_s00, and it can be loaded using wfdb as follows:

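import wfdb

data_path = '.'  # placeholder: directory that contains the pNN folders
patient_id = 9000
segment_id = 0
start = 2000   # first sample to read
length = 1024  # number of samples to read

# Patients are grouped 1000 per folder, e.g. patient 9000 is in p09/p09000.
patient_dir = f'p{patient_id // 1000:02d}/p{patient_id:05d}'
filename = f'{data_path}/{patient_dir}/p{patient_id:05d}_s{segment_id:02d}'

# shift_samps=True makes annotation indices relative to sampfrom,
# so they line up with the signal window in the plot.
rec = wfdb.rdrecord(filename, sampfrom=start, sampto=start+length)
ann = wfdb.rdann(filename, "atr", sampfrom=start, sampto=start+length, shift_samps=True)
wfdb.plot_wfdb(rec, ann, plot_sym=True, figsize=(15,4))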
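The record naming scheme can also be wrapped in a small helper so that paths are built consistently for any patient. This is a sketch only: `record_path` is a hypothetical name, and the rule that folder pNN holds patients NN000 to NN999 is inferred from the example filename and the file listing rather than documented explicitly.

```python
def record_path(data_path, patient_id, segment_id):
    """Build the wfdb record name (without extension) for one segment.

    Assumes folder pNN contains patients NN000..NN999, e.g. patient 9000
    is stored under p09/p09000.
    """
    if patient_id < 0 or segment_id < 0:
        raise ValueError("patient_id and segment_id must be non-negative")
    group = patient_id // 1000
    return (
        f"{data_path}/p{group:02d}/p{patient_id:05d}/"
        f"p{patient_id:05d}_s{segment_id:02d}"
    )


# Example: the record used in the text, relative to a placeholder root "data".
example_path = record_path("data", 9000, 0)
```

The resulting string can be passed directly as the record name to `wfdb.rdrecord` and `wfdb.rdann`. Using integer division rather than slicing the decimal string keeps the folder correct for patient numbers with fewer or more than four digits.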

Limitations

Because the people who wear the device are patients, the dataset is not a true random sample of the global population. For one, the average patient age is 62.2±17.4 years. Furthermore, although the CardioSTAT can be worn by any patient, it is mostly used as a third-line exam, so the majority of records in the dataset exhibit arrhythmias. No particular patient selection was applied, other than that data collection took place during 2017 and 2018.


Ethics

The authors declare no ethics concerns. The ethics institutional review boards at the University of Montreal approved the study and release of data (#CERSES-19-065-D).


Release Notes

Version 1.0: First release on PhysioNet. Prior to this release data was made available on AcademicTorrents [10].


Acknowledgements

We thank Leon Glass, Yannick Le Devehat, Germain Ethier, Margaux Luck, Kris Sankaran, and Gabriele Prato for useful discussions. This work is partially funded by a grant from Icentia, Fonds de Recherche en Santé du Québec, and the Institut de valorisation des donnees (IVADO). This work utilized the supercomputing facilities managed by Compute Canada and Calcul Quebec. We thank AcademicTorrents.com for making data available for our research.


Conflicts of Interest

None


References

  1. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019.
  2. Yıldırım O, Pławiak P, Tan RS, Acharya UR. Arrhythmia detection using deep convolutional neural network with long duration ecg signals. Computers in biology and medicine 2018.
  3. Minchole A, Rodriguez B. Artificial intelligence for the electrocardiogram. Nature Medicine 1 2019.
  4. Porumb M, Iadanza E, Massaro S, Pecchia L. A convolutional neural network approach to detect congestive heart failure. Biomedical Signal Processing and Control 2020.
  5. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
  6. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, Park RW. ECG-ViEW II, a freely accessible electrocardiogram database. PloS one 2017.
  7. Icentia website. https://www.icentia.com/
  8. Icentia11k project on GitHub. https://github.com/shawntan/icentia-ecg/tree/master/physionet
  9. Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC). https://www.cinc.org/2021/Program/accepted/229_Preprint.pdf
  10. Icentia11k Dataset on Academic Torrents. https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

Files

Total uncompressed size: 1.1 TB.

Folder <base>/p07 contains the patient directories p07000 through p07999 (1,000 entries).