Database Open Access

Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset

Shawn Tan, Satya Ortiz-Gagné, Nicolas Beaudoin-Gagnon, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

Published: April 12, 2022. Version: 1.0


When using this resource, please cite:
Tan, S., Ortiz-Gagné, S., Beaudoin-Gagnon, N., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2022). Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset (version 1.0). PhysioNet. https://doi.org/10.13026/kk0v-r952.

Additionally, please cite the original publication:

Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC).

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This is a dataset of continuous raw electrocardiogram (ECG) signals containing 11 thousand patients and 2 billion labelled beats. The signals were recorded with a 16-bit resolution at 250 Hz with a fixed chest-mounted single-lead probe for up to 2 weeks. The average patient age is 62.2±17.4 years. Twenty technologists annotated each beat's type (Normal, Premature Atrial Contraction, Premature Ventricular Contraction) and rhythm (Normal Sinusal Rhythm, Atrial Fibrillation, Atrial Flutter).


Background

Arrhythmia detection is presently performed by cardiologists or technologists familiar with ECG readings. Recently, supervised machine learning has been successfully applied to perform automated detection of many arrhythmias [1,2,3,4]. However, there may be ECG anomalies that warrant further investigation because they do not fit the morphology of presently known arrhythmias. We seek to use a data-driven approach to find these differences, which cardiologists have anecdotally observed. Existing public ECG datasets include the MIMIC-III Waveform Database and the ECG-ViEW II dataset [5,6]. Here we present Icentia11k, a dataset of continuous raw electrocardiogram (ECG) signals containing 11 thousand patients and 2 billion labelled beats.


Methods

Our data is collected by the CardioSTAT, a single-lead heart monitor device from Icentia [7]. The raw signals were recorded with a 16-bit resolution and sampled at 250 Hz with the CardioSTAT in a modified lead 1 position. The wealth of data this provides can help improve on the techniques currently used by the medical industry to process days' worth of ECG data, and perhaps catch anomalous events earlier than currently possible.

The dataset is processed from data provided by 11,000 patients who used the CardioSTAT device predominantly in Ontario, Canada, from various medical centers. While the device captures ECG data for up to two weeks, the majority of the prescribed duration of wear was one week.

The data is analyzed by Icentia's team of 20 technologists, who performed annotation using proprietary analysis tools. Initial beat detection is performed automatically; a technologist then analyses the record, labelling beat and rhythm types in a full-disclosure analysis (i.e. they see the whole recording). Finally, the analysis is approved by a senior technologist before it enters the dataset.

The ethics institutional review boards at the Université de Montréal approved the study and release of data (CERSES-19-065-D).


Data Description

We segment each patient record into segments of $2^{20}+1$ signal samples (≈70 minutes). This longer time context was informed by discussions with technologists: the context is useful for rhythm detection. We made it a power of two with a middle sample to allow for easier convolution stack parameterization. From the resulting list, we randomly select 50 segments and their respective labels. The goal here is to reduce the size of the dataset while maintaining a fair representation of each patient.
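
The arithmetic behind the segment length, and one way the per-patient subsampling could look, are sketched below. This is illustrative only: the function name `sample_segment_starts`, the non-overlapping split, the dropping of a trailing partial segment, and the seeded random generator are assumptions, not the authors' actual pipeline.

```python
import random

SAMPLE_RATE_HZ = 250
SEGMENT_LEN = 2**20 + 1  # 1,048,577 samples: a power of two plus a middle sample
MAX_SEGMENTS = 50

# Duration of one segment in minutes
segment_minutes = SEGMENT_LEN / SAMPLE_RATE_HZ / 60


def sample_segment_starts(n_samples, seed=0):
    """Return sorted start indices of up to MAX_SEGMENTS randomly chosen segments.

    Assumes non-overlapping segments and drops any trailing partial segment.
    """
    n_segments = n_samples // SEGMENT_LEN
    starts = [i * SEGMENT_LEN for i in range(n_segments)]
    rng = random.Random(seed)
    chosen = rng.sample(starts, min(MAX_SEGMENTS, n_segments))
    return sorted(chosen)


# A one-week recording at 250 Hz
week_samples = 7 * 24 * 3600 * SAMPLE_RATE_HZ
starts = sample_segment_starts(week_samples)
print(f"{segment_minutes:.1f} minutes per segment")
print(f"{week_samples // SEGMENT_LEN} segments available, {len(starts)} kept")
```

Under these assumptions a one-week recording yields 144 full segments, of which 50 are kept; a recording shorter than 50 segments keeps everything it has.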

Data structure

The data is structured into patients and segments.

Patient level (3-14 days)

At this level, the data can capture features that vary systematically rather than as isolated events, such as the placement of the probes or patient-specific noise.

Segment level (1,048,577 int16 samples, approximately 70 minutes)

A cardiologist can look at a specific segment and identify patterns which indicate a disease while ignoring noise from the signal such as a unique signal amplitude. Looking at trends in the segment helps to correctly identify arrhythmia, as half an hour provides the necessary context to observe the stress of a specific activity.

Aggregate statistics

Aggregate statistics are shown below:

Statistic                   Value (units)
Number of patients          11,000
Number of labeled beats     2,774,054,987
Sample rate                 250 Hz
Segment size                $2^{20}+1$ = 1,048,577 samples
Total number of segments    541,794 (not all patients have enough data for 50 segments)
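
As a rough consistency check on the statistics above, the following sketch derives the average number of segments per patient, the total recording time, and the raw signal size. It assumes each sample occupies 2 bytes (int16) and ignores headers and annotation files, so the byte figure is an estimate of signal storage only.

```python
N_PATIENTS = 11_000
N_SEGMENTS = 541_794
SEGMENT_LEN = 2**20 + 1   # samples per segment
SAMPLE_RATE_HZ = 250
BYTES_PER_SAMPLE = 2      # assumption: raw int16 storage

segments_per_patient = N_SEGMENTS / N_PATIENTS
total_samples = N_SEGMENTS * SEGMENT_LEN
total_days = total_samples / SAMPLE_RATE_HZ / 86_400
raw_bytes = total_samples * BYTES_PER_SAMPLE

print(f"{segments_per_patient:.2f} segments per patient on average")
print(f"{total_days:,.0f} days of ECG in total")
print(f"{raw_bytes / 1e12:.2f} TB of raw signal")
```

The estimate comes out slightly above 1.1 TB of raw signal, which is in line with the total uncompressed size reported for the files.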

Beats are annotated in ann.symbol at the R timepoint in the QRS complex. The timepoint in the record's signal array (rec.p_signal in wfdb-python) for each annotation is found in ann.sample. The table below shows beat counts over the entire dataset. Annotations with a '+' symbol do not mark a beat type; they indicate only that a rhythm annotation is present (see the rhythm table further below).

Symbol  Beat description                                                                       Count
N       Normal                                                                                 2,061,141,216
S       ESSV (PAC): Premature or ectopic supraventricular beat, premature atrial contraction   19,346,728
V       ESV (PVC): Premature ventricular contraction                                           17,203,041
Q       Undefined: Unclassifiable beat                                                         676,364,002
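
The per-class counts can be checked against the reported total of labeled beats, and the same tally can be applied to a single annotation file. The sketch below uses a hypothetical helper `count_beats` and a toy symbol list standing in for the symbols read from one annotation; only the four counts themselves come from the table.

```python
from collections import Counter

# Dataset-wide beat counts copied from the table above
BEAT_COUNTS = {
    "N": 2_061_141_216,
    "S": 19_346_728,
    "V": 17_203_041,
    "Q": 676_364_002,
}
total_beats = sum(BEAT_COUNTS.values())
fractions = {sym: n / total_beats for sym, n in BEAT_COUNTS.items()}


def count_beats(symbols):
    """Tally beat symbols, skipping '+' entries that only mark a rhythm annotation."""
    return Counter(s for s in symbols if s != "+")


# Toy symbol list standing in for the symbols of one annotation file
example = ["+", "N", "N", "S", "N", "V", "+", "Q"]
print(total_beats)
print({sym: round(100 * f, 2) for sym, f in fractions.items()})
print(count_beats(example))
```

The four classes sum to 2,774,054,987, matching the number of labeled beats in the aggregate statistics. Roughly three quarters of beats are Normal and about a quarter are Unclassifiable, so the PAC and PVC classes are strongly under-represented.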

Rhythms are annotated in ann.aux_note at each timepoint. For example a normal sinusal rhythm will start with a '(N' annotation and then end with a ')' annotation. The entire sequence in between is annotated as a normal sinusal rhythm. Below are the counts of each annotated region which could be one beat or thousands.

Symbol         Rhythm label                  Count
(N ... )       NSR (Normal sinusal rhythm)   16,083,158
(AFIB ... )    AFib (Atrial fibrillation)    848,564
(AFL ... )     AFlutter (Atrial flutter)     313,251
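
The opening and closing rhythm markers can be turned into labelled intervals with a single pass over the annotations. The following is a minimal sketch, assuming the sample indices and auxiliary notes are available as two aligned sequences (as ann.sample and ann.aux_note are). The function name `rhythm_intervals` is hypothetical, and regions left unclosed at the end of the sequence are simply dropped here.

```python
def rhythm_intervals(samples, aux_notes):
    """Pair each '(LABEL' opening note with the next ')' closing note.

    samples   -- annotation sample indices (e.g. ann.sample)
    aux_notes -- auxiliary notes aligned with samples (e.g. ann.aux_note)
    Returns a list of (label, start_sample, end_sample) tuples.
    """
    intervals = []
    label, start = None, None
    for sample, note in zip(samples, aux_notes):
        note = note.strip("\x00 ")  # tolerate padding on the stored notes
        if note.startswith("("):
            label, start = note[1:], sample
        elif note == ")" and label is not None:
            intervals.append((label, start, sample))
            label, start = None, None
    return intervals


# Toy annotations: a normal sinusal region followed by an atrial fibrillation region
samples = [10, 200, 450, 700, 900, 1200]
aux_notes = ["(N", "", ")", "(AFIB", "", ")"]
print(rhythm_intervals(samples, aux_notes))
# → [('N', 10, 450), ('AFIB', 700, 1200)]
```

When reading only a window of a segment, a region may open before the window or close after it, so the dropped unclosed regions would need separate handling in real use.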

Details on how the dataset is encoded into wfdb format are available on GitHub [8].


Usage Notes

By releasing this dataset, we seek to enable the research community to develop better models for detection of arrhythmia and related heart disease. The dataset is described in more detail in our accompanying paper [9], which also describes our efforts to evaluate existing models for classification of arrhythmia. Code for working with the data, including executable notebooks, is available on GitHub [8].

Example code

To look at patient 9000 and segment 0, the filename would be p09/p09000/p09000_s00, and it can be loaded using wfdb as follows:

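import wfdb

data_path = "."   # assumption: directory that contains the downloaded p00 ... p10 folders
patient_id = 9000
segment_id = 0
start = 2000      # first sample to read
length = 1024     # number of samples to read

# Patients are grouped by thousands, so patient 9000 lives under p09/p09000/
filename = f'{data_path}/p{patient_id // 1000:02d}/p{patient_id:05d}/p{patient_id:05d}_s{segment_id:02d}'
rec = wfdb.rdrecord(filename, sampfrom=start, sampto=start + length)
# shift_samps=True makes the annotation indices relative to sampfrom
ann = wfdb.rdann(filename, "atr", sampfrom=start, sampto=start + length, shift_samps=True)
wfdb.plot_wfdb(rec, ann, plot_sym=True, figsize=(15, 4))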
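
The filename pattern can be wrapped in a small helper so that the folder for any patient is derived consistently. This is a sketch rather than part of the official tooling: the name `record_path`, the one-folder-per-thousand-patients rule, and the assumption that segments are numbered from 0 to 49 are inferred from the example filename and the 50-segment limit.

```python
from pathlib import PurePosixPath


def record_path(data_path, patient_id, segment_id):
    """Return the extension-less record name for one segment.

    Assumes one top-level folder per thousand patients (p00, p01, ...),
    following the example filename pattern.
    """
    if patient_id < 0 or segment_id < 0:
        raise ValueError("patient_id and segment_id must be non-negative")
    group = f"p{patient_id // 1000:02d}"
    patient = f"p{patient_id:05d}"
    record = f"{patient}_s{segment_id:02d}"
    return str(PurePosixPath(data_path) / group / patient / record)


# Candidate record names for the (up to) 50 segments of patient 9000
paths = [record_path("data", 9000, s) for s in range(50)]
print(paths[0])
print(paths[-1])
```

Because not every patient has 50 segments, code that iterates over these names should check that the corresponding files exist before reading them.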

Limitations

It should be noted that since the people who wear the device are patients, the dataset does not represent a true random sample of the global population. For one, the average patient age is 62.2±17.4 years. Furthermore, although the CardioSTAT can be worn by any patient, it is mostly used as a third-line exam, so the majority of records in the dataset exhibit arrhythmias. No particular patient selection was applied beyond the fact that data collection took place during 2017 and 2018.


Release Notes

Version 1.0: First release on PhysioNet. Prior to this release data was made available on AcademicTorrents [10].


Ethics

The authors declare no ethics concerns. The ethics institutional review boards at the University of Montreal approved the study and release of data (#CERSES-19-065-D).


Acknowledgements

We thank Leon Glass, Yannick Le Devehat, Germain Ethier, Margaux Luck, Kris Sankaran, and Gabriele Prato for useful discussions. This work is partially funded by a grant from Icentia, Fonds de Recherche en Santé du Québec, and the Institut de valorisation des donnees (IVADO). This work utilized the supercomputing facilities managed by Compute Canada and Calcul Quebec. We thank AcademicTorrents.com for making data available for our research.


Conflicts of Interest

None


References

  1. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, Ng AY. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 2019.
  2. Yıldırım O, Pławiak P, Tan RS, Acharya UR. Arrhythmia detection using deep convolutional neural network with long duration ECG signals. Computers in Biology and Medicine 2018.
  3. Minchole A, Rodriguez B. Artificial intelligence for the electrocardiogram. Nature Medicine 1 2019.
  4. Porumb M, Iadanza E, Massaro S, Pecchia L. A convolutional neural network approach to detect congestive heart failure. Biomedical Signal Processing and Control 2020.
  5. Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). PhysioNet. https://doi.org/10.13026/C2XW26.
  6. Kim YG, Shin D, Park MY, Lee S, Jeon MS, Yoon D, Park RW. ECG-ViEW II, a freely accessible electrocardiogram database. PloS one 2017.
  7. Icentia website. https://www.icentia.com/
  8. Icentia11k project on GitHub. https://github.com/shawntan/icentia-ecg/tree/master/physionet
  9. Tan, S., Androz, G., Ortiz-Gagné, S., Chamseddine, A., Fecteau, P., Courville, A., Bengio, Y., & Cohen, J. P. (2021, October 21). Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery. Computing in Cardiology Conference (CinC). https://www.cinc.org/2021/Program/accepted/229_Preprint.pdf
  10. Icentia11k Dataset on Academic Torrents. https://academictorrents.com/details/af04abfe9a3c96b30e5dd029eb185e19a7055272

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License

Files

Total uncompressed size: 1.1 TB.
