Name: The CirCor DigiScope Phonocardiogram Dataset
Published: Jan. 28, 2022
License: https://opendatacommons.org/licenses/by/index.html

Database Open Access

Jorge Oliveira, Francesco Renna, Paulo Costa, Marcelo Nogueira, Ana Cristina Oliveira, Andoni Elola, Carlos Ferreira, Alipio Jorge, Ali Bahrami Rad, Matthew Reyna, Reza Sameni, Gari Clifford, Miguel Coimbra

Published: Jan. 28, 2022. Version: 1.0.1

This is not the latest version.

When using this resource, please cite:

MLA	Oliveira, Jorge, et al. "The CirCor DigiScope Phonocardiogram Dataset" (version 1.0.1). PhysioNet (2022), https://doi.org/10.13026/7bkn-d780.
APA	Oliveira, J., Renna, F., Costa, P., Nogueira, M., Oliveira, A. C., Elola, A., Ferreira, C., Jorge, A., Bahrami Rad, A., Reyna, M., Sameni, R., Clifford, G., & Coimbra, M. (2022). The CirCor DigiScope Phonocardiogram Dataset (version 1.0.1). PhysioNet. https://doi.org/10.13026/7bkn-d780.
Chicago	Oliveira, Jorge, Renna, Francesco, Costa, Paulo, Nogueira, Marcelo, Oliveira, Ana Cristina, Elola, Andoni, Ferreira, Carlos, Jorge, Alipio, Bahrami Rad, Ali, Reyna, Matthew, Sameni, Reza, Clifford, Gari, and Miguel Coimbra. "The CirCor DigiScope Phonocardiogram Dataset" (version 1.0.1). PhysioNet (2022). https://doi.org/10.13026/7bkn-d780.
Harvard	Oliveira, J., Renna, F., Costa, P., Nogueira, M., Oliveira, A. C., Elola, A., Ferreira, C., Jorge, A., Bahrami Rad, A., Reyna, M., Sameni, R., Clifford, G., and Coimbra, M. (2022) 'The CirCor DigiScope Phonocardiogram Dataset' (version 1.0.1), PhysioNet. Available at: https://doi.org/10.13026/7bkn-d780.
Vancouver	Oliveira J, Renna F, Costa P, Nogueira M, Oliveira A C, Elola A, Ferreira C, Jorge A, Bahrami Rad A, Reyna M, Sameni R, Clifford G, Coimbra M. The CirCor DigiScope Phonocardiogram Dataset (version 1.0.1). PhysioNet. 2022. Available from: https://doi.org/10.13026/7bkn-d780.

Additionally, please cite the original publication:

J. H. Oliveira, F. Renna, P. Costa, D. Nogueira, C. Oliveira, C. Ferreira, A. Jorge, S. Mattos, T. Hatem, T. Tavares, A. Elola, A. Rad, R. Sameni, G. D. Clifford, & M. T. Coimbra (2021). The CirCor DigiScope Dataset: From Murmur Detection to Murmur Classification. IEEE Journal of Biomedical and Health Informatics, https://doi.org/10.1109/JBHI.2021.3137048

Please include the standard citation for PhysioNet:

APA	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
MLA	Goldberger, A., et al. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
CHICAGO	Goldberger, A., L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
HARVARD	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P.C., Mark, R., Mietus, J.E., Moody, G.B., Peng, C.K. and Stanley, H.E., 2000. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
VANCOUVER	Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

A total of 5272 heart sound recordings were collected from the four main auscultation locations of 1568 subjects, aged between 0 and 21 years (mean ± STD = 6.1 ± 4.1 years), with durations between 4.8 and 80.4 seconds (mean ± STD = 22.9 ± 7.4 s), totaling more than 33.5 hours of recording. Each cardiac murmur in the dataset has been annotated in detail by a human annotator in terms of timing, shape, pitch, grading, quality, and location. Moreover, segmentation annotations regarding the location of the fundamental heart sounds (S1 and S2) in the recordings have been obtained using a semi-supervised scheme. The segmentation annotations were first produced by voting among three state-of-the-art machine-based algorithms. An expert annotator then studied the consensus and mismatches between the algorithms beat by beat and performed a manual annotation whenever the algorithms disagreed or their output was not acceptable to the expert. To date, the dataset is the largest publicly available pediatric heart sound dataset, supporting deeper research on auscultation-based health recommendation systems. The dataset is being used in the George B. Moody PhysioNet Challenge 2022 on Heart Murmur Detection from Phonocardiogram Recordings.

Background

Fundamental heart sounds (namely S1 and S2) are generated from vibrations of the cardiac valves as they open and close during the cardiac cycle. Valve malfunctioning causes turbulence in the blood flow within heart chambers and near the heart, which translates into abnormal sounds known as murmurs. The analysis of murmurs provides invaluable information regarding the functioning status of heart valves. The anatomical position of heart valves relative to the chest wall dictates the optimal auscultation position. As such, for clinical auscultations of each heart valve, the stethoscope is ideally placed at specific locations:

Aortic valve: second intercostal space, right sternal border
Pulmonic valve: second intercostal space, left sternal border
Tricuspid valve: left lower sternal border
Mitral valve: fifth intercostal space, midclavicular line (cardiac apex)

Blood flowing through these structures creates sounds that become more audible as the flow becomes more turbulent [1]. The first heart sound (S1) is produced by vibrations of the mitral and tricuspid valves as they close at the beginning of systole. S1 is audible at the chest wall and is formed by two components, the mitral and the tricuspid [1]. Although the mitral component of S1 is louder and occurs sooner, under physiological resting conditions the two components occur so closely together that they are hard to distinguish [2]. The second heart sound (S2) is produced by the closure of the aortic and pulmonic valves at the beginning of diastole. Like S1, it is formed by two components, with the aortic component being louder and occurring sooner than the pulmonic component because the pressure in the aorta is higher than in the pulmonary artery. Unlike S1, under normal conditions the closure sounds of the aortic and pulmonic valves can be discerned separately, due to an increase in venous return during inspiration, which slightly delays the pressure increase in the pulmonary artery and consequently the pulmonic valve closure [3].

Methods

The dataset was collected as part of two mass screening campaigns conducted in Northeast Brazil in July-August 2014 and June-July 2015 [4]. The data collection was approved by the 5192-Complexo Hospitalar HUOC/PROCAPE institutional review board, under the request of the Real Hospital Portugues de Beneficencia em Pernambuco. The target population included all participants presenting voluntarily for screening within the study period. Subjects younger than 21 years of age with a signed parental consent form (where appropriate) were included. A total of 2061 participants attended the 2014 and 2015 “Caravana do Coração” (Portuguese for “Caravan of the Heart”) campaigns, with 493 participants being excluded for not meeting the eligibility criteria. All participants completed a socio-demographic questionnaire and subsequently underwent a clinical examination (anamnesis and physical examination), a nursing assessment (physiological measurements), and cardiac investigations (chest radiography, electrocardiogram, and echocardiogram) when ordered as the result of an examination. A data quality assessment was performed: all entries were screened for incorrectly entered or measured values, inconsistent data, and outliers, which were deleted as appropriate. The resulting entries were then compiled to capture the socio-demographic and clinical variables used in our dataset.

Subsequently, an electronic auscultation was performed and audio samples from four auscultation points were typically collected. All samples were collected by the same operator for the duration of the screening, in a real clinical setting. The resulting phonocardiogram (PCG) audio files were assessed for signal quality and were segmented by cardiac physiologists. The signal quality assessment and segmentations were performed by a different expert in each campaign, and the murmur annotation was performed by the same annotator for the entire dataset (including both campaigns). Overall, 119 participants had recordings that did not meet the required signal quality standards, i.e., their recordings did not allow a reliable murmur characterization and description. These records were labeled as unidentifiable (Unknown) by the annotator.

The acquired audio records were automatically segmented using the three algorithms proposed in [5], [6] and [7]. These algorithms were only used to detect and identify the fundamental heart sounds (S1 and S2 sounds) and their corresponding boundaries. The aforementioned cardiac physiologists inspected the algorithms’ outputs on mutually exclusive data (as each expert screened only one of the two campaigns). Accordingly, each expert analyzed the automated annotations and whenever the annotator disagreed with the suggested automatic annotations, a manual annotation was required. In such cases, the annotator was instructed to annotate at least five complete heart cycles. Segmentation labels were retained for sections of heart sound recordings that were considered of high quality and representative by the cardiac physiologists. The remainder of the signal may include both low and high quality data. In this way, the users of the dataset may choose to use (or not to use) the suggested time windows, where the signal quality was manually inspected, and the automated labels were validated.

This methodology was applied to both the 2014 and 2015 screening campaigns. A cardiac physiologist manually classified and characterized murmur events blindly and independently of other clinical notes. The cardiac physiologist inspected each unfiltered heart sound signal by simultaneously listening to it and visualizing it in the Audacity software. Note that no segmentation data was used by the expert to detect and characterize murmur events. The sounds were recorded in an ambulatory environment. Various noise sources have been observed in our dataset, including stethoscope rubbing noise and speaking, crying, or laughing sounds in the background. On the other hand, this makes the dataset a representative sample of the real-world environments where automatic machine-based auscultation systems may operate. Further details regarding the methodology can be found in [4].

Data Description

There are four data file types in the dataset (per subject):

A wave recording file (binary .wav format) per auscultation location for each subject, which contains the heart sound data
A header file (text .hea format) describing the .wav file using the standard WFDB format [8]
A segmentation data file (text .tsv format) per auscultation location for each subject, which contains segmentation information regarding the start and end points of the fundamental heart sounds S1 and S2
A subject description text file (text .txt format) per subject, where the name of the file corresponds to the subject ID. Demographic data such as weight, height, sex, age group and pregnancy status as well as a detailed description of murmur events are provided in this file.
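As a minimal sketch of how one recording could be inspected, the snippet below reads the sampling frequency and duration of a .wav file with Python's standard wave module. It assumes the recordings are uncompressed PCM files that this module can open; the helper name recording_info and the path in the usage comment are hypothetical.

```python
import wave


def recording_info(path: str) -> dict:
    """Return basic properties of an uncompressed PCM .wav recording."""
    with wave.open(path, "rb") as wav_file:
        sampling_hz = wav_file.getframerate()
        num_frames = wav_file.getnframes()
        return {
            "sampling_frequency_hz": sampling_hz,
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_s": num_frames / sampling_hz,
        }


# Usage (hypothetical local path):
# info = recording_info("training_data/1234_AV.wav")
# print(info["sampling_frequency_hz"], info["duration_s"])
```

The sampling frequency read this way can be cross-checked against the value stated in the subject description file.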

In addition to the subject-wise description files, the file training_data.csv contains the overall information for all the records of the training set in the PhysioNet Challenge 2022; this file will not be provided with the Challenge data.

The filenames for the audio data, the header file, the segmentation annotation, and the subject description file names are formatted as ABCDE_XY.wav, ABCDE_XY.hea, ABCDE_XY.tsv and ABCDE.txt, respectively. Here ABCDE is a numeric subject identifier and XY is one of the following codes corresponding to the auscultation location where the PCG was collected on the body surface:

PV corresponds to the pulmonary valve point;
TV corresponds to the tricuspid valve point;
AV corresponds to the aortic valve point;
MV corresponds to the mitral valve point;
Phc corresponds to any other auscultation location.

If more than one recording exists per auscultation location, an integer index succeeds the auscultation location code in the file name, i.e. ABCDE_XY_n.wav, ABCDE_XY_n.hea and ABCDE_XY_n.tsv, where n is an integer (1, 2, …). Accordingly, each audio file has its own header and annotation segmentation file, but the subject description file ABCDE.txt is shared between all auscultation recordings of the same subject ID. Multi-location records or repeated records with the same subject ID base name (e.g., ABCDE as described above), have been recorded in the same session, sequentially. Therefore, the corresponding .wav files have different lengths and there is no time synchrony between them.
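The naming convention above can be checked mechanically. The following is a sketch, not part of the dataset tooling: the regular expression, the RecordingName type, and the function parse_recording_name are illustrative names, and the pattern assumes subject identifiers consist only of digits.

```python
import re
from typing import NamedTuple, Optional

# ABCDE_XY.ext or ABCDE_XY_n.ext, with XY one of the documented location codes.
RECORDING_NAME = re.compile(
    r"^(?P<subject>\d+)_(?P<location>AV|PV|TV|MV|Phc)"
    r"(?:_(?P<index>\d+))?\.(?P<ext>wav|hea|tsv)$"
)


class RecordingName(NamedTuple):
    subject_id: str
    location: str
    index: Optional[int]  # None when no integer index is present in the name
    extension: str


def parse_recording_name(filename: str) -> RecordingName:
    """Split a recording file name into subject ID, location, index and extension."""
    match = RECORDING_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized recording file name: {filename!r}")
    index = match.group("index")
    return RecordingName(
        subject_id=match.group("subject"),
        location=match.group("location"),
        index=int(index) if index is not None else None,
        extension=match.group("ext"),
    )


print(parse_recording_name("1234_AV.wav"))
print(parse_recording_name("1234_MV_2.tsv"))
```

Subject description files (ABCDE.txt) deliberately do not match this pattern, since they are shared by all recordings of a subject.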

All heart sound records were screened for presence of murmurs at each auscultation location. Each murmur was classified according to its timing (early-, mid-, and late- systolic/diastolic) [9], shape (crescendo, decrescendo, diamond, plateau), pitch (high, medium, low), quality (blowing, harsh, musical) [9], and grade (according to the Levine scale [10]). This information is provided in the subject description files (with .txt extension), with the following format:

The first line indicates the subject’s identifier, the number of recordings, and the sampling frequency (in Hz) separated by space delimiters.
The following lines, one per recording, contain information about the heart sound data files of the current subject ID, also separated by spaces: the location of the recording (AV, PV, TV, MV, or Phc), the name of the header file, the name of the .wav file, and the name of the segmentation file (.tsv).
The remaining lines start with a hash symbol (#) and provide the information described in the following table.

Subject Description File Variables
Variable	Description (data type)	Possible values
`Age`	Age category (string)	Neonate, Infant, Child, Adolescent, Young adult
`Sex`	Reported sex (string)	Female, Male
`Height`	Height in centimeters (number)	> 0
`Weight`	Weight in kilograms (number)	> 0
`Pregnancy status`	Did the subject report being pregnant at the time of the examination? (boolean)	True, False
`Additional ID`	The second record identifier for subjects that participated in both screening campaigns (string)	Subject identifier
`Campaign`	Campaign attended by the subject (string)	CC2014, CC2015
`Murmur`	Indicates if a murmur is present, absent or unidentifiable for the annotator (string)	Present, Absent, Unknown
`Murmur locations`	Auscultation locations where at least one murmur has been observed (string)	Any combination of the following abbreviations separated by plus (+) signs: PV, TV, AV, MV, and Phc
`Most audible location`	Auscultation location where murmurs sounded more intense for the annotator (string)	PV, TV, AV, MV, Phc
`Systolic murmur timing`	Timing of the murmur in the systolic period (string)	Early-systolic, Holosystolic, Late-systolic, Mid-systolic
`Systolic murmur shape`	Shape of the murmur in the systolic period (string)	Crescendo, Decrescendo, Diamond, Plateau
`Systolic murmur pitch`	Pitch of the murmur in the systolic period (string)	Low, Medium, High
`Systolic murmur grading`	Grading of the murmur in the systolic period according to the Levine scale (string)	I/VI, II/VI, III/VI
`Systolic murmur quality`	Quality of the murmur in the systolic period (string)	Blowing, Harsh, Musical
`Diastolic murmur timing`	Timing of the murmur in the diastolic period (string)	Early-diastolic, Holodiastolic, Mid-diastolic
`Diastolic murmur shape`	Shape of the murmur in the diastolic period (string)	Decrescendo, Plateau
`Diastolic murmur pitch`	Pitch of the murmur in the diastolic period (string)	Low, Medium, High
`Diastolic murmur grading`	Grading of the murmur in the diastolic period on a scale from I to IV (string)	I/IV, II/IV, III/IV
`Diastolic murmur quality`	Quality of the murmur in the diastolic period (string)	Blowing, Harsh

Example 1: The subject description file 1234.txt contains information about the subject with ID number 1234, as shown below. There are a total of four wave files for this subject, acquired from the locations AV, PV, TV and MV, all sampled at 4000 Hz. Each .wav file has its segmentation information stored in a separate .tsv file with the same base name as the corresponding .wav file.

1234 4 4000
AV 1234_AV.hea 1234_AV.wav 1234_AV.tsv
PV 1234_PV.hea 1234_PV.wav 1234_PV.tsv
TV 1234_TV.hea 1234_TV.wav 1234_TV.tsv
MV 1234_MV.hea 1234_MV.wav 1234_MV.tsv
#Age: Child
#Sex: Female
#Height: 123.0
#Weight: 13.5
#Pregnancy status: False
#Murmur: Present
#Murmur locations: AV+MV+PV+TV
#Most audible location: TV
#Systolic murmur timing: Holosystolic
#Systolic murmur shape: Diamond
#Systolic murmur grading: III/VI
#Systolic murmur pitch: High
#Systolic murmur quality: Harsh
#Diastolic murmur timing: nan
#Diastolic murmur shape: nan
#Diastolic murmur grading: nan
#Diastolic murmur pitch: nan
#Diastolic murmur quality: nan
#Campaign: CC2014
#Additional ID: nan
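A file with the layout of Example 1 could be read along the following lines. This is a sketch under the assumption that every file follows the layout described above (one header line, one line per recording, then #Tag: value lines); the function name parse_subject_description and the dictionary keys are illustrative, and nan values are mapped to None.

```python
from typing import Dict, List, Optional, Tuple


def parse_subject_description(
    text: str,
) -> Tuple[dict, List[dict], Dict[str, Optional[str]]]:
    """Parse the text of a subject description file into header, recordings, tags."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # First line: subject ID, number of recordings, sampling frequency in Hz.
    subject_id, num_recordings, sampling_hz = lines[0].split()
    header = {
        "subject_id": subject_id,
        "num_recordings": int(num_recordings),
        "sampling_frequency_hz": int(sampling_hz),
    }

    # Next lines: one per recording (location, header file, wave file, segmentation file).
    count = header["num_recordings"]
    recordings = []
    for line in lines[1:1 + count]:
        location, hea, wav, tsv = line.split()
        recordings.append({"location": location, "hea": hea, "wav": wav, "tsv": tsv})

    # Remaining lines: "#Tag: value"; the string "nan" marks a missing value.
    tags: Dict[str, Optional[str]] = {}
    for line in lines[1 + count:]:
        if not line.startswith("#"):
            continue
        key, _, value = line[1:].partition(":")
        value = value.strip()
        tags[key.strip()] = None if value == "nan" else value
    return header, recordings, tags


EXAMPLE = """1234 2 4000
AV 1234_AV.hea 1234_AV.wav 1234_AV.tsv
PV 1234_PV.hea 1234_PV.wav 1234_PV.tsv
#Age: Child
#Murmur: Present
#Diastolic murmur timing: nan
"""

header, recordings, tags = parse_subject_description(EXAMPLE)
print(header, [r["location"] for r in recordings], tags)
```

All tag values are kept as strings here; converting Height and Weight to numbers or Pregnancy status to a boolean is left to the caller.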

The segmentation annotation file (plain text, with .tsv extension) is composed of three columns: the first column is the time instant (in seconds) at which the segment starts; the second column is the time instant (in seconds) at which the segment ends; the third column is an integer label identifying the type of segment, with the following convention:

The S1 wave is identified by the integer 1.
The systolic period is identified by the integer 2.
The S2 wave is identified by the integer 3.
The diastolic period is identified by the integer 4.
The unannotated segments of the signal are identified by the integer 0.
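The three-column layout and the label convention above can be decoded as in the sketch below. The names parse_segmentation, total_duration_by_label, and SEGMENT_LABELS are illustrative, the timestamps in EXAMPLE_TSV are made up for demonstration, and the code assumes the files have no header row.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Integer codes documented for the third column of the .tsv files.
SEGMENT_LABELS = {0: "unannotated", 1: "S1", 2: "systole", 3: "S2", 4: "diastole"}


def parse_segmentation(text: str) -> List[Tuple[float, float, str]]:
    """Return (start_s, end_s, label_name) for every row of a segmentation file."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue
        start, end, label = line.split()
        segments.append((float(start), float(end), SEGMENT_LABELS[int(label)]))
    return segments


def total_duration_by_label(
    segments: List[Tuple[float, float, str]],
) -> Dict[str, float]:
    """Sum the duration in seconds of all segments sharing a label."""
    totals: Dict[str, float] = defaultdict(float)
    for start, end, name in segments:
        totals[name] += end - start
    return dict(totals)


# Made-up rows, for illustration only.
EXAMPLE_TSV = (
    "0.0\t0.5\t0\n"
    "0.5\t0.625\t1\n"
    "0.625\t0.75\t2\n"
    "0.75\t0.875\t3\n"
    "0.875\t1.25\t4\n"
)

segments = parse_segmentation(EXAMPLE_TSV)
print(total_duration_by_label(segments))
```

Rows labeled 0 mark the parts of the signal outside the manually validated windows, so a user who wants only validated data can filter on the label name.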

The file training_data.csv contains the same information as the subject-wise .txt files (with the same tags and values), combined in column-wise format for all subjects. Each row of this file corresponds to one subject ID and each column corresponds to one of the subject description tags; this file will not be provided with the Challenge data.
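A quick way to summarize such a table is to tally one column. The sketch below assumes the column header is spelled Murmur, matching the tag name; the Patient ID column and the rows in SAMPLE_CSV are hypothetical stand-ins for the real file.

```python
import csv
import io
from collections import Counter


def count_murmur_labels(csv_text: str, column: str = "Murmur") -> Counter:
    """Count how many subjects (rows) carry each value of the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Hypothetical rows; the real file has one row per subject and more columns.
SAMPLE_CSV = (
    "Patient ID,Murmur\n"
    "1001,Present\n"
    "1002,Absent\n"
    "1003,Absent\n"
    "1004,Unknown\n"
)

print(count_murmur_labels(SAMPLE_CSV))
```

To run it on the real file, read the file contents as text and pass them to count_murmur_labels.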

Usage Notes

The dataset is organized in three sets: training, validation, and test sets.

These sets were selected through stratified random sampling over the following classes: normal subjects (murmur absent), abnormal subjects (murmur present), and unsure (murmur presence or absence unknown to the human expert annotator).

We have shared 60% of the dataset publicly as a training set for the George B. Moody PhysioNet Challenge 2022 [11].

We are retaining the remaining 40% of the data privately to score models presented at the PhysioNet Challenge 2022. These data will be released in this project after the end of the Challenge.
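For readers who want to build a comparable split of the public data, stratified random sampling at the subject level can be sketched as follows. This is an illustration of the general technique, not the procedure used by the authors; the function name stratified_split, the fixed seed, and the toy labels are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(
    labels: Dict[str, str], train_fraction: float = 0.6, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split subject IDs so each murmur class keeps roughly the same proportion."""
    rng = random.Random(seed)
    by_class: Dict[str, List[str]] = defaultdict(list)
    for subject_id, label in sorted(labels.items()):
        by_class[label].append(subject_id)

    train, held_out = [], []
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)  # random order within the class
        cut = round(train_fraction * len(ids))
        train.extend(ids[:cut])
        held_out.extend(ids[cut:])
    return train, held_out


# Toy labels: 10 Present, 20 Absent, 5 Unknown subjects.
toy_labels = {f"P{i}": "Present" for i in range(10)}
toy_labels.update({f"A{i}": "Absent" for i in range(20)})
toy_labels.update({f"U{i}": "Unknown" for i in range(5)})

train_ids, held_out_ids = stratified_split(toy_labels)
print(len(train_ids), len(held_out_ids))
```

Sampling within each class separately is what keeps the rarer Present and Unknown classes represented in both parts of the split.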

In the subject description files, murmurs are described and categorized per subject following the same terminology used by physicians, namely their timing, shape, pitch, grading, and quality. In addition, the auscultation locations where the murmur is present, as well as the auscultation location where the murmur is heard most intensely, are also reported in these files.

A short description of the variables extracted from the physical examination and the heart sound auscultation, and of the corresponding tags in the description text files, is provided below. Note that a description of the murmur is provided in the subject description files only when a murmur is detected in at least one of the subject’s recordings. For subjects without a detected murmur, no data is available for those variables, and each is set to a single nan (not a number) symbol.

In the tag Murmur, each value is of string type and with one of the following outcomes:

Present: Murmur waves were detected in at least one heart sound recording
Absent: Murmur waves were not detected in any heart sound recording
Unknown: The presence or absence of murmurs was unclear for the annotator

The tag Murmur locations is a string formed by concatenating acronyms, each representing an auscultation location where at least one murmur wave has been observed. If more than one location needs to be reported, the locations are separated by plus (+) signs. The acronyms used to identify the auscultation locations are: PV (pulmonary valve), TV (tricuspid valve), AV (aortic valve), MV (mitral valve), and Phc (any other auscultation location).

The tag Most audible location is also a string: a single acronym that identifies the auscultation location where murmur waves were heard most intensely. The acronyms used are the same as in the Murmur locations tag.
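Splitting the plus-separated location string is straightforward; a small sketch is given below. The function name parse_murmur_locations is illustrative, and treating a missing value (None or the string nan) as an empty list is an assumption about how callers would want to handle subjects without murmurs.

```python
from typing import List, Optional

VALID_LOCATIONS = ("PV", "TV", "AV", "MV", "Phc")


def parse_murmur_locations(value: Optional[str]) -> List[str]:
    """Turn a string such as 'AV+MV+PV+TV' into a list of location codes."""
    if value is None or value == "nan":
        return []
    locations = value.split("+")
    unknown = [loc for loc in locations if loc not in VALID_LOCATIONS]
    if unknown:
        raise ValueError(f"unexpected auscultation location codes: {unknown}")
    return locations


print(parse_murmur_locations("AV+MV+PV+TV"))
print(parse_murmur_locations("nan"))
```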

The tag Systolic murmur timing is a string and describes the location of the murmur wave in the systolic period. The outcomes are one of the following:

Early-systolic: a murmur has been observed at the beginning of the systolic period
Mid-systolic: a murmur has been observed at the middle of the systolic period
Late-systolic: a murmur has been observed at the ending of the systolic period
Holosystolic: a murmur has been observed over the whole systolic period

The tag Systolic murmur shape is a string and describes the shape of the murmur wave that has been observed in the systolic period. The shape of a murmur can be viewed as a function of murmur intensity over time. The possible outcomes for this variable are:

Crescendo: the amplitude of the murmur wave increases over time
Decrescendo: the amplitude of the murmur wave decreases over time
Diamond: the amplitude of the murmur wave first increases for some time but then decreases for the rest of the time period
Plateau: the amplitude of the murmur wave stays approximately constant over the whole period

The tag Systolic murmur pitch is a string and is related to the pressure gradient in the heart chambers. In general, the higher the pitch, the higher the pressure gradient in the corresponding heart chamber. For example, in aortic stenosis, a large pressure gradient exists between the left ventricle and the aorta. As a result, murmurs generated by aortic stenosis generally have a high pitch. The possible outcomes are:

High
Medium
Low

The tag Systolic murmur grading is a string and describes the grade of murmur waves observed in the systolic period. This feature is highly correlated with the severity of the murmur: the higher the grade, the worse the patient's prognosis and outcome. Since not all subjects have auscultation sounds recorded from all four main auscultation locations, the approach adopted by the expert annotator to provide grading annotations is as follows:

Grade I/VI: if barely audible and not heard/present or not recorded in all auscultation locations
Grade II/VI: if soft, but easily heard in all auscultation locations
Grade III/VI: if moderately loud or loud. In this dataset, grade III/VI denotes all grade III/VI and above (IV/VI, V/VI, and VI/VI)

Accordingly, the grade annotations can diverge from the original definition of murmur grading, when applied to cases for which not all the auscultation locations are available. In such cases, murmurs were classified by default as grade I/VI. Moreover, the cases classified as grade III/VI, actually include murmurs that could potentially be of grade III/VI or higher, since discrimination among grades III/VI, IV/VI, V/VI, and VI/VI is associated with palpable murmurs, also known as thrills [10], which can only be assessed via in-person physical examination.

The tag Systolic murmur quality is a string and describes the quality of murmur waves observed in the systolic period. It relates to the presence of harmonics and overtones. The possible outcomes of this variable are:

Blowing
Harsh
Musical

The tag Diastolic murmur timing is a string and describes the location of the murmur wave in the diastolic period. The possible outcomes of this variable are:

Early-diastolic: a murmur has been observed at the beginning of the diastolic period
Mid-diastolic: a murmur has been observed at the middle of the diastolic period
Holodiastolic: a murmur has been observed over the whole diastolic period

The tag Diastolic murmur shape is a string and describes the shape of the murmur wave that has been observed in the diastolic period. In this database, only murmurs with decrescendo and plateau shapes have been observed in the diastolic period.

The tag Diastolic murmur grading is a string and describes the grade of murmur waves heard during the diastolic period. In contrast to systolic murmurs, diastolic murmurs do not follow the Levine grading scale [10]; they are graded from I to IV instead of I to VI.

Grade I/IV: if barely audible and not heard/present or not recorded in all auscultation locations
Grade II/IV: if soft, but easily heard in all auscultation locations
Grade III/IV: if moderately loud or loud

Grade IV/IV is associated with palpable murmurs, also known as thrills [10], which can only be assessed via in-person physical examination. In this database, grades III/IV and IV/IV are merged.
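For modeling, the systolic and diastolic grading strings can be mapped to ordinal integers, as in the sketch below. The function name grade_to_ordinal and the choice of 1 to 3 as codes are illustrative; note that the top code in each table stands for "grade III or above" because the higher grades are merged in this dataset.

```python
from typing import Optional

# Ordinal codes; 3 means "grade III or above" since higher grades are merged.
SYSTOLIC_GRADES = {"I/VI": 1, "II/VI": 2, "III/VI": 3}
DIASTOLIC_GRADES = {"I/IV": 1, "II/IV": 2, "III/IV": 3}


def grade_to_ordinal(value: Optional[str], period: str) -> Optional[int]:
    """Map a grading string to an ordinal code; missing values (nan) give None."""
    if value is None or value == "nan":
        return None
    tables = {"systolic": SYSTOLIC_GRADES, "diastolic": DIASTOLIC_GRADES}
    if period not in tables:
        raise ValueError(f"period must be 'systolic' or 'diastolic', got {period!r}")
    if value not in tables[period]:
        raise ValueError(f"unexpected {period} grade: {value!r}")
    return tables[period][value]


print(grade_to_ordinal("III/VI", "systolic"))
print(grade_to_ordinal("nan", "diastolic"))
```

The two scales have different denominators, so the ordinal codes of systolic and diastolic murmurs are not directly comparable with each other.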

The tag Diastolic murmur pitch is a string and describes the pitch of murmur waves observed in the diastolic period. The possible outcomes of this variable are:

High
Medium
Low

The tag Diastolic murmur quality is a string and describes the murmur's quality feature from waves heard in the diastolic period. In this dataset, only murmurs with Blowing and Harsh qualities have been heard in the diastolic period.

The tag Sex is a string and displays the subject’s reported gender. The possible values for this variable are Male or Female, according to the reported gender of the subject at the time of data acquisition. Subjects participating in both CC2014 and CC2015 screening campaigns were consistent in the reported gender and age (taking into account the time/age gap between the two campaigns).

The tag Age is a string and corresponds to the age category of the subject according to the National Institute of Child Health and Human Development (NICHD) pediatric terminology [12]. The possible values of this variable are:

Neonate: birth to 27 days old
Infant: 28 days old to 1 year old
Child: 1 to 11 years old
Adolescent: 12 to 18 years old
Young Adult: 19 to 21 years old
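The age categories above can be written as a lookup on age in days, which makes the boundaries explicit. The sketch below makes two assumptions that the list leaves open: a year is taken as 365.25 days, and each upper bound is read in completed years (for example, Child covers ages from 1 year up to, but not including, 12 years). The function name age_category is illustrative; the dataset itself provides only the category, not the numeric age.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length


def age_category(age_days: float) -> str:
    """Return the age category for an age given in days."""
    if age_days < 0:
        raise ValueError("age cannot be negative")
    age_years = age_days / DAYS_PER_YEAR
    if age_days < 28:
        return "Neonate"      # birth to 27 days
    if age_years < 1:
        return "Infant"       # 28 days to under 1 year
    if age_years < 12:
        return "Child"        # 1 to 11 completed years
    if age_years < 19:
        return "Adolescent"   # 12 to 18 completed years
    if age_years < 22:
        return "Young Adult"  # 19 to 21 completed years
    raise ValueError("age is outside the range covered by the dataset")


print(age_category(10), age_category(200), age_category(5 * DAYS_PER_YEAR))
```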

The tag Height is a number and corresponds to the subject’s height in centimeters (cm).

The tag Weight is a number and corresponds to the subject's weight in kilograms (kg).

The tag Pregnancy status is a boolean variable (True/False) and explicitly identifies the subjects that reported being pregnant at the time of the screening campaign.

Since some subjects attended both screening campaigns under different identifiers, the tag Additional ID provides the second identifier used for the subject.

Since the current dataset is used in the 2022 PhysioNet Challenge on heart murmur detection, all data from the same subject (regardless of the screening campaign) are placed in exactly one of the training, validation, or test sets to avoid bias. The validation and test sets are not provided in the current release, to comply with the Challenge requirements.
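Users who create their own cross-validation folds may want to apply the same rule. One way to do so, sketched below, is to merge identifiers linked through the Additional ID tag into groups and then assign whole groups to folds. The function name group_linked_subjects and the toy identifiers are hypothetical; the input maps each subject ID to its Additional ID, or to None when the tag is nan.

```python
from typing import Dict, List, Optional


def group_linked_subjects(additional_ids: Dict[str, Optional[str]]) -> List[List[str]]:
    """Group subject IDs that refer to the same person via the Additional ID tag."""
    parent: Dict[str, str] = {}

    def find(item: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for subject_id, other_id in additional_ids.items():
        root = find(subject_id)
        if other_id is not None:
            parent[find(other_id)] = root  # merge the two identifiers

    groups: Dict[str, List[str]] = {}
    for subject_id in list(parent):
        groups.setdefault(find(subject_id), []).append(subject_id)
    return sorted(sorted(members) for members in groups.values())


# Toy example: 1001 and 2002 are the same person; 3003 attended one campaign.
toy = {"1001": "2002", "2002": "1001", "3003": None}
print(group_linked_subjects(toy))
```

Each returned group can then be treated as a single unit when splitting, so that no person contributes data to more than one fold.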

The tag Campaign is a string that identifies the screening campaign that the subject attended. The acronyms used are the following:

CC2014: the 2014 screening campaign
CC2015: the 2015 screening campaign

Note that the two screening campaigns are one year apart. Taking into account the subjects' ages, an open question that requires further study is whether any long-term dependencies can be found between the data acquired from the same subjects in the two campaigns. This is left as an open discussion for further evaluation by the scientific community.

Any research/publication based on this database is requested to cite the CirCor DigiScope Phonocardiogram Dataset project on PhysioNet and its corresponding article [4].

Release Notes

Version 1.0.1 made the following changes. The initial 30% of the dataset is augmented with another 30% of the dataset to create the training set for the PhysioNet 2022 Challenge on Heart Murmur Detection [11]. Several formatting changes have also been made to the data. The overall data description file training_data.csv is updated. In addition, subject-wise description text files (.txt) are added per subject. The per-subject segmentation files (previously with .txt extension) are renamed to .tsv files. Further examples and details are added regarding the data format.

Ethics

The data collection was approved by the 5192-Complexo Hospitalar HUOC/PROCAPE institutional review board, under the request of the Real Hospital Portugues de Beneficencia em Pernambuco, Brazil.

Acknowledgements

This work is a result of the Project DigiScope2 (POCI-01-0145-FEDER-029200 - PTDC/CCI-COM/29200/2017), funded by Fundo Europeu de Desenvolvimento Regional (FEDER), through Programa Operacional Competitividade e Internacionalização (POCI), and by national funds, through Fundação para a Ciência e Tecnologia (FCT).

Conflicts of Interest

The authors declare that there are no conflicts of interest.

References

[1] P. Libby, R. Bonow, D. Mann, and D. Zipes, Braunwald’s Heart Disease: A Textbook of Cardiovascular Medicine. 8th edition. Elsevier Science, 2007.
[2] J. Soler-Soler and E. Galve, “Worldwide perspective of valve disease,” Heart, vol. 83, no. 6, pp. 721–725, 2000. [Online]. Available: https://dx.doi.org/10.1136/heart.83.6.721
[3] S. Dornbush and A. E. Turnquest, “Physiology, heart sounds,” in StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing, 2019. [Online]. Available: https://www.ncbi.nlm.nih.gov/books/NBK541010/
[4] J. H. Oliveira, F. Renna, P. Costa, D. Nogueira, C. Oliveira, C. Ferreira, A. Jorge, S. Mattos, T. Hatem, T. Tavares, A. Elola, A. Rad, R. Sameni, G. D. Clifford, & M. T. Coimbra (2021). The CirCor DigiScope Dataset: From Murmur Detection to Murmur Classification. IEEE Journal of Biomedical and Health Informatics, https://doi.org/10.1109/JBHI.2021.3137048
[5] C. Liu, D. Springer, Q. Li, et al., “An open access database for the evaluation of heart sound algorithms,” Physiological Measurement, vol. 37, no. 12, p. 2181, 2016. [Online]. Available: https://doi.org/10.1088/0967-3334/37/12/2181
[6] J. Oliveira, F. Renna, T. Mantadelis, and M. T. Coimbra, “Adaptive sojourn time HSMM for heart sound segmentation,” IEEE J. Biomed. Health Informatics, vol. 23, no. 2, pp. 642–649, 2019. [Online]. Available: https://doi.org/10.1109/JBHI.2018.2841197
[7] F. Renna, J. H. Oliveira, and M. T. Coimbra, “Deep convolutional neural networks for heart sound segmentation,” IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 6, pp. 2435–2445, 2019. [Online]. Available: https://doi.org/10.1109/JBHI.2019.2894222
[8] G. Moody, T. Pollard, and B. Moody, “WFDB Software Package (version 10.6.2),” PhysioNet, 2021. https://doi.org/10.13026/zzpx-h016
[9] S. J. Owen and K. Wong, “Cardiac auscultation via simulation: a survey of the approach of UK medical schools,” BMC Research Notes, vol. 8, pp. 427–427, Sep 2015, 26358413 [pmid]. [Online]. Available: https://doi.org/10.1186/s13104-015-1419-y
[10] A. Freeman and S. Levine, “The clinical significance of the systolic murmur. A study of 1000 consecutive “non-cardiac” cases,” Ann Intern Med, vol. 6, pp. 1371–1385, 1933. [Online]. Available: https://doi.org/10.7326/0003-4819-6-11-1371
[11] The George B. Moody PhysioNet Challenge 2022: Heart Murmur Detection from Phonocardiogram Recordings. [Online]. Available: https://moody-challenge.physionet.org/2022/
[12] K. Williams, D. Thomson, I. Seto, D. Contopoulos-Ioannidis et al., “Standard 6: Age groups for pediatric trials,” Pediatrics, vol. 129 Suppl 3, pp. S153–60, 06 2012. [Online]. Available: http://dx.doi.org/10.1542/peds.2012-0055I

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Open Data Commons Attribution License v1.0

Discovery

DOI (version 1.0.1):
https://doi.org/10.13026/7bkn-d780

DOI (latest version):
https://doi.org/10.13026/g02k-a047

Topics:
signal processing, murmur pitch, george b moody physionet challenge 2022, murmur grading, murmur location, murmur timing, phonocardiogram, pregnant, murmur shape, pediatric, murmur detection, murmur intensity, murmur quality

Versions

1.0.0 - Jan. 11, 2022
1.0.1 - Jan. 28, 2022
1.0.2 - April 29, 2022
1.0.3 - May 10, 2022

Files

Total uncompressed size: 558.9 MB.

Access the files

Download the ZIP file (449.4 MB)

Download the files using your terminal:

wget -r -N -c -np https://physionet.org/files/circor-heart-sound/1.0.1/

Download the files using AWS command line tools:

aws s3 sync s3://physionet-open/circor-heart-sound/1.0.1/ DESTINATION

Folder Navigation:

Name	Size	Modified
training_data
LICENSE.txt	19.9 KB	2022-01-28
RECORDS	71.1 KB	2022-01-28
SHA256SUMS.txt	934.9 KB	2022-01-28
training_data.csv	106.9 KB	2022-01-28
