Challenge Open Access
Heart Murmur Detection from Phonocardiogram Recordings: The George B. Moody PhysioNet Challenge 2022
Matthew Reyna, Yashar Kiarashi, Andoni Elola, Jorge Oliveira, Francesco Renna, Annie Gu, Erick Andres Perez Alday, Nadi Sadr, Sandra Mattos, Miguel Coimbra, Reza Sameni, Ali Bahrami Rad, Zuzana Koscova, Gari Clifford
Published: Sept. 28, 2023. Version: 1.0.0
When using this resource, please cite:
Reyna, M., Kiarashi, Y., Elola, A., Oliveira, J., Renna, F., Gu, A., Perez Alday, E. A., Sadr, N., Mattos, S., Coimbra, M., Sameni, R., Bahrami Rad, A., Koscova, Z., & Clifford, G. (2023). Heart Murmur Detection from Phonocardiogram Recordings: The George B. Moody PhysioNet Challenge 2022 (version 1.0.0). PhysioNet. https://doi.org/10.13026/t49p-5v35.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Congenital heart diseases affect about 1% of newborns, representing an important morbidity and mortality factor for several severe conditions, including advanced heart failure [1]. In a 2019 survey, it was estimated that congenital heart diseases affect over 500,000 children in East Africa [2], and about 8 in every 1000 live births [3]. Acquired heart diseases include rheumatic fever and Kawasaki disease, the former being a serious public health problem in developing regions, e.g., rural Brazil [4]. Several regions of developing countries have difficulties in diagnosing and treating both congenital and acquired heart conditions in children. This is mainly due to the lack of infrastructure and cardiology specialists in geographically large areas and difficulty in accessing health services. In addition, the current COVID-19 pandemic poses new difficulties in the clinical evaluation of patients by delaying important in-person patient-doctor contacts, negatively impacting screening and monitoring activities.
A non-invasive assessment of the mechanical function of the heart, performed at point-of-care settings, can provide early information regarding congenital and acquired heart diseases in children. The lack of early diagnoses of these conditions represents a major public health problem, especially in underprivileged countries with high birth rates [5, 6, 7]. In particular, cardiac auscultation and the analysis of the phonocardiogram (PCG) can unveil fundamental clinical information regarding heart malfunctioning caused by congenital and acquired heart disease in pediatric populations. This is achieved by detecting abnormal sound waves, or heart murmurs, in the PCG signal. Murmurs are abnormal waves generated by turbulent blood flow in cardiac and vascular structures. They are closely associated with specific diseases such as septal defects, failure of ductus arteriosus closure in newborns, and defective cardiac valves.
Following the 2016 Challenge, which focused on classifying normal vs. abnormal heart sounds from a single short recording from a single precordial location [8, 9], this year’s Challenge is devoted to detecting the presence or absence of murmurs from multiple heart sound recordings from multiple auscultation locations, as well as detecting the clinical outcomes.
The goal of the Challenge is to identify the presence, absence, or unclear cases of murmurs and the normal vs. abnormal clinical outcomes from heart sound recordings collected from multiple auscultation locations on the body using a digital stethoscope.
We ask participants to design and implement a working, open-source algorithm that, based only on the provided recordings and routine demographic data, can determine whether any murmurs are audible from a patient’s recordings. Also, they need to identify the clinical outcomes from such recordings and demographic data. We have designed a scoring function that reflects the burden of algorithmic pre-screening, expert screening, treatment, and missed diagnoses. The team with the lowest score wins the Challenge.
To participate in the Challenge, register your team by providing the full names, affiliations, and official email addresses of your entire team before you submit your algorithm. The details of all authors must be exactly the same as the details in your abstract submission to Computing in Cardiology. You may update your author list by completing this form again (read the form for details), but changes to your authors must not contravene the rules of the Challenge.
The Challenge data contain one or more heart sound recordings for 1568 patients as well as routine demographic information about the patients from whom the recordings were taken. The Challenge labels consist of two types:
- Murmur-related labels indicate whether an expert annotator detected the presence or absence of a murmur in a patient from the recordings or whether the annotator was unsure about the presence or absence of a murmur.
- Outcome-related labels indicate the normal or abnormal clinical outcome diagnosed by a medical expert.
The Challenge data come from the CirCor DigiScope Phonocardiogram Dataset v1.0.3. They were collected from a pediatric population during two mass screening campaigns conducted in Northeast Brazil in July-August 2014 and June-July 2015. The data collection was approved by the 5192-Complexo Hospitalar HUOC/PROCAPE institutional review board, under the request of the Real Hospital Portugues de Beneficencia em Pernambuco. The target population was individuals who were 21 years old or younger who presented voluntarily for screening with a signed parental or legal guardian consent form. All participants completed a sociodemographic questionnaire and subsequently underwent a clinical examination, a nursing assessment, and cardiac investigations. A detailed description of the data can be found in [10].
Each patient in the Challenge data has one or more recordings from one or more prominent auscultation locations: pulmonary valve (PV), aortic valve (AV), mitral valve (MV), tricuspid valve (TV), and other (Phc). The recordings were collected sequentially (not simultaneously) from different auscultation locations using a digital stethoscope. The number, location, and duration of the recordings vary between patients.
The Challenge data are organized into three distinct sets: training, validation, and test sets. We have publicly released 60% of the dataset as the training set of the 2022 Challenge, and we have retained the remaining 40% as hidden data for validation and test purposes. The hidden data will be used to evaluate the entries to the 2022 Challenge and will be released only after the end of the 2022 Challenge.
To create the training, validation, and test sets, the original dataset was partitioned patient-wise (no patient belonged to multiple sets) through stratified random sampling to provide similar proportions of patients with murmurs (present), patients without murmurs (absent), and unknown cases across the different sets. The training set contains 3163 recordings from 942 patients.
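The patient-wise stratified sampling described above can be sketched as follows. This is a minimal illustration, not the organizers' partitioning code: the function name `stratified_patient_split`, the seed, and the rounding rule are assumptions, and the sketch only separates a public training set from the held-out data rather than further dividing the held-out data into validation and test sets.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_patient_split(
    labels: Dict[str, str], train_fraction: float = 0.6, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split patient IDs so each murmur label keeps a similar share in both sets.

    `labels` maps a patient ID to its murmur label (Present, Absent, Unknown).
    Each patient appears in exactly one of the two returned lists.
    """
    by_label = defaultdict(list)
    for patient_id, label in sorted(labels.items()):
        by_label[label].append(patient_id)

    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    train, held_out = [], []
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        cut = round(train_fraction * len(ids))
        train.extend(ids[:cut])
        held_out.extend(ids[cut:])
    return train, held_out


# Made-up cohort: 50 Absent, 30 Present, 20 Unknown patients.
example_labels = {f"A{i}": "Absent" for i in range(50)}
example_labels.update({f"P{i}": "Present" for i in range(30)})
example_labels.update({f"U{i}": "Unknown" for i in range(20)})
train_ids, held_out_ids = stratified_patient_split(example_labels)
```

Splitting within each label group, rather than over the whole cohort, is what keeps the class proportions similar across the sets.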
The public training set contains heart sound recordings, routine demographic information, murmur-related labels (presence, absence, or unknown), outcome-related labels (normal or abnormal), annotations of the murmur characteristics (location, timing, shape, pitch, quality, and grade), and heart sound segmentations. The private validation and test sets only contain heart sound recordings and demographic information.
The following Data Table shows the available information in the training, validation, and test sets of the Challenge data. The label variables are in bold. A detailed description of this table can be found in the Data Description.
|Variable||Short description (format)||Possible values||Training||Validation||Test|
|Age||Age category (string)||Neonate
|Sex||Reported sex (string)||Female
|Height||Height in centimeters (number)||✓||✓||✓|
|Weight||Weight in kilograms (number)||✓||✓||✓|
|Pregnancy status||Did the patient report being pregnant during screening? (Boolean)||✓||✓||✓|
|Additional ID||The second identifier for patients that participated in both screening campaigns (string)||✓|
|Campaign||Campaign attended by the patient (string)||CC2014
|Murmur||Indicates if a murmur is present, absent or unidentifiable for the annotator; the Challenge label (string)||Present
|Murmur locations||Auscultation locations where at least one murmur has been observed (string)||Any combination of the following abbreviations, concatenated with plus (+) signs: PV, TV, AV, MV, and Phc||✓|
|Most audible location||Auscultation location where murmurs sounded more intense for the annotator (string)||PV
|Systolic murmur timing||The timing of the murmur within the systolic period (string)||Early-systolic
|Systolic murmur shape||Shape of the murmur in the systolic period (string)||Crescendo
|Systolic murmur pitch||Pitch of the murmur in the systolic period (string)||Low
|Systolic murmur grading||Grading of the murmur in the systolic period according to the Levine scale [11] with some modification (string)||I/VI
|Systolic murmur quality||Quality of the murmur in the systolic period (string)||Blowing
|Diastolic murmur timing||The timing of the murmur within the diastolic period (string)||Early-diastolic
|Diastolic murmur shape||Shape of the murmur in the diastolic period (string)||Decrescendo
|Diastolic murmur pitch||Pitch of the murmur in the diastolic period (string)||Low
|Diastolic murmur grading||Grading of the murmur in the diastolic period (string)||I/IV
|Diastolic murmur quality||Quality of the murmur in the diastolic period (string)||Blowing
|Outcome||Indicates if the clinical outcome diagnosed by the medical expert is normal or abnormal; the Challenge label (string)||Normal
Note 1: The participants are welcome and encouraged to use external PCG or audio datasets, including the 2016 PhysioNet Challenge data [8, 9] and the PhysioNet EPHNOGRAM dataset, for training their models or for transfer learning.
Note 2: The participants are encouraged to relabel the data and share new labels with us for further investigation. We may consider providing consensus labels at some point.
There are four data file types in the training set:
- A wave recording file (binary .wav format) per auscultation location for each subject, which contains the heart sound data.
- A header file (text .hea format) describing the .wav file using the standard WFDB format.
- A segmentation data file (text .tsv format) per auscultation location for all subjects, which contains segmentation information regarding the start and end points of the fundamental heart sounds S1 and S2.
- A subject description text file (text .txt format) per subject, where the name of the file corresponds to the subject ID. Demographic data such as age, sex, height, and weight as well as the murmur and clinical outcomes and a detailed description of any murmur events are provided in this file.
The validation and test datasets have the same structure, but the .txt file does not provide information about murmurs or outcomes, and the .tsv segmentation files are not provided.
The filenames for the audio data, the header file, the segmentation annotation, and the subject description are formatted as ABCDE_XY.wav, ABCDE_XY.hea, ABCDE_XY.tsv, and ABCDE.txt, respectively. Here, ABCDE is a numeric subject identifier and XY is one of the following codes corresponding to the auscultation location where the PCG was collected on the body surface:
- PV corresponds to the pulmonary valve point;
- TV corresponds to the tricuspid valve point;
- AV corresponds to the aortic valve point;
- MV corresponds to the mitral valve point;
- Phc corresponds to any other auscultation location.
If more than one recording exists per auscultation location, an integer index follows the auscultation location code in the file name, i.e., ABCDE_XY_n.wav, ABCDE_XY_n.hea, and ABCDE_XY_n.tsv, where n is an integer (1, 2, …). Accordingly, each audio file has its own header and annotation segmentation file, but the subject description file ABCDE.txt is shared between all auscultation recordings of the same subject ID. These audio recordings were recorded sequentially, not simultaneously, and therefore may have different lengths. The sequence of signal acquisition locations is unknown and is not necessarily consistent across different subjects.
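As a rough illustration of this naming scheme, the sketch below splits a recording file name into its subject identifier, auscultation location, and optional index. The helper `parse_recording_name` and the regular expression are hypothetical; in particular, the pattern assumes that underscores separate the three parts, as in the example names 1234_AV.wav and 1234_AV.tsv.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: <subject>_<location>[_<index>].<extension>
RECORDING_NAME = re.compile(
    r"^(?P<subject>\d+)_(?P<location>PV|TV|AV|MV|Phc)"
    r"(?:_(?P<index>\d+))?\.(?P<ext>wav|hea|tsv)$"
)


class RecordingName(NamedTuple):
    subject_id: str
    location: str
    index: Optional[int]  # None when the location has a single recording
    extension: str


def parse_recording_name(filename: str) -> RecordingName:
    """Split a recording file name into its parts, or raise ValueError."""
    match = RECORDING_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a recognised recording file name: {filename!r}")
    index = match.group("index")
    return RecordingName(
        subject_id=match.group("subject"),
        location=match.group("location"),
        index=int(index) if index is not None else None,
        extension=match.group("ext"),
    )


first = parse_recording_name("1234_AV.wav")
repeat = parse_recording_name("1234_MV_2.tsv")
```

The subject description file (e.g., 1234.txt) deliberately does not match the pattern, because it is shared between all recordings of a subject.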
The subject description file has the following format:
- The first line indicates the subject identifier, the number of recordings, and the sampling frequency (in Hz) separated by space delimiters, respectively.
- The following lines contain information about the heart sound data files corresponding to the current subject ID, also separated by empty spaces. Here, the location of the recording (AV, PV, TV, MV, or Phc), the name of the header file, the name of the WAV file, and the name of the segmentation file are included (the segmentation files are only provided for the training data).
- The rest of the lines start with a pound/hash symbol (#) and indicate the information described in the Data Table.
Example: The subject description file 1234.txt contains information about the subject with ID number 1234, as shown below. Accordingly, there are a total of four WAV files for this subject acquired from the locations AV, PV, TV and MV, all sampled at 4000 Hz. Each .wav file has its heart sound segmentation information registered in a separate .tsv file, with the same base name as the corresponding .wav file.

1234 4 4000
AV 1234_AV.hea 1234_AV.wav 1234_AV.tsv
PV 1234_PV.hea 1234_PV.wav 1234_PV.tsv
TV 1234_TV.hea 1234_TV.wav 1234_TV.tsv
MV 1234_MV.hea 1234_MV.wav 1234_MV.tsv
#Age: Child
#Sex: Female
#Height: 123.0
#Weight: 13.5
#Pregnancy status: False
#Murmur: Present
#Murmur locations: AV+MV+PV+TV
#Most audible location: TV
#Systolic murmur timing: Holosystolic
#Systolic murmur shape: Diamond
#Systolic murmur grading: III/VI
#Systolic murmur pitch: High
#Systolic murmur quality: Harsh
#Diastolic murmur timing: nan
#Diastolic murmur shape: nan
#Diastolic murmur grading: nan
#Diastolic murmur pitch: nan
#Diastolic murmur quality: nan
#Campaign: CC2014
#Additional ID: nan
#Outcome: Abnormal
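A subject description file with this layout could be read along the following lines. This is a minimal sketch that assumes one record per line (a first line with three fields, then one line per recording, then `#Key: value` lines); the function name `parse_subject_description` and the returned dictionary keys are hypothetical choices, not part of the Challenge code.

```python
from typing import Dict, List, Tuple


def parse_subject_description(text: str) -> Tuple[dict, List[dict], Dict[str, str]]:
    """Parse a subject description file into header, recordings, and metadata."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # First line: subject ID, number of recordings, sampling frequency (Hz).
    subject_id, num_recordings, sampling_frequency = lines[0].split()
    header = {
        "subject_id": subject_id,
        "num_recordings": int(num_recordings),
        "sampling_frequency": int(sampling_frequency),
    }

    # Next lines: location followed by the file names for that recording.
    # The .tsv name is absent outside the training set, so keep a list.
    recordings = []
    for line in lines[1:1 + header["num_recordings"]]:
        fields = line.split()
        recordings.append({"location": fields[0], "files": fields[1:]})

    # Remaining lines: "#Key: value" pairs from the Data Table.
    metadata = {}
    for line in lines[1 + header["num_recordings"]:]:
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
    return header, recordings, metadata


sample = """1234 2 4000
AV 1234_AV.hea 1234_AV.wav 1234_AV.tsv
PV 1234_PV.hea 1234_PV.wav 1234_PV.tsv
#Age: Child
#Murmur: Present
#Outcome: Abnormal
"""
header, recordings, metadata = parse_subject_description(sample)
```

Values are kept as strings, including the literal `nan` used for missing fields, so that the caller decides how to convert them.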
The segmentation annotation file (with .tsv extension and in plain text format) is composed of three distinct columns: the first column corresponds to the time instant (in seconds) where the wave was detected for the first time, the second column corresponds to the time instant (in seconds) where the wave was detected for the last time, and the third column corresponds to an identifier that uniquely identifies the detected wave. Here, we use the following convention:
- The S1 wave is identified by the integer 1.
- The systolic period is identified by the integer 2.
- The S2 wave is identified by the integer 3.
- The diastolic period is identified by the integer 4.
- The unannotated segments of the signal are identified by the integer 0.
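The convention above maps directly onto a small lookup table. The sketch below reads the three columns and replaces the integer identifier with a readable name; the function name `parse_segmentation`, the label strings, and the sample timings are made up for illustration.

```python
from typing import List, Tuple

# Integer identifiers from the convention above, with illustrative names.
SEGMENT_LABELS = {
    0: "unannotated",
    1: "S1",
    2: "systole",
    3: "S2",
    4: "diastole",
}


def parse_segmentation(text: str) -> List[Tuple[float, float, str]]:
    """Return (start seconds, end seconds, label) for each row of a .tsv file."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, end, code = line.split()
        segments.append((float(start), float(end), SEGMENT_LABELS[int(float(code))]))
    return segments


# Made-up rows, tab-separated as in a .tsv file.
sample_rows = "0.00\t0.40\t0\n0.40\t0.52\t1\n0.52\t0.75\t2\n0.75\t0.85\t3\n0.85\t1.30\t4\n"
segments = parse_segmentation(sample_rows)
```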
wget -r -N -c -np https://physionet.org/files/circor-heart-sound/1.0.3/
The source code of each participating team is split into two .zip files, one for the outcome task and one for the murmur task, or is provided in a single .zip file if the code for both tasks is the same. The source code and papers of the participating teams can be found at the bottom of this page.
For each patient (independently of the number of recording locations), your algorithm must identify the class label (present, absent, unknown) as well as a probability or confidence score for each class per subject ID. For example, if you have four recordings from four locations on the body, then your classifier needs to analyze all of those recordings but must generate only one label (e.g., present) per subject ID, along with a score or probability between zero and one for each class.
Your code might produce the following output for the patient ID 1234:
#1234
Present, Unknown, Absent, Abnormal, Normal
1, 0, 0 1, 0
0.75, 0.15, 0.1 0.6, 0.4
This output indicates that the classifier identified a murmur for patient 1234, and it indicates the probability of the presence of a murmur as 75%, the probability of the absence of a murmur as 10%, and the probability of an unknown murmur status as 15%. It also indicates that the classifier identified an abnormal clinical outcome, and it indicates the probability of an abnormal outcome as 60% and the probability of a normal outcome as 40%.
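One simple way to go from several recordings to a single per-patient output is to average the per-recording class probabilities and mark the most probable class. The sketch below shows that idea only; averaging is an assumption chosen for illustration (teams are free to combine recordings differently), and the name `aggregate_patient` is hypothetical.

```python
from typing import List, Sequence, Tuple

MURMUR_CLASSES = ("Present", "Unknown", "Absent")


def aggregate_patient(
    recording_probabilities: Sequence[Sequence[float]],
    classes: Sequence[str] = MURMUR_CLASSES,
) -> Tuple[List[int], List[float]]:
    """Average per-recording probabilities and one-hot encode the top class."""
    count = len(recording_probabilities)
    if count == 0:
        raise ValueError("at least one recording is required")
    mean = [
        sum(probs[i] for probs in recording_probabilities) / count
        for i in range(len(classes))
    ]
    best = max(range(len(classes)), key=lambda i: mean[i])
    labels = [1 if i == best else 0 for i in range(len(classes))]
    return labels, mean


# Two made-up recordings for one patient, in the order Present, Unknown, Absent.
labels, probabilities = aggregate_patient([[0.9, 0.05, 0.05], [0.6, 0.25, 0.15]])
```

The binary labels and the averaged probabilities correspond to the two numeric rows of the example output above.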
- The MATLAB algorithm implements a random forest classifier that uses age, sex, height, weight, and pregnancy status (extracted from demographic data) and the mean, variance, and kurtosis of each PCG recording to classify the presence, absence, or unknown murmur status and an abnormal or normal clinical outcome for each patient. Your team does not need to use these features or this classifier type.
- The Python algorithm also implements a random forest classifier using the same features to classify the presence, absence, or unknown murmur status and an abnormal or normal clinical outcome for each patient. Your team does not need to use these features or this classifier type.
For this year’s Challenge, we developed two scoring metrics. Both scoring metrics can be defined in terms of confusion matrices for murmurs and clinical outcomes.
The first scoring metric is a weighted accuracy metric that places more importance or weight on patients with murmurs and abnormal outcomes. It is defined separately for the murmur classifiers and for the clinical outcome classifiers.
We will use the weighted accuracy metric to rank the murmur classifiers, but we will use a different metric to rank the clinical outcome classifiers.
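To make the idea of a weighted accuracy concrete, the sketch below weights each patient by the class assigned by the expert, so that errors on heavily weighted classes cost more. The function `weighted_accuracy`, the orientation of the confusion matrix (rows for classifier outputs, columns for expert labels), and the weights used in the example (5, 3, 1 for present, unknown, absent) are assumptions for illustration; the text above only states that patients with murmurs and abnormal outcomes receive more weight.

```python
from typing import Sequence


def weighted_accuracy(
    confusion: Sequence[Sequence[float]], weights: Sequence[float]
) -> float:
    """Weighted accuracy from a square confusion matrix.

    confusion[i][j] is the number of patients with classifier output i and
    expert label j; weights[j] is the weight given to expert label j.
    """
    size = len(weights)
    correct = sum(weights[j] * confusion[j][j] for j in range(size))
    total = sum(
        weights[j] * confusion[i][j] for i in range(size) for j in range(size)
    )
    return correct / total if total > 0 else float("nan")


# Made-up murmur confusion matrix, class order Present, Unknown, Absent:
# 5 of 10 murmur patients are misclassified as Absent.
murmur_confusion = [
    [5, 0, 0],
    [0, 0, 0],
    [5, 0, 90],
]
assumed_weights = (5, 3, 1)  # illustrative assumption, see the text above
score = weighted_accuracy(murmur_confusion, assumed_weights)
plain = weighted_accuracy(murmur_confusion, (1, 1, 1))  # ordinary accuracy
```

With these made-up counts, the weighted score is lower than the ordinary accuracy because the missed murmurs carry five times the weight of the correctly classified patients without murmurs.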
The second scoring metric is a cost-based metric that considers the costs of algorithmic prescreening, expert screening, treatment, and diagnostic errors that result in late or missed treatments.
The screening procedure is as follows:
- The algorithm either refers or does not refer a patient to an expert. If the algorithm’s output is murmur present, murmur unknown, or outcome abnormal, then the patient is referred to an expert. If the algorithm’s output is murmur absent or outcome normal, then the patient is not referred to an expert.
- If the patient is referred to an expert, then the expert screens the patient. If the clinical outcome is abnormal, then the patient receives treatment. If the clinical outcome is normal, then the patient does not receive treatment. We assume that the expert does not make diagnostic errors.
- If the patient is not referred to an expert, then the patient does not receive treatment. If the expert would have identified an abnormal clinical outcome, then the patient would have received treatment, so this results in missed or late treatment. If the expert would have identified a normal clinical outcome, then the patient would not have received treatment anyway.
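The screening procedure above amounts to a small decision rule per patient. The sketch below counts how many patients are prescreened, referred, treated, and missed for a list of (algorithm output, expert outcome) pairs; the names `simulate_screening` and `ScreeningCounts` are hypothetical, and the expert is assumed to be error-free, as stated above.

```python
from typing import Iterable, NamedTuple, Tuple

# Algorithm outputs that lead to a referral, per the procedure above.
REFERRING_OUTPUTS = {"Present", "Unknown", "Abnormal"}


class ScreeningCounts(NamedTuple):
    prescreened: int  # patients seen by the algorithm
    referred: int     # patients sent to the expert
    treated: int      # referred patients with an abnormal outcome
    missed: int       # non-referred patients with an abnormal outcome


def simulate_screening(cases: Iterable[Tuple[str, str]]) -> ScreeningCounts:
    """Apply the referral rule to (algorithm output, expert outcome) pairs."""
    prescreened = referred = treated = missed = 0
    for output, outcome in cases:
        prescreened += 1
        if output in REFERRING_OUTPUTS:
            referred += 1
            if outcome == "Abnormal":
                treated += 1
        elif outcome == "Abnormal":
            missed += 1  # missed or late treatment
    return ScreeningCounts(prescreened, referred, treated, missed)


# Made-up cohort of five patients screened with a murmur classifier.
counts = simulate_screening([
    ("Present", "Abnormal"),
    ("Unknown", "Normal"),
    ("Absent", "Normal"),
    ("Absent", "Abnormal"),
    ("Present", "Normal"),
])
```

These four counts are the quantities that a cost-based metric would then convert into costs.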
To study the value of algorithmic prescreening, we have defined a nonlinear cost function for expert screening with our clinical collaborators.
We consider four costs: the total cost of prescreenings by an algorithm, the total cost of screenings by a human expert out of a population of patients, the total cost of treatments, and the total cost of delayed or missed treatments due to negative algorithmic prescreening. We assume that these costs are averaged over many subjects.
Due to our focus on the utility of algorithmic prescreening, the cost of expert screening is more complicated than the other costs, but the idea is simple: the total cost of expert screening increases as more patients are screened, but the average cost of screening a patient increases as we screen below or above our screening capacity.
Under this cost function, the mean per-patient cost of expert screening is $1000 when we screen 50% of the patient cohort, but it increases to $10000 when we screen 100% of the patient cohort. The mean per-patient cost reaches a minimum when we screen 25% of the patient cohort, but there is still a cost for our expert screening capacity even if we screen 0% of the patient cohort.
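The behaviour described above can be checked numerically with a quartic cost in the screened fraction. The coefficients in the sketch below are an assumption: they were chosen because they reproduce the stated properties ($1000 per screened patient at 50%, $10000 at 100%, a minimum at 25%, and a nonzero cost at 0%), and they should be compared against the official scoring code before being relied on. The function names are hypothetical.

```python
def expert_screening_cost(screened: float, total: float) -> float:
    """Total expert screening cost for `screened` out of `total` patients.

    Assumed polynomial in the screened fraction x = screened / total.
    """
    x = screened / total
    return (25 + 397 * x - 1718 * x**2 + 11296 * x**4) * total


def cost_per_screened_patient(fraction: float, total: float = 1000) -> float:
    """Mean expert cost per screened patient at a given screened fraction."""
    screened = fraction * total
    return expert_screening_cost(screened, total) / screened


# Evaluate the per-screened-patient cost on a 1% grid of screened fractions.
grid = [i / 100 for i in range(1, 101)]
cheapest_fraction = min(grid, key=cost_per_screened_patient)
half = cost_per_screened_patient(0.5)
full = cost_per_screened_patient(1.0)
idle = expert_screening_cost(0, 1000)  # capacity cost with nobody screened
```

Note that the "mean per-patient cost" in this sketch is taken per screened patient, which is the reading under which the stated dollar values hold.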
For a murmur classifier, we define the total cost of using the murmur classifier for algorithmic prescreening as the sum of these four costs, and the mean cost of using the murmur classifier for algorithmic prescreening as the total cost divided by the total number of patients.
For a clinical outcome classifier, we define the total cost of using the clinical outcome classifier for algorithmic prescreening as the sum of these four costs, and the mean cost of using the outcome classifier for algorithmic prescreening as the total cost divided by the total number of patients.
We will use this mean cost to rank the clinical outcome classifiers.
The authors declare no ethics concerns.
This year’s Challenge is generously co-sponsored by MathWorks and the Gordon and Betty Moore Foundation.
Obtaining Complimentary MATLAB Licenses
MathWorks has generously decided to sponsor this Challenge by providing complimentary licenses to all teams that wish to use MATLAB. Users can apply for a license and learn more about MATLAB support by visiting the PhysioNet Challenge page from MathWorks. If you have questions or need technical support, then please contact MathWorks at email@example.com.
Obtaining Complimentary Google Cloud Platform Credits
At the time of launching this Challenge, Google Cloud offers multiple services for free on a one-year trial basis and $300 in cloud credits. Teams can request research credits here. Additionally, if teams are based at an educational institution in selected countries, then they can access free GCP training online. The Challenge Organizers, their employers, PhysioNet and Computing in Cardiology accept no responsibility for the loss of credits, or failure to issue credits for any reason.
Conflicts of Interest
The authors have no conflicts of interest to declare.
- D. S. Burstein, P. Shamszad, D. Dai, C. S. Almond, J. F. Price, K. Y. Lin, M. J. O’Connor, R. E. Shaddy, C. E. Mascio, and J. W. Rossano, “Significant mortality, morbidity and resource utilization associated with advanced heart failure in congenital heart disease in children and young adults,” American Heart Journal, vol. 209, pp. 9-19, 2019.
- S. G. Jivanji, S. Lubega, B. Reel, and S. A. Qureshi, “Congenital heart disease in East Africa,” Frontiers in Pediatrics, vol. 7, p. 250, 2019.
- L. Zühlke, M. Mirabel, and E. Marijon, “Congenital heart disease and rheumatic heart disease in Africa: Recent advances and current priorities,” Heart, vol. 99, no. 21, pp. 1554-1561, 2013.
- S. M. Carvalho, I. Dalben, J. E. Corrente, and C. S. Magalhães, “Rheumatic fever presentation and outcome: a case-series report,” Revista Brasileira de Reumatologia, vol. 52, no. 2, pp. 241-246, 2012.
- A. Tandon, S. Sengupta, V. Shukla, and S. Danda, “Risk factors for congenital heart disease (CHD) in Vellore, India,” Current Research Journal of Biological Sciences, vol. 2, no. 4, pp. 253-258, 2010.
- M. D. Seckeler and T. R. Hoke, “The worldwide epidemiology of acute rheumatic fever and rheumatic heart disease,” Clinical Epidemiology, vol. 3, pp. 67-84, 2011.
- A. Gheorghe, U. Griffiths, A. Murphy, H. Legido-Quigley, P. Lamptey, and P. Perel, “The economic burden of cardiovascular disease and hypertension in low-and middle-income countries: a systematic review,” BMC Public Health, vol. 18, no. 1, 2018.
- G. D. Clifford, C. Liu, B. Moody, D. Springer, I. Silva, Q. Li, and R. G. Mark, “Classification of normal/abnormal heart sound recordings: The PhysioNet/Computing in Cardiology Challenge 2016,” in 2016 Computing in Cardiology Conference (CinC), Sep. 11, 2016, pp. 609-612.
- G. D. Clifford, C. Liu, B. Moody, J. Millet, S. Schmidt, Q. Li, I. Silva, R.G. Mark. “Recent advances in heart sound analysis,” Physiol Meas., vol. 38, pp. E10-E25, 2017, doi: 10.1088/1361-6579/aa7ec8. Focus issue online at https://iopscience.iop.org/journal/0967-3334/page/Recent-advances-in-heart-sound-analysis.
- J. H. Oliveira et al., “The CirCor DigiScope Dataset: From Murmur Detection to Murmur Classification,” IEEE Journal of Biomedical and Health Informatics, doi: 10.1109/JBHI.2021.3137048.
- R. Keren, M. Tereschuk, and X. Luan, “Evaluation of a novel method for grading heart murmur intensity,” Archives of Pediatrics & Adolescent Medicine, vol. 159, no. 4, pp. 329-334, 2005.
- K. Williams, D. Thomson, I. Seto, D. Contopoulos-Ioannidis et al., “Standard 6: Age groups for pediatric trials,” Pediatrics, vol. 129 Suppl 3, pp. S153-60, 06 2012.
- D. B. Springer, L. Tarassenko and G. D. Clifford, “Logistic Regression-HSMM-Based Heart Sound Segmentation,” in IEEE Transactions on Biomedical Engineering, vol. 63, no. 4, pp. 822-832, April 2016, doi: 10.1109/TBME.2015.2475278.
- J. Oliveira, F. Renna, T. Mantadelis and M. Coimbra, “Adaptive Sojourn Time HSMM for Heart Sound Segmentation,” in IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 2, pp. 642-649, March 2019, doi: 10.1109/JBHI.2018.2841197.
- F. Renna, J. Oliveira and M. T. Coimbra, “Deep Convolutional Neural Networks for Heart Sound Segmentation,” in IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 6, pp. 2435-2445, Nov. 2019, doi: 10.1109/JBHI.2019.2894222.
Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files):
Creative Commons Attribution 4.0 International Public License
Total uncompressed size: 1.3 GB.
Access the files
- Download the ZIP file (1.3 GB)
Download the files using your terminal:
wget -r -N -c -np https://physionet.org/files/challenge-2022/1.0.0/
|LICENSE.txt (download)||14.5 KB||2023-09-26|
|SHA256SUMS.txt (download)||11.4 KB||2023-09-28|