Database Credentialed Access
Bridge2AI-Voice Pediatric Dataset
Yael Bensoussan, Alexandros Sigaras, Anais Rameau, Olivier Elemento, Maria Powell, David Dorr, Philip Payne, Vardit Ravitsky, Jean-Christophe Bélisle-Pipon, Ruth Bahr, Stephanie Watts, Donald Bolser, Jennifer Siu, Jordan Lerner-Ellis, Frank Rudzicz, Micah Boyer, Yassmeen Abdel-Aty, Toufeeq Ahmed Syed, James Anibal, Dona Amraei, Stephen Aradi, Kirollos Armosh, Ana Sophia Martinez, Shaheen Awan, Steven Bedrick, Helena Beltran, Alexander Bernier, Moroni Berrios, Isaac Bevers, Alden Blatter, Rahul Brito, Amy Brown, Johnathan Brown, Léo Cadillac, Selina Casalino, John Costello, Abhijeet Dalal, Iris De Santiago, Enrique Diaz-Ocampo, Amanda Doherty-Kirby, Mohamed Ebraheem, Ellie Eiseman, Mahmoud Elmahdy, Renee English, Emily Evangelista, Kenneth Fletcher, Hortense Gallois, Gaelyn Garrett, Alexander Gelbard, Anna Goldenberg, Karim Hanna, William Hersh, Jennifer Jain, Lochana Jayachandran, Kaley Jenney, Kathy Jenkins, Stacy Jo, Alistair Johnson, Ayush Kalia, Megha Kalia, Zoha Khawa, Cindy Kostelnik, Alisa Krause, Andrea Krussel, Elisa Lapadula, Genelle Leo, Justin Levinsky, Chloe Loewith, Radhika Mahajan, Vrishni Maharaj, Siyu Miao, LeAnn Michaels, Matthew Mifsud, Marian Mikhael, Elijah Moothedan, Yosef Nafii, Tempestt Neal, Karlee Newberry, Evan Ng, Christopher Nickel, Amanda Peltier, Trevor Pharr, Michaela Pnacekova, Matthew Pontell, Claire Premi-Bortolotto, Parnaz Rafatjou, JM Rahman, John Ramos, Sarah Rohde, Michael de Riesthal, Jillian Rossi, Laurie Russell, Samantha Salvi Cruz, Joyce Samuel, Suketu Shah, Ahmed Shawkat, Elizabeth Silberholz, John Stark, Lala Su, Shrramana Ganesh Sudhakar, Duncan Sutherland, Venkata Swarna Mukhi, Jeffrey Tang, Luka Taylor, Jamie Toghranegar, Julie Tu, Megan Urbano, Gavin Victor, Kimberly Vinson, Jordan Wilke, Claire Wilson, Madeleine Zanin, Xijie Zeng, Theresa Zesiewicz, Robin Zhao, Pantelis Zisimopoulos, Satrajit Ghosh
Published: Dec. 17, 2025. Version: 1.0.0
When using this resource, please cite:
Bensoussan, Y., Sigaras, A., Rameau, A., Elemento, O., Powell, M., Dorr, D., Payne, P., Ravitsky, V., Bélisle-Pipon, J., Bahr, R., Watts, S., Bolser, D., Siu, J., Lerner-Ellis, J., Rudzicz, F., Boyer, M., Abdel-Aty, Y., Ahmed Syed, T., Anibal, J., ... Ghosh, S. (2025). Bridge2AI-Voice Pediatric Dataset (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/y7mp-eh56
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
Abstract
The human voice contains complex acoustic markers that have been linked to important health conditions including dementia, mood disorders, and cancer. When viewed as a biomarker, voice is a promising characteristic to measure: it is simple to collect and cost-effective, and it has broad clinical utility. Recent advances in artificial intelligence have provided techniques to extract previously unknown, prognostically useful information from dense data elements such as images. The Bridge2AI-Voice project seeks to create an ethically sourced flagship dataset to enable future research in artificial intelligence and to support critical insights into the use of voice as a biomarker of health. Here we present Bridge2AI-Voice, a comprehensive collection of data derived from voice recordings with corresponding clinical information.
The Bridge2AI-Voice Pediatric Dataset v1.0.0 contains derived audio features for 22,620 recordings collected from 300 participants aged 2 to 18 years. The release contains data considered low risk, including derivations such as spectrograms, but not the original voice recordings. Detailed demographic, clinical, and validated questionnaire data are also made available.
Background
Understanding voice and speech development in children is essential for identifying communication and speech disorders early in life, and for supporting timely intervention [1]. Pediatric and adult voice and speech production are fundamentally different because the respiratory system and larynx undergo rapid functional maturation throughout childhood [2, 3]. These developmental changes influence acoustic features such as fundamental frequency (F₀) [2]. As a result, normative data established in adults cannot be generalized to pediatric populations.
Despite the clinical importance of detecting pediatric communication disorders such as autism spectrum disorder and speech delays, the availability of large-scale pediatric datasets remains limited. Data collection in pediatric populations poses unique challenges, including privacy concerns, consent processes, and the need for developmentally appropriate tasks. These factors have contributed to the lack of publicly available pediatric datasets that enable machine learning for pediatric voice analysis [3].
Establishing a robust multi-institutional dataset that integrates pediatric voice data with demographic information would advance understanding of the relationship between voice and disease, as well as early detection and intervention. Resources such as this dataset are intended to enable the study of developmental norms, support the creation of AI-driven tools for early screening, and provide clinical insight.
Methods
Patients and healthy volunteers at the Hospital for Sick Children were considered for enrollment in the study. Patients were eligible if they were 2 to 18 years of age and proficient in English. Exclusion criteria included age over 18 years and non-verbal status. Non-patients, recruited through research postings, were evaluated against the same inclusion and exclusion criteria. Following confirmation of eligibility, parental or participant consent was obtained prior to data collection and data sharing. Consented participants were assigned a unique study identification number, and a standardized, age-appropriate protocol for data collection was followed. The protocol included the collection of demographic information, voice- and speech-related questionnaires, and questionnaires about medical history.
All data were collected through customized software (reproschema-ui) on tablets. A headset was used to record most participants, while the remaining recordings used the built-in tablet microphone because of low tolerance for wearing headphones or existing complex medical conditions. All participants completed the recording and demographic data collection in one session. For participants without adequate comprehension of, or familiarity with, their past medical history, parents or decision-makers completed the survey on a separate tablet during the recording. Completing the recording and survey simultaneously improved efficiency and minimized participant burden and fatigue. Data were exported and converted to tab-delimited values using an open-source library developed by our team [4].
Data Description
The dataset contains both derived audio features (under features) and phenotypic information acquired during data collection. Binary files are made available as Parquet, an open-source column-oriented data file format. Each of the Parquet files is formatted similarly: each element contains a unique identifier for the participant (participant_id), a unique identifier for the recording session (session_id), the task performed (task_name), the number of time frames associated with that feature (n_frames), and the tensor data for the feature.
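The column layout can be verified before loading a file in full; a minimal sketch, assuming the pyarrow package is installed:

import pyarrow.parquet as pq

# Read only the Parquet schema; this does not load the tensor data
schema = pq.read_schema("torchaudio_spectrograms.parquet")
print(schema.names)  # expect participant_id, session_id, task_name, n_frames, and the feature column

The individual feature files are described below.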
- torchaudio_spectrograms.parquet (n=27584) contains spectrograms of dimension 201xT generated using the short-time Fourier transform (STFT) with a 25ms window size, 10ms hop length, and a 400-point FFT (see the parameter sketch after this list).
- torchaudio_mel_spectrograms.parquet (n=27584) contains Mel spectrograms of dimension 60xT generated with a 25ms window size, 10ms hop length, a 400-point FFT and 60 Mel bins.
- torchaudio_mfcc.parquet (n=27584) contains Mel-frequency cepstral coefficients of dimension 60xT using the same parameters as the Mel spectrograms.
- torchaudio_pitch.parquet (n=28862) contains the detected pitch (fundamental frequency) over time and is of dimension T, with a minimum and maximum pitch of 80 Hz and 500 Hz respectively.
- sparc_ema.parquet (n=27560) contains the estimated electromagnetic articulography (EMA) trajectories produced by a deep learning model, with dimensions Tx12, where the 12 values correspond to the X/Y positions of six articulators: tongue dorsum (TD), tongue body (TB), tongue tip (TT), lower incisor (LI), upper lip (UL), and lower lip (LL).
- sparc_loudness.parquet (n=28837) contains the estimated loudness based on the average absolute amplitude of the audio waveform of size T, using 20ms windows.
- sparc_periodicity.parquet (n=28837) contains the estimated periodicity (confidence of pitch presence) derived from the audio using 20ms windows of dimension T.
- sparc_pitch.parquet (n=28837) contains the estimated fundamental frequency (F0) of the audio signal, computed with a different algorithm from the torchaudio pitch estimate above, with a range of 50-550 Hz and dimension T.
- ppgs.parquet (n=27616) contains the phonetic posteriorgram probabilities across 40 phoneme categories giving a dimension of 40xT with a frame rate of 100Hz.
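As noted in the list above, the torchaudio-derived features correspond to a transform configuration along the following lines. This is a minimal sketch assuming a 16 kHz sampling rate (so a 25 ms window is 400 samples and a 10 ms hop is 160 samples); the exact pipeline is available in the b2aiprep library [4].

import torch
import torchaudio.transforms as T

SAMPLE_RATE = 16_000  # assumption: sampling rate is not stated above

# Spectrogram: 400-point FFT, 25 ms window, 10 ms hop -> 201 frequency bins
spectrogram = T.Spectrogram(n_fft=400, win_length=400, hop_length=160)

# Mel spectrogram: same STFT settings with 60 Mel bins
mel_spectrogram = T.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=400, win_length=400, hop_length=160, n_mels=60
)

# MFCC: 60 coefficients computed from the same Mel settings
mfcc = T.MFCC(
    sample_rate=SAMPLE_RATE,
    n_mfcc=60,
    melkwargs={"n_fft": 400, "win_length": 400, "hop_length": 160, "n_mels": 60},
)

waveform = torch.randn(1, SAMPLE_RATE)  # one second of dummy audio
print(spectrogram(waveform).shape)      # torch.Size([1, 201, 101])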
Spectrograms, Mel spectrograms, MFC coefficients, PPGs, and EMA features for sensitive records and audio checks have been removed from this release. Additionally, certain features could not be generated for some recordings (for example, because of recording length or other issues) and so are not included in the bundled data.
In addition to the Parquet files, the features folder contains a plain-text file, static_features.tsv, which provides features derived from the open-source Speech and Music Interpretation by Large-space Extraction toolkit (openSMILE) [5], Praat [6], Parselmouth [7], and torchaudio [8, 9].
All of the above files are accompanied by a data dictionary file with the same file stem and a JSON suffix (e.g. torchaudio_spectrograms.json). The data dictionaries share the same overall structure: a dictionary whose keys are the column names in the associated data file and whose values are dictionaries with further detail. The description value in each data dictionary provides a one-sentence summary of the respective column.
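The dictionaries can be inspected programmatically; a minimal sketch using only the standard library:

import json

# Load the data dictionary accompanying a feature file
with open("torchaudio_spectrograms.json") as f:
    data_dict = json.load(f)

# Print the one-sentence description of each column
for column, detail in data_dict.items():
    print(column, "-", detail.get("description", ""))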
The code used to preprocess the raw audio waveforms into the Parquet files and to merge the source data into the phenotype files has been made open source in the b2aiprep library [4].
Usage Notes
If using Python, the Parquet dataset can be loaded with the HuggingFace datasets library as follows:
from datasets import Dataset

# Load one of the derived feature files as a HuggingFace Dataset
ds = Dataset.from_parquet("torchaudio_spectrograms.parquet")
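Individual recordings can then be selected using the identifier columns described above; in the sketch below, the participant value is a placeholder:

# Keep only the rows belonging to one (hypothetical) participant
subset = ds.filter(lambda row: row["participant_id"] == "participant-0001")

# List the distinct tasks present in the file
print(set(ds["task_name"]))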
A Mel spectrogram can be plotted in decibels by converting it from its original power representation:
from datasets import Dataset
import matplotlib.pyplot as plt
import librosa
import numpy as np

# Load the Mel spectrogram features and convert the first recording
# from its power representation to decibels
ds = Dataset.from_parquet("torchaudio_mel_spectrograms.parquet")
spectrogram = librosa.power_to_db(np.asarray(ds[0]['mel_spectrogram']))

plt.figure(figsize=(10, 4))
plt.imshow(spectrogram, aspect='auto', origin='lower')
plt.title('Mel Spectrogram')
plt.xlabel('Time Step')
plt.ylabel('Mel Bin')
plt.colorbar()
plt.show()
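Because the features use a 10 ms hop length, frame indices map directly to time; a minimal sketch for labeling the x-axis in seconds, reusing the spectrogram array from above:

hop_seconds = 0.010  # 10 ms hop length (see Data Description)
duration = spectrogram.shape[1] * hop_seconds

# Redraw the spectrogram with time in seconds on the x-axis
plt.imshow(spectrogram, aspect='auto', origin='lower',
           extent=[0, duration, 0, spectrogram.shape[0]])
plt.xlabel('Time (s)')
plt.show()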
A phenotype file can be loaded with any statistical analysis tool. For example, the pandas library in Python can read the data:
import pandas as pd

# Read the tab-delimited demographics phenotype file
df = pd.read_csv("demographics.tsv", sep="\t", header=0)
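Phenotype tables can also be joined with the derived features through shared identifiers; a minimal sketch, assuming both files include a participant_id column:

# Merge static acoustic features with demographics on the participant identifier
features = pd.read_csv("static_features.tsv", sep="\t")
merged = features.merge(df, on="participant_id", how="left")
print(merged.shape)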
Release Notes
b2ai-voice-pediatric v1.0.0: This is the first release of the Bridge2AI-Voice Pediatric Dataset.
Ethics
Data collection and sharing were approved by the Research Ethics Board at the Hospital for Sick Children.
Acknowledgements
This release would not be possible without the generous contribution of data from all the participants of the study.
This project was funded by NIH project number 3OT2OD032720-01S1: Bridge2AI: Voice as a Biomarker of Health - Building an ethically sourced, bioacoustic database to understand disease like never before. We would also like to thank the NIH for their continued support of the project.
Conflicts of Interest
None to declare.
References
- Kelchner, L. N., Brehm, S. B., de Alarcon, A., & Weinrich, B. (2012). Update on pediatric voice and airway disorders: assessment and care. Current opinion in otolaryngology & head and neck surgery, 20(3), 160–164. https://doi.org/10.1097/MOO.0b013e3283530ecb
- Tavares, E. L., Labio, R. B., & Martins, R. H. (2010). Normative study of vocal acoustic parameters from children from 4 to 12 years of age without vocal symptoms: a pilot study. Brazilian journal of otorhinolaryngology, 76(4), 485–490. https://doi.org/10.1590/S1808-86942010000400013
- Fujiki, R. B., Venkatraman, A., & Heller Murray, E. S. (2025). The pediatric vocal mechanism: Structure and function. Journal of Voice. Advance online publication. https://doi.org/10.1016/j.jvoice.2025.03.025
- Johnson, A., Bevers, I., Ng, E., Wilke, J., Brito, R., Bedrick, S., Catania, F. & Ghosh, S. (2025). Bridge2AI Data Processing Library (Version 3.0.0) [Computer software]. https://github.com/sensein/b2aiprep
- Eyben, F., Wöllmer, M., & Schuller, B. (2010). openSMILE: The Munich versatile and fast open-source audio feature extractor. In Proceedings of ACM Multimedia (MM), Florence, Italy (pp. 1459-1462). ACM.
- Boersma, P., & Van Heuven, V. (2001). Speak and unSpeak with PRAAT. Glot International, 5(9/10), 341-347.
- Jadoul, Y., Thompson, B., & De Boer, B. (2018). Introducing Parselmouth: A Python interface to Praat. Journal of Phonetics, 71, 1-15.
- Hwang, J., Hira, M., Chen, C., Zhang, X., Ni, Z., Sun, G., Ma, P., Huang, R., Pratap, V., Zhang, Y., Kumar, A., Yu, C.-Y., Zhu, C., Liu, C., Kahn, J., Ravanelli, M., Sun, P., Watanabe, S., Shi, Y., Tao, T., Scheibler, R., Cornell, S., Kim, S., & Petridis, S. (2023). TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch. arXiv preprint arXiv:2310.17864
- Yang, Y.-Y., Hira, M., Ni, Z., Chourdia, A., Astafurov, A., Chen, C., Yeh, C.-F., Puhrsch, C., Pollack, D., Genzel, D., Greenberg, D., Yang, E. Z., Lian, J., Mahadeokar, J., Hwang, J., Chen, J., Goldsborough, P., Roy, P., Narenthiran, S., Watanabe, S., Chintala, S., Quenneville-Bélair, V., & Shi, Y. (2021). TorchAudio: Building Blocks for Audio and Speech Processing. arXiv preprint arXiv:2110.15018.
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
Bridge2AI Voice Registered Access License
Data Use Agreement:
Bridge2AI Voice Registered Access Agreement
Required training:
No training required
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/y7mp-eh56
DOI (latest version):
https://doi.org/10.13026/mf9s-5r03
Project Website:
https://b2ai-voice.org/
Files
To access the files, you must:
- be a credentialed user
- sign the data use agreement for the project