Database Open Access
A Multimodal Dataset for Investigating Working Memory in Presence of Music
Saman Khazaei, Srinidhi Parshi, Samiul Alam, Md Rafiul Amin, Rose T Faghih
Published: Feb. 26, 2025. Version: 1.0.0
        When using this resource, please cite: 
Khazaei, S., Parshi, S., Alam, S., Amin, M. R., & Faghih, R. T. (2025). A Multimodal Dataset for Investigating Working Memory in Presence of Music (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/6vh4-dk68
      
        Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.
      
Abstract
We present the accompanying dataset to the study "A Multimodal Dataset for Investigating Working Memory in Presence of Music". The experiment was conducted to investigate the viability of music as an intervention to regulate cognitive arousal and performance states. We recorded multimodal physiological signals and behavioral data during a working memory task, the n-back task, while background music was playing. We asked the participants to provide the music, and two types of music were employed: calming and exciting. The calming music was played during the first session of the experiment, and the exciting music was presented during the second session. Each session includes an equal number of 1-back and 3-back task blocks, with 22 trials presented within each task block. A total of 16 task blocks are implemented in each session (8 blocks of the 1-back task and 8 blocks of the 3-back task). Eleven participants/subjects originally took part in the experiment; we removed participants/subjects with too few recorded modalities. The recorded data are skin conductance (SC), electrocardiogram (ECG), skin surface temperature (SKT), respiration, photoplethysmography (PPG), functional near-infrared spectroscopy (fNIRS), electromyogram (EMG), de-identified facial expression scores, sequence of correct/incorrect responses, and reaction time.
Background
The non-invasive closed-loop brain-machine interface (BMI) design can leverage the recent advances in physiological data collection systems. More specifically, various physiological signals can be collected in a non-invasive manner, and they can be used as the informative index of cognitive brain states [1-2]. The loop can be closed, and these brain states may be regulated using non-invasive interventions such as music [1]. The brain state regulation via musical stimulation can be implemented within closed-loop BMI (CLBMI) in everyday life settings. Music can be used in the context of neural rehabilitation and cognitive impairment treatments [1-5]. In the realm of cognitive brain states, the well-known Yerkes-Dodson law notes that the moderate range of arousal state can result in maximized performance [6-7]. Hence, it is worth investigating the potential of music as an intervention to regulate the arousal and performance states.
Physiological signals can provide valuable insights into the physiological basis of changes in brain activity. For instance, spikes in electrodermal activity (EDA) may reflect increased autonomic arousal and correspond with changes in brain activity, such as heightened activity in the amygdala and other regions involved in emotional processing [8]. Behavioral data can also serve as informative observations that may reflect the dynamics of cognitive performance [1]. This information can improve our understanding of the relationship between arousal and performance, which could have important implications for the treatment of various neurological and psychiatric conditions. While behavioral data and physiological signals such as SC, PPG, and ECG can serve as informative indices of brain state, the incorporation of fNIRS in this multimodal dataset enables the direct evaluation of the brain response. In particular, fNIRS has higher spatial resolution than electroencephalography (EEG), and studies have confirmed the suitability of fNIRS, particularly in the prefrontal cortex, for evaluating cognitive workload in memory tasks [9]. The portability and ease of use of fNIRS head caps, combined with their ability to provide high-quality data on hemodynamic variations, make fNIRS a suitable method for passive brain-computer interface (BCI) monitoring of stress levels.
Therefore, a multimodal dataset that incorporates various neuroimaging and physiological signals could be crucial in decoding the dynamic nature of brain states. By combining multiple modalities such as PPG, EEG, fNIRS, and EDA, we can capture the complementary aspects of neural activity and physiological changes, providing a more comprehensive picture of the brain's response to cognitive tasks. This approach can lead to the identification of more robust biomarkers of cognitive stress and provide insights into the neural mechanisms underlying cognitive workload, ultimately leading to the development of more effective interventions and treatments for neurological and psychiatric conditions. Despite the small sample size, a diverse array of data encompassing multiple physiological measurements was recorded in the presence of various music types and cognitive loads. The conducted study can shed light on the human-in-the-loop experiments, and the multimodal feature of this dataset enables researchers to study various physiological signals in a concurrent manner and gain a comprehensive understanding of brain response to music and cognitive load.
Methods
The experiment has two main sessions, and two types of music are used to mimic high and low arousing conditions. These types of music include calming and exciting content: the calming music was played in the first session of the experiment, while the exciting music was played in the second session. In this experiment, we mainly focus on the cognitive brain states under various cognitive loads as well as arousing conditions. More specifically, we employ a working memory experiment called the "n-back" task, in which a participant needs to recall the n-th previous stimulus and thereby engage working memory, one of the basic cognitive functions [10]. The underlying cognitive brain states are often hidden and challenging to track in a continuous manner. Hence, informative physiological and behavioral data can play a crucial role in decoding these hidden brain states and designing a control strategy. We collect a diverse array of data encompassing various physiological signals such as skin conductance (SC), electrocardiogram (ECG), skin surface temperature (SKT), respiration, photoplethysmography (PPG), functional near-infrared spectroscopy (fNIRS), electromyogram (EMG), de-identified facial expression scores, and behavioral data.
In the designed n-back task, each participant was shown a series of letters sequentially and had to identify whether the currently displayed letter was the same as the letter displayed "n" trials earlier. The cognitive workload increases with "n", since the participant has to hold more of the stimuli in memory for higher values of "n". In this experiment, we employed equal numbers of 1-back and 3-back blocks within two sessions: a total of 16 blocks were randomly distributed in each session (8 blocks of each type of task), and each block included 22 trials. The first session was accompanied by calming background music, while the background music during the second session included exciting content. The first session was annotated as "Calming," and the second session was annotated as "Vexing". At the beginning of each block, a 5-second instruction informed the participant about the type of n-back task (i.e., 1-back or 3-back) that would be presented. The instruction was delivered as text displayed on a smart 65-inch TV screen. Each task block consisted of 22 trials in which 22 letter stimuli were presented. For each trial, a stimulus letter was displayed for 0.5 seconds followed by a resting cross for 1.5 seconds. Therefore, each task block had a total duration of 49 seconds [instruction time 5 seconds + (stimulus display time 0.5 seconds + resting cross display 1.5 seconds) × 22 trials = 49 seconds]. At the end of each task block, a 10-second relaxing/resting window was presented during which a resting cross was displayed on the screen. After 8 task blocks (the halfway mark of each session), a 20-second relaxing/resting section was presented during which a resting cross was displayed on the TV screen, which was connected via HDMI to a laptop. Data were collected on a separate laptop connected to all devices.
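The matching rule of the n-back task can be sketched in Python as follows. This is a minimal illustration and not the E-Prime implementation used in the experiment: the function names `nback_targets` and `score_responses`, the example letter sequence, and the choice to leave the first n trials unscored are all assumptions made for this sketch.

```python
from typing import List, Optional


def nback_targets(stimuli: List[str], n: int) -> List[Optional[bool]]:
    """Return, per trial, whether the letter matches the one shown n trials earlier.

    The first n trials have no n-th previous letter, so they are marked None
    (an assumption of this sketch; the dataset may treat them differently).
    """
    targets: List[Optional[bool]] = []
    for i, letter in enumerate(stimuli):
        if i < n:
            targets.append(None)
        else:
            targets.append(letter == stimuli[i - n])
    return targets


def score_responses(targets: List[Optional[bool]], responses: List[bool]) -> List[bool]:
    """Compare participant answers (True means "match") with the targets.

    Trials without a defined target are skipped; the result is a sequence of
    correct (True) / incorrect (False) outcomes.
    """
    return [resp == tgt for tgt, resp in zip(targets, responses) if tgt is not None]


# Hypothetical 6-letter sequence, for illustration only.
letters = ["K", "K", "T", "H", "K", "T"]
one_back = nback_targets(letters, 1)
three_back = nback_targets(letters, 3)

# Hypothetical participant answers for the 1-back case.
answers = [False, True, False, True, False, False]
outcomes = score_responses(one_back, answers)
```

The same letter sequence yields different targets for 1-back and 3-back, which is how the two block types differ in memory load while sharing the same stimulus format.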
The duration of each session was 964 seconds, which is approximately 16 minutes. After each session, there was a 2-minute resting segment in which the participant was allowed to relax. A relaxation cross was displayed on the screen during this time.
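The block and session durations can be checked with a short calculation. The sketch below assumes that the 10-second rest follows every one of the 16 blocks and that the 20-second mid-session rest is added on top; this reading is our inference, chosen because it reproduces the stated 964 seconds. The constant names are illustrative.

```python
# Timing parameters taken from the task description.
INSTRUCTION_S = 5.0
STIMULUS_S = 0.5
RESTING_CROSS_S = 1.5
TRIALS_PER_BLOCK = 22
BLOCKS_PER_SESSION = 16
REST_AFTER_BLOCK_S = 10.0
MID_SESSION_REST_S = 20.0

# One block: instruction plus 22 trials of (stimulus + resting cross).
block_s = INSTRUCTION_S + (STIMULUS_S + RESTING_CROSS_S) * TRIALS_PER_BLOCK

# One session: 16 blocks, each followed by a 10 s rest, plus the 20 s
# rest at the halfway mark (assumed to be additive).
session_s = BLOCKS_PER_SESSION * (block_s + REST_AFTER_BLOCK_S) + MID_SESSION_REST_S

print(f"{block_s:.0f} s per block, {session_s:.0f} s per session")
```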
The collected signals in this experiment include skin conductance (SC), electrocardiogram (ECG), skin surface temperature (SKT), respiration, photoplethysmography (PPG), functional near-infrared spectroscopy (fNIRS), electromyogram (EMG), de-identified facial expression scores, and behavioral data (i.e., sequence of correct/incorrect responses and reaction time). We include a list of collected signals and employed tools (e.g., sensors, software) below:
- The MP160 Biopac system and the BioNomadix wireless devices were used to collect Electrocardiogram (ECG) [11-12];
- The belt sensor of the MP160 Biopac system was employed to capture the contraction and expansion of the lungs (respiration);
- Skin temperature data were collected with the MP160 Biopac system, using the BioNomadix wearable device coupled with the BN-TEMP-A-XDCR Biopac sensor, and with the Empatica E4 wearable wristband [13] worn by the participant;
- Sensors from both the MP160 Biopac system and Empatica E4 wearable wristband were used to record Electrodermal Activity (EDA);
- The wearable physiological sensor BN-PULSE-XDCR coupled with a BioNomadix unit was used to obtain Photoplethysmography (PPG) data with the MP160 Biopac system. The Empatica E4 wearable wristband also collected PPG data;
- Sensors from the MP160 Biopac system were used for Electromyogram (EMG) recordings;
- Functional Near Infrared Spectroscopy (fNIRS) data were recorded using NIRSport 2 [14-15];
- The facial expression scores were obtained using Face Reader software [16];
- The experimental design, timing, and triggers for the different equipment were executed using the Chronos input device and E-Prime software, and the behavioral signals were stored [17].
The experiment originally had 11 subjects/participants. However, not every sensor measurement could be collected for every subject. Subjects/participants with too few recorded modalities were therefore removed altogether. The table below shows which subject data are included in the dataset:
| Subject # | Biopac | Empatica | Face Reader | NIRSport 2 | Chronos and E-Prime |
|---|---|---|---|---|---|
| 1 | | | | | |
| 2 | | | | | |
| 3 | ✔ | ✔ | ✔ | ✔ | ✔ |
| 4 | ✔ | ✔ | ✔ | ✔ | ✔ |
| 5 | | | | | |
| 6 | ✔ | ✔ | ✔ | ✔ | |
| 7 | | | | | |
| 8 | ✔ | ✔ | ✔ | ✔ | ✔ |
| 9 | | | | | |
| 10 | | | | | |
| 11 | ✔ | ✔ | ✔ | ✔ | ✔ |
De-identification:
To ensure the confidentiality of the participants' data, protect the participants' health data, and avoid violating participants' privacy while performing this research, we followed the guidance provided by the Health Insurance Portability and Accountability Act (HIPAA). More specifically, we avoid sharing the 18 HIPAA identifiers as follows:
- We assign a de-identified subject ID and avoid sharing the names of participants.
- The elements of dates (except year) directly related to the participant, such as the participant's date of birth and the exact date of the data collection, are not shared. Specifically, to de-identify the time data, we uniformly shifted all time stamps of the Empatica and Face Reader recordings with respect to unix time, preserving the anonymity and confidentiality of the data. The timestamps of the fNIRS and Biopac recordings are provided in a de-identified format in which the start of the recordings is set to 0, and the trigger information and timings are given accordingly.
- We avoid sharing the other information listed among the 18 HIPAA identifiers, such as telephone number, fax number, social security number, medical record number, certificate/license number, health plan beneficiary number, account number, email address, vehicle identifiers, biometric identifiers, photo with full-face visibility, IP address, device and digital identifiers, and other identifying numbers or codes.
- Since the face recordings can be considered identifiable data, we do not share them, and we only make the de-identified facial expression scores obtained from the Face Reader software (Noldus [16]) publicly available.
- The Biopac data has its own timestamp, and it can be synchronized with Facial expression scores timing.
Data Description
The data are stored in multiple folders:
- Behavioral data processed with E-Prime software are stored in the "Behavioral_data" folder. The folder includes behavioral signals and the Chronos keypad recordings during the Calming and Vexing sessions:
	- The file names are encoded as: {"name of the session", "Subject number", ".csv"}
 
- Physiological data collected via Biopac are stored in the "Biopac_data" folder. The folders include raw physiological signals recorded via the Biopac configuration for all the studied subjects.
	- The recorded signals include:
		- Electrodermal Activity (EDA) stored in "EDA" folder
		- Electrocardiogram (ECG) stored in "ECG" folder
		- Photoplethysmography (PPG) stored in "PPG" folder
		- Respiration (RESP) stored in "RESP" folder
		- Electromyogram (EMG) stored in "EMG" folder
		- Data saved as {"Subject number", "gender_", "type of data (e.g., EDA, ECG)"} in CSV format.
	- The Timing folder includes the trigger information with respect to the onset of the Biopac recording:
		- for each block saved as: {"Subject number", "gender","_Triggers_block.csv"}
		- for each trial saved as: {"Subject number", "gender","_Triggers_trial.csv"}
	- The raw Biopac data have a sampling frequency of 2 kHz, and recordings start from 0.
		
- Facial expression scores processed with the Face Reader software (Noldus [16]) are stored in the "Face_reader_data" folder. The facial expression scores for each subject can be found in the subject's folder (e.g., subject_3F). In the subject's folder:
	- The videos were taken in series, and their time stamps are given.
	- Each video has a *_state.txt file and a *_detailed.txt file, containing the most probable state and the probability distribution over states for each timestamp, respectively.
	- The onset of the first video is available in the de-identified time with respect to unix time.
	- Videos 2, 3, ... each start right after the previous video.
	- Please use the onset of the first video to align the data with the Biopac data.
 
- fNIRS data were collected via NIRSport 2, processed with Nirslab software [15], and stored in the "fNIRS_data" folder.
	- The processed total hemoglobin (tothb), oxygenated hemoglobin (ohb), deoxygenated hemoglobin (dohb), and saturated oxygen level (O2) data in millimole (mmol) for each subject and 44 channels are available through folders in CSV format.
		- data saved as: {"Subject number", "gender_", "type of data (e.g., dohb)"}
	- The recording start time is 0 and the sampling frequency is 7.6294 Hz.
	- The trigger information and session timings with respect to the onset of the fNIRS recording are available in Excel files for each subject, and that data is saved as: {"Subject number", "gender_Triggers.csv"}
	- The timing of triggers can be used for synchronization with the Biopac data.
		
- Physiological signals collected via the Empatica device are stored in "Empatica_data" in the ".csv" format. The data collected via the Empatica device for each subject can be found in the subject's folder (e.g., "Subject_3F"). In the subject's folder, the .csv files are in the following format:
	- The first row is the initial time of the session expressed as a de-identified timestamp with respect to unix timestamp in UTC. The second row is the sample rate expressed in Hz.
		- TEMP.csv: Data from the temperature sensor expressed in degrees Celsius (°C).
		- EDA.csv: Data from the electrodermal activity sensor expressed in microsiemens (μS).
		- BVP.csv: Data from the photoplethysmograph.
		- ACC.csv: Data from the 3-axis accelerometer sensor. The accelerometer is configured to measure acceleration in the range [-2g, 2g]. Therefore, the unit in this file is 1/64g. Data from the x, y, and z axes are in the first, second, and third columns, respectively.
		- IBI.csv: Time between the individual's heartbeats extracted from the BVP signal. No sample rate is needed for this file. The first column is the time (with respect to the initial time) of the detected inter-beat interval expressed in seconds (s). The second column is the duration in seconds (s) of the detected inter-beat interval (i.e., the distance in seconds from the previous beat).
		- HR.csv: Average heart rate extracted from the BVP signal. The first row is the initial time of the session expressed as a unix timestamp in UTC. The second row is the sample rate expressed in Hz.
		- tags.csv: Event mark times. Each row corresponds to a physical button press on the device, at the same time as the status LED is first illuminated. The time is expressed as a de-identified timestamp with respect to the unix timestamp in UTC, and it is synchronized with the initial time of the session indicated in the related data files from the corresponding session.
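The row layout described for the Empatica signal files (initial time in the first row, sample rate in the second row, samples afterwards) can be read with a few lines of Python. The sketch below is a minimal illustration under that stated layout; the function names `read_e4_signal`, `sample_times`, and `acc_to_g` are hypothetical, the demo values are synthetic, and the layout does not apply to IBI.csv or tags.csv, which carry no sample rate.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_e4_signal(handle: TextIO) -> Tuple[float, float, List[List[float]]]:
    """Parse a signal file: row 1 = initial time, row 2 = sample rate (Hz),
    remaining rows = samples (one column, or three columns for ACC.csv)."""
    rows = [row for row in csv.reader(handle) if row]
    start_time = float(rows[0][0])
    sample_rate = float(rows[1][0])
    samples = [[float(value) for value in row] for row in rows[2:]]
    return start_time, sample_rate, samples


def sample_times(start_time: float, sample_rate: float, count: int) -> List[float]:
    """Timestamp of each sample, on the same (de-identified) clock as start_time."""
    return [start_time + i / sample_rate for i in range(count)]


def acc_to_g(row: List[float]) -> List[float]:
    """Convert one ACC.csv row from units of 1/64 g to g."""
    return [value / 64.0 for value in row]


# Synthetic stand-in for an EDA.csv file; the numbers are made up.
demo = io.StringIO("1000.0\n4.0\n0.10\n0.12\n0.15\n")
start, fs, eda = read_e4_signal(demo)
times = sample_times(start, fs, len(eda))
acc_example = acc_to_g([64.0, 0.0, -32.0])
```

For a real file, the same function can be called on an open file handle, for example `with open(path, newline="") as handle: read_e4_signal(handle)`, where `path` points to a file such as TEMP.csv inside a subject's folder.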
		
Usage Notes
The data has been employed in a publication to investigate the viability of using music as a safe intervention for regulating the cognitive brain states including arousal and performance [1]. In [1], we mainly decode the underlying arousal and performance states from the collected skin conductance and behavioral signals, respectively. While we only utilize a portion of the data to obtain a better insight into brain response to music and cognitive load, we present the complete experimental settings and share the complete multimodal dataset with the research community so that we can provide a framework for any researcher who is interested in performing research in this paradigm. Interested researchers can employ multiple biomarkers simultaneously and obtain a robust understanding of the brain response to music and induced workload.
More specifically, the multimodal data form a rich neuro-physiological dataset that can contribute to the research community and enable other researchers to replicate this study, redesign the experiment, and improve the presented framework by addressing its limitations. These limitations include, but are not limited to, the absence of a control group, the small sample size, subject familiarity with the music, and the lack of any cognitive state score provided by subjects. These limitations may prevent us from drawing general and firm conclusions. Hence, studies with a larger sample size, a control group, newly generated music, and shuffled task difficulty as well as shuffled music sessions are required to generalize the findings.
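As one starting point for working with several modalities together, trigger times given relative to the recording onset can be converted into sample indices using the sampling frequencies stated in the Data Description (2 kHz for Biopac, 7.6294 Hz for fNIRS). The sketch below is illustrative only: the helper names are hypothetical, the signal is synthetic, and the column layout of the trigger files is not assumed, so onset times in seconds are passed in by the caller.

```python
from typing import List, Sequence

BIOPAC_FS_HZ = 2000.0   # raw Biopac sampling frequency
FNIRS_FS_HZ = 7.6294    # fNIRS sampling frequency


def time_to_index(t_s: float, fs_hz: float) -> int:
    """Nearest sample index for a time measured from recording onset (t = 0)."""
    return int(round(t_s * fs_hz))


def extract_epoch(signal: Sequence[float], onset_s: float,
                  duration_s: float, fs_hz: float) -> List[float]:
    """Return the samples between onset_s and onset_s + duration_s."""
    start = time_to_index(onset_s, fs_hz)
    stop = time_to_index(onset_s + duration_s, fs_hz)
    return list(signal[start:stop])


# Synthetic 10-second ramp sampled at the Biopac rate (not real data).
signal = [float(i) for i in range(int(10 * BIOPAC_FS_HZ))]

# One stimulus window (0.5 s) starting at a hypothetical trigger time of 2.0 s.
epoch = extract_epoch(signal, onset_s=2.0, duration_s=0.5, fs_hz=BIOPAC_FS_HZ)

# The same idea applies to fNIRS, e.g. the sample nearest to t = 49 s.
fnirs_index = time_to_index(49.0, FNIRS_FS_HZ)
```

Because the two devices sample at very different rates, cutting each modality with its own sampling frequency and the shared trigger times avoids resampling before the epochs are compared.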
Ethics
The study protocol was reviewed and approved by the institutional review board (IRB) at the University of Houston, TX, USA (STUDY00002013).
Acknowledgements
We would like to extend our gratitude to Dr. Dilranjan S. Wickramasuriya for his valuable contribution during data collection. This work was supported in part by the U.S. National Science Foundation under grants 1942585/2226123 - CAREER: MINDWATCH: Multimodal Intelligent Noninvasive brain state Decoder for Wearable AdapTive Closed-loop arcHitectures and 1755780 - CRII: CPS: Wearable-Machine Interface Architecture.
Conflicts of Interest
Rose T. Faghih and Md. Rafiul Amin are co-inventors of a patent application filed by the University of Houston linked to this research [2].
References
1. Khazaei, S., Parshi, S., Alam, S., Amin, M. R., & Faghih, R. T. (2024). A multimodal dataset for investigating working memory in presence of music: a pilot study. Frontiers in Neuroscience, 18, 1406814.
2. Faghih, R. T., Wickramasuriya, D. S., & Amin, M. R. (2022). Systems and methods for estimating a nervous system state based on measurement of a physiological condition. US Patent App. 17/514,129.
3. Ottonello, M., Fiabane, E., Pistarini, C., Spigno, P., & Torselli, E. (2019). Difficulties in emotion regulation during rehabilitation for alcohol addiction: correlations with metacognitive beliefs about alcohol use and relapse risk. Neuropsychiat. Dis. Treat. 2019, 2917–2925.
4. Fekri Azgomi, H., F. Branco, L. R., Amin, M. R., Khazaei, S., & Faghih, R. T. (2023). Regulation of brain cognitive states through auditory, gustatory, and olfactory stimulation with wearable monitoring. Scientific reports, 13(1), 12399.
5. Ray, K. D., & Mittelman, M. S. (2017). Music therapy: A nonpharmacological approach to the care of agitation and depressive symptoms for nursing home residents with dementia. Dementia, 16(6), 689-710.
6. Yerkes, R. M. (1907). The Dancing Mouse, Vol. 1. New York, NY: Macmillan Company.
7. Yerkes, R. M., & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit-formation. Punishment 1908, 27–41.
8. Wickramasuriya, D. S., Amin, M., & Faghih, R. T. (2019). Skin conductance as a viable alternative for closing the deep brain stimulation loop in neuropsychiatric disorders. Frontiers in neuroscience, 780.
9. Herff, C., Heger, D., Fortmann, O., Hennrich, J., Putze, F., & Schultz, T. (2014). Mental workload during n-back task—quantified in the prefrontal cortex using fNIRS. Frontiers in human neuroscience, 7, 935.
10. Khazaei, S., Amin, M. R., Tahir, M., & Faghih, R. T. (2024). Bayesian Inference of Hidden Cognitive Performance and Arousal States in Presence of Music. IEEE Open Journal of Engineering in Medicine and Biology.
11. BIOPAC Systems, Inc. BIOPAC Systems [Internet]. Goleta (CA): BIOPAC Systems, Inc.; [cited 2025 Jan 14]. Available from: https://www.biopac.com
12. BIOPAC Systems, Inc. AcqKnowledge License Pack: Scripting, NDT, BHAPI, ACKAPI [Internet]. Goleta (CA): BIOPAC Systems, Inc.; [cited 2025 Jan 14]. Available from: https://www.biopac.com/product/acqknowledge-plus-scripting-ndt-bhapi-ackapi/
13. Empatica Inc. Empatica [Internet]. Boston (MA): Empatica Inc.; [cited 2025 Jan 14]. Available from: https://www.empatica.com
14. NIRx Medical Technologies, LLC. NIRx Medical Technologies [Internet]. Glen Head (NY): NIRx Medical Technologies, LLC; [cited 2025 Jan 14]. Available from: https://nirx.net
15. NIRx Medical Technologies, LLC. Aurora software [Internet]. Glen Head (NY): NIRx Medical Technologies, LLC; [cited 2025 Jan 14]. Available from: https://nirx.net/software
16. Noldus Information Technology. FaceReader [Internet]. Wageningen (Netherlands): Noldus Information Technology; [cited 2025 Jan 14]. Available from: https://www.noldus.com/facereader
17. Psychology Software Tools, Inc. E-Prime software [Internet]. Sharpsburg (PA): Psychology Software Tools, Inc.; [cited 2025 Jan 14]. Available from: https://pstnet.com/eprime
Access
Access Policy: Anyone can access the files, as long as they conform to the terms of the specified license.
License (for files): Open Data Commons Attribution License v1.0
            
Discovery
DOI (version 1.0.0): https://doi.org/10.13026/6vh4-dk68
DOI (latest version): https://doi.org/10.13026/gr11-3d06
              
Files
Total uncompressed size: 1.6 GB.
Access the files
- Download the ZIP file (301.8 MB)
- Download the files using your terminal: wget -r -N -c -np https://physionet.org/files/multimodal-nback-music/1.0.0/
- To download the files using AWS command line tools, first configure your AWS credentials.
| Name | Size | Modified | 
|---|---|---|
| Behavioral_data | ||
| Biopac_data | ||
| Empatica_data | ||
| Face_reader_data | ||
| fNIRS_data | ||
| Description.pdf (download) | 101.4 KB | 2024-11-12 | 
| LICENSE.txt (download) | 19.9 KB | 2025-01-14 | 
| SHA256SUMS.txt (download) | 16.2 KB | 2025-02-26 |
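After downloading, the files can be checked against SHA256SUMS.txt. The sketch below assumes that each line of that file has the common `sha256sum` layout, a hex digest followed by a relative path; this layout is an assumption and has not been confirmed against the dataset. The function names are hypothetical, and the demonstration runs on a temporary directory rather than on the real download.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List, Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large recordings need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_line(line: str) -> Tuple[str, str]:
    """Split one '<hex digest> <relative path>' line (assumed layout)."""
    digest, name = line.strip().split(maxsplit=1)
    return digest.lower(), name.lstrip("*")


def find_mismatches(root: Path, sums_file: Path) -> List[str]:
    """Return the listed paths whose digest differs from the recorded one.

    A listed file that is missing on disk raises FileNotFoundError.
    """
    mismatches = []
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = parse_checksum_line(line)
        if sha256_of(root / name) != expected:
            mismatches.append(name)
    return mismatches


# Self-contained demonstration on a temporary directory (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "demo.csv").write_bytes(b"1,2,3\n")
    good = sha256_of(root / "demo.csv")
    (root / "good_sums.txt").write_text(f"{good}  demo.csv\n")
    (root / "bad_sums.txt").write_text("0" * 64 + "  demo.csv\n")
    ok_result = find_mismatches(root, root / "good_sums.txt")
    bad_result = find_mismatches(root, root / "bad_sums.txt")
```

For the downloaded dataset, `find_mismatches` would be called with the dataset root and the path to SHA256SUMS.txt; an empty list means every listed file matched its recorded digest.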