Name: Multimodal Physiological Monitoring During Virtual Reality Piloting Tasks
Published: Aug. 25, 2022
License: https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

Database Restricted Access

Hrishikesh Rao , Emilie Cowen , Sophia Yuditskaya , Laura Brattain , Jamie Koerner , Gregory Ciccarelli , Ronisha Carter , Vivienne Sze , Tamara Broderick , Hayley Reynolds , Kyle McAlpin , Thomas Heldt

Version: 1.0.0

Multimodal Physiological Monitoring During Virtual Reality Piloting Tasks: CogPilot Data Challenge (Sept. 8, 2022, 9:54 a.m.)

We are pleased to announce the publication of a dataset comprising multimodal physiologic, flight performance, and user interaction data streams, collected as participants performed virtual-reality flight tasks of varying difficulty.

With over an hour of highly multimodal physiological and behavioral signals collected on each of the thirty-five participants, the dataset represents a unique opportunity to develop analytics and models linking an individual’s physiology to their behavior and performance in tasks of varying difficulty.

More data are being collected and will be uploaded to PhysioNet periodically. The data underpin the CogPilot Data Challenge, which explores how performance measurements and physiological data can be used to assess the competency of student pilots. To participate in the CogPilot Challenge, visit: https://pilotperformance.mit.edu/

When using this resource, please cite:

MLA	Rao, Hrishikesh, et al. "Multimodal Physiological Monitoring During Virtual Reality Piloting Tasks" (version 1.0.0). PhysioNet (2022), https://doi.org/10.13026/azwa-ge48.
APA	Rao, H., Cowen, E., Yuditskaya, S., Brattain, L., Koerner, J., Ciccarelli, G., Carter, R., Sze, V., Broderick, T., Reynolds, H., McAlpin, K., & Heldt, T. (2022). Multimodal Physiological Monitoring During Virtual Reality Piloting Tasks (version 1.0.0). PhysioNet. https://doi.org/10.13026/azwa-ge48.
Chicago	Rao, Hrishikesh, Cowen, Emilie, Yuditskaya, Sophia, Brattain, Laura, Koerner, Jamie, Ciccarelli, Gregory, Carter, Ronisha, Sze, Vivienne, Broderick, Tamara, Reynolds, Hayley, McAlpin, Kyle, and Thomas Heldt. "Multimodal Physiological Monitoring During Virtual Reality Piloting Tasks" (version 1.0.0). PhysioNet (2022). https://doi.org/10.13026/azwa-ge48.
Harvard	Rao, H., Cowen, E., Yuditskaya, S., Brattain, L., Koerner, J., Ciccarelli, G., Carter, R., Sze, V., Broderick, T., Reynolds, H., McAlpin, K., and Heldt, T. (2022) 'Multimodal Physiological Monitoring During Virtual Reality Piloting Tasks' (version 1.0.0), PhysioNet. Available at: https://doi.org/10.13026/azwa-ge48.
Vancouver	Rao H, Cowen E, Yuditskaya S, Brattain L, Koerner J, Ciccarelli G, Carter R, Sze V, Broderick T, Reynolds H, McAlpin K, Heldt T. Multimodal Physiological Monitoring During Virtual Reality Piloting Tasks (version 1.0.0). PhysioNet. 2022. Available from: https://doi.org/10.13026/azwa-ge48.

Please include the standard citation for PhysioNet:

APA	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
MLA	Goldberger, A., et al. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
CHICAGO	Goldberger, A., L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
HARVARD	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P.C., Mark, R., Mietus, J.E., Moody, G.B., Peng, C.K. and Stanley, H.E., 2000. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
VANCOUVER	Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

This dataset includes multimodal physiologic, flight performance, and user interaction data streams, collected as participants performed virtual flight tasks of varying difficulty. In virtual reality, individuals flew an "Instrument Landing System" (ILS) protocol, in which they had to land an aircraft relying mostly on the cockpit instrument readings. Participants were presented with four levels of difficulty, which were generated by varying wind speed, turbulence, and visibility. Each participant performed 12 runs in a single experimental session, split into 3 blocks of 4 consecutive runs, with one run at each difficulty level per block. The sequence of difficulty levels was presented in a counterbalanced manner across blocks. Flight performance was quantified as a function of horizontal and vertical deviation from an ideal path towards the runway as well as deviation from the prescribed ideal speed of 115 knots. Multimodal physiological signals were aggregated and synchronized using Lab Streaming Layer. Descriptions of data quality are provided to assess each data stream. The starter code provides examples of loading and plotting the time-synchronized data streams, extracting sample features from the eye tracking data, and building models to predict pilot performance from the physiology data streams.

Background

Unobtrusive and real-time detection of neurocognitive decline is an open challenge. Declining cognitive alertness impacts operating professions (e.g., pilots, air traffic controllers, medical professionals) in which human errors can have catastrophic consequences. While evaluation of neurocognitive decline through dedicated challenge tasks is a common approach in laboratory or field studies, such a stimulus-response paradigm requires active participation, takes time and attention away from a professional's natural tasks, and usually results in poor compliance [1,2]. Reliable, robust, and unobtrusive technologies to determine neurocognitive status in operational or training environments remain a critical technology need.

The focus of the current study was on predicting performance in simulated flight tasks from physiological signals. The central hypothesis was that information contained in individuals' physiological responses, in addition to traditional flight performance metrics, can provide insight into their cognitive state, and therefore aid in predicting flight performance. The affective computing and gaming literature has shown that if the difficulty of a game is matched to the competence of the player, the player is in a state of "flow," or positive excitement, conducive to effective learning. However, if there is a mismatch between the difficulty and the competence of the player, they may enter either a state of anxiety or boredom [3]. Further, deviations into anxiety or boredom are correlated with changes in physiology, including changes in heart rate, pupillary dilation, respiration rate, galvanic skin response, and more [4]. Therefore, the relationship between difficulty and competence, as assessed via physiological responses, can be exploited to optimally titrate the levels of difficulty. These data enable the development of algorithms that relate physiology to performance and can build the bridge towards optimal difficulty manipulation for effective learning.

Methods

Experimental Protocol: Participants performed repeated runs of a simulated ILS procedure. Starting from the same point every run, they had to use the instruments on the cockpit instrument panel to guide the aircraft to the runway along as ideal a path as possible. There were four levels of difficulty, in which the wind speed, turbulence, and visibility were varied, as noted in the table below. Each participant performed 3 blocks, where each block included 4 runs corresponding to one run per difficulty. Therefore, each participant performed 12 runs total. Within each block, the difficulty levels were presented in a counterbalanced manner. The ordering of difficulty levels was the same for all participants. Each run typically lasted 8 min but could vary based on the speed at which the participant flew or whether they crashed earlier. Each participant completed a 5-min rest period before the first run and another after the last run. During the rest periods, they sat comfortably and relatively motionless. A typical experimental session lasted 4 hours, including setup and breaks. Throughout the experiment, a set of wearable sensors was used to collect multimodal physiological data. All sensor data were collected for all runs and rest periods.

ILS Difficulty	Wind	Clouds/Visibility	Turbulence
1	None	No clouds; visibility unlimited	None
2	140° @ 10 kts	Overcast above 2500’; visibility 5 statute miles	None
3	3000’: 80° @ 10 kts; 1000’: 140° @ 10 kts	Overcast above 1000’; visibility 3 statute miles	Light above 1000’
4	4800’: 250° @ 10 kts; 2800’: 80° @ 15–20 kts; 800’: 200° @ 15 kts	Overcast above 400’; visibility 1 statute mile	Light

Experimental Setup: Seated comfortably, participants wore an HTC Vive Pro Eye headset that displayed the virtual environment. The flight software was X-Plane 11 (ver11.50; Laminar Research, USA). To control the simulated T-6 Texan II aircraft (developed by Flite Advantage Simulation & Training, LLC, Florida, USA), volunteers manipulated a physical joystick, a throttle, and rudder pedals (Thrustmaster HOTAS Warthog flight stick, dual throttle, and pendular rudder controllers; Thrustmaster, USA). The table below provides an overview of the multimodal physiological sensors that were used. In addition, participant interactions with the joystick, throttle, and rudder were captured. Finally, virtual aircraft behavior was stored and used to determine instantaneous and cumulative flight performance. The aircraft behavior included 6-degrees-of-freedom motion (i.e., latitude, longitude, elevation, yaw, pitch, roll), several indicators of speed (e.g., knots indicated airspeed), trims (i.e., elevator, aileron), and ILS instrument readings. For details on accessing all aircraft data, refer to Table 8-2 in the Data Dictionary, which is included in the reference documentation.

Modality	Location	Device	Nominal Sampling Rate
Electrocardiography	Chest	Shimmer3 ECG Unit	512 Hz
Respiration	Chest	Shimmer3 ECG Unit	512 Hz
Respiration	Chest	Respitrace	1024 Hz
Accelerometry	Chest	Shimmer3 ACC Unit	128 Hz
Accelerometry	Right Forearm	Shimmer3 ACC Unit	128 Hz
Electromyography	Right Forearm	Shimmer3 ACC Unit	128 Hz
Electrodermal Activity	Left Middle Fingertip	Shimmer3 GSR+ Unit	1024 Hz
Photoplethysmography	Left Hand Fingers	Shimmer3 GSR+ Unit	1024 Hz
Eye Tracking	Face	HTC Vive Pro Eye	250 Hz
Head Movement	Head	X-Plane 11 Simulator	4 Hz

Data Synchronization: Lab Streaming Layer (LSL; ver113) was used to aggregate and synchronize the data [5,6]. LSL is a library of tools that enables the unified collection of time series data and handles time synchronization along a common time axis across independently collected data streams. Each datum from each sensor is marked with a time stamp that is common and synchronized across all streams. LSL derives the timestamps from the local high-resolution clock of the data acquisition computer with a temporal resolution better than milliseconds [7]. LSL also handles clock offsets between devices and network packet exchange delays and provides for offline jitter correction.
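Because every stream shares one time axis but is sampled at its own rate, analyses that combine streams typically need to resample one stream onto another's timestamps. The sketch below illustrates one simple option, nearest-timestamp matching, in plain Python. It does not use the LSL library, and the function names (nearest_index, align_to_reference) are hypothetical helpers introduced only for this illustration. It assumes the timestamps of the stream being resampled are sorted in ascending order and non-empty.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Return the index of the timestamp in sorted_times closest to t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    # Ties go to the earlier sample.
    return i if (after - t) < (t - before) else i - 1


def align_to_reference(ref_times, other_times, other_values):
    """For each reference timestamp, pick the nearest sample of the other stream."""
    return [other_values[nearest_index(other_times, t)] for t in ref_times]


# Toy example: a slow stream resampled onto a faster reference time axis.
slow_times = [0.0, 1.0, 2.0]
slow_values = ["a", "b", "c"]
fast_times = [0.1, 0.9, 1.6, 5.0]
aligned = align_to_reference(fast_times, slow_times, slow_values)
```

Nearest-sample matching keeps the original values untouched; interpolation may be more appropriate for smoothly varying signals, and that choice is left to the analyst.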

Measures of Flight Performance: There are three standard components to the evaluation of flight performance during an ILS approach. The first is the deviation away from the ideal horizontal path leading towards the runway. The second is the deviation away from the ideal vertical path leading to the runway. In ILS parlance, these are called the Localizer (horizontal) and Glide Slope (vertical) lines, respectively. Note that a –3 degree descent is ideal for Glide Slope. Effectively, the errors for the Localizer and Glide Slope are the angular deviations away from the ideal descent trajectory. In our calculations of flight performance, any absolute error greater than 3 degrees is capped at 3 degrees, thereby creating a scale of –3 to 3 for the angular errors. The third component is the absolute deviation from the ideal speed of 115 knots. Dividing the airspeed error by 10 maps airspeed errors of up to 30 knots onto the same –3 to 3 range; scaled speed errors with a magnitude greater than 3 are also capped at 3. The instantaneous error is computed for each time point. The total error for a trial is computed as the sum of the magnitudes of deviation in each of the three components, divided by the total duration of the trial, yielding a single error value for each trial.
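The error definition above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the released compute_flight_performance.m code: the function names are hypothetical, the inputs are assumed to be the Localizer and Glide Slope angular errors in degrees and the airspeed in knots, and the trial duration is assumed to be supplied in seconds.

```python
IDEAL_SPEED_KNOTS = 115.0  # prescribed ideal approach speed
ERROR_CAP = 3.0            # every component is limited to the range [-3, 3]
SPEED_SCALE = 10.0         # knots of speed error per unit of scaled error


def clip(value, limit=ERROR_CAP):
    """Limit a signed error to the range [-limit, limit]."""
    return max(-limit, min(limit, value))


def instantaneous_error(localizer_deg, glide_slope_deg, airspeed_knots):
    """Sum of the capped magnitudes of the three error components."""
    localizer = abs(clip(localizer_deg))
    glide_slope = abs(clip(glide_slope_deg))
    speed = abs(clip((airspeed_knots - IDEAL_SPEED_KNOTS) / SPEED_SCALE))
    return localizer + glide_slope + speed


def total_error(samples, duration_s):
    """Sum of instantaneous errors over a trial, divided by its duration.

    samples: iterable of (localizer_deg, glide_slope_deg, airspeed_knots).
    duration_s: trial duration; the unit (seconds) is an assumption.
    """
    return sum(instantaneous_error(*s) for s in samples) / duration_s


# Toy values, not taken from the dataset.
example = instantaneous_error(5.0, -1.0, 135.0)  # 3 (capped) + 1 + 2
```

For instance, a 5 degree Localizer error is capped at 3, a –1 degree Glide Slope error contributes 1, and flying 20 knots too fast contributes 2, giving an instantaneous error of 6.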

Data Description

Data from 35 volunteers are included in the first version of this dataset. Each volunteer received a de-identified ID upon providing written informed consent to participate in this study. Broadly, the data fall into two categories: the collected time series data, and data derived from those time series.

The data for each participant are structured in folder hierarchies corresponding to participant ID, then date, and then run number. Time series data are stored within each "run" folder. Files are labeled in the format: sub-cpXXX_ses-YYYYMMDD_task-XXXX_stream-XXXX_feat-chunk_level-XXX_run-XXX_dat.csv. For each time series file, there is an associated metadata file, labeled in the format: sub-cpXXX_ses-YYYYMMDD_task-XXXX_stream-XXXX_feat-chunk_level-XXX_run-XXX_hea.csv. The task type can either be "rest," corresponding to the 5-min periods at the start and end of the experiment wherein the participant sat still, or "ils," corresponding to trials wherein they performed the ILS scenario. The stream names identify the data streams collected in LSL. The level identifiers denote the experimental difficulty, ranging from level 1 (easiest) through level 4 (hardest). Metadata for each data stream (e.g., nominal and effective sampling rates, and sample counts) are stored in hidden files that share the name of the raw data file but are prefixed with a period.
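The naming convention lends itself to programmatic parsing. The following sketch splits a file name into its labeled fields with a regular expression. The pattern is inferred from the format string above rather than taken from the starter code, the function name is hypothetical, and the example file name uses made-up field values (subject, date, stream, level, and run) purely for illustration.

```python
import re

# Pattern inferred from the documented format; field widths are left open
# because the exact values used in the dataset are not assumed here.
FILENAME_PATTERN = re.compile(
    r"^(?P<hidden>\.?)"
    r"sub-(?P<subject>[^_]+)"
    r"_ses-(?P<session>\d{8})"
    r"_task-(?P<task>[^_]+)"
    r"_stream-(?P<stream>[^_]+)"
    r"_feat-(?P<feat>[^_]+)"
    r"_level-(?P<level>[^_]+)"
    r"_run-(?P<run>[^_]+)"
    r"_(?P<kind>dat|hea)\.csv$"
)


def parse_filename(name):
    """Return the fields of a data or metadata file name, or None if no match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # A leading period marks a hidden file.
    fields["hidden"] = fields["hidden"] == "."
    return fields


# Hypothetical file name; none of these field values come from the dataset.
fields = parse_filename(
    "sub-cp001_ses-20210101_task-ils_stream-examplestream"
    "_feat-chunk_level-01_run-001_dat.csv"
)
```

Grouping parsed names by (subject, session, run) is then a convenient way to collect all streams that belong to one run.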

Flight performance data were derived from the raw time series aircraft behavior data. Time-varying flight performance data are included for each run of the ILS. The files follow the naming convention: sub-cpXXX_ses-YYYYMMDD_task-ils_stream-lslxp11_feat-perfmetric_level-XXX_run-XXX.csv. The time stamps correspond to the samples within the X-Plane 11 aircraft (i.e., stream-lslxp11xpac) data stream. A single summary file of the cumulative errors for every run of the ILS is included. The file, located within the task-ils sub-folder, is named "PerfMetrics.csv". As noted below, the compute_flight_performance.m code details the derivation of flight performance data from the raw aircraft behavior data.

Usage Notes

Data Quality: To quantify data quality, the amount of missing data, the duration of the run, and amount of data close to the nominal sampling rate were computed. Visualization of the data quality is included for every file and data stream. The visualization includes a time series plot showing the effective sampling rate of each data point and marking periods of low data quality (i.e., data points where the sampling rate dropped below the 5% threshold). The summary table provides a high-level view of data quality, whereas the time series data provides a finer-grained view of where, within a given trial, there were potentially missed packets of data. Note that in the summary table, cells highlighted in grey indicate that the file is missing for that run.
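As a rough illustration of this kind of check, the sketch below derives an effective sampling rate from consecutive timestamps and flags intervals where it falls short of the nominal rate. The function names are hypothetical, and the interpretation of the threshold (an effective rate more than 5% below nominal) is an assumption made for this example; the provided data quality files remain the authoritative source.

```python
def effective_rates(timestamps_s):
    """Effective sampling rate (Hz) for each interval between samples."""
    rates = []
    for t0, t1 in zip(timestamps_s, timestamps_s[1:]):
        dt = t1 - t0
        # Repeated or out-of-order timestamps are reported as an infinite rate.
        rates.append(1.0 / dt if dt > 0 else float("inf"))
    return rates


def low_quality_mask(timestamps_s, nominal_hz, tolerance=0.05):
    """True where the effective rate is more than `tolerance` below nominal.

    The 5% default and its meaning are assumptions made for illustration.
    """
    floor = nominal_hz * (1.0 - tolerance)
    return [rate < floor for rate in effective_rates(timestamps_s)]


# Toy 4 Hz stream with one dropped sample between 0.5 s and 1.0 s.
times = [0.0, 0.25, 0.5, 1.0, 1.25]
mask = low_quality_mask(times, nominal_hz=4.0)
```

In this toy stream only the third interval is flagged, because the gap of 0.5 s corresponds to an effective rate of 2 Hz.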

Starter Code: The starter code, written in Python as Jupyter notebooks, is broken into three sequential steps. The steps cover data loading and visualization, signal processing and feature extraction, and finally machine learning prediction of performance from features. The starter code is meant to be an example and not comprehensive coverage of the full set of signals and subsequent analyses.

In the first step, Step1_LoadExplore_TimeSeries.ipynb, signals are loaded and plotted. Example physiological signals include electrocardiography, electromyography, and pupil diameter, among others. The aircraft data visualized are the latitude, longitude, and elevation, which are a small subset of the 18 aircraft-related signals recorded from X-Plane 11. To show the derived performance signal, the three component errors and the total instantaneous error are plotted. Note that all data are plotted against MATLAB datenum time. In the last portion of the file, a few signals are plotted synchronously in their absolute time (i.e., "wall clock" time).
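For readers working outside MATLAB, a datenum value can be converted to a Python datetime with the standard library alone. The sketch below assumes the time values are standard MATLAB serial date numbers (days counted from year 0, with the time of day as the fractional part); the function name is hypothetical and no time zone handling is attempted.

```python
from datetime import datetime, timedelta

# MATLAB day numbers are offset by 366 days from Python's proleptic
# Gregorian ordinals (MATLAB counts from year 0, Python from year 1).
MATLAB_ORDINAL_OFFSET_DAYS = 366


def datenum_to_datetime(datenum):
    """Convert a MATLAB serial date number to a naive datetime."""
    whole_days = int(datenum)
    day_fraction = datenum - whole_days
    return (
        datetime.fromordinal(whole_days)
        + timedelta(days=day_fraction)
        - timedelta(days=MATLAB_ORDINAL_OFFSET_DAYS)
    )


# 719529 is the MATLAB datenum of 1 January 1970.
epoch = datenum_to_datetime(719529.0)
```

Sub-millisecond detail can be lost to floating-point rounding at this magnitude, so differences between timestamps are better computed on the raw values before conversion.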

In the second step, Step2_FeatureGeneration.ipynb, features are generated from the eye tracking data stream. These features include the power spectral density of eye movements, fixation and saccade rates, and pupil diameter. The features are aggregated and written to a CSV file to be used in the third step.

In the third step, Step3_PredictiveModeling.ipynb, the eye tracking features are used to classify the run difficulty and to estimate total flight performance error. A sample machine learning pipeline is set up wherein 25% of the data is held out, the model is trained on the held-in data, and predictions are made on the held-out data. A support vector machine (SVM) is used to classify the difficulty levels, focusing only on the extremes of the easiest vs. the hardest difficulty, and a linear regression is used to predict the performance. These results do not necessarily set the benchmark for prediction performance. Rather, the code is meant to serve as an example for handling the data.
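The hold-out idea can be illustrated without any machine learning library. The sketch below is not the notebook's pipeline: it covers only the 25% hold-out and a one-feature least-squares regression, omits the SVM classifier, uses hypothetical function names, and runs on synthetic toy values rather than dataset features.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and return (held_in, held_out) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic (feature, error) pairs on an exact line; not real data.
rows = [(float(x), 2.0 * x + 1.0) for x in range(12)]
held_in, held_out = holdout_split(rows)
slope, intercept = fit_line(held_in)
predictions = [slope * x + intercept for x, _ in held_out]
```

With repeated runs per participant, splitting by participant rather than by run may give a more honest estimate of how a model generalizes to new individuals; the sketch above does not address this.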

Flight Performance Code: Though the flight performance data are provided for the entire dataset, the code to compute the derived data, compute_flight_performance.m, is also provided in the starter code page. The code computes instantaneous and cumulative errors and saves a file per run, as well as a single file that summarizes the cumulative errors for all runs.

Participant Piloting Experience: Included in the reference documentation are data documenting each participant's flight experience prior to participating in the study (see ParticipantPriorFlightExperience.pdf). Participants reported their experience with both fixed-wing aircraft and rotorcraft (i.e., helicopters) and whether those hours were spent in a real airframe or a simulator.

Release Notes

Initial release version 1.0

Ethics

Data collection was approved by MIT's Committee on the Use of Humans as Experimental Subjects (COUHES) and by the Air Force Human Research Protection Office (HRPO). Participating subjects provided written informed consent prior to participation and consented to have their de-identified data made publicly available. The data were reviewed and approved for public release by the Secretary of the Air Force Public Affairs Office (SAF/PA).

Acknowledgements

DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited. Research was sponsored by the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

Dr. Gregory Ciccarelli's contributions were made during his tenure at the MIT Lincoln Laboratory.

Conflicts of Interest

The authors have no conflicts of interest.

References

1. Wickens, C. D. (2002). Multiple resources and performance prediction. Theoretical issues in ergonomics science, 3(2), 159-177.
2. Cain, B. (2007). A review of the mental workload literature. Technical report, Defence Research and Development Toronto, Canada.
3. Landhäußer, A., & Keller, J. (2012). Flow and its affective, cognitive, and performance-related consequences. In Advances in flow research (pp. 65-85). Springer, New York, NY.
4. Kotsia, I., Zafeiriou, S., & Fotopoulos, S. (2013). Affective gaming: A comprehensive survey. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 663-670).
5. Kothe, C., Medine, D., Boulay, C., Grivich, M., & Stenner, T. (2014). Lab streaming layer. URL https://github.com/sccn/labstreaminglayer.
6. Smalt, C., Rao, H., & Ciccarelli, G. (2021). Signal Acquisition Modules for Lab Streaming Layer: v1.0 (Version v1.0). Zenodo. http://doi.org/10.5281/zenodo.4612264.
7. Kothe, C., Medine, D., Boulay, C., Grivich, M., & Stenner, T. (2014). Lab streaming layer. URL https://labstreaminglayer.readthedocs.io/info/time_synchronization.html Accessed on [8-24-2022]