Database Restricted Access

Multimodal Physiological Monitoring During Virtual Reality Piloting Tasks

Hrishikesh Rao Emilie Cowen Sophia Yuditskaya Laura Brattain Jamie Koerner Gregory Ciccarelli Ronisha Carter Vivienne Sze Tamara Broderick Hayley Reynolds Kyle McAlpin Thomas Heldt

Published: Aug. 25, 2022. Version: 1.0.0

When using this resource, please cite:
Rao, H., Cowen, E., Yuditskaya, S., Brattain, L., Koerner, J., Ciccarelli, G., Carter, R., Sze, V., Broderick, T., Reynolds, H., McAlpin, K., & Heldt, T. (2022). Multimodal Physiological Monitoring During Virtual Reality Piloting Tasks (version 1.0.0). PhysioNet.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.


This dataset includes multimodal physiologic, flight performance, and user interaction data streams, collected as participants performed virtual flight tasks of varying difficulty. In virtual reality, individuals flew an "Instrument Landing System" (ILS) protocol, in which they had to land an aircraft relying mostly on the cockpit instrument readings. Participants were presented with four levels of difficulty, generated by varying wind speed, turbulence, and visibility. Each participant performed 12 runs in a single experimental session, split into 3 blocks of four consecutive runs with one run at each difficulty. The sequence of difficulty levels was counterbalanced across blocks. Flight performance was quantified from the horizontal and vertical deviations from an ideal path toward the runway, as well as the deviation from the prescribed ideal speed of 115 knots. Multimodal physiological signals were aggregated and synchronized using Lab Streaming Layer. Data quality descriptions are provided for each data stream. The starter code provides examples of loading and plotting the time-synchronized data streams, extracting sample features from the eye tracking data, and building models to predict pilot performance from the physiological data streams.


Unobtrusive, real-time detection of neurocognitive decline is an open challenge. Declining cognitive alertness affects operational professions (e.g., pilots, air traffic controllers, medical professionals) in which human error can have catastrophic consequences. While evaluating neurocognitive decline through dedicated challenge tasks is a common approach in laboratory and field studies, such a stimulus-response paradigm requires active participation, takes time and attention away from a professional's natural tasks, and usually results in poor compliance [1,2]. Reliable, robust, and unobtrusive technologies to determine neurocognitive status in operational or training environments remain a critical need.

The focus of the current study was predicting performance in simulated flight tasks from physiological signals. The central hypothesis was that information contained in individuals' physiological responses, in addition to traditional flight performance metrics, can provide insight into their cognitive state and therefore aid in predicting flight performance. The affective computing and gaming literature has shown that if the difficulty of a game is matched to the competence of the player, the player is in a state of "flow," or positive excitement, conducive to effective learning. However, if there is a mismatch between difficulty and competence, the player may enter a state of either anxiety or boredom [3]. Further, deviations into anxiety or boredom are correlated with changes in physiology, including heart rate, pupillary dilation, respiration rate, galvanic skin response, and more [4]. Therefore, the relationship between difficulty and competence, as assessed via physiological responses, can be exploited to titrate difficulty optimally. These data enable the development of algorithms that relate physiology to performance, building a bridge toward optimal difficulty manipulation for effective learning.


Experimental Protocol: Participants performed repeated runs of a simulated ILS procedure. Starting from the same point on every run, they used the instruments on the cockpit instrument panel to guide the aircraft to the runway along as ideal a path as possible. There were four levels of difficulty, in which wind speed, turbulence, and visibility were varied, as noted in the table below. Each participant performed 3 blocks, each comprising 4 runs (one run per difficulty level), for a total of 12 runs. Within each block, the difficulty levels were presented in a counterbalanced manner; the ordering of difficulty levels was the same for all participants. Each run typically lasted 8 min but could vary depending on how fast the participant flew or whether they crashed early. Each participant completed a 5-min rest period once before the first run and once after the last. During the rest periods, participants sat comfortably and relatively motionless. A typical experimental session lasted 4 hours, including setup and breaks. Throughout the experiment, a set of wearable sensors was used to collect multimodal physiological data, and all sensor data were collected for all runs and rest periods.

ILS Difficulty

| Level | Wind (altitude: direction @ speed) | Clouds | Visibility | Turbulence |
| 1 | | No clouds | Unlimited | |
| 2 | 140° @ 10 kts | Overcast above 2500’ | 5 statute miles | |
| 3 | 3000’: 80° @ 10 kts; 1000’: 140° @ 10 kts | Overcast above 1000’ | 3 statute miles | Light above 1000’ |
| 4 | 4800’: 250° @ 10 kts; 2800’: 80° @ 15–20 kts; 800’: 200° @ 15 kts | Overcast above 400’ | 1 statute mile | |


Experimental Setup: Seated comfortably, participants wore an HTC Vive Pro Eye headset that displayed the virtual environment. The flight software was X-Plane 11 (ver11.50; Laminar Research, USA). To control the simulated T-6 Texan II aircraft (developed by Flite Advantage Simulation & Training, LLC, Florida, USA), volunteers manipulated a physical joystick, a throttle, and rudder pedals (Thrustmaster HOTAS Warthog flight stick, dual throttle, and pendular rudder controllers; Thrustmaster, USA). The table below provides an overview of the multimodal physiological sensors that were used. In addition, participant interaction with the joystick, throttle, and rudder was captured. Finally, virtual aircraft behavior was stored and used to determine instantaneous and cumulative flight performance. The aircraft behavior included six-degree-of-freedom motion (i.e., latitude, longitude, elevation, yaw, pitch, roll), several indicators of speed (e.g., knots indicated airspeed), trims (i.e., elevator, aileron), and ILS instrument readings. For details on accessing all aircraft data, refer to Table 8-2 in the Data Dictionary, which is included in the reference documentation.




| Signal | Location | Device | Sampling Rate |
| | | Shimmer3 ECG Unit | 512 Hz |
| | | Shimmer3 ECG Unit | 512 Hz |
| | | | 1024 Hz |
| | | Shimmer3 ACC Unit | 128 Hz |
| | Right Forearm | Shimmer3 ACC Unit | 128 Hz |
| | Right Forearm | Shimmer3 ACC Unit | 128 Hz |
| Electrodermal Activity | Left Middle Fingertip | Shimmer3 GSR+ Unit | 1024 Hz |
| | Left Hand Fingers | Shimmer3 GSR+ Unit | 1024 Hz |
| Eye Tracking | | HTC Vive Pro Eye | 250 Hz |
| Head Movement | | X-Plane 11 Simulator | 4 Hz |

Data Synchronization: Lab Streaming Layer (LSL; ver1.13) was used to aggregate and synchronize the data [5,6]. LSL is a library of tools that enables unified collection of time series data and places independently collected data streams on a common, synchronized time axis. Each sample from each sensor is marked with a timestamp on this common time axis. LSL derives the timestamps from the local high-resolution clock of the data acquisition computer, with sub-millisecond temporal resolution [7]. LSL also compensates for clock offsets between devices and for network packet delays, and supports offline jitter correction.
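Because every stream shares the LSL time axis, streams recorded at different rates can be compared by looking up, for each sample of one stream, the most recent sample of another. The following is a minimal sketch, assuming the timestamp and value columns have already been read from two data files into arrays; the function name and the sample-and-hold choice are illustrative, not part of the starter code.

```python
import numpy as np


def align_to_reference(ref_t, other_t, other_x):
    """Sample-and-hold lookup on the shared LSL time axis.

    For every reference timestamp, return the value of the other stream's most
    recent sample at or before that time (NaN before the other stream starts).
    Both timestamp arrays must be sorted in ascending order.
    """
    ref_t = np.asarray(ref_t, dtype=float)
    other_t = np.asarray(other_t, dtype=float)
    other_x = np.asarray(other_x, dtype=float)
    # Index of the last other-stream sample with timestamp <= each reference time.
    idx = np.searchsorted(other_t, ref_t, side="right") - 1
    out = np.full(ref_t.shape, np.nan)
    valid = idx >= 0
    out[valid] = other_x[idx[valid]]
    return out


# Hypothetical 4 Hz stream looked up at five reference times.
aligned = align_to_reference([-0.1, 0.0, 0.1, 0.3, 0.6], [0.0, 0.25, 0.5], [10.0, 20.0, 30.0])
print(aligned)
```

For example, this could attach the 4 Hz X-Plane 11 signals to each 250 Hz eye tracking sample. On DataFrames, pandas.merge_asof with direction="backward" performs the same lookup.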

Measures of Flight Performance: Flight performance during an ILS approach is evaluated with three standard components. The first is the deviation from the ideal horizontal path leading toward the runway, and the second is the deviation from the ideal vertical path leading to the runway. In ILS parlance, these paths are called the Localizer (horizontal) and Glide Slope (vertical), respectively; a 3-degree descent (i.e., –3 degrees) is ideal for the Glide Slope. Effectively, the Localizer and Glide Slope errors are the angular deviations from the ideal descent trajectory. In our calculations of flight performance, any error with magnitude greater than 3 degrees is capped at 3 degrees, creating a scale of –3 to 3 for the angular errors. The third component is the deviation from the ideal speed of 115 knots. Dividing the airspeed error by 10 maps airspeed errors of up to ±30 knots onto the same –3 to 3 range, and scaled speed errors with magnitude greater than 3 are likewise capped at 3. The instantaneous error is computed for each time point. The total error for a trial is the sum of the magnitudes of the deviations in the three components, divided by the total duration of the trial, yielding a single error value per trial.
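The scoring rules above can be sketched in Python as follows. This is a hypothetical re-implementation for illustration only, assuming per-sample Localizer and Glide Slope deviations in degrees, indicated airspeed in knots, and timestamps in seconds; it reads the total error literally as the summed instantaneous magnitudes divided by trial duration. compute_flight_performance.m remains the reference for the released values.

```python
import numpy as np

IDEAL_SPEED_KTS = 115.0  # prescribed approach speed (from the text)
SPEED_SCALE_KTS = 10.0   # airspeed error is divided by 10 (from the text)
CAP = 3.0                # component errors are capped at +/-3 (from the text)


def component_errors(loc_dev_deg, gs_dev_deg, airspeed_kts):
    """Capped Localizer, Glide Slope, and scaled speed errors per sample."""
    loc = np.clip(np.asarray(loc_dev_deg, dtype=float), -CAP, CAP)
    gs = np.clip(np.asarray(gs_dev_deg, dtype=float), -CAP, CAP)
    spd = (np.asarray(airspeed_kts, dtype=float) - IDEAL_SPEED_KTS) / SPEED_SCALE_KTS
    spd = np.clip(spd, -CAP, CAP)
    return loc, gs, spd


def instantaneous_error(loc_dev_deg, gs_dev_deg, airspeed_kts):
    """Sum of the magnitudes of the three capped components at each sample."""
    loc, gs, spd = component_errors(loc_dev_deg, gs_dev_deg, airspeed_kts)
    return np.abs(loc) + np.abs(gs) + np.abs(spd)


def trial_error(t_sec, loc_dev_deg, gs_dev_deg, airspeed_kts):
    """Summed instantaneous error divided by trial duration (assumed reading)."""
    t = np.asarray(t_sec, dtype=float)
    duration = t[-1] - t[0]
    total = instantaneous_error(loc_dev_deg, gs_dev_deg, airspeed_kts)
    return float(total.sum() / duration)


# Toy run with hypothetical numbers: 5 samples over 1 s.
t = [0.0, 0.25, 0.5, 0.75, 1.0]
loc = [0.0, 1.0, 5.0, -4.0, 0.0]
gs = [0.0, 0.0, 0.0, 0.0, 0.0]
speed = [115.0, 125.0, 165.0, 85.0, 115.0]
print(instantaneous_error(loc, gs, speed))  # [0. 2. 6. 6. 0.]
print(trial_error(t, loc, gs, speed))       # 14.0
```

Note how the 5-degree Localizer error and the 50-knot speed error both saturate at 3, so a single large excursion cannot dominate the trial score.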

Data Description

Data from 35 volunteers are included in the first version of this dataset. Each volunteer received a de-identified ID upon providing written informed consent to participate in this study. Broadly, the data fall into two categories: the collected time series data and data derived from those time series.

The data for each participant are organized in a folder hierarchy by participant ID, then date, and then run number. Time series data are stored within each "run" folder. Files are labeled in the format: sub-cpXXX_ses-YYYYMMDD_task-XXXX_stream-XXXX_feat-chunk_level-XXX_run-XXX_dat.csv. For each time series file, there is an associated metadata file, labeled in the format: sub-cpXXX_ses-YYYYMMDD_task-XXXX_stream-XXXX_feat-chunk_level-XXX_run-XXX_hea.csv. The task type is either "rest," corresponding to the 5-min periods at the start and end of the experiment during which the participant sat still, or "ils," corresponding to trials in which they performed the ILS scenario. The stream names identify the data streams collected in LSL. The level identifiers denote the experimental difficulty, ranging from level 1 (easiest) to level 4 (hardest). Metadata for each data stream (e.g., nominal and effective sampling rates, and sample counts) are stored in hidden files with the same names as the raw data files, except that the file names are prefixed with a period so that they appear as hidden files.
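Since every field is encoded in the file name, a regular expression can recover participant, session, task, stream, level, and run without opening the file. A minimal sketch, assuming field values never contain underscores; the pattern also accepts the leading period of hidden metadata files and the suffix-free per-run performance files described below. The example file names are hypothetical.

```python
import re

# Field layout follows the documented naming convention; field values are
# assumed never to contain underscores.
NAME_RE = re.compile(
    r"^(?P<hidden>\.)?"
    r"sub-(?P<sub>[^_]+)_ses-(?P<ses>\d{8})_task-(?P<task>[^_]+)"
    r"_stream-(?P<stream>[^_]+)_feat-(?P<feat>[^_]+)"
    r"_level-(?P<level>[^_]+)_run-(?P<run>[^_.]+)"
    r"(?:_(?P<kind>dat|hea))?\.csv$"
)


def parse_name(filename):
    """Split a dataset file name into its labeled fields."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized file name: {filename}")
    fields = m.groupdict()
    fields["hidden"] = fields["hidden"] is not None  # leading period present?
    return fields


# Hypothetical file name, for illustration only.
info = parse_name("sub-cp001_ses-20210101_task-ils_stream-examplestream"
                  "_feat-chunk_level-001_run-001_dat.csv")
print(info["task"], info["level"], info["run"], info["kind"])
```

Combined with pathlib.Path(root).rglob("*.csv"), this can build an index of the dataset for filtering by task, level, or stream.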

Flight performance data were derived from the raw time series aircraft behavior data. Time-varying flight performance data are included for each run of the ILS. These files follow the naming convention: sub-cpXXX_ses-YYYYMMDD_task-ils_stream-lslxp11_feat-perfmetric_level-XXX_run-XXX.csv. The timestamps correspond to the samples within the X-Plane 11 aircraft data stream (i.e., stream-lslxp11xpac). A single summary file of the cumulative errors for every run of the ILS is also included; this file, located within the task-ils subfolder, is named "PerfMetrics.csv". As noted below, the compute_flight_performance.m code details the derivation of flight performance data from the raw aircraft behavior data.

Usage Notes

Data Quality: To quantify data quality, we computed the amount of missing data, the duration of each run, and the amount of data sampled close to the nominal sampling rate. A data quality visualization is included for every file and data stream. It consists of a time series plot of the effective sampling rate at each data point, with periods of low data quality marked (i.e., data points where the sampling rate dropped below the 5% threshold). The summary table provides a high-level view of data quality, whereas the time series plots provide a finer-grained view of where, within a given trial, packets of data were potentially missed. In the summary table, cells highlighted in grey indicate that the file is missing for that run.
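The effective-rate check can be sketched as follows. This is an illustrative sketch only: it assumes an array of LSL timestamps in seconds, reads the 5% threshold as "effective rate more than 5% below the nominal rate", and estimates missing data from the expected sample count. The released quality summaries may define these quantities differently.

```python
import numpy as np


def sampling_quality(timestamps_s, nominal_hz, tolerance=0.05):
    """Per-interval effective rate plus simple summary measures for one stream."""
    t = np.asarray(timestamps_s, dtype=float)
    dt = np.diff(t)
    # Effective rate between consecutive samples; intervals <= 0 (duplicate or
    # out-of-order timestamps) are left as inf.
    eff_hz = np.full(dt.shape, np.inf)
    np.divide(1.0, dt, out=eff_hz, where=dt > 0)
    # Assumed reading of the 5% rule: flag intervals more than 5% slower than nominal.
    low = eff_hz < (1.0 - tolerance) * nominal_hz
    duration = t[-1] - t[0]
    expected = duration * nominal_hz + 1  # samples expected at the nominal rate
    return {
        "effective_hz": eff_hz,
        "low_quality": low,
        "fraction_ok": float(1.0 - low.mean()),  # share of intervals within tolerance
        "duration_s": float(duration),
        "missing_fraction": float(max(0.0, 1.0 - len(t) / expected)),
    }


# Hypothetical 100 Hz stream with one 30 ms gap.
q = sampling_quality([0.0, 0.01, 0.02, 0.05, 0.06], nominal_hz=100)
print(q["low_quality"], q["fraction_ok"])
```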

Starter Code: The starter code, written in Python as Jupyter notebooks, is divided into three sequential steps: data loading and visualization, signal processing and feature extraction, and machine learning prediction of performance from features. The starter code is meant as an example rather than comprehensive coverage of the full set of signals and subsequent analyses.

In the first step, Step1_LoadExplore_TimeSeries.ipynb, signals are loaded and plotted. Example physiological signals include electrocardiography, electromyography, and pupil diameter, among others. The aircraft data visualized are latitude, longitude, and elevation, a small subset of the 18 aircraft-related signals recorded from X-Plane 11. To show the derived performance signal, the three component errors and the total instantaneous error are plotted. Note that all data are plotted against MATLAB datenum time. In the last portion of the notebook, a few signals are plotted synchronously in their absolute time (i.e., "wall clock" time).
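Converting MATLAB datenum values to Python datetimes makes it easier to label axes with wall-clock time or compute durations outside the notebooks. A minimal sketch, assuming the time column holds standard MATLAB datenums (days counted from year 0, with a fractional part for time of day); the offsets 366 and 719529 follow from MATLAB's datenum definition, and the function names are illustrative.

```python
from datetime import datetime, timedelta

import numpy as np

# MATLAB datenum 1 is 1-Jan-0000; Python ordinal 1 is 1-Jan-0001 (366 days later).
MATLAB_PYTHON_OFFSET_DAYS = 366
# datenum of 1970-01-01, the Unix/NumPy epoch.
MATLAB_UNIX_EPOCH = 719529.0


def datenum_to_datetime(dn):
    """Convert one MATLAB datenum (days, with fractional part) to a datetime."""
    day = int(dn)
    return datetime.fromordinal(day - MATLAB_PYTHON_OFFSET_DAYS) + timedelta(days=dn - day)


def datenums_to_datetime64(dn):
    """Vectorized conversion of a whole column to numpy datetime64[us]."""
    us = np.round((np.asarray(dn, dtype=float) - MATLAB_UNIX_EPOCH) * 86_400_000_000)
    return us.astype("int64").astype("datetime64[us]")


print(datenum_to_datetime(737791.5))        # 2020-01-01 12:00:00
print(datenums_to_datetime64([737791.5]))
```

The vectorized form avoids a Python-level loop over the 250 Hz and 1024 Hz streams; at datenum magnitudes a double still resolves roughly 10 µs, so rounding to microseconds loses nothing meaningful.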

In the second step, Step2_FeatureGeneration.ipynb, features are generated from the eye tracking data stream. These features include the power spectral density of eye movements, fixation and saccade rates, and pupil diameter. The features are aggregated and written to a CSV file for use in the third step.
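Features of the kinds listed above can be sketched as follows. This is not the notebook's code: the band limits, the Welch segment length, and the 30 deg/s threshold for a simple velocity-threshold (I-VT) saccade detector are illustrative assumptions, and gaze is assumed to be in degrees and pupil diameter in millimeters at the 250 Hz eye tracking rate.

```python
import numpy as np
from scipy.signal import welch

FS_EYE_HZ = 250.0  # HTC Vive Pro Eye rate listed in the sensor table


def band_power(x, fs, f_lo, f_hi):
    """Integrate a Welch PSD estimate of x over [f_lo, f_hi)."""
    f, pxx = welch(np.asarray(x, dtype=float), fs=fs, nperseg=min(len(x), 256))
    band = (f >= f_lo) & (f < f_hi)
    return float(pxx[band].sum() * (f[1] - f[0]))


def saccade_rate(x_deg, y_deg, fs, vel_thresh_deg_s=30.0):
    """Saccades per second using a simple velocity threshold (I-VT)."""
    vel = np.hypot(np.diff(x_deg), np.diff(y_deg)) * fs  # deg/s between samples
    fast = vel > vel_thresh_deg_s
    # Count rising edges of the "fast" mask as saccade onsets.
    onsets = int(fast[0]) + int(np.count_nonzero(fast[1:] & ~fast[:-1]))
    return onsets / (len(x_deg) / fs)


def eye_features(x_deg, y_deg, pupil_mm, fs=FS_EYE_HZ):
    """One feature row per run or window (feature names are hypothetical)."""
    return {
        "gaze_x_power_0_4hz": band_power(x_deg, fs, 0.0, 4.0),
        "saccade_rate_hz": saccade_rate(x_deg, y_deg, fs),
        "pupil_mean_mm": float(np.nanmean(pupil_mm)),
    }


# Synthetic 2 s example: gaze jumps twice, pupil constant (hypothetical data).
x = np.zeros(500)
x[100:] = 5.0
x[300:] = 10.0
y = np.zeros(500)
feats = eye_features(x, y, np.full(500, 3.0))
print(feats)
```

Collecting one such dictionary per run into a pandas DataFrame and calling to_csv would produce a feature table of the kind passed to the third step.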

In the third step, Step3_PredictiveModeling.ipynb, the eye tracking features are used to classify run difficulty and to estimate total flight performance error. A sample machine learning pipeline is set up in which 25% of the data are held out, the model is trained on the remaining 75%, and predictions are made on the held-out data. A support vector machine (SVM) is used to classify difficulty, focusing only on the extremes (easiest vs. hardest), and a linear regression is used to predict performance. These results do not necessarily set a benchmark for prediction performance; rather, the code is meant to serve as an example of handling the data.
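A pipeline of the shape described above can be sketched with scikit-learn. The feature matrix here is synthetic, and the function names, the feature scaling, and the RBF kernel are assumptions rather than details taken from Step3_PredictiveModeling.ipynb.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def easy_vs_hard_accuracy(X, level, seed=0):
    """SVM on level 1 vs level 4 runs with 25% held out; returns test accuracy."""
    keep = np.isin(level, [1, 4])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[keep], level[keep], test_size=0.25, random_state=seed, stratify=level[keep])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)


def performance_r2(X, total_error, seed=0):
    """Linear regression of total flight error with 25% held out; returns test R^2."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, total_error, test_size=0.25, random_state=seed)
    reg = make_pipeline(StandardScaler(), LinearRegression()).fit(X_tr, y_tr)
    return reg.score(X_te, y_te)


# Synthetic stand-in for the Step 2 feature table (NOT real study data).
rng = np.random.default_rng(0)
level = np.repeat([1, 2, 3, 4], 50)
X = rng.normal(size=(200, 3)) + 1.5 * level[:, None]
total_error = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.1, size=200)

acc = easy_vs_hard_accuracy(X, level)
r2 = performance_r2(X, total_error)
print(f"easy-vs-hard accuracy: {acc:.2f}, performance R^2: {r2:.2f}")
```

Because each participant contributes 12 runs, a random row-level split can place runs from the same person in both the training and held-out sets; passing participant IDs as groups to sklearn.model_selection.GroupShuffleSplit is one way to hold out whole participants instead.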

Flight Performance Code: Although the flight performance data are provided for the entire dataset, the code used to compute these derived data, compute_flight_performance.m, is also provided on the starter code page. The code computes instantaneous and cumulative errors and saves one file per run, as well as a single file that summarizes the cumulative errors for all runs.

Participant Piloting Experience: The reference documentation includes data on each participant's flight experience prior to the study (see ParticipantPriorFlightExperience.pdf). Participants reported their experience with both fixed-wing aircraft and rotorcraft (i.e., helicopters) and whether those hours were spent in a real airframe or a simulator.

Release Notes

Initial release version 1.0


Data collection was approved by MIT's Committee on the Use of Humans as Experimental Subjects (COUHES) and by the Air Force Human Research Protection Office (HRPO). Participating subjects provided written informed consent prior to participation and consented to have their de-identified data made publicly available. The data were reviewed and approved for public release by the Secretary of the Air Force Public Affairs Office (SAF/PA).


DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited. Research was sponsored by the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

Dr. Gregory Ciccarelli's contributions were made during his tenure at the MIT Lincoln Laboratory.

Conflicts of Interest

The authors have no conflicts of interest.


  1. Wickens, C. D. (2002). Multiple resources and performance prediction. Theoretical issues in ergonomics science, 3(2), 159-177.
  2. Cain, B. (2007). A review of the mental workload literature. Technical report, Defence Research and Development Toronto, Canada.
  3. Landhäußer, A., & Keller, J. (2012). Flow and its affective, cognitive, and performance-related consequences. In Advances in flow research (pp. 65-85). Springer, New York, NY.
  4. Kotsia, I., Zafeiriou, S., & Fotopoulos, S. (2013). Affective gaming: A comprehensive survey. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 663-670).
  5. Kothe, C., Medine, D., Boulay, C., Grivich, M., & Stenner, T. (2014). Lab streaming layer. URL https://github.com/sccn/labstreaminglayer.
  6. Smalt, C., Rao, H., & Ciccarelli, G. (2021). Signal Acquisition Modules for Lab Streaming Layer: v1.0 (Version v1.0). Zenodo.
  7. Kothe, C., Medine, D., Boulay, C., Grivich, M., & Stenner, T. (2014). Lab streaming layer. URL Accessed on [8-24-2022]


Access Policy:
Only registered users who sign the specified data use agreement can access the files.

License (for files):
PhysioNet Restricted Health Data License 1.5.0

Data Use Agreement:
PhysioNet Restricted Health Data Use Agreement 1.5.0

