Database Open Access

# A multi-camera and multimodal dataset for posture and gait analysis

Published: Nov. 1, 2021. Version: 1.0.0

Palermo, M., Mendes Lopes, J., André, J., Cerqueira, J., & Santos, C. (2021). A multi-camera and multimodal dataset for posture and gait analysis (version 1.0.0). PhysioNet. https://doi.org/10.13026/fyxw-n385.

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

## Abstract

Gait and posture analysis while using assistive robotic devices is of utmost importance to attain effective assistance. This work provides a multi-camera, multimodal, and detailed dataset for vision-based applications using a wheeled robotic walker equipped with a pair of affordable cameras. Depth data were acquired at 30 fps from a total of 14 healthy participants walking at 3 different gait speeds, across 3 different walking scenarios/paths at 3 different locations. Simultaneously, accurate skeleton joint data were recorded using an inertial-based commercial motion capture system that provides a reliable ground truth for classical or novel (i.e., machine learning-based) vision-based applications. In total, the database contains approximately 166K frames of synchronized data, which amounts to 92 minutes of total recording time. This dataset may contribute to the development and evaluation of: i) classic or data-driven vision-based pose estimation algorithms; ii) applications in human detection and tracking, and movement forecasting; and iii) gait/posture metrics analysis using a rehabilitation device.

## Background

Gait and posture impairments are a common form of disability. These may result in a lack of stability, affected motor coordination, poor balance, and muscle weakness, leading to an increased risk of falls and fall-related injuries [1]. Consequently, quality of life is highly jeopardized, with socioeconomic consequences due to increased institutionalization and dependence on others [2,3].

Robotics-based rehabilitation is an evolving area that aims to improve the quality of life of motor-impaired persons by providing residual motor skills recovery based on repetitive and intensity-adapted training along with assistive devices [1]. Gait and posture analysis is relevant to provide a more user-centered approach, in which rehabilitation therapies are designed considering each person’s disability level. Current solutions are based on expensive systems, namely optical motion capture systems (e.g. Vicon, Qualisys), that require complex setups along with specific environments and workspaces. Locomotion analysis using low-cost equipment, e.g. Kinect SDK (Microsoft Corporation, USA), has been presented in the literature [4]; however, this solution is prone to errors, especially when dealing with non-trivial poses, rapid movements, or challenging light conditions. Additionally, the foot joints present instability and high error, making it unsuitable for gait analysis [5,6].

Recent studies involving vision-based machine learning techniques are showing great potential for locomotion analysis. Besides being a low-cost solution, evidence shows reasonable precision in estimating the person’s pose without the need for wearable markers/sensors or complex setups [7]. Nevertheless, this approach requires a considerable amount of quality data to train the models and achieve the precision and accuracy required to be an effective locomotion analysis tool.

Currently available datasets for gait and posture analysis present data that do not correspond to real-world settings [8]. Most datasets that present accurate 3D joint data, usually obtained with on-body visual markers, are captured in a laboratory context, within controlled environments [9,10,11,12]. Furthermore, most datasets do not provide camera-related data together with 3D joint coordinates obtained with standard motion tracker systems [13]. Those that provide camera-related data, namely depth recordings, have joint data captured with the Kinect SDK (Microsoft Corporation, USA), which is prone to errors and not robust to environmental conditions [11,12].

To address these challenges, this dataset presents multi-camera vision data involving 14 healthy subjects walking with a robotic walker. The dataset includes raw inertial data, segments’ position, orientation, acceleration and angular velocity, and joint angles, measured with the commercially available Xsens MTw Awinda motion capture system [14], together with depth frames from the gait camera (GC) and the posture camera (PC), both embedded in the walker [3].

## Methods

### Participants

This dataset includes data from 14 healthy subjects (10 males and 4 females; body mass: 69.7±11.4 kg; body height: 172±10.2 cm; age: 25.4±2.31 years) who were recruited and voluntarily agreed to participate in the data collection. Participants were selected based on the following inclusion criteria: i) present healthy locomotion without any clinical history of abnormalities; ii) present total postural control; iii) present body height between 150 and 190 cm; iv) are 18 or more years old; and v) provide written and informed consent to participate in the study. Data collection was conducted under the ethical procedures of the Ethics Committee in Life and Health Sciences (CEICVS 063/2021), following the Helsinki Declaration and the Oviedo Convention. Participants’ rights were preserved and, therefore, personal information remained confidential and is not provided in this dataset.

### Instrumentation and Data Collection

Participants were instructed to bring long clothes and sport shoes. Each participant was instrumented with the full-body inertial motion tracking system MTw Awinda (Xsens Technologies, B.V., The Netherlands), with seventeen IMUs placed on the head, shoulders, chest, arms, forearms, wrists, waist, thighs, shanks, and feet, each secured with a strap. An additional IMU was placed on the walker’s upper camera to assess the camera’s orientation relative to the MVN world axis. This was performed to provide camera-related clean data without showing the sensors’ location. The sensors' placement followed the manufacturer's guidelines and was always performed by the same researcher, thus minimizing errors caused by sensor misplacement.

Data collection included: i) kinematic data, acquired at 60 Hz using the MVN software, namely sensors’ free acceleration, magnetic field, and orientation; segments’ orientation, position, velocity, and acceleration; and joints’ angle; and ii) depth images from the walker's embedded cameras, captured at 30 frames per second (fps). All data were time-synchronized using a hardware trigger.

### Experimental protocol

The experimental protocol proceeded as follows: firstly, the required participants' anthropometric data were measured and entered into MVN Analyze to adjust the software’s biomechanical model (MVN BIOMECH) to the participant’s physiognomy; secondly, the MVN BIOMECH was calibrated following the manufacturer’s guidelines, ensuring the calibration’s quality for each subject. During the calibration procedures, the additional IMU was placed on a stick and, after successful calibration, it was moved to the upper camera, ensuring the same orientation across trials.

Subsequently, each participant completed a one-day protocol comprising trials at 3 slow gait speeds (0.3, 0.5, and 0.7 m/s), chosen because they are often observed in persons with motor disabilities [15], across 3 different sequences: i) walking forward in a corridor, ii) turning right in a corner, and iii) turning left in a corner. Each trial was repeated 3 times, each time in a different location, to accommodate different scenarios and environment conditions.
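The resulting factorial design can be checked with a short enumeration. This is only an illustrative sketch: the participant and sequence labels below are hypothetical placeholders, not the folder names used in the dataset. It assumes every participant performed every combination of speed, sequence, and location, which matches the total of 378 trials quoted in the Limitations section.

```python
from itertools import product

# Hypothetical labels, used only to count the experimental conditions.
participants = [f"participant_{i:02d}" for i in range(1, 15)]  # 14 subjects
speeds_m_s = [0.3, 0.5, 0.7]                                   # 3 gait speeds
sequences = ["forward", "turn_right", "turn_left"]             # 3 sequences
locations = [1, 2, 3]                                          # 3 repetitions/locations

# One trial per combination of participant, sequence, speed, and location.
trials = list(product(participants, sequences, speeds_m_s, locations))
trials_per_participant = len(trials) // len(participants)

print(len(trials), trials_per_participant)
```

Under these assumptions the design yields 27 trials per participant and 378 trials overall, before any trials are discarded.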

Each trial proceeded as follows: firstly, the walker was placed on the starting line of each location (these were measured and drawn on the floor prior to data collection). Then, the participants were placed in front of the walker and were asked to assume the N-Pose (similar to T-Pose, but with arms straight close to the body, as recommended by Xsens) to reset the IMUs’ internal referential. Afterwards, the participants were asked to grab both walker's handles. After these first 3 steps, data collection started, using a remote controller to guide the walker. The participants walked normally until they reached the end line of the trial. Finally, the recording was stopped and the walker was moved to the next trial's starting location, repeating the process. Prior to data collection, the participants performed a familiarization trial with the robotic walker and the selected gait speeds. Moreover, prior to each session, a set of image data was collected and used offline to obtain the transformation matrices that relate the position and orientation of the two cameras to each other and to the MVN world axis.

### Data Processing

Data from the inertial motion tracking and the walker's depth images are synchronized temporally using timestamp files which were recorded during acquisition with the walker's embedded software. The corresponding temporal indexes for each data modality were saved in a ".csv" file which can be used to easily select data when needed, while also keeping all raw samples obtained.
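As an illustration of what such temporal indexes represent, the sketch below pairs each depth-frame timestamp with the closest skeleton sample by nearest-timestamp search. It is not the authors' implementation: the function name `nearest_index` and the example timestamps are assumptions, and the real timestamps should be taken from the files provided with the dataset.

```python
from bisect import bisect_left

def nearest_index(sorted_ts, t):
    """Return the index of the value in sorted_ts (ascending) closest to t."""
    i = bisect_left(sorted_ts, t)
    if i == 0:
        return 0
    if i == len(sorted_ts):
        return len(sorted_ts) - 1
    # Pick whichever neighbour is closer in time (ties go to the earlier one).
    return i - 1 if t - sorted_ts[i - 1] <= sorted_ts[i] - t else i

# Hypothetical timestamps in seconds: skeleton at 60 Hz, depth at about 30 fps.
skeleton_ts = [k / 60 for k in range(6)]
depth_ts = [0.000, 0.033, 0.067]

pairs = [nearest_index(skeleton_ts, t) for t in depth_ts]
print(pairs)  # [0, 2, 4]
```

Because the skeleton stream runs at twice the camera rate, roughly every second skeleton sample is matched to a depth frame, while the unmatched raw samples remain available.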

The MVN BIOMECH position was normalized to the origin of the world axis using the center-of-mass position, and the heading was removed so that the biomechanical model is always facing forward. These processed data are referred to as "normalized_skeleton_3D".
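A minimal sketch of this kind of normalization is given below. It rests on several assumptions that are not stated in the text: the vertical axis is Z, the heading is available as a yaw angle in radians, the center of mass is subtracted along all three axes, and "forward" is the +X direction. The function name `normalize_skeleton` is hypothetical.

```python
import numpy as np

def normalize_skeleton(joints, center_of_mass, heading_rad):
    """Center joints on the center of mass and undo the heading (yaw about Z).

    joints: (N, 3) array of joint positions in the world frame.
    center_of_mass: (3,) position in the same frame.
    heading_rad: yaw angle of the subject about the vertical (assumed Z) axis.
    """
    centered = np.asarray(joints, dtype=float) - np.asarray(center_of_mass, dtype=float)
    c, s = np.cos(-heading_rad), np.sin(-heading_rad)
    # Rotation by -heading about Z, so the subject ends up facing +X.
    rot_z = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    return centered @ rot_z.T

# A joint 1 m ahead of the center of mass of a subject heading along +Y (90 degrees).
out = normalize_skeleton([[2.0, 4.0, 1.0]], [2.0, 3.0, 1.0], np.pi / 2)
print(np.round(out, 6))
```

After normalization the example joint lies 1 m along +X regardless of where the subject was or which way they were walking.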

The joints' positions were also related to the walker's cameras. First, the MVN BIOMECH root joint was centred in the referential origin. Then, a rotation was applied to transform the MVN BIOMECH referential to the posture camera's, using the additional MTw Awinda PROP sensor placed over the camera. Lastly, a translation was applied to place the MVN BIOMECH wrists in the same position as the corresponding walker's handles. This offset was obtained with an extrinsic calibration step. This method is valid as long as the subject is always grabbing the walker's handles, which was ensured during acquisition.

This transformation was computed for both hands to reduce symmetrical errors coming from the calibration procedure. This processing step provides labeled 3D joints that are directly related to the information obtained from the cameras' data. These data are referred to as "aligned_skeleton_3D".
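The three steps (centering on the root, rotating into the camera referential, and translating so that the wrists meet the handles) can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: the function name, the argument layout, and the choice of averaging the two per-hand offsets into a single translation are all hypothetical.

```python
import numpy as np

def align_to_camera(joints, root, rot_world_to_cam, wrist_idx, handle_positions):
    """Express world-frame joints in the camera frame, anchored at the handles.

    joints: (N, 3) joint positions; root: (3,) root joint position.
    rot_world_to_cam: (3, 3) rotation from the mocap frame to the camera frame.
    wrist_idx: indices of the two wrist joints in `joints`.
    handle_positions: (2, 3) handle positions in the camera frame.
    """
    joints = np.asarray(joints, dtype=float)
    # Steps 1 and 2: center on the root joint, then rotate into the camera frame.
    rotated = (joints - np.asarray(root, dtype=float)) @ np.asarray(rot_world_to_cam, dtype=float).T
    # Step 3: one offset per hand, averaged to balance left/right errors.
    offsets = np.asarray(handle_positions, dtype=float) - rotated[list(wrist_idx)]
    return rotated + offsets.mean(axis=0)

joints = [[0.0, 0.0, 0.0], [0.3, 0.2, 0.0], [0.3, -0.2, 0.0]]  # root, left wrist, right wrist
handles = [[0.32, 0.2, 1.0], [0.28, -0.2, 1.0]]                # hypothetical calibration values
aligned = align_to_camera(joints, joints[0], np.eye(3), (1, 2), handles)
print(aligned[0])
```

In this toy example the per-hand offsets differ by 4 cm along X, and averaging them places the root at (0, 0, 1) instead of favouring either hand.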

The joints’ positions were then projected to 2D space using the camera intrinsic parameters, so that they can be used to label the joints in the 2D frames. This projection is direct for the frames of the posture camera. For the gait camera, an extrinsic transformation was first applied, converting the points from the posture camera referential to the gait camera referential. These data are referred to as "aligned_skeleton_2D".
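For reference, a projection of this kind is commonly written with the pinhole camera model. The sketch below assumes an undistorted pinhole camera with the Z axis pointing forward; the intrinsic values (`fx`, `fy`, `cx`, `cy`) and the extrinsic rotation and translation are made-up numbers, and the real ones should be read from the calibration files.

```python
import numpy as np

def project_points(points_cam, fx, fy, cx, cy):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates (pinhole model)."""
    p = np.asarray(points_cam, dtype=float)
    u = fx * p[:, 0] / p[:, 2] + cx
    v = fy * p[:, 1] / p[:, 2] + cy
    return np.stack([u, v], axis=1)

def change_camera(points, rotation, translation):
    """Move (N, 3) points from one camera frame to another: R @ p + t."""
    return np.asarray(points, dtype=float) @ np.asarray(rotation, dtype=float).T + np.asarray(translation, dtype=float)

# Hypothetical intrinsics and a single joint in the posture camera frame.
joint_posture = [[0.5, -0.25, 2.0]]
pixels_posture = project_points(joint_posture, fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# For the gait camera, apply a (hypothetical) extrinsic transform before projecting.
joint_gait = change_camera(joint_posture, np.eye(3), [0.0, 0.0, 0.5])
pixels_gait = project_points(joint_gait, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(pixels_posture, pixels_gait)
```

With these example values the joint lands at pixel (445, 177.5) in the posture camera and at (420, 190) in the gait camera.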

Moreover, the depth images were projected to 3D space, as pointclouds, using the intrinsic parameters of each camera. Then, a referential transformation was performed to put the gait camera pointcloud into the posture camera referential. These data are not saved as part of the "processed_data" since they occupy a significant amount of space. Nevertheless, the authors provide code to obtain the pointcloud data if needed.
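The dataset's own scripts should be preferred for generating pointclouds. Purely as an illustration of the idea, the sketch below back-projects a depth image with the pinhole model. It assumes depth values in meters, zero marking invalid pixels, and no lens distortion; none of these conventions is confirmed by the text, and the function name is hypothetical.

```python
import numpy as np

def depth_to_pointcloud(depth_m, fx, fy, cx, cy):
    """Back-project an (H, W) depth image into an (M, 3) pointcloud.

    Pixels with depth <= 0 are treated as invalid and skipped (assumption).
    """
    depth_m = np.asarray(depth_m, dtype=float)
    v, u = np.indices(depth_m.shape)  # v: row index, u: column index
    valid = depth_m > 0
    z = depth_m[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Tiny 2x2 example with trivial intrinsics so the result is easy to verify by hand.
cloud = depth_to_pointcloud([[0.0, 2.0], [1.0, 0.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(cloud)
```

The resulting points are expressed in the referential of the camera that produced the depth image; moving the gait camera cloud into the posture camera referential additionally requires the rigid transform from the extrinsic calibration.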

Additionally, the foot joints from the MVN BIOMECH contained in the "Segment Position.csv" file ("foot", "toe") were, in all methods, replaced with the ones from the ".c3d" file ("heel", "toe"). This moves the foot keypoints from the ankle to the heel, which is more relevant for the analysis of gait metrics [3].

## Data Description

This dataset is organized in 5 levels, as follows: i) level 0 (Root): includes participants' metadata, general dataset information, raw data folders, and processed data folders; ii) level 1 (Participant): includes a folder for each of the fourteen participants of this data collection; iii) level 2 (Sequence): contains a folder for each performed sequence (walking straight or turning, and its speed), along with both intrinsic and extrinsic calibration files; iv) level 3 (Location): includes a folder with the repetition's location ID (corner1/2/3 and corridor1/2/3); and v) level 4 (Data): presents the data files for each of the aforementioned modalities.
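The hierarchy maps naturally onto nested paths. The helper below is a hypothetical convenience, not part of the dataset's scripts, and the participant and sequence names are placeholders: only the "raw_data"/"processed_data" folders and the location IDs are named in this description, so the actual folder names should be checked in the downloaded files.

```python
from pathlib import PurePosixPath

def trial_dir(root, data_kind, participant, sequence, location):
    """Build the level-3 folder path that holds the level-4 data files of one trial."""
    if data_kind not in ("raw_data", "processed_data"):
        raise ValueError("data_kind must be 'raw_data' or 'processed_data'")
    return PurePosixPath(root) / data_kind / participant / sequence / location

# Placeholder participant and sequence names, for illustration only.
path = trial_dir("dataset_root", "raw_data", "PARTICIPANT_ID", "SEQUENCE_ID", "corridor1")
print(path)
```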

### Raw Data

These data are organized inside the "raw_data" folder (level 0), following the previously detailed structure. Raw data includes: i) calibration data, with both intrinsic and extrinsic files; ii) the skeleton joint data obtained with the MVN Analyze software; iii) the camera's depth frame data; and iv) the synchronization stamp file.

### Calibration Data

Inside each participant's directory, alongside the sequence folders (level 2), two calibration files are presented: one for the cameras' intrinsic parameters, and another for the extrinsic referential transformations that allow both stereo calibration between the two cameras and the positioning of the MVN BIOMECH model relative to the walker's posture camera. These files are named "intrinsic_calibration.json" and "extrinsic_calibration.json", respectively.

### Skeleton data

Files obtained from the MVN software are presented in level 4 for each individual trial. These are: i) the files exported from MVN Analyze as ".csv" files in the "*_csv" folder, which contain raw sensor readings (magnetometer, gyroscope, accelerometer and orientation), the center-of-mass location, joint kinematics (angle, velocity, acceleration) and segments' position and orientation, obtained through the MVN BIOMECH model; and ii) exported ".c3d" files, also processed by the MVN software, containing a more complete set of body keypoints extrapolated from the MVN BIOMECH model.

Additionally, a timestamp (".stamp" file) was saved with the instant the trigger signal was sent to the MTw Awinda base station to start recording. This was necessary to correctly align the data temporally due to the asynchronous nature of the ROS system used in the walker.

### Cameras' frame data

Depth frames from each of the cameras were saved into the respective folders ("gait_depth_registered" and "posture_depth_registered"). A timestamp was also saved for each of the depth frames, which was written in the name of each file.

### Processed Data

All the processed data is stored inside the "processed_data" folder (level 0) and follows the same hierarchical structure as the "raw_data" folder. The files for each trial are organized in level 4.

It is composed of 5 files saved in ".csv" format. Four of them contain the joint data, namely: the normalized joint data in 3D space ("norm_skeleton_3d.csv"), the aligned joint data in 3D space ("aligned_skeleton_3d.csv"), and the aligned 2D joint data for the gait ("aligned_skeleton_2d_gait.csv") and posture ("aligned_skeleton_2d_posture.csv") cameras. Each column in these files contains the position of one of the joints along one axis. It should be noted that, in the case of the 2D data, some of the points are projected outside the image frame, as they are not seen by the camera sensor; their position in the 2D camera plane is nevertheless still valid.
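When only joints that fall inside the image are of interest, the 2D data can be filtered with a simple bounds check. The sketch below is illustrative: the 640x480 resolution and the example coordinates are assumptions, and the real frame size should be taken from the depth images themselves.

```python
def in_frame_mask(points_2d, width, height):
    """Return one boolean per (u, v) point: True if it lies inside the image."""
    return [0 <= u < width and 0 <= v < height for u, v in points_2d]

# Hypothetical projected joints for one frame, in pixels.
joints_2d = [(10.0, 20.0), (-5.0, 30.0), (640.0, 100.0)]
mask = in_frame_mask(joints_2d, width=640, height=480)
print(mask)  # [True, False, False]
```

Joints flagged as outside the frame can still be kept as labels, for example to supervise models that predict occluded or out-of-view keypoints.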

An additional file was added ("synchronized_data_idx.csv"), containing indexes of corresponding data samples for each modality, to synchronize the processed data samples with the depth files which are stored raw, as obtained from the walker.

Metadata were collected from all participants. These include: i) age, ii) gender, iii) body mass, iv) body height, and v) limb dimensions, namely: hip height, shoe length, shoulder height, shoulder width, elbow span, wrist span, arm span, hip width, knee height and ankle height. This information is stored in the "subjects_metadata.csv" file, located in the root folder (level 0).

Additionally, information regarding the organization and data contained in the "raw_data" and "processed_data" folders is also presented in two respective "data_description.txt" files (level 0).

### Limitations

During the dataset processing, we observed some data irregularities that should be considered when using this dataset: i) a few trials (15/378) were discarded due to sensor displacement during a sequence or file corruption in some of the modalities; ii) in a few trials, the depth data from the walker's cameras were slightly corrupted by infrared exposure from sunlight; these data were not discarded, as they were considered representative of the environment variability found in real sessions; iii) the "aligned_skeleton" data, although providing reasonable estimates of the human joint locations, are affected by compounding transformation errors, which might produce lower-quality alignments between the visual data and the joint data. These errors were minimized as much as possible in the protocol, but if camera-relative positional data are not necessary, then the "normalized_skeleton" data should be used, as they are not affected by these errors.

## Usage Notes

This database includes scripts to process, handle, visualize, and evaluate the data described. All scripts are written in the open-source Python programming language.

The dataset has also been used in a related publication to develop and evaluate deep learning-based algorithms for patient pose estimation using the smart walker [16]. We hope it can further contribute to the development and evaluation of classic or data-driven vision-based pose estimation algorithms, applications in human detection, joint tracking, and movement forecasting, and gait/posture metrics analysis targeting smart walker solutions for rehabilitation.

## Acknowledgements

This work has been supported by the FCT - Fundação para a Ciência e Tecnologia - with the reference scholarship 2020.05708.BD and under the national support to R&D units grant, through the reference project UIDB/04436/2020 and UIDP/04436/2020.

## Conflicts of Interest

The authors declare no competing interests.

## References

1. Mikolajczyk, T, Ciobanu, I, Badea, D, Iliescu, A, Pizzamiglio, S, Schauer, T, Seel, T, Seiciu, P, Turner, D, Berteanu, M. "Advanced technology for gait rehabilitation: An overview". Advances in Mechanical Engineering 2018; 10(7):1–19.
2. Olesen, J, Gustavsson, A, Svensson, M, Wittchen, H, Jönsson, B. "The economic cost of brain disorders in Europe". European Journal of Neurology 2012; 19(1):155–162.
3. Moreira, R, Alves, J, Matias, A, Santos, C. "Smart and Assistive Walker – ASBGo: Rehabilitation Robotics: A Smart-Walker to Assist Ataxic Patients". Springer Nature Switzerland AG; 2019:37–68.
4. Moshe Gabel, Ran Gilad-Bachrach, Erin Renshaw, A. Schuster. "Full body gait analysis with Kinect". 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2012:1964-1967.
5. Springer, S, Yogev Seligmann, G. "Validity of the kinect for gait assessment: A focused review". Sensors 2016; 16(2):194.
6. Qifei Wang, G. Kurillo, Ferda Ofli, R. Bajcsy. "Evaluation of Pose Tracking Accuracy in the First and Second Generations of Microsoft Kinect". 2015 International Conference on Healthcare Informatics 2015:380-389.
7. Mehta, D, Sridhar, S, Sotnychenko, O, Rhodin, H, Shafiei, M, Seidel, HP, Xu, W, Casas, D, Theobalt, C. "Vnect: Real-time 3d human pose estimation with a single rgb camera". ACM Transactions on Graphics (TOG) 2017; 36(4):1–14.
8. Yucheng Chen, Yingli Tian, Mingyi He. "Monocular human pose estimation: A survey of deep learning-based methods". Computer Vision and Image Understanding 2020; 192:102897.
9. Ionescu, C, Papava, D, Olaru, V, Sminchisescu, C. "Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments". IEEE Transactions on Pattern Analysis and Machine Intelligence 2014; 36(7):1325-1339.
10. Mehta, D, Rhodin, H, Casas, D, Fua, P, Sotnychenko, O, Xu, W, Theobalt, C. "Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision". In 3D Vision (3DV), 2017 Fifth International Conference on 2017.
11. Trumble, M, Gilbert, A, Malleson, C, Hilton, A, Collomosse, J. "Total Capture: 3D Human Pose Estimation Fusing Video and Inertial Sensors". In 2017 British Machine Vision Conference (BMVC) 2017.
12. Joo, H, Simon, T, Li, X, Liu, H, Tan, L, Gui, L, Banerjee, S, Godisart, T, Nabbe, B, Matthews, I, Kanade, T, Nobuhara, S, Sheikh, Y. "Panoptic Studio: A Massively Multiview System for Social Interaction Capture". IEEE Transactions on Pattern Analysis and Machine Intelligence 2017.
13. Schreiber, C, Moissenet, F. "A multimodal dataset of human gait at different walking speeds established on injury-free adult participants". Scientific data 2019; 6(1):1–7.
14. Roetenberg, D, Luinge, H, Slycke, P. "Xsens MVN: Full 6DOF human motion tracking using miniature inertial sensors". Xsens Motion Technologies BV, Tech. Rep 2009; 1.
15. C.B. Beaman, C.L. Peterson, R.R. Neptune, S.A. Kautz. "Differences in self-selected and fastest-comfortable walking in post-stroke hemiparetic persons". Gait & Posture 2010; 31(3):311–316.
16. Manuel Palermo, Sara Moccia, Lucia Migliorelli, Emanuele Frontoni, Cristina P. Santos. "Real-time human pose estimation on a smart walker using convolutional neural networks". Expert Systems with Applications 2021; 184:115498.

##### Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

## Files

Total uncompressed size: 19.5 GB.

##### Access the files
wget -r -N -c -np https://physionet.org/files/multi-gait-posture/1.0.0/