
Kinematic dataset of actors expressing emotions

Mingming Zhang Lu Yu Keye Zhang Bixuan Du Bin Zhan Shaohua Chen Xiuhao Jiang Shuai Guo Jiafeng Zhao Yang Wang Bin Wang Shenglan Liu Wenbo Luo

Published: July 7, 2020. Version: 2.1.0


When using this resource, please cite:
Zhang, M., Yu, L., Zhang, K., Du, B., Zhan, B., Chen, S., Jiang, X., Guo, S., Zhao, J., Wang, Y., Wang, B., Liu, S., & Luo, W. (2020). Kinematic dataset of actors expressing emotions (version 2.1.0). PhysioNet. https://doi.org/10.13026/kg8b-1t49.

Additionally, please cite the original publication:

Zhang, M., Yu, L., Zhang, K., Du, B., Zhan, B., Chen, S., Jiang, X., Guo, S., Zhao, J., Wang, Y., Wang, B., Liu, S., & Luo, W. (2020). Kinematic dataset of actors expressing emotions. Scientific Data, 7, 292. https://doi.org/10.1038/s41597-020-00635-7

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

We produced a kinematic dataset to assist in recognizing cues from all parts of the body that indicate human emotions (happy, sad, angry, fearful, disgust, and surprise) and neutral expressions. The dataset was created using a portable wireless motion capture system. Twenty-two semi-professional actors (50% female) completed the performances. A total of 1402 recordings at 125 Hz were collected, consisting of the position and rotation data of 72 anatomical nodes. We hope this dataset will contribute to multiple fields of research and practice, including social neuroscience, psychiatry, computer vision, and biometric and information forensics.


Background

Recognizing human emotions is crucial for social communication and survival. Emotions are expressed through various carriers and channels. Existing work in both psychology [1] and computer science [2, 3] focuses mainly on human faces and voices. Recently, psychologists have found that body movements can convey considerable emotional information compared with facial expressions, under both static and dynamic conditions [4, 5]. Therefore, human body indicators are essential for thorough emotion recognition.

Earlier efforts have created emotional body movement datasets using motion capture techniques, recording emotional expression during dancing, walking, and other actions [5]. However, most of these datasets provide finished products (e.g., point-light displays, videos) rather than raw kinematic data, which may capture more precisely how emotion is encoded in body movements. There is currently a lack of public kinematic datasets that capture humans expressing emotions.


Methods

The dataset was created using a wireless motion capture system (Noitom Perception Neuron, Noitom Technology Ltd., Beijing, China) with 17 wearable sensors sampling at 125 Hz [6-9]. These sensors were placed on both sides of the actors, including their upper and lower arms, hips, spine, head, feet, hands, shoulders, and upper and lower legs. A four-step calibration procedure using four successive static poses was carried out before the performances and whenever necessary (e.g., after a poor Wi-Fi signal or a rest). The actors performed on a square stage of 1 m × 1 m.

Twenty-four college students (13 females, mean age = 20.75 years, SD = 1.92) from the drama and dance clubs of the Dalian University of Technology were recruited as actors. Two females (F04 and F13) dropped out, leaving 22 actors. All of them gave written informed consent before performing and were informed that their motion data would be used only for scientific research. The study was approved by the Human Research Institutional Review Board of Liaoning Normal University in accordance with the Declaration of Helsinki (1991). After the recording phase, the actors were paid appropriately.

The actors started in a neutral stance (i.e., facing forward with arms naturally at their sides) and then completed a free performance followed by scenario performances for each emotion (happy, sad, angry, fearful, neutral, disgust, and surprise). The free performance was based on the actor's own understanding of the emotion; for the scenario performances, the order of scenarios was random. The actors had six seconds to complete each performance, after which it was reviewed and evaluated for signal quality; hence, some performances were repeated several times.


Data Description

A total of 1402 trials were collected. file_info.csv provides a summary of the recorded performances. Note that "0" in "scenario_ID" means free performance; "version" denotes the number of repetitions.
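
As a rough illustration of how the summary file might be used, the sketch below counts free versus scenario performances. It assumes that file_info.csv has a header row containing a "scenario_ID" column, as the description above suggests; the function name and the other column names in the usage example are hypothetical and may not match the actual file.

```python
import csv
import io
from collections import Counter


def count_by_performance_type(csv_text: str) -> Counter:
    """Count free vs. scenario performances in file_info.csv-style text.

    ASSUMPTION: the file has a header row with a "scenario_ID" column,
    where "0" marks a free performance (as stated in the description).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        kind = "free" if row["scenario_ID"].strip() == "0" else "scenario"
        counts[kind] += 1
    return counts
```

For a real file, the text could be obtained with `open("file_info.csv", newline="").read()` before calling the function; the exact path depends on where the dataset was downloaded.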

Each actor has their own folder (named with the actor ID) consisting of BVH files for all emotions. Each trial was named systematically as "<actor_ID><emotion><scenario_ID><version>", where:

  • actor_ID: represents the actor ID;
  • emotion: includes happy (H), sad (SA), neutral (N), angry (A), disgust (D), fearful (F), and surprise (SU);
  • scenario_ID: 0 for the free performance, or a numeral from 1 to 5 for the corresponding scenario performance;
  • version: denotes the number of repetitions (for details, see file_info.csv).
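
The naming scheme above can be decoded with a small parser. The following is a minimal sketch, not the authors' code: it assumes that actor IDs consist of one letter and two digits (as in F04 and F13), that the four fields are concatenated without separators, and that the version field is written as "V" followed by a number. These details are assumptions and should be checked against the actual file names; the example name "F01SA3V2.bvh" is hypothetical.

```python
import re
from typing import NamedTuple

# Emotion codes as listed in the dataset description.
EMOTIONS = {
    "H": "happy", "SA": "sad", "N": "neutral", "A": "angry",
    "D": "disgust", "F": "fearful", "SU": "surprise",
}

# ASSUMPTIONS: actor IDs are one letter plus two digits (e.g. "F04"), and the
# version is written as "V" plus a number (e.g. "V1"). Two-letter emotion
# codes are listed first so "SA"/"SU" are tried before single-letter codes.
TRIAL_NAME = re.compile(r"^([A-Z]\d{2})(SA|SU|H|N|A|D|F)([0-5])V(\d+)$")


class Trial(NamedTuple):
    actor_id: str
    emotion: str
    scenario_id: int  # 0 = free performance, 1-5 = scenario performance
    version: int


def parse_trial_name(name: str) -> Trial:
    """Split a trial name such as 'F01SA3V2.bvh' into its four fields."""
    stem = name[:-4] if name.lower().endswith(".bvh") else name
    match = TRIAL_NAME.match(stem)
    if match is None:
        raise ValueError(f"unrecognized trial name: {name!r}")
    actor, code, scenario, version = match.groups()
    return Trial(actor, EMOTIONS[code], int(scenario), int(version))
```

Names that do not fit the assumed pattern raise a ValueError rather than being silently misread, which makes it easy to spot any file that follows a different convention.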

Each BVH file is ASCII text with two sections (HIERARCHY and MOTION). The HIERARCHY section, which begins with the keyword HIERARCHY, defines the joint tree, the name of each node, the number of channels, and the relative position between joints (i.e., the bone length of each part of the human body). In total there are 72 nodes (1 Root, 58 Joints, and 13 End Sites) in this section, which are derived from the 17 sensors. The MOTION section records the motion data: for each frame, the position and rotation of each joint node are given in the order in which the joints are defined. The keywords used in a BVH file are:

  • HIERARCHY: beginning of the header section
  • ROOT: location of the Hips
  • JOINT: a skeletal joint, located relative to its parent-joint
  • CHANNELS: number of channels including position and rotation channels
  • OFFSET: X, Y, and Z offsets of the segment relative to its parent-joint
  • End Site: end of a JOINT which has no child-joint
  • MOTION: beginning of the second section
  • FRAMES: number of frames
  • Frame Time: sampling time per frame
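
The keywords listed above are enough to summarize a recording without a full BVH parser. The sketch below is an illustration only (the function name `summarize_bvh` is hypothetical): it counts ROOT, JOINT, and End Site entries, reads the "Frames:" and "Frame Time:" lines of the MOTION section as they appear in the standard BVH layout, and estimates the duration as the number of frames multiplied by the frame time. At a 125 Hz sampling rate, the frame time would be expected to be 0.008 s.

```python
def summarize_bvh(text: str) -> dict:
    """Summarize node counts, frame count, and duration of BVH text."""
    counts = {"ROOT": 0, "JOINT": 0, "End Site": 0}
    frames = None
    frame_time = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("ROOT "):
            counts["ROOT"] += 1
        elif line.startswith("JOINT "):
            counts["JOINT"] += 1
        elif line.startswith("End Site"):
            counts["End Site"] += 1
        elif line.startswith("Frames:"):
            frames = int(line.split(":", 1)[1])
        elif line.startswith("Frame Time:"):
            frame_time = float(line.split(":", 1)[1])
            break  # the remaining lines hold per-frame motion data
    if frames is None or frame_time is None:
        raise ValueError("MOTION header (Frames / Frame Time) not found")
    return {
        "nodes": sum(counts.values()),
        "counts": counts,
        "frames": frames,
        "frame_time_s": frame_time,
        "duration_s": frames * frame_time,  # duration = frames x frame time
    }
```

A file can be summarized with `summarize_bvh(open(path).read())`, where `path` points to one of the BVH files. According to the description above, the "nodes" value should be 72 for files in this dataset.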

The mass center of the first frame for each recording was used to evaluate the effect of calibration.


Usage Notes

BVH files are plain text and can be imported directly into popular software such as 3ds Max, MotionBuilder, and other open access 3D applications. The data can be reused to build different avatars in virtual reality and augmented reality products. Previous studies on emotion recognition in the field of computer and information science have mainly focused on human faces and voices; hence, the dataset created in this study may help to improve technologies and contribute to scientific research in fields such as psychiatry and psychology.


Release Notes

Added in version 2.0.0:

  • "Surprise" data added
  • file_info.csv and mass_center_of_first_frame.csv updated.

Added in version 2.1.0:

  • export_frame.m is the MATLAB code used to extract the number of frames from BVH files in order to calculate durations.

Acknowledgements

We thank S. Liu and X. Yi for their contribution to the data collation. This work was supported by the National Natural Science Foundation of China (31871106).


Conflicts of Interest

The authors declare no conflict of interest.


References

  1. Perception Neuron website: https://neuronmocap.com/content/axis-neuron [Accessed 20 April 2019]
  2. Robert-Lachaine, X., Mecheri, H., Muller, A., Larue, C. & Plamondon, A. Validation of a low-cost inertial motion capture system for whole-body motion analysis. J. Biomech. 99, 109520 (2020).
  3. Kim, H. S. et al. Application of a perception neuron system in simulation-based surgical training. J Clin Med 8 (2019).
  4. Sers, R. et al. Validity of the perception neuron inertial motion capture system for upper body motion analysis. Measurement 149 (2020).
  5. Atkinson, A. P., Dittrich, W. H., Gemmell, A. J. & Young, A. W. Emotion perception from dynamic and static body expressions in point-light and full-light displays. Perception 33, 717-746 (2004).
  6. de Gelder, B. & Van den Stock, J. The bodily expressive action stimulus test (BEAST). Construction and validation of a stimulus basis for measuring perception of whole body expression of emotions. Front. Psychol. 2, 181 (2011).
  7. Lalitha, S., Madhavan, A., Bhushan, B. & Saketh, S. Speech emotion recognition. 2014 International Conference on Advances in Electronics, Computers and Communications (ICAECC) (2014).
  8. Schuller, B., Rigoll, G. & Lang, M. Hidden markov model-based speech emotion recognition. 2003 International Conference on Multimedia and Expo, Vol I, Proceedings, 401-404 (2003).
  9. de Gelder, B. Why bodies? Twelve reasons for including bodily expressions in affective neuroscience. Philos T R Soc B 364, 3475-3484 (2009).

Access

Access Policy:
Only logged in users who sign the specified data use agreement can access the files.

License (for files):
PhysioNet Restricted Health Data License 1.5.0
