Understanding the Limits of 2D Skeletons for Action Recognition

J 2021

ELIÁŠ, Petr, Jan SEDMIDUBSKÝ and Pavel ZEZULA

Basic information

Original name

Understanding the Limits of 2D Skeletons for Action Recognition

Authors

ELIÁŠ, Petr (203 Czech Republic, belonging to the institution), Jan SEDMIDUBSKÝ (203 Czech Republic, belonging to the institution) and Pavel ZEZULA (203 Czech Republic, belonging to the institution)

Edition

Multimedia Systems, 2021, 0942-4962

Other information

Language

English

Type of outcome

Article in a specialist periodical

Field of Study

10200 1.2 Computer and information sciences

Country of publisher

United States of America

Confidentiality degree

is not subject to a state or trade secret

Impact factor

2.603

RIV identification code

RIV/00216224:14330/21:00118833

Organization unit

Faculty of Informatics

DOI

http://dx.doi.org/10.1007/s00530-021-00754-0

UT WoS

000615767700001

Keywords in English

skeleton sequence;2D skeleton data;3D skeleton data;action recognition;normalization

Abstract

In the original

With the development of motion capture technologies, 3D action recognition has become a popular task that finds great applicability in many areas, such as augmented reality, human–computer interaction, sports, or healthcare. On the other hand, the acquisition of 3D human skeleton data is an expensive and time-consuming process, mainly due to the high costs of capturing technologies and the absence of suitable actors. We overcome these issues by focusing on the 2D skeleton modality that can be easily extracted from ordinary videos. The objective of this work is to demonstrate a high descriptive power of such a 2D skeleton modality by achieving accuracy on the task of daily action recognition competitive to 3D skeleton data. More importantly, we thoroughly analyze the factors that significantly influence the 2D recognition accuracy, such as the sensitivity towards data normalization, scaling, quantization, and 3D-to-2D distortions in skeleton orientations and sizes, which are caused by the loss of depth dimension and fixed-angle camera view. We also provide valuable insights on how to mitigate these problems to increase recognition accuracy significantly. The experimental evaluation is conducted on three datasets different in nature. The ability to learn different types of actions better using either 2D or 3D skeletons is also reported. Throughout experiments, a generic light-weight LSTM network is used, whose architecture can be easily tuned to achieve the desired trade-off between its accuracy and efficiency. We show that the proposed approach achieves not only the state-of-the-art results in 2D skeleton action recognition but is also highly competitive to the best-performing methods classifying 3D skeleton sequences or the visual content extracted from ordinary videos.

Links

GA19-02033S, research and development project

Name: Search, analytics, and annotation of human motion data streams

Investor: Czech Science Foundation

Cite

ELIÁŠ, Petr, Jan SEDMIDUBSKÝ and Pavel ZEZULA. Understanding the Limits of 2D Skeletons for Action Recognition. Multimedia Systems. 2021, vol. 27, No 3, p. 547-561. ISSN 0942-4962. Available from: https://dx.doi.org/10.1007/s00530-021-00754-0.

@article{1730757,
   author = {Eliáš, Petr and Sedmidubský, Jan and Zezula, Pavel},
   number = {3},
   pages = {547--561},
   doi = {10.1007/s00530-021-00754-0},
   keywords = {skeleton sequence;2D skeleton data;3D skeleton data;action recognition;normalization},
   language = {eng},
   issn = {0942-4962},
   journal = {Multimedia Systems},
   title = {Understanding the Limits of 2D Skeletons for Action Recognition},
   url = {https://link.springer.com/article/10.1007/s00530-021-00754-0},
   volume = {27},
   year = {2021}
}

TY  - JOUR
ID  - 1730757
AU  - Eliáš, Petr
AU  - Sedmidubský, Jan
AU  - Zezula, Pavel
PY  - 2021
TI  - Understanding the Limits of 2D Skeletons for Action Recognition
JF  - Multimedia Systems
VL  - 27
IS  - 3
SP  - 547
EP  - 561
SN  - 0942-4962
KW  - skeleton sequence
KW  - 2D skeleton data
KW  - 3D skeleton data
KW  - action recognition
KW  - normalization
UR  - https://link.springer.com/article/10.1007/s00530-021-00754-0
N2  - With the development of motion capture technologies, 3D action recognition has become a popular task that finds great applicability in many areas, such as augmented reality, human–computer interaction, sports, or healthcare. On the other hand, the acquisition of 3D human skeleton data is an expensive and time-consuming process, mainly due to the high costs of capturing technologies and the absence of suitable actors. We overcome these issues by focusing on the 2D skeleton modality that can be easily extracted from ordinary videos. The objective of this work is to demonstrate a high descriptive power of such a 2D skeleton modality by achieving accuracy on the task of daily action recognition competitive to 3D skeleton data. More importantly, we thoroughly analyze the factors that significantly influence the 2D recognition accuracy, such as the sensitivity towards data normalization, scaling, quantization, and 3D-to-2D distortions in skeleton orientations and sizes, which are caused by the loss of depth dimension and fixed-angle camera view. We also provide valuable insights on how to mitigate these problems to increase recognition accuracy significantly. The experimental evaluation is conducted on three datasets different in nature. The ability to learn different types of actions better using either 2D or 3D skeletons is also reported. Throughout experiments, a generic light-weight LSTM network is used, whose architecture can be easily tuned to achieve the desired trade-off between its accuracy and efficiency. We show that the proposed approach achieves not only the state-of-the-art results in 2D skeleton action recognition but is also highly competitive to the best-performing methods classifying 3D skeleton sequences or the visual content extracted from ordinary videos.
ER  -

ELIÁŠ, Petr, Jan SEDMIDUBSKÝ and Pavel ZEZULA. Understanding the Limits of 2D Skeletons for Action Recognition. \textit{Multimedia Systems}. 2021, vol.~27, No~3, p.~547-561. ISSN~0942-4962. Available from: https://dx.doi.org/10.1007/s00530-021-00754-0.
