Multimodal Art Pose Recognition and Interaction With Human Intelligence Enhancement

This paper provides an in-depth study and analysis of human artistic poses through intelligently enhanced multimodal artistic pose recognition. A complementary network model architecture for multimodal information based on motion energy is proposed. The network exploits both the rich information of appe...

Bibliographic Details
Main Authors: Ma, Chengming; Liu, Qian; Dang, Yaqi
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Psychology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606411/
https://www.ncbi.nlm.nih.gov/pubmed/34819900
http://dx.doi.org/10.3389/fpsyg.2021.769509
author Ma, Chengming
Liu, Qian
Dang, Yaqi
author_facet Ma, Chengming
Liu, Qian
Dang, Yaqi
author_sort Ma, Chengming
collection PubMed
description This paper provides an in-depth study and analysis of human artistic poses through intelligently enhanced multimodal artistic pose recognition. A complementary network model architecture for multimodal information based on motion energy is proposed. The network exploits both the rich appearance features provided by RGB data and the depth information provided by depth data, along with robustness to luminance and observation angle. Multimodal fusion is accomplished through the complementary information of the two modalities. Moreover, to better model long-range temporal structure while accounting for action classes that share sub-actions, an energy-guided video segmentation method is employed. In the feature fusion stage, a cross-modal cross-fusion approach is proposed, which enables the convolutional network not only to share local features of the two modalities in the shallow layers but also to fuse global features in the deep convolutional layers by connecting the feature maps of multiple convolutional layers. First, a Kinect camera is used to acquire color image data of the human body, depth image data, and, using the OpenPose open-source framework, 3D coordinate data of the skeletal points. Then, keyframes of the action are automatically extracted based on the distance between the hand and the head; relative distance features are extracted from the keyframes to describe the action, and local occupancy pattern features and HSV color space features are extracted to describe the object. Finally, feature fusion is performed and the complex action recognition task is completed.
To solve the consistency problem of virtual-reality fusion, the mapping between hand joint coordinates and the virtual scene is determined in the augmented reality scene, and a coordinate consistency model of the natural hand and the virtual model is established. Finally, real-time interaction between hand gestures and the virtual model is realized, with an average hand gesture recognition accuracy of 99.04%, which improves the robustness and real-time performance of hand gesture recognition.
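The keyframe step mentioned in the description can be illustrated with a minimal sketch. The record does not state the selection rule, so the criterion used here (a frame is a keyframe when the hand-head distance is a local minimum below a threshold) is an assumption, as are the function names, the frame layout, and the sample coordinates. It is not the authors' implementation.

```python
import math

def hand_head_distances(frames):
    """Euclidean distance between the hand and head joints in each frame.

    Each frame is a dict with hypothetical keys "hand" and "head",
    holding 3D coordinates such as those a skeleton tracker might return.
    """
    return [math.dist(f["hand"], f["head"]) for f in frames]

def select_keyframes(distances, threshold):
    """Indices of frames whose distance is a local minimum below threshold.

    Assumed rule for illustration only; the source does not specify one.
    """
    keys = []
    for i in range(1, len(distances) - 1):
        d = distances[i]
        if d < threshold and d <= distances[i - 1] and d < distances[i + 1]:
            keys.append(i)
    return keys

# Made-up sample: the hand rises toward the head, then drops again.
frames = [
    {"hand": (0.0, 0.0, 0.0), "head": (0.0, 0.8, 0.0)},
    {"hand": (0.0, 0.4, 0.0), "head": (0.0, 0.8, 0.0)},
    {"hand": (0.0, 0.7, 0.0), "head": (0.0, 0.8, 0.0)},
    {"hand": (0.0, 0.5, 0.0), "head": (0.0, 0.8, 0.0)},
    {"hand": (0.0, 0.1, 0.0), "head": (0.0, 0.8, 0.0)},
]
distances = hand_head_distances(frames)
keyframes = select_keyframes(distances, threshold=0.3)
```

In this sample the hand is closest to the head in the third frame, so that frame alone is selected. Relative distance features for describing the action would then be computed on the selected frames only.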
format Online
Article
Text
id pubmed-8606411
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8606411 2021-11-23 Multimodal Art Pose Recognition and Interaction With Human Intelligence Enhancement Ma, Chengming Liu, Qian Dang, Yaqi Front Psychol Psychology Frontiers Media S.A. 2021-11-08 /pmc/articles/PMC8606411/ /pubmed/34819900 http://dx.doi.org/10.3389/fpsyg.2021.769509 Text en Copyright © 2021 Ma, Liu and Dang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Psychology
Ma, Chengming
Liu, Qian
Dang, Yaqi
Multimodal Art Pose Recognition and Interaction With Human Intelligence Enhancement
title Multimodal Art Pose Recognition and Interaction With Human Intelligence Enhancement
title_sort multimodal art pose recognition and interaction with human intelligence enhancement
topic Psychology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8606411/
https://www.ncbi.nlm.nih.gov/pubmed/34819900
http://dx.doi.org/10.3389/fpsyg.2021.769509