
Contrastive self-supervised representation learning without negative samples for multimodal human action recognition

Action recognition is an important component of human-computer interaction, and multimodal feature representation and learning methods can be used to improve recognition performance due to the interrelation and complementarity between different modalities. However, due to the lack of large-scale labeled samples, the performance of existing ConvNets-based methods is severely constrained. In this paper, a novel and effective multimodal feature representation and contrastive self-supervised learning framework is proposed to improve the action recognition performance of models and their generalization ability across application scenarios. The proposed recognition framework employs weight sharing between two branches and does not require negative samples, so it can effectively learn useful feature representations from multimodal unlabeled data, e.g., skeleton sequences and inertial measurement unit (IMU) signals. Extensive experiments are conducted on two benchmarks, UTD-MHAD and MMAct, and the results show that the proposed recognition framework outperforms both unimodal and multimodal baselines in action retrieval, semi-supervised learning, and zero-shot learning scenarios.


Bibliographic Details
Main Authors: Yang, Huaigang, Ren, Ziliang, Yuan, Huaqiang, Xu, Zhenyu, Zhou, Jun
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10354269/
https://www.ncbi.nlm.nih.gov/pubmed/37476841
http://dx.doi.org/10.3389/fnins.2023.1225312
_version_ 1785074891764006912
author Yang, Huaigang
Ren, Ziliang
Yuan, Huaqiang
Xu, Zhenyu
Zhou, Jun
author_facet Yang, Huaigang
Ren, Ziliang
Yuan, Huaqiang
Xu, Zhenyu
Zhou, Jun
author_sort Yang, Huaigang
collection PubMed
description Action recognition is an important component of human-computer interaction, and multimodal feature representation and learning methods can be used to improve recognition performance due to the interrelation and complementarity between different modalities. However, due to the lack of large-scale labeled samples, the performance of existing ConvNets-based methods is severely constrained. In this paper, a novel and effective multimodal feature representation and contrastive self-supervised learning framework is proposed to improve the action recognition performance of models and their generalization ability across application scenarios. The proposed recognition framework employs weight sharing between two branches and does not require negative samples, so it can effectively learn useful feature representations from multimodal unlabeled data, e.g., skeleton sequences and inertial measurement unit (IMU) signals. Extensive experiments are conducted on two benchmarks, UTD-MHAD and MMAct, and the results show that the proposed recognition framework outperforms both unimodal and multimodal baselines in action retrieval, semi-supervised learning, and zero-shot learning scenarios.
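The record contains no code; the following is a minimal, hypothetical sketch of the kind of negative-sample-free, two-branch contrastive setup the abstract describes, written in the style of SimSiam: modality-specific encoders for skeleton and IMU inputs feed a projector and predictor whose weights are shared between the two branches, and each branch's prediction is pulled toward the other branch's stop-gradient projection, so no negative pairs are needed. All layer sizes, the cosine-similarity loss, and the class and function names are assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalSiam(nn.Module):
    # Hypothetical sketch (not the paper's code): a SimSiam-style, negative-free
    # contrastive model over two modalities with projector/predictor weights
    # shared between the two branches.
    def __init__(self, skel_dim, imu_dim, feat_dim=256):
        super().__init__()
        # Modality-specific encoders map each input to a common feature space.
        self.skel_enc = nn.Sequential(nn.Linear(skel_dim, 512), nn.ReLU(),
                                      nn.Linear(512, feat_dim))
        self.imu_enc = nn.Sequential(nn.Linear(imu_dim, 512), nn.ReLU(),
                                     nn.Linear(512, feat_dim))
        # Projector and predictor weights are shared by both branches.
        self.projector = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                       nn.Linear(feat_dim, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                       nn.Linear(feat_dim, feat_dim))

    def forward(self, skel, imu):
        z_s = self.projector(self.skel_enc(skel))   # skeleton projection
        z_i = self.projector(self.imu_enc(imu))     # IMU projection
        p_s, p_i = self.predictor(z_s), self.predictor(z_i)
        # Symmetric negative-free objective: each prediction is pulled toward
        # the other branch's stop-gradient projection; no negative pairs used.
        loss = -(F.cosine_similarity(p_s, z_i.detach(), dim=-1).mean()
                 + F.cosine_similarity(p_i, z_s.detach(), dim=-1).mean()) / 2
        return loss

# Toy usage on a batch of unlabeled, paired samples; the flattened input
# dimensions (150 for skeleton, 120 for IMU) are placeholders.
model = CrossModalSiam(skel_dim=150, imu_dim=120)
loss = model(torch.randn(8, 150), torch.randn(8, 120))
loss.backward()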
format Online
Article
Text
id pubmed-10354269
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10354269 2023-07-20 Contrastive self-supervised representation learning without negative samples for multimodal human action recognition Yang, Huaigang Ren, Ziliang Yuan, Huaqiang Xu, Zhenyu Zhou, Jun Front Neurosci Neuroscience Frontiers Media S.A. 2023-07-05 /pmc/articles/PMC10354269/ /pubmed/37476841 http://dx.doi.org/10.3389/fnins.2023.1225312 Text en Copyright © 2023 Yang, Ren, Yuan, Xu and Zhou. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Yang, Huaigang
Ren, Ziliang
Yuan, Huaqiang
Xu, Zhenyu
Zhou, Jun
Contrastive self-supervised representation learning without negative samples for multimodal human action recognition
title Contrastive self-supervised representation learning without negative samples for multimodal human action recognition
title_full Contrastive self-supervised representation learning without negative samples for multimodal human action recognition
title_fullStr Contrastive self-supervised representation learning without negative samples for multimodal human action recognition
title_full_unstemmed Contrastive self-supervised representation learning without negative samples for multimodal human action recognition
title_short Contrastive self-supervised representation learning without negative samples for multimodal human action recognition
title_sort contrastive self-supervised representation learning without negative samples for multimodal human action recognition
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10354269/
https://www.ncbi.nlm.nih.gov/pubmed/37476841
http://dx.doi.org/10.3389/fnins.2023.1225312
work_keys_str_mv AT yanghuaigang contrastiveselfsupervisedrepresentationlearningwithoutnegativesamplesformultimodalhumanactionrecognition
AT renziliang contrastiveselfsupervisedrepresentationlearningwithoutnegativesamplesformultimodalhumanactionrecognition
AT yuanhuaqiang contrastiveselfsupervisedrepresentationlearningwithoutnegativesamplesformultimodalhumanactionrecognition
AT xuzhenyu contrastiveselfsupervisedrepresentationlearningwithoutnegativesamplesformultimodalhumanactionrecognition
AT zhoujun contrastiveselfsupervisedrepresentationlearningwithoutnegativesamplesformultimodalhumanactionrecognition