
Fusion of Video and Inertial Sensing for Deep Learning–Based Human Action Recognition


Bibliographic Details
Main Authors: Wei, Haoran; Jafari, Roozbeh; Kehtarnavaz, Nasser
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749419/
https://www.ncbi.nlm.nih.gov/pubmed/31450609
http://dx.doi.org/10.3390/s19173680
author Wei, Haoran
Jafari, Roozbeh
Kehtarnavaz, Nasser
collection PubMed
description This paper presents the simultaneous utilization of video images and inertial signals, captured at the same time by a video camera and a wearable inertial sensor, within a fusion framework to achieve more robust human action recognition than when either sensing modality is used individually. The data captured by these sensors are converted into 3D video images and 2D inertial images, which are then fed as inputs into a 3D convolutional neural network and a 2D convolutional neural network, respectively, for recognizing actions. Two types of fusion are considered: decision-level fusion and feature-level fusion. Experiments are conducted on the publicly available UTD-MHAD dataset, in which simultaneous video images and inertial signals are captured for a total of 27 actions. The results indicate that both the decision-level and feature-level fusion approaches achieve higher recognition accuracies than either sensing modality used individually. The highest accuracy, 95.6%, is obtained with the decision-level fusion approach.
format Online
Article
Text
id pubmed-6749419
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6749419 2019-09-27 Fusion of Video and Inertial Sensing for Deep Learning–Based Human Action Recognition Wei, Haoran Jafari, Roozbeh Kehtarnavaz, Nasser Sensors (Basel) Article This paper presents the simultaneous utilization of video images and inertial signals, captured at the same time by a video camera and a wearable inertial sensor, within a fusion framework to achieve more robust human action recognition than when either sensing modality is used individually. The data captured by these sensors are converted into 3D video images and 2D inertial images, which are then fed as inputs into a 3D convolutional neural network and a 2D convolutional neural network, respectively, for recognizing actions. Two types of fusion are considered: decision-level fusion and feature-level fusion. Experiments are conducted on the publicly available UTD-MHAD dataset, in which simultaneous video images and inertial signals are captured for a total of 27 actions. The results indicate that both the decision-level and feature-level fusion approaches achieve higher recognition accuracies than either sensing modality used individually. The highest accuracy, 95.6%, is obtained with the decision-level fusion approach. MDPI 2019-08-24 /pmc/articles/PMC6749419/ /pubmed/31450609 http://dx.doi.org/10.3390/s19173680 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Fusion of Video and Inertial Sensing for Deep Learning–Based Human Action Recognition
topic Article
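The decision-level fusion described in the abstract combines the class scores produced by the two networks before the final decision is made. A minimal sketch of one common realization, averaging the per-class scores of the video and inertial networks, is shown below; the averaging rule and the toy scores are illustrative assumptions, not details taken from this record.

```python
import numpy as np

def decision_level_fusion(video_scores: np.ndarray,
                          inertial_scores: np.ndarray) -> int:
    """Average per-class scores from the video 3D-CNN and the
    inertial 2D-CNN, then return the index of the winning action
    class. (Score averaging is one illustrative fusion rule.)"""
    fused = (video_scores + inertial_scores) / 2.0
    return int(np.argmax(fused))

# Toy example with 5 classes instead of the 27 UTD-MHAD actions:
video = np.array([0.10, 0.60, 0.10, 0.10, 0.10])      # video-network softmax
inertial = np.array([0.20, 0.30, 0.40, 0.05, 0.05])   # inertial-network softmax
print(decision_level_fusion(video, inertial))  # → 1 (class 1 wins after fusion)
```

Feature-level fusion, by contrast, would concatenate intermediate network features and classify the combined vector, rather than combining the two networks' final scores.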