
Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals

This paper proposes a novel deep neural network model for spatio-temporal action detection, which localizes every action region in an untrimmed video and classifies the corresponding action. The proposed model uses a spatio-temporal region proposal method to effectively d...


Bibliographic Details
Main Authors: Song, Yeongtaek; Kim, Incheol
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6427216/
https://www.ncbi.nlm.nih.gov/pubmed/30832433
http://dx.doi.org/10.3390/s19051085
author Song, Yeongtaek
Kim, Incheol
collection PubMed
description This paper proposes a novel deep neural network model for spatio-temporal action detection, which localizes every action region in an untrimmed video and classifies the corresponding action. The proposed model uses a spatio-temporal region proposal method to detect multiple action regions effectively. First, in the temporal region proposal, anchor boxes are generated for regions that are likely to contain actions. Unlike conventional temporal region proposal methods, the proposed method uses a complementary two-stage process to detect the temporal regions of actions that occur asynchronously. In addition, a spatial region proposal process is used to detect the principal agent performing an action among the people appearing in a video. Coarse-level features, which carry comprehensive information about the whole video, have been used frequently in conventional action-detection studies, but they cannot provide detailed information about each person performing an action. To overcome this limitation, the proposed model additionally learns fine-level features from the proposed action tubes in the video. Experiments on the LIRIS-HARL and UCF-10 datasets confirm the high performance and effectiveness of the proposed deep neural network model.
format Online
Article
Text
id pubmed-6427216
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6427216 2019-04-15 Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals. Song, Yeongtaek; Kim, Incheol. Sensors (Basel). Article. MDPI 2019-03-03 /pmc/articles/PMC6427216/ /pubmed/30832433 http://dx.doi.org/10.3390/s19051085 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals
topic Article