
Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals


Bibliographic Details
Main authors:	Song, Yeongtaek; Kim, Incheol
Format:	Online Article Text
Language:	English
Published:	MDPI 2019
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6427216/ https://www.ncbi.nlm.nih.gov/pubmed/30832433 http://dx.doi.org/10.3390/s19051085

collection	PubMed
description	This paper proposes a novel deep neural network model for spatio-temporal action detection, which localizes every action region in an untrimmed video and classifies the corresponding action. The proposed model uses a spatio-temporal region proposal method to detect multiple action regions effectively. First, in the temporal region proposal, anchor boxes are generated by targeting regions expected to contain actions. Unlike conventional temporal region proposal methods, the proposed method uses a complementary two-stage method to detect the temporal regions of actions that occur asynchronously. In addition, a spatial region proposal process is used to detect the principal agent performing an action among the people appearing in a video. Coarse-level features, which contain comprehensive information about the whole video, have been used frequently in conventional action-detection studies; however, they cannot provide detailed information about each person performing an action. To overcome this limitation, the proposed model additionally learns fine-level features from the proposed action tubes in the video. Experiments conducted on the LIRIS-HARL and UCF-10 datasets confirm the high performance and effectiveness of the proposed deep neural network model.
id	pubmed-6427216
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-6427216 2019-04-15 Sensors (Basel) Article MDPI 2019-03-03 /pmc/articles/PMC6427216/ /pubmed/30832433 http://dx.doi.org/10.3390/s19051085 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
