
Spatio-temporal categorization for first-person-view videos using a convolutional variational autoencoder and Gaussian processes

In this study, HcVGH, a method that learns spatio-temporal categories by segmenting first-person-view (FPV) videos captured by mobile robots, is proposed. Humans perceive continuous high-dimensional information by dividing and categorizing it into significant segments. This unsupervised segmentation...


Bibliographic Details
Main Authors:	Nagano, Masatoshi; Nakamura, Tomoaki; Nagai, Takayuki; Mochihashi, Daichi; Kobayashi, Ichiro
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Robotics and AI
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9562109/ https://www.ncbi.nlm.nih.gov/pubmed/36246490 http://dx.doi.org/10.3389/frobt.2022.903450

author	Nagano, Masatoshi; Nakamura, Tomoaki; Nagai, Takayuki; Mochihashi, Daichi; Kobayashi, Ichiro
author_sort	Nagano, Masatoshi
collection	PubMed
description	In this study, HcVGH, a method that learns spatio-temporal categories by segmenting first-person-view (FPV) videos captured by mobile robots, is proposed. Humans perceive continuous high-dimensional information by dividing and categorizing it into significant segments. This unsupervised segmentation capability is considered important for mobile robots to learn spatial knowledge. The proposed HcVGH combines a convolutional variational autoencoder (cVAE) with HVGH, a past method, which follows the hierarchical Dirichlet process-variational autoencoder-Gaussian process-hidden semi-Markov model comprising deep generative and statistical models. In the experiment, FPV videos of an agent were used in a simulated maze environment. FPV videos contain spatial information, and spatial knowledge can be learned by segmenting them. Using the FPV-video dataset, the segmentation performance of the proposed model was compared with previous models: HVGH and hierarchical recurrent state space model. The average segmentation F-measure achieved by HcVGH was 0.77; therefore, HcVGH outperformed the baseline methods. Furthermore, the experimental results showed that the parameters that represent the movability of the maze environment can be learned.
format	Online Article Text
id	pubmed-9562109
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9562109 2022-10-15 Front Robot AI Frontiers Media S.A. 2022-09-30 /pmc/articles/PMC9562109/ /pubmed/36246490 http://dx.doi.org/10.3389/frobt.2022.903450 Text en Copyright © 2022 Nagano, Nakamura, Nagai, Mochihashi and Kobayashi. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Spatio-temporal categorization for first-person-view videos using a convolutional variational autoencoder and Gaussian processes
topic	Robotics and AI
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9562109/ https://www.ncbi.nlm.nih.gov/pubmed/36246490 http://dx.doi.org/10.3389/frobt.2022.903450
