
Gaze Tracking Based on Concatenating Spatial-Temporal Features

Bibliographic Details
Main Authors: Hwang, Bor-Jiunn, Chen, Hui-Hui, Hsieh, Chaur-Heh, Huang, Deng-Yu
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8781122/
https://www.ncbi.nlm.nih.gov/pubmed/35062502
http://dx.doi.org/10.3390/s22020545
_version_ 1784638013527031808
author Hwang, Bor-Jiunn
Chen, Hui-Hui
Hsieh, Chaur-Heh
Huang, Deng-Yu
author_facet Hwang, Bor-Jiunn
Chen, Hui-Hui
Hsieh, Chaur-Heh
Huang, Deng-Yu
author_sort Hwang, Bor-Jiunn
collection PubMed
description Based on experimental observations, there is a correlation between time and consecutive gaze positions in visual behaviors. Previous studies on gaze point estimation usually use images as the input for model training without taking into account the sequential relationship between image data. In this paper, temporal features are considered in addition to spatial features to improve accuracy, by using videos instead of images as the input data. To capture spatial and temporal features at the same time, a convolutional neural network (CNN) and a long short-term memory (LSTM) network are combined to build the training model: the CNN extracts spatial features, and the LSTM correlates temporal features. This paper presents a CNN Concatenating LSTM Network (CCLN) that concatenates spatial and temporal features to improve the performance of gaze estimation when time-series videos are used as the training data. In addition, the proposed model can be optimized by exploring the number of LSTM layers and the influence of batch normalization (BN) and the global average pooling (GAP) layer on the CCLN. It is generally believed that larger amounts of training data lead to better models; to provide data for training and prediction, we propose a method for constructing video datasets for gaze point estimation. Related issues are studied, including the effectiveness of commonly used general models and the impact of transfer learning. Through exhaustive evaluation, the proposed method is shown to achieve better prediction accuracy than existing CNN-based methods. Finally, accuracies of 93.1% for the best model and 92.6% for the general model MobileNet are obtained.
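The abstract describes concatenating per-frame spatial features (from a CNN) with a temporal feature carried across frames (from an LSTM) before a gaze-regression head. A minimal pure-Python sketch of that fusion idea follows; the stub feature extractors, dimensions, and weights here are illustrative assumptions, not the authors' actual CCLN network.

```python
# Sketch of the CCLN fusion idea: spatial features per frame (stub standing
# in for a CNN) + a temporal state updated over frames (stub standing in for
# an LSTM hidden state), concatenated and mapped to a 2-D gaze point.
import math
import random

random.seed(0)

FEAT_DIM = 8   # spatial feature size per frame (assumed)
HID_DIM = 4    # temporal (recurrent) state size (assumed)

def spatial_features(frame):
    """Stand-in for a CNN: pool a frame (list of pixel rows) into FEAT_DIM values."""
    flat = [p for row in frame for p in row]
    step = max(1, len(flat) // FEAT_DIM)
    return [sum(flat[i:i + step]) / step for i in range(0, step * FEAT_DIM, step)]

def temporal_step(h, feats, w):
    """Stand-in for an LSTM cell: bounded tanh recurrence over the frame features."""
    return [math.tanh(h[j] + w * sum(feats)) for j in range(len(h))]

def predict_gaze(video, w=0.01):
    """Run the frame sequence, concatenate the spatial and temporal branches,
    and apply a fixed linear head to produce a gaze point (x, y)."""
    h = [0.0] * HID_DIM
    for frame in video:
        s = spatial_features(frame)   # spatial branch (CNN stand-in)
        h = temporal_step(h, s, w)    # temporal branch (LSTM stand-in)
    fused = s + h                     # concatenation of both feature sets
    # Linear head with fixed illustrative weights.
    gx = sum(v * ((-1) ** i) for i, v in enumerate(fused))
    gy = sum(v / (i + 1) for i, v in enumerate(fused))
    return (gx, gy)

# Toy "video": 5 frames of 4x4 random pixel intensities.
video = [[[random.random() for _ in range(4)] for _ in range(4)] for _ in range(5)]
gx, gy = predict_gaze(video)
print(len(video), "frames ->", (round(gx, 3), round(gy, 3)))
```

In the paper's setting both branches are learned jointly from video; the sketch only shows where the concatenation sits relative to the two feature streams.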
format Online
Article
Text
id pubmed-8781122
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8781122 2022-01-22 Gaze Tracking Based on Concatenating Spatial-Temporal Features Hwang, Bor-Jiunn Chen, Hui-Hui Hsieh, Chaur-Heh Huang, Deng-Yu Sensors (Basel) Article
MDPI 2022-01-11 /pmc/articles/PMC8781122/ /pubmed/35062502 http://dx.doi.org/10.3390/s22020545 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Hwang, Bor-Jiunn
Chen, Hui-Hui
Hsieh, Chaur-Heh
Huang, Deng-Yu
Gaze Tracking Based on Concatenating Spatial-Temporal Features
title Gaze Tracking Based on Concatenating Spatial-Temporal Features
title_full Gaze Tracking Based on Concatenating Spatial-Temporal Features
title_fullStr Gaze Tracking Based on Concatenating Spatial-Temporal Features
title_full_unstemmed Gaze Tracking Based on Concatenating Spatial-Temporal Features
title_short Gaze Tracking Based on Concatenating Spatial-Temporal Features
title_sort gaze tracking based on concatenating spatial-temporal features
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8781122/
https://www.ncbi.nlm.nih.gov/pubmed/35062502
http://dx.doi.org/10.3390/s22020545
work_keys_str_mv AT hwangborjiunn gazetrackingbasedonconcatenatingspatialtemporalfeatures
AT chenhuihui gazetrackingbasedonconcatenatingspatialtemporalfeatures
AT hsiehchaurheh gazetrackingbasedonconcatenatingspatialtemporalfeatures
AT huangdengyu gazetrackingbasedonconcatenatingspatialtemporalfeatures