Gaze Estimation Based on Convolutional Structure and Sliding Window-Based Attention Mechanism

The direction of human gaze is an important indicator of human behavior, reflecting the level of attention and cognitive state towards various visual stimuli in the environment. Convolutional neural networks have achieved good performance in gaze estimation tasks, but their global modeling capabilit...

Bibliographic Details
Main Authors:	Li, Yujie; Chen, Jiahui; Ma, Jiaxin; Wang, Xiwen; Zhang, Wei
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10346721/ https://www.ncbi.nlm.nih.gov/pubmed/37448073 http://dx.doi.org/10.3390/s23136226

author	Li, Yujie; Chen, Jiahui; Ma, Jiaxin; Wang, Xiwen; Zhang, Wei
author_sort	Li, Yujie
collection	PubMed
description	The direction of human gaze is an important indicator of human behavior, reflecting the level of attention and cognitive state towards various visual stimuli in the environment. Convolutional neural networks have achieved good performance in gaze estimation tasks, but their global modeling capability is limited, making it difficult to further improve prediction performance. In recent years, transformer models have been introduced for gaze estimation and have achieved state-of-the-art performance. However, their slicing-and-mapping mechanism for processing local image patches can compromise local spatial information. Moreover, the single down-sampling rate and fixed-size tokens are not suitable for multiscale feature learning in gaze estimation tasks. To overcome these limitations, this study introduces a Swin Transformer for gaze estimation and designs two network architectures: a pure Swin Transformer gaze estimation model (SwinT-GE) and a hybrid gaze estimation model that combines convolutional structures with SwinT-GE (Res-Swin-GE). SwinT-GE uses the tiny version of the Swin Transformer for gaze estimation. Res-Swin-GE replaces the slicing-and-mapping mechanism of SwinT-GE with convolutional structures. Experimental results demonstrate that Res-Swin-GE significantly outperforms SwinT-GE, exhibiting strong competitiveness on the MpiiFaceGaze dataset and achieving a 7.5% performance improvement over existing state-of-the-art methods on the Eyediap dataset.
format	Online Article Text
id	pubmed-10346721
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10346721 2023-07-15 Gaze Estimation Based on Convolutional Structure and Sliding Window-Based Attention Mechanism Li, Yujie; Chen, Jiahui; Ma, Jiaxin; Wang, Xiwen; Zhang, Wei Sensors (Basel) Article MDPI 2023-07-07 /pmc/articles/PMC10346721/ /pubmed/37448073 http://dx.doi.org/10.3390/s23136226 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Gaze Estimation Based on Convolutional Structure and Sliding Window-Based Attention Mechanism
title_sort	gaze estimation based on convolutional structure and sliding window-based attention mechanism
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10346721/ https://www.ncbi.nlm.nih.gov/pubmed/37448073 http://dx.doi.org/10.3390/s23136226