Gait-ViT: Gait Recognition with Vision Transformer
Main Authors: | Mogan, Jashila Nair; Lee, Chin Poo; Lim, Kian Ming; Muthu, Kalaiarasi Sonai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9572525/ https://www.ncbi.nlm.nih.gov/pubmed/36236462 http://dx.doi.org/10.3390/s22197362 |
_version_ | 1784810636797018112 |
---|---|
author | Mogan, Jashila Nair Lee, Chin Poo Lim, Kian Ming Muthu, Kalaiarasi Sonai |
author_facet | Mogan, Jashila Nair Lee, Chin Poo Lim, Kian Ming Muthu, Kalaiarasi Sonai |
author_sort | Mogan, Jashila Nair |
collection | PubMed |
description | Identifying an individual based on their physical or behavioral characteristics is known as biometric recognition. Gait is one of the most reliable biometrics due to its advantages, such as being perceivable at a long distance and difficult to replicate. Existing works mostly leverage Convolutional Neural Networks for gait recognition. Convolutional Neural Networks perform well in image recognition tasks; however, they lack an attention mechanism to emphasize the significant regions of the image. The attention mechanism encodes information in the image patches, which helps the model learn the substantial features in specific regions. In light of this, this work employs the Vision Transformer (ViT) with an attention mechanism for gait recognition, referred to as Gait-ViT. In the proposed Gait-ViT, the gait energy image is first obtained by averaging the series of images over the gait cycle. The image is then split into patches and transformed into a sequence by flattening and patch embedding. Position embeddings are applied to the sequence of patch embeddings to restore the positional information of the patches. Subsequently, the sequence of vectors is fed to the Transformer encoder to produce the final gait representation. For classification, the first element of the sequence is sent to a multi-layer perceptron to predict the class label. The proposed method obtained 99.93% accuracy on CASIA-B, 100% on OU-ISIR D, and 99.51% on OU-LP, demonstrating the ability of the Vision Transformer model to outperform the state-of-the-art methods. |
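The pipeline in the abstract (gait energy image, patchify, patch and position embedding, Transformer encoder, MLP head on the class token) can be sketched in NumPy. This is a minimal illustration with random weights standing in for learned parameters, a single single-head attention block in place of the full encoder, and illustrative sizes (64×64 silhouettes, 16×16 patches, 64-dim embeddings, 10 classes); it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gait_energy_image(frames):
    """GEI: pixel-wise average of binary silhouettes over one gait cycle."""
    return np.mean(np.stack(frames, axis=0), axis=0)

def patchify(img, patch=16):
    """Split an HxW image into flattened, non-overlapping patch vectors."""
    h, w = img.shape
    r, c = h // patch, w // patch
    return (img[:r * patch, :c * patch]
            .reshape(r, patch, c, patch)
            .swapaxes(1, 2)
            .reshape(r * c, patch * patch))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vit_forward(gei, patch=16, dim=64, n_classes=10):
    """One-block, single-head ViT forward pass with random (untrained) weights."""
    p = patchify(gei, patch)                            # (n_patches, patch*patch)
    W_e = rng.standard_normal((p.shape[1], dim)) * 0.02  # patch-embedding projection
    cls = np.zeros((1, dim))                             # class token (learned in practice)
    pos = rng.standard_normal((p.shape[0] + 1, dim)) * 0.02
    x = np.concatenate([cls, p @ W_e], axis=0) + pos     # patch + position embedding

    # single-head self-attention with a residual connection
    Wq, Wk, Wv = (rng.standard_normal((dim, dim)) * 0.02 for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    x = x + softmax(q @ k.T / np.sqrt(dim)) @ v

    # MLP head reads the first element of the sequence (the class token)
    W_h = rng.standard_normal((dim, n_classes)) * 0.02
    return x[0] @ W_h                                    # class logits

frames = [(rng.random((64, 64)) > 0.5).astype(float) for _ in range(20)]
logits = vit_forward(gait_energy_image(frames))
print(logits.shape)   # (10,)
```

A 64×64 GEI with 16×16 patches yields 16 patch vectors of length 256; prepending the class token gives a sequence of 17 tokens, and only the first is passed to the classification head, mirroring the description above.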
format | Online Article Text |
id | pubmed-9572525 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9572525 2022-10-17 Gait-ViT: Gait Recognition with Vision Transformer Mogan, Jashila Nair Lee, Chin Poo Lim, Kian Ming Muthu, Kalaiarasi Sonai Sensors (Basel) Article Identifying an individual based on their physical/behavioral characteristics is known as biometric recognition. Gait is one of the most reliable biometrics due to its advantages, such as being perceivable at a long distance and difficult to replicate. The existing works mostly leverage Convolutional Neural Networks for gait recognition. The Convolutional Neural Networks perform well in image recognition tasks; however, they lack the attention mechanism to emphasize more on the significant regions of the image. The attention mechanism encodes information in the image patches, which facilitates the model to learn the substantial features in the specific regions. In light of this, this work employs the Vision Transformer (ViT) with an attention mechanism for gait recognition, referred to as Gait-ViT. In the proposed Gait-ViT, the gait energy image is first obtained by averaging the series of images over the gait cycle. The images are then split into patches and transformed into sequences by flattening and patch embedding. Position embedding, along with patch embedding, are applied on the sequence of patches to restore the positional information of the patches. Subsequently, the sequence of vectors is fed to the Transformer encoder to produce the final gait representation. As for the classification, the first element of the sequence is sent to the multi-layer perceptron to predict the class label. The proposed method obtained 99.93% on CASIA-B, 100% on OU-ISIR D and 99.51% on OU-LP, which exhibit the ability of the Vision Transformer model to outperform the state-of-the-art methods. MDPI 2022-09-28 /pmc/articles/PMC9572525/ /pubmed/36236462 http://dx.doi.org/10.3390/s22197362 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Mogan, Jashila Nair Lee, Chin Poo Lim, Kian Ming Muthu, Kalaiarasi Sonai Gait-ViT: Gait Recognition with Vision Transformer |
title | Gait-ViT: Gait Recognition with Vision Transformer |
title_full | Gait-ViT: Gait Recognition with Vision Transformer |
title_fullStr | Gait-ViT: Gait Recognition with Vision Transformer |
title_full_unstemmed | Gait-ViT: Gait Recognition with Vision Transformer |
title_short | Gait-ViT: Gait Recognition with Vision Transformer |
title_sort | gait-vit: gait recognition with vision transformer |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9572525/ https://www.ncbi.nlm.nih.gov/pubmed/36236462 http://dx.doi.org/10.3390/s22197362 |
work_keys_str_mv | AT moganjashilanair gaitvitgaitrecognitionwithvisiontransformer AT leechinpoo gaitvitgaitrecognitionwithvisiontransformer AT limkianming gaitvitgaitrecognitionwithvisiontransformer AT muthukalaiarasisonai gaitvitgaitrecognitionwithvisiontransformer |