
Face-based age estimation using improved Swin Transformer with attention-based convolution

Recently, Transformer models have become a new direction in the computer vision field, based on the multi-head self-attention mechanism. Compared with convolutional neural networks, the Transformer uses self-attention to capture global contextual information and extract stronger feature...

Full description

Bibliographic Details
Main Authors: Shi, Chaojun, Zhao, Shiwei, Zhang, Ke, Wang, Yibo, Liang, Longping
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10130448/
https://www.ncbi.nlm.nih.gov/pubmed/37123378
http://dx.doi.org/10.3389/fnins.2023.1136934
_version_ 1785030960128983040
author Shi, Chaojun
Zhao, Shiwei
Zhang, Ke
Wang, Yibo
Liang, Longping
author_facet Shi, Chaojun
Zhao, Shiwei
Zhang, Ke
Wang, Yibo
Liang, Longping
author_sort Shi, Chaojun
collection PubMed
description Recently, Transformer models have become a new direction in the computer vision field, based on the multi-head self-attention mechanism. Compared with convolutional neural networks, the Transformer uses self-attention to capture global contextual information and extract stronger features by learning the relationships between different features, which has achieved good results in many vision tasks. In face-based age estimation, certain facial patches that contain rich age-specific information are critical to the task. The present study proposed an attention-based convolution (ABC) age estimation framework, called improved Swin Transformer with ABC, in which two separate modules were implemented, namely ABC and the Swin Transformer. ABC extracted facial patches containing rich age-specific information using a shallow convolutional network and a multi-head attention mechanism. Subsequently, the features obtained by ABC were spliced with the flattened image tokens in the Swin Transformer, which were then input to the Swin Transformer to predict the age of the image. The ABC framework spliced the important regions that contained rich age-specific information onto the original image, which could fully exploit the long-range dependency modeling of the Swin Transformer, that is, extracting stronger features by learning the dependency relationships between different features. ABC also introduced a diversity loss to guide the training of the self-attention mechanism, reducing the overlap between patches so that diverse and important patches were discovered. Through extensive experiments, this study showed that the proposed framework outperformed several state-of-the-art methods on age estimation benchmark datasets.
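The abstract describes two mechanisms: a multi-head attention module that scores facial patches for age-specific information (with the salient patch features spliced onto the image tokens), and a diversity loss that discourages different attention heads from selecting overlapping patches. The NumPy sketch below is illustrative only; the per-head query vectors, the patch dimensions, and the exact pairwise-overlap form of the diversity loss are assumptions for demonstration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def abc_patch_attention(patches, queries):
    """Score patches with a multi-head attention-style mechanism.
    patches: (N, D) patch features from a (hypothetical) shallow conv stem.
    queries: (H, D) one learned query per head (random here, for illustration).
    Returns per-head attention weights over the N patches, shape (H, N)."""
    _, D = patches.shape
    scores = queries @ patches.T / np.sqrt(D)      # (H, N) scaled dot products
    return softmax(scores, axis=-1)

def diversity_loss(attn):
    """Mean pairwise overlap between the attention maps of different heads;
    minimizing it pushes each head toward a distinct facial region."""
    H = attn.shape[0]
    overlap = attn @ attn.T                        # (H, H) inner products
    off_diag = overlap - np.diag(np.diag(overlap))
    return off_diag.sum() / (H * (H - 1))

# Toy example: a 7x7 grid of 32-dim patch features, 4 attention heads.
patches = rng.normal(size=(49, 32))
queries = rng.normal(size=(4, 32))
attn = abc_patch_attention(patches, queries)       # (4, 49)
important = attn.argmax(axis=-1)                   # one salient patch per head
loss = diversity_loss(attn)
# Splice the selected patch features onto the flattened image tokens,
# which would then be fed to the Swin Transformer backbone.
spliced = np.concatenate([patches, patches[important]], axis=0)  # (53, 32)
```

In this reading, the diversity loss is added to the age-prediction loss during training so that the heads jointly cover several informative regions of the face rather than all converging on one.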
format Online
Article
Text
id pubmed-10130448
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-101304482023-04-27 Face-based age estimation using improved Swin Transformer with attention-based convolution Shi, Chaojun Zhao, Shiwei Zhang, Ke Wang, Yibo Liang, Longping Front Neurosci Neuroscience Recently, Transformer models have become a new direction in the computer vision field, based on the multi-head self-attention mechanism. Compared with convolutional neural networks, the Transformer uses self-attention to capture global contextual information and extract stronger features by learning the relationships between different features, which has achieved good results in many vision tasks. In face-based age estimation, certain facial patches that contain rich age-specific information are critical to the task. The present study proposed an attention-based convolution (ABC) age estimation framework, called improved Swin Transformer with ABC, in which two separate modules were implemented, namely ABC and the Swin Transformer. ABC extracted facial patches containing rich age-specific information using a shallow convolutional network and a multi-head attention mechanism. Subsequently, the features obtained by ABC were spliced with the flattened image tokens in the Swin Transformer, which were then input to the Swin Transformer to predict the age of the image. The ABC framework spliced the important regions that contained rich age-specific information onto the original image, which could fully exploit the long-range dependency modeling of the Swin Transformer, that is, extracting stronger features by learning the dependency relationships between different features. ABC also introduced a diversity loss to guide the training of the self-attention mechanism, reducing the overlap between patches so that diverse and important patches were discovered. Through extensive experiments, this study showed that the proposed framework outperformed several state-of-the-art methods on age estimation benchmark datasets. Frontiers Media S.A. 
2023-04-12 /pmc/articles/PMC10130448/ /pubmed/37123378 http://dx.doi.org/10.3389/fnins.2023.1136934 Text en Copyright © 2023 Shi, Zhao, Zhang, Wang and Liang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Shi, Chaojun
Zhao, Shiwei
Zhang, Ke
Wang, Yibo
Liang, Longping
Face-based age estimation using improved Swin Transformer with attention-based convolution
title Face-based age estimation using improved Swin Transformer with attention-based convolution
title_full Face-based age estimation using improved Swin Transformer with attention-based convolution
title_fullStr Face-based age estimation using improved Swin Transformer with attention-based convolution
title_full_unstemmed Face-based age estimation using improved Swin Transformer with attention-based convolution
title_short Face-based age estimation using improved Swin Transformer with attention-based convolution
title_sort face-based age estimation using improved swin transformer with attention-based convolution
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10130448/
https://www.ncbi.nlm.nih.gov/pubmed/37123378
http://dx.doi.org/10.3389/fnins.2023.1136934
work_keys_str_mv AT shichaojun facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT zhaoshiwei facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT zhangke facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT wangyibo facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution
AT lianglongping facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution