Face-based age estimation using improved Swin Transformer with attention-based convolution
Recently, Transformer models have become a new direction in the computer vision field, based on the multihead self-attention mechanism. Compared with convolutional neural networks, the Transformer uses the self-attention mechanism to capture global contextual information and extract stronger features...
Main Authors: | Shi, Chaojun, Zhao, Shiwei, Zhang, Ke, Wang, Yibo, Liang, Longping |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Frontiers Media S.A.
2023
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10130448/ https://www.ncbi.nlm.nih.gov/pubmed/37123378 http://dx.doi.org/10.3389/fnins.2023.1136934 |
_version_ | 1785030960128983040 |
---|---|
author | Shi, Chaojun Zhao, Shiwei Zhang, Ke Wang, Yibo Liang, Longping |
author_facet | Shi, Chaojun Zhao, Shiwei Zhang, Ke Wang, Yibo Liang, Longping |
author_sort | Shi, Chaojun |
collection | PubMed |
description | Recently, Transformer models have become a new direction in the computer vision field, based on the multihead self-attention mechanism. Compared with convolutional neural networks, the Transformer uses the self-attention mechanism to capture global contextual information and extract stronger features by learning the relationships between different features, and it has achieved good results in many vision tasks. In face-based age estimation, certain facial patches that contain rich age-specific information are critical to the task. The present study proposed an attention-based convolution (ABC) age estimation framework, called improved Swin Transformer with ABC, in which two separate modules were implemented, namely ABC and the Swin Transformer. ABC extracted facial patches containing rich age-specific information using a shallow convolutional network and a multiheaded attention mechanism. Subsequently, the features obtained by ABC were spliced with the flattened image in the Swin Transformer, and the result was input to the Swin Transformer to predict the age of the image. The ABC framework spliced the important regions that contained rich age-specific information into the original image, which could fully exploit the long-range dependency modeling of the Swin Transformer, that is, extracting stronger features by learning the dependency relationships between different features. ABC also introduced a diversity loss to guide the training of the self-attention mechanism, reducing overlap between patches so that diverse and important patches were discovered. Through extensive experiments, this study showed that the proposed framework outperformed several state-of-the-art methods on age estimation benchmark datasets. |
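The diversity loss described above penalizes overlap between the attention maps so that different heads discover distinct facial patches. The paper's exact formulation is not reproduced in this record; the sketch below is only an illustrative assumption, scoring overlap as the mean pairwise inner product of L2-normalized attention maps.

```python
# Hypothetical sketch of a diversity loss over attention maps (assumption:
# the paper's actual formula is not given in this record). Each map is
# L2-normalized, then the loss is the average inner product over all
# distinct pairs: near 0 when maps attend to disjoint patches, near 1
# when they overlap completely.

def diversity_loss(attention_maps):
    """attention_maps: list of flattened attention maps (lists of floats)."""
    def normalize(v):
        norm = sum(x * x for x in v) ** 0.5
        return [x / norm for x in v] if norm > 0 else v

    maps = [normalize(m) for m in attention_maps]
    total, pairs = 0.0, 0
    for i in range(len(maps)):
        for j in range(i + 1, len(maps)):
            total += sum(a * b for a, b in zip(maps[i], maps[j]))
            pairs += 1
    return total / pairs if pairs else 0.0

# Disjoint maps give zero loss; identical maps give maximal loss.
print(diversity_loss([[1.0, 0.0], [0.0, 1.0]]))  # 0.0
print(diversity_loss([[1.0, 0.0], [1.0, 0.0]]))  # 1.0
```

Minimizing such a term during training pushes the attention heads apart, which matches the stated goal of discovering diverse, important patches.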
format | Online Article Text |
id | pubmed-10130448 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-101304482023-04-27 Face-based age estimation using improved Swin Transformer with attention-based convolution Shi, Chaojun Zhao, Shiwei Zhang, Ke Wang, Yibo Liang, Longping Front Neurosci Neuroscience Recently, Transformer models have become a new direction in the computer vision field, based on the multihead self-attention mechanism. Compared with convolutional neural networks, the Transformer uses the self-attention mechanism to capture global contextual information and extract stronger features by learning the relationships between different features, and it has achieved good results in many vision tasks. In face-based age estimation, certain facial patches that contain rich age-specific information are critical to the task. The present study proposed an attention-based convolution (ABC) age estimation framework, called improved Swin Transformer with ABC, in which two separate modules were implemented, namely ABC and the Swin Transformer. ABC extracted facial patches containing rich age-specific information using a shallow convolutional network and a multiheaded attention mechanism. Subsequently, the features obtained by ABC were spliced with the flattened image in the Swin Transformer, and the result was input to the Swin Transformer to predict the age of the image. The ABC framework spliced the important regions that contained rich age-specific information into the original image, which could fully exploit the long-range dependency modeling of the Swin Transformer, that is, extracting stronger features by learning the dependency relationships between different features. ABC also introduced a diversity loss to guide the training of the self-attention mechanism, reducing overlap between patches so that diverse and important patches were discovered. Through extensive experiments, this study showed that the proposed framework outperformed several state-of-the-art methods on age estimation benchmark datasets. Frontiers Media S.A. 
2023-04-12 /pmc/articles/PMC10130448/ /pubmed/37123378 http://dx.doi.org/10.3389/fnins.2023.1136934 Text en Copyright © 2023 Shi, Zhao, Zhang, Wang and Liang. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience Shi, Chaojun Zhao, Shiwei Zhang, Ke Wang, Yibo Liang, Longping Face-based age estimation using improved Swin Transformer with attention-based convolution |
title | Face-based age estimation using improved Swin Transformer with attention-based convolution |
title_full | Face-based age estimation using improved Swin Transformer with attention-based convolution |
title_fullStr | Face-based age estimation using improved Swin Transformer with attention-based convolution |
title_full_unstemmed | Face-based age estimation using improved Swin Transformer with attention-based convolution |
title_short | Face-based age estimation using improved Swin Transformer with attention-based convolution |
title_sort | face-based age estimation using improved swin transformer with attention-based convolution |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10130448/ https://www.ncbi.nlm.nih.gov/pubmed/37123378 http://dx.doi.org/10.3389/fnins.2023.1136934 |
work_keys_str_mv | AT shichaojun facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution AT zhaoshiwei facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution AT zhangke facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution AT wangyibo facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution AT lianglongping facebasedageestimationusingimprovedswintransformerwithattentionbasedconvolution |