PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition
Main Authors: | Dan, Yongping; Zhu, Zongnan; Jin, Weishou; Li, Zhuo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2022 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9534625/ https://www.ncbi.nlm.nih.gov/pubmed/36211021 http://dx.doi.org/10.1155/2022/8255763 |
_version_ | 1784802584243994624 |
---|---|
author | Dan, Yongping; Zhu, Zongnan; Jin, Weishou; Li, Zhuo |
author_facet | Dan, Yongping; Zhu, Zongnan; Jin, Weishou; Li, Zhuo |
author_sort | Dan, Yongping |
collection | PubMed |
description | Recently, the Vision Transformer (ViT) has been widely used in image recognition. Unfortunately, the ViT model stacks 12 encoder layers, resulting in heavy computation, many parameters, and slow training, which makes it difficult to deploy on mobile devices. To reduce the computational complexity of the model and improve training speed, a parallel and fast Vision Transformer method for offline handwritten Chinese character recognition is proposed. The method adds parallel encoder branches to the Vision Transformer architecture, in two-way, four-way, and seven-way configurations. The input image is fed to the encoder module after patch flattening and linear embedding. The core step in the encoder is the multihead self-attention mechanism, which learns the interdependence between image patch sequences. In addition, data augmentation strategies increase the diversity of the data. In the two-way parallel experiment, the model reaches 98.1% accuracy on the dataset with 43.11 million parameters and 4.32 G FLOPs. Compared with the ViT model, whose parameters and FLOPs are 86 million and 16.8 G, respectively, the two-way parallel model reduces parameters by 50.1% and FLOPs by 34.6%. The method is demonstrated to effectively reduce the computational complexity of the model while indirectly improving recognition speed. (Illustrative sketches of the parallel-encoder idea and a typical augmentation pipeline follow the record fields below.) |
format | Online Article Text |
id | pubmed-9534625 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9534625 2022-10-06 PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition Dan, Yongping; Zhu, Zongnan; Jin, Weishou; Li, Zhuo Comput Intell Neurosci Research Article (abstract as in the description field above) Hindawi 2022-09-28 /pmc/articles/PMC9534625/ /pubmed/36211021 http://dx.doi.org/10.1155/2022/8255763 Text en Copyright © 2022 Yongping Dan et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article; Dan, Yongping; Zhu, Zongnan; Jin, Weishou; Li, Zhuo; PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition |
title | PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition |
title_full | PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition |
title_fullStr | PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition |
title_full_unstemmed | PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition |
title_short | PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition |
title_sort | pf-vit: parallel and fast vision transformer for offline handwritten chinese character recognition |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9534625/ https://www.ncbi.nlm.nih.gov/pubmed/36211021 http://dx.doi.org/10.1155/2022/8255763 |
work_keys_str_mv | AT danyongping pfvitparallelandfastvisiontransformerforofflinehandwrittenchinesecharacterrecognition AT zhuzongnan pfvitparallelandfastvisiontransformerforofflinehandwrittenchinesecharacterrecognition AT jinweishou pfvitparallelandfastvisiontransformerforofflinehandwrittenchinesecharacterrecognition AT lizhuo pfvitparallelandfastvisiontransformerforofflinehandwrittenchinesecharacterrecognition |
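The description field names the main building blocks of PF-ViT (patch flattening with linear embedding, multihead self-attention encoders, and parallel encoder branches) but not the exact configuration. The PyTorch sketch below is therefore only an illustration of the two-way-parallel idea under stated assumptions: the branch depth, the averaging fusion rule, and the 3,755-class output head (the class count of the common CASIA-HWDB benchmark, which the record does not name) are guesses, not the authors' implementation. The branch depth is chosen so that two branches hold six encoder layers in total, which lands in the same ballpark as the reported 43.11 M parameters.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Flatten image patches and project them with a linear embedding."""

    def __init__(self, img_size=224, patch_size=16, in_ch=3, dim=768):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # A strided conv is the usual shortcut for "split into patches,
        # flatten, then apply a shared linear projection".
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))

    def forward(self, x):
        x = self.proj(x).flatten(2).transpose(1, 2)        # (B, N, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)     # (B, 1, dim)
        return torch.cat([cls, x], dim=1) + self.pos_embed


class ParallelViT(nn.Module):
    """Several shallow encoder stacks run in parallel instead of one deep stack."""

    def __init__(self, num_classes=3755, dim=768, heads=12,
                 branches=2, depth_per_branch=3):
        super().__init__()
        self.embed = PatchEmbedding(dim=dim)
        self.branches = nn.ModuleList(
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(
                    d_model=dim, nhead=heads, dim_feedforward=4 * dim,
                    batch_first=True, norm_first=True),
                num_layers=depth_per_branch)
            for _ in range(branches))
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        tokens = self.embed(x)
        # Every branch sees the same token sequence; averaging the branch
        # outputs is an assumed fusion rule, not taken from the paper.
        fused = torch.stack([b(tokens) for b in self.branches]).mean(dim=0)
        return self.head(self.norm(fused[:, 0]))           # classify from [CLS]


model = ParallelViT()
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.2f} M parameters")
logits = model(torch.randn(1, 3, 224, 224))                # shape (1, 3755)
```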
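The abstract also credits data augmentation with increasing data diversity without naming the strategies used. The snippet below is a generic torchvision pipeline of the kind commonly applied to handwritten character images; none of these specific transforms are confirmed by the record.

```python
from torchvision import transforms

# Hypothetical augmentation pipeline; the record does not specify the
# authors' strategies, so these are common generic choices.
augment = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # character scans are often grayscale
    transforms.RandomRotation(degrees=10),        # small rotations keep characters legible
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    transforms.Resize((224, 224)),                # match the ViT input resolution
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])
```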