PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition

Bibliographic Details
Main Authors: Dan, Yongping; Zhu, Zongnan; Jin, Weishou; Li, Zhuo
Format: Online Article Text
Language: English
Published: Hindawi, 2022
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9534625/
https://www.ncbi.nlm.nih.gov/pubmed/36211021
http://dx.doi.org/10.1155/2022/8255763
Collection: PubMed
Description: Recently, the Vision Transformer (ViT) has been widely used in image recognition. Unfortunately, the ViT model stacks 12 encoder layers, resulting in a large amount of computation, many parameters, and slow training, making it difficult to deploy on mobile devices. To reduce the model's computational complexity and improve training speed, a parallel and fast Vision Transformer (PF-ViT) for offline handwritten Chinese character recognition is proposed. The method adds parallel encoder branches to the Vision Transformer architecture, in two-way, four-way, and seven-way configurations. The input image is flattened into patches and linearly embedded before being fed to the encoder module. The core step in the encoder is multihead self-attention, which learns the interdependencies between the image patch sequences. In addition, data augmentation strategies increase the diversity of the training data. In the two-way parallel experiment, the model reaches 98.1% accuracy on the dataset with 43.11 million parameters and 4.32 G FLOPs. Compared with the original ViT, whose parameters and FLOPs are 86 million and 16.8 G, respectively, the two-way parallel model reduces parameters by 50.1% and FLOPs by 34.6%. The method is demonstrated to effectively reduce the computational complexity of the model while indirectly improving recognition speed.
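The processing steps the abstract describes (flattening the image into non-overlapping patches, linearly embedding them, and running multihead self-attention inside each of several parallel encoder branches) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the random projection weights, the tiny dimensions, and averaging as the branch-combination rule are all illustrative choices.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def patch_embed(img, patch, W):
    """Flatten non-overlapping square patches and linearly embed them.
    img: (H, H) array with H divisible by `patch`; W: (patch*patch, dim)."""
    n = img.shape[0] // patch
    p = img.reshape(n, patch, n, patch).transpose(0, 2, 1, 3)
    return p.reshape(n * n, patch * patch) @ W      # (num_patches, dim)

def multihead_self_attention(x, Wq, Wk, Wv, heads):
    """Scaled dot-product self-attention with the embedding split across heads."""
    n, dim = x.shape
    d = dim // heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    out = np.zeros_like(x)
    for h in range(heads):
        s = slice(h * d, (h + 1) * d)
        attn = softmax(q[:, s] @ k[:, s].T / np.sqrt(d))  # (n, n) weights
        out[:, s] = attn @ v[:, s]                        # weighted sum of values
    return out

def pf_vit_encoder(x, branch_weights, heads=2):
    """Run parallel attention branches on the same patch sequence and
    combine them by averaging (an assumed combination rule)."""
    outs = [multihead_self_attention(x, Wq, Wk, Wv, heads)
            for Wq, Wk, Wv in branch_weights]
    return sum(outs) / len(outs)
```

In the full model each branch would be a complete Transformer encoder (attention plus MLP, layer normalization, and residual connections) with learned weights; the sketch keeps only the attention core to show how the parallel branches share one embedded patch sequence.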
ID: pubmed-9534625
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Comput Intell Neurosci (Research Article)
Published online: 2022-09-28
Copyright © 2022 Yongping Dan et al.
License: https://creativecommons.org/licenses/by/4.0/. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.