PF-ViT: Parallel and Fast Vision Transformer for Offline Handwritten Chinese Character Recognition

Bibliographic Details
Main Authors: Dan, Yongping; Zhu, Zongnan; Jin, Weishou; Li, Zhuo
Format: Online Article Text
Language: English
Published: Hindawi, 2022
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9534625/
https://www.ncbi.nlm.nih.gov/pubmed/36211021
http://dx.doi.org/10.1155/2022/8255763
Collection: PubMed
Description: Recently, the Vision Transformer (ViT) has been widely used in image recognition. Unfortunately, the ViT model stacks 12 encoder layers, resulting in a large amount of computation, many parameters, and slow training, making it difficult to deploy on mobile devices. To reduce the model's computational complexity and improve training speed, a parallel and fast Vision Transformer (PF-ViT) for offline handwritten Chinese character recognition is proposed. The method adds parallel encoder branches to the Vision Transformer architecture, in two-way, four-way, and seven-way configurations. The input image is flattened into patches and linearly embedded before being fed to the encoder module. The core step in the encoder is multihead self-attention, which learns the interdependencies between the image patch sequences. In addition, data augmentation strategies increase the diversity of the training data. In the two-way parallel experiment, the model reaches 98.1% accuracy on the dataset with 43.11 million parameters and 4.32 G FLOPs. Compared with the original ViT, whose parameters and FLOPs are 86 million and 16.8 G, respectively, the two-way parallel model reduces parameters by 50.1% and FLOPs by 34.6%. The method is demonstrated to effectively reduce the computational complexity of the model while indirectly improving recognition speed.
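The processing steps the abstract describes (flattening the image into non-overlapping patches, linearly embedding them, and running multihead self-attention inside each of several parallel encoder branches) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the random projection weights, the tiny dimensions, and averaging as the branch-combination rule are all illustrative choices.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def patch_embed(img, patch, W):
    """Flatten non-overlapping square patches and linearly embed them.
    img: (H, H) array with H divisible by `patch`; W: (patch*patch, dim)."""
    n = img.shape[0] // patch
    p = img.reshape(n, patch, n, patch).transpose(0, 2, 1, 3)
    return p.reshape(n * n, patch * patch) @ W      # (num_patches, dim)

def multihead_self_attention(x, Wq, Wk, Wv, heads):
    """Scaled dot-product self-attention with the embedding split across heads."""
    n, dim = x.shape
    d = dim // heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    out = np.zeros_like(x)
    for h in range(heads):
        s = slice(h * d, (h + 1) * d)
        attn = softmax(q[:, s] @ k[:, s].T / np.sqrt(d))  # (n, n) weights
        out[:, s] = attn @ v[:, s]                        # weighted sum of values
    return out

def pf_vit_encoder(x, branch_weights, heads=2):
    """Run parallel attention branches on the same patch sequence and
    combine them by averaging (an assumed combination rule)."""
    outs = [multihead_self_attention(x, Wq, Wk, Wv, heads)
            for Wq, Wk, Wv in branch_weights]
    return sum(outs) / len(outs)
```

In the full model each branch would be a complete Transformer encoder (attention plus MLP, layer normalization, and residual connections) with learned weights; the sketch keeps only the attention core to show how the parallel branches share one embedded patch sequence.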
ID: pubmed-9534625
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Comput Intell Neurosci (Research Article)
Published online: 2022-09-28
Copyright © 2022 Yongping Dan et al.
License: https://creativecommons.org/licenses/by/4.0/. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.