
Lightweight Scene Text Recognition Based on Transformer

Scene text recognition (STR) has been a hot research field in computer vision, aiming to recognize text in natural scenes using computers. Currently, attention-based encoder–decoder frameworks struggle to precisely align feature regions with the target object when dealing with complex and low-qualit...


Bibliographic Details
Main Authors:	Luan, Xin; Zhang, Jinwei; Xu, Miaomiao; Silamu, Wushouer; Li, Yanbing
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181526/ https://www.ncbi.nlm.nih.gov/pubmed/37177694 http://dx.doi.org/10.3390/s23094490

author	Luan, Xin; Zhang, Jinwei; Xu, Miaomiao; Silamu, Wushouer; Li, Yanbing
collection	PubMed
description	Scene text recognition (STR), which aims to recognize text in natural scenes, has been an active research field in computer vision. Currently, attention-based encoder–decoder frameworks struggle to precisely align feature regions with the target object when dealing with complex and low-quality images, a phenomenon known as attention drift. Additionally, with the rise of the Transformer, the increasing number of parameters results in higher computational costs. To address these problems, we build on recent research on the Vision Transformer (ViT), using an additional position-enhancement branch to alleviate attention drift and dynamically fusing position information with visual information to achieve better recognition accuracy. The experimental results demonstrate that our model achieves a 3% higher average recognition accuracy on the test set compared to the baseline. Meanwhile, our model retains the advantages of a small number of parameters and fast inference speed, achieving a good balance between accuracy, speed, and computational load.
format	Online Article Text
id	pubmed-10181526
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10181526 2023-05-13 Sensors (Basel) Article MDPI 2023-05-05 /pmc/articles/PMC10181526/ /pubmed/37177694 http://dx.doi.org/10.3390/s23094490 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Lightweight Scene Text Recognition Based on Transformer
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181526/ https://www.ncbi.nlm.nih.gov/pubmed/37177694 http://dx.doi.org/10.3390/s23094490
