
Lightweight Scene Text Recognition Based on Transformer

Scene text recognition (STR) has been a hot research field in computer vision, aiming to recognize text in natural scenes using computers. Currently, attention-based encoder–decoder frameworks struggle to precisely align feature regions with the target object when dealing with complex and low-qualit...


Bibliographic Details
Main authors: Luan, Xin; Zhang, Jinwei; Xu, Miaomiao; Silamu, Wushouer; Li, Yanbing
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181526/
https://www.ncbi.nlm.nih.gov/pubmed/37177694
http://dx.doi.org/10.3390/s23094490
_version_ 1785041595321548800
author Luan, Xin
Zhang, Jinwei
Xu, Miaomiao
Silamu, Wushouer
Li, Yanbing
author_facet Luan, Xin
Zhang, Jinwei
Xu, Miaomiao
Silamu, Wushouer
Li, Yanbing
author_sort Luan, Xin
collection PubMed
description Scene text recognition (STR) has been a hot research field in computer vision, aiming to recognize text in natural scenes using computers. Currently, attention-based encoder–decoder frameworks struggle to precisely align feature regions with the target object when dealing with complex and low-quality images, a phenomenon known as attention drift. Additionally, with the rise of Transformer, the increasing size of parameters results in higher computational costs. In order to solve the above problems, based on the latest research results of Vision Transformer (ViT), we utilize an additional position-enhancement branch to alleviate attention drift and dynamically fuse position information with visual information to achieve better recognition accuracy. The experimental results demonstrate that our model achieves a 3% higher average recognition accuracy on the test set compared to the baseline. Meanwhile, our model maintains the advantage of a small number of parameters and fast inference speed, achieving a good balance between accuracy, speed, and computational load.
format Online
Article
Text
id pubmed-10181526
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-101815262023-05-13 Lightweight Scene Text Recognition Based on Transformer Luan, Xin Zhang, Jinwei Xu, Miaomiao Silamu, Wushouer Li, Yanbing Sensors (Basel) Article Scene text recognition (STR) has been a hot research field in computer vision, aiming to recognize text in natural scenes using computers. Currently, attention-based encoder–decoder frameworks struggle to precisely align feature regions with the target object when dealing with complex and low-quality images, a phenomenon known as attention drift. Additionally, with the rise of Transformer, the increasing size of parameters results in higher computational costs. In order to solve the above problems, based on the latest research results of Vision Transformer (ViT), we utilize an additional position-enhancement branch to alleviate attention drift and dynamically fuse position information with visual information to achieve better recognition accuracy. The experimental results demonstrate that our model achieves a 3% higher average recognition accuracy on the test set compared to the baseline. Meanwhile, our model maintains the advantage of a small number of parameters and fast inference speed, achieving a good balance between accuracy, speed, and computational load. MDPI 2023-05-05 /pmc/articles/PMC10181526/ /pubmed/37177694 http://dx.doi.org/10.3390/s23094490 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Luan, Xin
Zhang, Jinwei
Xu, Miaomiao
Silamu, Wushouer
Li, Yanbing
Lightweight Scene Text Recognition Based on Transformer
title Lightweight Scene Text Recognition Based on Transformer
title_full Lightweight Scene Text Recognition Based on Transformer
title_fullStr Lightweight Scene Text Recognition Based on Transformer
title_full_unstemmed Lightweight Scene Text Recognition Based on Transformer
title_short Lightweight Scene Text Recognition Based on Transformer
title_sort lightweight scene text recognition based on transformer
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181526/
https://www.ncbi.nlm.nih.gov/pubmed/37177694
http://dx.doi.org/10.3390/s23094490
work_keys_str_mv AT luanxin lightweightscenetextrecognitionbasedontransformer
AT zhangjinwei lightweightscenetextrecognitionbasedontransformer
AT xumiaomiao lightweightscenetextrecognitionbasedontransformer
AT silamuwushouer lightweightscenetextrecognitionbasedontransformer
AT liyanbing lightweightscenetextrecognitionbasedontransformer