Lightweight Scene Text Recognition Based on Transformer
Scene text recognition (STR) has been a hot research field in computer vision, aiming to recognize text in natural scenes using computers. Currently, attention-based encoder–decoder frameworks struggle to precisely align feature regions with the target object when dealing with complex and low-qualit...
Main authors: | Luan, Xin; Zhang, Jinwei; Xu, Miaomiao; Silamu, Wushouer; Li, Yanbing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181526/ https://www.ncbi.nlm.nih.gov/pubmed/37177694 http://dx.doi.org/10.3390/s23094490 |
_version_ | 1785041595321548800 |
---|---|
author | Luan, Xin Zhang, Jinwei Xu, Miaomiao Silamu, Wushouer Li, Yanbing |
author_facet | Luan, Xin Zhang, Jinwei Xu, Miaomiao Silamu, Wushouer Li, Yanbing |
author_sort | Luan, Xin |
collection | PubMed |
description | Scene text recognition (STR) has been a hot research field in computer vision, aiming to recognize text in natural scenes using computers. Currently, attention-based encoder–decoder frameworks struggle to precisely align feature regions with the target object when dealing with complex and low-quality images, a phenomenon known as attention drift. Additionally, with the rise of Transformer, the increasing size of parameters results in higher computational costs. In order to solve the above problems, based on the latest research results of Vision Transformer (ViT), we utilize an additional position-enhancement branch to alleviate attention drift and dynamically fuse position information with visual information to achieve better recognition accuracy. The experimental results demonstrate that our model achieves a 3% higher average recognition accuracy on the test set compared to the baseline. Meanwhile, our model maintains the advantage of a small number of parameters and fast inference speed, achieving a good balance between accuracy, speed, and computational load. |
format | Online Article Text |
id | pubmed-10181526 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-101815262023-05-13 Lightweight Scene Text Recognition Based on Transformer Luan, Xin Zhang, Jinwei Xu, Miaomiao Silamu, Wushouer Li, Yanbing Sensors (Basel) Article Scene text recognition (STR) has been a hot research field in computer vision, aiming to recognize text in natural scenes using computers. Currently, attention-based encoder–decoder frameworks struggle to precisely align feature regions with the target object when dealing with complex and low-quality images, a phenomenon known as attention drift. Additionally, with the rise of Transformer, the increasing size of parameters results in higher computational costs. In order to solve the above problems, based on the latest research results of Vision Transformer (ViT), we utilize an additional position-enhancement branch to alleviate attention drift and dynamically fused position information with visual information to achieve better recognition accuracy. The experimental results demonstrate that our model achieves a 3% higher average recognition accuracy on the test set compared to the baseline. Meanwhile, our model maintains the advantage of a small number of parameters and fast inference speed, achieving a good balance between accuracy, speed, and computational load. MDPI 2023-05-05 /pmc/articles/PMC10181526/ /pubmed/37177694 http://dx.doi.org/10.3390/s23094490 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Luan, Xin Zhang, Jinwei Xu, Miaomiao Silamu, Wushouer Li, Yanbing Lightweight Scene Text Recognition Based on Transformer |
title | Lightweight Scene Text Recognition Based on Transformer |
title_full | Lightweight Scene Text Recognition Based on Transformer |
title_fullStr | Lightweight Scene Text Recognition Based on Transformer |
title_full_unstemmed | Lightweight Scene Text Recognition Based on Transformer |
title_short | Lightweight Scene Text Recognition Based on Transformer |
title_sort | lightweight scene text recognition based on transformer |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10181526/ https://www.ncbi.nlm.nih.gov/pubmed/37177694 http://dx.doi.org/10.3390/s23094490 |
work_keys_str_mv | AT luanxin lightweightscenetextrecognitionbasedontransformer AT zhangjinwei lightweightscenetextrecognitionbasedontransformer AT xumiaomiao lightweightscenetextrecognitionbasedontransformer AT silamuwushouer lightweightscenetextrecognitionbasedontransformer AT liyanbing lightweightscenetextrecognitionbasedontransformer |