Attention Guided Feature Encoding for Scene Text Recognition
The real-life scene images exhibit a range of variations in text appearances, including complex shapes, variations in sizes, and fancy font properties. Consequently, text recognition from scene images remains a challenging problem in computer vision research. We present a scene text recognition meth...
Main Authors: | Hassan, Ehtesham; V. L., Lekshmi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604773/ https://www.ncbi.nlm.nih.gov/pubmed/36286370 http://dx.doi.org/10.3390/jimaging8100276 |
_version_ | 1784817898963861504 |
---|---|
author | Hassan, Ehtesham V. L., Lekshmi |
author_facet | Hassan, Ehtesham V. L., Lekshmi |
author_sort | Hassan, Ehtesham |
collection | PubMed |
description | The real-life scene images exhibit a range of variations in text appearances, including complex shapes, variations in sizes, and fancy font properties. Consequently, text recognition from scene images remains a challenging problem in computer vision research. We present a scene text recognition methodology by designing a novel feature-enhanced convolutional recurrent neural network architecture. Our work addresses scene text recognition as well as sequence-to-sequence modeling, where a novel deep encoder–decoder network is proposed. The encoder in the proposed network is designed around a hierarchy of convolutional blocks enabled with spatial attention blocks, followed by bidirectional long short-term memory layers. In contrast to existing methods for scene text recognition, which incorporate temporal attention on the decoder side of the entire architecture, our convolutional architecture incorporates novel spatial attention design to guide feature extraction onto textual details in scene text images. The experiments and analysis demonstrate that our approach learns robust text-specific feature sequences for input images, as the convolution architecture designed for feature extraction is tuned to capture a broader spatial text context. With extensive experiments on ICDAR2013, ICDAR2015, IIIT5K and SVT datasets, the paper demonstrates an improvement over many important state-of-the-art methods. |
format | Online Article Text |
id | pubmed-9604773 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-96047732022-10-27 Attention Guided Feature Encoding for Scene Text Recognition Hassan, Ehtesham V. L., Lekshmi J Imaging Article The real-life scene images exhibit a range of variations in text appearances, including complex shapes, variations in sizes, and fancy font properties. Consequently, text recognition from scene images remains a challenging problem in computer vision research. We present a scene text recognition methodology by designing a novel feature-enhanced convolutional recurrent neural network architecture. Our work addresses scene text recognition as well as sequence-to-sequence modeling, where a novel deep encoder–decoder network is proposed. The encoder in the proposed network is designed around a hierarchy of convolutional blocks enabled with spatial attention blocks, followed by bidirectional long short-term memory layers. In contrast to existing methods for scene text recognition, which incorporate temporal attention on the decoder side of the entire architecture, our convolutional architecture incorporates novel spatial attention design to guide feature extraction onto textual details in scene text images. The experiments and analysis demonstrate that our approach learns robust text-specific feature sequences for input images, as the convolution architecture designed for feature extraction is tuned to capture a broader spatial text context. With extensive experiments on ICDAR2013, ICDAR2015, IIIT5K and SVT datasets, the paper demonstrates an improvement over many important state-of-the-art methods. MDPI 2022-10-08 /pmc/articles/PMC9604773/ /pubmed/36286370 http://dx.doi.org/10.3390/jimaging8100276 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Hassan, Ehtesham V. L., Lekshmi Attention Guided Feature Encoding for Scene Text Recognition |
title | Attention Guided Feature Encoding for Scene Text Recognition |
title_full | Attention Guided Feature Encoding for Scene Text Recognition |
title_fullStr | Attention Guided Feature Encoding for Scene Text Recognition |
title_full_unstemmed | Attention Guided Feature Encoding for Scene Text Recognition |
title_short | Attention Guided Feature Encoding for Scene Text Recognition |
title_sort | attention guided feature encoding for scene text recognition |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604773/ https://www.ncbi.nlm.nih.gov/pubmed/36286370 http://dx.doi.org/10.3390/jimaging8100276 |
work_keys_str_mv | AT hassanehtesham attentionguidedfeatureencodingforscenetextrecognition AT vllekshmi attentionguidedfeatureencodingforscenetextrecognition |