Attention Guided Feature Encoding for Scene Text Recognition

The real-life scene images exhibit a range of variations in text appearances, including complex shapes, variations in sizes, and fancy font properties. Consequently, text recognition from scene images remains a challenging problem in computer vision research. We present a scene text recognition meth...

Bibliographic Details
Main authors: Hassan, Ehtesham; V. L., Lekshmi
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604773/
https://www.ncbi.nlm.nih.gov/pubmed/36286370
http://dx.doi.org/10.3390/jimaging8100276
_version_ 1784817898963861504
author Hassan, Ehtesham
V. L., Lekshmi
author_facet Hassan, Ehtesham
V. L., Lekshmi
author_sort Hassan, Ehtesham
collection PubMed
description The real-life scene images exhibit a range of variations in text appearances, including complex shapes, variations in sizes, and fancy font properties. Consequently, text recognition from scene images remains a challenging problem in computer vision research. We present a scene text recognition methodology by designing a novel feature-enhanced convolutional recurrent neural network architecture. Our work addresses scene text recognition as well as sequence-to-sequence modeling, where a novel deep encoder–decoder network is proposed. The encoder in the proposed network is designed around a hierarchy of convolutional blocks enabled with spatial attention blocks, followed by bidirectional long short-term memory layers. In contrast to existing methods for scene text recognition, which incorporate temporal attention on the decoder side of the entire architecture, our convolutional architecture incorporates novel spatial attention design to guide feature extraction onto textual details in scene text images. The experiments and analysis demonstrate that our approach learns robust text-specific feature sequences for input images, as the convolution architecture designed for feature extraction is tuned to capture a broader spatial text context. With extensive experiments on ICDAR2013, ICDAR2015, IIIT5K and SVT datasets, the paper demonstrates an improvement over many important state-of-the-art methods.
format Online
Article
Text
id pubmed-9604773
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9604773 2022-10-27 Attention Guided Feature Encoding for Scene Text Recognition Hassan, Ehtesham V. L., Lekshmi J Imaging Article MDPI 2022-10-08 /pmc/articles/PMC9604773/ /pubmed/36286370 http://dx.doi.org/10.3390/jimaging8100276 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Hassan, Ehtesham
V. L., Lekshmi
Attention Guided Feature Encoding for Scene Text Recognition
title Attention Guided Feature Encoding for Scene Text Recognition
title_sort attention guided feature encoding for scene text recognition
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604773/
https://www.ncbi.nlm.nih.gov/pubmed/36286370
http://dx.doi.org/10.3390/jimaging8100276