
Attention Guided Feature Encoding for Scene Text Recognition

The real-life scene images exhibit a range of variations in text appearances, including complex shapes, variations in sizes, and fancy font properties. Consequently, text recognition from scene images remains a challenging problem in computer vision research. We present a scene text recognition meth...

Bibliographic Details
Main authors:	Hassan, Ehtesham; V. L., Lekshmi
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604773/ https://www.ncbi.nlm.nih.gov/pubmed/36286370 http://dx.doi.org/10.3390/jimaging8100276

_version_	1784817898963861504
author	Hassan, Ehtesham; V. L., Lekshmi
author_sort	Hassan, Ehtesham
collection	PubMed
description	The real-life scene images exhibit a range of variations in text appearances, including complex shapes, variations in sizes, and fancy font properties. Consequently, text recognition from scene images remains a challenging problem in computer vision research. We present a scene text recognition methodology by designing a novel feature-enhanced convolutional recurrent neural network architecture. Our work addresses scene text recognition as well as sequence-to-sequence modeling, where a novel deep encoder–decoder network is proposed. The encoder in the proposed network is designed around a hierarchy of convolutional blocks enabled with spatial attention blocks, followed by bidirectional long short-term memory layers. In contrast to existing methods for scene text recognition, which incorporate temporal attention on the decoder side of the entire architecture, our convolutional architecture incorporates novel spatial attention design to guide feature extraction onto textual details in scene text images. The experiments and analysis demonstrate that our approach learns robust text-specific feature sequences for input images, as the convolution architecture designed for feature extraction is tuned to capture a broader spatial text context. With extensive experiments on ICDAR2013, ICDAR2015, IIIT5K and SVT datasets, the paper demonstrates an improvement over many important state-of-the-art methods.
format	Online Article Text
id	pubmed-9604773
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9604773 2022-10-27 Attention Guided Feature Encoding for Scene Text Recognition Hassan, Ehtesham; V. L., Lekshmi J Imaging Article MDPI 2022-10-08 /pmc/articles/PMC9604773/ /pubmed/36286370 http://dx.doi.org/10.3390/jimaging8100276 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Attention Guided Feature Encoding for Scene Text Recognition
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604773/ https://www.ncbi.nlm.nih.gov/pubmed/36286370 http://dx.doi.org/10.3390/jimaging8100276

Similar Items