A Robust Method: Arbitrary Shape Text Detection Combining Semantic and Position Information

Bibliographic Details
Main Authors: Wang, Zhenchao, Silamu, Wushour, Li, Yuze, Xu, Miaomiao
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9781025/
https://www.ncbi.nlm.nih.gov/pubmed/36560350
http://dx.doi.org/10.3390/s22249982
_version_ 1784856972523208704
author Wang, Zhenchao
Silamu, Wushour
Li, Yuze
Xu, Miaomiao
author_facet Wang, Zhenchao
Silamu, Wushour
Li, Yuze
Xu, Miaomiao
author_sort Wang, Zhenchao
collection PubMed
description There is growing interest in scene text detection for arbitrary shapes. Text detection has evolved from handling only horizontal text to detecting multi-oriented and arbitrarily shaped text. However, scene text detection remains a challenging task due to large variations in size, aspect ratio, shape, and orientation, as well as coarse annotations and other factors. Regression-based methods, inspired by object detection, have inherent limitations in fitting the edges of arbitrarily shaped text. Segmentation-based methods, on the other hand, predict at the pixel level and can therefore fit arbitrarily shaped text better. However, inaccurate image text annotations and the distribution characteristics of text pixels, which include a large number of background pixels and misclassified pixels, degrade the performance of segmentation-based text detection methods to some extent. Whether a pixel belongs to a text region typically depends on both the strength of its semantic information and its position within the text area. Based on these two observations, we propose an innovative and robust method for scene text detection that combines position and semantic information. First, we add position information to the images using a position encoding module (PosEM) to help the model learn the implicit feature relationships associated with position. Second, we use a semantic enhancement module (SEM) to strengthen the model’s focus on semantic information in the image during feature extraction. Then, to minimize the effect of noise caused by inaccurate image text annotations and the distribution characteristics of text pixels, we convert the detection results into a probability map that represents the text distribution more reasonably.
Finally, we reconstruct and filter the text instances with a post-processing algorithm to reduce false positives. Experimental results show that our model improves significantly on the Total-Text, MSRA-TD500, and CTW1500 datasets, outperforming most previous advanced algorithms.
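The abstract outlines two ideas that can be illustrated concretely: attaching explicit position information to the input (the role of PosEM) and filtering a pixel-level probability map to suppress false positives (the role of the post-processing step). The sketch below is a minimal NumPy illustration of those two ideas only; the function names, thresholds, and the flood-fill component filter are assumptions for demonstration and are not the paper's actual modules or algorithm.

```python
import numpy as np

def add_position_channels(image):
    """Append normalized (x, y) coordinate channels to an H x W x C image.

    A rough sketch of what a position encoding module might provide:
    every pixel carries its own location as an extra feature.
    """
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, h),
                         np.linspace(0.0, 1.0, w), indexing="ij")
    return np.concatenate([image, xs[..., None], ys[..., None]], axis=-1)

def filter_text_instances(prob_map, threshold=0.5, min_area=5):
    """Binarize a text probability map and drop small connected components.

    A simple stand-in for instance reconstruction and filtering:
    tiny isolated regions are treated as false positives and removed.
    """
    binary = prob_map >= threshold
    labels = np.zeros(binary.shape, dtype=int)
    current = 0
    for i in range(binary.shape[0]):
        for j in range(binary.shape[1]):
            if binary[i, j] and labels[i, j] == 0:
                current += 1
                labels[i, j] = current
                stack, pixels = [(i, j)], []
                while stack:  # 4-connected flood fill
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < binary.shape[0] and 0 <= nx < binary.shape[1]
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            stack.append((ny, nx))
                if len(pixels) < min_area:  # discard tiny components
                    for y, x in pixels:
                        labels[y, x] = 0
    return labels
```

In practice the position channels would be concatenated before feature extraction, and a real post-processing step would also reconstruct polygon boundaries; this sketch only shows the pixel-level filtering idea.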
format Online
Article
Text
id pubmed-9781025
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-97810252022-12-24 A Robust Method: Arbitrary Shape Text Detection Combining Semantic and Position Information Wang, Zhenchao Silamu, Wushour Li, Yuze Xu, Miaomiao Sensors (Basel) Article MDPI 2022-12-18 /pmc/articles/PMC9781025/ /pubmed/36560350 http://dx.doi.org/10.3390/s22249982 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Wang, Zhenchao
Silamu, Wushour
Li, Yuze
Xu, Miaomiao
A Robust Method: Arbitrary Shape Text Detection Combining Semantic and Position Information
title A Robust Method: Arbitrary Shape Text Detection Combining Semantic and Position Information
title_full A Robust Method: Arbitrary Shape Text Detection Combining Semantic and Position Information
title_fullStr A Robust Method: Arbitrary Shape Text Detection Combining Semantic and Position Information
title_full_unstemmed A Robust Method: Arbitrary Shape Text Detection Combining Semantic and Position Information
title_short A Robust Method: Arbitrary Shape Text Detection Combining Semantic and Position Information
title_sort robust method: arbitrary shape text detection combining semantic and position information
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9781025/
https://www.ncbi.nlm.nih.gov/pubmed/36560350
http://dx.doi.org/10.3390/s22249982
work_keys_str_mv AT wangzhenchao arobustmethodarbitraryshapetextdetectioncombiningsemanticandpositioninformation
AT silamuwushour arobustmethodarbitraryshapetextdetectioncombiningsemanticandpositioninformation
AT liyuze arobustmethodarbitraryshapetextdetectioncombiningsemanticandpositioninformation
AT xumiaomiao arobustmethodarbitraryshapetextdetectioncombiningsemanticandpositioninformation
AT wangzhenchao robustmethodarbitraryshapetextdetectioncombiningsemanticandpositioninformation
AT silamuwushour robustmethodarbitraryshapetextdetectioncombiningsemanticandpositioninformation
AT liyuze robustmethodarbitraryshapetextdetectioncombiningsemanticandpositioninformation
AT xumiaomiao robustmethodarbitraryshapetextdetectioncombiningsemanticandpositioninformation