
MR-FPN: Multi-Level Residual Feature Pyramid Text Detection Network Based on Self-Attention Environment



Bibliographic Details
Main Authors: Kang, Jianjun, Ibrayim, Mayire, Hamdulla, Askar
Format: Online Article Text
Language: English
Published: MDPI 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9102995/
https://www.ncbi.nlm.nih.gov/pubmed/35591028
http://dx.doi.org/10.3390/s22093337
Description
Summary: As humanity enters the age of intelligence, text detection technology has gradually been adopted in industry. However, detecting text against complex backgrounds remains a challenging problem. Most current algorithms are not robust enough to locate text regions, and adjacent text instances are still misdetected. To address these problems, this paper proposes a multi-level residual feature pyramid network (MR-FPN) based on a self-attention environment, which can accurately separate adjacent text instances. Specifically, the framework uses ResNet50 as the backbone network and improves on the feature pyramid network (FPN). A self-attention module (SAM) is introduced to capture pixel-level relations, strengthen contextual connections, and obtain efficient features. In addition, a multi-scale enhancement module (MEM) improves the representation of text information, extracting strong semantic information and integrating the multi-scale features generated by the feature pyramid. Furthermore, information from the upper-level features is lost as it is passed down the feature pyramid step by step, and multi-level residuals effectively solve this problem. The proposed model improves the fusion ability of the feature pyramid, provides more refined features for text detection, and makes text detection more robust. The model was evaluated on four datasets of different kinds (CTW1500, Total-Text, ICDAR2015, and MSRA-TD500) and achieved varying degrees of improvement. Notably, the F-measure of 83.31% obtained on the Total-Text dataset exceeds that of the baseline system by 5%.
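The summary states that the self-attention module (SAM) captures pixel-level relations but gives no internals. As a hedged illustration only, the sketch below shows generic scaled dot-product self-attention over pixel positions with a residual connection. It is not the authors' SAM: it assumes identity query/key/value projections (real modules normally use learned 1x1 convolutions), treats the feature map as a flattened list of per-pixel channel vectors, and the function names `softmax` and `pixel_self_attention` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_self_attention(features: List[Vector]) -> List[Vector]:
    """Generic (hypothetical) pixel-level self-attention with a residual.

    `features` is an H*W feature map flattened to a list of pixels, each a
    C-dimensional vector. Assumption: identity Q/K/V projections.
    """
    if not features:
        return []
    c = len(features[0])
    scale = 1.0 / math.sqrt(c)  # scaled dot-product attention
    out: List[Vector] = []
    for q in features:
        # Similarity of this pixel (query) to every pixel (key).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in features]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of all pixels (values).
        context = [sum(w * v[j] for w, v in zip(weights, features)) for j in range(c)]
        # Residual connection keeps the original pixel features.
        out.append([x + y for x, y in zip(q, context)])
    return out


if __name__ == "__main__":
    fmap = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 pixels, 2 channels
    for pixel in pixel_self_attention(fmap):
        print([round(x, 3) for x in pixel])
```

Every pixel attends to every other pixel, so the cost grows quadratically with the number of positions; this is the usual trade-off of pixel-level attention for gaining global context.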