MR-FPN: Multi-Level Residual Feature Pyramid Text Detection Network Based on Self-Attention Environment
Main authors: | , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
MDPI
2022
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9102995/ https://www.ncbi.nlm.nih.gov/pubmed/35591028 http://dx.doi.org/10.3390/s22093337 |
Summary: | As humanity enters the age of intelligence, text detection technology is gradually being applied in industry. However, detecting text against a complex background remains a challenging problem. Most current algorithms are not robust enough when locating text regions, and the misdetection of adjacent text instances remains a problem. To address these problems, this paper proposes a multi-level residual feature pyramid network (MR-FPN) based on a self-attention environment, which can accurately separate adjacent text instances. Specifically, the framework uses ResNet50 as the backbone network and improves on the feature pyramid network (FPN). A self-attention module (SAM) is introduced to capture pixel-level relations, strengthen contextual connections, and obtain efficient features. At the same time, a multi-scale enhancement module (MEM) improves the representation of text information by extracting strong semantic information and integrating the multi-scale features generated by the feature pyramid. In addition, information in the upper-level features is lost as it is passed down the feature pyramid step by step; multi-level residuals effectively address this loss. The proposed model improves the fusion ability of the feature pyramid, provides more refined features for text detection, and improves the robustness of text detection. The model was evaluated on four datasets of different kinds (CTW1500, Total-Text, ICDAR2015, and MSRA-TD500) and achieved varying degrees of improvement. Notably, the F-measure of 83.31% obtained on the Total-Text dataset exceeds that of the baseline system by 5%. |
---|
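
The summary describes a self-attention module that captures pixel-level relations. The following is a minimal sketch of that general idea only, not the authors' SAM: it assumes plain scaled dot-product attention with no learned projections, followed by a residual addition, and the function names `softmax` and `pixel_self_attention` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_self_attention(features):
    """Illustrative pixel-level self-attention (assumption: no learned weights).

    features: list of N pixel vectors, each a list of C floats
              (a feature map flattened over its spatial positions).
    Returns N vectors: each input pixel plus an attention-weighted
    average of all pixels, so every position sees global context.
    """
    channels = len(features[0])
    scale = math.sqrt(channels)
    output = []
    for query in features:
        # Similarity of this pixel to every pixel (including itself).
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        weights = softmax(scores)
        # Context vector: weighted average of all pixel vectors.
        context = [
            sum(w * value[c] for w, value in zip(weights, features))
            for c in range(channels)
        ]
        # Residual addition keeps the original pixel information.
        output.append([x + y for x, y in zip(query, context)])
    return output


if __name__ == "__main__":
    # Three "pixels" with two channels each (made-up values).
    demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for vector in pixel_self_attention(demo):
        print(vector)
```

A real module of this kind would typically apply learned query, key, and value projections before the attention step; they are omitted here to keep the sketch self-contained.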