DenseTextPVT: Pyramid Vision Transformer with Deep Multi-Scale Feature Refinement Network for Dense Text Detection

Detecting dense text in scene images is challenging due to the high variability, complexity, and overlap of text regions. To adequately distinguish densely packed text instances in scenes, we propose an efficient approach called DenseTextPVT. We first generate high-resolution features at multiple levels to enable accurate dense text detection, which is essential for dense prediction tasks. To further enhance the feature representation, we design the Deep Multi-scale Feature Refinement Network (DMFRN), which effectively detects text of varying sizes, shapes, and fonts, including small-scale text. DenseTextPVT then draws on the Pixel Aggregation (PA) similarity-vector algorithm to cluster text pixels into the correct text kernels in the post-processing step. In this way, the proposed method improves the precision of text detection and effectively reduces overlap between adjacent text regions in dense natural-scene images. Comprehensive experiments demonstrate the effectiveness of our method on the TotalText, CTW1500, and ICDAR 2015 benchmark datasets in comparison with existing methods.
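The description mentions that, in post-processing, text pixels are clustered into text kernels using Pixel Aggregation (PA) similarity vectors. The snippet below is only a minimal sketch of that general idea, not the authors' method or code: it seeds instances from connected kernel components and assigns each remaining text pixel to the kernel whose mean similarity vector is nearest, within a distance margin. The function name aggregate_pixels, the array layouts, and the margin value are assumptions made for illustration; the actual PA algorithm grows kernels iteratively rather than in a single nearest-kernel pass.

# Minimal, illustrative sketch of PA-style pixel clustering (assumptions noted above).
import numpy as np
from scipy import ndimage


def aggregate_pixels(text_mask, kernel_mask, similarity, margin=0.8):
    """Cluster text pixels into kernel-based text instances.

    text_mask:   (H, W) bool   -- predicted text region
    kernel_mask: (H, W) bool   -- predicted (shrunken) text kernels
    similarity:  (C, H, W) float -- per-pixel similarity vectors
    Returns an (H, W) int32 label map (0 = background).
    """
    # 1. Connected components of the kernel map seed the text instances.
    kernel_labels, num_kernels = ndimage.label(kernel_mask)
    labels = kernel_labels.astype(np.int32)

    if num_kernels == 0:
        return labels

    # 2. Mean similarity vector of each kernel instance.
    kernel_vecs = np.stack(
        [similarity[:, kernel_labels == k].mean(axis=1)
         for k in range(1, num_kernels + 1)]
    )  # shape (K, C)

    # 3. Assign each remaining text pixel to the nearest kernel in
    #    similarity space, but only if it lies within the margin.
    ys, xs = np.where(text_mask & (kernel_labels == 0))
    if ys.size:
        pixel_vecs = similarity[:, ys, xs].T                      # (N, C)
        dists = np.linalg.norm(
            pixel_vecs[:, None, :] - kernel_vecs[None, :, :], axis=2
        )                                                         # (N, K)
        nearest = dists.argmin(axis=1)
        within = dists[np.arange(ys.size), nearest] < margin
        labels[ys[within], xs[within]] = nearest[within] + 1

    return labels

In a full pipeline, text_mask, kernel_mask, and similarity would be obtained by thresholding the network's segmentation and embedding outputs before this clustering step.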

Bibliographic Details
Main Authors: Dinh, My-Tham; Choi, Deok-Jai; Lee, Guee-Sang
Format: Online Article (Text)
Language: English
Published: MDPI, 2023
Journal: Sensors (Basel), published online 25 June 2023
Collection: PubMed
Subjects: Article
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. Open access article distributed under the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10347224/
https://www.ncbi.nlm.nih.gov/pubmed/37447738
http://dx.doi.org/10.3390/s23135889