An effective modular approach for crowd counting in an image using convolutional neural networks

Bibliographic Details
Main Authors: Ilyas, Naveed, Ahmad, Zaheer, Lee, Boreom, Kim, Kiseon
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8986811/
https://www.ncbi.nlm.nih.gov/pubmed/35388054
http://dx.doi.org/10.1038/s41598-022-09685-w
_version_ 1784682612967604224
author Ilyas, Naveed
Ahmad, Zaheer
Lee, Boreom
Kim, Kiseon
author_facet Ilyas, Naveed
Ahmad, Zaheer
Lee, Boreom
Kim, Kiseon
author_sort Ilyas, Naveed
collection PubMed
description The abrupt and continuous nature of scale variation in a crowded scene makes it challenging to improve crowd counting accuracy in an image. Existing crowd counting techniques generally use multi-column networks or single-column dilated convolution to tackle the scale variation caused by perspective distortion. However, multi-column networks tend to learn nearly identical features, while standard dilated convolution (SDC) with an expanded receptive field has a sparse pixel sampling rate. Because of this sparsity, SDC struggles to capture relevant contextual information; further, features at multiple scales are not extracted unless an inception-based model is used, which is not cost-effective. To mitigate these drawbacks of SDC, we propose hierarchical dense dilated deep pyramid feature extraction through a convolutional neural network (CNN) for single-image crowd counting (HDPF). It comprises three modules: a general feature extraction module (GFEM), a deep pyramid feature extraction module (PFEM) and a fusion module (FM). The GFEM obtains task-independent general features, whereas the PFEM captures relevant contextual information owing to the dense pixel sampling rate produced by densely connected dense stacked dilated convolutional modules (DSDCs). Further, due to the dense connections among the DSDCs, the final feature map acquires multi-scale information with an expanded receptive field compared to SDC. The dense pyramid structure effectively propagates the features extracted by lower dilated convolutional layers (DCLs) to the middle and higher DCLs, which results in better estimation accuracy. The FM fuses the incoming features extracted by the other modules. The proposed technique is evaluated through simulations on three well-known datasets: ShanghaiTech (Part A), ShanghaiTech (Part B) and Venice. The results justify its relative effectiveness on the selected performance metrics.
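The description above outlines the HDPF pipeline: a GFEM front end, a PFEM built from densely connected dense stacked dilated convolutional modules (DSDCs), and a fusion module. The following is a minimal illustrative sketch of that idea in PyTorch; the class names, layer counts, channel widths and dilation rates are assumptions chosen for readability and do not reproduce the authors' implementation.

# Illustrative sketch only: a densely connected stack of dilated convolutions
# in the spirit of the DSDC/PFEM description above. All hyperparameters
# (growth rate, dilation rates, front-end depth) are assumptions.
import torch
import torch.nn as nn


class DSDC(nn.Module):
    """Dense stack of dilated conv layers: each layer receives the concatenation
    of the block input and every previous layer's output, which densifies the
    pixel sampling while still expanding the receptive field."""

    def __init__(self, in_channels, growth=64, dilations=(1, 2, 3)):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for d in dilations:
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, growth, kernel_size=3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            ))
            channels += growth  # dense connection: later layers also see this output
        self.out_channels = channels

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)


class HDPFSketch(nn.Module):
    """GFEM (small conv front end) -> PFEM (DSDC block) -> FM (1x1 fusion conv)
    producing a single-channel crowd density map; summing the map gives the count."""

    def __init__(self):
        super().__init__()
        self.gfem = nn.Sequential(           # task-independent general features
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.pfem = DSDC(in_channels=64)     # dense dilated pyramid features
        self.fm = nn.Conv2d(self.pfem.out_channels, 1, kernel_size=1)  # fusion

    def forward(self, x):
        return self.fm(self.pfem(self.gfem(x)))


if __name__ == "__main__":
    density = HDPFSketch()(torch.randn(1, 3, 256, 256))
    print(density.shape, "estimated count:", float(density.sum()))

In this sketch the estimated crowd count is the spatial sum of the predicted density map, which is the usual convention in density-map-based counting; the dense concatenations inside DSDC stand in for the paper's dense connections among dilated layers.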
format Online
Article
Text
id pubmed-8986811
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8986811 2022-04-08 An effective modular approach for crowd counting in an image using convolutional neural networks Ilyas, Naveed Ahmad, Zaheer Lee, Boreom Kim, Kiseon Sci Rep Article Nature Publishing Group UK 2022-04-06 /pmc/articles/PMC8986811/ /pubmed/35388054 http://dx.doi.org/10.1038/s41598-022-09685-w Text en © The Author(s) 2022, corrected publication 2022. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Ilyas, Naveed
Ahmad, Zaheer
Lee, Boreom
Kim, Kiseon
An effective modular approach for crowd counting in an image using convolutional neural networks
title An effective modular approach for crowd counting in an image using convolutional neural networks
title_full An effective modular approach for crowd counting in an image using convolutional neural networks
title_fullStr An effective modular approach for crowd counting in an image using convolutional neural networks
title_full_unstemmed An effective modular approach for crowd counting in an image using convolutional neural networks
title_short An effective modular approach for crowd counting in an image using convolutional neural networks
title_sort effective modular approach for crowd counting in an image using convolutional neural networks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8986811/
https://www.ncbi.nlm.nih.gov/pubmed/35388054
http://dx.doi.org/10.1038/s41598-022-09685-w
work_keys_str_mv AT ilyasnaveed aneffectivemodularapproachforcrowdcountinginanimageusingconvolutionalneuralnetworks
AT ahmadzaheer aneffectivemodularapproachforcrowdcountinginanimageusingconvolutionalneuralnetworks
AT leeboreom aneffectivemodularapproachforcrowdcountinginanimageusingconvolutionalneuralnetworks
AT kimkiseon aneffectivemodularapproachforcrowdcountinginanimageusingconvolutionalneuralnetworks
AT ilyasnaveed effectivemodularapproachforcrowdcountinginanimageusingconvolutionalneuralnetworks
AT ahmadzaheer effectivemodularapproachforcrowdcountinginanimageusingconvolutionalneuralnetworks
AT leeboreom effectivemodularapproachforcrowdcountinginanimageusingconvolutionalneuralnetworks
AT kimkiseon effectivemodularapproachforcrowdcountinginanimageusingconvolutionalneuralnetworks