An Adaptive Multi-Scale Network Based on Depth Information for Crowd Counting †

Crowd counting, as a basic computer vision task, plays an important role in many fields such as video surveillance, accident prediction, public security, and intelligent transportation. At present, crowd counting tasks face various challenges. Firstly, due to the diversity of crowd distribution and...

Bibliographic Details
Main Authors: Zhang, Peng; Lei, Weimin; Zhao, Xinlei; Dong, Lijia; Lin, Zhaonan
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10535756/
https://www.ncbi.nlm.nih.gov/pubmed/37765861
http://dx.doi.org/10.3390/s23187805
author Zhang, Peng
Lei, Weimin
Zhao, Xinlei
Dong, Lijia
Lin, Zhaonan
author_facet Zhang, Peng
Lei, Weimin
Zhao, Xinlei
Dong, Lijia
Lin, Zhaonan
author_sort Zhang, Peng
collection PubMed
description Crowd counting, as a basic computer vision task, plays an important role in many fields such as video surveillance, accident prediction, public security, and intelligent transportation. At present, crowd counting tasks face various challenges. Firstly, due to the diversity of crowd distribution and increasing population density, large-scale crowd aggregation occurs in public places, sports stadiums, and stations, resulting in very serious occlusion. Secondly, when annotating large-scale datasets, positioning errors can easily affect training results. In addition, the size of human head targets in dense images is not consistent, making it difficult to identify both near and far targets simultaneously using only one network. The existing crowd counting methods mainly use density plot regression. However, this framework does not distinguish between the features of distant and near targets and cannot adaptively respond to scale changes. Therefore, the detection performance in areas with sparse population distribution is poor. To solve such problems, we propose an adaptive multi-scale far and near distance network based on the convolutional neural network (CNN) framework for counting dense populations and achieving a good balance between accuracy, inference speed, and performance. Firstly, at the feature level, to enable the model to distinguish between near and far features, we use stacked convolution layers to deepen the network, allocate different receptive fields according to the distance between the target and the camera, and fuse the features between nearby targets to strengthen feature extraction for nearby pedestrians. Secondly, depth information is used to distinguish distant and near targets of different scales, and the original image is cut into four different patches to perform pixel-level adaptive modeling of the crowd.
In addition, we add the density normalized average precision (nAP) indicator to analyze the accuracy of our method in spatial positioning. This paper validates the effectiveness of NF-Net on three challenging benchmarks: Shanghai Tech Part A and B, UCF_CC_50, and UCF-QNRF. Compared with state-of-the-art (SOTA) methods, it achieves stronger performance in various scenarios. On the UCF-QNRF dataset, it is further validated that our method effectively handles the interference of complex backgrounds.
format Online
Article
Text
id pubmed-10535756
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10535756 2023-09-29 An Adaptive Multi-Scale Network Based on Depth Information for Crowd Counting † Zhang, Peng Lei, Weimin Zhao, Xinlei Dong, Lijia Lin, Zhaonan Sensors (Basel) Article Crowd counting, as a basic computer vision task, plays an important role in many fields such as video surveillance, accident prediction, public security, and intelligent transportation. At present, crowd counting tasks face various challenges. Firstly, due to the diversity of crowd distribution and increasing population density, large-scale crowd aggregation occurs in public places, sports stadiums, and stations, resulting in very serious occlusion. Secondly, when annotating large-scale datasets, positioning errors can easily affect training results. In addition, the size of human head targets in dense images is not consistent, making it difficult to identify both near and far targets simultaneously using only one network. The existing crowd counting methods mainly use density plot regression. However, this framework does not distinguish between the features of distant and near targets and cannot adaptively respond to scale changes. Therefore, the detection performance in areas with sparse population distribution is poor. To solve such problems, we propose an adaptive multi-scale far and near distance network based on the convolutional neural network (CNN) framework for counting dense populations and achieving a good balance between accuracy, inference speed, and performance. Firstly, at the feature level, to enable the model to distinguish between near and far features, we use stacked convolution layers to deepen the network, allocate different receptive fields according to the distance between the target and the camera, and fuse the features between nearby targets to strengthen feature extraction for nearby pedestrians.
Secondly, depth information is used to distinguish distant and near targets of different scales, and the original image is cut into four different patches to perform pixel-level adaptive modeling of the crowd. In addition, we add the density normalized average precision (nAP) indicator to analyze the accuracy of our method in spatial positioning. This paper validates the effectiveness of NF-Net on three challenging benchmarks: Shanghai Tech Part A and B, UCF_CC_50, and UCF-QNRF. Compared with state-of-the-art (SOTA) methods, it achieves stronger performance in various scenarios. On the UCF-QNRF dataset, it is further validated that our method effectively handles the interference of complex backgrounds. MDPI 2023-09-11 /pmc/articles/PMC10535756/ /pubmed/37765861 http://dx.doi.org/10.3390/s23187805 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Zhang, Peng
Lei, Weimin
Zhao, Xinlei
Dong, Lijia
Lin, Zhaonan
An Adaptive Multi-Scale Network Based on Depth Information for Crowd Counting †
title An Adaptive Multi-Scale Network Based on Depth Information for Crowd Counting †
title_full An Adaptive Multi-Scale Network Based on Depth Information for Crowd Counting †
title_fullStr An Adaptive Multi-Scale Network Based on Depth Information for Crowd Counting †
title_full_unstemmed An Adaptive Multi-Scale Network Based on Depth Information for Crowd Counting †
title_short An Adaptive Multi-Scale Network Based on Depth Information for Crowd Counting †
title_sort adaptive multi-scale network based on depth information for crowd counting †
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10535756/
https://www.ncbi.nlm.nih.gov/pubmed/37765861
http://dx.doi.org/10.3390/s23187805