A Preliminary Study of Deep Learning Sensor Fusion for Pedestrian Detection

Bibliographic Details
Main Authors: Plascencia, Alfredo Chávez; García-Gómez, Pablo; Perez, Eduardo Bernal; DeMas-Giménez, Gerard; Casas, Josep R.; Royo, Santiago
Format: Online Article Text
Language: English
Published: Sensors (Basel), MDPI, 21 April 2023
Subjects: Article
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. Open access under the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10144184/
https://www.ncbi.nlm.nih.gov/pubmed/37112506
http://dx.doi.org/10.3390/s23084167
Abstract
Most pedestrian detection methods focus on bounding boxes based on fusing RGB with lidar. These methods do not relate to how the human eye perceives objects in the real world. Furthermore, lidar and vision can have difficulty detecting pedestrians in scattered environments, and radar can be used to overcome this problem. The motivation of this work is therefore to explore, as a preliminary step, the feasibility of fusing lidar, radar, and RGB for pedestrian detection, with potential use in autonomous driving, by means of a fully connected convolutional neural network architecture for multimodal sensors. The core of the network is based on SegNet, a pixel-wise semantic segmentation network. Lidar and radar were incorporated by transforming their 3D point clouds into 2D gray images with 16-bit depth, and RGB images were incorporated with three channels. The proposed architecture uses a single SegNet for each sensor reading; the outputs are then fed to a fully connected neural network that fuses the three sensor modalities, and an upsampling network is applied to recover the fused data.

A custom dataset of 60 images was proposed for training the architecture, with an additional 10 for evaluation and 10 for testing, giving a total of 80 images. The experimental results show a training mean pixel accuracy of 99.7% and a training mean intersection over union (IoU) of 99.5%, while the testing mean IoU was 94.4% and the testing pixel accuracy was 96.2%. These results demonstrate the effectiveness of semantic segmentation for pedestrian detection with the three sensor modalities. Despite some overfitting during experimentation, the model performed well in detecting people in test mode. The focus of this work is to show that the method is feasible, as it works regardless of the size of the dataset, although a bigger dataset would be necessary to achieve more appropriate training. The method has the advantage of detecting pedestrians as the human eye does, thereby resulting in less ambiguity. Additionally, this work proposes an extrinsic calibration matrix method, based on singular value decomposition, for sensor alignment between radar and lidar.
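
The pipeline described in the abstract can be illustrated with a few short sketches. First, the lidar/radar preprocessing: projecting a 3D point cloud into a 2D gray image with 16-bit depth. This is a minimal NumPy sketch assuming a pinhole projection model; the intrinsics `K`, image size, range scaling, and the function name are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def pointcloud_to_gray16(points, K, width=640, height=480, max_range=100.0):
    """Project an (N, 3) point cloud (camera frame, z forward) to a uint16 image.
    K is an assumed 3x3 intrinsic matrix with third row [0, 0, 1]."""
    pts = points[points[:, 2] > 0]            # keep points in front of the sensor
    uv = (K @ pts.T).T                        # pinhole projection to homogeneous pixels
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[ok], v[ok], pts[ok, 2]
    img = np.zeros((height, width), dtype=np.uint16)
    # map metric range onto the full 16-bit span; on pixel collisions the
    # last-written point wins (a real pipeline would keep the nearest)
    img[v, u] = (np.clip(z / max_range, 0.0, 1.0) * 65535).astype(np.uint16)
    return img
```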
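Next, the fusion architecture: one SegNet per sensor, a fully connected stage that fuses the three modalities, and an upsampling network that recovers a pixel-wise map. Below is a minimal PyTorch sketch with tiny stand-in encoders instead of full SegNets; the channel widths, feature resolution, and two-class (pedestrian/background) output are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SmallEncoder(nn.Module):
    """Tiny convolutional encoder standing in for a full SegNet branch."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class FusionNet(nn.Module):
    def __init__(self, feat_hw=(16, 16), n_classes=2):
        super().__init__()
        h, w = feat_hw
        self.rgb = SmallEncoder(3)      # 3-channel RGB branch
        self.lidar = SmallEncoder(1)    # 16-bit gray image loaded as 1 channel
        self.radar = SmallEncoder(1)
        flat = 32 * h * w
        # fully connected fusion of the three modality embeddings
        self.fuse = nn.Sequential(nn.Linear(3 * flat, flat), nn.ReLU())
        self.unflatten = nn.Unflatten(1, (32, h, w))
        # upsampling head recovers a full-resolution per-pixel class map
        self.up = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, n_classes, 4, stride=2, padding=1),
        )
    def forward(self, rgb, lidar, radar):
        feats = [enc(x).flatten(1) for enc, x in
                 ((self.rgb, rgb), (self.lidar, lidar), (self.radar, radar))]
        fused = self.fuse(torch.cat(feats, dim=1))
        return self.up(self.unflatten(fused))

# 64x64 inputs -> encoders downsample 4x -> 16x16 features -> 64x64 output
net = FusionNet()
out = net(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 2, 64, 64])
```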
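The two reported metrics, pixel accuracy and mean IoU, can be computed from a confusion matrix. A minimal sketch, assuming integer label maps with values in [0, n_classes); the helper names are illustrative.

```python
import numpy as np

def confusion(pred, gt, n_classes=2):
    """Confusion matrix with ground truth on rows, predictions on columns."""
    idx = n_classes * gt.ravel() + pred.ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def pixel_accuracy(cm):
    return np.diag(cm).sum() / cm.sum()

def mean_iou(cm):
    inter = np.diag(cm).astype(float)
    union = cm.sum(0) + cm.sum(1) - inter
    return np.mean(inter / np.maximum(union, 1))  # guard against empty classes

pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
cm = confusion(pred, gt)
print(pixel_accuracy(cm), mean_iou(cm))  # 0.75, ~0.583
```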
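Finally, the radar-to-lidar extrinsic calibration. The abstract only says the alignment matrix is obtained via singular value decomposition; the classical SVD solution for rigidly aligning paired point sets is the Kabsch/orthogonal Procrustes scheme, sketched below under the assumption that point correspondences between the two sensors are already given.

```python
import numpy as np

def extrinsic_svd(radar_pts, lidar_pts):
    """Find R, t with lidar ~= R @ radar + t from paired (N, 3) arrays
    (Kabsch scheme; correspondences are assumed known)."""
    cr, cl = radar_pts.mean(0), lidar_pts.mean(0)
    H = (radar_pts - cr).T @ (lidar_pts - cl)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # correct an improper rotation (reflection) if the determinant is negative
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cl - R @ cr
    return R, t

# usage: R, t = extrinsic_svd(radar_xyz, lidar_xyz)
```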