A Preliminary Study of Deep Learning Sensor Fusion for Pedestrian Detection

Bibliographic Details
Main Authors: Plascencia, Alfredo Chávez; García-Gómez, Pablo; Perez, Eduardo Bernal; DeMas-Giménez, Gerard; Casas, Josep R.; Royo, Santiago
Format: Online Article Text
Language: English
Published: Sensors (Basel), MDPI, 21 April 2023
Subjects: Article
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. Open access under the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10144184/
https://www.ncbi.nlm.nih.gov/pubmed/37112506
http://dx.doi.org/10.3390/s23084167
Abstract
Most pedestrian detection methods focus on bounding boxes based on fusing RGB with lidar. These methods do not relate to how the human eye perceives objects in the real world. Furthermore, lidar and vision can have difficulty detecting pedestrians in scattered environments, and radar can be used to overcome this problem. The motivation of this work is therefore to explore, as a preliminary step, the feasibility of fusing lidar, radar, and RGB for pedestrian detection, with potential use in autonomous driving, by means of a fully connected convolutional neural network architecture for multimodal sensors. The core of the network is based on SegNet, a pixel-wise semantic segmentation network. Lidar and radar were incorporated by transforming their 3D point clouds into 2D gray images with 16-bit depth, and RGB images were incorporated with three channels. The proposed architecture uses a single SegNet for each sensor reading; the outputs are then fed to a fully connected neural network that fuses the three sensor modalities, and an upsampling network is applied to recover the fused data.

A custom dataset of 60 images was proposed for training the architecture, with an additional 10 for evaluation and 10 for testing, giving a total of 80 images. The experimental results show a training mean pixel accuracy of 99.7% and a training mean intersection over union (IoU) of 99.5%, while the testing mean IoU was 94.4% and the testing pixel accuracy was 96.2%. These results demonstrate the effectiveness of semantic segmentation for pedestrian detection with the three sensor modalities. Despite some overfitting during experimentation, the model performed well in detecting people in test mode. The focus of this work is to show that the method is feasible, as it works regardless of the size of the dataset, although a bigger dataset would be necessary to achieve more appropriate training. The method has the advantage of detecting pedestrians as the human eye does, thereby resulting in less ambiguity. Additionally, this work proposes an extrinsic calibration matrix method, based on singular value decomposition, for sensor alignment between radar and lidar.
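
The pipeline described in the abstract can be illustrated with a few short sketches. First, the lidar/radar preprocessing: projecting a 3D point cloud into a 2D gray image with 16-bit depth. This is a minimal NumPy sketch assuming a pinhole projection model; the intrinsics `K`, image size, range scaling, and the function name are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def pointcloud_to_gray16(points, K, width=640, height=480, max_range=100.0):
    """Project an (N, 3) point cloud (camera frame, z forward) to a uint16 image.
    K is an assumed 3x3 intrinsic matrix with third row [0, 0, 1]."""
    pts = points[points[:, 2] > 0]            # keep points in front of the sensor
    uv = (K @ pts.T).T                        # pinhole projection to homogeneous pixels
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[ok], v[ok], pts[ok, 2]
    img = np.zeros((height, width), dtype=np.uint16)
    # map metric range onto the full 16-bit span; on pixel collisions the
    # last-written point wins (a real pipeline would keep the nearest)
    img[v, u] = (np.clip(z / max_range, 0.0, 1.0) * 65535).astype(np.uint16)
    return img
```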
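Next, the fusion architecture: one SegNet per sensor, a fully connected stage that fuses the three modalities, and an upsampling network that recovers a pixel-wise map. Below is a minimal PyTorch sketch with tiny stand-in encoders instead of full SegNets; the channel widths, feature resolution, and two-class (pedestrian/background) output are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SmallEncoder(nn.Module):
    """Tiny convolutional encoder standing in for a full SegNet branch."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class FusionNet(nn.Module):
    def __init__(self, feat_hw=(16, 16), n_classes=2):
        super().__init__()
        h, w = feat_hw
        self.rgb = SmallEncoder(3)      # 3-channel RGB branch
        self.lidar = SmallEncoder(1)    # 16-bit gray image loaded as 1 channel
        self.radar = SmallEncoder(1)
        flat = 32 * h * w
        # fully connected fusion of the three modality embeddings
        self.fuse = nn.Sequential(nn.Linear(3 * flat, flat), nn.ReLU())
        self.unflatten = nn.Unflatten(1, (32, h, w))
        # upsampling head recovers a full-resolution per-pixel class map
        self.up = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, n_classes, 4, stride=2, padding=1),
        )
    def forward(self, rgb, lidar, radar):
        feats = [enc(x).flatten(1) for enc, x in
                 ((self.rgb, rgb), (self.lidar, lidar), (self.radar, radar))]
        fused = self.fuse(torch.cat(feats, dim=1))
        return self.up(self.unflatten(fused))

# 64x64 inputs -> encoders downsample 4x -> 16x16 features -> 64x64 output
net = FusionNet()
out = net(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 2, 64, 64])
```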
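The two reported metrics, pixel accuracy and mean IoU, can be computed from a confusion matrix. A minimal sketch, assuming integer label maps with values in [0, n_classes); the helper names are illustrative.

```python
import numpy as np

def confusion(pred, gt, n_classes=2):
    """Confusion matrix with ground truth on rows, predictions on columns."""
    idx = n_classes * gt.ravel() + pred.ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def pixel_accuracy(cm):
    return np.diag(cm).sum() / cm.sum()

def mean_iou(cm):
    inter = np.diag(cm).astype(float)
    union = cm.sum(0) + cm.sum(1) - inter
    return np.mean(inter / np.maximum(union, 1))  # guard against empty classes

pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
cm = confusion(pred, gt)
print(pixel_accuracy(cm), mean_iou(cm))  # 0.75, ~0.583
```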
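Finally, the radar-to-lidar extrinsic calibration. The abstract only says the alignment matrix is obtained via singular value decomposition; the classical SVD solution for rigidly aligning paired point sets is the Kabsch/orthogonal Procrustes scheme, sketched below under the assumption that point correspondences between the two sensors are already given.

```python
import numpy as np

def extrinsic_svd(radar_pts, lidar_pts):
    """Find R, t with lidar ~= R @ radar + t from paired (N, 3) arrays
    (Kabsch scheme; correspondences are assumed known)."""
    cr, cl = radar_pts.mean(0), lidar_pts.mean(0)
    H = (radar_pts - cr).T @ (lidar_pts - cl)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # correct an improper rotation (reflection) if the determinant is negative
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cl - R @ cr
    return R, t

# usage: R, t = extrinsic_svd(radar_xyz, lidar_xyz)
```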