Cargando…

Part-Based Obstacle Detection Using a Multiple Output Neural Network

Detecting the objects surrounding a moving vehicle is essential for autonomous driving and for any kind of advanced driving assistance system; such a system can also be used for analyzing the surrounding traffic as the vehicle moves. The most popular techniques for object detection are based on imag...

Descripción completa

Detalles Bibliográficos
Autores principales: Itu, Razvan, Danescu, Radu
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9228803/
https://www.ncbi.nlm.nih.gov/pubmed/35746094
http://dx.doi.org/10.3390/s22124312
Descripción
Sumario:Detecting the objects surrounding a moving vehicle is essential for autonomous driving and for any kind of advanced driving assistance system; such a system can also be used for analyzing the surrounding traffic as the vehicle moves. The most popular techniques for object detection are based on image processing; in recent years, they have become increasingly focused on artificial intelligence. Systems using monocular vision are increasingly popular for driving assistance, as they do not require complex calibration and setup. The lack of three-dimensional data is compensated for by the efficient and accurate classification of the input image pixels. The detected objects are usually identified as cuboids in the 3D space, or as rectangles in the image space. Recently, instance segmentation techniques have been developed that are able to identify the freeform set of pixels that form an individual object, using complex convolutional neural networks (CNNs). This paper presents an alternative to these instance segmentation networks, combining much simpler semantic segmentation networks with light, geometrical post-processing techniques, to achieve instance segmentation results. The semantic segmentation network produces four semantic labels that identify the quarters of the individual objects: top left, top right, bottom left, and bottom right. These pixels are grouped into connected regions, based on their proximity and their position with respect to the whole object. Each quarter is used to generate a complete object hypothesis, which is then scored according to object pixel fitness. The individual homogeneous regions extracted from the labeled pixels are then assigned to the best-fitted rectangles, leading to complete and freeform identification of the pixels of individual objects. The accuracy is similar to instance segmentation-based methods but with reduced complexity in terms of trainable parameters, which leads to a reduced demand for computational resources.