
Cascaded Cross-Modality Fusion Network for 3D Object Detection

We focus on exploring the LIDAR-RGB fusion-based 3D object detection in this paper. This task is still challenging in two aspects: (1) the difference of data formats and sensor positions contributes to the misalignment of reasoning between the semantic features of images and the geometric features o...


Bibliographic Details
Main authors:	Chen, Zhiyu; Lin, Qiong; Sun, Jing; Feng, Yujian; Liu, Shangdong; Liu, Qiang; Ji, Yimu; Xu, He
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7766807/ https://www.ncbi.nlm.nih.gov/pubmed/33348795 http://dx.doi.org/10.3390/s20247243

author	Chen, Zhiyu Lin, Qiong Sun, Jing Feng, Yujian Liu, Shangdong Liu, Qiang Ji, Yimu Xu, He
collection	PubMed
description	This paper explores LIDAR-RGB fusion-based 3D object detection. The task remains challenging in two respects: (1) differences in data formats and sensor positions cause misalignment in reasoning between the semantic features of images and the geometric features of point clouds; (2) optimizing the traditional IoU is not equivalent to optimizing the regression loss of bounding boxes, which results in biased back-propagation for non-overlapping cases. To resolve these two issues, we propose a cascaded cross-modality fusion network (CCFNet), which includes a cascaded multi-scale fusion module (CMF) and a novel center 3D IoU loss. The CMF module is designed to reinforce the discriminative representation of objects by reasoning about the relation between the LIDAR geometric capability and the RGB semantic capability of the same object across the two modalities. Specifically, CMF is added in a cascaded way between the RGB and LIDAR streams; it selects salient points and transmits multi-scale point cloud features to each stage of the RGB stream. Moreover, the center 3D IoU loss incorporates the distance between anchor centers to avoid oversimplified optimization for non-overlapping bounding boxes. Extensive experiments on the KITTI benchmark demonstrate that the proposed approach performs better than the compared methods.
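The description says the center 3D IoU loss adds the distance between box centers so that non-overlapping boxes still receive a useful training signal. This record does not give the formula, so the following is only a minimal sketch of that general idea, assuming a DIoU-style penalty (squared center distance divided by the squared diagonal of the smallest enclosing box) and axis-aligned boxes with positive sizes. The function name `center_iou_3d_loss` and the `(cx, cy, cz, dx, dy, dz)` box layout are hypothetical and may differ from the authors' formulation.

```python
def center_iou_3d_loss(box_a, box_b):
    """Sketch of an IoU loss with a center-distance penalty.

    Assumptions (not taken from the record): boxes are axis-aligned,
    have positive sizes, and are given as (cx, cy, cz, dx, dy, dz).
    """
    intersection = 1.0
    enclosing_diag_sq = 0.0
    center_dist_sq = 0.0
    for axis in range(3):
        half_a = box_a[axis + 3] / 2.0
        half_b = box_b[axis + 3] / 2.0
        a_min, a_max = box_a[axis] - half_a, box_a[axis] + half_a
        b_min, b_max = box_b[axis] - half_b, box_b[axis] + half_b
        # Overlap along this axis, clamped to zero when the boxes are disjoint.
        intersection *= max(0.0, min(a_max, b_max) - max(a_min, b_min))
        # Extent of the smallest box enclosing both inputs along this axis.
        enclosing_diag_sq += (max(a_max, b_max) - min(a_min, b_min)) ** 2
        center_dist_sq += (box_a[axis] - box_b[axis]) ** 2
    volume_a = box_a[3] * box_a[4] * box_a[5]
    volume_b = box_b[3] * box_b[4] * box_b[5]
    iou = intersection / (volume_a + volume_b - intersection)
    # Plain IoU loss is 1 - iou; the extra term keeps growing with distance.
    return 1.0 - iou + center_dist_sq / enclosing_diag_sq


if __name__ == "__main__":
    target = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
    near = (4.0, 0.0, 0.0, 2.0, 2.0, 2.0)
    far = (10.0, 0.0, 0.0, 2.0, 2.0, 2.0)
    print(center_iou_3d_loss(target, near))
    print(center_iou_3d_loss(target, far))
```

With a plain `1 - IoU` loss both disjoint boxes above would score exactly 1, so the loss could not tell which prediction is closer to the target; the center term makes the farther box cost more.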
format	Online Article Text
id	pubmed-7766807
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7766807 2020-12-28 Sensors (Basel) Article MDPI 2020-12-17 /pmc/articles/PMC7766807/ /pubmed/33348795 http://dx.doi.org/10.3390/s20247243 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Cascaded Cross-Modality Fusion Network for 3D Object Detection
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7766807/ https://www.ncbi.nlm.nih.gov/pubmed/33348795 http://dx.doi.org/10.3390/s20247243


Similar Items