
Cascaded Cross-Modality Fusion Network for 3D Object Detection

Bibliographic Details
Main Authors: Chen, Zhiyu, Lin, Qiong, Sun, Jing, Feng, Yujian, Liu, Shangdong, Liu, Qiang, Ji, Yimu, Xu, He
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7766807/
https://www.ncbi.nlm.nih.gov/pubmed/33348795
http://dx.doi.org/10.3390/s20247243
author Chen, Zhiyu
Lin, Qiong
Sun, Jing
Feng, Yujian
Liu, Shangdong
Liu, Qiang
Ji, Yimu
Xu, He
collection PubMed
description We focus on exploring the LIDAR-RGB fusion-based 3D object detection in this paper. This task is still challenging in two aspects: (1) the difference of data formats and sensor positions contributes to the misalignment of reasoning between the semantic features of images and the geometric features of point clouds. (2) The optimization of traditional IoU is not equal to the regression loss of bounding boxes, resulting in biased back-propagation for non-overlapping cases. In this work, we propose a cascaded cross-modality fusion network (CCFNet), which includes a cascaded multi-scale fusion module (CMF) and a novel center 3D IoU loss to resolve these two issues. Our CMF module is developed to reinforce the discriminative representation of objects by reasoning the relation of corresponding LIDAR geometric capability and RGB semantic capability of the object from two modalities. Specifically, CMF is added in a cascaded way between the RGB and LIDAR streams, which selects salient points and transmits multi-scale point cloud features to each stage of RGB streams. Moreover, our center 3D IoU loss incorporates the distance between anchor centers to avoid the oversimple optimization for non-overlapping bounding boxes. Extensive experiments on the KITTI benchmark have demonstrated that our proposed approach performs better than the compared methods.
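The description states that the center 3D IoU loss adds the distance between box centers so that non-overlapping boxes still receive a useful training signal, but the record does not give the exact formula. The sketch below is a minimal illustration under stated assumptions: it uses the well-known Distance-IoU (DIoU) form, loss = 1 - IoU + d^2/c^2, where d is the distance between box centers and c is the diagonal of the smallest enclosing box, and it treats boxes as axis-aligned (no yaw). The class and function names (`Box3D`, `iou_3d`, `center_iou_loss_3d`) are hypothetical; this is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Hypothetical axis-aligned 3D box: center (x, y, z) and size (l, w, h)."""
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float

    def bounds(self):
        # (min, max) extent along each of the x, y, z axes
        return [(c - s / 2, c + s / 2)
                for c, s in ((self.x, self.l), (self.y, self.w), (self.z, self.h))]

    def volume(self):
        return self.l * self.w * self.h


def iou_3d(a, b):
    """Plain 3D IoU of two axis-aligned boxes (0.0 when they do not overlap)."""
    inter = 1.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(a.bounds(), b.bounds()):
        inter *= max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    union = a.volume() + b.volume() - inter
    return inter / union


def center_iou_loss_3d(pred, gt):
    """Assumed DIoU-style loss: 1 - IoU + d^2 / c^2 (axis-aligned boxes only)."""
    # Squared Euclidean distance between the two box centers
    d2 = (pred.x - gt.x) ** 2 + (pred.y - gt.y) ** 2 + (pred.z - gt.z) ** 2
    # Squared diagonal of the smallest axis-aligned box enclosing both boxes
    c2 = 0.0
    for (p_lo, p_hi), (g_lo, g_hi) in zip(pred.bounds(), gt.bounds()):
        c2 += (max(p_hi, g_hi) - min(p_lo, g_lo)) ** 2
    return 1.0 - iou_3d(pred, gt) + d2 / c2


if __name__ == "__main__":
    gt = Box3D(0, 0, 0, 1, 1, 1)
    near = Box3D(3, 0, 0, 1, 1, 1)
    far = Box3D(10, 0, 0, 1, 1, 1)
    # Plain 1 - IoU is 1.0 for both predictions, so it cannot tell them apart
    print(1 - iou_3d(near, gt), 1 - iou_3d(far, gt))
    # The center-distance term penalizes the farther prediction more
    print(center_iou_loss_3d(near, gt), center_iou_loss_3d(far, gt))
```

In this toy example, both predictions miss the ground-truth box entirely, so a plain IoU loss is flat at 1.0 for both and offers no gradient toward the target. With the added center-distance term, the nearer box scores 1.5 and the farther one about 1.81, which illustrates the kind of non-overlapping case the description says a center-aware loss is meant to handle.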
format Online
Article
Text
id pubmed-7766807
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7766807 2020-12-28; Sensors (Basel), Article; MDPI, 2020-12-17; /pmc/articles/PMC7766807/ /pubmed/33348795 http://dx.doi.org/10.3390/s20247243; Text, en; © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Cascaded Cross-Modality Fusion Network for 3D Object Detection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7766807/
https://www.ncbi.nlm.nih.gov/pubmed/33348795
http://dx.doi.org/10.3390/s20247243