Cascaded Cross-Modality Fusion Network for 3D Object Detection
We focus on exploring the LIDAR-RGB fusion-based 3D object detection in this paper. This task is still challenging in two aspects: (1) the difference of data formats and sensor positions contributes to the misalignment of reasoning between the semantic features of images and the geometric features o...
Main Authors: | Chen, Zhiyu; Lin, Qiong; Sun, Jing; Feng, Yujian; Liu, Shangdong; Liu, Qiang; Ji, Yimu; Xu, He |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7766807/ https://www.ncbi.nlm.nih.gov/pubmed/33348795 http://dx.doi.org/10.3390/s20247243 |
_version_ | 1783628807095189504 |
---|---|
author | Chen, Zhiyu Lin, Qiong Sun, Jing Feng, Yujian Liu, Shangdong Liu, Qiang Ji, Yimu Xu, He |
author_facet | Chen, Zhiyu Lin, Qiong Sun, Jing Feng, Yujian Liu, Shangdong Liu, Qiang Ji, Yimu Xu, He |
author_sort | Chen, Zhiyu |
collection | PubMed |
description | We focus on exploring the LIDAR-RGB fusion-based 3D object detection in this paper. This task is still challenging in two aspects: (1) the difference of data formats and sensor positions contributes to the misalignment of reasoning between the semantic features of images and the geometric features of point clouds. (2) The optimization of traditional IoU is not equal to the regression loss of bounding boxes, resulting in biased back-propagation for non-overlapping cases. In this work, we propose a cascaded cross-modality fusion network (CCFNet), which includes a cascaded multi-scale fusion module (CMF) and a novel center 3D IoU loss to resolve these two issues. Our CMF module is developed to reinforce the discriminative representation of objects by reasoning the relation of corresponding LIDAR geometric capability and RGB semantic capability of the object from two modalities. Specifically, CMF is added in a cascaded way between the RGB and LIDAR streams, which selects salient points and transmits multi-scale point cloud features to each stage of RGB streams. Moreover, our center 3D IoU loss incorporates the distance between anchor centers to avoid the oversimple optimization for non-overlapping bounding boxes. Extensive experiments on the KITTI benchmark have demonstrated that our proposed approach performs better than the compared methods. |
format | Online Article Text |
id | pubmed-7766807 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-77668072020-12-28 Cascaded Cross-Modality Fusion Network for 3D Object Detection Chen, Zhiyu Lin, Qiong Sun, Jing Feng, Yujian Liu, Shangdong Liu, Qiang Ji, Yimu Xu, He Sensors (Basel) Article We focus on exploring the LIDAR-RGB fusion-based 3D object detection in this paper. This task is still challenging in two aspects: (1) the difference of data formats and sensor positions contributes to the misalignment of reasoning between the semantic features of images and the geometric features of point clouds. (2) The optimization of traditional IoU is not equal to the regression loss of bounding boxes, resulting in biased back-propagation for non-overlapping cases. In this work, we propose a cascaded cross-modality fusion network (CCFNet), which includes a cascaded multi-scale fusion module (CMF) and a novel center 3D IoU loss to resolve these two issues. Our CMF module is developed to reinforce the discriminative representation of objects by reasoning the relation of corresponding LIDAR geometric capability and RGB semantic capability of the object from two modalities. Specifically, CMF is added in a cascaded way between the RGB and LIDAR streams, which selects salient points and transmits multi-scale point cloud features to each stage of RGB streams. Moreover, our center 3D IoU loss incorporates the distance between anchor centers to avoid the oversimple optimization for non-overlapping bounding boxes. Extensive experiments on the KITTI benchmark have demonstrated that our proposed approach performs better than the compared methods. MDPI 2020-12-17 /pmc/articles/PMC7766807/ /pubmed/33348795 http://dx.doi.org/10.3390/s20247243 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Chen, Zhiyu Lin, Qiong Sun, Jing Feng, Yujian Liu, Shangdong Liu, Qiang Ji, Yimu Xu, He Cascaded Cross-Modality Fusion Network for 3D Object Detection |
title | Cascaded Cross-Modality Fusion Network for 3D Object Detection |
title_full | Cascaded Cross-Modality Fusion Network for 3D Object Detection |
title_fullStr | Cascaded Cross-Modality Fusion Network for 3D Object Detection |
title_full_unstemmed | Cascaded Cross-Modality Fusion Network for 3D Object Detection |
title_short | Cascaded Cross-Modality Fusion Network for 3D Object Detection |
title_sort | cascaded cross-modality fusion network for 3d object detection |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7766807/ https://www.ncbi.nlm.nih.gov/pubmed/33348795 http://dx.doi.org/10.3390/s20247243 |
work_keys_str_mv | AT chenzhiyu cascadedcrossmodalityfusionnetworkfor3dobjectdetection AT linqiong cascadedcrossmodalityfusionnetworkfor3dobjectdetection AT sunjing cascadedcrossmodalityfusionnetworkfor3dobjectdetection AT fengyujian cascadedcrossmodalityfusionnetworkfor3dobjectdetection AT liushangdong cascadedcrossmodalityfusionnetworkfor3dobjectdetection AT liuqiang cascadedcrossmodalityfusionnetworkfor3dobjectdetection AT jiyimu cascadedcrossmodalityfusionnetworkfor3dobjectdetection AT xuhe cascadedcrossmodalityfusionnetworkfor3dobjectdetection |
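
The description field above says that the center 3D IoU loss adds the distance between box centers, so that non-overlapping boxes still receive a useful training signal. This record does not give the exact formula, so the sketch below is only an illustration of that general idea, not the authors' method. It assumes axis-aligned boxes written as (cx, cy, cz, length, width, height) with positive sizes, and it borrows a distance-IoU style penalty: the squared center distance divided by the squared diagonal of the smallest enclosing box. The function name and box layout are hypothetical.

```python
def center_iou_loss_3d(box_a, box_b):
    """Illustrative center-distance IoU loss for two axis-aligned 3D boxes.

    Each box is (cx, cy, cz, length, width, height) with positive sizes.
    This is an assumed, DIoU-style formulation, not the one from the paper.
    """
    # Lower and upper corners of each box along x, y, z.
    lo_a = [box_a[i] - box_a[i + 3] / 2.0 for i in range(3)]
    hi_a = [box_a[i] + box_a[i + 3] / 2.0 for i in range(3)]
    lo_b = [box_b[i] - box_b[i + 3] / 2.0 for i in range(3)]
    hi_b = [box_b[i] + box_b[i + 3] / 2.0 for i in range(3)]

    # Intersection volume: product of per-axis overlaps, clamped at zero.
    inter = 1.0
    for i in range(3):
        inter *= max(0.0, min(hi_a[i], hi_b[i]) - max(lo_a[i], lo_b[i]))

    vol_a = box_a[3] * box_a[4] * box_a[5]
    vol_b = box_b[3] * box_b[4] * box_b[5]
    iou = inter / (vol_a + vol_b - inter)

    # Squared distance between the two box centers.
    center_sq = sum((box_a[i] - box_b[i]) ** 2 for i in range(3))

    # Squared diagonal of the smallest axis-aligned box enclosing both.
    diag_sq = sum(
        (max(hi_a[i], hi_b[i]) - min(lo_a[i], lo_b[i])) ** 2 for i in range(3)
    )

    # Plain IoU loss plus the normalized center-distance penalty.
    return 1.0 - iou + center_sq / diag_sq


target = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
near = (3.0, 0.0, 0.0, 2.0, 2.0, 2.0)  # no overlap with target
far = (6.0, 0.0, 0.0, 2.0, 2.0, 2.0)   # no overlap, further away

loss_near = center_iou_loss_3d(target, near)
loss_far = center_iou_loss_3d(target, far)
```

Both `near` and `far` have zero overlap with `target`, so a plain `1 - IoU` loss would score them identically. With the center term, the further box receives the larger loss, which is the behavior the description attributes to the center 3D IoU loss.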