
Bilateral Cross-Modal Fusion Network for Robot Grasp Detection

In the field of vision-based robot grasping, effectively leveraging RGB and depth information to accurately determine the position and pose of a target is a critical issue. To address this challenge, we proposed a tri-stream cross-modal fusion architecture for 2-DoF visual grasp detection. This arch...


Bibliographic Details
Main Authors: Zhang, Qiang; Sun, Xueying
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10057080/
https://www.ncbi.nlm.nih.gov/pubmed/36992051
http://dx.doi.org/10.3390/s23063340
_version_ 1785016275442860032
author Zhang, Qiang
Sun, Xueying
author_facet Zhang, Qiang
Sun, Xueying
author_sort Zhang, Qiang
collection PubMed
description In the field of vision-based robot grasping, effectively leveraging RGB and depth information to accurately determine the position and pose of a target is a critical issue. To address this challenge, we proposed a tri-stream cross-modal fusion architecture for 2-DoF visual grasp detection. This architecture facilitates the interaction of RGB and depth bilateral information and was designed to efficiently aggregate multiscale information. Our novel modal interaction module (MIM) with a spatial-wise cross-attention algorithm adaptively captures cross-modal feature information. Meanwhile, the channel interaction modules (CIM) further enhance the aggregation of different modal streams. In addition, we efficiently aggregated global multiscale information through a hierarchical structure with skipping connections. To evaluate the performance of our proposed method, we conducted validation experiments on standard public datasets and real robot grasping experiments. We achieved image-wise detection accuracy of 99.4% and 96.7% on Cornell and Jacquard datasets, respectively. The object-wise detection accuracy reached 97.8% and 94.6% on the same datasets. Furthermore, physical experiments using the 6-DoF Elite robot demonstrated a success rate of 94.5%. These experiments highlight the superior accuracy of our proposed method.
format Online
Article
Text
id pubmed-10057080
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10057080 2023-03-30 Bilateral Cross-Modal Fusion Network for Robot Grasp Detection Zhang, Qiang Sun, Xueying Sensors (Basel) Article In the field of vision-based robot grasping, effectively leveraging RGB and depth information to accurately determine the position and pose of a target is a critical issue. To address this challenge, we proposed a tri-stream cross-modal fusion architecture for 2-DoF visual grasp detection. This architecture facilitates the interaction of RGB and depth bilateral information and was designed to efficiently aggregate multiscale information. Our novel modal interaction module (MIM) with a spatial-wise cross-attention algorithm adaptively captures cross-modal feature information. Meanwhile, the channel interaction modules (CIM) further enhance the aggregation of different modal streams. In addition, we efficiently aggregated global multiscale information through a hierarchical structure with skipping connections. To evaluate the performance of our proposed method, we conducted validation experiments on standard public datasets and real robot grasping experiments. We achieved image-wise detection accuracy of 99.4% and 96.7% on Cornell and Jacquard datasets, respectively. The object-wise detection accuracy reached 97.8% and 94.6% on the same datasets. Furthermore, physical experiments using the 6-DoF Elite robot demonstrated a success rate of 94.5%. These experiments highlight the superior accuracy of our proposed method. MDPI 2023-03-22 /pmc/articles/PMC10057080/ /pubmed/36992051 http://dx.doi.org/10.3390/s23063340 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Zhang, Qiang
Sun, Xueying
Bilateral Cross-Modal Fusion Network for Robot Grasp Detection
title Bilateral Cross-Modal Fusion Network for Robot Grasp Detection
title_full Bilateral Cross-Modal Fusion Network for Robot Grasp Detection
title_fullStr Bilateral Cross-Modal Fusion Network for Robot Grasp Detection
title_full_unstemmed Bilateral Cross-Modal Fusion Network for Robot Grasp Detection
title_short Bilateral Cross-Modal Fusion Network for Robot Grasp Detection
title_sort bilateral cross-modal fusion network for robot grasp detection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10057080/
https://www.ncbi.nlm.nih.gov/pubmed/36992051
http://dx.doi.org/10.3390/s23063340
work_keys_str_mv AT zhangqiang bilateralcrossmodalfusionnetworkforrobotgraspdetection
AT sunxueying bilateralcrossmodalfusionnetworkforrobotgraspdetection