Bilateral Cross-Modal Fusion Network for Robot Grasp Detection

Bibliographic Details
Main Authors:	Zhang, Qiang; Sun, Xueying
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10057080/ https://www.ncbi.nlm.nih.gov/pubmed/36992051 http://dx.doi.org/10.3390/s23063340

collection	PubMed
description	In the field of vision-based robot grasping, effectively leveraging RGB and depth information to accurately determine the position and pose of a target is a critical issue. To address this challenge, we proposed a tri-stream cross-modal fusion architecture for 2-DoF visual grasp detection. This architecture facilitates the interaction of bilateral RGB and depth information and was designed to efficiently aggregate multiscale information. Our novel modal interaction module (MIM), built on a spatial-wise cross-attention algorithm, adaptively captures cross-modal feature information. Meanwhile, the channel interaction modules (CIM) further enhance the aggregation of the different modal streams. In addition, we efficiently aggregated global multiscale information through a hierarchical structure with skip connections. To evaluate the performance of our proposed method, we conducted validation experiments on standard public datasets as well as real robot grasping experiments. We achieved image-wise detection accuracies of 99.4% and 96.7% on the Cornell and Jacquard datasets, respectively, and object-wise detection accuracies of 97.8% and 94.6% on the same datasets. Furthermore, physical experiments with a 6-DoF Elite robot demonstrated a grasp success rate of 94.5%. These experiments highlight the superior accuracy of our proposed method.
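The description names a spatial-wise cross-attention, inside the modal interaction module (MIM), through which the RGB and depth streams exchange information. The record does not give the module's equations, so the following is only a minimal generic sketch of bilateral spatial cross-attention. It assumes identity query/key/value projections and feature maps flattened to (H*W) rows of C channels. The function names and the averaged third stream are hypothetical illustrations, not the authors' MIM or CIM.

```python
import math
from typing import List, Tuple

# A feature map flattened to (H*W) rows of C channel values.
Features = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_cross_attention(query: Features, context: Features) -> Features:
    """Every spatial position of `query` attends over all positions of `context`.

    Assumption: identity Q/K/V projections; a trained model would first apply
    learned linear maps. Dot-product scores are scaled by 1/sqrt(C).
    """
    channels = len(query[0])
    scale = 1.0 / math.sqrt(channels)
    out: Features = []
    for q in query:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Attention-weighted sum of the context features (values == keys here).
        out.append([sum(w * row[j] for w, row in zip(weights, context))
                    for j in range(channels)])
    return out


def add(a: Features, b: Features) -> Features:
    """Element-wise sum of two equally shaped feature maps (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bilateral_fusion(rgb: Features, depth: Features) -> Tuple[Features, Features, Features]:
    """Hypothetical bilateral step returning three streams.

    RGB attends to depth and depth attends to RGB; each result is added back
    residually. The third stream is a plain average of the two enhanced
    streams, a stand-in for learned fusion, not a reproduction of the paper.
    """
    rgb_out = add(rgb, spatial_cross_attention(rgb, depth))
    depth_out = add(depth, spatial_cross_attention(depth, rgb))
    fused = [[(x + y) / 2 for x, y in zip(r, d)] for r, d in zip(rgb_out, depth_out)]
    return rgb_out, depth_out, fused


if __name__ == "__main__":
    # A 2x2 map flattened to 4 positions, 3 channels per modality (toy values).
    rgb = [[0.2, 0.1, 0.0], [0.9, 0.3, 0.4], [0.5, 0.5, 0.5], [0.0, 1.0, 0.2]]
    depth = [[0.8, 0.8, 0.8], [0.1, 0.1, 0.1], [0.4, 0.4, 0.4], [0.6, 0.6, 0.6]]
    rgb_out, depth_out, fused = bilateral_fusion(rgb, depth)
    print(len(fused), len(fused[0]))  # prints "4 3"
```

Running attention in both directions is what makes the exchange bilateral in this sketch; a real network would learn the projections, operate on batched tensors at several scales, and replace the simple average with trained fusion layers.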
id	pubmed-10057080
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Sensors (Basel)
published	2023-03-22
rights	© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).