Bilateral Cross-Modal Fusion Network for Robot Grasp Detection
In the field of vision-based robot grasping, effectively leveraging RGB and depth information to accurately determine the position and pose of a target is a critical issue. To address this challenge, we proposed a tri-stream cross-modal fusion architecture for 2-DoF visual grasp detection. This arch...
Main Authors: | Zhang, Qiang; Sun, Xueying |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10057080/ https://www.ncbi.nlm.nih.gov/pubmed/36992051 http://dx.doi.org/10.3390/s23063340 |
_version_ | 1785016275442860032 |
---|---|
author | Zhang, Qiang Sun, Xueying |
author_facet | Zhang, Qiang Sun, Xueying |
author_sort | Zhang, Qiang |
collection | PubMed |
description | In the field of vision-based robot grasping, effectively leveraging RGB and depth information to accurately determine the position and pose of a target is a critical issue. To address this challenge, we proposed a tri-stream cross-modal fusion architecture for 2-DoF visual grasp detection. This architecture facilitates the interaction of RGB and depth bilateral information and was designed to efficiently aggregate multiscale information. Our novel modal interaction module (MIM) with a spatial-wise cross-attention algorithm adaptively captures cross-modal feature information. Meanwhile, the channel interaction modules (CIM) further enhance the aggregation of different modal streams. In addition, we efficiently aggregated global multiscale information through a hierarchical structure with skipping connections. To evaluate the performance of our proposed method, we conducted validation experiments on standard public datasets and real robot grasping experiments. We achieved image-wise detection accuracy of 99.4% and 96.7% on Cornell and Jacquard datasets, respectively. The object-wise detection accuracy reached 97.8% and 94.6% on the same datasets. Furthermore, physical experiments using the 6-DoF Elite robot demonstrated a success rate of 94.5%. These experiments highlight the superior accuracy of our proposed method. |
format | Online Article Text |
id | pubmed-10057080 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10057080 2023-03-30 Bilateral Cross-Modal Fusion Network for Robot Grasp Detection Zhang, Qiang Sun, Xueying Sensors (Basel) Article In the field of vision-based robot grasping, effectively leveraging RGB and depth information to accurately determine the position and pose of a target is a critical issue. To address this challenge, we proposed a tri-stream cross-modal fusion architecture for 2-DoF visual grasp detection. This architecture facilitates the interaction of RGB and depth bilateral information and was designed to efficiently aggregate multiscale information. Our novel modal interaction module (MIM) with a spatial-wise cross-attention algorithm adaptively captures cross-modal feature information. Meanwhile, the channel interaction modules (CIM) further enhance the aggregation of different modal streams. In addition, we efficiently aggregated global multiscale information through a hierarchical structure with skipping connections. To evaluate the performance of our proposed method, we conducted validation experiments on standard public datasets and real robot grasping experiments. We achieved image-wise detection accuracy of 99.4% and 96.7% on Cornell and Jacquard datasets, respectively. The object-wise detection accuracy reached 97.8% and 94.6% on the same datasets. Furthermore, physical experiments using the 6-DoF Elite robot demonstrated a success rate of 94.5%. These experiments highlight the superior accuracy of our proposed method. MDPI 2023-03-22 /pmc/articles/PMC10057080/ /pubmed/36992051 http://dx.doi.org/10.3390/s23063340 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Zhang, Qiang Sun, Xueying Bilateral Cross-Modal Fusion Network for Robot Grasp Detection |
title | Bilateral Cross-Modal Fusion Network for Robot Grasp Detection |
title_full | Bilateral Cross-Modal Fusion Network for Robot Grasp Detection |
title_fullStr | Bilateral Cross-Modal Fusion Network for Robot Grasp Detection |
title_full_unstemmed | Bilateral Cross-Modal Fusion Network for Robot Grasp Detection |
title_short | Bilateral Cross-Modal Fusion Network for Robot Grasp Detection |
title_sort | bilateral cross-modal fusion network for robot grasp detection |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10057080/ https://www.ncbi.nlm.nih.gov/pubmed/36992051 http://dx.doi.org/10.3390/s23063340 |
work_keys_str_mv | AT zhangqiang bilateralcrossmodalfusionnetworkforrobotgraspdetection AT sunxueying bilateralcrossmodalfusionnetworkforrobotgraspdetection |