
Vision-Based Efficient Robotic Manipulation with a Dual-Streaming Compact Convolutional Transformer

Learning from visual observation for efficient robotic manipulation remains a significant challenge in Reinforcement Learning (RL). Although pairing an RL policy with a convolutional neural network (CNN) visual encoder achieves high efficiency and success rates, the general multi-task performance of such methods is still limited by the efficacy of the encoder. Meanwhile, the increasing cost of optimizing the encoder for general performance can erode the efficiency advantage of the original policy. Building on the attention mechanism, we design a robotic manipulation method that significantly improves the policy's general performance across multiple tasks by combining a lightweight Transformer-based visual encoder with unsupervised learning and data augmentation. The encoder of our method achieves the performance of the original Transformer with much less data, keeping training efficient and strengthening general multi-task performance. Furthermore, when combining third-person and egocentric views to assimilate global and local visual information, we experimentally demonstrate that the master view outperforms the alternative third-person views on general robotic manipulation tasks. In extensive experiments on tasks from the OpenAI Gym Fetch environment, most notably the Push task, our method reaches a 92% success rate, versus baseline success rates of 65%, 78% for the CNN encoder, and 81% for the ViT encoder, while requiring fewer training steps.
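To make the dual-streaming idea in the abstract concrete, below is a minimal PyTorch sketch of a two-stream encoder: each camera view is tokenized by a small convolutional stem, passed through a lightweight Transformer, and the two view features are fused for a downstream RL policy. The class names, layer sizes, and fusion scheme (DualStreamEncoder, embed_dim, mean-pooled tokens, concatenation plus a linear layer) are illustrative assumptions, not the authors' exact compact convolutional Transformer.

```python
# Minimal sketch (not the authors' exact architecture): a two-stream encoder that
# tokenizes each camera view with a small CNN, runs a lightweight Transformer over
# the resulting tokens, and fuses the third-person and egocentric features.
import torch
import torch.nn as nn


class ViewEncoder(nn.Module):
    """Compact convolutional tokenizer + small Transformer for one camera view."""

    def __init__(self, in_channels: int = 3, embed_dim: int = 128, depth: int = 2):
        super().__init__()
        # Convolutional tokenizer: downsample the image into a grid of tokens.
        self.tokenizer = nn.Sequential(
            nn.Conv2d(in_channels, embed_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.Conv2d(embed_dim, embed_dim, kernel_size=3, stride=2, padding=1),
        )
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=4, dim_feedforward=256, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        tokens = self.tokenizer(img)                # (B, C, H', W')
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, H'*W', C)
        tokens = self.transformer(tokens)
        return tokens.mean(dim=1)                   # pooled view feature, (B, C)


class DualStreamEncoder(nn.Module):
    """Fuses a third-person (master) view and an egocentric view into one feature."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.master_stream = ViewEncoder(embed_dim=embed_dim)
        self.ego_stream = ViewEncoder(embed_dim=embed_dim)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, master_view: torch.Tensor, ego_view: torch.Tensor) -> torch.Tensor:
        z = torch.cat(
            [self.master_stream(master_view), self.ego_stream(ego_view)], dim=-1
        )
        return self.fuse(z)  # feature passed to the RL policy


if __name__ == "__main__":
    enc = DualStreamEncoder()
    master = torch.randn(2, 3, 84, 84)  # third-person "master" camera
    ego = torch.randn(2, 3, 84, 84)     # wrist-mounted egocentric camera
    print(enc(master, ego).shape)       # torch.Size([2, 128])
```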

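For context on the reported success rates, the sketch below shows how a success-rate evaluation loop on the Fetch Push task is typically set up. It uses the maintained gymnasium-robotics port of the OpenAI Gym Fetch environments; the environment id (FetchPush-v2), the episode count, and the random-action placeholder for the learned policy are assumptions for illustration, not details from the paper.

```python
# Hedged sketch of a success-rate evaluation loop on the Fetch Push task.
# The environment id and episode count are assumptions; the trained dual-stream
# policy is replaced here by a random-action placeholder.
import gymnasium as gym
import gymnasium_robotics  # importing registers the Fetch environments  # noqa: F401

env = gym.make("FetchPush-v2")  # assumed id; the original OpenAI Gym used FetchPush-v1
successes, episodes = 0, 100

for _ in range(episodes):
    obs, info = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # placeholder for the learned policy
        obs, reward, terminated, truncated, info = env.step(action)
    successes += int(info.get("is_success", 0.0) == 1.0)

env.close()
print(f"success rate: {successes / episodes:.2%}")
```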

Bibliographic Details
Main Authors: Guo, Hao, Song, Meichao, Ding, Zhen, Yi, Chunzhi, Jiang, Feng
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9823612/
https://www.ncbi.nlm.nih.gov/pubmed/36617113
http://dx.doi.org/10.3390/s23010515
_version_ 1784866202932215808
author Guo, Hao
Song, Meichao
Ding, Zhen
Yi, Chunzhi
Jiang, Feng
author_sort Guo, Hao
collection PubMed
description Learning from visual observation for efficient robotic manipulation remains a significant challenge in Reinforcement Learning (RL). Although pairing an RL policy with a convolutional neural network (CNN) visual encoder achieves high efficiency and success rates, the general multi-task performance of such methods is still limited by the efficacy of the encoder. Meanwhile, the increasing cost of optimizing the encoder for general performance can erode the efficiency advantage of the original policy. Building on the attention mechanism, we design a robotic manipulation method that significantly improves the policy's general performance across multiple tasks by combining a lightweight Transformer-based visual encoder with unsupervised learning and data augmentation. The encoder of our method achieves the performance of the original Transformer with much less data, keeping training efficient and strengthening general multi-task performance. Furthermore, when combining third-person and egocentric views to assimilate global and local visual information, we experimentally demonstrate that the master view outperforms the alternative third-person views on general robotic manipulation tasks. In extensive experiments on tasks from the OpenAI Gym Fetch environment, most notably the Push task, our method reaches a 92% success rate, versus baseline success rates of 65%, 78% for the CNN encoder, and 81% for the ViT encoder, while requiring fewer training steps.
format Online
Article
Text
id pubmed-9823612
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9823612 2023-01-08 Vision-Based Efficient Robotic Manipulation with a Dual-Streaming Compact Convolutional Transformer Guo, Hao Song, Meichao Ding, Zhen Yi, Chunzhi Jiang, Feng Sensors (Basel) Article Learning from visual observation for efficient robotic manipulation remains a significant challenge in Reinforcement Learning (RL). Although pairing an RL policy with a convolutional neural network (CNN) visual encoder achieves high efficiency and success rates, the general multi-task performance of such methods is still limited by the efficacy of the encoder. Meanwhile, the increasing cost of optimizing the encoder for general performance can erode the efficiency advantage of the original policy. Building on the attention mechanism, we design a robotic manipulation method that significantly improves the policy's general performance across multiple tasks by combining a lightweight Transformer-based visual encoder with unsupervised learning and data augmentation. The encoder of our method achieves the performance of the original Transformer with much less data, keeping training efficient and strengthening general multi-task performance. Furthermore, when combining third-person and egocentric views to assimilate global and local visual information, we experimentally demonstrate that the master view outperforms the alternative third-person views on general robotic manipulation tasks. In extensive experiments on tasks from the OpenAI Gym Fetch environment, most notably the Push task, our method reaches a 92% success rate, versus baseline success rates of 65%, 78% for the CNN encoder, and 81% for the ViT encoder, while requiring fewer training steps. MDPI 2023-01-03 /pmc/articles/PMC9823612/ /pubmed/36617113 http://dx.doi.org/10.3390/s23010515 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Vision-Based Efficient Robotic Manipulation with a Dual-Streaming Compact Convolutional Transformer
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9823612/
https://www.ncbi.nlm.nih.gov/pubmed/36617113
http://dx.doi.org/10.3390/s23010515