
A High-Efficient Reinforcement Learning Approach for Dexterous Manipulation

Bibliographic Details
Main Authors: Zhang, Jianhua, Zhou, Xuanyi, Zhou, Jinyu, Qiu, Shiming, Liang, Guoyuan, Cai, Shibo, Bao, Guanjun
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10296228/
https://www.ncbi.nlm.nih.gov/pubmed/37366859
http://dx.doi.org/10.3390/biomimetics8020264
Description
Summary: Robotic hands have the potential to perform complex tasks in unstructured environments owing to their bionic design, inspired by the most agile biological hand. However, the modeling, planning, and control of dexterous hands remain unresolved open challenges, resulting in the simple, relatively clumsy motions of current robotic end effectors. This paper proposes a dynamic model based on a generative adversarial architecture to learn the state mode of the dexterous hand, reducing the model's prediction error over long time spans. An adaptive trajectory planning kernel is also developed to generate High-Value Area Trajectory (HVAT) data according to the control task and the dynamic model, with adaptive trajectory adjustment achieved by changing the Levenberg–Marquardt (LM) coefficient and the linear search coefficient. Furthermore, an improved Soft Actor–Critic (SAC) algorithm is designed by combining maximum entropy value iteration with HVAT value iteration. An experimental platform and a simulation program were built to verify the proposed method on two manipulation tasks. The experimental results indicate that the proposed dexterous-hand reinforcement learning algorithm trains more efficiently and requires fewer training samples to achieve satisfactory learning and control performance.
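
For orientation only, the sketch below illustrates the maximum-entropy (soft) value iteration that SAC-style methods such as the one described above build on. It is not the authors' code: the toy random MDP, the entropy temperature alpha, and the discount gamma are illustrative assumptions, and the paper's HVAT value-iteration term and adversarial dynamic model are not reproduced here.

    import numpy as np

    # Minimal sketch: soft (maximum-entropy) value iteration on a small random MDP.
    # All quantities below are illustrative assumptions, not values from the paper.
    rng = np.random.default_rng(0)
    n_states, n_actions = 5, 3
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s'] transition probabilities
    R = rng.normal(size=(n_states, n_actions))                        # reward for taking action a in state s
    gamma, alpha = 0.95, 0.2                                          # discount factor, entropy temperature

    def soft_value(q_row, alpha):
        # V(s) = alpha * log sum_a exp(Q(s, a) / alpha), via a numerically stable log-sum-exp
        z = q_row / alpha
        m = z.max()
        return alpha * (m + np.log(np.exp(z - m).sum()))

    Q = np.zeros((n_states, n_actions))
    for _ in range(300):
        V = np.array([soft_value(Q[s], alpha) for s in range(n_states)])
        Q = R + gamma * P @ V   # soft Bellman backup: Q(s,a) = r(s,a) + gamma * E_{s'}[V(s')]

    # Maximum-entropy (Boltzmann) policy implied by the converged soft Q-values
    policy = np.exp((Q - Q.max(axis=1, keepdims=True)) / alpha)
    policy /= policy.sum(axis=1, keepdims=True)
    print("Soft state values:", np.round([soft_value(Q[s], alpha) for s in range(n_states)], 3))
    print("Policy for state 0:", np.round(policy[0], 3))

The entropy temperature alpha controls how stochastic the resulting policy is; the paper's contribution, per the abstract, is to combine this maximum-entropy iteration with an additional value-iteration pass over HVAT data generated by the planning kernel.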