Cargando…

Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions

Reinforcement learning methods can be used in robotics applications especially for specific target-oriented problems, for example the reward-based recalibration of goal directed actions. To this end still relatively large and continuous state-action spaces need to be efficiently handled. The goal of...

Descripción completa

Detalles Bibliográficos
Autores principales: Tamosiunaite, Minija, Asfour, Tamim, Wörgötter, Florentin
Formato: Texto
Lenguaje:English
Publicado: Springer-Verlag 2009
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2798030/
https://www.ncbi.nlm.nih.gov/pubmed/19229556
http://dx.doi.org/10.1007/s00422-009-0295-8
Descripción
Sumario:Reinforcement learning methods can be used in robotics applications especially for specific target-oriented problems, for example the reward-based recalibration of goal directed actions. To this end still relatively large and continuous state-action spaces need to be efficiently handled. The goal of this paper is, thus, to develop a novel, rather simple method which uses reinforcement learning with function approximation in conjunction with different reward-strategies for solving such problems. For the testing of our method, we use a four degree-of-freedom reaching problem in 3D-space simulated by a two-joint robot arm system with two DOF each. Function approximation is based on 4D, overlapping kernels (receptive fields) and the state-action space contains about 10,000 of these. Different types of reward structures are being compared, for example, reward-on- touching-only against reward-on-approach. Furthermore, forbidden joint configurations are punished. A continuous action space is used. In spite of a rather large number of states and the continuous action space these reward/punishment strategies allow the system to find a good solution usually within about 20 trials. The efficiency of our method demonstrated in this test scenario suggests that it might be possible to use it on a real robot for problems where mixed rewards can be defined in situations where other types of learning might be difficult.