Model-free reinforcement learning for robust locomotion using demonstrations from trajectory optimization
We present a general, two-stage reinforcement learning approach to create robust policies that can be deployed on real robots without any additional training using a single demonstration generated by trajectory optimization. The demonstration is used in the first stage as a starting point to facilit...
Main authors: | Bogdanovic, Miroslav; Khadiv, Majid; Righetti, Ludovic |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Robotics and AI |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9484268/ https://www.ncbi.nlm.nih.gov/pubmed/36134338 http://dx.doi.org/10.3389/frobt.2022.854212 |
_version_ | 1784791846323486720 |
---|---|
author | Bogdanovic, Miroslav; Khadiv, Majid; Righetti, Ludovic |
author_sort | Bogdanovic, Miroslav |
collection | PubMed |
description | We present a general, two-stage reinforcement learning approach to create robust policies that can be deployed on real robots without any additional training using a single demonstration generated by trajectory optimization. The demonstration is used in the first stage as a starting point to facilitate initial exploration. In the second stage, the relevant task reward is optimized directly and a policy robust to environment uncertainties is computed. We demonstrate and examine in detail the performance and robustness of our approach on highly dynamic hopping and bounding tasks on a quadruped robot. |
format | Online Article Text |
id | pubmed-9484268 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9484268 2022-09-20 Model-free reinforcement learning for robust locomotion using demonstrations from trajectory optimization Bogdanovic, Miroslav; Khadiv, Majid; Righetti, Ludovic Front Robot AI Robotics and AI We present a general, two-stage reinforcement learning approach to create robust policies that can be deployed on real robots without any additional training using a single demonstration generated by trajectory optimization. The demonstration is used in the first stage as a starting point to facilitate initial exploration. In the second stage, the relevant task reward is optimized directly and a policy robust to environment uncertainties is computed. We demonstrate and examine in detail the performance and robustness of our approach on highly dynamic hopping and bounding tasks on a quadruped robot. Frontiers Media S.A. 2022-08-31 /pmc/articles/PMC9484268/ /pubmed/36134338 http://dx.doi.org/10.3389/frobt.2022.854212 Text en Copyright © 2022 Bogdanovic, Khadiv and Righetti. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Model-free reinforcement learning for robust locomotion using demonstrations from trajectory optimization |
topic | Robotics and AI |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9484268/ https://www.ncbi.nlm.nih.gov/pubmed/36134338 http://dx.doi.org/10.3389/frobt.2022.854212 |