Lipschitzness is all you need to tame off-policy generative adversarial imitation learning
Main Authors: | Blondé, Lionel; Strasser, Pablo; Kalousis, Alexandros |
Format: | Online Article Text |
Language: | English |
Published: | Springer US, 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9114147/ https://www.ncbi.nlm.nih.gov/pubmed/35602587 http://dx.doi.org/10.1007/s10994-022-06144-5 |
_version_ | 1784709720411471872 |
author | Blondé, Lionel Strasser, Pablo Kalousis, Alexandros |
author_facet | Blondé, Lionel Strasser, Pablo Kalousis, Alexandros |
author_sort | Blondé, Lionel |
collection | PubMed |
description | Despite the recent success of reinforcement learning in various domains, these approaches remain, for the most part, deterringly sensitive to hyper-parameters and are often riddled with essential engineering feats allowing their success. We consider the case of off-policy generative adversarial imitation learning, and perform an in-depth review, qualitative and quantitative, of the method. We show that forcing the learned reward function to be local Lipschitz-continuous is a sine qua non condition for the method to perform well. We then study the effects of this necessary condition and provide several theoretical results involving the local Lipschitzness of the state-value function. We complement these guarantees with empirical evidence attesting to the strong positive effect that the consistent satisfaction of the Lipschitzness constraint on the reward has on imitation performance. Finally, we tackle a generic pessimistic reward preconditioning add-on spawning a large class of reward shaping methods, which makes the base method it is plugged into provably more robust, as shown in several additional theoretical guarantees. We then discuss these through a fine-grained lens and share our insights. Crucially, the guarantees derived and reported in this work are valid for any reward satisfying the Lipschitzness condition, nothing is specific to imitation. As such, these may be of independent interest. |
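The abstract's central condition, forcing the learned reward (the discriminator in adversarial imitation learning) to be locally Lipschitz-continuous, is commonly enforced in practice with a WGAN-style gradient penalty. The sketch below is an illustrative toy, not the paper's implementation: it uses a hypothetical linear reward r(s) = w·s, whose Lipschitz constant is exactly ‖w‖, and performs gradient descent on the penalty (‖∇ₛr‖ − 1)² until the reward satisfies the constraint.

```python
import numpy as np

# Toy sketch (assumption: a linear reward stands in for the reward network).
# For r(s) = w . s, grad_s r(s) = w everywhere, so the gradient penalty
# (||grad_s r|| - 1)^2 depends only on ||w||, the reward's Lipschitz constant.
rng = np.random.default_rng(0)
w = rng.normal(size=4) * 5.0   # initialize far from the Lipschitz constraint
target = 1.0                   # desired local Lipschitz bound

for _ in range(500):
    norm = np.linalg.norm(w)
    # d/dw (||w|| - target)^2 = 2 * (||w|| - target) * w / ||w||
    penalty_grad = 2.0 * (norm - target) * w / norm
    w -= 0.05 * penalty_grad   # descend on the penalty alone, for clarity

print(round(float(np.linalg.norm(w)), 3))  # prints 1.0
```

In a real GAIL-style setup the penalty would be evaluated at sampled states (e.g. interpolates of expert and policy states) and added to the discriminator loss with a coefficient, rather than minimized in isolation as here.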
format | Online Article Text |
id | pubmed-9114147 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-91141472022-05-19 Lipschitzness is all you need to tame off-policy generative adversarial imitation learning Blondé, Lionel Strasser, Pablo Kalousis, Alexandros Mach Learn Article Despite the recent success of reinforcement learning in various domains, these approaches remain, for the most part, deterringly sensitive to hyper-parameters and are often riddled with essential engineering feats allowing their success. We consider the case of off-policy generative adversarial imitation learning, and perform an in-depth review, qualitative and quantitative, of the method. We show that forcing the learned reward function to be local Lipschitz-continuous is a sine qua non condition for the method to perform well. We then study the effects of this necessary condition and provide several theoretical results involving the local Lipschitzness of the state-value function. We complement these guarantees with empirical evidence attesting to the strong positive effect that the consistent satisfaction of the Lipschitzness constraint on the reward has on imitation performance. Finally, we tackle a generic pessimistic reward preconditioning add-on spawning a large class of reward shaping methods, which makes the base method it is plugged into provably more robust, as shown in several additional theoretical guarantees. We then discuss these through a fine-grained lens and share our insights. Crucially, the guarantees derived and reported in this work are valid for any reward satisfying the Lipschitzness condition, nothing is specific to imitation. As such, these may be of independent interest. 
Springer US 2022-04-04 2022 /pmc/articles/PMC9114147/ /pubmed/35602587 http://dx.doi.org/10.1007/s10994-022-06144-5 Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Blondé, Lionel Strasser, Pablo Kalousis, Alexandros Lipschitzness is all you need to tame off-policy generative adversarial imitation learning |
title | Lipschitzness is all you need to tame off-policy generative adversarial imitation learning |
title_full | Lipschitzness is all you need to tame off-policy generative adversarial imitation learning |
title_fullStr | Lipschitzness is all you need to tame off-policy generative adversarial imitation learning |
title_full_unstemmed | Lipschitzness is all you need to tame off-policy generative adversarial imitation learning |
title_short | Lipschitzness is all you need to tame off-policy generative adversarial imitation learning |
title_sort | lipschitzness is all you need to tame off-policy generative adversarial imitation learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9114147/ https://www.ncbi.nlm.nih.gov/pubmed/35602587 http://dx.doi.org/10.1007/s10994-022-06144-5 |
work_keys_str_mv | AT blondelionel lipschitznessisallyouneedtotameoffpolicygenerativeadversarialimitationlearning AT strasserpablo lipschitznessisallyouneedtotameoffpolicygenerativeadversarialimitationlearning AT kalousisalexandros lipschitznessisallyouneedtotameoffpolicygenerativeadversarialimitationlearning |