Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer

EM-based policy search methods estimate a lower bound of the expected return from the histories of episodes and iteratively update the policy parameters using the maximum of a lower bound of expected return, which makes gradient calculation and learning rate tuning unnecessary. Previous algorithms l...

Bibliographic Details
Main authors: Wang, Jiexin; Uchibe, Eiji; Doya, Kenji
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5256123/
https://www.ncbi.nlm.nih.gov/pubmed/28167910
http://dx.doi.org/10.3389/fnbot.2017.00001