
Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer

Bibliographic Details
Main authors:	Wang, Jiexin; Uchibe, Eiji; Doya, Kenji
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2017
Subjects:	Neuroscience
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5256123/ https://www.ncbi.nlm.nih.gov/pubmed/28167910 http://dx.doi.org/10.3389/fnbot.2017.00001

author	Wang, Jiexin; Uchibe, Eiji; Doya, Kenji
collection	PubMed
description	EM-based policy search methods estimate a lower bound of the expected return from the history of episodes and iteratively update the policy parameters by maximizing this lower bound, which makes gradient calculation and learning-rate tuning unnecessary. Previous algorithms, such as Policy learning by Weighting Exploration with the Returns (PoWER), Fitness Expectation Maximization (FEM), and EM-based Policy Hyperparameter Exploration (EPHE), implemented mechanisms to discard useless low-return episodes either implicitly or by using a fixed baseline determined by the experimenter. In this paper, we propose an adaptive baseline method that discards worse samples from the reward history, and we examine different baselines, including the mean and multiples of the SD from the mean. Simulations of the benchmark tasks of pendulum swing-up and cart-pole balancing, and of standing up and balancing a two-wheeled smartphone robot, showed improved performance. We further implemented the adaptive baseline with the mean in our two-wheeled smartphone robot hardware to test its performance in the standing-up and balancing task and in a view-based approaching task. Our results showed that, with the adaptive baseline, the method outperformed the previous algorithms and achieved faster and more precise behaviors with a higher success rate.
format	Online Article Text
id	pubmed-5256123
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
source	Front Neurorobot, Frontiers Media S.A., 2017-01-23
rights	Copyright © 2017 Wang, Uchibe and Doya. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5256123/ https://www.ncbi.nlm.nih.gov/pubmed/28167910 http://dx.doi.org/10.3389/fnbot.2017.00001
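The description states that the update maximizes a lower bound on the expected return without gradients or a learning rate, and that an adaptive baseline (the mean, or the mean offset by multiples of the SD) discards low-return samples. The following is a minimal Python sketch of that idea, not the authors' implementation. It makes several assumptions. The policy parameters are drawn from a diagonal Gaussian search distribution. Each sample yields one episode return. The baseline is recomputed from the current batch only, whereas the paper uses a reward history, which this sketch does not model. The names `adaptive_baseline`, `em_update` and `policy_search`, the offset `k`, the `min_sigma` floor and the toy objective are all hypothetical illustrations.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def adaptive_baseline(returns: Sequence[float], k: float = 0.0) -> float:
    """Baseline b = mean(returns) + k * SD(returns), recomputed every iteration.

    k = 0.0 gives the mean baseline; other values of k (a hypothetical knob
    in this sketch) place the baseline a multiple of the SD from the mean.
    """
    n = len(returns)
    mean = sum(returns) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in returns) / n)
    return mean + k * sd


def em_update(
    thetas: Sequence[Sequence[float]],
    returns: Sequence[float],
    mu: Sequence[float],
    sigma: Sequence[float],
    k: float = 0.0,
    min_sigma: float = 1e-3,
) -> Tuple[List[float], List[float]]:
    """One reward-weighted (EM-style) update of N(mu, diag(sigma**2)).

    Samples whose return does not exceed the adaptive baseline get weight 0,
    i.e. they are discarded; the rest are weighted by (R - b), so weights stay
    non-negative even when the raw returns are negative.
    """
    b = adaptive_baseline(returns, k)
    weights = [max(r - b, 0.0) for r in returns]
    total = sum(weights)
    if total <= 0.0:
        # Nothing beat the baseline (e.g. all returns equal): keep the old distribution.
        return list(mu), list(sigma)
    new_mu: List[float] = []
    new_sigma: List[float] = []
    for d in range(len(mu)):
        # Closed-form M-step: weighted mean of the kept samples...
        m = sum(w * t[d] for w, t in zip(weights, thetas)) / total
        # ...and weighted variance around the mean that generated the samples.
        var = sum(w * (t[d] - mu[d]) ** 2 for w, t in zip(weights, thetas)) / total
        new_mu.append(m)
        new_sigma.append(max(math.sqrt(var), min_sigma))  # floor keeps some exploration
    return new_mu, new_sigma


def policy_search(
    objective: Callable[[List[float]], float],
    mu: List[float],
    sigma: List[float],
    n_iters: int = 100,
    n_samples: int = 20,
    k: float = 0.0,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Sample parameter vectors, score one 'episode' each, then apply em_update."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        thetas = [[rng.gauss(m, s) for m, s in zip(mu, sigma)] for _ in range(n_samples)]
        returns = [objective(t) for t in thetas]
        mu, sigma = em_update(thetas, returns, mu, sigma, k)
    return mu, sigma


if __name__ == "__main__":
    target = [1.0, -2.0]
    # Toy stand-in for an episode return: negative squared distance to a target vector.
    toy_return = lambda th: -sum((a - c) ** 2 for a, c in zip(th, target))
    final_mu, final_sigma = policy_search(toy_return, [0.0, 0.0], [1.0, 1.0])
    print(final_mu, final_sigma)
```

Subtracting the baseline serves two purposes. It lets the EM-style weighted average accept negative returns, and it drops every sample at or below the baseline. With `k = 0.0`, roughly the better half of each batch drives the update.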


Similar items