
Realistic Actor-Critic: A framework for balance between value overestimation and underestimation

INTRODUCTION: The value approximation bias is known to lead to suboptimal policies or catastrophic overestimation bias accumulation that prevents the agent from making the right decisions between exploration and exploitation. Algorithms have been proposed to mitigate the above contradiction. However,...


Bibliographic Details
Main Authors:	Li, Sicen; Tang, Qinyun; Pang, Yiming; Ma, Xinmeng; Wang, Gang
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Neuroscience
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9868235/ https://www.ncbi.nlm.nih.gov/pubmed/36699950 http://dx.doi.org/10.3389/fnbot.2022.1081242

_version_	1784876486805684224
author	Li, Sicen Tang, Qinyun Pang, Yiming Ma, Xinmeng Wang, Gang
author_facet	Li, Sicen Tang, Qinyun Pang, Yiming Ma, Xinmeng Wang, Gang
author_sort	Li, Sicen
collection	PubMed
description	INTRODUCTION: The value approximation bias is known to lead to suboptimal policies or catastrophic overestimation bias accumulation that prevents the agent from making the right decisions between exploration and exploitation. Algorithms have been proposed to mitigate the above contradiction. However, we still lack an understanding of how the value bias impacts performance and a method for efficient exploration while keeping stable updates. This study aims to clarify the effect of the value bias and improve the reinforcement learning algorithms to enhance sample efficiency. METHODS: This study designs a simple episodic tabular MDP to research value underestimation and overestimation in actor-critic methods. This study proposes a unified framework called Realistic Actor-Critic (RAC), which employs Universal Value Function Approximators (UVFA) to simultaneously learn policies with different value confidence bounds with the same neural network, each with a different under-overestimation trade-off. RESULTS: This study highlights that agents could over-explore low-value states due to an inflexible under-overestimation trade-off in the fixed-hyperparameter setting, which is a particular form of the exploration-exploitation dilemma. RAC performs directed exploration without over-exploration using the upper bounds while still avoiding overestimation using the lower bounds. Through carefully designed experiments, this study empirically verifies that RAC achieves 10x sample efficiency and 25% performance improvement compared to Soft Actor-Critic in the most challenging Humanoid environment. All source code is available at https://github.com/ihuhuhu/RAC.
DISCUSSION: This research not only provides valuable insights for research on the exploration-exploitation trade-off by studying the frequency with which policies access low-value states under the guidance of different value confidence bounds, but also proposes a new unified framework that can be combined with current actor-critic methods to improve sample efficiency in the continuous control domain.
format	Online Article Text
id	pubmed-9868235
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9868235 2023-01-24 Realistic Actor-Critic: A framework for balance between value overestimation and underestimation Li, Sicen Tang, Qinyun Pang, Yiming Ma, Xinmeng Wang, Gang Front Neurorobot Neuroscience Frontiers Media S.A. 2023-01-09 /pmc/articles/PMC9868235/ /pubmed/36699950 http://dx.doi.org/10.3389/fnbot.2022.1081242 Text en Copyright © 2023 Li, Tang, Pang, Ma and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Neuroscience Li, Sicen Tang, Qinyun Pang, Yiming Ma, Xinmeng Wang, Gang Realistic Actor-Critic: A framework for balance between value overestimation and underestimation
title	Realistic Actor-Critic: A framework for balance between value overestimation and underestimation
title_full	Realistic Actor-Critic: A framework for balance between value overestimation and underestimation
title_fullStr	Realistic Actor-Critic: A framework for balance between value overestimation and underestimation
title_full_unstemmed	Realistic Actor-Critic: A framework for balance between value overestimation and underestimation
title_short	Realistic Actor-Critic: A framework for balance between value overestimation and underestimation
title_sort	realistic actor-critic: a framework for balance between value overestimation and underestimation
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9868235/ https://www.ncbi.nlm.nih.gov/pubmed/36699950 http://dx.doi.org/10.3389/fnbot.2022.1081242
work_keys_str_mv	AT lisicen realisticactorcriticaframeworkforbalancebetweenvalueoverestimationandunderestimation AT tangqinyun realisticactorcriticaframeworkforbalancebetweenvalueoverestimationandunderestimation AT pangyiming realisticactorcriticaframeworkforbalancebetweenvalueoverestimationandunderestimation AT maxinmeng realisticactorcriticaframeworkforbalancebetweenvalueoverestimationandunderestimation AT wanggang realisticactorcriticaframeworkforbalancebetweenvalueoverestimationandunderestimation
