Scaling Up Q-Learning via Exploiting State–Action Equivalence

Recent success stories in reinforcement learning have demonstrated that leveraging structural properties of the underlying environment is key in devising viable methods capable of solving complex tasks. We study off-policy learning in discounted reinforcement learning, where some equivalence relatio...

Bibliographic Details
Main authors:	Lyu, Yunlian; Côme, Aymeric; Zhang, Yijie; Talebi, Mohammad Sadegh
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10137898/ https://www.ncbi.nlm.nih.gov/pubmed/37190372 http://dx.doi.org/10.3390/e25040584

author	Lyu, Yunlian; Côme, Aymeric; Zhang, Yijie; Talebi, Mohammad Sadegh
collection	PubMed
description	Recent success stories in reinforcement learning have demonstrated that leveraging structural properties of the underlying environment is key in devising viable methods capable of solving complex tasks. We study off-policy learning in discounted reinforcement learning, where some equivalence relation in the environment exists. We introduce a new model-free algorithm, called QL-ES (Q-learning with equivalence structure), which is a variant of (asynchronous) Q-learning tailored to exploit the equivalence structure in the MDP. We report a non-asymptotic PAC-type sample complexity bound for QL-ES, thereby establishing its sample efficiency. This bound also allows us to quantify the superiority of QL-ES over Q-learning analytically, which shows that the theoretical gain in some domains can be massive. We report extensive numerical experiments demonstrating that QL-ES converges significantly faster than (structure-oblivious) Q-learning empirically. They imply that the empirical performance gain obtained by exploiting the equivalence structure could be massive, even in simple domains. To the best of our knowledge, QL-ES is the first provably efficient model-free algorithm to exploit the equivalence structure in finite MDPs.
format	Online Article Text
id	pubmed-10137898
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10137898 2023-04-28 Entropy (Basel) Article MDPI 2023-03-29 /pmc/articles/PMC10137898/ /pubmed/37190372 http://dx.doi.org/10.3390/e25040584 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Scaling Up Q-Learning via Exploiting State–Action Equivalence
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10137898/ https://www.ncbi.nlm.nih.gov/pubmed/37190372 http://dx.doi.org/10.3390/e25040584
