Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play

Many real-world decision-making problems involve multiple conflicting objectives that can not be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes and they constitute a significant challenge for conventional single-objective reinforce...

Bibliographic Details
Main Authors:	Abdelfattah, Sherif; Kasmarik, Kathryn; Hu, Jiankun
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2018
Subjects:	Neuroscience
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6189603/ https://www.ncbi.nlm.nih.gov/pubmed/30356836 http://dx.doi.org/10.3389/fnbot.2018.00065

author	Abdelfattah, Sherif; Kasmarik, Kathryn; Hu, Jiankun
collection	PubMed
description	Many real-world decision-making problems involve multiple conflicting objectives that can not be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes and they constitute a significant challenge for conventional single-objective reinforcement learning methods, especially when an optimal compromise cannot be determined beforehand. Multi-objective reinforcement learning methods address this challenge by finding an optimal coverage set of non-dominated policies that can satisfy any user's preference in solving the problem. However, this is achieved with costs of computational complexity, time consumption, and lack of adaptability to non-stationary environment dynamics. In order to address these limitations, there is a need for adaptive methods that can solve the problem in an online and robust manner. In this paper, we propose a novel developmental method that utilizes the adversarial self-play between an intrinsically motivated preference exploration component, and a policy coverage set optimization component that robustly evolves a convex coverage set of policies to solve the problem using preferences proposed by the former component. We show experimentally the effectiveness of the proposed method in comparison to state-of-the-art multi-objective reinforcement learning methods in stationary and non-stationary environments.
format	Online Article Text
id	pubmed-6189603
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-6189603 2018-10-23 Front Neurorobot Neuroscience Frontiers Media S.A. 2018-10-09 /pmc/articles/PMC6189603/ /pubmed/30356836 http://dx.doi.org/10.3389/fnbot.2018.00065 Text en Copyright © 2018 Abdelfattah, Kasmarik and Hu. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6189603/ https://www.ncbi.nlm.nih.gov/pubmed/30356836 http://dx.doi.org/10.3389/fnbot.2018.00065

Similar Items