Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play
Many real-world decision-making problems involve multiple conflicting objectives that cannot be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes and they constitute a significant challenge for conventional single-objective reinforce...
Main authors: | Abdelfattah, Sherif; Kasmarik, Kathryn; Hu, Jiankun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2018 |
Subjects: | Neuroscience |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6189603/ https://www.ncbi.nlm.nih.gov/pubmed/30356836 http://dx.doi.org/10.3389/fnbot.2018.00065 |
_version_ | 1783363393330085888 |
---|---|
author | Abdelfattah, Sherif Kasmarik, Kathryn Hu, Jiankun |
author_facet | Abdelfattah, Sherif Kasmarik, Kathryn Hu, Jiankun |
author_sort | Abdelfattah, Sherif |
collection | PubMed |
description | Many real-world decision-making problems involve multiple conflicting objectives that cannot be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes and they constitute a significant challenge for conventional single-objective reinforcement learning methods, especially when an optimal compromise cannot be determined beforehand. Multi-objective reinforcement learning methods address this challenge by finding an optimal coverage set of non-dominated policies that can satisfy any user's preference in solving the problem. However, this is achieved at the cost of computational complexity, time consumption, and a lack of adaptability to non-stationary environment dynamics. To address these limitations, there is a need for adaptive methods that can solve the problem in an online and robust manner. In this paper, we propose a novel developmental method that utilizes adversarial self-play between an intrinsically motivated preference exploration component and a policy coverage set optimization component that robustly evolves a convex coverage set of policies to solve the problem using preferences proposed by the former component. We show experimentally the effectiveness of the proposed method in comparison to state-of-the-art multi-objective reinforcement learning methods in stationary and non-stationary environments. |
format | Online Article Text |
id | pubmed-6189603 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-61896032018-10-23 Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play Abdelfattah, Sherif Kasmarik, Kathryn Hu, Jiankun Front Neurorobot Neuroscience Many real-world decision-making problems involve multiple conflicting objectives that can not be optimized simultaneously without a compromise. Such problems are known as multi-objective Markov decision processes and they constitute a significant challenge for conventional single-objective reinforcement learning methods, especially when an optimal compromise cannot be determined beforehand. Multi-objective reinforcement learning methods address this challenge by finding an optimal coverage set of non-dominated policies that can satisfy any user's preference in solving the problem. However, this is achieved with costs of computational complexity, time consumption, and lack of adaptability to non-stationary environment dynamics. In order to address these limitations, there is a need for adaptive methods that can solve the problem in an online and robust manner. In this paper, we propose a novel developmental method that utilizes the adversarial self-play between an intrinsically motivated preference exploration component, and a policy coverage set optimization component that robustly evolves a convex coverage set of policies to solve the problem using preferences proposed by the former component. We show experimentally the effectiveness of the proposed method in comparison to state-of-the-art multi-objective reinforcement learning methods in stationary and non-stationary environments. Frontiers Media S.A. 2018-10-09 /pmc/articles/PMC6189603/ /pubmed/30356836 http://dx.doi.org/10.3389/fnbot.2018.00065 Text en Copyright © 2018 Abdelfattah, Kasmarik and Hu. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience Abdelfattah, Sherif Kasmarik, Kathryn Hu, Jiankun Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play |
title | Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play |
title_full | Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play |
title_fullStr | Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play |
title_full_unstemmed | Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play |
title_short | Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play |
title_sort | evolving robust policy coverage sets in multi-objective markov decision processes through intrinsically motivated self-play |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6189603/ https://www.ncbi.nlm.nih.gov/pubmed/30356836 http://dx.doi.org/10.3389/fnbot.2018.00065 |
work_keys_str_mv | AT abdelfattahsherif evolvingrobustpolicycoveragesetsinmultiobjectivemarkovdecisionprocessesthroughintrinsicallymotivatedselfplay AT kasmarikkathryn evolvingrobustpolicycoveragesetsinmultiobjectivemarkovdecisionprocessesthroughintrinsicallymotivatedselfplay AT hujiankun evolvingrobustpolicycoveragesetsinmultiobjectivemarkovdecisionprocessesthroughintrinsicallymotivatedselfplay |