
A Critical Period for Robust Curriculum‐Based Deep Reinforcement Learning of Sequential Action in a Robot Arm

Bibliographic Details
Main Authors: de Kleijn, Roy; Sen, Deniz; Kachergis, George
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc., 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9303318/
https://www.ncbi.nlm.nih.gov/pubmed/35005844
http://dx.doi.org/10.1111/tops.12595
author de Kleijn, Roy
Sen, Deniz
Kachergis, George
collection PubMed
description Many everyday activities are sequential in nature. That is, they can be seen as a sequence of subactions and sometimes subgoals. In the motor execution of sequential action, context effects are observed in which later subactions modulate the execution of earlier subactions (e.g., reaching for an overturned mug, people will optimize their grasp to achieve a comfortable end state). A trajectory (movement) adaptation of an often‐used paradigm in the study of sequential action, the serial response time task, showed several context effects of which centering behavior is of special interest. Centering behavior refers to the tendency (or strategy) of subjects to move their arm or mouse cursor to a position equidistant to all stimuli in the absence of predictive information, thereby reducing movement time to all possible targets. In the current study, we investigated sequential action in a virtual robotic agent trained using proximal policy optimization, a state‐of‐the‐art deep reinforcement learning algorithm. The agent was trained to reach for appearing targets, similar to a serial response time task given to humans. We found that agents were more likely to develop centering behavior similar to human subjects after curricularized learning. In our curriculum, we first rewarded agents for reaching targets before introducing a penalty for energy expenditure. When the penalty was applied with no curriculum, many agents failed to learn the task due to a lack of action space exploration, resulting in high variability of agents' performance. Our findings suggest that in virtual agents, similar to infants, early energetic exploration can promote robust later learning. This may have the same effect as infants' curiosity‐based learning by which they shape their own curriculum. However, introducing new goals cannot wait too long, as there may be critical periods in development after which agents (as humans) cannot flexibly learn to incorporate new objectives. These lessons are making their way into machine learning and offer exciting new avenues for studying both human and machine learning of sequential action.
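The training setup described in the abstract (reward agents for reaching targets first, then introduce a penalty for energy expenditure) can be made concrete as a staged reward function. The following Python sketch is only an illustration of that curriculum: the environment interface, the thresholds, and the constants REACH_THRESHOLD, ENERGY_WEIGHT, and CURRICULUM_SWITCH_STEP are all assumptions for exposition, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of the staged (curriculum) reward described in the abstract.
# All names, thresholds, and the switch step are illustrative assumptions,
# not the authors' implementation.

REACH_THRESHOLD = 0.05            # distance at which a target counts as reached (assumed)
ENERGY_WEIGHT = 0.01              # weight of the energy-expenditure penalty (assumed)
CURRICULUM_SWITCH_STEP = 100_000  # training step at which the penalty is introduced (assumed)

def curriculum_reward(effector_pos, target_pos, joint_torques, global_step):
    """Reaching reward, with an energy penalty phased in by the curriculum."""
    dist = np.linalg.norm(np.asarray(effector_pos) - np.asarray(target_pos))
    r = 1.0 if dist < REACH_THRESHOLD else 0.0

    # Phase 2: penalize energy expenditure, but only after the agent has
    # already been rewarded for (and has presumably learned) reaching targets.
    # Applying this penalty from step 0 corresponds to the no-curriculum
    # condition in which many agents failed to explore the action space.
    if global_step >= CURRICULUM_SWITCH_STEP:
        r -= ENERGY_WEIGHT * float(np.sum(np.square(joint_torques)))
    return r

# Early in training the energy term is absent, so exploration is cheap...
print(curriculum_reward([0.0, 0.0, 0.0], [0.0, 0.0, 0.04], [0.5, 0.5], global_step=10))       # 1.0
# ...while after the switch the same reach is rewarded net of its energy cost.
print(curriculum_reward([0.0, 0.0, 0.0], [0.0, 0.0, 0.04], [0.5, 0.5], global_step=200_000))  # 0.995
```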
format Online
Article
Text
id pubmed-9303318
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9303318 2022-07-22. A Critical Period for Robust Curriculum‐Based Deep Reinforcement Learning of Sequential Action in a Robot Arm. de Kleijn, Roy; Sen, Deniz; Kachergis, George. Top Cogn Sci. Topic: Everyday Activities — Editors: Holger Schultheis and Richard P. Cooper. John Wiley and Sons Inc. Published online 2022-01-10; issue date 2022-04. /pmc/articles/PMC9303318/ /pubmed/35005844 http://dx.doi.org/10.1111/tops.12595. Text, en. © 2022 The Authors. Topics in Cognitive Science published by Wiley Periodicals LLC on behalf of Cognitive Science Society. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
title A Critical Period for Robust Curriculum‐Based Deep Reinforcement Learning of Sequential Action in a Robot Arm
topic Topic: Everyday Activities — Editors: Holger Schultheis and Richard P. Cooper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9303318/
https://www.ncbi.nlm.nih.gov/pubmed/35005844
http://dx.doi.org/10.1111/tops.12595