Reinforcement Learning on Slow Features of High-Dimensional Input Streams

Bibliographic Details
Main Authors: Legenstein, Robert, Wilbert, Niko, Wiskott, Laurenz
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2924248/
https://www.ncbi.nlm.nih.gov/pubmed/20808883
http://dx.doi.org/10.1371/journal.pcbi.1000894
author Legenstein, Robert
Wilbert, Niko
Wiskott, Laurenz
collection PubMed
description Humans and animals are able to learn complex behaviors based on a massive stream of sensory information from different modalities. Early animal studies have identified learning mechanisms that are based on reward and punishment such that animals tend to avoid actions that lead to punishment whereas rewarded actions are reinforced. However, most algorithms for reward-based learning are only applicable if the dimensionality of the state-space is sufficiently small or its structure is sufficiently simple. Therefore, the question arises how the problem of learning on high-dimensional data is solved in the brain. In this article, we propose a biologically plausible generic two-stage learning system that can directly be applied to raw high-dimensional input streams. The system is composed of a hierarchical slow feature analysis (SFA) network for preprocessing and a simple neural network on top that is trained based on rewards. We demonstrate by computer simulations that this generic architecture is able to learn quite demanding reinforcement learning tasks on high-dimensional visual input streams in a time that is comparable to the time needed when an explicit highly informative low-dimensional state-space representation is given instead of the high-dimensional visual input. The learning speed of the proposed architecture in a task similar to the Morris water maze task is comparable to that found in experimental studies with rats. This study thus supports the hypothesis that slowness learning is one important unsupervised learning principle utilized in the brain to form efficient state representations for behavioral learning.
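The description above outlines a two-stage architecture: an unsupervised slow feature analysis (SFA) stage that compresses a high-dimensional input stream into a few slowly varying features, and a simple reward-trained network that learns a behavior on top of those features. The sketch below is a minimal illustration of that idea, not the authors' implementation: it substitutes a synthetic high-dimensional stream for the paper's visual input, a single linear SFA step for the hierarchical SFA network, and a REINFORCE-style logistic readout for the reward-trained network. The toy task and all variable names are assumptions made for this example.

```python
# Minimal sketch (assumption: not the authors' code) of the two-stage idea:
# (1) slow feature analysis compresses a high-dimensional stream into a few
# slowly varying features, (2) a simple reward-trained readout learns a
# behavior on those features. Linear SFA is implemented directly with NumPy.
import numpy as np

rng = np.random.default_rng(0)

# --- 1. Synthetic high-dimensional stream driven by one slow latent variable ---
T, D = 5000, 100                                      # time steps, observation dimensionality
latent = np.cumsum(rng.normal(scale=0.02, size=T))    # slow random walk
latent = (latent - latent.min()) / (latent.max() - latent.min())  # map to [0, 1]
mixing = rng.normal(size=(1, D))
obs = latent[:, None] @ mixing + rng.normal(scale=0.5, size=(T, D))  # plus fast noise

# --- 2. Linear SFA: whiten, then minimize the variance of the time derivative ---
def linear_sfa(x, n_features=1):
    x = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    keep = evals > 1e-10
    whitener = evecs[:, keep] / np.sqrt(evals[keep])  # whitening matrix
    z = x @ whitener
    dz = np.diff(z, axis=0)
    d_evals, d_evecs = np.linalg.eigh(np.cov(dz, rowvar=False))
    w = d_evecs[:, :n_features]                       # smallest eigenvalues = slowest directions
    return z @ w

slow = linear_sfa(obs, n_features=1)[:, 0]

# --- 3. Reward-based readout: action 0 if the hidden latent is "low", 1 if "high".
# The agent never sees the latent, only the slow feature; a reward-modulated
# (REINFORCE-style) update trains a logistic policy.
w, b, lr = 0.0, 0.0, 0.1
for t in rng.permutation(T):
    p = 1.0 / (1.0 + np.exp(-(w * slow[t] + b)))      # probability of action 1
    action = int(rng.random() < p)
    reward = 1.0 if action == int(latent[t] > 0.5) else -1.0
    w += lr * reward * (action - p) * slow[t]         # policy-gradient-like update
    b += lr * reward * (action - p)

# Evaluate: fraction of greedy actions that match the hidden task rule.
greedy = (w * slow + b) > 0
print(f"greedy accuracy on the hidden rule: {np.mean(greedy == (latent > 0.5)):.2f}")
```

Since the slowest feature recovers the hidden latent variable up to sign and scale, the reward-modulated readout typically learns the toy rule within a few thousand updates. The point is only that the low-dimensional SFA output is a workable state representation for reward-based learning, which is the claim the abstract tests at much larger scale on visual input.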
format Text
id pubmed-2924248
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2924248 2010-08-31 Reinforcement Learning on Slow Features of High-Dimensional Input Streams Legenstein, Robert; Wilbert, Niko; Wiskott, Laurenz PLoS Comput Biol Research Article Public Library of Science 2010-08-19 /pmc/articles/PMC2924248/ /pubmed/20808883 http://dx.doi.org/10.1371/journal.pcbi.1000894 Text en Legenstein et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Reinforcement Learning on Slow Features of High-Dimensional Input Streams
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2924248/
https://www.ncbi.nlm.nih.gov/pubmed/20808883
http://dx.doi.org/10.1371/journal.pcbi.1000894