Embodied Object Representation Learning and Recognition
Main Authors: | Van de Maele, Toon, Verbelen, Tim, Çatal, Ozan, Dhoedt, Bart |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Neuroscience |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9049856/ https://www.ncbi.nlm.nih.gov/pubmed/35496899 http://dx.doi.org/10.3389/fnbot.2022.840658 |
_version_ | 1784696234352574464 |
---|---|
author | Van de Maele, Toon Verbelen, Tim Çatal, Ozan Dhoedt, Bart |
author_facet | Van de Maele, Toon Verbelen, Tim Çatal, Ozan Dhoedt, Bart |
author_sort | Van de Maele, Toon |
collection | PubMed |
description | Scene understanding and decomposition is a crucial challenge for intelligent systems, whether it is for object manipulation, navigation, or any other task. Although current machine and deep learning approaches for object detection and classification obtain high accuracy, they typically do not leverage interaction with the world and are limited to a set of objects seen during training. Humans, on the other hand, learn to recognize and classify different objects by actively engaging with them on first encounter. Moreover, recent theories in neuroscience suggest that cortical columns in the neocortex play an important role in this process, by building predictive models about objects in their reference frame. In this article, we present an enactive embodied agent that implements such a generative model for object interaction. For each object category, our system instantiates a deep neural network, called Cortical Column Network (CCN), that represents the object in its own reference frame by learning a generative model that predicts the expected transform in pixel space, given an action. The model parameters are optimized through the active inference paradigm, i.e., the minimization of variational free energy. When provided with a visual observation, an ensemble of CCNs each votes on its belief of observing that specific object category, yielding a potential object classification. If the likelihood of the selected category is too low, the object is detected as an unknown category, and the agent has the ability to instantiate a novel CCN for this category. We validate our system in a simulated environment, where it needs to learn to discern multiple objects from the YCB dataset. We show that classification accuracy improves as an embodied agent can gather more evidence, and that it is able to learn about novel, previously unseen objects. Finally, we show that an agent driven through active inference can choose its actions to reach a preferred observation. |
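The classification scheme sketched in the abstract (each per-category CCN votes with its likelihood; a below-threshold best vote flags an unknown category, which triggers instantiating a new CCN) can be illustrated in miniature. The `CCN` class below is a toy stand-in using a 1-D Gaussian likelihood, and the threshold and feature values are illustrative assumptions, not the paper's actual deep generative models or variational free-energy computation.

```python
import math

class CCN:
    """Toy stand-in for a Cortical Column Network: models one object
    category as a Gaussian over a 1-D feature. (In the paper, each CCN
    is a deep generative model over pixel observations.)"""
    def __init__(self, mean, std=1.0):
        self.mean, self.std = mean, std

    def log_likelihood(self, x):
        # log N(x | mean, std^2)
        return (-0.5 * math.log(2 * math.pi * self.std ** 2)
                - (x - self.mean) ** 2 / (2 * self.std ** 2))

def classify(ccns, observation, threshold):
    """Each CCN 'votes' with its log-likelihood for the observation;
    if even the best vote falls below the threshold, the observation
    is treated as an unknown category."""
    votes = {name: m.log_likelihood(observation) for name, m in ccns.items()}
    best = max(votes, key=votes.get)
    if votes[best] < threshold:
        return "unknown", votes
    return best, votes

# Two known categories (names and feature values are hypothetical).
ccns = {"mug": CCN(0.0), "banana": CCN(5.0)}

label, _ = classify(ccns, 0.3, threshold=-4.0)   # near "mug"
novel, _ = classify(ccns, 12.0, threshold=-4.0)  # far from both
if novel == "unknown":
    # The agent instantiates a novel CCN for the unseen category.
    ccns["object_3"] = CCN(12.0)
```

The threshold plays the role of the likelihood test described in the abstract: it separates "one of my known object models explains this observation" from "no model explains it well enough, so grow the ensemble".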
format | Online Article Text |
id | pubmed-9049856 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-90498562022-04-29 Embodied Object Representation Learning and Recognition Van de Maele, Toon Verbelen, Tim Çatal, Ozan Dhoedt, Bart Front Neurorobot Neuroscience Scene understanding and decomposition is a crucial challenge for intelligent systems, whether it is for object manipulation, navigation, or any other task. Although current machine and deep learning approaches for object detection and classification obtain high accuracy, they typically do not leverage interaction with the world and are limited to a set of objects seen during training. Humans, on the other hand, learn to recognize and classify different objects by actively engaging with them on first encounter. Moreover, recent theories in neuroscience suggest that cortical columns in the neocortex play an important role in this process, by building predictive models about objects in their reference frame. In this article, we present an enactive embodied agent that implements such a generative model for object interaction. For each object category, our system instantiates a deep neural network, called Cortical Column Network (CCN), that represents the object in its own reference frame by learning a generative model that predicts the expected transform in pixel space, given an action. The model parameters are optimized through the active inference paradigm, i.e., the minimization of variational free energy. When provided with a visual observation, an ensemble of CCNs each votes on its belief of observing that specific object category, yielding a potential object classification. If the likelihood of the selected category is too low, the object is detected as an unknown category, and the agent has the ability to instantiate a novel CCN for this category. We validate our system in a simulated environment, where it needs to learn to discern multiple objects from the YCB dataset. We show that classification accuracy improves as an embodied agent can gather more evidence, and that it is able to learn about novel, previously unseen objects. Finally, we show that an agent driven through active inference can choose its actions to reach a preferred observation. Frontiers Media S.A. 2022-04-14 /pmc/articles/PMC9049856/ /pubmed/35496899 http://dx.doi.org/10.3389/fnbot.2022.840658 Text en Copyright © 2022 Van de Maele, Verbelen, Çatal and Dhoedt. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience Van de Maele, Toon Verbelen, Tim Çatal, Ozan Dhoedt, Bart Embodied Object Representation Learning and Recognition |
title | Embodied Object Representation Learning and Recognition |
title_full | Embodied Object Representation Learning and Recognition |
title_fullStr | Embodied Object Representation Learning and Recognition |
title_full_unstemmed | Embodied Object Representation Learning and Recognition |
title_short | Embodied Object Representation Learning and Recognition |
title_sort | embodied object representation learning and recognition |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9049856/ https://www.ncbi.nlm.nih.gov/pubmed/35496899 http://dx.doi.org/10.3389/fnbot.2022.840658 |
work_keys_str_mv | AT vandemaeletoon embodiedobjectrepresentationlearningandrecognition AT verbelentim embodiedobjectrepresentationlearningandrecognition AT catalozan embodiedobjectrepresentationlearningandrecognition AT dhoedtbart embodiedobjectrepresentationlearningandrecognition |