
Humans and Deep Networks Largely Agree on Which Kinds of Variation Make Object Recognition Harder

View-invariant object recognition is a challenging problem that has attracted much attention among the psychology, neuroscience, and computer vision communities. Humans are notoriously good at it, even if some variations are presumably more difficult to handle than others (e.g., 3D rotations). Human...

Bibliographic Details
Main Authors:	Kheradpisheh, Saeed R., Ghodrati, Masoud, Ganjtabesh, Mohammad, Masquelier, Timothée
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2016
Subjects:	Neuroscience
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5015476/ https://www.ncbi.nlm.nih.gov/pubmed/27642281 http://dx.doi.org/10.3389/fncom.2016.00092

_version_	1782452444328886272
author	Kheradpisheh, Saeed R. Ghodrati, Masoud Ganjtabesh, Mohammad Masquelier, Timothée
author_facet	Kheradpisheh, Saeed R. Ghodrati, Masoud Ganjtabesh, Mohammad Masquelier, Timothée
author_sort	Kheradpisheh, Saeed R.
collection	PubMed
description	View-invariant object recognition is a challenging problem that has attracted much attention among the psychology, neuroscience, and computer vision communities. Humans are notoriously good at it, even if some variations are presumably more difficult to handle than others (e.g., 3D rotations). Humans are thought to solve the problem through hierarchical processing along the ventral stream, which progressively extracts more and more invariant visual features. This feed-forward architecture has inspired a new generation of bio-inspired computer vision systems called deep convolutional neural networks (DCNN), which are currently the best models for object recognition in natural images. Here, for the first time, we systematically compared human feed-forward vision and DCNNs at view-invariant object recognition task using the same set of images and controlling the kinds of transformation (position, scale, rotation in plane, and rotation in depth) as well as their magnitude, which we call “variation level.” We used four object categories: car, ship, motorcycle, and animal. In total, 89 human subjects participated in 10 experiments in which they had to discriminate between two or four categories after rapid presentation with backward masking. We also tested two recent DCNNs (proposed respectively by Hinton's group and Zisserman's group) on the same tasks. We found that humans and DCNNs largely agreed on the relative difficulties of each kind of variation: rotation in depth is by far the hardest transformation to handle, followed by scale, then rotation in plane, and finally position (much easier). This suggests that DCNNs would be reasonable models of human feed-forward vision. In addition, our results show that the variation levels in rotation in depth and scale strongly modulate both humans' and DCNNs' recognition performances. We thus argue that these variations should be controlled in the image datasets used in vision research.
format	Online Article Text
id	pubmed-5015476
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-5015476 2016-09-16 Front Comput Neurosci Neuroscience Frontiers Media S.A. 2016-08-31 /pmc/articles/PMC5015476/ /pubmed/27642281 http://dx.doi.org/10.3389/fncom.2016.00092 Text en Copyright © 2016 Kheradpisheh, Ghodrati, Ganjtabesh and Masquelier. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Neuroscience Kheradpisheh, Saeed R. Ghodrati, Masoud Ganjtabesh, Mohammad Masquelier, Timothée Humans and Deep Networks Largely Agree on Which Kinds of Variation Make Object Recognition Harder
title	Humans and Deep Networks Largely Agree on Which Kinds of Variation Make Object Recognition Harder
title_sort	humans and deep networks largely agree on which kinds of variation make object recognition harder
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5015476/ https://www.ncbi.nlm.nih.gov/pubmed/27642281 http://dx.doi.org/10.3389/fncom.2016.00092
work_keys_str_mv	AT kheradpishehsaeedr humansanddeepnetworkslargelyagreeonwhichkindsofvariationmakeobjectrecognitionharder AT ghodratimasoud humansanddeepnetworkslargelyagreeonwhichkindsofvariationmakeobjectrecognitionharder AT ganjtabeshmohammad humansanddeepnetworkslargelyagreeonwhichkindsofvariationmakeobjectrecognitionharder AT masqueliertimothee humansanddeepnetworkslargelyagreeonwhichkindsofvariationmakeobjectrecognitionharder
