
Constrained sampling from deep generative image models reveals mechanisms of human target detection

The first steps of visual processing are often described as a bank of oriented filters followed by divisive normalization. This approach has been tremendously successful at predicting contrast thresholds in simple visual displays. However, it is unclear to what extent this kind of architecture also supports processing in more complex visual tasks performed in natural-looking images. We used a deep generative image model to embed arc segments with different curvatures in naturalistic images. These images contain the target as part of the image scene, resulting in considerable appearance variation of both target and background. Three observers localized arc targets in these images, with an average accuracy of 74.7%. Data were fit by several biologically inspired models, four standard deep convolutional neural networks (CNNs), and a five-layer CNN specifically trained for this task. Four models predicted observer responses particularly well: (1) a bank of oriented filters, similar to complex cells in primate area V1; (2) a bank of oriented filters followed by tuned gain control, incorporating knowledge about cortical surround interactions; (3) a bank of oriented filters followed by local normalization; and (4) the five-layer CNN. A control experiment with optimized stimuli based on these four models showed that the observers’ data were best explained by model (2) with tuned gain control. These data suggest that standard models of early vision provide good descriptions of performance in much more complex tasks than those they were designed for, while general-purpose nonlinear models such as convolutional neural networks do not.
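
For readers unfamiliar with the model class named in the abstract, the following is a minimal sketch of a bank of oriented (Gabor) filters followed by divisive normalization. It is an illustration only, not the paper's implementation; the filter parameters, the number of orientations, and the normalization constant `sigma` are assumed values chosen for the example.

```python
# Minimal sketch of "a bank of oriented filters followed by divisive
# normalization" as described in the abstract.  NOT the paper's code:
# all parameter values below are illustrative assumptions.
import numpy as np
from scipy.signal import convolve2d


def gabor(size=15, wavelength=6.0, theta=0.0, sigma_env=3.0, phase=0.0):
    """Return one oriented Gabor filter (cosine carrier in a Gaussian envelope)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # coordinate along the carrier
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma_env ** 2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
    return envelope * carrier


def orientation_energy(image, theta):
    """Complex-cell-like energy: squared responses of a quadrature Gabor pair."""
    even = convolve2d(image, gabor(theta=theta, phase=0.0), mode="same")
    odd = convolve2d(image, gabor(theta=theta, phase=np.pi / 2), mode="same")
    return even ** 2 + odd ** 2


def divisive_normalization(energies, sigma=0.1):
    """Divide each orientation channel by the energy pooled over all channels."""
    pooled = sigma ** 2 + energies.sum(axis=0, keepdims=True)
    return energies / pooled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((64, 64))               # stand-in for a naturalistic image
    thetas = np.linspace(0.0, np.pi, 4, endpoint=False)  # 4 orientations, assumed
    energies = np.stack([orientation_energy(image, t) for t in thetas])
    responses = divisive_normalization(energies)
    print(responses.shape)  # (4, 64, 64): one normalized map per orientation
```

Roughly speaking, models (2) and (3) in the abstract differ in how the normalization pool in `divisive_normalization` is constructed: tuned gain control weights the pool over orientation and surrounding space, whereas local normalization pools only locally.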

Bibliographic Details
Main Author: Fruend, Ingo
Format: Online Article Text
Language: English
Published: The Association for Research in Vision and Ophthalmology, 2020
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7424951/
https://www.ncbi.nlm.nih.gov/pubmed/32729908
http://dx.doi.org/10.1167/jov.20.7.32
ID: pubmed-7424951
Collection: PubMed
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: J Vis
Published Online: 2020-07-30
Rights: Copyright 2020 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/).