
DeepGaze III: Modeling free-viewing human scanpaths with deep learning

Humans typically move their eyes in “scanpaths” of fixations linked by saccades. Here we present DeepGaze III, a new model that predicts the spatial location of consecutive fixations in a free-viewing scanpath over static images. DeepGaze III is a deep learning–based model that combines image inform...

Bibliographic Details
Main authors: Kümmerer, Matthias; Bethge, Matthias; Wallis, Thomas S. A.
Format: Online Article Text
Language: English
Published: The Association for Research in Vision and Ophthalmology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9055565/
https://www.ncbi.nlm.nih.gov/pubmed/35472130
http://dx.doi.org/10.1167/jov.22.5.7
collection PubMed
description Humans typically move their eyes in “scanpaths” of fixations linked by saccades. Here we present DeepGaze III, a new model that predicts the spatial location of consecutive fixations in a free-viewing scanpath over static images. DeepGaze III is a deep learning–based model that combines image information with information about the previous fixation history to predict where a participant might fixate next. As a high-capacity and flexible model, DeepGaze III captures many relevant patterns in the human scanpath data, setting a new state of the art in the MIT300 dataset and thereby providing insight into how much information in scanpaths across observers exists in the first place. We use this insight to assess the importance of mechanisms implemented in simpler, interpretable models for fixation selection. Due to its architecture, DeepGaze III allows us to disentangle several factors that play an important role in fixation selection, such as the interplay of scene content and scanpath history. The modular nature of DeepGaze III allows us to conduct ablation studies, which show that scene content has a stronger effect on fixation selection than previous scanpath history in our main dataset. In addition, we can use the model to identify scenes for which the relative importance of these sources of information differs most. These data-driven insights would be difficult to accomplish with simpler models that do not have the computational capacity to capture such patterns, demonstrating an example of how deep learning advances can be used to contribute to scientific understanding.
id pubmed-9055565
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9055565 2022-05-01 DeepGaze III: Modeling free-viewing human scanpaths with deep learning. Kümmerer, Matthias; Bethge, Matthias; Wallis, Thomas S. A. J Vis. Article. The Association for Research in Vision and Ophthalmology 2022-04-26 /pmc/articles/PMC9055565/ /pubmed/35472130 http://dx.doi.org/10.1167/jov.22.5.7 Text en Copyright 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
topic Article