
Grounding human-object interaction to affordance behavior in multimodal datasets



Bibliographic Details
Main authors:	Henlein, Alexander; Gopinath, Anju; Krishnaswamy, Nikhil; Mehler, Alexander; Pustejovsky, James
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A., 2023
Subjects:	Artificial Intelligence
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9923013/ https://www.ncbi.nlm.nih.gov/pubmed/36793938 http://dx.doi.org/10.3389/frai.2023.1084740

author	Henlein, Alexander; Gopinath, Anju; Krishnaswamy, Nikhil; Mehler, Alexander; Pustejovsky, James
collection	PubMed
description	While affordance detection and Human-Object Interaction (HOI) detection tasks are related, the theoretical foundation of affordances makes it clear that the two are distinct. In particular, affordance researchers distinguish between J. J. Gibson's traditional definition of an affordance, “the action possibilities” of the object within the environment, and the definition of a telic affordance, or one defined by conventionalized purpose or use. We augment the HICO-DET dataset with annotations for Gibsonian and telic affordances, and augment a subset of the dataset with annotations for the orientation of the humans and objects involved. We then train an adapted HOI model and evaluate a pre-trained viewpoint estimation system on this augmented dataset. Our model, AffordanceUPT, is based on a two-stage adaptation of the Unary-Pairwise Transformer (UPT), which we modularize to make affordance detection independent of object detection. Our approach exhibits generalization to new objects and actions, can effectively make the Gibsonian/telic distinction, and shows that this distinction is correlated with features in the data that are not captured by the HOI annotations of the HICO-DET dataset.
format	Online Article Text
id	pubmed-9923013
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9923013, 2023-02-14. Grounding human-object interaction to affordance behavior in multimodal datasets. Henlein, Alexander; Gopinath, Anju; Krishnaswamy, Nikhil; Mehler, Alexander; Pustejovsky, James. Front Artif Intell. Artificial Intelligence. Frontiers Media S.A., 2023-01-30. /pmc/articles/PMC9923013/ /pubmed/36793938 http://dx.doi.org/10.3389/frai.2023.1084740. Text, en. Copyright © 2023 Henlein, Gopinath, Krishnaswamy, Mehler and Pustejovsky. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Grounding human-object interaction to affordance behavior in multimodal datasets
topic	Artificial Intelligence
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9923013/ https://www.ncbi.nlm.nih.gov/pubmed/36793938 http://dx.doi.org/10.3389/frai.2023.1084740

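The description field distinguishes Gibsonian affordances (what an object makes possible in its environment) from telic affordances (its conventionalized purpose or use), added as a layer on top of HICO-DET human-object pairs, with orientation labels on a subset. The sketch below shows one way such an augmented record could be represented and queried. It is an illustration under stated assumptions, not the released annotation format and not the AffordanceUPT model: the class and field names, the (x1, y1, x2, y2) box convention, the string-valued orientation fields, the verb strings, and the chair example labels are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


class AffordanceType(Enum):
    GIBSONIAN = "gibsonian"  # an action possibility the object offers in its environment
    TELIC = "telic"          # the object's conventionalized purpose or use


@dataclass(frozen=True)
class AffordanceAnnotation:
    """Hypothetical record: one HICO-DET-style human-object pair plus an affordance label.

    Field names and types are illustrative assumptions, not the dataset's format.
    """
    image_id: str
    human_box: Box
    object_box: Box
    object_class: str
    verb: str
    affordance: AffordanceType
    # Hypothetical: only set for the subset that also carries orientation labels.
    human_orientation: Optional[str] = None
    object_orientation: Optional[str] = None


def affordance_counts(annotations: Iterable[AffordanceAnnotation]) -> Counter:
    """Tally how many pairs carry each affordance type."""
    return Counter(a.affordance for a in annotations)


def orientation_subset(annotations: Iterable[AffordanceAnnotation]) -> List[AffordanceAnnotation]:
    """Keep only pairs where both the human and the object have an orientation label."""
    return [a for a in annotations
            if a.human_orientation is not None and a.object_orientation is not None]


# Illustrative data only, not taken from the dataset: sitting on a chair matches its
# conventional use (telic); standing on it is possible but not its purpose (Gibsonian).
examples = [
    AffordanceAnnotation("img_0001", (10, 20, 110, 220), (30, 120, 130, 240),
                         "chair", "sit_on", AffordanceType.TELIC,
                         human_orientation="front", object_orientation="front"),
    AffordanceAnnotation("img_0002", (50, 0, 150, 200), (40, 150, 160, 260),
                         "chair", "stand_on", AffordanceType.GIBSONIAN),
]

if __name__ == "__main__":
    print(affordance_counts(examples))
    print(len(orientation_subset(examples)))
```

Keeping the affordance label as a field on the same record as the human-object pair mirrors the description's point that the Gibsonian/telic distinction is annotated alongside, and separately from, the HOI verb.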