Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing

Natural language provides an intuitive and effective interaction interface between human beings and robots. Currently, multiple approaches are presented to address natural language visual grounding for human-robot interaction. However, most of the existing approaches handle the ambiguity of natural...

Bibliographic Details
Main authors: Mi, Jinpeng; Lyu, Jianzhi; Tang, Song; Li, Qingdu; Zhang, Jianwei
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Neuroscience
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7331387/
https://www.ncbi.nlm.nih.gov/pubmed/32670046
http://dx.doi.org/10.3389/fnbot.2020.00043
author Mi, Jinpeng
Lyu, Jianzhi
Tang, Song
Li, Qingdu
Zhang, Jianwei
author_sort Mi, Jinpeng
collection PubMed
description Natural language provides an intuitive and effective interaction interface between human beings and robots. Currently, multiple approaches are presented to address natural language visual grounding for human-robot interaction. However, most of the existing approaches handle the ambiguity of natural language queries and achieve target objects grounding via dialogue systems, which make the interactions cumbersome and time-consuming. In contrast, we address interactive natural language grounding without auxiliary information. Specifically, we first propose a referring expression comprehension network to ground natural referring expressions. The referring expression comprehension network excavates the visual semantics via a visual semantic-aware network, and exploits the rich linguistic contexts in expressions by a language attention network. Furthermore, we combine the referring expression comprehension network with scene graph parsing to achieve unrestricted and complicated natural language grounding. Finally, we validate the performance of the referring expression comprehension network on three public datasets, and we also evaluate the effectiveness of the interactive natural language grounding architecture by conducting extensive natural language query groundings in different household scenarios.
format Online
Article
Text
id pubmed-7331387
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-73313872020-07-14 Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing Mi, Jinpeng Lyu, Jianzhi Tang, Song Li, Qingdu Zhang, Jianwei Front Neurorobot Neuroscience Natural language provides an intuitive and effective interaction interface between human beings and robots. Currently, multiple approaches are presented to address natural language visual grounding for human-robot interaction. However, most of the existing approaches handle the ambiguity of natural language queries and achieve target objects grounding via dialogue systems, which make the interactions cumbersome and time-consuming. In contrast, we address interactive natural language grounding without auxiliary information. Specifically, we first propose a referring expression comprehension network to ground natural referring expressions. The referring expression comprehension network excavates the visual semantics via a visual semantic-aware network, and exploits the rich linguistic contexts in expressions by a language attention network. Furthermore, we combine the referring expression comprehension network with scene graph parsing to achieve unrestricted and complicated natural language grounding. Finally, we validate the performance of the referring expression comprehension network on three public datasets, and we also evaluate the effectiveness of the interactive natural language grounding architecture by conducting extensive natural language query groundings in different household scenarios. Frontiers Media S.A. 2020-06-25 /pmc/articles/PMC7331387/ /pubmed/32670046 http://dx.doi.org/10.3389/fnbot.2020.00043 Text en Copyright © 2020 Mi, Lyu, Tang, Li and Zhang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7331387/
https://www.ncbi.nlm.nih.gov/pubmed/32670046
http://dx.doi.org/10.3389/fnbot.2020.00043