Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera
Controlling robots by natural language (NL) is attracting increasing attention for its versatility, convenience, and the fact that it requires no extensive training for users. Grounding, which enables robots to understand NL instructions from humans, is a crucial challenge of this problem. This paper mainly explores the object grounding problem and concretely studies how to detect target objects from NL instructions using an RGB-D camera in robotic manipulation applications. …
Main Authors: | Bao, Jiatong; Jia, Yunyi; Cheng, Yu; Tang, Hongru; Xi, Ning |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2016 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5191097/ https://www.ncbi.nlm.nih.gov/pubmed/27983604 http://dx.doi.org/10.3390/s16122117 |
_version_ | 1782487556051435520 |
---|---|
author | Bao, Jiatong; Jia, Yunyi; Cheng, Yu; Tang, Hongru; Xi, Ning |
author_facet | Bao, Jiatong; Jia, Yunyi; Cheng, Yu; Tang, Hongru; Xi, Ning |
author_sort | Bao, Jiatong |
collection | PubMed |
description | Controlling robots by natural language (NL) is attracting increasing attention for its versatility, convenience, and the fact that it requires no extensive training for users. Grounding, which enables robots to understand NL instructions from humans, is a crucial challenge of this problem. This paper mainly explores the object grounding problem and concretely studies how to detect target objects from NL instructions using an RGB-D camera in robotic manipulation applications. In particular, a simple yet robust vision algorithm is applied to segment objects of interest. With the metric information of all segmented objects, the object attributes and the relations between objects are further extracted. The NL instructions, which incorporate multiple cues for object specification, are parsed into domain-specific annotations. The annotations from NL and the information extracted from the RGB-D camera are matched in a computational state estimation framework that searches all possible object grounding states. The final grounding is accomplished by selecting the states with the maximum probabilities. An RGB-D scene dataset, associated with different groups of NL instructions based on different cognition levels of the robot, is collected. Quantitative evaluations on the dataset illustrate the advantages of the proposed method. Experiments on NL-controlled object manipulation and NL-based task programming with a mobile manipulator show the method's effectiveness and practicability in robotic applications. (An illustrative sketch of the maximum-probability selection step is given after this record.) |
format | Online Article Text |
id | pubmed-5191097 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-5191097 2017-01-03 Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera Bao, Jiatong; Jia, Yunyi; Cheng, Yu; Tang, Hongru; Xi, Ning Sensors (Basel) Article Controlling robots by natural language (NL) is attracting increasing attention for its versatility, convenience, and the fact that it requires no extensive training for users. Grounding, which enables robots to understand NL instructions from humans, is a crucial challenge of this problem. This paper mainly explores the object grounding problem and concretely studies how to detect target objects from NL instructions using an RGB-D camera in robotic manipulation applications. In particular, a simple yet robust vision algorithm is applied to segment objects of interest. With the metric information of all segmented objects, the object attributes and the relations between objects are further extracted. The NL instructions, which incorporate multiple cues for object specification, are parsed into domain-specific annotations. The annotations from NL and the information extracted from the RGB-D camera are matched in a computational state estimation framework that searches all possible object grounding states. The final grounding is accomplished by selecting the states with the maximum probabilities. An RGB-D scene dataset, associated with different groups of NL instructions based on different cognition levels of the robot, is collected. Quantitative evaluations on the dataset illustrate the advantages of the proposed method. Experiments on NL-controlled object manipulation and NL-based task programming with a mobile manipulator show the method's effectiveness and practicability in robotic applications. MDPI 2016-12-13 /pmc/articles/PMC5191097/ /pubmed/27983604 http://dx.doi.org/10.3390/s16122117 Text en © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Bao, Jiatong Jia, Yunyi Cheng, Yu Tang, Hongru Xi, Ning Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera |
title | Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera |
title_full | Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera |
title_fullStr | Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera |
title_full_unstemmed | Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera |
title_short | Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera |
title_sort | detecting target objects by natural language instructions using an rgb-d camera |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5191097/ https://www.ncbi.nlm.nih.gov/pubmed/27983604 http://dx.doi.org/10.3390/s16122117 |
work_keys_str_mv | AT baojiatong detectingtargetobjectsbynaturallanguageinstructionsusinganrgbdcamera AT jiayunyi detectingtargetobjectsbynaturallanguageinstructionsusinganrgbdcamera AT chengyu detectingtargetobjectsbynaturallanguageinstructionsusinganrgbdcamera AT tanghongru detectingtargetobjectsbynaturallanguageinstructionsusinganrgbdcamera AT xining detectingtargetobjectsbynaturallanguageinstructionsusinganrgbdcamera |
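
The description above outlines a grounding pipeline in which object attributes and inter-object relations extracted from an RGB-D camera are matched against annotations parsed from the NL instruction, and the grounding state with the maximum probability is selected. The Python sketch below illustrates only that final selection idea under simplified assumptions; it is not the authors' implementation, and the `ObjectCandidate`/`Annotation` types and the `attribute_score`/`relation_score` functions are hypothetical placeholders with toy scoring rules.

```python
# Minimal illustrative sketch of maximum-probability grounding selection.
# All names and scoring rules here are hypothetical, not the paper's method.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObjectCandidate:
    """A segmented object with attributes and a metric position from the RGB-D camera."""
    object_id: int
    attributes: Dict[str, str]             # e.g., {"color": "red", "shape": "box"}
    position: Tuple[float, float, float]   # metric (x, y, z) position


@dataclass
class Annotation:
    """Domain-specific annotation parsed from an NL instruction (hypothetical schema)."""
    attributes: Dict[str, str]                                       # attributes the target should have
    relations: List[Tuple[str, int]] = field(default_factory=list)   # e.g., [("left_of", reference_id)]


def attribute_score(obj: ObjectCandidate, ann: Annotation) -> float:
    """Fraction of requested attributes the object satisfies (toy stand-in for a likelihood)."""
    if not ann.attributes:
        return 1.0
    hits = sum(1 for k, v in ann.attributes.items() if obj.attributes.get(k) == v)
    return hits / len(ann.attributes)


def relation_score(obj: ObjectCandidate, ann: Annotation,
                   objects_by_id: Dict[int, ObjectCandidate]) -> float:
    """Toy spatial-relation check; only 'left_of' along the x axis is modeled here."""
    score = 1.0
    for relation, reference_id in ann.relations:
        reference = objects_by_id[reference_id]
        if relation == "left_of":
            score *= 1.0 if obj.position[0] < reference.position[0] else 0.1
    return score


def ground(ann: Annotation, candidates: List[ObjectCandidate]) -> Tuple[float, ObjectCandidate]:
    """Score every candidate grounding and return the one with the maximum (unnormalized) probability."""
    objects_by_id = {o.object_id: o for o in candidates}
    scored = [(attribute_score(o, ann) * relation_score(o, ann, objects_by_id), o)
              for o in candidates]
    return max(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Two segmented objects and an instruction such as "pick up the red box left of the blue box".
    objects = [
        ObjectCandidate(0, {"color": "red", "shape": "box"}, (0.2, 0.5, 0.0)),
        ObjectCandidate(1, {"color": "blue", "shape": "box"}, (0.6, 0.5, 0.0)),
    ]
    annotation = Annotation(attributes={"color": "red"}, relations=[("left_of", 1)])
    probability, target = ground(annotation, objects)
    print(target.object_id, probability)   # -> 0 1.0
```

In the paper, this matching is carried out within a computational state estimation framework over all possible grounding states rather than by such heuristic rules; the sketch only conveys the argmax selection step.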