
Robust Understanding of Robot-Directed Speech Commands Using Sequence to Sequence With Noise Injection

This paper describes a new method that enables a service robot to understand spoken commands in a robust manner using off-the-shelf automatic speech recognition (ASR) systems and an encoder-decoder neural network with noise injection. In numerous instances, the understanding of spoken commands in th...


Bibliographic Details
Main authors:	Tada, Yuuki; Hagiwara, Yoshinobu; Tanaka, Hiroki; Taniguchi, Tadahiro
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2020
Subjects:	Robotics and AI
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805724/ https://www.ncbi.nlm.nih.gov/pubmed/33501159 http://dx.doi.org/10.3389/frobt.2019.00144

collection	PubMed
description	This paper describes a new method that enables a service robot to understand spoken commands robustly using off-the-shelf automatic speech recognition (ASR) systems and an encoder-decoder neural network with noise injection. In service robotics, understanding spoken commands is often modeled as a mapping from speech signals to a sequence of commands that a robot can understand and execute. In a conventional approach, the speech signal is first recognized, and semantic parsing is then applied to infer the command sequence from the utterance. However, if errors occur during speech recognition, conventional semantic parsing cannot be applied appropriately, because most natural language processing methods do not take such errors into account. We propose the use of encoder-decoder neural networks, e.g., sequence to sequence, with noise injection. The noise is injected into phoneme sequences during the training phase of encoder-decoder neural network-based semantic parsing systems. We demonstrate that neural networks trained with noise injection can mitigate the negative effects of speech recognition errors on the understanding of robot-directed speech commands, i.e., they increase the performance of semantic parsing. We implemented the method and evaluated it using commands from a general purpose service robot (GPSR) task, such as the task used in RoboCup@Home, a standard competition for testing service robots. The experimental results show that the proposed method, namely sequence to sequence with noise injection (Seq2Seq-NI), outperforms the baseline methods. In addition, Seq2Seq-NI enables a robot to understand a spoken command even when the transcription produced by an off-the-shelf ASR system contains recognition errors. Moreover, we describe an experiment conducted to evaluate the influence of the injected noise and discuss the results.
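The abstract states that noise is injected into phoneme sequences while training the encoder-decoder parser, but it does not spell out the noise model. The sketch below assumes one plausible model: each phoneme is, with probability `rate`, substituted, deleted, or followed by a spurious inserted phoneme. Only the source phoneme sequence is corrupted; the target command sequence stays clean, so a sequence-to-sequence model trained on these pairs would learn to map error-laden recognitions to correct commands. The names `inject_noise` and `augment`, the `rate` and `copies` parameters, the ARPAbet-style toy phonemes, and the command labels are all hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

# A training pair: (source phoneme sequence, target command sequence).
Pair = Tuple[List[str], List[str]]


def inject_noise(phonemes: Sequence[str], rate: float,
                 inventory: Sequence[str], rng: random.Random) -> List[str]:
    """Return a noisy copy of `phonemes` (hypothetical noise model).

    With probability `rate`, each phoneme undergoes one edit chosen uniformly
    from substitution, deletion, or insertion; otherwise it is kept as is.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    if rate > 0.0 and not inventory:
        raise ValueError("inventory must be non-empty when rate > 0")
    noisy: List[str] = []
    for p in phonemes:
        if rng.random() >= rate:
            noisy.append(p)  # no noise for this phoneme
            continue
        op = rng.choice(("substitute", "delete", "insert"))
        if op == "substitute":
            # Swap in a different phoneme from the inventory, if one exists.
            others = [q for q in inventory if q != p]
            noisy.append(rng.choice(others) if others else p)
        elif op == "insert":
            # Keep the phoneme and add a spurious one after it.
            noisy.extend([p, rng.choice(inventory)])
        # op == "delete": the phoneme is simply dropped.
    return noisy


def augment(pairs: Sequence[Pair], rate: float, inventory: Sequence[str],
            copies: int, seed: int = 0) -> List[Pair]:
    """Build training pairs whose sources are noisy and whose targets stay clean."""
    rng = random.Random(seed)
    out: List[Pair] = []
    for src, tgt in pairs:
        for _ in range(copies):
            out.append((inject_noise(src, rate, inventory, rng), list(tgt)))
    return out


if __name__ == "__main__":
    # Toy example: phonemes for "bring me the coke" and a made-up command sequence.
    inventory = ["B", "R", "IH", "NG", "M", "IY", "DH", "AH", "K", "OW"]
    source = ["B", "R", "IH", "NG", "M", "IY", "DH", "AH", "K", "OW", "K"]
    target = ["bring", "coke", "operator"]
    for noisy, cmd in augment([(source, target)], rate=0.15,
                              inventory=inventory, copies=3, seed=1):
        print(" ".join(noisy), "->", cmd)
```

Generating a fixed number of noisy copies up front is only one option; resampling the noise every epoch is an equally plausible design, and the noise rates and schedule actually used in the experiments are not given in this record.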
id	pubmed-7805724
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-7805724 2021-01-25. Front Robot AI (Robotics and AI), Frontiers Media S.A., 2020-01-14. /pmc/articles/PMC7805724/ /pubmed/33501159 http://dx.doi.org/10.3389/frobt.2019.00144. Text, en. Copyright © 2020 Tada, Hagiwara, Tanaka and Taniguchi. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Robust Understanding of Robot-Directed Speech Commands Using Sequence to Sequence With Noise Injection

