
Speech Recognition for the iCub Platform

This paper describes open source software (available at https://github.com/robotology/natural-speech) to build automatic speech recognition (ASR) systems and run them within the YARP platform. The toolkit is designed (i) to allow non-ASR experts to easily create their own ASR system and run it on iCub and (ii) to build deep learning-based models specifically addressing the main challenges an ASR system faces in the context of verbal human–iCub interactions. The toolkit mostly consists of Python, C++ code and shell scripts integrated in YARP. As additional contribution, a second codebase (written in Matlab) is provided for more expert ASR users who want to experiment with bio-inspired and developmental learning-inspired ASR systems. Specifically, we provide code for two distinct kinds of speech recognition: “articulatory” and “unsupervised” speech recognition. The first is largely inspired by influential neurobiological theories of speech perception which assume speech perception to be mediated by brain motor cortex activities. Our articulatory systems have been shown to outperform strong deep learning-based baselines. The second type of recognition systems, the “unsupervised” systems, do not use any supervised information (contrary to most ASR systems, including our articulatory systems). To some extent, they mimic an infant who has to discover the basic speech units of a language by herself. In addition, we provide resources consisting of pre-trained deep learning models for ASR, and a 2.5-h speech dataset of spoken commands, the VoCub dataset, which can be used to adapt an ASR system to the typical acoustic environments in which iCub operates.
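The "unsupervised" systems mentioned in the abstract discover the basic speech units of a language without transcriptions. As a purely didactic sketch (not the method from the paper), clustering unlabeled feature vectors with k-means conveys the core idea: recurring acoustic patterns can be grouped into unit-like categories using no supervised information at all. All names and the synthetic data below are illustrative.

```python
# Toy illustration of unsupervised unit discovery: group unlabeled
# acoustic-like feature vectors into recurring "units" with k-means.
# Didactic sketch only; the paper's systems are far more sophisticated.
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Cluster 2-D points into k groups; return (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids, labels


# Synthetic "features": two well-separated groups standing in for
# two recurring speech units in unlabeled audio.
rng = random.Random(1)
unit_a = [(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(30)]
unit_b = [(rng.gauss(3.0, 0.1), rng.gauss(3.0, 0.1)) for _ in range(30)]
centroids, labels = kmeans(unit_a + unit_b, k=2, seed=0)
```

With no labels provided, the algorithm recovers two categories that align with the two generating groups; real unsupervised ASR faces the much harder problem of doing this on variable-length, speaker-dependent acoustic sequences.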


Bibliographic Details
Main Authors: Higy, Bertrand; Mereta, Alessio; Metta, Giorgio; Badino, Leonardo
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2018
Subjects: Robotics and AI
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805979/
https://www.ncbi.nlm.nih.gov/pubmed/33500897
http://dx.doi.org/10.3389/frobt.2018.00010
Journal: Front Robot AI, Robotics and AI section. Published online: 2018-02-12.
Copyright © 2018 Higy, Mereta, Metta and Badino. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY; http://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.