Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control

Assistive robots are tools that people living with upper body disabilities can leverage to autonomously perform Activities of Daily Living (ADL). Unfortunately, conventional control methods still rely on low-dimensional, easy-to-implement interfaces such as joysticks that tend to be unintuitive and cumbersome to use. In contrast, vocal commands may represent a viable and intuitive alternative. This work represents an important step toward providing a viable vocal interface for people living with upper limb disabilities by proposing a novel lightweight vocal command recognition system. The proposed model leverages the MobileNetV2 architecture, augmenting it with a novel approach to the self-attention mechanism, achieving a new state-of-the-art performance for Keyword Spotting (KWS) on the Google Speech Commands Dataset (GSCD). Moreover, this work presents a new dataset, referred to as the French Speech Commands Dataset (FSCD), comprising 4963 vocal command utterances. Using the GSCD as the source, we used Transfer Learning (TL) to adapt the model to this cross-language task. TL has been shown to significantly improve the model performance on the FSCD. The viability of the proposed approach is further demonstrated through real-life control of a robotic arm by four healthy participants using both the proposed vocal interface and a joystick.
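The abstract's central technical idea, augmenting a convolutional backbone with self-attention for keyword spotting, can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the authors' architecture: the paper's "novel approach to the self-attention mechanism" is not reproduced here, the weights are random placeholders, and the frame/feature dimensions are arbitrary. The sketch only shows plain scaled dot-product self-attention applied to a sequence of acoustic feature frames such as a MobileNetV2-style extractor might produce.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(frames, d_k=32, seed=0):
    """Plain scaled dot-product self-attention over a (T, d) sequence of
    acoustic feature frames. Projection weights are random placeholders
    standing in for learned parameters."""
    rng = np.random.default_rng(seed)
    T, d = frames.shape
    W_q = rng.standard_normal((d, d_k)) / np.sqrt(d)  # query projection
    W_k = rng.standard_normal((d, d_k)) / np.sqrt(d)  # key projection
    W_v = rng.standard_normal((d, d_k)) / np.sqrt(d)  # value projection
    Q, K, V = frames @ W_q, frames @ W_k, frames @ W_v
    scores = Q @ K.T / np.sqrt(d_k)   # (T, T) frame-to-frame affinities
    attn = softmax(scores, axis=-1)   # each row is a distribution over frames
    return attn @ V, attn             # contextualized frames, attention map

# Example: 49 frames of 64-dim features, roughly what a 1 s clip might yield.
feats = np.random.default_rng(1).standard_normal((49, 64))
context, attn = self_attention(feats)
```

For keyword spotting, the contextualized frames would typically be pooled over time and fed to a small classifier head; the attention map lets every frame weigh evidence from the whole utterance rather than a fixed receptive field.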

Bibliographic Details
Main Authors: Poirier, Samuel; Côté-Allard, Ulysse; Routhier, François; Campeau-Lecours, Alexandre
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10347238/
https://www.ncbi.nlm.nih.gov/pubmed/37447906
http://dx.doi.org/10.3390/s23136056
id: pubmed-10347238
collection: PubMed
institution: National Center for Biotechnology Information
record format: MEDLINE/PubMed
journal: Sensors (Basel)
topic: Article
published online: 2023-06-30
license: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).