Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control
Main Authors: Poirier, Samuel; Côté-Allard, Ulysse; Routhier, François; Campeau-Lecours, Alexandre
Format: Online Article Text
Language: English
Published: MDPI, 2023
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10347238/ https://www.ncbi.nlm.nih.gov/pubmed/37447906 http://dx.doi.org/10.3390/s23136056
_version_ | 1785073502986960896 |
author | Poirier, Samuel; Côté-Allard, Ulysse; Routhier, François; Campeau-Lecours, Alexandre
author_facet | Poirier, Samuel; Côté-Allard, Ulysse; Routhier, François; Campeau-Lecours, Alexandre
author_sort | Poirier, Samuel |
collection | PubMed |
description | Assistive robots are tools that people living with upper body disabilities can leverage to autonomously perform Activities of Daily Living (ADL). Unfortunately, conventional control methods still rely on low-dimensional, easy-to-implement interfaces such as joysticks, which tend to be unintuitive and cumbersome to use. In contrast, vocal commands may represent a viable and intuitive alternative. This work represents an important step toward providing a viable vocal interface for people living with upper limb disabilities by proposing a novel lightweight vocal command recognition system. The proposed model leverages the MobileNetV2 architecture, augmenting it with a novel approach to the self-attention mechanism and achieving new state-of-the-art performance for Keyword Spotting (KWS) on the Google Speech Commands Dataset (GSCD). Moreover, this work presents a new dataset, referred to as the French Speech Commands Dataset (FSCD), comprising 4963 vocal command utterances. Using the GSCD as the source domain, we applied Transfer Learning (TL) to adapt the model to this cross-language task; TL significantly improved the model's performance on the FSCD. The viability of the proposed approach is further demonstrated through real-life control of a robotic arm by four healthy participants using both the proposed vocal interface and a joystick. |
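The abstract above describes a convolutional backbone (MobileNetV2-style) augmented with a self-attention mechanism for keyword spotting. The record does not specify the paper's exact attention variant, so the following is only a minimal sketch of generic single-head scaled dot-product self-attention applied over a sequence of per-frame audio features; the dimensions and the projection matrices `Wq`, `Wk`, `Wv` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (T, d) sequence of per-frame features, e.g. the time axis of a
    conv backbone's output. Returns a (T, d_v) re-weighted sequence in
    which each frame attends to every other frame.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (T, T) attention weights
    return A @ V

# Toy example: 10 frames of 16-dimensional features.
rng = np.random.default_rng(0)
T, d = 10, 16
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Y = self_attention(X, Wq, Wk, Wv)
print(Y.shape)  # (10, 16)
```

In a KWS pipeline such as the one the abstract outlines, the attended sequence would typically be pooled and passed to a small classifier over the command vocabulary; the cross-language transfer step would then reuse these learned weights as initialization when fine-tuning on the smaller French dataset.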
format | Online Article Text |
id | pubmed-10347238 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-103472382023-07-15 Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control Poirier, Samuel Côté-Allard, Ulysse Routhier, François Campeau-Lecours, Alexandre Sensors (Basel) Article Assistive robots are tools that people living with upper body disabilities can leverage to autonomously perform Activities of Daily Living (ADL). Unfortunately, conventional control methods still rely on low-dimensional, easy-to-implement interfaces such as joysticks that tend to be unintuitive and cumbersome to use. In contrast, vocal commands may represent a viable and intuitive alternative. This work represents an important step toward providing a viable vocal interface for people living with upper limb disabilities by proposing a novel lightweight vocal command recognition system. The proposed model leverages the MobileNetV2 architecture, augmenting it with a novel approach to the self-attention mechanism, achieving a new state-of-the-art performance for Keyword Spotting (KWS) on the Google Speech Commands Dataset (GSCD). Moreover, this work presents a new dataset, referred to as the French Speech Commands Dataset (FSCD), comprising 4963 vocal command utterances. Using the GSCD as the source, we used Transfer Learning (TL) to adapt the model to this cross-language task. TL has been shown to significantly improve the model performance on the FSCD. The viability of the proposed approach is further demonstrated through real-life control of a robotic arm by four healthy participants using both the proposed vocal interface and a joystick. MDPI 2023-06-30 /pmc/articles/PMC10347238/ /pubmed/37447906 http://dx.doi.org/10.3390/s23136056 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Poirier, Samuel Côté-Allard, Ulysse Routhier, François Campeau-Lecours, Alexandre Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control |
title | Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control |
title_full | Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control |
title_fullStr | Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control |
title_full_unstemmed | Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control |
title_short | Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control |
title_sort | efficient self-attention model for speech recognition-based assistive robots control |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10347238/ https://www.ncbi.nlm.nih.gov/pubmed/37447906 http://dx.doi.org/10.3390/s23136056 |
work_keys_str_mv | AT poiriersamuel efficientselfattentionmodelforspeechrecognitionbasedassistiverobotscontrol AT coteallardulysse efficientselfattentionmodelforspeechrecognitionbasedassistiverobotscontrol AT routhierfrancois efficientselfattentionmodelforspeechrecognitionbasedassistiverobotscontrol AT campeaulecoursalexandre efficientselfattentionmodelforspeechrecognitionbasedassistiverobotscontrol |