
Sequence-to-Sequence Voice Reconstruction for Silent Speech in a Tonal Language

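Silent speech decoding (SSD), based on articulatory neuromuscular activities, has become a prevalent task of brain–computer interfaces (BCIs) in recent years. Many works have been devoted to decoding surface electromyography (sEMG) from articulatory neuromuscular activities. However, restoring silent speech in tonal languages such as Mandarin Chinese is still difficult. This paper proposes an optimized sequence-to-sequence (Seq2Seq) approach to synthesize voice from the sEMG-based silent speech. We extract duration information to regulate the sEMG-based silent speech using the audio length. Then, we provide a deep-learning model with an encoder–decoder structure and a state-of-the-art vocoder to generate the audio waveform. Experiments based on six Mandarin Chinese speakers demonstrate that the proposed model can successfully decode silent speech in Mandarin Chinese and achieve a character error rate (CER) of 6.41% on average with human evaluation.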
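The record's description reports an average character error rate (CER) of 6.41% under human evaluation. The record does not state how the listeners' transcriptions were scored, so the following is only a minimal sketch of the conventional CER definition, (substitutions + deletions + insertions) / number of reference characters, computed with a Levenshtein edit distance. The function names and the sample strings are illustrative assumptions, not taken from the paper.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted per character."""
    # prev[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]  # distance from reference[:i] to an empty hypothesis
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length (assumed standard definition)."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: a listener writes one wrong character out of six.
reference_text = "今天天气很好"
listener_text = "今天天汽很好"
print(f"CER = {character_error_rate(reference_text, listener_text):.2%}")
```

Because Mandarin text is scored per character rather than per word, a single misheard syllable changes exactly one position in the reference, which is why CER is a natural metric for this task.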


Bibliographic Details
Main authors: Li, Huiyan, Lin, Haohong, Wang, You, Wang, Hengyang, Zhang, Ming, Gao, Han, Ai, Qing, Luo, Zhiyuan, Li, Guang
Format: Online Article Text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9312762/
https://www.ncbi.nlm.nih.gov/pubmed/35884626
http://dx.doi.org/10.3390/brainsci12070818
author Li, Huiyan
Lin, Haohong
Wang, You
Wang, Hengyang
Zhang, Ming
Gao, Han
Ai, Qing
Luo, Zhiyuan
Li, Guang
collection PubMed
description Silent speech decoding (SSD), based on articulatory neuromuscular activities, has become a prevalent task of brain–computer interfaces (BCIs) in recent years. Many works have been devoted to decoding surface electromyography (sEMG) from articulatory neuromuscular activities. However, restoring silent speech in tonal languages such as Mandarin Chinese is still difficult. This paper proposes an optimized sequence-to-sequence (Seq2Seq) approach to synthesize voice from the sEMG-based silent speech. We extract duration information to regulate the sEMG-based silent speech using the audio length. Then, we provide a deep-learning model with an encoder–decoder structure and a state-of-the-art vocoder to generate the audio waveform. Experiments based on six Mandarin Chinese speakers demonstrate that the proposed model can successfully decode silent speech in Mandarin Chinese and achieve a character error rate (CER) of 6.41% on average with human evaluation.
format Online
Article
Text
id pubmed-9312762
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9312762 2022-07-26
Sequence-to-Sequence Voice Reconstruction for Silent Speech in a Tonal Language
Li, Huiyan; Lin, Haohong; Wang, You; Wang, Hengyang; Zhang, Ming; Gao, Han; Ai, Qing; Luo, Zhiyuan; Li, Guang
Brain Sci Article
MDPI 2022-06-23
/pmc/articles/PMC9312762/ /pubmed/35884626 http://dx.doi.org/10.3390/brainsci12070818
Text en
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Sequence-to-Sequence Voice Reconstruction for Silent Speech in a Tonal Language
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9312762/
https://www.ncbi.nlm.nih.gov/pubmed/35884626
http://dx.doi.org/10.3390/brainsci12070818