
Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insuff...


Bibliographic Details
Main authors: Lee, Byeongwook; Cho, Kwang-Hyun
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5120313/
https://www.ncbi.nlm.nih.gov/pubmed/27876875
http://dx.doi.org/10.1038/srep37647
author Lee, Byeongwook
Cho, Kwang-Hyun
collection PubMed
description Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been obtained, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test.
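The idea in the description, using the instantaneous phase of the speech envelope as a temporal reference for segment boundaries, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the FFT-based brick-wall filter, the 4 to 8 Hz band, and the choice to cut at phase wraps are all assumptions made for illustration, and only NumPy is used.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real 1-D array via the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0  # keep the DC bin once
    if n % 2 == 0:
        weights[n // 2] = 1.0      # keep the Nyquist bin once
        weights[1:n // 2] = 2.0    # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def band_limit(x, fs, low_hz, high_hz):
    """Crude brick-wall band-pass: zero FFT bins outside [low_hz, high_hz]."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def phase_boundaries(signal, fs, low_hz=4.0, high_hz=8.0):
    """Sample indices where the envelope phase wraps from +pi to -pi.

    The 4-8 Hz default band is an assumption chosen for this sketch,
    not a value taken from the record above.
    """
    envelope = np.abs(analytic_signal(signal))        # amplitude envelope
    slow = band_limit(envelope, fs, low_hz, high_hz)  # one timescale of the envelope
    phase = np.angle(analytic_signal(slow))           # instantaneous phase in (-pi, pi]
    return np.where(np.diff(phase) < -np.pi)[0] + 1   # one wrap per envelope cycle


if __name__ == "__main__":
    fs = 1000  # hypothetical sampling rate in Hz
    t = np.arange(2 * fs) / fs
    # Synthetic stand-in for speech: a 200 Hz carrier modulated at 5 Hz.
    demo = (1.0 + 0.8 * np.cos(2 * np.pi * 5 * t)) * np.cos(2 * np.pi * 200 * t)
    cuts = phase_boundaries(demo, fs)
    segments = np.split(demo, cuts)  # variable-length frames instead of fixed ones
    print(len(cuts), [len(s) for s in segments])
```

Unlike a fixed frame size, the segment lengths here follow the envelope rhythm of the input. Repeating the call with other bands would give boundaries at other timescales, which is one way to read the "multi-timescale" wording of the description.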
format Online
Article
Text
id pubmed-5120313
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-5120313 2016-11-28 Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference. Lee, Byeongwook; Cho, Kwang-Hyun. Sci Rep, Article. Nature Publishing Group 2016-11-23 /pmc/articles/PMC5120313/ /pubmed/27876875 http://dx.doi.org/10.1038/srep37647 Text en Copyright © 2016, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference
topic Article