Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference
Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insuff...
Main Authors: | Lee, Byeongwook; Cho, Kwang-Hyun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5120313/ https://www.ncbi.nlm.nih.gov/pubmed/27876875 http://dx.doi.org/10.1038/srep37647 |
_version_ | 1782469216500187136 |
---|---|
author | Lee, Byeongwook Cho, Kwang-Hyun |
author_facet | Lee, Byeongwook Cho, Kwang-Hyun |
author_sort | Lee, Byeongwook |
collection | PubMed |
description | Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been obtained, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test. |
format | Online Article Text |
id | pubmed-5120313 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-5120313 2016-11-28 Lee, Byeongwook Cho, Kwang-Hyun Sci Rep Article Nature Publishing Group 2016-11-23 /pmc/articles/PMC5120313/ /pubmed/27876875 http://dx.doi.org/10.1038/srep37647 Text en Copyright © 2016, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Lee, Byeongwook Cho, Kwang-Hyun Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference |
title | Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference |
title_sort | brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5120313/ https://www.ncbi.nlm.nih.gov/pubmed/27876875 http://dx.doi.org/10.1038/srep37647 |
work_keys_str_mv | AT leebyeongwook braininspiredspeechsegmentationforautomaticspeechrecognitionusingthespeechenvelopeasatemporalreference AT chokwanghyun braininspiredspeechsegmentationforautomaticspeechrecognitionusingthespeechenvelopeasatemporalreference |
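
The description in this record says that speech is segmented using the instantaneous phase of the speech envelope, but it gives no algorithmic detail. The following is a minimal sketch of one way that idea could look, not the authors' implementation. It assumes the envelope has already been extracted, that the phase is taken from the analytic signal (Hilbert transform) of the mean-removed envelope, and that a segment boundary is placed wherever the phase wraps from +π to −π. The function names `analytic_signal`, `instantaneous_phase`, and `phase_wrap_boundaries` are hypothetical, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def analytic_signal(x):
    """Return the analytic signal of x using a naive O(n^2) DFT.

    Suitable only for short illustrative inputs; a real system would use an FFT.
    """
    n = len(x)
    # Forward DFT.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double positive frequencies,
    # and zero negative frequencies.
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue
        if k < (n + 1) // 2:
            spectrum[k] *= 2
        else:
            spectrum[k] = 0
    # Inverse DFT.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_phase(envelope):
    """Phase (radians, in [-pi, pi]) of the mean-removed envelope. Assumed approach."""
    mean = sum(envelope) / len(envelope)
    centered = [v - mean for v in envelope]
    return [cmath.phase(z) for z in analytic_signal(centered)]


def phase_wrap_boundaries(phase):
    """Sample indices where the phase jumps down by more than pi (a wrap).

    Using phase wraps as segment boundaries is an assumption of this sketch.
    """
    return [i for i in range(1, len(phase)) if phase[i] - phase[i - 1] < -math.pi]


# Synthetic envelope: a 4 Hz modulation sampled at 100 Hz for one second.
fs = 100
envelope = [1.0 + math.cos(2 * math.pi * 4 * i / fs) for i in range(fs)]
boundaries = phase_wrap_boundaries(instantaneous_phase(envelope))
print(boundaries)
```

With this synthetic input, one boundary is produced per envelope cycle, so the segment length follows the modulation rate instead of a fixed frame size. Real speech envelopes are not sinusoidal, so band-limiting the envelope before taking its phase would likely be needed in practice.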