Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses
Main Authors: Na, Youngmin; Joo, Hyosung; Trang, Le Thi; Quan, Luong Do Anh; Woo, Jihwan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2022
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9433707/ https://www.ncbi.nlm.nih.gov/pubmed/36061597 http://dx.doi.org/10.3389/fnins.2022.906616
_version_ | 1784780682628694016 |
author | Na, Youngmin; Joo, Hyosung; Trang, Le Thi; Quan, Luong Do Anh; Woo, Jihwan |
author_facet | Na, Youngmin; Joo, Hyosung; Trang, Le Thi; Quan, Luong Do Anh; Woo, Jihwan |
author_sort | Na, Youngmin |
collection | PubMed |
description | Auditory prostheses provide an opportunity for rehabilitation of hearing-impaired patients. Speech intelligibility can be used to estimate the extent to which the auditory prosthesis improves the user’s speech comprehension. Although behavior-based speech intelligibility testing is the gold standard, precise evaluation is limited by its subjectivity. Here, we used a convolutional neural network to predict speech intelligibility from electroencephalography (EEG). Sixty-four-channel EEGs were recorded from 87 adult participants with normal hearing. Sentences spectrally degraded by a 2-, 3-, 4-, 5-, or 8-channel vocoder were used to create relatively low speech intelligibility conditions. A Korean sentence recognition test was used. The speech intelligibility scores were divided into 41 discrete levels ranging from 0 to 100%, in steps of 2.5%. Three scores, namely 30.0, 37.5, and 40.0%, were not collected. The speech features, i.e., the speech temporal envelope (ENV) and phoneme (PH) onset, were used to extract continuous-speech EEGs for speech intelligibility prediction. The deep learning model was trained on a dataset of event-related potentials (ERPs), correlation coefficients between the ERPs and ENV, between the ERPs and PH onsets, or between the ERPs and the product of PH and ENV (PHENV). The speech intelligibility prediction accuracies were 97.33% (ERP), 99.42% (ENV), 99.55% (PH), and 99.91% (PHENV). The models were interpreted using the occlusion sensitivity approach. According to the occlusion sensitivity maps, the informative electrodes of the ENV model were located in the occipital area, whereas those of the phoneme models, i.e., PH and PHENV, were located in the language-processing area. Of the models tested, the PHENV model obtained the best speech intelligibility prediction accuracy. This model may facilitate clinical prediction of speech intelligibility with a comfortable speech intelligibility test. |
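The feature construction named in the description can be sketched as follows: per-channel correlation coefficients between an EEG epoch and the speech temporal envelope (ENV), the phoneme-onset train (PH), and their element-wise product (PHENV). This is an illustrative sketch only; the array shapes, sampling, and helper names are assumptions, not the paper's implementation, and random noise stands in for real EEG and speech features.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000          # samples per epoch (e.g., downsampled EEG)
n_channels = 64           # 64-channel EEG, as in the study

erp = rng.standard_normal((n_channels, n_samples))   # stand-in ERP data
env = np.abs(rng.standard_normal(n_samples))         # stand-in speech envelope (ENV)
ph = (rng.random(n_samples) < 0.05).astype(float)    # stand-in phoneme onsets (PH)
phenv = ph * env                                     # PHENV: product of PH and ENV

def channel_correlations(eeg, feature):
    """Pearson correlation of each EEG channel with a speech feature."""
    eeg_c = eeg - eeg.mean(axis=1, keepdims=True)    # center each channel
    f_c = feature - feature.mean()                   # center the feature
    num = eeg_c @ f_c
    den = np.linalg.norm(eeg_c, axis=1) * np.linalg.norm(f_c)
    return num / den

# One correlation per channel per feature: the kind of compact
# representation a CNN classifier could be trained on.
feats = np.stack([channel_correlations(erp, f) for f in (env, ph, phenv)])
print(feats.shape)  # (3, 64)
```

With real data, one such correlation vector per trial (or per vocoder condition) would form the training examples, labeled with the behavioral intelligibility score.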
format | Online Article Text |
id | pubmed-9433707 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9433707 2022-09-02 Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses Na, Youngmin Joo, Hyosung Trang, Le Thi Quan, Luong Do Anh Woo, Jihwan Front Neurosci Neuroscience Auditory prostheses provide an opportunity for rehabilitation of hearing-impaired patients. Speech intelligibility can be used to estimate the extent to which the auditory prosthesis improves the user’s speech comprehension. Although behavior-based speech intelligibility testing is the gold standard, precise evaluation is limited by its subjectivity. Here, we used a convolutional neural network to predict speech intelligibility from electroencephalography (EEG). Sixty-four-channel EEGs were recorded from 87 adult participants with normal hearing. Sentences spectrally degraded by a 2-, 3-, 4-, 5-, or 8-channel vocoder were used to create relatively low speech intelligibility conditions. A Korean sentence recognition test was used. The speech intelligibility scores were divided into 41 discrete levels ranging from 0 to 100%, in steps of 2.5%. Three scores, namely 30.0, 37.5, and 40.0%, were not collected. The speech features, i.e., the speech temporal envelope (ENV) and phoneme (PH) onset, were used to extract continuous-speech EEGs for speech intelligibility prediction. The deep learning model was trained on a dataset of event-related potentials (ERPs), correlation coefficients between the ERPs and ENV, between the ERPs and PH onsets, or between the ERPs and the product of PH and ENV (PHENV). The speech intelligibility prediction accuracies were 97.33% (ERP), 99.42% (ENV), 99.55% (PH), and 99.91% (PHENV). The models were interpreted using the occlusion sensitivity approach. According to the occlusion sensitivity maps, the informative electrodes of the ENV model were located in the occipital area, whereas those of the phoneme models, i.e., PH and PHENV, were located in the language-processing area. Of the models tested, the PHENV model obtained the best speech intelligibility prediction accuracy. This model may facilitate clinical prediction of speech intelligibility with a comfortable speech intelligibility test. Frontiers Media S.A. 2022-08-18 /pmc/articles/PMC9433707/ /pubmed/36061597 http://dx.doi.org/10.3389/fnins.2022.906616 Text en Copyright © 2022 Na, Joo, Trang, Quan and Woo. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
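The occlusion sensitivity approach mentioned in the record can be illustrated generically: mask one window of the model's input at a time, re-run the model, and record how much the prediction drops. A minimal sketch, assuming a 1-D input and a toy stand-in model (the paper's actual CNN and input layout are not reproduced here):

```python
import numpy as np

def occlusion_sensitivity(model, x, window=2, baseline=0.0):
    """Per-position importance map: drop in model output when a
    window of the input is replaced by a baseline value."""
    ref = model(x)                                  # unmasked prediction
    importance = np.zeros_like(x, dtype=float)
    for start in range(0, len(x), window):
        occluded = x.copy()
        occluded[start:start + window] = baseline   # mask one window
        importance[start:start + window] = ref - model(occluded)
    return importance

# Toy model: a weighted sum, so importance should track the weights.
weights = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 2.0])
model = lambda x: float(weights @ x)

imp = occlusion_sensitivity(model, np.ones(8), window=2)
print(imp)  # [0. 0. 2. 2. 0. 0. 4. 4.]
```

Applied over EEG electrodes instead of a toy vector, the same masking loop yields the kind of sensitivity map the record describes, highlighting occipital electrodes for the ENV model and language-area electrodes for the PH and PHENV models.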
spellingShingle | Neuroscience Na, Youngmin Joo, Hyosung Trang, Le Thi Quan, Luong Do Anh Woo, Jihwan Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses |
title | Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses |
title_full | Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses |
title_fullStr | Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses |
title_full_unstemmed | Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses |
title_short | Objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses |
title_sort | objective speech intelligibility prediction using a deep learning model with continuous speech-evoked cortical auditory responses |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9433707/ https://www.ncbi.nlm.nih.gov/pubmed/36061597 http://dx.doi.org/10.3389/fnins.2022.906616 |
work_keys_str_mv | AT nayoungmin objectivespeechintelligibilitypredictionusingadeeplearningmodelwithcontinuousspeechevokedcorticalauditoryresponses AT joohyosung objectivespeechintelligibilitypredictionusingadeeplearningmodelwithcontinuousspeechevokedcorticalauditoryresponses AT tranglethi objectivespeechintelligibilitypredictionusingadeeplearningmodelwithcontinuousspeechevokedcorticalauditoryresponses AT quanluongdoanh objectivespeechintelligibilitypredictionusingadeeplearningmodelwithcontinuousspeechevokedcorticalauditoryresponses AT woojihwan objectivespeechintelligibilitypredictionusingadeeplearningmodelwithcontinuousspeechevokedcorticalauditoryresponses |