Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model
The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate the segregation capability by modeling a relationship between the speech signals present in an auditory scene and the listener's cortical si...
Main Authors: | Kuruvila, Ivine; Muncke, Jan; Fischer, Eghart; Hoppe, Ulrich |
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2021 |
Subjects: | Physiology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8365753/ https://www.ncbi.nlm.nih.gov/pubmed/34408661 http://dx.doi.org/10.3389/fphys.2021.700655 |
_version_ | 1783738773572419584 |
author | Kuruvila, Ivine Muncke, Jan Fischer, Eghart Hoppe, Ulrich |
author_facet | Kuruvila, Ivine Muncke, Jan Fischer, Eghart Hoppe, Ulrich |
author_sort | Kuruvila, Ivine |
collection | PubMed |
description | The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate the segregation capability by modeling a relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer the auditory attention are based on linear systems theory, where cues such as speech envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer the auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrograms of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects, where each subject undertook a dual-speaker experiment. The three datasets analyzed corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy. |
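The description above only outlines the model, so here is a rough, hypothetical sketch of what such a joint CNN-LSTM attention classifier could look like in PyTorch. Everything below (the class name JointCnnLstm, layer sizes, kernel widths, the 64-channel/64-bin shapes, and the final pruning call) is an illustrative assumption, not the architecture published in the paper: separate CNN front-ends encode the EEG segment and the two speaker spectrograms, an LSTM integrates the concatenated features over time, and a linear head decides which speaker is attended.

```python
import torch
import torch.nn as nn
from torch.nn.utils import prune

class JointCnnLstm(nn.Module):
    """Hypothetical joint CNN-LSTM auditory-attention classifier (illustrative only)."""

    def __init__(self, eeg_channels=64, spec_bins=64, hidden=64):
        super().__init__()
        # 1-D convolution over time for the multichannel EEG segment
        self.eeg_cnn = nn.Sequential(
            nn.Conv1d(eeg_channels, 32, kernel_size=9, padding=4), nn.ReLU()
        )
        # shared 1-D convolution over time for each speaker's spectrogram
        self.spec_cnn = nn.Sequential(
            nn.Conv1d(spec_bins, 32, kernel_size=9, padding=4), nn.ReLU()
        )
        # LSTM over the concatenated EEG + speaker-1 + speaker-2 feature streams
        self.lstm = nn.LSTM(input_size=3 * 32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # logits: attended speaker 1 vs. speaker 2

    def forward(self, eeg, spec1, spec2):
        # eeg: (batch, eeg_channels, time); spec1/spec2: (batch, spec_bins, time)
        feats = torch.cat(
            [self.eeg_cnn(eeg), self.spec_cnn(spec1), self.spec_cnn(spec2)], dim=1
        )
        out, _ = self.lstm(feats.transpose(1, 2))  # (batch, time, hidden)
        return self.head(out[:, -1])               # decision from the last time step

model = JointCnnLstm()
eeg = torch.randn(8, 64, 192)     # e.g. 3 s of 64-channel EEG at 64 Hz
spec_a = torch.randn(8, 64, 192)  # time-aligned spectrogram of speaker 1
spec_b = torch.randn(8, 64, 192)  # time-aligned spectrogram of speaker 2
logits = model(eeg, spec_a, spec_b)  # shape (8, 2)

# Zero out the 50% smallest-magnitude weights of the output layer,
# loosely mirroring the sparsity experiment mentioned in the abstract.
prune.l1_unstructured(model.head, name="weight", amount=0.5)
```

The pruning call uses torch.nn.utils.prune.l1_unstructured, which masks the smallest-magnitude weights of a given tensor; this only gestures at the kind of magnitude pruning the abstract reports the model tolerating at up to 50% sparsity.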
format | Online Article Text |
id | pubmed-8365753 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-83657532021-08-17 Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model Kuruvila, Ivine Muncke, Jan Fischer, Eghart Hoppe, Ulrich Front Physiol Physiology The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate the segregation capability by modeling a relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer the auditory attention are based on linear systems theory, where cues such as speech envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer the auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrograms of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects, where each subject undertook a dual-speaker experiment. The three datasets analyzed corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy. Frontiers Media S.A. 2021-08-02 /pmc/articles/PMC8365753/ /pubmed/34408661 http://dx.doi.org/10.3389/fphys.2021.700655 Text en Copyright © 2021 Kuruvila, Muncke, Fischer and Hoppe. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Physiology Kuruvila, Ivine Muncke, Jan Fischer, Eghart Hoppe, Ulrich Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title | Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title_full | Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title_fullStr | Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title_full_unstemmed | Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title_short | Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title_sort | extracting the auditory attention in a dual-speaker scenario from eeg using a joint cnn-lstm model |
topic | Physiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8365753/ https://www.ncbi.nlm.nih.gov/pubmed/34408661 http://dx.doi.org/10.3389/fphys.2021.700655 |
work_keys_str_mv | AT kuruvilaivine extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel AT munckejan extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel AT fischereghart extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel AT hoppeulrich extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel |