
Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model

Bibliographic Details
Main Authors: Kuruvila, Ivine, Muncke, Jan, Fischer, Eghart, Hoppe, Ulrich
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Physiology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8365753/
https://www.ncbi.nlm.nih.gov/pubmed/34408661
http://dx.doi.org/10.3389/fphys.2021.700655
_version_ 1783738773572419584
author Kuruvila, Ivine
Muncke, Jan
Fischer, Eghart
Hoppe, Ulrich
author_facet Kuruvila, Ivine
Muncke, Jan
Fischer, Eghart
Hoppe, Ulrich
author_sort Kuruvila, Ivine
collection PubMed
description The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate this segregation capability by modeling a relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer auditory attention are based on linear systems theory, where cues such as speech envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer the auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrograms of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects, where each subject undertook a dual-speaker experiment. The three datasets corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy.
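The abstract describes the model only at a high level. The sketch below is a hypothetical PyTorch illustration of a joint CNN-LSTM attention classifier of this kind, followed by 50% magnitude pruning; the layer sizes, channel counts (64 EEG channels, 32 spectrogram bins), 64 Hz sampling, fusion strategy, and use of torch.nn.utils.prune are assumptions made for illustration and are not taken from the paper.

```python
# Minimal sketch only; hyperparameters and fusion strategy are assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


class JointCnnLstmClassifier(nn.Module):
    """Hypothetical joint CNN-LSTM mapping EEG plus two speaker spectrograms
    to a binary attention decision (speaker 1 vs. speaker 2)."""

    def __init__(self, eeg_channels=64, spec_bins=32, hidden=64):
        super().__init__()
        # CNN front end for the multichannel EEG: (batch, eeg_channels, time)
        self.eeg_cnn = nn.Sequential(
            nn.Conv1d(eeg_channels, 32, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=9, padding=4),
            nn.ReLU(),
        )
        # Shared CNN front end for each speaker's spectrogram: (batch, spec_bins, time)
        self.spec_cnn = nn.Sequential(
            nn.Conv1d(spec_bins, 16, kernel_size=9, padding=4),
            nn.ReLU(),
        )
        # LSTM over the time-aligned, concatenated feature streams
        self.lstm = nn.LSTM(input_size=32 + 2 * 16, hidden_size=hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, 2)  # logits for the two speakers

    def forward(self, eeg, spec_a, spec_b):
        feats = torch.cat(
            [self.eeg_cnn(eeg), self.spec_cnn(spec_a), self.spec_cnn(spec_b)], dim=1
        )
        # (batch, features, time) -> (batch, time, features) for the LSTM
        _, (h_n, _) = self.lstm(feats.transpose(1, 2))
        return self.classifier(h_n[-1])


# Example forward pass on a 3 s trial, assuming 64 Hz EEG (192 time steps).
model = JointCnnLstmClassifier()
eeg = torch.randn(8, 64, 192)
spec_a, spec_b = torch.randn(8, 32, 192), torch.randn(8, 32, 192)
logits = model(eeg, spec_a, spec_b)  # shape: (8, 2)

# Magnitude pruning to 50% sparsity on convolutional and linear weights,
# mirroring the sparsity level the abstract reports as tolerable.
for module in model.modules():
    if isinstance(module, (nn.Conv1d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the zeroed weights permanent
```

The two-branch front end with a shared spectrogram CNN is one plausible way to realize a "joint" audio-EEG model; the final LSTM hidden state is used as a fixed-length summary of the trial before classification.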
format Online
Article
Text
id pubmed-8365753
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8365753 2021-08-17 Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model Kuruvila, Ivine Muncke, Jan Fischer, Eghart Hoppe, Ulrich Front Physiol Physiology The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate this segregation capability by modeling a relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer auditory attention are based on linear systems theory, where cues such as speech envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer the auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrograms of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects, where each subject undertook a dual-speaker experiment. The three datasets corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy. Frontiers Media S.A. 2021-08-02 /pmc/articles/PMC8365753/ /pubmed/34408661 http://dx.doi.org/10.3389/fphys.2021.700655 Text en Copyright © 2021 Kuruvila, Muncke, Fischer and Hoppe. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Physiology
Kuruvila, Ivine
Muncke, Jan
Fischer, Eghart
Hoppe, Ulrich
Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model
title Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model
title_full Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model
title_fullStr Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model
title_full_unstemmed Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model
title_short Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model
title_sort extracting the auditory attention in a dual-speaker scenario from eeg using a joint cnn-lstm model
topic Physiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8365753/
https://www.ncbi.nlm.nih.gov/pubmed/34408661
http://dx.doi.org/10.3389/fphys.2021.700655
work_keys_str_mv AT kuruvilaivine extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel
AT munckejan extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel
AT fischereghart extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel
AT hoppeulrich extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel