Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model
The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate the segregation capability by modeling a relationship between the speech signals present in an auditory scene and the listener's cortical si...
Main Authors: | Kuruvila, Ivine; Muncke, Jan; Fischer, Eghart; Hoppe, Ulrich |
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2021 |
Subjects: | Physiology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8365753/ https://www.ncbi.nlm.nih.gov/pubmed/34408661 http://dx.doi.org/10.3389/fphys.2021.700655 |
_version_ | 1783738773572419584 |
author | Kuruvila, Ivine Muncke, Jan Fischer, Eghart Hoppe, Ulrich |
author_facet | Kuruvila, Ivine Muncke, Jan Fischer, Eghart Hoppe, Ulrich |
author_sort | Kuruvila, Ivine |
collection | PubMed |
description | The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate the segregation capability by modeling a relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer the auditory attention are based on linear systems theory, where cues such as speech envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer the auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrograms of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects, where each subject undertook a dual-speaker experiment. The three datasets analyzed corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy. |
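The description above only outlines the model, so here is a rough, hypothetical sketch of what such a joint CNN-LSTM attention classifier could look like in PyTorch. Everything below (the class name JointCnnLstm, layer sizes, kernel widths, the 64-channel/64-bin shapes, and the final pruning call) is an illustrative assumption, not the architecture published in the paper: separate CNN front-ends encode the EEG segment and the two speaker spectrograms, an LSTM integrates the concatenated features over time, and a linear head decides which speaker is attended.

```python
import torch
import torch.nn as nn
from torch.nn.utils import prune

class JointCnnLstm(nn.Module):
    """Hypothetical joint CNN-LSTM auditory-attention classifier (illustrative only)."""

    def __init__(self, eeg_channels=64, spec_bins=64, hidden=64):
        super().__init__()
        # 1-D convolution over time for the multichannel EEG segment
        self.eeg_cnn = nn.Sequential(
            nn.Conv1d(eeg_channels, 32, kernel_size=9, padding=4), nn.ReLU()
        )
        # shared 1-D convolution over time for each speaker's spectrogram
        self.spec_cnn = nn.Sequential(
            nn.Conv1d(spec_bins, 32, kernel_size=9, padding=4), nn.ReLU()
        )
        # LSTM over the concatenated EEG + speaker-1 + speaker-2 feature streams
        self.lstm = nn.LSTM(input_size=3 * 32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # logits: attended speaker 1 vs. speaker 2

    def forward(self, eeg, spec1, spec2):
        # eeg: (batch, eeg_channels, time); spec1/spec2: (batch, spec_bins, time)
        feats = torch.cat(
            [self.eeg_cnn(eeg), self.spec_cnn(spec1), self.spec_cnn(spec2)], dim=1
        )
        out, _ = self.lstm(feats.transpose(1, 2))  # (batch, time, hidden)
        return self.head(out[:, -1])               # decision from the last time step

model = JointCnnLstm()
eeg = torch.randn(8, 64, 192)     # e.g. 3 s of 64-channel EEG at 64 Hz
spec_a = torch.randn(8, 64, 192)  # time-aligned spectrogram of speaker 1
spec_b = torch.randn(8, 64, 192)  # time-aligned spectrogram of speaker 2
logits = model(eeg, spec_a, spec_b)  # shape (8, 2)

# Zero out the 50% smallest-magnitude weights of the output layer,
# loosely mirroring the sparsity experiment mentioned in the abstract.
prune.l1_unstructured(model.head, name="weight", amount=0.5)
```

The pruning call uses torch.nn.utils.prune.l1_unstructured, which masks the smallest-magnitude weights of a given tensor; this only gestures at the kind of magnitude pruning the abstract reports the model tolerating at up to 50% sparsity.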
format | Online Article Text |
id | pubmed-8365753 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-83657532021-08-17 Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model Kuruvila, Ivine Muncke, Jan Fischer, Eghart Hoppe, Ulrich Front Physiol Physiology The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate the segregation capability by modeling a relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer the auditory attention are based on linear systems theory, where cues such as speech envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer the auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrograms of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects, where each subject undertook a dual-speaker experiment. The three datasets analyzed corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy. Frontiers Media S.A. 2021-08-02 /pmc/articles/PMC8365753/ /pubmed/34408661 http://dx.doi.org/10.3389/fphys.2021.700655 Text en Copyright © 2021 Kuruvila, Muncke, Fischer and Hoppe. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Physiology Kuruvila, Ivine Muncke, Jan Fischer, Eghart Hoppe, Ulrich Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title | Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title_full | Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title_fullStr | Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title_full_unstemmed | Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title_short | Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model |
title_sort | extracting the auditory attention in a dual-speaker scenario from eeg using a joint cnn-lstm model |
topic | Physiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8365753/ https://www.ncbi.nlm.nih.gov/pubmed/34408661 http://dx.doi.org/10.3389/fphys.2021.700655 |
work_keys_str_mv | AT kuruvilaivine extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel AT munckejan extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel AT fischereghart extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel AT hoppeulrich extractingtheauditoryattentioninadualspeakerscenariofromeegusingajointcnnlstmmodel |