
Auditory Attention Detection via Cross-Modal Attention

Humans show a remarkable perceptual ability to select the speech stream of interest among multiple competing speakers. Previous studies demonstrated that auditory attention detection (AAD) can infer which speaker is attended by analyzing a listener's electroencephalography (EEG) activities.

Full description

Bibliographic Details
Main Authors: Cai, Siqi, Li, Peiwen, Su, Enze, Xie, Longhan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Neuroscience
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8333999/
https://www.ncbi.nlm.nih.gov/pubmed/34366770
http://dx.doi.org/10.3389/fnins.2021.652058
author Cai, Siqi
Li, Peiwen
Su, Enze
Xie, Longhan
collection PubMed
description Humans show a remarkable perceptual ability to select the speech stream of interest among multiple competing speakers. Previous studies demonstrated that auditory attention detection (AAD) can infer which speaker is attended by analyzing a listener's electroencephalography (EEG) activities. However, previous AAD approaches perform poorly on short signal segments, so more advanced decoding strategies are needed to realize robust real-time AAD. In this study, we propose a novel approach, i.e., cross-modal attention-based AAD (CMAA), to exploit the discriminative features and the correlation between audio and EEG signals. With this mechanism, we aim to dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby detecting the auditory attention activities manifested in brain signals. We also validate the CMAA model through data visualization and comprehensive experiments on a publicly available database. Experiments show that the CMAA achieves accuracies of 82.8, 86.4, and 87.6% for 1-, 2-, and 5-s decision windows under anechoic conditions, respectively; for a 2-s decision window, it achieves an average accuracy of 84.1% under real-world reverberant conditions. The proposed CMAA network not only achieves better performance than the conventional linear model but also outperforms state-of-the-art non-linear approaches. These results and data visualizations suggest that the CMAA model can dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features in order to improve AAD performance.
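The cross-modal attention mechanism described in the abstract can be illustrated with a short sketch. The following PyTorch snippet is a hypothetical, minimal example and not the published CMAA architecture: it assumes EEG frames act as attention queries over the speech envelopes of the two competing speakers, pools the fused representations over a decision window, and classifies which speaker is attended; all layer sizes, the 64-channel EEG / 64 Hz envelope shapes, and the class and function names are illustrative assumptions.

# Minimal sketch (assumption-laden, not the authors' implementation) of
# cross-modal attention for auditory attention detection.
import torch
import torch.nn as nn

class CrossModalAAD(nn.Module):
    def __init__(self, eeg_channels=64, audio_dim=1, d_model=64, n_heads=4):
        super().__init__()
        self.eeg_proj = nn.Linear(eeg_channels, d_model)   # EEG frames -> shared embedding space
        self.audio_proj = nn.Linear(audio_dim, d_model)     # envelope frames -> shared embedding space
        # Cross-modal attention: EEG embeddings (queries) attend to audio embeddings (keys/values).
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 2)
        )

    def fuse(self, eeg_emb, audio):
        audio_emb = self.audio_proj(audio)                   # (B, T, d_model)
        fused, _ = self.cross_attn(eeg_emb, audio_emb, audio_emb)
        return fused.mean(dim=1)                             # pool over the decision window

    def forward(self, eeg, audio_a, audio_b):
        # eeg: (B, T, eeg_channels); audio_a / audio_b: (B, T, audio_dim) speech envelopes
        eeg_emb = self.eeg_proj(eeg)
        rep_a = self.fuse(eeg_emb, audio_a)                  # EEG fused with speaker A
        rep_b = self.fuse(eeg_emb, audio_b)                  # EEG fused with speaker B
        return self.classifier(torch.cat([rep_a, rep_b], dim=-1))  # logits over {A, B}

# Toy usage for a 2-s decision window at an assumed 64 Hz frame rate (T = 128 frames).
model = CrossModalAAD()
eeg = torch.randn(8, 128, 64)
env_a, env_b = torch.randn(8, 128, 1), torch.randn(8, 128, 1)
logits = model(eeg, env_a, env_b)                            # shape (8, 2)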
format Online
Article
Text
id pubmed-8333999
institution National Center for Biotechnology Information
language English
publishDate 2021-07-21
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
journal Front Neurosci
copyright Copyright © 2021 Cai, Li, Su and Xie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY, https://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Auditory Attention Detection via Cross-Modal Attention
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8333999/
https://www.ncbi.nlm.nih.gov/pubmed/34366770
http://dx.doi.org/10.3389/fnins.2021.652058