
A Speech-Level–Based Segmented Model to Decode the Dynamic Auditory Attention States in the Competing Speaker Scenes


Bibliographic Details
Main Authors: Wang, Lei, Wang, Yihan, Liu, Zhixing, Wu, Ed X., Chen, Fei
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8866945/
https://www.ncbi.nlm.nih.gov/pubmed/35221885
http://dx.doi.org/10.3389/fnins.2021.760611
author Wang, Lei
Wang, Yihan
Liu, Zhixing
Wu, Ed X.
Chen, Fei
author_sort Wang, Lei
collection PubMed
description In competing-speaker environments, human listeners need to focus or switch their auditory attention according to their dynamic intentions. Reliable cortical tracking of the speech envelope is an effective feature for decoding the target speech from neural signals. Moreover, previous studies revealed that root-mean-square (RMS)–level–based speech segmentation makes a substantial contribution to target speech perception under the modulation of sustained auditory attention. This study further investigated the effect of RMS-level–based speech segmentation on auditory attention decoding (AAD) performance with both sustained and switched attention in competing-speaker auditory scenes. Objective biomarkers derived from cortical activity were also developed to index the dynamic auditory attention states. In the current study, subjects were asked to concentrate on, or switch their attention between, two competing speaker streams. The neural responses to higher- and lower-RMS-level speech segments were analyzed via the linear temporal response function (TRF) before and after attention switched from one speaker stream to the other. Furthermore, the AAD performance of a unified TRF decoding model was compared to that of a speech-RMS-level–based segmented decoding model under dynamic changes of the auditory attention states. The results showed that the weight of the typical TRF component at approximately the 100-ms time lag was sensitive to the switching of auditory attention. Compared to the unified AAD model, the segmented AAD model improved attention decoding performance under both sustained and switched auditory attention modulations across a wide range of signal-to-masker ratios (SMRs). In competing-speaker scenes, the TRF weight and AAD accuracy could thus serve as effective indicators of changes in auditory attention. In addition, across a wide range of SMRs (i.e., from 6 to –6 dB in this study), the segmented AAD model showed robust decoding performance even with short decision window lengths, suggesting that this speech-RMS-level–based model has the potential to decode dynamic attention states in realistic auditory scenarios.
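The RMS-level–based segmentation the abstract refers to can be illustrated with a minimal sketch. The 20-ms frame length and the global-RMS threshold below are illustrative assumptions, not parameters taken from this record, and `rms_segments` is a hypothetical helper name.

```python
import numpy as np

def rms_segments(speech, frame_len):
    """Label fixed-length frames of a speech signal as higher- or
    lower-RMS-level, relative to the global RMS of the whole signal.

    Returns a boolean array, one entry per frame:
    True = higher-RMS-level segment, False = lower-RMS-level segment.
    """
    n_frames = len(speech) // frame_len
    frames = speech[:n_frames * frame_len].reshape(n_frames, frame_len)
    frame_rms = np.sqrt(np.mean(frames ** 2, axis=1))   # per-frame RMS level
    global_rms = np.sqrt(np.mean(speech ** 2))          # threshold (assumed choice)
    return frame_rms >= global_rms

# Toy usage: 10 s of a random stand-in for speech at 16 kHz, 20-ms frames.
fs = 16_000
speech = np.random.randn(10 * fs)
mask = rms_segments(speech, frame_len=int(0.02 * fs))
print(f"{mask.mean():.0%} of frames are higher-RMS-level")
```

In the segmented decoding scheme the abstract describes, a mask of this kind would route the corresponding envelope samples and EEG epochs to separate higher- and lower-RMS-level decoders.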
format Online
Article
Text
id pubmed-8866945
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8866945 2022-02-25 Front Neurosci Neuroscience Frontiers Media S.A. 2022-02-10 Text en Copyright © 2022 Wang, Wang, Liu, Wu and Chen. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
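For context on the decoding models the abstract compares, here is a minimal sketch of a linear stimulus-reconstruction (backward TRF) decoder of the kind commonly used for AAD. The lag range, ridge parameter, and correlation-based decision rule are standard choices from the AAD literature, assumed here rather than taken from this record; all function names are hypothetical.

```python
import numpy as np

def lagged(eeg, max_lag):
    """Stack time-lagged copies of each EEG channel (lags 0..max_lag samples)
    to form the design matrix of a backward (stimulus-reconstruction) model."""
    n_samples, n_channels = eeg.shape
    X = np.zeros((n_samples, n_channels * (max_lag + 1)))
    for lag in range(max_lag + 1):
        X[lag:, lag * n_channels:(lag + 1) * n_channels] = eeg[:n_samples - lag]
    return X

def train_decoder(eeg, attended_env, max_lag, lam=1e2):
    """Fit a ridge-regularized linear map from lagged EEG to the attended envelope."""
    X = lagged(eeg, max_lag)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ attended_env)

def decode_attention(eeg_win, env_a, env_b, w, max_lag):
    """Reconstruct the envelope within one decision window and attribute
    attention to the speaker whose envelope correlates best with it."""
    recon = lagged(eeg_win, max_lag) @ w
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return "speaker A" if r_a > r_b else "speaker B"

# Toy usage with random stand-ins: 64-channel EEG at 64 Hz, lags up to 250 ms,
# trained on 60 s and tested on a 2-s decision window.
fs, n_ch, max_lag = 64, 64, 16
eeg = np.random.randn(60 * fs, n_ch)
env_a, env_b = np.random.randn(60 * fs), np.random.randn(60 * fs)
w = train_decoder(eeg, env_a, max_lag)
print(decode_attention(eeg[:2 * fs], env_a[:2 * fs], env_b[:2 * fs], w, max_lag))
```

A segmented variant along the lines of the abstract would train separate weight vectors on the higher- and lower-RMS-level frames (e.g., as labeled by the `rms_segments` sketch above) and pool their correlations before the decision, rather than fitting one unified decoder.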
title A Speech-Level–Based Segmented Model to Decode the Dynamic Auditory Attention States in the Competing Speaker Scenes
title_sort speech-level–based segmented model to decode the dynamic auditory attention states in the competing speaker scenes
topic Neuroscience