
Simple Acoustic Features Can Explain Phoneme-Based Predictions of Cortical Responses to Speech

When we listen to speech, we have to make sense of a waveform of sound pressure. Hierarchical models of speech perception assume that, to extract semantic meaning, the signal is transformed into unknown, intermediate neuronal representations. Traditionally, studies of such intermediate representations are guided by linguistically defined concepts, such as phonemes. Here, we argue that in order to arrive at an unbiased understanding of the neuronal responses to speech, we should focus instead on representations obtained directly from the stimulus. We illustrate our view with a data-driven, information theoretic analysis of a dataset of 24 young, healthy humans who listened to a 1 h narrative while their magnetoencephalogram (MEG) was recorded. We find that two recent results, the improved performance of an encoding model in which annotated linguistic and acoustic features were combined and the decoding of phoneme subgroups from phoneme-locked responses, can be explained by an encoding model that is based entirely on acoustic features. These acoustic features capitalize on acoustic edges and outperform Gabor-filtered spectrograms, which can explicitly describe the spectrotemporal characteristics of individual phonemes. By replicating our results in publicly available electroencephalography (EEG) data, we conclude that models of brain responses based on linguistic features can serve as excellent benchmarks. However, we believe that in order to further our understanding of human cortical responses to speech, we should also explore low-level and parsimonious explanations for apparent high-level phenomena.


Bibliographic Details
Main Authors: Daube, Christoph, Ince, Robin A.A., Gross, Joachim
Format: Online Article Text
Language: English
Published: Cell Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6584359/
https://www.ncbi.nlm.nih.gov/pubmed/31130454
http://dx.doi.org/10.1016/j.cub.2019.04.067
author Daube, Christoph
Ince, Robin A.A.
Gross, Joachim
collection PubMed
description When we listen to speech, we have to make sense of a waveform of sound pressure. Hierarchical models of speech perception assume that, to extract semantic meaning, the signal is transformed into unknown, intermediate neuronal representations. Traditionally, studies of such intermediate representations are guided by linguistically defined concepts, such as phonemes. Here, we argue that in order to arrive at an unbiased understanding of the neuronal responses to speech, we should focus instead on representations obtained directly from the stimulus. We illustrate our view with a data-driven, information theoretic analysis of a dataset of 24 young, healthy humans who listened to a 1 h narrative while their magnetoencephalogram (MEG) was recorded. We find that two recent results, the improved performance of an encoding model in which annotated linguistic and acoustic features were combined and the decoding of phoneme subgroups from phoneme-locked responses, can be explained by an encoding model that is based entirely on acoustic features. These acoustic features capitalize on acoustic edges and outperform Gabor-filtered spectrograms, which can explicitly describe the spectrotemporal characteristics of individual phonemes. By replicating our results in publicly available electroencephalography (EEG) data, we conclude that models of brain responses based on linguistic features can serve as excellent benchmarks. However, we believe that in order to further our understanding of human cortical responses to speech, we should also explore low-level and parsimonious explanations for apparent high-level phenomena.
format Online
Article
Text
id pubmed-6584359
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Cell Press
record_format MEDLINE/PubMed
spelling pubmed-6584359 2019-06-27 Simple Acoustic Features Can Explain Phoneme-Based Predictions of Cortical Responses to Speech Daube, Christoph Ince, Robin A.A. Gross, Joachim Curr Biol Article
Cell Press 2019-06-17 /pmc/articles/PMC6584359/ /pubmed/31130454 http://dx.doi.org/10.1016/j.cub.2019.04.067 Text en © 2019 The Author(s) http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Simple Acoustic Features Can Explain Phoneme-Based Predictions of Cortical Responses to Speech
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6584359/
https://www.ncbi.nlm.nih.gov/pubmed/31130454
http://dx.doi.org/10.1016/j.cub.2019.04.067