Active listening

This paper introduces active listening, as a unified framework for synthesising and recognising speech. The notion of active listening inherits from active inference, which considers perception and action under one universal imperative: to maximise the evidence for our (generative) models of the wor...

Bibliographic Details
Main authors:	Friston, Karl J.; Sajid, Noor; Quiroga-Martinez, David Ricardo; Parr, Thomas; Price, Cathy J.; Holmes, Emma
Format:	Online Article Text
Language:	English
Published:	Elsevier/North-Holland Biomedical Press 2021
Subjects:	Technical Note
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7812378/ https://www.ncbi.nlm.nih.gov/pubmed/32732017 http://dx.doi.org/10.1016/j.heares.2020.107998

_version_	1783637656935071744
author	Friston, Karl J.; Sajid, Noor; Quiroga-Martinez, David Ricardo; Parr, Thomas; Price, Cathy J.; Holmes, Emma
author_sort	Friston, Karl J.
collection	PubMed
description	This paper introduces active listening, as a unified framework for synthesising and recognising speech. The notion of active listening inherits from active inference, which considers perception and action under one universal imperative: to maximise the evidence for our (generative) models of the world. First, we describe a generative model of spoken words that simulates (i) how discrete lexical, prosodic, and speaker attributes give rise to continuous acoustic signals; and conversely (ii) how continuous acoustic signals are recognised as words. The ‘active’ aspect involves (covertly) segmenting spoken sentences and borrows ideas from active vision. It casts speech segmentation as the selection of internal actions, corresponding to the placement of word boundaries. Practically, word boundaries are selected that maximise the evidence for an internal model of how individual words are generated. We establish face validity by simulating speech recognition and showing how the inferred content of a sentence depends on prior beliefs and background noise. Finally, we consider predictive validity by associating neuronal or physiological responses, such as the mismatch negativity and P300, with belief updating under active listening, which is greatest in the absence of accurate prior beliefs about what will be heard next.
format	Online Article Text
id	pubmed-7812378
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Elsevier/North-Holland Biomedical Press
record_format	MEDLINE/PubMed
spelling	pubmed-78123782021-01-22 Active listening Friston, Karl J. Sajid, Noor Quiroga-Martinez, David Ricardo Parr, Thomas Price, Cathy J. Holmes, Emma Hear Res Technical Note This paper introduces active listening, as a unified framework for synthesising and recognising speech. The notion of active listening inherits from active inference, which considers perception and action under one universal imperative: to maximise the evidence for our (generative) models of the world. First, we describe a generative model of spoken words that simulates (i) how discrete lexical, prosodic, and speaker attributes give rise to continuous acoustic signals; and conversely (ii) how continuous acoustic signals are recognised as words. The ‘active’ aspect involves (covertly) segmenting spoken sentences and borrows ideas from active vision. It casts speech segmentation as the selection of internal actions, corresponding to the placement of word boundaries. Practically, word boundaries are selected that maximise the evidence for an internal model of how individual words are generated. We establish face validity by simulating speech recognition and showing how the inferred content of a sentence depends on prior beliefs and background noise. Finally, we consider predictive validity by associating neuronal or physiological responses, such as the mismatch negativity and P300, with belief updating under active listening, which is greatest in the absence of accurate prior beliefs about what will be heard next. Elsevier/North-Holland Biomedical Press 2021-01 /pmc/articles/PMC7812378/ /pubmed/32732017 http://dx.doi.org/10.1016/j.heares.2020.107998 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Technical Note Friston, Karl J. Sajid, Noor Quiroga-Martinez, David Ricardo Parr, Thomas Price, Cathy J. Holmes, Emma Active listening
title	Active listening
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7812378/ https://www.ncbi.nlm.nih.gov/pubmed/32732017 http://dx.doi.org/10.1016/j.heares.2020.107998

Similar items