
A Physiologically Inspired Model for Solving the Cocktail Party Problem

Bibliographic Details
Main Authors: Chou, Kenny F., Dong, Junzi, Colburn, H. Steven, Sen, Kamal
Format: Online Article Text
Language: English
Published: Springer US 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6889086/
https://www.ncbi.nlm.nih.gov/pubmed/31392449
http://dx.doi.org/10.1007/s10162-019-00732-4
collection PubMed
description At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). It has recently been observed that individual neurons in the avian field L (analog to the mammalian auditory cortex) can display broad spatial tuning to single targets and selective tuning to a target embedded in spatially distributed sound mixtures. Here, we describe a model inspired by these experimental observations and apply it to process mixtures of human speech sentences. This processing is realized in the neural spiking domain. It converts binaural acoustic inputs into cortical spike trains using a multi-stage model composed of a cochlear filter-bank, a midbrain spatial-localization network, and a cortical network. The output spike trains of the cortical network are then converted back into an acoustic waveform, using a stimulus reconstruction technique. The intelligibility of the reconstructed output is quantified using an objective measure of speech intelligibility. We apply the algorithm to single and multi-talker speech to demonstrate that the physiologically inspired algorithm is able to achieve intelligible reconstruction of an “attended” target sentence embedded in two other non-attended masker sentences. The algorithm is also robust to masker level and displays performance trends comparable to humans. The ideas from this work may help improve the performance of hearing assistive devices (e.g., hearing aids and cochlear implants), speech-recognition technology, and computational algorithms for processing natural scenes cluttered with spatially distributed acoustic objects.
id pubmed-6889086
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6889086 2019-12-16 A Physiologically Inspired Model for Solving the Cocktail Party Problem Chou, Kenny F. Dong, Junzi Colburn, H. Steven Sen, Kamal J Assoc Res Otolaryngol Research Article
Springer US 2019-08-07 2019-12 /pmc/articles/PMC6889086/ /pubmed/31392449 http://dx.doi.org/10.1007/s10162-019-00732-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.