A Physiologically Inspired Model for Solving the Cocktail Party Problem
Main authors: | Chou, Kenny F.; Dong, Junzi; Colburn, H. Steven; Sen, Kamal |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6889086/ ; https://www.ncbi.nlm.nih.gov/pubmed/31392449 ; http://dx.doi.org/10.1007/s10162-019-00732-4 |
_version_ | 1783475346589351936 |
---|---|
author | Chou, Kenny F.; Dong, Junzi; Colburn, H. Steven; Sen, Kamal |
collection | PubMed |
description | At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). It has recently been observed that individual neurons in the avian field L (analog to the mammalian auditory cortex) can display broad spatial tuning to single targets and selective tuning to a target embedded in spatially distributed sound mixtures. Here, we describe a model inspired by these experimental observations and apply it to process mixtures of human speech sentences. This processing is realized in the neural spiking domain. It converts binaural acoustic inputs into cortical spike trains using a multi-stage model composed of a cochlear filter-bank, a midbrain spatial-localization network, and a cortical network. The output spike trains of the cortical network are then converted back into an acoustic waveform, using a stimulus reconstruction technique. The intelligibility of the reconstructed output is quantified using an objective measure of speech intelligibility. We apply the algorithm to single and multi-talker speech to demonstrate that the physiologically inspired algorithm is able to achieve intelligible reconstruction of an “attended” target sentence embedded in two other non-attended masker sentences. The algorithm is also robust to masker level and displays performance trends comparable to humans. The ideas from this work may help improve the performance of hearing assistive devices (e.g., hearing aids and cochlear implants), speech-recognition technology, and computational algorithms for processing natural scenes cluttered with spatially distributed acoustic objects. |
format | Online Article Text |
id | pubmed-6889086 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-6889086 (2019-12-16); J Assoc Res Otolaryngol; Research Article; Springer US 2019-08-07, 2019-12; © The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
title | A Physiologically Inspired Model for Solving the Cocktail Party Problem |
topic | Research Article |