
Sound stream segregation: a neuromorphic approach to solve the “cocktail party problem” in real-time

The human auditory system has the ability to segregate complex auditory scenes into a foreground component and a background, allowing us to listen to specific speech sounds from a mixture of sounds. Selective attention plays a crucial role in this process, colloquially known as the “cocktail party e...


Bibliographic Details
Main Authors:	Thakur, Chetan Singh; Wang, Runchun M.; Afshar, Saeed; Hamilton, Tara J.; Tapson, Jonathan C.; Shamma, Shihab A.; van Schaik, André
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2015
Subjects:	Neuroscience
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4557082/ https://www.ncbi.nlm.nih.gov/pubmed/26388721 http://dx.doi.org/10.3389/fnins.2015.00309

author	Thakur, Chetan Singh; Wang, Runchun M.; Afshar, Saeed; Hamilton, Tara J.; Tapson, Jonathan C.; Shamma, Shihab A.; van Schaik, André
author_sort	Thakur, Chetan Singh
collection	PubMed
description	The human auditory system has the ability to segregate complex auditory scenes into a foreground component and a background, allowing us to listen to specific speech sounds from a mixture of sounds. Selective attention plays a crucial role in this process, colloquially known as the “cocktail party effect.” It has not been possible to build a machine that can emulate this human ability in real-time. Here, we have developed a framework for the implementation of a neuromorphic sound segregation algorithm in a Field Programmable Gate Array (FPGA). This algorithm is based on the principles of temporal coherence and uses an attention signal to separate a target sound stream from background noise. Temporal coherence implies that auditory features belonging to the same sound source are coherently modulated and evoke highly correlated neural response patterns. The basis for this form of sound segregation is that responses from pairs of channels that are strongly positively correlated belong to the same stream, while channels that are uncorrelated or anti-correlated belong to different streams. In our framework, we have used a neuromorphic cochlea as a frontend sound analyser to extract spatial information of the sound input, which then passes through band pass filters that extract the sound envelope at various modulation rates. Further stages include feature extraction and mask generation, which is finally used to reconstruct the targeted sound. Using sample tonal and speech mixtures, we show that our FPGA architecture is able to segregate sound sources in real-time. The accuracy of segregation is indicated by the high signal-to-noise ratio (SNR) of the segregated stream (90, 77, and 55 dB for simple tone, complex tone, and speech, respectively) as compared to the SNR of the mixture waveform (0 dB). This system may be easily extended for the segregation of complex speech signals, and may thus find various applications in electronic devices such as for sound segregation and speech recognition.
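The correlation rule stated in the abstract (channels that are strongly positively correlated with the attended channel join its stream; uncorrelated or anti-correlated channels do not) can be illustrated with a small sketch. This is not the authors' FPGA implementation: the function names `pearson`, `coherence_mask`, and `apply_mask`, the 0.5 threshold, and the toy envelopes are all assumptions made for illustration, and the cochlear and modulation-filter stages are omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coherence_mask(envelopes, attended, threshold=0.5):
    """Binary mask per channel: 1 if the channel envelope is strongly and
    positively correlated with the attended channel, else 0.
    The threshold value is an illustrative assumption."""
    reference = envelopes[attended]
    return [1 if pearson(env, reference) > threshold else 0 for env in envelopes]


def apply_mask(envelopes, mask):
    """Zero out the channels that the mask assigns to the background."""
    return [[m * v for v in env] for env, m in zip(envelopes, mask)]


# Toy channel envelopes sampled at 100 Hz for 2 s (hypothetical data).
t = [i / 100 for i in range(200)]
source_a = [1 + math.sin(2 * math.pi * 4 * s) for s in t]  # 4 Hz modulation
source_b = [1 + math.sin(2 * math.pi * 7 * s) for s in t]  # 7 Hz modulation

envelopes = [
    source_a,                    # attended channel
    [0.5 * v for v in source_a], # same source, weaker: correlated
    [2 - v for v in source_a],   # anti-correlated with the attended channel
    source_b,                    # different source: uncorrelated
]

mask = coherence_mask(envelopes, attended=0)
foreground = apply_mask(envelopes, mask)
print(mask)
```

Here the attended channel index plays the role of the attention signal: only channels 0 and 1 are kept in the foreground, while the anti-correlated and uncorrelated channels are masked out.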
format	Online Article Text
id	pubmed-4557082
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-4557082 2015-09-18 Front Neurosci Neuroscience Frontiers Media S.A. 2015-09-02 /pmc/articles/PMC4557082/ /pubmed/26388721 http://dx.doi.org/10.3389/fnins.2015.00309 Text en Copyright © 2015 Thakur, Wang, Afshar, Hamilton, Tapson, Shamma and van Schaik. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Sound stream segregation: a neuromorphic approach to solve the “cocktail party problem” in real-time
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4557082/ https://www.ncbi.nlm.nih.gov/pubmed/26388721 http://dx.doi.org/10.3389/fnins.2015.00309


Similar Items