New Avenues in Audio Intelligence: Towards Holistic Real-life Audio Understanding

Bibliographic Details
Main Authors: Schuller, Björn, Baird, Alice, Gebhard, Alexander, Amiriparian, Shahin, Keren, Gil, Schmitt, Maximilian, Cummins, Nicholas
Format: Online Article Text
Language: English
Published: SAGE Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8581779/
https://www.ncbi.nlm.nih.gov/pubmed/34751066
http://dx.doi.org/10.1177/23312165211046135
author Schuller, Björn
Baird, Alice
Gebhard, Alexander
Amiriparian, Shahin
Keren, Gil
Schmitt, Maximilian
Cummins, Nicholas
collection PubMed
description Computer audition (i.e., intelligent audio) has made great strides in recent years; however, it is still far from achieving holistic hearing abilities, which more appropriately mimic human-like understanding. Within an audio scene, a human listener is quickly able to interpret layers of sound at a single time-point, with each layer varying in characteristics such as location, state, and trait. Current integrated machine listening approaches, on the other hand, mainly recognise only single events. In this context, this contribution aims to provide key insights and approaches that can be applied in computer audition to achieve a more holistic intelligent understanding system, and to identify challenges in reaching this goal. We first summarise the state-of-the-art in traditional signal-processing-based audio pre-processing and feature representation, as well as in automated learning, such as by deep neural networks. This concerns, in particular, audio interpretation, decomposition, understanding, and ontologisation. We then present an agent-based approach for integrating these concepts into a holistic audio understanding system. Based on this, we conclude by outlining avenues towards reaching the ambitious goal of ‘holistic human-parity’ machine listening abilities.
format Online
Article
Text
id pubmed-8581779
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-8581779 2021-11-12 Trends Hear Perspective
SAGE Publications 2021-11-09 /pmc/articles/PMC8581779/ /pubmed/34751066 http://dx.doi.org/10.1177/23312165211046135 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title New Avenues in Audio Intelligence: Towards Holistic Real-life Audio Understanding
topic Perspective
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8581779/
https://www.ncbi.nlm.nih.gov/pubmed/34751066
http://dx.doi.org/10.1177/23312165211046135