
Development of a Deep Neural Network for Speeding Up a Model of Loudness for Time-Varying Sounds

The “time-varying loudness” (TVL) model of Glasberg and Moore calculates “instantaneous loudness” every 1 ms, and this is used to generate predictions of short-term loudness, the loudness of a short segment of sound, such as a word in a sentence, and of long-term loudness, the loudness of a longer s...


Bibliographic Details
Main authors:	Schlittenlacher, Josef; Turner, Richard E.; Moore, Brian C. J.
Format:	Online Article Text
Language:	English
Published:	SAGE Publications 2020
Subjects:	2019 ISAAR special collection
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7457659/ https://www.ncbi.nlm.nih.gov/pubmed/32853098 http://dx.doi.org/10.1177/2331216520943074

author	Schlittenlacher, Josef; Turner, Richard E.; Moore, Brian C. J.
collection	PubMed
description	The “time-varying loudness” (TVL) model of Glasberg and Moore calculates “instantaneous loudness” every 1 ms, and this is used to generate predictions of short-term loudness, the loudness of a short segment of sound, such as a word in a sentence, and of long-term loudness, the loudness of a longer segment of sound, such as a whole sentence. The calculation of instantaneous loudness is computationally intensive and real-time implementation of the TVL model is difficult. To speed up the computation, a deep neural network (DNN) was trained to predict instantaneous loudness using a large database of speech sounds and artificial sounds (tones alone and tones in white or pink noise), with the predictions of the TVL model as a reference (providing the “correct” answer, specifically the loudness level in phons). A multilayer perceptron with three hidden layers was found to be sufficient, with more complex DNN architecture not yielding higher accuracy. After training, the deviations between the predictions of the TVL model and the predictions of the DNN were typically less than 0.5 phons, even for types of sounds that were not used for training (music, rain, animal sounds, and washing machine). The DNN calculates instantaneous loudness over 100 times more quickly than the TVL model. Possible applications of the DNN are discussed.
format	Online Article Text
id	pubmed-7457659
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	SAGE Publications
record_format	MEDLINE/PubMed
spelling	pubmed-7457659 2020-09-11 Trends Hear 2019 ISAAR special collection SAGE Publications 2020-08-27 Text en © The Author(s) 2020 https://creativecommons.org/licenses/by/4.0/ Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title	Development of a Deep Neural Network for Speeding Up a Model of Loudness for Time-Varying Sounds
topic	2019 ISAAR special collection
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7457659/ https://www.ncbi.nlm.nih.gov/pubmed/32853098 http://dx.doi.org/10.1177/2331216520943074


Similar Items