Modeling Timbre Similarity of Short Music Clips


Bibliographic Details

Main Authors: Siedenburg, Kai, Müllensiefen, Daniel

Format: Online Article Text

Language: English

Published: Frontiers Media S.A. 2017

Subjects:

Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5405345/
https://www.ncbi.nlm.nih.gov/pubmed/28491045
http://dx.doi.org/10.3389/fpsyg.2017.00639
_version_ 1783231746431516672
author Siedenburg, Kai
Müllensiefen, Daniel
author_facet Siedenburg, Kai
Müllensiefen, Daniel
author_sort Siedenburg, Kai
collection PubMed
description There is evidence from a number of recent studies that most listeners are able to extract information related to song identity, emotion, or genre from music excerpts with durations in the range of tenths of seconds. Because of these very short durations, timbre as a multifaceted auditory attribute appears to be a plausible candidate for the type of features that listeners make use of when processing short music excerpts. However, the importance of timbre in listening tasks that involve short excerpts has not yet been demonstrated empirically. Hence, the goal of this study was to develop a method for exploring to what degree similarity judgments of short music clips can be modeled with low-level acoustic features related to timbre. We utilized similarity data from two large samples of participants: Sample I was obtained via an online survey, used 16 clips of 400 ms length, and contained responses from 137,339 participants. Sample II was collected in a lab environment, used 16 clips of 800 ms length, and contained responses from 648 participants. Our model used two sets of audio features, comprising commonly used timbre descriptors and the well-known Mel-frequency cepstral coefficients as well as their temporal derivatives. To predict pairwise similarities, the resulting distances between clips in terms of their audio features were used as predictor variables in partial least-squares regression. We found that a sparse selection of three to seven features from both descriptor sets—mainly encoding the coarse shape of the spectrum as well as spectrotemporal variability—best predicted similarities across the two sets of sounds. Notably, the inclusion of the non-acoustic predictors musical genre and record release date allowed much better generalization performance and explained up to 50% of the shared variance (R²) between observations and model predictions. Overall, the results of this study empirically demonstrate that both acoustic features related to timbre and higher-level categorical features such as musical genre play a major role in the perception of short music clips.
format Online
Article
Text
id pubmed-5405345
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-54053452017-05-10 Modeling Timbre Similarity of Short Music Clips Siedenburg, Kai Müllensiefen, Daniel Front Psychol Psychology Frontiers Media S.A. 2017-04-26 /pmc/articles/PMC5405345/ /pubmed/28491045 http://dx.doi.org/10.3389/fpsyg.2017.00639 Text en Copyright © 2017 Siedenburg and Müllensiefen. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Psychology
Siedenburg, Kai
Müllensiefen, Daniel
Modeling Timbre Similarity of Short Music Clips
title Modeling Timbre Similarity of Short Music Clips
title_full Modeling Timbre Similarity of Short Music Clips
title_fullStr Modeling Timbre Similarity of Short Music Clips
title_full_unstemmed Modeling Timbre Similarity of Short Music Clips
title_short Modeling Timbre Similarity of Short Music Clips
title_sort modeling timbre similarity of short music clips
topic Psychology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5405345/
https://www.ncbi.nlm.nih.gov/pubmed/28491045
http://dx.doi.org/10.3389/fpsyg.2017.00639
work_keys_str_mv AT siedenburgkai modelingtimbresimilarityofshortmusicclips
AT mullensiefendaniel modelingtimbresimilarityofshortmusicclips