Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning

Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make classification useful in practice, it is crucial to improve its accuracy while ensuring that it can run at big data...

Bibliographic Details
Main authors:	Stowell, Dan; Plumbley, Mark D.
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2014
Subjects:	Ecology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4106198/ https://www.ncbi.nlm.nih.gov/pubmed/25083350 http://dx.doi.org/10.7717/peerj.488

author	Stowell, Dan Plumbley, Mark D.
collection	PubMed
description	Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make classification useful in practice, it is crucial to improve its accuracy while ensuring that it can run at big data scales. Many approaches use acoustic measures based on spectrogram-type data, such as the Mel-frequency cepstral coefficient (MFCC) features which represent a manually-designed summary of spectral information. However, recent work in machine learning has demonstrated that features learnt automatically from data can often outperform manually-designed feature transforms. Feature learning can be performed at large scale and “unsupervised”, meaning it requires no manual data labelling, yet it can improve performance on “supervised” tasks such as classification. In this work we introduce a technique for feature learning from large volumes of bird sound recordings, inspired by techniques that have proven useful in other domains. We experimentally compare twelve different feature representations derived from the Mel spectrum (of which six use this technique), using four large and diverse databases of bird vocalisations, classified using a random forest classifier. We demonstrate that in our classification tasks, MFCCs can often lead to worse performance than the raw Mel spectral data from which they are derived. Conversely, we demonstrate that unsupervised feature learning provides a substantial boost over MFCCs and Mel spectra without adding computational complexity after the model has been trained. The boost is particularly notable for single-label classification tasks at large scale. The spectro-temporal activations learned through our procedure resemble spectro-temporal receptive fields calculated from avian primary auditory forebrain. 
However, for one of our datasets, which contains substantial audio data but few annotations, increased performance is not discernible. We study the interaction between dataset characteristics and choice of feature representation through further empirical analysis.
format	Online Article Text
id	pubmed-4106198
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-4106198 2014-07-31 Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning Stowell, Dan Plumbley, Mark D. PeerJ Ecology PeerJ Inc. 2014-07-17 /pmc/articles/PMC4106198/ /pubmed/25083350 http://dx.doi.org/10.7717/peerj.488 Text en © 2014 Stowell and Plumbley http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning
topic	Ecology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4106198/ https://www.ncbi.nlm.nih.gov/pubmed/25083350 http://dx.doi.org/10.7717/peerj.488
