Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations

The classification of vocal individuality for passive acoustic monitoring (PAM) and census of animals is becoming an increasingly popular area of research. Nearly all studies in this field of inquiry have relied on classic audio representations and classifiers, such as Support Vector Machines (SVMs)...

Bibliographic Details
Main Authors: Trapanotto, Martino, Nanni, Loris, Brahnam, Sheryl, Guo, Xiang
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9029749/
https://www.ncbi.nlm.nih.gov/pubmed/35448223
http://dx.doi.org/10.3390/jimaging8040096
_version_ 1784691956495941632
author Trapanotto, Martino
Nanni, Loris
Brahnam, Sheryl
Guo, Xiang
collection PubMed
description The classification of vocal individuality for passive acoustic monitoring (PAM) and census of animals is becoming an increasingly popular area of research. Nearly all studies in this field of inquiry have relied on classic audio representations and classifiers, such as Support Vector Machines (SVMs) trained on spectrograms or Mel-Frequency Cepstral Coefficients (MFCCs). In contrast, most current bioacoustic species classification exploits the power of deep learners and more cutting-edge audio representations. A significant reason for avoiding deep learning in vocal identity classification is the tiny sample size in the collections of labeled individual vocalizations. As is well known, deep learners require large datasets to avoid overfitting. One way to handle small datasets with deep learning methods is to use transfer learning. In this work, we evaluate the performance of three pretrained CNNs (VGG16, ResNet50, and AlexNet) on a small, publicly available lion roar dataset containing approximately 150 samples taken from five male lions. Each of these networks is retrained on eight representations of the samples: MFCCs, spectrogram, and Mel spectrogram, along with several new ones, such as VGGish and Stockwell, and those based on the recently proposed LM spectrogram. The performance of these networks, both individually and in ensembles, is analyzed and corroborated using the Equal Error Rate and shown to surpass previous classification attempts on this dataset; the best single network achieved over 95% accuracy and the best ensembles over 98% accuracy. The contributions this study makes to the field of individual vocal classification include demonstrating that it is valuable and possible, with caution, to use transfer learning with single pretrained CNNs on the small datasets available for this problem domain. We also make a contribution to bioacoustics generally by offering a comparison of the performance of many state-of-the-art audio representations, including for the first time the LM spectrogram and Stockwell representations. All source code for this study is available on GitHub.
format Online
Article
Text
id pubmed-9029749
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9029749 2022-04-23 Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations Trapanotto, Martino Nanni, Loris Brahnam, Sheryl Guo, Xiang J Imaging Article MDPI 2022-04-01 /pmc/articles/PMC9029749/ /pubmed/35448223 http://dx.doi.org/10.3390/jimaging8040096 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9029749/
https://www.ncbi.nlm.nih.gov/pubmed/35448223
http://dx.doi.org/10.3390/jimaging8040096
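
The abstract in this record states that results were corroborated using the Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The record does not say how the authors computed it, so the Python sketch below shows only one common way to do so. The function name `equal_error_rate`, the one-vs-rest treatment of class scores, and the toy probabilities are illustrative assumptions and are not taken from the study.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Estimate the EER of a set of verification trials.

    scores: one score per trial, higher meaning "more likely genuine".
    labels: 1/True for genuine trials, 0/False for impostor trials.
    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    if labels.all() or not labels.any():
        raise ValueError("need at least one genuine and one impostor trial")

    genuine, impostor = scores[labels], scores[~labels]

    # Candidate thresholds: every observed score, plus one value above the
    # maximum so that "reject everything" is also considered.
    thresholds = np.append(np.unique(scores), scores.max() + 1.0)

    # A trial is accepted when its score is >= the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects

    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]


# Toy example (made-up numbers): class probabilities for three samples and
# three classes. Every (sample, class) pair becomes one trial; the pair is
# "genuine" when the class is the sample's true class (one-vs-rest).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
true_class = np.array([0, 1, 2])
is_genuine = np.eye(probs.shape[1], dtype=bool)[true_class]

demo_eer, demo_threshold = equal_error_rate(probs.ravel(), is_genuine.ravel())
print(f"EER = {demo_eer:.3f} at threshold {demo_threshold:.2f}")
```

With only about 150 samples, as in the dataset described above, an EER estimated this way moves in coarse steps, so it is best read alongside accuracy rather than in place of it.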