
Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs

Deep convolutional neural networks (DCNNs) have achieved breakthrough performance on bird species identification using a spectrogram of bird vocalization. Aiming at the imbalance of the bird vocalization dataset, a single feature identification model (SFIM) with residual blocks and modified, weighte...


Bibliographic Details
Main authors: Zhang, Feiyu; Zhang, Luyang; Chen, Hongxiang; Xie, Jiangjian
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8624801/
https://www.ncbi.nlm.nih.gov/pubmed/34828205
http://dx.doi.org/10.3390/e23111507
author Zhang, Feiyu
Zhang, Luyang
Chen, Hongxiang
Xie, Jiangjian
collection PubMed
description Deep convolutional neural networks (DCNNs) have achieved breakthrough performance in bird species identification from spectrograms of bird vocalizations. To address the class imbalance of the bird vocalization dataset, a single feature identification model (SFIM) with residual blocks and a modified, weighted cross-entropy function was proposed. To further improve identification accuracy, two multi-channel fusion methods were built from three SFIMs. One fused the outputs of the feature extraction parts of the three SFIMs (feature fusion mode); the other fused the outputs of the classifiers of the three SFIMs (result fusion mode). The SFIMs were trained on three different kinds of spectrograms, calculated through the short-time Fourier transform, the mel-frequency cepstrum transform and the chirplet transform, respectively. To cope with the large number of trainable model parameters, transfer learning was used in the multi-channel models. With our own vocalization dataset as the sample set, the result fusion mode model outperformed the other proposed models, with the best mean average precision (MAP) reaching 0.914. A comparison of three spectrogram durations (100 ms, 300 ms and 500 ms) showed that 300 ms was the best for our own dataset. We suggest determining the duration from the duration distribution of bird syllables. On the training dataset of BirdCLEF2019, the highest classification mean average precision (cmAP) reached 0.135, which suggests that the proposed model has some generalization ability.
format Online
Article
Text
id pubmed-8624801
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8624801 2021-11-27 Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs Zhang, Feiyu Zhang, Luyang Chen, Hongxiang Xie, Jiangjian Entropy (Basel) Article MDPI 2021-11-13 /pmc/articles/PMC8624801/ /pubmed/34828205 http://dx.doi.org/10.3390/e23111507 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs
topic Article