
Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Multi-Attention Module through Speech Spectrograms

Speech signals are being used as a primary input source in human–computer interaction (HCI) to develop several applications, such as automatic speech recognition (ASR), speech emotion recognition (SER), gender, and age recognition. Classifying speakers according to their age and gender is a challeng...


Bibliographic Details
Main authors:	Tursunov, Anvarjon; Mustaqeem; Choeh, Joon Yeon; Kwon, Soonil
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8434188/ https://www.ncbi.nlm.nih.gov/pubmed/34502785 http://dx.doi.org/10.3390/s21175892

_version_	1783751539044646912
author	Tursunov, Anvarjon; Mustaqeem; Choeh, Joon Yeon; Kwon, Soonil
author_sort	Tursunov, Anvarjon
collection	PubMed
description	Speech signals are being used as a primary input source in human–computer interaction (HCI) to develop several applications, such as automatic speech recognition (ASR), speech emotion recognition (SER), and gender and age recognition. Classifying speakers according to their age and gender is a challenging task in speech processing owing to the limitations of current methods for extracting salient high-level speech features and of current classification models. To address these problems, we introduce a novel end-to-end convolutional neural network (CNN) with a specially designed multi-attention module (MAM) for age and gender recognition from speech signals. Our proposed model uses the MAM to extract salient spatial and temporal features from the input data effectively. The MAM uses a rectangular filter as the kernel in its convolution layers and comprises two separate attention mechanisms, one for time and one for frequency. The time attention branch learns to detect temporal cues, whereas the frequency attention branch extracts the features most relevant to the target by focusing on the spatial frequency features. The extracted spatial and temporal features complement one another and provide high performance in age and gender classification. The proposed age and gender classification system was tested using the Common Voice dataset and a locally developed Korean speech recognition dataset. Our model achieved accuracy scores of 96%, 73%, and 76% for gender, age, and age-gender classification, respectively, on the Common Voice dataset. The results on the Korean speech recognition dataset were 97%, 97%, and 90% for gender, age, and age-gender recognition, respectively. The prediction performance obtained in the experiments demonstrated the superiority and robustness of the proposed model on the age, gender, and age-gender recognition tasks from speech signals.
format	Online Article Text
id	pubmed-8434188
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8434188 2021-09-12 Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Multi-Attention Module through Speech Spectrograms Tursunov, Anvarjon; Mustaqeem; Choeh, Joon Yeon; Kwon, Soonil Sensors (Basel) Article MDPI 2021-09-01 /pmc/articles/PMC8434188/ /pubmed/34502785 http://dx.doi.org/10.3390/s21175892 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Multi-Attention Module through Speech Spectrograms
topic	Article

Similar items