
Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection

Voice is a physiological biometric characteristic that differs from one person to another. Because of this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotional state, identity verificati...


Bibliographic Details
Main authors: Fong, Simon; Lan, Kun; Wong, Raymond
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation, 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3830839/
https://www.ncbi.nlm.nih.gov/pubmed/24288684
http://dx.doi.org/10.1155/2013/720834
author Fong, Simon
Lan, Kun
Wong, Raymond
collection PubMed
description Voice is a physiological biometric characteristic that differs from one person to another. Because of this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotional state, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features for training a classification model, based on a piecewise transformation that treats an audio waveform as a time series. Using SFX we can faithfully remodel the statistical characteristics of the time series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is used to select only the influential features for classification model induction. We focus on comparing the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely, Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the claim that heuristically choosing significant features from both the time and frequency domains produces better performance in voice classification than traditional signal processing techniques alone, such as wavelets and LPC-to-CC.
format Online
Article
Text
id pubmed-3830839
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3830839 2013-11-28 Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection Fong, Simon; Lan, Kun; Wong, Raymond Biomed Res Int Research Article Hindawi Publishing Corporation 2013 2013-10-29 /pmc/articles/PMC3830839/ /pubmed/24288684 http://dx.doi.org/10.1155/2013/720834 Text en Copyright © 2013 Simon Fong et al.
https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection
topic Research Article