A Novel Voice Sensor for the Detection of Speech Signals
Developing a novel voice sensor to detect human voices requires features that are robust to noise. Voice sensing of this kind is also called voice activity detection (VAD). Because formant structure inherently appears only in the speech spectrogram (wel...
Main author: | Wang, Kun-Ching |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Molecular Diversity Preservation International (MDPI) 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3892860/ https://www.ncbi.nlm.nih.gov/pubmed/24316566 http://dx.doi.org/10.3390/s131216533 |
_version_ | 1782299597753810944 |
---|---|
author | Wang, Kun-Ching |
author_facet | Wang, Kun-Ching |
author_sort | Wang, Kun-Ching |
collection | PubMed |
description | Developing a novel voice sensor to detect human voices requires features that are robust to noise. Voice sensing of this kind is also called voice activity detection (VAD). Because formant structure inherently appears only in the speech spectrogram (known as the voiceprint), Wu et al. were the first to use band-spectral entropy (BSE) to describe the characteristics of voiceprints. However, the performance of BSE-based VAD degrades in colored-noise (or voiceprint-like noise) environments. To solve this problem, we propose the two-dimensional part-band energy entropy (TD-PBEE) parameter, which improves the BSE-based VAD algorithm through two variables: the part-band partition number along the frequency index and the long-term window size along the time index. The first variable efficiently represents the characteristics of voiceprints in each critical frequency band, and the second brings long-term information to noisy speech spectrograms. The TD-PBEE parameter can be regarded as a PBEE parameter over time. First, the strength of voiceprints is partly enhanced by applying four entropies to four part-bands; these four part-band energy entropies describe the voiceprints in detail. Because speech and various noises are non-stationary, we then use long-term information processing to refine the PBEE, so that voice-like noise can be distinguished from noisy speech. Our experiments show that feature extraction with the TD-PBEE parameter is quite insensitive to background noise. The proposed TD-PBEE-based VAD algorithm is evaluated for four types of noise and five signal-to-noise ratio (SNR) levels. Averaged over all noises and all SNR levels, its accuracy is better than that of the other VAD algorithms considered. |
format | Online Article Text |
id | pubmed-3892860 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Molecular Diversity Preservation International (MDPI) |
record_format | MEDLINE/PubMed |
spelling | pubmed-3892860 2014-01-16 A Novel Voice Sensor for the Detection of Speech Signals Wang, Kun-Ching Sensors (Basel) Article Molecular Diversity Preservation International (MDPI) 2013-12-02 /pmc/articles/PMC3892860/ /pubmed/24316566 http://dx.doi.org/10.3390/s131216533 Text en © 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/). |
spellingShingle | Article Wang, Kun-Ching A Novel Voice Sensor for the Detection of Speech Signals |
title | A Novel Voice Sensor for the Detection of Speech Signals |
title_full | A Novel Voice Sensor for the Detection of Speech Signals |
title_fullStr | A Novel Voice Sensor for the Detection of Speech Signals |
title_full_unstemmed | A Novel Voice Sensor for the Detection of Speech Signals |
title_short | A Novel Voice Sensor for the Detection of Speech Signals |
title_sort | novel voice sensor for the detection of speech signals |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3892860/ https://www.ncbi.nlm.nih.gov/pubmed/24316566 http://dx.doi.org/10.3390/s131216533 |
work_keys_str_mv | AT wangkunching anovelvoicesensorforthedetectionofspeechsignals AT wangkunching novelvoicesensorforthedetectionofspeechsignals |
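
The description field above outlines two ideas: entropies computed over four part-bands of the spectrum, and refinement with long-term information over time. The following Python sketch illustrates that general idea only. It is not the article's TD-PBEE formula, which this record does not give. The function names, the equal-width split into part-bands (the article refers to critical bands), the Hann window, the frame and hop sizes, and the moving-average smoothing are all assumptions made for illustration.

```python
import numpy as np


def part_band_entropies(frame, n_parts=4, eps=1e-12):
    """Shannon entropy of the normalized power spectrum in each part-band.

    Assumption: part-bands are equal-width slices of the spectrum,
    not the critical bands mentioned in the record.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    bands = np.array_split(power[1:], n_parts)  # drop the DC bin (assumption)
    entropies = []
    for band in bands:
        p = band / (band.sum() + eps)  # energy distribution inside the band
        entropies.append(-np.sum(p * np.log(p + eps)))
    return np.array(entropies)


def long_term_feature(signal, frame_len=256, hop=128, n_parts=4, window=5):
    """Sum of part-band entropies per frame, averaged over `window` frames.

    Assumption: `window` is odd, so the output has one value per frame.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    per_frame = np.array([
        part_band_entropies(signal[i * hop:i * hop + frame_len], n_parts).sum()
        for i in range(n_frames)
    ])
    pad = window // 2
    padded = np.pad(per_frame, pad, mode="edge")  # repeat edge values
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")
```

In this sketch, a spectrum with strong harmonic structure concentrates energy in a few bins and gives a lower value than a flat, noise-like spectrum. A detector could compare the smoothed feature with a threshold, but the threshold and the decision rule would have to be chosen separately; neither is specified in this record.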