NISQE: Non-Intrusive Speech Quality Evaluator Based on Natural Statistics of Mean Subtracted Contrast Normalized Coefficients of Spectrogram

Bibliographic Details
Main authors: Zafar, Shakeel; Nizami, Imran Fareed; Rehman, Mobeen Ur; Majid, Muhammad; Ryu, Jihyoung
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10301095/
https://www.ncbi.nlm.nih.gov/pubmed/37420818
http://dx.doi.org/10.3390/s23125652
author Zafar, Shakeel
Nizami, Imran Fareed
Rehman, Mobeen Ur
Majid, Muhammad
Ryu, Jihyoung
collection PubMed
description As technology has evolved, voice-based communication has gained importance in applications such as online conferencing, online meetings, and voice-over internet protocol (VoIP). Limiting factors such as environmental noise, encoding and decoding of the speech signal, and limitations of technology may degrade the quality of the speech signal, so continuous quality assessment of the speech signal is required. Speech quality assessment (SQA) enables a system to automatically tune network parameters to improve speech quality. Many speech transmitters and receivers used for voice processing, including mobile devices and high-performance computers, can also benefit from SQA, and SQA plays a significant role in the evaluation of speech-processing systems. Non-intrusive speech quality assessment (NI-SQA) is a challenging task because pristine speech signals are unavailable in real-world scenarios. The success of NI-SQA techniques relies heavily on the features used to assess speech quality. Various NI-SQA methods extract features from speech signals in different domains, but they do not take into account the natural structure of the speech signal when assessing its quality. This work proposes an NI-SQA method based on the natural structure of the speech signal, which is approximated using natural spectrogram statistical (NSS) properties derived from the speech signal spectrogram. The pristine version of a speech signal follows a structured natural pattern that is disrupted when distortion is introduced. The deviation of NSS properties between the pristine and distorted speech signals is used to predict speech quality. The proposed methodology outperforms state-of-the-art NI-SQA methods on the Centre for Speech Technology Voice Cloning Toolkit corpus (VCTK-Corpus), with a Spearman's rank-order correlation coefficient (SRC) of 0.902, a Pearson correlation coefficient (PCC) of 0.960, and a root mean squared error (RMSE) of 0.206. On the NOIZEUS-960 database, it achieves an SRC of 0.958, a PCC of 0.960, and an RMSE of 0.114.
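The three figures of merit quoted in the description (SRC, PCC, RMSE) compare predicted quality scores against subjective ones. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names and the `mos` and `predicted` values are hypothetical and chosen only for illustration; they are not data from the article.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation (SRC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def rmse(x, y):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical scores, for illustration only (not results from the article).
mos = [1.5, 2.0, 3.2, 4.1, 4.8]        # subjective quality scores
predicted = [1.7, 1.9, 3.0, 4.4, 4.6]  # scores from some quality model

print("SRC :", spearman(mos, predicted))
print("PCC :", pearson(mos, predicted))
print("RMSE:", rmse(mos, predicted))
```

SRC depends only on the ordering of the scores, so it rewards monotonic agreement, while PCC measures linear agreement and RMSE measures absolute error on the score scale. That is why a method can show a different SRC on two databases while its PCC stays the same.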
format Online
Article
Text
id pubmed-10301095
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10301095 2023-06-29 NISQE: Non-Intrusive Speech Quality Evaluator Based on Natural Statistics of Mean Subtracted Contrast Normalized Coefficients of Spectrogram Zafar, Shakeel; Nizami, Imran Fareed; Rehman, Mobeen Ur; Majid, Muhammad; Ryu, Jihyoung Sensors (Basel) Article MDPI 2023-06-16 /pmc/articles/PMC10301095/ /pubmed/37420818 http://dx.doi.org/10.3390/s23125652 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title NISQE: Non-Intrusive Speech Quality Evaluator Based on Natural Statistics of Mean Subtracted Contrast Normalized Coefficients of Spectrogram
topic Article