Non-intrusive speech quality assessment with attention-based ResNet-BiLSTM

Speech quality is frequently affected by a variety of factors in online conferencing applications, such as background noise, reverberation, packet loss and network jitter. In real scenarios, it is impossible to obtain a clean reference signal for evaluating the quality of the conferencing speech. There...

Bibliographic Details
Main authors: Shen, Kailai; Yan, Diqun; Ye, Zhe; Xu, Xianbo; Gao, JinXing; Dong, Li; Peng, Chengbin; Yang, Kun
Format: Online Article Text
Language: English
Published: Springer London 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10088708/
https://www.ncbi.nlm.nih.gov/pubmed/37362228
http://dx.doi.org/10.1007/s11760-023-02559-2
_version_ 1785022621196222464
author Shen, Kailai
Yan, Diqun
Ye, Zhe
Xu, Xianbo
Gao, JinXing
Dong, Li
Peng, Chengbin
Yang, Kun
author_facet Shen, Kailai
Yan, Diqun
Ye, Zhe
Xu, Xianbo
Gao, JinXing
Dong, Li
Peng, Chengbin
Yang, Kun
author_sort Shen, Kailai
collection PubMed
description Speech quality is frequently affected by a variety of factors in online conferencing applications, such as background noise, reverberation, packet loss and network jitter. In real scenarios, it is impossible to obtain a clean reference signal for evaluating the quality of the conferencing speech. Therefore, an effective non-intrusive speech quality assessment (NISQA) method is necessary. In this paper, we propose a new network framework for NISQA based on ResNet and BiLSTM. ResNet is utilized to extract local features, while BiLSTM is used to integrate representative features with long-term time dependencies and sequential characteristics. Considering that ResNet may result in the loss of context information when applied to the NISQA task, we propose a variant of ResNet which can preserve the time series information of the conferencing speech. The experimental results demonstrate that the proposed method has a high correlation with the mean opinion score of clean, noisy and processed speech.
format Online
Article
Text
id pubmed-10088708
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer London
record_format MEDLINE/PubMed
spelling pubmed-100887082023-04-12 Non-intrusive speech quality assessment with attention-based ResNet-BiLSTM Shen, Kailai Yan, Diqun Ye, Zhe Xu, Xianbo Gao, JinXing Dong, Li Peng, Chengbin Yang, Kun Signal Image Video Process Original Paper Speech quality is frequently affected by a variety of factors in online conferencing applications, such as background noise, reverberation, packet loss and network jitter. In real scenarios, it is impossible to obtain a clean reference signal for evaluating the quality of the conferencing speech. Therefore, an effective non-intrusive speech quality assessment (NISQA) method is necessary. In this paper, we propose a new network framework for NISQA based on ResNet and BiLSTM. ResNet is utilized to extract local features, while BiLSTM is used to integrate representative features with long-term time dependencies and sequential characteristics. Considering that ResNet may result in the loss of context information when applied to the NISQA task, we propose a variant of ResNet which can preserve the time series information of the conferencing speech. The experimental results demonstrate that the proposed method has a high correlation with the mean opinion score of clean, noisy and processed speech. Springer London 2023-04-10 /pmc/articles/PMC10088708/ /pubmed/37362228 http://dx.doi.org/10.1007/s11760-023-02559-2 Text en © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source.
These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Original Paper
Shen, Kailai
Yan, Diqun
Ye, Zhe
Xu, Xianbo
Gao, JinXing
Dong, Li
Peng, Chengbin
Yang, Kun
Non-intrusive speech quality assessment with attention-based ResNet-BiLSTM
title Non-intrusive speech quality assessment with attention-based ResNet-BiLSTM
title_full Non-intrusive speech quality assessment with attention-based ResNet-BiLSTM
title_fullStr Non-intrusive speech quality assessment with attention-based ResNet-BiLSTM
title_full_unstemmed Non-intrusive speech quality assessment with attention-based ResNet-BiLSTM
title_short Non-intrusive speech quality assessment with attention-based ResNet-BiLSTM
title_sort non-intrusive speech quality assessment with attention-based resnet-bilstm
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10088708/
https://www.ncbi.nlm.nih.gov/pubmed/37362228
http://dx.doi.org/10.1007/s11760-023-02559-2
work_keys_str_mv AT shenkailai nonintrusivespeechqualityassessmentwithattentionbasedresnetbilstm
AT yandiqun nonintrusivespeechqualityassessmentwithattentionbasedresnetbilstm
AT yezhe nonintrusivespeechqualityassessmentwithattentionbasedresnetbilstm
AT xuxianbo nonintrusivespeechqualityassessmentwithattentionbasedresnetbilstm
AT gaojinxing nonintrusivespeechqualityassessmentwithattentionbasedresnetbilstm
AT dongli nonintrusivespeechqualityassessmentwithattentionbasedresnetbilstm
AT pengchengbin nonintrusivespeechqualityassessmentwithattentionbasedresnetbilstm
AT yangkun nonintrusivespeechqualityassessmentwithattentionbasedresnetbilstm