Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions
Development of a robust native-language ASR framework is very challenging and remains an active area of research. Effective front-end and back-end approaches need to be investigated to tackle environmental differences, large training complexity, and inter-speaker var...
| Main Authors: | Bawa, Puneet; Kadyan, Virender; Tripathy, Abinash; Singh, Thipendra P. |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | Springer International Publishing, 2022 |
| Subjects: | Original Article |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9160864/ https://www.ncbi.nlm.nih.gov/pubmed/35668730 http://dx.doi.org/10.1007/s40747-022-00651-7 |
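The abstract mentions evaluating front-end features such as MFCC at different SNR values. As a rough, illustrative sketch only (not code from the paper), the snippet below mixes a noise recording into a clean utterance at a chosen SNR and then extracts MFCC features with librosa; the file names, the 16 kHz sampling rate, and the 13-coefficient setting are assumptions made here for the example.

```python
# Illustrative sketch (not the authors' pipeline): add noise at a target SNR
# and extract MFCC features from the resulting noisy utterance.
import numpy as np
import librosa

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then mix."""
    noise = np.resize(noise, clean.shape)            # loop/trim noise to utterance length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12            # avoid division by zero
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

# Hypothetical file paths; any mono WAV files would do.
clean, sr = librosa.load("utterance.wav", sr=16000)
noise, _ = librosa.load("babble.wav", sr=16000)

noisy = mix_at_snr(clean, noise, snr_db=5.0)               # e.g. a 5 dB SNR condition
mfcc = librosa.feature.mfcc(y=noisy, sr=sr, n_mfcc=13)     # 13 MFCCs per frame
print(mfcc.shape)                                          # (13, n_frames)
```

The same mixing step could feed any of the other front-ends named in the abstract (GFCC, RASTA-PLP, PNCC); only the feature-extraction call would change.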
_version_ | 1784719361129316352 |
author | Bawa, Puneet Kadyan, Virender Tripathy, Abinash Singh, Thipendra P. |
author_facet | Bawa, Puneet Kadyan, Virender Tripathy, Abinash Singh, Thipendra P. |
author_sort | Bawa, Puneet |
collection | PubMed |
description | Development of a robust native-language ASR framework is very challenging and remains an active area of research. Effective front-end and back-end approaches need to be investigated to tackle environmental differences, large training complexity, and inter-speaker variability, all of which determine the success of a recognition system. In this paper, four front-end approaches, mel-frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), relative spectral perceptual linear prediction (RASTA-PLP), and power-normalized cepstral coefficients (PNCC), have been investigated to generate distinctive and robust feature vectors at different SNR values. Furthermore, to handle the large training-data complexity, parameter optimization has been performed with the sequence-discriminative training techniques maximum mutual information (MMI), minimum phone error (MPE), boosted-MMI (bMMI), and state-level minimum Bayes risk (sMBR), selecting optimal parameter values through lattice generation and learning-rate adjustment. In the proposed framework, four different systems have been tested by varying the feature-extraction approach (with or without speaker normalization of the test set through Vocal Tract Length Normalization, VTLN) and the classification strategy (with or without artificial extension of the training set). To compare the performance of each system, matched (adult train and test, S1; child train and test, S2) and mismatched (adult train and child test, S3; adult + child train and child test, S4) systems have been evaluated on a large adult and a very small Punjabi clean-speech corpus. Gender-based in-domain data augmentation is then used to moderate the acoustic and phonetic variation between adult and children's speech under mismatched conditions. The experimental results show that a framework built on the PNCC + VTLN front-end with a TDNN-sMBR model and parameter optimization yields relative improvements (RI) of 40.18%, 47.51%, and 49.87% for the matched, mismatched, and gender-based in-domain augmented systems, respectively, under typical clean and noisy conditions. |
format | Online Article Text |
id | pubmed-9160864 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-9160864 2022-06-02 Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions Bawa, Puneet Kadyan, Virender Tripathy, Abinash Singh, Thipendra P. Complex Intell Systems Original Article Development of a robust native-language ASR framework is very challenging and remains an active area of research. Effective front-end and back-end approaches need to be investigated to tackle environmental differences, large training complexity, and inter-speaker variability, all of which determine the success of a recognition system. In this paper, four front-end approaches, mel-frequency cepstral coefficients (MFCC), Gammatone frequency cepstral coefficients (GFCC), relative spectral perceptual linear prediction (RASTA-PLP), and power-normalized cepstral coefficients (PNCC), have been investigated to generate distinctive and robust feature vectors at different SNR values. Furthermore, to handle the large training-data complexity, parameter optimization has been performed with the sequence-discriminative training techniques maximum mutual information (MMI), minimum phone error (MPE), boosted-MMI (bMMI), and state-level minimum Bayes risk (sMBR), selecting optimal parameter values through lattice generation and learning-rate adjustment. In the proposed framework, four different systems have been tested by varying the feature-extraction approach (with or without speaker normalization of the test set through Vocal Tract Length Normalization, VTLN) and the classification strategy (with or without artificial extension of the training set). To compare the performance of each system, matched (adult train and test, S1; child train and test, S2) and mismatched (adult train and child test, S3; adult + child train and child test, S4) systems have been evaluated on a large adult and a very small Punjabi clean-speech corpus. Gender-based in-domain data augmentation is then used to moderate the acoustic and phonetic variation between adult and children's speech under mismatched conditions. The experimental results show that a framework built on the PNCC + VTLN front-end with a TDNN-sMBR model and parameter optimization yields relative improvements (RI) of 40.18%, 47.51%, and 49.87% for the matched, mismatched, and gender-based in-domain augmented systems, respectively, under typical clean and noisy conditions. Springer International Publishing 2022-06-02 2023 /pmc/articles/PMC9160864/ /pubmed/35668730 http://dx.doi.org/10.1007/s40747-022-00651-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Original Article Bawa, Puneet Kadyan, Virender Tripathy, Abinash Singh, Thipendra P. Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions |
title | Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions |
title_full | Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions |
title_fullStr | Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions |
title_full_unstemmed | Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions |
title_short | Developing sequentially trained robust Punjabi speech recognition system under matched and mismatched conditions |
title_sort | developing sequentially trained robust punjabi speech recognition system under matched and mismatched conditions |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9160864/ https://www.ncbi.nlm.nih.gov/pubmed/35668730 http://dx.doi.org/10.1007/s40747-022-00651-7 |
work_keys_str_mv | AT bawapuneet developingsequentiallytrainedrobustpunjabispeechrecognitionsystemundermatchedandmismatchedconditions AT kadyanvirender developingsequentiallytrainedrobustpunjabispeechrecognitionsystemundermatchedandmismatchedconditions AT tripathyabinash developingsequentiallytrainedrobustpunjabispeechrecognitionsystemundermatchedandmismatchedconditions AT singhthipendrap developingsequentiallytrainedrobustpunjabispeechrecognitionsystemundermatchedandmismatchedconditions |
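The description above reports relative improvement (RI) figures of 40.18%, 47.51%, and 49.87%. Assuming RI is the usual relative reduction in word error rate (WER) over a baseline system, which the record itself does not spell out, a minimal sketch of that computation looks like the following; the WER values in the example are hypothetical and not taken from the paper.

```python
# Minimal sketch: relative improvement (RI) as the percentage reduction in WER
# of an improved system over a baseline. The numbers below are made up.
def relative_improvement(wer_baseline: float, wer_new: float) -> float:
    """Return the relative WER reduction, in percent, over the baseline."""
    return 100.0 * (wer_baseline - wer_new) / wer_baseline

# Hypothetical example: baseline WER 20.0%, improved system WER 12.0%
print(relative_improvement(20.0, 12.0))  # -> 40.0, i.e. a 40% relative improvement
```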