Audio Augmentation for Non-Native Children’s Speech Recognition through Discriminative Learning

Bibliographic Details
Main Authors: Radha, Kodali; Bansal, Mohan
Format: Online article (text)
Language: English
Published: MDPI 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9601443/
https://www.ncbi.nlm.nih.gov/pubmed/37420510
http://dx.doi.org/10.3390/e24101490
collection PubMed
description Automatic speech recognition (ASR) for children is a rapidly evolving field: children are increasingly accustomed to interacting with virtual assistants such as Amazon Echo, Cortana, and other smart speakers, and the field has advanced human–computer interaction in recent generations. However, non-native children exhibit a diverse range of reading errors during second language (L2) acquisition, such as lexical disfluency, hesitations, intra-word switching, and word repetitions. These errors are not yet addressed, so ASR systems struggle to recognize non-native children’s speech. The main objective of this study is to develop a non-native children’s speech recognition system on top of feature-space discriminative models, such as feature-space maximum mutual information (fMMI) and boosted feature-space maximum mutual information (fbMMI). Combining these models with speed perturbation-based data augmentation of the original children’s speech corpora yields effective performance. The corpus covers different speaking styles of children, including read and spontaneous speech, in order to investigate the impact of non-native children’s L2 speaking proficiency on speech recognition systems. The experiments revealed that feature-space MMI models with steadily increasing speed perturbation factors outperform traditional ASR baseline models.
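The record does not say how the speed perturbation was implemented. A common approach (for example, the SoX `speed` effect) resamples the waveform so that duration and pitch change together, and keeps each original utterance alongside the perturbed copies. The sketch below illustrates that idea in Python/NumPy under stated assumptions: the function names `speed_perturb` and `augment_corpus`, the default factors 0.9/1.0/1.1, and the `sp<factor>-` utterance-ID prefix are hypothetical choices for illustration, not the authors' settings. Linear interpolation stands in for a proper band-limited resampler.

```python
import numpy as np


def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Resample `signal` so it plays `factor` times faster at the same sample rate.

    factor > 1 shortens the utterance and raises its pitch; factor < 1 lengthens
    it and lowers the pitch. Linear interpolation is used for simplicity (no
    anti-aliasing filter), so this only approximates a proper resampler.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(signal)
    n_out = int(round(n_in / factor))
    # Read the original waveform at fractional positions 0, factor, 2*factor, ...
    # np.interp holds the last sample for positions past the end of the input.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(n_in), signal)


def augment_corpus(utterances, factors=(0.9, 1.0, 1.1)):
    """Return (utterance_id, waveform) pairs for every utterance and every factor.

    The factor 1.0 copy is the original audio; perturbed copies get a
    hypothetical "sp<factor>-" prefix so their IDs stay unique.
    """
    augmented = []
    for utt_id, signal in utterances:
        for factor in factors:
            if factor == 1.0:
                augmented.append((utt_id, signal))
            else:
                augmented.append((f"sp{factor}-{utt_id}", speed_perturb(signal, factor)))
    return augmented


if __name__ == "__main__":
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic 440 Hz tone
    for utt_id, wav in augment_corpus([("child01_utt1", tone)]):
        print(utt_id, f"{len(wav) / sample_rate:.3f} s")
```

With the assumed factors, a 1 s utterance yields three training copies of roughly 1.11 s, 1.00 s, and 0.91 s. Passing a different `factors` tuple is one way to express a progressively wider set of perturbation factors; the values actually used in the study should be taken from the full article.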
id pubmed-9601443
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9601443 2022-10-27
Journal: Entropy (Basel), MDPI, 2022-10-19
License: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).