Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech

Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies remains problematic, such as spontaneous conversational and lecture speech. Filled pauses (FPs) are the...

Bibliographic Details
Main Authors: Long, Yan-Hua; Ye, Hong
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393320/
https://www.ncbi.nlm.nih.gov/pubmed/25860959
http://dx.doi.org/10.1371/journal.pone.0123466
author Long, Yan-Hua
Ye, Hong
collection PubMed
description Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies remains problematic, such as spontaneous conversational and lecture speech. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. Most recent studies have shown that FPs are widely believed to increase the error rates for state-of-the-art speech transcription, primarily because most FPs are not well annotated or provided in training data transcriptions and because of the similarities in acoustic characteristics between FPs and some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different styles of speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement.
format Online
Article
Text
id pubmed-4393320
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4393320 2015-04-21 PLoS One Research Article
Public Library of Science 2015-04-10 /pmc/articles/PMC4393320/ /pubmed/25860959 http://dx.doi.org/10.1371/journal.pone.0123466 Text en © 2015 Long, Ye http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393320/
https://www.ncbi.nlm.nih.gov/pubmed/25860959
http://dx.doi.org/10.1371/journal.pone.0123466