Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech

Although automatic speech recognition has become quite proficient at transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies, such as spontaneous conversational and lecture speech, remains problematic. Filled pauses (FPs) are the...

Bibliographic Details
Main Authors:	Long, Yan-Hua; Ye, Hong
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393320/ https://www.ncbi.nlm.nih.gov/pubmed/25860959 http://dx.doi.org/10.1371/journal.pone.0123466

_version_	1782366150868336640
author	Long, Yan-Hua Ye, Hong
collection	PubMed
description	Although automatic speech recognition has become quite proficient at transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies, such as spontaneous conversational and lecture speech, remains problematic. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. Recent studies indicate that FPs increase the error rates of state-of-the-art speech transcription, primarily because most FPs are poorly annotated or missing from training data transcriptions and because FPs are acoustically similar to some common non-content words. To improve speech transcription, we propose a new automatic refinement approach that detects FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different speech genres are investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance is also examined using systems built on training data transcriptions with and without FP refinement.
format	Online Article Text
id	pubmed-4393320
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4393320 2015-04-21 Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech Long, Yan-Hua Ye, Hong PLoS One Research Article
Public Library of Science 2015-04-10 /pmc/articles/PMC4393320/ /pubmed/25860959 http://dx.doi.org/10.1371/journal.pone.0123466 Text en © 2015 Long, Ye http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech
topic	Research Article
