Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech
Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies, such as spontaneous conversational and lecture speech, remains problematic. Filled pauses (FPs) are the...
Main authors: | Long, Yan-Hua; Ye, Hong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393320/ https://www.ncbi.nlm.nih.gov/pubmed/25860959 http://dx.doi.org/10.1371/journal.pone.0123466 |
_version_ | 1782366150868336640 |
---|---|
author | Long, Yan-Hua Ye, Hong |
author_facet | Long, Yan-Hua Ye, Hong |
author_sort | Long, Yan-Hua |
collection | PubMed |
description | Nowadays, although automatic speech recognition has become quite proficient in recognizing or transcribing well-prepared fluent speech, the transcription of speech that contains many disfluencies, such as spontaneous conversational and lecture speech, remains problematic. Filled pauses (FPs) are the most frequently occurring disfluencies in this type of speech. According to recent studies, FPs are widely believed to increase the error rates of state-of-the-art speech transcription, primarily because most FPs are not well annotated or not provided in training data transcriptions and because FPs are acoustically similar to some common non-content words. To enhance the speech transcription system, we propose a new automatic refinement approach to detect FPs in British English lecture speech transcription. This approach combines the pronunciation probabilities for each word in the dictionary and acoustic language model scores for FP refinement through a modified speech recognition forced-alignment framework. We evaluate the proposed approach on the Reith Lectures speech transcription task, in which only imperfect training transcriptions are available. Successful results are achieved for both the development and evaluation datasets. Acoustic models trained on different styles of speech genres have been investigated with respect to FP refinement. To further validate the effectiveness of the proposed approach, speech transcription performance has also been examined using systems built on training data transcriptions with and without FP refinement. |
format | Online Article Text |
id | pubmed-4393320 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4393320 2015-04-21 Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech Long, Yan-Hua Ye, Hong PLoS One Research Article
Public Library of Science 2015-04-10 /pmc/articles/PMC4393320/ /pubmed/25860959 http://dx.doi.org/10.1371/journal.pone.0123466 Text en © 2015 Long, Ye http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Long, Yan-Hua Ye, Hong Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech |
title | Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech |
title_full | Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech |
title_fullStr | Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech |
title_full_unstemmed | Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech |
title_short | Filled Pause Refinement Based on the Pronunciation Probability for Lecture Speech |
title_sort | filled pause refinement based on the pronunciation probability for lecture speech |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393320/ https://www.ncbi.nlm.nih.gov/pubmed/25860959 http://dx.doi.org/10.1371/journal.pone.0123466 |
work_keys_str_mv | AT longyanhua filledpauserefinementbasedonthepronunciationprobabilityforlecturespeech AT yehong filledpauserefinementbasedonthepronunciationprobabilityforlecturespeech |