Improve word embedding using both writing and pronunciation
Text representation can map text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weight, dependency, morphology, etc., to...
Main Authors: | Zhu, Wenhao; Jin, Xin; Ni, Jianyue; Wei, Baogang; Lu, Zhiguo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2018 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6287836/ https://www.ncbi.nlm.nih.gov/pubmed/30532197 http://dx.doi.org/10.1371/journal.pone.0208785 |
_version_ | 1783379693713489920 |
---|---|
author | Zhu, Wenhao Jin, Xin Ni, Jianyue Wei, Baogang Lu, Zhiguo |
author_facet | Zhu, Wenhao Jin, Xin Ni, Jianyue Wei, Baogang Lu, Zhiguo |
author_sort | Zhu, Wenhao |
collection | PubMed |
description | Text representation can map text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weight, dependency, morphology, etc., to optimize the training. However, from the linguistic point of view, spoken language is a more direct expression of semantics; writing has meaning only as a recording of spoken language. Therefore, this paper proposes the concept of a pronunciation-enhanced word embedding model (PWE) that integrates speech information into training to fully apply the roles of both speech and writing to meaning. This paper uses the Chinese language, English language and Spanish language as examples and presents several models that integrate word pronunciation characteristics into word embedding. Word similarity and text classification experiments show that the PWE outperforms the baseline model that does not include speech information. Language is a storehouse of sound-images; therefore, the PWE can be applied to most languages. |
format | Online Article Text |
id | pubmed-6287836 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-62878362018-12-28 Improve word embedding using both writing and pronunciation Zhu, Wenhao Jin, Xin Ni, Jianyue Wei, Baogang Lu, Zhiguo PLoS One Research Article Text representation can map text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weight, dependency, morphology, etc., to optimize the training. However, from the linguistic point of view, spoken language is a more direct expression of semantics; writing has meaning only as a recording of spoken language. Therefore, this paper proposes the concept of a pronunciation-enhanced word embedding model (PWE) that integrates speech information into training to fully apply the roles of both speech and writing to meaning. This paper uses the Chinese language, English language and Spanish language as examples and presents several models that integrate word pronunciation characteristics into word embedding. Word similarity and text classification experiments show that the PWE outperforms the baseline model that does not include speech information. Language is a storehouse of sound-images; therefore, the PWE can be applied to most languages. Public Library of Science 2018-12-10 /pmc/articles/PMC6287836/ /pubmed/30532197 http://dx.doi.org/10.1371/journal.pone.0208785 Text en © 2018 Zhu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Zhu, Wenhao Jin, Xin Ni, Jianyue Wei, Baogang Lu, Zhiguo Improve word embedding using both writing and pronunciation |
title | Improve word embedding using both writing and pronunciation |
title_full | Improve word embedding using both writing and pronunciation |
title_fullStr | Improve word embedding using both writing and pronunciation |
title_full_unstemmed | Improve word embedding using both writing and pronunciation |
title_short | Improve word embedding using both writing and pronunciation |
title_sort | improve word embedding using both writing and pronunciation |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6287836/ https://www.ncbi.nlm.nih.gov/pubmed/30532197 http://dx.doi.org/10.1371/journal.pone.0208785 |
work_keys_str_mv | AT zhuwenhao improvewordembeddingusingbothwritingandpronunciation AT jinxin improvewordembeddingusingbothwritingandpronunciation AT nijianyue improvewordembeddingusingbothwritingandpronunciation AT weibaogang improvewordembeddingusingbothwritingandpronunciation AT luzhiguo improvewordembeddingusingbothwritingandpronunciation |