Improve word embedding using both writing and pronunciation

Text representation can map text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weight, dependency, morphology, etc., to...

Bibliographic Details
Main authors: Zhu, Wenhao, Jin, Xin, Ni, Jianyue, Wei, Baogang, Lu, Zhiguo
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6287836/
https://www.ncbi.nlm.nih.gov/pubmed/30532197
http://dx.doi.org/10.1371/journal.pone.0208785
_version_ 1783379693713489920
author Zhu, Wenhao
Jin, Xin
Ni, Jianyue
Wei, Baogang
Lu, Zhiguo
author_facet Zhu, Wenhao
Jin, Xin
Ni, Jianyue
Wei, Baogang
Lu, Zhiguo
author_sort Zhu, Wenhao
collection PubMed
description Text representation can map text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weight, dependency, morphology, etc., to optimize the training. However, from the linguistic point of view, spoken language is a more direct expression of semantics; writing has meaning only as a recording of spoken language. Therefore, this paper proposes the concept of a pronunciation-enhanced word embedding model (PWE) that integrates speech information into training to fully apply the roles of both speech and writing to meaning. This paper uses the Chinese language, English language and Spanish language as examples and presents several models that integrate word pronunciation characteristics into word embedding. Word similarity and text classification experiments show that the PWE outperforms the baseline model that does not include speech information. Language is a storehouse of sound-images; therefore, the PWE can be applied to most languages.
format Online
Article
Text
id pubmed-6287836
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-62878362018-12-28 Improve word embedding using both writing and pronunciation Zhu, Wenhao Jin, Xin Ni, Jianyue Wei, Baogang Lu, Zhiguo PLoS One Research Article Text representation can map text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weight, dependency, morphology, etc., to optimize the training. However, from the linguistic point of view, spoken language is a more direct expression of semantics; writing has meaning only as a recording of spoken language. Therefore, this paper proposes the concept of a pronunciation-enhanced word embedding model (PWE) that integrates speech information into training to fully apply the roles of both speech and writing to meaning. This paper uses the Chinese language, English language and Spanish language as examples and presents several models that integrate word pronunciation characteristics into word embedding. Word similarity and text classification experiments show that the PWE outperforms the baseline model that does not include speech information. Language is a storehouse of sound-images; therefore, the PWE can be applied to most languages. Public Library of Science 2018-12-10 /pmc/articles/PMC6287836/ /pubmed/30532197 http://dx.doi.org/10.1371/journal.pone.0208785 Text en © 2018 Zhu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Zhu, Wenhao
Jin, Xin
Ni, Jianyue
Wei, Baogang
Lu, Zhiguo
Improve word embedding using both writing and pronunciation
title Improve word embedding using both writing and pronunciation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6287836/
https://www.ncbi.nlm.nih.gov/pubmed/30532197
http://dx.doi.org/10.1371/journal.pone.0208785