Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts

Word embedding has benefited a broad spectrum of text analysis tasks by learning distributed word representations to encode word semantics. Word representations are typically learned by modeling local contexts of words, assuming that words sharing similar surrounding words are semantically close. We argue that local contexts can only partially define word semantics in unsupervised word embedding learning. Global contexts, referring to the broader semantic units, such as the document or paragraph where the word appears, can capture different aspects of word semantics and complement local contexts. We propose two simple yet effective unsupervised word embedding models that jointly model both local and global contexts to learn word representations. We provide theoretical interpretations of the proposed models to demonstrate how local and global contexts are jointly modeled, assuming a generative relationship between words and contexts. We conduct a thorough evaluation on a wide range of benchmark datasets. Our quantitative analysis and case study show that despite their simplicity, our two proposed models achieve superior performance on word similarity and text classification tasks.
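The abstract describes the approach only at a high level, and this record does not reproduce the paper's two actual models. As a rough illustration of the general idea, here is a minimal sketch assuming a doc2vec-style joint objective: each target word is predicted from the average of its local context word vectors combined with a global document vector, trained with negative sampling. The toy corpus, hyperparameters, and all names below are invented for illustration and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: each document is a list of tokens (invented for illustration).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "stocks rose as markets rallied".split(),
    "investors sold stocks amid market fears".split(),
]

vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}
V, D, dim = len(vocab), len(docs), 16          # vocab size, #docs, embedding size

W_in = rng.normal(0, 0.1, (V, dim))            # word vectors (local contexts)
W_doc = rng.normal(0, 0.1, (D, dim))           # document vectors (global contexts)
W_out = np.zeros((V, dim))                     # output vectors for target words

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, window, neg_k, epochs = 0.05, 2, 5, 200    # illustrative hyperparameters

for _ in range(epochs):
    for d_id, doc in enumerate(docs):
        ids = [w2i[w] for w in doc]
        for pos, target in enumerate(ids):
            # Local context: words within `window` positions of the target.
            ctx = ids[max(0, pos - window):pos] + ids[pos + 1:pos + 1 + window]
            if not ctx:
                continue
            # Joint context vector: average of the local word vectors,
            # combined with the document (global) vector by simple averaging.
            h = (W_in[ctx].mean(axis=0) + W_doc[d_id]) / 2.0
            # Negative sampling: the true target plus neg_k random words.
            samples = [(target, 1.0)] + [(int(rng.integers(V)), 0.0)
                                         for _ in range(neg_k)]
            grad_h = np.zeros(dim)
            for out_id, label in samples:
                score = sigmoid(h @ W_out[out_id])
                g = (score - label) * lr       # logistic-loss gradient, scaled
                grad_h += g * W_out[out_id]
                W_out[out_id] -= g * h
            # Push the error back into both context sources.
            for c in ctx:
                W_in[c] -= grad_h / (2.0 * len(ctx))
            W_doc[d_id] -= grad_h / 2.0

# Nearest neighbors of "cat" by cosine similarity (the query word itself
# will rank first).
q = W_in[w2i["cat"]]
sims = W_in @ q / (np.linalg.norm(W_in, axis=1) * np.linalg.norm(q) + 1e-9)
print([vocab[i] for i in np.argsort(-sims)[:3]])
```

Averaging the local and global signals is just one simple way to combine them; the paper's theoretical treatment of the generative relationship between words and contexts is not captured by this toy sketch.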

Bibliographic Details
Main Authors: Meng, Yu; Huang, Jiaxin; Wang, Guangyuan; Wang, Zihan; Zhang, Chao; Han, Jiawei
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Big Data
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931948/
https://www.ncbi.nlm.nih.gov/pubmed/33693384
http://dx.doi.org/10.3389/fdata.2020.00009
_version_ 1783660389489180672
author Meng, Yu
Huang, Jiaxin
Wang, Guangyuan
Wang, Zihan
Zhang, Chao
Han, Jiawei
author_facet Meng, Yu
Huang, Jiaxin
Wang, Guangyuan
Wang, Zihan
Zhang, Chao
Han, Jiawei
author_sort Meng, Yu
collection PubMed
description Word embedding has benefited a broad spectrum of text analysis tasks by learning distributed word representations to encode word semantics. Word representations are typically learned by modeling local contexts of words, assuming that words sharing similar surrounding words are semantically close. We argue that local contexts can only partially define word semantics in unsupervised word embedding learning. Global contexts, referring to the broader semantic units, such as the document or paragraph where the word appears, can capture different aspects of word semantics and complement local contexts. We propose two simple yet effective unsupervised word embedding models that jointly model both local and global contexts to learn word representations. We provide theoretical interpretations of the proposed models to demonstrate how local and global contexts are jointly modeled, assuming a generative relationship between words and contexts. We conduct a thorough evaluation on a wide range of benchmark datasets. Our quantitative analysis and case study show that despite their simplicity, our two proposed models achieve superior performance on word similarity and text classification tasks.
format Online
Article
Text
id pubmed-7931948
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7931948 2021-03-09 Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts Meng, Yu Huang, Jiaxin Wang, Guangyuan Wang, Zihan Zhang, Chao Han, Jiawei Front Big Data Big Data Word embedding has benefited a broad spectrum of text analysis tasks by learning distributed word representations to encode word semantics. Word representations are typically learned by modeling local contexts of words, assuming that words sharing similar surrounding words are semantically close. We argue that local contexts can only partially define word semantics in unsupervised word embedding learning. Global contexts, referring to the broader semantic units, such as the document or paragraph where the word appears, can capture different aspects of word semantics and complement local contexts. We propose two simple yet effective unsupervised word embedding models that jointly model both local and global contexts to learn word representations. We provide theoretical interpretations of the proposed models to demonstrate how local and global contexts are jointly modeled, assuming a generative relationship between words and contexts. We conduct a thorough evaluation on a wide range of benchmark datasets. Our quantitative analysis and case study show that despite their simplicity, our two proposed models achieve superior performance on word similarity and text classification tasks. Frontiers Media S.A. 2020-03-11 /pmc/articles/PMC7931948/ /pubmed/33693384 http://dx.doi.org/10.3389/fdata.2020.00009 Text en Copyright © 2020 Meng, Huang, Wang, Wang, Zhang and Han. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Big Data
Meng, Yu
Huang, Jiaxin
Wang, Guangyuan
Wang, Zihan
Zhang, Chao
Han, Jiawei
Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts
title Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts
title_full Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts
title_fullStr Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts
title_full_unstemmed Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts
title_short Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts
title_sort unsupervised word embedding learning by incorporating local and global contexts
topic Big Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931948/
https://www.ncbi.nlm.nih.gov/pubmed/33693384
http://dx.doi.org/10.3389/fdata.2020.00009
work_keys_str_mv AT mengyu unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts
AT huangjiaxin unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts
AT wangguangyuan unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts
AT wangzihan unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts
AT zhangchao unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts
AT hanjiawei unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts