Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts
Main Authors: Meng, Yu; Huang, Jiaxin; Wang, Guangyuan; Wang, Zihan; Zhang, Chao; Han, Jiawei
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2020
Subjects: Big Data
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931948/ https://www.ncbi.nlm.nih.gov/pubmed/33693384 http://dx.doi.org/10.3389/fdata.2020.00009
_version_ | 1783660389489180672 |
author | Meng, Yu; Huang, Jiaxin; Wang, Guangyuan; Wang, Zihan; Zhang, Chao; Han, Jiawei
author_facet | Meng, Yu; Huang, Jiaxin; Wang, Guangyuan; Wang, Zihan; Zhang, Chao; Han, Jiawei
author_sort | Meng, Yu |
collection | PubMed |
description | Word embedding has benefited a broad spectrum of text analysis tasks by learning distributed word representations to encode word semantics. Word representations are typically learned by modeling local contexts of words, assuming that words sharing similar surrounding words are semantically close. We argue that local contexts can only partially define word semantics in unsupervised word embedding learning. Global contexts, referring to broader semantic units such as the document or paragraph where the word appears, can capture different aspects of word semantics and complement local contexts. We propose two simple yet effective unsupervised word embedding models that jointly model both local and global contexts to learn word representations. We provide theoretical interpretations of the proposed models to demonstrate how local and global contexts are jointly modeled, assuming a generative relationship between words and contexts. We conduct a thorough evaluation on a wide range of benchmark datasets. Our quantitative analysis and case study show that despite their simplicity, our two proposed models achieve superior performance on word similarity and text classification tasks. |
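The abstract above describes the core idea: combine a local objective (a word predicts its surrounding words) with a global objective (a word is tied to the document or paragraph it appears in). As a rough illustration only, here is a minimal NumPy sketch of one way such a joint objective can be trained. It is not the paper's actual models: the toy corpus, window size, weighting factor, SGD loop, and the doc2vec-style global term are all assumptions made for the sketch.

```python
# Illustrative joint training of word embeddings on local and global
# contexts (NumPy only). NOT the paper's proposed models; this sketch
# only shows the general shape of a combined local + global objective.
import numpy as np

rng = np.random.default_rng(0)

# Each inner list is one "document", serving as the global context unit.
docs = [
    "the cat sat on the mat".split(),
    "dogs and cats are common pets".split(),
    "stock markets fell on monday".split(),
]
vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}
V, D, dim = len(vocab), len(docs), 16

W_in = rng.normal(0, 0.1, (V, dim))    # target word vectors
W_out = rng.normal(0, 0.1, (V, dim))   # local-context (output) vectors
G = rng.normal(0, 0.1, (D, dim))       # global (document) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_grad(vec, table, pos, lr, k=3):
    """Negative-sampling step: push `vec` toward row `pos` of `table`
    and away from k random rows (negatives may collide with the
    positive on tiny tables; acceptable for a demo). Updates `table`
    in place and returns the gradient w.r.t. `vec`."""
    grad = np.zeros_like(vec)
    pairs = [(pos, 1.0)] + [(int(rng.integers(len(table))), 0.0)
                            for _ in range(k)]
    for j, label in pairs:
        g = sigmoid(vec @ table[j]) - label
        grad += g * table[j]
        table[j] -= lr * g * vec
    return grad

lr, window, lam = 0.05, 2, 0.5   # lam weights the global objective
for _ in range(100):
    for d_id, doc in enumerate(docs):
        for pos, word in enumerate(doc):
            wi = w2i[word]
            # Local objective: predict words in a sliding window.
            lo, hi = max(0, pos - window), min(len(doc), pos + window + 1)
            for c in range(lo, hi):
                if c != pos:
                    W_in[wi] -= lr * neg_sampling_grad(
                        W_in[wi], W_out, w2i[doc[c]], lr)
            # Global objective: predict the containing document.
            W_in[wi] -= lr * lam * neg_sampling_grad(W_in[wi], G, d_id, lr)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# On a toy corpus the numbers are only indicative.
print("cat~mat    ", cos(W_in[w2i["cat"]], W_in[w2i["mat"]]))
print("cat~markets", cos(W_in[w2i["cat"]], W_in[w2i["markets"]]))
```

The weighting factor `lam` is the key knob in a sketch like this: it trades off local co-occurrence structure against document-level topicality, which is exactly the complementarity between local and global contexts that the abstract argues for.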
format | Online Article Text |
id | pubmed-7931948 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7931948 2021-03-09 Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts Meng, Yu Huang, Jiaxin Wang, Guangyuan Wang, Zihan Zhang, Chao Han, Jiawei Front Big Data Big Data Word embedding has benefited a broad spectrum of text analysis tasks by learning distributed word representations to encode word semantics. Word representations are typically learned by modeling local contexts of words, assuming that words sharing similar surrounding words are semantically close. We argue that local contexts can only partially define word semantics in unsupervised word embedding learning. Global contexts, referring to broader semantic units such as the document or paragraph where the word appears, can capture different aspects of word semantics and complement local contexts. We propose two simple yet effective unsupervised word embedding models that jointly model both local and global contexts to learn word representations. We provide theoretical interpretations of the proposed models to demonstrate how local and global contexts are jointly modeled, assuming a generative relationship between words and contexts. We conduct a thorough evaluation on a wide range of benchmark datasets. Our quantitative analysis and case study show that despite their simplicity, our two proposed models achieve superior performance on word similarity and text classification tasks. Frontiers Media S.A. 2020-03-11 /pmc/articles/PMC7931948/ /pubmed/33693384 http://dx.doi.org/10.3389/fdata.2020.00009 Text en Copyright © 2020 Meng, Huang, Wang, Wang, Zhang and Han. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Big Data; Meng, Yu; Huang, Jiaxin; Wang, Guangyuan; Wang, Zihan; Zhang, Chao; Han, Jiawei; Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts
title | Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts |
title_full | Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts |
title_fullStr | Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts |
title_full_unstemmed | Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts |
title_short | Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts |
title_sort | unsupervised word embedding learning by incorporating local and global contexts |
topic | Big Data |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931948/ https://www.ncbi.nlm.nih.gov/pubmed/33693384 http://dx.doi.org/10.3389/fdata.2020.00009 |
work_keys_str_mv | AT mengyu unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts AT huangjiaxin unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts AT wangguangyuan unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts AT wangzihan unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts AT zhangchao unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts AT hanjiawei unsupervisedwordembeddinglearningbyincorporatinglocalandglobalcontexts |