A method for constructing word sense embeddings based on word sense induction
Main authors: | Sun, Yujia; Platoš, Jan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10412592/ https://www.ncbi.nlm.nih.gov/pubmed/37558764 http://dx.doi.org/10.1038/s41598-023-40062-3 |
---|---|
author | Sun, Yujia; Platoš, Jan |
collection | PubMed |
description | Polysemy is an inherent characteristic of natural language. To make it easier to distinguish between the different senses of polysemous words, we propose a method for encoding multiple senses of polysemous words using a single vector. The method first uses a two-layer bidirectional long short-term memory (BiLSTM) neural network and a self-attention mechanism to extract the contextual information of polysemous words. Then, a K-means algorithm, improved by optimizing the density peaks clustering algorithm based on cosine similarity, is applied to perform word sense induction on the contextual information of polysemous words. Finally, the method constructs the corresponding word sense embeddings of the polysemous words. The experimental results show that the proposed method produces better word sense induction than variants using Euclidean distance, Pearson correlation, or KL divergence, and more accurate word sense embeddings than mean shift, DBSCAN, spectral clustering, and agglomerative clustering. |
format | Online Article Text |
id | pubmed-10412592 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
journal | Sci Rep |
record date | 2023-08-11 |
publication date | 2023-08-09 |
rights | © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | A method for constructing word sense embeddings based on word sense induction |
topic | Article |
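The clustering step in the description (K-means improved with cosine-based density peaks clustering, then one embedding per induced sense) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes the BiLSTM/self-attention context vectors for one word are already computed, reads "improved by density peaks" as using the density peaks score rho·delta to choose the initial K-means centers, takes the number of senses `k` and the cutoff distance `dc` as given, and uses each cluster centroid as a sense embedding. All function names and parameters (`cosine`, `density_peak_centers`, `kmeans_cosine`, `induce_senses`, `dc`) are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def density_peak_centers(points, k, dc):
    """Pick k initial centers via density peaks clustering on cosine distance (hypothetical dc)."""
    n = len(points)
    dist = [[1.0 - cosine(points[i], points[j]) for j in range(n)] for i in range(n)]
    # rho_i: local density = number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Rank by density, breaking ties by index, so exactly one point counts as densest.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for r, i in enumerate(order):
        # delta_i: distance to the nearest denser point; the densest point gets its max distance.
        delta[i] = min(dist[i][j] for j in order[:r]) if r else max(dist[i])
    # Density peaks combine high rho and high delta: keep the k largest gamma = rho * delta.
    peaks = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)[:k]
    return [list(points[i]) for i in peaks]


def kmeans_cosine(points, centers, max_iter=100):
    """K-means assigning each point to the most cosine-similar center (updates centers in place)."""
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(len(centers)), key=lambda c: cosine(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def induce_senses(contexts, k, dc=0.3):
    """Cluster one word's context vectors; each centroid is treated as a sense embedding."""
    centers = density_peak_centers(contexts, k, dc)
    return kmeans_cosine(contexts, centers)


if __name__ == "__main__":
    # Toy 2-D context vectors for one ambiguous word with two clearly separated usages.
    contexts = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1],
                [0.1, 1.0], [0.0, 0.9], [-0.1, 1.0]]
    labels, senses = induce_senses(contexts, k=2)
    print(labels)  # → [0, 0, 0, 1, 1, 1]
    print(senses)
```

Seeding K-means from density peaks instead of random points makes the result deterministic for a given `dc` and avoids two initial centers landing in the same sense cluster. How the paper chooses `k`, the cutoff, and the final sense representation may differ from this sketch.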