
A method for constructing word sense embeddings based on word sense induction


Bibliographic details
Main authors: Sun, Yujia; Platoš, Jan
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10412592/
https://www.ncbi.nlm.nih.gov/pubmed/37558764
http://dx.doi.org/10.1038/s41598-023-40062-3
author Sun, Yujia
Platoš, Jan
collection PubMed
description Polysemy is an inherent characteristic of natural language. In order to make it easier to distinguish between different senses of polysemous words, we propose a method for encoding multiple different senses of polysemous words using a single vector. The method first uses a two-layer bidirectional long short-term memory neural network and a self-attention mechanism to extract the contextual information of polysemous words. Then, a K-means algorithm, which is improved by optimizing the density peaks clustering algorithm based on cosine similarity, is applied to perform word sense induction on the contextual information of polysemous words. Finally, the method constructs the corresponding word sense embedded representations of the polysemous words. The results of the experiments demonstrate that the proposed method produces better word sense induction than Euclidean distance, Pearson correlation, and KL-divergence and more accurate word sense embeddings than mean shift, DBSCAN, spectral clustering, and agglomerative clustering.
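The clustering step in the abstract can be illustrated with a minimal pure-Python sketch. It is one plausible reading of "K-means improved by density peaks clustering based on cosine similarity", not the authors' implementation. It assumes that density peaks clustering (Rodriguez and Laio, 2014) is used only to pick the K initial centres, with cosine distance 1 - cos, a cutoff kernel with a hypothetical threshold `d_c = 0.2`, and a fixed `k`. The BiLSTM + self-attention encoder is omitted, and the context vectors are hypothetical toy values. Taking each cluster mean as the sense embedding is also an assumption. All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def density_peak_seeds(X, k, d_c):
    """Choose k initial centres via density peaks on cosine distance 1 - cos."""
    n = len(X)
    dist = [[1.0 - cosine(X[i], X[j]) for j in range(n)] for i in range(n)]
    # rho: local density = number of other points within the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # delta: distance to the nearest point ranked denser (ties broken by index);
    # the densest point gets its largest distance instead.
    rank = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    for pos, i in enumerate(rank):
        delta[i] = max(dist[i]) if pos == 0 else min(dist[i][j] for j in rank[:pos])
    # Seeds: the k points with the largest gamma = rho * delta.
    order = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [list(X[i]) for i in order[:k]]


def induce_senses(X, k, d_c=0.2, iters=50):
    """K-means with cosine assignment, seeded by density peaks.

    Returns (labels, sense_vectors); each sense vector is the mean of the
    context vectors assigned to that cluster (an assumed construction).
    """
    centres = density_peak_seeds(X, k, d_c)
    labels = None
    for _ in range(iters):
        # Assign each context to the most cosine-similar centre.
        new_labels = [max(range(k), key=lambda c: cosine(x, centres[c])) for x in X]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Recompute each centre as the mean of its members (kept if empty).
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical 3-d context vectors for one ambiguous word, standing in for
# the BiLSTM + self-attention outputs described in the abstract.
contexts = [
    [0.9, 0.1, 0.0], [1.0, 0.2, 0.1], [0.8, 0.0, 0.1],  # e.g. "bank" (finance)
    [0.1, 0.9, 0.0], [0.0, 1.0, 0.2], [0.2, 0.8, 0.1],  # e.g. "bank" (river)
]
labels, senses = induce_senses(contexts, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

With these toy vectors, the two highest-gamma points fall in different groups. K-means therefore stabilises after one centre update, and the labels separate the two senses. A fuller version might choose k from the decision graph of rho against delta, as density peaks clustering usually does, instead of fixing it.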
format Online
Article
Text
id pubmed-10412592
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
journal Sci Rep
published 2023-08-09
rights © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title A method for constructing word sense embeddings based on word sense induction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10412592/
https://www.ncbi.nlm.nih.gov/pubmed/37558764
http://dx.doi.org/10.1038/s41598-023-40062-3