Document Network Projection in Pretrained Word Embedding Space

Bibliographic Details
Main authors: Gourru, Antoine; Guille, Adrien; Velcin, Julien; Jacques, Julien
Format: Online, Article, Text
Language: English
Published: 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148102/
http://dx.doi.org/10.1007/978-3-030-45442-5_19
Collection: PubMed
Description: We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents (e.g., citation network) into a pretrained word embedding space. In addition to the textual content, we leverage a matrix of pairwise similarities providing complementary information (e.g., the network proximity of two documents in a citation graph). We first build a simple word vector average for each document, and we use the similarities to alter this average representation. The document representations can help to solve many information retrieval tasks, such as recommendation, classification and clustering. We demonstrate that our approach outperforms or matches existing document network embedding methods on node classification and link prediction tasks. Furthermore, we show that it helps identify relevant keywords to describe document classes.
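The description outlines two steps: average the pretrained word vectors of each document, then use pairwise document similarities to alter that average. A minimal sketch of that idea in plain Python follows, assuming unweighted averaging over in-vocabulary tokens, a similarity matrix normalized per row, and a hypothetical mixing weight `lam`. It illustrates the idea under these assumptions only and is not the authors' RLE formulation, whose exact weighting is not given in this record; the function names and toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def average_word_vectors(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Unweighted mean of the pretrained vectors of in-vocabulary tokens.

    Out-of-vocabulary tokens are skipped; a document with no known token
    gets the zero vector. (Assumption: the paper may weight words differently.)
    """
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def smooth_with_similarities(doc_vecs: List[Vector], sim: List[List[float]], lam: float) -> List[Vector]:
    """Hypothetical mixing step, not the exact RLE objective:

        y_i = (1 - lam) * x_i + lam * sum_j (s_ij / sum_j s_ij) * x_j

    Rows of `sim` whose weights sum to zero leave the document unchanged.
    """
    n = len(doc_vecs)
    out: List[Vector] = []
    for i, x_i in enumerate(doc_vecs):
        row = sim[i]
        total = sum(row)
        if total <= 0:
            out.append(list(x_i))
            continue
        # Similarity-weighted average of the original (unsmoothed) document vectors.
        neigh = [sum(row[j] * doc_vecs[j][k] for j in range(n)) / total for k in range(len(x_i))]
        out.append([(1 - lam) * a + lam * b for a, b in zip(x_i, neigh)])
    return out


if __name__ == "__main__":
    # Toy 2-d "pretrained" vectors (hypothetical values).
    emb = {"graph": [1.0, 0.0], "network": [0.8, 0.2], "word": [0.0, 1.0], "vector": [0.2, 0.8]}
    docs = [["graph", "network"], ["word", "vector"], ["graph", "word"]]
    # Documents 0 and 1 are both linked to document 2 (e.g., through citations).
    sim = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
    X = [average_word_vectors(d, emb, 2) for d in docs]
    Y = smooth_with_similarities(X, sim, lam=0.5)
    for x, y in zip(X, Y):
        print([round(v, 3) for v in x], "->", [round(v, 3) for v in y])
```

With `lam = 0` every document keeps its plain word-vector average; larger values pull each document toward the documents it is most similar to, for instance its neighbors in a citation graph. Because each output is a weighted average of word vectors, it has the same dimensionality as the pretrained vectors and can be compared with them directly.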
ID: pubmed-7148102
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: Advances in Information Retrieval (Article), 2020-03-24
Rights: © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
Topic: Article