
Measuring novelty in science with word embedding

Novelty is a core value in science, and a reliable measurement of novelty is crucial. This study proposes a new approach to measuring the novelty of scientific articles based on both citation data and text data. The proposed approach considers an article to be novel if it cites a combination of sema...


Bibliographic Details
Main authors: Shibayama, Sotaro; Yin, Deyun; Matsumoto, Kuniko
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8253414/
https://www.ncbi.nlm.nih.gov/pubmed/34214135
http://dx.doi.org/10.1371/journal.pone.0254034
author Shibayama, Sotaro
Yin, Deyun
Matsumoto, Kuniko
author_facet Shibayama, Sotaro
Yin, Deyun
Matsumoto, Kuniko
author_sort Shibayama, Sotaro
collection PubMed
description Novelty is a core value in science, and a reliable measurement of novelty is crucial. This study proposes a new approach to measuring the novelty of scientific articles based on both citation data and text data. The proposed approach considers an article to be novel if it cites a combination of semantically distant references. To this end, we first assign a word embedding (a vector representation of each vocabulary item) to each cited reference on the basis of the text information included in the reference. With these vectors, the distance between every pair of references is computed. Finally, the novelty of a focal document is evaluated by summarizing the distances between all references. The approach draws on limited text information (the titles of references) and a publicly shared library for word embeddings, which minimizes the requirements for data access and computational cost. We share the code, with which one can compute the novelty score of a document of interest from the focal document's reference list alone. We validate the proposed measure through three exercises. First, we confirm that word embeddings can be used to quantify semantic distances between documents by comparing them with an established bibliometric distance measure. Second, we confirm the criterion-related validity of the proposed novelty measure with self-reported novelty scores collected from a questionnaire survey. Finally, as novelty is known to be correlated with future citation impact, we confirm that the proposed measure can predict future citations.
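The three steps in the description (embed each reference, compute pairwise distances, summarize them) can be sketched roughly as follows. This is a minimal illustration and not the authors' shared code: the toy vectors in `TOY_VECTORS`, the helper names `embed_title`, `cosine_distance`, and `novelty_score`, and the parameter `q` are all hypothetical. Averaging word vectors over a title, using cosine distance, and summarizing with a quantile are assumptions made for the example; the record itself only says that distances are computed and summarized.

```python
import math
from itertools import combinations

# Hypothetical toy word vectors; a real run would load pretrained embeddings.
TOY_VECTORS = {
    "gene": [1.0, 0.0, 0.0],
    "protein": [0.9, 0.1, 0.0],
    "network": [0.0, 1.0, 0.0],
    "graph": [0.1, 0.9, 0.0],
    "market": [0.0, 0.0, 1.0],
}


def embed_title(title, vectors):
    """Average the vectors of the known words in a title (None if none is known)."""
    known = [vectors[w] for w in title.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def novelty_score(reference_titles, vectors, q=1.0):
    """Summarize pairwise reference distances by the q-quantile (q=1.0 is the maximum)."""
    embedded = [embed_title(t, vectors) for t in reference_titles]
    embedded = [e for e in embedded if e is not None]
    distances = sorted(cosine_distance(a, b) for a, b in combinations(embedded, 2))
    if not distances:
        return None  # fewer than two usable references
    index = min(len(distances) - 1, int(q * (len(distances) - 1)))
    return distances[index]


# Two made-up reference lists: one topically close, one topically mixed.
similar = novelty_score(["Gene protein", "Protein gene network"], TOY_VECTORS)
distant = novelty_score(["Gene protein", "Market graph"], TOY_VECTORS)
print(similar, distant)
```

Under these assumptions, a reference list that mixes unrelated topics scores higher than one whose titles share vocabulary, which is the intuition behind "a combination of semantically distant references".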
format Online
Article
Text
id pubmed-8253414
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8253414 2021-07-13 Measuring novelty in science with word embedding Shibayama, Sotaro Yin, Deyun Matsumoto, Kuniko PLoS One Research Article
Public Library of Science 2021-07-02 /pmc/articles/PMC8253414/ /pubmed/34214135 http://dx.doi.org/10.1371/journal.pone.0254034 Text en © 2021 Shibayama et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Shibayama, Sotaro
Yin, Deyun
Matsumoto, Kuniko
Measuring novelty in science with word embedding
title Measuring novelty in science with word embedding
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8253414/
https://www.ncbi.nlm.nih.gov/pubmed/34214135
http://dx.doi.org/10.1371/journal.pone.0254034