Identify novel elements of knowledge with word embedding
Main authors: | Yin, Deyun; Wu, Zhao; Yokota, Kazuki; Matsumoto, Kuniko; Shibayama, Sotaro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2023 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10281565/ https://www.ncbi.nlm.nih.gov/pubmed/37339138 http://dx.doi.org/10.1371/journal.pone.0284567 |
author | Yin, Deyun; Wu, Zhao; Yokota, Kazuki; Matsumoto, Kuniko; Shibayama, Sotaro |
---|---|
collection | PubMed |
description | As novelty is a core value in science, a reliable approach to measuring the novelty of scientific documents is critical. Previous novelty measures, however, had a few limitations. First, the majority of previous measures are based on the recombinant novelty concept, attempting to identify a novel combination of knowledge elements, but insufficient effort has been made to identify a novel element itself (element novelty). Second, most previous measures are not validated, and it is unclear what aspect of newness is measured. Third, some of the previous measures can be computed only in certain scientific fields owing to technical constraints. This study thus aims to provide a validated and field-universal approach to computing element novelty. We drew on machine learning to develop a word embedding model, which allows us to extract semantic information from text data. Our validation analyses suggest that our word embedding model does convey semantic information. Based on the trained word embedding, we quantified the element novelty of a document by measuring its distance from the rest of the document universe. We then carried out a questionnaire survey to obtain self-reported novelty scores from 800 scientists. We found that our element novelty measure is significantly correlated with self-reported novelty in terms of discovering and identifying new phenomena, substances, molecules, etc., and that this correlation is observed across different scientific fields. |
format | Online Article Text |
id | pubmed-10281565 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10281565 2023-06-21 Identify novel elements of knowledge with word embedding Yin, Deyun; Wu, Zhao; Yokota, Kazuki; Matsumoto, Kuniko; Shibayama, Sotaro PLoS One Research Article Public Library of Science 2023-06-20 /pmc/articles/PMC10281565/ /pubmed/37339138 http://dx.doi.org/10.1371/journal.pone.0284567 Text en © 2023 Yin et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Identify novel elements of knowledge with word embedding |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10281565/ https://www.ncbi.nlm.nih.gov/pubmed/37339138 http://dx.doi.org/10.1371/journal.pone.0284567 |
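The description field above states that element novelty is quantified from a document's distance to the rest of the document universe in a trained word-embedding space, but the record does not give the exact formula. The sketch below is therefore a hypothetical illustration under stated assumptions, not the authors' method: it assumes each document is represented by the average of its word vectors, uses cosine distance, and scores novelty as the mean distance to the `k` nearest other documents. The function names `document_vector`, `cosine_distance`, `element_novelty`, the parameter `k`, and the toy embeddings are all invented for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def document_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Assumed document representation: the mean of its in-vocabulary word vectors."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("none of the document's tokens is in the embedding vocabulary")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; close to 0 for vectors pointing the same way."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / norm


def element_novelty(doc_vectors: List[Vector], k: int = 1) -> List[float]:
    """Hypothetical score: mean cosine distance from each document to its k nearest others."""
    if len(doc_vectors) < 2:
        raise ValueError("need at least two documents to compare against")
    if k < 1:
        raise ValueError("k must be at least 1")
    scores = []
    for i, vi in enumerate(doc_vectors):
        # Distances from document i to every other document, smallest first.
        dists = sorted(
            cosine_distance(vi, vj) for j, vj in enumerate(doc_vectors) if j != i
        )
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Toy usage with made-up 2-D embeddings: two near-identical documents and one outlier.
embeddings = {
    "protein": [1.0, 0.0],
    "enzyme": [0.9, 0.1],
    "galaxy": [0.0, 1.0],
}
docs = [["protein", "enzyme"], ["enzyme", "protein"], ["galaxy"]]
vectors = [document_vector(d, embeddings) for d in docs]
scores = element_novelty(vectors, k=1)
print(scores)  # the "galaxy" document gets the highest score
```

Averaging over the `k` nearest neighbours, rather than over the whole corpus, is one design choice among several: it rewards a document only if no close neighbour exists, whereas a corpus-wide mean would be dominated by unrelated fields. The aggregation used in the paper may differ.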