Building a PubMed knowledge graph

PubMed® is an essential resource for the medical domain, but useful concepts are either difficult to extract or are ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMe...

Bibliographic Details
Main Authors: Xu, Jian; Kim, Sunkyu; Song, Min; Jeong, Minbyul; Kim, Donghyeon; Kang, Jaewoo; Rousseau, Justin F.; Li, Xin; Xu, Weijia; Torvik, Vetle I.; Bu, Yi; Chen, Chongyan; Ebeid, Islam Akef; Li, Daifeng; Ding, Ying
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320186/
https://www.ncbi.nlm.nih.gov/pubmed/32591513
http://dx.doi.org/10.1038/s41597-020-0543-2
author Xu, Jian
Kim, Sunkyu
Song, Min
Jeong, Minbyul
Kim, Donghyeon
Kang, Jaewoo
Rousseau, Justin F.
Li, Xin
Xu, Weijia
Torvik, Vetle I.
Bu, Yi
Chen, Chongyan
Ebeid, Islam Akef
Li, Daifeng
Ding, Ying
collection PubMed
description PubMed® is an essential resource for the medical domain, but useful concepts are either difficult to extract or are ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID®, and identifying fine-grained affiliation data from MapAffil. Through the integration of these credible multi-source data, we could create connections among the bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with the author name disambiguation (AND) achieving an F1 score of 98.09%. PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities.
format Online
Article
Text
id pubmed-7320186
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7320186 2020-06-30
Building a PubMed knowledge graph
Sci Data
Data Descriptor
Nature Publishing Group UK 2020-06-26
/pmc/articles/PMC7320186/
/pubmed/32591513
http://dx.doi.org/10.1038/s41597-020-0543-2
Text
en
© This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2020
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
title Building a PubMed knowledge graph
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320186/
https://www.ncbi.nlm.nih.gov/pubmed/32591513
http://dx.doi.org/10.1038/s41597-020-0543-2
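
The description in this record mentions creating connections among bio-entities, authors, articles, affiliations, and funding, and profiling authors by their connections with bio-entities. The following is a minimal Python sketch of that idea only, not the method used to build PKG. The records, node prefixes (`article:`, `author:`, `gene:`, `disease:`, `grant:`), and function names are all hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy records; identifiers and entity names are invented.
ARTICLES = [
    {"pmid": "A1", "authors": ["author:1"],
     "entities": ["gene:BRCA1", "disease:breast cancer"], "grants": ["grant:X"]},
    {"pmid": "A2", "authors": ["author:1", "author:2"],
     "entities": ["gene:BRCA1"], "grants": []},
]

def build_graph(articles):
    """Link each article node to its authors, bio-entities, and grants (undirected)."""
    graph = defaultdict(set)
    for art in articles:
        node = "article:" + art["pmid"]
        for neighbor in art["authors"] + art["entities"] + art["grants"]:
            graph[node].add(neighbor)
            graph[neighbor].add(node)
    return graph

def author_profile(graph, author):
    """Count how many of the author's articles mention each bio-entity."""
    counts = defaultdict(int)
    for article in graph.get(author, ()):
        for neighbor in graph[article]:
            if neighbor.startswith(("gene:", "disease:")):
                counts[neighbor] += 1
    return dict(counts)

graph = build_graph(ARTICLES)
profile = author_profile(graph, "author:1")
print(profile)
```

In this toy graph, an author is two hops from a bio-entity (author, article, entity), so a profile is a count over those two-hop paths. A real pipeline would first need the entity extraction and author name disambiguation steps that the description refers to.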