Building a PubMed knowledge graph
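Main authors: | Xu, Jian; Kim, Sunkyu; Song, Min; Jeong, Minbyul; Kim, Donghyeon; Kang, Jaewoo; Rousseau, Justin F.; Li, Xin; Xu, Weijia; Torvik, Vetle I.; Bu, Yi; Chen, Chongyan; Ebeid, Islam Akef; Li, Daifeng; Ding, Ying |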
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2020 |
Subjects: | Data Descriptor |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320186/ https://www.ncbi.nlm.nih.gov/pubmed/32591513 http://dx.doi.org/10.1038/s41597-020-0543-2 |
author | Xu, Jian; Kim, Sunkyu; Song, Min; Jeong, Minbyul; Kim, Donghyeon; Kang, Jaewoo; Rousseau, Justin F.; Li, Xin; Xu, Weijia; Torvik, Vetle I.; Bu, Yi; Chen, Chongyan; Ebeid, Islam Akef; Li, Daifeng; Ding, Ying |
---|---|
collection | PubMed |
description | PubMed® is an essential resource for the medical domain, but useful concepts are either difficult to extract or are ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID®, and identifying fine-grained affiliation data from MapAffil. Through the integration of these credible multi-source data, we could create connections among the bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with the author name disambiguation (AND) achieving an F1 score of 98.09%. PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities. |
format | Online Article Text |
id | pubmed-7320186 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7320186 2020-06-30 Building a PubMed knowledge graph. Sci Data, Data Descriptor. Nature Publishing Group UK, 2020-06-26. /pmc/articles/PMC7320186/ /pubmed/32591513 http://dx.doi.org/10.1038/s41597-020-0543-2 Text en © This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article. |
title | Building a PubMed knowledge graph |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7320186/ https://www.ncbi.nlm.nih.gov/pubmed/32591513 http://dx.doi.org/10.1038/s41597-020-0543-2 |
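The description field says PKG connects bio-entities, authors, articles, affiliations, and funding, and that these connections can be used to profile authors and organizations by the bio-entities they work on. The Python sketch below illustrates that idea on a toy in-memory graph. It is not PKG's schema or code: the class `KnowledgeGraph`, its fields, and its methods are hypothetical. It assumes bio-entity extraction and author name disambiguation have already produced entity and author ids, and it simplifies affiliations to a single one per author.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy in-memory graph linking articles to authors, bio-entities,
    affiliations, and grants. Hypothetical schema, not PKG's own."""

    # article id -> ids of its (already disambiguated) authors
    article_authors: dict[str, set[str]] = field(default_factory=dict)
    # article id -> bio-entity mentions as (entity id, entity type) pairs
    article_entities: dict[str, list[tuple[str, str]]] = field(default_factory=dict)
    # article id -> ids of the grants that funded it
    article_grants: dict[str, set[str]] = field(default_factory=dict)
    # author id -> affiliation id (simplification: one affiliation per author)
    author_affiliation: dict[str, str] = field(default_factory=dict)

    def add_article(self, article_id, authors, entities, grants=()):
        self.article_authors[article_id] = set(authors)
        self.article_entities[article_id] = list(entities)
        self.article_grants[article_id] = set(grants)

    def _count_entities(self, article_ids):
        """Count bio-entity mentions over a set of articles, each article once."""
        profile = Counter()
        for article_id in article_ids:
            profile.update(self.article_entities.get(article_id, []))
        return profile

    def author_profile(self, author_id):
        """Bio-entities mentioned in the articles written by one author."""
        articles = {a for a, authors in self.article_authors.items() if author_id in authors}
        return self._count_entities(articles)

    def affiliation_profile(self, affiliation_id):
        """Bio-entities in articles with at least one author at the affiliation;
        an article co-authored by several members is counted only once."""
        members = {u for u, aff in self.author_affiliation.items() if aff == affiliation_id}
        articles = {a for a, authors in self.article_authors.items() if authors & members}
        return self._count_entities(articles)

    def grant_profile(self, grant_id):
        """Bio-entities in the articles funded by one grant."""
        articles = {a for a, grants in self.article_grants.items() if grant_id in grants}
        return self._count_entities(articles)


if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add_article("article-1", ["author-A", "author-B"],
                   [("entity-1", "Gene"), ("entity-2", "Disease")], grants=["grant-X"])
    kg.add_article("article-2", ["author-A"], [("entity-2", "Disease")])
    print(kg.author_profile("author-A").most_common(1))  # [(('entity-2', 'Disease'), 2)]
```

Collecting article ids into a set before counting is a deliberate choice: it keeps organization-level profiles from double-counting papers that several colleagues co-authored.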