Toward a Coronavirus Knowledge Graph

This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related t...

Bibliographic Details
Main authors: Zhang, Peng; Bu, Yi; Jiang, Peng; Shi, Xiaowen; Lun, Bing; Chen, Chongyan; Syafiandini, Arida Ferti; Ding, Ying; Song, Min
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8307964/
https://www.ncbi.nlm.nih.gov/pubmed/34209818
http://dx.doi.org/10.3390/genes12070998
author Zhang, Peng
Bu, Yi
Jiang, Peng
Shi, Xiaowen
Lun, Bing
Chen, Chongyan
Syafiandini, Arida Ferti
Ding, Ying
Song, Min
author_facet Zhang, Peng
Bu, Yi
Jiang, Peng
Shi, Xiaowen
Lun, Bing
Chen, Chongyan
Syafiandini, Arida Ferti
Ding, Ying
Song, Min
author_sort Zhang, Peng
collection PubMed
description This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related to COVID-19. We combined the chemogenomic entities in AG with entities extracted from CORD-19 to expand knowledge in the COVID-19 domain. Before populating the KG with those entities, we performed entity disambiguation on the CORD-19 collection using Wikidata. Our newly built KG contains at least 21,700 genes, 2500 diseases, 94,000 phenotypes, and other biological entities (e.g., compounds, species, and cell lines). We define 27 relationship types and use them to label each edge in our KG. This research presents two cases to evaluate the KG’s usability: analyzing a subgraph (ego-centered network) from the angiotensin-converting enzyme (ACE) and revealing paths between biological entities (hydroxychloroquine and IL-6 receptor; chloroquine and STAT1). The ego-centered network captured information related to COVID-19. We also found significant COVID-19-related information in top-ranked paths with a depth of three based on our path evaluation.
format Online
Article
Text
id pubmed-8307964
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8307964 2021-07-25 Toward a Coronavirus Knowledge Graph Zhang, Peng Bu, Yi Jiang, Peng Shi, Xiaowen Lun, Bing Chen, Chongyan Syafiandini, Arida Ferti Ding, Ying Song, Min Genes (Basel) Article This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related to COVID-19. We combined the chemogenomic entities in AG with entities extracted from CORD-19 to expand knowledge in the COVID-19 domain. Before populating the KG with those entities, we performed entity disambiguation on the CORD-19 collection using Wikidata. Our newly built KG contains at least 21,700 genes, 2500 diseases, 94,000 phenotypes, and other biological entities (e.g., compounds, species, and cell lines). We define 27 relationship types and use them to label each edge in our KG. This research presents two cases to evaluate the KG’s usability: analyzing a subgraph (ego-centered network) from the angiotensin-converting enzyme (ACE) and revealing paths between biological entities (hydroxychloroquine and IL-6 receptor; chloroquine and STAT1). The ego-centered network captured information related to COVID-19. We also found significant COVID-19-related information in top-ranked paths with a depth of three based on our path evaluation. MDPI 2021-06-29 /pmc/articles/PMC8307964/ /pubmed/34209818 http://dx.doi.org/10.3390/genes12070998 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Zhang, Peng
Bu, Yi
Jiang, Peng
Shi, Xiaowen
Lun, Bing
Chen, Chongyan
Syafiandini, Arida Ferti
Ding, Ying
Song, Min
Toward a Coronavirus Knowledge Graph
title Toward a Coronavirus Knowledge Graph
title_full Toward a Coronavirus Knowledge Graph
title_fullStr Toward a Coronavirus Knowledge Graph
title_full_unstemmed Toward a Coronavirus Knowledge Graph
title_short Toward a Coronavirus Knowledge Graph
title_sort toward a coronavirus knowledge graph
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8307964/
https://www.ncbi.nlm.nih.gov/pubmed/34209818
http://dx.doi.org/10.3390/genes12070998
work_keys_str_mv AT zhangpeng towardacoronavirusknowledgegraph
AT buyi towardacoronavirusknowledgegraph
AT jiangpeng towardacoronavirusknowledgegraph
AT shixiaowen towardacoronavirusknowledgegraph
AT lunbing towardacoronavirusknowledgegraph
AT chenchongyan towardacoronavirusknowledgegraph
AT syafiandiniaridaferti towardacoronavirusknowledgegraph
AT dingying towardacoronavirusknowledgegraph
AT songmin towardacoronavirusknowledgegraph