Toward a Coronavirus Knowledge Graph
This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related t...
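Main authors: | Zhang, Peng; Bu, Yi; Jiang, Peng; Shi, Xiaowen; Lun, Bing; Chen, Chongyan; Syafiandini, Arida Ferti; Ding, Ying; Song, Min |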
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8307964/ https://www.ncbi.nlm.nih.gov/pubmed/34209818 http://dx.doi.org/10.3390/genes12070998 |
_version_ | 1783728168357593088 |
---|---|
author | Zhang, Peng; Bu, Yi; Jiang, Peng; Shi, Xiaowen; Lun, Bing; Chen, Chongyan; Syafiandini, Arida Ferti; Ding, Ying; Song, Min |
author_facet | Zhang, Peng; Bu, Yi; Jiang, Peng; Shi, Xiaowen; Lun, Bing; Chen, Chongyan; Syafiandini, Arida Ferti; Ding, Ying; Song, Min |
author_sort | Zhang, Peng |
collection | PubMed |
description | This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related to COVID-19. We combined chemogenomic entities in AG with entities extracted from CORD-19 to expand knowledge in the COVID-19 domain. Before populating the KG with those entities, we perform entity disambiguation on CORD-19 collections using Wikidata. Our newly built KG contains at least 21,700 genes, 2500 diseases, 94,000 phenotypes, and other biological entities (e.g., compounds, species, and cell lines). We define 27 relationship types and use them to label each edge in our KG. This research presents two cases to evaluate the KG’s usability: analyzing a subgraph (ego-centered network) from the angiotensin-converting enzyme (ACE) and revealing paths between biological entities (hydroxychloroquine and IL-6 receptor; chloroquine and STAT1). The ego-centered network captured information related to COVID-19. We also found significant COVID-19-related information in top-ranked paths with a depth of three based on our path evaluation. |
format | Online Article Text |
id | pubmed-8307964 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8307964 2021-07-25 Toward a Coronavirus Knowledge Graph Zhang, Peng Bu, Yi Jiang, Peng Shi, Xiaowen Lun, Bing Chen, Chongyan Syafiandini, Arida Ferti Ding, Ying Song, Min Genes (Basel) Article This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related to COVID-19. We combined chemogenomic entities in AG with entities extracted from CORD-19 to expand knowledge in the COVID-19 domain. Before populating the KG with those entities, we perform entity disambiguation on CORD-19 collections using Wikidata. Our newly built KG contains at least 21,700 genes, 2500 diseases, 94,000 phenotypes, and other biological entities (e.g., compounds, species, and cell lines). We define 27 relationship types and use them to label each edge in our KG. This research presents two cases to evaluate the KG’s usability: analyzing a subgraph (ego-centered network) from the angiotensin-converting enzyme (ACE) and revealing paths between biological entities (hydroxychloroquine and IL-6 receptor; chloroquine and STAT1). The ego-centered network captured information related to COVID-19. We also found significant COVID-19-related information in top-ranked paths with a depth of three based on our path evaluation. MDPI 2021-06-29 /pmc/articles/PMC8307964/ /pubmed/34209818 http://dx.doi.org/10.3390/genes12070998 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Zhang, Peng Bu, Yi Jiang, Peng Shi, Xiaowen Lun, Bing Chen, Chongyan Syafiandini, Arida Ferti Ding, Ying Song, Min Toward a Coronavirus Knowledge Graph |
title | Toward a Coronavirus Knowledge Graph |
title_full | Toward a Coronavirus Knowledge Graph |
title_fullStr | Toward a Coronavirus Knowledge Graph |
title_full_unstemmed | Toward a Coronavirus Knowledge Graph |
title_short | Toward a Coronavirus Knowledge Graph |
title_sort | toward a coronavirus knowledge graph |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8307964/ https://www.ncbi.nlm.nih.gov/pubmed/34209818 http://dx.doi.org/10.3390/genes12070998 |
work_keys_str_mv | AT zhangpeng towardacoronavirusknowledgegraph AT buyi towardacoronavirusknowledgegraph AT jiangpeng towardacoronavirusknowledgegraph AT shixiaowen towardacoronavirusknowledgegraph AT lunbing towardacoronavirusknowledgegraph AT chenchongyan towardacoronavirusknowledgegraph AT syafiandiniaridaferti towardacoronavirusknowledgegraph AT dingying towardacoronavirusknowledgegraph AT songmin towardacoronavirusknowledgegraph |