Construction of a Linked Data Set of COVID-19 Knowledge Graphs: Development and Applications
BACKGROUND: With the continuous spread of COVID-19, information about the worldwide pandemic is exploding. Therefore, it is necessary and significant to organize such a large amount of information. As the key branch of artificial intelligence, a knowledge graph (KG) is helpful to structure, reason,...
Main authors: | Wang, Haofen; Du, Huifang; Qi, Guilin; Chen, Huajun; Hu, Wei; Chen, Zhuo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9109781/ https://www.ncbi.nlm.nih.gov/pubmed/35476822 http://dx.doi.org/10.2196/37215 |
_version_ | 1784708958277074944 |
---|---|
author | Wang, Haofen; Du, Huifang; Qi, Guilin; Chen, Huajun; Hu, Wei; Chen, Zhuo |
author_facet | Wang, Haofen; Du, Huifang; Qi, Guilin; Chen, Huajun; Hu, Wei; Chen, Zhuo |
author_sort | Wang, Haofen |
collection | PubMed |
description | BACKGROUND: With the continuous spread of COVID-19, information about the worldwide pandemic is exploding. Therefore, it is necessary and significant to organize such a large amount of information. As the key branch of artificial intelligence, a knowledge graph (KG) is helpful to structure, reason, and understand data. OBJECTIVE: To improve the utilization value of the information and effectively aid researchers to combat COVID-19, we have constructed and successively released a unified linked data set named OpenKG-COVID19, which is one of the largest existing KGs related to COVID-19. OpenKG-COVID19 includes 10 interlinked COVID-19 subgraphs covering the topics of encyclopedia, concept, medical, research, event, health, epidemiology, goods, prevention, and character. METHODS: In this paper, we introduce the key techniques exploited in building COVID-19 KGs in a top-down manner. First, the schema of the modeling process for each KG in OpenKG-COVID19 is described. Second, we propose different methods for extracting knowledge from open government sites, professional texts, public domain–specific sources, and public encyclopedia sites. The curated 10 COVID-19 KGs are further linked together at both the schema and data levels. In addition, we present the naming convention for OpenKG-COVID19. RESULTS: OpenKG-COVID19 has more than 2572 concepts, 329,600 entities, 513 properties, and 2,687,329 facts, and the data set will be updated continuously. Each COVID-19 KG was evaluated, and the average precision was found to be above 93%. We have developed search and browse interfaces and a SPARQL endpoint to improve user access. Possible intelligent applications based on OpenKG-COVID19 for further development are also described. CONCLUSIONS: A KG is useful for intelligent question-answering, semantic searches, recommendation systems, visualization analysis, and decision-making support. 
Research related to COVID-19, biomedicine, and many other communities can benefit from OpenKG-COVID19. Furthermore, the 10 KGs will be continuously updated to ensure that the public will have access to sufficient and up-to-date knowledge. |
format | Online Article Text |
id | pubmed-9109781 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-91097812022-05-17 Construction of a Linked Data Set of COVID-19 Knowledge Graphs: Development and Applications Wang, Haofen Du, Huifang Qi, Guilin Chen, Huajun Hu, Wei Chen, Zhuo JMIR Med Inform Original Paper BACKGROUND: With the continuous spread of COVID-19, information about the worldwide pandemic is exploding. Therefore, it is necessary and significant to organize such a large amount of information. As the key branch of artificial intelligence, a knowledge graph (KG) is helpful to structure, reason, and understand data. OBJECTIVE: To improve the utilization value of the information and effectively aid researchers to combat COVID-19, we have constructed and successively released a unified linked data set named OpenKG-COVID19, which is one of the largest existing KGs related to COVID-19. OpenKG-COVID19 includes 10 interlinked COVID-19 subgraphs covering the topics of encyclopedia, concept, medical, research, event, health, epidemiology, goods, prevention, and character. METHODS: In this paper, we introduce the key techniques exploited in building COVID-19 KGs in a top-down manner. First, the schema of the modeling process for each KG in OpenKG-COVID19 is described. Second, we propose different methods for extracting knowledge from open government sites, professional texts, public domain–specific sources, and public encyclopedia sites. The curated 10 COVID-19 KGs are further linked together at both the schema and data levels. In addition, we present the naming convention for OpenKG-COVID19. RESULTS: OpenKG-COVID19 has more than 2572 concepts, 329,600 entities, 513 properties, and 2,687,329 facts, and the data set will be updated continuously. Each COVID-19 KG was evaluated, and the average precision was found to be above 93%. We have developed search and browse interfaces and a SPARQL endpoint to improve user access. Possible intelligent applications based on OpenKG-COVID19 for further development are also described. 
CONCLUSIONS: A KG is useful for intelligent question-answering, semantic searches, recommendation systems, visualization analysis, and decision-making support. Research related to COVID-19, biomedicine, and many other communities can benefit from OpenKG-COVID19. Furthermore, the 10 KGs will be continuously updated to ensure that the public will have access to sufficient and up-to-date knowledge. JMIR Publications 2022-05-13 /pmc/articles/PMC9109781/ /pubmed/35476822 http://dx.doi.org/10.2196/37215 Text en ©Haofen Wang, Huifang Du, Guilin Qi, Huajun Chen, Wei Hu, Zhuo Chen. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 13.05.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Wang, Haofen Du, Huifang Qi, Guilin Chen, Huajun Hu, Wei Chen, Zhuo Construction of a Linked Data Set of COVID-19 Knowledge Graphs: Development and Applications |
title | Construction of a Linked Data Set of COVID-19 Knowledge Graphs: Development and Applications |
title_full | Construction of a Linked Data Set of COVID-19 Knowledge Graphs: Development and Applications |
title_fullStr | Construction of a Linked Data Set of COVID-19 Knowledge Graphs: Development and Applications |
title_full_unstemmed | Construction of a Linked Data Set of COVID-19 Knowledge Graphs: Development and Applications |
title_short | Construction of a Linked Data Set of COVID-19 Knowledge Graphs: Development and Applications |
title_sort | construction of a linked data set of covid-19 knowledge graphs: development and applications |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9109781/ https://www.ncbi.nlm.nih.gov/pubmed/35476822 http://dx.doi.org/10.2196/37215 |
work_keys_str_mv | AT wanghaofen constructionofalinkeddatasetofcovid19knowledgegraphsdevelopmentandapplications AT duhuifang constructionofalinkeddatasetofcovid19knowledgegraphsdevelopmentandapplications AT qiguilin constructionofalinkeddatasetofcovid19knowledgegraphsdevelopmentandapplications AT chenhuajun constructionofalinkeddatasetofcovid19knowledgegraphsdevelopmentandapplications AT huwei constructionofalinkeddatasetofcovid19knowledgegraphsdevelopmentandapplications AT chenzhuo constructionofalinkeddatasetofcovid19knowledgegraphsdevelopmentandapplications |
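
The abstract in this record says the 10 subgraphs are linked at both the schema and data levels. The sketch below illustrates what a data-level link could look like. It is not the authors' implementation: every identifier in it (the `med:` and `prev:` prefixes, the property names, the example facts) is hypothetical and is not taken from the OpenKG-COVID19 schema. It stores facts as plain (subject, predicate, object) triples and uses an `owl:sameAs` link so that a lookup on one entity also returns facts recorded under its counterpart in another subgraph.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts from two subgraphs; all identifiers are illustrative only.
MEDICAL: Set[Triple] = {
    ("med:COVID-19", "rdf:type", "med:Disease"),
    ("med:COVID-19", "med:hasSymptom", "med:Fever"),
    ("med:COVID-19", "med:hasSymptom", "med:Cough"),
}
PREVENTION: Set[Triple] = {
    ("prev:COVID-19", "rdf:type", "prev:Disease"),
    ("prev:COVID-19", "prev:preventedBy", "prev:MaskWearing"),
}
# Assumed data-level link: both identifiers denote the same real-world entity.
LINKS: Set[Triple] = {("med:COVID-19", "owl:sameAs", "prev:COVID-19")}

GRAPH: Set[Triple] = MEDICAL | PREVENTION | LINKS


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in graph:
        if (s is None or t[0] == s) and (p is None or t[1] == p) \
                and (o is None or t[2] == o):
            yield t


def same_as_closure(graph: Set[Triple], entity: str) -> Set[str]:
    """Return all identifiers reachable via owl:sameAs in either direction."""
    seen = {entity}
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for s, _, o in match(graph, p="owl:sameAs"):
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


def facts_about(graph: Set[Triple], entity: str) -> List[Triple]:
    """Collect facts about an entity across every linked subgraph."""
    aliases = same_as_closure(graph, entity)
    return sorted(t for t in graph
                  if t[0] in aliases and t[1] != "owl:sameAs")


if __name__ == "__main__":
    for fact in facts_about(GRAPH, "med:COVID-19"):
        print(fact)
```

Following the link in both directions means a query starting from either identifier returns the same merged set of facts. A real deployment would more likely run such lookups as SPARQL queries against the endpoint mentioned in the abstract.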