A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses

Bibliographic Details
Main authors: Waagmeester, Andra; Willighagen, Egon L.; Su, Andrew I.; Kutmon, Martina; Gayo, Jose Emilio Labra; Fernández-Álvarez, Daniel; Groom, Quentin; Schaap, Peter J.; Verhagen, Lisa M.; Koehorst, Jasper J.
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820539/
https://www.ncbi.nlm.nih.gov/pubmed/33482803
http://dx.doi.org/10.1186/s12915-020-00940-y

Abstract

BACKGROUND: Pandemics, even more than other medical problems, require swift integration of knowledge. When a pandemic is caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting with a large number of loosely related projects and initiatives, we need common ground, also known as a “commons.” Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions.

RESULTS: As a telling example, we describe the process of aligning resources on the genomes and proteomes of SARS-CoV-2 and related viruses, as well as how Shape Expressions can be defined for Wikidata to model this knowledge, helping others who study the SARS-CoV-2 pandemic. We demonstrate how this model can make data from various resources interoperable by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications (bots) was written to regularly update these sources in Wikidata, and these bots were added to a platform that runs the updates automatically.

CONCLUSIONS: Although this workflow was developed and applied in the context of the COVID-19 pandemic, it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43) to demonstrate its broader applicability.

Journal: BMC Biol (Methodology Article), BioMed Central, published 2021-01-22.

© The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.