A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses
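Main authors: | Waagmeester, Andra; Willighagen, Egon L.; Su, Andrew I.; Kutmon, Martina; Gayo, Jose Emilio Labra; Fernández-Álvarez, Daniel; Groom, Quentin; Schaap, Peter J.; Verhagen, Lisa M.; Koehorst, Jasper J. |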
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820539/ https://www.ncbi.nlm.nih.gov/pubmed/33482803 http://dx.doi.org/10.1186/s12915-020-00940-y |
_version_ | 1783639237785026560 |
---|---|
author | Waagmeester, Andra; Willighagen, Egon L.; Su, Andrew I.; Kutmon, Martina; Gayo, Jose Emilio Labra; Fernández-Álvarez, Daniel; Groom, Quentin; Schaap, Peter J.; Verhagen, Lisa M.; Koehorst, Jasper J. |
author_sort | Waagmeester, Andra |
collection | PubMed |
description | BACKGROUND: Pandemics, even more than other medical problems, require swift integration of knowledge. When a pandemic is caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting with many loosely related projects and initiatives, we need common ground, also known as a “commons.” Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions. RESULTS: As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses, as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. We demonstrate how this model can make data from various resources interoperable by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications, or bots, was written to regularly update these sources in Wikidata and was added to a platform that runs these updates automatically. CONCLUSIONS: Although this workflow was developed and applied in the context of the COVID-19 pandemic, it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43) to demonstrate its broader applicability. |
format | Online Article Text |
id | pubmed-7820539 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7820539 2021-01-22; BMC Biol; Methodology Article; BioMed Central 2021-01-22; /pmc/articles/PMC7820539/ /pubmed/33482803 http://dx.doi.org/10.1186/s12915-020-00940-y; Text; en; © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses |
topic | Methodology Article |