
Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes

The last 20 years of advancement in sequencing technologies have led to sequencing thousands of microbial genomes, creating mountains of genetic data. While efficiency in generating the data improves almost daily, applying meaningful relationships between taxonomic and genetic entities on this scale...


Bibliographic Details
Main authors: Putman, Tim E.; Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra; Wu, Chunlei; Su, Andrew I.; Good, Benjamin M.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4822648/
https://www.ncbi.nlm.nih.gov/pubmed/27022157
http://dx.doi.org/10.1093/database/baw028
author Putman, Tim E.
Burgstaller-Muehlbacher, Sebastian
Waagmeester, Andra
Wu, Chunlei
Su, Andrew I.
Good, Benjamin M.
collection PubMed
description The last 20 years of advancement in sequencing technologies have led to the sequencing of thousands of microbial genomes, creating mountains of genetic data. While efficiency in generating the data improves almost daily, establishing meaningful relationships between taxonomic and genetic entities on this scale requires a structured and integrative approach. Currently, knowledge is distributed across a fragmented landscape of resources, from government-funded institutions such as the National Center for Biotechnology Information (NCBI) and UniProt, to topic-focused databases like the ODB3 database of prokaryotic operons, to the supplemental table of a primary publication. A major drawback of large-scale, expert-curated databases is the expense of maintaining and extending them over time. No entity apart from a major institution with stable long-term funding can consider this, and the scope of such institutions is limited considering the magnitude of microbial data being generated daily. Wikidata is an openly editable, semantic web compatible framework for knowledge representation. It is a project of the Wikimedia Foundation and offers knowledge integration capabilities ideally suited to the challenge of representing the exploding body of information about microbial genomics. We are developing a microbial-specific data model, based on Wikidata’s semantic web compatibility, which represents bacterial species, strains and the genes and gene products that define them. Currently, we have loaded 43 694 gene and 37 966 protein items for 21 species of bacteria, including the human pathogenic bacterium Chlamydia trachomatis. Using this pathogen as an example, we explore complex interactions between the pathogen, its host, associated genes, other microbes, disease and drugs using the Wikidata SPARQL endpoint. In our next phase of development, we will add another 99 bacterial genomes and their genes and gene products, totaling ∼900 000 additional entities.
This aggregation of knowledge will be a platform for community-driven collaboration, allowing the networking of microbial genetic data through the sharing of knowledge by both data and domain experts.
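The description mentions exploring gene items through the Wikidata SPARQL endpoint. The following is a minimal sketch of what such a query might look like, not the authors' actual query. It assumes gene items are modelled as "instance of" (P31) gene (Q7187) and "found in taxon" (P703) an item whose "taxon name" (P225) matches the given string; the helper names build_gene_query and build_request_url are hypothetical. The sketch only builds the query and request URL and does not contact the endpoint.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint (not quoted in the record itself).
WIKIDATA_SPARQL_URL = "https://query.wikidata.org/sparql"


def build_gene_query(taxon_name: str, limit: int = 10) -> str:
    """Return a SPARQL query listing gene items found in the named taxon.

    Assumed modelling (not confirmed by the record): genes are
    'instance of' (P31) gene (Q7187) and 'found in taxon' (P703)
    an item whose 'taxon name' (P225) equals taxon_name.
    """
    # Reject characters that would break out of the SPARQL string literal.
    if '"' in taxon_name or "\\" in taxon_name:
        raise ValueError("taxon_name must not contain quotes or backslashes")
    return (
        "SELECT ?gene ?geneLabel WHERE {\n"
        f'  ?taxon wdt:P225 "{taxon_name}" .\n'
        "  ?gene wdt:P31 wd:Q7187 ;\n"
        "        wdt:P703 ?taxon .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET URL asking the endpoint for JSON results."""
    return WIKIDATA_SPARQL_URL + "?" + urlencode({"query": query, "format": "json"})


query = build_gene_query("Chlamydia trachomatis", limit=5)
url = build_request_url(query)
print(query)
```

If the gene items are attached to strain-level items rather than to the species item, the strain's taxon name would have to be passed instead. The resulting URL can be fetched with any HTTP client.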
format Online
Article
Text
id pubmed-4822648
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4822648 2016-04-07 Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes Putman, Tim E. Burgstaller-Muehlbacher, Sebastian Waagmeester, Andra Wu, Chunlei Su, Andrew I. Good, Benjamin M. Database (Oxford) Original Article Oxford University Press 2016-03-28 /pmc/articles/PMC4822648/ /pubmed/27022157 http://dx.doi.org/10.1093/database/baw028 Text en © The Author(s) 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes
topic Original Article