Next generation models for storage and representation of microbial biological annotation

BACKGROUND: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high-quality annotation submissions to GenBank/EMBL supported by...

Bibliographic Details
Main Authors:	Quest, Daniel J; Land, Miriam L; Brettin, Thomas S; Cottingham, Robert W
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3026362/ https://www.ncbi.nlm.nih.gov/pubmed/20946598 http://dx.doi.org/10.1186/1471-2105-11-S6-S15

author	Quest, Daniel J; Land, Miriam L; Brettin, Thomas S; Cottingham, Robert W
collection	PubMed
description	BACKGROUND: Traditional genome annotation systems were developed in a very different computing era, one where the World Wide Web was just emerging. Consequently, these systems are built as centralized black boxes focused on generating high-quality annotation submissions to GenBank/EMBL supported by expert manual curation. The exponential growth of sequence data drives a growing need for increasingly high-quality, automatically generated annotation. Typical annotation pipelines use traditional database technologies, clustered computing resources, Perl, C, and UNIX file systems to process raw sequence data, identify genes, and predict and categorize gene function. These technologies tightly couple the annotation software system to hardware and third-party software (e.g. relational database systems and schemas). This makes annotation systems hard to reproduce, inflexible to modification over time, difficult to assess, difficult to partition across multiple geographic sites, and difficult to understand for those who are not domain experts. These systems are not readily open to scrutiny and therefore not scientifically tractable. The advent of Semantic Web standards such as Resource Description Framework (RDF) and OWL Web Ontology Language (OWL) enables us to construct systems that address these challenges in a new, comprehensive way.
RESULTS: Here, we develop a framework for linking traditional data to OWL-based ontologies in genome annotation. We show how data standards can decouple hardware and third-party software tools from annotation pipelines, thereby making annotation pipelines easier to reproduce and assess. An illustrative example shows how TURTLE (Terse RDF Triple Language) can be used as a human-readable, yet semantically aware, equivalent to GenBank/EMBL files.
CONCLUSIONS: The power of this approach lies in its ability to assemble annotation data from multiple databases across multiple locations into a representation that is understandable to researchers. In this way, all researchers, experimental and computational, will more easily understand the informatics processes that construct genome annotations and ultimately be able to help improve the systems that produce them.
format	Text
id	pubmed-3026362
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3026362 2011-01-26 Next generation models for storage and representation of microbial biological annotation Quest, Daniel J; Land, Miriam L; Brettin, Thomas S; Cottingham, Robert W BMC Bioinformatics Proceedings BioMed Central 2010-10-07 /pmc/articles/PMC3026362/ /pubmed/20946598 http://dx.doi.org/10.1186/1471-2105-11-S6-S15 Text en Copyright ©2010 Quest et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Next generation models for storage and representation of microbial biological annotation
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3026362/ https://www.ncbi.nlm.nih.gov/pubmed/20946598 http://dx.doi.org/10.1186/1471-2105-11-S6-S15
