Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA

BACKGROUND: Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise....

Bibliographic Details
Main authors: Djebali, Sarah, Delaplace, Franck, Crollius, Hugues Roest
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810556/
https://www.ncbi.nlm.nih.gov/pubmed/16925841
http://dx.doi.org/10.1186/gb-2006-7-s1-s7
author Djebali, Sarah
Delaplace, Franck
Crollius, Hugues Roest
collection PubMed
description BACKGROUND: Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism. RESULTS: We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts. CONCLUSION: We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement.
format Text
id pubmed-1810556
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1810556 2007-03-07 Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA Djebali, Sarah Delaplace, Franck Crollius, Hugues Roest Genome Biol Research BioMed Central 2006 2006-08-07 /pmc/articles/PMC1810556/ /pubmed/16925841 http://dx.doi.org/10.1186/gb-2006-7-s1-s7 Text en Copyright © 2006 Djebali et al.; licensee BioMed Central Ltd.
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA
topic Research