
JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions

BACKGROUND: Predicting complete protein-coding genes in human DNA remains a significant challenge. Though a number of promising approaches have been investigated, an ideal suite of tools has yet to emerge that can provide near perfect levels of sensitivity and specificity at the level of whole genes...


Bibliographic Details
Main authors: Allen, Jonathan E; Majoros, William H; Pertea, Mihaela; Salzberg, Steven L
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810558/
https://www.ncbi.nlm.nih.gov/pubmed/16925843
http://dx.doi.org/10.1186/gb-2006-7-s1-s9
author Allen, Jonathan E
Majoros, William H
Pertea, Mihaela
Salzberg, Steven L
collection PubMed
description BACKGROUND: Predicting complete protein-coding genes in human DNA remains a significant challenge. Though a number of promising approaches have been investigated, an ideal suite of tools has yet to emerge that can provide near perfect levels of sensitivity and specificity at the level of whole genes. As an incremental step in this direction, it is hoped that controlled gene finding experiments in the ENCODE regions will provide a more accurate view of the relative benefits of different strategies for modeling and predicting gene structures. RESULTS: Here we describe our general-purpose eukaryotic gene finding pipeline and its major components, as well as the methodological adaptations that we found necessary in accommodating human DNA in our pipeline, noting that a similar level of effort may be necessary by ourselves and others with similar pipelines whenever a new class of genomes is presented to the community for analysis. We also describe a number of controlled experiments involving the differential inclusion of various types of evidence and feature states into our models and the resulting impact these variations have had on predictive accuracy. CONCLUSION: While in the case of the non-comparative gene finders we found that adding model states to represent specific biological features did little to enhance predictive accuracy, for our evidence-based 'combiner' program the incorporation of additional evidence tracks tended to produce significant gains in accuracy for most evidence types, suggesting that improved modeling efforts at the hidden Markov model level are of relatively little value. We relate these findings to our current plans for future research.
format Text
id pubmed-1810558
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1810558 2007-03-07 Genome Biol Research
BioMed Central 2006 2006-08-07 /pmc/articles/PMC1810558/ /pubmed/16925843 http://dx.doi.org/10.1186/gb-2006-7-s1-s9 Text en Copyright © 2006 Allen et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions
topic Research