Reranking candidate gene models with cross-species comparison for improved gene prediction

Bibliographic Details
Main authors: Liu, Qian; Crammer, Koby; Pereira, Fernando CN; Roos, David S
Format: Text
Language: English
Published: BioMed Central 2008
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2587481/
https://www.ncbi.nlm.nih.gov/pubmed/18854050
http://dx.doi.org/10.1186/1471-2105-9-433
collection PubMed
description
BACKGROUND: Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc.). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features.

RESULTS: We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+.

CONCLUSION: Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.
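The reranking step summarized in the abstract (take the K best models for a locus, then re-sort them with a global score that adds cross-species evidence) can be illustrated with a minimal Python sketch. It assumes a linear global score with hand-set weights; the feature definitions (protein identity to the ortholog, a splice-site agreement fraction with a hypothetical ±3-residue tolerance, and signal-peptide concordance) and all names (`CandidateModel`, `splice_agreement`, `rerank`) are illustrative assumptions, not the authors' actual scoring rules or an Evigan interface. In practice the weights would presumably be tuned on annotated loci rather than fixed by hand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CandidateModel:
    """One of the K best gene models proposed for a locus by the base gene finder."""
    name: str
    finder_score: float       # base gene-finder score (higher is better)
    protein_identity: float   # identity to the putative ortholog in [0, 1]; assumed precomputed by an aligner
    introns: list[int]        # intron positions projected onto ortholog protein coordinates (hypothetical)
    signal_peptide: bool      # whether a signal peptide is predicted for this candidate's protein


def splice_agreement(cand: list[int], orth: list[int], tol: int = 3) -> float:
    """Fraction of intron positions, counted on both sides, that have a partner within `tol` residues."""
    if not cand and not orth:
        return 1.0  # both intronless: the exon/intron structures agree
    hit_c = sum(any(abs(c - o) <= tol for o in orth) for c in cand)
    hit_o = sum(any(abs(o - c) <= tol for c in cand) for o in orth)
    return (hit_c + hit_o) / (len(cand) + len(orth))


def rerank(cands, orth_introns, orth_signal, weights):
    """Sort the K candidates by a linear global score: base score plus cross-species features."""
    def global_score(m: CandidateModel) -> float:
        features = {
            "finder": m.finder_score,
            "identity": m.protein_identity,
            "splice": splice_agreement(m.introns, orth_introns),
            "signal": 1.0 if m.signal_peptide == orth_signal else 0.0,
        }
        return sum(weights[k] * v for k, v in features.items())

    return sorted(cands, key=global_score, reverse=True)


# Toy locus with three candidates; all numbers and weights are made up for illustration.
candidates = [
    CandidateModel("A", 0.9, 0.55, [40], False),
    CandidateModel("B", 0.8, 0.92, [41, 110], True),
    CandidateModel("C", 0.7, 0.30, [], True),
]
weights = {"finder": 1.0, "identity": 2.0, "splice": 1.0, "signal": 0.5}
ranked = rerank(candidates, orth_introns=[40, 112], orth_signal=True, weights=weights)
print([m.name for m in ranked])  # ['B', 'A', 'C']
```

With these toy numbers, candidate B, which the base score ranks second, moves to the top because its intron positions and signal-peptide call agree with the ortholog. This mirrors the effect the abstract describes: a lower-scoring model can win once cross-species evidence is added to the score.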
id pubmed-2587481
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
BMC Bioinformatics, Methodology Article. BioMed Central, 2008-10-14.
Copyright © 2008 Liu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article