
Vertebrate gene finding from multiple-species alignments using a two-level strategy

BACKGROUND: One way in which the accuracy of gene structure prediction in vertebrate DNA sequences can be improved is by analyzing alignments with multiple related species, since functional regions of genes tend to be more conserved. RESULTS: We describe DOGFISH, a vertebrate gene finder consisting...


Bibliographic Details
Main authors: Carter, David; Durbin, Richard
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810555/
https://www.ncbi.nlm.nih.gov/pubmed/16925840
http://dx.doi.org/10.1186/gb-2006-7-s1-s6
author Carter, David
Durbin, Richard
collection PubMed
description BACKGROUND: One way in which the accuracy of gene structure prediction in vertebrate DNA sequences can be improved is by analyzing alignments with multiple related species, since functional regions of genes tend to be more conserved. RESULTS: We describe DOGFISH, a vertebrate gene finder consisting of a cleanly separated site classifier and structure predictor. The classifier scores potential splice sites and other features, using sequence alignments between multiple vertebrate species, while the structure predictor hypothesizes coding transcripts by combining these scores using a simple model of gene structure. This also identifies and assigns confidence scores to possible additional exons. Performance is assessed on the ENCODE regions. We predict transcripts and exons across the whole human genome, and identify over 10,000 high confidence new coding exons not in the Ensembl gene set. CONCLUSION: We present a practical multiple species gene prediction method. Accuracy improves as additional species, up to at least eight, are introduced. The novel predictions of the whole-genome scan should support efficient experimental verification.
format Text
id pubmed-1810555
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1810555 2007-03-07 Vertebrate gene finding from multiple-species alignments using a two-level strategy Carter, David Durbin, Richard Genome Biol Research BACKGROUND: One way in which the accuracy of gene structure prediction in vertebrate DNA sequences can be improved is by analyzing alignments with multiple related species, since functional regions of genes tend to be more conserved. RESULTS: We describe DOGFISH, a vertebrate gene finder consisting of a cleanly separated site classifier and structure predictor. The classifier scores potential splice sites and other features, using sequence alignments between multiple vertebrate species, while the structure predictor hypothesizes coding transcripts by combining these scores using a simple model of gene structure. This also identifies and assigns confidence scores to possible additional exons. Performance is assessed on the ENCODE regions. We predict transcripts and exons across the whole human genome, and identify over 10,000 high confidence new coding exons not in the Ensembl gene set. CONCLUSION: We present a practical multiple species gene prediction method. Accuracy improves as additional species, up to at least eight, are introduced. The novel predictions of the whole-genome scan should support efficient experimental verification. BioMed Central 2006 2006-08-07 /pmc/articles/PMC1810555/ /pubmed/16925840 http://dx.doi.org/10.1186/gb-2006-7-s1-s6 Text en Copyright © 2006 Carter and Durbin; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Vertebrate gene finding from multiple-species alignments using a two-level strategy
topic Research