Automatic generation of gene finders for eukaryotic species

BACKGROUND: The number of sequenced eukaryotic genomes is rapidly increasing. This means that over time it will be hard to keep supplying customised gene finders for each genome. This calls for procedures to automatically generate species-specific gene finders and to re-train them as the quantity an...

Bibliographic Details
Main authors: Munch, Kasper; Krogh, Anders
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1522026/
https://www.ncbi.nlm.nih.gov/pubmed/16712739
http://dx.doi.org/10.1186/1471-2105-7-263
_version_ 1782128787871236096
author Munch, Kasper
Krogh, Anders
author_facet Munch, Kasper
Krogh, Anders
author_sort Munch, Kasper
collection PubMed
description BACKGROUND: The number of sequenced eukaryotic genomes is rapidly increasing. This means that over time it will be hard to keep supplying customised gene finders for each genome. This calls for procedures to automatically generate species-specific gene finders and to re-train them as the quantity and quality of reliable gene annotation grows. RESULTS: We present a procedure, Agene, that automatically generates a species-specific gene predictor from a set of reliable mRNA sequences and a genome. We apply a Hidden Markov model (HMM) that implements explicit length distribution modelling for all gene structure blocks using acyclic discrete phase type distributions. The state structure of each HMM is generated dynamically from an array of sub-models to include only gene features represented in the training set. CONCLUSION: Acyclic discrete phase type distributions are well suited to model sequence length distributions. The performance of each individual gene predictor on each individual genome is comparable to the best of the manually optimised species-specific gene finders. It is shown that species-specific gene finders are superior to gene finders trained on other species.
format Text
id pubmed-1522026
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-15220262006-07-28 Automatic generation of gene finders for eukaryotic species Munch, Kasper Krogh, Anders BMC Bioinformatics Methodology Article BACKGROUND: The number of sequenced eukaryotic genomes is rapidly increasing. This means that over time it will be hard to keep supplying customised gene finders for each genome. This calls for procedures to automatically generate species-specific gene finders and to re-train them as the quantity and quality of reliable gene annotation grows. RESULTS: We present a procedure, Agene, that automatically generates a species-specific gene predictor from a set of reliable mRNA sequences and a genome. We apply a Hidden Markov model (HMM) that implements explicit length distribution modelling for all gene structure blocks using acyclic discrete phase type distributions. The state structure of each HMM is generated dynamically from an array of sub-models to include only gene features represented in the training set. CONCLUSION: Acyclic discrete phase type distributions are well suited to model sequence length distributions. The performance of each individual gene predictor on each individual genome is comparable to the best of the manually optimised species-specific gene finders. It is shown that species-specific gene finders are superior to gene finders trained on other species. BioMed Central 2006-05-21 /pmc/articles/PMC1522026/ /pubmed/16712739 http://dx.doi.org/10.1186/1471-2105-7-263 Text en Copyright © 2006 Munch and Krogh; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Munch, Kasper
Krogh, Anders
Automatic generation of gene finders for eukaryotic species
title Automatic generation of gene finders for eukaryotic species
title_full Automatic generation of gene finders for eukaryotic species
title_fullStr Automatic generation of gene finders for eukaryotic species
title_full_unstemmed Automatic generation of gene finders for eukaryotic species
title_short Automatic generation of gene finders for eukaryotic species
title_sort automatic generation of gene finders for eukaryotic species
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1522026/
https://www.ncbi.nlm.nih.gov/pubmed/16712739
http://dx.doi.org/10.1186/1471-2105-7-263
work_keys_str_mv AT munchkasper automaticgenerationofgenefindersforeukaryoticspecies
AT kroghanders automaticgenerationofgenefindersforeukaryoticspecies