
GeneAlign: a coding exon prediction tool based on phylogenetical comparisons

GeneAlign is a coding exon prediction tool that predicts protein-coding genes by measuring the homology between a genomic sequence and related, already annotated sequences from other genomes. Identifying protein-coding genes is one of the most important tasks in newly sequenced genomes. Wi...
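The full abstract in this record describes candidate exons as regions flanked by the signal pairs initiation codon-GT, AG-GT and AG-STOP codon. A minimal sketch of how such candidates could be enumerated is given below. It is not GeneAlign's code: the function names, the `min_len` and `max_len` limits, the forward-strand-only scan and the absence of any reading-frame check are all assumptions made for illustration, and the CORAL comparison against annotated exons is not reproduced.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_all(seq: str, motif: str) -> List[int]:
    """Return every start position of motif in seq, overlaps included."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def candidate_exons(seq: str, min_len: int = 3,
                    max_len: int = 300) -> List[Tuple[str, int, int]]:
    """List (kind, begin, end) candidates as half-open intervals.

    Hypothetical simplification: forward strand only, no frame check.
    """
    seq = seq.upper()
    starts = find_all(seq, "ATG")                     # exon begins at the A of ATG
    acceptors = [p + 2 for p in find_all(seq, "AG")]  # exon begins right after AG
    donors = find_all(seq, "GT")                      # exon ends right before GT
    stops = [p + 3 for p in range(len(seq) - 2)
             if seq[p:p + 3] in STOP_CODONS]          # exon ends with the stop codon

    candidates = []
    for kind, begins, ends in (("initial", starts, donors),
                               ("internal", acceptors, donors),
                               ("terminal", acceptors, stops)):
        for b in begins:
            for e in ends:
                if min_len <= e - b <= max_len:
                    candidates.append((kind, b, e))
    return candidates
```

Each returned tuple gives the exon type and a half-open interval, so `seq[begin:end]` is the candidate region that a similarity test against annotated exons would then accept or reject.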


Bibliographic Details
Main authors: Hsieh, Shu Ju, Lin, Chun Yuan, Liu, Ning Han, Chow, Wei Yuan, Tang, Chuan Yi
Format: Text
Language: English
Published: Oxford University Press 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1538901/
https://www.ncbi.nlm.nih.gov/pubmed/16845010
http://dx.doi.org/10.1093/nar/gkl307
_version_ 1782129148469182464
author Hsieh, Shu Ju
Lin, Chun Yuan
Liu, Ning Han
Chow, Wei Yuan
Tang, Chuan Yi
author_facet Hsieh, Shu Ju
Lin, Chun Yuan
Liu, Ning Han
Chow, Wei Yuan
Tang, Chuan Yi
author_sort Hsieh, Shu Ju
collection PubMed
description GeneAlign is a coding exon prediction tool that predicts protein-coding genes by measuring the homology between a genomic sequence and related, already annotated sequences from other genomes. Identifying protein-coding genes is one of the most important tasks in newly sequenced genomes. With increasing numbers of gene annotations verified by experiments, it is feasible to identify genes in newly sequenced genomes by comparing them to annotated genes of phylogenetically close organisms. GeneAlign applies CORAL, a heuristic linear-time alignment tool, to determine whether regions flanked by the candidate signals (initiation codon-GT, AG-GT and AG-STOP codon) are similar to annotated coding exons. Exploiting the conservation of gene structures and the sequence homology between protein-coding regions increases the prediction accuracy. GeneAlign was tested on the Projector dataset of 491 human–mouse homologous sequence pairs. At the gene level, both the average sensitivity and the average specificity of GeneAlign are 81%, and both exceed 96% at the exon level. The rates of missing exons and wrong exons are below 1%. GeneAlign is a free tool available at .
format Text
id pubmed-1538901
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-15389012006-08-18 GeneAlign: a coding exon prediction tool based on phylogenetical comparisons Hsieh, Shu Ju Lin, Chun Yuan Liu, Ning Han Chow, Wei Yuan Tang, Chuan Yi Nucleic Acids Res Article GeneAlign is a coding exon prediction tool that predicts protein-coding genes by measuring the homology between a genomic sequence and related, already annotated sequences from other genomes. Identifying protein-coding genes is one of the most important tasks in newly sequenced genomes. With increasing numbers of gene annotations verified by experiments, it is feasible to identify genes in newly sequenced genomes by comparing them to annotated genes of phylogenetically close organisms. GeneAlign applies CORAL, a heuristic linear-time alignment tool, to determine whether regions flanked by the candidate signals (initiation codon-GT, AG-GT and AG-STOP codon) are similar to annotated coding exons. Exploiting the conservation of gene structures and the sequence homology between protein-coding regions increases the prediction accuracy. GeneAlign was tested on the Projector dataset of 491 human–mouse homologous sequence pairs. At the gene level, both the average sensitivity and the average specificity of GeneAlign are 81%, and both exceed 96% at the exon level. The rates of missing exons and wrong exons are below 1%. GeneAlign is a free tool available at . Oxford University Press 2006-07-01 2006-07-14 /pmc/articles/PMC1538901/ /pubmed/16845010 http://dx.doi.org/10.1093/nar/gkl307 Text en © The Author 2006. Published by Oxford University Press. All rights reserved
spellingShingle Article
Hsieh, Shu Ju
Lin, Chun Yuan
Liu, Ning Han
Chow, Wei Yuan
Tang, Chuan Yi
GeneAlign: a coding exon prediction tool based on phylogenetical comparisons
title GeneAlign: a coding exon prediction tool based on phylogenetical comparisons
title_full GeneAlign: a coding exon prediction tool based on phylogenetical comparisons
title_fullStr GeneAlign: a coding exon prediction tool based on phylogenetical comparisons
title_full_unstemmed GeneAlign: a coding exon prediction tool based on phylogenetical comparisons
title_short GeneAlign: a coding exon prediction tool based on phylogenetical comparisons
title_sort genealign: a coding exon prediction tool based on phylogenetical comparisons
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1538901/
https://www.ncbi.nlm.nih.gov/pubmed/16845010
http://dx.doi.org/10.1093/nar/gkl307
work_keys_str_mv AT hsiehshuju genealignacodingexonpredictiontoolbasedonphylogeneticalcomparisons
AT linchunyuan genealignacodingexonpredictiontoolbasedonphylogeneticalcomparisons
AT liuninghan genealignacodingexonpredictiontoolbasedonphylogeneticalcomparisons
AT chowweiyuan genealignacodingexonpredictiontoolbasedonphylogeneticalcomparisons
AT tangchuanyi genealignacodingexonpredictiontoolbasedonphylogeneticalcomparisons
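
The description field of this record reports exon-level sensitivity and specificity together with rates of missing and wrong exons, but does not define them. The sketch below assumes the definitions conventionally used in gene-prediction benchmarks: an exon counts as correct only when both boundaries match exactly, a missing exon is an annotated exon that no predicted exon overlaps, and a wrong exon is a predicted exon that overlaps no annotated exon. The function name and the interval representation are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple

Exon = Tuple[int, int]  # half-open interval (begin, end)


def overlaps(a: Exon, b: Exon) -> bool:
    """True when two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def ratio(numerator: int, denominator: int) -> float:
    """Safe division: an empty denominator yields 0.0."""
    return numerator / denominator if denominator else 0.0


def exon_level_metrics(annotated: Iterable[Exon],
                       predicted: Iterable[Exon]) -> Dict[str, float]:
    """Exon-level accuracy under the assumed conventional definitions."""
    ann, pred = set(annotated), set(predicted)
    correct = ann & pred  # both boundaries identical
    missing = [a for a in ann if not any(overlaps(a, p) for p in pred)]
    wrong = [p for p in pred if not any(overlaps(p, a) for a in ann)]
    return {
        "sensitivity": ratio(len(correct), len(ann)),
        "specificity": ratio(len(correct), len(pred)),
        "missing_exons": ratio(len(missing), len(ann)),
        "wrong_exons": ratio(len(wrong), len(pred)),
    }
```

Under these definitions a prediction that overlaps an annotated exon with a shifted boundary lowers sensitivity and specificity without being counted as either missing or wrong.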