
GeneWaltz--A new method for reducing the false positives of gene finding

BACKGROUND: Identifying protein-coding regions in genomic sequences is an essential step in genome analysis. It is well known that the proportion of false positives among genes predicted by current methods is high, especially when the exons are short. These false positives are problematic because th...


Bibliographic Details
Main authors: Misawa, Kazuharu; Kikuno, Reiko F
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2955682/
https://www.ncbi.nlm.nih.gov/pubmed/20875138
http://dx.doi.org/10.1186/1756-0381-3-6
_version_ 1782188067813064704
author Misawa, Kazuharu
Kikuno, Reiko F
author_facet Misawa, Kazuharu
Kikuno, Reiko F
author_sort Misawa, Kazuharu
collection PubMed
description BACKGROUND: Identifying protein-coding regions in genomic sequences is an essential step in genome analysis. It is well known that the proportion of false positives among genes predicted by current methods is high, especially when the exons are short. These false positives are problematic because they waste time and resources of experimental studies. METHODS: We developed GeneWaltz, a new filtering method that reduces the risk of false positives in gene finding. GeneWaltz utilizes a codon-to-codon substitution matrix that was constructed by comparing protein-coding regions from orthologous gene pairs between mouse and human genomes. Using this matrix, a scoring scheme was developed; it assigned higher scores to coding regions and lower scores to non-coding regions. The regions with high scores were considered candidate coding regions. One-dimensional Karlin-Altschul statistics was used to test the significance of the coding regions identified by GeneWaltz. RESULTS: The proportion of false positives among genes predicted by GENSCAN and Twinscan was high, especially when the exons were short. GeneWaltz significantly reduced the ratio of false positives to all positives predicted by GENSCAN and Twinscan, especially when the exons were short. CONCLUSIONS: GeneWaltz will be helpful in experimental genomic studies. GeneWaltz binaries and the matrix are available online at http://en.sourceforge.jp/projects/genewaltz/.
format Text
id pubmed-2955682
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-29556822010-10-18 GeneWaltz--A new method for reducing the false positives of gene finding Misawa, Kazuharu Kikuno, Reiko F BioData Min Methodology BACKGROUND: Identifying protein-coding regions in genomic sequences is an essential step in genome analysis. It is well known that the proportion of false positives among genes predicted by current methods is high, especially when the exons are short. These false positives are problematic because they waste time and resources of experimental studies. METHODS: We developed GeneWaltz, a new filtering method that reduces the risk of false positives in gene finding. GeneWaltz utilizes a codon-to-codon substitution matrix that was constructed by comparing protein-coding regions from orthologous gene pairs between mouse and human genomes. Using this matrix, a scoring scheme was developed; it assigned higher scores to coding regions and lower scores to non-coding regions. The regions with high scores were considered candidate coding regions. One-dimensional Karlin-Altschul statistics was used to test the significance of the coding regions identified by GeneWaltz. RESULTS: The proportion of false positives among genes predicted by GENSCAN and Twinscan were high, especially when the exons were short. GeneWaltz significantly reduced the ratio of false positives to all positives predicted by GENSCAN and Twinscan, especially when the exons were short. CONCLUSIONS: GeneWaltz will be helpful in experimental genomic studies. GeneWaltz binaries and the matrix are available online at http://en.sourceforge.jp/projects/genewaltz/. BioMed Central 2010-09-28 /pmc/articles/PMC2955682/ /pubmed/20875138 http://dx.doi.org/10.1186/1756-0381-3-6 Text en Copyright ©2010 Misawa and Kikuno; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology
Misawa, Kazuharu
Kikuno, Reiko F
GeneWaltz--A new method for reducing the false positives of gene finding
title GeneWaltz--A new method for reducing the false positives of gene finding
title_full GeneWaltz--A new method for reducing the false positives of gene finding
title_fullStr GeneWaltz--A new method for reducing the false positives of gene finding
title_full_unstemmed GeneWaltz--A new method for reducing the false positives of gene finding
title_short GeneWaltz--A new method for reducing the false positives of gene finding
title_sort genewaltz--a new method for reducing the false positives of gene finding
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2955682/
https://www.ncbi.nlm.nih.gov/pubmed/20875138
http://dx.doi.org/10.1186/1756-0381-3-6
work_keys_str_mv AT misawakazuharu genewaltzanewmethodforreducingthefalsepositivesofgenefinding
AT kikunoreikof genewaltzanewmethodforreducingthefalsepositivesofgenefinding
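
Illustrative sketch (not part of the record): the description field above outlines a pipeline in which aligned codons are scored with a codon-to-codon substitution matrix, high-scoring regions are taken as candidate coding regions, and Karlin-Altschul statistics are used to judge significance. The Python below is a minimal sketch of that general idea under stated assumptions; it is not the GeneWaltz implementation. The score table TOY_SCORES, the DEFAULT_SCORE penalty, all function names, and the lam and k parameters are hypothetical placeholders, and the real matrix distributed with GeneWaltz is not reproduced here. The segment search uses a standard maximal-scoring-segment scan, and the p-value uses the generic Karlin-Altschul form E = K * n * exp(-lambda * S), P = 1 - exp(-E).

```python
import math

# Hypothetical toy log-odds scores for a few codon pairs.
# These values are invented for illustration; they are NOT the GeneWaltz matrix.
TOY_SCORES = {
    ("GCT", "GCT"): 2.0,  # identical codons
    ("GCT", "GCC"): 1.0,  # synonymous change (both encode alanine)
}
DEFAULT_SCORE = -1.0  # assumed penalty for any pair missing from the table


def codon_pairs(seq_a, seq_b):
    """Split two aligned, in-frame, gap-free sequences into codon pairs."""
    usable = min(len(seq_a), len(seq_b)) // 3 * 3
    return [(seq_a[i:i + 3], seq_b[i:i + 3]) for i in range(0, usable, 3)]


def pair_score(codon_a, codon_b, scores=TOY_SCORES, default=DEFAULT_SCORE):
    """Look up a codon pair, treating the table as symmetric."""
    if (codon_a, codon_b) in scores:
        return scores[(codon_a, codon_b)]
    return scores.get((codon_b, codon_a), default)


def best_segment(values):
    """Return (score, start, end) of the maximal-scoring segment, end exclusive."""
    best, best_start, best_end = 0.0, 0, 0
    running, running_start = 0.0, 0
    for i, value in enumerate(values):
        if running <= 0:
            # A non-positive prefix cannot help, so restart the segment here.
            running, running_start = value, i
        else:
            running += value
        if running > best:
            best, best_start, best_end = running, running_start, i + 1
    return best, best_start, best_end


def karlin_altschul_p(score, n, lam, k):
    """P-value of a segment score in a sequence of n positions.

    lam and k depend on the scoring system; here they are caller-supplied
    assumptions rather than values estimated from a real matrix.
    """
    expected = k * n * math.exp(-lam * score)
    return -math.expm1(-expected)  # equals 1 - exp(-expected)


if __name__ == "__main__":
    a = "AAAGCTGCTAAA"
    b = "TTTGCTGCCTTT"
    per_codon = [pair_score(x, y) for x, y in codon_pairs(a, b)]
    score, start, end = best_segment(per_codon)
    print(score, start, end)
    print(karlin_altschul_p(score, n=len(per_codon), lam=0.3, k=0.1))
```

In this sketch a region would be kept as a candidate only if its p-value falls below a threshold chosen by the user; the threshold, like lam and k, is left open because the record does not state the values used by the authors.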