An unsupervised classification scheme for improving predictions of prokaryotic TIS

BACKGROUND: Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for imp...

Bibliographic Details
Main Authors:	Tech, Maike, Meinicke, Peter
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1434772/ https://www.ncbi.nlm.nih.gov/pubmed/16526950 http://dx.doi.org/10.1186/1471-2105-7-121

_version_	1782127255235854336
author	Tech, Maike Meinicke, Peter
author_facet	Tech, Maike Meinicke, Peter
author_sort	Tech, Maike
collection	PubMed
description	BACKGROUND: Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for improving the annotation of prokaryotic TIS. However, inherent difficulties of these approaches arise from the considerable variation of TIS characteristics across different species. Therefore prior assumptions about the properties of prokaryotic gene starts may cause suboptimal predictions for newly sequenced genomes with TIS signals differing from those of well-investigated genomes. RESULTS: We introduce a clustering algorithm for completely unsupervised scoring of potential TIS, based on positionally smoothed probability matrices. The algorithm requires an initial gene prediction and the genomic sequence of the organism to perform the reannotation. As compared with other methods for improving predictions of gene starts in bacterial genomes, our approach is not based on any specific assumptions about prokaryotic TIS. Despite the generality of the underlying algorithm, the prediction rate of our method is competitive on experimentally verified test data from E. coli and B. subtilis. Regarding genomes with high G+C content, in contrast to some previously proposed methods, our algorithm also provides good performance on P. aeruginosa, B. pseudomallei and R. solanacearum. CONCLUSION: On reliable test data we showed that our method provides good results in post-processing the predictions of the widely-used program GLIMMER. The underlying clustering algorithm is robust with respect to variations in the initial TIS annotation and does not require specific assumptions about prokaryotic gene starts. These features are particularly useful on genomes with high G+C content. The algorithm has been implemented in the tool »TICO« (TIs COrrector) which is publicly available from our web site.
format	Text
id	pubmed-1434772
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1434772 2006-04-21 An unsupervised classification scheme for improving predictions of prokaryotic TIS Tech, Maike Meinicke, Peter BMC Bioinformatics Methodology Article BioMed Central 2006-03-09 /pmc/articles/PMC1434772/ /pubmed/16526950 http://dx.doi.org/10.1186/1471-2105-7-121 Text en Copyright © 2006 Tech and Meinicke; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Tech, Maike Meinicke, Peter An unsupervised classification scheme for improving predictions of prokaryotic TIS
title	An unsupervised classification scheme for improving predictions of prokaryotic TIS
title_sort	unsupervised classification scheme for improving predictions of prokaryotic tis
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1434772/ https://www.ncbi.nlm.nih.gov/pubmed/16526950 http://dx.doi.org/10.1186/1471-2105-7-121
work_keys_str_mv	AT techmaike anunsupervisedclassificationschemeforimprovingpredictionsofprokaryotictis AT meinickepeter anunsupervisedclassificationschemeforimprovingpredictionsofprokaryotictis AT techmaike unsupervisedclassificationschemeforimprovingpredictionsofprokaryotictis AT meinickepeter unsupervisedclassificationschemeforimprovingpredictionsofprokaryotictis
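
Illustrative note on the description field: the abstract names a clustering scheme built on positionally smoothed probability matrices but gives no algorithmic detail. The Python sketch below is one possible reading, not the TICO implementation. It rests on several assumptions that do not come from the record: Gaussian smoothing of base counts across neighbouring positions, two classes ("strong" for the currently chosen start of each gene, "weak" for all other candidates), log-odds scoring, and hard reassignment of each gene to its best-scoring candidate. All function names and parameters (`count_matrix`, `smooth_positions`, `recluster`, `sigma`, `pseudocount`) are hypothetical.

```python
import math

ALPHABET = "ACGT"


def count_matrix(windows, length, pseudocount=1.0):
    """Per-position base counts for equal-length sequence windows."""
    counts = [[pseudocount] * len(ALPHABET) for _ in range(length)]
    for window in windows:
        for pos, base in enumerate(window):
            counts[pos][ALPHABET.index(base)] += 1.0
    return counts


def smooth_positions(counts, sigma=1.0):
    """Assumed smoothing: Gaussian-weighted average over neighbouring positions."""
    length = len(counts)
    smoothed = []
    for i in range(length):
        weights = [math.exp(-0.5 * ((i - j) / sigma) ** 2) for j in range(length)]
        total = sum(weights)
        smoothed.append([
            sum(weights[j] * counts[j][b] for j in range(length)) / total
            for b in range(len(ALPHABET))
        ])
    return smoothed


def to_probabilities(counts):
    """Normalise each position's counts to a probability distribution."""
    return [[c / sum(row) for c in row] for row in counts]


def log_odds(window, strong, weak):
    """Sum of per-position log ratios between the two class matrices."""
    return sum(
        math.log(strong[pos][ALPHABET.index(base)] / weak[pos][ALPHABET.index(base)])
        for pos, base in enumerate(window)
    )


def recluster(candidates, initial, sigma=1.0, max_iter=10):
    """candidates[g] lists the windows around each candidate start of gene g;
    initial[g] is the index of the start taken from the initial gene prediction."""
    length = len(candidates[0][0])
    chosen = list(initial)
    for _ in range(max_iter):
        strong_windows = [c[k] for c, k in zip(candidates, chosen)]
        weak_windows = [
            w for c, k in zip(candidates, chosen)
            for i, w in enumerate(c) if i != k
        ]
        strong = to_probabilities(
            smooth_positions(count_matrix(strong_windows, length), sigma))
        weak = to_probabilities(
            smooth_positions(count_matrix(weak_windows, length), sigma))
        # Reassign each gene to its highest-scoring candidate start.
        updated = [
            max(range(len(c)), key=lambda i: log_odds(c[i], strong, weak))
            for c in candidates
        ]
        if updated == chosen:
            break
        chosen = updated
    return chosen


# Toy data (invented for illustration): the fourth gene is initially misannotated.
toy_candidates = [["GGAG", "TTCT"], ["GGAG", "TTTT"], ["GGAG", "CTTT"], ["TTTT", "GGAG"]]
toy_result = recluster(toy_candidates, [0, 0, 0, 0], sigma=0.5)
```

On this toy input the fourth gene's start moves to its second candidate, because the shared "GGAG" pattern dominates the strong-class matrix after the first pass. Real use would need windows extracted from the genomic sequence around each in-frame start codon, which is outside the scope of this sketch.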
