Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes

BACKGROUND: Computational prediction methods are currently used to identify genes in prokaryote genomes. However, identification of the correct translation initiation sites remains a difficult task. Accurate translation initiation sites (TISs) are important not only for the annotation of unknown pro...

Bibliographic Details
Main authors:	Makita, Yuko; de Hoon, Michiel JL; Danchin, Antoine
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1805508/ https://www.ncbi.nlm.nih.gov/pubmed/17286872 http://dx.doi.org/10.1186/1471-2105-8-47

_version_	1782132485348392960
author	Makita, Yuko; de Hoon, Michiel JL; Danchin, Antoine
author_facet	Makita, Yuko; de Hoon, Michiel JL; Danchin, Antoine
author_sort	Makita, Yuko
collection	PubMed
description	BACKGROUND: Computational prediction methods are currently used to identify genes in prokaryote genomes. However, identification of the correct translation initiation sites remains a difficult task. Accurate translation initiation sites (TISs) are important not only for the annotation of unknown proteins but also for the prediction of operons, promoters, and small non-coding RNA genes, as this typically makes use of the intergenic distance. A further problem is that most existing methods are optimized for Escherichia coli data sets; applying these methods to newly sequenced bacterial genomes may not result in an equivalent level of accuracy. RESULTS: Based on a biological representation of the translation process, we applied Bayesian statistics to create a score function for predicting translation initiation sites. In contrast to existing programs, our combination of methods uses supervised learning to optimally use the set of known translation initiation sites. We combined the Ribosome Binding Site (RBS) sequence, the distance between the translation initiation site and the RBS sequence, the base composition of the start codon, the nucleotide composition (A-rich sequences) following start codons, and the expected distribution of the protein length in a Bayesian scoring function. To further increase the prediction accuracy, we also took into account the operon orientation. The outcome of the procedure achieved a prediction accuracy of 93.2% in 858 E. coli genes from the EcoGene data set and 92.7% accuracy in a data set of 1243 Bacillus subtilis 'non-y' genes. We confirmed the performance in the GC-rich Gamma-Proteobacteria Herminiimonas arsenicoxydans, Pseudomonas aeruginosa, and Burkholderia pseudomallei K96243. CONCLUSION: Hon-yaku, being based on a careful choice of elements important in translation, improved the prediction accuracy in B. subtilis data sets and other bacteria except for E. coli. We believe that most remaining mispredictions are due to atypical ribosomal binding sequences used in specific translation control processes, or likely errors in the training data sets.
format	Text
id	pubmed-1805508
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1805508 2007-03-13 Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes Makita, Yuko; de Hoon, Michiel JL; Danchin, Antoine BMC Bioinformatics Methodology Article BACKGROUND: Computational prediction methods are currently used to identify genes in prokaryote genomes. However, identification of the correct translation initiation sites remains a difficult task. Accurate translation initiation sites (TISs) are important not only for the annotation of unknown proteins but also for the prediction of operons, promoters, and small non-coding RNA genes, as this typically makes use of the intergenic distance. A further problem is that most existing methods are optimized for Escherichia coli data sets; applying these methods to newly sequenced bacterial genomes may not result in an equivalent level of accuracy. RESULTS: Based on a biological representation of the translation process, we applied Bayesian statistics to create a score function for predicting translation initiation sites. In contrast to existing programs, our combination of methods uses supervised learning to optimally use the set of known translation initiation sites. We combined the Ribosome Binding Site (RBS) sequence, the distance between the translation initiation site and the RBS sequence, the base composition of the start codon, the nucleotide composition (A-rich sequences) following start codons, and the expected distribution of the protein length in a Bayesian scoring function. To further increase the prediction accuracy, we also took into account the operon orientation. The outcome of the procedure achieved a prediction accuracy of 93.2% in 858 E. coli genes from the EcoGene data set and 92.7% accuracy in a data set of 1243 Bacillus subtilis 'non-y' genes. We confirmed the performance in the GC-rich Gamma-Proteobacteria Herminiimonas arsenicoxydans, Pseudomonas aeruginosa, and Burkholderia pseudomallei K96243. CONCLUSION: Hon-yaku, being based on a careful choice of elements important in translation, improved the prediction accuracy in B. subtilis data sets and other bacteria except for E. coli. We believe that most remaining mispredictions are due to atypical ribosomal binding sequences used in specific translation control processes, or likely errors in the training data sets. BioMed Central 2007-02-08 /pmc/articles/PMC1805508/ /pubmed/17286872 http://dx.doi.org/10.1186/1471-2105-8-47 Text en Copyright © 2007 Makita et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Makita, Yuko de Hoon, Michiel JL Danchin, Antoine Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes
title	Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes
title_full	Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes
title_fullStr	Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes
title_full_unstemmed	Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes
title_short	Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes
title_sort	hon-yaku: a biology-driven bayesian methodology for identifying translation initiation sites in prokaryotes
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1805508/ https://www.ncbi.nlm.nih.gov/pubmed/17286872 http://dx.doi.org/10.1186/1471-2105-8-47
work_keys_str_mv	AT makitayuko honyakuabiologydrivenbayesianmethodologyforidentifyingtranslationinitiationsitesinprokaryotes AT dehoonmichieljl honyakuabiologydrivenbayesianmethodologyforidentifyingtranslationinitiationsitesinprokaryotes AT danchinantoine honyakuabiologydrivenbayesianmethodologyforidentifyingtranslationinitiationsitesinprokaryotes
