
Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy

BACKGROUND: Tyrosine sulfation is one of the most important posttranslational modifications. Because of its relevance to the development of various diseases, tyrosine sulfation has become a target for drug design. To facilitate efficient drug design, accurate prediction of sulfotyrosine sites is des...


Bibliographic Details
Main author:	Yang, Zheng Rong
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2777180/ https://www.ncbi.nlm.nih.gov/pubmed/19874585 http://dx.doi.org/10.1186/1471-2105-10-361

author	Yang, Zheng Rong
collection	PubMed
description	BACKGROUND: Tyrosine sulfation is one of the most important posttranslational modifications. Because of its relevance to the development of various diseases, tyrosine sulfation has become a target for drug design. To facilitate efficient drug design, accurate prediction of sulfotyrosine sites is desirable. A predictor published seven years ago has been very successful, with a claimed prediction accuracy of 98%. However, it has particularly low sensitivity when predicting sulfotyrosine sites in some newly sequenced proteins. RESULTS: A new approach has been developed for predicting sulfotyrosine sites using the random forest algorithm after a careful evaluation of seven machine learning algorithms. Peptides are formed by consecutive residues symmetrically flanking tyrosine sites. They are then encoded using an amino acid hydrophobicity scale. Compared with the previous predictor on the same blind data, this new approach has increased the sensitivity by 22%, the specificity by 3%, and the total prediction accuracy by 10%. Meanwhile, both negative and positive predictive powers have been increased by 9%. In addition, the random forest model can rank the residues flanking tyrosine sites, which provides more information for further investigating the tyrosine sulfation mechanism. A web tool has been implemented at for public use. CONCLUSION: The random forest algorithm is able to deliver a better model than the Hidden Markov Model, the support vector machine, artificial neural networks, and others for predicting sulfotyrosine sites. This success shows that the random forest algorithm, together with an amino acid hydrophobicity scale encoding, can be a good candidate for peptide classification.
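The abstract outlines the pipeline only at a high level: peptides of consecutive residues symmetrically flanking each tyrosine are encoded with a hydrophobicity scale, classified by a random forest, and scored by sensitivity, specificity, accuracy and predictive power. The Python sketch below illustrates those steps under stated assumptions and is not the author's implementation. The record names no window size, no specific scale and no padding rule, so the Kyte-Doolittle scale, the `half_width` parameter, the `-` padding symbol with value 0.0, and the helper names `tyrosine_windows`, `encode` and `confusion_metrics` are all hypothetical. Reading "positive and negative predictive powers" as the standard PPV and NPV is also an assumption.

```python
from typing import Dict, List, Sequence, Tuple

# Kyte-Doolittle hydropathy values. ASSUMPTION: the record only says
# "an amino acid hydrophobicity scale" and does not name the scale used.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
PAD = 0.0  # hypothetical value for padding or unknown residues


def tyrosine_windows(seq: str, half_width: int) -> List[Tuple[int, str]]:
    """Return (position, peptide) for every Y, taking `half_width` residues
    on each side; positions outside the sequence are filled with '-'."""
    windows = []
    for i, aa in enumerate(seq):
        if aa != "Y":
            continue
        peptide = "".join(
            seq[j] if 0 <= j < len(seq) else "-"
            for j in range(i - half_width, i + half_width + 1)
        )
        windows.append((i, peptide))
    return windows


def encode(peptide: str) -> List[float]:
    """Map each residue to its hydrophobicity value ('-' or unknown -> PAD)."""
    return [KYTE_DOOLITTLE.get(aa, PAD) for aa in peptide]


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Standard binary rates, with 1 meaning 'sulfotyrosine site'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # recall on true sites
        "specificity": ratio(tn, tn + fp),  # recall on non-sites
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ratio(tp, tp + fp),  # positive predictive value
        "npv": ratio(tn, tn + fn),  # negative predictive value
    }


if __name__ == "__main__":
    toy = "MDEYGKLYW"  # made-up sequence, not data from the paper
    for pos, pep in tyrosine_windows(toy, half_width=2):
        print(pos, pep, encode(pep))
    print(confusion_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

The encoded vectors could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute gives a per-position importance loosely comparable to the residue ranking the abstract mentions.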
format	Text
id	pubmed-2777180
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2777180 2009-11-15 Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy Yang, Zheng Rong BMC Bioinformatics Research Article
BioMed Central 2009-10-29 /pmc/articles/PMC2777180/ /pubmed/19874585 http://dx.doi.org/10.1186/1471-2105-10-361 Text en Copyright © 2009 Yang; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy
topic	Research Article


Similar items