
Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy

BACKGROUND: Tyrosine sulfation is one of the most important posttranslational modifications. Due to its relevance to various disease developments, tyrosine sulfation has become the target for drug design. In order to facilitate efficient drug design, accurate prediction of sulfotyrosine sites is des...


Bibliographic Details
Main Author: Yang, Zheng Rong
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2777180/
https://www.ncbi.nlm.nih.gov/pubmed/19874585
http://dx.doi.org/10.1186/1471-2105-10-361
_version_ 1782174150246268928
author Yang, Zheng Rong
author_facet Yang, Zheng Rong
author_sort Yang, Zheng Rong
collection PubMed
description BACKGROUND: Tyrosine sulfation is one of the most important posttranslational modifications. Due to its relevance to various disease developments, tyrosine sulfation has become the target for drug design. In order to facilitate efficient drug design, accurate prediction of sulfotyrosine sites is desirable. A predictor published seven years ago has been very successful with claimed prediction accuracy of 98%. However, it has a particularly low sensitivity when predicting sulfotyrosine sites in some newly sequenced proteins. RESULTS: A new approach has been developed for predicting sulfotyrosine sites using the random forest algorithm after a careful evaluation of seven machine learning algorithms. Peptides are formed by consecutive residues symmetrically flanking tyrosine sites. They are then encoded using an amino acid hydrophobicity scale. This new approach has increased the sensitivity by 22%, the specificity by 3%, and the total prediction accuracy by 10% compared with the previous predictor using the same blind data. Meanwhile, both negative and positive predictive powers have been increased by 9%. In addition, the random forest model has an excellent feature for ranking the residues flanking tyrosine sites, hence providing more information for further investigating the tyrosine sulfation mechanism. A web tool has been implemented at for public use. CONCLUSION: The random forest algorithm is able to deliver a better model compared with the Hidden Markov Model, the support vector machine, artificial neural networks, and others for predicting sulfotyrosine sites. The success shows that the random forest algorithm together with an amino acid hydrophobicity scale encoding can be a good candidate for peptide classification.
format Text
id pubmed-2777180
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-27771802009-11-15 Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy Yang, Zheng Rong BMC Bioinformatics Research Article BACKGROUND: Tyrosine sulfation is one of the most important posttranslational modifications. Due to its relevance to various disease developments, tyrosine sulfation has become the target for drug design. In order to facilitate efficient drug design, accurate prediction of sulfotyrosine sites is desirable. A predictor published seven years ago has been very successful with claimed prediction accuracy of 98%. However, it has a particularly low sensitivity when predicting sulfotyrosine sites in some newly sequenced proteins. RESULTS: A new approach has been developed for predicting sulfotyrosine sites using the random forest algorithm after a careful evaluation of seven machine learning algorithms. Peptides are formed by consecutive residues symmetrically flanking tyrosine sites. They are then encoded using an amino acid hydrophobicity scale. This new approach has increased the sensitivity by 22%, the specificity by 3%, and the total prediction accuracy by 10% compared with the previous predictor using the same blind data. Meanwhile, both negative and positive predictive powers have been increased by 9%. In addition, the random forest model has an excellent feature for ranking the residues flanking tyrosine sites, hence providing more information for further investigating the tyrosine sulfation mechanism. A web tool has been implemented at for public use. CONCLUSION: The random forest algorithm is able to deliver a better model compared with the Hidden Markov Model, the support vector machine, artificial neural networks, and others for predicting sulfotyrosine sites. The success shows that the random forest algorithm together with an amino acid hydrophobicity scale encoding can be a good candidate for peptide classification. 
BioMed Central 2009-10-29 /pmc/articles/PMC2777180/ /pubmed/19874585 http://dx.doi.org/10.1186/1471-2105-10-361 Text en Copyright © 2009 Yang; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Yang, Zheng Rong
Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy
title Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy
title_full Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy
title_fullStr Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy
title_full_unstemmed Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy
title_short Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy
title_sort predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2777180/
https://www.ncbi.nlm.nih.gov/pubmed/19874585
http://dx.doi.org/10.1186/1471-2105-10-361
work_keys_str_mv AT yangzhengrong predictingsulfotyrosinesitesusingtherandomforestalgorithmwithsignificantlyimprovedpredictionaccuracy
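
The abstract in this record describes forming peptides from residues symmetrically flanking each tyrosine, encoding them with an amino acid hydrophobicity scale, and reporting sensitivity, specificity, total accuracy, and positive and negative predictive power. The following is a minimal sketch of those steps, not the author's implementation. Several things are assumptions made for illustration: the Kyte-Doolittle scale (the abstract does not name the scale used), the flank size of 4, the padding symbol `X` with value 0.0, and all function names.

```python
# Kyte-Doolittle hydropathy index. Using this particular scale is an
# assumption; the abstract only says "an amino acid hydrophobicity scale".
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

PAD = "X"  # hypothetical padding symbol for windows that overrun the sequence


def tyrosine_windows(sequence, flank=4):
    """Return (position, peptide) pairs, one per tyrosine in `sequence`.

    Each peptide holds `flank` residues on either side of the tyrosine,
    so its length is 2 * flank + 1. The default flank is illustrative.
    """
    padded = PAD * flank + sequence + PAD * flank
    return [
        (i, padded[i : i + 2 * flank + 1])
        for i, residue in enumerate(sequence)
        if residue == "Y"
    ]


def encode(peptide, scale=KYTE_DOOLITTLE, pad_value=0.0):
    """Map each residue to its hydrophobicity value (padding -> pad_value)."""
    return [scale.get(residue, pad_value) for residue in peptide]


def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluation_metrics(y_true, y_pred):
    """Compute the five measures named in the abstract from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
        "positive_predictive_power": _ratio(tp, tp + fp),
        "negative_predictive_power": _ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Toy sequence, invented purely to exercise the functions.
    for position, peptide in tyrosine_windows("MADEYGLKYA", flank=2):
        print(position, peptide, encode(peptide))
    print(evaluation_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

The resulting fixed-length numeric vectors could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`; the abstract does not say which software was used, so that pairing is only a suggestion.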