Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy
BACKGROUND: Tyrosine sulfation is one of the most important posttranslational modifications. Due to its relevance to various disease developments, tyrosine sulfation has become the target for drug design. In order to facilitate efficient drug design, accurate prediction of sulfotyrosine sites is des...
Main author: | Yang, Zheng Rong |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2777180/ https://www.ncbi.nlm.nih.gov/pubmed/19874585 http://dx.doi.org/10.1186/1471-2105-10-361 |
_version_ | 1782174150246268928 |
---|---|
author | Yang, Zheng Rong |
author_facet | Yang, Zheng Rong |
author_sort | Yang, Zheng Rong |
collection | PubMed |
description | BACKGROUND: Tyrosine sulfation is one of the most important posttranslational modifications. Due to its relevance to various disease developments, tyrosine sulfation has become the target for drug design. In order to facilitate efficient drug design, accurate prediction of sulfotyrosine sites is desirable. A predictor published seven years ago has been very successful with claimed prediction accuracy of 98%. However, it has a particularly low sensitivity when predicting sulfotyrosine sites in some newly sequenced proteins. RESULTS: A new approach has been developed for predicting sulfotyrosine sites using the random forest algorithm after a careful evaluation of seven machine learning algorithms. Peptides are formed by consecutive residues symmetrically flanking tyrosine sites. They are then encoded using an amino acid hydrophobicity scale. This new approach has increased the sensitivity by 22%, the specificity by 3%, and the total prediction accuracy by 10% compared with the previous predictor using the same blind data. Meanwhile, both negative and positive predictive powers have been increased by 9%. In addition, the random forest model has an excellent feature for ranking the residues flanking tyrosine sites, hence providing more information for further investigating the tyrosine sulfation mechanism. A web tool has been implemented at for public use. CONCLUSION: The random forest algorithm is able to deliver a better model compared with the Hidden Markov Model, the support vector machine, artificial neural networks, and others for predicting sulfotyrosine sites. The success shows that the random forest algorithm together with an amino acid hydrophobicity scale encoding can be a good candidate for peptide classification. |
format | Text |
id | pubmed-2777180 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-27771802009-11-15 Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy Yang, Zheng Rong BMC Bioinformatics Research Article BACKGROUND: Tyrosine sulfation is one of the most important posttranslational modifications. Due to its relevance to various disease developments, tyrosine sulfation has become the target for drug design. In order to facilitate efficient drug design, accurate prediction of sulfotyrosine sites is desirable. A predictor published seven years ago has been very successful with claimed prediction accuracy of 98%. However, it has a particularly low sensitivity when predicting sulfotyrosine sites in some newly sequenced proteins. RESULTS: A new approach has been developed for predicting sulfotyrosine sites using the random forest algorithm after a careful evaluation of seven machine learning algorithms. Peptides are formed by consecutive residues symmetrically flanking tyrosine sites. They are then encoded using an amino acid hydrophobicity scale. This new approach has increased the sensitivity by 22%, the specificity by 3%, and the total prediction accuracy by 10% compared with the previous predictor using the same blind data. Meanwhile, both negative and positive predictive powers have been increased by 9%. In addition, the random forest model has an excellent feature for ranking the residues flanking tyrosine sites, hence providing more information for further investigating the tyrosine sulfation mechanism. A web tool has been implemented at for public use. CONCLUSION: The random forest algorithm is able to deliver a better model compared with the Hidden Markov Model, the support vector machine, artificial neural networks, and others for predicting sulfotyrosine sites. The success shows that the random forest algorithm together with an amino acid hydrophobicity scale encoding can be a good candidate for peptide classification. 
BioMed Central 2009-10-29 /pmc/articles/PMC2777180/ /pubmed/19874585 http://dx.doi.org/10.1186/1471-2105-10-361 Text en Copyright © 2009 Yang; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Yang, Zheng Rong Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy |
title | Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy |
title_full | Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy |
title_fullStr | Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy |
title_full_unstemmed | Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy |
title_short | Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy |
title_sort | predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2777180/ https://www.ncbi.nlm.nih.gov/pubmed/19874585 http://dx.doi.org/10.1186/1471-2105-10-361 |
work_keys_str_mv | AT yangzhengrong predictingsulfotyrosinesitesusingtherandomforestalgorithmwithsignificantlyimprovedpredictionaccuracy |
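
The description above says that peptides are formed from residues symmetrically flanking each tyrosine and are then encoded with an amino acid hydrophobicity scale. The following is a minimal sketch of that preprocessing step only, not the author's implementation. Several details are assumptions, because the record does not state them: the Kyte-Doolittle scale stands in for the unnamed hydrophobicity scale, the flank width of 4 is arbitrary, and the padding symbol `X`, the value 0.0 for unknown residues, and the function names `tyrosine_windows` and `encode` are all hypothetical.

```python
# Kyte-Doolittle hydropathy values. ASSUMPTION: the record does not name
# the hydrophobicity scale used, so this one is only a stand-in.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tyrosine_windows(sequence, flank=4, pad="X"):
    """Yield (position, peptide) for every tyrosine in `sequence`.

    Each peptide has `flank` residues on both sides of the tyrosine.
    Windows that run past either end of the sequence are filled with `pad`.
    ASSUMPTION: the flank width and padding symbol are illustrative.
    """
    padded = pad * flank + sequence + pad * flank
    for i, residue in enumerate(sequence):
        if residue == "Y":
            # Position i in `sequence` is position i + flank in `padded`,
            # so this slice is centred on the tyrosine.
            yield i, padded[i:i + 2 * flank + 1]


def encode(peptide, scale=KYTE_DOOLITTLE, unknown=0.0):
    """Map each residue to its hydrophobicity value.

    Residues missing from the scale (including padding) map to `unknown`.
    """
    return [scale.get(residue, unknown) for residue in peptide]


if __name__ == "__main__":
    for position, peptide in tyrosine_windows("ADEYDYG", flank=2):
        print(position, peptide, encode(peptide))
```

Each encoded peptide is a fixed-length numeric vector, which is the kind of input a random forest classifier (or any of the other algorithms named in the description) can be trained on.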