Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs
BACKGROUND: As one of the most common protein post-translational modifications, glycosylation is involved in a variety of important biological processes. Computational identification of glycosylation sites in protein sequences becomes increasingly important in the post-genomic era. A new encoding sc...
Main authors: | Chen, Yong-Zi; Tang, Yu-Rong; Sheng, Zhi-Ya; Zhang, Ziding |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2335299/ https://www.ncbi.nlm.nih.gov/pubmed/18282281 http://dx.doi.org/10.1186/1471-2105-9-101 |
_version_ | 1782152820114325504 |
---|---|
author | Chen, Yong-Zi Tang, Yu-Rong Sheng, Zhi-Ya Zhang, Ziding |
author_facet | Chen, Yong-Zi Tang, Yu-Rong Sheng, Zhi-Ya Zhang, Ziding |
author_sort | Chen, Yong-Zi |
collection | PubMed |
description | BACKGROUND: As one of the most common protein post-translational modifications, glycosylation is involved in a variety of important biological processes. Computational identification of glycosylation sites in protein sequences becomes increasingly important in the post-genomic era. A new encoding scheme was employed to improve the prediction of mucin-type O-glycosylation sites in mammalian proteins. RESULTS: A new protein bioinformatics tool, CKSAAP_OGlySite, was developed to predict mucin-type O-glycosylation serine/threonine (S/T) sites in mammalian proteins. Using the composition of k-spaced amino acid pairs (CKSAAP) based encoding scheme, the proposed method was trained and tested in a new and stringent O-glycosylation dataset with the assistance of Support Vector Machine (SVM). When the ratio of O-glycosylation to non-glycosylation sites in training datasets was set as 1:1, 10-fold cross-validation tests showed that the proposed method yielded a high accuracy of 83.1% and 81.4% in predicting O-glycosylated S and T sites, respectively. Based on the same datasets, CKSAAP_OGlySite resulted in a higher accuracy than the conventional binary encoding based method (about +5.0%). When trained and tested in 1:5 datasets, the CKSAAP encoding showed a more significant improvement than the binary encoding. We also merged the training datasets of S and T sites and integrated the prediction of S and T sites into one single predictor (i.e. S+T predictor). Either in 1:1 or 1:5 datasets, the performance of this S+T predictor was always slightly better than those predictors where S and T sites were independently predicted, suggesting that the molecular recognition of O-glycosylated S/T sites seems to be similar and the increase of the S+T predictor's accuracy may be a result of expanded training datasets. Moreover, CKSAAP_OGlySite was also shown to have better performance when benchmarked against two existing predictors. 
CONCLUSION: Because of CKSAAP encoding's ability of reflecting characteristics of the sequences surrounding mucin-type O-glycosylation sites, CKSAAP_OGlySite has been proved more powerful than the conventional binary encoding based method. This suggests that it can be used as a competitive mucin-type O-glycosylation site predictor to the biological community. CKSAAP_OGlySite is now available at . |
format | Text |
id | pubmed-2335299 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-23352992008-04-28 Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs Chen, Yong-Zi Tang, Yu-Rong Sheng, Zhi-Ya Zhang, Ziding BMC Bioinformatics Research Article BACKGROUND: As one of the most common protein post-translational modifications, glycosylation is involved in a variety of important biological processes. Computational identification of glycosylation sites in protein sequences becomes increasingly important in the post-genomic era. A new encoding scheme was employed to improve the prediction of mucin-type O-glycosylation sites in mammalian proteins. RESULTS: A new protein bioinformatics tool, CKSAAP_OGlySite, was developed to predict mucin-type O-glycosylation serine/threonine (S/T) sites in mammalian proteins. Using the composition of k-spaced amino acid pairs (CKSAAP) based encoding scheme, the proposed method was trained and tested in a new and stringent O-glycosylation dataset with the assistance of Support Vector Machine (SVM). When the ratio of O-glycosylation to non-glycosylation sites in training datasets was set as 1:1, 10-fold cross-validation tests showed that the proposed method yielded a high accuracy of 83.1% and 81.4% in predicting O-glycosylated S and T sites, respectively. Based on the same datasets, CKSAAP_OGlySite resulted in a higher accuracy than the conventional binary encoding based method (about +5.0%). When trained and tested in 1:5 datasets, the CKSAAP encoding showed a more significant improvement than the binary encoding. We also merged the training datasets of S and T sites and integrated the prediction of S and T sites into one single predictor (i.e. S+T predictor). 
Either in 1:1 or 1:5 datasets, the performance of this S+T predictor was always slightly better than those predictors where S and T sites were independently predicted, suggesting that the molecular recognition of O-glycosylated S/T sites seems to be similar and the increase of the S+T predictor's accuracy may be a result of expanded training datasets. Moreover, CKSAAP_OGlySite was also shown to have better performance when benchmarked against two existing predictors. CONCLUSION: Because of CKSAAP encoding's ability of reflecting characteristics of the sequences surrounding mucin-type O-glycosylation sites, CKSAAP_OGlySite has been proved more powerful than the conventional binary encoding based method. This suggests that it can be used as a competitive mucin-type O-glycosylation site predictor to the biological community. CKSAAP_OGlySite is now available at . BioMed Central 2008-02-18 /pmc/articles/PMC2335299/ /pubmed/18282281 http://dx.doi.org/10.1186/1471-2105-9-101 Text en Copyright © 2008 Chen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Chen, Yong-Zi Tang, Yu-Rong Sheng, Zhi-Ya Zhang, Ziding Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs |
title | Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs |
title_full | Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs |
title_fullStr | Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs |
title_full_unstemmed | Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs |
title_short | Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs |
title_sort | prediction of mucin-type o-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2335299/ https://www.ncbi.nlm.nih.gov/pubmed/18282281 http://dx.doi.org/10.1186/1471-2105-9-101 |
work_keys_str_mv | AT chenyongzi predictionofmucintypeoglycosylationsitesinmammalianproteinsusingthecompositionofkspacedaminoacidpairs AT tangyurong predictionofmucintypeoglycosylationsitesinmammalianproteinsusingthecompositionofkspacedaminoacidpairs AT shengzhiya predictionofmucintypeoglycosylationsitesinmammalianproteinsusingthecompositionofkspacedaminoacidpairs AT zhangziding predictionofmucintypeoglycosylationsitesinmammalianproteinsusingthecompositionofkspacedaminoacidpairs |
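
The description field above names the composition of k-spaced amino acid pairs (CKSAAP) as the encoding but does not spell out how it is computed. The sketch below illustrates the general idea only: for each gap size k, count the residue pairs separated by k positions in a sequence fragment and convert the counts to frequencies. The function name `cksaap_encode`, the 20-letter alphabet, the default `k_max`, and the normalization by the number of valid pairs are all assumptions made for illustration; the record does not state the window length, the range of k, or the alphabet the authors used.

```python
from itertools import product

# Assumed alphabet: the 20 standard amino acids (the record does not specify it).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature positions are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap_encode(fragment, k_max=2):
    """Return pair frequencies for gaps k = 0..k_max (hypothetical helper).

    For each k, residues at positions i and i + k + 1 form a "k-spaced" pair.
    The result concatenates one 400-value block per k.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(fragment) - k - 1):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
                total += 1
        # Assumed normalization: divide by the number of valid pairs for this k.
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


# Usage: a toy fragment, gaps k = 0 and k = 1 (2 blocks of 400 values).
vector = cksaap_encode("ACAC", k_max=1)
print(len(vector))
```

A vector like this could then be passed to any standard classifier; the record says the authors used a Support Vector Machine with 10-fold cross-validation, but the kernel and parameters are not given here.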