In silico Platform for Prediction of N-, O- and C-Glycosites in Eukaryotic Protein Sequences

Bibliographic Details
Main authors: Chauhan, Jagat Singh; Rao, Alka; Raghava, Gajendra P. S.
Format: Online Article Text
Language: English
Published: Public Library of Science, 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3695939/
https://www.ncbi.nlm.nih.gov/pubmed/23840574
http://dx.doi.org/10.1371/journal.pone.0067008
author Chauhan, Jagat Singh
Rao, Alka
Raghava, Gajendra P. S.
collection PubMed
description Glycosylation is one of the most abundant and important post-translational modifications of proteins. Glycosylated proteins (glycoproteins) are involved in various biological functions such as protein folding, cell-cell interactions, cell recognition and host-pathogen interactions. Many eukaryotic glycoproteins also have therapeutic and potential technological applications. Characterization and analysis of glycosites (glycosylated residues) in these proteins is therefore of great interest to biologists. A number of in silico tools have been developed over the years to meet this need; however, better prediction tools are still needed. In this study we developed a new web server, GlycoEP, for more accurate prediction of N-linked, O-linked and C-linked glycosites in eukaryotic glycoproteins, using two larger datasets, namely standard and advanced datasets. In the standard datasets, no two glycosylated proteins are more than 40% similar; the advanced datasets are highly non-redundant, with no two glycosite patterns (as defined in Methods) sharing more than 60% similarity. Based on our results with several algorithms developed using different machine-learning techniques, we found the Support Vector Machine (SVM) to be the optimal technique for developing glycosite prediction models. Using our more stringent, non-redundant advanced datasets, the SVM-based models developed in this study achieved prediction accuracies of 84.26%, 86.87% and 91.43%, with corresponding Matthews correlation coefficients (MCC) of 0.54, 0.20 and 0.78, for N-, O- and C-linked glycosites, respectively. The best-performing models trained on the advanced datasets were then implemented in a user-friendly web server, GlycoEP (http://www.imtech.res.in/raghava/glycoep/). The server also provides prediction models developed on the standard datasets and allows users to scan input protein sequences for sequons.
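The abstract reports performance as accuracy plus MCC and mentions a sequon scan, but it does not define either. The following Python code is a minimal sketch of both ideas under stated assumptions. It is not GlycoEP's code. The scan assumes only the canonical N-linked sequon Asn-X-Ser/Thr with X ≠ Pro; how GlycoEP defines sequons for O- and C-linked sites is not described in this record. The function names `scan_n_sequons`, `accuracy` and `mcc` are hypothetical. The MCC uses the standard confusion-matrix formula.

```python
import math
import re

# Hypothetical helpers for illustration; not GlycoEP's implementation.

# Canonical N-linked sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead makes the pattern zero-width, so overlapping sequons
# (e.g. "NNST" contains both NNS and NST) are all reported.
N_SEQUON = re.compile(r"(?=(N[^P][ST]))")


def scan_n_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based Asn position, tripeptide) for every N-X-S/T sequon (X != P)."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in N_SEQUON.finditer(seq)]


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of residues (glycosites and non-glycosites) classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any row/column total of the matrix is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(scan_n_sequons("MNGTANPSQNVT"))  # [(2, 'NGT'), (10, 'NVT')]; NPS is skipped because X is Pro
    print(accuracy(5, 170, 10, 15), round(mcc(5, 170, 10, 15), 2))  # 0.875 0.22
```

The counts in the usage lines (TP = 5, TN = 170, FP = 10, FN = 15) are hypothetical and not taken from the paper. They show how an accuracy of 87.5% can coexist with an MCC of about 0.22 when negatives (non-glycosylated residues) far outnumber positives, which is why MCC is reported alongside accuracy.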
format Online
Article
Text
id pubmed-3695939
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
journal PLoS One, Research Article (published 2013-06-28)
rights © 2013 Chauhan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article