
Glycosylation site prediction using ensembles of Support Vector Machine classifiers

BACKGROUND: Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization, to ligand recognition and cell-cell interactions. Exp...


Bibliographic Details
Main Authors: Caragea, Cornelia; Sinapov, Jivko; Silvescu, Adrian; Dobbs, Drena; Honavar, Vasant
Format: Text
Language: English
Published: BioMed Central 2007
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2220009/
https://www.ncbi.nlm.nih.gov/pubmed/17996106
http://dx.doi.org/10.1186/1471-2105-8-438
author Caragea, Cornelia
Sinapov, Jivko
Silvescu, Adrian
Dobbs, Drena
Honavar, Vasant
collection PubMed
description BACKGROUND: Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization, to ligand recognition and cell-cell interactions. Experimental identification of glycosylation sites is expensive and laborious. Hence, there is significant interest in the development of computational methods for reliable prediction of glycosylation sites from amino acid sequences. RESULTS: We explore machine learning methods for training classifiers to predict the amino acid residues that are likely to be glycosylated using information derived from the target amino acid residue and its sequence neighbors. We compare the performance of Support Vector Machine classifiers and ensembles of Support Vector Machine classifiers trained on a dataset of experimentally determined N-linked, O-linked, and C-linked glycosylation sites extracted from O-GlycBase version 6.00, a database of 242 proteins from several different species. The results of our experiments show that the ensembles of Support Vector Machine classifiers outperform single Support Vector Machine classifiers on the problem of predicting glycosylation sites in terms of a range of standard measures for comparing the performance of classifiers. The resulting methods have been implemented in EnsembleGly, a web server for glycosylation site prediction. CONCLUSION: Ensembles of Support Vector Machine classifiers offer an accurate and reliable approach to automated identification of putative glycosylation sites in glycoprotein sequences.
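The abstract describes predicting glycosylation from "the target amino acid residue and its sequence neighbors" and combining several classifiers into an ensemble. A minimal sketch of that input representation follows. It is not taken from the paper: the window half-width, the `-` padding symbol, the one-hot encoding, the majority-vote rule, and all function names are assumptions chosen for illustration. Only the candidate residues (Asn for N-linked, Ser/Thr for O-linked, Trp for C-linked) are standard biochemistry.

```python
from typing import Dict, List, Tuple

# Residues that can carry each linkage type (standard biochemistry).
CANDIDATES: Dict[str, str] = {"N-linked": "N", "O-linked": "ST", "C-linked": "W"}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed padding symbol for positions past the sequence ends


def extract_windows(seq: str, linkage: str, half_width: int = 10) -> List[Tuple[int, str]]:
    """Return (0-based position, window) for every candidate residue in seq."""
    padded = PAD * half_width + seq + PAD * half_width
    targets = CANDIDATES[linkage]
    windows = []
    for i, residue in enumerate(seq):
        if residue in targets:
            # In padded coordinates the target sits at i + half_width,
            # so a window centred on it starts at index i.
            windows.append((i, padded[i : i + 2 * half_width + 1]))
    return windows


def one_hot(window: str) -> List[int]:
    """Encode a window as concatenated 21-way indicator vectors (20 residues + pad).

    Symbols outside the alphabet (for example 'X') encode as all zeros.
    """
    alphabet = AMINO_ACIDS + PAD
    vec: List[int] = []
    for ch in window:
        vec.extend(1 if ch == a else 0 for a in alphabet)
    return vec


def majority_vote(votes: List[int]) -> int:
    """Combine 0/1 predictions from ensemble members; ties resolve to 0."""
    return 1 if sum(votes) * 2 > len(votes) else 0
```

Each encoded window would then be passed to the individual classifiers, and `majority_vote` stands in for whatever combination rule the ensemble actually uses.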
format Text
id pubmed-2220009
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2220009 2008-01-31 Glycosylation site prediction using ensembles of Support Vector Machine classifiers. Caragea, Cornelia; Sinapov, Jivko; Silvescu, Adrian; Dobbs, Drena; Honavar, Vasant. BMC Bioinformatics. Research Article. BioMed Central 2007-11-09 /pmc/articles/PMC2220009/ /pubmed/17996106 http://dx.doi.org/10.1186/1471-2105-8-438 Text en
Copyright © 2007 Caragea et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Glycosylation site prediction using ensembles of Support Vector Machine classifiers
topic Research Article