A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs
Protein O-GlcNAcylation, involving the β-attachment of single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular level investigation of the basis for OGT's substrate specificity shou...
Main authors: | Kao, Hui-Ju; Huang, Chien-Hsun; Bretaña, Neil Arvin; Lu, Cheng-Tsung; Huang, Kai-Yao; Weng, Shun-Long; Lee, Tzong-Yi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4682369/ https://www.ncbi.nlm.nih.gov/pubmed/26680539 http://dx.doi.org/10.1186/1471-2105-16-S18-S10 |
_version_ | 1782405876875788288 |
---|---|
author | Kao, Hui-Ju Huang, Chien-Hsun Bretaña, Neil Arvin Lu, Cheng-Tsung Huang, Kai-Yao Weng, Shun-Long Lee, Tzong-Yi |
collection | PubMed |
description | Protein O-GlcNAcylation, the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular-level investigation of the basis for OGT's substrate specificity should aid understanding of how O-GlcNAc contributes to diverse cellular processes. Because an increasing number of O-GlcNAcylated peptides with site-specific information have been identified by mass spectrometry (MS)-based proteomics, we were motivated to characterize the substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, OGlycBase and UniProtKB. After conserved motifs were detected using maximal dependence decomposition, a profile hidden Markov model (profile HMM) was adopted to learn a first-layer model for each identified OGT substrate motif. A support vector machine (SVM) was then used to generate a second-layer model learned from the output values of the first-layer profile HMMs. The two-layered predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (84.05%) and outperform other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. The method has been implemented as a web-based system, OGTSite, which is freely available at http://csb.cse.yzu.edu.tw/OGTSite/. |
format | Online Article Text |
id | pubmed-4682369 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4682369 2015-12-21 BMC Bioinformatics Research BioMed Central 2015-12-09 /pmc/articles/PMC4682369/ /pubmed/26680539 http://dx.doi.org/10.1186/1471-2105-16-S18-S10 Text en Copyright © 2015 Kao et al.; http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4682369/ https://www.ncbi.nlm.nih.gov/pubmed/26680539 http://dx.doi.org/10.1186/1471-2105-16-S18-S10 |
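
The description field reports sensitivity, specificity, and accuracy from a five-fold cross-validation. As a reading aid, the sketch below shows how those three figures are conventionally computed from binary predictions, and one simple way to form five folds. It is not the authors' code: the function names `binary_metrics` and `k_fold_indices` are hypothetical, the contiguous fold split is an assumption (the record does not say how folds were drawn), and the labels are toy values, not data from the study.

```python
from typing import Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy for 0/1 labels.

    Label 1 stands for a positive site (here: O-GlcNAcylated), 0 for a negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = len(pairs)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds (assumed scheme).

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


# Toy example: 4 positives and 6 negatives (illustrative values only).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
toy_metrics = binary_metrics(toy_true, toy_pred)
toy_folds = k_fold_indices(len(toy_true), k=5)
```

In a real evaluation each fold's test indices would be scored by a model trained on that fold's training indices, and the per-fold confusion counts pooled or averaged. Shuffling or stratifying by class before splitting is common, but this record does not state whether the study did so.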