A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs

Protein O-GlcNAcylation, involving the β-attachment of single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular level investigation of the basis for OGT's substrate specificity shou...

Bibliographic Details
Main Authors: Kao, Hui-Ju, Huang, Chien-Hsun, Bretaña, Neil Arvin, Lu, Cheng-Tsung, Huang, Kai-Yao, Weng, Shun-Long, Lee, Tzong-Yi
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4682369/
https://www.ncbi.nlm.nih.gov/pubmed/26680539
http://dx.doi.org/10.1186/1471-2105-16-S18-S10
author Kao, Hui-Ju
Huang, Chien-Hsun
Bretaña, Neil Arvin
Lu, Cheng-Tsung
Huang, Kai-Yao
Weng, Shun-Long
Lee, Tzong-Yi
collection PubMed
description Protein O-GlcNAcylation, involving the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular-level investigation of the basis for OGT's substrate specificity should aid in understanding how O-GlcNAc contributes to diverse cellular processes. Due to an increasing number of O-GlcNAcylated peptides with site-specific information identified by mass spectrometry (MS)-based proteomics, we were motivated to characterize substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, OGlycBase and UniProtKB. After conserved motifs were detected using maximal dependence decomposition, a profile hidden Markov model (profile HMM) was adopted to learn a first-layered model for each identified OGT substrate motif. A support vector machine (SVM) was then used to generate a second-layered model learned from the output values of the profile HMMs in the first layer. The two-layered predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (84.05%) and outperform other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. The method has been implemented as a web-based system, OGTSite, which is freely available at http://csb.cse.yzu.edu.tw/OGTSite/.
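The sensitivity, specificity and accuracy quoted in the description follow the standard confusion-matrix definitions. The following is a minimal Python sketch of those definitions and of a five-fold split, not the authors' implementation: the class and function names, the interleaved fold assignment, and the counts are all hypothetical and are not taken from the article or from OGTSite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts pooled over the held-out folds."""
    tp: int  # true sites predicted as sites
    fn: int  # true sites predicted as non-sites
    tn: int  # non-sites predicted as non-sites
    fp: int  # non-sites predicted as sites


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true sites that were recovered."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-sites that were correctly rejected."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all predictions that were correct."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def k_fold_indices(n_samples: int, k: int = 5) -> list[list[int]]:
    """Assign sample indices to k folds by interleaving (an assumed scheme)."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]


# Hypothetical counts for illustration only; NOT the article's data.
counts = ConfusionCounts(tp=80, fn=20, tn=90, fp=10)

# Each fold is held out once while the other four are used for training.
folds = k_fold_indices(10, k=5)
```

Accuracy is a weighted mean of sensitivity and specificity (weighted by the numbers of positive and negative samples), so it always lies between the two, as the reported values do.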
format Online
Article
Text
id pubmed-4682369
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4682369 2015-12-21 BMC Bioinformatics Research BioMed Central 2015-12-09 /pmc/articles/PMC4682369/ /pubmed/26680539 http://dx.doi.org/10.1186/1471-2105-16-S18-S10 Text en
Copyright © 2015 Kao et al.; http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4682369/
https://www.ncbi.nlm.nih.gov/pubmed/26680539
http://dx.doi.org/10.1186/1471-2105-16-S18-S10