Cargando…
UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines
BACKGROUND: The conjugation of ubiquitin to a substrate protein (protein ubiquitylation), which involves a sequential process – E1 activation, E2 conjugation and E3 ligation, is crucial to the regulation of protein function and activity in eukaryotes. This ubiquitin-conjugation process typically bin...
Autores principales: | , , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2016
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895383/ https://www.ncbi.nlm.nih.gov/pubmed/26818456 http://dx.doi.org/10.1186/s12918-015-0246-z |
_version_ | 1782435836060499968 |
---|---|
author | Huang, Chien-Hsun Su, Min-Gang Kao, Hui-Ju Jhong, Jhih-Hua Weng, Shun-Long Lee, Tzong-Yi |
author_facet | Huang, Chien-Hsun Su, Min-Gang Kao, Hui-Ju Jhong, Jhih-Hua Weng, Shun-Long Lee, Tzong-Yi |
author_sort | Huang, Chien-Hsun |
collection | PubMed |
description | BACKGROUND: The conjugation of ubiquitin to a substrate protein (protein ubiquitylation), which involves a sequential process – E1 activation, E2 conjugation and E3 ligation, is crucial to the regulation of protein function and activity in eukaryotes. This ubiquitin-conjugation process typically binds the last amino acid of ubiquitin (glycine 76) to a lysine residue of a target protein. The high-throughput of mass spectrometry-based proteomics has stimulated a large-scale identification of ubiquitin-conjugated peptides. Hence, a new web resource, UbiSite, was developed to identify ubiquitin-conjugation site on lysines based on large-scale proteome dataset. RESULTS: Given a total of 37,647 ubiquitin-conjugated proteins, including 128026 ubiquitylated peptides, obtained from various resources, this study carries out a large-scale investigation on ubiquitin-conjugation sites based on sequenced and structural characteristics. A TwoSampleLogo reveals that a significant depletion of histidine (H), arginine (R) and cysteine (C) residues around ubiquitylation sites may impact the conjugation of ubiquitins in closed three-dimensional environments. Based on the large-scale ubiquitylation dataset, a motif discovery tool, MDDLogo, has been adopted to characterize the potential substrate motifs for ubiquitin conjugation. Not only are single features such as amino acid composition (AAC), positional weighted matrix (PWM), position-specific scoring matrix (PSSM) and solvent-accessible surface area (SASA) considered, but also the effectiveness of incorporating MDDLogo-identified substrate motifs into a two-layered prediction model is taken into account. Evaluation by five-fold cross-validation showed that PSSM is the best feature in discriminating between ubiquitylation and non-ubiquitylation sites, based on support vector machine (SVM). Additionally, the two-layered SVM model integrating MDDLogo-identified substrate motifs could obtain a promising accuracy and the Matthews Correlation Coefficient (MCC) at 81.06 % and 0.586, respectively. Furthermore, the independent testing showed that the two-layered SVM model could outperform other prediction tools, reaching at 85.10 % sensitivity, 69.69 % specificity, 73.69 % accuracy and the 0.483 of MCC value. CONCLUSION: The independent testing result indicated the effectiveness of incorporating MDDLogo-identified motifs into the prediction of ubiquitylation sites. In order to provide meaningful assistance to researchers interested in large-scale ubiquitinome data, the two-layered SVM model has been implemented onto a web-based system (UbiSite), which is freely available at http://csb.cse.yzu.edu.tw/UbiSite/. Two cases given in the UbiSite provide a demonstration of effective identification of ubiquitylation sites with reference to substrate motifs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12918-015-0246-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4895383 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-48953832016-06-10 UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines Huang, Chien-Hsun Su, Min-Gang Kao, Hui-Ju Jhong, Jhih-Hua Weng, Shun-Long Lee, Tzong-Yi BMC Syst Biol Proceedings BACKGROUND: The conjugation of ubiquitin to a substrate protein (protein ubiquitylation), which involves a sequential process – E1 activation, E2 conjugation and E3 ligation, is crucial to the regulation of protein function and activity in eukaryotes. This ubiquitin-conjugation process typically binds the last amino acid of ubiquitin (glycine 76) to a lysine residue of a target protein. The high-throughput of mass spectrometry-based proteomics has stimulated a large-scale identification of ubiquitin-conjugated peptides. Hence, a new web resource, UbiSite, was developed to identify ubiquitin-conjugation site on lysines based on large-scale proteome dataset. RESULTS: Given a total of 37,647 ubiquitin-conjugated proteins, including 128026 ubiquitylated peptides, obtained from various resources, this study carries out a large-scale investigation on ubiquitin-conjugation sites based on sequenced and structural characteristics. A TwoSampleLogo reveals that a significant depletion of histidine (H), arginine (R) and cysteine (C) residues around ubiquitylation sites may impact the conjugation of ubiquitins in closed three-dimensional environments. Based on the large-scale ubiquitylation dataset, a motif discovery tool, MDDLogo, has been adopted to characterize the potential substrate motifs for ubiquitin conjugation. Not only are single features such as amino acid composition (AAC), positional weighted matrix (PWM), position-specific scoring matrix (PSSM) and solvent-accessible surface area (SASA) considered, but also the effectiveness of incorporating MDDLogo-identified substrate motifs into a two-layered prediction model is taken into account. Evaluation by five-fold cross-validation showed that PSSM is the best feature in discriminating between ubiquitylation and non-ubiquitylation sites, based on support vector machine (SVM). Additionally, the two-layered SVM model integrating MDDLogo-identified substrate motifs could obtain a promising accuracy and the Matthews Correlation Coefficient (MCC) at 81.06 % and 0.586, respectively. Furthermore, the independent testing showed that the two-layered SVM model could outperform other prediction tools, reaching at 85.10 % sensitivity, 69.69 % specificity, 73.69 % accuracy and the 0.483 of MCC value. CONCLUSION: The independent testing result indicated the effectiveness of incorporating MDDLogo-identified motifs into the prediction of ubiquitylation sites. In order to provide meaningful assistance to researchers interested in large-scale ubiquitinome data, the two-layered SVM model has been implemented onto a web-based system (UbiSite), which is freely available at http://csb.cse.yzu.edu.tw/UbiSite/. Two cases given in the UbiSite provide a demonstration of effective identification of ubiquitylation sites with reference to substrate motifs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12918-015-0246-z) contains supplementary material, which is available to authorized users. BioMed Central 2016-01-11 /pmc/articles/PMC4895383/ /pubmed/26818456 http://dx.doi.org/10.1186/s12918-015-0246-z Text en © Huang et al. 2015 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Proceedings Huang, Chien-Hsun Su, Min-Gang Kao, Hui-Ju Jhong, Jhih-Hua Weng, Shun-Long Lee, Tzong-Yi UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines |
title | UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines |
title_full | UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines |
title_fullStr | UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines |
title_full_unstemmed | UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines |
title_short | UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines |
title_sort | ubisite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895383/ https://www.ncbi.nlm.nih.gov/pubmed/26818456 http://dx.doi.org/10.1186/s12918-015-0246-z |
work_keys_str_mv | AT huangchienhsun ubisiteincorporatingtwolayeredmachinelearningmethodwithsubstratemotifstopredictubiquitinconjugationsiteonlysines AT sumingang ubisiteincorporatingtwolayeredmachinelearningmethodwithsubstratemotifstopredictubiquitinconjugationsiteonlysines AT kaohuiju ubisiteincorporatingtwolayeredmachinelearningmethodwithsubstratemotifstopredictubiquitinconjugationsiteonlysines AT jhongjhihhua ubisiteincorporatingtwolayeredmachinelearningmethodwithsubstratemotifstopredictubiquitinconjugationsiteonlysines AT wengshunlong ubisiteincorporatingtwolayeredmachinelearningmethodwithsubstratemotifstopredictubiquitinconjugationsiteonlysines AT leetzongyi ubisiteincorporatingtwolayeredmachinelearningmethodwithsubstratemotifstopredictubiquitinconjugationsiteonlysines |