PlantPhos: using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity

Bibliographic Details
Main authors: Lee, Tzong-Yi; Bretaña, Neil Arvin; Lu, Cheng-Tsung
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3228547/
https://www.ncbi.nlm.nih.gov/pubmed/21703007
http://dx.doi.org/10.1186/1471-2105-12-261
collection PubMed
description BACKGROUND: Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. Because high-throughput mass spectrometry-based experiments are difficult to perform, there is a need for computational methods that predict phosphorylation sites. However, previous studies on in silico prediction of plant phosphorylation sites did not consider kinase-specific phosphorylation data. We therefore propose a new method that investigates the different substrate specificities of plant phosphorylation sites.
RESULTS: Experimentally verified phosphorylation data were extracted from TAIR9, a protein database containing 3006 phosphorylation entries from the plant species Arabidopsis thaliana. To investigate the various substrate motifs in plant phosphorylation, maximal dependence decomposition (MDD) is employed to cluster a large set of phosphorylation data into subgroups containing significantly conserved motifs. A profile hidden Markov model (HMM) is then trained for each subgroup. Cross-validation of the MDD-clustered HMMs yields an average accuracy of 82.4% for serine, 78.6% for threonine, and 89.0% for tyrosine models. Moreover, independent tests using Arabidopsis thaliana phosphorylation data from UniProtKB/Swiss-Prot show that the proposed models correctly predict 81.4% of phosphoserine, 77.1% of phosphothreonine, and 83.7% of phosphotyrosine sites. Interestingly, several MDD-clustered subgroups show amino acid conservation similar to the substrate motifs of well-known kinases in Phospho.ELM, a database of kinase-specific phosphorylation data from multiple organisms.
CONCLUSIONS: This work presents a novel method for identifying plant phosphorylation sites with various substrate motifs. Both cross-validation and independent testing show that the MDD-clustered models outperform models trained without MDD. The method has been implemented as a web-based plant phosphorylation prediction tool, PlantPhos http://csb.cse.yzu.edu.tw/PlantPhos/. Two case studies are also presented to further evaluate the effectiveness of PlantPhos.
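The abstract names maximal dependence decomposition (MDD) as the clustering step without describing it. As an illustration only, here is a minimal Python sketch of one generic MDD variant, not the PlantPhos implementation. Assumptions: fragments are equal-length windows centered on the phosphorylated residue; the "consensus" at each position is its most frequent residue; and `min_size`, `threshold`, and the choice to keep splitting both subgroups are hypothetical parameters for the sketch. Each step picks the position whose consensus/non-consensus indicator shows the largest summed chi-square dependence on the other positions, then splits the fragments on it.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def consensus(seqs: Sequence[str]) -> List[str]:
    """Most frequent residue at each position (ties go to the residue seen first).
    ASSUMPTION: PlantPhos may define 'consensus' differently."""
    return [Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(len(seqs[0]))]


def chi_square(seqs: Sequence[str], i: int, j: int, cons: Sequence[str]) -> float:
    """Pearson chi-square between 'position i equals its consensus' (binary)
    and the residue observed at position j."""
    residues = sorted({s[j] for s in seqs})
    table = {(m, r): 0 for m in (True, False) for r in residues}
    for s in seqs:
        table[(s[i] == cons[i], s[j])] += 1
    n = len(seqs)
    row = {m: sum(table[(m, r)] for r in residues) for m in (True, False)}
    col = {r: table[(True, r)] + table[(False, r)] for r in residues}
    chi2 = 0.0
    for (m, r), observed in table.items():
        expected = row[m] * col[r] / n
        if expected > 0:  # empty rows/columns contribute nothing
            chi2 += (observed - expected) ** 2 / expected
    return chi2


Split = Tuple[int, float, List[str], List[str]]


def mdd_split(seqs: Sequence[str], skip: Sequence[int] = ()) -> Optional[Split]:
    """One MDD step: choose the position with the largest summed chi-square
    against all other positions and split the fragments on its consensus."""
    cons = consensus(seqs)
    length = len(seqs[0])
    scores = {
        i: sum(chi_square(seqs, i, j, cons) for j in range(length) if j != i)
        for i in range(length)
        if i not in skip
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    plus = [s for s in seqs if s[best] == cons[best]]   # consensus at best position
    minus = [s for s in seqs if s[best] != cons[best]]  # any other residue there
    return best, scores[best], plus, minus


def mdd(seqs: Sequence[str], min_size: int = 2, threshold: float = 0.0,
        skip: Sequence[int] = ()) -> List[List[str]]:
    """Recursively split until a subgroup is too small or dependence is at or
    below the threshold. HYPOTHETICAL defaults; both subgroups are split further."""
    if not seqs or len(seqs) < 2 * min_size:
        return [list(seqs)]
    split = mdd_split(seqs, skip)
    if split is None:
        return [list(seqs)]
    best, score, plus, minus = split
    if score <= threshold or len(plus) < min_size or len(minus) < min_size:
        return [list(seqs)]
    used = tuple(skip) + (best,)  # never split on the same position twice
    return mdd(plus, min_size, threshold, used) + mdd(minus, min_size, threshold, used)
```

For example, `mdd(fragments, min_size=2, skip=(2,))` on 5-residue fragments excludes the central, phosphorylated position (index 2) from splitting and returns subgroups that partition the input. In the workflow the abstract describes, each such subgroup would then be used to train its own profile HMM.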
id pubmed-3228547
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3228547 2011-12-02 BMC Bioinformatics Research Article BioMed Central 2011-06-26 /pmc/articles/PMC3228547/ /pubmed/21703007 http://dx.doi.org/10.1186/1471-2105-12-261 Text en Copyright ©2011 Lee et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article