iDPGK: characterization and identification of lysine phosphoglycerylation sites based on sequence-based features

BACKGROUND: Protein phosphoglycerylation, the addition of 1,3-bisphosphoglyceric acid (1,3-BPG) to a lysine residue of a protein to form 3-phosphoglyceryl-lysine, is a reversible, non-enzymatic post-translational modification (PTM) that plays a regulatory role in glucose metabolism an...

Bibliographic Details
Main Authors: Huang, Kai-Yao, Hung, Fang-Yu, Kao, Hui-Ju, Lau, Hui-Hsuan, Weng, Shun-Long
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7727188/
https://www.ncbi.nlm.nih.gov/pubmed/33297954
http://dx.doi.org/10.1186/s12859-020-03916-5
author Huang, Kai-Yao
Hung, Fang-Yu
Kao, Hui-Ju
Lau, Hui-Hsuan
Weng, Shun-Long
collection PubMed
description BACKGROUND: Protein phosphoglycerylation, the addition of 1,3-bisphosphoglyceric acid (1,3-BPG) to a lysine residue of a protein to form 3-phosphoglyceryl-lysine, is a reversible, non-enzymatic post-translational modification (PTM) that plays a regulatory role in glucose metabolism and the glycolytic process. As the number of experimentally verified phosphoglycerylated sites has increased significantly, statistical or machine learning methods are needed to investigate the characteristics of phosphoglycerylation sites. Currently, research into phosphoglycerylation is very limited, and only a few resources are available for the computational identification of phosphoglycerylation sites. RESULT: We present a bioinformatics investigation of phosphoglycerylation sites based on sequence-based features. The TwoSampleLogo analysis reveals that the regions surrounding the phosphoglycerylation sites contain a relatively high abundance of positively charged amino acids, especially in the upstream flanking region. Additionally, according to the PTM-Logo results, non-polar and aliphatic amino acids are more abundant around phosphoglycerylated lysines, which may play a functional role in discriminating between phosphoglycerylation and non-phosphoglycerylation sites. Several types of features were adopted to build the prediction model on the training dataset, including amino acid composition, amino acid pair composition, positional weighted matrix and position-specific scoring matrix. Further, to improve the predictive power, the top features ranked by F-score were selected as the final combination for classification, and the predictive models were trained using decision tree (DT), random forest (RF) and support vector machine (SVM) classifiers. Evaluation by five-fold cross-validation showed that the selected features were the most effective in discriminating between phosphoglycerylated and non-phosphoglycerylated sites.
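The feature construction and ranking described in the RESULT paragraph can be illustrated with a minimal sketch. It assumes that "amino acid composition" means the 20 residue frequencies of a sequence window around the candidate lysine, and that "F-score" refers to the commonly used two-class feature-selection F-score (squared between-class mean differences over the sum of within-class sample variances); the record does not define either term, so both are assumptions. The function names and the example windows are hypothetical and are not taken from the article.

```python
from statistics import mean, variance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(window):
    """Return the frequency of each standard amino acid in a sequence window."""
    residues = [r for r in window.upper() if r in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero for empty windows
    return [residues.count(a) / total for a in AMINO_ACIDS]

def f_score(pos_values, neg_values):
    """Two-class F-score of one feature (assumed definition, see lead-in).

    Each class needs at least two values so that the sample variance exists.
    """
    overall = mean(pos_values + neg_values)
    numerator = (mean(pos_values) - overall) ** 2 + (mean(neg_values) - overall) ** 2
    denominator = variance(pos_values) + variance(neg_values)
    # A feature with no within-class spread is scored 0.0 here for simplicity.
    return numerator / denominator if denominator else 0.0

def rank_features(pos_vectors, neg_vectors):
    """Return feature indices ordered from highest to lowest F-score."""
    n_features = len(pos_vectors[0])
    scores = [
        f_score([v[i] for v in pos_vectors], [v[i] for v in neg_vectors])
        for i in range(n_features)
    ]
    return sorted(range(n_features), key=lambda i: scores[i], reverse=True)

# Hypothetical windows centred on a lysine (K); not data from the article.
positives = [amino_acid_composition(w) for w in ("RKAKLGV", "KRAKVLA")]
negatives = [amino_acid_composition(w) for w in ("DESKTSE", "EDSKQTD")]
top_features = [AMINO_ACIDS[i] for i in rank_features(positives, negatives)[:5]]
```

In a pipeline of this kind, the highest-ranked features would then be passed to a classifier; the cut-off of five used above is arbitrary.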
CONCLUSION: The SVM model trained with the selected sequence-based features performed well, with a sensitivity of 77.5%, a specificity of 73.6%, an accuracy of 74.9%, and a Matthews correlation coefficient of 0.49. The model also performed consistently on an independent testing set, yielding a sensitivity of 75.7% and a specificity of 64.9%. Finally, the model has been implemented as a web-based system, iDPGK, which is freely available at http://mer.hc.mmh.org.tw/iDPGK/.
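The four measures reported in the CONCLUSION follow from the counts of a binary confusion matrix. The sketch below uses the standard definitions of sensitivity, specificity, accuracy and the Matthews correlation coefficient; the function name and the example counts are hypothetical and do not reproduce the article's data.

```python
from math import sqrt

def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification measures from confusion counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=8, fn=2, tn=6, fp=4)
```

Unlike accuracy, the Matthews correlation coefficient accounts for all four cells of the confusion matrix, which makes it informative when positive and negative sites are imbalanced.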
format Online
Article
Text
id pubmed-7727188
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7727188 2020-12-11 iDPGK: characterization and identification of lysine phosphoglycerylation sites based on sequence-based features Huang, Kai-Yao Hung, Fang-Yu Kao, Hui-Ju Lau, Hui-Hsuan Weng, Shun-Long BMC Bioinformatics Research Article BioMed Central 2020-12-09 /pmc/articles/PMC7727188/ /pubmed/33297954 http://dx.doi.org/10.1186/s12859-020-03916-5 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title iDPGK: characterization and identification of lysine phosphoglycerylation sites based on sequence-based features
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7727188/
https://www.ncbi.nlm.nih.gov/pubmed/33297954
http://dx.doi.org/10.1186/s12859-020-03916-5