pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties

BACKGROUND: Protein histidine phosphorylation (pHis) plays critical roles in prokaryotic signal transduction pathways and various eukaryotic cellular processes. It is estimated to account for 6–10% of the phosphoproteome; however, only hundreds of pHis sites have been discovered to date. Due to the i...

Bibliographic Details
Main Authors: Zhao, Jian, Zhuang, Minhui, Liu, Jingjing, Zhang, Meng, Zeng, Cong, Jiang, Bin, Wu, Jing, Song, Xiaofeng
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520798/
https://www.ncbi.nlm.nih.gov/pubmed/36171552
http://dx.doi.org/10.1186/s12859-022-04938-x
_version_ 1784799706557186048
author Zhao, Jian
Zhuang, Minhui
Liu, Jingjing
Zhang, Meng
Zeng, Cong
Jiang, Bin
Wu, Jing
Song, Xiaofeng
collection PubMed
description BACKGROUND: Protein histidine phosphorylation (pHis) plays critical roles in prokaryotic signal transduction pathways and various eukaryotic cellular processes. It is estimated to account for 6–10% of the phosphoproteome; however, only hundreds of pHis sites have been discovered to date. Owing to the inherent limitations of experimental methods, efficient computational approaches for identifying pHis sites are urgently needed. RESULTS: Here, we present a novel tool, pHisPred, for accurately identifying pHis sites from protein sequences. We manually collected the largest number of experimentally validated pHis sites to build benchmark datasets. Using randomized tenfold cross-validation (CV), the weighted SVM-RBF model outperformed four other commonly used classification models (LR, KNN, RF, and MLP). From tens of thousands of features, the 140 and 150 most informative features were selected for the eukaryotic and prokaryotic models, respectively. The average AUC and F1-score values of pHisPred were (0.81, 0.40) and (0.78, 0.46) for tenfold CV on the eukaryotic and prokaryotic training datasets, respectively. In addition, pHisPred significantly outperforms other tools on the testing datasets, particularly the eukaryotic one. CONCLUSION: We implemented pHisPred as a Python program, which is freely available for non-commercial use at https://github.com/xiaofengsong/pHisPred. Moreover, users can train new models with their own data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04938-x.
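The evaluation described in the abstract (a weighted SVM with an RBF kernel, scored by AUC and F1 under randomized tenfold CV) can be illustrated with a minimal sketch. This is not the pHisPred code: it assumes scikit-learn is available, uses synthetic imbalanced data in place of the benchmark datasets, and interprets "weighted" as `class_weight="balanced"`. The function name `evaluate_weighted_svm`, the stratified splitting, and the feature scaling step are likewise assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_weighted_svm(X, y, n_splits=10, seed=0):
    """Return per-fold AUC and F1 for a class-weighted RBF SVM (hypothetical helper)."""
    # Assumption: "weighted" is modelled as class weights inversely
    # proportional to class frequency.
    model = make_pipeline(
        StandardScaler(),
        SVC(kernel="rbf", class_weight="balanced"),
    )
    # Shuffled folds give the "randomized" split; stratification is an
    # added assumption that keeps the rare positive class in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_validate(model, X, y, cv=cv, scoring=("roc_auc", "f1"))
    return scores["test_roc_auc"], scores["test_f1"]


# Synthetic stand-in for a site-level feature matrix: about 10% positives.
X, y = make_classification(
    n_samples=600,
    n_features=40,
    n_informative=8,
    weights=[0.9, 0.1],
    random_state=0,
)
auc, f1 = evaluate_weighted_svm(X, y)
print(f"mean AUC = {auc.mean():.2f}, mean F1 = {f1.mean():.2f}")
```

With rare positives, a moderate F1 alongside a higher AUC is common, because F1 depends on a single decision threshold while AUC summarizes ranking quality across all thresholds.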
format Online
Article
Text
id pubmed-9520798
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9520798 2022-09-30 pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties Zhao, Jian Zhuang, Minhui Liu, Jingjing Zhang, Meng Zeng, Cong Jiang, Bin Wu, Jing Song, Xiaofeng BMC Bioinformatics Methodology
BioMed Central 2022-09-28 /pmc/articles/PMC9520798/ /pubmed/36171552 http://dx.doi.org/10.1186/s12859-022-04938-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties
topic Methodology