pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties
BACKGROUND: Protein histidine phosphorylation (pHis) plays critical roles in prokaryotic signal transduction pathways and various eukaryotic cellular processes. It is estimated to account for 6–10% of the phosphoproteome; however, only hundreds of pHis sites have been discovered to date. Due to the i...
Main Authors: | Zhao, Jian; Zhuang, Minhui; Liu, Jingjing; Zhang, Meng; Zeng, Cong; Jiang, Bin; Wu, Jing; Song, Xiaofeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Methodology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520798/ https://www.ncbi.nlm.nih.gov/pubmed/36171552 http://dx.doi.org/10.1186/s12859-022-04938-x |
_version_ | 1784799706557186048 |
---|---|
author | Zhao, Jian Zhuang, Minhui Liu, Jingjing Zhang, Meng Zeng, Cong Jiang, Bin Wu, Jing Song, Xiaofeng |
author_facet | Zhao, Jian Zhuang, Minhui Liu, Jingjing Zhang, Meng Zeng, Cong Jiang, Bin Wu, Jing Song, Xiaofeng |
author_sort | Zhao, Jian |
collection | PubMed |
description | BACKGROUND: Protein histidine phosphorylation (pHis) plays critical roles in prokaryotic signal transduction pathways and various eukaryotic cellular processes. It is estimated to account for 6–10% of the phosphoproteome; however, only hundreds of pHis sites have been discovered to date. Because of the inherent disadvantages of experimental methods, developing efficient computational approaches to identify pHis sites is an urgent task. RESULTS: Here, we present a novel tool, pHisPred, for accurately identifying pHis sites from protein sequences. We manually collected the largest number of experimentally validated pHis sites to build benchmark datasets. In randomized tenfold cross-validation (CV), the weighted SVM-RBF model outperformed four other commonly used classification models (LR, KNN, RF, and MLP). From tens of thousands of features, the 140 and 150 most informative features were selected for the eukaryotic and prokaryotic models, respectively. The average AUC and F1-score values of pHisPred were (0.81, 0.40) and (0.78, 0.46) for tenfold CV on the eukaryotic and prokaryotic training datasets, respectively. In addition, pHisPred significantly outperforms other tools on the testing datasets, in particular on the eukaryotic one. CONCLUSION: We implemented pHisPred as a Python program, which is freely available for non-commercial use at https://github.com/xiaofengsong/pHisPred. Moreover, users can use it to train new models with their own data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04938-x. |
format | Online Article Text |
id | pubmed-9520798 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9520798 2022-09-30 pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties Zhao, Jian Zhuang, Minhui Liu, Jingjing Zhang, Meng Zeng, Cong Jiang, Bin Wu, Jing Song, Xiaofeng BMC Bioinformatics Methodology BACKGROUND: Protein histidine phosphorylation (pHis) plays critical roles in prokaryotic signal transduction pathways and various eukaryotic cellular processes. It is estimated to account for 6–10% of the phosphoproteome; however, only hundreds of pHis sites have been discovered to date. Because of the inherent disadvantages of experimental methods, developing efficient computational approaches to identify pHis sites is an urgent task. RESULTS: Here, we present a novel tool, pHisPred, for accurately identifying pHis sites from protein sequences. We manually collected the largest number of experimentally validated pHis sites to build benchmark datasets. In randomized tenfold cross-validation (CV), the weighted SVM-RBF model outperformed four other commonly used classification models (LR, KNN, RF, and MLP). From tens of thousands of features, the 140 and 150 most informative features were selected for the eukaryotic and prokaryotic models, respectively. The average AUC and F1-score values of pHisPred were (0.81, 0.40) and (0.78, 0.46) for tenfold CV on the eukaryotic and prokaryotic training datasets, respectively. In addition, pHisPred significantly outperforms other tools on the testing datasets, in particular on the eukaryotic one. CONCLUSION: We implemented pHisPred as a Python program, which is freely available for non-commercial use at https://github.com/xiaofengsong/pHisPred. Moreover, users can use it to train new models with their own data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04938-x.
BioMed Central 2022-09-28 /pmc/articles/PMC9520798/ /pubmed/36171552 http://dx.doi.org/10.1186/s12859-022-04938-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Zhao, Jian Zhuang, Minhui Liu, Jingjing Zhang, Meng Zeng, Cong Jiang, Bin Wu, Jing Song, Xiaofeng pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties |
title | pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties |
title_full | pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties |
title_fullStr | pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties |
title_full_unstemmed | pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties |
title_short | pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties |
title_sort | phispred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9520798/ https://www.ncbi.nlm.nih.gov/pubmed/36171552 http://dx.doi.org/10.1186/s12859-022-04938-x |
work_keys_str_mv | AT zhaojian phispredatoolfortheidentificationofhistidinephosphorylationsitesbyintegratingaminoacidpatternsandproperties AT zhuangminhui phispredatoolfortheidentificationofhistidinephosphorylationsitesbyintegratingaminoacidpatternsandproperties AT liujingjing phispredatoolfortheidentificationofhistidinephosphorylationsitesbyintegratingaminoacidpatternsandproperties AT zhangmeng phispredatoolfortheidentificationofhistidinephosphorylationsitesbyintegratingaminoacidpatternsandproperties AT zengcong phispredatoolfortheidentificationofhistidinephosphorylationsitesbyintegratingaminoacidpatternsandproperties AT jiangbin phispredatoolfortheidentificationofhistidinephosphorylationsitesbyintegratingaminoacidpatternsandproperties AT wujing phispredatoolfortheidentificationofhistidinephosphorylationsitesbyintegratingaminoacidpatternsandproperties AT songxiaofeng phispredatoolfortheidentificationofhistidinephosphorylationsitesbyintegratingaminoacidpatternsandproperties |
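
Illustrative sketch (not part of the catalogued record): the description reports a weighted SVM-RBF model evaluated by randomized tenfold CV with average AUC and F1-score. The Python snippet below shows one way such an evaluation could be set up with scikit-learn. It is a sketch under stated assumptions, not the pHisPred implementation: the data are synthetic, the weighting scheme is assumed to be scikit-learn's `class_weight="balanced"`, the function name `evaluate_weighted_svm` is hypothetical, and the feature extraction and feature selection steps mentioned in the abstract are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_weighted_svm(X, y, n_splits=10, seed=0):
    """Return (mean AUC, mean F1) of a class-weighted RBF SVM under shuffled k-fold CV.

    Hypothetical helper for illustration; not taken from the pHisPred code base.
    """
    # Scaling lives inside the pipeline so each fold is scaled on its own training part.
    model = make_pipeline(
        StandardScaler(),
        # Assumption: "weighted" is approximated by balanced class weights.
        SVC(kernel="rbf", class_weight="balanced"),
    )
    # Shuffled, stratified folds keep the rare positive class present in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_validate(model, X, y, cv=cv, scoring=("roc_auc", "f1"))
    return float(scores["test_roc_auc"].mean()), float(scores["test_f1"].mean())


# Synthetic, imbalanced stand-in data (about 10% positives); not real pHis sites.
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=8,
    weights=[0.9, 0.1],
    random_state=0,
)
mean_auc, mean_f1 = evaluate_weighted_svm(X, y)
print(f"mean AUC = {mean_auc:.2f}, mean F1 = {mean_f1:.2f}")
```

With imbalanced classes, AUC and F1 can diverge considerably, which is why reporting both, as the description does, is informative.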