Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins
BACKGROUND: Post-translational modification (PTM) is a biological process that alters proteins and is therefore involved in the regulation of various cellular activities and pathogenesis. Protein phosphorylation is an essential process and one of the most-studied PTMs: it occurs when a phosphate gro...
Main authors: | Jamal, Salma; Ali, Waseem; Nagpal, Priya; Grover, Abhinav; Grover, Sonam |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8142496/ https://www.ncbi.nlm.nih.gov/pubmed/34030700 http://dx.doi.org/10.1186/s12967-021-02851-0 |
_version_ | 1783696564788658176 |
---|---|
author | Jamal, Salma Ali, Waseem Nagpal, Priya Grover, Abhinav Grover, Sonam |
author_facet | Jamal, Salma Ali, Waseem Nagpal, Priya Grover, Abhinav Grover, Sonam |
author_sort | Jamal, Salma |
collection | PubMed |
description | BACKGROUND: Post-translational modification (PTM) is a biological process that alters proteins and is therefore involved in the regulation of various cellular activities and pathogenesis. Protein phosphorylation is an essential process and one of the most-studied PTMs: it occurs when a phosphate group is added to a serine (Ser, S), threonine (Thr, T), or tyrosine (Tyr, Y) residue. Dysregulation of protein phosphorylation can lead to various diseases (most commonly neurological disorders, Alzheimer’s disease, and Parkinson’s disease), thus necessitating the prediction of S/T/Y residues that can be phosphorylated in an uncharacterized amino acid sequence. Despite a surplus of sequencing data, current experimental methods of PTM prediction are time-consuming, costly, and error-prone, so a number of computational methods have been proposed to replace them. However, phosphorylation prediction remains limited, owing to substrate specificity, performance, and the diversity of its features. METHODS: In the present study we propose machine-learning-based predictors that use the physicochemical, sequence, structural, and functional information of proteins to classify S/T/Y phosphorylation sites. Rigorous feature selection, the minimum redundancy/maximum relevance approach, and the symmetrical uncertainty method were employed to extract the most informative features to train the models. RESULTS: The random forest (RF) and support vector machine (SVM) models generated using diverse feature types in the present study were highly accurate, as is evident from good values for different statistical measures. Moreover, independent test sets and benchmark validations indicated that the proposed method clearly outperformed the existing methods, demonstrating its ability to accurately predict protein phosphorylation. CONCLUSIONS: The results obtained in the present work indicate that the proposed computational methodology can be used effectively to predict putative phosphorylation sites, further facilitating the discovery of the mechanisms of various biological processes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12967-021-02851-0. |
format | Online Article Text |
id | pubmed-8142496 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8142496 2021-05-25 Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins Jamal, Salma Ali, Waseem Nagpal, Priya Grover, Abhinav Grover, Sonam J Transl Med Research BACKGROUND: Post-translational modification (PTM) is a biological process that alters proteins and is therefore involved in the regulation of various cellular activities and pathogenesis. Protein phosphorylation is an essential process and one of the most-studied PTMs: it occurs when a phosphate group is added to a serine (Ser, S), threonine (Thr, T), or tyrosine (Tyr, Y) residue. Dysregulation of protein phosphorylation can lead to various diseases (most commonly neurological disorders, Alzheimer’s disease, and Parkinson’s disease), thus necessitating the prediction of S/T/Y residues that can be phosphorylated in an uncharacterized amino acid sequence. Despite a surplus of sequencing data, current experimental methods of PTM prediction are time-consuming, costly, and error-prone, so a number of computational methods have been proposed to replace them. However, phosphorylation prediction remains limited, owing to substrate specificity, performance, and the diversity of its features. METHODS: In the present study we propose machine-learning-based predictors that use the physicochemical, sequence, structural, and functional information of proteins to classify S/T/Y phosphorylation sites. Rigorous feature selection, the minimum redundancy/maximum relevance approach, and the symmetrical uncertainty method were employed to extract the most informative features to train the models. RESULTS: The random forest (RF) and support vector machine (SVM) models generated using diverse feature types in the present study were highly accurate, as is evident from good values for different statistical measures. Moreover, independent test sets and benchmark validations indicated that the proposed method clearly outperformed the existing methods, demonstrating its ability to accurately predict protein phosphorylation. CONCLUSIONS: The results obtained in the present work indicate that the proposed computational methodology can be used effectively to predict putative phosphorylation sites, further facilitating the discovery of the mechanisms of various biological processes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12967-021-02851-0. BioMed Central 2021-05-24 /pmc/articles/PMC8142496/ /pubmed/34030700 http://dx.doi.org/10.1186/s12967-021-02851-0 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Jamal, Salma Ali, Waseem Nagpal, Priya Grover, Abhinav Grover, Sonam Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins |
title | Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins |
title_full | Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins |
title_fullStr | Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins |
title_full_unstemmed | Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins |
title_short | Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins |
title_sort | predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8142496/ https://www.ncbi.nlm.nih.gov/pubmed/34030700 http://dx.doi.org/10.1186/s12967-021-02851-0 |
work_keys_str_mv | AT jamalsalma predictingphosphorylationsitesusingmachinelearningbyintegratingthesequencestructureandfunctionalinformationofproteins AT aliwaseem predictingphosphorylationsitesusingmachinelearningbyintegratingthesequencestructureandfunctionalinformationofproteins AT nagpalpriya predictingphosphorylationsitesusingmachinelearningbyintegratingthesequencestructureandfunctionalinformationofproteins AT groverabhinav predictingphosphorylationsitesusingmachinelearningbyintegratingthesequencestructureandfunctionalinformationofproteins AT groversonam predictingphosphorylationsitesusingmachinelearningbyintegratingthesequencestructureandfunctionalinformationofproteins |
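
The abstract in this record names symmetrical uncertainty as one of the feature-selection criteria but gives no implementation details. As an illustration only, the following is a minimal sketch of the standard definition, SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)), assuming discrete (already binned) feature values. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def symmetrical_uncertainty(feature, labels):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value between 0 and 1."""
    h_x = entropy(feature)
    h_y = entropy(labels)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so no information is shared
    h_xy = entropy(list(zip(feature, labels)))  # joint entropy H(X, Y)
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


# Hypothetical toy data: 1 = phosphorylated site, 0 = non-phosphorylated site.
labels = [1, 1, 0, 0]
informative = ["a", "a", "b", "b"]    # tracks the label exactly
uninformative = ["a", "b", "a", "b"]  # statistically independent of the label

scores = {
    "informative": symmetrical_uncertainty(informative, labels),
    "uninformative": symmetrical_uncertainty(uninformative, labels),
}
# Rank features from most to least informative about the label.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the feature that mirrors the label scores 1 and the independent feature scores 0, so a selector that keeps the top-ranked features would retain only the first. Continuous descriptors would need to be discretized before this calculation applies.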