Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins

Bibliographic Details
Main authors: Jamal, Salma; Ali, Waseem; Nagpal, Priya; Grover, Abhinav; Grover, Sonam
Format: Online article, text
Language: English
Published: BioMed Central, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8142496/
https://www.ncbi.nlm.nih.gov/pubmed/34030700
http://dx.doi.org/10.1186/s12967-021-02851-0

Description:
BACKGROUND: Post-translational modification (PTM) is a biological process that alters proteins and is therefore involved in the regulation of various cellular activities and in pathogenesis. Protein phosphorylation is an essential process and one of the most-studied PTMs: it occurs when a phosphate group is added to a serine (Ser, S), threonine (Thr, T), or tyrosine (Tyr, Y) residue. Dysregulation of protein phosphorylation can lead to various diseases (most commonly neurological disorders, Alzheimer's disease, and Parkinson's disease), which makes it necessary to predict which S/T/Y residues in an uncharacterized amino acid sequence can be phosphorylated. Despite an abundance of sequencing data, current experimental methods for identifying PTM sites are time-consuming, costly, and error-prone, so a number of computational methods have been proposed to replace them. However, phosphorylation prediction remains limited, owing to substrate specificity, performance, and the diversity of its features.
METHODS: In the present study we propose machine-learning-based predictors that use the physicochemical, sequence, structural, and functional information of proteins to classify S/T/Y phosphorylation sites. Rigorous feature selection, based on the minimum redundancy/maximum relevance (mRMR) approach and the symmetrical uncertainty method, was employed to extract the most informative features for training the models.
RESULTS: The random forest (RF) and support vector machine (SVM) models built in the present study from diverse feature types were highly accurate, as shown by their scores on several statistical performance measures. Moreover, validation on independent test sets and benchmark datasets indicated that the proposed method clearly outperformed existing methods, demonstrating its ability to accurately predict protein phosphorylation sites.
CONCLUSIONS: The results obtained in the present work indicate that the proposed computational methodology can be used effectively to predict putative phosphorylation sites, thereby facilitating the discovery of the mechanisms underlying various biological processes.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12967-021-02851-0.
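The METHODS paragraph names two information-theoretic filter criteria, minimum redundancy/maximum relevance (mRMR) and symmetrical uncertainty (SU), without giving their details. The Python sketch below illustrates both on discrete (already binned) features; it is not the authors' implementation. It assumes the textbook definitions SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) and the difference form of mRMR (relevance minus mean redundancy), and every function name and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(xs: Sequence) -> float:
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence, ys: Sequence) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two paired discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def symmetrical_uncertainty(xs: Sequence, ys: Sequence) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalised to [0, 1]."""
    denom = entropy(xs) + entropy(ys)
    return 0.0 if denom == 0 else 2.0 * mutual_information(xs, ys) / denom


def rank_by_su(features: List[Sequence], labels: Sequence) -> List[int]:
    """Hypothetical helper: feature indices sorted by SU with the label, best first."""
    return sorted(range(len(features)),
                  key=lambda i: -symmetrical_uncertainty(features[i], labels))


def mrmr_select(features: List[Sequence], labels: Sequence, k: int) -> List[int]:
    """Hypothetical helper: greedy mRMR (difference form). At each step pick the
    feature maximising relevance I(f; labels) minus its mean redundancy
    I(f; s) over the already selected features s."""
    remaining = list(range(len(features)))
    selected: List[int] = []

    def score(i: int) -> float:
        relevance = mutual_information(features[i], labels)
        if not selected:
            return relevance
        redundancy = sum(mutual_information(features[i], features[j])
                         for j in selected) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the lower index
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: 8 candidate S/T/Y sites, label 1 = phosphorylated.
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    f0 = [0, 0, 0, 1, 1, 1, 1, 1]  # strongly related to y
    f1 = list(f0)                  # exact duplicate of f0 (fully redundant)
    f2 = [0, 0, 1, 1, 0, 0, 1, 1]  # unrelated to y, nearly independent of f0
    print(rank_by_su([f0, f1, f2], y))         # [0, 1, 2]
    print(mrmr_select([f0, f1, f2], y, k=2))   # [0, 2]
```

In the toy data, f1 duplicates f0, so once f0 is chosen, mRMR prefers f2 over the redundant copy, whereas ranking by SU alone keeps both copies at the top. Physicochemical and structural descriptors are usually continuous and would need discretization first; the binning scheme is a further assumption that the abstract does not specify.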

Journal: Journal of Translational Medicine (J Transl Med), Research article; BioMed Central, published 2021-05-24
Copyright: © The Author(s) 2021
License: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.