Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions

MOTIVATION: The human genome encodes over 500 distinct protein kinases which regulate nearly all cellular processes by the specific phosphorylation of protein substrates. While advances in mass spectrometry and proteomics studies have identified thousands of phosphorylation sites across species, inf...

Bibliographic Details
Main authors: Zhou, Zhongliang; Yeung, Wayland; Gravel, Nathan; Salcedo, Mariah; Soleymani, Saber; Li, Sheng; Kannan, Natarajan
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900213/
https://www.ncbi.nlm.nih.gov/pubmed/36692152
http://dx.doi.org/10.1093/bioinformatics/btad046
author Zhou, Zhongliang
Yeung, Wayland
Gravel, Nathan
Salcedo, Mariah
Soleymani, Saber
Li, Sheng
Kannan, Natarajan
author_sort Zhou, Zhongliang
collection PubMed
description MOTIVATION: The human genome encodes over 500 distinct protein kinases which regulate nearly all cellular processes by the specific phosphorylation of protein substrates. While advances in mass spectrometry and proteomics studies have identified thousands of phosphorylation sites across species, information on the specific kinases that phosphorylate these sites is currently lacking for the vast majority of phosphosites. Recently, there has been a major focus on the development of computational models for predicting kinase–substrate associations. However, most current models only allow predictions on a subset of well-studied kinases. Furthermore, the utilization of hand-curated features and imbalances in training and testing datasets pose unique challenges in the development of accurate predictive models for kinase-specific phosphorylation prediction. Motivated by the recent development of universal protein language models which automatically generate context-aware features from primary sequence information, we sought to develop a unified framework for kinase-specific phosphosite prediction, allowing for greater investigative utility and enabling substrate predictions at the whole kinome level. RESULTS: We present a deep learning model for kinase-specific phosphosite prediction, termed Phosformer, which predicts the probability of phosphorylation given an arbitrary pair of unaligned kinase and substrate peptide sequences. We demonstrate that Phosformer implicitly learns evolutionary and functional features during training, removing the need for feature curation and engineering. Further analyses reveal that Phosformer also learns substrate specificity motifs and is able to distinguish between functionally distinct kinase families. Benchmarks indicate that Phosformer exhibits significant improvements compared to the state-of-the-art models, while also presenting a more generalized, unified, and interpretable predictive framework. 
AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/esbgkannan/phosformer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9900213
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9900213 2023-02-07 Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions Zhou, Zhongliang Yeung, Wayland Gravel, Nathan Salcedo, Mariah Soleymani, Saber Li, Sheng Kannan, Natarajan Bioinformatics Original Paper Oxford University Press 2023-01-24 /pmc/articles/PMC9900213/ /pubmed/36692152 http://dx.doi.org/10.1093/bioinformatics/btad046 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900213/
https://www.ncbi.nlm.nih.gov/pubmed/36692152
http://dx.doi.org/10.1093/bioinformatics/btad046