Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions
Main authors: | Zhou, Zhongliang; Yeung, Wayland; Gravel, Nathan; Salcedo, Mariah; Soleymani, Saber; Li, Sheng; Kannan, Natarajan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | Original Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900213/ https://www.ncbi.nlm.nih.gov/pubmed/36692152 http://dx.doi.org/10.1093/bioinformatics/btad046 |
_version_ | 1784882800498835456 |
---|---|
author | Zhou, Zhongliang; Yeung, Wayland; Gravel, Nathan; Salcedo, Mariah; Soleymani, Saber; Li, Sheng; Kannan, Natarajan |
collection | PubMed |
description | MOTIVATION: The human genome encodes over 500 distinct protein kinases which regulate nearly all cellular processes by the specific phosphorylation of protein substrates. While advances in mass spectrometry and proteomics studies have identified thousands of phosphorylation sites across species, information on the specific kinases that phosphorylate these sites is currently lacking for the vast majority of phosphosites. Recently, there has been a major focus on the development of computational models for predicting kinase–substrate associations. However, most current models only allow predictions on a subset of well-studied kinases. Furthermore, the utilization of hand-curated features and imbalances in training and testing datasets pose unique challenges in the development of accurate predictive models for kinase-specific phosphorylation prediction. Motivated by the recent development of universal protein language models which automatically generate context-aware features from primary sequence information, we sought to develop a unified framework for kinase-specific phosphosite prediction, allowing for greater investigative utility and enabling substrate predictions at the whole kinome level. RESULTS: We present a deep learning model for kinase-specific phosphosite prediction, termed Phosformer, which predicts the probability of phosphorylation given an arbitrary pair of unaligned kinase and substrate peptide sequences. We demonstrate that Phosformer implicitly learns evolutionary and functional features during training, removing the need for feature curation and engineering. Further analyses reveal that Phosformer also learns substrate specificity motifs and is able to distinguish between functionally distinct kinase families. Benchmarks indicate that Phosformer exhibits significant improvements compared to the state-of-the-art models, while also presenting a more generalized, unified, and interpretable predictive framework. 
AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/esbgkannan/phosformer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9900213 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9900213 (2023-02-07). Bioinformatics, Original Paper. Oxford University Press, 2023-01-24. /pmc/articles/PMC9900213/ /pubmed/36692152 http://dx.doi.org/10.1093/bioinformatics/btad046. © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900213/ https://www.ncbi.nlm.nih.gov/pubmed/36692152 http://dx.doi.org/10.1093/bioinformatics/btad046 |
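The description states that Phosformer predicts the probability of phosphorylation for an arbitrary pair of unaligned kinase and substrate peptide sequences. As a minimal sketch of how the substrate half of such a pair might be prepared, the Python below cuts a fixed-length peptide centered on each serine, threonine, or tyrosine in a protein. The function names `substrate_window` and `candidate_sites`, the 15-residue default (seven flanking residues per side), and the `-` padding character are illustrative assumptions, not taken from the Phosformer code at https://github.com/esbgkannan/phosformer.

```python
from typing import List, Tuple

PHOSPHO_ACCEPTORS = "STY"  # serine, threonine, tyrosine


def substrate_window(protein_seq: str, site: int, flank: int = 7, pad: str = "-") -> str:
    """Return the peptide centered on a 1-based phosphosite position.

    Hypothetical helper: the window length (2 * flank + 1) and the padding
    character are assumptions. Positions outside the protein are filled with
    `pad`, so the result always has length 2 * flank + 1.
    """
    if not 1 <= site <= len(protein_seq):
        raise ValueError(f"site {site} is outside a sequence of length {len(protein_seq)}")
    residue = protein_seq[site - 1]
    if residue not in PHOSPHO_ACCEPTORS:
        raise ValueError(f"residue {residue!r} at position {site} is not S, T, or Y")
    center = site - 1  # convert to a 0-based index
    window = []
    for i in range(center - flank, center + flank + 1):
        # Guard explicitly so negative indices pad instead of wrapping around.
        window.append(protein_seq[i] if 0 <= i < len(protein_seq) else pad)
    return "".join(window)


def candidate_sites(protein_seq: str, flank: int = 7) -> List[Tuple[int, str]]:
    """List (1-based position, peptide window) for every S/T/Y in the protein."""
    return [
        (pos, substrate_window(protein_seq, pos, flank))
        for pos, residue in enumerate(protein_seq, start=1)
        if residue in PHOSPHO_ACCEPTORS
    ]


if __name__ == "__main__":
    for pos, peptide in candidate_sites("MSTEYRK", flank=3):
        print(pos, peptide)
```

Each peptide would then be paired with a kinase sequence and scored by the model; how the repository actually tokenizes and combines the two sequences is not described in this record.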