Phosformer: an explainable transformer model for protein kinase-specific phosphorylation predictions


Bibliographic Details
Main authors:	Zhou, Zhongliang; Yeung, Wayland; Gravel, Nathan; Salcedo, Mariah; Soleymani, Saber; Li, Sheng; Kannan, Natarajan
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900213/ https://www.ncbi.nlm.nih.gov/pubmed/36692152 http://dx.doi.org/10.1093/bioinformatics/btad046

collection	PubMed
description	MOTIVATION: The human genome encodes over 500 distinct protein kinases which regulate nearly all cellular processes by the specific phosphorylation of protein substrates. While advances in mass spectrometry and proteomics studies have identified thousands of phosphorylation sites across species, information on the specific kinases that phosphorylate these sites is currently lacking for the vast majority of phosphosites. Recently, there has been a major focus on the development of computational models for predicting kinase–substrate associations. However, most current models only allow predictions on a subset of well-studied kinases. Furthermore, the utilization of hand-curated features and imbalances in training and testing datasets pose unique challenges in the development of accurate predictive models for kinase-specific phosphorylation prediction. Motivated by the recent development of universal protein language models which automatically generate context-aware features from primary sequence information, we sought to develop a unified framework for kinase-specific phosphosite prediction, allowing for greater investigative utility and enabling substrate predictions at the whole kinome level.
RESULTS: We present a deep learning model for kinase-specific phosphosite prediction, termed Phosformer, which predicts the probability of phosphorylation given an arbitrary pair of unaligned kinase and substrate peptide sequences. We demonstrate that Phosformer implicitly learns evolutionary and functional features during training, removing the need for feature curation and engineering. Further analyses reveal that Phosformer also learns substrate specificity motifs and is able to distinguish between functionally distinct kinase families. Benchmarks indicate that Phosformer exhibits significant improvements compared to the state-of-the-art models, while also presenting a more generalized, unified, and interpretable predictive framework.
AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/esbgkannan/phosformer.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
id	pubmed-9900213
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Bioinformatics
published	Oxford University Press, 2023-01-24
rights	© The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
