
KSFinder—a knowledge graph model for link prediction of novel phosphorylated substrates of kinases


Bibliographic Details
Main Authors: Anandakrishnan, Manju; Ross, Karen E.; Chen, Chuming; Shanker, Vijay; Cowart, Julie; Wu, Cathy H.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10561642/
https://www.ncbi.nlm.nih.gov/pubmed/37818330
http://dx.doi.org/10.7717/peerj.16164
_version_ 1785117966688321536
author Anandakrishnan, Manju
Ross, Karen E.
Chen, Chuming
Shanker, Vijay
Cowart, Julie
Wu, Cathy H.
author_sort Anandakrishnan, Manju
collection PubMed
description BACKGROUND: Aberrant protein kinase regulation leading to abnormal substrate phosphorylation is associated with several human diseases. Despite the promise of therapies targeting kinases, many human kinases remain understudied. Most existing computational tools for predicting phosphorylation cover fewer than 50% of known human kinases. They rely on local feature selection based on protein sequences, motifs, domains, structures, and/or functions, and do not consider the heterogeneous relationships among the proteins. In this work, we present KSFinder, a tool that predicts kinase-substrate links by capturing the inherent association of proteins in a network comprising 85% of the known human kinases. We also postulate the potential role of two understudied kinases based on their substrate predictions from KSFinder.
METHODS: KSFinder learns the semantic relationships in a phosphoproteome knowledge graph using a knowledge graph embedding algorithm and represents the nodes as low-dimensional vectors. A multilayer perceptron (MLP) classifier is trained to discern kinase-substrate links using the embedded vectors. KSFinder uses a strategic negative generation approach that eliminates biases in entity representation and combines data from experimentally validated non-interacting protein pairs, proteins from different subcellular locations, and random sampling. We assess KSFinder’s generalization capability on four different datasets and compare its performance with other state-of-the-art prediction models. We employ KSFinder to predict substrates of 68 “dark” kinases considered understudied by the Illuminating the Druggable Genome program and use our text-mining tool, RLIMS-P, along with manual curation, to search for literature evidence for the predictions. In a case study, we perform functional enrichment analysis for two dark kinases, HIPK3 and CAMKK1, using their predicted substrates.
RESULTS: KSFinder shows improved performance over other kinase-substrate prediction models and generalized prediction ability on different datasets. We identified literature evidence for 17 novel predictions involving an understudied kinase. All 17 of these predictions had a probability score ≥0.7 (nine at >0.9, six at 0.8–0.9, and two at 0.7–0.8). The evaluation of 93,593 negative predictions (probability ≤0.3) identified four false negatives. The top enriched biological processes of HIPK3 substrates relate to the regulation of extracellular matrix and epigenetic gene expression, while those of CAMKK1 substrates include lipid storage regulation and glucose homeostasis.
CONCLUSIONS: KSFinder outperforms the current kinase-substrate prediction tools and offers higher kinase coverage. The strategically developed negatives give KSFinder superior generalization ability. We predicted substrates of 432 kinases, 68 of which are understudied, and hypothesized the potential functions of two dark kinases using their predicted substrates.
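The negative-generation step and the probability bands described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_negatives` and `score_band`, the toy protein identifiers, and the location labels are all hypothetical, and the sketch does not attempt the correction for biases in entity representation that the abstract mentions. How scores exactly on a band boundary (0.8, 0.9) are assigned is also an assumption, since the abstract does not say.

```python
import random


def build_negatives(positives, non_interacting, location, n_random, seed=0):
    """Combine three negative sources; never emit a known positive pair.

    positives / non_interacting: sets of (kinase, protein) pairs.
    location: dict mapping protein id -> subcellular location label.
    All inputs here are hypothetical toy data, not the KSFinder datasets.
    """
    positives = set(positives)
    kinases = sorted({k for k, _ in positives})
    proteins = sorted(location)
    negatives = set()

    # Source 1: experimentally validated non-interacting pairs.
    negatives.update(p for p in non_interacting if p not in positives)

    # Source 2: kinase and candidate annotated to different locations.
    for k in kinases:
        for p in proteins:
            if p == k or (k, p) in positives:
                continue
            if k in location and location[k] != location[p]:
                negatives.add((k, p))

    # Source 3: random sampling from the remaining unlabeled pairs.
    rng = random.Random(seed)
    candidates = [
        (k, p)
        for k in kinases
        for p in proteins
        if p != k and (k, p) not in positives and (k, p) not in negatives
    ]
    negatives.update(rng.sample(candidates, min(n_random, len(candidates))))
    return negatives


def score_band(score):
    """Map a predicted probability to the bands quoted in the abstract.

    Boundary handling (0.8 and 0.9 exactly) is an assumption.
    """
    if score > 0.9:
        return ">0.9"
    if score >= 0.8:
        return "0.8-0.9"
    if score >= 0.7:
        return "0.7-0.8"
    if score <= 0.3:
        return "negative"
    return "unbanded"


# Toy inputs (hypothetical identifiers).
positives = {("K1", "S1"), ("K2", "S2")}
non_interacting = {("K1", "S3")}
location = {
    "K1": "nucleus", "K2": "cytoplasm",
    "S1": "nucleus", "S2": "cytoplasm",
    "S3": "nucleus", "S4": "membrane",
}

negatives = build_negatives(positives, non_interacting, location, n_random=2)
print(sorted(negatives))
print(score_band(0.95), score_band(0.25))
```

Drawing negatives from several sources, rather than from random pairs alone, is the design choice the abstract credits for KSFinder's generalization; the sketch only shows how such sources might be merged while keeping positives out.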
format Online
Article
Text
id pubmed-10561642
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10561642 2023-10-10 PeerJ Bioinformatics PeerJ Inc. 2023-10-06 /pmc/articles/PMC10561642/ /pubmed/37818330 http://dx.doi.org/10.7717/peerj.16164 Text en
©2023 Anandakrishnan et al. https://creativecommons.org/licenses/by-nc/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by-nc/4.0/), which permits using, remixing, and building upon the work non-commercially, as long as it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title KSFinder—a knowledge graph model for link prediction of novel phosphorylated substrates of kinases
topic Bioinformatics