Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding

Bibliographic Details
Main authors: Salcedo, Mariah V., Gravel, Nathan, Keshavarzi, Abbas, Huang, Liang-Chin, Kochut, Krzysztof J., Kannan, Natarajan
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Biochemistry
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10590106/
https://www.ncbi.nlm.nih.gov/pubmed/37868056
http://dx.doi.org/10.7717/peerj.15815
author Salcedo, Mariah V.
Gravel, Nathan
Keshavarzi, Abbas
Huang, Liang-Chin
Kochut, Krzysztof J.
Kannan, Natarajan
collection PubMed
description The 534 protein kinases encoded in the human genome constitute a large druggable class of proteins that include both well-studied and understudied “dark” members. Accurate prediction of dark kinase functions is a major bioinformatics challenge. Here, we employ a graph mining approach that uses the evolutionary and functional context encoded in knowledge graphs (KGs) to predict protein and pathway associations for understudied kinases. We propose a new scalable graph embedding approach, RegPattern2Vec, which employs regular pattern constrained random walks to flexibly sample diverse aspects of node context within a KG. RegPattern2Vec learns functional representations of kinases, interacting partners, post-translational modifications, pathways, cellular localization, and chemical interactions from a kinase-centric KG that integrates and conceptualizes data from curated heterogeneous data resources. By contextualizing information relevant to prediction, RegPattern2Vec improves accuracy and efficiency in comparison to other random walk-based graph embedding approaches. We show that the predictions produced by our model overlap with pathway enrichment data produced using experimentally validated Protein-Protein Interaction (PPI) data from both publicly available databases and experimental datasets not used in training. Our model also has the advantage of using the collected random walks as biological context to interpret the predicted protein-pathway associations. We provide high-confidence pathway predictions for 34 dark kinases and present three case studies in which analysis of meta-paths associated with the prediction enables biological interpretation. Overall, RegPattern2Vec efficiently samples multiple node types for link prediction on biological knowledge graphs and the predicted associations between understudied kinases, pseudokinases, and known pathways serve as a conceptual starting point for hypothesis generation and testing.
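The abstract describes sampling "regular pattern constrained random walks" over a typed knowledge graph. As a rough illustration only, the Python sketch below restricts a random walk so that the sequence of node types it visits always stays a prefix of a regular pattern, encoded as a small deterministic finite automaton (DFA). The toy graph, the single-letter node types (K = kinase, P = interacting protein, W = pathway), the pattern K((P|W)K)*, and all function names are hypothetical assumptions made for this example. It is not the authors' RegPattern2Vec implementation, and the embedding step that would consume these walks (for example a skip-gram model) is omitted.

```python
import random
import re
from collections import defaultdict

# Hypothetical toy KG (not data from the study).
# Node types: K = kinase, P = interacting protein, W = pathway.
NODE_TYPE = {
    "KIN_A": "K", "KIN_B": "K", "KIN_C": "K",
    "PROT_X": "P", "PROT_Y": "P",
    "PATH_1": "W", "PATH_2": "W",
}
EDGES = [
    ("KIN_A", "PROT_X"), ("KIN_A", "PATH_1"),
    ("KIN_B", "PROT_X"), ("KIN_B", "PATH_2"),
    ("KIN_C", "PROT_Y"), ("KIN_C", "PATH_1"),
]

# Assumed pattern K((P|W)K)* over node types, written as a DFA:
# state 0 = expect a kinase, 1 = just saw a kinase, 2 = just saw P or W.
DFA = {(0, "K"): 1, (1, "P"): 2, (1, "W"): 2, (2, "K"): 1}

# Regex accepting every non-empty prefix of the pattern; used only to check walks.
PREFIX_PATTERN = re.compile(r"K([PW]K)*[PW]?")


def build_adjacency(edges):
    """Undirected adjacency lists for the toy KG."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def constrained_walk(adj, start, length, rng):
    """Random walk whose node-type sequence stays a prefix of the pattern."""
    state = DFA.get((0, NODE_TYPE[start]))
    if state is None:
        return []  # this node type cannot start the pattern
    walk = [start]
    while len(walk) < length:
        # Keep only neighbours whose type has a DFA transition from `state`.
        candidates = [n for n in adj[walk[-1]] if (state, NODE_TYPE[n]) in DFA]
        if not candidates:
            break  # dead end: no neighbour continues the pattern
        nxt = rng.choice(candidates)
        state = DFA[(state, NODE_TYPE[nxt])]
        walk.append(nxt)
    return walk


def sample_walks(edges, walks_per_node, length, seed=0):
    """Collect pattern-constrained walks from every node that can start one."""
    adj = build_adjacency(edges)
    rng = random.Random(seed)
    walks = []
    for node in sorted(adj):
        for _ in range(walks_per_node):
            walk = constrained_walk(adj, node, length, rng)
            if len(walk) > 1:
                walks.append(walk)
    return walks


def matches_pattern(walk):
    """True if the walk's node-type string is a prefix of K((P|W)K)*."""
    return PREFIX_PATTERN.fullmatch("".join(NODE_TYPE[n] for n in walk)) is not None


walks = sample_walks(EDGES, walks_per_node=2, length=5)
for w in walks:
    print(" -> ".join(w))
```

Each sampled walk could then be treated as a "sentence" of node identifiers for an embedding model. Encoding the pattern as a DFA lets the walker test each candidate neighbour with a single table lookup rather than re-matching the whole type string against a regular expression at every step.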
format Online
Article
Text
id pubmed-10590106
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10590106 2023-10-22 PeerJ Biochemistry PeerJ Inc. 2023-10-18 /pmc/articles/PMC10590106/ /pubmed/37868056 http://dx.doi.org/10.7717/peerj.15815 Text en
©2023 Salcedo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding
topic Biochemistry