Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding
The 534 protein kinases encoded in the human genome constitute a large druggable class of proteins that include both well-studied and understudied “dark” members. Accurate prediction of dark kinase functions is a major bioinformatics challenge. Here, we employ a graph mining approach that uses the e...
Main Authors: | Salcedo, Mariah V.; Gravel, Nathan; Keshavarzi, Abbas; Huang, Liang-Chin; Kochut, Krzysztof J.; Kannan, Natarajan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2023 |
Subjects: | Biochemistry |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10590106/ https://www.ncbi.nlm.nih.gov/pubmed/37868056 http://dx.doi.org/10.7717/peerj.15815 |
_version_ | 1785123928963809280 |
---|---|
author | Salcedo, Mariah V.; Gravel, Nathan; Keshavarzi, Abbas; Huang, Liang-Chin; Kochut, Krzysztof J.; Kannan, Natarajan |
author_facet | Salcedo, Mariah V.; Gravel, Nathan; Keshavarzi, Abbas; Huang, Liang-Chin; Kochut, Krzysztof J.; Kannan, Natarajan |
author_sort | Salcedo, Mariah V. |
collection | PubMed |
description | The 534 protein kinases encoded in the human genome constitute a large druggable class of proteins that include both well-studied and understudied “dark” members. Accurate prediction of dark kinase functions is a major bioinformatics challenge. Here, we employ a graph mining approach that uses the evolutionary and functional context encoded in knowledge graphs (KGs) to predict protein and pathway associations for understudied kinases. We propose a new scalable graph embedding approach, RegPattern2Vec, which employs regular pattern constrained random walks to sample diverse aspects of node context within a KG flexibly. RegPattern2Vec learns functional representations of kinases, interacting partners, post-translational modifications, pathways, cellular localization, and chemical interactions from a kinase-centric KG that integrates and conceptualizes data from curated heterogeneous data resources. By contextualizing information relevant to prediction, RegPattern2Vec improves accuracy and efficiency in comparison to other random walk-based graph embedding approaches. We show that the predictions produced by our model overlap with pathway enrichment data produced using experimentally validated Protein-Protein Interaction (PPI) data from both publicly available databases and experimental datasets not used in training. Our model also has the advantage of using the collected random walks as biological context to interpret the predicted protein-pathway associations. We provide high-confidence pathway predictions for 34 dark kinases and present three case studies in which analysis of meta-paths associated with the prediction enables biological interpretation. Overall, RegPattern2Vec efficiently samples multiple node types for link prediction on biological knowledge graphs and the predicted associations between understudied kinases, pseudokinases, and known pathways serve as a conceptual starting point for hypothesis generation and testing. |
format | Online Article Text |
id | pubmed-10590106 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10590106 2023-10-22 Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding Salcedo, Mariah V. Gravel, Nathan Keshavarzi, Abbas Huang, Liang-Chin Kochut, Krzysztof J. Kannan, Natarajan PeerJ Biochemistry The 534 protein kinases encoded in the human genome constitute a large druggable class of proteins that include both well-studied and understudied “dark” members. Accurate prediction of dark kinase functions is a major bioinformatics challenge. Here, we employ a graph mining approach that uses the evolutionary and functional context encoded in knowledge graphs (KGs) to predict protein and pathway associations for understudied kinases. We propose a new scalable graph embedding approach, RegPattern2Vec, which employs regular pattern constrained random walks to sample diverse aspects of node context within a KG flexibly. RegPattern2Vec learns functional representations of kinases, interacting partners, post-translational modifications, pathways, cellular localization, and chemical interactions from a kinase-centric KG that integrates and conceptualizes data from curated heterogeneous data resources. By contextualizing information relevant to prediction, RegPattern2Vec improves accuracy and efficiency in comparison to other random walk-based graph embedding approaches. We show that the predictions produced by our model overlap with pathway enrichment data produced using experimentally validated Protein-Protein Interaction (PPI) data from both publicly available databases and experimental datasets not used in training. Our model also has the advantage of using the collected random walks as biological context to interpret the predicted protein-pathway associations. We provide high-confidence pathway predictions for 34 dark kinases and present three case studies in which analysis of meta-paths associated with the prediction enables biological interpretation. Overall, RegPattern2Vec efficiently samples multiple node types for link prediction on biological knowledge graphs and the predicted associations between understudied kinases, pseudokinases, and known pathways serve as a conceptual starting point for hypothesis generation and testing. PeerJ Inc. 2023-10-18 /pmc/articles/PMC10590106/ /pubmed/37868056 http://dx.doi.org/10.7717/peerj.15815 Text en ©2023 Salcedo et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Biochemistry Salcedo, Mariah V. Gravel, Nathan Keshavarzi, Abbas Huang, Liang-Chin Kochut, Krzysztof J. Kannan, Natarajan Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding |
title | Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding |
title_full | Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding |
title_fullStr | Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding |
title_full_unstemmed | Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding |
title_short | Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding |
title_sort | predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding |
topic | Biochemistry |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10590106/ https://www.ncbi.nlm.nih.gov/pubmed/37868056 http://dx.doi.org/10.7717/peerj.15815 |
work_keys_str_mv | AT salcedomariahv predictingproteinandpathwayassociationsforunderstudieddarkkinasesusingpatternconstrainedknowledgegraphembedding AT gravelnathan predictingproteinandpathwayassociationsforunderstudieddarkkinasesusingpatternconstrainedknowledgegraphembedding AT keshavarziabbas predictingproteinandpathwayassociationsforunderstudieddarkkinasesusingpatternconstrainedknowledgegraphembedding AT huangliangchin predictingproteinandpathwayassociationsforunderstudieddarkkinasesusingpatternconstrainedknowledgegraphembedding AT kochutkrzysztofj predictingproteinandpathwayassociationsforunderstudieddarkkinasesusingpatternconstrainedknowledgegraphembedding AT kannannatarajan predictingproteinandpathwayassociationsforunderstudieddarkkinasesusingpatternconstrainedknowledgegraphembedding |