An Efficient Computational Model for Large-Scale Prediction of Protein–Protein Interactions Based on Accurate and Scalable Graph Embedding

Bibliographic Details
Main authors: Su, Xiao-Rui; You, Zhu-Hong; Hu, Lun; Huang, Yu-An; Wang, Yi; Yi, Hai-Cheng
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7953052/
https://www.ncbi.nlm.nih.gov/pubmed/33719344
http://dx.doi.org/10.3389/fgene.2021.635451
_version_ 1783663854139473920
author Su, Xiao-Rui
You, Zhu-Hong
Hu, Lun
Huang, Yu-An
Wang, Yi
Yi, Hai-Cheng
author_facet Su, Xiao-Rui
You, Zhu-Hong
Hu, Lun
Huang, Yu-An
Wang, Yi
Yi, Hai-Cheng
author_sort Su, Xiao-Rui
collection PubMed
description Protein–protein interactions (PPIs) underlie the molecular mechanisms of living cells. Although traditional experiments can detect PPIs accurately, they are costly and time-consuming. As a result, computational methods have been used to predict PPIs instead. Graphs, as important and pervasive data carriers, are considered the most suitable structure for representing biomedical entities and their relationships. Although graph embedding is the most popular approach to graph representation learning, it usually suffers from high computational and space costs, especially on large-scale graphs. Therefore, a framework that accelerates graph embedding and improves the accuracy of the embedding results is important for large-scale PPI prediction. In this paper, we propose a multi-level model, LPPI, to improve both the quality and the speed of large-scale PPI prediction. First, basic information about each protein is collected as its attributes, including positional gene sets, motif gene sets, and immunological signatures. Second, a weighted graph is constructed by using the protein attributes to calculate node similarity. GraphZoom is then used to accelerate the embedding process by reducing the size of the weighted graph. Next, graph embedding methods are used to learn graph topology features from the reconstructed graph. Finally, a linear Logistic Regression (LR) model is used to predict the probability that two proteins interact. LPPI achieved high accuracies of 0.99997 and 0.9979 on the PPI network dataset and the GraphSAGE-PPI dataset, respectively. Our further results show that LPPI is promising for large-scale PPI prediction in both accuracy and efficiency, which is beneficial to detecting interactions among other biomedical molecules at large scale.
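The graph-construction step in the description above (protein attributes used to compute node similarity, which becomes the edge weight) can be illustrated with a minimal sketch. The record does not say which similarity measure LPPI uses, so Jaccard similarity over attribute sets is an assumption here, and the protein names and gene-set labels are hypothetical. The later stages (GraphZoom coarsening, graph embedding, Logistic Regression) are not shown.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_weighted_graph(attributes: dict, threshold: float = 0.0) -> dict:
    """Return {(u, v): weight} for each protein pair whose similarity exceeds threshold.

    The similarity measure (Jaccard) and the threshold are assumptions,
    not details taken from the record.
    """
    edges = {}
    for u, v in combinations(sorted(attributes), 2):
        weight = jaccard(attributes[u], attributes[v])
        if weight > threshold:
            edges[(u, v)] = weight
    return edges


# Hypothetical proteins and gene-set labels, for illustration only.
attributes = {
    "P1": {"chr1p36", "MOTIF_A", "IMMUNE_X"},
    "P2": {"chr1p36", "MOTIF_A"},
    "P3": {"chr7q21", "MOTIF_B"},
}
graph = build_weighted_graph(attributes)
```

In this toy input only P1 and P2 share attributes, so the resulting graph has a single weighted edge between them; pairs with no shared attributes get no edge.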
format Online
Article
Text
id pubmed-7953052
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7953052 2021-03-13 Front Genet Genetics Frontiers Media S.A. 2021-02-26 /pmc/articles/PMC7953052/ /pubmed/33719344 http://dx.doi.org/10.3389/fgene.2021.635451 Text en Copyright © 2021 Su, You, Hu, Huang, Wang and Yi. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Su, Xiao-Rui
You, Zhu-Hong
Hu, Lun
Huang, Yu-An
Wang, Yi
Yi, Hai-Cheng
An Efficient Computational Model for Large-Scale Prediction of Protein–Protein Interactions Based on Accurate and Scalable Graph Embedding
title An Efficient Computational Model for Large-Scale Prediction of Protein–Protein Interactions Based on Accurate and Scalable Graph Embedding
title_full An Efficient Computational Model for Large-Scale Prediction of Protein–Protein Interactions Based on Accurate and Scalable Graph Embedding
title_fullStr An Efficient Computational Model for Large-Scale Prediction of Protein–Protein Interactions Based on Accurate and Scalable Graph Embedding
title_full_unstemmed An Efficient Computational Model for Large-Scale Prediction of Protein–Protein Interactions Based on Accurate and Scalable Graph Embedding
title_short An Efficient Computational Model for Large-Scale Prediction of Protein–Protein Interactions Based on Accurate and Scalable Graph Embedding
title_sort efficient computational model for large-scale prediction of protein–protein interactions based on accurate and scalable graph embedding
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7953052/
https://www.ncbi.nlm.nih.gov/pubmed/33719344
http://dx.doi.org/10.3389/fgene.2021.635451