Efficient link prediction in the protein–protein interaction network using topological information in a generative adversarial network machine learning model

Bibliographic Details
Main authors:	Balogh, Olivér M.; Benczik, Bettina; Horváth, András; Pétervári, Mátyás; Csermely, Péter; Ferdinandy, Péter; Ágg, Bence
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8858570/ https://www.ncbi.nlm.nih.gov/pubmed/35183129 http://dx.doi.org/10.1186/s12859-022-04598-x

author	Balogh, Olivér M.; Benczik, Bettina; Horváth, András; Pétervári, Mátyás; Csermely, Péter; Ferdinandy, Péter; Ágg, Bence
collection	PubMed
description	BACKGROUND: Investigating possible interactions between two proteins in intracellular signaling is an expensive and laborious wet-lab procedure; therefore, several in silico approaches have been implemented to narrow down the candidates for future experimental validation. Reformulated in terms of network theory, the set of proteins can be represented as the nodes of a network and the interactions between them as its edges. The resulting protein–protein interaction (PPI) network enables the use of link prediction techniques to discover new probable connections. Here, we aimed to offer a novel approach to the link prediction task in PPI networks using a generative machine learning model. RESULTS: We created a tool that consists of two modules: a data processing framework and a machine learning model. For data processing, we used a modified breadth-first search algorithm to traverse the network and extract induced subgraphs, which served as image-like input data for our model. For machine learning, we used a conditional generative adversarial network (cGAN) inspired by image-to-image translation, with a Wasserstein distance-based loss improved with gradient penalty; the model takes the combined representation from the data processing module as input, and its generator is trained to predict the probable unknown edges in the provided induced subgraphs. Our link prediction tool was evaluated on the PPI networks of five different species from the STRING database by calculating the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC) and the normalized discounted cumulative gain (NDCG). Averaged over all investigated species, test runs yielded AUROC = 0.915, AUPRC = 0.176 and NDCG = 0.763. CONCLUSION: We developed a software tool for link prediction in PPI networks using machine learning.
The evaluation of our software serves as the first demonstration that a cGAN model, conditioned on raw topological features of the PPI network, is an applicable solution for the PPI prediction problem without requiring often unavailable molecular node attributes. The corresponding scripts are available at https://github.com/semmelweis-pharmacology/ppi_pred. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04598-x.
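The data-processing step and the evaluation metrics named in the description can be illustrated with a minimal Python sketch. It assumes the PPI network is given as a dictionary mapping each protein to the set of its neighbours and uses a plain breadth-first search capped at a hypothetical `max_nodes` limit; the authors' specific modification of BFS, their masking of known edges and the cGAN itself are not reproduced. All function names are hypothetical. The induced subgraph is returned as a 0/1 adjacency matrix (the "image-like" input), and candidate-edge scores can then be evaluated with AUROC, average precision (one common estimator of AUPRC) and NDCG.

```python
from collections import deque
import math


def bfs_induced_subgraph(adj, start, max_nodes):
    """Visit up to max_nodes nodes breadth-first from start and return
    (node_order, adjacency_matrix) of the induced subgraph.
    Plain capped BFS; NOT the authors' modified BFS (assumption)."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        # Sorted neighbours make the node order (and matrix layout) deterministic.
        for nb in sorted(adj.get(node, ())):
            if len(order) >= max_nodes:
                break
            if nb not in seen:
                seen.add(nb)
                order.append(nb)
                queue.append(nb)
    index = {n: i for i, n in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for u in order:
        for v in adj.get(u, ()):
            if v in index:  # induced subgraph: keep edges with both ends inside
                matrix[index[u]][index[v]] = 1
    return order, matrix


def auroc(scores, labels):
    """Probability that a random true edge outscores a random non-edge (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of precision@k over the ranks k of the true edges (an AUPRC estimate)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(ranked, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / max(1, sum(labels))


def ndcg(scores, labels):
    """DCG of the score ranking divided by the DCG of the ideal ranking."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dcg = sum(labels[i] / math.log2(r + 2) for r, i in enumerate(ranked))
    ideal = sorted(labels, reverse=True)
    idcg = sum(rel / math.log2(r + 2) for r, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Toy network with hypothetical protein names.
    ppi = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B", "E"}, "E": {"D"}}
    nodes, img = bfs_induced_subgraph(ppi, "A", max_nodes=3)
    print(nodes, img)  # ['A', 'B', 'C'] [[0, 1, 1], [1, 0, 0], [1, 0, 0]]

    # Made-up scores for four candidate edges; 1 = held-out true edge.
    scores, labels = [0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]
    print(auroc(scores, labels))                        # 0.75
    print(round(average_precision(scores, labels), 4))  # 0.8333
    print(round(ndcg(scores, labels), 4))               # 0.9197
```

The toy scores are invented, so the printed values only demonstrate the arithmetic and have no bearing on the reported results. The pairwise AUROC loop is quadratic in the number of candidates, which is acceptable for a small induced subgraph; for whole-network evaluation a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be more practical.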
format	Online Article Text
id	pubmed-8858570
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8858570 2022-02-23 BMC Bioinformatics Software BioMed Central 2022-02-19 /pmc/articles/PMC8858570/ /pubmed/35183129 http://dx.doi.org/10.1186/s12859-022-04598-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Efficient link prediction in the protein–protein interaction network using topological information in a generative adversarial network machine learning model
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8858570/ https://www.ncbi.nlm.nih.gov/pubmed/35183129 http://dx.doi.org/10.1186/s12859-022-04598-x