Protein complexes identification based on go attributed network embedding

BACKGROUND: Identifying protein complexes from protein-protein interaction (PPI) networks is one of the most important tasks in proteomics. Existing computational methods try to incorporate a variety of biological evidence to enhance the quality of predicted complexes. However, it is still a challen...

Bibliographic Details
Main Authors: Xu, Bo, Li, Kun, Zheng, Wei, Liu, Xiaoxia, Zhang, Yijia, Zhao, Zhehuan, He, Zengyou
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302388/
https://www.ncbi.nlm.nih.gov/pubmed/30572820
http://dx.doi.org/10.1186/s12859-018-2555-x
author Xu, Bo
Li, Kun
Zheng, Wei
Liu, Xiaoxia
Zhang, Yijia
Zhao, Zhehuan
He, Zengyou
collection PubMed
description BACKGROUND: Identifying protein complexes from protein-protein interaction (PPI) networks is one of the most important tasks in proteomics. Existing computational methods try to incorporate a variety of biological evidence to enhance the quality of predicted complexes. However, it is still a challenge to integrate different types of biological information into the complex discovery process under a unified framework. Recently, attributed network embedding methods have proved remarkably effective in generating vector representations for nodes in a network. In the transformed vector space, both the topological proximity and the node attribute affinity between different nodes are preserved. Therefore, such attributed network embedding methods provide a unified framework for integrating various biological evidence into the protein complex identification process. RESULTS: In this article, we propose a new method called GANE to predict protein complexes based on Gene Ontology (GO) attributed network embedding. First, it learns the vector representation for each protein from a GO attributed PPI network. Based on the pairwise similarity of these vector representations, a weighted adjacency matrix is constructed. Second, it uses a clique mining method to generate candidate cores. Seed cores are then obtained by ranking candidate cores by their densities on the weighted adjacency matrix and removing redundant cores. For each seed core, its attachments are the proteins whose correlation score is larger than a given threshold. The combination of a seed core and its attachment proteins is reported as a predicted protein complex by the GANE algorithm. For performance evaluation, we compared GANE with six protein complex identification methods on five yeast PPI networks. Experimental results show that GANE performs better than the competing algorithms in terms of different evaluation metrics.
CONCLUSIONS: GANE provides a framework that integrates many valuable and different kinds of biological information into the task of protein complex identification. The protein vector representation learned from our attributed PPI network can also be used in other tasks, such as PPI prediction and disease gene prediction. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2555-x) contains supplementary material, which is available to authorized users.
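
The core-attachment pipeline outlined in the description can be illustrated with a small sketch. This is not the authors' implementation: the embedding step is not reproduced (the protein vectors are assumed to be given), and the function names, the minimum clique size of 3, the overlap-based redundancy test, and the definition of the correlation score as the average edge weight to the core are all assumptions made for illustration.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def weighted_adjacency(edges, vectors):
    """Weight each PPI edge by the similarity of its endpoints' vectors."""
    w = {}
    for a, b in edges:
        s = cosine(vectors[a], vectors[b])
        w.setdefault(a, {})[b] = s
        w.setdefault(b, {})[a] = s
    return w

def maximal_cliques(w, min_size=3):
    """Bron-Kerbosch (no pivoting); min_size is an assumed cutoff."""
    found = []
    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                found.append(frozenset(r))
            return
        for v in list(p):
            nbrs = set(w[v])
            expand(r | {v}, p & nbrs, x & nbrs)
            p = p - {v}
            x = x | {v}
    expand(set(), set(w), set())
    return found

def density(core, w):
    """Mean edge weight over all protein pairs in the core."""
    pairs = list(combinations(sorted(core), 2))
    return sum(w[a].get(b, 0.0) for a, b in pairs) / len(pairs)

def seed_cores(cliques, w, max_overlap=0.5):
    """Rank by density, then drop cores that overlap a kept core too much
    (the overlap criterion is an assumption)."""
    seeds = []
    for c in sorted(cliques, key=lambda c: (-density(c, w), sorted(c))):
        if all(len(c & s) / min(len(c), len(s)) <= max_overlap for s in seeds):
            seeds.append(c)
    return seeds

def attachments(core, w, threshold):
    """Proteins whose assumed correlation score (average weight to the
    core members) exceeds the threshold."""
    out = set()
    for p in w:
        if p in core:
            continue
        score = sum(w[p].get(m, 0.0) for m in core) / len(core)
        if score > threshold:
            out.add(p)
    return out

def predict_complexes(edges, vectors, threshold=0.25):
    w = weighted_adjacency(edges, vectors)
    seeds = seed_cores(maximal_cliques(w), w)
    return [set(core) | attachments(core, w, threshold) for core in seeds]

# Toy data (hypothetical): A, B, C form a triangle; D hangs off A; E off D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E")]
vectors = {"A": (1.0, 0.0), "B": (1.0, 0.1), "C": (0.9, 0.1),
           "D": (1.0, 0.2), "E": (0.0, 1.0)}
complexes = predict_complexes(edges, vectors)
```

On the toy data, the triangle is the only candidate core, D joins it as an attachment because its edge to A is strongly weighted, and E stays out because it has no interaction with the core. In practice the thresholds would need tuning against a reference complex set.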
format Online
Article
Text
id pubmed-6302388
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6302388 2018-12-31 Protein complexes identification based on go attributed network embedding Xu, Bo Li, Kun Zheng, Wei Liu, Xiaoxia Zhang, Yijia Zhao, Zhehuan He, Zengyou BMC Bioinformatics Research Article BioMed Central 2018-12-20 /pmc/articles/PMC6302388/ /pubmed/30572820 http://dx.doi.org/10.1186/s12859-018-2555-x Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Protein complexes identification based on go attributed network embedding
topic Research Article