
Efficient and accurate identification of protein complexes from protein-protein interaction networks based on the clustering coefficient

Bibliographic Details
Main authors: Omranian, Sara; Angeleska, Angela; Nikoloski, Zoran
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479235/
https://www.ncbi.nlm.nih.gov/pubmed/34630943
http://dx.doi.org/10.1016/j.csbj.2021.09.014
_version_ 1784576209203494912
author Omranian, Sara
Angeleska, Angela
Nikoloski, Zoran
collection PubMed
description Identification of protein complexes from protein-protein interaction (PPI) networks is a key problem in PPI mining, solved by parameter-dependent approaches that suffer from small recall rates. Here we introduce GCC-v, a family of efficient, parameter-free algorithms to accurately predict protein complexes using the (weighted) clustering coefficient of proteins in PPI networks. Through comparative analyses with gold standards and PPI networks from Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens, we demonstrate that GCC-v outperforms twelve state-of-the-art approaches for identification of protein complexes with respect to twelve performance measures in at least 85.71% of scenarios. We also show that GCC-v results in the exact recovery of ∼35% of protein complexes in a pan-plant PPI network and discover 144 new protein complexes in Arabidopsis thaliana, with high support from GO semantic similarity. Our results indicate that findings from GCC-v are robust to network perturbations, which has direct implications to assess the impact of the PPI network quality on the predicted protein complexes.
format Online
Article
Text
id pubmed-8479235
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8479235 2021-10-07 Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2021-09-20 /pmc/articles/PMC8479235/ /pubmed/34630943 http://dx.doi.org/10.1016/j.csbj.2021.09.014 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Efficient and accurate identification of protein complexes from protein-protein interaction networks based on the clustering coefficient
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479235/
https://www.ncbi.nlm.nih.gov/pubmed/34630943
http://dx.doi.org/10.1016/j.csbj.2021.09.014