Prediction of problematic complexes from PPI networks: sparse, embedded, and small complexes

BACKGROUND: The prediction of protein complexes from high-throughput protein-protein interaction (PPI) data remains an important challenge in bioinformatics. Three groups of complexes have been identified as problematic to discover. First, many complexes are sparsely connected in the PPI network, an...

Bibliographic Details
Main authors:	Yong, Chern Han, Wong, Limsoon
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4522147/ https://www.ncbi.nlm.nih.gov/pubmed/26231465 http://dx.doi.org/10.1186/s13062-015-0067-4

_version_	1782383924986511360
author	Yong, Chern Han Wong, Limsoon
collection	PubMed
description	BACKGROUND: The prediction of protein complexes from high-throughput protein-protein interaction (PPI) data remains an important challenge in bioinformatics. Three groups of complexes have been identified as problematic to discover. First, many complexes are sparsely connected in the PPI network, and do not form dense clusters that can be derived by clustering algorithms. Second, many complexes are embedded within highly-connected regions of the PPI network, which makes it difficult to accurately delimit their boundaries. Third, many complexes are small (composed of two or three distinct proteins), so that traditional topological markers such as density are ineffective. RESULTS: We have previously proposed three approaches to address these challenges. First, Supervised Weighting of Composite Networks (SWC) integrates diverse data sources with supervised weighting, and successfully fills in missing co-complex edges in sparse complexes to allow them to be predicted. Second, network decomposition (DECOMP) splits the PPI network into spatially- and temporally-coherent subnetworks, allowing complexes embedded within highly-connected regions to be more clearly demarcated. Finally, Size-Specific Supervised Weighting (SSS) integrates diverse data sources with supervised learning to weight edges in a size-specific manner—of being in a small complex versus a large complex—and improves the prediction of small complexes. Here we integrate these three approaches into a single system. We test the integrated approach on the prediction of yeast and human complexes, and show that it outperforms SWC, DECOMP, or SSS when run individually, achieving the highest precision and recall levels. CONCLUSION: Three groups of protein complexes remain challenging to predict from PPI data: sparse complexes, embedded complexes, and small complexes. 
Our previous approaches have addressed each of these challenges individually, through data integration, PPI-network decomposition, and supervised learning. Here we integrate these approaches into a single complex-discovery system, which improves the prediction of all three types of challenging complexes. With our approach, protein complexes can be more accurately and comprehensively predicted, allowing a clearer elucidation of the modular machinery of the cell. REVIEWERS: This article was reviewed by Prof. Masanori Arita and Dr. Yang Liu (nominated by Prof. Charles DeLisi).
format	Online Article Text
id	pubmed-4522147
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4522147 2015-08-02 Prediction of problematic complexes from PPI networks: sparse, embedded, and small complexes Yong, Chern Han Wong, Limsoon Biol Direct Research BioMed Central 2015-08-01 /pmc/articles/PMC4522147/ /pubmed/26231465 http://dx.doi.org/10.1186/s13062-015-0067-4 Text en © Yong and Wong; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Prediction of problematic complexes from PPI networks: sparse, embedded, and small complexes
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4522147/ https://www.ncbi.nlm.nih.gov/pubmed/26231465 http://dx.doi.org/10.1186/s13062-015-0067-4

Similar items