Supervised maximum-likelihood weighting of composite protein networks for complex prediction

BACKGROUND: Protein complexes participate in many important cellular functions, so finding the set of existent complexes is essential for understanding the organization and regulation of processes in the cell. With the availability of large amounts of high-throughput protein-protein interaction (PPI...

Bibliographic Details
Main authors:	Yong, Chern Han; Liu, Guimei; Chua, Hon Nian; Wong, Limsoon
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521185/ https://www.ncbi.nlm.nih.gov/pubmed/23281936 http://dx.doi.org/10.1186/1752-0509-6-S2-S13

author	Yong, Chern Han Liu, Guimei Chua, Hon Nian Wong, Limsoon
author_sort	Yong, Chern Han
collection	PubMed
description	BACKGROUND: Protein complexes participate in many important cellular functions, so finding the set of existent complexes is essential for understanding the organization and regulation of processes in the cell. With the availability of large amounts of high-throughput protein-protein interaction (PPI) data, many algorithms have been proposed to discover protein complexes from PPI networks. However, such approaches are hindered by the high rate of noise in high-throughput PPI data, including spurious and missing interactions. Furthermore, many transient interactions are detected between proteins that are not from the same complex, while not all proteins from the same complex may actually interact. As a result, predicted complexes often do not match true complexes well, and many true complexes go undetected. RESULTS: We address these challenges by integrating PPI data with other heterogeneous data sources to construct a composite protein network, and using a supervised maximum-likelihood approach to weight each edge based on its posterior probability of belonging to a complex. We then use six different clustering algorithms, and an aggregative clustering strategy, to discover complexes in the weighted network. We test our method on Saccharomyces cerevisiae and Homo sapiens, and show that complex discovery is improved: compared to previously proposed supervised and unsupervised weighting approaches, our method recalls more known complexes, achieves higher precision at all recall levels, and generates novel complexes of greater functional similarity. Furthermore, our maximum-likelihood approach allows learned parameters to be used to visualize and evaluate the evidence of novel predictions, aiding human judgment of their credibility. 
CONCLUSIONS: Our approach integrates multiple data sources with supervised learning to create a weighted composite protein network, and uses six clustering algorithms with an aggregative clustering strategy to discover novel complexes. We show improved performance over previous approaches in terms of precision, recall, and number and quality of novel predictions. We present and visualize two novel predicted complexes in yeast and human, and find external evidence supporting these predictions.
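The edge-weighting step described under RESULTS can be illustrated with a small sketch. This is not the authors' implementation: it assumes a naive-Bayes model over discretized edge features, with add-one smoothing applied to the maximum-likelihood counts so that unseen feature values do not produce zero probabilities. The function names, the two binary features (a PPI-evidence bin and a co-expression bin), and the toy training data are all hypothetical.

```python
from collections import Counter
from math import prod


def fit_likelihoods(edges, labels, smoothing=1.0, n_bins=2):
    """Estimate P(feature_i = v | class) from labelled edges.

    edges  : list of tuples of discretized feature values (hypothetical features)
    labels : list of bools, True if the edge joins two proteins of a known complex
    Counts are add-`smoothing` smoothed maximum-likelihood estimates.
    """
    n_features = len(edges[0])
    counts = {c: [Counter() for _ in range(n_features)] for c in (True, False)}
    class_totals = Counter(labels)
    for feats, label in zip(edges, labels):
        for i, v in enumerate(feats):
            counts[label][i][v] += 1

    def likelihood(c, i, v):
        return (counts[c][i][v] + smoothing) / (class_totals[c] + smoothing * n_bins)

    prior = class_totals[True] / len(labels)
    return likelihood, prior


def posterior(feats, likelihood, prior):
    """Posterior probability that an edge with these features is co-complex."""
    p_in = prior * prod(likelihood(True, i, v) for i, v in enumerate(feats))
    p_out = (1 - prior) * prod(likelihood(False, i, v) for i, v in enumerate(feats))
    return p_in / (p_in + p_out)


# Toy training set: (ppi_evidence_bin, coexpression_bin) -> co-complex label.
train_edges = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (1, 0)]
train_labels = [True, True, True, False, False, False, False]

likelihood, prior = fit_likelihoods(train_edges, train_labels)
strong_edge_weight = posterior((1, 1), likelihood, prior)
weak_edge_weight = posterior((0, 0), likelihood, prior)
```

In this sketch each edge's posterior serves as its weight in the composite network, and the per-feature likelihood ratios are the kind of learned parameters that could be inspected to judge the evidence behind a predicted complex.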
format	Online Article Text
id	pubmed-3521185
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3521185 2012-12-14 Supervised maximum-likelihood weighting of composite protein networks for complex prediction Yong, Chern Han Liu, Guimei Chua, Hon Nian Wong, Limsoon BMC Syst Biol Proceedings BioMed Central 2012-12-12 /pmc/articles/PMC3521185/ /pubmed/23281936 http://dx.doi.org/10.1186/1752-0509-6-S2-S13 Text en Copyright ©2012 Yong et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Supervised maximum-likelihood weighting of composite protein networks for complex prediction
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521185/ https://www.ncbi.nlm.nih.gov/pubmed/23281936 http://dx.doi.org/10.1186/1752-0509-6-S2-S13
