
Predicting Co-Complexed Protein Pairs from Heterogeneous Data

Proteins do not carry out their functions alone. Instead, they often act by participating in macromolecular complexes and play different functional roles depending on the other members of the complex. It is therefore interesting to identify co-complex relationships. Although protein complexes can be...


Bibliographic Details
Main Authors:	Qiu, Jian; Noble, William Stafford
Format:	Text
Language:	English
Published:	Public Library of Science 2008
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2275314/ https://www.ncbi.nlm.nih.gov/pubmed/18421371 http://dx.doi.org/10.1371/journal.pcbi.1000054

author	Qiu, Jian; Noble, William Stafford
collection	PubMed
description	Proteins do not carry out their functions alone. Instead, they often act by participating in macromolecular complexes and play different functional roles depending on the other members of the complex. It is therefore interesting to identify co-complex relationships. Although protein complexes can be identified in a high-throughput manner by experimental technologies such as affinity purification coupled with mass spectrometry (APMS), these large-scale datasets often suffer from high false positive and false negative rates. Here, we present a computational method that predicts co-complexed protein pair (CCPP) relationships using kernel methods from heterogeneous data sources. We show that a diffusion kernel based on random walks on the full network topology yields good performance in predicting CCPPs from protein interaction networks. In the setting of direct ranking, a diffusion kernel performs much better than the mutual clustering coefficient. In the setting of SVM classifiers, a diffusion kernel performs much better than a linear kernel. We also show that combination of complementary information improves the performance of our CCPP recognizer. A summation of three diffusion kernels based on two-hybrid, APMS, and genetic interaction networks and three sequence kernels achieves better performance than the sequence kernels or diffusion kernels alone. Inclusion of additional features achieves a still better ROC(50) of 0.937. Assuming a negative-to-positive ratio of 600∶1, the final classifier achieves 89.3% coverage at an estimated false discovery rate of 10%. Finally, we applied our prediction method to two recently described APMS datasets. We find that our predicted positives are highly enriched with CCPPs that are identified by both datasets, suggesting that our method successfully identifies true CCPPs. An SVM classifier trained from heterogeneous data sources provides accurate predictions of CCPPs in yeast. This computational method thereby provides an inexpensive method for identifying protein complexes that extends and complements high-throughput experimental data.
format	Text
id	pubmed-2275314
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-2275314 2008-04-18 Predicting Co-Complexed Protein Pairs from Heterogeneous Data Qiu, Jian; Noble, William Stafford PLoS Comput Biol Research Article Public Library of Science 2008-04-18 /pmc/articles/PMC2275314/ /pubmed/18421371 http://dx.doi.org/10.1371/journal.pcbi.1000054 Text en Qiu, Noble. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Predicting Co-Complexed Protein Pairs from Heterogeneous Data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2275314/ https://www.ncbi.nlm.nih.gov/pubmed/18421371 http://dx.doi.org/10.1371/journal.pcbi.1000054
