Probabilistic prediction and ranking of human protein-protein interactions

BACKGROUND: Although the prediction of protein-protein interactions has been extensively investigated for yeast, few such datasets exist for the far larger proteome in human. Furthermore, it has recently been estimated that the overall average false positive rate of available computational and high-...

Bibliographic Details
Main Authors: Scott, Michelle S, Barton, Geoffrey J
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1939716/
https://www.ncbi.nlm.nih.gov/pubmed/17615067
http://dx.doi.org/10.1186/1471-2105-8-239
author Scott, Michelle S
Barton, Geoffrey J
collection PubMed
description BACKGROUND: Although the prediction of protein-protein interactions has been extensively investigated for yeast, few such datasets exist for the far larger proteome in human. Furthermore, it has recently been estimated that the overall average false positive rate of available computational and high-throughput experimental interaction datasets is as high as 90%. RESULTS: The prediction of human protein-protein interactions was investigated by combining orthogonal protein features within a probabilistic framework. The features include co-expression, orthology to known interacting proteins and the full-Bayesian combination of subcellular localization, co-occurrence of domains and post-translational modifications. A novel scoring function for local network topology was also investigated. This topology feature greatly enhanced the predictions and together with the full-Bayes combined features, made the largest contribution to the predictions. Using a conservative threshold, our most accurate predictor identifies 37606 human interactions, 32892 (80%) of which are not present in other publicly available large human interaction datasets, thus substantially increasing the coverage of the human interaction map. A subset of the 32892 novel predicted interactions have been independently validated. Comparison of the prediction dataset to other available human interaction datasets estimates the false positive rate of the new method to be below 80% which is competitive with other methods. Since the new method scores and ranks all human protein pairs, smaller subsets of higher quality can be generated thus leading to even lower false positive prediction rates. CONCLUSION: The set of interactions predicted in this work increases the coverage of the human interaction map and will help determine the highest confidence human interactions.
format Text
id pubmed-1939716
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1939716 2007-08-03 Probabilistic prediction and ranking of human protein-protein interactions Scott, Michelle S Barton, Geoffrey J BMC Bioinformatics Research Article BioMed Central 2007-07-05 /pmc/articles/PMC1939716/ /pubmed/17615067 http://dx.doi.org/10.1186/1471-2105-8-239 Text en Copyright © 2007 Scott and Barton; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Probabilistic prediction and ranking of human protein-protein interactions
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1939716/
https://www.ncbi.nlm.nih.gov/pubmed/17615067
http://dx.doi.org/10.1186/1471-2105-8-239