Filtering high-throughput protein-protein interaction data using a combination of genomic features
BACKGROUND: Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious interactions. Hence, there is a need to validate the inter...
Main authors: | Patil, Ashwini; Nakamura, Haruki |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2005 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1127019/ https://www.ncbi.nlm.nih.gov/pubmed/15833142 http://dx.doi.org/10.1186/1471-2105-6-100 |
_version_ | 1782123946341040128 |
---|---|
author | Patil, Ashwini Nakamura, Haruki |
author_facet | Patil, Ashwini Nakamura, Haruki |
author_sort | Patil, Ashwini |
collection | PubMed |
description | BACKGROUND: Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious interactions. Hence, there is a need to validate the interactions and filter out the incorrect data before using them in prediction studies. RESULTS: In this study, we use a combination of 3 genomic features – structurally known interacting Pfam domains, Gene Ontology annotations and sequence homology – as a means to assign reliability to the protein-protein interactions in Saccharomyces cerevisiae determined by high-throughput experiments. Using Bayesian network approaches, we show that protein-protein interactions from high-throughput data supported by one or more genomic features have a higher likelihood ratio and hence are more likely to be real interactions. Our method has a high sensitivity (90%) and good specificity (63%). We show that 56% of the interactions from high-throughput experiments in Saccharomyces cerevisiae have high reliability. We use the method to estimate the number of true interactions in the high-throughput protein-protein interaction data sets in Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens to be 27%, 18% and 68% respectively. Our results are available for searching and downloading at . CONCLUSION: A combination of genomic features that include sequence, structure and annotation information is a good predictor of true interactions in large and noisy high-throughput data sets. The method has a very high sensitivity and good specificity and can be used to assign a likelihood ratio, corresponding to the reliability, to each interaction. |
format | Text |
id | pubmed-1127019 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-11270192005-05-17 Filtering high-throughput protein-protein interaction data using a combination of genomic features Patil, Ashwini Nakamura, Haruki BMC Bioinformatics Research Article BACKGROUND: Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious interactions. Hence, there is a need to validate the interactions and filter out the incorrect data before using them in prediction studies. RESULTS: In this study, we use a combination of 3 genomic features – structurally known interacting Pfam domains, Gene Ontology annotations and sequence homology – as a means to assign reliability to the protein-protein interactions in Saccharomyces cerevisiae determined by high-throughput experiments. Using Bayesian network approaches, we show that protein-protein interactions from high-throughput data supported by one or more genomic features have a higher likelihood ratio and hence are more likely to be real interactions. Our method has a high sensitivity (90%) and good specificity (63%). We show that 56% of the interactions from high-throughput experiments in Saccharomyces cerevisiae have high reliability. We use the method to estimate the number of true interactions in the high-throughput protein-protein interaction data sets in Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens to be 27%, 18% and 68% respectively. Our results are available for searching and downloading at . CONCLUSION: A combination of genomic features that include sequence, structure and annotation information is a good predictor of true interactions in large and noisy high-throughput data sets. The method has a very high sensitivity and good specificity and can be used to assign a likelihood ratio, corresponding to the reliability, to each interaction. 
BioMed Central 2005-04-18 /pmc/articles/PMC1127019/ /pubmed/15833142 http://dx.doi.org/10.1186/1471-2105-6-100 Text en Copyright © 2005 Patil and Nakamura; licensee BioMed Central Ltd. |
spellingShingle | Research Article Patil, Ashwini Nakamura, Haruki Filtering high-throughput protein-protein interaction data using a combination of genomic features |
title | Filtering high-throughput protein-protein interaction data using a combination of genomic features |
title_full | Filtering high-throughput protein-protein interaction data using a combination of genomic features |
title_fullStr | Filtering high-throughput protein-protein interaction data using a combination of genomic features |
title_full_unstemmed | Filtering high-throughput protein-protein interaction data using a combination of genomic features |
title_short | Filtering high-throughput protein-protein interaction data using a combination of genomic features |
title_sort | filtering high-throughput protein-protein interaction data using a combination of genomic features |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1127019/ https://www.ncbi.nlm.nih.gov/pubmed/15833142 http://dx.doi.org/10.1186/1471-2105-6-100 |
work_keys_str_mv | AT patilashwini filteringhighthroughputproteinproteininteractiondatausingacombinationofgenomicfeatures AT nakamuraharuki filteringhighthroughputproteinproteininteractiondatausingacombinationofgenomicfeatures |
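
The description field above says each interaction receives a likelihood ratio based on which of three genomic features support it, and that the method is assessed by sensitivity and specificity. The sketch below shows one common way such a score can be computed. It assumes the features are conditionally independent (a naive Bayes simplification that the record does not confirm), and every probability, feature key, and function name in it is hypothetical rather than taken from the paper.

```python
from math import prod

# Hypothetical values, NOT from the paper: for each feature,
# (P(feature present | true interaction), P(feature present | false interaction)).
FEATURES = {
    "pfam_domains": (0.30, 0.05),
    "go_annotation": (0.60, 0.20),
    "homology": (0.40, 0.10),
}


def feature_lr(name, present):
    """Likelihood ratio contributed by one feature being present or absent."""
    p_true, p_false = FEATURES[name]
    if present:
        return p_true / p_false
    return (1 - p_true) / (1 - p_false)


def combined_lr(evidence):
    """Multiply per-feature ratios; valid only if features are independent."""
    return prod(feature_lr(name, present) for name, present in evidence.items())


def sensitivity_specificity(scored, cutoff):
    """scored: list of (likelihood_ratio, is_true_interaction) pairs.

    An interaction is called reliable when its ratio is >= cutoff.
    """
    tp = sum(1 for lr, is_true in scored if is_true and lr >= cutoff)
    fn = sum(1 for lr, is_true in scored if is_true and lr < cutoff)
    tn = sum(1 for lr, is_true in scored if not is_true and lr < cutoff)
    fp = sum(1 for lr, is_true in scored if not is_true and lr >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    supported = {"pfam_domains": True, "go_annotation": True, "homology": True}
    unsupported = {"pfam_domains": False, "go_annotation": False, "homology": False}
    print(combined_lr(supported), combined_lr(unsupported))
```

With these made-up inputs, an interaction supported by all three features scores well above 1 and an unsupported one scores below 1, which matches the qualitative claim in the description that feature-supported interactions have a higher likelihood ratio.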