Information assessment on predicting protein-protein interactions

BACKGROUND: Identifying protein-protein interactions is fundamental for understanding the molecular machinery of the cell. Proteome-wide studies of protein-protein interactions are of significant value, but the high-throughput experimental technologies suffer from high rates of both false positive a...

Bibliographic Details
Main authors: Lin, Nan; Wu, Baolin; Jansen, Ronald; Gerstein, Mark; Zhao, Hongyu
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC529436/
https://www.ncbi.nlm.nih.gov/pubmed/15491499
http://dx.doi.org/10.1186/1471-2105-5-154
author Lin, Nan
Wu, Baolin
Jansen, Ronald
Gerstein, Mark
Zhao, Hongyu
collection PubMed
description BACKGROUND: Identifying protein-protein interactions is fundamental for understanding the molecular machinery of the cell. Proteome-wide studies of protein-protein interactions are of significant value, but high-throughput experimental technologies suffer from high rates of both false positive and false negative predictions. In addition to high-throughput experimental data, many diverse types of genomic data can help predict protein-protein interactions, such as mRNA expression, localization, essentiality, and functional annotation. Evaluating the information contributed by different sources of evidence helps to establish more parsimonious models with comparable or better prediction accuracy, and to obtain biological insights into the relationships between protein-protein interactions and other genomic information. RESULTS: Our assessment is based on the genomic features used in a Bayesian network approach to predict protein-protein interactions genome-wide in yeast. In the special case where no feature has missing information, our analysis shows that functional classification contributes more information than expression correlations or essentiality. We also show that in this case alternative models, such as logistic regression and random forests, may be more effective than Bayesian networks for predicting interactions. CONCLUSIONS: In the restricted problem posed by the complete-information subset, we identified the MIPS and Gene Ontology (GO) functional similarity datasets as the dominant information contributors for predicting protein-protein interactions under the framework proposed by Jansen et al. Random forests based on the MIPS and GO information alone can give highly accurate classifications. In this particular subset of complete information, adding other genomic data does little to improve predictions. We also found that the data discretizations used in the Bayesian methods decreased classification performance.
format Text
id pubmed-529436
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-529436 2004-11-21 BMC Bioinformatics Research Article BioMed Central 2004-10-18 Text en Copyright © 2004 Lin et al; licensee BioMed Central Ltd.
title Information assessment on predicting protein-protein interactions
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC529436/
https://www.ncbi.nlm.nih.gov/pubmed/15491499
http://dx.doi.org/10.1186/1471-2105-5-154