Probabilistic Protein Function Prediction from Heterogeneous Genome-Wide Data

Dramatic improvements in high throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional scr...

Bibliographic Details
Main authors: Nariai, Naoki; Kolaczyk, Eric D.; Kasif, Simon
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1828618/
https://www.ncbi.nlm.nih.gov/pubmed/17396164
http://dx.doi.org/10.1371/journal.pone.0000337
collection PubMed
description Dramatic improvements in high-throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function than proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype, and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross-validation study, we show that the integrated model increases recall by 18% at 50% precision, compared with using PPI data alone. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement over PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.
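The abstract reports its headline result as recall at a fixed precision level (50%). As a rough illustration of how such a figure can be computed from scored predictions, here is a minimal sketch. It is not the authors' code: the function name `recall_at_precision` and the toy scores and labels are assumptions made up for this example.

```python
def recall_at_precision(scores, labels, min_precision=0.5):
    """Highest recall over all score thresholds whose precision is >= min_precision.

    scores: predicted confidence per (protein, GO term) pair (hypothetical input).
    labels: 1 if the annotation is known to be correct, else 0.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best_recall = 0.0
    true_pos = 0
    for i, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # Tied scores share one threshold, so evaluate only after the last tie.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        precision = true_pos / i
        recall = true_pos / total_pos
        if precision >= min_precision:
            best_recall = max(best_recall, recall)
    return best_recall


# Toy data (invented for illustration): 7 predictions, 3 of them correct.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
toy_labels = [1, 0, 1, 0, 0, 0, 1]
toy_recall = recall_at_precision(toy_scores, toy_labels, min_precision=0.5)
```

Comparing two predictors under this metric means computing the value separately for each on the same held-out annotations and comparing the results.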
format Text
id pubmed-1828618
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1828618 2007-03-29 Probabilistic Protein Function Prediction from Heterogeneous Genome-Wide Data Nariai, Naoki Kolaczyk, Eric D. Kasif, Simon PLoS One Research Article Public Library of Science 2007-03-28 /pmc/articles/PMC1828618/ /pubmed/17396164 http://dx.doi.org/10.1371/journal.pone.0000337 Text en Nariai et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article