The Impact of Multifunctional Genes on "Guilt by Association" Analysis

Bibliographic Details
Main Authors: Gillis, Jesse, Pavlidis, Paul
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041792/
https://www.ncbi.nlm.nih.gov/pubmed/21364756
http://dx.doi.org/10.1371/journal.pone.0017258
_version_ 1782198482313936896
author Gillis, Jesse
Pavlidis, Paul
author_facet Gillis, Jesse
Pavlidis, Paul
author_sort Gillis, Jesse
collection PubMed
description Many previous studies have shown that by using variants of “guilt-by-association”, gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the “associations” in the data (e.g., protein interaction partners) of a gene are necessary in establishing “guilt”. In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
format Text
id pubmed-3041792
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3041792 2011-03-01 The Impact of Multifunctional Genes on "Guilt by Association" Analysis Gillis, Jesse Pavlidis, Paul PLoS One Research Article Public Library of Science 2011-02-18 /pmc/articles/PMC3041792/ /pubmed/21364756 http://dx.doi.org/10.1371/journal.pone.0017258 Text en Gillis, Pavlidis.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Gillis, Jesse
Pavlidis, Paul
The Impact of Multifunctional Genes on "Guilt by Association" Analysis
title The Impact of Multifunctional Genes on "Guilt by Association" Analysis
title_sort impact of multifunctional genes on "guilt by association" analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041792/
https://www.ncbi.nlm.nih.gov/pubmed/21364756
http://dx.doi.org/10.1371/journal.pone.0017258
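
Illustrative note on the abstract: the description field states that knowledge of the degree of multifunctionality alone can act as a strong predictor of gene function. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a simplified multifunctionality score (a plain count of the terms each gene is annotated to), a rank-based AUROC as the evaluation metric, and entirely made-up toy data; the names GENES, ANNOTATIONS, multifunctionality, auroc and mean_auroc are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy data: gene identifiers and term-to-gene annotations.
GENES: List[str] = ["g1", "g2", "g3", "g4", "g5", "g6"]
ANNOTATIONS: Dict[str, Set[str]] = {
    "termA": {"g1", "g2"},
    "termB": {"g1", "g2", "g3"},
    "termC": {"g1", "g4"},
}


def multifunctionality(annotations: Dict[str, Set[str]], genes: List[str]) -> Dict[str, float]:
    """Score each gene by the number of terms it is annotated to.

    Assumption: a plain count stands in for the degree of multifunctionality.
    """
    counts = {gene: 0.0 for gene in genes}
    for members in annotations.values():
        for gene in members:
            if gene in counts:
                counts[gene] += 1.0
    return counts


def auroc(scores: Dict[str, float], positives: Set[str]) -> float:
    """Chance that a random positive gene outscores a random negative gene.

    Ties contribute 0.5, which matches the usual rank-based AUROC.
    """
    pos = [s for gene, s in scores.items() if gene in positives]
    neg = [s for gene, s in scores.items() if gene not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auroc(annotations: Dict[str, Set[str]], genes: List[str]) -> float:
    """Use one fixed gene ranking for every term and average the AUROCs."""
    scores = multifunctionality(annotations, genes)
    values = [auroc(scores, members) for members in annotations.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    # The ranking uses no information about which gene interacts with which.
    print(mean_auroc(ANNOTATIONS, GENES))
```

In this toy setup the same ranking is reused for every term, so any score above 0.5 comes from annotation counts rather than from associations. The sketch also counts each gene's own annotation to the term being evaluated; a real evaluation would hold those annotations out.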