
GOing Bayesian: model-based gene set analysis of genome-scale data

The interpretation of data-driven experiments in genomics often involves a search for biological categories that are enriched for the responder genes identified by the experiments. However, knowledge bases such as the Gene Ontology (GO) contain hundreds or thousands of categories with very high over...


Bibliographic Details
Main authors: Bauer, Sebastian; Gagneur, Julien; Robinson, Peter N.
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887944/
https://www.ncbi.nlm.nih.gov/pubmed/20172960
http://dx.doi.org/10.1093/nar/gkq045
author Bauer, Sebastian
Gagneur, Julien
Robinson, Peter N.
collection PubMed
description The interpretation of data-driven experiments in genomics often involves a search for biological categories that are enriched for the responder genes identified by the experiments. However, knowledge bases such as the Gene Ontology (GO) contain hundreds or thousands of categories with very high overlap between categories. Thus, enrichment analysis performed on one category at a time frequently returns large numbers of correlated categories, leaving the choice of the most relevant ones to the user's interpretation. Here we present model-based gene set analysis (MGSA) that analyzes all categories at once by embedding them in a Bayesian network, in which gene response is modeled as a function of the activation of biological categories. Probabilistic inference is used to identify the active categories. The Bayesian modeling approach naturally takes category overlap into account and avoids the need for multiple testing corrections met in single-category enrichment analysis. On simulated data, MGSA identifies active categories with up to 95% precision at a recall of 20% for moderate settings of noise, leading to a 10-fold precision improvement over single-category statistical enrichment analysis. Application to a gene expression data set in yeast demonstrates that the method provides high-level, summarized views of core biological processes and correctly eliminates confounding associations.
format Text
id pubmed-2887944
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-28879442010-06-22 GOing Bayesian: model-based gene set analysis of genome-scale data Bauer, Sebastian Gagneur, Julien Robinson, Peter N. Nucleic Acids Res Computational Biology The interpretation of data-driven experiments in genomics often involves a search for biological categories that are enriched for the responder genes identified by the experiments. However, knowledge bases such as the Gene Ontology (GO) contain hundreds or thousands of categories with very high overlap between categories. Thus, enrichment analysis performed on one category at a time frequently returns large numbers of correlated categories, leaving the choice of the most relevant ones to the user's interpretation. Here we present model-based gene set analysis (MGSA) that analyzes all categories at once by embedding them in a Bayesian network, in which gene response is modeled as a function of the activation of biological categories. Probabilistic inference is used to identify the active categories. The Bayesian modeling approach naturally takes category overlap into account and avoids the need for multiple testing corrections met in single-category enrichment analysis. On simulated data, MGSA identifies active categories with up to 95% precision at a recall of 20% for moderate settings of noise, leading to a 10-fold precision improvement over single-category statistical enrichment analysis. Application to a gene expression data set in yeast demonstrates that the method provides high-level, summarized views of core biological processes and correctly eliminates confounding associations. Oxford University Press 2010-06 2010-02-19 /pmc/articles/PMC2887944/ /pubmed/20172960 http://dx.doi.org/10.1093/nar/gkq045 Text en © The Author(s) 2010. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title GOing Bayesian: model-based gene set analysis of genome-scale data
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887944/
https://www.ncbi.nlm.nih.gov/pubmed/20172960
http://dx.doi.org/10.1093/nar/gkq045
work_keys_str_mv AT bauersebastian goingbayesianmodelbasedgenesetanalysisofgenomescaledata
AT gagneurjulien goingbayesianmodelbasedgenesetanalysisofgenomescaledata
AT robinsonpetern goingbayesianmodelbasedgenesetanalysisofgenomescaledata
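
Illustrative note on the description field: the abstract says that gene response is modeled as a function of category activation and that probabilistic inference identifies the active categories. The Python sketch below is one possible toy reading of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the prior activation probability p, the false-positive rate alpha, the false-negative rate beta, the gene and category names, and the use of brute-force enumeration over all category states (feasible only for a handful of categories) in place of whatever inference procedure the paper uses.

```python
from itertools import product
from math import exp, log


def log_likelihood(active, categories, genes, observed, alpha, beta):
    """Log-probability of the observed responder set given the active categories.

    Assumed toy model: a gene is "truly on" if any active category contains it;
    a truly-on gene is missed with probability beta, and a truly-off gene is
    reported as a responder with probability alpha.
    """
    truly_on = set()
    for name in active:
        truly_on |= categories[name]
    total = 0.0
    for gene in genes:
        if gene in truly_on:
            total += log(1 - beta) if gene in observed else log(beta)
        else:
            total += log(alpha) if gene in observed else log(1 - alpha)
    return total


def category_marginals(categories, genes, observed, p=0.1, alpha=0.05, beta=0.2):
    """Posterior probability that each category is active, by exact enumeration."""
    names = sorted(categories)
    weights = {}
    for states in product([False, True], repeat=len(names)):
        active = [n for n, on in zip(names, states) if on]
        # Independent prior: each category is active with probability p.
        log_prior = len(active) * log(p) + (len(names) - len(active)) * log(1 - p)
        weights[states] = exp(
            log_prior + log_likelihood(active, categories, genes, observed, alpha, beta)
        )
    norm = sum(weights.values())
    return {
        name: sum(w for s, w in weights.items() if s[i]) / norm
        for i, name in enumerate(names)
    }


# Hypothetical data: "narrow" is a subset of "broad", mimicking overlapping GO terms.
genes = [f"g{i}" for i in range(1, 11)]
categories = {
    "broad": {"g1", "g2", "g3", "g4", "g5", "g6"},
    "narrow": {"g1", "g2", "g3"},
    "unrelated": {"g7", "g8"},
}
observed = {"g1", "g2", "g3"}  # responder genes reported by the experiment

marginals = category_marginals(categories, genes, observed)
for name, prob in sorted(marginals.items()):
    print(f"{name}: {prob:.3f}")
```

In this toy setting all categories are scored jointly, so the overlapping "broad" category receives a much lower posterior than "narrow", which already accounts for the same responder genes. That is the behavior the abstract attributes to modeling all categories at once instead of testing them one at a time.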