
Methods for Determining the Statistical Significance of Enrichment or Depletion of Gene Ontology Classifications under Weighted Membership

High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an “interesting” set of genes – say, genes that are co-expressed or bound by t...


Bibliographic Details
Main Authors: Iacucci, Ernesto; Zingg, Hans H.; Perkins, Theodore J.
Format: Online Article Text
Language: English
Published: Frontiers Research Foundation 2012
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3284693/
https://www.ncbi.nlm.nih.gov/pubmed/22375144
http://dx.doi.org/10.3389/fgene.2012.00024
author Iacucci, Ernesto
Zingg, Hans H.
Perkins, Theodore J.
collection PubMed
description High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an “interesting” set of genes – say, genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speed up in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover “gold standard” annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
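The abstract's idea of exact p-values for an integer additive statistic can be illustrated with a small dynamic program. The sketch below is not the authors' algorithm or code. It assumes a null model in which the "interesting" set is a uniformly random subset of fixed size drawn from all genes, and it scores a set by the sum of integer weights of its members. The names `exact_upper_p_value`, `weights`, `n_selected`, and `observed` are hypothetical.

```python
from fractions import Fraction
from math import comb


def exact_upper_p_value(weights, n_selected, observed):
    """Exact P(score >= observed) under an assumed null model.

    Assumption (not taken from the record): the gene set is a uniformly
    random subset of size n_selected, and its score is the sum of the
    integer weights of the selected genes.
    """
    # counts[k][s] = number of size-k subsets whose weights sum to s
    counts = [dict() for _ in range(n_selected + 1)]
    counts[0][0] = 1
    for w in weights:
        # Walk k downward so that each gene is used at most once per subset.
        for k in range(n_selected - 1, -1, -1):
            for s, c in counts[k].items():
                counts[k + 1][s + w] = counts[k + 1].get(s + w, 0) + c
    total = comb(len(weights), n_selected)
    tail = sum(c for s, c in counts[n_selected].items() if s >= observed)
    return Fraction(tail, total)


# With 0/1 weights the score is a plain membership count, so the result
# matches the hypergeometric upper tail.
p_binary = exact_upper_p_value([1, 1, 0, 0], n_selected=2, observed=1)

# With graded weights, one gene can count more than another.
p_weighted = exact_upper_p_value([2, 1, 0, 0, 0], n_selected=2, observed=3)
```

Using `Fraction` keeps the result exact rather than floating point. A depletion p-value under the same assumptions would sum the lower tail (`s <= observed`) instead.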
format Online
Article
Text
id pubmed-3284693
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Frontiers Research Foundation
record_format MEDLINE/PubMed
spelling pubmed-3284693 2012-02-28
Iacucci, Ernesto; Zingg, Hans H.; Perkins, Theodore J.
Front Genet
Genetics
Frontiers Research Foundation 2012-02-23
/pmc/articles/PMC3284693/ /pubmed/22375144 http://dx.doi.org/10.3389/fgene.2012.00024
Text en
Copyright © 2012 Iacucci, Zingg and Perkins. http://www.frontiersin.org/licenseagreement This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.
title Methods for Determining the Statistical Significance of Enrichment or Depletion of Gene Ontology Classifications under Weighted Membership
topic Genetics