Optimization of gene set annotations via entropy minimization over variable clusters (EMVC)
Motivation: Gene set enrichment has become a critical tool for interpreting the results of high-throughput genomic experiments. Inconsistent annotation quality and lack of annotation specificity, however, limit the statistical power of enrichment methods and make it difficult to replicate enrichment...
Main Authors: | Frost, H. Robert; Moore, Jason H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2014 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058919/ https://www.ncbi.nlm.nih.gov/pubmed/24574114 http://dx.doi.org/10.1093/bioinformatics/btu110 |
_version_ | 1782321186401681408 |
---|---|
author | Frost, H. Robert Moore, Jason H. |
author_facet | Frost, H. Robert Moore, Jason H. |
author_sort | Frost, H. Robert |
collection | PubMed |
description | Motivation: Gene set enrichment has become a critical tool for interpreting the results of high-throughput genomic experiments. Inconsistent annotation quality and lack of annotation specificity, however, limit the statistical power of enrichment methods and make it difficult to replicate enrichment results across biologically similar datasets. Results: We propose a novel algorithm for optimizing gene set annotations to best match the structure of specific empirical data sources. Our proposed method, entropy minimization over variable clusters (EMVC), filters the annotations for each gene set to minimize a measure of entropy across disjoint gene clusters computed for a range of cluster sizes over multiple bootstrap resampled datasets. As shown using simulated gene sets with simulated data and Molecular Signatures Database collections with microarray gene expression data, the EMVC algorithm accurately filters annotations unrelated to the experimental outcome resulting in increased gene set enrichment power and better replication of enrichment results. Availability and implementation: http://cran.r-project.org/web/packages/EMVC/index.html. Contact: jason.h.moore@dartmouth.edu Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-4058919 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-40589192014-06-18 Optimization of gene set annotations via entropy minimization over variable clusters (EMVC) Frost, H. Robert Moore, Jason H. Bioinformatics Original Papers Motivation: Gene set enrichment has become a critical tool for interpreting the results of high-throughput genomic experiments. Inconsistent annotation quality and lack of annotation specificity, however, limit the statistical power of enrichment methods and make it difficult to replicate enrichment results across biologically similar datasets. Results: We propose a novel algorithm for optimizing gene set annotations to best match the structure of specific empirical data sources. Our proposed method, entropy minimization over variable clusters (EMVC), filters the annotations for each gene set to minimize a measure of entropy across disjoint gene clusters computed for a range of cluster sizes over multiple bootstrap resampled datasets. As shown using simulated gene sets with simulated data and Molecular Signatures Database collections with microarray gene expression data, the EMVC algorithm accurately filters annotations unrelated to the experimental outcome resulting in increased gene set enrichment power and better replication of enrichment results. Availability and implementation: http://cran.r-project.org/web/packages/EMVC/index.html. Contact: jason.h.moore@dartmouth.edu Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2014-06-15 2014-02-25 /pmc/articles/PMC4058919/ /pubmed/24574114 http://dx.doi.org/10.1093/bioinformatics/btu110 Text en © The Author 2014. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Original Papers Frost, H. Robert Moore, Jason H. Optimization of gene set annotations via entropy minimization over variable clusters (EMVC) |
title | Optimization of gene set annotations via entropy minimization over variable clusters (EMVC) |
title_full | Optimization of gene set annotations via entropy minimization over variable clusters (EMVC) |
title_fullStr | Optimization of gene set annotations via entropy minimization over variable clusters (EMVC) |
title_full_unstemmed | Optimization of gene set annotations via entropy minimization over variable clusters (EMVC) |
title_short | Optimization of gene set annotations via entropy minimization over variable clusters (EMVC) |
title_sort | optimization of gene set annotations via entropy minimization over variable clusters (emvc) |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058919/ https://www.ncbi.nlm.nih.gov/pubmed/24574114 http://dx.doi.org/10.1093/bioinformatics/btu110 |
work_keys_str_mv | AT frosthrobert optimizationofgenesetannotationsviaentropyminimizationovervariableclustersemvc AT moorejasonh optimizationofgenesetannotationsviaentropyminimizationovervariableclustersemvc |
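
The description field above summarizes EMVC as filtering each gene set's annotations to minimize an entropy measure across disjoint gene clusters, repeated over a range of cluster sizes and bootstrap resamples. The Python sketch below illustrates only that general idea and is not the authors' implementation or the API of the EMVC R package. Everything in it is an assumption made for illustration: the function names, the use of Shannon entropy over cluster membership, the "keep the modal cluster" rule, the vote threshold, and the toy cluster assignments (which stand in for clusterings that would in practice be computed from resampled data).

```python
import math
from collections import Counter


def membership_entropy(gene_set, cluster_of):
    """Shannon entropy (bits) of how a gene set's members spread over clusters."""
    counts = Counter(cluster_of[g] for g in gene_set)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def keep_votes(gene_set, clusterings):
    """Fraction of clusterings in which each gene lies in the set's modal cluster.

    Keeping only the modal cluster drives the entropy for that clustering
    to zero; this rule is an illustrative simplification, not the published one.
    """
    votes = Counter()
    for cluster_of in clusterings:
        counts = Counter(cluster_of[g] for g in gene_set)
        modal_cluster, _ = counts.most_common(1)[0]
        for g in gene_set:
            if cluster_of[g] == modal_cluster:
                votes[g] += 1
    return {g: votes[g] / len(clusterings) for g in gene_set}


def filter_gene_set(gene_set, clusterings, threshold=0.5):
    """Keep annotations retained in at least `threshold` of the clusterings."""
    votes = keep_votes(gene_set, clusterings)
    return [g for g in gene_set if votes[g] >= threshold]


# Hypothetical toy input: two clusterings of five genes (k = 2 and k = 3).
clustering_k2 = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1}
clustering_k3 = {"g1": 0, "g2": 0, "g3": 1, "g4": 2, "g5": 2}
clusterings = [clustering_k2, clustering_k3]

gene_set = ["g1", "g2", "g3", "g4"]
filtered = filter_gene_set(gene_set, clusterings)

print("entropy before:", membership_entropy(gene_set, clustering_k2))
print("filtered set:  ", filtered)
print("entropy after: ", membership_entropy(filtered, clustering_k2))
```

In this toy input, gene `g4` never falls in the gene set's dominant cluster and is filtered out, which lowers the set's entropy under the k = 2 clustering. For the actual procedure and its parameters, consult the article and the package linked in the record.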