
Optimization of gene set annotations via entropy minimization over variable clusters (EMVC)

Motivation: Gene set enrichment has become a critical tool for interpreting the results of high-throughput genomic experiments. Inconsistent annotation quality and lack of annotation specificity, however, limit the statistical power of enrichment methods and make it difficult to replicate enrichment...


Bibliographic Details
Main authors: Frost, H. Robert, Moore, Jason H.
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058919/
https://www.ncbi.nlm.nih.gov/pubmed/24574114
http://dx.doi.org/10.1093/bioinformatics/btu110
author Frost, H. Robert
Moore, Jason H.
collection PubMed
description Motivation: Gene set enrichment has become a critical tool for interpreting the results of high-throughput genomic experiments. Inconsistent annotation quality and lack of annotation specificity, however, limit the statistical power of enrichment methods and make it difficult to replicate enrichment results across biologically similar datasets. Results: We propose a novel algorithm for optimizing gene set annotations to best match the structure of specific empirical data sources. Our proposed method, entropy minimization over variable clusters (EMVC), filters the annotations for each gene set to minimize a measure of entropy across disjoint gene clusters computed for a range of cluster sizes over multiple bootstrap-resampled datasets. As shown using simulated gene sets with simulated data and Molecular Signatures Database collections with microarray gene expression data, the EMVC algorithm accurately filters annotations unrelated to the experimental outcome, resulting in increased gene set enrichment power and better replication of enrichment results. Availability and implementation: http://cran.r-project.org/web/packages/EMVC/index.html. Contact: jason.h.moore@dartmouth.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4058919
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4058919 2014-06-18 Optimization of gene set annotations via entropy minimization over variable clusters (EMVC) Frost, H. Robert Moore, Jason H. Bioinformatics Original Papers Oxford University Press 2014-06-15 2014-02-25 /pmc/articles/PMC4058919/ /pubmed/24574114 http://dx.doi.org/10.1093/bioinformatics/btu110 Text en © The Author 2014. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Optimization of gene set annotations via entropy minimization over variable clusters (EMVC)
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058919/
https://www.ncbi.nlm.nih.gov/pubmed/24574114
http://dx.doi.org/10.1093/bioinformatics/btu110
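The abstract in this record describes EMVC as filtering each gene set's annotations to minimize an entropy measure across disjoint gene clusters, computed for several cluster sizes and bootstrap resamples. The record does not give the algorithm's details, so the following is only a toy sketch of that general idea, not the implementation in the EMVC R package. The function names (`cluster_entropy`, `mean_entropy`, `filter_annotations`), the majority-cluster voting rule, the `min_support` threshold, and the hard-coded cluster labels are all assumptions made for illustration. In practice the labelings would come from clustering resampled data at different cluster counts.

```python
import math
from collections import Counter


def cluster_entropy(gene_set, labels):
    """Shannon entropy (bits) of how a gene set's members spread across clusters."""
    counts = Counter(labels[g] for g in gene_set)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_entropy(gene_set, labelings):
    """Average the entropy over several clusterings (e.g. resamples, cluster sizes)."""
    return sum(cluster_entropy(gene_set, lab) for lab in labelings) / len(labelings)


def filter_annotations(gene_set, labelings, min_support=0.6):
    """Hypothetical filter: keep genes that sit in the set's most common
    cluster in at least `min_support` of the supplied clusterings."""
    votes = Counter()
    for labels in labelings:
        counts = Counter(labels[g] for g in gene_set)
        top_cluster = counts.most_common(1)[0][0]
        for g in gene_set:
            if labels[g] == top_cluster:
                votes[g] += 1
    return [g for g in gene_set if votes[g] / len(labelings) >= min_support]


# Two made-up clusterings of five genes (assumed data, for illustration only).
labelings = [
    {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 2},  # three clusters
    {"g1": 0, "g2": 0, "g3": 0, "g4": 0, "g5": 1},  # two clusters
]
annotated = ["g1", "g2", "g3", "g4", "g5"]
filtered = filter_annotations(annotated, labelings)

print(filtered)
print(mean_entropy(annotated, labelings), mean_entropy(filtered, labelings))
```

In this toy input, genes that rarely co-cluster with the rest of the set are dropped, and the mean entropy of the retained annotations falls compared with the full set.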