Robust and accurate data enrichment statistics via distribution function of sum of weights

Bibliographic Details
Main authors: Stojmirović, Aleksandar, Yu, Yi-Kuo
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2958744/
https://www.ncbi.nlm.nih.gov/pubmed/20826881
http://dx.doi.org/10.1093/bioinformatics/btq511
_version_ 1782188369773592576
author Stojmirović, Aleksandar
Yu, Yi-Kuo
author_facet Stojmirović, Aleksandar
Yu, Yi-Kuo
author_sort Stojmirović, Aleksandar
collection PubMed
description Motivation: Term-enrichment analysis facilitates biological interpretation by assigning to experimentally/computationally obtained data annotation associated with terms from controlled vocabularies. This process usually involves obtaining statistical significance for each vocabulary term and using the most significant terms to describe a given set of biological entities, often associated with weights. Many existing enrichment methods require selecting an arbitrary number of the most significant entities and/or do not account for weights of entities. Others either mandate extensive simulations to obtain statistics or assume normal weight distribution. In addition, most methods have difficulty assigning correct statistical significance to terms with few entities. Results: Implementing the well-known Lugannani–Rice formula, we have developed a novel approach, called SaddleSum, that is free from all the aforementioned constraints and evaluated it against several existing methods. With entity weights properly taken into account, SaddleSum is internally consistent and stable with respect to the choice of number of most significant entities selected. Making few assumptions on the input data, the proposed method is universal and can thus be applied to areas beyond analysis of microarrays. Employing asymptotic approximation, SaddleSum provides a term-size-dependent score distribution function that gives rise to accurate statistical significance even for terms with few entities. As a consequence, SaddleSum enables researchers to place confidence in its significance assignments to small terms that are often biologically most specific. Availability: Our implementation, which uses Bonferroni correction to account for multiple hypotheses testing, is available at http://www.ncbi.nlm.nih.gov/CBBresearch/qmbp/mn/enrich/. Source code for the standalone version can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/SaddleSum/. 
Contact: yyu@ncbi.nlm.nih.gov Supplementary information: Supplementary materials are available at Bioinformatics online.
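Illustrative note: the abstract names the Lugannani–Rice formula as the basis for a term-size-dependent distribution of the sum of weights. The following is a minimal sketch of that general saddlepoint technique, not the authors' SaddleSum code. It assumes the m weights of a term behave like independent draws from the empirical distribution of all weights; the function name `lugannani_rice_tail` and its parameters are hypothetical, and the sketch only covers scores strictly between m times the mean weight and m times the largest weight.

```python
import math


def lugannani_rice_tail(weights, m, s):
    """Approximate P(S >= s), where S is the sum of m i.i.d. draws
    from the empirical distribution of `weights` (an assumption of
    this sketch, not a statement about SaddleSum internals)."""
    n = len(weights)
    w_max = max(weights)
    mean = sum(weights) / n
    target = s / m  # per-entity score the saddlepoint must match
    if not (mean < target < w_max):
        raise ValueError("sketch only covers m*mean < s < m*max(weights)")

    def cgf(t):
        # Cumulant generating function K(t) = log mean(exp(t*w)) and its
        # first two derivatives, with exponentials shifted by w_max so
        # that nothing overflows for t >= 0.
        a = [math.exp(t * (w - w_max)) for w in weights]
        z = sum(a)
        k0 = t * w_max + math.log(z / n)
        k1 = sum(ai * w for ai, w in zip(a, weights)) / z
        k2 = sum(ai * w * w for ai, w in zip(a, weights)) / z - k1 * k1
        return k0, k1, k2

    # Solve K'(t) = s/m by bisection; K' increases from mean to w_max.
    lo, hi = 0.0, 1.0
    while cgf(hi)[1] < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cgf(mid)[1] < target:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    k0, _, k2 = cgf(t)
    w_hat = math.sqrt(2.0 * (t * s - m * k0))
    u_hat = t * math.sqrt(m * k2)

    # Lugannani-Rice: 1 - Phi(w) + phi(w) * (1/u - 1/w)
    upper_normal = 0.5 * math.erfc(w_hat / math.sqrt(2.0))
    density = math.exp(-0.5 * w_hat * w_hat) / math.sqrt(2.0 * math.pi)
    return upper_normal + density * (1.0 / u_hat - 1.0 / w_hat)
```

Because the term size m enters the saddlepoint equation directly, the approximation changes with the number of entities in the term, which is the property the abstract highlights for small terms. A multiple-testing correction such as the Bonferroni correction mentioned under Availability would be applied to the resulting P-values separately.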
format Text
id pubmed-2958744
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-29587442010-10-22 Robust and accurate data enrichment statistics via distribution function of sum of weights Stojmirović, Aleksandar Yu, Yi-Kuo Bioinformatics Original Paper Motivation: Term-enrichment analysis facilitates biological interpretation by assigning to experimentally/computationally obtained data annotation associated with terms from controlled vocabularies. This process usually involves obtaining statistical significance for each vocabulary term and using the most significant terms to describe a given set of biological entities, often associated with weights. Many existing enrichment methods require selecting an arbitrary number of the most significant entities and/or do not account for weights of entities. Others either mandate extensive simulations to obtain statistics or assume normal weight distribution. In addition, most methods have difficulty assigning correct statistical significance to terms with few entities. Results: Implementing the well-known Lugannani–Rice formula, we have developed a novel approach, called SaddleSum, that is free from all the aforementioned constraints and evaluated it against several existing methods. With entity weights properly taken into account, SaddleSum is internally consistent and stable with respect to the choice of number of most significant entities selected. Making few assumptions on the input data, the proposed method is universal and can thus be applied to areas beyond analysis of microarrays. Employing asymptotic approximation, SaddleSum provides a term-size-dependent score distribution function that gives rise to accurate statistical significance even for terms with few entities. As a consequence, SaddleSum enables researchers to place confidence in its significance assignments to small terms that are often biologically most specific. 
Availability: Our implementation, which uses Bonferroni correction to account for multiple hypotheses testing, is available at http://www.ncbi.nlm.nih.gov/CBBresearch/qmbp/mn/enrich/. Source code for the standalone version can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/SaddleSum/. Contact: yyu@ncbi.nlm.nih.gov Supplementary information: Supplementary materials are available at Bioinformatics online. Oxford University Press 2010-11-01 2010-09-08 /pmc/articles/PMC2958744/ /pubmed/20826881 http://dx.doi.org/10.1093/bioinformatics/btq511 Text en http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Paper
Stojmirović, Aleksandar
Yu, Yi-Kuo
Robust and accurate data enrichment statistics via distribution function of sum of weights
title Robust and accurate data enrichment statistics via distribution function of sum of weights
title_full Robust and accurate data enrichment statistics via distribution function of sum of weights
title_fullStr Robust and accurate data enrichment statistics via distribution function of sum of weights
title_full_unstemmed Robust and accurate data enrichment statistics via distribution function of sum of weights
title_short Robust and accurate data enrichment statistics via distribution function of sum of weights
title_sort robust and accurate data enrichment statistics via distribution function of sum of weights
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2958744/
https://www.ncbi.nlm.nih.gov/pubmed/20826881
http://dx.doi.org/10.1093/bioinformatics/btq511
work_keys_str_mv AT stojmirovicaleksandar robustandaccuratedataenrichmentstatisticsviadistributionfunctionofsumofweights
AT yuyikuo robustandaccuratedataenrichmentstatisticsviadistributionfunctionofsumofweights