ProbCD: enrichment analysis accounting for categorization uncertainty
BACKGROUND: As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of effort...
Main authors: | Vêncio, Ricardo ZN; Shmulevich, Ilya |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2169266/ https://www.ncbi.nlm.nih.gov/pubmed/17935624 http://dx.doi.org/10.1186/1471-2105-8-383 |
_version_ | 1782144861275684864 |
---|---|
author | Vêncio, Ricardo ZN Shmulevich, Ilya |
author_facet | Vêncio, Ricardo ZN Shmulevich, Ilya |
author_sort | Vêncio, Ricardo ZN |
collection | PubMed |
description | BACKGROUND: As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high throughput-based datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. RESULTS: We developed an open-source R-based software to deal with probabilistic categorical data analysis, ProbCD, that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: . CONCLUSION: We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation. |
format | Text |
id | pubmed-2169266 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-21692662008-01-02 ProbCD: enrichment analysis accounting for categorization uncertainty Vêncio, Ricardo ZN Shmulevich, Ilya BMC Bioinformatics Software BACKGROUND: As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high throughput-based datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. RESULTS: We developed an open-source R-based software to deal with probabilistic categorical data analysis, ProbCD, that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: . CONCLUSION: We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation. BioMed Central 2007-10-12 /pmc/articles/PMC2169266/ /pubmed/17935624 http://dx.doi.org/10.1186/1471-2105-8-383 Text en Copyright © 2007 Vêncio and Shmulevich; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Vêncio, Ricardo ZN Shmulevich, Ilya ProbCD: enrichment analysis accounting for categorization uncertainty |
title | ProbCD: enrichment analysis accounting for categorization uncertainty |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2169266/ https://www.ncbi.nlm.nih.gov/pubmed/17935624 http://dx.doi.org/10.1186/1471-2105-8-383 |
work_keys_str_mv | AT vencioricardozn probcdenrichmentanalysisaccountingforcategorizationuncertainty AT shmulevichilya probcdenrichmentanalysisaccountingforcategorizationuncertainty |
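
The abstract states that the contingency table is built from the expectation of a Bernoulli scheme given the categorization probabilities. The following is a minimal Python sketch of that idea, not ProbCD's actual R code or API. It assumes that each gene has a probability `p` of belonging to the selected gene list and a probability `q` of being annotated to the category, and that the two memberships are independent; the function name and argument names are hypothetical.

```python
def expected_contingency_table(p_selected, p_annotated):
    """Expected 2x2 table under uncertain categorization (illustrative only).

    p_selected[i]  : assumed probability that gene i is in the selected list
    p_annotated[i] : assumed probability that gene i is annotated to the category
    The two memberships are treated as independent Bernoulli variables,
    which is an assumption of this sketch.
    Returns [[in list & in category, in list & not in category],
             [not in list & in category, not in list & not in category]].
    """
    if len(p_selected) != len(p_annotated):
        raise ValueError("both probability vectors must have one entry per gene")

    n11 = n10 = n01 = n00 = 0.0
    for p, q in zip(p_selected, p_annotated):
        if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
            raise ValueError("probabilities must lie in [0, 1]")
        # Each gene contributes its expected membership to all four cells.
        n11 += p * q
        n10 += p * (1.0 - q)
        n01 += (1.0 - p) * q
        n00 += (1.0 - p) * (1.0 - q)
    return [[n11, n10], [n01, n00]]


if __name__ == "__main__":
    # Three genes with uncertain list membership and uncertain annotation.
    table = expected_contingency_table([0.9, 0.2, 0.5], [1.0, 0.0, 0.5])
    for row in table:
        print(row)
```

When every probability is exactly 0 or 1, the expected table reduces to the ordinary count table used by Fisher-type tests, and the four cells always sum to the number of genes.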