ProbCD: enrichment analysis accounting for categorization uncertainty

BACKGROUND: As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of effort...

Bibliographic Details
Main authors: Vêncio, Ricardo ZN, Shmulevich, Ilya
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2169266/
https://www.ncbi.nlm.nih.gov/pubmed/17935624
http://dx.doi.org/10.1186/1471-2105-8-383
author Vêncio, Ricardo ZN
Shmulevich, Ilya
collection PubMed
description BACKGROUND: As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high throughput-based datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. RESULTS: We developed an open-source R-based software to deal with probabilistic categorical data analysis, ProbCD, that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: . CONCLUSION: We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation.
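The description above says the contingency table is built from the expectation of a Bernoulli scheme given the categorization probabilities. A minimal sketch of that idea in Python follows. It is not ProbCD's code or API (ProbCD itself is R-based): the function names are hypothetical, and the sketch assumes that each gene's membership in the selected list and in the annotation category are independent Bernoulli events.

```python
from typing import Sequence, Tuple

Table = Tuple[Tuple[float, float], Tuple[float, float]]


def expected_contingency_table(
    p_selected: Sequence[float], p_in_category: Sequence[float]
) -> Table:
    """Expected 2x2 table from per-gene probabilities.

    p_selected[i]    : probability that gene i is in the selected list
    p_in_category[i] : probability that gene i belongs to the category
    Assumption (ours, not the source's): the two events are independent.
    """
    if len(p_selected) != len(p_in_category):
        raise ValueError("probability vectors must have the same length")

    n11 = n10 = n01 = n00 = 0.0
    for p, q in zip(p_selected, p_in_category):
        if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
            raise ValueError("probabilities must lie in [0, 1]")
        n11 += p * q                  # selected and in category
        n10 += p * (1.0 - q)          # selected, not in category
        n01 += (1.0 - p) * q          # not selected, in category
        n00 += (1.0 - p) * (1.0 - q)  # neither
    return ((n11, n10), (n01, n00))


def odds_ratio(table: Table) -> float:
    """Odds ratio of a 2x2 table; raises ZeroDivisionError if n10 * n01 == 0."""
    (n11, n10), (n01, n00) = table
    return (n11 * n00) / (n10 * n01)


if __name__ == "__main__":
    # Hypothetical probabilities for four genes, for illustration only.
    table = expected_contingency_table([0.9, 0.8, 0.1, 0.2], [0.7, 0.6, 0.2, 0.1])
    print(table)
    print(odds_ratio(table))
```

When every probability is exactly 0 or 1, the expected table reduces to the ordinary count table used by Fisher-type tests, so the classical analysis is a special case of this construction. The odds ratio is only one possible association measure and is used here purely for illustration.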
format Text
id pubmed-2169266
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2169266 2008-01-02 ProbCD: enrichment analysis accounting for categorization uncertainty Vêncio, Ricardo ZN Shmulevich, Ilya BMC Bioinformatics Software BioMed Central 2007-10-12 /pmc/articles/PMC2169266/ /pubmed/17935624 http://dx.doi.org/10.1186/1471-2105-8-383 Text en Copyright © 2007 Vêncio and Shmulevich; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title ProbCD: enrichment analysis accounting for categorization uncertainty
topic Software