An accurate paired sample test for count data

Motivation: Recent technology platforms in proteomics and genomics produce count data for quantitative analysis. Previous works on statistical significance analysis for count data have mainly focused on the independent sample setting, which does not cover the case where pairs of measurements are tak...

Bibliographic Details
Main Authors: Pham, Thang V., Jimenez, Connie R.
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436821/
https://www.ncbi.nlm.nih.gov/pubmed/22962487
http://dx.doi.org/10.1093/bioinformatics/bts394
_version_ 1782242705499226112
author Pham, Thang V.
Jimenez, Connie R.
collection PubMed
description Motivation: Recent technology platforms in proteomics and genomics produce count data for quantitative analysis. Previous works on statistical significance analysis for count data have mainly focused on the independent sample setting, which does not cover the case where pairs of measurements are taken from individual patients before and after treatment. This experimental setting requires paired sample testing such as the paired t-test often used for continuous measurements. A state-of-the-art method uses a negative binomial distribution in a generalized linear model framework for paired sample testing. A paired sample design assumes that the relative change within each pair is constant across biological samples. This model can be used as an approximation to the true model in cases of heterogeneity of response in complex biological systems. We aim to specify the variation in response explicitly in combination with the inherent technical variation. Results: We formulate the problem of paired sample test for count data in a framework of statistical combination of multiple contingency tables. In particular, we specify explicitly a random distribution for the effect with an inverted beta model. The technical variation can be modeled by either a standard Poisson distribution or an exponentiated Poisson distribution, depending on the reproducibility of the acquisition workflow. The new statistical test is evaluated on both proteomics and genomics datasets, showing a comparable performance to the state-of-the-art method in general, and in several cases where the two methods differ, the proposed test returns more reasonable p-values. Availability: Available for download at http://www.oncoproteomics.nl/. Contact: t.pham@vumc.nl
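The abstract's combination of a Poisson technical model with an inverted beta effect can be illustrated with a standard identity, offered here as a rough sketch rather than the authors' implementation. If the two counts in a pair are independent Poisson variables with means λ and λφ, then conditional on their sum n the second count is Binomial(n, φ/(1+φ)); if the fold change φ follows an inverted beta (beta prime) distribution with shapes a and b, then φ/(1+φ) is Beta(a, b) and the conditional count is beta-binomial. The sketch assumes equal normalization factors for both samples, does not cover the exponentiated Poisson variant, and uses hypothetical counts and shape parameters.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(Y = k) for Y ~ Binomial(n, p): the fixed-effect case."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def beta_binomial_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(Y = k) when Y | p ~ Binomial(n, p) and p ~ Beta(a, b)."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_comb + log_beta(k + a, n - k + b) - log_beta(a, b))


def variance(pmf_values):
    """Variance of a distribution on 0..n given as a list of probabilities."""
    mean = sum(k * p for k, p in enumerate(pmf_values))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf_values))


# Hypothetical pair: 12 counts before treatment, 30 after (not from the paper).
before, after = 12, 30
n = before + after

# Hypothetical shape parameters; a = b makes the proportion symmetric around 0.5.
a, b = 5.0, 5.0

fixed = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]
random_effect = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]

print("P(after = 30 | n), fixed effect: ", fixed[after])
print("P(after = 30 | n), random effect:", random_effect[after])
print("variance, fixed vs random:", variance(fixed), variance(random_effect))
```

Allowing the effect to vary between pairs widens the conditional distribution, so the same observed pair is less surprising than under a single fixed fold change. This is the kind of heterogeneity in response that the abstract says the test models explicitly.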
format Online
Article
Text
id pubmed-3436821
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3436821 2012-12-12 An accurate paired sample test for count data Pham, Thang V. Jimenez, Connie R. Bioinformatics Original Papers Oxford University Press 2012-09-15 2012-09-03 /pmc/articles/PMC3436821/ /pubmed/22962487 http://dx.doi.org/10.1093/bioinformatics/bts394 Text en © The Author(s) (2012). Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An accurate paired sample test for count data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436821/
https://www.ncbi.nlm.nih.gov/pubmed/22962487
http://dx.doi.org/10.1093/bioinformatics/bts394