
Statistical analysis and significance testing of serial analysis of gene expression data using a Poisson mixture model

BACKGROUND: Serial analysis of gene expression (SAGE) is used to obtain quantitative snapshots of the transcriptome. These profiles are count-based and are assumed to follow a Binomial or Poisson distribution. However, tag counts observed across multiple libraries (for example, one or more groups of...


Bibliographic Details
Main author:	Zuyderduyn, Scott D
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2147036/ https://www.ncbi.nlm.nih.gov/pubmed/17683533 http://dx.doi.org/10.1186/1471-2105-8-282

author	Zuyderduyn, Scott D
collection	PubMed
description	BACKGROUND: Serial analysis of gene expression (SAGE) is used to obtain quantitative snapshots of the transcriptome. These profiles are count-based and are assumed to follow a Binomial or Poisson distribution. However, tag counts observed across multiple libraries (for example, one or more groups of biological replicates) have additional variance that cannot be accommodated by this assumption alone. Several models have been proposed to account for this effect, all of which utilize a continuous prior distribution to explain the excess variance. Here, a Poisson mixture model, which assumes excess variability arises from sampling a mixture of distinct components, is proposed and the merits of this model are discussed and evaluated. RESULTS: The goodness of fit of the Poisson mixture model on 15 sets of biological SAGE replicates is compared to the previously proposed hierarchical gamma-Poisson (negative binomial) model, and a substantial improvement is seen. In further support of the mixture model, there is observed: 1) an increase in the number of mixture components needed to fit the expression of tags representing more than one transcript; and 2) a tendency for components to cluster libraries into the same groups. A confidence score is presented that can identify tags that are differentially expressed between groups of SAGE libraries. Several examples where this test outperforms those previously proposed are highlighted. CONCLUSION: The Poisson mixture model performs well as a) a method to represent SAGE data from biological replicates, and b) a basis to assign significance when testing for differential expression between multiple groups of replicates. Code for the R statistical software package is included to assist investigators in applying this model to their own data.
format	Text
id	pubmed-2147036
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2147036 2007-12-19 Statistical analysis and significance testing of serial analysis of gene expression data using a Poisson mixture model Zuyderduyn, Scott D BMC Bioinformatics Methodology Article BioMed Central 2007-08-02 /pmc/articles/PMC2147036/ /pubmed/17683533 http://dx.doi.org/10.1186/1471-2105-8-282 Text en Copyright © 2007 Zuyderduyn; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Statistical analysis and significance testing of serial analysis of gene expression data using a Poisson mixture model
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2147036/ https://www.ncbi.nlm.nih.gov/pubmed/17683533 http://dx.doi.org/10.1186/1471-2105-8-282


Similar items