Modeling Sage data with a truncated gamma-Poisson model

Bibliographic details
Main authors: Thygesen, Helene H; Zwinderman, Aeilko H
Format: Text
Language: English
Published: BioMed Central 2006
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1479844/
https://www.ncbi.nlm.nih.gov/pubmed/16549008
http://dx.doi.org/10.1186/1471-2105-7-157

Collection: PubMed
Abstract:
BACKGROUND: Serial Analysis of Gene Expression (SAGE) produces gene expression measurements on a discrete scale, due to the finite number of molecules in the sample. This means that part of the variance in SAGE data should be understood as the sampling error in a binomial or Poisson distribution, whereas other sources of variance, in particular biological variance, should be modeled using a continuous distribution function, i.e. a prior on the intensity of the Poisson distribution. One challenge is that such a model predicts a large number of genes with zero counts, which cannot be observed.

RESULTS: We present a hierarchical Poisson model with a gamma prior and three different algorithms for estimating its parameters. The rate parameter of the gamma distribution can be estimated from a single SAGE library, whereas the estimate of the shape parameter becomes unstable. Consequently, the number of zero counts cannot be estimated reliably. When a bivariate model is applied to two SAGE libraries, however, the number of predicted zero counts becomes more stable and is in approximate agreement with the number of transcripts observed across a large number of experiments. In all the libraries we analyzed there was a small population of very highly expressed tags, typically 1% of the tags, that the model could not account for. To handle those tags, we augmented the model with a non-parametric component. We also show some results based on a log-normal distribution instead of the gamma distribution.

CONCLUSION: By modeling SAGE data with a hierarchical Poisson model, it is possible to separate the sampling variance from the variance in gene expression. If expression levels are reported at the gene level rather than at the tag level, genes mapped to multiple tags must be kept separate, since their expression levels show a different statistical behavior. A log-normal prior provided a better fit to our data than the gamma prior, but except for a small subpopulation of tags with very high counts, the two priors are similar.
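
The abstract describes a hierarchical model in which each tag count is Poisson with a gamma-distributed intensity, and in which tags with zero counts are never observed. The following is a minimal Python sketch of such a zero-truncated gamma-Poisson likelihood, written under several assumptions that are not taken from the paper: the intensity is treated directly as the expected count in one library (no library-size scaling), the parameters are fitted by plain maximum likelihood with a nested golden-section search (not necessarily any of the three algorithms the authors present), and the bivariate model, the non-parametric component for highly expressed tags and the log-normal prior are left out. All function names and the simulation parameters are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: zero-truncated gamma-Poisson model for one library.
# Assumption: lam is the expected count itself (no library-size scaling).


def nb_logpmf(x, shape, rate):
    """log P(X = x) for X ~ Poisson(lam), lam ~ Gamma(shape, rate).

    Integrating lam out gives a negative binomial distribution:
    P(x) = G(x+shape)/(G(shape) x!) * (rate/(rate+1))**shape * (1/(rate+1))**x
    """
    return (math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
            + shape * (math.log(rate) - math.log1p(rate))
            - x * math.log1p(rate))


def truncated_loglik(tally, shape, rate):
    """Log-likelihood of observed counts; tally maps count x >= 1 to #tags.

    Zero counts are unobservable, so each count is conditioned on x > 0:
    P(x | x > 0) = P(x) / (1 - P(0)).
    """
    log_p0 = shape * (math.log(rate) - math.log1p(rate))  # log P(X = 0)
    log_not_zero = math.log(-math.expm1(log_p0))          # log(1 - P(0))
    n_tags = sum(tally.values())
    return (sum(m * nb_logpmf(x, shape, rate) for x, m in tally.items())
            - n_tags * log_not_zero)


def golden_max(f, lo, hi, iters=60):
    """Maximize a (assumed unimodal) 1-D function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


def fit_truncated_gamma_poisson(counts, iters=50):
    """Profile-likelihood fit of (shape, rate) on the log scale.

    The inner search finds the best log(rate) for a given log(shape); the
    outer search maximizes over log(shape). The abstract reports that the
    shape estimate is unstable for a single library, so the outer search
    may be operating on a rather flat profile. Brackets are arbitrary.
    """
    tally = Counter(counts)

    def best_log_rate(log_shape):
        shape = math.exp(log_shape)
        return golden_max(
            lambda lr: truncated_loglik(tally, shape, math.exp(lr)),
            -10.0, 10.0, iters)

    def profile(log_shape):
        lr = best_log_rate(log_shape)
        return truncated_loglik(tally, math.exp(log_shape), math.exp(lr))

    log_shape = golden_max(profile, -6.0, 6.0, iters)
    return math.exp(log_shape), math.exp(best_log_rate(log_shape))


def expected_unseen(n_observed, shape, rate):
    """Expected number of zero-count tags, given n_observed non-zero tags."""
    p0 = (rate / (rate + 1.0)) ** shape
    return n_observed * p0 / (1.0 - p0)


def simulate_library(n_tags, shape, rate, seed=0):
    """Draw counts from the untruncated model, then drop the zeros."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_tags):
        lam = rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale
        # Knuth's multiplication method; adequate for moderate lam only.
        k, prod, limit = 0, rng.random(), math.exp(-lam)
        while prod > limit:
            k += 1
            prod *= rng.random()
        counts.append(k)
    return [c for c in counts if c > 0]


if __name__ == "__main__":
    observed = simulate_library(5000, shape=0.5, rate=0.05, seed=1)
    shape, rate = fit_truncated_gamma_poisson(observed)
    print(f"observed tags: {len(observed)}")
    print(f"fitted shape={shape:.3f}, rate={rate:.4f}")
    print(f"predicted zero-count tags: "
          f"{expected_unseen(len(observed), shape, rate):.0f}")
```

Searching on the log scale keeps both parameters positive without explicit constraints. Given the instability of the shape estimate reported in the abstract, the number returned by `expected_unseen` for a single library should be read as illustrative rather than as a reliable estimate.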

Record ID: pubmed-1479844
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Record: pubmed-1479844 (2006-06-19)
Journal: BMC Bioinformatics (Research Article), published 2006-03-20
Copyright © 2006 Thygesen and Zwinderman; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.