
Modeling SAGE tag formation and its effects on data interpretation within a Bayesian framework

BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a high-throughput method for inferring mRNA expression levels from experimentally generated sequence-based tags. Standard analyses of SAGE data, however, ignore the fact that the probability of generating an observable tag varies across ge...


Bibliographic Details
Main authors: Gilchrist, Michael A; Qin, Hong; Zaretzki, Russell
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2217564/
https://www.ncbi.nlm.nih.gov/pubmed/17945026
http://dx.doi.org/10.1186/1471-2105-8-403
author Gilchrist, Michael A
Qin, Hong
Zaretzki, Russell
collection PubMed
description BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a high-throughput method for inferring mRNA expression levels from experimentally generated sequence-based tags. Standard analyses of SAGE data, however, ignore the fact that the probability of generating an observable tag varies across genes and between experiments. As a consequence, these analyses yield biased estimators and posterior probability intervals for gene expression levels in the transcriptome. RESULTS: Using the yeast Saccharomyces cerevisiae as an example, we introduce a new Bayesian method of data analysis that is based on a model of SAGE tag formation. Our approach incorporates the variation in the probability of tag formation into the interpretation of SAGE data and allows us to derive exact joint and approximate marginal posterior distributions for the mRNA frequency of genes detectable using SAGE. Our analysis of these distributions indicates that the frequency of a gene in the tag pool is influenced by its mRNA frequency, the cleavage efficiency of the anchoring enzyme (AE), and the number of informative and uninformative AE cleavage sites within its mRNA. CONCLUSION: With a mechanistic, model-based approach for SAGE data analysis, we find that inter-genic variation in SAGE tag formation is large. However, this variation can be estimated and, importantly, accounted for using the methods we develop here. As a result, SAGE-based estimates of mRNA frequencies can be adjusted to remove the bias introduced by the SAGE tag formation process.
format Text
id pubmed-2217564
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2217564 2008-01-30 Modeling SAGE tag formation and its effects on data interpretation within a Bayesian framework Gilchrist, Michael A Qin, Hong Zaretzki, Russell BMC Bioinformatics Methodology Article BioMed Central 2007-10-18 /pmc/articles/PMC2217564/ /pubmed/17945026 http://dx.doi.org/10.1186/1471-2105-8-403 Text en Copyright © 2007 Gilchrist et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Modeling SAGE tag formation and its effects on data interpretation within a Bayesian framework
topic Methodology Article