Bias correction and Bayesian analysis of aggregate counts in SAGE libraries

BACKGROUND: Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow for multiple tags to be generated from a given mRNA transcript. The probability of forming a tag varies with...

Bibliographic Details
Main authors: Zaretzki, Russell L; Gilchrist, Michael A; Briggs, William M; Armagan, Artin
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2829012/
https://www.ncbi.nlm.nih.gov/pubmed/20128916
http://dx.doi.org/10.1186/1471-2105-11-72
author Zaretzki, Russell L
Gilchrist, Michael A
Briggs, William M
Armagan, Artin
author_sort Zaretzki, Russell L
collection PubMed
description BACKGROUND: Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow multiple tags to be generated from a given mRNA transcript. The probability of forming a tag varies with its relative location. As a result, the observed tag counts represent a biased sample of the actual transcript pool. In SAGE, this bias can be avoided by ignoring all but the 3'-most tag, but doing so discards a large fraction of the observed data. Taking this bias into account should allow more of the available data to be used, leading to increased statistical power. RESULTS: Three new hierarchical models, which directly embed a model for the variation in tag formation probability, are proposed and their associated Bayesian inference algorithms are developed. These models may be applied to libraries at both the tag and aggregate level. Simulation experiments and analysis of real data are used to contrast the accuracy of the various methods. The consequences of tag formation bias are discussed in the context of testing differential expression. A description is given of how these algorithms can be applied in that context. CONCLUSIONS: Several Bayesian inference algorithms that account for tag formation effects are compared with the DPB algorithm, providing clear evidence of superior performance. The accuracy of inferences when using a particular non-informative prior is found to depend on the expression level of a given gene. The multivariate nature of the approach easily allows both univariate and joint tests of differential expression. Calculations demonstrate the potential for false-positive and false-negative findings due to variation in tag formation probabilities across samples when testing for differential expression.
format Text
id pubmed-2829012
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2829012 2010-02-26 Bias correction and Bayesian analysis of aggregate counts in SAGE libraries Zaretzki, Russell L Gilchrist, Michael A Briggs, William M Armagan, Artin BMC Bioinformatics Methodology article BioMed Central 2010-02-03 /pmc/articles/PMC2829012/ /pubmed/20128916 http://dx.doi.org/10.1186/1471-2105-11-72 Text en Copyright ©2010 Zaretzki et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Bias correction and Bayesian analysis of aggregate counts in SAGE libraries
topic Methodology article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2829012/
https://www.ncbi.nlm.nih.gov/pubmed/20128916
http://dx.doi.org/10.1186/1471-2105-11-72