A flexible Bayesian method for detecting allelic imbalance in RNA-seq data

Bibliographic Details
Main Authors:	León-Novelo, Luis G; McIntyre, Lauren M; Fear, Justin M; Graze, Rita M
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4230747/ https://www.ncbi.nlm.nih.gov/pubmed/25339465 http://dx.doi.org/10.1186/1471-2164-15-920

_version_	1782344326027673600
author	León-Novelo, Luis G; McIntyre, Lauren M; Fear, Justin M; Graze, Rita M
collection	PubMed
description	BACKGROUND: One method of identifying cis regulatory differences is to analyze allele-specific expression (ASE) and identify cases of allelic imbalance (AI). RNA-seq is the most common way to measure ASE and a binomial test is often applied to determine statistical significance of AI. This implicitly assumes that there is no bias in estimation of AI. However, bias has been found to result from multiple factors including: genome ambiguity, reference quality, the mapping algorithm, and biases in the sequencing process. Two alternative approaches have been developed to handle bias: adjusting for bias using a statistical model and filtering regions of the genome suspected of harboring bias. Existing statistical models which account for bias rely on information from DNA controls, which can be cost prohibitive for large intraspecific studies. In contrast, data filtering is inexpensive and straightforward, but necessarily involves sacrificing a portion of the data.
RESULTS: Here we propose a flexible Bayesian model for analysis of AI, which accounts for bias and can be implemented without DNA controls. In lieu of DNA controls, this Poisson-Gamma (PG) model uses an estimate of bias from simulations. The proposed model always has a lower type I error rate compared to the binomial test. Consistent with prior studies, bias dramatically affects the type I error rate. All of the tested models are sensitive to misspecification of bias. The closer the estimate of bias is to the true underlying bias, the lower the type I error rate. Correct estimates of bias result in a level alpha test.
CONCLUSIONS: To improve the assessment of AI, some forms of systematic error (e.g., map bias) can be identified using simulation. The resulting estimates of bias can be used to correct for bias in the PG model, without data filtering. Other sources of bias (e.g., unidentified variant calls) can be easily captured by DNA controls, but are missed by common filtering approaches. Consequently, as variant identification improves, the need for DNA controls will be reduced. Filtering does not significantly improve performance and is not recommended, as information is sacrificed without a measurable gain. The PG model developed here performs well when bias is known, or slightly misspecified. The model is flexible and can accommodate differences in experimental design and bias estimation.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-920) contains supplementary material, which is available to authorized users.
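The abstract's point that a binomial test implicitly assumes no bias can be illustrated with a small simulation. The sketch below is not the Poisson-Gamma model described in the article; it only contrasts an exact binomial test against a null of 0.5 with the same test against a null equal to the bias. The function names, the read depth of 100, and the bias value of 0.55 are all hypothetical choices made for illustration.

```python
import math
import random


def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def type1_error(true_q, null_p, n_reads=100, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of simulated loci called imbalanced when there is no
    true allelic imbalance.

    true_q : assumed probability that a read is assigned to allele 1
             in the absence of AI (0.5 means no bias). Hypothetical.
    null_p : proportion the binomial test uses as its null value.
    """
    rng = random.Random(seed)
    # Precompute the p-value for every possible allele-1 read count.
    pvals = [binom_two_sided_p(k, n_reads, null_p) for k in range(n_reads + 1)]
    rejections = 0
    for _ in range(n_sims):
        k = sum(rng.random() < true_q for _ in range(n_reads))
        if pvals[k] <= alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Bias of 0.55 is an assumed value, not one taken from the article.
    print("null 0.50:", type1_error(true_q=0.55, null_p=0.50))
    print("null 0.55:", type1_error(true_q=0.55, null_p=0.55))
```

Under these assumptions, testing against 0.5 rejects far more often than the nominal alpha, while testing against the correctly specified bias stays at or below it, which mirrors the abstract's statement that correct estimates of bias result in a level alpha test.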
format	Online Article Text
id	pubmed-4230747
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4230747 2014-11-14 A flexible Bayesian method for detecting allelic imbalance in RNA-seq data León-Novelo, Luis G; McIntyre, Lauren M; Fear, Justin M; Graze, Rita M BMC Genomics Methodology Article BioMed Central 2014-10-23 /pmc/articles/PMC4230747/ /pubmed/25339465 http://dx.doi.org/10.1186/1471-2164-15-920 Text en © León-Novelo et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A flexible Bayesian method for detecting allelic imbalance in RNA-seq data
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4230747/ https://www.ncbi.nlm.nih.gov/pubmed/25339465 http://dx.doi.org/10.1186/1471-2164-15-920

Similar Items