Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences

MOTIVATION: In RNA-seq differential expression analysis, investigators aim to detect those genes with changes in expression level across conditions, despite technical and biological variability in the observations. A common task is to accurately estimate the effect size, often in terms of a logarith...

Bibliographic Details
Main authors:	Zhu, Anqi, Ibrahim, Joseph G, Love, Michael I
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6581436/ https://www.ncbi.nlm.nih.gov/pubmed/30395178 http://dx.doi.org/10.1093/bioinformatics/bty895

author	Zhu, Anqi Ibrahim, Joseph G Love, Michael I
collection	PubMed
description	MOTIVATION: In RNA-seq differential expression analysis, investigators aim to detect those genes with changes in expression level across conditions, despite technical and biological variability in the observations. A common task is to accurately estimate the effect size, often in terms of a logarithmic fold change (LFC). RESULTS: When the read counts are low or highly variable, the maximum likelihood estimates for the LFCs have high variance, leading to large estimates not representative of true differences, and poor ranking of genes by effect size. One approach is to introduce filtering thresholds and pseudocounts to exclude or moderate estimated LFCs. Filtering may result in a loss of genes from the analysis with true differences in expression, while pseudocounts provide a limited solution that must be adapted per dataset. Here, we propose the use of a heavy-tailed Cauchy prior distribution for effect sizes, which avoids the use of filter thresholds or pseudocounts. The proposed method, Approximate Posterior Estimation for generalized linear model, apeglm, has lower bias than previously proposed shrinkage estimators, while still reducing variance for those genes with little information for statistical inference. AVAILABILITY AND IMPLEMENTATION: The apeglm package is available as an R/Bioconductor package at https://bioconductor.org/packages/apeglm, and the methods can be called from within the DESeq2 software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-6581436
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6581436 2019-06-21 Bioinformatics Original Papers Oxford University Press 2019-06 2018-11-03 /pmc/articles/PMC6581436/ /pubmed/30395178 http://dx.doi.org/10.1093/bioinformatics/bty895 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6581436/ https://www.ncbi.nlm.nih.gov/pubmed/30395178 http://dx.doi.org/10.1093/bioinformatics/bty895

Similar items