
Modeling bias and variation in the stochastic processes of small RNA sequencing

The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers has been hindered by high quantitative variability and biased sequence counts. In this paper we develop a statistical model for sequence counts that accounts for ligase bias and stochastic variation i...


Bibliographic Details
Main authors: Argyropoulos, Christos, Etheridge, Alton, Sakhanenko, Nikita, Galas, David
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5499834/
https://www.ncbi.nlm.nih.gov/pubmed/28369495
http://dx.doi.org/10.1093/nar/gkx199
author Argyropoulos, Christos
Etheridge, Alton
Sakhanenko, Nikita
Galas, David
collection PubMed
description The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers has been hindered by high quantitative variability and biased sequence counts. In this paper we develop a statistical model for sequence counts that accounts for ligase bias and stochastic variation in sequence counts. This model implies a linear quadratic relation between the mean and variance of sequence counts. Using a large number of sequencing datasets, we demonstrate how one can use the generalized additive models for location, scale and shape (GAMLSS) distributional regression framework to calculate and apply empirical correction factors for ligase bias. Bias correction could remove more than 40% of the bias for miRNAs. Empirical bias correction factors appear to be nearly constant over at least one and up to four orders of magnitude of total RNA input and independent of sample composition. Using synthetic mixes of known composition, we show that the GAMLSS approach can analyze differential expression with greater accuracy, higher sensitivity and specificity than six existing algorithms (DESeq2, edgeR, EBSeq, limma, DSS, voom) for the analysis of small RNA-seq data.
format Online
Article
Text
id pubmed-5499834
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5499834 2017-07-12 Modeling bias and variation in the stochastic processes of small RNA sequencing Argyropoulos, Christos Etheridge, Alton Sakhanenko, Nikita Galas, David Nucleic Acids Res Methods Online Oxford University Press 2017-06-20 2017-03-27 /pmc/articles/PMC5499834/ /pubmed/28369495 http://dx.doi.org/10.1093/nar/gkx199 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Modeling bias and variation in the stochastic processes of small RNA sequencing
topic Methods Online