Mixture models reveal multiple positional bias types in RNA-Seq data and lead to accurate transcript concentration estimates

Bibliographic details
Main authors: Tuerk, Andreas; Wiktorin, Gregor; Güler, Serhat
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2017
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5448817/
https://www.ncbi.nlm.nih.gov/pubmed/28505151
http://dx.doi.org/10.1371/journal.pcbi.1005515
Collection: PubMed

Abstract
Accuracy of transcript quantification with RNA-Seq is negatively affected by positional fragment bias. This article introduces Mix² (read “mixquare”), a transcript quantification method that uses a mixture of probability distributions to model, and thereby neutralize, the effects of positional fragment bias. The parameters of Mix² are trained by Expectation Maximization, yielding simultaneous estimates of transcript abundances and biases. We compare Mix² to Cufflinks, RSEM, eXpress and PennSeq, state-of-the-art quantification methods that implement some form of bias correction. On four synthetic biases, we show that the accuracy of Mix² overall exceeds that of the other methods and that its bias estimates converge to the correct solution. We further evaluate Mix² on real RNA-Seq data from the Microarray and Sequencing Quality Control (MAQC, SEQC) Consortia. On MAQC data, Mix² achieves improved correlation with qPCR measurements, with a relative increase in R² between 4% and 50%. Mix² also yields repeatable concentration estimates across technical replicates, with a relative increase in R² between 8% and 47% and reduced standard deviation across the full concentration range. We further observe more accurate detection of differential expression, with a relative increase in true positives between 74% and 378% at 5% false positives. In addition, Mix² reveals 5 dominant biases in MAQC data that deviate from the common assumption of a uniform fragment distribution. On SEQC data, Mix² yields higher consistency between measured and predicted concentration ratios. A relative error of 20% or less is obtained for 51% of transcripts by Mix², 40% by Cufflinks and RSEM, and 30% by eXpress. Titration order consistency is correct for 47% of transcripts for Mix², 41% for Cufflinks and RSEM, and 34% for eXpress. We also observe improved repeatability across laboratory sites, with a relative increase in R² between 8% and 44% and reduced standard deviation.

Record ID: pubmed-5448817
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS Comput Biol (Research Article), published 2017-05-15 by Public Library of Science
License: © 2017 Tuerk et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.