
A Mechanistic Beta-Binomial Probability Model for mRNA Sequencing Data

A main application for mRNA sequencing (mRNAseq) is determining lists of differentially-expressed genes (DEGs) between two or more conditions. Several software packages exist to produce DEGs from mRNAseq data, but they typically yield different DEGs, sometimes markedly so. The underlying probability...


Bibliographic Details
Main Authors: Smith, Gregory R., Birtwistle, Marc R.
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4915702/
https://www.ncbi.nlm.nih.gov/pubmed/27326762
http://dx.doi.org/10.1371/journal.pone.0157828
_version_ 1782438725836341248
collection PubMed
description A main application for mRNA sequencing (mRNAseq) is determining lists of differentially expressed genes (DEGs) between two or more conditions. Several software packages exist to produce DEGs from mRNAseq data, but they typically yield different DEGs, sometimes markedly so. The underlying probability model used to describe mRNAseq data is central to deriving DEGs, and, not surprisingly, most software packages use different models and assumptions to analyze mRNAseq data. Here, we propose a mechanistic justification to model mRNAseq as a binomial process, with data from technical replicates given by a binomial distribution, and data from biological replicates well described by a beta-binomial distribution. We demonstrate good agreement of this model with two large datasets. We show that an emergent feature of the beta-binomial distribution, given parameter regimes typical for mRNAseq experiments, is the well-known quadratic polynomial scaling of variance with the mean. The so-called dispersion parameter controls this scaling, and our analysis suggests that the dispersion parameter is a continually decreasing function of the mean, as opposed to current approaches that impose an asymptotic value on the dispersion parameter at moderate mean read counts. We show how this leads to current approaches overestimating variance for moderately to highly expressed genes, which inflates false negative rates. Describing mRNAseq data with a beta-binomial distribution thus may be preferred, since its parameters are relatable to the mechanistic underpinnings of the technique and may improve the consistency of DEG analysis across software packages, particularly for moderately to highly expressed genes.
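The quadratic scaling of variance with the mean mentioned in the description can be illustrated with the textbook moments of a beta-binomial distribution. The sketch below is not taken from the article: the parameterization (n total reads, shape parameters a and b), the function names, and the numeric values are all illustrative assumptions. It compares the exact beta-binomial variance with the form mean + mean^2 / a, which the exact expression approaches when b is much larger than a (that is, when any one gene captures only a small fraction of the reads).

```python
# Illustrative sketch only: names and numbers are assumptions, not the
# authors' code or fitted values.

def betabinom_mean(n, a, b):
    """Mean of a beta-binomial with n trials and shape parameters a, b."""
    return n * a / (a + b)

def betabinom_var(n, a, b):
    """Exact variance of a beta-binomial with n trials and shapes a, b."""
    return n * a * b * (a + b + n) / ((a + b) ** 2 * (a + b + 1))

def quadratic_approx_var(mean, a):
    """Quadratic mean-variance form, mean + mean^2 / a, valid when b >> a."""
    return mean + mean ** 2 / a

def implied_dispersion(mean, var):
    """Dispersion phi that solves var = mean + phi * mean^2."""
    return (var - mean) / mean ** 2

if __name__ == "__main__":
    n_reads = 10_000_000      # hypothetical sequencing depth
    a, b = 5.0, 500_000.0     # hypothetical shapes; a / (a + b) is about 1e-5

    mu = betabinom_mean(n_reads, a, b)
    exact = betabinom_var(n_reads, a, b)
    approx = quadratic_approx_var(mu, a)

    print(f"mean read count      : {mu:.3f}")
    print(f"exact variance       : {exact:.3f}")
    print(f"quadratic approx.    : {approx:.3f}")
    print(f"implied dispersion   : {implied_dispersion(mu, exact):.5f}")
    print(f"1 / a                : {1 / a:.5f}")
```

In this regime the implied dispersion is close to 1 / a, so under these assumptions any dependence of a on expression level would appear as a dispersion that changes with the mean.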
format Online
Article
Text
id pubmed-4915702
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4915702 2016-07-06 A Mechanistic Beta-Binomial Probability Model for mRNA Sequencing Data Smith, Gregory R. Birtwistle, Marc R. PLoS One Research Article
Public Library of Science 2016-06-21 /pmc/articles/PMC4915702/ /pubmed/27326762 http://dx.doi.org/10.1371/journal.pone.0157828 Text en © 2016 Smith, Birtwistle http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A Mechanistic Beta-Binomial Probability Model for mRNA Sequencing Data
topic Research Article