
Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment

Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negati...


Bibliographic Details
Main authors: Gierliński, Marek, Cole, Christian, Schofield, Pietà, Schurch, Nicholas J., Sherstnev, Alexander, Singh, Vijender, Wrobel, Nicola, Gharbi, Karim, Simpson, Gordon, Owen-Hughes, Tom, Blaxter, Mark, Barton, Geoffrey J.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4754627/
https://www.ncbi.nlm.nih.gov/pubmed/26206307
http://dx.doi.org/10.1093/bioinformatics/btv425
collection PubMed
description Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. Results: A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of ‘bad’ replicates, which can drastically affect the gene read-count distribution. Availability and implementation: RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk
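The mean-variance relation mentioned in the description can be illustrated with a minimal sketch. It assumes the parameterization commonly used for negative binomial read counts, in which the variance equals mean + dispersion × mean², and takes the reported dispersion of about 0.01 as a default. The function names `nb_variance` and `nb_cv` are hypothetical and are not taken from the paper or from any of the tools it names.

```python
import math


def nb_variance(mean: float, dispersion: float = 0.01) -> float:
    """Variance of a negative binomial count: Poisson term plus dispersion term.

    Assumes var = mean + dispersion * mean**2; the default dispersion of 0.01
    is the approximate value quoted in the abstract.
    """
    return mean + dispersion * mean ** 2


def nb_cv(mean: float, dispersion: float = 0.01) -> float:
    """Coefficient of variation (standard deviation divided by mean)."""
    return math.sqrt(nb_variance(mean, dispersion)) / mean


if __name__ == "__main__":
    # Illustrative mean read counts only; these are not data from the experiment.
    for mean in (10, 100, 1000, 10000):
        print(f"mean={mean:>6}  variance={nb_variance(mean):>12.1f}  cv={nb_cv(mean):.3f}")
```

Under this assumption the Poisson term dominates at low counts, while at high counts the coefficient of variation levels off near sqrt(0.01) = 0.1, which is what a line of constant dispersion implies.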
format Online
Article
Text
id pubmed-4754627
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4754627 2016-02-17 Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment Bioinformatics Original Papers Oxford University Press 2015-11-15 2015-07-23 /pmc/articles/PMC4754627/ /pubmed/26206307 http://dx.doi.org/10.1093/bioinformatics/btv425 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Papers