Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation

A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biologica...


Bibliographic Details
Main Authors: McCarthy, Davis J., Chen, Yunshun, Smyth, Gordon K.
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378882/
https://www.ncbi.nlm.nih.gov/pubmed/22287627
http://dx.doi.org/10.1093/nar/gks042
_version_ 1782236096005931008
author McCarthy, Davis J.
Chen, Yunshun
Smyth, Gordon K.
author_facet McCarthy, Davis J.
Chen, Yunshun
Smyth, Gordon K.
author_sort McCarthy, Davis J.
collection PubMed
description A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies. Novel empirical Bayes methods allow each gene to have its own specific variability, even when there are relatively few biological replicates from which to estimate such variability. The pipeline is implemented in the edgeR package of the Bioconductor project. A case study analysis of carcinoma data demonstrates the ability of generalized linear model methods (GLMs) to detect differential expression in a paired design, and even to detect tumour-specific expression changes. The case study demonstrates the need to allow for gene-specific variability, rather than assuming a common dispersion across genes or a fixed relationship between abundance and variability. Genewise dispersions de-prioritize genes with inconsistent results and allow the main analysis to focus on changes that are consistent between biological replicates. Parallel computational approaches are developed to make non-linear model fitting faster and more reliable, making the application of GLMs to genomic data more convenient and practical. Simulations demonstrate the ability of adjusted profile likelihood estimators to return accurate estimators of biological variability in complex situations. When variation is gene-specific, empirical Bayes estimators provide an advantageous compromise between the extremes of assuming common dispersion or separate genewise dispersion. The methods developed here can also be applied to count data arising from DNA-Seq applications, including ChIP-Seq for epigenetic marks and DNA methylation analyses.
format Online
Article
Text
id pubmed-3378882
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3378882 2012-06-20 Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation McCarthy, Davis J. Chen, Yunshun Smyth, Gordon K. Nucleic Acids Res Computational Biology Oxford University Press 2012-05 2012-01-28 /pmc/articles/PMC3378882/ /pubmed/22287627 http://dx.doi.org/10.1093/nar/gks042 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Computational Biology
McCarthy, Davis J.
Chen, Yunshun
Smyth, Gordon K.
Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation
title Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378882/
https://www.ncbi.nlm.nih.gov/pubmed/22287627
http://dx.doi.org/10.1093/nar/gks042