Power analysis and sample size estimation for RNA-Seq differential expression

It is crucial for researchers to optimize RNA-seq experimental designs for differential expression detection. Currently, the field lacks general methods to estimate power and sample size for RNA-Seq in complex experimental designs, under the assumption of the negative binomial distribution. We simul...

Bibliographic Details
Main authors: Ching, Travers; Huang, Sijia; Garmire, Lana X.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2014
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4201821/
https://www.ncbi.nlm.nih.gov/pubmed/25246651
http://dx.doi.org/10.1261/rna.046011.114
author Ching, Travers
Huang, Sijia
Garmire, Lana X.
collection PubMed
description It is crucial for researchers to optimize RNA-seq experimental designs for differential expression detection. Currently, the field lacks general methods to estimate power and sample size for RNA-Seq in complex experimental designs, under the assumption of the negative binomial distribution. We simulate RNA-Seq count data based on parameters estimated from six widely different public data sets (including cell line comparison, tissue comparison, and cancer data sets) and calculate the statistical power in paired and unpaired sample experiments. We comprehensively compare five differential expression analysis packages (DESeq, edgeR, DESeq2, sSeq, and EBSeq) and evaluate their performance by power, receiver operating characteristic (ROC) curves, and other metrics including area under the curve (AUC), Matthews correlation coefficient (MCC), and F-measures. DESeq2 and edgeR tend to give the best performance in general. Increasing sample size or sequencing depth increases power; however, increasing sample size raises power more than increasing sequencing depth does, especially once the sequencing depth reaches 20 million reads. Long intergenic noncoding RNAs (lincRNAs) yield lower power relative to protein-coding mRNAs, given their lower expression level in the same RNA-Seq experiment. On the other hand, paired-sample RNA-Seq significantly enhances the statistical power, confirming the importance of considering the multifactor experimental design. Finally, a local optimal power is achievable for a given budget constraint, and the dominant contributing factor is sample size rather than the sequencing depth. In conclusion, we provide a power analysis tool (http://www2.hawaii.edu/~lgarmire/RNASeqPowerCalculator.htm) that captures the dispersion in the data and can serve as a practical reference under the budget constraint of RNA-Seq experiments.
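The simulation-based power estimate described in the abstract can be illustrated with a minimal single-gene sketch. This is not the authors' tool or any of the five packages they compare: the function names, the default mean and dispersion, and the test (a Welch statistic on log2 counts with a normal approximation for the p-value) are all assumptions chosen to keep the example dependency-free. The normal approximation is somewhat anti-conservative at small sample sizes, so the numbers are only indicative.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Poisson draw via Knuth's multiplication method (fine for moderate means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_nb(rng, mean, dispersion):
    """Negative binomial draw as a gamma-Poisson mixture.

    The resulting variance is mean + dispersion * mean**2.
    """
    lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return sample_poisson(rng, lam)


def welch_p_value(x, y):
    """Two-sided p-value for a Welch statistic (normal approximation, an assumption)."""
    diff = statistics.fmean(x) - statistics.fmean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    if se == 0.0:
        return 1.0 if diff == 0.0 else 0.0
    z = diff / se
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))


def estimate_power(n_per_group, fold_change, mean=50.0, dispersion=0.1,
                   alpha=0.05, n_sim=500, seed=0):
    """Fraction of simulated unpaired experiments in which the gene is called significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        control = [math.log2(sample_nb(rng, mean, dispersion) + 1)
                   for _ in range(n_per_group)]
        treated = [math.log2(sample_nb(rng, mean * fold_change, dispersion) + 1)
                   for _ in range(n_per_group)]
        if welch_p_value(control, treated) < alpha:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    # Hypothetical settings: a two-fold change at three sample sizes per group.
    for n in (3, 5, 10):
        print(n, estimate_power(n, fold_change=2.0))
```

Lowering `mean` mimics a weakly expressed transcript or a shallower sequencing depth, which is one way to explore the trade-off between depth and sample size that the abstract reports.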
format Online
Article
Text
id pubmed-4201821
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-4201821 2015-11-01 Power analysis and sample size estimation for RNA-Seq differential expression Ching, Travers Huang, Sijia Garmire, Lana X. RNA Bioinformatics Cold Spring Harbor Laboratory Press 2014-11 /pmc/articles/PMC4201821/ /pubmed/25246651 http://dx.doi.org/10.1261/rna.046011.114 Text en © 2014 Ching et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Power analysis and sample size estimation for RNA-Seq differential expression
topic Bioinformatics