
On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments

BACKGROUND: In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation and the statistical...


Bibliographic Details
Main Authors: Assefa, Alemu Takele, Vandesompele, Jo, Thas, Olivier
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7168886/
https://www.ncbi.nlm.nih.gov/pubmed/32306892
http://dx.doi.org/10.1186/s12864-020-6721-y
author Assefa, Alemu Takele
Vandesompele, Jo
Thas, Olivier
collection PubMed
description BACKGROUND: In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation and the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and tunable dynamic range, the adequacy and empirical validation of RNA sample pooling strategies have not yet been evaluated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated RNA-seq datasets. RESULTS: The data generating model in pooled experiments is defined mathematically to evaluate the mean and variability of gene expression estimates. The model is further used to examine the trade-off between the statistical power of testing for DGE and the data generating costs. Empirical assessment of pooling strategies is done through analysis of RNA-seq datasets under various pooling and non-pooling experimental settings. A simulation study is also used to rank experimental scenarios with respect to the rate of false and true discoveries in DGE analysis. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, pool size and sequencing depth are optimally defined. CONCLUSION: For high within-group gene expression variability, small RNA sample pools are effective in reducing the variability and compensating for the loss of the number of replicates. Unlike the typical cost-saving strategies, such as reducing sequencing depth or number of RNA samples (replicates), an adequate pooling strategy is effective in maintaining the power of testing DGE for genes with low to medium abundance levels, along with a substantial reduction of the total cost of the experiment. In general, pooling RNA samples or pooling RNA samples in conjunction with moderate reduction of the sequencing depth can be good options to optimize the cost and maintain the power.
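The trade-off described in the abstract, where pooling reduces variability while lowering the number of sequenced libraries, can be illustrated with a minimal sketch. This is not the authors' data generating model: it assumes an idealized setting in which each pool's measurement is the average of its members on a log-expression scale, with hypothetical variance components `sigma_b2` (biological) and `sigma_t2` (technical) and hypothetical unit costs. The function names are likewise invented for illustration.

```python
def pooled_mean_variance(sigma_b2, sigma_t2, n_pools, pool_size):
    """Variance of a group-mean estimate under an idealized averaging model.

    Assumption (not from the source): each library measures the average of
    `pool_size` individuals, so biological variance shrinks by 1/pool_size,
    while technical variance is incurred once per sequenced library.
    """
    if n_pools < 1 or pool_size < 1:
        raise ValueError("n_pools and pool_size must be at least 1")
    per_library_variance = sigma_b2 / pool_size + sigma_t2
    return per_library_variance / n_pools


def experiment_cost(n_pools, pool_size, cost_per_sample, cost_per_library):
    """Total cost: one sample-preparation fee per individual plus one
    library/sequencing fee per pool (hypothetical cost structure)."""
    n_samples = n_pools * pool_size
    return n_samples * cost_per_sample + n_pools * cost_per_library


# Hypothetical numbers chosen only to illustrate the comparison.
designs = {
    "8 individual libraries": (8, 1),
    "4 pools of 3 samples": (4, 3),
}
for name, (n_pools, pool_size) in designs.items():
    var = pooled_mean_variance(1.0, 0.05, n_pools, pool_size)
    cost = experiment_cost(n_pools, pool_size, 50, 300)
    print(f"{name}: variance of group mean = {var:.4f}, cost = {cost}")
```

Under these assumed inputs, the pooled design sequences fewer libraries yet yields a smaller variance for the group mean, which is the qualitative behaviour the abstract attributes to small pools when within-group variability is high. Real RNA-seq counts are not Gaussian and pool members rarely contribute equally, so the sketch only conveys the direction of the effect.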
format Online
Article
Text
id pubmed-7168886
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7168886 2020-04-23 On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments Assefa, Alemu Takele Vandesompele, Jo Thas, Olivier BMC Genomics Research Article BioMed Central 2020-04-19 /pmc/articles/PMC7168886/ /pubmed/32306892 http://dx.doi.org/10.1186/s12864-020-6721-y Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments
topic Research Article