On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments
BACKGROUND: In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation as well as the statistical...
Main authors: | Assefa, Alemu Takele; Vandesompele, Jo; Thas, Olivier |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7168886/ https://www.ncbi.nlm.nih.gov/pubmed/32306892 http://dx.doi.org/10.1186/s12864-020-6721-y |
_version_ | 1783523734688104448 |
---|---|
author | Assefa, Alemu Takele Vandesompele, Jo Thas, Olivier |
author_facet | Assefa, Alemu Takele Vandesompele, Jo Thas, Olivier |
author_sort | Assefa, Alemu Takele |
collection | PubMed |
description | BACKGROUND: In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation as well as the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and tunable dynamic range, the adequacy and empirical validation of RNA sample pooling strategies have not yet been evaluated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated RNA-seq datasets. RESULT: The data generating model in pooled experiments is defined mathematically to evaluate the mean and variability of gene expression estimates. The model is further used to examine the trade-off between the statistical power of testing for DGE and the data generating costs. Empirical assessment of pooling strategies is done through analysis of RNA-seq datasets under various pooling and non-pooling experimental settings. Simulation study is also used to rank experimental scenarios with respect to the rate of false and true discoveries in DGE analysis. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, pool size and sequencing depth are optimally defined. CONCLUSION: For high within-group gene expression variability, small RNA sample pools are effective to reduce the variability and compensate for the loss of the number of replicates. Unlike the typical cost-saving strategies, such as reducing sequencing depth or number of RNA samples (replicates), an adequate pooling strategy is effective in maintaining the power of testing DGE for genes with low to medium abundance levels, along with a substantial reduction of the total cost of the experiment. In general, pooling RNA samples or pooling RNA samples in conjunction with moderate reduction of the sequencing depth can be good options to optimize the cost and maintain the power. |
format | Online Article Text |
id | pubmed-7168886 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7168886 2020-04-23 On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments Assefa, Alemu Takele Vandesompele, Jo Thas, Olivier BMC Genomics Research Article BACKGROUND: In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation as well as the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and tunable dynamic range, the adequacy and empirical validation of RNA sample pooling strategies have not yet been evaluated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated RNA-seq datasets. RESULT: The data generating model in pooled experiments is defined mathematically to evaluate the mean and variability of gene expression estimates. The model is further used to examine the trade-off between the statistical power of testing for DGE and the data generating costs. Empirical assessment of pooling strategies is done through analysis of RNA-seq datasets under various pooling and non-pooling experimental settings. Simulation study is also used to rank experimental scenarios with respect to the rate of false and true discoveries in DGE analysis. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, pool size and sequencing depth are optimally defined. CONCLUSION: For high within-group gene expression variability, small RNA sample pools are effective to reduce the variability and compensate for the loss of the number of replicates. Unlike the typical cost-saving strategies, such as reducing sequencing depth or number of RNA samples (replicates), an adequate pooling strategy is effective in maintaining the power of testing DGE for genes with low to medium abundance levels, along with a substantial reduction of the total cost of the experiment. In general, pooling RNA samples or pooling RNA samples in conjunction with moderate reduction of the sequencing depth can be good options to optimize the cost and maintain the power. BioMed Central 2020-04-19 /pmc/articles/PMC7168886/ /pubmed/32306892 http://dx.doi.org/10.1186/s12864-020-6721-y Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Assefa, Alemu Takele Vandesompele, Jo Thas, Olivier On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments |
title | On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments |
title_sort | on the utility of rna sample pooling to optimize cost and statistical power in rna sequencing experiments |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7168886/ https://www.ncbi.nlm.nih.gov/pubmed/32306892 http://dx.doi.org/10.1186/s12864-020-6721-y |
work_keys_str_mv | AT assefaalemutakele ontheutilityofrnasamplepoolingtooptimizecostandstatisticalpowerinrnasequencingexperiments AT vandesompelejo ontheutilityofrnasamplepoolingtooptimizecostandstatisticalpowerinrnasequencingexperiments AT thasolivier ontheutilityofrnasamplepoolingtooptimizecostandstatisticalpowerinrnasequencingexperiments |
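
The variance argument in the abstract (small pools reduce variability and can compensate for fewer replicates) can be illustrated with a minimal sketch. This is not the authors' count-based model. It assumes, purely for illustration, that a pool's log-expression is the average of `pool_size` individuals with biological standard deviation `bio_sd`, plus independent library-level technical noise with standard deviation `tech_sd`. All function and parameter names are hypothetical.

```python
import random
import statistics


def theoretical_variance(n_pools, pool_size, bio_sd, tech_sd):
    """Variance of the group mean under the assumed additive model.

    Averaging pool_size individuals divides the biological variance by
    pool_size; averaging n_pools libraries divides the total by n_pools.
    """
    return (bio_sd ** 2 / pool_size + tech_sd ** 2) / n_pools


def simulated_variance(n_pools, pool_size, bio_sd, tech_sd, n_sim=20000, seed=1):
    """Monte Carlo estimate of the same quantity (assumed Gaussian noise)."""
    rng = random.Random(seed)
    group_means = []
    for _ in range(n_sim):
        libraries = []
        for _ in range(n_pools):
            # One pool: mean of pool_size individual expression values.
            pooled = sum(rng.gauss(0.0, bio_sd) for _ in range(pool_size)) / pool_size
            # One sequencing library per pool adds technical noise once.
            libraries.append(pooled + rng.gauss(0.0, tech_sd))
        group_means.append(sum(libraries) / n_pools)
    return statistics.pvariance(group_means)


if __name__ == "__main__":
    scenarios = {
        "12 individual libraries": (12, 1),
        "4 pools of 3 samples": (4, 3),
        "4 individual libraries": (4, 1),
    }
    for label, (n_pools, pool_size) in scenarios.items():
        theory = theoretical_variance(n_pools, pool_size, bio_sd=1.0, tech_sd=0.2)
        sim = simulated_variance(n_pools, pool_size, bio_sd=1.0, tech_sd=0.2)
        print(f"{label}: theory={theory:.4f} simulated={sim:.4f}")
```

Under these assumed values (`bio_sd=1.0`, `tech_sd=0.2`), the formula gives about 0.087 for 12 individual libraries, about 0.093 for 4 pools of 3 samples, and 0.26 for 4 individual libraries. In this toy setting, pooling keeps the precision of the group mean close to the full design while using a third of the libraries, whereas simply dropping replicates triples the variance.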