
Size matters: how sample size affects the reproducibility and specificity of gene set analysis

BACKGROUND: Gene set analysis is a well-established approach for interpretation of data from high-throughput gene expression studies. Achieving reproducible results is an essential requirement in such studies. One factor of a gene expression experiment that can affect reproducibility is the choice o...


Bibliographic Details
Main authors: Maleki, Farhad; Ovens, Katie; McQuillan, Ian; Kusalik, Anthony J.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805317/
https://www.ncbi.nlm.nih.gov/pubmed/31639047
http://dx.doi.org/10.1186/s40246-019-0226-2
_version_ 1783461353993797632
author Maleki, Farhad
Ovens, Katie
McQuillan, Ian
Kusalik, Anthony J.
author_sort Maleki, Farhad
collection PubMed
description BACKGROUND: Gene set analysis is a well-established approach for interpretation of data from high-throughput gene expression studies. Achieving reproducible results is an essential requirement in such studies. One factor of a gene expression experiment that can affect reproducibility is the choice of sample size. However, choosing an appropriate sample size can be difficult, especially because the choice may be method-dependent. Further, sample size choice can have unexpected effects on specificity. RESULTS: In this paper, we report on a systematic, quantitative approach to study the effect of sample size on the reproducibility of the results from 13 gene set analysis methods. We also investigate the impact of sample size on the specificity of these methods. Rather than relying on synthetic data, the proposed approach uses real expression datasets to offer an accurate and reliable evaluation. CONCLUSION: Our findings show that, as a general pattern, the results of gene set analysis become more reproducible as sample size increases. However, the extent of reproducibility and the rate at which it increases vary from method to method. In addition, even in the absence of differential expression, some gene set analysis methods report a large number of false positives, and increasing sample size does not lead to reducing these false positives. The results of this research can be used when selecting a gene set analysis method from those available. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40246-019-0226-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6805317
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6805317 2019-10-24 Size matters: how sample size affects the reproducibility and specificity of gene set analysis Maleki, Farhad; Ovens, Katie; McQuillan, Ian; Kusalik, Anthony J. Hum Genomics Research
BioMed Central 2019-10-22 /pmc/articles/PMC6805317/ /pubmed/31639047 http://dx.doi.org/10.1186/s40246-019-0226-2 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Size matters: how sample size affects the reproducibility and specificity of gene set analysis
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805317/
https://www.ncbi.nlm.nih.gov/pubmed/31639047
http://dx.doi.org/10.1186/s40246-019-0226-2