Effects of Sample Size on Differential Gene Expression, Rank Order and Prediction Accuracy of a Gene Signature

Top differentially expressed gene lists are often inconsistent between studies and it has been suggested that small sample sizes contribute to lack of reproducibility and poor prediction accuracy in discriminative models. We considered sex differences (69♂, 65♀) in 134 human skeletal muscle biopsies...

Bibliographic Details
Main authors: Stretch, Cynthia; Khan, Sheehan; Asgarian, Nasimeh; Eisner, Roman; Vaisipour, Saman; Damaraju, Sambasivarao; Graham, Kathryn; Bathe, Oliver F.; Steed, Helen; Greiner, Russell; Baracos, Vickie E.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3670871/
https://www.ncbi.nlm.nih.gov/pubmed/23755224
http://dx.doi.org/10.1371/journal.pone.0065380
_version_ 1782271898096238592
author Stretch, Cynthia
Khan, Sheehan
Asgarian, Nasimeh
Eisner, Roman
Vaisipour, Saman
Damaraju, Sambasivarao
Graham, Kathryn
Bathe, Oliver F.
Steed, Helen
Greiner, Russell
Baracos, Vickie E.
author_sort Stretch, Cynthia
collection PubMed
description Top differentially expressed gene lists are often inconsistent between studies, and it has been suggested that small sample sizes contribute to lack of reproducibility and poor prediction accuracy in discriminative models. We considered sex differences (69♂, 65♀) in 134 human skeletal muscle biopsies using DNA microarray. The full dataset and subsamples thereof (n = 10 (5♂, 5♀) to n = 120 (60♂, 60♀)) were used to assess the effect of sample size on the differential expression of single genes, gene rank order and prediction accuracy. Using our full dataset (n = 134), we identified 717 differentially expressed transcripts (p<0.0001) and we were able to predict sex with ∼90% accuracy, both within our dataset and on external datasets. Both p-values and rank order of top differentially expressed genes became more variable with smaller subsamples. For example, at n = 10 (5♂, 5♀), no gene was considered differentially expressed at p<0.0001 and prediction accuracy was ∼50% (no better than chance). We found that sample size clearly affects microarray analysis results; small sample sizes result in unstable gene lists and poor prediction accuracy. We anticipate this will apply to other phenotypes, in addition to sex.
format Online
Article
Text
id pubmed-3670871
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3670871 2013-06-10 PLoS One Research Article
Public Library of Science 2013-06-03 /pmc/articles/PMC3670871/ /pubmed/23755224 http://dx.doi.org/10.1371/journal.pone.0065380 Text en © 2013 Stretch et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Effects of Sample Size on Differential Gene Expression, Rank Order and Prediction Accuracy of a Gene Signature
topic Research Article