Sample size for detecting differentially expressed genes in microarray experiments

BACKGROUND: Microarray experiments are often performed with a small number of biological replicates, resulting in low statistical power for detecting differentially expressed genes and concomitant high false positive rates. While increasing sample size can increase statistical power and decrease err...

Bibliographic Details
Main Authors:	Wei, Caimiao; Li, Jiangning; Bumgarner, Roger E
Format:	Text
Language:	English
Published:	BioMed Central 2004
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC533874/ https://www.ncbi.nlm.nih.gov/pubmed/15533245 http://dx.doi.org/10.1186/1471-2164-5-87

author	Wei, Caimiao; Li, Jiangning; Bumgarner, Roger E
collection	PubMed
description	BACKGROUND: Microarray experiments are often performed with a small number of biological replicates, resulting in low statistical power for detecting differentially expressed genes and concomitant high false positive rates. While increasing sample size can increase statistical power and decrease error rates, with too many samples, valuable resources are not used efficiently. The issue of how many replicates are required in a typical experimental system needs to be addressed. Of particular interest is the difference in required sample sizes for similar experiments in inbred vs. outbred populations (e.g. mouse and rat vs. human). RESULTS: We hypothesize that if all other factors (assay protocol, microarray platform, data pre-processing) were equal, fewer individuals would be needed for the same statistical power using inbred animals as opposed to unrelated human subjects, as genetic effects on gene expression will be removed in the inbred populations. We apply the same normalization algorithm and estimate the variance of gene expression for a variety of cDNA data sets (humans, inbred mice and rats) comparing two conditions. Using one-sample, paired-sample, or two-independent-sample t-tests, we calculate the sample sizes required to detect 1.5-, 2-, and 4-fold changes in expression level as a function of false positive rate, power, and percentage of genes that have a standard deviation below a given percentile. CONCLUSIONS: Factors that affect power and sample size calculations include variability of the population, the desired detectable differences, the power to detect the differences, and an acceptable error rate. In addition, experimental design, technical variability and data pre-processing play a role in the power of the statistical tests in microarrays. We show that the number of samples required for detecting a 2-fold change with 90% probability and a p-value of 0.01 in humans is much larger than the number of samples commonly used in present-day studies, and that far fewer individuals are needed for the same statistical power when using inbred animals rather than unrelated human subjects.
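The dependence of sample size on fold change, false positive rate, power, and standard deviation described in the abstract can be illustrated with a rough sketch. This is not the authors' procedure: it swaps in the standard normal approximation to the t-test sample-size formula (which slightly underestimates n when n is small), works on log2 expression ratios, and uses a hypothetical standard deviation of 0.5 log2 units chosen only for illustration. The function name and parameters are assumptions.

```python
import math
from statistics import NormalDist


def samples_needed(fold_change, sd_log2, alpha=0.01, power=0.90, two_groups=True):
    """Approximate sample size to detect a given fold change.

    Normal approximation to the two-sided t-test (an assumption, not the
    paper's exact method). Returns n per group when two_groups=True,
    otherwise n for a one-sample or paired design.
    """
    if fold_change <= 1 or sd_log2 <= 0:
        raise ValueError("fold_change must exceed 1 and sd_log2 must be positive")
    delta = math.log2(fold_change)            # detectable difference on the log2 scale
    z = NormalDist()                          # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)        # two-sided false positive rate
    z_beta = z.inv_cdf(power)                 # desired power
    factor = 2.0 if two_groups else 1.0       # two independent groups double the variance
    n = factor * ((z_alpha + z_beta) * sd_log2 / delta) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical SD of 0.5 log2 units; real values must be estimated from data.
    for fc in (1.5, 2, 4):
        print(fc, samples_needed(fc, sd_log2=0.5))
```

Because n scales with the square of the standard deviation, a population with twice the variability (for example, unrelated human subjects compared with inbred animals) needs roughly four times as many samples under this approximation.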
format	Text
id	pubmed-533874
institution	National Center for Biotechnology Information
language	English
publishDate	2004
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-533874 2004-11-26 BMC Genomics Research Article BioMed Central 2004-11-08 /pmc/articles/PMC533874/ /pubmed/15533245 http://dx.doi.org/10.1186/1471-2164-5-87 Text en Copyright © 2004 Wei et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Sample size for detecting differentially expressed genes in microarray experiments
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC533874/ https://www.ncbi.nlm.nih.gov/pubmed/15533245 http://dx.doi.org/10.1186/1471-2164-5-87