
Power and sample size estimation in microarray studies

BACKGROUND: Before conducting a microarray experiment, one important issue that needs to be determined is the number of arrays required in order to have adequate power to identify differentially expressed genes. This paper discusses some crucial issues in the problem formulation, parameter specifica...


Bibliographic Details
Main Authors:	Lin, Wei-Jiun; Hsueh, Huey-Miin; Chen, James J
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Methodology article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2837028/ https://www.ncbi.nlm.nih.gov/pubmed/20100337 http://dx.doi.org/10.1186/1471-2105-11-48

author	Lin, Wei-Jiun; Hsueh, Huey-Miin; Chen, James J
author_facet	Lin, Wei-Jiun; Hsueh, Huey-Miin; Chen, James J
author_sort	Lin, Wei-Jiun
collection	PubMed
description	BACKGROUND: Before conducting a microarray experiment, one important issue that needs to be determined is the number of arrays required to have adequate power to identify differentially expressed genes. This paper discusses some crucial issues in the problem formulation, parameter specifications, and approaches that are commonly proposed for sample size estimation in microarray experiments. Common methods for sample size estimation are formulated as the minimum sample size necessary to achieve a specified sensitivity (proportion of detected truly differentially expressed genes) on average at a specified false discovery rate (FDR) level and specified expected proportion (π₁) of the truly differentially expressed genes in the array. Unfortunately, the probability of detecting the specified sensitivity in such a formulation can be low. We formulate the sample size problem as the number of arrays needed to achieve a specified sensitivity with 95% probability at the specified significance level. A permutation method using a small pilot dataset to estimate sample size is proposed. This method accounts for correlation and effect size heterogeneity among genes. RESULTS: A sample size estimate based on the common formulation, to achieve the desired sensitivity on average, can be calculated using a univariate method without taking the correlation among genes into consideration. This formulation of the sample size problem is inadequate because the probability of detecting the specified sensitivity can be lower than 50%. On the other hand, the needed sample size calculated by the proposed permutation method will ensure detecting at least the desired sensitivity with 95% probability. The method is shown to perform well for a real example dataset using a small pilot dataset with 4-6 samples per group.
CONCLUSIONS: We recommend that the sample size problem should be formulated to detect a specified proportion of differentially expressed genes with 95% probability. This formulation ensures finding the desired proportion of true positives with high probability. The proposed permutation method takes the correlation structure and effect size heterogeneity into consideration and works well using only a small pilot dataset.
format	Text
id	pubmed-2837028
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2837028 2010-03-12 Power and sample size estimation in microarray studies Lin, Wei-Jiun Hsueh, Huey-Miin Chen, James J BMC Bioinformatics Methodology article BioMed Central 2010-01-25 /pmc/articles/PMC2837028/ /pubmed/20100337 http://dx.doi.org/10.1186/1471-2105-11-48 Text en Copyright ©2010 Lin et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Power and sample size estimation in microarray studies
topic	Methodology article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2837028/ https://www.ncbi.nlm.nih.gov/pubmed/20100337 http://dx.doi.org/10.1186/1471-2105-11-48
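
The abstract contrasts two ways of posing the sample size question: reaching a target sensitivity on average, versus reaching it with 95% probability. The Python sketch below is a rough illustration of that contrast only. It is not the authors' permutation method and uses no pilot data. Everything in it is an assumption made for illustration: the function names `simulate_sensitivity` and `min_n`, a z-test with known unit variance, one common standardized effect size `delta` for all truly differentially expressed genes, a fixed per-gene significance level `alpha`, and an equicorrelation parameter `rho` modelled with a single shared random factor.

```python
import math
import random
from statistics import NormalDist


def simulate_sensitivity(n, m1=100, delta=1.5, alpha=0.001, rho=0.0,
                         reps=500, seed=0):
    """Return the sensitivity observed in each of `reps` simulated experiments.

    n     : arrays per group (two-group comparison)
    m1    : number of truly differentially expressed genes (hypothetical)
    delta : standardized effect size shared by those genes (hypothetical)
    rho   : pairwise correlation between gene-level test statistics
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided cutoff
    shift = delta * math.sqrt(n / 2)               # mean of the z-statistic
    a, b = math.sqrt(rho), math.sqrt(1 - rho)
    sensitivities = []
    for _ in range(reps):
        u = rng.gauss(0, 1)                        # factor shared by all genes
        detected = sum(
            abs(shift + a * u + b * rng.gauss(0, 1)) > z_crit
            for _ in range(m1)
        )
        sensitivities.append(detected / m1)
    return sensitivities


def min_n(criterion, target=0.8, n_max=100, **kwargs):
    """Smallest n per group meeting the target sensitivity.

    criterion="average"     : mean sensitivity >= target
    criterion="probability" : sensitivity >= target in at least 95% of runs
    """
    for n in range(2, n_max + 1):
        sens = simulate_sensitivity(n, **kwargs)
        if criterion == "average":
            ok = sum(sens) / len(sens) >= target
        else:
            ok = sum(s >= target for s in sens) / len(sens) >= 0.95
        if ok:
            return n
    return None  # target not reachable within n_max


if __name__ == "__main__":
    for rho in (0.0, 0.3):
        print("rho =", rho,
              "| n (on average):", min_n("average", rho=rho),
              "| n (95% probability):", min_n("probability", rho=rho))
```

In this toy model the marginal detection probability of each gene does not depend on `rho`, which is consistent with the abstract's remark that the on-average formulation can be handled by a univariate calculation. Correlation changes only how widely the realized sensitivity varies from one experiment to the next, and that spread is what the 95% probability formulation responds to.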
