Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data

BACKGROUND: Critical to the development of molecular signatures from microarray and other high-throughput data is testing the statistical significance of the produced signature in order to ensure its statistical reproducibility. While current best practices emphasize sufficiently powered univariate...

Bibliographic Details
Main Authors: Aliferis, Constantin F., Statnikov, Alexander, Tsamardinos, Ioannis, Schildcrout, Jonathan S., Shepherd, Bryan E., Harrell, Frank E.
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2654113/
https://www.ncbi.nlm.nih.gov/pubmed/19290050
http://dx.doi.org/10.1371/journal.pone.0004922
author Aliferis, Constantin F.
Statnikov, Alexander
Tsamardinos, Ioannis
Schildcrout, Jonathan S.
Shepherd, Bryan E.
Harrell, Frank E.
collection PubMed
description BACKGROUND: Critical to the development of molecular signatures from microarray and other high-throughput data is testing the statistical significance of the produced signature in order to ensure its statistical reproducibility. While current best practices emphasize sufficiently powered univariate tests of differential expression, little is known about the factors that affect the statistical power of complex multivariate analysis protocols for high-dimensional molecular signature development. METHODOLOGY/PRINCIPAL FINDINGS: We show that choices of specific components of the analysis (i.e., error metric, classifier, error estimator and event balancing) have large and compounding effects on statistical power. The effects are demonstrated empirically by an analysis of 7 of the largest microarray cancer outcome prediction datasets and supplementary simulations, and by contrasting them to prior analyses of the same data. CONCLUSIONS/SIGNIFICANCE: The findings of the present study have two important practical implications. First, by avoiding under-powered data analysis protocols, high-throughput studies can achieve substantial economies in the sample required to demonstrate statistical significance of predictive signal. Factors that affect power are identified and studied. Much smaller samples than previously thought may be sufficient for exploratory studies as long as these factors are taken into consideration when designing and executing the analysis. Second, previous highly cited claims that microarray assays may not be able to predict disease outcomes better than chance are shown by our experiments to be due to under-powered data analysis combined with inappropriate statistical tests.
format Text
id pubmed-2654113
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2654113 2009-03-17 PLoS One Research Article Public Library of Science 2009-03-17 /pmc/articles/PMC2654113/ /pubmed/19290050 http://dx.doi.org/10.1371/journal.pone.0004922 Text en Aliferis et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data
topic Research Article