P-values in genomics: Apparent precision masks high uncertainty

Bibliographic Details
Main authors: Lazzeroni, L C, Lu, Y, Belitskaya-Lévy, I
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255087/
https://www.ncbi.nlm.nih.gov/pubmed/24419042
http://dx.doi.org/10.1038/mp.2013.184
Description
Summary: Scientists often interpret P-values as measures of the relative strength of statistical findings. This is common practice in large-scale genomic studies where P-values are used to choose which of numerous hypothesis test results should be pursued in subsequent research. In this study, we examine P-value variability to assess the degree of certainty P-values provide. We develop prediction intervals for the P-value in a replication study given the P-value observed in an initial study. The intervals depend on the initial value of P and the ratio of sample sizes between the initial and replication studies, but not on the underlying effect size or initial sample size. The intervals are valid for most large-sample statistical tests in any context, and can be used in the presence of single or multiple tests. While P-values are highly variable, future P-value variability can be explicitly predicted based on a P-value from an initial study. The relative size of the replication and initial study is an important predictor of the P-value in a subsequent replication study. We provide a handy calculator implementing these results and apply them to a study of Alzheimer's disease and recent findings of the Cross-Disorder Group of the Psychiatric Genomics Consortium. This study suggests that overinterpretation of very significant, but highly variable, P-values is an important factor contributing to the unexpectedly high incidence of non-replication. Formal prediction intervals can also provide realistic interpretations and comparisons of P-values associated with different estimated effect sizes and sample sizes.
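
This record does not reproduce the paper's formulas or its calculator, so the following is only a minimal sketch of how a prediction interval of this kind can be computed. It assumes one-sided P-values from an approximately normal test statistic: if the initial statistic is z1 and the replication has `size_ratio` times as many samples, then z2 - sqrt(size_ratio) * z1 has mean zero and variance 1 + size_ratio, whatever the true effect size. The function names `upper_tail` and `replication_p_interval`, the parameter names, and the one-sided convention are illustrative assumptions, not the authors' implementation.

```python
from math import erfc, sqrt
from statistics import NormalDist


def upper_tail(z):
    """One-sided (upper-tail) P-value for a standard normal statistic z."""
    return 0.5 * erfc(z / sqrt(2.0))


def replication_p_interval(p_initial, size_ratio=1.0, coverage=0.95):
    """Prediction interval for a replication P-value (illustrative sketch).

    p_initial  : one-sided P-value observed in the initial study (assumed)
    size_ratio : replication sample size divided by initial sample size
    coverage   : nominal coverage of the prediction interval
    Returns (lower_p, upper_p).
    """
    std_normal = NormalDist()
    z_initial = -std_normal.inv_cdf(p_initial)         # z-score behind p_initial
    z_crit = std_normal.inv_cdf(0.5 + coverage / 2.0)  # two-sided critical value
    centre = sqrt(size_ratio) * z_initial              # predicted replication z
    half_width = z_crit * sqrt(1.0 + size_ratio)       # prediction uncertainty
    # A larger z means a smaller P-value, so the interval ends swap.
    return upper_tail(centre + half_width), upper_tail(centre - half_width)


if __name__ == "__main__":
    for ratio in (0.5, 1.0, 4.0):
        low, high = replication_p_interval(1e-5, size_ratio=ratio)
        print(f"size ratio {ratio}: [{low:.3g}, {high:.3g}]")
```

Under these assumptions the interval depends only on the initial P-value and the sample-size ratio, matching the dependence described in the summary. Running the loop shows how widely a replication P-value can range even when the initial result looks very significant.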