P-values in genomics: Apparent precision masks high uncertainty

Scientists often interpret P-values as measures of the relative strength of statistical findings. This is common practice in large-scale genomic studies where P-values are used to choose which of numerous hypothesis test results should be pursued in subsequent research. In this study, we examine P-v...

Bibliographic Details
Main Authors: Lazzeroni, L C, Lu, Y, Belitskaya-Lévy, I
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255087/
https://www.ncbi.nlm.nih.gov/pubmed/24419042
http://dx.doi.org/10.1038/mp.2013.184
author Lazzeroni, L C
Lu, Y
Belitskaya-Lévy, I
collection PubMed
description Scientists often interpret P-values as measures of the relative strength of statistical findings. This is common practice in large-scale genomic studies where P-values are used to choose which of numerous hypothesis test results should be pursued in subsequent research. In this study, we examine P-value variability to assess the degree of certainty P-values provide. We develop prediction intervals for the P-value in a replication study given the P-value observed in an initial study. The intervals depend on the initial value of P and the ratio of sample sizes between the initial and replication studies, but not on the underlying effect size or initial sample size. The intervals are valid for most large-sample statistical tests in any context, and can be used in the presence of single or multiple tests. While P-values are highly variable, future P-value variability can be explicitly predicted based on a P-value from an initial study. The relative size of the replication and initial study is an important predictor of the P-value in a subsequent replication study. We provide a handy calculator implementing these results and apply them to a study of Alzheimer's disease and recent findings of the Cross-Disorder Group of the Psychiatric Genomics Consortium. This study suggests that overinterpretation of very significant, but highly variable, P-values is an important factor contributing to the unexpectedly high incidence of non-replication. Formal prediction intervals can also provide realistic interpretations and comparisons of P-values associated with different estimated effect sizes and sample sizes.
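As a rough illustration of why such an interval can depend only on the initial P-value and the ratio of sample sizes, the sketch below uses a standard normal approximation. It assumes one-sided P-values, independent studies, and test statistics z1 ~ N(θ√n1, 1) and z2 ~ N(θ√n2, 1), so that z2 − √c·z1 ~ N(0, 1 + c) with c = n2/n1, whatever the effect size θ and initial sample size n1. This is an assumption-laden sketch, not the authors' calculator; the function name `replication_p_interval` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def replication_p_interval(p_initial, size_ratio=1.0, coverage=0.95):
    """Approximate prediction interval for a one-sided replication P-value.

    Hypothetical helper (not taken from the paper). Assumes both studies
    use large-sample z-type tests of the same effect and are independent.
    size_ratio is n_replication / n_initial.
    """
    if not 0.0 < p_initial < 1.0:
        raise ValueError("p_initial must lie strictly between 0 and 1")
    if size_ratio <= 0.0:
        raise ValueError("size_ratio must be positive")
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")

    # Convert the initial one-sided P-value to a z-score.
    # Using -inv_cdf(p) avoids the rounding loss of computing 1 - p.
    z_initial = -_STD_NORMAL.inv_cdf(p_initial)

    # Predicted replication z-score and its predictive standard deviation.
    center = sqrt(size_ratio) * z_initial
    half_width = _STD_NORMAL.inv_cdf(0.5 + coverage / 2.0) * sqrt(1.0 + size_ratio)

    # A larger z-score corresponds to a smaller P-value, so the bounds swap.
    p_lower = _STD_NORMAL.cdf(-(center + half_width))
    p_upper = _STD_NORMAL.cdf(-(center - half_width))
    return p_lower, p_upper


if __name__ == "__main__":
    # Example: an initial P-value of 5e-8 and an equally sized replication.
    low, high = replication_p_interval(5e-8, size_ratio=1.0)
    print(f"95% prediction interval for replication P: {low:.3g} to {high:.3g}")
```

Under these assumptions the interval spans many orders of magnitude even for a very small initial P-value, which is consistent with the abstract's point that apparently precise P-values carry high uncertainty.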
format Online
Article
Text
id pubmed-4255087
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4255087 2014-12-11 P-values in genomics: Apparent precision masks high uncertainty Lazzeroni, L C Lu, Y Belitskaya-Lévy, I Mol Psychiatry Original Article
Nature Publishing Group 2014-12 2014-01-14 /pmc/articles/PMC4255087/ /pubmed/24419042 http://dx.doi.org/10.1038/mp.2013.184 Text en Copyright © 2014 Macmillan Publishers Limited http://creativecommons.org/licenses/by-nc-nd/3.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
title P-values in genomics: Apparent precision masks high uncertainty
topic Original Article