The harmonic mean p-value for combining dependent tests

Analysis of “big data” frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywis...

Bibliographic Details
Main author: Wilson, Daniel J.
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2019
Subjects: Physical Sciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6347718/
https://www.ncbi.nlm.nih.gov/pubmed/30610179
http://dx.doi.org/10.1073/pnas.1814092116
author Wilson, Daniel J.
collection PubMed
description Analysis of “big data” frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywise error rate (FWER) is considered the strongest protection against false positives but makes it difficult to reach the multiple testing-corrected significance threshold. Here, I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using the generalized central limit theorem. I show that the HMP effortlessly combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human–pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all ways to group hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini–Hochberg procedure to detect significant hypotheses, although the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets, because it enhances the potential for scientific discovery.
format Online
Article
Text
id pubmed-6347718
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-6347718 2019-01-29 The harmonic mean p-value for combining dependent tests Wilson, Daniel J. Proc Natl Acad Sci U S A Physical Sciences National Academy of Sciences 2019-01-22 2019-01-04 /pmc/articles/PMC6347718/ /pubmed/30610179 http://dx.doi.org/10.1073/pnas.1814092116 Text en Copyright © 2019 the Author(s). Published by PNAS. http://creativecommons.org/licenses/by/4.0/ This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY) (http://creativecommons.org/licenses/by/4.0/).
title The harmonic mean p-value for combining dependent tests
topic Physical Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6347718/
https://www.ncbi.nlm.nih.gov/pubmed/30610179
http://dx.doi.org/10.1073/pnas.1814092116
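
For readers unfamiliar with the statistic named in the title, the following is a minimal sketch of a weighted harmonic mean of p-values, sum(w) / sum(w / p). The function name `harmonic_mean_p`, its parameters, and the example values are illustrative assumptions and do not come from the record or the article. The sketch computes only the raw statistic; it does not reproduce the significance thresholds or any calibration the article may apply.

```python
from typing import Optional, Sequence


def harmonic_mean_p(p_values: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Weighted harmonic mean of p-values: sum(w) / sum(w / p).

    Hypothetical helper for illustration only. Equal weights are
    assumed when none are given.
    """
    if len(p_values) == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    # p = 0 would divide by zero; values above 1 are not valid p-values.
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


if __name__ == "__main__":
    # Made-up p-values, not data from the article.
    example = [0.01, 0.02, 0.04]
    print(harmonic_mean_p(example))
```

Because each term is 1/p, the smallest p-values dominate the sum, so the harmonic mean sits closer to the smallest input than an arithmetic mean would.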