
Efficient p-value estimation in massively parallel testing problems

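We present a new method to efficiently estimate very large numbers of p-values using empirically constructed null distributions of a test statistic. The need to evaluate a very large number of p-values is increasingly common with modern genomic data, and when interaction effects are of interest, the number of tests can easily run into billions. When the asymptotic distribution is not easily available, permutations are typically used to obtain p-values, but these can be computationally infeasible in large problems. Our method constructs a prediction model to obtain a first approximation to the p-values and uses Bayesian methods to choose a fraction of these to be refined by permutations. We apply and evaluate our method on the study of association between 2-way interactions of genetic markers and colorectal cancer using the data from the first phase of a large, genome-wide case–control study. The results show enormous computational savings as compared to evaluating a full set of permutations, with little decrease in accuracy.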
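To illustrate what an empirically constructed null distribution of a test statistic involves, here is a minimal Python sketch of a plain permutation p-value for a single test. It is not the authors' method: it has no prediction model and no Bayesian selection step. The function name `permutation_p_value`, the absolute difference in group means used as the test statistic, and the parameters `n_perm` and `seed` are all assumptions made for this example.

```python
import random


def permutation_p_value(cases, controls, n_perm=1000, seed=0):
    """Estimate a two-sided permutation p-value for a difference in means.

    Hypothetical helper for illustration only; the statistic (absolute
    difference in group means) is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    n_cases = len(cases)
    pooled = list(cases) + list(controls)

    def abs_mean_diff(values):
        # First n_cases entries play the role of cases, the rest controls.
        a, b = values[:n_cases], values[n_cases:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(pooled)

    # Build the empirical null by shuffling group labels.
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs_mean_diff(pooled) >= observed:
            exceed += 1

    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    p = permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4])
    print(f"estimated p-value: {p:.4f}")
```

Each call costs `n_perm` evaluations of the statistic, so repeating it for every test in a very large testing problem multiplies the cost accordingly.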


Bibliographic Details
Main Authors:	Kustra, Rafal; Shi, Xiaofei; Murdoch, Duncan J.; Greenwood, Celia M. T.; Rangrej, Jagadish
Format:	Text
Language:	English
Published:	Oxford University Press 2008
Subjects:	Articles
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2536722/ https://www.ncbi.nlm.nih.gov/pubmed/18304995 http://dx.doi.org/10.1093/biostatistics/kxm053

author	Kustra, Rafal; Shi, Xiaofei; Murdoch, Duncan J.; Greenwood, Celia M. T.; Rangrej, Jagadish
author_facet	Kustra, Rafal; Shi, Xiaofei; Murdoch, Duncan J.; Greenwood, Celia M. T.; Rangrej, Jagadish
author_sort	Kustra, Rafal
collection	PubMed
description	We present a new method to efficiently estimate very large numbers of p-values using empirically constructed null distributions of a test statistic. The need to evaluate a very large number of p-values is increasingly common with modern genomic data, and when interaction effects are of interest, the number of tests can easily run into billions. When the asymptotic distribution is not easily available, permutations are typically used to obtain p-values but these can be computationally infeasible in large problems. Our method constructs a prediction model to obtain a first approximation to the p-values and uses Bayesian methods to choose a fraction of these to be refined by permutations. We apply and evaluate our method on the study of association between 2-way interactions of genetic markers and colorectal cancer using the data from the first phase of a large, genome-wide case–control study. The results show enormous computational savings as compared to evaluating a full set of permutations, with little decrease in accuracy.
format	Text
id	pubmed-2536722
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-2536722 2009-02-25 Biostatistics Articles Oxford University Press 2008-10 2008-02-27 /pmc/articles/PMC2536722/ /pubmed/18304995 http://dx.doi.org/10.1093/biostatistics/kxm053 Text en © 2008 The Authors. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Efficient p-value estimation in massively parallel testing problems
topic	Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2536722/ https://www.ncbi.nlm.nih.gov/pubmed/18304995 http://dx.doi.org/10.1093/biostatistics/kxm053
