FastPval: a fast and memory efficient program to calculate very low P-values from empirical distribution

Motivation: Resampling methods, such as permutation and bootstrap, have been widely used to generate an empirical distribution for assessing the statistical significance of a measurement. However, to obtain a very low P-value, a large size of resampling is required, where computing speed, memory and...

Bibliographic Details
Main Authors:	Li, Mulin Jun; Sham, Pak Chung; Wang, Junwen
Format:	Text
Language:	English
Published:	Oxford University Press 2010
Subjects:	Applications Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2971576/ https://www.ncbi.nlm.nih.gov/pubmed/20861029 http://dx.doi.org/10.1093/bioinformatics/btq540

author	Li, Mulin Jun; Sham, Pak Chung; Wang, Junwen
collection	PubMed
description	Motivation: Resampling methods, such as permutation and bootstrap, are widely used to generate an empirical distribution for assessing the statistical significance of a measurement. However, obtaining a very low P-value requires a very large number of resamples, at which point computing speed, memory and storage consumption become bottlenecks and the calculation can become infeasible, even on a computer cluster. Results: We have developed a multiple-stage P-value calculating program called FastPval that can efficiently calculate very low (down to 10^-9) P-values from a large number of resampled measurements. With only two input files and a few parameter settings from the user, the program can compute P-values from an empirical distribution very efficiently, even on a personal computer. When tested on the order of 10^9 resampled data points, our method uses only 52.94% of the time used by the conventional method, implemented with standard quicksort and binary search algorithms, and consumes only 0.11% of the memory and storage. Furthermore, our method can be applied to extra-large datasets on which the conventional method fails. The accuracy of the method was tested on data generated from Normal, Poisson and Gumbel distributions and was found to be no different from the exact ranking approach. Availability: The FastPval executable file, the Java GUI and source code, and the Java Web Start server with example data and introduction are available at http://wanglab.hku.hk/pvalue Contact: junwen@hku.hk Supplementary information: Supplementary data are available at Bioinformatics online and http://wanglab.hku.hk/pvalue/.
format	Text
id	pubmed-2971576
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-2971576 2010-11-04 Bioinformatics Applications Note Oxford University Press 2010-11-15 2010-09-21 /pmc/articles/PMC2971576/ /pubmed/20861029 http://dx.doi.org/10.1093/bioinformatics/btq540 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	FastPval: a fast and memory efficient program to calculate very low P-values from empirical distribution
topic	Applications Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2971576/ https://www.ncbi.nlm.nih.gov/pubmed/20861029 http://dx.doi.org/10.1093/bioinformatics/btq540
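
The abstract compares FastPval against a conventional baseline built from a full sort and binary search. As a rough illustration of that baseline only (not of FastPval's multiple-stage method, which this record does not describe), the following Python sketch ranks each observed value against the sorted resampled values. The function name `empirical_p_values`, the right-tail convention, and the use of Python's built-in sort in place of quicksort are assumptions made for this example.

```python
from bisect import bisect_left


def empirical_p_values(resampled, observed):
    """Right-tail empirical P-values via a full sort plus binary search.

    Hypothetical helper for illustration; assumes larger statistics are
    more extreme and that all resampled values fit in memory.
    """
    ranked = sorted(resampled)  # O(N log N) time, O(N) memory
    n = len(ranked)
    p_values = []
    for x in observed:
        # bisect_left gives the number of resampled values strictly below x,
        # so n minus that count is the number of values >= x.
        at_least = n - bisect_left(ranked, x)
        p_values.append(at_least / n)
    return p_values


if __name__ == "__main__":
    null_values = list(range(1, 11))  # toy stand-in for resampled statistics
    print(empirical_p_values(null_values, [10, 5.5, 0]))
```

With N resampled values the smallest nonzero P-value this approach can report is 1/N, which is why resolving P-values near 10^-9 needs on the order of 10^9 resamples, and why holding and sorting all of them becomes the bottleneck the abstract describes.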
