Global tests of P-values for multifactor dimensionality reduction models in selection of optimal number of target genes
BACKGROUND: Multifactor Dimensionality Reduction (MDR) is a popular and successful data mining method developed to characterize and detect nonlinear complex gene-gene interactions (epistasis) that are associated with disease susceptibility. Because MDR uses a combinatorial search strategy to detect...
Main Authors: | Dai, Hongying; Bhandary, Madhusudan; Becker, Mara; Leeder, J Steven; Gaedigk, Roger; Motsinger-Reif, Alison A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Methodology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3508622/ https://www.ncbi.nlm.nih.gov/pubmed/22616673 http://dx.doi.org/10.1186/1756-0381-5-3 |
_version_ | 1782251216184541184 |
---|---|
author | Dai, Hongying; Bhandary, Madhusudan; Becker, Mara; Leeder, J Steven; Gaedigk, Roger; Motsinger-Reif, Alison A |
author_facet | Dai, Hongying; Bhandary, Madhusudan; Becker, Mara; Leeder, J Steven; Gaedigk, Roger; Motsinger-Reif, Alison A |
author_sort | Dai, Hongying |
collection | PubMed |
description | BACKGROUND: Multifactor Dimensionality Reduction (MDR) is a popular and successful data mining method developed to characterize and detect nonlinear complex gene-gene interactions (epistasis) that are associated with disease susceptibility. Because MDR uses a combinatorial search strategy to detect interactions, several filtration techniques have been developed to remove genes (SNPs) that have no interactive effects prior to analysis. However, the cutoff values implemented for these filtration methods are arbitrary, so different choices of cutoff values lead to different selections of genes (SNPs). METHODS: We suggest incorporating a global test of p-values into filtration procedures to identify the optimal number of genes/SNPs for further MDR analysis, and we demonstrate this approach using a ReliefF filter technique. We compare the performance of different global testing procedures in this context, including the Kolmogorov-Smirnov test, the inverse chi-square test, the inverse normal test, the logit test, the Wilcoxon test and Tippett’s test. Additionally, we demonstrate the approach on a real data application with a candidate gene study of drug response in Juvenile Idiopathic Arthritis. RESULTS: Extensive simulations of correlated p-values show that the inverse chi-square test is the most appropriate test to incorporate into the screening approach to determine the optimal number of SNPs for the final MDR analysis. The Kolmogorov-Smirnov test shows highly inflated Type I error when p-values are highly correlated or when p-values peak near the center of the histogram. Tippett’s test has very low power when the effect size of GxG interactions is small. CONCLUSIONS: The proposed global tests can serve as a screening approach prior to individual tests to prevent false discovery. Strong power in small sample sizes and well-controlled Type I error in the absence of GxG interactions make global tests highly recommended in epistasis studies. |
format | Online Article Text |
id | pubmed-3508622 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3508622 2012-11-29 Global tests of P-values for multifactor dimensionality reduction models in selection of optimal number of target genes Dai, Hongying Bhandary, Madhusudan Becker, Mara Leeder, J Steven Gaedigk, Roger Motsinger-Reif, Alison A BioData Min Methodology BioMed Central 2012-05-22 /pmc/articles/PMC3508622/ /pubmed/22616673 http://dx.doi.org/10.1186/1756-0381-5-3 Text en Copyright ©2012 Dai et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Global tests of P-values for multifactor dimensionality reduction models in selection of optimal number of target genes |
title_sort | global tests of p-values for multifactor dimensionality reduction models in selection of optimal number of target genes |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3508622/ https://www.ncbi.nlm.nih.gov/pubmed/22616673 http://dx.doi.org/10.1186/1756-0381-5-3 |
work_keys_str_mv | AT daihongying globaltestsofpvaluesformultifactordimensionalityreductionmodelsinselectionofoptimalnumberoftargetgenes AT bhandarymadhusudan globaltestsofpvaluesformultifactordimensionalityreductionmodelsinselectionofoptimalnumberoftargetgenes AT beckermara globaltestsofpvaluesformultifactordimensionalityreductionmodelsinselectionofoptimalnumberoftargetgenes AT leederjsteven globaltestsofpvaluesformultifactordimensionalityreductionmodelsinselectionofoptimalnumberoftargetgenes AT gaedigkroger globaltestsofpvaluesformultifactordimensionalityreductionmodelsinselectionofoptimalnumberoftargetgenes AT motsingerreifalisona globaltestsofpvaluesformultifactordimensionalityreductionmodelsinselectionofoptimalnumberoftargetgenes |
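
Three of the global tests named in the description (the inverse chi-square test, the inverse normal test and Tippett's test) can be sketched in a few lines. This is a minimal illustration using their textbook forms, not the authors' implementation: it assumes independent p-values strictly between 0 and 1, whereas the study itself examines correlated p-values, and the function names and the example p-values are hypothetical.

```python
import math
from statistics import NormalDist


def _check(pvalues):
    """Reject empty input and p-values outside the open interval (0, 1)."""
    if not pvalues or any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("need at least one p-value, each strictly in (0, 1)")


def fisher_inverse_chi_square(pvalues):
    """Inverse chi-square (Fisher) test: -2 * sum(ln p) ~ chi2(2k) under H0."""
    _check(pvalues)
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def stouffer_inverse_normal(pvalues):
    """Inverse normal test: sum of normal quantiles of (1 - p), scaled by sqrt(k)."""
    _check(pvalues)
    std_normal = NormalDist()
    z = sum(std_normal.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - std_normal.cdf(z)


def tippett_minimum_p(pvalues):
    """Tippett's test: global p-value based on the smallest individual p-value."""
    _check(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)


# Hypothetical p-values for a handful of candidate SNPs (illustration only).
example = [0.01, 0.04, 0.20, 0.35, 0.60]
print("inverse chi-square:", fisher_inverse_chi_square(example))
print("inverse normal:    ", stouffer_inverse_normal(example))
print("Tippett:           ", tippett_minimum_p(example))
```

A small global p-value suggests that at least one of the combined p-values reflects a real effect. With correlated p-values, as in the simulations the description mentions, these independence-based null distributions are only approximate.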