A comparison of internal validation techniques for multifactor dimensionality reduction

BACKGROUND: It is hypothesized that common, complex diseases may be due to complex interactions between genetic and environmental factors, which are difficult to detect in high-dimensional data using traditional statistical approaches. Multifactor Dimensionality Reduction (MDR) is the most commonly...

Bibliographic Details
Main Authors: Winham, Stacey J, Slater, Andrew J, Motsinger-Reif, Alison A
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2920275/
https://www.ncbi.nlm.nih.gov/pubmed/20650002
http://dx.doi.org/10.1186/1471-2105-11-394
author Winham, Stacey J
Slater, Andrew J
Motsinger-Reif, Alison A
collection PubMed
description BACKGROUND: It is hypothesized that common, complex diseases may be due to complex interactions between genetic and environmental factors, which are difficult to detect in high-dimensional data using traditional statistical approaches. Multifactor Dimensionality Reduction (MDR) is the most commonly used data-mining method to detect epistatic interactions. In all data-mining methods, it is important to consider internal validation procedures to obtain prediction estimates, prevent model over-fitting, and reduce potential false-positive findings. Currently, MDR uses cross-validation for internal validation. In this study, we incorporate a three-way split (3WS) of the data, in combination with a post-hoc pruning procedure, as an alternative to cross-validation for internal model validation, with the aim of reducing computation time without impairing performance. We compare the power to detect true disease-causing loci using MDR with both 5- and 10-fold cross-validation to that of MDR with 3WS for a range of single-locus and epistatic disease models. Additionally, we analyze a dataset in HIV immunogenetics to demonstrate the results of the two strategies on real data.
RESULTS: MDR with 3WS is computationally approximately five times faster than 5-fold cross-validation. The power to find the exact true disease loci without detecting false-positive loci is higher with 5-fold cross-validation than with 3WS before pruning. However, the power to find the true disease-causing loci in addition to false-positive loci is equivalent between cross-validation and the 3WS. With the incorporation of a pruning procedure after the 3WS, the power of the 3WS approach to detect only the exact disease loci is equivalent to that of MDR with cross-validation. In the real data application, the cross-validation and 3WS analyses indicate the same two-locus model.
CONCLUSIONS: Our results reveal that the performance of the two internal validation methods is equivalent when pruning procedures are used. The specific pruning procedure should be chosen with an understanding of the trade-off between identifying all relevant genetic effects at the cost of including false positives, and missing important genetic factors. This implies that 3WS may be a powerful and computationally efficient approach to screen for epistatic effects, and could be used to identify candidate interactions in large-scale genetic studies.
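The difference between the two internal validation schemes named in the abstract can be illustrated with a minimal Python sketch of how the samples are partitioned. This is not the authors' implementation: the function names, the 50/25/25 split fractions, and the labels given to the three subsets are assumptions made for illustration, since the abstract does not state them.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Return k (train, test) index pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def three_way_split(n_samples, fractions=(0.5, 0.25, 0.25), seed=0):
    """Return a single (training, testing, validation) index triple.

    The fractions are hypothetical; the abstract does not give them.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(fractions[0] * n_samples)
    n_test = int(fractions[1] * n_samples)
    training = indices[:n_train]
    testing = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # whatever remains
    return training, testing, validation


cv = k_fold_splits(100, k=5)
train, test, valid = three_way_split(100)

# Cross-validation yields one partition per fold; 3WS yields exactly one.
print(len(cv), len(cv[0][0]), len(cv[0][1]))
print(len(train), len(test), len(valid))
```

In this sketch, 5-fold cross-validation produces five training partitions, so a model search would be repeated five times, whereas the three-way split produces one. That is consistent in spirit with the roughly fivefold speed-up reported in the abstract, although the sketch itself measures no runtime.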
format Text
id pubmed-2920275
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2920275 2010-08-12 A comparison of internal validation techniques for multifactor dimensionality reduction Winham, Stacey J Slater, Andrew J Motsinger-Reif, Alison A BMC Bioinformatics Research Article BioMed Central 2010-07-22 /pmc/articles/PMC2920275/ /pubmed/20650002 http://dx.doi.org/10.1186/1471-2105-11-394 Text en Copyright ©2010 Winham et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A comparison of internal validation techniques for multifactor dimensionality reduction
topic Research Article