A random forest approach to the detection of epistatic interactions in case-control studies
BACKGROUND: The key roles of epistatic interactions between multiple genetic variants in the pathogenesis of complex diseases notwithstanding, the detection of such interactions remains a great challenge in genome-wide association studies. Although some existing multi-locus approaches have shown the...
Main Authors: | Jiang, Rui; Tang, Wanwan; Wu, Xuebing; Fu, Wenhui |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648748/ https://www.ncbi.nlm.nih.gov/pubmed/19208169 http://dx.doi.org/10.1186/1471-2105-10-S1-S65 |
_version_ | 1782164978945490944 |
---|---|
author | Jiang, Rui Tang, Wanwan Wu, Xuebing Fu, Wenhui |
collection | PubMed |
description | BACKGROUND: The key roles of epistatic interactions between multiple genetic variants in the pathogenesis of complex diseases notwithstanding, the detection of such interactions remains a great challenge in genome-wide association studies. Although some existing multi-locus approaches have shown their successes in small-scale case-control data, the "combination explosion" curse prohibits their applications to genome-wide analysis. It is therefore indispensable to develop new methods that are able to reduce the search space for epistatic interactions from an astronomic number of all possible combinations of genetic variants to a manageable set of candidates. RESULTS: We studied case-control data from the viewpoint of binary classification. More precisely, we treated single nucleotide polymorphism (SNP) markers as categorical features and adopted the random forest to discriminate cases against controls. On the basis of the Gini importance given by the random forest, we designed a sliding window sequential forward feature selection (SWSFS) algorithm to select a small set of candidate SNPs that could minimize the classification error and then statistically tested up to three-way interactions of the candidates. We compared this approach with three existing methods on three simulated disease models and showed that our approach is comparable to, sometimes more powerful than, the other methods. We applied our approach to a genome-wide case-control dataset for Age-related Macular Degeneration (AMD) and successfully identified two SNPs that were reported to be associated with this disease. CONCLUSION: Besides existing pure statistical approaches, we demonstrated the feasibility of incorporating machine learning methods into genome-wide case-control studies. The Gini importance offers yet another measure for the associations between SNPs and complex diseases, thereby complementing existing statistical measures to facilitate the identification of epistatic interactions and the understanding of epistasis in the pathogenesis of complex diseases. |
format | Text |
id | pubmed-2648748 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2648748 2009-03-03 A random forest approach to the detection of epistatic interactions in case-control studies Jiang, Rui Tang, Wanwan Wu, Xuebing Fu, Wenhui BMC Bioinformatics Research BioMed Central 2009-01-30 /pmc/articles/PMC2648748/ /pubmed/19208169 http://dx.doi.org/10.1186/1471-2105-10-S1-S65 Text en Copyright © 2009 Jiang et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A random forest approach to the detection of epistatic interactions in case-control studies |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648748/ https://www.ncbi.nlm.nih.gov/pubmed/19208169 http://dx.doi.org/10.1186/1471-2105-10-S1-S65 |
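
The description above ranks SNPs by the Gini importance of a random forest. As a rough illustration of the quantity behind that measure, the following Python sketch computes the Gini impurity decrease obtained by splitting case-control labels on a categorical feature. It is not the authors' implementation: the function names, the toy genotypes, and the multiway split by genotype value (tree implementations typically use binary splits and sum the decreases over all nodes and trees) are assumptions made for illustration only. The toy data encode a purely epistatic pattern, where neither SNP separates cases from controls alone but the joint genotype does.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(feature, labels):
    """Impurity decrease from grouping samples by feature value.

    Assumption: one child node per distinct value (a multiway split),
    used here only to keep the sketch short.
    """
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(ys) / n * gini_impurity(ys) for ys in groups.values())
    return gini_impurity(labels) - weighted


# Hypothetical toy data: 1 = case, 0 = control.
# Disease status is the XOR of the two (binary-coded) genotypes.
snp_c = [0, 0, 1, 1, 0, 0, 1, 1]
snp_d = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [c ^ d for c, d in zip(snp_c, snp_d)]

single_c = gini_decrease(snp_c, labels)                  # marginal effect of snp_c
single_d = gini_decrease(snp_d, labels)                  # marginal effect of snp_d
joint = gini_decrease(list(zip(snp_c, snp_d)), labels)   # joint genotype

print(single_c, single_d, joint)
```

In this toy case each SNP alone yields no impurity decrease, while the joint genotype removes all impurity, which is the kind of signal that single-locus tests miss and that motivates testing interactions among a reduced candidate set.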