Cargando…
Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions
Risk of complex disorders is thought to be multifactorial, involving interactions between risk factors. However, many genetic studies assess association between disease status and markers one single-nucleotide polymorphism (SNP) at a time, due to the high-dimensional nature of the search space of al...
Autores principales: | , , |
---|---|
Formato: | Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2007
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367584/ https://www.ncbi.nlm.nih.gov/pubmed/18466558 |
_version_ | 1782154327090003968 |
---|---|
author | Nicodemus, Kristin K Wang, Wenyi Shugart, Yin Yao |
author_facet | Nicodemus, Kristin K Wang, Wenyi Shugart, Yin Yao |
author_sort | Nicodemus, Kristin K |
collection | PubMed |
description | Risk of complex disorders is thought to be multifactorial, involving interactions between risk factors. However, many genetic studies assess association between disease status and markers one single-nucleotide polymorphism (SNP) at a time, due to the high-dimensional nature of the search space of all possible interactions. Three ensemble methods have been recently proposed for use in high-dimensional data (Monte Carlo logic regression, random forests, and generalized boosted regression). An intuitive way to detect an association between genetic markers and disease status is to use variable importance measures, even though the stability of these measures in the context of a whole-genome association study is unknown. For the simulated data of Problem 3 in the Genetic Analysis Workshop 15 (GAW15), we examined the variability of both rankings and magnitude of variable importance measures using 10 variables simulated to participate in gene × gene and gene × environment interactions. We conducted 500 analyses per method on one randomly selected replicate, tallying the rankings and importance measures for each of the 10 variables of interest. When the simulated effect size was strong, all three methods showed stable rankings and estimates of variable importance. However, under conditions more commonly expected to be encountered in complex diseases, random forests and generalized boosted regression showed more stable estimates of variable importance and variable rankings. Individuals endeavoring to apply statistical learning methods to detect interaction in complex disease studies should perform repeated analyses in order to assure variable importance measures and rankings do not vary greatly, even for statistical learning algorithms that are thought to be stable. |
format | Text |
id | pubmed-2367584 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-23675842008-05-06 Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions Nicodemus, Kristin K Wang, Wenyi Shugart, Yin Yao BMC Proc Proceedings Risk of complex disorders is thought to be multifactorial, involving interactions between risk factors. However, many genetic studies assess association between disease status and markers one single-nucleotide polymorphism (SNP) at a time, due to the high-dimensional nature of the search space of all possible interactions. Three ensemble methods have been recently proposed for use in high-dimensional data (Monte Carlo logic regression, random forests, and generalized boosted regression). An intuitive way to detect an association between genetic markers and disease status is to use variable importance measures, even though the stability of these measures in the context of a whole-genome association study is unknown. For the simulated data of Problem 3 in the Genetic Analysis Workshop 15 (GAW15), we examined the variability of both rankings and magnitude of variable importance measures using 10 variables simulated to participate in gene × gene and gene × environment interactions. We conducted 500 analyses per method on one randomly selected replicate, tallying the rankings and importance measures for each of the 10 variables of interest. When the simulated effect size was strong, all three methods showed stable rankings and estimates of variable importance. However, under conditions more commonly expected to be encountered in complex diseases, random forests and generalized boosted regression showed more stable estimates of variable importance and variable rankings. Individuals endeavoring to apply statistical learning methods to detect interaction in complex disease studies should perform repeated analyses in order to assure variable importance measures and rankings do not vary greatly, even for statistical learning algorithms that are thought to be stable. BioMed Central 2007-12-18 /pmc/articles/PMC2367584/ /pubmed/18466558 Text en Copyright © 2007 Nicodemus et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Nicodemus, Kristin K Wang, Wenyi Shugart, Yin Yao Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions |
title | Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions |
title_full | Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions |
title_fullStr | Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions |
title_full_unstemmed | Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions |
title_short | Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions |
title_sort | stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367584/ https://www.ncbi.nlm.nih.gov/pubmed/18466558 |
work_keys_str_mv | AT nicodemuskristink stabilityofvariableimportancescoresandrankingsusingstatisticallearningtoolsonsinglenucleotidepolymorphismsandriskfactorsinvolvedingenegeneandgeneenvironmentinteractions AT wangwenyi stabilityofvariableimportancescoresandrankingsusingstatisticallearningtoolsonsinglenucleotidepolymorphismsandriskfactorsinvolvedingenegeneandgeneenvironmentinteractions AT shugartyinyao stabilityofvariableimportancescoresandrankingsusingstatisticallearningtoolsonsinglenucleotidepolymorphismsandriskfactorsinvolvedingenegeneandgeneenvironmentinteractions |