Gradient Boosting as a SNP Filter: an Evaluation Using Simulated and Hair Morphology Data
Main authors: | Lubke, GH; Laurin, C; Walters, R; Eriksson, N; Hysi, P; Spector, TD; Montgomery, GW; Martin, NG; Medland, SE; Boomsma, DI |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3882018/ https://www.ncbi.nlm.nih.gov/pubmed/24404405 http://dx.doi.org/10.4172/2153-0602.1000143 |
author | Lubke, GH; Laurin, C; Walters, R; Eriksson, N; Hysi, P; Spector, TD; Montgomery, GW; Martin, NG; Medland, SE; Boomsma, DI |
---|---|
collection | PubMed |
description | Typically, genome-wide association studies (GWAS) consist of regressing the phenotype on each SNP separately using an additive genetic model. Although statistical models for recessive, dominant, SNP-SNP, or SNP-environment interactions exist, the testing burden makes an evaluation of all possible effects impractical for genome-wide data. We advocate a two-step approach in which the first step is a filter that is sensitive to different types of SNP main and interaction effects. The aim is to reduce the number of SNPs substantially, so that more specific modeling becomes feasible in the second step. We evaluate a statistical learning method called “gradient boosting machine” (GBM) that can be used as such a filter. GBM does not require an a priori specification of a genetic model, and it permits the inclusion of large numbers of covariates. GBM can therefore be used to explore multiple GxE interactions, which would not be feasible within the parametric framework used in GWAS. We show in a simulation that GBM performs well even under conditions favorable to the standard additive regression model commonly used in GWAS, and that it is sensitive to interaction effects even if one of the interacting variables has a zero main effect; such an interaction would not be detected in a standard GWAS. Our evaluation is accompanied by an analysis of empirical data on hair morphology. We estimate the phenotypic variance explained by increasing numbers of the highest-ranked SNPs, and show that it is sufficient to select 10K-20K SNPs in the first step of a two-step approach. |
format | Online Article Text |
id | pubmed-3882018 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
record_format | MEDLINE/PubMed |
journal | J Data Mining Genomics Proteomics |
dates | 2013-10-20; 2014-01-06 |
rights | Copyright: © 2013 Lubke GH, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Gradient Boosting as a SNP Filter: an Evaluation Using Simulated and Hair Morphology Data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3882018/ https://www.ncbi.nlm.nih.gov/pubmed/24404405 http://dx.doi.org/10.4172/2153-0602.1000143 |
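
The filtering step described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' code, and the paper's software and settings may differ. The sketch assumes least-squares gradient boosting with depth-2 regression trees over SNP genotypes coded 0/1/2, and ranks each SNP by the summed reduction in residual sum of squares from the splits that use it. That ranking is a simplified variant of the relative-influence measure reported by boosting software. The simulated data, the effect sizes, the tree depth, learning rate and number of trees, and all function names (`simulate`, `gbm_snp_ranking`, etc.) are hypothetical. In the simulation, SNP 2 interacts with SNP 1 but has zero main effect in expectation. Its marginal correlation with the phenotype, which is what a one-SNP-at-a-time additive test looks at, should therefore stay near zero, while boosting can still rank it highly.

```python
import random


def best_split(G, r, rows, snps):
    """Best single split of `rows` on one SNP, scored by SSE reduction.

    Genotypes are 0/1/2, so each SNP has two cut points (<=0 vs >0, <=1 vs >1).
    Returns (gain, snp, threshold), or None if no SNP varies within `rows`.
    """
    n = len(rows)
    total = sum(r[i] for i in rows)
    best = None
    for j in snps:
        s = [0.0, 0.0, 0.0]  # sum of residuals per genotype 0/1/2
        c = [0, 0, 0]        # number of individuals per genotype
        for i in rows:
            s[G[i][j]] += r[i]
            c[G[i][j]] += 1
        for t in (0, 1):
            sl, cl = sum(s[: t + 1]), sum(c[: t + 1])
            sr, cr = total - sl, n - cl
            if cl == 0 or cr == 0:
                continue
            # Drop in residual sum of squares when one group mean becomes two.
            gain = sl * sl / cl + sr * sr / cr - total * total / n
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def fit_depth2_tree(G, r, snps, influence):
    """Fit a depth-2 regression tree to residuals r and return fitted values.

    Depth 2 lets one tree split on a SNP and then on another SNP, which is what
    allows boosting to represent a SNP-SNP interaction. Each split's gain is
    added to influence[snp].
    """
    n = len(r)
    root = best_split(G, r, range(n), snps)
    if root is None:  # no polymorphic SNP: constant fit
        return [sum(r) / n] * n
    gain, j, t = root
    influence[j] += gain
    fitted = [0.0] * n
    children = ([i for i in range(n) if G[i][j] <= t],
                [i for i in range(n) if G[i][j] > t])
    for child in children:
        leaves = [child]
        split = best_split(G, r, child, snps)
        if split is not None:
            gain, jj, tt = split
            influence[jj] += gain
            leaves = [[i for i in child if G[i][jj] <= tt],
                      [i for i in child if G[i][jj] > tt]]
        for leaf in leaves:
            mean = sum(r[i] for i in leaf) / len(leaf)
            for i in leaf:
                fitted[i] = mean
    return fitted


def gbm_snp_ranking(G, y, n_trees=50, learning_rate=0.1):
    """Least-squares gradient boosting; return SNP indices ranked by influence."""
    n, p = len(y), len(G[0])
    f = [sum(y) / n] * n  # start from the overall mean
    influence = [0.0] * p
    for _ in range(n_trees):
        r = [y[i] - f[i] for i in range(n)]  # negative gradient of squared error
        step = fit_depth2_tree(G, r, range(p), influence)
        f = [f[i] + learning_rate * step[i] for i in range(n)]  # shrunken update
    ranking = sorted(range(p), key=lambda j: influence[j], reverse=True)
    return ranking, influence


def simulate(n=800, p=10, maf=0.3, seed=1):
    """Hypothetical data: SNP 0 additive; SNP 1 additive and interacting with SNP 2.

    SNP 1 is centred at its expected genotype 2*maf inside the interaction term,
    so SNP 2 has zero marginal effect in expectation. SNPs 3..p-1 are null.
    """
    rng = random.Random(seed)
    G = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(p)]
         for _ in range(n)]
    mu = 2 * maf
    y = [g[0] + g[1] + 2.0 * (g[1] - mu) * g[2] + rng.gauss(0.0, 1.0) for g in G]
    return G, y


def pearson(x, y):
    """Marginal correlation; a single-SNP additive regression test is monotone in |r|."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    G, y = simulate()
    ranking, _ = gbm_snp_ranking(G, y)
    print("GBM top 3 SNPs:", ranking[:3])
    print("marginal r, SNP 2:", round(pearson([g[2] for g in G], y), 3))
```

The second step would keep the top-ranked SNPs and pass them to more specific models. In the paper's hair-morphology analysis, 10K-20K SNPs were found to be sufficient. The sketch uses depth-2 trees because boosted depth-1 trees (stumps) produce a model that is additive across SNPs and cannot represent the interaction.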