
Gradient Boosting as a SNP Filter: an Evaluation Using Simulated and Hair Morphology Data



Bibliographic Details
Main authors: Lubke, GH, Laurin, C, Walters, R, Eriksson, N, Hysi, P, Spector, TD, Montgomery, GW, Martin, NG, Medland, SE, Boomsma, DI
Format: Online, Article, Text
Language: English
Published: 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3882018/
https://www.ncbi.nlm.nih.gov/pubmed/24404405
http://dx.doi.org/10.4172/2153-0602.1000143
Collection: PubMed
Abstract: Typically, genome-wide association studies consist of regressing the phenotype on each SNP separately using an additive genetic model. Although statistical models for recessive, dominant, SNP-SNP, or SNP-environment interactions exist, the testing burden makes an evaluation of all possible effects impractical for genome-wide data. We advocate a two-step approach where the first step consists of a filter that is sensitive to different types of SNP main and interaction effects. The aim is to substantially reduce the number of SNPs such that more specific modeling becomes feasible in a second step. We provide an evaluation of a statistical learning method called "gradient boosting machine" (GBM) that can be used as a filter. GBM does not require an a priori specification of a genetic model, and permits inclusion of large numbers of covariates. GBM can therefore be used to explore multiple GxE interactions, which would not be feasible within the parametric framework used in GWAS. We show in a simulation that GBM performs well even under conditions favorable to the standard additive regression model commonly used in GWAS, and is sensitive to interaction effects even if one of the interacting variables has a zero main effect. Such an interaction would not be detected in GWAS. Our evaluation is accompanied by an analysis of empirical data concerning hair morphology. We estimate the phenotypic variance explained by increasing numbers of highest ranked SNPs, and show that it is sufficient to select 10K-20K SNPs in the first step of a two-step approach.
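The abstract describes ranking SNPs with GBM and keeping only the top-ranked ones for a second, more specific modeling step. The Python sketch below is a hypothetical toy illustration of that idea, not the authors' implementation or data. It hand-rolls squared-error gradient boosting with depth-2 regression trees (numpy only), ranks SNPs by accumulated split gain (similar in spirit to a relative-influence measure), and simulates genotypes in which SNP 2 affects the phenotype only through an interaction with SNP 1. All function names (`simulate`, `best_split`, `fit_depth2_tree`, `gbm_importance`, `marginal_t`), effect sizes, and settings are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy example: names, effect sizes, and settings are illustrative assumptions.


def simulate(n=2000, p=20, maf=0.3, seed=1):
    """Simulate additive genotypes (0/1/2) and a phenotype with a SNP-SNP interaction."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, maf, size=(n, p)).astype(float)
    m = 2 * maf  # expected genotype value for a biallelic SNP in HWE
    # SNP 1 has a main effect and interacts with SNP 2. Because (X1 - m) has mean zero
    # and is independent of SNP 2, SNP 2 has zero marginal (main) effect on y.
    y = 0.5 * X[:, 1] + 1.0 * (X[:, 1] - m) * (X[:, 2] - m) + rng.normal(size=n)
    return X, y


def best_split(X, r, idx, thresholds=(0.5, 1.5)):
    """Best single split of rows idx: returns (SSE reduction, feature, threshold)."""
    best = (0.0, None, None)
    r_node, n = r[idx], len(idx)
    total = r_node.sum()
    for j in range(X.shape[1]):
        xj = X[idx, j]
        for t in thresholds:
            left = xj <= t
            n_l = int(left.sum())
            if n_l == 0 or n_l == n:
                continue  # split would leave a child empty
            s_l = r_node[left].sum()
            gain = s_l**2 / n_l + (total - s_l) ** 2 / (n - n_l) - total**2 / n
            if gain > best[0]:
                best = (gain, j, t)
    return best


def fit_depth2_tree(X, r, importance):
    """Fit a depth-2 regression tree to residuals r, adding split gains to importance."""
    pred = np.full(len(r), r.mean())
    rows = np.arange(len(r))
    gain, j, t = best_split(X, r, rows)
    if j is None:
        return pred
    importance[j] += gain
    for child in (rows[X[:, j] <= t], rows[X[:, j] > t]):
        g, jc, tc = best_split(X, r, child)
        if jc is None:
            pred[child] = r[child].mean()
            continue
        importance[jc] += g
        for leaf in (child[X[child, jc] <= tc], child[X[child, jc] > tc]):
            pred[leaf] = r[leaf].mean()
    return pred


def gbm_importance(X, y, n_trees=50, learning_rate=0.1):
    """Squared-error gradient boosting; returns normalized split-gain importance per SNP."""
    f = np.full(len(y), y.mean())
    importance = np.zeros(X.shape[1])
    for _ in range(n_trees):
        r = y - f  # negative gradient of the squared-error loss
        f += learning_rate * fit_depth2_tree(X, r, importance)
    return importance / importance.sum()


def marginal_t(X, y):
    """Per-SNP t statistic from a one-SNP-at-a-time additive linear regression."""
    n, yc = len(y), y - y.mean()
    out = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        x = X[:, j] - X[:, j].mean()
        beta = (x @ yc) / (x @ x)
        resid = yc - beta * x
        out[j] = beta / np.sqrt(resid @ resid / (n - 2) / (x @ x))
    return out


X, y = simulate()
imp = gbm_importance(X, y)
t = marginal_t(X, y)
keep = np.argsort(imp)[::-1][:5]  # step 1: retain the top-ranked SNPs for step 2
print("GBM top-ranked SNPs:", keep.tolist())
print("marginal |t| for SNP 2:", round(abs(t[2]), 2))
```

With these settings, SNP 2 would be expected to rank near the top under GBM while its one-SNP-at-a-time additive t statistic stays near zero, mirroring the abstract's point about interactions involving a zero-main-effect variable. Depth-2 trees are used because depth-1 stumps yield an additive model that cannot represent the SNP 1 × SNP 2 term.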
Identifier: pubmed-3882018
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Data Mining Genomics Proteomics
Article date: 2013-10-20
Record date: 2014-01-06
Copyright: © 2013 Lubke GH, et al.
License: http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Topic: Article