
Resampling procedures to identify important SNPs using a consensus approach

Our goal is to identify common single-nucleotide polymorphisms (SNPs) (minor allele frequency > 1%) that add predictive accuracy above that gained by knowledge of easily measured clinical variables. We take an algorithmic approach to predict each phenotypic variable using a combination of phenoty...


Bibliographic Details
Main authors: Pardy, Christopher; Motyer, Allan; Wilson, Susan
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287897/
https://www.ncbi.nlm.nih.gov/pubmed/22373247
http://dx.doi.org/10.1186/1753-6561-5-S9-S59
author Pardy, Christopher
Motyer, Allan
Wilson, Susan
collection PubMed
description Our goal is to identify common single-nucleotide polymorphisms (SNPs) (minor allele frequency > 1%) that add predictive accuracy above that gained by knowledge of easily measured clinical variables. We take an algorithmic approach to predict each phenotypic variable using a combination of phenotypic and genotypic predictors. We perform our procedure on the first simulated replicate and then validate against the others. Our procedure performs well when predicting Q1 but is less successful for the other outcomes. We use resampling procedures where possible to guard against false positives and to improve generalizability. The approach is based on finding a consensus regarding important SNPs by applying random forests and the least absolute shrinkage and selection operator (LASSO) on multiple subsamples. Random forests are used first to discard unimportant predictors, narrowing our focus to roughly 100 important SNPs. A cross-validation LASSO is then used to further select variables. We combine these procedures to guarantee that cross-validation can be used to choose a shrinkage parameter for the LASSO. If the clinical variables were unavailable, this prefiltering step would be essential. We perform the SNP-based analyses simultaneously rather than one at a time to estimate SNP effects in the presence of other causal variants. We analyzed the first simulated replicate of Genetic Analysis Workshop 17 without knowledge of the true model. Post-conference knowledge of the simulation parameters allowed us to investigate the limitations of our approach. We found that many of the false positives we identified were substantially correlated with genuine causal SNPs.
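The two-stage consensus procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn's `RandomForestRegressor` and `LassoCV`, a quantitative outcome, and genotypes coded as 0/1/2 minor-allele counts. The function name `consensus_snps`, the subsample fraction, the number of subsamples, and the voting threshold `min_votes` are all hypothetical choices that the record does not specify, and the clinical covariates used in the paper are omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV


def consensus_snps(X, y, n_subsamples=10, frac=0.5, n_keep=100,
                   min_votes=0.5, seed=0):
    """Return (selected column indices, per-column selection frequency).

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    votes = np.zeros(p)
    for _ in range(n_subsamples):
        # Draw a subsample of individuals without replacement.
        rows = rng.choice(n, size=int(frac * n), replace=False)
        Xs, ys = X[rows], y[rows]

        # Stage 1: rank SNPs by random forest importance and keep the
        # top n_keep, discarding the rest.
        rf = RandomForestRegressor(
            n_estimators=50, random_state=int(rng.integers(1 << 31)))
        rf.fit(Xs, ys)
        keep = np.argsort(rf.feature_importances_)[::-1][:n_keep]

        # Stage 2: LASSO on the retained SNPs, with the shrinkage
        # parameter chosen by 5-fold cross-validation.
        lasso = LassoCV(cv=5).fit(Xs[:, keep], ys)
        votes[keep[lasso.coef_ != 0]] += 1

    # Consensus: SNPs selected in at least min_votes of the subsamples.
    freq = votes / n_subsamples
    return np.flatnonzero(freq >= min_votes), freq


# Toy data (simulated here, unrelated to the GAW17 data set):
# 300 individuals, 50 SNPs, columns 0 and 1 are causal.
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(300, 50)).astype(float)
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=300)

selected, freq = consensus_snps(X, y, n_subsamples=5, n_keep=10)
print("selected SNP columns:", selected)
print("selection frequency:", freq[selected])
```

Fitting the LASSO jointly on all retained SNPs, rather than testing one SNP at a time, mirrors the abstract's point about estimating SNP effects in the presence of other causal variants. Counting votes across subsamples is one simple way to express the consensus idea; other aggregation rules would fit the description equally well.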
format Online
Article
Text
id pubmed-3287897
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3287897 2012-02-28 Resampling procedures to identify important SNPs using a consensus approach Pardy, Christopher Motyer, Allan Wilson, Susan BMC Proc Proceedings
BioMed Central 2011-11-29 /pmc/articles/PMC3287897/ /pubmed/22373247 http://dx.doi.org/10.1186/1753-6561-5-S9-S59 Text en Copyright ©2011 Pardy et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Proceedings