A novel algorithm for simultaneous SNP selection in high-dimensional genome-wide association studies

BACKGROUND: Identification of causal SNPs in most genome-wide association studies relies on approaches that consider each SNP individually. However, there is a strong correlation structure among SNPs that needs to be taken into account. Hence, increasingly modern computationally expensive regression...

Bibliographic Details
Main Authors: Zuber, Verena; Duarte Silva, A Pedro; Strimmer, Korbinian
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3558454/
https://www.ncbi.nlm.nih.gov/pubmed/23113980
http://dx.doi.org/10.1186/1471-2105-13-284
author Zuber, Verena
Duarte Silva, A Pedro
Strimmer, Korbinian
collection PubMed
description BACKGROUND: Identification of causal SNPs in most genome-wide association studies relies on approaches that consider each SNP individually. However, there is a strong correlation structure among SNPs that needs to be taken into account. Hence, modern, computationally expensive regression methods that consider all markers simultaneously, and thus incorporate dependencies among SNPs, are increasingly employed for SNP selection. RESULTS: We develop a novel multivariate algorithm for large-scale SNP selection using CAR score regression, a promising new approach for prioritizing biomarkers. Specifically, we propose a computationally efficient procedure for shrinkage estimation of CAR scores from high-dimensional data. Subsequently, we conduct a comprehensive comparison study including five advanced regression approaches (boosting, lasso, NEG, MCP, and CAR score) and a univariate approach (marginal correlation) to determine their effectiveness in finding true causal SNPs. CONCLUSIONS: Simultaneous SNP selection is a challenging task. We demonstrate that our CAR score-based algorithm consistently outperforms all competing approaches, both uni- and multivariate, in terms of correctly recovered causal SNPs and SNP ranking. An R package implementing the approach, as well as R code to reproduce the complete study presented here, is available from http://strimmerlab.org/software/care/.
format Online
Article
Text
id pubmed-3558454
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3558454 2013-01-31 A novel algorithm for simultaneous SNP selection in high-dimensional genome-wide association studies Zuber, Verena Duarte Silva, A Pedro Strimmer, Korbinian BMC Bioinformatics Methodology Article BioMed Central 2012-10-31 /pmc/articles/PMC3558454/ /pubmed/23113980 http://dx.doi.org/10.1186/1471-2105-13-284 Text en Copyright ©2012 Zuber et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A novel algorithm for simultaneous SNP selection in high-dimensional genome-wide association studies
topic Methodology Article