Assessing statistical significance in multivariable genome wide association analysis

Motivation: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple test...

Bibliographic Details
Main authors: Buzdugan, Laura; Kalisch, Markus; Navarro, Arcadi; Schunk, Daniel; Fehr, Ernst; Bühlmann, Peter
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4920127/
https://www.ncbi.nlm.nih.gov/pubmed/27153677
http://dx.doi.org/10.1093/bioinformatics/btw128
author Buzdugan, Laura
Kalisch, Markus
Navarro, Arcadi
Schunk, Daniel
Fehr, Ernst
Bühlmann, Peter
collection PubMed
description Motivation: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. Results: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family wise error rate (FWER). Thus, our method tests whether or not a SNP carries any additional information about the phenotype beyond that available by all the other SNPs. This rules out spurious correlations between phenotypes and SNPs that can arise from marginal methods because the ‘spuriously correlated’ SNP merely happens to be correlated with the ‘truly causal’ SNP. In addition, the method offers a data driven approach to identifying and refining groups of SNPs that jointly contain informative signals about the phenotype. We demonstrate the value of our method by applying it to the seven diseases analyzed by the Wellcome Trust Case Control Consortium (WTCCC). We show, in particular, that our method is also capable of finding significant SNPs that were not identified in the original WTCCC study, but were replicated in other independent studies. Availability and implementation: Reproducibility of our research is supported by the open-source Bioconductor package hierGWAS. Contact: peter.buehlmann@stat.math.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online.
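The abstract's point about marginal versus multivariable testing can be illustrated with a small simulation. The sketch below is not the authors' procedure and does not use the hierGWAS package; it assumes a quantitative phenotype, ordinary least squares in place of a generalized linear model, and two hypothetical SNPs (`snp_causal`, `snp_linked`) invented for illustration. It applies no FWER control and only shows why a SNP correlated with a causal SNP looks associated on its own but carries almost no additional information once the causal SNP is in the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # number of simulated individuals (assumption for illustration)

# Hypothetical genotypes coded as minor-allele counts (0, 1 or 2).
snp_causal = rng.binomial(2, 0.3, size=n)

# A second SNP that copies the causal SNP 90% of the time, so the two are
# strongly correlated, but that has no effect of its own on the phenotype.
copy_mask = rng.random(n) < 0.9
snp_linked = np.where(copy_mask, snp_causal, rng.binomial(2, 0.3, size=n))

# Quantitative phenotype driven only by the causal SNP plus noise.
phenotype = 1.0 * snp_causal + rng.normal(size=n)


def ols(columns, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Marginal analysis: the linked SNP is tested on its own.
marginal_slope = ols([snp_linked], phenotype)[1]

# Multivariable analysis: both SNPs enter the same model.
joint = ols([snp_causal, snp_linked], phenotype)
joint_causal, joint_linked = joint[1], joint[2]

print(f"marginal slope of linked SNP: {marginal_slope:.2f}")
print(f"joint coefficient, causal SNP: {joint_causal:.2f}")
print(f"joint coefficient, linked SNP: {joint_linked:.2f}")
```

In the marginal fit the linked SNP picks up most of the causal SNP's effect, whereas in the joint fit its coefficient shrinks toward zero. Turning such coefficients into valid P-values across a very large number of SNPs is the problem the paper addresses, and this sketch does not attempt it.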
format Online
Article
Text
id pubmed-4920127
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4920127 2016-06-27 Assessing statistical significance in multivariable genome wide association analysis Buzdugan, Laura; Kalisch, Markus; Navarro, Arcadi; Schunk, Daniel; Fehr, Ernst; Bühlmann, Peter Bioinformatics Original Papers Oxford University Press 2016-07-01 2016-03-07 /pmc/articles/PMC4920127/ /pubmed/27153677 http://dx.doi.org/10.1093/bioinformatics/btw128 Text en © The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Assessing statistical significance in multivariable genome wide association analysis
topic Original Papers