Bayesian multiple logistic regression for case-control GWAS

Bibliographic Details
Main authors: Banerjee, Saikat; Zeng, Lingyao; Schunkert, Heribert; Söding, Johannes
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329526/
https://www.ncbi.nlm.nih.gov/pubmed/30596640
http://dx.doi.org/10.1371/journal.pgen.1007856
author Banerjee, Saikat
Zeng, Lingyao
Schunkert, Heribert
Söding, Johannes
collection PubMed
description Genetic variants in genome-wide association studies (GWAS) are mostly tested for disease association using simple regression, one variant at a time. Standard approaches to improve power in detecting disease-associated SNPs use multiple regression with Bayesian variable selection, in which a sparsity-enforcing prior on effect sizes is used to avoid overtraining and all effect sizes are integrated out for posterior inference. For binary traits, the logistic model has not yielded clear improvements over the linear model. For multi-SNP analysis, the logistic model has required costly and technically challenging MCMC sampling to perform the integration. Here, we introduce the quasi-Laplace approximation to solve the integral and avoid MCMC sampling. We expect the logistic model to perform much better than multiple linear regression except when predicted disease risks are spread closely around 0.5, because the logistic function is well approximated by a linear function only close to its inflection point. Indeed, in extensive benchmarks with simulated phenotypes and real genotypes, our Bayesian multiple LOgistic REgression method (B-LORE) showed considerable improvements (1) when regressing on many variants in multiple loci at heritabilities ≥ 0.4 and (2) for unbalanced case-control ratios. B-LORE also enables meta-analysis by approximating the likelihood function of each individual study by a multivariate normal distribution, using its mean and covariance matrix as summary statistics. Our work should also make sparse multiple logistic regression attractive for other applications with binary target variables. B-LORE is freely available from https://github.com/soedinglab/b-lore.
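The abstract's point about the inflection point can be illustrated with a minimal sketch. It is an illustration only and not part of B-LORE; the function names `logistic`, `linear_approx`, and `approximation_errors` are hypothetical. It compares the logistic function with its tangent line at a log-odds of 0 (disease risk 0.5) and reports how the gap grows as the log-odds move away from 0.

```python
import math


def logistic(x: float) -> float:
    """Logistic (sigmoid) function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_approx(x: float) -> float:
    """First-order Taylor expansion of the logistic function at x = 0.

    logistic(0) = 0.5 and the slope there is 0.25, so the tangent line
    is 0.5 + x / 4.
    """
    return 0.5 + 0.25 * x


def approximation_errors(log_odds):
    """Return (log_odds, risk, linear value, absolute error) per input."""
    rows = []
    for x in log_odds:
        risk = logistic(x)
        lin = linear_approx(x)
        rows.append((x, risk, lin, abs(risk - lin)))
    return rows


if __name__ == "__main__":
    # Hypothetical log-odds values chosen only for illustration.
    for x, risk, lin, err in approximation_errors([0.0, 0.5, 1.0, 2.0, 4.0]):
        print(f"log-odds={x:4.1f}  risk={risk:.3f}  linear={lin:.3f}  error={err:.3f}")
```

Near a log-odds of 0 the two curves agree closely. Further out, the tangent line leaves the interval [0, 1] while the logistic function saturates, which is consistent with the abstract's expectation that the linear model suffers when predicted risks are not concentrated around 0.5.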
format Online
Article
Text
id pubmed-6329526
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6329526 2019-01-30 Bayesian multiple logistic regression for case-control GWAS Banerjee, Saikat Zeng, Lingyao Schunkert, Heribert Söding, Johannes PLoS Genet Research Article
Public Library of Science 2018-12-31 /pmc/articles/PMC6329526/ /pubmed/30596640 http://dx.doi.org/10.1371/journal.pgen.1007856 Text en © 2018 Banerjee et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Bayesian multiple logistic regression for case-control GWAS
topic Research Article