Analysis of genome-wide association data by large-scale Bayesian logistic regression

Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple comparisons adjustment. Multivariate logistic regression is proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than...

Bibliographic Details
Main Authors: Wang, Yuanjia, Sha, Nanshi, Fang, Yixin
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795912/
https://www.ncbi.nlm.nih.gov/pubmed/20018005
_version_ 1782175468603047936
author Wang, Yuanjia
Sha, Nanshi
Fang, Yixin
author_facet Wang, Yuanjia
Sha, Nanshi
Fang, Yixin
author_sort Wang, Yuanjia
collection PubMed
description Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple comparisons adjustment. Multivariate logistic regression is proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, traditional multivariate logistic regression breaks down. To accommodate the scale of data from a GWA while controlling for collinearity and overfitting in a high dimensional predictor space, we propose a variable selection procedure using Bayesian logistic regression. We explored a connection between Bayesian regression with certain priors and L1 and L2 penalized logistic regression. After analyzing large number of SNPs simultaneously in a Bayesian regression, we selected important SNPs for further consideration. With much fewer SNPs of interest, problems of multiple comparisons and collinearity are less severe. We conducted simulation studies to examine probability of correctly selecting disease contributing SNPs and applied developed methods to analyze Genetic Analysis Workshop 16 North American Rheumatoid Arthritis Consortium data.
format Text
id pubmed-2795912
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-27959122009-12-18 Analysis of genome-wide association data by large-scale Bayesian logistic regression Wang, Yuanjia Sha, Nanshi Fang, Yixin BMC Proc Proceedings Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple comparisons adjustment. Multivariate logistic regression is proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, traditional multivariate logistic regression breaks down. To accommodate the scale of data from a GWA while controlling for collinearity and overfitting in a high dimensional predictor space, we propose a variable selection procedure using Bayesian logistic regression. We explored a connection between Bayesian regression with certain priors and L1 and L2 penalized logistic regression. After analyzing large number of SNPs simultaneously in a Bayesian regression, we selected important SNPs for further consideration. With much fewer SNPs of interest, problems of multiple comparisons and collinearity are less severe. We conducted simulation studies to examine probability of correctly selecting disease contributing SNPs and applied developed methods to analyze Genetic Analysis Workshop 16 North American Rheumatoid Arthritis Consortium data. BioMed Central 2009-12-15 /pmc/articles/PMC2795912/ /pubmed/20018005 Text en Copyright ©2009 Wang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Wang, Yuanjia
Sha, Nanshi
Fang, Yixin
Analysis of genome-wide association data by large-scale Bayesian logistic regression
title Analysis of genome-wide association data by large-scale Bayesian logistic regression
title_full Analysis of genome-wide association data by large-scale Bayesian logistic regression
title_fullStr Analysis of genome-wide association data by large-scale Bayesian logistic regression
title_full_unstemmed Analysis of genome-wide association data by large-scale Bayesian logistic regression
title_short Analysis of genome-wide association data by large-scale Bayesian logistic regression
title_sort analysis of genome-wide association data by large-scale bayesian logistic regression
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795912/
https://www.ncbi.nlm.nih.gov/pubmed/20018005
work_keys_str_mv AT wangyuanjia analysisofgenomewideassociationdatabylargescalebayesianlogisticregression
AT shananshi analysisofgenomewideassociationdatabylargescalebayesianlogisticregression
AT fangyixin analysisofgenomewideassociationdatabylargescalebayesianlogisticregression
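
Illustrative note: the abstract mentions a connection between Bayesian logistic regression under certain priors and L1 and L2 penalized logistic regression. The sketch below shows the standard form of that connection, not the authors' procedure: the posterior mode under independent Gaussian priors matches an L2 (ridge) penalized fit, and under independent Laplace priors it matches an L1 (lasso) penalized fit. Everything in it is an assumption made for illustration: the simulated genotypes, the effect sizes, the value of `lam`, the omission of an intercept, and the hypothetical helper names `map_logistic` and `soft_threshold`.

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor for numerical safety before exponentiating.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def soft_threshold(b, t):
    """Proximal operator of t * |b|: shrinks entries and sets small ones to zero."""
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

def map_logistic(X, y, lam, prior, n_iter=2000):
    """Posterior mode of logistic regression coefficients (no intercept).

    prior="gaussian": N(0, 1/lam) priors, i.e. penalty (lam / 2) * sum(beta**2).
    prior="laplace":  Laplace priors with rate lam, i.e. penalty lam * sum(|beta|).
    Solved by proximal gradient descent on the negative log-posterior.
    """
    if prior not in ("gaussian", "laplace"):
        raise ValueError("prior must be 'gaussian' or 'laplace'")
    # The logistic-loss gradient is Lipschitz with constant <= ||X||_2^2 / 4.
    step = 4.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y)   # gradient of negative log-likelihood
        z = beta - step * grad                 # plain gradient step
        if prior == "gaussian":
            beta = z / (1.0 + step * lam)      # prox of the L2 penalty
        else:
            beta = soft_threshold(z, step * lam)  # prox of the L1 penalty
    return beta

# Simulated case-control data with more SNPs than subjects (assumed values).
rng = np.random.default_rng(0)
n, p = 100, 200
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
X = G - G.mean(axis=0)                                # center each SNP
true_beta = np.zeros(p)
true_beta[:3] = 1.5                                   # three hypothetical causal SNPs
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

beta_l2 = map_logistic(X, y, lam=5.0, prior="gaussian")
beta_l1 = map_logistic(X, y, lam=5.0, prior="laplace")

# Candidate SNPs: largest |coefficient| under the Gaussian prior,
# nonzero coefficients under the Laplace prior.
top_l2 = np.argsort(-np.abs(beta_l2))[:10]
selected_l1 = np.flatnonzero(beta_l1)
print("Top 10 SNP indices (Gaussian prior):", top_l2)
print("Number of SNPs kept (Laplace prior):", selected_l1.size)
```

The Laplace prior can set coefficients exactly to zero, so it performs selection directly; the Gaussian prior only shrinks them, so a ranking or threshold on the coefficient magnitudes is needed to pick SNPs for follow-up.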