Mixed logistic regression in genome-wide association studies

BACKGROUND: Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved in 2016 that this method is inappropriate in some situations and proposed GMMAT,...

Bibliographic Details
Main authors: Milet, Jacqueline; Courtin, David; Garcia, André; Perdry, Hervé
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7684894/
https://www.ncbi.nlm.nih.gov/pubmed/33228527
http://dx.doi.org/10.1186/s12859-020-03862-2
_version_ 1783613088980795392
author Milet, Jacqueline
Courtin, David
Garcia, André
Perdry, Hervé
collection PubMed
description BACKGROUND: Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, with case-control status analyzed as a quantitative phenotype. Chen et al. proved in 2016 that this method is inappropriate in some situations and proposed GMMAT, a score test for mixed logistic regression (MLR). However, this test does not produce an estimate of the variants’ effects. We propose two computationally efficient methods to estimate the variants’ effects. Their properties and those of other methods (MLM, logistic regression) are evaluated using both simulated and real genomic data from a recent GWAS in two geographically close populations in West Africa. RESULTS: We show that, when the disease prevalence differs between population strata, MLM is inappropriate for analyzing binary traits. MLR performs best in all circumstances. The variants’ effects are well estimated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, which improves the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. CONCLUSION: The two proposed methods are implemented in the R package milorGWAS, available on CRAN. Both methods scale up to at least 10,000 individuals. The same computational strategies could be applied to other models (e.g. a mixed Cox model for survival analysis).
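The stratified QQ-plot mentioned in the description can be illustrated with a minimal Python sketch. It is not the milorGWAS implementation: the abstract does not say how variants are assigned to strata, so the `categories` argument and the function names `inflation_factor`, `qq_points` and `stratified_qq` are hypothetical. The sketch only groups p-values by a caller-supplied label, then computes the QQ-plot coordinates and the usual genomic inflation factor for each group.

```python
import math
from collections import defaultdict
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def inflation_factor(pvalues):
    """Median observed 1-df chi-square statistic divided by its null median."""
    # A two-sided p-value p corresponds to the statistic (Phi^{-1}(p / 2))^2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1_MEDIAN


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs for a QQ-plot."""
    observed = sorted(pvalues)
    n = len(observed)
    return [(-math.log10((i + 0.5) / n), -math.log10(p))
            for i, p in enumerate(observed)]


def stratified_qq(pvalues, categories):
    """Compute QQ points and inflation factor separately for each category.

    `categories` is a hypothetical per-variant label (an assumption of this
    sketch); all p-values must lie in (0, 1].
    """
    groups = defaultdict(list)
    for p, c in zip(pvalues, categories):
        groups[c].append(p)
    return {c: {"lambda": inflation_factor(ps), "points": qq_points(ps)}
            for c, ps in groups.items()}
```

An inflation factor near 1 in every category suggests well-calibrated p-values, while a category with a value clearly above or below 1 points to inflation or deflation that a single pooled QQ-plot could hide.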
format Online
Article
Text
id pubmed-7684894
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7684894 2020-11-25 Mixed logistic regression in genome-wide association studies Milet, Jacqueline Courtin, David Garcia, André Perdry, Hervé BMC Bioinformatics Methodology Article
BioMed Central 2020-11-23 /pmc/articles/PMC7684894/ /pubmed/33228527 http://dx.doi.org/10.1186/s12859-020-03862-2 Text en © The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Mixed logistic regression in genome-wide association studies
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7684894/
https://www.ncbi.nlm.nih.gov/pubmed/33228527
http://dx.doi.org/10.1186/s12859-020-03862-2