Evaluation of GENESIS, SAIGE, REGENIE and fastGWA-GLMM for genome-wide association studies of binary traits in correlated data

Performing a genome-wide association study (GWAS) with a binary phenotype using family data is a challenging task. Using linear mixed effects models is typically unsuitable for binary traits, and numerical approximations of the likelihood function may not work well with rare genetic variants with sm...

Bibliographic Details
Main Authors: Gurinovich, Anastasia; Li, Mengze; Leshchyk, Anastasia; Bae, Harold; Song, Zeyuan; Arbeev, Konstantin G.; Nygaard, Marianne; Feitosa, Mary F; Perls, Thomas T; Sebastiani, Paola
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9544087/
https://www.ncbi.nlm.nih.gov/pubmed/36212134
http://dx.doi.org/10.3389/fgene.2022.897210
author Gurinovich, Anastasia
Li, Mengze
Leshchyk, Anastasia
Bae, Harold
Song, Zeyuan
Arbeev, Konstantin G.
Nygaard, Marianne
Feitosa, Mary F
Perls, Thomas T
Sebastiani, Paola
author_sort Gurinovich, Anastasia
collection PubMed
description Performing a genome-wide association study (GWAS) with a binary phenotype using family data is a challenging task. Using linear mixed effects models is typically unsuitable for binary traits, and numerical approximations of the likelihood function may not work well with rare genetic variants with small counts. Additionally, imbalance in the case-control ratio poses challenges, as traditional statistical tests such as the Score test or Wald test perform poorly in this setting. In the last couple of years, several methods that use Saddle Point Approximation (SPA) have been proposed to better approximate the likelihood function of a mixed effects logistic regression model. SPA adjustment has recently been implemented in multiple software tools, including GENESIS, SAIGE, REGENIE and fastGWA-GLMM, four increasingly popular tools for GWAS of binary traits. We compare Score and SPA tests using real family data to evaluate computational efficiency and the agreement of the results. Additionally, we compare various ways to adjust for family relatedness, such as sparse and full genetic relationship matrices (GRM) and polygenic effect estimates. We use the New England Centenarian Study imputed genotype data, the Long Life Family Study whole-genome sequencing data, and the binary phenotype of human extreme longevity to compare the agreement of the results and the tools’ computational performance. The evaluation suggests that REGENIE might not be a good choice when analyzing small correlated datasets. fastGWA-GLMM is the most computationally efficient of the four tools, but it appears to be overly conservative when applied to family-based data. GENESIS, SAIGE and fastGWA-GLMM produced similar, although not identical, results, with SPA adjustment performing better than Score tests. Our evaluation also demonstrates the importance of adjusting for the full GRM in highly correlated datasets when using GENESIS or SAIGE.
format Online
Article
Text
id pubmed-9544087
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9544087 2022-10-08 Evaluation of GENESIS, SAIGE, REGENIE and fastGWA-GLMM for genome-wide association studies of binary traits in correlated data Gurinovich, Anastasia; Li, Mengze; Leshchyk, Anastasia; Bae, Harold; Song, Zeyuan; Arbeev, Konstantin G.; Nygaard, Marianne; Feitosa, Mary F; Perls, Thomas T; Sebastiani, Paola Front Genet Genetics Frontiers Media S.A. 2022-09-23 /pmc/articles/PMC9544087/ /pubmed/36212134 http://dx.doi.org/10.3389/fgene.2022.897210 Text en
Copyright © 2022 Gurinovich, Li, Leshchyk, Bae, Song, Arbeev, Nygaard, Feitosa, Perls and Sebastiani. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Evaluation of GENESIS, SAIGE, REGENIE and fastGWA-GLMM for genome-wide association studies of binary traits in correlated data
topic Genetics