
A benchmark study on current GWAS models in admixed populations

OBJECTIVE: The performance of popular genome-wide association study (GWAS) models has not yet been examined consistently under genetic admixture, which introduces several challenges, such as heterogeneity of minor allele frequency (MAF), a wide spectrum of case-con...


Bibliographic Details
Main authors: Yang, Zikun, Huaman, Basilio Cieza, Reyes-Dumeyer, Dolly, Montesinos, Rosa, Soto-Añari, Marcio, Custodio, Nilton, Tosto, Giuseppe
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168347/
https://www.ncbi.nlm.nih.gov/pubmed/37163101
http://dx.doi.org/10.1101/2023.04.27.538299
author Yang, Zikun
Huaman, Basilio Cieza
Reyes-Dumeyer, Dolly
Montesinos, Rosa
Soto-Añari, Marcio
Custodio, Nilton
Tosto, Giuseppe
collection PubMed
description OBJECTIVE: The performance of popular genome-wide association study (GWAS) models has not yet been examined consistently under genetic admixture, which introduces several challenges, such as heterogeneity of minor allele frequency (MAF), a wide spectrum of case-control ratios, and varying effect sizes. METHODS: We generated a cohort of synthetic individuals (N=19,234) that simulates 1) a large sample size; 2) two-way admixture (Native American-European ancestry); and 3) a binary phenotype. We then examined the inflation factors produced by three popular GWAS tools: GMMAT, SAIGE, and Tractor. We also performed power calculations under different MAFs, case-control ratios, and ancestry percentages. Next, we used a cohort of Peruvians (N=249) to further examine the performance of the tested models on 1) real genetic data and 2) small sample sizes. Finally, we validated these findings using an independent Peruvian cohort (N=109) included in the 1000 Genomes Project (1000G). RESULTS: In the synthetic cohort, SAIGE performed better than GMMAT and Tractor in terms of type-I error rate, especially under severely unbalanced case-control ratios. In contrast, power analysis identified Tractor as the best method to pinpoint ancestry-specific causal variants, although its power decreased when no adequate heterogeneity of the true effect sizes was simulated between ancestries. The real Peruvian data showed that Tractor is severely affected by small sample sizes and produced severely inflated statistics, which we replicated in the 1000G Peruvian cohort. DISCUSSION: The current study illustrates the limitations of available GWAS tools under different scenarios of genetic admixture. We urge caution when interpreting results under complex population scenarios.
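The abstract compares tools by their inflation factors but does not say how these were computed. As a minimal sketch, assuming the conventional genomic inflation factor (the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null), the quantity could be obtained from a list of GWAS p-values as follows. The function name and the simulated p-values are hypothetical and are not taken from the study.

```python
import random
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained from the fact that a chi-square(1) variable is a squared standard normal.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values):
    """Return lambda = median(observed chi-square) / median(expected chi-square).

    Assumes each p-value is two-sided, lies in (0, 1], and comes from a
    1-degree-of-freedom test. This is an illustrative helper, not the
    authors' code.
    """
    # Convert each p-value back to a chi-square(1) statistic: z^2 with z = Phi^-1(p/2).
    chi2_stats = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated null p-values, uniform on (0, 1]; lambda should be close to 1.
    null_p = [1.0 - rng.random() for _ in range(100_000)]
    print("null lambda:", round(genomic_inflation_factor(null_p), 3))
    # Artificially shrunken p-values mimic inflated test statistics (lambda > 1).
    inflated_p = [p ** 1.5 for p in null_p]
    print("inflated lambda:", round(genomic_inflation_factor(inflated_p), 3))
```

A value near 1 indicates well-calibrated test statistics, while values well above 1 correspond to the kind of inflation the abstract reports for small samples.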
format Online
Article
Text
id pubmed-10168347
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10168347 2023-05-10 A benchmark study on current GWAS models in admixed populations Yang, Zikun Huaman, Basilio Cieza Reyes-Dumeyer, Dolly Montesinos, Rosa Soto-Añari, Marcio Custodio, Nilton Tosto, Giuseppe bioRxiv Article Cold Spring Harbor Laboratory 2023-04-30 /pmc/articles/PMC10168347/ /pubmed/37163101 http://dx.doi.org/10.1101/2023.04.27.538299 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title A benchmark study on current GWAS models in admixed populations
topic Article