A benchmark study on current GWAS models in admixed populations
OBJECTIVE: The performance of popular genome-wide association study (GWAS) models has not yet been examined consistently under genetic admixture, which introduces several challenges, such as heterogeneity of minor allele frequency (MAF), a wide spectrum of case-con...
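Main authors: | Yang, Zikun; Huaman, Basilio Cieza; Reyes-Dumeyer, Dolly; Montesinos, Rosa; Soto-Añari, Marcio; Custodio, Nilton; Tosto, Giuseppe |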
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168347/ https://www.ncbi.nlm.nih.gov/pubmed/37163101 http://dx.doi.org/10.1101/2023.04.27.538299 |
_version_ | 1785038838731636736 |
---|---|
author | Yang, Zikun; Huaman, Basilio Cieza; Reyes-Dumeyer, Dolly; Montesinos, Rosa; Soto-Añari, Marcio; Custodio, Nilton; Tosto, Giuseppe |
author_facet | Yang, Zikun; Huaman, Basilio Cieza; Reyes-Dumeyer, Dolly; Montesinos, Rosa; Soto-Añari, Marcio; Custodio, Nilton; Tosto, Giuseppe |
author_sort | Yang, Zikun |
collection | PubMed |
description | OBJECTIVE: The performance of popular genome-wide association study (GWAS) models has not yet been examined consistently under genetic admixture, which introduces several challenges, such as heterogeneity of minor allele frequency (MAF), a wide spectrum of case-control ratios, and varying effect sizes. METHODS: We generated a cohort of synthetic individuals (N=19,234) that simulates 1) a large sample size; 2) two-way admixture [Native American-European ancestry]; and 3) a binary phenotype. We then examined the inflation factors produced by three popular GWAS tools: GMMAT, SAIGE, and Tractor. We also performed power calculations under different MAFs, case-control ratios, and ancestry percentages. Next, we used a cohort of Peruvians (N=249) to further examine the performance of the tested models on 1) real genetic data and 2) small sample sizes. Finally, we validated these findings using an independent Peruvian cohort (N=109) included in the 1000 Genomes Project (1000G). RESULTS: In the synthetic cohort, SAIGE performed better than GMMAT and Tractor in terms of type-I error rate, especially under severely unbalanced case-control ratios. In contrast, power analysis identified Tractor as the best method for pinpointing ancestry-specific causal variants, but Tractor showed decreased power when no adequate heterogeneity of the true effect sizes was simulated between ancestries. The real Peruvian data showed that Tractor is severely affected by small sample sizes and produced severely inflated statistics, a finding we replicated in the 1000G Peruvian cohort. DISCUSSION: The current study illustrates the limitations of available GWAS tools under different scenarios of genetic admixture. We urge caution when interpreting results under complex population scenarios. |
format | Online Article Text |
id | pubmed-10168347 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10168347 2023-05-10 A benchmark study on current GWAS models in admixed populations Yang, Zikun Huaman, Basilio Cieza Reyes-Dumeyer, Dolly Montesinos, Rosa Soto-Añari, Marcio Custodio, Nilton Tosto, Giuseppe bioRxiv Article OBJECTIVE: The performance of popular genome-wide association study (GWAS) models has not yet been examined consistently under genetic admixture, which introduces several challenges, such as heterogeneity of minor allele frequency (MAF), a wide spectrum of case-control ratios, and varying effect sizes. METHODS: We generated a cohort of synthetic individuals (N=19,234) that simulates 1) a large sample size; 2) two-way admixture [Native American-European ancestry]; and 3) a binary phenotype. We then examined the inflation factors produced by three popular GWAS tools: GMMAT, SAIGE, and Tractor. We also performed power calculations under different MAFs, case-control ratios, and ancestry percentages. Next, we used a cohort of Peruvians (N=249) to further examine the performance of the tested models on 1) real genetic data and 2) small sample sizes. Finally, we validated these findings using an independent Peruvian cohort (N=109) included in the 1000 Genomes Project (1000G). RESULTS: In the synthetic cohort, SAIGE performed better than GMMAT and Tractor in terms of type-I error rate, especially under severely unbalanced case-control ratios. In contrast, power analysis identified Tractor as the best method for pinpointing ancestry-specific causal variants, but Tractor showed decreased power when no adequate heterogeneity of the true effect sizes was simulated between ancestries. The real Peruvian data showed that Tractor is severely affected by small sample sizes and produced severely inflated statistics, a finding we replicated in the 1000G Peruvian cohort. DISCUSSION: The current study illustrates the limitations of available GWAS tools under different scenarios of genetic admixture. We urge caution when interpreting results under complex population scenarios. Cold Spring Harbor Laboratory 2023-04-30 /pmc/articles/PMC10168347/ /pubmed/37163101 http://dx.doi.org/10.1101/2023.04.27.538299 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
spellingShingle | Article Yang, Zikun Huaman, Basilio Cieza Reyes-Dumeyer, Dolly Montesinos, Rosa Soto-Añari, Marcio Custodio, Nilton Tosto, Giuseppe A benchmark study on current GWAS models in admixed populations |
title | A benchmark study on current GWAS models in admixed populations |
title_full | A benchmark study on current GWAS models in admixed populations |
title_fullStr | A benchmark study on current GWAS models in admixed populations |
title_full_unstemmed | A benchmark study on current GWAS models in admixed populations |
title_short | A benchmark study on current GWAS models in admixed populations |
title_sort | benchmark study on current gwas models in admixed populations |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168347/ https://www.ncbi.nlm.nih.gov/pubmed/37163101 http://dx.doi.org/10.1101/2023.04.27.538299 |
work_keys_str_mv | AT yangzikun abenchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT huamanbasiliocieza abenchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT reyesdumeyerdolly abenchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT montesinosrosa abenchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT sotoanarimarcio abenchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT custodionilton abenchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT tostogiuseppe abenchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT yangzikun benchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT huamanbasiliocieza benchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT reyesdumeyerdolly benchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT montesinosrosa benchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT sotoanarimarcio benchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT custodionilton benchmarkstudyoncurrentgwasmodelsinadmixedpopulations AT tostogiuseppe benchmarkstudyoncurrentgwasmodelsinadmixedpopulations |