A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies

BACKGROUND: Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-se...

Bibliographic Details
Main Authors:	Shao, Zhonghe; Wang, Ting; Qiao, Jiahao; Zhang, Yuchen; Huang, Shuiping; Zeng, Ping
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9429742/ https://www.ncbi.nlm.nih.gov/pubmed/36042399 http://dx.doi.org/10.1186/s12859-022-04897-3

author	Shao, Zhonghe Wang, Ting Qiao, Jiahao Zhang, Yuchen Huang, Shuiping Zeng, Ping
author_sort	Shao, Zhonghe
collection	PubMed
description	BACKGROUND: Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-set methods, few comprehensive comparison studies have previously been performed to evaluate the effectiveness of these methods. RESULTS: We sought to fill this knowledge gap by conducting a comprehensive empirical comparison of 22 commonly used summary-statistics-based SNP-set methods. We showed that only seven methods could effectively control the type I error, and that these well-calibrated approaches had varying power under the simulation scenarios. Overall, we confirmed that the burden test was generally underpowered and that score-based variance component tests (e.g., sequence kernel association test) were much more powerful under the polygenic genetic architecture in both common and rare variant association analyses. We further revealed that two linkage-disequilibrium-free P value combination methods (e.g., harmonic mean P value method and aggregated Cauchy association test) behaved very well under the sparse genetic architecture in simulations and real-data applications to common and rare variant association analyses as well as in expression quantitative trait loci weighted integrative analysis. We also assessed the scalability of these approaches by recording computational time and found that all these methods can scale to biobank-scale data, although some might be relatively slow. CONCLUSION: We hope that our findings can offer important guidance on how to choose appropriate multilocus association analysis methods in the post-GWAS era. All the SNP-set methods are implemented in the R package called MCA, which is freely available at https://github.com/biostatpzeng/. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04897-3.
format	Online Article Text
id	pubmed-9429742
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9429742 2022-09-01 BMC Bioinformatics Research BioMed Central 2022-08-30 /pmc/articles/PMC9429742/ /pubmed/36042399 http://dx.doi.org/10.1186/s12859-022-04897-3 Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9429742/ https://www.ncbi.nlm.nih.gov/pubmed/36042399 http://dx.doi.org/10.1186/s12859-022-04897-3
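
The description above names two linkage-disequilibrium-free P value combination methods, the aggregated Cauchy association test (ACAT) and the harmonic mean P value (HMP). As a rough illustration of how such combinations work, the sketch below applies the commonly published formulas to a list of per-SNP P values. The function names `acat` and `harmonic_mean_p` and the equal default weights are assumptions made for this example; this is not the interface of the MCA package, and the harmonic mean returned here is the raw statistic without the calibration step used in formal HMP testing.

```python
import math


def _check(pvalues, weights):
    """Validate inputs and return weights normalised to sum to 1."""
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues):
        raise ValueError("weights and p-values must have the same length")
    total = sum(weights)
    return [w / total for w in weights]


def acat(pvalues, weights=None):
    """Combine p-values with the Cauchy transform (ACAT-style sketch).

    Each p-value is mapped to tan((0.5 - p) * pi), which is standard
    Cauchy under the null; the weighted sum is mapped back to a p-value.
    """
    w = _check(pvalues, weights)
    stat = sum(wi * math.tan((0.5 - p) * math.pi) for wi, p in zip(w, pvalues))
    return 0.5 - math.atan(stat) / math.pi


def harmonic_mean_p(pvalues, weights=None):
    """Return the weighted harmonic mean of the p-values (uncalibrated)."""
    w = _check(pvalues, weights)
    return 1.0 / sum(wi / p for wi, p in zip(w, pvalues))


if __name__ == "__main__":
    # Hypothetical SNP set with one strong signal among null-like SNPs.
    snp_pvalues = [1e-6, 0.5, 0.6, 0.7]
    print("ACAT:", acat(snp_pvalues))
    print("HMP (raw):", harmonic_mean_p(snp_pvalues))
```

Both statistics are dominated by the smallest P values in the set, which is consistent with the description's remark that these methods behave well when only a few SNPs in a gene carry signal. Neither needs a linkage disequilibrium matrix as input.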
