BIGwas: Single-command quality control and association testing for multi-cohort and biobank-scale GWAS/PheWAS data

BACKGROUND: Genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) involving 1 million GWAS samples from dozens of population-based biobanks present a considerable computational challenge and are carried out by large scientific groups under great expenditure of time and...

Bibliographic Details
Main authors: Kässens, Jan Christian; Wienbrandt, Lars; Ellinghaus, David
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8239664/
https://www.ncbi.nlm.nih.gov/pubmed/34184051
http://dx.doi.org/10.1093/gigascience/giab047
author Kässens, Jan Christian
Wienbrandt, Lars
Ellinghaus, David
author_facet Kässens, Jan Christian
Wienbrandt, Lars
Ellinghaus, David
author_sort Kässens, Jan Christian
collection PubMed
description BACKGROUND: Genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) involving 1 million GWAS samples from dozens of population-based biobanks present a considerable computational challenge and are carried out by large scientific groups under great expenditure of time and personnel. Automating these processes requires highly efficient and scalable methods and software, but so far there is no workflow solution to easily process 1 million GWAS samples. RESULTS: Here we present BIGwas, a portable, fully automated quality control and association testing pipeline for large-scale binary and quantitative trait GWAS data provided by biobank resources. By using Nextflow workflow and Singularity software container technology, BIGwas performs resource-efficient and reproducible analyses on a local computer or any high-performance compute (HPC) system with just 1 command, with no need to manually install a software execution environment or various software packages. For a single-command GWAS analysis with 974,818 individuals and 92 million genetic markers, BIGwas takes ∼16 days on a small HPC system with only 7 compute nodes to perform a complete GWAS QC and association analysis protocol. Our dynamic parallelization approach enables shorter runtimes for large HPCs. CONCLUSIONS: Researchers without extensive bioinformatics knowledge and with few computer resources can use BIGwas to perform multi-cohort GWAS with 1 million GWAS samples and, if desired, use it to build their own (genome-wide) PheWAS resource. BIGwas is freely available for download from http://github.com/ikmb/gwas-qc and http://github.com/ikmb/gwas-assoc.
format Online
Article
Text
id pubmed-8239664
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8239664 2021-06-29 BIGwas: Single-command quality control and association testing for multi-cohort and biobank-scale GWAS/PheWAS data Kässens, Jan Christian Wienbrandt, Lars Ellinghaus, David Gigascience Technical Note BACKGROUND: Genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) involving 1 million GWAS samples from dozens of population-based biobanks present a considerable computational challenge and are carried out by large scientific groups under great expenditure of time and personnel. Automating these processes requires highly efficient and scalable methods and software, but so far there is no workflow solution to easily process 1 million GWAS samples. RESULTS: Here we present BIGwas, a portable, fully automated quality control and association testing pipeline for large-scale binary and quantitative trait GWAS data provided by biobank resources. By using Nextflow workflow and Singularity software container technology, BIGwas performs resource-efficient and reproducible analyses on a local computer or any high-performance compute (HPC) system with just 1 command, with no need to manually install a software execution environment or various software packages. For a single-command GWAS analysis with 974,818 individuals and 92 million genetic markers, BIGwas takes ∼16 days on a small HPC system with only 7 compute nodes to perform a complete GWAS QC and association analysis protocol. Our dynamic parallelization approach enables shorter runtimes for large HPCs. CONCLUSIONS: Researchers without extensive bioinformatics knowledge and with few computer resources can use BIGwas to perform multi-cohort GWAS with 1 million GWAS samples and, if desired, use it to build their own (genome-wide) PheWAS resource. BIGwas is freely available for download from http://github.com/ikmb/gwas-qc and http://github.com/ikmb/gwas-assoc.
Oxford University Press 2021-06-29 /pmc/articles/PMC8239664/ /pubmed/34184051 http://dx.doi.org/10.1093/gigascience/giab047 Text en © The Author(s) 2021. Published by Oxford University Press GigaScience. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
Kässens, Jan Christian
Wienbrandt, Lars
Ellinghaus, David
BIGwas: Single-command quality control and association testing for multi-cohort and biobank-scale GWAS/PheWAS data
title BIGwas: Single-command quality control and association testing for multi-cohort and biobank-scale GWAS/PheWAS data
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8239664/
https://www.ncbi.nlm.nih.gov/pubmed/34184051
http://dx.doi.org/10.1093/gigascience/giab047
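
The runtime figures quoted in the description (974,818 individuals, 92 million markers, about 16 days, 7 compute nodes) imply a rough end-to-end throughput. The Python sketch below only does that arithmetic. It assumes the rounded values from the abstract, treats the 16 days as wall-clock time for the whole QC and association protocol, and uses a hypothetical helper name (`implied_throughput`) that is not part of BIGwas.

```python
# Back-of-envelope throughput implied by the figures in the abstract.
# All inputs are the rounded values quoted there; results are rough estimates.

SAMPLES = 974_818        # individuals in the single-command GWAS run
MARKERS = 92_000_000     # "92 million" genetic markers (rounded in the abstract)
DAYS = 16                # "~16 days" (approximate wall-clock time)
NODES = 7                # compute nodes of the small HPC system


def implied_throughput(samples, markers, days, nodes):
    """Return (total genotypes, genotypes per second, genotypes per second per node).

    Hypothetical helper for illustration only; it is not a BIGwas function.
    """
    total = samples * markers            # sample-by-marker genotype count
    seconds = days * 24 * 60 * 60        # wall-clock seconds
    per_second = total / seconds         # whole-system rate
    per_node = per_second / nodes        # average rate per node
    return total, per_second, per_node


total, per_second, per_node = implied_throughput(SAMPLES, MARKERS, DAYS, NODES)
print(f"total genotypes:        {total:.3e}")
print(f"genotypes/s (system):   {per_second:.3e}")
print(f"genotypes/s (per node): {per_node:.3e}")
```

Because the 16 days cover quality control as well as association testing, these rates describe the whole protocol and should not be read as the speed of the association step alone.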