
PowerBacGWAS: a computational pipeline to perform power calculations for bacterial genome-wide association studies

Genome-wide association studies (GWAS) are increasingly being applied to investigate the genetic basis of bacterial traits. However, approaches to perform power calculations for bacterial GWAS are limited. Here we implemented two alternative approaches to conduct power calculations using existing co...


Bibliographic Details
Main authors: Coll, Francesc; Gouliouris, Theodore; Bruchmann, Sebastian; Phelan, Jody; Raven, Kathy E.; Clark, Taane G.; Parkhill, Julian; Peacock, Sharon J.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8956664/
https://www.ncbi.nlm.nih.gov/pubmed/35338232
http://dx.doi.org/10.1038/s42003-022-03194-2
author Coll, Francesc
Gouliouris, Theodore
Bruchmann, Sebastian
Phelan, Jody
Raven, Kathy E.
Clark, Taane G.
Parkhill, Julian
Peacock, Sharon J.
collection PubMed
description Genome-wide association studies (GWAS) are increasingly being applied to investigate the genetic basis of bacterial traits. However, approaches to perform power calculations for bacterial GWAS are limited. Here we implemented two alternative approaches to conduct power calculations using existing collections of bacterial genomes. First, a sub-sampling approach was undertaken to reduce the allele frequency and effect size of a known and detectable genotype-phenotype relationship by modifying phenotype labels. Second, a phenotype-simulation approach was conducted to simulate phenotypes from existing genetic variants. We implemented both approaches in a computational pipeline (PowerBacGWAS) that supports power calculations for burden testing, pan-genome and variant GWAS, and applied it to collections of Enterococcus faecium, Klebsiella pneumoniae and Mycobacterium tuberculosis. We used this pipeline to determine the sample sizes required to detect causal variants of different minor allele frequencies (MAF), effect sizes and phenotype heritability, and studied the effect of homoplasy and population diversity on the power to detect causal variants. Our pipeline and user documentation are made available and can be applied to other bacterial populations. PowerBacGWAS can be used to determine the sample sizes required to find statistically significant associations, or the associations detectable with a given sample size. We recommend performing power calculations using existing genomes of the bacterial species and population of study.
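The phenotype-simulation idea described above can be illustrated with a minimal sketch. It is not the PowerBacGWAS implementation: it assumes a single binary causal variant, a binary phenotype, an unadjusted chi-square test with no correction for population structure, and hypothetical names and defaults (`estimate_power`, `baseline_prev`, `alpha`, `n_reps`). Power is estimated as the fraction of simulated replicates in which the causal variant reaches the significance threshold.

```python
import math
import random


def chi2_pvalue_2x2(a, b, c, d):
    """P-value of a 1-df chi-square test on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 1.0  # degenerate table: treat as no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def simulate_phenotype(genotypes, odds_ratio, baseline_prev, rng):
    """Draw a binary phenotype per isolate; carriers have their odds scaled."""
    base_odds = baseline_prev / (1.0 - baseline_prev)
    carrier_odds = base_odds * odds_ratio
    p_carrier = carrier_odds / (1.0 + carrier_odds)
    return [
        1 if rng.random() < (p_carrier if g else baseline_prev) else 0
        for g in genotypes
    ]


def estimate_power(genotypes, sample_size, odds_ratio,
                   baseline_prev=0.2, alpha=1e-6, n_reps=200, seed=0):
    """Fraction of replicates in which the causal variant has p < alpha.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # Sub-sample isolates from the existing collection without replacement.
        sample = rng.sample(genotypes, sample_size)
        pheno = simulate_phenotype(sample, odds_ratio, baseline_prev, rng)
        a = sum(1 for g, y in zip(sample, pheno) if g and y)          # carrier, case
        b = sum(1 for g, y in zip(sample, pheno) if g and not y)      # carrier, control
        c = sum(1 for g, y in zip(sample, pheno) if not g and y)      # non-carrier, case
        d = sum(1 for g, y in zip(sample, pheno) if not g and not y)  # non-carrier, control
        if chi2_pvalue_2x2(a, b, c, d) < alpha:
            hits += 1
    return hits / n_reps


# Hypothetical collection: 2000 isolates, causal allele carried by 20% of them.
genotypes = [1] * 400 + [0] * 1600
for n in (100, 500, 1000):
    print(n, estimate_power(genotypes, n, odds_ratio=3.0))
```

Repeating the call over a grid of sample sizes, allele frequencies and odds ratios gives a rough power curve. A real bacterial GWAS would also need to account for population structure, which this sketch deliberately leaves out.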
format Online
Article
Text
id pubmed-8956664
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8956664 2022-04-20 Commun Biol Article
Nature Publishing Group UK 2022-03-25 /pmc/articles/PMC8956664/ /pubmed/35338232 http://dx.doi.org/10.1038/s42003-022-03194-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title PowerBacGWAS: a computational pipeline to perform power calculations for bacterial genome-wide association studies
topic Article