CALDERA: finding all significant de Bruijn subgraphs for bacterial GWAS
Main authors: | Roux de Bézieux, Hector; Lima, Leandro; Perraudeau, Fanny; Mary, Arnaud; Dudoit, Sandrine; Jacob, Laurent |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235473/ https://www.ncbi.nlm.nih.gov/pubmed/35758804 http://dx.doi.org/10.1093/bioinformatics/btac238 |
Field | Value |
---|---|
author | Roux de Bézieux, Hector; Lima, Leandro; Perraudeau, Fanny; Mary, Arnaud; Dudoit, Sandrine; Jacob, Laurent |
collection | PubMed |
description | MOTIVATION: Genome-wide association studies (GWAS), aiming to find genetic variants associated with a trait, have been widely used in bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single-nucleotide polymorphisms to mobile genetic elements. This approach does not require a reference genome, making it easier to account for accessory genes. However, the same gene can exist in slightly different versions across strains, leading to diluted effects. RESULTS: Here, we overcome this issue by testing covariates built from closed connected subgraphs (CCSs) of the de Bruijn graph defined over genomic k-mers. These covariates capture polymorphic genes as a single entity, improving k-mer-based GWAS in terms of both power and interpretability. However, a method naively testing all possible subgraphs would be powerless due to multiple testing corrections, and merely exploring these subgraphs would quickly become computationally intractable. The concept of testable hypotheses has been used successfully to address both problems in similar contexts. We leverage this concept to test all CCSs by proposing a novel enumeration scheme for these objects that fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. Our method integrates with existing visual tools to facilitate interpretation. AVAILABILITY AND IMPLEMENTATION: We provide an implementation of our method, as well as code to reproduce all results, at https://github.com/HectorRDB/Caldera_ISMB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9235473 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
source | Bioinformatics, ISCB/ISMB 2022; Oxford University Press, published 2022-06-27 |
rights | © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | CALDERA: finding all significant de Bruijn subgraphs for bacterial GWAS |
topic | ISCB/ISMB 2022 |
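
The description above builds GWAS covariates from subgraphs of a de Bruijn graph over genomic k-mers, so that a polymorphic gene is tested as a single entity. The toy sketch below illustrates why that helps: two alleles differing by one SNP produce distinct k-mers, yet those k-mers lie in the same connected subgraph. It rests on several assumptions that are ours, not the paper's: forward-strand k-mers only, an undirected graph, and a hypothetical aggregation rule where a subgraph's covariate is 1 if a genome contains any of its k-mers. For simplicity it takes the whole connected component of a seed k-mer instead of enumerating closed connected subgraphs (CCSs) as CALDERA does. All function names are illustrative.

```python
from collections import defaultdict, deque


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of k-mers of `seq` (forward strand only; real pipelines
    usually also merge each k-mer with its reverse complement)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def de_bruijn_graph(all_kmers: set[str]) -> dict[str, set[str]]:
    """Undirected node-centric de Bruijn graph: k-mers u and v are adjacent
    when the (k-1)-suffix of u equals the (k-1)-prefix of v."""
    by_prefix: dict[str, set[str]] = defaultdict(set)
    for km in all_kmers:
        by_prefix[km[:-1]].add(km)
    adj: dict[str, set[str]] = {km: set() for km in all_kmers}
    for km in all_kmers:
        for nxt in by_prefix.get(km[1:], set()):
            if nxt != km:  # ignore self-loops such as AAA -> AAA
                adj[km].add(nxt)
                adj[nxt].add(km)
    return adj


def connected_subgraph(adj: dict[str, set[str]], seed: str) -> set[str]:
    """Breadth-first search returning every k-mer reachable from `seed`.
    Simplification: this is the full connected component, not a CCS."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen


def presence_covariate(per_genome: dict[str, set[str]], subgraph: set[str]) -> dict[str, int]:
    """Hypothetical aggregation rule: 1 if the genome carries any k-mer of the subgraph."""
    return {g: int(bool(ks & subgraph)) for g, ks in per_genome.items()}


# Toy data: g1 and g2 carry two alleles of one gene (T vs A at the fourth base);
# g3 is unrelated.
genomes = {"g1": "ACGTTGCA", "g2": "ACGATGCA", "g3": "CCCCC"}
per_genome = {g: kmers(s, 3) for g, s in genomes.items()}
graph = de_bruijn_graph(set().union(*per_genome.values()))
component = connected_subgraph(graph, "CGT")

print(presence_covariate(per_genome, {"CGT"}))   # → {'g1': 1, 'g2': 0, 'g3': 0}
print(presence_covariate(per_genome, component))  # → {'g1': 1, 'g2': 1, 'g3': 0}
```

The allele-specific k-mer `CGT` marks only `g1`, so its effect is split across alleles; the aggregated covariate marks both carriers of the gene. Real de Bruijn graphs are far larger and branchier, which is why a whole connected component is usually too coarse and why the choice of subgraph matters.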
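
The description above relies on the concept of testable hypotheses to keep the multiple-testing burden manageable. A classical formalization is Tarone's procedure: each hypothesis has a minimum attainable p-value that depends only on the table margins, hypotheses that could never reach significance are discarded, and the Bonferroni-style correction counts only the remaining ones. The sketch below assumes a two-sided Fisher's exact test of a binary presence/absence covariate against a binary trait; the test, the function names, and the toy sample sizes are assumptions for illustration, and this is not CALDERA's implementation (the repository linked in the record provides that).

```python
from fractions import Fraction
from math import comb


def fisher_pmf_table(n: int, n_cases: int, support: int) -> dict[int, Fraction]:
    """Exact hypergeometric probability of each attainable count `a` of cases
    among the `support` genomes carrying the covariate (all margins fixed)."""
    lo = max(0, support - (n - n_cases))
    hi = min(support, n_cases)
    total = comb(n, support)
    return {a: Fraction(comb(n_cases, a) * comb(n - n_cases, support - a), total)
            for a in range(lo, hi + 1)}


def fisher_two_sided_p(a: int, pmf: dict[int, Fraction]) -> Fraction:
    """Two-sided Fisher p-value: total probability of tables no more likely than `a`."""
    return sum((p for p in pmf.values() if p <= pmf[a]), Fraction(0))


def min_attainable_p(n: int, n_cases: int, support: int) -> Fraction:
    """Smallest p-value the covariate could reach under any labelling.
    It depends only on n, n_cases and the covariate's support."""
    pmf = fisher_pmf_table(n, n_cases, support)
    return min(fisher_two_sided_p(a, pmf) for a in pmf)


def tarone(min_pvalues: list[Fraction], alpha: float = 0.05) -> tuple[int, list[int]]:
    """Tarone's procedure: find the smallest k such that at most k hypotheses have a
    minimum attainable p-value <= alpha / k. Testing only those, each at level
    alpha / k, controls the family-wise error rate at alpha."""
    k = 1
    while True:
        testable = [i for i, psi in enumerate(min_pvalues) if psi <= alpha / k]
        if len(testable) <= k:
            return k, testable
        k += 1


# 10 genomes, 5 with the trait; one hypothetical covariate per support value 0..10.
n, n_cases = 10, 5
psis = [min_attainable_p(n, n_cases, x) for x in range(n + 1)]
k, testable = tarone(psis, alpha=0.05)
print(k, testable)  # → 2 [5]
```

Only the covariate present in exactly half of the genomes survives here, and it is tested at level 0.05 / 2 rather than 0.05 / 11. Because the minimum attainable p-value needs no association statistic, an enumeration can discard untestable candidates cheaply; how CALDERA turns this into pruning over CCSs is described in the paper, not in this sketch.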