CALDERA: finding all significant de Bruijn subgraphs for bacterial GWAS

MOTIVATION: Genome-wide association studies (GWAS), aiming to find genetic variants associated with a trait, have been widely used in bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can d...

Bibliographic Details
Main authors: Roux de Bézieux, Hector; Lima, Leandro; Perraudeau, Fanny; Mary, Arnaud; Dudoit, Sandrine; Jacob, Laurent
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235473/
https://www.ncbi.nlm.nih.gov/pubmed/35758804
http://dx.doi.org/10.1093/bioinformatics/btac238
author Roux de Bézieux, Hector
Lima, Leandro
Perraudeau, Fanny
Mary, Arnaud
Dudoit, Sandrine
Jacob, Laurent
collection PubMed
description MOTIVATION: Genome-wide association studies (GWAS), aiming to find genetic variants associated with a trait, have been widely used in bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single-nucleotide polymorphisms to mobile genetic elements. This approach does not require a reference genome, making it easier to account for accessory genes. However, the same gene can exist in slightly different versions across different strains, leading to diluted effects. RESULTS: Here, we overcome this issue by testing covariates built from closed connected subgraphs (CCSs) of the de Bruijn graph defined over genomic k-mers. These covariates capture polymorphic genes as a single entity, improving k-mer-based GWAS both in terms of power and interpretability. However, a method naively testing all possible subgraphs would be powerless due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypotheses has successfully been used to address both problems in similar contexts. We leverage this concept to test all CCSs by proposing a novel enumeration scheme for these objects that fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. Our method integrates with existing visual tools to facilitate interpretation. AVAILABILITY AND IMPLEMENTATION: We provide an implementation of our method, as well as code to reproduce all results, at https://github.com/HectorRDB/Caldera_ISMB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9235473
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9235473 2022-06-29 CALDERA: finding all significant de Bruijn subgraphs for bacterial GWAS Roux de Bézieux, Hector; Lima, Leandro; Perraudeau, Fanny; Mary, Arnaud; Dudoit, Sandrine; Jacob, Laurent Bioinformatics ISCB/Ismb 2022 Oxford University Press 2022-06-27 /pmc/articles/PMC9235473/ /pubmed/35758804 http://dx.doi.org/10.1093/bioinformatics/btac238 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CALDERA: finding all significant de Bruijn subgraphs for bacterial GWAS
topic ISCB/Ismb 2022