Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel
Hierarchical genotyping approaches can provide insights into the source, geography and temporal distribution of bacterial pathogens. Multiple hierarchical SNP genotyping schemes have previously been developed so that new isolates can rapidly be placed within pre-computed population structures, witho...
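Main Authors: | Labbé, Geneviève; Kruczkiewicz, Peter; Robertson, James; Mabon, Philip; Schonfeld, Justin; Kein, Daniel; Rankin, Marisa A.; Gopez, Matthew; Hole, Darian; Son, David; Knox, Natalie; Laing, Chad R.; Bessonov, Kyrylo; Taboada, Eduardo N.; Yoshida, Catherine; Ziebell, Kim; Nichani, Anil; Johnson, Roger P.; Van Domselaar, Gary; Nash, John H. E. |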
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Microbiology Society, 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8715432/ https://www.ncbi.nlm.nih.gov/pubmed/34554082 http://dx.doi.org/10.1099/mgen.0.000651 |
_version_ | 1784624125589848064 |
---|---|
author | Labbé, Geneviève; Kruczkiewicz, Peter; Robertson, James; Mabon, Philip; Schonfeld, Justin; Kein, Daniel; Rankin, Marisa A.; Gopez, Matthew; Hole, Darian; Son, David; Knox, Natalie; Laing, Chad R.; Bessonov, Kyrylo; Taboada, Eduardo N.; Yoshida, Catherine; Ziebell, Kim; Nichani, Anil; Johnson, Roger P.; Van Domselaar, Gary; Nash, John H. E. |
author_facet | Labbé, Geneviève; Kruczkiewicz, Peter; Robertson, James; Mabon, Philip; Schonfeld, Justin; Kein, Daniel; Rankin, Marisa A.; Gopez, Matthew; Hole, Darian; Son, David; Knox, Natalie; Laing, Chad R.; Bessonov, Kyrylo; Taboada, Eduardo N.; Yoshida, Catherine; Ziebell, Kim; Nichani, Anil; Johnson, Roger P.; Van Domselaar, Gary; Nash, John H. E. |
author_sort | Labbé, Geneviève |
collection | PubMed |
description | Hierarchical genotyping approaches can provide insights into the source, geography and temporal distribution of bacterial pathogens. Multiple hierarchical SNP genotyping schemes have previously been developed so that new isolates can rapidly be placed within pre-computed population structures, without the need to rebuild phylogenetic trees for the entire dataset. This classification approach has, however, seen limited uptake in routine public health settings due to analytical complexity and the lack of standardized tools that provide clear and easy ways to interpret results. The BioHansel tool was developed to provide an organism-agnostic tool for hierarchical SNP-based genotyping. The tool identifies split k-mers that distinguish predefined lineages in whole genome sequencing (WGS) data using SNP-based genotyping schemes. BioHansel uses the Aho-Corasick algorithm to type isolates from assembled genomes or raw read sequence data in a matter of seconds, with limited computational resources. This makes BioHansel ideal for use by public health agencies that rely on WGS methods for surveillance of bacterial pathogens. Genotyping results are evaluated using a quality assurance module which identifies problematic samples, such as low-quality or contaminated datasets. Using existing hierarchical SNP schemes for Mycobacterium tuberculosis and Salmonella Typhi, we compare the genotyping results obtained with the k-mer-based tools BioHansel and SKA, with those of the organism-specific tools TBProfiler and genotyphi, which use gold-standard reference-mapping approaches. We show that the genotyping results are fully concordant across these different methods, and that the k-mer-based tools are significantly faster. We also test the ability of the BioHansel quality assurance module to detect intra-lineage contamination and demonstrate that it is effective, even in populations with low genetic diversity. We demonstrate the scalability of the tool using a dataset of ~8100 S. Typhi public genomes and provide the aggregated results of geographical distributions as part of the tool’s output. BioHansel is an open source Python 3 application available on PyPI and Conda repositories and as a Galaxy tool from the public Galaxy Toolshed. In a public health context, BioHansel enables rapid and high-resolution classification of bacterial pathogens with low genetic diversity. |
format | Online Article Text |
id | pubmed-8715432 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Microbiology Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-8715432 2021-12-29 Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel Labbé, Geneviève Kruczkiewicz, Peter Robertson, James Mabon, Philip Schonfeld, Justin Kein, Daniel Rankin, Marisa A. Gopez, Matthew Hole, Darian Son, David Knox, Natalie Laing, Chad R. Bessonov, Kyrylo Taboada, Eduardo N. Yoshida, Catherine Ziebell, Kim Nichani, Anil Johnson, Roger P. Van Domselaar, Gary Nash, John H. E. Microb Genom Research Articles Hierarchical genotyping approaches can provide insights into the source, geography and temporal distribution of bacterial pathogens. Multiple hierarchical SNP genotyping schemes have previously been developed so that new isolates can rapidly be placed within pre-computed population structures, without the need to rebuild phylogenetic trees for the entire dataset. This classification approach has, however, seen limited uptake in routine public health settings due to analytical complexity and the lack of standardized tools that provide clear and easy ways to interpret results. The BioHansel tool was developed to provide an organism-agnostic tool for hierarchical SNP-based genotyping. The tool identifies split k-mers that distinguish predefined lineages in whole genome sequencing (WGS) data using SNP-based genotyping schemes. BioHansel uses the Aho-Corasick algorithm to type isolates from assembled genomes or raw read sequence data in a matter of seconds, with limited computational resources. This makes BioHansel ideal for use by public health agencies that rely on WGS methods for surveillance of bacterial pathogens. Genotyping results are evaluated using a quality assurance module which identifies problematic samples, such as low-quality or contaminated datasets. Using existing hierarchical SNP schemes for Mycobacterium tuberculosis and Salmonella Typhi, we compare the genotyping results obtained with the k-mer-based tools BioHansel and SKA, with those of the organism-specific tools TBProfiler and genotyphi, which use gold-standard reference-mapping approaches. We show that the genotyping results are fully concordant across these different methods, and that the k-mer-based tools are significantly faster. We also test the ability of the BioHansel quality assurance module to detect intra-lineage contamination and demonstrate that it is effective, even in populations with low genetic diversity. We demonstrate the scalability of the tool using a dataset of ~8100 S. Typhi public genomes and provide the aggregated results of geographical distributions as part of the tool’s output. BioHansel is an open source Python 3 application available on PyPI and Conda repositories and as a Galaxy tool from the public Galaxy Toolshed. In a public health context, BioHansel enables rapid and high-resolution classification of bacterial pathogens with low genetic diversity. Microbiology Society 2021-09-23 /pmc/articles/PMC8715432/ /pubmed/34554082 http://dx.doi.org/10.1099/mgen.0.000651 Text en © 2021 The Authors https://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution NonCommercial License. |
spellingShingle | Research Articles Labbé, Geneviève Kruczkiewicz, Peter Robertson, James Mabon, Philip Schonfeld, Justin Kein, Daniel Rankin, Marisa A. Gopez, Matthew Hole, Darian Son, David Knox, Natalie Laing, Chad R. Bessonov, Kyrylo Taboada, Eduardo N. Yoshida, Catherine Ziebell, Kim Nichani, Anil Johnson, Roger P. Van Domselaar, Gary Nash, John H. E. Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel |
title | Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel |
title_full | Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel |
title_fullStr | Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel |
title_full_unstemmed | Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel |
title_short | Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel |
title_sort | rapid and accurate snp genotyping of clonal bacterial pathogens with biohansel |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8715432/ https://www.ncbi.nlm.nih.gov/pubmed/34554082 http://dx.doi.org/10.1099/mgen.0.000651 |
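
The description in this record says that BioHansel uses the Aho-Corasick algorithm to find lineage-distinguishing k-mers in sequence data. As a rough illustration of that idea only, the sketch below builds a small Aho-Corasick automaton in pure Python and scans a sequence for every marker in a single pass. It is not BioHansel's code: the function names, the toy 7-mers and the lineage labels are all invented for this example, and the sketch ignores reverse complements, read quality and any quality assurance checks.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton from a {k-mer: label} mapping."""
    goto = [{}]     # goto[state][char] -> next state (the trie edges)
    fail = [0]      # failure link of each state
    output = [[]]   # labels of all patterns that end at each state

    # Phase 1: insert every pattern into a trie.
    for pattern, label in patterns.items():
        state = 0
        for char in pattern:
            if char not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][char] = len(goto) - 1
            state = goto[state][char]
        output[state].append(label)

    # Phase 2: breadth-first pass that sets the failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for char, child in goto[state].items():
            queue.append(child)
            fallback = fail[state]
            while fallback and char not in goto[fallback]:
                fallback = fail[fallback]
            fail[child] = goto[fallback].get(char, 0)
            # Inherit patterns that end at the failure target.
            output[child] = output[child] + output[fail[child]]
    return goto, fail, output


def find_matches(sequence, automaton):
    """Return (end_position, label) for every pattern found in sequence."""
    goto, fail, output = automaton
    state = 0
    hits = []
    for position, char in enumerate(sequence):
        # Follow failure links until a transition on char exists.
        while state and char not in goto[state]:
            state = fail[state]
        state = goto[state].get(char, 0)
        for label in output[state]:
            hits.append((position, label))
    return hits


# Hypothetical toy scheme: invented 7-mers, not taken from any real scheme.
TOY_SCHEME = {
    "GATTACA": "lineage 1",
    "GATCACA": "lineage 2",
    "TTGACCA": "lineage 2.1",
}

if __name__ == "__main__":
    automaton = build_automaton(TOY_SCHEME)
    print(find_matches("CCGATCACATTTTGACCAGG", automaton))
    # prints [(8, 'lineage 2'), (17, 'lineage 2.1')]
```

Because the automaton is built once from the whole marker set, the scan visits each base of the input a single time regardless of how many markers the scheme holds, which is consistent with the speed the description attributes to k-mer-based typing.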