The K-mer antibiotic resistance gene variant analyzer (KARGVA)
Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. C...
Main authors: | Marini, Simone; Boucher, Christina; Noyes, Noelle; Prosperi, Mattia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10027697/ https://www.ncbi.nlm.nih.gov/pubmed/36960290 http://dx.doi.org/10.3389/fmicb.2023.1060891 |
_version_ | 1784909762615312384 |
---|---|
author | Marini, Simone Boucher, Christina Noyes, Noelle Prosperi, Mattia |
author_facet | Marini, Simone Boucher, Christina Noyes, Noelle Prosperi, Mattia |
author_sort | Marini, Simone |
collection | PubMed |
description | Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, only a few tools are able to identify ARGVs from metagenomics and high-throughput sequencing data, and they have a number of limitations (e.g., pre-assembly, a posteriori verification of mutations, or specification of species). In this work we present the k-mer (i.e., strings of fixed length k) ARGV analyzer, KARGVA, an open-source, multi-platform tool that provides: (i) an ad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based, k-mer search setup to process data efficiently, linking k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in the presence of high sequencing errors or mutations (99.2 and 86.6% accuracy within 1 and 5% base change rates, respectively), and genome rearrangements (98.2% accuracy), with robust performance on ad hoc false positive sets. On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated to ARGs, but ecological characteristics do not explain ARGV variance well. KARGVA is publicly available at https://github.com/DataIntellSystLab/KARGVA under MIT license. |
format | Online Article Text |
id | pubmed-10027697 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-100276972023-03-22 The K-mer antibiotic resistance gene variant analyzer (KARGVA) Marini, Simone Boucher, Christina Noyes, Noelle Prosperi, Mattia Front Microbiol Microbiology Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, only a few tools are able to identify ARGVs from metagenomics and high-throughput sequencing data, and they have a number of limitations (e.g., pre-assembly, a posteriori verification of mutations, or specification of species). In this work we present the k-mer (i.e., strings of fixed length k) ARGV analyzer, KARGVA, an open-source, multi-platform tool that provides: (i) an ad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based, k-mer search setup to process data efficiently, linking k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in the presence of high sequencing errors or mutations (99.2 and 86.6% accuracy within 1 and 5% base change rates, respectively), and genome rearrangements (98.2% accuracy), with robust performance on ad hoc false positive sets. On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated to ARGs, but ecological characteristics do not explain ARGV variance well. KARGVA is publicly available at https://github.com/DataIntellSystLab/KARGVA under MIT license. Frontiers Media S.A. 2023-03-07 /pmc/articles/PMC10027697/ /pubmed/36960290 http://dx.doi.org/10.3389/fmicb.2023.1060891 Text en Copyright © 2023 Marini, Boucher, Noyes and Prosperi. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Microbiology Marini, Simone Boucher, Christina Noyes, Noelle Prosperi, Mattia The K-mer antibiotic resistance gene variant analyzer (KARGVA) |
title | The K-mer antibiotic resistance gene variant analyzer (KARGVA) |
title_full | The K-mer antibiotic resistance gene variant analyzer (KARGVA) |
title_fullStr | The K-mer antibiotic resistance gene variant analyzer (KARGVA) |
title_full_unstemmed | The K-mer antibiotic resistance gene variant analyzer (KARGVA) |
title_short | The K-mer antibiotic resistance gene variant analyzer (KARGVA) |
title_sort | k-mer antibiotic resistance gene variant analyzer (kargva) |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10027697/ https://www.ncbi.nlm.nih.gov/pubmed/36960290 http://dx.doi.org/10.3389/fmicb.2023.1060891 |
work_keys_str_mv | AT marinisimone thekmerantibioticresistancegenevariantanalyzerkargva AT boucherchristina thekmerantibioticresistancegenevariantanalyzerkargva AT noyesnoelle thekmerantibioticresistancegenevariantanalyzerkargva AT prosperimattia thekmerantibioticresistancegenevariantanalyzerkargva AT marinisimone kmerantibioticresistancegenevariantanalyzerkargva AT boucherchristina kmerantibioticresistancegenevariantanalyzerkargva AT noyesnoelle kmerantibioticresistancegenevariantanalyzerkargva AT prosperimattia kmerantibioticresistancegenevariantanalyzerkargva |
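
The description field in this record mentions a three-way, hash-based k-mer search that links k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers. The record gives no implementation details, so the following is only a minimal sketch of how three such hash maps might be built and queried. It is not KARGVA's code: the function names (`kmers`, `build_index`, `score_read`), the toy sequence, the input layout, and the scoring by k-mer fraction are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return every substring of length k in seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(variants, k):
    """Build three hash maps from a toy variant table (hypothetical layout).

    variants maps variant_id -> (sequence, [0-based mutation positions]).
    """
    kmer_to_variants = defaultdict(set)   # k-mer -> variant ids containing it
    kmer_to_mutations = defaultdict(set)  # k-mer -> (variant id, position) it spans
    variant_to_kmers = defaultdict(set)   # variant id -> its distinct k-mers
    for vid, (seq, positions) in variants.items():
        for start, kmer in enumerate(kmers(seq, k)):
            kmer_to_variants[kmer].add(vid)
            variant_to_kmers[vid].add(kmer)
            for pos in positions:
                # The k-mer covers the mutation if pos lies inside its window.
                if start <= pos < start + k:
                    kmer_to_mutations[kmer].add((vid, pos))
    return kmer_to_variants, kmer_to_mutations, variant_to_kmers


def score_read(read, index, k):
    """Report, per variant, the fraction of its k-mers found in the read
    and which of its mutation positions the read's k-mers cover."""
    kmer_to_variants, kmer_to_mutations, variant_to_kmers = index
    hits = defaultdict(int)
    mutation_hits = defaultdict(set)
    for kmer in set(kmers(read, k)):
        for vid in kmer_to_variants.get(kmer, ()):
            hits[vid] += 1
        for vid, pos in kmer_to_mutations.get(kmer, ()):
            mutation_hits[vid].add(pos)
    return {
        vid: {
            "kmer_fraction": n / len(variant_to_kmers[vid]),
            "mutations_covered": sorted(mutation_hits[vid]),
        }
        for vid, n in hits.items()
    }


# Toy data: one made-up variant with a single mutation at position 4.
toy_variants = {"toyA": ("ACGTTGCA", [4])}
toy_index = build_index(toy_variants, k=4)
print(score_read("GTTGCA", toy_index, k=4))
```

In this sketch a read is only interesting when it both shares k-mers with a variant and covers the mutated position; a read such as `"ACGT"` matches the toy variant but covers no mutation. Any thresholding on these values, such as the statistical filter the description refers to, is left out because the record does not specify it.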