
The K-mer antibiotic resistance gene variant analyzer (KARGVA)

Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. C...


Bibliographic Details
Main authors: Marini, Simone; Boucher, Christina; Noyes, Noelle; Prosperi, Mattia
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10027697/
https://www.ncbi.nlm.nih.gov/pubmed/36960290
http://dx.doi.org/10.3389/fmicb.2023.1060891
_version_ 1784909762615312384
author Marini, Simone
Boucher, Christina
Noyes, Noelle
Prosperi, Mattia
author_facet Marini, Simone
Boucher, Christina
Noyes, Noelle
Prosperi, Mattia
author_sort Marini, Simone
collection PubMed
description Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, there are only a few tools that are able to identify ARGVs from metagenomics and high throughput sequencing data, with a number of limitations (e.g., pre-assembly, a posteriori verification of mutations, or specification of species). In this work we present the k-mer, i.e., strings of fixed length k, ARGV analyzer – KARGVA – an open-source, multi-platform tool that provides: (i) an ad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based, k-mer search setup to process data efficiently, linking k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in presence of high sequencing errors or mutations (99.2 and 86.6% accuracy within 1 and 5% base change rates, respectively), and genome rearrangements (98.2% accuracy), with robust performance on ad hoc false positive sets. 
On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated with that of ARGs, but ecological characteristics do not explain ARGV variance well. KARGVA is publicly available at https://github.com/DataIntellSystLab/KARGVA under the MIT license.
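The hash-based k-mer search mentioned in the description can be illustrated with a minimal Python sketch. This is not KARGVA's implementation: the function names, the toy variant records, the mutation positions, and the choice of k = 4 are all assumptions made for illustration. It shows only the general idea of two hash maps, one linking k-mers to variants and one linking k-mers to the variants whose mutated position they cover. The statistical filter from the description is not modeled.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield (start, substring) for every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_index(variants, k):
    """Build two hash maps from toy variant records (hypothetical format).

    variants maps a variant id to (sequence, mutation_position).
    Returns:
      kmer -> set of variant ids containing that k-mer
      kmer -> set of variant ids whose mutated position lies inside that k-mer
    """
    kmer_to_variants = defaultdict(set)
    kmer_to_mutations = defaultdict(set)
    for vid, (seq, pos) in variants.items():
        for start, kmer in kmers(seq, k):
            kmer_to_variants[kmer].add(vid)
            if start <= pos < start + k:
                kmer_to_mutations[kmer].add(vid)
    return kmer_to_variants, kmer_to_mutations


def score_read(read, k, kmer_to_variants, kmer_to_mutations):
    """Count, per variant, all k-mer hits and the hits spanning the mutation."""
    hits = defaultdict(int)
    mutation_hits = defaultdict(int)
    for _, kmer in kmers(read, k):
        for vid in kmer_to_variants.get(kmer, ()):
            hits[vid] += 1
        for vid in kmer_to_mutations.get(kmer, ()):
            mutation_hits[vid] += 1
    return dict(hits), dict(mutation_hits)


# Toy data, invented for this sketch; not taken from any ARGV database.
variants = {
    "toyA_mut": ("ACGTTGCA", 4),  # mutated position at index 4
    "toyB_mut": ("GGGTTTCC", 2),  # mutated position at index 2
}
k = 4
kv, km = build_index(variants, k)
hits, mutation_hits = score_read("CGTTGC", k, kv, km)
print(hits, mutation_hits)
```

In this toy run, all three k-mers of the read occur in "toyA_mut" and all three span its mutated position, so both counts for that variant are 3 and "toyB_mut" receives no hits. A real tool would apply a filter to such counts before reporting a variant.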
format Online
Article
Text
id pubmed-10027697
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-100276972023-03-22 The K-mer antibiotic resistance gene variant analyzer (KARGVA) Marini, Simone Boucher, Christina Noyes, Noelle Prosperi, Mattia Front Microbiol Microbiology Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, there are only a few tools that are able to identify ARGVs from metagenomics and high throughput sequencing data, with a number of limitations (e.g., pre-assembly, a posteriori verification of mutations, or specification of species). In this work we present the k-mer, i.e., strings of fixed length k, ARGV analyzer – KARGVA – an open-source, multi-platform tool that provides: (i) an ad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based, k-mer search setup to process data efficiently, linking k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in presence of high sequencing errors or mutations (99.2 and 86.6% accuracy within 1 and 5% base change rates, respectively), and genome rearrangements (98.2% accuracy), with robust performance on ad hoc false positive sets. 
On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated to ARGs but ecological characteristics do not explain well ARGV variance. KARGVA is publicly available at https://github.com/DataIntellSystLab/KARGVA under MIT license. Frontiers Media S.A. 2023-03-07 /pmc/articles/PMC10027697/ /pubmed/36960290 http://dx.doi.org/10.3389/fmicb.2023.1060891 Text en Copyright © 2023 Marini, Boucher, Noyes and Prosperi. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Microbiology
Marini, Simone
Boucher, Christina
Noyes, Noelle
Prosperi, Mattia
The K-mer antibiotic resistance gene variant analyzer (KARGVA)
title The K-mer antibiotic resistance gene variant analyzer (KARGVA)
title_full The K-mer antibiotic resistance gene variant analyzer (KARGVA)
title_fullStr The K-mer antibiotic resistance gene variant analyzer (KARGVA)
title_full_unstemmed The K-mer antibiotic resistance gene variant analyzer (KARGVA)
title_short The K-mer antibiotic resistance gene variant analyzer (KARGVA)
title_sort k-mer antibiotic resistance gene variant analyzer (kargva)
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10027697/
https://www.ncbi.nlm.nih.gov/pubmed/36960290
http://dx.doi.org/10.3389/fmicb.2023.1060891
work_keys_str_mv AT marinisimone thekmerantibioticresistancegenevariantanalyzerkargva
AT boucherchristina thekmerantibioticresistancegenevariantanalyzerkargva
AT noyesnoelle thekmerantibioticresistancegenevariantanalyzerkargva
AT prosperimattia thekmerantibioticresistancegenevariantanalyzerkargva
AT marinisimone kmerantibioticresistancegenevariantanalyzerkargva
AT boucherchristina kmerantibioticresistancegenevariantanalyzerkargva
AT noyesnoelle kmerantibioticresistancegenevariantanalyzerkargva
AT prosperimattia kmerantibioticresistancegenevariantanalyzerkargva