
Fast genome-based delimitation of Enterobacterales species

Average Nucleotide Identity (ANI) is becoming a standard measure for bacterial species delimitation. However, its calculation can take orders of magnitude longer than similarity estimates based on sampling of short nucleotides, compiled into so-called sketches. These estimates are widely used. Howev...


Bibliographic Details
Main Authors: Hernández-Salmerón, Julie E., Irani, Tanya, Moreno-Hagelsieb, Gabriel
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10501659/
https://www.ncbi.nlm.nih.gov/pubmed/37708115
http://dx.doi.org/10.1371/journal.pone.0291492
_version_ 1785106159491874816
author Hernández-Salmerón, Julie E.
Irani, Tanya
Moreno-Hagelsieb, Gabriel
author_facet Hernández-Salmerón, Julie E.
Irani, Tanya
Moreno-Hagelsieb, Gabriel
author_sort Hernández-Salmerón, Julie E.
collection PubMed
description Average Nucleotide Identity (ANI) is becoming a standard measure for bacterial species delimitation. However, its calculation can take orders of magnitude longer than similarity estimates based on sampling of short nucleotides, compiled into so-called sketches. These estimates are widely used. However, their variable correlation with ANI has suggested that they might not be as accurate. For a where-the-rubber-meets-the-road assessment, we compared two sketching programs, mash and dashing, against ANI, in delimiting species among Enterobacterales genomes. Receiver Operating Characteristic (ROC) analysis found Area Under the Curve (AUC) values of 0.99, almost perfect species discrimination for all three measures. Subsampling to avoid over-represented species reduced these AUC values to 0.92, still highly accurate. Focused tests with ten genera, each represented by more than three species, also showed almost identical results for all methods. Shigella showed the lowest AUC values (0.68), followed by Citrobacter (0.80). All other genera, Dickeya, Enterobacter, Escherichia, Klebsiella, Pectobacterium, Proteus, Providencia and Yersinia, produced AUC values above 0.90. The species delimitation thresholds varied, with species distance ranges in a few genera overlapping the genus ranges of other genera. Mash was able to separate the E. coli + Shigella complex into 25 apparent phylogroups, four of them corresponding, roughly, to the four Shigella species represented in the data. Our results suggest that fast estimates of genome similarity are as good as ANI for species delimitation. Therefore, these estimates might suffice for covering the role of genomic similarity in bacterial taxonomy, and should increase confidence in their use for efficient bacterial identification and clustering, from epidemiological to genome-based detection of potential contaminants in farming and industry settings.
format Online
Article
Text
id pubmed-10501659
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10501659 2023-09-15 Fast genome-based delimitation of Enterobacterales species Hernández-Salmerón, Julie E. Irani, Tanya Moreno-Hagelsieb, Gabriel PLoS One Research Article Average Nucleotide Identity (ANI) is becoming a standard measure for bacterial species delimitation. However, its calculation can take orders of magnitude longer than similarity estimates based on sampling of short nucleotides, compiled into so-called sketches. These estimates are widely used. However, their variable correlation with ANI has suggested that they might not be as accurate. For a where-the-rubber-meets-the-road assessment, we compared two sketching programs, mash and dashing, against ANI, in delimiting species among Enterobacterales genomes. Receiver Operating Characteristic (ROC) analysis found Area Under the Curve (AUC) values of 0.99, almost perfect species discrimination for all three measures. Subsampling to avoid over-represented species reduced these AUC values to 0.92, still highly accurate. Focused tests with ten genera, each represented by more than three species, also showed almost identical results for all methods. Shigella showed the lowest AUC values (0.68), followed by Citrobacter (0.80). All other genera, Dickeya, Enterobacter, Escherichia, Klebsiella, Pectobacterium, Proteus, Providencia and Yersinia, produced AUC values above 0.90. The species delimitation thresholds varied, with species distance ranges in a few genera overlapping the genus ranges of other genera. Mash was able to separate the E. coli + Shigella complex into 25 apparent phylogroups, four of them corresponding, roughly, to the four Shigella species represented in the data. Our results suggest that fast estimates of genome similarity are as good as ANI for species delimitation.
Therefore, these estimates might suffice for covering the role of genomic similarity in bacterial taxonomy, and should increase confidence in their use for efficient bacterial identification and clustering, from epidemiological to genome-based detection of potential contaminants in farming and industry settings. Public Library of Science 2023-09-14 /pmc/articles/PMC10501659/ /pubmed/37708115 http://dx.doi.org/10.1371/journal.pone.0291492 Text en © 2023 Hernández-Salmerón et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Hernández-Salmerón, Julie E.
Irani, Tanya
Moreno-Hagelsieb, Gabriel
Fast genome-based delimitation of Enterobacterales species
title Fast genome-based delimitation of Enterobacterales species
title_full Fast genome-based delimitation of Enterobacterales species
title_fullStr Fast genome-based delimitation of Enterobacterales species
title_full_unstemmed Fast genome-based delimitation of Enterobacterales species
title_short Fast genome-based delimitation of Enterobacterales species
title_sort fast genome-based delimitation of enterobacterales species
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10501659/
https://www.ncbi.nlm.nih.gov/pubmed/37708115
http://dx.doi.org/10.1371/journal.pone.0291492
work_keys_str_mv AT hernandezsalmeronjuliee fastgenomebaseddelimitationofenterobacteralesspecies
AT iranitanya fastgenomebaseddelimitationofenterobacteralesspecies
AT morenohagelsiebgabriel fastgenomebaseddelimitationofenterobacteralesspecies
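
The description above reports ROC AUC values for how well a genome distance separates same-species pairs from different-species pairs. As a rough illustration of what such a value measures, the following Python sketch computes AUC as the probability that a same-species pair has a smaller distance than a different-species pair (ties count as one half). This is not the authors' pipeline: the genome names, species labels, and distances are invented toy values, and the helper names `split_pairs` and `roc_auc` are hypothetical.

```python
def split_pairs(distances, species_of):
    """Split pairwise distances into same-species and different-species lists."""
    same, different = [], []
    for (a, b), dist in distances.items():
        if species_of[a] == species_of[b]:
            same.append(dist)
        else:
            different.append(dist)
    return same, different


def roc_auc(same, different):
    """AUC as the fraction of (same, different) comparisons in which the
    same-species distance is the smaller one; ties count as 0.5."""
    wins = 0.0
    for s in same:
        for d in different:
            if s < d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(same) * len(different))


# Toy data (assumed for illustration only, not taken from the article).
species_of = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
distances = {
    ("g1", "g2"): 0.01,  # same species
    ("g3", "g4"): 0.04,  # same species
    ("g1", "g3"): 0.03,  # different species, closer than the g3-g4 pair
    ("g1", "g4"): 0.08,
    ("g2", "g3"): 0.09,
    ("g2", "g4"): 0.10,
}

same, different = split_pairs(distances, species_of)
auc = roc_auc(same, different)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 would mean every same-species pair is closer than every different-species pair. Overlapping distance ranges, such as those the description reports for Shigella, pull the value toward 0.5.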