Performance and Accuracy of Four Open-Source Tools for In Silico Serotyping of Salmonella spp. Based on Whole-Genome Short-Read Sequencing Data

We compared the performance of four open-source in silico Salmonella typing tools (SeqSero, SeqSero2, Salmonella In Silico Typing Resource [SISTR], and Metric Oriented Sequence Typer [MOST]) to assess their potential for replacing laboratory serological testing with serovar predictions from whole-ge...

Bibliographic Details
Main authors: Uelze, Laura, Borowiak, Maria, Deneke, Carlus, Szabó, István, Fischer, Jennie, Tausch, Simon H., Malorny, Burkhard
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2020
Subjects: Food Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7028957/
https://www.ncbi.nlm.nih.gov/pubmed/31862714
http://dx.doi.org/10.1128/AEM.02265-19
author Uelze, Laura
Borowiak, Maria
Deneke, Carlus
Szabó, István
Fischer, Jennie
Tausch, Simon H.
Malorny, Burkhard
author_sort Uelze, Laura
collection PubMed
description We compared the performance of four open-source in silico Salmonella typing tools (SeqSero, SeqSero2, Salmonella In Silico Typing Resource [SISTR], and Metric Oriented Sequence Typer [MOST]) to assess their potential for replacing laboratory serological testing with serovar predictions from whole-genome sequencing data. We conducted a retrospective analysis of 1,624 Salmonella isolates of 72 serovars submitted to the German National Salmonella Reference Laboratory between 1999 and 2019. All isolates are derived from animal and foodstuff origins. We conducted Illumina short-read sequencing and compared the in silico serovar prediction results with the results of routine laboratory serotyping. We found the best-performing in silico serovar prediction tool to be SISTR, with 94% correctly typed isolates, followed by SeqSero2 (87%), SeqSero (81%), and MOST (79%). Furthermore, we found that mapping-based tools like SeqSero and SeqSero2 (allele mode) were more reliable for the prediction of monophasic variants, while sequence type and cluster-based methods like MOST and SISTR (core-genome multilocus sequence type [cgMLST]) showed greater resilience when confronted with GC-biased sequencing data. We showed that the choice of library preparation kit could substantially affect O antigen detection, due to the low GC content of the wzx and wzy genes. Although the accuracy of computational serovar predictions is still not quite on par with traditional serotyping by Salmonella reference laboratories, the command-line tools investigated in this study perform a rapid, efficient, inexpensive, and reproducible analysis, which can be integrated into in-house characterization pipelines. Based on our results, we find SISTR most suitable for automated, routine serotyping for public health surveillance of Salmonella.
IMPORTANCE Salmonella spp. are important foodborne pathogens. To reduce the number of infected patients, it is essential to understand which subtypes of the bacteria cause disease outbreaks. Traditionally, characterization of Salmonella requires serological testing, a laboratory method by which Salmonella isolates can be classified into over 2,600 distinct subtypes, called serovars. Due to recent advances in whole-genome sequencing, many tools have been developed to replace traditional testing methods with computational analysis of genome sequences. It is crucial to validate that these tools, many already in use for routine surveillance, deliver accurate and reliable serovar information. In this study, we set out to compare which of the currently available open-source command-line tools is most suitable to replace serological testing. A thorough evaluation of the differing computational approaches is highly important to ensure the backward compatibility of serotyping data and to maintain comparability between laboratories.
format Online
Article
Text
id pubmed-7028957
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-7028957 2020-03-06 Appl Environ Microbiol Food Microbiology American Society for Microbiology 2020-02-18 /pmc/articles/PMC7028957/ /pubmed/31862714 http://dx.doi.org/10.1128/AEM.02265-19 Text en Copyright © 2020 Uelze et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title Performance and Accuracy of Four Open-Source Tools for In Silico Serotyping of Salmonella spp. Based on Whole-Genome Short-Read Sequencing Data
topic Food Microbiology