
ECTyper: in silico Escherichia coli serotype and species prediction from raw and assembled whole-genome sequence data

Escherichia coli is a priority foodborne pathogen of public health concern and phenotypic serotyping provides critical information for surveillance and outbreak detection activities. Public health and food safety laboratories are increasingly adopting whole-genome sequencing (WGS) for characterizing...


Bibliographic Details
Main authors: Bessonov, Kyrylo; Laing, Chad; Robertson, James; Yong, Irene; Ziebell, Kim; Gannon, Victor P. J.; Nichani, Anil; Arya, Gitanjali; Nash, John H. E.; Christianson, Sara
Format: Online Article Text
Language: English
Published: Microbiology Society 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8767331/
https://www.ncbi.nlm.nih.gov/pubmed/34860150
http://dx.doi.org/10.1099/mgen.0.000728
author Bessonov, Kyrylo
Laing, Chad
Robertson, James
Yong, Irene
Ziebell, Kim
Gannon, Victor P. J.
Nichani, Anil
Arya, Gitanjali
Nash, John H. E.
Christianson, Sara
author_sort Bessonov, Kyrylo
collection PubMed
description Escherichia coli is a priority foodborne pathogen of public health concern and phenotypic serotyping provides critical information for surveillance and outbreak detection activities. Public health and food safety laboratories are increasingly adopting whole-genome sequencing (WGS) for characterizing pathogens, but it is imperative to maintain serotype designations in order to minimize disruptions to existing public health workflows. Multiple in silico tools have been developed for predicting serotypes from WGS data, including SRST2, SerotypeFinder and EToKi EBEis, but these tools were not designed with the specific requirements of diagnostic laboratories, which include: speciation, input data flexibility (fasta/fastq), quality control information and easily interpretable results. To address these specific requirements, we developed ECTyper (https://github.com/phac-nml/ecoli_serotyping) for performing both speciation within Escherichia and Shigella, and in silico serotype prediction. We compared the serotype prediction performance of each tool on a newly sequenced panel of 185 isolates with confirmed phenotypic serotype information. We found that all tools were highly concordant, with 92–97 % for O-antigens and 98–100 % for H-antigens, and ECTyper having the highest rate of concordance. We extended the benchmarking to a large panel of 6954 publicly available E. coli genomes to assess the performance of the tools on a more diverse dataset. On the public data, there was a considerable drop in concordance, with 75–91 % for O-antigens and 62–90 % for H-antigens, and ECTyper and SerotypeFinder being the most concordant. This study highlights that in silico predictions show high concordance with phenotypic serotyping results, but there are notable differences in tool performance. ECTyper provides highly accurate and sensitive in silico serotype predictions, in addition to speciation, and is designed to be easily incorporated into bioinformatic workflows.
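The concordance figures quoted in the description can be illustrated with a minimal sketch. The abstract does not state how the authors scored a match, so the rule used here (case-insensitive exact agreement between the predicted and phenotypic antigen, with a missing prediction counted as discordant) is an assumption, and the function name and toy data are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def concordance(pairs: Iterable[Tuple[Optional[str], Optional[str]]]) -> float:
    """Return the percentage of isolates whose predicted antigen matches
    the phenotypic reference call.

    Each pair is (phenotypic, predicted). Isolates without a phenotypic
    call are skipped; a missing prediction counts as a mismatch.
    This matching rule is an assumption, not the paper's stated method.
    """
    total = 0
    matches = 0
    for phenotypic, predicted in pairs:
        if phenotypic is None:
            continue  # no reference call, so the isolate cannot be scored
        total += 1
        if predicted is not None and predicted.strip().upper() == phenotypic.strip().upper():
            matches += 1
    if total == 0:
        raise ValueError("no isolates with a phenotypic reference call")
    return 100.0 * matches / total


# Hypothetical toy data: (phenotypic call, in silico prediction).
o_calls = [("O157", "O157"), ("O26", "O26"), ("O111", "O103"), ("O145", None)]
h_calls = [("H7", "H7"), ("H11", "H11"), ("H8", "H8"), ("H28", "h28")]

o_pct = concordance(o_calls)
h_pct = concordance(h_calls)
print(f"O-antigen concordance: {o_pct:.1f} %")
print(f"H-antigen concordance: {h_pct:.1f} %")
```

Scoring O- and H-antigens separately mirrors how the description reports them; a stricter metric would require both antigens of an isolate to agree.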
format Online
Article
Text
id pubmed-8767331
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-87673312022-01-19 ECTyper: in silico Escherichia coli serotype and species prediction from raw and assembled whole-genome sequence data Bessonov, Kyrylo Laing, Chad Robertson, James Yong, Irene Ziebell, Kim Gannon, Victor P. J. Nichani, Anil Arya, Gitanjali Nash, John H. E. Christianson, Sara Microb Genom Research Articles Escherichia coli is a priority foodborne pathogen of public health concern and phenotypic serotyping provides critical information for surveillance and outbreak detection activities. Public health and food safety laboratories are increasingly adopting whole-genome sequencing (WGS) for characterizing pathogens, but it is imperative to maintain serotype designations in order to minimize disruptions to existing public health workflows. Multiple in silico tools have been developed for predicting serotypes from WGS data, including SRST2, SerotypeFinder and EToKi EBEis, but these tools were not designed with the specific requirements of diagnostic laboratories, which include: speciation, input data flexibility (fasta/fastq), quality control information and easily interpretable results. To address these specific requirements, we developed ECTyper (https://github.com/phac-nml/ecoli_serotyping) for performing both speciation within Escherichia and Shigella, and in silico serotype prediction. We compared the serotype prediction performance of each tool on a newly sequenced panel of 185 isolates with confirmed phenotypic serotype information. We found that all tools were highly concordant, with 92–97 % for O-antigens and 98–100 % for H-antigens, and ECTyper having the highest rate of concordance. We extended the benchmarking to a large panel of 6954 publicly available E. coli genomes to assess the performance of the tools on a more diverse dataset. On the public data, there was a considerable drop in concordance, with 75–91 % for O-antigens and 62–90 % for H-antigens, and ECTyper and SerotypeFinder being the most concordant. This study highlights that in silico predictions show high concordance with phenotypic serotyping results, but there are notable differences in tool performance. ECTyper provides highly accurate and sensitive in silico serotype predictions, in addition to speciation, and is designed to be easily incorporated into bioinformatic workflows. Microbiology Society 2021-12-03 /pmc/articles/PMC8767331/ /pubmed/34860150 http://dx.doi.org/10.1099/mgen.0.000728 Text en © 2021 Crown Copyright https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License.
title ECTyper: in silico Escherichia coli serotype and species prediction from raw and assembled whole-genome sequence data
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8767331/
https://www.ncbi.nlm.nih.gov/pubmed/34860150
http://dx.doi.org/10.1099/mgen.0.000728