SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data

Streptococcus pneumoniae is responsible for 240 000–460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have...


Bibliographic Details
Main authors: Epping, Lennard; van Tonder, Andries J.; Gladstone, Rebecca A.; Bentley, Stephen D.; Page, Andrew J.; Keane, Jacqueline A.
Format: Online Article Text
Language: English
Published: Microbiology Society 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6113868/
https://www.ncbi.nlm.nih.gov/pubmed/29870330
http://dx.doi.org/10.1099/mgen.0.000186
_version_ 1783351091100909568
author Epping, Lennard
van Tonder, Andries J.
Gladstone, Rebecca A.
Bentley, Stephen D.
Page, Andrew J.
Keane, Jacqueline A.
author_facet Epping, Lennard
van Tonder, Andries J.
Gladstone, Rebecca A.
Bentley, Stephen D.
Page, Andrew J.
Keane, Jacqueline A.
author_sort Epping, Lennard
collection PubMed
description Streptococcus pneumoniae is responsible for 240 000–460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have been made to infer serotypes directly from genomic data but current software approaches are limited and do not scale well. Here, we introduce a novel method, SeroBA, which uses a k-mer approach. We compare SeroBA against real and simulated data and present results on the concordance and computational performance against a validation dataset, the robustness and scalability when analysing a large dataset, and the impact of varying the depth of coverage on sequence-based serotyping. SeroBA can predict serotypes, by identifying the cps locus, directly from raw whole genome sequencing read data with 98 % concordance using a k-mer-based method, can process 10 000 samples in just over 1 day using a standard server and can call serotypes at a coverage as low as 15–21×. SeroBA is implemented in Python3 and is freely available under an open source GPLv3 licence from: https://github.com/sanger-pathogens/seroba
format Online
Article
Text
id pubmed-6113868
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-61138682018-08-30 SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data Epping, Lennard van Tonder, Andries J. Gladstone, Rebecca A. Bentley, Stephen D. Page, Andrew J. Keane, Jacqueline A. Microb Genom Methods Paper Streptococcus pneumoniae is responsible for 240 000–460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have been made to infer serotypes directly from genomic data but current software approaches are limited and do not scale well. Here, we introduce a novel method, SeroBA, which uses a k-mer approach. We compare SeroBA against real and simulated data and present results on the concordance and computational performance against a validation dataset, the robustness and scalability when analysing a large dataset, and the impact of varying the depth of coverage on sequence-based serotyping. SeroBA can predict serotypes, by identifying the cps locus, directly from raw whole genome sequencing read data with 98 % concordance using a k-mer-based method, can process 10 000 samples in just over 1 day using a standard server and can call serotypes at a coverage as low as 15–21×. SeroBA is implemented in Python3 and is freely available under an open source GPLv3 licence from: https://github.com/sanger-pathogens/seroba Microbiology Society 2018-06-15 /pmc/articles/PMC6113868/ /pubmed/29870330 http://dx.doi.org/10.1099/mgen.0.000186 Text en © 2018 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Paper
Epping, Lennard
van Tonder, Andries J.
Gladstone, Rebecca A.
Bentley, Stephen D.
Page, Andrew J.
Keane, Jacqueline A.
SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data
title SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data
title_full SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data
title_fullStr SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data
title_full_unstemmed SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data
title_short SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data
title_sort seroba: rapid high-throughput serotyping of streptococcus pneumoniae from whole genome sequence data
topic Methods Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6113868/
https://www.ncbi.nlm.nih.gov/pubmed/29870330
http://dx.doi.org/10.1099/mgen.0.000186
work_keys_str_mv AT eppinglennard serobarapidhighthroughputserotypingofstreptococcuspneumoniaefromwholegenomesequencedata
AT vantonderandriesj serobarapidhighthroughputserotypingofstreptococcuspneumoniaefromwholegenomesequencedata
AT gladstonerebeccaa serobarapidhighthroughputserotypingofstreptococcuspneumoniaefromwholegenomesequencedata
AT serobarapidhighthroughputserotypingofstreptococcuspneumoniaefromwholegenomesequencedata
AT bentleystephend serobarapidhighthroughputserotypingofstreptococcuspneumoniaefromwholegenomesequencedata
AT pageandrewj serobarapidhighthroughputserotypingofstreptococcuspneumoniaefromwholegenomesequencedata
AT keanejacquelinea serobarapidhighthroughputserotypingofstreptococcuspneumoniaefromwholegenomesequencedata