
Whole genome sequencing of Streptococcus pneumoniae: development, evaluation and verification of targets for serogroup and serotype prediction using an automated pipeline

Streptococcus pneumoniae typically expresses one of 92 serologically distinct capsule polysaccharide (cps) types (serotypes). Some of these serotypes are closely related to each other; using the commercially available typing antisera, these are assigned to common serogroups containing types that show...


Bibliographic Details
Main Authors: Kapatai, Georgia; Sheppard, Carmen L.; Al-Shahib, Ali; Litt, David J.; Underwood, Anthony P.; Harrison, Timothy G.; Fry, Norman K.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2016
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5028725/
https://www.ncbi.nlm.nih.gov/pubmed/27672516
http://dx.doi.org/10.7717/peerj.2477
author Kapatai, Georgia
Sheppard, Carmen L.
Al-Shahib, Ali
Litt, David J.
Underwood, Anthony P.
Harrison, Timothy G.
Fry, Norman K.
collection PubMed
description Streptococcus pneumoniae typically expresses one of 92 serologically distinct capsule polysaccharide (cps) types (serotypes). Some of these serotypes are closely related to each other; using the commercially available typing antisera, these are assigned to common serogroups containing types that show cross-reactivity. In this serotyping scheme, factor antisera are used to allocate serotypes within a serogroup, based on patterns of reactions. This serotyping method is technically demanding, requires considerable experience, and the reading of the results can be subjective. This study describes the analysis of the S. pneumoniae capsular operon genetic sequence to determine serotype-distinguishing features and the development, evaluation and verification of an automated whole genome sequence (WGS)-based serotyping bioinformatics tool, PneumoCaT (Pneumococcal Capsule Typing). Initially, WGS data from 871 S. pneumoniae isolates were mapped to reference cps locus sequences for the 92 serotypes. Thirty-two of 92 serotypes could be unambiguously identified based on sequence similarities within the cps operon. The remaining 60 were allocated to one of 20 ‘genogroups’ that broadly correspond to the immunologically defined serogroups. By comparing the cps reference sequences for each genogroup, unique molecular differences were determined for serotypes within 18 of the 20 genogroups and verified using the set of 871 isolates. This information was used to design a decision-tree style algorithm within the PneumoCaT bioinformatics tool to predict capsular type to serotype level for 89 of 94 types (92 serotypes + 2 molecular types/subtypes) from WGS data and to serogroup level for serogroups 24 and 32, which currently comprise 2.1% of UK-referred invasive isolates submitted to the National Reference Laboratory (NRL), Public Health England (June 2014–July 2015). PneumoCaT was evaluated with an internal validation set of 2,065 UK isolates covering 72/92 serotypes, including 19 non-typeable isolates, and an external validation set of 2,964 isolates from Thailand (n = 2,531), USA (n = 181) and Iceland (n = 252). PneumoCaT was able to predict serotype in 99.1% of the typeable UK isolates and in 99.0% of the non-UK isolates. Concordance was evaluated in UK isolates where further investigation was possible; in 91.5% of the cases the predicted capsular type was concordant with the serologically derived serotype. Following retesting, concordance increased to 99.3%, and in most resolved cases (97.8%; 135/138) discordance was shown to be caused by errors in the original serotyping. Replicate testing demonstrated that PneumoCaT gave 100% reproducibility of the predicted serotype result. In summary, we have developed a WGS-based serotyping method that can predict capsular type to serotype level for 89/94 serotypes and to serogroup level for the remaining five. This approach could be integrated into routine typing workflows in reference laboratories, reducing the need for phenotypic immunological testing.
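The two-stage approach summarized in the description (match reads against reference cps loci, then resolve serotypes within a genogroup using distinguishing sequence differences) might be sketched as below. This is an illustrative sketch only, not PneumoCaT's actual implementation: the coverage threshold, the data structures, the genogroup label `GX`, the serotype labels `X1`/`X2`/`S1` and the marker name `geneA_pos100` are all hypothetical placeholders, not real typing data.

```python
from typing import Dict, Optional

# Hypothetical cut-off: minimum fraction of a reference cps locus covered by reads.
MIN_COVERAGE = 0.90

# Hypothetical lookup tables (placeholders, not real serotype data).
# Reference serotype -> genogroup; serotypes absent here are treated as unambiguous.
GENOGROUPS: Dict[str, str] = {"X1": "GX", "X2": "GX"}
# Genogroup -> serotype -> required allele at each distinguishing marker.
MARKERS: Dict[str, Dict[str, Dict[str, str]]] = {
    "GX": {"X1": {"geneA_pos100": "A"}, "X2": {"geneA_pos100": "G"}},
}


def predict(coverage: Dict[str, float], alleles: Dict[str, str]) -> Optional[str]:
    """Return a serotype, a genogroup label, or None if no locus is well covered."""
    # Stage 1: keep reference loci that are sufficiently covered by mapped reads.
    hits = [s for s, c in coverage.items() if c >= MIN_COVERAGE]
    if not hits:
        return None
    best = max(hits, key=lambda s: coverage[s])
    group = GENOGROUPS.get(best)
    if group is None:
        return best  # unambiguous at stage 1
    # Stage 2: within the genogroup, keep serotypes whose marker profile matches.
    matches = [
        s for s, profile in MARKERS[group].items()
        if all(alleles.get(marker) == allele for marker, allele in profile.items())
    ]
    # Fall back to the genogroup label when the markers do not single out one serotype.
    return matches[0] if len(matches) == 1 else group
```

In this sketch, an isolate whose best-covered locus belongs to a genogroup is reported at serotype level only when exactly one marker profile matches; otherwise the genogroup is returned, mirroring the serogroup-level fallback mentioned in the description.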
format Online
Article
Text
id pubmed-5028725
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-5028725 2016-09-26 PeerJ Bioinformatics PeerJ Inc. 2016-09-14 /pmc/articles/PMC5028725/ /pubmed/27672516 http://dx.doi.org/10.7717/peerj.2477 Text en ©2016 Kapatai et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Whole genome sequencing of Streptococcus pneumoniae: development, evaluation and verification of targets for serogroup and serotype prediction using an automated pipeline
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5028725/
https://www.ncbi.nlm.nih.gov/pubmed/27672516
http://dx.doi.org/10.7717/peerj.2477