
Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile

Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely related genomes among a background of thousands of other genomes is challenging. Here, we describe a refinement to core genome multilocus sequence typing (c...


Bibliographic Details
Main Authors:	Eyre, David W., Peto, Tim E. A., Crook, Derrick W., Walker, A. Sarah, Wilcox, Mark H.
Format:	Online Article Text
Language:	English
Published:	American Society for Microbiology 2019
Subjects:	Bacteriology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6935933/ https://www.ncbi.nlm.nih.gov/pubmed/31666367 http://dx.doi.org/10.1128/JCM.01037-19

_version_	1783483655590510592
author	Eyre, David W.; Peto, Tim E. A.; Crook, Derrick W.; Walker, A. Sarah; Wilcox, Mark H.
author_sort	Eyre, David W.
collection	PubMed
description	Pathogen whole-genome sequencing has huge potential as a tool to better understand infection transmission. However, rapidly identifying closely related genomes among a background of thousands of other genomes is challenging. Here, we describe a refinement to core genome multilocus sequence typing (cgMLST) in which alleles at each gene are reproducibly converted to a unique hash, or short string of letters (hash-cgMLST). This avoids the resource-intensive need for a single centralized database of sequentially numbered alleles. We test the reproducibility and discriminatory power of cgMLST/hash-cgMLST compared to those of mapping-based approaches in Clostridium difficile, using repeated sequencing of the same isolates (replicates) and data from consecutive infection isolates from six English hospitals. Hash-cgMLST provided the same results as standard cgMLST, with minimal performance penalty. Comparing 272 replicate sequence pairs using reference-based mapping, there were 0, 1, or 2 single-nucleotide polymorphisms (SNPs) between 262 (96%), 5 (2%), and 1 (<1%) of the pairs, respectively. Using hash-cgMLST, 218 (80%) of replicate pairs assembled with SPAdes had zero gene differences, and 31 (11%), 5 (2%), and 18 (7%) pairs had 1, 2, and >2 differences, respectively. False gene differences were clustered in specific genes and associated with fragmented assemblies, but were reduced using the SKESA assembler. Considering 412 pairs of infections with ≤2 SNPs, i.e., consistent with recent transmission, 376 (91%) had ≤2 gene differences and 16 (4%) had ≥4. Comparing a genome to 100,000 others took <1 min using hash-cgMLST. Hash-cgMLST is an effective surveillance tool for rapidly identifying clusters of related genomes. However, cgMLST/hash-cgMLST generate more false variants than mapping-based approaches. Follow-up mapping-based analyses are likely required to precisely define close genetic relationships.
format	Online Article Text
id	pubmed-6935933
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	American Society for Microbiology
record_format	MEDLINE/PubMed
spelling	pubmed-6935933 2020-01-31 Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile Eyre, David W. Peto, Tim E. A. Crook, Derrick W. Walker, A. Sarah Wilcox, Mark H. J Clin Microbiol Bacteriology American Society for Microbiology 2019-12-23 /pmc/articles/PMC6935933/ /pubmed/31666367 http://dx.doi.org/10.1128/JCM.01037-19 Text en Copyright © 2019 Eyre et al. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Bacteriology Eyre, David W. Peto, Tim E. A. Crook, Derrick W. Walker, A. Sarah Wilcox, Mark H. Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile
title	Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile
topic	Bacteriology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6935933/ https://www.ncbi.nlm.nih.gov/pubmed/31666367 http://dx.doi.org/10.1128/JCM.01037-19
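
The hashing idea summarized in the description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the record does not say which hash function or normalization is used, so SHA-256, the upper-casing step, the function names (`allele_hash`, `make_profile`, `gene_differences`), the gene names, and the toy sequences are all hypothetical. The sketch shows why no central allele database is needed (the same sequence always yields the same hash) and how two genomes might be compared by counting genes whose hashes differ.

```python
import hashlib
from typing import Dict, Optional

Profile = Dict[str, Optional[str]]


def allele_hash(sequence: str) -> str:
    """Return a reproducible hash for one allele sequence.

    Assumption: whitespace is stripped and bases are upper-cased first,
    so that formatting differences do not change the hash.
    """
    normalised = sequence.strip().upper()
    return hashlib.sha256(normalised.encode("ascii")).hexdigest()


def make_profile(alleles: Dict[str, Optional[str]]) -> Profile:
    """Map each gene to the hash of its allele (None if the gene was not called)."""
    return {
        gene: allele_hash(seq) if seq else None
        for gene, seq in alleles.items()
    }


def gene_differences(a: Profile, b: Profile) -> int:
    """Count genes called in both profiles whose allele hashes differ.

    Assumption: genes missing or uncalled in either profile are ignored
    rather than counted as differences.
    """
    shared = a.keys() & b.keys()
    return sum(
        1
        for gene in shared
        if a[gene] is not None and b[gene] is not None and a[gene] != b[gene]
    )


# Toy example with hypothetical gene names and sequences.
isolate_1 = make_profile({"geneA": "ATGAAA", "geneB": "ATGCCC", "geneC": None})
isolate_2 = make_profile({"geneA": "atgaaa", "geneB": "ATGCCT", "geneC": "ATGGGG"})
print(gene_differences(isolate_1, isolate_2))
```

Because each profile is just a mapping from gene to hash, comparing one genome against many others reduces to string comparisons, which is consistent with the fast comparison times reported in the description. How uncalled genes are treated is a design choice; ignoring them, as here, avoids counting assembly gaps as differences.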

