
HierCC: a multi-level clustering scheme for population assignments based on core genome MLST

MOTIVATION: Routine infectious disease surveillance is increasingly based on large-scale whole-genome sequencing databases. Real-time surveillance would benefit from immediate assignments of each genome assembly to hierarchical population structures. Here we present pHierCC, a pipeline that defines...


Bibliographic Details
Main authors: Zhou, Zhemin, Charlesworth, Jane, Achtman, Mark
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8545296/
https://www.ncbi.nlm.nih.gov/pubmed/33823553
http://dx.doi.org/10.1093/bioinformatics/btab234
author Zhou, Zhemin
Charlesworth, Jane
Achtman, Mark
collection PubMed
description MOTIVATION: Routine infectious disease surveillance is increasingly based on large-scale whole-genome sequencing databases. Real-time surveillance would benefit from immediate assignments of each genome assembly to hierarchical population structures. Here we present pHierCC, a pipeline that defines a scalable clustering scheme, HierCC, based on core genome multi-locus typing that allows incremental, static, multi-level cluster assignments of genomes. We also present HCCeval, which identifies optimal thresholds for assigning genomes to cohesive HierCC clusters. HierCC was implemented in EnteroBase in 2018 and has since genotyped >530 000 genomes from Salmonella, Escherichia/Shigella, Streptococcus, Clostridioides, Vibrio and Yersinia. AVAILABILITY AND IMPLEMENTATION: EnteroBase is available at https://enterobase.warwick.ac.uk/; source code and instructions are at https://github.com/zheminzhou/pHierCC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8545296
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8545296 2021-10-26 HierCC: a multi-level clustering scheme for population assignments based on core genome MLST Zhou, Zhemin Charlesworth, Jane Achtman, Mark Bioinformatics Applications Notes MOTIVATION: Routine infectious disease surveillance is increasingly based on large-scale whole-genome sequencing databases. Real-time surveillance would benefit from immediate assignments of each genome assembly to hierarchical population structures. Here we present pHierCC, a pipeline that defines a scalable clustering scheme, HierCC, based on core genome multi-locus typing that allows incremental, static, multi-level cluster assignments of genomes. We also present HCCeval, which identifies optimal thresholds for assigning genomes to cohesive HierCC clusters. HierCC was implemented in EnteroBase in 2018 and has since genotyped >530 000 genomes from Salmonella, Escherichia/Shigella, Streptococcus, Clostridioides, Vibrio and Yersinia. AVAILABILITY AND IMPLEMENTATION: https://enterobase.warwick.ac.uk/ and Source code and instructions: https://github.com/zheminzhou/pHierCC SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2021-04-06 /pmc/articles/PMC8545296/ /pubmed/33823553 http://dx.doi.org/10.1093/bioinformatics/btab234 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title HierCC: a multi-level clustering scheme for population assignments based on core genome MLST
topic Applications Notes