Genealogical inference and more flexible sequence clustering using iterative-PopPUNK

Bacterial genome data are accumulating at an unprecedented speed due to the routine use of sequencing in clinical diagnoses, public health surveillance, and population genetics studies. Genealogical reconstruction is fundamental to many of these uses; however, inferring genealogy from large-scale ge...

Bibliographic Details
Main authors: Zhao, Bin; Lees, John A.; Wu, Hongjin; Yang, Chao; Falush, Daniel
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10519404/
https://www.ncbi.nlm.nih.gov/pubmed/37253539
http://dx.doi.org/10.1101/gr.277395.122
_version_ 1785145954628796416
author Zhao, Bin
Lees, John A.
Wu, Hongjin
Yang, Chao
Falush, Daniel
author_facet Zhao, Bin
Lees, John A.
Wu, Hongjin
Yang, Chao
Falush, Daniel
author_sort Zhao, Bin
collection PubMed
description Bacterial genome data are accumulating at an unprecedented speed due to the routine use of sequencing in clinical diagnoses, public health surveillance, and population genetics studies. Genealogical reconstruction is fundamental to many of these uses; however, inferring genealogy from large-scale genome data sets quickly, accurately, and flexibly is still a challenge. Here, we extend an alignment- and annotation-free method, PopPUNK, to increase its flexibility and interpretability across data sets. Our method, iterative-PopPUNK, rapidly produces multiple consistent cluster assignments across a range of sequence identities. By constructing a partially resolved genealogical tree with respect to these clusters, users can select a resolution most appropriate for their needs. We showed the accuracy of clusters at all levels of similarity and genealogical inference of iterative-PopPUNK based on simulated data and obtained phylogenetically concordant results in real data sets from seven bacterial species. Using two example sets of Escherichia/Shigella and Vibrio parahaemolyticus genomes, we show that iterative-PopPUNK can achieve cluster resolutions ranging from phylogroup down to sequence typing (ST). The iterative-PopPUNK algorithm is implemented in the “PopPUNK_iterate” program, available as part of the PopPUNK package.
format Online
Article
Text
id pubmed-10519404
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-105194042023-12-01 Genealogical inference and more flexible sequence clustering using iterative-PopPUNK Zhao, Bin Lees, John A. Wu, Hongjin Yang, Chao Falush, Daniel Genome Res Methods Bacterial genome data are accumulating at an unprecedented speed due to the routine use of sequencing in clinical diagnoses, public health surveillance, and population genetics studies. Genealogical reconstruction is fundamental to many of these uses; however, inferring genealogy from large-scale genome data sets quickly, accurately, and flexibly is still a challenge. Here, we extend an alignment- and annotation-free method, PopPUNK, to increase its flexibility and interpretability across data sets. Our method, iterative-PopPUNK, rapidly produces multiple consistent cluster assignments across a range of sequence identities. By constructing a partially resolved genealogical tree with respect to these clusters, users can select a resolution most appropriate for their needs. We showed the accuracy of clusters at all levels of similarity and genealogical inference of iterative-PopPUNK based on simulated data and obtained phylogenetically concordant results in real data sets from seven bacterial species. Using two example sets of Escherichia/Shigella and Vibrio parahaemolyticus genomes, we show that iterative-PopPUNK can achieve cluster resolutions ranging from phylogroup down to sequence typing (ST). The iterative-PopPUNK algorithm is implemented in the “PopPUNK_iterate” program, available as part of the PopPUNK package. Cold Spring Harbor Laboratory Press 2023-06 /pmc/articles/PMC10519404/ /pubmed/37253539 http://dx.doi.org/10.1101/gr.277395.122 Text en © 2023 Zhao et al.; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml).
After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at https://creativecommons.org/licenses/by-nc/4.0/.
spellingShingle Methods
Zhao, Bin
Lees, John A.
Wu, Hongjin
Yang, Chao
Falush, Daniel
Genealogical inference and more flexible sequence clustering using iterative-PopPUNK
title Genealogical inference and more flexible sequence clustering using iterative-PopPUNK
title_full Genealogical inference and more flexible sequence clustering using iterative-PopPUNK
title_fullStr Genealogical inference and more flexible sequence clustering using iterative-PopPUNK
title_full_unstemmed Genealogical inference and more flexible sequence clustering using iterative-PopPUNK
title_short Genealogical inference and more flexible sequence clustering using iterative-PopPUNK
title_sort genealogical inference and more flexible sequence clustering using iterative-poppunk
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10519404/
https://www.ncbi.nlm.nih.gov/pubmed/37253539
http://dx.doi.org/10.1101/gr.277395.122
work_keys_str_mv AT zhaobin genealogicalinferenceandmoreflexiblesequenceclusteringusingiterativepoppunk
AT leesjohna genealogicalinferenceandmoreflexiblesequenceclusteringusingiterativepoppunk
AT wuhongjin genealogicalinferenceandmoreflexiblesequenceclusteringusingiterativepoppunk
AT yangchao genealogicalinferenceandmoreflexiblesequenceclusteringusingiterativepoppunk
AT falushdaniel genealogicalinferenceandmoreflexiblesequenceclusteringusingiterativepoppunk