Optimized phylogenetic clustering of HIV-1 sequence data for public health applications


Bibliographic Details
Main authors:	Chato, Connor, Feng, Yi, Ruan, Yuhua, Xing, Hui, Herbeck, Joshua, Kalish, Marcia, Poon, Art F. Y.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science, 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9744331/ https://www.ncbi.nlm.nih.gov/pubmed/36449514 http://dx.doi.org/10.1371/journal.pcbi.1010745

Abstract

Clusters of genetically similar infections suggest rapid transmission and may indicate priorities for public health action or reveal underlying epidemiological processes. However, clusters often require user-defined thresholds and are sensitive to non-epidemiological factors, such as non-random sampling. Consequently, the ideal threshold for public health applications varies substantially across settings. Here, we show a method that selects optimal thresholds for phylogenetic (subset tree) clustering based on population. We evaluated this method on HIV-1 pol datasets (n = 14,221 sequences) from four sites in the USA (Tennessee, Washington), Canada (Northern Alberta) and China (Beijing). Clusters were defined by tips descending from an ancestral node (with a minimum bootstrap support of 95%) through a series of branches, each with a length below a given threshold. Next, we used pplacer to graft new cases to the fixed tree by maximum likelihood. We evaluated the effect of varying branch-length thresholds on cluster growth as a count outcome by fitting two Poisson regression models: a null model that predicts growth from cluster size, and an alternative model that includes mean collection date as an additional covariate. The alternative model was favoured by AIC across most thresholds, with optimal (greatest difference in AIC) thresholds ranging 0.007–0.013 across sites. The range of optimal thresholds was more variable when re-sampling 80% of the data by location (IQR 0.008–0.016, n = 100 replicates). Our results use prospective phylogenetic cluster growth and suggest that there is more variation in effective thresholds for public health than those typically used in clustering studies.
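The cluster definition and the AIC-based threshold criterion summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, the function names, the top-down handling of poorly supported nodes, the use of cluster size as an ordinary covariate, the Newton-Raphson fitter, and the toy tree and cluster table are all hypothetical. Grafting new cases with pplacer is not reproduced; per-cluster growth counts are assumed to be already available.

```python
import math
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Node:
    """Hypothetical minimal tree node (not the authors' data structure)."""
    name: str = ""
    branch_length: float = 0.0  # length of the branch leading into this node
    support: float = 0.0        # bootstrap support in percent; unused for tips
    children: list = field(default_factory=list)


def subset_tree_clusters(root, threshold, min_support=95.0):
    """Each cluster is rooted at a node with support >= min_support and holds the
    tips reachable from it through branches all shorter than `threshold`.
    Tips not claimed by any cluster are returned as singletons."""
    clusters, pending = [], [root]
    while pending:
        node = pending.pop()
        if not node.children:
            clusters.append([node.name])          # unclustered tip
        elif node.support >= min_support:
            tips, stack = [], [node]
            while stack:                           # walk only along short branches
                n = stack.pop()
                if not n.children:
                    tips.append(n.name)
                for child in n.children:
                    if child.branch_length < threshold:
                        stack.append(child)
                    else:
                        pending.append(child)      # long branch: handled on its own
            if tips:
                clusters.append(sorted(tips))
        else:
            pending.extend(node.children)         # poorly supported: look deeper
    return clusters


def fit_poisson(X, y, max_iter=100, tol=1e-10):
    """Poisson regression (log link) by Newton-Raphson with step halving.
    Assumes column 0 of X is the intercept and y has a positive count.
    Returns (coefficients, maximized log-likelihood)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    const = sum(math.lgamma(v + 1) for v in y)    # sum of log(y!) terms

    def loglik(beta):
        eta = X @ beta
        return float(y @ eta - np.exp(eta).sum() - const)

    beta = np.zeros(X.shape[1])
    beta[0] = math.log(y.mean())
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        t = 1.0
        while loglik(beta + t * step) < loglik(beta) and t > 1e-8:
            t /= 2                                 # backtrack if the step overshoots
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta, loglik(beta)


def delta_aic(size, mean_date, growth):
    """AIC(null) - AIC(alternative); positive values favour adding mean date."""
    def z(v):                                      # standardize a covariate
        v = np.asarray(v, float)
        return (v - v.mean()) / v.std()

    ones = np.ones(len(growth))
    X_null = np.column_stack([ones, z(size)])
    X_alt = np.column_stack([ones, z(size), z(mean_date)])
    _, ll_null = fit_poisson(X_null, growth)
    _, ll_alt = fit_poisson(X_alt, growth)
    return (2 * X_null.shape[1] - 2 * ll_null) - (2 * X_alt.shape[1] - 2 * ll_alt)


# Toy tree (hypothetical): two supported clades plus one long-branch tip.
root = Node(children=[
    Node(branch_length=0.003, support=99, children=[Node("A", 0.002), Node("B", 0.004)]),
    Node(branch_length=0.002, support=97, children=[Node("C", 0.001), Node("D", 0.030)]),
    Node("E", 0.050),
])
print(sorted(subset_tree_clusters(root, threshold=0.01)))
# [['A', 'B'], ['C'], ['D'], ['E']]

# Hypothetical per-cluster table at one threshold (synthetic, not study data).
size = [1, 2, 3, 5, 8, 2, 4, 6, 1, 3]
mean_date = [2010.5, 2011.2, 2012.0, 2013.4, 2014.1, 2012.7, 2015.3, 2016.0, 2016.8, 2017.5]
growth = [0, 1, 1, 3, 4, 0, 2, 3, 1, 1]
print(delta_aic(size, mean_date, growth))
```

In a threshold sweep, one would rebuild the clusters and their growth counts for each candidate threshold and keep the threshold with the largest `delta_aic`, which corresponds to the "greatest difference in AIC" criterion in the abstract. Standardizing the covariates is an affine reparametrization alongside an intercept, so it leaves the maximized log-likelihoods, and therefore the AIC values, unchanged while making the Newton iterations better conditioned.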
Record details
Journal:	PLoS Comput Biol
Publication date:	2022-11-30
Record ID:	pubmed-9744331
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
License:	© 2022 Chato et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
