
Optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’

Bibliographic Details
Main Authors: Jacobson, David; Barratt, Joel
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955612/
https://www.ncbi.nlm.nih.gov/pubmed/36827266
http://dx.doi.org/10.1371/journal.pone.0282154
author Jacobson, David
Barratt, Joel
collection PubMed
description Hierarchical clustering of pathogen genotypes is widely used to complement epidemiologic investigations of outbreaks. Investigators must dissect trees to obtain genetic partitions that provide epidemiologists with meaningful information. Statistical approaches to tree dissection often require a user-defined parameter to predict the optimal partition number and augmenting this parameter can drastically impact resultant partition memberships. Here, we demonstrate how to optimize a given tree dissection parameter to maximize accuracy irrespective of the tree dissection method used. We hierarchically clustered 1,873 genotypes of the foodborne pathogen Cyclospora spp., including 587 possessing links to historic outbreaks. We dissected the resulting tree using a statistical method requiring users to select the value of a ‘stringency parameter’ (s), with a recommended value of 95% to 99.5%. We dissected this hierarchical tree across s-values from 94% to 99.5% (at increments of 0.25%), to identify a value that maximized partitioning accuracy, defined as the degree to which genetic partitions conform to known epidemiologic groupings. We show that s-values of 96.5% and 96.75% yield the highest accuracy (> 99.9%) when clustering Cyclospora sp. isolates with known epidemiologic linkages. In practice, the optimized s-value will generate robust genetic partitions comprising isolates likely derived from a common food source, even when the epidemiologic grouping is not known prior to genetic clustering. While the s-value is specific to the tree dissection method used here, the optimization approach described could be applied to any parameter/method used to dissect hierarchical trees.
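The parameter sweep described in the abstract above (dissect the tree at each s-value from 94% to 99.5% in 0.25% steps, score each partition against known epidemiologic groupings, keep the best) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `toy_dissect` is a hypothetical stand-in for the stringency-based dissection method, whose internals are not described in this record. The accuracy measure is assumed to be pairwise agreement (a Rand index) between genetic partitions and epidemiologic groupings, which may differ from the paper's exact definition. All function names and the toy data are hypothetical.

```python
from decimal import Decimal
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Partition = Dict[str, int]  # isolate ID -> genetic partition number


def s_grid(start: str = "94", stop: str = "99.5", step: str = "0.25") -> List[float]:
    """Candidate s-values (in %) from start to stop inclusive.

    Decimal arithmetic avoids floating-point drift when stepping by 0.25.
    """
    values, s = [], Decimal(start)
    while s <= Decimal(stop):
        values.append(float(s))
        s += Decimal(step)
    return values


def pairwise_agreement(partition: Partition, epi: Dict[str, str]) -> float:
    """ASSUMED accuracy metric (a Rand index): the fraction of pairs of
    epidemiologically linked isolates on which the genetic partition and the
    known epidemiologic grouping agree (same/same or different/different).
    Isolates without a known epidemiologic link are not scored."""
    ids = sorted(i for i in epi if i in partition)
    pairs = list(combinations(ids, 2))
    if not pairs:
        raise ValueError("need at least two epidemiologically linked isolates")
    agree = sum((partition[a] == partition[b]) == (epi[a] == epi[b]) for a, b in pairs)
    return agree / len(pairs)


def optimize_s(tree, epi: Dict[str, str],
               dissect: Callable[[object, float], Partition],
               grid: List[float]) -> Tuple[List[float], Dict[float, float]]:
    """Dissect the tree at every s in the grid, score each partition against
    the epidemiologic 'ground truth', and return all s-values tied for best."""
    scores = {s: pairwise_agreement(dissect(tree, s), epi) for s in grid}
    best = max(scores.values())
    return [s for s in grid if scores[s] == best], scores


def toy_dissect(positions: Dict[str, float], s: float) -> Partition:
    """HYPOTHETICAL stand-in for a real tree dissection method (NOT the paper's):
    orders isolates along one axis and starts a new partition wherever the gap
    between neighbours exceeds (100 - s)."""
    ordered = sorted(positions, key=positions.get)
    partition, label = {ordered[0]: 0}, 0
    for prev, cur in zip(ordered, ordered[1:]):
        if positions[cur] - positions[prev] > 100 - s:
            label += 1
        partition[cur] = label
    return partition


if __name__ == "__main__":
    # Made-up toy data; C1 has no known epidemiologic link, so it is not scored.
    positions = {"A1": 0.0, "A2": 0.5, "A3": 1.0, "B1": 5.0, "B2": 5.75, "C1": 9.0}
    epi = {"A1": "outbreak1", "A2": "outbreak1", "A3": "outbreak1",
           "B1": "outbreak2", "B2": "outbreak2"}
    best, scores = optimize_s(positions, epi, toy_dissect, s_grid())
    print("best s-values:", best)
    print("accuracy:", scores[best[0]])
```

Returning every tied s-value, rather than a single maximum, fits the abstract's report that two s-values (96.5% and 96.75%) shared the highest accuracy. On real data, `toy_dissect` and `pairwise_agreement` would be replaced by the dissection method and accuracy definition actually used.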
format Online
Article
Text
id pubmed-9955612
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9955612 2023-02-25 PLoS One Research Article
Public Library of Science 2023-02-24 /pmc/articles/PMC9955612/ /pubmed/36827266 http://dx.doi.org/10.1371/journal.pone.0282154 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955612/
https://www.ncbi.nlm.nih.gov/pubmed/36827266
http://dx.doi.org/10.1371/journal.pone.0282154