Optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’
Hierarchical clustering of pathogen genotypes is widely used to complement epidemiologic investigations of outbreaks. Investigators must dissect trees to obtain genetic partitions that provide epidemiologists with meaningful information. Statistical approaches to tree dissection often require a user...
Main Authors: | Jacobson, David; Barratt, Joel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2023 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955612/ https://www.ncbi.nlm.nih.gov/pubmed/36827266 http://dx.doi.org/10.1371/journal.pone.0282154 |
_version_ | 1784894389730934784 |
---|---|
author | Jacobson, David Barratt, Joel |
author_facet | Jacobson, David Barratt, Joel |
author_sort | Jacobson, David |
collection | PubMed |
description | Hierarchical clustering of pathogen genotypes is widely used to complement epidemiologic investigations of outbreaks. Investigators must dissect trees to obtain genetic partitions that provide epidemiologists with meaningful information. Statistical approaches to tree dissection often require a user-defined parameter to predict the optimal partition number and augmenting this parameter can drastically impact resultant partition memberships. Here, we demonstrate how to optimize a given tree dissection parameter to maximize accuracy irrespective of the tree dissection method used. We hierarchically clustered 1,873 genotypes of the foodborne pathogen Cyclospora spp., including 587 possessing links to historic outbreaks. We dissected the resulting tree using a statistical method requiring users to select the value of a ‘stringency parameter’ (s), with a recommended value of 95% to 99.5%. We dissected this hierarchical tree across s-values from 94% to 99.5% (at increments of 0.25%), to identify a value that maximized partitioning accuracy, defined as the degree to which genetic partitions conform to known epidemiologic groupings. We show that s-values of 96.5% and 96.75% yield the highest accuracy (> 99.9%) when clustering Cyclospora sp. isolates with known epidemiologic linkages. In practice, the optimized s-value will generate robust genetic partitions comprising isolates likely derived from a common food source, even when the epidemiologic grouping is not known prior to genetic clustering. While the s-value is specific to the tree dissection method used here, the optimization approach described could be applied to any parameter/method used to dissect hierarchical trees. |
format | Online Article Text |
id | pubmed-9955612 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9955612 2023-02-25 Optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’ Jacobson, David Barratt, Joel PLoS One Research Article Hierarchical clustering of pathogen genotypes is widely used to complement epidemiologic investigations of outbreaks. Investigators must dissect trees to obtain genetic partitions that provide epidemiologists with meaningful information. Statistical approaches to tree dissection often require a user-defined parameter to predict the optimal partition number and augmenting this parameter can drastically impact resultant partition memberships. Here, we demonstrate how to optimize a given tree dissection parameter to maximize accuracy irrespective of the tree dissection method used. We hierarchically clustered 1,873 genotypes of the foodborne pathogen Cyclospora spp., including 587 possessing links to historic outbreaks. We dissected the resulting tree using a statistical method requiring users to select the value of a ‘stringency parameter’ (s), with a recommended value of 95% to 99.5%. We dissected this hierarchical tree across s-values from 94% to 99.5% (at increments of 0.25%), to identify a value that maximized partitioning accuracy, defined as the degree to which genetic partitions conform to known epidemiologic groupings. We show that s-values of 96.5% and 96.75% yield the highest accuracy (> 99.9%) when clustering Cyclospora sp. isolates with known epidemiologic linkages. In practice, the optimized s-value will generate robust genetic partitions comprising isolates likely derived from a common food source, even when the epidemiologic grouping is not known prior to genetic clustering. While the s-value is specific to the tree dissection method used here, the optimization approach described could be applied to any parameter/method used to dissect hierarchical trees. Public Library of Science 2023-02-24 /pmc/articles/PMC9955612/ /pubmed/36827266 http://dx.doi.org/10.1371/journal.pone.0282154 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication. |
spellingShingle | Research Article Jacobson, David Barratt, Joel Optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’ |
title | Optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’ |
title_full | Optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’ |
title_fullStr | Optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’ |
title_full_unstemmed | Optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’ |
title_short | Optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’ |
title_sort | optimizing hierarchical tree dissection parameters using historic epidemiologic data as ‘ground truth’ |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955612/ https://www.ncbi.nlm.nih.gov/pubmed/36827266 http://dx.doi.org/10.1371/journal.pone.0282154 |
work_keys_str_mv | AT jacobsondavid optimizinghierarchicaltreedissectionparametersusinghistoricepidemiologicdataasgroundtruth AT barrattjoel optimizinghierarchicaltreedissectionparametersusinghistoricepidemiologicdataasgroundtruth |
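
The parameter sweep summarized in the description field (s-values from 94% to 99.5% in 0.25% steps, scored against known epidemiologic groupings) might be sketched roughly as follows. This is an illustration only, not the authors' code. The record does not say how accuracy is computed, so the pairwise agreement score (Rand index) used here is an assumption, and `toy_dissect`, `TOY_POSITIONS`, and `TOY_TRUTH` are hypothetical stand-ins for a real tree dissection method and real outbreak labels.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def s_grid(start: float = 94.0, stop: float = 99.5, step: float = 0.25) -> List[float]:
    """Candidate stringency values in percent, inclusive of both ends."""
    n_steps = round((stop - start) / step)
    return [start + i * step for i in range(n_steps + 1)]


def pairwise_accuracy(partition: Dict[str, int], truth: Dict[str, str]) -> float:
    """Fraction of isolate pairs on which partition and truth agree.

    ASSUMPTION: this Rand-index style score is one plausible reading of
    "conformity to known epidemiologic groupings". Only isolates with a
    known epidemiologic link (the keys of `truth`) are scored.
    """
    linked = [iso for iso in truth if iso in partition]
    pairs = list(combinations(linked, 2))
    if not pairs:
        raise ValueError("need at least two epidemiologically linked isolates")
    agree = sum(
        (partition[a] == partition[b]) == (truth[a] == truth[b])
        for a, b in pairs
    )
    return agree / len(pairs)


def optimize(
    dissect: Callable[[float], Dict[str, int]],
    truth: Dict[str, str],
    values: List[float],
) -> Tuple[List[float], float]:
    """Return every parameter value that reaches the best score, plus that score."""
    scores = {s: pairwise_accuracy(dissect(s), truth) for s in values}
    best = max(scores.values())
    return [s for s in values if scores[s] == best], best


# HYPOTHETICAL toy data: isolates placed on a line, "U1" has no known link.
TOY_POSITIONS = {"A1": 0.0, "A2": 1.0, "A3": 2.0, "U1": 4.0, "B1": 6.5, "B2": 7.5}
TOY_TRUTH = {
    "A1": "outbreak_A", "A2": "outbreak_A", "A3": "outbreak_A",
    "B1": "outbreak_B", "B2": "outbreak_B",
}


def toy_dissect(s: float) -> Dict[str, int]:
    """HYPOTHETICAL dissection: split wherever a gap exceeds (100 - s).

    A higher s gives a smaller gap threshold and therefore more partitions.
    """
    threshold = 100.0 - s
    ordered = sorted(TOY_POSITIONS, key=TOY_POSITIONS.get)
    labels: Dict[str, int] = {}
    cluster = 0
    previous = None
    for iso in ordered:
        if previous is not None and TOY_POSITIONS[iso] - TOY_POSITIONS[previous] > threshold:
            cluster += 1
        labels[iso] = cluster
        previous = iso
    return labels


best_values, best_score = optimize(toy_dissect, TOY_TRUTH, s_grid())
print(best_values, best_score)
```

Because `optimize` only needs a callable that maps a parameter value to partition labels, the same loop could wrap any dissection method, which matches the description's point that the approach is not tied to the s-value. Returning all tied values rather than a single winner mirrors the record's report of two equally good s-values.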