Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty
Phylogeography is a popular way to analyze virus sequences annotated with discrete, epidemiologically-relevant, trait data. For applied public health surveillance, a key quantity of interest is often the state at the root of the inferred phylogeny. In epidemiological terms, this represents the geogr...
Main Authors: | Vaiente, Matteo A., Scotch, Matthew |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7686256/ https://www.ncbi.nlm.nih.gov/pubmed/32798768 http://dx.doi.org/10.1016/j.meegid.2020.104501 |
_version_ | 1783613303827726336 |
---|---|
author | Vaiente, Matteo A. Scotch, Matthew |
author_facet | Vaiente, Matteo A. Scotch, Matthew |
author_sort | Vaiente, Matteo A. |
collection | PubMed |
description | Phylogeography is a popular way to analyze virus sequences annotated with discrete, epidemiologically relevant trait data. For applied public health surveillance, a key quantity of interest is often the state at the root of the inferred phylogeny. In epidemiological terms, this represents the geographic origin of the observed outbreak. Since determining the origin of an outbreak is often critical for public health intervention, it is prudent to understand how well phylogeographic models perform this root state classification task under various analytical scenarios. Specifically, we investigate how the sizes of the discrete state space and the sequence data set influence root state classification accuracy. We performed phylogeographic inference on several simulated DNA data sets while i) increasing the number of sequences and ii) increasing the total number of possible discrete trait values. We show that phylogeographic models tend to perform best at intermediate sequence data set sizes. Further, we demonstrate that a popular metric used for evaluation of phylogeographic models, the Kullback-Leibler (KL) divergence, increases with both discrete state space size and data set size. Moreover, by modeling phylogeographic root state classification accuracy using logistic regression, we show that KL is not supported as a predictor of model accuracy, indicating its limited utility for assessing phylogeographic model performance on empirical data. These results suggest that relying solely on the KL metric may lead to artificially inflated support for models with finer discretization schemes and larger data set sizes. These results will be important for public health practitioners seeking to use phylogeographic models for applied infectious disease surveillance. |
format | Online Article Text |
id | pubmed-7686256 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7686256 2020-11-25 Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty Vaiente, Matteo A. Scotch, Matthew Infect Genet Evol Article 2020-08-13 2020-11 /pmc/articles/PMC7686256/ /pubmed/32798768 http://dx.doi.org/10.1016/j.meegid.2020.104501 Text en This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Article Vaiente, Matteo A. Scotch, Matthew Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty |
title | Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty |
title_full | Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty |
title_fullStr | Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty |
title_full_unstemmed | Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty |
title_short | Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty |
title_sort | going back to the roots: evaluating bayesian phylogeographic models with discrete trait uncertainty |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7686256/ https://www.ncbi.nlm.nih.gov/pubmed/32798768 http://dx.doi.org/10.1016/j.meegid.2020.104501 |
work_keys_str_mv | AT vaientematteoa goingbacktotherootsevaluatingbayesianphylogeographicmodelswithdiscretetraituncertainty AT scotchmatthew goingbacktotherootsevaluatingbayesianphylogeographicmodelswithdiscretetraituncertainty |