
Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty

Phylogeography is a popular way to analyze virus sequences annotated with discrete, epidemiologically relevant trait data. For applied public health surveillance, a key quantity of interest is often the state at the root of the inferred phylogeny. In epidemiological terms, this represents the geogr...


Bibliographic Details
Main authors: Vaiente, Matteo A.; Scotch, Matthew
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7686256/
https://www.ncbi.nlm.nih.gov/pubmed/32798768
http://dx.doi.org/10.1016/j.meegid.2020.104501
_version_ 1783613303827726336
author Vaiente, Matteo A.
Scotch, Matthew
author_facet Vaiente, Matteo A.
Scotch, Matthew
author_sort Vaiente, Matteo A.
collection PubMed
description Phylogeography is a popular way to analyze virus sequences annotated with discrete, epidemiologically relevant trait data. For applied public health surveillance, a key quantity of interest is often the state at the root of the inferred phylogeny. In epidemiological terms, this represents the geographic origin of the observed outbreak. Since determining the origin of an outbreak is often critical for public health intervention, it is prudent to understand how well phylogeographic models perform this root state classification task under various analytical scenarios. Specifically, we investigate how the size of the discrete state space and the size of the sequence data set influence root state classification accuracy. We performed phylogeographic inference on several simulated DNA data sets while i) increasing the number of sequences and ii) increasing the total number of possible discrete trait values. We show that phylogeographic models tend to perform best at intermediate sequence data set sizes. Further, we demonstrate that a popular metric used for evaluation of phylogeographic models, the Kullback-Leibler (KL) divergence, increases with both discrete state space size and data set size. Moreover, by modeling phylogeographic root state classification accuracy using logistic regression, we show that KL is not supported as a predictor of model accuracy, indicating its limited utility for assessing phylogeographic model performance on empirical data. These results suggest that relying solely on the KL metric may lead to artificially inflated support for models with finer discretization schemes and larger data set sizes. These findings will be important for public health practitioners seeking to use phylogeographic models for applied infectious disease surveillance.
format Online
Article
Text
id pubmed-7686256
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7686256 2020-11-25 Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty Vaiente, Matteo A. Scotch, Matthew Infect Genet Evol Article
2020-08-13 2020-11 /pmc/articles/PMC7686256/ /pubmed/32798768 http://dx.doi.org/10.1016/j.meegid.2020.104501 Text en This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Article
Vaiente, Matteo A.
Scotch, Matthew
Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty
title Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty
title_full Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty
title_fullStr Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty
title_full_unstemmed Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty
title_short Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty
title_sort going back to the roots: evaluating bayesian phylogeographic models with discrete trait uncertainty
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7686256/
https://www.ncbi.nlm.nih.gov/pubmed/32798768
http://dx.doi.org/10.1016/j.meegid.2020.104501
work_keys_str_mv AT vaientematteoa goingbacktotherootsevaluatingbayesianphylogeographicmodelswithdiscretetraituncertainty
AT scotchmatthew goingbacktotherootsevaluatingbayesianphylogeographicmodelswithdiscretetraituncertainty
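
Illustrative note on the KL metric mentioned in the description: the record does not give the exact definition used by the authors, so the following is only a minimal sketch. It assumes the KL divergence is taken between the posterior distribution over root states and a uniform prior over K discrete states. The function names and the example posterior (0.9 of the mass on one state, the remainder spread evenly) are hypothetical and are not taken from the article. Under these assumptions the divergence is bounded above by log K, so the same degree of posterior concentration yields a larger KL value as K grows, which is in line with the concern about finer discretization schemes.

```python
import math


def kl_divergence(posterior, prior):
    """Return KL(posterior || prior) in nats for two discrete distributions."""
    if len(posterior) != len(prior):
        raise ValueError("distributions must have the same number of states")
    # Terms with zero posterior mass contribute nothing to the sum.
    return sum(p * math.log(p / q) for p, q in zip(posterior, prior) if p > 0)


def kl_from_uniform_prior(posterior):
    """KL between a root state posterior and a uniform prior (an assumption)."""
    k = len(posterior)
    return kl_divergence(posterior, [1.0 / k] * k)


def concentrated_posterior(k, top=0.9):
    """Hypothetical posterior: `top` mass on one state, rest spread evenly."""
    rest = (1.0 - top) / (k - 1)
    return [top] + [rest] * (k - 1)


# Same posterior concentration, increasingly fine discretization.
for k in (2, 4, 8, 16):
    kl = kl_from_uniform_prior(concentrated_posterior(k))
    print(f"K={k:2d}  KL={kl:.3f}  upper bound log(K)={math.log(k):.3f}")
```

In this sketch the KL value rises with K even though the posterior places the same 0.9 mass on a single state each time, so a larger KL by itself does not indicate a more accurate root state classification.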