
Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography

Discrete phylogeography using software such as BEAST considers the sampling location of each taxon as fixed, often to a single location without uncertainty. When studying viruses, this implies that there is no possibility that the location of the infected host for that taxon is somewhere else. Here,...


Bibliographic Details
Main Authors: Scotch, Matthew; Tahsin, Tasnia; Weissenbacher, Davy; O’Connor, Karen; Magge, Arjun; Vaiente, Matteo; Suchard, Marc A; Gonzalez-Hernandez, Graciela
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6395475/
https://www.ncbi.nlm.nih.gov/pubmed/30838129
http://dx.doi.org/10.1093/ve/vey043
author Scotch, Matthew
Tahsin, Tasnia
Weissenbacher, Davy
O’Connor, Karen
Magge, Arjun
Vaiente, Matteo
Suchard, Marc A
Gonzalez-Hernandez, Graciela
author_sort Scotch, Matthew
collection PubMed
description Discrete phylogeography using software such as BEAST considers the sampling location of each taxon as fixed, often to a single location without uncertainty. When studying viruses, this implies that there is no possibility that the location of the infected host for that taxon is somewhere else. Here, we relaxed this strong assumption and allowed for analytic integration of uncertainty for discrete virus phylogeography. We used automatic language processing methods to find and assign uncertainty to alternative potential locations. We considered two influenza case studies: H5N1 in Egypt and H1N1 pdm09 in North America. For each, we implemented scenarios in which 25 per cent of the taxa had different amounts of sampling uncertainty (10, 30, and 50 per cent) and varied how it was distributed for each taxon. This includes scenarios that: (i) placed a specific amount of uncertainty on one location while uniformly distributing the remaining amount across all other candidate locations (correspondingly labeled 10, 30, and 50); (ii) assigned the remaining uncertainty to just one other location, thus ‘splitting’ the uncertainty among two locations (i.e. 10/90, 30/70, and 50/50); and (iii) eliminated uncertainty via two predefined heuristic approaches: assignment to a centroid location (CNTR) or the largest population in the country (POP). We compared all scenarios to a reference standard (RS) in which all taxa had known (absolutely certain) locations. From this, we implemented five random selections of 25 per cent of the taxa and used these for specifying uncertainty. We performed posterior analyses for each scenario, including: (a) virus persistence, (b) migration rates, (c) trunk rewards, and (d) the posterior probability of the root state. The scenarios with sampling uncertainty were closer to the RS than CNTR and POP. 
For H5N1, the absolute error of virus persistence had a median range of 0.005–0.047 for scenarios with sampling uncertainty—(i) and (ii) above—versus a range of 0.063–0.075 for CNTR and POP. Persistence for the pdm09 case study followed a similar trend as did our analyses of migration rates across scenarios (i) and (ii). When considering the posterior probability of the root state, we found all but one of the H5N1 scenarios with sampling uncertainty had agreement with the RS on the origin of the outbreak whereas both CNTR and POP disagreed. Our results suggest that assigning geospatial uncertainty to taxa benefits estimation of virus phylogeography as compared to ad-hoc heuristics. We also found that, in general, there was limited difference in results regardless of how the sampling uncertainty was assigned; uniform distribution or split between two locations did not greatly impact posterior results. This framework is available in BEAST v.1.10. In future work, we will explore viruses beyond influenza. We will also develop a web interface for researchers to use our language processing methods to find and assign uncertainty to alternative potential locations for virus phylogeography.
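The three kinds of scenario described in the abstract can be pictured as per-taxon probability vectors over candidate sampling locations. The following is only an illustrative sketch, not the authors' implementation or BEAST's input format: the function names, the location labels, and the reading that the stated amount goes on one "focal" location while the remainder is spread over the others are all assumptions made for this example.

```python
from typing import Dict, Sequence


def _check(candidates: Sequence[str], focal: str, p: float) -> None:
    # Shared validation for the hypothetical helpers below.
    if focal not in candidates:
        raise ValueError("focal location must be one of the candidates")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")


def uniform_remainder(candidates: Sequence[str], focal: str, p: float) -> Dict[str, float]:
    """Scenario (i), as interpreted here: put p on the focal location and
    spread the remaining 1 - p evenly over all other candidates."""
    _check(candidates, focal, p)
    others = [c for c in candidates if c != focal]
    if not others:
        raise ValueError("need at least two candidate locations")
    share = (1.0 - p) / len(others)
    probs = {c: share for c in others}
    probs[focal] = p
    return probs


def split_two(candidates: Sequence[str], focal: str, other: str, p: float) -> Dict[str, float]:
    """Scenario (ii), as interpreted here: put p on the focal location and
    the remaining 1 - p on exactly one other location (e.g. 10/90)."""
    _check(candidates, focal, p)
    if other not in candidates or other == focal:
        raise ValueError("other must be a different candidate location")
    probs = {c: 0.0 for c in candidates}
    probs[focal] = p
    probs[other] = 1.0 - p
    return probs


def point_assignment(candidates: Sequence[str], chosen: str) -> Dict[str, float]:
    """Scenario (iii): a heuristic (such as CNTR or POP) picks one location
    and assigns it with certainty, removing the uncertainty."""
    if chosen not in candidates:
        raise ValueError("chosen location must be one of the candidates")
    return {c: (1.0 if c == chosen else 0.0) for c in candidates}


if __name__ == "__main__":
    # Hypothetical candidate locations for a single taxon.
    locations = ["loc_A", "loc_B", "loc_C", "loc_D"]
    print(uniform_remainder(locations, "loc_A", 0.10))
    print(split_two(locations, "loc_A", "loc_B", 0.10))
    print(point_assignment(locations, "loc_C"))
```

Each helper returns a distribution that sums to one, so the three scenario types differ only in how the probability mass is arranged across the candidate locations.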
format Online
Article
Text
id pubmed-6395475
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-63954752019-03-05 Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography Scotch, Matthew Tahsin, Tasnia Weissenbacher, Davy O’Connor, Karen Magge, Arjun Vaiente, Matteo Suchard, Marc A Gonzalez-Hernandez, Graciela Virus Evol Research Article Discrete phylogeography using software such as BEAST considers the sampling location of each taxon as fixed; often to a single location without uncertainty. When studying viruses, this implies that there is no possibility that the location of the infected host for that taxa is somewhere else. Here, we relaxed this strong assumption and allowed for analytic integration of uncertainty for discrete virus phylogeography. We used automatic language processing methods to find and assign uncertainty to alternative potential locations. We considered two influenza case studies: H5N1 in Egypt; H1N1 pdm09 in North America. For each, we implemented scenarios in which 25 per cent of the taxa had different amounts of sampling uncertainty including 10, 30, and 50 per cent uncertainty and varied how it was distributed for each taxon. This includes scenarios that: (i) placed a specific amount of uncertainty on one location while uniformly distributing the remaining amount across all other candidate locations (correspondingly labeled 10, 30, and 50); (ii) assigned the remaining uncertainty to just one other location; thus ‘splitting’ the uncertainty among two locations (i.e. 10/90, 30/70, and 50/50); and (iii) eliminated uncertainty via two predefined heuristic approaches: assignment to a centroid location (CNTR) or the largest population in the country (POP). We compared all scenarios to a reference standard (RS) in which all taxa had known (absolutely certain) locations. From this, we implemented five random selections of 25 per cent of the taxa and used these for specifying uncertainty. 
We performed posterior analyses for each scenario, including: (a) virus persistence, (b) migration rates, (c) trunk rewards, and (d) the posterior probability of the root state. The scenarios with sampling uncertainty were closer to the RS than CNTR and POP. For H5N1, the absolute error of virus persistence had a median range of 0.005–0.047 for scenarios with sampling uncertainty—(i) and (ii) above—versus a range of 0.063–0.075 for CNTR and POP. Persistence for the pdm09 case study followed a similar trend as did our analyses of migration rates across scenarios (i) and (ii). When considering the posterior probability of the root state, we found all but one of the H5N1 scenarios with sampling uncertainty had agreement with the RS on the origin of the outbreak whereas both CNTR and POP disagreed. Our results suggest that assigning geospatial uncertainty to taxa benefits estimation of virus phylogeography as compared to ad-hoc heuristics. We also found that, in general, there was limited difference in results regardless of how the sampling uncertainty was assigned; uniform distribution or split between two locations did not greatly impact posterior results. This framework is available in BEAST v.1.10. In future work, we will explore viruses beyond influenza. We will also develop a web interface for researchers to use our language processing methods to find and assign uncertainty to alternative potential locations for virus phylogeography. Oxford University Press 2019-02-28 /pmc/articles/PMC6395475/ /pubmed/30838129 http://dx.doi.org/10.1093/ve/vey043 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. 
For commercial re-use, please contact journals.permissions@oup.com
title Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6395475/
https://www.ncbi.nlm.nih.gov/pubmed/30838129
http://dx.doi.org/10.1093/ve/vey043