Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography
Discrete phylogeography using software such as BEAST considers the sampling location of each taxon as fixed, often to a single location without uncertainty. When studying viruses, this implies that there is no possibility that the location of the infected host for that taxon is somewhere else. Here,...
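Main authors: | Scotch, Matthew; Tahsin, Tasnia; Weissenbacher, Davy; O’Connor, Karen; Magge, Arjun; Vaiente, Matteo; Suchard, Marc A; Gonzalez-Hernandez, Graciela |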
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6395475/ https://www.ncbi.nlm.nih.gov/pubmed/30838129 http://dx.doi.org/10.1093/ve/vey043 |
_version_ | 1783399096812306432 |
---|---|
author | Scotch, Matthew; Tahsin, Tasnia; Weissenbacher, Davy; O’Connor, Karen; Magge, Arjun; Vaiente, Matteo; Suchard, Marc A; Gonzalez-Hernandez, Graciela |
author_facet | Scotch, Matthew; Tahsin, Tasnia; Weissenbacher, Davy; O’Connor, Karen; Magge, Arjun; Vaiente, Matteo; Suchard, Marc A; Gonzalez-Hernandez, Graciela |
author_sort | Scotch, Matthew |
collection | PubMed |
description | Discrete phylogeography using software such as BEAST considers the sampling location of each taxon as fixed, often to a single location without uncertainty. When studying viruses, this implies that there is no possibility that the location of the infected host for that taxon is somewhere else. Here, we relaxed this strong assumption and allowed for analytic integration of uncertainty for discrete virus phylogeography. We used automatic language processing methods to find and assign uncertainty to alternative potential locations. We considered two influenza case studies: H5N1 in Egypt and H1N1 pdm09 in North America. For each, we implemented scenarios in which 25 per cent of the taxa had different amounts of sampling uncertainty (10, 30, and 50 per cent) and varied how it was distributed for each taxon. These include scenarios that: (i) placed a specific amount of uncertainty on one location while uniformly distributing the remaining amount across all other candidate locations (correspondingly labeled 10, 30, and 50); (ii) assigned the remaining uncertainty to just one other location, thus ‘splitting’ the uncertainty between two locations (i.e. 10/90, 30/70, and 50/50); and (iii) eliminated uncertainty via two predefined heuristic approaches: assignment to a centroid location (CNTR) or to the largest population in the country (POP). We compared all scenarios to a reference standard (RS) in which all taxa had known (absolutely certain) locations. From this, we implemented five random selections of 25 per cent of the taxa and used these for specifying uncertainty. We performed posterior analyses for each scenario, including: (a) virus persistence, (b) migration rates, (c) trunk rewards, and (d) the posterior probability of the root state. The scenarios with sampling uncertainty were closer to the RS than CNTR and POP. For H5N1, the absolute error of virus persistence had a median range of 0.005–0.047 for scenarios with sampling uncertainty, that is (i) and (ii) above, versus a range of 0.063–0.075 for CNTR and POP. Persistence for the pdm09 case study followed a similar trend, as did our analyses of migration rates across scenarios (i) and (ii). When considering the posterior probability of the root state, we found that all but one of the H5N1 scenarios with sampling uncertainty agreed with the RS on the origin of the outbreak, whereas both CNTR and POP disagreed. Our results suggest that assigning geospatial uncertainty to taxa benefits estimation of virus phylogeography as compared to ad-hoc heuristics. We also found that, in general, there was limited difference in results regardless of how the sampling uncertainty was assigned: a uniform distribution or a split between two locations did not greatly impact posterior results. This framework is available in BEAST v.1.10. In future work, we will explore viruses beyond influenza. We will also develop a web interface for researchers to use our language processing methods to find and assign uncertainty to alternative potential locations for virus phylogeography. |
format | Online Article Text |
id | pubmed-6395475 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6395475 2019-03-05 Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography Scotch, Matthew Tahsin, Tasnia Weissenbacher, Davy O’Connor, Karen Magge, Arjun Vaiente, Matteo Suchard, Marc A Gonzalez-Hernandez, Graciela Virus Evol Research Article Oxford University Press 2019-02-28 /pmc/articles/PMC6395475/ /pubmed/30838129 http://dx.doi.org/10.1093/ve/vey043 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Research Article Scotch, Matthew Tahsin, Tasnia Weissenbacher, Davy O’Connor, Karen Magge, Arjun Vaiente, Matteo Suchard, Marc A Gonzalez-Hernandez, Graciela Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography |
title | Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography |
title_full | Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography |
title_fullStr | Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography |
title_full_unstemmed | Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography |
title_short | Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography |
title_sort | incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6395475/ https://www.ncbi.nlm.nih.gov/pubmed/30838129 http://dx.doi.org/10.1093/ve/vey043 |
work_keys_str_mv | AT scotchmatthew incorporatingsamplinguncertaintyinthegeospatialassignmentoftaxaforvirusphylogeography AT tahsintasnia incorporatingsamplinguncertaintyinthegeospatialassignmentoftaxaforvirusphylogeography AT weissenbacherdavy incorporatingsamplinguncertaintyinthegeospatialassignmentoftaxaforvirusphylogeography AT oconnorkaren incorporatingsamplinguncertaintyinthegeospatialassignmentoftaxaforvirusphylogeography AT maggearjun incorporatingsamplinguncertaintyinthegeospatialassignmentoftaxaforvirusphylogeography AT vaientematteo incorporatingsamplinguncertaintyinthegeospatialassignmentoftaxaforvirusphylogeography AT suchardmarca incorporatingsamplinguncertaintyinthegeospatialassignmentoftaxaforvirusphylogeography AT gonzalezhernandezgraciela incorporatingsamplinguncertaintyinthegeospatialassignmentoftaxaforvirusphylogeography |
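
Illustrative note on the description field: scenarios (i) and (ii) in the abstract can be read as two ways of building a probability vector over candidate sampling locations for a single taxon. The following is a minimal sketch under that reading, not the authors' implementation and not BEAST code. It assumes the stated amount (10, 30, or 50 per cent) is the probability mass placed on one focal location; the function names and the location labels `A` to `D` are hypothetical.

```python
from typing import Dict, List


def uniform_remainder(focal: str, candidates: List[str], p: float) -> Dict[str, float]:
    """Scenario (i), as assumed here: put mass p on the focal location and
    spread the remaining 1 - p uniformly over all other candidates."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    others = [loc for loc in candidates if loc != focal]
    if not others:
        raise ValueError("need at least one candidate other than the focal location")
    share = (1.0 - p) / len(others)
    probs = {loc: share for loc in others}
    probs[focal] = p
    return probs


def split_two(focal: str, alternative: str, p: float) -> Dict[str, float]:
    """Scenario (ii), as assumed here: put mass p on the focal location and
    the remaining 1 - p on exactly one other location (e.g. 10/90)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if focal == alternative:
        raise ValueError("the two locations must differ")
    return {focal: p, alternative: 1.0 - p}
```

For example, `uniform_remainder("A", ["A", "B", "C", "D"], 0.1)` places 0.1 on `A` and 0.3 on each of the other three locations, while `split_two("A", "B", 0.3)` corresponds to a 30/70 split. Each vector sums to one, so it can serve as a per-taxon prior over discrete locations in whatever format the analysis software expects.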