Knowledge-driven geospatial location resolution for phylogeographic models of virus migration
Summary: Diseases caused by zoonotic viruses (viruses transmittable between humans and animals) are a major threat to public health throughout the world. By studying virus migration and mutation patterns, the field of phylogeography provides a valuable tool for improving their surveillance. A key co...
Main authors: | Weissenbacher, Davy; Tahsin, Tasnia; Beard, Rachel; Figaro, Mari; Rivera, Robert; Scotch, Matthew; Gonzalez, Graciela |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4542781/ https://www.ncbi.nlm.nih.gov/pubmed/26072502 http://dx.doi.org/10.1093/bioinformatics/btv259 |
_version_ | 1782386561799684096 |
---|---|
author | Weissenbacher, Davy; Tahsin, Tasnia; Beard, Rachel; Figaro, Mari; Rivera, Robert; Scotch, Matthew; Gonzalez, Graciela |
author_sort | Weissenbacher, Davy |
collection | PubMed |
description | Summary: Diseases caused by zoonotic viruses (viruses transmittable between humans and animals) are a major threat to public health throughout the world. By studying virus migration and mutation patterns, the field of phylogeography provides a valuable tool for improving their surveillance. A key component in phylogeographic analysis of zoonotic viruses involves identifying the specific locations of relevant viral sequences. This is usually accomplished by querying public databases such as GenBank and examining the geospatial metadata in the record. When sufficient detail is not available, a logical next step is for the researcher to conduct a manual survey of the corresponding published articles. Motivation: In this article, we present a system for detection and disambiguation of locations (toponym resolution) in full-text articles to automate the retrieval of sufficient metadata. Our system has been tested on a manually annotated corpus of journal articles related to phylogeography using integrated heuristics for location disambiguation including a distance heuristic, a population heuristic and a novel heuristic utilizing knowledge obtained from GenBank metadata (i.e. a ‘metadata heuristic’). Results: For detecting and disambiguating locations, our system performed best using the metadata heuristic (0.54 Precision, 0.89 Recall and 0.68 F-score). Precision reaches 0.88 when examining only the disambiguation of location names. Our error analysis showed that a noticeable increase in the accuracy of toponym resolution is possible by improving the geospatial location detection. By improving these fundamental automated tasks, our system can be a useful resource to phylogeographers that rely on geospatial metadata of GenBank sequences. Contact: davy.weissenbacher@asu.edu |
format | Online Article Text |
id | pubmed-4542781 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4542781 2015-08-25 Knowledge-driven geospatial location resolution for phylogeographic models of virus migration Weissenbacher, Davy; Tahsin, Tasnia; Beard, Rachel; Figaro, Mari; Rivera, Robert; Scotch, Matthew; Gonzalez, Graciela Bioinformatics Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland Oxford University Press 2015-06-15 2015-06-10 /pmc/articles/PMC4542781/ /pubmed/26072502 http://dx.doi.org/10.1093/bioinformatics/btv259 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Knowledge-driven geospatial location resolution for phylogeographic models of virus migration |
topic | Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4542781/ https://www.ncbi.nlm.nih.gov/pubmed/26072502 http://dx.doi.org/10.1093/bioinformatics/btv259 |
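
The description above names a population heuristic and a GenBank "metadata heuristic" for disambiguating locations but does not spell out how they work. The following is a minimal sketch of one plausible reading, not the authors' implementation: among gazetteer candidates sharing a name, prefer those in the country given by the sequence record's metadata, then fall back to the most populous candidate. The names `Candidate`, `GAZETTEER` and `resolve` are hypothetical, and the gazetteer entries and population figures are illustrative placeholders rather than real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One gazetteer entry that a place name could refer to."""
    name: str
    country: str
    population: int


# Toy gazetteer (hypothetical; population figures are placeholders, not real data).
GAZETTEER = {
    "Paris": [
        Candidate("Paris", "France", 2_000_000),
        Candidate("Paris", "United States", 25_000),
    ],
}


def resolve(toponym: str, metadata_country: Optional[str] = None) -> Optional[Candidate]:
    """Pick one candidate for an ambiguous place name.

    Assumed reading of the heuristics named in the abstract:
    1. metadata heuristic: keep only candidates in the country taken from
       the sequence record's metadata, if any candidate matches;
    2. population heuristic: among the remaining candidates, choose the
       one with the largest population.
    """
    candidates = GAZETTEER.get(toponym, [])
    if not candidates:
        return None  # name not in the gazetteer: nothing to disambiguate
    if metadata_country is not None:
        in_country = [c for c in candidates if c.country == metadata_country]
        if in_country:
            candidates = in_country
    return max(candidates, key=lambda c: c.population)


if __name__ == "__main__":
    print(resolve("Paris"))
    print(resolve("Paris", metadata_country="United States"))
```

With no metadata the sketch falls back to the most populous candidate; when the metadata country matches no candidate it ignores the metadata rather than returning nothing. Both choices are assumptions made for illustration.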