Knowledge-driven geospatial location resolution for phylogeographic models of virus migration

Summary: Diseases caused by zoonotic viruses (viruses transmittable between humans and animals) are a major threat to public health throughout the world. By studying virus migration and mutation patterns, the field of phylogeography provides a valuable tool for improving their surveillance. A key co...

Bibliographic Details
Main Authors: Weissenbacher, Davy; Tahsin, Tasnia; Beard, Rachel; Figaro, Mari; Rivera, Robert; Scotch, Matthew; Gonzalez, Graciela
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4542781/
https://www.ncbi.nlm.nih.gov/pubmed/26072502
http://dx.doi.org/10.1093/bioinformatics/btv259
author Weissenbacher, Davy
Tahsin, Tasnia
Beard, Rachel
Figaro, Mari
Rivera, Robert
Scotch, Matthew
Gonzalez, Graciela
author_sort Weissenbacher, Davy
collection PubMed
description Summary: Diseases caused by zoonotic viruses (viruses transmittable between humans and animals) are a major threat to public health throughout the world. By studying virus migration and mutation patterns, the field of phylogeography provides a valuable tool for improving their surveillance. A key component in phylogeographic analysis of zoonotic viruses involves identifying the specific locations of relevant viral sequences. This is usually accomplished by querying public databases such as GenBank and examining the geospatial metadata in the record. When sufficient detail is not available, a logical next step is for the researcher to conduct a manual survey of the corresponding published articles. Motivation: In this article, we present a system for detection and disambiguation of locations (toponym resolution) in full-text articles to automate the retrieval of sufficient metadata. Our system has been tested on a manually annotated corpus of journal articles related to phylogeography using integrated heuristics for location disambiguation including a distance heuristic, a population heuristic and a novel heuristic utilizing knowledge obtained from GenBank metadata (i.e. a ‘metadata heuristic’). Results: For detecting and disambiguating locations, our system performed best using the metadata heuristic (0.54 Precision, 0.89 Recall and 0.68 F-score). Precision reaches 0.88 when examining only the disambiguation of location names. Our error analysis showed that a noticeable increase in the accuracy of toponym resolution is possible by improving the geospatial location detection. By improving these fundamental automated tasks, our system can be a useful resource to phylogeographers that rely on geospatial metadata of GenBank sequences. Contact: davy.weissenbacher@asu.edu
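The description above names a population heuristic and a metadata heuristic for disambiguating place names, but this record does not say how either is implemented. The following is only a minimal sketch of the general idea, under stated assumptions: the `Candidate` schema, both function names, and the rule "keep candidates in the country given by the sequence metadata, then fall back to population" are hypothetical and are not taken from the article. The population figures in the usage example are illustrative round numbers, not gazetteer data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One gazetteer entry a place name could refer to (hypothetical schema)."""
    name: str
    country: str      # country code of the candidate location
    population: int   # used only to rank candidates


def resolve_by_population(candidates: list[Candidate]) -> Optional[Candidate]:
    """Population heuristic: prefer the most populous candidate."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.population)


def resolve_with_metadata(
    candidates: list[Candidate], metadata_country: Optional[str]
) -> Optional[Candidate]:
    """Assumed form of a metadata heuristic (not the article's algorithm).

    Keep only candidates in the country named by the sequence record's
    metadata; if none match, fall back to the population heuristic.
    """
    if metadata_country is not None:
        in_country = [c for c in candidates if c.country == metadata_country]
        if in_country:
            return resolve_by_population(in_country)
    return resolve_by_population(candidates)


if __name__ == "__main__":
    # Illustrative values only.
    paris = [
        Candidate("Paris", "FR", 2_000_000),
        Candidate("Paris", "US", 25_000),
    ]
    print(resolve_by_population(paris))        # most populous candidate
    print(resolve_with_metadata(paris, "US"))  # candidate matching the metadata
```

In this sketch the metadata acts as a filter before ranking, so a small town can win over a large city when the sequence record points to its country. A distance heuristic, also named in the description, would need coordinates for each candidate and is left out here.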
format Online
Article
Text
id pubmed-4542781
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4542781 2015-08-25 Knowledge-driven geospatial location resolution for phylogeographic models of virus migration Bioinformatics Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland Oxford University Press 2015-06-15 2015-06-10 /pmc/articles/PMC4542781/ /pubmed/26072502 http://dx.doi.org/10.1093/bioinformatics/btv259 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Knowledge-driven geospatial location resolution for phylogeographic models of virus migration
topic Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland