
Natural Language Processing Methods for Enhancing Geographic Metadata for Phylogeography of Zoonotic Viruses

Bibliographic Details
Main authors: Tahsin, Tasnia; Beard, Rachel; Rivera, Robert; Lauder, Rob; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela
Format: Online Article Text
Language: English
Published: American Medical Informatics Association, 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4333696/
https://www.ncbi.nlm.nih.gov/pubmed/25717409

Collection: PubMed
Description: Zoonotic viruses represent emerging or re-emerging pathogens that pose significant public health threats throughout the world. It is therefore crucial to advance current surveillance mechanisms for these viruses through outlets such as phylogeography. Despite the abundance of zoonotic viral sequence data in publicly available databases such as GenBank, phylogeographic analysis of these viruses is often limited by the lack of adequate geographic metadata. However, many GenBank records include references to articles with more detailed information and automated systems may help extract this information efficiently and effectively. In this paper, we describe our efforts to determine the proportion of GenBank records with “insufficient” geographic metadata for seven well-studied viruses. We also evaluate the performance of four different Named Entity Recognition (NER) systems for automatically extracting related entities using a manually created gold-standard.
Record ID: pubmed-4333696
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: AMIA Jt Summits Transl Sci Proc
Publication date: 2014-04-07
Rights: ©2014 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose
Subjects: Articles