Bi-directional Recurrent Neural Network Models for Geographic Location Extraction in Biomedical Literature

Phylogeography research involving virus spread and tree reconstruction relies on accurate geographic locations of infected hosts. Insufficient level of geographic information in nucleotide sequence repositories such as GenBank motivates the use of natural language processing methods for extracting g...

Bibliographic Details
Main Authors: Magge, Arjun, Weissenbacher, Davy, Sarker, Abeed, Scotch, Matthew, Gonzalez-Hernandez, Graciela
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417823/
https://www.ncbi.nlm.nih.gov/pubmed/30864314
_version_ 1783403628835373056
author Magge, Arjun
Weissenbacher, Davy
Sarker, Abeed
Scotch, Matthew
Gonzalez-Hernandez, Graciela
author_sort Magge, Arjun
collection PubMed
description Phylogeography research involving virus spread and tree reconstruction relies on accurate geographic locations of infected hosts. Insufficient level of geographic information in nucleotide sequence repositories such as GenBank motivates the use of natural language processing methods for extracting geographic location names (toponyms) in the scientific article associated with the sequence, and disambiguating the locations to their co-ordinates. In this paper, we present an extensive study of multiple recurrent neural network architectures for the task of extracting geographic locations and their effective contribution to the disambiguation task using population heuristics. The methods presented in this paper achieve a strict detection F1 score of 0.94, disambiguation accuracy of 91% and an overall resolution F1 score of 0.88 that are significantly higher than previously developed methods, improving our capability to find the location of infected hosts and enrich metadata information.
format Online
Article
Text
id pubmed-6417823
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-6417823 2019-03-14 Pac Symp Biocomput Article 2019 /pmc/articles/PMC6417823/ /pubmed/30864314 Text en http://creativecommons.org/licenses/by-nc/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.
title Bi-directional Recurrent Neural Network Models for Geographic Location Extraction in Biomedical Literature
topic Article