Deep neural networks and distant supervision for geographic location mention extraction

MOTIVATION: Virus phylogeographers rely on DNA sequences of viruses and the locations of the infected hosts found in public sequence databases like GenBank for modeling virus spread. However, the locations in GenBank records are often only at the country or state level, and may require phylogeograph...

Bibliographic Details
Main Authors: Magge, Arjun; Weissenbacher, Davy; Sarker, Abeed; Scotch, Matthew; Gonzalez-Hernandez, Graciela
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022665/
https://www.ncbi.nlm.nih.gov/pubmed/29950020
http://dx.doi.org/10.1093/bioinformatics/bty273
author Magge, Arjun
Weissenbacher, Davy
Sarker, Abeed
Scotch, Matthew
Gonzalez-Hernandez, Graciela
collection PubMed
description MOTIVATION: Virus phylogeographers rely on DNA sequences of viruses and the locations of the infected hosts found in public sequence databases like GenBank for modeling virus spread. However, the locations in GenBank records are often only at the country or state level, and may require phylogeographers to scan the journal articles associated with the records to identify more localized geographic areas. To automate this process, we present a named entity recognizer (NER) for detecting locations in biomedical literature. We built the NER using a deep feedforward neural network to determine whether a given token is a toponym or not. To overcome the limited human-annotated data available for training, we use distant supervision techniques to generate additional samples to train our NER. RESULTS: Our NER achieves an F1-score of 0.910 and significantly outperforms the previous state-of-the-art system. Using the additional data generated through distant supervision further boosts the performance of the NER, achieving an F1-score of 0.927. The NER presented in this research improves over previous systems significantly. Our experiments also demonstrate the NER’s capability to embed external features to further boost the system’s performance. We believe that the same methodology can be applied for recognizing similar biomedical entities in scientific literature.
format Online
Article
Text
id pubmed-6022665
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022665 2018-07-10 Bioinformatics ISMB 2018–Intelligent Systems for Molecular Biology Proceedings Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022665/ /pubmed/29950020 http://dx.doi.org/10.1093/bioinformatics/bty273 Text en © The Author(s) 2018. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Deep neural networks and distant supervision for geographic location mention extraction
topic ISMB 2018–Intelligent Systems for Molecular Biology Proceedings