Deep neural networks and distant supervision for geographic location mention extraction
MOTIVATION: Virus phylogeographers rely on DNA sequences of viruses and the locations of the infected hosts found in public sequence databases like GenBank for modeling virus spread. However, the locations in GenBank records are often only at the country or state level, and may require phylogeograph...
Main authors: | Magge, Arjun; Weissenbacher, Davy; Sarker, Abeed; Scotch, Matthew; Gonzalez-Hernandez, Graciela |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022665/ https://www.ncbi.nlm.nih.gov/pubmed/29950020 http://dx.doi.org/10.1093/bioinformatics/bty273 |
_version_ | 1783335726630305792 |
---|---|
author | Magge, Arjun Weissenbacher, Davy Sarker, Abeed Scotch, Matthew Gonzalez-Hernandez, Graciela |
collection | PubMed |
description | MOTIVATION: Virus phylogeographers rely on DNA sequences of viruses and the locations of the infected hosts found in public sequence databases like GenBank for modeling virus spread. However, the locations in GenBank records are often only at the country or state level, and may require phylogeographers to scan the journal articles associated with the records to identify more localized geographic areas. To automate this process, we present a named entity recognizer (NER) for detecting locations in biomedical literature. We built the NER using a deep feedforward neural network to determine whether a given token is a toponym or not. To overcome the limited human annotated data available for training, we use distant supervision techniques to generate additional samples to train our NER. RESULTS: Our NER achieves an F1-score of 0.910 and significantly outperforms the previous state-of-the-art system. Using the additional data generated through distant supervision further boosts the performance of the NER achieving an F1-score of 0.927. The NER presented in this research improves over previous systems significantly. Our experiments also demonstrate the NER’s capability to embed external features to further boost the system’s performance. We believe that the same methodology can be applied for recognizing similar biomedical entities in scientific literature. |
format | Online Article Text |
id | pubmed-6022665 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6022665 2018-07-10 Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022665/ /pubmed/29950020 http://dx.doi.org/10.1093/bioinformatics/bty273 Text en © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Deep neural networks and distant supervision for geographic location mention extraction |
topic | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022665/ https://www.ncbi.nlm.nih.gov/pubmed/29950020 http://dx.doi.org/10.1093/bioinformatics/bty273 |