Building deep learning models for evidence classification from the open access biomedical literature
We investigate the application of deep learning to biocuration tasks that involve classification of text associated with biomedical evidence in primary research articles. We developed a large-scale corpus of molecular papers derived from PubMed and PubMed Central open access records and used it to t...
Main Authors: | Burns, Gully A; Li, Xiangci; Peng, Nanyun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449534/ https://www.ncbi.nlm.nih.gov/pubmed/30938776 http://dx.doi.org/10.1093/database/baz034 |
_version_ | 1783408868759437312 |
---|---|
author | Burns, Gully A Li, Xiangci Peng, Nanyun |
author_sort | Burns, Gully A |
collection | PubMed |
description | We investigate the application of deep learning to biocuration tasks that involve classification of text associated with biomedical evidence in primary research articles. We developed a large-scale corpus of molecular papers derived from PubMed and PubMed Central open access records and used it to train deep learning word embeddings under the GloVe, FastText and ELMo algorithms. We applied those models to a distant supervised method classification task based on text from figure captions or fragments surrounding references to figures in the main text using a variety of models and parameterizations. We then developed document classification (triage) methods for molecular interaction papers by using deep learning mechanisms of attention to aggregate classification-based decisions over selected paragraphs in the document. We were able to obtain triage performance with an accuracy of 0.82 using a combined convolutional neural network, bi-directional long short-term memory architecture augmented by attention to produce a single decision for triage. In this work, we hope to encourage biocuration systems developers to apply deep learning methods to their specialized tasks by repurposing large-scale word embedding to apply to their data. |
format | Online Article Text |
id | pubmed-6449534 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6449534 2019-04-09 Building deep learning models for evidence classification from the open access biomedical literature Burns, Gully A Li, Xiangci Peng, Nanyun Database (Oxford) Original Article We investigate the application of deep learning to biocuration tasks that involve classification of text associated with biomedical evidence in primary research articles. We developed a large-scale corpus of molecular papers derived from PubMed and PubMed Central open access records and used it to train deep learning word embeddings under the GloVe, FastText and ELMo algorithms. We applied those models to a distant supervised method classification task based on text from figure captions or fragments surrounding references to figures in the main text using a variety of models and parameterizations. We then developed document classification (triage) methods for molecular interaction papers by using deep learning mechanisms of attention to aggregate classification-based decisions over selected paragraphs in the document. We were able to obtain triage performance with an accuracy of 0.82 using a combined convolutional neural network, bi-directional long short-term memory architecture augmented by attention to produce a single decision for triage. In this work, we hope to encourage biocuration systems developers to apply deep learning methods to their specialized tasks by repurposing large-scale word embedding to apply to their data. Oxford University Press 2019-04-02 /pmc/articles/PMC6449534/ /pubmed/30938776 http://dx.doi.org/10.1093/database/baz034 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Burns, Gully A Li, Xiangci Peng, Nanyun Building deep learning models for evidence classification from the open access biomedical literature |
title | Building deep learning models for evidence classification from the open access biomedical literature |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449534/ https://www.ncbi.nlm.nih.gov/pubmed/30938776 http://dx.doi.org/10.1093/database/baz034 |