Building deep learning models for evidence classification from the open access biomedical literature

Bibliographic Details
Main authors: Burns, Gully A; Li, Xiangci; Peng, Nanyun
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2019
Subjects: Original Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449534/
https://www.ncbi.nlm.nih.gov/pubmed/30938776
http://dx.doi.org/10.1093/database/baz034
Collection: PubMed
Description: We investigate the application of deep learning to biocuration tasks that involve classifying text associated with biomedical evidence in primary research articles. We developed a large-scale corpus of molecular papers derived from PubMed and PubMed Central open access records and used it to train deep learning word embeddings with the GloVe, FastText and ELMo algorithms. We applied those models to a distantly supervised method classification task based on text from figure captions or from fragments surrounding references to figures in the main text, using a variety of models and parameterizations. We then developed document classification (triage) methods for molecular interaction papers by using deep learning attention mechanisms to aggregate classification-based decisions over selected paragraphs in the document. We obtained a triage accuracy of 0.82 using a combined convolutional neural network and bidirectional long short-term memory (CNN-BiLSTM) architecture augmented by attention to produce a single triage decision. In this work, we hope to encourage biocuration system developers to apply deep learning methods to their specialized tasks by repurposing large-scale word embeddings for their data.
Record ID: pubmed-6449534
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Database (Oxford), Original Article, published 2019-04-02
License: © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
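The description above mentions using attention to aggregate paragraph-level classification decisions into a single triage decision for a paper. The Python sketch below illustrates one simple form of that idea under stated assumptions: each selected paragraph is assumed to be already encoded as a fixed-length vector (the CNN-BiLSTM encoder itself is not shown), and the functions `dot`, `softmax` and `triage`, together with the query vector, classifier weights and example values, are hypothetical placeholders rather than the authors' implementation. It uses dot-product attention and a logistic output; the attention formulation used in the paper may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def triage(paragraphs: List[Vector], query: Vector,
           clf_weights: Vector, clf_bias: float) -> Tuple[float, List[float]]:
    """Aggregate per-paragraph decisions into one document-level probability.

    paragraphs: one encoded vector per selected paragraph (assumed to come
        from an encoder such as a CNN-BiLSTM, not implemented here)
    query: hypothetical learned attention query vector
    clf_weights, clf_bias: hypothetical learned linear paragraph classifier
    """
    # Per-paragraph classification logits (the "paragraph decisions").
    logits = [dot(clf_weights, h) + clf_bias for h in paragraphs]
    # Attention weights: how much each paragraph counts toward the document decision.
    weights = softmax([dot(query, h) for h in paragraphs])
    # Attention-weighted sum of paragraph logits, squashed to a probability.
    doc_logit = sum(a * s for a, s in zip(weights, logits))
    return 1.0 / (1.0 + math.exp(-doc_logit)), weights


if __name__ == "__main__":
    # Toy 2-d paragraph vectors standing in for encoder outputs (illustrative only).
    paragraphs = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    prob, weights = triage(paragraphs, query=[1.0, 0.0],
                           clf_weights=[1.0, -1.0], clf_bias=0.0)
    print(round(prob, 3), [round(w, 3) for w in weights])
```

Because the per-paragraph scorer in this sketch is linear and the attention weights sum to 1, the attention-weighted sum of paragraph logits equals applying the same classifier to the attention-weighted average of the paragraph vectors, so pooling decisions and pooling representations coincide here; with a nonlinear paragraph classifier they would not.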