Towards reliable named entity recognition in the biomedical domain

MOTIVATION: Automatic biomedical named entity recognition (BioNER) is a key task in biomedical information extraction. For some time, state-of-the-art BioNER has been dominated by machine learning methods, particularly conditional random fields (CRFs), with a recent focus on deep learning. However,...

Bibliographic Details
Main authors:	Giorgi, John M, Bader, Gary D
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6956779/ https://www.ncbi.nlm.nih.gov/pubmed/31218364 http://dx.doi.org/10.1093/bioinformatics/btz504

_version_	1783487203187359744
author	Giorgi, John M Bader, Gary D
collection	PubMed
description	MOTIVATION: Automatic biomedical named entity recognition (BioNER) is a key task in biomedical information extraction. For some time, state-of-the-art BioNER has been dominated by machine learning methods, particularly conditional random fields (CRFs), with a recent focus on deep learning. However, recent work has suggested that the high performance of CRFs for BioNER may not generalize to corpora other than the one it was trained on. In our analysis, we find that a popular deep learning-based approach to BioNER, known as bidirectional long short-term memory network-conditional random field (BiLSTM-CRF), is correspondingly poor at generalizing. To address this, we evaluate three modifications of BiLSTM-CRF for BioNER to improve generalization: improved regularization via variational dropout, transfer learning and multi-task learning. RESULTS: We measure the effect that each strategy has when training/testing on the same corpus (‘in-corpus’ performance) and when training on one corpus and evaluating on another (‘out-of-corpus’ performance), our measure of the model’s ability to generalize. We found that variational dropout improves out-of-corpus performance by an average of 4.62%, transfer learning by 6.48% and multi-task learning by 8.42%. The maximal increase we identified combines multi-task learning and variational dropout, which boosts out-of-corpus performance by 10.75%. Furthermore, we make available a new open-source tool called Saber that implements our best BioNER models. AVAILABILITY AND IMPLEMENTATION: Source code for our biomedical IE tool is available at https://github.com/BaderLab/saber. Corpora and other resources used in this study are available at https://github.com/BaderLab/Towards-reliable-BioNER. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-6956779
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-69567792020-01-16 Towards reliable named entity recognition in the biomedical domain Giorgi, John M Bader, Gary D Bioinformatics Original Papers
Oxford University Press 2020-01-01 2019-06-20 /pmc/articles/PMC6956779/ /pubmed/31218364 http://dx.doi.org/10.1093/bioinformatics/btz504 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Original Papers Giorgi, John M Bader, Gary D Towards reliable named entity recognition in the biomedical domain
title	Towards reliable named entity recognition in the biomedical domain
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6956779/ https://www.ncbi.nlm.nih.gov/pubmed/31218364 http://dx.doi.org/10.1093/bioinformatics/btz504
work_keys_str_mv	AT giorgijohnm towardsreliablenamedentityrecognitioninthebiomedicaldomain AT badergaryd towardsreliablenamedentityrecognitioninthebiomedicaldomain
