LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools

BACKGROUND: Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical...

Bibliographic Details
Main authors: Hemati, Wahed; Mehler, Alexander
Format: Online Article Text
Language: English
Published: Springer International Publishing 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6689880/
https://www.ncbi.nlm.nih.gov/pubmed/30631966
http://dx.doi.org/10.1186/s13321-018-0327-2
_version_ 1783443106854600704
author Hemati, Wahed
Mehler, Alexander
collection PubMed
description BACKGROUND: Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical named entities in the literature is an essential step in chemical text mining pipelines for identifying chemical mentions, their properties, and relations as discussed in the literature. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of chemical named entities. For this purpose, we transform the task of NER into a sequence labeling problem. We present a series of sequence labeling systems that we used, adapted and optimized in our experiments for solving this task. To this end, we experiment with hyperparameter optimization. Finally, we present LSTMVoter, a two-stage application of recurrent neural networks that integrates the optimized sequence labelers from our study into a single ensemble classifier.
RESULTS: We introduce LSTMVoter, a bidirectional long short-term memory (LSTM) tagger that utilizes a conditional random field layer in conjunction with attention-based feature modeling. Our approach explores information about features that is modeled by means of an attention mechanism. LSTMVoter outperforms each extractor integrated by it in a series of experiments. On the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus, LSTMVoter achieves an F1-score of 90.04%; on the BioCreative V.5 chemical entity mention in patents corpus, it achieves an F1-score of 89.01%.
AVAILABILITY AND IMPLEMENTATION: Data and code are available at https://github.com/texttechnologylab/LSTMVoter.
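The abstract describes recasting NER as a sequence labeling problem and reports results as F1-scores. As a minimal sketch of what that recasting can look like, the Python below encodes entity spans as per-token BIO labels, decodes them back, and scores a prediction with an exact-match entity-level F1. The BIO scheme, the `CHEMICAL` label, the example sentence, and all function names are illustrative assumptions; this is not the LSTMVoter implementation and not the official BioCreative evaluation script.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, entity_type); hypothetical span format
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode entity spans as one BIO label per token."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO labels back into entity spans."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match entity-level F1 (assumed metric definition)."""
    g, p = set(gold), set(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence, invented for this sketch
tokens = ["Treatment", "with", "acetylsalicylic", "acid", "or", "ibuprofen", "."]
gold_spans = [(2, 4, "CHEMICAL"), (5, 6, "CHEMICAL")]
gold_tags = spans_to_bio(len(tokens), gold_spans)

# A hypothetical tagger output that misses the second mention
pred_tags = ["O", "O", "B-CHEMICAL", "I-CHEMICAL", "O", "O", "O"]
pred_spans = bio_to_spans(pred_tags)

f1 = entity_f1(gold_spans, pred_spans)
```

Here the hypothetical tagger finds one of the two gold mentions, so precision is 1.0 and recall is 0.5. An ensemble such as the one the abstract describes would take the label sequences of several such taggers as input; that combination step is not sketched here.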
format Online
Article
Text
id pubmed-6689880
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-6689880 2019-08-15 LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools Hemati, Wahed Mehler, Alexander J Cheminform Research Article
Springer International Publishing 2019-01-10 /pmc/articles/PMC6689880/ /pubmed/30631966 http://dx.doi.org/10.1186/s13321-018-0327-2 Text en © The Author(s) 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools
topic Research Article