
Improving deep learning method for biomedical named entity recognition by using entity definition information

BACKGROUND: Biomedical named entity recognition (NER) is a fundamental task of biomedical text mining that finds the boundaries of entity mentions in biomedical text and determines their entity type. To accelerate the development of biomedical NER techniques in Spanish, the PharmaCoNER organizers la...


Bibliographic Details
Main authors: Xiong, Ying; Chen, Shuai; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Yan, Jun; Zhou, Yi
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8680061/
https://www.ncbi.nlm.nih.gov/pubmed/34920699
http://dx.doi.org/10.1186/s12859-021-04236-y
author Xiong, Ying
Chen, Shuai
Tang, Buzhou
Chen, Qingcai
Wang, Xiaolong
Yan, Jun
Zhou, Yi
collection PubMed
description BACKGROUND: Biomedical named entity recognition (NER) is a fundamental task of biomedical text mining that finds the boundaries of entity mentions in biomedical text and determines their entity type. To accelerate the development of biomedical NER techniques in Spanish, the PharmaCoNER organizers launched a competition to recognize pharmacological substances, compounds, and proteins. Biomedical NER is usually treated as a sequence labeling task, and almost all state-of-the-art sequence labeling methods ignore the meaning of different entity types. In this paper, we investigate ways to introduce the meaning of entity types into deep learning methods for biomedical NER and apply them to the PharmaCoNER 2019 challenge. The meaning of each entity type is represented by its definition information. MATERIAL AND METHOD: We investigate how to use entity definition information in the following two methods: (1) SQuAD-style machine reading comprehension (MRC) methods that treat entity definition information as the query and biomedical text as the context, and predict answer spans as entities; and (2) span-level one-pass (SOne) methods that predict entity spans one type at a time and introduce entity type meaning, which is represented by entity definition information. All models are trained and tested on the PharmaCoNER 2019 corpus, and their performance is evaluated by strict micro-averaged precision, recall, and F1-score. RESULTS: Entity definition information improves both the SQuAD-style MRC and SOne methods by about 0.003 in micro-averaged F1-score. The SQuAD-style MRC model using entity definition information as the query achieves the best performance, with a micro-averaged precision of 0.9225, a recall of 0.9050, and an F1-score of 0.9137. It outperforms the best model of the PharmaCoNER 2019 challenge by 0.0032 in F1-score. Compared with the state-of-the-art model that does not use manually crafted features, our model obtains a 1% improvement in F1-score, which is significant. These results indicate that entity definition information is useful for deep learning methods on biomedical NER. CONCLUSION: Our models enhanced with entity definition information achieve a state-of-the-art micro-averaged F1-score of 0.9137, which implies that entity definition information has a positive impact on biomedical NER. In the future, we will explore more entity definition information from knowledge graphs.
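The strict micro-averaged metrics named in the abstract can be illustrated with a minimal sketch. It assumes that an entity is a (start, end, type) triple and that a prediction counts as correct only when all three match a gold entity exactly; the function names, the toy offsets, and the labels "PROTEIN" and "CHEMICAL" are hypothetical and are not taken from the paper or from the PharmaCoNER evaluation script.

```python
from typing import Dict, Set, Tuple

# Assumption: an entity is (start_offset, end_offset, entity_type).
Entity = Tuple[int, int, str]


def strict_micro_prf(
    gold: Dict[str, Set[Entity]], pred: Dict[str, Set[Entity]]
) -> Tuple[float, float, float]:
    """Pool counts over all documents, then compute precision, recall, F1."""
    tp = fp = fn = 0
    for doc_id in set(gold) | set(pred):
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)  # exact boundary and type match
        fp += len(p - g)  # predicted, but not in gold
        fn += len(g - p)  # in gold, but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, harmonic_f1(precision, recall)


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy data (hypothetical): one prediction is off by one character at the end.
gold = {"doc1": {(0, 9, "PROTEIN"), (15, 25, "CHEMICAL")}, "doc2": {(3, 8, "PROTEIN")}}
pred = {"doc1": {(0, 9, "PROTEIN"), (15, 24, "CHEMICAL")}, "doc2": {(3, 8, "PROTEIN")}}
print(strict_micro_prf(gold, pred))

# Consistency check on the figures reported in the abstract.
print(round(harmonic_f1(0.9225, 0.9050), 4))
```

Under strict matching the off-by-one span counts as both a false positive and a false negative, which is why boundary errors are penalized twice. The last line recomputes F1 from the reported precision and recall as a sanity check on the abstract's numbers.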
format Online
Article
Text
id pubmed-8680061
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8680061 2021-12-17 Improving deep learning method for biomedical named entity recognition by using entity definition information Xiong, Ying Chen, Shuai Tang, Buzhou Chen, Qingcai Wang, Xiaolong Yan, Jun Zhou, Yi BMC Bioinformatics Research BioMed Central 2021-12-17 /pmc/articles/PMC8680061/ /pubmed/34920699 http://dx.doi.org/10.1186/s12859-021-04236-y Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Improving deep learning method for biomedical named entity recognition by using entity definition information
topic Research