
Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes

BACKGROUND: Automated biomedical named entity recognition and normalization serves as the basis for many downstream applications in information management. However, this task is challenging due to name variations and entity ambiguity. A biomedical entity may have multiple variants and a variant coul...


Bibliographic Details
Main authors: Zhou, Huiwei, Ning, Shixian, Liu, Zhe, Lang, Chengkun, Liu, Zhuang, Lei, Bizun
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6990512/
https://www.ncbi.nlm.nih.gov/pubmed/32000677
http://dx.doi.org/10.1186/s12859-020-3375-3
author Zhou, Huiwei
Ning, Shixian
Liu, Zhe
Lang, Chengkun
Liu, Zhuang
Lei, Bizun
collection PubMed
description BACKGROUND: Automated biomedical named entity recognition and normalization serves as the basis for many downstream applications in information management. However, this task is challenging due to name variations and entity ambiguity. A biomedical entity may have multiple variants and a variant could denote several different entity identifiers. RESULTS: To remedy the above issues, we present a novel knowledge-enhanced system for protein/gene named entity recognition (PNER) and normalization (PNEN). On one hand, a large amount of entity name knowledge extracted from biomedical knowledge bases is used to recognize more entity variants. On the other hand, structural knowledge of entities is extracted and encoded as identifier (ID) embeddings, which are then used for better entity normalization. Moreover, deep contextualized word representations generated by pre-trained language models are also incorporated into our knowledge-enhanced system for modeling multi-sense information of entities. Experimental results on the BioCreative VI Bio-ID corpus show that our proposed knowledge-enhanced system achieves 0.871 F1-score for PNER and 0.445 F1-score for PNEN, respectively, leading to a new state-of-the-art performance. CONCLUSIONS: We propose a knowledge-enhanced system that combines both entity knowledge and deep contextualized word representations. Comparison results show that entity knowledge is beneficial to the PNER and PNEN task and can be well combined with contextualized information in our system for further improvement.
format Online
Article
Text
id pubmed-6990512
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6990512 2020-02-03
BMC Bioinformatics
Research Article
BioMed Central 2020-01-30
/pmc/articles/PMC6990512/ /pubmed/32000677 http://dx.doi.org/10.1186/s12859-020-3375-3
Text en
© The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes
topic Research Article