Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes
Main authors: | Zhou, Huiwei; Ning, Shixian; Liu, Zhe; Lang, Chengkun; Liu, Zhuang; Lei, Bizun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6990512/ https://www.ncbi.nlm.nih.gov/pubmed/32000677 http://dx.doi.org/10.1186/s12859-020-3375-3 |
author | Zhou, Huiwei; Ning, Shixian; Liu, Zhe; Lang, Chengkun; Liu, Zhuang; Lei, Bizun |
---|---|
collection | PubMed |
description | BACKGROUND: Automated biomedical named entity recognition and normalization serves as the basis for many downstream applications in information management. However, this task is challenging due to name variations and entity ambiguity: a biomedical entity may have multiple name variants, and a single variant may denote several different entity identifiers. RESULTS: To address these issues, we present a novel knowledge-enhanced system for protein/gene named entity recognition (PNER) and normalization (PNEN). On the one hand, a large amount of entity name knowledge extracted from biomedical knowledge bases is used to recognize more entity variants. On the other hand, structural knowledge of entities is extracted and encoded as identifier (ID) embeddings, which are then used to improve entity normalization. Moreover, deep contextualized word representations generated by pre-trained language models are incorporated into the knowledge-enhanced system to model the multi-sense information of entities. Experimental results on the BioCreative VI Bio-ID corpus show that the proposed knowledge-enhanced system achieves F1-scores of 0.871 for PNER and 0.445 for PNEN, a new state-of-the-art performance. CONCLUSIONS: We propose a knowledge-enhanced system that combines entity knowledge with deep contextualized word representations. Comparison results show that entity knowledge benefits both the PNER and PNEN tasks and combines well with contextualized information in our system for further improvement. |
format | Online Article Text |
id | pubmed-6990512 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6990512 2020-02-03. BMC Bioinformatics, Research Article. BioMed Central, 2020-01-30. /pmc/articles/PMC6990512/ /pubmed/32000677 http://dx.doi.org/10.1186/s12859-020-3375-3. Text en. © The Author(s) 2020. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6990512/ https://www.ncbi.nlm.nih.gov/pubmed/32000677 http://dx.doi.org/10.1186/s12859-020-3375-3 |