Improving biomedical named entity recognition with syntactic information

BACKGROUND: Biomedical named entity recognition (BioNER) is an important task for understanding biomedical texts, which can be challenging due to the lack of large-scale labeled training data and domain knowledge. To address the challenge, in addition to using powerful encoders (e.g., biLSTM and Bio...

Bibliographic Details
Main Authors:	Tian, Yuanhe; Shen, Wang; Song, Yan; Xia, Fei; He, Min; Li, Kenli
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7687711/ https://www.ncbi.nlm.nih.gov/pubmed/33238875 http://dx.doi.org/10.1186/s12859-020-03834-6

_version_	1783613580429492224
author	Tian, Yuanhe Shen, Wang Song, Yan Xia, Fei He, Min Li, Kenli
author_facet	Tian, Yuanhe Shen, Wang Song, Yan Xia, Fei He, Min Li, Kenli
author_sort	Tian, Yuanhe
collection	PubMed
description	BACKGROUND: Biomedical named entity recognition (BioNER) is an important task for understanding biomedical texts, which can be challenging due to the lack of large-scale labeled training data and domain knowledge. To address the challenge, in addition to using powerful encoders (e.g., biLSTM and BioBERT), one possible method is to leverage extra knowledge that is easy to obtain. Previous studies have shown that auto-processed syntactic information can be a useful resource to improve model performance, but their approaches are limited to directly concatenating the embeddings of syntactic information to the input word embeddings. Therefore, such syntactic information is leveraged in an inflexible way, where inaccurate information may hurt model performance. RESULTS: In this paper, we propose BioKMNER, a BioNER model for biomedical texts with key-value memory networks (KVMN) to incorporate auto-processed syntactic information. We evaluate BioKMNER on six English biomedical datasets, where our method with KVMN outperforms the strong baseline method, namely, BioBERT, from the previous study on all datasets. Specifically, the F1 scores of our best performing model are 85.29% on BC2GM, 77.83% on JNLPBA, 94.22% on BC5CDR-chemical, 90.08% on NCBI-disease, 89.24% on LINNAEUS, and 76.33% on Species-800, where state-of-the-art performance is obtained on four of them (i.e., BC2GM, BC5CDR-chemical, NCBI-disease, and Species-800). CONCLUSION: The experimental results on six English benchmark datasets demonstrate that auto-processed syntactic information can be a useful resource for BioNER and our method with KVMN can appropriately leverage such information to improve model performance.
format	Online Article Text
id	pubmed-7687711
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7687711 2020-11-30 Improving biomedical named entity recognition with syntactic information Tian, Yuanhe Shen, Wang Song, Yan Xia, Fei He, Min Li, Kenli BMC Bioinformatics Research Article
BioMed Central 2020-11-25 /pmc/articles/PMC7687711/ /pubmed/33238875 http://dx.doi.org/10.1186/s12859-020-03834-6 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Article Tian, Yuanhe Shen, Wang Song, Yan Xia, Fei He, Min Li, Kenli Improving biomedical named entity recognition with syntactic information
title	Improving biomedical named entity recognition with syntactic information
title_full	Improving biomedical named entity recognition with syntactic information
title_fullStr	Improving biomedical named entity recognition with syntactic information
title_full_unstemmed	Improving biomedical named entity recognition with syntactic information
title_short	Improving biomedical named entity recognition with syntactic information
title_sort	improving biomedical named entity recognition with syntactic information
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7687711/ https://www.ncbi.nlm.nih.gov/pubmed/33238875 http://dx.doi.org/10.1186/s12859-020-03834-6
work_keys_str_mv	AT tianyuanhe improvingbiomedicalnamedentityrecognitionwithsyntacticinformation AT shenwang improvingbiomedicalnamedentityrecognitionwithsyntacticinformation AT songyan improvingbiomedicalnamedentityrecognitionwithsyntacticinformation AT xiafei improvingbiomedicalnamedentityrecognitionwithsyntacticinformation AT hemin improvingbiomedicalnamedentityrecognitionwithsyntacticinformation AT likenli improvingbiomedicalnamedentityrecognitionwithsyntacticinformation

Similar Items