BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework

Bibliographic Details
Main Authors: Zheng, Xiangwen, Du, Haijian, Luo, Xiaowei, Tong, Fan, Song, Wei, Zhao, Dongsheng
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9682683/
https://www.ncbi.nlm.nih.gov/pubmed/36418937
http://dx.doi.org/10.1186/s12859-022-05051-9
author Zheng, Xiangwen
Du, Haijian
Luo, Xiaowei
Tong, Fan
Song, Wei
Zhao, Dongsheng
collection PubMed
description BACKGROUND: Automatic and accurate recognition of biomedical named entities in the literature is an important task in biomedical text mining and the foundation for extracting biomedical knowledge from unstructured text into structured formats. Implementing biomedical named entity recognition (BioNER) with a sequence labeling framework and deep neural networks is currently a common approach. However, this approach often underutilizes syntactic features such as the dependencies and topology of sentences. Integrating semantic and syntactic features into the BioNER model is therefore a pressing problem. RESULTS: In this paper, we propose a novel biomedical named entity recognition model, named BioByGANS (BioBERT/SpaCy-Graph Attention Network-Softmax), which uses a graph to model the dependencies and topology of a sentence and formulates the BioNER task as a node classification problem. This formulation introduces more topological features of language and is no longer concerned only with the distance between words in the sequence. First, we use periods to segment sentences, and spaces and symbols to segment words. Second, contextual features are encoded by BioBERT, and syntactic features such as parts of speech, dependencies and topology are preprocessed by SpaCy. A graph attention network is then used to generate a fused representation that considers both the contextual and the syntactic features. Last, a softmax function is used to calculate the probabilities and obtain the results. We conduct experiments on 8 benchmark datasets, and our proposed model outperforms existing state-of-the-art BioNER methods on the BC2GM, JNLPBA, BC4CHEMD, BC5CDR-chem, BC5CDR-disease, NCBI-disease, Species-800, and LINNAEUS datasets, achieving F1-scores of 85.15%, 78.16%, 92.97%, 94.74%, 87.74%, 91.57%, 75.01%, and 90.99%, respectively.
CONCLUSION: The experimental results on 8 biomedical benchmark datasets demonstrate the effectiveness of our model and indicate that formulating the BioNER task as a node classification problem and incorporating syntactic features into the graph attention network can significantly improve model performance.
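The pipeline summarized in the abstract (a dependency graph over tokens, graph attention over that graph, then a softmax per node) can be illustrated with a minimal sketch. This is not the authors' implementation: the random vectors stand in for BioBERT token embeddings, the `heads` list stands in for a SpaCy dependency parse, the three output classes are a hypothetical B/I/O label set, and all weights are untrained. The part-of-speech and dependency-label features mentioned in the abstract are omitted. The attention score follows the standard single-head graph attention form, LeakyReLU(a_src·Wh_i + a_dst·Wh_j), normalized over each node's neighbours.

```python
import math
import random


def dependency_adjacency(heads):
    """Symmetric adjacency with self-loops; heads[i] is the index of token i's
    syntactic head (the root points to itself)."""
    n = len(heads)
    adj = [[i == j for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        adj[child][head] = adj[head][child] = True
    return adj


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [dot(row, x) for row in W]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, adj, W, a_src, a_dst):
    """Single-head graph attention: each node aggregates only its graph
    neighbours, weighted by softmax-normalized attention scores."""
    h = [matvec(W, x) for x in features]
    out = []
    for i in range(len(h)):
        nbrs = [j for j in range(len(h)) if adj[i][j]]
        scores = [leaky_relu(dot(a_src, h[i]) + dot(a_dst, h[j])) for j in nbrs]
        alpha = softmax(scores)
        out.append([sum(a * h[j][d] for a, j in zip(alpha, nbrs))
                    for d in range(len(h[i]))])
    return out


def classify_nodes(node_repr, W_out):
    """Node classification: one probability distribution per token."""
    return [softmax(matvec(W_out, h)) for h in node_repr]


# Toy sentence; heads are hand-written here (assumption), not parser output.
tokens = ["BRCA1", "mutations", "cause", "cancer"]
heads = [1, 2, 2, 2]

rng = random.Random(0)
rand_row = lambda n: [rng.uniform(-1, 1) for _ in range(n)]
features = [rand_row(4) for _ in tokens]   # placeholder for contextual embeddings
W = [rand_row(4) for _ in range(3)]        # projects 4-dim features to 3 dims
a_src, a_dst = rand_row(3), rand_row(3)    # attention vectors
W_out = [rand_row(3) for _ in range(3)]    # 3 hypothetical labels: B, I, O

adj = dependency_adjacency(heads)
probs = classify_nodes(gat_layer(features, adj, W, a_src, a_dst), W_out)
for tok, p in zip(tokens, probs):
    print(tok, [round(x, 3) for x in p])
```

Because attention is restricted to dependency neighbours, "BRCA1" exchanges information directly with its head "mutations" but not with "cancer", which is the sense in which a graph formulation depends on syntactic structure rather than only on word distance in the sequence.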
format Online
Article
Text
id pubmed-9682683
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9682683 2022-11-24 BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework Zheng, Xiangwen Du, Haijian Luo, Xiaowei Tong, Fan Song, Wei Zhao, Dongsheng BMC Bioinformatics Research BioMed Central 2022-11-22 /pmc/articles/PMC9682683/ /pubmed/36418937 http://dx.doi.org/10.1186/s12859-022-05051-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9682683/
https://www.ncbi.nlm.nih.gov/pubmed/36418937
http://dx.doi.org/10.1186/s12859-022-05051-9