BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework
BACKGROUND: Automatic and accurate recognition of various biomedical named entities from literature is an important task of biomedical text mining, which is the foundation of extracting biomedical knowledge from unstructured texts into structured formats. Using the sequence labeling framework and de...
Main authors: | Zheng, Xiangwen; Du, Haijian; Luo, Xiaowei; Tong, Fan; Song, Wei; Zhao, Dongsheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9682683/ https://www.ncbi.nlm.nih.gov/pubmed/36418937 http://dx.doi.org/10.1186/s12859-022-05051-9 |
_version_ | 1784834904742166528 |
---|---|
author | Zheng, Xiangwen; Du, Haijian; Luo, Xiaowei; Tong, Fan; Song, Wei; Zhao, Dongsheng |
author_facet | Zheng, Xiangwen; Du, Haijian; Luo, Xiaowei; Tong, Fan; Song, Wei; Zhao, Dongsheng |
author_sort | Zheng, Xiangwen |
collection | PubMed |
description | BACKGROUND: Automatic and accurate recognition of biomedical named entities in the literature is an important task in biomedical text mining and the foundation for extracting biomedical knowledge from unstructured text into structured formats. Biomedical named entity recognition (BioNER) is currently most often implemented with a sequence labeling framework and deep neural networks. However, this approach often underutilizes syntactic features such as the dependencies and topology of sentences. Integrating semantic and syntactic features into the BioNER model is therefore a pressing problem. RESULTS: In this paper, we propose a novel biomedical named entity recognition model, BioByGANS (BioBERT/SpaCy-Graph Attention Network-Softmax), which uses a graph to model the dependencies and topology of a sentence and formulates the BioNER task as a node classification problem. This formulation introduces more topological features of language and is no longer concerned only with the distance between words in the sequence. First, we use periods to segment sentences, and spaces and symbols to segment words. Second, contextual features are encoded by BioBERT, and syntactic features such as parts of speech, dependencies and topology are preprocessed by SpaCy. A graph attention network is then used to generate a fused representation that considers both the contextual and the syntactic features. Last, a softmax function is used to calculate the probabilities and obtain the results. We conduct experiments on 8 benchmark datasets, and our proposed model outperforms existing state-of-the-art BioNER methods on the BC2GM, JNLPBA, BC4CHEMD, BC5CDR-chem, BC5CDR-disease, NCBI-disease, Species-800, and LINNAEUS datasets, achieving F1-scores of 85.15%, 78.16%, 92.97%, 94.74%, 87.74%, 91.57%, 75.01%, and 90.99%, respectively. CONCLUSION: The experimental results on 8 biomedical benchmark datasets demonstrate the effectiveness of our model and indicate that formulating the BioNER task as a node classification problem and incorporating syntactic features into the graph attention network can significantly improve model performance. |
format | Online Article Text |
id | pubmed-9682683 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9682683 2022-11-24 BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework Zheng, Xiangwen; Du, Haijian; Luo, Xiaowei; Tong, Fan; Song, Wei; Zhao, Dongsheng BMC Bioinformatics Research BACKGROUND: Automatic and accurate recognition of biomedical named entities in the literature is an important task in biomedical text mining and the foundation for extracting biomedical knowledge from unstructured text into structured formats. Biomedical named entity recognition (BioNER) is currently most often implemented with a sequence labeling framework and deep neural networks. However, this approach often underutilizes syntactic features such as the dependencies and topology of sentences. Integrating semantic and syntactic features into the BioNER model is therefore a pressing problem. RESULTS: In this paper, we propose a novel biomedical named entity recognition model, BioByGANS (BioBERT/SpaCy-Graph Attention Network-Softmax), which uses a graph to model the dependencies and topology of a sentence and formulates the BioNER task as a node classification problem. This formulation introduces more topological features of language and is no longer concerned only with the distance between words in the sequence. First, we use periods to segment sentences, and spaces and symbols to segment words. Second, contextual features are encoded by BioBERT, and syntactic features such as parts of speech, dependencies and topology are preprocessed by SpaCy. A graph attention network is then used to generate a fused representation that considers both the contextual and the syntactic features. Last, a softmax function is used to calculate the probabilities and obtain the results. We conduct experiments on 8 benchmark datasets, and our proposed model outperforms existing state-of-the-art BioNER methods on the BC2GM, JNLPBA, BC4CHEMD, BC5CDR-chem, BC5CDR-disease, NCBI-disease, Species-800, and LINNAEUS datasets, achieving F1-scores of 85.15%, 78.16%, 92.97%, 94.74%, 87.74%, 91.57%, 75.01%, and 90.99%, respectively. CONCLUSION: The experimental results on 8 biomedical benchmark datasets demonstrate the effectiveness of our model and indicate that formulating the BioNER task as a node classification problem and incorporating syntactic features into the graph attention network can significantly improve model performance. BioMed Central 2022-11-22 /pmc/articles/PMC9682683/ /pubmed/36418937 http://dx.doi.org/10.1186/s12859-022-05051-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9682683/ https://www.ncbi.nlm.nih.gov/pubmed/36418937 http://dx.doi.org/10.1186/s12859-022-05051-9 |
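The pipeline summarized in the description field (segment, encode, build a dependency graph, apply graph attention, classify each node with softmax) can be illustrated with a small standard-library sketch. This is not the authors' implementation: the random vectors stand in for BioBERT embeddings, the dependency edges are written by hand where SpaCy would supply them, the weights are untrained, and all function names, dimensions, and the three-label tag set are assumptions made for illustration.

```python
import math
import random
import re


def segment(text):
    """Split on periods into sentences, then on spaces and symbols into words."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences]


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def gat_layer(features, edges, W, a):
    """Single-head graph attention: each node attends over itself and its neighbours."""
    n = len(features)
    neighbours = {i: {i} for i in range(n)}  # self-loops
    for head, dep in edges:  # treat dependency arcs as undirected
        neighbours[head].add(dep)
        neighbours[dep].add(head)
    h = [matvec(W, x) for x in features]  # shared linear projection
    out = []
    for i in range(n):
        nbrs = sorted(neighbours[i])
        # attention score e_ij = LeakyReLU(a . [h_i || h_j])
        scores = [leaky_relu(sum(p * q for p, q in zip(a, h[i] + h[j]))) for j in nbrs]
        alpha = softmax(scores)  # normalize over the neighbourhood
        out.append([sum(w * h[j][k] for w, j in zip(alpha, nbrs)) for k in range(len(h[i]))])
    return out


random.seed(0)
tokens = segment("BRCA1 mutations increase cancer risk.")[0]
# tokens == ['BRCA1', 'mutations', 'increase', 'cancer', 'risk']

# Hypothetical (head, dependent) arcs; a dependency parser would produce these.
edges = [(2, 1), (1, 0), (2, 4), (4, 3)]

in_dim, hid_dim, n_labels = 4, 3, 3  # assumed sizes; labels could be B, I, O
features = rand_matrix(len(tokens), in_dim)  # placeholder contextual embeddings
W = rand_matrix(hid_dim, in_dim)
a = [random.uniform(-1, 1) for _ in range(2 * hid_dim)]
W_out = rand_matrix(n_labels, hid_dim)

fused = gat_layer(features, edges, W, a)
probs = [softmax(matvec(W_out, node)) for node in fused]  # one distribution per token
for tok, p in zip(tokens, probs):
    print(tok, [round(x, 3) for x in p])
```

Because each token is a node, a prediction depends on the token's syntactic neighbours rather than on its sequence-adjacent words alone, which is the point of the node classification formulation. Splitting on every period is a simplification that would mis-segment abbreviations and decimal numbers.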