
A Multigranularity Text Driven Named Entity Recognition CGAN Model for Traditional Chinese Medicine Literatures

Recognizing Traditional Chinese Medicine (TCM) entities in different types of literature is a challenging research problem and the foundation for extracting the large amount of TCM knowledge in unstructured texts into structured formats. The lack of large-scale annotated data makes unsatisfa...


Bibliographic Details
Main Authors: Ma, Yuekun; Liu, Yun; Zhang, Dezheng; Zhang, Jiye; Liu, He; Xie, Yonghong
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9553443/
https://www.ncbi.nlm.nih.gov/pubmed/36248956
http://dx.doi.org/10.1155/2022/1495841
author Ma, Yuekun
Liu, Yun
Zhang, Dezheng
Zhang, Jiye
Liu, He
Xie, Yonghong
collection PubMed
description Recognizing Traditional Chinese Medicine (TCM) entities in different types of literature is a challenging research problem and the foundation for extracting the large amount of TCM knowledge in unstructured texts into structured formats. Because large-scale annotated data are lacking, conventional deep learning models perform unsatisfactorily in TCM text knowledge extraction. Some unsupervised methods rely on auxiliary data, such as domain dictionaries. We propose a multigranularity text-driven NER model based on a Conditional Generation Adversarial Network (MT-CGAN) to implement TCM NER with a small-scale annotated corpus. In the model, a multigranularity text features encoder (MTFE) is designed to extract rich semantic and grammatical information from multiple dimensions of TCM texts. By differentiating the conditional constraints of the generator and discriminator of MT-CGAN, the synchronization between the generated tag labels and the named entities is guaranteed. Furthermore, seeds of different TCM text types are introduced into our model to improve the precision of NER. We compare our method with baseline methods on 4 kinds of gold-standard datasets to illustrate its effectiveness. The experimental results show that the standard precision, recall, and F1 score of our method are higher than those of the state-of-the-art methods by 0.24∼8.97%, 0.89∼12.74%, and 0.01∼10.84%, respectively. MT-CGAN is able to extract entities from different types of TCM literature effectively. Our experimental results indicate that the proposed approach has a clear advantage in processing TCM texts with more entity types, higher sparsity, less regular features, and a small-scale corpus.
format Online
Article
Text
id pubmed-9553443
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-9553443 2022-10-13 Comput Intell Neurosci Research Article
Hindawi 2022-09-24 /pmc/articles/PMC9553443/ /pubmed/36248956 http://dx.doi.org/10.1155/2022/1495841 Text en Copyright © 2022 Yuekun Ma et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Multigranularity Text Driven Named Entity Recognition CGAN Model for Traditional Chinese Medicine Literatures
topic Research Article