Cargando…

TCMNER and PubMed: A Novel Chinese Character-Level-Based Model and a Dataset for TCM Named Entity Recognition

Intelligent traditional Chinese medicine (TCM) has become a popular research field by means of prospering of deep learning technology. Important achievements have been made in such representative tasks as automatic diagnosis of TCM syndromes and diseases and generation of TCM herbal prescriptions. H...

Descripción completa

Detalles Bibliográficos
Autores principales: Liu, Zhi, Luo, Changyong, Zheng, Zeyu, Li, Yan, Fu, Dianzheng, Yu, Xinzhu, Zhao, Jiawei
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Hindawi 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369169/
https://www.ncbi.nlm.nih.gov/pubmed/34413968
http://dx.doi.org/10.1155/2021/3544281
Descripción
Sumario:Intelligent traditional Chinese medicine (TCM) has become a popular research field by means of prospering of deep learning technology. Important achievements have been made in such representative tasks as automatic diagnosis of TCM syndromes and diseases and generation of TCM herbal prescriptions. However, one unavoidable issue that still hinders its progress is the lack of labeled samples, i.e., the TCM medical records. As an efficient tool, the named entity recognition (NER) models trained on various TCM resources can effectively alleviate this problem and continuously increase the labeled TCM samples. In this work, on the basis of in-depth analysis, we argue that the performance of the TCM named entity recognition model can be better by using the character-level representation and tagging and propose a novel word-character integrated self-attention module. With the help of TCM doctors and experts, we define 5 classes of TCM named entities and construct a comprehensive NER dataset containing the standard content of the publications and the clinical medical records. The experimental results on this dataset demonstrate the effectiveness of the proposed module.