Phenonizer: A Fine-Grained Phenotypic Named Entity Recognizer for Chinese Clinical Texts

Biomedical named entity recognition (BioNER) from clinical texts is a fundamental task for clinical data analysis because large volumes of electronic medical record data, mostly in free-text format, are available in real-world clinical settings. Clinical text data incorporates signific...

Bibliographic Details
Main authors: Zou, Qunsheng; Yang, Kuo; Shu, Zixin; Chang, Kai; Zheng, Qiguang; Zheng, Yi; Lu, Kezhi; Xu, Ning; Tian, Haoyu; Li, Xiaomeng; Yang, Yuxia; Zhou, Yana; Yu, Haibin; Zhang, Xiaoping; Xia, Jianan; Zhu, Qiang; Poon, Josiah; Poon, Simon; Zhang, Runshun; Li, Xiaodong; Zhou, Xuezhong
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8941495/
https://www.ncbi.nlm.nih.gov/pubmed/35342762
http://dx.doi.org/10.1155/2022/3524090
author Zou, Qunsheng
Yang, Kuo
Shu, Zixin
Chang, Kai
Zheng, Qiguang
Zheng, Yi
Lu, Kezhi
Xu, Ning
Tian, Haoyu
Li, Xiaomeng
Yang, Yuxia
Zhou, Yana
Yu, Haibin
Zhang, Xiaoping
Xia, Jianan
Zhu, Qiang
Poon, Josiah
Poon, Simon
Zhang, Runshun
Li, Xiaodong
Zhou, Xuezhong
author_sort Zou, Qunsheng
collection PubMed
description Biomedical named entity recognition (BioNER) from clinical texts is a fundamental task for clinical data analysis because large volumes of electronic medical record data, mostly in free-text format, are available in real-world clinical settings. Clinical text contains important phenotypic medical entities (e.g., symptoms, diseases, and laboratory indexes), which can be used to profile the clinical characteristics of patients with specific disease conditions (e.g., Coronavirus Disease 2019 (COVID-19)). However, general BioNER approaches mostly rely on coarse-grained annotations of phenotypic entities in benchmark text datasets. Because clinical texts contain numerous negated expressions of phenotypic entities (e.g., “no fever,” “no cough,” and “no hypertension”), such annotations cannot supply the subsequent data analysis process with well-prepared structured clinical data. In this paper, we developed the Human-machine Cooperative Phenotypic Spectrum Annotation System (HCPSAS, http://www.tcmai.org/login) and constructed a fine-grained Chinese clinical corpus. We then proposed a phenotypic named entity recognizer, Phenonizer, which uses BERT to capture character-level global contextual representations, extracts local contextual features with a bidirectional long short-term memory network, and finally obtains the optimal label sequence through a conditional random field. On the COVID-19 dataset, Phenonizer achieves an F1-score of 0.896, outperforming methods based on Word2Vec. A comparison of character embeddings trained on different data shows that embeddings trained on clinical corpora improve the F1-score by 0.0103. In addition, we evaluated Phenonizer on datasets of two granularities and found that the fine-grained dataset raises the methods' F1-scores slightly, by about 0.005. Furthermore, the fine-grained dataset enables methods to distinguish between negated symptoms and present symptoms. Finally, we tested the generalization performance of Phenonizer, where it achieved an F1-score of 0.8389. In summary, together with the fine-grained annotated benchmark dataset, Phenonizer offers a feasible approach to extracting symptom information from Chinese clinical texts with acceptable performance.
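The distinction between coarse-grained and fine-grained annotation described in the abstract can be illustrated with a minimal sketch. It assumes a character-level BIO tagging scheme with two hypothetical entity types, `SYM` (present symptom) and `NEG_SYM` (negated symptom); these label names, the example sentence, and the helper functions are illustrative assumptions, not the tag set or code used by the authors. The sketch extracts entity spans from tag sequences and computes an exact-match entity-level F1-score, first with fine-grained labels and then after collapsing them to a coarse scheme.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect (start, end, type) spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match entity-level F1: a span counts only if boundaries and type agree."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def to_coarse(tags: List[str]) -> List[str]:
    """Collapse the hypothetical NEG_SYM type into SYM (coarse-grained view)."""
    return [t.replace("NEG_SYM", "SYM") for t in tags]


# Hypothetical example: "无发热，咳嗽" ("no fever, cough"), one tag per character.
chars = ["无", "发", "热", "，", "咳", "嗽"]
gold_tags = ["O", "B-NEG_SYM", "I-NEG_SYM", "O", "B-SYM", "I-SYM"]
# A prediction that finds both spans but misses the negation on "发热".
pred_tags = ["O", "B-SYM", "I-SYM", "O", "B-SYM", "I-SYM"]

fine_f1 = entity_f1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
coarse_f1 = entity_f1(bio_to_spans(to_coarse(gold_tags)),
                      bio_to_spans(to_coarse(pred_tags)))
print(f"fine-grained F1: {fine_f1:.2f}, coarse-grained F1: {coarse_f1:.2f}")
```

In this toy case the coarse-grained evaluation cannot see that "fever" was wrongly reported as present, whereas the fine-grained labels expose the error. Whether the negation word itself belongs inside the annotated span is a corpus design choice that this sketch does not try to reproduce.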
format Online
Article
Text
id pubmed-8941495
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-8941495 2022-03-24 Phenonizer: A Fine-Grained Phenotypic Named Entity Recognizer for Chinese Clinical Texts. Biomed Res Int. Research Article. Hindawi 2022-03-23 /pmc/articles/PMC8941495/ /pubmed/35342762 http://dx.doi.org/10.1155/2022/3524090 Text en Copyright © 2022 Qunsheng Zou et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Phenonizer: A Fine-Grained Phenotypic Named Entity Recognizer for Chinese Clinical Texts
topic Research Article