
Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records

OBJECTIVE: Pituitary adenomas are the most common type of pituitary disorders, which usually occur in young adults and often affect the patient’s physical development, labor capacity and fertility. Clinical free texts noted in electronic medical records (EMRs) of pituitary adenomas patients contain...


Bibliographic Details
Main authors: Fang, An; Hu, Jiahui; Zhao, Wanqing; Feng, Ming; Fu, Ji; Feng, Shanshan; Lou, Pei; Ren, Huiling; Chen, Xianlai
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8941801/
https://www.ncbi.nlm.nih.gov/pubmed/35321705
http://dx.doi.org/10.1186/s12911-022-01810-z
author Fang, An
Hu, Jiahui
Zhao, Wanqing
Feng, Ming
Fu, Ji
Feng, Shanshan
Lou, Pei
Ren, Huiling
Chen, Xianlai
collection PubMed
description OBJECTIVE: Pituitary adenomas are the most common type of pituitary disorder. They usually occur in young adults and often affect the patient’s physical development, labor capacity and fertility. Clinical free text in the electronic medical records (EMRs) of pituitary adenoma patients contains abundant diagnosis and treatment information. However, this information has not been well utilized because extracting information from unstructured clinical text is challenging. This study aims to enable machines to process clinical information intelligently and to extract clinical named entities for pituitary adenomas from Chinese EMRs automatically.
METHODS: The clinical corpus used in this study came from a pituitary adenoma neurosurgery treatment center at a 3A hospital in China. Four types of fine-grained clinical record text were selected: notes on present illness, past medical history, case characteristics and family history of 500 pituitary adenoma inpatients. Dictionary-based matching, conditional random fields (CRF), bidirectional long short-term memory with CRF (BiLSTM-CRF), and bidirectional encoder representations from transformers with BiLSTM-CRF (BERT-BiLSTM-CRF) were used to extract clinical entities from the Chinese EMR corpus. For the dictionary-based matching method, a comprehensive dictionary was constructed from open source vocabularies and a domain dictionary for pituitary adenomas. We selected features such as part of speech, radical, document type, and character position to train the CRF-based model. Random character embeddings and character embeddings pretrained by BERT were used as the input features for the BiLSTM-CRF model and the BERT-BiLSTM-CRF model, respectively. Both a strict metric and a relaxed metric were used to evaluate the performance of these methods.
RESULTS: Experimental results demonstrated that the deep learning and other machine learning methods were able to automatically extract clinical named entities, including symptoms, body regions, diseases, family histories, surgeries, medications, and disease courses of pituitary adenomas, from Chinese EMRs. In overall performance, BERT-BiLSTM-CRF had the highest strict F1 value (91.27%) and the highest relaxed F1 value (95.57%). Additional evaluations showed that BERT-BiLSTM-CRF performed best for almost all entity types except surgery and disease course. BiLSTM-CRF performed best in disease course entity recognition, on par with the CRF model using part of speech, radical and document type features, with both strict and relaxed F1 values reaching 96.48%. The CRF model with part of speech, radical and document type features performed best in surgery entity recognition, with a relaxed F1 value of 95.29%.
CONCLUSIONS: In this study, we applied four entity recognition methods to Chinese EMRs of pituitary adenomas. The results demonstrate that deep learning methods can effectively extract various types of clinical entities with satisfactory performance. This study contributes to clinical named entity extraction from Chinese neurosurgical EMRs. The findings could also assist information extraction from other Chinese medical texts.
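The abstract reports both strict and relaxed F1 values but does not define the two metrics. The sketch below illustrates one common convention and is an assumption, not the authors' evaluation code: a strict match requires identical boundaries and entity type, while a relaxed match accepts any overlap between spans of the same type. The tag names (`SYM`, `DIS`) and the helper functions `bio_to_spans`, `overlaps` and `prf` are hypothetical, introduced only for illustration.

```python
from typing import List, Tuple

# An entity span: (start index, end index exclusive, entity type).
Entity = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Entity]:
    """Convert a character-level BIO tag sequence into entity spans.

    Assumption: a stray I- tag that does not continue an open entity of the
    same type closes any open entity and is otherwise ignored.
    """
    spans: List[Entity] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def overlaps(a: Entity, b: Entity) -> bool:
    """True if the spans share a type and at least one character position."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def prf(gold: List[Entity], pred: List[Entity], relaxed: bool = False):
    """Return (precision, recall, F1) under strict or relaxed matching."""
    if relaxed:
        tp_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
        tp_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    else:
        tp_pred = tp_gold = len(set(gold) & set(pred))
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the predicted disease entity misses its first character.
gold_tags = ["B-SYM", "I-SYM", "O", "B-DIS", "I-DIS", "I-DIS"]
pred_tags = ["B-SYM", "I-SYM", "O", "O", "B-DIS", "I-DIS"]
gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
strict_scores = prf(gold_spans, pred_spans)
relaxed_scores = prf(gold_spans, pred_spans, relaxed=True)
```

In this toy example the boundary error counts against the strict score but not the relaxed one, which is the usual reason relaxed F1 values are higher than strict ones.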
format Online
Article
Text
id pubmed-8941801
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8941801 2022-03-24 Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records Fang, An; Hu, Jiahui; Zhao, Wanqing; Feng, Ming; Fu, Ji; Feng, Shanshan; Lou, Pei; Ren, Huiling; Chen, Xianlai BMC Med Inform Decis Mak Research
BioMed Central 2022-03-23 /pmc/articles/PMC8941801/ /pubmed/35321705 http://dx.doi.org/10.1186/s12911-022-01810-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records
topic Research