
Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods

BACKGROUND: Electronic health records (EHRs) are important data resources for clinical studies and applications. Physicians or clinicians describe patients’ disorders or treatment procedures in EHRs using free text (unstructured) clinical notes. The narrative information plays an important role in p...


Bibliographic Details
Main authors:	Zhang, Yu, Wang, Xuwen, Hou, Zhen, Li, Jiao
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2018
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6315256/ https://www.ncbi.nlm.nih.gov/pubmed/30559093 http://dx.doi.org/10.2196/medinform.9965

_version_	1783384251569274880
author	Zhang, Yu Wang, Xuwen Hou, Zhen Li, Jiao
author_facet	Zhang, Yu Wang, Xuwen Hou, Zhen Li, Jiao
author_sort	Zhang, Yu
collection	PubMed
description	BACKGROUND: Electronic health records (EHRs) are important data resources for clinical studies and applications. Physicians or clinicians describe patients’ disorders or treatment procedures in EHRs using free text (unstructured) clinical notes. The narrative information plays an important role in patient treatment and clinical research. However, it is challenging to make machines understand the clinical narratives. OBJECTIVE: This study aimed to automatically identify Chinese clinical entities from free text in EHRs and make machines semantically understand diagnoses, tests, body parts, symptoms, treatments, and so on. METHODS: The dataset we used for this study is the benchmark dataset with human annotated Chinese EHRs, released by the China Conference on Knowledge Graph and Semantic Computing 2017 clinical named entity recognition challenge task. Overall, 2 machine learning models, the conditional random fields (CRF) method and bidirectional long short-term memory (LSTM)-CRF, were applied to recognize clinical entities from Chinese EHR data. To train the CRF–based model, we selected features such as bag of Chinese characters, part-of-speech tags, character types, and the position of characters. For the bidirectional LSTM-CRF–based model, character embeddings and segmentation information were used as features. In addition, we also employed a dictionary-based approach as the baseline for the purpose of performance evaluation. Precision, recall, and the harmonic average of precision and recall (F1 score) were used to evaluate the performance of the methods. RESULTS: Experiments on the test set showed that our methods were able to automatically identify types of Chinese clinical entities such as diagnosis, test, symptom, body part, and treatment simultaneously. With regard to overall performance, CRF and bidirectional LSTM-CRF achieved a precision of 0.9203 and 0.9112, recall of 0.8709 and 0.8974, and F1 score of 0.8949 and 0.9043, respectively. 
The results also indicated that our methods performed well in recognizing each type of clinical entity, in which the “symptom” type achieved the best F1 score of over 0.96. Moreover, as the number of features increased, the F1 score of the CRF model increased from 0.8547 to 0.8949. CONCLUSIONS: In this study, we employed two computational methods to simultaneously identify types of Chinese clinical entities from free text in EHRs. With training, these methods can effectively identify various types of clinical entities (eg, symptom and treatment) with high accuracy. The deep learning model, bidirectional LSTM-CRF, can achieve better performance than the CRF model with little feature engineering. This study contributed to translating human-readable health information into machine-readable information.
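The description above defines the F1 score as the harmonic average of precision and recall. A minimal sketch of that calculation follows, using the overall precision and recall figures quoted in the abstract. The function name `f1_score` and the `reported` dictionary are illustrative assumptions, not part of the original study's code. Because the published inputs are rounded to four decimal places, the recomputed values can differ from the reported F1 scores in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall (precision, recall, reported F1) as quoted in the abstract.
# The dictionary keys are labels chosen for this sketch.
reported = {
    "CRF": (0.9203, 0.8709, 0.8949),
    "BiLSTM-CRF": (0.9112, 0.8974, 0.9043),
}

for model, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{model}: recomputed F1 = {f1:.4f}, reported F1 = {f1_reported:.4f}")
```

The harmonic mean penalizes an imbalance between the two measures, which is why the CRF model, with higher precision but lower recall, ends up with a lower F1 score than the bidirectional LSTM-CRF model.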
format	Online Article Text
id	pubmed-6315256
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-63152562019-01-28 Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods Zhang, Yu Wang, Xuwen Hou, Zhen Li, Jiao JMIR Med Inform Original Paper BACKGROUND: Electronic health records (EHRs) are important data resources for clinical studies and applications. Physicians or clinicians describe patients’ disorders or treatment procedures in EHRs using free text (unstructured) clinical notes. The narrative information plays an important role in patient treatment and clinical research. However, it is challenging to make machines understand the clinical narratives. OBJECTIVE: This study aimed to automatically identify Chinese clinical entities from free text in EHRs and make machines semantically understand diagnoses, tests, body parts, symptoms, treatments, and so on. METHODS: The dataset we used for this study is the benchmark dataset with human annotated Chinese EHRs, released by the China Conference on Knowledge Graph and Semantic Computing 2017 clinical named entity recognition challenge task. Overall, 2 machine learning models, the conditional random fields (CRF) method and bidirectional long short-term memory (LSTM)-CRF, were applied to recognize clinical entities from Chinese EHR data. To train the CRF–based model, we selected features such as bag of Chinese characters, part-of-speech tags, character types, and the position of characters. For the bidirectional LSTM-CRF–based model, character embeddings and segmentation information were used as features. In addition, we also employed a dictionary-based approach as the baseline for the purpose of performance evaluation. Precision, recall, and the harmonic average of precision and recall (F1 score) were used to evaluate the performance of the methods. RESULTS: Experiments on the test set showed that our methods were able to automatically identify types of Chinese clinical entities such as diagnosis, test, symptom, body part, and treatment simultaneously. 
With regard to overall performance, CRF and bidirectional LSTM-CRF achieved a precision of 0.9203 and 0.9112, recall of 0.8709 and 0.8974, and F1 score of 0.8949 and 0.9043, respectively. The results also indicated that our methods performed well in recognizing each type of clinical entity, in which the “symptom” type achieved the best F1 score of over 0.96. Moreover, as the number of features increased, the F1 score of the CRF model increased from 0.8547 to 0.8949. CONCLUSIONS: In this study, we employed two computational methods to simultaneously identify types of Chinese clinical entities from free text in EHRs. With training, these methods can effectively identify various types of clinical entities (eg, symptom and treatment) with high accuracy. The deep learning model, bidirectional LSTM-CRF, can achieve better performance than the CRF model with little feature engineering. This study contributed to translating human-readable health information into machine-readable information. JMIR Publications 2018-12-17 /pmc/articles/PMC6315256/ /pubmed/30559093 http://dx.doi.org/10.2196/medinform.9965 Text en ©Yu Zhang, Xuwen Wang, Zhen Hou, Jiao Li. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 17.12.2018. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
spellingShingle	Original Paper Zhang, Yu Wang, Xuwen Hou, Zhen Li, Jiao Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods
title	Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods
title_sort	clinical named entity recognition from chinese electronic health records via machine learning methods
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6315256/ https://www.ncbi.nlm.nih.gov/pubmed/30559093 http://dx.doi.org/10.2196/medinform.9965
work_keys_str_mv	AT zhangyu clinicalnamedentityrecognitionfromchineseelectronichealthrecordsviamachinelearningmethods AT wangxuwen clinicalnamedentityrecognitionfromchineseelectronichealthrecordsviamachinelearningmethods AT houzhen clinicalnamedentityrecognitionfromchineseelectronichealthrecordsviamachinelearningmethods AT lijiao clinicalnamedentityrecognitionfromchineseelectronichealthrecordsviamachinelearningmethods


Similar Items