
A hybrid approach for named entity recognition in Chinese electronic medical record

BACKGROUND: With the rapid spread of electronic medical records and the arrival of the medical big data era, the application of natural language processing technology in biomedicine has become a hot research topic. METHODS: In this paper, a BiLSTM-CRF model is first applied to medical named entity reco...


Bibliographic Details
Main authors: Ji, Bin, Liu, Rui, Li, Shasha, Yu, Jie, Wu, Qingbo, Tan, Yusong, Wu, Jiaju
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454595/
https://www.ncbi.nlm.nih.gov/pubmed/30961597
http://dx.doi.org/10.1186/s12911-019-0767-2
_version_ 1783409567180259328
author Ji, Bin
Liu, Rui
Li, Shasha
Yu, Jie
Wu, Qingbo
Tan, Yusong
Wu, Jiaju
author_sort Ji, Bin
collection PubMed
description BACKGROUND: With the rapid spread of electronic medical records and the arrival of the medical big data era, the application of natural language processing technology in biomedicine has become a hot research topic. METHODS: In this paper, a BiLSTM-CRF model is first applied to medical named entity recognition on Chinese electronic medical records. Following the characteristics of Chinese electronic medical records, a low-dimensional word vector is obtained for each word, sentence by sentence. The word vectors are then fed to the BiLSTM, which automatically extracts sentence features, and the CRF performs sentence-level word tagging. Second, an attention mechanism is added between the BiLSTM and the CRF to construct an Attention-BiLSTM-CRF model, which can leverage document-level information to alleviate tagging inconsistency. In addition, this paper proposes an entity auto-correct algorithm that rectifies entities according to historical entity information. Finally, a drug dictionary and post-processing rules are built to rectify entities and further improve performance. RESULTS: The final F1 scores of the BiLSTM-CRF and Attention-BiLSTM-CRF models on the given test dataset are 90.15% and 90.82%, respectively, both higher than 89.26%, the best F1 score on the test dataset apart from ours. CONCLUSION: Our approach can be used to recognize medical named entities in Chinese electronic medical records and achieves state-of-the-art performance on the given test dataset.
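The sentence-level tagging step that the abstract attributes to the CRF is usually carried out with Viterbi decoding. The following is a minimal sketch of that general technique, not the authors' implementation: the function name viterbi_decode, the three-tag scheme (O, B, I), and all scores are hypothetical and chosen only for illustration. In a BiLSTM-CRF, the per-position emission scores would come from the BiLSTM and the transition scores would be learned by the CRF.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][j]   : score of tag j at position t (hypothetical values here).
    transitions[i][j] : score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[j] = score of the best path that ends in tag j at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_tags):
            # Best previous tag for arriving at tag j
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Illustrative scores only: tags are 0 = O, 1 = B, 2 = I.
TAGS = ["O", "B", "I"]
transitions = [
    [0.0, 0.0, -100.0],  # O -> I is heavily penalised (an entity cannot start with I)
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [1.0, 0.5, 0.0],  # position 0 slightly prefers O when viewed in isolation
    [0.0, 0.0, 2.0],  # position 1 prefers I
    [0.0, 0.0, 1.0],  # position 2 prefers I
]
print([TAGS[k] for k in viterbi_decode(emissions, transitions)])
```

Tagging each position independently would give O, I, I, which is not a valid entity. Because the decoder scores whole sequences, the transition penalty pushes the first tag to B, which is the kind of sentence-level consistency a CRF layer provides.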
format Online
Article
Text
id pubmed-6454595
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6454595 2019-04-19 A hybrid approach for named entity recognition in Chinese electronic medical record Ji, Bin Liu, Rui Li, Shasha Yu, Jie Wu, Qingbo Tan, Yusong Wu, Jiaju BMC Med Inform Decis Mak Research BioMed Central 2019-04-09 /pmc/articles/PMC6454595/ /pubmed/30961597 http://dx.doi.org/10.1186/s12911-019-0767-2 Text en © The Author(s). 2019
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A hybrid approach for named entity recognition in Chinese electronic medical record
title_sort hybrid approach for named entity recognition in chinese electronic medical record
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454595/
https://www.ncbi.nlm.nih.gov/pubmed/30961597
http://dx.doi.org/10.1186/s12911-019-0767-2