A hybrid approach for named entity recognition in Chinese electronic medical record
BACKGROUND: With the rapid spread of electronic medical records and the arrival of the medical big data era, the application of natural language processing technology in biomedicine has become a hot research topic. METHODS: In this paper, a BiLSTM-CRF model is first applied to medical named entity reco...
Main Authors: | Ji, Bin; Liu, Rui; Li, Shasha; Yu, Jie; Wu, Qingbo; Tan, Yusong; Wu, Jiaju |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454595/ https://www.ncbi.nlm.nih.gov/pubmed/30961597 http://dx.doi.org/10.1186/s12911-019-0767-2 |
_version_ | 1783409567180259328 |
---|---|
author | Ji, Bin; Liu, Rui; Li, Shasha; Yu, Jie; Wu, Qingbo; Tan, Yusong; Wu, Jiaju |
author_sort | Ji, Bin |
collection | PubMed |
description | BACKGROUND: With the rapid spread of electronic medical records and the arrival of the medical big data era, the application of natural language processing technology in biomedicine has become a hot research topic. METHODS: In this paper, a BiLSTM-CRF model is first applied to medical named entity recognition on Chinese electronic medical records. According to the characteristics of Chinese electronic medical records, a low-dimensional word vector is obtained for each word, sentence by sentence. The word vectors are then input to the BiLSTM, which extracts sentence features automatically, and the CRF performs sentence-level word tagging. Second, an attention mechanism is added between the BiLSTM and the CRF to construct an Attention-BiLSTM-CRF model, which can leverage document-level information to alleviate tagging inconsistency. In addition, this paper proposes an entity auto-correct algorithm to rectify entities according to historical entity information. Finally, a drug dictionary and post-processing rules are built to rectify entities and further improve performance. RESULTS: The final F1 scores of the BiLSTM-CRF and Attention-BiLSTM-CRF models on the given test dataset are 90.15% and 90.82%, respectively, both higher than 89.26%, the best F1 score on the test dataset other than ours. CONCLUSION: Our approach can recognize medical named entities in Chinese electronic medical records and achieves state-of-the-art performance on the given test dataset. |
format | Online Article Text |
id | pubmed-6454595 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6454595 2019-04-19 A hybrid approach for named entity recognition in Chinese electronic medical record Ji, Bin Liu, Rui Li, Shasha Yu, Jie Wu, Qingbo Tan, Yusong Wu, Jiaju BMC Med Inform Decis Mak Research BioMed Central 2019-04-09 /pmc/articles/PMC6454595/ /pubmed/30961597 http://dx.doi.org/10.1186/s12911-019-0767-2 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A hybrid approach for named entity recognition in Chinese electronic medical record |
title_sort | hybrid approach for named entity recognition in chinese electronic medical record |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454595/ https://www.ncbi.nlm.nih.gov/pubmed/30961597 http://dx.doi.org/10.1186/s12911-019-0767-2 |
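
The description above mentions that a drug dictionary is used to rectify entities after tagging, but this record does not say how. The following is only a minimal sketch of one common way such a step can work, not the authors' implementation. It assumes character-level tokens, BIO tags, and a greedy longest-match lookup; the function name `apply_drug_dictionary` and the label `DRUG` are hypothetical and do not come from the paper.

```python
from typing import List, Set


def apply_drug_dictionary(chars: List[str], tags: List[str],
                          drug_dict: Set[str], label: str = "DRUG") -> List[str]:
    """Return a copy of `tags` in which each dictionary match is one BIO entity.

    Assumptions (illustrative only): every item in `chars` is a single
    character, `tags` has the same length as `chars`, and matching is greedy
    longest-match from left to right.
    """
    text = "".join(chars)
    out = list(tags)  # copy so the model's original predictions are untouched
    max_len = max((len(name) for name in drug_dict), default=0)
    i = 0
    while i < len(text):
        matched = 0
        # Try the longest candidate first so longer drug names win.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in drug_dict:
                matched = n
                break
        if matched:
            out[i] = f"B-{label}"
            for j in range(i + 1, i + matched):
                out[j] = f"I-{label}"
            i += matched  # continue after the matched span
        else:
            i += 1
    return out


if __name__ == "__main__":
    chars = list("口服阿司匹林片")
    # A hypothetical model output that cut the drug name short.
    predicted = ["O", "O", "B-DRUG", "I-DRUG", "O", "O", "O"]
    print(apply_drug_dictionary(chars, predicted, {"阿司匹林"}))
```

In this sketch the dictionary always overrides the model's tags inside a matched span. A real system would likely need extra rules for overlapping entities of other types, which is presumably where the post-processing rules mentioned in the description come in.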