A hybrid approach for named entity recognition in Chinese electronic medical record

Bibliographic Details
Main authors: Ji, Bin, Liu, Rui, Li, Shasha, Yu, Jie, Wu, Qingbo, Tan, Yusong, Wu, Jiaju
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454595/
https://www.ncbi.nlm.nih.gov/pubmed/30961597
http://dx.doi.org/10.1186/s12911-019-0767-2
Description
Summary: BACKGROUND: With the rapid spread of electronic medical records and the arrival of the medical big data era, the application of natural language processing technology in biomedicine has become a hot research topic. METHODS: In this paper, the BiLSTM-CRF model is first applied to medical named entity recognition on Chinese electronic medical records. According to the characteristics of Chinese electronic medical records, a low-dimensional word vector is obtained for each word, sentence by sentence. The word vectors are then input to the BiLSTM, which extracts sentence features automatically, and the CRF performs sentence-level word tagging. Second, an attention mechanism is added between the BiLSTM and the CRF to construct the Attention-BiLSTM-CRF model, which can leverage document-level information to alleviate tagging inconsistency. In addition, this paper proposes an entity auto-correct algorithm that rectifies entities according to historical entity information. Finally, a drug dictionary and post-processing rules are built to rectify entities and further improve performance. RESULTS: The final F1 scores of the BiLSTM-CRF and Attention-BiLSTM-CRF models on the given test dataset are 90.15% and 90.82%, respectively, both higher than 89.26%, the best F1 score on the test dataset other than ours. CONCLUSION: Our approach can be used to recognize medical named entities in Chinese electronic medical records and achieves state-of-the-art performance on the given test dataset.
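
Illustrative note: the summary says the CRF layer performs sentence-level tagging on top of BiLSTM features. The sketch below shows standard Viterbi decoding, the usual way a linear-chain CRF picks its best tag sequence. It is not the authors' code. The three-tag scheme (`O`, `B`, `I`), the `viterbi_decode` function, and all scores are hypothetical values chosen for illustration; in a real model the emission scores would come from the BiLSTM and the transition scores would be learned.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for one sentence.

    emissions[t][j]   : score of tag j at position t (hypothetical BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[j] = score of the best path that ends in tag j at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_tags):
            # choose the previous tag that maximizes the path score into tag j
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # follow the back-pointers from the best final tag to the start
    path = [max(range(num_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-token sentence; tag order is O, B, I.
TAGS = ["O", "B", "I"]
emissions = [[2.0, 1.0, 0.0],
             [0.5, 0.0, 1.0],
             [1.0, 0.0, 0.2]]
transitions = [[0.0, 0.0, -10.0],  # O -> I is heavily penalized
               [0.0, 0.0, 1.0],    # B -> I is encouraged
               [0.0, 0.0, 0.5]]
decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
# decoded == ['B', 'I', 'O']
```

Picking the top-scoring tag for each token independently would give `O I O` here, an entity that starts with an inside tag. Scoring the whole sentence jointly avoids that, which is the reason for placing a CRF on top of per-token scores.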