
Entity recognition from clinical texts via recurrent neural network

Bibliographic Details
Main Authors: Liu, Zengjian, Yang, Ming, Wang, Xiaolong, Chen, Qingcai, Tang, Buzhou, Wang, Zhe, Xu, Hua
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5506598/
https://www.ncbi.nlm.nih.gov/pubmed/28699566
http://dx.doi.org/10.1186/s12911-017-0468-7
_version_ 1783249592338350080
author Liu, Zengjian
Yang, Ming
Wang, Xiaolong
Chen, Qingcai
Tang, Buzhou
Wang, Zhe
Xu, Hua
author_facet Liu, Zengjian
Yang, Ming
Wang, Xiaolong
Chen, Qingcai
Tang, Buzhou
Wang, Zhe
Xu, Hua
author_sort Liu, Zengjian
collection PubMed
description BACKGROUND: Entity recognition is one of the most fundamental steps in text analysis and has long attracted considerable attention from researchers. In the clinical domain, various types of entities, such as clinical entities and protected health information (PHI), are widespread in clinical texts. Recognizing these entities has become a hot topic in clinical natural language processing (NLP), and in the past few years many traditional machine learning methods, such as support vector machines and conditional random fields, have been deployed to recognize entities from clinical texts. More recently, recurrent neural networks (RNNs), a family of deep learning methods that has shown great potential on many problems including named entity recognition, have also gradually been applied to entity recognition from clinical texts. METHODS: In this paper, we comprehensively investigate the performance of LSTM (long short-term memory), a representative variant of RNN, on clinical entity recognition and protected health information recognition. The LSTM model consists of three layers: an input layer, which generates a representation of each word of a sentence; an LSTM layer, which outputs another word representation sequence that captures the contextual information of each word in the sentence; and an inference layer, which makes tagging decisions according to the output of the LSTM layer, i.e., outputs a label sequence. RESULTS: Experiments conducted on corpora of the 2010, 2012 and 2014 i2b2 NLP challenges show that LSTM achieves the highest micro-average F1-scores of 85.81% on the 2010 i2b2 medical concept extraction, 92.29% on the 2012 i2b2 clinical event detection, and 94.37% on the 2014 i2b2 de-identification, which is highly competitive with other state-of-the-art systems. CONCLUSIONS: LSTM, which requires no hand-crafted features, shows great potential for entity recognition from clinical texts and outperforms traditional machine learning methods that depend on laborious feature engineering.
A possible future direction is to integrate the knowledge bases that are widely available in the clinical domain into LSTM, which is part of our future work. How to use LSTM to recognize entities in specific formats is another possible direction.
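The METHODS portion of the abstract describes a three-layer architecture: an input layer that maps each word to a vector, an LSTM layer that contextualizes those vectors across the sentence, and an inference layer that emits one tag per token. The dependency-free Python sketch below illustrates that layering only; the class name, dimensions, and random initialization are illustrative assumptions rather than the authors' implementation, and a per-token softmax stands in for the paper's inference layer. A real system would train all parameters on labeled clinical text, which this sketch omits.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMTagger:
    """Sketch of the three layers: embedding -> LSTM cell -> per-token tag distribution."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, n_tags, seed=0):
        rng = random.Random(seed)
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.E = mat(vocab_size, embed_dim)                    # input layer: word vectors
        self.W = mat(embed_dim + hidden_dim, 4 * hidden_dim)   # LSTM gates i, f, o, g stacked
        self.b = [0.0] * (4 * hidden_dim)
        self.V = mat(hidden_dim, n_tags)                       # inference layer: tag scores
        self.h_dim = hidden_dim

    def forward(self, token_ids):
        """Run one sentence through the three layers; return one tag distribution per token."""
        h = [0.0] * self.h_dim   # hidden state (the contextual word representation)
        c = [0.0] * self.h_dim   # cell state (the LSTM's memory)
        out = []
        for t in token_ids:
            # Input layer: look up the word vector, concatenate with previous hidden state.
            x = self.E[t] + h
            # One affine map produces all four gate pre-activations.
            z = [sum(xi * wij for xi, wij in zip(x, col)) + bj
                 for col, bj in zip(zip(*self.W), self.b)]
            H = self.h_dim
            i = [sigmoid(v) for v in z[0:H]]          # input gate
            f = [sigmoid(v) for v in z[H:2 * H]]      # forget gate
            o = [sigmoid(v) for v in z[2 * H:3 * H]]  # output gate
            g = [math.tanh(v) for v in z[3 * H:4 * H]]  # candidate cell update
            # LSTM layer: update memory and emit the contextual representation.
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            # Inference layer (simplified here to a per-token softmax over tags).
            logits = [sum(hj * vjk for hj, vjk in zip(h, col)) for col in zip(*self.V)]
            m = max(logits)
            exps = [math.exp(l - m) for l in logits]
            s = sum(exps)
            out.append([e / s for e in exps])
        return out
```

Running a three-token sentence through `LSTMTagger(vocab_size=10, embed_dim=4, hidden_dim=6, n_tags=3)` yields three probability rows, each summing to 1, one per token; training those probabilities against gold labels (e.g., BIO tags for clinical concepts or PHI) is what the paper's experiments evaluate.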
format Online
Article
Text
id pubmed-5506598
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5506598 2017-07-12 Entity recognition from clinical texts via recurrent neural network Liu, Zengjian Yang, Ming Wang, Xiaolong Chen, Qingcai Tang, Buzhou Wang, Zhe Xu, Hua BMC Med Inform Decis Mak Research BioMed Central 2017-07-05 /pmc/articles/PMC5506598/ /pubmed/28699566 http://dx.doi.org/10.1186/s12911-017-0468-7 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Liu, Zengjian
Yang, Ming
Wang, Xiaolong
Chen, Qingcai
Tang, Buzhou
Wang, Zhe
Xu, Hua
Entity recognition from clinical texts via recurrent neural network
title Entity recognition from clinical texts via recurrent neural network
title_full Entity recognition from clinical texts via recurrent neural network
title_fullStr Entity recognition from clinical texts via recurrent neural network
title_full_unstemmed Entity recognition from clinical texts via recurrent neural network
title_short Entity recognition from clinical texts via recurrent neural network
title_sort entity recognition from clinical texts via recurrent neural network
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5506598/
https://www.ncbi.nlm.nih.gov/pubmed/28699566
http://dx.doi.org/10.1186/s12911-017-0468-7
work_keys_str_mv AT liuzengjian entityrecognitionfromclinicaltextsviarecurrentneuralnetwork
AT yangming entityrecognitionfromclinicaltextsviarecurrentneuralnetwork
AT wangxiaolong entityrecognitionfromclinicaltextsviarecurrentneuralnetwork
AT chenqingcai entityrecognitionfromclinicaltextsviarecurrentneuralnetwork
AT tangbuzhou entityrecognitionfromclinicaltextsviarecurrentneuralnetwork
AT wangzhe entityrecognitionfromclinicaltextsviarecurrentneuralnetwork
AT xuhua entityrecognitionfromclinicaltextsviarecurrentneuralnetwork