Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features

BACKGROUND: Named entity recognition (NER) is an important task in clinical natural language processing (NLP) research. Machine learning (ML) based NER methods have shown good performance in recognizing entities in clinical text. Algorithms and features are two important factors that largely affect...

Bibliographic Details
Main authors: Tang, Buzhou, Cao, Hongxin, Wu, Yonghui, Jiang, Min, Xu, Hua
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3618243/
https://www.ncbi.nlm.nih.gov/pubmed/23566040
http://dx.doi.org/10.1186/1472-6947-13-S1-S1
_version_ 1782265384920940544
author Tang, Buzhou
Cao, Hongxin
Wu, Yonghui
Jiang, Min
Xu, Hua
author_facet Tang, Buzhou
Cao, Hongxin
Wu, Yonghui
Jiang, Min
Xu, Hua
author_sort Tang, Buzhou
collection PubMed
description BACKGROUND: Named entity recognition (NER) is an important task in clinical natural language processing (NLP) research. Machine learning (ML) based NER methods have shown good performance in recognizing entities in clinical text. Algorithms and features are two important factors that largely affect the performance of ML-based NER systems. Conditional Random Fields (CRFs), a sequential labelling algorithm, and Support Vector Machines (SVMs), which are based on large-margin theory, are two typical machine learning algorithms that have been widely applied to clinical NER tasks. For features, syntactic and semantic information about context words has often been used in clinical NER systems. However, Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, and word representation features, which contain word-level back-off information derived from a large unlabelled corpus by unsupervised algorithms, have not been extensively investigated for clinical text processing. Therefore, the primary goal of this study is to evaluate the use of SSVMs and word representation features in clinical NER tasks. METHODS: In this study, we developed SSVMs-based NER systems to recognize clinical entities in hospital discharge summaries, using the data set from the concept extraction task in the 2010 i2b2 NLP challenge. We compared the performance of CRFs-based and SSVMs-based NER classifiers with the same feature sets. Furthermore, we extracted two different types of word representation features (clustering-based representation features and distributional representation features) and integrated them with the SSVMs-based clinical NER system. We then reported the performance of SSVMs-based NER systems with different types of word representation features.
RESULTS AND DISCUSSION: Using the same training (N = 27,837) and test (N = 45,009) sets as in the challenge, our evaluation showed that the SSVMs-based NER systems achieved better performance than the CRFs-based systems for clinical entity recognition when the same features were used. Both types of word representation features (clustering-based and distributional representations) improved the performance of ML-based NER systems. By combining the two types of word representation features with SSVMs, our system achieved its highest F-measure of 85.82%, which outperformed the best system reported in the challenge by 0.6%. Our results show that SSVMs are an algorithm with great potential for clinical NLP research, and that both types of unsupervised word representation features are beneficial to clinical NER tasks.
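The F-measure quoted in the abstract combines precision and recall into a single score. The following is a minimal sketch of how such a score can be computed for entity recognition, assuming exact matching on span boundaries and entity type. The record does not state the challenge's matching criterion, so that rule, the `Span` tuple layout, the function name `f_measure`, and the toy spans are all assumptions made for illustration, not the authors' evaluation code.

```python
from typing import Set, Tuple

# (start_token, end_token, entity_type); this layout is an assumption for illustration.
Span = Tuple[int, int, str]


def f_measure(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-type matching."""
    # A prediction counts as correct only if boundaries and type both match.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data invented for this example (not taken from the i2b2 corpus).
gold = {(0, 1, "problem"), (4, 4, "treatment"), (7, 9, "test")}
predicted = {(0, 1, "problem"), (4, 4, "test"), (7, 9, "test")}
precision, recall, f1 = f_measure(gold, predicted)
```

In this toy case two of the three predicted spans match the gold annotations exactly, so the mistyped span at token 4 counts against both precision and recall.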
format Online
Article
Text
id pubmed-3618243
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3618243 2013-04-09 Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features Tang, Buzhou Cao, Hongxin Wu, Yonghui Jiang, Min Xu, Hua BMC Med Inform Decis Mak Proceedings BACKGROUND: Named entity recognition (NER) is an important task in clinical natural language processing (NLP) research. Machine learning (ML) based NER methods have shown good performance in recognizing entities in clinical text. Algorithms and features are two important factors that largely affect the performance of ML-based NER systems. Conditional Random Fields (CRFs), a sequential labelling algorithm, and Support Vector Machines (SVMs), which are based on large-margin theory, are two typical machine learning algorithms that have been widely applied to clinical NER tasks. For features, syntactic and semantic information about context words has often been used in clinical NER systems. However, Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, and word representation features, which contain word-level back-off information derived from a large unlabelled corpus by unsupervised algorithms, have not been extensively investigated for clinical text processing. Therefore, the primary goal of this study is to evaluate the use of SSVMs and word representation features in clinical NER tasks. METHODS: In this study, we developed SSVMs-based NER systems to recognize clinical entities in hospital discharge summaries, using the data set from the concept extraction task in the 2010 i2b2 NLP challenge. We compared the performance of CRFs-based and SSVMs-based NER classifiers with the same feature sets. Furthermore, we extracted two different types of word representation features (clustering-based representation features and distributional representation features) and integrated them with the SSVMs-based clinical NER system.
We then reported the performance of SSVMs-based NER systems with different types of word representation features. RESULTS AND DISCUSSION: Using the same training (N = 27,837) and test (N = 45,009) sets as in the challenge, our evaluation showed that the SSVMs-based NER systems achieved better performance than the CRFs-based systems for clinical entity recognition when the same features were used. Both types of word representation features (clustering-based and distributional representations) improved the performance of ML-based NER systems. By combining the two types of word representation features with SSVMs, our system achieved its highest F-measure of 85.82%, which outperformed the best system reported in the challenge by 0.6%. Our results show that SSVMs are an algorithm with great potential for clinical NLP research, and that both types of unsupervised word representation features are beneficial to clinical NER tasks. BioMed Central 2013-04-05 /pmc/articles/PMC3618243/ /pubmed/23566040 http://dx.doi.org/10.1186/1472-6947-13-S1-S1 Text en Copyright © 2013 Tang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Tang, Buzhou
Cao, Hongxin
Wu, Yonghui
Jiang, Min
Xu, Hua
Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features
title Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features
title_sort recognizing clinical entities in hospital discharge summaries using structural support vector machines with word representation features
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3618243/
https://www.ncbi.nlm.nih.gov/pubmed/23566040
http://dx.doi.org/10.1186/1472-6947-13-S1-S1