Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition

BACKGROUND: This paper presents a conditional random fields (CRF) method that enables the capture of specific high-order label transition factors to improve clinical named entity recognition performance. Consecutive clinical entities in a sentence are usually separated from each other, and the textu...

Bibliographic Details
Main Authors:	Lee, Wangjin; Choi, Jinwook
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Technical Advance
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6632205/ https://www.ncbi.nlm.nih.gov/pubmed/31307440 http://dx.doi.org/10.1186/s12911-019-0865-1

_version_	1783435691547426816
author	Lee, Wangjin Choi, Jinwook
collection	PubMed
description	BACKGROUND: This paper presents a conditional random fields (CRF) method that enables the capture of specific high-order label transition factors to improve clinical named entity recognition performance. Consecutive clinical entities in a sentence are usually separated from each other, and the textual descriptions in clinical narrative documents frequently indicate causal or posterior relationships that can be used to facilitate clinical named entity recognition. However, the CRF that is generally used for named entity recognition is a first-order model that constrains label transition dependency of adjoining labels under the Markov assumption. METHODS: Based on the first-order structure, our proposed model utilizes non-entity tokens between separated entities as an information transmission medium by applying a label induction method. The model is referred to as precursor-induced CRF because its non-entity state memorizes precursor entity information, and the model’s structure allows the precursor entity information to propagate forward through the label sequence. RESULTS: We compared the proposed model with both first- and second-order CRFs in terms of their F1-scores, using two clinical named entity recognition corpora (the i2b2 2012 challenge and the Seoul National University Hospital electronic health record). The proposed model demonstrated better entity recognition performance than both the first- and second-order CRFs and was also more efficient than the higher-order model. CONCLUSION: The proposed precursor-induced CRF which uses non-entity labels as label transition information improves entity recognition F1 score by exploiting long-distance transition factors without exponentially increasing the computational time. In contrast, a conventional second-order CRF model that uses longer distance transition factors showed even worse results than the first-order model and required the longest computation time. Thus, the proposed model could offer a considerable performance improvement over current clinical named entity recognition methods based on the CRF models.
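The label induction idea described in the METHODS part of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the exact label scheme, so the BIO-style tags, the `O-<TYPE>` naming for induced non-entity labels, the entity type names, and both function names below are assumptions made for illustration only.

```python
from typing import List, Optional


def induce_precursor_labels(labels: List[str], outside: str = "O") -> List[str]:
    """Relabel non-entity tokens so they carry the type of the preceding entity.

    Assumes one sentence per call and BIO-style tags such as "B-PROBLEM".
    The "O-<TYPE>" naming is a hypothetical convention, not taken from the paper.
    """
    induced: List[str] = []
    precursor: Optional[str] = None  # type of the most recent entity, if any
    for label in labels:
        if label == outside:
            # Before any entity has been seen, the plain outside label is kept.
            induced.append(outside if precursor is None else f"{outside}-{precursor}")
        else:
            # "B-PROBLEM" or "I-PROBLEM" -> "PROBLEM"
            precursor = label.split("-", 1)[1] if "-" in label else label
            induced.append(label)
    return induced


def strip_induced_labels(labels: List[str], outside: str = "O") -> List[str]:
    """Map induced labels back to the plain outside label, e.g. for scoring."""
    prefix = outside + "-"
    return [outside if lab == outside or lab.startswith(prefix) else lab
            for lab in labels]


labels = ["O", "B-PROBLEM", "I-PROBLEM", "O", "O", "B-TREATMENT", "O"]
induced = induce_precursor_labels(labels)
print(induced)
```

Under these assumptions, a first-order model trained on the induced sequence sees transitions such as `O-PROBLEM` to `B-TREATMENT`, so the type of the earlier entity remains visible across the intervening non-entity tokens while the model structure stays first-order. Mapping the induced labels back with `strip_induced_labels` recovers the original tag set for evaluation.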
format	Online Article Text
id	pubmed-6632205
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6632205 2019-07-25 Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition Lee, Wangjin Choi, Jinwook BMC Med Inform Decis Mak Technical Advance BioMed Central 2019-07-15 /pmc/articles/PMC6632205/ /pubmed/31307440 http://dx.doi.org/10.1186/s12911-019-0865-1 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition
topic	Technical Advance
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6632205/ https://www.ncbi.nlm.nih.gov/pubmed/31307440 http://dx.doi.org/10.1186/s12911-019-0865-1
